changeset 260:999f2966a3e5 Dev_ML

new branch -- Dev_ML -- for making VMS take langlets whose constructs can be mixed
author Sean Halle <seanhalle@yahoo.com>
date Wed, 19 Sep 2012 23:12:44 -0700
parents 0dc0b8653902
children dafae55597ce
files AnimationMaster.c CoreController.c Defines/MEAS__macros_to_be_moved_to_langs.h Defines/PR_defs.h Defines/PR_defs__HW_constants.h Defines/VMS_defs.h Defines/VMS_defs__HW_constants.h HW_Dependent_Primitives/PR__HW_measurement.c HW_Dependent_Primitives/PR__HW_measurement.h HW_Dependent_Primitives/PR__primitives.c HW_Dependent_Primitives/PR__primitives.h HW_Dependent_Primitives/PR__primitives_asm.s HW_Dependent_Primitives/VMS__HW_measurement.c HW_Dependent_Primitives/VMS__HW_measurement.h HW_Dependent_Primitives/VMS__primitives.c HW_Dependent_Primitives/VMS__primitives.h HW_Dependent_Primitives/VMS__primitives_asm.s PR.h PR__PI.c PR__WL.c PR__int.c PR__startup_and_shutdown.c PR_primitive_data_types.h Services_Offered_by_PR/Measurement_and_Stats/MEAS__macros.h Services_Offered_by_PR/Measurement_and_Stats/probes.c Services_Offered_by_PR/Measurement_and_Stats/probes.h Services_Offered_by_PR/Memory_Handling/vmalloc.c Services_Offered_by_PR/Memory_Handling/vmalloc.h Services_Offered_by_VMS/Debugging/DEBUG__macros.h Services_Offered_by_VMS/Lang_Constructs/VMS_Lang.h Services_Offered_by_VMS/Measurement_and_Stats/MEAS__macros.h Services_Offered_by_VMS/Measurement_and_Stats/probes.c Services_Offered_by_VMS/Measurement_and_Stats/probes.h Services_Offered_by_VMS/Memory_Handling/vmalloc.c Services_Offered_by_VMS/Memory_Handling/vmalloc.h VMS.h VMS__PI.c VMS__WL.c VMS__int.c VMS__startup_and_shutdown.c VMS_primitive_data_types.h __README__Code_Overview.txt
diffstat 42 files changed, 4729 insertions(+), 3923 deletions(-)
line diff
     1.1 --- a/AnimationMaster.c	Mon Sep 03 03:34:54 2012 -0700
     1.2 +++ b/AnimationMaster.c	Wed Sep 19 23:12:44 2012 -0700
     1.3 @@ -9,7 +9,7 @@
     1.4  #include <stdio.h>
     1.5  #include <stddef.h>
     1.6  
     1.7 -#include "VMS.h"
     1.8 +#include "PR.h"
     1.9  
    1.10  
    1.11  
    1.12 @@ -20,11 +20,39 @@
    1.13   * 
    1.14   *Within the code, this is the top-level-function of the masterVPs, and
    1.15   * runs when the coreController has no more slave VPs.  It's job is to
    1.16 - * refill the animation slots with slaves.
    1.17 + * refill the animation slots with slaves that have work.
    1.18   *
    1.19 - *To do this, it scans the animation slots for just-completed slaves.
    1.20 - * Each of these has a request in it.  So, the master hands each to the
    1.21 - * plugin's request handler.
    1.22 + *There are multiple versions of the master, each tuned to a specific 
    1.23 + * combination of modes.  This keeps the master simple, with reduced overhead,
    1.24 + * when the application is not using the extra complexity.
    1.25 + * 
    1.26 + *As of Sept 2012, the versions available will be:
    1.27 + * 1) Single language, which only exposes slaves (such as SSR or Vthread)
    1.28 + * 2) Single language, which only exposes tasks  (such as pure dataflow)
    1.29 + * 3) Single language, which exposes both (like Cilk, StarSs, and OpenMP)
    1.30 + * 4) Multi-language, which always assumes both tasks and slaves
    1.31 + * 5) Multi-language and multi-process, which also assumes both tasks and slaves
    1.32 + *
    1.33 + * 
    1.34 + *
    1.35 + */
    1.36 +
    1.37 +
    1.38 +//=====================  The versions of the Animation Master  =================
    1.39 +//
    1.40 +//==============================================================================
    1.41 +
    1.42 +/* 1) This version is for a single language, that has only slaves, no tasks,
    1.43 + *    such as Vthread or SSR.
    1.44 + *This version is for when an application has only a single language, and
    1.45 + * that language exposes slaves explicitly (as opposed to a task based 
    1.46 + * language like pure dataflow).
    1.47 + * 
    1.48 + *
    1.49 + *It scans the animation slots for just-completed slaves.
    1.50 + * Each completed slave has a request in it.  So, the master hands each to
    1.51 + * the plugin's request handler (there is only one plugin, because only one
    1.52 + * lang).
    1.53   *Each request represents a language construct that has been encountered
    1.54   * by the application code in the slave. Passing the request to the
    1.55   * request handler is how that language construct's behavior gets invoked.
    1.56 @@ -77,24 +105,24 @@
    1.57   *There is a separate masterVP for each core, but a single semantic
    1.58   * environment shared by all cores.  Each core also has its own scheduling
    1.59   * slots, which are used to communicate slaves between animationMaster and
    1.60 - * coreController.  There is only one global variable, _VMSMasterEnv, which
    1.61 + * coreController.  There is only one global variable, _PRMasterEnv, which
    1.62   * holds the semantic env and other things shared by the different
    1.63   * masterVPs.  The request handler and Assigner are registered with
    1.64   * the animationMaster by the language's init function, and a pointer to
    1.65 - * each is in the _VMSMasterEnv. (There are also some pthread related global
    1.66 - * vars, but they're only used during init of VMS).
    1.67 - *VMS gains control over the cores by essentially "turning off" the OS's
    1.68 + * each is in the _PRMasterEnv. (There are also some pthread related global
    1.69 + * vars, but they're only used during init of PR).
    1.70 + *PR gains control over the cores by essentially "turning off" the OS's
    1.71   * scheduler, using pthread pin-to-core commands.
    1.72   *
    1.73   *The masterVPs are created during init, with this animationMaster as their
    1.74   * top level function.  The masterVPs use the same SlaveVP data structure,
    1.75   * even though they're not slave VPs.
    1.76   *A "seed slave" is also created during init -- this is equivalent to the
    1.77 - * "main" function in C, and acts as the entry-point to the VMS-language-
    1.78 + * "main" function in C, and acts as the entry-point to the PR-language-
    1.79   * based application.
    1.80 - *The masterVPs shared a single system-wide master-lock, so only one
    1.81 + *The masterVPs share a single system-wide master-lock, so only one
    1.82   * masterVP may be animated at a time.
    1.83 - *The core controllers access _VMSMasterEnv to get the masterVP, and when
    1.84 + *The core controllers access _PRMasterEnv to get the masterVP, and when
    1.85   * they start, the slots are all empty, so they run their associated core's
    1.86   * masterVP.  The first of those to get the master lock sees the seed slave
    1.87   * in the shared semantic environment, so when it runs the Assigner, that
    1.88 @@ -104,14 +132,14 @@
    1.89   * constructs to create more slaves, and so on.  Each of those constructs
    1.90   * causes the seed slave to suspend, switching over to the core controller,
    1.91   * which eventually switches to the masterVP, which executes the 
    1.92 - * request handler, which uses VMS primitives to carry out the creation of
    1.93 + * request handler, which uses PR primitives to carry out the creation of
    1.94   * new slave VPs, which are marked as ready for the Assigner, and so on..
    1.95   * 
    1.96   *On animation slots, and system behavior:
    1.97 - * A request may linger in a animation slot for a long time while
    1.98 + * A request may linger in an animation slot for a long time while
    1.99   * the slaves in the other slots are animated.  This only becomes a problem
   1.100   * when such a request is a choke-point in the constraints, and is needed
   1.101 - * to free work for *other* cores.  To reduce this occurance, the number
   1.102 + * to free work for *other* cores.  To reduce this occurrence, the number
   1.103   * of animation slots should be kept low.  In balance, having multiple
   1.104   * animation slots amortizes the overhead of switching to the masterVP and
   1.105   * executing the animationMaster code, which drives for more than one. In
   1.106 @@ -163,7 +191,29 @@
   1.107         HOLISTIC__Record_AppResponder_start;
   1.108                 MEAS__startReqHdlr;
   1.109                 
   1.110 -            //process the requests made by the slave (held inside slave struc)
   1.111 +           currSlot->workIsDone         = FALSE;
   1.112 +            currSlot->needsSlaveAssigned = TRUE;
   1.113 +            SlaveVP *currSlave = currSlot->slaveAssignedToSlot;
   1.114 +            
   1.115 +	justAddedReqHdlrChg();
   1.116 +			//handle the request, either by VMS or by the language
   1.117 +            if( currSlave->requests->reqType != LangReq )
   1.118 +             {    //The request is a standard VMS one, not one defined by the
   1.119 +                  // language, so VMS handles it, then queues slave to be assigned
   1.120 +               handleReqInVMS( currSlave );
   1.121 +               writePrivQ( currSlave, VMSReadyQ ); //Q slave to be assigned below
   1.122 +             }
   1.123 +            else
   1.124 +             {       MEAS__startReqHdlr;
   1.125 +
   1.126 +                  //Language handles request, which is held inside slave struc
   1.127 +               (*requestHandler)( currSlave, semanticEnv );
   1.128 +
   1.129 +                     MEAS__endReqHdlr;
   1.130 +             }
   1.131 +          }
   1.132 +
   1.133 +		  //process the requests made by the slave (held inside slave struc)
   1.134           (*requestHandler)( currSlot->slaveAssignedToSlot, semanticEnv );
   1.135           
   1.136           HOLISTIC__Record_AppResponder_end;
   1.137 @@ -196,3 +246,756 @@
   1.138     }//while(1) 
   1.139   }
   1.140  
   1.141 +
   1.142 +/* 2)  This version is for a single language that has only tasks, which 
   1.143 + *     cannot be suspended.
   1.144 + */
   1.145 +void animationMaster( void *initData, SlaveVP *masterVP )
   1.146 + { 
   1.147 +      //Used while scanning and filling animation slots
   1.148 +   int32           slotIdx, numSlotsFilled;
   1.149 +   AnimSlot       *currSlot, **animSlots;
   1.150 +   SlaveVP        *assignedSlaveVP;  //the slave chosen by the assigner
   1.151 +   
   1.152 +      //Local copies, for performance
   1.153 +   MasterEnv      *masterEnv;
   1.154 +   SlaveAssigner   slaveAssigner;
   1.155 +   RequestHandler  requestHandler;
   1.156 +   PRSemEnv       *semanticEnv;
   1.157 +   int32           thisCoresIdx;
   1.158 +
   1.159 +   //#ifdef  MODE__MULTI_LANG
   1.160 +   SlaveVP        *slave;
   1.161 +   PRProcess      *process;
   1.162 +   PRConstrEnvHolder *constrEnvHolder;
   1.163 +   int32           langMagicNumber;
   1.164 +   //#endif
   1.165 +   
   1.166 +   //======================== Initializations ========================
   1.167 +   masterEnv        = (MasterEnv*)_PRMasterEnv;
   1.168 +   
   1.169 +   thisCoresIdx     = masterVP->coreAnimatedBy;
   1.170 +   animSlots        = masterEnv->allAnimSlots[thisCoresIdx];
   1.171 +
   1.172 +   requestHandler   = masterEnv->requestHandler;
   1.173 +   slaveAssigner    = masterEnv->slaveAssigner;
   1.174 +   semanticEnv      = masterEnv->semanticEnv;
   1.175 +   
   1.176 +      //initialize, for non-multi-lang, non multi-proc case
   1.177 +      // default handler gets put into master env by a registration call by lang
   1.178 +   endTaskHandler   = masterEnv->defaultTaskHandler;
   1.179 +   
   1.180 +      HOLISTIC__Insert_Master_Global_Vars;
   1.181 +   
   1.182 +   //======================== animationMaster ========================
   1.183 +   //Do loop gets requests handled and work assigned to slots..
   1.184 +   // work can either be a task or a resumed slave
   1.185 +   //Having two cases makes this logic complex.. can be finishing either, and 
   1.186 +   // then the next available work may be either.. so really have two distinct
   1.187 +   // loops that are inter-twined.. 
   1.188 +   while(1){
   1.189 +       
   1.190 +      MEAS__Capture_Pre_Master_Point
   1.191 +
   1.192 +      //Scan the animation slots
   1.193 +   numSlotsFilled = 0;
   1.194 +   for( slotIdx = 0; slotIdx < NUM_ANIM_SLOTS; slotIdx++)
   1.195 +    {
   1.196 +      currSlot = animSlots[ slotIdx ];
   1.197 +
   1.198 +         //Check if newly-done slave in slot, which will need request handled
   1.199 +      if( currSlot->workIsDone )
   1.200 +       { currSlot->workIsDone = FALSE;
   1.201 +       
   1.202 +               HOLISTIC__Record_AppResponder_start; //TODO: update to check which process for each slot
   1.203 +               MEAS__startReqHdlr;
   1.204 +               
   1.205 +         
   1.206 +            //process the request made by the slave (held inside slave struc)
   1.207 +         slave = currSlot->slaveAssignedToSlot;
   1.208 +         
   1.209 +            //check if the completed work was a task..
   1.210 +         if( slave->taskMetaInfo->isATask )
   1.211 +          {
   1.212 +             if( slave->reqst->type == TaskEnd ) 
   1.213 +              {    //do task end handler, which is registered separately
   1.214 +                   //note, end hdlr may use semantic data from reqst..
   1.215 +                //#ifdef  MODE__MULTI_LANG
   1.216 +                   //get end-task handler
   1.217 +                //taskEndHandler = lookup( slave->reqst->langMagicNumber, processEnv );
   1.218 +                taskEndHandler = slave->taskMetaInfo->endTaskHandler;
   1.219 +                //#endif
   1.220 +                (*taskEndHandler)( slave, semanticEnv );
   1.221 +                
   1.222 +                goto AssignWork;
   1.223 +              }
   1.224 +             else  //is a task, and just suspended
   1.225 +              {    //turn slot slave into free task slave & make replacement
   1.226 +                if( slave->typeOfVP == TaskSlotSlv ) changeSlvType();
   1.227 +                
   1.228 +                //goto normal slave request handling
   1.229 +                goto SlaveReqHandling; 
   1.230 +              }
   1.231 +          }
   1.232 +         else //is a slave that suspended
   1.233 +          {
   1.234 +          SlaveReqHandling:
   1.235 +            (*requestHandler)( slave, semanticEnv ); //(note: indirect Fn call more efficient when use fewer params, instead re-fetch from slave)
   1.236 +         
   1.237 +               HOLISTIC__Record_AppResponder_end;
   1.238 +               MEAS__endReqHdlr;
   1.239 +               
   1.240 +            goto AssignWork;
   1.241 +          }
   1.242 +       } //if has suspended slave that needs handling
   1.243 +      
   1.244 +         //if slot empty, hand to Assigner to fill with a slave
   1.245 +      if( currSlot->needsSlaveAssigned )
   1.246 +       {    //Call plugin's Assigner to give slot a new slave
   1.247 +               HOLISTIC__Record_Assigner_start;
   1.248 +               
   1.249 +       AssignWork:
   1.250 +     
   1.251 +         assignedSlaveVP = assignWork( semanticEnv, currSlot );
   1.252 +       
   1.253 +            //put the chosen slave into slot, and adjust flags and state
   1.254 +         if( assignedSlaveVP != NULL )
   1.255 +          { currSlot->slaveAssignedToSlot = assignedSlaveVP;
   1.256 +            assignedSlaveVP->animSlotAssignedTo = currSlot;
   1.257 +            currSlot->needsSlaveAssigned  = FALSE;
   1.258 +            numSlotsFilled               += 1;
   1.259 +          }
   1.260 +         else
   1.261 +          {
   1.262 +            currSlot->needsSlaveAssigned  = TRUE; //local write
   1.263 +          }
   1.264 +               HOLISTIC__Record_Assigner_end;
   1.265 +       }//if slot needs slave assigned
   1.266 +    }//for( slotIdx..
   1.267 +
   1.268 +         MEAS__Capture_Post_Master_Point;
   1.269 +   
   1.270 +   masterSwitchToCoreCtlr( masterVP ); //returns when ctlr switches back to master
   1.271 +   flushRegisters();
   1.272 +   }//while(1) 
   1.273 + }
   1.274 +
   1.275 +
   1.276 +/*This is the master when just multi-lang, but not multi-process mode is on.
   1.277 + * This version has to handle both tasks and slaves, and do extra work of 
   1.278 + * looking up the semantic env and handlers to use, for each completed bit of 
   1.279 + * work.
   1.280 + *It also has to search through the semantic envs to find one with work,
   1.281 + * then ask that env's assigner to return a unit of that work.
   1.282 + * 
   1.283 + *The language is written to start up in the same way as if it were the only
   1.284 + * language in the app, and it operates in the same way.
   1.285 + * The only difference between single-language and multi-lang is here, in the
   1.286 + * master.
   1.287 + *This invisibility to mode is why the language has to use registration calls
   1.288 + * for everything during startup -- those calls do different things depending
   1.289 + * on whether it's single-language or multi-language mode.
   1.290 + * 
   1.291 + *In this version of the master, work can either be a task or a resumed slave.
   1.292 + *Having two cases makes this logic complex.. can be finishing either, and
   1.293 + * then the next available work may be either.. so really have two distinct 
   1.294 + * loops that are inter-twined.. 
   1.295 + * 
   1.296 + *Some special cases:
   1.297 + * A task-end is a special case for a few reasons (below).
   1.298 + * A task-end can't block a slave (can't cause it to "logically suspend")
   1.299 + * A task available for work can only be assigned to a special slave, which 
   1.300 + *   has been set aside for doing tasks, one such task-slave is always 
   1.301 + *   assigned to each slot. So, when a task ends, a new task is assigned to
   1.302 + *   that slot's task-slave right away.  
   1.303 + * But if no tasks are available, then have to switch over to looking at
   1.304 + *   slaves to find one ready to resume, to find work for the slot.
   1.305 + * If a task just suspends, not ends, then its task-slave is no longer 
   1.306 + *   available to take new tasks, so a new task-slave has to be assigned to
   1.307 + *   that slot.  Then the slave of the suspended task is turned into a free
   1.308 + *   task-slave and request handling is done on it as if it were a slave 
   1.309 + *   that suspended.
   1.310 + * After request handling, do the same sequence of looking for a task to be
   1.311 + *   work, and if none, look for a slave ready to resume, as work for the slot.
   1.312 + * If a slave suspends, handle its request, then look for work.. first for a
   1.313 + *   task to assign, and if none, slaves ready to resume.
   1.314 + * Another special case is when task-end is done on a free task-slave.. in
   1.315 + *   that case, the slave has no more work and no way to get more.. so place
   1.316 + *   it into a recycle queue.
   1.317 + * If no work is found of either type, then do a special thing to prune down
   1.318 + *   the extra slaves in the recycle queue, just so don't get too many..
   1.319 + * 
   1.320 + *The multi-lang thing complicates matters..  
   1.321 + *
   1.322 + *For request handling, it means have to first fetch the semantic environment
   1.323 + * of the language, and then do the request handler pointed to by that
   1.324 + * semantic env.
   1.325 + *For assigning, things get more complex because of competing goals..  One
   1.326 + * goal is for language specific stuff to be used during assignment, so
   1.327 + * assigner can make higher quality decisions..  but with multiple languages,
   1.328 + * which only get mixed in the application, the assigners can't be written
   1.329 + * with knowledge of each other.  So, they can only make localized decisions,
   1.330 + * and so different language's assigners may interfere with each other..
   1.331 + * 
   1.332 + *So, have some possibilities available:
   1.333 + *1) can have a fixed scheduler in the proto-runtime, that all the
   1.334 + * languages give their work to..  (but then lose language-specific info, 
   1.335 + * there is a standard PR format for assignment info, and the language 
   1.336 + * attaches this to the work-unit when it gives it to PR.. also have issue
   1.337 + * with HWSim, which uses a priority Q instead of FIFO, and requests can 
   1.338 + * "undo" previous work put in, so request handlers need way to manipulate
   1.339 + * the work-holding Q..) (this might be fudgeable with
   1.340 + * HWSim, if the master did a lang-supplied callback each time it assigns a
   1.341 + * unit to a slot..  then HWSim can keep exactly one unit of work in PR's
   1.342 + * queue at a time..  but this is quite hack-like.. or perhaps HWSim supplies
   1.343 + * a task-end handler that kicks the next unit of work from HWSim internal
   1.344 + * priority queue, over to PR readyQ)
   1.345 + *2) can have each language have its own semantic env, that holds its own
   1.346 + * work, which is assigned by its own assigner.. then the master searches
   1.347 + * through all the semantic envs to find one with work and asks it give work..
   1.348 + * (this has downside of blinding assigners to each other.. but does work
   1.349 + * for HWSim case)
   1.350 + *3) could make PR have a different readyQ for each core, and ask the lang
   1.351 + * to put work to the core it prefers.. but the work may be moved by PR if
   1.352 + * needed, say if one core idles for too long. This is a hybrid approach, 
   1.353 + * letting the language decide which core, but PR keeps the work and does it
   1.354 + * FIFO style.. (this might also be fudgeable with HWSim, in similar fashion, 
   1.355 + * but it would be complicated by having to track cores separately) 
   1.356 + *
   1.357 + *Choosing 2, to keep compatibility with single-lang mode..  it allows the same
   1.358 + * assigner to be used for single-lang as for multi-lang..  the overhead of
   1.359 + * the extra master search for work is part of the price of the flexibility,
   1.360 + * but should be fairly small.. takes the first env that has work available, 
   1.361 + * and whatever it returns is assigned to the slot..
   1.362 + * 
   1.363 + *As a hybrid, giving an option for a unified override assigner to be registered
   1.364 + * and used..  This allows something like a static analysis to detect
   1.365 + * which languages are grouped together, and then analyze the pattern of 
   1.366 + * construct calls, and generate a custom assigner that uses info from all
   1.367 + * the languages in a unified way..  Don't really expect this to happen, 
   1.368 + * but making it possible.
   1.369 + */
   1.370 +#ifdef  MODE__MULTI_LANG
   1.371 +void animationMaster( void *initData, SlaveVP *masterVP )
   1.372 + { 
   1.373 +      //Used while scanning and filling animation slots
   1.374 +   int32           slotIdx, numSlotsFilled;
   1.375 +   AnimSlot       *currSlot, **animSlots;
   1.376 +   SlaveVP        *assignedSlaveVP;  //the slave chosen by the assigner
   1.377 +   
   1.378 +      //Local copies, for performance
   1.379 +   MasterEnv      *masterEnv;
   1.380 +   SlaveAssigner   slaveAssigner;
   1.381 +   RequestHandler  requestHandler;
   1.382 +   PRSemEnv       *semanticEnv;
   1.383 +   int32           thisCoresIdx;
   1.384 +
   1.385 +   //#ifdef  MODE__MULTI_LANG
   1.386 +   SlaveVP        *slave;
   1.387 +   PRProcess      *process;
   1.388 +   PRConstrEnvHolder *constrEnvHolder;
   1.389 +   int32           langMagicNumber;
   1.390 +   //#endif
   1.391 +   
   1.392 +   //======================== Initializations ========================
   1.393 +   masterEnv        = (MasterEnv*)_PRMasterEnv;
   1.394 +   
   1.395 +   thisCoresIdx     = masterVP->coreAnimatedBy;
   1.396 +   animSlots        = masterEnv->allAnimSlots[thisCoresIdx];
   1.397 +
   1.398 +   requestHandler   = masterEnv->requestHandler;
   1.399 +   slaveAssigner    = masterEnv->slaveAssigner;
   1.400 +   semanticEnv      = masterEnv->semanticEnv;
   1.401 +   
   1.402 +      //initialize, for non-multi-lang, non multi-proc case
   1.403 +      // default handler gets put into master env by a registration call by lang
   1.404 +   endTaskHandler   = masterEnv->defaultTaskHandler;
   1.405 +   
   1.406 +      HOLISTIC__Insert_Master_Global_Vars;
   1.407 +   
   1.408 +   //======================== animationMaster ========================
   1.409 +   //Do loop gets requests handled and work assigned to slots..
   1.410 +   // work can either be a task or a resumed slave
   1.411 +   //Having two cases makes this logic complex.. can be finishing either, and 
   1.412 +   // then the next available work may be either.. so really have two distinct
   1.413 +   // loops that are inter-twined.. 
   1.414 +   while(1){
   1.415 +       
   1.416 +      MEAS__Capture_Pre_Master_Point
   1.417 +
   1.418 +      //Scan the animation slots
   1.419 +   numSlotsFilled = 0;
   1.420 +   for( slotIdx = 0; slotIdx < NUM_ANIM_SLOTS; slotIdx++)
   1.421 +    {
   1.422 +      currSlot = animSlots[ slotIdx ];
   1.423 +
   1.424 +         //Check if newly-done slave in slot, which will need request handled
   1.425 +      if( currSlot->workIsDone )
   1.426 +       { currSlot->workIsDone = FALSE;
   1.427 +       
   1.428 +               HOLISTIC__Record_AppResponder_start; //TODO: update to check which process for each slot
   1.429 +               MEAS__startReqHdlr;
   1.430 +               
   1.431 +         
   1.432 +            //process the request made by the slave (held inside slave struc)
   1.433 +         slave = currSlot->slaveAssignedToSlot;
   1.434 +         
   1.435 +            //check if the completed work was a task..
   1.436 +         if( slave->taskMetaInfo->isATask )
   1.437 +          {
   1.438 +             if( slave->reqst->type == TaskEnd ) 
   1.439 +              {    //do task end handler, which is registered separately
   1.440 +                   //note, end hdlr may use semantic data from reqst..
   1.441 +                //#ifdef  MODE__MULTI_LANG
   1.442 +                   //get end-task handler
   1.443 +                //taskEndHandler = lookup( slave->reqst->langMagicNumber, processEnv );
   1.444 +                taskEndHandler = slave->taskMetaInfo->endTaskHandler;
   1.445 +                //#endif
   1.446 +                (*taskEndHandler)( slave, semanticEnv );
   1.447 +                
   1.448 +                goto AssignWork;
   1.449 +              }
   1.450 +             else  //is a task, and just suspended
   1.451 +              {    //turn slot slave into free task slave & make replacement
   1.452 +                if( slave->typeOfVP == TaskSlotSlv ) changeSlvType();
   1.453 +                
   1.454 +                //goto normal slave request handling
   1.455 +                goto SlaveReqHandling; 
   1.456 +              }
   1.457 +          }
   1.458 +         else //is a slave that suspended
   1.459 +          {
   1.460 +          SlaveReqHandling:
   1.461 +            (*requestHandler)( slave, semanticEnv ); //(note: indirect Fn call more efficient when use fewer params, instead re-fetch from slave)
   1.462 +         
   1.463 +               HOLISTIC__Record_AppResponder_end;
   1.464 +               MEAS__endReqHdlr;
   1.465 +               
   1.466 +            goto AssignWork;
   1.467 +          }
   1.468 +       } //if has suspended slave that needs handling
   1.469 +      
   1.470 +         //if slot empty, hand to Assigner to fill with a slave
   1.471 +      if( currSlot->needsSlaveAssigned )
   1.472 +       {    //Call plugin's Assigner to give slot a new slave
   1.473 +               HOLISTIC__Record_Assigner_start;
   1.474 +               
   1.475 +       AssignWork:
   1.476 +     
   1.477 +         assignedSlaveVP = assignWork( semanticEnv, currSlot );
   1.478 +       
   1.479 +            //put the chosen slave into slot, and adjust flags and state
   1.480 +         if( assignedSlaveVP != NULL )
   1.481 +          { currSlot->slaveAssignedToSlot = assignedSlaveVP;
   1.482 +            assignedSlaveVP->animSlotAssignedTo = currSlot;
   1.483 +            currSlot->needsSlaveAssigned  = FALSE;
   1.484 +            numSlotsFilled               += 1;
   1.485 +          }
   1.486 +         else
   1.487 +          {
   1.488 +            currSlot->needsSlaveAssigned  = TRUE; //local write
   1.489 +          }
   1.490 +               HOLISTIC__Record_Assigner_end;
   1.491 +       }//if slot needs slave assigned
   1.492 +    }//for( slotIdx..
   1.493 +
   1.494 +         MEAS__Capture_Post_Master_Point;
   1.495 +   
   1.496 +   masterSwitchToCoreCtlr( masterVP ); //returns when ctlr switches back to master
   1.497 +   flushRegisters();
   1.498 +   }//while(1) 
   1.499 + }
   1.500 +#endif //MODE__MULTI_LANG
   1.501 +
   1.502 +
   1.503 +
   1.504 +//This is the master when both multi-lang and multi-process modes are turned on
   1.505 +//#ifdef MODE__MULTI_LANG
   1.506 +//#ifdef MODE__MULTI_PROCESS
   1.507 +void animationMaster( void *initData, SlaveVP *masterVP )
   1.508 + { 
   1.509 +      //Used while scanning and filling animation slots
   1.510 +   int32           slotIdx, numSlotsFilled;
   1.511 +   AnimSlot       *currSlot, **animSlots;
   1.512 +   SlaveVP        *assignedSlaveVP;  //the slave chosen by the assigner
   1.513 +   
   1.514 +      //Local copies, for performance
   1.515 +   MasterEnv      *masterEnv;
   1.516 +   SlaveAssigner   slaveAssigner;
   1.517 +   RequestHandler  requestHandler;
   1.518 +   PRSemEnv       *semanticEnv;
   1.519 +   int32           thisCoresIdx;
   1.520 +
   1.521 +   SlaveVP        *slave;
   1.522 +   PRProcess      *process;
   1.523 +   PRConstrEnvHolder *constrEnvHolder;
   1.524 +   int32           langMagicNumber;
   1.525 +   
   1.526 +   //======================== Initializations ========================
   1.527 +   masterEnv        = (MasterEnv*)_PRMasterEnv;
   1.528 +   
   1.529 +   thisCoresIdx     = masterVP->coreAnimatedBy;
   1.530 +   animSlots        = masterEnv->allAnimSlots[thisCoresIdx];
   1.531 +
   1.532 +   requestHandler   = masterEnv->requestHandler;
   1.533 +   slaveAssigner    = masterEnv->slaveAssigner;
   1.534 +   semanticEnv      = masterEnv->semanticEnv;
   1.535 +   
   1.536 +      //initialize, for non-multi-lang, non multi-proc case
   1.537 +      // default handler gets put into master env by a registration call by lang
   1.538 +   endTaskHandler   = masterEnv->defaultTaskHandler;
   1.539 +   
   1.540 +      HOLISTIC__Insert_Master_Global_Vars;
   1.541 +   
   1.542 +   //======================== animationMaster ========================
   1.543 +   //Do loop gets requests handled and work assigned to slots..
   1.544 +   // work can either be a task or a resumed slave
   1.545 +   //Having two cases makes this logic complex.. can be finishing either, and 
   1.546 +   // then the next available work may be either.. so really have two distinct
   1.547 +   // loops that are inter-twined.. 
   1.548 +   while(1){
   1.549 +       
   1.550 +      MEAS__Capture_Pre_Master_Point
   1.551 +
   1.552 +      //Scan the animation slots
   1.553 +   numSlotsFilled = 0;
   1.554 +   for( slotIdx = 0; slotIdx < NUM_ANIM_SLOTS; slotIdx++)
   1.555 +    {
   1.556 +      currSlot = animSlots[ slotIdx ];
   1.557 +
   1.558 +         //Check if newly-done slave in slot, which will need request handled
   1.559 +      if( currSlot->workIsDone )
   1.560 +       { currSlot->workIsDone = FALSE;
   1.561 +       
   1.562 +               HOLISTIC__Record_AppResponder_start; //TODO: update to check which process for each slot
   1.563 +               MEAS__startReqHdlr;
   1.564 +               
   1.565 +         
   1.566 +            //process the request made by the slave (held inside slave struc)
   1.567 +         slave = currSlot->slaveAssignedToSlot;
   1.568 +         
   1.569 +            //check if the completed work was a task..
   1.570 +         if( slave->taskMetaInfo->isATask )
   1.571 +          {
   1.572 +             if( slave->reqst->type == TaskEnd ) 
   1.573 +              {    //do task end handler, which is registered separately
   1.574 +                   //note, end hdlr may use semantic data from reqst..
   1.575 +                   //get end-task handler
   1.576 +                //taskEndHandler = lookup( slave->reqst->langMagicNumber, processEnv );
   1.577 +                taskEndHandler = slave->taskMetaInfo->endTaskHandler;
   1.578 +                
   1.579 +                (*taskEndHandler)( slave, semanticEnv );
   1.580 +                
   1.581 +                goto AssignWork;
   1.582 +              }
   1.583 +             else  //is a task, and just suspended
   1.584 +              {    //turn slot slave into free task slave & make replacement
   1.585 +                if( slave->typeOfVP == TaskSlotSlv ) changeSlvType();
   1.586 +                
   1.587 +                //goto normal slave request handling
   1.588 +                goto SlaveReqHandling; 
   1.589 +              }
   1.590 +          }
   1.591 +         else //is a slave that suspended
   1.592 +          {
   1.593 +             
   1.594 +          SlaveReqHandling:
   1.595 +            (*requestHandler)( slave, semanticEnv ); //(note: indirect Fn call more efficient when use fewer params, instead re-fetch from slave)
   1.596 +         
   1.597 +               HOLISTIC__Record_AppResponder_end;
   1.598 +               MEAS__endReqHdlr;
   1.599 +               
   1.600 +            goto AssignWork;
   1.601 +          }
   1.602 +       } //if has suspended slave that needs handling
   1.603 +      
   1.604 +         //if slot empty, hand to Assigner to fill with a slave
   1.605 +      if( currSlot->needsSlaveAssigned )
   1.606 +       {    //Scan sem environs, looking for one with ready work.
   1.607 +            // call the Assigner for that sem Env, to give slot a new slave
   1.608 +               HOLISTIC__Record_Assigner_start;
   1.609 +               
   1.610 +       AssignWork:
   1.611 +     
   1.612 +         assignedSlaveVP = assignWork( semanticEnv, currSlot );
   1.613 +       
   1.614 +            //put the chosen slave into slot, and adjust flags and state
   1.615 +         if( assignedSlaveVP != NULL )
   1.616 +          { currSlot->slaveAssignedToSlot = assignedSlaveVP;
   1.617 +            assignedSlaveVP->animSlotAssignedTo = currSlot;
   1.618 +            currSlot->needsSlaveAssigned  = FALSE;
   1.619 +            numSlotsFilled               += 1;
   1.620 +          }
   1.621 +         else
   1.622 +          {
   1.623 +            currSlot->needsSlaveAssigned  = TRUE; //local write
   1.624 +          }
   1.625 +               HOLISTIC__Record_Assigner_end;
   1.626 +       }//if slot needs slave assigned
   1.627 +    }//for( slotIdx..
   1.628 +
   1.629 +         MEAS__Capture_Post_Master_Point;
   1.630 +   
   1.631 +   masterSwitchToCoreCtlr( masterVP ); //returns when ctlr switches back to master
   1.632 +   flushRegisters();
   1.633 +   }//while(1) 
   1.634 + }
   1.635 +#endif  //MODE__MULTI_LANG
   1.636 +#endif  //MODE__MULTI_PROCESS
   1.637 +
   1.638 +
   1.639 +/*This does three things:
   1.640 + * 1) ask for a slave ready to resume
   1.641 + * 2) if none, then ask for a task, and assign to the slot slave
   1.642 + * 3) if none, then prune former task slaves waiting to be recycled.
   1.643 + *
   1.644 + *Each semantic env has two separate assigners, and keeps its own work in
   1.645 + * its own structures.  The master, here, searches through the semantic
   1.646 + * environs, takes the first that has work available, and whatever that
   1.647 + * env's assigner returns is assigned to the slot.
   1.648 + *There is also an override assigner, because static analysis tools know
   1.649 + * which languages are grouped together, and the override enables them to
   1.650 + * generate a custom assigner that uses info from all the languages in a
   1.651 + * unified way.  Don't really expect this to happen, but making it possible.
   1.652 + */
   1.653 +inline SlaveVP *
   1.654 +assignWork( PRProcessEnv *processEnv, AnimSlot *slot )
   1.655 + { SlaveVP     *returnSlv;
   1.656 +   //VSsSemEnv   *semEnv;
   1.657 +   //VSsSemData  *semData;
   1.658 +   int32        coreNum, slotNum;
   1.659 +   PRTaskMetaInfo *newTaskStub;
   1.660 +   SlaveVP     *freeTaskSlv;
   1.661 +
   1.662 +   
   1.663 +      //master has to handle slot slaves.. so either assigner returns
   1.664 +      // taskMetaInfo or else two assigners, one for slaves, other for tasks..     
   1.665 +   semEnvs = processEnv->semEnvs;
   1.666 +   numEnvs = processEnv->numSemEnvs;
   1.667 +   for( envIdx = 0; envIdx < numEnvs; envIdx++ )
   1.668 +    { semEnv = semEnvs[envIdx];
   1.669 +      if( semEnv->hasWork )
   1.670 +       { assigner = semEnv->assigner; 
   1.671 +         retTaskMetaInfo = (*assigner)( semEnv, slot );
   1.672 +         
   1.673 +         return retTaskMetaInfo; //quit, have work
   1.674 +       }
   1.675 +    }
   1.676 +   
   1.677 +   coreNum = slot->coreSlotIsOn;
   1.678 +   slotNum = slot->slotIdx;
   1.679 + 
   1.680 +      //first try to get a ready slave
   1.681 +   returnSlv = getReadySlave();
   1.682 +
   1.683 +   if( returnSlv != NULL )
   1.684 +    { returnSlv->coreAnimatedBy   = coreNum;
   1.685 +    
   1.686 +         //have work, so reset Done flag (when work generated on other core)
   1.687 +      if( processEnv->coreIsDone[coreNum] == TRUE ) //reads are higher perf
   1.688 +         processEnv->coreIsDone[coreNum] = FALSE;   //don't just write always
   1.689 +    
   1.690 +      goto ReturnTheSlv;
   1.691 +    }
   1.692 +   
   1.693 +      //were no slaves, so try to get a ready task.. 
   1.694 +   newTaskStub = getTaskStub();
   1.695 +   
   1.696 +   if( newTaskStub != NULL )
   1.697 +    { 
   1.698 +         //get the slot slave to assign the task to..
   1.699 +      returnSlv = processEnv->slotTaskSlvs[coreNum][slotNum];
   1.700 +
   1.701 +         //point slave to task's function, and mark slave as having task
   1.702 +      PR_int__reset_slaveVP_to_TopLvlFn( returnSlv, 
   1.703 +                          newTaskStub->taskType->fn, newTaskStub->args );
   1.704 +      returnSlv->taskStub          = newTaskStub;
   1.705 +      newTaskStub->slaveAssignedTo = returnSlv;
   1.706 +      returnSlv->needsTaskAssigned = FALSE;  //slot slave is a "Task" slave type
   1.707 +      
   1.708 +         //have work, so reset Done flag, if was set
   1.709 +      if( processEnv->coreIsDone[coreNum] == TRUE ) //reads are higher perf
   1.710 +         processEnv->coreIsDone[coreNum] = FALSE;   //don't just write always
   1.711 +      
   1.712 +      goto ReturnTheSlv;
   1.713 +    }
   1.714 +   else
   1.715 +    {    //no task, so prune the recycle pool of free task slaves
   1.716 +      freeTaskSlv = readPrivQ( processEnv->freeTaskSlvRecycleQ );
   1.717 +      if( freeTaskSlv != NULL )
   1.718 +       {    //delete to bound the num extras, and deliver shutdown cond
   1.719 +         handleDissipate( freeTaskSlv, processEnv );
   1.720 +            //then return NULL
   1.721 +         returnSlv = NULL;
   1.722 +         
   1.723 +         goto ReturnTheSlv;
   1.724 +       }
   1.725 +      else
   1.726 +       { //candidate for shutdown.. if all extras dissipated, and no tasks
   1.727 +         // and no ready to resume slaves, then no way to generate
   1.728 +         // more tasks (on this core -- other core might have task still)
   1.729 +         if( processEnv->numLiveExtraTaskSlvs == 0 && 
   1.730 +             processEnv->numLiveThreadSlvs == 0 )
   1.731 +          { //This core sees no way to generate more tasks, so say it
   1.732 +            if( processEnv->coreIsDone[coreNum] == FALSE )
   1.733 +             { processEnv->numCoresDone += 1;
   1.734 +               processEnv->coreIsDone[coreNum] = TRUE;
   1.735 +               #ifdef DEBUG__TURN_ON_SEQUENTIAL_MODE
   1.736 +               processEnv->shutdownInitiated = TRUE;
   1.737 +               
   1.738 +               #else
   1.739 +               if( processEnv->numCoresDone == NUM_CORES )
   1.740 +                { //means no cores have work, and none can generate more
   1.741 +                  processEnv->shutdownInitiated = TRUE;
   1.742 +                }
   1.743 +               #endif
   1.744 +             }
   1.745 +          }
   1.746 +            //check if shutdown has been initiated by this or other core
   1.747 +         if(processEnv->shutdownInitiated) 
   1.748 +          { returnSlv = PR_SS__create_shutdown_slave();
   1.749 +          }
   1.750 +         else
   1.751 +            returnSlv = NULL;
   1.752 +
   1.753 +         goto ReturnTheSlv; //don't need, but completes pattern
   1.754 +       } //if( freeTaskSlv != NULL )
   1.755 +    } //if( newTaskStub == NULL )
   1.756 +   //outcome: 1)slave was just pointed to task, 2)no tasks, so slave NULL
   1.757 + 
   1.758 +
   1.759 + ReturnTheSlv:  //All paths goto here.. to provide single point for holistic..
   1.760 +
   1.761 +   #ifdef HOLISTIC__TURN_ON_OBSERVE_UCC
   1.762 +   if( returnSlv == NULL )
   1.763 +    { returnSlv = processEnv->idleSlv[coreNum][slotNum]; 
   1.764 +    
   1.765 +         //things that would normally happen in resume(), but idle VPs
   1.766 +         // never go there
   1.767 +      returnSlv->assignCount++; //gives each idle unit a unique ID
   1.768 +      Unit newU;
   1.769 +      newU.vp = returnSlv->slaveID;
   1.770 +      newU.task = returnSlv->assignCount;
   1.771 +      addToListOfArrays(Unit,newU,processEnv->unitList);
   1.772 +
   1.773 +      if (returnSlv->assignCount > 1) //make a dependency from prev idle unit
   1.774 +       { Dependency newD;             // to this one
   1.775 +         newD.from_vp = returnSlv->slaveID;
   1.776 +         newD.from_task = returnSlv->assignCount - 1;
   1.777 +         newD.to_vp = returnSlv->slaveID;
   1.778 +         newD.to_task = returnSlv->assignCount;
   1.779 +         addToListOfArrays(Dependency, newD ,processEnv->ctlDependenciesList);  
   1.780 +       }
   1.781 +    }
   1.782 +   else //have a slave that will be assigned to the slot
   1.783 +    { //assignSlv->numTimesAssigned++;
   1.784 +         //get previous occupant of the slot
   1.785 +      Unit prev_in_slot = 
   1.786 +         processEnv->last_in_slot[coreNum * NUM_ANIM_SLOTS + slotNum];
   1.787 +      if(prev_in_slot.vp != 0) //if not first slave in slot, make dependency
   1.788 +       { Dependency newD;      // is a hardware dependency
   1.789 +         newD.from_vp = prev_in_slot.vp;
   1.790 +         newD.from_task = prev_in_slot.task;
   1.791 +         newD.to_vp = returnSlv->slaveID;
   1.792 +         newD.to_task = returnSlv->assignCount;
   1.793 +         addToListOfArrays(Dependency,newD,processEnv->hwArcs);   
   1.794 +       }
   1.795 +      prev_in_slot.vp = returnSlv->slaveID; //make new slave the new previous
   1.796 +      prev_in_slot.task = returnSlv->assignCount;
   1.797 +      processEnv->last_in_slot[coreNum * NUM_ANIM_SLOTS + slotNum] =
   1.798 +         prev_in_slot;        
   1.799 +    }
   1.800 +   #endif
   1.801 +
   1.802 +   return( returnSlv );
   1.803 + }
   1.804 +
   1.805 +      
   1.806 +//=================================================================
   1.807 +         //#else  //is MODE__MULTI_LANG
   1.808 +            //For multi-lang mode, first, get the constraint-env holder out of
   1.809 +            // the process, which is in the slave.
   1.810 +            //Second, get the magic number out of the request, use it to look up
   1.811 +            // the constraint Env within the constraint-env holder.
   1.812 +            //Then get the request handler out of the constr env
   1.813 +         constrEnvHolder = slave->process->constrEnvHolder;
   1.814 +         reqst = slave->request;
   1.815 +         langMagicNumber = reqst->langMagicNumber;
   1.816 +         semanticEnv = lookup( langMagicNumber, constrEnvHolder ); //a macro
   1.817 +         if( slave->reqst->type == taskEnd ) //end-task is special
   1.818 +          {    //need to know what lang's task ended
   1.819 +            taskEndHandler = semanticEnv->taskEndHandler;
   1.820 +            (*taskEndHandler)( slave, reqst, semanticEnv ); //can put semantic data into task end reqst, for continuation, etc
   1.821 +               //this is a slot slave, get a new task for it
   1.822 +            if( !existsOverrideAssigner )//if exists, is set above, before loop
   1.823 +             {    //search for task assigner that has work
   1.824 +               for( a = 0; a < num_assigners; a++ )
   1.825 +                { if( taskAssigners[a]->hasWork )
   1.826 +                   { newTaskAssigner = taskAssigners[a];
   1.827 +                     (*newTaskAssigner)( slave, semanticEnv );
   1.828 +                     goto GotTask;
   1.829 +                   }
   1.830 +                }
   1.831 +               goto NoTasks;
   1.832 +             }
   1.833 +            
   1.834 +           GotTask:
   1.835 +            continue; //have work, so do next iter of loop, don't call slave assigner
   1.836 +          }
   1.837 +         if( slave->typeOfVP == taskSlotSlv ) changeSlvType();//is suspended task
   1.838 +            //now do normal suspended slave request handler
   1.839 +         requestHandler = semanticEnv->requestHandler;
   1.840 +         //#endif
   1.841 +
   1.842 +         
   1.843 +       }
   1.844 +         //If make it here, then was no task for this slot
   1.845 +         //slot empty, hand to Assigner to fill with a slave
   1.846 +      if( currSlot->needsSlaveAssigned )
   1.847 +       {    //Call plugin's Assigner to give slot a new slave
   1.848 +               HOLISTIC__Record_Assigner_start;
   1.849 +               
   1.850 +         //#ifdef  MODE__MULTI_LANG
   1.851 +        NoTasks:
   1.852 +            //First, choose an Assigner..
   1.853 +            //There are several Assigners, one for each langlet.. they all
   1.854 +            // indicate whether they have work available.. just pick the first
   1.855 +            // one that has work..  Or, if there's a Unified Assigner, call
   1.856 +            // that one..  So, go down array, checking..
   1.857 +         if( !existsOverrideAssigner ) 
   1.858 +          { for( a = 0; a < num_assigners; a++ )
   1.859 +             { if( assigners[a]->hasWork )
   1.860 +                { slaveAssigner = assigners[a];
   1.861 +                  goto GotAssigner;
   1.862 +                }
   1.863 +             }
   1.864 +            //no work, so just continue to next iter of scan loop
   1.865 +            continue;
   1.866 +          }
   1.867 +         //when exists override, the assigner is set, once, above, so do nothing
   1.868 +        GotAssigner:
   1.869 +         //#endif
   1.870 +        
   1.871 +         assignedSlaveVP =
   1.872 +          (*slaveAssigner)( semanticEnv, currSlot );
   1.873 +         
   1.874 +            //put the chosen slave into slot, and adjust flags and state
   1.875 +         if( assignedSlaveVP != NULL )
   1.876 +          { currSlot->slaveAssignedToSlot = assignedSlaveVP;
   1.877 +            assignedSlaveVP->animSlotAssignedTo = currSlot;
   1.878 +            currSlot->needsSlaveAssigned  = FALSE;
   1.879 +            numSlotsFilled               += 1;
   1.880 +            
   1.881 +            HOLISTIC__Record_Assigner_end;
   1.882 +          }
   1.883 +       }//if slot needs slave assigned
   1.884 +    }//for( slotIdx..
   1.885 +
   1.886 +         MEAS__Capture_Post_Master_Point;
   1.887 +   
   1.888 +   masterSwitchToCoreCtlr( masterVP );
   1.889 +   flushRegisters();
   1.890 +         DEBUG__printf(FALSE,"came back after switch to core -- so lock released!");
   1.891 +   }//while(1) 
   1.892 + }
   1.893 +
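Reviewer sketch, not part of the changeset: the comment on assignWork describes the master scanning the semantic environments and taking work from the first one that has any. A minimal, self-contained illustration of that selection order follows. Every type and name in it (`Slave`, `SemEnv`, `take_ready`, `assign_from_first_env_with_work`) is a hypothetical stand-in, not one of the PR structs, and each environment holds at most one ready slave to keep it short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for SlaveVP and the semantic environment. */
typedef struct Slave { int id; } Slave;
typedef struct SemEnv SemEnv;
typedef Slave *(*AssignerFn)( SemEnv *env );

struct SemEnv
 { int        hasWork;   /* set by the language when work is ready     */
   AssignerFn assigner;  /* the language's own assigner                */
   Slave     *ready;     /* one ready slave, to keep the sketch short  */
 };

/* Example assigner: hands out the single ready slave and clears the flag. */
static Slave *take_ready( SemEnv *env )
 { Slave *slv   = env->ready;
   env->ready   = NULL;
   env->hasWork = 0;
   return slv;
 }

/* Return work from the first environment that advertises it, else NULL. */
static Slave *assign_from_first_env_with_work( SemEnv **envs, int numEnvs )
 { int envIdx;
   for( envIdx = 0; envIdx < numEnvs; envIdx++ )
    { SemEnv *env = envs[envIdx];
      if( env->hasWork )
         return (*env->assigner)( env );  /* quit, have work */
    }
   return NULL;  /* no environment had work */
 }
```

Taking the first environment with work is simple, but it favors environments at low indices. Whether that matters depends on how many langlets are active at once.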
     2.1 --- a/CoreController.c	Mon Sep 03 03:34:54 2012 -0700
     2.2 +++ b/CoreController.c	Wed Sep 19 23:12:44 2012 -0700
     2.3 @@ -5,7 +5,7 @@
     2.4   */
     2.5  
     2.6  
     2.7 -#include "VMS.h"
     2.8 +#include "PR.h"
     2.9  
    2.10  #include <stdlib.h>
    2.11  #include <stdio.h>
    2.12 @@ -55,9 +55,9 @@
    2.13   * amortize the overhead of switching to the master VP and running it.  With
    2.14   * multiple animation slots, the time to switch-to-master and the code in
    2.15   * the animation master is divided by the number of animation slots.
    2.16 - *The core controller and animation slots are not fundamental parts of VMS,
    2.17 + *The core controller and animation slots are not fundamental parts of PR,
    2.18   * but rather optimizations put into the shared-semantic-state version of
    2.19 - * VMS.  Other versions of VMS will not have a core controller nor scheduling
    2.20 + * PR.  Other versions of PR will not have a core controller nor scheduling
    2.21   * slots.
    2.22   * 
    2.23   *The core controller "owns" the physical core, in effect, and is the 
    2.24 @@ -92,13 +92,13 @@
    2.25     thisCoresIdx = thisCoresThdParams->coreNum;
    2.26  
    2.27        //Assembly that saves addr of label of return instr -- label in assmbly
    2.28 -   recordCoreCtlrReturnLabelAddr((void**)&(_VMSMasterEnv->coreCtlrReturnPt));
    2.29 +   recordCoreCtlrReturnLabelAddr((void**)&(_PRMasterEnv->coreCtlrReturnPt));
    2.30  
    2.31 -   animSlots = _VMSMasterEnv->allAnimSlots[thisCoresIdx];
    2.32 +   animSlots = _PRMasterEnv->allAnimSlots[thisCoresIdx];
    2.33     currSlotIdx = 0; //start at slot 0, go up until one empty, then do master
    2.34     numRepetitionsWithNoWork = 0;
    2.35 -   addrOfMasterLock = &(_VMSMasterEnv->masterLock);
    2.36 -   thisCoresMasterVP = _VMSMasterEnv->masterVPs[thisCoresIdx];
    2.37 +   addrOfMasterLock = &(_PRMasterEnv->masterLock);
    2.38 +   thisCoresMasterVP = _PRMasterEnv->masterVPs[thisCoresIdx];
    2.39     
    2.40     //==================== pthread related stuff ======================
    2.41        //pin the pthread to the core -- takes away Linux control
    2.42 @@ -113,7 +113,7 @@
    2.43  
    2.44        //make sure the controllers all start at same time, by making them wait
    2.45     pthread_mutex_lock(  &suspendLock );
    2.46 -   while( !(_VMSMasterEnv->setupComplete) )
    2.47 +   while( !(_PRMasterEnv->setupComplete) )
    2.48      { pthread_cond_wait( &suspendCond, &suspendLock );
    2.49      }
    2.50     pthread_mutex_unlock( &suspendLock );
    2.51 @@ -209,7 +209,7 @@
    2.52      }//while(1)
    2.53   }
    2.54  
    2.55 -/*Shutdown of VMS involves several steps, of which this is the last.  This
    2.56 +/*Shutdown of PR involves several steps, of which this is the last.  This
    2.57   * function is jumped to from the asmTerminateCoreCtrl, which is in turn
    2.58   * called from endOSThreadFn, which is the top-level-fn of the shutdown
    2.59   * slaves.
    2.60 @@ -218,18 +218,18 @@
    2.61  terminateCoreCtlr(SlaveVP *currSlv)
    2.62   {
    2.63     //first, free shutdown Slv that jumped here, then end the pthread
    2.64 -   VMS_int__dissipate_slaveVP( currSlv );
    2.65 +   PR_int__dissipate_slaveVP( currSlv );
    2.66     pthread_exit( NULL );
    2.67   }
    2.68  
    2.69  inline uint32_t
    2.70  randomNumber()
    2.71   {
    2.72 -	_VMSMasterEnv->seed1 = (uint32)(36969 * (_VMSMasterEnv->seed1 & 65535) + 
    2.73 -                                   (_VMSMasterEnv->seed1 >> 16) );
    2.74 -	_VMSMasterEnv->seed2 = (uint32)(18000 * (_VMSMasterEnv->seed2 & 65535) + 
    2.75 -                                   (_VMSMasterEnv->seed2 >> 16) );
    2.76 -	return (_VMSMasterEnv->seed1 << 16) + _VMSMasterEnv->seed2;
    2.77 +	_PRMasterEnv->seed1 = (uint32)(36969 * (_PRMasterEnv->seed1 & 65535) + 
    2.78 +                                   (_PRMasterEnv->seed1 >> 16) );
    2.79 +	_PRMasterEnv->seed2 = (uint32)(18000 * (_PRMasterEnv->seed2 & 65535) + 
    2.80 +                                   (_PRMasterEnv->seed2 >> 16) );
    2.81 +	return (_PRMasterEnv->seed1 << 16) + _PRMasterEnv->seed2;
    2.82   }
    2.83  
    2.84  
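Reviewer sketch, not part of the changeset: randomNumber() above reads like the classic two-stream multiply-with-carry generator, with multipliers 36969 and 18000. The version below is the same arithmetic on caller-owned state instead of the seeds held in `_PRMasterEnv`. `MwcState` and `mwc_next` are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-caller state; the changeset keeps both seeds in
 * _PRMasterEnv instead. */
typedef struct { uint32_t seed1, seed2; } MwcState;

/* One step of the two-stream multiply-with-carry generator.
 * Low 16 bits of each seed are the value, high 16 bits are the carry. */
static uint32_t mwc_next( MwcState *s )
 { s->seed1 = 36969u * (s->seed1 & 65535u) + (s->seed1 >> 16);
   s->seed2 = 18000u * (s->seed2 & 65535u) + (s->seed2 >> 16);
   return (s->seed1 << 16) + s->seed2;
 }
```

A seed of zero is a fixed point of each stream, so the seeds need nonzero initial values for the output to vary.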
    2.85 @@ -292,14 +292,14 @@
    2.86     
    2.87     //===============  Initializations ===================
    2.88     thisCoresIdx = 0; //sequential version
    2.89 -   animSlots = _VMSMasterEnv->allAnimSlots[thisCoresIdx];
    2.90 +   animSlots = _PRMasterEnv->allAnimSlots[thisCoresIdx];
    2.91     currSlotIdx = 0; //start at slot 0, go up until one empty, then do master
    2.92     numRepetitionsWithNoWork = 0;
    2.93 -   addrOfMasterLock = &(_VMSMasterEnv->masterLock);
    2.94 -   thisCoresMasterVP = _VMSMasterEnv->masterVPs[thisCoresIdx];
    2.95 +   addrOfMasterLock = &(_PRMasterEnv->masterLock);
    2.96 +   thisCoresMasterVP = _PRMasterEnv->masterVPs[thisCoresIdx];
    2.97     
    2.98        //Assembly that saves addr of label of return instr -- label in assmbly
    2.99 -   recordCoreCtlrReturnLabelAddr((void**)&(_VMSMasterEnv->coreCtlrReturnPt));
   2.100 +   recordCoreCtlrReturnLabelAddr((void**)&(_PRMasterEnv->coreCtlrReturnPt));
   2.101  
   2.102     
   2.103     //====================== The Core Controller ======================
     3.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     3.2 +++ b/Defines/MEAS__macros_to_be_moved_to_langs.h	Wed Sep 19 23:12:44 2012 -0700
     3.3 @@ -0,0 +1,57 @@
     3.4 +/*
     3.5 + *  Copyright 2009 OpenSourceStewardshipFoundation.org
     3.6 + *  Licensed under GNU General Public License version 2
     3.7 + *
     3.8 + * Author: seanhalle@yahoo.com
     3.9 + * 
    3.10 + */
    3.11 +
    3.12 +#ifndef  _PR_LANG_SPEC_DEFS_H
    3.13 +#define	_PR_LANG_SPEC_DEFS_H
    3.14 +
    3.15 +
    3.16 +
    3.17 +//===================  Language-specific Measurement Stuff ===================
    3.18 +//
    3.19 +//TODO:  move these into the language implementation directories
    3.20 +//
    3.21 +
    3.22 +
    3.23 +//===========================================================================
    3.24 +//VCilk
    3.25 +
    3.26 +#ifdef VCILK
    3.27 +
    3.28 +#define spawnHistIdx      1 //note: starts at 1
    3.29 +#define syncHistIdx       2
    3.30 +
    3.31 +#define MEAS__Make_Meas_Hists_for_Language() \
    3.32 +   _PRMasterEnv->measHistsInfo = \
    3.33 +          makePrivDynArrayOfSize( (void***)&(_PRMasterEnv->measHists), 200); \
    3.34 +    makeAMeasHist( spawnHistIdx,      "Spawn",        50, 0, 200 ) \
    3.35 +    makeAMeasHist( syncHistIdx,       "Sync",         50, 0, 200 )
    3.36 +
    3.37 +
    3.38 +#define Meas_startSpawn \
    3.39 +    int32 startStamp, endStamp; \
    3.40 +    saveLowTimeStampCountInto( startStamp ); \
    3.41 +
    3.42 +#define Meas_endSpawn \
    3.43 +    saveLowTimeStampCountInto( endStamp ); \
    3.44 +    addIntervalToHist( startStamp, endStamp, \
    3.45 +                             _PRMasterEnv->measHists[ spawnHistIdx ] );
    3.46 +
    3.47 +#define Meas_startSync \
    3.48 +    int32 startStamp, endStamp; \
    3.49 +    saveLowTimeStampCountInto( startStamp ); \
    3.50 +
    3.51 +#define Meas_endSync \
    3.52 +    saveLowTimeStampCountInto( endStamp ); \
    3.53 +    addIntervalToHist( startStamp, endStamp, \
    3.54 +                             _PRMasterEnv->measHists[ syncHistIdx ] );
    3.55 +#endif
    3.56 +
    3.57 +//===========================================================================
    3.58 +
     3.59 +#endif	/* _PR_LANG_SPEC_DEFS_H */
    3.60 +
     4.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     4.2 +++ b/Defines/PR_defs.h	Wed Sep 19 23:12:44 2012 -0700
     4.3 @@ -0,0 +1,43 @@
     4.4 +/*
     4.5 + *  Copyright 2009 OpenSourceStewardshipFoundation.org
     4.6 + *  Licensed under GNU General Public License version 2
     4.7 + *
     4.8 + * Author: seanhalle@yahoo.com
     4.9 + * 
    4.10 + */
    4.11 +
    4.12 +#ifndef  _PR_DEFS_MAIN_H
    4.13 +#define	_PR_DEFS_MAIN_H
    4.14 +#define _GNU_SOURCE
    4.15 +
    4.16 +//===========================  PR-wide defs  ===============================
    4.17 +
    4.18 +#define SUCCESS 0
    4.19 +
     4.20 +   //only after macro-expansion are the defs of writePrivQ, and so on, looked up
    4.21 +   // so these defs can be at the top, and writePrivQ defined later on..
    4.22 +#define writePRQ     writePrivQ
    4.23 +#define readPRQ      readPrivQ
    4.24 +#define makePRQ      makePrivQ
    4.25 +#define numInPRQ     numInPrivQ
    4.26 +#define PRQueueStruc PrivQueueStruc
    4.27 +
    4.28 +
    4.29 +/*The language should re-define this, but need a default in case it doesn't*/
    4.30 +#ifndef _LANG_NAME_
    4.31 +#define _LANG_NAME_ ""
    4.32 +#endif
    4.33 +
    4.34 +//======================  Hardware Constants ============================
    4.35 +#include "PR_defs__HW_constants.h"
    4.36 +
    4.37 +//======================  Macros  ======================
    4.38 +   //for turning macros and other PR features on and off
    4.39 +#include "PR_defs__turn_on_and_off.h"
    4.40 +
    4.41 +#include "../Services_Offered_by_PR/Debugging/DEBUG__macros.h"
    4.42 +#include "../Services_Offered_by_PR/Measurement_and_Stats/MEAS__macros.h"
    4.43 +
    4.44 +//===========================================================================
     4.45 +#endif	/* _PR_DEFS_MAIN_H */
    4.46 +
     5.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     5.2 +++ b/Defines/PR_defs__HW_constants.h	Wed Sep 19 23:12:44 2012 -0700
     5.3 @@ -0,0 +1,54 @@
     5.4 +/*
     5.5 + *  Copyright 2012 OpenSourceStewardshipFoundation
     5.6 + *  Licensed under BSD
     5.7 + *
     5.8 + * Author: seanhalle@yahoo.com
     5.9 + * 
    5.10 + */
    5.11 +
    5.12 +#ifndef _PR_HW_SPEC_DEFS_H
    5.13 +#define	_PR_HW_SPEC_DEFS_H
    5.14 +#define _GNU_SOURCE
    5.15 +
    5.16 +
    5.17 +//=========================  Hardware related Constants =====================
    5.18 +   //This value is the number of hardware threads in the shared memory
    5.19 +   // machine
    5.20 +#define NUM_CORES        4
    5.21 +
    5.22 +   // tradeoff amortizing master fixed overhead vs imbalance potential
    5.23 +   // when work-stealing, can make bigger, at risk of losing cache affinity
    5.24 +#define NUM_ANIM_SLOTS  1
    5.25 +
    5.26 +   //These are for backoff inside core-loop, which reduces lock contention
    5.27 +#define NUM_REPS_W_NO_WORK_BEFORE_YIELD      10
    5.28 +#define NUM_REPS_W_NO_WORK_BEFORE_BACKOFF    2
    5.29 +#define MASTERLOCK_RETRIES_BEFORE_YIELD      100
    5.30 +#define NUM_TRIES_BEFORE_DO_BACKOFF          10
    5.31 +#define GET_LOCK_BACKOFF_WEIGHT 100
    5.32 +   
    5.33 +   // stack size in virtual processors created
    5.34 +#define VIRT_PROCR_STACK_SIZE 0x8000 /* 32K */
    5.35 +
    5.36 +   // memory for PR_int__malloc
    5.37 +#define MALLOC_ADDITIONAL_MEM_FROM_OS_SIZE 0x8000000 /* 128M */
    5.38 +
    5.39 +   //Frequency of TS counts -- have to do tests to verify
    5.40 +   //NOTE: turn off (in BIOS)  TURBO-BOOST and SPEED-STEP else won't be const
    5.41 +#define TSCOUNT_FREQ 3180000000
    5.42 +
    5.43 +#define CACHE_LINE_SZ  256
    5.44 +#define PAGE_SIZE     4096
    5.45 +
    5.46 +//To prevent false-sharing, aligns a variable to a cache-line boundary.
    5.47 +//No need to use for local vars because those are never shared between cores
    5.48 +#define __align_to_cacheline__ __attribute__ ((aligned(CACHE_LINE_SZ)))
    5.49 +
     5.50 +//Aligns a pointer to a cache-line boundary. The memory area has to contain
     5.51 +//at least CACHE_LINE_SZ bytes more than needed
    5.52 +#define __align_address(ptr) ((void*)(((uintptr_t)(ptr))&((uintptr_t)(~0x0FF))))
    5.53 +
    5.54 +//===========================================================================
    5.55 +
     5.56 +#endif	/* _PR_HW_SPEC_DEFS_H */
    5.57 +
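Reviewer sketch, not part of the changeset: the `__align_address` macro above masks off the low eight bits, so it rounds an address down to the previous 256-byte boundary. The accompanying comment asks for CACHE_LINE_SZ spare bytes, which is what a round-up would need. The two helpers below show both directions side by side. `align_down` and `align_up` are hypothetical names, and which one the callers actually need is an assumption that would have to be checked against the call sites.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE_SZ 256  /* same value as in PR_defs__HW_constants.h */

/* Round an address DOWN to the previous cache-line boundary.
 * Same effect as the mask in __align_address, but derived from
 * CACHE_LINE_SZ instead of a hard-coded 0x0FF. */
static inline void *align_down( void *ptr )
 { return (void *)( (uintptr_t)ptr & ~((uintptr_t)CACHE_LINE_SZ - 1) );
 }

/* Round an address UP to the next cache-line boundary (hypothetical).
 * The result stays inside a buffer over-allocated by CACHE_LINE_SZ bytes. */
static inline void *align_up( void *ptr )
 { return (void *)( ((uintptr_t)ptr + CACHE_LINE_SZ - 1)
                    & ~((uintptr_t)CACHE_LINE_SZ - 1) );
 }
```

Deriving the mask from CACHE_LINE_SZ keeps the helper correct if the constant is later changed to another power of two.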
     6.1 --- a/Defines/VMS_defs.h	Mon Sep 03 03:34:54 2012 -0700
     6.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
     6.3 @@ -1,43 +0,0 @@
     6.4 -/*
     6.5 - *  Copyright 2009 OpenSourceStewardshipFoundation.org
     6.6 - *  Licensed under GNU General Public License version 2
     6.7 - *
     6.8 - * Author: seanhalle@yahoo.com
     6.9 - * 
    6.10 - */
    6.11 -
    6.12 -#ifndef  _VMS_DEFS_MAIN_H
    6.13 -#define	_VMS_DEFS_MAIN_H
    6.14 -#define _GNU_SOURCE
    6.15 -
    6.16 -//===========================  VMS-wide defs  ===============================
    6.17 -
    6.18 -#define SUCCESS 0
    6.19 -
    6.20 -   //only after macro-expansion are the defs of writePrivQ, aso looked up
    6.21 -   // so these defs can be at the top, and writePrivQ defined later on..
    6.22 -#define writeVMSQ     writePrivQ
    6.23 -#define readVMSQ      readPrivQ
    6.24 -#define makeVMSQ      makePrivQ
    6.25 -#define numInVMSQ     numInPrivQ
    6.26 -#define VMSQueueStruc PrivQueueStruc
    6.27 -
    6.28 -
    6.29 -/*The language should re-define this, but need a default in case it doesn't*/
    6.30 -#ifndef _LANG_NAME_
    6.31 -#define _LANG_NAME_ ""
    6.32 -#endif
    6.33 -
    6.34 -//======================  Hardware Constants ============================
    6.35 -#include "VMS_defs__HW_constants.h"
    6.36 -
    6.37 -//======================  Macros  ======================
    6.38 -   //for turning macros and other VMS features on and off
    6.39 -#include "VMS_defs__turn_on_and_off.h"
    6.40 -
    6.41 -#include "../Services_Offered_by_VMS/Debugging/DEBUG__macros.h"
    6.42 -#include "../Services_Offered_by_VMS/Measurement_and_Stats/MEAS__macros.h"
    6.43 -
    6.44 -//===========================================================================
    6.45 -#endif	/*  */
    6.46 -
     7.1 --- a/Defines/VMS_defs__HW_constants.h	Mon Sep 03 03:34:54 2012 -0700
     7.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
     7.3 @@ -1,54 +0,0 @@
     7.4 -/*
     7.5 - *  Copyright 2012 OpenSourceStewardshipFoundation
     7.6 - *  Licensed under BSD
     7.7 - *
     7.8 - * Author: seanhalle@yahoo.com
     7.9 - * 
    7.10 - */
    7.11 -
    7.12 -#ifndef _VMS_HW_SPEC_DEFS_H
    7.13 -#define	_VMS_HW_SPEC_DEFS_H
    7.14 -#define _GNU_SOURCE
    7.15 -
    7.16 -
    7.17 -//=========================  Hardware related Constants =====================
    7.18 -   //This value is the number of hardware threads in the shared memory
    7.19 -   // machine
    7.20 -#define NUM_CORES        4
    7.21 -
    7.22 -   // tradeoff amortizing master fixed overhead vs imbalance potential
    7.23 -   // when work-stealing, can make bigger, at risk of losing cache affinity
    7.24 -#define NUM_ANIM_SLOTS  1
    7.25 -
    7.26 -   //These are for backoff inside core-loop, which reduces lock contention
    7.27 -#define NUM_REPS_W_NO_WORK_BEFORE_YIELD      10
    7.28 -#define NUM_REPS_W_NO_WORK_BEFORE_BACKOFF    2
    7.29 -#define MASTERLOCK_RETRIES_BEFORE_YIELD      100
    7.30 -#define NUM_TRIES_BEFORE_DO_BACKOFF          10
    7.31 -#define GET_LOCK_BACKOFF_WEIGHT 100
    7.32 -   
    7.33 -   // stack size in virtual processors created
    7.34 -#define VIRT_PROCR_STACK_SIZE 0x8000 /* 32K */
    7.35 -
    7.36 -   // memory for VMS_int__malloc
    7.37 -#define MALLOC_ADDITIONAL_MEM_FROM_OS_SIZE 0x8000000 /* 128M */
    7.38 -
    7.39 -   //Frequency of TS counts -- have to do tests to verify
    7.40 -   //NOTE: turn off (in BIOS)  TURBO-BOOST and SPEED-STEP else won't be const
    7.41 -#define TSCOUNT_FREQ 3180000000
    7.42 -
    7.43 -#define CACHE_LINE_SZ  256
    7.44 -#define PAGE_SIZE     4096
    7.45 -
    7.46 -//To prevent false-sharing, aligns a variable to a cache-line boundary.
    7.47 -//No need to use for local vars because those are never shared between cores
    7.48 -#define __align_to_cacheline__ __attribute__ ((aligned(CACHE_LINE_SZ)))
    7.49 -
    7.50 -//aligns a pointer to cacheline. The memory area has to contain at least
    7.51 -//CACHE_LINE_SZ bytes more than needed
    7.52 -#define __align_address(ptr) ((void*)(((uintptr_t)(ptr))&((uintptr_t)(~0x0FF))))
    7.53 -
    7.54 -//===========================================================================
    7.55 -
    7.56 -#endif	/* _VMS_DEFS_H */
    7.57 -
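Note on the `__align_address` macro in the removed header above: its mask clears the low 8 bits, so it rounds an address down to a 256-byte boundary, while the accompanying comment (reserve CACHE_LINE_SZ extra bytes) reads as if rounding up may have been intended. The following is only a minimal sketch contrasting the two behaviours. `align_down` and `align_up` are hypothetical names that do not appear in the changeset, and `CACHE_LINE_SZ` simply copies the value from the header.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE_SZ 256   /* value copied from the header above */

/* Rounds DOWN to a cache-line boundary, like the header's mask. */
static void *align_down(void *ptr)
{
    return (void *)((uintptr_t)ptr & ~(uintptr_t)(CACHE_LINE_SZ - 1));
}

/* Hypothetical alternative: rounds UP to the next cache-line boundary,
 * which is the case where CACHE_LINE_SZ spare bytes are needed. */
static void *align_up(void *ptr)
{
    /* the mask trick only works for power-of-two sizes */
    assert((CACHE_LINE_SZ & (CACHE_LINE_SZ - 1)) == 0);
    uintptr_t addr = (uintptr_t)ptr + (CACHE_LINE_SZ - 1);
    return (void *)(addr & ~(uintptr_t)(CACHE_LINE_SZ - 1));
}
```

With a buffer obtained from malloc, only the rounded-up address is guaranteed to stay inside the allocation; the rounded-down one can point before its start.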
     8.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     8.2 +++ b/HW_Dependent_Primitives/PR__HW_measurement.c	Wed Sep 19 23:12:44 2012 -0700
     8.3 @@ -0,0 +1,87 @@
     8.4 +#include <unistd.h>
     8.5 +#include <fcntl.h>
     8.6 +#include <linux/types.h>
     8.7 +#include <linux/perf_event.h>
     8.8 +#include <errno.h>
     8.9 +#include <sys/syscall.h>
    8.10 +#include <linux/prctl.h>
    8.11 +
    8.12 +#include "../PR.h"
    8.13 +
    8.14 +void setup_perf_counters(){
    8.15 +#ifdef HOLISTIC__TURN_ON_PERF_COUNTERS
    8.16 +    struct perf_event_attr hw_event;
    8.17 +   memset(&hw_event,0,sizeof(hw_event));
    8.18 +	hw_event.size = sizeof(struct perf_event_attr);
    8.19 +	hw_event.disabled = 1;
    8.20 +	hw_event.inherit = 1; /* children inherit it   */
    8.21 +	hw_event.pinned = 1; /* must always be on PMU */
    8.22 +	hw_event.exclusive = 0; /* 0: need not be the only group on PMU */
    8.23 +	hw_event.exclude_user = 0; /* 0: do count user space */
    8.24 +	hw_event.exclude_kernel = 0; /* 0: do count kernel */
    8.25 +	hw_event.exclude_hv = 0; /* 0: do count hypervisor */
    8.26 +	hw_event.exclude_idle = 0; /* 0: do count when idle */
    8.27 +
    8.28 +        int coreIdx;
    8.29 +   for( coreIdx = 0; coreIdx < NUM_CORES; coreIdx++ )
    8.30 +    {
    8.31 +       hw_event.type = PERF_TYPE_HARDWARE;	
    8.32 +       hw_event.config = PERF_COUNT_HW_CPU_CYCLES; //cycles
    8.33 +        _PRMasterEnv->cycles_counter_fd[coreIdx] = syscall(__NR_perf_event_open, &hw_event,
    8.34 + 		0,//pid_t pid, 
    8.35 +		coreIdx,//int cpu, 
    8.36 +		-1,//int group_fd,
    8.37 +		0//unsigned long flags
    8.38 +	);
    8.39 +        if (_PRMasterEnv->cycles_counter_fd[coreIdx]<0){
    8.40 +            fprintf(stderr,"On core %d: ",coreIdx);
    8.41 +            perror("Failed to open cycles counter");
    8.42 +        }
    8.43 +        hw_event.type = PERF_TYPE_HARDWARE;
    8.44 +        hw_event.config = PERF_COUNT_HW_INSTRUCTIONS; //instrs
    8.45 +        _PRMasterEnv->instrs_counter_fd[coreIdx] = syscall(__NR_perf_event_open, &hw_event,
    8.46 + 		0,//pid_t pid, 
    8.47 +		coreIdx,//int cpu, 
    8.48 +		-1,//int group_fd,
    8.49 +		0//unsigned long flags
    8.50 +	);
    8.51 +        if (_PRMasterEnv->instrs_counter_fd[coreIdx]<0){
    8.52 +            fprintf(stderr,"On core %d: ",coreIdx);
    8.53 +            perror("Failed to open instrs counter");
    8.54 +        }
    8.55 +        hw_event.type = PERF_TYPE_HW_CACHE;
    8.56 +        hw_event.config = PERF_COUNT_HW_CACHE_L1D <<  0  |
    8.57 +	(PERF_COUNT_HW_CACHE_OP_READ		<<  8) |
    8.58 +	(PERF_COUNT_HW_CACHE_RESULT_MISS	<< 16); //cache misses
    8.59 +        _PRMasterEnv->cachem_counter_fd[coreIdx] = syscall(__NR_perf_event_open, &hw_event,
    8.60 + 		0,//pid_t pid, 
    8.61 +		coreIdx,//int cpu, 
    8.62 +		-1,//int group_fd,
    8.63 +		0//unsigned long flags
    8.64 +	);
    8.65 +        if (_PRMasterEnv->cachem_counter_fd[coreIdx]<0){
    8.66 +            fprintf(stderr,"On core %d: ",coreIdx);
    8.67 +            perror("Failed to open cache miss counter");
    8.68 +            exit(1);
    8.69 +        }
    8.70 +   }
    8.71 +        
    8.72 +   prctl(PR_TASK_PERF_EVENTS_ENABLE);
    8.73 +#endif
    8.74 +}
    8.75 +
    8.76 +__inline__ uint64_t rdtsc(){
    8.77 +    uint32_t lo, hi;
    8.78 +    __asm__ __volatile__ (      // serialize
    8.79 +    "xorl %%eax,%%eax \n        cpuid"
    8.80 +    ::: "%rax", "%rbx", "%rcx", "%rdx");
    8.81 +    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi)); 
    8.82 +   /* asm volatile("RDTSC;"                   
    8.83 +                 "movl %%eax, %0;"         
    8.84 +                 "movl %%edx, %1;"         
    8.85 +               : "=m" (lo), "=m" (hi)
    8.86 +               :                        
    8.87 +               : "%eax", "%edx"         
    8.88 +                ); */
    8.89 +    return (uint64_t)hi << 32 | lo;
    8.90 +}
    8.91 \ No newline at end of file
     9.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     9.2 +++ b/HW_Dependent_Primitives/PR__HW_measurement.h	Wed Sep 19 23:12:44 2012 -0700
     9.3 @@ -0,0 +1,63 @@
     9.4 +/*
     9.5 + *  Copyright 2009 OpenSourceStewardshipFoundation.org
     9.6 + *  Licensed under GNU General Public License version 2
     9.7 + *
     9.8 + * Author: seanhalle@yahoo.com
     9.9 + * 
    9.10 + */
    9.11 +
    9.12 +#ifndef _PR__HW_MEASUREMENT_H
    9.13 +#define	_PR__HW_MEASUREMENT_H
    9.14 +#define _GNU_SOURCE
    9.15 +
    9.16 +
    9.17 +//===================  Macros to Capture Measurements  ======================
    9.18 +
    9.19 +typedef union
    9.20 + { uint32 lowHigh[2];
    9.21 +   uint64 longVal;
    9.22 + }
    9.23 +TSCountLowHigh;
    9.24 +
    9.25 +
    9.26 +//===================  Macros to Capture Measurements  ======================
    9.27 +//
    9.28 +//===== RDTSC wrapper ===== 
    9.29 +//Also runs with x86_64 code
    9.30 +#define saveTSCLowHigh(lowHighIn) \
    9.31 +   asm volatile("RDTSC;                   \
    9.32 +                 movl %%eax, %0;          \
    9.33 +                 movl %%edx, %1;"         \
    9.34 +   /* outputs */ : "=m" (lowHighIn.lowHigh[0]), "=m" (lowHighIn.lowHigh[1])\
    9.35 +   /* inputs  */ :                        \
    9.36 +   /* clobber */ : "%eax", "%edx"         \
    9.37 +                );
    9.38 +
    9.39 +#define saveTimeStampCountInto(low, high) \
    9.40 +   asm volatile("RDTSC;                   \
    9.41 +                 movl %%eax, %0;          \
    9.42 +                 movl %%edx, %1;"         \
    9.43 +   /* outputs */ : "=m" (low), "=m" (high)\
    9.44 +   /* inputs  */ :                        \
    9.45 +   /* clobber */ : "%eax", "%edx"         \
    9.46 +                );
    9.47 +
    9.48 +#define saveLowTimeStampCountInto(low)    \
    9.49 +   asm volatile("RDTSC;                   \
    9.50 +                 movl %%eax, %0;"         \
    9.51 +   /* outputs */ : "=m" (low)             \
    9.52 +   /* inputs  */ :                        \
    9.53 +   /* clobber */ : "%eax", "%edx"         \
    9.54 +                );
    9.55 +
    9.56 +inline TSCount getTSCount();
    9.57 +
    9.58 +
    9.59 +   //For code that calculates normalization-offset between TSC counts of
    9.60 +   // different cores.
    9.61 +//#define NUM_TSC_ROUND_TRIPS 10
    9.62 +
    9.63 +void setup_perf_counters();
    9.64 +uint64_t rdtsc(void);
    9.65 +#endif	/* _PR__HW_MEASUREMENT_H */
    9.66 +
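Note on the two ways this header and `rdtsc()` assemble a 64-bit timestamp from the EAX/EDX halves: `rdtsc()` shifts and ORs, while `TSCountLowHigh` overlays two 32-bit words on a 64-bit value. A minimal sketch of both follows. The names `combine_tsc`, `combine_tsc_union` and `TSCountLowHighSketch` are hypothetical, and standard `<stdint.h>` types stand in for the project's `uint32`/`uint64`.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-OR, as in the return statement of rdtsc(). */
static uint64_t combine_tsc(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Same layout as TSCountLowHigh, using <stdint.h> types. */
typedef union {
    uint32_t lowHigh[2];
    uint64_t longVal;
} TSCountLowHighSketch;

/* Union overlay: matches combine_tsc() only on little-endian
 * machines such as x86-64, where lowHigh[0] is the low word. */
static uint64_t combine_tsc_union(uint32_t lo, uint32_t hi)
{
    TSCountLowHighSketch t;
    assert(sizeof t == sizeof(uint64_t));
    t.lowHigh[0] = lo;
    t.lowHigh[1] = hi;
    return t.longVal;
}
```

The shift form is independent of byte order; the union form relies on the target being little-endian, which holds for the x86-64 code in this changeset.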
    10.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    10.2 +++ b/HW_Dependent_Primitives/PR__primitives.c	Wed Sep 19 23:12:44 2012 -0700
    10.3 @@ -0,0 +1,137 @@
    10.4 +/*
    10.5 + * This File contains all hardware dependent C code.
    10.6 + */
    10.7 +
    10.8 +
    10.9 +#include "../PR.h"
   10.10 +
   10.11 +/*Reset the stack then set it up with __cdecl structure on it
   10.12 + * Except doing a trick for 64 bits, where point slave to helper assembly
   10.13 + * that copies the function pointer off stack and into a reg, then
   10.14 + * jumps to it.  So, set the resumeInstrPtr to the helper-assembly.
   10.15 + *This is for first-time startup of slave.. it trashes the stack.
   10.16 + *No registers saved into old stack frame, and no animator state to
   10.17 + * return to
   10.18 + * 
   10.19 + *This was factored into separate function because it's used stand-alone in
   10.20 + * some wrapper-libraries (but only "int" version, to warn users to check
   10.21 + * carefully that it's safe)
   10.22 + */
   10.23 +inline void
   10.24 +PR_int__reset_slaveVP_to_TopLvlFn( SlaveVP *slaveVP, TopLevelFnPtr fnPtr,
   10.25 +                              void    *dataParam)
   10.26 + { void  *stackPtr;
   10.27 +
   10.28 +// Start of Hardware dependent part           
   10.29 +   
   10.30 +    //Set slave's instr pointer to a helper Fn that copies params from stack
   10.31 +   slaveVP->resumeInstrPtr  = (TopLevelFnPtr)&startUpTopLevelFn;
   10.32 +   
   10.33 +    //fnPtr takes two params -- void *dataParam & void *animSlv
   10.34 +    // Stack grows *down*, so start it at highest stack addr, minus room
   10.35 +    // for 2 params + return addr. Do ptr arith in terms of bytes..
   10.36 +   stackPtr = 
   10.37 +     (uint8 *)slaveVP->startOfStack + VIRT_PROCR_STACK_SIZE - 4*sizeof(void*);
   10.38 +  
   10.39 +    //setup __cdecl on stack
   10.40 +    //Normally, return Addr is in loc pointed to by stackPtr, but doing a
   10.41 +    // trick for 64 bit arch, where put ptr to top-level fn there instead,
   10.42 +    // and set resumeInstrPtr to a helper-fn that copies the top-level
   10.43 +    // fn ptr and params into registers.
   10.44 +    //Then, dataParam is at stackPtr + 8 bytes, & animating SlaveVP above
   10.45 +    //Do ptr arith in terms of pointers
   10.46 +   *((SlaveVP**)stackPtr + 2 ) = slaveVP; //rightmost param
   10.47 +   *((void**)stackPtr + 1 ) = dataParam;  //next  param to left
   10.48 +   *((void**)stackPtr) = (void*)fnPtr;    //copied to reg by helper Fn
   10.49 +   
   10.50 +  
   10.51 +// end of Hardware dependent part           
   10.52 +   
   10.53 +      //core controller will switch to stack & frame pointers stored in slave,
   10.54 +      // can't use this fn if have state on stack that needs preserving.
   10.55 +   slaveVP->stackPtr = stackPtr; 
   10.56 +   slaveVP->framePtr = stackPtr; 
   10.57 + }
   10.58 +
   10.59 +
   10.60 +/*Preserve the stack, pushing the __cdecl structure onto it
   10.61 + * For 64 bits, params passed in regs, so point slave to helper assembly
   10.62 + * that copies the arguments off stack and into regs, then
   10.63 + * jumps to Fn.  So, set the resumeInstrPtr to the helper-assembly.
   10.64 + * 
   10.65 + *This preserves the stack state that existed at the time the slave was suspended.
   10.66 + */
   10.67 +inline void
   10.68 +PR_int__point_slaveVP_to_OneParamFn( SlaveVP *slaveVP, void *fnPtr,
   10.69 +                              void    *param)
   10.70 + { void  *stackPtr;
   10.71 +
   10.72 +// Start of Hardware dependent part           
   10.73 +   
   10.74 +    // Get the slave's current stack ptr, and make room for param + ret addr
   10.75 +   stackPtr = ((void **)slaveVP->stackPtr - 2);
   10.76 +  
   10.77 +    //save slave's current instr ptr as the return addr, so stack looks
   10.78 +    // just like it does after a call instr.
   10.79 +    //Put argument plus fn addr onto stack -- helper will copy into regs
   10.80 +    // then jump to the fn
   10.81 +    //fnPtr is just below top of stack, param is above at stackPtr + 8 bytes
   10.82 +   *((void**)stackPtr + 1 ) = param;
   10.83 +   *((void**)stackPtr) = slaveVP->resumeInstrPtr; //acts as return addr
   10.84 +   *((void**)stackPtr - 1) = (void*)fnPtr;        //what helper jmps to
   10.85 +   
   10.86 +    //Set slave's instr pointer to a helper Fn that copies params from stack
   10.87 +   slaveVP->resumeInstrPtr  = (TopLevelFnPtr)&jmpToOneParamFn;
   10.88 +   
   10.89 +// end of Hardware dependent part           
   10.90 +   
   10.91 +      //core controller will switch to stack & frame pointers stored in slave,
   10.92 +      // then jmp to helper Fn, which will then move param to register used
   10.93 +      // to pass argument and jmp to fnPtr saved on stack.
   10.94 +      //That fn should save the framePtr on stack and make room
   10.95 +      // for its own frame, as normal.  So don't modify framePtr, only stack
   10.96 +   slaveVP->stackPtr = stackPtr;
   10.97 + }
   10.98 +
   10.99 +
  10.100 +/*Same as for one-parameter function, but puts two arguments on stack
  10.101 + *Preserve the stack, pushing the __cdecl structure onto it
  10.102 + * For 64 bits, params passed in regs, so point slave to helper assembly
  10.103 + * that copies the arguments off stack and into regs, then
  10.104 + * jumps to Fn.  So, set the resumeInstrPtr to the helper-assembly.
  10.105 + * 
  10.106 + *This preserves the stack state that existed at the time the slave was suspended.
  10.107 + */
  10.108 +inline void
  10.109 +PR_int__point_slaveVP_to_TwoParamFn( SlaveVP *slaveVP, void *fnPtr,
  10.110 +                              void    *param1, void *param2)
  10.111 + { void  *stackPtr;
  10.112 +
  10.113 +// Start of Hardware dependent part           
  10.114 +   
  10.115 +    // Get the slave's current stack ptr, and make room for 2 params + ret addr
  10.116 +   stackPtr = ((void **)slaveVP->stackPtr - 3);
  10.117 +  
  10.118 +    //save slave's current instr ptr as the return addr, so stack looks
  10.119 +    // just like it does after a call instr.
  10.120 +    //Put argument plus fn addr onto stack -- helper will copy into regs
  10.121 +    // then jump to the fn
  10.122 +    //fnPtr is just below top of stack, param1 is above at stackPtr + 8 bytes
  10.123 +   *((void**)stackPtr + 2 ) = param2;
  10.124 +   *((void**)stackPtr + 1 ) = param1;
  10.125 +   *((void**)stackPtr) = slaveVP->resumeInstrPtr; //acts as return addr
  10.126 +   *((void**)stackPtr - 1) = (void*)fnPtr;        //what helper jmps to
  10.127 +   
  10.128 +    //Set slave's instr pointer to a helper Fn that copies params from stack
  10.129 +   slaveVP->resumeInstrPtr  = (TopLevelFnPtr)&jmpToTwoParamFn;
  10.130 +   
  10.131 +// end of Hardware dependent part           
  10.132 +   
  10.133 +      //core controller will switch to stack & frame pointers stored in slave,
  10.134 +      // then jmp to helper Fn, which will then move param to register used
  10.135 +      // to pass argument and jmp to fnPtr saved on stack.
  10.136 +      //That fn should save the framePtr on stack and make room
  10.137 +      // for its own frame, as normal.  So don't modify framePtr, only stack
  10.138 +   slaveVP->stackPtr = stackPtr;
  10.139 + }
  10.140 +
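Note on the stack layout built by `PR_int__point_slaveVP_to_OneParamFn`: the helper `jmpToOneParamFn` expects the argument at `0x08(%rsp)`, the return address at `(%rsp)`, and the function address at `-0x08(%rsp)`. The sketch below models that layout on an ordinary array so the slot arithmetic can be checked without switching stacks. `SketchVP` and `push_one_param_frame` are hypothetical stand-ins, not part of the changeset. The cast to `void **` matters: under GCC's extension, arithmetic on a plain `void *` steps in bytes, not in pointer-sized slots.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two SlaveVP fields used here. */
typedef struct {
    void *stackPtr;
    void *resumeInstrPtr;
} SketchVP;

/* Mirrors the slot layout of the one-parameter case:
 *   sp[ 1] = param          (read by helper from  0x08(%rsp))
 *   sp[ 0] = old resume ptr (acts as return address, at (%rsp))
 *   sp[-1] = fnPtr          (read by helper from -0x08(%rsp))  */
static void push_one_param_frame(SketchVP *vp, void *fnPtr, void *param)
{
    assert(vp != NULL);
    /* step in pointer-sized slots, not bytes */
    void **sp = (void **)vp->stackPtr - 2;
    sp[1]  = param;
    sp[0]  = vp->resumeInstrPtr;
    sp[-1] = fnPtr;
    vp->stackPtr = sp;   /* frame pointer is deliberately left alone */
}
```

The slot below the new stack pointer is only scratch space: as the assembly comment says, it is overwritten once the called function pushes its frame pointer.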
    11.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    11.2 +++ b/HW_Dependent_Primitives/PR__primitives.h	Wed Sep 19 23:12:44 2012 -0700
    11.3 @@ -0,0 +1,55 @@
    11.4 +/*
    11.5 + *  Copyright 2009 OpenSourceStewardshipFoundation.org
    11.6 + *  Licensed under GNU General Public License version 2
    11.7 + *
    11.8 + * Author: seanhalle@yahoo.com
    11.9 + * 
   11.10 + */
   11.11 +
   11.12 +#ifndef  _PR__PRIMITIVES_H
   11.13 +#define	_PR__PRIMITIVES_H
   11.14 +#define _GNU_SOURCE
   11.15 +
   11.16 +void 
   11.17 +recordCoreCtlrReturnLabelAddr(void **returnAddress);
   11.18 +
   11.19 +void 
   11.20 +switchToSlv(SlaveVP *nextSlave);
   11.21 +
   11.22 +void 
   11.23 +switchToCoreCtlr(SlaveVP *nextSlave);
   11.24 +
   11.25 +void 
   11.26 +masterSwitchToCoreCtlr(SlaveVP *nextSlave);
   11.27 +
   11.28 +void 
   11.29 +startUpTopLevelFn();
   11.30 +
   11.31 +void 
   11.32 +jmpToOneParamFn();
   11.33 +
   11.34 +void 
   11.35 +jmpToTwoParamFn();
   11.36 +
   11.37 +void *
   11.38 +asmTerminateCoreCtlr(SlaveVP *currSlv);
   11.39 +
   11.40 +#define flushRegisters() \
   11.41 +        asm volatile ("":::"%rbx", "%r12", "%r13","%r14","%r15")
   11.42 +
   11.43 +void
   11.44 +PR_int__save_return_into_ptd_to_loc_then_do_ret(void *ptdToLoc);
   11.45 +
   11.46 +void
   11.47 +PR_int__return_to_addr_in_ptd_to_loc(void *ptdToLoc);
   11.48 +
   11.49 +inline void
   11.50 +PR_int__point_slaveVP_to_OneParamFn( SlaveVP *slaveVP, void *fnPtr,
   11.51 +                              void    *param);
   11.52 +
   11.53 +inline void
   11.54 +PR_int__point_slaveVP_to_TwoParamFn( SlaveVP *slaveVP, void *fnPtr,
   11.55 +                              void    *param1, void *param2);
   11.56 +
   11.57 +#endif	/* _PR__PRIMITIVES_H */
   11.58 +
    12.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    12.2 +++ b/HW_Dependent_Primitives/PR__primitives_asm.s	Wed Sep 19 23:12:44 2012 -0700
    12.3 @@ -0,0 +1,189 @@
    12.4 +.data
    12.5 +
    12.6 +
    12.7 +.text
    12.8 +
    12.9 +//Save the coreCtlr's return label address into the pointed-to variable
   12.10 +//Arguments: Pointer to variable holding address
   12.11 +.globl recordCoreCtlrReturnLabelAddr
   12.12 +recordCoreCtlrReturnLabelAddr:
   12.13 +    movq    $coreCtlrReturn, %rcx  #load label address
   12.14 +    movq    %rcx, (%rdi)           #save address to pointer
   12.15 +    ret
   12.16 +
   12.17 +
   12.18 +//Trick for 64 bit arch -- copies args from stack into regs, then does jmp to
   12.19 +// the top-level function, which was pointed to by the stack-ptr
   12.20 +.globl startUpTopLevelFn
   12.21 +startUpTopLevelFn:
   12.22 +    movq    %rdi      , %rsi #get second argument from first argument of switchSlv
   12.23 +    movq    0x08(%rsp), %rdi #get first argument from stack
   12.24 +    movq    (%rsp)    , %rax #get top-level function's addr from stack
   12.25 +    jmp     *%rax            #jump to the top-level function
   12.26 +
   12.27 +
   12.28 +//Args passed in regs in 64 bit arch. This copies args from stack into regs,
   12.29 +// then does jmp to the function, whose addr is on stack.
   12.30 +//For 64bit, %rdi is first arg, %rsi is second arg to function
   12.31 +//The top of stack is a valid return addr (old value of slaveVP's instrPtr),
   12.32 +// and the fnPtr is just below the top of stack (will be overwritten when
   12.33 +// fn saves the frame ptr)
   12.34 +.globl jmpToOneParamFn
   12.35 +jmpToOneParamFn:
   12.36 +    movq    0x08(%rsp), %rdi #get the argument from stack
   12.37 +    movq   -0x08(%rsp), %rax #get function's addr from stack
   12.38 +    jmp     *%rax            #jump to the function
   12.39 +
   12.40 +.globl jmpToTwoParamFn
   12.41 +jmpToTwoParamFn:
   12.42 +    movq    0x10(%rsp), %rsi #get the second argument from stack
   12.43 +    movq    0x08(%rsp), %rdi #get the first argument from stack
   12.44 +    movq   -0x08(%rsp), %rax #get function's addr from stack
   12.45 +    jmp     *%rax            #jump to the function
   12.46 +
   12.47 +
   12.48 +//Switches from CoreCtlr to either a normal Slv VP or the Master VP
   12.49 +//switch to VP's stack and frame ptr then jump to VP's next-instr-ptr
   12.50 +/* SlaveVP  offsets:
   12.51 + * 0x00  stackPtr
   12.52 + * 0x08 framePtr
   12.53 + * 0x10 resumeInstrPtr
   12.54 + * 0x18 coreCtlrFramePtr
   12.55 + * 0x20 coreCtlrStackPtr
   12.56 + *
   12.57 + * _PRMasterEnv  offsets:
   12.58 + * 0x00 coreCtlrReturnPt
   12.59 + * 0x100 masterLock
   12.60 + */
   12.61 +.globl switchToSlv
   12.62 +switchToSlv:
   12.63 +    #SlaveVP in %rdi
   12.64 +    movq    %rsp      , 0x20(%rdi)   #save core ctlr stack pointer 
   12.65 +    movq    %rbp      , 0x18(%rdi)   #save core ctlr frame pointer
   12.66 +    movq    0x00(%rdi), %rsp         #restore stack pointer
   12.67 +    movq    0x08(%rdi), %rbp         #restore frame pointer
   12.68 +    movq    0x10(%rdi), %rax         #get jmp pointer
   12.69 +    jmp     *%rax                    #jmp to Slv
   12.70 +coreCtlrReturn:
   12.71 +    ret
   12.72 +
   12.73 +    
   12.74 +//switches to core controller. saves return address
   12.75 +/* SlaveVP  offsets:
   12.76 + * 0x00  stackPtr
   12.77 + * 0x08 framePtr
   12.78 + * 0x10 resumeInstrPtr
   12.79 + * 0x18 coreCtlrFramePtr
   12.80 + * 0x20 coreCtlrStackPtr
   12.81 + *
   12.82 + * _PRMasterEnv  offsets:
   12.83 + * 0x00 coreCtlrReturnPt
   12.84 + * 0x100 masterLock
   12.85 + */
   12.86 +.globl switchToCoreCtlr
   12.87 +switchToCoreCtlr:
   12.88 +    #SlaveVP in %rdi
   12.89 +    movq    $SlvReturn, 0x10(%rdi)   #store return address
   12.90 +    movq    %rsp      , 0x00(%rdi)   #save stack pointer 
   12.91 +    movq    %rbp      , 0x08(%rdi)   #save frame pointer
   12.92 +    movq    0x20(%rdi), %rsp         #restore stack pointer
   12.93 +    movq    0x18(%rdi), %rbp         #restore frame pointer
   12.94 +    movq    $_PRMasterEnv, %rcx
   12.95 +    movq        (%rcx), %rcx         #_PRMasterEnv is pointer to struct
   12.96 +    movq    0x00(%rcx), %rax         #get CoreCtlrStartPt
   12.97 +    jmp     *%rax                    #jmp to CoreCtlr
   12.98 +SlvReturn:
   12.99 +    ret
  12.100 +
  12.101 +
  12.102 +
  12.103 +//switches to core controller from master. saves return address
  12.104 +//Releases masterLock so the next AnimationMaster can be executed
  12.105 +/* SlaveVP  offsets:
  12.106 + * 0x00  stackPtr
  12.107 + * 0x08 framePtr
  12.108 + * 0x10 resumeInstrPtr
  12.109 + * 0x18 coreCtlrFramePtr
  12.110 + * 0x20 coreCtlrStackPtr
  12.111 + *
  12.112 + * _PRMasterEnv  offsets:
  12.113 + * 0x00 coreCtlrReturnPt
  12.114 + * 0x100 masterLock
  12.115 + */
  12.116 +.globl masterSwitchToCoreCtlr
  12.117 +masterSwitchToCoreCtlr:
  12.118 +    #SlaveVP in %rdi
  12.119 +    movq    $MasterReturn, 0x10(%rdi)   #store return address
  12.120 +    movq    %rsp      , 0x00(%rdi)   #save stack pointer 
  12.121 +    movq    %rbp      , 0x08(%rdi)   #save frame pointer
  12.122 +    movq    0x20(%rdi), %rsp         #restore stack pointer
  12.123 +    movq    0x18(%rdi), %rbp         #restore frame pointer
  12.124 +    movq    $_PRMasterEnv, %rcx
  12.125 +    movq        (%rcx), %rcx         #_PRMasterEnv is pointer to struct
  12.126 +    movq    0x00(%rcx), %rax         #get CoreCtlr return pt
  12.127 +    movl    $0x0      , 0x100(%rcx)  #release lock
  12.128 +    jmp     *%rax                    #jmp to CoreCtlr
  12.129 +MasterReturn:
  12.130 +    ret
  12.131 +
  12.132 +
  12.133 +/*Switch to terminateCoreCtlr
  12.134 + *This is called by endOSThreadFn, which is the top-level function given
  12.135 + * to a shutdown slave.  When such a slave gets switched to, by the core
  12.136 + * controller, it runs the top-level function, which calls this, which
  12.137 + * then calls terminateCoreCtlr, which ends the pthread.  Note, when get
  12.138 + * here, stack is already set up for switchSlv and Slv ptr is in %rdi.
  12.139 + *Do not save registers of Slv because this function will never return
  12.140 + *
  12.141 + * SlaveVP  offsets:
  12.142 + * 0x00  stackPtr
  12.143 + * 0x08 framePtr
  12.144 + * 0x10 resumeInstrPtr
  12.145 + * 0x18 coreCtlrFramePtr
  12.146 + * 0x20 coreCtlrStackPtr
  12.147 + *
  12.148 + * _PRMasterEnv  offsets:
  12.149 + * 0x00 coreCtlrReturnPt
  12.150 + * 0x100 masterLock
  12.151 + */
  12.152 +.globl asmTerminateCoreCtlr
  12.153 +asmTerminateCoreCtlr:                #SlaveVP ptr is in %rdi
  12.154 +    movq    0x20(%rdi), %rsp         #restore stack pointer
  12.155 +    movq    0x18(%rdi), %rbp         #restore frame pointer
  12.156 +    movq    $terminateCoreCtlr, %rax
  12.157 +    jmp     *%rax                    #jmp to fn that ends the pthread
  12.158 +
  12.159 +
  12.160 +/*
  12.161 + * This one for the sequential version is special. It discards the current stack
  12.162 + * and returns directly from the coreCtlr after PR_WL__dissipate_slaveVP was called
  12.163 + */
  12.164 +.globl asmTerminateCoreCtlrSeq
  12.165 +asmTerminateCoreCtlrSeq:
  12.166 +    #SlaveVP in %rdi
  12.167 +    movq    0x20(%rdi), %rsp         #restore stack pointer
  12.168 +    movq    0x18(%rdi), %rbp         #restore frame pointer
  12.169 +    #argument is in %rdi
  12.170 +    call    PR_int__dissipate_slaveVP
  12.171 +    movq    %rbp      , %rsp        #goto the coreCtlrs stack
  12.172 +    pop     %rbp        #restore the old framepointer
  12.173 +    ret                 #return from core controller
  12.174 +    
  12.175 +
  12.176 +//Takes the return addr off the stack and saves it into the loc pointed to
  12.177 +// by the parameter passed in via rdi.  Return addr is at 0x8(%rbp) for 64bit
  12.178 +.globl PR_int__save_return_into_ptd_to_loc_then_do_ret
  12.179 +PR_int__save_return_into_ptd_to_loc_then_do_ret:
  12.180 +    movq 0x08(%rbp),   %rax  #get ret address, rbp is the same as in the calling function
  12.181 +    movq      %rax,   (%rdi) #write ret addr into addr passed as param field
  12.182 +    ret
  12.183 +
  12.184 +
  12.185 +//Assembly code changes the return addr on the stack to the one
  12.186 +// pointed to by the parameter, then returns. Stack's return addr is at 0x8(%rbp)
  12.187 +.globl PR_int__return_to_addr_in_ptd_to_loc
  12.188 +PR_int__return_to_addr_in_ptd_to_loc:
  12.189 +    movq   (%rdi),     %rax  #get return addr from addr passed as param
  12.190 +    movq    %rax, 0x08(%rbp) #write return addr to the stack of the caller
  12.191 +    ret
  12.192 +
    13.1 --- a/HW_Dependent_Primitives/VMS__HW_measurement.c	Mon Sep 03 03:34:54 2012 -0700
    13.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    13.3 @@ -1,87 +0,0 @@
    13.4 -#include <unistd.h>
    13.5 -#include <fcntl.h>
    13.6 -#include <linux/types.h>
    13.7 -#include <linux/perf_event.h>
    13.8 -#include <errno.h>
    13.9 -#include <sys/syscall.h>
   13.10 -#include <linux/prctl.h>
   13.11 -
   13.12 -#include "../VMS.h"
   13.13 -
   13.14 -void setup_perf_counters(){
   13.15 -#ifdef HOLISTIC__TURN_ON_PERF_COUNTERS
   13.16 -    struct perf_event_attr hw_event;
   13.17 -   memset(&hw_event,0,sizeof(hw_event));
   13.18 -	hw_event.size = sizeof(struct perf_event_attr);
   13.19 -	hw_event.disabled = 1;
   13.20 -	hw_event.inherit = 1; /* children inherit it   */
   13.21 -	hw_event.pinned = 1; /* must always be on PMU */
   13.22 -	hw_event.exclusive = 0; /* only group on PMU     */
   13.23 -	hw_event.exclude_user = 0; /* don't count user      */
   13.24 -	hw_event.exclude_kernel = 0; /* ditto kernel          */
   13.25 -	hw_event.exclude_hv = 0; /* ditto hypervisor      */
   13.26 -	hw_event.exclude_idle = 0; /* don't count when idle */
   13.27 -
   13.28 -        int coreIdx;
   13.29 -   for( coreIdx = 0; coreIdx < NUM_CORES; coreIdx++ )
   13.30 -    {
   13.31 -       hw_event.type = PERF_TYPE_HARDWARE;	
   13.32 -       hw_event.config = PERF_COUNT_HW_CPU_CYCLES; //cycles
   13.33 -        _VMSMasterEnv->cycles_counter_fd[coreIdx] = syscall(__NR_perf_event_open, &hw_event,
   13.34 - 		0,//pid_t pid, 
   13.35 -		coreIdx,//int cpu, 
   13.36 -		-1,//int group_fd,
   13.37 -		0//unsigned long flags
   13.38 -	);
   13.39 -        if (_VMSMasterEnv->cycles_counter_fd[coreIdx]<0){
   13.40 -            fprintf(stderr,"On core %d: ",coreIdx);
   13.41 -            perror("Failed to open cycles counter");
   13.42 -        }
   13.43 -        hw_event.type = PERF_TYPE_HARDWARE;
   13.44 -        hw_event.config = PERF_COUNT_HW_INSTRUCTIONS; //instrs
   13.45 -        _VMSMasterEnv->instrs_counter_fd[coreIdx] = syscall(__NR_perf_event_open, &hw_event,
   13.46 - 		0,//pid_t pid, 
   13.47 -		coreIdx,//int cpu, 
   13.48 -		-1,//int group_fd,
   13.49 -		0//unsigned long flags
   13.50 -	);
   13.51 -        if (_VMSMasterEnv->instrs_counter_fd[coreIdx]<0){
   13.52 -            fprintf(stderr,"On core %d: ",coreIdx);
   13.53 -            perror("Failed to open instrs counter");
   13.54 -        }
   13.55 -        hw_event.type = PERF_TYPE_HW_CACHE;
   13.56 -        hw_event.config = PERF_COUNT_HW_CACHE_L1D <<  0  |
   13.57 -	(PERF_COUNT_HW_CACHE_OP_READ		<<  8) |
   13.58 -	(PERF_COUNT_HW_CACHE_RESULT_MISS	<< 16); //cache misses
   13.59 -        _VMSMasterEnv->cachem_counter_fd[coreIdx] = syscall(__NR_perf_event_open, &hw_event,
   13.60 - 		0,//pid_t pid, 
   13.61 -		coreIdx,//int cpu, 
   13.62 -		-1,//int group_fd,
   13.63 -		0//unsigned long flags
   13.64 -	);
   13.65 -        if (_VMSMasterEnv->cachem_counter_fd[coreIdx]<0){
   13.66 -            fprintf(stderr,"On core %d: ",coreIdx);
   13.67 -            perror("Failed to open cache miss counter");
   13.68 -            exit(1);
   13.69 -        }
   13.70 -   }
   13.71 -        
   13.72 -   prctl(PR_TASK_PERF_EVENTS_ENABLE);
   13.73 -#endif
   13.74 -}
   13.75 -
   13.76 -__inline__ uint64_t rdtsc(){
   13.77 -    uint32_t lo, hi;
   13.78 -    __asm__ __volatile__ (      // serialize
   13.79 -    "xorl %%eax,%%eax \n        cpuid"
   13.80 -    ::: "%rax", "%rbx", "%rcx", "%rdx");
   13.81 -    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi)); 
   13.82 -   /* asm volatile("RDTSC;"                   
   13.83 -                 "movl %%eax, %0;"         
   13.84 -                 "movl %%edx, %1;"         
   13.85 -               : "=m" (lo), "=m" (hi)
   13.86 -               :                        
   13.87 -               : "%eax", "%edx"         
   13.88 -                ); */
   13.89 -    return (uint64_t)hi << 32 | lo;
   13.90 -}
   13.91 \ No newline at end of file
    14.1 --- a/HW_Dependent_Primitives/VMS__HW_measurement.h	Mon Sep 03 03:34:54 2012 -0700
    14.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    14.3 @@ -1,63 +0,0 @@
    14.4 -/*
    14.5 - *  Copyright 2009 OpenSourceStewardshipFoundation.org
    14.6 - *  Licensed under GNU General Public License version 2
    14.7 - *
    14.8 - * Author: seanhalle@yahoo.com
    14.9 - * 
   14.10 - */
   14.11 -
   14.12 -#ifndef _VMS__HW_MEASUREMENT_H
   14.13 -#define	_VMS__HW_MEASUREMENT_H
   14.14 -#define _GNU_SOURCE
   14.15 -
   14.16 -
   14.17 -//===================  Macros to Capture Measurements  ======================
   14.18 -
   14.19 -typedef union
   14.20 - { uint32 lowHigh[2];
   14.21 -   uint64 longVal;
   14.22 - }
   14.23 -TSCountLowHigh;
   14.24 -
   14.25 -
   14.26 -//===================  Macros to Capture Measurements  ======================
   14.27 -//
   14.28 -//===== RDTSC wrapper ===== 
   14.29 -//Also runs with x86_64 code
   14.30 -#define saveTSCLowHigh(lowHighIn) \
   14.31 -   asm volatile("RDTSC;                   \
   14.32 -                 movl %%eax, %0;          \
   14.33 -                 movl %%edx, %1;"         \
   14.34 -   /* outputs */ : "=m" (lowHighIn.lowHigh[0]), "=m" (lowHighIn.lowHigh[1])\
   14.35 -   /* inputs  */ :                        \
   14.36 -   /* clobber */ : "%eax", "%edx"         \
   14.37 -                );
   14.38 -
   14.39 -#define saveTimeStampCountInto(low, high) \
   14.40 -   asm volatile("RDTSC;                   \
   14.41 -                 movl %%eax, %0;          \
   14.42 -                 movl %%edx, %1;"         \
   14.43 -   /* outputs */ : "=m" (low), "=m" (high)\
   14.44 -   /* inputs  */ :                        \
   14.45 -   /* clobber */ : "%eax", "%edx"         \
   14.46 -                );
   14.47 -
   14.48 -#define saveLowTimeStampCountInto(low)    \
   14.49 -   asm volatile("RDTSC;                   \
   14.50 -                 movl %%eax, %0;"         \
   14.51 -   /* outputs */ : "=m" (low)             \
   14.52 -   /* inputs  */ :                        \
   14.53 -   /* clobber */ : "%eax", "%edx"         \
   14.54 -                );
   14.55 -
   14.56 -inline TSCount getTSCount();
   14.57 -
   14.58 -
   14.59 -   //For code that calculates normalization-offset between TSC counts of
   14.60 -   // different cores.
   14.61 -//#define NUM_TSC_ROUND_TRIPS 10
   14.62 -
   14.63 -void setup_perf_counters();
   14.64 -uint64_t rdtsc(void);
   14.65 -#endif	/* */
   14.66 -
    15.1 --- a/HW_Dependent_Primitives/VMS__primitives.c	Mon Sep 03 03:34:54 2012 -0700
    15.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    15.3 @@ -1,137 +0,0 @@
    15.4 -/*
    15.5 - * This File contains all hardware dependent C code.
    15.6 - */
    15.7 -
    15.8 -
    15.9 -#include "../VMS.h"
   15.10 -
   15.11 -/*Reset the stack then set it up with __cdecl structure on it
   15.12 - * Except doing a trick for 64 bits, where point slave to helper assembly
   15.13 - * that copies the function pointer off stack and into a reg, then
   15.14 - * jumps to it.  So, set the resumeInstrPtr to the helper-assembly.
   15.15 - *This is for first-time startup of slave.. it trashes the stack.
   15.16 - *No registers saved into old stack frame, and no animator state to
   15.17 - * return to
   15.18 - * 
   15.19 - *This was factored into separate function because it's used stand-alone in
   15.20 - * some wrapper-libraries (but only "int" version, to warn users to check
   15.21 - * carefully that it's safe)
   15.22 - */
   15.23 -inline void
   15.24 -VMS_int__reset_slaveVP_to_TopLvlFn( SlaveVP *slaveVP, TopLevelFnPtr fnPtr,
   15.25 -                              void    *dataParam)
   15.26 - { void  *stackPtr;
   15.27 -
   15.28 -// Start of Hardware dependent part           
   15.29 -   
   15.30 -    //Set slave's instr pointer to a helper Fn that copies params from stack
   15.31 -   slaveVP->resumeInstrPtr  = (TopLevelFnPtr)&startUpTopLevelFn;
   15.32 -   
   15.33 -    //fnPtr takes two params -- void *dataParam & void *animSlv
   15.34 -    // Stack grows *down*, so start it at highest stack addr, minus room
   15.35 -    // for 2 params + return addr. Do ptr arith in terms of bytes..
   15.36 -   stackPtr = 
   15.37 -     (uint8 *)slaveVP->startOfStack + VIRT_PROCR_STACK_SIZE - 4*sizeof(void*);
   15.38 -  
   15.39 -    //setup __cdecl on stack
   15.40 -    //Normally, return Addr is in loc pointed to by stackPtr, but doing a
   15.41 -    // trick for 64 bit arch, where put ptr to top-level fn there instead,
   15.42 -    // and set resumeInstrPtr to a helper-fn that copies the top-level
   15.43 -    // fn ptr and params into registers.
   15.44 -    //Then, dataParam is at stackPtr + 8 bytes, & animating SlaveVP above
   15.45 -    //Do ptr arith in terms of pointers
   15.46 -   *((SlaveVP**)stackPtr + 2 ) = slaveVP; //rightmost param
   15.47 -   *((void**)stackPtr + 1 ) = dataParam;  //next  param to left
   15.48 -   *((void**)stackPtr) = (void*)fnPtr;    //copied to reg by helper Fn
   15.49 -   
   15.50 -  
   15.51 -// end of Hardware dependent part           
   15.52 -   
   15.53 -      //core controller will switch to stack & frame pointers stored in slave,
   15.54 -      // can't use this fn if have state on stack that needs preserving.
   15.55 -   slaveVP->stackPtr = stackPtr; 
   15.56 -   slaveVP->framePtr = stackPtr; 
   15.57 - }
   15.58 -
   15.59 -
   15.60 -/*Preserve the stack, pushing the __cdecl structure onto it
   15.61 - * For 64 bits, params passed in regs, so point slave to helper assembly
   15.62 - * that copies the arguments off stack and into regs, then
   15.63 - * jumps to Fn.  So, set the resumeInstrPtr to the helper-assembly.
   15.64 - * 
   15.65 - *This preserves the stack state existed at time slave was suspended.
   15.66 - */
   15.67 -inline void
   15.68 -VMS_int__point_slaveVP_to_OneParamFn( SlaveVP *slaveVP, void *fnPtr,
   15.69 -                              void    *param)
   15.70 - { void  *stackPtr;
   15.71 -
   15.72 -// Start of Hardware dependent part           
   15.73 -   
   15.74 -    // Get the slave's current stack ptr, and make room for param + ret addr
   15.75 -   stackPtr = ((void **)slaveVP->stackPtr - 2);
   15.76 -  
   15.77 -    //save slave's current instr ptr as the return addr, so stack looks
   15.78 -    // just like it does after a call instr.
   15.79 -    //Put argument plus fn addr onto stack -- helper will copy into regs
   15.80 -    // then jump to the fn
   15.81 -    //fnPtr is just below top of stack, param is above at stackPtr + 8 bytes
   15.82 -   *((void**)stackPtr + 1 ) = param;
   15.83 -   *((void**)stackPtr) = slaveVP->resumeInstrPtr; //acts as return addr
   15.84 -   *((void**)stackPtr - 1) = (void*)fnPtr;        //what helper jmps to
   15.85 -   
   15.86 -    //Set slave's instr pointer to a helper Fn that copies params from stack
   15.87 -   slaveVP->resumeInstrPtr  = (TopLevelFnPtr)&jmpToOneParamFn;
   15.88 -   
   15.89 -// end of Hardware dependent part           
   15.90 -   
   15.91 -      //core controller will switch to stack & frame pointers stored in slave,
   15.92 -      // then jmp to helper Fn, which will then move param to register used
   15.93 -      // to pass argument and jmp to fnPtr saved on stack.
   15.94 -      //That fn should save the framePtr on stack and make room
   15.95 -      // for its own frame, as normal.  So don't modify framePtr, only stack
   15.96 -   slaveVP->stackPtr = stackPtr;
   15.97 - }
   15.98 -
   15.99 -
  15.100 -/*Same as for one-parameter function, but puts two arguments on stack
  15.101 - *Preserve the stack, pushing the __cdecl structure onto it
  15.102 - * For 64 bits, params passed in regs, so point slave to helper assembly
  15.103 - * that copies the arguments off stack and into regs, then
  15.104 - * jumps to Fn.  So, set the resumeInstrPtr to the helper-assembly.
  15.105 - * 
  15.106 - *This preserves the stack state existed at time slave was suspended.
  15.107 - */
  15.108 -inline void
  15.109 -VMS_int__point_slaveVP_to_TwoParamFn( SlaveVP *slaveVP, void *fnPtr,
  15.110 -                              void    *param1, void *param2)
  15.111 - { void  *stackPtr;
  15.112 -
  15.113 -// Start of Hardware dependent part           
  15.114 -   
  15.115 -    // Get the slave's current stack ptr, and make room for param + ret addr
  15.116 -   stackPtr = slaveVP->stackPtr - 3;
  15.117 -  
  15.118 -    //save slave's current instr ptr as the return addr, so stack looks
  15.119 -    // just like it does after a call instr.
  15.120 -    //Put argument plus fn addr onto stack -- helper will copy into regs
  15.121 -    // then jump to the fn
  15.122 -    //fnPtr is just below top of stack, param1 is above at stackPtr + 8 bytes
  15.123 -   *((void**)stackPtr + 2 ) = param2;
  15.124 -   *((void**)stackPtr + 1 ) = param1;
  15.125 -   *((void**)stackPtr) = slaveVP->resumeInstrPtr; //acts as return addr
  15.126 -   *((void**)stackPtr - 1) = (void*)fnPtr;        //what helper jmps to
  15.127 -   
  15.128 -    //Set slave's instr pointer to a helper Fn that copies params from stack
  15.129 -   slaveVP->resumeInstrPtr  = (TopLevelFnPtr)&jmpToTwoParamFn;
  15.130 -   
  15.131 -// end of Hardware dependent part           
  15.132 -   
  15.133 -      //core controller will switch to stack & frame pointers stored in slave,
  15.134 -      // then jmp to helper Fn, which will then move param to register used
  15.135 -      // to pass argument and jmp to fnPtr saved on stack.
  15.136 -      //That fn should save the framePtr on stack and make room
  15.137 -      // for its own frame, as normal.  So don't modify framePtr, only stack
  15.138 -   slaveVP->stackPtr = stackPtr;
  15.139 - }
  15.140 -
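Review note: the slot layout that `VMS_int__point_slaveVP_to_OneParamFn` builds can be checked on an ordinary array standing in for the slave's stack. The sketch below is an illustration under assumptions: `MiniSlave` and `push_one_param_frame` are hypothetical names, and plain data pointers stand in for the function and helper addresses. It also uses the `(void **)` cast form throughout. The two-parameter variant above computes `slaveVP->stackPtr - 3` on a `void *`, which under GCC's void-pointer arithmetic extension steps by bytes rather than by pointer-sized slots, so that line may deserve a second look.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two SlaveVP fields the routine touches. */
typedef struct {
    void *stackPtr;
    void *resumeInstrPtr;
} MiniSlave;

/* Mirrors the layout described in the comments above:
 *   sp[-1] = fnPtr          (what the helper jumps to)
 *   sp[ 0] = old resume ptr (acts as the return address)
 *   sp[+1] = param          (the helper copies this into a register)
 * The stack grows down, so making room means subtracting slots. */
void push_one_param_frame(MiniSlave *slv, void *fnPtr, void *param,
                          void *helper)
{
    void **sp = (void **)slv->stackPtr - 2;  /* room for param + ret addr */

    sp[1]  = param;
    sp[0]  = slv->resumeInstrPtr;
    sp[-1] = fnPtr;

    slv->resumeInstrPtr = helper;  /* slave resumes inside the helper */
    slv->stackPtr       = sp;      /* frame pointer is left untouched */
}
```

With the stack pointer starting at slot 6 of an 8-slot array, the call leaves it at slot 4, with the parameter in slot 5 and the function address in slot 3.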
    16.1 --- a/HW_Dependent_Primitives/VMS__primitives.h	Mon Sep 03 03:34:54 2012 -0700
    16.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    16.3 @@ -1,55 +0,0 @@
    16.4 -/*
    16.5 - *  Copyright 2009 OpenSourceStewardshipFoundation.org
    16.6 - *  Licensed under GNU General Public License version 2
    16.7 - *
    16.8 - * Author: seanhalle@yahoo.com
    16.9 - * 
   16.10 - */
   16.11 -
   16.12 -#ifndef  _VMS__PRIMITIVES_H
   16.13 -#define	_VMS__PRIMITIVES_H
   16.14 -#define _GNU_SOURCE
   16.15 -
   16.16 -void 
   16.17 -recordCoreCtlrReturnLabelAddr(void **returnAddress);
   16.18 -
   16.19 -void 
   16.20 -switchToSlv(SlaveVP *nextSlave);
   16.21 -
   16.22 -void 
   16.23 -switchToCoreCtlr(SlaveVP *nextSlave);
   16.24 -
   16.25 -void 
   16.26 -masterSwitchToCoreCtlr(SlaveVP *nextSlave);
   16.27 -
   16.28 -void 
   16.29 -startUpTopLevelFn();
   16.30 -
   16.31 -void 
   16.32 -jmpToOneParamFn();
   16.33 -
   16.34 -void 
   16.35 -jmpToTwoParamFn();
   16.36 -
   16.37 -void *
   16.38 -asmTerminateCoreCtlr(SlaveVP *currSlv);
   16.39 -
   16.40 -#define flushRegisters() \
   16.41 -        asm volatile ("":::"%rbx", "%r12", "%r13","%r14","%r15")
   16.42 -
   16.43 -void
   16.44 -VMS_int__save_return_into_ptd_to_loc_then_do_ret(void *ptdToLoc);
   16.45 -
   16.46 -void
   16.47 -VMS_int__return_to_addr_in_ptd_to_loc(void *ptdToLoc);
   16.48 -
   16.49 -inline void
   16.50 -VMS_int__point_slaveVP_to_OneParamFn( SlaveVP *slaveVP, void *fnPtr,
   16.51 -                              void    *param);
   16.52 -
   16.53 -inline void
   16.54 -VMS_int__point_slaveVP_to_TwoParamFn( SlaveVP *slaveVP, void *fnPtr,
   16.55 -                              void    *param1, void *param2);
   16.56 -
   16.57 -#endif	/* _VMS__HW_DEPENDENT_H */
   16.58 -
    17.1 --- a/HW_Dependent_Primitives/VMS__primitives_asm.s	Mon Sep 03 03:34:54 2012 -0700
    17.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    17.3 @@ -1,189 +0,0 @@
    17.4 -.data
    17.5 -
    17.6 -
    17.7 -.text
    17.8 -
    17.9 -//Save return label address for the coreCtlr to pointer
   17.10 -//Arguments: Pointer to variable holding address
   17.11 -.globl recordCoreCtlrReturnLabelAddr
   17.12 -recordCoreCtlrReturnLabelAddr:
   17.13 -    movq    $coreCtlrReturn, %rcx  #load label address
   17.14 -    movq    %rcx, (%rdi)           #save address to pointer
   17.15 -    ret
   17.16 -
   17.17 -
   17.18 -//Trick for 64 bit arch -- copies args from stack into regs, then does jmp to
   17.19 -// the top-level function, which was pointed to by the stack-ptr
   17.20 -.globl startUpTopLevelFn
   17.21 -startUpTopLevelFn:
   17.22 -    movq    %rdi      , %rsi #get second argument from first argument of switchSlv
   17.23 -    movq    0x08(%rsp), %rdi #get first argument from stack
   17.24 -    movq    (%rsp)    , %rax #get top-level function's addr from stack
   17.25 -    jmp     *%rax            #jump to the top-level function
   17.26 -
   17.27 -
   17.28 -//Args passed in regs in 64 bit arch. This copies args from stack into regs,
   17.29 -// then does jmp to the function, whose addr is on stack.
   17.30 -//For 64bit, %rdi is first arg, %rsi is second arg to function
   17.31 -//The top of stack is a valid return addr (old value of slaveVP's instrPtr),
   17.32 -// and the fnPtr is just below the top of stack (will be overwritten when
   17.33 -// fn saves the frame ptr)
   17.34 -.globl jmpToOneParamFn
   17.35 -jmpToOneParamFn:
   17.36 -    movq    0x08(%rsp), %rdi #get the argument from stack
   17.37 -    movq   -0x08(%rsp), %rax #get function's addr from stack
   17.38 -    jmp     *%rax            #jump to the function
   17.39 -
   17.40 -.globl jmpToTwoParamFn
   17.41 -jmpToTwoParamFn:
   17.42 -    movq    0x10(%rsp), %rsi #get the second argument from stack
   17.43 -    movq    0x08(%rsp), %rdi #get the first argument from stack
   17.44 -    movq   -0x08(%rsp), %rax #get function's addr from stack
   17.45 -    jmp     *%rax            #jump to the function
   17.46 -
   17.47 -
    17.48 -//Switches from CoreCtlr to either a normal Slv VP or the Master VP
   17.49 -//switch to VP's stack and frame ptr then jump to VP's next-instr-ptr
   17.50 -/* SlaveVP  offsets:
   17.51 - * 0x00  stackPtr
   17.52 - * 0x08 framePtr
   17.53 - * 0x10 resumeInstrPtr
   17.54 - * 0x18 coreCtlrFramePtr
   17.55 - * 0x20 coreCtlrStackPtr
   17.56 - *
   17.57 - * _VMSMasterEnv  offsets:
   17.58 - * 0x00 coreCtlrReturnPt
   17.59 - * 0x100 masterLock
   17.60 - */
   17.61 -.globl switchToSlv
   17.62 -switchToSlv:
   17.63 -    #SlaveVP in %rdi
   17.64 -    movq    %rsp      , 0x20(%rdi)   #save core ctlr stack pointer 
   17.65 -    movq    %rbp      , 0x18(%rdi)   #save core ctlr frame pointer
   17.66 -    movq    0x00(%rdi), %rsp         #restore stack pointer
   17.67 -    movq    0x08(%rdi), %rbp         #restore frame pointer
   17.68 -    movq    0x10(%rdi), %rax         #get jmp pointer
   17.69 -    jmp     *%rax                    #jmp to Slv
   17.70 -coreCtlrReturn:
   17.71 -    ret
   17.72 -
   17.73 -    
   17.74 -//switches to core controller. saves return address
   17.75 -/* SlaveVP  offsets:
   17.76 - * 0x00  stackPtr
   17.77 - * 0x08 framePtr
   17.78 - * 0x10 resumeInstrPtr
   17.79 - * 0x18 coreCtlrFramePtr
   17.80 - * 0x20 coreCtlrStackPtr
   17.81 - *
   17.82 - * _VMSMasterEnv  offsets:
   17.83 - * 0x00 coreCtlrReturnPt
   17.84 - * 0x100 masterLock
   17.85 - */
   17.86 -.globl switchToCoreCtlr
   17.87 -switchToCoreCtlr:
   17.88 -    #SlaveVP in %rdi
   17.89 -    movq    $SlvReturn, 0x10(%rdi)   #store return address
   17.90 -    movq    %rsp      , 0x00(%rdi)   #save stack pointer 
   17.91 -    movq    %rbp      , 0x08(%rdi)   #save frame pointer
   17.92 -    movq    0x20(%rdi), %rsp         #restore stack pointer
   17.93 -    movq    0x18(%rdi), %rbp         #restore frame pointer
   17.94 -    movq    $_VMSMasterEnv, %rcx
   17.95 -    movq        (%rcx), %rcx         #_VMSMasterEnv is pointer to struct
   17.96 -    movq    0x00(%rcx), %rax         #get CoreCtlrStartPt
   17.97 -    jmp     *%rax                    #jmp to CoreCtlr
   17.98 -SlvReturn:
   17.99 -    ret
  17.100 -
  17.101 -
  17.102 -
  17.103 -//switches to core controller from master. saves return address
  17.104 -//Releases masterLock so the next AnimationMaster can be executed
  17.105 -/* SlaveVP  offsets:
  17.106 - * 0x00  stackPtr
  17.107 - * 0x08 framePtr
  17.108 - * 0x10 resumeInstrPtr
  17.109 - * 0x18 coreCtlrFramePtr
  17.110 - * 0x20 coreCtlrStackPtr
  17.111 - *
  17.112 - * _VMSMasterEnv  offsets:
  17.113 - * 0x00 coreCtlrReturnPt
  17.114 - * 0x100 masterLock
  17.115 - */
  17.116 -.globl masterSwitchToCoreCtlr
  17.117 -masterSwitchToCoreCtlr:
  17.118 -    #SlaveVP in %rdi
  17.119 -    movq    $MasterReturn, 0x10(%rdi)   #store return address
  17.120 -    movq    %rsp      , 0x00(%rdi)   #save stack pointer 
  17.121 -    movq    %rbp      , 0x08(%rdi)   #save frame pointer
  17.122 -    movq    0x20(%rdi), %rsp         #restore stack pointer
  17.123 -    movq    0x18(%rdi), %rbp         #restore frame pointer
  17.124 -    movq    $_VMSMasterEnv, %rcx
  17.125 -    movq        (%rcx), %rcx         #_VMSMasterEnv is pointer to struct
  17.126 -    movq    0x00(%rcx), %rax         #get CoreCtlr return pt
  17.127 -    movl    $0x0      , 0x100(%rcx)  #release lock
  17.128 -    jmp     *%rax                    #jmp to CoreCtlr
  17.129 -MasterReturn:
  17.130 -    ret
  17.131 -
  17.132 -
  17.133 -/*Switch to terminateCoreCtlr
  17.134 - *This is called by endOSThreadFn, which is the top-level function given
  17.135 - * to a shutdown slave.  When such a slave gets switched to, by the core
  17.136 - * controller, it runs the top-level function, which calls this, which
  17.137 - * then calls terminateCoreCtlr, which ends the pthread.  Note, when get
  17.138 - * here, stack is already set up for switchSlv and Slv ptr is in %rdi.
  17.139 - *Do not save registers of Slv because this function will never return
  17.140 - *
  17.141 - * SlaveVP  offsets:
  17.142 - * 0x00  stackPtr
  17.143 - * 0x08 framePtr
  17.144 - * 0x10 resumeInstrPtr
  17.145 - * 0x18 coreCtlrFramePtr
  17.146 - * 0x20 coreCtlrStackPtr
  17.147 - *
  17.148 - * _VMSMasterEnv  offsets:
  17.149 - * 0x00 coreCtlrReturnPt
  17.150 - * 0x100 masterLock
  17.151 - */
  17.152 -.globl asmTerminateCoreCtlr
  17.153 -asmTerminateCoreCtlr:                #SlaveVP ptr is in %rdi
  17.154 -    movq    0x20(%rdi), %rsp         #restore stack pointer
  17.155 -    movq    0x18(%rdi), %rbp         #restore frame pointer
  17.156 -    movq    $terminateCoreCtlr, %rax
  17.157 -    jmp     *%rax                    #jmp to fn that ends the pthread
  17.158 -
  17.159 -
  17.160 -/*
  17.161 - * This one for the sequential version is special. It discards the current stack
  17.162 - * and returns directly from the coreCtlr after VMS_WL__dissipate_slaveVP was called
  17.163 - */
  17.164 -.globl asmTerminateCoreCtlrSeq
  17.165 -asmTerminateCoreCtlrSeq:
  17.166 -    #SlaveVP in %rdi
  17.167 -    movq    0x20(%rdi), %rsp         #restore stack pointer
  17.168 -    movq    0x18(%rdi), %rbp         #restore frame pointer
  17.169 -    #argument is in %rdi
  17.170 -    call    VMS_int__dissipate_slaveVP
  17.171 -    movq    %rbp      , %rsp        #goto the coreCtlrs stack
  17.172 -    pop     %rbp        #restore the old framepointer
  17.173 -    ret                 #return from core controller
  17.174 -    
  17.175 -
  17.176 -//Takes the return addr off the stack and saves into the loc pointed to by
  17.177 -// by the parameter passed in via rdi.  Return addr is at 0x8(%rbp) for 64bit
  17.178 -.globl VMS_int__save_return_into_ptd_to_loc_then_do_ret
  17.179 -VMS_int__save_return_into_ptd_to_loc_then_do_ret:
  17.180 -    movq 0x08(%rbp),   %rax  #get ret address, rbp is the same as in the calling function
  17.181 -    movq      %rax,   (%rdi) #write ret addr into addr passed as param field
  17.182 -    ret
  17.183 -
  17.184 -
  17.185 -//Assembly code changes the return addr on the stack to the one
  17.186 -// pointed to by the parameter, then returns. Stack's return addr is at 0x8(%rbp)
  17.187 -.globl VMS_int__return_to_addr_in_ptd_to_loc
  17.188 -VMS_int__return_to_addr_in_ptd_to_loc:
  17.189 -    movq   (%rdi),     %rax  #get return addr from addr passed as param
  17.190 -    movq    %rax, 0x08(%rbp) #write return addr to the stack of the caller
  17.191 -    ret
  17.192 -
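Review note: the assembly above hard-codes field offsets (0x00 through 0x20 in SlaveVP, 0x00 and 0x100 in the master environment). One way to guard against the structs and the assembly drifting apart is an `offsetof` check. The sketch below is an assumption-laden illustration: `SlaveVPHead` and `MasterEnvHead` are hypothetical mirrors of only the asm-visible leading fields, and the documented offsets hold only where pointers are 8 bytes wide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the asm-visible head of SlaveVP. */
typedef struct {
    void *stackPtr;
    void *framePtr;
    void *resumeInstrPtr;
    void *coreCtlrFramePtr;
    void *coreCtlrStackPtr;
} SlaveVPHead;

/* Hypothetical mirror of the asm-visible head of the master env. */
typedef struct {
    void    *coreCtlrReturnPt;
    int8_t   falseSharePad1[256 - sizeof(void *)];
    int32_t  masterLock;
} MasterEnvHead;

/* Returns 1 when the C layout matches the offsets listed in the
 * assembly comments, 0 otherwise (for example on a 32-bit target). */
int asm_offsets_match(void)
{
    return offsetof(SlaveVPHead, stackPtr)           == 0x00
        && offsetof(SlaveVPHead, framePtr)           == 0x08
        && offsetof(SlaveVPHead, resumeInstrPtr)     == 0x10
        && offsetof(SlaveVPHead, coreCtlrFramePtr)   == 0x18
        && offsetof(SlaveVPHead, coreCtlrStackPtr)   == 0x20
        && offsetof(MasterEnvHead, coreCtlrReturnPt) == 0x00
        && offsetof(MasterEnvHead, masterLock)       == 0x100;
}
```

In a real build the same comparisons could be written as compile-time assertions against the actual structs, so a reordered field fails the build instead of corrupting a context switch.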
    18.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    18.2 +++ b/PR.h	Wed Sep 19 23:12:44 2012 -0700
    18.3 @@ -0,0 +1,442 @@
    18.4 +/*
    18.5 + *  Copyright 2009 OpenSourceStewardshipFoundation.org
    18.6 + *  Licensed under GNU General Public License version 2
    18.7 + *
    18.8 + * Author: seanhalle@yahoo.com
    18.9 + * 
   18.10 + */
   18.11 +
   18.12 +#ifndef _PR_H
   18.13 +#define	_PR_H
   18.14 +#define _GNU_SOURCE
   18.15 +
   18.16 +#include "DynArray/DynArray.h"
   18.17 +#include "Hash_impl/PrivateHash.h"
   18.18 +#include "Histogram/Histogram.h"
   18.19 +#include "Queue_impl/PrivateQueue.h"
   18.20 +
   18.21 +#include "PR_primitive_data_types.h"
   18.22 +#include "Services_Offered_by_PR/Memory_Handling/vmalloc.h"
   18.23 +
   18.24 +#include <pthread.h>
   18.25 +#include <sys/time.h>
   18.26 +
   18.27 +//=================  Defines: included from separate files  =================
   18.28 +//
   18.29 +// Note: ALL defines are in other files, none are in here
   18.30 +//
   18.31 +#include "Defines/PR_defs.h"
   18.32 +
   18.33 +
   18.34 +//================================ Typedefs =================================
   18.35 +//
   18.36 +typedef unsigned long long    TSCount;
   18.37 +
   18.38 +typedef struct _AnimSlot     AnimSlot;
   18.39 +typedef struct _PRReqst      PRReqst;
   18.40 +typedef struct _SlaveVP       SlaveVP;
   18.41 +typedef struct _MasterVP      MasterVP;
   18.42 +typedef struct _IntervalProbe IntervalProbe;
   18.43 +
   18.44 +
   18.45 +typedef SlaveVP *(*SlaveAssigner)  ( void *, AnimSlot*); //semEnv, slot for HW info
   18.46 +typedef void     (*RequestHandler) ( SlaveVP *, void * ); //prWReqst, semEnv
   18.47 +typedef void     (*TopLevelFnPtr)  ( void *, SlaveVP * ); //initData, animSlv
   18.48 +typedef void       TopLevelFn      ( void *, SlaveVP * ); //initData, animSlv
   18.49 +typedef void     (*ResumeSlvFnPtr) ( SlaveVP *, void * );
   18.50 +      //=========== MEASUREMENT STUFF ==========
   18.51 +        MEAS__Insert_Counter_Handler
   18.52 +      //========================================
   18.53 +
   18.54 +//============================ HW Dependent Fns ================================
   18.55 +
   18.56 +#include "HW_Dependent_Primitives/PR__HW_measurement.h"
   18.57 +#include "HW_Dependent_Primitives/PR__primitives.h"
   18.58 +
   18.59 +
   18.60 +//============= Request Related ===========
   18.61 +//
   18.62 +
   18.63 +enum PRReqstType   //avoid starting enums at 0, for debug reasons
   18.64 + {
   18.65 +   semantic = 1,
   18.66 +   createReq,
   18.67 +   dissipate,
   18.68 +   PRSemantic      //goes with PRSemReqst below
   18.69 + };
   18.70 +
   18.71 +struct _PRReqst
   18.72 + {
   18.73 +   enum PRReqstType  reqType;//used for dissipate and in future for IO requests
   18.74 +   void              *semReqData;
   18.75 +
   18.76 +   PRReqst *nextReqst;
   18.77 + };
   18.78 +//PRReqst
   18.79 +
   18.80 +enum PRSemReqstType   //These are equivalent to semantic requests, but for
   18.81 + {                     // PR's services available directly to app, like OS
   18.82 +   make_probe = 1,    // and probe services -- like a PR-wide built-in lang
   18.83 +   throw_excp,
   18.84 +   openFile,
   18.85 +   otherIO
   18.86 + };
   18.87 +
   18.88 +typedef struct
   18.89 + { enum PRSemReqstType reqType;
   18.90 +   SlaveVP             *requestingSlv;
   18.91 +   char                *nameStr;  //for create probe
   18.92 +   char                *msgStr;   //for exception
   18.93 +   void                *exceptionData;
   18.94 + }
   18.95 + PRSemReq;
   18.96 +
   18.97 +
   18.98 +//====================  Core data structures  ===================
   18.99 +
  18.100 +typedef struct
  18.101 + {
  18.102 +   //for future expansion
  18.103 + }
  18.104 +SlotPerfInfo;
  18.105 +
  18.106 +struct _AnimSlot
  18.107 + {
  18.108 +   int           workIsDone;
  18.109 +   int           needsSlaveAssigned;
  18.110 +   SlaveVP      *slaveAssignedToSlot;
  18.111 +   
  18.112 +   int           slotIdx;  //needed by Holistic Model's data gathering
  18.113 +   int           coreSlotIsOn;
  18.114 +   SlotPerfInfo *perfInfo; //used by assigner to pick best slave for core
  18.115 + };
  18.116 +//AnimSlot
  18.117 +
  18.118 +enum VPtype 
  18.119 + { TaskSlotSlv = 1,//Slave tied to an anim slot, only animates tasks
  18.120 +   TaskExtraSlv,   //When a suspended task ends, the slave becomes this
  18.121 +   PersistentSlv,  //the VP is explicitly seen in the app code, or task suspends
  18.122 +   Slave, //to be removed
  18.123 +   Master,
  18.124 +   Shutdown,
  18.125 +   Idle
  18.126 + };
  18.127 + 
  18.128 +/*This structure embodies the state of a slaveVP.  It is reused for masterVP
  18.129 + * and shutdownVPs.
  18.130 + */
  18.131 +struct _SlaveVP
  18.132 + {    //The offsets of these fields are hard-coded into assembly
  18.133 +   void       *stackPtr;         //save the core's stack ptr when suspend
  18.134 +   void       *framePtr;         //save core's frame ptr when suspend
  18.135 +   void       *resumeInstrPtr;   //save core's program-counter when suspend
  18.136 +   void       *coreCtlrFramePtr; //restore before jmp back to core controller
  18.137 +   void       *coreCtlrStackPtr; //restore before jmp back to core controller
  18.138 +   
  18.139 +      //============ below this, no fields are used in asm =============
  18.140 +   
  18.141 +   int         slaveID;       //each slave given a globally unique ID
  18.142 +   int         coreAnimatedBy; 
  18.143 +   void       *startOfStack;  //used to free, and to point slave to Fn
  18.144 +   enum VPtype typeOfVP;      //Slave vs Master vs Shutdown..
  18.145 +   int         assignCount;   //Each assign is for one work-unit, so IDs it
  18.146 +      //note, a scheduling decision is uniquely identified by the triple:
  18.147 +      // <slaveID, coreAnimatedBy, assignCount> -- used in record & replay
  18.148 +   
  18.149 +      //for comm -- between master and coreCtlr & btwn wrapper lib and plugin
  18.150 +   AnimSlot   *animSlotAssignedTo;
  18.151 +   PRReqst   *request;      //wrapper lib puts in requests, plugin takes out
  18.152 +   void       *dataRetFromReq;//Return vals from plugin to Wrapper Lib
  18.153 +
  18.154 +      //For using Slave as carrier for data
  18.155 +   void       *semanticData;  //Lang saves lang-specific things in slave here
  18.156 +
  18.157 +        //=========== MEASUREMENT STUFF ==========
  18.158 +         MEAS__Insert_Meas_Fields_into_Slave;
  18.159 +         float64     createPtInSecs;  //time VP created, in seconds
  18.160 +        //========================================
  18.161 + };
  18.162 +//SlaveVP
  18.163 +
  18.164 + 
  18.165 +/* The one and only global variable, holds many odds and ends
  18.166 + */
  18.167 +typedef struct
  18.168 + {    //The offsets of these fields are hard-coded into assembly
  18.169 +   void            *coreCtlrReturnPt;    //offset to this field used in asm
  18.170 +   int8             falseSharePad1[256 - sizeof(void*)];
  18.171 +   int32            masterLock;          //offset to this field used in asm
  18.172 +   int8             falseSharePad2[256 - sizeof(int32)];
  18.173 +      //============ below this, no fields are used in asm =============
  18.174 +
  18.175 +      //Basic PR infrastructure
  18.176 +   SlaveVP        **masterVPs;
  18.177 +   AnimSlot      ***allAnimSlots;
  18.178 +   
  18.179 +      //plugin related
  18.180 +   PRSemEnv       **langlets;
  18.181 +   
  18.182 +      //Slave creation -- global count of slaves existing, across langs and processes
  18.183 +   int32            numSlavesCreated;  //used to give unique ID to processor
  18.184 +//no reasonable way to do fail-safe when have mult langlets and processes.. have to detect for each langlet separately
  18.185 +//   int32            numSlavesAlive;    //used to detect fail-safe shutdown
  18.186 +
  18.187 +      //Initialization related
  18.188 +   int32            setupComplete;      //use while starting up coreCtlr
  18.189 +
  18.190 +      //Memory management related
  18.191 +   MallocArrays    *freeLists;
  18.192 +   int32            amtOfOutstandingMem;//total currently allocated
  18.193 +
  18.194 +      //Random number seeds -- random nums used in various places  
  18.195 +   uint32_t seed1;
  18.196 +   uint32_t seed2;
  18.197 +
  18.198 +      //=========== MEASUREMENT STUFF =============
  18.199 +       IntervalProbe   **intervalProbes;
  18.200 +       PtrToPrivDynArray *dynIntervalProbesInfo;
  18.201 +       HashTable        *probeNameHashTbl;
  18.202 +       int32             masterCreateProbeID;
  18.203 +       float64           createPtInSecs; //real-clock time PR initialized
  18.204 +       Histogram       **measHists;
  18.205 +       PtrToPrivDynArray *measHistsInfo;
  18.206 +       MEAS__Insert_Susp_Meas_Fields_into_MasterEnv;
  18.207 +       MEAS__Insert_Master_Meas_Fields_into_MasterEnv;
  18.208 +       MEAS__Insert_Master_Lock_Meas_Fields_into_MasterEnv;
  18.209 +       MEAS__Insert_Malloc_Meas_Fields_into_MasterEnv;
  18.210 +       MEAS__Insert_Plugin_Meas_Fields_into_MasterEnv;
  18.211 +       MEAS__Insert_System_Meas_Fields_into_MasterEnv;
  18.212 +       MEAS__Insert_Counter_Meas_Fields_into_MasterEnv;
  18.213 +      //==========================================
  18.214 + }
  18.215 +MasterEnv;
  18.216 +
  18.217 +//=====================
  18.218 +typedef struct
  18.219 + { int32   langletID; //acts as index into array of langlets in master env
  18.220 +   void   *langletSemEnv;
  18.221 +   int32   langMagicNumber;
  18.222 +   SlaveAssigner    slaveAssigner;
  18.223 +   RequestHandler   requestHandler;
  18.224 +   EndTaskHandler   endTaskHandler;
  18.225 +   
   18.226 +      //Track slaves created, separately for each langlet (in each process)
  18.227 +   int32            numSlavesCreated;  //gives ordering to processor creation
  18.228 +   int32            numSlavesAlive;    //used to detect fail-safe shutdown
  18.229 +   
  18.230 +      //when multi-lang, master polls sem env's to find one with work in it..
  18.231 +      // in single-lang case, flag ignored, master always asks lang for work
  18.232 +   int32   hasWork;    
  18.233 + }
  18.234 +PRSemEnv;
  18.235 +
  18.236 +//=====================  Top Processor level Data Strucs  ======================
  18.237 +typedef struct
  18.238 + { 
  18.239 +   
  18.240 + }
  18.241 +PRProcess;
  18.242 +/*This structure holds all the information PR needs to manage a program.  PR
  18.243 + * stores information about what percent of CPU time the program is getting, 
  18.244 + * 
  18.245 + */
  18.246 +typedef struct
  18.247 + { //void               *semEnv;
  18.248 +   //RequestHdlrFnPtr    requestHandler;
  18.249 +   //SlaveAssignerFnPtr  slaveAssigner;
  18.250 +   int32               numSlavesLive;
  18.251 +   void               *resultToReturn;
  18.252 +  
  18.253 +   SlaveVP        *seedSlv;   
  18.254 +   
  18.255 +      //These are used to coordinate within the main function..?
  18.256 +   bool32          executionIsComplete;
  18.257 +   pthread_mutex_t doneLock; //? not sure need these..?
  18.258 +   pthread_cond_t  doneCond;
  18.259 + }
  18.260 +PRProcess;
  18.261 +
  18.262 +
  18.263 +//=========================  Extra Stuff Data Strucs  =======================
  18.264 +typedef struct
  18.265 + {
  18.266 +
  18.267 + }
  18.268 +PRExcp; //exception
  18.269 +
  18.270 +//=======================  OS Thread related  ===============================
  18.271 +
  18.272 +void * coreController( void *paramsIn );  //standard PThreads fn prototype
  18.273 +void * coreCtlr_Seq( void *paramsIn );  //standard PThreads fn prototype
  18.274 +void animationMaster( void *initData, SlaveVP *masterVP );
  18.275 +
  18.276 +
  18.277 +typedef struct
  18.278 + {
  18.279 +   void           *endThdPt;
  18.280 +   unsigned int    coreNum;
  18.281 + }
  18.282 +ThdParams;
  18.283 +
  18.284 +//=============================  Global Vars ================================
  18.285 +
  18.286 +volatile MasterEnv      *_PRMasterEnv __align_to_cacheline__;
  18.287 +
  18.288 +   //these are global, but only used for startup and shutdown
  18.289 +pthread_t       coreCtlrThdHandles[ NUM_CORES ]; //pthread's virt-procr state
  18.290 +ThdParams      *coreCtlrThdParams [ NUM_CORES ];
  18.291 +
  18.292 +pthread_mutex_t suspendLock;
  18.293 +pthread_cond_t  suspendCond;
  18.294 +
  18.295 +//=========================  Function Prototypes  ===========================
  18.296 +/* MEANING OF   WL  PI  SS  int PROS
  18.297 + * These indicate which places the function is safe to use.  They stand for:
  18.298 + * 
  18.299 + * WL   Wrapper Library -- wrapper lib code should only use these
  18.300 + * PI   Plugin          -- plugin code should only use these
  18.301 + * SS   Startup and Shutdown -- designates these relate to startup & shutdown
  18.302 + * int  internal to PR -- should not be used in wrapper lib or plugin
  18.303 + * PROS means "OS functions for applications to use"
  18.304 + * 
  18.305 + * PR_int__ functions touch internal PR data structs and are only safe
  18.306 + *  to be used inside the master lock.  However, occasionally, they appear
  18.307 + * in wrapper-lib or plugin code.  In those cases, very careful analysis
  18.308 + * has been done to be sure no concurrency issues could arise.
  18.309 + * 
  18.310 + * PR_WL__ functions are all safe for use outside the master lock.
  18.311 + * 
  18.312 + * PROS are only safe for applications to use -- they're like a second
  18.313 + * language mixed in -- but they can't be used inside plugin code, and
  18.314 + * aren't meant for use in wrapper libraries, because they are themselves
  18.315 + * wrapper-library calls!
  18.316 + */
  18.317 +//========== Startup and shutdown ==========
  18.318 +void
  18.319 +PR__start();
  18.320 +
  18.321 +void
  18.322 +PR_SS__start_the_work_then_wait_until_done();
  18.323 +
  18.324 +SlaveVP* 
  18.325 +PR_SS__create_shutdown_slave();
  18.326 +
  18.327 +void
  18.328 +PR_SS__shutdown();
  18.329 +
  18.330 +void
  18.331 +PR_SS__cleanup_at_end_of_shutdown();
  18.332 +
  18.333 +void
  18.334 +PR_SS__register_langlets_semEnv( PRSemEnv *semEnv, int32 VSs_MAGIC_NUMBER, 
  18.335 +                              SlaveVP  *seedVP );
  18.336 +
  18.337 +
  18.338 +//==============    ===============
  18.339 +
  18.340 +inline SlaveVP *
  18.341 +PR_int__create_slaveVP( TopLevelFnPtr fnPtr, void *dataParam );
  18.342 +#define PR_PI__create_slaveVP PR_int__create_slaveVP
  18.343 +#define PR_WL__create_slaveVP PR_int__create_slaveVP
  18.344 +
  18.345 +   //Use this to create a processor inside the entry point & other places outside
  18.346 +   // the PR system boundary (i.e., code not animated by a SlaveVP or MasterVP)
  18.347 +SlaveVP *
  18.348 +PR_ext__create_slaveVP( TopLevelFnPtr fnPtr, void *dataParam );
  18.349 +
  18.350 +inline SlaveVP *
  18.351 +PR_int__create_slaveVP_helper( SlaveVP *newSlv,       TopLevelFnPtr  fnPtr,
  18.352 +                                void      *dataParam, void           *stackLocs );
  18.353 +
  18.354 +inline void
  18.355 +PR_int__reset_slaveVP_to_TopLvlFn( SlaveVP *slaveVP, TopLevelFnPtr fnPtr,
  18.356 +                              void    *dataParam);
  18.357 +
  18.358 +inline void
  18.359 +PR_int__point_slaveVP_to_OneParamFn( SlaveVP *slaveVP, void *fnPtr,
  18.360 +                              void    *param);
  18.361 +
  18.362 +inline void
  18.363 +PR_int__point_slaveVP_to_TwoParamFn( SlaveVP *slaveVP, void *fnPtr,
  18.364 +                              void    *param1, void *param2);
  18.365 +
  18.366 +void
  18.367 +PR_int__dissipate_slaveVP( SlaveVP *slaveToDissipate );
  18.368 +#define PR_PI__dissipate_slaveVP PR_int__dissipate_slaveVP
  18.369 +//WL: dissipate a SlaveVP by sending a request
  18.370 +
  18.371 +void
  18.372 +PR_ext__dissipate_slaveVP( SlaveVP *slaveToDissipate );
  18.373 +
  18.374 +void
  18.375 +PR_int__throw_exception( char *msgStr, SlaveVP *reqstSlv, PRExcp *excpData );
  18.376 +#define PR_PI__throw_exception  PR_int__throw_exception
  18.377 +void
  18.378 +PR_WL__throw_exception( char *msgStr, SlaveVP *reqstSlv,  PRExcp *excpData );
  18.379 +#define PR_App__throw_exception PR_WL__throw_exception
  18.380 +
  18.381 +void *
  18.382 +PR_int__give_sem_env_for( SlaveVP *animSlv );
  18.383 +#define PR_PI__give_sem_env_for  PR_int__give_sem_env_for
  18.384 +#define PR_SS__give_sem_env_for  PR_int__give_sem_env_for
  18.385 +//No WL version -- not safe!  If used in WL, be sure data rd & wr is stable
  18.386 +
  18.387 +
  18.388 +inline void
  18.389 +PR_int__get_master_lock();
  18.390 +
  18.391 +#define PR_int__release_master_lock() _PRMasterEnv->masterLock = UNLOCKED
  18.392 +
  18.393 +inline uint32_t
  18.394 +PR_int__randomNumber();
  18.395 +
  18.396 +//==============  Request Related  ===============
  18.397 +
  18.398 +void
  18.399 +PR_int__suspend_slaveVP_and_send_req( SlaveVP *callingSlv );
  18.400 +
  18.401 +inline void
  18.402 +PR_WL__add_sem_request_in_mallocd_PRReqst( void *semReqData, SlaveVP *callingSlv );
  18.403 +
  18.404 +inline void
  18.405 +PR_WL__send_sem_request( void *semReqData, SlaveVP *callingSlv );
  18.406 +
  18.407 +void
  18.408 +PR_WL__send_create_slaveVP_req( void *semReqData, SlaveVP *reqstingSlv );
  18.409 +
  18.410 +void inline
  18.411 +PR_WL__send_dissipate_req( SlaveVP *prToDissipate );
  18.412 +
  18.413 +inline void
  18.414 +PR_WL__send_PRSem_request( void *semReqData, SlaveVP *callingSlv );
  18.415 +
  18.416 +PRReqst *
  18.417 +PR_PI__take_next_request_out_of( SlaveVP *slaveWithReq );
  18.418 +//#define PR_PI__take_next_request_out_of( slave ) slave->requests
  18.419 +
  18.420 +//inline void *
  18.421 +//PR_PI__take_sem_reqst_from( PRReqst *req );
  18.422 +#define PR_PI__take_sem_reqst_from( req ) req->semReqData
  18.423 +
  18.424 +void inline
  18.425 +PR_PI__handle_PRSemReq( PRReqst *req, SlaveVP *requestingSlv, void *semEnv,
  18.426 +                       ResumeSlvFnPtr resumeSlvFnPtr );
  18.427 +
  18.428 +//======================== MEASUREMENT ======================
  18.429 +uint64
  18.430 +PR_WL__give_num_plugin_cycles();
  18.431 +uint32
  18.432 +PR_WL__give_num_plugin_animations();
  18.433 +
  18.434 +
  18.435 +//========================= Utilities =======================
  18.436 +inline char *
  18.437 +PR_int__strDup( char *str );
  18.438 +
  18.439 +
  18.440 +//========================= Probes =======================
  18.441 +#include "Services_Offered_by_PR/Measurement_and_Stats/probes.h"
  18.442 +
  18.443 +//================================================
  18.444 +#endif	/* _PR_H */
  18.445 +
    19.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    19.2 +++ b/PR__PI.c	Wed Sep 19 23:12:44 2012 -0700
    19.3 @@ -0,0 +1,121 @@
    19.4 +/*
    19.5 + * Copyright 2010  OpenSourceStewardshipFoundation
    19.6 + *
    19.7 + * Licensed under BSD
    19.8 + */
    19.9 +
   19.10 +#include <stdio.h>
   19.11 +#include <stdlib.h>
   19.12 +#include <string.h>
   19.13 +#include <malloc.h>
   19.14 +#include <inttypes.h>
   19.15 +#include <sys/time.h>
   19.16 +
   19.17 +#include "PR.h"
   19.18 +
   19.19 +
   19.20 +/* MEANING OF   WL  PI  SS  int
   19.21 + * These indicate which places the function is safe to use.  They stand for:
   19.22 + * WL: Wrapper Library
   19.23 + * PI: Plugin 
   19.24 + * SS: Startup and Shutdown
   19.25 + * int: internal to the PR implementation
   19.26 + */
   19.27 +
   19.28 +//=========================  Local Declarations  ========================
   19.29 +void inline
   19.30 +handleMakeProbe( PRSemReq *semReq, void *semEnv, ResumeSlvFnPtr resumeFn );
   19.31 +
   19.32 +void inline
   19.33 +handleThrowException( PRSemReq *semReq, void *semEnv, ResumeSlvFnPtr resumeFn );
   19.34 +//=======================================================================
   19.35 +
   19.36 + 
   19.37 +PRReqst *
   19.38 +PR_PI__take_next_request_out_of( SlaveVP *slaveWithReq )
   19.39 + { PRReqst *req;
   19.40 +
   19.41 +   req = slaveWithReq->request;
   19.42 +   if( req == NULL ) return NULL;
   19.43 +
   19.44 +   slaveWithReq->request = slaveWithReq->request->nextReqst;
   19.45 +   return req;
   19.46 + }
   19.47 +
   19.48 + 
   19.49 +
   19.50 +/*May 2012
   19.51 + *CHANGED IMPL -- now a macro in header file
   19.52 + *
   19.53 + *Turn function into macro that just accesses the request field
   19.54 + *
   19.55 +inline void *
   19.56 +PR_PI__take_sem_reqst_from( PRReqst *req )
   19.57 + {
   19.58 +   return req->semReqData;
   19.59 + }
   19.60 +*/
   19.61 +
   19.62 +
   19.63 +/* This is for OS requests and PR infrastructure requests, such as to create
   19.64 + *  a probe -- a probe is inside the heart of PR-core, it's not part of any
   19.65 + *  language -- but it's also a semantic thing that's triggered from and used
   19.66 + *  in the application.. so it crosses abstractions..  so, need some special
   19.67 + *  pattern here for handling such requests.
   19.68 + * Doing this just like it were a second language sharing PR-core.
   19.69 + * 
   19.70 + * This is called from the language's request handler when it sees a request
   19.71 + *  of type PRSemReq
   19.72 + *
   19.73 + * TODO: Later change this, to give probes their own separate plugin & have
   19.74 + *  PR-core steer the request to appropriate plugin
   19.75 + * Do the same for OS calls -- look later at it..
   19.76 + */
   19.77 +void inline
   19.78 +PR_PI__handle_PRSemReq( PRReqst *req, SlaveVP *requestingSlv, void *semEnv,
   19.79 +                       ResumeSlvFnPtr resumeFn )
   19.80 + { PRSemReq *semReq;
   19.81 +
   19.82 +   semReq = PR_PI__take_sem_reqst_from(req);
   19.83 +   if( semReq == NULL ) return;
   19.84 +   switch( semReq->reqType )  //sem handlers are all in other file
   19.85 +    {
   19.86 +      case make_probe:      handleMakeProbe(   semReq, semEnv, resumeFn);
   19.87 +         break;
   19.88 +      case throw_excp:  handleThrowException(  semReq, semEnv, resumeFn);
   19.89 +         break;
   19.90 +    }
   19.91 + }
   19.92 +
   19.93 +/*
   19.94 + */
   19.95 +void inline
   19.96 +handleMakeProbe( PRSemReq *semReq, void *semEnv, ResumeSlvFnPtr resumeFn )
   19.97 + { IntervalProbe *newProbe;
   19.98 +
   19.99 +   newProbe          = PR_int__malloc( sizeof(IntervalProbe) );
  19.100 +   newProbe->nameStr = PR_int__strDup( semReq->nameStr );
  19.101 +   newProbe->hist    = NULL;
  19.102 +   newProbe->schedChoiceWasRecorded = FALSE;
  19.103 +
  19.104 +      //This runs in masterVP, so no race-condition worries
  19.105 +   newProbe->probeID =
  19.106 +            addToDynArray( newProbe, _PRMasterEnv->dynIntervalProbesInfo );
  19.107 +
  19.108 +   semReq->requestingSlv->dataRetFromReq = newProbe;
  19.109 +
  19.110 +   //This is inside PR, while resume_slaveVP fn is inside language, so pass
  19.111 +   // pointer from lang to here, then call it.
  19.112 +   (*resumeFn)( semReq->requestingSlv, semEnv );
  19.113 + }
  19.114 +
  19.115 +void inline
  19.116 +handleThrowException( PRSemReq *semReq, void *semEnv, ResumeSlvFnPtr resumeFn )
  19.117 + {
  19.118 +   PR_int__throw_exception(  semReq->msgStr, semReq->requestingSlv, semReq->exceptionData );
  19.119 +   
  19.120 +   (*resumeFn)( semReq->requestingSlv, semEnv );
  19.121 + }
  19.122 +
  19.123 +
  19.124 +
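Note on PR_PI__handle_PRSemReq above: the handlers resume the requesting slave through a function pointer supplied by the language, and the switch has no default case, so a request of an unrecognized type returns without resuming anyone. The sketch below illustrates that dispatch-and-resume pattern in isolation. The Slave and SemReq structs, the dummyProbe variable, and the int return value are hypothetical simplifications for illustration, not the PR definitions; only the make_probe and throw_excp names come from the diff.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { make_probe, throw_excp } SemReqType;  /* names taken from the diff */

/* Hypothetical, cut-down stand-ins for SlaveVP and PRSemReq. */
typedef struct { int timesResumed; void *dataRetFromReq; } Slave;
typedef struct { SemReqType reqType; Slave *requestingSlv; } SemReq;

/* The language layer passes its resume function in; the core only calls it. */
typedef void (*ResumeFn)(Slave *slv, void *semEnv);

static int dummyProbe;  /* placeholder for a real probe object */

static void handle_make_probe(SemReq *req, void *semEnv, ResumeFn resumeFn)
{
    req->requestingSlv->dataRetFromReq = &dummyProbe;  /* result handed back to the slave */
    resumeFn(req->requestingSlv, semEnv);              /* make the slave runnable again */
}

/* Returns 1 if the request was handled, 0 if it was NULL or of an unhandled
 * type. Reporting the unhandled case is an addition of this sketch. */
static int handle_sem_req(SemReq *req, void *semEnv, ResumeFn resumeFn)
{
    if (req == NULL) return 0;
    switch (req->reqType) {
    case make_probe:
        handle_make_probe(req, semEnv, resumeFn);
        return 1;
    default:
        return 0;  /* caller decides what to do with a slave left suspended */
    }
}

/* Example resume function: just counts how often the slave was resumed. */
static void count_resume(Slave *slv, void *semEnv)
{
    (void)semEnv;
    slv->timesResumed += 1;
}
```

Returning a status is one possible way to let the calling request handler notice a slave that would otherwise stay suspended; the changeset itself does not do this.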
    20.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    20.2 +++ b/PR__WL.c	Wed Sep 19 23:12:44 2012 -0700
    20.3 @@ -0,0 +1,160 @@
    20.4 +/*
    20.5 + * Copyright 2010  OpenSourceStewardshipFoundation
    20.6 + *
    20.7 + * Licensed under BSD
    20.8 + */
    20.9 +
   20.10 +#include <stdio.h>
   20.11 +#include <stdlib.h>
   20.12 +#include <string.h>
   20.13 +#include <malloc.h>
   20.14 +#include <inttypes.h>
   20.15 +#include <sys/time.h>
   20.16 +
   20.17 +#include "PR.h"
   20.18 +
   20.19 +
   20.20 +/* MEANING OF   WL  PI  SS  int
   20.21 + * These indicate which places the function is safe to use.  They stand for:
   20.22 + * WL: Wrapper Library
   20.23 + * PI: Plugin 
   20.24 + * SS: Startup and Shutdown
   20.25 + * int: internal to the PR implementation
   20.26 + */
   20.27 +
   20.28 +
   20.29 +
   20.30 +/*For this implementation of PR, it may not make much sense to have the
   20.31 + * system of requests for creating a new processor done this way.. but over
   20.32 + * the scope of single-master, multi-master, multi-tasking, OS-implementing,
   20.33 + * distributed-memory, and so on, this gives PR implementation a chance to
   20.34 + * do stuff before suspend, in the SlaveVP, and in the Master before the plugin
   20.35 + * is called, as well as in the lang-lib before this is called, and in the
   20.36 + * plugin.  So, this gives both PR and language implementations a chance to
   20.37 + * intercept at various points and do order-dependent stuff.
   20.38 + *Having a standard PRNewPrReqData struc allows the language to create and
   20.39 + * free the struc, while PR knows how to get the newSlv if it wants it, and
   20.40 + * it lets the lang have lang-specific data related to creation transported
   20.41 + * to the plugin.
   20.42 + */
   20.43 +void
   20.44 +PR_WL__send_create_slaveVP_req( void *semReqData, SlaveVP *reqstingSlv )
   20.45 + { PRReqst req;
   20.46 +
   20.47 +   req.reqType          = createReq;
   20.48 +   req.semReqData       = semReqData;
   20.49 +   req.nextReqst        = reqstingSlv->request;
   20.50 +   reqstingSlv->request = &req;
   20.51 +
   20.52 +   PR_int__suspend_slaveVP_and_send_req( reqstingSlv );
   20.53 + }
   20.54 +
   20.55 +
   20.56 +/*
   20.57 + *This adds a request to dissipate, then suspends the processor so that the
   20.58 + * request handler will receive the request.  The request handler is what
   20.59 + * does the work of freeing memory and removing the processor from the
   20.60 + * semantic environment's data structures.
   20.61 + *The request handler also is what figures out when to shutdown the PR
   20.62 + * system -- which causes all the core controller threads to die, and returns from
   20.63 + * the call that started up PR to perform the work.
   20.64 + *
   20.65 + *This form is a bit misleading to understand if one is trying to figure out
   20.66 + * how PR works -- it looks like a normal function call, but inside it
   20.67 + * sends a request to the request handler and suspends the processor, which
   20.68 + * jumps out of the PR_WL__dissipate_slaveVP function, and out of all nestings
   20.69 + * above it, transferring the work of dissipating to the request handler,
   20.70 + * which then does the actual work -- causing the processor that animated
   20.71 + * the call of this function to disappear and the "hanging" state of this
   20.72 + * function to just poof into thin air -- the virtual processor's trace
   20.73 + * never returns from this call, but instead the virtual processor's trace
   20.74 + * gets suspended in this call and all the virt processor's state disap-
   20.75 + * pears -- making that suspend the last thing in the Slv's trace.
   20.76 + */
   20.77 +void
   20.78 +PR_WL__send_dissipate_req( SlaveVP *slaveToDissipate )
   20.79 + { PRReqst req;
   20.80 +
   20.81 +   req.reqType                = dissipate;
   20.82 +   req.nextReqst              = slaveToDissipate->request;
   20.83 +   slaveToDissipate->request = &req;
   20.84 +
   20.85 +   PR_int__suspend_slaveVP_and_send_req( slaveToDissipate );
   20.86 + }
   20.87 +
   20.88 +
   20.89 +
   20.90 +/*This call's name indicates that request is malloc'd -- so req handler
   20.91 + * has to free any extra requests tacked on before a send, using this.
   20.92 + *
   20.93 + * This inserts the semantic-layer's request data into standard PR carrier
   20.94 + * request data-struct that is mallocd.  The sem request doesn't need to
   20.95 + * be malloc'd if this is called inside the same call chain before the
   20.96 + * send of the last request is called.
   20.97 + *
   20.98 + *The request handler has to call PR_int__free_PRReq for any of these
   20.99 + */
  20.100 +inline void
  20.101 +PR_WL__add_sem_request_in_mallocd_PRReqst( void *semReqData,
  20.102 +                                          SlaveVP *callingSlv )
  20.103 + { PRReqst *req;
  20.104 +
  20.105 +   req = PR_int__malloc( sizeof(PRReqst) );
  20.106 +   req->reqType         = semantic;
  20.107 +   req->semReqData      = semReqData;
  20.108 +   req->nextReqst       = callingSlv->request;
  20.109 +   callingSlv->request = req;
  20.110 + }
  20.111 +
  20.112 +/*This inserts the semantic-layer's request data into a standard PR carrier
  20.113 + * request data-struct that is allocated on the stack of this call; a ptr to
  20.114 + * it is sent to the plugin.
  20.115 + *Then it does suspend, to cause request to be sent.
  20.116 + */
  20.117 +inline void
  20.118 +PR_WL__send_sem_request( void *semReqData, SlaveVP *callingSlv )
  20.119 + { PRReqst req;
  20.120 +
  20.121 +   req.reqType         = semantic;
  20.122 +   req.semReqData      = semReqData;
  20.123 +   req.nextReqst       = callingSlv->request;
  20.124 +   callingSlv->request = &req;
  20.125 +   
  20.126 +   PR_int__suspend_slaveVP_and_send_req( callingSlv );
  20.127 + }
  20.128 +
  20.129 +
  20.130 +/*May 2012 Not sure what this is..  looks like old idea for PR semantic
  20.131 + * request
  20.132 + */
  20.133 +inline void
  20.134 +PR_WL__send_PRSem_request( void *semReqData, SlaveVP *callingSlv )
  20.135 + { PRReqst req;
  20.136 +
  20.137 +   req.reqType         = PRSemantic;
  20.138 +   req.semReqData      = semReqData;
  20.139 +   req.nextReqst       = callingSlv->request; //grab any other preceding requests
  20.140 +   callingSlv->request = &req;
  20.141 +
  20.142 +   PR_int__suspend_slaveVP_and_send_req( callingSlv );
  20.143 + }
  20.144 +
  20.145 +/*May 2012
  20.146 + *To throw exception from wrapper lib or application, first turn
  20.147 + * it into a request, then send the request
  20.148 + */
  20.149 +void
  20.150 +PR_WL__throw_exception( char *msgStr, SlaveVP *reqstSlv,  PRExcp *excpData )
  20.151 + { PRReqst req;
  20.152 +   PRSemReq semReq;
  20.153 +
  20.154 +   req.reqType         = PRSemantic;
  20.155 +   req.semReqData      = &semReq;
  20.156 +   req.nextReqst       = reqstSlv->request; //grab any other preceding requests
  20.157 +   reqstSlv->request   = &req;
  20.158 +
  20.159 +   semReq.msgStr        = msgStr;
  20.160 +   semReq.exceptionData = excpData;
  20.161 +   
  20.162 +   PR_int__suspend_slaveVP_and_send_req( reqstSlv );
  20.163 + }
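Note on the request chain used throughout PR__WL.c: every send function pushes its request onto the head of the slave's `request` list, and PR_PI__take_next_request_out_of pops from the head, so requests come out in last-in, first-out order. A minimal sketch of that chain follows. The Reqst and Slave structs are hypothetical reductions of PRReqst and SlaveVP, and the push_request and take_next_request names are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for PRReqst and SlaveVP. */
typedef struct Reqst {
    int           reqType;
    void         *semReqData;
    struct Reqst *nextReqst;
} Reqst;

typedef struct {
    Reqst *request;  /* head of the pending-request chain */
} Slave;

/* Push, as the PR_WL__send_* functions do: the new request becomes the head
 * and points at whatever was queued before it. */
static void push_request(Slave *slv, Reqst *req, int type, void *data)
{
    req->reqType    = type;
    req->semReqData = data;
    req->nextReqst  = slv->request;
    slv->request    = req;
}

/* Pop, as PR_PI__take_next_request_out_of does: unlink and return the head,
 * or NULL when the chain is empty. */
static Reqst *take_next_request(Slave *slv)
{
    Reqst *req = slv->request;
    if (req == NULL) return NULL;
    slv->request = req->nextReqst;
    return req;
}
```

In the diff, the carrier struct passed to the push is usually a local variable of the sending function. That works there because the slave suspends inside that call, so its stack frame stays in place while the handler reads the request.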
    21.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    21.2 +++ b/PR__int.c	Wed Sep 19 23:12:44 2012 -0700
    21.3 @@ -0,0 +1,289 @@
    21.4 +/*
    21.5 + * Copyright 2010  OpenSourceStewardshipFoundation
    21.6 + *
    21.7 + * Licensed under BSD
    21.8 + */
    21.9 +
   21.10 +#include <stdio.h>
   21.11 +#include <stdlib.h>
   21.12 +#include <string.h>
   21.13 +#include <malloc.h>
   21.14 +#include <inttypes.h>
   21.15 +#include <sys/time.h>
   21.16 +
   21.17 +#include "PR.h"
   21.18 +
   21.19 +
   21.20 +/* MEANING OF   WL  PI  SS  int
   21.21 + * These indicate which places the function is safe to use.  They stand for:
   21.22 + * WL: Wrapper Library
   21.23 + * PI: Plugin 
   21.24 + * SS: Startup and Shutdown
   21.25 + * int: internal to the PR implementation
   21.26 + */
   21.27 +
   21.28 +
   21.29 +inline SlaveVP *
   21.30 +PR_int__create_slaveVP( TopLevelFnPtr fnPtr, void *dataParam )
   21.31 + { SlaveVP *newSlv;
   21.32 +   void      *stackLocs;
   21.33 +
   21.34 +   newSlv      = PR_int__malloc( sizeof(SlaveVP) );
   21.35 +   stackLocs   = PR_int__malloc( VIRT_PROCR_STACK_SIZE );
   21.36 +   if( stackLocs == 0 )
   21.37 +    { perror("PR_int__malloc stack"); exit(1); }
   21.38 +
   21.39 +   _PRMasterEnv->numSlavesAlive += 1;
   21.40 +
   21.41 +   return PR_int__create_slaveVP_helper( newSlv, fnPtr, dataParam, stackLocs );
   21.42 + }
   21.43 +
   21.44 +/* "ext" designates that it's for use outside the PR system -- should only
   21.45 + * be called from main thread or other thread -- never from code animated by
   21.46 + * a PR virtual processor.
   21.47 + */
   21.48 +inline SlaveVP *
   21.49 +PR_ext__create_slaveVP( TopLevelFnPtr fnPtr, void *dataParam )
   21.50 + { SlaveVP *newSlv;
   21.51 +   char      *stackLocs;
   21.52 +
   21.53 +   newSlv      = malloc( sizeof(SlaveVP) );
   21.54 +   stackLocs  = malloc( VIRT_PROCR_STACK_SIZE );
   21.55 +   if( stackLocs == 0 )
   21.56 +    { perror("malloc stack"); exit(1); }
   21.57 +
   21.58 +   _PRMasterEnv->numSlavesAlive += 1;
   21.59 +
   21.60 +   return PR_int__create_slaveVP_helper(newSlv, fnPtr, dataParam, stackLocs);
   21.61 + }
   21.62 +
   21.63 +
   21.64 +//===========================================================================
   21.65 +/*there is a label inside this function -- save the addr of this label in
   21.66 + * the callingSlv struc, as the pick-up point from which to start the next
   21.67 + * work-unit for that slave.  If turns out have to save registers, then
   21.68 + * save them in the slave struc too.  Then do assembly jump to the CoreCtlr's
   21.69 + * "done with work-unit" label.  The slave struc is in the request in the
   21.70 + * slave that animated the just-ended work-unit, so all the state is saved
   21.71 + * there, and will get passed along, inside the request handler, to the
   21.72 + * next work-unit for that slave.
   21.73 + */
   21.74 +void
   21.75 +PR_int__suspend_slaveVP_and_send_req( SlaveVP *animatingSlv )
   21.76 + { 
   21.77 +
   21.78 +      //This suspended Slv will get assigned by Master again at some
   21.79 +      // future point
   21.80 +
   21.81 +      //return ownership of the Slv and anim slot to Master virt pr
   21.82 +   animatingSlv->animSlotAssignedTo->workIsDone = TRUE;
   21.83 +
   21.84 +        HOLISTIC__Record_HwResponderInvocation_start;
   21.85 +         MEAS__Capture_Pre_Susp_Point;
   21.86 +      //This assembly function is a PR primitive that first saves the
   21.87 +      // stack and frame pointer, plus an addr inside this assembly code.
   21.88 +      //When core ctlr later gets this slave out of a sched slot, it
   21.89 +      // restores the stack and frame and then jumps to the addr.. that
   21.90 +      // jmp causes return from this function.
   21.91 +      //So, in effect, this function takes a variable amount of wall-clock
   21.92 +      // time to complete -- the amount of time is determined by the
   21.93 +      // Master, which makes sure the memory is in a consistent state first.
   21.94 +   switchToCoreCtlr(animatingSlv);
   21.95 +   flushRegisters();
   21.96 +         MEAS__Capture_Post_Susp_Point;
   21.97 +		 
   21.98 +   return;
   21.99 + }
  21.100 +
  21.101 +
  21.102 +/* "ext" designates that it's for use outside the PR system -- should only
  21.103 + * be called from main thread or other thread -- never from code animated by
  21.104 + * a SlaveVP, nor from a masterVP.
  21.105 + *
  21.106 + *Use this version to dissipate Slvs created outside the PR system.
  21.107 + */
  21.108 +void
  21.109 +PR_ext__dissipate_slaveVP( SlaveVP *slaveToDissipate )
  21.110 + {
  21.111 +   _PRMasterEnv->numSlavesAlive -= 1;
  21.112 +   if( _PRMasterEnv->numSlavesAlive == 0 )
  21.113 +    {    //no more work, so shutdown
  21.114 +      PR_SS__shutdown();  //note, creates shut-down slaves on each core
  21.115 +    }
  21.116 +
  21.117 +   //NOTE: dataParam was given to the processor, so should either have
  21.118 +      // been alloc'd with PR_int__malloc, or freed by the level above animSlv.
  21.119 +      //So, all that's left to free here is the stack and the SlaveVP struc
  21.120 +      // itself
  21.121 +      //Note, should not stack-allocate the data param -- no guarantee, in
  21.122 +      // general that creating processor will outlive ones it creates.
  21.123 +   free( slaveToDissipate->startOfStack );
  21.124 +   free( slaveToDissipate );
  21.125 + }
  21.126 +
  21.127 +
  21.128 +
  21.129 +/*This must be called by the request handler plugin -- it cannot be called
  21.130 + * from the semantic library "dissipate processor" function -- instead, the
  21.131 + * semantic layer has to generate a request, and the plug-in calls this
  21.132 + * function.
  21.133 + *The reason is that this frees the virtual processor's stack -- which is
  21.134 + * still in use inside semantic library calls!
  21.135 + *
  21.136 + *This frees or recycles all the state owned by and comprising the PR
  21.137 + * portion of the animating virtual procr.  The request handler must first
  21.138 + * free any semantic data created for the processor that didn't use the
  21.139 + * PR_malloc mechanism.  Then it calls this, which first asks the malloc
  21.140 + * system to disown any state that did use PR_malloc, and then frees the
  21.141 + * stack and the processor-struct itself.
  21.142 + *If the dissipated processor is the sole (remaining) owner of PR_int__malloc'd
  21.143 + * state, then that state gets freed (or sent to recycling) as a side-effect
  21.144 + * of dis-owning it.
  21.145 + */
  21.146 +void
  21.147 +PR_int__dissipate_slaveVP( SlaveVP *animatingSlv )
  21.148 + {
  21.149 +         DEBUG__printf2(dbgRqstHdlr, "PR int dissipate slaveID: %d, alive: %d",animatingSlv->slaveID, _PRMasterEnv->numSlavesAlive-1);
  21.150 +      //dis-own all locations owned by this processor, causing to be freed
  21.151 +      // any locations that it is (was) sole owner of
  21.152 +   _PRMasterEnv->numSlavesAlive -= 1;
  21.153 +   if( _PRMasterEnv->numSlavesAlive == 0 )
  21.154 +    {    //no more work, so shutdown
  21.155 +      PR_SS__shutdown();  //note, creates shut-down processor on each core
  21.156 +    }
  21.157 +
  21.158 +      //NOTE: dataParam was given to the processor, so should either have
  21.159 +      // been alloc'd with PR_int__malloc, or freed by the level above animSlv.
  21.160 +      //So, all that's left to free here is the stack and the SlaveVP struc
  21.161 +      // itself
  21.162 +      //Note, should not stack-allocate initial data -- no guarantee, in
  21.163 +      // general that creating processor will outlive ones it creates.
  21.164 +   PR_int__free( animatingSlv->startOfStack );
  21.165 +   PR_int__free( animatingSlv );
  21.166 + }
  21.167 +
  21.168 +/*Anticipating multi-tasking
  21.169 + */
  21.170 +void *
  21.171 +PR_int__give_sem_env_for( SlaveVP *animSlv )
  21.172 + {
  21.173 +   return _PRMasterEnv->semanticEnv;
  21.174 + }
  21.175 +
  21.176 +/*
  21.177 + *
  21.178 + */
  21.179 +inline SlaveVP *
  21.180 +PR_int__create_slaveVP_helper( SlaveVP *newSlv,    TopLevelFnPtr  fnPtr,
  21.181 +                     void    *dataParam, void          *stackLocs )
  21.182 + {
  21.183 +   newSlv->startOfStack = stackLocs;
  21.184 +   newSlv->slaveID      = _PRMasterEnv->numSlavesCreated++;
  21.185 +   newSlv->request     = NULL;
  21.186 +   newSlv->animSlotAssignedTo    = NULL;
  21.187 +   newSlv->typeOfVP     = Slave;
  21.188 +   newSlv->assignCount  = 0;
  21.189 +
  21.190 +   PR_int__reset_slaveVP_to_TopLvlFn( newSlv, fnPtr, dataParam );
  21.191 +           
  21.192 +   //============================= MEASUREMENT STUFF ========================
  21.193 +   #ifdef PROBES__TURN_ON_STATS_PROBES
  21.194 +   //TODO: make this TSCHiLow or generic equivalent
  21.195 +   //struct timeval timeStamp;
  21.196 +   //gettimeofday( &(timeStamp), NULL);
  21.197 +   //newSlv->createPtInSecs = timeStamp.tv_sec +(timeStamp.tv_usec/1000000.0) -
  21.198 +   //                                           _PRMasterEnv->createPtInSecs;
  21.199 +   #endif
  21.200 +   //========================================================================
  21.201 +
  21.202 +   return newSlv;
  21.203 + }
  21.204 +
  21.205 +
  21.206 +/*Later, improve this -- for now, just exits the application after printing
  21.207 + * the error message.
  21.208 + */
  21.209 +void
  21.210 +PR_int__throw_exception( char *msgStr, SlaveVP *reqstSlv, PRExcp *excpData )
  21.211 + {
  21.212 +   printf("%s",msgStr);
  21.213 +   fflush(stdout);
  21.214 +   exit(1);
  21.215 + }
  21.216 +
  21.217 +
  21.218 +inline char *
  21.219 +PR_int__strDup( char *str )
  21.220 + { char *retStr;
  21.221 +
  21.222 +   if( str == NULL ) return (char *)NULL;
  21.223 +   retStr = (char *)PR_int__malloc( strlen(str) + 1 );
  21.224 +   strcpy( retStr, str );
  21.225 +
  21.226 +   return (char *)retStr;
  21.227 + }
  21.228 +
  21.229 +
  21.230 +inline void
  21.231 +PR_int__backoff_for_TooLongToGetLock( int32 numTriesToGetLock );
  21.232 +
  21.233 +inline void
  21.234 +PR_int__get_master_lock()
  21.235 + { int32 *addrOfMasterLock;
  21.236 + 
  21.237 +   addrOfMasterLock = &(_PRMasterEnv->masterLock);
  21.238 +
  21.239 +   int numTriesToGetLock = 0;
  21.240 +   int gotLock = 0;
  21.241 +   
  21.242 +            MEAS__Capture_Pre_Master_Lock_Point;
  21.243 +
  21.244 +   while( !gotLock ) //keep going until get master lock
  21.245 +    { 
  21.246 +      numTriesToGetLock++;   //if too many, means too much contention
  21.247 +      if( numTriesToGetLock > NUM_TRIES_BEFORE_DO_BACKOFF )
  21.248 +       { PR_int__backoff_for_TooLongToGetLock( numTriesToGetLock );
  21.249 +       }
  21.250 +      if( numTriesToGetLock > MASTERLOCK_RETRIES_BEFORE_YIELD ) 
  21.251 +       { numTriesToGetLock = 0; 
  21.252 +         pthread_yield();
  21.253 +       }
  21.254 +   
  21.255 +         //try to get the lock
  21.256 +      gotLock = __sync_bool_compare_and_swap( addrOfMasterLock,
  21.257 +                                                         UNLOCKED, LOCKED );
  21.258 +    }
  21.259 +            MEAS__Capture_Post_Master_Lock_Point;
  21.260 + }
  21.261 +
  21.262 +/*Used by the backoff to pick a random amount of busy-wait.  Can't use the
  21.263 + * system rand because it takes much too long.
  21.264 + *Note, the seeds live in _PRMasterEnv and are modified in place
  21.265 + */
  21.266 +inline uint32_t
  21.267 +PR_int__randomNumber()
  21.268 + {
  21.269 +	_PRMasterEnv->seed1 = 36969 * (_PRMasterEnv->seed1 & 65535) + 
  21.270 +                          (_PRMasterEnv->seed1 >> 16);
  21.271 +	_PRMasterEnv->seed2 = 18000 * (_PRMasterEnv->seed2 & 65535) + 
  21.272 +                          (_PRMasterEnv->seed2 >> 16);
  21.273 +	return (_PRMasterEnv->seed1 << 16) + _PRMasterEnv->seed2;
  21.274 + }
  21.275 +
  21.276 +
  21.277 +/*Busy-waits for a random number of cycles -- chooses number of cycles 
  21.278 + * differently than for the no-work backoff
  21.279 + */
  21.280 +inline void
  21.281 +PR_int__backoff_for_TooLongToGetLock( int32 numTriesToGetLock )
  21.282 + { int32 i, waitIterations;
  21.283 +   volatile double fakeWorkVar = 0.0; //busy-wait fake work
  21.284 +
  21.285 +   waitIterations = 
  21.286 +    PR_int__randomNumber()% (numTriesToGetLock * GET_LOCK_BACKOFF_WEIGHT);   
  21.287 +   //addToHist( wait_iterations, coreLoopThdParams->wait_iterations_hist );
  21.288 +   for( i = 0; i < waitIterations; i++ )
  21.289 +    { fakeWorkVar += (fakeWorkVar + 32.0) / 2.0; //busy-wait
  21.290 +    }
  21.291 + }
  21.292 +
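Note on the master lock in PR__int.c: PR_int__get_master_lock spins on a compare-and-swap, adds a randomized busy-wait once contention is detected, and yields the OS thread after many failed tries. The sketch below shows the same technique as a standalone unit. The three tuning constants and the initial seed values are made-up placeholders (the real ones live in the PR headers), the function names are invented, sched_yield() is swapped in for pthread_yield(), and the release uses __sync_lock_release instead of the plain store in the PR_int__release_master_lock macro. The sketch also attempts the swap before counting a try, which differs slightly from the loop order in the diff.

```c
#include <assert.h>
#include <stdint.h>
#include <sched.h>

enum { UNLOCKED = 0, LOCKED = 1 };

/* Hypothetical tuning constants, chosen only for this sketch. */
#define TRIES_BEFORE_BACKOFF 10
#define TRIES_BEFORE_YIELD   100
#define BACKOFF_WEIGHT       100

/* Hypothetical non-zero starting seeds. */
static uint32_t seed1 = 521288629u, seed2 = 362436069u;

/* Two 16-bit multiply-with-carry steps combined into one 32-bit value,
 * using the same multipliers as PR_int__randomNumber. */
static uint32_t mwc_random(void)
{
    seed1 = 36969u * (seed1 & 65535u) + (seed1 >> 16);
    seed2 = 18000u * (seed2 & 65535u) + (seed2 >> 16);
    return (seed1 << 16) + seed2;
}

/* Busy-wait for a random number of iterations that grows with contention. */
static void backoff(int32_t numTries)
{
    volatile double fakeWork = 0.0;  /* volatile keeps the loop from being optimized away */
    uint32_t n = mwc_random() % (uint32_t)(numTries * BACKOFF_WEIGHT);
    for (uint32_t i = 0; i < n; i++)
        fakeWork += (fakeWork + 32.0) / 2.0;
}

static void get_lock(volatile int32_t *lock)
{
    int32_t numTries = 0;
    /* The GCC builtin returns true only if *lock was UNLOCKED and is now LOCKED. */
    while (!__sync_bool_compare_and_swap(lock, UNLOCKED, LOCKED)) {
        numTries++;
        if (numTries > TRIES_BEFORE_BACKOFF) backoff(numTries);
        if (numTries > TRIES_BEFORE_YIELD) { numTries = 0; sched_yield(); }
    }
}

static void release_lock(volatile int32_t *lock)
{
    __sync_lock_release(lock);  /* stores 0 (UNLOCKED) with release semantics */
}
```

The randomized wait spreads retries from different cores apart in time, and the yield keeps a core from spinning indefinitely when the lock holder's OS thread has been descheduled.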
    22.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    22.2 +++ b/PR__startup_and_shutdown.c	Wed Sep 19 23:12:44 2012 -0700
    22.3 @@ -0,0 +1,601 @@
    22.4 +/*
    22.5 + * Copyright 2010  OpenSourceStewardshipFoundation
    22.6 + *
    22.7 + * Licensed under BSD
    22.8 + */
    22.9 +
   22.10 +#include <stdio.h>
   22.11 +#include <stdlib.h>
   22.12 +#include <string.h>
   22.13 +#include <malloc.h>
   22.14 +#include <inttypes.h>
   22.15 +#include <sys/time.h>
   22.16 +#include <pthread.h>
   22.17 +
   22.18 +#include "PR.h"
   22.19 +
   22.20 +
   22.21 +#define thdAttrs NULL
   22.22 +
   22.23 +
   22.24 +/* MEANING OF   WL  PI  SS  int
   22.25 + * These indicate which places the function is safe to use.  They stand for:
   22.26 + * WL: Wrapper Library
   22.27 + * PI: Plugin 
   22.28 + * SS: Startup and Shutdown
   22.29 + * int: internal to the PR implementation
   22.30 + */
   22.31 +
   22.32 +
   22.33 +//===========================================================================
   22.34 +AnimSlot **
   22.35 +create_anim_slots( int32 coreSlotsAreOn );
   22.36 +
   22.37 +void
   22.38 +create_masterEnv();
   22.39 +
   22.40 +void
   22.41 +create_the_coreCtlr_OS_threads();
   22.42 +
   22.43 +MallocProlog *
   22.44 +create_free_list();
   22.45 +
   22.46 +void
   22.47 +endOSThreadFn( void *initData, SlaveVP *animatingSlv );
   22.48 +
   22.49 +
   22.50 +//===========================================================================
   22.51 +
   22.52 +/*Setup has two phases:
   22.53 + * 1) Semantic layer first calls init_PR, which creates masterEnv, and puts
   22.54 + *    the master Slv into the work-queue, ready for first "call"
   22.55 + * 2) Semantic layer then does its own init, which creates the seed virt
   22.56 + *    slave inside the semantic layer, ready to assign it when
   22.57 + *    asked by the first run of the animationMaster.
   22.58 + *
   22.59 + *This part is a bit weird because PR really wants to be "always there", and
   22.60 + * have applications attach and detach..  for now, this PR is part of
   22.61 + * the app, so the PR system starts up as part of running the app.
   22.62 + *
   22.63 + *The semantic layer is isolated from the PR internals by making the
   22.64 + * semantic layer do setup to a state that it's ready with its
   22.65 + * initial Slvs, ready to assign them to slots when the animationMaster
   22.66 + * asks.  Without this pattern, the semantic layer's setup would
   22.67 + * have to modify slots directly to assign the initial virt-procrs, and put
   22.68 + * them into the readyToAnimateQ itself, breaking the isolation completely.
   22.69 + *
   22.70 + * 
   22.71 + *The semantic layer creates the initial Slv(s), and adds its
   22.72 + * own environment to masterEnv, and fills in the pointers to
   22.73 + * the requestHandler and slaveAssigner plug-in functions
   22.74 + */
   22.75 +
   22.76 +/*This allocates PR data structures and populates the master environment
   22.77 + * (reachable afterwards through _PRMasterEnv).  Unless sequential mode is
   22.78 + * turned on, it then creates the core controller OS threads.
   22.79 + */
   22.80 +void
   22.81 +PR__start()
   22.82 + {
   22.83 +   #ifdef DEBUG__TURN_ON_SEQUENTIAL_MODE
   22.84 +      create_masterEnv();
   22.85 +      printf( "\n\n Running in SEQUENTIAL mode \n\n" );
   22.86 +   #else
   22.87 +      create_masterEnv();
   22.88 +      DEBUG__printf1(dbgInfra,"Offset of lock in masterEnv: %d ", (int32)offsetof(MasterEnv,masterLock) );
   22.89 +      create_the_coreCtlr_OS_threads();
   22.90 +   #endif
   22.91 + }
   22.92 +
   22.93 +/*This gets the process struct out of the seedVP, then gets the semEnv-holding
   22.94 + * struct out of that, then inserts the semantic env into that struct, using
   22.95 + * the magic number as the key to the sem env placement.  The master will 
   22.96 + * use the magic number from a request to retrieve the semantic env appropriate
   22.97 + * for the construct that made the request.
   22.98 + */
   22.99 +void
  22.100 +PR__register_langlets_semEnv( PRSemEnv *semEnv, int32 magicNumber, 
  22.101 +                              SlaveVP  *seedVP )
  22.102 + { PREnvHolder *envHolder;
  22.103 +   PRProcess   *process;
  22.104 +
  22.105 +   process   = seedVP->process;
  22.106 +   envHolder = process->semEnvHolder;
  22.107 +   
  22.108 +   insert( magicNumber, semEnv, envHolder );
  22.109 + }
  22.110 +
  22.111 +
  22.112 +/*TODO: finish implementing
  22.113 + *This function returns information about the version of PR, the language
  22.114 + * the program is being run in, its version, and information on the 
  22.115 + * hardware.
  22.116 + */
  22.117 +/*
  22.118 +char *
  22.119 +PR_App__give_environment_string()
  22.120 + {
  22.121 +   //--------------------------
  22.122 +    fprintf(output, "#\n# >> Build information <<\n");
  22.123 +    fprintf(output, "# GCC VERSION: %d.%d.%d\n",__GNUC__,__GNUC_MINOR__,__GNUC_PATCHLEVEL__);
  22.124 +    fprintf(output, "# Build Date: %s %s\n", __DATE__, __TIME__);
  22.125 +    
  22.126 +    fprintf(output, "#\n# >> Hardware information <<\n");
  22.127 +    fprintf(output, "# Hardware Architecture: ");
  22.128 +   #ifdef __x86_64
  22.129 +    fprintf(output, "x86_64");
  22.130 +   #endif //__x86_64
  22.131 +   #ifdef __i386
  22.132 +    fprintf(output, "x86");
  22.133 +   #endif //__i386
  22.134 +    fprintf(output, "\n");
  22.135 +    fprintf(output, "# Number of Cores: %d\n", NUM_CORES);
  22.136 +   //--------------------------
  22.137 +    
  22.138 +   //PR Plugins
  22.139 +    fprintf(output, "#\n# >> PR Plugins <<\n");
  22.140 +    fprintf(output, "# Language : ");
  22.141 +    fprintf(output, _LANG_NAME_);
  22.142 +    fprintf(output, "\n");
  22.143 +       //Meta info gets set by calls from the language during its init,
  22.144 +       // and info registered by calls from inside the application
  22.145 +    fprintf(output, "# Assigner: %s\n", _PRMasterEnv->metaInfo->assignerInfo);
  22.146 +
  22.147 +   //--------------------------
  22.148 +   //Application
  22.149 +    fprintf(output, "#\n# >> Application <<\n");
  22.150 +    fprintf(output, "# Name: %s\n", _PRMasterEnv->metaInfo->appInfo);
  22.151 +    fprintf(output, "# Data Set:\n%s\n",_PRMasterEnv->metaInfo->inputSet);
  22.152 +    
  22.153 +   //--------------------------
  22.154 + }
  22.155 + */
  22.156 + 
  22.157 +
  22.158 +/*A pointer to the startup-function for the language is given as the last
  22.159 + * argument to the call.  Use this to initialize a program in the language.
  22.160 + * This creates a data structure that encapsulates the bookkeeping info
  22.161 + * PR uses to track and schedule a program run.
  22.162 + */
  22.163 +PRProcess *
  22.164 +PR__spawn_program_on_data_in_Lang( TopLevelFnPtr seed_fn, void *data )
  22.165 + { PRProcess *newProcess;
  22.166 +   newProcess = malloc( sizeof(PRProcess) );
  22.167 +   
  22.168 +   newProcess->doneLock = PTHREAD_MUTEX_INITIALIZER;
  22.169 +   newProcess->doneCond = PTHREAD_COND_INITIALIZER;
  22.170 +   newProcess->executionIsComplete = FALSE;
  22.171 +   newProcess->numSlavesLive = 0;
  22.172 +   
  22.173 +   newProcess->dataForSeed = data;
  22.174 +   newProcess->seedFnPtr   = seed_fn;
  22.175 +   
  22.176 +      //The language's spawn-process function fills in the plugin function-ptrs in
  22.177 +      // the PRProcess struct, gives the struct to PR, which then makes and
  22.178 +      // queues the seed SlaveVP, which starts processors made from the code being
  22.179 +      // animated.
  22.180 +    
  22.181 +   (*langInitFnPtr)( newProcess );  
  22.182 +   
  22.183 +   return newProcess;
  22.184 + }
  22.185 +
  22.186 +
  22.187 +/*When all SlaveVPs owned by the program-run associated to the process have
  22.188 + * dissipated, then return from this call.  There is no language to cleanup,
  22.189 + * and PR does not shutdown..  but the process bookkeeping structure,
  22.190 + * which is used by PR to track and schedule the program, is freed.
  22.191 + *The PRProcess structure is kept until this call collects the results from it,
  22.192 + * then freed.  If the process is not done yet when PR gets this
  22.193 + * call, then this call waits..  the challenge here is that this call comes from
  22.194 + * a live OS thread that's outside PR..  so, inside here, it waits on a 
  22.195 + * condition..  then it's a PR thread that signals this to wake up..
  22.196 + *First checks whether the process is done, if yes, calls the clean-up fn then
  22.197 + * returns the result extracted from the PRProcess struct.
  22.198 + *If process not done yet, then performs a wait (in a loop to be sure the
  22.199 + * wakeup is not spurious, which can happen).  PR registers the wait, and upon
  22.200 + * the process ending (last SlaveVP owned by it dissipates), then PR signals
  22.201 + * this to wakeup.  This then calls the cleanup fn and returns the result.
  22.202 + */
  22.203 +/*
  22.204 +void *
  22.205 +PR_App__give_results_when_done_for( PRProcess *process )
  22.206 + { void *result;
  22.207 +   
  22.208 +   pthread_mutex_lock( process->doneLock );
  22.209 +   while( !(process->executionIsComplete) )
  22.210 +    {
  22.211 +      pthread_cond_wait( process->doneCond,
  22.212 +                         process->doneLock );
  22.213 +    }
  22.214 +   pthread_mutex_unlock( process->doneLock );
  22.215 +   
  22.216 +   result = process->resultToReturn;
  22.217 +   
  22.218 +   PR_int__cleanup_process_after_done( process );
  22.219 +   free( process );  //was malloc'd above, so free it here
  22.220 +   
  22.221 +   return result;
  22.222 + }
  22.223 +*/
  22.224 +
  22.225 +/*Turns off the PR system, and frees all data associated with it.  Does this
  22.226 + * by creating shutdown SlaveVPs and inserting them into animation slots.
  22.227 + * Will probably have to wake up sleeping cores as part of this -- the fn that
  22.228 + * inserts the new SlaveVPs should handle the wakeup..
  22.229 + */
  22.230 +/*
  22.231 +void
  22.232 +PR_SS__shutdown(); //already defined -- look at it
  22.233 +
  22.234 +void
  22.235 +PR_App__shutdown()
  22.236 + {
  22.237 +   for( cores )
  22.238 +    { slave = PR_int__create_new_SlaveVP( endOSThreadFn, NULL );
  22.239 +      PR_int__insert_slave_onto_core( SlaveVP *slave, coreNum );
  22.240 +    }
  22.241 + }
  22.242 +*/
  22.243 +
  22.244 +/* PR_App__start_PR_running();
  22.245 +
  22.246 +   PRProcess matrixMultProcess;
  22.247 +   
  22.248 +   matrixMultProcess =
  22.249 +    PR_App__spawn_program_on_data_in_Lang( &prog_seed_fn, data, Vthread_lang );
  22.250 +   
  22.251 +   resMatrix = PR_App__give_results_when_done_for( matrixMultProcess );
  22.252 +   
  22.253 +   PR_App__shutdown();
  22.254 + */
  22.255 +
  22.256 +void
  22.257 +create_masterEnv()
  22.258 + { MasterEnv       *masterEnv;
  22.259 +   PRQueueStruc  **readyToAnimateQs;
  22.260 +   int              coreIdx;
  22.261 +   SlaveVP        **masterVPs;
  22.262 +   AnimSlot     ***allAnimSlots; //ptr to array of ptrs
  22.263 +
  22.264 +
  22.265 +      //Make the master env, which holds everything else
  22.266 +   _PRMasterEnv = malloc( sizeof(MasterEnv) );
  22.267 +
  22.268 +        //Very first thing put into the master env is the free-list, seeded
  22.269 +        // with a massive initial chunk of memory.
  22.270 +        //After this, all other mallocs are PR__malloc.
  22.271 +   _PRMasterEnv->freeLists        = PR_ext__create_free_list();
  22.272 +   
  22.273 +   
  22.274 +   //===================== Only PR__malloc after this ====================
  22.275 +   masterEnv     = (MasterEnv*)_PRMasterEnv;
  22.276 +   
  22.277 +      //Make a readyToAnimateQ for each core controller
  22.278 +   readyToAnimateQs = PR_int__malloc( NUM_CORES * sizeof(PRQueueStruc *) );
  22.279 +   masterVPs        = PR_int__malloc( NUM_CORES * sizeof(SlaveVP *) );
  22.280 +
  22.281 +      //One array for each core, several in array, core's masterVP scheds all
  22.282 +   allAnimSlots    = PR_int__malloc( NUM_CORES * sizeof(AnimSlot **) );
  22.283 +
  22.284 +   _PRMasterEnv->numSlavesAlive = 0;  //used to detect shut-down condition
  22.285 +
  22.286 +//========================================
  22.287 +   semEnv->shutdownInitiated = FALSE;
  22.288 +   semEnv->coreIsDone = PR_int__malloc( NUM_CORES * sizeof( bool32 ) );
  22.289 +   
  22.290 +      //For each animation slot, there is an idle slave, and an initial
  22.291 +      // slave assigned as the current-task-slave.  Create them here.
  22.292 +   SlaveVP *idleSlv, *slotTaskSlv;
  22.293 +   for( coreNum = 0; coreNum < NUM_CORES; coreNum++ )
  22.294 +    { semEnv->coreIsDone[coreNum] = FALSE; //use during shutdown
  22.295 +    
  22.296 +      for( slotNum = 0; slotNum < NUM_ANIM_SLOTS; ++slotNum )
  22.297 +       { idleSlv = VSs__create_slave_helper( &idle_fn, NULL, semEnv, 0);
  22.298 +         idleSlv->coreAnimatedBy                = coreNum;
  22.299 +         idleSlv->animSlotAssignedTo            =
  22.300 +                               _PRMasterEnv->allAnimSlots[coreNum][slotNum];
  22.301 +         semEnv->idleSlv[coreNum][slotNum] = idleSlv;
  22.302 +         
  22.303 +         slotTaskSlv = VSs__create_slave_helper( &idle_fn, NULL, semEnv, 0);
  22.304 +         slotTaskSlv->coreAnimatedBy            = coreNum;
  22.305 +         slotTaskSlv->animSlotAssignedTo        = 
  22.306 +                               _PRMasterEnv->allAnimSlots[coreNum][slotNum];
  22.307 +         
  22.308 +         semData                    = slotTaskSlv->semanticData;
  22.309 +         semData->needsTaskAssigned = TRUE;
  22.310 +         semData->slaveType         = SlotTaskSlv;
  22.311 +         semEnv->slotTaskSlvs[coreNum][slotNum] = slotTaskSlv;
  22.312 +       }
  22.313 +    }
  22.314 +
  22.315 +      //create the recycle queue where free task slaves are put after their task ends
  22.316 +   semEnv->freeTaskSlvRecycleQ  = makePRQ();
  22.317 +   
  22.318 +
  22.319 +   semEnv->numLiveExtraTaskSlvs   = 0;
  22.320 +   semEnv->numLiveThreadSlvs      = 0; //none existent yet.. "create process" creates the seeds  
  22.321 +//==================================================================
  22.322 +   
  22.323 +   _PRMasterEnv->numSlavesCreated = 0;  //used by create slave to set slave ID
  22.324 +   for( coreIdx = 0; coreIdx < NUM_CORES; coreIdx++ )
  22.325 +    {    
  22.326 +      readyToAnimateQs[ coreIdx ] = makePRQ();
  22.327 +      
  22.328 +         //Q: should give masterVP core-specific info as its init data?
  22.329 +      masterVPs[ coreIdx ] = PR_int__create_slaveVP( (TopLevelFnPtr)&animationMaster, (void*)masterEnv );
  22.330 +      masterVPs[ coreIdx ]->coreAnimatedBy = coreIdx;
  22.331 +      masterVPs[ coreIdx ]->typeOfVP = Master;
  22.332 +      allAnimSlots[ coreIdx ] = create_anim_slots( coreIdx ); //makes for one core
  22.333 +    }
  22.334 +   _PRMasterEnv->masterVPs        = masterVPs;
  22.335 +   _PRMasterEnv->masterLock       = UNLOCKED;
  22.336 +   _PRMasterEnv->seed1 = rand()%1000; // init random number generator
  22.337 +   _PRMasterEnv->seed2 = rand()%1000; // init random number generator
  22.338 +   _PRMasterEnv->allAnimSlots    = allAnimSlots;
  22.339 +   _PRMasterEnv->measHistsInfo = NULL; 
  22.340 +
  22.341 +   //============================= MEASUREMENT STUFF ========================
  22.342 +      
  22.343 +         MEAS__Make_Meas_Hists_for_Susp_Meas;
  22.344 +         MEAS__Make_Meas_Hists_for_Master_Meas;
  22.345 +         MEAS__Make_Meas_Hists_for_Master_Lock_Meas;
  22.346 +         MEAS__Make_Meas_Hists_for_Malloc_Meas;
  22.347 +         MEAS__Make_Meas_Hists_for_Plugin_Meas;
  22.348 +         MEAS__Make_Meas_Hists_for_Language;
  22.349 +
  22.350 +         PROBES__Create_Probe_Bookkeeping_Vars;
  22.351 +         
  22.352 +         HOLISTIC__Setup_Perf_Counters;
  22.353 +         
  22.354 +   //========================================================================
  22.355 + }
  22.356 +
  22.357 +AnimSlot **
  22.358 +create_anim_slots( int32 coreSlotsAreOn )
  22.359 + { AnimSlot  **animSlots;
  22.360 +   int i;
  22.361 +
  22.362 +   animSlots  = PR_int__malloc( NUM_ANIM_SLOTS * sizeof(AnimSlot *) );
  22.363 +
  22.364 +   for( i = 0; i < NUM_ANIM_SLOTS; i++ )
  22.365 +    {
  22.366 +      animSlots[i] = PR_int__malloc( sizeof(AnimSlot) );
  22.367 +
  22.368 +         //Set state to mean "handling requests done, slot needs filling"
  22.369 +      animSlots[i]->workIsDone         = FALSE;
  22.370 +      animSlots[i]->needsSlaveAssigned = TRUE;
  22.371 +      animSlots[i]->slotIdx            = i; //quick retrieval of slot pos
  22.372 +      animSlots[i]->coreSlotIsOn       = coreSlotsAreOn;
  22.373 +    }
  22.374 +   return animSlots;
  22.375 + }
  22.376 +
  22.377 +
  22.378 +void
  22.379 +freeAnimSlots( AnimSlot **animSlots )
  22.380 + { int i;
  22.381 +   for( i = 0; i < NUM_ANIM_SLOTS; i++ )
  22.382 +    {
  22.383 +      PR_int__free( animSlots[i] );
  22.384 +    }
  22.385 +   PR_int__free( animSlots );
  22.386 + }
  22.387 +
  22.388 +
  22.389 +void
  22.390 +create_the_coreCtlr_OS_threads()
  22.391 + {
  22.392 +   //========================================================================
  22.393 +   //                      Create the Threads
  22.394 +   int coreIdx, retCode;
  22.395 +
  22.396 +      //Need the threads to be created suspended, and wait for a signal
  22.397 +      // before proceeding -- gives time after creating to initialize other
  22.398 +      // stuff before the coreCtlrs set off.
  22.399 +   _PRMasterEnv->setupComplete = 0;
  22.400 +   
  22.401 +      //initialize the cond used to make the new threads wait and sync up
  22.402 +      //must do this before *creating* the threads..
  22.403 +   pthread_mutex_init( &suspendLock, NULL );
  22.404 +   pthread_cond_init( &suspendCond, NULL );
  22.405 +
  22.406 +      //Make the threads that animate the core controllers
  22.407 +   for( coreIdx=0; coreIdx < NUM_CORES; coreIdx++ )
  22.408 +    { coreCtlrThdParams[coreIdx]          = PR_int__malloc( sizeof(ThdParams) );
  22.409 +      coreCtlrThdParams[coreIdx]->coreNum = coreIdx;
  22.410 +
  22.411 +      retCode =
  22.412 +      pthread_create( &(coreCtlrThdHandles[coreIdx]),
  22.413 +                        thdAttrs,
  22.414 +                       &coreController,
  22.415 +               (void *)(coreCtlrThdParams[coreIdx]) );
  22.416 +      if(retCode){printf("ERROR creating thread: %d\n", retCode); exit(1);}
  22.417 +    }
  22.418 + }
  22.419 +
  22.420 +
  22.421 +/*This is what causes the PR system to initialize.. then waits for it to
  22.422 + * exit.
  22.423 + * 
  22.424 + *Wrapper lib layer calls this when it wants the system to start running..
  22.425 + */
  22.426 +/*
  22.427 +void
  22.428 +PR_SS__start_the_work_then_wait_until_done()
  22.429 + { 
  22.430 +#ifdef DEBUG__TURN_ON_SEQUENTIAL_MODE
  22.431 +   //Only difference between version with an OS thread pinned to each core and
  22.432 +   // the sequential version of PR is PR__init_Seq, this, and coreCtlr_Seq.
  22.433 +   //
  22.434 +         //Instead of un-suspending threads, just call the one and only
  22.435 +         // core ctlr (sequential version), in the main thread.
  22.436 +      coreCtlr_Seq( NULL );
  22.437 +      flushRegisters();
  22.438 +#else
  22.439 +   int coreIdx;
  22.440 +      //Start the core controllers running
  22.441 +   
  22.442 +      //tell the core controller threads that setup is complete
  22.443 +      //get lock, to lock out any threads still starting up -- they'll see
  22.444 +      // that setupComplete is true before entering while loop, and so never
  22.445 +      // wait on the condition
  22.446 +   pthread_mutex_lock(     &suspendLock );
  22.447 +   _PRMasterEnv->setupComplete = 1;
  22.448 +   pthread_mutex_unlock(   &suspendLock );
  22.449 +   pthread_cond_broadcast( &suspendCond );
  22.450 +   
  22.451 +   
  22.452 +      //wait for all to complete
  22.453 +   for( coreIdx=0; coreIdx < NUM_CORES; coreIdx++ )
  22.454 +    {
  22.455 +      pthread_join( coreCtlrThdHandles[coreIdx], NULL );
  22.456 +    }
  22.457 +   
  22.458 +      //NOTE: do not clean up PR env here -- semantic layer has to have
  22.459 +      // a chance to clean up its environment first, then do a call to free
  22.460 +      // the Master env and rest of PR locations
  22.461 +#endif
  22.462 + }
  22.463 +*/
  22.464 +
  22.465 +SlaveVP* PR_SS__create_shutdown_slave(){
  22.466 +    SlaveVP* shutdownVP;
  22.467 +    
  22.468 +    shutdownVP = PR_int__create_slaveVP( &endOSThreadFn, NULL );
  22.469 +    shutdownVP->typeOfVP = Shutdown;
  22.470 +    
  22.471 +    return shutdownVP;
  22.472 +}
  22.473 +
  22.474 +//TODO: look at architecting cleanest separation between request handler
  22.475 +// and animation master, for dissipate, create, shutdown, and other non-semantic
  22.476 +// requests.  Issue is chain: one removes requests from AppSlv, one dispatches
  22.477 +// on type of request, and one handles each type..  but some types require
  22.478 +// action from both request handler and animation master -- maybe just give the
  22.479 +// request handler calls like:  PR__handle_X_request_type
  22.480 +
  22.481 +
  22.482 +/*This is called by the semantic layer's request handler when it decides it's
  22.483 + * time to shut down the PR system.  Calling this causes the core controller OS
  22.484 + * threads to exit, which unblocks the entry-point function that started up
  22.485 + * PR, and allows it to grab the result and return to the original single-
  22.486 + * threaded application.
  22.487 + * 
  22.488 + *The _PRMasterEnv is needed by this shut down function, so the create-seed-
  22.489 + * and-wait function has to free a bunch of stuff after it detects the
  22.490 + * threads have all died: the masterEnv, the thread-related locations,
  22.491 + * masterVP, and any AppSlvs that might still be allocated and sitting in the
  22.492 + * semantic environment, or have been orphaned in the _PRWorkQ.
  22.493 + * 
  22.494 + *NOTE: the semantic plug-in is expected to use PR__malloc to get all the
  22.495 + * locations it needs, and give ownership to masterVP.  Then, they will be
  22.496 + * automatically freed.
  22.497 + *
  22.498 + *In here, create one core-loop shut-down processor for each core controller and put
  22.499 + * them all directly into the readyToAnimateQ.
  22.500 + *Note, this function can ONLY be called after the semantic environment no
  22.501 + * longer cares if AppSlvs get animated after the point this is called.  In
  22.502 + * other words, this can be used as an abort, or else it should only be
  22.503 + * called when all AppSlvs have finished dissipate requests -- only at that
  22.504 + * point is it sure that all results have completed.
  22.505 + */
  22.506 +void
  22.507 +PR_SS__shutdown()
  22.508 + { int32       coreIdx;
  22.509 +   SlaveVP    *shutDownSlv;
  22.510 +   AnimSlot **animSlots;
  22.511 +      //create the shutdown processors, one for each core controller -- put them
  22.512 +      // directly into the Q -- each core will die when it gets one
  22.513 +   for( coreIdx = 0; coreIdx < NUM_CORES; coreIdx++ )
  22.514 +    {    //Note, this is running in the master
  22.515 +      shutDownSlv = PR_SS__create_shutdown_slave();
  22.516 +         //last slave has dissipated, so no more in slots, so write
  22.517 +         // shut down slave into first animation slot.
  22.518 +      animSlots = _PRMasterEnv->allAnimSlots[ coreIdx ];
  22.519 +      animSlots[0]->slaveAssignedToSlot = shutDownSlv;
  22.520 +      animSlots[0]->needsSlaveAssigned = FALSE;
  22.521 +      shutDownSlv->coreAnimatedBy = coreIdx;
  22.522 +      shutDownSlv->animSlotAssignedTo = animSlots[ 0 ];
  22.523 +    }
  22.524 + }
  22.525 +
  22.526 +
  22.527 +/*Am trying to be cute, avoiding IF statement in coreCtlr that checks for
  22.528 + * a special shutdown slaveVP.  Ended up with extra-complex shutdown sequence.
  22.529 + *This function has the sole purpose of setting the stack and framePtr
  22.530 + * to the coreCtlr's stack and framePtr.. it does that then jumps to the
  22.531 + * core ctlr's shutdown point -- might be able to just call pthread_exit
  22.532 + * from here, but am going back to the pthread's stack and setting everything
  22.533 + * up just as if it never jumped out, before calling pthread_exit.
  22.534 + *The end-point of core ctlr will free the stack and so forth of the
  22.535 + * processor that animates this function, (this fn is transferring the
  22.536 + * animator of the AppSlv that is in turn animating this function over
  22.537 + * to core controller function -- note that this slices out a level of virtual
  22.538 + * processors).
  22.539 + */
  22.540 +void
  22.541 +endOSThreadFn( void *initData, SlaveVP *animatingSlv )
  22.542 + { 
  22.543 +   #ifdef DEBUG__TURN_ON_SEQUENTIAL_MODE
  22.544 +    asmTerminateCoreCtlrSeq(animatingSlv);
  22.545 +   #else
  22.546 +    asmTerminateCoreCtlr(animatingSlv);
  22.547 +   #endif
  22.548 + }
  22.549 +
  22.550 +
  22.551 +/*This is called from the startup-and-shutdown (SS) code
  22.552 + */
  22.553 +void
  22.554 +PR_SS__cleanup_at_end_of_shutdown()
  22.555 + { 
  22.556 +      //Before getting rid of everything, print out any measurements made
  22.557 +   if( _PRMasterEnv->measHistsInfo != NULL )
  22.558 +    { forAllInDynArrayDo( _PRMasterEnv->measHistsInfo, (DynArrayFnPtr)&printHist );
  22.559 +      forAllInDynArrayDo( _PRMasterEnv->measHistsInfo, (DynArrayFnPtr)&saveHistToFile);
  22.560 +      forAllInDynArrayDo( _PRMasterEnv->measHistsInfo, (DynArrayFnPtr)&freeHist );
  22.561 +    }
  22.562 +   
  22.563 +   MEAS__Print_Hists_for_Susp_Meas;
  22.564 +   MEAS__Print_Hists_for_Master_Meas;
  22.565 +   MEAS__Print_Hists_for_Master_Lock_Meas;
  22.566 +   MEAS__Print_Hists_for_Malloc_Meas;
  22.567 +   MEAS__Print_Hists_for_Plugin_Meas;
  22.568 +   
  22.569 +
  22.570 +      //All the environment data has been allocated with PR__malloc, so just
  22.571 +      // free its internal big-chunk and everything inside it disappears.
  22.572 +/*
  22.573 +   readyToAnimateQs = _PRMasterEnv->readyToAnimateQs;
  22.574 +   masterVPs        = _PRMasterEnv->masterVPs;
  22.575 +   allAnimSlots    = _PRMasterEnv->allAnimSlots;
  22.576 +   
  22.577 +   for( coreIdx = 0; coreIdx < NUM_CORES; coreIdx++ )
  22.578 +    {
  22.579 +      freePRQ( readyToAnimateQs[ coreIdx ] );
  22.580 +         //master Slvs were created external to PR, so use external free
  22.581 +      PR_int__dissipate_slaveVP( masterVPs[ coreIdx ] );
  22.582 +      
  22.583 +      freeAnimSlots( allAnimSlots[ coreIdx ] );
  22.584 +    }
  22.585 +   
  22.586 +   PR_int__free( _PRMasterEnv->readyToAnimateQs );
  22.587 +   PR_int__free( _PRMasterEnv->masterVPs );
  22.588 +   PR_int__free( _PRMasterEnv->allAnimSlots );
  22.589 +   
  22.590 +   //============================= MEASUREMENT STUFF ========================
  22.591 +   #ifdef PROBES__TURN_ON_STATS_PROBES
  22.592 +   freeDynArrayDeep( _PRMasterEnv->dynIntervalProbesInfo, &PR_WL__free_probe);
  22.593 +   #endif
  22.594 +   //========================================================================
  22.595 +*/
  22.596 +      //These are the only two that use system free 
  22.597 +   PR_ext__free_free_list( _PRMasterEnv->freeLists );
  22.598 +   free( (void *)_PRMasterEnv );
  22.599 + }
  22.600 +
  22.601 +
  22.602 +//================================
  22.603 +
  22.604 +
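The commented-out PR_App__give_results_when_done_for above describes an outside OS thread that blocks on a condition until the last SlaveVP of a process dissipates. A minimal sketch of that wait pattern follows, assuming plain pthreads. DemoProcess and the demo_* functions are hypothetical stand-ins, not part of PR; only the field names are borrowed from PRProcess. The sketch initializes the mutex and condition variable with pthread_mutex_init and pthread_cond_init because the struct is set up at run time, and it passes them by address.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PRProcess: only the fields needed to wait. */
typedef struct {
    pthread_mutex_t doneLock;
    pthread_cond_t  doneCond;
    bool            executionIsComplete;
    void           *resultToReturn;
} DemoProcess;

/* Run-time initialization of the lock, the condition and the flag. */
void demo_process_init(DemoProcess *p)
{
    pthread_mutex_init(&p->doneLock, NULL);
    pthread_cond_init(&p->doneCond, NULL);
    p->executionIsComplete = false;
    p->resultToReturn      = NULL;
}

/* Runtime side: call once, when the last slave of the process ends. */
void demo_process_mark_done(DemoProcess *p, void *result)
{
    pthread_mutex_lock(&p->doneLock);
    p->resultToReturn      = result;
    p->executionIsComplete = true;
    pthread_cond_signal(&p->doneCond);
    pthread_mutex_unlock(&p->doneLock);
}

/* Outside OS thread: block until the process is done, then return
 * the stored result. The loop guards against spurious wakeups. */
void *demo_process_wait_for_result(DemoProcess *p)
{
    void *result;

    pthread_mutex_lock(&p->doneLock);
    while (!p->executionIsComplete)
        pthread_cond_wait(&p->doneCond, &p->doneLock);
    result = p->resultToReturn;
    pthread_mutex_unlock(&p->doneLock);

    return result;
}
```

Reading the result while still holding the lock keeps the flag and the result consistent with each other. If demo_process_mark_done runs before the waiter arrives, the waiter sees the flag already set and never blocks.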
    23.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    23.2 +++ b/PR_primitive_data_types.h	Wed Sep 19 23:12:44 2012 -0700
    23.3 @@ -0,0 +1,42 @@
    23.4 +/*
    23.5 + *  Copyright 2009 OpenSourceStewardshipFoundation.org
    23.6 + *  Licensed under GNU General Public License version 2
    23.7 + *  
    23.8 + * Author: seanhalle@yahoo.com
    23.9 + *  
   23.10 +
   23.11 + */
   23.12 +
   23.13 +#ifndef _PRIMITIVE_DATA_TYPES_H
   23.14 +#define _PRIMITIVE_DATA_TYPES_H
   23.15 +
   23.16 +
   23.17 +/*For portability, need primitive data types that have a well defined
   23.18 + * size, and well-defined layout into bytes
   23.19 + *To do this, provide standard aliases for all primitive data types
   23.20 + *These aliases must be used in all functions instead of the ANSI types
   23.21 + *
   23.22 + *When PR is used together with BLIS, these definitions will be replaced
   23.23 + * inside each specialization module according to the compiler used in
   23.24 + * that module and the hardware being specialized to.
   23.25 + */
   23.26 +typedef char               bool8;
   23.27 +typedef char               int8;
   23.28 +typedef unsigned char      uint8;
   23.29 +typedef short              int16;
   23.30 +typedef unsigned short     uint16;
   23.31 +typedef int                int32;
   23.32 +typedef unsigned int       uint32;
   23.33 +typedef unsigned int       bool32;
   23.34 +typedef long long          int64;
   23.35 +typedef unsigned long long uint64;
   23.36 +typedef float              float32;
   23.37 +typedef double             float64;
   23.38 +//typedef double double      float128;  //GCC doesn't like this
   23.39 +#define float128 double double
   23.40 +
   23.41 +#define TRUE  1
   23.42 +#define FALSE 0
   23.43 +
   23.44 +#endif	/* _PRIMITIVE_DATA_TYPES_H */
   23.45 +
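The header above asks for primitive types with a well-defined size. One possible way to get that, sketched here under the assumption that a C99 <stdint.h> is available, is to alias the exact-width standard types and check the sizes at compile time. The demo_ prefix and the DEMO_STATIC_ASSERT macro are hypothetical names used only for this illustration; they are not part of PR.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical aliases built on the C99 exact-width integer types. */
typedef int8_t   demo_int8;
typedef uint8_t  demo_uint8;
typedef int16_t  demo_int16;
typedef uint16_t demo_uint16;
typedef int32_t  demo_int32;
typedef uint32_t demo_uint32;
typedef int64_t  demo_int64;
typedef uint64_t demo_uint64;

/* Compile-time check that also works before C11: the array size
 * becomes -1, which is a compile error, when the condition is false. */
#define DEMO_STATIC_ASSERT(cond, name) \
    typedef char demo_static_assert_##name[(cond) ? 1 : -1]

DEMO_STATIC_ASSERT(CHAR_BIT == 8,            char_has_8_bits);
DEMO_STATIC_ASSERT(sizeof(demo_int8)  == 1,  int8_is_1_byte);
DEMO_STATIC_ASSERT(sizeof(demo_int16) == 2,  int16_is_2_bytes);
DEMO_STATIC_ASSERT(sizeof(demo_int32) == 4,  int32_is_4_bytes);
DEMO_STATIC_ASSERT(sizeof(demo_int64) == 8,  int64_is_8_bytes);
```

With checks like these, a port to a compiler where an alias has the wrong width fails at build time instead of producing wrong struct layouts at run time.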
    24.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    24.2 +++ b/Services_Offered_by_PR/Measurement_and_Stats/MEAS__macros.h	Wed Sep 19 23:12:44 2012 -0700
    24.3 @@ -0,0 +1,514 @@
    24.4 +/*
    24.5 + *  Copyright 2009 OpenSourceStewardshipFoundation.org
    24.6 + *  Licensed under GNU General Public License version 2
    24.7 + *
    24.8 + * Author: seanhalle@yahoo.com
    24.9 + * 
   24.10 + */
   24.11 +
   24.12 +#ifndef _PR_MEAS_MACROS_H
   24.13 +#define _PR_MEAS_MACROS_H
   24.14 +#define _GNU_SOURCE
   24.15 +
   24.16 +//==================  Macros define types of meas want  =====================
   24.17 +//
   24.18 +/*Generic measurement macro -- has name-space collision potential, which
   24.19 + * compiler will catch..  so only use one pair inside a given set of 
   24.20 + * curly braces. 
   24.21 + */
   24.22 +//TODO: finish generic capture interval in hist
   24.23 +enum histograms
   24.24 + { generic1
   24.25 + };
   24.26 +   #define MEAS__Capture_Pre_Point \
   24.27 +      int32 startStamp, endStamp; \
   24.28 +      saveLowTimeStampCountInto( startStamp );
   24.29 +
   24.30 +   #define MEAS__Capture_Post_Point( histName ) \
   24.31 +      saveLowTimeStampCountInto( endStamp ); \
   24.32 +      addIntervalToHist( startStamp, endStamp, _PRMasterEnv->histName ); 
   24.33 +
   24.34 +
   24.35 +
   24.36 +
   24.37 +//==================  Macros define types of meas want  =====================
   24.38 +
   24.39 +#ifdef MEAS__TURN_ON_SUSP_MEAS
   24.40 +   #define MEAS__Insert_Susp_Meas_Fields_into_Slave \
   24.41 +       uint32  preSuspTSCLow; \
   24.42 +       uint32  postSuspTSCLow;
   24.43 +
   24.44 +   #define MEAS__Insert_Susp_Meas_Fields_into_MasterEnv \
   24.45 +       Histogram       *suspLowTimeHist; \
   24.46 +       Histogram       *suspHighTimeHist;
   24.47 +
   24.48 +   #define MEAS__Make_Meas_Hists_for_Susp_Meas \
   24.49 +      _PRMasterEnv->suspLowTimeHist  = makeFixedBinHistExt( 100, 0, 200,\
   24.50 +                                                    "susp_low_time_hist");\
   24.51 +      _PRMasterEnv->suspHighTimeHist  = makeFixedBinHistExt( 100, 0, 200,\
   24.52 +                                                    "susp_high_time_hist");
   24.53 +      
   24.54 +      //record time stamp: compare to time-stamp recorded below
   24.55 +   #define MEAS__Capture_Pre_Susp_Point \
   24.56 +      saveLowTimeStampCountInto( animatingSlv->preSuspTSCLow );
   24.57 +   
   24.58 +      //NOTE: only take low part of count -- do sanity check when take diff
   24.59 +   #define MEAS__Capture_Post_Susp_Point \
   24.60 +      saveLowTimeStampCountInto( animatingSlv->postSuspTSCLow );\
   24.61 +      addIntervalToHist( animatingSlv->preSuspTSCLow, animatingSlv->postSuspTSCLow,\
   24.62 +                         _PRMasterEnv->suspLowTimeHist ); \
   24.63 +      addIntervalToHist( animatingSlv->preSuspTSCLow, animatingSlv->postSuspTSCLow,\
   24.64 +                         _PRMasterEnv->suspHighTimeHist );
   24.65 +
   24.66 +   #define MEAS__Print_Hists_for_Susp_Meas \
   24.67 +      printHist( _PRMasterEnv->suspLowTimeHist ); printHist( _PRMasterEnv->suspHighTimeHist );
   24.68 +      
   24.69 +#else
   24.70 +   #define MEAS__Insert_Susp_Meas_Fields_into_Slave     
   24.71 +   #define MEAS__Insert_Susp_Meas_Fields_into_MasterEnv 
   24.72 +   #define MEAS__Make_Meas_Hists_for_Susp_Meas 
   24.73 +   #define MEAS__Capture_Pre_Susp_Point
   24.74 +   #define MEAS__Capture_Post_Susp_Point   
   24.75 +   #define MEAS__Print_Hists_for_Susp_Meas 
   24.76 +#endif
   24.77 +
   24.78 +#ifdef MEAS__TURN_ON_MASTER_MEAS
   24.79 +   #define MEAS__Insert_Master_Meas_Fields_into_Slave \
   24.80 +       uint32  startMasterTSCLow; \
   24.81 +       uint32  endMasterTSCLow;
   24.82 +
   24.83 +   #define MEAS__Insert_Master_Meas_Fields_into_MasterEnv \
   24.84 +       Histogram       *masterLowTimeHist; \
   24.85 +       Histogram       *masterHighTimeHist;
   24.86 +
   24.87 +   #define MEAS__Make_Meas_Hists_for_Master_Meas \
   24.88 +      _PRMasterEnv->masterLowTimeHist  = makeFixedBinHistExt( 100, 0, 200,\
   24.89 +                                                    "master_low_time_hist");\
   24.90 +      _PRMasterEnv->masterHighTimeHist  = makeFixedBinHistExt( 100, 0, 200,\
   24.91 +                                                    "master_high_time_hist");
   24.92 +
   24.93 +      //Total Master time includes one coreloop time -- just assume the core
   24.94 +      // loop time is same for Master as for AppSlvs, even though it may be
   24.95 +      // smaller due to higher predictability of the fixed jmp.
   24.96 +   #define MEAS__Capture_Pre_Master_Point\
   24.97 +      saveLowTimeStampCountInto( masterVP->startMasterTSCLow );
   24.98 +
   24.99 +   #define MEAS__Capture_Post_Master_Point \
  24.100 +      saveLowTimeStampCountInto( masterVP->endMasterTSCLow );\
  24.101 +      addIntervalToHist( masterVP->startMasterTSCLow, masterVP->endMasterTSCLow,\
  24.102 +                         _PRMasterEnv->masterLowTimeHist ); \
  24.103 +      addIntervalToHist( masterVP->startMasterTSCLow, masterVP->endMasterTSCLow,\
  24.104 +                         _PRMasterEnv->masterHighTimeHist );
  24.105 +
  24.106 +   #define MEAS__Print_Hists_for_Master_Meas \
  24.107 +      printHist( _PRMasterEnv->masterLowTimeHist ); printHist( _PRMasterEnv->masterHighTimeHist );
  24.108 +
  24.109 +#else
  24.110 +   #define MEAS__Insert_Master_Meas_Fields_into_Slave
  24.111 +   #define MEAS__Insert_Master_Meas_Fields_into_MasterEnv 
  24.112 +   #define MEAS__Make_Meas_Hists_for_Master_Meas
  24.113 +   #define MEAS__Capture_Pre_Master_Point 
  24.114 +   #define MEAS__Capture_Post_Master_Point 
  24.115 +   #define MEAS__Print_Hists_for_Master_Meas 
  24.116 +#endif
  24.117 +
  24.118 +      
  24.119 +#ifdef MEAS__TURN_ON_MASTER_LOCK_MEAS
  24.120 +   #define MEAS__Insert_Master_Lock_Meas_Fields_into_MasterEnv \
  24.121 +       Histogram       *masterLockLowTimeHist; \
  24.122 +       Histogram       *masterLockHighTimeHist;
  24.123 +
  24.124 +   #define MEAS__Make_Meas_Hists_for_Master_Lock_Meas \
  24.125 +      _PRMasterEnv->masterLockLowTimeHist  = makeFixedBinHist( 50, 0, 2, \
  24.126 +                                               "master lock low time hist");\
  24.127 +      _PRMasterEnv->masterLockHighTimeHist  = makeFixedBinHist( 50, 0, 100,\
  24.128 +                                               "master lock high time hist");
  24.129 +
  24.130 +   #define MEAS__Capture_Pre_Master_Lock_Point \
  24.131 +      int32 startStamp, endStamp; \
  24.132 +      saveLowTimeStampCountInto( startStamp );
  24.133 +
  24.134 +   #define MEAS__Capture_Post_Master_Lock_Point \
  24.135 +      saveLowTimeStampCountInto( endStamp ); \
  24.136 +      addIntervalToHist( startStamp, endStamp,\
  24.137 +                         _PRMasterEnv->masterLockLowTimeHist ); \
  24.138 +      addIntervalToHist( startStamp, endStamp,\
  24.139 +                         _PRMasterEnv->masterLockHighTimeHist );
  24.140 +
  24.141 +   #define MEAS__Print_Hists_for_Master_Lock_Meas \
  24.142 +      printHist( _PRMasterEnv->masterLockLowTimeHist ); \
  24.143 +      printHist( _PRMasterEnv->masterLockHighTimeHist );
  24.144 +      
  24.145 +#else
  24.146 +   #define MEAS__Insert_Master_Lock_Meas_Fields_into_MasterEnv
  24.147 +   #define MEAS__Make_Meas_Hists_for_Master_Lock_Meas
  24.148 +   #define MEAS__Capture_Pre_Master_Lock_Point 
  24.149 +   #define MEAS__Capture_Post_Master_Lock_Point 
  24.150 +   #define MEAS__Print_Hists_for_Master_Lock_Meas
  24.151 +#endif
  24.152 +
  24.153 +
  24.154 +#ifdef MEAS__TURN_ON_MALLOC_MEAS
  24.155 +   #define MEAS__Insert_Malloc_Meas_Fields_into_MasterEnv\
  24.156 +       Histogram       *mallocTimeHist; \
  24.157 +       Histogram       *freeTimeHist;
  24.158 +
  24.159 +   #define MEAS__Make_Meas_Hists_for_Malloc_Meas \
  24.160 +      _PRMasterEnv->mallocTimeHist  = makeFixedBinHistExt( 100, 0, 30,\
  24.161 +                                                       "malloc_time_hist");\
  24.162 +      _PRMasterEnv->freeTimeHist  = makeFixedBinHistExt( 100, 0, 30,\
  24.163 +                                                       "free_time_hist");
  24.164 +
  24.165 +   #define MEAS__Capture_Pre_Malloc_Point \
  24.166 +      int32 startStamp, endStamp; \
  24.167 +      saveLowTimeStampCountInto( startStamp );
  24.168 +
  24.169 +   #define MEAS__Capture_Post_Malloc_Point \
  24.170 +      saveLowTimeStampCountInto( endStamp ); \
  24.171 +      addIntervalToHist( startStamp, endStamp,\
  24.172 +                         _PRMasterEnv->mallocTimeHist ); 
  24.173 +
  24.174 +   #define MEAS__Capture_Pre_Free_Point \
  24.175 +      int32 startStamp, endStamp; \
  24.176 +      saveLowTimeStampCountInto( startStamp );
  24.177 +
  24.178 +   #define MEAS__Capture_Post_Free_Point \
  24.179 +      saveLowTimeStampCountInto( endStamp ); \
  24.180 +      addIntervalToHist( startStamp, endStamp,\
  24.181 +                         _PRMasterEnv->freeTimeHist ); 
  24.182 +
  24.183 +   #define MEAS__Print_Hists_for_Malloc_Meas \
  24.184 +      printHist( _PRMasterEnv->mallocTimeHist   ); \
  24.185 +      saveHistToFile( _PRMasterEnv->mallocTimeHist   ); \
  24.186 +      printHist( _PRMasterEnv->freeTimeHist     ); \
  24.187 +      saveHistToFile( _PRMasterEnv->freeTimeHist     ); \
  24.188 +      freeHistExt( _PRMasterEnv->mallocTimeHist ); \
  24.189 +      freeHistExt( _PRMasterEnv->freeTimeHist   );
  24.190 +      
  24.191 +#else
  24.192 +   #define MEAS__Insert_Malloc_Meas_Fields_into_MasterEnv
  24.193 +   #define MEAS__Make_Meas_Hists_for_Malloc_Meas 
  24.194 +   #define MEAS__Capture_Pre_Malloc_Point
  24.195 +   #define MEAS__Capture_Post_Malloc_Point
  24.196 +   #define MEAS__Capture_Pre_Free_Point
  24.197 +   #define MEAS__Capture_Post_Free_Point
  24.198 +   #define MEAS__Print_Hists_for_Malloc_Meas 
  24.199 +#endif
  24.200 +
  24.201 +
  24.202 +
  24.203 +#ifdef MEAS__TURN_ON_PLUGIN_MEAS 
  24.204 +   #define MEAS__Insert_Plugin_Meas_Fields_into_MasterEnv \
  24.205 +      Histogram       *reqHdlrLowTimeHist; \
  24.206 +      Histogram       *reqHdlrHighTimeHist;
  24.207 +          
  24.208 +   #define MEAS__Make_Meas_Hists_for_Plugin_Meas \
  24.209 +      _PRMasterEnv->reqHdlrLowTimeHist  = makeFixedBinHistExt( 100, 0, 200,\
  24.210 +                                                    "plugin_low_time_hist");\
  24.211 +      _PRMasterEnv->reqHdlrHighTimeHist  = makeFixedBinHistExt( 100, 0, 200,\
  24.212 +                                                    "plugin_high_time_hist");
  24.213 +
  24.214 +   #define MEAS__startReqHdlr \
  24.215 +      int32 startStamp1, endStamp1; \
  24.216 +      saveLowTimeStampCountInto( startStamp1 );
  24.217 +
  24.218 +   #define MEAS__endReqHdlr \
  24.219 +      saveLowTimeStampCountInto( endStamp1 ); \
  24.220 +      addIntervalToHist( startStamp1, endStamp1, \
  24.221 +                           _PRMasterEnv->reqHdlrLowTimeHist ); \
  24.222 +      addIntervalToHist( startStamp1, endStamp1, \
  24.223 +                           _PRMasterEnv->reqHdlrHighTimeHist );
  24.224 +
  24.225 +   #define MEAS__Print_Hists_for_Plugin_Meas \
  24.226 +      printHist( _PRMasterEnv->reqHdlrLowTimeHist ); \
  24.227 +      saveHistToFile( _PRMasterEnv->reqHdlrLowTimeHist ); \
  24.228 +      printHist( _PRMasterEnv->reqHdlrHighTimeHist ); \
  24.229 +      saveHistToFile( _PRMasterEnv->reqHdlrHighTimeHist ); \
  24.230 +      freeHistExt( _PRMasterEnv->reqHdlrLowTimeHist ); \
  24.231 +      freeHistExt( _PRMasterEnv->reqHdlrHighTimeHist );
  24.232 +#else
  24.233 +   #define MEAS__Insert_Plugin_Meas_Fields_into_MasterEnv
  24.234 +   #define MEAS__Make_Meas_Hists_for_Plugin_Meas
  24.235 +   #define MEAS__startReqHdlr 
  24.236 +   #define MEAS__endReqHdlr 
  24.237 +   #define MEAS__Print_Hists_for_Plugin_Meas 
  24.238 +
  24.239 +#endif
  24.240 +
  24.241 +      
  24.242 +#ifdef MEAS__TURN_ON_SYSTEM_MEAS
  24.243 +   #define MEAS__Insert_System_Meas_Fields_into_Slave \
  24.244 +      TSCountLowHigh  startSusp; \
  24.245 +      uint64  totalSuspCycles; \
  24.246 +      uint32  numGoodSusp;
  24.247 +
  24.248 +   #define MEAS__Insert_System_Meas_Fields_into_MasterEnv \
  24.249 +       TSCountLowHigh   startMaster; \
  24.250 +       uint64           totalMasterCycles; \
  24.251 +       uint32           numMasterAnimations; \
  24.252 +       TSCountLowHigh   startReqHdlr; \
  24.253 +       uint64           totalPluginCycles; \
  24.254 +       uint32           numPluginAnimations; \
  24.255 +       uint64           cyclesTillStartAnimationMaster; \
  24.256 +       TSCountLowHigh   endAnimationMaster;
  24.257 +
  24.258 +   #define MEAS__startAnimationMaster_forSys \
  24.259 +      TSCountLowHigh startStamp1, endStamp1; \
  24.260 +      saveTSCLowHigh( endStamp1 ); \
  24.261 +      _PRMasterEnv->cyclesTillStartAnimationMaster = \
  24.262 +      endStamp1.longVal - masterVP->startSusp.longVal;
  24.263 +
  24.264 +   #define MEAS__startReqHdlr_forSys \
  24.265 +        saveTSCLowHigh( startStamp1 ); \
  24.266 +        _PRMasterEnv->startReqHdlr.longVal = startStamp1.longVal;
  24.267 + 
  24.268 +   #define MEAS__endAnimationMaster_forSys \
  24.269 +      saveTSCLowHigh( startStamp1 ); \
  24.270 +      _PRMasterEnv->endAnimationMaster.longVal = startStamp1.longVal;
  24.271 +
  24.272 +   /*A TSC value is stored in the VP first thing inside the wrapper-lib.
  24.273 +    * This macro measures the cycles from that point to here.
  24.274 +    * Master and Plugin add this value to the other trace-seg measures.
  24.275 +    */
  24.276 +   #define MEAS__Capture_End_Susp_in_CoreCtlr_ForSys\
  24.277 +          saveTSCLowHigh(endSusp); \
  24.278 +          numCycles = endSusp.longVal - currVP->startSusp.longVal; \
  24.279 +          /*sanity check (400K is about 20K iters)*/ \
  24.280 +          if( numCycles < 400000 ) \
  24.281 +           { currVP->totalSuspCycles += numCycles; \
  24.282 +             currVP->numGoodSusp++; \
  24.283 +           } \
  24.284 +             /*recorded every time, but only read if currVP == MasterVP*/ \
  24.285 +          _PRMasterEnv->startMaster.longVal = endSusp.longVal;
  24.286 +
  24.287 +#else
  24.288 +   #define MEAS__Insert_System_Meas_Fields_into_Slave 
  24.289 +   #define MEAS__Insert_System_Meas_Fields_into_MasterEnv 
  24.290 +   #define MEAS__Make_Meas_Hists_for_System_Meas
  24.291 +   #define MEAS__startAnimationMaster_forSys 
  24.292 +   #define MEAS__startReqHdlr_forSys
  24.293 +   #define MEAS__endAnimationMaster_forSys
  24.294 +   #define MEAS__Capture_End_Susp_in_CoreCtlr_ForSys
  24.295 +   #define MEAS__Print_Hists_for_System_Meas 
  24.296 +#endif
  24.297 +
  24.298 +#ifdef HOLISTIC__TURN_ON_PERF_COUNTERS
  24.299 +   
  24.300 +   #define MEAS__Insert_Counter_Handler \
  24.301 +   typedef void (*CounterHandler) (int,int,int,SlaveVP*,uint64,uint64,uint64);
  24.302 + 
  24.303 +   enum eventType {
  24.304 +    DebugEvt = 0,
  24.305 +    AppResponderInvocation_start,
  24.306 +    AppResponder_start,
  24.307 +    AppResponder_end,
  24.308 +    AssignerInvocation_start,
  24.309 +    NextAssigner_start,
  24.310 +    Assigner_start,
  24.311 +    Assigner_end,
  24.312 +    Work_start,
  24.313 +    Work_end,
  24.314 +    HwResponderInvocation_start,
  24.315 +    Timestamp_start,
  24.316 +    Timestamp_end
  24.317 +   };
  24.318 +   
  24.319 +   #define saveCyclesAndInstrs(core,cycles,instrs,cachem) do{ \
  24.320 +   int cycles_fd = _PRMasterEnv->cycles_counter_fd[core]; \
  24.321 +   int instrs_fd = _PRMasterEnv->instrs_counter_fd[core]; \
  24.322 +   int cachem_fd = _PRMasterEnv->cachem_counter_fd[core]; \
  24.323 +   int nread;                                           \
  24.324 +                                                        \
  24.325 +   nread = read(cycles_fd,&(cycles),sizeof(cycles));    \
  24.326 +   if(nread<0){                                         \
  24.327 +       perror("Error reading cycles counter");          \
  24.328 +       cycles = 0;                                      \
  24.329 +   }                                                    \
  24.330 +                                                        \
  24.331 +   nread = read(instrs_fd,&(instrs),sizeof(instrs));    \
  24.332 +   if(nread<0){                                         \
  24.333 +       perror("Error reading instrs counter");          \
  24.334 +       instrs = 0;                                      \
  24.335 +   }                                                    \
  24.336 +   nread = read(cachem_fd,&(cachem),sizeof(cachem));    \
  24.337 +   if(nread<0){                                         \
  24.338 +       perror("Error reading last level cache miss counter");          \
  24.339 +       cachem = 0;                                      \
  24.340 +   }                                                    \
  24.341 +   } while (0) 
  24.342 +
  24.343 +   #define MEAS__Insert_Counter_Meas_Fields_into_MasterEnv \
  24.344 +     int cycles_counter_fd[NUM_CORES]; \
  24.345 +     int instrs_counter_fd[NUM_CORES]; \
  24.346 +     int cachem_counter_fd[NUM_CORES]; \
  24.347 +     uint64 start_master_lock[NUM_CORES][3]; \
  24.348 +     CounterHandler counterHandler;
  24.349 +
  24.350 +   #define HOLISTIC__Setup_Perf_Counters setup_perf_counters();
  24.351 +   
  24.352 +
  24.353 +   #define HOLISTIC__CoreCtrl_Setup \
  24.354 +   CounterHandler counterHandler = _PRMasterEnv->counterHandler; \
  24.355 +   SlaveVP      *lastVPBeforeMaster = NULL; \
  24.356 +   /*if(thisCoresThdParams->coreNum == 0){ \
  24.357 +       uint64 initval = tsc_offset_send(thisCoresThdParams,0); \
  24.358 +       while(!coreCtlrThdParams[NUM_CORES - 2]->ret_tsc); \
  24.359 +   } \
  24.360 +   if(0 < (thisCoresThdParams->coreNum) && (thisCoresThdParams->coreNum) < (NUM_CORES - 1)){ \
  24.361 +       ThdParams* sendCoresThdParams = coreCtlrThdParams[thisCoresThdParams->coreNum - 1]; \
  24.362 +       int sndctr = tsc_offset_resp(sendCoresThdParams, 0); \
  24.363 +       uint64 initval = tsc_offset_send(thisCoresThdParams,0); \
  24.364 +       while(!coreCtlrThdParams[NUM_CORES - 2]->ret_tsc); \
  24.365 +   }  \
  24.366 +   if(thisCoresThdParams->coreNum == (NUM_CORES - 1)){ \
  24.367 +       ThdParams* sendCoresThdParams = coreCtlrThdParams[thisCoresThdParams->coreNum - 1]; \
  24.368 +       int sndctr = tsc_offset_resp(sendCoresThdParams,0); \
  24.369 +   }*/
  24.370 +   
  24.371 +   
  24.372 +   #define HOLISTIC__Insert_Master_Global_Vars \
  24.373 +        int vpid,task; \
  24.374 +        CounterHandler counterHandler = masterEnv->counterHandler;
  24.375 +   
  24.376 +   #define HOLISTIC__Record_last_work lastVPBeforeMaster = currVP;
  24.377 +
  24.378 +   #define HOLISTIC__Record_AppResponderInvocation_start \
  24.379 +      uint64 cycles,instrs,cachem; \
  24.380 +      saveCyclesAndInstrs(thisCoresIdx,cycles, instrs,cachem); \
  24.381 +      if(lastVPBeforeMaster){ \
  24.382 +        (*counterHandler)(AppResponderInvocation_start,lastVPBeforeMaster->slaveID,lastVPBeforeMaster->assignCount,lastVPBeforeMaster,cycles,instrs,cachem); \
  24.383 +        lastVPBeforeMaster = NULL; \
  24.384 +      } else { \
  24.385 +          _PRMasterEnv->start_master_lock[thisCoresIdx][0] = cycles; \
  24.386 +          _PRMasterEnv->start_master_lock[thisCoresIdx][1] = instrs; \
  24.387 +          _PRMasterEnv->start_master_lock[thisCoresIdx][2] = cachem; \
  24.388 +      }
  24.389 + 
  24.390 +           /* Request Handler may call resume() on the VP, but we want to 
  24.391 +                * account the whole interval to the same task. Therefore, need
  24.392 +                * to save task ID at the beginning.
  24.393 +                * 
  24.394 +                * Using this value as "end of AppResponder Invocation Time"
  24.395 +                * is possible if there is only one SchedSlot per core -
  24.396 +                * invoking processor is last to be treated here! If more than
  24.397 +                * one slot, MasterLoop processing time for all but the last VP
  24.398 +                * would be erroneously counted as invocation time.
  24.399 +                */
  24.400 +   #define HOLISTIC__Record_AppResponder_start \
  24.401 +               vpid = currSlot->slaveAssignedToSlot->slaveID; \
  24.402 +               task = currSlot->slaveAssignedToSlot->assignCount; \
  24.403 +               uint64 cycles, instrs, cachem; \
  24.404 +               saveCyclesAndInstrs(thisCoresIdx,cycles, instrs,cachem); \
  24.405 +               (*counterHandler)(AppResponder_start,vpid,task,currSlot->slaveAssignedToSlot,cycles,instrs,cachem);
  24.406 +
  24.407 +   #define HOLISTIC__Record_AppResponder_end \
  24.408 +        uint64 cycles2,instrs2,cachem2; \
  24.409 +        saveCyclesAndInstrs(thisCoresIdx,cycles2, instrs2,cachem2); \
  24.410 +        (*counterHandler)(AppResponder_end,vpid,task,currSlot->slaveAssignedToSlot,cycles2,instrs2,cachem2); \
  24.411 +        (*counterHandler)(Timestamp_end,vpid,task,currSlot->slaveAssignedToSlot,rdtsc(),0,0);
  24.412 +
  24.413 +   
  24.414 +   /* Don't know who to account time to yet - goes to assigned VP
  24.415 +    * after the call.
  24.416 +    */
  24.417 +   #define HOLISTIC__Record_Assigner_start \
  24.418 +       int empty = FALSE; \
  24.419 +       if(currSlot->slaveAssignedToSlot == NULL){ \
  24.420 +           empty= TRUE; \
  24.421 +       } \
  24.422 +       uint64 tmp_cycles, tmp_instrs, tmp_cachem; \
  24.423 +       saveCyclesAndInstrs(thisCoresIdx,tmp_cycles,tmp_instrs,tmp_cachem); \
  24.424 +       uint64 tsc = rdtsc(); \
  24.425 +       if(vpid > 0) { \
  24.426 +           (*counterHandler)(NextAssigner_start,vpid,task,currSlot->slaveAssignedToSlot,tmp_cycles,tmp_instrs,tmp_cachem); \
  24.427 +           vpid = 0; \
  24.428 +           task = 0; \
  24.429 +        }
  24.430 +
  24.431 +   #define HOLISTIC__Record_Assigner_end \
  24.432 +        uint64 cycles,instrs,cachem; \
  24.433 +        saveCyclesAndInstrs(thisCoresIdx,cycles,instrs,cachem); \
  24.434 +        if(empty){ \
  24.435 +            (*counterHandler)(AssignerInvocation_start,assignedSlaveVP->slaveID,assignedSlaveVP->assignCount,assignedSlaveVP,masterEnv->start_master_lock[thisCoresIdx][0],masterEnv->start_master_lock[thisCoresIdx][1],masterEnv->start_master_lock[thisCoresIdx][2]); \
  24.436 +        } \
  24.437 +        (*counterHandler)(Timestamp_start,assignedSlaveVP->slaveID,assignedSlaveVP->assignCount,assignedSlaveVP,tsc,0,0); \
  24.438 +        (*counterHandler)(Assigner_start,assignedSlaveVP->slaveID,assignedSlaveVP->assignCount,assignedSlaveVP,tmp_cycles,tmp_instrs,tmp_cachem); \
  24.439 +        (*counterHandler)(Assigner_end,assignedSlaveVP->slaveID,assignedSlaveVP->assignCount,assignedSlaveVP,cycles,instrs,cachem);
  24.440 +
  24.441 +   #define HOLISTIC__Record_Work_start \
  24.442 +        if(currVP){ \
  24.443 +                uint64 cycles,instrs,cachem; \
  24.444 +                saveCyclesAndInstrs(thisCoresIdx,cycles, instrs,cachem); \
  24.445 +                (*counterHandler)(Work_start,currVP->slaveID,currVP->assignCount,currVP,cycles,instrs,cachem); \
  24.446 +        }
  24.447 +   
  24.448 +   #define HOLISTIC__Record_Work_end \
  24.449 +       if(currVP){ \
  24.450 +               uint64 cycles,instrs,cachem; \
  24.451 +               saveCyclesAndInstrs(thisCoresIdx,cycles, instrs,cachem); \
  24.452 +               (*counterHandler)(Work_end,currVP->slaveID,currVP->assignCount,currVP,cycles,instrs,cachem); \
  24.453 +       }
  24.454 +
  24.455 +   #define HOLISTIC__Record_HwResponderInvocation_start \
  24.456 +        uint64 cycles,instrs,cachem; \
  24.457 +        saveCyclesAndInstrs(animatingSlv->coreAnimatedBy,cycles, instrs,cachem); \
  24.458 +        (*(_PRMasterEnv->counterHandler))(HwResponderInvocation_start,animatingSlv->slaveID,animatingSlv->assignCount,animatingSlv,cycles,instrs,cachem); 
  24.459 +        
  24.460 +
  24.461 +   #define getReturnAddressBeforeLibraryCall(vp_ptr, res_ptr) do{     \
  24.462 +void* frame_ptr0 = vp_ptr->framePtr;                               \
  24.463 +void* frame_ptr1 = *((void**)frame_ptr0);                          \
  24.464 +void* frame_ptr2 = *((void**)frame_ptr1);                          \
  24.465 +void* frame_ptr3 = *((void**)frame_ptr2);                          \
  24.466 +void* ret_addr = *((void**)frame_ptr3 + 1);                        \
  24.467 +*res_ptr = ret_addr;                                               \
  24.468 +} while (0)
  24.469 +
  24.470 +#else  
  24.471 +   #define MEAS__Insert_Counter_Handler
  24.472 +   #define MEAS__Insert_Counter_Meas_Fields_into_MasterEnv
  24.473 +   #define HOLISTIC__Setup_Perf_Counters
  24.474 +   #define HOLISTIC__CoreCtrl_Setup
  24.475 +   #define HOLISTIC__Insert_Master_Global_Vars
  24.476 +   #define HOLISTIC__Record_last_work
  24.477 +   #define HOLISTIC__Record_AppResponderInvocation_start
  24.478 +   #define HOLISTIC__Record_AppResponder_start
  24.479 +   #define HOLISTIC__Record_AppResponder_end
  24.480 +   #define HOLISTIC__Record_Assigner_start
  24.481 +   #define HOLISTIC__Record_Assigner_end
  24.482 +   #define HOLISTIC__Record_Work_start
  24.483 +   #define HOLISTIC__Record_Work_end
  24.484 +   #define HOLISTIC__Record_HwResponderInvocation_start
  24.485 +   #define getReturnAddressBeforeLibraryCall(vp_ptr, res_ptr)
  24.486 +#endif
  24.487 +
  24.488 +//Experiment in two-step macros -- if doesn't work, insert each separately
  24.489 +#define MEAS__Insert_Meas_Fields_into_Slave  \
  24.490 +   MEAS__Insert_Susp_Meas_Fields_into_Slave \
  24.491 +   MEAS__Insert_Master_Meas_Fields_into_Slave \
  24.492 +   MEAS__Insert_System_Meas_Fields_into_Slave 
  24.493 +
  24.494 +
  24.495 +//======================  Histogram Macros -- Create ========================
  24.496 +//
  24.497 +//
  24.498 +
  24.499 +//The language implementation should include a definition of this macro,
  24.500 +// which creates all the histograms the language uses to collect measurements
  24.501 +// of plugin operation -- so, if the language didn't define it, must
  24.502 +// define it here (as empty), to avoid compile error
  24.503 +#ifndef MEAS__Make_Meas_Hists_for_Language
  24.504 +#define MEAS__Make_Meas_Hists_for_Language
  24.505 +#endif
  24.506 +
  24.507 +#define makeAMeasHist( idx, name, numBins, startVal, binWidth ) \
  24.508 +      makeHighestDynArrayIndexBeAtLeast( _PRMasterEnv->measHistsInfo, idx ); \
  24.509 +      _PRMasterEnv->measHists[idx] =  \
  24.510 +                       makeFixedBinHist( numBins, startVal, binWidth, name );
  24.511 +
  24.512 +//==============================  Probes  ===================================
  24.513 +
  24.514 +
  24.515 +//===========================================================================
  24.516 +#endif	/* _PR_DEFS_MEAS_H */
  24.517 +
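Editorial sketch, not part of the changeset: the header above pairs every measurement macro with an empty definition in its `#else` branch, so call sites compile unchanged whether a measurement is switched on or off. The minimal C example below illustrates that pattern under stated assumptions. `DEMO__TURN_ON_MEAS`, `DemoHist`, `demo_clock`, `demo_add_interval`, and `demo_timed_work` are hypothetical names invented for illustration, and a plain counter stands in for the time-stamp counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical switch, analogous to MEAS__TURN_ON_MALLOC_MEAS. */
#define DEMO__TURN_ON_MEAS

/* Hypothetical stand-in for a histogram: counts and sums intervals. */
typedef struct { uint32_t count; uint64_t total; } DemoHist;

static uint32_t demo_clock = 0;   /* stand-in for the low 32 bits of the TSC */

static void demo_add_interval( uint32_t start, uint32_t end, DemoHist *hist )
 { assert( hist != NULL );
   hist->count += 1;
      /* unsigned subtraction stays correct across a single wraparound */
   hist->total += (uint32_t)( end - start );
 }

#ifdef DEMO__TURN_ON_MEAS
   #define DEMO__Capture_Pre_Point \
      uint32_t startStamp, endStamp; \
      startStamp = demo_clock;

   #define DEMO__Capture_Post_Point( hist ) \
      endStamp = demo_clock; \
      demo_add_interval( startStamp, endStamp, (hist) );
#else
      /* Empty definitions: the call sites vanish when measurement is off. */
   #define DEMO__Capture_Pre_Point
   #define DEMO__Capture_Post_Point( hist )
#endif

/* The pre-point macro declares locals, so it can appear only once per
 * scope and must sit where a declaration is legal.
 */
static int demo_timed_work( int x, DemoHist *hist )
 { int result;
   DEMO__Capture_Pre_Point
   demo_clock += 7;          /* pretend the work took 7 ticks */
   result = x * 2;
   DEMO__Capture_Post_Point( hist )
   return result;
 }
```

Because the pre-point macro injects variable declarations into the caller's scope, two measured regions in the same function need distinct variable names. That is presumably why the plugin macros above use `startStamp1` and `endStamp1`.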
    25.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    25.2 +++ b/Services_Offered_by_PR/Measurement_and_Stats/probes.c	Wed Sep 19 23:12:44 2012 -0700
    25.3 @@ -0,0 +1,304 @@
    25.4 +/*
    25.5 + * Copyright 2010  OpenSourceStewardshipFoundation
    25.6 + *
    25.7 + * Licensed under BSD
    25.8 + */
    25.9 +
   25.10 +#include <stdio.h>
   25.11 +#include <malloc.h>
   25.12 +#include <sys/time.h>
   25.13 +#include <string.h>
   25.14 +#include "PR_impl/PR.h"
   25.15 +
   25.16 +
   25.17 +
   25.18 +//====================  Probes =================
   25.19 +/*
   25.20 + * In practice, probe operations are called from the app, from inside slaves
   25.21 + *  -- so have to be sure each probe is single-Slv owned, and be sure that
   25.22 + *  any place common structures are modified it's done inside the master.
   25.23 + * So -- the only place common structures are modified is during creation.
   25.24 + *  After that, all mods are to individual instances.
   25.25 + *
   25.26 + * Thinking perhaps should change the semantics to be that probes are
   25.27 + *  attached to the virtual processor -- and then everything is guaranteed
   25.28 + *  to be isolated -- except then can't take any intervals that span Slvs,
   25.29 + *  and would have to transfer the probes to Master env when Slv dissipates..
   25.30 + *  gets messy..
   25.31 + *
   25.32 + * For now, just making so that probe creation causes a suspend, so that
   25.33 + *  the dynamic array in the master env is only modified from the master
   25.34 + * 
   25.35 + */
   25.36 +
   25.37 +//============================  Helpers ===========================
   25.38 +inline void 
   25.39 +doNothing()
   25.40 + {
   25.41 + }
   25.42 +
   25.43 +float64 inline
   25.44 +giveInterval( struct timeval _start, struct timeval _end )
   25.45 + { float64 start, end;
   25.46 +   start = _start.tv_sec + _start.tv_usec / 1000000.0;
   25.47 +   end   = _end.tv_sec   + _end.tv_usec   / 1000000.0;
   25.48 +   return end - start;
   25.49 + }
   25.50 +          
   25.51 +//=================================================================
   25.52 +IntervalProbe *
   25.53 +create_generic_probe( char *nameStr, SlaveVP *animSlv )
   25.54 + {
   25.55 +   PRSemReq reqData;
   25.56 +
   25.57 +   reqData.reqType  = make_probe;
   25.58 +   reqData.nameStr  = nameStr;
   25.59 +
   25.60 +   PR_WL__send_PRSem_request( &reqData, animSlv );
   25.61 +
   25.62 +   return animSlv->dataRetFromReq;
   25.63 + }
   25.64 +
   25.65 +/*Use this version from outside PR -- it uses external malloc, and modifies
   25.66 + * dynamic array, so can't be animated in a slave Slv
   25.67 + */
   25.68 +IntervalProbe *
   25.69 +ext__create_generic_probe( char *nameStr )
   25.70 + { IntervalProbe *newProbe;
   25.71 +   int32          nameLen;
   25.72 +
   25.73 +   newProbe          = malloc( sizeof(IntervalProbe) );
   25.74 +   nameLen = strlen( nameStr );
   25.75 +   newProbe->nameStr = malloc( nameLen + 1 ); //room for terminating '\0'
   25.76 +   memcpy( newProbe->nameStr, nameStr, nameLen + 1 );
   25.77 +   newProbe->hist    = NULL;
   25.78 +   newProbe->schedChoiceWasRecorded = FALSE;
   25.79 +   newProbe->probeID =
   25.80 +             addToDynArray( newProbe, _PRMasterEnv->dynIntervalProbesInfo );
   25.81 +
   25.82 +   return newProbe;
   25.83 + }
   25.84 +
   25.85 +//============================ Fns def in header =======================
   25.86 +
   25.87 +int32
   25.88 +PR_impl__create_single_interval_probe( char *nameStr, SlaveVP *animSlv )
   25.89 + { IntervalProbe *newProbe;
   25.90 +
   25.91 +   newProbe = create_generic_probe( nameStr, animSlv );
   25.92 +   
   25.93 +   return newProbe->probeID;
   25.94 + }
   25.95 +
   25.96 +int32
   25.97 +PR_impl__create_histogram_probe( int32   numBins, float64    startValue,
   25.98 +               float64 binWidth, char   *nameStr, SlaveVP *animSlv )
   25.99 + { IntervalProbe *newProbe;
  25.100 +
  25.101 +   newProbe = create_generic_probe( nameStr, animSlv );
  25.102 +   
  25.103 +#ifdef PROBES__USE_TIME_OF_DAY_PROBES
  25.104 +   DblHist *hist;
  25.105 +   hist =  makeDblHistogram( numBins, startValue, binWidth );
  25.106 +#else
  25.107 +   Histogram *hist;
  25.108 +   hist =  makeHistogram( numBins, startValue, binWidth );
  25.109 +#endif
  25.110 +   newProbe->hist = hist;
  25.111 +   return newProbe->probeID;
  25.112 + }
  25.113 +
  25.114 +
  25.115 +int32
  25.116 +PR_impl__record_time_point_into_new_probe( char *nameStr, SlaveVP *animSlv)
  25.117 + { IntervalProbe *newProbe;
  25.118 +   struct timeval *startStamp;
  25.119 +   float64 startSecs;
  25.120 +
  25.121 +   newProbe           = create_generic_probe( nameStr, animSlv );
  25.122 +   newProbe->endSecs  = 0;
  25.123 +
  25.124 +   
  25.125 +   gettimeofday( &(newProbe->startStamp), NULL);
  25.126 +
  25.127 +      //turn into a double
  25.128 +   startStamp = &(newProbe->startStamp);
  25.129 +   startSecs = startStamp->tv_sec + ( startStamp->tv_usec / 1000000.0 );
  25.130 +   newProbe->startSecs = startSecs;
  25.131 +
  25.132 +   return newProbe->probeID;
  25.133 + }
  25.134 +
  25.135 +int32
  25.136 +PR_ext_impl__record_time_point_into_new_probe( char *nameStr )
  25.137 + { IntervalProbe *newProbe;
  25.138 +   struct timeval *startStamp;
  25.139 +   float64 startSecs;
  25.140 +
  25.141 +   newProbe           = ext__create_generic_probe( nameStr );
  25.142 +   newProbe->endSecs  = 0;
  25.143 +
  25.144 +   gettimeofday( &(newProbe->startStamp), NULL);
  25.145 +
  25.146 +      //turn into a double
  25.147 +   startStamp = &(newProbe->startStamp);
  25.148 +   startSecs = startStamp->tv_sec + ( startStamp->tv_usec / 1000000.0 );
  25.149 +   newProbe->startSecs = startSecs;
  25.150 +
  25.151 +   return newProbe->probeID;
  25.152 + }
  25.153 +
  25.154 +
  25.155 +/*Only call from inside master or main startup/shutdown thread
  25.156 + */
  25.157 +void
  25.158 +PR_impl__free_probe( IntervalProbe *probe )
  25.159 + { if( probe->hist != NULL )   freeDblHist( probe->hist );
  25.160 +   if( probe->nameStr != NULL) PR_int__free( probe->nameStr );
  25.161 +   PR_int__free( probe );
  25.162 + }
  25.163 +
  25.164 +
  25.165 +void
  25.166 +PR_impl__index_probe_by_its_name( int32 probeID, SlaveVP *animSlv )
  25.167 + { IntervalProbe *probe;
  25.168 +
  25.169 +   PR_int__get_master_lock();
  25.170 +   probe = _PRMasterEnv->intervalProbes[ probeID ];
  25.171 +
  25.172 +   addValueIntoTable(probe->nameStr, probe, _PRMasterEnv->probeNameHashTbl);
  25.173 +   PR_int__release_master_lock();
  25.174 + }
  25.175 +
  25.176 +
  25.177 +IntervalProbe *
  25.178 +PR_impl__get_probe_by_name( char *probeName, SlaveVP *animSlv )
  25.179 + {
  25.180 +   //TODO: fix this to be done in the Master -- race condition
  25.181 +   return getValueFromTable( probeName, _PRMasterEnv->probeNameHashTbl );
  25.182 + }
  25.183 +
  25.184 +
  25.185 +/*Everything is local to the animating slaveVP, so no need for request, do
  25.186 + * work locally, in the anim Slv
  25.187 + */
  25.188 +void
  25.189 +PR_impl__record_sched_choice_into_probe( int32 probeID, SlaveVP *animatingSlv )
  25.190 + { IntervalProbe *probe;
  25.191 + 
  25.192 +   probe = _PRMasterEnv->intervalProbes[ probeID ];
  25.193 +   probe->schedChoiceWasRecorded = TRUE;
  25.194 +   probe->coreNum = animatingSlv->coreAnimatedBy;
  25.195 +   probe->slaveID = animatingSlv->slaveID;
  25.196 +   probe->slaveCreateSecs = animatingSlv->createPtInSecs;
  25.197 + }
  25.198 +
  25.199 +/*Everything is local to the animating slaveVP, so no need for request, do
  25.200 + * work locally, in the anim Slv
  25.201 + */
  25.202 +void
  25.203 +PR_impl__record_interval_start_in_probe( int32 probeID )
  25.204 + { IntervalProbe *probe;
  25.205 +
  25.206 +         DEBUG__printf( dbgProbes, "record start of interval" )
  25.207 +   probe = _PRMasterEnv->intervalProbes[ probeID ];
  25.208 +
  25.209 +      //record *start* point as last thing, after lookup
  25.210 +#ifdef PROBES__USE_TIME_OF_DAY_PROBES
  25.211 +   gettimeofday( &(probe->startStamp), NULL);
  25.212 +#endif
  25.213 +#ifdef PROBES__USE_TSC_PROBES
  25.214 +   probe->startStamp = getTSCount();
  25.215 +#endif
  25.216 + }
  25.217 +
  25.218 +
  25.219 +/*Everything is local to the animating slaveVP, except the histogram, so do
  25.220 + * work locally, in the anim Slv -- may lose a few histogram counts
  25.221 + * 
  25.222 + *This should be safe to run inside SlaveVP
  25.223 + */
  25.224 +void
  25.225 +PR_impl__record_interval_end_in_probe( int32 probeID )
  25.226 + { IntervalProbe *probe;
  25.227 +
  25.228 +   //Record first thing -- before looking up the probe to store it into
  25.229 +#ifdef PROBES__USE_TIME_OF_DAY_PROBES
  25.230 +   struct timeval  endStamp;
  25.231 +   gettimeofday( &(endStamp), NULL);
  25.232 +#endif
  25.233 +#ifdef PROBES__USE_TSC_PROBES
  25.234 +   TSCount endStamp, interval;
  25.235 +   endStamp = getTSCount();
  25.236 +#endif
  25.237 +#ifdef PROBES__USE_PERF_CTR_PROBES
  25.238 +
  25.239 +#endif
  25.240 +   
  25.241 +   probe = _PRMasterEnv->intervalProbes[ probeID ];
  25.242 +
  25.243 +#ifdef PROBES__USE_TIME_OF_DAY_PROBES
  25.244 +   if( probe->hist != NULL )
  25.245 +    { addToDblHist( giveInterval( probe->startStamp, endStamp), probe->hist );
  25.246 +    }
  25.247 +#endif
  25.248 +#ifdef PROBES__USE_TSC_PROBES
  25.249 +   if( probe->hist != NULL )
  25.250 +    { interval = endStamp - probe->startStamp;
  25.251 +         //Sanity check for TSC counter overflow: if sane, add to histogram
  25.252 +      if( interval < probe->hist->endOfRange * 10 )
  25.253 +         addToHist( interval, probe->hist );
  25.254 +    }
  25.255 +#endif
  25.256 +#ifdef PROBES__USE_PERF_CTR_PROBES
  25.257 +
  25.258 +#endif
  25.259 +   
  25.260 +         DEBUG__printf( dbgProbes, "record end of interval" )
  25.261 + }
  25.262 +
  25.263 +
  25.264 +void
  25.265 +print_probe_helper( IntervalProbe *probe )
  25.266 + {
  25.267 +   printf( "\nprobe: %s, ",  probe->nameStr );
  25.268 +   
  25.269 +   
  25.270 +   if( probe->schedChoiceWasRecorded )
  25.271 +    { printf( "coreNum: %d, slaveID: %d, slaveVPCreated: %0.6f | ",
  25.272 +              probe->coreNum, probe->slaveID, probe->slaveCreateSecs );
  25.273 +    }
  25.274 +
  25.275 +   if( probe->endSecs == 0 ) //just a single point in time
  25.276 +    {
  25.277 +      printf( " time point: %.6f\n",
  25.278 +              probe->startSecs - _PRMasterEnv->createPtInSecs );
  25.279 +    }
  25.280 +   else if( probe->hist == NULL ) //just an interval
  25.281 +    {
  25.282 +      printf( " startSecs: %.6f interval: %.6f\n", 
  25.283 +         (probe->startSecs - _PRMasterEnv->createPtInSecs), probe->interval);
  25.284 +    }
  25.285 +   else  //a full histogram of intervals
  25.286 +    {
  25.287 +      printDblHist( probe->hist );
  25.288 +    }
  25.289 + }
  25.290 +
  25.291 +void
  25.292 +PR_impl__print_stats_of_probe( IntervalProbe *probe )
  25.293 + { 
  25.294 +
  25.295 +//   probe = _PRMasterEnv->intervalProbes[ probeID ];
  25.296 +
  25.297 +   print_probe_helper( probe );
  25.298 + }
  25.299 +
  25.300 +
  25.301 +void
  25.302 +PR_impl__print_stats_of_all_probes()
  25.303 + {
  25.304 +   forAllInDynArrayDo( _PRMasterEnv->dynIntervalProbesInfo,
  25.305 +                          (DynArrayFnPtr) &PR_impl__print_stats_of_probe );
  25.306 +   fflush( stdout );
  25.307 + }
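The interval-end routine above takes its end stamp before looking up the probe, so the lookup cost stays out of the measurement. A minimal sketch of that ordering for the time-of-day variant follows. The names sketch_interval_secs and sketch_record_interval_end are hypothetical, and sketch_interval_secs only stands in for giveInterval, whose definition is not shown in this changeset.

```c
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical stand-in for giveInterval(): seconds elapsed between two
 * gettimeofday() stamps. */
double
sketch_interval_secs( struct timeval start, struct timeval end )
 {
   return (double)(end.tv_sec - start.tv_sec)
        + (double)(end.tv_usec - start.tv_usec) / 1000000.0;
 }

/* Takes the end stamp before any other work, mirroring the ordering used in
 * PR_impl__record_interval_end_in_probe, then computes the interval. */
double
sketch_record_interval_end( struct timeval startStamp )
 { struct timeval endStamp;

   gettimeofday( &endStamp, NULL );   /* first thing, before any lookup */
   return sketch_interval_secs( startStamp, endStamp );
 }
```

Subtracting the seconds and microseconds fields separately keeps the result exact for small intervals, where converting each stamp to a double first would lose low-order digits.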
    26.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    26.2 +++ b/Services_Offered_by_PR/Measurement_and_Stats/probes.h	Wed Sep 19 23:12:44 2012 -0700
    26.3 @@ -0,0 +1,192 @@
    26.4 +/*
    26.5 + *  Copyright 2009 OpenSourceStewardshipFoundation.org
    26.6 + *  Licensed under GNU General Public License version 2
    26.7 + *
    26.8 + * Author: seanhalle@yahoo.com
    26.9 + * 
   26.10 + */
   26.11 +
   26.12 +#ifndef _PROBES_H
   26.13 +#define	_PROBES_H
   26.14 +#define _GNU_SOURCE
   26.15 +
   26.16 +#include "PR_impl/PR_primitive_data_types.h"
   26.17 +
   26.18 +#include <sys/time.h>
   26.19 +
   26.20 +/*Note on order of include files:  
   26.21 + * This file relies on #defines that appear in other files, which must come
   26.22 + * first in the #include sequence.
   26.23 + */
   26.24 +
   26.25 +/*Use these aliases in application code*/
   26.26 +#define PR_App__record_time_point_into_new_probe PR_WL__record_time_point_into_new_probe
   26.27 +#define PR_App__create_single_interval_probe   PR_WL__create_single_interval_probe
   26.28 +#define PR_App__create_histogram_probe         PR_WL__create_histogram_probe
   26.29 +#define PR_App__index_probe_by_its_name        PR_WL__index_probe_by_its_name
   26.30 +#define PR_App__get_probe_by_name              PR_WL__get_probe_by_name
   26.31 +#define PR_App__record_sched_choice_into_probe PR_WL__record_sched_choice_into_probe
   26.32 +#define PR_App__record_interval_start_in_probe PR_WL__record_interval_start_in_probe 
   26.33 +#define PR_App__record_interval_end_in_probe   PR_WL__record_interval_end_in_probe
   26.34 +#define PR_App__print_stats_of_probe           PR_WL__print_stats_of_probe
   26.35 +#define PR_App__print_stats_of_all_probes      PR_WL__print_stats_of_all_probes 
   26.36 +
   26.37 +
   26.38 +//==========================
   26.39 +#ifdef PROBES__USE_TSC_PROBES
   26.40 +   #define PROBES__Insert_timestamps_and_intervals_into_probe_struct \
   26.41 +   TSCount    startStamp; \
   26.42 +   TSCount    endStamp; \
   26.43 +   TSCount    interval; \
   26.44 +   Histogram *hist; /*if left NULL, then is single interval probe*/
   26.45 +#endif
   26.46 +#ifdef PROBES__USE_TIME_OF_DAY_PROBES
   26.47 +   #define PROBES__Insert_timestamps_and_intervals_into_probe_struct \
   26.48 +   struct timeval  startStamp; \
   26.49 +   struct timeval  endStamp; \
   26.50 +   float64         startSecs; \
   26.51 +   float64         endSecs; \
   26.52 +   float64         interval; \
   26.53 +   DblHist        *hist; /*if NULL, then is single interval probe*/
   26.54 +#endif
   26.55 +#ifdef PROBES__USE_PERF_CTR_PROBES
   26.56 +   #define PROBES__Insert_timestamps_and_intervals_into_probe_struct \
   26.57 +   int64  startStamp; \
   26.58 +   int64  endStamp; \
   26.59 +   int64  interval; \
   26.60 +   Histogram *hist; /*if left NULL, then is single interval probe*/
   26.61 +#endif
   26.62 +
   26.63 +//typedef struct _IntervalProbe IntervalProbe; -- is in PR.h
   26.64 +struct _IntervalProbe
   26.65 + {
   26.66 +   char           *nameStr;
   26.67 +   int32           probeID;
   26.68 +
   26.69 +   int32           schedChoiceWasRecorded;
   26.70 +   int32           coreNum;
   26.71 +   int32           slaveID;
   26.72 +   float64         slaveCreateSecs;
   26.73 +   PROBES__Insert_timestamps_and_intervals_into_probe_struct;
   26.74 + };
   26.75 +
   26.76 +//=========================== NEVER USE THESE ==========================
   26.77 +/*NEVER use these in any code!!  These are here only for use in the macros
   26.78 + * defined in this file!!
   26.79 + */
   26.80 +int32
   26.81 +PR_impl__create_single_interval_probe( char *nameStr, SlaveVP *animSlv );
   26.82 +
   26.83 +int32
   26.84 +PR_impl__create_histogram_probe( int32   numBins, float64    startValue,
   26.85 +               float64 binWidth, char    *nameStr, SlaveVP *animSlv );
   26.86 +
   26.87 +int32
   26.88 +PR_impl__record_time_point_into_new_probe( char *nameStr, SlaveVP *animSlv);
   26.89 +
   26.90 +int32
   26.91 +PR_ext_impl__record_time_point_into_new_probe( char *nameStr );
   26.92 +
   26.93 +void
   26.94 +PR_impl__free_probe( IntervalProbe *probe );
   26.95 +
   26.96 +void
   26.97 +PR_impl__index_probe_by_its_name( int32 probeID, SlaveVP *animSlv );
   26.98 +
   26.99 +IntervalProbe *
  26.100 +PR_impl__get_probe_by_name( char *probeName, SlaveVP *animSlv );
  26.101 +
  26.102 +void
  26.103 +PR_impl__record_sched_choice_into_probe( int32 probeID, SlaveVP *animSlv );
  26.104 +
  26.105 +void
  26.106 +PR_impl__record_interval_start_in_probe( int32 probeID );
  26.107 +
  26.108 +void
  26.109 +PR_impl__record_interval_end_in_probe( int32 probeID );
  26.110 +
  26.111 +void
  26.112 +PR_impl__print_stats_of_probe( IntervalProbe *probe );
  26.113 +
  26.114 +void
  26.115 +PR_impl__print_stats_of_all_probes();
  26.116 +
  26.117 +
  26.118 +//======================== Probes =============================
  26.119 +//
  26.120 +// Use macros to allow turning probes off with a #define switch
  26.121 +// This means probes have zero impact on performance when off
  26.122 +//=============================================================
  26.123 +
  26.124 +#ifdef PROBES__TURN_ON_STATS_PROBES
  26.125 +
  26.126 +   #define PROBES__Create_Probe_Bookkeeping_Vars \
  26.127 +      _PRMasterEnv->dynIntervalProbesInfo = \
  26.128 +       makePrivDynArrayOfSize( (void***)&(_PRMasterEnv->intervalProbes), 200); \
  26.129 +      \
  26.130 +      _PRMasterEnv->probeNameHashTbl = makeHashTable( 1000, &PR_int__free ); \
  26.131 +      \
  26.132 +      /*put creation time directly into master env, for fast retrieval*/ \
  26.133 +   struct timeval timeStamp; \
  26.134 +   gettimeofday( &(timeStamp), NULL); \
  26.135 +   _PRMasterEnv->createPtInSecs = \
  26.136 +                           timeStamp.tv_sec +(timeStamp.tv_usec/1000000.0);
  26.137 +
  26.138 +   #define PR_WL__record_time_point_into_new_probe( nameStr, animSlv ) \
  26.139 +           PR_impl__record_time_point_into_new_probe( nameStr, animSlv )
  26.140 +
  26.141 +   #define PR_ext__record_time_point_into_new_probe( nameStr ) \
  26.142 +           PR_ext_impl__record_time_point_into_new_probe( nameStr )
  26.143 +
  26.144 +   #define PR_WL__create_single_interval_probe( nameStr, animSlv ) \
  26.145 +           PR_impl__create_single_interval_probe( nameStr, animSlv )
  26.146 +
  26.147 +   #define PR_WL__create_histogram_probe(      numBins, startValue,              \
  26.148 +                                             binWidth, nameStr, animSlv )       \
  26.149 +           PR_impl__create_histogram_probe( numBins, startValue,              \
  26.150 +                                             binWidth, nameStr, animSlv )
  26.151 +   #define PR_int__free_probe( probe ) \
  26.152 +           PR_impl__free_probe( probe )
  26.153 +
  26.154 +   #define PR_WL__index_probe_by_its_name( probeID, animSlv ) \
  26.155 +           PR_impl__index_probe_by_its_name( probeID, animSlv )
  26.156 +
  26.157 +   #define PR_WL__get_probe_by_name( probeName, animSlv ) \
  26.158 +           PR_impl__get_probe_by_name( probeName, animSlv )
  26.159 +
  26.160 +   #define PR_WL__record_sched_choice_into_probe( probeID, animSlv ) \
  26.161 +           PR_impl__record_sched_choice_into_probe( probeID, animSlv )
  26.162 +
  26.163 +   #define PR_WL__record_interval_start_in_probe( probeID ) \
  26.164 +           PR_impl__record_interval_start_in_probe( probeID )
  26.165 +
  26.166 +   #define PR_WL__record_interval_end_in_probe( probeID ) \
  26.167 +           PR_impl__record_interval_end_in_probe( probeID )
  26.168 +
  26.169 +   #define PR_WL__print_stats_of_probe( probe ) \
  26.170 +           PR_impl__print_stats_of_probe( probe )
  26.171 +
  26.172 +   #define PR_WL__print_stats_of_all_probes() \
  26.173 +           PR_impl__print_stats_of_all_probes()
  26.174 +
  26.175 +
  26.176 +#else
  26.177 +   #define PROBES__Create_Probe_Bookkeeping_Vars
  26.178 +   #define PR_WL__record_time_point_into_new_probe( nameStr, animSlv ) 0 /* do nothing */
  26.179 +   #define PR_ext__record_time_point_into_new_probe( nameStr )  0 /* do nothing */
  26.180 +   #define PR_WL__create_single_interval_probe( nameStr, animSlv ) 0 /* do nothing */
  26.181 +   #define PR_WL__create_histogram_probe( numBins, startValue,              \
  26.182 +                                             binWidth, nameStr, animSlv )       \
  26.183 +          0 /* do nothing */
  26.184 +   #define PR_WL__index_probe_by_its_name( probeID, animSlv ) /* do nothing */
  26.185 +   #define PR_WL__get_probe_by_name( probeName, animSlv ) NULL /* do nothing */
  26.186 +   #define PR_WL__record_sched_choice_into_probe( probeID, animSlv ) /* do nothing */
  26.187 +   #define PR_WL__record_interval_start_in_probe( probeID )  /* do nothing */
  26.188 +   #define PR_WL__record_interval_end_in_probe( probeID )  /* do nothing */
  26.189 +   #define PR_WL__print_stats_of_probe( probe ) ; /* do nothing */
  26.190 +   #define PR_WL__print_stats_of_all_probes() ;/* do nothing */
  26.191 +
  26.192 +#endif   /* defined PROBES__TURN_ON_STATS_PROBES */
  26.193 +
  26.194 +#endif	/* _PROBES_H */
  26.195 +
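The header above turns probes on and off with a single #define, so that a disabled probe call compiles to a constant or to nothing. A minimal sketch of that pattern follows. All names here (SKETCH__TURN_ON_PROBES, SKETCH__record, sketch_impl__record, sketch_calls) are hypothetical and only the on/off structure mirrors probes.h.

```c
#include <stdio.h>

/* Counts how often the implementation function really runs. */
int sketch_calls = 0;

/* Hypothetical implementation function, reached only through the macro. */
int
sketch_impl__record( const char *nameStr )
 { sketch_calls++;
   printf( "probe: %s\n", nameStr );
   return sketch_calls;
 }

/* One switch selects between the real call and a constant. */
#ifdef SKETCH__TURN_ON_PROBES
   #define SKETCH__record( nameStr ) sketch_impl__record( nameStr )
#else
   #define SKETCH__record( nameStr ) 0 /* do nothing */
#endif

/* Application-style code uses only the macro, never the impl function. */
int
sketch_run( void )
 { int id = SKETCH__record( "start" );
   return id;
 }
```

With the switch undefined, the argument expression is never evaluated and no call is emitted, which is why the off-versions of value-returning macros expand to 0 or NULL rather than to nothing.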
    27.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    27.2 +++ b/Services_Offered_by_PR/Memory_Handling/vmalloc.c	Wed Sep 19 23:12:44 2012 -0700
    27.3 @@ -0,0 +1,438 @@
    27.4 +/*
    27.5 + *  Copyright 2009 OpenSourceCodeStewardshipFoundation.org
    27.6 + *  Licensed under GNU General Public License version 2
    27.7 + *
    27.8 + * Author: seanhalle@yahoo.com
    27.9 + *
   27.10 + * Created on November 14, 2009, 9:07 PM
   27.11 + */
   27.12 +
   27.13 +#include <malloc.h>
   27.14 +#include <inttypes.h>
   27.15 +#include <stdlib.h>
   27.16 +#include <stdio.h>
   27.17 +#include <string.h>
   27.18 +#include <math.h>
   27.19 +
   27.20 +#include "PR_impl/PR.h"
   27.21 +#include "Histogram/Histogram.h"
   27.22 +
   27.23 +#define MAX_UINT64 0xFFFFFFFFFFFFFFFF
   27.24 +
   27.25 +//A MallocProlog is a head element if the HigherInMem variable is NULL
   27.26 +//A Chunk is allocated (not free) if the prevChunkInFreeList variable is NULL
   27.27 +
   27.28 +/*
   27.29 + * This calculates the container which fits the given size.
   27.30 + */
   27.31 +inline
   27.32 +uint32 getContainer(size_t size)
   27.33 +{
   27.34 +    return (log2(size)-LOG128)/LOG54;
   27.35 +}
   27.36 +
   27.37 +/*
   27.37 + * Removes the first chunk of a big-chunk free list.
   27.38 + * The chunk is unlinked but not marked as allocated. The list is not
   27.39 + * checked for emptiness, so the caller must make sure it is not empty.
   27.41 + */
   27.42 +inline
   27.43 +MallocProlog *removeChunk(MallocArrays* freeLists, uint32 containerIdx)
   27.44 +{
   27.45 +    MallocProlog** container = &freeLists->bigChunks[containerIdx];
   27.46 +    MallocProlog*  removedChunk = *container;
   27.47 +    *container = removedChunk->nextChunkInFreeList;
   27.48 +    
   27.49 +    if(removedChunk->nextChunkInFreeList)
   27.50 +        removedChunk->nextChunkInFreeList->prevChunkInFreeList = 
   27.51 +                (MallocProlog*)container;
   27.52 +    
   27.53 +    if(*container == NULL)
   27.54 +    {
   27.55 +       if(containerIdx < 64)
   27.56 +           freeLists->bigChunksSearchVector[0] &= ~((uint64)1 << containerIdx); 
   27.57 +       else
   27.58 +           freeLists->bigChunksSearchVector[1] &= ~((uint64)1 << (containerIdx-64));
   27.59 +    }
   27.60 +    
   27.61 +    return removedChunk;
   27.62 +}
   27.63 +
   27.64 +/*
   27.65 + * Removes the first chunk of a small-chunk free list.
   27.66 + * The chunk is unlinked but not marked as allocated. The list is not
   27.67 + * checked for emptiness, so the caller must make sure it is not empty.
   27.68 + */
   27.69 +inline
   27.70 +MallocProlog *removeSmallChunk(MallocArrays* freeLists, uint32 containerIdx)
   27.71 +{
   27.72 +    MallocProlog** container = &freeLists->smallChunks[containerIdx];
   27.73 +    MallocProlog*  removedChunk = *container;
   27.74 +    *container = removedChunk->nextChunkInFreeList;
   27.75 +    
   27.76 +    if(removedChunk->nextChunkInFreeList)
   27.77 +        removedChunk->nextChunkInFreeList->prevChunkInFreeList = 
   27.78 +                (MallocProlog*)container;
   27.79 +    
   27.80 +    return removedChunk;
   27.81 +}
   27.82 +
   27.83 +inline
   27.84 +size_t getChunkSize(MallocProlog* chunk)
   27.85 +{
   27.86 +    return (uintptr_t)chunk->nextHigherInMem -
   27.87 +            (uintptr_t)chunk - sizeof(MallocProlog);
   27.88 +}
   27.89 +
   27.90 +/*
   27.91 + * Removes a chunk from a free list.
   27.92 + */
   27.93 +inline
   27.94 +void extractChunk(MallocProlog* chunk, MallocArrays *freeLists)
   27.95 +{
   27.96 +   chunk->prevChunkInFreeList->nextChunkInFreeList = chunk->nextChunkInFreeList;
   27.97 +   if(chunk->nextChunkInFreeList)
   27.98 +       chunk->nextChunkInFreeList->prevChunkInFreeList = chunk->prevChunkInFreeList;
   27.99 +   
  27.100 +   //The last element in the list points to the container. If the container points
  27.101 +   //to NULL the container is empty
  27.102 +   if(*((void**)(chunk->prevChunkInFreeList)) == NULL && getChunkSize(chunk) >= BIG_LOWER_BOUND)
  27.103 +   {
  27.104 +       //Find the appropriate container because we do not know it
  27.105 +       uint64 containerIdx = ((uintptr_t)chunk->prevChunkInFreeList - (uintptr_t)freeLists->bigChunks) >> 3;
  27.106 +       if(containerIdx < (uint32)64)
  27.107 +           freeLists->bigChunksSearchVector[0] &= ~((uint64)1 << containerIdx); 
  27.108 +       if(containerIdx < 128 && containerIdx >=64)
  27.109 +           freeLists->bigChunksSearchVector[1] &= ~((uint64)1 << (containerIdx-64)); 
  27.110 +       
  27.111 +   }
  27.112 +}
  27.113 +
  27.114 +/*
  27.115 + * Merges two chunks.
  27.116 + * Chunk A has to be before chunk B in memory. Both have to be removed from
  27.117 + * a free list
  27.118 + */
  27.119 +inline
  27.120 +MallocProlog *mergeChunks(MallocProlog* chunkA, MallocProlog* chunkB)
  27.121 +{
  27.122 +    chunkA->nextHigherInMem = chunkB->nextHigherInMem;
  27.123 +    chunkB->nextHigherInMem->nextLowerInMem = chunkA;
  27.124 +    return chunkA;
  27.125 +}
  27.126 +/*
  27.127 + * Inserts a chunk into a free list.
  27.128 + */
  27.129 +inline
  27.130 +void insertChunk(MallocProlog* chunk, MallocProlog** container)
  27.131 +{
  27.132 +    chunk->nextChunkInFreeList = *container;
  27.133 +    chunk->prevChunkInFreeList = (MallocProlog*)container;
  27.134 +    if(*container)
  27.135 +        (*container)->prevChunkInFreeList = chunk;
  27.136 +    *container = chunk;
  27.137 +}
  27.138 +
  27.139 +/*
  27.140 + * Divides the chunk so that a new chunk of newSize is created.
  27.141 + * There is no size check, so make sure the size value is valid.
  27.142 + */
  27.143 +inline
  27.144 +MallocProlog *divideChunk(MallocProlog* chunk, size_t newSize)
  27.145 +{
  27.146 +    MallocProlog* newChunk = (MallocProlog*)((uintptr_t)chunk->nextHigherInMem -
  27.147 +            newSize - sizeof(MallocProlog));
  27.148 +    
  27.149 +    newChunk->nextLowerInMem  = chunk;
  27.150 +    newChunk->nextHigherInMem = chunk->nextHigherInMem;
  27.151 +    
  27.152 +    chunk->nextHigherInMem->nextLowerInMem = newChunk;
  27.153 +    chunk->nextHigherInMem = newChunk;
  27.154 +    
  27.155 +    return newChunk;
  27.156 +}
  27.157 +
  27.158 +/* 
  27.159 + * Search for chunk in the list of big chunks. Split the block if it's too big
  27.160 + */
  27.161 +inline
  27.162 +MallocProlog *searchChunk(MallocArrays *freeLists, size_t sizeRequested, uint32 containerIdx)
  27.163 +{
  27.164 +    MallocProlog* foundChunk;
  27.165 +    
  27.166 +    uint64 searchVector = freeLists->bigChunksSearchVector[0];
  27.167 +    //set small chunk bits to zero
  27.168 +    searchVector &= MAX_UINT64 << containerIdx;
  27.169 +    containerIdx = __builtin_ffsll(searchVector); //least significant 1 bit
  27.170 +
  27.171 +    if(containerIdx == 0)
  27.172 +    {
  27.173 +       searchVector = freeLists->bigChunksSearchVector[1];
  27.174 +       containerIdx = __builtin_ffsll(searchVector);
  27.175 +       if(containerIdx == 0)
  27.176 +       {
  27.177 +           //TODO: get additional mem and insert into free list
  27.178 +           //malloc( MALLOC_ADDITIONAL_MEM_FROM_OS_SIZE );
  27.179 +           printf("PR malloc failed: low memory\n");
  27.180 +           exit(1);   
  27.181 +       }
  27.182 +       containerIdx += 64;
  27.183 +    }
  27.184 +    containerIdx--;
  27.185 +    
  27.186 +
  27.187 +    foundChunk = removeChunk(freeLists, containerIdx);
  27.188 +    size_t chunkSize     = getChunkSize(foundChunk);
  27.189 +
  27.190 +    //If the new chunk is larger than the requested size: split
  27.191 +    if(chunkSize > sizeRequested + 2 * sizeof(MallocProlog) + BIG_LOWER_BOUND)
  27.192 +    {
  27.193 +       MallocProlog *newChunk = divideChunk(foundChunk,sizeRequested);
  27.194 +       containerIdx = getContainer(getChunkSize(foundChunk)) - 1;
  27.195 +       insertChunk(foundChunk,&freeLists->bigChunks[containerIdx]);
  27.196 +       if(containerIdx < 64)
  27.197 +           freeLists->bigChunksSearchVector[0] |= ((uint64)1 << containerIdx);
  27.198 +       else
  27.199 +           freeLists->bigChunksSearchVector[1] |= ((uint64)1 << (containerIdx-64));
  27.200 +       foundChunk = newChunk;
  27.201 +    } 
  27.202 +    
  27.203 +    return foundChunk;
  27.204 +}
  27.205 +
  27.206 +
  27.207 +/*
  27.208 + * This is sequential code, meant to only be called from the Master, not from
  27.209 + * any slave Slvs.
  27.210 + * 
  27.211 + *May 2012
  27.212 + *ToDo: Improve speed, by using built-in leading 1 detector to calc free-list
  27.213 + * index.
  27.214 + *Change to two separate arrays, one for free-lists of small fixed-size chunks
  27.215 + * other for free lists of exponentially growing chunk sizes
  27.216 + *Do simple compare to decide which array of lists to use
  27.217 + *For small chunks, size the lists in increments of 16, up to, say, 128 (1024
  27.218 + * is max if want less than 64 lists, which allows searching for first
  27.219 + * occupied free-list using leading-1 detector on a bit-vector)
  27.220 + *To find index, right-shift by 4 bits, and that's the index! (works because
  27.221 + * compare says no 1's above 128 position ((bit 7)), and sizes are every 16,
  27.222 + * so dividing by 16 equals exactly the position)
  27.223 + *For large chunks, have 63 free lists, but split into even and odd indexes.
  27.224 + *For even indexes, each list starts with chunks twice the size of previous
  27.225 + * even index.
  27.226 + *For odd indexes, each list starts with chunks of size half-way between those
  27.227 + * of the even indexes on either side.
  27.228 + *
  27.229 + *To calc the free-list position of a requested size, get pos of leading 1
  27.230 + * of the size, call this msbsP (most-significant-bit-set-position). Then
  27.231 + * check bit to right of it (one-less-significant)
  27.232 + *If it's 0 then use the even index: msbsP * 2, which is msbsP << 1.
  27.233 + *If it's 1, then use the odd-index, which is (msbsP << 1) + 1
  27.234 + *
  27.235 + *To find msbsP, use GCC builtin: "int __builtin_clzll (unsigned long long)"
  27.236 + * which returns the number of zeros above (left of) msb set.  Note, result
  27.237 + * is undefined if given zero, but the compare used to choose between arrays
  27.238 + * makes sure the requested size given to it is not zero.
  27.239 + * 
  27.240 + *This scheme keeps wastage small, while finding free element is O(1), and a
  27.241 + * fast constant.
  27.242 + *For large chunk sizes, if don't shave excess, then it ensures worst-case
  27.243 + * wastage due to mis-match in size of chunk vs requested size is 33% 
  27.244 + * (invariant: take any even list.. it starts at a power of 2, and next list
  27.245 + *  up starts at 50% larger, so biggest chunk is 1.5 x smallest request, that's
  27.246 + *  33% of total memory wasted. Then, for the odd index above, smallest chunk
  27.247 + *  is 2x for smallest request of 1.5x, for 25% total wasted memory)
  27.248 + *For smallest size chunks, the pre-amble wastes quite a bit, but above that,
  27.249 + * sizing in increments of 16 keeps wastage small.  And, if always shave, then
  27.250 + * wastage due to size mis-match is maximum 16 bytes for the large chunks.
  27.251 + * 
  27.252 + */
  27.253 +void *
  27.254 +PR_int__malloc( size_t sizeRequested )
  27.255 + {     
  27.256 +         MEAS__Capture_Pre_Malloc_Point
  27.257 +   
  27.258 +   MallocArrays* freeLists = _PRMasterEnv->freeLists;
  27.259 +   MallocProlog* foundChunk;
  27.260 +   
  27.261 +   //Return a small chunk if the requested size is at most 128B
  27.262 +   if(sizeRequested <= LOWER_BOUND)
  27.263 +    {
  27.264 +      uint32 freeListIdx = (sizeRequested-1)/SMALL_CHUNK_SIZE;
  27.265 +      if(freeLists->smallChunks[freeListIdx] == NULL)
  27.266 +        foundChunk = searchChunk(freeLists, SMALL_CHUNK_SIZE*(freeListIdx+1), 0);
  27.267 +      else
  27.268 +        foundChunk = removeSmallChunk(freeLists, freeListIdx);
  27.269 +       
  27.270 +      //Mark as allocated
  27.271 +      foundChunk->prevChunkInFreeList = NULL;      
  27.272 +      return foundChunk + 1;
  27.273 +    }
  27.274 +   
  27.275 +   //Calculate the expected container. Start one higher to have a Chunk that's
  27.276 +   //always big enough.
  27.277 +   uint32 containerIdx = getContainer(sizeRequested);
  27.278 +   
  27.279 +   if(freeLists->bigChunks[containerIdx] == NULL)
  27.280 +       foundChunk = searchChunk(freeLists, sizeRequested, containerIdx); 
  27.281 +   else
  27.282 +       foundChunk = removeChunk(freeLists, containerIdx); 
  27.283 +   
  27.284 +   //Mark as allocated
  27.285 +   foundChunk->prevChunkInFreeList = NULL;      
  27.286 +   
  27.287 +         MEAS__Capture_Post_Malloc_Point
  27.288 +   
  27.289 +   //skip over the prolog by adding its size to the pointer return
  27.290 +   return foundChunk + 1;
  27.291 + }
  27.292 +
  27.293 +void *
  27.294 +PR_WL__malloc( int32 sizeRequested )
  27.295 + { void *ret;
  27.296 + 
  27.297 +   PR_int__get_master_lock();
  27.298 +   ret = PR_int__malloc( sizeRequested );
  27.299 +   PR_int__release_master_lock();
  27.300 +   return ret;
  27.301 + }
  27.302 +
  27.303 +
  27.304 +/*
  27.305 + * This is sequential code, meant to only be called from the Master, not from
  27.306 + * any slave Slvs.
  27.307 + */
  27.308 +void
  27.309 +PR_int__free( void *ptrToFree )
  27.310 + {
  27.311 +    
  27.312 +         MEAS__Capture_Pre_Free_Point;
  27.313 +         
  27.314 +   MallocArrays* freeLists = _PRMasterEnv->freeLists;
  27.315 +   MallocProlog *chunkToFree = (MallocProlog*)ptrToFree - 1;
  27.316 +   uint32 containerIdx;
  27.317 +   
  27.318 +   //Check for free neighbors
  27.319 +   if(chunkToFree->nextLowerInMem)
  27.320 +   {
  27.321 +       if(chunkToFree->nextLowerInMem->prevChunkInFreeList != NULL)
  27.322 +       {//Chunk is not allocated
  27.323 +           extractChunk(chunkToFree->nextLowerInMem, freeLists);
  27.324 +           chunkToFree = mergeChunks(chunkToFree->nextLowerInMem, chunkToFree);
  27.325 +       }
  27.326 +   }
  27.327 +   if(chunkToFree->nextHigherInMem)
  27.328 +   {
  27.329 +       if(chunkToFree->nextHigherInMem->prevChunkInFreeList != NULL)
  27.330 +       {//Chunk is not allocated
  27.331 +           extractChunk(chunkToFree->nextHigherInMem, freeLists);
  27.332 +           chunkToFree = mergeChunks(chunkToFree, chunkToFree->nextHigherInMem);
  27.333 +       }
  27.334 +   }
  27.335 +   
  27.336 +   size_t chunkSize = getChunkSize(chunkToFree);
  27.337 +   if(chunkSize < BIG_LOWER_BOUND)
  27.338 +   {
  27.339 +       containerIdx =  (chunkSize/SMALL_CHUNK_SIZE)-1;
  27.340 +       if(containerIdx > SMALL_CHUNK_COUNT-1)
  27.341 +           containerIdx = SMALL_CHUNK_COUNT-1;
  27.342 +       insertChunk(chunkToFree, &freeLists->smallChunks[containerIdx]);
  27.343 +   }
  27.344 +   else
  27.345 +   {
  27.346 +       containerIdx = getContainer(getChunkSize(chunkToFree)) - 1;
  27.347 +       insertChunk(chunkToFree, &freeLists->bigChunks[containerIdx]);
  27.348 +       if(containerIdx < 64)
  27.349 +           freeLists->bigChunksSearchVector[0] |= (uint64)1 << containerIdx;
  27.350 +       else
  27.351 +           freeLists->bigChunksSearchVector[1] |= (uint64)1 << (containerIdx-64);
  27.352 +   }   
  27.353 +   
  27.354 +         MEAS__Capture_Post_Free_Point;
  27.355 + }
  27.356 +
  27.357 +void
  27.358 +PR_WL__free( void *ptrToFree )
  27.359 + {
  27.360 +   PR_int__get_master_lock();
  27.361 +   PR_int__free( ptrToFree );
  27.362 +   PR_int__release_master_lock();
  27.363 + }
  27.364 +
  27.365 +/*
  27.366 + * Designed to be called from the main thread outside of PR, during init
  27.367 + */
  27.368 +MallocArrays *
  27.369 +PR_ext__create_free_list()
  27.370 +{     
  27.371 +   //Initialize containers for small chunks and fill with zeros
  27.372 +   _PRMasterEnv->freeLists = (MallocArrays*)malloc( sizeof(MallocArrays) );
  27.373 +   MallocArrays *freeLists = _PRMasterEnv->freeLists;
  27.374 +   
  27.375 +   freeLists->smallChunks = 
  27.376 +           (MallocProlog**)malloc(SMALL_CHUNK_COUNT*sizeof(MallocProlog*));
  27.377 +   memset((void*)freeLists->smallChunks,
  27.378 +           0,SMALL_CHUNK_COUNT*sizeof(MallocProlog*));
  27.379 +   
  27.380 +   //Calculate number of containers for big chunks
  27.381 +   uint32 container = getContainer(MALLOC_ADDITIONAL_MEM_FROM_OS_SIZE)+1;
  27.382 +   freeLists->bigChunks = (MallocProlog**)malloc(container*sizeof(MallocProlog*));
  27.383 +   memset((void*)freeLists->bigChunks,0,container*sizeof(MallocProlog*));
  27.384 +   freeLists->containerCount = container;
  27.385 +   
  27.386 +   //Create first element in lastContainer 
  27.387 +   MallocProlog *firstChunk = malloc( MALLOC_ADDITIONAL_MEM_FROM_OS_SIZE );
  27.388 +   if( firstChunk == NULL ) {printf("Can't allocate initial memory\n"); exit(1);}
  27.389 +   freeLists->memSpace = firstChunk;
  27.390 +   
  27.391 +   //Touch memory to avoid page faults
  27.392 +   void *ptr,*endPtr; 
  27.393 +   endPtr = (void*)firstChunk+MALLOC_ADDITIONAL_MEM_FROM_OS_SIZE;
  27.394 +   for(ptr = firstChunk; ptr < endPtr; ptr+=PAGE_SIZE)
  27.395 +   {
  27.396 +       *(char*)ptr = 0;
  27.397 +   }
  27.398 +   
  27.399 +   firstChunk->nextLowerInMem = NULL;
  27.400 +   firstChunk->nextHigherInMem = (MallocProlog*)((uintptr_t)firstChunk +
  27.401 +                        MALLOC_ADDITIONAL_MEM_FROM_OS_SIZE - sizeof(MallocProlog));
  27.402 +   firstChunk->nextChunkInFreeList = NULL;
  27.403 +   //previous element in the queue is the container
  27.404 +   firstChunk->prevChunkInFreeList = (MallocProlog*)&freeLists->bigChunks[container-2];
  27.405 +   
  27.406 +   freeLists->bigChunks[container-2] = firstChunk;
  27.407 +   //Insert into bit search list
  27.408 +   if(container <= 65)
  27.409 +   {
  27.410 +       freeLists->bigChunksSearchVector[0] = ((uint64)1 << (container-2));
  27.411 +       freeLists->bigChunksSearchVector[1] = 0;
  27.412 +   }   
  27.413 +   else
  27.414 +   {
  27.415 +       freeLists->bigChunksSearchVector[0] = 0;
  27.416 +       freeLists->bigChunksSearchVector[1] = ((uint64)1 << (container-66));
  27.417 +   }
  27.418 +   
  27.419 +   //Create dummy chunk to mark the top of stack; this is, of course,
  27.420 +   //never freed
  27.421 +   MallocProlog *dummyChunk = firstChunk->nextHigherInMem;
  27.422 +   dummyChunk->nextHigherInMem = dummyChunk+1;
  27.423 +   dummyChunk->nextLowerInMem  = NULL;
  27.424 +   dummyChunk->nextChunkInFreeList = NULL;
  27.425 +   dummyChunk->prevChunkInFreeList = NULL;
  27.426 +   
  27.427 +   return freeLists;
  27.428 + }
  27.429 +
  27.430 +
  27.431 +/*Designed to be called from the main thread outside of PR, during cleanup
  27.432 + */
  27.433 +void
  27.434 +PR_ext__free_free_list( MallocArrays *freeLists )
  27.435 + {    
  27.436 +   free(freeLists->memSpace);
  27.437 +   free(freeLists->bigChunks);
  27.438 +   free(freeLists->smallChunks);
  27.439 +   free(freeLists);
  27.440 + }
  27.441 +
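The ToDo comment above PR_int__malloc describes finding a large-chunk free-list index from the position of the leading 1 bit and the bit to its right. A minimal sketch of that calculation follows. The helper name sketch_free_list_index is hypothetical and is not part of PR, and the committed allocator uses getContainer with log2 instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: maps a requested size to the even/odd free-list index
 * described in the ToDo comment.  Even indexes start at powers of two, odd
 * indexes start half-way between neighboring powers of two. */
static inline uint32_t
sketch_free_list_index( uint64_t size )
 {
   /* __builtin_clzll is undefined for 0, and size 1 has no bit to its right */
   assert( size >= 2 );

   /* position of the most significant set bit (msbsP in the comment) */
   uint32_t msbsP   = 63u - (uint32_t)__builtin_clzll( size );

   /* the bit one position less significant than the leading 1 */
   uint32_t nextBit = (uint32_t)( (size >> (msbsP - 1)) & 1u );

   return (msbsP << 1) + nextBit;
 }
```

Note the parentheses around the shift: in C, + binds tighter than <<, so writing msbsP << 1 + 1 would shift by two instead of adding one.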
    28.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    28.2 +++ b/Services_Offered_by_PR/Memory_Handling/vmalloc.h	Wed Sep 19 23:12:44 2012 -0700
    28.3 @@ -0,0 +1,94 @@
    28.4 +/*
    28.5 + *  Copyright 2009 OpenSourceCodeStewardshipFoundation.org
    28.6 + *  Licensed under GNU General Public License version 2
    28.7 + *
    28.8 + * Author: seanhalle@yahoo.com
    28.9 + *
   28.10 + * Created on November 14, 2009, 9:07 PM
   28.11 + */
   28.12 +
   28.13 +#ifndef _VMALLOC_H
   28.14 +#define	_VMALLOC_H
   28.15 +
   28.16 +#include <malloc.h>
   28.17 +#include <inttypes.h>
   28.18 +#include "PR_impl/PR_primitive_data_types.h"
   28.19 +
   28.20 +#define SMALL_CHUNK_SIZE 32
   28.21 +#define SMALL_CHUNK_COUNT 4
   28.22 +#define LOWER_BOUND     128  //Biggest chunk size that is created for the small chunks
   28.23 +#define BIG_LOWER_BOUND 160  //Smallest chunk size that is created for the big chunks
   28.24 +
   28.25 +#define LOG54 0.3219280948873623  //log2(5/4)
   28.26 +#define LOG128 7  //log2(128)
   28.27 +
   28.28 +typedef struct _MallocProlog MallocProlog;
   28.29 +
   28.30 +struct _MallocProlog
   28.31 + {
   28.32 +   MallocProlog *nextChunkInFreeList;
   28.33 +   MallocProlog *prevChunkInFreeList;
   28.34 +   MallocProlog *nextHigherInMem;
   28.35 +   MallocProlog *nextLowerInMem;
   28.36 + };
   28.37 +//MallocProlog
   28.38 + 
   28.39 + typedef struct MallocArrays MallocArrays;
   28.40 +
   28.41 + struct MallocArrays
   28.42 + {
   28.43 +     MallocProlog **smallChunks;
   28.44 +     MallocProlog **bigChunks;
   28.45 +     uint64       bigChunksSearchVector[2];
   28.46 +     void         *memSpace;
   28.47 +     uint32       containerCount;
   28.48 + };
   28.49 + //MallocArrays
   28.50 +
   28.51 +typedef struct
   28.52 + {
   28.53 +   MallocProlog *firstChunkInFreeList;
   28.54 +   int32         numInList; //TODO not used
   28.55 + }
   28.56 +FreeListHead;
   28.57 +
   28.58 +void *
   28.59 +PR_int__malloc( size_t sizeRequested );
   28.60 +#define PR_PI__malloc  PR_int__malloc
   28.61 +
   28.62 +void *
   28.63 +PR_WL__malloc( int32  sizeRequested ); /*BUG: -- get master lock */
   28.64 +#define PR_App__malloc  PR_WL__malloc
   28.65 +
   28.66 +void *
   28.67 +PR_int__malloc_aligned( size_t sizeRequested );
   28.68 +#define PR_PI__malloc_aligned PR_int__malloc_aligned
   28.69 +
   28.70 +void
   28.71 +PR_int__free( void *ptrToFree );
   28.72 +#define PR_PI__free  PR_int__free
   28.73 +
   28.74 +void
   28.75 +PR_WL__free( void *ptrToFree );
   28.76 +#define PR_App__free  PR_WL__free
   28.77 +
   28.78 +
   28.79 +
   28.80 +/*Allocates memory from the external system -- higher overhead
   28.81 + */
   28.82 +void *
   28.83 +PR_ext__malloc_in_ext( size_t sizeRequested );
   28.84 +
   28.85 +/*Frees memory that was allocated in the external system -- higher overhead
   28.86 + */
   28.87 +void
   28.88 +PR_ext__free_in_ext( void *ptrToFree );
   28.89 +
   28.90 +
   28.91 +MallocArrays *
   28.92 +PR_ext__create_free_list();
   28.93 +
   28.94 +void
   28.95 +PR_ext__free_free_list(MallocArrays *freeLists );
   28.96 +
   28.97 +#endif
   28.98 \ No newline at end of file
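Note on the size-class constants in vmalloc.h: vmalloc.c is not part of this excerpt, so the following is only one plausible reading of the constants. Four small free lists in 32-byte steps would cover requests up to LOWER_BOUND (4 * 32 = 128). LOG54 equals log2(5/4) and LOG128 equals log2(128), which suggests that big chunks are binned geometrically with ratio 5/4 starting at 128, so that the first boundary is 160 (BIG_LOWER_BOUND). The helper names `small_chunk_index` and `big_chunk_index` are hypothetical and do not appear in the changeset. The sketch uses a loop instead of the closed form `(log2(size) - LOG128) / LOG54` to avoid floating-point rounding at exact bin boundaries.

```c
#include <stddef.h>
#include <stdint.h>

#define SMALL_CHUNK_SIZE   32
#define SMALL_CHUNK_COUNT  4
#define LOWER_BOUND        128   /* = SMALL_CHUNK_SIZE * SMALL_CHUNK_COUNT */

/* Hypothetical helper (assumption, not from vmalloc.c):
 * maps a request of 1..LOWER_BOUND bytes to one of the small lists,
 * i.e. 1..32 -> 0, 33..64 -> 1, 65..96 -> 2, 97..128 -> 3.
 */
static uint32_t
small_chunk_index( size_t size )
 { return (uint32_t)( (size - 1) / SMALL_CHUNK_SIZE );
 }

/* Hypothetical helper (assumption, not from vmalloc.c):
 * maps a request above LOWER_BOUND to a geometric bin, where bin k
 * covers sizes in [128 * 1.25^k, 128 * 1.25^(k+1)).
 */
static uint32_t
big_chunk_index( size_t size )
 { uint32_t idx   = 0;
   double   upper = LOWER_BOUND * 1.25;   /* 160, the first bin boundary */

   while( (double)size >= upper )
    { upper *= 1.25;
      idx++;
    }
   return idx;
 }
```

Under these assumptions a 100-byte request lands in small list 3, and a 300-byte request lands in big bin 3, since 250 <= 300 < 312.5.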
    29.1 --- a/Services_Offered_by_VMS/Debugging/DEBUG__macros.h	Mon Sep 03 03:34:54 2012 -0700
    29.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    29.3 @@ -1,65 +0,0 @@
    29.4 -/*
    29.5 - *  Copyright 2009 OpenSourceStewardshipFoundation.org
    29.6 - *  Licensed under GNU General Public License version 2
    29.7 - *
    29.8 - * Author: seanhalle@yahoo.com
    29.9 - * 
   29.10 - */
   29.11 -
   29.12 -#ifndef  _VMS_DEFS_DEBUG_H
   29.13 -#define	_VMS_DEFS_DEBUG_H
   29.14 -#define _GNU_SOURCE
   29.15 -
   29.16 -/*
   29.17 - */
   29.18 -#ifdef DEBUG__TURN_ON_DEBUG_PRINT
   29.19 -   #define DEBUG__printf(  bool, msg) \
   29.20 -      do{\
   29.21 -         if(bool)\
   29.22 -          { printf(msg);\
   29.23 -            printf(" | function: %s\n", __FUNCTION__);\
   29.24 -            fflush(stdin);\
   29.25 -          }\
   29.26 -        }while(0);/*macro magic to isolate var-names*/
   29.27 -
   29.28 -   #define DEBUG__printf1( bool, msg, param)  \
   29.29 -      do{\
   29.30 -         if(bool)\
   29.31 -          { printf(msg, param);\
   29.32 -            printf(" | function: %s\n", __FUNCTION__);\
   29.33 -            fflush(stdin);\
   29.34 -          }\
   29.35 -        }while(0);/*macro magic to isolate var-names*/
   29.36 -
   29.37 -   #define DEBUG__printf2( bool, msg, p1, p2) \
   29.38 -      do{\
   29.39 -         if(bool)\
   29.40 -          { printf(msg, p1, p2); \
   29.41 -            printf(" | function: %s\n", __FUNCTION__);\
   29.42 -            fflush(stdin);\
   29.43 -          }\
   29.44 -        }while(0);/*macro magic to isolate var-names*/
   29.45 -
   29.46 -   #define DEBUG__printf3( bool, msg, p1, p2, p3) \
   29.47 -      do{\
   29.48 -         if(bool)\
   29.49 -          { printf(msg, p1, p2, p3); \
   29.50 -            printf(" | function: %s\n", __FUNCTION__);\
   29.51 -            fflush(stdin);\
   29.52 -          }\
   29.53 -        }while(0);/*macro magic to isolate var-names*/
   29.54 -
   29.55 -#else
   29.56 -   #define DEBUG__printf(  bool, msg)         
   29.57 -   #define DEBUG__printf1( bool, msg, param)  
   29.58 -   #define DEBUG__printf2( bool, msg, p1, p2) 
   29.59 -#endif
   29.60 -
   29.61 -//============================= ERROR MSGs ============================
   29.62 -#define ERROR(msg) printf(msg);
   29.63 -#define ERROR1(msg, param) printf(msg, param); 
   29.64 -#define ERROR2(msg, p1, p2) printf(msg, p1, p2);
   29.65 -
   29.66 -//===========================================================================
   29.67 -#endif	/* _VMS_DEFS_H */
   29.68 -
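Note on the removed DEBUG__printf family: the four fixed-arity macros could be folded into a single variadic macro. The sketch below is an illustration, not code from this changeset; `DEBUG__printf_v` and `classify` are hypothetical names. It differs from the removed macros in two ways. It flushes `stdout` rather than `stdin` (calling `fflush` on an input stream is undefined behavior in ISO C), and it omits the semicolon after `while(0)`, so that the macro behaves as a single statement inside an unbraced `if`/`else`.

```c
#include <stdio.h>

#define DEBUG__TURN_ON_DEBUG_PRINT   /* enabled here only for illustration */

#ifdef DEBUG__TURN_ON_DEBUG_PRINT
   /* Hypothetical variadic form: first variadic argument is the format. */
   #define DEBUG__printf_v( cond, ... ) \
      do{ \
         if( cond ) \
          { printf( __VA_ARGS__ ); \
            printf( " | function: %s\n", __func__ ); \
            fflush( stdout ); \
          } \
        }while(0)   /* no trailing ';' -- the caller supplies it */
#else
   #define DEBUG__printf_v( cond, ... )  do{ }while(0)
#endif

/* Hypothetical caller: compiles only because the macro expands to exactly
 * one statement; a trailing ';' inside the macro would orphan the 'else'.
 */
static int
classify( int n )
 { if( n < 0 )
      DEBUG__printf_v( 1, "negative value: %d", n );
   else
      return 1;
   return 0;
 }
```

The disabled branch expands to an empty `do{ }while(0)` rather than to nothing, so call sites parse the same way whether or not debug printing is turned on.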
    30.1 --- a/Services_Offered_by_VMS/Lang_Constructs/VMS_Lang.h	Mon Sep 03 03:34:54 2012 -0700
    30.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    30.3 @@ -1,44 +0,0 @@
    30.4 -/*
    30.5 - *  Copyright 2009 OpenSourceStewardshipFoundation.org
    30.6 - *  Licensed under GNU General Public License version 2
    30.7 - *
    30.8 - * Author: seanhalle@yahoo.com
    30.9 - *
   30.10 - */
   30.11 -
   30.12 -#ifndef _VMS_LANG_CONSTRUCTS_H
   30.13 -#define	_VMS_LANG_CONSTRUCTS_H
   30.14 -
   30.15 -#include "VMS_impl/VMS_primitive_data_types.h"
   30.16 -
   30.17 -/*This header defines everything specific to the VMS provided language
   30.18 - * constructs.
   30.19 - *Such constructs are used in application code, mixed-in with calls to
   30.20 - * constructs of the VMS-based language. 
   30.21 - */
   30.22 -inline void
   30.23 -handleMalloc( SSRSemReq *semReq, SlaveVP *requestingSlv, SSRSemEnv *semEnv);
   30.24 -inline void
   30.25 -handleFree( SSRSemReq *semReq, SlaveVP *requestingSlv, SSRSemEnv *semEnv );
   30.26 -inline void
   30.27 -handleTransEnd(SSRSemReq *semReq, SlaveVP *requestingSlv, SSRSemEnv*semEnv);
   30.28 -inline void
   30.29 -handleTransStart( SSRSemReq *semReq, SlaveVP *requestingSlv,
   30.30 -                  SSRSemEnv *semEnv );
   30.31 -inline void
   30.32 -handleAtomic( SSRSemReq *semReq, SlaveVP *requestingSlv, SSRSemEnv *semEnv);
   30.33 -inline void
   30.34 -handleStartFnSingleton( SSRSemReq *semReq, SlaveVP *reqstingSlv,
   30.35 -                      SSRSemEnv *semEnv );
   30.36 -inline void
   30.37 -handleEndFnSingleton( SSRSemReq *semReq, SlaveVP *requestingSlv,
   30.38 -                    SSRSemEnv *semEnv );
   30.39 -inline void
   30.40 -handleStartDataSingleton( SSRSemReq *semReq, SlaveVP *reqstingSlv,
   30.41 -                      SSRSemEnv *semEnv );
   30.42 -inline void
   30.43 -handleEndDataSingleton( SSRSemReq *semReq, SlaveVP *requestingSlv,
   30.44 -                    SSRSemEnv *semEnv );
   30.45 -
   30.46 -#endif	/* _VMS_LANG_CONSTRUCTS_H */
   30.47 -
    31.1 --- a/Services_Offered_by_VMS/Measurement_and_Stats/MEAS__macros.h	Mon Sep 03 03:34:54 2012 -0700
    31.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    31.3 @@ -1,514 +0,0 @@
    31.4 -/*
    31.5 - *  Copyright 2009 OpenSourceStewardshipFoundation.org
    31.6 - *  Licensed under GNU General Public License version 2
    31.7 - *
    31.8 - * Author: seanhalle@yahoo.com
    31.9 - * 
   31.10 - */
   31.11 -
   31.12 -#ifndef _VMS_MEAS_MACROS_H
   31.13 -#define _VMS_MEAS_MACROS_H
   31.14 -#define _GNU_SOURCE
   31.15 -
   31.16 -//==================  Macros define types of meas want  =====================
   31.17 -//
   31.18 -/*Generic measurement macro -- has name-space collision potential, which
   31.19 - * compiler will catch..  so only use one pair inside a given set of 
   31.20 - * curly braces. 
   31.21 - */
   31.22 -//TODO: finish generic capture interval in hist
   31.23 -enum histograms
   31.24 - { generic1
   31.25 - };
   31.26 -   #define MEAS__Capture_Pre_Point \
   31.27 -      int32 startStamp, endStamp; \
   31.28 -      saveLowTimeStampCountInto( startStamp );
   31.29 -
   31.30 -   #define MEAS__Capture_Post_Point( histName ) \
   31.31 -      saveLowTimeStampCountInto( endStamp ); \
   31.32 -      addIntervalToHist( startStamp, endStamp, _VMSMasterEnv->histName ); 
   31.33 -
   31.34 -
   31.35 -
   31.36 -
   31.37 -//==================  Macros define types of meas want  =====================
   31.38 -
   31.39 -#ifdef MEAS__TURN_ON_SUSP_MEAS
   31.40 -   #define MEAS__Insert_Susp_Meas_Fields_into_Slave \
   31.41 -       uint32  preSuspTSCLow; \
   31.42 -       uint32  postSuspTSCLow;
   31.43 -
   31.44 -   #define MEAS__Insert_Susp_Meas_Fields_into_MasterEnv \
   31.45 -       Histogram       *suspLowTimeHist; \
   31.46 -       Histogram       *suspHighTimeHist;
   31.47 -
   31.48 -   #define MEAS__Make_Meas_Hists_for_Susp_Meas \
   31.49 -      _VMSMasterEnv->suspLowTimeHist  = makeFixedBinHistExt( 100, 0, 200,\
   31.50 -                                                    "master_low_time_hist");\
   31.51 -      _VMSMasterEnv->suspHighTimeHist  = makeFixedBinHistExt( 100, 0, 200,\
   31.52 -                                                    "master_high_time_hist");
   31.53 -      
   31.54 -      //record time stamp: compare to time-stamp recorded below
   31.55 -   #define MEAS__Capture_Pre_Susp_Point \
   31.56 -      saveLowTimeStampCountInto( animatingSlv->preSuspTSCLow );
   31.57 -   
   31.58 -      //NOTE: only take low part of count -- do sanity check when take diff
   31.59 -   #define MEAS__Capture_Post_Susp_Point \
   31.60 -      saveLowTimeStampCountInto( animatingSlv->postSuspTSCLow );\
   31.61 -      addIntervalToHist( preSuspTSCLow, postSuspTSCLow,\
   31.62 -                         _VMSMasterEnv->suspLowTimeHist ); \
   31.63 -      addIntervalToHist( preSuspTSCLow, postSuspTSCLow,\
   31.64 -                         _VMSMasterEnv->suspHighTimeHist );
   31.65 -
   31.66 -   #define MEAS__Print_Hists_for_Susp_Meas \
   31.67 -      printHist( _VMSMasterEnv->pluginTimeHist );
   31.68 -      
   31.69 -#else
   31.70 -   #define MEAS__Insert_Susp_Meas_Fields_into_Slave     
   31.71 -   #define MEAS__Insert_Susp_Meas_Fields_into_MasterEnv 
   31.72 -   #define MEAS__Make_Meas_Hists_for_Susp_Meas 
   31.73 -   #define MEAS__Capture_Pre_Susp_Point
   31.74 -   #define MEAS__Capture_Post_Susp_Point   
   31.75 -   #define MEAS__Print_Hists_for_Susp_Meas 
   31.76 -#endif
   31.77 -
   31.78 -#ifdef MEAS__TURN_ON_MASTER_MEAS
   31.79 -   #define MEAS__Insert_Master_Meas_Fields_into_Slave \
   31.80 -       uint32  startMasterTSCLow; \
   31.81 -       uint32  endMasterTSCLow;
   31.82 -
   31.83 -   #define MEAS__Insert_Master_Meas_Fields_into_MasterEnv \
   31.84 -       Histogram       *masterLowTimeHist; \
   31.85 -       Histogram       *masterHighTimeHist;
   31.86 -
   31.87 -   #define MEAS__Make_Meas_Hists_for_Master_Meas \
   31.88 -      _VMSMasterEnv->masterLowTimeHist  = makeFixedBinHistExt( 100, 0, 200,\
   31.89 -                                                    "master_low_time_hist");\
   31.90 -      _VMSMasterEnv->masterHighTimeHist  = makeFixedBinHistExt( 100, 0, 200,\
   31.91 -                                                    "master_high_time_hist");
   31.92 -
   31.93 -      //Total Master time includes one coreloop time -- just assume the core
   31.94 -      // loop time is same for Master as for AppSlvs, even though it may be
   31.95 -      // smaller due to higher predictability of the fixed jmp.
   31.96 -   #define MEAS__Capture_Pre_Master_Point\
   31.97 -      saveLowTimeStampCountInto( masterVP->startMasterTSCLow );
   31.98 -
   31.99 -   #define MEAS__Capture_Post_Master_Point \
  31.100 -      saveLowTimeStampCountInto( masterVP->endMasterTSCLow );\
  31.101 -      addIntervalToHist( startMasterTSCLow, endMasterTSCLow,\
  31.102 -                         _VMSMasterEnv->masterLowTimeHist ); \
  31.103 -      addIntervalToHist( startMasterTSCLow, endMasterTSCLow,\
  31.104 -                         _VMSMasterEnv->masterHighTimeHist );
  31.105 -
  31.106 -   #define MEAS__Print_Hists_for_Master_Meas \
  31.107 -      printHist( _VMSMasterEnv->pluginTimeHist );
  31.108 -
  31.109 -#else
  31.110 -   #define MEAS__Insert_Master_Meas_Fields_into_Slave
  31.111 -   #define MEAS__Insert_Master_Meas_Fields_into_MasterEnv 
  31.112 -   #define MEAS__Make_Meas_Hists_for_Master_Meas
  31.113 -   #define MEAS__Capture_Pre_Master_Point 
  31.114 -   #define MEAS__Capture_Post_Master_Point 
  31.115 -   #define MEAS__Print_Hists_for_Master_Meas 
  31.116 -#endif
  31.117 -
  31.118 -      
  31.119 -#ifdef MEAS__TURN_ON_MASTER_LOCK_MEAS
  31.120 -   #define MEAS__Insert_Master_Lock_Meas_Fields_into_MasterEnv \
  31.121 -       Histogram       *masterLockLowTimeHist; \
  31.122 -       Histogram       *masterLockHighTimeHist;
  31.123 -
  31.124 -   #define MEAS__Make_Meas_Hists_for_Master_Lock_Meas \
  31.125 -      _VMSMasterEnv->masterLockLowTimeHist  = makeFixedBinHist( 50, 0, 2, \
  31.126 -                                               "master lock low time hist");\
  31.127 -      _VMSMasterEnv->masterLockHighTimeHist  = makeFixedBinHist( 50, 0, 100,\
  31.128 -                                               "master lock high time hist");
  31.129 -
  31.130 -   #define MEAS__Capture_Pre_Master_Lock_Point \
  31.131 -      int32 startStamp, endStamp; \
  31.132 -      saveLowTimeStampCountInto( startStamp );
  31.133 -
  31.134 -   #define MEAS__Capture_Post_Master_Lock_Point \
  31.135 -      saveLowTimeStampCountInto( endStamp ); \
  31.136 -      addIntervalToHist( startStamp, endStamp,\
  31.137 -                         _VMSMasterEnv->masterLockLowTimeHist ); \
  31.138 -      addIntervalToHist( startStamp, endStamp,\
  31.139 -                         _VMSMasterEnv->masterLockHighTimeHist );
  31.140 -
  31.141 -   #define MEAS__Print_Hists_for_Master_Lock_Meas \
  31.142 -      printHist( _VMSMasterEnv->masterLockLowTimeHist ); \
  31.143 -      printHist( _VMSMasterEnv->masterLockHighTimeHist );
  31.144 -      
  31.145 -#else
  31.146 -   #define MEAS__Insert_Master_Lock_Meas_Fields_into_MasterEnv
  31.147 -   #define MEAS__Make_Meas_Hists_for_Master_Lock_Meas
  31.148 -   #define MEAS__Capture_Pre_Master_Lock_Point 
  31.149 -   #define MEAS__Capture_Post_Master_Lock_Point 
  31.150 -   #define MEAS__Print_Hists_for_Master_Lock_Meas
  31.151 -#endif
  31.152 -
  31.153 -
  31.154 -#ifdef MEAS__TURN_ON_MALLOC_MEAS
  31.155 -   #define MEAS__Insert_Malloc_Meas_Fields_into_MasterEnv\
  31.156 -       Histogram       *mallocTimeHist; \
  31.157 -       Histogram       *freeTimeHist;
  31.158 -
  31.159 -   #define MEAS__Make_Meas_Hists_for_Malloc_Meas \
  31.160 -      _VMSMasterEnv->mallocTimeHist  = makeFixedBinHistExt( 100, 0, 30,\
  31.161 -                                                       "malloc_time_hist");\
  31.162 -      _VMSMasterEnv->freeTimeHist  = makeFixedBinHistExt( 100, 0, 30,\
  31.163 -                                                       "free_time_hist");
  31.164 -
  31.165 -   #define MEAS__Capture_Pre_Malloc_Point \
  31.166 -      int32 startStamp, endStamp; \
  31.167 -      saveLowTimeStampCountInto( startStamp );
  31.168 -
  31.169 -   #define MEAS__Capture_Post_Malloc_Point \
  31.170 -      saveLowTimeStampCountInto( endStamp ); \
  31.171 -      addIntervalToHist( startStamp, endStamp,\
  31.172 -                         _VMSMasterEnv->mallocTimeHist ); 
  31.173 -
  31.174 -   #define MEAS__Capture_Pre_Free_Point \
  31.175 -      int32 startStamp, endStamp; \
  31.176 -      saveLowTimeStampCountInto( startStamp );
  31.177 -
  31.178 -   #define MEAS__Capture_Post_Free_Point \
  31.179 -      saveLowTimeStampCountInto( endStamp ); \
  31.180 -      addIntervalToHist( startStamp, endStamp,\
  31.181 -                         _VMSMasterEnv->freeTimeHist ); 
  31.182 -
  31.183 -   #define MEAS__Print_Hists_for_Malloc_Meas \
  31.184 -      printHist( _VMSMasterEnv->mallocTimeHist   ); \
  31.185 -      saveHistToFile( _VMSMasterEnv->mallocTimeHist   ); \
  31.186 -      printHist( _VMSMasterEnv->freeTimeHist     ); \
  31.187 -      saveHistToFile( _VMSMasterEnv->freeTimeHist     ); \
  31.188 -      freeHistExt( _VMSMasterEnv->mallocTimeHist ); \
  31.189 -      freeHistExt( _VMSMasterEnv->freeTimeHist   );
  31.190 -      
  31.191 -#else
  31.192 -   #define MEAS__Insert_Malloc_Meas_Fields_into_MasterEnv
  31.193 -   #define MEAS__Make_Meas_Hists_for_Malloc_Meas 
  31.194 -   #define MEAS__Capture_Pre_Malloc_Point
  31.195 -   #define MEAS__Capture_Post_Malloc_Point
  31.196 -   #define MEAS__Capture_Pre_Free_Point
  31.197 -   #define MEAS__Capture_Post_Free_Point
  31.198 -   #define MEAS__Print_Hists_for_Malloc_Meas 
  31.199 -#endif
  31.200 -
  31.201 -
  31.202 -
  31.203 -#ifdef MEAS__TURN_ON_PLUGIN_MEAS 
  31.204 -   #define MEAS__Insert_Plugin_Meas_Fields_into_MasterEnv \
  31.205 -      Histogram       *reqHdlrLowTimeHist; \
  31.206 -      Histogram       *reqHdlrHighTimeHist;
  31.207 -          
  31.208 -   #define MEAS__Make_Meas_Hists_for_Plugin_Meas \
  31.209 -      _VMSMasterEnv->reqHdlrLowTimeHist  = makeFixedBinHistExt( 100, 0, 200,\
  31.210 -                                                    "plugin_low_time_hist");\
  31.211 -      _VMSMasterEnv->reqHdlrHighTimeHist  = makeFixedBinHistExt( 100, 0, 200,\
  31.212 -                                                    "plugin_high_time_hist");
  31.213 -
  31.214 -   #define MEAS__startReqHdlr \
  31.215 -      int32 startStamp1, endStamp1; \
  31.216 -      saveLowTimeStampCountInto( startStamp1 );
  31.217 -
  31.218 -   #define MEAS__endReqHdlr \
  31.219 -      saveLowTimeStampCountInto( endStamp1 ); \
  31.220 -      addIntervalToHist( startStamp1, endStamp1, \
  31.221 -                           _VMSMasterEnv->reqHdlrLowTimeHist ); \
  31.222 -      addIntervalToHist( startStamp1, endStamp1, \
  31.223 -                           _VMSMasterEnv->reqHdlrHighTimeHist );
  31.224 -
  31.225 -   #define MEAS__Print_Hists_for_Plugin_Meas \
  31.226 -      printHist( _VMSMasterEnv->reqHdlrLowTimeHist ); \
  31.227 -      saveHistToFile( _VMSMasterEnv->reqHdlrLowTimeHist ); \
  31.228 -      printHist( _VMSMasterEnv->reqHdlrHighTimeHist ); \
  31.229 -      saveHistToFile( _VMSMasterEnv->reqHdlrHighTimeHist ); \
  31.230 -      freeHistExt( _VMSMasterEnv->reqHdlrLowTimeHist ); \
  31.231 -      freeHistExt( _VMSMasterEnv->reqHdlrHighTimeHist );
  31.232 -#else
  31.233 -   #define MEAS__Insert_Plugin_Meas_Fields_into_MasterEnv
  31.234 -   #define MEAS__Make_Meas_Hists_for_Plugin_Meas
  31.235 -   #define MEAS__startReqHdlr 
  31.236 -   #define MEAS__endReqHdlr 
  31.237 -   #define MEAS__Print_Hists_for_Plugin_Meas 
  31.238 -
  31.239 -#endif
  31.240 -
  31.241 -      
  31.242 -#ifdef MEAS__TURN_ON_SYSTEM_MEAS
  31.243 -   #define MEAS__Insert_System_Meas_Fields_into_Slave \
  31.244 -      TSCountLowHigh  startSusp; \
  31.245 -      uint64  totalSuspCycles; \
  31.246 -      uint32  numGoodSusp;
  31.247 -
  31.248 -   #define MEAS__Insert_System_Meas_Fields_into_MasterEnv \
  31.249 -       TSCountLowHigh   startMaster; \
  31.250 -       uint64           totalMasterCycles; \
  31.251 -       uint32           numMasterAnimations; \
  31.252 -       TSCountLowHigh   startReqHdlr; \
  31.253 -       uint64           totalPluginCycles; \
  31.254 -       uint32           numPluginAnimations; \
  31.255 -       uint64           cyclesTillStartAnimationMaster; \
  31.256 -       TSCountLowHigh   endAnimationMaster;
  31.257 -
  31.258 -   #define MEAS__startAnimationMaster_forSys \
  31.259 -      TSCountLowHigh startStamp1, endStamp1; \
  31.260 -      saveTSCLowHigh( endStamp1 ); \
  31.261 -      _VMSMasterEnv->cyclesTillStartAnimationMaster = \
  31.262 -      endStamp1.longVal - masterVP->startSusp.longVal;
  31.263 -
  31.264 -   #define Meas_startReqHdlr_forSys \
  31.265 -        saveTSCLowHigh( startStamp1 ); \
  31.266 -        _VMSMasterEnv->startReqHdlr.longVal = startStamp1.longVal;
  31.267 - 
  31.268 -   #define MEAS__endAnimationMaster_forSys \
  31.269 -      saveTSCLowHigh( startStamp1 ); \
  31.270 -      _VMSMasterEnv->endAnimationMaster.longVal = startStamp1.longVal;
  31.271 -
  31.272 -   /*A TSC is stored in VP first thing inside wrapper-lib
  31.273 -    * Now, measures cycles from there to here
  31.274 -    * Master and Plugin will add this value to other trace-seg measures
  31.275 -    */
  31.276 -   #define MEAS__Capture_End_Susp_in_CoreCtlr_ForSys\
  31.277 -          saveTSCLowHigh(endSusp); \
  31.278 -          numCycles = endSusp.longVal - currVP->startSusp.longVal; \
  31.279 -          /*sanity check (400K is about 20K iters)*/ \
  31.280 -          if( numCycles < 400000 ) \
  31.281 -           { currVP->totalSuspCycles += numCycles; \
  31.282 -             currVP->numGoodSusp++; \
  31.283 -           } \
  31.284 -             /*recorded every time, but only read if currVP == MasterVP*/ \
  31.285 -          _VMSMasterEnv->startMaster.longVal = endSusp.longVal;
  31.286 -
  31.287 -#else
  31.288 -   #define MEAS__Insert_System_Meas_Fields_into_Slave 
  31.289 -   #define MEAS__Insert_System_Meas_Fields_into_MasterEnv 
  31.290 -   #define MEAS__Make_Meas_Hists_for_System_Meas
  31.291 -   #define MEAS__startAnimationMaster_forSys 
  31.292 -   #define MEAS__startReqHdlr_forSys
  31.293 -   #define MEAS__endAnimationMaster_forSys
  31.294 -   #define MEAS__Capture_End_Susp_in_CoreCtlr_ForSys
  31.295 -   #define MEAS__Print_Hists_for_System_Meas 
  31.296 -#endif
  31.297 -
  31.298 -#ifdef HOLISTIC__TURN_ON_PERF_COUNTERS
  31.299 -   
  31.300 -   #define MEAS__Insert_Counter_Handler \
  31.301 -   typedef void (*CounterHandler) (int,int,int,SlaveVP*,uint64,uint64,uint64);
  31.302 - 
  31.303 -   enum eventType {
  31.304 -    DebugEvt = 0,
  31.305 -    AppResponderInvocation_start,
  31.306 -    AppResponder_start,
  31.307 -    AppResponder_end,
  31.308 -    AssignerInvocation_start,
  31.309 -    NextAssigner_start,
  31.310 -    Assigner_start,
  31.311 -    Assigner_end,
  31.312 -    Work_start,
  31.313 -    Work_end,
  31.314 -    HwResponderInvocation_start,
  31.315 -    Timestamp_start,
  31.316 -    Timestamp_end
  31.317 -   };
  31.318 -   
  31.319 -   #define saveCyclesAndInstrs(core,cycles,instrs,cachem) do{ \
  31.320 -   int cycles_fd = _VMSMasterEnv->cycles_counter_fd[core]; \
  31.321 -   int instrs_fd = _VMSMasterEnv->instrs_counter_fd[core]; \
  31.322 -   int cachem_fd = _VMSMasterEnv->cachem_counter_fd[core]; \
  31.323 -   int nread;                                           \
  31.324 -                                                        \
  31.325 -   nread = read(cycles_fd,&(cycles),sizeof(cycles));    \
  31.326 -   if(nread<0){                                         \
  31.327 -       perror("Error reading cycles counter");          \
  31.328 -       cycles = 0;                                      \
  31.329 -   }                                                    \
  31.330 -                                                        \
  31.331 -   nread = read(instrs_fd,&(instrs),sizeof(instrs));    \
  31.332 -   if(nread<0){                                         \
  31.333 -       perror("Error reading cycles counter");          \
  31.334 -       instrs = 0;                                      \
  31.335 -   }                                                    \
  31.336 -   nread = read(cachem_fd,&(cachem),sizeof(cachem));    \
  31.337 -   if(nread<0){                                         \
  31.338 -       perror("Error reading last level cache miss counter");          \
  31.339 -       cachem = 0;                                      \
  31.340 -   }                                                    \
  31.341 -   } while (0) 
  31.342 -
  31.343 -   #define MEAS__Insert_Counter_Meas_Fields_into_MasterEnv \
  31.344 -     int cycles_counter_fd[NUM_CORES]; \
  31.345 -     int instrs_counter_fd[NUM_CORES]; \
  31.346 -     int cachem_counter_fd[NUM_CORES]; \
  31.347 -     uint64 start_master_lock[NUM_CORES][3]; \
  31.348 -     CounterHandler counterHandler;
  31.349 -
  31.350 -   #define HOLISTIC__Setup_Perf_Counters setup_perf_counters();
  31.351 -   
  31.352 -
  31.353 -   #define HOLISTIC__CoreCtrl_Setup \
  31.354 -   CounterHandler counterHandler = _VMSMasterEnv->counterHandler; \
  31.355 -   SlaveVP      *lastVPBeforeMaster = NULL; \
  31.356 -   /*if(thisCoresThdParams->coreNum == 0){ \
  31.357 -       uint64 initval = tsc_offset_send(thisCoresThdParams,0); \
  31.358 -       while(!coreCtlrThdParams[NUM_CORES - 2]->ret_tsc); \
  31.359 -   } \
  31.360 -   if(0 < (thisCoresThdParams->coreNum) && (thisCoresThdParams->coreNum) < (NUM_CORES - 1)){ \
  31.361 -       ThdParams* sendCoresThdParams = coreCtlrThdParams[thisCoresThdParams->coreNum - 1]; \
  31.362 -       int sndctr = tsc_offset_resp(sendCoresThdParams, 0); \
  31.363 -       uint64 initval = tsc_offset_send(thisCoresThdParams,0); \
  31.364 -       while(!coreCtlrThdParams[NUM_CORES - 2]->ret_tsc); \
  31.365 -   }  \
  31.366 -   if(thisCoresThdParams->coreNum == (NUM_CORES - 1)){ \
  31.367 -       ThdParams* sendCoresThdParams = coreCtlrThdParams[thisCoresThdParams->coreNum - 1]; \
  31.368 -       int sndctr = tsc_offset_resp(sendCoresThdParams,0); \
  31.369 -   }*/
  31.370 -   
  31.371 -   
  31.372 -   #define HOLISTIC__Insert_Master_Global_Vars \
  31.373 -        int vpid,task; \
  31.374 -        CounterHandler counterHandler = masterEnv->counterHandler;
  31.375 -   
  31.376 -   #define HOLISTIC__Record_last_work lastVPBeforeMaster = currVP;
  31.377 -
  31.378 -   #define HOLISTIC__Record_AppResponderInvocation_start \
  31.379 -      uint64 cycles,instrs,cachem; \
  31.380 -      saveCyclesAndInstrs(thisCoresIdx,cycles, instrs,cachem); \
  31.381 -      if(lastVPBeforeMaster){ \
  31.382 -        (*counterHandler)(AppResponderInvocation_start,lastVPBeforeMaster->slaveID,lastVPBeforeMaster->assignCount,lastVPBeforeMaster,cycles,instrs,cachem); \
  31.383 -        lastVPBeforeMaster = NULL; \
  31.384 -      } else { \
  31.385 -          _VMSMasterEnv->start_master_lock[thisCoresIdx][0] = cycles; \
  31.386 -          _VMSMasterEnv->start_master_lock[thisCoresIdx][1] = instrs; \
  31.387 -          _VMSMasterEnv->start_master_lock[thisCoresIdx][2] = cachem; \
  31.388 -      }
  31.389 - 
  31.390 -           /* Request Handler may call resume() on the VP, but we want to 
  31.391 -                * account the whole interval to the same task. Therefore, need
  31.392 -                * to save task ID at the beginning.
  31.393 -                * 
  31.394 -                * Using this value as "end of AppResponder Invocation Time"
  31.395 -                * is possible if there is only one SchedSlot per core -
  31.396 -                * invoking processor is last to be treated here! If more than
  31.397 -                * one slot, MasterLoop processing time for all but the last VP
  31.398 -                * would be erroneously counted as invocation time.
  31.399 -                */
  31.400 -   #define HOLISTIC__Record_AppResponder_start \
  31.401 -               vpid = currSlot->slaveAssignedToSlot->slaveID; \
  31.402 -               task = currSlot->slaveAssignedToSlot->assignCount; \
  31.403 -               uint64 cycles, instrs, cachem; \
  31.404 -               saveCyclesAndInstrs(thisCoresIdx,cycles, instrs,cachem); \
  31.405 -               (*counterHandler)(AppResponder_start,vpid,task,currSlot->slaveAssignedToSlot,cycles,instrs,cachem);
  31.406 -
  31.407 -   #define HOLISTIC__Record_AppResponder_end \
  31.408 -        uint64 cycles2,instrs2,cachem2; \
  31.409 -        saveCyclesAndInstrs(thisCoresIdx,cycles2, instrs2,cachem2); \
  31.410 -        (*counterHandler)(AppResponder_end,vpid,task,currSlot->slaveAssignedToSlot,cycles2,instrs2,cachem2); \
  31.411 -        (*counterHandler)(Timestamp_end,vpid,task,currSlot->slaveAssignedToSlot,rdtsc(),0,0);
  31.412 -
  31.413 -   
  31.414 -   /* Don't know who to account time to yet - goes to assigned VP
  31.415 -    * after the call.
  31.416 -    */
  31.417 -   #define HOLISTIC__Record_Assigner_start \
  31.418 -       int empty = FALSE; \
  31.419 -       if(currSlot->slaveAssignedToSlot == NULL){ \
  31.420 -           empty= TRUE; \
  31.421 -       } \
  31.422 -       uint64 tmp_cycles, tmp_instrs, tmp_cachem; \
  31.423 -       saveCyclesAndInstrs(thisCoresIdx,tmp_cycles,tmp_instrs,tmp_cachem); \
  31.424 -       uint64 tsc = rdtsc(); \
  31.425 -       if(vpid > 0) { \
  31.426 -           (*counterHandler)(NextAssigner_start,vpid,task,currSlot->slaveAssignedToSlot,tmp_cycles,tmp_instrs,tmp_cachem); \
  31.427 -           vpid = 0; \
  31.428 -           task = 0; \
  31.429 -        }
  31.430 -
  31.431 -   #define HOLISTIC__Record_Assigner_end \
  31.432 -        uint64 cycles,instrs,cachem; \
  31.433 -        saveCyclesAndInstrs(thisCoresIdx,cycles,instrs,cachem); \
  31.434 -        if(empty){ \
  31.435 -            (*counterHandler)(AssignerInvocation_start,assignedSlaveVP->slaveID,assignedSlaveVP->assignCount,assignedSlaveVP,masterEnv->start_master_lock[thisCoresIdx][0],masterEnv->start_master_lock[thisCoresIdx][1],masterEnv->start_master_lock[thisCoresIdx][2]); \
  31.436 -        } \
  31.437 -        (*counterHandler)(Timestamp_start,assignedSlaveVP->slaveID,assignedSlaveVP->assignCount,assignedSlaveVP,tsc,0,0); \
  31.438 -        (*counterHandler)(Assigner_start,assignedSlaveVP->slaveID,assignedSlaveVP->assignCount,assignedSlaveVP,tmp_cycles,tmp_instrs,tmp_cachem); \
  31.439 -        (*counterHandler)(Assigner_end,assignedSlaveVP->slaveID,assignedSlaveVP->assignCount,assignedSlaveVP,cycles,instrs,tmp_cachem);
  31.440 -
  31.441 -   #define HOLISTIC__Record_Work_start \
  31.442 -        if(currVP){ \
  31.443 -                uint64 cycles,instrs,cachem; \
  31.444 -                saveCyclesAndInstrs(thisCoresIdx,cycles, instrs,cachem); \
  31.445 -                (*counterHandler)(Work_start,currVP->slaveID,currVP->assignCount,currVP,cycles,instrs,cachem); \
  31.446 -        }
  31.447 -   
  31.448 -   #define HOLISTIC__Record_Work_end \
  31.449 -       if(currVP){ \
  31.450 -               uint64 cycles,instrs,cachem; \
  31.451 -               saveCyclesAndInstrs(thisCoresIdx,cycles, instrs,cachem); \
  31.452 -               (*counterHandler)(Work_end,currVP->slaveID,currVP->assignCount,currVP,cycles,instrs,cachem); \
  31.453 -       }
  31.454 -
  31.455 -   #define HOLISTIC__Record_HwResponderInvocation_start \
  31.456 -        uint64 cycles,instrs,cachem; \
  31.457 -        saveCyclesAndInstrs(animatingSlv->coreAnimatedBy,cycles, instrs,cachem); \
  31.458 -        (*(_VMSMasterEnv->counterHandler))(HwResponderInvocation_start,animatingSlv->slaveID,animatingSlv->assignCount,animatingSlv,cycles,instrs,cachem); 
  31.459 -        
  31.460 -
  31.461 -   #define getReturnAddressBeforeLibraryCall(vp_ptr, res_ptr) do{     \
  31.462 -void* frame_ptr0 = vp_ptr->framePtr;                               \
  31.463 -void* frame_ptr1 = *((void**)frame_ptr0);                          \
  31.464 -void* frame_ptr2 = *((void**)frame_ptr1);                          \
  31.465 -void* frame_ptr3 = *((void**)frame_ptr2);                          \
  31.466 -void* ret_addr = *((void**)frame_ptr3 + 1);                        \
  31.467 -*res_ptr = ret_addr;                                               \
  31.468 -} while (0)
  31.469 -
  31.470 -#else  
  31.471 -   #define MEAS__Insert_Counter_Handler
  31.472 -   #define MEAS__Insert_Counter_Meas_Fields_into_MasterEnv
  31.473 -   #define HOLISTIC__Setup_Perf_Counters
  31.474 -   #define HOLISTIC__CoreCtrl_Setup
  31.475 -   #define HOLISTIC__Insert_Master_Global_Vars
  31.476 -   #define HOLISTIC__Record_last_work
  31.477 -   #define HOLISTIC__Record_AppResponderInvocation_start
  31.478 -   #define HOLISTIC__Record_AppResponder_start
  31.479 -   #define HOLISTIC__Record_AppResponder_end
  31.480 -   #define HOLISTIC__Record_Assigner_start
  31.481 -   #define HOLISTIC__Record_Assigner_end
  31.482 -   #define HOLISTIC__Record_Work_start
  31.483 -   #define HOLISTIC__Record_Work_end
  31.484 -   #define HOLISTIC__Record_HwResponderInvocation_start
  31.485 -   #define getReturnAddressBeforeLibraryCall(vp_ptr, res_ptr)
  31.486 -#endif
  31.487 -
  31.488 -//Experiment in two-step macros -- if doesn't work, insert each separately
  31.489 -#define MEAS__Insert_Meas_Fields_into_Slave  \
  31.490 -   MEAS__Insert_Susp_Meas_Fields_into_Slave \
  31.491 -   MEAS__Insert_Master_Meas_Fields_into_Slave \
  31.492 -   MEAS__Insert_System_Meas_Fields_into_Slave 
  31.493 -
  31.494 -
  31.495 -//======================  Histogram Macros -- Create ========================
  31.496 -//
  31.497 -//
  31.498 -
  31.499 -//The language implementation should include a definition of this macro,
  31.500 -// which creates all the histograms the language uses to collect measurements
  31.501 -// of plugin operation -- so, if the language didn't define it, must
  31.502 -// define it here (as empty), to avoid compile error
  31.503 -#ifndef MEAS__Make_Meas_Hists_for_Language
  31.504 -#define MEAS__Make_Meas_Hists_for_Language
  31.505 -#endif
  31.506 -
  31.507 -#define makeAMeasHist( idx, name, numBins, startVal, binWidth ) \
  31.508 -      makeHighestDynArrayIndexBeAtLeast( _VMSMasterEnv->measHistsInfo, idx ); \
  31.509 -      _VMSMasterEnv->measHists[idx] =  \
  31.510 -                       makeFixedBinHist( numBins, startVal, binWidth, name );
  31.511 -
  31.512 -//==============================  Probes  ===================================
  31.513 -
  31.514 -
  31.515 -//===========================================================================
  31.516 -#endif	/* _VMS_DEFS_MEAS_H */
  31.517 -
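
Note on the removed macros above: `getReturnAddressBeforeLibraryCall` is wrapped in `do { ... } while (0)`, while `makeAMeasHist` expands to two bare statements. The sketch below shows, under assumed stand-in names (`growToAtLeast`, `histBins`, `MAKE_HIST` and `makeHistIf` are hypothetical and not part of VMS), why the wrapped form is the safer one when a macro is used as the body of an unbraced `if`/`else`.

```c
/* Hypothetical stand-ins for the runtime's dynamic array and histograms. */
static int highestIdx = -1;
static int histBins[8];

static void growToAtLeast(int idx)
 { if (idx > highestIdx) highestIdx = idx;
 }

/* Two statements wrapped so the macro expands to exactly one statement;
 * the caller supplies the trailing semicolon. */
#define MAKE_HIST(idx, numBins)      \
   do {                              \
      growToAtLeast(idx);            \
      histBins[idx] = (numBins);     \
   } while (0)

/* With the wrapper, both statements of a branch run or are skipped together,
 * and the else still pairs with the if. */
static void makeHistIf(int enabled)
 { if (enabled)
      MAKE_HIST(3, 20);
   else
      MAKE_HIST(1, 10);
 }
```

Without the wrapper, only the first statement of the expansion would belong to the `if`, and the `else` would no longer compile.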
    32.1 --- a/Services_Offered_by_VMS/Measurement_and_Stats/probes.c	Mon Sep 03 03:34:54 2012 -0700
    32.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    32.3 @@ -1,304 +0,0 @@
    32.4 -/*
    32.5 - * Copyright 2010  OpenSourceStewardshipFoundation
    32.6 - *
    32.7 - * Licensed under BSD
    32.8 - */
    32.9 -
   32.10 -#include <stdio.h>
   32.11 -#include <malloc.h>
   32.12 -#include <sys/time.h>
   32.13 -
   32.14 -#include "VMS_impl/VMS.h"
   32.15 -
   32.16 -
   32.17 -
   32.18 -//====================  Probes =================
   32.19 -/*
   32.20 - * In practice, probe operations are called from the app, from inside slaves
   32.21 - *  -- so have to be sure each probe is single-Slv owned, and be sure that
   32.22 - *  any place common structures are modified it's done inside the master.
   32.23 - * So -- the only place common structures are modified is during creation.
   32.24 - *  after that, all mods are to individual instances.
   32.25 - *
   32.26 - * Thinking perhaps should change the semantics to be that probes are
   32.27 - *  attached to the virtual processor -- and then everything is guaranteed
   32.28 - *  to be isolated -- except then can't take any intervals that span Slvs,
   32.29 - *  and would have to transfer the probes to Master env when Slv dissipates..
   32.30 - *  gets messy..
   32.31 - *
   32.32 - * For now, just making so that probe creation causes a suspend, so that
   32.33 - *  the dynamic array in the master env is only modified from the master
   32.34 - * 
   32.35 - */
   32.36 -
   32.37 -//============================  Helpers ===========================
   32.38 -inline void 
   32.39 -doNothing()
   32.40 - {
   32.41 - }
   32.42 -
   32.43 -float64 inline
   32.44 -giveInterval( struct timeval _start, struct timeval _end )
   32.45 - { float64 start, end;
   32.46 -   start = _start.tv_sec + _start.tv_usec / 1000000.0;
   32.47 -   end   = _end.tv_sec   + _end.tv_usec   / 1000000.0;
   32.48 -   return end - start;
   32.49 - }
   32.50 -          
   32.51 -//=================================================================
   32.52 -IntervalProbe *
   32.53 -create_generic_probe( char *nameStr, SlaveVP *animSlv )
   32.54 - {
   32.55 -   VMSSemReq reqData;
   32.56 -
   32.57 -   reqData.reqType  = make_probe;
   32.58 -   reqData.nameStr  = nameStr;
   32.59 -
   32.60 -   VMS_WL__send_VMSSem_request( &reqData, animSlv );
   32.61 -
   32.62 -   return animSlv->dataRetFromReq;
   32.63 - }
   32.64 -
   32.65 -/*Use this version from outside VMS -- it uses external malloc, and modifies
   32.66 - * dynamic array, so can't be animated in a slave Slv
   32.67 - */
   32.68 -IntervalProbe *
   32.69 -ext__create_generic_probe( char *nameStr )
   32.70 - { IntervalProbe *newProbe;
   32.71 -   int32          nameLen;
   32.72 -
   32.73 -   newProbe          = malloc( sizeof(IntervalProbe) );
   32.74 -   nameLen = strlen( nameStr );
   32.75 -   newProbe->nameStr = malloc( nameLen );
   32.76 -   memcpy( newProbe->nameStr, nameStr, nameLen );
   32.77 -   newProbe->hist    = NULL;
   32.78 -   newProbe->schedChoiceWasRecorded = FALSE;
   32.79 -   newProbe->probeID =
   32.80 -             addToDynArray( newProbe, _VMSMasterEnv->dynIntervalProbesInfo );
   32.81 -
   32.82 -   return newProbe;
   32.83 - }
   32.84 -
   32.85 -//============================ Fns def in header =======================
   32.86 -
   32.87 -int32
   32.88 -VMS_impl__create_single_interval_probe( char *nameStr, SlaveVP *animSlv )
   32.89 - { IntervalProbe *newProbe;
   32.90 -
   32.91 -   newProbe = create_generic_probe( nameStr, animSlv );
   32.92 -   
   32.93 -   return newProbe->probeID;
   32.94 - }
   32.95 -
   32.96 -int32
   32.97 -VMS_impl__create_histogram_probe( int32   numBins, float64    startValue,
   32.98 -               float64 binWidth, char   *nameStr, SlaveVP *animSlv )
   32.99 - { IntervalProbe *newProbe;
  32.100 -
  32.101 -   newProbe = create_generic_probe( nameStr, animSlv );
  32.102 -   
  32.103 -#ifdef PROBES__USE_TIME_OF_DAY_PROBES
  32.104 -   DblHist *hist;
  32.105 -   hist =  makeDblHistogram( numBins, startValue, binWidth );
  32.106 -#else
  32.107 -   Histogram *hist;
  32.108 -   hist =  makeHistogram( numBins, startValue, binWidth );
  32.109 -#endif
  32.110 -   newProbe->hist = hist;
  32.111 -   return newProbe->probeID;
  32.112 - }
  32.113 -
  32.114 -
  32.115 -int32
  32.116 -VMS_impl__record_time_point_into_new_probe( char *nameStr, SlaveVP *animSlv)
  32.117 - { IntervalProbe *newProbe;
  32.118 -   struct timeval *startStamp;
  32.119 -   float64 startSecs;
  32.120 -
  32.121 -   newProbe           = create_generic_probe( nameStr, animSlv );
  32.122 -   newProbe->endSecs  = 0;
  32.123 -
  32.124 -   
  32.125 -   gettimeofday( &(newProbe->startStamp), NULL);
  32.126 -
  32.127 -      //turn into a double
  32.128 -   startStamp = &(newProbe->startStamp);
  32.129 -   startSecs = startStamp->tv_sec + ( startStamp->tv_usec / 1000000.0 );
  32.130 -   newProbe->startSecs = startSecs;
  32.131 -
  32.132 -   return newProbe->probeID;
  32.133 - }
  32.134 -
  32.135 -int32
  32.136 -VMS_ext_impl__record_time_point_into_new_probe( char *nameStr )
  32.137 - { IntervalProbe *newProbe;
  32.138 -   struct timeval *startStamp;
  32.139 -   float64 startSecs;
  32.140 -
  32.141 -   newProbe           = ext__create_generic_probe( nameStr );
  32.142 -   newProbe->endSecs  = 0;
  32.143 -
  32.144 -   gettimeofday( &(newProbe->startStamp), NULL);
  32.145 -
  32.146 -      //turn into a double
  32.147 -   startStamp = &(newProbe->startStamp);
  32.148 -   startSecs = startStamp->tv_sec + ( startStamp->tv_usec / 1000000.0 );
  32.149 -   newProbe->startSecs = startSecs;
  32.150 -
  32.151 -   return newProbe->probeID;
  32.152 - }
  32.153 -
  32.154 -
  32.155 -/*Only call from inside master or main startup/shutdown thread
  32.156 - */
  32.157 -void
  32.158 -VMS_impl__free_probe( IntervalProbe *probe )
  32.159 - { if( probe->hist != NULL )   freeDblHist( probe->hist );
  32.160 -   if( probe->nameStr != NULL) VMS_int__free( probe->nameStr );
  32.161 -   VMS_int__free( probe );
  32.162 - }
  32.163 -
  32.164 -
  32.165 -void
  32.166 -VMS_impl__index_probe_by_its_name( int32 probeID, SlaveVP *animSlv )
  32.167 - { IntervalProbe *probe;
  32.168 -
  32.169 -   VMS_int__get_master_lock();
  32.170 -   probe = _VMSMasterEnv->intervalProbes[ probeID ];
  32.171 -
  32.172 -   addValueIntoTable(probe->nameStr, probe, _VMSMasterEnv->probeNameHashTbl);
  32.173 -   VMS_int__release_master_lock();
  32.174 - }
  32.175 -
  32.176 -
  32.177 -IntervalProbe *
  32.178 -VMS_impl__get_probe_by_name( char *probeName, SlaveVP *animSlv )
  32.179 - {
  32.180 -   //TODO: fix this To be in Master -- race condition
  32.181 -   return getValueFromTable( probeName, _VMSMasterEnv->probeNameHashTbl );
  32.182 - }
  32.183 -
  32.184 -
  32.185 -/*Everything is local to the animating slaveVP, so no need for request, do
  32.186 - * work locally, in the anim Slv
  32.187 - */
  32.188 -void
  32.189 -VMS_impl__record_sched_choice_into_probe( int32 probeID, SlaveVP *animatingSlv )
  32.190 - { IntervalProbe *probe;
  32.191 - 
  32.192 -   probe = _VMSMasterEnv->intervalProbes[ probeID ];
  32.193 -   probe->schedChoiceWasRecorded = TRUE;
  32.194 -   probe->coreNum = animatingSlv->coreAnimatedBy;
  32.195 -   probe->slaveID = animatingSlv->slaveID;
  32.196 -   probe->slaveCreateSecs = animatingSlv->createPtInSecs;
  32.197 - }
  32.198 -
  32.199 -/*Everything is local to the animating slaveVP, so no need for request, do
  32.200 - * work locally, in the anim Slv
  32.201 - */
  32.202 -void
  32.203 -VMS_impl__record_interval_start_in_probe( int32 probeID )
  32.204 - { IntervalProbe *probe;
  32.205 -
  32.206 -         DEBUG__printf( dbgProbes, "record start of interval" )
  32.207 -   probe = _VMSMasterEnv->intervalProbes[ probeID ];
  32.208 -
  32.209 -      //record *start* point as last thing, after lookup
  32.210 -#ifdef PROBES__USE_TIME_OF_DAY_PROBES
  32.211 -   gettimeofday( &(probe->startStamp), NULL);
  32.212 -#endif
  32.213 -#ifdef PROBES__USE_TSC_PROBES
  32.214 -   probe->startStamp = getTSCount();
  32.215 -#endif
  32.216 - }
  32.217 -
  32.218 -
  32.219 -/*Everything is local to the animating slaveVP, except the histogram, so do
  32.220 - * work locally, in the anim Slv -- may lose a few histogram counts
  32.221 - * 
  32.222 - *This should be safe to run inside SlaveVP
  32.223 - */
  32.224 -void
  32.225 -VMS_impl__record_interval_end_in_probe( int32 probeID )
  32.226 - { IntervalProbe *probe;
  32.227 -
  32.228 -   //Record first thing -- before looking up the probe to store it into
  32.229 -#ifdef PROBES__USE_TIME_OF_DAY_PROBES
  32.230 -   struct timeval  endStamp;
  32.231 -   gettimeofday( &(endStamp), NULL);
  32.232 -#endif
  32.233 -#ifdef PROBES__USE_TSC_PROBES
  32.234 -   TSCount endStamp, interval;
  32.235 -   endStamp = getTSCount();
  32.236 -#endif
  32.237 -#ifdef PROBES__USE_PERF_CTR_PROBES
  32.238 -
  32.239 -#endif
  32.240 -   
  32.241 -   probe = _VMSMasterEnv->intervalProbes[ probeID ];
  32.242 -
  32.243 -#ifdef PROBES__USE_TIME_OF_DAY_PROBES
  32.244 -   if( probe->hist != NULL )
  32.245 -    { addToDblHist( giveInterval( probe->startStamp, endStamp), probe->hist );
  32.246 -    }
  32.247 -#endif
  32.248 -#ifdef PROBES__USE_TSC_PROBES
  32.249 -   if( probe->hist != NULL )
  32.250 -    { interval = probe->endStamp - probe->startStamp;
  32.251 -         //Sanity check for TSC counter overflow: if sane, add to histogram
  32.252 -      if( interval < probe->hist->endOfRange * 10 )
  32.253 -         addToHist( interval, probe->hist );
  32.254 -    }
  32.255 -#endif
  32.256 -#ifdef PROBES__USE_PERF_CTR_PROBES
  32.257 -
  32.258 -#endif
  32.259 -   
  32.260 -         DEBUG__printf( dbgProbes, "record end of interval" )
  32.261 - }
  32.262 -
  32.263 -
  32.264 -void
  32.265 -print_probe_helper( IntervalProbe *probe )
  32.266 - {
  32.267 -   printf( "\nprobe: %s, ",  probe->nameStr );
  32.268 -   
  32.269 -   
  32.270 -   if( probe->schedChoiceWasRecorded )
  32.271 -    { printf( "coreNum: %d, slaveID: %d, slaveVPCreated: %0.6f | ",
  32.272 -              probe->coreNum, probe->slaveID, probe->slaveCreateSecs );
  32.273 -    }
  32.274 -
  32.275 -   if( probe->endSecs == 0 ) //just a single point in time
  32.276 -    {
  32.277 -      printf( " time point: %.6f\n",
  32.278 -              probe->startSecs - _VMSMasterEnv->createPtInSecs );
  32.279 -    }
  32.280 -   else if( probe->hist == NULL ) //just an interval
  32.281 -    {
  32.282 -      printf( " startSecs: %.6f interval: %.6f\n", 
  32.283 -         (probe->startSecs - _VMSMasterEnv->createPtInSecs), probe->interval);
  32.284 -    }
  32.285 -   else  //a full histogram of intervals
  32.286 -    {
  32.287 -      printDblHist( probe->hist );
  32.288 -    }
  32.289 - }
  32.290 -
  32.291 -void
  32.292 -VMS_impl__print_stats_of_probe( IntervalProbe *probe )
  32.293 - { 
  32.294 -
  32.295 -//   probe = _VMSMasterEnv->intervalProbes[ probeID ];
  32.296 -
  32.297 -   print_probe_helper( probe );
  32.298 - }
  32.299 -
  32.300 -
  32.301 -void
  32.302 -VMS_impl__print_stats_of_all_probes()
  32.303 - {
  32.304 -   forAllInDynArrayDo( _VMSMasterEnv->dynIntervalProbesInfo,
  32.305 -                          (DynArrayFnPtr) &VMS_impl__print_stats_of_probe );
  32.306 -   fflush( stdout );
  32.307 - }
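
Note on `ext__create_generic_probe` in the removed file above: it allocates `strlen( nameStr )` bytes and copies the same count, so the stored name appears to have no terminating `'\0'`, yet `print_probe_helper` later prints it with `%s`. A minimal sketch of a copy that keeps the terminator follows; `copyProbeName` is a hypothetical helper, not a function from VMS, and it uses the external `malloc` as the original does.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate a probe name, including its terminator. */
static char *copyProbeName(const char *nameStr)
 { size_t nameLen = strlen(nameStr) + 1;   /* +1 leaves room for '\0' */
   char  *copy    = malloc(nameLen);

   if (copy == NULL) return NULL;          /* caller decides how to fail */
   memcpy(copy, nameStr, nameLen);         /* copies the '\0' as well */
   return copy;
 }
```

The caller owns the returned buffer and releases it with `free`.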
    33.1 --- a/Services_Offered_by_VMS/Measurement_and_Stats/probes.h	Mon Sep 03 03:34:54 2012 -0700
    33.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    33.3 @@ -1,192 +0,0 @@
    33.4 -/*
    33.5 - *  Copyright 2009 OpenSourceStewardshipFoundation.org
    33.6 - *  Licensed under GNU General Public License version 2
    33.7 - *
    33.8 - * Author: seanhalle@yahoo.com
    33.9 - * 
   33.10 - */
   33.11 -
   33.12 -#ifndef _PROBES_H
   33.13 -#define	_PROBES_H
   33.14 -#define _GNU_SOURCE
   33.15 -
   33.16 -#include "VMS_impl/VMS_primitive_data_types.h"
   33.17 -
   33.18 -#include <sys/time.h>
   33.19 -
   33.20 -/*Note on order of include files:  
   33.21 - * This file relies on #defines that appear in other files, which must come
   33.22 - * first in the #include sequence..
   33.23 - */
   33.24 -
   33.25 -/*Use these aliases in application code*/
   33.26 -#define VMS_App__record_time_point_into_new_probe VMS_WL__record_time_point_into_new_probe
   33.27 -#define VMS_App__create_single_interval_probe   VMS_WL__create_single_interval_probe
   33.28 -#define VMS_App__create_histogram_probe         VMS_WL__create_histogram_probe
   33.29 -#define VMS_App__index_probe_by_its_name        VMS_WL__index_probe_by_its_name
   33.30 -#define VMS_App__get_probe_by_name              VMS_WL__get_probe_by_name
   33.31 -#define VMS_App__record_sched_choice_into_probe VMS_WL__record_sched_choice_into_probe
   33.32 -#define VMS_App__record_interval_start_in_probe VMS_WL__record_interval_start_in_probe 
   33.33 -#define VMS_App__record_interval_end_in_probe   VMS_WL__record_interval_end_in_probe
   33.34 -#define VMS_App__print_stats_of_probe           VMS_WL__print_stats_of_probe
   33.35 -#define VMS_App__print_stats_of_all_probes      VMS_WL__print_stats_of_all_probes 
   33.36 -
   33.37 -
   33.38 -//==========================
   33.39 -#ifdef PROBES__USE_TSC_PROBES
   33.40 -   #define PROBES__Insert_timestamps_and_intervals_into_probe_struct \
   33.41 -   TSCount    startStamp; \
   33.42 -   TSCount    endStamp; \
   33.43 -   TSCount    interval; \
   33.44 -   Histogram *hist; /*if left NULL, then is single interval probe*/
   33.45 -#endif
   33.46 -#ifdef PROBES__USE_TIME_OF_DAY_PROBES
   33.47 -   #define PROBES__Insert_timestamps_and_intervals_into_probe_struct \
   33.48 -   struct timeval  startStamp; \
   33.49 -   struct timeval  endStamp; \
   33.50 -   float64         startSecs; \
   33.51 -   float64         endSecs; \
   33.52 -   float64         interval; \
   33.53 -   DblHist        *hist; /*if NULL, then is single interval probe*/
   33.54 -#endif
   33.55 -#ifdef PROBES__USE_PERF_CTR_PROBES
   33.56 -   #define PROBES__Insert_timestamps_and_intervals_into_probe_struct \
   33.57 -   int64  startStamp; \
   33.58 -   int64  endStamp; \
   33.59 -   int64  interval; \
   33.60 -   Histogram *hist; /*if left NULL, then is single interval probe*/
   33.61 -#endif
   33.62 -
   33.63 -//typedef struct _IntervalProbe IntervalProbe; -- is in VMS.h
   33.64 -struct _IntervalProbe
   33.65 - {
   33.66 -   char           *nameStr;
   33.67 -   int32           probeID;
   33.68 -
   33.69 -   int32           schedChoiceWasRecorded;
   33.70 -   int32           coreNum;
   33.71 -   int32           slaveID;
   33.72 -   float64         slaveCreateSecs;
   33.73 -   PROBES__Insert_timestamps_and_intervals_into_probe_struct;
   33.74 - };
   33.75 -
   33.76 -//=========================== NEVER USE THESE ==========================
   33.77 -/*NEVER use these in any code!!  These are here only for use in the macros
   33.78 - * defined in this file!!
   33.79 - */
   33.80 -int32
   33.81 -VMS_impl__create_single_interval_probe( char *nameStr, SlaveVP *animSlv );
   33.82 -
   33.83 -int32
   33.84 -VMS_impl__create_histogram_probe( int32   numBins, float64    startValue,
   33.85 -               float64 binWidth, char    *nameStr, SlaveVP *animSlv );
   33.86 -
   33.87 -int32
   33.88 -VMS_impl__record_time_point_into_new_probe( char *nameStr, SlaveVP *animSlv);
   33.89 -
   33.90 -int32
   33.91 -VMS_ext_impl__record_time_point_into_new_probe( char *nameStr );
   33.92 -
   33.93 -void
   33.94 -VMS_impl__free_probe( IntervalProbe *probe );
   33.95 -
   33.96 -void
   33.97 -VMS_impl__index_probe_by_its_name( int32 probeID, SlaveVP *animSlv );
   33.98 -
   33.99 -IntervalProbe *
  33.100 -VMS_impl__get_probe_by_name( char *probeName, SlaveVP *animSlv );
  33.101 -
  33.102 -void
  33.103 -VMS_impl__record_sched_choice_into_probe( int32 probeID, SlaveVP *animSlv );
  33.104 -
  33.105 -void
  33.106 -VMS_impl__record_interval_start_in_probe( int32 probeID );
  33.107 -
  33.108 -void
  33.109 -VMS_impl__record_interval_end_in_probe( int32 probeID );
  33.110 -
  33.111 -void
  33.112 -VMS_impl__print_stats_of_probe( IntervalProbe *probe );
  33.113 -
  33.114 -void
  33.115 -VMS_impl__print_stats_of_all_probes();
  33.116 -
  33.117 -
  33.118 -//======================== Probes =============================
  33.119 -//
  33.120 -// Use macros to allow turning probes off with a #define switch
  33.121 -// This means probes have zero impact on performance when off
  33.122 -//=============================================================
  33.123 -
  33.124 -#ifdef PROBES__TURN_ON_STATS_PROBES
  33.125 -
  33.126 -   #define PROBES__Create_Probe_Bookkeeping_Vars \
  33.127 -      _VMSMasterEnv->dynIntervalProbesInfo = \
  33.128 -       makePrivDynArrayOfSize( (void***)&(_VMSMasterEnv->intervalProbes), 200); \
  33.129 -      \
  33.130 -      _VMSMasterEnv->probeNameHashTbl = makeHashTable( 1000, &VMS_int__free ); \
  33.131 -      \
  33.132 -      /*put creation time directly into master env, for fast retrieval*/ \
  33.133 -   struct timeval timeStamp; \
  33.134 -   gettimeofday( &(timeStamp), NULL); \
  33.135 -   _VMSMasterEnv->createPtInSecs = \
  33.136 -                           timeStamp.tv_sec +(timeStamp.tv_usec/1000000.0);
  33.137 -
  33.138 -   #define VMS_WL__record_time_point_into_new_probe( nameStr, animSlv ) \
  33.139 -           VMS_impl__record_time_point_in_new_probe( nameStr, animSlv )
  33.140 -
  33.141 -   #define VMS_ext__record_time_point_into_new_probe( nameStr ) \
  33.142 -           VMS_ext_impl__record_time_point_into_new_probe( nameStr )
  33.143 -
  33.144 -   #define VMS_WL__create_single_interval_probe( nameStr, animSlv ) \
  33.145 -           VMS_impl__create_single_interval_probe( nameStr, animSlv )
  33.146 -
  33.147 -   #define VMS_WL__create_histogram_probe(      numBins, startValue,              \
  33.148 -                                             binWidth, nameStr, animSlv )       \
  33.149 -           VMS_impl__create_histogram_probe( numBins, startValue,              \
  33.150 -                                             binWidth, nameStr, animSlv )
  33.151 -   #define VMS_int__free_probe( probe ) \
  33.152 -           VMS_impl__free_probe( probe )
  33.153 -
  33.154 -   #define VMS_WL__index_probe_by_its_name( probeID, animSlv ) \
  33.155 -           VMS_impl__index_probe_by_its_name( probeID, animSlv )
  33.156 -
  33.157 -   #define VMS_WL__get_probe_by_name( probeID, animSlv ) \
  33.158 -           VMS_impl__get_probe_by_name( probeName, animSlv )
  33.159 -
  33.160 -   #define VMS_WL__record_sched_choice_into_probe( probeID, animSlv ) \
  33.161 -           VMS_impl__record_sched_choice_into_probe( probeID, animSlv )
  33.162 -
  33.163 -   #define VMS_WL__record_interval_start_in_probe( probeID ) \
  33.164 -           VMS_impl__record_interval_start_in_probe( probeID )
  33.165 -
  33.166 -   #define VMS_WL__record_interval_end_in_probe( probeID ) \
  33.167 -           VMS_impl__record_interval_end_in_probe( probeID )
  33.168 -
  33.169 -   #define VMS_WL__print_stats_of_probe( probeID ) \
  33.170 -           VMS_impl__print_stats_of_probe( probeID )
  33.171 -
  33.172 -   #define VMS_WL__print_stats_of_all_probes() \
  33.173 -           VMS_impl__print_stats_of_all_probes()
  33.174 -
  33.175 -
  33.176 -#else
  33.177 -   #define PROBES__Create_Probe_Bookkeeping_Vars
  33.178 -   #define VMS_WL__record_time_point_into_new_probe( nameStr, animSlv ) 0 /* do nothing */
  33.179 -   #define VMS_ext__record_time_point_into_new_probe( nameStr )  0 /* do nothing */
  33.180 -   #define VMS_WL__create_single_interval_probe( nameStr, animSlv ) 0 /* do nothing */
  33.181 -   #define VMS_WL__create_histogram_probe( numBins, startValue,              \
  33.182 -                                             binWidth, nameStr, animSlv )       \
  33.183 -          0 /* do nothing */
  33.184 -   #define VMS_WL__index_probe_by_its_name( probeID, animSlv ) /* do nothing */
  33.185 -   #define VMS_WL__get_probe_by_name( probeID, animSlv ) NULL /* do nothing */
  33.186 -   #define VMS_WL__record_sched_choice_into_probe( probeID, animSlv ) /* do nothing */
  33.187 -   #define VMS_WL__record_interval_start_in_probe( probeID )  /* do nothing */
  33.188 -   #define VMS_WL__record_interval_end_in_probe( probeID )  /* do nothing */
  33.189 -   #define VMS_WL__print_stats_of_probe( probeID ) ; /* do nothing */
  33.190 -   #define VMS_WL__print_stats_of_all_probes() ;/* do nothing */
  33.191 -
  33.192 -#endif   /* defined PROBES__TURN_ON_STATS_PROBES */
  33.193 -
  33.194 -#endif	/* _PROBES_H */
  33.195 -
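
Note on the switch pattern in the removed header above: each `VMS_WL__` macro forwards to a `VMS_impl__` function when `PROBES__TURN_ON_STATS_PROBES` is defined, and otherwise expands to `0`, `NULL`, or nothing, so a disabled probe costs no call. The sketch below reproduces that pattern with assumed names (`DEMO_PROBES_ON`, `DEMO__create_probe`, `demo_impl__create_probe` and `demoCreate` are hypothetical). Keeping the macro's parameter names identical to the names used in its expansion matters here, because a mismatch only shows up once the switch is turned on.

```c
/* Hypothetical implementation function; counts how often it is really called. */
int demoCallCount = 0;

int demo_impl__create_probe(const char *nameStr)
 { (void)nameStr;               /* the name is unused in this sketch */
   return ++demoCallCount;
 }

/* DEMO_PROBES_ON plays the role of PROBES__TURN_ON_STATS_PROBES. */
#ifdef DEMO_PROBES_ON
   #define DEMO__create_probe( nameStr )  demo_impl__create_probe( nameStr )
#else
   #define DEMO__create_probe( nameStr )  0   /* do nothing */
#endif

/* Call site is written once and compiles in both configurations. */
int demoCreate(void)
 { return DEMO__create_probe( "loop body" );
 }
```

Built without `-DDEMO_PROBES_ON`, the call site reduces to the constant `0` and the implementation function is never invoked.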
    34.1 --- a/Services_Offered_by_VMS/Memory_Handling/vmalloc.c	Mon Sep 03 03:34:54 2012 -0700
    34.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    34.3 @@ -1,438 +0,0 @@
    34.4 -/*
    34.5 - *  Copyright 2009 OpenSourceCodeStewardshipFoundation.org
    34.6 - *  Licensed under GNU General Public License version 2
    34.7 - *
    34.8 - * Author: seanhalle@yahoo.com
    34.9 - *
   34.10 - * Created on November 14, 2009, 9:07 PM
   34.11 - */
   34.12 -
   34.13 -#include <malloc.h>
   34.14 -#include <inttypes.h>
   34.15 -#include <stdlib.h>
   34.16 -#include <stdio.h>
   34.17 -#include <string.h>
   34.18 -#include <math.h>
   34.19 -
   34.20 -#include "VMS_impl/VMS.h"
   34.21 -#include "Histogram/Histogram.h"
   34.22 -
   34.23 -#define MAX_UINT64 0xFFFFFFFFFFFFFFFF
   34.24 -
   34.25 -//A MallocProlog is a head element if the HigherInMem variable is NULL
   34.26 -//A Chunk is free if the prevChunkInFreeList variable is NULL
   34.27 -
   34.28 -/*
   34.29 - * This calculates the container which fits the given size.
   34.30 - */
   34.31 -inline
   34.32 -uint32 getContainer(size_t size)
   34.33 -{
   34.34 -    return (log2(size)-LOG128)/LOG54;
   34.35 -}
   34.36 -
   34.37 -/*
   34.38 - * Removes the first chunk of a freeList
   34.39 - * The chunk is removed but not set as free. There is no check if
   34.40 - * the free list is empty, so make sure this is not the case.
   34.41 - */
   34.42 -inline
   34.43 -MallocProlog *removeChunk(MallocArrays* freeLists, uint32 containerIdx)
   34.44 -{
   34.45 -    MallocProlog** container = &freeLists->bigChunks[containerIdx];
   34.46 -    MallocProlog*  removedChunk = *container;
   34.47 -    *container = removedChunk->nextChunkInFreeList;
   34.48 -    
   34.49 -    if(removedChunk->nextChunkInFreeList)
   34.50 -        removedChunk->nextChunkInFreeList->prevChunkInFreeList = 
   34.51 -                (MallocProlog*)container;
   34.52 -    
   34.53 -    if(*container == NULL)
   34.54 -    {
   34.55 -       if(containerIdx < 64)
   34.56 -           freeLists->bigChunksSearchVector[0] &= ~((uint64)1 << containerIdx); 
   34.57 -       else
   34.58 -           freeLists->bigChunksSearchVector[1] &= ~((uint64)1 << (containerIdx-64));
   34.59 -    }
   34.60 -    
   34.61 -    return removedChunk;
   34.62 -}
   34.63 -
   34.64 -/*
   34.65 - * Removes the first chunk of a freeList
   34.66 - * The chunk is removed but not set as free. There is no check if
   34.67 - * the free list is empty, so make sure this is not the case.
   34.68 - */
   34.69 -inline
   34.70 -MallocProlog *removeSmallChunk(MallocArrays* freeLists, uint32 containerIdx)
   34.71 -{
   34.72 -    MallocProlog** container = &freeLists->smallChunks[containerIdx];
   34.73 -    MallocProlog*  removedChunk = *container;
   34.74 -    *container = removedChunk->nextChunkInFreeList;
   34.75 -    
   34.76 -    if(removedChunk->nextChunkInFreeList)
   34.77 -        removedChunk->nextChunkInFreeList->prevChunkInFreeList = 
   34.78 -                (MallocProlog*)container;
   34.79 -    
   34.80 -    return removedChunk;
   34.81 -}
   34.82 -
   34.83 -inline
   34.84 -size_t getChunkSize(MallocProlog* chunk)
   34.85 -{
   34.86 -    return (uintptr_t)chunk->nextHigherInMem -
   34.87 -            (uintptr_t)chunk - sizeof(MallocProlog);
   34.88 -}
   34.89 -
   34.90 -/*
   34.91 - * Removes a chunk from a free list.
   34.92 - */
   34.93 -inline
   34.94 -void extractChunk(MallocProlog* chunk, MallocArrays *freeLists)
   34.95 -{
   34.96 -   chunk->prevChunkInFreeList->nextChunkInFreeList = chunk->nextChunkInFreeList;
   34.97 -   if(chunk->nextChunkInFreeList)
   34.98 -       chunk->nextChunkInFreeList->prevChunkInFreeList = chunk->prevChunkInFreeList;
   34.99 -   
  34.100 -   //The last element in the list points to the container. If the container points
  34.101 -   //to NULL the container is empty
  34.102 -   if(*((void**)(chunk->prevChunkInFreeList)) == NULL && getChunkSize(chunk) >= BIG_LOWER_BOUND)
  34.103 -   {
  34.104 -       //Find the appropriate container because we do not know it
  34.105 -       uint64 containerIdx = ((uintptr_t)chunk->prevChunkInFreeList - (uintptr_t)freeLists->bigChunks) >> 3;
  34.106 -       if(containerIdx < (uint32)64)
  34.107 -           freeLists->bigChunksSearchVector[0] &= ~((uint64)1 << containerIdx); 
  34.108 -       if(containerIdx < 128 && containerIdx >=64)
  34.109 -           freeLists->bigChunksSearchVector[1] &= ~((uint64)1 << (containerIdx-64)); 
  34.110 -       
  34.111 -   }
  34.112 -}
  34.113 -
  34.114 -/*
  34.115 - * Merges two chunks.
  34.116 - * Chunk A has to be before chunk B in memory. Both have to be removed from
  34.117 - * a free list
  34.118 - */
  34.119 -inline
  34.120 -MallocProlog *mergeChunks(MallocProlog* chunkA, MallocProlog* chunkB)
  34.121 -{
  34.122 -    chunkA->nextHigherInMem = chunkB->nextHigherInMem;
  34.123 -    chunkB->nextHigherInMem->nextLowerInMem = chunkA;
  34.124 -    return chunkA;
  34.125 -}
  34.126 -/*
  34.127 - * Inserts a chunk into a free list.
  34.128 - */
  34.129 -inline
  34.130 -void insertChunk(MallocProlog* chunk, MallocProlog** container)
  34.131 -{
  34.132 -    chunk->nextChunkInFreeList = *container;
  34.133 -    chunk->prevChunkInFreeList = (MallocProlog*)container;
  34.134 -    if(*container)
  34.135 -        (*container)->prevChunkInFreeList = chunk;
  34.136 -    *container = chunk;
  34.137 -}
  34.138 -
  34.139 -/*
  34.140 - * Divides the chunk so that a new chunk of newSize is created.
  34.141 - * There is no size check, so make sure the size value is valid.
  34.142 - */
  34.143 -inline
  34.144 -MallocProlog *divideChunk(MallocProlog* chunk, size_t newSize)
  34.145 -{
  34.146 -    MallocProlog* newChunk = (MallocProlog*)((uintptr_t)chunk->nextHigherInMem -
  34.147 -            newSize - sizeof(MallocProlog));
  34.148 -    
  34.149 -    newChunk->nextLowerInMem  = chunk;
  34.150 -    newChunk->nextHigherInMem = chunk->nextHigherInMem;
  34.151 -    
  34.152 -    chunk->nextHigherInMem->nextLowerInMem = newChunk;
  34.153 -    chunk->nextHigherInMem = newChunk;
  34.154 -    
  34.155 -    return newChunk;
  34.156 -}
  34.157 -
  34.158 -/* 
  34.159 - * Search for chunk in the list of big chunks. Split the block if it's too big
  34.160 - */
  34.161 -inline
  34.162 -MallocProlog *searchChunk(MallocArrays *freeLists, size_t sizeRequested, uint32 containerIdx)
  34.163 -{
  34.164 -    MallocProlog* foundChunk;
  34.165 -    
  34.166 -    uint64 searchVector = freeLists->bigChunksSearchVector[0];
  34.167 -    //set small chunk bits to zero
  34.168 -    searchVector &= MAX_UINT64 << containerIdx;
  34.169 -    containerIdx = __builtin_ffsl(searchVector); //least significant 1 bit
  34.170 -
  34.171 -    if(containerIdx == 0)
  34.172 -    {
  34.173 -       searchVector = freeLists->bigChunksSearchVector[1];
  34.174 -       containerIdx = __builtin_ffsl(searchVector);
  34.175 -       if(containerIdx == 0)
  34.176 -       {
  34.177 -           //TODO: get additional mem and insert into free list
  34.178 -           //malloc( MALLOC_ADDITIONAL_MEM_FROM_OS_SIZE );
  34.179 -           printf("VMS malloc failed: low memory");
  34.180 -           exit(1);   
  34.181 -       }
  34.182 -       containerIdx += 64;
  34.183 -    }
  34.184 -    containerIdx--;
  34.185 -    
  34.186 -
  34.187 -    foundChunk = removeChunk(freeLists, containerIdx);
  34.188 -    size_t chunkSize     = getChunkSize(foundChunk);
  34.189 -
  34.190 -    //If the new chunk is larger than the requested size: split
  34.191 -    if(chunkSize > sizeRequested + 2 * sizeof(MallocProlog) + BIG_LOWER_BOUND)
  34.192 -    {
  34.193 -       MallocProlog *newChunk = divideChunk(foundChunk,sizeRequested);
  34.194 -       containerIdx = getContainer(getChunkSize(foundChunk)) - 1;
  34.195 -       insertChunk(foundChunk,&freeLists->bigChunks[containerIdx]);
  34.196 -       if(containerIdx < 64)
  34.197 -           freeLists->bigChunksSearchVector[0] |= ((uint64)1 << containerIdx);
  34.198 -       else
  34.199 -           freeLists->bigChunksSearchVector[1] |= ((uint64)1 << (containerIdx-64));
  34.200 -       foundChunk = newChunk;
  34.201 -    } 
  34.202 -    
  34.203 -    return foundChunk;
  34.204 -}
  34.205 -
  34.206 -
  34.207 -/*
  34.208 - * This is sequential code, meant to only be called from the Master, not from
  34.209 - * any slave Slvs.
  34.210 - * 
  34.211 - *May 2012
  34.212 - *ToDo: Improve speed, by using built-in leading 1 detector to calc free-list
  34.213 - * index.
  34.214 - *Change to two separate arrays, one for free-lists of small fixed-size chunks
  34.215 - * other for free lists of exponentially growing chunk sizes
  34.216 - *Do simple compare to decide which array of lists to use
  34.217 - *For small chunks, size the lists in increments of 16, up to, say, 128 (1024
  34.218 - * is max if want less than 64 lists, which allows searching for first
  34.219 - * occupied free-list using leading-1 detector on a bit-vector)
  34.220 - *To find index, right-shift by 4 bits, and that's the index! (works because
  34.221 - * compare says no 1's above 128 position ((bit 7)), and sizes are every 16,
  34.222 - * so dividing by 16 equals exactly the position)
  34.223 - *For large chunks, have 63 free lists, but split into even and odd indexes.
  34.224 - *For even indexes, each list starts with chunks twice the size of previous
  34.225 - * even index.
  34.226 - *For odd indexes, each list starts with chunks of size half-way between those
  34.227 - * of the even indexes on either side.
  34.228 - *
  34.229 - *To calc the free-list position of a requested size, get pos of leading 1
  34.230 - * of the size, call this msbsP (most-significant-bit-set-position). Then
  34.231 - * check bit to right of it (one-less-significant)
  34.232 - *If it's 0 then use the even index: msbsP * 2, which is msbsP << 1.
  34.233 - *If it's 1, then use the odd-index, which is msbsP << 1  + 1
  34.234 - *
  34.235 - *To find msbsP, use GCC builtin: "int __builtin_clzll (unsigned long long)"
  34.236 - * which returns the number of zeros above (left of) msb set.  Note, result is
  34.237 - * undefined if given zero, but the compare used to choose between arrays
  34.238 - * makes sure requested size given to it is not zero.
  34.239 - * 
  34.240 - *This scheme keeps wastage small, while finding free element is O(1), and a
  34.241 - * fast constant.
  34.242 - *For large chunk sizes, if don't shave excess, then it ensures worst-case
  34.243 - * wastage due to mis-match in size of chunk vs requested size is 33% 
  34.244 - * (invariant: take any even list.. it starts at a power of 2, and next list
  34.245 - *  up starts at 50% larger, so biggest chunk is 1.5 x smallest request, that's
  34.246 - *  33% of total memory wasted. Then, for the odd index above, smallest chunk
  34.247 - *  is 2x for smallest request of 1.5x, for 25% total wasted memory)
  34.248 - *For smallest size chunks, the pre-amble wastes quite a bit, but above that,
  34.249 - * sizing in increments of 16 keeps wastage small.  And, if always shave, then
  34.250 - * wastage due to size mis-match is maximum 16 bytes for the large chunks.
  34.251 - * 
  34.252 - */
  34.253 -void *
  34.254 -VMS_int__malloc( size_t sizeRequested )
  34.255 - {     
  34.256 -         MEAS__Capture_Pre_Malloc_Point
  34.257 -   
  34.258 -   MallocArrays* freeLists = _VMSMasterEnv->freeLists;
  34.259 -   MallocProlog* foundChunk;
  34.260 -   
  34.261 -   //Return a small chunk if the requested size is at most LOWER_BOUND (128B)
  34.262 -   if(sizeRequested <= LOWER_BOUND)
  34.263 -    {
  34.264 -      uint32 freeListIdx = (sizeRequested-1)/SMALL_CHUNK_SIZE;
  34.265 -      if(freeLists->smallChunks[freeListIdx] == NULL)
  34.266 -        foundChunk = searchChunk(freeLists, SMALL_CHUNK_SIZE*(freeListIdx+1), 0);
  34.267 -      else
  34.268 -        foundChunk = removeSmallChunk(freeLists, freeListIdx);
  34.269 -       
  34.270 -      //Mark as allocated
  34.271 -      foundChunk->prevChunkInFreeList = NULL;      
  34.272 -      return foundChunk + 1;
  34.273 -    }
  34.274 -   
  34.275 -   //Calculate the expected container. Start one higher to have a Chunk that's
  34.276 -   //always big enough.
  34.277 -   uint32 containerIdx = getContainer(sizeRequested);
  34.278 -   
  34.279 -   if(freeLists->bigChunks[containerIdx] == NULL)
  34.280 -       foundChunk = searchChunk(freeLists, sizeRequested, containerIdx); 
  34.281 -   else
  34.282 -       foundChunk = removeChunk(freeLists, containerIdx); 
  34.283 -   
  34.284 -   //Mark as allocated
  34.285 -   foundChunk->prevChunkInFreeList = NULL;      
  34.286 -   
  34.287 -         MEAS__Capture_Post_Malloc_Point
  34.288 -   
  34.289 -   //skip over the prolog by adding its size to the returned pointer
  34.290 -   return foundChunk + 1;
  34.291 - }
  34.292 -
  34.293 -void *
  34.294 -VMS_WL__malloc( int32 sizeRequested )
  34.295 - { void *ret;
  34.296 - 
  34.297 -   VMS_int__get_master_lock();
  34.298 -   ret = VMS_int__malloc( sizeRequested );
  34.299 -   VMS_int__release_master_lock();
  34.300 -   return ret;
  34.301 - }
  34.302 -
  34.303 -
  34.304 -/*
  34.305 - * This is sequential code, meant to only be called from the Master, not from
  34.306 - * any slave Slvs.
  34.307 - */
  34.308 -void
  34.309 -VMS_int__free( void *ptrToFree )
  34.310 - {
  34.311 -    
  34.312 -         MEAS__Capture_Pre_Free_Point;
  34.313 -         
  34.314 -   MallocArrays* freeLists = _VMSMasterEnv->freeLists;
  34.315 -   MallocProlog *chunkToFree = (MallocProlog*)ptrToFree - 1;
  34.316 -   uint32 containerIdx;
  34.317 -   
  34.318 -   //Check for free neighbors
  34.319 -   if(chunkToFree->nextLowerInMem)
  34.320 -   {
  34.321 -       if(chunkToFree->nextLowerInMem->prevChunkInFreeList != NULL)
  34.322 -       {//Chunk is not allocated
  34.323 -           extractChunk(chunkToFree->nextLowerInMem, freeLists);
  34.324 -           chunkToFree = mergeChunks(chunkToFree->nextLowerInMem, chunkToFree);
  34.325 -       }
  34.326 -   }
  34.327 -   if(chunkToFree->nextHigherInMem)
  34.328 -   {
  34.329 -       if(chunkToFree->nextHigherInMem->prevChunkInFreeList != NULL)
  34.330 -       {//Chunk is not allocated
  34.331 -           extractChunk(chunkToFree->nextHigherInMem, freeLists);
  34.332 -           chunkToFree = mergeChunks(chunkToFree, chunkToFree->nextHigherInMem);
  34.333 -       }
  34.334 -   }
  34.335 -   
  34.336 -   size_t chunkSize = getChunkSize(chunkToFree);
  34.337 -   if(chunkSize < BIG_LOWER_BOUND)
  34.338 -   {
  34.339 -       containerIdx =  (chunkSize/SMALL_CHUNK_SIZE)-1;
  34.340 -       if(containerIdx > SMALL_CHUNK_COUNT-1)
  34.341 -           containerIdx = SMALL_CHUNK_COUNT-1;
  34.342 -       insertChunk(chunkToFree, &freeLists->smallChunks[containerIdx]);
  34.343 -   }
  34.344 -   else
  34.345 -   {
  34.346 -       containerIdx = getContainer(getChunkSize(chunkToFree)) - 1;
  34.347 -       insertChunk(chunkToFree, &freeLists->bigChunks[containerIdx]);
  34.348 -       if(containerIdx < 64)
  34.349 -           freeLists->bigChunksSearchVector[0] |= (uint64)1 << containerIdx;
  34.350 -       else
  34.351 -           freeLists->bigChunksSearchVector[1] |= (uint64)1 << (containerIdx-64);
  34.352 -   }   
  34.353 -   
  34.354 -         MEAS__Capture_Post_Free_Point;
  34.355 - }
  34.356 -
  34.357 -void
  34.358 -VMS_WL__free( void *ptrToFree )
  34.359 - {
  34.360 -   VMS_int__get_master_lock();
  34.361 -   VMS_int__free( ptrToFree );
  34.362 -   VMS_int__release_master_lock();
  34.363 - }
  34.364 -
  34.365 -/*
  34.366 - * Designed to be called from the main thread outside of VMS, during init
  34.367 - */
  34.368 -MallocArrays *
  34.369 -VMS_ext__create_free_list()
  34.370 -{     
  34.371 -   //Initialize containers for small chunks and fill with zeros
  34.372 -   _VMSMasterEnv->freeLists = (MallocArrays*)malloc( sizeof(MallocArrays) );
  34.373 -   MallocArrays *freeLists = _VMSMasterEnv->freeLists;
  34.374 -   
  34.375 -   freeLists->smallChunks = 
  34.376 -           (MallocProlog**)malloc(SMALL_CHUNK_COUNT*sizeof(MallocProlog*));
  34.377 -   memset((void*)freeLists->smallChunks,
  34.378 -           0,SMALL_CHUNK_COUNT*sizeof(MallocProlog*));
  34.379 -   
  34.380 -   //Calculate number of containers for big chunks
  34.381 -   uint32 container = getContainer(MALLOC_ADDITIONAL_MEM_FROM_OS_SIZE)+1;
  34.382 -   freeLists->bigChunks = (MallocProlog**)malloc(container*sizeof(MallocProlog*));
  34.383 -   memset((void*)freeLists->bigChunks,0,container*sizeof(MallocProlog*));
  34.384 -   freeLists->containerCount = container;
  34.385 -   
  34.386 -   //Create first element in lastContainer 
  34.387 -   MallocProlog *firstChunk = malloc( MALLOC_ADDITIONAL_MEM_FROM_OS_SIZE );
  34.388 -   if( firstChunk == NULL ) {printf("Can't allocate initial memory\n"); exit(1);}
  34.389 -   freeLists->memSpace = firstChunk;
  34.390 -   
  34.391 -   //Touch memory to avoid page faults
  34.392 -   void *ptr,*endPtr; 
  34.393 -   endPtr = (void*)firstChunk+MALLOC_ADDITIONAL_MEM_FROM_OS_SIZE;
  34.394 -   for(ptr = firstChunk; ptr < endPtr; ptr+=PAGE_SIZE)
  34.395 -   {
  34.396 -       *(char*)ptr = 0;
  34.397 -   }
  34.398 -   
  34.399 -   firstChunk->nextLowerInMem = NULL;
  34.400 -   firstChunk->nextHigherInMem = (MallocProlog*)((uintptr_t)firstChunk +
  34.401 -                        MALLOC_ADDITIONAL_MEM_FROM_OS_SIZE - sizeof(MallocProlog));
  34.402 -   firstChunk->nextChunkInFreeList = NULL;
  34.403 -   //previous element in the queue is the container
  34.404 -   firstChunk->prevChunkInFreeList = &freeLists->bigChunks[container-2];
  34.405 -   
  34.406 -   freeLists->bigChunks[container-2] = firstChunk;
  34.407 -   //Insert into bit search list
  34.408 -   if(container <= 65)
  34.409 -   {
  34.410 -       freeLists->bigChunksSearchVector[0] = ((uint64)1 << (container-2));
  34.411 -       freeLists->bigChunksSearchVector[1] = 0;
  34.412 -   }   
  34.413 -   else
  34.414 -   {
  34.415 -       freeLists->bigChunksSearchVector[0] = 0;
  34.416 -       freeLists->bigChunksSearchVector[1] = ((uint64)1 << (container-66));
  34.417 -   }
  34.418 -   
  34.419 -   //Create dummy chunk to mark the top of stack; this is, of course,
  34.420 -   //never freed
  34.421 -   MallocProlog *dummyChunk = firstChunk->nextHigherInMem;
  34.422 -   dummyChunk->nextHigherInMem = dummyChunk+1;
  34.423 -   dummyChunk->nextLowerInMem  = NULL;
  34.424 -   dummyChunk->nextChunkInFreeList = NULL;
  34.425 -   dummyChunk->prevChunkInFreeList = NULL;
  34.426 -   
  34.427 -   return freeLists;
  34.428 - }
  34.429 -
  34.430 -
  34.431 -/*Designed to be called from the main thread outside of VMS, during cleanup
  34.432 - */
  34.433 -void
  34.434 -VMS_ext__free_free_list( MallocArrays *freeLists )
  34.435 - {    
  34.436 -   free(freeLists->memSpace);
  34.437 -   free(freeLists->bigChunks);
  34.438 -   free(freeLists->smallChunks);
  34.439 -   
  34.440 - }
  34.441 -
    35.1 --- a/Services_Offered_by_VMS/Memory_Handling/vmalloc.h	Mon Sep 03 03:34:54 2012 -0700
    35.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    35.3 @@ -1,94 +0,0 @@
    35.4 -/*
    35.5 - *  Copyright 2009 OpenSourceCodeStewardshipFoundation.org
    35.6 - *  Licensed under GNU General Public License version 2
    35.7 - *
    35.8 - * Author: seanhalle@yahoo.com
    35.9 - *
   35.10 - * Created on November 14, 2009, 9:07 PM
   35.11 - */
   35.12 -
   35.13 -#ifndef _VMALLOC_H
   35.14 -#define	_VMALLOC_H
   35.15 -
   35.16 -#include <malloc.h>
   35.17 -#include <inttypes.h>
   35.18 -#include "VMS_impl/VMS_primitive_data_types.h"
   35.19 -
   35.20 -#define SMALL_CHUNK_SIZE 32
   35.21 -#define SMALL_CHUNK_COUNT 4
   35.22 -#define LOWER_BOUND     128  //Biggest chunk size that is created for the small chunks
   35.23 -#define BIG_LOWER_BOUND 160  //Smallest chunk size that is created for the big chunks
   35.24 -
   35.25 -#define LOG54 0.3219280948873623
   35.26 -#define LOG128 7
   35.27 -
   35.28 -typedef struct _MallocProlog MallocProlog;
   35.29 -
   35.30 -struct _MallocProlog
   35.31 - {
   35.32 -   MallocProlog *nextChunkInFreeList;
   35.33 -   MallocProlog *prevChunkInFreeList;
   35.34 -   MallocProlog *nextHigherInMem;
   35.35 -   MallocProlog *nextLowerInMem;
   35.36 - };
   35.37 -//MallocProlog
   35.38 - 
   35.39 - typedef struct MallocArrays MallocArrays;
   35.40 -
   35.41 - struct MallocArrays
   35.42 - {
   35.43 -     MallocProlog **smallChunks;
   35.44 -     MallocProlog **bigChunks;
   35.45 -     uint64       bigChunksSearchVector[2];
   35.46 -     void         *memSpace;
   35.47 -     uint32       containerCount;
   35.48 - };
   35.49 - //MallocArrays
   35.50 -
   35.51 -typedef struct
   35.52 - {
   35.53 -   MallocProlog *firstChunkInFreeList;
   35.54 -   int32         numInList; //TODO not used
   35.55 - }
   35.56 -FreeListHead;
   35.57 -
   35.58 -void *
   35.59 -VMS_int__malloc( size_t sizeRequested );
   35.60 -#define VMS_PI__malloc  VMS_int__malloc
   35.61 -
   35.62 -void *
   35.63 -VMS_WL__malloc( int32  sizeRequested ); /*BUG: -- get master lock */
   35.64 -#define VMS_App__malloc  VMS_WL__malloc
   35.65 -
   35.66 -void *
   35.67 -VMS_int__malloc_aligned( size_t sizeRequested );
   35.68 -#define VMS_PI__malloc_aligned VMS_int__malloc_aligned
   35.69 -
   35.70 -void
   35.71 -VMS_int__free( void *ptrToFree );
   35.72 -#define VMS_PI__free  VMS_int__free
   35.73 -
   35.74 -void
   35.75 -VMS_WL__free( void *ptrToFree );
   35.76 -#define VMS_App__free  VMS_WL__free
   35.77 -
   35.78 -
   35.79 -
   35.80 -/*Allocates memory from the external system -- higher overhead
   35.81 - */
   35.82 -void *
   35.83 -VMS_ext__malloc_in_ext( size_t sizeRequested );
   35.84 -
   35.85 -/*Frees memory that was allocated in the external system -- higher overhead
   35.86 - */
   35.87 -void
   35.88 -VMS_ext__free_in_ext( void *ptrToFree );
   35.89 -
   35.90 -
   35.91 -MallocArrays *
   35.92 -VMS_ext__create_free_list();
   35.93 -
   35.94 -void
   35.95 -VMS_ext__free_free_list(MallocArrays *freeLists );
   35.96 -
   35.97 -#endif
   35.98 \ No newline at end of file
    36.1 --- a/VMS.h	Mon Sep 03 03:34:54 2012 -0700
    36.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    36.3 @@ -1,390 +0,0 @@
    36.4 -/*
    36.5 - *  Copyright 2009 OpenSourceStewardshipFoundation.org
    36.6 - *  Licensed under GNU General Public License version 2
    36.7 - *
    36.8 - * Author: seanhalle@yahoo.com
    36.9 - * 
   36.10 - */
   36.11 -
   36.12 -#ifndef _VMS_H
   36.13 -#define	_VMS_H
   36.14 -#define _GNU_SOURCE
   36.15 -
   36.16 -#include "DynArray/DynArray.h"
   36.17 -#include "Hash_impl/PrivateHash.h"
   36.18 -#include "Histogram/Histogram.h"
   36.19 -#include "Queue_impl/PrivateQueue.h"
   36.20 -
   36.21 -#include "VMS_primitive_data_types.h"
   36.22 -#include "Services_Offered_by_VMS/Memory_Handling/vmalloc.h"
   36.23 -
   36.24 -#include <pthread.h>
   36.25 -#include <sys/time.h>
   36.26 -
   36.27 -//=================  Defines: included from separate files  =================
   36.28 -//
   36.29 -// Note: ALL defines are in other files, none are in here
   36.30 -//
   36.31 -#include "Defines/VMS_defs.h"
   36.32 -
   36.33 -
   36.34 -//================================ Typedefs =================================
   36.35 -//
   36.36 -typedef unsigned long long    TSCount;
   36.37 -
   36.38 -typedef struct _AnimSlot     AnimSlot;
   36.39 -typedef struct _VMSReqst      VMSReqst;
   36.40 -typedef struct _SlaveVP       SlaveVP;
   36.41 -typedef struct _MasterVP      MasterVP;
   36.42 -typedef struct _IntervalProbe IntervalProbe;
   36.43 -
   36.44 -
   36.45 -typedef SlaveVP *(*SlaveAssigner)  ( void *, AnimSlot*); //semEnv, slot for HW info
   36.46 -typedef void     (*RequestHandler) ( SlaveVP *, void * ); //prWReqst, semEnv
   36.47 -typedef void     (*TopLevelFnPtr)  ( void *, SlaveVP * ); //initData, animSlv
   36.48 -typedef void       TopLevelFn      ( void *, SlaveVP * ); //initData, animSlv
   36.49 -typedef void     (*ResumeSlvFnPtr) ( SlaveVP *, void * );
   36.50 -      //=========== MEASUREMENT STUFF ==========
   36.51 -        MEAS__Insert_Counter_Handler
   36.52 -      //========================================
   36.53 -
   36.54 -//============================ HW Dependent Fns ================================
   36.55 -
   36.56 -#include "HW_Dependent_Primitives/VMS__HW_measurement.h"
   36.57 -#include "HW_Dependent_Primitives/VMS__primitives.h"
   36.58 -
   36.59 -
   36.60 -//============= Request Related ===========
   36.61 -//
   36.62 -
   36.63 -enum VMSReqstType   //avoid starting enums at 0, for debug reasons
   36.64 - {
   36.65 -   semantic = 1,
   36.66 -   createReq,
   36.67 -   dissipate,
   36.68 -   VMSSemantic      //goes with VMSSemReqst below
   36.69 - };
   36.70 -
   36.71 -struct _VMSReqst
   36.72 - {
   36.73 -   enum VMSReqstType  reqType;//used for dissipate and in future for IO requests
   36.74 -   void              *semReqData;
   36.75 -
   36.76 -   VMSReqst *nextReqst;
   36.77 - };
   36.78 -//VMSReqst
   36.79 -
   36.80 -enum VMSSemReqstType   //These are equivalent to semantic requests, but for
   36.81 - {                     // VMS's services available directly to app, like OS
   36.82 -   make_probe = 1,    // and probe services -- like a VMS-wide built-in lang
   36.83 -   throw_excp,
   36.84 -   openFile,
   36.85 -   otherIO
   36.86 - };
   36.87 -
   36.88 -typedef struct
   36.89 - { enum VMSSemReqstType reqType;
   36.90 -   SlaveVP             *requestingSlv;
   36.91 -   char                *nameStr;  //for create probe
   36.92 -   char                *msgStr;   //for exception
   36.93 -   void                *exceptionData;
   36.94 - }
   36.95 - VMSSemReq;
   36.96 -
   36.97 -
   36.98 -//====================  Core data structures  ===================
   36.99 -
  36.100 -typedef struct
  36.101 - {
  36.102 -   //for future expansion
  36.103 - }
  36.104 -SlotPerfInfo;
  36.105 -
  36.106 -struct _AnimSlot
  36.107 - {
  36.108 -   int           workIsDone;
  36.109 -   int           needsSlaveAssigned;
  36.110 -   SlaveVP      *slaveAssignedToSlot;
  36.111 -   
  36.112 -   int           slotIdx;  //needed by Holistic Model's data gathering
  36.113 -   int           coreSlotIsOn;
  36.114 -   SlotPerfInfo *perfInfo; //used by assigner to pick best slave for core
  36.115 - };
  36.116 -//AnimSlot
  36.117 -
  36.118 - enum VPtype {
  36.119 -     Slave = 1, //default
  36.120 -     Master,
  36.121 -     Shutdown,
  36.122 -     Idle
  36.123 - };
  36.124 - 
  36.125 -/*This structure embodies the state of a slaveVP.  It is reused for masterVP
  36.126 - * and shutdownVPs.
  36.127 - */
  36.128 -struct _SlaveVP
  36.129 - {    //The offsets of these fields are hard-coded into assembly
  36.130 -   void       *stackPtr;         //save the core's stack ptr when suspend
  36.131 -   void       *framePtr;         //save core's frame ptr when suspend
  36.132 -   void       *resumeInstrPtr;   //save core's program-counter when suspend
  36.133 -   void       *coreCtlrFramePtr; //restore before jmp back to core controller
  36.134 -   void       *coreCtlrStackPtr; //restore before jmp back to core controller
  36.135 -   
  36.136 -      //============ below this, no fields are used in asm =============
  36.137 -   
  36.138 -   int         slaveID;       //each slave given a globally unique ID
  36.139 -   int         coreAnimatedBy; 
  36.140 -   void       *startOfStack;  //used to free, and to point slave to Fn
  36.141 -   enum VPtype typeOfVP;      //Slave vs Master vs Shutdown..
  36.142 -   int         assignCount;   //Each assign is for one work-unit, so IDs it
  36.143 -      //note, a scheduling decision is uniquely identified by the triple:
  36.144 -      // <slaveID, coreAnimatedBy, assignCount> -- used in record & replay
  36.145 -   
  36.146 -      //for comm -- between master and coreCtlr & btwn wrapper lib and plugin
  36.147 -   AnimSlot   *animSlotAssignedTo;
  36.148 -   VMSReqst   *requests;      //wrapper lib puts in requests, plugin takes out
  36.149 -   void       *dataRetFromReq;//Return vals from plugin to Wrapper Lib
  36.150 -
  36.151 -      //For using Slave as carrier for data
  36.152 -   void       *semanticData;  //Lang saves lang-specific things in slave here
  36.153 -
  36.154 -        //=========== MEASUREMENT STUFF ==========
  36.155 -         MEAS__Insert_Meas_Fields_into_Slave;
  36.156 -         float64     createPtInSecs;  //time VP created, in seconds
  36.157 -        //========================================
  36.158 - };
  36.159 -//SlaveVP
  36.160 -
  36.161 - 
  36.162 -/* The one and only global variable, holds many odds and ends
  36.163 - */
  36.164 -typedef struct
  36.165 - {    //The offsets of these fields are hard-coded into assembly
  36.166 -   void            *coreCtlrReturnPt;    //offset to this field used in asm
  36.167 -   int8             falseSharePad1[256 - sizeof(void*)];
  36.168 -   int32            masterLock;          //offset to this field used in asm
  36.169 -   int8             falseSharePad2[256 - sizeof(int32)];
  36.170 -      //============ below this, no fields are used in asm =============
  36.171 -
  36.172 -      //Basic VMS infrastructure
  36.173 -   SlaveVP        **masterVPs;
  36.174 -   AnimSlot      ***allAnimSlots;
  36.175 -   
  36.176 -      //plugin related
  36.177 -   SlaveAssigner    slaveAssigner;
  36.178 -   RequestHandler   requestHandler;
  36.179 -   void            *semanticEnv;
  36.180 -   
  36.181 -      //Slave creation
  36.182 -   int32            numSlavesCreated;  //gives ordering to processor creation
  36.183 -   int32            numSlavesAlive;    //used to detect fail-safe shutdown
  36.184 -
  36.185 -      //Initialization related
  36.186 -   int32            setupComplete;      //use while starting up coreCtlr
  36.187 -
  36.188 -      //Memory management related
  36.189 -   MallocArrays    *freeLists;
  36.190 -   int32            amtOfOutstandingMem;//total currently allocated
  36.191 -
  36.192 -      //Random number seeds -- random nums used in various places  
  36.193 -   uint32_t seed1;
  36.194 -   uint32_t seed2;
  36.195 -
  36.196 -      //=========== MEASUREMENT STUFF =============
  36.197 -       IntervalProbe   **intervalProbes;
  36.198 -       PrivDynArrayInfo *dynIntervalProbesInfo;
  36.199 -       HashTable        *probeNameHashTbl;
  36.200 -       int32             masterCreateProbeID;
  36.201 -       float64           createPtInSecs; //real-clock time VMS initialized
  36.202 -       Histogram       **measHists;
  36.203 -       PrivDynArrayInfo *measHistsInfo;
  36.204 -       MEAS__Insert_Susp_Meas_Fields_into_MasterEnv;
  36.205 -       MEAS__Insert_Master_Meas_Fields_into_MasterEnv;
  36.206 -       MEAS__Insert_Master_Lock_Meas_Fields_into_MasterEnv;
  36.207 -       MEAS__Insert_Malloc_Meas_Fields_into_MasterEnv;
  36.208 -       MEAS__Insert_Plugin_Meas_Fields_into_MasterEnv;
  36.209 -       MEAS__Insert_System_Meas_Fields_into_MasterEnv;
  36.210 -       MEAS__Insert_Counter_Meas_Fields_into_MasterEnv;
  36.211 -      //==========================================
  36.212 - }
  36.213 -MasterEnv;
  36.214 -
  36.215 -//=========================  Extra Stuff Data Strucs  =======================
  36.216 -typedef struct
  36.217 - {
  36.218 -
  36.219 - }
  36.220 -VMSExcp;
  36.221 -
  36.222 -//=======================  OS Thread related  ===============================
  36.223 -
  36.224 -void * coreController( void *paramsIn );  //standard PThreads fn prototype
  36.225 -void * coreCtlr_Seq( void *paramsIn );  //standard PThreads fn prototype
  36.226 -void animationMaster( void *initData, SlaveVP *masterVP );
  36.227 -
  36.228 -
  36.229 -typedef struct
  36.230 - {
  36.231 -   void           *endThdPt;
  36.232 -   unsigned int    coreNum;
  36.233 - }
  36.234 -ThdParams;
  36.235 -
  36.236 -//=============================  Global Vars ================================
  36.237 -
  36.238 -volatile MasterEnv      *_VMSMasterEnv __align_to_cacheline__;
  36.239 -
  36.240 -   //these are global, but only used for startup and shutdown
  36.241 -pthread_t       coreCtlrThdHandles[ NUM_CORES ]; //pthread's virt-procr state
  36.242 -ThdParams      *coreCtlrThdParams [ NUM_CORES ];
  36.243 -
  36.244 -pthread_mutex_t suspendLock;
  36.245 -pthread_cond_t  suspendCond;
  36.246 -
  36.247 -//=========================  Function Prototypes  ===========================
  36.248 -/* MEANING OF   WL  PI  SS  int VMSOS
  36.249 - * These indicate which places the function is safe to use.  They stand for:
  36.250 - * 
  36.251 - * WL   Wrapper Library -- wrapper lib code should only use these
  36.252 - * PI   Plugin          -- plugin code should only use these
  36.253 - * SS   Startup and Shutdown -- designates these relate to startup & shutdown
  36.254 - * int  internal to VMS -- should not be used in wrapper lib or plugin
  36.255 - * VMSOS means "OS functions for applications to use"
  36.256 - * 
  36.257 - * VMS_int__ functions touch internal VMS data structs and are only safe
  36.258 - *  to be used inside the master lock.  However, occasionally, they appear
  36.259 - * in wrapper-lib or plugin code.  In those cases, very careful analysis
  36.260 - * has been done to be sure no concurrency issues could arise.
  36.261 - * 
  36.262 - * VMS_WL__ functions are all safe for use outside the master lock.
  36.263 - * 
  36.264 - * VMSOS are only safe for applications to use -- they're like a second
  36.265 - * language mixed in -- but they can't be used inside plugin code, and
  36.266 - * aren't meant for use in wrapper libraries, because they are themselves
  36.267 - * wrapper-library calls!
  36.268 - */
  36.269 -//========== Startup and shutdown ==========
  36.270 -void
  36.271 -VMS_SS__init();
  36.272 -
  36.273 -void
  36.274 -VMS_SS__start_the_work_then_wait_until_done();
  36.275 -
  36.276 -SlaveVP* 
  36.277 -VMS_SS__create_shutdown_slave();
  36.278 -
  36.279 -void
  36.280 -VMS_SS__shutdown();
  36.281 -
  36.282 -void
  36.283 -VMS_SS__cleanup_at_end_of_shutdown();
  36.284 -
  36.285 -
  36.286 -//==============    ===============
  36.287 -
  36.288 -inline SlaveVP *
  36.289 -VMS_int__create_slaveVP( TopLevelFnPtr fnPtr, void *dataParam );
  36.290 -#define VMS_PI__create_slaveVP VMS_int__create_slaveVP
  36.291 -#define VMS_WL__create_slaveVP VMS_int__create_slaveVP
  36.292 -
  36.293 -   //Use this to create processor inside entry point & other places outside
  36.294 -   // the VMS system boundary (IE, don't animate with a SlaveVP or MasterVP)
  36.295 -SlaveVP *
  36.296 -VMS_ext__create_slaveVP( TopLevelFnPtr fnPtr, void *dataParam );
  36.297 -
  36.298 -inline SlaveVP *
  36.299 -VMS_int__create_slaveVP_helper( SlaveVP *newSlv,       TopLevelFnPtr  fnPtr,
  36.300 -                                void      *dataParam, void           *stackLocs );
  36.301 -
  36.302 -inline void
  36.303 -VMS_int__reset_slaveVP_to_TopLvlFn( SlaveVP *slaveVP, TopLevelFnPtr fnPtr,
  36.304 -                              void    *dataParam);
  36.305 -
  36.306 -inline void
  36.307 -VMS_int__point_slaveVP_to_OneParamFn( SlaveVP *slaveVP, void *fnPtr,
  36.308 -                              void    *param);
  36.309 -
  36.310 -inline void
  36.311 -VMS_int__point_slaveVP_to_TwoParamFn( SlaveVP *slaveVP, void *fnPtr,
  36.312 -                              void    *param1, void *param2);
  36.313 -
  36.314 -void
  36.315 -VMS_int__dissipate_slaveVP( SlaveVP *slaveToDissipate );
  36.316 -#define VMS_PI__dissipate_slaveVP VMS_int__dissipate_slaveVP
  36.317 -//WL: dissipate a SlaveVP by sending a request
  36.318 -
  36.319 -void
  36.320 -VMS_ext__dissipate_slaveVP( SlaveVP *slaveToDissipate );
  36.321 -
  36.322 -void
  36.323 -VMS_int__throw_exception( char *msgStr, SlaveVP *reqstSlv, VMSExcp *excpData );
  36.324 -#define VMS_PI__throw_exception  VMS_int__throw_exception
  36.325 -void
  36.326 -VMS_WL__throw_exception( char *msgStr, SlaveVP *reqstSlv,  VMSExcp *excpData );
  36.327 -#define VMS_App__throw_exception VMS_WL__throw_exception
  36.328 -
  36.329 -void *
  36.330 -VMS_int__give_sem_env_for( SlaveVP *animSlv );
  36.331 -#define VMS_PI__give_sem_env_for  VMS_int__give_sem_env_for
  36.332 -#define VMS_SS__give_sem_env_for  VMS_int__give_sem_env_for
  36.333 -//No WL version -- not safe!  if use in WL, be sure data rd & wr is stable
  36.334 -
  36.335 -
  36.336 -inline void
  36.337 -VMS_int__get_master_lock();
  36.338 -
  36.339 -#define VMS_int__release_master_lock() _VMSMasterEnv->masterLock = UNLOCKED
  36.340 -
  36.341 -inline uint32_t
  36.342 -VMS_int__randomNumber();
  36.343 -
  36.344 -//==============  Request Related  ===============
  36.345 -
  36.346 -void
  36.347 -VMS_int__suspend_slaveVP_and_send_req( SlaveVP *callingSlv );
  36.348 -
  36.349 -inline void
  36.350 -VMS_WL__add_sem_request_in_mallocd_VMSReqst( void *semReqData, SlaveVP *callingSlv );
  36.351 -
  36.352 -inline void
  36.353 -VMS_WL__send_sem_request( void *semReqData, SlaveVP *callingSlv );
  36.354 -
  36.355 -void
  36.356 -VMS_WL__send_create_slaveVP_req( void *semReqData, SlaveVP *reqstingSlv );
  36.357 -
  36.358 -void inline
  36.359 -VMS_WL__send_dissipate_req( SlaveVP *prToDissipate );
  36.360 -
  36.361 -inline void
  36.362 -VMS_WL__send_VMSSem_request( void *semReqData, SlaveVP *callingSlv );
  36.363 -
  36.364 -VMSReqst *
  36.365 -VMS_PI__take_next_request_out_of( SlaveVP *slaveWithReq );
  36.366 -//#define VMS_PI__take_next_request_out_of( slave ) slave->requests
  36.367 -
  36.368 -//inline void *
  36.369 -//VMS_PI__take_sem_reqst_from( VMSReqst *req );
  36.370 -#define VMS_PI__take_sem_reqst_from( req ) req->semReqData
  36.371 -
  36.372 -void inline
  36.373 -VMS_PI__handle_VMSSemReq( VMSReqst *req, SlaveVP *requestingSlv, void *semEnv,
  36.374 -                       ResumeSlvFnPtr resumeSlvFnPtr );
  36.375 -
  36.376 -//======================== MEASUREMENT ======================
  36.377 -uint64
  36.378 -VMS_WL__give_num_plugin_cycles();
  36.379 -uint32
  36.380 -VMS_WL__give_num_plugin_animations();
  36.381 -
  36.382 -
  36.383 -//========================= Utilities =======================
  36.384 -inline char *
  36.385 -VMS_int__strDup( char *str );
  36.386 -
  36.387 -
  36.388 -//========================= Probes =======================
  36.389 -#include "Services_Offered_by_VMS/Measurement_and_Stats/probes.h"
  36.390 -
  36.391 -//================================================
  36.392 -#endif	/* _VMS_H */
  36.393 -
    37.1 --- a/VMS__PI.c	Mon Sep 03 03:34:54 2012 -0700
    37.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    37.3 @@ -1,121 +0,0 @@
    37.4 -/*
    37.5 - * Copyright 2010  OpenSourceStewardshipFoundation
    37.6 - *
    37.7 - * Licensed under BSD
    37.8 - */
    37.9 -
   37.10 -#include <stdio.h>
   37.11 -#include <stdlib.h>
   37.12 -#include <string.h>
   37.13 -#include <malloc.h>
   37.14 -#include <inttypes.h>
   37.15 -#include <sys/time.h>
   37.16 -
   37.17 -#include "VMS.h"
   37.18 -
   37.19 -
   37.20 -/* MEANING OF   WL  PI  SS  int
   37.21 - * These indicate which places the function is safe to use.  They stand for:
   37.22 - * WL: Wrapper Library
   37.23 - * PI: Plugin 
   37.24 - * SS: Startup and Shutdown
   37.25 - * int: internal to the VMS implementation
   37.26 - */
   37.27 -
   37.28 -//=========================  Local Declarations  ========================
   37.29 -void inline
   37.30 -handleMakeProbe( VMSSemReq *semReq, void *semEnv, ResumeSlvFnPtr resumeFn );
   37.31 -
   37.32 -void inline
   37.33 -handleThrowException( VMSSemReq *semReq, void *semEnv, ResumeSlvFnPtr resumeFn );
   37.34 -//=======================================================================
   37.35 -
   37.36 - 
   37.37 -VMSReqst *
   37.38 -VMS_PI__take_next_request_out_of( SlaveVP *slaveWithReq )
   37.39 - { VMSReqst *req;
   37.40 -
   37.41 -   req = slaveWithReq->requests;
   37.42 -   if( req == NULL ) return NULL;
   37.43 -
   37.44 -   slaveWithReq->requests = slaveWithReq->requests->nextReqst;
   37.45 -   return req;
   37.46 - }
   37.47 -
   37.48 - 
   37.49 -
   37.50 -/*May 2012
   37.51 - *CHANGED IMPL -- now a macro in header file
   37.52 - *
   37.53 - *Turn function into macro that just accesses the request field
   37.54 - *
   37.55 -inline void *
   37.56 -VMS_PI__take_sem_reqst_from( VMSReqst *req )
   37.57 - {
   37.58 -   return req->semReqData;
   37.59 - }
   37.60 -*/
   37.61 -
   37.62 -
   37.63 -/* This is for OS requests and VMS infrastructure requests, such as to create
   37.64 - *  a probe -- a probe is inside the heart of VMS-core, it's not part of any
   37.65 - *  language -- but it's also a semantic thing that's triggered from and used
   37.66 - *  in the application.. so it crosses abstractions..  so, need some special
   37.67 - *  pattern here for handling such requests.
   37.68 - * Doing this just as if it were a second language sharing VMS-core.
   37.69 - * 
   37.70 - * This is called from the language's request handler when it sees a request
   37.71 - *  of type VMSSemReq
   37.72 - *
   37.73 - * TODO: Later change this, to give probes their own separate plugin & have
   37.74 - *  VMS-core steer the request to appropriate plugin
   37.75 - * Do the same for OS calls -- look later at it..
   37.76 - */
   37.77 -void inline
   37.78 -VMS_PI__handle_VMSSemReq( VMSReqst *req, SlaveVP *requestingSlv, void *semEnv,
   37.79 -                       ResumeSlvFnPtr resumeFn )
   37.80 - { VMSSemReq *semReq;
   37.81 -
   37.82 -   semReq = VMS_PI__take_sem_reqst_from(req);
   37.83 -   if( semReq == NULL ) return;
   37.84 -   switch( semReq->reqType )  //sem handlers are all in other file
   37.85 -    {
   37.86 -      case make_probe:      handleMakeProbe(   semReq, semEnv, resumeFn);
   37.87 -         break;
   37.88 -      case throw_excp:  handleThrowException(  semReq, semEnv, resumeFn);
   37.89 -         break;
   37.90 -    }
   37.91 - }
   37.92 -
   37.93 -/*
   37.94 - */
   37.95 -void inline
   37.96 -handleMakeProbe( VMSSemReq *semReq, void *semEnv, ResumeSlvFnPtr resumeFn )
   37.97 - { IntervalProbe *newProbe;
   37.98 -
   37.99 -   newProbe          = VMS_int__malloc( sizeof(IntervalProbe) );
  37.100 -   newProbe->nameStr = VMS_int__strDup( semReq->nameStr );
  37.101 -   newProbe->hist    = NULL;
  37.102 -   newProbe->schedChoiceWasRecorded = FALSE;
  37.103 -
  37.104 -      //This runs in masterVP, so no race-condition worries
  37.105 -   newProbe->probeID =
  37.106 -            addToDynArray( newProbe, _VMSMasterEnv->dynIntervalProbesInfo );
  37.107 -
  37.108 -   semReq->requestingSlv->dataRetFromReq = newProbe;
  37.109 -
  37.110 -   //This is inside VMS, while the resume_slaveVP fn is inside the language,
  37.111 -   // so pass the pointer from lang to here, then call it.
  37.112 -   (*resumeFn)( semReq->requestingSlv, semEnv );
  37.113 - }
  37.114 -
  37.115 -void inline
  37.116 -handleThrowException( VMSSemReq *semReq, void *semEnv, ResumeSlvFnPtr resumeFn )
  37.117 - {
  37.118 -   VMS_int__throw_exception(  semReq->msgStr, semReq->requestingSlv, semReq->exceptionData );
  37.119 -   
  37.120 -   (*resumeFn)( semReq->requestingSlv, semEnv );
  37.121 - }
  37.122 -
  37.123 -
  37.124 -
    38.1 --- a/VMS__WL.c	Mon Sep 03 03:34:54 2012 -0700
    38.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    38.3 @@ -1,160 +0,0 @@
    38.4 -/*
    38.5 - * Copyright 2010  OpenSourceStewardshipFoundation
    38.6 - *
    38.7 - * Licensed under BSD
    38.8 - */
    38.9 -
   38.10 -#include <stdio.h>
   38.11 -#include <stdlib.h>
   38.12 -#include <string.h>
   38.13 -#include <malloc.h>
   38.14 -#include <inttypes.h>
   38.15 -#include <sys/time.h>
   38.16 -
   38.17 -#include "VMS.h"
   38.18 -
   38.19 -
   38.20 -/* MEANING OF   WL  PI  SS  int
   38.21 - * These indicate which places the function is safe to use.  They stand for:
   38.22 - * WL: Wrapper Library
   38.23 - * PI: Plugin 
   38.24 - * SS: Startup and Shutdown
   38.25 - * int: internal to the VMS implementation
   38.26 - */
   38.27 -
   38.28 -
   38.29 -
   38.30 -/*For this implementation of VMS, it may not make much sense to have the
   38.31 - * system of requests for creating a new processor done this way.. but over
   38.32 - * the scope of single-master, multi-master, multi-tasking, OS-implementing,
   38.33 - * distributed-memory, and so on, this gives VMS implementation a chance to
   38.34 - * do stuff before suspend, in the SlaveVP, and in the Master before the plugin
   38.35 - * is called, as well as in the lang-lib before this is called, and in the
   38.36 - * plugin.  So, this gives both VMS and language implementations a chance to
   38.37 - * intercept at various points and do order-dependent stuff.
   38.38 - *Having a standard VMSNewPrReqData struc allows the language to create and
   38.39 - * free the struc, while VMS knows how to get the newSlv if it wants it, and
   38.40 - * it lets the lang have lang-specific data related to creation transported
   38.41 - * to the plugin.
   38.42 - */
   38.43 -void
   38.44 -VMS_WL__send_create_slaveVP_req( void *semReqData, SlaveVP *reqstingSlv )
   38.45 - { VMSReqst req;
   38.46 -
   38.47 -   req.reqType          = createReq;
   38.48 -   req.semReqData       = semReqData;
   38.49 -   req.nextReqst        = reqstingSlv->requests;
   38.50 -   reqstingSlv->requests = &req;
   38.51 -
   38.52 -   VMS_int__suspend_slaveVP_and_send_req( reqstingSlv );
   38.53 - }
   38.54 -
   38.55 -
   38.56 -/*
   38.57 - *This adds a request to dissipate, then suspends the processor so that the
   38.58 - * request handler will receive the request.  The request handler is what
   38.59 - * does the work of freeing memory and removing the processor from the
   38.60 - * semantic environment's data structures.
   38.61 - *The request handler also is what figures out when to shutdown the VMS
   38.62 - * system -- which causes all the core controller threads to die, and returns from
   38.63 - * the call that started up VMS to perform the work.
   38.64 - *
   38.65 - *This form is a bit misleading to understand if one is trying to figure out
   38.66 - * how VMS works -- it looks like a normal function call, but inside it
   38.67 - * sends a request to the request handler and suspends the processor, which
   38.68 - * jumps out of the VMS_WL__dissipate_slaveVP function, and out of all nestings
   38.69 - * above it, transferring the work of dissipating to the request handler,
   38.70 - * which then does the actual work -- causing the processor that animated
   38.71 - * the call of this function to disappear and the "hanging" state of this
   38.72 - * function to just poof into thin air -- the virtual processor's trace
   38.73 - * never returns from this call, but instead the virtual processor's trace
   38.74 - * gets suspended in this call and all the virt processor's state
   38.75 - * disappears -- making that suspend the last thing in the Slv's trace.
   38.76 - */
   38.77 -void
   38.78 -VMS_WL__send_dissipate_req( SlaveVP *slaveToDissipate )
   38.79 - { VMSReqst req;
   38.80 -
   38.81 -   req.reqType                = dissipate;
   38.82 -   req.nextReqst              = slaveToDissipate->requests;
   38.83 -   slaveToDissipate->requests = &req;
   38.84 -
   38.85 -   VMS_int__suspend_slaveVP_and_send_req( slaveToDissipate );
   38.86 - }
   38.87 -
   38.88 -
   38.89 -
   38.90 -/*This call's name indicates that request is malloc'd -- so req handler
   38.91 - * has to free any extra requests tacked on before a send, using this.
   38.92 - *
   38.93 - * This inserts the semantic-layer's request data into standard VMS carrier
   38.94 - * request data-struct that is mallocd.  The sem request doesn't need to
   38.95 - * be malloc'd if this is called inside the same call chain before the
   38.96 - * send of the last request is called.
   38.97 - *
   38.98 - *The request handler has to call VMS_int__free_VMSReq for any of these
   38.99 - */
  38.100 -inline void
  38.101 -VMS_WL__add_sem_request_in_mallocd_VMSReqst( void *semReqData,
  38.102 -                                          SlaveVP *callingSlv )
  38.103 - { VMSReqst *req;
  38.104 -
  38.105 -   req = VMS_int__malloc( sizeof(VMSReqst) );
  38.106 -   req->reqType         = semantic;
  38.107 -   req->semReqData      = semReqData;
  38.108 -   req->nextReqst       = callingSlv->requests;
  38.109 -   callingSlv->requests = req;
  38.110 - }
  38.111 -
  38.112 -/*This inserts the semantic-layer's request data into a standard VMS carrier
  38.113 - * request data-struct that is allocated on the stack of this call; a ptr to
  38.114 - * it is sent to the plugin.
  38.115 - *Then it suspends, which causes the request to be sent.
  38.116 - */
  38.117 -inline void
  38.118 -VMS_WL__send_sem_request( void *semReqData, SlaveVP *callingSlv )
  38.119 - { VMSReqst req;
  38.120 -
  38.121 -   req.reqType         = semantic;
  38.122 -   req.semReqData      = semReqData;
  38.123 -   req.nextReqst       = callingSlv->requests;
  38.124 -   callingSlv->requests = &req;
  38.125 -   
  38.126 -   VMS_int__suspend_slaveVP_and_send_req( callingSlv );
  38.127 - }
  38.128 -
  38.129 -
  38.130 -/*May 2012 Not sure what this is..  looks like old idea for VMS semantic
  38.131 - * request
  38.132 - */
  38.133 -inline void
  38.134 -VMS_WL__send_VMSSem_request( void *semReqData, SlaveVP *callingSlv )
  38.135 - { VMSReqst req;
  38.136 -
  38.137 -   req.reqType         = VMSSemantic;
  38.138 -   req.semReqData      = semReqData;
  38.139 -   req.nextReqst       = callingSlv->requests; //grab any preceding requests
  38.140 -   callingSlv->requests = &req;
  38.141 -
  38.142 -   VMS_int__suspend_slaveVP_and_send_req( callingSlv );
  38.143 - }
  38.144 -
  38.145 -/*May 2012
  38.146 - *To throw exception from wrapper lib or application, first turn
  38.147 - * it into a request, then send the request
  38.148 - */
  38.149 -void
  38.150 -VMS_WL__throw_exception( char *msgStr, SlaveVP *reqstSlv,  VMSExcp *excpData )
  38.151 - { VMSReqst req;
  38.152 -   VMSSemReq semReq;
  38.153 -
  38.154 -   req.reqType         = VMSSemantic;
  38.155 -   req.semReqData      = &semReq;
  38.156 -   req.nextReqst       = reqstSlv->requests; //grab any preceding requests
  38.157 -   reqstSlv->requests   = &req;
  38.158 -
  38.159 -   semReq.msgStr        = msgStr;
  38.160 -   semReq.exceptionData = excpData;
  38.161 -   
  38.162 -   VMS_int__suspend_slaveVP_and_send_req( reqstSlv );
  38.163 - }
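
The senders in VMS__WL.c all push a carrier request onto the front of the slave's `requests` list, and `VMS_PI__take_next_request_out_of` pops from the same end, so the plugin sees requests in last-in, first-out order. The following is a minimal sketch of that list discipline only. `Reqst`, `Slave`, `push_request` and `take_next_request` are hypothetical stand-ins, not the VMS types or functions, and the suspend step is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for VMSReqst and SlaveVP; only the fields used here. */
typedef struct Reqst {
    int           reqType;
    void         *semReqData;
    struct Reqst *nextReqst;
} Reqst;

typedef struct {
    Reqst *requests;   /* head of the pending-request list */
} Slave;

/* Push a request onto the front of the slave's list, as the WL senders do. */
static void push_request(Slave *slv, Reqst *req)
{
    assert(slv != NULL && req != NULL);
    req->nextReqst = slv->requests;
    slv->requests  = req;
}

/* Pop the most recently pushed request, or return NULL when none remain. */
static Reqst *take_next_request(Slave *slv)
{
    assert(slv != NULL);
    Reqst *req = slv->requests;
    if (req == NULL) return NULL;
    slv->requests = req->nextReqst;
    return req;
}
```

In the original senders the carrier request lives on the sender's stack. That is only safe because the slave suspends inside the sending call, so the frame stays alive until the request handler has consumed the request.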
    39.1 --- a/VMS__int.c	Mon Sep 03 03:34:54 2012 -0700
    39.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    39.3 @@ -1,289 +0,0 @@
    39.4 -/*
    39.5 - * Copyright 2010  OpenSourceStewardshipFoundation
    39.6 - *
    39.7 - * Licensed under BSD
    39.8 - */
    39.9 -
   39.10 -#include <stdio.h>
   39.11 -#include <stdlib.h>
   39.12 -#include <string.h>
   39.13 -#include <malloc.h>
   39.14 -#include <inttypes.h>
   39.15 -#include <sys/time.h>
   39.16 -
   39.17 -#include "VMS.h"
   39.18 -
   39.19 -
   39.20 -/* MEANING OF   WL  PI  SS  int
   39.21 - * These indicate which places the function is safe to use.  They stand for:
   39.22 - * WL: Wrapper Library
   39.23 - * PI: Plugin 
   39.24 - * SS: Startup and Shutdown
   39.25 - * int: internal to the VMS implementation
   39.26 - */
   39.27 -
   39.28 -
   39.29 -inline SlaveVP *
   39.30 -VMS_int__create_slaveVP( TopLevelFnPtr fnPtr, void *dataParam )
   39.31 - { SlaveVP *newSlv;
   39.32 -   void      *stackLocs;
   39.33 -
   39.34 -   newSlv      = VMS_int__malloc( sizeof(SlaveVP) );
   39.35 -   stackLocs   = VMS_int__malloc( VIRT_PROCR_STACK_SIZE );
   39.36 -   if( stackLocs == 0 )
   39.37 -    { perror("VMS_int__malloc stack"); exit(1); }
   39.38 -
   39.39 -   _VMSMasterEnv->numSlavesAlive += 1;
   39.40 -
   39.41 -   return VMS_int__create_slaveVP_helper( newSlv, fnPtr, dataParam, stackLocs );
   39.42 - }
   39.43 -
   39.44 -/* "ext" designates that it's for use outside the VMS system -- should only
   39.45 - * be called from main thread or other thread -- never from code animated by
   39.46 - * a VMS virtual processor.
   39.47 - */
   39.48 -inline SlaveVP *
   39.49 -VMS_ext__create_slaveVP( TopLevelFnPtr fnPtr, void *dataParam )
   39.50 - { SlaveVP *newSlv;
   39.51 -   char      *stackLocs;
   39.52 -
   39.53 -   newSlv      = malloc( sizeof(SlaveVP) );
   39.54 -   stackLocs  = malloc( VIRT_PROCR_STACK_SIZE );
   39.55 -   if( stackLocs == 0 )
   39.56 -    { perror("malloc stack"); exit(1); }
   39.57 -
   39.58 -   _VMSMasterEnv->numSlavesAlive += 1;
   39.59 -
   39.60 -   return VMS_int__create_slaveVP_helper(newSlv, fnPtr, dataParam, stackLocs);
   39.61 - }
   39.62 -
   39.63 -
   39.64 -//===========================================================================
   39.65 -/*there is a label inside this function -- save the addr of this label in
   39.66 - * the callingSlv struc, as the pick-up point from which to start the next
   39.67 - * work-unit for that slave.  If it turns out registers must be saved, then
   39.68 - * save them in the slave struc too.  Then do assembly jump to the CoreCtlr's
   39.69 - * "done with work-unit" label.  The slave struc is in the request in the
   39.70 - * slave that animated the just-ended work-unit, so all the state is saved
   39.71 - * there, and will get passed along, inside the request handler, to the
   39.72 - * next work-unit for that slave.
   39.73 - */
   39.74 -void
   39.75 -VMS_int__suspend_slaveVP_and_send_req( SlaveVP *animatingSlv )
   39.76 - { 
   39.77 -
   39.78 -      //This suspended Slv will get assigned by Master again at some
   39.79 -      // future point
   39.80 -
   39.81 -      //return ownership of the Slv and anim slot to Master virt pr
   39.82 -   animatingSlv->animSlotAssignedTo->workIsDone = TRUE;
   39.83 -
   39.84 -        HOLISTIC__Record_HwResponderInvocation_start;
   39.85 -         MEAS__Capture_Pre_Susp_Point;
   39.86 -      //This assembly function is a VMS primitive that first saves the
   39.87 -      // stack and frame pointer, plus an addr inside this assembly code.
   39.88 -      //When core ctlr later gets this slave out of a sched slot, it
   39.89 -      // restores the stack and frame and then jumps to the addr.. that
   39.90 -      // jmp causes return from this function.
   39.91 -      //So, in effect, this function takes a variable amount of wall-clock
   39.92 -      // time to complete -- the amount of time is determined by the
   39.93 -      // Master, which makes sure the memory is in a consistent state first.
   39.94 -   switchToCoreCtlr(animatingSlv);
   39.95 -   flushRegisters();
   39.96 -         MEAS__Capture_Post_Susp_Point;
   39.97 -		 
   39.98 -   return;
   39.99 - }
  39.100 -
  39.101 -
  39.102 -/* "ext" designates that it's for use outside the VMS system -- should only
  39.103 - * be called from main thread or other thread -- never from code animated by
  39.104 - * a SlaveVP, nor from a masterVP.
  39.105 - *
  39.106 - *Use this version to dissipate Slvs created outside the VMS system.
  39.107 - */
  39.108 -void
  39.109 -VMS_ext__dissipate_slaveVP( SlaveVP *slaveToDissipate )
  39.110 - {
  39.111 -   _VMSMasterEnv->numSlavesAlive -= 1;
  39.112 -   if( _VMSMasterEnv->numSlavesAlive == 0 )
  39.113 -    {    //no more work, so shutdown
  39.114 -      VMS_SS__shutdown();  //note, creates shut-down slaves on each core
  39.115 -    }
  39.116 -
  39.117 -   //NOTE: dataParam was given to the processor, so should either have
  39.118 -      // been alloc'd with VMS_int__malloc, or freed by the level above animSlv.
  39.119 -      //So, all that's left to free here is the stack and the SlaveVP struc
  39.120 -      // itself
  39.121 -      //Note, should not stack-allocate the data param -- no guarantee, in
  39.122 -      // general that creating processor will outlive ones it creates.
  39.123 -   free( slaveToDissipate->startOfStack );
  39.124 -   free( slaveToDissipate );
  39.125 - }
  39.126 -
  39.127 -
  39.128 -
  39.129 -/*This must be called by the request handler plugin -- it cannot be called
  39.130 - * from the semantic library "dissipate processor" function -- instead, the
  39.131 - * semantic layer has to generate a request, and the plug-in calls this
  39.132 - * function.
  39.133 - *The reason is that this frees the virtual processor's stack -- which is
  39.134 - * still in use inside semantic library calls!
  39.135 - *
  39.136 - *This frees or recycles all the state owned by and comprising the VMS
  39.137 - * portion of the animating virtual procr.  The request handler must first
  39.138 - * free any semantic data created for the processor that didn't use the
  39.139 - * VMS_malloc mechanism.  Then it calls this, which first asks the malloc
  39.140 - * system to disown any state that did use VMS_malloc, and then frees the
  39.141 - * stack and the processor-struct itself.
  39.142 - *If the dissipated processor is the sole (remaining) owner of VMS_int__malloc'd
  39.143 - * state, then that state gets freed (or sent to recycling) as a side-effect
  39.144 - * of dis-owning it.
  39.145 - */
  39.146 -void
  39.147 -VMS_int__dissipate_slaveVP( SlaveVP *animatingSlv )
  39.148 - {
  39.149 -         DEBUG__printf2(dbgRqstHdlr, "VMS int dissipate slaveID: %d, alive: %d",animatingSlv->slaveID, _VMSMasterEnv->numSlavesAlive-1);
  39.150 -      //dis-own all locations owned by this processor, causing to be freed
  39.151 -      // any locations that it is (was) sole owner of
  39.152 -   _VMSMasterEnv->numSlavesAlive -= 1;
  39.153 -   if( _VMSMasterEnv->numSlavesAlive == 0 )
  39.154 -    {    //no more work, so shutdown
  39.155 -      VMS_SS__shutdown();  //note, creates shut-down processor on each core
  39.156 -    }
  39.157 -
  39.158 -      //NOTE: dataParam was given to the processor, so should either have
  39.159 -      // been alloc'd with VMS_int__malloc, or freed by the level above animSlv.
  39.160 -      //So, all that's left to free here is the stack and the SlaveVP struc
  39.161 -      // itself
  39.162 -      //Note, should not stack-allocate initial data -- no guarantee, in
  39.163 -      // general that creating processor will outlive ones it creates.
  39.164 -   VMS_int__free( animatingSlv->startOfStack );
  39.165 -   VMS_int__free( animatingSlv );
  39.166 - }
  39.167 -
  39.168 -/*Anticipating multi-tasking
  39.169 - */
  39.170 -void *
  39.171 -VMS_int__give_sem_env_for( SlaveVP *animSlv )
  39.172 - {
  39.173 -   return _VMSMasterEnv->semanticEnv;
  39.174 - }
  39.175 -
  39.176 -/*
  39.177 - *
  39.178 - */
  39.179 -inline SlaveVP *
  39.180 -VMS_int__create_slaveVP_helper( SlaveVP *newSlv,    TopLevelFnPtr  fnPtr,
  39.181 -                     void    *dataParam, void          *stackLocs )
  39.182 - {
  39.183 -   newSlv->startOfStack = stackLocs;
  39.184 -   newSlv->slaveID      = _VMSMasterEnv->numSlavesCreated++;
  39.185 -   newSlv->requests     = NULL;
  39.186 -   newSlv->animSlotAssignedTo    = NULL;
  39.187 -   newSlv->typeOfVP     = Slave;
  39.188 -   newSlv->assignCount  = 0;
  39.189 -
  39.190 -   VMS_int__reset_slaveVP_to_TopLvlFn( newSlv, fnPtr, dataParam );
  39.191 -           
  39.192 -   //============================= MEASUREMENT STUFF ========================
  39.193 -   #ifdef PROBES__TURN_ON_STATS_PROBES
  39.194 -   //TODO: make this TSCHiLow or generic equivalent
  39.195 -   //struct timeval timeStamp;
  39.196 -   //gettimeofday( &(timeStamp), NULL);
  39.197 -   //newSlv->createPtInSecs = timeStamp.tv_sec +(timeStamp.tv_usec/1000000.0) -
  39.198 -   //                                           _VMSMasterEnv->createPtInSecs;
  39.199 -   #endif
  39.200 -   //========================================================================
  39.201 -
  39.202 -   return newSlv;
  39.203 - }
  39.204 -
  39.205 -
  39.206 -/*Later, improve this -- for now, just exits the application after printing
  39.207 - * the error message.
  39.208 - */
  39.209 -void
  39.210 -VMS_int__throw_exception( char *msgStr, SlaveVP *reqstSlv, VMSExcp *excpData )
  39.211 - {
  39.212 -   printf("%s",msgStr);
  39.213 -   fflush(stdout);  //flush the message just printed before exiting
  39.214 -   exit(1);
  39.215 - }
  39.216 -
  39.217 -
  39.218 -inline char *
  39.219 -VMS_int__strDup( char *str )
  39.220 - { char *retStr;
  39.221 -
  39.222 -   if( str == NULL ) return (char *)NULL;
  39.223 -   retStr = (char *)VMS_int__malloc( strlen(str) + 1 );
  39.224 -   strcpy( retStr, str );
  39.225 -
  39.226 -   return (char *)retStr;
  39.227 - }
  39.228 -
  39.229 -
  39.230 -inline void
  39.231 -VMS_int__backoff_for_TooLongToGetLock( int32 numTriesToGetLock );
  39.232 -
  39.233 -inline void
  39.234 -VMS_int__get_master_lock()
  39.235 - { int32 *addrOfMasterLock;
  39.236 - 
  39.237 -   addrOfMasterLock = &(_VMSMasterEnv->masterLock);
  39.238 -
  39.239 -   int numTriesToGetLock = 0;
  39.240 -   int gotLock = 0;
  39.241 -   
  39.242 -            MEAS__Capture_Pre_Master_Lock_Point;
  39.243 -
  39.244 -   while( !gotLock ) //keep going until get master lock
  39.245 -    { 
  39.246 -      numTriesToGetLock++;   //if too many, means too much contention
  39.247 -      if( numTriesToGetLock > NUM_TRIES_BEFORE_DO_BACKOFF )
  39.248 -       { VMS_int__backoff_for_TooLongToGetLock( numTriesToGetLock );
  39.249 -       }
  39.250 -      if( numTriesToGetLock > MASTERLOCK_RETRIES_BEFORE_YIELD ) 
  39.251 -       { numTriesToGetLock = 0; 
  39.252 -         pthread_yield();
  39.253 -       }
  39.254 -   
  39.255 -         //try to get the lock
  39.256 -      gotLock = __sync_bool_compare_and_swap( addrOfMasterLock,
  39.257 -                                                         UNLOCKED, LOCKED );
  39.258 -    }
  39.259 -            MEAS__Capture_Post_Master_Lock_Point;
  39.260 - }
  39.261 -
  39.262 -/*Used by the backoff to pick a random amount of busy-wait.  Can't use the
  39.263 - * system rand because it takes much too long.
  39.264 - *Note, the seeds live in _VMSMasterEnv and are modified on every call
  39.265 - */
  39.266 -inline uint32_t
  39.267 -VMS_int__randomNumber()
  39.268 - {
  39.269 -	_VMSMasterEnv->seed1 = 36969 * (_VMSMasterEnv->seed1 & 65535) + 
  39.270 -                          (_VMSMasterEnv->seed1 >> 16);
  39.271 -	_VMSMasterEnv->seed2 = 18000 * (_VMSMasterEnv->seed2 & 65535) + 
  39.272 -                          (_VMSMasterEnv->seed2 >> 16);
  39.273 -	return (_VMSMasterEnv->seed1 << 16) + _VMSMasterEnv->seed2;
  39.274 - }
  39.275 -
  39.276 -
  39.277 -/*Busy-waits for a random number of cycles -- chooses number of cycles 
  39.278 - * differently than for the no-work backoff
  39.279 - */
  39.280 -inline void
  39.281 -VMS_int__backoff_for_TooLongToGetLock( int32 numTriesToGetLock )
  39.282 - { int32 i, waitIterations;
  39.283 -   volatile double fakeWorkVar = 0.0; //busy-wait fake work
  39.284 -
  39.285 -   waitIterations = 
  39.286 -    VMS_int__randomNumber()% (numTriesToGetLock * GET_LOCK_BACKOFF_WEIGHT);   
  39.287 -   //addToHist( wait_iterations, coreLoopThdParams->wait_iterations_hist );
  39.288 -   for( i = 0; i < waitIterations; i++ )
  39.289 -    { fakeWorkVar += (fakeWorkVar + 32.0) / 2.0; //busy-wait
  39.290 -    }
  39.291 - }
  39.292 -
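
The master-lock code removed above combines three pieces: a compare-and-swap spin, a randomized busy-wait backoff once contention is detected, and a cheap multiply-with-carry generator to pick the backoff length. A self-contained sketch of how the pieces fit together follows. It is an illustration under stated assumptions, not the VMS implementation: the constant values, the `Rng` struct and all function names are hypothetical, `UNLOCKED` is assumed to be 0, and `sched_yield()` is swapped in for the non-portable `pthread_yield()`.

```c
#include <assert.h>
#include <sched.h>
#include <stdint.h>

#define UNLOCKED 0
#define LOCKED   1
/* Hypothetical tuning constants; the real values live in the VMS headers. */
#define TRIES_BEFORE_BACKOFF 10
#define TRIES_BEFORE_YIELD   100
#define BACKOFF_WEIGHT       100

typedef struct { uint32_t seed1, seed2; } Rng;

/* Two 16-bit multiply-with-carry generators combined into one 32-bit value. */
static uint32_t mwc_next(Rng *r)
{
    r->seed1 = 36969u * (r->seed1 & 65535u) + (r->seed1 >> 16);
    r->seed2 = 18000u * (r->seed2 & 65535u) + (r->seed2 >> 16);
    return (r->seed1 << 16) + r->seed2;
}

/* Busy-wait for a random number of iterations that grows with the try count. */
static void backoff(Rng *r, int32_t numTries)
{
    assert(numTries > 0);
    uint32_t iters = mwc_next(r) % ((uint32_t)numTries * BACKOFF_WEIGHT);
    volatile double fakeWork = 0.0;   /* volatile keeps the loop from being removed */
    for (uint32_t i = 0; i < iters; i++)
        fakeWork += (fakeWork + 32.0) / 2.0;
}

/* Spin until the CAS succeeds, backing off and eventually yielding the core. */
static void get_lock(volatile int32_t *lock, Rng *r)
{
    int32_t tries = 0;
    for (;;) {
        if (__sync_bool_compare_and_swap(lock, UNLOCKED, LOCKED)) return;
        tries++;
        if (tries > TRIES_BEFORE_BACKOFF) backoff(r, tries);
        if (tries > TRIES_BEFORE_YIELD) { tries = 0; sched_yield(); }
    }
}

/* Release by storing 0 with release semantics (relies on UNLOCKED == 0). */
static void release_lock(volatile int32_t *lock)
{
    __sync_lock_release(lock);
}
```

Unlike the removed code, this sketch attempts the compare-and-swap before counting a try, so an uncontended acquire does no backoff bookkeeping at all.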
    40.1 --- a/VMS__startup_and_shutdown.c	Mon Sep 03 03:34:54 2012 -0700
    40.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    40.3 @@ -1,598 +0,0 @@
    40.4 -/*
    40.5 - * Copyright 2010  OpenSourceStewardshipFoundation
    40.6 - *
    40.7 - * Licensed under BSD
    40.8 - */
    40.9 -
   40.10 -#include <stdio.h>
   40.11 -#include <stdlib.h>
   40.12 -#include <string.h>
   40.13 -#include <malloc.h>
   40.14 -#include <inttypes.h>
   40.15 -#include <sys/time.h>
   40.16 -#include <pthread.h>
   40.17 -
   40.18 -#include "VMS.h"
   40.19 -
   40.20 -
   40.21 -#define thdAttrs NULL
   40.22 -
   40.23 -
   40.24 -/* MEANING OF   WL  PI  SS  int
   40.25 - * These indicate which places the function is safe to use.  They stand for:
   40.26 - * WL: Wrapper Library
   40.27 - * PI: Plugin 
   40.28 - * SS: Startup and Shutdown
   40.29 - * int: internal to the VMS implementation
   40.30 - */
   40.31 -
   40.32 -
   40.33 -//===========================================================================
   40.34 -AnimSlot **
   40.35 -create_anim_slots( int32 coreSlotsAreOn );
   40.36 -
   40.37 -void
   40.38 -create_masterEnv();
   40.39 -
   40.40 -void
   40.41 -create_the_coreCtlr_OS_threads();
   40.42 -
   40.43 -MallocProlog *
   40.44 -create_free_list();
   40.45 -
   40.46 -void
   40.47 -endOSThreadFn( void *initData, SlaveVP *animatingSlv );
   40.48 -
   40.49 -
   40.50 -//===========================================================================
   40.51 -
   40.52 -/*Setup has two phases:
   40.53 - * 1) Semantic layer first calls init_VMS, which creates masterEnv, and puts
   40.54 - *    the master Slv into the work-queue, ready for first "call"
   40.55 - * 2) Semantic layer then does its own init, which creates the seed virt
   40.56 - *    slave inside the semantic layer, ready to assign it when
   40.57 - *    asked by the first run of the animationMaster.
   40.58 - *
   40.59 - *This part is a bit weird because VMS really wants to be "always there", and
   40.60 - * have applications attach and detach..  for now, this VMS is part of
   40.61 - * the app, so the VMS system starts up as part of running the app.
   40.62 - *
   40.63 - *The semantic layer is isolated from the VMS internals by making the
   40.64 - * semantic layer do setup to a state that it's ready with its
   40.65 - * initial Slvs, ready to assign them to slots when the animationMaster
   40.66 - * asks.  Without this pattern, the semantic layer's setup would
   40.67 - * have to modify slots directly to assign the initial virt-procrs, and put
   40.68 - * them into the readyToAnimateQ itself, breaking the isolation completely.
   40.69 - *
   40.70 - * 
   40.71 - *The semantic layer creates the initial Slv(s), and adds its
   40.72 - * own environment to masterEnv, and fills in the pointers to
   40.73 - * the requestHandler and slaveAssigner plug-in functions
   40.74 - */
   40.75 -
   40.76 -/*This allocates VMS data structures, populates the master VMSProc,
   40.77 - * and master environment, and returns the master environment to the semantic
   40.78 - * layer.
   40.79 - */
   40.80 -void
   40.81 -VMS_SS__init()
   40.82 - {
   40.83 -   #ifdef DEBUG__TURN_ON_SEQUENTIAL_MODE
   40.84 -      create_masterEnv();
   40.85 -      printf( "\n\n Running in SEQUENTIAL mode \n\n" );
   40.86 -   #else
   40.87 -      create_masterEnv();
   40.88 -      DEBUG__printf1(TRUE,"Offset of lock in masterEnv: %d ", (int32)offsetof(MasterEnv,masterLock) );
   40.89 -      create_the_coreCtlr_OS_threads();
   40.90 -   #endif
   40.91 - }
   40.92 -
   40.93 -
   40.94 -/*TODO: finish implementing
   40.95 - *This function returns information about the version of VMS, the language
   40.96 - * the program is being run in, its version, and information on the 
   40.97 - * hardware.
   40.98 - */
   40.99 -/*
  40.100 -char *
  40.101 -VMS_App__give_environment_string()
  40.102 - {
  40.103 -   //--------------------------
  40.104 -    fprintf(output, "#\n# >> Build information <<\n");
  40.105 -    fprintf(output, "# GCC VERSION: %d.%d.%d\n",__GNUC__,__GNUC_MINOR__,__GNUC_PATCHLEVEL__);
  40.106 -    fprintf(output, "# Build Date: %s %s\n", __DATE__, __TIME__);
  40.107 -    
  40.108 -    fprintf(output, "#\n# >> Hardware information <<\n");
  40.109 -    fprintf(output, "# Hardware Architecture: ");
  40.110 -   #ifdef __x86_64
  40.111 -    fprintf(output, "x86_64");
  40.112 -   #endif //__x86_64
  40.113 -   #ifdef __i386
  40.114 -    fprintf(output, "x86");
  40.115 -   #endif //__i386
  40.116 -    fprintf(output, "\n");
  40.117 -    fprintf(output, "# Number of Cores: %d\n", NUM_CORES);
  40.118 -   //--------------------------
  40.119 -    
  40.120 -   //VMS Plugins
  40.121 -    fprintf(output, "#\n# >> VMS Plugins <<\n");
  40.122 -    fprintf(output, "# Language : ");
  40.123 -    fprintf(output, _LANG_NAME_);
  40.124 -    fprintf(output, "\n");
  40.125 -       //Meta info gets set by calls from the language during its init,
  40.126 -       // and info registered by calls from inside the application
  40.127 -    fprintf(output, "# Assigner: %s\n", _VMSMasterEnv->metaInfo->assignerInfo);
  40.128 -
  40.129 -   //--------------------------
  40.130 -   //Application
  40.131 -    fprintf(output, "#\n# >> Application <<\n");
  40.132 -    fprintf(output, "# Name: %s\n", _VMSMasterEnv->metaInfo->appInfo);
  40.133 -    fprintf(output, "# Data Set:\n%s\n",_VMSMasterEnv->metaInfo->inputSet);
  40.134 -    
  40.135 -   //--------------------------
  40.136 - }
  40.137 - */
  40.138 - 
  40.139 -/*This structure holds all the information VMS needs to manage a program.  VMS
  40.140 - * stores information about what percent of CPU time the program is getting, what
  40.141 - * language it uses, the request handlers to call for its slaves, and so on.
  40.142 - */
  40.143 -/*
  40.144 -typedef struct
  40.145 - { void               *semEnv;
  40.146 -   RequestHdlrFnPtr    requestHandler;
  40.147 -   SlaveAssignerFnPtr  slaveAssigner;
  40.148 -   int32               numSlavesLive;
  40.149 -   void               *resultToReturn;
  40.150 -  
  40.151 -   TopLevelFnPtr   seedFnPtr;
  40.152 -   void           *dataForSeed;
  40.153 -   bool32          executionIsComplete;
  40.154 -   pthread_mutex_t doneLock;
  40.155 -   pthread_cond_t  doneCond;
  40.156 - }
  40.157 -VMSProcess;
  40.158 -*/
  40.159 -
  40.160 -         
  40.161 -/*
  40.162 -void
  40.163 -VMS_App__start_VMS_running()
  40.164 - {
  40.165 -   create_masterEnv();
  40.166 -   
  40.167 -   #ifdef DEBUG__TURN_ON_SEQUENTIAL_MODE
  40.168 -      //Nothing else to create for sequential mode
  40.169 -   #else
  40.170 -      create_the_coreCtlr_OS_threads();
  40.171 -   #endif    
  40.172 - }
  40.173 -*/
  40.174 -
  40.175 -/*A pointer to the startup-function for the language is given as the last
  40.176 - * argument to the call.  Use this to initialize a program in the language.
  40.177 - * This creates a data structure that encapsulates the bookkeeping info
  40.178 - * VMS uses to track and schedule a program run.
  40.179 - */
  40.180 -/*
  40.181 -VMSProcess *
  40.182 -VMS_App__spawn_program_on_data_in_Lang( TopLevelFnPtr prog_seed_fn, void *data,
  40.183 -                                    LangInitFnPtr langInitFnPtr )
  40.184 - { VMSProcess *newProcess;
  40.185 -   newProcess = malloc( sizeof(VMSProcess) );
  40.186 -   pthread_mutex_init( &(newProcess->doneLock), NULL );
  40.187 -   pthread_cond_init(  &(newProcess->doneCond), NULL );
  40.188 -   newProcess->executionIsComplete = FALSE;
  40.189 -   newProcess->numSlavesLive = 0;
  40.190 -   
  40.191 -   newProcess->dataForSeed = data;
  40.192 -   newProcess->seedFnPtr   = prog_seed_fn;
  40.193 -   
  40.194 -      //The language's spawn-process function fills in the plugin function-ptrs in
  40.195 -      // the VMSProcess struct, gives the struct to VMS, which then makes and
  40.196 -      // queues the seed SlaveVP, which starts processors made from the code being
  40.197 -      // animated.
  40.198 -    
  40.199 -   (*langInitFnPtr)( newProcess );  
  40.200 -   
  40.201 -   return newProcess;
  40.202 - }
  40.203 -*/
  40.204 -
  40.205 -/*When all SlaveVPs owned by the program-run associated to the process have
  40.206 - * dissipated, then return from this call.  There is no language to cleanup,
  40.207 - * and VMS does not shutdown..  but the process bookkeeping structure,
  40.208 - * which is used by VMS to track and schedule the program, is freed.
  40.209 - *The VMSProcess structure is kept until this call collects the results from it,
  40.210 - * then freed.  If the process is not done yet when VMS gets this
  40.211 - * call, then this call waits..  the challenge here is that this call comes from
  40.212 - * a live OS thread that's outside VMS..  so, inside here, it waits on a 
  40.213 - * condition..  then it's a VMS thread that signals this to wake up..
  40.214 - *First checks whether the process is done, if yes, calls the clean-up fn then
  40.215 - * returns the result extracted from the VMSProcess struct.
  40.216 - *If process not done yet, then performs a wait (in a loop to be sure the
  40.217 - * wakeup is not spurious, which can happen).  VMS registers the wait, and upon
  40.218 - * the process ending (last SlaveVP owned by it dissipates), then VMS signals
  40.219 - * this to wakeup.  This then calls the cleanup fn and returns the result.
  40.220 - */
  40.221 -/*
  40.222 -void *
  40.223 -VMS_App__give_results_when_done_for( VMSProcess *process )
  40.224 - { void *result;
  40.225 -   
  40.226 -   pthread_mutex_lock( &(process->doneLock) );
  40.227 -   while( !(process->executionIsComplete) )
  40.228 -    {
  40.229 -      pthread_cond_wait( &(process->doneCond),
  40.230 -                         &(process->doneLock) );
  40.231 -    }
  40.232 -   pthread_mutex_unlock( &(process->doneLock) );
  40.233 -   
  40.234 -   result = process->resultToReturn;
  40.235 -   
  40.236 -   VMS_int__cleanup_process_after_done( process );
  40.237 -   free( process );  //was malloc'd above, so free it here
  40.238 -   
  40.239 -   return result;
  40.240 - }
  40.241 -*/
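
The commented-out design above has an outside OS thread wait on a condition variable until the runtime marks the process as done. A minimal sketch of that hand-off is given below, assuming a trimmed-down bookkeeping struct. `Process`, `process_init`, `process_mark_done` and `process_wait_for_result` are hypothetical names used only for illustration. The mutex and condition variable are initialized with `pthread_mutex_init` and `pthread_cond_init`, because the static initializer macros cannot be assigned to members of a heap-allocated struct.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down version of the VMSProcess bookkeeping struct. */
typedef struct {
    bool            executionIsComplete;
    void           *resultToReturn;
    pthread_mutex_t doneLock;
    pthread_cond_t  doneCond;
} Process;

static void process_init(Process *p)
{
    assert(p != NULL);
    p->executionIsComplete = false;
    p->resultToReturn      = NULL;
    pthread_mutex_init(&p->doneLock, NULL);
    pthread_cond_init(&p->doneCond, NULL);
}

/* Called from inside the runtime once the last slave has dissipated. */
static void process_mark_done(Process *p, void *result)
{
    pthread_mutex_lock(&p->doneLock);
    p->resultToReturn      = result;
    p->executionIsComplete = true;
    pthread_cond_signal(&p->doneCond);
    pthread_mutex_unlock(&p->doneLock);
}

/* Called from an outside OS thread; blocks until the process is done. */
static void *process_wait_for_result(Process *p)
{
    void *result;
    pthread_mutex_lock(&p->doneLock);
    while (!p->executionIsComplete)          /* loop guards against spurious wakeups */
        pthread_cond_wait(&p->doneCond, &p->doneLock);
    result = p->resultToReturn;              /* read while still holding the lock */
    pthread_mutex_unlock(&p->doneLock);
    return result;
}
```

Setting the flag and signalling while holding the lock means a waiter cannot miss the wakeup between testing the flag and going to sleep.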
  40.242 -
  40.243 -/*Turns off the VMS system, and frees all data associated with it.  Does this
  40.244 - * by creating shutdown SlaveVPs and inserting them into animation slots.
  40.245 - * Will probably have to wake up sleeping cores as part of this -- the fn that
  40.246 - * inserts the new SlaveVPs should handle the wakeup..
  40.247 - */
  40.248 -/*
  40.249 -void
  40.250 -VMS_SS__shutdown(); //already defined -- look at it
  40.251 -
  40.252 -void
  40.253 -VMS_App__shutdown()
  40.254 - {
  40.255 -   for( cores )
  40.256 -    { slave = VMS_int__create_new_SlaveVP( endOSThreadFn, NULL );
  40.257 -      VMS_int__insert_slave_onto_core( SlaveVP *slave, coreNum );
  40.258 -    }
  40.259 - }
  40.260 -*/
  40.261 -
  40.262 -/* VMS_App__start_VMS_running();
  40.263 -
  40.264 -   VMSProcess matrixMultProcess;
  40.265 -   
  40.266 -   matrixMultProcess =
  40.267 -    VMS_App__spawn_program_on_data_in_Lang( &prog_seed_fn, data, Vthread_lang );
  40.268 -   
  40.269 -   resMatrix = VMS_App__give_results_when_done_for( matrixMultProcess );
  40.270 -   
  40.271 -   VMS_App__shutdown();
  40.272 - */
  40.273 -
  40.274 -void
  40.275 -create_masterEnv()
  40.276 - { MasterEnv       *masterEnv;
  40.277 -   VMSQueueStruc  **readyToAnimateQs;
  40.278 -   int              coreIdx;
  40.279 -   SlaveVP        **masterVPs;
  40.280 -   AnimSlot     ***allAnimSlots; //ptr to array of ptrs
  40.281 -
  40.282 -
  40.283 -      //Make the master env, which holds everything else
  40.284 -   _VMSMasterEnv = malloc( sizeof(MasterEnv) );
  40.285 -
  40.286 -        //Very first thing put into the master env is the free-list, seeded
  40.287 -        // with a massive initial chunk of memory.
  40.288 -        //After this, all other mallocs are VMS__malloc.
  40.289 -   _VMSMasterEnv->freeLists        = VMS_ext__create_free_list();
  40.290 -   
  40.291 -   
  40.292 -   //===================== Only VMS__malloc after this ====================
  40.293 -   masterEnv     = (MasterEnv*)_VMSMasterEnv;
  40.294 -   
  40.295 -      //Make a readyToAnimateQ for each core controller
  40.296 -   readyToAnimateQs = VMS_int__malloc( NUM_CORES * sizeof(VMSQueueStruc *) );
  40.297 -   masterVPs        = VMS_int__malloc( NUM_CORES * sizeof(SlaveVP *) );
  40.298 -
  40.299 -      //One array for each core, several in array, core's masterVP scheds all
  40.300 -   allAnimSlots    = VMS_int__malloc( NUM_CORES * sizeof(AnimSlot *) );
  40.301 -
  40.302 -   _VMSMasterEnv->numSlavesAlive = 0;  //used to detect shut-down condition
  40.303 -
  40.304 -   _VMSMasterEnv->numSlavesCreated = 0;  //used by create slave to set ID
  40.305 -   for( coreIdx = 0; coreIdx < NUM_CORES; coreIdx++ )
  40.306 -    {    
  40.307 -      readyToAnimateQs[ coreIdx ] = makeVMSQ();
  40.308 -      
  40.309 -         //Q: should give masterVP core-specific info as its init data?
  40.310 -      masterVPs[ coreIdx ] = VMS_int__create_slaveVP( (TopLevelFnPtr)&animationMaster, (void*)masterEnv );
  40.311 -      masterVPs[ coreIdx ]->coreAnimatedBy = coreIdx;
  40.312 -      masterVPs[ coreIdx ]->typeOfVP = Master;
  40.313 -      allAnimSlots[ coreIdx ] = create_anim_slots( coreIdx ); //makes for one core
  40.314 -    }
  40.315 -   _VMSMasterEnv->masterVPs        = masterVPs;
  40.316 -   _VMSMasterEnv->masterLock       = UNLOCKED;
  40.317 -   _VMSMasterEnv->seed1 = rand()%1000; // init random number generator
  40.318 -   _VMSMasterEnv->seed2 = rand()%1000; // init random number generator
  40.319 -   _VMSMasterEnv->allAnimSlots    = allAnimSlots;
  40.320 -   _VMSMasterEnv->measHistsInfo = NULL; 
  40.321 -
  40.322 -   //============================= MEASUREMENT STUFF ========================
  40.323 -      
  40.324 -         MEAS__Make_Meas_Hists_for_Susp_Meas;
  40.325 -         MEAS__Make_Meas_Hists_for_Master_Meas;
  40.326 -         MEAS__Make_Meas_Hists_for_Master_Lock_Meas;
  40.327 -         MEAS__Make_Meas_Hists_for_Malloc_Meas;
  40.328 -         MEAS__Make_Meas_Hists_for_Plugin_Meas;
  40.329 -         MEAS__Make_Meas_Hists_for_Language;
  40.330 -
  40.331 -         PROBES__Create_Probe_Bookkeeping_Vars;
  40.332 -         
  40.333 -         HOLISTIC__Setup_Perf_Counters;
  40.334 -         
  40.335 -   //========================================================================
  40.336 - }
  40.337 -
  40.338 -AnimSlot **
  40.339 -create_anim_slots( int32 coreSlotsAreOn )
  40.340 - { AnimSlot  **animSlots;
  40.341 -   int i;
  40.342 -
  40.343 -   animSlots  = VMS_int__malloc( NUM_ANIM_SLOTS * sizeof(AnimSlot *) );
  40.344 -
  40.345 -   for( i = 0; i < NUM_ANIM_SLOTS; i++ )
  40.346 -    {
  40.347 -      animSlots[i] = VMS_int__malloc( sizeof(AnimSlot) );
  40.348 -
  40.349 -         //Set state to mean "handling requests done, slot needs filling"
  40.350 -      animSlots[i]->workIsDone         = FALSE;
  40.351 -      animSlots[i]->needsSlaveAssigned = TRUE;
  40.352 -      animSlots[i]->slotIdx            = i; //quick retrieval of slot pos
  40.353 -      animSlots[i]->coreSlotIsOn       = coreSlotsAreOn;
  40.354 -    }
  40.355 -   return animSlots;
  40.356 - }
  40.357 -
  40.358 -
  40.359 -void
  40.360 -freeAnimSlots( AnimSlot **animSlots )
  40.361 - { int i;
  40.362 -   for( i = 0; i < NUM_ANIM_SLOTS; i++ )
  40.363 -    {
  40.364 -      VMS_int__free( animSlots[i] );
  40.365 -    }
  40.366 -   VMS_int__free( animSlots );
  40.367 - }
  40.368 -
  40.369 -
  40.370 -void
  40.371 -create_the_coreCtlr_OS_threads()
  40.372 - {
  40.373 -   //========================================================================
  40.374 -   //                      Create the Threads
  40.375 -   int coreIdx, retCode;
  40.376 -
  40.377 -      //Need the threads to be created suspended, and wait for a signal
  40.378 -      // before proceeding -- gives time after creating to initialize other
  40.379 -      // stuff before the coreCtlrs set off.
  40.380 -   _VMSMasterEnv->setupComplete = 0;
  40.381 -   
  40.382 -      //initialize the cond used to make the new threads wait and sync up
  40.383 -      //must do this before *creating* the threads..
  40.384 -   pthread_mutex_init( &suspendLock, NULL );
  40.385 -   pthread_cond_init( &suspendCond, NULL );
  40.386 -
  40.387 -      //Make the threads that animate the core controllers
  40.388 -   for( coreIdx=0; coreIdx < NUM_CORES; coreIdx++ )
  40.389 -    { coreCtlrThdParams[coreIdx]          = VMS_int__malloc( sizeof(ThdParams) );
  40.390 -      coreCtlrThdParams[coreIdx]->coreNum = coreIdx;
  40.391 -
  40.392 -      retCode =
  40.393 -      pthread_create( &(coreCtlrThdHandles[coreIdx]),
  40.394 -                        thdAttrs,
  40.395 -                       &coreController,
  40.396 -               (void *)(coreCtlrThdParams[coreIdx]) );
  40.397 -      if(retCode){printf("ERROR creating thread: %d\n", retCode); exit(1);}
  40.398 -    }
  40.399 - }
  40.400 -
  40.401 -
  40.402 -
  40.403 -void
  40.404 -VMS_SS__register_request_handler( RequestHandler requestHandler )
  40.405 - { _VMSMasterEnv->requestHandler = requestHandler;
  40.406 - }
  40.407 -
  40.408 -
  40.409 -void
  40.410 -VMS_SS__register_anim_assigner( SlaveAssigner animAssigner )
  40.411 - { _VMSMasterEnv->slaveAssigner = animAssigner;
  40.412 - }
  40.413 -
  40.414 -void VMS_SS__register_semantic_env( void *semanticEnv )
  40.415 - { _VMSMasterEnv->semanticEnv = semanticEnv;
  40.416 - }
  40.417 -
  40.418 -
  40.419 -/*This is what causes the VMS system to initialize.. then waits for it to
  40.420 - * exit.
  40.421 - * 
  40.422 - *Wrapper lib layer calls this when it wants the system to start running..
  40.423 - */
  40.424 -void
  40.425 -VMS_SS__start_the_work_then_wait_until_done()
  40.426 - { 
  40.427 -#ifdef DEBUG__TURN_ON_SEQUENTIAL_MODE
  40.428 -   /*Only difference between version with an OS thread pinned to each core and
  40.429 -    * the sequential version of VMS is VMS__init_Seq, this, and coreCtlr_Seq.
  40.430 -    */
  40.431 -         //Instead of un-suspending threads, just call the one and only
  40.432 -         // core ctlr (sequential version), in the main thread.
  40.433 -      coreCtlr_Seq( NULL );
  40.434 -      flushRegisters();
  40.435 -#else
  40.436 -   int coreIdx;
  40.437 -      //Start the core controllers running
  40.438 -   
  40.439 -      //tell the core controller threads that setup is complete
  40.440 -      //get lock, to lock out any threads still starting up -- they'll see
  40.441 -      // that setupComplete is true before entering while loop, and so never
  40.442 -      // wait on the condition
  40.443 -   pthread_mutex_lock(     &suspendLock );
  40.444 -   _VMSMasterEnv->setupComplete = 1;
  40.445 -   pthread_mutex_unlock(   &suspendLock );
  40.446 -   pthread_cond_broadcast( &suspendCond );
  40.447 -   
  40.448 -   
  40.449 -      //wait for all to complete
  40.450 -   for( coreIdx=0; coreIdx < NUM_CORES; coreIdx++ )
  40.451 -    {
  40.452 -      pthread_join( coreCtlrThdHandles[coreIdx], NULL );
  40.453 -    }
  40.454 -   
  40.455 -      //NOTE: do not clean up VMS env here -- semantic layer has to have
  40.456 -      // a chance to clean up its environment first, then do a call to free
  40.457 -      // the Master env and rest of VMS locations
  40.458 -#endif
  40.459 - }
  40.460 -
  40.461 -
  40.462 -SlaveVP* VMS_SS__create_shutdown_slave(){
  40.463 -    SlaveVP* shutdownVP;
  40.464 -    
  40.465 -    shutdownVP = VMS_int__create_slaveVP( &endOSThreadFn, NULL );
  40.466 -    shutdownVP->typeOfVP = Shutdown;
  40.467 -    
  40.468 -    return shutdownVP;
  40.469 -}
  40.470 -
  40.471 -//TODO: look at architecting cleanest separation between request handler
  40.472 -// and animation master, for dissipate, create, shutdown, and other non-semantic
  40.473 -// requests.  Issue is chain: one removes requests from AppSlv, one dispatches
  40.474 -// on type of request, and one handles each type..  but some types require
  40.475 -// action from both request handler and animation master -- maybe just give the
  40.476 -// request handler calls like:  VMS__handle_X_request_type
  40.477 -
  40.478 -
  40.479 -/*This is called by the semantic layer's request handler when it decides it's
  40.480 - * time to shut down the VMS system.  Calling this causes the core controller OS
  40.481 - * threads to exit, which unblocks the entry-point function that started up
  40.482 - * VMS, and allows it to grab the result and return to the original single-
  40.483 - * threaded application.
  40.484 - * 
  40.485 - *The _VMSMasterEnv is needed by this shut down function, so the create-seed-
  40.486 - * and-wait function has to free a bunch of stuff after it detects the
  40.487 - * threads have all died: the masterEnv, the thread-related locations,
  40.488 - * masterVP, any AppSlvs that might still be allocated and sitting in the
  40.489 - * semantic environment, or have been orphaned in the _VMSWorkQ.
  40.490 - * 
  40.491 - *NOTE: the semantic plug-in is expected to use VMS__malloc to get all the
  40.492 - * locations it needs, and give ownership to masterVP.  Then, they will be
  40.493 - * automatically freed.
  40.494 - *
  40.495 - *In here, create one core-loop shut-down processor for each core controller and put
  40.496 - * them all directly into the readyToAnimateQ.
  40.497 - *Note, this function can ONLY be called after the semantic environment no
  40.498 - * longer cares if AppSlvs get animated after the point this is called.  In
  40.499 - * other words, this can be used as an abort, or else it should only be
  40.500 - * called when all AppSlvs have finished dissipate requests -- only at that
  40.501 - * point is it sure that all results have completed.
  40.502 - */
  40.503 -void
  40.504 -VMS_SS__shutdown()
  40.505 - { int32       coreIdx;
  40.506 -   SlaveVP    *shutDownSlv;
  40.507 -   AnimSlot **animSlots;
  40.508 -      //create the shutdown processors, one for each core controller -- put them
  40.509 -      // directly into the Q -- each core will die when it gets one
  40.510 -   for( coreIdx = 0; coreIdx < NUM_CORES; coreIdx++ )
  40.511 -    {    //Note, this is running in the master
  40.512 -      shutDownSlv = VMS_SS__create_shutdown_slave();
  40.513 -         //last slave has dissipated, so no more in slots, so write
  40.514 -         // shut down slave into first animation slot.
  40.515 -      animSlots = _VMSMasterEnv->allAnimSlots[ coreIdx ];
  40.516 -      animSlots[0]->slaveAssignedToSlot = shutDownSlv;
  40.517 -      animSlots[0]->needsSlaveAssigned = FALSE;
  40.518 -      shutDownSlv->coreAnimatedBy = coreIdx;
  40.519 -      shutDownSlv->animSlotAssignedTo = animSlots[ 0 ];
  40.520 -    }
  40.521 - }
  40.522 -
  40.523 -
  40.524 -/*Am trying to be cute, avoiding IF statement in coreCtlr that checks for
  40.525 - * a special shutdown slaveVP.  Ended up with extra-complex shutdown sequence.
  40.526 - *This function has the sole purpose of setting the stack and framePtr
  40.527 - * to the coreCtlr's stack and framePtr.. it does that then jumps to the
  40.528 - * core ctlr's shutdown point -- might be able to just call Pthread_exit
  40.529 - * from here, but am going back to the pthread's stack and setting everything
  40.530 - * up just as if it never jumped out, before calling pthread_exit.
  40.531 - *The end-point of core ctlr will free the stack and so forth of the
  40.532 - * processor that animates this function, (this fn is transferring the
  40.533 - * animator of the AppSlv that is in turn animating this function over
  40.534 - * to core controller function -- note that this slices out a level of virtual
  40.535 - * processors).
  40.536 - */
  40.537 -void
  40.538 -endOSThreadFn( void *initData, SlaveVP *animatingSlv )
  40.539 - { 
  40.540 -   #ifdef DEBUG__TURN_ON_SEQUENTIAL_MODE
  40.541 -    asmTerminateCoreCtlrSeq(animatingSlv);
  40.542 -   #else
  40.543 -    asmTerminateCoreCtlr(animatingSlv);
  40.544 -   #endif
  40.545 - }
  40.546 -
  40.547 -
  40.548 -/*This is called from the startup & shutdown
  40.549 - */
  40.550 -void
  40.551 -VMS_SS__cleanup_at_end_of_shutdown()
  40.552 - { 
  40.553 -      //Before getting rid of everything, print out any measurements made
  40.554 -   if( _VMSMasterEnv->measHistsInfo != NULL )
  40.555 -    { forAllInDynArrayDo( _VMSMasterEnv->measHistsInfo, (DynArrayFnPtr)&printHist );
  40.556 -      forAllInDynArrayDo( _VMSMasterEnv->measHistsInfo, (DynArrayFnPtr)&saveHistToFile);
  40.557 -      forAllInDynArrayDo( _VMSMasterEnv->measHistsInfo, (DynArrayFnPtr)&freeHist );
  40.558 -    }
  40.559 -   
  40.560 -   MEAS__Print_Hists_for_Susp_Meas;
  40.561 -   MEAS__Print_Hists_for_Master_Meas;
  40.562 -   MEAS__Print_Hists_for_Master_Lock_Meas;
  40.563 -   MEAS__Print_Hists_for_Malloc_Meas;
  40.564 -   MEAS__Print_Hists_for_Plugin_Meas;
  40.565 -   
  40.566 -
  40.567 -      //All the environment data has been allocated with VMS__malloc, so just
  40.568 -      // free its internal big-chunk and all inside it disappear.
  40.569 -/*
  40.570 -   readyToAnimateQs = _VMSMasterEnv->readyToAnimateQs;
  40.571 -   masterVPs        = _VMSMasterEnv->masterVPs;
  40.572 -   allAnimSlots    = _VMSMasterEnv->allAnimSlots;
  40.573 -   
  40.574 -   for( coreIdx = 0; coreIdx < NUM_CORES; coreIdx++ )
  40.575 -    {
  40.576 -      freeVMSQ( readyToAnimateQs[ coreIdx ] );
  40.577 -         //master Slvs were created external to VMS, so use external free
  40.578 -      VMS_int__dissipate_slaveVP( masterVPs[ coreIdx ] );
  40.579 -      
  40.580 -      freeAnimSlots( allAnimSlots[ coreIdx ] );
  40.581 -    }
  40.582 -   
  40.583 -   VMS_int__free( _VMSMasterEnv->readyToAnimateQs );
  40.584 -   VMS_int__free( _VMSMasterEnv->masterVPs );
  40.585 -   VMS_int__free( _VMSMasterEnv->allAnimSlots );
  40.586 -   
  40.587 -   //============================= MEASUREMENT STUFF ========================
  40.588 -   #ifdef PROBES__TURN_ON_STATS_PROBES
  40.589 -   freeDynArrayDeep( _VMSMasterEnv->dynIntervalProbesInfo, &VMS_WL__free_probe);
  40.590 -   #endif
  40.591 -   //========================================================================
  40.592 -*/
  40.593 -      //These are the only two that use system free 
  40.594 -   VMS_ext__free_free_list( _VMSMasterEnv->freeLists );
  40.595 -   free( (void *)_VMSMasterEnv );
  40.596 - }
  40.597 -
  40.598 -
  40.599 -//================================
  40.600 -
  40.601 -
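
The removed startup code above makes the core-controller threads wait on a condition until setupComplete is set under the lock, then broadcasts, and the wait sits in a loop so that a spurious wakeup is harmless. A minimal standalone sketch of that gate pattern follows. All names in it (gateLock, gateCond, worker, run_gated_workers, NUM_WORKERS) are hypothetical stand-ins chosen for illustration and are not part of VMS or PR.

```c
#include <pthread.h>

#define NUM_WORKERS 4   /* hypothetical stand-in for NUM_CORES */

static pthread_mutex_t gateLock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  gateCond = PTHREAD_COND_INITIALIZER;
static int setupComplete = 0;   /* predicate, guarded by gateLock */
static int workersRun    = 0;   /* number of workers that passed the gate */

/* Each worker blocks until setup is complete.  The while loop re-checks the
 * predicate after every wakeup, so a spurious wakeup cannot let it through. */
static void *worker( void *arg )
 { (void)arg;
   pthread_mutex_lock( &gateLock );
   while( !setupComplete )
      pthread_cond_wait( &gateCond, &gateLock );
   workersRun += 1;             /* gateLock is still held here */
   pthread_mutex_unlock( &gateLock );
   return NULL;
 }

/* Creates the workers, opens the gate, then joins them.
 * Returns how many workers ran, or -1 if some thread could not be created. */
int run_gated_workers( void )
 { pthread_t handles[ NUM_WORKERS ];
   int i, created = 0;

   pthread_mutex_lock( &gateLock );
   setupComplete = 0;
   workersRun    = 0;
   pthread_mutex_unlock( &gateLock );

   for( i = 0; i < NUM_WORKERS; i++ )
    { if( pthread_create( &handles[i], NULL, &worker, NULL ) != 0 ) break;
      created += 1;
    }

      /* Set the predicate while holding the lock, then wake every waiter.
       * A worker that has not reached the wait yet sees setupComplete == 1
       * and never waits at all. */
   pthread_mutex_lock( &gateLock );
   setupComplete = 1;
   pthread_mutex_unlock( &gateLock );
   pthread_cond_broadcast( &gateCond );

   for( i = 0; i < created; i++ )
      pthread_join( handles[i], NULL );

   return (created == NUM_WORKERS) ? workersRun : -1;
 }
```

A caller would invoke run_gated_workers() once from the main thread. Setting the flag under the lock is what rules out the lost-wakeup case described in the comments of the removed function.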
    41.1 --- a/VMS_primitive_data_types.h	Mon Sep 03 03:34:54 2012 -0700
    41.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    41.3 @@ -1,42 +0,0 @@
    41.4 -/*
    41.5 - *  Copyright 2009 OpenSourceStewardshipFoundation.org
    41.6 - *  Licensed under GNU General Public License version 2
    41.7 - *  
    41.8 - * Author: seanhalle@yahoo.com
    41.9 - *  
   41.10 -
   41.11 - */
   41.12 -
   41.13 -#ifndef _PRIMITIVE_DATA_TYPES_H
   41.14 -#define _PRIMITIVE_DATA_TYPES_H
   41.15 -
   41.16 -
   41.17 -/*For portability, need primitive data types that have a well defined
   41.18 - * size, and well-defined layout into bytes
   41.19 - *To do this, provide standard aliases for all primitive data types
   41.20 - *These aliases must be used in all functions instead of the ANSI types
   41.21 - *
   41.22 - *When VMS is used together with BLIS, these definitions will be replaced
   41.23 - * inside each specialization module according to the compiler used in
   41.24 - * that module and the hardware being specialized to.
   41.25 - */
   41.26 -typedef char               bool8;
   41.27 -typedef signed char        int8;
   41.28 -typedef unsigned char      uint8;
   41.29 -typedef short              int16;
   41.30 -typedef unsigned short     uint16;
   41.31 -typedef int                int32;
   41.32 -typedef unsigned int       uint32;
   41.33 -typedef unsigned int       bool32;
   41.34 -typedef long long          int64;
   41.35 -typedef unsigned long long uint64;
   41.36 -typedef float              float32;
   41.37 -typedef double             float64;
   41.38 -//typedef double double      float128;  //GCC doesn't like this
   41.39 -#define float128 double double
   41.40 -
   41.41 -#define TRUE  1
   41.42 -#define FALSE 0
   41.43 -
   41.44 -#endif	/* _PRIMITIVE_DATA_TYPES_H */
   41.45 -
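
The removed header above states that the primitive types need a well-defined size. One way to get that on a C99 compiler is to build the aliases on <stdint.h> and check the sizes at compile time. The sketch below is an illustration under that assumption and is not the header that replaced this file. SIZE_CHECK and uint8_holds_200 are hypothetical names.

```c
#include <stdint.h>

/* Aliases with the same names as the removed header, built on the
 * exact-width types from C99 <stdint.h>. */
typedef int8_t    int8;
typedef uint8_t   uint8;
typedef int16_t   int16;
typedef uint16_t  uint16;
typedef int32_t   int32;
typedef uint32_t  uint32;
typedef int64_t   int64;
typedef uint64_t  uint64;

/* Compile-time size check that also works before C11: the array size is
 * -1, and therefore a compile error, when the size does not match. */
#define SIZE_CHECK( type, bytes ) \
   typedef char size_check_##type[ (sizeof(type) == (bytes)) ? 1 : -1 ]

SIZE_CHECK( int8,   1 );
SIZE_CHECK( uint8,  1 );
SIZE_CHECK( int16,  2 );
SIZE_CHECK( uint16, 2 );
SIZE_CHECK( int32,  4 );
SIZE_CHECK( uint32, 4 );
SIZE_CHECK( int64,  8 );
SIZE_CHECK( uint64, 8 );

/* Returns 1 when uint8 can hold 200.  This holds for an unsigned 8-bit
 * type; it would fail for a plain char alias where char is signed. */
int uint8_holds_200( void )
 { uint8 v = (uint8)200;
   return v == 200;
 }
```

The size checks cost nothing at run time, since a mismatch stops the build instead of producing a wrong layout.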
    42.1 --- a/__README__Code_Overview.txt	Mon Sep 03 03:34:54 2012 -0700
    42.2 +++ b/__README__Code_Overview.txt	Wed Sep 19 23:12:44 2012 -0700
    42.3 @@ -1,21 +1,21 @@
    42.4  
    42.5 -This file is intended to help those new to VMS to find their way around the code.
    42.6 +This file is intended to help those new to PR to find their way around the code.
    42.7  
    42.8  Some observations:
    42.9 --] VMS.h is the top header file, and is the root of a tree of #includes that pulls in all the other headers
   42.10 +-] PR.h is the top header file, and is the root of a tree of #includes that pulls in all the other headers
   42.11  
   42.12  -] Defines directory contains all the header files that hold #define statements
   42.13  
   42.14 --] VMS has several kinds of function, grouped according to what kind of code should call them: VMS_App_.. for applications to call, VMS_WL_.. for wrapper-library code to call, VMS_PI_.. for plugin code to call, and VMS_int_.. for VMS to use internally.  Sometimes VMS_int_ functions are called from the wrapper library or plugin, but this should only be done by programmers who have gained an in-depth knowledge of VMS's implementation and understand that VMS_int_ functions are not protected for concurrent use..
   42.15 +-] PR has several kinds of functions, grouped according to what kind of code should call them: PR_App_.. for applications to call, PR_WL_.. for wrapper-library code to call, PR_PI_.. for plugin code to call, and PR_int_.. for PR to use internally.  Sometimes PR_int_ functions are called from the wrapper library or plugin, but this should only be done by programmers who have gained an in-depth knowledge of PR's implementation and understand that PR_int_ functions are not protected for concurrent use.
   42.16  
   42.17 --] VMS has its own version of malloc, unfortunately, which is due to the system malloc breaking when the stack-pointer register is manipulated, which VMS must do.  The VMS form of malloc must be used in code that runs inside the VMS system, especially all application code that uses a VMS-based language.  However, a complication is that the malloc implementation is not protected with a lock.  However, mallocs performed in the main thread, outside the VMS-language program, cannot use VMS malloc..  this presents some issues crossing the boundary..
   42.18 +-] PR has its own version of malloc, unfortunately, because the system malloc breaks when the stack-pointer register is manipulated, which PR must do.  The PR form of malloc must be used in code that runs inside the PR system, especially all application code that uses a PR-based language.  A complication is that the malloc implementation is not protected with a lock.  In addition, mallocs performed in the main thread, outside the PR-language program, cannot use PR malloc, which presents some issues when crossing the boundary.
   42.19  
   42.20 --] Things in the code are turned on and off by using #define in combination with #ifdef.  All defines for doing this are found in Defines/VMS_defs__turn_on_and_off.h.  The rest of the files in Defines directory contain macro definitions, hardware constants, and any other #define statements.
   42.21 +-] Things in the code are turned on and off by using #define in combination with #ifdef.  All defines for doing this are found in Defines/PR_defs__turn_on_and_off.h.  The rest of the files in Defines directory contain macro definitions, hardware constants, and any other #define statements.
   42.22  
   42.23 --] VMS has many macros used in the code..  such as for measurements and debug..  all measurement, debug, and statistics gathering statements can be turned on or off by commenting-out or uncommenting the appropriate #define.  
   42.24 +-] PR has many macros used in the code, such as for measurements and debug.  All measurement, debug, and statistics-gathering statements can be turned on or off by commenting out or uncommenting the appropriate #define.
   42.25  
   42.26 --] The best way to learn VMS is to uncomment  DEBUG__TURN_ON_SEQUENTIAL_MODE, which allows using a normal debugger while sequentially executing through both application code and VMS internals.  Setting breakpoints at various spots in the code is a good way to see the VMS system in operation.
   42.27 +-] The best way to learn PR is to uncomment  DEBUG__TURN_ON_SEQUENTIAL_MODE, which allows using a normal debugger while sequentially executing through both application code and PR internals.  Setting breakpoints at various spots in the code is a good way to see the PR system in operation.
   42.28  
   42.29 --] VMS has several "VMS primitives" implemented with assembly code.  The net effect of these assembly functions is to perform the switching between application code and the VMS system.
   42.30 +-] PR has several "PR primitives" implemented with assembly code.  The net effect of these assembly functions is to perform the switching between application code and the PR system.
   42.31  
   42.32 --] The heart of this multi-core version of VMS is the AnimationMaster and CoreController.  Those files have large comments explaining the nature of VMS and this implementation.  Those comments are the best place to start reading, to get an understanding of the code before tracing through it.
   42.33 \ No newline at end of file
   42.34 +-] The heart of this multi-core version of PR is the AnimationMaster and CoreController.  Those files have large comments explaining the nature of PR and this implementation.  Those comments are the best place to start reading, to get an understanding of the code before tracing through it.
   42.35 \ No newline at end of file
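
The README above says that features are turned on and off by combining #define with #ifdef, and that measurement statements are macros. A minimal sketch of how such a switch can compile a statement in or out is shown here. The switch name and the macros (EXAMPLE__TURN_ON_CALL_COUNTING, EXAMPLE__Count_Call, EXAMPLE__Get_Call_Count) are hypothetical and are not taken from the Defines directory.

```c
/* Comment out this line to compile the counting statements away.
 * The name is hypothetical, not one of PR's actual switches. */
#define EXAMPLE__TURN_ON_CALL_COUNTING

#ifdef EXAMPLE__TURN_ON_CALL_COUNTING
   static long exampleCallCount = 0;
   #define EXAMPLE__Count_Call       (exampleCallCount += 1)
   #define EXAMPLE__Get_Call_Count   (exampleCallCount)
#else
      /* Switch is off: the statement macro does nothing, the count reads 0 */
   #define EXAMPLE__Count_Call       ((void)0)
   #define EXAMPLE__Get_Call_Count   (0L)
#endif

/* Ordinary code uses the macro as a bare statement, in the same way the
 * MEAS__ macros appear in the source. */
int example_work( int x )
 { EXAMPLE__Count_Call;
   return x * 2;
 }

long example_calls_so_far( void )
 { return EXAMPLE__Get_Call_Count;
 }
```

With the switch defined, each call to example_work() increments the counter. With it commented out, the same source compiles with no counting code and no counter variable.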