changeset 230:f2a7831352dc Common_Ancestor

changed SchedulingMaster.c to AnimationMaster.c and cleaned up all comments
author Some Random Person <seanhalle@yahoo.com>
date Thu, 15 Mar 2012 20:31:41 -0700
parents 5c475c4b7b49
children 88fd85921d7f
files AnimationMaster.c CoreController.c Hardware_Dependent/VMS__primitives_asm.s SchedulingMaster.c VMS.h VMS__startup_and_shutdown.c __README__Code_Overview.txt
diffstat 7 files changed, 465 insertions(+), 406 deletions(-)
line diff
     1.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     1.2 +++ b/AnimationMaster.c	Thu Mar 15 20:31:41 2012 -0700
     1.3 @@ -0,0 +1,393 @@
     1.4 +/*
     1.5 + * Copyright 2010  OpenSourceStewardshipFoundation
     1.6 + * 
     1.7 + * Licensed under BSD
     1.8 + */
     1.9 +
    1.10 +
    1.11 +
    1.12 +#include <stdio.h>
    1.13 +#include <stddef.h>
    1.14 +
    1.15 +#include "VMS.h"
    1.16 +
    1.17 +
    1.18 +//========================= Local Fn Prototypes =============================
    1.19 +void inline
    1.20 +stealWorkInto( SchedSlot *currSlot, VMSQueueStruc *readyToAnimateQ,
    1.21 +               SlaveVP *masterVP );
    1.22 +
    1.23 +//===========================================================================
    1.24 +
    1.25 +
    1.26 +
    1.27 +/*The animationMaster embodies most of the animator of the language.  The
    1.28 + * animator is what embodies the behavior of language constructs.
    1.29 + * As such, it is the animationMaster, in combination with the plugin
    1.30 + * functions, that make the language constructs do their behavior.   
    1.31 + * 
    1.32 + *Within the code, this is the top-level-function of the masterVPs, and
    1.33 + * runs when the coreController has no more slave VPs.  Its job is to
    1.34 + * refill the animation slots with slaves.
    1.35 + *
    1.36 + *To do this, it scans the animation slots for just-completed slaves.
    1.37 + * Each of these has a request in it.  So, the master hands each to the
    1.38 + * plugin's request handler.
    1.39 + *Each request represents a language construct that has been encountered
    1.40 + * by the application code in the slave. Passing the request to the
    1.41 + * request handler is how that language construct's behavior gets invoked.
    1.42 + * The request handler then performs the actions of the construct's
    1.43 + * behavior. So, the request handler encodes the behavior of the 
    1.44 + * language's parallelism constructs, and performs that when the master
    1.45 + * hands it a slave containing a request to perform that construct.
    1.46 + * 
    1.47 + *On a shared-memory machine, the behavior of parallelism constructs
    1.48 + * equals control over the order of execution of code.  Hence, the behavior
    1.49 + * of the language constructs performed by the request handler is to 
    1.50 + * choose the order that slaves get animated, and thereby control the
    1.51 + * order that application code in the slaves executes.
    1.52 + * 
    1.53 + *To control order of animation of slaves, the request handler has a
    1.54 + * semantic environment that holds data structures used to hold slaves
    1.55 + * and choose when they're ready to be animated.
    1.56 + *
    1.57 + *Once a slave is marked as ready to be animated by the request handler,
    1.58 + * it is the second plugin function, the Assigner, which chooses the core
    1.59 + * the slave gets assigned to for animation.  Hence, the Assigner doesn't
    1.60 + * perform any of the semantic behavior of language constructs, rather
    1.61 + * it gives the language a chance to improve performance. The performance
    1.62 + * of application code is strongly related to communication between
    1.63 + * cores. On shared-memory machines, communication is caused during
    1.64 + * execution of code, by memory accesses, and how much depends on contents
    1.65 + * of caches connected to the core executing the code.  So, the placement
    1.66 + * of slaves determines the communication caused during execution of the
    1.67 + * slave's code.
    1.68 + *The point of the Assigner, then, is to use application information during
    1.69 + * execution of the program, to make choices about slave placement onto
    1.70 + * cores, with the aim to put slaves close to caches containing the data
    1.71 + * used by the slave's code.
    1.72 + * 
    1.73 + *==========================================================================
    1.74 + *In summary, the animationMaster scans the slots, finds slaves
    1.75 + * just-finished, which hold requests, passes those to the request handler,
    1.76 + * along with the semantic environment, and the request handler then manages
    1.77 + * the structures in the semantic env, which controls the order of
    1.78 + * animation of slaves, and so embodies the behavior of the language
    1.79 + * constructs.
    1.80 + *The animationMaster then rescans the slots, offering each empty one to
    1.81 + * the Assigner, along with the semantic environment.  The Assigner chooses
    1.82 + * among the ready slaves in the semantic Env, finding the one best suited
    1.83 + * to be animated by that slot's associated core.
    1.84 + * 
    1.85 + *==========================================================================
    1.86 + *Implementation Details:
    1.87 + * 
    1.88 + *There is a separate masterVP for each core, but a single semantic
    1.89 + * environment shared by all cores.  Each core also has its own scheduling
    1.90 + * slots, which are used to communicate slaves between animationMaster and
    1.91 + * coreController.  There is only one global variable, _VMSMasterEnv, which
    1.92 + * holds the semantic env and other things shared by the different
    1.93 + * masterVPs.  The request handler and Assigner are registered with
    1.94 + * the animationMaster by the language's init function, and a pointer to
    1.95 + * each is in the _VMSMasterEnv. (There are also some pthread related global
    1.96 + * vars, but they're only used during init of VMS).
    1.97 + *VMS gains control over the cores by essentially "turning off" the OS's
    1.98 + * scheduler, using pthread pin-to-core commands.
    1.99 + *
   1.100 + *The masterVPs are created during init, with this animationMaster as their
   1.101 + * top level function.  The masterVPs use the same SlaveVP data structure,
   1.102 + * even though they're not slave VPs.
   1.103 + *A "seed slave" is also created during init -- this is equivalent to the
   1.104 + * "main" function in C, and acts as the entry-point to the VMS-language-
   1.105 + * based application.
   1.106 + *The masterVPs share a single system-wide master-lock, so only one
   1.107 + * masterVP may be animated at a time.
   1.108 + *The core controllers access _VMSMasterEnv to get the masterVP, and when
   1.109 + * they start, the slots are all empty, so they run their associated core's
   1.110 + * masterVP.  The first of those to get the master lock sees the seed slave
   1.111 + * in the shared semantic environment, so when it runs the Assigner, that
   1.112 + * returns the seed slave, which the animationMaster puts into a scheduling
   1.113 + * slot then switches to the core controller.  That then switches the core
   1.114 + * over to the seed slave, which then proceeds to execute language
   1.115 + * constructs to create more slaves, and so on.  Each of those constructs
   1.116 + * causes the seed slave to suspend, switching over to the core controller,
   1.117 + * which eventually switches to the masterVP, which executes the 
   1.118 + * request handler, which uses VMS primitives to carry out the creation of
   1.119 + * new slave VPs, which are marked as ready for the Assigner, and so on.
   1.120 + * 
   1.121 + *On animation slots, and system behavior:
   1.122 + * A request may linger in an animation slot for a long time while
   1.123 + * the slaves in the other slots are animated.  This only becomes a problem
   1.124 + * when such a request is a choke-point in the constraints, and is needed
   1.125 + * to free work for *other* cores.  To reduce this occurrence, the number
   1.126 + * of animation slots should be kept low.  In balance, having multiple
   1.127 + * animation slots amortizes the overhead of switching to the masterVP and
   1.128 + * executing the animationMaster code, which argues for more than one. In
   1.129 + * practice, the best balance should be discovered by profiling.
   1.130 + */
   1.131 +void animationMaster( void *initData, SlaveVP *masterVP )
   1.132 + { 
   1.133 +      //Used while scanning and filling animation slots
   1.134 +   int32           slotIdx, numSlotsFilled;
   1.135 +   SchedSlot      *currSlot, **schedSlots;
   1.136 +   SlaveVP        *assignedSlaveVP;  //the slave chosen by the assigner
   1.137 +
   1.138 +      //Local copies, for performance
   1.139 +   MasterEnv      *masterEnv;
   1.140 +   SlaveAssigner   slaveAssigner;
   1.141 +   RequestHandler  requestHandler;
   1.142 +   void           *semanticEnv;
   1.143 +   int32           thisCoresIdx;
   1.144 +      
   1.145 +   //======================== Initializations ========================
   1.146 +   masterEnv        = (MasterEnv*)_VMSMasterEnv;
   1.147 +   
   1.148 +   thisCoresIdx     = masterVP->coreAnimatedBy;
   1.149 +   schedSlots       = masterEnv->allSchedSlots[thisCoresIdx];
   1.150 +
   1.151 +   requestHandler   = masterEnv->requestHandler;
   1.152 +   slaveAssigner    = masterEnv->slaveAssigner;
   1.153 +   semanticEnv      = masterEnv->semanticEnv;
   1.154 +
   1.155 +   
   1.156 +   //======================== animationMaster ========================
   1.157 +   while(1){
   1.158 +       
   1.159 +      MEAS__Capture_Pre_Master_Point
   1.160 +
   1.161 +      //Scan the animation slots
   1.162 +   numSlotsFilled = 0;
   1.163 +   for( slotIdx = 0; slotIdx < NUM_SCHED_SLOTS; slotIdx++)
   1.164 +    {
   1.165 +      currSlot = schedSlots[ slotIdx ];
   1.166 +
   1.167 +         //Check if newly-done slave in slot, which will need request handled
   1.168 +      if( currSlot->workIsDone )
   1.169 +       {
   1.170 +         currSlot->workIsDone         = FALSE;
   1.171 +         currSlot->needsSlaveAssigned = TRUE;
   1.172 +
   1.173 +               MEAS__startReqHdlr;
   1.174 +               
   1.175 +            //process the requests made by the slave (held inside slave struc)
   1.176 +         (*requestHandler)( currSlot->slaveAssignedToSlot, semanticEnv );
   1.177 +         
   1.178 +               MEAS__endReqHdlr;
   1.179 +       }
   1.180 +         //If slot empty, hand to Assigner to fill with a slave
   1.181 +      if( currSlot->needsSlaveAssigned )
   1.182 +       {    //Call plugin's Assigner to give slot a new slave
   1.183 +         assignedSlaveVP =
   1.184 +          (*slaveAssigner)( semanticEnv, currSlot );
   1.185 +         
   1.186 +            //put the chosen slave into slot, and adjust flags and state
   1.187 +         if( assignedSlaveVP != NULL )
   1.188 +          { currSlot->slaveAssignedToSlot = assignedSlaveVP;
   1.189 +            assignedSlaveVP->schedSlot       = currSlot;
   1.190 +            currSlot->needsSlaveAssigned  = FALSE;
   1.191 +            numSlotsFilled               += 1;
   1.192 +          }
   1.193 +       }
   1.194 +    }
   1.195 +
   1.196 +   
   1.197 +   #ifdef SYS__TURN_ON_WORK_STEALING
   1.198 +      /*If no slots filled, means no more work, look for work to steal. */
   1.199 +   if( numSlotsFilled == 0 )
   1.200 +    { gateProtected_stealWorkInto( currSlot, readyToAnimateQ, masterVP );
   1.201 +    }
   1.202 +   #endif
   1.203 +
   1.204 +         MEAS__Capture_Post_Master_Point;
   1.205 +   
   1.206 +   masterSwitchToCoreCtlr(masterVP);
   1.207 +   flushRegisters();
   1.208 +   }//MasterLoop
   1.209 +
   1.210 +
   1.211 + }
   1.212 +
   1.213 +
   1.214 +//===========================  Work Stealing  ==============================
   1.215 +
   1.216 +/*This is the first of two work-stealing approaches.  It's not used, but left
   1.217 + * in the code as a simple illustration of the principle.  This version
   1.218 + * has a race condition -- the core controllers are accessing their own
   1.219 + * animation slots at the same time that this work-stealer on a different
   1.220 + * core is.
   1.221 + *Because the core controllers run outside the master lock, this interaction
   1.222 + * is not protected.
   1.223 + */
   1.224 +void inline
   1.225 +stealWorkInto( SchedSlot *currSlot, VMSQueueStruc *readyToAnimateQ,
   1.226 +               SlaveVP *masterVP )
   1.227 + { 
   1.228 +   SlaveVP   *stolenSlv;
   1.229 +   int32        coreIdx, i;
   1.230 +   VMSQueueStruc *currQ;
   1.231 +
   1.232 +   stolenSlv = NULL;
   1.233 +   coreIdx = masterVP->coreAnimatedBy;
   1.234 +   for( i = 0; i < NUM_CORES -1; i++ )
   1.235 +    {
   1.236 +      if( coreIdx >= NUM_CORES -1 )
   1.237 +       { coreIdx = 0;
   1.238 +       }
   1.239 +      else
   1.240 +       { coreIdx++;
   1.241 +       }
   1.242 +      //TODO: fix this for coreCtlr scans slots
   1.243 +//      currQ = _VMSMasterEnv->readyToAnimateQs[coreIdx];
   1.244 +      if( numInVMSQ( currQ ) > 0 )
   1.245 +       { stolenSlv = readVMSQ (currQ );
   1.246 +         break;
   1.247 +       }
   1.248 +    }
   1.249 +
   1.250 +   if( stolenSlv != NULL )
   1.251 +    { currSlot->slaveAssignedToSlot = stolenSlv;
   1.252 +      stolenSlv->schedSlot           = currSlot;
   1.253 +      currSlot->needsSlaveAssigned  = FALSE;
   1.254 +
   1.255 +      writeVMSQ( stolenSlv, readyToAnimateQ );
   1.256 +    }
   1.257 + }
   1.258 +
   1.259 +/*This algorithm makes the common case fast.  Make the coreloop passive,
   1.260 + * and show its progress.  Make the stealer control a gate that coreloop
   1.261 + * has to pass.
   1.262 + *To avoid interference, only one stealer at a time.  Use a global
   1.263 + * stealer-lock, so only the stealer is slowed.
   1.264 + *
   1.265 + *The pattern is based on a gate -- stealer shuts the gate, then monitors
   1.266 + * to be sure any already past make it all the way out, before starting.
   1.267 + *So, have a "progress" measure just before the gate, then have two after it,
   1.268 + * one is in a "waiting room" outside the gate, the other is at the exit.
   1.269 + *Then, the stealer first shuts the gate, then checks the progress measure
   1.270 + * outside it, then looks to see if the progress measure at the exit is the
   1.271 + * same.  If yes, it knows the protected area is empty 'cause no other way
   1.272 + * to get in and the last to get in also exited.
   1.273 + *If the progress measure at the exit is not the same, then the stealer goes
   1.274 + * into a loop checking both the waiting-area and the exit progress-measures
   1.275 + * until one of them shows the same as the measure outside the gate.  Might
   1.276 + * as well re-read the measure outside the gate each go around, just to be
   1.277 + * sure.  It is guaranteed that one of the two will eventually match the one
   1.278 + * outside the gate.
   1.279 + *
   1.280 + *Here's an informal proof of correctness:
   1.281 + *The gate can be closed at any point, and there are only four cases:
   1.282 + *  1) coreloop made it past the gate-closing but not yet past the exit
   1.283 + *  2) coreloop made it past the pre-gate progress update but not yet past
   1.284 + *     the gate,
   1.285 + *  3) coreloop is right before the pre-gate update
   1.286 + *  4) coreloop is past the exit and far from the pre-gate update.
   1.287 + *
   1.288 + * Covering the cases in reverse order,
   1.289 + *  4) is not a problem -- stealer will read pre-gate progress, see that it
   1.290 + *     matches exit progress, and the gate is closed, so stealer can proceed.
   1.291 + *  3) stealer will read pre-gate progress just after coreloop updates it,
   1.292 + *     so stealer goes into a loop until the coreloop causes wait-progress
   1.293 + *     to match pre-gate progress, so then stealer can proceed
   1.294 + *  2) same as 3.
   1.295 + *  1) stealer reads pre-gate progress, sees that it's different than exit,
   1.296 + *     so goes into loop until exit matches pre-gate, now it knows coreloop
   1.297 + *     is not in protected and cannot get back in, so can proceed.
   1.298 + *
   1.299 + *Implementation for the stealer:
   1.300 + *
   1.301 + *First, acquire the stealer lock -- only cores with no work to do will
   1.302 + * compete to steal, so not a big performance penalty having only one --
   1.303 + * will rarely have multiple stealers in a system with plenty of work -- and
   1.304 + * in a system with little work, it doesn't matter.
   1.305 + *
   1.306 + *Note, have single-reader, single-writer pattern for all variables used to
   1.307 + * communicate between stealer and victims
   1.308 + *
   1.309 + *So, scan the queues of the core controllers, until find non-empty.  Each core
   1.310 + * has its own list that it scans.  The list goes in order from closest to
   1.311 + * furthest core, so it steals first from close cores.  Later can add
   1.312 + * taking info from the app about overlapping footprints, and scan all the
   1.313 + * others then choose work with the most footprint overlap with the contents
   1.314 + * of this core's cache.
   1.315 + *
   1.316 + *Now, have a victim to take work from.  So, shut the gate in that
   1.317 + * coreloop, by setting the "gate closed" var on its stack to TRUE.
   1.318 + *Then, read the core's pre-gate progress and compare to the core's exit
   1.319 + * progress.
   1.320 + *If same, can proceed to take work from the coreloop's queue.  When done,
   1.321 + * write FALSE to gate closed var.
   1.322 + *If different, then enter a loop that reads the pre-gate progress, then
   1.323 + * compares to exit progress then to wait progress.  When one of the two
   1.324 + * matches, proceed.  Take work from the coreloop's queue.  When done,
   1.325 + * write FALSE to the gate closed var.
   1.326 + * 
   1.327 + */
   1.328 +void inline
   1.329 +gateProtected_stealWorkInto( SchedSlot *currSlot,
   1.330 +                             VMSQueueStruc *myReadyToAnimateQ,
   1.331 +                             SlaveVP *masterVP )
   1.332 + {
   1.333 +   SlaveVP     *stolenSlv;
   1.334 +   int32          coreIdx, i, haveAVictim, gotLock;
   1.335 +   VMSQueueStruc *victimsQ;
   1.336 +
   1.337 +   volatile GateStruc *vicGate;
   1.338 +   int32               coreMightBeInProtected;
   1.339 +
   1.340 +
   1.341 +
   1.342 +      //see if any other cores have work available to steal
   1.343 +   haveAVictim = FALSE;
   1.344 +   coreIdx = masterVP->coreAnimatedBy;
   1.345 +   for( i = 0; i < NUM_CORES -1; i++ )
   1.346 +    {
   1.347 +      if( coreIdx >= NUM_CORES -1 )
   1.348 +       { coreIdx = 0;
   1.349 +       }
   1.350 +      else
   1.351 +       { coreIdx++;
   1.352 +       }
   1.353 +      //TODO: fix this for coreCtlr scans slots
   1.354 +//      victimsQ = _VMSMasterEnv->readyToAnimateQs[coreIdx];
   1.355 +      if( numInVMSQ( victimsQ ) > 0 )
   1.356 +       { haveAVictim = TRUE;
   1.357 +         vicGate = _VMSMasterEnv->workStealingGates[ coreIdx ];
   1.358 +         break;
   1.359 +       }
   1.360 +    }
   1.361 +   if( !haveAVictim ) return;  //no work to steal, exit
   1.362 +
   1.363 +      //have a victim core, now get the stealer-lock
   1.364 +   gotLock =__sync_bool_compare_and_swap( &(_VMSMasterEnv->workStealingLock),
   1.365 +                                                          UNLOCKED, LOCKED );
   1.366 +   if( !gotLock ) return; //go back to core controller, which will re-start master
   1.367 +
   1.368 +
   1.369 +   //====== Start Gate-protection =======
   1.370 +   vicGate->gateClosed = TRUE;
   1.371 +   coreMightBeInProtected= vicGate->preGateProgress != vicGate->exitProgress;
   1.372 +   while( coreMightBeInProtected )
   1.373 +    {    //wait until sure
   1.374 +      if( vicGate->preGateProgress == vicGate->waitProgress )
   1.375 +         coreMightBeInProtected = FALSE;
   1.376 +      if( vicGate->preGateProgress == vicGate->exitProgress )
   1.377 +         coreMightBeInProtected = FALSE;
   1.378 +    }
   1.379 +
   1.380 +   stolenSlv = readVMSQ ( victimsQ );
   1.381 +
   1.382 +   vicGate->gateClosed = FALSE;
   1.383 +   //======= End Gate-protection  =======
   1.384 +
   1.385 +
   1.386 +   if( stolenSlv != NULL )  //victim could have been in protected and taken
   1.387 +    { currSlot->slaveAssignedToSlot = stolenSlv;
   1.388 +      stolenSlv->schedSlot           = currSlot;
   1.389 +      currSlot->needsSlaveAssigned  = FALSE;
   1.390 +
   1.391 +      writeVMSQ( stolenSlv, myReadyToAnimateQ );
   1.392 +    }
   1.393 +
   1.394 +      //unlock the work stealing lock
   1.395 +   _VMSMasterEnv->workStealingLock = UNLOCKED;
   1.396 + }
     2.1 --- a/CoreController.c	Thu Mar 15 06:36:37 2012 -0700
     2.2 +++ b/CoreController.c	Thu Mar 15 20:31:41 2012 -0700
     2.3 @@ -16,6 +16,7 @@
     2.4  
     2.5  //=====================  Functions local to this file =======================
     2.6  void *terminateCoreController(SlaveVP *currSlv);
     2.7 +
     2.8  inline void
     2.9  doBackoff_for_TooLongToGetLock( int32  numTriesToGetLock, uint32 *seed1, 
    2.10                                  uint32 *seed2 );
    2.11 @@ -35,32 +36,32 @@
    2.12   * core controller, which then chooses which VP the core animates next.
    2.13   *
    2.14   *The way the core controller decides which VP to switch the core to next is:
    2.15 - * 1) There are a number of "scheduling slots", which the master VP fills up
    2.16 + * 1) There are a number of "animation slots", which the master VP fills up
    2.17   *    with slave VPs that are ready to be animated.  So, the core controller
    2.18 - *    just iterates through the scheduling slots.  When the next slot has a
    2.19 + *    just iterates through the animation slots.  When the next slot has a
    2.20   *    slave VP in it, the core controller switches the core over to animate
    2.21   *    that slave.
    2.22 - * 2) When the core controller checks a scheduling slot, and it's empty,
    2.23 + * 2) When the core controller checks an animation slot, and it's empty,
    2.24   *    then the controller switches the core over to animating the master VP,
    2.25   *    whose job is to find more slave VPs ready, and assign those to 
    2.26 - *    scheduling slots.
    2.27 + *    animation slots.
    2.28   *
    2.29 - *So, in effect, a scheduling slot functions as another layer of virtual
    2.30 + *So, in effect, an animation slot functions as another layer of virtual
    2.31   * processor.  A slot has the logical meaning of being an animator that
    2.32   * animates the slave assigned to it.  However, the core controller sits
    2.33   * below the slots, and sequences down them, assigning the actual physical
    2.34   * core to each slot, in turn.
    2.35 - *The reason for having the scheduling slots and core controller is to 
    2.36 + *The reason for having the animation slots and core controller is to 
    2.37   * amortize the overhead of switching to the master VP and running it.  With
    2.38 - * multiple scheduling slots, the time to switch-to-master and the code in
    2.39 - * the master loop is divided by the number of scheduling slots.
    2.40 - *The core controller and scheduling slots are not fundamental parts of VMS,
    2.41 + * multiple animation slots, the time to switch-to-master and the code in
    2.42 + * the master loop is divided by the number of animation slots.
    2.43 + *The core controller and animation slots are not fundamental parts of VMS,
    2.44   * but rather optimizations put into the shared-semantic-state version of
    2.45   * VMS.  Other versions of VMS will not have a core controller nor scheduling
    2.46   * slots.
    2.47   * 
    2.48   *The core controller "owns" the physical core, in effect, and is the 
    2.49 - * function given to the pthread creation call.  Hence, it contains code
    2.50 + * function given to the pthread's creation call.  Hence, it contains code
    2.51   * related to pthread startup, synchronizing the controllers to all start
    2.52   * at the same time-point, and pinning the pthreads to physical cores.
    2.53   * 
    2.54 @@ -82,7 +83,7 @@
    2.55        //Variables used during measurements
    2.56     TSCountLowHigh  endSusp;
    2.57        //Variables used in random-backoff, for master-lock and waiting for work
    2.58 -   uint32_t seed1 = rand()%1000; // init random number generator for retries
    2.59 +   uint32_t seed1 = rand()%1000; // init random number generator for backoffs
    2.60     uint32_t seed2 = rand()%1000;
    2.61        //Variable for work-stealing -- a gate protects a critical section
    2.62     volatile GateStruc gate;      //on stack to avoid false-sharing
    2.63 @@ -146,7 +147,7 @@
    2.64  
    2.65        
    2.66        if( ! currSlot->needsSlaveAssigned ) //slot does have slave assigned
    2.67 -       { numRepetitionsWithNoWork = 0;     //reset B2B master count
    2.68 +       { numRepetitionsWithNoWork = 0;     //reset back2back master count
    2.69           currSlotIdx ++;
    2.70           currVP = currSlot->slaveAssignedToSlot;
    2.71         }
    2.72 @@ -206,12 +207,15 @@
    2.73      }//while(1)
    2.74   }
    2.75  
    2.76 -
    2.77 +/*Shutdown of VMS involves several steps, of which this is the last.  This
    2.78 + * function is jumped to from asmTerminateCoreCtlr, which is in turn
    2.79 + * called from endOSThreadFn, which is the top-level-fn of the shutdown
    2.80 + * slaves.
    2.81 + */
    2.82  void *
    2.83  terminateCoreCtlr(SlaveVP *currSlv)
    2.84   {
    2.85 -   //first free shutdown Slv that jumped here -- it first restores the
    2.86 -   // coreloop's stack, so addr of currSlv in stack frame is still correct
    2.87 +   //first, free shutdown Slv that jumped here, then end the pthread
    2.88     VMS_int__dissipate_slaveVP( currSlv );
    2.89     pthread_exit( NULL );
    2.90   }
    2.91 @@ -238,6 +242,8 @@
    2.92   { int32 i, waitIterations;
    2.93     volatile double fakeWorkVar; //busy-wait fake work
    2.94   
    2.95 +     //Get a random number of iterations to busy-wait.  The % is a simple
    2.96 +     // way to set the maximum value that can be generated.
    2.97     waitIterations = 
    2.98      randomNumber(seed1, seed2) % 
    2.99      (numRepsWithNoWork * numRepsWithNoWork * NUM_CORES);
     3.1 --- a/Hardware_Dependent/VMS__primitives_asm.s	Thu Mar 15 06:36:37 2012 -0700
     3.2 +++ b/Hardware_Dependent/VMS__primitives_asm.s	Thu Mar 15 20:31:41 2012 -0700
     3.3 @@ -106,13 +106,15 @@
     3.4      ret
     3.5  
     3.6  
     3.7 -//Switch to terminateCoreCtlr
     3.8 -//therefor switch to coreCtlr context from master context
     3.9 -// no need to call because the stack is already set up for switchSlv
    3.10 -// and Slv is in %rdi
    3.11 -// and both functions have the same argument.
    3.12 -// do not save register of Slv because this function will never return
    3.13 -/* SlaveVP  offsets:
    3.14 +/*Switch to terminateCoreCtlr
    3.15 + *This is called by endOSThreadFn, which is the top-level function given
    3.16 + * to a shutdown slave.  When such a slave gets switched to, by the core
    3.17 + * controller, it runs the top-level function, which calls this, which
    3.18 + * then jumps to terminateCoreCtlr, which ends the pthread.  Note, on arrival
    3.19 + * here, the stack is already set up for switchSlv and the Slv ptr is in %rdi.
    3.20 + *Do not save registers of Slv because this function will never return
    3.21 + *
    3.22 + * SlaveVP  offsets:
    3.23   * 0x10  stackPtr
    3.24   * 0x18 framePtr
    3.25   * 0x20 resumeInstrPtr
    3.26 @@ -124,12 +126,11 @@
    3.27   * 0x8 masterLock
    3.28   */
    3.29  .globl asmTerminateCoreCtlr
    3.30 -asmTerminateCoreCtlr:
    3.31 -    #SlaveVP in %rdi
    3.32 +asmTerminateCoreCtlr:                #SlaveVP ptr is in %rdi
    3.33      movq    0x38(%rdi), %rsp         #restore stack pointer
    3.34      movq    0x30(%rdi), %rbp         #restore frame pointer
    3.35      movq    $terminateCoreCtlr, %rax
    3.36 -    jmp     *%rax                    #jmp to CoreCtlr
    3.37 +    jmp     *%rax                    #jmp to fn that ends the pthread
    3.38  
    3.39  
    3.40  /*
     4.1 --- a/SchedulingMaster.c	Thu Mar 15 06:36:37 2012 -0700
     4.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
     4.3 @@ -1,350 +0,0 @@
     4.4 -/*
     4.5 - * Copyright 2010  OpenSourceStewardshipFoundation
     4.6 - * 
     4.7 - * Licensed under BSD
     4.8 - */
     4.9 -
    4.10 -
    4.11 -
    4.12 -#include <stdio.h>
    4.13 -#include <stddef.h>
    4.14 -
    4.15 -#include "VMS.h"
    4.16 -
    4.17 -
    4.18 -//===========================================================================
    4.19 -void inline
    4.20 -stealWorkInto( SchedSlot *currSlot, VMSQueueStruc *readyToAnimateQ,
    4.21 -               SlaveVP *masterVP );
    4.22 -
    4.23 -//===========================================================================
    4.24 -
    4.25 -
    4.26 -
    4.27 -/*This code is animated by the virtual Master processor.
    4.28 - *
    4.29 - *Polls each sched slot exactly once, hands any requests made by a newly
    4.30 - * done slave to the "request handler" plug-in function
    4.31 - *
    4.32 - *Any slots that need a Slv assigned are given to the "assign"
    4.33 - * plug-in function, which tries to assign a Slv (slave) to it.
    4.34 - *
    4.35 - *When all slots needing a processor have been given to the assign plug-in,
    4.36 - * a fraction of the slaves successfully assigned are put into the
    4.37 - * work queue, then a continuation of this function is put in, then the rest
    4.38 - * of the Slvs that were successfully assigned.
    4.39 - *
    4.40 - *The first thing the continuation does is busy-wait until the previous
    4.41 - * animation completes.  This is because an (unlikely) continuation may
    4.42 - * sneak through queue before previous continuation is done putting second
    4.43 - * part of assigned slaves in, which is the only race condition.
    4.44 - *
    4.45 - */
    4.46 -
    4.47 -/*May 29, 2010 -- birth a Master during init so that first core controller to
    4.48 - * start running gets it and does all the stuff for a newly born --
    4.49 - * from then on, will be doing continuation, but do suspension self
    4.50 - * directly at end of master loop
    4.51 - *So VMS_WL__init just births the master virtual processor same way it births
    4.52 - * all the others -- then does any extra setup needed and puts it into the
    4.53 - * work queue.
    4.54 - *However means have to make masterEnv a global static volatile the same way
    4.55 - * did with readyToAnimateQ in core controller.  -- for performance, put the
    4.56 - * jump to the core controller directly in here, and have it directly jump back.
    4.57 - *
    4.58 - *
    4.59 - *Aug 18, 2010 -- Going to a separate MasterVP for each core, to see if this
    4.60 - * avoids the suspected bug in the system stack that causes bizarre faults
    4.61 - * at random places in the system code.
    4.62 - *
    4.63 - *So, this function is coupled to each of the MasterVPs, -- meaning this
    4.64 - * function can't rely on a particular stack and frame -- each MasterVP that
    4.65 - * animates this function has a different one.
    4.66 - *
    4.67 - *At this point, the schedulingMaster does not write itself into the queue anymore,
    4.68 - * instead, the coreCtlr acquires the masterLock when it has nothing to
    4.69 - * animate, and then animates its own schedulingMaster.  However, still try to put
    4.70 - * several AppSlvs into the queue to amortize the startup cost of switching
    4.71 - * to the MasterVP.  Note, request latency is not much of a concern, because
    4.72 - * most requests generate work for the same core.  The only latency issue is
    4.73 - * the case where other cores are starved and one core's requests generate
    4.74 - * work for them, so keep the max in the queue to 3 or 4.
    4.75 - */
    4.76 -void schedulingMaster( void *initData, SlaveVP *animatingSlv )
    4.77 - { 
    4.78 -   int32           slotIdx, numSlotsFilled;
    4.79 -   SlaveVP        *schedSlaveVP;
    4.80 -   SchedSlot      *currSlot, **schedSlots;
    4.81 -   MasterEnv      *masterEnv;
    4.82 -   VMSQueueStruc  *readyToAnimateQ;
    4.83 -   
    4.84 -   SlaveAssigner  slaveAssigner;
    4.85 -   RequestHandler  requestHandler;
    4.86 -   void           *semanticEnv;
    4.87 -
    4.88 -   int32           thisCoresIdx;
    4.89 -   SlaveVP        *masterVP;
    4.90 -   volatile        SlaveVP *volatileMasterVP;
    4.91 -   
    4.92 -   volatileMasterVP = animatingSlv;
    4.93 -   masterVP         = (SlaveVP*)volatileMasterVP; //used to force re-define after jmp
    4.94 -
    4.95 -      //First animation of each MasterVP will in turn animate this part
    4.96 -      // of setup code.. (Slv creator sets up the stack as if this function
    4.97 -      // was called normally, but actually get here by jmp)
    4.98 -      //So, setup values about stack ptr, jmp pt and all that
    4.99 -   //masterVP->resumeInstrPtr = &&schedulingMasterStartPt;
   4.100 -
   4.101 -
   4.102 -      //Note, got rid of writing the stack and frame ptr up here, because
   4.103 -      // only one
   4.104 -      // core can ever animate a given MasterVP, so don't need to communicate
   4.105 -      // new frame and stack ptr to the MasterVP storage before a second
   4.106 -      // version of that MasterVP can get animated on a different core.
   4.107 -      //Also got rid of the busy-wait.
   4.108 -
   4.109 -   
   4.110 -   //schedulingMasterStartPt:
   4.111 -   while(1){
   4.112 -       
   4.113 -      MEAS__Capture_Pre_Master_Point
   4.114 -
   4.115 -   masterEnv        = (MasterEnv*)_VMSMasterEnv;
   4.116 -   
   4.117 -      //GCC may optimize so that it doesn't always reload from frame storage
   4.118 -   masterVP         = (SlaveVP*)volatileMasterVP;  //just to make sure after jmp
   4.119 -   thisCoresIdx     = masterVP->coreAnimatedBy;
   4.120 -   schedSlots       = masterEnv->allSchedSlots[thisCoresIdx];
   4.121 -
   4.122 -   requestHandler   = masterEnv->requestHandler;
   4.123 -   slaveAssigner    = masterEnv->slaveAssigner;
   4.124 -   semanticEnv      = masterEnv->semanticEnv;
   4.125 -
   4.126 -
   4.127 -      //Poll each slot's Done flag
   4.128 -   numSlotsFilled = 0;
   4.129 -   for( slotIdx = 0; slotIdx < NUM_SCHED_SLOTS; slotIdx++)
   4.130 -    {
   4.131 -      currSlot = schedSlots[ slotIdx ];
   4.132 -
   4.133 -      if( currSlot->workIsDone )
   4.134 -       {
   4.135 -         currSlot->workIsDone         = FALSE;
   4.136 -         currSlot->needsSlaveAssigned = TRUE;
   4.137 -
   4.138 -               MEAS__startReqHdlr;
   4.139 -               
   4.140 -            //process the requests made by the slave (held inside slave struc)
   4.141 -         (*requestHandler)( currSlot->slaveAssignedToSlot, semanticEnv );
   4.142 -         
   4.143 -               MEAS__endReqHdlr;
   4.144 -       }
   4.145 -      if( currSlot->needsSlaveAssigned )
   4.146 -       {    //give slot a new Slv
   4.147 -         schedSlaveVP =
   4.148 -          (*slaveAssigner)( semanticEnv, thisCoresIdx, currSlot );
   4.149 -         
   4.150 -         if( schedSlaveVP != NULL )
   4.151 -          { currSlot->slaveAssignedToSlot = schedSlaveVP;
   4.152 -            schedSlaveVP->schedSlot       = currSlot;
   4.153 -            currSlot->needsSlaveAssigned  = FALSE;
   4.154 -            numSlotsFilled               += 1;
   4.155 -          }
   4.156 -       }
   4.157 -    }
   4.158 -
   4.159 -   
   4.160 -   #ifdef SYS__TURN_ON_WORK_STEALING
   4.161 -      //If no slots filled, means no more work, look for work to steal.
   4.162 -   if( numSlotsFilled == 0 )
   4.163 -    { gateProtected_stealWorkInto( currSlot, readyToAnimateQ, masterVP );
   4.164 -    }
   4.165 -   #endif
   4.166 -
   4.167 -         MEAS__Capture_Post_Master_Point;
   4.168 -   
   4.169 -   masterSwitchToCoreCtlr(animatingSlv);
   4.170 -   flushRegisters();
   4.171 -   }//MasterLoop
   4.172 -
   4.173 -
   4.174 - }
   4.175 -
   4.176 -
   4.177 -
   4.178 -/*This has a race condition -- the coreloops are accessing their own queues
   4.179 - * at the same time that this work-stealer on a different core is trying to
   4.180 - * take work from them */
   4.181 -void inline
   4.182 -stealWorkInto( SchedSlot *currSlot, VMSQueueStruc *readyToAnimateQ,
   4.183 -               SlaveVP *masterVP )
   4.184 - { 
   4.185 -   SlaveVP   *stolenSlv;
   4.186 -   int32        coreIdx, i;
   4.187 -   VMSQueueStruc *currQ;
   4.188 -
   4.189 -   stolenSlv = NULL;
   4.190 -   coreIdx = masterVP->coreAnimatedBy;
   4.191 -   for( i = 0; i < NUM_CORES -1; i++ )
   4.192 -    {
   4.193 -      if( coreIdx >= NUM_CORES -1 )
   4.194 -       { coreIdx = 0;
   4.195 -       }
   4.196 -      else
   4.197 -       { coreIdx++;
   4.198 -       }
   4.199 -      //TODO: fix this for coreCtlr scans slots
   4.200 -//      currQ = _VMSMasterEnv->readyToAnimateQs[coreIdx];
   4.201 -      if( numInVMSQ( currQ ) > 0 )
   4.202 -       { stolenSlv = readVMSQ (currQ );
   4.203 -         break;
   4.204 -       }
   4.205 -    }
   4.206 -
   4.207 -   if( stolenSlv != NULL )
   4.208 -    { currSlot->slaveAssignedToSlot = stolenSlv;
   4.209 -      stolenSlv->schedSlot           = currSlot;
   4.210 -      currSlot->needsSlaveAssigned  = FALSE;
   4.211 -
   4.212 -      writeVMSQ( stolenSlv, readyToAnimateQ );
   4.213 -    }
   4.214 - }
   4.215 -
   4.216 -/*This algorithm makes the common case fast.  Make the coreloop passive,
   4.217 - * and show its progress.  Make the stealer control a gate that coreloop
   4.218 - * has to pass.
   4.219 - *To avoid interference, only one stealer at a time.  Use a global
   4.220 - * stealer-lock.
   4.221 - *
   4.222 - *The pattern is based on a gate -- stealer shuts the gate, then monitors
   4.223 - * to be sure any already past make it all the way out, before starting.
   4.224 - *So, have a "progress" measure just before the gate, then have two after it,
   4.225 - * one is in a "waiting room" outside the gate, the other is at the exit.
   4.226 - *Then, the stealer first shuts the gate, then checks the progress measure
   4.227 - * outside it, then looks to see if the progress measure at the exit is the
   4.228 - * same.  If yes, it knows the protected area is empty 'cause no other way
   4.229 - * to get in and the last to get in also exited.
   4.230 - *If the progress measure at the exit is not the same, then the stealer goes
   4.231 - * into a loop checking both the waiting-area and the exit progress-measures
   4.232 - * until one of them shows the same as the measure outside the gate.  Might
   4.233 - * as well re-read the measure outside the gate each go around, just to be
   4.234 - * sure.  It is guaranteed that one of the two will eventually match the one
   4.235 - * outside the gate.
   4.236 - *
   4.237 - *Here's an informal proof of correctness:
   4.238 - *The gate can be closed at any point, and have only four cases:
   4.239 - *  1) coreloop made it past the gate-closing but not yet past the exit
   4.240 - *  2) coreloop made it past the pre-gate progress update but not yet past
   4.241 - *     the gate,
   4.242 - *  3) coreloop is right before the pre-gate update
   4.243 - *  4) coreloop is past the exit and far from the pre-gate update.
   4.244 - *
   4.245 - * Covering the cases in reverse order,
   4.246 - *  4) is not a problem -- stealer will read pre-gate progress, see that it
   4.247 - *     matches exit progress, and the gate is closed, so stealer can proceed.
   4.248 - *  3) stealer will read pre-gate progress just after coreloop updates it..
   4.249 - *     so stealer goes into a loop until the coreloop causes wait-progress
   4.250 - *     to match pre-gate progress, so then stealer can proceed
   4.251 - *  2) same as 3..
   4.252 - *  1) stealer reads pre-gate progress, sees that it's different than exit,
   4.253 - *     so goes into loop until exit matches pre-gate, now it knows coreloop
   4.254 - *     is not in protected and cannot get back in, so can proceed.
   4.255 - *
   4.256 - *Implementation for the stealer:
   4.257 - *
   4.258 - *First, acquire the stealer lock -- only cores with no work to do will
   4.259 - * compete to steal, so not a big performance penalty having only one --
   4.260 - * will rarely have multiple stealers in a system with plenty of work -- and
   4.261 - * in a system with little work, it doesn't matter.
   4.262 - *
   4.263 - *Note, have single-reader, single-writer pattern for all variables used to
   4.264 - * communicate between stealer and victims
   4.265 - *
   4.266 - *So, scan the queues of the core controllers, until find non-empty.  Each core
   4.267 - * has its own list that it scans.  The list goes in order from closest to
   4.268 - * furthest core, so it steals first from close cores.  Later can add
   4.269 - * taking info from the app about overlapping footprints, and scan all the
   4.270 - * others then choose work with the most footprint overlap with the contents
   4.271 - * of this core's cache.
   4.272 - *
   4.273 - *Now have a victim to take work from.  So, shut the gate in that
   4.274 - * coreloop, by setting the "gate closed" var on its stack to TRUE.
   4.275 - *Then, read the core's pre-gate progress and compare to the core's exit
   4.276 - * progress.
   4.277 - *If same, can proceed to take work from the coreloop's queue.  When done,
   4.278 - * write FALSE to gate closed var.
   4.279 - *If different, then enter a loop that reads the pre-gate progress, then
   4.280 - * compares to exit progress then to wait progress.  When one of two
   4.281 - * matches, proceed.  Take work from the coreloop's queue.  When done,
   4.282 - * write FALSE to the gate closed var.
   4.283 - * 
   4.284 - */
   4.285 -void inline
   4.286 -gateProtected_stealWorkInto( SchedSlot *currSlot,
   4.287 -                             VMSQueueStruc *myReadyToAnimateQ,
   4.288 -                             SlaveVP *masterVP )
   4.289 - {
   4.290 -   SlaveVP     *stolenSlv;
   4.291 -   int32          coreIdx, i, haveAVictim, gotLock;
   4.292 -   VMSQueueStruc *victimsQ;
   4.293 -
   4.294 -   volatile GateStruc *vicGate;
   4.295 -   int32               coreMightBeInProtected;
   4.296 -
   4.297 -
   4.298 -
   4.299 -      //see if any other cores have work available to steal
   4.300 -   haveAVictim = FALSE;
   4.301 -   coreIdx = masterVP->coreAnimatedBy;
   4.302 -   for( i = 0; i < NUM_CORES -1; i++ )
   4.303 -    {
   4.304 -      if( coreIdx >= NUM_CORES -1 )
   4.305 -       { coreIdx = 0;
   4.306 -       }
   4.307 -      else
   4.308 -       { coreIdx++;
   4.309 -       }
   4.310 -      //TODO: fix this for coreCtlr scans slots
   4.311 -//      victimsQ = _VMSMasterEnv->readyToAnimateQs[coreIdx];
   4.312 -      if( numInVMSQ( victimsQ ) > 0 )
   4.313 -       { haveAVictim = TRUE;
   4.314 -         vicGate = _VMSMasterEnv->workStealingGates[ coreIdx ];
   4.315 -         break;
   4.316 -       }
   4.317 -    }
   4.318 -   if( !haveAVictim ) return;  //no work to steal, exit
   4.319 -
   4.320 -      //have a victim core, now get the stealer-lock
   4.321 -   gotLock =__sync_bool_compare_and_swap( &(_VMSMasterEnv->workStealingLock),
   4.322 -                                                          UNLOCKED, LOCKED );
   4.323 -   if( !gotLock ) return; //go back to core controller, which will re-start master
   4.324 -
   4.325 -
   4.326 -   //====== Start Gate-protection =======
   4.327 -   vicGate->gateClosed = TRUE;
   4.328 -   coreMightBeInProtected= vicGate->preGateProgress != vicGate->exitProgress;
   4.329 -   while( coreMightBeInProtected )
   4.330 -    {    //wait until sure
   4.331 -      if( vicGate->preGateProgress == vicGate->waitProgress )
   4.332 -         coreMightBeInProtected = FALSE;
   4.333 -      if( vicGate->preGateProgress == vicGate->exitProgress )
   4.334 -         coreMightBeInProtected = FALSE;
   4.335 -    }
   4.336 -
   4.337 -   stolenSlv = readVMSQ ( victimsQ );
   4.338 -
   4.339 -   vicGate->gateClosed = FALSE;
   4.340 -   //======= End Gate-protection  =======
   4.341 -
   4.342 -
   4.343 -   if( stolenSlv != NULL )  //victim may have been in protected and taken it
   4.344 -    { currSlot->slaveAssignedToSlot = stolenSlv;
   4.345 -      stolenSlv->schedSlot           = currSlot;
   4.346 -      currSlot->needsSlaveAssigned  = FALSE;
   4.347 -
   4.348 -      writeVMSQ( stolenSlv, myReadyToAnimateQ );
   4.349 -    }
   4.350 -
   4.351 -      //unlock the work stealing lock
   4.352 -   _VMSMasterEnv->workStealingLock = UNLOCKED;
   4.353 - }
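
The gate protocol described in the comment above gateProtected_stealWorkInto can be walked through in a single thread. The sketch below is an illustration only, not code from this changeset: `Gate` is a hypothetical stand-in for GateStruc (the field names follow the code above), and the victim-side functions `victim_arrive`, `victim_try_enter` and `victim_exit` are assumptions reconstructed from the comment, since the coreloop side is not shown in this file. A real multi-core version would need these fields to be volatile or atomic, with suitable memory barriers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for GateStruc; field names follow the code above. */
typedef struct {
    bool    gateClosed;       /* written only by the stealer               */
    int32_t preGateProgress;  /* bumped by victim just before the gate     */
    int32_t waitProgress;     /* set by victim when it finds the gate shut */
    int32_t exitProgress;     /* set by victim on leaving protected region */
} Gate;

/* Victim side (assumed): announce arrival just before testing the gate. */
void victim_arrive(Gate *g) { g->preGateProgress += 1; }

/* Victim side (assumed): test the gate.  Returns true if the victim went
 * into the protected region, false if it is parked in the waiting room. */
bool victim_try_enter(Gate *g)
{
    if (g->gateClosed) {
        g->waitProgress = g->preGateProgress;
        return false;
    }
    return true;
}

/* Victim side (assumed): leave the protected region. */
void victim_exit(Gate *g) { g->exitProgress = g->preGateProgress; }

/* Stealer side: the same comparisons as the wait loop in
 * gateProtected_stealWorkInto, evaluated once. */
bool stealer_may_proceed(const Gate *g)
{
    return g->gateClosed &&
           (g->preGateProgress == g->exitProgress ||
            g->preGateProgress == g->waitProgress);
}

/* Walk through case 2 of the informal proof: the victim has passed the
 * pre-gate update but has not yet tested the gate when it is shut. */
void gate_walkthrough_case2(void)
{
    Gate g = { false, 0, 0, 0 };
    victim_arrive(&g);                  /* victim past pre-gate update */
    g.gateClosed = true;                /* stealer shuts the gate      */
    assert(!stealer_may_proceed(&g));   /* victim position unknown yet */
    assert(!victim_try_enter(&g));      /* victim parks in waiting room */
    assert(stealer_may_proceed(&g));    /* wait progress now matches   */
}
```

Because the stealer only writes gateClosed and the victim only writes the three progress counters, this keeps the single-reader, single-writer pattern that the comment calls for.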
     5.1 --- a/VMS.h	Thu Mar 15 06:36:37 2012 -0700
     5.2 +++ b/VMS.h	Thu Mar 15 20:31:41 2012 -0700
     5.3 @@ -44,7 +44,7 @@
     5.4  typedef struct _GateStruc     GateStruc;
     5.5  
     5.6  
     5.7 -typedef SlaveVP *(*SlaveAssigner)  ( void *, int, SchedSlot *); //semEnv, coreIdx, slot for HW info
     5.8 +typedef SlaveVP *(*SlaveAssigner)  ( void *, SchedSlot*); //semEnv, slot for HW info
     5.9  typedef void     (*RequestHandler) ( SlaveVP *, void * ); //prWReqst, semEnv
    5.10  typedef void     (*TopLevelFnPtr)  ( void *, SlaveVP * ); //initData, animSlv
    5.11  typedef void       TopLevelFn      ( void *, SlaveVP * ); //initData, animSlv
    5.12 @@ -98,10 +98,13 @@
    5.13  
    5.14  struct _SchedSlot
    5.15   {
    5.16 -   int         slotIdx;     //needed by Holistic Model's data gathering
    5.17 -   int         workIsDone;
    5.18 -   int         needsSlaveAssigned;
    5.19 -   SlaveVP    *slaveAssignedToSlot;
    5.20 +   int           workIsDone;
    5.21 +   int           needsSlaveAssigned;
    5.22 +   SlaveVP      *slaveAssignedToSlot;
    5.23 +   
    5.24 +   int           slotIdx;  //needed by Holistic Model's data gathering
    5.25 +   int           coreOfSlot;
    5.26 +   SlotPerfInfo *perfInfo; //used by assigner to pick best slave for core
    5.27   };
    5.28  //SchedSlot
    5.29  
    5.30 @@ -134,41 +137,46 @@
    5.31   };
    5.32  //SlaveVP
    5.33  
    5.34 -
    5.35 -/*WARNING: re-arranging this data structure could cause Slv-switching
    5.36 - *         assembly code to fail -- hard-codes offsets of fields
    5.37 - *         (because -O3 messes with things otherwise)
    5.38 +/*The one and only global variable, holds many odds and ends
    5.39   */
    5.40  typedef struct
    5.41 - {    //The offset of these fields is hard-coded into assembly
    5.42 + {    //The offsets of these fields are hard-coded into assembly
    5.43     void            *coreCtlrReturnPt;    //offset of field used in asm
    5.44     int32            masterLock __align_to_cacheline__;   //used in asm
    5.45     
    5.46 -      //below this, no asm uses the field offsets
    5.47 +      //============ below this, no asm uses the field offsets =============
    5.48 +
    5.49 +      //Basic VMS infrastructure
    5.50 +   SlaveVP        **masterVPs;
    5.51 +   SchedSlot     ***allSchedSlots;
    5.52 +   
    5.53 +      //plugin related
    5.54     SlaveAssigner    slaveAssigner;
    5.55     RequestHandler   requestHandler;
    5.56 +   void            *semanticEnv;
    5.57     
    5.58 -   SchedSlot     ***allSchedSlots;
    5.59 -   SlaveVP        **masterVPs;
    5.60 +      //Slave creation
    5.61 +   int32            numSlavesCreated;  //gives ordering to processor creation
    5.62 +   int32            numSlavesAlive;    //used to detect fail-safe shutdown
    5.63  
    5.64 -   void            *semanticEnv;
    5.65 -   void            *OSEventStruc;   //for future, when add I/O to BLIS
    5.66 +      //Initialization related
    5.67 +   int32            setupComplete;      //use while starting up coreCtlr
    5.68 +
    5.69 +      //Memory management related
    5.70     MallocArrays    *freeLists;
    5.71 -   int32            amtOfOutstandingMem; //total currently allocated
    5.72 -
    5.73 -   int32            setupComplete;  //use while starting up coreCtlr
    5.74 +   int32            amtOfOutstandingMem;//total currently allocated
    5.75 +   
    5.76 +      //Work-stealing related
    5.77     GateStruc       *workStealingGates[ NUM_CORES ]; //concurrent work-steal
    5.78     int32            workStealingLock;
    5.79     
    5.80 -   int32            numSlavesCreated; //gives ordering to processor creation
    5.81 -   int32            numSlavesAlive;   //used to detect when to shutdown
    5.82 -
    5.83 +   
    5.84        //=========== MEASUREMENT STUFF =============
    5.85         IntervalProbe   **intervalProbes;
    5.86         PrivDynArrayInfo *dynIntervalProbesInfo;
    5.87         HashTable        *probeNameHashTbl;
    5.88         int32             masterCreateProbeID;
    5.89 -       float64           createPtInSecs;
    5.90 +       float64           createPtInSecs; //real-clock time VMS initialized
    5.91         Histogram       **measHists;
    5.92         PrivDynArrayInfo *measHistsInfo;
    5.93         MEAS__Insert_Susp_Meas_Fields_into_MasterEnv;
    5.94 @@ -201,7 +209,7 @@
    5.95  
    5.96  void * coreController( void *paramsIn );  //standard PThreads fn prototype
    5.97  void * coreCtlr_Seq( void *paramsIn );  //standard PThreads fn prototype
    5.98 -void schedulingMaster( void *initData, SlaveVP *masterVP );
    5.99 +void animationMaster( void *initData, SlaveVP *masterVP );
   5.100  
   5.101  
   5.102  typedef struct
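
The VMS.h hunks above drop the coreIdx argument from SlaveAssigner and add coreOfSlot to the slot, so an assigner plugin would presumably read the core from the slot it is handed. The following is a minimal sketch under that assumption: the struct definitions are cut-down stand-ins for the real ones, and `DemoSemEnv`, `DEMO_NUM_CORES` and `demo_assigner` are hypothetical names that do not appear in VMS.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-ins for the VMS types; only the fields used here. */
typedef struct SchedSlot SchedSlot;
typedef struct SlaveVP   SlaveVP;

struct SlaveVP   { SchedSlot *schedSlot; };
struct SchedSlot {
    int      workIsDone;
    int      needsSlaveAssigned;
    SlaveVP *slaveAssignedToSlot;
    int      slotIdx;
    int      coreOfSlot;   /* field added by this changeset */
};

/* New two-argument signature from VMS.h: semEnv, slot. */
typedef SlaveVP *(*SlaveAssigner)(void *, SchedSlot *);

/* Hypothetical semantic environment: one ready slave per core. */
#define DEMO_NUM_CORES 2
typedef struct { SlaveVP *readyPerCore[DEMO_NUM_CORES]; } DemoSemEnv;

/* Hypothetical assigner: picks the ready slave for the slot's own core,
 * or returns NULL when that core has nothing ready. */
SlaveVP *demo_assigner(void *semEnvIn, SchedSlot *slot)
{
    DemoSemEnv *env  = semEnvIn;
    int         core = slot->coreOfSlot;  /* replaces the old coreIdx arg */
    SlaveVP    *slv  = env->readyPerCore[core];

    env->readyPerCore[core] = NULL;       /* hand each slave out once */
    return slv;
}

/* Usage: call through the function-pointer type, as the master would. */
int demo_assigner_usage(void)
{
    SlaveVP       slave    = { NULL };
    DemoSemEnv    env      = { { NULL, &slave } };
    SchedSlot     slot     = { 0, 1, NULL, 0, 1 };  /* slot on core 1 */
    SlaveAssigner assigner = demo_assigner;

    SlaveVP *first  = (*assigner)(&env, &slot);
    SlaveVP *second = (*assigner)(&env, &slot);
    assert(first == &slave);
    assert(second == NULL);
    return first == &slave && second == NULL;
}
```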
     6.1 --- a/VMS__startup_and_shutdown.c	Thu Mar 15 06:36:37 2012 -0700
     6.2 +++ b/VMS__startup_and_shutdown.c	Thu Mar 15 20:31:41 2012 -0700
     6.3 @@ -18,9 +18,6 @@
     6.4  #define thdAttrs NULL
     6.5  
     6.6  //===========================================================================
     6.7 -void
     6.8 -shutdownFn( void *dummy, SlaveVP *dummy2 );
     6.9 -
    6.10  SchedSlot **
    6.11  create_sched_slots();
    6.12  
    6.13 @@ -44,7 +41,7 @@
    6.14   *    the master Slv into the work-queue, ready for first "call"
    6.15   * 2) Semantic layer then does its own init, which creates the seed virt
    6.16   *    slave inside the semantic layer, ready to assign it when
    6.17 - *    asked by the first run of the schedulingMaster.
    6.18 + *    asked by the first run of the animationMaster.
    6.19   *
    6.20   *This part is a bit weird because VMS really wants to be "always there", and
    6.21   * have applications attach and detach..  for now, this VMS is part of
    6.22 @@ -52,7 +49,7 @@
    6.23   *
    6.24   *The semantic layer is isolated from the VMS internals by making the
    6.25   * semantic layer do setup to a state that it's ready with its
    6.26 - * initial Slvs, ready to assign them to slots when the schedulingMaster
    6.27 + * initial Slvs, ready to assign them to slots when the animationMaster
    6.28   * asks.  Without this pattern, the semantic layer's setup would
    6.29   * have to modify slots directly to assign the initial virt-procrs, and put
    6.30   * them into the readyToAnimateQ itself, breaking the isolation completely.
    6.31 @@ -230,16 +227,19 @@
    6.32  */
    6.33  
    6.34  /*Turns off the VMS system, and frees all data associated with it.  Does this
    6.35 - * by creating shutdown SlaveVPs and inserting them into scheduling slots.
    6.36 + * by creating shutdown SlaveVPs and inserting them into animation slots.
    6.37   * Will probably have to wake up sleeping cores as part of this -- the fn that
    6.38   * inserts the new SlaveVPs should handle the wakeup..
    6.39   */
    6.40  /*
    6.41  void
    6.42 +VMS_SS__shutdown(); //already defined -- look at it
    6.43 +
    6.44 +void
    6.45  VMS__shutdown()
    6.46   {
    6.47     for( cores )
    6.48 -    { slave = VMS_int__create_new_SlaveVP( shutdownFn, NULL );
    6.49 +    { slave = VMS_int__create_new_SlaveVP( endOSThreadFn, NULL );
    6.50        VMS_int__insert_slave_onto_core( SlaveVP *slave, coreNum );
    6.51      }
    6.52   }
    6.53 @@ -293,7 +293,7 @@
    6.54        readyToAnimateQs[ coreIdx ] = makeVMSQ();
    6.55        
    6.56           //Q: should give masterVP core-specific info as its init data?
    6.57 -      masterVPs[ coreIdx ] = VMS_int__create_slaveVP( (TopLevelFnPtr)&schedulingMaster, (void*)masterEnv );
    6.58 +      masterVPs[ coreIdx ] = VMS_int__create_slaveVP( (TopLevelFnPtr)&animationMaster, (void*)masterEnv );
    6.59        masterVPs[ coreIdx ]->coreAnimatedBy = coreIdx;
    6.60        allSchedSlots[ coreIdx ] = create_sched_slots(); //makes for one core
    6.61        _VMSMasterEnv->workStealingGates[ coreIdx ] = NULL;
     7.1 --- a/__README__Code_Overview.txt	Thu Mar 15 06:36:37 2012 -0700
     7.2 +++ b/__README__Code_Overview.txt	Thu Mar 15 20:31:41 2012 -0700
     7.3 @@ -18,3 +18,4 @@
     7.4  
     7.5  -] VMS has several "VMS primitives" implemented with assembly code.  The net effect of these assembly functions is to perform the switching between application code and the VMS system.
     7.6  
     7.7 +-] The heart of this multi-core version of VMS is the AnimationMaster and CoreController.  Those files have large comments explaining the nature of VMS and this implementation.  Those comments are the best place to start reading, to get an understanding of the code before tracing through it.
     7.8 \ No newline at end of file