VMS/VMS_Implementations/VMS_impls/VMS__MC_shared_impl: AnimationMaster.c comparison

comparison AnimationMaster.c @ 230:f2a7831352dc

changed SchedulingMaster.c to AnimationMaster.c and cleaned up all comments

author	Some Random Person <seanhalle@yahoo.com>
date	Thu, 15 Mar 2012 20:31:41 -0700
parents
children	88fd85921d7f

--1:000000000000
+:5ceadea89150
+/*
+* Copyright 2010  OpenSourceStewardshipFoundation
+*
+* Licensed under BSD
+*/
+#include <stdio.h>
+#include <stddef.h>
+#include "VMS.h"
+//========================= Local Fn Prototypes =============================
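+void inline
+stealWorkInto( SchedSlot *currSlot, VMSQueueStruc *readyToAnimateQ,
+SlaveVP *masterVP );
+void inline
+gateProtected_stealWorkInto( SchedSlot *currSlot,
+VMSQueueStruc *myReadyToAnimateQ,
+SlaveVP *masterVP );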
+//===========================================================================
+/*The animationMaster embodies most of the animator of the language.  The
+* animator is what embodies the behavior of language constructs.
+* As such, it is the animationMaster, in combination with the plugin
+* functions, that makes the language constructs perform their behavior.
+*
+*Within the code, this is the top-level-function of the masterVPs, and
+* runs when the coreController has no more slave VPs.  Its job is to
+* refill the animation slots with slaves.
+*
+*To do this, it scans the animation slots for just-completed slaves.
+* Each of these has a request in it.  So, the master hands each to the
+* plugin's request handler.
+*Each request represents a language construct that has been encountered
+* by the application code in the slave. Passing the request to the
+* request handler is how that language construct's behavior gets invoked.
+* The request handler then performs the actions of the construct's
+* behavior. So, the request handler encodes the behavior of the
+* language's parallelism constructs, and performs that when the master
+* hands it a slave containing a request to perform that construct.
+*
+*On a shared-memory machine, the behavior of parallelism constructs
+* amounts to control over the order of execution of code.  Hence, the
+* behavior of the language constructs performed by the request handler is
+* to choose the order that slaves get animated, and thereby control the
+* order that application code in the slaves executes.
+*
+*To control order of animation of slaves, the request handler has a
+* semantic environment that holds data structures used to hold slaves
+* and choose when they're ready to be animated.
+*
+*Once a slave is marked as ready to be animated by the request handler,
+* it is the second plugin function, the Assigner, which chooses the core
+* the slave gets assigned to for animation.  Hence, the Assigner doesn't
+* perform any of the semantic behavior of language constructs, rather
+* it gives the language a chance to improve performance. The performance
+* of application code is strongly related to communication between
+* cores. On shared-memory machines, communication is caused during
+* execution of code, by memory accesses, and how much depends on contents
+* of caches connected to the core executing the code.  So, the placement
+* of slaves determines the communication caused during execution of the
+* slave's code.
+*The point of the Assigner, then, is to use application information during
+* execution of the program, to make choices about slave placement onto
+* cores, with the aim to put slaves close to caches containing the data
+* used by the slave's code.
+*
+*==========================================================================
+*In summary, the animationMaster scans the slots, finds just-finished
+* slaves, which hold requests, and passes those to the request handler,
+* along with the semantic environment.  The request handler then manages
+* the structures in the semantic env, which controls the order of
+* animation of slaves, and so embodies the behavior of the language
+* constructs.
+*The animationMaster then rescans the slots, offering each empty one to
+* the Assigner, along with the semantic environment.  The Assigner chooses
+* among the ready slaves in the semantic Env, finding the one best suited
+* to be animated by that slot's associated core.
+*
+*==========================================================================
+*Implementation Details:
+*
+*There is a separate masterVP for each core, but a single semantic
+* environment shared by all cores.  Each core also has its own scheduling
+* slots, which are used to communicate slaves between animationMaster and
+* coreController.  There is only one global variable, _VMSMasterEnv, which
+* holds the semantic env and other things shared by the different
+* masterVPs.  The request handler and Assigner are registered with
+* the animationMaster by the language's init function, and a pointer to
+* each is in the _VMSMasterEnv. (There are also some pthread related global
+* vars, but they're only used during init of VMS).
+*VMS gains control over the cores by essentially "turning off" the OS's
+* scheduler, using pthread pin-to-core commands.
+*
+*The masterVPs are created during init, with this animationMaster as their
+* top level function.  The masterVPs use the same SlaveVP data structure,
+* even though they're not slave VPs.
+*A "seed slave" is also created during init -- this is equivalent to the
+* "main" function in C, and acts as the entry-point to the VMS-language-
+* based application.
+*The masterVPs share a single system-wide master-lock, so only one
+* masterVP may be animated at a time.
+*The core controllers access _VMSMasterEnv to get the masterVP, and when
+* they start, the slots are all empty, so they run their associated core's
+* masterVP.  The first of those to get the master lock sees the seed slave
+* in the shared semantic environment, so when it runs the Assigner, that
+* returns the seed slave, which the animationMaster puts into a scheduling
+* slot then switches to the core controller.  That then switches the core
+* over to the seed slave, which then proceeds to execute language
+* constructs to create more slaves, and so on.  Each of those constructs
+* causes the seed slave to suspend, switching over to the core controller,
+* which eventually switches to the masterVP, which executes the
+* request handler, which uses VMS primitives to carry out the creation of
+* new slave VPs, which are marked as ready for the Assigner, and so on.
+*
+*On animation slots, and system behavior:
+* A request may linger in an animation slot for a long time while
+* the slaves in the other slots are animated.  This only becomes a problem
+* when such a request is a choke-point in the constraints, and is needed
+* to free work for *other* cores.  To reduce this occurrence, the number
+* of animation slots should be kept low.  On the other hand, having
+* multiple animation slots amortizes the overhead of switching to the
+* masterVP and executing the animationMaster code, because one pass of
+* that code services more than one slot.  In practice, the best balance
+* should be discovered by profiling.
+*/
+void animationMaster( void *initData, SlaveVP *masterVP )
+{
+//Used while scanning and filling animation slots
+int32           slotIdx, numSlotsFilled;
+SchedSlot      *currSlot, **schedSlots;
+SlaveVP        *assignedSlaveVP;  //the slave chosen by the assigner
+//Local copies, for performance
+MasterEnv      *masterEnv;
+SlaveAssigner   slaveAssigner;
+RequestHandler  requestHandler;
+void           *semanticEnv;
+int32           thisCoresIdx;
+//======================== Initializations ========================
+masterEnv        = (MasterEnv*)_VMSMasterEnv;
+thisCoresIdx     = masterVP->coreAnimatedBy;
+schedSlots       = masterEnv->allSchedSlots[thisCoresIdx];
+requestHandler   = masterEnv->requestHandler;
+slaveAssigner    = masterEnv->slaveAssigner;
+semanticEnv      = masterEnv->semanticEnv;
+//======================== animationMaster ========================
+while(1){
+MEAS__Capture_Pre_Master_Point
+//Scan the animation slots
+numSlotsFilled = 0;
+for( slotIdx = 0; slotIdx < NUM_SCHED_SLOTS; slotIdx++)
+{
+currSlot = schedSlots[ slotIdx ];
+//Check if newly-done slave in slot, which will need request handled
+if( currSlot->workIsDone )
+{
+currSlot->workIsDone         = FALSE;
+currSlot->needsSlaveAssigned = TRUE;
+MEAS__startReqHdlr;
+//process the requests made by the slave (held inside slave struc)
+(*requestHandler)( currSlot->slaveAssignedToSlot, semanticEnv );
+MEAS__endReqHdlr;
+}
+//If slot empty, hand to Assigner to fill with a slave
+if( currSlot->needsSlaveAssigned )
+{    //Call plugin's Assigner to give slot a new slave
+assignedSlaveVP =
+(*slaveAssigner)( semanticEnv, currSlot );
+//put the chosen slave into slot, and adjust flags and state
+if( assignedSlaveVP != NULL )
+{ currSlot->slaveAssignedToSlot = assignedSlaveVP;
+assignedSlaveVP->schedSlot       = currSlot;
+currSlot->needsSlaveAssigned  = FALSE;
+numSlotsFilled               += 1;
+}
+}
+}
+#ifdef SYS__TURN_ON_WORK_STEALING
+/*If no slots filled, means no more work, look for work to steal. */
+if( numSlotsFilled == 0 )
+{ gateProtected_stealWorkInto( currSlot, readyToAnimateQ, masterVP );
+}
+#endif
+MEAS__Capture_Post_Master_Point;
+masterSwitchToCoreCtlr(animatingSlv);
+flushRegisters();
+}//MasterLoop
+}
+//===========================  Work Stealing  ==============================
+/*This is the first of two work-stealing approaches.  It's not used, but
+* left in the code as a simple illustration of the principle.  This version
+* has a race condition -- the core controllers are accessing their own
+* animation slots at the same time that this work-stealer on a different
+* core is.
+*Because the core controllers run outside the master lock, this interaction
+* is not protected.
+*/
+void inline
+stealWorkInto( SchedSlot *currSlot, VMSQueueStruc *readyToAnimateQ,
+SlaveVP *masterVP )
+{
+SlaveVP   *stolenSlv;
+int32        coreIdx, i;
+VMSQueueStruc *currQ;
+stolenSlv = NULL;
+coreIdx = masterVP->coreAnimatedBy;
+for( i = 0; i < NUM_CORES -1; i++ )
+{
+if( coreIdx >= NUM_CORES -1 )
+{ coreIdx = 0;
+}
+else
+{ coreIdx++;
+}
+//TODO: fix this for coreCtlr scans slots
+//      currQ = _VMSMasterEnv->readyToAnimateQs[coreIdx];
+if( numInVMSQ( currQ ) > 0 )
+{ stolenSlv = readVMSQ (currQ );
+break;
+}
+}
+if( stolenSlv != NULL )
+{ currSlot->slaveAssignedToSlot = stolenSlv;
+stolenSlv->schedSlot           = currSlot;
+currSlot->needsSlaveAssigned  = FALSE;
+writeVMSQ( stolenSlv, readyToAnimateQ );
+}
+}
+/*This algorithm makes the common case fast.  Make the coreloop passive,
+* and show its progress.  Make the stealer control a gate that coreloop
+* has to pass.
+*To avoid interference, only one stealer at a time.  Use a global
+* stealer-lock, so only the stealer is slowed.
+*
+*The pattern is based on a gate -- stealer shuts the gate, then monitors
+* to be sure any already past make it all the way out, before starting.
+*So, have a "progress" measure just before the gate, then have two after it,
+* one is in a "waiting room" outside the gate, the other is at the exit.
+*Then, the stealer first shuts the gate, then checks the progress measure
+* outside it, then looks to see if the progress measure at the exit is the
+* same.  If yes, it knows the protected area is empty 'cause no other way
+* to get in and the last to get in also exited.
+*If the progress measure at the exit is not the same, then the stealer goes
+* into a loop checking both the waiting-area and the exit progress-measures
+* until one of them shows the same as the measure outside the gate.  Might
+* as well re-read the measure outside the gate each go around, just to be
+* sure.  It is guaranteed that one of the two will eventually match the one
+* outside the gate.
+*
+*Here's an informal proof of correctness:
+*The gate can be closed at any point, and have only four cases:
+*  1) coreloop made it past the gate-closing but not yet past the exit
+*  2) coreloop made it past the pre-gate progress update but not yet past
+*     the gate,
+*  3) coreloop is right before the pre-gate update
+*  4) coreloop is past the exit and far from the pre-gate update.
+*
+* Covering the cases in reverse order,
+*  4) is not a problem -- stealer will read pre-gate progress, see that it
+*     matches exit progress, and the gate is closed, so stealer can proceed.
+*  3) stealer will read pre-gate progress just after coreloop updates it,
+*     so stealer goes into a loop until the coreloop causes wait-progress
+*     to match pre-gate progress, so then stealer can proceed
+*  2) same as 3.
+*  1) stealer reads pre-gate progress, sees that it's different than exit,
+*     so goes into loop until exit matches pre-gate, now it knows coreloop
+*     is not in protected and cannot get back in, so can proceed.
+*
+*Implementation for the stealer:
+*
+*First, acquire the stealer lock -- only cores with no work to do will
+* compete to steal, so not a big performance penalty having only one --
+* will rarely have multiple stealers in a system with plenty of work -- and
+* in a system with little work, it doesn't matter.
+*
+*Note, have single-reader, single-writer pattern for all variables used to
+* communicate between stealer and victims
+*
+*So, scan the queues of the core controllers, until find non-empty.  Each core
+* has its own list that it scans.  The list goes in order from closest to
+* furthest core, so it steals first from close cores.  Later can add
+* taking info from the app about overlapping footprints, and scan all the
+* others then choose work with the most footprint overlap with the contents
+* of this core's cache.
+*
+*Now, have a victim to take work from.  So, shut the gate in that
+* coreloop, by setting the "gate closed" var on its stack to TRUE.
+*Then, read the core's pre-gate progress and compare to the core's exit
+* progress.
+*If same, can proceed to take work from the coreloop's queue.  When done,
+* write FALSE to gate closed var.
+*If different, then enter a loop that reads the pre-gate progress, then
+* compares to exit progress then to wait progress.  When one of two
+* matches, proceed.  Take work from the coreloop's queue.  When done,
+* write FALSE to the gate closed var.
+*
+*/
+void inline
+gateProtected_stealWorkInto( SchedSlot *currSlot,
+VMSQueueStruc *myReadyToAnimateQ,
+SlaveVP *masterVP )
+{
+SlaveVP     *stolenSlv;
+int32          coreIdx, i, haveAVictim, gotLock;
+VMSQueueStruc *victimsQ;
+volatile GateStruc *vicGate;
+int32               coreMightBeInProtected;
+//see if any other cores have work available to steal
+haveAVictim = FALSE;
+coreIdx = masterVP->coreAnimatedBy;
+for( i = 0; i < NUM_CORES -1; i++ )
+{
+if( coreIdx >= NUM_CORES -1 )
+{ coreIdx = 0;
+}
+else
+{ coreIdx++;
+}
+//TODO: fix this for coreCtlr scans slots
+//      victimsQ = _VMSMasterEnv->readyToAnimateQs[coreIdx];
+if( numInVMSQ( victimsQ ) > 0 )
+{ haveAVictim = TRUE;
+vicGate = _VMSMasterEnv->workStealingGates[ coreIdx ];
+break;
+}
+}
+if( !haveAVictim ) return;  //no work to steal, exit
+//have a victim core, now get the stealer-lock
+gotLock =__sync_bool_compare_and_swap( &(_VMSMasterEnv->workStealingLock),
+UNLOCKED, LOCKED );
+if( !gotLock ) return; //go back to core controller, which will re-start master
+//====== Start Gate-protection =======
+vicGate->gateClosed = TRUE;
+coreMightBeInProtected= vicGate->preGateProgress != vicGate->exitProgress;
+while( coreMightBeInProtected )
+{    //wait until sure
+if( vicGate->preGateProgress == vicGate->waitProgress )
+coreMightBeInProtected = FALSE;
+if( vicGate->preGateProgress == vicGate->exitProgress )
+coreMightBeInProtected = FALSE;
+}
+stolenSlv = readVMSQ ( victimsQ );
+vicGate->gateClosed = FALSE;
+//======= End Gate-protection  =======
+if( stolenSlv != NULL )  //NULL if victim was in protected and took the work
+{ currSlot->slaveAssignedToSlot = stolenSlv;
+stolenSlv->schedSlot           = currSlot;
+currSlot->needsSlaveAssigned  = FALSE;
+writeVMSQ( stolenSlv, myReadyToAnimateQ );
+}
+//unlock the work stealing lock
+_VMSMasterEnv->workStealingLock = UNLOCKED;
+}
