Mercurial > cgi-bin > hgwebdir.cgi > VMS > VMS_Implementations > VMS_impls > VMS__MC_shared_impl
comparison AnimationMaster.c @ 230:f2a7831352dc
changed SchedulingMaster.c to AnimationMaster.c and cleaned up all comments
| author | Some Random Person <seanhalle@yahoo.com> |
|---|---|
| date | Thu, 15 Mar 2012 20:31:41 -0700 |
| parents | |
| children | 88fd85921d7f |
comparison
equal
deleted
inserted
replaced
| -1:000000000000 | 0:5ceadea89150 |
|---|---|
| 1 /* | |
| 2 * Copyright 2010 OpenSourceStewardshipFoundation | |
| 3 * | |
| 4 * Licensed under BSD | |
| 5 */ | |
| 6 | |
| 7 | |
| 8 | |
| 9 #include <stdio.h> | |
| 10 #include <stddef.h> | |
| 11 | |
| 12 #include "VMS.h" | |
| 13 | |
| 14 | |
| 15 //========================= Local Fn Prototypes ============================= | |
| 16 void inline | |
| 17 stealWorkInto( SchedSlot *currSlot, VMSQueueStruc *readyToAnimateQ, | |
| 18 SlaveVP *masterVP ); | |
| 19 | |
| 20 //=========================================================================== | |
| 21 | |
| 22 | |
| 23 | |
| 24 /*The animationMaster embodies most of the animator of the language. The | |
| 25 * animator is what emodies the behavior of language constructs. | |
| 26 * As such, it is the animationMaster, in combination with the plugin | |
| 27 * functions, that make the language constructs do their behavior. | |
| 28 * | |
| 29 *Within the code, this is the top-level-function of the masterVPs, and | |
| 30 * runs when the coreController has no more slave VPs. It's job is to | |
| 31 * refill the animation slots with slaves. | |
| 32 * | |
| 33 *To do this, it scans the animation slots for just-completed slaves. | |
| 34 * Each of these has a request in it. So, the master hands each to the | |
| 35 * plugin's request handler. | |
| 36 *Each request represents a language construct that has been encountered | |
| 37 * by the application code in the slave. Passing the request to the | |
| 38 * request handler is how that language construct's behavior gets invoked. | |
| 39 * The request handler then performs the actions of the construct's | |
| 40 * behavior. So, the request handler encodes the behavior of the | |
| 41 * language's parallelism constructs, and performs that when the master | |
| 42 * hands it a slave containing a request to perform that construct. | |
| 43 * | |
| 44 *On a shared-memory machine, the behavior of parallelism constructs | |
| 45 * equals control, over order of execution of code. Hence, the behavior | |
| 46 * of the language constructs performed by the request handler is to | |
| 47 * choose the order that slaves get animated, and thereby control the | |
| 48 * order that application code in the slaves executes. | |
| 49 * | |
| 50 *To control order of animation of slaves, the request handler has a | |
| 51 * semantic environment that holds data structures used to hold slaves | |
| 52 * and choose when they're ready to be animated. | |
| 53 * | |
| 54 *Once a slave is marked as ready to be animated by the request handler, | |
| 55 * it is the second plugin function, the Assigner, which chooses the core | |
| 56 * the slave gets assigned to for animation. Hence, the Assigner doesn't | |
| 57 * perform any of the semantic behavior of language constructs, rather | |
| 58 * it gives the language a chance to improve performance. The performance | |
| 59 * of application code is strongly related to communication between | |
| 60 * cores. On shared-memory machines, communication is caused during | |
| 61 * execution of code, by memory accesses, and how much depends on contents | |
| 62 * of caches connected to the core executing the code. So, the placement | |
| 63 * of slaves determines the communication caused during execution of the | |
| 64 * slave's code. | |
| 65 *The point of the Assigner, then, is to use application information during | |
| 66 * execution of the program, to make choices about slave placement onto | |
| 67 * cores, with the aim to put slaves close to caches containing the data | |
| 68 * used by the slave's code. | |
| 69 * | |
| 70 *========================================================================== | |
| 71 *In summary, the animationMaster scans the slots, finds slaves | |
| 72 * just-finished, which hold requests, pass those to the request handler, | |
| 73 * along with the semantic environment, and the request handler then manages | |
| 74 * the structures in the semantic env, which controls the order of | |
| 75 * animation of slaves, and so embodies the behavior of the language | |
| 76 * constructs. | |
| 77 *The animationMaster then rescans the slots, offering each empty one to | |
| 78 * the Assigner, along with the semantic environment. The Assigner chooses | |
| 79 * among the ready slaves in the semantic Env, finding the one best suited | |
| 80 * to be animated by that slot's associated core. | |
| 81 * | |
| 82 *========================================================================== | |
| 83 *Implementation Details: | |
| 84 * | |
| 85 *There is a separate masterVP for each core, but a single semantic | |
| 86 * environment shared by all cores. Each core also has its own scheduling | |
| 87 * slots, which are used to communicate slaves between animationMaster and | |
| 88 * coreController. There is only one global variable, _VMSMasterEnv, which | |
| 89 * holds the semantic env and other things shared by the different | |
| 90 * masterVPs. The request handler and Assigner are registered with | |
| 91 * the animationMaster by the language's init function, and a pointer to | |
| 92 * each is in the _VMSMasterEnv. (There are also some pthread related global | |
| 93 * vars, but they're only used during init of VMS). | |
| 94 *VMS gains control over the cores by essentially "turning off" the OS's | |
| 95 * scheduler, using pthread pin-to-core commands. | |
| 96 * | |
| 97 *The masterVPs are created during init, with this animationMaster as their | |
| 98 * top level function. The masterVPs use the same SlaveVP data structure, | |
| 99 * even though they're not slave VPs. | |
| 100 *A "seed slave" is also created during init -- this is equivalent to the | |
| 101 * "main" function in C, and acts as the entry-point to the VMS-language- | |
| 102 * based application. | |
| 103 *The masterVPs shared a single system-wide master-lock, so only one | |
| 104 * masterVP may be animated at a time. | |
| 105 *The core controllers access _VMSMasterEnv to get the masterVP, and when | |
| 106 * they start, the slots are all empty, so they run their associated core's | |
| 107 * masterVP. The first of those to get the master lock sees the seed slave | |
| 108 * in the shared semantic environment, so when it runs the Assigner, that | |
| 109 * returns the seed slave, which the animationMaster puts into a scheduling | |
| 110 * slot then switches to the core controller. That then switches the core | |
| 111 * over to the seed slave, which then proceeds to execute language | |
| 112 * constructs to create more slaves, and so on. Each of those constructs | |
| 113 * causes the seed slave to suspend, switching over to the core controller, | |
| 114 * which eventually switches to the masterVP, which executes the | |
| 115 * request handler, which uses VMS primitives to carry out the creation of | |
| 116 * new slave VPs, which are marked as ready for the Assigner, and so on.. | |
| 117 * | |
| 118 *On animation slots, and system behavior: | |
| 119 * A request may linger in a animation slot for a long time while | |
| 120 * the slaves in the other slots are animated. This only becomes a problem | |
| 121 * when such a request is a choke-point in the constraints, and is needed | |
| 122 * to free work for *other* cores. To reduce this occurance, the number | |
| 123 * of animation slots should be kept low. In balance, having multiple | |
| 124 * animation slots amortizes the overhead of switching to the masterVP and | |
| 125 * executing the animationMaster code, which drives for more than one. In | |
| 126 * practice, the best balance should be discovered by profiling. | |
| 127 */ | |
| 128 void animationMaster( void *initData, SlaveVP *masterVP ) | |
| 129 { | |
| 130 //Used while scanning and filling animation slots | |
| 131 int32 slotIdx, numSlotsFilled; | |
| 132 SchedSlot *currSlot, **schedSlots; | |
| 133 SlaveVP *assignedSlaveVP; //the slave chosen by the assigner | |
| 134 | |
| 135 //Local copies, for performance | |
| 136 MasterEnv *masterEnv; | |
| 137 SlaveAssigner slaveAssigner; | |
| 138 RequestHandler requestHandler; | |
| 139 void *semanticEnv; | |
| 140 int32 thisCoresIdx; | |
| 141 | |
| 142 //======================== Initializations ======================== | |
| 143 masterEnv = (MasterEnv*)_VMSMasterEnv; | |
| 144 | |
| 145 thisCoresIdx = masterVP->coreAnimatedBy; | |
| 146 schedSlots = masterEnv->allSchedSlots[thisCoresIdx]; | |
| 147 | |
| 148 requestHandler = masterEnv->requestHandler; | |
| 149 slaveAssigner = masterEnv->slaveAssigner; | |
| 150 semanticEnv = masterEnv->semanticEnv; | |
| 151 | |
| 152 | |
| 153 //======================== animationMaster ======================== | |
| 154 while(1){ | |
| 155 | |
| 156 MEAS__Capture_Pre_Master_Point | |
| 157 | |
| 158 //Scan the animation slots | |
| 159 numSlotsFilled = 0; | |
| 160 for( slotIdx = 0; slotIdx < NUM_SCHED_SLOTS; slotIdx++) | |
| 161 { | |
| 162 currSlot = schedSlots[ slotIdx ]; | |
| 163 | |
| 164 //Check if newly-done slave in slot, which will need request handld | |
| 165 if( currSlot->workIsDone ) | |
| 166 { | |
| 167 currSlot->workIsDone = FALSE; | |
| 168 currSlot->needsSlaveAssigned = TRUE; | |
| 169 | |
| 170 MEAS__startReqHdlr; | |
| 171 | |
| 172 //process the requests made by the slave (held inside slave struc) | |
| 173 (*requestHandler)( currSlot->slaveAssignedToSlot, semanticEnv ); | |
| 174 | |
| 175 MEAS__endReqHdlr; | |
| 176 } | |
| 177 //If slot empty, hand to Assigner to fill with a slave | |
| 178 if( currSlot->needsSlaveAssigned ) | |
| 179 { //Call plugin's Assigner to give slot a new slave | |
| 180 assignedSlaveVP = | |
| 181 (*slaveAssigner)( semanticEnv, currSlot ); | |
| 182 | |
| 183 //put the chosen slave into slot, and adjust flags and state | |
| 184 if( assignedSlaveVP != NULL ) | |
| 185 { currSlot->slaveAssignedToSlot = assignedSlaveVP; | |
| 186 assignedSlaveVP->schedSlot = currSlot; | |
| 187 currSlot->needsSlaveAssigned = FALSE; | |
| 188 numSlotsFilled += 1; | |
| 189 } | |
| 190 } | |
| 191 } | |
| 192 | |
| 193 | |
| 194 #ifdef SYS__TURN_ON_WORK_STEALING | |
| 195 /*If no slots filled, means no more work, look for work to steal. */ | |
| 196 if( numSlotsFilled == 0 ) | |
| 197 { gateProtected_stealWorkInto( currSlot, readyToAnimateQ, masterVP ); | |
| 198 } | |
| 199 #endif | |
| 200 | |
| 201 MEAS__Capture_Post_Master_Point; | |
| 202 | |
| 203 masterSwitchToCoreCtlr(animatingSlv); | |
| 204 flushRegisters(); | |
| 205 }//MasterLoop | |
| 206 | |
| 207 | |
| 208 } | |
| 209 | |
| 210 | |
| 211 //=========================== Work Stealing ============================== | |
| 212 | |
| 213 /*This is first of two work-stealing approaches. It's not used, but left | |
| 214 * in the code as a simple illustration of the principle. This version | |
| 215 * has a race condition -- the core controllers are accessing their own | |
| 216 * animation slots at the same time that this work-stealer on a different | |
| 217 * core is.. | |
| 218 *Because the core controllers run outside the master lock, this interaction | |
| 219 * is not protected. | |
| 220 */ | |
| 221 void inline | |
| 222 stealWorkInto( SchedSlot *currSlot, VMSQueueStruc *readyToAnimateQ, | |
| 223 SlaveVP *masterVP ) | |
| 224 { | |
| 225 SlaveVP *stolenSlv; | |
| 226 int32 coreIdx, i; | |
| 227 VMSQueueStruc *currQ; | |
| 228 | |
| 229 stolenSlv = NULL; | |
| 230 coreIdx = masterVP->coreAnimatedBy; | |
| 231 for( i = 0; i < NUM_CORES -1; i++ ) | |
| 232 { | |
| 233 if( coreIdx >= NUM_CORES -1 ) | |
| 234 { coreIdx = 0; | |
| 235 } | |
| 236 else | |
| 237 { coreIdx++; | |
| 238 } | |
| 239 //TODO: fix this for coreCtlr scans slots | |
| 240 // currQ = _VMSMasterEnv->readyToAnimateQs[coreIdx]; | |
| 241 if( numInVMSQ( currQ ) > 0 ) | |
| 242 { stolenSlv = readVMSQ (currQ ); | |
| 243 break; | |
| 244 } | |
| 245 } | |
| 246 | |
| 247 if( stolenSlv != NULL ) | |
| 248 { currSlot->slaveAssignedToSlot = stolenSlv; | |
| 249 stolenSlv->schedSlot = currSlot; | |
| 250 currSlot->needsSlaveAssigned = FALSE; | |
| 251 | |
| 252 writeVMSQ( stolenSlv, readyToAnimateQ ); | |
| 253 } | |
| 254 } | |
| 255 | |
| 256 /*This algorithm makes the common case fast. Make the coreloop passive, | |
| 257 * and show its progress. Make the stealer control a gate that coreloop | |
| 258 * has to pass. | |
| 259 *To avoid interference, only one stealer at a time. Use a global | |
| 260 * stealer-lock, so only the stealer is slowed. | |
| 261 * | |
| 262 *The pattern is based on a gate -- stealer shuts the gate, then monitors | |
| 263 * to be sure any already past make it all the way out, before starting. | |
| 264 *So, have a "progress" measure just before the gate, then have two after it, | |
| 265 * one is in a "waiting room" outside the gate, the other is at the exit. | |
| 266 *Then, the stealer first shuts the gate, then checks the progress measure | |
| 267 * outside it, then looks to see if the progress measure at the exit is the | |
| 268 * same. If yes, it knows the protected area is empty 'cause no other way | |
| 269 * to get in and the last to get in also exited. | |
| 270 *If the progress measure at the exit is not the same, then the stealer goes | |
| 271 * into a loop checking both the waiting-area and the exit progress-measures | |
| 272 * until one of them shows the same as the measure outside the gate. Might | |
| 273 * as well re-read the measure outside the gate each go around, just to be | |
| 274 * sure. It is guaranteed that one of the two will eventually match the one | |
| 275 * outside the gate. | |
| 276 * | |
| 277 *Here's an informal proof of correctness: | |
| 278 *The gate can be closed at any point, and have only four cases: | |
| 279 * 1) coreloop made it past the gate-closing but not yet past the exit | |
| 280 * 2) coreloop made it past the pre-gate progress update but not yet past | |
| 281 * the gate, | |
| 282 * 3) coreloop is right before the pre-gate update | |
| 283 * 4) coreloop is past the exit and far from the pre-gate update. | |
| 284 * | |
| 285 * Covering the cases in reverse order, | |
| 286 * 4) is not a problem -- stealer will read pre-gate progress, see that it | |
| 287 * matches exit progress, and the gate is closed, so stealer can proceed. | |
| 288 * 3) stealer will read pre-gate progress just after coreloop updates it.. | |
| 289 * so stealer goes into a loop until the coreloop causes wait-progress | |
| 290 * to match pre-gate progress, so then stealer can proceed | |
| 291 * 2) same as 3.. | |
| 292 * 1) stealer reads pre-gate progress, sees that it's different than exit, | |
| 293 * so goes into loop until exit matches pre-gate, now it knows coreloop | |
| 294 * is not in protected and cannot get back in, so can proceed. | |
| 295 * | |
| 296 *Implementation for the stealer: | |
| 297 * | |
| 298 *First, acquire the stealer lock -- only cores with no work to do will | |
| 299 * compete to steal, so not a big performance penalty having only one -- | |
| 300 * will rarely have multiple stealers in a system with plenty of work -- and | |
| 301 * in a system with little work, it doesn't matter. | |
| 302 * | |
| 303 *Note, have single-reader, single-writer pattern for all variables used to | |
| 304 * communicate between stealer and victims | |
| 305 * | |
| 306 *So, scan the queues of the core controllers, until find non-empty. Each core | |
| 307 * has its own list that it scans. The list goes in order from closest to | |
| 308 * furthest core, so it steals first from close cores. Later can add | |
| 309 * taking info from the app about overlapping footprints, and scan all the | |
| 310 * others then choose work with the most footprint overlap with the contents | |
| 311 * of this core's cache. | |
| 312 * | |
| 313 *Now, have a victim want to take work from. So, shut the gate in that | |
| 314 * coreloop, by setting the "gate closed" var on its stack to TRUE. | |
| 315 *Then, read the core's pre-gate progress and compare to the core's exit | |
| 316 * progress. | |
| 317 *If same, can proceed to take work from the coreloop's queue. When done, | |
| 318 * write FALSE to gate closed var. | |
| 319 *If different, then enter a loop that reads the pre-gate progress, then | |
| 320 * compares to exit progress then to wait progress. When one of two | |
| 321 * matches, proceed. Take work from the coreloop's queue. When done, | |
| 322 * write FALSE to the gate closed var. | |
| 323 * | |
| 324 */ | |
| 325 void inline | |
| 326 gateProtected_stealWorkInto( SchedSlot *currSlot, | |
| 327 VMSQueueStruc *myReadyToAnimateQ, | |
| 328 SlaveVP *masterVP ) | |
| 329 { | |
| 330 SlaveVP *stolenSlv; | |
| 331 int32 coreIdx, i, haveAVictim, gotLock; | |
| 332 VMSQueueStruc *victimsQ; | |
| 333 | |
| 334 volatile GateStruc *vicGate; | |
| 335 int32 coreMightBeInProtected; | |
| 336 | |
| 337 | |
| 338 | |
| 339 //see if any other cores have work available to steal | |
| 340 haveAVictim = FALSE; | |
| 341 coreIdx = masterVP->coreAnimatedBy; | |
| 342 for( i = 0; i < NUM_CORES -1; i++ ) | |
| 343 { | |
| 344 if( coreIdx >= NUM_CORES -1 ) | |
| 345 { coreIdx = 0; | |
| 346 } | |
| 347 else | |
| 348 { coreIdx++; | |
| 349 } | |
| 350 //TODO: fix this for coreCtlr scans slots | |
| 351 // victimsQ = _VMSMasterEnv->readyToAnimateQs[coreIdx]; | |
| 352 if( numInVMSQ( victimsQ ) > 0 ) | |
| 353 { haveAVictim = TRUE; | |
| 354 vicGate = _VMSMasterEnv->workStealingGates[ coreIdx ]; | |
| 355 break; | |
| 356 } | |
| 357 } | |
| 358 if( !haveAVictim ) return; //no work to steal, exit | |
| 359 | |
| 360 //have a victim core, now get the stealer-lock | |
| 361 gotLock =__sync_bool_compare_and_swap( &(_VMSMasterEnv->workStealingLock), | |
| 362 UNLOCKED, LOCKED ); | |
| 363 if( !gotLock ) return; //go back to core controller, which will re-start master | |
| 364 | |
| 365 | |
| 366 //====== Start Gate-protection ======= | |
| 367 vicGate->gateClosed = TRUE; | |
| 368 coreMightBeInProtected= vicGate->preGateProgress != vicGate->exitProgress; | |
| 369 while( coreMightBeInProtected ) | |
| 370 { //wait until sure | |
| 371 if( vicGate->preGateProgress == vicGate->waitProgress ) | |
| 372 coreMightBeInProtected = FALSE; | |
| 373 if( vicGate->preGateProgress == vicGate->exitProgress ) | |
| 374 coreMightBeInProtected = FALSE; | |
| 375 } | |
| 376 | |
| 377 stolenSlv = readVMSQ ( victimsQ ); | |
| 378 | |
| 379 vicGate->gateClosed = FALSE; | |
| 380 //======= End Gate-protection ======= | |
| 381 | |
| 382 | |
| 383 if( stolenSlv != NULL ) //victim could have been in protected and taken | |
| 384 { currSlot->slaveAssignedToSlot = stolenSlv; | |
| 385 stolenSlv->schedSlot = currSlot; | |
| 386 currSlot->needsSlaveAssigned = FALSE; | |
| 387 | |
| 388 writeVMSQ( stolenSlv, myReadyToAnimateQ ); | |
| 389 } | |
| 390 | |
| 391 //unlock the work stealing lock | |
| 392 _VMSMasterEnv->workStealingLock = UNLOCKED; | |
| 393 } |
