# HG changeset patch
# User Sean Halle <seanhalle@yahoo.com>
# Date 1348121564 25200
# Node ID 999f2966a3e55b69f45caf7d2e1f32fe69706ab2
# Parent  0dc0b865390239b398d3711acf670ca5cb426195
new branch -- Dev_ML -- for making VMS take langlets whose constructs can be mixed

diff -r 0dc0b8653902 -r 999f2966a3e5 AnimationMaster.c
--- a/AnimationMaster.c	Mon Sep 03 03:34:54 2012 -0700
+++ b/AnimationMaster.c	Wed Sep 19 23:12:44 2012 -0700
@@ -9,7 +9,7 @@
 #include <stdio.h>
 #include <stddef.h>
 
-#include "VMS.h"
+#include "PR.h"
 
 
 
@@ -20,11 +20,39 @@
  * 
  *Within the code, this is the top-level-function of the masterVPs, and
  * runs when the coreController has no more slave VPs.  It's job is to
- * refill the animation slots with slaves.
+ * refill the animation slots with slaves that have work.
  *
- *To do this, it scans the animation slots for just-completed slaves.
- * Each of these has a request in it.  So, the master hands each to the
- * plugin's request handler.
+ *There are multiple versions of the master, each tuned to a specific 
+ * combination of modes.  This keeps the master simple, with reduced overhead,
+ * when the application is not using the extra complexity.
+ * 
+ *As of Sept 2012, the versions available will be:
+ * 1) Single language, which only exposes slaves (such as SSR or Vthread)
+ * 2) Single language, which only exposes tasks  (such as pure dataflow)
+ * 3) Single language, which exposes both (like Cilk, StarSs, and OpenMP)
+ * 4) Multi-language, which always assumes both tasks and slaves
+ * 5) Multi-language and multi-process, which also assumes both tasks and slaves
+ *
+ * 
+ *
+ */
+
+
+//=====================  The versions of the Animation Master  =================
+//
+//==============================================================================
+
+/* 1) This version is for a single language that has only slaves, no tasks,
+ *    such as Vthread or SSR.
+ *Use it when an application has only a single language, and
+ * that language exposes slaves explicitly (as opposed to a task-based
+ * language like pure dataflow).
+ * 
+ *
+ *It scans the animation slots for just-completed slaves.
+ * Each completed slave has a request in it.  So, the master hands each to
+ * the plugin's request handler (there is only one plugin, because there is
+ * only one language).
  *Each request represents a language construct that has been encountered
  * by the application code in the slave. Passing the request to the
  * request handler is how that language construct's behavior gets invoked.
@@ -77,24 +105,24 @@
  *There is a separate masterVP for each core, but a single semantic
  * environment shared by all cores.  Each core also has its own scheduling
  * slots, which are used to communicate slaves between animationMaster and
- * coreController.  There is only one global variable, _VMSMasterEnv, which
+ * coreController.  There is only one global variable, _PRMasterEnv, which
  * holds the semantic env and other things shared by the different
  * masterVPs.  The request handler and Assigner are registered with
  * the animationMaster by the language's init function, and a pointer to
- * each is in the _VMSMasterEnv. (There are also some pthread related global
- * vars, but they're only used during init of VMS).
- *VMS gains control over the cores by essentially "turning off" the OS's
+ * each is in the _PRMasterEnv. (There are also some pthread related global
+ * vars, but they're only used during init of PR).
+ *PR gains control over the cores by essentially "turning off" the OS's
  * scheduler, using pthread pin-to-core commands.
  *
  *The masterVPs are created during init, with this animationMaster as their
  * top level function.  The masterVPs use the same SlaveVP data structure,
  * even though they're not slave VPs.
  *A "seed slave" is also created during init -- this is equivalent to the
- * "main" function in C, and acts as the entry-point to the VMS-language-
+ * "main" function in C, and acts as the entry-point to the PR-language-
  * based application.
- *The masterVPs shared a single system-wide master-lock, so only one
+ *The masterVPs share a single system-wide master-lock, so only one
  * masterVP may be animated at a time.
- *The core controllers access _VMSMasterEnv to get the masterVP, and when
+ *The core controllers access _PRMasterEnv to get the masterVP, and when
  * they start, the slots are all empty, so they run their associated core's
  * masterVP.  The first of those to get the master lock sees the seed slave
  * in the shared semantic environment, so when it runs the Assigner, that
@@ -104,14 +132,14 @@
  * constructs to create more slaves, and so on.  Each of those constructs
  * causes the seed slave to suspend, switching over to the core controller,
  * which eventually switches to the masterVP, which executes the 
- * request handler, which uses VMS primitives to carry out the creation of
+ * request handler, which uses PR primitives to carry out the creation of
  * new slave VPs, which are marked as ready for the Assigner, and so on..
  * 
  *On animation slots, and system behavior:
- * A request may linger in a animation slot for a long time while
+ * A request may linger in an animation slot for a long time while
  * the slaves in the other slots are animated.  This only becomes a problem
  * when such a request is a choke-point in the constraints, and is needed
- * to free work for *other* cores.  To reduce this occurance, the number
+ * to free work for *other* cores.  To reduce this occurrence, the number
  * of animation slots should be kept low.  In balance, having multiple
  * animation slots amortizes the overhead of switching to the masterVP and
  * executing the animationMaster code, which drives for more than one. In
@@ -163,7 +191,29 @@
        HOLISTIC__Record_AppResponder_start;
                MEAS__startReqHdlr;
                
-            //process the requests made by the slave (held inside slave struc)
+            currSlot->workIsDone         = FALSE;
+            currSlot->needsSlaveAssigned = TRUE;
+            SlaveVP *currSlave = currSlot->slaveAssignedToSlot;
+            
+            justAddedReqHdlrChg();
+               //handle the request, either by VMS or by the language
+            if( currSlave->requests->reqType != LangReq )
+             {    //The request is a standard VMS one, not one defined by the
+                  // language, so VMS handles it, then queues slave to be assigned
+               handleReqInVMS( currSlave );
+               writePrivQ( currSlave, VMSReadyQ ); //Q slave to be assigned below
+             }
+            else
+             {       MEAS__startReqHdlr;
+
+                  //Language handles request, which is held inside slave struc
+               (*requestHandler)( currSlave, semanticEnv );
+
+                     MEAS__endReqHdlr;
+             }
+          }
+
+            //process the requests made by the slave (held inside slave struc)
          (*requestHandler)( currSlot->slaveAssignedToSlot, semanticEnv );
          
          HOLISTIC__Record_AppResponder_end;
@@ -196,3 +246,756 @@
    }//while(1) 
  }
 
+
+/* 2)  This version is for a single language that has only tasks, which 
+ *     cannot be suspended.
+ */
+void animationMaster( void *initData, SlaveVP *masterVP )
+ { 
+      //Used while scanning and filling animation slots
+   int32           slotIdx, numSlotsFilled;
+   AnimSlot       *currSlot, **animSlots;
+   SlaveVP        *assignedSlaveVP;  //the slave chosen by the assigner
+   
+      //Local copies, for performance
+   MasterEnv      *masterEnv;
+   SlaveAssigner   slaveAssigner;
+   RequestHandler  requestHandler;
+   PRSemEnv       *semanticEnv;
+   int32           thisCoresIdx;
+
+   //#ifdef  MODE__MULTI_LANG
+   SlaveVP        *slave;
+   PRProcess      *process;
+   PRConstrEnvHolder *constrEnvHolder;
+   int32           langMagicNumber;
+   //#endif
+   
+   //======================== Initializations ========================
+   masterEnv        = (MasterEnv*)_PRMasterEnv;
+   
+   thisCoresIdx     = masterVP->coreAnimatedBy;
+   animSlots        = masterEnv->allAnimSlots[thisCoresIdx];
+
+   requestHandler   = masterEnv->requestHandler;
+   slaveAssigner    = masterEnv->slaveAssigner;
+   semanticEnv      = masterEnv->semanticEnv;
+   
+      //initialize, for non-multi-lang, non multi-proc case
+      // default handler gets put into master env by a registration call by lang
+   taskEndHandler   = masterEnv->defaultTaskHandler;
+   
+      HOLISTIC__Insert_Master_Global_Vars;
+   
+   //======================== animationMaster ========================
+   //Do loop gets requests handled and work assigned to slots..
+   // work can either be a task or a resumed slave
+   //Having two cases makes this logic complex.. can be finishing either, and 
+   // then the next available work may be either.. so really have two distinct
+   // loops that are inter-twined.. 
+   while(1){
+       
+      MEAS__Capture_Pre_Master_Point
+
+      //Scan the animation slots
+   numSlotsFilled = 0;
+   for( slotIdx = 0; slotIdx < NUM_ANIM_SLOTS; slotIdx++)
+    {
+      currSlot = animSlots[ slotIdx ];
+
+         //Check if newly-done slave in slot, which will need request handled
+      if( currSlot->workIsDone )
+       { currSlot->workIsDone = FALSE;
+       
+               HOLISTIC__Record_AppResponder_start; //TODO: update to check which process for each slot
+               MEAS__startReqHdlr;
+               
+         
+            //process the request made by the slave (held inside slave struc)
+         slave = currSlot->slaveAssignedToSlot;
+         
+            //check if the completed work was a task..
+         if( slave->taskMetaInfo->isATask )
+          {
+             if( slave->reqst->type == TaskEnd ) 
+              {    //do task end handler, which is registered separately
+                   //note, end hdlr may use semantic data from reqst..
+                //#ifdef  MODE__MULTI_LANG
+                   //get end-task handler
+                //taskEndHandler = lookup( slave->reqst->langMagicNumber, processEnv );
+                taskEndHandler = slave->taskMetaInfo->endTaskHandler;
+                //#endif
+                (*taskEndHandler)( slave, semanticEnv );
+                
+                goto AssignWork;
+              }
+             else  //is a task, and just suspended
+              {    //turn slot slave into free task slave & make replacement
+                if( slave->typeOfVP == TaskSlotSlv ) changeSlvType();
+                
+                //goto normal slave request handling
+                goto SlaveReqHandling; 
+              }
+          }
+         else //is a slave that suspended
+          {
+          SlaveReqHandling:
+            (*requestHandler)( slave, semanticEnv ); //(note: indirect Fn call more efficient when use fewer params, instead re-fetch from slave)
+         
+               HOLISTIC__Record_AppResponder_end;
+               MEAS__endReqHdlr;
+               
+            goto AssignWork;
+          }
+       } //if has suspended slave that needs handling
+      
+         //if slot empty, hand to Assigner to fill with a slave
+      if( currSlot->needsSlaveAssigned )
+       {    //Call plugin's Assigner to give slot a new slave
+               HOLISTIC__Record_Assigner_start;
+               
+       AssignWork:
+     
+         assignedSlaveVP = assignWork( semanticEnv, currSlot );
+       
+            //put the chosen slave into slot, and adjust flags and state
+         if( assignedSlaveVP != NULL )
+          { currSlot->slaveAssignedToSlot = assignedSlaveVP;
+            assignedSlaveVP->animSlotAssignedTo = currSlot;
+            currSlot->needsSlaveAssigned  = FALSE;
+            numSlotsFilled               += 1;
+          }
+         else
+          {
+            currSlot->needsSlaveAssigned  = TRUE; //local write
+          }
+               HOLISTIC__Record_Assigner_end;
+       }//if slot needs slave assigned
+    }//for( slotIdx..
+
+         MEAS__Capture_Post_Master_Point;
+   
+   masterSwitchToCoreCtlr( masterVP ); //returns when ctlr switches back to master
+   flushRegisters();
+   }//while(1) 
+ }
+
+
+/*This is the master when just multi-lang, but not multi-process mode is on.
+ * This version has to handle both tasks and slaves, and do extra work of 
+ * looking up the semantic env and handlers to use, for each completed bit of 
+ * work.
+ *It also has to search through the semantic envs to find one with work,
+ * then ask that env's assigner to return a unit of that work.
+ * 
+ *The language is written to startup in the same way as if it were the only
+ * language in the app, and it operates in the same way.
+ * The only difference between single-language and multi-lang is here, in the
+ * master.
+ *This invisibility to mode is why the language has to use registration calls
+ * for everything during startup -- those calls do different things depending
+ * on whether it's single-language or multi-language mode.
+ * 
+ *In this version of the master, work can either be a task or a resumed slave.
+ *Having two cases makes this logic complex: it can be finishing either, and
+ * then the next available work may be either, so there are really two distinct
+ * loops that are intertwined.
+ * 
+ *Some special cases:
+ * A task-end is a special case for a few reasons (below).
+ * A task-end can't block a slave (can't cause it to "logically suspend")
+ * A task available for work can only be assigned to a special slave, which 
+ *   has been set aside for doing tasks, one such task-slave is always 
+ *   assigned to each slot. So, when a task ends, a new task is assigned to
+ *   that slot's task-slave right away.  
+ * But if no tasks are available, then have to switch over to looking at
+ *   slaves to find one ready to resume, to find work for the slot.
+ * If a task just suspends, not ends, then its task-slave is no longer 
+ *   available to take new tasks, so a new task-slave has to be assigned to
+ *   that slot.  Then the slave of the suspended task is turned into a free
+ *   task-slave and request handling is done on it as if it were a slave 
+ *   that suspended.
+ * After request handling, do the same sequence of looking for a task to be
+ *   work, and if none, look for a slave ready to resume, as work for the slot.
+ * If a slave suspends, handle its request, then look for work.. first for a
+ *   task to assign, and if none, slaves ready to resume.
+ * Another special case is when task-end is done on a free task-slave.. in
+ *   that case, the slave has no more work and no way to get more.. so place
+ *   it into a recycle queue.
+ * If no work is found of either type, then prune down the extra slaves in
+ *   the recycle queue, so that they don't accumulate.
+ * 
+ *Multi-language mode complicates matters.
+ *
+ *For request handling, the master first has to fetch the semantic environment
+ * of the language, and then call the request handler pointed to by that
+ * semantic env.
+ *For assigning, things get more complex because of competing goals.  One
+ * goal is for language-specific information to be used during assignment, so
+ * the assigner can make higher quality decisions.  But with multiple languages,
+ * which only get mixed in the application, the assigners can't be written
+ * with knowledge of each other.  So, they can only make localized decisions,
+ * and so different languages' assigners may interfere with each other.
+ * 
+ *So, have some possibilities available:
+ *1) can have a fixed scheduler in the proto-runtime, that all the
+ * languages give their work to..  (but then lose language-specific info, 
+ * there is a standard PR format for assignment info, and the language
+ * attaches this to the work-unit when it gives it to PR.. also have issue
+ * with HWSim, which uses a priority Q instead of FIFO, and requests can 
+ * "undo" previous work put in, so request handlers need way to manipulate
+ * the work-holding Q..) (this might be fudgeable with
+ * HWSim, if the master did a lang-supplied callback each time it assigns a
+ * unit to a slot..  then HWSim can keep exactly one unit of work in PR's
+ * queue at a time..  but this is quite hack-like.. or perhaps HWSim supplies
+ * a task-end handler that kicks the next unit of work from HWSim internal
+ * priority queue, over to PR readyQ)
+ *2) can have each language have its own semantic env, that holds its own
+ * work, which is assigned by its own assigner.. then the master searches
+ * through all the semantic envs to find one with work and asks it to give work..
+ * (this has downside of blinding assigners to each other.. but does work
+ * for HWSim case)
+ *3) could make PR have a different readyQ for each core, and ask the lang
+ * to put work to the core it prefers.. but the work may be moved by PR if
+ * needed, say if one core idles for too long. This is a hybrid approach, 
+ * letting the language decide which core, but PR keeps the work and does it
+ * FIFO style.. (this might also be fudgeable with HWSim, in similar fashion,
+ * but it would be complicated by having to track cores separately) 
+ *
+ *Choosing 2, to keep compatibility with single-lang mode..  it allows the same
+ * assigner to be used for single-lang as for multi-lang..  the overhead of
+ * the extra master search for work is part of the price of the flexibility,
+ * but should be fairly small.. takes the first env that has work available, 
+ * and whatever it returns is assigned to the slot..
+ * 
+ *As a hybrid, giving an option for a unified override assigner to be registered
+ * and used..  This allows something like a static analysis to detect
+ * which languages are grouped together, and then analyze the pattern of 
+ * construct calls, and generate a custom assigner that uses info from all
+ * the languages in a unified way..  Don't really expect this to happen, 
+ * but making it possible.
+ */
+#ifdef  MODE__MULTI_LANG
+void animationMaster( void *initData, SlaveVP *masterVP )
+ { 
+      //Used while scanning and filling animation slots
+   int32           slotIdx, numSlotsFilled;
+   AnimSlot       *currSlot, **animSlots;
+   SlaveVP        *assignedSlaveVP;  //the slave chosen by the assigner
+   
+      //Local copies, for performance
+   MasterEnv      *masterEnv;
+   SlaveAssigner   slaveAssigner;
+   RequestHandler  requestHandler;
+   PRSemEnv       *semanticEnv;
+   int32           thisCoresIdx;
+
+   //#ifdef  MODE__MULTI_LANG
+   SlaveVP        *slave;
+   PRProcess      *process;
+   PRConstrEnvHolder *constrEnvHolder;
+   int32           langMagicNumber;
+   //#endif
+   
+   //======================== Initializations ========================
+   masterEnv        = (MasterEnv*)_PRMasterEnv;
+   
+   thisCoresIdx     = masterVP->coreAnimatedBy;
+   animSlots        = masterEnv->allAnimSlots[thisCoresIdx];
+
+   requestHandler   = masterEnv->requestHandler;
+   slaveAssigner    = masterEnv->slaveAssigner;
+   semanticEnv      = masterEnv->semanticEnv;
+   
+      //initialize, for non-multi-lang, non multi-proc case
+      // default handler gets put into master env by a registration call by lang
+   taskEndHandler   = masterEnv->defaultTaskHandler;
+   
+      HOLISTIC__Insert_Master_Global_Vars;
+   
+   //======================== animationMaster ========================
+   //Do loop gets requests handled and work assigned to slots..
+   // work can either be a task or a resumed slave
+   //Having two cases makes this logic complex.. can be finishing either, and 
+   // then the next available work may be either.. so really have two distinct
+   // loops that are inter-twined.. 
+   while(1){
+       
+      MEAS__Capture_Pre_Master_Point
+
+      //Scan the animation slots
+   numSlotsFilled = 0;
+   for( slotIdx = 0; slotIdx < NUM_ANIM_SLOTS; slotIdx++)
+    {
+      currSlot = animSlots[ slotIdx ];
+
+         //Check if newly-done slave in slot, which will need request handled
+      if( currSlot->workIsDone )
+       { currSlot->workIsDone = FALSE;
+       
+               HOLISTIC__Record_AppResponder_start; //TODO: update to check which process for each slot
+               MEAS__startReqHdlr;
+               
+         
+            //process the request made by the slave (held inside slave struc)
+         slave = currSlot->slaveAssignedToSlot;
+         
+            //check if the completed work was a task..
+         if( slave->taskMetaInfo->isATask )
+          {
+             if( slave->reqst->type == TaskEnd ) 
+              {    //do task end handler, which is registered separately
+                   //note, end hdlr may use semantic data from reqst..
+                //#ifdef  MODE__MULTI_LANG
+                   //get end-task handler
+                //taskEndHandler = lookup( slave->reqst->langMagicNumber, processEnv );
+                taskEndHandler = slave->taskMetaInfo->endTaskHandler;
+                //#endif
+                (*taskEndHandler)( slave, semanticEnv );
+                
+                goto AssignWork;
+              }
+             else  //is a task, and just suspended
+              {    //turn slot slave into free task slave & make replacement
+                if( slave->typeOfVP == TaskSlotSlv ) changeSlvType();
+                
+                //goto normal slave request handling
+                goto SlaveReqHandling; 
+              }
+          }
+         else //is a slave that suspended
+          {
+          SlaveReqHandling:
+            (*requestHandler)( slave, semanticEnv ); //(note: indirect Fn call more efficient when use fewer params, instead re-fetch from slave)
+         
+               HOLISTIC__Record_AppResponder_end;
+               MEAS__endReqHdlr;
+               
+            goto AssignWork;
+          }
+       } //if has suspended slave that needs handling
+      
+         //if slot empty, hand to Assigner to fill with a slave
+      if( currSlot->needsSlaveAssigned )
+       {    //Call plugin's Assigner to give slot a new slave
+               HOLISTIC__Record_Assigner_start;
+               
+       AssignWork:
+     
+         assignedSlaveVP = assignWork( semanticEnv, currSlot );
+       
+            //put the chosen slave into slot, and adjust flags and state
+         if( assignedSlaveVP != NULL )
+          { currSlot->slaveAssignedToSlot = assignedSlaveVP;
+            assignedSlaveVP->animSlotAssignedTo = currSlot;
+            currSlot->needsSlaveAssigned  = FALSE;
+            numSlotsFilled               += 1;
+          }
+         else
+          {
+            currSlot->needsSlaveAssigned  = TRUE; //local write
+          }
+               HOLISTIC__Record_Assigner_end;
+       }//if slot needs slave assigned
+    }//for( slotIdx..
+
+         MEAS__Capture_Post_Master_Point;
+   
+   masterSwitchToCoreCtlr( masterVP ); //returns when ctlr switches back to master
+   flushRegisters();
+   }//while(1) 
+ }
+#endif //MODE__MULTI_LANG
+
+
+
+//This is the master when both multi-lang and multi-process modes are turned on
+#ifdef MODE__MULTI_LANG
+#ifdef MODE__MULTI_PROCESS
+void animationMaster( void *initData, SlaveVP *masterVP )
+ { 
+      //Used while scanning and filling animation slots
+   int32           slotIdx, numSlotsFilled;
+   AnimSlot       *currSlot, **animSlots;
+   SlaveVP        *assignedSlaveVP;  //the slave chosen by the assigner
+   
+      //Local copies, for performance
+   MasterEnv      *masterEnv;
+   SlaveAssigner   slaveAssigner;
+   RequestHandler  requestHandler;
+   PRSemEnv       *semanticEnv;
+   int32           thisCoresIdx;
+
+   SlaveVP        *slave;
+   PRProcess      *process;
+   PRConstrEnvHolder *constrEnvHolder;
+   int32           langMagicNumber;
+   
+   //======================== Initializations ========================
+   masterEnv        = (MasterEnv*)_PRMasterEnv;
+   
+   thisCoresIdx     = masterVP->coreAnimatedBy;
+   animSlots        = masterEnv->allAnimSlots[thisCoresIdx];
+
+   requestHandler   = masterEnv->requestHandler;
+   slaveAssigner    = masterEnv->slaveAssigner;
+   semanticEnv      = masterEnv->semanticEnv;
+   
+      //initialize, for non-multi-lang, non multi-proc case
+      // default handler gets put into master env by a registration call by lang
+   taskEndHandler   = masterEnv->defaultTaskHandler;
+   
+      HOLISTIC__Insert_Master_Global_Vars;
+   
+   //======================== animationMaster ========================
+   //Do loop gets requests handled and work assigned to slots..
+   // work can either be a task or a resumed slave
+   //Having two cases makes this logic complex.. can be finishing either, and 
+   // then the next available work may be either.. so really have two distinct
+   // loops that are inter-twined.. 
+   while(1){
+       
+      MEAS__Capture_Pre_Master_Point
+
+      //Scan the animation slots
+   numSlotsFilled = 0;
+   for( slotIdx = 0; slotIdx < NUM_ANIM_SLOTS; slotIdx++)
+    {
+      currSlot = animSlots[ slotIdx ];
+
+         //Check if newly-done slave in slot, which will need request handled
+      if( currSlot->workIsDone )
+       { currSlot->workIsDone = FALSE;
+       
+               HOLISTIC__Record_AppResponder_start; //TODO: update to check which process for each slot
+               MEAS__startReqHdlr;
+               
+         
+            //process the request made by the slave (held inside slave struc)
+         slave = currSlot->slaveAssignedToSlot;
+         
+            //check if the completed work was a task..
+         if( slave->taskMetaInfo->isATask )
+          {
+             if( slave->reqst->type == TaskEnd ) 
+              {    //do task end handler, which is registered separately
+                   //note, end hdlr may use semantic data from reqst..
+                   //get end-task handler
+                //taskEndHandler = lookup( slave->reqst->langMagicNumber, processEnv );
+                taskEndHandler = slave->taskMetaInfo->endTaskHandler;
+                
+                (*taskEndHandler)( slave, semanticEnv );
+                
+                goto AssignWork;
+              }
+             else  //is a task, and just suspended
+              {    //turn slot slave into free task slave & make replacement
+                if( slave->typeOfVP == TaskSlotSlv ) changeSlvType();
+                
+                //goto normal slave request handling
+                goto SlaveReqHandling; 
+              }
+          }
+         else //is a slave that suspended
+          {
+             
+          SlaveReqHandling:
+            (*requestHandler)( slave, semanticEnv ); //(note: indirect Fn call more efficient when use fewer params, instead re-fetch from slave)
+         
+               HOLISTIC__Record_AppResponder_end;
+               MEAS__endReqHdlr;
+               
+            goto AssignWork;
+          }
+       } //if has suspended slave that needs handling
+      
+         //if slot empty, hand to Assigner to fill with a slave
+      if( currSlot->needsSlaveAssigned )
+       {    //Scan sem environs, looking for one with ready work.
+            // call the Assigner for that sem Env, to give slot a new slave
+               HOLISTIC__Record_Assigner_start;
+               
+       AssignWork:
+     
+         assignedSlaveVP = assignWork( semanticEnv, currSlot );
+       
+            //put the chosen slave into slot, and adjust flags and state
+         if( assignedSlaveVP != NULL )
+          { currSlot->slaveAssignedToSlot = assignedSlaveVP;
+            assignedSlaveVP->animSlotAssignedTo = currSlot;
+            currSlot->needsSlaveAssigned  = FALSE;
+            numSlotsFilled               += 1;
+          }
+         else
+          {
+            currSlot->needsSlaveAssigned  = TRUE; //local write
+          }
+               HOLISTIC__Record_Assigner_end;
+       }//if slot needs slave assigned
+    }//for( slotIdx..
+
+         MEAS__Capture_Post_Master_Point;
+   
+   masterSwitchToCoreCtlr( masterVP ); //returns when ctlr switches back to master
+   flushRegisters();
+   }//while(1) 
+ }
+#endif  //MODE__MULTI_PROCESS
+#endif  //MODE__MULTI_LANG
+
+
+/*This does three things:
+ * 1) ask for a slave ready to resume
+ * 2) if none, then ask for a task, and assign to the slot slave
+ * 3) if none, then prune former task slaves waiting to be recycled.
+ *
+ *Each semantic env has two separate assigners, and
+ * keeps its own work in its own structures.  The master, here,
+ * searches through the semantic environs, takes the first that has work
+ * available, and whatever it returns is assigned to the slot.
+ *However, there is also an override assigner, because static analysis tools
+ * know which languages are grouped together, and the override enables them to
+ * generate a custom assigner that uses info from all the languages in a
+ * unified way.  Don't really expect this to happen, but making it possible.
+ */
+inline SlaveVP *
+assignWork( PRProcessEnv *processEnv, AnimSlot *slot )
+ { SlaveVP     *returnSlv;
+   //VSsSemEnv   *semEnv;
+   //VSsSemData  *semData;
+   int32        coreNum, slotNum;
+   PRTaskMetaInfo *newTaskStub;
+   SlaveVP     *freeTaskSlv;
+
+   
+      //master has to handle slot slaves.. so either assigner returns
+      // taskMetaInfo or else two assigners, one for slaves, other for tasks..     
+   semEnvs = processEnv->semEnvs;
+   numEnvs = processEnv->numSemEnvs;
+   for( envIdx = 0; envIdx < numEnvs; envIdx++ )
+    { semEnv = semEnvs[envIdx];
+      if( semEnv->hasWork )
+       { assigner = semEnv->assigner; 
+         retTaskMetaInfo = (*assigner)( semEnv, slot );
+         
+         return retTaskMetaInfo; //quit, have work
+       }
+    }
+   
+   coreNum = slot->coreSlotIsOn;
+   slotNum = slot->slotIdx;
+ 
+      //first try to get a ready slave
+   returnSlv = getReadySlave();
+
+   if( returnSlv != NULL )
+    { returnSlv->coreAnimatedBy   = coreNum;
+    
+         //have work, so reset Done flag (when work generated on other core)
+      if( processEnv->coreIsDone[coreNum] == TRUE ) //reads are higher perf
+         processEnv->coreIsDone[coreNum] = FALSE;   //don't just write always
+    
+      goto ReturnTheSlv;
+    }
+   
+      //were no slaves, so try to get a ready task.. 
+   newTaskStub = getTaskStub();
+   
+   if( newTaskStub != NULL )
+    { 
+         //get the slot slave to assign the task to..
+      returnSlv = processEnv->slotTaskSlvs[coreNum][slotNum];
+
+         //point slave to task's function, and mark slave as having task
+      PR_int__reset_slaveVP_to_TopLvlFn( returnSlv, 
+                          newTaskStub->taskType->fn, newTaskStub->args );
+      returnSlv->taskStub          = newTaskStub;
+      newTaskStub->slaveAssignedTo = returnSlv;
+      returnSlv->needsTaskAssigned = FALSE;  //slot slave is a "Task" slave type
+      
+         //have work, so reset Done flag, if was set
+      if( processEnv->coreIsDone[coreNum] == TRUE ) //reads are higher perf
+         processEnv->coreIsDone[coreNum] = FALSE;   //don't just write always
+      
+      goto ReturnTheSlv;
+    }
+   else
+    {    //no task, so prune the recycle pool of free task slaves
+      freeTaskSlv = readPrivQ( processEnv->freeTaskSlvRecycleQ );
+      if( freeTaskSlv != NULL )
+       {    //delete to bound the num extras, and deliver shutdown cond
+         handleDissipate( freeTaskSlv, processEnv );
+            //then return NULL
+         returnSlv = NULL;
+         
+         goto ReturnTheSlv;
+       }
+      else
+       { //candidate for shutdown.. if all extras dissipated, and no tasks
+         // and no ready to resume slaves, then no way to generate
+         // more tasks (on this core -- other core might have task still)
+         if( processEnv->numLiveExtraTaskSlvs == 0 && 
+             processEnv->numLiveThreadSlvs == 0 )
+          { //This core sees no way to generate more tasks, so say it
+            if( processEnv->coreIsDone[coreNum] == FALSE )
+             { processEnv->numCoresDone += 1;
+               processEnv->coreIsDone[coreNum] = TRUE;
+               #ifdef DEBUG__TURN_ON_SEQUENTIAL_MODE
+               processEnv->shutdownInitiated = TRUE;
+               
+               #else
+               if( processEnv->numCoresDone == NUM_CORES )
+                { //means no cores have work, and none can generate more
+                  processEnv->shutdownInitiated = TRUE;
+                }
+               #endif
+             }
+          }
+            //check if shutdown has been initiated by this or other core
+         if(processEnv->shutdownInitiated) 
+          { returnSlv = PR_SS__create_shutdown_slave();
+          }
+         else
+            returnSlv = NULL;
+
+         goto ReturnTheSlv; //don't need, but completes pattern
+       } //if( freeTaskSlv != NULL )
+    } //if( newTaskStub == NULL )
+   //outcome: 1)slave was just pointed to task, 2)no tasks, so slave NULL
+ 
+
+ ReturnTheSlv:  //All paths goto here.. to provide single point for holistic..
+
+   #ifdef HOLISTIC__TURN_ON_OBSERVE_UCC
+   if( returnSlv == NULL )
+    { returnSlv = processEnv->idleSlv[coreNum][slotNum]; 
+    
+         //things that would normally happen in resume(), but idle VPs
+         // never go there
+      returnSlv->assignCount++; //gives each idle unit a unique ID
+      Unit newU;
+      newU.vp = returnSlv->slaveID;
+      newU.task = returnSlv->assignCount;
+      addToListOfArrays(Unit,newU,processEnv->unitList);
+
+      if (returnSlv->assignCount > 1) //make a dependency from prev idle unit
+       { Dependency newD;             // to this one
+         newD.from_vp = returnSlv->slaveID;
+         newD.from_task = returnSlv->assignCount - 1;
+         newD.to_vp = returnSlv->slaveID;
+         newD.to_task = returnSlv->assignCount;
+         addToListOfArrays(Dependency, newD ,processEnv->ctlDependenciesList);  
+       }
+    }
+   else //have a slave that will be assigned to the slot
+    { //assignSlv->numTimesAssigned++;
+         //get previous occupant of the slot
+      Unit prev_in_slot = 
+         processEnv->last_in_slot[coreNum * NUM_ANIM_SLOTS + slotNum];
+      if(prev_in_slot.vp != 0) //if not first slave in slot, make dependency
+       { Dependency newD;      // is a hardware dependency
+         newD.from_vp = prev_in_slot.vp;
+         newD.from_task = prev_in_slot.task;
+         newD.to_vp = returnSlv->slaveID;
+         newD.to_task = returnSlv->assignCount;
+         addToListOfArrays(Dependency,newD,processEnv->hwArcs);   
+       }
+      prev_in_slot.vp = returnSlv->slaveID; //make new slave the new previous
+      prev_in_slot.task = returnSlv->assignCount;
+      processEnv->last_in_slot[coreNum * NUM_ANIM_SLOTS + slotNum] =
+         prev_in_slot;        
+    }
+   #endif
+
+   return( returnSlv );
+ }
+
+      
+//=================================================================
+         //#else  //is MODE__MULTI_LANG
+            //For multi-lang mode, first, get the constraint-env holder out of
+            // the process, which is in the slave.
+            //Second, get the magic number out of the request, use it to look up
+            // the constraint Env within the constraint-env holder.
+            //Then get the request handler out of the constr env
+         constrEnvHolder = slave->process->constrEnvHolder;
+         reqst = slave->request;
+         langMagicNumber = reqst->langMagicNumber;
+         semanticEnv = lookup( langMagicNumber, constrEnvHolder ); //a macro
+         if( reqst->type == taskEnd ) //end-task is special
+          {    //need to know what lang's task ended
+            taskEndHandler = semanticEnv->taskEndHandler;
+            (*taskEndHandler)( slave, reqst, semanticEnv ); //can put semantic data into task end reqst, for continuation, etc
+               //this is a slot slave, get a new task for it
+            if( !existsOverrideAssigner )//if exists, is set above, before loop
+             {    //search for task assigner that has work
+               for( a = 0; a < num_assigners; a++ )
+                { if( taskAssigners[a]->hasWork )
+                   { newTaskAssigner = taskAssigners[a];
+                     (*newTaskAssigner)( slave, semanticEnv );
+                     goto GotTask;
+                   }
+                }
+               goto NoTasks;
+             }
+            
+           GotTask:
+            continue; //have work, so do next iter of loop, don't call slave assigner
+          }
+         if( slave->typeOfVP == taskSlotSlv ) changeSlvType();//is suspended task
+            //now do normal suspended slave request handler
+         requestHandler = semanticEnv->requestHandler;
+         //#endif
+
+         
+       }
+         //If make it here, then was no task for this slot
+         //slot empty, hand to Assigner to fill with a slave
+      if( currSlot->needsSlaveAssigned )
+       {    //Call plugin's Assigner to give slot a new slave
+               HOLISTIC__Record_Assigner_start;
+               
+         //#ifdef  MODE__MULTI_LANG
+        NoTasks:
+            //First, choose an Assigner..
+            //There are several Assigners, one for each langlet.. they all
+            // indicate whether they have work available.. just pick the first
+            // one that has work..  Or, if there's a Unified Assigner, call
+            // that one..  So, go down array, checking..
+         if( !existsOverrideAssigner ) 
+          { for( a = 0; a < num_assigners; a++ )
+             { if( assigners[a]->hasWork )
+                { slaveAssigner = assigners[a];
+                  goto GotAssigner;
+                }
+             }
+            //no work, so just continue to next iter of scan loop
+            continue;
+          }
+         //when exists override, the assigner is set, once, above, so do nothing
+        GotAssigner:
+         //#endif
+        
+         assignedSlaveVP =
+          (*slaveAssigner)( semanticEnv, currSlot );
+         
+            //put the chosen slave into slot, and adjust flags and state
+         if( assignedSlaveVP != NULL )
+          { currSlot->slaveAssignedToSlot = assignedSlaveVP;
+            assignedSlaveVP->animSlotAssignedTo = currSlot;
+            currSlot->needsSlaveAssigned  = FALSE;
+            numSlotsFilled               += 1;
+            
+            HOLISTIC__Record_Assigner_end;
+          }
+       }//if slot needs slave assigned
+    }//for( slotIdx..
+
+         MEAS__Capture_Post_Master_Point;
+   
+   masterSwitchToCoreCtlr( masterVP );
+   flushRegisters();
+         DEBUG__printf(FALSE,"came back after switch to core -- so lock released!");
+   }//while(1) 
+ }
+
diff -r 0dc0b8653902 -r 999f2966a3e5 CoreController.c
--- a/CoreController.c	Mon Sep 03 03:34:54 2012 -0700
+++ b/CoreController.c	Wed Sep 19 23:12:44 2012 -0700
@@ -5,7 +5,7 @@
  */
 
 
-#include "VMS.h"
+#include "PR.h"
 
 #include <stdlib.h>
 #include <stdio.h>
@@ -55,9 +55,9 @@
  * amortize the overhead of switching to the master VP and running it.  With
  * multiple animation slots, the time to switch-to-master and the code in
  * the animation master is divided by the number of animation slots.
- *The core controller and animation slots are not fundamental parts of VMS,
+ *The core controller and animation slots are not fundamental parts of PR,
  * but rather optimizations put into the shared-semantic-state version of
- * VMS.  Other versions of VMS will not have a core controller nor scheduling
+ * PR.  Other versions of PR will not have a core controller nor scheduling
  * slots.
  * 
  *The core controller "owns" the physical core, in effect, and is the 
@@ -92,13 +92,13 @@
    thisCoresIdx = thisCoresThdParams->coreNum;
 
       //Assembly that saves addr of label of return instr -- label in assmbly
-   recordCoreCtlrReturnLabelAddr((void**)&(_VMSMasterEnv->coreCtlrReturnPt));
+   recordCoreCtlrReturnLabelAddr((void**)&(_PRMasterEnv->coreCtlrReturnPt));
 
-   animSlots = _VMSMasterEnv->allAnimSlots[thisCoresIdx];
+   animSlots = _PRMasterEnv->allAnimSlots[thisCoresIdx];
    currSlotIdx = 0; //start at slot 0, go up until one empty, then do master
    numRepetitionsWithNoWork = 0;
-   addrOfMasterLock = &(_VMSMasterEnv->masterLock);
-   thisCoresMasterVP = _VMSMasterEnv->masterVPs[thisCoresIdx];
+   addrOfMasterLock = &(_PRMasterEnv->masterLock);
+   thisCoresMasterVP = _PRMasterEnv->masterVPs[thisCoresIdx];
    
    //==================== pthread related stuff ======================
       //pin the pthread to the core -- takes away Linux control
@@ -113,7 +113,7 @@
 
       //make sure the controllers all start at same time, by making them wait
    pthread_mutex_lock(  &suspendLock );
-   while( !(_VMSMasterEnv->setupComplete) )
+   while( !(_PRMasterEnv->setupComplete) )
     { pthread_cond_wait( &suspendCond, &suspendLock );
     }
    pthread_mutex_unlock( &suspendLock );
@@ -209,7 +209,7 @@
     }//while(1)
  }
 
-/*Shutdown of VMS involves several steps, of which this is the last.  This
+/*Shutdown of PR involves several steps, of which this is the last.  This
  * function is jumped to from the asmTerminateCoreCtrl, which is in turn
  * called from endOSThreadFn, which is the top-level-fn of the shutdown
  * slaves.
@@ -218,18 +218,18 @@
 terminateCoreCtlr(SlaveVP *currSlv)
  {
    //first, free shutdown Slv that jumped here, then end the pthread
-   VMS_int__dissipate_slaveVP( currSlv );
+   PR_int__dissipate_slaveVP( currSlv );
    pthread_exit( NULL );
  }
 
 inline uint32_t
 randomNumber()
  {
-	_VMSMasterEnv->seed1 = (uint32)(36969 * (_VMSMasterEnv->seed1 & 65535) + 
-                                   (_VMSMasterEnv->seed1 >> 16) );
-	_VMSMasterEnv->seed2 = (uint32)(18000 * (_VMSMasterEnv->seed2 & 65535) + 
-                                   (_VMSMasterEnv->seed2 >> 16) );
-	return (_VMSMasterEnv->seed1 << 16) + _VMSMasterEnv->seed2;
+	_PRMasterEnv->seed1 = (uint32)(36969 * (_PRMasterEnv->seed1 & 65535) + 
+                                   (_PRMasterEnv->seed1 >> 16) );
+	_PRMasterEnv->seed2 = (uint32)(18000 * (_PRMasterEnv->seed2 & 65535) + 
+                                   (_PRMasterEnv->seed2 >> 16) );
+	return (_PRMasterEnv->seed1 << 16) + _PRMasterEnv->seed2;
  }
 
 
@@ -292,14 +292,14 @@
    
    //===============  Initializations ===================
    thisCoresIdx = 0; //sequential version
-   animSlots = _VMSMasterEnv->allAnimSlots[thisCoresIdx];
+   animSlots = _PRMasterEnv->allAnimSlots[thisCoresIdx];
    currSlotIdx = 0; //start at slot 0, go up until one empty, then do master
    numRepetitionsWithNoWork = 0;
-   addrOfMasterLock = &(_VMSMasterEnv->masterLock);
-   thisCoresMasterVP = _VMSMasterEnv->masterVPs[thisCoresIdx];
+   addrOfMasterLock = &(_PRMasterEnv->masterLock);
+   thisCoresMasterVP = _PRMasterEnv->masterVPs[thisCoresIdx];
    
       //Assembly that saves addr of label of return instr -- label in assmbly
-   recordCoreCtlrReturnLabelAddr((void**)&(_VMSMasterEnv->coreCtlrReturnPt));
+   recordCoreCtlrReturnLabelAddr((void**)&(_PRMasterEnv->coreCtlrReturnPt));
 
    
    //====================== The Core Controller ======================
diff -r 0dc0b8653902 -r 999f2966a3e5 Defines/MEAS__macros_to_be_moved_to_langs.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/Defines/MEAS__macros_to_be_moved_to_langs.h	Wed Sep 19 23:12:44 2012 -0700
@@ -0,0 +1,57 @@
+/*
+ *  Copyright 2009 OpenSourceStewardshipFoundation.org
+ *  Licensed under GNU General Public License version 2
+ *
+ * Author: seanhalle@yahoo.com
+ * 
+ */
+
+#ifndef  _PR_LANG_SPEC_DEFS_H
+#define	_PR_LANG_SPEC_DEFS_H
+
+
+
+//===================  Language-specific Measurement Stuff ===================
+//
+//TODO:  move these into the language implementation directories
+//
+
+
+//===========================================================================
+//VCilk
+
+#ifdef VCILK
+
+#define spawnHistIdx      1 //note: starts at 1
+#define syncHistIdx       2
+
+#define MEAS__Make_Meas_Hists_for_Language() \
+   _PRMasterEnv->measHistsInfo = \
+          makePrivDynArrayOfSize( (void***)&(_PRMasterEnv->measHists), 200); \
+    makeAMeasHist( spawnHistIdx,      "Spawn",        50, 0, 200 ) \
+    makeAMeasHist( syncHistIdx,       "Sync",         50, 0, 200 )
+
+
+#define Meas_startSpawn \
+    int32 startStamp, endStamp; \
+    saveLowTimeStampCountInto( startStamp ); \
+
+#define Meas_endSpawn \
+    saveLowTimeStampCountInto( endStamp ); \
+    addIntervalToHist( startStamp, endStamp, \
+                             _PRMasterEnv->measHists[ spawnHistIdx ] );
+
+#define Meas_startSync \
+    int32 startStamp, endStamp; \
+    saveLowTimeStampCountInto( startStamp ); \
+
+#define Meas_endSync \
+    saveLowTimeStampCountInto( endStamp ); \
+    addIntervalToHist( startStamp, endStamp, \
+                             _PRMasterEnv->measHists[ syncHistIdx ] );
+#endif
+
+//===========================================================================
+
+#endif	/* _PR_LANG_SPEC_DEFS_H */
+
diff -r 0dc0b8653902 -r 999f2966a3e5 Defines/PR_defs.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/Defines/PR_defs.h	Wed Sep 19 23:12:44 2012 -0700
@@ -0,0 +1,43 @@
+/*
+ *  Copyright 2009 OpenSourceStewardshipFoundation.org
+ *  Licensed under GNU General Public License version 2
+ *
+ * Author: seanhalle@yahoo.com
+ * 
+ */
+
+#ifndef  _PR_DEFS_MAIN_H
+#define	_PR_DEFS_MAIN_H
+#define _GNU_SOURCE
+
+//===========================  PR-wide defs  ===============================
+
+#define SUCCESS 0
+
+   //the defs of writePrivQ, etc. are only looked up after macro-expansion,
+   // so these defs can be at the top, and writePrivQ defined later on..
+#define writePRQ     writePrivQ
+#define readPRQ      readPrivQ
+#define makePRQ      makePrivQ
+#define numInPRQ     numInPrivQ
+#define PRQueueStruc PrivQueueStruc
+
+
+/*The language should re-define this, but need a default in case it doesn't*/
+#ifndef _LANG_NAME_
+#define _LANG_NAME_ ""
+#endif
+
+//======================  Hardware Constants ============================
+#include "PR_defs__HW_constants.h"
+
+//======================  Macros  ======================
+   //for turning macros and other PR features on and off
+#include "PR_defs__turn_on_and_off.h"
+
+#include "../Services_Offered_by_PR/Debugging/DEBUG__macros.h"
+#include "../Services_Offered_by_PR/Measurement_and_Stats/MEAS__macros.h"
+
+//===========================================================================
+#endif	/* _PR_DEFS_MAIN_H */
+
diff -r 0dc0b8653902 -r 999f2966a3e5 Defines/PR_defs__HW_constants.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/Defines/PR_defs__HW_constants.h	Wed Sep 19 23:12:44 2012 -0700
@@ -0,0 +1,54 @@
+/*
+ *  Copyright 2012 OpenSourceStewardshipFoundation
+ *  Licensed under BSD
+ *
+ * Author: seanhalle@yahoo.com
+ * 
+ */
+
+#ifndef _PR_HW_SPEC_DEFS_H
+#define	_PR_HW_SPEC_DEFS_H
+#define _GNU_SOURCE
+
+
+//=========================  Hardware related Constants =====================
+   //This value is the number of hardware threads in the shared memory
+   // machine
+#define NUM_CORES        4
+
+   // tradeoff amortizing master fixed overhead vs imbalance potential
+   // when work-stealing, can make bigger, at risk of losing cache affinity
+#define NUM_ANIM_SLOTS  1
+
+   //These are for backoff inside core-loop, which reduces lock contention
+#define NUM_REPS_W_NO_WORK_BEFORE_YIELD      10
+#define NUM_REPS_W_NO_WORK_BEFORE_BACKOFF    2
+#define MASTERLOCK_RETRIES_BEFORE_YIELD      100
+#define NUM_TRIES_BEFORE_DO_BACKOFF          10
+#define GET_LOCK_BACKOFF_WEIGHT 100
+   
+   // stack size in virtual processors created
+#define VIRT_PROCR_STACK_SIZE 0x8000 /* 32K */
+
+   // memory for PR_int__malloc
+#define MALLOC_ADDITIONAL_MEM_FROM_OS_SIZE 0x8000000 /* 128M */
+
+   //Frequency of TS counts -- have to do tests to verify
+   //NOTE: turn off (in BIOS)  TURBO-BOOST and SPEED-STEP else won't be const
+#define TSCOUNT_FREQ 3180000000
+
+#define CACHE_LINE_SZ  256
+#define PAGE_SIZE     4096
+
+//To prevent false-sharing, aligns a variable to a cache-line boundary.
+//No need to use for local vars because those are never shared between cores
+#define __align_to_cacheline__ __attribute__ ((aligned(CACHE_LINE_SZ)))
+
+//aligns a pointer to a cacheline. The memory area has to contain at least
+//CACHE_LINE_SZ bytes more than needed
+#define __align_address(ptr) ((void*)(((uintptr_t)(ptr))&((uintptr_t)(~0x0FF))))
+
+//===========================================================================
+
+#endif	/* _PR_HW_SPEC_DEFS_H */
+
diff -r 0dc0b8653902 -r 999f2966a3e5 Defines/VMS_defs.h
--- a/Defines/VMS_defs.h	Mon Sep 03 03:34:54 2012 -0700
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,43 +0,0 @@
-/*
- *  Copyright 2009 OpenSourceStewardshipFoundation.org
- *  Licensed under GNU General Public License version 2
- *
- * Author: seanhalle@yahoo.com
- * 
- */
-
-#ifndef  _VMS_DEFS_MAIN_H
-#define	_VMS_DEFS_MAIN_H
-#define _GNU_SOURCE
-
-//===========================  VMS-wide defs  ===============================
-
-#define SUCCESS 0
-
-   //only after macro-expansion are the defs of writePrivQ, aso looked up
-   // so these defs can be at the top, and writePrivQ defined later on..
-#define writeVMSQ     writePrivQ
-#define readVMSQ      readPrivQ
-#define makeVMSQ      makePrivQ
-#define numInVMSQ     numInPrivQ
-#define VMSQueueStruc PrivQueueStruc
-
-
-/*The language should re-define this, but need a default in case it doesn't*/
-#ifndef _LANG_NAME_
-#define _LANG_NAME_ ""
-#endif
-
-//======================  Hardware Constants ============================
-#include "VMS_defs__HW_constants.h"
-
-//======================  Macros  ======================
-   //for turning macros and other VMS features on and off
-#include "VMS_defs__turn_on_and_off.h"
-
-#include "../Services_Offered_by_VMS/Debugging/DEBUG__macros.h"
-#include "../Services_Offered_by_VMS/Measurement_and_Stats/MEAS__macros.h"
-
-//===========================================================================
-#endif	/*  */
-
diff -r 0dc0b8653902 -r 999f2966a3e5 Defines/VMS_defs__HW_constants.h
--- a/Defines/VMS_defs__HW_constants.h	Mon Sep 03 03:34:54 2012 -0700
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,54 +0,0 @@
-/*
- *  Copyright 2012 OpenSourceStewardshipFoundation
- *  Licensed under BSD
- *
- * Author: seanhalle@yahoo.com
- * 
- */
-
-#ifndef _VMS_HW_SPEC_DEFS_H
-#define	_VMS_HW_SPEC_DEFS_H
-#define _GNU_SOURCE
-
-
-//=========================  Hardware related Constants =====================
-   //This value is the number of hardware threads in the shared memory
-   // machine
-#define NUM_CORES        4
-
-   // tradeoff amortizing master fixed overhead vs imbalance potential
-   // when work-stealing, can make bigger, at risk of losing cache affinity
-#define NUM_ANIM_SLOTS  1
-
-   //These are for backoff inside core-loop, which reduces lock contention
-#define NUM_REPS_W_NO_WORK_BEFORE_YIELD      10
-#define NUM_REPS_W_NO_WORK_BEFORE_BACKOFF    2
-#define MASTERLOCK_RETRIES_BEFORE_YIELD      100
-#define NUM_TRIES_BEFORE_DO_BACKOFF          10
-#define GET_LOCK_BACKOFF_WEIGHT 100
-   
-   // stack size in virtual processors created
-#define VIRT_PROCR_STACK_SIZE 0x8000 /* 32K */
-
-   // memory for VMS_int__malloc
-#define MALLOC_ADDITIONAL_MEM_FROM_OS_SIZE 0x8000000 /* 128M */
-
-   //Frequency of TS counts -- have to do tests to verify
-   //NOTE: turn off (in BIOS)  TURBO-BOOST and SPEED-STEP else won't be const
-#define TSCOUNT_FREQ 3180000000
-
-#define CACHE_LINE_SZ  256
-#define PAGE_SIZE     4096
-
-//To prevent false-sharing, aligns a variable to a cache-line boundary.
-//No need to use for local vars because those are never shared between cores
-#define __align_to_cacheline__ __attribute__ ((aligned(CACHE_LINE_SZ)))
-
-//aligns a pointer to cacheline. The memory area has to contain at least
-//CACHE_LINE_SZ bytes more then needed
-#define __align_address(ptr) ((void*)(((uintptr_t)(ptr))&((uintptr_t)(~0x0FF))))
-
-//===========================================================================
-
-#endif	/* _VMS_DEFS_H */
-
diff -r 0dc0b8653902 -r 999f2966a3e5 HW_Dependent_Primitives/PR__HW_measurement.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/HW_Dependent_Primitives/PR__HW_measurement.c	Wed Sep 19 23:12:44 2012 -0700
@@ -0,0 +1,87 @@
+#include <unistd.h>
+#include <fcntl.h>
+#include <linux/types.h>
+#include <linux/perf_event.h>
+#include <errno.h>
+#include <sys/syscall.h>
+#include <sys/prctl.h>
+
+#include "../PR.h"
+
+void setup_perf_counters(){
+#ifdef HOLISTIC__TURN_ON_PERF_COUNTERS
+    struct perf_event_attr hw_event;
+   memset(&hw_event,0,sizeof(hw_event));
+	hw_event.size = sizeof(struct perf_event_attr);
+	hw_event.disabled = 1;
+	hw_event.inherit = 1; /* children inherit it   */
+	hw_event.pinned = 1; /* must always be on PMU */
+	hw_event.exclusive = 0; /* 0: may share the PMU with other groups */
+	hw_event.exclude_user = 0; /* 0: do count user-space events */
+	hw_event.exclude_kernel = 0; /* 0: do count kernel events */
+	hw_event.exclude_hv = 0; /* 0: do count hypervisor events */
+	hw_event.exclude_idle = 0; /* 0: keep counting when idle */
+
+        int coreIdx;
+   for( coreIdx = 0; coreIdx < NUM_CORES; coreIdx++ )
+    {
+       hw_event.type = PERF_TYPE_HARDWARE;	
+       hw_event.config = PERF_COUNT_HW_CPU_CYCLES; //cycles
+        _PRMasterEnv->cycles_counter_fd[coreIdx] = syscall(__NR_perf_event_open, &hw_event,
+ 		0,//pid_t pid, 
+		coreIdx,//int cpu, 
+		-1,//int group_fd,
+		0//unsigned long flags
+	);
+        if (_PRMasterEnv->cycles_counter_fd[coreIdx]<0){
+            fprintf(stderr,"On core %d: ",coreIdx);
+            perror("Failed to open cycles counter");
+        }
+        hw_event.type = PERF_TYPE_HARDWARE;
+        hw_event.config = PERF_COUNT_HW_INSTRUCTIONS; //instrs
+        _PRMasterEnv->instrs_counter_fd[coreIdx] = syscall(__NR_perf_event_open, &hw_event,
+ 		0,//pid_t pid, 
+		coreIdx,//int cpu, 
+		-1,//int group_fd,
+		0//unsigned long flags
+	);
+        if (_PRMasterEnv->instrs_counter_fd[coreIdx]<0){
+            fprintf(stderr,"On core %d: ",coreIdx);
+            perror("Failed to open instrs counter");
+        }
+        hw_event.type = PERF_TYPE_HW_CACHE;
+        hw_event.config = PERF_COUNT_HW_CACHE_L1D <<  0  |
+	(PERF_COUNT_HW_CACHE_OP_READ		<<  8) |
+	(PERF_COUNT_HW_CACHE_RESULT_MISS	<< 16); //cache misses
+        _PRMasterEnv->cachem_counter_fd[coreIdx] = syscall(__NR_perf_event_open, &hw_event,
+ 		0,//pid_t pid, 
+		coreIdx,//int cpu, 
+		-1,//int group_fd,
+		0//unsigned long flags
+	);
+        if (_PRMasterEnv->cachem_counter_fd[coreIdx]<0){
+            fprintf(stderr,"On core %d: ",coreIdx);
+            perror("Failed to open cache miss counter");
+            exit(1);
+        }
+   }
+        
+   prctl(PR_TASK_PERF_EVENTS_ENABLE);
+#endif
+}
+
+__inline__ uint64_t rdtsc(){
+    uint32_t lo, hi;
+    __asm__ __volatile__ (      // serialize
+    "xorl %%eax,%%eax \n        cpuid"
+    ::: "%rax", "%rbx", "%rcx", "%rdx");
+    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi)); 
+   /* asm volatile("RDTSC;"                   
+                 "movl %%eax, %0;"         
+                 "movl %%edx, %1;"         
+               : "=m" (lo), "=m" (hi)
+               :                        
+               : "%eax", "%edx"         
+                ); */
+    return (uint64_t)hi << 32 | lo;
+}
\ No newline at end of file
diff -r 0dc0b8653902 -r 999f2966a3e5 HW_Dependent_Primitives/PR__HW_measurement.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/HW_Dependent_Primitives/PR__HW_measurement.h	Wed Sep 19 23:12:44 2012 -0700
@@ -0,0 +1,63 @@
+/*
+ *  Copyright 2009 OpenSourceStewardshipFoundation.org
+ *  Licensed under GNU General Public License version 2
+ *
+ * Author: seanhalle@yahoo.com
+ * 
+ */
+
+#ifndef _PR__HW_MEASUREMENT_H
+#define	_PR__HW_MEASUREMENT_H
+#define _GNU_SOURCE
+
+
+//===================  Types for Captured Measurements  =====================
+
+typedef union
+ { uint32 lowHigh[2];
+   uint64 longVal;
+ }
+TSCountLowHigh;
+
+
+//===================  Macros to Capture Measurements  ======================
+//
+//===== RDTSC wrapper ===== 
+//Also runs with x86_64 code
+#define saveTSCLowHigh(lowHighIn) \
+   asm volatile("RDTSC;                   \
+                 movl %%eax, %0;          \
+                 movl %%edx, %1;"         \
+   /* outputs */ : "=m" (lowHighIn.lowHigh[0]), "=m" (lowHighIn.lowHigh[1])\
+   /* inputs  */ :                        \
+   /* clobber */ : "%eax", "%edx"         \
+                );
+
+#define saveTimeStampCountInto(low, high) \
+   asm volatile("RDTSC;                   \
+                 movl %%eax, %0;          \
+                 movl %%edx, %1;"         \
+   /* outputs */ : "=m" (low), "=m" (high)\
+   /* inputs  */ :                        \
+   /* clobber */ : "%eax", "%edx"         \
+                );
+
+#define saveLowTimeStampCountInto(low)    \
+   asm volatile("RDTSC;                   \
+                 movl %%eax, %0;"         \
+   /* outputs */ : "=m" (low)             \
+   /* inputs  */ :                        \
+   /* clobber */ : "%eax", "%edx"         \
+                );
+
+inline TSCount getTSCount();
+
+
+   //For code that calculates normalization-offset between TSC counts of
+   // different cores.
+//#define NUM_TSC_ROUND_TRIPS 10
+
+void setup_perf_counters();
+uint64_t rdtsc(void);
+#endif	/* _PR__HW_MEASUREMENT_H */
+
diff -r 0dc0b8653902 -r 999f2966a3e5 HW_Dependent_Primitives/PR__primitives.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/HW_Dependent_Primitives/PR__primitives.c	Wed Sep 19 23:12:44 2012 -0700
@@ -0,0 +1,137 @@
+/*
+ * This File contains all hardware dependent C code.
+ */
+
+
+#include "../PR.h"
+
+/*Reset the stack then set it up with __cdecl structure on it
+ * Except doing a trick for 64 bits, where point slave to helper assembly
+ * that copies the function pointer off stack and into a reg, then
+ * jumps to it.  So, set the resumeInstrPtr to the helper-assembly.
+ *This is for first-time startup of slave.. it trashes the stack.
+ *No registers saved into old stack frame, and no animator state to
+ * return to
+ * 
+ *This was factored into separate function because it's used stand-alone in
+ * some wrapper-libraries (but only "int" version, to warn users to check
+ * carefully that it's safe)
+ */
+inline void
+PR_int__reset_slaveVP_to_TopLvlFn( SlaveVP *slaveVP, TopLevelFnPtr fnPtr,
+                              void    *dataParam)
+ { void  *stackPtr;
+
+// Start of Hardware dependent part           
+   
+    //Set slave's instr pointer to a helper Fn that copies params from stack
+   slaveVP->resumeInstrPtr  = (TopLevelFnPtr)&startUpTopLevelFn;
+   
+    //fnPtr takes two params -- void *dataParam & void *animSlv
+    // Stack grows *down*, so start it at highest stack addr, minus room
+    // for 2 params + return addr. Do ptr arith in terms of bytes..
+   stackPtr = 
+     (uint8 *)slaveVP->startOfStack + VIRT_PROCR_STACK_SIZE - 4*sizeof(void*);
+  
+    //setup __cdecl on stack
+    //Normally, return Addr is in loc pointed to by stackPtr, but doing a
+    // trick for 64 bit arch, where put ptr to top-level fn there instead,
+    // and set resumeInstrPtr to a helper-fn that copies the top-level
+    // fn ptr and params into registers.
+    //Then, dataParam is at stackPtr + 8 bytes, & animating SlaveVP above
+    //Do ptr arith in terms of pointers
+   *((SlaveVP**)stackPtr + 2 ) = slaveVP; //rightmost param
+   *((void**)stackPtr + 1 ) = dataParam;  //next  param to left
+   *((void**)stackPtr) = (void*)fnPtr;    //copied to reg by helper Fn
+   
+  
+// end of Hardware dependent part           
+   
+      //core controller will switch to stack & frame pointers stored in slave,
+      // can't use this fn if have state on stack that needs preserving.
+   slaveVP->stackPtr = stackPtr; 
+   slaveVP->framePtr = stackPtr; 
+ }
+
+
+/*Preserve the stack, pushing the __cdecl structure onto it
+ * For 64 bits, params passed in regs, so point slave to helper assembly
+ * that copies the arguments off stack and into regs, then
+ * jumps to Fn.  So, set the resumeInstrPtr to the helper-assembly.
+ * 
+ *This preserves the stack state that existed at the time the slave was suspended.
+ */
+inline void
+PR_int__point_slaveVP_to_OneParamFn( SlaveVP *slaveVP, void *fnPtr,
+                              void    *param)
+ { void  *stackPtr;
+
+// Start of Hardware dependent part           
+   
+    // Get the slave's current stack ptr, and make room for param + ret addr
+   stackPtr = ((void **)slaveVP->stackPtr - 2);
+  
+    //save slave's current instr ptr as the return addr, so stack looks
+    // just like it does after a call instr.
+    //Put argument plus fn addr onto stack -- helper will copy into regs
+    // then jump to the fn
+    //fnPtr is just below top of stack, param is above at stackPtr + 8 bytes
+   *((void**)stackPtr + 1 ) = param;
+   *((void**)stackPtr) = slaveVP->resumeInstrPtr; //acts as return addr
+   *((void**)stackPtr - 1) = (void*)fnPtr;        //what helper jmps to
+   
+    //Set slave's instr pointer to a helper Fn that copies params from stack
+   slaveVP->resumeInstrPtr  = (TopLevelFnPtr)&jmpToOneParamFn;
+   
+// end of Hardware dependent part           
+   
+      //core controller will switch to stack & frame pointers stored in slave,
+      // then jmp to helper Fn, which will then move param to register used
+      // to pass argument and jmp to fnPtr saved on stack.
+      //That fn should save the framePtr on stack and make room
+      // for its own frame, as normal.  So don't modify framePtr, only stack
+   slaveVP->stackPtr = stackPtr;
+ }
+
+
+/*Same as for one-parameter function, but puts two arguments on stack
+ *Preserve the stack, pushing the __cdecl structure onto it
+ * For 64 bits, params passed in regs, so point slave to helper assembly
+ * that copies the arguments off stack and into regs, then
+ * jumps to Fn.  So, set the resumeInstrPtr to the helper-assembly.
+ * 
+ *This preserves the stack state that existed at the time the slave was suspended.
+ */
+inline void
+PR_int__point_slaveVP_to_TwoParamFn( SlaveVP *slaveVP, void *fnPtr,
+                              void    *param1, void *param2)
+ { void  *stackPtr;
+
+// Start of Hardware dependent part           
+   
+    // Get the slave's current stack ptr, and make room for 2 params + ret addr
+   stackPtr = ((void **)slaveVP->stackPtr - 3);
+  
+    //save slave's current instr ptr as the return addr, so stack looks
+    // just like it does after a call instr.
+    //Put argument plus fn addr onto stack -- helper will copy into regs
+    // then jump to the fn
+    //fnPtr is just below top of stack, param1 is above at stackPtr + 8 bytes
+   *((void**)stackPtr + 2 ) = param2;
+   *((void**)stackPtr + 1 ) = param1;
+   *((void**)stackPtr) = slaveVP->resumeInstrPtr; //acts as return addr
+   *((void**)stackPtr - 1) = (void*)fnPtr;        //what helper jmps to
+   
+    //Set slave's instr pointer to a helper Fn that copies params from stack
+   slaveVP->resumeInstrPtr  = (TopLevelFnPtr)&jmpToTwoParamFn;
+   
+// end of Hardware dependent part           
+   
+      //core controller will switch to stack & frame pointers stored in slave,
+      // then jmp to helper Fn, which will then move param to register used
+      // to pass argument and jmp to fnPtr saved on stack.
+      //That fn should save the framePtr on stack and make room
+      // for its own frame, as normal.  So don't modify framePtr, only stack
+   slaveVP->stackPtr = stackPtr;
+ }
+
diff -r 0dc0b8653902 -r 999f2966a3e5 HW_Dependent_Primitives/PR__primitives.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/HW_Dependent_Primitives/PR__primitives.h	Wed Sep 19 23:12:44 2012 -0700
@@ -0,0 +1,55 @@
+/*
+ *  Copyright 2009 OpenSourceStewardshipFoundation.org
+ *  Licensed under GNU General Public License version 2
+ *
+ * Author: seanhalle@yahoo.com
+ * 
+ */
+
+#ifndef  _PR__PRIMITIVES_H
+#define	_PR__PRIMITIVES_H
+#define _GNU_SOURCE
+
+void 
+recordCoreCtlrReturnLabelAddr(void **returnAddress);
+
+void 
+switchToSlv(SlaveVP *nextSlave);
+
+void 
+switchToCoreCtlr(SlaveVP *nextSlave);
+
+void 
+masterSwitchToCoreCtlr(SlaveVP *nextSlave);
+
+void 
+startUpTopLevelFn();
+
+void 
+jmpToOneParamFn();
+
+void 
+jmpToTwoParamFn();
+
+void *
+asmTerminateCoreCtlr(SlaveVP *currSlv);
+
+#define flushRegisters() \
+        asm volatile ("":::"%rbx", "%r12", "%r13","%r14","%r15")
+
+void
+PR_int__save_return_into_ptd_to_loc_then_do_ret(void *ptdToLoc);
+
+void
+PR_int__return_to_addr_in_ptd_to_loc(void *ptdToLoc);
+
+inline void
+PR_int__point_slaveVP_to_OneParamFn( SlaveVP *slaveVP, void *fnPtr,
+                              void    *param);
+
+inline void
+PR_int__point_slaveVP_to_TwoParamFn( SlaveVP *slaveVP, void *fnPtr,
+                              void    *param1, void *param2);
+
+#endif	/* _PR__PRIMITIVES_H */
+
diff -r 0dc0b8653902 -r 999f2966a3e5 HW_Dependent_Primitives/PR__primitives_asm.s
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/HW_Dependent_Primitives/PR__primitives_asm.s	Wed Sep 19 23:12:44 2012 -0700
@@ -0,0 +1,189 @@
+.data
+
+
+.text
+
+//Save the coreCtlr's return-label address into the location pointed to
+//Arguments: Pointer to variable holding address
+.globl recordCoreCtlrReturnLabelAddr
+recordCoreCtlrReturnLabelAddr:
+    movq    $coreCtlrReturn, %rcx  #load label address
+    movq    %rcx, (%rdi)           #save address to pointer
+    ret
+
+
+//Trick for 64 bit arch -- copies args from stack into regs, then does jmp to
+// the top-level function, which was pointed to by the stack-ptr
+.globl startUpTopLevelFn
+startUpTopLevelFn:
+    movq    %rdi      , %rsi #get second argument from first argument of switchSlv
+    movq    0x08(%rsp), %rdi #get first argument from stack
+    movq    (%rsp)    , %rax #get top-level function's addr from stack
+    jmp     *%rax            #jump to the top-level function
+
+
+//Args passed in regs in 64 bit arch. This copies args from stack into regs,
+// then does jmp to the function, whose addr is on stack.
+//For 64bit, %rdi is first arg, %rsi is second arg to function
+//The top of stack is a valid return addr (old value of slaveVP's instrPtr),
+// and the fnPtr is just below the top of stack (will be overwritten when
+// fn saves the frame ptr)
+.globl jmpToOneParamFn
+jmpToOneParamFn:
+    movq    0x08(%rsp), %rdi #get the argument from stack
+    movq   -0x08(%rsp), %rax #get function's addr from stack
+    jmp     *%rax            #jump to the function
+
+.globl jmpToTwoParamFn
+jmpToTwoParamFn:
+    movq    0x10(%rsp), %rsi #get the second argument from stack
+    movq    0x08(%rsp), %rdi #get the first argument from stack
+    movq   -0x08(%rsp), %rax #get function's addr from stack
+    jmp     *%rax            #jump to the function
+
+
+//Switches from CoreCtlr to either a normal Slv VP or the Master VP
+//switch to VP's stack and frame ptr then jump to VP's next-instr-ptr
+/* SlaveVP  offsets:
+ * 0x00  stackPtr
+ * 0x08 framePtr
+ * 0x10 resumeInstrPtr
+ * 0x18 coreCtlrFramePtr
+ * 0x20 coreCtlrStackPtr
+ *
+ * _PRMasterEnv  offsets:
+ * 0x00 coreCtlrReturnPt
+ * 0x100 masterLock
+ */
+.globl switchToSlv
+switchToSlv:
+    #SlaveVP in %rdi
+    movq    %rsp      , 0x20(%rdi)   #save core ctlr stack pointer 
+    movq    %rbp      , 0x18(%rdi)   #save core ctlr frame pointer
+    movq    0x00(%rdi), %rsp         #restore stack pointer
+    movq    0x08(%rdi), %rbp         #restore frame pointer
+    movq    0x10(%rdi), %rax         #get jmp pointer
+    jmp     *%rax                    #jmp to Slv
+coreCtlrReturn:
+    ret
+
+    
+//switches to core controller. saves return address
+/* SlaveVP  offsets:
+ * 0x00  stackPtr
+ * 0x08 framePtr
+ * 0x10 resumeInstrPtr
+ * 0x18 coreCtlrFramePtr
+ * 0x20 coreCtlrStackPtr
+ *
+ * _PRMasterEnv  offsets:
+ * 0x00 coreCtlrReturnPt
+ * 0x100 masterLock
+ */
+.globl switchToCoreCtlr
+switchToCoreCtlr:
+    #SlaveVP in %rdi
+    movq    $SlvReturn, 0x10(%rdi)   #store return address
+    movq    %rsp      , 0x00(%rdi)   #save stack pointer 
+    movq    %rbp      , 0x08(%rdi)   #save frame pointer
+    movq    0x20(%rdi), %rsp         #restore stack pointer
+    movq    0x18(%rdi), %rbp         #restore frame pointer
+    movq    $_PRMasterEnv, %rcx
+    movq        (%rcx), %rcx         #_PRMasterEnv is pointer to struct
+    movq    0x00(%rcx), %rax         #get coreCtlrReturnPt
+    jmp     *%rax                    #jmp to CoreCtlr
+SlvReturn:
+    ret
+
+
+
+//switches to core controller from master. saves return address
+//Releases masterLock so the next AnimationMaster can be executed
+/* SlaveVP  offsets:
+ * 0x00  stackPtr
+ * 0x08 framePtr
+ * 0x10 resumeInstrPtr
+ * 0x18 coreCtlrFramePtr
+ * 0x20 coreCtlrStackPtr
+ *
+ * _PRMasterEnv  offsets:
+ * 0x00 coreCtlrReturnPt
+ * 0x100 masterLock
+ */
+.globl masterSwitchToCoreCtlr
+masterSwitchToCoreCtlr:
+    #SlaveVP in %rdi
+    movq    $MasterReturn, 0x10(%rdi)   #store return address
+    movq    %rsp      , 0x00(%rdi)   #save stack pointer 
+    movq    %rbp      , 0x08(%rdi)   #save frame pointer
+    movq    0x20(%rdi), %rsp         #restore stack pointer
+    movq    0x18(%rdi), %rbp         #restore frame pointer
+    movq    $_PRMasterEnv, %rcx
+    movq        (%rcx), %rcx         #_PRMasterEnv is pointer to struct
+    movq    0x00(%rcx), %rax         #get CoreCtlr return pt
+    movl    $0x0      , 0x100(%rcx)  #release lock
+    jmp     *%rax                    #jmp to CoreCtlr
+MasterReturn:
+    ret
+
+
+/*Switch to terminateCoreCtlr
+ *This is called by endOSThreadFn, which is the top-level function given
+ * to a shutdown slave.  When such a slave gets switched to, by the core
+ * controller, it runs the top-level function, which calls this, which
+ * then calls terminateCoreCtlr, which ends the pthread.  Note, when execution
+ * gets here, stack is already set up for switchSlv and Slv ptr is in %rdi.
+ *Do not save registers of Slv because this function will never return
+ *
+ * SlaveVP  offsets:
+ * 0x00  stackPtr
+ * 0x08 framePtr
+ * 0x10 resumeInstrPtr
+ * 0x18 coreCtlrFramePtr
+ * 0x20 coreCtlrStackPtr
+ *
+ * _PRMasterEnv  offsets:
+ * 0x00 coreCtlrReturnPt
+ * 0x100 masterLock
+ */
+.globl asmTerminateCoreCtlr
+asmTerminateCoreCtlr:                #SlaveVP ptr is in %rdi
+    movq    0x20(%rdi), %rsp         #restore stack pointer
+    movq    0x18(%rdi), %rbp         #restore frame pointer
+    movq    $terminateCoreCtlr, %rax
+    jmp     *%rax                    #jmp to fn that ends the pthread
+
+
+/*
+ * This one for the sequential version is special. It discards the current stack
+ * and returns directly from the coreCtlr after PR_WL__dissipate_slaveVP was called
+ */
+.globl asmTerminateCoreCtlrSeq
+asmTerminateCoreCtlrSeq:
+    #SlaveVP in %rdi
+    movq    0x20(%rdi), %rsp         #restore stack pointer
+    movq    0x18(%rdi), %rbp         #restore frame pointer
+    #argument is in %rdi
+    call    PR_int__dissipate_slaveVP
+    movq    %rbp      , %rsp        #go to the coreCtlr's stack
+    pop     %rbp        #restore the old frame pointer
+    ret                 #return from core controller
+    
+
+//Takes the return addr off the stack and saves it into the loc pointed to
+// by the parameter passed in via rdi.  Return addr is at 0x8(%rbp) for 64bit
+.globl PR_int__save_return_into_ptd_to_loc_then_do_ret
+PR_int__save_return_into_ptd_to_loc_then_do_ret:
+    movq 0x08(%rbp),   %rax  #get ret address, rbp is the same as in the calling function
+    movq      %rax,   (%rdi) #write ret addr into addr passed as param field
+    ret
+
+
+//Assembly code changes the return addr on the stack to the one
+// pointed to by the parameter, then returns. Stack's return addr is at 0x8(%rbp)
+.globl PR_int__return_to_addr_in_ptd_to_loc
+PR_int__return_to_addr_in_ptd_to_loc:
+    movq   (%rdi),     %rax  #get return addr from addr passed as param
+    movq    %rax, 0x08(%rbp) #write return addr to the stack of the caller
+    ret
+
diff -r 0dc0b8653902 -r 999f2966a3e5 HW_Dependent_Primitives/VMS__HW_measurement.c
--- a/HW_Dependent_Primitives/VMS__HW_measurement.c	Mon Sep 03 03:34:54 2012 -0700
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,87 +0,0 @@
-#include <unistd.h>
-#include <fcntl.h>
-#include <linux/types.h>
-#include <linux/perf_event.h>
-#include <errno.h>
-#include <sys/syscall.h>
-#include <linux/prctl.h>
-
-#include "../VMS.h"
-
-void setup_perf_counters(){
-#ifdef HOLISTIC__TURN_ON_PERF_COUNTERS
-    struct perf_event_attr hw_event;
-   memset(&hw_event,0,sizeof(hw_event));
-	hw_event.size = sizeof(struct perf_event_attr);
-	hw_event.disabled = 1;
-	hw_event.inherit = 1; /* children inherit it   */
-	hw_event.pinned = 1; /* must always be on PMU */
-	hw_event.exclusive = 0; /* only group on PMU     */
-	hw_event.exclude_user = 0; /* don't count user      */
-	hw_event.exclude_kernel = 0; /* ditto kernel          */
-	hw_event.exclude_hv = 0; /* ditto hypervisor      */
-	hw_event.exclude_idle = 0; /* don't count when idle */
-
-        int coreIdx;
-   for( coreIdx = 0; coreIdx < NUM_CORES; coreIdx++ )
-    {
-       hw_event.type = PERF_TYPE_HARDWARE;	
-       hw_event.config = PERF_COUNT_HW_CPU_CYCLES; //cycles
-        _VMSMasterEnv->cycles_counter_fd[coreIdx] = syscall(__NR_perf_event_open, &hw_event,
- 		0,//pid_t pid, 
-		coreIdx,//int cpu, 
-		-1,//int group_fd,
-		0//unsigned long flags
-	);
-        if (_VMSMasterEnv->cycles_counter_fd[coreIdx]<0){
-            fprintf(stderr,"On core %d: ",coreIdx);
-            perror("Failed to open cycles counter");
-        }
-        hw_event.type = PERF_TYPE_HARDWARE;
-        hw_event.config = PERF_COUNT_HW_INSTRUCTIONS; //instrs
-        _VMSMasterEnv->instrs_counter_fd[coreIdx] = syscall(__NR_perf_event_open, &hw_event,
- 		0,//pid_t pid, 
-		coreIdx,//int cpu, 
-		-1,//int group_fd,
-		0//unsigned long flags
-	);
-        if (_VMSMasterEnv->instrs_counter_fd[coreIdx]<0){
-            fprintf(stderr,"On core %d: ",coreIdx);
-            perror("Failed to open instrs counter");
-        }
-        hw_event.type = PERF_TYPE_HW_CACHE;
-        hw_event.config = PERF_COUNT_HW_CACHE_L1D <<  0  |
-	(PERF_COUNT_HW_CACHE_OP_READ		<<  8) |
-	(PERF_COUNT_HW_CACHE_RESULT_MISS	<< 16); //cache misses
-        _VMSMasterEnv->cachem_counter_fd[coreIdx] = syscall(__NR_perf_event_open, &hw_event,
- 		0,//pid_t pid, 
-		coreIdx,//int cpu, 
-		-1,//int group_fd,
-		0//unsigned long flags
-	);
-        if (_VMSMasterEnv->cachem_counter_fd[coreIdx]<0){
-            fprintf(stderr,"On core %d: ",coreIdx);
-            perror("Failed to open cache miss counter");
-            exit(1);
-        }
-   }
-        
-   prctl(PR_TASK_PERF_EVENTS_ENABLE);
-#endif
-}
-
-__inline__ uint64_t rdtsc(){
-    uint32_t lo, hi;
-    __asm__ __volatile__ (      // serialize
-    "xorl %%eax,%%eax \n        cpuid"
-    ::: "%rax", "%rbx", "%rcx", "%rdx");
-    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi)); 
-   /* asm volatile("RDTSC;"                   
-                 "movl %%eax, %0;"         
-                 "movl %%edx, %1;"         
-               : "=m" (lo), "=m" (hi)
-               :                        
-               : "%eax", "%edx"         
-                ); */
-    return (uint64_t)hi << 32 | lo;
-}
\ No newline at end of file
diff -r 0dc0b8653902 -r 999f2966a3e5 HW_Dependent_Primitives/VMS__HW_measurement.h
--- a/HW_Dependent_Primitives/VMS__HW_measurement.h	Mon Sep 03 03:34:54 2012 -0700
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,63 +0,0 @@
-/*
- *  Copyright 2009 OpenSourceStewardshipFoundation.org
- *  Licensed under GNU General Public License version 2
- *
- * Author: seanhalle@yahoo.com
- * 
- */
-
-#ifndef _VMS__HW_MEASUREMENT_H
-#define	_VMS__HW_MEASUREMENT_H
-#define _GNU_SOURCE
-
-
-//===================  Macros to Capture Measurements  ======================
-
-typedef union
- { uint32 lowHigh[2];
-   uint64 longVal;
- }
-TSCountLowHigh;
-
-
-//===================  Macros to Capture Measurements  ======================
-//
-//===== RDTSC wrapper ===== 
-//Also runs with x86_64 code
-#define saveTSCLowHigh(lowHighIn) \
-   asm volatile("RDTSC;                   \
-                 movl %%eax, %0;          \
-                 movl %%edx, %1;"         \
-   /* outputs */ : "=m" (lowHighIn.lowHigh[0]), "=m" (lowHighIn.lowHigh[1])\
-   /* inputs  */ :                        \
-   /* clobber */ : "%eax", "%edx"         \
-                );
-
-#define saveTimeStampCountInto(low, high) \
-   asm volatile("RDTSC;                   \
-                 movl %%eax, %0;          \
-                 movl %%edx, %1;"         \
-   /* outputs */ : "=m" (low), "=m" (high)\
-   /* inputs  */ :                        \
-   /* clobber */ : "%eax", "%edx"         \
-                );
-
-#define saveLowTimeStampCountInto(low)    \
-   asm volatile("RDTSC;                   \
-                 movl %%eax, %0;"         \
-   /* outputs */ : "=m" (low)             \
-   /* inputs  */ :                        \
-   /* clobber */ : "%eax", "%edx"         \
-                );
-
-inline TSCount getTSCount();
-
-
-   //For code that calculates normalization-offset between TSC counts of
-   // different cores.
-//#define NUM_TSC_ROUND_TRIPS 10
-
-void setup_perf_counters();
-uint64_t rdtsc(void);
-#endif	/* */
-
diff -r 0dc0b8653902 -r 999f2966a3e5 HW_Dependent_Primitives/VMS__primitives.c
--- a/HW_Dependent_Primitives/VMS__primitives.c	Mon Sep 03 03:34:54 2012 -0700
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,137 +0,0 @@
-/*
- * This File contains all hardware dependent C code.
- */
-
-
-#include "../VMS.h"
-
-/*Reset the stack then set it up with __cdecl structure on it
- * Except doing a trick for 64 bits, where point slave to helper assembly
- * that copies the function pointer off stack and into a reg, then
- * jumps to it.  So, set the resumeInstrPtr to the helper-assembly.
- *This is for first-time startup of slave.. it trashes the stack.
- *No registers saved into old stack frame, and no animator state to
- * return to
- * 
- *This was factored into separate function because it's used stand-alone in
- * some wrapper-libraries (but only "int" version, to warn users to check
- * carefully that it's safe)
- */
-inline void
-VMS_int__reset_slaveVP_to_TopLvlFn( SlaveVP *slaveVP, TopLevelFnPtr fnPtr,
-                              void    *dataParam)
- { void  *stackPtr;
-
-// Start of Hardware dependent part           
-   
-    //Set slave's instr pointer to a helper Fn that copies params from stack
-   slaveVP->resumeInstrPtr  = (TopLevelFnPtr)&startUpTopLevelFn;
-   
-    //fnPtr takes two params -- void *dataParam & void *animSlv
-    // Stack grows *down*, so start it at highest stack addr, minus room
-    // for 2 params + return addr. Do ptr arith in terms of bytes..
-   stackPtr = 
-     (uint8 *)slaveVP->startOfStack + VIRT_PROCR_STACK_SIZE - 4*sizeof(void*);
-  
-    //setup __cdecl on stack
-    //Normally, return Addr is in loc pointed to by stackPtr, but doing a
-    // trick for 64 bit arch, where put ptr to top-level fn there instead,
-    // and set resumeInstrPtr to a helper-fn that copies the top-level
-    // fn ptr and params into registers.
-    //Then, dataParam is at stackPtr + 8 bytes, & animating SlaveVP above
-    //Do ptr arith in terms of pointers
-   *((SlaveVP**)stackPtr + 2 ) = slaveVP; //rightmost param
-   *((void**)stackPtr + 1 ) = dataParam;  //next  param to left
-   *((void**)stackPtr) = (void*)fnPtr;    //copied to reg by helper Fn
-   
-  
-// end of Hardware dependent part           
-   
-      //core controller will switch to stack & frame pointers stored in slave,
-      // can't use this fn if have state on stack that needs preserving.
-   slaveVP->stackPtr = stackPtr; 
-   slaveVP->framePtr = stackPtr; 
- }
-
-
-/*Preserve the stack, pushing the __cdecl structure onto it
- * For 64 bits, params passed in regs, so point slave to helper assembly
- * that copies the arguments off stack and into regs, then
- * jumps to Fn.  So, set the resumeInstrPtr to the helper-assembly.
- * 
- *This preserves the stack state existed at time slave was suspended.
- */
-inline void
-VMS_int__point_slaveVP_to_OneParamFn( SlaveVP *slaveVP, void *fnPtr,
-                              void    *param)
- { void  *stackPtr;
-
-// Start of Hardware dependent part           
-   
-    // Get the slave's current stack ptr, and make room for param + ret addr
-   stackPtr = ((void **)slaveVP->stackPtr - 2);
-  
-    //save slave's current instr ptr as the return addr, so stack looks
-    // just like it does after a call instr.
-    //Put argument plus fn addr onto stack -- helper will copy into regs
-    // then jump to the fn
-    //fnPtr is just below top of stack, param is above at stackPtr + 8 bytes
-   *((void**)stackPtr + 1 ) = param;
-   *((void**)stackPtr) = slaveVP->resumeInstrPtr; //acts as return addr
-   *((void**)stackPtr - 1) = (void*)fnPtr;        //what helper jmps to
-   
-    //Set slave's instr pointer to a helper Fn that copies params from stack
-   slaveVP->resumeInstrPtr  = (TopLevelFnPtr)&jmpToOneParamFn;
-   
-// end of Hardware dependent part           
-   
-      //core controller will switch to stack & frame pointers stored in slave,
-      // then jmp to helper Fn, which will then move param to register used
-      // to pass argument and jmp to fnPtr saved on stack.
-      //That fn should save the framePtr on stack and make room
-      // for its own frame, as normal.  So don't modify framePtr, only stack
-   slaveVP->stackPtr = stackPtr;
- }
-
-
-/*Same as for one-parameter function, but puts two arguments on stack
- *Preserve the stack, pushing the __cdecl structure onto it
- * For 64 bits, params passed in regs, so point slave to helper assembly
- * that copies the arguments off stack and into regs, then
- * jumps to Fn.  So, set the resumeInstrPtr to the helper-assembly.
- * 
- *This preserves the stack state existed at time slave was suspended.
- */
-inline void
-VMS_int__point_slaveVP_to_TwoParamFn( SlaveVP *slaveVP, void *fnPtr,
-                              void    *param1, void *param2)
- { void  *stackPtr;
-
-// Start of Hardware dependent part           
-   
-    // Get the slave's current stack ptr, and make room for param + ret addr
-   stackPtr = slaveVP->stackPtr - 3;
-  
-    //save slave's current instr ptr as the return addr, so stack looks
-    // just like it does after a call instr.
-    //Put argument plus fn addr onto stack -- helper will copy into regs
-    // then jump to the fn
-    //fnPtr is just below top of stack, param1 is above at stackPtr + 8 bytes
-   *((void**)stackPtr + 2 ) = param2;
-   *((void**)stackPtr + 1 ) = param1;
-   *((void**)stackPtr) = slaveVP->resumeInstrPtr; //acts as return addr
-   *((void**)stackPtr - 1) = (void*)fnPtr;        //what helper jmps to
-   
-    //Set slave's instr pointer to a helper Fn that copies params from stack
-   slaveVP->resumeInstrPtr  = (TopLevelFnPtr)&jmpToTwoParamFn;
-   
-// end of Hardware dependent part           
-   
-      //core controller will switch to stack & frame pointers stored in slave,
-      // then jmp to helper Fn, which will then move param to register used
-      // to pass argument and jmp to fnPtr saved on stack.
-      //That fn should save the framePtr on stack and make room
-      // for its own frame, as normal.  So don't modify framePtr, only stack
-   slaveVP->stackPtr = stackPtr;
- }
-
diff -r 0dc0b8653902 -r 999f2966a3e5 HW_Dependent_Primitives/VMS__primitives.h
--- a/HW_Dependent_Primitives/VMS__primitives.h	Mon Sep 03 03:34:54 2012 -0700
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,55 +0,0 @@
-/*
- *  Copyright 2009 OpenSourceStewardshipFoundation.org
- *  Licensed under GNU General Public License version 2
- *
- * Author: seanhalle@yahoo.com
- * 
- */
-
-#ifndef  _VMS__PRIMITIVES_H
-#define	_VMS__PRIMITIVES_H
-#define _GNU_SOURCE
-
-void 
-recordCoreCtlrReturnLabelAddr(void **returnAddress);
-
-void 
-switchToSlv(SlaveVP *nextSlave);
-
-void 
-switchToCoreCtlr(SlaveVP *nextSlave);
-
-void 
-masterSwitchToCoreCtlr(SlaveVP *nextSlave);
-
-void 
-startUpTopLevelFn();
-
-void 
-jmpToOneParamFn();
-
-void 
-jmpToTwoParamFn();
-
-void *
-asmTerminateCoreCtlr(SlaveVP *currSlv);
-
-#define flushRegisters() \
-        asm volatile ("":::"%rbx", "%r12", "%r13","%r14","%r15")
-
-void
-VMS_int__save_return_into_ptd_to_loc_then_do_ret(void *ptdToLoc);
-
-void
-VMS_int__return_to_addr_in_ptd_to_loc(void *ptdToLoc);
-
-inline void
-VMS_int__point_slaveVP_to_OneParamFn( SlaveVP *slaveVP, void *fnPtr,
-                              void    *param);
-
-inline void
-VMS_int__point_slaveVP_to_TwoParamFn( SlaveVP *slaveVP, void *fnPtr,
-                              void    *param1, void *param2);
-
-#endif	/* _VMS__HW_DEPENDENT_H */
-
diff -r 0dc0b8653902 -r 999f2966a3e5 HW_Dependent_Primitives/VMS__primitives_asm.s
--- a/HW_Dependent_Primitives/VMS__primitives_asm.s	Mon Sep 03 03:34:54 2012 -0700
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,189 +0,0 @@
-.data
-
-
-.text
-
-//Save return label address for the coreCtlr to pointer
-//Arguments: Pointer to variable holding address
-.globl recordCoreCtlrReturnLabelAddr
-recordCoreCtlrReturnLabelAddr:
-    movq    $coreCtlrReturn, %rcx  #load label address
-    movq    %rcx, (%rdi)           #save address to pointer
-    ret
-
-
-//Trick for 64 bit arch -- copies args from stack into regs, then does jmp to
-// the top-level function, which was pointed to by the stack-ptr
-.globl startUpTopLevelFn
-startUpTopLevelFn:
-    movq    %rdi      , %rsi #get second argument from first argument of switchSlv
-    movq    0x08(%rsp), %rdi #get first argument from stack
-    movq    (%rsp)    , %rax #get top-level function's addr from stack
-    jmp     *%rax            #jump to the top-level function
-
-
-//Args passed in regs in 64 bit arch. This copies args from stack into regs,
-// then does jmp to the function, whose addr is on stack.
-//For 64bit, %rdi is first arg, %rsi is second arg to function
-//The top of stack is a valid return addr (old value of slaveVP's instrPtr),
-// and the fnPtr is just below the top of stack (will be overwritten when
-// fn saves the frame ptr)
-.globl jmpToOneParamFn
-jmpToOneParamFn:
-    movq    0x08(%rsp), %rdi #get the argument from stack
-    movq   -0x08(%rsp), %rax #get function's addr from stack
-    jmp     *%rax            #jump to the function
-
-.globl jmpToTwoParamFn
-jmpToTwoParamFn:
-    movq    0x10(%rsp), %rsi #get the second argument from stack
-    movq    0x08(%rsp), %rdi #get the first argument from stack
-    movq   -0x08(%rsp), %rax #get function's addr from stack
-    jmp     *%rax            #jump to the function
-
-
-//Switches form CoreCtlr to either a normal Slv VP or the Master VP
-//switch to VP's stack and frame ptr then jump to VP's next-instr-ptr
-/* SlaveVP  offsets:
- * 0x00  stackPtr
- * 0x08 framePtr
- * 0x10 resumeInstrPtr
- * 0x18 coreCtlrFramePtr
- * 0x20 coreCtlrStackPtr
- *
- * _VMSMasterEnv  offsets:
- * 0x00 coreCtlrReturnPt
- * 0x100 masterLock
- */
-.globl switchToSlv
-switchToSlv:
-    #SlaveVP in %rdi
-    movq    %rsp      , 0x20(%rdi)   #save core ctlr stack pointer 
-    movq    %rbp      , 0x18(%rdi)   #save core ctlr frame pointer
-    movq    0x00(%rdi), %rsp         #restore stack pointer
-    movq    0x08(%rdi), %rbp         #restore frame pointer
-    movq    0x10(%rdi), %rax         #get jmp pointer
-    jmp     *%rax                    #jmp to Slv
-coreCtlrReturn:
-    ret
-
-    
-//switches to core controller. saves return address
-/* SlaveVP  offsets:
- * 0x00  stackPtr
- * 0x08 framePtr
- * 0x10 resumeInstrPtr
- * 0x18 coreCtlrFramePtr
- * 0x20 coreCtlrStackPtr
- *
- * _VMSMasterEnv  offsets:
- * 0x00 coreCtlrReturnPt
- * 0x100 masterLock
- */
-.globl switchToCoreCtlr
-switchToCoreCtlr:
-    #SlaveVP in %rdi
-    movq    $SlvReturn, 0x10(%rdi)   #store return address
-    movq    %rsp      , 0x00(%rdi)   #save stack pointer 
-    movq    %rbp      , 0x08(%rdi)   #save frame pointer
-    movq    0x20(%rdi), %rsp         #restore stack pointer
-    movq    0x18(%rdi), %rbp         #restore frame pointer
-    movq    $_VMSMasterEnv, %rcx
-    movq        (%rcx), %rcx         #_VMSMasterEnv is pointer to struct
-    movq    0x00(%rcx), %rax         #get CoreCtlrStartPt
-    jmp     *%rax                    #jmp to CoreCtlr
-SlvReturn:
-    ret
-
-
-
-//switches to core controller from master. saves return address
-//Releases masterLock so the next AnimationMaster can be executed
-/* SlaveVP  offsets:
- * 0x00  stackPtr
- * 0x08 framePtr
- * 0x10 resumeInstrPtr
- * 0x18 coreCtlrFramePtr
- * 0x20 coreCtlrStackPtr
- *
- * _VMSMasterEnv  offsets:
- * 0x00 coreCtlrReturnPt
- * 0x100 masterLock
- */
-.globl masterSwitchToCoreCtlr
-masterSwitchToCoreCtlr:
-    #SlaveVP in %rdi
-    movq    $MasterReturn, 0x10(%rdi)   #store return address
-    movq    %rsp      , 0x00(%rdi)   #save stack pointer 
-    movq    %rbp      , 0x08(%rdi)   #save frame pointer
-    movq    0x20(%rdi), %rsp         #restore stack pointer
-    movq    0x18(%rdi), %rbp         #restore frame pointer
-    movq    $_VMSMasterEnv, %rcx
-    movq        (%rcx), %rcx         #_VMSMasterEnv is pointer to struct
-    movq    0x00(%rcx), %rax         #get CoreCtlr return pt
-    movl    $0x0      , 0x100(%rcx)  #release lock
-    jmp     *%rax                    #jmp to CoreCtlr
-MasterReturn:
-    ret
-
-
-/*Switch to terminateCoreCtlr
- *This is called by endOSThreadFn, which is the top-level function given
- * to a shutdown slave.  When such a slave gets switched to, by the core
- * controller, it runs the top-level function, which calls this, which
- * then calls terminateCoreCtlr, which ends the pthread.  Note, when get
- * here, stack is already set up for switchSlv and Slv ptr is in %rdi.
- *Do not save registers of Slv because this function will never return
- *
- * SlaveVP  offsets:
- * 0x00  stackPtr
- * 0x08 framePtr
- * 0x10 resumeInstrPtr
- * 0x18 coreCtlrFramePtr
- * 0x20 coreCtlrStackPtr
- *
- * _VMSMasterEnv  offsets:
- * 0x00 coreCtlrReturnPt
- * 0x100 masterLock
- */
-.globl asmTerminateCoreCtlr
-asmTerminateCoreCtlr:                #SlaveVP ptr is in %rdi
-    movq    0x20(%rdi), %rsp         #restore stack pointer
-    movq    0x18(%rdi), %rbp         #restore frame pointer
-    movq    $terminateCoreCtlr, %rax
-    jmp     *%rax                    #jmp to fn that ends the pthread
-
-
-/*
- * This one for the sequential version is special. It discards the current stack
- * and returns directly from the coreCtlr after VMS_WL__dissipate_slaveVP was called
- */
-.globl asmTerminateCoreCtlrSeq
-asmTerminateCoreCtlrSeq:
-    #SlaveVP in %rdi
-    movq    0x20(%rdi), %rsp         #restore stack pointer
-    movq    0x18(%rdi), %rbp         #restore frame pointer
-    #argument is in %rdi
-    call    VMS_int__dissipate_slaveVP
-    movq    %rbp      , %rsp        #goto the coreCtlrs stack
-    pop     %rbp        #restore the old framepointer
-    ret                 #return from core controller
-    
-
-//Takes the return addr off the stack and saves into the loc pointed to by
-// by the parameter passed in via rdi.  Return addr is at 0x8(%rbp) for 64bit
-.globl VMS_int__save_return_into_ptd_to_loc_then_do_ret
-VMS_int__save_return_into_ptd_to_loc_then_do_ret:
-    movq 0x08(%rbp),   %rax  #get ret address, rbp is the same as in the calling function
-    movq      %rax,   (%rdi) #write ret addr into addr passed as param field
-    ret
-
-
-//Assembly code changes the return addr on the stack to the one
-// pointed to by the parameter, then returns. Stack's return addr is at 0x8(%rbp)
-.globl VMS_int__return_to_addr_in_ptd_to_loc
-VMS_int__return_to_addr_in_ptd_to_loc:
-    movq   (%rdi),     %rax  #get return addr from addr passed as param
-    movq    %rax, 0x08(%rbp) #write return addr to the stack of the caller
-    ret
-
diff -r 0dc0b8653902 -r 999f2966a3e5 PR.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/PR.h	Wed Sep 19 23:12:44 2012 -0700
@@ -0,0 +1,442 @@
+/*
+ *  Copyright 2009 OpenSourceStewardshipFoundation.org
+ *  Licensed under GNU General Public License version 2
+ *
+ * Author: seanhalle@yahoo.com
+ * 
+ */
+
+#ifndef _PR_H
+#define	_PR_H
+#define _GNU_SOURCE
+
+#include "DynArray/DynArray.h"
+#include "Hash_impl/PrivateHash.h"
+#include "Histogram/Histogram.h"
+#include "Queue_impl/PrivateQueue.h"
+
+#include "PR_primitive_data_types.h"
+#include "Services_Offered_by_PR/Memory_Handling/vmalloc.h"
+
+#include <pthread.h>
+#include <sys/time.h>
+
+//=================  Defines: included from separate files  =================
+//
+// Note: ALL defines are in other files, none are in here
+//
+#include "Defines/PR_defs.h"
+
+
+//================================ Typedefs =================================
+//
+typedef unsigned long long    TSCount;
+
+typedef struct _AnimSlot     AnimSlot;
+typedef struct _PRReqst      PRReqst;
+typedef struct _SlaveVP       SlaveVP;
+typedef struct _MasterVP      MasterVP;
+typedef struct _IntervalProbe IntervalProbe;
+
+
+typedef SlaveVP *(*SlaveAssigner)  ( void *, AnimSlot*); //semEnv, slot for HW info
+typedef void     (*RequestHandler) ( SlaveVP *, void * ); //prWReqst, semEnv
+typedef void     (*TopLevelFnPtr)  ( void *, SlaveVP * ); //initData, animSlv
+typedef void       TopLevelFn      ( void *, SlaveVP * ); //initData, animSlv
+typedef void     (*ResumeSlvFnPtr) ( SlaveVP *, void * );
+      //=========== MEASUREMENT STUFF ==========
+        MEAS__Insert_Counter_Handler
+      //========================================
+
+//============================ HW Dependent Fns ================================
+
+#include "HW_Dependent_Primitives/PR__HW_measurement.h"
+#include "HW_Dependent_Primitives/PR__primitives.h"
+
+
+//============= Request Related ===========
+//
+
+enum PRReqstType   //avoid starting enums at 0, for debug reasons
+ {
+   semantic = 1,
+   createReq,
+   dissipate,
+   PRSemantic      //goes with PRSemReqst below
+ };
+
+struct _PRReqst
+ {
+   enum PRReqstType  reqType;//used for dissipate and in future for IO requests
+   void              *semReqData;
+
+   PRReqst *nextReqst;
+ };
+//PRReqst
+
+enum PRSemReqstType   //These are equivalent to semantic requests, but for
+ {                     // PR's services available directly to app, like OS
+   make_probe = 1,    // and probe services -- like a PR-wide built-in lang
+   throw_excp,
+   openFile,
+   otherIO
+ };
+
+typedef struct
+ { enum PRSemReqstType reqType;
+   SlaveVP             *requestingSlv;
+   char                *nameStr;  //for create probe
+   char                *msgStr;   //for exception
+   void                *exceptionData;
+ }
+ PRSemReq;
+
+
+//====================  Core data structures  ===================
+
+typedef struct
+ {
+   //for future expansion
+ }
+SlotPerfInfo;
+
+struct _AnimSlot
+ {
+   int           workIsDone;
+   int           needsSlaveAssigned;
+   SlaveVP      *slaveAssignedToSlot;
+   
+   int           slotIdx;  //needed by Holistic Model's data gathering
+   int           coreSlotIsOn;
+   SlotPerfInfo *perfInfo; //used by assigner to pick best slave for core
+ };
+//AnimSlot
+
+enum VPtype 
+ { TaskSlotSlv = 1,//Slave tied to an anim slot, only animates tasks
+   TaskExtraSlv,   //When a suspended task ends, the slave becomes this
+   PersistentSlv,  //the VP is explicitly seen in the app code, or task suspends
+   Slave, //to be removed
+   Master,
+   Shutdown,
+   Idle
+ };
+ 
+/*This structure embodies the state of a slaveVP.  It is reused for masterVP
+ * and shutdownVPs.
+ */
+struct _SlaveVP
+ {    //The offsets of these fields are hard-coded into assembly
+   void       *stackPtr;         //save the core's stack ptr when suspend
+   void       *framePtr;         //save core's frame ptr when suspend
+   void       *resumeInstrPtr;   //save core's program-counter when suspend
+   void       *coreCtlrFramePtr; //restore before jmp back to core controller
+   void       *coreCtlrStackPtr; //restore before jmp back to core controller
+   
+      //============ below this, no fields are used in asm =============
+   
+   int         slaveID;       //each slave given a globally unique ID
+   int         coreAnimatedBy; 
+   void       *startOfStack;  //used to free, and to point slave to Fn
+   enum VPtype typeOfVP;      //Slave vs Master vs Shutdown..
+   int         assignCount;   //Each assign is for one work-unit, so IDs it
+      //note, a scheduling decision is uniquely identified by the triple:
+      // <slaveID, coreAnimatedBy, assignCount> -- used in record & replay
+   
+      //for comm -- between master and coreCtlr & btwn wrapper lib and plugin
+   AnimSlot   *animSlotAssignedTo;
+   PRReqst   *request;      //wrapper lib puts in requests, plugin takes out
+   void       *dataRetFromReq;//Return vals from plugin to Wrapper Lib
+
+      //For using Slave as carrier for data
+   void       *semanticData;  //Lang saves lang-specific things in slave here
+
+        //=========== MEASUREMENT STUFF ==========
+         MEAS__Insert_Meas_Fields_into_Slave;
+         float64     createPtInSecs;  //time VP created, in seconds
+        //========================================
+ };
+//SlaveVP
+
+ 
+/* The one and only global variable, holds many odds and ends
+ */
+typedef struct
+ {    //The offsets of these fields are hard-coded into assembly
+   void            *coreCtlrReturnPt;    //offset to this field used in asm
+   int8             falseSharePad1[256 - sizeof(void*)];
+   int32            masterLock;          //offset to this field used in asm
+   int8             falseSharePad2[256 - sizeof(int32)];
+      //============ below this, no fields are used in asm =============
+
+      //Basic PR infrastructure
+   SlaveVP        **masterVPs;
+   AnimSlot      ***allAnimSlots;
+   
+      //plugin related
+   PRSemEnv       **langlets;
+   
+      //Slave creation -- global count of slaves existing, across langs and processes
+   int32            numSlavesCreated;  //used to give unique ID to processor
+//no reasonable way to do fail-safe when have mult langlets and processes.. have to detect for each langlet separately
+//   int32            numSlavesAlive;    //used to detect fail-safe shutdown
+
+      //Initialization related
+   int32            setupComplete;      //use while starting up coreCtlr
+
+      //Memory management related
+   MallocArrays    *freeLists;
+   int32            amtOfOutstandingMem;//total currently allocated
+
+      //Random number seeds -- random nums used in various places  
+   uint32_t seed1;
+   uint32_t seed2;
+
+      //=========== MEASUREMENT STUFF =============
+       IntervalProbe   **intervalProbes;
+       PtrToPrivDynArray *dynIntervalProbesInfo;
+       HashTable        *probeNameHashTbl;
+       int32             masterCreateProbeID;
+       float64           createPtInSecs; //real-clock time PR initialized
+       Histogram       **measHists;
+       PtrToPrivDynArray *measHistsInfo;
+       MEAS__Insert_Susp_Meas_Fields_into_MasterEnv;
+       MEAS__Insert_Master_Meas_Fields_into_MasterEnv;
+       MEAS__Insert_Master_Lock_Meas_Fields_into_MasterEnv;
+       MEAS__Insert_Malloc_Meas_Fields_into_MasterEnv;
+       MEAS__Insert_Plugin_Meas_Fields_into_MasterEnv;
+       MEAS__Insert_System_Meas_Fields_into_MasterEnv;
+       MEAS__Insert_Counter_Meas_Fields_into_MasterEnv;
+      //==========================================
+ }
+MasterEnv;
+
+//=====================
+typedef struct
+ { int32   langletID; //acts as index into array of langlets in master env
+   void   *langletSemEnv;
+   int32   langMagicNumber;
+   SlaveAssigner    slaveAssigner;
+   RequestHandler   requestHandler;
+   EndTaskHandler   endTaskHandler;
+   
+      //Track slaves created, separately for each langlet (in each process)
+   int32            numSlavesCreated;  //gives ordering to processor creation
+   int32            numSlavesAlive;    //used to detect fail-safe shutdown
+   
+      //when multi-lang, master polls sem env's to find one with work in it..
+      // in single-lang case, flag ignored, master always asks lang for work
+   int32   hasWork;    
+ }
+PRSemEnv;
+
+//=====================  Top Processor level Data Strucs  ======================
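+
+
+
+/*This structure holds all the information PR needs to manage a program.  PR
+ * stores information about what percent of CPU time the program is getting,
+ * plus the state below: the count of live slaves, the result to return, the
+ * seed slave, and the flag, lock, and condition variable that signal when
+ * execution is complete.
+ */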
+typedef struct
+ { //void               *semEnv;
+   //RequestHdlrFnPtr    requestHandler;
+   //SlaveAssignerFnPtr  slaveAssigner;
+   int32               numSlavesLive;
+   void               *resultToReturn;
+  
+   SlaveVP        *seedSlv;   
+   
+      //These are used to coordinate within the main function..?
+   bool32          executionIsComplete;
+   pthread_mutex_t doneLock; //? not sure need these..?
+   pthread_cond_t  doneCond;
+ }
+PRProcess;
+
+
+//=========================  Extra Stuff Data Strucs  =======================
+typedef struct
+ {
+
+ }
+PRExcp; //exception
+
+//=======================  OS Thread related  ===============================
+
+void * coreController( void *paramsIn );  //standard PThreads fn prototype
+void * coreCtlr_Seq( void *paramsIn );  //standard PThreads fn prototype
+void animationMaster( void *initData, SlaveVP *masterVP );
+
+
+typedef struct
+ {
+   void           *endThdPt;
+   unsigned int    coreNum;
+ }
+ThdParams;
+
+//=============================  Global Vars ================================
+
+volatile MasterEnv      *_PRMasterEnv __align_to_cacheline__;
+
+   //these are global, but only used for startup and shutdown
+pthread_t       coreCtlrThdHandles[ NUM_CORES ]; //pthread's virt-procr state
+ThdParams      *coreCtlrThdParams [ NUM_CORES ];
+
+pthread_mutex_t suspendLock;
+pthread_cond_t  suspendCond;
+
+//=========================  Function Prototypes  ===========================
+/* MEANING OF   WL  PI  SS  int PROS
+ * These indicate which places the function is safe to use.  They stand for:
+ * 
+ * WL   Wrapper Library -- wrapper lib code should only use these
+ * PI   Plugin          -- plugin code should only use these
+ * SS   Startup and Shutdown -- designates these relate to startup & shutdown
+ * int  internal to PR -- should not be used in wrapper lib or plugin
+ * PROS means "OS functions for applications to use"
+ * 
+ * PR_int__ functions touch internal PR data structs and are only safe
+ *  to be used inside the master lock.  However, occasionally, they appear
+ * in wrapper-lib or plugin code.  In those cases, very careful analysis
+ * has been done to be sure no concurrency issues could arise.
+ * 
+ * PR_WL__ functions are all safe for use outside the master lock.
+ * 
+ * PROS are only safe for applications to use -- they're like a second
+ * language mixed in -- but they can't be used inside plugin code, and
+ * aren't meant for use in wrapper libraries, because they are themselves
+ * wrapper-library calls!
+ */
+//========== Startup and shutdown ==========
+void
+PR__start();
+
+void
+PR_SS__start_the_work_then_wait_until_done();
+
+SlaveVP* 
+PR_SS__create_shutdown_slave();
+
+void
+PR_SS__shutdown();
+
+void
+PR_SS__cleanup_at_end_of_shutdown();
+
+void
+PR_SS__register_langlets_semEnv( PRSemEnv *semEnv, int32 magicNumber, 
+                              SlaveVP  *seedVP );
+
+
+//==============    ===============
+
+inline SlaveVP *
+PR_int__create_slaveVP( TopLevelFnPtr fnPtr, void *dataParam );
+#define PR_PI__create_slaveVP PR_int__create_slaveVP
+#define PR_WL__create_slaveVP PR_int__create_slaveVP
+
+   //Use this to create a processor inside the entry point & other places
+   // outside the PR system boundary (i.e., code not animated by a SlaveVP or MasterVP)
+SlaveVP *
+PR_ext__create_slaveVP( TopLevelFnPtr fnPtr, void *dataParam );
+
+inline SlaveVP *
+PR_int__create_slaveVP_helper( SlaveVP *newSlv,       TopLevelFnPtr  fnPtr,
+                                void      *dataParam, void           *stackLocs );
+
+inline void
+PR_int__reset_slaveVP_to_TopLvlFn( SlaveVP *slaveVP, TopLevelFnPtr fnPtr,
+                              void    *dataParam);
+
+inline void
+PR_int__point_slaveVP_to_OneParamFn( SlaveVP *slaveVP, void *fnPtr,
+                              void    *param);
+
+inline void
+PR_int__point_slaveVP_to_TwoParamFn( SlaveVP *slaveVP, void *fnPtr,
+                              void    *param1, void *param2);
+
+void
+PR_int__dissipate_slaveVP( SlaveVP *slaveToDissipate );
+#define PR_PI__dissipate_slaveVP PR_int__dissipate_slaveVP
+//WL: dissipate a SlaveVP by sending a request
+
+void
+PR_ext__dissipate_slaveVP( SlaveVP *slaveToDissipate );
+
+void
+PR_int__throw_exception( char *msgStr, SlaveVP *reqstSlv, PRExcp *excpData );
+#define PR_PI__throw_exception  PR_int__throw_exception
+void
+PR_WL__throw_exception( char *msgStr, SlaveVP *reqstSlv,  PRExcp *excpData );
+#define PR_App__throw_exception PR_WL__throw_exception
+
+void *
+PR_int__give_sem_env_for( SlaveVP *animSlv );
+#define PR_PI__give_sem_env_for  PR_int__give_sem_env_for
+#define PR_SS__give_sem_env_for  PR_int__give_sem_env_for
+//No WL version -- not safe!  if use in WL, be sure data rd & wr is stable
+
+
+inline void
+PR_int__get_master_lock();
+
+#define PR_int__release_master_lock() (_PRMasterEnv->masterLock = UNLOCKED)
+
+inline uint32_t
+PR_int__randomNumber();
+
+//==============  Request Related  ===============
+
+void
+PR_int__suspend_slaveVP_and_send_req( SlaveVP *callingSlv );
+
+inline void
+PR_WL__add_sem_request_in_mallocd_PRReqst( void *semReqData, SlaveVP *callingSlv );
+
+inline void
+PR_WL__send_sem_request( void *semReqData, SlaveVP *callingSlv );
+
+void
+PR_WL__send_create_slaveVP_req( void *semReqData, SlaveVP *reqstingSlv );
+
+void inline
+PR_WL__send_dissipate_req( SlaveVP *prToDissipate );
+
+inline void
+PR_WL__send_PRSem_request( void *semReqData, SlaveVP *callingSlv );
+
+PRReqst *
+PR_PI__take_next_request_out_of( SlaveVP *slaveWithReq );
+//#define PR_PI__take_next_request_out_of( slave ) slave->requests
+
+//inline void *
+//PR_PI__take_sem_reqst_from( PRReqst *req );
+#define PR_PI__take_sem_reqst_from( req ) ((req)->semReqData)
+
+void inline
+PR_PI__handle_PRSemReq( PRReqst *req, SlaveVP *requestingSlv, void *semEnv,
+                       ResumeSlvFnPtr resumeSlvFnPtr );
+
+//======================== MEASUREMENT ======================
+uint64
+PR_WL__give_num_plugin_cycles();
+uint32
+PR_WL__give_num_plugin_animations();
+
+
+//========================= Utilities =======================
+inline char *
+PR_int__strDup( char *str );
+
+
+//========================= Probes =======================
+#include "Services_Offered_by_PR/Measurement_and_Stats/probes.h"
+
+//================================================
+#endif	/* _PR_H */
+
diff -r 0dc0b8653902 -r 999f2966a3e5 PR__PI.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/PR__PI.c	Wed Sep 19 23:12:44 2012 -0700
@@ -0,0 +1,121 @@
+/*
+ * Copyright 2010  OpenSourceStewardshipFoundation
+ *
+ * Licensed under BSD
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <malloc.h>
+#include <inttypes.h>
+#include <sys/time.h>
+
+#include "PR.h"
+
+
+/* MEANING OF   WL  PI  SS  int
+ * These indicate which places the function is safe to use.  They stand for:
+ * WL: Wrapper Library
+ * PI: Plugin 
+ * SS: Startup and Shutdown
+ * int: internal to the PR implementation
+ */
+
+//=========================  Local Declarations  ========================
+void inline
+handleMakeProbe( PRSemReq *semReq, void *semEnv, ResumeSlvFnPtr resumeFn );
+
+void inline
+handleThrowException( PRSemReq *semReq, void *semEnv, ResumeSlvFnPtr resumeFn );
+//=======================================================================
+
+ 
+PRReqst *
+PR_PI__take_next_request_out_of( SlaveVP *slaveWithReq )
+ { PRReqst *req;
+
+   req = slaveWithReq->request;
+   if( req == NULL ) return NULL;
+
+   slaveWithReq->request = slaveWithReq->request->nextReqst;
+   return req;
+ }
+
+ 
+
+/*May 2012
+ *CHANGED IMPL -- now a macro in header file
+ *
+ *Turn function into macro that just accesses the request field
+ *
+inline void *
+PR_PI__take_sem_reqst_from( PRReqst *req )
+ {
+   return req->semReqData;
+ }
+*/
+
+
+/* This is for OS requests and PR infrastructure requests, such as to create
+ *  a probe.  A probe is inside the heart of PR-core, it's not part of any
+ *  language -- but it's also a semantic thing that's triggered from and used
+ *  in the application.  So it crosses abstractions, and needs some special
+ *  pattern here for handling such requests.
+ * This is done just as if it were a second language sharing PR-core.
+ * 
+ * This is called from the language's request handler when it sees a request
+ *  of type PRSemantic, which carries a PRSemReq as its semantic data.
+ *
+ * TODO: Later change this, to give probes their own separate plugin & have
+ *  PR-core steer the request to the appropriate plugin.
+ * Do the same for OS calls -- look at it later.
+ */
+void inline
+PR_PI__handle_PRSemReq( PRReqst *req, SlaveVP *requestingSlv, void *semEnv,
+                       ResumeSlvFnPtr resumeFn )
+ { PRSemReq *semReq;
+
+   semReq = PR_PI__take_sem_reqst_from(req);
+   if( semReq == NULL ) return;
+   switch( semReq->reqType )  //sem handlers are defined below in this file
+    {
+      case make_probe:      handleMakeProbe(   semReq, semEnv, resumeFn);
+         break;
+      case throw_excp:  handleThrowException(  semReq, semEnv, resumeFn);
+         break;
+    }
+ }
+
+/*Creates a new interval probe, registers it, and resumes the requester.
+ */
+void inline
+handleMakeProbe( PRSemReq *semReq, void *semEnv, ResumeSlvFnPtr resumeFn )
+ { IntervalProbe *newProbe;
+
+   newProbe          = PR_int__malloc( sizeof(IntervalProbe) );
+   newProbe->nameStr = PR_int__strDup( semReq->nameStr );
+   newProbe->hist    = NULL;
+   newProbe->schedChoiceWasRecorded = FALSE;
+
+      //This runs in masterVP, so no race-condition worries
+   newProbe->probeID =
+            addToDynArray( newProbe, _PRMasterEnv->dynIntervalProbesInfo );
+
+   semReq->requestingSlv->dataRetFromReq = newProbe;
+
+   //This is inside PR, while the resume_slaveVP fn is inside the language, so
+   // the lang passes a pointer to it here, and it is called through that.
+   (*resumeFn)( semReq->requestingSlv, semEnv );
+ }
+
+void inline
+handleThrowException( PRSemReq *semReq, void *semEnv, ResumeSlvFnPtr resumeFn )
+ {
+   PR_int__throw_exception(  semReq->msgStr, semReq->requestingSlv, semReq->exceptionData );
+   
+   (*resumeFn)( semReq->requestingSlv, semEnv );
+ }
+
+
+
diff -r 0dc0b8653902 -r 999f2966a3e5 PR__WL.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/PR__WL.c	Wed Sep 19 23:12:44 2012 -0700
@@ -0,0 +1,160 @@
+/*
+ * Copyright 2010  OpenSourceStewardshipFoundation
+ *
+ * Licensed under BSD
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <malloc.h>
+#include <inttypes.h>
+#include <sys/time.h>
+
+#include "PR.h"
+
+
+/* MEANING OF   WL  PI  SS  int
+ * These indicate which places the function is safe to use.  They stand for:
+ * WL: Wrapper Library
+ * PI: Plugin 
+ * SS: Startup and Shutdown
+ * int: internal to the PR implementation
+ */
+
+
+
+/*For this implementation of PR, it may not make much sense to have the
+ * system of requests for creating a new processor done this way.. but over
+ * the scope of single-master, multi-master, multi-tasking, OS-implementing,
+ * distributed-memory, and so on, this gives PR implementation a chance to
+ * do stuff before suspend, in the SlaveVP, and in the Master before the plugin
+ * is called, as well as in the lang-lib before this is called, and in the
+ * plugin.  So, this gives both PR and language implementations a chance to
+ * intercept at various points and do order-dependent stuff.
+ *Having a standard PRNewPrReqData struc allows the language to create and
+ * free the struc, while PR knows how to get the newSlv if it wants it, and
+ * it lets the lang have lang-specific data related to creation transported
+ * to the plugin.
+ */
+void
+PR_WL__send_create_slaveVP_req( void *semReqData, SlaveVP *reqstingSlv )
+ { PRReqst req;
+
+   req.reqType          = createReq;
+   req.semReqData       = semReqData;
+   req.nextReqst        = reqstingSlv->request;
+   reqstingSlv->request = &req;
+
+   PR_int__suspend_slaveVP_and_send_req( reqstingSlv );
+ }
+
+
+/*
+ *This adds a request to dissipate, then suspends the processor so that the
+ * request handler will receive the request.  The request handler is what
+ * does the work of freeing memory and removing the processor from the
+ * semantic environment's data structures.
+ *The request handler also is what figures out when to shutdown the PR
+ * system -- which causes all the core controller threads to die, and returns from
+ * the call that started up PR to perform the work.
+ *
+ *This form is a bit misleading if one is trying to figure out how PR
+ * works -- it looks like a normal function call, but inside it sends a
+ * request to the request handler and suspends the processor, which jumps
+ * out of the PR_WL__send_dissipate_req function, and out of all nestings
+ * above it, transferring the work of dissipating to the request handler.
+ *The handler then does the actual work -- causing the processor that
+ * animated the call of this function to disappear, and the "hanging" state
+ * of this function to just poof into thin air.  The virtual processor's
+ * trace never returns from this call.  Instead, the trace gets suspended in
+ * this call and all the virtual processor's state disappears -- making that
+ * suspend the last thing in the Slv's trace.
+ */
+void
+PR_WL__send_dissipate_req( SlaveVP *slaveToDissipate )
+ { PRReqst req;
+   req.reqType                = dissipate;
+   req.semReqData             = NULL;  //a dissipate carries no semantic data
+   req.nextReqst              = slaveToDissipate->request;
+   slaveToDissipate->request  = &req;
+
+   PR_int__suspend_slaveVP_and_send_req( slaveToDissipate );
+ }
+
+
+
+/*This call's name indicates that the request is malloc'd -- so the request
+ * handler has to free any extra requests tacked on, using this, before a send.
+ *
+ *This inserts the semantic layer's request data into a standard PR carrier
+ * request struct that is malloc'd.  The semantic request data itself doesn't
+ * need to be malloc'd if this is called inside the same call chain, before
+ * the send of the last request is called.
+ *
+ *The request handler has to call PR_int__free_PRReq for any of these.
+ */
+inline void
+PR_WL__add_sem_request_in_mallocd_PRReqst( void *semReqData,
+                                          SlaveVP *callingSlv )
+ { PRReqst *req;
+
+   req = PR_int__malloc( sizeof(PRReqst) );
+   req->reqType         = semantic;
+   req->semReqData      = semReqData;
+   req->nextReqst       = callingSlv->request;
+   callingSlv->request = req;
+ }
+
+/*This inserts the semantic layer's request data into a standard PR carrier
+ * request struct, which is allocated on the stack of this call; a pointer to
+ * it is sent to the plugin.
+ *Then it suspends, which causes the request to be sent.
+ */
+inline void
+PR_WL__send_sem_request( void *semReqData, SlaveVP *callingSlv )
+ { PRReqst req;
+
+   req.reqType         = semantic;
+   req.semReqData      = semReqData;
+   req.nextReqst       = callingSlv->request;
+   callingSlv->request = &req;
+   
+   PR_int__suspend_slaveVP_and_send_req( callingSlv );
+ }
+
+
+/*May 2012 Not sure what this is..  looks like old idea for PR semantic
+ * request
+ */
+inline void
+PR_WL__send_PRSem_request( void *semReqData, SlaveVP *callingSlv )
+ { PRReqst req;
+
+   req.reqType         = PRSemantic;
+   req.semReqData      = semReqData;
+   req.nextReqst       = callingSlv->request; //grab any other preceding requests
+   callingSlv->request = &req;
+
+   PR_int__suspend_slaveVP_and_send_req( callingSlv );
+ }
+
+/*May 2012
+ *To throw exception from wrapper lib or application, first turn
+ * it into a request, then send the request
+ */
+void
+PR_WL__throw_exception( char *msgStr, SlaveVP *reqstSlv,  PRExcp *excpData )
+ { PRReqst req;
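+   PRSemReq semReq;
+
+   req.reqType         = PRSemantic;
+   req.semReqData      = &semReq;
+   req.nextReqst       = reqstSlv->request; //grab any other preceding requests
+   reqstSlv->request   = &req;
+   semReq.reqType       = throw_excp; //PR_PI__handle_PRSemReq switches on this
+   semReq.requestingSlv = reqstSlv;   //the handler reads this to resume the slave
+   semReq.msgStr        = msgStr;
+   semReq.exceptionData = excpData;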
+   PR_int__suspend_slaveVP_and_send_req( reqstSlv );
+ }
diff -r 0dc0b8653902 -r 999f2966a3e5 PR__int.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/PR__int.c	Wed Sep 19 23:12:44 2012 -0700
@@ -0,0 +1,289 @@
+/*
+ * Copyright 2010  OpenSourceStewardshipFoundation
+ *
+ * Licensed under BSD
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <malloc.h>
+#include <inttypes.h>
+#include <sys/time.h>
+
+#include "PR.h"
+
+
+/* MEANING OF   WL  PI  SS  int
+ * These indicate which places the function is safe to use.  They stand for:
+ * WL: Wrapper Library
+ * PI: Plugin 
+ * SS: Startup and Shutdown
+ * int: internal to the PR implementation
+ */
+
+
+inline SlaveVP *
+PR_int__create_slaveVP( TopLevelFnPtr fnPtr, void *dataParam )
+ { SlaveVP *newSlv;
+   void      *stackLocs;
+
+   newSlv      = PR_int__malloc( sizeof(SlaveVP) );
+   stackLocs   = PR_int__malloc( VIRT_PROCR_STACK_SIZE );
+   if( newSlv == NULL || stackLocs == NULL )
+    { perror("PR_int__malloc stack"); exit(1); }
+
+   _PRMasterEnv->numSlavesAlive += 1;
+
+   return PR_int__create_slaveVP_helper( newSlv, fnPtr, dataParam, stackLocs );
+ }
+
+/* "ext" designates that it's for use outside the PR system -- should only
+ * be called from main thread or other thread -- never from code animated by
+ * a PR virtual processor.
+ */
+inline SlaveVP *
+PR_ext__create_slaveVP( TopLevelFnPtr fnPtr, void *dataParam )
+ { SlaveVP *newSlv;
+   char      *stackLocs;
+
+   newSlv      = malloc( sizeof(SlaveVP) );
+   stackLocs  = malloc( VIRT_PROCR_STACK_SIZE );
+   if( newSlv == NULL || stackLocs == NULL )
+    { perror("malloc stack"); exit(1); }
+
+   _PRMasterEnv->numSlavesAlive += 1;
+
+   return PR_int__create_slaveVP_helper(newSlv, fnPtr, dataParam, stackLocs);
+ }
+
+
+//===========================================================================
+/*There is a label inside this function -- save the addr of this label in
+ * the callingSlv struc, as the pick-up point from which to start the next
+ * work-unit for that slave.  If it turns out registers have to be saved,
+ * then save them in the slave struc too.  Then do an assembly jump to the
+ * CoreCtlr's "done with work-unit" label.  The slave struc is in the request
+ * in the slave that animated the just-ended work-unit, so all the state is
+ * saved there, and will get passed along, inside the request handler, to
+ * the next work-unit for that slave.
+ */
+void
+PR_int__suspend_slaveVP_and_send_req( SlaveVP *animatingSlv )
+ { 
+
+      //This suspended Slv will get assigned by Master again at some
+      // future point
+
+      //return ownership of the Slv and anim slot to Master virt pr
+   animatingSlv->animSlotAssignedTo->workIsDone = TRUE;
+
+        HOLISTIC__Record_HwResponderInvocation_start;
+         MEAS__Capture_Pre_Susp_Point;
+      //This assembly function is a PR primitive that first saves the
+      // stack and frame pointer, plus an addr inside this assembly code.
+      //When core ctlr later gets this slave out of a sched slot, it
+      // restores the stack and frame and then jumps to the addr.. that
+      // jmp causes return from this function.
+      //So, in effect, this function takes a variable amount of wall-clock
+      // time to complete -- the amount of time is determined by the
+      // Master, which makes sure the memory is in a consistent state first.
+   switchToCoreCtlr(animatingSlv);
+   flushRegisters();
+         MEAS__Capture_Post_Susp_Point;
+		 
+   return;
+ }
+
+
+/* "ext" designates that it's for use outside the PR system -- should only
+ * be called from main thread or other thread -- never from code animated by
+ * a SlaveVP, nor from a masterVP.
+ *
+ *Use this version to dissipate Slvs created outside the PR system.
+ */
+void
+PR_ext__dissipate_slaveVP( SlaveVP *slaveToDissipate )
+ {
+   _PRMasterEnv->numSlavesAlive -= 1;
+   if( _PRMasterEnv->numSlavesAlive == 0 )
+    {    //no more work, so shutdown
+      PR_SS__shutdown();  //note, creates shut-down slaves on each core
+    }
+
+   //NOTE: dataParam was given to the processor, so should either have
+      // been alloc'd with PR_int__malloc, or freed by the level above animSlv.
+      //So, all that's left to free here is the stack and the SlaveVP struc
+      // itself
+      //Note, should not stack-allocate the data param -- no guarantee, in
+      // general that creating processor will outlive ones it creates.
+   free( slaveToDissipate->startOfStack );
+   free( slaveToDissipate );
+ }
+
+
+
+/*This must be called by the request handler plugin -- it cannot be called
+ * from the semantic library "dissipate processor" function -- instead, the
+ * semantic layer has to generate a request, and the plug-in calls this
+ * function.
+ *The reason is that this frees the virtual processor's stack -- which is
+ * still in use inside semantic library calls!
+ *
+ *This frees or recycles all the state owned by and comprising the PR
+ * portion of the animating virtual procr.  The request handler must first
+ * free any semantic data created for the processor that didn't use the
+ * PR_malloc mechanism.  Then it calls this, which first asks the malloc
+ * system to disown any state that did use PR_malloc, and then frees the
+ * stack and the processor-struct itself.
+ *If the dissipated processor is the sole (remaining) owner of PR_int__malloc'd
+ * state, then that state gets freed (or sent to recycling) as a side-effect
+ * of dis-owning it.
+ */
+void
+PR_int__dissipate_slaveVP( SlaveVP *animatingSlv )
+ {
+         DEBUG__printf2(dbgRqstHdlr, "PR int dissipate slaveID: %d, alive: %d",animatingSlv->slaveID, _PRMasterEnv->numSlavesAlive-1);
+      //dis-own all locations owned by this processor, causing to be freed
+      // any locations that it is (was) sole owner of
+   _PRMasterEnv->numSlavesAlive -= 1;
+   if( _PRMasterEnv->numSlavesAlive == 0 )
+    {    //no more work, so shutdown
+      PR_SS__shutdown();  //note, creates shut-down processor on each core
+    }
+
+      //NOTE: dataParam was given to the processor, so should either have
+      // been alloc'd with PR_int__malloc, or freed by the level above animSlv.
+      //So, all that's left to free here is the stack and the SlaveVP struc
+      // itself
+      //Note, should not stack-allocate initial data -- no guarantee, in
+      // general that creating processor will outlive ones it creates.
+   PR_int__free( animatingSlv->startOfStack );
+   PR_int__free( animatingSlv );
+ }
+
+/*Anticipating multi-tasking
+ */
+void *
+PR_int__give_sem_env_for( SlaveVP *animSlv )
+ {
+   return _PRMasterEnv->semanticEnv;
+ }
+
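+/*Fills in the fields of a new slave, using the stack passed in, and points
+ * the slave at its top-level function.
+ */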
+inline SlaveVP *
+PR_int__create_slaveVP_helper( SlaveVP *newSlv,    TopLevelFnPtr  fnPtr,
+                     void    *dataParam, void          *stackLocs )
+ {
+   newSlv->startOfStack = stackLocs;
+   newSlv->slaveID      = _PRMasterEnv->numSlavesCreated++;
+   newSlv->request     = NULL;
+   newSlv->animSlotAssignedTo    = NULL;
+   newSlv->typeOfVP     = Slave;
+   newSlv->assignCount  = 0;
+
+   PR_int__reset_slaveVP_to_TopLvlFn( newSlv, fnPtr, dataParam );
+           
+   //============================= MEASUREMENT STUFF ========================
+   #ifdef PROBES__TURN_ON_STATS_PROBES
+   //TODO: make this TSCHiLow or generic equivalent
+   //struct timeval timeStamp;
+   //gettimeofday( &(timeStamp), NULL);
+   //newSlv->createPtInSecs = timeStamp.tv_sec +(timeStamp.tv_usec/1000000.0) -
+   //                                           _PRMasterEnv->createPtInSecs;
+   #endif
+   //========================================================================
+
+   return newSlv;
+ }
+
+
+/*Later, improve this -- for now, just exits the application after printing
+ * the error message.
+ */
+void
+PR_int__throw_exception( char *msgStr, SlaveVP *reqstSlv, PRExcp *excpData )
+ {
+   printf("%s",msgStr);
+   fflush(stdout);
+   exit(1);
+ }
+
+
+inline char *
+PR_int__strDup( char *str )
+ { char *retStr;
+
+   if( str == NULL ) return (char *)NULL;
+   retStr = (char *)PR_int__malloc( strlen(str) + 1 );
+   strcpy( retStr, str );
+
+   return (char *)retStr;
+ }
+
+
+inline void
+PR_int__backoff_for_TooLongToGetLock( int32 numTriesToGetLock );
+
+inline void
+PR_int__get_master_lock()
+ { int32 *addrOfMasterLock;
+ 
+   addrOfMasterLock = &(_PRMasterEnv->masterLock);
+
+   int numTriesToGetLock = 0;
+   int gotLock = 0;
+   
+            MEAS__Capture_Pre_Master_Lock_Point;
+
+   while( !gotLock ) //keep going until get master lock
+    { 
+      numTriesToGetLock++;   //if too many, means too much contention
+      if( numTriesToGetLock > NUM_TRIES_BEFORE_DO_BACKOFF )
+       { PR_int__backoff_for_TooLongToGetLock( numTriesToGetLock );
+       }
+      if( numTriesToGetLock > MASTERLOCK_RETRIES_BEFORE_YIELD ) 
+       { numTriesToGetLock = 0; 
+         pthread_yield();
+       }
+   
+         //try to get the lock
+      gotLock = __sync_bool_compare_and_swap( addrOfMasterLock,
+                                                         UNLOCKED, LOCKED );
+    }
+            MEAS__Capture_Post_Master_Lock_Point;
+ }
+
+/*Used by the backoff to pick a random amount of busy-wait.  Can't use the
+ * system rand because it takes much too long.
+ *Note, the seeds live in the master env and are modified on every call
+ */
+inline uint32_t
+PR_int__randomNumber()
+ {
+	_PRMasterEnv->seed1 = 36969 * (_PRMasterEnv->seed1 & 65535) + 
+                          (_PRMasterEnv->seed1 >> 16);
+	_PRMasterEnv->seed2 = 18000 * (_PRMasterEnv->seed2 & 65535) + 
+                          (_PRMasterEnv->seed2 >> 16);
+	return (_PRMasterEnv->seed1 << 16) + _PRMasterEnv->seed2;
+ }
+
+
+/*Busy-waits for a random number of cycles -- chooses number of cycles 
+ * differently than for the no-work backoff
+ */
+inline void
+PR_int__backoff_for_TooLongToGetLock( int32 numTriesToGetLock )
+ { int32 i, waitIterations;
+   volatile double fakeWorkVar = 0.0; //busy-wait fake work
+
+   waitIterations = 
+    PR_int__randomNumber()% (numTriesToGetLock * GET_LOCK_BACKOFF_WEIGHT);   
+   //addToHist( wait_iterations, coreLoopThdParams->wait_iterations_hist );
+   for( i = 0; i < waitIterations; i++ )
+    { fakeWorkVar += (fakeWorkVar + 32.0) / 2.0; //busy-wait
+    }
+ }
+
diff -r 0dc0b8653902 -r 999f2966a3e5 PR__startup_and_shutdown.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/PR__startup_and_shutdown.c	Wed Sep 19 23:12:44 2012 -0700
@@ -0,0 +1,601 @@
+/*
+ * Copyright 2010  OpenSourceStewardshipFoundation
+ *
+ * Licensed under BSD
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <malloc.h>
+#include <inttypes.h>
+#include <sys/time.h>
+#include <pthread.h>
+
+#include "PR.h"
+
+
+#define thdAttrs NULL
+
+
+/* MEANING OF   WL  PI  SS  int
+ * These indicate which places the function is safe to use.  They stand for:
+ * WL: Wrapper Library
+ * PI: Plugin 
+ * SS: Startup and Shutdown
+ * int: internal to the PR implementation
+ */
+
+
+//===========================================================================
+AnimSlot **
+create_anim_slots( int32 coreSlotsAreOn );
+
+void
+create_masterEnv();
+
+void
+create_the_coreCtlr_OS_threads();
+
+MallocProlog *
+create_free_list();
+
+void
+endOSThreadFn( void *initData, SlaveVP *animatingSlv );
+
+
+//===========================================================================
+
+/*Setup has two phases:
+ * 1) Semantic layer first calls PR__start, which creates masterEnv, and puts
+ *    the master Slv into the work-queue, ready for first "call"
+ * 2) Semantic layer then does its own init, which creates the seed virt
+ *    slave inside the semantic layer, ready to assign it when
+ *    asked by the first run of the animationMaster.
+ *
+ *This part is a bit weird because PR really wants to be "always there", and
+ * have applications attach and detach..  for now, this PR is part of
+ * the app, so the PR system starts up as part of running the app.
+ *
+ *The semantic layer is isolated from the PR internals by making the
+ * semantic layer do setup to a state that it's ready with its
+ * initial Slvs, ready to assign them to slots when the animationMaster
+ * asks.  Without this pattern, the semantic layer's setup would
+ * have to modify slots directly to assign the initial virt-procrs, and put
+ * them into the readyToAnimateQ itself, breaking the isolation completely.
+ *
+ * 
+ *The semantic layer creates the initial Slv(s), and adds its
+ * own environment to masterEnv, and fills in the pointers to
+ * the requestHandler and slaveAssigner plug-in functions
+ */
+
+/*This allocates the PR data structures and populates the master VPs and the
+ * master environment, which is reached through the global _PRMasterEnv (nothing
+ * is returned).  In the non-sequential build it also creates the OS threads.
+ */
+void
+PR__start()
+ {
+   #ifdef DEBUG__TURN_ON_SEQUENTIAL_MODE
+      create_masterEnv();
+      printf( "\n\n Running in SEQUENTIAL mode \n\n" );
+   #else
+      create_masterEnv();
+      DEBUG__printf1(dbgInfra,"Offset of lock in masterEnv: %d ", (int32)offsetof(MasterEnv,masterLock) );
+      create_the_coreCtlr_OS_threads();
+   #endif
+ }
+
+/*This gets the process struct out of the seedVP, then gets the semEnv-holding
+ * struct out of that, then inserts the semantic env into that struct, using
+ * the magic number as the key to the sem env placement.  The master will 
+ * use the magic number from a request to retrieve the semantic env appropriate
+ * for the construct that made the request.
+ */
+void
+PR__register_langlets_semEnv( PRSemEnv *semEnv, int32 magicNumber, 
+                              SlaveVP  *seedVP )
+ { PREnvHolder *envHolder;
+   PRProcess   *process;
+
+   process   = seedVP->process;
+   envHolder = process->semEnvHolder;
+   
+   insert( magicNumber, semEnv, envHolder );
+ }
+
+
+/*TODO: finish implementing
+ *This function returns information about the version of PR, the language
+ * the program is being run in, its version, and information on the 
+ * hardware.
+ */
+/*
+char *
+PR_App__give_environment_string()
+ {
+   //--------------------------
+    fprintf(output, "#\n# >> Build information <<\n");
+    fprintf(output, "# GCC VERSION: %d.%d.%d\n",__GNUC__,__GNUC_MINOR__,__GNUC_PATCHLEVEL__);
+    fprintf(output, "# Build Date: %s %s\n", __DATE__, __TIME__);
+    
+    fprintf(output, "#\n# >> Hardware information <<\n");
+    fprintf(output, "# Hardware Architecture: ");
+   #ifdef __x86_64
+    fprintf(output, "x86_64");
+   #endif //__x86_64
+   #ifdef __i386
+    fprintf(output, "x86");
+   #endif //__i386
+    fprintf(output, "\n");
+    fprintf(output, "# Number of Cores: %d\n", NUM_CORES);
+   //--------------------------
+    
+   //PR Plugins
+    fprintf(output, "#\n# >> PR Plugins <<\n");
+    fprintf(output, "# Language : ");
+    fprintf(output, _LANG_NAME_);
+    fprintf(output, "\n");
+       //Meta info gets set by calls from the language during its init,
+       // and info registered by calls from inside the application
+    fprintf(output, "# Assigner: %s\n", _PRMasterEnv->metaInfo->assignerInfo);
+
+   //--------------------------
+   //Application
+    fprintf(output, "#\n# >> Application <<\n");
+    fprintf(output, "# Name: %s\n", _PRMasterEnv->metaInfo->appInfo);
+    fprintf(output, "# Data Set:\n%s\n",_PRMasterEnv->metaInfo->inputSet);
+    
+   //--------------------------
+ }
+ */
+ 
+
+/*A pointer to the startup-function for the language is given as the last
+ * argument to the call.  Use this to initialize a program in the language.
+ * This creates a data structure that encapsulates the bookkeeping info
+ * PR uses to track and schedule a program run.
+ */
+PRProcess *
+PR__spawn_program_on_data_in_Lang( TopLevelFnPtr seed_fn, void *data )
+ { PRProcess *newProcess;
+   newProcess = malloc( sizeof(PRProcess) );
+   
+   pthread_mutex_init( &(newProcess->doneLock), NULL ); //static initializers only work in declarations
+   pthread_cond_init(  &(newProcess->doneCond), NULL );
+   newProcess->executionIsComplete = FALSE;
+   newProcess->numSlavesLive = 0;
+   
+   newProcess->dataForSeed = data;
+   newProcess->seedFnPtr   = seed_fn;
+   
+      //The language's spawn-process function fills in the plugin function-ptrs in
+      // the PRProcess struct, gives the struct to PR, which then makes and
+      // queues the seed SlaveVP, which starts processors made from the code being
+      // animated.
+    
+   (*langInitFnPtr)( newProcess );  //FIXME: langInitFnPtr is not declared in this function yet
+   
+   return newProcess;
+ }
+
+
+/*When all SlaveVPs owned by the program-run associated to the process have
+ * dissipated, then return from this call.  There is no language to cleanup,
+ * and PR does not shutdown..  but the process bookkeeping structure,
+ * which is used by PR to track and schedule the program, is freed.
+ *The PRProcess structure is kept until this call collects the results from it,
+ * then freed.  If the process is not done yet when PR gets this
+ * call, then this call waits..  the challenge here is that this call comes from
+ * a live OS thread that's outside PR..  so, inside here, it waits on a 
+ * condition..  then it's a PR thread that signals this to wake up..
+ *First checks whether the process is done, if yes, calls the clean-up fn then
+ * returns the result extracted from the PRProcess struct.
+ *If process not done yet, then performs a wait (in a loop to be sure the
+ * wakeup is not spurious, which can happen).  PR registers the wait, and upon
+ * the process ending (last SlaveVP owned by it dissipates), then PR signals
+ * this to wakeup.  This then calls the cleanup fn and returns the result.
+ */
+/*
+void *
+PR_App__give_results_when_done_for( PRProcess *process )
+ { void *result;
+   
+   pthread_mutex_lock( &(process->doneLock) );
+   while( !(process->executionIsComplete) )
+    {
+      pthread_cond_wait( &(process->doneCond),
+                         &(process->doneLock) );
+    }
+   pthread_mutex_unlock( &(process->doneLock) );
+   
+   result = process->resultToReturn;
+   
+   PR_int__cleanup_process_after_done( process );
+   free( process );  //was malloc'd above, so free it here
+   
+   return result;
+ }
+*/
+
+/*Turns off the PR system, and frees all data associated with it.  Does this
+ * by creating shutdown SlaveVPs and inserting them into animation slots.
+ * Will probably have to wake up sleeping cores as part of this -- the fn that
+ * inserts the new SlaveVPs should handle the wakeup..
+ */
+/*
+void
+PR_SS__shutdown(); //already defined -- look at it
+
+void
+PR_App__shutdown()
+ {
+   for( cores )
+    { slave = PR_int__create_new_SlaveVP( endOSThreadFn, NULL );
+      PR_int__insert_slave_onto_core( slave, coreNum );
+    }
+ }
+*/
+
+/* PR_App__start_PR_running();
+
+   PRProcess *matrixMultProcess;
+   
+   matrixMultProcess =
+    PR_App__spawn_program_on_data_in_Lang( &prog_seed_fn, data, Vthread_lang );
+   
+   resMatrix = PR_App__give_results_when_done_for( matrixMultProcess );
+   
+   PR_App__shutdown();
+ */
+
+void
+create_masterEnv()
+ { MasterEnv       *masterEnv;
+   PRQueueStruc  **readyToAnimateQs;
+   int              coreIdx;
+   SlaveVP        **masterVPs;
+   AnimSlot     ***allAnimSlots; //ptr to array of ptrs
+
+
+      //Make the master env, which holds everything else
+   _PRMasterEnv = malloc( sizeof(MasterEnv) );
+
+        //Very first thing put into the master env is the free-list, seeded
+        // with a massive initial chunk of memory.
+        //After this, all other mallocs are PR__malloc.
+   _PRMasterEnv->freeLists        = PR_ext__create_free_list();
+   
+   
+   //===================== Only PR__malloc after this ====================
+   masterEnv     = (MasterEnv*)_PRMasterEnv;
+   
+      //Make a readyToAnimateQ for each core controller
+   readyToAnimateQs = PR_int__malloc( NUM_CORES * sizeof(PRQueueStruc *) );
+   masterVPs        = PR_int__malloc( NUM_CORES * sizeof(SlaveVP *) );
+
+      //One array of animation slots per core; each core's masterVP schedules its own slots
+   allAnimSlots    = PR_int__malloc( NUM_CORES * sizeof(AnimSlot **) );
+
+   _PRMasterEnv->numSlavesAlive = 0;  //used to detect shut-down condition
+
+//FIXME: WIP block -- semEnv, semData, coreNum, slotNum are undeclared here; allAnimSlots is read before it is set
+   semEnv->shutdownInitiated = FALSE;
+   semEnv->coreIsDone = PR_int__malloc( NUM_CORES * sizeof( bool32 ) );
+   
+      //For each animation slot, there is an idle slave, and an initial
+      // slave assigned as the current-task-slave.  Create them here.
+   SlaveVP *idleSlv, *slotTaskSlv;
+   for( coreNum = 0; coreNum < NUM_CORES; coreNum++ )
+    { semEnv->coreIsDone[coreNum] = FALSE; //use during shutdown
+    
+      for( slotNum = 0; slotNum < NUM_ANIM_SLOTS; ++slotNum )
+       { idleSlv = VSs__create_slave_helper( &idle_fn, NULL, semEnv, 0);
+         idleSlv->coreAnimatedBy                = coreNum;
+         idleSlv->animSlotAssignedTo            =
+                               _PRMasterEnv->allAnimSlots[coreNum][slotNum];
+         semEnv->idleSlv[coreNum][slotNum] = idleSlv;
+         
+         slotTaskSlv = VSs__create_slave_helper( &idle_fn, NULL, semEnv, 0);
+         slotTaskSlv->coreAnimatedBy            = coreNum;
+         slotTaskSlv->animSlotAssignedTo        = 
+                               _PRMasterEnv->allAnimSlots[coreNum][slotNum];
+         
+         semData                    = slotTaskSlv->semanticData;
+         semData->needsTaskAssigned = TRUE;
+         semData->slaveType         = SlotTaskSlv;
+         semEnv->slotTaskSlvs[coreNum][slotNum] = slotTaskSlv;
+       }
+    }
+
+      //create the recycle queue where free task slaves are put after their task ends
+   semEnv->freeTaskSlvRecycleQ  = makePRQ();
+   
+
+   semEnv->numLiveExtraTaskSlvs   = 0;
+   semEnv->numLiveThreadSlvs      = 0; //none exist yet.. "create process" creates the seeds  
+//==================================================================
+   
+   _PRMasterEnv->numSlavesCreated = 0;  //used by create slave to set slave ID
+   for( coreIdx = 0; coreIdx < NUM_CORES; coreIdx++ )
+    {    
+      readyToAnimateQs[ coreIdx ] = makePRQ();
+      
+         //Q: should give masterVP core-specific info as its init data?
+      masterVPs[ coreIdx ] = PR_int__create_slaveVP( (TopLevelFnPtr)&animationMaster, (void*)masterEnv );
+      masterVPs[ coreIdx ]->coreAnimatedBy = coreIdx;
+      masterVPs[ coreIdx ]->typeOfVP = Master;
+      allAnimSlots[ coreIdx ] = create_anim_slots( coreIdx ); //makes for one core
+    }
+   _PRMasterEnv->masterVPs        = masterVPs;
+   _PRMasterEnv->masterLock       = UNLOCKED;
+   _PRMasterEnv->seed1 = rand()%1000; // init random number generator
+   _PRMasterEnv->seed2 = rand()%1000; // init random number generator
+   _PRMasterEnv->allAnimSlots    = allAnimSlots;
+   _PRMasterEnv->measHistsInfo = NULL; 
+
+   //============================= MEASUREMENT STUFF ========================
+      
+         MEAS__Make_Meas_Hists_for_Susp_Meas;
+         MEAS__Make_Meas_Hists_for_Master_Meas;
+         MEAS__Make_Meas_Hists_for_Master_Lock_Meas;
+         MEAS__Make_Meas_Hists_for_Malloc_Meas;
+         MEAS__Make_Meas_Hists_for_Plugin_Meas;
+         MEAS__Make_Meas_Hists_for_Language;
+
+         PROBES__Create_Probe_Bookkeeping_Vars;
+         
+         HOLISTIC__Setup_Perf_Counters;
+         
+   //========================================================================
+ }
+
+AnimSlot **
+create_anim_slots( int32 coreSlotsAreOn )
+ { AnimSlot  **animSlots;
+   int i;
+
+   animSlots  = PR_int__malloc( NUM_ANIM_SLOTS * sizeof(AnimSlot *) );
+
+   for( i = 0; i < NUM_ANIM_SLOTS; i++ )
+    {
+      animSlots[i] = PR_int__malloc( sizeof(AnimSlot) );
+
+         //Set state to mean "handling requests done, slot needs filling"
+      animSlots[i]->workIsDone         = FALSE;
+      animSlots[i]->needsSlaveAssigned = TRUE;
+      animSlots[i]->slotIdx            = i; //quick retrieval of slot pos
+      animSlots[i]->coreSlotIsOn       = coreSlotsAreOn;
+    }
+   return animSlots;
+ }
+
+
+void
+freeAnimSlots( AnimSlot **animSlots )
+ { int i;
+   for( i = 0; i < NUM_ANIM_SLOTS; i++ )
+    {
+      PR_int__free( animSlots[i] );
+    }
+   PR_int__free( animSlots );
+ }
+
+
+void
+create_the_coreCtlr_OS_threads()
+ {
+   //========================================================================
+   //                      Create the Threads
+   int coreIdx, retCode;
+
+      //Need the threads to be created suspended, and wait for a signal
+      // before proceeding -- gives time after creating to initialize other
+      // stuff before the coreCtlrs set off.
+   _PRMasterEnv->setupComplete = 0;
+   
+      //initialize the cond used to make the new threads wait and sync up
+      //must do this before *creating* the threads..
+   pthread_mutex_init( &suspendLock, NULL );
+   pthread_cond_init( &suspendCond, NULL );
+
+      //Make the threads that animate the core controllers
+   for( coreIdx=0; coreIdx < NUM_CORES; coreIdx++ )
+    { coreCtlrThdParams[coreIdx]          = PR_int__malloc( sizeof(ThdParams) );
+      coreCtlrThdParams[coreIdx]->coreNum = coreIdx;
+
+      retCode =
+      pthread_create( &(coreCtlrThdHandles[coreIdx]),
+                        thdAttrs,
+                       &coreController,
+               (void *)(coreCtlrThdParams[coreIdx]) );
+      if(retCode){printf("ERROR creating thread: %d\n", retCode); exit(1);}
+    }
+ }
+
+
+/*This is what causes the PR system to initialize.. then waits for it to
+ * exit.
+ * 
+ *Wrapper lib layer calls this when it wants the system to start running..
+ */
+/*
+void
+PR_SS__start_the_work_then_wait_until_done()
+ { 
+#ifdef DEBUG__TURN_ON_SEQUENTIAL_MODE
+   //Only difference between version with an OS thread pinned to each core and
+   // the sequential version of PR is PR__init_Seq, this, and coreCtlr_Seq.
+   //
+         //Instead of un-suspending threads, just call the one and only
+         // core ctlr (sequential version), in the main thread.
+      coreCtlr_Seq( NULL );
+      flushRegisters();
+#else
+   int coreIdx;
+      //Start the core controllers running
+   
+      //tell the core controller threads that setup is complete
+      //get lock, to lock out any threads still starting up -- they'll see
+      // that setupComplete is true before entering while loop, and so never
+      // wait on the condition
+   pthread_mutex_lock(     &suspendLock );
+   _PRMasterEnv->setupComplete = 1;
+   pthread_mutex_unlock(   &suspendLock );
+   pthread_cond_broadcast( &suspendCond );
+   
+   
+      //wait for all to complete
+   for( coreIdx=0; coreIdx < NUM_CORES; coreIdx++ )
+    {
+      pthread_join( coreCtlrThdHandles[coreIdx], NULL );
+    }
+   
+      //NOTE: do not clean up PR env here -- semantic layer has to have
+      // a chance to clean up its environment first, then do a call to free
+      // the Master env and rest of PR locations
+#endif
+ }
+*/
+
+SlaveVP* PR_SS__create_shutdown_slave(){
+    SlaveVP* shutdownVP;
+    
+    shutdownVP = PR_int__create_slaveVP( &endOSThreadFn, NULL );
+    shutdownVP->typeOfVP = Shutdown;
+    
+    return shutdownVP;
+}
+
+//TODO: look at architecting cleanest separation between request handler
+// and animation master, for dissipate, create, shutdown, and other non-semantic
+// requests.  Issue is chain: one removes requests from AppSlv, one dispatches
+// on type of request, and one handles each type..  but some types require
+// action from both request handler and animation master -- maybe just give the
+// request handler calls like:  PR__handle_X_request_type
+
+
+/*This is called by the semantic layer's request handler when it decides it's
+ * time to shut down the PR system.  Calling this causes the core controller OS
+ * threads to exit, which unblocks the entry-point function that started up
+ * PR, and allows it to grab the result and return to the original single-
+ * threaded application.
+ * 
+ *The _PRMasterEnv is needed by this shut down function, so the create-seed-
+ * and-wait function has to free a bunch of stuff after it detects the
+ * threads have all died: the masterEnv, the thread-related locations, the
+ * masterVP, and any AppSlvs that might still be allocated and sitting in the
+ * semantic environment, or have been orphaned in the _PRWorkQ.
+ * 
+ *NOTE: the semantic plug-in is expected to use PR__malloc to get all the
+ * locations it needs, and give ownership to masterVP.  Then, they will be
+ * automatically freed.
+ *
+ *In here, create one core-loop shut-down processor for each core controller and
+ * write each one directly into the first animation slot of its core.
+ *Note, this function can ONLY be called after the semantic environment no
+ * longer cares if AppSlvs get animated after the point this is called.  In
+ * other words, this can be used as an abort, or else it should only be
+ * called when all AppSlvs have finished dissipate requests -- only at that
+ * point is it sure that all results have completed.
+ */
+void
+PR_SS__shutdown()
+ { int32       coreIdx;
+   SlaveVP    *shutDownSlv;
+   AnimSlot **animSlots;
+      //create the shutdown processors, one for each core controller -- put each
+      // directly into a slot -- each core will die when it gets one
+   for( coreIdx = 0; coreIdx < NUM_CORES; coreIdx++ )
+    {    //Note, this is running in the master
+      shutDownSlv = PR_SS__create_shutdown_slave();
+         //last slave has dissipated, so no more in slots, so write
+         // shut down slave into first animation slot.
+      animSlots = _PRMasterEnv->allAnimSlots[ coreIdx ];
+      animSlots[0]->slaveAssignedToSlot = shutDownSlv;
+      animSlots[0]->needsSlaveAssigned = FALSE;
+      shutDownSlv->coreAnimatedBy = coreIdx;
+      shutDownSlv->animSlotAssignedTo = animSlots[ 0 ];
+    }
+ }
+
+
+/*Am trying to be cute, avoiding IF statement in coreCtlr that checks for
+ * a special shutdown slaveVP.  Ended up with extra-complex shutdown sequence.
+ *This function has the sole purpose of setting the stack and framePtr
+ * to the coreCtlr's stack and framePtr.. it does that then jumps to the
+ * core ctlr's shutdown point -- might be able to just call pthread_exit
+ * from here, but am going back to the pthread's stack and setting everything
+ * up just as if it never jumped out, before calling pthread_exit.
+ *The end-point of core ctlr will free the stack and so forth of the
+ * processor that animates this function, (this fn is transferring the
+ * animator of the AppSlv that is in turn animating this function over
+ * to core controller function -- note that this slices out a level of virtual
+ * processors).
+ */
+void
+endOSThreadFn( void *initData, SlaveVP *animatingSlv )
+ { 
+   #ifdef DEBUG__TURN_ON_SEQUENTIAL_MODE
+    asmTerminateCoreCtlrSeq(animatingSlv);
+   #else
+    asmTerminateCoreCtlr(animatingSlv);
+   #endif
+ }
+
+
+/*This is called by the startup and shutdown code, as the last step of shutdown.
+ */
+void
+PR_SS__cleanup_at_end_of_shutdown()
+ { 
+      //Before getting rid of everything, print out any measurements made
+   if( _PRMasterEnv->measHistsInfo != NULL )
+    { forAllInDynArrayDo( _PRMasterEnv->measHistsInfo, (DynArrayFnPtr)&printHist );
+      forAllInDynArrayDo( _PRMasterEnv->measHistsInfo, (DynArrayFnPtr)&saveHistToFile);
+      forAllInDynArrayDo( _PRMasterEnv->measHistsInfo, (DynArrayFnPtr)&freeHist );
+    }
+   
+   MEAS__Print_Hists_for_Susp_Meas;
+   MEAS__Print_Hists_for_Master_Meas;
+   MEAS__Print_Hists_for_Master_Lock_Meas;
+   MEAS__Print_Hists_for_Malloc_Meas;
+   MEAS__Print_Hists_for_Plugin_Meas;
+   
+
+      //All the environment data has been allocated with PR__malloc, so just
+      // free its internal big-chunk and everything inside it disappears.
+/*
+   readyToAnimateQs = _PRMasterEnv->readyToAnimateQs;
+   masterVPs        = _PRMasterEnv->masterVPs;
+   allAnimSlots    = _PRMasterEnv->allAnimSlots;
+   
+   for( coreIdx = 0; coreIdx < NUM_CORES; coreIdx++ )
+    {
+      freePRQ( readyToAnimateQs[ coreIdx ] );
+         //master Slvs were created external to PR, so use external free
+      PR_int__dissipate_slaveVP( masterVPs[ coreIdx ] );
+      
+      freeAnimSlots( allAnimSlots[ coreIdx ] );
+    }
+   
+   PR_int__free( _PRMasterEnv->readyToAnimateQs );
+   PR_int__free( _PRMasterEnv->masterVPs );
+   PR_int__free( _PRMasterEnv->allAnimSlots );
+   
+   //============================= MEASUREMENT STUFF ========================
+   #ifdef PROBES__TURN_ON_STATS_PROBES
+   freeDynArrayDeep( _PRMasterEnv->dynIntervalProbesInfo, &PR_WL__free_probe);
+   #endif
+   //========================================================================
+*/
+      //These are the only two that use system free 
+   PR_ext__free_free_list( _PRMasterEnv->freeLists );
+   free( (void *)_PRMasterEnv );
+ }
+
+
+//================================
+
+
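Note on the startup handshake in PR__startup_and_shutdown.c above: the core
controller threads are created first and then held on a condition variable
until setupComplete is set.  The following is a minimal standalone sketch of
that pattern using plain pthreads.  It is not the PR implementation: every
sketch* name is hypothetical, the thread body only counts itself as started,
and the fixed upper bound on threads is an assumption made to keep the example
short.  Building it may need the -pthread flag, depending on the toolchain.

```c
#include <assert.h>
#include <pthread.h>

#define SKETCH_MAX_THREADS 16   /* assumed bound, only for this sketch */

static pthread_mutex_t sketchSuspendLock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  sketchSuspendCond = PTHREAD_COND_INITIALIZER;
static int sketchSetupComplete = 0;
static int sketchNumStarted    = 0;

/* Stand-in for a core controller: blocks until setup is complete. */
static void *
sketch_core_controller( void *arg )
 { (void)arg;
   pthread_mutex_lock( &sketchSuspendLock );
   while( !sketchSetupComplete ) /* loop guards against spurious wakeups */
    { pthread_cond_wait( &sketchSuspendCond, &sketchSuspendLock );
    }
   sketchNumStarted += 1;        /* lock is still held here */
   pthread_mutex_unlock( &sketchSuspendLock );
   return NULL;
 }

/* Creates the threads, finishes "setup", releases them, then joins them.
 * Returns how many threads got past the wait. */
static int
sketch_run_handshake( int numThreads )
 { pthread_t handles[SKETCH_MAX_THREADS];
   int i, numCreated = 0;

   if( numThreads > SKETCH_MAX_THREADS ) numThreads = SKETCH_MAX_THREADS;

   pthread_mutex_lock( &sketchSuspendLock );
   sketchSetupComplete = 0;
   sketchNumStarted    = 0;
   pthread_mutex_unlock( &sketchSuspendLock );

   for( i = 0; i < numThreads; i++ )
    { if( pthread_create( &handles[i], NULL, &sketch_core_controller, NULL ) )
         break;
      numCreated += 1;
    }

      /* A thread that has not reached the wait yet will see the flag
       * already set and skip the wait entirely. */
   pthread_mutex_lock( &sketchSuspendLock );
   sketchSetupComplete = 1;
   pthread_mutex_unlock( &sketchSuspendLock );
   pthread_cond_broadcast( &sketchSuspendCond );

   for( i = 0; i < numCreated; i++ )
    { pthread_join( handles[i], NULL );
    }
   return sketchNumStarted;
 }
```

Setting the flag while holding the lock is what makes the late-starting case
safe: a thread either sees the flag before it waits, or is already waiting when
the broadcast arrives.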
diff -r 0dc0b8653902 -r 999f2966a3e5 PR_primitive_data_types.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/PR_primitive_data_types.h	Wed Sep 19 23:12:44 2012 -0700
@@ -0,0 +1,42 @@
+/*
+ *  Copyright 2009 OpenSourceStewardshipFoundation.org
+ *  Licensed under GNU General Public License version 2
+ *  
+ * Author: seanhalle@yahoo.com
+ *  
+
+ */
+
+#ifndef _PRIMITIVE_DATA_TYPES_H
+#define _PRIMITIVE_DATA_TYPES_H
+
+
+/*For portability, need primitive data types that have a well defined
+ * size, and well-defined layout into bytes
+ *To do this, provide standard aliases for all primitive data types
+ *These aliases must be used in all functions instead of the ANSI types
+ *
+ *When PR is used together with BLIS, these definitions will be replaced
+ * inside each specialization module according to the compiler used in
+ * that module and the hardware being specialized to.
+ */
+typedef char               bool8;
+typedef signed char        int8;
+typedef unsigned char      uint8;
+typedef short              int16;
+typedef unsigned short     uint16;
+typedef int                int32;
+typedef unsigned int       uint32;
+typedef unsigned int       bool32;
+typedef long long          int64;
+typedef unsigned long long uint64;
+typedef float              float32;
+typedef double             float64;
+//typedef double double      float128;  //rejected by GCC: "double double" is not a C type
+#define float128 double double  /*placeholder only: any use of float128 will fail to compile*/
+
+#define TRUE  1
+#define FALSE 0
+
+#endif	/* _PRIMITIVE_DATA_TYPES_H */
+
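Note on PR_primitive_data_types.h above: the plain C types it aliases (char,
short, int, long long) do not have sizes fixed by the language, so the aliases
only have the intended widths on platforms where those types happen to match.
The following sketch shows one possible alternative, assuming a C99 toolchain
with <stdint.h>.  The sketch_* names and the static-assert macro are
hypothetical and are not part of PR.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Exact-width aliases taken from <stdint.h> instead of from plain C types. */
typedef int8_t   sketch_int8;
typedef uint8_t  sketch_uint8;
typedef int16_t  sketch_int16;
typedef uint16_t sketch_uint16;
typedef int32_t  sketch_int32;
typedef uint32_t sketch_uint32;
typedef int64_t  sketch_int64;
typedef uint64_t sketch_uint64;

/* Compile-time check that works before C11: the array size becomes -1,
 * which is a compile error, when the condition is false. */
#define SKETCH_STATIC_ASSERT( cond, name ) \
   typedef char sketch_static_assert_##name[ (cond) ? 1 : -1 ]

SKETCH_STATIC_ASSERT( sizeof(sketch_int8)  * CHAR_BIT ==  8, int8_is_8_bits );
SKETCH_STATIC_ASSERT( sizeof(sketch_int16) * CHAR_BIT == 16, int16_is_16_bits );
SKETCH_STATIC_ASSERT( sizeof(sketch_int32) * CHAR_BIT == 32, int32_is_32_bits );
SKETCH_STATIC_ASSERT( sizeof(sketch_int64) * CHAR_BIT == 64, int64_is_64_bits );

/* An unsigned 8-bit value wraps from 255 to 0; a type aliased to plain
 * char is not guaranteed to behave this way. */
static int
sketch_uint8_wraps( void )
 { sketch_uint8 value = 255;
   value = (sketch_uint8)( value + 1 );
   return value == 0;
 }
```

The size checks cost nothing at run time and fail the build early if a port
to a new compiler or target changes a width.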
diff -r 0dc0b8653902 -r 999f2966a3e5 Services_Offered_by_PR/Measurement_and_Stats/MEAS__macros.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/Services_Offered_by_PR/Measurement_and_Stats/MEAS__macros.h	Wed Sep 19 23:12:44 2012 -0700
@@ -0,0 +1,514 @@
+/*
+ *  Copyright 2009 OpenSourceStewardshipFoundation.org
+ *  Licensed under GNU General Public License version 2
+ *
+ * Author: seanhalle@yahoo.com
+ * 
+ */
+
+#ifndef _PR_MEAS_MACROS_H
+#define _PR_MEAS_MACROS_H
+#define _GNU_SOURCE
+
+//==================  Macros define types of meas want  =====================
+//
+/*Generic measurement macro -- has name-space collision potential, which
+ * compiler will catch..  so only use one pair inside a given set of 
+ * curly braces. 
+ */
+//TODO: finish generic capture interval in hist
+enum histograms
+ { generic1
+ };
+   #define MEAS__Capture_Pre_Point \
+      int32 startStamp, endStamp; \
+      saveLowTimeStampCountInto( startStamp );
+
+   #define MEAS__Capture_Post_Point( histName ) \
+      saveLowTimeStampCountInto( endStamp ); \
+      addIntervalToHist( startStamp, endStamp, _PRMasterEnv->histName ); 
+
+
+
+
+//==================  Macros define types of meas want  =====================
+
+#ifdef MEAS__TURN_ON_SUSP_MEAS
+   #define MEAS__Insert_Susp_Meas_Fields_into_Slave \
+       uint32  preSuspTSCLow; \
+       uint32  postSuspTSCLow;
+
+   #define MEAS__Insert_Susp_Meas_Fields_into_MasterEnv \
+       Histogram       *suspLowTimeHist; \
+       Histogram       *suspHighTimeHist;
+
+   #define MEAS__Make_Meas_Hists_for_Susp_Meas \
+      _PRMasterEnv->suspLowTimeHist  = makeFixedBinHistExt( 100, 0, 200,\
+                                                    "susp_low_time_hist");\
+      _PRMasterEnv->suspHighTimeHist  = makeFixedBinHistExt( 100, 0, 200,\
+                                                    "susp_high_time_hist");
+      
+      //record time stamp: compare to time-stamp recorded below
+   #define MEAS__Capture_Pre_Susp_Point \
+      saveLowTimeStampCountInto( animatingSlv->preSuspTSCLow );
+   
+      //NOTE: only take low part of count -- do sanity check when take diff
+   #define MEAS__Capture_Post_Susp_Point \
+      saveLowTimeStampCountInto( animatingSlv->postSuspTSCLow );\
+      addIntervalToHist( animatingSlv->preSuspTSCLow, animatingSlv->postSuspTSCLow,\
+                         _PRMasterEnv->suspLowTimeHist ); \
+      addIntervalToHist( animatingSlv->preSuspTSCLow, animatingSlv->postSuspTSCLow,\
+                         _PRMasterEnv->suspHighTimeHist );
+
+   #define MEAS__Print_Hists_for_Susp_Meas \
+      printHist( _PRMasterEnv->suspLowTimeHist ); printHist( _PRMasterEnv->suspHighTimeHist );
+      
+#else
+   #define MEAS__Insert_Susp_Meas_Fields_into_Slave     
+   #define MEAS__Insert_Susp_Meas_Fields_into_MasterEnv 
+   #define MEAS__Make_Meas_Hists_for_Susp_Meas 
+   #define MEAS__Capture_Pre_Susp_Point
+   #define MEAS__Capture_Post_Susp_Point   
+   #define MEAS__Print_Hists_for_Susp_Meas 
+#endif
+
+#ifdef MEAS__TURN_ON_MASTER_MEAS
+   #define MEAS__Insert_Master_Meas_Fields_into_Slave \
+       uint32  startMasterTSCLow; \
+       uint32  endMasterTSCLow;
+
+   #define MEAS__Insert_Master_Meas_Fields_into_MasterEnv \
+       Histogram       *masterLowTimeHist; \
+       Histogram       *masterHighTimeHist;
+
+   #define MEAS__Make_Meas_Hists_for_Master_Meas \
+      _PRMasterEnv->masterLowTimeHist  = makeFixedBinHistExt( 100, 0, 200,\
+                                                    "master_low_time_hist");\
+      _PRMasterEnv->masterHighTimeHist  = makeFixedBinHistExt( 100, 0, 200,\
+                                                    "master_high_time_hist");
+
+      //Total Master time includes one coreloop time -- just assume the core
+      // loop time is same for Master as for AppSlvs, even though it may be
+      // smaller due to higher predictability of the fixed jmp.
+   #define MEAS__Capture_Pre_Master_Point\
+      saveLowTimeStampCountInto( masterVP->startMasterTSCLow );
+
+   #define MEAS__Capture_Post_Master_Point \
+      saveLowTimeStampCountInto( masterVP->endMasterTSCLow );\
+      addIntervalToHist( masterVP->startMasterTSCLow, masterVP->endMasterTSCLow,\
+                         _PRMasterEnv->masterLowTimeHist ); \
+      addIntervalToHist( masterVP->startMasterTSCLow, masterVP->endMasterTSCLow,\
+                         _PRMasterEnv->masterHighTimeHist );
+
+   #define MEAS__Print_Hists_for_Master_Meas \
+      printHist( _PRMasterEnv->masterLowTimeHist ); printHist( _PRMasterEnv->masterHighTimeHist );
+
+#else
+   #define MEAS__Insert_Master_Meas_Fields_into_Slave
+   #define MEAS__Insert_Master_Meas_Fields_into_MasterEnv 
+   #define MEAS__Make_Meas_Hists_for_Master_Meas
+   #define MEAS__Capture_Pre_Master_Point 
+   #define MEAS__Capture_Post_Master_Point 
+   #define MEAS__Print_Hists_for_Master_Meas 
+#endif
+
+      
+#ifdef MEAS__TURN_ON_MASTER_LOCK_MEAS
+   #define MEAS__Insert_Master_Lock_Meas_Fields_into_MasterEnv \
+       Histogram       *masterLockLowTimeHist; \
+       Histogram       *masterLockHighTimeHist;
+
+   #define MEAS__Make_Meas_Hists_for_Master_Lock_Meas \
+      _PRMasterEnv->masterLockLowTimeHist  = makeFixedBinHist( 50, 0, 2, \
+                                               "master lock low time hist");\
+      _PRMasterEnv->masterLockHighTimeHist  = makeFixedBinHist( 50, 0, 100,\
+                                               "master lock high time hist");
+
+   #define MEAS__Capture_Pre_Master_Lock_Point \
+      int32 startStamp, endStamp; \
+      saveLowTimeStampCountInto( startStamp );
+
+   #define MEAS__Capture_Post_Master_Lock_Point \
+      saveLowTimeStampCountInto( endStamp ); \
+      addIntervalToHist( startStamp, endStamp,\
+                         _PRMasterEnv->masterLockLowTimeHist ); \
+      addIntervalToHist( startStamp, endStamp,\
+                         _PRMasterEnv->masterLockHighTimeHist );
+
+   #define MEAS__Print_Hists_for_Master_Lock_Meas \
+      printHist( _PRMasterEnv->masterLockLowTimeHist ); \
+      printHist( _PRMasterEnv->masterLockHighTimeHist );
+      
+#else
+   #define MEAS__Insert_Master_Lock_Meas_Fields_into_MasterEnv
+   #define MEAS__Make_Meas_Hists_for_Master_Lock_Meas
+   #define MEAS__Capture_Pre_Master_Lock_Point 
+   #define MEAS__Capture_Post_Master_Lock_Point 
+   #define MEAS__Print_Hists_for_Master_Lock_Meas
+#endif
+
+
+#ifdef MEAS__TURN_ON_MALLOC_MEAS
+   #define MEAS__Insert_Malloc_Meas_Fields_into_MasterEnv\
+       Histogram       *mallocTimeHist; \
+       Histogram       *freeTimeHist;
+
+   #define MEAS__Make_Meas_Hists_for_Malloc_Meas \
+      _PRMasterEnv->mallocTimeHist  = makeFixedBinHistExt( 100, 0, 30,\
+                                                       "malloc_time_hist");\
+      _PRMasterEnv->freeTimeHist  = makeFixedBinHistExt( 100, 0, 30,\
+                                                       "free_time_hist");
+
+   #define MEAS__Capture_Pre_Malloc_Point \
+      int32 startStamp, endStamp; \
+      saveLowTimeStampCountInto( startStamp );
+
+   #define MEAS__Capture_Post_Malloc_Point \
+      saveLowTimeStampCountInto( endStamp ); \
+      addIntervalToHist( startStamp, endStamp,\
+                         _PRMasterEnv->mallocTimeHist ); 
+
+   #define MEAS__Capture_Pre_Free_Point \
+      int32 startStamp, endStamp; \
+      saveLowTimeStampCountInto( startStamp );
+
+   #define MEAS__Capture_Post_Free_Point \
+      saveLowTimeStampCountInto( endStamp ); \
+      addIntervalToHist( startStamp, endStamp,\
+                         _PRMasterEnv->freeTimeHist ); 
+
+   #define MEAS__Print_Hists_for_Malloc_Meas \
+      printHist( _PRMasterEnv->mallocTimeHist   ); \
+      saveHistToFile( _PRMasterEnv->mallocTimeHist   ); \
+      printHist( _PRMasterEnv->freeTimeHist     ); \
+      saveHistToFile( _PRMasterEnv->freeTimeHist     ); \
+      freeHistExt( _PRMasterEnv->mallocTimeHist ); \
+      freeHistExt( _PRMasterEnv->freeTimeHist   );
+      
+#else
+   #define MEAS__Insert_Malloc_Meas_Fields_into_MasterEnv
+   #define MEAS__Make_Meas_Hists_for_Malloc_Meas 
+   #define MEAS__Capture_Pre_Malloc_Point
+   #define MEAS__Capture_Post_Malloc_Point
+   #define MEAS__Capture_Pre_Free_Point
+   #define MEAS__Capture_Post_Free_Point
+   #define MEAS__Print_Hists_for_Malloc_Meas 
+#endif
+
+
+
+#ifdef MEAS__TURN_ON_PLUGIN_MEAS 
+   #define MEAS__Insert_Plugin_Meas_Fields_into_MasterEnv \
+      Histogram       *reqHdlrLowTimeHist; \
+      Histogram       *reqHdlrHighTimeHist;
+          
+   #define MEAS__Make_Meas_Hists_for_Plugin_Meas \
+      _PRMasterEnv->reqHdlrLowTimeHist  = makeFixedBinHistExt( 100, 0, 200,\
+                                                    "plugin_low_time_hist");\
+      _PRMasterEnv->reqHdlrHighTimeHist  = makeFixedBinHistExt( 100, 0, 200,\
+                                                    "plugin_high_time_hist");
+
+   #define MEAS__startReqHdlr \
+      int32 startStamp1, endStamp1; \
+      saveLowTimeStampCountInto( startStamp1 );
+
+   #define MEAS__endReqHdlr \
+      saveLowTimeStampCountInto( endStamp1 ); \
+      addIntervalToHist( startStamp1, endStamp1, \
+                           _PRMasterEnv->reqHdlrLowTimeHist ); \
+      addIntervalToHist( startStamp1, endStamp1, \
+                           _PRMasterEnv->reqHdlrHighTimeHist );
+
+   #define MEAS__Print_Hists_for_Plugin_Meas \
+      printHist( _PRMasterEnv->reqHdlrLowTimeHist ); \
+      saveHistToFile( _PRMasterEnv->reqHdlrLowTimeHist ); \
+      printHist( _PRMasterEnv->reqHdlrHighTimeHist ); \
+      saveHistToFile( _PRMasterEnv->reqHdlrHighTimeHist ); \
+      freeHistExt( _PRMasterEnv->reqHdlrLowTimeHist ); \
+      freeHistExt( _PRMasterEnv->reqHdlrHighTimeHist );
+#else
+   #define MEAS__Insert_Plugin_Meas_Fields_into_MasterEnv
+   #define MEAS__Make_Meas_Hists_for_Plugin_Meas
+   #define MEAS__startReqHdlr 
+   #define MEAS__endReqHdlr 
+   #define MEAS__Print_Hists_for_Plugin_Meas 
+
+#endif
+
+      
+#ifdef MEAS__TURN_ON_SYSTEM_MEAS
+   #define MEAS__Insert_System_Meas_Fields_into_Slave \
+      TSCountLowHigh  startSusp; \
+      uint64  totalSuspCycles; \
+      uint32  numGoodSusp;
+
+   #define MEAS__Insert_System_Meas_Fields_into_MasterEnv \
+       TSCountLowHigh   startMaster; \
+       uint64           totalMasterCycles; \
+       uint32           numMasterAnimations; \
+       TSCountLowHigh   startReqHdlr; \
+       uint64           totalPluginCycles; \
+       uint32           numPluginAnimations; \
+       uint64           cyclesTillStartAnimationMaster; \
+       TSCountLowHigh   endAnimationMaster;
+
+   #define MEAS__startAnimationMaster_forSys \
+      TSCountLowHigh startStamp1, endStamp1; \
+      saveTSCLowHigh( endStamp1 ); \
+      _PRMasterEnv->cyclesTillStartAnimationMaster = \
+      endStamp1.longVal - masterVP->startSusp.longVal;
+
+   #define MEAS__startReqHdlr_forSys \
+        saveTSCLowHigh( startStamp1 ); \
+        _PRMasterEnv->startReqHdlr.longVal = startStamp1.longVal;
+ 
+   #define MEAS__endAnimationMaster_forSys \
+      saveTSCLowHigh( startStamp1 ); \
+      _PRMasterEnv->endAnimationMaster.longVal = startStamp1.longVal;
+
+   /*A TSC is stored in VP first thing inside wrapper-lib
+    * Now, measures cycles from there to here
+    * Master and Plugin will add this value to other trace-seg measures
+    */
+   #define MEAS__Capture_End_Susp_in_CoreCtlr_ForSys\
+          saveTSCLowHigh(endSusp); \
+          numCycles = endSusp.longVal - currVP->startSusp.longVal; \
+          /*sanity check (400K is about 20K iters)*/ \
+          if( numCycles < 400000 ) \
+           { currVP->totalSuspCycles += numCycles; \
+             currVP->numGoodSusp++; \
+           } \
+             /*recorded every time, but only read if currVP == MasterVP*/ \
+          _PRMasterEnv->startMaster.longVal = endSusp.longVal;
+
+#else
+   #define MEAS__Insert_System_Meas_Fields_into_Slave 
+   #define MEAS__Insert_System_Meas_Fields_into_MasterEnv 
+   #define MEAS__Make_Meas_Hists_for_System_Meas
+   #define MEAS__startAnimationMaster_forSys 
+   #define MEAS__startReqHdlr_forSys
+   #define MEAS__endAnimationMaster_forSys
+   #define MEAS__Capture_End_Susp_in_CoreCtlr_ForSys
+   #define MEAS__Print_Hists_for_System_Meas 
+#endif
+
+#ifdef HOLISTIC__TURN_ON_PERF_COUNTERS
+   
+   #define MEAS__Insert_Counter_Handler \
+   typedef void (*CounterHandler) (int,int,int,SlaveVP*,uint64,uint64,uint64);
+ 
+   enum eventType {
+    DebugEvt = 0,
+    AppResponderInvocation_start,
+    AppResponder_start,
+    AppResponder_end,
+    AssignerInvocation_start,
+    NextAssigner_start,
+    Assigner_start,
+    Assigner_end,
+    Work_start,
+    Work_end,
+    HwResponderInvocation_start,
+    Timestamp_start,
+    Timestamp_end
+   };
+   
+   #define saveCyclesAndInstrs(core,cycles,instrs,cachem) do{ \
+   int cycles_fd = _PRMasterEnv->cycles_counter_fd[core]; \
+   int instrs_fd = _PRMasterEnv->instrs_counter_fd[core]; \
+   int cachem_fd = _PRMasterEnv->cachem_counter_fd[core]; \
+   int nread;                                           \
+                                                        \
+   nread = read(cycles_fd,&(cycles),sizeof(cycles));    \
+   if(nread<0){                                         \
+       perror("Error reading cycles counter");          \
+       cycles = 0;                                      \
+   }                                                    \
+                                                        \
+   nread = read(instrs_fd,&(instrs),sizeof(instrs));    \
+   if(nread<0){                                         \
+       perror("Error reading instructions counter");    \
+       instrs = 0;                                      \
+   }                                                    \
+   nread = read(cachem_fd,&(cachem),sizeof(cachem));    \
+   if(nread<0){                                         \
+       perror("Error reading last level cache miss counter");          \
+       cachem = 0;                                      \
+   }                                                    \
+   } while (0) 
+
+   #define MEAS__Insert_Counter_Meas_Fields_into_MasterEnv \
+     int cycles_counter_fd[NUM_CORES]; \
+     int instrs_counter_fd[NUM_CORES]; \
+     int cachem_counter_fd[NUM_CORES]; \
+     uint64 start_master_lock[NUM_CORES][3]; \
+     CounterHandler counterHandler;
+
+   #define HOLISTIC__Setup_Perf_Counters setup_perf_counters();
+   
+
+   #define HOLISTIC__CoreCtrl_Setup \
+   CounterHandler counterHandler = _PRMasterEnv->counterHandler; \
+   SlaveVP      *lastVPBeforeMaster = NULL; \
+   /*if(thisCoresThdParams->coreNum == 0){ \
+       uint64 initval = tsc_offset_send(thisCoresThdParams,0); \
+       while(!coreCtlrThdParams[NUM_CORES - 2]->ret_tsc); \
+   } \
+   if(0 < (thisCoresThdParams->coreNum) && (thisCoresThdParams->coreNum) < (NUM_CORES - 1)){ \
+       ThdParams* sendCoresThdParams = coreCtlrThdParams[thisCoresThdParams->coreNum - 1]; \
+       int sndctr = tsc_offset_resp(sendCoresThdParams, 0); \
+       uint64 initval = tsc_offset_send(thisCoresThdParams,0); \
+       while(!coreCtlrThdParams[NUM_CORES - 2]->ret_tsc); \
+   }  \
+   if(thisCoresThdParams->coreNum == (NUM_CORES - 1)){ \
+       ThdParams* sendCoresThdParams = coreCtlrThdParams[thisCoresThdParams->coreNum - 1]; \
+       int sndctr = tsc_offset_resp(sendCoresThdParams,0); \
+   }*/
+   
+   
+   #define HOLISTIC__Insert_Master_Global_Vars \
+        int vpid,task; \
+        CounterHandler counterHandler = masterEnv->counterHandler;
+   
+   #define HOLISTIC__Record_last_work lastVPBeforeMaster = currVP;
+
+   #define HOLISTIC__Record_AppResponderInvocation_start \
+      uint64 cycles,instrs,cachem; \
+      saveCyclesAndInstrs(thisCoresIdx,cycles, instrs,cachem); \
+      if(lastVPBeforeMaster){ \
+        (*counterHandler)(AppResponderInvocation_start,lastVPBeforeMaster->slaveID,lastVPBeforeMaster->assignCount,lastVPBeforeMaster,cycles,instrs,cachem); \
+        lastVPBeforeMaster = NULL; \
+      } else { \
+          _PRMasterEnv->start_master_lock[thisCoresIdx][0] = cycles; \
+          _PRMasterEnv->start_master_lock[thisCoresIdx][1] = instrs; \
+          _PRMasterEnv->start_master_lock[thisCoresIdx][2] = cachem; \
+      }
+ 
+           /* Request Handler may call resume() on the VP, but we want to 
+                * account the whole interval to the same task. Therefore, need
+                * to save task ID at the beginning.
+                * 
+                * Using this value as "end of AppResponder Invocation Time"
+                * is possible if there is only one SchedSlot per core -
+                * invoking processor is last to be treated here! If more than
+                * one slot, MasterLoop processing time for all but the last VP
+                * would be erroneously counted as invocation time.
+                */
+   #define HOLISTIC__Record_AppResponder_start \
+               vpid = currSlot->slaveAssignedToSlot->slaveID; \
+               task = currSlot->slaveAssignedToSlot->assignCount; \
+               uint64 cycles, instrs, cachem; \
+               saveCyclesAndInstrs(thisCoresIdx,cycles, instrs,cachem); \
+               (*counterHandler)(AppResponder_start,vpid,task,currSlot->slaveAssignedToSlot,cycles,instrs,cachem);
+
+   #define HOLISTIC__Record_AppResponder_end \
+        uint64 cycles2,instrs2,cachem2; \
+        saveCyclesAndInstrs(thisCoresIdx,cycles2, instrs2,cachem2); \
+        (*counterHandler)(AppResponder_end,vpid,task,currSlot->slaveAssignedToSlot,cycles2,instrs2,cachem2); \
+        (*counterHandler)(Timestamp_end,vpid,task,currSlot->slaveAssignedToSlot,rdtsc(),0,0);
+
+   
+   /* Don't know who to account time to yet - goes to assigned VP
+    * after the call.
+    */
+   #define HOLISTIC__Record_Assigner_start \
+       int empty = FALSE; \
+       if(currSlot->slaveAssignedToSlot == NULL){ \
+           empty= TRUE; \
+       } \
+       uint64 tmp_cycles, tmp_instrs, tmp_cachem; \
+       saveCyclesAndInstrs(thisCoresIdx,tmp_cycles,tmp_instrs,tmp_cachem); \
+       uint64 tsc = rdtsc(); \
+       if(vpid > 0) { \
+           (*counterHandler)(NextAssigner_start,vpid,task,currSlot->slaveAssignedToSlot,tmp_cycles,tmp_instrs,tmp_cachem); \
+           vpid = 0; \
+           task = 0; \
+        }
+
+   #define HOLISTIC__Record_Assigner_end \
+        uint64 cycles,instrs,cachem; \
+        saveCyclesAndInstrs(thisCoresIdx,cycles,instrs,cachem); \
+        if(empty){ \
+            (*counterHandler)(AssignerInvocation_start,assignedSlaveVP->slaveID,assignedSlaveVP->assignCount,assignedSlaveVP,masterEnv->start_master_lock[thisCoresIdx][0],masterEnv->start_master_lock[thisCoresIdx][1],masterEnv->start_master_lock[thisCoresIdx][2]); \
+        } \
+        (*counterHandler)(Timestamp_start,assignedSlaveVP->slaveID,assignedSlaveVP->assignCount,assignedSlaveVP,tsc,0,0); \
+        (*counterHandler)(Assigner_start,assignedSlaveVP->slaveID,assignedSlaveVP->assignCount,assignedSlaveVP,tmp_cycles,tmp_instrs,tmp_cachem); \
+        (*counterHandler)(Assigner_end,assignedSlaveVP->slaveID,assignedSlaveVP->assignCount,assignedSlaveVP,cycles,instrs,cachem);
+
+   #define HOLISTIC__Record_Work_start \
+        if(currVP){ \
+                uint64 cycles,instrs,cachem; \
+                saveCyclesAndInstrs(thisCoresIdx,cycles, instrs,cachem); \
+                (*counterHandler)(Work_start,currVP->slaveID,currVP->assignCount,currVP,cycles,instrs,cachem); \
+        }
+   
+   #define HOLISTIC__Record_Work_end \
+       if(currVP){ \
+               uint64 cycles,instrs,cachem; \
+               saveCyclesAndInstrs(thisCoresIdx,cycles, instrs,cachem); \
+               (*counterHandler)(Work_end,currVP->slaveID,currVP->assignCount,currVP,cycles,instrs,cachem); \
+       }
+
+   #define HOLISTIC__Record_HwResponderInvocation_start \
+        uint64 cycles,instrs,cachem; \
+        saveCyclesAndInstrs(animatingSlv->coreAnimatedBy,cycles, instrs,cachem); \
+        (*(_PRMasterEnv->counterHandler))(HwResponderInvocation_start,animatingSlv->slaveID,animatingSlv->assignCount,animatingSlv,cycles,instrs,cachem); 
+        
+
+   #define getReturnAddressBeforeLibraryCall(vp_ptr, res_ptr) do{     \
+void* frame_ptr0 = (vp_ptr)->framePtr;                             \
+void* frame_ptr1 = *((void**)frame_ptr0);                          \
+void* frame_ptr2 = *((void**)frame_ptr1);                          \
+void* frame_ptr3 = *((void**)frame_ptr2);                          \
+void* ret_addr = *((void**)frame_ptr3 + 1);                        \
+*(res_ptr) = ret_addr;                                             \
+} while (0)
+
+#else  
+   #define MEAS__Insert_Counter_Handler
+   #define MEAS__Insert_Counter_Meas_Fields_into_MasterEnv
+   #define HOLISTIC__Setup_Perf_Counters
+   #define HOLISTIC__CoreCtrl_Setup
+   #define HOLISTIC__Insert_Master_Global_Vars
+   #define HOLISTIC__Record_last_work
+   #define HOLISTIC__Record_AppResponderInvocation_start
+   #define HOLISTIC__Record_AppResponder_start
+   #define HOLISTIC__Record_AppResponder_end
+   #define HOLISTIC__Record_Assigner_start
+   #define HOLISTIC__Record_Assigner_end
+   #define HOLISTIC__Record_Work_start
+   #define HOLISTIC__Record_Work_end
+   #define HOLISTIC__Record_HwResponderInvocation_start
+   #define getReturnAddressBeforeLibraryCall(vp_ptr, res_ptr)
+#endif
+
+//Experiment in two-step macros -- if doesn't work, insert each separately
+#define MEAS__Insert_Meas_Fields_into_Slave  \
+   MEAS__Insert_Susp_Meas_Fields_into_Slave \
+   MEAS__Insert_Master_Meas_Fields_into_Slave \
+   MEAS__Insert_System_Meas_Fields_into_Slave 
+
+
+//======================  Histogram Macros -- Create ========================
+//
+//
+
+//The language implementation should include a definition of this macro,
+// which creates all the histograms the language uses to collect measurements
+// of plugin operation -- so, if the language didn't define it, must
+// define it here (as empty), to avoid compile error
+#ifndef MEAS__Make_Meas_Hists_for_Language
+#define MEAS__Make_Meas_Hists_for_Language
+#endif
+
+#define makeAMeasHist( idx, name, numBins, startVal, binWidth ) \
+      makeHighestDynArrayIndexBeAtLeast( _PRMasterEnv->measHistsInfo, idx ); \
+      _PRMasterEnv->measHists[idx] =  \
+                       makeFixedBinHist( numBins, startVal, binWidth, name );
+
+//==============================  Probes  ===================================
+
+
+//===========================================================================
+#endif	/* _PR_DEFS_MEAS_H */
+
diff -r 0dc0b8653902 -r 999f2966a3e5 Services_Offered_by_PR/Measurement_and_Stats/probes.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/Services_Offered_by_PR/Measurement_and_Stats/probes.c	Wed Sep 19 23:12:44 2012 -0700
@@ -0,0 +1,304 @@
+/*
+ * Copyright 2010  OpenSourceStewardshipFoundation
+ *
+ * Licensed under BSD
+ */
+
+#include <stdio.h>
+#include <malloc.h>
+#include <string.h>
+#include <sys/time.h>
+#include "PR_impl/PR.h"
+
+
+
+//====================  Probes =================
+/*
+ * In practice, probe operations are called from the app, from inside slaves
+ *  -- so have to be sure each probe is single-Slv owned, and be sure that
+ *  any place common structures are modified it's done inside the master.
+ * So -- the only place common structures are modified is during creation.
+ *  after that, all mods are to individual instances.
+ *
+ * Thinking perhaps should change the semantics to be that probes are
+ *  attached to the virtual processor -- and then everything is guaranteed
+ *  to be isolated -- except then can't take any intervals that span Slvs,
+ *  and would have to transfer the probes to Master env when Slv dissipates..
+ *  gets messy..
+ *
+ * For now, just making so that probe creation causes a suspend, so that
+ *  the dynamic array in the master env is only modified from the master
+ * 
+ */
+
+//============================  Helpers ===========================
+inline void 
+doNothing()
+ {
+ }
+
+float64 inline
+giveInterval( struct timeval _start, struct timeval _end )
+ { float64 start, end;
+   start = _start.tv_sec + _start.tv_usec / 1000000.0;
+   end   = _end.tv_sec   + _end.tv_usec   / 1000000.0;
+   return end - start;
+ }
+          
+//=================================================================
+IntervalProbe *
+create_generic_probe( char *nameStr, SlaveVP *animSlv )
+ {
+   PRSemReq reqData;
+
+   reqData.reqType  = make_probe;
+   reqData.nameStr  = nameStr;
+
+   PR_WL__send_PRSem_request( &reqData, animSlv );
+
+   return animSlv->dataRetFromReq;
+ }
+
+/*Use this version from outside PR -- it uses external malloc, and modifies
+ * dynamic array, so can't be animated in a slave Slv
+ */
+IntervalProbe *
+ext__create_generic_probe( char *nameStr )
+ { IntervalProbe *newProbe;
+   int32          nameLen;
+
+   newProbe          = malloc( sizeof(IntervalProbe) );
+   nameLen = strlen( nameStr ) + 1;  //include the terminating NUL
+   newProbe->nameStr = malloc( nameLen );
+   memcpy( newProbe->nameStr, nameStr, nameLen );
+   newProbe->hist    = NULL;
+   newProbe->schedChoiceWasRecorded = FALSE;
+   newProbe->probeID =
+             addToDynArray( newProbe, _PRMasterEnv->dynIntervalProbesInfo );
+
+   return newProbe;
+ }
+
+//============================ Fns def in header =======================
+
+int32
+PR_impl__create_single_interval_probe( char *nameStr, SlaveVP *animSlv )
+ { IntervalProbe *newProbe;
+
+   newProbe = create_generic_probe( nameStr, animSlv );
+   
+   return newProbe->probeID;
+ }
+
+int32
+PR_impl__create_histogram_probe( int32   numBins, float64    startValue,
+               float64 binWidth, char   *nameStr, SlaveVP *animSlv )
+ { IntervalProbe *newProbe;
+
+   newProbe = create_generic_probe( nameStr, animSlv );
+   
+#ifdef PROBES__USE_TIME_OF_DAY_PROBES
+   DblHist *hist;
+   hist =  makeDblHistogram( numBins, startValue, binWidth );
+#else
+   Histogram *hist;
+   hist =  makeHistogram( numBins, startValue, binWidth );
+#endif
+   newProbe->hist = hist;
+   return newProbe->probeID;
+ }
+
+
+int32
+PR_impl__record_time_point_into_new_probe( char *nameStr, SlaveVP *animSlv)
+ { IntervalProbe *newProbe;
+   struct timeval *startStamp;
+   float64 startSecs;
+
+   newProbe           = create_generic_probe( nameStr, animSlv );
+   newProbe->endSecs  = 0;
+
+   
+   gettimeofday( &(newProbe->startStamp), NULL);
+
+      //turn into a double
+   startStamp = &(newProbe->startStamp);
+   startSecs = startStamp->tv_sec + ( startStamp->tv_usec / 1000000.0 );
+   newProbe->startSecs = startSecs;
+
+   return newProbe->probeID;
+ }
+
+int32
+PR_ext_impl__record_time_point_into_new_probe( char *nameStr )
+ { IntervalProbe *newProbe;
+   struct timeval *startStamp;
+   float64 startSecs;
+
+   newProbe           = ext__create_generic_probe( nameStr );
+   newProbe->endSecs  = 0;
+
+   gettimeofday( &(newProbe->startStamp), NULL);
+
+      //turn into a double
+   startStamp = &(newProbe->startStamp);
+   startSecs = startStamp->tv_sec + ( startStamp->tv_usec / 1000000.0 );
+   newProbe->startSecs = startSecs;
+
+   return newProbe->probeID;
+ }
+
+
+/*Only call from inside master or main startup/shutdown thread
+ */
+void
+PR_impl__free_probe( IntervalProbe *probe )
+ { if( probe->hist != NULL )   freeDblHist( probe->hist );
+   if( probe->nameStr != NULL) PR_int__free( probe->nameStr );
+   PR_int__free( probe );
+ }
+
+
+void
+PR_impl__index_probe_by_its_name( int32 probeID, SlaveVP *animSlv )
+ { IntervalProbe *probe;
+
+   PR_int__get_master_lock();
+   probe = _PRMasterEnv->intervalProbes[ probeID ];
+
+   addValueIntoTable(probe->nameStr, probe, _PRMasterEnv->probeNameHashTbl);
+   PR_int__release_master_lock();
+ }
+
+
+IntervalProbe *
+PR_impl__get_probe_by_name( char *probeName, SlaveVP *animSlv )
+ {
+   //TODO: fix this To be in Master -- race condition
+   return getValueFromTable( probeName, _PRMasterEnv->probeNameHashTbl );
+ }
+
+
+/*Everything is local to the animating slaveVP, so no need for request, do
+ * work locally, in the anim Slv
+ */
+void
+PR_impl__record_sched_choice_into_probe( int32 probeID, SlaveVP *animatingSlv )
+ { IntervalProbe *probe;
+ 
+   probe = _PRMasterEnv->intervalProbes[ probeID ];
+   probe->schedChoiceWasRecorded = TRUE;
+   probe->coreNum = animatingSlv->coreAnimatedBy;
+   probe->slaveID = animatingSlv->slaveID;
+   probe->slaveCreateSecs = animatingSlv->createPtInSecs;
+ }
+
+/*Everything is local to the animating slaveVP, so no need for request, do
+ * work locally, in the anim Slv
+ */
+void
+PR_impl__record_interval_start_in_probe( int32 probeID )
+ { IntervalProbe *probe;
+
+         DEBUG__printf( dbgProbes, "record start of interval" )
+   probe = _PRMasterEnv->intervalProbes[ probeID ];
+
+      //record *start* point as last thing, after lookup
+#ifdef PROBES__USE_TIME_OF_DAY_PROBES
+   gettimeofday( &(probe->startStamp), NULL);
+#endif
+#ifdef PROBES__USE_TSC_PROBES
+   probe->startStamp = getTSCount();
+#endif
+ }
+
+
+/*Everything is local to the animating slaveVP, except the histogram, so do
+ * work locally, in the anim Slv -- may lose a few histogram counts
+ * 
+ *This should be safe to run inside SlaveVP
+ */
+void
+PR_impl__record_interval_end_in_probe( int32 probeID )
+ { IntervalProbe *probe;
+
+   //Record first thing -- before looking up the probe to store it into
+#ifdef PROBES__USE_TIME_OF_DAY_PROBES
+   struct timeval  endStamp;
+   gettimeofday( &(endStamp), NULL);
+#endif
+#ifdef PROBES__USE_TSC_PROBES
+   TSCount endStamp, interval;
+   endStamp = getTSCount();
+#endif
+#ifdef PROBES__USE_PERF_CTR_PROBES
+
+#endif
+   
+   probe = _PRMasterEnv->intervalProbes[ probeID ];
+
+#ifdef PROBES__USE_TIME_OF_DAY_PROBES
+   if( probe->hist != NULL )
+    { addToDblHist( giveInterval( probe->startStamp, endStamp), probe->hist );
+    }
+#endif
+#ifdef PROBES__USE_TSC_PROBES
+   if( probe->hist != NULL )
+    { interval = endStamp - probe->startStamp;
+         //Sanity check for TSC counter overflow: if sane, add to histogram
+      if( interval < probe->hist->endOfRange * 10 )
+         addToHist( interval, probe->hist );
+    }
+#endif
+#ifdef PROBES__USE_PERF_CTR_PROBES
+
+#endif
+   
+         DEBUG__printf( dbgProbes, "record end of interval" )
+ }
+
+
+void
+print_probe_helper( IntervalProbe *probe )
+ {
+   printf( "\nprobe: %s, ",  probe->nameStr );
+   
+   
+   if( probe->schedChoiceWasRecorded )
+    { printf( "coreNum: %d, slaveID: %d, slaveVPCreated: %0.6f | ",
+              probe->coreNum, probe->slaveID, probe->slaveCreateSecs );
+    }
+
+   if( probe->endSecs == 0 ) //just a single point in time
+    {
+      printf( " time point: %.6f\n",
+              probe->startSecs - _PRMasterEnv->createPtInSecs );
+    }
+   else if( probe->hist == NULL ) //just an interval
+    {
+      printf( " startSecs: %.6f interval: %.6f\n", 
+         (probe->startSecs - _PRMasterEnv->createPtInSecs), probe->interval);
+    }
+   else  //a full histogram of intervals
+    {
+      printDblHist( probe->hist );
+    }
+ }
+
+void
+PR_impl__print_stats_of_probe( IntervalProbe *probe )
+ { 
+
+//   probe = _PRMasterEnv->intervalProbes[ probeID ];
+
+   print_probe_helper( probe );
+ }
+
+
+void
+PR_impl__print_stats_of_all_probes()
+ {
+   forAllInDynArrayDo( _PRMasterEnv->dynIntervalProbesInfo,
+                          (DynArrayFnPtr) &PR_impl__print_stats_of_probe );
+   fflush( stdout );
+ }
diff -r 0dc0b8653902 -r 999f2966a3e5 Services_Offered_by_PR/Measurement_and_Stats/probes.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/Services_Offered_by_PR/Measurement_and_Stats/probes.h	Wed Sep 19 23:12:44 2012 -0700
@@ -0,0 +1,192 @@
+/*
+ *  Copyright 2009 OpenSourceStewardshipFoundation.org
+ *  Licensed under GNU General Public License version 2
+ *
+ * Author: seanhalle@yahoo.com
+ * 
+ */
+
+#ifndef _PROBES_H
+#define	_PROBES_H
+#define _GNU_SOURCE
+
+#include "PR_impl/PR_primitive_data_types.h"
+
+#include <sys/time.h>
+
+/*Note on order of include files:  
+ * This file relies on #defines that appear in other files, which must come
+ * first in the #include sequence.
+ */
+
+/*Use these aliases in application code*/
+#define PR_App__record_time_point_into_new_probe PR_WL__record_time_point_into_new_probe
+#define PR_App__create_single_interval_probe   PR_WL__create_single_interval_probe
+#define PR_App__create_histogram_probe         PR_WL__create_histogram_probe
+#define PR_App__index_probe_by_its_name        PR_WL__index_probe_by_its_name
+#define PR_App__get_probe_by_name              PR_WL__get_probe_by_name
+#define PR_App__record_sched_choice_into_probe PR_WL__record_sched_choice_into_probe
+#define PR_App__record_interval_start_in_probe PR_WL__record_interval_start_in_probe 
+#define PR_App__record_interval_end_in_probe   PR_WL__record_interval_end_in_probe
+#define PR_App__print_stats_of_probe           PR_WL__print_stats_of_probe
+#define PR_App__print_stats_of_all_probes      PR_WL__print_stats_of_all_probes 
+
+
+//==========================
+#ifdef PROBES__USE_TSC_PROBES
+   #define PROBES__Insert_timestamps_and_intervals_into_probe_struct \
+   TSCount    startStamp; \
+   TSCount    endStamp; \
+   TSCount    interval; \
+   Histogram *hist; /*if left NULL, then is single interval probe*/
+#endif
+#ifdef PROBES__USE_TIME_OF_DAY_PROBES
+   #define PROBES__Insert_timestamps_and_intervals_into_probe_struct \
+   struct timeval  startStamp; \
+   struct timeval  endStamp; \
+   float64         startSecs; \
+   float64         endSecs; \
+   float64         interval; \
+   DblHist        *hist; /*if NULL, then is single interval probe*/
+#endif
+#ifdef PROBES__USE_PERF_CTR_PROBES
+   #define PROBES__Insert_timestamps_and_intervals_into_probe_struct \
+   int64  startStamp; \
+   int64  endStamp; \
+   int64  interval; \
+   Histogram *hist; /*if left NULL, then is single interval probe*/
+#endif
+
+//typedef struct _IntervalProbe IntervalProbe; -- is in PR.h
+struct _IntervalProbe
+ {
+   char           *nameStr;
+   int32           probeID;
+
+   int32           schedChoiceWasRecorded;
+   int32           coreNum;
+   int32           slaveID;
+   float64         slaveCreateSecs;
+   PROBES__Insert_timestamps_and_intervals_into_probe_struct;
+ };
+
+//=========================== NEVER USE THESE ==========================
+/*NEVER use these in any code!!  These are here only for use in the macros
+ * defined in this file!!
+ */
+int32
+PR_impl__create_single_interval_probe( char *nameStr, SlaveVP *animSlv );
+
+int32
+PR_impl__create_histogram_probe( int32   numBins, float64    startValue,
+               float64 binWidth, char    *nameStr, SlaveVP *animSlv );
+
+int32
+PR_impl__record_time_point_into_new_probe( char *nameStr, SlaveVP *animSlv);
+
+int32
+PR_ext_impl__record_time_point_into_new_probe( char *nameStr );
+
+void
+PR_impl__free_probe( IntervalProbe *probe );
+
+void
+PR_impl__index_probe_by_its_name( int32 probeID, SlaveVP *animSlv );
+
+IntervalProbe *
+PR_impl__get_probe_by_name( char *probeName, SlaveVP *animSlv );
+
+void
+PR_impl__record_sched_choice_into_probe( int32 probeID, SlaveVP *animSlv );
+
+void
+PR_impl__record_interval_start_in_probe( int32 probeID );
+
+void
+PR_impl__record_interval_end_in_probe( int32 probeID );
+
+void
+PR_impl__print_stats_of_probe( IntervalProbe *probe );
+
+void
+PR_impl__print_stats_of_all_probes();
+
+
+//======================== Probes =============================
+//
+// Use macros to allow turning probes off with a #define switch
+// This means probes have zero impact on performance when off
+//=============================================================
+
+#ifdef PROBES__TURN_ON_STATS_PROBES
+
+   #define PROBES__Create_Probe_Bookkeeping_Vars \
+      _PRMasterEnv->dynIntervalProbesInfo = \
+       makePrivDynArrayOfSize( (void***)&(_PRMasterEnv->intervalProbes), 200); \
+      \
+      _PRMasterEnv->probeNameHashTbl = makeHashTable( 1000, &PR_int__free ); \
+      \
+      /*put creation time directly into master env, for fast retrieval*/ \
+   struct timeval timeStamp; \
+   gettimeofday( &(timeStamp), NULL); \
+   _PRMasterEnv->createPtInSecs = \
+                           timeStamp.tv_sec +(timeStamp.tv_usec/1000000.0);
+
+   #define PR_WL__record_time_point_into_new_probe( nameStr, animSlv ) \
+           PR_impl__record_time_point_into_new_probe( nameStr, animSlv )
+
+   #define PR_ext__record_time_point_into_new_probe( nameStr ) \
+           PR_ext_impl__record_time_point_into_new_probe( nameStr )
+
+   #define PR_WL__create_single_interval_probe( nameStr, animSlv ) \
+           PR_impl__create_single_interval_probe( nameStr, animSlv )
+
+   #define PR_WL__create_histogram_probe(      numBins, startValue,              \
+                                             binWidth, nameStr, animSlv )       \
+           PR_impl__create_histogram_probe( numBins, startValue,              \
+                                             binWidth, nameStr, animSlv )
+   #define PR_int__free_probe( probe ) \
+           PR_impl__free_probe( probe )
+
+   #define PR_WL__index_probe_by_its_name( probeID, animSlv ) \
+           PR_impl__index_probe_by_its_name( probeID, animSlv )
+
+   #define PR_WL__get_probe_by_name( probeName, animSlv ) \
+           PR_impl__get_probe_by_name( probeName, animSlv )
+
+   #define PR_WL__record_sched_choice_into_probe( probeID, animSlv ) \
+           PR_impl__record_sched_choice_into_probe( probeID, animSlv )
+
+   #define PR_WL__record_interval_start_in_probe( probeID ) \
+           PR_impl__record_interval_start_in_probe( probeID )
+
+   #define PR_WL__record_interval_end_in_probe( probeID ) \
+           PR_impl__record_interval_end_in_probe( probeID )
+
+   #define PR_WL__print_stats_of_probe( probeID ) \
+           PR_impl__print_stats_of_probe( probeID )
+
+   #define PR_WL__print_stats_of_all_probes() \
+           PR_impl__print_stats_of_all_probes()
+
+
+#else
+   #define PROBES__Create_Probe_Bookkeeping_Vars
+   #define PR_WL__record_time_point_into_new_probe( nameStr, animSlv ) 0 /* do nothing */
+   #define PR_ext__record_time_point_into_new_probe( nameStr )  0 /* do nothing */
+   #define PR_WL__create_single_interval_probe( nameStr, animSlv ) 0 /* do nothing */
+   #define PR_WL__create_histogram_probe( numBins, startValue,              \
+                                             binWidth, nameStr, animSlv )       \
+          0 /* do nothing */
+   #define PR_WL__index_probe_by_its_name( probeID, animSlv ) /* do nothing */
+   #define PR_WL__get_probe_by_name( probeID, animSlv ) NULL /* do nothing */
+   #define PR_WL__record_sched_choice_into_probe( probeID, animSlv ) /* do nothing */
+   #define PR_WL__record_interval_start_in_probe( probeID )  /* do nothing */
+   #define PR_WL__record_interval_end_in_probe( probeID )  /* do nothing */
+   #define PR_WL__print_stats_of_probe( probeID ) ; /* do nothing */
+   #define PR_WL__print_stats_of_all_probes() ;/* do nothing */
+
+#endif   /* defined PROBES__TURN_ON_STATS_PROBES */
+
+#endif	/* _PROBES_H */
+
diff -r 0dc0b8653902 -r 999f2966a3e5 Services_Offered_by_PR/Memory_Handling/vmalloc.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/Services_Offered_by_PR/Memory_Handling/vmalloc.c	Wed Sep 19 23:12:44 2012 -0700
@@ -0,0 +1,438 @@
+/*
+ *  Copyright 2009 OpenSourceCodeStewardshipFoundation.org
+ *  Licensed under GNU General Public License version 2
+ *
+ * Author: seanhalle@yahoo.com
+ *
+ * Created on November 14, 2009, 9:07 PM
+ */
+
+#include <malloc.h>
+#include <inttypes.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <math.h>
+
+#include "PR_impl/PR.h"
+#include "Histogram/Histogram.h"
+
+#define MAX_UINT64 0xFFFFFFFFFFFFFFFF
+
+//A MallocProlog is a head element if the HigherInMem variable is NULL
+//A Chunk is free if the prevChunkInFreeList variable is NULL
+
+/*
+ * This calculates the container which fits the given size.
+ */
+inline
+uint32 getContainer(size_t size)
+{
+    return (log2(size)-LOG128)/LOG54;
+}
+
+/*
+ * Removes the first chunk of a freeList
+ * The chunk is removed but not set as free. There is no check if
+ * the free list is empty, so make sure this is not the case.
+ */
+inline
+MallocProlog *removeChunk(MallocArrays* freeLists, uint32 containerIdx)
+{
+    MallocProlog** container = &freeLists->bigChunks[containerIdx];
+    MallocProlog*  removedChunk = *container;
+    *container = removedChunk->nextChunkInFreeList;
+    
+    if(removedChunk->nextChunkInFreeList)
+        removedChunk->nextChunkInFreeList->prevChunkInFreeList = 
+                (MallocProlog*)container;
+    
+    if(*container == NULL)
+    {
+       if(containerIdx < 64)
+           freeLists->bigChunksSearchVector[0] &= ~((uint64)1 << containerIdx); 
+       else
+           freeLists->bigChunksSearchVector[1] &= ~((uint64)1 << (containerIdx-64));
+    }
+    
+    return removedChunk;
+}
+
+/*
+ * Removes the first chunk of a freeList
+ * The chunk is removed but not set as free. There is no check if
+ * the free list is empty, so make sure this is not the case.
+ */
+inline
+MallocProlog *removeSmallChunk(MallocArrays* freeLists, uint32 containerIdx)
+{
+    MallocProlog** container = &freeLists->smallChunks[containerIdx];
+    MallocProlog*  removedChunk = *container;
+    *container = removedChunk->nextChunkInFreeList;
+    
+    if(removedChunk->nextChunkInFreeList)
+        removedChunk->nextChunkInFreeList->prevChunkInFreeList = 
+                (MallocProlog*)container;
+    
+    return removedChunk;
+}
+
+inline
+size_t getChunkSize(MallocProlog* chunk)
+{
+    return (uintptr_t)chunk->nextHigherInMem -
+            (uintptr_t)chunk - sizeof(MallocProlog);
+}
+
+/*
+ * Removes a chunk from a free list.
+ */
+inline
+void extractChunk(MallocProlog* chunk, MallocArrays *freeLists)
+{
+   chunk->prevChunkInFreeList->nextChunkInFreeList = chunk->nextChunkInFreeList;
+   if(chunk->nextChunkInFreeList)
+       chunk->nextChunkInFreeList->prevChunkInFreeList = chunk->prevChunkInFreeList;
+   
+   //The first element in the list points back to the container. If the container
+   //now points to NULL, the container is empty
+   if(*((void**)(chunk->prevChunkInFreeList)) == NULL && getChunkSize(chunk) >= BIG_LOWER_BOUND)
+   {
+       //Find the appropriate container because we do not know it
+       uint64 containerIdx = ((uintptr_t)chunk->prevChunkInFreeList - (uintptr_t)freeLists->bigChunks) / sizeof(MallocProlog*);
+       if(containerIdx < (uint32)64)
+           freeLists->bigChunksSearchVector[0] &= ~((uint64)1 << containerIdx); 
+       if(containerIdx < 128 && containerIdx >=64)
+           freeLists->bigChunksSearchVector[1] &= ~((uint64)1 << (containerIdx-64)); 
+       
+   }
+}
+
+/*
+ * Merges two chunks.
+ * Chunk A has to be before chunk B in memory. Both have to be removed from
+ * a free list
+ */
+inline
+MallocProlog *mergeChunks(MallocProlog* chunkA, MallocProlog* chunkB)
+{
+    chunkA->nextHigherInMem = chunkB->nextHigherInMem;
+    chunkB->nextHigherInMem->nextLowerInMem = chunkA;
+    return chunkA;
+}
+/*
+ * Inserts a chunk into a free list.
+ */
+inline
+void insertChunk(MallocProlog* chunk, MallocProlog** container)
+{
+    chunk->nextChunkInFreeList = *container;
+    chunk->prevChunkInFreeList = (MallocProlog*)container;
+    if(*container)
+        (*container)->prevChunkInFreeList = chunk;
+    *container = chunk;
+}
+
+/*
+ * Divides the chunk so that a new chunk of newSize is created.
+ * There is no size check, so make sure the size value is valid.
+ */
+inline
+MallocProlog *divideChunk(MallocProlog* chunk, size_t newSize)
+{
+    MallocProlog* newChunk = (MallocProlog*)((uintptr_t)chunk->nextHigherInMem -
+            newSize - sizeof(MallocProlog));
+    
+    newChunk->nextLowerInMem  = chunk;
+    newChunk->nextHigherInMem = chunk->nextHigherInMem;
+    
+    chunk->nextHigherInMem->nextLowerInMem = newChunk;
+    chunk->nextHigherInMem = newChunk;
+    
+    return newChunk;
+}
+
+/* 
+ * Search for chunk in the list of big chunks. Split the block if it's too big
+ */
+inline
+MallocProlog *searchChunk(MallocArrays *freeLists, size_t sizeRequested, uint32 containerIdx)
+{
+    MallocProlog* foundChunk;
+    
+    uint64 searchVector = freeLists->bigChunksSearchVector[0];
+    //clear the bits of containers below containerIdx (too small for the request)
+    searchVector &= MAX_UINT64 << containerIdx;
+    containerIdx = __builtin_ffsll(searchVector); //least significant 1 bit
+
+    if(containerIdx == 0)
+    {
+       searchVector = freeLists->bigChunksSearchVector[1];
+       containerIdx = __builtin_ffsll(searchVector);
+       if(containerIdx == 0)
+       {
+           //TODO: get additional mem and insert into free list
+           //malloc( MALLOC_ADDITIONAL_MEM_FROM_OS_SIZE );
+           printf("PR malloc failed: low memory\n");
+           exit(1);   
+       }
+       containerIdx += 64;
+    }
+    containerIdx--;
+    
+
+    foundChunk = removeChunk(freeLists, containerIdx);
+    size_t chunkSize     = getChunkSize(foundChunk);
+
+    //If the new chunk is larger than the requested size: split
+    if(chunkSize > sizeRequested + 2 * sizeof(MallocProlog) + BIG_LOWER_BOUND)
+    {
+       MallocProlog *newChunk = divideChunk(foundChunk,sizeRequested);
+       containerIdx = getContainer(getChunkSize(foundChunk)) - 1;
+       insertChunk(foundChunk,&freeLists->bigChunks[containerIdx]);
+       if(containerIdx < 64)
+           freeLists->bigChunksSearchVector[0] |= ((uint64)1 << containerIdx);
+       else
+           freeLists->bigChunksSearchVector[1] |= ((uint64)1 << (containerIdx-64));
+       foundChunk = newChunk;
+    } 
+    
+    return foundChunk;
+}
+
+
+/*
+ * This is sequential code, meant to only be called from the Master, not from
+ * any slave Slvs.
+ * 
+ *May 2012
+ *ToDo: Improve speed, by using built-in leading 1 detector to calc free-list
+ * index.
+ *Change to two separate arrays, one for free-lists of small fixed-size chunks
+ * other for free lists of exponentially growing chunk sizes
+ *Do simple compare to decide which array of lists to use
+ *For small chunks, size the lists in increments of 16, up to, say, 128 (1024
+ * is max if want less than 64 lists, which allows searching for first
+ * occupied free-list using leading-1 detector on a bit-vector)
+ *To find index, right-shift by 4 bits, and that's the index! (works because
+ * compare says no 1's above 128 position ((bit 7)), and sizes are every 16,
+ * so dividing by 16 equals exactly the position)
+ *For large chunks, have 63 free lists, but split into even and odd indexes.
+ *For even indexes, each list starts with chunks twice the size of previous
+ * even index.
+ *For odd indexes, each list starts with chunks of size half-way between those
+ * of the even indexes on either side.
+ *
+ *To calc the free-list position of a requested size, get pos of leading 1
+ * of the size, call this msbsP (most-significant-bit-set-position). Then
+ * check bit to right of it (one-less-significant)
+ *If it's 0 then use the even index: msbsP * 2, which is msbsP << 1.
+ *If it's 1, then use the odd index, which is (msbsP << 1) + 1
+ *
+ *To find msbsP, use GCC builtin: "int __builtin_clzll (unsigned long long)"
+ * which returns the number of zeros above (left of) msb set.  Note, dies if
+ * give it zero, but the compare used to choose between arrays makes sure
+ * requested size given to it is not zero.
+ * 
+ *This scheme keeps wastage small, while finding free element is O(1), and a
+ * fast constant.
+ *For large chunk sizes, if don't shave excess, then it ensures worst-case
+ * wastage due to mis-match in size of chunk vs requested size is 33% 
+ * (invariant: take any even list.. it starts at a power of 2, and next list
+ *  up starts at 50% larger, so biggest chunk is 1.5 x smallest request, that's
+ *  33% of total memory wasted. Then, for the odd index above, smallest chunk
+ *  is 2x for smallest request of 1.5x, for 25% total wasted memory)
+ *For smallest size chunks, the pre-amble wastes quite a bit, but above that,
+ * sizing in increments of 16 keeps wastage small.  And, if always shave, then
+ * wastage due to size mis-match is maximum 16 bytes for the large chunks.
+ * 
+ */
+void *
+PR_int__malloc( size_t sizeRequested )
+ {     
+         MEAS__Capture_Pre_Malloc_Point
+   
+   MallocArrays* freeLists = _PRMasterEnv->freeLists;
+   MallocProlog* foundChunk;
+   
+   //Return a small chunk if the requested size is at most 128B (LOWER_BOUND)
+   if(sizeRequested <= LOWER_BOUND)
+    {
+      uint32 freeListIdx = sizeRequested ? (sizeRequested-1)/SMALL_CHUNK_SIZE : 0;
+      if(freeLists->smallChunks[freeListIdx] == NULL)
+        foundChunk = searchChunk(freeLists, SMALL_CHUNK_SIZE*(freeListIdx+1), 0);
+      else
+        foundChunk = removeSmallChunk(freeLists, freeListIdx);
+       
+      //Mark as allocated
+      foundChunk->prevChunkInFreeList = NULL;      
+      return foundChunk + 1;
+    }
+   
+   //Calculate the expected container. Start one higher to have a Chunk that's
+   //always big enough.
+   uint32 containerIdx = getContainer(sizeRequested);
+   
+   if(freeLists->bigChunks[containerIdx] == NULL)
+       foundChunk = searchChunk(freeLists, sizeRequested, containerIdx); 
+   else
+       foundChunk = removeChunk(freeLists, containerIdx); 
+   
+   //Mark as allocated
+   foundChunk->prevChunkInFreeList = NULL;      
+   
+         MEAS__Capture_Post_Malloc_Point
+   
+   //skip over the prolog by adding its size to the pointer return
+   return foundChunk + 1;
+ }
+
+void *
+PR_WL__malloc( int32 sizeRequested )
+ { void *ret;
+ 
+   PR_int__get_master_lock();
+   ret = PR_int__malloc( sizeRequested );
+   PR_int__release_master_lock();
+   return ret;
+ }
+
+
+/*
+ * This is sequential code, meant to only be called from the Master, not from
+ * any slave Slvs.
+ */
+void
+PR_int__free( void *ptrToFree )
+ {
+    
+         MEAS__Capture_Pre_Free_Point;
+         
+   MallocArrays* freeLists = _PRMasterEnv->freeLists;
+   MallocProlog *chunkToFree = (MallocProlog*)ptrToFree - 1;
+   uint32 containerIdx;
+   
+   //Check for free neighbors
+   if(chunkToFree->nextLowerInMem)
+   {
+       if(chunkToFree->nextLowerInMem->prevChunkInFreeList != NULL)
+       {//Chunk is not allocated
+           extractChunk(chunkToFree->nextLowerInMem, freeLists);
+           chunkToFree = mergeChunks(chunkToFree->nextLowerInMem, chunkToFree);
+       }
+   }
+   if(chunkToFree->nextHigherInMem)
+   {
+       if(chunkToFree->nextHigherInMem->prevChunkInFreeList != NULL)
+       {//Chunk is not allocated
+           extractChunk(chunkToFree->nextHigherInMem, freeLists);
+           chunkToFree = mergeChunks(chunkToFree, chunkToFree->nextHigherInMem);
+       }
+   }
+   
+   size_t chunkSize = getChunkSize(chunkToFree);
+   if(chunkSize < BIG_LOWER_BOUND)
+   {
+       containerIdx =  (chunkSize/SMALL_CHUNK_SIZE)-1;
+       if(containerIdx > SMALL_CHUNK_COUNT-1)
+           containerIdx = SMALL_CHUNK_COUNT-1;
+       insertChunk(chunkToFree, &freeLists->smallChunks[containerIdx]);
+   }
+   else
+   {
+       containerIdx = getContainer(getChunkSize(chunkToFree)) - 1;
+       insertChunk(chunkToFree, &freeLists->bigChunks[containerIdx]);
+       if(containerIdx < 64)
+           freeLists->bigChunksSearchVector[0] |= (uint64)1 << containerIdx;
+       else
+           freeLists->bigChunksSearchVector[1] |= (uint64)1 << (containerIdx-64);
+   }   
+   
+         MEAS__Capture_Post_Free_Point;
+ }
+
+void
+PR_WL__free( void *ptrToFree )
+ {
+   PR_int__get_master_lock();
+   PR_int__free( ptrToFree );
+   PR_int__release_master_lock();
+ }
+
+/*
+ * Designed to be called from the main thread outside of PR, during init
+ */
+MallocArrays *
+PR_ext__create_free_list()
+{     
+   //Initialize containers for small chunks and fill with zeros
+   _PRMasterEnv->freeLists = (MallocArrays*)malloc( sizeof(MallocArrays) );
+   MallocArrays *freeLists = _PRMasterEnv->freeLists;
+   
+   freeLists->smallChunks = 
+           (MallocProlog**)malloc(SMALL_CHUNK_COUNT*sizeof(MallocProlog*));
+   memset((void*)freeLists->smallChunks,
+           0,SMALL_CHUNK_COUNT*sizeof(MallocProlog*));
+   
+   //Calculate number of containers for big chunks
+   uint32 container = getContainer(MALLOC_ADDITIONAL_MEM_FROM_OS_SIZE)+1;
+   freeLists->bigChunks = (MallocProlog**)malloc(container*sizeof(MallocProlog*));
+   memset((void*)freeLists->bigChunks,0,container*sizeof(MallocProlog*));
+   freeLists->containerCount = container;
+   
+   //Create first element in lastContainer 
+   MallocProlog *firstChunk = malloc( MALLOC_ADDITIONAL_MEM_FROM_OS_SIZE );
+   if( firstChunk == NULL ) {printf("Can't allocate initial memory\n"); exit(1);}
+   freeLists->memSpace = firstChunk;
+   
+   //Touch memory to avoid page faults
+   char *ptr, *endPtr; 
+   endPtr = (char*)firstChunk + MALLOC_ADDITIONAL_MEM_FROM_OS_SIZE;
+   for(ptr = (char*)firstChunk; ptr < endPtr; ptr += PAGE_SIZE)
+   {
+       *ptr = 0;
+   }
+   
+   firstChunk->nextLowerInMem = NULL;
+   firstChunk->nextHigherInMem = (MallocProlog*)((uintptr_t)firstChunk +
+                        MALLOC_ADDITIONAL_MEM_FROM_OS_SIZE - sizeof(MallocProlog));
+   firstChunk->nextChunkInFreeList = NULL;
+   //previous element in the queue is the container
+   firstChunk->prevChunkInFreeList = (MallocProlog*)&freeLists->bigChunks[container-2];
+   
+   freeLists->bigChunks[container-2] = firstChunk;
+   //Insert into bit search list
+   if(container <= 65)
+   {
+       freeLists->bigChunksSearchVector[0] = ((uint64)1 << (container-2));
+       freeLists->bigChunksSearchVector[1] = 0;
+   }   
+   else
+   {
+       freeLists->bigChunksSearchVector[0] = 0;
+       freeLists->bigChunksSearchVector[1] = ((uint64)1 << (container-66));
+   }
+   
+   //Create a dummy chunk to mark the top of the memory space; it is, of
+   //course, never freed
+   MallocProlog *dummyChunk = firstChunk->nextHigherInMem;
+   dummyChunk->nextHigherInMem = dummyChunk+1;
+   dummyChunk->nextLowerInMem  = NULL;
+   dummyChunk->nextChunkInFreeList = NULL;
+   dummyChunk->prevChunkInFreeList = NULL;
+   
+   return freeLists;
+ }
+
+
+/*Designed to be called from the main thread outside of PR, during cleanup
+ */
+void
+PR_ext__free_free_list( MallocArrays *freeLists )
+ {    
+   free(freeLists->memSpace);
+   free(freeLists->bigChunks);
+   free(freeLists->smallChunks);
+   free(freeLists);
+ }
+
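Reviewer note on the big-chunk search in vmalloc.c: the two-word bit vector lets the allocator find the first non-empty container at or above a given index without walking the lists. The following is a minimal standalone sketch of that idea, not the code in this changeset. The names SearchVector, markOccupied, markEmpty and firstOccupiedFrom are hypothetical, and the sketch assumes a GCC-compatible compiler for __builtin_ctzll. Unlike searchChunk, it also handles a start index of 64 or more by masking the second word instead of the first.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for MallocArrays.bigChunksSearchVector:
 * bit i of the 128-bit vector is set when container i is non-empty. */
typedef struct { uint64_t bits[2]; } SearchVector;

/* Record that container idx now holds at least one free chunk. */
static void markOccupied(SearchVector *v, unsigned idx)
{
    assert(idx < 128);
    v->bits[idx >> 6] |= (uint64_t)1 << (idx & 63);
}

/* Record that container idx has become empty. */
static void markEmpty(SearchVector *v, unsigned idx)
{
    assert(idx < 128);
    v->bits[idx >> 6] &= ~((uint64_t)1 << (idx & 63));
}

/* Return the index of the first non-empty container at or above
 * startIdx, or -1 if every such container is empty. */
static int firstOccupiedFrom(const SearchVector *v, unsigned startIdx)
{
    unsigned word;

    if (startIdx >= 128)
        return -1;

    for (word = startIdx >> 6; word < 2; word++)
    {
        uint64_t w = v->bits[word];

        /* Only the word containing startIdx needs masking; the shift
         * count is always below 64, so the shift is well defined. */
        if (word == (startIdx >> 6))
            w &= ~(uint64_t)0 << (startIdx & 63);

        /* __builtin_ctzll is undefined for 0, hence the check. */
        if (w != 0)
            return (int)(word * 64) + __builtin_ctzll(w);
    }
    return -1;
}
```

A caller would treat a result of -1 the way searchChunk treats an exhausted vector, that is, as the out-of-memory case.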
diff -r 0dc0b8653902 -r 999f2966a3e5 Services_Offered_by_PR/Memory_Handling/vmalloc.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/Services_Offered_by_PR/Memory_Handling/vmalloc.h	Wed Sep 19 23:12:44 2012 -0700
@@ -0,0 +1,94 @@
+/*
+ *  Copyright 2009 OpenSourceCodeStewardshipFoundation.org
+ *  Licensed under GNU General Public License version 2
+ *
+ * Author: seanhalle@yahoo.com
+ *
+ * Created on November 14, 2009, 9:07 PM
+ */
+
+#ifndef _VMALLOC_H
+#define	_VMALLOC_H
+
+#include <malloc.h>
+#include <inttypes.h>
+#include "PR_impl/PR_primitive_data_types.h"
+
+#define SMALL_CHUNK_SIZE 32
+#define SMALL_CHUNK_COUNT 4
+#define LOWER_BOUND     128  //Biggest chunk size that is created for the small chunks
+#define BIG_LOWER_BOUND 160  //Smallest chunk size that is created for the big chunks
+
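+#define LOG54 0.3219280948873623  //log2(5/4): container sizes grow by a factor of 5/4
+#define LOG128 7                  //log2(128): big containers start above 128 bytes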
+
+typedef struct _MallocProlog MallocProlog;
+
+struct _MallocProlog
+ {
+   MallocProlog *nextChunkInFreeList;
+   MallocProlog *prevChunkInFreeList;
+   MallocProlog *nextHigherInMem;
+   MallocProlog *nextLowerInMem;
+ };
+//MallocProlog
+ 
+ typedef struct MallocArrays MallocArrays;
+
+ struct MallocArrays
+ {
+     MallocProlog **smallChunks;
+     MallocProlog **bigChunks;
+     uint64       bigChunksSearchVector[2];
+     void         *memSpace;
+     uint32       containerCount;
+ };
+ //MallocArrays
+
+typedef struct
+ {
+   MallocProlog *firstChunkInFreeList;
+   int32         numInList; //TODO not used
+ }
+FreeListHead;
+
+void *
+PR_int__malloc( size_t sizeRequested );
+#define PR_PI__malloc  PR_int__malloc
+
+void *
+PR_WL__malloc( int32  sizeRequested ); /*BUG: -- get master lock */
+#define PR_App__malloc  PR_WL__malloc
+
+void *
+PR_int__malloc_aligned( size_t sizeRequested );
+#define PR_PI__malloc_aligned PR_int__malloc_aligned
+
+void
+PR_int__free( void *ptrToFree );
+#define PR_PI__free  PR_int__free
+
+void
+PR_WL__free( void *ptrToFree );
+#define PR_App__free  PR_WL__free
+
+
+
+/*Allocates memory from the external system -- higher overhead
+ */
+void *
+PR_ext__malloc_in_ext( size_t sizeRequested );
+
+/*Frees memory that was allocated in the external system -- higher overhead
+ */
+void
+PR_ext__free_in_ext( void *ptrToFree );
+
+
+MallocArrays *
+PR_ext__create_free_list();
+
+void
+PR_ext__free_free_list(MallocArrays *freeLists );
+
+#endif
\ No newline at end of file
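Reviewer note on the MallocProlog layout in vmalloc.h: each chunk carries a prolog directly in front of its payload, the payload size is derived from the distance to the next prolog up in memory, and a NULL prevChunkInFreeList marks a chunk as allocated. The sketch below illustrates those relationships under stated assumptions. Prolog, newRegion, payloadSize, payloadOf, prologOf and splitTop are hypothetical names that mirror, but are not, the functions in this changeset, and the region comes from plain malloc rather than from PR's own setup code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Same field order as MallocProlog; renamed to mark this as a sketch. */
typedef struct Prolog Prolog;
struct Prolog
{
    Prolog *nextChunkInFreeList;
    Prolog *prevChunkInFreeList;   /* NULL means "allocated" */
    Prolog *nextHigherInMem;
    Prolog *nextLowerInMem;
};

/* Payload bytes between this prolog and the next prolog up in memory. */
static size_t payloadSize(const Prolog *chunk)
{
    return (uintptr_t)chunk->nextHigherInMem - (uintptr_t)chunk
           - sizeof(Prolog);
}

/* Pointer handed to the caller: the first byte after the prolog. */
static void *payloadOf(Prolog *chunk) { return chunk + 1; }

/* Inverse of payloadOf, as a free routine would compute it. */
static Prolog *prologOf(void *payload) { return (Prolog *)payload - 1; }

/* Build one region holding a single chunk plus a sentinel prolog at the
 * top, so every real chunk has a valid nextHigherInMem. */
static Prolog *newRegion(size_t bytes)
{
    Prolog *first = malloc(bytes);
    Prolog *sentinel;

    if (first == NULL || bytes < 2 * sizeof(Prolog))
    {
        free(first);
        return NULL;
    }
    sentinel = (Prolog *)((uintptr_t)first + bytes - sizeof(Prolog));

    first->nextLowerInMem      = NULL;
    first->nextHigherInMem     = sentinel;
    first->nextChunkInFreeList = NULL;
    first->prevChunkInFreeList = NULL;

    sentinel->nextLowerInMem      = NULL;
    sentinel->nextHigherInMem     = sentinel + 1;
    sentinel->nextChunkInFreeList = NULL;
    sentinel->prevChunkInFreeList = NULL;   /* never looks free */
    return first;
}

/* Carve a chunk with newSize payload bytes off the top of chunk.
 * newSize is assumed to keep the new prolog suitably aligned. */
static Prolog *splitTop(Prolog *chunk, size_t newSize)
{
    Prolog *top;

    assert(newSize + sizeof(Prolog) <= payloadSize(chunk));
    top = (Prolog *)((uintptr_t)chunk->nextHigherInMem
                     - newSize - sizeof(Prolog));

    top->nextLowerInMem      = chunk;
    top->nextHigherInMem     = chunk->nextHigherInMem;
    top->nextChunkInFreeList = NULL;
    top->prevChunkInFreeList = NULL;

    chunk->nextHigherInMem->nextLowerInMem = top;
    chunk->nextHigherInMem = top;
    return top;
}
```

Because the size is never stored, splitting or merging chunks only rewires the two in-memory neighbour pointers, which is the property the merge and divide routines in vmalloc.c rely on.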
diff -r 0dc0b8653902 -r 999f2966a3e5 Services_Offered_by_VMS/Debugging/DEBUG__macros.h
--- a/Services_Offered_by_VMS/Debugging/DEBUG__macros.h	Mon Sep 03 03:34:54 2012 -0700
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,65 +0,0 @@
-/*
- *  Copyright 2009 OpenSourceStewardshipFoundation.org
- *  Licensed under GNU General Public License version 2
- *
- * Author: seanhalle@yahoo.com
- * 
- */
-
-#ifndef  _VMS_DEFS_DEBUG_H
-#define	_VMS_DEFS_DEBUG_H
-#define _GNU_SOURCE
-
-/*
- */
-#ifdef DEBUG__TURN_ON_DEBUG_PRINT
-   #define DEBUG__printf(  bool, msg) \
-      do{\
-         if(bool)\
-          { printf(msg);\
-            printf(" | function: %s\n", __FUNCTION__);\
-            fflush(stdin);\
-          }\
-        }while(0);/*macro magic to isolate var-names*/
-
-   #define DEBUG__printf1( bool, msg, param)  \
-      do{\
-         if(bool)\
-          { printf(msg, param);\
-            printf(" | function: %s\n", __FUNCTION__);\
-            fflush(stdin);\
-          }\
-        }while(0);/*macro magic to isolate var-names*/
-
-   #define DEBUG__printf2( bool, msg, p1, p2) \
-      do{\
-         if(bool)\
-          { printf(msg, p1, p2); \
-            printf(" | function: %s\n", __FUNCTION__);\
-            fflush(stdin);\
-          }\
-        }while(0);/*macro magic to isolate var-names*/
-
-   #define DEBUG__printf3( bool, msg, p1, p2, p3) \
-      do{\
-         if(bool)\
-          { printf(msg, p1, p2, p3); \
-            printf(" | function: %s\n", __FUNCTION__);\
-            fflush(stdin);\
-          }\
-        }while(0);/*macro magic to isolate var-names*/
-
-#else
-   #define DEBUG__printf(  bool, msg)         
-   #define DEBUG__printf1( bool, msg, param)  
-   #define DEBUG__printf2( bool, msg, p1, p2) 
-#endif
-
-//============================= ERROR MSGs ============================
-#define ERROR(msg) printf(msg);
-#define ERROR1(msg, param) printf(msg, param); 
-#define ERROR2(msg, p1, p2) printf(msg, p1, p2);
-
-//===========================================================================
-#endif	/* _VMS_DEFS_H */
-
diff -r 0dc0b8653902 -r 999f2966a3e5 Services_Offered_by_VMS/Lang_Constructs/VMS_Lang.h
--- a/Services_Offered_by_VMS/Lang_Constructs/VMS_Lang.h	Mon Sep 03 03:34:54 2012 -0700
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,44 +0,0 @@
-/*
- *  Copyright 2009 OpenSourceStewardshipFoundation.org
- *  Licensed under GNU General Public License version 2
- *
- * Author: seanhalle@yahoo.com
- *
- */
-
-#ifndef _VMS_LANG_CONSTRUCTS_H
-#define	_VMS_LANG_CONSTRUCTS_H
-
-#include "VMS_impl/VMS_primitive_data_types.h"
-
-/*This header defines everything specific to the VMS provided language
- * constructs.
- *Such constructs are used in application code, mixed-in with calls to
- * constructs of the VMS-based language. 
- */
-inline void
-handleMalloc( SSRSemReq *semReq, SlaveVP *requestingSlv, SSRSemEnv *semEnv);
-inline void
-handleFree( SSRSemReq *semReq, SlaveVP *requestingSlv, SSRSemEnv *semEnv );
-inline void
-handleTransEnd(SSRSemReq *semReq, SlaveVP *requestingSlv, SSRSemEnv*semEnv);
-inline void
-handleTransStart( SSRSemReq *semReq, SlaveVP *requestingSlv,
-                  SSRSemEnv *semEnv );
-inline void
-handleAtomic( SSRSemReq *semReq, SlaveVP *requestingSlv, SSRSemEnv *semEnv);
-inline void
-handleStartFnSingleton( SSRSemReq *semReq, SlaveVP *reqstingSlv,
-                      SSRSemEnv *semEnv );
-inline void
-handleEndFnSingleton( SSRSemReq *semReq, SlaveVP *requestingSlv,
-                    SSRSemEnv *semEnv );
-inline void
-handleStartDataSingleton( SSRSemReq *semReq, SlaveVP *reqstingSlv,
-                      SSRSemEnv *semEnv );
-inline void
-handleEndDataSingleton( SSRSemReq *semReq, SlaveVP *requestingSlv,
-                    SSRSemEnv *semEnv );
-
-#endif	/* _VMS_LANG_CONSTRUCTS_H */
-
diff -r 0dc0b8653902 -r 999f2966a3e5 Services_Offered_by_VMS/Measurement_and_Stats/MEAS__macros.h
--- a/Services_Offered_by_VMS/Measurement_and_Stats/MEAS__macros.h	Mon Sep 03 03:34:54 2012 -0700
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,514 +0,0 @@
-/*
- *  Copyright 2009 OpenSourceStewardshipFoundation.org
- *  Licensed under GNU General Public License version 2
- *
- * Author: seanhalle@yahoo.com
- * 
- */
-
-#ifndef _VMS_MEAS_MACROS_H
-#define _VMS_MEAS_MACROS_H
-#define _GNU_SOURCE
-
-//==================  Macros define types of meas want  =====================
-//
-/*Generic measurement macro -- has name-space collision potential, which
- * compiler will catch..  so only use one pair inside a given set of 
- * curly braces. 
- */
-//TODO: finish generic capture interval in hist
-enum histograms
- { generic1
- };
-   #define MEAS__Capture_Pre_Point \
-      int32 startStamp, endStamp; \
-      saveLowTimeStampCountInto( startStamp );
-
-   #define MEAS__Capture_Post_Point( histName ) \
-      saveLowTimeStampCountInto( endStamp ); \
-      addIntervalToHist( startStamp, endStamp, _VMSMasterEnv->histName ); 
-
-
-
-
-//==================  Macros define types of meas want  =====================
-
-#ifdef MEAS__TURN_ON_SUSP_MEAS
-   #define MEAS__Insert_Susp_Meas_Fields_into_Slave \
-       uint32  preSuspTSCLow; \
-       uint32  postSuspTSCLow;
-
-   #define MEAS__Insert_Susp_Meas_Fields_into_MasterEnv \
-       Histogram       *suspLowTimeHist; \
-       Histogram       *suspHighTimeHist;
-
-   #define MEAS__Make_Meas_Hists_for_Susp_Meas \
-      _VMSMasterEnv->suspLowTimeHist  = makeFixedBinHistExt( 100, 0, 200,\
-                                                    "master_low_time_hist");\
-      _VMSMasterEnv->suspHighTimeHist  = makeFixedBinHistExt( 100, 0, 200,\
-                                                    "master_high_time_hist");
-      
-      //record time stamp: compare to time-stamp recorded below
-   #define MEAS__Capture_Pre_Susp_Point \
-      saveLowTimeStampCountInto( animatingSlv->preSuspTSCLow );
-   
-      //NOTE: only take low part of count -- do sanity check when take diff
-   #define MEAS__Capture_Post_Susp_Point \
-      saveLowTimeStampCountInto( animatingSlv->postSuspTSCLow );\
-      addIntervalToHist( preSuspTSCLow, postSuspTSCLow,\
-                         _VMSMasterEnv->suspLowTimeHist ); \
-      addIntervalToHist( preSuspTSCLow, postSuspTSCLow,\
-                         _VMSMasterEnv->suspHighTimeHist );
-
-   #define MEAS__Print_Hists_for_Susp_Meas \
-      printHist( _VMSMasterEnv->pluginTimeHist );
-      
-#else
-   #define MEAS__Insert_Susp_Meas_Fields_into_Slave     
-   #define MEAS__Insert_Susp_Meas_Fields_into_MasterEnv 
-   #define MEAS__Make_Meas_Hists_for_Susp_Meas 
-   #define MEAS__Capture_Pre_Susp_Point
-   #define MEAS__Capture_Post_Susp_Point   
-   #define MEAS__Print_Hists_for_Susp_Meas 
-#endif
-
-#ifdef MEAS__TURN_ON_MASTER_MEAS
-   #define MEAS__Insert_Master_Meas_Fields_into_Slave \
-       uint32  startMasterTSCLow; \
-       uint32  endMasterTSCLow;
-
-   #define MEAS__Insert_Master_Meas_Fields_into_MasterEnv \
-       Histogram       *masterLowTimeHist; \
-       Histogram       *masterHighTimeHist;
-
-   #define MEAS__Make_Meas_Hists_for_Master_Meas \
-      _VMSMasterEnv->masterLowTimeHist  = makeFixedBinHistExt( 100, 0, 200,\
-                                                    "master_low_time_hist");\
-      _VMSMasterEnv->masterHighTimeHist  = makeFixedBinHistExt( 100, 0, 200,\
-                                                    "master_high_time_hist");
-
-      //Total Master time includes one coreloop time -- just assume the core
-      // loop time is same for Master as for AppSlvs, even though it may be
-      // smaller due to higher predictability of the fixed jmp.
-   #define MEAS__Capture_Pre_Master_Point\
-      saveLowTimeStampCountInto( masterVP->startMasterTSCLow );
-
-   #define MEAS__Capture_Post_Master_Point \
-      saveLowTimeStampCountInto( masterVP->endMasterTSCLow );\
-      addIntervalToHist( startMasterTSCLow, endMasterTSCLow,\
-                         _VMSMasterEnv->masterLowTimeHist ); \
-      addIntervalToHist( startMasterTSCLow, endMasterTSCLow,\
-                         _VMSMasterEnv->masterHighTimeHist );
-
-   #define MEAS__Print_Hists_for_Master_Meas \
-      printHist( _VMSMasterEnv->pluginTimeHist );
-
-#else
-   #define MEAS__Insert_Master_Meas_Fields_into_Slave
-   #define MEAS__Insert_Master_Meas_Fields_into_MasterEnv 
-   #define MEAS__Make_Meas_Hists_for_Master_Meas
-   #define MEAS__Capture_Pre_Master_Point 
-   #define MEAS__Capture_Post_Master_Point 
-   #define MEAS__Print_Hists_for_Master_Meas 
-#endif
-
-      
-#ifdef MEAS__TURN_ON_MASTER_LOCK_MEAS
-   #define MEAS__Insert_Master_Lock_Meas_Fields_into_MasterEnv \
-       Histogram       *masterLockLowTimeHist; \
-       Histogram       *masterLockHighTimeHist;
-
-   #define MEAS__Make_Meas_Hists_for_Master_Lock_Meas \
-      _VMSMasterEnv->masterLockLowTimeHist  = makeFixedBinHist( 50, 0, 2, \
-                                               "master lock low time hist");\
-      _VMSMasterEnv->masterLockHighTimeHist  = makeFixedBinHist( 50, 0, 100,\
-                                               "master lock high time hist");
-
-   #define MEAS__Capture_Pre_Master_Lock_Point \
-      int32 startStamp, endStamp; \
-      saveLowTimeStampCountInto( startStamp );
-
-   #define MEAS__Capture_Post_Master_Lock_Point \
-      saveLowTimeStampCountInto( endStamp ); \
-      addIntervalToHist( startStamp, endStamp,\
-                         _VMSMasterEnv->masterLockLowTimeHist ); \
-      addIntervalToHist( startStamp, endStamp,\
-                         _VMSMasterEnv->masterLockHighTimeHist );
-
-   #define MEAS__Print_Hists_for_Master_Lock_Meas \
-      printHist( _VMSMasterEnv->masterLockLowTimeHist ); \
-      printHist( _VMSMasterEnv->masterLockHighTimeHist );
-      
-#else
-   #define MEAS__Insert_Master_Lock_Meas_Fields_into_MasterEnv
-   #define MEAS__Make_Meas_Hists_for_Master_Lock_Meas
-   #define MEAS__Capture_Pre_Master_Lock_Point 
-   #define MEAS__Capture_Post_Master_Lock_Point 
-   #define MEAS__Print_Hists_for_Master_Lock_Meas
-#endif
-
-
-#ifdef MEAS__TURN_ON_MALLOC_MEAS
-   #define MEAS__Insert_Malloc_Meas_Fields_into_MasterEnv\
-       Histogram       *mallocTimeHist; \
-       Histogram       *freeTimeHist;
-
-   #define MEAS__Make_Meas_Hists_for_Malloc_Meas \
-      _VMSMasterEnv->mallocTimeHist  = makeFixedBinHistExt( 100, 0, 30,\
-                                                       "malloc_time_hist");\
-      _VMSMasterEnv->freeTimeHist  = makeFixedBinHistExt( 100, 0, 30,\
-                                                       "free_time_hist");
-
-   #define MEAS__Capture_Pre_Malloc_Point \
-      int32 startStamp, endStamp; \
-      saveLowTimeStampCountInto( startStamp );
-
-   #define MEAS__Capture_Post_Malloc_Point \
-      saveLowTimeStampCountInto( endStamp ); \
-      addIntervalToHist( startStamp, endStamp,\
-                         _VMSMasterEnv->mallocTimeHist ); 
-
-   #define MEAS__Capture_Pre_Free_Point \
-      int32 startStamp, endStamp; \
-      saveLowTimeStampCountInto( startStamp );
-
-   #define MEAS__Capture_Post_Free_Point \
-      saveLowTimeStampCountInto( endStamp ); \
-      addIntervalToHist( startStamp, endStamp,\
-                         _VMSMasterEnv->freeTimeHist ); 
-
-   #define MEAS__Print_Hists_for_Malloc_Meas \
-      printHist( _VMSMasterEnv->mallocTimeHist   ); \
-      saveHistToFile( _VMSMasterEnv->mallocTimeHist   ); \
-      printHist( _VMSMasterEnv->freeTimeHist     ); \
-      saveHistToFile( _VMSMasterEnv->freeTimeHist     ); \
-      freeHistExt( _VMSMasterEnv->mallocTimeHist ); \
-      freeHistExt( _VMSMasterEnv->freeTimeHist   );
-      
-#else
-   #define MEAS__Insert_Malloc_Meas_Fields_into_MasterEnv
-   #define MEAS__Make_Meas_Hists_for_Malloc_Meas 
-   #define MEAS__Capture_Pre_Malloc_Point
-   #define MEAS__Capture_Post_Malloc_Point
-   #define MEAS__Capture_Pre_Free_Point
-   #define MEAS__Capture_Post_Free_Point
-   #define MEAS__Print_Hists_for_Malloc_Meas 
-#endif
-
-
-
-#ifdef MEAS__TURN_ON_PLUGIN_MEAS 
-   #define MEAS__Insert_Plugin_Meas_Fields_into_MasterEnv \
-      Histogram       *reqHdlrLowTimeHist; \
-      Histogram       *reqHdlrHighTimeHist;
-          
-   #define MEAS__Make_Meas_Hists_for_Plugin_Meas \
-      _VMSMasterEnv->reqHdlrLowTimeHist  = makeFixedBinHistExt( 100, 0, 200,\
-                                                    "plugin_low_time_hist");\
-      _VMSMasterEnv->reqHdlrHighTimeHist  = makeFixedBinHistExt( 100, 0, 200,\
-                                                    "plugin_high_time_hist");
-
-   #define MEAS__startReqHdlr \
-      int32 startStamp1, endStamp1; \
-      saveLowTimeStampCountInto( startStamp1 );
-
-   #define MEAS__endReqHdlr \
-      saveLowTimeStampCountInto( endStamp1 ); \
-      addIntervalToHist( startStamp1, endStamp1, \
-                           _VMSMasterEnv->reqHdlrLowTimeHist ); \
-      addIntervalToHist( startStamp1, endStamp1, \
-                           _VMSMasterEnv->reqHdlrHighTimeHist );
-
-   #define MEAS__Print_Hists_for_Plugin_Meas \
-      printHist( _VMSMasterEnv->reqHdlrLowTimeHist ); \
-      saveHistToFile( _VMSMasterEnv->reqHdlrLowTimeHist ); \
-      printHist( _VMSMasterEnv->reqHdlrHighTimeHist ); \
-      saveHistToFile( _VMSMasterEnv->reqHdlrHighTimeHist ); \
-      freeHistExt( _VMSMasterEnv->reqHdlrLowTimeHist ); \
-      freeHistExt( _VMSMasterEnv->reqHdlrHighTimeHist );
-#else
-   #define MEAS__Insert_Plugin_Meas_Fields_into_MasterEnv
-   #define MEAS__Make_Meas_Hists_for_Plugin_Meas
-   #define MEAS__startReqHdlr 
-   #define MEAS__endReqHdlr 
-   #define MEAS__Print_Hists_for_Plugin_Meas 
-
-#endif
-
-      
-#ifdef MEAS__TURN_ON_SYSTEM_MEAS
-   #define MEAS__Insert_System_Meas_Fields_into_Slave \
-      TSCountLowHigh  startSusp; \
-      uint64  totalSuspCycles; \
-      uint32  numGoodSusp;
-
-   #define MEAS__Insert_System_Meas_Fields_into_MasterEnv \
-       TSCountLowHigh   startMaster; \
-       uint64           totalMasterCycles; \
-       uint32           numMasterAnimations; \
-       TSCountLowHigh   startReqHdlr; \
-       uint64           totalPluginCycles; \
-       uint32           numPluginAnimations; \
-       uint64           cyclesTillStartAnimationMaster; \
-       TSCountLowHigh   endAnimationMaster;
-
-   #define MEAS__startAnimationMaster_forSys \
-      TSCountLowHigh startStamp1, endStamp1; \
-      saveTSCLowHigh( endStamp1 ); \
-      _VMSMasterEnv->cyclesTillStartAnimationMaster = \
-      endStamp1.longVal - masterVP->startSusp.longVal;
-
-   #define Meas_startReqHdlr_forSys \
-        saveTSCLowHigh( startStamp1 ); \
-        _VMSMasterEnv->startReqHdlr.longVal = startStamp1.longVal;
- 
-   #define MEAS__endAnimationMaster_forSys \
-      saveTSCLowHigh( startStamp1 ); \
-      _VMSMasterEnv->endAnimationMaster.longVal = startStamp1.longVal;
-
-   /*A TSC is stored in VP first thing inside wrapper-lib
-    * Now, measures cycles from there to here
-    * Master and Plugin will add this value to other trace-seg measures
-    */
-   #define MEAS__Capture_End_Susp_in_CoreCtlr_ForSys\
-          saveTSCLowHigh(endSusp); \
-          numCycles = endSusp.longVal - currVP->startSusp.longVal; \
-          /*sanity check (400K is about 20K iters)*/ \
-          if( numCycles < 400000 ) \
-           { currVP->totalSuspCycles += numCycles; \
-             currVP->numGoodSusp++; \
-           } \
-             /*recorded every time, but only read if currVP == MasterVP*/ \
-          _VMSMasterEnv->startMaster.longVal = endSusp.longVal;
-
-#else
-   #define MEAS__Insert_System_Meas_Fields_into_Slave 
-   #define MEAS__Insert_System_Meas_Fields_into_MasterEnv 
-   #define MEAS__Make_Meas_Hists_for_System_Meas
-   #define MEAS__startAnimationMaster_forSys 
-   #define MEAS__startReqHdlr_forSys
-   #define MEAS__endAnimationMaster_forSys
-   #define MEAS__Capture_End_Susp_in_CoreCtlr_ForSys
-   #define MEAS__Print_Hists_for_System_Meas 
-#endif
-
-#ifdef HOLISTIC__TURN_ON_PERF_COUNTERS
-   
-   #define MEAS__Insert_Counter_Handler \
-   typedef void (*CounterHandler) (int,int,int,SlaveVP*,uint64,uint64,uint64);
- 
-   enum eventType {
-    DebugEvt = 0,
-    AppResponderInvocation_start,
-    AppResponder_start,
-    AppResponder_end,
-    AssignerInvocation_start,
-    NextAssigner_start,
-    Assigner_start,
-    Assigner_end,
-    Work_start,
-    Work_end,
-    HwResponderInvocation_start,
-    Timestamp_start,
-    Timestamp_end
-   };
-   
-   #define saveCyclesAndInstrs(core,cycles,instrs,cachem) do{ \
-   int cycles_fd = _VMSMasterEnv->cycles_counter_fd[core]; \
-   int instrs_fd = _VMSMasterEnv->instrs_counter_fd[core]; \
-   int cachem_fd = _VMSMasterEnv->cachem_counter_fd[core]; \
-   int nread;                                           \
-                                                        \
-   nread = read(cycles_fd,&(cycles),sizeof(cycles));    \
-   if(nread<0){                                         \
-       perror("Error reading cycles counter");          \
-       cycles = 0;                                      \
-   }                                                    \
-                                                        \
-   nread = read(instrs_fd,&(instrs),sizeof(instrs));    \
-   if(nread<0){                                         \
-       perror("Error reading instructions counter");    \
-       instrs = 0;                                      \
-   }                                                    \
-   nread = read(cachem_fd,&(cachem),sizeof(cachem));    \
-   if(nread<0){                                         \
-       perror("Error reading last level cache miss counter");          \
-       cachem = 0;                                      \
-   }                                                    \
-   } while (0) 
-
-   #define MEAS__Insert_Counter_Meas_Fields_into_MasterEnv \
-     int cycles_counter_fd[NUM_CORES]; \
-     int instrs_counter_fd[NUM_CORES]; \
-     int cachem_counter_fd[NUM_CORES]; \
-     uint64 start_master_lock[NUM_CORES][3]; \
-     CounterHandler counterHandler;
-
-   #define HOLISTIC__Setup_Perf_Counters setup_perf_counters();
-   
-
-   #define HOLISTIC__CoreCtrl_Setup \
-   CounterHandler counterHandler = _VMSMasterEnv->counterHandler; \
-   SlaveVP      *lastVPBeforeMaster = NULL; \
-   /*if(thisCoresThdParams->coreNum == 0){ \
-       uint64 initval = tsc_offset_send(thisCoresThdParams,0); \
-       while(!coreCtlrThdParams[NUM_CORES - 2]->ret_tsc); \
-   } \
-   if(0 < (thisCoresThdParams->coreNum) && (thisCoresThdParams->coreNum) < (NUM_CORES - 1)){ \
-       ThdParams* sendCoresThdParams = coreCtlrThdParams[thisCoresThdParams->coreNum - 1]; \
-       int sndctr = tsc_offset_resp(sendCoresThdParams, 0); \
-       uint64 initval = tsc_offset_send(thisCoresThdParams,0); \
-       while(!coreCtlrThdParams[NUM_CORES - 2]->ret_tsc); \
-   }  \
-   if(thisCoresThdParams->coreNum == (NUM_CORES - 1)){ \
-       ThdParams* sendCoresThdParams = coreCtlrThdParams[thisCoresThdParams->coreNum - 1]; \
-       int sndctr = tsc_offset_resp(sendCoresThdParams,0); \
-   }*/
-   
-   
-   #define HOLISTIC__Insert_Master_Global_Vars \
-        int vpid,task; \
-        CounterHandler counterHandler = masterEnv->counterHandler;
-   
-   #define HOLISTIC__Record_last_work lastVPBeforeMaster = currVP;
-
-   #define HOLISTIC__Record_AppResponderInvocation_start \
-      uint64 cycles,instrs,cachem; \
-      saveCyclesAndInstrs(thisCoresIdx,cycles, instrs,cachem); \
-      if(lastVPBeforeMaster){ \
-        (*counterHandler)(AppResponderInvocation_start,lastVPBeforeMaster->slaveID,lastVPBeforeMaster->assignCount,lastVPBeforeMaster,cycles,instrs,cachem); \
-        lastVPBeforeMaster = NULL; \
-      } else { \
-          _VMSMasterEnv->start_master_lock[thisCoresIdx][0] = cycles; \
-          _VMSMasterEnv->start_master_lock[thisCoresIdx][1] = instrs; \
-          _VMSMasterEnv->start_master_lock[thisCoresIdx][2] = cachem; \
-      }
- 
-           /* Request Handler may call resume() on the VP, but we want to 
-                * account the whole interval to the same task. Therefore, need
-                * to save task ID at the beginning.
-                * 
-                * Using this value as "end of AppResponder Invocation Time"
-                * is possible if there is only one SchedSlot per core -
-                * invoking processor is last to be treated here! If more than
-                * one slot, MasterLoop processing time for all but the last VP
-                * would be erroneously counted as invocation time.
-                */
-   #define HOLISTIC__Record_AppResponder_start \
-               vpid = currSlot->slaveAssignedToSlot->slaveID; \
-               task = currSlot->slaveAssignedToSlot->assignCount; \
-               uint64 cycles, instrs, cachem; \
-               saveCyclesAndInstrs(thisCoresIdx,cycles, instrs,cachem); \
-               (*counterHandler)(AppResponder_start,vpid,task,currSlot->slaveAssignedToSlot,cycles,instrs,cachem);
-
-   #define HOLISTIC__Record_AppResponder_end \
-        uint64 cycles2,instrs2,cachem2; \
-        saveCyclesAndInstrs(thisCoresIdx,cycles2, instrs2,cachem2); \
-        (*counterHandler)(AppResponder_end,vpid,task,currSlot->slaveAssignedToSlot,cycles2,instrs2,cachem2); \
-        (*counterHandler)(Timestamp_end,vpid,task,currSlot->slaveAssignedToSlot,rdtsc(),0,0);
-
-   
-   /* Don't know who to account time to yet - goes to assigned VP
-    * after the call.
-    */
-   #define HOLISTIC__Record_Assigner_start \
-       int empty = FALSE; \
-       if(currSlot->slaveAssignedToSlot == NULL){ \
-           empty= TRUE; \
-       } \
-       uint64 tmp_cycles, tmp_instrs, tmp_cachem; \
-       saveCyclesAndInstrs(thisCoresIdx,tmp_cycles,tmp_instrs,tmp_cachem); \
-       uint64 tsc = rdtsc(); \
-       if(vpid > 0) { \
-           (*counterHandler)(NextAssigner_start,vpid,task,currSlot->slaveAssignedToSlot,tmp_cycles,tmp_instrs,tmp_cachem); \
-           vpid = 0; \
-           task = 0; \
-        }
-
-   #define HOLISTIC__Record_Assigner_end \
-        uint64 cycles,instrs,cachem; \
-        saveCyclesAndInstrs(thisCoresIdx,cycles,instrs,cachem); \
-        if(empty){ \
-            (*counterHandler)(AssignerInvocation_start,assignedSlaveVP->slaveID,assignedSlaveVP->assignCount,assignedSlaveVP,masterEnv->start_master_lock[thisCoresIdx][0],masterEnv->start_master_lock[thisCoresIdx][1],masterEnv->start_master_lock[thisCoresIdx][2]); \
-        } \
-        (*counterHandler)(Timestamp_start,assignedSlaveVP->slaveID,assignedSlaveVP->assignCount,assignedSlaveVP,tsc,0,0); \
-        (*counterHandler)(Assigner_start,assignedSlaveVP->slaveID,assignedSlaveVP->assignCount,assignedSlaveVP,tmp_cycles,tmp_instrs,tmp_cachem); \
-        (*counterHandler)(Assigner_end,assignedSlaveVP->slaveID,assignedSlaveVP->assignCount,assignedSlaveVP,cycles,instrs,cachem);
-
-   #define HOLISTIC__Record_Work_start \
-        if(currVP){ \
-                uint64 cycles,instrs,cachem; \
-                saveCyclesAndInstrs(thisCoresIdx,cycles, instrs,cachem); \
-                (*counterHandler)(Work_start,currVP->slaveID,currVP->assignCount,currVP,cycles,instrs,cachem); \
-        }
-   
-   #define HOLISTIC__Record_Work_end \
-       if(currVP){ \
-               uint64 cycles,instrs,cachem; \
-               saveCyclesAndInstrs(thisCoresIdx,cycles, instrs,cachem); \
-               (*counterHandler)(Work_end,currVP->slaveID,currVP->assignCount,currVP,cycles,instrs,cachem); \
-       }
-
-   #define HOLISTIC__Record_HwResponderInvocation_start \
-        uint64 cycles,instrs,cachem; \
-        saveCyclesAndInstrs(animatingSlv->coreAnimatedBy,cycles, instrs,cachem); \
-        (*(_VMSMasterEnv->counterHandler))(HwResponderInvocation_start,animatingSlv->slaveID,animatingSlv->assignCount,animatingSlv,cycles,instrs,cachem); 
-        
-
-   #define getReturnAddressBeforeLibraryCall(vp_ptr, res_ptr) do{     \
-void* frame_ptr0 = vp_ptr->framePtr;                               \
-void* frame_ptr1 = *((void**)frame_ptr0);                          \
-void* frame_ptr2 = *((void**)frame_ptr1);                          \
-void* frame_ptr3 = *((void**)frame_ptr2);                          \
-void* ret_addr = *((void**)frame_ptr3 + 1);                        \
-*res_ptr = ret_addr;                                               \
-} while (0)
-
-#else  
-   #define MEAS__Insert_Counter_Handler
-   #define MEAS__Insert_Counter_Meas_Fields_into_MasterEnv
-   #define HOLISTIC__Setup_Perf_Counters
-   #define HOLISTIC__CoreCtrl_Setup
-   #define HOLISTIC__Insert_Master_Global_Vars
-   #define HOLISTIC__Record_last_work
-   #define HOLISTIC__Record_AppResponderInvocation_start
-   #define HOLISTIC__Record_AppResponder_start
-   #define HOLISTIC__Record_AppResponder_end
-   #define HOLISTIC__Record_Assigner_start
-   #define HOLISTIC__Record_Assigner_end
-   #define HOLISTIC__Record_Work_start
-   #define HOLISTIC__Record_Work_end
-   #define HOLISTIC__Record_HwResponderInvocation_start
-   #define getReturnAddressBeforeLibraryCall(vp_ptr, res_ptr)
-#endif
-
-//Experiment in two-step macros -- if doesn't work, insert each separately
-#define MEAS__Insert_Meas_Fields_into_Slave  \
-   MEAS__Insert_Susp_Meas_Fields_into_Slave \
-   MEAS__Insert_Master_Meas_Fields_into_Slave \
-   MEAS__Insert_System_Meas_Fields_into_Slave 
-
-
-//======================  Histogram Macros -- Create ========================
-//
-//
-
-//The language implementation should include a definition of this macro,
-// which creates all the histograms the language uses to collect measurements
-// of plugin operation -- so, if the language didn't define it, must
-// define it here (as empty), to avoid compile error
-#ifndef MEAS__Make_Meas_Hists_for_Language
-#define MEAS__Make_Meas_Hists_for_Language
-#endif
-
-#define makeAMeasHist( idx, name, numBins, startVal, binWidth ) \
-      makeHighestDynArrayIndexBeAtLeast( _VMSMasterEnv->measHistsInfo, idx ); \
-      _VMSMasterEnv->measHists[idx] =  \
-                       makeFixedBinHist( numBins, startVal, binWidth, name );
-
-//==============================  Probes  ===================================
-
-
-//===========================================================================
-#endif	/* _VMS_DEFS_MEAS_H */
-
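Note on the suspend-time measurement in the header removed above: MEAS__Capture_End_Susp_in_CoreCtlr_ForSys only accumulates an interval when it is below 400000 cycles. The following is a minimal standalone sketch of that filter, not the VMS code itself. SuspStats and recordSuspInterval are hypothetical names standing in for the fields the macro updates inside the slave VP.

```c
#include <stdint.h>

#define SUSP_SANITY_LIMIT 400000   /* same threshold as the removed macro */

/* Hypothetical stand-in for the two measurement fields kept in a SlaveVP */
typedef struct
 { uint64_t totalSuspCycles;
   int32_t  numGoodSusp;
 } SuspStats;

/* Adds (endTSC - startTSC) to the running total only when it passes the
 * sanity check.  Returns 1 if the interval was kept, 0 if it was dropped.
 * The subtraction is unsigned, so an end stamp that is lower than the
 * start stamp wraps to a huge value and is dropped as well.
 */
static int
recordSuspInterval( SuspStats *stats, uint64_t startTSC, uint64_t endTSC )
 { uint64_t numCycles = endTSC - startTSC;

   if( numCycles < SUSP_SANITY_LIMIT )
    { stats->totalSuspCycles += numCycles;
      stats->numGoodSusp++;
      return 1;
    }
   return 0;
 }
```

Dropping outliers this way keeps a single descheduled core from dominating the average, at the cost of slightly undercounting genuinely long suspensions.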
diff -r 0dc0b8653902 -r 999f2966a3e5 Services_Offered_by_VMS/Measurement_and_Stats/probes.c
--- a/Services_Offered_by_VMS/Measurement_and_Stats/probes.c	Mon Sep 03 03:34:54 2012 -0700
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,304 +0,0 @@
-/*
- * Copyright 2010  OpenSourceStewardshipFoundation
- *
- * Licensed under BSD
- */
-
-#include <stdio.h>
-#include <malloc.h>
-#include <sys/time.h>
-
-#include "VMS_impl/VMS.h"
-
-
-
-//====================  Probes =================
-/*
- * In practice, probe operations are called from the app, from inside slaves
- *  -- so have to be sure each probe is single-Slv owned, and be sure that
- *  any place common structures are modified it's done inside the master.
- * So -- the only place common structures are modified is during creation.
- *  after that, all mods are to individual instances.
- *
- * Thinking perhaps should change the semantics to be that probes are
- *  attached to the virtual processor -- and then everything is guaranteed
- *  to be isolated -- except then can't take any intervals that span Slvs,
- *  and would have to transfer the probes to Master env when Slv dissipates..
- *  gets messy..
- *
- * For now, just making so that probe creation causes a suspend, so that
- *  the dynamic array in the master env is only modified from the master
- * 
- */
-
-//============================  Helpers ===========================
-inline void 
-doNothing()
- {
- }
-
-float64 inline
-giveInterval( struct timeval _start, struct timeval _end )
- { float64 start, end;
-   start = _start.tv_sec + _start.tv_usec / 1000000.0;
-   end   = _end.tv_sec   + _end.tv_usec   / 1000000.0;
-   return end - start;
- }
-          
-//=================================================================
-IntervalProbe *
-create_generic_probe( char *nameStr, SlaveVP *animSlv )
- {
-   VMSSemReq reqData;
-
-   reqData.reqType  = make_probe;
-   reqData.nameStr  = nameStr;
-
-   VMS_WL__send_VMSSem_request( &reqData, animSlv );
-
-   return animSlv->dataRetFromReq;
- }
-
-/*Use this version from outside VMS -- it uses external malloc, and modifies
- * dynamic array, so can't be animated in a slave Slv
- */
-IntervalProbe *
-ext__create_generic_probe( char *nameStr )
- { IntervalProbe *newProbe;
-   int32          nameLen;
-
-   newProbe          = malloc( sizeof(IntervalProbe) );
-   nameLen = strlen( nameStr );
-   newProbe->nameStr = malloc( nameLen + 1 ); //+1 for the terminating '\0'
-   memcpy( newProbe->nameStr, nameStr, nameLen + 1 );
-   newProbe->hist    = NULL;
-   newProbe->schedChoiceWasRecorded = FALSE;
-   newProbe->probeID =
-             addToDynArray( newProbe, _VMSMasterEnv->dynIntervalProbesInfo );
-
-   return newProbe;
- }
-
-//============================ Fns def in header =======================
-
-int32
-VMS_impl__create_single_interval_probe( char *nameStr, SlaveVP *animSlv )
- { IntervalProbe *newProbe;
-
-   newProbe = create_generic_probe( nameStr, animSlv );
-   
-   return newProbe->probeID;
- }
-
-int32
-VMS_impl__create_histogram_probe( int32   numBins, float64    startValue,
-               float64 binWidth, char   *nameStr, SlaveVP *animSlv )
- { IntervalProbe *newProbe;
-
-   newProbe = create_generic_probe( nameStr, animSlv );
-   
-#ifdef PROBES__USE_TIME_OF_DAY_PROBES
-   DblHist *hist;
-   hist =  makeDblHistogram( numBins, startValue, binWidth );
-#else
-   Histogram *hist;
-   hist =  makeHistogram( numBins, startValue, binWidth );
-#endif
-   newProbe->hist = hist;
-   return newProbe->probeID;
- }
-
-
-int32
-VMS_impl__record_time_point_into_new_probe( char *nameStr, SlaveVP *animSlv)
- { IntervalProbe *newProbe;
-   struct timeval *startStamp;
-   float64 startSecs;
-
-   newProbe           = create_generic_probe( nameStr, animSlv );
-   newProbe->endSecs  = 0;
-
-   
-   gettimeofday( &(newProbe->startStamp), NULL);
-
-      //turn into a double
-   startStamp = &(newProbe->startStamp);
-   startSecs = startStamp->tv_sec + ( startStamp->tv_usec / 1000000.0 );
-   newProbe->startSecs = startSecs;
-
-   return newProbe->probeID;
- }
-
-int32
-VMS_ext_impl__record_time_point_into_new_probe( char *nameStr )
- { IntervalProbe *newProbe;
-   struct timeval *startStamp;
-   float64 startSecs;
-
-   newProbe           = ext__create_generic_probe( nameStr );
-   newProbe->endSecs  = 0;
-
-   gettimeofday( &(newProbe->startStamp), NULL);
-
-      //turn into a double
-   startStamp = &(newProbe->startStamp);
-   startSecs = startStamp->tv_sec + ( startStamp->tv_usec / 1000000.0 );
-   newProbe->startSecs = startSecs;
-
-   return newProbe->probeID;
- }
-
-
-/*Only call from inside master or main startup/shutdown thread
- */
-void
-VMS_impl__free_probe( IntervalProbe *probe )
- { if( probe->hist != NULL )   freeDblHist( probe->hist );
-   if( probe->nameStr != NULL) VMS_int__free( probe->nameStr );
-   VMS_int__free( probe );
- }
-
-
-void
-VMS_impl__index_probe_by_its_name( int32 probeID, SlaveVP *animSlv )
- { IntervalProbe *probe;
-
-   VMS_int__get_master_lock();
-   probe = _VMSMasterEnv->intervalProbes[ probeID ];
-
-   addValueIntoTable(probe->nameStr, probe, _VMSMasterEnv->probeNameHashTbl);
-   VMS_int__release_master_lock();
- }
-
-
-IntervalProbe *
-VMS_impl__get_probe_by_name( char *probeName, SlaveVP *animSlv )
- {
-   //TODO: fix this To be in Master -- race condition
-   return getValueFromTable( probeName, _VMSMasterEnv->probeNameHashTbl );
- }
-
-
-/*Everything is local to the animating slaveVP, so no need for request, do
- * work locally, in the anim Slv
- */
-void
-VMS_impl__record_sched_choice_into_probe( int32 probeID, SlaveVP *animatingSlv )
- { IntervalProbe *probe;
- 
-   probe = _VMSMasterEnv->intervalProbes[ probeID ];
-   probe->schedChoiceWasRecorded = TRUE;
-   probe->coreNum = animatingSlv->coreAnimatedBy;
-   probe->slaveID = animatingSlv->slaveID;
-   probe->slaveCreateSecs = animatingSlv->createPtInSecs;
- }
-
-/*Everything is local to the animating slaveVP, so no need for request, do
- * work locally, in the anim Slv
- */
-void
-VMS_impl__record_interval_start_in_probe( int32 probeID )
- { IntervalProbe *probe;
-
-         DEBUG__printf( dbgProbes, "record start of interval" )
-   probe = _VMSMasterEnv->intervalProbes[ probeID ];
-
-      //record *start* point as last thing, after lookup
-#ifdef PROBES__USE_TIME_OF_DAY_PROBES
-   gettimeofday( &(probe->startStamp), NULL);
-#endif
-#ifdef PROBES__USE_TSC_PROBES
-   probe->startStamp = getTSCount();
-#endif
- }
-
-
-/*Everything is local to the animating slaveVP, except the histogram, so do
- * work locally, in the anim Slv -- may lose a few histogram counts
- * 
- *This should be safe to run inside SlaveVP
- */
-void
-VMS_impl__record_interval_end_in_probe( int32 probeID )
- { IntervalProbe *probe;
-
-   //Record first thing -- before looking up the probe to store it into
-#ifdef PROBES__USE_TIME_OF_DAY_PROBES
-   struct timeval  endStamp;
-   gettimeofday( &(endStamp), NULL);
-#endif
-#ifdef PROBES__USE_TSC_PROBES
-   TSCount endStamp, interval;
-   endStamp = getTSCount();
-#endif
-#ifdef PROBES__USE_PERF_CTR_PROBES
-
-#endif
-   
-   probe = _VMSMasterEnv->intervalProbes[ probeID ];
-
-#ifdef PROBES__USE_TIME_OF_DAY_PROBES
-   if( probe->hist != NULL )
-    { addToDblHist( giveInterval( probe->startStamp, endStamp), probe->hist );
-    }
-#endif
-#ifdef PROBES__USE_TSC_PROBES
-   if( probe->hist != NULL )
-    { interval = endStamp - probe->startStamp;
-         //Sanity check for TSC counter overflow: if sane, add to histogram
-      if( interval < probe->hist->endOfRange * 10 )
-         addToHist( interval, probe->hist );
-    }
-#endif
-#ifdef PROBES__USE_PERF_CTR_PROBES
-
-#endif
-   
-         DEBUG__printf( dbgProbes, "record end of interval" )
- }
-
-
-void
-print_probe_helper( IntervalProbe *probe )
- {
-   printf( "\nprobe: %s, ",  probe->nameStr );
-   
-   
-   if( probe->schedChoiceWasRecorded )
-    { printf( "coreNum: %d, slaveID: %d, slaveVPCreated: %0.6f | ",
-              probe->coreNum, probe->slaveID, probe->slaveCreateSecs );
-    }
-
-   if( probe->endSecs == 0 ) //just a single point in time
-    {
-      printf( " time point: %.6f\n",
-              probe->startSecs - _VMSMasterEnv->createPtInSecs );
-    }
-   else if( probe->hist == NULL ) //just an interval
-    {
-      printf( " startSecs: %.6f interval: %.6f\n", 
-         (probe->startSecs - _VMSMasterEnv->createPtInSecs), probe->interval);
-    }
-   else  //a full histogram of intervals
-    {
-      printDblHist( probe->hist );
-    }
- }
-
-void
-VMS_impl__print_stats_of_probe( IntervalProbe *probe )
- { 
-
-//   probe = _VMSMasterEnv->intervalProbes[ probeID ];
-
-   print_probe_helper( probe );
- }
-
-
-void
-VMS_impl__print_stats_of_all_probes()
- {
-   forAllInDynArrayDo( _VMSMasterEnv->dynIntervalProbesInfo,
-                          (DynArrayFnPtr) &VMS_impl__print_stats_of_probe );
-   fflush( stdout );
- }
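Note on the time-of-day probes in the file removed above: giveInterval() turns two struct timeval stamps into a difference in seconds. The following is a minimal standalone sketch of that conversion, assuming only the POSIX struct timeval. The names timevalToSecs and intervalInSecs are hypothetical and are not part of VMS.

```c
#include <sys/time.h>

/* Converts a timeval (whole seconds plus microseconds) into seconds */
static double
timevalToSecs( struct timeval t )
 { return (double)t.tv_sec + (double)t.tv_usec / 1000000.0;
 }

/* Length of the interval from start to end, in seconds */
static double
intervalInSecs( struct timeval start, struct timeval end )
 { return timevalToSecs( end ) - timevalToSecs( start );
 }
```

Converting each stamp to a double before subtracting is simple but loses some precision for large tv_sec values. Subtracting the seconds and microseconds fields separately first would avoid that.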
diff -r 0dc0b8653902 -r 999f2966a3e5 Services_Offered_by_VMS/Measurement_and_Stats/probes.h
--- a/Services_Offered_by_VMS/Measurement_and_Stats/probes.h	Mon Sep 03 03:34:54 2012 -0700
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,192 +0,0 @@
-/*
- *  Copyright 2009 OpenSourceStewardshipFoundation.org
- *  Licensed under GNU General Public License version 2
- *
- * Author: seanhalle@yahoo.com
- * 
- */
-
-#ifndef _PROBES_H
-#define	_PROBES_H
-#define _GNU_SOURCE
-
-#include "VMS_impl/VMS_primitive_data_types.h"
-
-#include <sys/time.h>
-
-/*Note on order of include files:  
- * This file relies on #defines that appear in other files, which must come
- * first in the #include sequence..
- */
-
-/*Use these aliases in application code*/
-#define VMS_App__record_time_point_into_new_probe VMS_WL__record_time_point_into_new_probe
-#define VMS_App__create_single_interval_probe   VMS_WL__create_single_interval_probe
-#define VMS_App__create_histogram_probe         VMS_WL__create_histogram_probe
-#define VMS_App__index_probe_by_its_name        VMS_WL__index_probe_by_its_name
-#define VMS_App__get_probe_by_name              VMS_WL__get_probe_by_name
-#define VMS_App__record_sched_choice_into_probe VMS_WL__record_sched_choice_into_probe
-#define VMS_App__record_interval_start_in_probe VMS_WL__record_interval_start_in_probe 
-#define VMS_App__record_interval_end_in_probe   VMS_WL__record_interval_end_in_probe
-#define VMS_App__print_stats_of_probe           VMS_WL__print_stats_of_probe
-#define VMS_App__print_stats_of_all_probes      VMS_WL__print_stats_of_all_probes 
-
-
-//==========================
-#ifdef PROBES__USE_TSC_PROBES
-   #define PROBES__Insert_timestamps_and_intervals_into_probe_struct \
-   TSCount    startStamp; \
-   TSCount    endStamp; \
-   TSCount    interval; \
-   Histogram *hist; /*if left NULL, then is single interval probe*/
-#endif
-#ifdef PROBES__USE_TIME_OF_DAY_PROBES
-   #define PROBES__Insert_timestamps_and_intervals_into_probe_struct \
-   struct timeval  startStamp; \
-   struct timeval  endStamp; \
-   float64         startSecs; \
-   float64         endSecs; \
-   float64         interval; \
-   DblHist        *hist; /*if NULL, then is single interval probe*/
-#endif
-#ifdef PROBES__USE_PERF_CTR_PROBES
-   #define PROBES__Insert_timestamps_and_intervals_into_probe_struct \
-   int64  startStamp; \
-   int64  endStamp; \
-   int64  interval; \
-   Histogram *hist; /*if left NULL, then is single interval probe*/
-#endif
-
-//typedef struct _IntervalProbe IntervalProbe; -- is in VMS.h
-struct _IntervalProbe
- {
-   char           *nameStr;
-   int32           probeID;
-
-   int32           schedChoiceWasRecorded;
-   int32           coreNum;
-   int32           slaveID;
-   float64         slaveCreateSecs;
-   PROBES__Insert_timestamps_and_intervals_into_probe_struct;
- };
-
-//=========================== NEVER USE THESE ==========================
-/*NEVER use these in any code!!  These are here only for use in the macros
- * defined in this file!!
- */
-int32
-VMS_impl__create_single_interval_probe( char *nameStr, SlaveVP *animSlv );
-
-int32
-VMS_impl__create_histogram_probe( int32   numBins, float64    startValue,
-               float64 binWidth, char    *nameStr, SlaveVP *animSlv );
-
-int32
-VMS_impl__record_time_point_into_new_probe( char *nameStr, SlaveVP *animSlv);
-
-int32
-VMS_ext_impl__record_time_point_into_new_probe( char *nameStr );
-
-void
-VMS_impl__free_probe( IntervalProbe *probe );
-
-void
-VMS_impl__index_probe_by_its_name( int32 probeID, SlaveVP *animSlv );
-
-IntervalProbe *
-VMS_impl__get_probe_by_name( char *probeName, SlaveVP *animSlv );
-
-void
-VMS_impl__record_sched_choice_into_probe( int32 probeID, SlaveVP *animSlv );
-
-void
-VMS_impl__record_interval_start_in_probe( int32 probeID );
-
-void
-VMS_impl__record_interval_end_in_probe( int32 probeID );
-
-void
-VMS_impl__print_stats_of_probe( IntervalProbe *probe );
-
-void
-VMS_impl__print_stats_of_all_probes();
-
-
-//======================== Probes =============================
-//
-// Use macros to allow turning probes off with a #define switch
-// This means probes have zero impact on performance when off
-//=============================================================
-
-#ifdef PROBES__TURN_ON_STATS_PROBES
-
-   #define PROBES__Create_Probe_Bookkeeping_Vars \
-      _VMSMasterEnv->dynIntervalProbesInfo = \
-       makePrivDynArrayOfSize( (void***)&(_VMSMasterEnv->intervalProbes), 200); \
-      \
-      _VMSMasterEnv->probeNameHashTbl = makeHashTable( 1000, &VMS_int__free ); \
-      \
-      /*put creation time directly into master env, for fast retrieval*/ \
-   struct timeval timeStamp; \
-   gettimeofday( &(timeStamp), NULL); \
-   _VMSMasterEnv->createPtInSecs = \
-                           timeStamp.tv_sec +(timeStamp.tv_usec/1000000.0);
-
-   #define VMS_WL__record_time_point_into_new_probe( nameStr, animSlv ) \
-           VMS_impl__record_time_point_into_new_probe( nameStr, animSlv )
-
-   #define VMS_ext__record_time_point_into_new_probe( nameStr ) \
-           VMS_ext_impl__record_time_point_into_new_probe( nameStr )
-
-   #define VMS_WL__create_single_interval_probe( nameStr, animSlv ) \
-           VMS_impl__create_single_interval_probe( nameStr, animSlv )
-
-   #define VMS_WL__create_histogram_probe(      numBins, startValue,              \
-                                             binWidth, nameStr, animSlv )       \
-           VMS_impl__create_histogram_probe( numBins, startValue,              \
-                                             binWidth, nameStr, animSlv )
-   #define VMS_int__free_probe( probe ) \
-           VMS_impl__free_probe( probe )
-
-   #define VMS_WL__index_probe_by_its_name( probeID, animSlv ) \
-           VMS_impl__index_probe_by_its_name( probeID, animSlv )
-
-   #define VMS_WL__get_probe_by_name( probeName, animSlv ) \
-           VMS_impl__get_probe_by_name( probeName, animSlv )
-
-   #define VMS_WL__record_sched_choice_into_probe( probeID, animSlv ) \
-           VMS_impl__record_sched_choice_into_probe( probeID, animSlv )
-
-   #define VMS_WL__record_interval_start_in_probe( probeID ) \
-           VMS_impl__record_interval_start_in_probe( probeID )
-
-   #define VMS_WL__record_interval_end_in_probe( probeID ) \
-           VMS_impl__record_interval_end_in_probe( probeID )
-
-   #define VMS_WL__print_stats_of_probe( probeID ) \
-           VMS_impl__print_stats_of_probe( probeID )
-
-   #define VMS_WL__print_stats_of_all_probes() \
-           VMS_impl__print_stats_of_all_probes()
-
-
-#else
-   #define PROBES__Create_Probe_Bookkeeping_Vars
-   #define VMS_WL__record_time_point_into_new_probe( nameStr, animSlv ) 0 /* do nothing */
-   #define VMS_ext__record_time_point_into_new_probe( nameStr )  0 /* do nothing */
-   #define VMS_WL__create_single_interval_probe( nameStr, animSlv ) 0 /* do nothing */
-   #define VMS_WL__create_histogram_probe( numBins, startValue,              \
-                                             binWidth, nameStr, animSlv )       \
-          0 /* do nothing */
-   #define VMS_WL__index_probe_by_its_name( probeID, animSlv ) /* do nothing */
-   #define VMS_WL__get_probe_by_name( probeID, animSlv ) NULL /* do nothing */
-   #define VMS_WL__record_sched_choice_into_probe( probeID, animSlv ) /* do nothing */
-   #define VMS_WL__record_interval_start_in_probe( probeID )  /* do nothing */
-   #define VMS_WL__record_interval_end_in_probe( probeID )  /* do nothing */
-   #define VMS_WL__print_stats_of_probe( probeID ) ; /* do nothing */
-   #define VMS_WL__print_stats_of_all_probes() ;/* do nothing */
-
-#endif   /* defined PROBES__TURN_ON_STATS_PROBES */
-
-#endif	/* _PROBES_H */
-
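Note on the on/off switch in the header removed above: each probe call is a macro that forwards to an implementation function when the switch is defined, and otherwise expands to nothing (or to 0 where a value is expected). The following is a minimal standalone sketch of that pattern. DEMO__TURN_ON_PROBES, DEMO__record_point, and demo_impl__record_point are hypothetical names chosen for illustration and do not exist in VMS.

```c
/* Define this (for example with -DDEMO__TURN_ON_PROBES) to compile the
 * probe calls in.  It is left undefined here, so probes are compiled out.
 */
/* #define DEMO__TURN_ON_PROBES */

int demo_callCount = 0;   /* counts how often the implementation ran */

/* Stand-in for an implementation function; returns a made-up probe ID */
int
demo_impl__record_point( const char *name )
 { (void)name;
   demo_callCount++;
   return demo_callCount;
 }

#ifdef DEMO__TURN_ON_PROBES
   #define DEMO__record_point( name )  demo_impl__record_point( name )
#else
   /* Expands to 0 so that "id = DEMO__record_point( .. );" still compiles */
   #define DEMO__record_point( name )  0 /* do nothing */
#endif
```

With the switch undefined, the argument expression is never evaluated and no call is emitted, which is why the probes cost nothing when turned off. The flip side is that the argument list is not type-checked in that configuration, so a mismatch such as a misspelled implementation name only shows up once the switch is turned on.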
diff -r 0dc0b8653902 -r 999f2966a3e5 Services_Offered_by_VMS/Memory_Handling/vmalloc.c
--- a/Services_Offered_by_VMS/Memory_Handling/vmalloc.c	Mon Sep 03 03:34:54 2012 -0700
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,438 +0,0 @@
-/*
- *  Copyright 2009 OpenSourceCodeStewardshipFoundation.org
- *  Licensed under GNU General Public License version 2
- *
- * Author: seanhalle@yahoo.com
- *
- * Created on November 14, 2009, 9:07 PM
- */
-
-#include <malloc.h>
-#include <inttypes.h>
-#include <stdlib.h>
-#include <stdio.h>
-#include <string.h>
-#include <math.h>
-
-#include "VMS_impl/VMS.h"
-#include "Histogram/Histogram.h"
-
-#define MAX_UINT64 0xFFFFFFFFFFFFFFFF
-
-//A MallocProlog is a head element if the HigherInMem variable is NULL
-//A Chunk is free if the prevChunkInFreeList variable is NULL
-
-/*
- * This calculates the container which fits the given size.
- */
-inline
-uint32 getContainer(size_t size)
-{
-    return (log2(size)-LOG128)/LOG54;
-}
-
-/*
- * Removes the first chunk of a freeList
- * The chunk is removed but not set as free. There is no check if
- * the free list is empty, so make sure this is not the case.
- */
-inline
-MallocProlog *removeChunk(MallocArrays* freeLists, uint32 containerIdx)
-{
-    MallocProlog** container = &freeLists->bigChunks[containerIdx];
-    MallocProlog*  removedChunk = *container;
-    *container = removedChunk->nextChunkInFreeList;
-    
-    if(removedChunk->nextChunkInFreeList)
-        removedChunk->nextChunkInFreeList->prevChunkInFreeList = 
-                (MallocProlog*)container;
-    
-    if(*container == NULL)
-    {
-       if(containerIdx < 64)
-           freeLists->bigChunksSearchVector[0] &= ~((uint64)1 << containerIdx); 
-       else
-           freeLists->bigChunksSearchVector[1] &= ~((uint64)1 << (containerIdx-64));
-    }
-    
-    return removedChunk;
-}
-
-/*
- * Removes the first chunk of a freeList
- * The chunk is removed but not set as free. There is no check if
- * the free list is empty, so make sure this is not the case.
- */
-inline
-MallocProlog *removeSmallChunk(MallocArrays* freeLists, uint32 containerIdx)
-{
-    MallocProlog** container = &freeLists->smallChunks[containerIdx];
-    MallocProlog*  removedChunk = *container;
-    *container = removedChunk->nextChunkInFreeList;
-    
-    if(removedChunk->nextChunkInFreeList)
-        removedChunk->nextChunkInFreeList->prevChunkInFreeList = 
-                (MallocProlog*)container;
-    
-    return removedChunk;
-}
-
-inline
-size_t getChunkSize(MallocProlog* chunk)
-{
-    return (uintptr_t)chunk->nextHigherInMem -
-            (uintptr_t)chunk - sizeof(MallocProlog);
-}
-
-/*
- * Removes a chunk from a free list.
- */
-inline
-void extractChunk(MallocProlog* chunk, MallocArrays *freeLists)
-{
-   chunk->prevChunkInFreeList->nextChunkInFreeList = chunk->nextChunkInFreeList;
-   if(chunk->nextChunkInFreeList)
-       chunk->nextChunkInFreeList->prevChunkInFreeList = chunk->prevChunkInFreeList;
-   
-   //The last element in the list points to the container. If the container points
-   //to NULL the container is empty
-   if(*((void**)(chunk->prevChunkInFreeList)) == NULL && getChunkSize(chunk) >= BIG_LOWER_BOUND)
-   {
-       //Find the appropriate container because we do not know it
-       uint64 containerIdx = ((uintptr_t)chunk->prevChunkInFreeList - (uintptr_t)freeLists->bigChunks) >> 3;
-       if(containerIdx < (uint32)64)
-           freeLists->bigChunksSearchVector[0] &= ~((uint64)1 << containerIdx); 
-       if(containerIdx < 128 && containerIdx >=64)
-           freeLists->bigChunksSearchVector[1] &= ~((uint64)1 << (containerIdx-64)); 
-       
-   }
-}
-
-/*
- * Merges two chunks.
- * Chunk A has to be before chunk B in memory. Both have to be removed from
- * a free list
- */
-inline
-MallocProlog *mergeChunks(MallocProlog* chunkA, MallocProlog* chunkB)
-{
-    chunkA->nextHigherInMem = chunkB->nextHigherInMem;
-    chunkB->nextHigherInMem->nextLowerInMem = chunkA;
-    return chunkA;
-}
-/*
- * Inserts a chunk into a free list.
- */
-inline
-void insertChunk(MallocProlog* chunk, MallocProlog** container)
-{
-    chunk->nextChunkInFreeList = *container;
-    chunk->prevChunkInFreeList = (MallocProlog*)container;
-    if(*container)
-        (*container)->prevChunkInFreeList = chunk;
-    *container = chunk;
-}
-
-/*
- * Divides the chunk so that a new chunk of newSize is created.
- * There is no size check, so make sure the size value is valid.
- */
-inline
-MallocProlog *divideChunk(MallocProlog* chunk, size_t newSize)
-{
-    MallocProlog* newChunk = (MallocProlog*)((uintptr_t)chunk->nextHigherInMem -
-            newSize - sizeof(MallocProlog));
-    
-    newChunk->nextLowerInMem  = chunk;
-    newChunk->nextHigherInMem = chunk->nextHigherInMem;
-    
-    chunk->nextHigherInMem->nextLowerInMem = newChunk;
-    chunk->nextHigherInMem = newChunk;
-    
-    return newChunk;
-}
-
-/* 
- * Search for chunk in the list of big chunks. Split the block if it's too big
- */
-inline
-MallocProlog *searchChunk(MallocArrays *freeLists, size_t sizeRequested, uint32 containerIdx)
-{
-    MallocProlog* foundChunk;
-    
-    uint64 searchVector = freeLists->bigChunksSearchVector[0];
-    //set small chunk bits to zero
-    searchVector &= MAX_UINT64 << containerIdx;
-    containerIdx = __builtin_ffsl(searchVector); //least significant 1 bit
-
-    if(containerIdx == 0)
-    {
-       searchVector = freeLists->bigChunksSearchVector[1];
-       containerIdx = __builtin_ffsl(searchVector);
-       if(containerIdx == 0)
-       {
-           //TODO: get additional mem and insert into free list
-           //malloc( MALLOC_ADDITIONAL_MEM_FROM_OS_SIZE );
-           printf("VMS malloc failed: low memory");
-           exit(1);   
-       }
-       containerIdx += 64;
-    }
-    containerIdx--;
-    
-
-    foundChunk = removeChunk(freeLists, containerIdx);
-    size_t chunkSize     = getChunkSize(foundChunk);
-
-    //If the new chunk is larger than the requested size: split
-    if(chunkSize > sizeRequested + 2 * sizeof(MallocProlog) + BIG_LOWER_BOUND)
-    {
-       MallocProlog *newChunk = divideChunk(foundChunk,sizeRequested);
-       containerIdx = getContainer(getChunkSize(foundChunk)) - 1;
-       insertChunk(foundChunk,&freeLists->bigChunks[containerIdx]);
-       if(containerIdx < 64)
-           freeLists->bigChunksSearchVector[0] |= ((uint64)1 << containerIdx);
-       else
-           freeLists->bigChunksSearchVector[1] |= ((uint64)1 << (containerIdx-64));
-       foundChunk = newChunk;
-    } 
-    
-    return foundChunk;
-}
-
-
-/*
- * This is sequential code, meant to only be called from the Master, not from
- * any slave Slvs.
- * 
- *May 2012
- *ToDo: Improve speed, by using built-in leading 1 detector to calc free-list
- * index.
- *Change to two separate arrays, one for free-lists of small fixed-size chunks
- * other for free lists of exponentially growing chunk sizes
- *Do simple compare to decide which array of lists to use
- *For small chunks, size the lists in increments of 16, up to, say, 128 (1024
- * is max if want less than 64 lists, which allows searching for first
- * occupied free-list using leading-1 detector on a bit-vector)
- *To find index, right-shift by 4 bits, and that's the index! (works because
- * compare says no 1's above 128 position ((bit 7)), and sizes are every 16,
- * so dividing by 16 equals exactly the position)
- *For large chunks, have 63 free lists, but split into even and odd indexes.
- *For even indexes, each list starts with chunks twice the size of previous
- * even index.
- *For odd indexes, each list starts with chunks of size half-way between those
- * of the even indexes on either side.
- *
- *To calc the free-list position of a requested size, get pos of leading 1
- * of the size, call this msbsP (most-significant-bit-set-position). Then
- * check bit to right of it (one-less-significant)
- *If it's 0 then use the even index: msbsP * 2, which is msbsP << 1.
- *If it's 1, then use the odd index, which is (msbsP << 1) + 1
- *
- *To find msbsP, use GCC builtin: "int __builtin_clzll (unsigned long long)"
- * which returns the number of zeros above (left of) msb set.  Note, the
- * result is undefined if given zero, but the compare used to choose between
- * arrays makes sure the requested size given to it is not zero.
- * 
- *This scheme keeps wastage small, while finding free element is O(1), and a
- * fast constant.
- *For large chunk sizes, if don't shave excess, then it ensures worst-case
- * wastage due to mis-match in size of chunk vs requested size is 33% 
- * (invariant: take any even list.. it starts at a power of 2, and next list
- *  up starts at 50% larger, so biggest chunk is 1.5 x smallest request, that's
- *  33% of total memory wasted. Then, for the odd index above, smallest chunk
- *  is 2x for smallest request of 1.5x, for 25% total wasted memory)
- *For smallest size chunks, the pre-amble wastes quite a bit, but above that,
- * sizing in increments of 16 keeps wastage small.  And, if always shave, then
- * wastage due to size mis-match is maximum 16 bytes for the large chunks.
- * 
- */
-void *
-VMS_int__malloc( size_t sizeRequested )
- {     
-         MEAS__Capture_Pre_Malloc_Point
-   
-   MallocArrays* freeLists = _VMSMasterEnv->freeLists;
-   MallocProlog* foundChunk;
-   
-   //Return a small chunk if the requested size is at most 128B (LOWER_BOUND)
-   if(sizeRequested <= LOWER_BOUND)
-    {
-      uint32 freeListIdx = (sizeRequested-1)/SMALL_CHUNK_SIZE;
-      if(freeLists->smallChunks[freeListIdx] == NULL)
-        foundChunk = searchChunk(freeLists, SMALL_CHUNK_SIZE*(freeListIdx+1), 0);
-      else
-        foundChunk = removeSmallChunk(freeLists, freeListIdx);
-       
-      //Mark as allocated
-      foundChunk->prevChunkInFreeList = NULL;      
-      return foundChunk + 1;
-    }
-   
-   //Calculate the expected container. Start one higher to have a Chunk that's
-   //always big enough.
-   uint32 containerIdx = getContainer(sizeRequested);
-   
-   if(freeLists->bigChunks[containerIdx] == NULL)
-       foundChunk = searchChunk(freeLists, sizeRequested, containerIdx); 
-   else
-       foundChunk = removeChunk(freeLists, containerIdx); 
-   
-   //Mark as allocated
-   foundChunk->prevChunkInFreeList = NULL;      
-   
-         MEAS__Capture_Post_Malloc_Point
-   
-   //skip over the prolog by adding its size to the returned pointer
-   return foundChunk + 1;
- }
-
-void *
-VMS_WL__malloc( int32 sizeRequested )
- { void *ret;
- 
-   VMS_int__get_master_lock();
-   ret = VMS_int__malloc( sizeRequested );
-   VMS_int__release_master_lock();
-   return ret;
- }
-
-
-/*
- * This is sequential code, meant to only be called from the Master, not from
- * any slave Slvs.
- */
-void
-VMS_int__free( void *ptrToFree )
- {
-    
-         MEAS__Capture_Pre_Free_Point;
-         
-   MallocArrays* freeLists = _VMSMasterEnv->freeLists;
-   MallocProlog *chunkToFree = (MallocProlog*)ptrToFree - 1;
-   uint32 containerIdx;
-   
-   //Check for free neighbors
-   if(chunkToFree->nextLowerInMem)
-   {
-       if(chunkToFree->nextLowerInMem->prevChunkInFreeList != NULL)
-       {//Chunk is not allocated
-           extractChunk(chunkToFree->nextLowerInMem, freeLists);
-           chunkToFree = mergeChunks(chunkToFree->nextLowerInMem, chunkToFree);
-       }
-   }
-   if(chunkToFree->nextHigherInMem)
-   {
-       if(chunkToFree->nextHigherInMem->prevChunkInFreeList != NULL)
-       {//Chunk is not allocated
-           extractChunk(chunkToFree->nextHigherInMem, freeLists);
-           chunkToFree = mergeChunks(chunkToFree, chunkToFree->nextHigherInMem);
-       }
-   }
-   
-   size_t chunkSize = getChunkSize(chunkToFree);
-   if(chunkSize < BIG_LOWER_BOUND)
-   {
-       containerIdx =  (chunkSize/SMALL_CHUNK_SIZE)-1;
-       if(containerIdx > SMALL_CHUNK_COUNT-1)
-           containerIdx = SMALL_CHUNK_COUNT-1;
-       insertChunk(chunkToFree, &freeLists->smallChunks[containerIdx]);
-   }
-   else
-   {
-       containerIdx = getContainer(getChunkSize(chunkToFree)) - 1;
-       insertChunk(chunkToFree, &freeLists->bigChunks[containerIdx]);
-       if(containerIdx < 64)
-           freeLists->bigChunksSearchVector[0] |= (uint64)1 << containerIdx;
-       else
-           freeLists->bigChunksSearchVector[1] |= (uint64)1 << (containerIdx-64);
-   }   
-   
-         MEAS__Capture_Post_Free_Point;
- }
-
-void
-VMS_WL__free( void *ptrToFree )
- {
-   VMS_int__get_master_lock();
-   VMS_int__free( ptrToFree );
-   VMS_int__release_master_lock();
- }
-
-/*
- * Designed to be called from the main thread outside of VMS, during init
- */
-MallocArrays *
-VMS_ext__create_free_list()
-{     
-   //Initialize containers for small chunks and fill with zeros
-   _VMSMasterEnv->freeLists = (MallocArrays*)malloc( sizeof(MallocArrays) );
-   MallocArrays *freeLists = _VMSMasterEnv->freeLists;
-   
-   freeLists->smallChunks = 
-           (MallocProlog**)malloc(SMALL_CHUNK_COUNT*sizeof(MallocProlog*));
-   memset((void*)freeLists->smallChunks,
-           0,SMALL_CHUNK_COUNT*sizeof(MallocProlog*));
-   
-   //Calculate number of containers for big chunks
-   uint32 container = getContainer(MALLOC_ADDITIONAL_MEM_FROM_OS_SIZE)+1;
-   freeLists->bigChunks = (MallocProlog**)malloc(container*sizeof(MallocProlog*));
-   memset((void*)freeLists->bigChunks,0,container*sizeof(MallocProlog*));
-   freeLists->containerCount = container;
-   
-   //Create first element in lastContainer 
-   MallocProlog *firstChunk = malloc( MALLOC_ADDITIONAL_MEM_FROM_OS_SIZE );
-   if( firstChunk == NULL ) {printf("Can't allocate initial memory\n"); exit(1);}
-   freeLists->memSpace = firstChunk;
-   
-   //Touch memory to avoid page faults
-   void *ptr,*endPtr; 
-   endPtr = (void*)firstChunk+MALLOC_ADDITIONAL_MEM_FROM_OS_SIZE;
-   for(ptr = firstChunk; ptr < endPtr; ptr+=PAGE_SIZE)
-   {
-       *(char*)ptr = 0;
-   }
-   
-   firstChunk->nextLowerInMem = NULL;
-   firstChunk->nextHigherInMem = (MallocProlog*)((uintptr_t)firstChunk +
-                        MALLOC_ADDITIONAL_MEM_FROM_OS_SIZE - sizeof(MallocProlog));
-   firstChunk->nextChunkInFreeList = NULL;
-   //previous element in the queue is the container
-   firstChunk->prevChunkInFreeList = &freeLists->bigChunks[container-2];
-   
-   freeLists->bigChunks[container-2] = firstChunk;
-   //Insert into bit search list
-   if(container <= 65)
-   {
-       freeLists->bigChunksSearchVector[0] = ((uint64)1 << (container-2));
-       freeLists->bigChunksSearchVector[1] = 0;
-   }   
-   else
-   {
-       freeLists->bigChunksSearchVector[0] = 0;
-       freeLists->bigChunksSearchVector[1] = ((uint64)1 << (container-66));
-   }
-   
-   //Create dummy chunk to mark the top of stack; this is, of course,
-   //never freed
-   MallocProlog *dummyChunk = firstChunk->nextHigherInMem;
-   dummyChunk->nextHigherInMem = dummyChunk+1;
-   dummyChunk->nextLowerInMem  = NULL;
-   dummyChunk->nextChunkInFreeList = NULL;
-   dummyChunk->prevChunkInFreeList = NULL;
-   
-   return freeLists;
- }
-
-
-/*Designed to be called from the main thread outside of VMS, during cleanup
- */
-void
-VMS_ext__free_free_list( MallocArrays *freeLists )
- {    
-   free(freeLists->memSpace);
-   free(freeLists->bigChunks);
-   free(freeLists->smallChunks);
-   
- }
-
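Note on the removed allocator: the ToDo comment above VMS_int__malloc describes finding a free-list index from the leading 1 bit of the requested size. The following is a minimal sketch of that calculation only, assuming GCC or Clang for __builtin_clzll. The names msbSetPos and bigListIdx are hypothetical and do not appear in VMS; the removed code calls getContainer instead, whose definition is outside this excerpt.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of VMS: position of the most significant
 * set bit, counting the least significant bit as position 0.
 * size must be non-zero because __builtin_clzll is undefined for 0. */
static inline unsigned msbSetPos(unsigned long long size)
{
    assert(size != 0);
    return 63u - (unsigned)__builtin_clzll(size);
}

/* Hypothetical helper: free-list index for a large request, following the
 * even/odd scheme in the ToDo note.  Even indexes start at a power of two,
 * odd indexes start half-way to the next power of two. */
static inline unsigned bigListIdx(unsigned long long size)
{
    unsigned msbsP = msbSetPos(size);
    /* bit just below the leading 1; treated as 0 when there is none */
    unsigned nextBit = msbsP ? (unsigned)((size >> (msbsP - 1)) & 1u) : 0u;
    return (msbsP << 1) + nextBit;
}
```

With this sketch, sizes 256 through 383 map to index 16 and sizes 384 through 511 map to index 17, so each power-of-two range is split into two lists. Whether a request should then be served from the next list up, to guarantee the chunk is large enough, is a separate choice that the sketch does not make.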
diff -r 0dc0b8653902 -r 999f2966a3e5 Services_Offered_by_VMS/Memory_Handling/vmalloc.h
--- a/Services_Offered_by_VMS/Memory_Handling/vmalloc.h	Mon Sep 03 03:34:54 2012 -0700
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,94 +0,0 @@
-/*
- *  Copyright 2009 OpenSourceCodeStewardshipFoundation.org
- *  Licensed under GNU General Public License version 2
- *
- * Author: seanhalle@yahoo.com
- *
- * Created on November 14, 2009, 9:07 PM
- */
-
-#ifndef _VMALLOC_H
-#define	_VMALLOC_H
-
-#include <malloc.h>
-#include <inttypes.h>
-#include "VMS_impl/VMS_primitive_data_types.h"
-
-#define SMALL_CHUNK_SIZE 32
-#define SMALL_CHUNK_COUNT 4
-#define LOWER_BOUND     128  //Biggest chunk size that is created for the small chunks
-#define BIG_LOWER_BOUND 160  //Smallest chunk size that is created for the big chunks
-
-#define LOG54 0.3219280948873623
-#define LOG128 7
-
-typedef struct _MallocProlog MallocProlog;
-
-struct _MallocProlog
- {
-   MallocProlog *nextChunkInFreeList;
-   MallocProlog *prevChunkInFreeList;
-   MallocProlog *nextHigherInMem;
-   MallocProlog *nextLowerInMem;
- };
-//MallocProlog
- 
- typedef struct MallocArrays MallocArrays;
-
- struct MallocArrays
- {
-     MallocProlog **smallChunks;
-     MallocProlog **bigChunks;
-     uint64       bigChunksSearchVector[2];
-     void         *memSpace;
-     uint32       containerCount;
- };
- //MallocArrays
-
-typedef struct
- {
-   MallocProlog *firstChunkInFreeList;
-   int32         numInList; //TODO not used
- }
-FreeListHead;
-
-void *
-VMS_int__malloc( size_t sizeRequested );
-#define VMS_PI__malloc  VMS_int__malloc
-
-void *
-VMS_WL__malloc( int32  sizeRequested ); /*BUG: -- get master lock */
-#define VMS_App__malloc  VMS_WL__malloc
-
-void *
-VMS_int__malloc_aligned( size_t sizeRequested );
-#define VMS_PI__malloc_aligned VMS_int__malloc_aligned
-
-void
-VMS_int__free( void *ptrToFree );
-#define VMS_PI__free  VMS_int__free
-
-void
-VMS_WL__free( void *ptrToFree );
-#define VMS_App__free  VMS_WL__free
-
-
-
-/*Allocates memory from the external system -- higher overhead
- */
-void *
-VMS_ext__malloc_in_ext( size_t sizeRequested );
-
-/*Frees memory that was allocated in the external system -- higher overhead
- */
-void
-VMS_ext__free_in_ext( void *ptrToFree );
-
-
-MallocArrays *
-VMS_ext__create_free_list();
-
-void
-VMS_ext__free_free_list(MallocArrays *freeLists );
-
-#endif
\ No newline at end of file
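Note on the removed header: each chunk starts with a MallocProlog, and the removed allocator returns foundChunk + 1 to the caller and recovers the prolog in free with (MallocProlog*)ptrToFree - 1. A NULL prevChunkInFreeList marks a chunk as allocated. The sketch below restates that layout in isolation. The type name Prolog and the helpers payloadSize, isAllocated, payloadOf and prologOf are hypothetical; the size formula is inferred from divideChunk and is not the removed getChunkSize, which is not shown in this excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same four links as the removed MallocProlog, under a hypothetical name. */
typedef struct Prolog Prolog;
struct Prolog {
    Prolog *nextChunkInFreeList;
    Prolog *prevChunkInFreeList;  /* NULL marks the chunk as allocated */
    Prolog *nextHigherInMem;      /* prolog of the neighbor above in memory */
    Prolog *nextLowerInMem;       /* prolog of the neighbor below in memory */
};

/* Usable bytes between this prolog and the next one up in memory. */
static size_t payloadSize(const Prolog *chunk)
{
    assert(chunk != NULL && chunk->nextHigherInMem != NULL);
    return (size_t)((uintptr_t)chunk->nextHigherInMem - (uintptr_t)chunk)
           - sizeof(Prolog);
}

static int isAllocated(const Prolog *chunk)
{
    return chunk->prevChunkInFreeList == NULL;
}

/* Pointer handed to the caller: the first byte after the prolog. */
static void *payloadOf(Prolog *chunk)
{
    return chunk + 1;
}

/* Inverse of payloadOf, as done at the start of free. */
static Prolog *prologOf(void *payload)
{
    return (Prolog *)payload - 1;
}
```

Because the first chunk in a free list points back at its container through prevChunkInFreeList, a free chunk never has a NULL in that field, which is what lets one pointer double as the allocated flag.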
diff -r 0dc0b8653902 -r 999f2966a3e5 VMS.h
--- a/VMS.h	Mon Sep 03 03:34:54 2012 -0700
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,390 +0,0 @@
-/*
- *  Copyright 2009 OpenSourceStewardshipFoundation.org
- *  Licensed under GNU General Public License version 2
- *
- * Author: seanhalle@yahoo.com
- * 
- */
-
-#ifndef _VMS_H
-#define	_VMS_H
-#define _GNU_SOURCE
-
-#include "DynArray/DynArray.h"
-#include "Hash_impl/PrivateHash.h"
-#include "Histogram/Histogram.h"
-#include "Queue_impl/PrivateQueue.h"
-
-#include "VMS_primitive_data_types.h"
-#include "Services_Offered_by_VMS/Memory_Handling/vmalloc.h"
-
-#include <pthread.h>
-#include <sys/time.h>
-
-//=================  Defines: included from separate files  =================
-//
-// Note: ALL defines are in other files, none are in here
-//
-#include "Defines/VMS_defs.h"
-
-
-//================================ Typedefs =================================
-//
-typedef unsigned long long    TSCount;
-
-typedef struct _AnimSlot     AnimSlot;
-typedef struct _VMSReqst      VMSReqst;
-typedef struct _SlaveVP       SlaveVP;
-typedef struct _MasterVP      MasterVP;
-typedef struct _IntervalProbe IntervalProbe;
-
-
-typedef SlaveVP *(*SlaveAssigner)  ( void *, AnimSlot*); //semEnv, slot for HW info
-typedef void     (*RequestHandler) ( SlaveVP *, void * ); //prWReqst, semEnv
-typedef void     (*TopLevelFnPtr)  ( void *, SlaveVP * ); //initData, animSlv
-typedef void       TopLevelFn      ( void *, SlaveVP * ); //initData, animSlv
-typedef void     (*ResumeSlvFnPtr) ( SlaveVP *, void * );
-      //=========== MEASUREMENT STUFF ==========
-        MEAS__Insert_Counter_Handler
-      //========================================
-
-//============================ HW Dependent Fns ================================
-
-#include "HW_Dependent_Primitives/VMS__HW_measurement.h"
-#include "HW_Dependent_Primitives/VMS__primitives.h"
-
-
-//============= Request Related ===========
-//
-
-enum VMSReqstType   //avoid starting enums at 0, for debug reasons
- {
-   semantic = 1,
-   createReq,
-   dissipate,
-   VMSSemantic      //goes with VMSSemReqst below
- };
-
-struct _VMSReqst
- {
-   enum VMSReqstType  reqType;//used for dissipate and in future for IO requests
-   void              *semReqData;
-
-   VMSReqst *nextReqst;
- };
-//VMSReqst
-
-enum VMSSemReqstType   //These are equivalent to semantic requests, but for
- {                     // VMS's services available directly to app, like OS
-   make_probe = 1,    // and probe services -- like a VMS-wide built-in lang
-   throw_excp,
-   openFile,
-   otherIO
- };
-
-typedef struct
- { enum VMSSemReqstType reqType;
-   SlaveVP             *requestingSlv;
-   char                *nameStr;  //for create probe
-   char                *msgStr;   //for exception
-   void                *exceptionData;
- }
- VMSSemReq;
-
-
-//====================  Core data structures  ===================
-
-typedef struct
- {
-   //for future expansion
- }
-SlotPerfInfo;
-
-struct _AnimSlot
- {
-   int           workIsDone;
-   int           needsSlaveAssigned;
-   SlaveVP      *slaveAssignedToSlot;
-   
-   int           slotIdx;  //needed by Holistic Model's data gathering
-   int           coreSlotIsOn;
-   SlotPerfInfo *perfInfo; //used by assigner to pick best slave for core
- };
-//AnimSlot
-
- enum VPtype {
-     Slave = 1, //default
-     Master,
-     Shutdown,
-     Idle
- };
- 
-/*This structure embodies the state of a slaveVP.  It is reused for masterVP
- * and shutdownVPs.
- */
-struct _SlaveVP
- {    //The offsets of these fields are hard-coded into assembly
-   void       *stackPtr;         //save the core's stack ptr when suspend
-   void       *framePtr;         //save core's frame ptr when suspend
-   void       *resumeInstrPtr;   //save core's program-counter when suspend
-   void       *coreCtlrFramePtr; //restore before jmp back to core controller
-   void       *coreCtlrStackPtr; //restore before jmp back to core controller
-   
-      //============ below this, no fields are used in asm =============
-   
-   int         slaveID;       //each slave given a globally unique ID
-   int         coreAnimatedBy; 
-   void       *startOfStack;  //used to free, and to point slave to Fn
-   enum VPtype typeOfVP;      //Slave vs Master vs Shutdown..
-   int         assignCount;   //Each assign is for one work-unit, so IDs it
-      //note, a scheduling decision is uniquely identified by the triple:
-      // <slaveID, coreAnimatedBy, assignCount> -- used in record & replay
-   
-      //for comm -- between master and coreCtlr & btwn wrapper lib and plugin
-   AnimSlot   *animSlotAssignedTo;
-   VMSReqst   *requests;      //wrapper lib puts in requests, plugin takes out
-   void       *dataRetFromReq;//Return vals from plugin to Wrapper Lib
-
-      //For using Slave as carrier for data
-   void       *semanticData;  //Lang saves lang-specific things in slave here
-
-        //=========== MEASUREMENT STUFF ==========
-         MEAS__Insert_Meas_Fields_into_Slave;
-         float64     createPtInSecs;  //time VP created, in seconds
-        //========================================
- };
-//SlaveVP
-
- 
-/* The one and only global variable, holds many odds and ends
- */
-typedef struct
- {    //The offsets of these fields are hard-coded into assembly
-   void            *coreCtlrReturnPt;    //offset to this field used in asm
-   int8             falseSharePad1[256 - sizeof(void*)];
-   int32            masterLock;          //offset to this field used in asm
-   int8             falseSharePad2[256 - sizeof(int32)];
-      //============ below this, no fields are used in asm =============
-
-      //Basic VMS infrastructure
-   SlaveVP        **masterVPs;
-   AnimSlot      ***allAnimSlots;
-   
-      //plugin related
-   SlaveAssigner    slaveAssigner;
-   RequestHandler   requestHandler;
-   void            *semanticEnv;
-   
-      //Slave creation
-   int32            numSlavesCreated;  //gives ordering to processor creation
-   int32            numSlavesAlive;    //used to detect fail-safe shutdown
-
-      //Initialization related
-   int32            setupComplete;      //use while starting up coreCtlr
-
-      //Memory management related
-   MallocArrays    *freeLists;
-   int32            amtOfOutstandingMem;//total currently allocated
-
-      //Random number seeds -- random nums used in various places  
-   uint32_t seed1;
-   uint32_t seed2;
-
-      //=========== MEASUREMENT STUFF =============
-       IntervalProbe   **intervalProbes;
-       PrivDynArrayInfo *dynIntervalProbesInfo;
-       HashTable        *probeNameHashTbl;
-       int32             masterCreateProbeID;
-       float64           createPtInSecs; //real-clock time VMS initialized
-       Histogram       **measHists;
-       PrivDynArrayInfo *measHistsInfo;
-       MEAS__Insert_Susp_Meas_Fields_into_MasterEnv;
-       MEAS__Insert_Master_Meas_Fields_into_MasterEnv;
-       MEAS__Insert_Master_Lock_Meas_Fields_into_MasterEnv;
-       MEAS__Insert_Malloc_Meas_Fields_into_MasterEnv;
-       MEAS__Insert_Plugin_Meas_Fields_into_MasterEnv;
-       MEAS__Insert_System_Meas_Fields_into_MasterEnv;
-       MEAS__Insert_Counter_Meas_Fields_into_MasterEnv;
-      //==========================================
- }
-MasterEnv;
-
-//=========================  Extra Stuff Data Strucs  =======================
-typedef struct
- {
-
- }
-VMSExcp;
-
-//=======================  OS Thread related  ===============================
-
-void * coreController( void *paramsIn );  //standard PThreads fn prototype
-void * coreCtlr_Seq( void *paramsIn );  //standard PThreads fn prototype
-void animationMaster( void *initData, SlaveVP *masterVP );
-
-
-typedef struct
- {
-   void           *endThdPt;
-   unsigned int    coreNum;
- }
-ThdParams;
-
-//=============================  Global Vars ================================
-
-volatile MasterEnv      *_VMSMasterEnv __align_to_cacheline__;
-
-   //these are global, but only used for startup and shutdown
-pthread_t       coreCtlrThdHandles[ NUM_CORES ]; //pthread's virt-procr state
-ThdParams      *coreCtlrThdParams [ NUM_CORES ];
-
-pthread_mutex_t suspendLock;
-pthread_cond_t  suspendCond;
-
-//=========================  Function Prototypes  ===========================
-/* MEANING OF   WL  PI  SS  int VMSOS
- * These indicate which places the function is safe to use.  They stand for:
- * 
- * WL   Wrapper Library -- wrapper lib code should only use these
- * PI   Plugin          -- plugin code should only use these
- * SS   Startup and Shutdown -- designates these relate to startup & shutdown
- * int  internal to VMS -- should not be used in wrapper lib or plugin
- * VMSOS means "OS functions for applications to use"
- * 
- * VMS_int__ functions touch internal VMS data structs and are only safe
- *  to be used inside the master lock.  However, occasionally, they appear
- * in wrapper-lib or plugin code.  In those cases, very careful analysis
- * has been done to be sure no concurrency issues could arise.
- * 
- * VMS_WL__ functions are all safe for use outside the master lock.
- * 
- * VMSOS are only safe for applications to use -- they're like a second
- * language mixed in -- but they can't be used inside plugin code, and
- * aren't meant for use in wrapper libraries, because they are themselves
- * wrapper-library calls!
- */
-//========== Startup and shutdown ==========
-void
-VMS_SS__init();
-
-void
-VMS_SS__start_the_work_then_wait_until_done();
-
-SlaveVP* 
-VMS_SS__create_shutdown_slave();
-
-void
-VMS_SS__shutdown();
-
-void
-VMS_SS__cleanup_at_end_of_shutdown();
-
-
-//==============    ===============
-
-inline SlaveVP *
-VMS_int__create_slaveVP( TopLevelFnPtr fnPtr, void *dataParam );
-#define VMS_PI__create_slaveVP VMS_int__create_slaveVP
-#define VMS_WL__create_slaveVP VMS_int__create_slaveVP
-
-   //Use this to create processor inside entry point & other places outside
-   // the VMS system boundary (IE, don't animate with a SlaveVP or MasterVP)
-SlaveVP *
-VMS_ext__create_slaveVP( TopLevelFnPtr fnPtr, void *dataParam );
-
-inline SlaveVP *
-VMS_int__create_slaveVP_helper( SlaveVP *newSlv,       TopLevelFnPtr  fnPtr,
-                                void      *dataParam, void           *stackLocs );
-
-inline void
-VMS_int__reset_slaveVP_to_TopLvlFn( SlaveVP *slaveVP, TopLevelFnPtr fnPtr,
-                              void    *dataParam);
-
-inline void
-VMS_int__point_slaveVP_to_OneParamFn( SlaveVP *slaveVP, void *fnPtr,
-                              void    *param);
-
-inline void
-VMS_int__point_slaveVP_to_TwoParamFn( SlaveVP *slaveVP, void *fnPtr,
-                              void    *param1, void *param2);
-
-void
-VMS_int__dissipate_slaveVP( SlaveVP *slaveToDissipate );
-#define VMS_PI__dissipate_slaveVP VMS_int__dissipate_slaveVP
-//WL: dissipate a SlaveVP by sending a request
-
-void
-VMS_ext__dissipate_slaveVP( SlaveVP *slaveToDissipate );
-
-void
-VMS_int__throw_exception( char *msgStr, SlaveVP *reqstSlv, VMSExcp *excpData );
-#define VMS_PI__throw_exception  VMS_int__throw_exception
-void
-VMS_WL__throw_exception( char *msgStr, SlaveVP *reqstSlv,  VMSExcp *excpData );
-#define VMS_App__throw_exception VMS_WL__throw_exception
-
-void *
-VMS_int__give_sem_env_for( SlaveVP *animSlv );
-#define VMS_PI__give_sem_env_for  VMS_int__give_sem_env_for
-#define VMS_SS__give_sem_env_for  VMS_int__give_sem_env_for
-//No WL version -- not safe!  if use in WL, be sure data rd & wr is stable
-
-
-inline void
-VMS_int__get_master_lock();
-
-#define VMS_int__release_master_lock() _VMSMasterEnv->masterLock = UNLOCKED
-
-inline uint32_t
-VMS_int__randomNumber();
-
-//==============  Request Related  ===============
-
-void
-VMS_int__suspend_slaveVP_and_send_req( SlaveVP *callingSlv );
-
-inline void
-VMS_WL__add_sem_request_in_mallocd_VMSReqst( void *semReqData, SlaveVP *callingSlv );
-
-inline void
-VMS_WL__send_sem_request( void *semReqData, SlaveVP *callingSlv );
-
-void
-VMS_WL__send_create_slaveVP_req( void *semReqData, SlaveVP *reqstingSlv );
-
-void inline
-VMS_WL__send_dissipate_req( SlaveVP *prToDissipate );
-
-inline void
-VMS_WL__send_VMSSem_request( void *semReqData, SlaveVP *callingSlv );
-
-VMSReqst *
-VMS_PI__take_next_request_out_of( SlaveVP *slaveWithReq );
-//#define VMS_PI__take_next_request_out_of( slave ) slave->requests
-
-//inline void *
-//VMS_PI__take_sem_reqst_from( VMSReqst *req );
-#define VMS_PI__take_sem_reqst_from( req ) req->semReqData
-
-void inline
-VMS_PI__handle_VMSSemReq( VMSReqst *req, SlaveVP *requestingSlv, void *semEnv,
-                       ResumeSlvFnPtr resumeSlvFnPtr );
-
-//======================== MEASUREMENT ======================
-uint64
-VMS_WL__give_num_plugin_cycles();
-uint32
-VMS_WL__give_num_plugin_animations();
-
-
-//========================= Utilities =======================
-inline char *
-VMS_int__strDup( char *str );
-
-
-//========================= Probes =======================
-#include "Services_Offered_by_VMS/Measurement_and_Stats/probes.h"
-
-//================================================
-#endif	/* _VMS_H */
-
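Note on the removed MasterEnv: the two falseSharePad arrays are sized as 256 minus the size of the preceding field, which places coreCtlrReturnPt and masterLock at fixed 256-byte intervals so that they do not share a cache line. The sketch below checks that arithmetic on a cut-down struct. PaddedEnv and firstOrdinaryField are hypothetical names, and the sketch assumes a C11 compiler for static_assert.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-in for the start of MasterEnv (hypothetical type). */
typedef struct {
    void    *coreCtlrReturnPt;
    int8_t   falseSharePad1[256 - sizeof(void *)];
    int32_t  masterLock;
    int8_t   falseSharePad2[256 - sizeof(int32_t)];
    int32_t  firstOrdinaryField;  /* stands in for everything after the pads */
} PaddedEnv;

/* The pads make each field start exactly 256 bytes after the previous one,
 * independent of pointer width. */
static_assert(offsetof(PaddedEnv, coreCtlrReturnPt) == 0,
              "return point is the first field");
static_assert(offsetof(PaddedEnv, masterLock) == 256,
              "lock starts 256 bytes in");
static_assert(offsetof(PaddedEnv, firstOrdinaryField) == 512,
              "ordinary fields start 512 bytes in");
```

Fixed offsets also matter because, as the removed comments state, the offsets of these fields are hard-coded into assembly.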
diff -r 0dc0b8653902 -r 999f2966a3e5 VMS__PI.c
--- a/VMS__PI.c	Mon Sep 03 03:34:54 2012 -0700
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,121 +0,0 @@
-/*
- * Copyright 2010  OpenSourceStewardshipFoundation
- *
- * Licensed under BSD
- */
-
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-#include <malloc.h>
-#include <inttypes.h>
-#include <sys/time.h>
-
-#include "VMS.h"
-
-
-/* MEANING OF   WL  PI  SS  int
- * These indicate which places the function is safe to use.  They stand for:
- * WL: Wrapper Library
- * PI: Plugin 
- * SS: Startup and Shutdown
- * int: internal to the VMS implementation
- */
-
-//=========================  Local Declarations  ========================
-void inline
-handleMakeProbe( VMSSemReq *semReq, void *semEnv, ResumeSlvFnPtr resumeFn );
-
-void inline
-handleThrowException( VMSSemReq *semReq, void *semEnv, ResumeSlvFnPtr resumeFn );
-//=======================================================================
-
- 
-VMSReqst *
-VMS_PI__take_next_request_out_of( SlaveVP *slaveWithReq )
- { VMSReqst *req;
-
-   req = slaveWithReq->requests;
-   if( req == NULL ) return NULL;
-
-   slaveWithReq->requests = slaveWithReq->requests->nextReqst;
-   return req;
- }
-
- 
-
-/*May 2012
- *CHANGED IMPL -- now a macro in header file
- *
- *Turn function into macro that just accesses the request field
- *
-inline void *
-VMS_PI__take_sem_reqst_from( VMSReqst *req )
- {
-   return req->semReqData;
- }
-*/
-
-
-/* This is for OS requests and VMS infrastructure requests, such as to create
- *  a probe -- a probe is inside the heart of VMS-core, it's not part of any
- *  language -- but it's also a semantic thing that's triggered from and used
- *  in the application.. so it crosses abstractions..  so, need some special
- *  pattern here for handling such requests.
- * Doing this just like it were a second language sharing VMS-core.
- * 
- * This is called from the language's request handler when it sees a request
- *  of type VMSSemReq
- *
- * TODO: Later change this, to give probes their own separate plugin & have
- *  VMS-core steer the request to appropriate plugin
- * Do the same for OS calls -- look later at it..
- */
-void inline
-VMS_PI__handle_VMSSemReq( VMSReqst *req, SlaveVP *requestingSlv, void *semEnv,
-                       ResumeSlvFnPtr resumeFn )
- { VMSSemReq *semReq;
-
-   semReq = VMS_PI__take_sem_reqst_from(req);
-   if( semReq == NULL ) return;
-   switch( semReq->reqType )  //sem handlers are all in other file
-    {
-      case make_probe:      handleMakeProbe(   semReq, semEnv, resumeFn);
-         break;
-      case throw_excp:  handleThrowException(  semReq, semEnv, resumeFn);
-         break;
-    }
- }
-
-/*
- */
-void inline
-handleMakeProbe( VMSSemReq *semReq, void *semEnv, ResumeSlvFnPtr resumeFn )
- { IntervalProbe *newProbe;
-
-   newProbe          = VMS_int__malloc( sizeof(IntervalProbe) );
-   newProbe->nameStr = VMS_int__strDup( semReq->nameStr );
-   newProbe->hist    = NULL;
-   newProbe->schedChoiceWasRecorded = FALSE;
-
-      //This runs in masterVP, so no race-condition worries
-   newProbe->probeID =
-            addToDynArray( newProbe, _VMSMasterEnv->dynIntervalProbesInfo );
-
-   semReq->requestingSlv->dataRetFromReq = newProbe;
-
-   //This is inside VMS, while resume_slaveVP fn is inside language, so pass
-   // pointer from lang to here, then call it.
-   (*resumeFn)( semReq->requestingSlv, semEnv );
- }
-
-void inline
-handleThrowException( VMSSemReq *semReq, void *semEnv, ResumeSlvFnPtr resumeFn )
- {
-   VMS_int__throw_exception(  semReq->msgStr, semReq->requestingSlv, semReq->exceptionData );
-   
-   (*resumeFn)( semReq->requestingSlv, semEnv );
- }
-
-
-
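Note on the removed request handling: VMS_PI__take_next_request_out_of pops the head of the slave's request list, and the wrapper-library send functions prepend to that same list, so requests come out in last-in, first-out order. A minimal sketch of the pair, assuming cut-down stand-ins for VMSReqst and SlaveVP; the names Reqst, Slave, pushRequest and takeNextRequest are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-ins for VMSReqst and SlaveVP (hypothetical types). */
typedef struct Reqst Reqst;
struct Reqst {
    int    reqType;
    void  *semReqData;
    Reqst *nextReqst;
};

typedef struct {
    Reqst *requests;  /* head of the singly linked request list */
} Slave;

/* Prepend, as the send functions do before suspending the slave. */
static void pushRequest(Slave *slv, Reqst *req)
{
    assert(slv != NULL && req != NULL);
    req->nextReqst = slv->requests;
    slv->requests  = req;
}

/* Pop the head, or return NULL when the list is empty. */
static Reqst *takeNextRequest(Slave *slv)
{
    Reqst *req = slv->requests;
    if (req == NULL) return NULL;
    slv->requests = req->nextReqst;
    return req;
}
```

In the removed code the final request of a send lives on the suspended slave's stack, so it stays valid only while that slave remains suspended; the sketch leaves storage to the caller and does not model that.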
diff -r 0dc0b8653902 -r 999f2966a3e5 VMS__WL.c
--- a/VMS__WL.c	Mon Sep 03 03:34:54 2012 -0700
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,160 +0,0 @@
-/*
- * Copyright 2010  OpenSourceStewardshipFoundation
- *
- * Licensed under BSD
- */
-
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-#include <malloc.h>
-#include <inttypes.h>
-#include <sys/time.h>
-
-#include "VMS.h"
-
-
-/* MEANING OF   WL  PI  SS  int
- * These indicate which places the function is safe to use.  They stand for:
- * WL: Wrapper Library
- * PI: Plugin 
- * SS: Startup and Shutdown
- * int: internal to the VMS implementation
- */
-
-
-
-/*For this implementation of VMS, it may not make much sense to have the
- * system of requests for creating a new processor done this way.. but over
- * the scope of single-master, multi-master, multi-tasking, OS-implementing,
- * distributed-memory, and so on, this gives VMS implementation a chance to
- * do stuff before suspend, in the SlaveVP, and in the Master before the plugin
- * is called, as well as in the lang-lib before this is called, and in the
- * plugin.  So, this gives both VMS and language implementations a chance to
- * intercept at various points and do order-dependent stuff.
- *Having a standard VMSNewPrReqData struc allows the language to create and
- * free the struc, while VMS knows how to get the newSlv if it wants it, and
- * it lets the lang have lang-specific data related to creation transported
- * to the plugin.
- */
-void
-VMS_WL__send_create_slaveVP_req( void *semReqData, SlaveVP *reqstingSlv )
- { VMSReqst req;
-
-   req.reqType          = createReq;
-   req.semReqData       = semReqData;
-   req.nextReqst        = reqstingSlv->requests;
-   reqstingSlv->requests = &req;
-
-   VMS_int__suspend_slaveVP_and_send_req( reqstingSlv );
- }
-
-
-/*
- *This adds a request to dissipate, then suspends the processor so that the
- * request handler will receive the request.  The request handler is what
- * does the work of freeing memory and removing the processor from the
- * semantic environment's data structures.
- *The request handler also is what figures out when to shutdown the VMS
- * system -- which causes all the core controller threads to die, and returns from
- * the call that started up VMS to perform the work.
- *
- *This form is a bit misleading to understand if one is trying to figure out
- * how VMS works -- it looks like a normal function call, but inside it
- * sends a request to the request handler and suspends the processor, which
- * jumps out of the VMS_WL__dissipate_slaveVP function, and out of all nestings
- * above it, transferring the work of dissipating to the request handler,
- * which then does the actual work -- causing the processor that animated
- * the call of this function to disappear and the "hanging" state of this
- * function to just poof into thin air -- the virtual processor's trace
- * never returns from this call, but instead the virtual processor's trace
- * gets suspended in this call and all the virt processor's state
- * disappears -- making that suspend the last thing in the Slv's trace.
- */
-void
-VMS_WL__send_dissipate_req( SlaveVP *slaveToDissipate )
- { VMSReqst req;
-
-   req.reqType                = dissipate;
-   req.nextReqst              = slaveToDissipate->requests;
-   slaveToDissipate->requests = &req;
-
-   VMS_int__suspend_slaveVP_and_send_req( slaveToDissipate );
- }
-
-
-
-/*This call's name indicates that request is malloc'd -- so req handler
- * has to free any extra requests tacked on before a send, using this.
- *
- * This inserts the semantic-layer's request data into standard VMS carrier
- * request data-struct that is malloc'd.  The sem request doesn't need to
- * be malloc'd if this is called inside the same call chain before the
- * send of the last request is called.
- *
- *The request handler has to call VMS_int__free_VMSReq for any of these
- */
-inline void
-VMS_WL__add_sem_request_in_mallocd_VMSReqst( void *semReqData,
-                                          SlaveVP *callingSlv )
- { VMSReqst *req;
-
-   req = VMS_int__malloc( sizeof(VMSReqst) );
-   req->reqType         = semantic;
-   req->semReqData      = semReqData;
-   req->nextReqst       = callingSlv->requests;
-   callingSlv->requests = req;
- }
-
-/*This inserts the semantic-layer's request data into a standard VMS carrier
- * request data-struct, which is allocated on the stack of this call, and a
- * ptr to it is sent to the plugin.
- *Then it does a suspend, to cause the request to be sent.
- */
-inline void
-VMS_WL__send_sem_request( void *semReqData, SlaveVP *callingSlv )
- { VMSReqst req;
-
-   req.reqType         = semantic;
-   req.semReqData      = semReqData;
-   req.nextReqst       = callingSlv->requests;
-   callingSlv->requests = &req;
-   
-   VMS_int__suspend_slaveVP_and_send_req( callingSlv );
- }
-
-
-/*May 2012 Not sure what this is..  looks like old idea for VMS semantic
- * request
- */
-inline void
-VMS_WL__send_VMSSem_request( void *semReqData, SlaveVP *callingSlv )
- { VMSReqst req;
-
-   req.reqType         = VMSSemantic;
-   req.semReqData      = semReqData;
-   req.nextReqst       = callingSlv->requests; //grab any other preceding requests
-   callingSlv->requests = &req;
-
-   VMS_int__suspend_slaveVP_and_send_req( callingSlv );
- }
-
-/*May 2012
- *To throw exception from wrapper lib or application, first turn
- * it into a request, then send the request
- */
-void
-VMS_WL__throw_exception( char *msgStr, SlaveVP *reqstSlv,  VMSExcp *excpData )
- { VMSReqst req;
-   VMSSemReq semReq;
-
-   req.reqType         = VMSSemantic;
-   req.semReqData      = &semReq;
-   req.nextReqst       = reqstSlv->requests; //grab any other preceding requests
-   reqstSlv->requests   = &req;
-
-   semReq.msgStr        = msgStr;
-   semReq.exceptionData = excpData;
-   
-   VMS_int__suspend_slaveVP_and_send_req( reqstSlv );
- }
diff -r 0dc0b8653902 -r 999f2966a3e5 VMS__int.c
--- a/VMS__int.c	Mon Sep 03 03:34:54 2012 -0700
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,289 +0,0 @@
-/*
- * Copyright 2010  OpenSourceStewardshipFoundation
- *
- * Licensed under BSD
- */
-
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-#include <malloc.h>
-#include <inttypes.h>
-#include <sys/time.h>
-
-#include "VMS.h"
-
-
-/* MEANING OF   WL  PI  SS  int
- * These indicate which places the function is safe to use.  They stand for:
- * WL: Wrapper Library
- * PI: Plugin 
- * SS: Startup and Shutdown
- * int: internal to the VMS implementation
- */
-
-
-inline SlaveVP *
-VMS_int__create_slaveVP( TopLevelFnPtr fnPtr, void *dataParam )
- { SlaveVP *newSlv;
-   void      *stackLocs;
-
-   newSlv      = VMS_int__malloc( sizeof(SlaveVP) );
-   stackLocs   = VMS_int__malloc( VIRT_PROCR_STACK_SIZE );
-   if( stackLocs == 0 )
-    { perror("VMS_int__malloc stack"); exit(1); }
-
-   _VMSMasterEnv->numSlavesAlive += 1;
-
-   return VMS_int__create_slaveVP_helper( newSlv, fnPtr, dataParam, stackLocs );
- }
-
-/* "ext" designates that it's for use outside the VMS system -- should only
- * be called from main thread or other thread -- never from code animated by
- * a VMS virtual processor.
- */
-inline SlaveVP *
-VMS_ext__create_slaveVP( TopLevelFnPtr fnPtr, void *dataParam )
- { SlaveVP *newSlv;
-   char      *stackLocs;
-
-   newSlv      = malloc( sizeof(SlaveVP) );
-   stackLocs  = malloc( VIRT_PROCR_STACK_SIZE );
-   if( stackLocs == 0 )
-    { perror("malloc stack"); exit(1); }
-
-   _VMSMasterEnv->numSlavesAlive += 1;
-
-   return VMS_int__create_slaveVP_helper(newSlv, fnPtr, dataParam, stackLocs);
- }
-
-
-//===========================================================================
-/*there is a label inside this function -- save the addr of this label in
- * the callingSlv struc, as the pick-up point from which to start the next
- * work-unit for that slave.  If turns out have to save registers, then
- * save them in the slave struc too.  Then do assembly jump to the CoreCtlr's
- * "done with work-unit" label.  The slave struc is in the request in the
- * slave that animated the just-ended work-unit, so all the state is saved
- * there, and will get passed along, inside the request handler, to the
- * next work-unit for that slave.
- */
-void
-VMS_int__suspend_slaveVP_and_send_req( SlaveVP *animatingSlv )
- { 
-
-      //This suspended Slv will get assigned by Master again at some
-      // future point
-
-      //return ownership of the Slv and anim slot to Master virt pr
-   animatingSlv->animSlotAssignedTo->workIsDone = TRUE;
-
-        HOLISTIC__Record_HwResponderInvocation_start;
-         MEAS__Capture_Pre_Susp_Point;
-      //This assembly function is a VMS primitive that first saves the
-      // stack and frame pointer, plus an addr inside this assembly code.
-      //When core ctlr later gets this slave out of a sched slot, it
-      // restores the stack and frame and then jumps to the addr.. that
-      // jmp causes return from this function.
-      //So, in effect, this function takes a variable amount of wall-clock
-      // time to complete -- the amount of time is determined by the
-      // Master, which makes sure the memory is in a consistent state first.
-   switchToCoreCtlr(animatingSlv);
-   flushRegisters();
-         MEAS__Capture_Post_Susp_Point;
-		 
-   return;
- }
-
-
-/* "ext" designates that it's for use outside the VMS system -- should only
- * be called from main thread or other thread -- never from code animated by
- * a SlaveVP, nor from a masterVP.
- *
- *Use this version to dissipate Slvs created outside the VMS system.
- */
-void
-VMS_ext__dissipate_slaveVP( SlaveVP *slaveToDissipate )
- {
-   _VMSMasterEnv->numSlavesAlive -= 1;
-   if( _VMSMasterEnv->numSlavesAlive == 0 )
-    {    //no more work, so shutdown
-      VMS_SS__shutdown();  //note, creates shut-down slaves on each core
-    }
-
-   //NOTE: dataParam was given to the processor, so should either have
-      // been alloc'd with VMS_int__malloc, or freed by the level above animSlv.
-      //So, all that's left to free here is the stack and the SlaveVP struc
-      // itself
-      //Note, should not stack-allocate the data param -- no guarantee, in
-      // general that creating processor will outlive ones it creates.
-   free( slaveToDissipate->startOfStack );
-   free( slaveToDissipate );
- }
-
-
-
-/*This must be called by the request handler plugin -- it cannot be called
- * from the semantic library "dissipate processor" function -- instead, the
- * semantic layer has to generate a request, and the plug-in calls this
- * function.
- *The reason is that this frees the virtual processor's stack -- which is
- * still in use inside semantic library calls!
- *
- *This frees or recycles all the state owned by and comprising the VMS
- * portion of the animating virtual procr.  The request handler must first
- * free any semantic data created for the processor that didn't use the
- * VMS_malloc mechanism.  Then it calls this, which first asks the malloc
- * system to disown any state that did use VMS_malloc, and then frees the
- * stack and the processor-struct itself.
- *If the dissipated processor is the sole (remaining) owner of VMS_int__malloc'd
- * state, then that state gets freed (or sent to recycling) as a side-effect
- * of dis-owning it.
- */
-void
-VMS_int__dissipate_slaveVP( SlaveVP *animatingSlv )
- {
-         DEBUG__printf2(dbgRqstHdlr, "VMS int dissipate slaveID: %d, alive: %d",animatingSlv->slaveID, _VMSMasterEnv->numSlavesAlive-1);
-      //dis-own all locations owned by this processor, causing to be freed
-      // any locations that it is (was) sole owner of
-   _VMSMasterEnv->numSlavesAlive -= 1;
-   if( _VMSMasterEnv->numSlavesAlive == 0 )
-    {    //no more work, so shutdown
-      VMS_SS__shutdown();  //note, creates shut-down processor on each core
-    }
-
-      //NOTE: dataParam was given to the processor, so should either have
-      // been alloc'd with VMS_int__malloc, or freed by the level above animSlv.
-      //So, all that's left to free here is the stack and the SlaveVP struc
-      // itself
-      //Note, should not stack-allocate initial data -- no guarantee, in
-      // general that creating processor will outlive ones it creates.
-   VMS_int__free( animatingSlv->startOfStack );
-   VMS_int__free( animatingSlv );
- }
-
-/*Anticipating multi-tasking
- */
-void *
-VMS_int__give_sem_env_for( SlaveVP *animSlv )
- {
-   return _VMSMasterEnv->semanticEnv;
- }
-
-/*
- *
- */
-inline SlaveVP *
-VMS_int__create_slaveVP_helper( SlaveVP *newSlv,    TopLevelFnPtr  fnPtr,
-                     void    *dataParam, void          *stackLocs )
- {
-   newSlv->startOfStack = stackLocs;
-   newSlv->slaveID      = _VMSMasterEnv->numSlavesCreated++;
-   newSlv->requests     = NULL;
-   newSlv->animSlotAssignedTo    = NULL;
-   newSlv->typeOfVP     = Slave;
-   newSlv->assignCount  = 0;
-
-   VMS_int__reset_slaveVP_to_TopLvlFn( newSlv, fnPtr, dataParam );
-           
-   //============================= MEASUREMENT STUFF ========================
-   #ifdef PROBES__TURN_ON_STATS_PROBES
-   //TODO: make this TSCHiLow or generic equivalent
-   //struct timeval timeStamp;
-   //gettimeofday( &(timeStamp), NULL);
-   //newSlv->createPtInSecs = timeStamp.tv_sec +(timeStamp.tv_usec/1000000.0) -
-   //                                           _VMSMasterEnv->createPtInSecs;
-   #endif
-   //========================================================================
-
-   return newSlv;
- }
-
-
-/*Later, improve this -- for now, just exits the application after printing
- * the error message.
- */
-void
-VMS_int__throw_exception( char *msgStr, SlaveVP *reqstSlv, VMSExcp *excpData )
- {
-   printf("%s",msgStr);
-   fflush(stdout);
-   exit(1);
- }
-
-
-inline char *
-VMS_int__strDup( char *str )
- { char *retStr;
-
-   if( str == NULL ) return (char *)NULL;
-   retStr = (char *)VMS_int__malloc( strlen(str) + 1 );
-   strcpy( retStr, str );
-
-   return (char *)retStr;
- }
-
-
-inline void
-VMS_int__backoff_for_TooLongToGetLock( int32 numTriesToGetLock );
-
-inline void
-VMS_int__get_master_lock()
- { int32 *addrOfMasterLock;
- 
-   addrOfMasterLock = &(_VMSMasterEnv->masterLock);
-
-   int numTriesToGetLock = 0;
-   int gotLock = 0;
-   
-            MEAS__Capture_Pre_Master_Lock_Point;
-
-   while( !gotLock ) //keep going until get master lock
-    { 
-      numTriesToGetLock++;   //if too many, means too much contention
-      if( numTriesToGetLock > NUM_TRIES_BEFORE_DO_BACKOFF )
-       { VMS_int__backoff_for_TooLongToGetLock( numTriesToGetLock );
-       }
-      if( numTriesToGetLock > MASTERLOCK_RETRIES_BEFORE_YIELD ) 
-       { numTriesToGetLock = 0; 
-         pthread_yield();
-       }
-   
-         //try to get the lock
-      gotLock = __sync_bool_compare_and_swap( addrOfMasterLock,
-                                                         UNLOCKED, LOCKED );
-    }
-            MEAS__Capture_Post_Master_Lock_Point;
- }
-
-/*Used by the backoff to pick a random amount of busy-wait.  Can't use the
- * system rand because it takes much too long.
- *Note, the seeds are kept in _VMSMasterEnv and are modified by each call
- */
-inline uint32_t
-VMS_int__randomNumber()
- {
-	_VMSMasterEnv->seed1 = 36969 * (_VMSMasterEnv->seed1 & 65535) + 
-                          (_VMSMasterEnv->seed1 >> 16);
-	_VMSMasterEnv->seed2 = 18000 * (_VMSMasterEnv->seed2 & 65535) + 
-                          (_VMSMasterEnv->seed2 >> 16);
-	return (_VMSMasterEnv->seed1 << 16) + _VMSMasterEnv->seed2;
- }
-
-
-/*Busy-waits for a random number of cycles -- chooses number of cycles 
- * differently than for the no-work backoff
- */
-inline void
-VMS_int__backoff_for_TooLongToGetLock( int32 numTriesToGetLock )
- { int32 i, waitIterations;
-   volatile double fakeWorkVar = 0.0; //busy-wait fake work
-
-   waitIterations = 
-    VMS_int__randomNumber()% (numTriesToGetLock * GET_LOCK_BACKOFF_WEIGHT);   
-   //addToHist( wait_iterations, coreLoopThdParams->wait_iterations_hist );
-   for( i = 0; i < waitIterations; i++ )
-    { fakeWorkVar += (fakeWorkVar + 32.0) / 2.0; //busy-wait
-    }
- }
-
diff -r 0dc0b8653902 -r 999f2966a3e5 VMS__startup_and_shutdown.c
--- a/VMS__startup_and_shutdown.c	Mon Sep 03 03:34:54 2012 -0700
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,598 +0,0 @@
-/*
- * Copyright 2010  OpenSourceStewardshipFoundation
- *
- * Licensed under BSD
- */
-
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-#include <malloc.h>
-#include <inttypes.h>
-#include <sys/time.h>
-#include <pthread.h>
-
-#include "VMS.h"
-
-
-#define thdAttrs NULL
-
-
-/* MEANING OF   WL  PI  SS  int
- * These indicate which places the function is safe to use.  They stand for:
- * WL: Wrapper Library
- * PI: Plugin 
- * SS: Startup and Shutdown
- * int: internal to the VMS implementation
- */
-
-
-//===========================================================================
-AnimSlot **
-create_anim_slots( int32 coreSlotsAreOn );
-
-void
-create_masterEnv();
-
-void
-create_the_coreCtlr_OS_threads();
-
-MallocProlog *
-create_free_list();
-
-void
-endOSThreadFn( void *initData, SlaveVP *animatingSlv );
-
-
-//===========================================================================
-
-/*Setup has two phases:
- * 1) Semantic layer first calls init_VMS, which creates masterEnv, and puts
- *    the master Slv into the work-queue, ready for first "call"
- * 2) Semantic layer then does its own init, which creates the seed virt
- *    slave inside the semantic layer, ready to assign it when
- *    asked by the first run of the animationMaster.
- *
- *This part is a bit weird because VMS really wants to be "always there", and
- * have applications attach and detach..  for now, this VMS is part of
- * the app, so the VMS system starts up as part of running the app.
- *
- *The semantic layer is isolated from the VMS internals by making the
- * semantic layer do setup to a state that it's ready with its
- * initial Slvs, ready to assign them to slots when the animationMaster
- * asks.  Without this pattern, the semantic layer's setup would
- * have to modify slots directly to assign the initial virt-procrs, and put
- * them into the readyToAnimateQ itself, breaking the isolation completely.
- *
- * 
- *The semantic layer creates the initial Slv(s), and adds its
- * own environment to masterEnv, and fills in the pointers to
- * the requestHandler and slaveAssigner plug-in functions
- */
-
-/*This allocates VMS data structures, populates the master VMSProc,
- * and master environment, and returns the master environment to the semantic
- * layer.
- */
-void
-VMS_SS__init()
- {
-   #ifdef DEBUG__TURN_ON_SEQUENTIAL_MODE
-      create_masterEnv();
-      printf( "\n\n Running in SEQUENTIAL mode \n\n" );
-   #else
-      create_masterEnv();
-      DEBUG__printf1(TRUE,"Offset of lock in masterEnv: %d ", (int32)offsetof(MasterEnv,masterLock) );
-      create_the_coreCtlr_OS_threads();
-   #endif
- }
-
-
-/*TODO: finish implementing
- *This function returns information about the version of VMS, the language
- * the program is being run in, its version, and information on the 
- * hardware.
- */
-/*
-char *
-VMS_App__give_environment_string()
- {
-   //--------------------------
-    fprintf(output, "#\n# >> Build information <<\n");
-    fprintf(output, "# GCC VERSION: %d.%d.%d\n",__GNUC__,__GNUC_MINOR__,__GNUC_PATCHLEVEL__);
-    fprintf(output, "# Build Date: %s %s\n", __DATE__, __TIME__);
-    
-    fprintf(output, "#\n# >> Hardware information <<\n");
-    fprintf(output, "# Hardware Architecture: ");
-   #ifdef __x86_64
-    fprintf(output, "x86_64");
-   #endif //__x86_64
-   #ifdef __i386
-    fprintf(output, "x86");
-   #endif //__i386
-    fprintf(output, "\n");
-    fprintf(output, "# Number of Cores: %d\n", NUM_CORES);
-   //--------------------------
-    
-   //VMS Plugins
-    fprintf(output, "#\n# >> VMS Plugins <<\n");
-    fprintf(output, "# Language : ");
-    fprintf(output, _LANG_NAME_);
-    fprintf(output, "\n");
-       //Meta info gets set by calls from the language during its init,
-       // and info registered by calls from inside the application
-    fprintf(output, "# Assigner: %s\n", _VMSMasterEnv->metaInfo->assignerInfo);
-
-   //--------------------------
-   //Application
-    fprintf(output, "#\n# >> Application <<\n");
-    fprintf(output, "# Name: %s\n", _VMSMasterEnv->metaInfo->appInfo);
-    fprintf(output, "# Data Set:\n%s\n",_VMSMasterEnv->metaInfo->inputSet);
-    
-   //--------------------------
- }
- */
- 
-/*This structure holds all the information VMS needs to manage a program.  VMS
- * stores information about what percent of CPU time the program is getting, what
- * language it uses, the request handlers to call for its slaves, and so on.
- */
-/*
-typedef struct
- { void               *semEnv;
-   RequestHdlrFnPtr    requestHandler;
-   SlaveAssignerFnPtr  slaveAssigner;
-   int32               numSlavesLive;
-   void               *resultToReturn;
-  
-   TopLevelFnPtr   seedFnPtr;
-   void           *dataForSeed;
-   bool32          executionIsComplete;
-   pthread_mutex_t doneLock;
-   pthread_cond_t  doneCond;
- }
-VMSProcess;
-*/
-
-         
-/*
-void
-VMS_App__start_VMS_running()
- {
-   create_masterEnv();
-   
-   #ifdef DEBUG__TURN_ON_SEQUENTIAL_MODE
-      //Nothing else to create for sequential mode
-   #else
-      create_the_coreCtlr_OS_threads();
-   #endif    
- }
-*/
-
-/*A pointer to the startup-function for the language is given as the last
- * argument to the call.  Use this to initialize a program in the language.
- * This creates a data structure that encapsulates the bookkeeping info
- * VMS uses to track and schedule a program run.
- */
-/*
-VMSProcess *
-VMS_App__spawn_program_on_data_in_Lang( TopLevelFnPtr prog_seed_fn, void *data,
-                                    LangInitFnPtr langInitFnPtr )
- { VMSProcess *newProcess;
-   newProcess = malloc( sizeof(VMSProcess) );
-   newProcess->doneLock = PTHREAD_MUTEX_INITIALIZER;
-   newProcess->doneCond = PTHREAD_COND_INITIALIZER;
-   newProcess->executionIsComplete = FALSE;
-   newProcess->numSlavesLive = 0;
-   
-   newProcess->dataForSeed = data;
-   newProcess->seedFnPtr   = prog_seed_fn;
-   
-      //The language's spawn-process function fills in the plugin function-ptrs in
-      // the VMSProcess struct, gives the struct to VMS, which then makes and
-      // queues the seed SlaveVP, which starts processors made from the code being
-      // animated.
-    
-   (*langInitFnPtr)( newProcess );  
-   
-   return newProcess;
- }
-*/
-
-/*When all SlaveVPs owned by the program-run associated to the process have
- * dissipated, then return from this call.  There is no language to cleanup,
- * and VMS does not shutdown..  but the process bookkeeping structure,
- * which is used by VMS to track and schedule the program, is freed.
- *The VMSProcess structure is kept until this call collects the results from it,
- * then freed.  If the process is not done yet when VMS gets this
- * call, then this call waits..  the challenge here is that this call comes from
- * a live OS thread that's outside VMS..  so, inside here, it waits on a 
- * condition..  then it's a VMS thread that signals this to wake up..
- *First checks whether the process is done, if yes, calls the clean-up fn then
- * returns the result extracted from the VMSProcess struct.
- *If process not done yet, then performs a wait (in a loop to be sure the
- * wakeup is not spurious, which can happen).  VMS registers the wait, and upon
- * the process ending (last SlaveVP owned by it dissipates), then VMS signals
- * this to wakeup.  This then calls the cleanup fn and returns the result.
- */
-/*
-void *
-VMS_App__give_results_when_done_for( VMSProcess *process )
- { void *result;
-   
-   pthread_mutex_lock( &(process->doneLock) );
-   while( !(process->executionIsComplete) )
-    {
-      pthread_cond_wait( &(process->doneCond),
-                         &(process->doneLock) );
-    }
-   pthread_mutex_unlock( &(process->doneLock) );
-   
-   result = process->resultToReturn;
-   
-   VMS_int__cleanup_process_after_done( process );
-   free( process );  //was malloc'd above, so free it here
-   
-   return result;
- }
-*/
-
-/*Turns off the VMS system, and frees all data associated with it.  Does this
- * by creating shutdown SlaveVPs and inserting them into animation slots.
- * Will probably have to wake up sleeping cores as part of this -- the fn that
- * inserts the new SlaveVPs should handle the wakeup..
- */
-/*
-void
-VMS_SS__shutdown(); //already defined -- look at it
-
-void
-VMS_App__shutdown()
- {
-   for( cores )
-    { slave = VMS_int__create_new_SlaveVP( endOSThreadFn, NULL );
-      VMS_int__insert_slave_onto_core( SlaveVP *slave, coreNum );
-    }
- }
-*/
-
-/* VMS_App__start_VMS_running();
-
-   VMSProcess *matrixMultProcess;
-   
-   matrixMultProcess =
-    VMS_App__spawn_program_on_data_in_Lang( &prog_seed_fn, data, Vthread_lang );
-   
-   resMatrix = VMS_App__give_results_when_done_for( matrixMultProcess );
-   
-   VMS_App__shutdown();
- */
-
-void
-create_masterEnv()
- { MasterEnv       *masterEnv;
-   VMSQueueStruc  **readyToAnimateQs;
-   int              coreIdx;
-   SlaveVP        **masterVPs;
-   AnimSlot     ***allAnimSlots; //ptr to array of ptrs
-
-
-      //Make the master env, which holds everything else
-   _VMSMasterEnv = malloc( sizeof(MasterEnv) );
-
-        //Very first thing put into the master env is the free-list, seeded
-        // with a massive initial chunk of memory.
-        //After this, all other mallocs are VMS__malloc.
-   _VMSMasterEnv->freeLists        = VMS_ext__create_free_list();
-   
-   
-   //===================== Only VMS__malloc after this ====================
-   masterEnv     = (MasterEnv*)_VMSMasterEnv;
-   
-      //Make a readyToAnimateQ for each core controller
-   readyToAnimateQs = VMS_int__malloc( NUM_CORES * sizeof(VMSQueueStruc *) );
-   masterVPs        = VMS_int__malloc( NUM_CORES * sizeof(SlaveVP *) );
-
-      //One array for each core, several in array, core's masterVP scheds all
-   allAnimSlots    = VMS_int__malloc( NUM_CORES * sizeof(AnimSlot **) );
-
-   _VMSMasterEnv->numSlavesAlive = 0;  //used to detect shut-down condition
-
-   _VMSMasterEnv->numSlavesCreated = 0;  //used by create slave to set ID
-   for( coreIdx = 0; coreIdx < NUM_CORES; coreIdx++ )
-    {    
-      readyToAnimateQs[ coreIdx ] = makeVMSQ();
-      
-         //Q: should give masterVP core-specific info as its init data?
-      masterVPs[ coreIdx ] = VMS_int__create_slaveVP( (TopLevelFnPtr)&animationMaster, (void*)masterEnv );
-      masterVPs[ coreIdx ]->coreAnimatedBy = coreIdx;
-      masterVPs[ coreIdx ]->typeOfVP = Master;
-      allAnimSlots[ coreIdx ] = create_anim_slots( coreIdx ); //makes for one core
-    }
-   _VMSMasterEnv->masterVPs        = masterVPs;
-   _VMSMasterEnv->masterLock       = UNLOCKED;
-   _VMSMasterEnv->seed1 = rand()%1000; // init random number generator
-   _VMSMasterEnv->seed2 = rand()%1000; // init random number generator
-   _VMSMasterEnv->allAnimSlots    = allAnimSlots;
-   _VMSMasterEnv->measHistsInfo = NULL; 
-
-   //============================= MEASUREMENT STUFF ========================
-      
-         MEAS__Make_Meas_Hists_for_Susp_Meas;
-         MEAS__Make_Meas_Hists_for_Master_Meas;
-         MEAS__Make_Meas_Hists_for_Master_Lock_Meas;
-         MEAS__Make_Meas_Hists_for_Malloc_Meas;
-         MEAS__Make_Meas_Hists_for_Plugin_Meas;
-         MEAS__Make_Meas_Hists_for_Language;
-
-         PROBES__Create_Probe_Bookkeeping_Vars;
-         
-         HOLISTIC__Setup_Perf_Counters;
-         
-   //========================================================================
- }
-
-AnimSlot **
-create_anim_slots( int32 coreSlotsAreOn )
- { AnimSlot  **animSlots;
-   int i;
-
-   animSlots  = VMS_int__malloc( NUM_ANIM_SLOTS * sizeof(AnimSlot *) );
-
-   for( i = 0; i < NUM_ANIM_SLOTS; i++ )
-    {
-      animSlots[i] = VMS_int__malloc( sizeof(AnimSlot) );
-
-         //Set state to mean "handling requests done, slot needs filling"
-      animSlots[i]->workIsDone         = FALSE;
-      animSlots[i]->needsSlaveAssigned = TRUE;
-      animSlots[i]->slotIdx            = i; //quick retrieval of slot pos
-      animSlots[i]->coreSlotIsOn       = coreSlotsAreOn;
-    }
-   return animSlots;
- }
-
-
-void
-freeAnimSlots( AnimSlot **animSlots )
- { int i;
-   for( i = 0; i < NUM_ANIM_SLOTS; i++ )
-    {
-      VMS_int__free( animSlots[i] );
-    }
-   VMS_int__free( animSlots );
- }
-
-
-void
-create_the_coreCtlr_OS_threads()
- {
-   //========================================================================
-   //                      Create the Threads
-   int coreIdx, retCode;
-
-      //Need the threads to be created suspended, and wait for a signal
-      // before proceeding -- gives time after creating to initialize other
-      // stuff before the coreCtlrs set off.
-   _VMSMasterEnv->setupComplete = 0;
-   
-      //initialize the cond used to make the new threads wait and sync up
-      //must do this before *creating* the threads..
-   pthread_mutex_init( &suspendLock, NULL );
-   pthread_cond_init( &suspendCond, NULL );
-
-      //Make the threads that animate the core controllers
-   for( coreIdx=0; coreIdx < NUM_CORES; coreIdx++ )
-    { coreCtlrThdParams[coreIdx]          = VMS_int__malloc( sizeof(ThdParams) );
-      coreCtlrThdParams[coreIdx]->coreNum = coreIdx;
-
-      retCode =
-      pthread_create( &(coreCtlrThdHandles[coreIdx]),
-                        thdAttrs,
-                       &coreController,
-               (void *)(coreCtlrThdParams[coreIdx]) );
-      if(retCode){printf("ERROR creating thread: %d\n", retCode); exit(1);}
-    }
- }
-
-
-
-void
-VMS_SS__register_request_handler( RequestHandler requestHandler )
- { _VMSMasterEnv->requestHandler = requestHandler;
- }
-
-
-void
-VMS_SS__register_anim_assigner( SlaveAssigner animAssigner )
- { _VMSMasterEnv->slaveAssigner = animAssigner;
- }
-
-void VMS_SS__register_semantic_env( void *semanticEnv )
- { _VMSMasterEnv->semanticEnv = semanticEnv;
- }
-
-
-/*This is what causes the VMS system to initialize.. then waits for it to
- * exit.
- * 
- *Wrapper lib layer calls this when it wants the system to start running..
- */
-void
-VMS_SS__start_the_work_then_wait_until_done()
- { 
-#ifdef DEBUG__TURN_ON_SEQUENTIAL_MODE
-   /*Only difference between version with an OS thread pinned to each core and
-    * the sequential version of VMS is VMS__init_Seq, this, and coreCtlr_Seq.
-    */
-         //Instead of un-suspending threads, just call the one and only
-         // core ctlr (sequential version), in the main thread.
-      coreCtlr_Seq( NULL );
-      flushRegisters();
-#else
-   int coreIdx;
-      //Start the core controllers running
-   
-      //tell the core controller threads that setup is complete
-      //get lock, to lock out any threads still starting up -- they'll see
-      // that setupComplete is true before entering while loop, and so never
-      // wait on the condition
-   pthread_mutex_lock(     &suspendLock );
-   _VMSMasterEnv->setupComplete = 1;
-   pthread_mutex_unlock(   &suspendLock );
-   pthread_cond_broadcast( &suspendCond );
-   
-   
-      //wait for all to complete
-   for( coreIdx=0; coreIdx < NUM_CORES; coreIdx++ )
-    {
-      pthread_join( coreCtlrThdHandles[coreIdx], NULL );
-    }
-   
-      //NOTE: do not clean up VMS env here -- semantic layer has to have
-      // a chance to clean up its environment first, then do a call to free
-      // the Master env and rest of VMS locations
-#endif
- }
-
-
-SlaveVP* VMS_SS__create_shutdown_slave(){
-    SlaveVP* shutdownVP;
-    
-    shutdownVP = VMS_int__create_slaveVP( &endOSThreadFn, NULL );
-    shutdownVP->typeOfVP = Shutdown;
-    
-    return shutdownVP;
-}
-
-//TODO: look at architecting cleanest separation between request handler
-// and animation master, for dissipate, create, shutdown, and other non-semantic
-// requests.  Issue is chain: one removes requests from AppSlv, one dispatches
-// on type of request, and one handles each type..  but some types require
-// action from both request handler and animation master -- maybe just give the
-// request handler calls like:  VMS__handle_X_request_type
-
-
-/*This is called by the semantic layer's request handler when it decides it's
- * time to shut down the VMS system.  Calling this causes the core controller OS
- * threads to exit, which unblocks the entry-point function that started up
- * VMS, and allows it to grab the result and return to the original single-
- * threaded application.
- * 
- *The _VMSMasterEnv is needed by this shut down function, so the create-seed-
- * and-wait function has to free a bunch of stuff after it detects the
- * threads have all died: the masterEnv, the thread-related locations,
- * masterVP, any AppSlvs that might still be allocated and sitting in the
- * semantic environment, or have been orphaned in the _VMSWorkQ.
- * 
- *NOTE: the semantic plug-in is expected to use VMS__malloc to get all the
- * locations it needs, and give ownership to masterVP.  Then, they will be
- * automatically freed.
- *
- *In here, create one core-loop shut-down processor for each core controller and put
- * them all directly into the readyToAnimateQ.
- *Note, this function can ONLY be called after the semantic environment no
- * longer cares if AppSlvs get animated after the point this is called.  In
- * other words, this can be used as an abort, or else it should only be
- * called when all AppSlvs have finished dissipate requests -- only at that
- * point is it sure that all results have completed.
- */
-void
-VMS_SS__shutdown()
- { int32       coreIdx;
-   SlaveVP    *shutDownSlv;
-   AnimSlot **animSlots;
-      //create the shutdown processors, one for each core controller -- put them
-      // directly into the Q -- each core will die when gets one
-   for( coreIdx = 0; coreIdx < NUM_CORES; coreIdx++ )
-    {    //Note, this is running in the master
-      shutDownSlv = VMS_SS__create_shutdown_slave();
-         //last slave has dissipated, so no more in slots, so write
-         // shut down slave into first animation slot.
-      animSlots = _VMSMasterEnv->allAnimSlots[ coreIdx ];
-      animSlots[0]->slaveAssignedToSlot = shutDownSlv;
-      animSlots[0]->needsSlaveAssigned = FALSE;
-      shutDownSlv->coreAnimatedBy = coreIdx;
-      shutDownSlv->animSlotAssignedTo = animSlots[ 0 ];
-    }
- }
-
-
-/*Am trying to be cute, avoiding IF statement in coreCtlr that checks for
- * a special shutdown slaveVP.  Ended up with extra-complex shutdown sequence.
- *This function has the sole purpose of setting the stack and framePtr
- * to the coreCtlr's stack and framePtr.. it does that then jumps to the
- * core ctlr's shutdown point -- might be able to just call pthread_exit
- * from here, but am going back to the pthread's stack and setting everything
- * up just as if it never jumped out, before calling pthread_exit.
- *The end-point of core ctlr will free the stack and so forth of the
- * processor that animates this function, (this fn is transferring the
- * animator of the AppSlv that is in turn animating this function over
- * to core controller function -- note that this slices out a level of virtual
- * processors).
- */
-void
-endOSThreadFn( void *initData, SlaveVP *animatingSlv )
- { 
-   #ifdef DEBUG__TURN_ON_SEQUENTIAL_MODE
-    asmTerminateCoreCtlrSeq(animatingSlv);
-   #else
-    asmTerminateCoreCtlr(animatingSlv);
-   #endif
- }
-
-
-/*This is called from the startup & shutdown
- */
-void
-VMS_SS__cleanup_at_end_of_shutdown()
- { 
-      //Before getting rid of everything, print out any measurements made
-   if( _VMSMasterEnv->measHistsInfo != NULL )
-    { forAllInDynArrayDo( _VMSMasterEnv->measHistsInfo, (DynArrayFnPtr)&printHist );
-      forAllInDynArrayDo( _VMSMasterEnv->measHistsInfo, (DynArrayFnPtr)&saveHistToFile);
-      forAllInDynArrayDo( _VMSMasterEnv->measHistsInfo, (DynArrayFnPtr)&freeHist );
-    }
-   
-   MEAS__Print_Hists_for_Susp_Meas;
-   MEAS__Print_Hists_for_Master_Meas;
-   MEAS__Print_Hists_for_Master_Lock_Meas;
-   MEAS__Print_Hists_for_Malloc_Meas;
-   MEAS__Print_Hists_for_Plugin_Meas;
-   
-
-      //All the environment data has been allocated with VMS__malloc, so just
-      // free its internal big-chunk and all inside it disappear.
-/*
-   readyToAnimateQs = _VMSMasterEnv->readyToAnimateQs;
-   masterVPs        = _VMSMasterEnv->masterVPs;
-   allAnimSlots    = _VMSMasterEnv->allAnimSlots;
-   
-   for( coreIdx = 0; coreIdx < NUM_CORES; coreIdx++ )
-    {
-      freeVMSQ( readyToAnimateQs[ coreIdx ] );
-         //master Slvs were created external to VMS, so use external free
-      VMS_int__dissipate_slaveVP( masterVPs[ coreIdx ] );
-      
-      freeAnimSlots( allAnimSlots[ coreIdx ] );
-    }
-   
-   VMS_int__free( _VMSMasterEnv->readyToAnimateQs );
-   VMS_int__free( _VMSMasterEnv->masterVPs );
-   VMS_int__free( _VMSMasterEnv->allAnimSlots );
-   
-   //============================= MEASUREMENT STUFF ========================
-   #ifdef PROBES__TURN_ON_STATS_PROBES
-   freeDynArrayDeep( _VMSMasterEnv->dynIntervalProbesInfo, &VMS_WL__free_probe);
-   #endif
-   //========================================================================
-*/
-      //These are the only two that use system free 
-   VMS_ext__free_free_list( _VMSMasterEnv->freeLists );
-   free( (void *)_VMSMasterEnv );
- }
-
-
-//================================
-
-
diff -r 0dc0b8653902 -r 999f2966a3e5 VMS_primitive_data_types.h
--- a/VMS_primitive_data_types.h	Mon Sep 03 03:34:54 2012 -0700
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,42 +0,0 @@
-/*
- *  Copyright 2009 OpenSourceStewardshipFoundation.org
- *  Licensed under GNU General Public License version 2
- *  
- * Author: seanhalle@yahoo.com
- *  
-
- */
-
-#ifndef _PRIMITIVE_DATA_TYPES_H
-#define _PRIMITIVE_DATA_TYPES_H
-
-
-/*For portability, need primitive data types that have a well defined
- * size, and well-defined layout into bytes
- *To do this, provide standard aliases for all primitive data types
- *These aliases must be used in all functions instead of the ANSI types
- *
- *When VMS is used together with BLIS, these definitions will be replaced
- * inside each specialization module according to the compiler used in
- * that module and the hardware being specialized to.
- */
-typedef char               bool8;
-typedef char               int8;
-typedef char               uint8;
-typedef short              int16;
-typedef unsigned short     uint16;
-typedef int                int32;
-typedef unsigned int       uint32;
-typedef unsigned int       bool32;
-typedef long long          int64;
-typedef unsigned long long uint64;
-typedef float              float32;
-typedef double             float64;
-//typedef double double      float128;  //GCC doesn't like this
-#define float128 double double
-
-#define TRUE  1
-#define FALSE 0
-
-#endif	/* _PRIMITIVE_DATA_TYPES_H */
-
diff -r 0dc0b8653902 -r 999f2966a3e5 __README__Code_Overview.txt
--- a/__README__Code_Overview.txt	Mon Sep 03 03:34:54 2012 -0700
+++ b/__README__Code_Overview.txt	Wed Sep 19 23:12:44 2012 -0700
@@ -1,21 +1,21 @@
 
-This file is intended to help those new to VMS to find their way around the code.
+This file is intended to help those new to PR to find their way around the code.
 
 Some observations:
--] VMS.h is the top header file, and is the root of a tree of #includes that pulls in all the other headers
+-] PR.h is the top header file, and is the root of a tree of #includes that pulls in all the other headers
 
 -] Defines directory contains all the header files that hold #define statements
 
--] VMS has several kinds of function, grouped according to what kind of code should call them: VMS_App_.. for applications to call, VMS_WL_.. for wrapper-library code to call, VMS_PI_.. for plugin code to call, and VMS_int_.. for VMS to use internally.  Sometimes VMS_int_ functions are called from the wrapper library or plugin, but this should only be done by programmers who have gained an in-depth knowledge of VMS's implementation and understand that VMS_int_ functions are not protected for concurrent use..
+-] PR has several kinds of functions, grouped according to what kind of code should call them: PR_App_.. for applications to call, PR_WL_.. for wrapper-library code to call, PR_PI_.. for plugin code to call, and PR_int_.. for PR to use internally.  Sometimes PR_int_ functions are called from the wrapper library or plugin, but this should only be done by programmers who have gained an in-depth knowledge of PR's implementation and understand that PR_int_ functions are not protected for concurrent use.
 
--] VMS has its own version of malloc, unfortunately, which is due to the system malloc breaking when the stack-pointer register is manipulated, which VMS must do.  The VMS form of malloc must be used in code that runs inside the VMS system, especially all application code that uses a VMS-based language.  However, a complication is that the malloc implementation is not protected with a lock.  However, mallocs performed in the main thread, outside the VMS-language program, cannot use VMS malloc..  this presents some issues crossing the boundary..
+-] PR unfortunately has its own version of malloc, because the system malloc breaks when the stack-pointer register is manipulated, which PR must do.  The PR form of malloc must be used in code that runs inside the PR system, especially all application code that uses a PR-based language.  One complication is that the malloc implementation is not protected with a lock.  Another is that mallocs performed in the main thread, outside the PR-language program, cannot use PR malloc, which presents some issues when crossing the boundary.
 
--] Things in the code are turned on and off by using #define in combination with #ifdef.  All defines for doing this are found in Defines/VMS_defs__turn_on_and_off.h.  The rest of the files in Defines directory contain macro definitions, hardware constants, and any other #define statements.
+-] Things in the code are turned on and off by using #define in combination with #ifdef.  All defines for doing this are found in Defines/PR_defs__turn_on_and_off.h.  The rest of the files in the Defines directory contain macro definitions, hardware constants, and any other #define statements.
 
--] VMS has many macros used in the code..  such as for measurements and debug..  all measurement, debug, and statistics gathering statements can be turned on or off by commenting-out or uncommenting the appropriate #define.  
+-] PR uses many macros in the code, such as for measurements and debugging.  All measurement, debug, and statistics-gathering statements can be turned on or off by commenting out or uncommenting the appropriate #define.
 
--] The best way to learn VMS is to uncomment  DEBUG__TURN_ON_SEQUENTIAL_MODE, which allows using a normal debugger while sequentially executing through both application code and VMS internals.  Setting breakpoints at various spots in the code is a good way to see the VMS system in operation.
+-] The best way to learn PR is to uncomment DEBUG__TURN_ON_SEQUENTIAL_MODE, which allows using a normal debugger while stepping sequentially through both application code and PR internals.  Setting breakpoints at various spots in the code is a good way to see the PR system in operation.
 
--] VMS has several "VMS primitives" implemented with assembly code.  The net effect of these assembly functions is to perform the switching between application code and the VMS system.
+-] PR has several "PR primitives" implemented with assembly code.  The net effect of these assembly functions is to perform the switching between application code and the PR system.
 
--] The heart of this multi-core version of VMS is the AnimationMaster and CoreController.  Those files have large comments explaining the nature of VMS and this implementation.  Those comments are the best place to start reading, to get an understanding of the code before tracing through it.
\ No newline at end of file
+-] The heart of this multi-core version of PR is the AnimationMaster and CoreController.  Those files have large comments explaining the nature of PR and this implementation.  Those comments are the best place to start reading, to get an understanding of the code before tracing through it.
\ No newline at end of file