changeset 4:b7a974ccc6f4

Added old design notes -- probably very out of sync with code
author Me
date Wed, 28 Jul 2010 13:16:31 -0700
parents 9f2e23d38ff2
children 833c981134dd
files DESIGN_NOTES.txt
diffstat 1 files changed, 212 insertions(+), 0 deletions(-)
     1.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     1.2 +++ b/DESIGN_NOTES.txt	Wed Jul 28 13:16:31 2010 -0700
     1.3 @@ -0,0 +1,212 @@
     1.4 +
     1.5 +From e-mail to Albert, on design of app-virt-procr to core-loop animation
     1.6 +switch and back.
     1.7 +
     1.8 +====================
     1.9 +General warnings about this code:
    1.10 +It compiles only with GCC 4.x  (it uses label addresses and computed goto)
    1.11 +It contains assembly for 32-bit x86
    1.12 +
    1.13 +
    1.14 +====================
    1.15 +AVProcr data-struc has: stack-ptr, jmp-ptr, data-ptr, slotNum, coreloop-ptr
    1.16 + and semantic-custom-ptr
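
A minimal sketch of that structure in C. The field names jmpPtr, data, stackPtr, coreLoopDonePt and slave are taken from the code fragments in these notes; every type, the Slave stand-in, the semanticData name and the AVProcr_clear helper are assumptions made for illustration, and the real definition may well differ.

```c
#include <stddef.h>

typedef struct AVProcr AVProcr;

/* Prototype every application function has to fit (typedef from these notes). */
typedef void (*AVProcrFnPtr)( void *, AVProcr * );

/* Hypothetical stand-in for the scheduling slot ("slave"). */
typedef struct Slave
 { int      doneFlag;
   AVProcr *procr;
 } Slave;

struct AVProcr
 { void  *stackPtr;        /* stack to switch to before jumping            */
   void  *jmpPtr;          /* function entry at birth, resume label later  */
   void  *data;            /* initial data handed to the function          */
   int    slotNum;
   void  *coreLoopDonePt;  /* address of the label inside the core loop    */
   Slave *slave;           /* filled in by the scheduler plug-in           */
   void  *semanticData;    /* assumed name for the semantic-custom-ptr     */
 };

/* Put a structure into a known empty state (helper name is hypothetical). */
void AVProcr_clear( AVProcr *pr )
 { pr->stackPtr       = NULL;
   pr->jmpPtr         = NULL;
   pr->data           = NULL;
   pr->slotNum        = 0;
   pr->coreLoopDonePt = NULL;
   pr->slave          = NULL;
   pr->semanticData   = NULL;
 }
```

jmpPtr is declared void * here because the notes store both a function pointer and a label address in it; holding a function pointer in a void * is something GCC accepts but ISO C does not guarantee.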
    1.17 +
    1.18 +The VMS Creator: takes ptr to function and ptr to initial data
    1.19 +-- creates a new AVProcr struc
    1.20 +-- sets the jmp-ptr field to the ptr-to-function passed in
    1.21 +-- sets the data-ptr to ptr to initial data passed in
    1.22 +-- if this is for a suspendable virt  processor, then create a stack and set
    1.23 +   the stack-ptr
    1.24 +
    1.25 +AVProcr *VMS__create_procr( AVProcrFnPtr fnPtr, void *initialData )
    1.26 +{
    1.27 +AVProcr  *newPr = malloc( sizeof(AVProcr) );
    1.28 +newPr->jmpPtr = fnPtr;
    1.29 +newPr->coreLoopDonePt = &CoreLoopDonePt; //label is in coreLoop
    1.30 +newPr->data = initialData;
    1.31 +newPr->stackPtr = createNewStack();
    1.32 +return newPr;
    1.33 +}
    1.34 +
    1.35 +The semantic layer can then add its own state in the custom-ptr field
    1.36 +
    1.37 +The Scheduler plug-in:
    1.38 +-- Sets slave-ptr in AVProcr, and points the slave to AVProcr
    1.39 +-- if non-suspendable, sets the AVProcr's stack-ptr to the slave's stack-ptr
    1.40 +
    1.41 +MasterLoop:
    1.42 +-- puts AVProcr structures onto the workQ
    1.43 +
    1.44 +CoreLoop:
    1.45 +-- gets stack-ptr out of AVProcr and sets the core's stack-ptr to that
    1.46 +-- gets data-ptr out of AVProcr and puts it into reg GCC uses for that param
    1.47 +-- puts AVProcr's addr into reg GCC uses for the AVProcr-pointer param
    1.48 +-- jumps to the addr in AVProcr's jmp-ptr field
    1.49 +CoreLoop()
    1.50 +{ while( FOREVER )
    1.51 + { nextPr = readQ( workQ );  //workQ is static (global) var declared volatile
    1.52 +   <dataPtr-param-register>       = nextPr->data;
    1.53 +   <AVProcrPtr-param-register> = nextPr;
    1.54 +   <stack-pointer register>          = nextPr->stackPtr;
    1.55 +   jmp nextPr->jmpPtr;
    1.56 +CoreLoopDonePt:   //label's addr put into AVProcr when create new one
    1.57 + }
    1.58 +}
    1.59 +(Note, for suspendable processors coming back from suspension, there is no
    1.60 + need to fill the parameter registers -- they will be discarded)
    1.61 +
    1.62 +Suspend an application-level virtual processor:
    1.63 +VMS__AVPSuspend( AVProcr *pr )
    1.64 +{
    1.65 +pr->jmpPtr = &&ResumePt;  //label defined a few lines below
    1.66 +pr->slave->doneFlag = TRUE;
    1.67 +pr->stackPtr = <current SP reg value>;
    1.68 +jmp pr->coreLoopDonePt;
    1.69 +ResumePt: return;
    1.70 +}
    1.71 +
    1.72 +This works because the core loop will have switched back to this stack
    1.73 + before jumping to ResumePt.  Also, the core loop never modifies the
    1.74 + stack pointer itself; it simply switches to whatever stack pointer is in
    1.75 + the next AVProcr it gets off the workQ.
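
The same suspend-and-resume shape can be tried out portably with the POSIX ucontext calls. To be clear, this is a swapped-in technique and not the mechanism in these notes: swapcontext() saves and restores the stack pointer and registers that the notes handle by hand with jumps and x86 assembly. All Demo/demo names are hypothetical, and the sketch assumes a libc that provides <ucontext.h> (glibc does).

```c
#include <stdlib.h>
#include <ucontext.h>

#define DEMO_STACK_BYTES (64 * 1024)

typedef struct
 { ucontext_t  ctx;      /* saved stack pointer + registers of the processor */
   ucontext_t *coreCtx;  /* where to go back to when suspending              */
   int         done;
   int         steps;
 } DemoProcr;

static DemoProcr *demo_current;  /* set by the loop before each switch */

/* Save where we are, then continue in the core loop. */
static void demo_suspend( DemoProcr *pr )
 { swapcontext( &pr->ctx, pr->coreCtx );
 }

/* Body of the virtual processor: suspends twice, then finishes. */
static void demo_body( void )
 { DemoProcr *pr = demo_current;
   pr->steps = 1;
   demo_suspend( pr );
   pr->steps = 2;
   demo_suspend( pr );
   pr->steps = 3;
   pr->done  = 1;        /* returning resumes uc_link, i.e. the core loop */
 }

/* Stand-in for the core loop; returns how often control came back to it. */
int demo_run( void )
 { ucontext_t coreCtx;
   DemoProcr  pr;
   int        resumes = 0;
   void      *stack   = malloc( DEMO_STACK_BYTES );

   if( stack == NULL ) return -1;
   pr.done    = 0;
   pr.steps   = 0;
   pr.coreCtx = &coreCtx;
   getcontext( &pr.ctx );
   pr.ctx.uc_stack.ss_sp   = stack;   /* each suspendable gets its own stack */
   pr.ctx.uc_stack.ss_size = DEMO_STACK_BYTES;
   pr.ctx.uc_link          = &coreCtx;
   makecontext( &pr.ctx, demo_body, 0 );

   while( !pr.done )
    { demo_current = &pr;
      swapcontext( &coreCtx, &pr.ctx );  /* switch stacks and jump */
      resumes++;
    }
   free( stack );
   return resumes;
 }
```

The loop regains control three times: after each of the two suspends and once more when the body returns.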
    1.76 +
    1.77 +
    1.78 +
    1.79 +=============================================================================
    1.80 +As it is now, there's only one major unknown about GCC (first thing below
    1.81 +  the line),  and there are a few restrictions, the most intrusive being
    1.82 +  that the functions the application gives to the semantic layer have a
    1.83 +  pre-defined prototype -- return nothing, take a pointer to initial data
    1.84 +  and a pointer to an AVProcr struc, which they're not allowed to modify
    1.85 +  -- only pass it to semantic-lib calls.
    1.86 +
    1.87 +So, here are the assumptions, restrictions, and so forth:
    1.88 +===========================
    1.89 +Major assumption:  that GCC will do the following the same way every time:
    1.90 +  say the application defines a function that fits this typedef:
    1.91 +typedef void (*AVProcrFnPtr)  ( void *, AVProcr * );
    1.92 +
    1.93 +and let's say somewhere in the code they do this:
    1.94 +AVProcrFnPtr   fnPtr = &someFunc;
    1.95 +
    1.96 +then they do this:
    1.97 +(*fnPtr)( dataPtr, animatingVirtProcrPtr );
    1.98 +
    1.99 +Can the registers that GCC uses to pass the two pointers be predicted?
   1.100 + Will they always be the same registers, in every program that has the
   1.101 + same typedef?
   1.102 +If that typedef fixes, guaranteed, the registers (on x86) that GCC will use
   1.103 + to send the two pointers, then the rest of this solution works.
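
For reference, here is the call in question as a small compilable sketch. With GCC's default convention on 32-bit x86 (cdecl), both pointers are passed on the stack rather than in registers, unless an attribute such as regparm or fastcall changes that. The DemoData type, the stand-in AVProcr body and the function names are assumptions for illustration only.

```c
#include <stddef.h>

typedef struct AVProcr AVProcr;
typedef void (*AVProcrFnPtr)( void *, AVProcr * );

/* Stand-in body so the sketch can create an instance. */
struct AVProcr { int slotNum; };

/* Hypothetical initial-data package. */
typedef struct
 { int      value;
   AVProcr *animator;   /* records which AVProcr the function was given */
 } DemoData;

/* Application function: returns nothing, takes data ptr and AVProcr ptr. */
static void someFunc( void *initialData, AVProcr *self )
 { DemoData *d = (DemoData *)initialData;
   d->value   += 1;
   d->animator = self;  /* only remembered, never modified */
 }

/* Calls someFunc through the typedef'd pointer, as in the notes. */
int demo_callThroughPointer( void )
 { DemoData     data  = { 41, NULL };
   AVProcr      pr    = { 0 };
   AVProcrFnPtr fnPtr = &someFunc;

   (*fnPtr)( &data, &pr );
   return (data.animator == &pr) ? data.value : -1;
 }
```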
   1.104 +
   1.105 +Change in model: Instead of a virtual processor whose execution trace is
   1.106 + divided into work-units, the pattern is now that a virtual processor
   1.107 + gets suspended.  Which means, no more "work unit" data structure
   1.108 + -- instead, it's now an "Application Virtual Processor" structure
   1.109 + -- AVProcr -- which is given directly to the application function!
   1.110 +
   1.111 +   -- You were right, don't need slaves to be virtual processors, only need
   1.112 +      "scheduling buckets" -- just a way to keep track of things..
   1.113 +
   1.114 +Restrictions:
   1.115 +-- the  "virtual entities"  created by the semantic layer must be virtual
   1.116 +   processors, created with a function-to-execute and initial data -- the
   1.117 +   function is restricted to return nothing and only take a pointer to the
   1.118 +   initial data plus a pointer to an AVProcr structure, which represents
   1.119 +   "self", the virtual processor created.  (This is the interface I showed
   1.120 +   you for "Hello World" semantic layer).
   1.121 +What this means for synchronous dataflow, is that the nodes in the graph
   1.122 +  are virtual processors that in turn spawn a new virtual processor for
   1.123 +  every "firing" of the node.  This should be fine because the function
   1.124 +  that the node itself is created with is a "canned" function that is part
   1.125 +  of the semantic layer -- the function that is spawned is the user-provided
   1.126 +  function.  The restriction only means that the values from the inputs to
   1.127 +  the node are packaged as the "initial data" given to the spawned virtual
   1.128 +  processor -- so the user-function has to cast a void * to the
   1.129 +  semantic-layer-defined structure by which it gets the inputs to the node.
   1.130 +
   1.131 +-- Second restriction is that the semantic layer has to use VMS supplied
   1.132 +   stuff -- for example, the data structure that represents the
   1.133 +   application-level virtual processor is defined in VMS, and the semantic
   1.134 +   layer has to call a VMS function in order to suspend a virtual processor.
   1.135 +
   1.136 +-- Third restriction is that the application code never does anything with
   1.137 +   the AVProcr structure except pass it to semantic-layer lib calls.
   1.138 +
   1.139 +-- Fourth restriction is that every virtual processor must call a
   1.140 +   "dissipate" function as its last act -- the user-supplied
   1.141 +   virtual-processor function can't just end -- it has to call
   1.142 +   SemLib__dissipate( AVProcr ) before the closing brace.. and after the
   1.143 +   semantic layer is done cleaning up its own data, it has to in turn call
   1.144 +   VMS__dissipate( AVProcr ).
   1.145 +
   1.146 +-- For performance reasons, I think I want to have two different kinds of
   1.147 +   app-virtual processor -- suspendable ones and non-suspendable -- where
   1.148 +   non-suspendable are not allowed to perform any communication with other
   1.149 +   virtual processors, except at birth and death.  Suspendable ones, of
   1.150 +   course, can perform communications, create other processors, and so forth
   1.151 +   -- all of which cause it to suspend.
   1.152 +The performance difference is that I need a separate stack for each
   1.153 +  suspendable, but non-suspendable can re-use a fixed number of stacks
   1.154 +  (one for each slave).
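
Seen from the application side, the restrictions above might look like the sketch below: the user function fits the fixed prototype, casts the void * back to a semantic-layer-defined input package, never touches the AVProcr itself, and calls dissipate as its last act. The AVProcr body, the DemoNodeInputs type and both dissipate bodies are stand-ins assumed for illustration; the real VMS__dissipate presumably hands the core back to the core loop instead of setting a flag.

```c
#include <stdlib.h>

/* Stand-in for the VMS-defined structure; only library code touches it. */
typedef struct AVProcr
 { void *semanticData;
   int   dissipated;      /* demo-only flag so the effect can be observed */
 } AVProcr;

/* Hypothetical input package a dataflow semantic layer might define. */
typedef struct
 { double left;
   double right;
   double result;
 } DemoNodeInputs;

/* Stub for the VMS-level call. */
void VMS__dissipate( AVProcr *pr )
 { pr->dissipated = 1;
 }

/* Semantic layer cleans up its own state first, then calls into VMS. */
void SemLib__dissipate( AVProcr *pr )
 { free( pr->semanticData );
   pr->semanticData = NULL;
   VMS__dissipate( pr );
 }

/* User function for one firing of a node. */
void demo_addFiring( void *initialData, AVProcr *self )
 { DemoNodeInputs *in = (DemoNodeInputs *)initialData;  /* cast void * back */
   in->result = in->left + in->right;
   SemLib__dissipate( self );   /* last act before the closing brace */
 }
```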
   1.155 +
   1.156 +
   1.157 +==================== May 29
   1.158 +
   1.159 +Qs:
   1.160 +--1 how to safely jump between virt processor's trace and coreloop
   1.161 +--2 how to set up __cdecl style stack + frame for just-born virtual processor
   1.162 +--3 how to switch stack-pointers + frame-pointers
   1.163 +
   1.164 +
   1.165 +--1:
   1.166 +Not sure if GCC's computed goto is safe, because we modify the stack pointer
   1.167 +without GCC's knowledge -- although the coreloop segment doesn't use the
   1.168 +stack, so, actually, that should be safe!
   1.169 +
   1.170 +So, GCC has its own special C extensions, one of which gets address of label:
   1.171 +
   1.172 +void *labelAddr;
   1.173 +labelAddr = &&label;
   1.174 +goto *labelAddr;
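
A small self-contained sketch of that extension, staying inside one function (GCC only documents computed goto for labels within the current function). It needs GCC or a compatible compiler such as Clang; the DemoStep type and demo_runSteps name are made up for the example.

```c
typedef enum { DEMO_INC, DEMO_DOUBLE, DEMO_HALT } DemoStep;

/* Walks the step list, jumping to each handler through a stored label
   address, the same way the core loop jumps through a jmp-ptr field. */
int demo_runSteps( const DemoStep *steps, int value )
 {
   /* Label addresses are taken with the unary && operator. */
   static void *const targets[] = { &&doInc, &&doDouble, &&doHalt };

   goto *targets[ *steps ];

 doInc:
   value += 1;
   steps++;
   goto *targets[ *steps ];

 doDouble:
   value *= 2;
   steps++;
   goto *targets[ *steps ];

 doHalt:
   return value;
 }
```

The step list has to end with DEMO_HALT, since nothing else stops the walk.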
   1.175 +
   1.176 +--2
   1.177 +In CoreLoop, will check whether VirtProc just born, or was suspended.
   1.178 +If just born, do bit of code that sets up the virtual processor's stack
   1.179 +and frame according to the __cdecl convention for the standard virt proc
   1.180 +fn typedef -- save the pointer to data and pointer to virt proc struc into
   1.181 +correct places in the frame
   1.182 +   __cdecl says, according to:
   1.183 +http://unixwiz.net/techtips/win32-callconv-asm.html
   1.184 +To do this:
   1.185 +push the parameters onto the stack, right most first, working backwards to
   1.186 + the left.
   1.187 +Then perform call instr, which pushes return addr onto stack.
   1.188 +Then callee first pushes the frame pointer, %EBP followed by placing the
   1.189 +then-current value of stack pointer into %EBP
   1.190 +push ebp
   1.191 +mov  ebp, esp    // ebp « esp
   1.192 +
   1.193 +Once %ebp has been changed, it can now refer directly to the function's
   1.194 + arguments as 8(%ebp), 12(%ebp). Note that 0(%ebp) is the old base pointer
   1.195 + and 4(%ebp) is the old instruction pointer.
   1.196 +
   1.197 +Then callee pushes regs it will use, then subtracts the size of its local
   1.198 + vars from the stack pointer (the stack grows downward).
   1.199 +
   1.200 +Stack in callee looks like this:
   1.201 +16(%ebp)	 - third function parameter
   1.202 +12(%ebp)	 - second function parameter
   1.203 +8(%ebp)	 - first function parameter
   1.204 +4(%ebp)	 - old %EIP (the function's "return address")
   1.205 +----------^^ State seen at first instr of callee ^^-----------
   1.206 +0(%ebp)	- old %EBP (previous function's base pointer)
   1.207 +-4(%ebp)	 - save of EAX, the only reg used in function
   1.208 +-8(%ebp)	 - first local variable
   1.209 +-12(%ebp)	 - second local variable
   1.210 +-16(%ebp)	 - third local variable
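
For a just-born virtual processor, the part of this layout above the dashed line has to be written by hand before the first jump. A minimal sketch, assuming the stack is treated as an array of machine words and that the return-address slot gets whatever address the function should "return" to; the demo_primeStack name and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes the words a cdecl caller would have pushed and returns the
   stack pointer value to load before jumping to the function. */
uintptr_t *demo_primeStack( uintptr_t *stackBase, size_t numWords,
                            uintptr_t returnAddr,
                            void *initialData, void *procr )
 { uintptr_t *sp = stackBase + numWords;  /* one past the top; grows down */

   *--sp = (uintptr_t)procr;        /* second param, pushed first        */
   *--sp = (uintptr_t)initialData;  /* first param                       */
   *--sp = returnAddr;              /* what the call instr would push    */
   return sp;
 }
```

With 4-byte words, sp[0], sp[1] and sp[2] are 0(%esp), 4(%esp) and 8(%esp) at the first instruction of the callee; after its push ebp / mov ebp, esp the two parameters sit at 8(%ebp) and 12(%ebp), matching the table.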
   1.211 +
   1.212 +
   1.213 +--3
   1.214 +It might be just as simple as two mov instrs, one for %ESP, one for %EBP..
   1.215 + the stack and frame pointer regs