changeset 25:556a9b80cdb5

Perf Tuning -- Some slight fix ups on related work section
author Some Random Person <seanhalle@yahoo.com>
date Mon, 16 Apr 2012 09:17:17 -0700
parents 72ba77515c93
children b793b4934cf8
files 0__Papers/Holistic_Model/Perf_Tune/latex/Holistic_Perf_Tuning.tex
diffstat 1 files changed, 28 insertions(+), 4 deletions(-)
line diff
     1.1 --- a/0__Papers/Holistic_Model/Perf_Tune/latex/Holistic_Perf_Tuning.tex	Mon Apr 16 09:16:11 2012 -0700
     1.2 +++ b/0__Papers/Holistic_Model/Perf_Tune/latex/Holistic_Perf_Tuning.tex	Mon Apr 16 09:17:17 2012 -0700
     1.3 @@ -82,14 +82,38 @@
     1.4  
     1.5  
     1.6  \section{Background and Related Work}
     1.7 -Performance tuning, as does functional debugging, has steps that are iterated until the person tuning is satisfied. First take measurements and display them, in order to discover discrepancies from desired behavior. Next, connect the details of the discrepancies with structure information to form a hypothesis for the cause of the discrepancy. The cause should suggest a plan to fix the problem. Then implement the plan, re-execute, gather new measurements, and repeat until satisfied.
     1.8 +Performance tuning has steps that are iterated until the person tuning is satisfied. First take measurements and display them, in order to discover discrepancies from desired behavior. Next, connect the details of the discrepancies with structure information to form a hypothesis for the cause of the discrepancy. The cause should suggest a plan to fix the problem. Then implement the plan, re-execute, gather new measurements, and repeat until satisfied.
     1.9 +
    1.10 +As discussed, for parallel performance, the information presented must include the tasks (work-units), the constraints on scheduling them, and the scheduling decisions actually made.
    1.11 +
    1.12 +Most of the older, more established tools come from the threads world. They conceive of the application as a processor that performs actions, but include neither the concept of application-defined tasks nor constraints on them. For example, 
    1.13 +
    1.14 +
    1.15 +Their structure information is either the call graph or the line of code. This is too limited for parallel performance, which needs tasks and scheduling. More recent languages are moving towards task-based models, which makes more of the needed structure information available. But we've not seen a complete or coherent presentation.
    1.16 +
    1.17 +===========
    1.18 +
    1.19 +We've had basic perf tuning, then Tau, now 
    1.20 +
    1.21 +A survey of the most highly cited classic papers shows the commonality..
    1.22 +
    1.23 +===========
    1.24 +
    1.25 +One of the most cited examples of the classic performance tuning systems is Tau[]. (Threads, thousands of data sources, no coherent inter-relation of what's happening.) It integrates many data sources, with rich displays. However, its model was cores, memories, and contexts, with actions taken on or by each. It had no well-defined concept of scheduling, runtime, or units of work (tasks). Hence, it had no view that integrated the parallelism-specific information about tasks, constraints on them, and scheduling choices.
    1.26 +
    1.27 +Another highly cited classic performance tuning system is Paradyn[]. It is meant for applications that run for several days on multi-thousand node clusters. Its model of computation is based on events, both the timing of events and counts of events. It has a system for user-supplied instrumentation to collect event information, and it has a hypothesis mechanism that spares the user from having to write custom code to test their hypotheses. However, the hypotheses are in terms of the timing and counts of events, not the information relevant to parallel computation: the units of scheduled work and the scheduling decisions made on them. (An attempt at codifying this hypothesis-and-test approach to perf tuning.)
    1.28 +
    1.29 +The second most cited is an overview paper from 1991.
    1.30 +
    1.31 +Paragraph instruments the MPI library. So it's an event-based model -- for each core it only tells whether the core is busy, in communication-specific overhead, or idle.
    1.32 +
    1.33 +So Tau takes a thread view, Paragraph an MPI view, and Paradyn an event view.
    1.34  
    1.35  ?
    1.36  
    1.37 -Talk about other tools:
    1.38 +?show measurements but no structure. Recently 
    1.39  
    1.40 -most of older more established, come from threads world, conceive the application as a processor that does things and don't know what things are -- Tau had model but was cores and memories and contexts -- not scheduling or runtime or units of work (no tasks) -- no tasks nor constraints on tasks.
    1.41 -
    1.42 + -- 
    1.43  Seeing need task based languages now -- people who dev lang also dev tools to go with it. Direction is clearly going towards task-based, but not there yet
    1.44  
    1.45  MPI is also machine-based abstraction, that gives communication information, but doesn't have concept of constraints . Its sort of in-between..