changeset 17:07c466b1006d
Perf tuning starts with Nina's use-case of current problem
| author | Some Random Person <seanhalle@yahoo.com> |
|---|---|
| date | Thu, 12 Apr 2012 06:23:14 -0700 |
| parents | be5673d9658b |
| children | 53991637cae5 |
| files | 0__Papers/Holistic_Model/Perf_Tune/latex/Holistic_Perf_Tuning.tex |
| diffstat | 1 files changed, 14 insertions(+), 0 deletions(-) [+] |
line diff
1.1 --- a/0__Papers/Holistic_Model/Perf_Tune/latex/Holistic_Perf_Tuning.tex Wed Apr 11 10:20:53 2012 -0700
1.2 +++ b/0__Papers/Holistic_Model/Perf_Tune/latex/Holistic_Perf_Tuning.tex Thu Apr 12 06:23:14 2012 -0700
1.3 @@ -64,6 +64,20 @@
1.4 
1.5 \section{Introduction and Motivation}
1.6 
1.7 +(Where the reader is, so all can say "yeah, I agree")
1.8 +Visualizations have been around, and perf tuning has been around, but they are fragmented, each focusing on one specific view of the application, such as statistics per line of code, per function call, or over the call graph. There are core time-lines showing which function runs at which time, but not why, nor how much of it is what was wanted; or there are cache misses over time, but never a coherent view of how the application code connects to what happens where and when.
1.9 +
1.10 +One can have a timeline view, next to that execution broken down by function with the percentage each takes, and next to that a histogram of cache misses. But there is no coherent view that connects these things; the information linking the views is missing, so the user still has to guess at what the cause might be.
1.11 +
1.12 +To fix this, what is wanted is a mental framework that all the views fit into, so that they connect to each other when one looks at the information.
1.13 +
1.14 +The problem with the other visualizations is that they don't give the shape of the application --
1.15 + the fundamental parallelism-related structure. Scheduling is a fundamental part of parallel execution, so the views must include both the constraints on scheduling and the actual scheduling choices.
1.16 +
1.17 +The user needs more information, within a mental framework that has more theoretical underpinning and several views that connect to each other.
1.18 +
1.19 +A lot of the time it's lists of measurements, or bar graphs, over the whole application or by function, which forces guessing about how they connect. If a tool reports that this line creates a lot of level-2 cache misses, that doesn't tell what the application is doing to cause them; but with the whole UCC alongside, there is context for the measurements -- it puts the line of code into a framework. The line-of-code information is necessary but not useful by itself; it needs to be connected. The unit information is more interesting than the line-of-code information: a line of code has only sequential meaning, and is missing the scheduling connection. What is needed is the unit of work that is causing the problem, not the line of code. A unit provides a parallelism context -- an execution order and an execution location, with implied communication -- while a line of code does not.
1.20 +
1.21 Performance tuning, like functional debugging, has steps that are iterated: use measurements to discover discrepancies from desired behavior, use structure info together with those to form a hypothesis for the cause of the discrepancy, use structure info together with that hypothesis to create a plan to fix it, then implement, re-execute, and gather new measurements, repeating until satisfied.
1.22 
1.23 As an example of what is meant by "structure" info: measurements of the runtime system showed that the overhead of task creation took longer than task execution. The hypothesis was trivial: the cause of lost performance is that the runtime overhead of creation is larger than the work in a scheduled unit. The plan to fix it was to change the number of work-units created, by changing the parameter in the divider code. Implementing this and re-executing showed that this source of performance loss was fixed by the change.
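The task-creation-overhead example at the end of the diff can be illustrated with a minimal sketch. Everything here is hypothetical and not from the paper: the function name, the efficiency threshold, and the halving strategy are illustrative stand-ins for "change the number of work-units created via the parameter in the divider code".

```python
# Hypothetical sketch of the tuning fix described above: task-creation
# overhead exceeds per-unit work, so create fewer, larger work-units.
# All names and numbers are illustrative, not from the paper.

def choose_num_units(num_units, total_work, creation_overhead,
                     target_efficiency=0.9):
    """Halve the number of work-units until the fraction of time spent
    on useful work (versus creation overhead) reaches the target."""
    while num_units > 1:
        work_per_unit = total_work / num_units
        efficiency = work_per_unit / (work_per_unit + creation_overhead)
        if efficiency >= target_efficiency:
            break
        num_units //= 2  # fewer, larger units -> less relative overhead
    return num_units

# With 1024 units of a 1024-unit workload and creation overhead equal to
# one unit of work, halving stops at 64 units (16 units of work each).
print(choose_num_units(1024, 1024.0, 1.0))
```

This mirrors one pass of the measure/hypothesize/plan/re-execute loop: the measurement is the efficiency ratio, the hypothesis is that creation overhead dominates, and the fix is the divider parameter change.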
