changeset 94:483eb2bbc724

Holistic -- SECOND CHANGE AFTER SUBMITTED VERSION -- modifying the paper for a new submission
author Sean Halle <seanhalle@yahoo.com>
date Mon, 29 Oct 2012 04:55:37 -0700
parents d005f9012126
children df22471e0fdb
files 0__Papers/Holistic_Model/Perf_Tune/latex/Holistic_Perf_Tuning.tex
diffstat 1 files changed, 2 insertions(+), 7 deletions(-)
line diff
     1.1 --- a/0__Papers/Holistic_Model/Perf_Tune/latex/Holistic_Perf_Tuning.tex	Mon Oct 29 04:32:21 2012 -0700
     1.2 +++ b/0__Papers/Holistic_Model/Perf_Tune/latex/Holistic_Perf_Tuning.tex	Mon Oct 29 04:55:37 2012 -0700
     1.3 @@ -71,13 +71,8 @@
     1.4  
     1.5  
     1.6  \begin{abstract}
     1.7 -Performance tuning is an important aspect of parallel programming that involves understanding both communication behavior and scheduling behavior. Many good tools exist for identifying hot-spots in code, and idleness due to waiting on synchronization constructs.  These help in tuning data layout, and to find  constructs that spend a long time blocked, but leave the user guessing as to why they block for so long, with no other work to fill in.   The answer      often involves finding complex chain-reactions of scheduling decisions.  We propose applying a novel model of parallel computation to guide the gathering and display of runtime scheduling decisions, which makes such chain-reactions easy to spot and easy to fix.    
     1.8 -
     1.9 - which in turn requires th
    1.10 - which requires knowledge of the internal structure of the application and the runtime is available to understand how the observed patterns of performance have come to pass.
    1.11 -A trend in parallel programming languages is towards models that capture more structural information about the application, in an effort to increase both performance and ease of programming. We propose using this structural information in  performance tuning tools to make the causes of performance loss more readily apparent.
    1.12 -Our work produces a universal, adaptable set of performance visualizations that integrates this extra application structure, via a new model of parallel computation. The visualizations clearly identify idle cores, and tie the idleness to causal interactions within the runtime and hardware, and from there to the parallelism constructs that constrained the runtime and hardware behavior, thereby eliminating guesswork.
    1.13 -This approach can be used to instrument the runtime of any parallel language or programming model without modifying the application. As a case study, we applied it to the SSR message-passing model, and we walk through a tuning session on a large multi-core machine to illustrate the improvements in identifying performance loss and generating hypotheses for the cause. 
    1.14 +Performance tuning is an important aspect of parallel programming that involves understanding both communication behavior and scheduling behavior. Many good tools exist for identifying hot-spots in code and idleness due to waiting on synchronization constructs. These tools help in tuning data layout and in finding constructs that spend a long time blocked, but they leave the user guessing as to why those constructs block for so long with no other work to fill in. The answer often involves finding complex chain-reactions of scheduling decisions. We propose applying a novel model of parallel computation to guide the gathering and display of runtime scheduling decisions, which makes such chain-reactions easy to spot and easy to fix. Our work produces a set of performance visualizations that are similar to existing ones but include details that tie idleness to causal interactions within the runtime and hardware, and from there to the parallelism constructs responsible.
    1.15 +The computation model can guide the instrumentation of the runtime of any parallel language or programming model, and the needed data can then be collected without modifying the application. To simplify illustration, we instrumented the runtime of SSR, our pi-calculus-inspired programming model, and we walk through a tuning session on a large multi-core machine to illustrate the improvements in generating hypotheses for the causes of idle cores and for how to reduce the idleness.
    1.16  \end{abstract}
    1.17  
    1.18  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%