changeset 15:d885f1eb9ad5
Added "outline" to perf tuning paper
| author | Some Random Person <seanhalle@yahoo.com> |
|---|---|
| date | Wed, 11 Apr 2012 08:40:05 -0700 |
| parents | d72bb1ea1427 |
| children | be5673d9658b |
| files | 0__Papers/Holistic_Model/Perf_Tune/latex/Holistic_Perf_Tuning.tex |
| diffstat | 1 files changed, 60 insertions(+), 4 deletions(-) |
--- a/0__Papers/Holistic_Model/Perf_Tune/latex/Holistic_Perf_Tuning.tex Wed Apr 11 07:51:08 2012 -0700
+++ b/0__Papers/Holistic_Model/Perf_Tune/latex/Holistic_Perf_Tuning.tex Wed Apr 11 08:40:05 2012 -0700
@@ -64,13 +64,69 @@

 \section{Introduction and Motivation}

-Performance tuning has two phases that each involve several aspects of: code, runtime implementation, scheduling choices, and hardware. In order to be effective, a tool used during performance tuning must be part
+Performance tuning has two phases that each involve several aspects, including code, runtime implementation, scheduling choices, and hardware. In order to be effective, a tool used during performance tuning must be part
 of a complete model of computation that ties all aspects of both phases of tuning together. Current tools fall short, both because they lack an encompassing model of computation, and because the tools are isolated from each other. Without integration, the user gets an incomplete picture of the computation and must resort to guesses either of where the problem lies or of what to do to fix it.

-We introduce in this paper a model of computation that ties all aspects of performance together, along with instrumentation and visualization that is guided by the model and links all relevant performance
-tuning information together. The model and visualization tools are illustrated with a story line that shows how they are used to performance tune the standard matrix-multiply application on two multi-core systems.
+We introduce in this paper a model of computation that ties all aspects of performance together, along with instrumentation and visualization that is guided by the model and that links all relevant performance
+tuning information together. The model and visualization tools are illustrated with a story line, which shows how they are used to performance tune the standard matrix-multiply application on two multi-core systems.

-Although we use standard visualization techniques [cite], our approach differs from previous work in both theoretical and practical aspects. The theory we use is The Holistic Model of Parallel Computation, which ties together parallelism construct semantics with scheduling choices made during a run, and specific measurements made on the cores. When put into practice, new kinds of measurements are taken, which complete the picture presented to the user, and each measurement is tied to a specific segment of code. The resulting combination not only identifies the source of performance loss, but ties it back to specific sources and suggests precise fixes, all of which is illustrated in our story line.
+Although we use standard visualization techniques [cite], our approach differs from previous work in both theoretical and practical aspects. The theory we use is The Holistic Model of Parallel Computation, which ties together parallelism construct semantics with scheduling choices made during a run, and specific measurements made on the cores. When put into practice, new kinds of measurements are taken, which complete the picture presented to the user, and each measurement is tied to a specific segment of code. The resulting combination not only identifies each source of performance loss, but ties it back to specific causes and suggests precise fixes, all of which is illustrated in our story line.
+
+<maybe some stuff about features and benefits of our approach: no app instrumentation, it's all inside language runtime, very low overhead, integrated with VMS-based functional debugging, and so on>
+
+
+\section{Setup}
+
+Preview of what will see in setup
+
+
+\subsection{a bit about development environment and the machines}
+
+\subsection{SSR Language}
+Intro to SSR and its features.
+
+\subsection{A bit about the code}
+Pics illustrate matrix mult divider code.
+
+Pics illustrate processors created and communication between them.
+
+\subsection{}
+ elements of Holistic Model without explanation: UCC and consequence graph elements. How consequence graph ties back to features in UCC and to specific segments of code.
+
+
+<a line about lang features support this perf tuning model -- can still do with pthreads, but less effective because don't have clean units nor semantics of constraints on units -- SSR provides these, as do languages like CnC, StarSs, and so on>
+
+\subsection{summary of instrumenting language and VMS}
+Make point that no application instrumentation -- everything is inside language runtime (part in plugin, part in VMS -- VMS is just a helper to simplify runtime creation).
+
+Pic from VMS paper, with arrows pointing to places instrumentation injected.
+
+
+\section{The Visualizations}
+
+<note: use language that talks about the visualizations as seen, but avoid using word "tool" in connection with visualization.. no need to draw attention to fact that we don't have a GUI, the contribution is not a tool, but rather a methodology, and the visualizations are just one element of the methodology>
+
+show example UCC and consequence graph pictures.
+
+Talk about features of the two graphs, point out how features indicate performance loss, link picture elements that indicate perf loss to sources of loss -- could be param choices, like how many pieces to make, or input choices like size of matrix, or code choices, like how to perform division, or scheduling choices, like how to assign work-units to cores.
+
+State will see all of these in practice during the story lines of using the visualizations.
+
+\section{Illustrative Story of Performance Tuning Matrix Multiply on Two Different Machines}
+
+Overview of steps in story, and what each step will show
+
+\subsection{Performance Tuning on 1 socket by 4-core Machine}
+
+Starting point: just wrote code and ran it -- show UCC and consqG pics.. point out glaring visual feature that says a big perf loss.. show how links to cause.. and so on..
+
+
+\subsection{Performance Tuning on 4 socket by 10 core by 2 context Machine}
+
+Same as for 4 core machine.. this time, point out what choices are different between 40 core and 4 core.
+
+
+======================================================

 \section{Random Early Thoughts}

