# HG changeset patch
# User Some Random Person
# Date 1334158805 25200
# Node ID d885f1eb9ad534ecedda173e1e0b4f7155d7e844
# Parent d72bb1ea14273edbd88d09320160f835eb702253
Added "outline" to perf tuning paper

diff -r d72bb1ea1427 -r d885f1eb9ad5 0__Papers/Holistic_Model/Perf_Tune/latex/Holistic_Perf_Tuning.tex
--- a/0__Papers/Holistic_Model/Perf_Tune/latex/Holistic_Perf_Tuning.tex Wed Apr 11 07:51:08 2012 -0700
+++ b/0__Papers/Holistic_Model/Perf_Tune/latex/Holistic_Perf_Tuning.tex Wed Apr 11 08:40:05 2012 -0700
@@ -64,13 +64,69 @@
 
 \section{Introduction and Motivation}
 
-Performance tuning has two phases that each involve several aspects of: code, runtime implementation, scheduling choices, and hardware. In order to be effective, a tool used during performance tuning must be part
+Performance tuning has two phases that each involve several aspects, including code, runtime implementation, scheduling choices, and hardware. In order to be effective, a tool used during performance tuning must be part
 of a complete model of computation that ties all aspects of both phases of tuning together. Current tools fall short, both because they lack an encompassing model of computation, and because the tools are isolated from each other. Without integration, the user gets an incomplete picture of the computation and must resort to guesses either of where the problem lies or of what to do to fix it.
 
-We introduce in this paper a model of computation that ties all aspects of performance together, along with instrumentation and visualization that is guided by the model and links all relevant performance
-tuning information together. The model and visualization tools are illustrated with a story line that shows how they are used to performance tune the standard matrix-multiply application on two multi-core systems.
+In this paper, we introduce a model of computation that ties all aspects of performance together, along with instrumentation and visualization that are guided by the model and that link all relevant performance
+tuning information together. The model and visualization tools are illustrated with a story line, which shows how they are used to performance-tune the standard matrix-multiply application on two multi-core systems.
 
-Although we use standard visualization techniques [cite], our approach differs from previous work in both theoretical and practical aspects. The theory we use is The Holistic Model of Parallel Computation, which ties together parallelism construct semantics with scheduling choices made during a run, and specific measurements made on the cores. When put into practice, new kinds of measurements are taken, which complete the picture presented to the user, and each measurement is tied to a specific segment of code. The resulting combination not only identifies the source of performance loss, but ties it back to specific sources and suggests precise fixes, all of which is illustrated in our story line.
+Although we use standard visualization techniques [cite], our approach differs from previous work in both theoretical and practical aspects. The theory we use is the Holistic Model of Parallel Computation, which ties together the semantics of parallelism constructs, the scheduling choices made during a run, and the specific measurements made on the cores. When the model is put into practice, new kinds of measurements are taken that complete the picture presented to the user, and each measurement is tied to a specific segment of code. The resulting combination not only identifies each source of performance loss, but also ties it back to specific causes and suggests precise fixes, all of which is illustrated in our story line.
+
+
+
+
+\section{Setup}
+
+Preview of what we will see in the setup.
+
+
+\subsection{A bit about the development environment and the machines}
+
+\subsection{SSR Language}
+Intro to SSR and its features.
+
+\subsection{A bit about the code}
+Pics illustrate the matrix-multiply divider code.
+
+Pics illustrate the processors created and the communication between them.
+
+\subsection{Elements of the Holistic Model}
+Elements of the Holistic Model, without explanation: UCC and consequence graph elements. How the consequence graph ties back to features in the UCC and to specific segments of code.
+
+
+
+
+\subsection{Summary of instrumenting the language and VMS}
+Make the point that the application itself is not instrumented -- everything is inside the language runtime (part in the plugin, part in VMS -- VMS is just a helper that simplifies runtime creation).
+
+Pic from the VMS paper, with arrows pointing to the places where instrumentation is injected.
+
+
+\section{The Visualizations}
+
+
+
+Show example UCC and consequence graph pictures.
+
+Talk about the features of the two graphs, point out how those features indicate performance loss, and link the picture elements that indicate performance loss to the sources of that loss. These could be parameter choices, such as how many pieces to make; input choices, such as the size of the matrix; code choices, such as how to perform the division; or scheduling choices, such as how to assign work-units to cores.
+
+State that the reader will see all of these in practice during the story line of using the visualizations.
+
+\section{Illustrative Story of Performance Tuning Matrix Multiply on Two Different Machines}
+
+Overview of the steps in the story, and what each step will show.
+
+\subsection{Performance Tuning on 1 socket by 4-core Machine}
+
+Starting point: code just written and run -- show the UCC and consequence graph pics, point out the glaring visual feature that indicates a big performance loss, show how it links to its cause, and so on.
+
+
+\subsection{Performance Tuning on 4 socket by 10 core by 2 context Machine}
+
+Same as for the 4-core machine; this time, point out which choices differ between the 40-core and 4-core machines.
+
+
+
 
 ======================================================
 \section{Random Early Thoughts}