changeset 31:3214f6daa63a

perf tuning -- added pdfs for figures, and put into latex
author Some Random Person <seanhalle@yahoo.com>
date Tue, 17 Apr 2012 17:10:00 -0700
parents 220a5cb65311
children d9e42341c7e7
files 0__Papers/Holistic_Model/Perf_Tune/figures/192.pdf 0__Papers/Holistic_Model/Perf_Tune/figures/194.pdf 0__Papers/Holistic_Model/Perf_Tune/figures/196.pdf 0__Papers/Holistic_Model/Perf_Tune/figures/199.pdf 0__Papers/Holistic_Model/Perf_Tune/figures/201.pdf 0__Papers/Holistic_Model/Perf_Tune/figures/204.pdf 0__Papers/Holistic_Model/Perf_Tune/figures/208.pdf 0__Papers/Holistic_Model/Perf_Tune/figures/209.pdf 0__Papers/Holistic_Model/Perf_Tune/figures/210.pdf 0__Papers/Holistic_Model/Perf_Tune/figures/212.pdf 0__Papers/Holistic_Model/Perf_Tune/latex/Holistic_Perf_Tuning.pdf 0__Papers/Holistic_Model/Perf_Tune/latex/Holistic_Perf_Tuning.tex
diffstat 12 files changed, 65 insertions(+), 33 deletions(-)
line diff
     1.1 Binary file 0__Papers/Holistic_Model/Perf_Tune/figures/192.pdf has changed
     2.1 Binary file 0__Papers/Holistic_Model/Perf_Tune/figures/194.pdf has changed
     3.1 Binary file 0__Papers/Holistic_Model/Perf_Tune/figures/196.pdf has changed
     4.1 Binary file 0__Papers/Holistic_Model/Perf_Tune/figures/199.pdf has changed
     5.1 Binary file 0__Papers/Holistic_Model/Perf_Tune/figures/201.pdf has changed
     6.1 Binary file 0__Papers/Holistic_Model/Perf_Tune/figures/204.pdf has changed
     7.1 Binary file 0__Papers/Holistic_Model/Perf_Tune/figures/208.pdf has changed
     8.1 Binary file 0__Papers/Holistic_Model/Perf_Tune/figures/209.pdf has changed
     9.1 Binary file 0__Papers/Holistic_Model/Perf_Tune/figures/210.pdf has changed
    10.1 Binary file 0__Papers/Holistic_Model/Perf_Tune/figures/212.pdf has changed
    11.1 Binary file 0__Papers/Holistic_Model/Perf_Tune/latex/Holistic_Perf_Tuning.pdf has changed
    12.1 --- a/0__Papers/Holistic_Model/Perf_Tune/latex/Holistic_Perf_Tuning.tex	Tue Apr 17 20:12:33 2012 +0200
    12.2 +++ b/0__Papers/Holistic_Model/Perf_Tune/latex/Holistic_Perf_Tuning.tex	Tue Apr 17 17:10:00 2012 -0700
    12.3 @@ -26,7 +26,7 @@
    12.4  \documentclass[conference]{IEEEtran}
    12.5  %
    12.6  %\usepackage{makeidx,geometry,amssymb,graphicx,calc,ifthen}
    12.7 -\usepackage{amssymb,graphicx,calc,ifthen}
    12.8 +\usepackage{amssymb,graphicx,calc,ifthen,subfigure,dblfloatfix,fixltx2e}
    12.9  %
   12.10  
   12.11  % *** CITATION PACKAGES ***
   12.12 @@ -283,10 +283,10 @@
   12.13  
   12.14  
   12.15  
   12.16 -\title{Performance Tuning Requires Integration of Multiple Aspects of Application, Runtime, Scheduling, and Hardware..  OR Integrated Performance Tuning Using Semantic Information Collected by Instrumenting the Language Runtime}
   12.17 +\title{Performance Tuning Requires Integrating Aspects of Application, Runtime, Scheduling, and Hardware..  OR Integrated Performance Tuning Using Semantic Information Collected by Instrumenting the Language Runtime}
   12.18  
   12.19  \author{
   12.20 -        Nina Englehardt\\
   12.21 +        Nina Engelhardt\\
   12.22          TU Berlin
   12.23  \and
   12.24          Sean Halle\\
   12.25 @@ -298,9 +298,6 @@
   12.26  
   12.27  
   12.28  
   12.29 -\begin{document}
   12.30 -
   12.31 -
   12.32  % This \maketitle command is required from ieeepes version 4.0, to make
   12.33  % ieeepes work correctly with newer LaTeX versions.
   12.34  \maketitle
   12.35 @@ -408,51 +405,86 @@
   12.36  [maybe some stuff about features and benefits of our approach: no app instrumentation, it's all inside language runtime, very low overhead, integrated with VMS-based functional debugging, and so on]
   12.37  
   12.38  
   12.39 -\section{Story}
   12.40 +\section{Illustrative Story of Performance Tuning}
   12.41  
   12.42 -We start by showing the tool being used during a typical performance tuning session, to see how its features give benefit, and how competing tools lack of those features makes the work more difficult.
    12.43 +We start by showing the tool in use during a typical performance-tuning session, to illustrate how its features help and how the lack of those features in competing tools makes the work more difficult.
   12.44  
    12.45  The program we tune is standard matrix multiply, which should be familiar to the reader. We run it on a 40-core machine.
   12.46  
    12.47  The code includes a function that automatically divides the work into a number of tasks, based on the number of cores and a tuning parameter. It distributes the tasks across the cores in a round-robin fashion. The answers produced by the tasks are collected by a ``receiver'' task and accumulated into the result matrix.
   12.48  
   12.49 -The language used is SSR, which is based on rendez-vous style send and receive operations between virtual processors. It has comands for  creating and destroying virtutal processors,  and two kinds of send-receive pairs. The first, send\_from\_to specifies a specific sender and specific receiver. The second, send\_of\_type\_to, specifies a specific receiver, but the sender is anonymous. 
   12.50 -The language also includes the ability to override which core a virtual processor is assigned to.
   12.51 +
    12.52 +The language used is SSR, which is based on rendezvous-style send and receive operations between virtual processors. It has commands for creating and destroying virtual processors, and three kinds of send-receive pairs. The first, send\_from\_to, specifies both the sender and the receiver. The second, send\_of\_type\_to, specifies the receiver but leaves the sender anonymous, which increases flexibility while retaining some control over scope. The third, send\_of\_type, specifies only the type, and so acts as a global communication channel. The language also includes performance constructs, such as the ability to force which core a virtual processor is assigned to.
   12.53 +
    12.54 +A note on terminology: the word ``task'' has acquired multiple meanings in the literature, which makes it vague. We therefore often use the term ``work-unit'' instead, defined precisely as the trace segment performed on a core between two successive scheduling events, plus the set of data consumed by that trace segment. Because ``task'' usually maps well onto this precise definition, we use both terms.
   12.55 +
   12.56  
   12.57  After functional debugging, the first run produces the visualization seen in Fig X. This is what we refer to as a consequence graph. It depicts all the scheduling operations performed by the runtime, along with the consequent usage of the cores. 
   12.58  
   12.59 -A blue vertical block represents the time the core spends doing the actual work of one work-unit. Just above each is the runtime overhead, broken into pieces representing acquisition of the lock on the shared semantic state, time spent performing the semantics of the parallelism construct, time spent deciding which ready task to execute next, and time spent switching from virtual processor to the runtime and back. 
    12.60 +A blue vertical block represents the time the core spends doing the actual work of one work-unit. Just above each block is the runtime overhead spent on that work-unit, broken into pieces representing: acquiring the lock on the shared semantic state, performing the semantics of the parallelism construct, deciding which ready task to execute next, and switching from the virtual processor to the runtime and back. 
   12.61  
   12.62  A second visualization, seen in Figure X, depicts the constraints on the scheduling decisions the runtime is allowed to make. 
   12.63  
   12.64 -The blue blocks are arranged according to causal dependencies between them, which are the choices the scheduler made for the order to execute the work-units. However, many different orderings could also have been validly chosen.
    12.65 +The blue blocks are arranged according to the choices made by the scheduler in the runtime. Because these choices determine the succession of activities on a given core, they show causal dependencies between work-units.
   12.66  
   12.67 -Which scheduler choices are valid is determined by three kinds of constraints: the application code constraints, hardware constraints, and runtime implementation imposed constraints. 
    12.68 +Many other orderings could also have been validly chosen. Which scheduler choices are valid is determined by three kinds of constraints: those imposed by the application code, by the hardware, and by the runtime implementation.
   12.69  
   12.70 -Returning to  Fig X, the lines represent application-code constraints that each tie two work-units together. Each color represents the  kind of constraint imposed by one kind of  parallelism construct. Red is point-to-point send\_from\_to, while green is many-to-many send\_of\_type\_to.\  
    12.71 +Returning to Fig X, the red, cyan, and green lines represent application-code constraints, each tying two work-units together. The color indicates which kind of parallelism construct imposed the constraint: red is the point-to-point send\_from\_to, green is the many-to-one send\_of\_type\_to, and cyan is the singleton construct.\ 
   12.72  
   12.73 -  
   12.74 +%\begin{figure}[ht]
   12.75 +% \begin{minipage}[b]{0.5\linewidth}
   12.76 +%  \centering
   12.77 +%  \includegraphics[width=0.27in, height=6in]{../figures/184.pdf}
   12.78 +%  \caption{default}
   12.79 +%  \label{fig:figure1}
   12.80 +% \end{minipage}
   12.81 +%\hspace{0.5cm}
   12.82 +% \begin{minipage}[b]{0.5\linewidth}
   12.83 +%  \centering
   12.84 +%  \includegraphics[width=1in]{../figures/.pdf}
   12.85 +%  \caption{default}
   12.86 +%  \label{fig:figure2}
   12.87 +% \end{minipage}
   12.88 +%\end{figure}
   12.89  
   12.90 -\begin{figure}[ht]
   12.91 - \center{
   12.92 - \includegraphics[width=1in, height=6.1in]{../figures/184.pdf}
   12.93 +\begin{figure*}
   12.94 +\centering
   12.95 +\mbox
   12.96 + { \subfigure[description of left graph]
   12.97 +    {\includegraphics[width=0.2in]{../figures/192.pdf} 
   12.98 +    }\quad
   12.99 +   \subfigure[description of right graph]
  12.100 +    {\includegraphics[width=1in]{../figures/194.pdf} 
  12.101 +    }
  12.102 +   \subfigure[description of right graph]
  12.103 +    {\includegraphics[width=1in]{../figures/196.pdf} 
  12.104 +    }
  12.105 +   \subfigure[description of right graph]
  12.106 +    {\includegraphics[width=1in]{../figures/199.pdf} 
  12.107 +    }
  12.108 +   \subfigure[description of right graph]
  12.109 +    {\includegraphics[width=1in]{../figures/201.pdf} 
  12.110 +    }
  12.111 +   \subfigure[description of right graph]
  12.112 +    {\includegraphics[width=1in]{../figures/204.pdf} 
  12.113 +    }
  12.114 +   \subfigure[description of right graph]
  12.115 +    {\includegraphics[width=1in]{../figures/208.pdf} 
  12.116 +    }
  12.117   }
  12.118 - \caption
  12.119 - { Foo
  12.120 -  }
  12.121 -\label{figCommProcr}
  12.122 -\end{figure}
  12.123 -
  12.124 -
  12.125 -\begin{figure}[ht]
  12.126 - \center{
  12.127 - \includegraphics[width=1in, height=6.1in]{../figures/185.pdf}
  12.128 +\mbox
  12.129 + {  \subfigure[description of left graph]
  12.130 +    {\includegraphics[width=1in]{../figures/209.pdf} 
  12.131 +    }\quad
  12.132 +   \subfigure[description of right graph]
  12.133 +    {\includegraphics[width=1in]{../figures/210.pdf} 
  12.134 +    }
  12.135 +   \subfigure[description of right graph]
  12.136 +    {\includegraphics[width=1in]{../figures/212.pdf} 
  12.137 +    }
  12.138   }
  12.139 - \caption
  12.140 - { Foo
  12.141 -  }
  12.142 -\label{figCommProcr}
  12.143 -\end{figure}
  12.144 +\caption{Text pertaining to all graphs ...} \label{fig12}
  12.145 +\end{figure*}
  12.146  
  12.147  
  12.148  \section{Setup}