changeset 31:3214f6daa63a
perf tuning -- added pdfs for figures, and put into latex
| author | Some Random Person <seanhalle@yahoo.com> |
|---|---|
| date | Tue, 17 Apr 2012 17:10:00 -0700 |
| parents | 220a5cb65311 |
| children | d9e42341c7e7 |
| files | 0__Papers/Holistic_Model/Perf_Tune/figures/192.pdf 0__Papers/Holistic_Model/Perf_Tune/figures/194.pdf 0__Papers/Holistic_Model/Perf_Tune/figures/196.pdf 0__Papers/Holistic_Model/Perf_Tune/figures/199.pdf 0__Papers/Holistic_Model/Perf_Tune/figures/201.pdf 0__Papers/Holistic_Model/Perf_Tune/figures/204.pdf 0__Papers/Holistic_Model/Perf_Tune/figures/208.pdf 0__Papers/Holistic_Model/Perf_Tune/figures/209.pdf 0__Papers/Holistic_Model/Perf_Tune/figures/210.pdf 0__Papers/Holistic_Model/Perf_Tune/figures/212.pdf 0__Papers/Holistic_Model/Perf_Tune/latex/Holistic_Perf_Tuning.pdf 0__Papers/Holistic_Model/Perf_Tune/latex/Holistic_Perf_Tuning.tex |
| diffstat | 12 files changed, 65 insertions(+), 33 deletions(-) |
Binary file 0__Papers/Holistic_Model/Perf_Tune/figures/192.pdf has changed
Binary file 0__Papers/Holistic_Model/Perf_Tune/figures/194.pdf has changed
Binary file 0__Papers/Holistic_Model/Perf_Tune/figures/196.pdf has changed
Binary file 0__Papers/Holistic_Model/Perf_Tune/figures/199.pdf has changed
Binary file 0__Papers/Holistic_Model/Perf_Tune/figures/201.pdf has changed
Binary file 0__Papers/Holistic_Model/Perf_Tune/figures/204.pdf has changed
Binary file 0__Papers/Holistic_Model/Perf_Tune/figures/208.pdf has changed
Binary file 0__Papers/Holistic_Model/Perf_Tune/figures/209.pdf has changed
Binary file 0__Papers/Holistic_Model/Perf_Tune/figures/210.pdf has changed
Binary file 0__Papers/Holistic_Model/Perf_Tune/figures/212.pdf has changed
Binary file 0__Papers/Holistic_Model/Perf_Tune/latex/Holistic_Perf_Tuning.pdf has changed
--- a/0__Papers/Holistic_Model/Perf_Tune/latex/Holistic_Perf_Tuning.tex Tue Apr 17 20:12:33 2012 +0200
+++ b/0__Papers/Holistic_Model/Perf_Tune/latex/Holistic_Perf_Tuning.tex Tue Apr 17 17:10:00 2012 -0700
@@ -26,7 +26,7 @@
 \documentclass[conference]{IEEEtran}
 %
 %\usepackage{makeidx,geometry,amssymb,graphicx,calc,ifthen}
-\usepackage{amssymb,graphicx,calc,ifthen}
+\usepackage{amssymb,graphicx,calc,ifthen,subfigure,dblfloatfix,fixltx2e}
 %
 
 % *** CITATION PACKAGES ***
@@ -283,10 +283,10 @@
 
 
 
-\title{Performance Tuning Requires Integration of Multiple Aspects of Application, Runtime, Scheduling, and Hardware.. OR Integrated Performance Tuning Using Semantic Information Collected by Instrumenting the Language Runtime}
+\title{Performance Tuning Requires Integrating Aspects of Application, Runtime, Scheduling, and Hardware.. OR Integrated Performance Tuning Using Semantic Information Collected by Instrumenting the Language Runtime}
 
 \author{
- Nina Englehardt\\
+ Nina Engelhardt\\
 TU Berlin
 \and
 Sean Halle\\
@@ -298,9 +298,6 @@
 
 
 
-\begin{document}
-
-
 % This \maketitle command is required from ieeepes version 4.0, to make
 % ieeepes work correctly with newer LaTeX versions.
 \maketitle
@@ -408,51 +405,86 @@
 [maybe some stuff about features and benefits of our approach: no app instrumentation, it's all inside language runtime, very low overhead, integrated with VMS-based functional debugging, and so on]
 
 
-\section{Story}
+\section{Illustrative Story of Performance Tuning}
 
-We start by showing the tool being used during a typical performance tuning session, to see how its features give benefit, and how competing tools lack of those features makes the work more difficult.
+We start by showing the tool being used during a typical performance tuning session, to see how its features give benefit, and how competing tools' lack of those features makes the work more difficult.
 
 The program we debug is standard matrix multiply, with which the reader should be familiar. We run it on a 40 core machine.
 
 The code includes a function that automatically divides the work into a number of tasks, based on the number of cores and a tuning parameter. It distributes the tasks across the cores in a round-robin fashion. The answers produced by the tasks are collected by a ``receiver" task and accumulated into the result matrix.
 
-The language used is SSR, which is based on rendez-vous style send and receive operations between virtual processors. It has comands for creating and destroying virtutal processors, and two kinds of send-receive pairs. The first, send\_from\_to specifies a specific sender and specific receiver. The second, send\_of\_type\_to, specifies a specific receiver, but the sender is anonymous.
-The language also includes the ability to override which core a virtual processor is assigned to.
+
+The language used is SSR, which is based on rendez-vous style send and receive operations made between virtual processors. It has comands for creating and destroying virtutal processors, and three kinds of send-receive pairs. The first, send\_from\_to specifies a specific sender and specific receiver. The second, send\_of\_type\_to, specifies a specific receiver, but the sender is anonymous, which increases flexibility while maintaining some control over scope. The third kind, send\_of\_type, only specifies the type, and so acts as a global communication channel. The language also includes performance constructs such as the ability to force which core a virtual processor is assigned to.
+
+A note on terminology: the word ``task'' has acquired multiple meanings in the literature, making it a vague term. We often use instead the term ``work-unit'', which is defined precisely as the trace-segment performed on a core, between two successive scheduling events, plus the set of datums consumed by that trace segment. The word task often maps well onto this precise definition, and we use both words.
+
 
 After functional debugging, the first run produces the visualization seen in Fig X. This is what we refer to as a consequence graph. It depicts all the scheduling operations performed by the runtime, along with the consequent usage of the cores.
 
-A blue vertical block represents the time the core spends doing the actual work of one work-unit. Just above each is the runtime overhead, broken into pieces representing acquisition of the lock on the shared semantic state, time spent performing the semantics of the parallelism construct, time spent deciding which ready task to execute next, and time spent switching from virtual processor to the runtime and back.
+A blue vertical block represents the time the core spends doing the actual work of one work-unit. Just above each is the runtime overhead spent on that work-unit, broken into pieces representing acquisition of the lock on the shared semantic state, time spent performing the semantics of the parallelism construct, time spent deciding which ready task to execute next, and time spent switching from virtual processor, to the runtime, and back.
 
 A second visualization, seen in Figure X, depicts the constraints on the scheduling decisions the runtime is allowed to make.
 
-The blue blocks are arranged according to causal dependencies between them, which are the choices the scheduler made for the order to execute the work-units. However, many different orderings could also have been validly chosen.
+The blue blocks are arranged according to the choices the scheduler in the runtime made. Because they determine the succession of activities on a given core, these choices show causal dependencies between work-units.
 
-Which scheduler choices are valid is determined by three kinds of constraints: the application code constraints, hardware constraints, and runtime implementation imposed constraints.
+Many different orderings could also have been validly chosen. Which scheduler choices are valid is determined by three kinds of constraints: the application code constraints, hardware constraints, and runtime implementation imposed constraints.
 
-Returning to Fig X, the lines represent application-code constraints that each tie two work-units together. Each color represents the kind of constraint imposed by one kind of parallelism construct. Red is point-to-point send\_from\_to, while green is many-to-many send\_of\_type\_to.\
+Returning to Fig X, the lines in red, cyan, and green represent application-code constraints that each tie two work-units together. The color represents the kind of constraint imposed by one kind of parallelism construct. Red is point-to-point send\_from\_to, while green is many-to-one send\_of\_type\_to, and cyan is the singleton construct.\
 
-
+%\begin{figure}[ht]
+% \begin{minipage}[b]{0.5\linewidth}
+% \centering
+% \includegraphics[width=0.27in, height=6in]{../figures/184.pdf}
+% \caption{default}
+% \label{fig:figure1}
+% \end{minipage}
+%\hspace{0.5cm}
+% \begin{minipage}[b]{0.5\linewidth}
+% \centering
+% \includegraphics[width=1in]{../figures/.pdf}
+% \caption{default}
+% \label{fig:figure2}
+% \end{minipage}
+%\end{figure}
 
-\begin{figure}[ht]
- \center{
- \includegraphics[width=1in, height=6.1in]{../figures/184.pdf}
+\begin{figure*}
+\centering
+\mbox
+ { \subfigure[description of left graph]
+ {\includegraphics[width=0.2in]{../figures/192.pdf}
+ }\quad
+ \subfigure[description of right graph]
+ {\includegraphics[width=1in]{../figures/194.pdf}
+ }
+ \subfigure[description of right graph]
+ {\includegraphics[width=1in]{../figures/196.pdf}
+ }
+ \subfigure[description of right graph]
+ {\includegraphics[width=1in]{../figures/199.pdf}
+ }
+ \subfigure[description of right graph]
+ {\includegraphics[width=1in]{../figures/201.pdf}
+ }
+ \subfigure[description of right graph]
+ {\includegraphics[width=1in]{../figures/204.pdf}
+ }
+ \subfigure[description of right graph]
+ {\includegraphics[width=1in]{../figures/208.pdf}
+ }
 }
- \caption
- { Foo
- }
-\label{figCommProcr}
-\end{figure}
-
-
-\begin{figure}[ht]
- \center{
- \includegraphics[width=1in, height=6.1in]{../figures/185.pdf}
+\mbox
+ { \subfigure[description of left graph]
+ {\includegraphics[width=1in]{../figures/209.pdf}
+ }\quad
+ \subfigure[description of right graph]
+ {\includegraphics[width=1in]{../figures/210.pdf}
+ }
+ \subfigure[description of right graph]
+ {\includegraphics[width=1in]{../figures/212.pdf}
+ }
 }
- \caption
- { Foo
- }
-\label{figCommProcr}
-\end{figure}
+\caption{Text pertaining to all graphs ...} \label{fig12}
+\end{figure*}
 
 
 \section{Setup}
