# HG changeset patch
# User Sean Halle
# Date 1345048977 25200
# Node ID bb5df2b662dd114fac9dec01c44e74e98a332d86
# Parent bc83d94128d0bee31a83a4d1e6ee7eafc68b47dc
perf tuning -- merged differing ideas on conclusion

diff -r bc83d94128d0 -r bb5df2b662dd 0__Papers/Holistic_Model/Perf_Tune/latex/Holistic_Perf_Tuning.tex
--- a/0__Papers/Holistic_Model/Perf_Tune/latex/Holistic_Perf_Tuning.tex	Wed Aug 15 16:58:30 2012 +0200
+++ b/0__Papers/Holistic_Model/Perf_Tune/latex/Holistic_Perf_Tuning.tex	Wed Aug 15 09:42:57 2012 -0700
@@ -467,7 +467,7 @@
 
 The model's concepts of meta-unit and unit life-line map directly to the UCC visualization. The constraints in the UCC visualization are those stated in or implied by the application (with the complexities about UCC modifications and levels noted in Section \ref{sec:theory}).
 
-However, the SCG is not a strict expression of the model, rather it's purpose is practical. It shows usage of cores, and relates that to the quantities in the model. Hence, the measurements for the SCG all are boundaries, where the core's time switches from one category in the model to a different.
+However, the SCG is not a strict expression of the model, rather it's purpose is practical. It shows usage of cores, and relates that to the quantities in the model. Hence, the measurements for the SCG all are boundaries of where the core's time switches from one category in the model to a different.
 
 This differs from the model in subtle ways. Most notably, the model declares segments of time where communications take place, while the SCG doesn't measure the communication time directly, rather it captures idleness of the core caused by the non-overlapped portion of that communication. Also, when calculating the critical path, the SCG only counts non-overlapped portions of runtime activity.
 
@@ -562,13 +562,21 @@
 
 \section{Conclusion}
 \label{sec:conclusion}
-We have shown how to apply a generalized model of parallel computation to build adaptable performance visualizations, relying only on information collected through instrumenting the language runtime, with no modification to the application.
-The approach is demonstrated through the case study of instrumenting the SSR message-passing language runtime and using it to tune a simple parallel matrix multiply.
+We have shown how to apply a new, and general, model of parallel computation to build performance visualizations that simplify identifying instances of performance loss and linking them to details of application code responsible. They rely only on information collected through instrumenting the language runtime, with no modification to the application.
 
-The resulting visualizations show that the focus on the parallelism-relevant concepts of work units and constraints on their execution allows a clearer view of parallelism-specific issues.
-By integrating visual display of constraints stemming from application structure, language runtime implementation, or hardware features, the various possible causes for performance loss are covered. A flexible filtering system for different types of constraints avoids overcharging the display.
+By integrating visual display of constraints due to application structure, language runtime implementation, and hardware features, all relevant causes for performance loss are covered. The semantic information collected allows filtering for the relevant types of constraints, to avoid overcharging the display.
 
-As the approach relies on information available to the runtime, we expect that even better results will be observed for ``high-level'' parallel languages that more closely match application concepts instead of hardware concepts.
+We demonstrated, with a case study, how this improves usability and eliminates guesswork, by providing a direct path to details in the application code where changes should be made. These benefits derive from the computation model, which focuses on the aspects of parallelism relevant to performance in a way that makes generation of the correct hypothesis for performance loss straight forward.
+
+%I'd like to avoid weaknesses of our approach, in the conclusion.. and this wasn't discussed much in the body.
+%As the approach relies on information available to the runtime, we expect that even better results will be observed for ``high-level'' parallel languages that more closely match application concepts instead of hardware concepts.
+
+
+
+
+
+
+
 
 
 \bibliography{bib_for_papers_12_Jy_15}