### changeset 97:3f338effbfd9

Proto-runtime paper -- checkpoint, about to delete bunch from intro
author Sean Halle
Mon, 10 Dec 2012 05:51:06 -0800
45975739549a 6b354d9aefb5
0__Papers/VMS/VMS__Foundation_Paper/VMS__Full_conference_version/latex/VMS__Full_conf_paper_2.tex
1 files changed, 10 insertions(+), 8 deletions(-)
line diff
1.1 --- a/0__Papers/VMS/VMS__Foundation_Paper/VMS__Full_conference_version/latex/VMS__Full_conf_paper_2.tex	Fri Nov 30 13:26:57 2012 -0800
1.2 +++ b/0__Papers/VMS/VMS__Foundation_Paper/VMS__Full_conference_version/latex/VMS__Full_conf_paper_2.tex	Mon Dec 10 05:51:06 2012 -0800
1.3 @@ -84,13 +84,13 @@
1.4
1.5  [Note to reviewers: this paper's style and structure follow the official PPoPP guide to writing style, which is linked to the PPoPP website. We are taking on faith that the approach has been communicated effectively to reviewers and that we won't be penalized for following its unorthodox structure.]
1.6
1.7 -Programming in the past has been overwhelmingly sequential, with the applications being run on sequential hardware.  But the laws of physics have forced the hardware to become parallel, even down to embedded devices such as smart phones. The trend appears unstoppable, forcing essentially all future programming to  become parallel programming.  However,  sequential programming remains the dominant approach due to  the difficulty of the traditional parallel programming methods.
1.8 +Programming in the past has been overwhelmingly sequential, with the applications being run on sequential hardware.  But the laws of physics have forced the hardware to become parallel. This trend will force nearly all future programming to  become parallel programming.  However,  the transition from sequential to parallel programming has been slow due to  the difficulty of the traditional parallel programming methods.
1.9
1.10 -The difficulties with parallel programming fall into three main categories: 1) a difficult mental model, 2) having to rewrite the code for each hardware target to get acceptable performance and 3) disruption to existing practices, including steep learning curve, changes to the tools used, and changes in design practices. Many believe that these can all be overcome with the use of Domain-Specific Languages. But such languages have been costly to create, and to port across hardware targets, which makes them impractical given the small number of users of each language, and so have not caught on.
1.11 +The difficulties with parallel programming fall into three main categories: 1) a difficult mental model, 2) having to rewrite the code for each hardware target to get acceptable performance and 3) disruption to existing practices, including steep learning curve, changes to the tools used, and changes in design practices. Many believe that these can be overcome with the use of Domain-Specific Languages []. But such languages have been slow to be adopted, we believe because of the cost of creating them and of porting them across hardware targets. This cost makes them impractical given the small number of users of each language, each of which is specific to a narrow domain.
1.12
1.13  We propose that a method that makes Domain Specific Languages (DSLs) low cost to produce as well as to port across hardware targets will allow them to fulfill their promise, and we introduce what we call a proto-runtime to help towards this goal.
1.14
1.15 -A proto-runtime is a normal, full, runtime, but with two key pieces replaced by an interface. One piece is the logic of language constructs, the other is the logic for choosing which core to assign work onto. What's left is the proto-runtime, which comprises low-level details of internal runtime communication between cores and protection of shared runtime state during concurrent accesses performed  by the plugged-in language-behavior pieces.
1.16 +The proto-runtime approach takes a normal, full runtime and replaces two key pieces with an interface. One piece replaced is the logic of the language constructs, and the other is the logic for choosing which core to assign work to. The remaining proto-runtime handles the low-level hardware details of the runtime.
1.17
1.18  The decomposition into a proto-runtime plus  plugged-in  language behaviors modularizes the construction of runtimes.  The proto-runtime is one module, which  embodies runtime internals, which are hardware oriented and independent of language. The plugged-in portions form the two other modules, which are language specific. The interface between them   occurs at a natural boundary, which separates   the hardware oriented portion of a runtime from the language oriented portion.
1.19
1.20 @@ -103,22 +103,24 @@
1.21
1.22  \item The modularization  cleanly separates hardware oriented runtime internals from the logic of the language (\S).
1.23
1.24 -\item Those who use the proto-runtime approach can rely upon it to apply to future languages and hardware because the patterns underlying it appear to be fundamental and so should apply equally well to as yet undiscovered languages and architectures (\S\ ).
1.25 +\item Those who use the proto-runtime approach should be able to rely upon it to apply to future languages and hardware  because the patterns underlying it appear to be fundamental and should hold for future languages and architectures (\S\ ).
1.26
1.27
1.28  \item The modularization results in reduced time to implement a new language's behavior, and in reduced time to port a language to new hardware (\S\ ).
1.29
1.30  \begin{itemize}
1.31 +
1.32 +
1.33 +\item  Part of the time reduction is due to the proto-runtime providing a centralized location for services for all languages to use, so the language doesn't have to provide them separately.  Such services include debugging facilities, automated verification, concurrency handling, hardware performance information gathering, etc  (\S\ ).
1.34 +
1.35  \item Part of the time reduction is due to reuse of the runtime's internal hardware-oriented portion  by all languages (\S \ref{sec:intro}).
1.36
1.37 -
1.38  \item Part of the time reduction is due to all languages inheriting the effort of performance tuning the runtime internals, so the language doesn't have to tune runtime to hardware  (\S\ ).
1.39
1.40  \item  Part of the time reduction is due  to the use of sequential thinking when implementing the language logic. Sequential thinking is possible because the proto-runtime provides protection of shared internal runtime state, and exports an interface that presents a sequential model  (\S\ ).
1.41
1.42  \item Part of the  time reduction is due to the modularization making it practical to reuse language logic from one language to another  (\S\ ).
1.43
1.44 -\item  Part of the time reduction is due to the proto-runtime providing a centralized location for services for all languages to use, so the language doesn't have to provide them separately.  Such services include debugging facilities, automated verification, concurrency handling, hardware performance information gathering, etc  (\S\ ).
1.45
1.46  \end{itemize}
1.47
1.48 @@ -140,7 +142,7 @@
1.49
1.50  We next show what the blocks to eDSLs are, and where the main effort in implementing an eDSL lies. Specifically, in \S \ref{sec:DSLHypothesis} we show that the small number of users of an eDSL means that the eDSL must be very low effort to create, and also low effort to port to new hardware.  At the same time, the eDSL must remain very high performance across hardware targets.
1.51
1.52 -In \S we analyze where the effort of creating an eDSL is expended. It turns out that in the traditional approach, it is expended in creating the translator for the custom DSL syntax, in creating the runtime, and in performance tuning the major domain-specific constructs. We propose that the MetaBorg[] translation approach  covers creating translators for custom syntax, and that tuning constructs is inescapable, leaving the question of runtime implementation time.
1.53 +In \S we analyze where the effort of creating an eDSL is expended. It turns out that in the traditional approach, it is expended in creating the translator for the custom DSL syntax, in creating the runtime, and in performance tuning the major domain-specific constructs. We propose that the MetaBorg[] or Rose[] translation approaches cover creating translators for custom syntax, and that tuning constructs is inescapable, leaving the question of runtime implementation time.
1.54
1.55  In \S we explore the effects of runtime implementation time by taking a step back and examining what the industry-wide picture would be if the eDSL approach were adopted. A large number of eDSLs will come into existence, each with its own set of runtimes, one runtime for each hardware target.  That causes a multiplicative effect: the number of runtimes will equal the number of eDSLs times the number of hardware targets.  Unless the effort of implementing runtimes is reduced, this multiplicative effect could dominate, which would retard the uptake of eDSLs. Thus, an approach that mitigates this multiplicative effect is valuable, and that is the role the proto-runtime plays.
1.56
1.57 @@ -161,7 +163,7 @@
1.58  \section{The DSL Hypothesis}
1.59  \label{sec:DSLHypothesis}
1.60
1.61 -In this section we expand on the hypothesis that an embedded style DSL (eDSL) provides high programmer productivity, with a low learning curve. Further,  we show (\S ) that when an application is written in a well designed eDSL, porting it to new hardware becomes simpler because often only the language needs to be ported.  That is because the elements of the problem being solved that require large amounts of computation are often pulled into the language. Lastly (\S ),  we hypothesize that switching from sequential programming to using an eDSL is low disruption because the base language remains the same, along with most of the development tools and practices.
1.62 +In this section we expand on the hypothesis that an embedded style Domain Specific Language (eDSL) provides high programmer productivity, with a low learning curve. Further,  we show (\S ) that when an application is written in a well designed eDSL, porting it to new hardware becomes simpler because often only the language needs to be ported.  That is because the elements of the problem being solved that require large amounts of computation are often pulled into the language. Lastly (\S ),  we hypothesize that switching from sequential programming to using an eDSL is low disruption because the base language remains the same, along with most of the development tools and practices.
1.63
1.64  In \S \ref{sec:DSLHypothesis} we show that the small number of users of an eDSL means that the eDSL must be very low effort to create, and also low effort to port to new hardware.  At the same time, the eDSL must remain very high performance across hardware targets.
1.65
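
The decomposition described in the paper text above (a proto-runtime plus two plugged-in, language-specific pieces) might be pictured with the minimal Python sketch below. It is only an illustration under stated assumptions: every name in it (`ProtoRuntime`, `handle_construct`, `assign_core`, the toy `spawn` construct) is hypothetical and is not taken from the paper or from the VMS code base, and the sketch is single-threaded, so it does not model the protection of shared runtime state across cores.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ProtoRuntime:
    """Hypothetical language-independent part: owns one work queue per core."""
    num_cores: int
    # Plug-in 1 (hypothetical signature): logic of the language constructs.
    # Takes a construct name and its arguments, returns work units that became ready.
    handle_construct: Callable[[str, list], List[str]]
    # Plug-in 2 (hypothetical signature): chooses a core for one ready work unit.
    assign_core: Callable[[str, int], int]
    queues: List[List[str]] = field(default_factory=list)

    def __post_init__(self):
        # One empty queue per core.
        self.queues = [[] for _ in range(self.num_cores)]

    def request(self, construct: str, args: list) -> None:
        # The plug-ins are invoked one call at a time, so they can be
        # written as ordinary sequential code.
        for work in self.handle_construct(construct, args):
            core = self.assign_core(work, self.num_cores)
            self.queues[core].append(work)


def toy_handle_construct(construct: str, args: list) -> List[str]:
    """Toy language logic: 'spawn' makes every argument a ready work unit."""
    if construct == "spawn":
        return list(args)
    return []


def make_round_robin() -> Callable[[str, int], int]:
    """Toy assignment logic: hand out cores in round-robin order."""
    counter = 0

    def assign(work: str, num_cores: int) -> int:
        nonlocal counter
        core = counter % num_cores
        counter += 1
        return core

    return assign


rt = ProtoRuntime(num_cores=2,
                  handle_construct=toy_handle_construct,
                  assign_core=make_round_robin())
rt.request("spawn", ["a", "b", "c"])
print(rt.queues)
```

In this sketch, supporting a different language would mean supplying a different `handle_construct`, and targeting different hardware would mean swapping the `ProtoRuntime` internals, while the two plug-in signatures stay fixed.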