changeset 22:6166abb29bf4
Future arch -- gen pdf
| author | Some Random Person <seanhalle@yahoo.com> |
|---|---|
| date | Thu, 12 Apr 2012 08:53:52 -0700 |
| parents | dd038db1f191 |
| children | 8a46b7d0621a |
| files | 0__Papers/Future_Architecture/latex/Future_Architecture.pdf 0__Papers/Future_Architecture/latex/Future_Architecture.tex |
| diffstat | 2 files changed, 6 insertions(+), 4 deletions(-) |
Binary file 0__Papers/Future_Architecture/latex/Future_Architecture.pdf has changed
--- a/0__Papers/Future_Architecture/latex/Future_Architecture.tex Thu Apr 12 08:49:09 2012 -0700
+++ b/0__Papers/Future_Architecture/latex/Future_Architecture.tex Thu Apr 12 08:53:52 2012 -0700
@@ -502,11 +502,13 @@
 Together, these features should efficiently implement transactional memory, thread-level speculation, acquire-release, and speculative implementation of the variations on sequential consistency.
 \paragraph{simplified and scalable memory models} The communication processor plus speculation hardware can support a wide variety of memory models, including simplified high-level ones implied by domain-specific constructs. The speculation and linkage to context-swapping allows memory consistency and communication to be overlapped with work. Scalability is then in the hands of communication firm-ware.
 
-\paragraph{high-level constructs for on-chip communications} Essentially any high-level communication construct can be implemented in firm-ware of the communication processors. Further, linkage between communication processor and work processor brings pipeline-level hardware control to the high-level communication constructs. They can cause virtual-processors to be swapped out of hardware during communication, so that it is overlapped with useful work from a different context.
+\paragraph{high-level constructs for on-chip communications}
+ Essentially any high-level communication construct can be implemented in firm-ware of the communication processors. Further, linkage between communication processor and work processor brings pipeline-level hardware control to the high-level communication constructs. They can cause virtual-processors to be swapped out of hardware during communication, so that it is overlapped with useful work from a different context.
 
 
 
-\paragraph{future directions in programming massively parallel systems} A hierarchy of runtimes, with each level tuned to one level in the HW hierarchy will be key. The algorithms and code should be arranged so that data and computation on it is divided into fractal-like patterns. The goal is for each level of hardware to look the same in terms of communication and computation activity. Thus, communication within work-units scales the same as communication available in the hardware, as level in the hierarchy is traversed.
+\paragraph{future directions in programming massively parallel systems}
+A hierarchy of runtimes, with each level tuned to one level in the HW hierarchy will be key. The algorithms and code should be arranged so that data and computation on it is divided into fractal-like patterns. The goal is for each level of hardware to look the same in terms of communication and computation activity. Thus, communication within work-units scales the same as communication available in the hardware, as level in the hierarchy is traversed.
 
 This means programmers need to find hierarchical approximations to problems, where they accumulate lower-level results. This produces an application hierarchy in which amount of communication between pieces decreases as go up.
 
@@ -520,6 +522,8 @@
 
 
 
+\end{document}
+
 \bibliography{Bib_for_papers}
 
 ==================================
@@ -644,8 +648,6 @@
 ====================
 
 
-\end{document}
-
 Expanding on the first claim, the semantics of constructs, and information extracted from the application code by the toolchain can both be used by the runtime in decisions about task contents, which task to run on which core, and order of task execution. The communication pattern that results determines how much communication is overlapped with useful work, the energy of the computation, throughput, and average utilization.
 
 Expanding on the second claim, currently, each domain-specific language requires significant effort to create, and more importantly to port to each hardware target. The small user-base of each language cannot support such cost, making domain-specific languages impractical. The suggested software stack minimizes the creation and porting effort for domain-specific languages, and firm-ware runtime support fits well within such a stack [HWSim and codec lang].
