Evolution of Programming Approaches for High-Performance Heterogeneous Systems

Jacob Lambert, University of Oregon, jlambert@cs.uoregon.edu
Advisor: Allen D. Malony, University of Oregon, malony@cs.uoregon.edu
External Advisor: Seyong Lee, Oak Ridge National Lab, lees2@ornl.gov

Area Exam Report
Committee Members: Allen Malony, Boyana Norris, Hank Childs
Computer Science, University of Oregon, United States
December 14, 2020

ABSTRACT
Nearly all contemporary high-performance systems rely on heterogeneous computation. As a result, scientific application developers are increasingly pushed to explore heterogeneous programming approaches. In this project, we discuss the long history of heterogeneous computing and analyze the evolution of heterogeneous programming approaches, from distributed systems to grid computing to accelerator-based supercomputers.

1 INTRODUCTION
Heterogeneous computing is paramount to today's high-performance systems. The top and next generation of supercomputers all employ heterogeneity, and even desktop workstations can be configured to utilize heterogeneous execution. The explosion of activity and interest in heterogeneous computing, as well as the exploration and development of heterogeneous programming approaches, may seem like a recent trend. However, heterogeneous programming has been a topic of research and discussion for nearly four decades. Many of the issues faced by contemporary heterogeneous programming approach designers have long histories and many connections with now-antiquated projects.

In this project, we explore the evolution and history of heterogeneous computing, with a focus on the development of heterogeneous programming approaches. In Section 2, we take a deep dive into the field of distributed heterogeneous programming, the first application of hardware heterogeneity in computing.
In Section 3, we briefly explore the resolutions of distributed heterogeneous systems and approaches, and discuss the transitional period for the field of heterogeneous computing. In Section 4, we provide a broad exploration of contemporary accelerator-based heterogeneous computing, specifically analyzing the different programming approaches developed and employed across different accelerator architectures. Finally, in Section 5, we take a zoomed-out look at the development of heterogeneous programming approaches, introspect on some important takeaways and topics, and speculate about the future of next-generation heterogeneous systems.

2 DISTRIBUTED HETEROGENEOUS SYSTEMS 1980-1995
Even 40 years ago, computer scientists realized heterogeneity was needed due to diminishing returns in homogeneous systems. In the literature, the first references to the term "heterogeneous computing" revolved around the distinction between single instruction, multiple data (SIMD) and multiple instruction, multiple data (MIMD) machines in a distributed computing environment.

Several machines dating back to the 1980s were created and advertised as heterogeneous computers. Although these machines were conceptually different from today's heterogeneous machines, they were still created to address the same challenge: using optimized hardware to execute specific algorithmic patterns.

The Partitionable SIMD/MIMD System (PASM) [270], developed at Purdue University in 1981, was initially designed for image processing and pattern recognition applications. PASM was unique in that it could be dynamically reconfigured into either a SIMD or a MIMD machine, or a combination thereof. The goal was to create a machine that could be optimized for different image processing and pattern recognition tasks, configuring more SIMD or more MIMD capability depending on the requirements of the application.

However, as with many early heterogeneous computing systems, programmability was not the primary concern. The programming environment for PASM required the design of a new procedure-based structured language similar to TRANQUIL [2], the development of a custom compiler, and even the development of a custom operating system.

Another early heterogeneous system was TRAC, the Texas Reconfigurable Array Computer [264], built in 1980. Like PASM, TRAC could weave between SIMD and MIMD execution modes. But also like PASM, programmability was not a primary or common concern with the TRAC machine, as it relied on now-arcane Job Control Languages and APL source code [197].

The lack of focus on programming approaches for early heterogeneous systems is evident, in some ways, in the difficulty of finding information on how the machines were typically programmed. However, as the availability of heterogeneous computing environments increased throughout the 1990s, so did the research and development of programming environments. Throughout the 80s and early 90s, this environment expanded to include vector processors, scalar processors, graphics machines, and more. To this end, in this first major section we explore distributed heterogeneous computing.

Although the first heterogeneous machines were mixed-mode machines like PASM and TRAC, mixed-machine heterogeneous systems became the more popular and accessible option throughout the 1990s. Instead of a single machine with the ability to switch between a synchronous SIMD mode and an asynchronous MIMD mode, mixed-machine systems contained a variety of different processing machines connected by a high-speed interconnect. Examples of machines used in mixed-machine systems include graphics and rendering-specific machines like the Pixel Planes 5 and Silicon Graphics 340 VGX, SIMD and vector machines like the MasPar MP-series and the CM 200/2000, and coarse-grained MIMD machines like the CM-5, Vista, and Sequent machines.

It was well understood that different classes of machines (SIMD, MIMD, vector, graphics, sequential) excelled at different tasks (parallel computation, statistical analysis, rendering, display), and that these machines could be networked together in a single system. However, coordinating these distributed systems to execute a single application presented significant challenges, which many of the projects discussed in the next section began to address.

In this section, we explore the different programming frameworks developed to utilize these distributed heterogeneous systems. In Section 2.1, we review several surveys to gain a contextualized insight into the research consensus of the time period.
Then, in Section 2.2, we review the most prominent and impactful programming systems introduced during this time. Finally, in Section 2.3, we discuss the evolution of distributed heterogeneous computing and how it relates to the subsequent sections.

2.1 Distributed Heterogeneous Architectures, Concepts, and Themes
For insight into the high-level perspectives, opinions, and general state of the area of early distributed heterogeneous computing, we include discussions from several survey works published during the targeted time period. We aim to extract the general trends and overarching concepts that drove the development of early systems and early heterogeneous programming approaches.

The work by Ercegovac [106], Heterogeneity in Supercomputer Architectures, represents one of the first published works specifically surveying the state of high-performance heterogeneous computing. They define heterogeneity as the combination of different architectures and system design styles into one system or machine, and their motivation for heterogeneous systems is summed up well by the following direct quote:

    Heterogeneity in the design (of supercomputers) needs to be considered
    when a point of diminishing returns in a homogeneous architecture is
    reached.

As we see throughout this work, this drive for specialization to counter diminishing returns from existing hardware repeatedly resurfaces, and this motivation for heterogeneous systems is very much relevant today.

Ercegovac's work defines four distinct avenues for heterogeneity:

(1) System Level - The combination of a CPU and an I/O channel processor, a host and a special processor, or a master/slave multiprocessor system.
(2) Operating System Level - The operating system in a distributed architecture, and how it handles functionality and performance for a diverse set of nodes.
(3) Program Level - Within a program, tasks need to be defined as concurrent, either by a programmer or a compiler, and those tasks are then allocated and executed on different processors.
(4) Instruction Level - Specialized units, like arithmetic vector pipelines, are used to provide optimal cost/performance ratios. These units execute specialized instructions to achieve higher performance than possible with a generalized unit, at an extra cost.

At the time of Ercegovac's work, there existed three primary homogeneous processing approaches in high-performance computing: (1) vector pipeline and array processors, (2) multiprocessors and multi-computers following the MIMD model, and (3) attached SIMD processors. These approaches were ubiquitous across all the early surveyed works related to distributed heterogeneous computing, and they heavily influenced the heterogeneous systems created and the heterogeneous software and programming approaches used. Ercegovac [106] lists how, at the time, the three approaches were combined in different ways to form the five following heterogeneous approaches:

(1) Mainframes with integrated vector units, programmed using a single instruction set augmented by vector instructions.
(2) Vector processors having two distinct types of instructions and processors, scalar and vector. An early example includes the SAXPY system, which could be classified as a Matrix Processing Unit.
(3) Specialized processors attached to the host machine (AP). This approach closely resembles accelerator-based heterogeneous computing, the subject of Section 4. The ST-100 and ST-50 are early examples of this approach.
(4) Multiprocessor systems with vector processors as nodes, or scalar processors augmented with vector units as nodes. For example, in PASM, mentioned earlier in this section, the operating system supported multi-tasking at the FORTRAN level, and the programmer could use fork/join API calls to exploit MIMD-level parallelism (a minimal fork/join sketch follows this list). CEDAR [172] represented another example of a multiprocessor cluster with eight processors, each modified with an Alliant FX/8 mini-supercomputer. This allowed heterogeneity within clusters, among clusters, and at the level of instructions, supporting vector processing, multiprocessing, and parallel processing.
(5) Special-purpose architectures that could contain heterogeneity at both the implementation and function levels. The Navier-Stokes computer (NSC) [262] is an example; its nodes could be personalized via firmware to serve as interior or boundary nodes.
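To make the fork/join pattern concrete, the following is a minimal sketch in C using POSIX threads. It only illustrates the pattern PASM exposed at the FORTRAN level: fork independent (MIMD-style) tasks, then join them before continuing sequentially. The pthreads API and the work() task are modern stand-ins, not PASM's actual interface.

    /* Hypothetical fork/join sketch; compile with: cc -pthread forkjoin.c */
    #include <pthread.h>
    #include <stdio.h>

    #define NTASKS 4

    /* Each forked task runs independently, in MIMD fashion. */
    static void *work(void *arg) {
        long id = (long)arg;
        printf("task %ld running independently\n", id);
        return NULL;
    }

    int main(void) {
        pthread_t tasks[NTASKS];

        for (long i = 0; i < NTASKS; i++)      /* "fork": launch tasks */
            pthread_create(&tasks[i], NULL, work, (void *)i);

        for (int i = 0; i < NTASKS; i++)       /* "join": wait for all */
            pthread_join(tasks[i], NULL);

        return 0;                              /* sequential execution resumes */
    }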
Five years later, another relevant survey, Heterogeneous Computing: Challenges and Opportunities, was published by Khokhar et al. [166]. Where the previous survey focused on heterogeneous computing as a means to improve performance over homogeneous systems, this work offers an additional motivation: instead of replacing existing costly multiprocessor systems, they propose to leverage heterogeneous computing to use existing systems in an integrated environment. Conceptually, this motivation aligns closely with the goals of grid and metacomputing, discussed in Section 3.

The authors present ten primary issues facing developing heterogeneous computing systems, which also serve as a high-level road map of the required facilities of a mature heterogeneous programming environment:

(1) Algorithm Design - Should existing algorithms be manually refactored to exploit heterogeneity, or automatically profiled to determine types of heterogeneous parallelism?
(2) Code-type Profiling - The process of determining code properties (vectorizable, SIMD/MIMD parallel, scalar, special purpose).
(3) Analytical Benchmarking - A quantitative method for determining which code patterns and properties most appropriately map to which components of a heterogeneous system.
(4) Partitioning - The process of dividing up and assigning an application to a heterogeneous system, informed by the code-type profiling and analytical benchmarking steps.
(5) Machine Selection - Given an array of available heterogeneous machines, what is the process for selecting the most appropriate machine for a given application? Typically, the goal of machine selection methods and algorithms, for example the Heterogeneous Optimal Selection Theory (HOST) algorithm [65], was to select the least expensive machine while respecting a maximal execution time (see the sketch following this list).
(6) Scheduling - A heterogeneous system-level scheduler needs to be aware of the different heterogeneous components and schedule accordingly.
(7) Synchronization - Communication between senders and receivers, shared data structures, and collective operations presented novel challenges in heterogeneous systems.
(8) Network - The interconnection network itself between heterogeneous machines presented challenges.
(9) Programming Environments - Unlike today, where programmability and productivity lie at the forefront of heterogeneous system discussions, in this work the discussion of programming environments almost seems like an afterthought. This is not unusual in works exploring early heterogeneous systems, however, as hardware and system-level issues were typically the primary focus. The authors do mention that a programming language would need to be independent and portable, and include cross-parallel compilers and debuggers.
(10) Performance Evaluation - Finally, they discuss the need for the development of novel performance evaluation tools specifically designed for heterogeneous systems.
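The selection rule stated for HOST above can be encoded directly, as in the C sketch below: among the machines whose estimated execution time meets a deadline, pick the least expensive. The struct fields, the sample machine pool, and the select_machine() helper are hypothetical illustrations, not details from [65]; in practice, the time estimates would come from the code-type profiling and analytical benchmarking steps above.

    /* Hypothetical sketch of a least-cost-under-deadline selection rule. */
    #include <stdio.h>

    struct machine {
        const char *name;
        double cost;      /* relative cost of running on this machine */
        double est_time;  /* estimated execution time for the application */
    };

    /* Returns the index of the cheapest machine meeting the deadline,
       or -1 if no machine can finish in time. */
    static int select_machine(const struct machine *m, int n, double deadline) {
        int best = -1;
        for (int i = 0; i < n; i++)
            if (m[i].est_time <= deadline &&
                (best < 0 || m[i].cost < m[best].cost))
                best = i;
        return best;
    }

    int main(void) {
        struct machine pool[] = {
            {"vector", 10.0,  2.0},
            {"SIMD",    4.0,  5.0},
            {"scalar",  1.0, 30.0},
        };
        int pick = select_machine(pool, 3, 10.0);  /* 10-second deadline */
        if (pick >= 0)
            printf("selected: %s\n", pool[pick].name);
        return 0;
    }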
In summary, the authors call for better tools to identify parallelism, improved high-speed networking and communication protocols, standards for interfaces between machines, efficient partitioning and mapping strategies, and user-friendly interfaces and programming environments. Many of these issues are addressed by the programming approaches and implementations discussed throughout this work. However, as more heterogeneous and specialized processors emerge (Sections 4.8 and 4.9), many of these issues resurface and remain outstanding challenges for today's high-performance heterogeneous computing.

In the guest editors' introduction to the 1993 special edition of the ACM Computer journal on Heterogeneous Processing, Freund and Siegel offer a high-level perspective on the then-current state of high-performance heterogeneous computing [117].

They offer several motivations for heterogeneous processing. Different types of tasks inherently contain different computational characteristics requiring different types of processors, and forcing all problem sets to map to the same fixed processor is unnatural. They also argue that the primary goal of heterogeneous computing should be to maximize usable performance, as opposed to peak performance, by using all available hardware in a heterogeneous way instead of maximizing performance on a specific processor.

Freund and Siegel also offer two potential programming paradigms: (1) the adaptation of existing languages for heterogeneous environments, and (2) languages explicitly designed with heterogeneity in mind. They discuss the advantages and disadvantages of both paradigms. This discussion of the balance between specificity and generality in heterogeneous programming paradigms continues today, with contention between specific approaches like CUDA and general approaches like OneAPI. Additionally, the authors depart from the opinion that there will be one true compiler, architecture, operating system, and tool set to handle all heterogeneous tasks well, insisting that a variety of options will likely be beneficial depending on the application and context.

In the conclusion, the authors predict that heterogeneity will always be necessary for wide classes of HPC problems: computational demands will always exceed capacity and grow faster than hardware capabilities. This has certainly proven true, as heterogeneous computing is a staple of today's high-performance computing.

The 1994 work by Weems et al., Linguistic Support for Heterogeneous Parallel Processing: A Survey and an Approach [292], is particularly interesting in the context of this project. As previously mentioned, programming approaches and methodologies are typically a minor consideration in many early heterogeneous computing works. This work, however, explored the existing options for heterogeneous programming and the challenges and requirements for heterogeneous languages.

The authors define three essential criteria for evaluating the suitability of languages for heterogeneous computing: (1) efficiency and performance, (2) ease of implementation, and (3) portability. They discuss how languages would need to support an orthogonal combination of different programming models, including sequential, control (task) parallelism, coarse- and fine-grained data parallelism, and shared and distributed memory. They stress that heterogeneous programming languages must be extendable to avoid limits on their adaptability, and that abstractions over trivialities must be provided so as not to overwhelm programmers, while still providing access to the details needed by system software. Furthermore, they discuss the need for an appropriate model of parallelism at different levels, i.e., control parallelism at a high level and data parallelism at a lower level. These kinds of considerations and concerns are still relevant today. For example, the ubiquitous MPI+X approach has long been the de facto solution for this kind of tiered parallelism, but it requires interfacing with two standards and two implementations.
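As a concrete illustration of this tiered parallelism, the sketch below uses the generic MPI+OpenMP instance of the MPI+X pattern: MPI ranks carry the coarse, control-level parallelism, and an OpenMP loop supplies the fine-grained data parallelism within each rank. This is a minimal assumed example of the general style, not code drawn from any of the surveyed systems.

    /* Minimal MPI+OpenMP sketch; compile with: mpicc -fopenmp mpix.c */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(int argc, char **argv) {
        static double x[N];
        double local = 0.0, total = 0.0;
        int rank, nranks;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        /* Control level: each rank owns one slice of the data. */
        int chunk = N / nranks;
        int lo = rank * chunk;
        int hi = (rank == nranks - 1) ? N : lo + chunk;

        /* Data level: threads within the rank share the loop. */
        #pragma omp parallel for reduction(+:local)
        for (int i = lo; i < hi; i++) {
            x[i] = 2.0 * i;
            local += x[i];
        }

        /* The two standards meet here: MPI collects what OpenMP computed. */
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("sum = %f\n", total);

        MPI_Finalize();
        return 0;
    }

Even in this tiny example, the control level (MPI calls) and the data level (OpenMP pragmas) come from two separate standards with two separate runtimes, which is precisely the interfacing burden noted above.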
Weems et al. then survey the existing languages and discuss their limitations with respect to their vision of a truly heterogeneous language. They include Cluster-M [107], HPF [170], Delirium [203], Linda [62], PCN [115], PVM [278], p4 [60], C**, PC++, ABCL/1 [302], Jade [254], Ada9x [276], Charm++ [161], and Mentat [126] in the discussion, some of which are explored in this project in Section 2.2. They further detail six features of an ideal heterogeneous programming language:

(1) supports any mode of parallelism
(2) supports any grain size
(3) supports both implicit and explicit communication
(4) users can define and abstract synchronizations