Heterogeneous computing
Heterogeneous computing systems are electronic systems that use a variety of different types of computational units. A computational unit could be a general-purpose processor (GPP), a special-purpose processor (e.g. a digital signal processor (DSP) or graphics processing unit (GPU)), a co-processor, or custom acceleration logic (an application-specific integrated circuit (ASIC) or field-programmable gate array (FPGA)). In general, a heterogeneous computing platform consists of processors with different instruction set architectures (ISAs).
The demand for increased heterogeneity in computing systems is partially due to the need for high-performance, highly reactive systems that interact with other environments (audio/video systems, control systems, networked applications, etc.). In the past, huge advances in technology and frequency scaling allowed the majority of computer applications to increase in performance without requiring structural changes or custom hardware acceleration. While these advances continue, their effect on modern applications is less dramatic, as other obstacles such as the memory wall and the power wall come into play. With these additional constraints, the primary method of gaining extra performance out of computing systems is to introduce additional specialized resources, thus making a computing system heterogeneous. This allows a designer to use multiple types of processing elements, each able to perform the tasks it is best suited for. Because they add extra, independent computing resources, most heterogeneous systems can also be considered parallel computing or multi-core systems. Another term sometimes used for this type of computing is "hybrid computing". Hybrid-core computing is a form of heterogeneous computing wherein asymmetric computational units coexist with a "commodity" processor.
The level of heterogeneity in modern computing systems gradually rises as increases in chip area and further scaling of fabrication technologies allow formerly discrete components to become integrated parts of a system-on-chip (SoC). For example, many new processors now include built-in logic for interfacing with other devices (SATA, PCI, Ethernet, RFID, radios, UARTs, and memory controllers), as well as programmable functional units and hardware accelerators (GPUs, encryption co-processors, programmable network processors, A/V encoders/decoders, etc.).
Common features
Heterogeneous computing systems present new challenges not found in typical homogeneous systems. The presence of multiple processing elements raises all of the issues involved with homogeneous parallel processing systems, while the level of heterogeneity in the system can introduce non-uniformity in system development, programming practices, and overall system capability. Areas of heterogeneity can include:
ISA or instruction set architecture
- Compute elements may have different instruction set architectures, leading to binary incompatibility.
ABI or application binary interface
- Compute elements may interpret memory in different ways. This may include endianness, calling conventions, and memory layout, and depends on both the architecture and the compiler being used.
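As a minimal sketch, assuming a hypothetical shared record made of a 32-bit ID followed by a 16-bit flags field, Python's standard `struct` module shows how the same values produce different byte sequences under little-endian and big-endian conventions, and how reading data with the wrong convention silently corrupts it. Packing to one explicitly specified format (here big-endian, no padding) is one common way to exchange data between elements whose native ABIs differ.

```python
import struct

# Hypothetical record shared between two compute elements:
# a 32-bit unsigned ID followed by a 16-bit unsigned flags field.
RECORD_ID = 0x11223344
FLAGS = 0x5566

# "<" = little-endian, standard sizes, no padding.
little = struct.pack("<IH", RECORD_ID, FLAGS)
# ">" = big-endian, standard sizes, no padding.
big = struct.pack(">IH", RECORD_ID, FLAGS)

print(little.hex())  # 443322116655
print(big.hex())     # 112233445566

# Reading big-endian bytes as if they were little-endian corrupts the values.
misread_id, misread_flags = struct.unpack("<IH", big)
print(hex(misread_id), hex(misread_flags))  # 0x44332211 0x6655

# Memory layout also differs: native mode ("@") inserts alignment padding,
# standard mode ("=") does not. The native size is platform-dependent.
print(struct.calcsize("@HI"), struct.calcsize("=HI"))


def decode_record(buf: bytes) -> tuple:
    """Hypothetical helper: decode a record packed big-endian, whatever the host byte order."""
    return struct.unpack(">IH", buf)


print(decode_record(big) == (RECORD_ID, FLAGS))  # True
```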
API or application programming interface
- Library and OS services may not be uniformly available to all compute elements.
Low-Level Implementation of Language Features
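- Language features such as functions and threads are often implemented using function pointers, a mechanism that requires additional translation or abstraction in heterogeneous environments, because a code address that is valid on one compute element is generally meaningless to an element with a different ISA or memory map.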
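One common workaround is to exchange portable task identifiers instead of raw code addresses and let each compute element resolve them through its own dispatch table. The sketch below assumes a hypothetical task-ID protocol; `HOST_TABLE`, `DSP_TABLE`, `TASK_SCALE`, and `TASK_SUM` are illustrative names, and on real hardware each table would point at code compiled separately for that element's ISA.

```python
from typing import Callable, Dict

# Portable task identifiers agreed on by all compute elements (hypothetical).
TASK_SCALE = 1
TASK_SUM = 2

# Each element builds its own table mapping IDs to *local* implementations.
HOST_TABLE: Dict[int, Callable[[list], object]] = {
    TASK_SCALE: lambda data: [2 * x for x in data],
    TASK_SUM: lambda data: sum(data),
}

DSP_TABLE: Dict[int, Callable[[list], object]] = {
    TASK_SCALE: lambda data: [x << 1 for x in data],  # integer-only variant
    TASK_SUM: lambda data: sum(data),
}


def run_task(table: Dict[int, Callable[[list], object]], task_id: int, data: list):
    """Resolve a portable task ID through a local table instead of a raw address."""
    try:
        fn = table[task_id]
    except KeyError:
        raise ValueError(f"task {task_id} not supported on this element") from None
    return fn(data)


# A message carries only the task ID and its arguments, never a code address.
message = (TASK_SCALE, [1, 2, 3])
print(run_task(DSP_TABLE, *message))              # [2, 4, 6]
print(run_task(HOST_TABLE, TASK_SUM, [1, 2, 3]))  # 6
```

The design choice is that only data crosses the boundary between elements; each side keeps full control over which local code a given ID maps to.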
Memory Interface and Hierarchy
- Compute elements may have different cache structures and cache coherency protocols, and memory access may be uniform or non-uniform (NUMA). Differences can also be found in the ability to read arbitrary data lengths, as some processors/units can only perform byte-, word-, or burst accesses.
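When a unit can only perform aligned word accesses, a byte read has to be emulated by fetching the enclosing word and extracting the byte with a shift and a mask. The sketch below models memory as a list of 32-bit words and assumes little-endian byte numbering within each word; `read_word` is a hypothetical stand-in for the hardware word access, and `read_byte`/`read_bytes` are illustrative helpers.

```python
WORD_BYTES = 4

# Simulated memory that only supports aligned 32-bit word reads.
# Little-endian: byte 0 of a word is its least significant byte.
MEMORY_WORDS = [0x44332211, 0x88776655]


def read_word(word_index: int) -> int:
    """Hypothetical stand-in for the only access the unit supports: an aligned word read."""
    return MEMORY_WORDS[word_index]


def read_byte(byte_addr: int) -> int:
    """Emulate a byte read using an aligned word read plus shift and mask."""
    word = read_word(byte_addr // WORD_BYTES)  # fetch the enclosing aligned word
    shift = 8 * (byte_addr % WORD_BYTES)       # bit offset of the byte within the word
    return (word >> shift) & 0xFF


def read_bytes(byte_addr: int, length: int) -> bytes:
    """Read an arbitrary-length, possibly unaligned range one emulated byte at a time."""
    return bytes(read_byte(a) for a in range(byte_addr, byte_addr + length))


print([hex(read_byte(a)) for a in range(8)])
print(read_bytes(3, 3).hex())  # 445566
```

An unaligned range such as `read_bytes(3, 3)` straddles two words, so it costs several word reads; this is the kind of hidden overhead that differing access granularities introduce.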
Interconnect
- Compute elements may have differing types of interconnect aside from basic memory/bus interfaces. This may include dedicated network interfaces, direct memory access (DMA) devices, mailboxes, FIFOs, and scratchpad memories, etc.
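Mailboxes and FIFOs between compute elements are frequently built as bounded ring buffers with separate producer and consumer indices. The following is a minimal single-producer/single-consumer sketch; the `Mailbox` class and its capacity are hypothetical, the hardware registers are modeled as plain attributes, and real concurrency and memory-ordering concerns are deliberately not handled.

```python
from typing import List, Optional


class Mailbox:
    """Bounded FIFO modeled on a hardware mailbox (hypothetical, single producer/consumer)."""

    def __init__(self, capacity: int) -> None:
        # One slot is left empty so that "full" and "empty" can be told apart.
        self._slots: List[Optional[int]] = [None] * (capacity + 1)
        self._head = 0  # next slot the consumer reads
        self._tail = 0  # next slot the producer writes

    def is_empty(self) -> bool:
        return self._head == self._tail

    def is_full(self) -> bool:
        return (self._tail + 1) % len(self._slots) == self._head

    def try_send(self, word: int) -> bool:
        """Producer side: enqueue a word, or return False if the mailbox is full."""
        if self.is_full():
            return False
        self._slots[self._tail] = word
        self._tail = (self._tail + 1) % len(self._slots)
        return True

    def try_receive(self) -> Optional[int]:
        """Consumer side: dequeue the oldest word, or return None if the mailbox is empty."""
        if self.is_empty():
            return None
        word = self._slots[self._head]
        self._head = (self._head + 1) % len(self._slots)
        return word


mbox = Mailbox(capacity=2)
print(mbox.try_send(0xA), mbox.try_send(0xB), mbox.try_send(0xC))  # True True False
print(mbox.try_receive(), mbox.try_receive(), mbox.try_receive())  # 10 11 None
```

Because only the producer writes `_tail` and only the consumer writes `_head`, this layout maps naturally onto two independent elements sharing a small memory region.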
Heterogeneous platforms often require the use of multiple compilers to target the different types of compute elements they contain. This makes the development process more complicated than for homogeneous systems, as multiple compilers and linkers must be used together in a cohesive way to properly target a heterogeneous platform. Interpretive techniques can be used to hide heterogeneity, but the cost (overhead) of interpretation often requires the use of just-in-time compilation mechanisms, which result in a more complex run-time system that may be unsuitable for embedded or real-time scenarios.
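The trade-off can be pictured as a runtime that prefers a natively compiled variant for the target ISA and falls back to a portable interpreted path when none exists. The sketch below is hypothetical: the ISA names, `NATIVE_VARIANTS`, and the toy `BYTECODE` are illustrative, and a real system would load binaries produced by separate compilers rather than Python functions.

```python
from typing import Callable, Dict, List

Kernel = Callable[[List[int]], List[int]]


def native_arm(data: List[int]) -> List[int]:
    # Placeholder for code produced by an ARM-targeting compiler (hypothetical).
    return [x * x for x in data]


def native_dsp(data: List[int]) -> List[int]:
    # Placeholder for code produced by a separate DSP compiler (hypothetical).
    return [x * x for x in data]


# One entry per target for which the build produced a native binary.
NATIVE_VARIANTS: Dict[str, Kernel] = {"arm": native_arm, "dsp": native_dsp}

# Toy bytecode for the same kernel: a list of per-element operations.
BYTECODE = ["square"]
OPS = {"square": lambda x: x * x}


def interpret(data: List[int]) -> List[int]:
    """Portable path: hides heterogeneity but dispatches every instruction at run time."""
    out = list(data)
    for op in BYTECODE:
        out = [OPS[op](x) for x in out]
    return out


def select_kernel(isa: str) -> Kernel:
    """Prefer a natively compiled variant; otherwise fall back to the interpreter."""
    return NATIVE_VARIANTS.get(isa, interpret)


for isa in ("arm", "dsp", "fpga"):
    kernel = select_kernel(isa)
    print(isa, kernel.__name__, kernel([1, 2, 3]))
```

Every native entry in the table corresponds to one more compiler and linker in the build, while the interpreted fallback avoids that cost at the price of per-instruction dispatch overhead.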
Heterogeneous computing platforms
- Texas Instruments OMAP
- Analog Devices Blackfin
- IBM Cell
- SpursEngine
- Emotion Engine
- Intel IXP Network Processors
- Xilinx Platform FPGAs (Virtex-II Pro, Virtex 4 FX, Virtex 5 FXT)
- Cray XD1
- SRC Computers SRC-6 and SRC-7
- Convey Computer Corporation's HC-1
- Atmel Diopsis
- Intel Sandy Bridge and AMD Fusion CPUs
- Intel "Stellarton" (Atom + FPGA)