Explicitly Parallel Instruction Computing
Explicitly parallel instruction computing (EPIC) is a term coined in 1997 by the HP–Intel alliance
to describe a computing paradigm that researchers had been investigating since the early 1980s. This paradigm is also called Independence architectures. It was the basis for Intel and HP's development of the Intel Itanium architecture, and HP later asserted that "EPIC" was merely an old term for the Itanium architecture. EPIC permits microprocessors to execute software instructions in parallel by using the compiler, rather than complex on-die circuitry, to control parallel instruction execution. This was intended to allow simple performance scaling without resorting to higher clock frequencies.
Roots in VLIW
By 1989, researchers at HP recognized that RISC architectures were reaching a limit at one instruction per cycle. They began an investigation into a new architecture, later named EPIC. The basis for the research was VLIW (very long instruction word), in which multiple operations are encoded in every instruction and then processed by multiple execution units.
One goal of EPIC was to move the complexity of instruction scheduling from the CPU hardware to the software compiler, which can do the instruction scheduling statically (with the help of trace feedback information). This eliminates the need for complex scheduling circuitry in the CPU, which frees up space and power for other functions, including additional execution resources. An equally important goal was to further exploit instruction-level parallelism (ILP), by using the compiler to find and exploit additional opportunities for parallel execution.
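The idea of static scheduling can be illustrated with a minimal sketch. It is not the algorithm of any real EPIC compiler: it assumes a hypothetical operation format (a destination name plus a set of source names), that every destination is written only once, and that all operations take one cycle. The function name `schedule_groups` and the `width` parameter are invented for this illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical operation format: (destination name, set of source names).
Op = Tuple[str, Set[str]]


def schedule_groups(ops: List[Op], width: int) -> List[List[str]]:
    """Greedily pack operations (given in program order) into issue groups.

    An operation is placed in the earliest group that comes after every
    group producing one of its sources and that still has a free slot.
    Assumes each destination is written once, so only true (read-after-write)
    dependencies need to be respected.
    """
    group_of: Dict[str, int] = {}      # destination -> index of its group
    groups: List[List[str]] = []
    for dest, srcs in ops:
        earliest = 0
        for src in srcs:
            if src in group_of:        # source is produced by an earlier op
                earliest = max(earliest, group_of[src] + 1)
        g = earliest
        while g < len(groups) and len(groups[g]) >= width:
            g += 1                     # group is full, try the next one
        while len(groups) <= g:
            groups.append([])
        groups[g].append(dest)
        group_of[dest] = g
    return groups


# e = a + b and f = c + d are independent; g = e * f depends on both.
program = [("e", {"a", "b"}), ("f", {"c", "d"}), ("g", {"e", "f"})]
print(schedule_groups(program, width=2))
```

With two execution slots the two additions share a group and the multiplication follows in the next one; with a single slot the same program needs three groups. All of this is decided before the program runs, which is the point of moving scheduling into the compiler.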
VLIW (at least in its original forms) has several shortcomings that precluded it from becoming mainstream:
- VLIW instruction sets are not backward compatible between implementations. When wider implementations (more execution units) are built, the instruction set for the wider machines is not backward compatible with older, narrower implementations.
- Load responses from a memory hierarchy which includes CPU caches and DRAM do not have a deterministic delay. This makes static scheduling of load instructions by the compiler very difficult.
EPIC architecture has evolved from VLIW architecture, while retaining many concepts of the superscalar architecture.
Moving beyond VLIW
EPIC architectures add several features to get around the deficiencies of VLIW:
- Each group of multiple software instructions is called a bundle. Each of the bundles has a stop bit indicating if this set of operations is depended upon by the subsequent bundle. With this capability, future implementations can be built to issue multiple bundles in parallel. The dependency information is calculated by the compiler, so the hardware does not have to perform operand dependency checking.
- A software prefetch instruction is used as a type of data prefetch. This prefetch increases the chances for a cache hit for loads, and can indicate the degree of temporal locality needed in various levels of the cache.
- A speculative load instruction is used to speculatively load data before it is known whether it will be used (bypassing control dependencies), or whether it will be modified before it is used (bypassing data dependencies).
- A check load instruction aids speculative loads by checking whether a speculative load was dependent on a later store, and thus must be reloaded.
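The interplay between a speculative load and a check load can be sketched with a toy machine model. This is an illustration under stated assumptions, not a description of the Itanium hardware: the class `Machine`, its method names, and the `tracked` table that remembers which speculative loads are still valid are all hypothetical.

```python
class Machine:
    """Toy model of data speculation with a speculative load and a check load."""

    def __init__(self, memory):
        self.memory = dict(memory)   # address -> value
        self.regs = {}               # register name -> value
        self.tracked = {}            # register name -> address loaded speculatively

    def speculative_load(self, reg, addr):
        # Load early, before it is known whether a later store will alias.
        self.regs[reg] = self.memory[addr]
        self.tracked[reg] = addr

    def store(self, addr, value):
        self.memory[addr] = value
        # A store to a tracked address invalidates the speculative load.
        for reg in [r for r, a in self.tracked.items() if a == addr]:
            del self.tracked[reg]

    def check_load(self, reg, addr):
        """Return True if the value had to be reloaded, False otherwise."""
        if self.tracked.get(reg) == addr:
            del self.tracked[reg]    # speculation succeeded, keep the value
            return False
        self.regs[reg] = self.memory[addr]
        return True


m = Machine({0x10: 1, 0x20: 2})
m.speculative_load("r1", 0x10)       # hoisted above the store below
m.store(0x20, 9)                     # different address: no conflict
print(m.check_load("r1", 0x10), m.regs["r1"])

m.speculative_load("r2", 0x20)
m.store(0x20, 7)                     # same address: speculation was wrong
print(m.check_load("r2", 0x20), m.regs["r2"])
```

In the first case the check finds the speculative value still valid and does nothing; in the second, the intervening store hit the same address, so the check reloads the register with the stored value.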
The EPIC architecture also includes a grab-bag of architectural concepts to increase ILP:
- Predicated execution is used to decrease the occurrence of branches and to increase the speculative execution of instructions. In this feature, branch conditions are converted to predicate registers which are used to kill results of executed instructions from the side of the branch which is not taken.
- Delayed exceptions, using a "not a thing" (NaT) bit within the general purpose registers, allow speculative execution past possible exceptions.
- Very large architectural register files avoid the need for register renaming.
- Multi-way branch instructions improve branch prediction by combining many alternative branches into one bundle.
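Predicated execution can be sketched by comparing a branching function with a version in which both sides of the branch are executed and a predicate decides which result is kept. The Python `if pred:` below only stands in for the hardware discarding a result; the function names and the operation tuples are hypothetical and chosen for this illustration.

```python
def branchy_abs_diff(a, b):
    # Conventional form: a conditional branch selects one side.
    if a > b:
        r = a - b
    else:
        r = b - a
    return r


def predicated_abs_diff(a, b):
    # The comparison produces a pair of complementary predicates.
    p_taken = a > b
    p_not_taken = not p_taken

    regs = {"r": None}
    # Each entry: (guarding predicate, destination register, computation).
    ops = [
        (p_taken, "r", lambda: a - b),
        (p_not_taken, "r", lambda: b - a),
    ]
    for pred, dest, compute in ops:
        value = compute()        # both sides are executed, no branch taken
        if pred:                 # models killing the result when pred is false
            regs[dest] = value
    return regs["r"]


print(predicated_abs_diff(7, 3), predicated_abs_diff(3, 7))
```

Both functions return the same value for every input. The predicated form does more work per call but contains no branch to mispredict, which is the trade-off predication makes for short branch bodies.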
The Itanium architecture also added register renaming and rotating register files, a tool useful for software pipelining since it avoids having to manually unroll and rename registers.
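How a rotating register file helps software pipelining can be shown with a small model. It is a sketch under assumptions: the class `RotatingRegisters`, the two-stage loop, and the choice to rotate by decrementing a base offset are modelling decisions made for this example, not a specification of the Itanium register file.

```python
class RotatingRegisters:
    """Logical register i maps to physical slot (i + base) mod size."""

    def __init__(self, size):
        self.phys = [None] * size
        self.base = 0

    def _slot(self, logical):
        return (logical + self.base) % len(self.phys)

    def read(self, logical):
        return self.phys[self._slot(logical)]

    def write(self, logical, value):
        self.phys[self._slot(logical)] = value

    def rotate(self):
        # After rotation, the value written to logical register 0
        # is visible as logical register 1.
        self.base = (self.base - 1) % len(self.phys)


def pipelined_double(xs):
    """Two-stage software pipeline: stage 1 loads xs[t], stage 2 doubles
    the value loaded in the previous iteration."""
    regs = RotatingRegisters(2)
    out = []
    n = len(xs)
    for t in range(n + 1):           # one extra iteration drains the pipeline
        if t < n:
            regs.write(0, xs[t])     # stage 1 always targets logical register 0
        if t >= 1:
            out.append(regs.read(1) * 2)   # stage 2 always reads logical register 1
        regs.rotate()
    return out


print(pipelined_double([1, 2, 3]))
```

The loop body always names the same two logical registers, yet each iteration's load lands in a different physical slot, so the value from the previous iteration is not overwritten. Without rotation, the loop would have to be unrolled with differently named registers to achieve the same overlap.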
Other research and development
There have been other investigations into EPIC architectures that are not directly tied to the development of the Itanium architecture.
- The IMPACT project at University of Illinois at Urbana-Champaign, led by Wen-mei Hwu, was the source of much influential research on this topic.
- The PlayDoh architecture from HP Labs was another major research project.
- Gelato is an open source development community in which academic and commercial researchers are working to develop more effective compilers for Linux applications running on Itanium servers.
See also
- Complex instruction set computer (CISC)
- Reduced instruction set computer (RISC)
- Very long instruction word (VLIW)
- Russian processors "Elbrus"
External links
- Historical background for EPIC
- Mark Smotherman (2002) "Understanding EPIC Architectures and Implementations"