Alpha 21464
Encyclopedia
The Alpha 21464 is an unfinished microprocessor
that implements the Alpha
instruction set architecture (ISA) developed by Digital Equipment Corporation
and later by Compaq
after it acquired Digital. The microprocessor was also known as EV8 or Araña, the latter being its code-name. Slated for a 2004 release, it was canceled on 25 June 2001 when Compaq announced that Alpha would be phased out in favor of Itanium
by 2004. When it was canceled, the Alpha 21464 was at a late stage of development but had not been taped out.
The 21464's origins began in the mid-1990s when computer scientist Joel Emer
was inspired by Dean Tullsen's research into simultaneous multithreading
(SMT) at the University of Washington
. Emer had researched the technology in the late 1990s and began to promote it once he was convinced of its value. Compaq made the announcement that the next Alpha microprocessor would use SMT in October 1999 at Microprocessor Forum 1999. At that time, it was expected that systems using the Alpha 21464 would ship in 2003.
design with out-of-order execution
, four-way SMT and a deep pipeline. It fetches 16 instructions from a 64 KB two-way set-associative instruction cache. The branch predictor then selected the "good" instructions and entered them into a collapsing buffer. (This allowed for a fetch bandwidth of up to 16 instructions per cycle, depending on the taken branch density.) The front-end had significantly more stages than previous Alpha implementation and as a result, the 21464 had a significant minimum branch misprediction
penalty of 14 cycles. The microprocessor used an advanced branch prediction algorithm to minimize these costly penalties.
Implementing SMT required the replication of certain resources such as the program counter
. Instead of one program counter, there were four program counters, one for each thread. However, very little logic after the front-end needed to be expanded for SMT support. The register file contained 512 entries, but its size was determined by the maximum number of in-flight instructions, not SMT. Access to the register file required three pipeline stages due to the physical size of the circuit. Up to eight instructions from four threads could be dispatched to eight integer and four floating-point execution units every cycle. The 21464 had a 64 KB data cache (Dcache), organized as eight banks to support dual-porting. This was backed by an on-die 3 MB, six-way set-associative unified secondary cache (Scache).
The integer execution unit made use of a new structure: the register cache. The register cache was not meant to mitigate the three tick register file latency (as some reports have claimed), but to reduce the complexity of operand bypass management. The register cache held all the results produced by the ALU and Load pipes for the previous N cycles. (N was something like 8.) The register cache structure was an architectural relabeling of what previous processors had implemented as a distributed mux.
The system interface was similar to that of the Alpha 21364
. There were integrated memory controller
s that provided ten RDRAM
channels. Multiprocessing was facilitated by a router that provided links to other 21464s, and it architecturally supported 512-way multiprocessing
without glue logic
.
It was to be implemented in a 0.125 μm (sometimes referred to as 0.13 μm) complementary metal–oxide–semiconductor
(CMOS) process with seven layers of copper interconnect, partially-depleted silicon-on-insulator (PD-SOI), and low-K
dielectric
. The transistor count was estimated to be 250 million and die size was estimated to be 420 mm2.
Microprocessor
A microprocessor incorporates the functions of a computer's central processing unit on a single integrated circuit, or at most a few integrated circuits. It is a multipurpose, programmable device that accepts digital data as input, processes it according to instructions stored in its memory, and...
that implements the Alpha
DEC Alpha
Alpha, originally known as Alpha AXP, is a 64-bit reduced instruction set computer instruction set architecture developed by Digital Equipment Corporation , designed to replace the 32-bit VAX complex instruction set computer ISA and its implementations. Alpha was implemented in microprocessors...
instruction set architecture (ISA) developed by Digital Equipment Corporation
Digital Equipment Corporation
Digital Equipment Corporation was a major American company in the computer industry and a leading vendor of computer systems, software and peripherals from the 1960s to the 1990s...
and later by Compaq
Compaq
Compaq Computer Corporation is a personal computer company founded in 1982. Once the largest supplier of personal computing systems in the world, Compaq existed as an independent corporation until 2002, when it was acquired for US$25 billion by Hewlett-Packard....
after it acquired Digital. The microprocessor was also known as EV8 or Araña, the latter being its code-name. Slated for a 2004 release, it was canceled on 25 June 2001 when Compaq announced that Alpha would be phased out in favor of Itanium
Itanium
Itanium is a family of 64-bit Intel microprocessors that implement the Intel Itanium architecture . Intel markets the processors for enterprise servers and high-performance computing systems...
by 2004. When it was canceled, the Alpha 21464 was at a late stage of development but had not been taped out.
The 21464's origins began in the mid-1990s when computer scientist Joel Emer
Joel Emer
Dr Joel Emer is a pioneer in computer performance analysis techniques and a microprocessor architect. He is currently an Intel Fellow. He was the 2009 recipient of the Eckert-Mauchly Award, an ACM/IEEE joint award for contributions to computer and digital systems architecture. Dr Emer received...
was inspired by Dean Tullsen's research into simultaneous multithreading
Simultaneous multithreading
Simultaneous multithreading, often abbreviated as SMT, is a technique for improving the overall efficiency of superscalar CPUs with hardware multithreading...
(SMT) at the University of Washington
University of Washington
University of Washington is a public research university, founded in 1861 in Seattle, Washington, United States. The UW is the largest university in the Northwest and the oldest public university on the West Coast. The university has three campuses, with its largest campus in the University...
. Emer had researched the technology in the late 1990s and began to promote it once he was convinced of its value. Compaq made the announcement that the next Alpha microprocessor would use SMT in October 1999 at Microprocessor Forum 1999. At that time, it was expected that systems using the Alpha 21464 would ship in 2003.
Description
The microprocessor was an eight-issue superscalarSuperscalar
A superscalar CPU architecture implements a form of parallelism called instruction level parallelism within a single processor. It therefore allows faster CPU throughput than would otherwise be possible at a given clock rate...
design with out-of-order execution
Out-of-order execution
In computer engineering, out-of-order execution is a paradigm used in most high-performance microprocessors to make use of instruction cycles that would otherwise be wasted by a certain type of costly delay...
, four-way SMT and a deep pipeline. It fetches 16 instructions from a 64 KB two-way set-associative instruction cache. The branch predictor then selected the "good" instructions and entered them into a collapsing buffer. (This allowed for a fetch bandwidth of up to 16 instructions per cycle, depending on the taken branch density.) The front-end had significantly more stages than previous Alpha implementation and as a result, the 21464 had a significant minimum branch misprediction
Branch misprediction
Branch misprediction occurs when a central processing unit mispredicts the next instruction to process in branch prediction, which is aimed at speeding up execution....
penalty of 14 cycles. The microprocessor used an advanced branch prediction algorithm to minimize these costly penalties.
Implementing SMT required the replication of certain resources such as the program counter
Program counter
The program counter , commonly called the instruction pointer in Intel x86 microprocessors, and sometimes called the instruction address register, or just part of the instruction sequencer in some computers, is a processor register that indicates where the computer is in its instruction sequence...
. Instead of one program counter, there were four program counters, one for each thread. However, very little logic after the front-end needed to be expanded for SMT support. The register file contained 512 entries, but its size was determined by the maximum number of in-flight instructions, not SMT. Access to the register file required three pipeline stages due to the physical size of the circuit. Up to eight instructions from four threads could be dispatched to eight integer and four floating-point execution units every cycle. The 21464 had a 64 KB data cache (Dcache), organized as eight banks to support dual-porting. This was backed by an on-die 3 MB, six-way set-associative unified secondary cache (Scache).
The integer execution unit made use of a new structure: the register cache. The register cache was not meant to mitigate the three tick register file latency (as some reports have claimed), but to reduce the complexity of operand bypass management. The register cache held all the results produced by the ALU and Load pipes for the previous N cycles. (N was something like 8.) The register cache structure was an architectural relabeling of what previous processors had implemented as a distributed mux.
The system interface was similar to that of the Alpha 21364
Alpha 21364
The Alpha 21364, code-named "Marvel", also known as EV7 is a microprocessor developed by Digital Equipment Corporation , later Compaq Computer Corporation, that implemented the Alpha instruction set architecture .- History :...
. There were integrated memory controller
Memory controller
The memory controller is a digital circuit which manages the flow of data going to and from the main memory. It can be a separate chip or integrated into another chip, such as on the die of a microprocessor...
s that provided ten RDRAM
RDRAM
Direct Rambus DRAM or DRDRAM is a type of synchronous dynamic RAM. RDRAM was developed by Rambus inc., in the mid-1990s as a replacement for then-prevalent DIMM SDRAM memory architecture....
channels. Multiprocessing was facilitated by a router that provided links to other 21464s, and it architecturally supported 512-way multiprocessing
Multiprocessing
Multiprocessing is the use of two or more central processing units within a single computer system. The term also refers to the ability of a system to support more than one processor and/or the ability to allocate tasks between them...
without glue logic
Glue logic
In electronics, glue logic is the custom logic circuitry used to interface a number of off-the-shelf integrated circuits.This is often achieved using ordinary 7400- or 4000-series components. In more complex cases, programmable logic devices like a CPLD or FPGA might be used...
.
It was to be implemented in a 0.125 μm (sometimes referred to as 0.13 μm) complementary metal–oxide–semiconductor
CMOS
Complementary metal–oxide–semiconductor is a technology for constructing integrated circuits. CMOS technology is used in microprocessors, microcontrollers, static RAM, and other digital logic circuits...
(CMOS) process with seven layers of copper interconnect, partially-depleted silicon-on-insulator (PD-SOI), and low-K
Low-K
In semiconductor manufacturing, a low-κ dielectric is a material with a small dielectric constant relative to silicon dioxide. Although the proper symbol for the dielectric constant is the Greek letter κ , in conversation such materials are referred to as being "low-k" rather than "low-κ"...
dielectric
Dielectric
A dielectric is an electrical insulator that can be polarized by an applied electric field. When a dielectric is placed in an electric field, electric charges do not flow through the material, as in a conductor, but only slightly shift from their average equilibrium positions causing dielectric...
. The transistor count was estimated to be 250 million and die size was estimated to be 420 mm2.
Tarantula
Tarantula was the code-name for an extension of the Alpha architecture under consideration and a derivative of the Alpha 21464 that implemented the aforementioned extension. It was canceled while still in development, before any implementation work had started, and before the 21464 was finished. The extension was to provide Alpha with a vector processing capability. It specified thirty-two 64 by 128-bit (8,192-bit or 1 KB) vector registers, approximately 50 vector instructions, and an unspecified number of instructions for moving data to and from the vector registers. Other EV8 follow-up candidates included a multicore design with two EV8 cores and a 4.0 GHz speed-demon.Further reading
- "Alpha 21464 Targets 1.7 GHz in 2003". Microprocessor Report.