Alpha 21364
Encyclopedia
The Alpha 21364, code-named "Marvel", also known as EV7 is a microprocessor
developed by Digital Equipment Corporation
(DEC), later Compaq Computer Corporation, that implemented the Alpha
instruction set architecture (ISA).
with a 1.5 MB 6-way set-associative on-die secondary cache, an integrated Direct Rambus DRAM
memory controller
and an integrated network controller for connecting to other microprocessors. Changes to the Alpha 21264 core included a larger victim buffer, which was quadrupled in capacity to 32 entries, 16 for the Dcache and 16 for the Scache. It was reported by the Microprocessor Report that Compaq considered implementing minor changes to branch predictor to improve branch prediction accuracy and doubling the miss buffer in capacity to 16 entries instead of 8 in the Alpha 21264.
It was expected to be taped-out
in late 1999, with samples available in early 2000 and volume shipments in late 2000. However, the original schedule was delayed, with the tape-out in April 2001 instead of late 1999. The Alpha 21364 was introduced on 20 January 2002 when systems using the microprocessor debuted. It operated at 1.25 GHz, but production models in the AlphaServer
ES47, ES80 and GS1280 operated at 1.0 GHz or 1.15 GHz. Unlike previous Alpha microprocessors, the Alpha 21364 was not sold on the open market.
The Alpha 21364 was originally intended to be succeeded by the Alpha 21464
, code-named EV8, a new implementation of the Alpha ISA with four-way simultaneous multithreading
(SMT). It was first presented in October 1999 at the 12th Annual Microprocessor Forum, but was cancelled on 25 June 2001 at a late stage of development.
. The column concluded that, "Over the coming decade, memory subsystem design will be the only important design issue for microprocessors."
. The only modification was a larger victim buffer, now quadrupled in capacity to 32 entries. The 32 entries of victim buffer is divided equally into 16 entries each for the Dcache and Scache. Although the Alpha 21364 is a fourth-generation implementation of the Alpha Architecture, aside from this modification, the core is otherwise identical to the EV68CB derivative of the Alpha 21264.
The time required for data requested from the cache to when it can be used is 12 cycles. The 12-cycle latency was considered by observers, such as the Microprocessor Report, to be significant. The latency of the Scache was not reduced further as it would have not improved performance. The Alpha 21264 core upon which the Alpha 21364 was based on was designed to use an external cache built from commodity SRAM, which has a significantly higher latency than the on-die Scache of the Alpha 21364. Thus, it could only accept data at a limited rate. Once improving latency saw no further gains, the designers focused on reducing the power consumed by the Scache. Compaq was not willing to remedy this deficiency as it would have required the Alpha 21264 core to be modified significantly. The high latency of the Scache permitted the cache tags be looked up first to determine if the Scache contained the requested data and in which bank it was located in before powering up the Scache bank and accessing it. This avoided unproductive Scache accesses, reducing power consumption.
The tag store consisted of 5.75 million transistors and data store of 108 million transistors.
s that support Rambus DRAM (RDRAM) that operate at two thirds of the microprocessor's clock frequency, or 800 MHz at 1.2 GHz. Compaq designed custom memory controllers for the Alpha 21364, giving them capabilities not found in standard RDRAM memory controllers such as having all the 128 pages open, reducing the access latency to those pages; and proprietary fault-tolerant features.
Each memory controller provides five RDRAM channels that support PC800 Rambus inline memory module
s (RIMMs). Four of the channels are used to provide memory, while the fifth is used to provide RAID
-like redundancy. Each channel is 16 bits wide, operates at 400 MHz and transfers data on both the rising and falling edges of the clock signal (double data rate
) for a transfer rate of 800 MT/s, yielding 1.6 GB/s of bandwidth. The total memory bandwidth of the eight channels is 12.8 GB/s.
Cache coherence is provided by the memory controllers. Each memory controller has a cache coherence engine. The Alpha 21364 uses a directory cache coherence scheme where part of the memory is used to store Modified, Exclusive, Shared, Invalid (MESI) coherency data.
The Alpha 21364 can connect to as many as 127 other microprocessors using two network topologies: shuffle and an 2D torus. The shuffle topology had more direct paths to other microprocessors, reducing latency and therefore improving performance, but was limited to connecting up to eight microprocessors as a result of its nature. The 2D torus topology enabled the network to feature up to 128 microprocessors.
In multiprocessing
systems, each microprocessor is a node with its own memory. Accessing the memory of other nodes is possible, but with a latency. The latency increases with distance, thus the Alpha 21364 implements non-uniform memory access
multiprocessing. I/O is also distributed in an identical fashion. An Alpha 21364 microprocessor in a multiprocessing system did not have to have its RIMM slots populated with memory or its I/O port populated with devices. It could use another microprocessor's memory and I/O.
Himalaya fault-tolerant servers from the MIPS architecture
to Alpha. The machines however never used the microprocessor as the decision to phase out the Alpha in favor of the Itanium was made before the availability of the Alpha 21364.
measured 21.1 mm by 18.8 mm for an area of 397 mm². It was fabricated by International Business Machines
(IBM) in their 0.18 µm, seven-level copper complementary metal–oxide–semiconductor
(CMOS) process
. It was packaged in a 1,443-land flip-chip land grid array
(LGA). It used a 1.65 V power supply and a 1.5 V external interface for a maximum power dissipation of 155 W at 1.25 GHz.
A prototype of the microprocessor was presented by Hewlett-Packard at the International Solid State Circuits Conference in February 2003. It operated at 1.45 GHz, had a die area of 251 mm², used a 1.2 V power supply, and dissipated 100 W (estimated).
The Alpha 21364A was to improved upon the Alpha 21364 by featuring higher clock frequencies in the range of ~1.6 to ~1.7 GHz and support for 1066 Mbit/s RDRAM memory. It was to be fabricated by IBM in their 0.13 µm silicon on insulator
(SOI) process. As a result of the more advanced process, there were reductions in die size, power supply voltage (1.2 V compared to 1.65 V), and in power consumption and dissipation.
, was introduced. It was discontinued on 27 April 2007 when the computer it was featured in was discontinued. It operated at 1.3 GHz, supported PC1066 RIMMs and was fabricated in the same 0.18 µm process as the Alpha 21364. Compared to the Alpha 21364, the EV7z was 14 to 16 percent faster, but was still slower than the Alpha 21364A it replaced, which was estimated to outperform the Alpha 21364 by 25 percent at 1.5 GHz.
Microprocessor
A microprocessor incorporates the functions of a computer's central processing unit on a single integrated circuit, or at most a few integrated circuits. It is a multipurpose, programmable device that accepts digital data as input, processes it according to instructions stored in its memory, and...
developed by Digital Equipment Corporation
Digital Equipment Corporation
Digital Equipment Corporation was a major American company in the computer industry and a leading vendor of computer systems, software and peripherals from the 1960s to the 1990s...
(DEC), later Compaq Computer Corporation, that implemented the Alpha
DEC Alpha
Alpha, originally known as Alpha AXP, is a 64-bit reduced instruction set computer instruction set architecture developed by Digital Equipment Corporation , designed to replace the 32-bit VAX complex instruction set computer ISA and its implementations. Alpha was implemented in microprocessors...
instruction set architecture (ISA).
History
The Alpha 21364 was revealed in October 1998 by Compaq at the 11th Annual Microprocessor Forum, where it was described as an Alpha 21264Alpha 21264
The Alpha 21264 was a Digital Equipment Corporation RISC microprocessor introduced in October, 1996. The 21264 implemented the Alpha instruction set architecture .- Description :...
with a 1.5 MB 6-way set-associative on-die secondary cache, an integrated Direct Rambus DRAM
RDRAM
Direct Rambus DRAM or DRDRAM is a type of synchronous dynamic RAM. RDRAM was developed by Rambus inc., in the mid-1990s as a replacement for then-prevalent DIMM SDRAM memory architecture....
memory controller
Memory controller
The memory controller is a digital circuit which manages the flow of data going to and from the main memory. It can be a separate chip or integrated into another chip, such as on the die of a microprocessor...
and an integrated network controller for connecting to other microprocessors. Changes to the Alpha 21264 core included a larger victim buffer, which was quadrupled in capacity to 32 entries, 16 for the Dcache and 16 for the Scache. It was reported by the Microprocessor Report that Compaq considered implementing minor changes to branch predictor to improve branch prediction accuracy and doubling the miss buffer in capacity to 16 entries instead of 8 in the Alpha 21264.
It was expected to be taped-out
Tape-out
In electronics design, tape-out or tapeout is the final result of the design cycle for integrated circuits or printed circuit boards, the point at which the artwork for the photomask of a circuit is sent for manufacture....
in late 1999, with samples available in early 2000 and volume shipments in late 2000. However, the original schedule was delayed, with the tape-out in April 2001 instead of late 1999. The Alpha 21364 was introduced on 20 January 2002 when systems using the microprocessor debuted. It operated at 1.25 GHz, but production models in the AlphaServer
AlphaServer
AlphaServer was the name given to a series of server computers, produced from 1994 onwards by Digital Equipment Corporation, and latterly by Compaq and HP. As the name suggests, the AlphaServers were based on the DEC Alpha 64-bit microprocessor...
ES47, ES80 and GS1280 operated at 1.0 GHz or 1.15 GHz. Unlike previous Alpha microprocessors, the Alpha 21364 was not sold on the open market.
The Alpha 21364 was originally intended to be succeeded by the Alpha 21464
Alpha 21464
The Alpha 21464 is an unfinished microprocessor that implements the Alpha instruction set architecture developed by Digital Equipment Corporation and later by Compaq after it acquired Digital. The microprocessor was also known as EV8 or Araña, the latter being its code-name...
, code-named EV8, a new implementation of the Alpha ISA with four-way simultaneous multithreading
Simultaneous multithreading
Simultaneous multithreading, often abbreviated as SMT, is a technique for improving the overall efficiency of superscalar CPUs with hardware multithreading...
(SMT). It was first presented in October 1999 at the 12th Annual Microprocessor Forum, but was cancelled on 25 June 2001 at a late stage of development.
Development
The development of the Alpha 21364 was most focused on features that would improve memory performance and multiprocessor scalability. The focus on memory performance was the result of column published in Microprocessor Report titled, "It's the Memory, Stupid!" written by Richard L. Sites, the chief architect of the Alpha ArchitectureDEC Alpha
Alpha, originally known as Alpha AXP, is a 64-bit reduced instruction set computer instruction set architecture developed by Digital Equipment Corporation , designed to replace the 32-bit VAX complex instruction set computer ISA and its implementations. Alpha was implemented in microprocessors...
. The column concluded that, "Over the coming decade, memory subsystem design will be the only important design issue for microprocessors."
Description
The Alpha 21364 was an Alpha 21264 with a 1.75 MB on-die secondary cache, two integrated memory controllers and an integrated network controller.Core
The Alpha 21364's core is based on the EV68CB, a derivative of the Alpha 21264Alpha 21264
The Alpha 21264 was a Digital Equipment Corporation RISC microprocessor introduced in October, 1996. The 21264 implemented the Alpha instruction set architecture .- Description :...
. The only modification was a larger victim buffer, now quadrupled in capacity to 32 entries. The 32 entries of victim buffer is divided equally into 16 entries each for the Dcache and Scache. Although the Alpha 21364 is a fourth-generation implementation of the Alpha Architecture, aside from this modification, the core is otherwise identical to the EV68CB derivative of the Alpha 21264.
Scache
The secondary cache (termed "Scache") is a unified cache with a capacity of 1.75 MB. It is 7-way set associative, uses a 64-byte line size, and has a write-back policy. The cache is protected by single-bit error correction, double-bit error detection (SECDED) error-correcting code (ECC). It is connected to the cache controller by a 128-bit data path. Access to the cache is fully pipelined, yielding a sustainable bandwidth of 16 GB/s at 1.0 GHz.The time required for data requested from the cache to when it can be used is 12 cycles. The 12-cycle latency was considered by observers, such as the Microprocessor Report, to be significant. The latency of the Scache was not reduced further as it would have not improved performance. The Alpha 21264 core upon which the Alpha 21364 was based on was designed to use an external cache built from commodity SRAM, which has a significantly higher latency than the on-die Scache of the Alpha 21364. Thus, it could only accept data at a limited rate. Once improving latency saw no further gains, the designers focused on reducing the power consumed by the Scache. Compaq was not willing to remedy this deficiency as it would have required the Alpha 21264 core to be modified significantly. The high latency of the Scache permitted the cache tags be looked up first to determine if the Scache contained the requested data and in which bank it was located in before powering up the Scache bank and accessing it. This avoided unproductive Scache accesses, reducing power consumption.
The tag store consisted of 5.75 million transistors and data store of 108 million transistors.
Memory controller
The Alpha 21364 has two integrated memory controllerMemory controller
The memory controller is a digital circuit which manages the flow of data going to and from the main memory. It can be a separate chip or integrated into another chip, such as on the die of a microprocessor...
s that support Rambus DRAM (RDRAM) that operate at two thirds of the microprocessor's clock frequency, or 800 MHz at 1.2 GHz. Compaq designed custom memory controllers for the Alpha 21364, giving them capabilities not found in standard RDRAM memory controllers such as having all the 128 pages open, reducing the access latency to those pages; and proprietary fault-tolerant features.
Each memory controller provides five RDRAM channels that support PC800 Rambus inline memory module
RDRAM
Direct Rambus DRAM or DRDRAM is a type of synchronous dynamic RAM. RDRAM was developed by Rambus inc., in the mid-1990s as a replacement for then-prevalent DIMM SDRAM memory architecture....
s (RIMMs). Four of the channels are used to provide memory, while the fifth is used to provide RAID
RAID
RAID is a storage technology that combines multiple disk drive components into a logical unit...
-like redundancy. Each channel is 16 bits wide, operates at 400 MHz and transfers data on both the rising and falling edges of the clock signal (double data rate
Double data rate
In computing, a computer bus operating with double data rate transfers data on both the rising and falling edges of the clock signal. This is also known as double pumped, dual-pumped, and double transition....
) for a transfer rate of 800 MT/s, yielding 1.6 GB/s of bandwidth. The total memory bandwidth of the eight channels is 12.8 GB/s.
Cache coherence is provided by the memory controllers. Each memory controller has a cache coherence engine. The Alpha 21364 uses a directory cache coherence scheme where part of the memory is used to store Modified, Exclusive, Shared, Invalid (MESI) coherency data.
R-box
The R-box contains the network router. The network router connected the microprocessor to other microprocessors using four ports named North, South, East and West. Each port consisted of two 39-bit unidirectional links operating at 800 MHz. 32 bits were for data and 7 bits were for ECC. The network router also has a fifth port, used for I/O. This port connects to an IO7 application specific integrated circuit (ASIC), which was a bridge to an AGP 4x bus and two PCI-X buses. The I/O port consisted of two unidirectional 32-bit links operating at 200 MHz, yielding a peak bandwidth of 3.2 GB/s. The I/O port link operated at a quarter of the clock frequency to simplify the design of the I/O ASIC.The Alpha 21364 can connect to as many as 127 other microprocessors using two network topologies: shuffle and an 2D torus. The shuffle topology had more direct paths to other microprocessors, reducing latency and therefore improving performance, but was limited to connecting up to eight microprocessors as a result of its nature. The 2D torus topology enabled the network to feature up to 128 microprocessors.
In multiprocessing
Multiprocessing
Multiprocessing is the use of two or more central processing units within a single computer system. The term also refers to the ability of a system to support more than one processor and/or the ability to allocate tasks between them...
systems, each microprocessor is a node with its own memory. Accessing the memory of other nodes is possible, but with a latency. The latency increases with distance, thus the Alpha 21364 implements non-uniform memory access
Non-Uniform Memory Access
Non-Uniform Memory Access is a computer memory design used in Multiprocessing, where the memory access time depends on the memory location relative to a processor...
multiprocessing. I/O is also distributed in an identical fashion. An Alpha 21364 microprocessor in a multiprocessing system did not have to have its RIMM slots populated with memory or its I/O port populated with devices. It could use another microprocessor's memory and I/O.
Fault tolerance
The Alpha 21364 could operate in lock-step for fault-tolerant computers. This feature was a result in Compaq's decision to migrate Tandem'sTandem Computers
Tandem Computers, Inc. was the dominant manufacturer of fault-tolerant computer systems for ATM networks, banks, stock exchanges, telephone switching centers, and other similar commercial transaction processing applications requiring maximum uptime and zero data loss. The company was founded in...
Himalaya fault-tolerant servers from the MIPS architecture
MIPS architecture
MIPS is a reduced instruction set computer instruction set architecture developed by MIPS Technologies . The early MIPS architectures were 32-bit, and later versions were 64-bit...
to Alpha. The machines however never used the microprocessor as the decision to phase out the Alpha in favor of the Itanium was made before the availability of the Alpha 21364.
Fabrication
The Alpha 21364 contained 152 million transistors. The dieDie (integrated circuit)
A die in the context of integrated circuits is a small block of semiconducting material, on which a given functional circuit is fabricated.Typically, integrated circuits are produced in large batches on a single wafer of electronic-grade silicon or other semiconductor through processes such as...
measured 21.1 mm by 18.8 mm for an area of 397 mm². It was fabricated by International Business Machines
IBM
International Business Machines Corporation or IBM is an American multinational technology and consulting corporation headquartered in Armonk, New York, United States. IBM manufactures and sells computer hardware and software, and it offers infrastructure, hosting and consulting services in areas...
(IBM) in their 0.18 µm, seven-level copper complementary metal–oxide–semiconductor
CMOS
Complementary metal–oxide–semiconductor is a technology for constructing integrated circuits. CMOS technology is used in microprocessors, microcontrollers, static RAM, and other digital logic circuits...
(CMOS) process
Semiconductor fabrication
Semiconductor device fabrication is the process used to create the integrated circuits that are present in everyday electrical and electronic devices. It is a multiple-step sequence of photolithographic and chemical processing steps during which electronic circuits are gradually created on a wafer...
. It was packaged in a 1,443-land flip-chip land grid array
Land grid array
The land grid array is a type of surface-mount packaging for integrated circuits that is notable for having the pins on the socket rather than the integrated circuit...
(LGA). It used a 1.65 V power supply and a 1.5 V external interface for a maximum power dissipation of 155 W at 1.25 GHz.
Alpha 21364A
The Alpha 21364A, code-named EV79, previously EV78, was a further development of the Alpha 21364. It was intended to be the last Alpha microprocessor developed. Scheduled to be introduced in 2004, it was cancelled on 23 October 2003, with HP cited performance and schedule issues as reasons. A replacement, the EV7z was announced on the same day.A prototype of the microprocessor was presented by Hewlett-Packard at the International Solid State Circuits Conference in February 2003. It operated at 1.45 GHz, had a die area of 251 mm², used a 1.2 V power supply, and dissipated 100 W (estimated).
The Alpha 21364A was to improved upon the Alpha 21364 by featuring higher clock frequencies in the range of ~1.6 to ~1.7 GHz and support for 1066 Mbit/s RDRAM memory. It was to be fabricated by IBM in their 0.13 µm silicon on insulator
Silicon on insulator
Silicon on insulator technology refers to the use of a layered silicon-insulator-silicon substrate in place of conventional silicon substrates in semiconductor manufacturing, especially microelectronics, to reduce parasitic device capacitance and thereby improving performance...
(SOI) process. As a result of the more advanced process, there were reductions in die size, power supply voltage (1.2 V compared to 1.65 V), and in power consumption and dissipation.
EV7z
The EV7z was a further development of the Alpha 21364. It was the last Alpha microprocessor developed and introduced. The EV7z became known on 23 October 2003 when HP announced they had cancelled the Alpha 21364A and would be replacing it with the EV7z. The EV7z was introduced on 16 August 2004 when the only computer using the microprocessor, AlphaServer GS1280AlphaServer
AlphaServer was the name given to a series of server computers, produced from 1994 onwards by Digital Equipment Corporation, and latterly by Compaq and HP. As the name suggests, the AlphaServers were based on the DEC Alpha 64-bit microprocessor...
, was introduced. It was discontinued on 27 April 2007 when the computer it was featured in was discontinued. It operated at 1.3 GHz, supported PC1066 RIMMs and was fabricated in the same 0.18 µm process as the Alpha 21364. Compared to the Alpha 21364, the EV7z was 14 to 16 percent faster, but was still slower than the Alpha 21364A it replaced, which was estimated to outperform the Alpha 21364 by 25 percent at 1.5 GHz.
Further reading
- Kowaleski, J.A., Jr. et al. (2003). "Implementation of an Alpha microprocessor in SOI". ISSCC Digest of Technical Papers. pp. 248–249, 491.
- Tsuk, M. et al. (2001). "Modeling and measurement of the Alpha 21364 package". Electrical Performance of Electrical Packaging. pp. 283–286.
- Xanthopoulos, T. et al. (2001). "The design and analysis of the clock distribution network for a 1.2GHz Alpha microprocessor". ISSCC Digest of Technical Papers. pp. 402–403.