IBM z196 (microprocessor)
Encyclopedia
The z196 microprocessor
is a chip made by IBM
for their zEnterprise 196
mainframe computer
s, announced on July 22, 2010. The processor was developed over a three year time span by IBM engineers from Poughkeepsie, New York; Austin, Texas
; and Böblingen
, Germany
at a cost of US$1.5 billion. Manufactured at IBM's Fishkill, New York
fabrication plant, the processor began shipping on September 10, 2010. IBM stated that it is the world's fastest microprocessor.
s fabricated in IBM's 45 nm CMOS
silicon on insulator
fabrication process, supporting speeds of 5.2 GHz
: the highest clock speed CPU ever produced for commercial sale.
The processor implements the CISC
z/Architecture
with a new superscalar
, out-of-order
pipeline
and 100 new instructions. The instruction pipeline has 15 to 17 stages; the instruction queue can hold 40 instructions; and up to 72 instructions can be "in flight". It has four cores, each with a private 64 KB
L1 instruction cache
, a private 128 KB L1 data cache and a private 1.5 MB
L2 cache
. In addition, there is a 24 MB shared L3 cache implemented in eDRAM
and controlled by two on-chip L3 cache controllers. There's also an additional shared L1 cache used for compression and cryptography operations.
Each core has six RISC-like execution units, including two integer units
, two load-store units, one binary floating point unit and one decimal floating point
unit. The z196 chip can decode three instructions and execute five operations in a single clock cycle.
The z196 chip has on board DDR3 RAM
memory controller
supporting a RAID
like configuration to recover from memory faults. The z196 also includes a GX bus controller for accessing host channel adapters and peripherals. Additionally, each chip includes co-processors for cryptographic and compression functionality.
(SMP), there are 2 dedicated companion chips called the Shared Cache (SC) that each adds 96 MB off-die L4 cache
for a total of 192 MB L4 cache. L4 cache is shared by all processors in the book. The SC chip consists of 1.5 billion transistors and measures 478.8 mm2, manufactured with the same 45 nm process as the z196 chip.
Each chip also has 24 MB L3 cache shared by the 4 cores on the chip.
uses multi-chip module
s (MCMs) which allows for six z196 chips to be on a single module. Each MCM has two shared cache chips allowing processors on the MCM to be connected with 40 GB/s links.
The different models of the zEnterprise System have a different number of active cores. To accomplish this, some processors in each MCM may have its fourth core disabled.
Microprocessor
A microprocessor incorporates the functions of a computer's central processing unit on a single integrated circuit, or at most a few integrated circuits. It is a multipurpose, programmable device that accepts digital data as input, processes it according to instructions stored in its memory, and...
is a chip made by IBM
IBM
International Business Machines Corporation or IBM is an American multinational technology and consulting corporation headquartered in Armonk, New York, United States. IBM manufactures and sells computer hardware and software, and it offers infrastructure, hosting and consulting services in areas...
for their zEnterprise 196
IBM zEnterprise System
IBM zEnterprise System is the latest line of IBM mainframes, introduced on July 22, 2010. It consists of four components, zEnterprise 196 , zEnterprise BladeCenter Extension and zEnterprise Unified Resource Manager...
mainframe computer
Mainframe computer
Mainframes are powerful computers used primarily by corporate and governmental organizations for critical applications, bulk data processing such as census, industry and consumer statistics, enterprise resource planning, and financial transaction processing.The term originally referred to the...
s, announced on July 22, 2010. The processor was developed over a three year time span by IBM engineers from Poughkeepsie, New York; Austin, Texas
Austin, Texas
Austin is the capital city of the U.S. state of :Texas and the seat of Travis County. Located in Central Texas on the eastern edge of the American Southwest, it is the fourth-largest city in Texas and the 14th most populous city in the United States. It was the third-fastest-growing large city in...
; and Böblingen
Böblingen
Böblingen is a town in Baden-Württemberg, Germany, seat of Böblingen District. Physically Sindelfingen and Böblingen are continuous.-History:Böblingen was founded by Count Wilhelm von Tübingen-Böblingen in 1253. Württemberg acquired the town in 1357, and on 12 May 1525 one of the bloodiest battles...
, Germany
Germany
Germany , officially the Federal Republic of Germany , is a federal parliamentary republic in Europe. The country consists of 16 states while the capital and largest city is Berlin. Germany covers an area of 357,021 km2 and has a largely temperate seasonal climate...
at a cost of US$1.5 billion. Manufactured at IBM's Fishkill, New York
Fishkill, New York
Fishkill is an upscale village within the much larger town, Town of Fishkill, one of the fastest growing towns in the region, in Dutchess County, New York, USA. The village population was 1,735 at the 2000 census...
fabrication plant, the processor began shipping on September 10, 2010. IBM stated that it is the world's fastest microprocessor.
Description
The chip measures 512.3 mm2 and consists of 1.4 billion transistorTransistor
A transistor is a semiconductor device used to amplify and switch electronic signals and power. It is composed of a semiconductor material with at least three terminals for connection to an external circuit. A voltage or current applied to one pair of the transistor's terminals changes the current...
s fabricated in IBM's 45 nm CMOS
CMOS
Complementary metal–oxide–semiconductor is a technology for constructing integrated circuits. CMOS technology is used in microprocessors, microcontrollers, static RAM, and other digital logic circuits...
silicon on insulator
Silicon on insulator
Silicon on insulator technology refers to the use of a layered silicon-insulator-silicon substrate in place of conventional silicon substrates in semiconductor manufacturing, especially microelectronics, to reduce parasitic device capacitance and thereby improving performance...
fabrication process, supporting speeds of 5.2 GHz
GHZ
GHZ or GHz may refer to:# Gigahertz .# Greenberger-Horne-Zeilinger state — a quantum entanglement of three particles.# Galactic Habitable Zone — the region of a galaxy that is favorable to the formation of life....
: the highest clock speed CPU ever produced for commercial sale.
The processor implements the CISC
Complex instruction set computer
A complex instruction set computer , is a computer where single instructions can execute several low-level operations and/or are capable of multi-step operations or addressing modes within single instructions...
z/Architecture
Z/Architecture
z/Architecture, initially and briefly called ESA Modal Extensions , refers to IBM's 64-bit computing architecture for IBM mainframe computers. IBM introduced its first z/Architecture-based system, the zSeries Model 900, in late 2000. Later z/Architecture systems include the IBM z800, z990, z890,...
with a new superscalar
Superscalar
A superscalar CPU architecture implements a form of parallelism called instruction level parallelism within a single processor. It therefore allows faster CPU throughput than would otherwise be possible at a given clock rate...
, out-of-order
Out-of-order execution
In computer engineering, out-of-order execution is a paradigm used in most high-performance microprocessors to make use of instruction cycles that would otherwise be wasted by a certain type of costly delay...
pipeline
Instruction pipeline
An instruction pipeline is a technique used in the design of computers and other digital electronic devices to increase their instruction throughput ....
and 100 new instructions. The instruction pipeline has 15 to 17 stages; the instruction queue can hold 40 instructions; and up to 72 instructions can be "in flight". It has four cores, each with a private 64 KB
Kilobyte
The kilobyte is a multiple of the unit byte for digital information. Although the prefix kilo- means 1000, the term kilobyte and symbol KB have historically been used to refer to either 1024 bytes or 1000 bytes, dependent upon context, in the fields of computer science and information...
L1 instruction cache
CPU cache
A CPU cache is a cache used by the central processing unit of a computer to reduce the average time to access memory. The cache is a smaller, faster memory which stores copies of the data from the most frequently used main memory locations...
, a private 128 KB L1 data cache and a private 1.5 MB
Megabyte
The megabyte is a multiple of the unit byte for digital information storage or transmission with two different values depending on context: bytes generally for computer memory; and one million bytes generally for computer storage. The IEEE Standards Board has decided that "Mega will mean 1 000...
L2 cache
CPU cache
A CPU cache is a cache used by the central processing unit of a computer to reduce the average time to access memory. The cache is a smaller, faster memory which stores copies of the data from the most frequently used main memory locations...
. In addition, there is a 24 MB shared L3 cache implemented in eDRAM
EDRAM
eDRAM stands for "embedded DRAM", a capacitor-based dynamic random access memory integrated on the same die as an ASIC or processor. The cost-per-bit is higher than for stand-alone DRAM chips but in many applications the performance advantages of placing the eDRAM on the same chip as the processor...
and controlled by two on-chip L3 cache controllers. There's also an additional shared L1 cache used for compression and cryptography operations.
Each core has six RISC-like execution units, including two integer units
Arithmetic logic unit
In computing, an arithmetic logic unit is a digital circuit that performs arithmetic and logical operations.The ALU is a fundamental building block of the central processing unit of a computer, and even the simplest microprocessors contain one for purposes such as maintaining timers...
, two load-store units, one binary floating point unit and one decimal floating point
Decimal floating point
Decimal floating point arithmetic refers to both a representation and operations on decimal floating point numbers. Working directly with decimal fractions can avoid the rounding errors that otherwise typically occur when converting between decimal fractions and binary fractions.The...
unit. The z196 chip can decode three instructions and execute five operations in a single clock cycle.
The z196 chip has on board DDR3 RAM
DDR SDRAM
Double data rate synchronous dynamic random access memory is a class of memory integrated circuits used in computers. DDR SDRAM has been superseded by DDR2 SDRAM and DDR3 SDRAM, neither of which are either forward or backward compatible with DDR SDRAM, meaning that DDR2 or DDR3 memory modules...
memory controller
Memory controller
The memory controller is a digital circuit which manages the flow of data going to and from the main memory. It can be a separate chip or integrated into another chip, such as on the die of a microprocessor...
supporting a RAID
RAID
RAID is a storage technology that combines multiple disk drive components into a logical unit...
like configuration to recover from memory faults. The z196 also includes a GX bus controller for accessing host channel adapters and peripherals. Additionally, each chip includes co-processors for cryptographic and compression functionality.
Shared Cache
Even though the z196 processor has on-die facilities for symmetric multiprocessingSymmetric multiprocessing
In computing, symmetric multiprocessing involves a multiprocessor computer hardware architecture where two or more identical processors are connected to a single shared main memory and are controlled by a single OS instance. Most common multiprocessor systems today use an SMP architecture...
(SMP), there are 2 dedicated companion chips called the Shared Cache (SC) that each adds 96 MB off-die L4 cache
CPU cache
A CPU cache is a cache used by the central processing unit of a computer to reduce the average time to access memory. The cache is a smaller, faster memory which stores copies of the data from the most frequently used main memory locations...
for a total of 192 MB L4 cache. L4 cache is shared by all processors in the book. The SC chip consists of 1.5 billion transistors and measures 478.8 mm2, manufactured with the same 45 nm process as the z196 chip.
Each chip also has 24 MB L3 cache shared by the 4 cores on the chip.
Multi-chip module
The zEnterprise System z196IBM zEnterprise System
IBM zEnterprise System is the latest line of IBM mainframes, introduced on July 22, 2010. It consists of four components, zEnterprise 196 , zEnterprise BladeCenter Extension and zEnterprise Unified Resource Manager...
uses multi-chip module
Multi-Chip Module
A multi-chip module is a specialized electronic package where multiple integrated circuits , semiconductor dies or other discrete components are packaged onto a unifying substrate, facilitating their use as a single component...
s (MCMs) which allows for six z196 chips to be on a single module. Each MCM has two shared cache chips allowing processors on the MCM to be connected with 40 GB/s links.
The different models of the zEnterprise System have a different number of active cores. To accomplish this, some processors in each MCM may have its fourth core disabled.