POWER1
The POWER1 is a multi-chip CPU developed and fabricated by IBM that implemented the POWER instruction set architecture (ISA). It was originally known as the “RISC System/6000 CPU” or, in abbreviated form, the “RS/6000 CPU”. When its successors were introduced, the original name was replaced with one that follows the same naming scheme (POWERn) as those successors, in order to differentiate it from the newer designs.
History
The POWER1 was introduced in 1990 with the IBM RS/6000 POWERserver servers and POWERstation workstations, which featured the POWER1 clocked at 20, 25 or 30 MHz. The POWER1 received two upgrades: the POWER1+ in 1991 and the POWER1++ in 1992. Improved semiconductor processes allowed these versions to be clocked higher than the original POWER1. The POWER1+ was clocked slightly higher than the original, at 25, 33 and 41 MHz, while the POWER1++ took the microarchitecture to its highest frequencies: 25, 33, 41.6, 45, 50 and 62.5 MHz. In September 1993, the POWER1 and its variants were succeeded by the POWER2 (known briefly as the "RIOS2"), an evolution of the POWER1 microarchitecture.
The direct derivatives of the POWER1 are the RISC Single Chip (RSC), a feature-reduced single-chip variant for entry-level RS/6000 systems, and the RAD6000, a radiation-hardened variant of the RSC for space applications. An indirect derivative of the POWER1 is the PowerPC 601, a feature-reduced variant of the RSC intended for consumer applications.
The POWER1 is notable as it represented a number of firsts for IBM and for computing in general. It was IBM's first RISC processor intended for high-end applications (the ROMP was considered a commercial failure and was not used in high-end workstations), it was the first to implement the then-new POWER instruction set architecture, and it was IBM's first successful RISC processor. Among computing firsts, the POWER1 is known for being the first CPU to implement some form of register renaming and out-of-order execution, techniques that improve the performance of superscalar processors but were previously reserved for mainframes.
The POWER1 was also the origin of the highly successful families of POWER, PowerPC and Power Architecture processors that followed it, which number in the hundreds of different implementations.
The open source GCC compiler removed support for POWER1 (RIOS) and POWER2 (RIOS2) in the 4.5 release.
Microarchitecture
The POWER1 is a 32-bit two-way superscalar CPU. It contains three major execution units: a fixed-point unit (FXU), a branch unit (BPU) and a floating-point unit (FPU). Although the POWER1 is a 32-bit CPU with a 32-bit physical address, its virtual address is 52 bits long. The larger virtual address space was chosen because it benefits application performance, allowing each application to have a large 4 GB address range.
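The relationship between these address widths can be checked with simple arithmetic. The sketch below uses only the figures quoted above (a 32-bit physical address, a 52-bit virtual address and 4 GB per application). The constant names are illustrative, and the code does not attempt to model how the hardware forms a virtual address.

```python
# Address-space arithmetic using only the widths quoted in the text.
PHYSICAL_ADDRESS_BITS = 32
VIRTUAL_ADDRESS_BITS = 52
GIB = 2**30
TIB = 2**40

physical_bytes = 2**PHYSICAL_ADDRESS_BITS
virtual_bytes = 2**VIRTUAL_ADDRESS_BITS
per_application_bytes = 4 * GIB  # the "large 4 GB address range"

# How many non-overlapping 4 GB ranges fit in the virtual address space.
distinct_ranges = virtual_bytes // per_application_bytes

print(f"physical space: {physical_bytes // GIB} GiB")
print(f"virtual space:  {virtual_bytes // TIB} TiB")
print(f"4 GiB ranges:   {distinct_ranges} (2**{distinct_ranges.bit_length() - 1})")
```

Under these figures the virtual space is 2^20 times larger than a single application's 4 GB range, which is why many applications can each receive a full range without overlapping.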
The POWER1 is a big-endian CPU that uses a Harvard-style cache hierarchy with separate instruction and data caches. The instruction cache, referred to as the "I-cache" by IBM, is 8 KB in size and is two-way set associative with a line size of 64 bytes. The I-cache is located on the ICU chip. The data cache, referred to as the "D-cache" by IBM, is 32 KB in size for RIOS.9 configurations and 64 KB in size for RIOS-1 configurations. The D-cache is four-way set associative with a line size of 128 bytes. The D-cache employs a store-back scheme, in which stored data is written to the cache rather than directly to memory in order to reduce the number of writes to memory. The store-back scheme is used to prevent the CPU from monopolizing access to the memory.
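The cache parameters above determine how many lines and sets each cache has, and how many address bits select a byte within a line or a set within the cache. The following is a minimal sketch of that arithmetic. The class and field names are illustrative, and it assumes the four-way organization applies to both D-cache sizes, as the text states.

```python
from dataclasses import dataclass

KB = 1024


@dataclass(frozen=True)
class CacheGeometry:
    size_bytes: int
    ways: int
    line_bytes: int

    @property
    def lines(self) -> int:
        return self.size_bytes // self.line_bytes

    @property
    def sets(self) -> int:
        return self.lines // self.ways

    @property
    def offset_bits(self) -> int:
        # Bits needed to address a byte within a line (line size is a power of two).
        return self.line_bytes.bit_length() - 1

    @property
    def index_bits(self) -> int:
        # Bits needed to select a set (set count is a power of two).
        return self.sets.bit_length() - 1


icache = CacheGeometry(size_bytes=8 * KB, ways=2, line_bytes=64)
dcache_rios1 = CacheGeometry(size_bytes=64 * KB, ways=4, line_bytes=128)
dcache_rios9 = CacheGeometry(size_bytes=32 * KB, ways=4, line_bytes=128)

for name, cache in [("I-cache", icache),
                    ("D-cache (RIOS-1)", dcache_rios1),
                    ("D-cache (RIOS.9)", dcache_rios9)]:
    print(f"{name}: {cache.lines} lines, {cache.sets} sets, "
          f"{cache.offset_bits} offset bits, {cache.index_bits} index bits")
```

With these figures the I-cache has 64 sets and the RIOS-1 D-cache has 128 sets. Halving the D-cache for RIOS.9 while keeping the associativity and line size halves the number of sets.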
Although the POWER1 was a high-end design, it was not capable of multiprocessing, and as such was disadvantaged, as the only way performance could be improved was by clocking the CPU higher, which was difficult to do with such a large multi-chip design. IBM used clustering to overcome this disadvantage in POWER1 systems, allowing them to effectively function as if they were multiprocessing systems, a concept proven by the popularity of SP1 supercomputers based on the POWER1. As the POWER1 was the basis of the POWER2 and P2SC microprocessors, the lack of multiprocessing was passed on to these later POWER processors. Multiprocessing was not supported until the introduction of the POWER3 in 1998.
Physical description
The POWER1 is a multi-chip CPU built from separate chips that are connected to each other by buses. It consists of an instruction-cache unit (ICU), a fixed-point unit (FXU), a floating-point unit (FPU), a number of data-cache units (DCU), a storage-control unit (SCU) and an I/O unit. Due to its modular design, IBM was able to create two configurations, RIOS-1 and RIOS.9, simply by varying the number of DCUs. The RIOS-1 configuration has four DCUs, the intended number, and was clocked at up to 40 MHz, whereas the RIOS.9 configuration has two DCUs and was clocked at lower frequencies.
The chips are mounted on the “CPU planar”, a printed circuit board (PCB), using through-hole technology. Due to the large number of chips with wide buses, the PCB has eight planes for routing wires, four for power and ground and four for signals. There are two signal planes on each side of the board, while the four power and ground planes are in the center.
The chips that make up the POWER1 are fabricated in a 1.0 µm CMOS process with three layers of interconnect. The chips are packaged in ceramic pin grid array (CPGA) packages that can have up to 300 pins and dissipate a maximum of 4 W of heat each. The total number of transistors in the POWER1, assuming a RIOS-1 configuration, is 6.9 million, with 2.04 million used for logic and 4.86 million used for memory. The die area of all the chips combined is 1,284 mm². The total number of signal pins is 1,464.
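The per-chip transistor counts quoted in this article's unit descriptions can be tallied against these totals. The sketch below does so in thousands of transistors to avoid floating-point rounding. The article gives no separate count for the FPU chip, so the residual it prints is only what the stated totals would leave for the FPU if every quoted figure were exact. That is an inference, not a sourced number.

```python
# (logic, memory, chips in a RIOS-1 configuration), in thousands of transistors,
# as quoted in the per-unit descriptions of this article.
CHIPS = {
    "ICU": (200, 550, 1),
    "FXU": (250, 250, 1),
    "DCU": (175, 950, 4),
    "SCU": (230, 0, 1),
    "I/O": (300, 200, 1),
}

STATED_TOTAL_LOGIC = 2040   # 2.04 million
STATED_TOTAL_MEMORY = 4860  # 4.86 million

logic_listed = sum(logic * count for logic, _, count in CHIPS.values())
memory_listed = sum(memory * count for _, memory, count in CHIPS.values())

# Whatever is not accounted for by the listed chips (inferred, not sourced).
residual_logic = STATED_TOTAL_LOGIC - logic_listed
residual_memory = STATED_TOTAL_MEMORY - memory_listed

print(f"stated total:    {STATED_TOTAL_LOGIC + STATED_TOTAL_MEMORY} thousand")
print(f"listed chips:    {logic_listed + memory_listed} thousand")
print(f"residual logic:  {residual_logic} thousand")
print(f"residual memory: {residual_memory} thousand")
```

The stated logic and memory totals add up to the 6.9 million figure, and the listed chips account for about 6.48 million of it.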
Instruction-cache unit (ICU)
The ICU contains the instruction cache, referred to as the "I-cache" by IBM, and the branch processing unit (BPU). The BPU contains the program counter, the condition code register and a loop register. The ICU contains 0.75 million transistors, with 0.2 million used for logic and 0.55 million used for SRAM. The ICU die measures approximately 160 mm² (12.7 × 12.7 mm).
The BPU was capable of dispatching multiple instructions to the fixed- and floating-point instruction queues while it was executing a program flow control instruction (up to four simultaneously and out of order). Speculative branches were also supported through a prediction bit in the branch instructions, with the results discarded before being saved if the branch was not taken. The alternate instruction would be buffered and discarded if the branch was taken. Consequently, subroutine calls and interrupts are handled without incurring branch penalties.
The condition code register has eight field sets, with the first two reserved for fixed- and floating-point instructions and the seventh for vector instructions. The rest of the fields could be used by other instructions. The loop register is a counter for "decrement and branch on zero" loops with no branch penalty, a feature similar to those found in some DSPs such as the TMS320C30.
Fixed-point unit (FXU)
The FXU is responsible for decoding and executing all fixed-point instructions and floating-point load and store instructions. For execution, the FXU contains the POWER1's fixed-point register file, an arithmetic logic unit (ALU) for general instructions, and a dedicated fixed-point multiply and divide unit. It also contains instruction buffers that receive both fixed- and floating-point instructions from the ICU, passing on the floating-point instructions to the FPU, and a 128-entry two-way set-associative D-TLB for address translation. The FXU contains approximately 0.5 million transistors, with 0.25 million used for logic and 0.25 million used for memory, on a die measuring approximately 160 mm².
Floating-point unit (FPU)
The POWER1's floating-point unit executes floating-point instructions issued by the ICU. The FPU is pipelined and can execute single-precision (32-bit) and double-precision (64-bit) instructions. It is capable of performing multiply-add instructions, which contributed to the POWER1's high floating-point performance. A multiply followed by an add is common in technical and scientific floating-point code, and in most processors the pair cannot be executed in a single cycle as it can in the POWER1. Use of fused multiply–add also means that the data is only rounded once, improving the precision of the result slightly.
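The effect of rounding only once can be illustrated in software. The sketch below is not a model of the POWER1 hardware. It emulates a fused multiply-add by computing the exact result with Python's `fractions.Fraction` and rounding once to a double, then compares that with a separate multiply and add, which round twice. The function names and the input values are chosen for illustration.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Compute a*b + c exactly, then round to a double once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


def separate_multiply_add(a: float, b: float, c: float) -> float:
    """Round after the multiply and again after the add."""
    product = a * b
    return product + c


# Inputs chosen so the exact product, 1 - 2**-60, is not representable
# as a double and rounds to 1.0 when the multiply is rounded on its own.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

fused = fused_multiply_add(a, b, c)
separate = separate_multiply_add(a, b, c)

print(f"rounded once:  {fused!r}")
print(f"rounded twice: {separate!r}")
```

With these inputs the separately rounded version loses the result entirely and returns zero, while the single rounding preserves the small exact value of -2^-60. In typical code the difference is far less dramatic, which is consistent with the text's description of the improvement as slight.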
The floating-point register file is also located on the FPU chip. It contains thirty-two 64-bit floating-point registers, six rename registers and two registers that are used by divide instructions.
Data-cache unit (DCU)
The POWER1 has a 64 KB data cache implemented through four identical data-cache units (DCUs), each containing 16 KB of data cache. The cache and the buses that connect the DCUs to the other chips are ECC protected. The DCUs also provide the interface to the memory. If two DCUs are present (RIOS.9 configuration), the memory bus is 64 bits wide, and if four DCUs are present (RIOS-1 configuration), the memory bus is 128 bits wide. The memory interface portion of the DCUs provides three features that improve the reliability and availability of the memory: memory scrubbing, ECC and bit steering. Each DCU contains approximately 1.125 million transistors, with 0.175 million used for logic and 0.95 million used for SRAM, on a die measuring approximately 130 mm² (11.3 × 11.3 mm).
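The relationship between the number of DCUs and the resulting cache size and memory bus width can be written down as a small Python sketch. The per-DCU bus contribution of 32 bits is an inference from the two configurations described above (64 bits for two DCUs, 128 bits for four), and the 32 KB cache figure for the two-DCU configuration follows from the same arithmetic rather than from a stated specification. The class and constant names are hypothetical.

```python
from dataclasses import dataclass

DCU_CACHE_KB = 16   # data cache per DCU, as stated in the text
DCU_BUS_BITS = 32   # assumption: inferred from 64 bits / 2 DCUs = 128 bits / 4 DCUs


@dataclass(frozen=True)
class DataCacheConfig:
    """Hypothetical description of a POWER1 data-cache configuration."""
    name: str
    dcu_count: int

    @property
    def cache_kb(self) -> int:
        # Total data cache is the sum of the per-DCU caches.
        return self.dcu_count * DCU_CACHE_KB

    @property
    def memory_bus_bits(self) -> int:
        # Each DCU is assumed to contribute an equal slice of the memory bus.
        return self.dcu_count * DCU_BUS_BITS


RIOS_1 = DataCacheConfig("RIOS-1", dcu_count=4)
RIOS_9 = DataCacheConfig("RIOS.9", dcu_count=2)

if __name__ == "__main__":
    for cfg in (RIOS_1, RIOS_9):
        print(f"{cfg.name}: {cfg.cache_kb} KB data cache, "
              f"{cfg.memory_bus_bits}-bit memory bus")
```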
Storage-control unit (SCU)
The POWER1 is controlled by the SCU chip. All communications between the ICU, FXU and DCU chips, as well as the memory and I/O devices, are arbitrated by the SCU. Although the DCUs provide the means to perform memory scrubbing, it is the SCU that controls the process. The SCU contains approximately 0.23 million transistors, all of them for logic, on a die measuring approximately 130 mm².
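Memory scrubbing in general means periodically reading every memory word, checking its error-correcting code, and writing back a corrected value when a correctable error is found. The Python sketch below shows the idea with a toy Hamming(7,4) single-error-correcting code. This is an assumption made for illustration only: the text does not say which ECC the POWER1 uses, and the function names are hypothetical.

```python
def encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit Hamming(7,4) codeword."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]   # covers positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]   # covers positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]   # covers positions 4, 5, 6, 7
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]  # index 0 is position 1
    return sum(bit << i for i, bit in enumerate(bits))


def syndrome(word: int) -> int:
    """Return 0 for a valid codeword, else the 1-based position of the bad bit."""
    s = 0
    for pos in range(1, 8):
        if (word >> (pos - 1)) & 1:
            s ^= pos
    return s


def decode(word: int) -> int:
    """Extract the 4 data bits (positions 3, 5, 6, 7) from a codeword."""
    return (((word >> 2) & 1)
            | (((word >> 4) & 1) << 1)
            | (((word >> 5) & 1) << 2)
            | (((word >> 6) & 1) << 3))


def scrub(memory: list[int]) -> int:
    """Walk all words, correct single-bit errors in place, return the count."""
    corrected = 0
    for address, word in enumerate(memory):
        s = syndrome(word)
        if s:
            memory[address] = word ^ (1 << (s - 1))  # flip the faulty bit
            corrected += 1
    return corrected


if __name__ == "__main__":
    memory = [encode(n) for n in range(16)]
    memory[5] ^= 1 << 3       # inject a single-bit error
    print("corrected words:", scrub(memory))
```

Scrubbing in the background like this keeps single-bit errors from accumulating into multi-bit errors that a single-error-correcting code could no longer repair.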
I/O unit
The POWER1's I/O interfaces are implemented by the I/O unit, which contains an I/O channel controller (IOCC) and two serial link adapters (SLAs). The IOCC implements the Micro Channel interface and controls both I/O and DMA transactions between the Micro Channel adapters and the system memory. Each of the two SLAs implements a serial fibre-optic link intended to connect RS/6000 systems together. The optical links were not supported at the time of the RS/6000's release. The I/O unit contains approximately 0.5 million transistors, with 0.3 million used for logic and 0.2 million used for memory, on a die measuring approximately 160 mm².
See also
- Processor architectures: IBM POWER, Power Architecture
- Processors: RISC Single Chip, RAD6000, POWER2, POWER3, POWER4, POWER5, POWER6, POWER7
- Computer Systems: RS/6000, Scalable POWERparallel
- Related technology: PowerPC, PowerPC 601, RS64