R4000
The R4000 is a microprocessor developed by MIPS Computer Systems that implemented the MIPS III instruction set architecture (ISA). Officially announced on 1 October 1991, it was one of the first 64-bit microprocessors and the first MIPS III implementation. In the early 1990s, when RISC microprocessors were expected to replace CISC microprocessors such as the Intel i486, the R4000 was selected as the microprocessor of the Advanced Computing Environment (ACE), an industry standard intended to define a common RISC platform. ACE ultimately failed for a number of reasons, but the R4000 found success in the workstation and server markets.
Models
There were three configurations of the R4000: the R4000PC, an entry-level model with no support for a secondary cache; the R4000SC, a model with secondary cache but no multiprocessor capability; and the R4000MC, a model with secondary cache and support for the cache coherency protocols required by multiprocessor systems.
Description
The R4000 is a scalar superpipelined microprocessor with an eight-stage integer pipeline. During the first stage (IF), a virtual address for an instruction is generated and the instruction translation lookaside buffer (TLB) begins the translation of the address to a physical address. In the second stage (IS), translation is completed and the instruction is fetched from an internal 8 KB instruction cache. The instruction cache is direct-mapped, virtually indexed and physically tagged. It has a 16- or 32-byte line size. Architecturally, it could be expanded to 32 KB.
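To make the direct-mapped, virtually indexed organization concrete, the following Python sketch splits an address into a cache index and a byte offset for an 8 KB cache with 16- or 32-byte lines. The helper name `split_address` and the sample address are hypothetical, chosen only for illustration. The sketch models the index arithmetic alone, not the TLB lookup or the physical tag comparison.

```python
CACHE_BYTES = 8 * 1024  # 8 KB primary instruction cache, per the text


def split_address(vaddr: int, line_bytes: int) -> tuple[int, int]:
    """Return (index, offset) for a direct-mapped cache lookup.

    Hypothetical helper: because the cache is virtually indexed, the
    index is taken directly from the virtual address.
    """
    if line_bytes not in (16, 32):
        raise ValueError("the text gives a 16- or 32-byte line size")
    num_lines = CACHE_BYTES // line_bytes      # 512 or 256 lines
    offset = vaddr % line_bytes                # byte position inside the line
    index = (vaddr // line_bytes) % num_lines  # the single line that can hold it
    return index, offset


# Illustrative address only; any two addresses 8 KB apart share an index.
for line_bytes in (16, 32):
    print(line_bytes, split_address(0x1234, line_bytes))
```

In a direct-mapped cache each address maps to exactly one line, so two addresses that differ by a multiple of the cache size compete for the same line.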
During the third stage (RF), the instruction is decoded and the register file is read. MIPS III defines two register files, one for the integer unit and the other for the floating-point unit. Each register file is 64 bits wide and contains 32 entries. The integer register file has two read ports and one write port, while the floating-point register file has two read ports and two write ports. Execution begins at stage four (EX) for both integer and floating-point instructions, and results are written back to the register files in stage eight (WB). Results may be bypassed if possible.
Integer execution
The R4000 has an arithmetic logic unit (ALU), a shifter, a multiplier and divider, and a load aligner for executing integer instructions. The ALU consists of a 64-bit carry-select adder and a logic unit, and is pipelined. The shifter is a 32-bit barrel shifter. It performs 64-bit shifts in two cycles, stalling the pipeline as a result. This design was chosen to save die area. The multiplier and divider are not pipelined and have significant latencies: multiplies have a 10- or 20-cycle latency for 32-bit or 64-bit integers, respectively, whereas divides have a 69- or 133-cycle latency for 32-bit or 64-bit integers, respectively. Most other instructions have a single-cycle latency. The ALU adder is also used for calculating virtual addresses for loads, stores and branches.
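As a rough illustration of what these latencies mean for back-to-back operations, the Python sketch below adds up the quoted cycle counts. It rests on a simplifying assumption that is not taken from the source: because the multiplier and divider are not pipelined, each operation is treated as finishing before the next one starts, and all other stalls are ignored. The names `LATENCY` and `serial_cycles` are hypothetical.

```python
# Latencies in cycles, taken from the figures quoted in the text.
LATENCY = {
    ("mult", 32): 10,
    ("mult", 64): 20,
    ("div", 32): 69,
    ("div", 64): 133,
}


def serial_cycles(ops):
    """Sum the latencies of multiply/divide operations issued back to back.

    Assumption (not from the source): operations are fully serialized,
    so the total is simply the sum of the individual latencies.
    """
    return sum(LATENCY[op] for op in ops)


# Four 32-bit multiplies in a row, under the serialization assumption.
print(serial_cycles([("mult", 32)] * 4))
```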
Load and store instructions are executed by the integer pipeline and access the on-chip 8 KB data cache.
Floating-point execution
The R4000 has an on-die IEEE 754-1985-compliant floating-point unit (FPU), referred to as the R4010. The FPU is a coprocessor designated CP1 (the MIPS ISA defines four coprocessors, designated CP0 to CP3). The FPU can operate in two modes, 32-bit or 64-bit, selected by setting the FR bit in the CPU status register. In 32-bit mode, the 32 floating-point registers are 32 bits wide when used to hold single-precision floating-point numbers. When used to hold double-precision numbers, there are 16 floating-point registers (the registers are paired).
The FPU can operate in parallel with the ALU unless there is a data or resource dependency, which causes it to stall. It contains three sub-units: an adder, a multiplier and a divider. The multiplier and divider can execute an instruction in parallel with the adder, but they use the adder in their final stages of execution, which limits how far execution can overlap. Under certain conditions, the FPU can therefore execute up to three instructions at any time, one in each unit. The FPU is capable of retiring one instruction per cycle.
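The register pairing used in 32-bit mode can be illustrated with a toy Python model in which a double-precision value is spread across two adjacent 32-bit registers. The class name `FprFile32` and its methods are hypothetical. Placing the low word in the even-numbered register and the high word in the odd-numbered one is an assumption of this sketch, not something the text states.

```python
import struct


class FprFile32:
    """Toy model of 32-bit mode: 32 registers, each 32 bits wide."""

    def __init__(self):
        self.regs = [0] * 32

    def write_double(self, n: int, value: float) -> None:
        """Store a double in the register pair (n, n + 1); n must be even."""
        if n % 2:
            raise ValueError("a double occupies an even/odd register pair")
        bits = struct.unpack("<Q", struct.pack("<d", value))[0]
        self.regs[n] = bits & 0xFFFFFFFF  # assumed: low word in the even register
        self.regs[n + 1] = bits >> 32     # assumed: high word in the odd register

    def read_double(self, n: int) -> float:
        """Reassemble the double held in the register pair (n, n + 1)."""
        if n % 2:
            raise ValueError("a double occupies an even/odd register pair")
        bits = (self.regs[n + 1] << 32) | self.regs[n]
        return struct.unpack("<d", struct.pack("<Q", bits))[0]


fpr = FprFile32()
fpr.write_double(2, 1.5)
print(fpr.read_double(2))
```

Only even register numbers are valid for doubles in this model, which is why 32 physical registers yield 16 double-precision registers.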
The adder and multiplier are pipelined. The multiplier has a four-stage pipeline. It is clocked at twice the clock frequency of the microprocessor for adequate performance and uses dynamic logic to achieve the high clock frequency. Division has a 23- or 36-cycle latency for single- or double-precision operations, and square root has a 54- or 112-cycle latency. Division and square root use the SRT algorithm.
Memory management
The memory management unit (MMU) uses a 48-entry translation lookaside buffer to translate virtual addresses. The R4000 uses a 64-bit virtual address but implements only 40 of the 64 bits, for 1 TB of virtual memory. The remaining bits are checked to ensure that they contain zero. The R4000 uses a 36-bit physical address and is thus able to address 64 GB of physical memory.
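The address-space sizes follow directly from the number of implemented address bits, as the short Python sketch below shows. The function names are hypothetical. The validity check is a simplified reading of the statement that the unimplemented upper bits must be zero, and it does not model anything else about the MMU.

```python
VIRTUAL_BITS = 40   # implemented virtual address bits, per the text
PHYSICAL_BITS = 36  # physical address bits, per the text


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for a given address width."""
    return 1 << address_bits


def upper_bits_are_zero(vaddr: int, implemented_bits: int = VIRTUAL_BITS) -> bool:
    """Simplified check: every bit above the implemented ones must be zero."""
    return 0 <= vaddr < (1 << 64) and (vaddr >> implemented_bits) == 0


print(addressable_bytes(VIRTUAL_BITS) // 2**40, "TB virtual")
print(addressable_bytes(PHYSICAL_BITS) // 2**30, "GB physical")
```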
Secondary cache
The R4000 (SC and MC configurations only) supports an external secondary cache with a capacity of 128 KB to 4 MB. The cache is accessed via a dedicated 128-bit data bus. The secondary cache can be configured either as a unified cache or as a split instruction and data cache. In the latter configuration, each cache can have a capacity of 128 KB to 2 MB. The secondary cache is physically indexed, physically tagged and has a programmable line size of 128, 256, 512 or 1,024 bytes. The cache controller is on-die. The cache is built from standard static random access memory (SRAM). The data and tag buses are ECC-protected.
System bus
The R4000 used a 64-bit system bus called the SysAD bus. The SysAD bus was a multiplexed address and data bus, that is, it used the same set of wires to transfer data and addresses. While this reduced bandwidth, it was also less expensive than providing a separate address bus, which would have required more pins and increased the complexity of the system. The SysAD bus can be configured to operate at half, a third or a quarter of the internal clock frequency. The SysAD bus generates its clock signal by dividing the operating frequency.
Transistor count, die dimensions and process details
The R4000 contained 1.2 million transistors. It was designed for a 1.0 µm two-layer-metal complementary metal–oxide–semiconductor (CMOS) process. As MIPS was a fabless company, the R4000 was fabricated by partners in their own processes, which had a 0.8 µm minimum feature size.
Clocking
The R4000 generates its various clock signals from an externally generated master clock signal. For the operating frequency, the R4000 multiplies the master clock signal by two using an on-die phase-locked loop (PLL).
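The relationship between the master clock, the doubled operating frequency and the divided SysAD bus clock can be worked through with a few lines of Python. The function name `clock_plan` is hypothetical, and the 50 MHz master clock is an illustrative input rather than a figure from the text. Exact fractions are used so that the divide-by-three case is not rounded.

```python
from fractions import Fraction

PLL_MULTIPLIER = 2          # master clock is multiplied by two, per the text
SYSAD_DIVISORS = (2, 3, 4)  # bus runs at half, a third or a quarter


def clock_plan(master_mhz):
    """Return (operating frequency, {divisor: SysAD frequency}) in MHz."""
    internal = Fraction(master_mhz) * PLL_MULTIPLIER
    return internal, {d: internal / d for d in SYSAD_DIVISORS}


internal, bus = clock_plan(50)  # 50 MHz is an assumed example value
print("operating frequency:", float(internal), "MHz")
for divisor, freq in bus.items():
    print(f"SysAD at 1/{divisor}: {float(freq):.2f} MHz")
```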
Packaging
The R4000PC was packaged in a 179-pin ceramic pin grid array (CPGA). The R4000SC and R4000MC were packaged in a 447-pin ceramic staggered pin grid array (SPGA). The pin-out of the R4000MC differs from that of the R4000SC: some pins that are unused on the R4000SC carry the signals that implement cache coherency on the R4000MC. The pin-out of the R4000PC was similar to that of the PGA-packaged R4200 and R4600 microprocessors. This characteristic enabled a properly designed system to use any of the three microprocessors.
R4400
The R4400 was a further development of the R4000. It was announced in early November 1992. Samples of the microprocessor had been shipped to selected customers before then, with general availability in January 1993. The R4400 operated at higher clock frequencies of 100, 133, 150, 200, and 250 MHz. The only major improvement over the R4000 was larger primary caches, which were doubled in capacity from 8 KB to 16 KB each. It contained 2.3 million transistors.
The R4400 was licensed by Integrated Device Technology (IDT), LSI Logic, NEC, Performance Semiconductor, Siemens AG and Toshiba. IDT, NEC, Siemens and Toshiba fabricated and marketed the microprocessor. LSI Logic used the R4400 in custom products. Performance Semiconductor sold its logic division to Cypress Semiconductor, where the MIPS microprocessor products were discontinued.
NEC marketed their version as the VR4400. The first version, a 150 MHz part, was announced in November 1992. Early versions were fabricated in a 0.6 µm process. In mid-1995, a 250 MHz part began sampling. It was fabricated in a 0.35 µm four-layer-metal process. NEC also produced the MR4401, a ceramic multi-chip module (MCM) that contained a VR4400SC with ten 1 Mbit SRAM chips that implemented a 1 MB secondary cache. The MCM was pin-compatible with the R4x00PC. The first version, a 150 MHz part, was announced in 1994. In 1995, a 200 MHz part was announced.
Toshiba marketed their version as the TC86R4400. A 200 MHz part containing 2.3 million transistors, measuring 134 mm² and fabricated in a 0.3 µm process, was introduced in mid-1994. The R4400PC was priced at $1,600, the R4400SC at $1,950, and the R4400MC at $2,150 in quantities of 10,000.
Users
The R4400 was used by:
- Carrera Computers in their Windows NT personal computers and workstations
- Concurrent Computer Corporation in their real-time multiprocessor Maxion systems
- DeskStation Technology in their Windows NT personal computers and DeskStation Tyne workstation
- NEC Corporation in their RISCstation workstations, RISCserver servers, and Cenju-3 supercomputer
- NeTPower in their Windows NT workstations and servers
- Pyramid Technology, which used the R4400MC in their Nile Series servers
- Siemens Nixdorf Informationssysteme (SNI) in their RM-series UNIX servers and SR2000 mainframe
- Silicon Graphics, Inc. in their Indigo, Indigo2, and Indy workstations, and in their Challenge server
- Tandem Computers in their NonStop Himalaya fault-tolerant servers
Core logic chipsets
The R4000 and R4400 required external core logic to interface to the system. Both commercially available and proprietary core logic were developed for these microprocessors. Proprietary designs were developed by system vendors such as SGI for use in their own systems. Commercial chipsets were developed by Acer and by the MIPS microprocessor fabricators NEC and Toshiba. Acer developed the PICA chipset. Toshiba developed the Tiger Shark chipset, which adapted the SysAD bus to an i486-compatible system bus.