Microcode
Encyclopedia
Microcode is a layer of hardware-level instructions and/or data structures involved in the implementation of higher level machine code
instructions in many computers and other processors; it resides in special high-speed memory and translates machine instructions into sequences of detailed circuit-level operations. It helps separate the machine instructions from the underlying electronics
so that instructions can be designed and altered more freely. It also makes it feasible to build complex multi-step instructions while still reducing the complexity of the electronic circuitry compared to other methods. Writing microcode is often called microprogramming and the microcode in a particular processor implementation is sometimes called a microprogram.
Modern microcode is normally written by an engineer during the processor design phase and stored in a ROM (read-only memory
) or PLA (programmable logic array
) structure,
or a combination of both.
However, machines also exists which have some (or all) microcode stored in SRAM
or flash memory
. This is traditionally denoted a writeable control store
in the context of computers. Complex digital processors may also employ more than one (possibly microcode based) control unit in order to delegate sub-tasks which must be performed (more or less) asynchronously in parallel. Microcode is generally not visible or changeable by a normal programmer, not even by an assembly
programmer. Unlike machine code which often retains some compatibility among different processors in a family, microcode only runs on the exact electronic circuit
ry for which it is designed, as it constitutes an inherent part of the particular processor design itself.
More extensive microcoding has also been used to allow small and simple microarchitecture
s to emulate
more powerful architectures with wider word length, more execution unit
s and so on; a relatively simple way to achieve software compatibility between different products in a processor family.
Some hardware vendors, especially IBM
, use the term as a synonym for firmware
, so that all code in a device, whether microcode or machine code
, is termed microcode (such as in a hard drive for instance, which typically contains both).
The microcode usually does not reside in the main memory, but in a special high speed memory, called the control store
. It might be either read-only
or read-write memory
. In the latter case the microcode would be loaded into the control store from some other storage medium as part of the initialization of the CPU, and it could be altered to correct bugs in the instruction set, or to implement new machine instructions.
Microprograms consist of series of microinstructions. These microinstructions control the CPU at a very fundamental level of hardware circuitry. For example, a single typical microinstruction might specify the following operations:
To simultaneously control all processor's features in one cycle, the microinstruction is often wider than 50 bits, e.g., 128 bits on a 360/85 with an emulator feature. Microprograms are carefully designed and optimized for the fastest possible execution, since a slow microprogram would yield a slow machine instruction which would in turn cause all programs using that instruction to be slow.
s were "hardwired". Each step needed to fetch, decode and execute the machine instructions (including any operand address calculations, reads and writes) was controlled directly by combinatorial logic and rather minimal sequential state machine circuitry. While very efficient, the need for powerful instruction sets with multi-step addressing and complex operations (see below) made such "hard-wired" processors difficult to design and debug; highly encoded and varied-length instructions can contribute to this as well, especially when very irregular encodings are used.
Microcode simplified the job by allowing much of the processor's behaviour and programming model to be defined via microprogram routines rather than by dedicated circuitry. Even late in the design process, microcode could easily be changed, whereas hard wired CPU designs were very cumbersome to change, so this greatly facilitated CPU design.
From the 1940s to the late 1970s, much programming was done in assembly language
; higher level instructions meant greater programmer productivity, so an important advantage of microcode was the relative ease by which powerful machine instructions could be defined. During the 1970s, CPU speeds grew more quickly than memory speeds and numerous techniques such as memory block transfer, memory pre-fetch and multi-level cache
s were used to alleviate this. High level machine instructions, made possible by microcode, helped further, as fewer more complex machine instructions require less memory bandwidth. For example, an operation on a character string could be done as a single machine instruction, thus avoiding multiple instruction fetches.
Architectures with instruction sets implemented by complex microprograms included the IBM
System/360
and Digital Equipment Corporation
VAX
. The approach of increasingly complex microcode-implemented instruction sets was later called CISC
. An alternate approach, used in many microprocessor
s, is to use PLA
s and/or ROM
s (instead of combinatorial logic) mainly for instruction decoding, and let a simple state machine (without much, or any, microcode) do most of the sequencing. The various practical uses of microcode and related techniques (such as PLAs) have been numerous over the years, as well as approaches to where, and to which extent, it should be used. It is still used in modern CPU designs.
Doing so is important if binary program compatibility is a priority. That way previously existing programs can run on totally new hardware without requiring revision and recompilation. However there may be a performance penalty for this approach. The tradeoffs between application backward compatibility vs CPU performance are hotly debated by CPU design engineers.
The IBM System/360 has a 32-bit architecture with 16 general-purpose registers, but most of the System/360 implementations actually use hardware that implemented a much simpler underlying microarchitecture; for example, the System/360 Model 30 had 8-bit data paths to the arithmetic logic unit
(ALU) and main memory and implemented the general-purpose registers in a special unit of higher-speed core memory, and the System/360 Model 40 had 8-bit data paths to the ALU and 16-bit data paths to main memory and also implemented the general-purpose registers in a special unit of higher-speed core memory. The Model 50 and Model 65 had full 32-bit data paths; the Model 50 implemented the general-purpose registers in a special unit of higher-speed core memory and the Model 65 implemented the general-purpose registers in faster transistor circuits. In this way, microprogramming enabled IBM to design many System/360 models with substantially different hardware and spanning a wide range of cost and performance, while making them all architecturally compatible. This dramatically reduced the amount of unique system software that had to be written for each model.
A similar approach was used by Digital Equipment Corporation in their VAX family of computers. Initially a 32-bit TTL
processor in conjunction with supporting microcode implemented the programmer-visible architecture. Later VAX versions used different microarchitectures, yet the programmer-visible architecture did not change.
Microprogramming also reduced the cost of field changes to correct defects (bugs) in the processor; a bug could often be fixed by replacing a portion of the microprogram rather than by changes being made to hardware logic and wiring.
introduced the concept of a control store
as a way to simplify computer design and move beyond ad hoc
methods. The control store was a diode matrix
: a two-dimensional lattice, where one dimension accepted "control time pulses" from the CPU's internal clock, and the other connected to control signals on gates and other circuits. A "pulse distributor" would take the pulses generated by the CPU clock and break them up into eight separate time pulses, each of which would activate a different row of the lattice. When the row was activated, it would activate the control signals connected to it.
Described another way, the signals transmitted by the control store are being played much like a player piano
roll. That is, they are controlled by a sequence of very wide words constructed of bit
s, and they are "played" sequentially. In a control store, however, the "song" is short and repeated continuously.
In 1951 Maurice Wilkes enhanced this concept by adding conditional execution, a concept akin to a conditional
in computer software. His initial implementation consisted of a pair of matrices, the first one generated signals in the manner of the Whirlwind control store, while the second matrix selected which row of signals (the microprogram instruction word, as it were) to invoke on the next cycle. Conditionals were implemented by providing a way that a single line in the control store could choose from alternatives in the second matrix. This made the control signals conditional on the detected internal signal. Wilkes coined the term microprogramming to describe this feature and distinguish it from a simple control store.
To take advantage of this, computers were divided into several parts:
A microsequencer
picked the next word of the control store
. A sequencer is mostly a counter, but usually also has some way to jump to a different part of the control store depending on some data, usually data from the instruction register
and always some part of the control store. The simplest sequencer is just a register loaded from a few bits of the control store.
A register
set is a fast memory containing the data of the central processing unit. It may include the program counter, stack pointer, and other numbers that are not easily accessible to the application programmer. Often the register set is a triple-ported register file
, that is, two registers can be read, and a third written at the same time.
An arithmetic and logic unit performs calculations, usually addition, logical negation, a right shift, and logical AND. It often performs other functions, as well.
There may also be a memory address register
and a memory data register
, used to access the main computer storage
.
Together, these elements form an "execution unit
". Most modern CPUs
have several execution units. Even simple computers usually have one unit to read and write memory, and another to execute user code.
These elements could often be bought together as a single chip. This chip came in a fixed width which would form a 'slice' through the execution unit. These were known as 'bit slice
' chips. The AMD Am2900
family is one of the best known examples of bit slice elements.
The parts of the execution units, and the execution units themselves are interconnected by a bundle of wires called a bus
.
Programmers develop microprograms. The basic tools are software: A microassembler
allows a programmer to define the table of bits symbolically. A simulator program executes the bits in the same way as the electronics (hopefully), and allows much more freedom to debug the microprogram.
After the microprogram is finalized, and extensively tested, it is sometimes used as the input to a computer program that constructs logic to produce the same data. This program is similar to those used to optimize a programmable logic array
. No known computer program can produce optimal logic, but even pretty good logic can vastly reduce the number of transistors from the number required for a ROM control store. This reduces the cost and power used by a CPU.
Microcode can be characterized as horizontal or vertical. This refers primarily to whether each microinstruction directly controls CPU elements (horizontal microcode), or requires subsequent decoding by combinatorial logic before doing so (vertical microcode). Consequently each horizontal microinstruction is wider (contains more bits) and occupies more storage space than a vertical microinstruction.
In a typical implementation a horizontal microprogram word comprises fairly tightly defined groups of bits. For example, one simple arrangement might be:
For this type of micromachine to implement a JUMP instruction with the address following the opcode, the microcode might require two clock ticks; the engineer designing it would write microassembler source code looking something like this:
# Any line starting with a number-sign is a comment
# This is just a label, the ordinary way assemblers symbolically represent a
# memory address.
InstructionJUMP:
# To prepare for the next instruction, the instruction-decode microcode has already
# moved the program counter to the memory address register. This instruction fetches
# the target address of the jump instruction from the memory word following the
# jump opcode, by copying from the memory data register to the memory address register.
# This gives the memory system two clock ticks to fetch the next
# instruction to the memory data register for use by the instruction decode.
# The sequencer instruction "next" means just add 1 to the control word address.
MDR, NONE, MAR, COPY, NEXT, NONE
# This places the address of the next instruction into the PC.
# This gives the memory system a clock tick to finish the fetch started on the
# previous microinstruction.
# The sequencer instruction is to jump to the start of the instruction decode.
MAR, 1, PC, ADD, JMP, InstructionDecode
# The instruction decode is not shown, because it is usually a mess, very particular
# to the exact processor being emulated. Even this example is simplified.
# Many CPUs have several ways to calculate the address, rather than just fetching
# it from the word following the op-code. Therefore, rather than just one
# jump instruction, those CPUs have a family of related jump instructions.
For each tick it is common to find that only some portions of the CPU are used, with the remaining groups of bits in the microinstruction being no-ops. With careful design of hardware and microcode this property can be exploited to parallelise operations which use different areas of the CPU, for example in the case above the ALU is not required during the first tick so it could potentially be used to complete an earlier arithmetic instruction.
Some vertical microcodes are just the assembly language of a simple conventional computer that is emulating a more complex computer.
Another form of vertical microcode has two fields:
The "field select" selects which part of the CPU will be controlled by this word of the control store.
The "field value" actually controls that part of the CPU.
With this type of microcode, a designer explicitly chooses to make a slower CPU to save money by reducing the unused bits in the control store;
however, the reduced complexity may increase the CPU's clock frequency, which lessens the effect of an increased number of cycles per instruction.
As transistors became cheaper, horizontal microcode came to dominate the design of CPUs using microcode, with vertical microcode no longer being used.
Such a computer is sometimes called a Writable Instruction Set Computer or WISC.
Many of these machines were experimental laboratory prototypes, such as the WISC CPU/16
and the RTX 32P.
There were also commercial machines that used writable microcode, such as early Xerox
workstations, the DEC
VAX
8800 ("Nautilus") family, the Symbolics
L- and G-machines, and a number of IBM
System/370
implementations. Some DEC PDP-10
machines stored their microcode in SRAM chips (about 80 bits wide x 2 Kwords), which was typically loaded on power-on through some other front-end CPU. Many more machines offered user-programmable writable control stores as an option (including the HP
2100
, DEC PDP-11/60
and Varian Data Machines
V-70 series minicomputer
s).
The Mentec M11 and Mentec M1 stored its microcode in SRAM chips, loaded on power-on through another CPU.
The Data General Eclipse MV/8000
("Eagle") had a SRAM writable control store, loaded on power-on through another CPU.
WCS offered several advantages including the ease of patching the microprogram and, for certain hardware generations, faster access than ROMs could provide. User-programmable WCS allowed the user to optimize the machine for specific purposes.
Some CPU designs compile the instruction set to a writable RAM
or FLASH
inside the CPU (such as the Rekursiv
processor and the Imsys Cjip), or an FPGA (reconfigurable computing
).
A CPU that uses microcode generally takes several clock cycles to execute a single instruction, one clock cycle for each step in the microprogram for that instruction. Some CISC
processors include instructions that can take a very long time to execute. Such variations interfere with both interrupt
latency
and, what is far more important in modern systems, pipelining.
Several Intel CPUs in the x86 architecture family have writable microcode.
This has allowed bugs in the Intel Core 2
microcode and Intel Xeon
microcode to be fixed in software, rather than requiring the entire chip to be replaced.
Such fixes can be installed by Linux, FreeBSD
, Microsoft Windows, or the motherboard BIOS.
It should be mentioned that there are counter-points as well:
Many RISC and VLIW
processors are designed to execute every instruction (as long as it is in the cache) in a single cycle. This is very similar to the way CPUs with microcode execute one microinstruction per cycle. VLIW
processors have instructions that behave similarly to very wide horizontal microcode, although typically without such fine-grained control over the hardware as provided by microcode. RISC instructions are sometimes similar to the narrow vertical microcode.
Machine code
Machine code or machine language is a system of impartible instructions executed directly by a computer's central processing unit. Each instruction performs a very specific task, typically either an operation on a unit of data Machine code or machine language is a system of impartible instructions...
instructions in many computers and other processors; it resides in special high-speed memory and translates machine instructions into sequences of detailed circuit-level operations. It helps separate the machine instructions from the underlying electronics
Electronics
Electronics is the branch of science, engineering and technology that deals with electrical circuits involving active electrical components such as vacuum tubes, transistors, diodes and integrated circuits, and associated passive interconnection technologies...
so that instructions can be designed and altered more freely. It also makes it feasible to build complex multi-step instructions while still reducing the complexity of the electronic circuitry compared to other methods. Writing microcode is often called microprogramming and the microcode in a particular processor implementation is sometimes called a microprogram.
Modern microcode is normally written by an engineer during the processor design phase and stored in a ROM (read-only memory
Read-only memory
Read-only memory is a class of storage medium used in computers and other electronic devices. Data stored in ROM cannot be modified, or can be modified only slowly or with difficulty, so it is mainly used to distribute firmware .In its strictest sense, ROM refers only...
) or PLA (programmable logic array
Programmable logic array
A programmable logic array is a kind of programmable logic device used to implement combinational logic circuits. The PLA has a set of programmable AND gate planes, which link to a set of programmable OR gate planes, which can then be conditionally complemented to produce an output...
) structure,
or a combination of both.
However, machines also exists which have some (or all) microcode stored in SRAM
Static random access memory
Static random-access memory is a type of semiconductor memory where the word static indicates that, unlike dynamic RAM , it does not need to be periodically refreshed, as SRAM uses bistable latching circuitry to store each bit...
or flash memory
Flash memory
Flash memory is a non-volatile computer storage chip that can be electrically erased and reprogrammed. It was developed from EEPROM and must be erased in fairly large blocks before these can be rewritten with new data...
. This is traditionally denoted a writeable control store
Writeable control store
Writeable control store may refer to* Writable control store , memory used to load the operating system of the Amiga 1000* Writable control store, memory used to store microprograms on machines where the microcode is reloadable...
in the context of computers. Complex digital processors may also employ more than one (possibly microcode based) control unit in order to delegate sub-tasks which must be performed (more or less) asynchronously in parallel. Microcode is generally not visible or changeable by a normal programmer, not even by an assembly
Assembly language
An assembly language is a low-level programming language for computers, microprocessors, microcontrollers, and other programmable devices. It implements a symbolic representation of the machine codes and other constants needed to program a given CPU architecture...
programmer. Unlike machine code which often retains some compatibility among different processors in a family, microcode only runs on the exact electronic circuit
Electronic circuit
An electronic circuit is composed of individual electronic components, such as resistors, transistors, capacitors, inductors and diodes, connected by conductive wires or traces through which electric current can flow...
ry for which it is designed, as it constitutes an inherent part of the particular processor design itself.
More extensive microcoding has also been used to allow small and simple microarchitecture
Microarchitecture
In computer engineering, microarchitecture , also called computer organization, is the way a given instruction set architecture is implemented on a processor. A given ISA may be implemented with different microarchitectures. Implementations might vary due to different goals of a given design or...
s to emulate
Emulator
In computing, an emulator is hardware or software or both that duplicates the functions of a first computer system in a different second computer system, so that the behavior of the second system closely resembles the behavior of the first system...
more powerful architectures with wider word length, more execution unit
Execution unit
In computer engineering, an execution unit is a part of a CPU that performs the operations and calculations called for by the Branch Unit, which receives data from the CPU...
s and so on; a relatively simple way to achieve software compatibility between different products in a processor family.
Some hardware vendors, especially IBM
IBM
International Business Machines Corporation or IBM is an American multinational technology and consulting corporation headquartered in Armonk, New York, United States. IBM manufactures and sells computer hardware and software, and it offers infrastructure, hosting and consulting services in areas...
, use the term as a synonym for firmware
Firmware
In electronic systems and computing, firmware is a term often used to denote the fixed, usually rather small, programs and/or data structures that internally control various electronic devices...
, so that all code in a device, whether microcode or machine code
Machine code
Machine code or machine language is a system of impartible instructions executed directly by a computer's central processing unit. Each instruction performs a very specific task, typically either an operation on a unit of data Machine code or machine language is a system of impartible instructions...
, is termed microcode (such as in a hard drive for instance, which typically contains both).
Overview
The elements composing a microprogram exist on a lower conceptual level than a normal application program. Each element is differentiated by the "micro" prefix to avoid confusion: microinstruction, microassembler, microprogrammer, microarchitecture, etc.The microcode usually does not reside in the main memory, but in a special high speed memory, called the control store
Control store
A control store is the part of a CPU's control unit that stores the CPU's microprogram. It is usually accessed by a microsequencer. Early control stores were implemented as a diode-array accessed via address decoders, a form of read-only memory. This tradition dates back to the program timing...
. It might be either read-only
Read-only memory
Read-only memory is a class of storage medium used in computers and other electronic devices. Data stored in ROM cannot be modified, or can be modified only slowly or with difficulty, so it is mainly used to distribute firmware .In its strictest sense, ROM refers only...
or read-write memory
Read-write memory
Read-write memory is a type of computer memory that may be relatively easily written to as well as read from . The term RAM is often used to describe writable memory. RAM actually referring to memory that can be accessed at any "location"....
. In the latter case the microcode would be loaded into the control store from some other storage medium as part of the initialization of the CPU, and it could be altered to correct bugs in the instruction set, or to implement new machine instructions.
Microprograms consist of series of microinstructions. These microinstructions control the CPU at a very fundamental level of hardware circuitry. For example, a single typical microinstruction might specify the following operations:
- Connect Register 1 to the "A" side of the ALUArithmetic logic unitIn computing, an arithmetic logic unit is a digital circuit that performs arithmetic and logical operations.The ALU is a fundamental building block of the central processing unit of a computer, and even the simplest microprocessors contain one for purposes such as maintaining timers...
- Connect Register 7 to the "B" side of the ALUArithmetic logic unitIn computing, an arithmetic logic unit is a digital circuit that performs arithmetic and logical operations.The ALU is a fundamental building block of the central processing unit of a computer, and even the simplest microprocessors contain one for purposes such as maintaining timers...
- Set the ALU to perform two's-complement addition
- Set the ALU's carry input to zero
- Store the result value in Register 8
- Update the "condition codes" with the ALU status flags ("Negative", "Zero", "Overflow", and "Carry")
- Microjump to MicroPCProgram counterThe program counter , commonly called the instruction pointer in Intel x86 microprocessors, and sometimes called the instruction address register, or just part of the instruction sequencer in some computers, is a processor register that indicates where the computer is in its instruction sequence...
nnn for the next microinstruction
To simultaneously control all processor's features in one cycle, the microinstruction is often wider than 50 bits, e.g., 128 bits on a 360/85 with an emulator feature. Microprograms are carefully designed and optimized for the fastest possible execution, since a slow microprogram would yield a slow machine instruction which would in turn cause all programs using that instruction to be slow.
The reason for microprogramming
Microcode was originally developed as a simpler method of developing the control logic for a computer. Initially CPU instruction setInstruction set
An instruction set, or instruction set architecture , is the part of the computer architecture related to programming, including the native data types, instructions, registers, addressing modes, memory architecture, interrupt and exception handling, and external I/O...
s were "hardwired". Each step needed to fetch, decode and execute the machine instructions (including any operand address calculations, reads and writes) was controlled directly by combinatorial logic and rather minimal sequential state machine circuitry. While very efficient, the need for powerful instruction sets with multi-step addressing and complex operations (see below) made such "hard-wired" processors difficult to design and debug; highly encoded and varied-length instructions can contribute to this as well, especially when very irregular encodings are used.
Microcode simplified the job by allowing much of the processor's behaviour and programming model to be defined via microprogram routines rather than by dedicated circuitry. Even late in the design process, microcode could easily be changed, whereas hard wired CPU designs were very cumbersome to change, so this greatly facilitated CPU design.
From the 1940s to the late 1970s, much programming was done in assembly language
Assembly language
An assembly language is a low-level programming language for computers, microprocessors, microcontrollers, and other programmable devices. It implements a symbolic representation of the machine codes and other constants needed to program a given CPU architecture...
; higher level instructions meant greater programmer productivity, so an important advantage of microcode was the relative ease by which powerful machine instructions could be defined. During the 1970s, CPU speeds grew more quickly than memory speeds and numerous techniques such as memory block transfer, memory pre-fetch and multi-level cache
Memory hierarchy
The term memory hierarchy is used in the theory of computation when discussing performance issues in computer architectural design, algorithm predictions, and the lower level programming constructs such as involving locality of reference. A 'memory hierarchy' in computer storage distinguishes each...
s were used to alleviate this. High level machine instructions, made possible by microcode, helped further, as fewer more complex machine instructions require less memory bandwidth. For example, an operation on a character string could be done as a single machine instruction, thus avoiding multiple instruction fetches.
Architectures with instruction sets implemented by complex microprograms included the IBM
IBM
International Business Machines Corporation or IBM is an American multinational technology and consulting corporation headquartered in Armonk, New York, United States. IBM manufactures and sells computer hardware and software, and it offers infrastructure, hosting and consulting services in areas...
System/360
System/360
The IBM System/360 was a mainframe computer system family first announced by IBM on April 7, 1964, and sold between 1964 and 1978. It was the first family of computers designed to cover the complete range of applications, from small to large, both commercial and scientific...
and Digital Equipment Corporation
Digital Equipment Corporation
Digital Equipment Corporation was a major American company in the computer industry and a leading vendor of computer systems, software and peripherals from the 1960s to the 1990s...
VAX
VAX
VAX was an instruction set architecture developed by Digital Equipment Corporation in the mid-1970s. A 32-bit complex instruction set computer ISA, it was designed to extend or replace DEC's various Programmed Data Processor ISAs...
. The approach of increasingly complex microcode-implemented instruction sets was later called CISC
Complex instruction set computer
A complex instruction set computer , is a computer where single instructions can execute several low-level operations and/or are capable of multi-step operations or addressing modes within single instructions...
. An alternate approach, used in many microprocessor
Microprocessor
A microprocessor incorporates the functions of a computer's central processing unit on a single integrated circuit, or at most a few integrated circuits. It is a multipurpose, programmable device that accepts digital data as input, processes it according to instructions stored in its memory, and...
s, is to use PLA
Programmable logic array
A programmable logic array is a kind of programmable logic device used to implement combinational logic circuits. The PLA has a set of programmable AND gate planes, which link to a set of programmable OR gate planes, which can then be conditionally complemented to produce an output...
s and/or ROM
Read-only memory
Read-only memory is a class of storage medium used in computers and other electronic devices. Data stored in ROM cannot be modified, or can be modified only slowly or with difficulty, so it is mainly used to distribute firmware .In its strictest sense, ROM refers only...
s (instead of combinatorial logic) mainly for instruction decoding, and let a simple state machine (without much, or any, microcode) do most of the sequencing. The various practical uses of microcode and related techniques (such as PLAs) have been numerous over the years, as well as approaches to where, and to which extent, it should be used. It is still used in modern CPU designs.
Other benefits
A processor's microprograms operate on a more primitive, totally different and much more hardware-oriented architecture than the assembly instructions visible to normal programmers. In coordination with the hardware, the microcode implements the programmer-visible architecture. The underlying hardware need not have a fixed relationship to the visible architecture. This makes it possible to implement a given instruction set architecture on a wide variety of underlying hardware micro-architectures.Doing so is important if binary program compatibility is a priority. That way previously existing programs can run on totally new hardware without requiring revision and recompilation. However there may be a performance penalty for this approach. The tradeoffs between application backward compatibility vs CPU performance are hotly debated by CPU design engineers.
The IBM System/360 has a 32-bit architecture with 16 general-purpose registers, but most of the System/360 implementations actually use hardware that implemented a much simpler underlying microarchitecture; for example, the System/360 Model 30 had 8-bit data paths to the arithmetic logic unit
Arithmetic logic unit
In computing, an arithmetic logic unit is a digital circuit that performs arithmetic and logical operations.The ALU is a fundamental building block of the central processing unit of a computer, and even the simplest microprocessors contain one for purposes such as maintaining timers...
(ALU) and main memory and implemented the general-purpose registers in a special unit of higher-speed core memory, and the System/360 Model 40 had 8-bit data paths to the ALU and 16-bit data paths to main memory and also implemented the general-purpose registers in a special unit of higher-speed core memory. The Model 50 and Model 65 had full 32-bit data paths; the Model 50 implemented the general-purpose registers in a special unit of higher-speed core memory and the Model 65 implemented the general-purpose registers in faster transistor circuits. In this way, microprogramming enabled IBM to design many System/360 models with substantially different hardware and spanning a wide range of cost and performance, while making them all architecturally compatible. This dramatically reduced the amount of unique system software that had to be written for each model.
A similar approach was used by Digital Equipment Corporation in their VAX family of computers. Initially a 32-bit TTL
Transistor-transistor logic
Transistor–transistor logic is a class of digital circuits built from bipolar junction transistors and resistors. It is called transistor–transistor logic because both the logic gating function and the amplifying function are performed by transistors .TTL is notable for being a widespread...
processor in conjunction with supporting microcode implemented the programmer-visible architecture. Later VAX versions used different microarchitectures, yet the programmer-visible architecture did not change.
Microprogramming also reduced the cost of field changes to correct defects (bugs) in the processor; a bug could often be fixed by replacing a portion of the microprogram rather than by changes being made to hardware logic and wiring.
History
In 1947, the design of the MIT WhirlwindWhirlwind (computer)
The Whirlwind computer was developed at the Massachusetts Institute of Technology. It is the first computer that operated in real time, used video displays for output, and the first that was not simply an electronic replacement of older mechanical systems...
introduced the concept of a control store
Control store
A control store is the part of a CPU's control unit that stores the CPU's microprogram. It is usually accessed by a microsequencer. Early control stores were implemented as a diode-array accessed via address decoders, a form of read-only memory. This tradition dates back to the program timing...
as a way to simplify computer design and move beyond ad hoc
Ad hoc
Ad hoc is a Latin phrase meaning "for this". It generally signifies a solution designed for a specific problem or task, non-generalizable, and not intended to be able to be adapted to other purposes. Compare A priori....
methods. The control store was a diode matrix
Diode matrix
A diode matrix is a two-dimensional grid of wires, where each "intersection" where one row crosses over another either has a diode connecting them, or the wires are isolated from each other....
: a two-dimensional lattice, where one dimension accepted "control time pulses" from the CPU's internal clock, and the other connected to control signals on gates and other circuits. A "pulse distributor" would take the pulses generated by the CPU clock and break them up into eight separate time pulses, each of which would activate a different row of the lattice. When the row was activated, it would activate the control signals connected to it.
Described another way, the signals transmitted by the control store are being played much like a player piano
Player piano
A player piano is a self-playing piano, containing a pneumatic or electro-mechanical mechanism that operates the piano action via pre-programmed music perforated paper, or in rare instances, metallic rolls. The rise of the player piano grew with the rise of the mass-produced piano for the home in...
roll. That is, they are controlled by a sequence of very wide words constructed of bit
Bit
A bit is the basic unit of information in computing and telecommunications; it is the amount of information stored by a digital device or other physical system that exists in one of two possible distinct states...
s, and they are "played" sequentially. In a control store, however, the "song" is short and repeated continuously.
In 1951 Maurice Wilkes enhanced this concept by adding conditional execution, a concept akin to a conditional
Conditional statement
In computer science, conditional statements, conditional expressions and conditional constructs are features of a programming language which perform different computations or actions depending on whether a programmer-specified boolean condition evaluates to true or false...
in computer software. His initial implementation consisted of a pair of matrices, the first one generated signals in the manner of the Whirlwind control store, while the second matrix selected which row of signals (the microprogram instruction word, as it were) to invoke on the next cycle. Conditionals were implemented by providing a way that a single line in the control store could choose from alternatives in the second matrix. This made the control signals conditional on the detected internal signal. Wilkes coined the term microprogramming to describe this feature and distinguish it from a simple control store.
Examples of microprogrammed systems
- In common with many other complex mechanical devices, Charles Babbage'sCharles BabbageCharles Babbage, FRS was an English mathematician, philosopher, inventor and mechanical engineer who originated the concept of a programmable computer...
analytical engineAnalytical engineThe Analytical Engine was a proposed mechanical general-purpose computer designed by English mathematician Charles Babbage. It was first described in 1837 as the successor to Babbage's difference engine, a design for a mechanical calculator...
used banks of cams to control each operation, i.e. it had a read-only control store. As such it deserves to be recognised as the first microprogrammed computer to be designed, even if it has not yet been realised in hardware. - The EMIDEC 1100EMIDEC 1100-External links:**...
reputedly used a hard-wired control store consisting of wires threaded through ferrite cores, known as 'the laces'. - Most models of the IBM System/360 series were microprogrammed:
- The Model 25 was unique among System/360 models in using the top 16k bytes of core storage to hold the control storage for the microprogram. The 2025 used a 16-bit microarchitecture with seven control words (or microinstructions). At power up, or full system reset, the microcode was loaded from the card reader. The IBM 1410 emulation for this model was loaded this way.
- The Model 30IBM 2030The IBM System/360 Model 30 was a popular IBM mainframe announced in 1964 across the world as the least powerful of the System/360 range of four compatible computers – the first range in the world to allow programs to be written that could be used across a range of computers...
, the slowest model in the line, used an 8-bit microarchitecture with only a few hardware registers; everything that the programmer saw was emulated by the microprogram. The microcode for this model was also held on special punched cards, which were stored inside the machine in a dedicated reader per card, called "CROS" units (Capacitor Read-Only Storage). A second CROS reader was installed for machines ordered with 1620 emulation. - The Model 40 used 56-bit control words. The 2040 box implements both the System/360 main processor and the multiplex channel (the I/O processor). This model used "TROS" dedicated readers similar to "CROS" units, but with an inductive pickup (Transformer Read-only Store).
- The Model 50 had two internal datapaths which operated in parallel: a 32-bit datapath used for arithmetic operations, and an 8-bit data path used in some logical operations. The control store used 90-bit microinstructions.
- The Model 85 had separate instruction fetch (I-unit) and execution (E-unit) to provide high performance. The I-unit is hardware controlled. The E-unit is microprogrammed; the control words are 108 bits wide on a basic 360/85 and wider if an emulator feature is installed.
- The NCR 315NCR 315The NCR 315 Data Processing System, released in January 1962 by NCR, was a second-generation computer. All printed circuit boards used resistor-transistor logic to create the various logic elements. It used 12-bit slab memory structure using core memory. The instructions could use a memory slab as...
was microprogrammed with hand wired ferrite cores (a ROMRead-only memoryRead-only memory is a class of storage medium used in computers and other electronic devices. Data stored in ROM cannot be modified, or can be modified only slowly or with difficulty, so it is mainly used to distribute firmware .In its strictest sense, ROM refers only...
) pulsed by a sequencer with conditional execution. Wires routed through the cores were enabled for various data and logic elements in the processor. - The Digital Equipment Corporation PDP-11PDP-11The PDP-11 was a series of 16-bit minicomputers sold by Digital Equipment Corporation from 1970 into the 1990s, one of a succession of products in the PDP series. The PDP-11 replaced the PDP-8 in many real-time applications, although both product lines lived in parallel for more than 10 years...
processors, with the exception of the PDP-11/20, were microprogrammed. - Many systems from the Burroughs were microprogrammed:
- The B700 "microprocessor" executed application-level opcodes using sequences of 16-bit microinstructions stored in main memory, each of these was either a register-load operation or mapped to a single 56-bit "nanocode" instruction stored in read-only memory. This allowed comparatively simple hardware to act either as a mainframe peripheral controller or to be packaged as a standalone computer.
- The B1700 was implemented with radically different hardware including bit-addressable main memory but had a similar multi-layer organisation. The operating system would preload the interpreter for whatever language was required. These interpreters presented different virtual machines for COBOL, Fortran, etc.
- Microdata produced computers in which the microcode was accessible to the user; this allowed the creation of custom assembler level instructions. Microdata's RealityPick operating systemThe Pick operating system is a demand-paged, multiuser, virtual memory, time-sharing operating system based around a unique "multivalued" database. Pick is used primarily for business data processing...
operating system design made extensive use of this capability. - The Nintendo 64Nintendo 64The , often referred to as N64, was Nintendo′s third home video game console for the international market. Named for its 64-bit CPU, it was released in June 1996 in Japan, September 1996 in North America, March 1997 in Europe and Australia, September 1997 in France and December 1997 in Brazil...
's Reality Co-Processor, which serves as the console's graphics processing unitGraphics processing unitA graphics processing unit or GPU is a specialized circuit designed to rapidly manipulate and alter memory in such a way so as to accelerate the building of images in a frame buffer intended for output to a display...
and audio processor, utilized microcode; it is possible to implement new effects or tweak the processor to achieve the desired output. Some well-known examples of custom microcode include Factor 5Factor 5Factor 5 GmbH is an independent software and video game developer. The company was originally co-founded by five former Rainbow Arts employees in 1987 in Cologne, Germany, which served as the inspiration behind the studio's name....
's N64 port of the Indiana Jones and the Infernal MachineIndiana Jones and the Infernal MachineIndiana Jones and the Infernal Machine is a multi-platform action-adventure video game by LucasArts released in late 1999. The first 3D installment in the series, its gameplay focuses on solving puzzles, fighting enemies, and various platforming sections...
, Star Wars: Rogue SquadronStar Wars: Rogue SquadronStar Wars: Rogue Squadron is an arcade-style action game co-developed by Factor 5 and LucasArts. The first of three games in the Rogue Squadron series, it was published by LucasArts and Nintendo and released for Windows and the Nintendo 64 in December 1998...
and Star Wars: Battle for NabooStar Wars: Battle for NabooStar Wars: Episode I: Battle for Naboo is an arcade-style action game co-developed by Factor 5 and LucasArts; LucasArts supplied most of the art and level-design, while Factor 5 provided the programming, tools, sound, and most of the cut-scene and art post-production work...
. - The VU0 and VU1 vector units in the SonySony, commonly referred to as Sony, is a Japanese multinational conglomerate corporation headquartered in Minato, Tokyo, Japan and the world's fifth largest media conglomerate measured by revenues....
PlayStation 2PlayStation 2The PlayStation 2 is a sixth-generation video game console manufactured by Sony as part of the PlayStation series. Its development was announced in March 1999 and it was first released on March 4, 2000, in Japan...
are microprogrammable; in fact, VU1 was only accessible via microcode for the first several generations of the SDK.
Implementation
Each microinstruction in a microprogram provides the bits which control the functional elements that internally compose a CPU. The advantage over a hard-wired CPU is that internal CPU control becomes a specialized form of a computer program. Microcode thus transforms a complex electronic design challenge (the control of a CPU) into a less-complex programming challenge.To take advantage of this, computers were divided into several parts:
A microsequencer
Microsequencer
In computer architecture and engineering, a sequencer or microsequencer is a part of the control unit of a CPU. It generates the addresses used to step through the microprogram of a control store....
picked the next word of the control store
Control store
A control store is the part of a CPU's control unit that stores the CPU's microprogram. It is usually accessed by a microsequencer. Early control stores were implemented as a diode-array accessed via address decoders, a form of read-only memory. This tradition dates back to the program timing...
. A sequencer is mostly a counter, but usually also has some way to jump to a different part of the control store depending on some data, usually data from the instruction register
Instruction register
In computing, an instruction register is the part of a CPU's control unit that stores the instruction currently being executed or decoded. In simple processors each instruction to be executed is loaded into the instruction register which holds it while it is decoded, prepared and ultimately...
and always some part of the control store. The simplest sequencer is just a register loaded from a few bits of the control store.
A register
Processor register
In computer architecture, a processor register is a small amount of storage available as part of a CPU or other digital processor. Such registers are addressed by mechanisms other than main memory and can be accessed more quickly...
set is a fast memory containing the data of the central processing unit. It may include the program counter, stack pointer, and other numbers that are not easily accessible to the application programmer. Often the register set is a triple-ported register file
Register file
A register file is an array of processor registers in a central processing unit . Modern integrated circuit-based register files are usually implemented by way of fast static RAMs with multiple ports...
, that is, two registers can be read, and a third written at the same time.
An arithmetic and logic unit performs calculations, usually addition, logical negation, a right shift, and logical AND. It often performs other functions, as well.
There may also be a memory address register
Memory address register
The Memory Address Register is a CPU register that either stores the memory address from which data will be fetched to the CPU or the address to which data will be sent and stored....
and a memory data register
Memory data register
The Memory Data Register is the register of a computer's control unit that contains the data to be stored in the computer storage , or the data after a fetch from the computer storage...
, used to access the main computer storage
Computer storage
Computer data storage, often called storage or memory, refers to computer components and recording media that retain digital data. Data storage is one of the core functions and fundamental components of computers....
.
Together, these elements form an "execution unit
Execution unit
In computer engineering, an execution unit is a part of a CPU that performs the operations and calculations called for by the Branch Unit, which receives data from the CPU...
". Most modern CPUs
Central processing unit
The central processing unit is the portion of a computer system that carries out the instructions of a computer program, to perform the basic arithmetical, logical, and input/output operations of the system. The CPU plays a role somewhat analogous to the brain in the computer. The term has been in...
have several execution units. Even simple computers usually have one unit to read and write memory, and another to execute user code.
These elements could often be bought together as a single chip. This chip came in a fixed width which would form a 'slice' through the execution unit. These were known as 'bit slice
Bit slicing
Bit slicing is a technique for constructing a processor from modules of smaller bit width. Each of these components processes one bit field or "slice" of an operand...
' chips. The AMD Am2900
AMD Am2900
Am2900 is a family of integrated circuits created in 1975 by Advanced Micro Devices . They were constructed with bipolar devices, in a bit-slice topology, and were designed to be used as modular components each representing a different aspect of a computer control unit...
family is one of the best known examples of bit slice elements.
The parts of the execution units, and the execution units themselves are interconnected by a bundle of wires called a bus
Computer bus
In computer architecture, a bus is a subsystem that transfers data between components inside a computer, or between computers.Early computer buses were literally parallel electrical wires with multiple connections, but the term is now used for any physical arrangement that provides the same...
.
Programmers develop microprograms. The basic tools are software: A microassembler
Microassembler
A microassembler is a computer program that helps prepare a microprogram to control the low level operation of a computer in much the same way an assembler helps prepare higher level code for a processor. The difference is that the microprogram is usually only developed by the processor...
allows a programmer to define the table of bits symbolically. A simulator program executes the bits in the same way as the electronics (hopefully), and allows much more freedom to debug the microprogram.
After the microprogram is finalized, and extensively tested, it is sometimes used as the input to a computer program that constructs logic to produce the same data. This program is similar to those used to optimize a programmable logic array
Programmable logic device
A programmable logic device or PLD is an electronic component used to build reconfigurable digital circuits. Unlike a logic gate, which has a fixed function, a PLD has an undefined function at the time of manufacture...
. No known computer program can produce optimal logic, but even pretty good logic can vastly reduce the number of transistors from the number required for a ROM control store. This reduces the cost and power used by a CPU.
Microcode can be characterized as horizontal or vertical. This refers primarily to whether each microinstruction directly controls CPU elements (horizontal microcode), or requires subsequent decoding by combinatorial logic before doing so (vertical microcode). Consequently each horizontal microinstruction is wider (contains more bits) and occupies more storage space than a vertical microinstruction.
Horizontal microcode
Horizontal microcode is typically contained in a fairly wide control store; it is not uncommon for each word to be 56 bits or more. On each tick of a sequencer clock a microcode word is read, decoded, and used to control the functional elements which make up the CPU.In a typical implementation a horizontal microprogram word comprises fairly tightly defined groups of bits. For example, one simple arrangement might be:
register source A | register source B | destination register | arithmetic and logic unit operation | type of jump | jump address |
For this type of micromachine to implement a JUMP instruction with the address following the opcode, the microcode might require two clock ticks; the engineer designing it would write microassembler source code looking something like this:
# Any line starting with a number-sign is a comment
# This is just a label, the ordinary way assemblers symbolically represent a
# memory address.
InstructionJUMP:
# To prepare for the next instruction, the instruction-decode microcode has already
# moved the program counter to the memory address register. This instruction fetches
# the target address of the jump instruction from the memory word following the
# jump opcode, by copying from the memory data register to the memory address register.
# This gives the memory system two clock ticks to fetch the next
# instruction to the memory data register for use by the instruction decode.
# The sequencer instruction "next" means just add 1 to the control word address.
MDR, NONE, MAR, COPY, NEXT, NONE
# This places the address of the next instruction into the PC.
# This gives the memory system a clock tick to finish the fetch started on the
# previous microinstruction.
# The sequencer instruction is to jump to the start of the instruction decode.
MAR, 1, PC, ADD, JMP, InstructionDecode
# The instruction decode is not shown, because it is usually a mess, very particular
# to the exact processor being emulated. Even this example is simplified.
# Many CPUs have several ways to calculate the address, rather than just fetching
# it from the word following the op-code. Therefore, rather than just one
# jump instruction, those CPUs have a family of related jump instructions.
For each tick it is common to find that only some portions of the CPU are used, with the remaining groups of bits in the microinstruction being no-ops. With careful design of hardware and microcode this property can be exploited to parallelise operations which use different areas of the CPU, for example in the case above the ALU is not required during the first tick so it could potentially be used to complete an earlier arithmetic instruction.
Vertical microcode
In vertical microcode, each microinstruction is encoded—that is, the bit fields may pass through intermediate combinatory logic which in turn generates the actual control signals for internal CPU elements (ALU, registers, etc.). In contrast, with horizontal microcode the bit fields themselves directly produce the control signals. Consequently vertical microcode requires smaller instruction lengths and less storage, but requires more time to decode, resulting in a slower CPU clock.Some vertical microcodes are just the assembly language of a simple conventional computer that is emulating a more complex computer.
Another form of vertical microcode has two fields:
field select | field value |
The "field select" selects which part of the CPU will be controlled by this word of the control store.
The "field value" actually controls that part of the CPU.
With this type of microcode, a designer explicitly chooses to make a slower CPU to save money by reducing the unused bits in the control store;
however, the reduced complexity may increase the CPU's clock frequency, which lessens the effect of an increased number of cycles per instruction.
As transistors became cheaper, horizontal microcode came to dominate the design of CPUs using microcode, with vertical microcode no longer being used.
Writable control stores
A few computers were built using "writable microcode" -- rather than storing the microcode in ROM or hard-wired logic, the microcode was stored in a RAM called a Writable Control Store or WCS.Such a computer is sometimes called a Writable Instruction Set Computer or WISC.
Many of these machines were experimental laboratory prototypes, such as the WISC CPU/16
and the RTX 32P.
There were also commercial machines that used writable microcode, such as early Xerox
Xerox PARC
PARC , formerly Xerox PARC, is a research and co-development company in Palo Alto, California, with a distinguished reputation for its contributions to information technology and hardware systems....
workstations, the DEC
Digital Equipment Corporation
Digital Equipment Corporation was a major American company in the computer industry and a leading vendor of computer systems, software and peripherals from the 1960s to the 1990s...
VAX
VAX
VAX was an instruction set architecture developed by Digital Equipment Corporation in the mid-1970s. A 32-bit complex instruction set computer ISA, it was designed to extend or replace DEC's various Programmed Data Processor ISAs...
8800 ("Nautilus") family, the Symbolics
Symbolics
Symbolics refers to two companies: now-defunct computer manufacturer Symbolics, Inc., and a privately held company that acquired the assets of the former company and continues to sell and maintain the Open Genera Lisp system and the Macsyma computer algebra system.The symbolics.com domain was...
L- and G-machines, and a number of IBM
IBM
International Business Machines Corporation or IBM is an American multinational technology and consulting corporation headquartered in Armonk, New York, United States. IBM manufactures and sells computer hardware and software, and it offers infrastructure, hosting and consulting services in areas...
System/370
System/370
The IBM System/370 was a model range of IBM mainframes announced on June 30, 1970 as the successors to the System/360 family. The series maintained backward compatibility with the S/360, allowing an easy migration path for customers; this, plus improved performance, were the dominant themes of the...
implementations. Some DEC PDP-10
PDP-10
The PDP-10 was a mainframe computer family manufactured by Digital Equipment Corporation from the late 1960s on; the name stands for "Programmed Data Processor model 10". The first model was delivered in 1966...
machines stored their microcode in SRAM chips (about 80 bits wide x 2 Kwords), which was typically loaded on power-on through some other front-end CPU. Many more machines offered user-programmable writable control stores as an option (including the HP
Hewlett-Packard
Hewlett-Packard Company or HP is an American multinational information technology corporation headquartered in Palo Alto, California, USA that provides products, technologies, softwares, solutions and services to consumers, small- and medium-sized businesses and large enterprises, including...
2100
HP 2100
The HP 2100 was a series of minicomputers produced by Hewlett-Packard from the mid-1960s to early 1990s. The 2100 was also a specific model in this series. The series was renamed HP 1000 by the 1970s and sold as real-time computers, complementing the more complex IT-oriented HP 3000, and would be...
, DEC PDP-11/60
PDP-11
The PDP-11 was a series of 16-bit minicomputers sold by Digital Equipment Corporation from 1970 into the 1990s, one of a succession of products in the PDP series. The PDP-11 replaced the PDP-8 in many real-time applications, although both product lines lived in parallel for more than 10 years...
and Varian Data Machines
Varian Data Machines
Varian Data Machines was a division of Varian Associates which sold minicomputers. It entered the market in 1966, but met stiff competition and was bought by Sperry Corporation in 1977....
V-70 series minicomputer
Minicomputer
A minicomputer is a class of multi-user computers that lies in the middle range of the computing spectrum, in between the largest multi-user systems and the smallest single-user systems...
s).
The Mentec M11 and Mentec M1 stored its microcode in SRAM chips, loaded on power-on through another CPU.
The Data General Eclipse MV/8000
Data General Eclipse MV/8000
The Eclipse MV/8000 was the first in a family of 32-bit minicomputers produced by Data General during the 1980s. Codenamed Eagle during development, its architecture was a new 32-bit design backward compatible with the previous 16-bit Eclipse series. The development of the computer and the people...
("Eagle") had a SRAM writable control store, loaded on power-on through another CPU.
WCS offered several advantages including the ease of patching the microprogram and, for certain hardware generations, faster access than ROMs could provide. User-programmable WCS allowed the user to optimize the machine for specific purposes.
Some CPU designs compile the instruction set to a writable RAM
Ram
-Animals:*Ram, an uncastrated male sheep*Ram cichlid, a species of freshwater fish endemic to Colombia and Venezuela-Military:*Battering ram*Ramming, a military tactic in which one vehicle runs into another...
or FLASH
Flash memory
Flash memory is a non-volatile computer storage chip that can be electrically erased and reprogrammed. It was developed from EEPROM and must be erased in fairly large blocks before these can be rewritten with new data...
inside the CPU (such as the Rekursiv
Rekursiv
Rekursiv was a computer processor designed by David M. Harland in the mid-1980s for Linn Smart Computing in Glasgow, Scotland. It was one of the few computer architectures intended to implement object-oriented concepts directly in hardware. The Rekursiv operated directly on objects rather than...
processor and the Imsys Cjip), or an FPGA (reconfigurable computing
Reconfigurable computing
Reconfigurable computing is a computer architecture combining some of the flexibility of software with the high performance of hardware by processing with very flexible high speed computing fabrics like field-programmable gate arrays...
).
A CPU that uses microcode generally takes several clock cycles to execute a single instruction, one clock cycle for each step in the microprogram for that instruction. Some CISC
Complex instruction set computer
A complex instruction set computer , is a computer where single instructions can execute several low-level operations and/or are capable of multi-step operations or addressing modes within single instructions...
processors include instructions that can take a very long time to execute. Such variations interfere with both interrupt
Interrupt
In computing, an interrupt is an asynchronous signal indicating the need for attention or a synchronous event in software indicating the need for a change in execution....
latency
Latency (engineering)
Latency is a measure of time delay experienced in a system, the precise definition of which depends on the system and the time being measured. Latencies may have different meaning in different contexts.-Packet-switched networks:...
and, what is far more important in modern systems, pipelining.
Several Intel CPUs in the x86 architecture family have writable microcode.
This has allowed bugs in the Intel Core 2
Intel Core 2
Core 2 is a brand encompassing a range of Intel's consumer 64-bit x86-64 single-, dual-, and quad-core microprocessors based on the Core microarchitecture. The single- and dual-core models are single-die, whereas the quad-core models comprise two dies, each containing two cores, packaged in a...
microcode and Intel Xeon
Xeon
The Xeon is a brand of multiprocessing- or multi-socket-capable x86 microprocessors from Intel Corporation targeted at the non-consumer server, workstation and embedded system markets.-Overview:...
microcode to be fixed in software, rather than requiring the entire chip to be replaced.
Such fixes can be installed by Linux, FreeBSD
FreeBSD
FreeBSD is a free Unix-like operating system descended from AT&T UNIX via BSD UNIX. Although for legal reasons FreeBSD cannot be called “UNIX”, as the direct descendant of BSD UNIX , FreeBSD’s internals and system APIs are UNIX-compliant...
, Microsoft Windows, or the motherboard BIOS.
Microcode versus VLIW and RISC
The design trend toward heavily microcoded processors with complex instructions began in the early 1960s and continued until roughly the mid-1980s. At that point the RISC design philosophy started becoming more prominent. This included the points:- Programming has largely moved away from assembly level, so it's no longer worthwhile to provide complex instructions for productivity reasons.
- Simpler instruction sets allow direct execution by hardware, avoiding the performance penalty of microcoded execution.
- Analysis shows complex instructions are rarely used, hence the machine resources devoted to them are largely wasted.
- The machine resources devoted to rarely-used complex instructions are better used for expediting performance of simpler, commonly-used instructions.
- Complex microcoded instructions may require many clock cycles which vary, and are difficult to pipeline for increased performance.
It should be mentioned that there are counter-points as well:
- The complex instructions in heavily microcoded implementations may not take much extra machine resources, except for microcode space. For instance, the same ALU is often used to calculate an effective address as well as computing the result from the actual operands (e.g. the original Z80, 8086, and others).
- The simpler non-RISC instructions (i.e. involving direct memory operandOperandIn mathematics, an operand is the object of a mathematical operation, a quantity on which an operation is performed.-Example :The following arithmetic expression shows an example of operators and operands:3 + 6 = 9\;...
s) are frequently used by modern compilers. Even immediate to stack (i.e. memory result) arithmetic operations are commonly employed. Although such memory operations, often with varying length encodings, are more difficult to pipeline, it is still fully feasible to do so - clearly exemplified by the i486, AMD K5AMD K5The K5 was AMD's first x86 processor to be developed entirely in-house. Introduced in March 1996, its primary competition was Intel's Pentium microprocessor. The K5 was an ambitious design, closer to a Pentium Pro than a Pentium regarding technical solutions and internal architecture...
, Cyrix 6x86Cyrix 6x86The Cyrix 6x86 is a sixth-generation, 32-bit 80x86-compatible microprocessor designed by Cyrix and manufactured by IBM and SGS-Thomson. It was originally released in 1996.-Architecture:...
, etc. - Non-RISC instructions inherently perform more work per instruction (on average), and are also normally highly encoded, so they enable smaller overall size of the same program, and thus better use of limited cache memories.
- Modern CISC/RISC implementations, e.g. x86 designs, decode instructions into dynamically buffered micro-operationMicro-operationIn computer central processing units, micro-operations are detailed low-level instructions used in some designs to implement complex machine instructions .Various forms of μops have long been the basis for traditional microcode routines used to simplify the implementation of a...
s with instruction encodings similar to traditional fixed microcode. Ordinary static microcode is used as hardware assistance for complex multistep operations such as auto-repeating instructions and for transcendental functionTranscendental functionA transcendental function is a function that does not satisfy a polynomial equation whose coefficients are themselves polynomials, in contrast to an algebraic function, which does satisfy such an equation...
s in the floating point unitFloating point unitA floating-point unit is a part of a computer system specially designed to carry out operations on floating point numbers. Typical operations are addition, subtraction, multiplication, division, and square root...
; it is also used for special purpose instructions (such as CPUIDCPUIDThe CPUID opcode is a processor supplementary instruction for the x86 architecture. It was introduced by Intel in 1993 when it introduced the Pentium and SL-Enhanced 486 processors....
) and internal control and configuration purposes. - The simpler instructions in CISC architectures are also directly executed in hardware in modern implementations.
Many RISC and VLIW
Very long instruction word
Very long instruction word or VLIW refers to a CPU architecture designed to take advantage of instruction level parallelism . A processor that executes every instruction one after the other may use processor resources inefficiently, potentially leading to poor performance...
processors are designed to execute every instruction (as long as it is in the cache) in a single cycle. This is very similar to the way CPUs with microcode execute one microinstruction per cycle. VLIW
Very long instruction word
Very long instruction word or VLIW refers to a CPU architecture designed to take advantage of instruction level parallelism . A processor that executes every instruction one after the other may use processor resources inefficiently, potentially leading to poor performance...
processors have instructions that behave similarly to very wide horizontal microcode, although typically without such fine-grained control over the hardware as provided by microcode. RISC instructions are sometimes similar to the narrow vertical microcode.
See also
- FirmwareFirmwareIn electronic systems and computing, firmware is a term often used to denote the fixed, usually rather small, programs and/or data structures that internally control various electronic devices...
- Control unitControl unitA control unit in general is a central part of the machinery that controls its operation, provided that a piece of machinery is complex and organized enough to contain any such unit. One domain in which the term is specifically used is the area of computer design...
- Finite-state machine (FSM)
- MicrosequencerMicrosequencerIn computer architecture and engineering, a sequencer or microsequencer is a part of the control unit of a CPU. It generates the addresses used to step through the microprogram of a control store....
- MicroassemblerMicroassemblerA microassembler is a computer program that helps prepare a microprogram to control the low level operation of a computer in much the same way an assembler helps prepare higher level code for a processor. The difference is that the microprogram is usually only developed by the processor...
- Control storeControl storeA control store is the part of a CPU's control unit that stores the CPU's microprogram. It is usually accessed by a microsequencer. Early control stores were implemented as a diode-array accessed via address decoders, a form of read-only memory. This tradition dates back to the program timing...
- Execution unitExecution unitIn computer engineering, an execution unit is a part of a CPU that performs the operations and calculations called for by the Branch Unit, which receives data from the CPU...
- Arithmetic logic unitArithmetic logic unitIn computing, an arithmetic logic unit is a digital circuit that performs arithmetic and logical operations.The ALU is a fundamental building block of the central processing unit of a computer, and even the simplest microprocessors contain one for purposes such as maintaining timers...
(ALU) - Floating-point unit (FPU)
- Instruction pipelineInstruction pipelineAn instruction pipeline is a technique used in the design of computers and other digital electronic devices to increase their instruction throughput ....
- SuperscalarSuperscalarA superscalar CPU architecture implements a form of parallelism called instruction level parallelism within a single processor. It therefore allows faster CPU throughput than would otherwise be possible at a given clock rate...
- MicroarchitectureMicroarchitectureIn computer engineering, microarchitecture , also called computer organization, is the way a given instruction set architecture is implemented on a processor. A given ISA may be implemented with different microarchitectures. Implementations might vary due to different goals of a given design or...
- CPU designCPU designCPU design is the design engineering task of creating a central processing unit , a component of computer hardware. It is a subfield of electronics engineering and computer engineering.- Overview :CPU design focuses on these areas:...
Further reading
- Tucker, S. G., "Microprogram control for SYSTEM/360" IBM Systems Journal, Volume 6, Number 4, pp. 222-241 (1967)