Complex instruction set computer
A complex instruction set computer (CISC) is a computer in which single instructions can execute several low-level operations (such as a load from memory, an arithmetic operation, and a memory store) and/or are capable of multi-step operations or addressing modes within single instructions. The term was retroactively coined in contrast to reduced instruction set computer (RISC).
Examples of CISC instruction set architectures are System/360 through z/Architecture, PDP-11, VAX, Motorola 68k, and x86.
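To make the definition concrete, here is a minimal C sketch contrasting a CISC-style read-modify-write instruction with the equivalent load-store (RISC) sequence. The simulated memory, register file, and helper names are invented for illustration and do not model any real instruction set.

```c
#include <stdint.h>
#include <stdio.h>

static uint32_t mem[256];   /* simulated main memory   */
static uint32_t reg[8];     /* simulated register file */

/* CISC style: one instruction loads from memory, adds, and stores back. */
static void add_mem_reg(uint32_t addr, int r)
{
    mem[addr] = mem[addr] + reg[r];           /* load + add + store in one step */
}

/* Load-store (RISC) style: the same effect takes three instructions. */
static void load(int r, uint32_t addr)   { reg[r] = mem[addr]; }
static void add(int rd, int ra, int rb)  { reg[rd] = reg[ra] + reg[rb]; }
static void store(int r, uint32_t addr)  { mem[addr] = reg[r]; }

int main(void)
{
    mem[10] = 40; reg[1] = 2;
    add_mem_reg(10, 1);                       /* CISC: one instruction    */

    mem[20] = 40; reg[2] = 2;
    load(3, 20); add(3, 3, 2); store(3, 20);  /* RISC: three instructions */

    printf("%u %u\n", (unsigned)mem[10], (unsigned)mem[20]);  /* 42 42 */
    return 0;
}
```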
Incentives and benefits
Before the RISC philosophy became prominent, many computer architects tried to bridge the so-called semantic gap, i.e. to design instruction sets that directly supported high-level programming constructs such as procedure calls, loop control, and complex addressing modes, allowing data structure and array accesses to be combined into single instructions. Instructions are also typically highly encoded in order to further enhance code density. The compact nature of such instruction sets results in smaller program sizes and fewer (slow) main memory accesses, which at the time (early 1960s and onwards) resulted in tremendous savings on the cost of computer memory and disc storage, as well as faster execution. It also meant good programming productivity even in assembly language, as high-level languages such as Fortran or Algol were not always available or appropriate (microprocessors in this category are sometimes still programmed in assembly language for certain types of critical applications).
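As a rough illustration of what a complex addressing mode buys, the following C sketch computes an effective address of the base + index*scale + displacement form found on machines such as the VAX and x86, letting a record-field access inside an array collapse into a single operand computation. The function and record names are invented for this sketch, not taken from any particular ISA.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* One effective-address computation, as a single addressing mode performs it:
 * address = base + index*scale + displacement. */
static uint8_t *effective_address(uint8_t *base, size_t index,
                                  size_t scale, ptrdiff_t disp)
{
    return base + index * scale + disp;
}

struct record { int32_t key; int32_t value; };

/* a[i].value becomes one operand: base = a, scale = sizeof(struct record),
 * displacement = offsetof(struct record, value). */
static int32_t get_value(struct record *a, size_t i)
{
    return *(int32_t *)effective_address((uint8_t *)a, i, sizeof *a,
                                         offsetof(struct record, value));
}

int main(void)
{
    struct record a[4] = { {1, 10}, {2, 20}, {3, 30}, {4, 40} };
    printf("%d\n", (int)get_value(a, 2));     /* prints 30 */
    return 0;
}
```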
New instructions
In the 1970s, analysis of high-level languages indicated some complex machine language implementations, and it was determined that new instructions could improve performance. Some instructions were added that were never intended to be used in assembly language but fit well with compiled high-level languages. Compilers were updated to take advantage of these instructions. The benefits of semantically rich instructions with compact encodings can be seen in modern processors as well, particularly in the high-performance segment where caches are a central component (as opposed to most embedded systems). This is because these fast, but complex and expensive, memories are inherently limited in size, making compact code beneficial. Of course, the fundamental reason they are needed is that main memories (i.e. dynamic RAM today) remain slow compared to a (high-performance) CPU core.
Design issues
While many designs achieved the aim of higher throughput at lower cost and also allowed high-level language constructs to be expressed by fewer instructions, it was observed that this was not always the case. For instance, low-end versions of complex architectures (i.e. using less hardware) could lead to situations where it was possible to improve performance by not using a complex instruction (such as a procedure call or enter instruction), but instead using a sequence of simpler instructions.

One reason for this was that architects (microcode writers) sometimes "over-designed" assembly language instructions, i.e. included features that were not possible to implement efficiently on the basic hardware available. These could, for instance, be "side effects" (above conventional flags), such as the setting of a register or memory location that was perhaps seldom used; if this was done via ordinary (non-duplicated) internal buses, or even the external bus, it would demand extra cycles every time, and thus be quite inefficient.
Even in balanced high-performance designs, highly encoded and (relatively) high-level instructions could be complicated to decode and execute efficiently within a limited transistor budget. Such architectures therefore required a great deal of work on the part of the processor designer in cases where a simpler, but (typically) slower, solution based on decode tables and/or microcode sequencing was not appropriate. At a time when transistors and other components were a limited resource, this also left fewer components and less area for other types of performance optimizations.
The RISC idea
The circuitry that performs the actions defined by the microcode in many (but not all) CISC processors is, in itself, a processor which in many ways is reminiscent in structure of very early CPU designs. In the early 1970s, this gave rise to ideas of returning to simpler processor designs in order to make it more feasible to cope without (then relatively large and expensive) ROM tables and/or PLA structures for sequencing and/or decoding. The first (retroactively) RISC-labeled processor (IBM 801 - IBM's Watson Research Center, mid-1970s) was a tightly pipelined simple machine originally intended to be used as an internal microcode kernel, or engine, in CISC designs, but it also became the processor that introduced the RISC idea to a somewhat larger public. Simplicity and regularity also in the visible instruction set would make it easier to implement overlapping processor stages (pipelining) at the machine code level (i.e. the level seen by compilers). However, pipelining at that level was already used in some high-performance CISC "supercomputers" in order to reduce the instruction cycle time (despite the complications of implementing it within the limited component count and wiring complexity feasible at the time). Internal microcode execution in CISC processors, on the other hand, could be more or less pipelined depending on the particular design, and was therefore more or less akin to the basic structure of RISC processors.
Superscalar
In a more modern context, the complex variable-length encoding used by some of the typical CISC architectures makes it complicated, but still feasible, to build a superscalar implementation of a CISC programming model directly; the in-order superscalar original Pentium and the out-of-order superscalar Cyrix 6x86 are well-known examples of this. The frequent memory accesses for operands of a typical CISC machine may limit the instruction-level parallelism that can be extracted from the code, although this is strongly mediated by the fast cache structures used in modern designs, as well as by other measures. Due to inherently compact and semantically rich instructions, the average amount of work performed per machine code unit (i.e. per byte or bit) is higher for a CISC than for a RISC processor, which may give it a significant advantage in a modern cache-based implementation. (Whether the downsides versus the upsides justify a complex design or not is food for a never-ending debate in certain circles.)
Transistors for logic, PLAs, and microcode are no longer scarce resources; only large high-speed cache memories are limited by the maximum number of transistors today. Although complex, the transistor count of CISC decoders does not grow exponentially like the total number of transistors per processor (the majority typically used for caches). Together with better tools and enhanced technologies, this has led to new implementations of highly encoded and variable-length designs without load-store limitations (i.e. non-RISC). This governs re-implementations of older architectures such as the ubiquitous x86 (see below) as well as new designs for microcontrollers for embedded systems, and similar uses. The superscalar complexity in the case of modern x86 was solved with dynamically issued and buffered micro-operations, i.e. indirect and dynamic superscalar execution; the Pentium Pro and AMD K5 are early examples of this. It allows a fairly simple superscalar design to be located after the (fairly complex) decoders (and buffers), giving, so to speak, the best of both worlds in many respects.
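A minimal sketch of this decode-and-split step, assuming invented micro-operation names and formats rather than Intel's or AMD's actual internal encodings: a read-modify-write instruction is broken into a short buffered sequence of load, compute, and store micro-operations that a fairly simple superscalar core can then issue.

```c
#include <stdio.h>

typedef enum { UOP_LOAD, UOP_ADD, UOP_STORE } uop_kind;

typedef struct {
    uop_kind kind;
    int dst, src;      /* register numbers; -1 where a field is unused    */
    unsigned addr;     /* memory address, for micro-ops that touch memory */
} uop;

/* Decode "ADD [addr], rN" into load/add/store micro-ops placed in a buffer.
 * Returns the number of micro-ops emitted. */
static int decode_add_mem_reg(unsigned addr, int rn, uop *buf)
{
    const int tmp = 7;                           /* internal temporary register */
    buf[0] = (uop){ UOP_LOAD,  tmp, -1,  addr }; /* tmp       <- mem[addr] */
    buf[1] = (uop){ UOP_ADD,   tmp, rn,  0    }; /* tmp       <- tmp + rN  */
    buf[2] = (uop){ UOP_STORE, -1,  tmp, addr }; /* mem[addr] <- tmp       */
    return 3;
}

int main(void)
{
    uop buf[8];
    int n = decode_add_mem_reg(0x100, 2, buf);
    for (int i = 0; i < n; i++)
        printf("uop %d: kind=%d dst=%d src=%d addr=0x%x\n",
               i, buf[i].kind, buf[i].dst, buf[i].src, buf[i].addr);
    return 0;
}
```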
CISC and RISC terms
The terms CISC and RISC have become less meaningful with the continued evolution of both CISC and RISC designs and implementations. The first highly (or tightly) pipelined x86 implementations, the 486 designs from Intel, AMD, Cyrix, and IBM, supported every instruction that their predecessors did, but achieved maximum efficiency only on a fairly simple x86 subset that was only a little more than a typical RISC instruction set (i.e. without typical RISC load-store limitations). The Intel P5 Pentium generation was a superscalar version of these principles. However, modern x86 processors also (typically) decode and split instructions into dynamic sequences of internally buffered micro-operations, which not only helps execute a larger subset of instructions in a pipelined (overlapping) fashion, but also facilitates more advanced extraction of parallelism from the code stream, for even higher performance.
Contrary to popular simplifications (present also in some academic texts), not all CISCs are microcoded or have "complex" instructions. As CISC became a catch-all term meaning anything that is not a load-store (RISC) architecture, it is not the number of instructions, nor the complexity of the implementation or of the instructions themselves, that defines CISC, but the fact that arithmetic instructions also perform memory accesses. Compared to a small 8-bit CISC processor, a RISC floating-point instruction is complex. CISC does not even need to have complex addressing modes; 32- or 64-bit RISC processors may well have more complex addressing modes than small 8-bit CISC processors.
A PDP-10, a PDP-8, an Intel 386, an Intel 4004, a Motorola 68000, a System z mainframe, a Burroughs B5000, a VAX, a Zilog Z80000, and a 6502 all vary wildly in the number, sizes, and formats of instructions, the number, types, and sizes of registers, and the available data types. Some have hardware support for operations like scanning for a substring, arbitrary-precision BCD arithmetic, or transcendental functions, while others have only 8-bit addition and subtraction. But they are all in the CISC category because they have "load-operate" instructions that load and/or store memory contents within the same instructions that perform the actual calculations. For instance, the PDP-8, having only 8 fixed-length instructions and no microcode at all, is a CISC because of how the instructions work; PowerPC, which has over 230 instructions (more than some VAXes) and complex internals like register renaming and a reorder buffer, is a RISC; while Minimal CISC has 8 instructions, but is clearly a CISC because it combines memory access and computation in the same instructions.
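A toy accumulator machine in C makes the load-operate criterion concrete. Like the PDP-8 just mentioned, it has only a few fixed-length instructions and no microcode, yet it falls on the CISC side of this definition because its add instruction reads memory within the same instruction that performs the arithmetic. The encoding and dispatch loop are invented for this sketch; TAD and DCA are borrowed PDP-8 mnemonics.

```c
#include <stdint.h>
#include <stdio.h>

enum { OP_TAD, OP_DCA, OP_HLT };  /* add to accumulator, deposit and clear, halt */

typedef struct { uint8_t op; uint8_t addr; } insn;

static void run(const insn *prog, uint16_t *mem)
{
    uint16_t acc = 0;
    for (const insn *p = prog; ; p++) {
        switch (p->op) {
        case OP_TAD: acc += mem[p->addr];          break;  /* memory access + compute */
        case OP_DCA: mem[p->addr] = acc; acc = 0;  break;  /* store + clear           */
        case OP_HLT: return;
        }
    }
}

int main(void)
{
    uint16_t mem[16] = { 40, 2 };                  /* mem[0] = 40, mem[1] = 2 */
    insn prog[] = { {OP_TAD, 0}, {OP_TAD, 1}, {OP_DCA, 2}, {OP_HLT, 0} };
    run(prog, mem);
    printf("%d\n", mem[2]);                        /* prints 42 */
    return 0;
}
```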
Some of the problems and contradictions in this terminology will perhaps disappear as more systematic terms, such as (non-)load/store, become more popular and eventually replace the imprecise and slightly counter-intuitive RISC/CISC terms.
See also
- CPU (central processing unit)
- RISC
- ZISC
- VLIW
- CPU design
- Computer architecture