Micro-operation
Encyclopedia
In computer
central processing unit
s, micro-operations (also known as a micro-ops or μops) are detailed low-level instructions used in some designs to implement complex machine instructions (sometimes termed macro-instructions in this context).
Various forms of μops have long been the basis for traditional microcode
routines used to simplify the implementation of a particular CPU design
or perhaps just the sequencing of certain multi-step operations or addressing modes. More recently, μops have also been employed in a different way in order to let modern "CISC
" processors more easily handle asynchronous parallel and speculative execution: As with traditional microcode, one or more table lookups (or equivalent) is done to locate the appropriate μop-sequence based on the encoding and semantics of the machine instruction (the decoding or translation step), however, instead of having rigid μop-sequences controlling the CPU directly from a microcode-ROM
, μops are here dynamically issued, that is, buffered in rather long sequences before being executed.
This buffering means that the fetch and decode stages can be more detached from the execution units than is feasible in a more traditional microcoded (or "hard-wired") design. As this allows a degree of freedom regarding execution order, it makes some extraction of instruction level parallelism
out of a normal single-threaded program possible (provided that dependencies are checked etc). It opens up for more analysis and therefore also for reordering of code sequences in order to dynamically optimize mapping and scheduling of μops onto machine resources (such as ALUs
, load/store units etc). As this happens on the μop-level, sub-operations of different machine (macro) instructions may often intermix in a particular μop-sequence (forming partially reordered machine instructions).
Today, the optimization has gone even further; processors not only translate many machine instructions into a series of μops, but also do the opposite when appropriate; they combine certain machine instruction sequences (such as a compare followed by a conditional jump) into a more complex μop which fits the execution model better and thus can be executed faster or with less machine resources involved.
Another way to try to improve performance is to cache the decoded micro-operations, so that if the same macroinstruction is executed again, the processor can directly access the decoded micro-operations from a special cache, instead of decoding them again. The Execution Trace Cache found in Intel NetBurst microarchitecture (Pentium 4
) is so far the only widespread example of this technique. The size of this cache may be stated in terms of how many thousands of micro-operations it can store: kμops.
There is a bunch of varieties and optimizations of accelerated instruction execution as described above, which are extremely difficult to explain in detail without losing the basic overview. Due to this, for didactical explanation there is a need to simplify general computational concepts to a minimum of complexity necessary. Therefore, several valuable eLearning Tools have been developed during the years in academic areas for visualizing, simulating and emulating aspects on Computer Architecture
, the Instruction Set
and its architecture.
Computer
A computer is a programmable machine designed to sequentially and automatically carry out a sequence of arithmetic or logical operations. The particular sequence of operations can be changed readily, allowing the computer to solve more than one kind of problem...
central processing unit
Central processing unit
The central processing unit is the portion of a computer system that carries out the instructions of a computer program, to perform the basic arithmetical, logical, and input/output operations of the system. The CPU plays a role somewhat analogous to the brain in the computer. The term has been in...
s, micro-operations (also known as a micro-ops or μops) are detailed low-level instructions used in some designs to implement complex machine instructions (sometimes termed macro-instructions in this context).
Various forms of μops have long been the basis for traditional microcode
Microcode
Microcode is a layer of hardware-level instructions and/or data structures involved in the implementation of higher level machine code instructions in many computers and other processors; it resides in special high-speed memory and translates machine instructions into sequences of detailed...
routines used to simplify the implementation of a particular CPU design
CPU design
CPU design is the design engineering task of creating a central processing unit , a component of computer hardware. It is a subfield of electronics engineering and computer engineering.- Overview :CPU design focuses on these areas:...
or perhaps just the sequencing of certain multi-step operations or addressing modes. More recently, μops have also been employed in a different way in order to let modern "CISC
Complex instruction set computer
A complex instruction set computer , is a computer where single instructions can execute several low-level operations and/or are capable of multi-step operations or addressing modes within single instructions...
" processors more easily handle asynchronous parallel and speculative execution: As with traditional microcode, one or more table lookups (or equivalent) is done to locate the appropriate μop-sequence based on the encoding and semantics of the machine instruction (the decoding or translation step), however, instead of having rigid μop-sequences controlling the CPU directly from a microcode-ROM
Read-only memory
Read-only memory is a class of storage medium used in computers and other electronic devices. Data stored in ROM cannot be modified, or can be modified only slowly or with difficulty, so it is mainly used to distribute firmware .In its strictest sense, ROM refers only...
, μops are here dynamically issued, that is, buffered in rather long sequences before being executed.
This buffering means that the fetch and decode stages can be more detached from the execution units than is feasible in a more traditional microcoded (or "hard-wired") design. As this allows a degree of freedom regarding execution order, it makes some extraction of instruction level parallelism
Instruction level parallelism
Instruction-level parallelism is a measure of how many of the operations in a computer program can be performed simultaneously. Consider the following program: 1. e = a + b 2. f = c + d 3. g = e * f...
out of a normal single-threaded program possible (provided that dependencies are checked etc). It opens up for more analysis and therefore also for reordering of code sequences in order to dynamically optimize mapping and scheduling of μops onto machine resources (such as ALUs
Arithmetic logic unit
In computing, an arithmetic logic unit is a digital circuit that performs arithmetic and logical operations.The ALU is a fundamental building block of the central processing unit of a computer, and even the simplest microprocessors contain one for purposes such as maintaining timers...
, load/store units etc). As this happens on the μop-level, sub-operations of different machine (macro) instructions may often intermix in a particular μop-sequence (forming partially reordered machine instructions).
Today, the optimization has gone even further; processors not only translate many machine instructions into a series of μops, but also do the opposite when appropriate; they combine certain machine instruction sequences (such as a compare followed by a conditional jump) into a more complex μop which fits the execution model better and thus can be executed faster or with less machine resources involved.
Another way to try to improve performance is to cache the decoded micro-operations, so that if the same macroinstruction is executed again, the processor can directly access the decoded micro-operations from a special cache, instead of decoding them again. The Execution Trace Cache found in Intel NetBurst microarchitecture (Pentium 4
Pentium 4
Pentium 4 was a line of single-core desktop and laptop central processing units , introduced by Intel on November 20, 2000 and shipped through August 8, 2008. They had a 7th-generation x86 microarchitecture, called NetBurst, which was the company's first all-new design since the introduction of the...
) is so far the only widespread example of this technique. The size of this cache may be stated in terms of how many thousands of micro-operations it can store: kμops.
There is a bunch of varieties and optimizations of accelerated instruction execution as described above, which are extremely difficult to explain in detail without losing the basic overview. Due to this, for didactical explanation there is a need to simplify general computational concepts to a minimum of complexity necessary. Therefore, several valuable eLearning Tools have been developed during the years in academic areas for visualizing, simulating and emulating aspects on Computer Architecture
Computer architecture simulator
In computer science, a computer architecture simulator, or an architectural simulator, is a piece of software to model computer devices to predict outputs and performance metrics on a given input...
, the Instruction Set
Instruction Set Simulator
An instruction set simulator is a simulation model, usually coded in a high-level programming language, which mimics the behavior of a mainframe or microprocessor by "reading" instructions and maintaining internal variables which represent the processor's registers.Instruction simulation is a...
and its architecture.