Barrel processor
A barrel processor is a CPU that switches between threads of execution on every cycle. This CPU design technique is also known as "interleaved" or "fine-grained" temporal multithreading. As opposed to simultaneous multithreading in modern superscalar architectures, it generally does not allow execution of multiple instructions in one cycle.
For example, the peripheral processing system of the CDC 6000 series computers and its successors executed one instruction (or a portion of an instruction) from each of 10 different virtual processors (called peripheral processors) before returning to the first processor. Also, the IP3023 processor from Ubicom executes one instruction from each of 8 different threads before returning to the first thread. The Cray XMT also uses a barrel processor (Threadstorm) in its architecture.
As in preemptive multitasking, each thread of execution is assigned its own program counter and other hardware registers (each thread's architectural state). A barrel processor can guarantee that each thread will execute one instruction every N cycles, unlike a preemptive multitasking machine, which typically runs one thread of execution for hundreds or thousands of cycles while all other threads wait their turn.
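The fixed rotation described above can be illustrated with a small simulation. This is only a sketch, not a model of any real machine: the names `HardwareThread` and `run_barrel` are hypothetical, the register count is arbitrary, and "executing an instruction" is reduced to advancing the thread's program counter.

```python
from dataclasses import dataclass, field


@dataclass
class HardwareThread:
    """Per-thread architectural state: its own program counter and registers."""
    pc: int = 0
    registers: list = field(default_factory=lambda: [0] * 4)  # size is arbitrary


def run_barrel(num_threads, num_cycles):
    """Issue one instruction per cycle, rotating through the threads in order.

    Returns the threads and the issue trace, where trace[c] is the index of
    the thread that issued on cycle c.
    """
    threads = [HardwareThread() for _ in range(num_threads)]
    trace = []
    for cycle in range(num_cycles):
        tid = cycle % num_threads   # fixed rotation, no scheduling decision
        threads[tid].pc += 1        # stand-in for executing one instruction
        trace.append(tid)
    return threads, trace


threads, trace = run_barrel(num_threads=4, num_cycles=12)
print(trace)                    # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]
print([t.pc for t in threads])  # [3, 3, 3, 3]
```

With N = 4 threads, every thread issues exactly once in each window of 4 cycles, which is the timing guarantee the paragraph describes.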
A technique called C-slowing can take a normal single-tasking processor design and automatically generate a corresponding barrel processor design. An n-way barrel processor generated this way acts much like n separate multiprocessing copies of the original single-tasking processor, each one running at roughly 1/n the original speed.
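As a rough illustration of the idea, consider a toy circuit with a single register that accumulates its input each cycle. The sketch below replaces that one register with a chain of C registers, assuming equal-length input streams that are interleaved one value per cycle. The functions `run_original` and `run_c_slowed` are hypothetical names for this example and do not correspond to any real tool.

```python
from collections import deque


def run_original(inputs):
    """Single-thread circuit: one register, updated to reg + input each cycle."""
    reg = 0
    for x in inputs:
        reg = reg + x
    return reg


def run_c_slowed(streams):
    """C-slowed circuit: the single register becomes a chain of C registers.

    streams[t] is the input sequence for thread t (all the same length).
    Inputs are interleaved, one per clock cycle. Returns the final
    accumulated value for each thread.
    """
    c = len(streams)
    chain = deque([0] * c)           # C registers in place of one
    for step in range(len(streams[0])):
        for t in range(c):           # one clock cycle per (step, thread) pair
            value = chain.popleft()  # oldest register feeds the adder
            chain.append(value + streams[t][step])
    return list(chain)


streams = [[1, 2, 3], [10, 20, 30], [5, 5, 5]]
print(run_c_slowed(streams))     # [6, 60, 15]
print(run_original(streams[0]))  # 6
```

Each thread computes the same result as the original circuit would on its own stream, but advances only once every C cycles, which matches the "n copies at roughly 1/n speed" description.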
Advantages compared to single-threaded processors
A single-tasking processor spends a lot of time idle, doing nothing useful whenever a cache miss or pipeline stall occurs. Advantages of barrel processors over single-tasking processors include:
- The ability to do useful work on the other threads while the stalled thread is waiting.
- Designing an n-way barrel processor with n-deep pipelines is much simpler than designing a single-tasking processor, because a barrel processor never has a pipeline stall and doesn't need feed-forward circuits.
- For real-time applications, a barrel processor can guarantee that a "real-time" thread can execute with precise timing, no matter what happens to the other threads, even if some other thread locks up in an infinite loop or is continuously interrupted by hardware interrupts.
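The pipeline argument in the list above can be checked with a small model. The sketch below assumes a simple in-order pipeline in which every instruction advances one stage per cycle and threads issue in fixed rotation. The function name `max_in_flight_per_thread` is hypothetical.

```python
def max_in_flight_per_thread(num_threads, pipeline_depth, num_cycles):
    """Largest number of instructions from any one thread in the pipeline at once."""
    pipeline = [None] * pipeline_depth  # pipeline[0] is the issue stage
    worst = 0
    for cycle in range(num_cycles):
        # Issue from the next thread in rotation; everything else moves one
        # stage deeper and the instruction in the last stage retires.
        pipeline = [cycle % num_threads] + pipeline[:-1]
        occupants = [t for t in pipeline if t is not None]
        for t in set(occupants):
            worst = max(worst, occupants.count(t))
    return worst


print(max_in_flight_per_thread(num_threads=4, pipeline_depth=4, num_cycles=16))  # 1
print(max_in_flight_per_thread(num_threads=2, pipeline_depth=4, num_cycles=16))  # 2
print(max_in_flight_per_thread(num_threads=1, pipeline_depth=4, num_cycles=16))  # 4
```

When the number of threads is at least the pipeline depth, a thread never has more than one instruction in flight, so in this simplified model an instruction cannot depend on an unfinished earlier instruction from the same thread. With fewer threads than stages, instructions from the same thread overlap, which is the situation that calls for stalls or forwarding in a single-tasking design.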
Disadvantages compared to single-threaded processors
There are, however, some disadvantages to barrel processors.
- Either all threads must share the same cache, which slows overall system performance, or there must be one unit of cache for each execution thread, which can significantly increase the transistor count (and thus cost) of such a CPU. However, most barrel processors are used to implement hard real-time embedded systems, where memory access costs are typically calculated assuming worst-case behavior of the cache, so this is less of a concern.
- The state of each thread must be kept on-chip (typically in registers) to avoid costly off-chip context switches. This requires a large number of registers compared to typical processors.
See also
- Super-threading
- Computer multitasking
- Simultaneous multithreading (SMT)
- Hyper-threading
External links
- Soft peripherals Embedded.com article examines Ubicom's IP3023 processor
- An Evaluation of the Design of the Gamma 60