Hyper-threading
Hyper-threading is Intel's term for its simultaneous multithreading implementation in its Atom, Intel Core i3/i5/i7, Itanium, Pentium 4 and Xeon CPUs.
Hyper-threading is an Intel-proprietary technology used to improve parallelization of computations (doing multiple tasks at once) performed on PC microprocessors. For each processor core that is physically present, the operating system addresses two virtual processors and shares the workload between them when possible. To get the full benefit, the operating system must not only support multiple processors but also be specifically optimized for Hyper-Threading Technology (HTT); Intel recommends disabling HTT when using operating systems that have not been optimized for this chip feature.
Details
Hyper-threading works by duplicating certain sections of the processor (those that store the architectural state) but not duplicating the main execution resources. This allows a hyper-threading processor to appear as two "logical" processors to the host operating system, allowing the operating system to schedule two threads or processes simultaneously. When execution resources would not be used by the current task in a processor without hyper-threading, and especially when the processor is stalled, a hyper-threading equipped processor can use those execution resources to execute another scheduled task. (The processor may stall due to a cache miss, branch misprediction, or data dependency.)
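As a rough illustration of how logical processors map onto physical cores, the sketch below groups logical CPU numbers by the core they share. It assumes Linux-style sibling list strings (the format exposed in /sys/devices/system/cpu/cpuN/topology/thread_siblings_list); the function names and the sample data are hypothetical and are not part of any Intel or operating system API.

```python
def parse_cpu_list(text):
    """Parse a CPU list such as "0,4" or "0-1,8-9" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def group_by_core(sibling_lists):
    """Collapse per-CPU sibling lists into one tuple of logical CPUs per core.

    sibling_lists: dict mapping logical CPU number -> sibling list string.
    """
    cores = {tuple(parse_cpu_list(s)) for s in sibling_lists.values()}
    return sorted(cores)


# Hypothetical sample: two hyper-threaded cores seen as four logical CPUs.
sample = {0: "0,2", 1: "1,3", 2: "0,2", 3: "1,3"}
cores = group_by_core(sample)
print(len(sample), "logical processors on", len(cores), "physical cores")
```

On a real Linux system the strings in `sample` would be read from the sysfs files named above; the numbering of sibling logical CPUs varies between machines, so it should not be assumed.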
This technology is transparent to operating systems and programs. The minimum that is required to take advantage of hyper-threading is symmetric multiprocessing (SMP) support in the operating system, as the logical processors appear as standard separate processors.
It is possible to optimize operating system behavior on multi-processor hyper-threading capable systems. For example, consider an SMP system with two physical processors that are both hyper-threaded (for a total of four logical processors). If the operating system's thread scheduler is unaware of hyper-threading, it will treat all four processors as being the same. If only two threads are eligible to run, it might choose to schedule those threads on the two logical processors that happen to belong to one of the physical processors; that processor would become extremely busy while the other would be idle, leading to poorer performance than is possible with better scheduling. This problem can be avoided by improving the scheduler to treat logical processors differently from physical processors; in a sense, this is a limited form of the scheduler changes that are required for NUMA systems.
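The placement problem described above can be sketched in a few lines. This is a simplified illustration, not how any particular operating system scheduler is implemented: the function `pick_cpus` and the `core_of` topology map are hypothetical, and real schedulers weigh many more factors (load, cache affinity, power).

```python
def pick_cpus(core_of, idle, n_threads):
    """Choose logical CPUs for n_threads, preferring distinct physical processors.

    core_of: dict mapping logical CPU -> physical processor id (assumed known)
    idle:    iterable of idle logical CPUs
    """
    chosen = []
    used_cores = set()
    candidates = sorted(idle)
    # First pass: take at most one logical CPU per physical processor.
    for cpu in candidates:
        if len(chosen) == n_threads:
            break
        if core_of[cpu] not in used_cores:
            chosen.append(cpu)
            used_cores.add(core_of[cpu])
    # Second pass: only if still short, fall back to sibling logical CPUs.
    for cpu in candidates:
        if len(chosen) == n_threads:
            break
        if cpu not in chosen:
            chosen.append(cpu)
    return chosen


# Two physical processors, each exposing two logical processors.
core_of = {0: 0, 1: 0, 2: 1, 3: 1}
naive = sorted(core_of)[:2]                   # treats all four CPUs as equal
aware = pick_cpus(core_of, core_of.keys(), 2)
print("naive placement:", naive, "-> physical:", [core_of[c] for c in naive])
print("aware placement:", aware, "-> physical:", [core_of[c] for c in aware])
```

With two runnable threads, the naive choice lands both on the same physical processor, while the topology-aware choice spreads them across both.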
History
Hyper-threading technology has its roots at Digital Equipment Corporation but was brought to market by Intel.
Hyper-Threading was first introduced in the Foster MP-based Xeon in March 2002. It appeared on the 3.06 GHz Northwood-based Pentium 4 in the same year, and then appeared in every Pentium 4 HT, Pentium 4 Extreme Edition and Pentium Extreme Edition processor. Previous generations of Intel's processors based on the Core microarchitecture do not have Hyper-Threading, because the Core microarchitecture is a descendant of the P6 microarchitecture used in iterations of Pentium since the Pentium Pro through the Pentium III and the Celeron (Covington, Mendocino, Coppermine and Tualatin-based) and the Pentium II Xeon and Pentium III Xeon models.
Intel released the Nehalem (Core i7) in November 2008, in which hyper-threading made a return. The first-generation Nehalem contains four cores and handles eight threads. Since then, both two- and six-core models have been released, handling four and twelve threads respectively.
The Intel Atom is an in-order processor with hyper-threading, for low-power mobile PCs and low-price desktop PCs.
The Itanium 9300 launched with eight threads per processor (two threads per core) through enhanced hyper-threading technology. Poulson, the next-generation Itanium, is scheduled to have additional hyper-threading enhancements.
The Intel Xeon 5500 server chips also utilize two-way hyper-threading.
Performance
The advantages of hyper-threading are listed as: improved support for multi-threaded code, allowing multiple threads to run simultaneously, and improved reaction and response time. According to Intel, the first implementation used only 5% more die area than the comparable non-hyperthreaded processor, but the performance was 15–30% better.
Intel claims up to a 30% performance improvement compared with an otherwise identical Pentium 4 without simultaneous multithreading. Tomshardware.com states "In some cases a P4 running at 3.0 GHz with HT on can even beat a P4 running at 3.6 GHz without HT turned on". Intel also claims significant performance improvements with a hyper-threading-enabled Pentium 4 processor in some artificial intelligence algorithms. The performance improvement is very application-dependent, however: when running two programs that each require the full attention of the processor, one or both of the programs can actually seem to slow down slightly when Hyper-Threading Technology is turned on. This is due to the replay system of the Pentium 4 tying up valuable execution resources, equalizing the processor resources between the two programs, which adds a varying amount of execution time. The Pentium 4 Prescott core gained a replay queue, which reduces the execution time needed for the replay system. This is enough to completely overcome that performance hit.
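The figures quoted in this section can be put side by side with a little arithmetic. The sketch below only restates the numbers given above (5% more die area, 15–30% better performance, 3.0 GHz versus 3.6 GHz); the helper name is hypothetical, and the clock comparison rests on the simplifying assumption that performance without hyper-threading scales linearly with clock speed.

```python
def perf_per_area(speedup, area_growth):
    """Relative performance per unit of die area versus the baseline chip."""
    return (1 + speedup) / (1 + area_growth)


# Intel's stated figures for the first implementation.
low = perf_per_area(0.15, 0.05)
high = perf_per_area(0.30, 0.05)
print(f"performance per unit die area: {low:.3f}x to {high:.3f}x baseline")

# Assumption: performance scales linearly with clock speed without HT.
# A 3.0 GHz part then needs this much gain from HT to match a 3.6 GHz part.
break_even = 3.6 / 3.0 - 1
print(f"gain needed for 3.0 GHz to match 3.6 GHz: {break_even:.0%}")
```

Under that assumption, the Tomshardware.com observation corresponds to a gain of at least about 20%, which falls inside the 15–30% range Intel quotes.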
Drawback history
When the Intel Pentium 4 3.06 GHz HT was released, it was difficult for some application programmers to decide whether or not to use Hyper-Threading technology for their specific applications, because some programmers were still testing their programs on operating systems that were not optimized for hyper-threading technology (e.g. Windows 2000), and most computers had single-threaded processors instead of bi-threaded processors at the time.
In 2006, hyper-threading was criticised for being energy-inefficient. For example, specialist low-power CPU design company ARM has stated that simultaneous multithreading (SMT) can use up to 46% more power than dual-core designs. Furthermore, they claim SMT increases cache thrashing by 42%, whereas dual core results in a 37% decrease. Intel has disputed this claim, stating that hyper-threading is highly efficient because it simply uses resources that would otherwise be idle. In 2010, ARM stated that it will include simultaneous multithreading in its chips in the future.
Security
In May 2005 Colin Percival demonstrated that on the Pentium 4, a malicious thread can use a timing attack to monitor the memory access patterns of another thread with which it shares a cache, allowing the theft of cryptographic information. Potential solutions to this include the processor changing its cache eviction strategy, or the operating system preventing the simultaneous execution, on the same physical core, of threads with different privileges.
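The second mitigation, keeping threads with different privileges off the same physical core at the same time, amounts to a simple check over the current placement. The sketch below is an illustration only: the function name, the `core_of` topology map and the domain labels are hypothetical, and an actual operating system would enforce this inside its scheduler rather than audit it afterwards.

```python
def find_violations(core_of, running):
    """Return physical cores that are running threads from different domains.

    core_of: dict mapping logical CPU -> physical core id (assumed known)
    running: dict mapping logical CPU -> privilege domain label,
             or None when that logical CPU is idle
    """
    domains_per_core = {}
    for cpu, domain in running.items():
        if domain is None:
            continue  # an idle sibling cannot observe anything
        domains_per_core.setdefault(core_of[cpu], set()).add(domain)
    return sorted(core for core, doms in domains_per_core.items() if len(doms) > 1)


# Two hyper-threaded cores; labels are made up for the example.
core_of = {0: 0, 1: 0, 2: 1, 3: 1}
running = {0: "user:alice", 1: "user:bob", 2: "kernel", 3: None}
print("cores shared across privilege domains:", find_violations(core_of, running))
```

Here core 0 is flagged because its two logical processors are running threads from different domains, which is the situation the attack relies on.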
External links
- Intel's high level overview of Hyper-threading
- Hyper-threading on MSDN Magazine
- HyperThreading Overview from OSDEV Community (Wayback Machine)
- An introductory article from Ars Technica
- Hyper-Threading Technology Architecture and Microarchitecture, technical description of Hyper-Threading (1.2 MB PDF-file)
- Enter Patent Number 4,847,755
- Merom, Conroe, Woodcrest lose HyperThreading
Security
- KernelTrap discussion: Hyper-Threading Vulnerability
Performance
- ZDnet: Hyperthreading hurts server performance, say developers
- ARM is no fan of HyperThreading - Outlines problems of SMT solutions
- Replay: Unknown Features of the NetBurst Core