FMA instruction set
The FMA instruction set is the name of a future extension to the 128-bit SIMD instructions in the x86 microprocessor instruction set to perform fused multiply–add (FMA) operations. Two different variants of FMA instruction sets will be used:
- FMA3 will be supported by Intel in their Haswell processors in 2013 and Broadwell processors in 2014.
- FMA4 will be supported in AMD processors from 2011.
New instructions
The FMA3 and FMA4 instruction sets have almost identical functionality but are not mutually compatible. Both contain fused multiply–add (FMA) instructions for floating point scalar and SIMD operations. It will take some time for compilers to find mechanisms to cope with the differences and optimize code accordingly.
Compatibility issue
The difference between FMA3 and FMA4 is whether the instruction can have three or four different operands. The FMA operation has the form:
d = round(a × b + c)
where round rounds the exact result once so that it fits the destination format. The 4-operand form (FMA4) allows a, b, c and d to be four different registers, while the 3-operand form (FMA3) requires that d is the same register as either a, b or c. The 3-operand form makes the code shorter and the hardware implementation slightly simpler, while the 4-operand form provides more programming flexibility.
See XOP instruction set for more discussion of compatibility issues between Intel and AMD.
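What distinguishes a fused multiply–add from a separate multiply followed by an add is that the product is not rounded before the addition. The Python sketch below is only a model of that behaviour, not a description of the hardware. It assumes exact rational arithmetic from the standard `fractions` module as a stand-in for the unrounded intermediate result, and the names `fused_multiply_add` and `separate_multiply_add` are hypothetical.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Model of a*b + c with one rounding: exact arithmetic, then a single conversion."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # no rounding happens here
    return float(exact)  # the only rounding step


def separate_multiply_add(a: float, b: float, c: float) -> float:
    """Ordinary multiply then add: the product is rounded before the addition."""
    return a * b + c


# Inputs chosen so that the exact product, 1 - 2**-60, is not representable
# as a double and is rounded to 1.0 when the multiply is done on its own.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

print(separate_multiply_add(a, b, c))  # 0.0
print(fused_multiply_add(a, b, c) == -(2.0**-60))  # True
```

With these inputs the separate version loses the result entirely, while the fused model keeps the small residual -2**-60.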
CPUs with FMA3
- Intel
  - Intel will introduce hardware FMA in processors based on the Haswell microarchitecture during 2013.
- AMD
  - AMD will support FMA3 in the future for compatibility reasons if Intel sticks to FMA3 only. It is rumoured that the second version of the Bulldozer processor core, codenamed Piledriver, which will arrive in 2012, might support FMA3.
Excerpt from FMA3
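Mnemonic (AT&T) | Operands | Operation |
---|---|---|
VFMADD132PDy | ymm, ymm, ymm/m256 | $0 = $0×$2 + $1 |
VFMADD132PSy | ymm, ymm, ymm/m256 | $0 = $0×$2 + $1 |
VFMADD132PDx | xmm, xmm, xmm/m128 | $0 = $0×$2 + $1 |
VFMADD132PSx | xmm, xmm, xmm/m128 | $0 = $0×$2 + $1 |
VFMADD132SD | xmm, xmm, xmm/m64 | $0 = $0×$2 + $1 |
VFMADD132SS | xmm, xmm, xmm/m32 | $0 = $0×$2 + $1 |
VFMADD213PDy | ymm, ymm, ymm/m256 | $0 = $1×$0 + $2 |
VFMADD213PSy | ymm, ymm, ymm/m256 | $0 = $1×$0 + $2 |
VFMADD213PDx | xmm, xmm, xmm/m128 | $0 = $1×$0 + $2 |
VFMADD213PSx | xmm, xmm, xmm/m128 | $0 = $1×$0 + $2 |
VFMADD213SD | xmm, xmm, xmm/m64 | $0 = $1×$0 + $2 |
VFMADD213SS | xmm, xmm, xmm/m32 | $0 = $1×$0 + $2 |
VFMADD231PDy | ymm, ymm, ymm/m256 | $0 = $1×$2 + $0 |
VFMADD231PSy | ymm, ymm, ymm/m256 | $0 = $1×$2 + $0 |
VFMADD231PDx | xmm, xmm, xmm/m128 | $0 = $1×$2 + $0 |
VFMADD231PSx | xmm, xmm, xmm/m128 | $0 = $1×$2 + $0 |
VFMADD231SD | xmm, xmm, xmm/m64 | $0 = $1×$2 + $0 |
VFMADD231SS | xmm, xmm, xmm/m32 | $0 = $1×$2 + $0 |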
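The three digits in each FMA3 mnemonic say which operands are multiplied and which one is added, and the first operand is always overwritten with the result. The following Python sketch is a rough model of the Operation column for a single scalar lane. The helper name `fma3` is hypothetical, and it uses ordinary Python arithmetic, so it illustrates operand selection only, not the single rounding of a real FMA.

```python
def fma3(form, operands):
    """Return the operand values after one FMA3 instruction of the given form.

    form     -- the digit string from the mnemonic: "132", "213" or "231"
    operands -- values of $0, $1, $2 as listed in the table
    """
    # The mnemonic digits are 1-based; the table's $0..$2 are 0-based.
    mul_a, mul_b, addend = (int(digit) - 1 for digit in form)
    result = list(operands)
    # $0 is the destination in every form, so one source value is lost.
    result[0] = operands[mul_a] * operands[mul_b] + operands[addend]
    return result


regs = [2.0, 3.0, 5.0]  # $0, $1, $2

print(fma3("132", regs))  # [13.0, 3.0, 5.0]  ($0*$2 + $1)
print(fma3("213", regs))  # [11.0, 3.0, 5.0]  ($1*$0 + $2)
print(fma3("231", regs))  # [17.0, 3.0, 5.0]  ($1*$2 + $0)
```

Having all three orderings lets a compiler pick the form in which the value it no longer needs is the one that gets overwritten.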
CPUs with FMA4
- AMD
  - Bulldozer processor core, due to begin production in 2011.
- Intel
  - It is uncertain whether future Intel processors will support FMA4, due to Intel's announced change to FMA3.
Excerpt from FMA4
Mnemonic (AT&T) | Operands | Operation |
---|---|---|
VFMADDPDx | xmm, xmm, xmm/m128, xmm/m128 | $0 = $1×$2 + $3 |
VFMADDPDy | ymm, ymm, ymm/m256, ymm/m256 | $0 = $1×$2 + $3 |
VFMADDPSx | xmm, xmm, xmm/m128, xmm/m128 | $0 = $1×$2 + $3 |
VFMADDPSy | ymm, ymm, ymm/m256, ymm/m256 | $0 = $1×$2 + $3 |
VFMADDSD | xmm, xmm, xmm/m64, xmm/m64 | $0 = $1×$2 + $3 |
VFMADDSS | xmm, xmm, xmm/m32, xmm/m32 | $0 = $1×$2 + $3 |
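Because the FMA4 destination is a separate operand, all three source values survive the instruction. To reach the same state, an FMA3 sequence needs an extra register copy first. The Python sketch below models this with a dictionary standing in for a register file. The function names and the register assignment are hypothetical, and plain Python arithmetic is used, so rounding is not modelled.

```python
def vfmadd_fma4(regs, dst, src1, src2, src3):
    """FMA4 model: dst = src1 * src2 + src3, no source is overwritten."""
    regs[dst] = regs[src1] * regs[src2] + regs[src3]


def vfmadd231_fma3(regs, dst, src1, src2):
    """FMA3 '231' model: dst = src1 * src2 + dst, the old dst value is lost."""
    regs[dst] = regs[src1] * regs[src2] + regs[dst]


# Goal: xmm0 = xmm1 * xmm2 + xmm3 while keeping xmm1, xmm2 and xmm3 intact.

# FMA4: a single instruction is enough.
fma4_regs = {"xmm0": 0.0, "xmm1": 2.0, "xmm2": 3.0, "xmm3": 5.0}
vfmadd_fma4(fma4_regs, "xmm0", "xmm1", "xmm2", "xmm3")

# FMA3: copy the addend into the destination first, then accumulate into it.
fma3_regs = {"xmm0": 0.0, "xmm1": 2.0, "xmm2": 3.0, "xmm3": 5.0}
fma3_regs["xmm0"] = fma3_regs["xmm3"]  # stands in for a register move
vfmadd231_fma3(fma3_regs, "xmm0", "xmm1", "xmm2")

print(fma4_regs["xmm0"], fma3_regs["xmm0"])  # 11.0 11.0
```

The extra move is only needed when all three inputs must be preserved. When one of them is dead after the operation, the 3-operand form costs nothing extra.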
History
The incompatibility between Intel's FMA3 and AMD's FMA4 is due to both companies changing plans without coordinating coding details with each other. AMD changed their plans from FMA3 to FMA4 while Intel changed their plans from FMA4 to FMA3 almost at the same time. The history can be summarized as follows:
- August 2007: AMD announces the SSE5 instruction set, which includes 3-operand FMA instructions. A new coding scheme (DREX) is introduced for allowing instructions to have three operands.
- April 2008: Intel announces their AVX and FMA instruction sets, including 4-operand FMA instructions. The coding of these instructions uses the new VEX coding scheme, which is more flexible than AMD's DREX scheme.
- December 2008: Intel changes the specification for their FMA instructions from 4-operand to 3-operand instructions. The VEX coding scheme is still used.
- May 2009: AMD changes the specification of their FMA instructions from the 3-operand DREX form to the 4-operand VEX form, compatible with the April 2008 Intel specification rather than the December 2008 Intel specification.
It is currently uncertain whether the 3-operand VEX coded form (here called FMA3) or the 4-operand form (FMA4) will be the dominating standard in the future. It is also possible that future processors will support both forms.