Fully Buffered DIMM
Fully Buffered DIMM (or FB-DIMM) is a memory technology that can be used to increase the reliability and density of memory systems. Conventionally, data lines from the memory controller have to be connected to data lines in every DRAM module. As memory width and access speed increase, the signal degrades at the interface between the bus and the device, which limits the speed and/or the memory density. FB-DIMMs take a different approach to solve this problem. As with nearly all RAM specifications, the FB-DIMM specification was published by JEDEC.
Technology
Fully buffered DIMM architecture introduces an advanced memory buffer (AMB) between the memory controller and the memory module. Unlike the parallel bus architecture of traditional DRAM modules, an FB-DIMM has a serial interface between the memory controller and the AMB. This enables an increase in the width of the memory without increasing the pin count of the memory controller beyond a feasible level. With this architecture, the memory controller does not write to the memory module directly; rather, it writes via the AMB, which can compensate for signal deterioration by buffering and resending the signal. In addition, the AMB can offer error correction without imposing any overhead on the processor or the memory controller. It can also use the Bit Lane Failover Correction feature to identify bad data paths and remove them from operation, which dramatically reduces command/address errors. Also, since reads and writes are buffered, they can be performed in parallel by the memory controller. This allows simpler interconnects and, in theory, a hardware-agnostic memory controller, so that different memory types (such as DDR2 and DDR3) could be used interchangeably. The downsides of this approach are that it adds latency to memory requests, that the buffer chips consume additional power, and that current implementations provide a memory write bus significantly narrower than the memory read bus, so write-heavy workloads, such as many high-performance computing workloads, are significantly slowed by this architecture. However, this slowdown is nowhere near as severe as not having enough memory capacity to avoid heavy use of virtual memory, so workloads that use extreme amounts of memory in irregular patterns might still benefit from fully buffered DIMMs.
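As a rough, back-of-the-envelope illustration of the pin-count argument above (using the lane counts given in the Protocol section below, treating the 72 data lines of a 72-bit-wide parallel array as the comparison point, and ignoring clocks, power, and sideband signals), a hypothetical Python sketch:

```python
# Rough pin-count comparison; illustration only, not from the FB-DIMM spec.
northbound_lanes = 14        # memory -> controller (see Protocol section)
southbound_lanes = 10        # controller -> memory
wires_per_lane   = 2         # each lane is a differential pair

fbdimm_signal_pins = (northbound_lanes + southbound_lanes) * wires_per_lane

# A conventional parallel channel needs at least its 72 data lines for a
# 72-bit-wide (ECC) array, plus address, command and control signals.
parallel_data_pins = 72

print(fbdimm_signal_pins)    # 48 high-speed signal pins per FB-DIMM channel
print(parallel_data_pins)    # 72 data pins alone on a parallel channel
```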
Protocol
The JEDEC standard JESD206 defines the protocol, and JESD82-20 defines the AMB interface to DDR2 memory. The FB-DIMM channel consists of 14 "northbound" bit lanes carrying data from memory to the processor and 10 "southbound" bit lanes carrying commands and data from the processor to memory. Each bit is carried over a differential pair, clocked at 12 times the basic memory clock rate, which is 6 times the double-pumped data rate. For example, with DDR2-667 DRAM chips the channel operates at 4000 MHz. Every 12 cycles constitute one frame: 168 bits northbound and 120 bits southbound.
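A small sketch of this clocking arithmetic, assuming DDR2-667 (a base memory clock of roughly 333.33 MHz):

```python
# Channel clock and frame sizes for DDR2-667 (exact fractions avoid rounding).
from fractions import Fraction as F

base_clock_mhz    = F(1000, 3)           # ~333.33 MHz memory clock
data_rate_mts     = 2 * base_clock_mhz   # ~666.67 MT/s, double-pumped
channel_clock_mhz = 12 * base_clock_mhz  # exactly 4000 MHz
assert channel_clock_mhz == 6 * data_rate_mts == 4000

cycles_per_frame = 12                    # one frame per memory clock cycle
northbound_bits  = 14 * cycles_per_frame # 168 bits per northbound frame
southbound_bits  = 10 * cycles_per_frame # 120 bits per southbound frame
print(channel_clock_mhz, northbound_bits, southbound_bits)
```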
One northbound frame carries 144 data bits, the amount of data produced by a 72-bit-wide DDR SDRAM array in that time, and 24 bits of CRC for error detection. There is no header information, although unused frames include a deliberately invalid CRC.
One southbound frame carries 98 payload bits and 22 CRC bits. Two payload bits are a frame type and 24 bits are a command. Depending on the frame type, the remaining 72 bits may be 72 bits of write data, two more 24-bit commands, or one more command plus 36 bits of data to be written to an AMB control register.
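The bit budgets of the two frame formats can be checked against the lane counts; a short sketch using only the figures given above:

```python
# Per-frame bit budgets (12 channel cycles per frame).
cycles = 12

# Northbound: 14 lanes x 12 cycles = 168 bits = 144 data + 24 CRC.
nb_data, nb_crc = 144, 24
assert nb_data + nb_crc == 14 * cycles == 168
assert nb_data == 72 * 2     # a 72-bit array, two transfers per memory clock

# Southbound: 10 lanes x 12 cycles = 120 bits = 98 payload + 22 CRC,
# where the payload is 2 frame-type bits, a 24-bit command and 72 other bits.
sb_type, sb_cmd, sb_rest, sb_crc = 2, 24, 72, 22
sb_payload = sb_type + sb_cmd + sb_rest
assert sb_payload == 98 and sb_payload + sb_crc == 10 * cycles == 120
```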
The commands correspond to standard DRAM access cycles, such as row select, precharge, and refresh commands. Read and write commands include only column addresses. All commands include a 3-bit FB-DIMM address, allowing up to 8 FB-DIMM modules on a channel.
Because write data is supplied more slowly than DDR memory expects it, writes are buffered in the AMB until they can be written in a burst. Write commands are not directly linked to the write data; instead, each AMB has a write data FIFO which is filled by four consecutive write data frames, and is emptied by a write command.
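A deliberately simplified, hypothetical model of this write buffering (the class and method names are illustrative only; the real AMB behaviour is specified in JESD82-20):

```python
from collections import deque

class AmbWriteBuffer:
    """Toy model: four write-data frames fill the FIFO, a write command drains it."""
    def __init__(self):
        self.fifo = deque()                       # holds 72-bit write-data chunks

    def receive_write_data_frame(self, bits72):
        # A southbound frame whose 72 remaining payload bits are write data.
        self.fifo.append(bits72)

    def execute_write_command(self, dram):
        # A later write command empties four buffered chunks as one burst.
        burst = [self.fifo.popleft() for _ in range(4)]
        dram.write_burst(burst)

class DummyDram:
    def write_burst(self, burst):
        print(f"writing burst of {len(burst)} x 72 bits")

amb, dram = AmbWriteBuffer(), DummyDram()
for chunk in range(4):                            # four write-data frames arrive
    amb.receive_write_data_frame(chunk)
amb.execute_write_command(dram)                   # one command drains the FIFO
```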
Both northbound and southbound links can operate at full speed with one bit line disabled, by discarding 12 bits of CRC information per frame.
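A quick consistency check of this fail-over arithmetic, using the frame sizes given above:

```python
# With one lane disabled and 12 CRC bits discarded, each frame still fits.
cycles = 12
# Northbound: 13 remaining lanes carry 144 data bits + (24 - 12) CRC bits.
assert 13 * cycles == 144 + (24 - 12) == 156
# Southbound: 9 remaining lanes carry 98 payload bits + (22 - 12) CRC bits.
assert 9 * cycles == 98 + (22 - 12) == 108
```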
Note that the bandwidth of an FB-DIMM channel is equal to the peak read bandwidth of a DDR memory channel (and this speed can be sustained, as there is no contention for the northbound channel), plus half of the peak write bandwidth of a DDR memory channel (which can often be sustained, if one command per frame is sufficient). The only overhead is the need for a channel sync frame (which elicits a northbound status frame in response) every 32 to 42 frames (2.5–3% overhead).
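A worked example of these bandwidth figures for DDR2-667, assuming the 64-bit data portion of a 72-bit ECC-protected array and ignoring the sync-frame overhead:

```python
# Peak per-channel bandwidths for DDR2-667 (illustrative numbers).
transfers_per_sec  = 667e6
bytes_per_transfer = 8                    # 64-bit data portion of a 72-bit array

peak_read_bw  = transfers_per_sec * bytes_per_transfer   # northbound, sustainable
peak_write_bw = peak_read_bw / 2                         # southbound write data

print(round(peak_read_bw / 1e9, 1), "GB/s read")         # ~5.3 GB/s
print(round(peak_write_bw / 1e9, 1), "GB/s write")       # ~2.7 GB/s
```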
Implementations
Intel has adopted the technology for its newer Xeon 5000/5100 series and beyond, which it considers "a long-term strategic direction for servers".
Sun Microsystems uses FB-DIMMs for the Niagara II (UltraSPARC T2) server processor.
Intel's enthusiast system platform Skulltrail uses FB-DIMMs for its dual-CPU-socket, multi-GPU system.
FB-DIMMs have 240 pins and are the same total length as other DDR DIMMs, but differ by having indents on both ends within the slot.
The cost of FB-DIMM memory was initially much higher than that of registered DIMMs, which may be one of the factors behind its limited acceptance. The AMB chip also dissipates considerable heat, leading to additional cooling problems. And although strenuous efforts were made to minimize delay in the AMB, there is a noticeable cost in memory access latency.
Future
As of September 2006, AMD has taken FB-DIMM off its roadmap. In December 2006, AMD revealed in one of its slides that microprocessors based on the new K10 microarchitecture would support FB-DIMM "when appropriate". In addition, AMD developed the Socket G3 Memory Extender (G3MX), which uses a single buffer for every four modules instead of one per module, to be used by Opteron-based systems in 2009.
At the 2007 Intel Developer Forum, it was revealed that major memory manufacturers have no plans to extend FB-DIMM to support DDR3 SDRAM. Instead, only registered DIMMs for DDR3 SDRAM have been demonstrated.
In 2007, Intel demonstrated FB-DIMMs with shorter CL5 and CL3 latencies.
On August 5, 2008, Elpida Memory announced that it would mass-produce the world's first 16 GB FB-DIMM from Q4 2008; however, the product has not appeared, and the press release has been deleted from Elpida's site.
External links
- How FB-DIMM Memories Work
- The Inquirer series: Part 1 Part 2 Part 3