Fountain code
Encyclopedia
In coding theory
, fountain codes (also known as rateless erasure codes) are a class of erasure codes with the property that a potentially limitless sequence of encoding symbols can be generated from a given set of source symbols such that the original source symbols can ideally be recovered from any subset of the encoding symbols of size equal to or only slightly larger than the number of source symbols. The term fountain or rateless refers to the fact that these codes do not exhibit a fixed code rate
.
A fountain code is optimal if the original k source symbols can be recovered from any k encoding symbols. Fountain codes are known that have efficient encoding and decoding algorithms and that allow the recovery of the original k source symbols from any k’ of the encoding symbols with high probability, where k’ is just slightly larger than k.
LT codes
were the first practical realization of fountain codes. Raptor codes
and Online codes
were subsequently introduced, and achieve linear time encoding and decoding complexity through a pre-coding stage of the input symbols.
, or where a fixed code rate cannot be determined a priori, and where efficient encoding and decoding of large amounts of data is required.
One example is that of a data carousel, where some large file is continuously broadcast to a set of receivers. Using a fixed-rate erasure code, a receiver missing a source symbol (due to a transmission error) faces the coupon collector's problem
: it must successfully receive an encoding symbol which it does not already have. This problem becomes much more apparent when using a traditional short-length erasure code, as the file must be split into several blocks, each being separately encoded: the receiver must now collect the required number of missing encoding symbols for each block. Using a fountain code, it suffices for a receiver to retrieve any subset of encoding symbols of size slightly larger than the set of source symbols. (In practice, the broadcast is typically scheduled for a fixed period of time by an operator based on characteristics of the network and receivers and desired delivery reliability, and thus the fountain code is used at a code rate that is determined dynamically at the time when the file is scheduled to be broadcast.)
Another application is that of hybrid ARQ
in reliable multicast
scenarios: parity information that is requested by a receiver can potentially be useful for all receivers in the multicast group.
Raptor code, which has been adopted into multiple standards beyond the IETF, such as within the 3GPP
MBMS
standard for broadcast file delivery and streaming services, the DVB-H
IPDC standard for delivering IP services over DVB networks, and DVB-IPTV for delivering commercial TV services over an IP network. This code can be used with up to 8,192 source symbols in a source block, and a total of up to 65,536 encoded symbols generated for a source block. This code has an average relative reception overhead of 0.2% when applied to source blocks with 1,000 source symbols, and has a relative reception overhead of less than 2% with probability 99.9999%. The relative reception overhead is defined as the extra encoding data required beyond the length of the source data to recover the original source data, measured as a percentage of the size of the source data. For example, if the relative reception overhead is 0.2%, then this means that source data of size 1 Megabyte can be recovered from 1.002 Megabytes of encoding data.
A more advanced Raptor code with greater flexibility and improved reception overhead, called RaptorQ, has been introduced into the IETF. This code can be used with up to 56,403 source symbols in a source block, and a total of up to 16,777,216 encoded symbols generated for a source block. This code is able to recover a source block from any set of encoded symbols equal to the number of source symbols in the source block with high probability, and in rare cases from slightly more than the number of source symbols in the source block.
Coding theory
Coding theory is the study of the properties of codes and their fitness for a specific application. Codes are used for data compression, cryptography, error-correction and more recently also for network coding...
, fountain codes (also known as rateless erasure codes) are a class of erasure codes with the property that a potentially limitless sequence of encoding symbols can be generated from a given set of source symbols such that the original source symbols can ideally be recovered from any subset of the encoding symbols of size equal to or only slightly larger than the number of source symbols. The term fountain or rateless refers to the fact that these codes do not exhibit a fixed code rate
Code rate
In telecommunication and information theory, the code rate of a forward error correction code is the proportion of the data-stream that is useful...
.
A fountain code is optimal if the original k source symbols can be recovered from any k encoding symbols. Fountain codes are known that have efficient encoding and decoding algorithms and that allow the recovery of the original k source symbols from any k’ of the encoding symbols with high probability, where k’ is just slightly larger than k.
LT codes
LT codes
In computer science, Luby transform codes are the first class of practical fountain codes that are near optimal erasure correcting codes invented by Michael Luby in 1998 and published in 2002. Like some other fountain codes, LT codes depend on sparse bipartite graphs to trade reception overhead...
were the first practical realization of fountain codes. Raptor codes
Raptor codes
In computer science, raptor codes are the first known class of fountain codes with linear time encoding and decoding. They were invented by Amin Shokrollahi in 2000/2001 and were first published in 2004 as an extended abstract...
and Online codes
Online codes
In computer science, online codes are an example of rateless erasure codes. These codes can encode a message into a number of symbols such that knowledge of any fraction of them allows one to recover the original message...
were subsequently introduced, and achieve linear time encoding and decoding complexity through a pre-coding stage of the input symbols.
Applications
Fountain codes are flexibly applicable at a fixed code rateCode rate
In telecommunication and information theory, the code rate of a forward error correction code is the proportion of the data-stream that is useful...
, or where a fixed code rate cannot be determined a priori, and where efficient encoding and decoding of large amounts of data is required.
One example is that of a data carousel, where some large file is continuously broadcast to a set of receivers. Using a fixed-rate erasure code, a receiver missing a source symbol (due to a transmission error) faces the coupon collector's problem
Coupon collector's problem
In probability theory, the coupon collector's problem describes the "collect all coupons and win" contests. It asks the following question: Suppose that there are n coupons, from which coupons are being collected with replacement...
: it must successfully receive an encoding symbol which it does not already have. This problem becomes much more apparent when using a traditional short-length erasure code, as the file must be split into several blocks, each being separately encoded: the receiver must now collect the required number of missing encoding symbols for each block. Using a fountain code, it suffices for a receiver to retrieve any subset of encoding symbols of size slightly larger than the set of source symbols. (In practice, the broadcast is typically scheduled for a fixed period of time by an operator based on characteristics of the network and receivers and desired delivery reliability, and thus the fountain code is used at a code rate that is determined dynamically at the time when the file is scheduled to be broadcast.)
Another application is that of hybrid ARQ
Hybrid ARQ
Hybrid automatic repeat request is a combination of high-rate forward error-correcting coding, and ARQ error-control for detectable-but-uncorrectable errors. In standard ARQ, redundant bits are added to data to be transmitted using an error-detecting code such as cyclic redundancy check...
in reliable multicast
Reliable multicast
A reliable multicast protocol is a computer networking protocol that provides a reliable sequence of packets to multiple recipients simultaneously, making it suitable for applications like multi-receiver file transfer or streaming media.-Overview:...
scenarios: parity information that is requested by a receiver can potentially be useful for all receivers in the multicast group.
Fountain codes in standards
Raptor codes are the most efficient fountain codes at this time, having very efficient linear time encoding and decoding algorithms, and requiring only a small constant number of XOR operations per generated symbol for both encoding and decoding. IETF RFC 5053 specifies in detail a systematicSystematic code
In coding theory, a systematic code is any error-correcting code in which the input data is embedded in the encoded output. Conversely, in a non-systematic code the output does not contain the input symbols....
Raptor code, which has been adopted into multiple standards beyond the IETF, such as within the 3GPP
3GPP
The 3rd Generation Partnership Project is a collaboration between groups of telecommunications associations, known as the Organizational Partners...
MBMS
MBMS
MBMS may refer to:* Multimedia Broadcast Multicast Service* Margaret Brent Middle School...
standard for broadcast file delivery and streaming services, the DVB-H
DVB-H
DVB-H is one of three prevalent mobile TV formats. It is a technical specification for bringing broadcast services to mobile handsets. DVB-H was formally adopted as ETSI standard EN 302 304 in November 2004. The DVB-H specification can be downloaded from the official DVB-H website...
IPDC standard for delivering IP services over DVB networks, and DVB-IPTV for delivering commercial TV services over an IP network. This code can be used with up to 8,192 source symbols in a source block, and a total of up to 65,536 encoded symbols generated for a source block. This code has an average relative reception overhead of 0.2% when applied to source blocks with 1,000 source symbols, and has a relative reception overhead of less than 2% with probability 99.9999%. The relative reception overhead is defined as the extra encoding data required beyond the length of the source data to recover the original source data, measured as a percentage of the size of the source data. For example, if the relative reception overhead is 0.2%, then this means that source data of size 1 Megabyte can be recovered from 1.002 Megabytes of encoding data.
A more advanced Raptor code with greater flexibility and improved reception overhead, called RaptorQ, has been introduced into the IETF. This code can be used with up to 56,403 source symbols in a source block, and a total of up to 16,777,216 encoded symbols generated for a source block. This code is able to recover a source block from any set of encoded symbols equal to the number of source symbols in the source block with high probability, and in rare cases from slightly more than the number of source symbols in the source block.
See also
- Raptor codesRaptor codesIn computer science, raptor codes are the first known class of fountain codes with linear time encoding and decoding. They were invented by Amin Shokrollahi in 2000/2001 and were first published in 2004 as an extended abstract...
- LT codesLT codesIn computer science, Luby transform codes are the first class of practical fountain codes that are near optimal erasure correcting codes invented by Michael Luby in 1998 and published in 2002. Like some other fountain codes, LT codes depend on sparse bipartite graphs to trade reception overhead...
- Online codesOnline codesIn computer science, online codes are an example of rateless erasure codes. These codes can encode a message into a number of symbols such that knowledge of any fraction of them allows one to recover the original message...
- Tornado codesTornado codesIn computer science, Tornado codes are a class of erasure codes that support error correction. Tornado codes require a constant C more redundant blocks than the more data-efficient Reed-Solomon erasure codes, but are much faster to generate and can fix erasures faster...
, the precursor to Fountain codes