Tornado codes - AbsoluteAstronomy.com

Computer science

Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...

, Tornado codes are a class of erasure codes that support error correction. Tornado codes require a constant C more redundant blocks than the more data-efficient Reed-Solomon erasure codes, but are much faster to generate and can fix erasures faster. Software-based implementations of tornado codes are about 100 times faster on small lengths and about 10,000 times faster on larger lengths than Reed-Solomon erasure codes. Since the introduction of Tornado codes, many other similar erasure code

Erasure code

In information theory, an erasure code is a forward error correction code for the binary erasure channel, which transforms a message of k symbols into a longer message with n symbols such that the original message can be recovered from a subset of the n symbols...

s have emerged, most notably Online codes

Online codes

In computer science, online codes are an example of rateless erasure codes. These codes can encode a message into a number of symbols such that knowledge of any fraction of them allows one to recover the original message...

, LT codes

LT codes

In computer science, Luby transform codes are the first class of practical fountain codes that are near optimal erasure correcting codes invented by Michael Luby in 1998 and published in 2002. Like some other fountain codes, LT codes depend on sparse bipartite graphs to trade reception overhead...

and Raptor codes

Raptor codes

In computer science, raptor codes are the first known class of fountain codes with linear time encoding and decoding. They were invented by Amin Shokrollahi in 2000/2001 and were first published in 2004 as an extended abstract...

.

Tornado codes use a layered approach. All layers except the last use an LDPC error correction code, which is fast but has a chance of failure. The final layer uses a Reed-Solomon correction code, which is slower but is optimal in terms of failure recovery. Tornado codes dictates how many levels, how many recovery blocks in each level, and the distribution used to generate blocks for the non-final layers.

Overview

The input data is divided into blocks. Blocks are sequences of bits that are all the same size. Recovery data uses the same block size as the input data. The erasure of a block (input or recovery) is detected by some other means. (For example, a block from disk does not pass a CRC check or a network packet with a given sequence number never arrived.)

The number of recovery blocks is given by the user. Then the number of levels is determined along with the number of blocks in each level. The number in each level is determined by a factor B which is less than one. If there are N input blocks, the first recovery level has B*N blocks, the second has B*B*N, the third has B*B*B*N, and so on.

All levels of recovery except the final one use an LDPC, which works by xor (exclusive-or). Xor operates on binary values, 1s and 0s. A xor B is 1 if A and B have different values and 0 if A and B have the same values. If you are given (A xor B) and A, you can determine the value for B. (A xor B xor A = B) Similarly, if you are given (A xor B) and B, you can determine the value for A. This extends to multiple values, so given (A xor B xor C xor D) and any 3 of the values, the missing value can be recovered.

So the recovery blocks in level one are just the xor of some set of input blocks. Similarly, the recovery blocks in level two are each the xor of some set of blocks in level one. The blocks used in the xor are chosen randomly, without repetition. However, the number of blocks xor'ed to make a recovery block is chosen from a very specific distribution for each level.

Since xor is a fast operation and the recovery blocks are an xor of only a subset of the blocks in the input (or at a lower recovery level), the recovery blocks can be generated quickly.

The final level is a Reed-Solomon code. Reed-Solomon codes are optimal in terms of recovering from failures, but slow to generate and recover. Since each level has fewer blocks than the one before, the Reed-Solomon code has a small number of recovery blocks to generate and to use in recovery. So, even though Reed-Solomon is slow, it only has a small amount of data to handle.

During recovery, the Reed-Solomon code is recovered first. This is guaranteed to work if the number of missing blocks in the next-to-final level is less than the present blocks in the final level.

Going lower, the LDPC (xor) recovery level can be used to recover the level beneath it with high probability if all the recovery blocks are present and the level beneath is missing at most C' fewer blocks than the recovery level. The algorithm for recovery is to find some recovery block that has only one of its generating set missing from the lower level. Then the xor of the recovery block with all of the blocks that are present is equal to the missing block.

External links

A readable description from CMU (PostScript) http://www.cs.cmu.edu/afs/cs.cmu.edu/project/pscico-guyb/realworld/www/tornado.ps and another from Luby at the International Computer Science Institute

International Computer Science Institute

The International Computer Science Institute is an independent, non-profit research organization located in Berkeley, California, USA. Since its founding in 1988, ICSI has maintained an affiliation with the University of California, Berkeley, where several of its members hold faculty appointments...

(PostScript) http://www.icsi.berkeley.edu/~luby/PAPERS/tordig.ps.

The source of this article is wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.