Fletcher's checksum
The Fletcher checksum is an algorithm for computing a position-dependent checksum devised by John G. Fletcher at Lawrence Livermore Labs in the late 1970s. A description of the algorithm and an analysis of the performance characteristics of a particular implementation were published in the IEEE Transactions on Communications in January 1982. The objective of the Fletcher checksum was to provide error-detection properties approaching those of a cyclic redundancy check but with the lower computational effort associated with summation techniques.
Review of simple checksums
As with simpler checksum algorithms, the Fletcher checksum involves dividing the binary data word to be protected from errors into short "blocks" of bits and computing the modular sum of those blocks. (Note that the terminology used in this domain can be confusing. The data to be protected, in its entirety, is referred to as a "word" and the pieces into which it is divided are referred to as "blocks". It is tempting to think of a block of data divided into words, which gets the terms the wrong way round.)
As an example, the data may be a message to be transmitted consisting of 136 characters, each stored as an 8-bit byte, making a data word of 1088 bits in total. A convenient block size would be 8 bits, although this is not required. Similarly, a convenient modulus would be 255, although, again, others could be chosen. So, the simple checksum is computed by adding together all the 8-bit bytes of the message, dividing by 255 and keeping only the remainder. (In practice, the modulo operation is performed during the summation to control the size of the result.) The checksum value is transmitted with the message, increasing its length to 137 bytes or 1096 bits. The receiver of the message can re-compute the checksum and compare it to the value received to determine whether the message has been altered by the transmission process.
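The simple checksum described above can be sketched in C as follows. This is a minimal illustration, assuming 8-bit blocks and the modulus 255 from the example; the function names `simple_checksum` and `simple_checksum_ok` are hypothetical and chosen only for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Simple checksum: sum of all 8-bit blocks, reduced modulo 255 as it goes. */
uint8_t simple_checksum(const uint8_t *msg, size_t len)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < len; ++i) {
        sum = (uint16_t)((sum + msg[i]) % 255);  /* stays below 255 */
    }
    return (uint8_t)sum;
}

/* Receiver side: the last byte of the frame is the transmitted checksum.
   Returns 1 when the recomputed value matches it, 0 otherwise. */
int simple_checksum_ok(const uint8_t *frame, size_t frame_len)
{
    if (frame_len == 0) {
        return 0;
    }
    return simple_checksum(frame, frame_len - 1) == frame[frame_len - 1];
}
```

For the 136-byte message of the example, the sender would append the result of simple_checksum over those 136 bytes as byte 137, and the receiver would call simple_checksum_ok on all 137 bytes.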
Weaknesses of simple checksums
The first weakness of the simple checksum is that it is insensitive to the order of the blocks (bytes) in the data word (message). If the order is changed, the checksum value will be the same and the change will not be detected. The second weakness is that the universe of checksum values is small, being equal to the chosen modulus. In our example, there are only 255 possible checksum values, so it is easy to see that even random data has about a 0.4% probability of having the same checksum as our message.
The Fletcher checksum
Fletcher addresses both of these weaknesses by computing a second value along with the simple checksum. This is the modular sum of the values taken by the simple checksum as each block of the data word is added to it. The modulus used is the same. So, for each block of the data word, taken in sequence, the block's value is added to the first sum and the new value of the first sum is then added to the second sum. Both sums start with the value zero (or some other known value). At the end of the data word, the modulus operator is applied and the two values are combined to form the Fletcher checksum value.
Sensitivity to the order of blocks is introduced because once a block is added to the first sum, it is then repeatedly added to the second sum along with every block after it. If, for example, two adjacent blocks become exchanged, the one that was originally first will be added to the second sum one fewer times and the one that was originally second will be added to the second sum one more time. The final value of the first sum will be the same but the second sum will be different, detecting the change to the message.
The universe of possible checksum values is now the square of the value for the simple checksum. In our example, the two sums each with 255 possible values result in 65025 possible values for the combined checksum.
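The effect of the second sum can be illustrated with a short C sketch. It assumes 8-bit blocks and modulus 255, and the names `fletcher_sums` and `fletcher_sums_of` are hypothetical. For simplicity the modulo is applied after every addition, which gives the same result as applying it at the end.

```c
#include <stddef.h>
#include <stdint.h>

/* The two running sums of the Fletcher checksum, both taken modulo 255. */
typedef struct {
    uint8_t sum1;  /* the simple checksum */
    uint8_t sum2;  /* modular sum of every intermediate value of sum1 */
} fletcher_sums;

fletcher_sums fletcher_sums_of(const uint8_t *blocks, size_t n)
{
    uint16_t sum1 = 0;
    uint16_t sum2 = 0;
    for (size_t i = 0; i < n; ++i) {
        sum1 = (uint16_t)((sum1 + blocks[i]) % 255);
        sum2 = (uint16_t)((sum2 + sum1) % 255);
    }
    fletcher_sums result = { (uint8_t)sum1, (uint8_t)sum2 };
    return result;
}
```

Applied to the blocks 0x10 0x20 0x30 and to the same blocks with the first two exchanged (0x20 0x10 0x30), the first sum is 0x60 in both cases, while the second sum is 0xA0 for the original order and 0xB0 for the exchanged order.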
Fletcher-16
When the data word is divided into 8-bit blocks, as in the example above, two 8-bit sums result and are combined into a 16-bit Fletcher checksum. Usually, the second sum will be multiplied by 256 and added to the simple checksum, effectively stacking the sums side-by-side in a 16-bit word with the simple checksum at the least significant end. This algorithm is then called the Fletcher-16 checksum. The use of the modulus 255 is also generally implied.
The choice of modulus must obviously be such that the results will fit in the block size. 256 is therefore the largest possible modulus for Fletcher-16. It is a poor choice, however, as bits that overflow past bit 7 of the sum are simply lost. A modulus that takes the overflow bits and mixes them into the lower bits provides better error detection. The modulus should, however, be large so as to obtain the largest universe of checksum values. The value 255 takes the second consideration over the first, but has been found to have excellent performance.
Fletcher-32
When the data word is divided into 16-bit blocks, two 16-bit sums result and are combined into a 32-bit Fletcher checksum. Usually, the second sum will be multiplied by 2^16 (65,536) and added to the simple checksum, effectively stacking the sums side-by-side in a 32-bit word with the simple checksum at the least significant end. This algorithm is then called the Fletcher-32 checksum. The use of the modulus 65,535 is also generally implied. The rationale for this choice is the same as for Fletcher-16.
Fletcher-64
When the data word is divided into 32-bit blocks, two 32-bit sums result and are combined into a 64-bit Fletcher checksum. Usually, the second sum will be multiplied by 2^32 (4,294,967,296) and added to the simple checksum, effectively stacking the sums side-by-side in a 64-bit word with the simple checksum at the least significant end. This algorithm is then called the Fletcher-64 checksum. The use of the modulus 4,294,967,295 is also generally implied. The rationale for this choice is the same as for Fletcher-16 and Fletcher-32.
Comparison with the Adler checksum
The Adler-32 checksum is a specialization of the Fletcher-32 checksum devised by Mark Adler. The modulus selected (for both sums) is the prime number 65,521 (65,535 is divisible by 3, 5, 17 and 257). The first sum also begins with the value 1. The selection of a prime modulus results in improved "mixing" (error patterns are detected with more uniform probability, improving the probability that the least detectable patterns will be detected, which tends to dominate overall performance). However, the reduction in size of the universe of possible checksum values acts against this and reduces performance slightly. Studies show that the difference in performance of the Adler-32 and Fletcher-32 checksums is so small as to be of academic interest only. As modulo-65,535 addition is considerably simpler and faster to implement than modulo-65,521 addition, the Fletcher-32 checksum is generally to be preferred.
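The two differences named above (the prime modulus 65,521 and a first sum that starts at 1) are easy to see in a straightforward C sketch. The function name `adler32_sketch` is hypothetical, and the sketch assumes the usual byte-at-a-time formulation of Adler-32, which is a detail not spelled out in the text above; it makes no attempt at speed.

```c
#include <stddef.h>
#include <stdint.h>

/* Straightforward Adler-32: two running sums modulo the prime 65521.
   The first sum starts at 1, the second at 0. */
uint32_t adler32_sketch(const uint8_t *data, size_t len)
{
    uint32_t a = 1;
    uint32_t b = 0;
    for (size_t i = 0; i < len; ++i) {
        a = (a + data[i]) % 65521;
        b = (b + a) % 65521;
    }
    return (b << 16) | a;  /* second sum in the upper 16 bits */
}
```

The modulo by 65,521 needs a genuine division (or an equivalent reduction), whereas a reduction modulo 65,535 can be done with a shift and an add, which is the speed argument made above.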
Example calculation of the Fletcher checksum
As an example, we'll calculate the checksum and verify it for a byte stream of 0x01 0x02. The sums in this example wrap around at 8 bits, that is, they are taken modulo 256.
- C0_initial = 0
- C1_initial = 0
Byte | C0 = C0 + B | C1 = C1 + C0 |
---|---|---|
0x01 | 0x01 | 0x01 |
0x02 | 0x03 | 0x04 |
−(C0 + C1) = 0xF9 | 0x03 + 0xF9 = 0xFC | 0x04 + 0xFC = 0x00 |
−(C0 + C1) = 0x04 | 0xFC + 0x04 = 0x00 | 0x00 + 0x00 = 0x00 |
The checksum bytes are 0xF9 0x04. The transmitted byte stream is 0x01 0x02 0xF9 0x04. The receiver runs the checksum on all four bytes and calculates a passing checksum of 0x00 0x00.
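The table can be reproduced with a few lines of C. This is a sketch of this particular worked example only: it keeps both sums in 8-bit variables so that they wrap modulo 256, as the table's arithmetic does, and the names `sums8`, `run_sums` and `check_bytes` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Running sums C0 and C1, held in 8-bit variables so they wrap modulo 256. */
typedef struct {
    uint8_t c0;
    uint8_t c1;
} sums8;

/* Feed n bytes into the running sums and return the updated sums. */
sums8 run_sums(sums8 s, const uint8_t *bytes, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        s.c0 = (uint8_t)(s.c0 + bytes[i]);
        s.c1 = (uint8_t)(s.c1 + s.c0);
    }
    return s;
}

/* Derive the two check bytes as in the table: each one is -(C0 + C1),
   computed from the sums as they stand at that point. */
void check_bytes(const uint8_t *bytes, size_t n, uint8_t out[2])
{
    sums8 s = {0, 0};
    s = run_sums(s, bytes, n);
    out[0] = (uint8_t)(0u - (unsigned)(s.c0 + s.c1));
    s = run_sums(s, &out[0], 1);   /* account for the first check byte */
    out[1] = (uint8_t)(0u - (unsigned)(s.c0 + s.c1));
}
```

For the input 0x01 0x02 this yields the check bytes 0xF9 and 0x04, and running run_sums over all four bytes from zero leaves both sums at 0x00.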
Weaknesses
The Fletcher checksum cannot distinguish between blocks of all 0 bits and blocks of all 1 bits. For example, if a 16-bit block in the data word changes from 0x0000 to 0xFFFF, the Fletcher-32 checksum remains the same. This also means a sequence of all 00 bytes has the same checksum as a sequence (of the same size) of all FF bytes.
Implementation
These examples assume two's complement arithmetic, as Fletcher's algorithm will be incorrect on one's complement machines.
Straightforward
The following shows how to calculate the checksum over data that includes the check bytes; i.e., the final result should equal 0, given properly-calculated check bytes. The code by itself, however, will not calculate the check bytes.
An inefficient but straightforward C language function to compute the Fletcher-16 checksum of an array of 8-bit data elements follows:
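A minimal sketch of such a function is given here. It is laid out so that the line numbers used in the description that follows apply when counting from the line carrying the function signature (the two `#include` lines and the blank line after them are not counted). The function name `fletcher16` and the parameter names are assumptions made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

uint16_t fletcher16(const uint8_t *data, size_t count)
{
    uint16_t sum1 = 0;  /* first sum: the simple checksum */
    uint16_t sum2 = 0;  /* second sum: sum of all values of sum1 */
    size_t index = 0;

    while (index < count)
    {
        sum1 = (uint16_t)((sum1 + data[index]) % 255);
        sum2 = (uint16_t)((sum2 + sum1) % 255);
        ++index;
    }
    return (uint16_t)((sum2 << 8) | sum1);
}
```

As a quick check, the five ASCII bytes of "abcde" give 0xC8F0, and the two bytes 0x01 0x02 give 0x0403.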
On lines 3 and 4, the sums are 16-bit variables so that the additions on lines 9 and 10 will not overflow. The modulo operation is applied to the first sum on line 9 and to the second sum on line 10. Here, this is done after each addition, so that at the end of each pass through the while loop the sums are always reduced to 8 bits. At the end of the input data, the two sums are combined into the 16-bit Fletcher checksum value and returned by the function on line 13.
Each sum is computed modulo 255 and thus remains less than 0xFF at all times. This implementation will thus never produce the checksum results 0x00FF, 0xFF00 or 0xFFFF. It can produce the checksum result 0x0000, which may not be desirable in some circumstances (e.g. when this value has been reserved to mean "no checksum has been computed").
Check bytes
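One way to derive check bytes for the modulo-255 Fletcher-16 checksum is sketched below. The helper name `fletcher16_check_bytes`, its signature, and the choice to append `c0` before `c1` are assumptions of this sketch; a compact copy of the straightforward checksum function is included so that the block stands on its own.

```c
#include <stddef.h>
#include <stdint.h>

/* Straightforward Fletcher-16, modulus 255, second sum in the high byte. */
uint16_t fletcher16(const uint8_t *data, size_t count)
{
    uint16_t sum1 = 0;
    uint16_t sum2 = 0;
    for (size_t i = 0; i < count; ++i) {
        sum1 = (uint16_t)((sum1 + data[i]) % 255);
        sum2 = (uint16_t)((sum2 + sum1) % 255);
    }
    return (uint16_t)((sum2 << 8) | sum1);
}

/* Hypothetical helper: given the checksum of the data alone, compute two
   check bytes such that the checksum of data + c0 + c1 is 0x0000. */
void fletcher16_check_bytes(uint16_t csum, uint8_t *c0, uint8_t *c1)
{
    uint16_t f0 = csum & 0xff;          /* first sum of the data  */
    uint16_t f1 = (csum >> 8) & 0xff;   /* second sum of the data */
    *c0 = (uint8_t)(0xff - ((f0 + f1) % 0xff));
    *c1 = (uint8_t)(0xff - ((f0 + *c0) % 0xff));
}
```

For the data bytes 0x01 0x02 (checksum 0x0403) this gives the check bytes 0xF8 and 0x04, and the checksum over all four bytes is 0x0000. The worked example earlier arrives at 0xF9 for the first check byte because its arithmetic wraps modulo 256 rather than 255.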
The check bytes can be calculated from the checksum returned by the straightforward function above. They may be appended to the end of the data stream, with the MSB coming before the LSB, and the LSB being the last byte in the stream.
Optimizations
An optimized implementation in the C programming language operates as follows:
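A sketch of such an optimized Fletcher-32 routine is shown here. The function name `fletcher32` and its parameters are assumptions, and the input is taken to be already split into 16-bit blocks. The sketch processes 359 blocks per pass, a conservative choice that stays within the 32-bit accumulators even under the worst-case assumption that both sums enter a pass at 0x1fffe.

```c
#include <stddef.h>
#include <stdint.h>

uint32_t fletcher32(const uint16_t *data, size_t words)
{
    /* Start at 0xffff, which is equivalent to 0 modulo 65535. */
    uint32_t sum1 = 0xffff;
    uint32_t sum2 = 0xffff;

    while (words) {
        /* Accumulate a bounded run of blocks before reducing. */
        size_t tlen = words > 359 ? 359 : words;
        words -= tlen;
        do {
            sum1 += *data++;
            sum2 += sum1;
        } while (--tlen);
        /* End-around carry: partial reduction modulo 65535. */
        sum1 = (sum1 & 0xffff) + (sum1 >> 16);
        sum2 = (sum2 & 0xffff) + (sum2 >> 16);
    }
    /* Second reduction step brings both sums into the range 1..0xffff. */
    sum1 = (sum1 & 0xffff) + (sum1 >> 16);
    sum2 = (sum2 & 0xffff) + (sum2 >> 16);
    return (sum2 << 16) | sum1;
}
```

For the three blocks 0x6261 0x6463 0x6665 the result is 0x56502D2A.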
A few tricks, well-known to implementers of the IP checksum, are used here for efficiency:
- This reduces to the range 1..65535 rather than 0..65534. Modulo 65535, the values 65535 = 0xffff and 0 are equivalent, but it is easier to detect overflow if the former convention is used. This also provides the guarantee that the resultant checksum will never be zero, so that value is available for a special flag, such as "checksum not yet computed".
- 65536 ≡ 1 mod 65535, so the end-around carry expression (x & 0xffff) + (x >> 16) reduces x modulo 65535. Only doing it once is not guaranteed to be complete, but it will be in the range 1..0x1fffe. A second repetition guarantees a fully reduced sum in the range of 1..0xffff.
- This uses a 32-bit accumulator to perform a number of sums before doing any modular reduction. The magic value 360 is the largest number of sums that can be performed without numeric overflow. Any smaller value is also permissible; 256 may be convenient in many cases.
- For 8-bit checksums (with 16-bit accumulators) the maximum number of sums that can be performed before doing the modular reduction is 21.
An efficient 8-bit implementation in the C programming language is as follows:
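The sketch below applies the same techniques to 8-bit blocks with 16-bit accumulators. The function name `fletcher16_fast` is an assumption, and the run length of 20 bytes per pass is a deliberately conservative choice, slightly below the limit quoted above.

```c
#include <stddef.h>
#include <stdint.h>

uint16_t fletcher16_fast(const uint8_t *data, size_t bytes)
{
    /* Start at 0xff, which is equivalent to 0 modulo 255. */
    uint16_t sum1 = 0xff;
    uint16_t sum2 = 0xff;

    while (bytes) {
        /* Accumulate a short run of bytes before reducing. */
        size_t tlen = bytes > 20 ? 20 : bytes;
        bytes -= tlen;
        do {
            sum1 = (uint16_t)(sum1 + *data++);
            sum2 = (uint16_t)(sum2 + sum1);
        } while (--tlen);
        /* End-around carry: partial reduction modulo 255. */
        sum1 = (uint16_t)((sum1 & 0xff) + (sum1 >> 8));
        sum2 = (uint16_t)((sum2 & 0xff) + (sum2 >> 8));
    }
    /* Second reduction step brings both sums into the range 1..0xff. */
    sum1 = (uint16_t)((sum1 & 0xff) + (sum1 >> 8));
    sum2 = (uint16_t)((sum2 & 0xff) + (sum2 >> 8));
    return (uint16_t)((sum2 << 8) | sum1);
}
```

Because the sums are reduced to 1..0xff rather than 0..0xfe, a sum that the straightforward version reports as 0x00 appears here as 0xff; all other values agree, for example 0xC8F0 for the ASCII bytes of "abcde".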
Bit and byte ordering (endianness / network order)
As with any calculation that divides a binary data word into short blocks and treats the blocks as numbers, any two systems expecting to get the same result should preserve the ordering of bits in the data word. In this respect, the Fletcher checksum is no different from other checksum and CRC algorithms and needs no special explanation.
An ordering problem that is easy to envision occurs when the data word is transferred byte-by-byte between a big-endian system and a little-endian system and the Fletcher-32 checksum is computed. If blocks are extracted from the data word in memory by a simple read of a 16-bit unsigned integer, then the values of the blocks will be different in the two systems, due to the reversal of the byte order of 16-bit data elements in memory, and the checksum result will be different as a consequence. The implementation examples, above, do not address ordering issues so as not to obscure the checksum algorithm. Because the Fletcher-16 checksum uses 8-bit blocks, it is not affected by byte endianness.
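One way to sidestep the ordering problem is to assemble each 16-bit block from individual bytes with an explicitly chosen byte order, instead of reading a 16-bit integer from memory. The sketch below assumes a little-endian block layout (first byte least significant) and zero-pads a trailing odd byte; both choices, and the name `fletcher32_le`, are assumptions of this illustration, and both ends of a link would have to agree on them.

```c
#include <stddef.h>
#include <stdint.h>

/* Straightforward Fletcher-32 over a byte buffer. Each 16-bit block is
   built from two bytes, first byte least significant, so the result does
   not depend on the byte order of the machine running the code. */
uint32_t fletcher32_le(const uint8_t *buf, size_t len)
{
    uint32_t sum1 = 0;
    uint32_t sum2 = 0;
    for (size_t i = 0; i < len; i += 2) {
        uint32_t lo = buf[i];
        uint32_t hi = (i + 1 < len) ? buf[i + 1] : 0;  /* pad odd length */
        uint32_t block = lo | (hi << 8);
        sum1 = (sum1 + block) % 65535;
        sum2 = (sum2 + sum1) % 65535;
    }
    return (sum2 << 16) | sum1;
}
```

With this layout the ASCII bytes of "abcdef" give 0x56502D2A and those of "abcde" give 0xF04FC729 on any host.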
Further references
- Fletcher, J. G., “An Arithmetic Checksum for Serial Transmissions”, IEEE Trans. on Comm., Vol. COM-30, No. 1, January 1982, pp 247–252.
External links
- RFC 905 - ISO Transport Protocol Specification describes the Fletcher checksum algorithm summing to zero.
- RFC 1146 - TCP Alternate Checksum Options describes the Fletcher checksum algorithm for use with TCP.
- RFC 905 - information about generating (as well as verifying) such a checksum in Annex B.
- Performance of Checksums and CRCs over Real Data
- Maxino & Koopman - compares Adler, Fletcher, and CRC checksums
- John Kodis - When it comes to high-speed data verification, Fletcher's checksum algorithm can do the job.