Rabin fingerprint
Encyclopedia
The Rabin fingerprinting scheme is a method for implementing public key fingerprint
s using polynomial
s over a finite field
.
GF(2).
We then pick a random irreducible polynomial
p(x) of degree k over GF(2), and we define the fingerprint of m to be the remainder after division of by over GF(2) which can be viewed as a polynomial of degree k-1 or as a k-bit number.
The basic idea is that the filesystem computes the cryptographic hash of each block in a file. To save on transfers between the client and server,
they compare their checksums and only transfer blocks whose checksums differ. But one problem with this scheme is that a single insertion at the beginning of the file will cause every checksum to change if fixed-sized (e.g. 4 KB) blocks are used. So the idea is to select blocks not based on a specific offset but rather by some property of the block contents. LBFS does this by sliding a 48 byte window over the file and computing the Rabin fingerprint of each window. When the low 13 bits of the fingerprint are zero LBFS calls those 48 bytes a breakpoint and ends the current block and begins a new one. Since the output of Rabin fingerprints are pseudo-random the probability of any given 48 bytes being a breakpoint is . This has the effect of shift-resistant variable size blocks. Now, it is clear that any cryptographic hash could be used for this purpose: but the Rabin fingerprint is an efficient rolling hash
, since the computation of the Rabin fingerprint of region B can reuse some of the computation of the Rabin fingerprint of region A when regions A and B overlap. Note that this is a problem similar to that faced by rsync
.
Public key fingerprint
In public-key cryptography, a public key fingerprint is a short sequence of bytes used to authenticate or look up a longer public key. Fingerprints are created by applying a cryptographic hash function to a public key...
s using polynomial
Polynomial
In mathematics, a polynomial is an expression of finite length constructed from variables and constants, using only the operations of addition, subtraction, multiplication, and non-negative integer exponents...
s over a finite field
Finite field
In abstract algebra, a finite field or Galois field is a field that contains a finite number of elements. Finite fields are important in number theory, algebraic geometry, Galois theory, cryptography, and coding theory...
.
Scheme
Given an n-bit message m0,...,mn-1, we view it as a polynomial of degree n-1 over the finite fieldFinite field
In abstract algebra, a finite field or Galois field is a field that contains a finite number of elements. Finite fields are important in number theory, algebraic geometry, Galois theory, cryptography, and coding theory...
GF(2).
We then pick a random irreducible polynomial
Irreducible polynomial
In mathematics, the adjective irreducible means that an object cannot be expressed as the product of two or more non-trivial factors in a given set. See also factorization....
p(x) of degree k over GF(2), and we define the fingerprint of m to be the remainder after division of by over GF(2) which can be viewed as a polynomial of degree k-1 or as a k-bit number.
Applications
The Low Bandwidth Network Filesystem (LBFS) from MIT uses Rabin fingerprints to implement variable size shift-resistant blocks.The basic idea is that the filesystem computes the cryptographic hash of each block in a file. To save on transfers between the client and server,
they compare their checksums and only transfer blocks whose checksums differ. But one problem with this scheme is that a single insertion at the beginning of the file will cause every checksum to change if fixed-sized (e.g. 4 KB) blocks are used. So the idea is to select blocks not based on a specific offset but rather by some property of the block contents. LBFS does this by sliding a 48 byte window over the file and computing the Rabin fingerprint of each window. When the low 13 bits of the fingerprint are zero LBFS calls those 48 bytes a breakpoint and ends the current block and begins a new one. Since the output of Rabin fingerprints are pseudo-random the probability of any given 48 bytes being a breakpoint is . This has the effect of shift-resistant variable size blocks. Now, it is clear that any cryptographic hash could be used for this purpose: but the Rabin fingerprint is an efficient rolling hash
Rolling hash
A rolling hash is a hash function where the input is hashed in a window that moves through the input.A few hash functions allow a rolling hash to be computed very quickly -- the new hash value is rapidly calculated given only the old hash value, the old value removed from the window, and the new...
, since the computation of the Rabin fingerprint of region B can reuse some of the computation of the Rabin fingerprint of region A when regions A and B overlap. Note that this is a problem similar to that faced by rsync
Rsync
rsync is a software application and network protocol for Unix-like and Windows systems which synchronizes files and directories from one location to another while minimizing data transfer using delta encoding when appropriate. An important feature of rsync not found in most similar...
.