Collision resistance
Encyclopedia
Collision resistance is a property of cryptographic hash functions: a hash function is collision resistant if it is hard to find two inputs that hash to the same output; that is, two inputs a and b such that H(a) = H(b), and a ≠ b.
Every hash function with more inputs than outputs will necessarily have collisions. Consider a hash function such as SHA-256 that produces 256 bits of output from an arbitrarily large input. Since it must generate one of 2256 outputs for each member of a much larger set of inputs, the pigeonhole principle guarantees that some inputs will hash to the same output. Collision resistance doesn't mean that no collisions exist; simply that they are hard to find.
The birthday "paradox"
places an upper bound on collision resistance: if a hash function produces N bits of output, an attacker who computes "only" 2N/2 hash operations on random input is likely to find two matching outputs. If there is an easier method than this brute force attack
, it is typically considered a flaw in the hash function.
Cryptographic hash function
s are usually designed to be collision resistant. But many hash functions that were once thought to be collision resistant were later broken. MD5
and SHA-1 in particular both have published techniques more efficient than brute force for finding collisions. However, some compression functions have a proof that finding collisions is at least as difficult as some hard mathematical problem (such as integer factorization
or discrete logarithm
). Those functions are called provably secure
.
Every hash function with more inputs than outputs will necessarily have collisions. Consider a hash function such as SHA-256 that produces 256 bits of output from an arbitrarily large input. Since it must generate one of 2256 outputs for each member of a much larger set of inputs, the pigeonhole principle guarantees that some inputs will hash to the same output. Collision resistance doesn't mean that no collisions exist; simply that they are hard to find.
The birthday "paradox"
Birthday paradox
In probability theory, the birthday problem or birthday paradox pertains to the probability that, in a set of n randomly chosen people, some pair of them will have the same birthday. By the pigeonhole principle, the probability reaches 100% when the number of people reaches 366. However, 99%...
places an upper bound on collision resistance: if a hash function produces N bits of output, an attacker who computes "only" 2N/2 hash operations on random input is likely to find two matching outputs. If there is an easier method than this brute force attack
Brute force attack
In cryptography, a brute-force attack, or exhaustive key search, is a strategy that can, in theory, be used against any encrypted data. Such an attack might be utilized when it is not possible to take advantage of other weaknesses in an encryption system that would make the task easier...
, it is typically considered a flaw in the hash function.
Cryptographic hash function
Cryptographic hash function
A cryptographic hash function is a deterministic procedure that takes an arbitrary block of data and returns a fixed-size bit string, the hash value, such that an accidental or intentional change to the data will change the hash value...
s are usually designed to be collision resistant. But many hash functions that were once thought to be collision resistant were later broken. MD5
MD5
The MD5 Message-Digest Algorithm is a widely used cryptographic hash function that produces a 128-bit hash value. Specified in RFC 1321, MD5 has been employed in a wide variety of security applications, and is also commonly used to check data integrity...
and SHA-1 in particular both have published techniques more efficient than brute force for finding collisions. However, some compression functions have a proof that finding collisions is at least as difficult as some hard mathematical problem (such as integer factorization
Integer factorization
In number theory, integer factorization or prime factorization is the decomposition of a composite number into smaller non-trivial divisors, which when multiplied together equal the original integer....
or discrete logarithm
Discrete logarithm
In mathematics, specifically in abstract algebra and its applications, discrete logarithms are group-theoretic analogues of ordinary logarithms. In particular, an ordinary logarithm loga is a solution of the equation ax = b over the real or complex numbers...
). Those functions are called provably secure
Provably secure cryptographic hash function
In cryptography, cryptographic hash functions can be divided into two main categories. In the first category are those functions whose designs are based on a mathematical problem and thus their security follows from rigorous mathematical proofs, complexity theory and formal reduction. These...
.
Rationale
Collision resistance is desirable for several reasons.- In some digital signatureDigital signatureA digital signature or digital signature scheme is a mathematical scheme for demonstrating the authenticity of a digital message or document. A valid digital signature gives a recipient reason to believe that the message was created by a known sender, and that it was not altered in transit...
systems, a party attests to a document by publishing a public key signature on a hash of the document. If it is possible to produce two documents with the same hash, an attacker could get a party to attest to one, and then claim that the party had attested to the other. - In some proof-of-workProof-of-work systemA proof-of-work system is an economic measure to deter denial of service attacks and other service abuses such as spam on a network by requiring some work from the service requester, usually meaning processing time by a computer...
systems, users provide hash collisions as proof that they have performed a certain amount of computation to find them. If there is an easier way to find collisions than brute force, users can cheat the system. - In some distributed content systems, parties compare cryptographic hashes of files in order to make sure they have the same version. An attacker who could produce two files with the same hash could trick users into believing they had the same version of a file when they in fact did not.
See also
- Collision attackCollision attackIn cryptography, a collision attack on a cryptographic hash tries to find two arbitrary inputs that will produce the same hash value, i.e. a hash collision...
- Preimage attackPreimage attackIn cryptography, the preimage attack is a classification of attacks on hash functions for finding a message that has a specific hash value.There are two types of preimage attacks:...
- NIST hash function competitionNIST hash function competitionThe NIST hash function competition is an open competition held by the US National Institute of Standards and Technology for a new SHA-3 function to replace the older SHA-1 and SHA-2, which was formally announced in the Federal Register on November 2, 2007...
- Provably secure cryptographic hash functionProvably secure cryptographic hash functionIn cryptography, cryptographic hash functions can be divided into two main categories. In the first category are those functions whose designs are based on a mathematical problem and thus their security follows from rigorous mathematical proofs, complexity theory and formal reduction. These...