MD5CRK
Encyclopedia
In cryptography
, MD5CRK was a distributed
effort (similar to distributed.net
) launched by Jean-Luc Cooke and his company, CertainKey Cryptosystems, to demonstrate that the MD5
message digest algorithm
is insecure by finding a collision
— two messages that produce the same MD5 hash. The project went live on March 1, 2004. The project ended on August 24, 2004 after researchers independently demonstrated a technique for generating collisions in MD5 using analytical methods by Xiaoyun Wang
, Feng, Xuejia Lai
, and Yuhttp://eprint.iacr.org/2004/199.pdf. CertainKey awarded a 10,000 Canadian Dollar
prize to Wang, Feng, Lai and Yu for their discovery.
A technique called Floyd's cycle-finding algorithm
was used to try to find a collision for MD5. The algorithm can be described by analogy with a random walk
. Using the principle that any function with a finite number of possible outputs placed in a feedback loop will cycle, one can use a relatively small amount of memory to store outputs with particular structures and use them as "markers" to better detect when a marker has been "passed" before. These markers are called distinguished points, the point where two inputs produce the same output is called a collision point. MD5CRK considered any point whose first 32 bit
s were zeroes to be a distinguished point.
For this project, the probability of success after MD5 computations can be approximated by: .
The expected number of computations required to produce a collision in the 128-bit MD5 message digest function is thus:
To give some perspective to this, using Virginia Tech's System X with a max performance of 12.25 Teraflops, it would take approximately seconds or about 3 weeks. Or for commodity processors at 2 gigaflops it would take 6,000 machines approximately the same amount of time.
Cryptography
Cryptography is the practice and study of techniques for secure communication in the presence of third parties...
, MD5CRK was a distributed
Distributed computing
Distributed computing is a field of computer science that studies distributed systems. A distributed system consists of multiple autonomous computers that communicate through a computer network. The computers interact with each other in order to achieve a common goal...
effort (similar to distributed.net
Distributed.net
distributed.net is a worldwide distributed computing effort that is attempting to solve large scale problems using otherwise idle CPU or GPU time. It is officially recognized as a non-profit organization under U.S...
) launched by Jean-Luc Cooke and his company, CertainKey Cryptosystems, to demonstrate that the MD5
MD5
The MD5 Message-Digest Algorithm is a widely used cryptographic hash function that produces a 128-bit hash value. Specified in RFC 1321, MD5 has been employed in a wide variety of security applications, and is also commonly used to check data integrity...
message digest algorithm
Algorithm
In mathematics and computer science, an algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function. Algorithms are used for calculation, data processing, and automated reasoning...
is insecure by finding a collision
Hash collision
Not to be confused with wireless packet collision.In computer science, a collision or clash is a situation that occurs when two distinct pieces of data have the same hash value, checksum, fingerprint, or cryptographic digest....
— two messages that produce the same MD5 hash. The project went live on March 1, 2004. The project ended on August 24, 2004 after researchers independently demonstrated a technique for generating collisions in MD5 using analytical methods by Xiaoyun Wang
Xiaoyun Wang
Wang Xiaoyun is a researcher and professor in the Department of Mathematics and System Science, Shandong University, Shandong, China....
, Feng, Xuejia Lai
Xuejia Lai
Xuejia Lai is a cryptographer, currently a professor at Shanghai Jiao Tong University. His notable work includes the design of the block cipher IDEA, the theory of Markov ciphers, and the cryptanalysis of a number of cryptographic hash functions. His book On the Design and Security of Block...
, and Yuhttp://eprint.iacr.org/2004/199.pdf. CertainKey awarded a 10,000 Canadian Dollar
Canadian dollar
The Canadian dollar is the currency of Canada. As of 2007, the Canadian dollar is the 7th most traded currency in the world. It is abbreviated with the dollar sign $, or C$ to distinguish it from other dollar-denominated currencies...
prize to Wang, Feng, Lai and Yu for their discovery.
A technique called Floyd's cycle-finding algorithm
Floyd's cycle-finding algorithm
In computer science, cycle detection is the algorithmic problem of finding a cycle in a sequence of iterated function values.For any function ƒ that maps a finite set S to itself, and any initial value x0 in S, the sequence of iterated function values x_0,\ x_1=f,\ x_2=f,\ \dots,\ x_i=f,\ \dotsmust...
was used to try to find a collision for MD5. The algorithm can be described by analogy with a random walk
Random walk
A random walk, sometimes denoted RW, is a mathematical formalisation of a trajectory that consists of taking successive random steps. For example, the path traced by a molecule as it travels in a liquid or a gas, the search path of a foraging animal, the price of a fluctuating stock and the...
. Using the principle that any function with a finite number of possible outputs placed in a feedback loop will cycle, one can use a relatively small amount of memory to store outputs with particular structures and use them as "markers" to better detect when a marker has been "passed" before. These markers are called distinguished points, the point where two inputs produce the same output is called a collision point. MD5CRK considered any point whose first 32 bit
Bit
A bit is the basic unit of information in computing and telecommunications; it is the amount of information stored by a digital device or other physical system that exists in one of two possible distinct states...
s were zeroes to be a distinguished point.
Complexity
The expected time to find a collision is not equal to where is the number of bits in the digest output. It is in fact , where is the number of function outputs collected.For this project, the probability of success after MD5 computations can be approximated by: .
The expected number of computations required to produce a collision in the 128-bit MD5 message digest function is thus:
To give some perspective to this, using Virginia Tech's System X with a max performance of 12.25 Teraflops, it would take approximately seconds or about 3 weeks. Or for commodity processors at 2 gigaflops it would take 6,000 machines approximately the same amount of time.