Floyd's cycle-finding algorithm
Encyclopedia
In computer science
, cycle detection is the algorithm
ic problem of finding a cycle in a sequence
of iterated function
values.
For any function
ƒ that maps a finite set S to itself, and any initial value x0 in S, the sequence of iterated function values
must eventually use the same value twice: there must be some i ≠ j such that xi = xj. Once this happens, the sequence must continue by repeating the cycle
of values from xi to xj−1. Cycle detection is the problem of finding i and j, given ƒ and x0.
The cycle to be detected is the repeating subsequence of values 6, 3, 1 in this sequence.
One can view the same problem graph-theoretically
, by constructing a functional graph (that is, a directed graph
in which each vertex has a single outgoing edge) the vertices of which are the elements of S and the edges of which map an element to the corresponding function value, as shown in the figure. The set of vertices reachable
from any starting vertex x0 form a subgraph with a shape resembling the Greek letter rho
(ρ): a path of length μ from x0 to a cycle
of λ vertices.
In some applications, and in particular in Pollard's rho algorithm
for integer factorization
, the algorithm has much more limited access to S and to ƒ. In Pollard's rho algorithm, for instance, S is the set of integers modulo an unknown prime factor of the number to be factorized, so even the size of S is unknown to the algorithm. We may view a cycle detection algorithm for this application as having the following capabilities: it initially has in its memory an object representing a pointer to the starting value x0. At any step, it may perform one of three actions: it may copy any pointer it has to another object in memory, it may apply ƒ and replace any of its pointers by a pointer to the next object in the sequence, or it may apply a subroutine for determining whether two of its pointers represent equal values in the sequence. The equality test action may involve some nontrivial computation: in Pollard's rho algorithm, it is implemented by testing whether the difference between two stored values has a nontrivial gcd
with the number to be factored. In this context, we will call an algorithm that only uses pointer copying, advancement within the sequence, and equality tests a pointer algorithm.
such as a hash table
to store these values and test whether each subsequent value has already been stored. However, the space complexity of this algorithm is λ+μ, unnecessarily large. Additionally, to implement this method as a pointer algorithm would require applying the equality test to each pair of values, resulting in quadratic time overall. Thus, research in this area has concentrated on two goals: using less space than this naive algorithm, and finding pointer algorithms that use fewer equality tests.
" algorithm, is a pointer algorithm that uses only two pointers, which move through the sequence at different speeds. The algorithm is named for Robert W. Floyd, who invented it in the late 1960s.
The key insight in the algorithm is that, for any integers and , where λ is the length of the loop to be found. In particular, whenever , it follows that .
Thus, the algorithm only needs to check for repeated values of this special form, one twice as far from the start of the sequence than the other, to find a period ν of a repetition that is a multiple of λ.
Once ν is found, the algorithm retraces the sequence from its start to find the first repeated value xμ in the sequence, using the fact that λ divides ν and therefore that . Finally, once the value of μ is known it is trivial to find the length λ of the shortest repeating cycle, by searching for the first position for which .
The algorithm thus maintains two pointers into the given sequence, one (the tortoise) at xi, and the other (the hare) at x2i. At each step of the algorithm, it increases i by one, moving the tortoise one step forward and the hare two steps forward in the sequence, and then compares the sequence values at these two pointers. The smallest value of i > 0 for which the tortoise and hare point to equal values is the desired value ν.
The following Python
code shows how this idea may be implemented as an algorithm.
This code only accesses the sequence by storing and copying pointers, function evaluations, and equality tests; therefore, it qualifies as a pointer algorithm. The algorithm uses operations of these types, and O(1) storage space.
described an alternative cycle detection algorithm that, like the tortoise and hare algorithm, requires only two pointers into the sequence. However, it is based on a different principle: searching for the smallest power of two
2i that is larger than both λ and μ. For i = 0, 1, 2, etc., the algorithm compares x2i−1 with each subsequent sequence value up to the next power of two, stopping when it finds a match. It has two advantages compared to the tortoise and hare algorithm: it finds the correct length λ of the cycle directly, rather than needing to search for it in a subsequent stage, and its steps involve only one evaluation of ƒ rather than three.
The following Python code shows how this technique works in more detail.
Like the tortoise and hare algorithm, this is a pointer algorithm that uses tests and function evaluations and O(1) storage space.
It is not difficult to show that the number of function evaluations can never be higher than for Floyd's algorithm.
Brent claims that, on average, his cycle finding algorithm runs around 36% more quickly than Floyd's and that it speeds up the Pollard rho algorithm by around 24%. He also performs an average case analysis for a randomized version of the algorithm in which the sequence of indices traced by the slower of the two pointers is not the powers of two themselves, but rather a randomized multiple of the powers of two. Although his main intended application was in integer factorization algorithms, Brent also discusses applications in testing pseudorandom number generators.
Any cycle detection algorithm that stores at most M values from the input sequence must perform at least function evaluations.
Applications
Cycle detection has been used in many applications.
External links
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...
, cycle detection is the algorithm
Algorithm
In mathematics and computer science, an algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function. Algorithms are used for calculation, data processing, and automated reasoning...
ic problem of finding a cycle in a sequence
Sequence
In mathematics, a sequence is an ordered list of objects . Like a set, it contains members , and the number of terms is called the length of the sequence. Unlike a set, order matters, and exactly the same elements can appear multiple times at different positions in the sequence...
of iterated function
Iterated function
In mathematics, an iterated function is a function which is composed with itself, possibly ad infinitum, in a process called iteration. In this process, starting from some initial value, the result of applying a given function is fed again in the function as input, and this process is repeated...
values.
For any function
Function (mathematics)
In mathematics, a function associates one quantity, the argument of the function, also known as the input, with another quantity, the value of the function, also known as the output. A function assigns exactly one output to each input. The argument and the value may be real numbers, but they can...
ƒ that maps a finite set S to itself, and any initial value x0 in S, the sequence of iterated function values
must eventually use the same value twice: there must be some i ≠ j such that xi = xj. Once this happens, the sequence must continue by repeating the cycle
Cycle (mathematics)
In mathematics, and in particular in group theory, a cycle is a permutation of the elements of some set X which maps the elements of some subset S to each other in a cyclic fashion, while fixing all other elements...
of values from xi to xj−1. Cycle detection is the problem of finding i and j, given ƒ and x0.
Example
The figure shows a function ƒ that maps the set S = {0,1,2,3,4,5,6,7,8} to itself. If one starts from x0 = 2 and repeatedly applies ƒ, one sees the sequence of values- 2, 0, 6, 3, 1, 6, 3, 1, 6, 3, 1, ....
The cycle to be detected is the repeating subsequence of values 6, 3, 1 in this sequence.
Definitions
Let S be any finite set, ƒ be any function from S to itself, and x0 be any element of S. For any i > 0, let xi = ƒ(xi−1). Let μ be the smallest index such that the value xμ reappears infinitely often within the sequence of values xi, and let λ (the loop length) be the smallest positive integer such that xμ = xλ+μ. The cycle detection problem is the task of finding λ and μ.One can view the same problem graph-theoretically
Graph theory
In mathematics and computer science, graph theory is the study of graphs, mathematical structures used to model pairwise relations between objects from a certain collection. A "graph" in this context refers to a collection of vertices or 'nodes' and a collection of edges that connect pairs of...
, by constructing a functional graph (that is, a directed graph
Directed graph
A directed graph or digraph is a pair G= of:* a set V, whose elements are called vertices or nodes,...
in which each vertex has a single outgoing edge) the vertices of which are the elements of S and the edges of which map an element to the corresponding function value, as shown in the figure. The set of vertices reachable
Reachability
In graph theory, reachability is the notion of being able to get from one vertex in a directed graph to some other vertex. Note that reachability in undirected graphs is trivial — it is sufficient to find the connected components in the graph, which can be done in linear time.- Definition :For a...
from any starting vertex x0 form a subgraph with a shape resembling the Greek letter rho
Rho (letter)
Rho is the 17th letter of the Greek alphabet. In the system of Greek numerals, it has a value of 100. It is derived from Semitic resh "head"...
(ρ): a path of length μ from x0 to a cycle
Cycle graph
In graph theory, a cycle graph or circular graph is a graph that consists of a single cycle, or in other words, some number of vertices connected in a closed chain. The cycle graph with n vertices is called Cn...
of λ vertices.
Computer representation
Generally, ƒ will not be specified as a table of values, as we have given it in the figure above. Rather, we may be given access either to the sequence of values xi, or to a subroutine for calculating ƒ. The task is to find λ and μ while examining as few values from the sequence or performing as few subroutine calls as possible. Typically, also, the space complexity of an algorithm for the cycle detection problem is of importance: we wish to solve the problem while using an amount of memory significantly smaller than it would take to store the entire sequence.In some applications, and in particular in Pollard's rho algorithm
Pollard's rho algorithm
Pollard's rho algorithm is a special-purpose integer factorization algorithm. It was invented by John Pollard in 1975. It is particularly effective at splitting composite numbers with small factors.-Core ideas:...
for integer factorization
Integer factorization
In number theory, integer factorization or prime factorization is the decomposition of a composite number into smaller non-trivial divisors, which when multiplied together equal the original integer....
, the algorithm has much more limited access to S and to ƒ. In Pollard's rho algorithm, for instance, S is the set of integers modulo an unknown prime factor of the number to be factorized, so even the size of S is unknown to the algorithm. We may view a cycle detection algorithm for this application as having the following capabilities: it initially has in its memory an object representing a pointer to the starting value x0. At any step, it may perform one of three actions: it may copy any pointer it has to another object in memory, it may apply ƒ and replace any of its pointers by a pointer to the next object in the sequence, or it may apply a subroutine for determining whether two of its pointers represent equal values in the sequence. The equality test action may involve some nontrivial computation: in Pollard's rho algorithm, it is implemented by testing whether the difference between two stored values has a nontrivial gcd
Greatest common divisor
In mathematics, the greatest common divisor , also known as the greatest common factor , or highest common factor , of two or more non-zero integers, is the largest positive integer that divides the numbers without a remainder.For example, the GCD of 8 and 12 is 4.This notion can be extended to...
with the number to be factored. In this context, we will call an algorithm that only uses pointer copying, advancement within the sequence, and equality tests a pointer algorithm.
Algorithms
If the input is given as a subroutine for calculating ƒ, the cycle detection problem may be trivially solved using only λ+μ function applications, simply by computing the sequence of values xi and using a data structureData structure
In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks...
such as a hash table
Hash table
In computer science, a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys , to their associated values . Thus, a hash table implements an associative array...
to store these values and test whether each subsequent value has already been stored. However, the space complexity of this algorithm is λ+μ, unnecessarily large. Additionally, to implement this method as a pointer algorithm would require applying the equality test to each pair of values, resulting in quadratic time overall. Thus, research in this area has concentrated on two goals: using less space than this naive algorithm, and finding pointer algorithms that use fewer equality tests.
Tortoise and hare
Floyd's cycle-finding algorithm, also called the "tortoise and the hareThe Tortoise and the Hare
The Tortoise and the Hare is a fable attributed to Aesop and is number 226 in the Perry Index. The story concerns a hare who ridicules a slow-moving tortoise and is challenged by him to a race. The hare soon leaves the tortoise behind and, confident of winning, decides to take a nap midway through...
" algorithm, is a pointer algorithm that uses only two pointers, which move through the sequence at different speeds. The algorithm is named for Robert W. Floyd, who invented it in the late 1960s.
The key insight in the algorithm is that, for any integers and , where λ is the length of the loop to be found. In particular, whenever , it follows that .
Thus, the algorithm only needs to check for repeated values of this special form, one twice as far from the start of the sequence than the other, to find a period ν of a repetition that is a multiple of λ.
Once ν is found, the algorithm retraces the sequence from its start to find the first repeated value xμ in the sequence, using the fact that λ divides ν and therefore that . Finally, once the value of μ is known it is trivial to find the length λ of the shortest repeating cycle, by searching for the first position for which .
The algorithm thus maintains two pointers into the given sequence, one (the tortoise) at xi, and the other (the hare) at x2i. At each step of the algorithm, it increases i by one, moving the tortoise one step forward and the hare two steps forward in the sequence, and then compares the sequence values at these two pointers. The smallest value of i > 0 for which the tortoise and hare point to equal values is the desired value ν.
The following Python
Python (programming language)
Python is a general-purpose, high-level programming language whose design philosophy emphasizes code readability. Python claims to "[combine] remarkable power with very clear syntax", and its standard library is large and comprehensive...
code shows how this idea may be implemented as an algorithm.
This code only accesses the sequence by storing and copying pointers, function evaluations, and equality tests; therefore, it qualifies as a pointer algorithm. The algorithm uses operations of these types, and O(1) storage space.
Brent's algorithm
Richard P. BrentRichard Brent (scientist)
Richard Peirce Brent is an Australian mathematician and computer scientist, born in 1946. He holds the position of Distinguished Professor of Mathematics and Computer Science with a joint appointment in the Mathematical Sciences Institute and the College of Engineering and Computer Science at...
described an alternative cycle detection algorithm that, like the tortoise and hare algorithm, requires only two pointers into the sequence. However, it is based on a different principle: searching for the smallest power of two
Power of two
In mathematics, a power of two means a number of the form 2n where n is an integer, i.e. the result of exponentiation with as base the number two and as exponent the integer n....
2i that is larger than both λ and μ. For i = 0, 1, 2, etc., the algorithm compares x2i−1 with each subsequent sequence value up to the next power of two, stopping when it finds a match. It has two advantages compared to the tortoise and hare algorithm: it finds the correct length λ of the cycle directly, rather than needing to search for it in a subsequent stage, and its steps involve only one evaluation of ƒ rather than three.
The following Python code shows how this technique works in more detail.
Like the tortoise and hare algorithm, this is a pointer algorithm that uses tests and function evaluations and O(1) storage space.
It is not difficult to show that the number of function evaluations can never be higher than for Floyd's algorithm.
Brent claims that, on average, his cycle finding algorithm runs around 36% more quickly than Floyd's and that it speeds up the Pollard rho algorithm by around 24%. He also performs an average case analysis for a randomized version of the algorithm in which the sequence of indices traced by the slower of the two pointers is not the powers of two themselves, but rather a randomized multiple of the powers of two. Although his main intended application was in integer factorization algorithms, Brent also discusses applications in testing pseudorandom number generators.
Time–space tradeoffs
A number of authors have studied techniques for cycle detection that use more memory than Floyd's and Brent's methods, but detect cycles more quickly. In general these methods store several previously-computed sequence values, and test whether each new value equals one of the previously-computed values. In order to do so quickly, they typically use a hash table or similar data structure for storing the previously-computed values, and therefore are not pointer algorithms: in particular, they usually cannot be applied to Pollard's rho algorithm. Where these methods differ is in how they determine which values to store. Following Nivasch, we survey these techniques briefly.- Brent already describes variations of his technique in which the indices of saved sequence values are powers of a number R other than two. By choosing R to be a number close to one, and storing the sequence values at indices that are near a sequence of consecutive powers of R, a cycle detection algorithm can use a number of function evaluations that is within an arbitrarily small factor of the optimum λ+μ.
- Sedgewick, Szymanski, and Yao provide a method that uses M memory cells and requires in the worst case only function evaluations, for some constant c, which they show to be optimal. The technique involves maintaining a numerical parameter d, storing in a table only those positions in the sequence that are multiples of d, and clearing the table and doubling d whenever too many values have been stored.
- Several authors have described distinguished point methods that store function values in a table based on a criterion involving the values, rather than (as in the method of Sedgewick et al.) based on their positions. For instance, values equal to zero modulo some value d might be stored. More simply, Nivasch credits D. P. Woodruff with the suggestion of storing a random sample of previously seen values, making an appropriate random choice at each step so that the sample remains random.
- Nivasch describes an algorithm that does not use a fixed amount of memory, but for which the expected amount of memory used (under the assumption that the input function is random) is logarithmic in the sequence length. An item is stored in the memory table, with this technique, when no later item has a smaller value. As Nivasch shows, the items with this technique can be maintained using a stack data structureStack (data structure)In computer science, a stack is a last in, first out abstract data type and linear data structure. A stack can have any abstract data type as an element, but is characterized by only three fundamental operations: push, pop and stack top. The push operation adds a new item to the top of the stack,...
, and each successive sequence value need be compared only to the top of the stack. The algorithm terminates when the repeated sequence element with smallest value is found. Running the same algorithm with multiple stacks, using random permutations of the values to reorder the values within each stack, allows a time–space tradeoff similar to the previous algorithms. However, even the version of this algorithm with a single stack is not a pointer algorithm, due to the comparisons needed to determine which of two values is smaller.
Any cycle detection algorithm that stores at most M values from the input sequence must perform at least function evaluations.
Applications
Cycle detection has been used in many applications.
- Determining the cycle length of a pseudorandom number generatorPseudorandom number generatorA pseudorandom number generator , also known as a deterministic random bit generator , is an algorithm for generating a sequence of numbers that approximates the properties of random numbers...
is one measure of its strength. This is the application cited by Knuth in describing Floyd's method. Brent describes the results of testing a linear congruential generatorLinear congruential generatorA Linear Congruential Generator represents one of the oldest and best-known pseudorandom number generator algorithms. The theory behind them is easy to understand, and they are easily implemented and fast....
in this fashion; its period turned out to be significantly smaller than advertised. For more complex generators, the sequence of values in which the cycle is to be found may not represent the output of the generator, but rather its internal state. - Several number-theoreticNumber theoryNumber theory is a branch of pure mathematics devoted primarily to the study of the integers. Number theorists study prime numbers as well...
algorithms are based on cycle detection, including Pollard's rho algorithmPollard's rho algorithmPollard's rho algorithm is a special-purpose integer factorization algorithm. It was invented by John Pollard in 1975. It is particularly effective at splitting composite numbers with small factors.-Core ideas:...
for integer factorization and his related kangaroo algorithm for the discrete logarithmDiscrete logarithmIn mathematics, specifically in abstract algebra and its applications, discrete logarithms are group-theoretic analogues of ordinary logarithms. In particular, an ordinary logarithm loga is a solution of the equation ax = b over the real or complex numbers...
problem. - In cryptographicCryptographyCryptography is the practice and study of techniques for secure communication in the presence of third parties...
applications, the ability to find two distinct values xμ−-1 and xλ+μ−-1 mapped by some cryptographic function ƒ to the same value xμ may indicate a weakness in ƒ. For instance, Quisquater and Delescaille apply cycle detection algorithms in the search for a message and a pair of Data Encryption StandardData Encryption StandardThe Data Encryption Standard is a block cipher that uses shared secret encryption. It was selected by the National Bureau of Standards as an official Federal Information Processing Standard for the United States in 1976 and which has subsequently enjoyed widespread use internationally. It is...
keys that map that message to the same encrypted value; Kaliski, Rivest, and Sherman also use cycle detection algorithms to attack DES. The technique may also be used to find a collisionHash collisionNot to be confused with wireless packet collision.In computer science, a collision or clash is a situation that occurs when two distinct pieces of data have the same hash value, checksum, fingerprint, or cryptographic digest....
in a cryptographic hash functionCryptographic hash functionA cryptographic hash function is a deterministic procedure that takes an arbitrary block of data and returns a fixed-size bit string, the hash value, such that an accidental or intentional change to the data will change the hash value...
. - Cycle detection may be helpful as a way of discovering infinite loopInfinite loopAn infinite loop is a sequence of instructions in a computer program which loops endlessly, either due to the loop having no terminating condition, having one that can never be met, or one that causes the loop to start over...
s in certain types of computer programComputer programA computer program is a sequence of instructions written to perform a specified task with a computer. A computer requires programs to function, typically executing the program's instructions in a central processor. The program has an executable form that the computer can use directly to execute...
s. - Periodic configurations in cellular automatonCellular automatonA cellular automaton is a discrete model studied in computability theory, mathematics, physics, complexity science, theoretical biology and microstructure modeling. It consists of a regular grid of cells, each in one of a finite number of states, such as "On" and "Off"...
simulations may be found by applying cycle detection algorithms to the sequence of automaton states. - Shape analysisShape analysis (software)In program analysis, a shape analysis is a static code analysis technique that discovers and verifies properties of linked, dynamically allocated data structures in computer programs. It is typically used at compile time to find software bugs or to verify high-level correctness properties of...
of linked listLinked listIn computer science, a linked list is a data structure consisting of a group of nodes which together represent a sequence. Under the simplest form, each node is composed of a datum and a reference to the next node in the sequence; more complex variants add additional links...
data structures is a technique for verifying the correctness of an algorithm using those structures. If a node in the list incorrectly points to an earlier node in the same list, the structure will form a cycle that can be detected by these algorithms. - Teske describes applications in computational group theoryComputational group theoryIn mathematics, computational group theory is the study ofgroups by means of computers. It is concernedwith designing and analysing algorithms anddata structures to compute information about groups...
: determining the structure of an Abelian groupAbelian groupIn abstract algebra, an abelian group, also called a commutative group, is a group in which the result of applying the group operation to two group elements does not depend on their order . Abelian groups generalize the arithmetic of addition of integers...
from a set of its generators. The cryptographic algorithms of Kaliski et al. may also be viewed as attempting to infer the structure of an unknown group. - Fich briefly mentions an application to computer simulationComputer simulationA computer simulation, a computer model, or a computational model is a computer program, or network of computers, that attempts to simulate an abstract model of a particular system...
of celestial mechanicsCelestial mechanicsCelestial mechanics is the branch of astronomy that deals with the motions of celestial objects. The field applies principles of physics, historically classical mechanics, to astronomical objects such as stars and planets to produce ephemeris data. Orbital mechanics is a subfield which focuses on...
, which she attributes to William KahanWilliam KahanWilliam Morton Kahan is a mathematician and computer scientist who received the Turing Award in 1989 for "his fundamental contributions to numerical analysis", and was named an ACM Fellow in 1994....
. In this application, cycle detection in the phase spacePhase spaceIn mathematics and physics, a phase space, introduced by Willard Gibbs in 1901, is a space in which all possible states of a system are represented, with each possible state of the system corresponding to one unique point in the phase space...
of an orbital system may be used to determine whether the system is periodic to within the accuracy of the simulation.
External links
- Gabriel Nivasch, The Cycle Detection Problem and the Stack Algorithm.
- Tortoise and Hare, Portland Pattern Repository