Fast Fourier transform
A fast Fourier transform (FFT) is an efficient algorithm to compute the discrete Fourier transform (DFT) and its inverse. "The FFT has been called the most important numerical algorithm of our lifetime (Strang, 1994)." (Kent & Read 2002, 61) There are many distinct FFT algorithms involving a wide range of mathematics, from simple complex-number arithmetic to group theory and number theory; this article gives an overview of the available techniques and some of their general properties, while the specific algorithms are described in subsidiary articles linked below.
A DFT decomposes a sequence of values into components of different frequencies. This operation is useful in many fields (see discrete Fourier transform for properties and applications of the transform) but computing it directly from the definition is often too slow to be practical. An FFT is a way to compute the same result more quickly: computing a DFT of N points in the naive way, using the definition, takes $O(N^2)$ arithmetical operations, while an FFT can compute the same result in only O(N log N) operations. The difference in speed can be substantial, especially for long data sets where N may be in the thousands or millions; in practice, the computation time can be reduced by several orders of magnitude in such cases, and the improvement is roughly proportional to $N/\log N$. This huge improvement made many DFT-based algorithms practical; FFTs are of great importance to a wide variety of applications, from digital signal processing and solving partial differential equations to algorithms for quick multiplication of large integers.
The most well-known FFT algorithms depend upon the factorization of N, but (contrary to popular misconception) there are FFTs with O(N log N) complexity for all N, even for prime N. Many FFT algorithms depend only on the fact that $e^{-2\pi i/N}$ is a primitive $N$th root of unity, and thus can be applied to analogous transforms over any finite field, such as number-theoretic transforms.
Since the inverse DFT is the same as the DFT, but with the opposite sign in the exponent and a 1/N factor, any FFT algorithm can easily be adapted for it.
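For illustration, a minimal sketch of this adaptation in Python, with NumPy's np.fft.fft standing in for an arbitrary forward FFT routine (NumPy is not part of this article; any forward transform would do):

    import numpy as np

    def inverse_dft_via_forward_fft(X):
        # conjugate, apply the forward transform, conjugate again, scale by 1/N;
        # this flips the sign of the exponent and supplies the 1/N factor
        return np.conj(np.fft.fft(np.conj(X))) / len(X)

    X = np.fft.fft([1.0, 2.0, 3.0, 4.0])
    print(np.allclose(inverse_dft_via_forward_fft(X), [1, 2, 3, 4]))  # True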
Definition and speed
An FFT computes the DFT and produces exactly the same result as evaluating the DFT definition directly; the only difference is that an FFT is much faster. (In the presence of round-off error, many FFT algorithms are also much more accurate than evaluating the DFT definition directly, as discussed below.)
Let $x_0, \ldots, x_{N-1}$ be complex numbers. The DFT is defined by the formula

    $X_k = \sum_{n=0}^{N-1} x_n e^{-2\pi i k n/N}, \qquad k = 0, \ldots, N-1.$
Evaluating this definition directly requires $O(N^2)$ operations: there are N outputs $X_k$, and each output requires a sum of N terms. An FFT is any method to compute the same results in O(N log N) operations. More precisely, all known FFT algorithms require Θ(N log N) operations (technically, O only denotes an upper bound), although there is no known proof that better complexity is impossible.
To illustrate the savings of an FFT, consider the count of complex multiplications and additions. Evaluating the DFT's sums directly involves $N^2$ complex multiplications and $N(N-1)$ complex additions (of which O(N) operations can be saved by eliminating trivial operations such as multiplications by 1). The well-known radix-2 Cooley–Tukey algorithm, for N a power of 2, can compute the same result with only $(N/2)\log_2 N$ complex multiplications (again, ignoring simplifications of multiplications by 1 and similar) and $N\log_2 N$ complex additions. In practice, actual performance on modern computers is usually dominated by factors other than arithmetic and is a complicated subject (see, e.g., Frigo & Johnson, 2005), but the overall improvement from $O(N^2)$ to O(N log N) remains.
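To make the contrast concrete, here is a direct $O(N^2)$ evaluation of the definition next to a library FFT; both compute the same transform (the NumPy sketch below is illustrative, not part of the original article):

    import numpy as np

    def naive_dft(x):
        # direct evaluation: N outputs, each a sum of N terms => O(N^2) work
        N = len(x)
        n = np.arange(N)
        W = np.exp(-2j * np.pi * np.outer(n, n) / N)  # matrix of twiddle factors
        return W @ np.asarray(x, dtype=complex)

    x = np.random.default_rng(0).standard_normal(1024)
    print(np.allclose(naive_dft(x), np.fft.fft(x)))  # True: same result, different cost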
Cooley–Tukey algorithm
By far the most common FFT is the Cooley–Tukey algorithm. This is a divide-and-conquer algorithm that recursively breaks down a DFT of any composite size $N = N_1 N_2$ into many smaller DFTs of sizes $N_1$ and $N_2$, along with O(N) multiplications by complex roots of unity traditionally called twiddle factors (after Gentleman and Sande, 1966).
This method (and the general idea of an FFT) was popularized by a publication of J. W. Cooley and J. W. Tukey in 1965, but it was later discovered (Heideman & Burrus, 1984) that those two authors had independently re-invented an algorithm known to Carl Friedrich Gauss around 1805 (and subsequently rediscovered several times in limited forms).
The most well-known use of the Cooley–Tukey algorithm is to divide the transform into two pieces of size $N/2$ at each step, and is therefore limited to power-of-two sizes, but any factorization can be used in general (as was known to both Gauss and Cooley/Tukey). These are called the radix-2 and mixed-radix cases, respectively (and other variants such as the split-radix FFT have their own names as well). Although the basic idea is recursive, most traditional implementations rearrange the algorithm to avoid explicit recursion. Also, because the Cooley–Tukey algorithm breaks the DFT into smaller DFTs, it can be combined arbitrarily with any other algorithm for the DFT, such as those described below.
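A minimal recursive radix-2 sketch of the idea in Python (power-of-two sizes only; real implementations avoid explicit recursion and recomputing twiddle factors):

    import numpy as np

    def fft_radix2(x):
        # Cooley-Tukey, radix 2: split into even/odd halves, recurse, recombine
        x = np.asarray(x, dtype=complex)
        N = len(x)
        if N == 1:
            return x
        even = fft_radix2(x[0::2])   # size-N/2 DFT of the even-indexed samples
        odd = fft_radix2(x[1::2])    # size-N/2 DFT of the odd-indexed samples
        t = np.exp(-2j * np.pi * np.arange(N // 2) / N) * odd  # twiddle factors
        return np.concatenate([even + t, even - t])

    x = np.random.default_rng(0).standard_normal(64)
    print(np.allclose(fft_radix2(x), np.fft.fft(x)))  # True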
Other FFT algorithms
There are other FFT algorithms distinct from Cooley–Tukey. For $N = N_1 N_2$ with coprime $N_1$ and $N_2$, one can use the Prime-Factor (Good–Thomas) algorithm (PFA), based on the Chinese Remainder Theorem, to factorize the DFT similarly to Cooley–Tukey but without the twiddle factors. The Rader–Brenner algorithm (1976) is a Cooley–Tukey-like factorization but with purely imaginary twiddle factors, reducing multiplications at the cost of increased additions and reduced numerical stability; it was later superseded by the split-radix variant of Cooley–Tukey (which achieves the same multiplication count but with fewer additions and without sacrificing accuracy). Algorithms that recursively factorize the DFT into smaller operations other than DFTs include the Bruun and QFT algorithms. (The Rader–Brenner and QFT algorithms were proposed for power-of-two sizes, but it is possible that they could be adapted to general composite $N$. Bruun's algorithm applies to arbitrary even composite sizes.) Bruun's algorithm, in particular, is based on interpreting the FFT as a recursive factorization of the polynomial $z^N - 1$, here into real-coefficient polynomials of the form $z^M - 1$ and $z^{2M} + a z^M + 1$.
Another polynomial viewpoint is exploited by the Winograd algorithm, which factorizes $z^N - 1$ into cyclotomic polynomials; these often have coefficients of 1, 0, or −1, and therefore require few (if any) multiplications, so Winograd can be used to obtain minimal-multiplication FFTs and is often used to find efficient algorithms for small factors. Indeed, Winograd showed that the DFT can be computed with only $O(N)$ irrational multiplications, leading to a proven achievable lower bound on the number of multiplications for power-of-two sizes; unfortunately, this comes at the cost of many more additions, a tradeoff no longer favorable on modern processors with hardware multipliers. In particular, Winograd also makes use of the PFA as well as an algorithm by Rader for FFTs of prime sizes.
Rader's algorithm, exploiting the existence of a generator for the multiplicative group modulo prime $N$, expresses a DFT of prime size $N$ as a cyclic convolution of (composite) size $N - 1$, which can then be computed by a pair of ordinary FFTs via the convolution theorem (although Winograd uses other convolution methods). Another prime-size FFT is due to L. I. Bluestein, and is sometimes called the chirp-z algorithm; it also re-expresses a DFT as a convolution, but this time of the same size (which can be zero-padded to a power of two and evaluated by radix-2 Cooley–Tukey FFTs, for example), via the identity $nk = -\frac{(k-n)^2}{2} + \frac{n^2}{2} + \frac{k^2}{2}$.
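A sketch of Bluestein's re-indexing in Python: the chirp $e^{-\pi i n^2/N}$ turns the DFT into a linear convolution, evaluated here with zero-padded power-of-two FFTs (a hedged illustration, not a tuned implementation):

    import numpy as np

    def bluestein_dft(x):
        # chirp-z trick: X_k = chirp[k] * sum_n (x_n chirp[n]) conj(chirp)[k-n]
        x = np.asarray(x, dtype=complex)
        N = len(x)
        n = np.arange(N)
        chirp = np.exp(-1j * np.pi * n * n / N)
        a = x * chirp
        M = 1 << (2 * N - 1).bit_length()       # next power of two >= 2N-1
        b = np.zeros(M, dtype=complex)          # conj(chirp) at indices -(N-1)..N-1,
        b[:N] = np.conj(chirp)                  # wrapped cyclically into length M
        b[M - N + 1:] = np.conj(chirp[1:])[::-1]
        conv = np.fft.ifft(np.fft.fft(a, M) * np.fft.fft(b))[:N]
        return chirp * conv

    x = np.random.default_rng(0).standard_normal(13)      # a prime size
    print(np.allclose(bluestein_dft(x), np.fft.fft(x)))   # True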
FFT algorithms specialized for real and/or symmetric data
In many applications, the input data for the DFT are purely real, in which case the outputs satisfy the symmetry

    $X_{N-k} = X_k^*$

and efficient FFT algorithms have been designed for this situation (see e.g. Sorensen, 1987). One approach consists of taking an ordinary algorithm (e.g. Cooley–Tukey) and removing the redundant parts of the computation, saving roughly a factor of two in time and memory. Alternatively, it is possible to express an even-length real-input DFT as a complex DFT of half the length (whose real and imaginary parts are the even/odd elements of the original real data), followed by O(N) post-processing operations.
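The half-length trick, sketched below (np.fft.rfft serves only as a reference): pack the even/odd samples into complex numbers, take one half-size complex FFT, then untangle the two interleaved spectra in O(N) operations.

    import numpy as np

    def real_fft_via_half_size(x):
        # x: real, even length N; returns X[0..N/2] (the rest follows by symmetry)
        x = np.asarray(x, dtype=float)
        N = len(x)
        Z = np.fft.fft(x[0::2] + 1j * x[1::2])    # one complex FFT of length N/2
        k = np.arange(N // 2 + 1)
        Zk = Z[k % (N // 2)]
        Zr = np.conj(Z[(-k) % (N // 2)])
        E = 0.5 * (Zk + Zr)                       # spectrum of even-indexed samples
        O = -0.5j * (Zk - Zr)                     # spectrum of odd-indexed samples
        return E + np.exp(-2j * np.pi * k / N) * O  # O(N) post-processing

    x = np.random.default_rng(1).standard_normal(16)
    print(np.allclose(real_fft_via_half_size(x), np.fft.rfft(x)))  # True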
It was once believed that real-input DFTs could be more efficiently computed by means of the discrete Hartley transform (DHT), but it was subsequently argued that a specialized real-input DFT algorithm (FFT) can typically be found that requires fewer operations than the corresponding DHT algorithm (FHT) for the same number of inputs. Bruun's algorithm (above) is another method that was initially proposed to take advantage of real inputs, but it has not proved popular.
There are further FFT specializations for the cases of real data that have even/odd symmetry, in which case one can gain another factor of (roughly) two in time and memory and the DFT becomes the discrete cosine/sine transform(s) (DCT/DST). Instead of directly modifying an FFT algorithm for these cases, DCTs/DSTs can also be computed via FFTs of real data combined with O(N) pre/post processing.
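For example, a (unnormalized) DCT-I can be obtained from a real-input FFT of the evenly extended data plus trivial pre/post processing; the sketch below checks this against scipy.fft.dct, used purely as a reference:

    import numpy as np
    from scipy.fft import dct

    x = np.random.default_rng(2).standard_normal(9)
    ext = np.concatenate([x, x[-2:0:-1]])       # even-symmetric extension, length 2(N-1)
    via_fft = np.fft.rfft(ext).real             # imaginary parts vanish by symmetry
    print(np.allclose(via_fft, dct(x, type=1))) # True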
Bounds on complexity and operation counts
A fundamental question of longstanding theoretical interest is to prove lower bounds on the complexity and exact operation counts of fast Fourier transforms, and many open problems remain. It is not even rigorously proved whether DFTs truly require $\Omega(N \log N)$ (i.e., order $N \log N$ or greater) operations, even for the simple case of power-of-two sizes, although no algorithms with lower complexity are known. In particular, the count of arithmetic operations is usually the focus of such questions, although actual performance on modern-day computers is determined by many other factors such as cache or CPU pipeline optimization.
Following pioneering work by Winograd (1978), a tight lower bound is known for the number of real multiplications required by an FFT. It can be shown that only $4N - 2\log_2^2 N - 2\log_2 N - 4$ irrational real multiplications are required to compute a DFT of power-of-two length $N = 2^m$. Moreover, explicit algorithms that achieve this count are known (Heideman & Burrus, 1986; Duhamel, 1990). Unfortunately, these algorithms require too many additions to be practical, at least on modern computers with hardware multipliers.
A tight lower bound is not known on the number of required additions, although lower bounds have been proved under some restrictive assumptions on the algorithms. In 1973, Morgenstern proved an $\Omega(N \log N)$ lower bound on the addition count for algorithms where the multiplicative constants have bounded magnitudes (which is true for most but not all FFT algorithms). Pan (1986) proved an $\Omega(N \log N)$ lower bound assuming a bound on a measure of the FFT algorithm's "asynchronicity", but the generality of this assumption is unclear. For the case of power-of-two $N$, Papadimitriou (1979) argued that the number $N \log_2 N$ of complex-number additions achieved by Cooley–Tukey algorithms is optimal under certain assumptions on the graph of the algorithm (his assumptions imply, among other things, that no additive identities in the roots of unity are exploited). (This argument would imply that at least $2N \log_2 N$ real additions are required, although this is not a tight bound because extra additions are required as part of complex-number multiplications.) Thus far, no published FFT algorithm has achieved fewer than $N \log_2 N$ complex-number additions (or their equivalent) for power-of-two $N$.
A third problem is to minimize the total number of real multiplications and additions, sometimes called the "arithmetic complexity" (although in this context it is the exact count and not the asymptotic complexity that is being considered). Again, no tight lower bound has been proven. Since 1968, however, the lowest published count for power-of-two $N$ was long achieved by the split-radix FFT algorithm, which requires $4N \log_2 N - 6N + 8$ real multiplications and additions for $N > 1$. This was recently reduced to $\sim \frac{34}{9} N \log_2 N$ (Johnson and Frigo, 2007; Lundy and Van Buskirk, 2007).
Most of the attempts to lower or prove the complexity of FFT algorithms have focused on the ordinary complex-data case, because it is the simplest. However, complex-data FFTs are so closely related to algorithms for related problems such as real-data FFTs, discrete cosine transforms, discrete Hartley transforms, and so on, that any improvement in one of these would immediately lead to improvements in the others (Duhamel & Vetterli, 1990).
Accuracy and approximations
All of the FFT algorithms discussed above compute the DFT exactly (in exact arithmetic, i.e. neglecting floating-point errors). A few "FFT" algorithms have been proposed, however, that compute the DFT approximately, with an error that can be made arbitrarily small at the expense of increased computations. Such algorithms trade the approximation error for increased speed or other properties. For example, an approximate FFT algorithm by Edelman et al. (1999) achieves lower communication requirements for parallel computing with the help of a fast multipole method. A wavelet-based approximate FFT by Guo and Burrus (1996) takes sparse inputs/outputs (time/frequency localization) into account more efficiently than is possible with an exact FFT. Another algorithm for approximate computation of a subset of the DFT outputs is due to Shentov et al. (1995). Only the Edelman algorithm works equally well for sparse and non-sparse data, however, since it is based on the compressibility (rank deficiency) of the Fourier matrix itself rather than the compressibility (sparsity) of the data.
Even the "exact" FFT algorithms have errors when finite-precision floating-point arithmetic is used, but these errors are typically quite small; most FFT algorithms, e.g. Cooley–Tukey, have excellent numerical properties as a consequence of the pairwise summation structure of the algorithms. The upper bound on the relative error for the Cooley–Tukey algorithm is $O(\varepsilon \log N)$, compared to $O(\varepsilon N^{3/2})$ for the naive DFT formula (Gentleman and Sande, 1966), where ε is the machine floating-point relative precision. In fact, the root mean square (rms) errors are much better than these upper bounds, being only $O(\varepsilon \sqrt{\log N})$ for Cooley–Tukey and $O(\varepsilon \sqrt{N})$ for the naive DFT (Schatzman, 1996). These results, however, are very sensitive to the accuracy of the twiddle factors used in the FFT (i.e. the trigonometric function values), and it is not unusual for incautious FFT implementations to have much worse accuracy, e.g. if they use inaccurate trigonometric recurrence formulas. Some FFTs other than Cooley–Tukey, such as the Rader–Brenner algorithm, are intrinsically less stable.
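These growth rates can be observed empirically. The sketch below is an assumption-laden experiment: it treats a double-precision run as the "exact" reference and assumes NumPy's FFT behaves like a Cooley–Tukey-style algorithm (its pocketfft backend does); the measured rms error stays near single-precision ε even as N grows.

    import numpy as np

    rng = np.random.default_rng(0)
    for N in (2**8, 2**12, 2**16):
        x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
        ref = np.fft.fft(x)                            # double-precision reference
        X32 = np.fft.fft(x.astype(np.complex64))       # single-precision run
        rms = np.sqrt(np.mean(np.abs(X32 - ref)**2) / np.mean(np.abs(ref)**2))
        print(N, rms)  # grows only very slowly, consistent with O(eps sqrt(log N))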
In fixed-point arithmetic, the finite-precision errors accumulated by FFT algorithms are worse, with rms errors growing as $O(\sqrt{N})$ for the Cooley–Tukey algorithm (Welch, 1969). Moreover, even achieving this accuracy requires careful attention to scaling in order to minimize the loss of precision, and fixed-point FFT algorithms involve rescaling at each intermediate stage of decompositions like Cooley–Tukey.
To verify the correctness of an FFT implementation, rigorous guarantees can be obtained in O(N log N) time by a simple procedure checking the linearity, impulse-response, and time-shift properties of the transform on random inputs (Ergün, 1995).
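A sketch of such a randomized self-test (the exact procedure of Ergün, 1995 differs in details; this checks the same three properties):

    import numpy as np

    def check_fft(fft, N, trials=20):
        rng = np.random.default_rng(0)
        k = np.arange(N)
        ok = True
        for _ in range(trials):
            x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
            y = rng.standard_normal(N) + 1j * rng.standard_normal(N)
            a, b = rng.standard_normal(2)
            # linearity: FFT(a x + b y) == a FFT(x) + b FFT(y)
            ok &= np.allclose(fft(a * x + b * y), a * fft(x) + b * fft(y))
            # time shift: delaying x by one sample multiplies X_k by e^{-2 pi i k/N}
            ok &= np.allclose(fft(np.roll(x, 1)), fft(x) * np.exp(-2j * np.pi * k / N))
        # impulse response: the DFT of a unit impulse is the all-ones vector
        delta = np.zeros(N)
        delta[0] = 1.0
        ok &= np.allclose(fft(delta), np.ones(N))
        return ok

    print(check_fft(np.fft.fft, 64))  # True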
Multidimensional FFTs
As defined in the multidimensional DFT article, the multidimensional DFT

    $X_{\mathbf{k}} = \sum_{\mathbf{n}=\mathbf{0}}^{\mathbf{N}-\mathbf{1}} e^{-2\pi i\, \mathbf{k} \cdot (\mathbf{n}/\mathbf{N})}\, x_{\mathbf{n}}$

transforms an array $x_{\mathbf{n}}$ with a $d$-dimensional vector of indices $\mathbf{n} = (n_1, n_2, \ldots, n_d)$ by a set of $d$ nested summations (over $n_j = 0, \ldots, N_j - 1$ for each $j$), where the division $\mathbf{n}/\mathbf{N}$, defined as $\mathbf{n}/\mathbf{N} = (n_1/N_1, \ldots, n_d/N_d)$, is performed element-wise. Equivalently, it is simply the composition of a sequence of $d$ sets of one-dimensional DFTs, performed along one dimension at a time (in any order).
This compositional viewpoint immediately provides the simplest and most common multidimensional DFT algorithm, known as the row-column algorithm (after the two-dimensional case, below). That is, one simply performs a sequence of $d$ one-dimensional FFTs (by any of the above algorithms): first you transform along the $n_1$ dimension, then along the $n_2$ dimension, and so on (or actually, any ordering will work). This method is easily shown to have the usual O(N log N) complexity, where $N = N_1 N_2 \cdots N_d$ is the total number of data points transformed. In particular, there are $N/N_1$ transforms of size $N_1$, etcetera, so the complexity of the sequence of FFTs is:

    $\frac{N}{N_1} O(N_1 \log N_1) + \cdots + \frac{N}{N_d} O(N_d \log N_d) = O\bigl(N \log(N_1 \cdots N_d)\bigr) = O(N \log N).$
In two dimensions, the $x_{\mathbf{n}}$ can be viewed as an $N_1 \times N_2$ matrix, and this algorithm corresponds to first performing the FFT of all the rows and then of all the columns (or vice versa), hence the name.
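A two-dimensional illustration of the row-column decomposition (np.fft.fft2 is used only to confirm the result):

    import numpy as np

    def fft2_row_column(a):
        rows = np.fft.fft(a, axis=1)     # 1-D FFTs of all the rows...
        return np.fft.fft(rows, axis=0)  # ...then 1-D FFTs of all the columns

    a = np.random.default_rng(3).standard_normal((8, 16))
    print(np.allclose(fft2_row_column(a), np.fft.fft2(a)))  # True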
In more than two dimensions, it is often advantageous for cache locality to group the dimensions recursively. For example, a three-dimensional FFT might first perform two-dimensional FFTs of each planar "slice" for each fixed $n_1$, and then perform the one-dimensional FFTs along the $n_1$ direction. More generally, an asymptotically optimal cache-oblivious algorithm consists of recursively dividing the dimensions into two groups $(n_1, \ldots, n_{d/2})$ and $(n_{d/2+1}, \ldots, n_d)$ that are transformed recursively (rounding if $d$ is not even) (see Frigo and Johnson, 2005). Still, this remains a straightforward variation of the row-column algorithm that ultimately requires only a one-dimensional FFT algorithm as the base case, and still has O(N log N) complexity. Yet another variation is to perform matrix transpositions in between transforming subsequent dimensions, so that the transforms operate on contiguous data; this is especially important for out-of-core and distributed-memory situations where accessing non-contiguous data is extremely time-consuming.
There are other multidimensional FFT algorithms that are distinct from the row-column algorithm, although all of them have O(N log N) complexity. Perhaps the simplest non-row-column FFT is the vector-radix FFT algorithm, which is a generalization of the ordinary Cooley–Tukey algorithm where one divides the transform dimensions by a vector $\mathbf{r} = (r_1, r_2, \ldots, r_d)$ of radices at each step. (This may also have cache benefits.) The simplest case of vector-radix is where all of the radices are equal (e.g. vector-radix-2 divides all of the dimensions by two), but this is not necessary. Vector radix with only a single non-unit radix at a time, i.e. $\mathbf{r} = (1, \ldots, 1, r, 1, \ldots, 1)$, is essentially a row-column algorithm. Other, more complicated, methods include polynomial transform algorithms due to Nussbaumer (1977), which view the transform in terms of convolutions and polynomial products. See Duhamel and Vetterli (1990) for more information and references.
Other generalizations
An $O(N^{5/2} \log N)$ generalization to spherical harmonics on the sphere $S^2$ with $N^2$ nodes was described by Mohlenkamp (1999), along with an algorithm conjectured (but not proven) to have $O(N^2 \log^2 N)$ complexity; Mohlenkamp also provides an implementation in the libftsh library. A spherical-harmonic algorithm with $O(N^2 \log N)$ complexity is described by Rokhlin and Tygert (2006).
Various groups have also published "FFT" algorithms for non-equispaced data, as reviewed in Potts et al. (2001). Such algorithms do not strictly compute the DFT (which is only defined for equispaced data), but rather some approximation thereof (a non-uniform discrete Fourier transform, or NDFT, which itself is often computed only approximately).
See also
- Prime-factor FFT algorithm
- Bruun's FFT algorithm
- Rader's FFT algorithm
- Bluestein's FFT algorithm
- Butterfly diagram – a diagram used to describe FFTs.
- Odlyzko–Schönhage algorithm – applies the FFT to finite Dirichlet series.
- Overlap add/Overlap save – efficient convolution methods using FFT for long signals
- Spectral music (involves application of FFT analysis to musical composition)
- Spectrum analyzers – devices that perform an FFT
- FFTW – "Fastest Fourier Transform in the West"; C library for the discrete Fourier transform (DFT) in one or more dimensions.
- FFTPACK – another C and Java FFT library (public domain)
- Time series
- Math Kernel Library
External links
- Fast Fourier Algorithm
- Fast Fourier Transforms, Connexions online book edited by C. Sidney Burrus, with chapters by C. Sidney Burrus, Ivan Selesnick, Markus Pueschel, Matteo Frigo, and Steven G. Johnson (2008).
- Links to FFT code and information online.
- National Taiwan University – FFT
- FFT programming in C++ – Cooley–Tukey algorithm.
- Online documentation, links, book, and code.
- Using FFT to construct aggregate probability distributions
- Sri Welaratna, "30 years of FFT Analyzers", Sound and Vibration (January 1997, 30th anniversary issue). A historical review of hardware FFT devices.
- FFT Basics and Case Study Using Multi-Instrument
- FFT Textbook notes, PPTs, Videos at Holistic Numerical Methods Institute.
- ALGLIB FFT Code GPL Licensed multilanguage (VBA, C++, Pascal, etc.) numerical analysis and data processing library.