Short-time Fourier transform
Encyclopedia
The short-time Fourier transform (STFT), or alternatively short-term Fourier transform, is a Fourier-related transform used to determine the sinusoidal frequency and phase content of local sections of a signal as it changes over time.
which is nonzero for only a short period of time. The Fourier transform
(a one-dimensional function) of the resulting signal is taken as the window is slid along the time axis, resulting in a two-dimensional representation of the signal. Mathematically, this is written as:
where w(t) is the window function
, commonly a Hann window or Gaussian "hill" centered around zero, and x(t) is the signal to be transformed. X(τ,ω) is essentially the Fourier Transform of x(t)w(t-τ), a complex function representing the phase and magnitude of the signal over time and frequency. Often phase unwrapping is employed along either or both the time axis, τ, and frequency axis, ω, to suppress any jump discontinuity of the phase result of the STFT. The time index τ is normally considered to be "slow" time and usually not expressed in as high resolution as time t.
ed, and the complex result is added to a matrix, which records magnitude and phase for each point in time and frequency. This can be expressed as:
likewise, with signal x[n] and window w[n]. In this case, m is discrete and ω is continuous, but in most typical applications the STFT is performed on a computer using the Fast Fourier Transform
, so both variables are discrete and quantized
.
The magnitude
squared of the STFT yields the spectrogram
of the function:
See also the modified discrete cosine transform
(MDCT), which is also a Fourier-related transform that uses overlapping windows.
It easily follows that
and
The continuous Fourier Transform is
Substituting x(t) from above:
Swapping order of integration:
So the Fourier Transform can be seen as a sort of phase coherent sum of all of the STFTs of x(t). Since the inverse Fourier transform is
then x(t) can be recovered from X(τ,ω) as
or
It can be seen, comparing to above that windowed "grain" or "wavelet" of x(t) is
the inverse Fourier transform of X(τ,ω) for τ fixed.
This is one of the reasons for the creation of the wavelet transform (or multiresolution analysis
in general), which can give good time resolution for high-frequency events, and good frequency resolution for low-frequency events, which is the type of analysis best suited for many real signals.
This property is related to the Heisenberg
uncertainty principle
, but it is not a direct relationship – see Gabor limit for discussion. The product of the standard deviation in time and frequency is limited. The boundary of the uncertainty principle (best simultaneous resolution of both) is reached with a Gaussian window function, as the Gaussian minimizes the Fourier uncertainty principle.
One can consider the STFT for varying window size as a two-dimensional domain (time and frequency), as illustrated in the example below, which can be calculated by varying the window size. However, this is no longer a strictly time–frequency representation – the kernel is not constant over the entire signal.
). The definition of is:
Then it is sampled at 400 Hz. The following spectrograms were produced:
The 25 ms window allows us to identify a precise time at which the signals change but the precise frequencies are difficult to identify. At the other end of the scale, the 1000 ms window allows the frequencies to be precisely seen but the time between frequency changes is blurred.
.
Take a window of N samples from an arbitrary real-valued signal at sampling rate fs . Taking the Fourier transform produces N complex coefficients. Of these coefficients only half are useful (the last N/2 being the complex conjugate of the first N/2 in reverse order, as this is a real valued signal).
These N/2 coefficients represent the frequencies 0 to fs/2 (Nyquist) and two consecutive coefficients are spaced apart by
fs/N Hz.
To increase the frequency resolution of the window the frequency spacing of the coefficients needs to be reduced. There are only two variables, but decreasing fs (and keeping N constant) will cause the window size to increase — since there are now fewer samples per unit time. The other alternative is to increase N, but this again causes the window size to increase. So any attempt to increase the frequency resolution causes a larger window size and therefore a reduction in time resolution—and vice-versa.
can, for example, show frequency on the horizontal axis, with the lowest frequencies at left, and the highest at the right. The height of each bar (augmented by color) represents the amplitude
of the frequencies within that band. The depth dimension represents time, where each new bar was a separate distinct transform. Audio engineers use this kind of visual to gain information about an audio sample, for example, to locate the frequencies of specific noises (especially when used with greater frequency resolution) or to find frequencies which may be more or less resonant in the space where the signal was recorded. This information can be used for equalization
or tuning other audio effects.
Other time-frequency transforms:
Continuous-time STFT
Simply, in the continuous-time case, the function to be transformed is multiplied by a window functionWindow function
In signal processing, a window function is a mathematical function that is zero-valued outside of some chosen interval. For instance, a function that is constant inside the interval and zero elsewhere is called a rectangular window, which describes the shape of its graphical representation...
which is nonzero for only a short period of time. The Fourier transform
Fourier transform
In mathematics, Fourier analysis is a subject area which grew from the study of Fourier series. The subject began with the study of the way general functions may be represented by sums of simpler trigonometric functions...
(a one-dimensional function) of the resulting signal is taken as the window is slid along the time axis, resulting in a two-dimensional representation of the signal. Mathematically, this is written as:
where w(t) is the window function
Window function
In signal processing, a window function is a mathematical function that is zero-valued outside of some chosen interval. For instance, a function that is constant inside the interval and zero elsewhere is called a rectangular window, which describes the shape of its graphical representation...
, commonly a Hann window or Gaussian "hill" centered around zero, and x(t) is the signal to be transformed. X(τ,ω) is essentially the Fourier Transform of x(t)w(t-τ), a complex function representing the phase and magnitude of the signal over time and frequency. Often phase unwrapping is employed along either or both the time axis, τ, and frequency axis, ω, to suppress any jump discontinuity of the phase result of the STFT. The time index τ is normally considered to be "slow" time and usually not expressed in as high resolution as time t.
Discrete-time STFT
In the discrete time case, the data to be transformed could be broken up into chunks or frames (which usually overlap each other, to reduce artifacts at the boundary). Each chunk is Fourier transformFourier transform
In mathematics, Fourier analysis is a subject area which grew from the study of Fourier series. The subject began with the study of the way general functions may be represented by sums of simpler trigonometric functions...
ed, and the complex result is added to a matrix, which records magnitude and phase for each point in time and frequency. This can be expressed as:
likewise, with signal x[n] and window w[n]. In this case, m is discrete and ω is continuous, but in most typical applications the STFT is performed on a computer using the Fast Fourier Transform
Fast Fourier transform
A fast Fourier transform is an efficient algorithm to compute the discrete Fourier transform and its inverse. "The FFT has been called the most important numerical algorithm of our lifetime ." There are many distinct FFT algorithms involving a wide range of mathematics, from simple...
, so both variables are discrete and quantized
Quantization (signal processing)
Quantization, in mathematics and digital signal processing, is the process of mapping a large set of input values to a smaller set – such as rounding values to some unit of precision. A device or algorithmic function that performs quantization is called a quantizer. The error introduced by...
.
The magnitude
Magnitude (mathematics)
The magnitude of an object in mathematics is its size: a property by which it can be compared as larger or smaller than other objects of the same kind; in technical terms, an ordering of the class of objects to which it belongs....
squared of the STFT yields the spectrogram
Spectrogram
A spectrogram is a time-varying spectral representation that shows how the spectral density of a signal varies with time. Also known as spectral waterfalls, sonograms, voiceprints, or voicegrams, spectrograms are used to identify phonetic sounds, to analyse the cries of animals; they were also...
of the function:
See also the modified discrete cosine transform
Modified discrete cosine transform
The modified discrete cosine transform is a Fourier-related transform based on the type-IV discrete cosine transform , with the additional property of being lapped: it is designed to be performed on consecutive blocks of a larger dataset,...
(MDCT), which is also a Fourier-related transform that uses overlapping windows.
Sliding DFT
If only a small number of ω are desired, or if the STFT is desired to evaluated for every shift m of the window, then the STFT may be more efficiently evaluated using a sliding DFT algorithm.Inverse STFT
The STFT is invertible, that is, the original signal can be recovered from the transform by the Inverse STFT.Continuous-time STFT
Given the width and definition of the window function w(t), we initially require the height of the window function to be scaled so thatIt easily follows that
and
The continuous Fourier Transform is
Substituting x(t) from above:
Swapping order of integration:
So the Fourier Transform can be seen as a sort of phase coherent sum of all of the STFTs of x(t). Since the inverse Fourier transform is
then x(t) can be recovered from X(τ,ω) as
or
It can be seen, comparing to above that windowed "grain" or "wavelet" of x(t) is
the inverse Fourier transform of X(τ,ω) for τ fixed.
Resolution issues
One of the downfalls of the STFT is that it has a fixed resolution. The width of the windowing function relates to how the signal is represented—it determines whether there is good frequency resolution (frequency components close together can be separated) or good time resolution (the time at which frequencies change). A wide window gives better frequency resolution but poor time resolution. A narrower window gives good time resolution but poor frequency resolution. These are called narrowband and wideband transforms, respectively.This is one of the reasons for the creation of the wavelet transform (or multiresolution analysis
Multiresolution analysis
A multiresolution analysis or multiscale approximation is the design method of most of the practically relevant discrete wavelet transforms and the justification for the algorithm of the fast wavelet transform...
in general), which can give good time resolution for high-frequency events, and good frequency resolution for low-frequency events, which is the type of analysis best suited for many real signals.
This property is related to the Heisenberg
Werner Heisenberg
Werner Karl Heisenberg was a German theoretical physicist who made foundational contributions to quantum mechanics and is best known for asserting the uncertainty principle of quantum theory...
uncertainty principle
Uncertainty principle
In quantum mechanics, the Heisenberg uncertainty principle states a fundamental limit on the accuracy with which certain pairs of physical properties of a particle, such as position and momentum, can be simultaneously known...
, but it is not a direct relationship – see Gabor limit for discussion. The product of the standard deviation in time and frequency is limited. The boundary of the uncertainty principle (best simultaneous resolution of both) is reached with a Gaussian window function, as the Gaussian minimizes the Fourier uncertainty principle.
One can consider the STFT for varying window size as a two-dimensional domain (time and frequency), as illustrated in the example below, which can be calculated by varying the window size. However, this is no longer a strictly time–frequency representation – the kernel is not constant over the entire signal.
Example
Using the following sample signal that is composed of a set of four sinusoidal waveforms joined together in sequence. Each waveform is only composed of one of four frequencies (10, 25, 50, 100 HzHertz
The hertz is the SI unit of frequency defined as the number of cycles per second of a periodic phenomenon. One of its most common uses is the description of the sine wave, particularly those used in radio and audio applications....
). The definition of is:
Then it is sampled at 400 Hz. The following spectrograms were produced:
The 25 ms window allows us to identify a precise time at which the signals change but the precise frequencies are difficult to identify. At the other end of the scale, the 1000 ms window allows the frequencies to be precisely seen but the time between frequency changes is blurred.
Explanation
It can also be explained with reference to the sampling and Nyquist frequencyNyquist frequency
The Nyquist frequency, named after the Swedish-American engineer Harry Nyquist or the Nyquist–Shannon sampling theorem, is half the sampling frequency of a discrete signal processing system...
.
Take a window of N samples from an arbitrary real-valued signal at sampling rate fs . Taking the Fourier transform produces N complex coefficients. Of these coefficients only half are useful (the last N/2 being the complex conjugate of the first N/2 in reverse order, as this is a real valued signal).
These N/2 coefficients represent the frequencies 0 to fs/2 (Nyquist) and two consecutive coefficients are spaced apart by
fs/N Hz.
To increase the frequency resolution of the window the frequency spacing of the coefficients needs to be reduced. There are only two variables, but decreasing fs (and keeping N constant) will cause the window size to increase — since there are now fewer samples per unit time. The other alternative is to increase N, but this again causes the window size to increase. So any attempt to increase the frequency resolution causes a larger window size and therefore a reduction in time resolution—and vice-versa.
Application
STFTs as well as standard Fourier transforms and other tools are frequently used to analyze music. The spectrogramSpectrogram
A spectrogram is a time-varying spectral representation that shows how the spectral density of a signal varies with time. Also known as spectral waterfalls, sonograms, voiceprints, or voicegrams, spectrograms are used to identify phonetic sounds, to analyse the cries of animals; they were also...
can, for example, show frequency on the horizontal axis, with the lowest frequencies at left, and the highest at the right. The height of each bar (augmented by color) represents the amplitude
Amplitude
Amplitude is the magnitude of change in the oscillating variable with each oscillation within an oscillating system. For example, sound waves in air are oscillations in atmospheric pressure and their amplitudes are proportional to the change in pressure during one oscillation...
of the frequencies within that band. The depth dimension represents time, where each new bar was a separate distinct transform. Audio engineers use this kind of visual to gain information about an audio sample, for example, to locate the frequencies of specific noises (especially when used with greater frequency resolution) or to find frequencies which may be more or less resonant in the space where the signal was recorded. This information can be used for equalization
Equalization
Equalization, is the process of adjusting the balance between frequency components within an electronic signal. The most well known use of equalization is in sound recording and reproduction but there are many other applications in electronics and telecommunications. The circuit or equipment used...
or tuning other audio effects.
See also
- Time-frequency representationTime-frequency representationA time–frequency representation is a view of a signal represented over both time and frequency. Time–frequency analysis means analysis into the time–frequency domain provided by a TFR...
- Reassignment methodReassignment methodThe method of reassignment is a technique forsharpening a time-frequency representation by mappingthe data to time-frequency coordinates that are nearer tothe true region of support of theanalyzed signal...
Other time-frequency transforms:
- wavelet transform
- chirplet transformChirplet transformIn signal processing, the chirplet transform is an inner product of an input signal with a family of analysis primitives called chirplets.-Similarity to other transforms:...
- fractional Fourier transformFractional Fourier transformIn mathematics, in the area of harmonic analysis, the fractional Fourier transform is a linear transformation generalizing the Fourier transform. It can be thought of as the Fourier transform to the n-th power where n need not be an integer — thus, it can transform a function to an...
- Newland transform
- Constant Q transformConstant Q transformIn mathematics and signal processing, the Constant Q Transform transforms a data series to the frequency domain, and is related to the Fourier Transform ....
External links
- DiscreteTFDs – software for computing the short-time Fourier transform and other time-frequency distributions
- Singular Spectral Analysis - MultiTaper Method Toolkit - a free software program to analyze short, noisy time series.
- kSpectra Toolkit for Mac OS X from SpectraWorks