Modified discrete cosine transform
Encyclopedia
The modified discrete cosine transform (MDCT) is a Fourier-related transform based on the type-IV discrete cosine transform
(DCT-IV), with the additional property of being lapped: it is designed to be performed on consecutive blocks of a larger dataset,
where subsequent blocks are overlapped so that
the last half of one block coincides with the first half of the next block.
This overlapping, in addition to the energy-compaction qualities of the DCT, makes the MDCT especially attractive for signal compression applications, since it helps to avoid artifacts
stemming from the block boundaries. As a result of these advantages, the MDCT is employed in most modern lossy audio formats, including MP3
, AC-3
, Vorbis
, Windows Media Audio
, ATRAC
, Cook
, and AAC
.
The MDCT was proposed by Princen, Johnson, and Bradley in 1987, following earlier (1986) work by Princen and Bradley to develop the MDCT's underlying principle of time-domain aliasing cancellation (TDAC), described below. (There also exists an analogous transform, the MDST, based on the discrete sine transform
, as well as other, rarely used, forms of the MDCT based on different types of DCT or DCT/DST combinations.)
In MP3, the MDCT is not applied to the audio signal directly, but rather to the output of a 32-band polyphase quadrature filter
(PQF) bank. The output of this MDCT is postprocessed by an alias reduction formula to reduce the typical aliasing of the PQF filter bank. Such a combination of a filter bank with an MDCT is called a hybrid filter bank or a subband MDCT. AAC, on the other hand, normally uses a pure MDCT; only the (rarely used) MPEG-4 AAC-SSR variant (by Sony
) uses a four-band PQF bank followed by an MDCT. Similar to MP3, ATRAC
uses stacked quadrature mirror filter
s (QMF) followed by an MDCT.
function
(where R denotes the set of real number
s). The 2N real numbers x0, ..., x2N-1 are transformed into the N real numbers X0, ..., XN-1 according to the formula:
(The normalization coefficient in front of this transform, here unity, is an arbitrary convention and differs between treatments. Only the product of the normalizations of the MDCT and the IMDCT, below, is constrained.)
The IMDCT transforms N real numbers X0, ..., XN-1 into 2N real numbers y0, ..., y2N-1 according to the formula:
(Like for the DCT-IV, an orthogonal transform, the inverse has the same form as the forward transform.)
In the case of a windowed MDCT with the usual window normalization (see below), the normalization coefficient in front of the IMDCT should be multiplied by 2 (i.e., becoming 2/N).
(FFT). One can also compute MDCTs via other transforms, typically a DFT (FFT) or a DCT, combined with O(N) pre- and post-processing steps. Also, as described below, any algorithm for the DCT-IV immediately provides a method to compute the MDCT and IMDCT of even size.
wn (n = 0, ..., 2N-1) that is multiplied with xn and yn in the MDCT and IMDCT formulas, above, in order to avoid discontinuities at the n = 0 and 2N boundaries by making the function go smoothly to zero at those points. (That is, we window the data before the MDCT and after the IMDCT.) In principle, x and y could have different window functions, and the window function could also change from one block to the next (especially for the case where data blocks of different sizes are combined), but for simplicity we consider the common case of identical window functions for equal-sized blocks.
The transform remains invertible (that is, TDAC works), for a symmetric window wn = w2N-1-n, as long as w satisfies the Princen-Bradley condition:
.
various window functions are common, e.g.
for MP3 and MPEG-2 AAC, and
for Vorbis. AC-3 uses a Kaiser-Bessel derived (KBD) window, and MPEG-4 AAC can also use a KBD window.
Note that windows applied to the MDCT are different from windows used for other types of signal analysis, since they must fulfill the Princen-Bradley condition. One of the reasons for this difference is that MDCT windows are applied twice, for both the MDCT (analysis) and the IMDCT (synthesis).
In order to define the precise relationship to the DCT-IV, one must realize that the DCT-IV corresponds to alternating even/odd boundary conditions: even at its left boundary (around n=–1/2), odd at its right boundary (around n=N–1/2), and so on (instead of periodic boundaries as for a DFT
). This follows from the identities and . Thus, if its inputs are an array x of length N, we can imagine extending this array to (x, –xR, –x, xR, ...) and so on, where xR denotes x in reverse order.
Consider an MDCT with 2N inputs and N outputs, where we divide the inputs into four blocks (a, b, c, d) each of size N/2. If we shift these by N/2 (from the +N/2 term in the MDCT definition), then (b, c, d) extend past the end of the N DCT-IV inputs, so we must "fold" them back according to the boundary conditions described above.
(In this way, any algorithm to compute the DCT-IV can be trivially applied to the MDCT.)
Similarly, the IMDCT formula above is precisely 1/2 of the DCT-IV (which is its own inverse), where the output is shifted by N/2 and extended (via the boundary conditions) to a length 2N. The inverse DCT-IV would simply give back the inputs (–cR–d, a–bR) from above. When this is shifted and extended via the boundary conditions, one obtains:
Half of the IMDCT outputs are thus redundant, as b–aR = –(a–bR)R, and likewise for the last two terms.
One can now understand how TDAC works. Suppose that one computes the MDCT of the subsequent, 50% overlapped, 2N block (c, d, e, f). The IMDCT will then yield, analogous to the above: (c–dR, d–cR, e+fR, eR+f) / 2. When this is added with the previous IMDCT result in the overlapping half, the reversed terms cancel and one obtains simply (c, d), recovering the original data.
are aliased
to lower frequencies, except that this aliasing occurs in the time domain instead of the frequency domain. Hence the combinations c–dR and so on, which have precisely the right signs for the combinations to cancel when they are added.
For odd N (which are rarely used in practice), N/2 is not an integer so the MDCT is not simply a shift permutation of a DCT-IV. In this case, the additional shift by half a sample means that the MDCT/IMDCT becomes equivalent to the DCT-III/II, and the analysis is analogous to the above.
Recall from above that when and are MDCTed, IMDCTed, and added in their overlapping half, we obtain , the original data.
Now we suppose that we multiply both the MDCT inputs and the IMDCT outputs by a window function of length 2N. As above, we assume a symmetric window function, which is therefore of the form where w and z are length-N/2 vectors and R denotes reversal as before. Then the Princen-Bradley condition can be written: , with the multiplications and additions performed elementwise, or equivalently (reversing w and z).
Therefore, instead of MDCTing , we now MDCT (with all multiplications performed elementwise). When this is IMDCTed and multiplied again (elementwise) by the window function, the last-N half becomes:.
(Note that we no longer have the multiplication by 1/2, because the IMDCT normalization differs by a factor of 2 in the windowed case.)
Similarly, the windowed MDCT and IMDCT of yields, in its first-N half:.
When we add these two halves together, we obtain:
recovering the original data.
MDCT page from Google knol data base
Discrete cosine transform
A discrete cosine transform expresses a sequence of finitely many data points in terms of a sum of cosine functions oscillating at different frequencies. DCTs are important to numerous applications in science and engineering, from lossy compression of audio and images A discrete cosine transform...
(DCT-IV), with the additional property of being lapped: it is designed to be performed on consecutive blocks of a larger dataset,
where subsequent blocks are overlapped so that
the last half of one block coincides with the first half of the next block.
This overlapping, in addition to the energy-compaction qualities of the DCT, makes the MDCT especially attractive for signal compression applications, since it helps to avoid artifacts
Compression artifact
A compression artifact is a noticeable distortion of media caused by the application of lossy data compression....
stemming from the block boundaries. As a result of these advantages, the MDCT is employed in most modern lossy audio formats, including MP3
MP3
MPEG-1 or MPEG-2 Audio Layer III, more commonly referred to as MP3, is a patented digital audio encoding format using a form of lossy data compression...
, AC-3
AC-3
AC-3 can be:* AC-3 algorithm , one of a series of algorithms used for the solution of constraint satisfaction problems* Comte AC-3, a 1920s Swiss bomber/transport aircraft.* Dolby AC-3, a Dolby Digital, audio codec...
, Vorbis
Vorbis
Vorbis is a free software / open source project headed by the Xiph.Org Foundation . The project produces an audio format specification and software implementation for lossy audio compression...
, Windows Media Audio
Windows Media Audio
Windows Media Audio is an audio data compression technology developed by Microsoft. The name can be used to refer to its audio file format or its audio codecs. It is a proprietary technology that forms part of the Windows Media framework. WMA consists of four distinct codecs...
, ATRAC
ATRAC
Adaptive Transform Acoustic Coding is a family of proprietary audio compression algorithms developed by Sony. MiniDisc was the first commercial product to incorporate ATRAC in 1992. ATRAC allowed a relatively small disc like MiniDisc to have the same running time as CD while storing audio...
, Cook
Cook Codec
The cook codec is a lossy audio compression codec developed by RealNetworks. It is also known as Cooker, Gecko, RealAudio G2, and RealAudio 8 low bitrate ....
, and AAC
Advanced Audio Coding
Advanced Audio Coding is a standardized, lossy compression and encoding scheme for digital audio. Designed to be the successor of the MP3 format, AAC generally achieves better sound quality than MP3 at similar bit rates....
.
The MDCT was proposed by Princen, Johnson, and Bradley in 1987, following earlier (1986) work by Princen and Bradley to develop the MDCT's underlying principle of time-domain aliasing cancellation (TDAC), described below. (There also exists an analogous transform, the MDST, based on the discrete sine transform
Discrete sine transform
In mathematics, the discrete sine transform is a Fourier-related transform similar to the discrete Fourier transform , but using a purely real matrix...
, as well as other, rarely used, forms of the MDCT based on different types of DCT or DCT/DST combinations.)
In MP3, the MDCT is not applied to the audio signal directly, but rather to the output of a 32-band polyphase quadrature filter
Polyphase quadrature filter
A polyphase quadrature filter, or PQF, is a filter bank which splits an input signal into a given number N of equidistant sub-bands. These sub-bands are subsampled by a factor of N, so they are critically sampled....
(PQF) bank. The output of this MDCT is postprocessed by an alias reduction formula to reduce the typical aliasing of the PQF filter bank. Such a combination of a filter bank with an MDCT is called a hybrid filter bank or a subband MDCT. AAC, on the other hand, normally uses a pure MDCT; only the (rarely used) MPEG-4 AAC-SSR variant (by Sony
Sony
, commonly referred to as Sony, is a Japanese multinational conglomerate corporation headquartered in Minato, Tokyo, Japan and the world's fifth largest media conglomerate measured by revenues....
) uses a four-band PQF bank followed by an MDCT. Similar to MP3, ATRAC
ATRAC
Adaptive Transform Acoustic Coding is a family of proprietary audio compression algorithms developed by Sony. MiniDisc was the first commercial product to incorporate ATRAC in 1992. ATRAC allowed a relatively small disc like MiniDisc to have the same running time as CD while storing audio...
uses stacked quadrature mirror filter
Quadrature mirror filter
In digital signal processing, a quadrature mirror filter is a filter most commonly used to implement a filter bank that splits an input signal into two bands...
s (QMF) followed by an MDCT.
Definition
As a lapped transform, the MDCT is a bit unusual compared to other Fourier-related transforms in that it has half as many outputs as inputs (instead of the same number). In particular, it is a linearLinear
In mathematics, a linear map or function f is a function which satisfies the following two properties:* Additivity : f = f + f...
function
Function (mathematics)
In mathematics, a function associates one quantity, the argument of the function, also known as the input, with another quantity, the value of the function, also known as the output. A function assigns exactly one output to each input. The argument and the value may be real numbers, but they can...
(where R denotes the set of real number
Real number
In mathematics, a real number is a value that represents a quantity along a continuum, such as -5 , 4/3 , 8.6 , √2 and π...
s). The 2N real numbers x0, ..., x2N-1 are transformed into the N real numbers X0, ..., XN-1 according to the formula:
(The normalization coefficient in front of this transform, here unity, is an arbitrary convention and differs between treatments. Only the product of the normalizations of the MDCT and the IMDCT, below, is constrained.)
Inverse transform
The inverse MDCT is known as the IMDCT. Because there are different numbers of inputs and outputs, at first glance it might seem that the MDCT should not be invertible. However, perfect invertibility is achieved by adding the overlapped IMDCTs of subsequent overlapping blocks, causing the errors to cancel and the original data to be retrieved; this technique is known as time-domain aliasing cancellation (TDAC).The IMDCT transforms N real numbers X0, ..., XN-1 into 2N real numbers y0, ..., y2N-1 according to the formula:
(Like for the DCT-IV, an orthogonal transform, the inverse has the same form as the forward transform.)
In the case of a windowed MDCT with the usual window normalization (see below), the normalization coefficient in front of the IMDCT should be multiplied by 2 (i.e., becoming 2/N).
Computation
Although the direct application of the MDCT formula would require O(N2) operations, it is possible to compute the same thing with only O(N log N) complexity by recursively factorizing the computation, as in the fast Fourier transformFast Fourier transform
A fast Fourier transform is an efficient algorithm to compute the discrete Fourier transform and its inverse. "The FFT has been called the most important numerical algorithm of our lifetime ." There are many distinct FFT algorithms involving a wide range of mathematics, from simple...
(FFT). One can also compute MDCTs via other transforms, typically a DFT (FFT) or a DCT, combined with O(N) pre- and post-processing steps. Also, as described below, any algorithm for the DCT-IV immediately provides a method to compute the MDCT and IMDCT of even size.
Window functions
In typical signal-compression applications, the transform properties are further improved by using a window functionWindow function
In signal processing, a window function is a mathematical function that is zero-valued outside of some chosen interval. For instance, a function that is constant inside the interval and zero elsewhere is called a rectangular window, which describes the shape of its graphical representation...
wn (n = 0, ..., 2N-1) that is multiplied with xn and yn in the MDCT and IMDCT formulas, above, in order to avoid discontinuities at the n = 0 and 2N boundaries by making the function go smoothly to zero at those points. (That is, we window the data before the MDCT and after the IMDCT.) In principle, x and y could have different window functions, and the window function could also change from one block to the next (especially for the case where data blocks of different sizes are combined), but for simplicity we consider the common case of identical window functions for equal-sized blocks.
The transform remains invertible (that is, TDAC works), for a symmetric window wn = w2N-1-n, as long as w satisfies the Princen-Bradley condition:
.
various window functions are common, e.g.
for MP3 and MPEG-2 AAC, and
for Vorbis. AC-3 uses a Kaiser-Bessel derived (KBD) window, and MPEG-4 AAC can also use a KBD window.
Note that windows applied to the MDCT are different from windows used for other types of signal analysis, since they must fulfill the Princen-Bradley condition. One of the reasons for this difference is that MDCT windows are applied twice, for both the MDCT (analysis) and the IMDCT (synthesis).
Relationship to DCT-IV and Origin of TDAC
As can be seen by inspection of the definitions, for even N the MDCT is essentially equivalent to a DCT-IV, where the input is shifted by N/2 and two N-blocks of data are transformed at once. By examining this equivalence more carefully, important properties like TDAC can be easily derived.In order to define the precise relationship to the DCT-IV, one must realize that the DCT-IV corresponds to alternating even/odd boundary conditions: even at its left boundary (around n=–1/2), odd at its right boundary (around n=N–1/2), and so on (instead of periodic boundaries as for a DFT
Discrete Fourier transform
In mathematics, the discrete Fourier transform is a specific kind of discrete transform, used in Fourier analysis. It transforms one function into another, which is called the frequency domain representation, or simply the DFT, of the original function...
). This follows from the identities and . Thus, if its inputs are an array x of length N, we can imagine extending this array to (x, –xR, –x, xR, ...) and so on, where xR denotes x in reverse order.
Consider an MDCT with 2N inputs and N outputs, where we divide the inputs into four blocks (a, b, c, d) each of size N/2. If we shift these by N/2 (from the +N/2 term in the MDCT definition), then (b, c, d) extend past the end of the N DCT-IV inputs, so we must "fold" them back according to the boundary conditions described above.
- Thus, the MDCT of 2N inputs (a, b, c, d) is exactly equivalent to a DCT-IV of the N inputs: (–cR–d, a–bR), where R denotes reversal as above.
(In this way, any algorithm to compute the DCT-IV can be trivially applied to the MDCT.)
Similarly, the IMDCT formula above is precisely 1/2 of the DCT-IV (which is its own inverse), where the output is shifted by N/2 and extended (via the boundary conditions) to a length 2N. The inverse DCT-IV would simply give back the inputs (–cR–d, a–bR) from above. When this is shifted and extended via the boundary conditions, one obtains:
- IMDCT(MDCT(a, b, c, d)) = (a–bR, b–aR, c+dR, d+cR) / 2.
Half of the IMDCT outputs are thus redundant, as b–aR = –(a–bR)R, and likewise for the last two terms.
One can now understand how TDAC works. Suppose that one computes the MDCT of the subsequent, 50% overlapped, 2N block (c, d, e, f). The IMDCT will then yield, analogous to the above: (c–dR, d–cR, e+fR, eR+f) / 2. When this is added with the previous IMDCT result in the overlapping half, the reversed terms cancel and one obtains simply (c, d), recovering the original data.
Origin of TDAC
The origin of the term "time-domain aliasing cancellation" is now clear. The use of input data that extend beyond the boundaries of the logical DCT-IV causes the data to be aliased in exactly the same way that frequencies beyond the Nyquist frequencyNyquist frequency
The Nyquist frequency, named after the Swedish-American engineer Harry Nyquist or the Nyquist–Shannon sampling theorem, is half the sampling frequency of a discrete signal processing system...
are aliased
Aliasing
In signal processing and related disciplines, aliasing refers to an effect that causes different signals to become indistinguishable when sampled...
to lower frequencies, except that this aliasing occurs in the time domain instead of the frequency domain. Hence the combinations c–dR and so on, which have precisely the right signs for the combinations to cancel when they are added.
For odd N (which are rarely used in practice), N/2 is not an integer so the MDCT is not simply a shift permutation of a DCT-IV. In this case, the additional shift by half a sample means that the MDCT/IMDCT becomes equivalent to the DCT-III/II, and the analysis is analogous to the above.
TDAC for the windowed MDCT
Above, the TDAC property was proved for the ordinary MDCT, showing that adding IMDCTs of subsequent blocks in their overlapping half recovers the original data. The derivation of this inverse property for the windowed MDCT is only slightly more complicated.Recall from above that when and are MDCTed, IMDCTed, and added in their overlapping half, we obtain , the original data.
Now we suppose that we multiply both the MDCT inputs and the IMDCT outputs by a window function of length 2N. As above, we assume a symmetric window function, which is therefore of the form where w and z are length-N/2 vectors and R denotes reversal as before. Then the Princen-Bradley condition can be written: , with the multiplications and additions performed elementwise, or equivalently (reversing w and z).
Therefore, instead of MDCTing , we now MDCT (with all multiplications performed elementwise). When this is IMDCTed and multiplied again (elementwise) by the window function, the last-N half becomes:.
(Note that we no longer have the multiplication by 1/2, because the IMDCT normalization differs by a factor of 2 in the windowed case.)
Similarly, the windowed MDCT and IMDCT of yields, in its first-N half:.
When we add these two halves together, we obtain:
recovering the original data.
See also
Other overlapping windowed Fourier transforms include:- Short-time Fourier transformShort-time Fourier transformThe short-time Fourier transform , or alternatively short-term Fourier transform, is a Fourier-related transform used to determine the sinusoidal frequency and phase content of local sections of a signal as it changes over time....
- Welch's method
MDCT page from Google knol data base