Sample rate conversion
Sample rate conversion is the process of converting a (usually digital) signal from one sampling rate to another, while changing the information carried by the signal as little as possible. When applied to an image, this process is sometimes called image scaling.
Sample rate conversion is needed because different systems use different sampling rates, for engineering, economic, or historical reasons. The physics of sampling merely sets a minimum sampling rate (an analog signal can be sampled at any rate above twice the highest frequency contained in the signal; see Nyquist frequency), and so other factors determine the actual rates used. For example, different audio systems use rates of 44.1, 48, and 96 kHz. As another example, American television, European television, and movies all use different numbers of frames per second. Users would like to transfer source material between these systems. Simply replaying the existing data at the new rate will not normally work: it introduces large changes in pitch (for audio) and in the speed of movement (for video), and it cannot be done in real time. Hence sample rate conversion is required.
Two basic approaches are:
- Convert to analog, then re-sample at the new rate.
- Digital signal processing: compute the values of the new samples from the old samples.
Almost all modern systems use the latter, since this method introduces less noise and distortion. Though the calculations needed can be quite complex, they are entirely practical with modern processing power.
A famous example of analog rate conversion was converting the slow-scan TV signals from the Apollo moon missions to the conventional TV rates for the viewers at home. Another historical example, part analog and part digital, is the conversion of movies (shot at 24 frames per second) to television (roughly 50 or 60 fields per second, where a field is half of an interlaced frame: just the odd or even lines). To convert a 24 frame/sec movie to 60 field/sec television, for example, alternate movie frames are shown 2 and 3 times, respectively. For 50 Hz systems such as PAL, each frame is shown twice. Since 50 is not exactly 2 × 24, the movie will run 50/48, or about 4%, faster, and the audio pitch will be about 4% higher, an effect known as PAL speed-up. This is often accepted for simplicity, but more complex methods are possible that preserve the running time and pitch: every twelfth frame can be repeated 3 times rather than twice, or digital interpolation (see below) can be used in a video scaler.
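The frame-repetition schemes described above can be sketched in a few lines. This is only an illustration of the counting involved; the function name `pulldown_fields` and the tuple representation of the repeat pattern are assumptions made for this example, not part of any broadcast standard.

```python
def pulldown_fields(num_frames, pattern):
    """Return the source-frame index used for each output field.

    pattern lists how many fields each successive frame occupies and is
    applied cyclically, e.g. (2, 3) for 24 frame/s film to 60 field/s video.
    """
    fields = []
    for frame in range(num_frames):
        repeats = pattern[frame % len(pattern)]
        fields.extend([frame] * repeats)
    return fields

# One second of film (24 frames) shown alternately for 2 and 3 fields.
fields_60 = pulldown_fields(24, (2, 3))

# 50 field/s without speed-up: every twelfth frame is held for 3 fields.
fields_50 = pulldown_fields(24, (2,) * 11 + (3,))

print(len(fields_60), len(fields_50))  # prints "60 50"
```

Both patterns map exactly one second of film onto one second of video, which is why they preserve running time and pitch, at the cost of slightly uneven motion.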
Digital sample rate conversion
There are at least two ways to perform digital sample rate conversion:
- (a) If the two frequencies are in a fixed ratio, the conversion can be done as follows: Let F be the least common multiple of the two frequencies. Generate a signal sampled at F by inserting zeros between the original samples. This also introduces images of the baseband spectrum around multiples of the original sampling rate. Remove these with a digital low-pass filter, so that only content below half of the lower of the two sampling rates remains. Then reduce the sample rate by discarding the appropriate samples.
- (b) Another approach is to treat the samples as a time series, and create any needed new points by interpolation. In theory any interpolation method can be used, though linear interpolation (for simplicity) and a truncated sinc function (from theory) are most common.
Although the two approaches seem very different, they are mathematically identical. Picking an interpolation function in the second scheme is equivalent to picking the impulse response of the digital filter in the first scheme. Linear interpolation is equivalent to a triangular impulse response; a truncated sinc approximates the desirable "brick-wall" filter and approaches it as the number of points increases.
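The equivalence for the linear case can be checked numerically. The following is a minimal sketch, assuming an integer upsampling factor `L`; the function names are made up for this illustration. One function interpolates directly, the other inserts zeros and convolves with a triangular impulse response, and the two outputs agree up to rounding error.

```python
def upsample_linear(x, L):
    """Method (b): linear interpolation between neighbouring samples."""
    out = []
    for a, b in zip(x, x[1:]):
        for k in range(L):
            out.append(a + (b - a) * k / L)
    out.append(x[-1])
    return out

def upsample_triangle(x, L):
    """Method (a): insert zeros, then filter with a triangular response."""
    stuffed = [0.0] * (len(x) * L)
    stuffed[::L] = x
    # Triangle of length 2L-1 with a peak of 1 at its centre.
    h = [1 - abs(k) / L for k in range(-(L - 1), L)]
    full = []
    for n in range(len(stuffed) + len(h) - 1):
        acc = 0.0
        for k, hk in enumerate(h):
            if 0 <= n - k < len(stuffed):
                acc += hk * stuffed[n - k]
        full.append(acc)
    # Skip the filter delay of L-1 samples and keep the span between the
    # first and last original samples.
    start = L - 1
    return full[start:start + (len(x) - 1) * L + 1]

x = [0.0, 1.0, 0.5, -1.0]
a = upsample_linear(x, 4)
b = upsample_triangle(x, 4)
max_diff = max(abs(p - q) for p, q in zip(a, b))
```

Here `max_diff` is at the level of floating-point rounding, illustrating that the choice of interpolation function and the choice of filter impulse response are the same decision.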
If the sample rate ratios are known, fixed, and rational, method (a) is better, in theory. Choosing the length of the filter's impulse response in (a) corresponds to choosing the number of points used for interpolation in (b). In approach (a), a slow precomputation such as the Remez algorithm can be used to compute the "best" response possible for a given number of points (best in terms of peak error in various frequency bands, and so on). Note that a truncated sinc function, though correct in the limit of an infinite number of points, is not the most accurate filter for a finite number of points.
However, method (b) will work in more general cases, where the sample rate ratios are not rational, or two real time streams must be accommodated, or the sample rates are time varying.
Because of the mathematical operations employed, the output samples of sample rate conversion are almost always computed to more precision than the output format can hold. Conversion to the output bit size can be done by simple rounding, or more sophisticated methods such as dither or noise shaping can be employed.
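The difference between plain rounding and dithering can be illustrated with a small sketch. It assumes input samples scaled to the range −1 to 1, a 16-bit output, and triangular (TPDF) dither with a peak of ±1 least significant bit; these are common choices made for this example, not something the text above prescribes, and the function names are hypothetical.

```python
import random

def quantize_round(samples, bits=16):
    """Round high-precision samples in [-1, 1] to signed integers."""
    full_scale = 2 ** (bits - 1)
    lo, hi = -full_scale, full_scale - 1
    return [max(lo, min(hi, round(s * full_scale))) for s in samples]

def quantize_tpdf(samples, bits=16, rng=None):
    """Add triangular dither (assumed +/-1 LSB peak) before rounding."""
    rng = rng or random.Random(0)
    full_scale = 2 ** (bits - 1)
    lo, hi = -full_scale, full_scale - 1
    out = []
    for s in samples:
        # The difference of two uniform values has a triangular distribution.
        dither = rng.random() - rng.random()
        out.append(max(lo, min(hi, round(s * full_scale + dither))))
    return out

# A constant signal one quarter of an LSB high.
quiet = [0.25 / 32768] * 20000
rounded = quantize_round(quiet)
dithered = quantize_tpdf(quiet, rng=random.Random(1))
mean_dithered = sum(dithered) / len(dithered)
```

Plain rounding turns this signal into all zeros, so the information is lost. With dither the individual samples are noisier, but their average stays close to 0.25, so the low-level detail survives as signal plus noise.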
Example
CDs are sampled at 44.1 kHz, but Digital Audio Tape (DAT) is usually sampled at 48 kHz. How can material be converted from one sample rate to the other? First, note that 44.1 and 48 are in the ratio 147/160. The process for converting from 44.1 to 48 kHz, for example, is described conceptually below.
Less technical explanation
The least common multiple of 44.1 kHz and 48 kHz is 7.056 MHz. Had the original audio signal been recorded at that sampling rate, the process would be simple. Since 7.056 MHz is 160 × 44.1 kHz, and also 147 × 48 kHz, all we would need to do is take every 160th sample to get a 44.1 kHz sampling rate, and every 147th sample to get a 48 kHz sampling rate. Taking every Nth sample like this preserves the content, provided the information (the audio signal) does not have any content above half the lowest sampling rate used (22.05 kHz in this case).

So now the problem is how to generate the 7.056 MHz sampled signal, given that the original has only 1/160 of the samples needed. A first thought might be to interpolate between the existing points, but that turns out to have two problems. First, the frequency response will not be flat, and second, this will create some higher-frequency content. The high-frequency content can (and must) be removed with a digital filter (basically a complicated average over many points), but the frequency response problem remains.
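The rate arithmetic used in this example can be verified with exact integer and rational arithmetic. This is a small check only; the variable names are chosen for this illustration.

```python
from fractions import Fraction
from math import gcd

f_in, f_out = 44_100, 48_000              # sample rates in Hz

ratio = Fraction(f_in, f_out)             # reduces to 147/160
common = f_in * f_out // gcd(f_in, f_out) # least common multiple, in Hz

up = common // f_in                       # samples per input sample at the common rate
down = common // f_out                    # keep one sample in this many

print(ratio, common, up, down)            # prints "147/160 7056000 160 147"
```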
The somewhat surprising answer is to replace the missing samples with zeros. So if the original audio samples were ..., a, b, c, ..., then the 7.056 MHz sequence is ..., a, 0, 0, ..., 0, b, 0, 0, ..., 0, c, ..., with 159 zeros between each pair of original samples. This too creates extra high-frequency content (in fact it is worse in this respect than linear interpolation), but at least the frequency response is flat. The digital filter then removes the unwanted high-frequency content. The work of this filter is also much easier when zeros are inserted, since the filter is basically a weighted average and almost all of the samples are known to be zero.
So inserting the zeros and then running the digital filter gives the needed signal: sampled at 7.056 MHz, but with no content above 24 kHz. Taking every 147th sample then gives the desired output. Which sample to start with does not matter; any set will work as long as the samples are 147 apart.
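The whole procedure can be tried on a small scale. The sketch below uses a hypothetical 3:2 rate change instead of 160:147 so that it runs quickly, and a Hann-windowed sinc filter of 121 taps as the low-pass filter. The window, the filter length, and the function names are illustrative assumptions; a practical design would choose them to meet a stated stopband requirement.

```python
import math

def lowpass_taps(num_taps, cutoff, gain):
    """Hann-windowed sinc low-pass filter (an illustrative design choice).

    cutoff is in cycles per high-rate sample (0 to 0.5); gain restores the
    level that is lost when zeros are inserted.
    """
    centre = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - centre
        if t == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * (n + 1) / (num_taps + 1))
        taps.append(gain * ideal * window)
    return taps

def resample_naive(x, up, down, taps):
    """Insert zeros, filter at the high rate, then keep every down-th sample."""
    stuffed = [0.0] * (len(x) * up)
    stuffed[::up] = x                       # up-1 zeros after each sample
    filtered = []
    for n in range(len(stuffed)):
        acc = 0.0
        for k in range(min(n + 1, len(taps))):
            acc += taps[k] * stuffed[n - k]
        filtered.append(acc)
    return filtered[::down]

up, down, num_taps = 3, 2, 121
# Cut off at half of the lower of the two rates, expressed at the high rate.
taps = lowpass_taps(num_taps, cutoff=0.5 / max(up, down), gain=up)

tone = 0.05                                 # cycles per input sample
x = [math.sin(2 * math.pi * tone * m) for m in range(200)]
y = resample_naive(x, up, down, taps)
```

Away from the start of the signal, `y[j]` closely follows the same sine wave evaluated at input time `(j * down - (num_taps - 1) / 2) / up`: the tone is preserved at the new rate, delayed by half the filter length.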
Technical explanation
- Insert 159 zeros between every pair of input samples. This raises the data rate to 7.056 MHz, the least common multiple of 44.1 and 48 kHz. Since this operation is equivalent to reconstructing with Dirac delta functions, it also creates images of each frequency f (in kHz) at 44.1 − f, 44.1 + f, 88.2 − f, 88.2 + f, ...
- Remove the images with a digital filter, leaving a signal containing only 0 to 20 kHz information, but still sampled at a rate of 7.056 MHz.
- Discard 146 of every 147 output samples. It does not hurt to do so since the signal now has no significant content above 24 kHz.
(In practice, of course, there is no reason to compute the values of samples that will be discarded, and the samples that are still needed can be computed by taking advantage of the fact that most of the inputs are 0. This is called polyphase decomposition; it drastically reduces the computational effort without affecting the conversion quality.)
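A minimal sketch of this shortcut follows, with a straightforward reference implementation for comparison. The function names and the small test filter are assumptions for illustration. The filter here is arbitrary, since the point is only that both routes give the same output for any set of taps.

```python
def resample_reference(x, up, down, taps):
    """Literal form: insert zeros, filter everything, then discard."""
    stuffed = [0.0] * (len(x) * up)
    stuffed[::up] = x
    out = []
    for n in range(len(stuffed)):
        acc = 0.0
        for k in range(min(n + 1, len(taps))):
            acc += taps[k] * stuffed[n - k]
        out.append(acc)
    return out[::down]

def resample_polyphase(x, up, down, taps):
    """Compute only the kept outputs, using only the non-zero inputs.

    The output at high-rate index n needs tap k only when n - k is a
    multiple of up, i.e. for k = n % up, n % up + up, and so on.
    """
    out = []
    for n in range(0, len(x) * up, down):
        acc = 0.0
        for k in range(n % up, len(taps), up):
            m = (n - k) // up           # index of the original input sample
            if m < 0:
                break                   # before the start of the signal
            acc += taps[k] * x[m]
        out.append(acc)
    return out

x = [float((7 * i) % 11 - 5) for i in range(40)]
taps = [0.1 * (i % 5) - 0.15 for i in range(23)]
ref = resample_reference(x, 5, 3, taps)
fast = resample_polyphase(x, 5, 3, taps)
max_diff = max(abs(p - q) for p, q in zip(ref, fast))
```

In this sketch the number of multiplications falls by roughly a factor of `up * down` relative to the literal form, which would be 160 × 147 = 23,520 for the CD-to-DAT example.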
This process requires a digital filter (almost always an FIR filter, since these can be designed to have no phase distortion) that is flat to 20 kHz, and down at least x dB at 24 kHz. How big does x need to be? A first impression might be about 100 dB, since the maximum signal size is roughly ±32767 and the input quantization error is ±1/2, so the input has a signal-to-broadband-noise ratio of 98 dB at most. However, the noise in the stopband (20 kHz to 3.5 MHz) is all folded into the passband by the decimation in the third step, so another 22 dB (a ratio of 160:1 expressed in dB) of stopband rejection is required to account for the noise folding. Thus 120 dB of rejection yields a broadband noise roughly equal to the original quantizing noise.
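The figures quoted above can be reproduced with a short calculation. The sketch assumes the usual model behind the 98 dB figure, namely a full-scale sine wave compared with quantization error uniformly distributed over ±1/2 of a step; that model is an assumption of this example rather than something stated in the text.

```python
import math

bits = 16
peak = 2 ** (bits - 1) - 1                # 32767
signal_rms = peak / math.sqrt(2)          # full-scale sine wave (assumed)
noise_rms = 1 / math.sqrt(12)             # uniform error over +/- 1/2 step

snr_db = 20 * math.log10(signal_rms / noise_rms)   # about 98 dB
folding_db = 10 * math.log10(160)                  # 160:1 as a power ratio, about 22 dB
required_db = snr_db + folding_db                  # about 120 dB

print(round(snr_db), round(folding_db), round(required_db))
```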
There is no requirement that the resampling in the ratio 160:147 all be done in one step. Using the same example, the original could be resampled at a ratio of 10:7, then 8:7, then 2:3 (or in any order that does not reduce the sample rate below the initial or final rates, or using any other factorization of the ratio). There may be various technical reasons for using a single-step or multi-step process; typically the single-step process involves less total computation but requires more coefficient storage.
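The factorization can be checked with exact fractions, along with which orderings of the three stages keep every intermediate rate at or above 44.1 kHz. This is a small illustrative check; the variable names are chosen for this example.

```python
from fractions import Fraction
from itertools import permutations

stages = [Fraction(10, 7), Fraction(8, 7), Fraction(2, 3)]
f_in = Fraction(44_100)

def rates_after(order, start):
    """Sample rate after each stage, applied in the given order."""
    rates, rate = [], start
    for s in order:
        rate *= s
        rates.append(rate)
    return rates

overall = stages[0] * stages[1] * stages[2]        # 160/147
rates = rates_after(stages, f_in)                  # 63000, 72000, 48000 Hz

# Orderings whose intermediate rates never drop below the input rate.
valid_orders = [
    order for order in permutations(stages)
    if all(r >= f_in for r in rates_after(order, f_in))
]
```

With these three factors, only the orderings that apply 2:3 last satisfy the constraint: 10:7, 8:7, 2:3 and 8:7, 10:7, 2:3.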
Further reading
- Multirate Digital Signal Processing, by Crochiere and Rabiner. ISBN 0-13-605162-6