Linear predictive coding
Linear predictive coding (LPC) is a tool used mostly in audio signal processing and speech processing for representing the spectral envelope of a digital speech signal in compressed form, using the information of a linear predictive model. It is one of the most powerful speech analysis techniques and one of the most useful methods for encoding good-quality speech at a low bit rate, and it provides accurate estimates of speech parameters.
Overview
LPC starts with the assumption that a speech signal is produced by a buzzer at the end of a tube (voiced sounds), with occasional added hissing and popping sounds (sibilants and plosive sounds). Although apparently crude, this model is actually a close approximation of the reality of speech production. The glottis (the space between the vocal folds) produces the buzz, which is characterized by its intensity (loudness) and frequency (pitch). The vocal tract (the throat and mouth) forms the tube, which is characterized by its resonances, which give rise to formants, or enhanced frequency bands in the sound produced. Hisses and pops are generated by the action of the tongue, lips and throat during sibilants and plosives.
LPC analyzes the speech signal by estimating the formants, removing their effects from the speech signal, and estimating the intensity and frequency of the remaining buzz. The process of removing the formants is called inverse filtering, and the remaining signal after the subtraction of the filtered modeled signal is called the residue.
The numbers that describe the intensity and frequency of the buzz, the formants, and the residue signal can be stored or transmitted elsewhere. LPC synthesizes the speech signal by reversing the process: use the buzz parameters and the residue to create a source signal, use the formants to create a filter (which represents the tube), and run the source through the filter, resulting in speech.
Because speech signals vary with time, this process is done on short chunks of the speech signal, which are called frames; generally 30 to 50 frames per second give intelligible speech with good compression.
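To make the analysis step concrete, here is a minimal sketch in Python with NumPy of how a single frame can be processed: the autocorrelation method with the Levinson-Durbin recursion estimates the predictor coefficients, and inverse filtering with the resulting filter leaves the residue. The function names (lpc_coefficients, inverse_filter) are illustrative rather than taken from any library, and real coders add pre-emphasis, parameter quantization and overlapping windows that are omitted here.

import numpy as np

def lpc_coefficients(frame, order):
    # Autocorrelation method: biased autocorrelation at lags 0..order,
    # then the Levinson-Durbin recursion.
    n = len(frame)
    r = np.array([np.dot(frame[:n - lag], frame[lag:]) for lag in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    k = np.zeros(order)                      # reflection (PARCOR) coefficients
    err = r[0] + 1e-12                       # tiny floor guards silent frames
    for i in range(1, order + 1):
        k[i - 1] = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] = a[1:i] + k[i - 1] * a[i - 1:0:-1]
        a[i] = k[i - 1]
        err *= 1.0 - k[i - 1] ** 2
    return a, k, err                         # a[0] == 1, err = residual power

def inverse_filter(frame, a):
    # Apply the analysis filter A(z) to the frame; what remains is the residue.
    return np.convolve(a, frame)[:len(frame)]

# Toy demo: one 20 ms frame of a 100 Hz "buzz" sampled at 8 kHz.
fs = 8000
t = np.arange(int(0.02 * fs)) / fs
frame = np.sign(np.sin(2 * np.pi * 100 * t)) * np.hanning(len(t))
a, k, err = lpc_coefficients(frame, order=10)
residue = inverse_filter(frame, a)
print("residue energy / frame energy:", np.sum(residue ** 2) / np.sum(frame ** 2))

A complete coder would repeat this for every frame, at the 30 to 50 frames per second mentioned above, and quantize the resulting parameters before storage or transmission.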
Early history of LPC
According to Robert M. Gray of Stanford University, the first ideas leading to LPC started in 1966 when S. Saito and F. Itakura of NTT described an approach to automatic phoneme discrimination that involved the first maximum likelihood approach to speech coding. In 1967, John Burg outlined the maximum entropy approach. In 1969 Itakura and Saito introduced partial correlation, in May Glen Culler proposed realtime speech encoding, and B. S. Atal presented an LPC speech coder at the Annual Meeting of the Acoustical Society of America. In 1971 realtime LPC using 16-bit LPC hardware was demonstrated by Philco-Ford; four units were sold.
In 1972 Bob Kahn of ARPA, with Jim Forgie (Lincoln Laboratory, LL) and Dave Walden (BBN Technologies), started the first developments in packetized speech, which would eventually lead to Voice over IP technology. In 1973, according to Lincoln Laboratory's informal history, the first realtime 2400 bit/s LPC was implemented by Ed Hofstetter. In 1974 the first realtime two-way LPC packet speech communication was accomplished over the ARPANET at 3500 bit/s between Culler-Harrison and Lincoln Laboratory. In 1976 the first LPC conference took place over the ARPANET using the Network Voice Protocol, between Culler-Harrison, ISI, SRI, and LL at 3500 bit/s. Finally, in 1978, Vishwanath et al. of BBN developed the first variable-rate LPC algorithm.
LPC coefficient representations
LPC is frequently used for transmitting spectral envelope information, and as such it has to be tolerant of transmission errors. Transmission of the filter coefficients directly (see linear prediction for a definition of the coefficients) is undesirable, since they are very sensitive to errors. In other words, a very small error can distort the whole spectrum, or worse, a small error might make the prediction filter unstable.
There are more advanced representations such as log area ratios (LAR), line spectral pairs (LSP) decomposition and reflection coefficients. Of these, LSP decomposition in particular has gained popularity, since it ensures stability of the predictor and spectral errors remain local for small coefficient deviations.
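As a rough illustration of these more robust representations, the sketch below (Python/NumPy, with illustrative function names) recovers reflection coefficients from direct-form LPC coefficients using the step-down (backward Levinson) recursion and maps them to log area ratios. Sign conventions for both quantities vary between texts, so the formula used here is only one common choice.

import numpy as np

def lpc_to_reflection(a):
    # Step-down (backward Levinson) recursion: peel off one order at a time.
    cur = np.asarray(a, dtype=float).copy()
    p = len(cur) - 1
    k = np.zeros(p)
    for i in range(p, 0, -1):
        k[i - 1] = cur[i]
        if abs(k[i - 1]) >= 1.0:
            raise ValueError("unstable predictor: |k| >= 1")
        nxt = (cur[1:i] - k[i - 1] * cur[i - 1:0:-1]) / (1.0 - k[i - 1] ** 2)
        cur = np.concatenate(([1.0], nxt))
    return k

def reflection_to_lar(k):
    # Log area ratios; one common convention is log((1 - k) / (1 + k)).
    k = np.asarray(k, dtype=float)
    return np.log((1.0 - k) / (1.0 + k))

a = np.array([1.0, -0.9, 0.4])               # a stable 2nd-order predictor
k = lpc_to_reflection(a)                     # all |k| < 1 exactly when the filter is stable
print("reflection coefficients:", k)
print("log area ratios:", reflection_to_lar(k))

Quantizing the reflection coefficients or the LARs (or the LSPs, not shown here) keeps small quantization errors from destabilizing the synthesis filter, which is the property the paragraph above alludes to.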
Applications
LPC is generally used for speech analysis and resynthesis. It is used as a form of voice compression by phone companies, for example in the GSM standard. It is also used for secure wireless, where voice must be digitized, encrypted and sent over a narrow voice channel; an early example of this is the US government's Navajo I.
LPC synthesis can be used to construct vocoders where musical instruments are used as an excitation signal for the time-varying filter estimated from a singer's speech. This is somewhat popular in electronic music.
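A minimal sketch of this cross-synthesis idea, assuming NumPy and SciPy are available; lpc(), cross_synthesize() and the frame length are illustrative choices, and a real musical vocoder would add overlap-add windowing, gain matching and pitch handling.

import numpy as np
from scipy.signal import lfilter

def lpc(frame, order):
    # Same autocorrelation/Levinson-Durbin estimate as in the Overview sketch.
    n = len(frame)
    r = np.array([np.dot(frame[:n - lag], frame[lag:]) for lag in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0] + 1e-12
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def cross_synthesize(voice, instrument, fs, order=16, frame_s=0.02):
    # Estimate the singer's filter frame by frame and drive it with the instrument.
    hop = int(frame_s * fs)
    out = np.zeros(len(instrument))
    zi = np.zeros(order)                     # filter state carried across frames
    for start in range(0, min(len(voice), len(instrument)) - hop + 1, hop):
        a = lpc(voice[start:start + hop] * np.hanning(hop), order)
        y, zi = lfilter([1.0], a, instrument[start:start + hop], zi=zi)
        out[start:start + hop] = y           # all-pole filter 1/A(z) applied per frame
    return out

# Toy demo: a 100 Hz buzz stands in for the voice, white noise for the instrument.
fs = 8000
t = np.arange(fs) / fs
voice = np.sign(np.sin(2 * np.pi * 100 * t))
instrument = np.random.default_rng(0).standard_normal(fs)
out = cross_synthesize(voice, instrument, fs)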
Paul Lansky made the well-known computer music piece notjustmoreidlechatter using linear predictive coding (http://www.music.princeton.edu/~paul/liner_notes/morethanidlechatter.html).
A 10th-order LPC was used in the popular 1980s Speak & Spell educational toy.
Waveform ROM in some digital sample-based music synthesizers made by Yamaha Corporation may be compressed using the LPC algorithm.
LPC predictors are used in Shorten, MPEG-4 ALS, FLAC, and other lossless audio codecs.
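In lossless coding the predictor only has to shrink the numbers that are entropy-coded; the residual is transmitted exactly, so decoding is bit-exact. Below is an illustrative Python/NumPy sketch of the fixed polynomial predictors used by Shorten and by FLAC's fixed-prediction mode (FLAC also supports a higher fixed order and general quantized LPC, and both codecs Rice-code the residuals, which is omitted here).

import numpy as np

# Fixed predictors (prediction = sum c_j * x[n-1-j]); orders 0..3 as in Shorten.
FIXED = {0: [], 1: [1], 2: [2, -1], 3: [3, -3, 1]}

def encode_fixed(x, order):
    # Keep the first `order` samples verbatim ("warm-up"), then store residuals.
    x = np.asarray(x, dtype=np.int64)
    c = np.asarray(FIXED[order], dtype=np.int64)
    res = x.copy()
    for n in range(order, len(x)):
        res[n] = x[n] - int(np.dot(c, x[n - order:n][::-1]))
    return res

def decode_fixed(res, order):
    # Exact inverse of encode_fixed: rebuild each sample from past decoded samples.
    c = np.asarray(FIXED[order], dtype=np.int64)
    x = np.asarray(res, dtype=np.int64).copy()
    for n in range(order, len(x)):
        x[n] = res[n] + int(np.dot(c, x[n - order:n][::-1]))
    return x

sig = (1000 * np.sin(np.arange(200) / 10)).astype(np.int64)   # smooth test signal
r = encode_fixed(sig, order=2)
assert np.array_equal(decode_fixed(r, order=2), sig)          # lossless round trip
print("mean |sample|:", np.abs(sig).mean(), "mean |residual|:", np.abs(r).mean())

Real encoders typically try several predictor orders per block and keep whichever yields the smallest coded residual.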
See also
- Warped linear predictive coding
- Akaike information criterion
- Audio compression
- Pitch estimation
- FS-1015 (LPC-10)
- FS-1016 (CELP)
- Linear prediction