Spectral modelling synthesis
Spectral modeling synthesis, or simply SMS, is an acoustic modeling approach for speech and other signals. SMS considers sounds as a combination of harmonic content and noise content. Harmonic components are identified based on peaks in the frequency spectrum of the signal, normally as found by the short-time Fourier transform. The signal that remains following removal of the spectral components, sometimes referred to as the residual, is then modeled as white noise passed through a time-varying filter. The output of the model, then, is the set of frequencies and levels of the detected harmonic components, together with the coefficients of the time-varying filter.
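The peak-based identification of harmonic components can be illustrated on a single analysis frame. The following is a minimal sketch, not a reference SMS implementation: the function names (`hann`, `dft_magnitudes`, `pick_peaks`), the sample rate, the frame length, and the 10% threshold are all assumptions chosen for illustration, and a naive DFT stands in for one frame of a short-time Fourier transform so that only the Python standard library is needed.

```python
import cmath
import math
import random


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft_magnitudes(frame):
    """Magnitude spectrum for bins 0..N/2 via a naive O(N^2) DFT."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        mags.append(abs(acc))
    return mags


def pick_peaks(mags, sample_rate, n, threshold):
    """Return (frequency_hz, magnitude) for local maxima above threshold."""
    peaks = []
    for k in range(1, len(mags) - 1):
        if mags[k] > threshold and mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]:
            peaks.append((k * sample_rate / n, mags[k]))
    return peaks


# Hypothetical test frame: two sinusoids plus low-level noise.
SAMPLE_RATE = 8000   # Hz (assumed)
FRAME_LEN = 256      # samples (assumed); bin spacing is 31.25 Hz
rng = random.Random(0)
signal = [
    math.sin(2 * math.pi * 500 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE)
    + rng.uniform(-0.05, 0.05)
    for t in range(FRAME_LEN)
]

window = hann(FRAME_LEN)
frame = [s * w for s, w in zip(signal, window)]
mags = dft_magnitudes(frame)

# Keep only peaks above 10% of the strongest bin (an arbitrary choice).
peaks = pick_peaks(mags, SAMPLE_RATE, FRAME_LEN, 0.1 * max(mags))
for freq, level in peaks:
    print(f"{freq:.1f} Hz, level {level:.2f}")
```

In a full analysis this step would be repeated for every frame, and the detected peaks would be tracked from frame to frame. Practical systems would normally use an FFT and refine each peak's frequency by interpolating between bins.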
Intuitively, the model can be applied to many types of audio signals. Speech signals, for example, include slowly changing harmonic sounds caused by vibration of the vocal cords plus wideband, noise-like sounds caused by the lips and mouth. Musical instruments also produce sounds containing both harmonic components and percussive, noise-like sounds when the notes are struck or changed.
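The noise-like part of such signals corresponds to the model's residual, which is represented as white noise passed through a time-varying filter. The sketch below shows that idea in its simplest form. It assumes a one-pole low-pass filter whose single coefficient changes once per frame. This is a stand-in chosen for brevity, not the filter used by any particular SMS implementation, and `synthesize_residual`, `rms`, and the parameter values are hypothetical.

```python
import math
import random


def synthesize_residual(coeffs_per_frame, frame_len, seed=0):
    """Filter white noise with a one-pole low-pass, one coefficient per frame.

    Each coefficient a (0 <= a < 1) sets y[n] = (1 - a) * x[n] + a * y[n - 1];
    larger values of a give a darker, quieter noise.
    """
    rng = random.Random(seed)
    out = []
    y = 0.0  # filter state carried across frame boundaries
    for a in coeffs_per_frame:
        for _ in range(frame_len):
            x = rng.uniform(-1.0, 1.0)  # white noise sample
            y = (1.0 - a) * x + a * y
            out.append(y)
    return out


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


# Two frames: unfiltered noise, then heavily smoothed noise.
FRAME = 2000
noise = synthesize_residual([0.0, 0.9], FRAME)
print(rms(noise[:FRAME]), rms(noise[FRAME:]))
```

Only the per-frame coefficients need to be stored to regenerate a perceptually similar noise component, which is why the filter coefficients appear in the model's output alongside the sinusoidal frequencies and levels.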
See also
- Speech coding
- CELP (Code Excited Linear Prediction)
- Source-filter model of speech production
- FM synthesis
- SPEAR - Sinusoidal Partial Editing Analysis and Resynthesis