Articulatory synthesis
Articulatory synthesis refers to computational techniques for synthesizing speech based on models of the human vocal tract and the articulation processes occurring there. The shape of the vocal tract can be controlled in a number of ways, usually by modifying the positions of the speech articulators, such as the tongue, jaw, and lips. Speech is created by digitally simulating the flow of air through the representation of the vocal tract.
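As a rough illustration of what a vocal tract model computes, the simplest possible representation treats the tract as a uniform tube, closed at the glottis and open at the lips, whose resonances fall at odd multiples of c/(4L). The sketch below assumes a tract length of 17.5 cm and a speed of sound of 350 m/s; neither value comes from the text above, and the function name is only illustrative. Real articulatory synthesizers use non-uniform, time-varying tract shapes.

```python
def uniform_tube_formants(length_m=0.175, speed_of_sound=350.0, count=3):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Quarter-wavelength resonator: F_n = (2n - 1) * c / (4 * L).
    The default length and speed of sound are assumed, typical textbook values.
    """
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, count + 1)]


if __name__ == "__main__":
    # Roughly 500, 1500 and 2500 Hz for the assumed defaults.
    print([round(f) for f in uniform_tube_formants()])
```

Shortening the tube raises all resonances proportionally, which is one reason why changing the articulator positions changes the sound.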
Mechanical talking heads
There is a long history of attempts to build mechanical "talking heads" http://www.haskins.yale.edu/featured/heads/heads.html (see also Speech synthesis, section Mechanical devices). Gerbert (d. 1003), Albertus Magnus (1198–1280) and Roger Bacon (1214–1294) are all said to have built speaking heads (Wheatstone 1837). However, historically confirmed speech synthesis begins with Wolfgang von Kempelen (1734–1804), who published an account of his research in 1791 (see also Dudley and Tarnoczy 1950).
Electrical vocal tract analogs
The first electrical vocal tract analogs were static, like those of Dunn (1950), Ken Stevens and colleagues (1953), and Gunnar Fant (1960). Rosen (1958) built a dynamic vocal tract (DAVO), which Dennis (1963) later attempted to control by computer. Dennis et al. (1964), Hiki et al. (1968) and Baxter and Strong (1969) have also described hardware vocal-tract analogs. Kelly and Lochbaum (1962) made the first computer simulation; later digital computer simulations have been made, e.g. by Nakata and Mitsuoka (1965), Matsui (1968) and Paul Mermelstein (1971). Honda et al. (1968) have made an analog computer simulation.
Haskins and Maeda models
The first software articulatory synthesizer regularly used for laboratory experiments was developed at Haskins Laboratories in the mid-1970s by Philip Rubin, Tom Baer, and Paul Mermelstein. This synthesizer, known as ASY http://www.haskins.yale.edu/facilities/asy.html, was a computational model of speech production based on vocal tract models developed at Bell Laboratories in the 1960s and 1970s by Paul Mermelstein, Cecil Coker, and colleagues. Another frequently used model is that of Shinji Maeda, which uses a factor-based approach to control tongue shape.
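The idea of a factor-based approach can be sketched as a linear model: a tongue contour is a mean shape plus a weighted sum of a few factor patterns, with one weight per control parameter. The example below is only a toy illustration under that assumption. The contour points, the two factor patterns, and all names are hypothetical and are not Maeda's published values.

```python
def tongue_contour(mean_shape, factors, weights):
    """Return mean_shape + sum_k weights[k] * factors[k], point by point.

    mean_shape: heights of the mean contour at a few sample points.
    factors:    one loading pattern per control parameter (same length as mean_shape).
    weights:    one control value per factor.
    """
    if len(factors) != len(weights):
        raise ValueError("need exactly one weight per factor")
    contour = list(mean_shape)  # copy so the mean shape is left untouched
    for factor, weight in zip(factors, weights):
        if len(factor) != len(mean_shape):
            raise ValueError("factor length must match the contour length")
        for i, loading in enumerate(factor):
            contour[i] += weight * loading
    return contour


# Hypothetical data, for illustration only (arbitrary units).
MEAN_SHAPE = [1.0, 1.2, 1.5, 1.2, 0.8]
FACTORS = [
    [0.2, 0.2, 0.2, 0.2, 0.2],    # raises or lowers the whole contour
    [-0.3, -0.1, 0.0, 0.1, 0.3],  # shifts the bulk of the tongue front or back
]

if __name__ == "__main__":
    print(tongue_contour(MEAN_SHAPE, FACTORS, [1.0, -0.5]))
```

The appeal of this kind of parameterization is that a handful of weights, rather than many individual contour points, is enough to drive the shape.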
Modern models
Recent progress in speech production imaging, articulatory control modeling, and tongue biomechanics modeling has led to changes in the way articulatory synthesis is performed http://shylock.uab.es/icphs/plenariesandsymposia.htm. Examples include the Haskins CASY model (Configurable Articulatory Synthesis) http://www.haskins.yale.edu/facilities/casy.html, designed by Philip Rubin, Mark Tiede http://www.haskins.yale.edu/staff/tiede.html, and Louis Goldstein http://www.yale.edu/linguist/faculty/louis.html, which matches midsagittal vocal tracts to actual magnetic resonance imaging (MRI) data, and uses MRI data to construct a 3D model of the vocal tract. A full 3D articulatory synthesis model has been described by Olov Engwall. The ArtiSynth project http://www.magic.ubc.ca/artisynth/pmwiki.php, headed by Sidney Fels http://www.ece.ubc.ca/~ssfels/ at the University of British Columbia, is a 3D biomechanical modeling toolkit for the human vocal tract and upper airway. Biomechanical modeling of articulators such as the tongue has been pioneered by a number of scientists, including Reiner Wilhelms-Tricarico http://www.haskins.yale.edu/staff/tricarico.html, Yohan Payan http://www-timc.imag.fr/Yohan.Payan/ and Jean-Michel Gerard http://www-timc.imag.fr/gmcao/en-fiches-projets/modele-langue.htm, and Jianwu Dang and Kiyoshi Honda http://iipl.jaist.ac.jp/dang-lab/en/.
Commercial models
One of the few commercial articulatory speech synthesis systems is the NeXT-based system originally developed and marketed by Trillium Sound Research, a spin-off company of the University of Calgary, where much of the original research was conducted. Following the demise of the various incarnations of NeXT (started by Steve Jobs in the late 1980s and merged with Apple Computer in 1997), the Trillium software was published under the GNU General Public Licence, with work continuing as gnuspeech. The system, first marketed in 1994, provides full articulatory-based text-to-speech conversion using a waveguide or transmission-line analog of the human oral and nasal tracts controlled by Rene Carré's "distinctive region model" http://www.ddl.ish-lyon.cnrs.fr/Annuaires/Index.asp?Action=Edit&Langue=A&Page=Rene%20CARRE.
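To give a feel for what a waveguide analog of the vocal tract does, here is a minimal sketch of a generic Kelly-Lochbaum-style tube model: the tract is a chain of short tube sections, and pressure waves are partly reflected wherever the cross-sectional area changes. This is not the gnuspeech implementation. It has no nasal branch, and the boundary reflection values and function names are assumptions chosen for illustration.

```python
def reflection_coefficients(areas):
    """Pressure-wave reflection coefficient at each junction between sections."""
    return [(a - b) / (a + b) for a, b in zip(areas, areas[1:])]


def impulse_response(areas, n_samples, glottal_reflection=0.75, lip_reflection=-0.85):
    """Pressure radiated at the lips after a unit impulse at the glottis.

    areas[0] is the section next to the glottis, areas[-1] the one at the lips.
    Each section delays a wave by one sample in each direction.
    The two boundary reflection values are assumed, not measured.
    """
    n = len(areas)
    k = reflection_coefficients(areas)
    fwd = [0.0] * n  # waves arriving at the lip-side end of each section
    bwd = [0.0] * n  # waves arriving at the glottis-side end of each section
    output = []
    for t in range(n_samples):
        excitation = 1.0 if t == 0 else 0.0
        new_fwd = [0.0] * n
        new_bwd = [0.0] * n
        # Glottis: inject the excitation and reflect the returning wave.
        new_fwd[0] = excitation + glottal_reflection * bwd[0]
        # Interior junctions: scatter the two incoming waves.
        for i in range(n - 1):
            new_fwd[i + 1] = (1 + k[i]) * fwd[i] - k[i] * bwd[i + 1]
            new_bwd[i] = k[i] * fwd[i] + (1 - k[i]) * bwd[i + 1]
        # Lips: part of the wave is reflected, the rest is radiated.
        new_bwd[n - 1] = lip_reflection * fwd[n - 1]
        output.append((1 + lip_reflection) * fwd[n - 1])
        fwd, bwd = new_fwd, new_bwd
    return output


if __name__ == "__main__":
    # Hypothetical area function (cm^2), narrow at the back and wide at the front.
    print(impulse_response([0.6, 1.0, 2.5, 4.0, 3.0], 20))
```

In a full synthesizer the areas would change over time under the control of an articulatory or region-based model, and the impulse would be replaced by a glottal source signal.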
External links
- ArtiSynth
- ASY
- CASY
- From MRI and Acoustic Data to Articulatory Synthesis
- Praat
- Real-time articulatory speech-synthesis-by-rules
- Smithsonian Speech Synthesis History Project (SSSHP) 1986-2002
- Talking Heads
- TractSyn
- Introduction to Articulatory Speech Synthesis
- Simulated singing with the singing robot Pavarobotti or a description from the BBC on how the robot synthesized the singing.