CMU Pronouncing Dictionary
The CMU Pronouncing Dictionary (also known as cmudict) is a public domain pronouncing dictionary created by Carnegie Mellon University (CMU). It is used as the American English lexicon for the Festival Speech Synthesis System and also for the CMU Sphinx speech recognition system. The latest release is 0.7a, which contains 133,746 entries (from 123,442 baseforms).

Database Format

The database is distributed as a text file in which each line has the format word pronunciation. If a word has multiple pronunciations, each entry after the first is marked with an index in parentheses following the word. The pronunciation is encoded using a modified form of the Arpabet system. The modification is the addition of stress marks on vowels, with levels 0, 1, and 2; however, not all entries carry stress marks.
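The format described above can be read with a few lines of Python. The following is a minimal sketch, not the parser used by any of the tools named in this article. The sample lines are illustrative rather than copied from the distributed file, the function names parse_cmudict and split_stress are hypothetical, and the skipping of lines that begin with ";;;" assumes that this is how the file marks comments.

```python
import re
from collections import defaultdict

# "WORD" or "WORD(n)", then whitespace, then the space-separated phonemes.
ENTRY_RE = re.compile(r"^(?P<word>\S+?)(?:\((?P<index>\d+)\))?\s+(?P<phones>.+)$")


def parse_cmudict(lines):
    """Map each word to a list of pronunciations (each a list of phoneme strings)."""
    entries = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        # Assumption: blank lines and lines starting with ';;;' carry no entry.
        if not line or line.startswith(";;;"):
            continue
        match = ENTRY_RE.match(line)
        if match is None:
            continue
        # The parenthesised index is dropped; variants are kept in file order.
        entries[match.group("word")].append(match.group("phones").split())
    return dict(entries)


def split_stress(phone):
    """Split 'OW1' into ('OW', 1); a phoneme without a stress digit gives (phone, None)."""
    if phone[-1].isdigit():
        return phone[:-1], int(phone[-1])
    return phone, None


# Illustrative sample in the "word pronunciation" layout (hypothetical data).
SAMPLE = """\
;;; illustrative sample
READ  R EH1 D
READ(1)  R IY1 D
HELLO  HH AH0 L OW1
"""

pronunciations = parse_cmudict(SAMPLE.splitlines())
```

Here pronunciations["READ"] holds both variants in file order, and split_stress separates the Arpabet symbol from its stress level so that entries without stress marks can be handled uniformly.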

History

Version  Release date
0.1      16 September 1993
0.2      10 March 1994
0.3      28 September 1994
0.4      8 November 1995
0.5      No public release
0.6      11 August 1998
0.7a     19 February 2008

Applications

  • The Unifon converter is based on the CMU Pronouncing Dictionary.
  • The Natural Language Toolkit (NLTK) contains an interface to the CMU Pronouncing Dictionary.
  • The Carnegie Mellon Logios tool incorporates the CMU Pronouncing Dictionary.

External links

  • The current version of the dictionary is maintained at SourceForge.
  • Homepage – includes database search
  • RDF: the dictionary converted to Resource Description Framework (RDF) by the open source Texai project.
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.