Speech processing
Speech processing is the study of speech signals and of the methods used to process them. The signals are usually processed in a digital representation, so speech processing can be regarded as a special case of digital signal processing applied to speech signals.
It is also closely tied to natural language processing (NLP), because its input can come from, and its output can go to, NLP applications. For example, text-to-speech synthesis may use a syntactic parser on its input text, and the output of speech recognition may be used by information extraction techniques.
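As an illustration of what a digital representation involves, the following minimal Python sketch samples a continuous sine tone at discrete time instants and quantizes each sample to an integer level. The function name `sample_and_quantize`, the 440 Hz test tone, the 8000 Hz sampling rate and the 16-bit depth are illustrative assumptions, not values taken from this article.

```python
import math

def sample_and_quantize(freq_hz, sample_rate_hz, duration_s, bits):
    """Sample a sine tone and quantize each sample to a signed integer.

    Hypothetical helper for illustration only; real systems read samples
    from an analog-to-digital converter instead of computing a sine.
    """
    max_level = 2 ** (bits - 1) - 1               # largest positive level, e.g. 32767 for 16 bits
    n_samples = round(sample_rate_hz * duration_s)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                    # time of the n-th sample (discrete time)
        x = math.sin(2 * math.pi * freq_hz * t)   # continuous amplitude in [-1, 1]
        samples.append(round(x * max_level))      # discrete amplitude (quantization)
    return samples

# 10 ms of a 440 Hz tone at an assumed 8000 Hz sampling rate, 16 bits per sample
pcm = sample_and_quantize(freq_hz=440.0, sample_rate_hz=8000, duration_s=0.01, bits=16)
```

The resulting list of integers is the kind of discrete sequence on which digital signal processing methods operate.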
Speech processing can be divided into the following categories:
- Speech recognition, which deals with analysis of the linguistic content of a speech signal.
- Speaker recognition, where the aim is to recognize the identity of the speaker.
- Speech coding, a specialized form of data compression, is important in the telecommunication area.
- Voice analysis for medical purposes, such as analysis of vocal loading and dysfunction of the vocal cords.
- Speech synthesis: the artificial synthesis of speech, which usually means computer-generated speech.
- Speech enhancement: enhancing the intelligibility and/or perceptual quality of a speech signal, such as audio noise reduction for audio signals.
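To make the speech coding category more concrete, the sketch below shows μ-law companding, the logarithmic compression curve that underlies one variant of the ITU-T G.711 telephony codec. It uses the continuous μ-law formula with a plain 8-bit rounding step, which is a simplification: it is not the bit-exact segmented encoding defined by the standard, and the function names are hypothetical.

```python
import math

MU = 255  # mu-law parameter used in the mu-law variant of G.711

def mu_law_compress(x, mu=MU):
    """Map an amplitude x in [-1, 1] to a compressed value in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)

def encode(x, levels=127):
    """Compress, then round to a small signed integer (simplified 8-bit code)."""
    return round(mu_law_compress(x) * levels)

def decode(code, levels=127):
    """Turn the integer code back into an approximate amplitude."""
    return mu_law_expand(code / levels)

# Round trip for a quiet and a loud sample
quiet = decode(encode(0.01))
loud = decode(encode(0.5))
```

Because the curve is logarithmic, quiet samples receive finer quantization steps than loud ones, which suits speech, where low amplitudes are common.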
See also
- Audio signal processing
- Linguistics
- Phonetics
- Speech impediment
- Speech signal processing
- Speech interface guideline
- Packet loss concealment
- Utterance
External links
- Language Technologies Institute at CMU
- Language Technologies Research Center at IIIT Hyderabad, India
- Center for Language and Speech Processing at JHU
- Center for Research in Urdu Language Processing at FAST-NU
- Spoken Language Processing Group at Columbia University, New York
- Speech Processing Group
- Spoken Language Processing Group at LIMSI
- Speech Processing Group at the Laboratory of Applied Physics
- Speech and Hearing Group at University of Sheffield
- Speech and Language Processing
- Speech Processing
- Speech Processing Discussion Group
- Philips Speech Processing
- Philips Speech Recognition Systems
- Idiap Research Institute Speech Processing
- Nuance Communications
- Crescendo Speech Processing for Healthcare
- Vocapia Research