Voice analysis
Voice analysis is the study of speech sounds for purposes other than extracting their linguistic content, which is the task of speech recognition. Such studies consist mostly of medical analysis of the voice (phoniatrics), but also include speaker identification. More controversially, some believe that the truthfulness or emotional state of speakers can be determined using Voice Stress Analysis or Layered Voice Analysis.
Typical voice problems
A medical study of the voice can be, for instance, an analysis of the voices of patients who have had a polyp surgically removed from their vocal cords. To evaluate the improvement in voice quality objectively, some measure of voice quality is needed. An experienced voice therapist can evaluate the voice quite reliably, but this requires extensive training and remains subjective.
Another active research topic in medical voice analysis is vocal loading evaluation. The vocal cords of a person speaking for an extended period of time will tire; that is, the process of speaking exerts a load on the vocal cords and fatigues the tissue. Among professional voice users (e.g. teachers and salespeople) this tiring can cause voice failures and sick leave. To evaluate these problems, vocal loading needs to be measured objectively.
Analysis methods
Voice problems that require voice analysis most commonly originate from the vocal folds or the laryngeal musculature that controls them. The folds are subject to collision forces with each vibratory cycle and to drying from the air being forced through the small gap between them, and the laryngeal musculature is intensely active during speech or singing and is subject to tiring. However, dynamic analysis of the vocal folds and their movement is physically difficult. The location of the vocal folds effectively prohibits direct, invasive measurement of movement. Less invasive imaging methods such as X-rays or ultrasound do not work because the vocal cords are surrounded by cartilage, which distorts image quality. Movements of the vocal cords are rapid: fundamental frequencies are usually between 80 and 300 Hz, which prevents the use of ordinary video. Stroboscopic and high-speed video provide an option, but in order to see the vocal folds, a fiberoptic probe leading to the camera has to be positioned in the throat, which makes speaking difficult. In addition, placing objects in the pharynx usually triggers a gag reflex that stops voicing and closes the larynx. Furthermore, stroboscopic imaging is only useful when the vocal fold vibratory pattern is closely periodic.
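The frame-rate argument can be made concrete with a little arithmetic. The sketch below is illustrative only: the 25 frames-per-second figure for "ordinary video" and the 99 Hz flash rate are assumed values, not taken from the text, and the function names are hypothetical. It shows how few frames ordinary video captures per vibration cycle, and how a stroboscope flashing slightly off the fundamental frequency makes a periodic vibration appear slow.

```python
def frames_per_cycle(f0_hz, frame_rate_hz):
    """Number of video frames captured during one vocal fold vibration cycle."""
    return frame_rate_hz / f0_hz


def apparent_frequency(f0_hz, flash_rate_hz):
    """Apparent vibration frequency when a periodic motion is lit by a strobe.

    Sampling a motion of frequency f0 at the flash rate folds f0 down to its
    distance from the nearest integer multiple of the flash rate.
    """
    nearest_multiple = round(f0_hz / flash_rate_hz) * flash_rate_hz
    return abs(f0_hz - nearest_multiple)


# Assumed frame rate for ordinary video (not stated in the text).
ORDINARY_VIDEO_FPS = 25

for f0 in (80, 300):  # the range of fundamental frequencies given above
    print(f0, "Hz:", frames_per_cycle(f0, ORDINARY_VIDEO_FPS), "frames per cycle")

# A 200 Hz vibration lit by a (hypothetical) 99 Hz strobe appears to move at 2 Hz.
print(apparent_frequency(200, 99), "Hz apparent motion")
```

Under these assumptions ordinary video captures well under one frame per cycle across the whole 80 to 300 Hz range, so individual cycles cannot be resolved. The strobe calculation only holds for a vibration that repeats exactly, which is why stroboscopy depends on a closely periodic pattern.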
The most important indirect methods are currently inverse filtering of either microphone or oral airflow recordings and electroglottography (EGG). In inverse filtering, the speech sound (the radiated acoustic pressure waveform, as obtained from a microphone) or the oral airflow waveform from a circumferentially vented (CV) mask is recorded outside the mouth and then filtered by a mathematical method to remove the effects of the vocal tract. This method produces an estimate of the waveform of the glottal airflow pulses, which in turn reflect the movements of the vocal folds. The other noninvasive indirect indication of vocal fold motion is electroglottography, in which electrodes placed on either side of the subject's throat at the level of the vocal folds record the changes in the conductivity of the throat according to how large a portion of the vocal folds is in contact. It thus yields one-dimensional information about the contact area. Neither inverse filtering nor EGG is sufficient to completely describe the complex three-dimensional pattern of vocal fold movement, but both can provide useful indirect evidence of that movement.
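The idea of inverse filtering can be sketched in a few lines of code. The text does not say which mathematical method is used, so this example assumes one common choice, linear prediction (LPC) solved with the Levinson-Durbin recursion. The test signal is synthetic: an impulse train stands in for the glottal pulses, and a single two-pole resonator stands in for the vocal tract. All parameter values and function names are hypothetical. Analysis of real recordings would also have to account for lip radiation and the actual glottal pulse shape, which this sketch ignores.

```python
import math


def autocorrelation(x, max_lag):
    """Return the autocorrelation values r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns [1, a1, ..., a_order]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # remaining prediction error
    return a


def inverse_filter(x, a):
    """Apply the FIR filter A(z) = 1 + a1*z^-1 + ... to remove the resonance."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]


def synthesize(n_samples, period, b1, b2):
    """Impulse train (stand-in for glottal pulses) through a 2-pole resonator."""
    excitation = [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]
    y = []
    for n in range(n_samples):
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y.append(excitation[n] + b1 * y1 + b2 * y2)
    return excitation, y


# Assumed values: 8 kHz sampling, 100 Hz pulse rate, one resonance at 500 Hz.
FS = 8000
radius, theta = 0.9, 2 * math.pi * 500 / FS
b1, b2 = 2 * radius * math.cos(theta), -radius ** 2

excitation, speech = synthesize(800, FS // 100, b1, b2)
a = levinson_durbin(autocorrelation(speech, 2), 2)
estimate = inverse_filter(speech, a)        # approximates the excitation
```

In this idealized case the estimated coefficients should come out close to the negated resonator coefficients, and `estimate` should closely follow the original impulse train. With real speech the recovered waveform is only an estimate of the glottal airflow, as the text notes.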
See also
- Biometrics
- Speech processing
- Audio signal processing
- Digital signal processing
- Stuttering