Computer Audition
Computer audition (CA) is the general field of study of algorithms and systems for audio understanding by machine. Since the notion of what it means for a machine to "hear" is very broad and somewhat vague, computer audition attempts to bring together several disciplines that originally dealt with specific problems or had a concrete application in mind.
Inspired by models of human audition, CA deals with questions of representation, transduction, grouping, use of musical knowledge, and general sound semantics for the purpose of performing intelligent operations on audio and music signals by computer. Technically, this requires a combination of methods from the fields of signal processing, auditory modelling, music perception and cognition, pattern recognition, and machine learning, as well as more traditional methods of artificial intelligence for musical knowledge representation.
Applications
Just as computer vision differs from image processing, computer audition differs from audio engineering in that it deals with understanding audio rather than processing it. It also differs from problems of speech understanding by machine, since it deals with general audio signals, such as natural sounds and musical recordings.
Applications of computer audition vary widely and include sound search, genre recognition, acoustic monitoring, music transcription, score following, audio texture, music improvisation, and emotion in audio.
Related disciplines
Computer audition overlaps with the following disciplines:
- Music Information Retrieval: methods for search and analysis of similarity between music signals.
- Auditory Scene Analysis: understanding and description of audio sources and events.
- Machine listening: methods for extracting auditorily meaningful parameters from audio signals.
- Computational musicology and mathematical music theory: use of algorithms that employ musical knowledge for analysis of music data.
- Computer music: use of computers in creative musical applications.
- Machine musicianship: audition-driven interactive music systems.
Areas of study
The study of CA can be roughly divided into the following sub-problems:
- Representation: signal and symbolic. This aspect deals with time-frequency representations, both in terms of notes and spectral models, including pattern playback and audio texture.
- Feature extraction: sound descriptors, segmentation, onset, pitch and envelope detection, chroma, and auditory representations.
- Musical knowledge structures: analysis of tonality, rhythm, and harmonies.
- Sound similarity: methods for comparison between sounds, sound identification, novelty detection, segmentation, and clustering.
- Sequence modeling: matching and alignment between signals and note sequences.
- Source separation: methods of grouping of simultaneous sounds, such as multiple pitch detection and time-frequency clustering methods.
- Auditory cognition: modeling of emotions, anticipation and familiarity, auditory surprise, and analysis of musical structure.
- Multi-modal analysis: finding correspondences between textual, visual, and audio signals.
Representation issues
Computer audition deals with audio signals that can be represented in a variety of ways, from direct encoding of digital audio in two or more channels to symbolically represented synthesis instructions. Audio signals are usually represented as analogue or digital recordings. Digital recordings consist of samples of the acoustic waveform or of parameters of audio compression algorithms. One of the unique properties of musical signals is that they often combine different types of representations, such as graphical scores and sequences of performance actions encoded as MIDI files.
Since audio signals usually comprise multiple sound sources, it is hard to devise a parametric representation for general audio, unlike speech signals, which can be efficiently described in terms of specific models (such as the source-filter model). Parametric audio representations usually use filter banks or sinusoidal models to capture multiple sound parameters, sometimes increasing the representation size in order to capture internal structure in the signal. Additional types of data relevant to computer audition are textual descriptions of audio content, such as annotations and reviews, and visual information in the case of audio-visual recordings.
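As a rough illustration of the filter-bank idea, the sketch below computes a short-time Fourier transform, which can be read as a uniform bank of band-pass channels (one per DFT bin). The source does not name a specific filter bank, so this choice is an assumption, as are the function names, the frame length, the hop size, and the test tone. It uses only the Python standard library and a naive DFT, so it is meant for small inputs rather than real recordings.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency DFT bins of one frame."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def spectrogram(signal, frame_len=64, hop=32):
    """Slide a windowed frame over the signal; each DFT bin is one channel."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        frames.append(dft_magnitudes(frame))
    return frames


# Illustrative input (assumed values): a 1 kHz sine sampled at 8 kHz.
sample_rate = 8000
frame_len = 64
tone = [math.sin(2.0 * math.pi * 1000.0 * t / sample_rate) for t in range(512)]

spec = spectrogram(tone, frame_len=frame_len, hop=32)

# Channel with the most energy in the first frame, converted to hertz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / frame_len
```

Each row of `spec` is one time frame and each column one frequency channel, so a single tone already occupies a frames-by-channels grid; this is the sense in which such representations increase in size to expose internal structure.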
Features
Describing the contents of general audio signals usually requires extracting features that capture specific aspects of the signal. Generally speaking, the features can be divided into signal or mathematical descriptors, such as energy and descriptions of spectral shape; statistical characterizations, such as change or novelty detection; and special representations that are better adapted to the nature of musical signals or the auditory system, such as logarithmic growth of sensitivity (bandwidth) in frequency or octave invariance (chroma).
Since parametric models of audio usually require a large number of parameters, features are used to summarize properties of multiple parameters in a more compact or salient representation.
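The three kinds of descriptors mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: the function names are hypothetical, the spectral centroid stands in for "spectral shape", and the chroma mapping assumes twelve-tone equal temperament with A4 = 440 Hz as the reference pitch. The input spectrum is a made-up list of (frequency, magnitude) pairs.

```python
import math


def energy(frame):
    """Mean squared amplitude of a frame of samples."""
    return sum(x * x for x in frame) / len(frame)


def spectral_centroid(mags, freqs):
    """Magnitude-weighted mean frequency: a one-number summary of spectral shape."""
    total = sum(mags)
    if total <= 0:
        return 0.0
    return sum(m * f for m, f in zip(mags, freqs)) / total


def chroma(mags, freqs, ref_hz=440.0):
    """Fold spectral energy into 12 pitch classes so that octaves share a bin."""
    bins = [0.0] * 12
    for m, f in zip(mags, freqs):
        if f <= 0:
            continue
        # Semitone distance from the reference (assumed A4 = 440 Hz).
        semitones = round(12.0 * math.log2(f / ref_hz))
        # Shift so that index 0 is C and index 9 is A.
        bins[(semitones + 9) % 12] += m * m
    return bins


# Made-up spectrum: three octaves of A plus a weaker C4.
freqs = [220.0, 440.0, 880.0, 261.63]
mags = [1.0, 2.0, 1.0, 0.5]

frame_energy = energy([1.0, -1.0, 1.0, -1.0])
centroid_hz = spectral_centroid(mags, freqs)
pitch_classes = chroma(mags, freqs)
```

Here the three A components at 220, 440, and 880 Hz all land in the same chroma bin, which is the octave invariance the text refers to, and a spectrum of arbitrary length is reduced to twelve numbers.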