Multimedia Information Retrieval
Multimedia Information Retrieval (MMIR) is a research discipline of computer science that aims at extracting semantic information from multimedia data sources. Data sources include directly perceivable media such as audio, image and video, indirectly perceivable sources such as text and biosignals, and non-perceivable sources such as bioinformation and stock prices. The methodology of MMIR can be organized in three groups:
- Methods for the summarization of media content (feature extraction). The result of feature extraction is a description.
- Methods for the filtering of media descriptions (for example, elimination of redundancy).
- Methods for the categorization of media descriptions into classes.
Feature Extraction Methods
Feature extraction is motivated by the sheer size of multimedia objects as well as their redundancy and, possibly, noisiness. Generally, two possible goals can be achieved by feature extraction:
- Summarization of media content. In the audio domain, methods for summarization include, for example, Mel Frequency Cepstral Coefficients, Zero-Crossing Rate and Short-Time Energy. In the visual domain, color histograms such as the MPEG-7 Scalable Color Descriptor can be used for summarization.
- Detection of patterns by auto-correlation and/or cross-correlation. Patterns are recurring media chunks that can be detected either by comparing chunks over the media dimensions (time, space, etc.) or by comparing media chunks to templates (e.g. face templates, phrases). Typical methods include Linear Predictive Coding in the audio/biosignal domain, texture description in the visual domain and n-grams in text information retrieval.
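Both goals can be illustrated on a one-dimensional audio-like signal. The following is a minimal sketch in plain Python, not a reference implementation: the function names, the frame size, the hop size and the synthetic sine signal are all assumptions chosen for illustration. It computes the Zero-Crossing Rate and Short-Time Energy per frame (summarization) and a normalized auto-correlation at a given lag (pattern detection).

```python
import math


def frames(signal, size, hop):
    """Split a sample sequence into overlapping frames of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def autocorrelation(signal, lag):
    """Normalized auto-correlation of the mean-centered signal at `lag`."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    denom = sum(x * x for x in centered)
    if denom == 0:
        return 0.0
    return sum(centered[i] * centered[i + lag] for i in range(n - lag)) / denom


# Hypothetical test signal: a sine wave with a period of 20 samples.
signal = [math.sin(2 * math.pi * t / 20) for t in range(200)]

# Summarization: one (ZCR, energy) pair per frame forms the description.
description = [(zero_crossing_rate(f), short_time_energy(f))
               for f in frames(signal, size=40, hop=20)]

# Pattern detection: a recurring chunk shows up as a high auto-correlation
# at the lag that matches its period.
period_score = autocorrelation(signal, 20)
```

For this periodic signal the auto-correlation is strongly positive at the period (lag 20) and strongly negative at half the period (lag 10), which is the kind of regularity a pattern detector looks for.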
Merging and Filtering Methods
Multimedia Information Retrieval implies that multiple channels are employed for the understanding of media content. Each of these channels is described by media-specific feature transformations. The resulting descriptions have to be merged into one description per media object. Merging can be performed by simple concatenation if the descriptions are of fixed size. Variable-sized descriptions, as they frequently occur in motion description, have to be normalized to a fixed length first.
Frequently used methods for description filtering include factor analysis (e.g. by PCA), singular value decomposition (e.g. as latent semantic indexing in text retrieval) and the extraction and testing of statistical moments. Advanced concepts such as the Kalman filter are used for merging of descriptions.
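The merging step can be sketched as follows, assuming that descriptions are plain lists of numbers. Linear interpolation is only one possible way to normalize a variable-sized description to a fixed length. The function names and the example values are hypothetical.

```python
def resample(description, length):
    """Linearly interpolate a variable-length description to `length` values."""
    n = len(description)
    if n == 0:
        raise ValueError("cannot resample an empty description")
    if n == 1 or length == 1:
        return [float(description[0])] * length
    out = []
    for k in range(length):
        pos = k * (n - 1) / (length - 1)   # fractional index into the source
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(description[lo] * (1 - frac) + description[hi] * frac)
    return out


def merge(fixed_parts, variable_parts, length):
    """Concatenate fixed-size parts with length-normalized variable parts."""
    merged = []
    for part in fixed_parts:
        merged.extend(part)
    for part in variable_parts:
        merged.extend(resample(part, length))
    return merged


# Hypothetical descriptions of one media object.
color_histogram = [0.2, 0.5, 0.3]            # fixed size
motion_track = [0.0, 1.0, 2.0, 3.0, 4.0]     # variable size

merged = merge([color_histogram], [motion_track], length=3)
# merged == [0.2, 0.5, 0.3, 0.0, 2.0, 4.0]
```

After this step every media object has a description of the same size, so filtering methods such as PCA can be applied to the whole collection.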
Categorization Methods
Generally, all forms of machine learning can be employed for the categorization of multimedia descriptions, though some methods are more frequently used in one area than another. For example, Hidden Markov models are state-of-the-art in speech recognition, while Dynamic Time Warping, a semantically related method, is state-of-the-art in gene sequence alignment. The list of applicable classifiers includes the following:
- Metric approaches (Cluster Analysis, Vector Space Model, Minkowski Distances, Dynamic Alignment)
- Nearest Neighbor methods (K-Nearest Neighbor, K-Means, Self-Organizing Map)
- Risk Minimization (Support Vector Regression, Support Vector Machine, Linear Discriminant Analysis)
- Density-based Methods (Bayes Nets, Markov Processes, Mixture Models)
- Neural Networks (Perceptron, Associative Memories, Spiking Nets)
- Heuristics (Decision Trees, Random Forests, etc.)
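Two of the metric approaches named above, Minkowski distances and dynamic alignment, can be combined with a nearest-neighbor rule in a few lines. The sketch below is illustrative only: it uses the textbook Dynamic Time Warping recurrence with an absolute-difference local cost, and the function names, labels and example sequences are assumptions.

```python
def minkowski(x, y, p=2):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(u - v) ** p for u, v in zip(x, y)) ** (1.0 / p)


def dtw_distance(a, b):
    """Dynamic Time Warping cost between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor paths.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


def nearest_neighbor(query, labeled, distance):
    """Return the label of the labeled description closest to `query`."""
    return min(labeled, key=lambda item: distance(query, item[0]))[1]


# Hypothetical labeled descriptions of different lengths.
labeled = [([1, 2, 3, 2, 1], "peak"), ([3, 2, 1, 2, 3], "valley")]
query = [1, 1, 2, 3, 3, 2, 1]   # a slower version of the "peak" shape

label = nearest_neighbor(query, labeled, dtw_distance)
```

Dynamic Time Warping tolerates the different lengths of the query and the stored descriptions, whereas the Minkowski distance requires fixed-size vectors, which is one reason descriptions are normalized before metric classifiers are applied.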
The selection of the best classifier for a given problem (test set with descriptions and class labels, so-called ground truth) can be performed automatically, for example, using the Weka Data Miner.
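The idea behind automatic classifier selection can be sketched without any particular toolkit. The example below does not use Weka or its API. It is a plain Python illustration that scores a few hypothetical candidate classifiers by k-fold cross-validation on a toy ground truth and keeps the best one. All names and data values are assumptions.

```python
from collections import Counter


def knn_classifier(k):
    """Build a k-nearest-neighbor predictor using squared Euclidean distance."""
    def predict(train, query):
        nearest = sorted(
            train,
            key=lambda item: sum((u - v) ** 2 for u, v in zip(item[0], query)),
        )[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict


def majority_classifier(train, query):
    """Baseline that ignores the query and predicts the most common label."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def cross_validated_accuracy(predict, data, folds):
    """Average accuracy over `folds` interleaved train/test splits."""
    correct = 0
    for f in range(folds):
        test = data[f::folds]
        train = [item for i, item in enumerate(data) if i % folds != f]
        correct += sum(1 for x, y in test if predict(train, x) == y)
    return correct / len(data)


def select_best(candidates, data, folds=3):
    """Score every candidate and return the best name with all scores."""
    scores = {name: cross_validated_accuracy(predict, data, folds)
              for name, predict in candidates.items()}
    return max(scores, key=scores.get), scores


# Toy ground truth: descriptions with class labels.
ground_truth = [
    ([0.0, 0.1], "a"), ([0.2, 0.0], "a"), ([0.1, 0.3], "a"),
    ([5.0, 5.1], "b"), ([5.2, 4.9], "b"), ([4.8, 5.0], "b"),
]

candidates = {
    "1-NN": knn_classifier(1),
    "3-NN": knn_classifier(3),
    "majority baseline": majority_classifier,
}

best, scores = select_best(candidates, ground_truth)
```

On this well-separated toy data the nearest-neighbor candidates classify every held-out description correctly, while the baseline gets only half of them right. On real ground truth the ranking depends on the data, which is why the selection is worth automating.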
Open Problems
The quality of MMIR systems depends heavily on the quality of the training data. Discriminative descriptions can be extracted from media sources in various forms, and machine learning provides categorization methods for all types of data. However, a classifier can only be as good as the training data it is given, and providing class labels for large databases requires considerable effort. The future success of MMIR will depend on the provision of such data. The annual TRECVID competition is currently one of the most relevant sources of high-quality ground truth.
Related Areas
MMIR provides an overview of methods employed in the areas of information retrieval. Methods of one area are adapted and employed on other types of media. Multimedia content is merged before the classification is performed. MMIR methods are, therefore, usually reused from other areas such as:
- Bioinformation Analysis
- Biosignal Processing
- Content-based Image and Video Retrieval
- Face Recognition
- Audio and Music Classification
- Speech Recognition
- Technical Chart Analysis
- Text Information Retrieval
The new Journal of Multimedia Information Retrieval should help the development of MMIR as a research discipline independent of these areas.