Kernel methods
Encyclopedia
In computer science
, kernel methods (KMs) are a class of algorithms for pattern analysis, whose best known element
is the support vector machine
(SVM). The general task of pattern analysis is to find and study general types of relations (for example clusters, ranking
s, principal components, correlation
s, classification
s) in general types of data (such as sequences, text documents, sets of points, vectors, images, etc.).
KMs approach the problem by mapping the data into a high dimensional feature space
, where each coordinate corresponds to one feature of the data items, transforming
the data into a set of points in a Euclidean space
. In that space, a variety of methods can be used to find relations in the data. Since the mapping can be quite general (not necessarily linear
, for example), the relations found in this way are accordingly very general. This approach is called the kernel trick
.
KMs owe their name to the use of kernel function
s, which enable them to operate in the feature space without ever computing the coordinates of the data in that space, but rather by simply computing the inner products between the images of all pairs of data in the feature space. This operation is often computationally cheaper than the explicit computation of the coordinates. Kernel functions have been introduced for sequence data, graphs, text, images, as well as vectors.
Algorithms capable of operating with kernels include Support vector machine
(SVM), Gaussian process
es, Fisher's
linear discriminant analysis
(LDA), principal components analysis
(PCA), canonical correlation analysis, ridge regression, spectral clustering, linear adaptive filters
and many others.
Because of the particular culture of the research community that has been developing this approach since the mid-1990s, most kernel algorithms are based on convex optimization or eigenproblems
, are computationally efficient and statistically well-founded. Typically, their statistical properties are analyzed using statistical learning theory
(for example, using Rademacher complexity
).
, kriging
, inverse distance weighting
, bioinformatics
, chemoinformatics, information extraction
, text categorization, and handwriting recognition
.
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...
, kernel methods (KMs) are a class of algorithms for pattern analysis, whose best known element
is the support vector machine
Support vector machine
A support vector machine is a concept in statistics and computer science for a set of related supervised learning methods that analyze data and recognize patterns, used for classification and regression analysis...
(SVM). The general task of pattern analysis is to find and study general types of relations (for example clusters, ranking
Ranking
A ranking is a relationship between a set of items such that, for any two items, the first is either 'ranked higher than', 'ranked lower than' or 'ranked equal to' the second....
s, principal components, correlation
Correlation
In statistics, dependence refers to any statistical relationship between two random variables or two sets of data. Correlation refers to any of a broad class of statistical relationships involving dependence....
s, classification
Categorization
Categorization is the process in which ideas and objects are recognized, differentiated and understood. Categorization implies that objects are grouped into categories, usually for some specific purpose. Ideally, a category illuminates a relationship between the subjects and objects of knowledge...
s) in general types of data (such as sequences, text documents, sets of points, vectors, images, etc.).
KMs approach the problem by mapping the data into a high dimensional feature space
Feature space
In pattern recognition a feature space is an abstract space where each pattern sample is represented as a point in n-dimensional space. Its dimension is determined by the number of features used to describe the patterns...
, where each coordinate corresponds to one feature of the data items, transforming
Data transformation
In metadata and data warehouse, a data transformation converts data from a source data format into destination data.Data transformation can be divided into two steps:...
the data into a set of points in a Euclidean space
Euclidean space
In mathematics, Euclidean space is the Euclidean plane and three-dimensional space of Euclidean geometry, as well as the generalizations of these notions to higher dimensions...
. In that space, a variety of methods can be used to find relations in the data. Since the mapping can be quite general (not necessarily linear
Linear
In mathematics, a linear map or function f is a function which satisfies the following two properties:* Additivity : f = f + f...
, for example), the relations found in this way are accordingly very general. This approach is called the kernel trick
Kernel trick
For machine learning algorithms, the kernel trick is a way of mapping observations from a general set S into an inner product space V , without ever having to compute the mapping explicitly, in the hope that the observations will gain meaningful linear structure in V...
.
KMs owe their name to the use of kernel function
Kernel (statistics)
A kernel is a weighting function used in non-parametric estimation techniques. Kernels are used in kernel density estimation to estimate random variables' density functions, or in kernel regression to estimate the conditional expectation of a random variable. Kernels are also used in time-series,...
s, which enable them to operate in the feature space without ever computing the coordinates of the data in that space, but rather by simply computing the inner products between the images of all pairs of data in the feature space. This operation is often computationally cheaper than the explicit computation of the coordinates. Kernel functions have been introduced for sequence data, graphs, text, images, as well as vectors.
Algorithms capable of operating with kernels include Support vector machine
Support vector machine
A support vector machine is a concept in statistics and computer science for a set of related supervised learning methods that analyze data and recognize patterns, used for classification and regression analysis...
(SVM), Gaussian process
Gaussian process
In probability theory and statistics, a Gaussian process is a stochastic process whose realisations consist of random values associated with every point in a range of times such that each such random variable has a normal distribution...
es, Fisher's
Ronald Fisher
Sir Ronald Aylmer Fisher FRS was an English statistician, evolutionary biologist, eugenicist and geneticist. Among other things, Fisher is well known for his contributions to statistics by creating Fisher's exact test and Fisher's equation...
linear discriminant analysis
Linear discriminant analysis
Linear discriminant analysis and the related Fisher's linear discriminant are methods used in statistics, pattern recognition and machine learning to find a linear combination of features which characterizes or separates two or more classes of objects or events...
(LDA), principal components analysis
Principal components analysis
Principal component analysis is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables called principal components. The number of principal components is less than or equal to...
(PCA), canonical correlation analysis, ridge regression, spectral clustering, linear adaptive filters
Adaptive filter
An adaptive filter is a filter that self-adjusts its transfer function according to an optimization algorithm driven by an error signal. Because of the complexity of the optimization algorithms, most adaptive filters are digital filters. By way of contrast, a non-adaptive filter has a static...
and many others.
Because of the particular culture of the research community that has been developing this approach since the mid-1990s, most kernel algorithms are based on convex optimization or eigenproblems
Eigenvalue, eigenvector and eigenspace
The eigenvectors of a square matrix are the non-zero vectors that, after being multiplied by the matrix, remain parallel to the original vector. For each eigenvector, the corresponding eigenvalue is the factor by which the eigenvector is scaled when multiplied by the matrix...
, are computationally efficient and statistically well-founded. Typically, their statistical properties are analyzed using statistical learning theory
Statistical learning theory
Statistical learning theory is an ambiguous term.#It may refer to computational learning theory, which is a sub-field of theoretical computer science that studies how algorithms can learn from data....
(for example, using Rademacher complexity
Rademacher complexity
In statistics and machine learning, Rademacher complexity, named after Hans Rademacher, measures richness of a class of real-valued functions with respect to a probability distribution....
).
Applications
At the moment, the main application areas are in geostatisticsGeostatistics
Geostatistics is a branch of statistics focusing on spatial or spatiotemporal datasets. Developed originally to predict probability distributions of ore grades for mining operations, it is currently applied in diverse disciplines including petroleum geology, hydrogeology, hydrology, meteorology,...
, kriging
Kriging
Kriging is a group of geostatistical techniques to interpolate the value of a random field at an unobserved location from observations of its value at nearby locations....
, inverse distance weighting
Inverse distance weighting
Inverse distance weighting is a method for multivariate interpolation, a process of assigning values to unknown points by using values from usually scattered set of known points...
, bioinformatics
Bioinformatics
Bioinformatics is the application of computer science and information technology to the field of biology and medicine. Bioinformatics deals with algorithms, databases and information systems, web technologies, artificial intelligence and soft computing, information and computation theory, software...
, chemoinformatics, information extraction
Information extraction
Information extraction is a type of information retrieval whose goal is to automatically extract structured information from unstructured and/or semi-structured machine-readable documents. In most of the cases this activity concerns processing human language texts by means of natural language...
, text categorization, and handwriting recognition
Handwriting recognition
Handwriting recognition is the ability of a computer to receive and interpret intelligible handwritten input from sources such as paper documents, photographs, touch-screens and other devices. The image of the written text may be sensed "off line" from a piece of paper by optical scanning or...
.
External links
- Kernel-Machines Org -- community website
- www.support-vector-machines.org (Literature, Review, Software, Links related to Support Vector Machines - Academic Site)
- onlineprediction.net Kernel Methods Article