Generative Topographic Map
Generative topographic map (GTM) is a machine learning method that is a probabilistic counterpart of the self-organizing map (SOM), is provably convergent and does not require a shrinking neighborhood or a decreasing step size. It is a generative model: the data is assumed to arise by first probabilistically picking a point in a low-dimensional space, mapping the point to the observed high-dimensional input space (via a smooth function), then adding noise in that space. The parameters of the low-dimensional probability distribution, the smooth map and the noise are all learned from the training data using the expectation-maximization (EM) algorithm. GTM was introduced in 1996 in a paper by Christopher M. Bishop, Markus Svensén, and Christopher K. I. Williams.
Details of the algorithm
The approach is strongly related to density networks, which use importance sampling and a multi-layer perceptron to form a non-linear latent variable model. In the GTM the latent space is a discrete grid of points which is assumed to be non-linearly projected into data space. A Gaussian noise assumption is then made in data space, so that the model becomes a constrained mixture of Gaussians whose likelihood can be maximized by EM.
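Written out (a sketch of the standard formulation from the original paper; the symbols below follow the usual GTM notation rather than anything defined in this article), the constrained mixture over a data point t, and the log-likelihood that EM maximizes, are

$$p(\mathbf{t}\mid\mathbf{W},\beta)=\frac{1}{K}\sum_{k=1}^{K}\left(\frac{\beta}{2\pi}\right)^{D/2}\exp\!\left(-\frac{\beta}{2}\,\lVert \mathbf{y}(\mathbf{x}_k;\mathbf{W})-\mathbf{t}\rVert^{2}\right),\qquad \mathcal{L}(\mathbf{W},\beta)=\sum_{n=1}^{N}\ln p(\mathbf{t}_n\mid\mathbf{W},\beta),$$

where the K latent grid points x_k are mapped to mixture centres y(x_k; W) in the D-dimensional data space and β is the precision of the isotropic Gaussian noise.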
In theory, an arbitrary nonlinear parametric deformation could be used, with the optimal parameters found by gradient descent or similar methods.
The suggested approach to the nonlinear mapping is to use a radial basis function (RBF) network to create a nonlinear mapping between the latent space and the data space. The nodes of the RBF network then form a feature space, and the nonlinear mapping can be taken as a linear transform of this feature space. This approach has the advantage over the suggested density network approach that it can be optimised analytically.
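To make this concrete, below is a minimal NumPy sketch of the resulting EM loop: a discrete 2-D latent grid, Gaussian RBF features with a bias term, a linear map W into data space, and alternating responsibility / weighted-least-squares updates. The function name gtm_fit, the default sizes and the small ridge term are illustrative choices for this article, not part of the original formulation.

```python
import numpy as np

def gtm_fit(T, grid=10, n_rbf=4, sigma=1.0, n_iter=50, seed=0):
    """Toy GTM fit to data T of shape (N, D).

    Returns the latent grid X, RBF features Phi, weights W,
    noise precision beta, and responsibilities R (K x N)."""
    rng = np.random.default_rng(seed)
    N, D = T.shape

    # Discrete grid of K latent points in a 2-D latent space.
    g = np.linspace(-1.0, 1.0, grid)
    X = np.array([(a, b) for a in g for b in g])               # (K, 2)

    # Gaussian RBF features of the latent points plus a bias column;
    # the nonlinear map into data space is linear in these features.
    c = np.linspace(-1.0, 1.0, n_rbf)
    C = np.array([(a, b) for a in c for b in c])               # (M, 2)
    Phi = np.exp(-((X[:, None] - C[None, :]) ** 2).sum(-1) / (2.0 * sigma**2))
    Phi = np.hstack([Phi, np.ones((len(X), 1))])               # (K, M + 1)

    W = rng.normal(scale=0.1, size=(Phi.shape[1], D))          # linear map
    beta = 1.0                                                 # noise precision

    for _ in range(n_iter):
        # E-step: responsibility of each latent grid point for each datum.
        Y = Phi @ W                                            # mixture centres (K, D)
        d2 = ((Y[:, None] - T[None, :]) ** 2).sum(-1)          # sq. distances (K, N)
        logR = -0.5 * beta * d2
        R = np.exp(logR - logR.max(axis=0))                    # stabilised softmax
        R /= R.sum(axis=0)

        # M-step: weighted least squares for W, then refit beta.
        G = np.diag(R.sum(axis=1))
        A = Phi.T @ G @ Phi + 1e-6 * np.eye(Phi.shape[1])      # tiny ridge term
        W = np.linalg.solve(A, Phi.T @ R @ T)
        d2 = (((Phi @ W)[:, None] - T[None, :]) ** 2).sum(-1)
        beta = N * D / (R * d2).sum()

    return X, Phi, W, beta, R
```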
Uses
In data analysis, GTMs are like a nonlinear version of principal component analysis, which allows high-dimensional data to be modelled as resulting from Gaussian noise added to sources in a lower-dimensional latent space. For example, GTM can locate stocks in a plottable 2D space based on the shapes of their high-dimensional time series. Other applications may want to have fewer sources than data points, for example mixture models.
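As an illustration of the stock-plotting example, a hypothetical use of the gtm_fit sketch above on stand-in data (random noise in place of real return series) would reduce each high-dimensional series to a plottable 2-D posterior-mean position:

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.normal(size=(100, 30))             # stand-in for 100 series of 30 returns
T = (T - T.mean(axis=0)) / T.std(axis=0)   # centre and scale the features

X, Phi, W, beta, R = gtm_fit(T)            # gtm_fit: the sketch defined above
positions = R.T @ X                        # (100, 2) posterior-mean points to plot
```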
In generative deformational modelling, the latent and data spaces have the same dimensions, for example, 2D images or 1D audio sound waves. Extra 'empty' dimensions are added to the source (known as the 'template' in this form of modelling), for example locating the 1D sound wave in 2D space. Further nonlinear dimensions are then added, produced by combining the original dimensions. The enlarged latent space is then projected back into the 1D data space. The probability of a given projection is, as before, given by the product of the likelihood of the data under the Gaussian noise model with the prior on the deformation parameter. Unlike conventional spring-based deformation modelling, this has the advantage of being analytically optimizable. The disadvantage is that it is a 'data-mining' approach, i.e. the shape of the deformation prior is unlikely to be meaningful as an explanation of the possible deformations, because it is based on a very high-dimensional, artificially and arbitrarily constructed nonlinear latent space. For this reason the prior is learned from data rather than created by a human expert, as is possible for spring-based models.
Comparison with Kohonen's self-organizing maps
While nodes in the self-organizing map (SOM) can wander around at will, GTM nodes are constrained by the allowable transformations and their probabilities. If the deformations are well-behaved, the topology of the latent space is preserved.
The SOM was created as a biological model of neurons and is a heuristic algorithm. By contrast, the GTM has nothing to do with neuroscience or cognition and is a probabilistically principled model. Thus, it has a number of advantages over SOM, namely:
- it explicitly formulates a density model over the data.
- it uses a cost function that quantifies how well the map is trained.
- it uses a sound optimization procedure (EM algorithm).
GTM was introduced by Bishop, Svensén and Williams in their 1996 technical report (Technical Report NCRG/96/015, Aston University, UK), later published in Neural Computation. It was also described in the PhD thesis of Markus Svensén (Aston, 1998).
See also
- Self-organizing map (SOM)
- Artificial neural network
- Connectionism
- Data mining
- Machine learning
- Nonlinear dimensionality reduction
- Neural network software
- Pattern recognition
External links
- Bishop, Svensén and Williams, Generative Topographic Mapping paper
- Generative topographic mapping developed at the Neural Computing Research Group of Aston University (UK) (Matlab toolbox)