Types of artificial neural networks
There are many types of artificial neural networks (ANNs). An artificial neural network is a computational simulation of a biological neural network. These models mimic the real-life behaviour of neurons and the electrical messages they produce between input (such as from the eyes or nerve endings in the hand), processing by the brain, and the final output from the brain (such as reacting to light or to sensing touch or heat). Other ANNs are adaptive systems used to model things such as environments and populations. The systems can be implemented as purpose-built hardware and software, or purely in software running as computer models.
Feedforward neural network
The feedforward neural network was the first and arguably the simplest type of artificial neural network devised. In this network the information moves in only one direction, forwards: from the input nodes, data goes through the hidden nodes (if any) and on to the output nodes. There are no cycles or loops in the network. Feedforward networks can be constructed from different types of units, e.g. binary McCulloch-Pitts neurons, the simplest example being the perceptron. Continuous neurons, frequently with sigmoidal activation, are used in the context of backpropagation of error.
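As an illustration of the simplest feedforward case, the following is a minimal sketch (not from the original article) of a single perceptron trained with the classic perceptron learning rule on a toy linearly separable problem; the data, learning rate, and epoch count are invented for the example.

```python
import numpy as np

# Toy linearly separable data: logical AND of two binary inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])  # targets

w = np.zeros(2)   # weights
b = 0.0           # bias
lr = 0.1          # learning rate (arbitrary choice for this sketch)

for epoch in range(20):
    for xi, target in zip(X, y):
        # McCulloch-Pitts style threshold unit: fire if the weighted sum exceeds 0.
        prediction = 1 if xi @ w + b > 0 else 0
        # Perceptron rule: adjust weights only when the prediction is wrong.
        error = target - prediction
        w += lr * error * xi
        b += lr * error

print([1 if xi @ w + b > 0 else 0 for xi in X])  # expected: [0, 0, 0, 1]
```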
Radial basis function (RBF) network
Radial basis functions are powerful techniques for interpolation in multidimensional space. An RBF is a function with a built-in distance criterion with respect to a centre. Radial basis functions have been applied in the area of neural networks, where they may be used as a replacement for the sigmoidal hidden-layer transfer characteristic in multi-layer perceptrons. RBF networks have two layers of processing: in the first, the input is mapped onto each RBF in the 'hidden' layer. The RBF chosen is usually a Gaussian. In regression problems the output layer is then a linear combination of hidden-layer values representing the mean predicted output. The interpretation of this output-layer value is the same as a regression model in statistics. In classification problems the output layer is typically a sigmoid function of a linear combination of hidden-layer values, representing a posterior probability. Performance in both cases is often improved by shrinkage techniques, known as ridge regression in classical statistics and known to correspond to a prior belief in small parameter values (and therefore smooth output functions) in a Bayesian framework.
RBF networks have the advantage of not suffering from local minima in the same way as multi-layer perceptrons. This is because the only parameters that are adjusted in the learning process are the linear mapping from hidden layer to output layer. Linearity ensures that the error surface is quadratic and therefore has a single easily found minimum. In regression problems this can be found in one matrix operation. In classification problems the fixed non-linearity introduced by the sigmoid output function is most efficiently dealt with using iteratively re-weighted least squares.
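A minimal sketch of the regression case just described, assuming Gaussian basis functions with one centre per training point and a fixed width; the ridge penalty stands in for the shrinkage mentioned above, and the toy data are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression task (made up for this sketch).
X = np.linspace(0, 1, 40)[:, None]
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(40)

centres = X.copy()          # one RBF centre per data point, as in the text
width = 0.1                 # shared Gaussian width (an assumption)
ridge = 1e-3                # shrinkage / ridge penalty

def design(X):
    # Hidden layer: Gaussian response of every input to every centre.
    d2 = (X - centres[:, 0][None, :]) ** 2
    return np.exp(-d2 / (2 * width ** 2))

Phi = design(X)
# The linear output weights are found in one (regularized) matrix operation.
W = np.linalg.solve(Phi.T @ Phi + ridge * np.eye(Phi.shape[1]), Phi.T @ y)

X_test = np.array([[0.25], [0.5]])
print(design(X_test) @ W)   # approximately 1.0 and 0.0
```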
RBF networks have the disadvantage of requiring good coverage of the input space by radial basis functions. RBF centres are determined with reference to the distribution of the input data, but without reference to the prediction task. As a result, representational resources may be wasted on areas of the input space that are irrelevant to the learning task. A common solution is to associate each data point with its own centre, although this can make the linear system to be solved in the final layer rather large, and requires shrinkage techniques to avoid overfitting.
Associating each input datum with an RBF leads naturally to kernel methods such as support vector machines and Gaussian processes (the RBF is the kernel function). All three approaches use a non-linear kernel function to project the input data into a space where the learning problem can be solved using a linear model. Like Gaussian processes, and unlike SVMs, RBF networks are typically trained in a maximum-likelihood framework by maximizing the probability (minimizing the error) of the data under the model. SVMs avoid overfitting differently, by maximizing a margin instead. RBF networks are outperformed in most classification applications by SVMs. In regression applications they can be competitive when the dimensionality of the input space is relatively small.
Kohonen self-organizing network
The self-organizing map (SOM), invented by Teuvo Kohonen, performs a form of unsupervised learning. A set of artificial neurons learns to map points in an input space to coordinates in an output space. The input space can have different dimensions and topology from the output space, and the SOM will attempt to preserve these.
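A minimal sketch of the SOM idea, assuming a small 1-D map, a Gaussian neighbourhood, and made-up 2-D inputs; the decay schedules are simplified and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

data = rng.random((200, 2))            # 2-D inputs (toy data for this sketch)
n_nodes = 10                           # 1-D map of 10 nodes
weights = rng.random((n_nodes, 2))     # each node has a weight vector in input space
grid = np.arange(n_nodes)              # node coordinates on the map

for t in range(1000):
    lr = 0.5 * (1 - t / 1000)              # decaying learning rate
    sigma = 3.0 * (1 - t / 1000) + 0.5     # decaying neighbourhood radius
    x = data[rng.integers(len(data))]
    # Best matching unit: the node whose weight vector is closest to the input.
    bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
    # Move the BMU and its map neighbours toward the input.
    h = np.exp(-((grid - bmu) ** 2) / (2 * sigma ** 2))
    weights += lr * h[:, None] * (x - weights)

print(np.round(weights, 2))  # nodes end up spread across the input distribution
```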
Learning Vector Quantization
Learning Vector Quantization (LVQ) can also be interpreted as a neural network architecture. It was originally suggested by Teuvo Kohonen.
In LVQ, prototypical representatives of the classes parameterize, together with an appropriate distance measure, a distance-based classification scheme.
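A minimal LVQ1-style sketch (one of several LVQ variants) of the prototype-plus-distance idea described above, with made-up 2-D data and one prototype per class.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two made-up Gaussian classes in 2-D.
X = np.vstack([rng.normal([0, 0], 0.5, (50, 2)), rng.normal([3, 3], 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

prototypes = np.array([[0.5, 0.5], [2.5, 2.5]])  # one prototype per class
labels = np.array([0, 1])
lr = 0.05

for _ in range(200):
    i = rng.integers(len(X))
    x, target = X[i], y[i]
    # The nearest prototype under Euclidean distance decides the class.
    j = np.argmin(np.linalg.norm(prototypes - x, axis=1))
    # LVQ1 rule: attract the prototype if it is correct, repel it if it is wrong.
    sign = 1.0 if labels[j] == target else -1.0
    prototypes[j] += sign * lr * (x - prototypes[j])

test = np.array([[0.2, -0.1], [3.1, 2.8]])
pred = labels[np.argmin(np.linalg.norm(test[:, None] - prototypes[None], axis=2), axis=1)]
print(pred)  # expected: [0 1]
```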
Recurrent neural network
Contrary to feedforward networks, recurrent neural networks (RNNs) are models with bi-directional data flow. While a feedforward network propagates data linearly from input to output, RNNs also propagate data from later processing stages to earlier stages. RNNs can be used as general sequence processors.
Fully recurrent network
This is the basic architecture developed in the 1980s: a network of neuron-like units, each with a directed connection to every other unit. Each unit has a time-varying real-valued activation. Each connection has a modifiable real-valued weight. Some of the nodes are called input nodes, some output nodes, the rest hidden nodes. Most architectures below are special cases. For supervised learning in discrete time settings, training sequences of real-valued input vectors become sequences of activations of the input nodes, one input vector at a time. At any given time step, each non-input unit computes its current activation as a nonlinear function of the weighted sum of the activations of all units from which it receives connections. There may be teacher-given target activations for some of the output units at certain time steps. For example, if the input sequence is a speech signal corresponding to a spoken digit, the final target output at the end of the sequence may be a label classifying the digit. For each sequence, its error is the sum of the deviations of all target signals from the corresponding activations computed by the network. For a training set of numerous sequences, the total error is the sum of the errors of all individual sequences.
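A minimal sketch of the discrete-time forward computation just described, assuming tanh units, random weights, and a made-up input sequence with a teacher-given target only at the final step.

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hidden, n_out = 3, 5, 2
W_in = rng.normal(0, 0.5, (n_hidden, n_in))       # input -> hidden
W_rec = rng.normal(0, 0.5, (n_hidden, n_hidden))  # hidden -> hidden (recurrent)
W_out = rng.normal(0, 0.5, (n_out, n_hidden))     # hidden -> output

sequence = rng.random((6, n_in))   # 6 time steps of real-valued input vectors
target = np.array([1.0, 0.0])      # target at the final time step only

h = np.zeros(n_hidden)
for x in sequence:
    # Each non-input unit: a nonlinear function of the weighted sum of incoming activations.
    h = np.tanh(W_in @ x + W_rec @ h)
out = W_out @ h

# Error for this sequence: deviation of the final output from its target.
error = 0.5 * np.sum((out - target) ** 2)
print(out, error)
```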
To minimize total error, gradient descent can be used to change each weight in proportion to its derivative with respect to the error, provided the non-linear activation functions are differentiable. Various methods for doing so were developed in the 1980s and early 1990s by Paul Werbos, Ronald J. Williams, Tony Robinson, Jürgen Schmidhuber, Barak Pearlmutter, and others. The standard method is called "backpropagation through time" or BPTT, a generalization of back-propagation for feedforward networks. A more computationally expensive online variant is called "Real-Time Recurrent Learning" or RTRL. Unlike BPTT, this algorithm is local in time but not local in space. There is also an online hybrid between BPTT and RTRL with intermediate complexity, and there are variants for continuous time.
A major problem with gradient descent for standard RNN architectures is that error gradients vanish exponentially quickly with the size of the time lag between important events, as first realized by Sepp Hochreiter in 1991. The long short-term memory (LSTM) architecture overcomes these problems.
In reinforcement learning settings, there is no teacher providing target signals for the RNN; instead, a fitness function, reward function, or utility function is occasionally used to evaluate the performance of the RNN, which influences its input stream through output units connected to actuators that affect the environment. Variants of evolutionary computation are often used to optimize the weight matrix.
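A minimal sketch of this setting: no target signals, only a fitness function scoring the RNN's behaviour, with a simple (1+1) evolutionary hill-climber standing in for the evolutionary computation mentioned above. The task, network size, and all parameters are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden = 4

def run(weights, steps=20):
    """Roll out a tiny RNN whose single output should stay close to 0.5."""
    W_rec, W_out = weights
    h = np.zeros(n_hidden)
    outputs = []
    for _ in range(steps):
        h = np.tanh(W_rec @ h + 0.1)       # constant drive as a stand-in input stream
        outputs.append(np.tanh(W_out @ h))
    return np.array(outputs)

def fitness(weights):
    # Reward outputs near 0.5 (an arbitrary, made-up objective).
    return -np.mean((run(weights) - 0.5) ** 2)

best = (rng.normal(0, 0.5, (n_hidden, n_hidden)), rng.normal(0, 0.5, n_hidden))
best_fit = fitness(best)

for _ in range(500):
    # Mutate the weight matrices and keep the child only if fitness improves.
    child = tuple(w + 0.05 * rng.standard_normal(w.shape) for w in best)
    f = fitness(child)
    if f > best_fit:
        best, best_fit = child, f

print(best_fit)  # approaches 0 as the outputs approach 0.5
```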
Hopfield network
The Hopfield network (like similar attractor-based networks) is of historic interest although it is not a general RNN, as it is not designed to process sequences of patterns. Instead it requires stationary inputs. It is an RNN in which all connections are symmetric. Invented by John Hopfield in 1982, it guarantees that its dynamics will converge. If the connections are trained using Hebbian learning, then the Hopfield network can perform as a robust content-addressable memory, resistant to connection alteration.
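A minimal sketch of a binary Hopfield network with Hebbian weight setting and asynchronous updates, recalling a stored pattern from a corrupted cue; the stored patterns are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two made-up bipolar (+1/-1) patterns to store.
patterns = np.array([[1, -1, 1, -1, 1, -1, 1, -1],
                     [1, 1, 1, 1, -1, -1, -1, -1]])
n = patterns.shape[1]

# Hebbian learning: symmetric weights, no self-connections.
W = sum(np.outer(p, p) for p in patterns) / n
np.fill_diagonal(W, 0)

# Corrupt the first pattern and let the dynamics settle.
state = patterns[0].copy()
state[:2] *= -1   # flip two bits

for _ in range(5 * n):
    i = rng.integers(n)                      # asynchronous update of one random unit
    state[i] = 1 if W[i] @ state >= 0 else -1

print(state, np.array_equal(state, patterns[0]))  # recalls the stored pattern
```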
Boltzmann machine
The Boltzmann machine can be thought of as a noisy Hopfield network. Invented by Geoff Hinton and Terry Sejnowski in 1985, the Boltzmann machine is important because it is one of the first neural networks to demonstrate learning of latent variables (hidden units). Boltzmann machine learning was at first slow to simulate, but the contrastive divergence algorithm of Geoff Hinton (circa 2000) allows models such as Boltzmann machines and Products of Experts to be trained much faster.
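The contrastive divergence idea is easiest to show on a restricted Boltzmann machine (a bipartite special case of the Boltzmann machine); the following is a minimal CD-1 sketch with binary units and made-up data, intended only to illustrate the positive/negative-phase update, not the procedure of any specific paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_visible, n_hidden = 6, 3
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
b_v = np.zeros(n_visible)   # visible biases
b_h = np.zeros(n_hidden)    # hidden biases
lr = 0.1

# Toy binary training data (two repeated patterns, made up for the sketch).
data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]] * 10, dtype=float)

for epoch in range(200):
    for v0 in data:
        # Positive phase: sample hidden units given the data.
        p_h0 = sigmoid(v0 @ W + b_h)
        h0 = (rng.random(n_hidden) < p_h0).astype(float)
        # Negative phase (one Gibbs step): reconstruct visibles, then hiddens.
        p_v1 = sigmoid(h0 @ W.T + b_v)
        v1 = (rng.random(n_visible) < p_v1).astype(float)
        p_h1 = sigmoid(v1 @ W + b_h)
        # CD-1 update: difference of data-driven and reconstruction-driven statistics.
        W += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
        b_v += lr * (v0 - v1)
        b_h += lr * (p_h0 - p_h1)

print(np.round(sigmoid(data[0] @ W + b_h), 2))  # hidden code for the first pattern
```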
Simple recurrent networks
This special case of the basic architecture above was employed by Jeff Elman and Michael I. Jordan. A three-layer network is used, with the addition of a set of "context units" in the input layer. There are connections from the hidden layer (Elman) or from the output layer (Jordan) to these context units, fixed with a weight of one. At each time step, the input is propagated in a standard feedforward fashion, and then a simple backprop-like learning rule is applied (this rule is not performing proper gradient descent, however). The fixed back connections result in the context units always maintaining a copy of the previous values of the hidden units (since they propagate over the connections before the learning rule is applied).
Echo state network
The echo state network (ESN) is a recurrent neural network with a sparsely connected random hidden layer. The weights of the output neurons are the only part of the network that can change and be trained. ESNs are good at reproducing certain time series. A variant for spiking neurons is known as liquid state machines.
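A minimal sketch of the ESN recipe as just described: a fixed, sparse, random reservoir and a trained linear readout (here via ridge regression), on a made-up next-value prediction task for a sine wave; sizes and constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_res = 100
# Sparse random reservoir, rescaled to a spectral radius below 1.
W = rng.normal(0, 1, (n_res, n_res)) * (rng.random((n_res, n_res)) < 0.1)
W *= 0.9 / max(abs(np.linalg.eigvals(W)))
W_in = rng.uniform(-0.5, 0.5, n_res)

# Task: predict the next value of a sine wave (toy time series).
u = np.sin(np.linspace(0, 20 * np.pi, 1000))
states = np.zeros((len(u), n_res))
x = np.zeros(n_res)
for t in range(len(u) - 1):
    x = np.tanh(W @ x + W_in * u[t])    # reservoir update; these weights stay fixed
    states[t] = x

# Only the readout is trained: ridge regression from states to the next input value.
X, Y = states[200:-1], u[201:]           # discard a warm-up period
W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(n_res), X.T @ Y)

print(float(states[500] @ W_out), float(u[501]))  # prediction vs. true next value
```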
Long short term memory network
Long short-term memory (LSTM), developed by Hochreiter and Schmidhuber in 1997, is an artificial neural network structure that, unlike traditional RNNs, does not suffer from the vanishing gradient problem. It works even when there are long delays, and it can handle signals that have a mix of low- and high-frequency components. LSTM RNNs have outperformed other RNNs and other sequence learning methods such as hidden Markov models (HMMs) in numerous applications such as language learning and connected handwriting recognition.
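A minimal sketch of a single LSTM cell's forward step in the common gated formulation (input, forget, and output gates), with random weights and a made-up input sequence. It illustrates how the cell state can carry information across long delays; it is not the original 1997 formulation in detail.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_in, n_hidden = 3, 4
# One weight matrix per gate plus the candidate update, acting on [input, previous hidden].
Wi, Wf, Wo, Wg = (rng.normal(0, 0.5, (n_hidden, n_in + n_hidden)) for _ in range(4))

h = np.zeros(n_hidden)   # hidden state
c = np.zeros(n_hidden)   # cell state (the memory that mitigates vanishing gradients)

for x in rng.random((10, n_in)):          # a made-up input sequence
    z = np.concatenate([x, h])
    i = sigmoid(Wi @ z)                   # input gate: how much new content to write
    f = sigmoid(Wf @ z)                   # forget gate: how much old content to keep
    o = sigmoid(Wo @ z)                   # output gate: how much of the cell to expose
    g = np.tanh(Wg @ z)                   # candidate content
    c = f * c + i * g                     # cell state update
    h = o * np.tanh(c)                    # hidden state / output

print(np.round(h, 3))
```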
Bi-directional RNN
Invented by Schuster and Paliwal in 1997, bi-directional RNNs, or BRNNs, use a finite sequence to predict or label each element of the sequence based on both the past and the future context of the element. This is done by adding the outputs of two RNNs: one processing the sequence from left to right, the other from right to left. The combined outputs are the predictions of the teacher-given target signals. This technique has proved especially useful when combined with LSTM RNNs.
Hierarchical RNN
There are many instances of hierarchical RNNs whose elements are connected in various ways to decompose hierarchical behavior into useful subprograms.
Stochastic neural networks
A stochastic neural network differs from a typical neural network because it introduces random variations into the network. In a probabilistic view of neural networks, such random variations can be viewed as a form of statistical sampling, such as Monte Carlo sampling.
Modular neural networks
Biological studies have shown that the human brain functions not as a single massive network, but as a collection of small networks. This realization gave birth to the concept of modular neural networks, in which several small networks cooperate or compete to solve problems.
Committee of machines
A committee of machines (CoM) is a collection of different neural networks that together "vote" on a given example. This generally gives a much better result than other neural network models. Because neural networks suffer from local minima, starting with the same architecture and training data but using different initial random weights often gives vastly different networks. A CoM tends to stabilize the result.
The CoM is similar to the general machine learning bagging method, except that the necessary variety of machines in the committee is obtained by training from different random starting weights rather than training on different randomly selected subsets of the training data.
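A minimal sketch of the committee idea, using scikit-learn's MLPClassifier trained from different random initial weights on the same data and combined by majority vote. The dataset, committee size, and the use of scikit-learn are assumptions made for the example, not something the article prescribes.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Toy two-class problem.
X, y = make_moons(n_samples=400, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Same architecture and training data; only the initial random weights differ.
committee = [
    MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=seed)
    .fit(X_train, y_train)
    for seed in range(7)
]

# Each member votes; the committee prediction is the majority.
votes = np.array([m.predict(X_test) for m in committee])
majority = (votes.mean(axis=0) > 0.5).astype(int)

for i, m in enumerate(committee):
    print(f"member {i}: {m.score(X_test, y_test):.3f}")
print("committee:", (majority == y_test).mean())
```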
Associative neural network (ASNN)
The ASNN is an extension of the committee of machines that goes beyond a simple/weighted average of different models. ASNN represents a combination of an ensemble of feedforward neural networks and the k-nearest neighbor technique (kNN). It uses the correlation between ensemble responses as a measure of distance among the analyzed cases for the kNN. This corrects the bias of the neural network ensemble. An associative neural network has a memory that can coincide with the training set. If new data become available, the network instantly improves its predictive ability and provides data approximation (self-learns the data) without a need to retrain the ensemble. Another important feature of ASNN is the possibility to interpret neural network results by analysis of correlations between data cases in the space of models. The method is demonstrated at www.vcclab.org, where it can be used online or downloaded.
Physical neural network
A physical neural network includes electrically adjustable resistance material to simulate artificial synapses. Examples include the ADALINE neural network developed by Bernard Widrow in the 1960s and the memristor-based neural network developed by Greg Snider of HP Labs in 2008.
Holographic associative memory
Holographic associative memory represents a family of analog, correlation-based, associative, stimulus-response memories, where information is mapped onto the phase orientation of complex numbers.
Instantaneously trained networks
Instantaneously trained neural networks (ITNNs) were inspired by the phenomenon of short-term learning that seems to occur instantaneously. In these networks the weights of the hidden and the output layers are mapped directly from the training vector data. Ordinarily, they work on binary data, but versions for continuous data that require small additional processing are also available.
Spiking neural networks
Spiking neural networks (SNNs) are models which explicitly take into account the timing of inputs. The network input and output are usually represented as series of spikes (delta functions or more complex shapes). SNNs have the advantage of being able to process information in the time domain (signals that vary over time). They are often implemented as recurrent networks. SNNs are also a form of pulse computer.
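A minimal sketch of a leaky integrate-and-fire neuron, one of the simplest spiking neuron models, driven by a made-up input spike train; the constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

dt = 1.0          # ms per simulation step
tau = 20.0        # membrane time constant (ms)
v_rest, v_reset, v_thresh = 0.0, 0.0, 1.0
w = 0.4           # synaptic weight of the single input

# Made-up presynaptic spike train: a spike with probability 0.1 each millisecond.
input_spikes = rng.random(200) < 0.1

v = v_rest
output_spikes = []
for t, spike in enumerate(input_spikes):
    # Leaky integration of the membrane potential.
    v += dt / tau * (v_rest - v) + (w if spike else 0.0)
    if v >= v_thresh:
        output_spikes.append(t)   # the neuron fires: the timing carries the information
        v = v_reset

print("output spike times (ms):", output_spikes)
```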
Spiking neural networks with axonal conduction delays exhibit polychronization, and hence could have a very large memory capacity.
Networks of spiking neurons, and the temporal correlations of neural assemblies in such networks, have been used to model figure/ground separation and region linking in the visual system (see, for example, Reitboeck et al. in Haken and Stadler: Synergetics of the Brain. Berlin, 1989).
In June 2005 IBM announced construction of a Blue Gene supercomputer dedicated to the simulation of a large recurrent spiking neural network.
Gerstner and Kistler have a freely available online textbook on Spiking Neuron Models.
Dynamic neural networks
Dynamic neural networks not only deal with nonlinear multivariate behaviour, but also include (learning of) time-dependent behaviour, such as various transient phenomena and delay effects. Techniques to estimate a system process from observed data fall under the general category of system identification.
Cascading neural networks
Cascade-Correlation is an architecture and supervised learning algorithm developed by Scott Fahlman and Christian Lebiere.
Instead of just adjusting the weights in a network of fixed topology, Cascade-Correlation begins with a minimal network, then automatically trains and adds new hidden units one by one, creating a multi-layer structure. Once a new hidden unit has been added to the network, its input-side weights are frozen. This unit then becomes a permanent feature-detector in the network, available for producing outputs or for creating other, more complex feature detectors. The Cascade-Correlation architecture has several advantages over existing algorithms: it learns very quickly, the network determines its own size and topology, it retains the structures it has built even if the training set changes, and it requires no back-propagation of error signals through the connections of the network.
Neuro-fuzzy networks
A neuro-fuzzy network is a fuzzy inference system (FIS) embedded in the body of an artificial neural network. Depending on the FIS type, there are several layers that simulate the processes involved in a fuzzy inference, such as fuzzification, inference, aggregation and defuzzification. Embedding an FIS in the general structure of an ANN has the benefit of using available ANN training methods to find the parameters of a fuzzy system.
Compositional pattern-producing networks
Compositional pattern-producing networks (CPPNs) are a variation of ANNs which differ in their set of activation functions and how they are applied. While typical ANNs often contain only sigmoid functions (and sometimes Gaussian functions), CPPNs can include both types of functions and many others. Furthermore, unlike typical ANNs, CPPNs are applied across the entire space of possible inputs so that they can represent a complete image. Since they are compositions of functions, CPPNs in effect encode images at infinite resolution and can be sampled for a particular display at whatever resolution is optimal.
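A minimal sketch of sampling a CPPN over image coordinates: a tiny fixed network mixing sine, Gaussian, and sigmoid activations, queried at every (x, y) of the chosen resolution. The weights are random, so the "image" is just an illustrative pattern, and the network layout is an assumption made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gaussian(x):
    return np.exp(-x ** 2)

# A tiny random CPPN: inputs (x, y, distance from centre) -> hidden units with
# mixed activation functions -> one grey-level output.
W1 = rng.normal(0, 1, (3, 3))
W2 = rng.normal(0, 1, 3)
activations = [np.sin, gaussian, sigmoid]

def cppn(x, y):
    inputs = np.array([x, y, np.hypot(x, y)])
    hidden = np.array([f(w @ inputs) for f, w in zip(activations, W1)])
    return sigmoid(W2 @ hidden)

def render(resolution):
    # Because the CPPN is a function of continuous coordinates,
    # it can be sampled at any resolution.
    coords = np.linspace(-1, 1, resolution)
    return np.array([[cppn(x, y) for x in coords] for y in coords])

print(render(8).round(2))     # coarse sample
print(render(32).shape)       # the same network sampled more finely
```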
One-shot associative memory
This type of network can add new patterns without the need for re-training. It is done by creating a specific memory structure which assigns each new pattern to an orthogonal plane using adjacently connected hierarchical arrays. The network offers real-time pattern recognition and high scalability; however, it requires parallel processing and is thus best suited for platforms such as wireless sensor networks (WSNs), grid computing, and GPGPUs.
See also
- Adaptive resonance theory
- Artificial life
- Associative memory
- Autoencoder
- Biological neural network
- Biologically inspired computing
- Blue Brain
- Connectionist expert system
- Decision tree
- Expert system
- Fuzzy logic
- Genetic algorithm
- In Situ Adaptive Tabulation
- Learning Vector Quantization
- Linear discriminant analysis
- Logistic regression
- Memristor
- Multilayer perceptron
- Nearest neighbor (pattern recognition)
- Neural gas
- Neural network
- Neuroevolution, NeuroEvolution of Augmenting Topologies (NEAT)
- Neural network software
- Ni1000 chip
- Optical neural network
- Particle swarm optimization
- Perceptron
- Predictive analytics
- Principal components analysis
- Regression analysis
- Simulated annealing
- Systolic array
- Time delay neural network (TDNN)