Backpropagation through time
Backpropagation through time (BPTT) is a gradient-based technique for training certain types of recurrent neural networks. It can be used, for example, to train Elman networks. The algorithm was independently derived by numerous researchers.
Algorithm
To train a recurrent neural network using BPTT, some training data is needed. This data should be an ordered sequence of n input-output pairs, (a[0], y[0]), (a[1], y[1]), ..., (a[n-1], y[n-1]). An initial value must also be specified for the context vector x[0]; typically, the zero-magnitude vector is used for this purpose.

BPTT begins by unfolding the recurrent neural network through time. The recurrent network is built from two feed-forward neural networks: f, which maps the current context and input to the next context, and g, which maps a context to an output. When the network is unfolded through time, the unfolded network contains k instances of f and one instance of g. For example, unfolding to a depth of k = 3 yields three copies of f in sequence, feeding a single copy of g.
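To make the unfolded structure concrete, the following Python (NumPy) sketch builds the forward pass through k instances of f and one instance of g. The tanh state units, the linear read-out, and all names and dimensions here are illustrative assumptions, not part of the original formulation:

import numpy as np

rng = np.random.default_rng(0)
k, state_dim, in_dim, out_dim = 3, 4, 3, 2    # unfolded to a depth of k = 3

# One weight matrix per unfolded instance of f, plus one for the single g.
Wf_copies = [rng.normal(scale=0.1, size=(state_dim, state_dim + in_dim))
             for _ in range(k)]
Wg = rng.normal(scale=0.1, size=(out_dim, state_dim))

def f(W, x, a):
    # One feed-forward instance of f: (context, input) -> next context.
    return np.tanh(W @ np.concatenate([x, a]))

def g(W, x):
    # The single instance of g: final context -> prediction.
    return W @ x

x = np.zeros(state_dim)                        # initial context x[0]
inputs = [rng.normal(size=in_dim) for _ in range(k)]   # a[t], ..., a[t+k-1]
for W, a in zip(Wf_copies, inputs):            # k instances of f in sequence
    x = f(W, x, a)
p = g(Wg, x)                                   # prediction for y[t+k]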
Training then proceeds in a manner similar to training a feed-forward neural network with backpropagation, except that each epoch must run through the observations y[t] in sequential order. Each training pattern consists of (x[t], a[t], a[t+1], ..., a[t+k-1], y[t+k]). (All of the inputs for k time-steps are needed because the unfolded network contains an input at each unfolded level.) Typically, backpropagation is applied in an online manner to update the weights as each training pattern is presented.
After each pattern is presented and the weights have been updated, the weights in the k instances of f are averaged together so that they all have the same weights. Also, the next context is calculated as x[t+1] = f(x[t], a[t]), which provides the information necessary for the algorithm to move on to the next time-step, t+1.
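In the same illustrative Python setting as above, the averaging and context-update step might look as follows (Wf_copies, the dimensions, and the tanh form of f are assumptions carried over from the earlier sketch):

import numpy as np

rng = np.random.default_rng(1)
k, state_dim, in_dim = 3, 4, 3
# Hypothetical per-instance weights of f after a backpropagation update.
Wf_copies = [rng.normal(scale=0.1, size=(state_dim, state_dim + in_dim))
             for _ in range(k)]
x, a_t = np.zeros(state_dim), rng.normal(size=in_dim)

# Average the k updated copies so that every instance of f is identical.
Wf = sum(Wf_copies) / k
Wf_copies = [Wf.copy() for _ in range(k)]

# Advance the context for the next pattern: x[t+1] = f(x[t], a[t]).
x = np.tanh(Wf @ np.concatenate([x, a_t]))

Note that because all k copies begin each step with identical weights, averaging the updated copies is equivalent to applying the mean of the k per-instance gradient contributions to a single shared weight matrix.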
Pseudo-code
Pseudo-code for BPTT:
Back_Propagation_Through_Time(a, y)   // a[t] is the input at time t. y[t] is the target output.
    Unfold the network to contain k instances of f
    do until stopping criterion is met:
        x = the zero-magnitude vector   // x is the current context
        for t from 0 to n - k - 1       // t is time. n is the length of the training sequence
            Set the network inputs to x, a[t], a[t+1], ..., a[t+k-1]
            p = forward-propagate the inputs over the whole unfolded network
            e = y[t+k] - p              // error = target - prediction
            Back-propagate the error, e, back across the whole unfolded network
            Update all the weights in the network
            Average the weights in each instance of f together, so that each f is identical
            x = f(x, a[t])              // compute the context for the next time-step
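Below is a minimal runnable translation of this pseudo-code into Python with NumPy. It assumes a squared-error loss, tanh state units in f, and a linear read-out g; the function name bptt_train, the dimensions, and the learning rate are illustrative choices rather than part of the original algorithm:

import numpy as np

def bptt_train(a, y, k=3, state_dim=4, lr=0.1, epochs=200, seed=0):
    # a: (n, in_dim) inputs; y: (n, out_dim) targets, in sequential order.
    rng = np.random.default_rng(seed)
    n, in_dim = a.shape
    out_dim = y.shape[1]
    # Shared weights, conceptually copied into the k unfolded instances
    # of f. Averaging the updated copies is equivalent to applying the
    # mean of the k gradient contributions to one shared matrix.
    Wf = rng.normal(scale=0.1, size=(state_dim, state_dim + in_dim))
    Wg = rng.normal(scale=0.1, size=(out_dim, state_dim))

    for _ in range(epochs):                    # until stopping criterion
        x = np.zeros(state_dim)                # zero-magnitude context
        for t in range(n - k):                 # patterns in sequential order
            # Forward-propagate over the whole unfolded network:
            # k instances of f, then one instance of g.
            xs, zs = [x], []
            for j in range(k):
                z = Wf @ np.concatenate([xs[-1], a[t + j]])
                zs.append(z)
                xs.append(np.tanh(z))
            p = Wg @ xs[-1]
            e = y[t + k] - p                   # error = target - prediction

            # Back-propagate e across the whole unfolded network
            # (gradients of the squared-error loss 0.5 * ||e||^2).
            dWg = -np.outer(e, xs[-1])
            dx = -(Wg.T @ e)
            dWf = np.zeros_like(Wf)
            for j in reversed(range(k)):
                dz = dx * (1.0 - np.tanh(zs[j]) ** 2)
                dWf += np.outer(dz, np.concatenate([xs[j], a[t + j]]))
                dx = (Wf.T @ dz)[:state_dim]

            # Online update; dividing the summed dWf by k mirrors
            # averaging the weights of the k instances of f.
            Wf -= lr * dWf / k
            Wg -= lr * dWg
            # Context for the next time-step: x = f(x, a[t]).
            x = np.tanh(Wf @ np.concatenate([x, a[t]]))
    return Wf, Wg

As a usage sketch, one could train it on a toy next-step prediction task:

a = np.sin(np.linspace(0, 12, 200))[:, None]   # inputs a[t]
y = np.roll(a, -1, axis=0)                     # targets y[t] = a[t+1]
Wf, Wg = bptt_train(a, y)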