A recurrent neural network (RNN) is a type of artificial neural network adapted to work with sequential data, often called time-series data. A standard feed-forward network treats each data point as independent of the others. When the current data point depends on previous data points, the network needs to be modified to capture that dependency.
Traditional neural networks have a one-to-one architecture: a single input maps to a single output.
In a one-to-many architecture, one input produces multiple outputs. Example: music generation.
In a many-to-one architecture, many inputs combine to produce a single output. Example: emotion detection.
In a many-to-many architecture, multiple inputs produce multiple outputs. Example: translation systems.
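As a minimal sketch of the difference between the last two cases, assuming Keras (the layer sizes and the 3 x 1 input shape are arbitrary choices), a many-to-one model emits one output for the whole sequence, while a many-to-many model emits one output per timestep. Note that real translation systems usually need an encoder-decoder setup; the version shown here is the simpler, equal-length case.

from keras.models import Sequential
from keras.layers import SimpleRNN, Dense, TimeDistributed

# Many-to-one: e.g. emotion detection, one prediction for the whole sequence.
many_to_one = Sequential([
    SimpleRNN(4, input_shape=(3, 1)),                          # returns only the final hidden state
    Dense(1),
])

# Many-to-many: one prediction per timestep.
many_to_many = Sequential([
    SimpleRNN(4, input_shape=(3, 1), return_sequences=True),   # returns all hidden states
    TimeDistributed(Dense(1)),
])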
Let's see how a normal neural network looks. It can contain any number of input nodes, hidden nodes, and output nodes. In an RNN, the hidden layer has a feedback loop, so information is passed back to the same node multiple times.
A recurrent unit processes information for a fixed number of timesteps, each time passing the current input and the previous hidden state through an activation function to produce a new hidden state. In simpler words, the number of timesteps is the number of times the unit is applied to your input sequence.
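As a rough illustration, a simple recurrent unit can be written as h_t = tanh(W_x x_t + W_h h_{t-1} + b), applied once per timestep with the same weights. The sketch below is a minimal numpy version; the weight values, sizes, and inputs are made up purely for illustration.

import numpy as np

# Minimal sketch of one recurrent unit unrolled over timesteps.
# Sizes and values are arbitrary: 1 input feature, 2 hidden units.
W_x = np.array([[0.5, -0.3]])        # input weights, shape (1, 2)
W_h = np.array([[0.1, 0.4],
                [-0.2, 0.3]])        # recurrent weights, shape (2, 2)
b = np.zeros(2)                      # bias, shape (2,)

x_seq = [np.array([0.2]), np.array([0.7]), np.array([0.1])]  # three timesteps
h = np.zeros(2)                      # initial hidden state

for t, x_t in enumerate(x_seq):
    h = np.tanh(x_t @ W_x + h @ W_h + b)   # same weights reused at every timestep
    print(f"h_{t} = {h}")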
Let's take an example: predicting tomorrow's temperature based on the temperatures from the previous three days.
Inputs: The network has just one input node, but it is fed three temperature values, one per timestep: {x0, x1, x2}.
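As a hedged sketch (the temperature values are invented), this is how the three-day input could be shaped for a Keras recurrent layer, which expects a (batch, timesteps, features) array:

import numpy as np

# Hypothetical temperatures for the previous three days: {x0, x1, x2}.
past_temps = np.array([21.0, 23.5, 22.0])

# Keras recurrent layers expect input of shape (batch, timesteps, features).
x = past_temps.reshape(1, 3, 1)
print(x.shape)   # (1, 3, 1)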
Recurrent layer: A normal hidden layer has two groups of parameters: weights and a bias. A recurrent layer has three: input weights, recurrent (hidden-state) weights, and a bias. It always has just these three groups, irrespective of the number of timesteps, because the same weights are reused at every step.
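A small sketch to check this, assuming Keras (the layer size of 2 units is an arbitrary choice):

from keras.models import Sequential
from keras.layers import SimpleRNN

# A SimpleRNN with 2 units and a 1-feature input has:
#   input weights     (1 x 2) = 2
#   recurrent weights (2 x 2) = 4
#   bias              (2)     = 2
# for a total of 8 parameters, whether the input has 3 timesteps or 300.
layer_check = Sequential([SimpleRNN(2, input_shape=(3, 1))])
layer_check.summary()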
Training: An RNN is trained with a slightly modified version of backpropagation that accounts for the unrolling of the network over time. This algorithm, backpropagation through time (BPTT), is used to compute the gradients of the weights.
import numpy as np
import math
import matplotlib.pyplot as plt
from pandas import read_csv
from keras.models import Sequential
from keras.layers import SimpleRNN, Dense
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error

def RNN_create(hidden_units, dense_units, input_shape, activation):
    # Build a simple RNN: one recurrent layer followed by one dense layer.
    model_RNN = Sequential()
    model_RNN.add(SimpleRNN(hidden_units, input_shape=input_shape,
                            activation=activation[0]))
    model_RNN.add(Dense(units=dense_units, activation=activation[1]))
    model_RNN.compile(loss='mean_squared_error', optimizer='adam')
    return model_RNN

model_demo = RNN_create(2, 1, (3, 1), activation=['linear', 'linear'])
This call creates a model with two hidden units in the SimpleRNN layer and one output unit created via a Dense layer. The input shape is set to 3 x 1 (three timesteps, one feature), and a linear activation is used in both layers. Recall that the linear activation function f(x) = x does not change its input.
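As a hedged usage sketch, the model above could be fit on a few made-up temperature sequences; the data here is invented purely for illustration. Keras applies backpropagation through time internally when fit() is called.

import numpy as np

# Made-up training data: each sample is three past days (shape (3, 1)),
# and the target is the following day's temperature.
X_train = np.array([[[21.0], [23.5], [22.0]],
                    [[23.5], [22.0], [24.0]]])   # shape (2, 3, 1)
y_train = np.array([24.0, 25.5])                 # shape (2,)

model_demo.fit(X_train, y_train, epochs=200, verbose=0)
print(model_demo.predict(np.array([[[22.0], [24.0], [25.5]]])))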
RNN Advantages and Disadvantages
RNNs have several advantages:
They handle sequential data.
They remember historical information.
They handle inputs of varying lengths.
Disadvantages
Computation is slow, because timesteps must be processed one after another.
Future inputs in the sequence are not taken into account by a standard RNN.
Gradients can vanish or explode over long sequences, which is the gradient problem the variants below are designed to address.
There are various types of RNN architecture:
Gated recurrent units (GRUs) are designed to handle the gradient problem. They have two gates, an update gate and a reset gate, which determine what information is kept for future predictions.
Bidirectional RNNs (BRNNs) also consider future timesteps in order to improve accuracy. It is like using both the first and last words of a sentence to predict the word in the middle.
Long short-term memory (LSTM) networks are designed to handle the vanishing gradient problem in RNNs. They use three gates: input, forget, and output. These gates control which information must be retained.
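As a minimal sketch, assuming the same Keras setup as above (the layer sizes and the 3 x 1 input shape are arbitrary), each of these variants can be dropped into a model in place of SimpleRNN:

from keras.models import Sequential
from keras.layers import SimpleRNN, LSTM, GRU, Bidirectional, Dense

gru_model = Sequential([GRU(4, input_shape=(3, 1)), Dense(1)])                          # gated recurrent unit
brnn_model = Sequential([Bidirectional(SimpleRNN(4), input_shape=(3, 1)), Dense(1)])    # bidirectional RNN
lstm_model = Sequential([LSTM(4, input_shape=(3, 1)), Dense(1)])                        # long short-term memory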