2. CNN
• Nature of CNN
– classification, object recognition, pattern matching, clustering
• Limitation
– CNNs generally don’t perform well when the input data is interdependent in a sequential pattern.
– There is no correlation between the previous and the next input.
– Each output depends only on its own input.
Example: If you run 100 different inputs, none of them would be biased by the previous outputs.
4. Why RNN?
Imagine a scenario like sentence generation or
text translation.
5. Why RNN?
• The words generated are dependent on the words generated before.
• In this case, we need to have some bias based on the previous output.
• This is where RNNs shine.
• RNNs include a sense of memory about what happened earlier in the sequence of data.
6. Why RNN?
• RNNs are good at processing sequence data for predictions. But how?
– The sequence should contain interdependent data.
– Examples: time series data, informative pieces of strings, conversations, etc.
9. Sequence of data?
• A sequence is a particular order in which one thing follows another.
• Given a series of snapshots of a moving ball, for example, the order of the snapshots tells you the ball is moving to the right.
• Sequence data comes in many forms
10. Audio sequence
• Audio is a natural sequence. You can chop an audio spectrogram into chunks and feed them into an RNN step by step, as in the sketch below.
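A minimal sketch in Python/NumPy (the shapes are made-up assumptions for illustration: 1,000 time frames × 128 frequency bins, 50 frames per chunk):

import numpy as np

# Chop a spectrogram into fixed-size time chunks so each chunk can be
# fed to an RNN as one step. Shapes here are illustrative assumptions.
spectrogram = np.random.rand(1000, 128)   # (time frames, frequency bins)

chunk_size = 50                           # frames per RNN time step
n_chunks = spectrogram.shape[0] // chunk_size

# Each chunk becomes one element of the input sequence.
chunks = [spectrogram[i * chunk_size:(i + 1) * chunk_size]
          for i in range(n_chunks)]
print(len(chunks), chunks[0].shape)       # 20 (50, 128)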
11. Text sequence
• Text is another form of sequence. You can break text up into a sequence of characters or a sequence of words (see the sketch below).
– “I” “am” “writing” “a” “letter”
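A minimal sketch in Python of both splits (the vocabulary indices are illustrative only):

# Two ways to turn text into a sequence: characters or words.
text = "I am writing a letter"

char_seq = list(text)       # sequence of characters
word_seq = text.split()     # sequence of words

# Map each token to an integer index so it can be fed to a network.
vocab = {w: i for i, w in enumerate(sorted(set(word_seq)))}
indices = [vocab[w] for w in word_seq]
print(word_seq)             # ['I', 'am', 'writing', 'a', 'letter']
print(indices)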
16. • In a feed-forward network, whatever image is shown to the classifier during the test phase does not alter the weights, so the second decision is not affected by the first.
• This is one very important difference between feed-forward networks and recurrent nets.
Note: Feed-forward nets don’t remember historic input data at test time, unlike recurrent networks.
17. • Feed-forward Networks
• Recurrent Networks
• Recurrent Neuron
• Backpropagation Through Time (BPTT)
18. Recurrent Networks
• How do we get a feed-forward neural network to be able to use previous information to affect later outputs?
• An RNN has a looping mechanism that acts as a highway, allowing information to flow from one step to the next.
19. Recurrent Networks
• Recurrent networks, on the other hand, take as input not just the current input but also what they have perceived previously in time; a minimal sketch of this loop follows.
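A minimal sketch of one recurrent step in Python/NumPy (the sizes, weight names, and initialisation are assumptions for illustration):

import numpy as np

input_size, hidden_size = 8, 16
W_xh = np.random.randn(hidden_size, input_size) * 0.1   # input -> hidden
W_hh = np.random.randn(hidden_size, hidden_size) * 0.1  # hidden -> hidden (the loop)
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    # The new state depends on the current input AND what was perceived before.
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

inputs = [np.random.randn(input_size) for _ in range(5)]  # a 5-step sequence
h = np.zeros(hidden_size)   # empty memory at the start of the sequence
for x_t in inputs:
    h = rnn_step(x_t, h)    # h is passed along the "highway" to the next step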
27. • Feed-forward Networks
• Recurrent Networks
• Recurrent Neuron
• Backpropagation Through Time (BPTT)
28. How do recurrent neural networks work?
• So now we understand how an RNN actually works, but how does the training work?
• How do we decide the weights for each connection? And how do we initialise the weights of the hidden units?
• The purpose of recurrent nets is to accurately classify sequential input. We rely on backpropagation of error and gradient descent to do so.
• But standard backpropagation, as used in feed-forward networks, can’t be applied here directly.
29. How do recurrent neural networks work?
• The problem with RNNs is that they are cyclic graphs, unlike feed-forward networks, which are directed acyclic graphs.
• In feed-forward networks we can calculate the error derivatives from the layer above. In an RNN we don’t have such layering.
31. Recurrent Neural Networks
• Replicate the RNN’s hidden units for every time step (“unrolling”).
• Each replication through a time step is like a layer in a feed-forward network.
• The layer at time step t connects to the layers at time step t+1.
• Thus we randomly initialise the weights, unroll the network, and then use backpropagation to optimise the weights in the hidden layer.
• Initialisation is done by passing parameters to the lowest layer.
• These parameters are also optimised as part of backpropagation.
32. Recurrent Neural Networks
• An outcome of the unrolling is that each layer now maintains its own weights, and these end up getting optimised differently.
• The error derivatives calculated w.r.t. the weights are not guaranteed to be equal.
• So each layer could end up with different weights after a single run.
• We definitely don’t want that to happen.
• The easy way out is to aggregate the errors across all the layers in some fashion.
• We can average the errors or sum them up.
• This way a single set of weights is shared across all time steps (see the sketch below).
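A simplified BPTT sketch in Python/NumPy showing the gradient summing (assumptions: a toy tanh cell, a loss on the final state only, and only the recurrent-weight gradient shown; all names and sizes are illustrative):

import numpy as np

hidden_size, input_size, T = 4, 3, 6
W_xh = np.random.randn(hidden_size, input_size) * 0.1
W_hh = np.random.randn(hidden_size, hidden_size) * 0.1
xs = [np.random.randn(input_size) for _ in range(T)]

# Forward pass: unroll the SAME weights over every time step.
hs = [np.zeros(hidden_size)]
for x_t in xs:
    hs.append(np.tanh(W_xh @ x_t + W_hh @ hs[-1]))

# Backward pass: each unrolled "layer" produces its own error derivative,
# but we SUM them into one gradient so a single weight matrix is updated.
dW_hh = np.zeros_like(W_hh)
dh = 2 * hs[-1]                      # gradient of toy loss = sum(h_T ** 2)
for t in reversed(range(T)):
    dz = dh * (1 - hs[t + 1] ** 2)   # backprop through tanh
    dW_hh += np.outer(dz, hs[t])     # aggregate across time steps
    dh = W_hh.T @ dz                 # pass the error to the previous step

W_hh -= 0.01 * dW_hh                 # one shared update for all time steps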
35. Architecture for an RNN
Figure (from http://colah.github.io/posts/2015-08-Understanding-LSTMs/): a sequence of inputs, bracketed by start-of-sequence and end-of-sequence markers, maps to a sequence of outputs; some information is passed from one subunit to the next.
36. Architecture for a 1980s RNN
…
Problem with this architecture: it’s extremely deep and very hard to train.
40. How to feed data into an RNN?
• For the next step, feed the word “time” together with the hidden state from the previous step.
• The RNN now has information on both the words “What” and “time.”
42. How to feed data into an RNN?
• Repeat this process until the final step.
• By the final step, the RNN has encoded information from all the words in the previous steps.
44. How to feed data into an RNN?
• Since the final output was created from all the previous steps, it summarises the whole sequence.
• Take the final output and pass it to a feed-forward layer to classify an intent (a sketch of the whole pipeline follows).
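A minimal end-to-end sketch in Python/NumPy for the running “What time is it ?” example (the embeddings, sizes, and weights are random illustrative values, not a trained model):

import numpy as np

words = ["What", "time", "is", "it", "?"]
vocab = {w: i for i, w in enumerate(words)}
embed_size, hidden_size, n_intents = 8, 16, 3

E = np.random.randn(len(vocab), embed_size) * 0.1       # word embeddings
W_xh = np.random.randn(hidden_size, embed_size) * 0.1
W_hh = np.random.randn(hidden_size, hidden_size) * 0.1
W_hy = np.random.randn(n_intents, hidden_size) * 0.1    # feed-forward head

h = np.zeros(hidden_size)
for w in words:
    h = np.tanh(W_xh @ E[vocab[w]] + W_hh @ h)   # hidden state accumulates context

logits = W_hy @ h                                # classify from the FINAL state only
probs = np.exp(logits) / np.exp(logits).sum()    # softmax over intents
print("predicted intent:", int(np.argmax(probs)))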
46. Limitation
• Theoretically, RNNs have infinite memory: the capability to look back indefinitely.
• But in practice they can only look back a few steps (the problem of long-term dependencies).
47. Vanishing Gradient
• By the final hidden state of the RNN, the contribution of the early inputs has largely faded.
• This short-term memory is caused by the infamous vanishing gradient problem.
48. Vanishing Gradient
• As the RNN processes more steps, it has trouble retaining information from earlier steps.
• The information from the words “what” and “time” is almost non-existent at the final time step.
• Short-term memory and the vanishing gradient are due to the nature of backpropagation, the algorithm used to train and optimize neural networks.
49. Vanishing Gradient in Back-Propagation Networks
• Training a neural network has three major steps (a toy sketch follows).
• First, it does a forward pass and makes a prediction.
• Second, it compares the prediction to the ground truth using a loss function. The loss function outputs an error value, which is an estimate of how poorly the network is performing.
• Last, it uses that error value to do backpropagation, which calculates the gradients for each node in the network.
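The three steps on a single toy neuron (plain Python; the numbers and the 0.1 learning rate are arbitrary):

w, b = 0.5, 0.0
x, y_true = 2.0, 3.0

# 1. Forward pass: make a prediction.
y_pred = w * x + b

# 2. Loss: compare the prediction to the ground truth.
loss = (y_pred - y_true) ** 2          # squared error

# 3. Backpropagation: gradient of the loss for each parameter.
dloss = 2 * (y_pred - y_true)
dw, db = dloss * x, dloss
w, b = w - 0.1 * dw, b - 0.1 * db      # gradient-descent update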
51. Vanishing Gradient in Back-Propagation Networks
• The gradient is the value used to adjust the network’s internal weights, allowing the network to learn.
• The bigger the gradient, the bigger the adjustments, and vice versa.
• Here is where the problem lies.
• When doing backpropagation, each node in a layer calculates its gradient with respect to the effects of the gradients in the layer before it.
• So if the adjustments to the layers before it are small, then the adjustments to the current layer will be even smaller.
52. Vanishing Gradient in Back-Propagation Networks
• That causes gradients to shrink exponentially as the error back-propagates down the network (a toy illustration follows).
• The earlier layers fail to do any learning, as their internal weights are barely adjusted due to the extremely small gradients.
• And that’s the vanishing gradient problem.
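A toy illustration in Python (the 0.4 per-layer factor is an arbitrary illustrative value): backpropagation multiplies the gradient by such a factor at every layer, so it shrinks exponentially on the way down.

grad = 1.0
factor = 0.4  # illustrative per-layer derivative magnitude (< 1)

for layer in range(10, 0, -1):
    grad *= factor
    print(f"gradient reaching layer {layer}: {grad:.6f}")
# After 10 steps: 0.4 ** 10 ≈ 0.0001 -> the earliest layers barely learn.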
55. Impact of Gradient in BPNN
• The gradient is used to make adjustments to the neural network’s weights, thus allowing it to learn.
• Small gradients mean small adjustments, which causes the early layers not to learn.
56. Vanishing Gradient
• Because of vanishing gradients, the RNN doesn’t learn the long-range dependencies across time steps.
• That means there is a possibility that the words “what” and “time” are not considered when trying to predict the user’s intention.
• The network then has to make its best guess with “is it?”.
• That’s pretty ambiguous and would be difficult even for a human.
• So not being able to learn from earlier time steps causes the network to have a short-term memory.
57. LSTMs and GRUs
• To mitigate short-term memory, two specialized recurrent neural networks were created.
• One is called Long Short-Term Memory, or LSTM for short. The other is the Gated Recurrent Unit, or GRU.
• LSTMs and GRUs essentially function just like RNNs, but they are capable of learning long-term dependencies using mechanisms called “gates” (a sketch of one LSTM step follows).
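A minimal sketch of a single LSTM step in Python/NumPy, just to show the gating idea (biases are omitted, the input is assumed already projected to the hidden size, and all names and initialisations are illustrative; GRUs use a similar scheme with fewer gates):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n = 16  # hidden size; assume the input is already projected to this size
Wf, Wi, Wo, Wc = (np.random.randn(n, 2 * n) * 0.1 for _ in range(4))

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([x, h_prev])
    f = sigmoid(Wf @ z)                    # forget gate: what to erase from memory
    i = sigmoid(Wi @ z)                    # input gate: what new info to store
    o = sigmoid(Wo @ z)                    # output gate: what to reveal
    c = f * c_prev + i * np.tanh(Wc @ z)   # long-term cell state
    h = o * np.tanh(c)                     # short-term hidden state
    return h, c

h, c = lstm_step(np.random.randn(n), np.zeros(n), np.zeros(n))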
58. Where to use an RNN?
• Language Modelling and Generating Text
• Machine Translation
• Speech Recognition
• Generating Image Descriptions
• Video Tagging
• Stock Predictions