2. CNN
• Nature of CNN
– classification, object recognition, pattern matching, clustering
• Limitation
– CNNs generally don’t perform well when the input data is interdependent in a sequential pattern.
– There is no correlation between the previous and the next input.
– Each output depends only on its own input.
Example: If you run 100 different inputs, none of them would be biased by the previous outputs.
4. Why RNN?
Imagine a scenario like sentence generation or
text translation.
5. Why RNN?
• The words generated are dependent on the words generated before.
• In this case, we need to have some bias based on the previous output.
• This is where RNNs shine.
• RNNs include a sense of memory about what happened earlier in the sequence of data.
6. Why RNN?
• RNNs are good at processing sequence data for predictions. But how?
– The sequence should contain interdependent data.
– Examples: time series data, informative pieces of strings, conversations, etc.
9. Sequence of data?
• A sequence is a particular order in which one thing follows another.
• Given a series of snapshots of a moving ball, for example, the order of the snapshots tells you the ball is moving to the right.
• Sequence data comes in many forms
10. Audio sequence
• Audio is a natural sequence. You can chop an audio spectrogram into chunks and feed them into an RNN step by step, as in the sketch below.
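A minimal sketch in Python/NumPy (the shapes are made-up assumptions for illustration: 1,000 time frames × 128 frequency bins, 50 frames per chunk):

import numpy as np

# Chop a spectrogram into fixed-size time chunks so each chunk can be
# fed to an RNN as one step. Shapes here are illustrative assumptions.
spectrogram = np.random.rand(1000, 128)   # (time frames, frequency bins)

chunk_size = 50                           # frames per RNN time step
n_chunks = spectrogram.shape[0] // chunk_size

# Each chunk becomes one element of the input sequence.
chunks = [spectrogram[i * chunk_size:(i + 1) * chunk_size]
          for i in range(n_chunks)]
print(len(chunks), chunks[0].shape)       # 20 (50, 128)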
11. Text sequence
• Text is another form of sequence. You can break text up into a sequence of characters or a sequence of words (see the sketch below).
– “I” “am” “writing” “a” “letter”
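A minimal sketch in Python of both splits (the vocabulary indices are illustrative only):

# Two ways to turn text into a sequence: characters or words.
text = "I am writing a letter"

char_seq = list(text)       # sequence of characters
word_seq = text.split()     # sequence of words

# Map each token to an integer index so it can be fed to a network.
vocab = {w: i for i, w in enumerate(sorted(set(word_seq)))}
indices = [vocab[w] for w in word_seq]
print(word_seq)             # ['I', 'am', 'writing', 'a', 'letter']
print(indices)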
16. • In a feed-forward network, whatever image is shown to the classifier during the test phase does not alter the weights, so the second decision is not affected by the first.
• This is one very important difference between feed-forward networks and recurrent nets.
Note: Feed-forward nets don’t remember historic input data at test time, unlike recurrent networks.
17. • Feed-forward Networks
• Recurrent Networks
• Recurrent Neuron
• Backpropagation Through Time (BPTT)
18. Recurrent Networks
• How do we get a feed-forward neural network to be able to use previous information to affect later outputs?
• An RNN has a looping mechanism that acts as a highway, allowing information to flow from one step to the next.
19. Recurrent Networks
• Recurrent networks, on the other hand, take as input not just the current input but also what they have perceived previously in time; a minimal sketch of this loop follows.
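A minimal sketch of one recurrent step in Python/NumPy (the sizes, weight names, and initialisation are assumptions for illustration):

import numpy as np

input_size, hidden_size = 8, 16
W_xh = np.random.randn(hidden_size, input_size) * 0.1   # input -> hidden
W_hh = np.random.randn(hidden_size, hidden_size) * 0.1  # hidden -> hidden (the loop)
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    # The new state depends on the current input AND what was perceived before.
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

inputs = [np.random.randn(input_size) for _ in range(5)]  # a 5-step sequence
h = np.zeros(hidden_size)   # empty memory at the start of the sequence
for x_t in inputs:
    h = rnn_step(x_t, h)    # h is passed along the "highway" to the next step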
27. • Feed-forward Networks
• Recurrent Networks
• Recurrent Neuron
• Backpropagation Through Time (BPTT)
28. How do recurrent neural networks work?
• So now we understand how an RNN actually works, but how does the training work?
• How do we decide the weights for each connection? And how do we initialise the weights of the hidden units?
• The purpose of recurrent nets is to accurately classify sequential input. We rely on backpropagation of error and gradient descent to do so.
• But standard backpropagation, as used in feed-forward networks, can’t be applied here directly.
29. How do recurrent neural networks work?
• The problem with RNNs is that they are cyclic graphs, unlike feed-forward networks, which are directed acyclic graphs.
• In feed-forward networks we can calculate the error derivatives from the layer above. In an RNN we don’t have such layering.
31. Recurrent Neural Networks
• Replicate the RNN’s hidden units for every time step (“unrolling”).
• Each replication through a time step is like a layer in a feed-forward network.
• The layer at time step t connects to the layers at time step t+1.
• Thus we randomly initialise the weights, unroll the network, and then use backpropagation to optimise the weights in the hidden layer.
• Initialisation is done by passing parameters to the lowest layer.
• These parameters are also optimised as part of backpropagation.
32. Recurrent Neural Networks
• An outcome of the unrolling is that each layer now maintains its own weights, and these end up getting optimised differently.
• The error derivatives calculated w.r.t. the weights are not guaranteed to be equal.
• So each layer could end up with different weights after a single run.
• We definitely don’t want that to happen.
• The easy way out is to aggregate the errors across all the layers in some fashion.
• We can average the errors or sum them up.
• This way a single set of weights is shared across all time steps (see the sketch below).
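A simplified BPTT sketch in Python/NumPy showing the gradient summing (assumptions: a toy tanh cell, a loss on the final state only, and only the recurrent-weight gradient shown; all names and sizes are illustrative):

import numpy as np

hidden_size, input_size, T = 4, 3, 6
W_xh = np.random.randn(hidden_size, input_size) * 0.1
W_hh = np.random.randn(hidden_size, hidden_size) * 0.1
xs = [np.random.randn(input_size) for _ in range(T)]

# Forward pass: unroll the SAME weights over every time step.
hs = [np.zeros(hidden_size)]
for x_t in xs:
    hs.append(np.tanh(W_xh @ x_t + W_hh @ hs[-1]))

# Backward pass: each unrolled "layer" produces its own error derivative,
# but we SUM them into one gradient so a single weight matrix is updated.
dW_hh = np.zeros_like(W_hh)
dh = 2 * hs[-1]                      # gradient of toy loss = sum(h_T ** 2)
for t in reversed(range(T)):
    dz = dh * (1 - hs[t + 1] ** 2)   # backprop through tanh
    dW_hh += np.outer(dz, hs[t])     # aggregate across time steps
    dh = W_hh.T @ dz                 # pass the error to the previous step

W_hh -= 0.01 * dW_hh                 # one shared update for all time steps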
35. Architecture for an RNN
Figure (from http://colah.github.io/posts/2015-08-Understanding-LSTMs/): a sequence of inputs, bracketed by start-of-sequence and end-of-sequence markers, maps to a sequence of outputs; some information is passed from one subunit to the next.
36. Architecture for a 1980s RNN
…
Problem with this architecture: it’s extremely deep and very hard to train.
40. How to feed data into an RNN?
• For the next step, feed the word “time” together with the hidden state from the previous step.
• The RNN now has information on both the words “What” and “time.”
42. How to feed data into an RNN?
• Repeat this process until the final step.
• By the final step, the RNN has encoded information from all the words in the previous steps.
44. How to feed data into an RNN?
• Since the final output was created from all the previous steps, it summarises the whole sequence.
• Take the final output and pass it to a feed-forward layer to classify an intent (a sketch of the whole pipeline follows).
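A minimal end-to-end sketch in Python/NumPy for the running “What time is it ?” example (the embeddings, sizes, and weights are random illustrative values, not a trained model):

import numpy as np

words = ["What", "time", "is", "it", "?"]
vocab = {w: i for i, w in enumerate(words)}
embed_size, hidden_size, n_intents = 8, 16, 3

E = np.random.randn(len(vocab), embed_size) * 0.1       # word embeddings
W_xh = np.random.randn(hidden_size, embed_size) * 0.1
W_hh = np.random.randn(hidden_size, hidden_size) * 0.1
W_hy = np.random.randn(n_intents, hidden_size) * 0.1    # feed-forward head

h = np.zeros(hidden_size)
for w in words:
    h = np.tanh(W_xh @ E[vocab[w]] + W_hh @ h)   # hidden state accumulates context

logits = W_hy @ h                                # classify from the FINAL state only
probs = np.exp(logits) / np.exp(logits).sum()    # softmax over intents
print("predicted intent:", int(np.argmax(probs)))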
46. Limitation
• Theoretically, RNNs have infinite memory: the capability to look back indefinitely.
• But in practice they can only look back a few steps (the problem of long-term dependencies).
47. Vanishing Gradient
• By the final hidden state of the RNN, the contribution of the early inputs has largely faded.
• This short-term memory is caused by the infamous vanishing gradient problem.
48. Vanishing Gradient
• As the RNN processes more steps, it has trouble retaining information from earlier steps.
• The information from the words “what” and “time” is almost non-existent at the final time step.
• Short-term memory and the vanishing gradient are due to the nature of backpropagation, the algorithm used to train and optimize neural networks.
49. Vanishing Gradient in Back-Propagation Networks
• Training a neural network has three major steps (a toy sketch follows).
• First, it does a forward pass and makes a prediction.
• Second, it compares the prediction to the ground truth using a loss function. The loss function outputs an error value, which is an estimate of how poorly the network is performing.
• Last, it uses that error value to do backpropagation, which calculates the gradients for each node in the network.
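The three steps on a single toy neuron (plain Python; the numbers and the 0.1 learning rate are arbitrary):

w, b = 0.5, 0.0
x, y_true = 2.0, 3.0

# 1. Forward pass: make a prediction.
y_pred = w * x + b

# 2. Loss: compare the prediction to the ground truth.
loss = (y_pred - y_true) ** 2          # squared error

# 3. Backpropagation: gradient of the loss for each parameter.
dloss = 2 * (y_pred - y_true)
dw, db = dloss * x, dloss
w, b = w - 0.1 * dw, b - 0.1 * db      # gradient-descent update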
51. Vanishing Gradient in Back-Propagation Networks
• The gradient is the value used to adjust the network’s internal weights, allowing the network to learn.
• The bigger the gradient, the bigger the adjustments, and vice versa.
• Here is where the problem lies.
• When doing backpropagation, each node in a layer calculates its gradient with respect to the effects of the gradients in the layer before it.
• So if the adjustments to the layers before it are small, then the adjustments to the current layer will be even smaller.
52. Vanishing Gradient in Back-Propagation Networks
• That causes gradients to shrink exponentially as the error back-propagates down the network (a toy illustration follows).
• The earlier layers fail to do any learning, as their internal weights are barely adjusted due to the extremely small gradients.
• And that’s the vanishing gradient problem.
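A toy illustration in Python (the 0.4 per-layer factor is an arbitrary illustrative value): backpropagation multiplies the gradient by such a factor at every layer, so it shrinks exponentially on the way down.

grad = 1.0
factor = 0.4  # illustrative per-layer derivative magnitude (< 1)

for layer in range(10, 0, -1):
    grad *= factor
    print(f"gradient reaching layer {layer}: {grad:.6f}")
# After 10 steps: 0.4 ** 10 ≈ 0.0001 -> the earliest layers barely learn.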
55. Impact of Gradient in BPNN
• The gradient is used to make adjustments to the neural network’s weights, thus allowing it to learn.
• Small gradients mean small adjustments, which causes the early layers not to learn.
56. Vanishing Gradient
• Because of vanishing gradients, the RNN doesn’t learn the long-range dependencies across time steps.
• That means there is a possibility that the words “what” and “time” are not considered when trying to predict the user’s intention.
• The network then has to make its best guess with “is it?”.
• That’s pretty ambiguous and would be difficult even for a human.
• So not being able to learn from earlier time steps causes the network to have a short-term memory.
57. LSTMs and GRUs
• To mitigate short-term memory, two specialized recurrent neural networks were created.
• One is called Long Short-Term Memory, or LSTM for short. The other is the Gated Recurrent Unit, or GRU.
• LSTMs and GRUs essentially function just like RNNs, but they are capable of learning long-term dependencies using mechanisms called “gates” (a sketch of one LSTM step follows).
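A minimal sketch of a single LSTM step in Python/NumPy, just to show the gating idea (biases are omitted, the input is assumed already projected to the hidden size, and all names and initialisations are illustrative; GRUs use a similar scheme with fewer gates):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n = 16  # hidden size; assume the input is already projected to this size
Wf, Wi, Wo, Wc = (np.random.randn(n, 2 * n) * 0.1 for _ in range(4))

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([x, h_prev])
    f = sigmoid(Wf @ z)                    # forget gate: what to erase from memory
    i = sigmoid(Wi @ z)                    # input gate: what new info to store
    o = sigmoid(Wo @ z)                    # output gate: what to reveal
    c = f * c_prev + i * np.tanh(Wc @ z)   # long-term cell state
    h = o * np.tanh(c)                     # short-term hidden state
    return h, c

h, c = lstm_step(np.random.randn(n), np.zeros(n), np.zeros(n))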
58. Where to use an RNN?
• Language Modelling and Generating Text
• Machine Translation
• Speech Recognition
• Generating Image Descriptions
• Video Tagging
• Stock Predictions