LONG SHORT-TERM MEMORY
Ssenjovu Emmanuel Joster
Student-No: 2001200052
BSc. Information Technology
Faculty of Technoscience
Dept. Computer Science & Electrical Engineering
Muni University || P.O. Box 725, Arua (UG)
Artificial neurons are inspired by and modeled
after the biological structure of the human brain.
ANNs are capable of learning to solve problems
in a way our brains do naturally.
Artificial Neurons
First things first...
Connected neurons then form a network, hence
the name neural network, consisting of an input
layer, a hidden layer, and an output layer.
● First step in a neural network.
● The network makes a prediction of what the output
would be for a given input.
● To propagate the input across the layers, we apply a
propagation function like the one sketched below:
a) Forward Propagation
How ANNs work?
A case of feedforward ANNs
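A minimal NumPy sketch of such a forward pass, under assumed layer sizes and a sigmoid activation (the names W1, b1, W2, b2 and the toy dimensions are illustrative, not taken from the slides):

```python
import numpy as np

def sigmoid(z):
    # Squashes each value into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    # Input layer -> hidden layer: weighted sum followed by activation.
    h = sigmoid(W1 @ x + b1)
    # Hidden layer -> output layer: the network's prediction.
    y_hat = sigmoid(W2 @ h + b2)
    return h, y_hat

# Toy example: 3 inputs, 4 hidden units, 1 output (sizes are arbitrary).
rng = np.random.default_rng(0)
x = rng.normal(size=3)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)
h, y_hat = forward(x, W1, b1, W2, b2)
print(y_hat)
```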
● Comes into play in the training phase of a neural
network.
● Involves adjusting the weights until the network can
produce the desired outputs.
● We calculate the error and its gradients with respect to
each weight at each layer, and then adjust the weights
accordingly, as in the sketch below:
b) Back Propagation
How ANNs work?
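To make the weight-adjustment step concrete, here is a hedged sketch of one gradient-descent update for the same tiny network, assuming a squared-error loss and an arbitrary learning rate (neither is specified in the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, y, W1, b1, W2, b2, lr=0.1):
    # Forward pass (same computation as in the previous sketch).
    h = sigmoid(W1 @ x + b1)
    y_hat = sigmoid(W2 @ h + b2)

    # Error and its gradients, layer by layer (chain rule).
    err = y_hat - y
    d_out = err * y_hat * (1 - y_hat)      # gradient at the output pre-activation
    d_hid = (W2.T @ d_out) * h * (1 - h)   # gradient propagated back to the hidden layer

    # Adjust each weight against its gradient.
    W2 -= lr * np.outer(d_out, h)
    b2 -= lr * d_out
    W1 -= lr * np.outer(d_hid, x)
    b1 -= lr * d_hid
    return 0.5 * float(err @ err)          # current loss, for monitoring
```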
● Built upon ANNs such as feedforward neural networks.
● Have additional connections between layers, e.g. a
feedback loop.
● Ideal for sequential inputs, e.g. text, music, speech,
handwriting, or price changes in stock markets.
The previous step’s hidden layer and final outputs
are fed back into the network and used as input to
the next step’s hidden layer, which means the
network remembers the past and repeatedly
predicts what will happen next (see the step sketch below).
Traditional RNN
a) Forward Propagation
Memory-heavy, and hard to train for long-term
temporal dependencies.
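A minimal sketch of the recurrent forward step described above: the previous step's hidden state is fed back in alongside the current input (the tanh activation and the matrix names W_xh, W_hh, W_hy are conventional assumptions, not from the slides):

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, W_hy, h0):
    """Run a simple RNN over a sequence xs, reusing the hidden state at each step."""
    h, outputs = h0, []
    for x in xs:
        # Combine the current input with the previous step's hidden state.
        h = np.tanh(W_xh @ x + W_hh @ h)
        # Prediction for this time step.
        outputs.append(W_hy @ h)
    return outputs, h
```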
● We calculate the error that the weight matrices W
generate, and then adjust the weights until the
error cannot go any lower.
● To compute the gradient for the current W, we
need to apply the chain rule through a series of
previous time steps (written out below). Because of
this, the process is called back propagation through
time (BPTT). If the sequences are quite long, BPTT
can take a long time.
Traditional RNN
In practice, many people truncate the backpropagation
to a few steps instead of going all the way back to the beginning.
a) Back Propagation Through Time (BPTT)
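Written out, the chain rule through previous time steps takes the following standard form, using h_t for the hidden state and L_t for the loss at step t (notation assumed here, not from the slides):

```latex
\frac{\partial L}{\partial W}
  = \sum_{t} \frac{\partial L_t}{\partial W}
  = \sum_{t} \sum_{k=1}^{t}
      \frac{\partial L_t}{\partial h_t}
      \left( \prod_{i=k+1}^{t} \frac{\partial h_i}{\partial h_{i-1}} \right)
      \frac{\partial h_k}{\partial W}
```

Truncated BPTT simply limits how far back the inner sum (and product) is allowed to reach.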
The Vanishing Gradient
In multilayered ANNs such as RNNs, the vanishing gradient problem refers to
the situation where the gradients (derivatives) of the loss function with
respect to the weights of early layers in a deep network become too small
to allow training for tasks that require long-term dependencies. For example,
if we are predicting the next word in a long, multi-sentence paragraph, it is
unlikely that the model will be able to remember the first words at the
beginning of the paragraph.
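A tiny numeric illustration of why this happens: every backward step through time multiplies the gradient by another factor, and once those factors sit below 1 the product collapses exponentially (the factor 0.9 below is an arbitrary assumption):

```python
# Repeated multiplication by per-step factors below 1 shrinks the gradient
# exponentially in the number of time steps it must travel back through.
factor = 0.9  # assumed magnitude of each step's contribution
for steps in (1, 10, 50, 100):
    print(steps, factor ** steps)
# 1 -> 0.9, 10 -> ~0.35, 50 -> ~0.005, 100 -> ~0.00003: early words barely influence learning
```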
LONG SHORT-TERM MEMORY
● LSTM is an improved RNN architecture designed to
address the issues of the general RNN.
● Enables learning of long-range dependencies.
● Has better memory through linear memory cells
surrounded by a set of gate units that control
the flow of information.
● It uses no activation function within its recurrent
cell-state path, so the gradient term does not
vanish during backpropagation (see the cell-state update below).
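The linear memory cell mentioned above is usually summarised by the standard cell-state update, in which the gates only scale and add information rather than push it through a squashing recurrence (the notation is the conventional one, not taken from the slides):

```latex
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t,
\qquad h_t = o_t \odot \tanh(c_t)
```

Here c_t is the cell state, \tilde{c}_t the candidate values, and f_t, i_t, o_t the forget, input, and output gates described next.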
The “Hidden Layer” of an LSTM
The hidden layer is made of a cell which is surrounded by gates that
control the flow of information.
The gates of an LSTM
The gates perform the following functions to control the flow of
information.
1. Forget Gate
● First step, in which the sigmoid function outputs a
value ranging from 0 to 1 to determine how much
information from the previous hidden state and the
current input should be retained.
● The LSTM does not necessarily need to remember
everything that has happened in the past.
2. Input Gate
The next step, which involves two parts:
● First, the input gate determines what new
information to store in the memory cell.
● Next, a tanh layer creates a vector of new candidate
values to be added to the state.
3. Output Gate
● To determine what to output from the memory cell,
we again apply the sigmoid function to the previous
hidden state and the current input, then multiply
that with tanh applied to the new memory cell
(this keeps the values between -1 and 1).
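Putting the three gates together, here is a minimal NumPy sketch of a single LSTM step; the weight names and the convention of concatenating the previous hidden state with the current input are assumptions made for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    z = np.concatenate([h_prev, x_t])   # previous hidden state + current input

    f_t = sigmoid(W_f @ z + b_f)        # forget gate: how much of c_prev to keep (0..1)
    i_t = sigmoid(W_i @ z + b_i)        # input gate: what new information to store
    c_hat = np.tanh(W_c @ z + b_c)      # candidate values to add to the state
    c_t = f_t * c_prev + i_t * c_hat    # linear cell-state update

    o_t = sigmoid(W_o @ z + b_o)        # output gate: what to expose
    h_t = o_t * np.tanh(c_t)            # new hidden state, values in (-1, 1)
    return h_t, c_t
```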
Memory
LSTM has an actual memory built into the architecture, which RNNs lack.
Vanishing/Exploding gradients
LSTMs can deal with these problems.
Accuracy
More accurate predictions for larger sequential data.
Long-term dependency
Able to capture complex patterns in huge datasets.
WHY LSTMs?
Thank you
E.J.Ssenjovu
