Recurrent Neural Networks. Part 1: Theory

Andrii Gakhov, Ph.D., Senior Software Engineer at Ferret-Go
RECURRENT NEURAL NETWORKS
PART 1: THEORY
ANDRII GAKHOV
10.12.2015
Techtalk @ ferret
FEEDFORWARD NEURAL NETWORKS
NEURAL NETWORKS: INTUITION
▸ A neural network is a computational graph whose nodes are
computing units and whose directed edges transmit numerical
information from node to node.
▸ Each computing unit (neuron) is capable of evaluating a single
primitive function (activation function) of its input.
▸ In fact, the network represents a chain of function compositions
that transforms an input vector into an output vector.
3
[Figure: a feedforward network with input layer (x1, x2, x3), hidden layer (h1 … h4) and output layer (y1, y2); W maps the input to the hidden layer, V maps the hidden layer to the output]
NEURAL NETWORKS: FNN
▸ The Feedforward neural network (FNN) is the most basic and widely
used artificial neural network. It consists of a number of layers of
computational units that are arranged into a layered configuration.
4
[Figure: input x, hidden layer h, output y; W maps input to hidden, V maps hidden to output]
h = g(Wx)
y = f(Vh)
Unlike the hidden layer units, the output layer units most
commonly use as their activation function:
‣ linear identity function (for regression problems)
‣ softmax (for classification problems).
▸ g - activation function for the hidden layer units
▸ f - activation function for the output layer units
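A minimal NumPy sketch of this forward pass (the layer sizes, tanh as g and softmax as f are illustrative assumptions, not fixed by the slide):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical sizes: 3 inputs, 4 hidden units, 2 output classes.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 3))   # input -> hidden weights
V = rng.normal(scale=0.1, size=(2, 4))   # hidden -> output weights

x = rng.normal(size=3)    # a single input vector
h = np.tanh(W @ x)        # h = g(Wx), with g = tanh for the hidden layer
y = softmax(V @ h)        # y = f(Vh), with f = softmax for classification
print(y, y.sum())         # class probabilities summing to 1
```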
NEURAL NETWORKS: POPULAR ACTIVATION FUNCTIONS
▸ Nonlinear “squashing” functions
▸ Easy to find the derivative
▸ Strengthen weak signals and don’t pay too much
attention to already strong signals
5
σ(x) = 1 / (1 + e^(−x)),  σ: ℝ → (0, 1)
tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)),  tanh: ℝ → (−1, 1)
[Plots: the sigmoid and tanh curves]
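A small NumPy sketch of both squashing functions; their derivatives are indeed easy to find and can be expressed through the function values themselves (the sample inputs are arbitrary):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # maps R into (0, 1)

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)              # derivative in terms of sigmoid itself

def tanh_prime(x):
    return 1.0 - np.tanh(x) ** 2      # derivative in terms of tanh itself

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))    # ~ [0.119, 0.5, 0.881]
print(np.tanh(x))    # ~ [-0.964, 0.0, 0.964]
```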
NEURAL NETWORKS: HOW TO TRAIN
▸ The main difficulties in neural network training:
▸ billions of model parameters
▸ a multi-objective problem
▸ the need for a high degree of parallelism
▸ the need to find a wide region where all minimised
functions are close to their minima
6
Image: Wikipedia
In general, training a neural network is an error minimization problem
NEURAL NETWORKS: BP
▸ A feedforward neural network can be trained with the
backpropagation algorithm (BP), a gradient descent algorithm in
which gradients are propagated backward through the network,
leading to very efficient computation of the weight changes in the
higher layers.
▸ The backpropagation algorithm consists of 2 phases:
▸ Propagation. Forward propagation of the inputs through the
neural network to generate the output values, followed by
backward propagation of the errors through the network to
calculate the error at each layer.
▸ Weight update. Calculate the gradients and correct the weights
proportionally to the negative gradient of the cost function.
7
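To make the two phases concrete, here is a minimal sketch for a single training example, assuming one hidden layer with tanh units, a linear output and a squared-error cost (all of these choices are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 3))      # input -> hidden
V = rng.normal(scale=0.1, size=(2, 4))      # hidden -> output
x = rng.normal(size=3)
target = rng.normal(size=2)
lr = 0.1

# Phase 1: propagation (forward pass, then errors per layer)
h = np.tanh(W @ x)                          # hidden activations
y = V @ h                                   # linear output
delta_out = y - target                      # output error for E = 0.5*||y - target||^2
delta_hid = (V.T @ delta_out) * (1 - h**2)  # error propagated back through tanh

# Phase 2: weight update (step along the negative gradient of the cost)
V -= lr * np.outer(delta_out, h)
W -= lr * np.outer(delta_hid, x)
```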
NEURAL NETWORKS: LIMITATIONS
▸ Feedforward neural network has several limitations due to
its architecture:
▸ accepts a fixed-sized vector as input (e.g. an image)
▸ produces a fixed-sized vector as output (e.g.
probabilities of different classes)
▸ performs such input-output mapping using a fixed
amount of computational steps (e.g. the number of
layers).
▸ These limitations make it really hard to model time series
problems when input and output are real-valued
sequences
8
RECURRENT NEURAL NETWORKS
RECURRENT NEURAL NETWORKS: INTUITION
▸ A recurrent neural network (RNN) is a neural network model
proposed in the 1980s for modelling time series.
▸ The structure of the network is similar to that of a feedforward
neural network, with the distinction that it allows a recurrent hidden
state whose activation at each timestep depends on that of the
previous timestep (a cycle).
[Figure: the RNN unrolled in time over steps 0, 1, …, t with inputs x0, x1, …, xt, hidden states h0, h1, …, ht and outputs y0, y1, …, yt; W maps input to hidden, U is the recurrent hidden-to-hidden matrix, V maps hidden to output]
10
RECURRENT NEURAL NETWORKS: SIMPLE RNN
▸ The time recurrence is introduced by relating the hidden layer
activity ht to its past hidden layer activity ht-1.
▸ This dependence is nonlinear because of the use of a logistic function.
11
ht = σ(W xt + U ht−1)
yt = f(V ht)
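A sketch of this forward recurrence in NumPy, assuming a logistic hidden activation and a softmax output (the sizes and the dummy sequence are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 3, 5, 2                     # illustrative sizes
W = rng.normal(scale=0.1, size=(n_hid, n_in))    # input -> hidden
U = rng.normal(scale=0.1, size=(n_hid, n_hid))   # hidden -> hidden (recurrent)
V = rng.normal(scale=0.1, size=(n_out, n_hid))   # hidden -> output

h = np.zeros(n_hid)                        # initial hidden state
for x_t in rng.normal(size=(4, n_in)):     # a short dummy input sequence
    h = sigmoid(W @ x_t + U @ h)           # h_t = sigma(W x_t + U h_{t-1})
    y_t = softmax(V @ h)                   # y_t = f(V h_t), here softmax
```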
RECURRENT NEURAL NETWORKS: BPTT
The unfolded recurrent neural network can be seen as a deep
neural network, except that the recurrent weights are tied. To
train it we can use a modification of the BP algorithm that
works on sequences in time - backpropagation through time
(BPTT).
‣ For each training epoch: start by training on shorter
sequences, and then train on progressively longer sequences
until the maximum sequence length is reached (1, 2, …, N−1, N).
‣ For each length of sequence k: unfold the network into a
normal feedforward network that has k hidden layers.
‣ Proceed with a standard BP algorithm.
12
RECURRENT NEURAL NETWORKS: TRUNCATED BPTT 13
Read more: http://www.cs.utoronto.ca/~ilya/pubs/ilya_sutskever_phd_thesis.pdf
One of the main problems of BPTT is the high cost of a single
parameter update, which makes it impossible to use a large
number of iterations.
‣ For instance, the gradient of an RNN on sequences of length 1000
costs the equivalent of a forward and a backward pass in a neural
network that has 1000 layers.
Truncated BPTT processes the sequence one timestep at a
time, and every T1 timesteps it runs BPTT for T2 timesteps, so
a parameter update can be cheap if T2 is small.
Truncated backpropagation is arguably the most practical
method for training RNNs
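A toy sketch of this loop structure with a tanh RNN, dummy data and a quadratic loss on the hidden state; it is only meant to show where the truncation happens (every T1 steps, backpropagate over the last T2 steps), not a complete training setup:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 3, 5
T1, T2, lr = 10, 20, 0.01                        # run BPTT every T1 steps, over the last T2 steps
W = rng.normal(scale=0.1, size=(n_hid, n_in))    # input -> hidden
U = rng.normal(scale=0.1, size=(n_hid, n_hid))   # hidden -> hidden (recurrent)

xs = rng.normal(size=(1000, n_in))        # dummy input sequence
ys = rng.normal(size=(1000, n_hid))       # dummy per-step targets for h_t

h = np.zeros(n_hid)
window = []                               # the last T2 steps: (x_t, h_{t-1}, h_t, y_t)

for t, (x, y) in enumerate(zip(xs, ys), start=1):
    h_prev = h
    h = np.tanh(W @ x + U @ h_prev)       # forward one timestep
    window = (window + [(x, h_prev, h, y)])[-T2:]

    if t % T1 == 0:                       # truncated BPTT over the stored window only
        dW, dU = np.zeros_like(W), np.zeros_like(U)
        dh_next = np.zeros(n_hid)         # gradient arriving from later timesteps
        for x_s, h_prev_s, h_s, y_s in reversed(window):
            dh = (h_s - y_s) + dh_next    # d(0.5*||h_s - y_s||^2)/dh_s + recurrent part
            dz = dh * (1.0 - h_s ** 2)    # back through tanh
            dW += np.outer(dz, x_s)
            dU += np.outer(dz, h_prev_s)
            dh_next = U.T @ dz            # send the gradient one step further back
        W -= lr * dW                      # cheap update: cost grows with T2, not with t
        U -= lr * dU
```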
RECURRENT NEURAL NETWORKS: PROBLEMS
While in principle the recurrent network is a simple and powerful
model, in practice, it is unfortunately hard to train properly.
▸ PROBLEM: In the gradient back-propagation phase, the gradient
signal is multiplied a large number of times by the weight matrix
associated with the recurrent connection (λ1 below denotes the
largest eigenvalue of that matrix).
▸ If |λ1| < 1 (weights are small) => Vanishing gradient problem
▸ If |λ1| > 1 (weights are big) => Exploding gradient problem
▸ SOLUTION:
▸ Exploding gradient problem => clip the norm of the
gradients when it grows too large (a minimal sketch follows below)
▸ Vanishing gradient problem => relax the nonlinear dependency to
a linear dependency => LSTM, GRU, etc.
Read more: Razvan Pascanu et al. http://arxiv.org/pdf/1211.5063.pdf
14
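A minimal sketch of gradient norm clipping (the threshold value is an arbitrary illustration):

```python
import numpy as np

def clip_gradient_norm(grads, threshold=5.0):
    """Rescale a list of gradient arrays so their global L2 norm never exceeds `threshold`."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > threshold:
        grads = [g * (threshold / total_norm) for g in grads]
    return grads

# Example: a deliberately "exploded" gradient gets rescaled back to the threshold.
big = [np.full((3, 3), 100.0)]
print(np.linalg.norm(clip_gradient_norm(big)[0]))   # -> 5.0
```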
LONG SHORT-TERM MEMORY
LONG SHORT-TERM MEMORY (LSTM)
▸ Proposed by Hochreiter & Schmidhuber (1997) and since then
has been modified by many researchers.
▸ The LSTM architecture consists of a set of recurrently
connected subnets, known as memory blocks.
▸ Each memory block consists of:
▸ memory cell - stores the state
▸ input gate - controls what to learn
▸ forget gate - controls what to forget
▸ output gate - controls the amount of the memory content to expose
▸ Unlike the traditional recurrent unit, which overwrites its
content at each timestep, the LSTM unit is able to decide
whether to keep the existing memory via the introduced gates.
16
LSTM
[Figure: the LSTM unrolled over timesteps 0, 1, …, t with inputs x0, x1, …, xt and outputs y0, y1, …, yt; memory blocks replace the hidden units]
▸ Model parameters:
▸ xt is the input at time t
▸ Weight matrices: Wi, Wf, Wc, Wo, Ui, Uf, Uc, Uo, Vo
▸ Bias vectors: bi, bf, bc, bo
17
The basic unit in the hidden layer of an LSTM is the memory
block that replaces the hidden units in a “traditional” RNN
LSTM MEMORY BLOCK
[Figure: a memory block with input xt, output yt, hidden states ht-1 → ht and cell states Ct-1 → Ct]
▸ A memory block is a subnet that allows an LSTM unit to
adaptively forget, memorise and expose its memory content.
18
it = σ(Wi xt + Ui ht−1 + bi)
C̅t = tanh(Wc xt + Uc ht−1 + bc)
ft = σ(Wf xt + Uf ht−1 + bf)
Ct = ft ⋅ Ct−1 + it ⋅ C̅t
ot = σ(Wo xt + Uo ht−1 + Vo Ct)
ht = ot ⋅ tanh(Ct)
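A NumPy sketch of one memory block step following these equations (the sizes and random initialisation are illustrative; the output-gate bias bo, listed among the model parameters but not written in the ot formula above, is included here as an assumption):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, C_prev, p):
    """One LSTM memory block step; `p` holds the weight matrices W*, U*, Vo and biases b*."""
    i_t   = sigmoid(p["Wi"] @ x_t + p["Ui"] @ h_prev + p["bi"])       # input gate
    C_bar = np.tanh(p["Wc"] @ x_t + p["Uc"] @ h_prev + p["bc"])       # candidate content
    f_t   = sigmoid(p["Wf"] @ x_t + p["Uf"] @ h_prev + p["bf"])       # forget gate
    C_t   = f_t * C_prev + i_t * C_bar                                # new cell state
    o_t   = sigmoid(p["Wo"] @ x_t + p["Uo"] @ h_prev + p["Vo"] @ C_t + p["bo"])  # output gate
    h_t   = o_t * np.tanh(C_t)                                        # new hidden state
    return h_t, C_t

rng = np.random.default_rng(0)
n_in, n_hid = 4, 8
p = {}
for g in "ifco":
    p["W" + g] = rng.normal(scale=0.1, size=(n_hid, n_in))
    p["U" + g] = rng.normal(scale=0.1, size=(n_hid, n_hid))
    p["b" + g] = np.zeros(n_hid)
p["Vo"] = rng.normal(scale=0.1, size=(n_hid, n_hid))

h, C = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):    # a short dummy input sequence
    h, C = lstm_step(x_t, h, C, p)
```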
LSTM MEMORY BLOCK: INPUT GATE
it = σ(Wi xt + Ui ht−1 + bi)
▸ The input gate it controls the degree to which the new
memory content is added to the memory cell
19
LSTM MEMORY BLOCK: CANDIDATES
C̅t = tanh(Wc xt + Uc ht−1 + bc)
▸ The values C̅t are candidates for the state of the memory
cell (which may later be filtered by the input gate decision)
20
LSTM MEMORY BLOCK: FORGET GATE
ft = σ(Wf xt + Uf ht−1 + bf)
▸ If the detected feature seems important, the forget gate ft stays
close to 1 and the memory cell carries this information across many
timesteps; otherwise ft can reset the memory content.
21
LSTM MEMORY BLOCK: FORGET GATE
▸ Sometimes it’s good to forget.
If you’re analyzing a text corpus and come to the end of a
document you may have no reason to believe that the next
document has any relationship to it whatsoever, and
therefore the memory cell should be reset before the
network gets the first element of the next document.
▸ In many cases by reset we don’t only mean immediately setting
it to 0, but also gradual resets corresponding to slowly
fading cell states
22
LSTM MEMORY BLOCK: MEMORY CELL
Ct = ft ⋅ Ct−1 + it ⋅ C̅t
▸ The new state of the memory cell Ct is calculated by partially
forgetting the existing memory content Ct-1 and adding new
memory content C̅t
23
LSTM MEMORY BLOCK: OUTPUT GATE
▸ The output gate ot controls the amount of the memory
content to yield to the next hidden state
ot = σ(Wo xt + Uo ht−1 + Vo Ct)
24
LSTM MEMORY BLOCK: HIDDEN STATE
ht = ot ⋅ tanh(Ct)
25
LSTM MEMORY BLOCK: ALL TOGETHER
[Figure: the complete memory block: input gate it, candidate C̅t, forget gate ft, cell state Ct, output gate ot and hidden state ht]
26
yt = f(Vy ht)
GATED RECURRENT UNIT
GATED RECURRENT UNIT (GRU)
▸ Proposed by Cho et al. [2014].
▸ It is similar to LSTM in using gating functions, but differs
from LSTM in that it doesn’t have a memory cell.
▸ Each GRU consists of:
▸ update gate
▸ reset gate
28
▸ Model parameters:
▸ xt is the input at time t
▸ Weight matrices: Wz, Wr, WH, Uz, Ur, UH
GRU 29
zt = σ(Wz xt + Uz ht−1)
rt = σ(Wr xt + Ur ht−1)
Ht = tanh(WH xt + UH (rt ⋅ ht−1))
ht = (1 − zt) ⋅ ht−1 + zt ⋅ Ht
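A NumPy sketch of one GRU step following these equations (sizes and random initialisation are illustrative; biases are omitted, matching the formulas above):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, p):
    """One GRU step; `p` holds the weight matrices Wz, Wr, WH, Uz, Ur, UH."""
    z_t = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev)            # update gate
    r_t = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev)            # reset gate
    H_t = np.tanh(p["WH"] @ x_t + p["UH"] @ (r_t * h_prev))    # candidate activation
    h_t = (1.0 - z_t) * h_prev + z_t * H_t                     # new hidden state
    return h_t

rng = np.random.default_rng(0)
n_in, n_hid = 4, 8
p = {}
for g in ("z", "r", "H"):
    p["W" + g] = rng.normal(scale=0.1, size=(n_hid, n_in))
    p["U" + g] = rng.normal(scale=0.1, size=(n_hid, n_hid))

h = np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):   # a short dummy input sequence
    h = gru_step(x_t, h, p)
```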
GRU: UPDATE GATE
zt = σ(Wz xt + Uz ht−1)
30
▸ The update gate zt decides how much the unit updates its
activation (content)
GRU: RESET GATE
▸ When rt is close to 0 (gate off), the unit acts as if it were
reading the first symbol of the input sequence, allowing
it to forget previously computed states
rt = σ(Wr xt + Ur ht−1)
31
GRU: CANDIDATE ACTIVATION
▸ The candidate activation Ht is computed like the activation of a
simple recurrent unit, but with the previous state ht-1 modulated
by the reset gate rt.
Ht = tanh(WH xt + UH (rt ⋅ ht−1))
32
GRU: HIDDEN STATE
▸ The activation ht at time t is a linear interpolation between the
previous activation ht-1 and the candidate activation Ht
ht = (1 − zt) ⋅ ht−1 + zt ⋅ Ht
33
GRU: ALL TOGETHER 34
[Figure: the complete GRU: update gate zt, reset gate rt, candidate activation Ht and hidden state ht]
READ MORE
▸ Supervised Sequence Labelling with Recurrent Neural Networks

http://www.cs.toronto.edu/~graves/preprint.pdf
▸ On the difficulty of training Recurrent Neural Networks

http://arxiv.org/pdf/1211.5063.pdf
▸ Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

http://arxiv.org/pdf/1412.3555v1.pdf
▸ Learning Phrase Representations using RNN Encoder–Decoder for Statistical
Machine Translation

http://arxiv.org/pdf/1406.1078v3.pdf
▸ Understanding LSTM Networks

http://colah.github.io/posts/2015-08-Understanding-LSTMs/
▸ General Sequence Learning using Recurrent Neural Networks

https://www.youtube.com/watch?v=VINCQghQRuM
35
END OF PART 1
▸ @gakhov
▸ linkedin.com/in/gakhov
▸ www.datacrucis.com
36
THANK YOU