RECURRENT NEURAL NETWORKS
PART 1: THEORY
ANDRII GAKHOV
10.12.2015
Techtalk @ ferret
FEEDFORWARD NEURAL NETWORKS
NEURAL NETWORKS: INTUITION
▸ A neural network is a computational graph whose nodes are computing units and whose directed edges transmit numerical information from node to node.
▸ Each computing unit (neuron) is capable of evaluating a single
primitive function (activation function) of its input.
▸ In fact, the network represents a chain of function compositions that transforms an input vector into an output vector.
3
[Figure: a feedforward network with input layer (x1, x2, x3), hidden layer (h1, h2, h3, h4) and output layer (y1, y2); the weight matrix W connects the input layer to the hidden layer h, and V connects the hidden layer to the output y]
NEURAL NETWORKS: FNN
▸ The feedforward neural network (FNN) is the most basic and most widely used artificial neural network. It consists of a number of layers of computational units arranged in a layered configuration.
4
h = g(Wx)
y = f(Vh)
Unlike the hidden layers of a neural network, the output layer units most commonly use as activation function:
‣ the linear identity function (for regression problems)
‣ softmax (for classification problems).
▸ g - activation function for the hidden layer units
▸ f - activation function for the output layer units
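A minimal numpy sketch of this forward pass (the dimensions, the random initialisation and the choice of tanh/softmax are assumptions for illustration only):

```python
import numpy as np

def fnn_forward(x, W, V):
    """Forward pass of a one-hidden-layer FNN: h = g(Wx), y = f(Vh)."""
    h = np.tanh(W @ x)                          # g: tanh activation for the hidden layer
    logits = V @ h
    return np.exp(logits) / np.exp(logits).sum()   # f: softmax for a classification output

# Hypothetical sizes: 3 inputs, 4 hidden units, 2 output classes
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
V = rng.normal(size=(2, 4))
x = np.array([0.5, -1.0, 2.0])
print(fnn_forward(x, W, V))                     # two class probabilities summing to 1
```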
NEURAL NETWORKS: POPULAR ACTIVATION FUNCTIONS
▸ Nonlinear “squashing” functions
▸ Easy to find the derivative
▸ Strengthen weak signals and don't pay too much attention to already strong signals
5
σ(x) = 1 / (1 + e^(−x)),  σ: ℝ → (0, 1)
tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)),  tanh: ℝ → (−1, 1)
[Plots: sigmoid (left) and tanh (right)]
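A small numpy sketch of both squashing functions and their derivatives (the sample points are arbitrary), illustrating why the derivatives are easy to compute:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)            # derivative expressed through the function itself

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2    # the same convenient property holds for tanh

x = np.linspace(-5, 5, 5)
print(sigmoid(x))                   # values squashed into (0, 1)
print(np.tanh(x))                   # values squashed into (-1, 1)
print(sigmoid_grad(x), tanh_grad(x))
```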
NEURAL NETWORKS: HOW TO TRAIN
▸ The main problems for neural network training:
▸ billions of parameters in the model
▸ a multi-objective problem
▸ the requirement for a high level of parallelism
▸ the requirement to find a wide domain where all minimised functions are close to their minima
6
Image: Wikipedia
In general, training a neural network is an error minimisation problem.
NEURAL NETWORKS: BP
▸ A feedforward neural network can be trained with the backpropagation algorithm (BP): a gradient descent algorithm in which the gradients are propagated backward through the network, leading to very efficient computation of the weight changes in the higher layers.
▸ The backpropagation algorithm consists of 2 phases:
▸ Propagation. Forward propagation of the inputs through the neural network to generate the output values, followed by backward propagation of the output errors through the network to calculate the error at each layer.
▸ Weight update. Calculate the gradients and correct the weights proportionally to the negative gradient of the cost function (see the sketch below).
7
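A hedged illustration of the two phases on a one-hidden-layer network (tanh hidden layer, linear output, squared-error cost; all sizes, data and the learning rate are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(1)
W, V = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))   # hidden and output weights
x, target = rng.normal(size=3), rng.normal(size=2)        # one made-up training example
lr = 0.1

for step in range(100):
    # Phase 1: propagation - forward pass, then propagate the error backward
    h = np.tanh(W @ x)
    y = V @ h                              # linear output (regression setting)
    err_y = y - target                     # gradient of 0.5 * ||y - target||^2 w.r.t. y
    err_h = (V.T @ err_y) * (1 - h ** 2)   # error propagated to the hidden layer

    # Phase 2: weight update - move against the gradient of the cost
    V -= lr * np.outer(err_y, h)
    W -= lr * np.outer(err_h, x)

print(np.abs(V @ np.tanh(W @ x) - target).max())   # the error shrinks toward 0
```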
NEURAL NETWORKS: LIMITATIONS
▸ A feedforward neural network has several limitations due to its architecture:
▸ it accepts a fixed-sized vector as input (e.g. an image)
▸ it produces a fixed-sized vector as output (e.g. probabilities of different classes)
▸ it performs such input-output mapping using a fixed number of computational steps (e.g. the number of layers).
▸ These limitations make it really hard to model time-series problems where the input and output are real-valued sequences.
8
RECURRENT NEURAL NETWORKS
RECURRENT NEURAL NETWORKS: INTUITION
▸ A recurrent neural network (RNN) is a neural network model proposed in the 80s for modelling time series.
▸ The structure of the network is similar to that of a feedforward neural network, with the distinction that it allows a recurrent hidden state whose activation at each time step depends on that of the previous time step (a cycle).
[Figure: an RNN with input x, hidden state h and output y (W maps the input to the hidden state, U is the recurrent connection, V maps the hidden state to the output), unfolded in time over steps 0, 1, …, t with inputs x0, x1, …, xt, hidden states h0, h1, …, ht and outputs y0, y1, …, yt]
10
RECURRENT NEURAL NETWORKS: SIMPLE RNN
▸ The time recurrence is introduced by relating the hidden layer activity ht to its past hidden layer activity ht-1.
▸ This dependence is nonlinear because of the use of a logistic (sigmoid) function.
11
ht = σ(Wxt + Uht−1)
yt = f(Vht)
[Figure: one RNN timestep: the input xt and the previous hidden state ht-1 produce the new hidden state ht and the output yt]
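A minimal numpy sketch of these two equations unrolled over a short input sequence (dimensions, weights and the identity output f are arbitrary assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rnn_forward(xs, W, U, V, h0):
    """h_t = sigma(W x_t + U h_{t-1}); y_t = f(V h_t), with f = identity here."""
    h, ys = h0, []
    for x_t in xs:
        h = sigmoid(W @ x_t + U @ h)   # recurrent hidden state update
        ys.append(V @ h)               # output at time t
    return ys, h

# Arbitrary sizes: 3 inputs, 5 hidden units, 2 outputs, sequence of length 4
rng = np.random.default_rng(2)
W = rng.normal(size=(5, 3))
U = rng.normal(size=(5, 5))
V = rng.normal(size=(2, 5))
xs = [rng.normal(size=3) for _ in range(4)]
ys, h_last = rnn_forward(xs, W, U, V, np.zeros(5))
print(len(ys), h_last.shape)           # 4 outputs, final hidden state of size 5
```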
RECURRENT NEURAL NETWORKS: BPTT
The unfolded recurrent neural network can be seen as a deep
neural network, except that the recurrent weights are tied. To
train it we can use a modification of the BP algorithm that
works on sequences in time - backpropagation through time
(BPTT).
‣ For each training epoch: start by training on shorter sequences, and then train on progressively longer sequences until the maximum sequence length is reached (1, 2, …, N−1, N).
‣ For each length of sequence k: unfold the network into a
normal feedforward network that has k hidden layers.
‣ Proceed with a standard BP algorithm.
12
RECURRENT NEURAL NETWORKS: TRUNCATED BPTT 13
Read more: http://www.cs.utoronto.ca/~ilya/pubs/ilya_sutskever_phd_thesis.pdf
One of the main problems of BPTT is the high cost of a single
parameter update, which makes it impossible to use a large
number of iterations.
‣ For instance, the gradient of an RNN on sequences of length 1000
costs the equivalent of a forward and a backward pass in a neural
network that has 1000 layers.
Truncated BPTT processes the sequence one timestep at a
time, and every T1 timesteps it runs BPTT for T2 timesteps, so
a parameter update can be cheap if T2 is small.
Truncated backpropagation is arguably the most practical method for training RNNs (a schematic sketch follows below).
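A simplified, runnable sketch of the truncated-BPTT schedule on the simple RNN above: the sequence is consumed one timestep at a time, and every T1 steps the error is backpropagated through only the last T2 buffered steps. The toy next-value task, the loss being injected only at the end of each window, and all sizes are assumptions made for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(3)
n_in = n_out = 1
n_hid = 8
W = rng.normal(scale=0.5, size=(n_hid, n_in))
U = rng.normal(scale=0.5, size=(n_hid, n_hid))
V = rng.normal(scale=0.5, size=(n_out, n_hid))
lr, T1, T2 = 0.05, 4, 8                 # update every T1 steps, backprop through T2 steps

# Toy task (assumption): predict the next value of a sine wave
seq = np.sin(np.linspace(0, 8 * np.pi, 400)).reshape(-1, 1)

h = np.zeros(n_hid)
window = []                             # the last few (x_t, h_{t-1}, h_t) triples
for t in range(len(seq) - 1):
    x_t, target = seq[t], seq[t + 1]
    h_prev = h
    h = sigmoid(W @ x_t + U @ h_prev)
    window = (window + [(x_t, h_prev, h)])[-T2:]   # keep at most T2 timesteps

    if (t + 1) % T1 == 0:               # run BPTT over the buffered window only
        dW, dU, dV = np.zeros_like(W), np.zeros_like(U), np.zeros_like(V)
        y = V @ window[-1][2]
        dy = y - target                 # squared-error gradient, injected at the window end only
        dV += np.outer(dy, window[-1][2])
        dh = V.T @ dy
        for x_k, h_prev_k, h_k in reversed(window):
            dz = dh * h_k * (1 - h_k)   # backprop through the sigmoid
            dW += np.outer(dz, x_k)
            dU += np.outer(dz, h_prev_k)
            dh = U.T @ dz               # push the error one timestep further back
        for P, dP in ((W, dW), (U, dU), (V, dV)):
            P -= lr * dP                # cheap update: cost grows with T2, not with t

print("final one-step prediction error:", abs((V @ h) - seq[-1])[0])
```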
RECURRENT NEURAL NETWORKS: PROBLEMS
While in principle the recurrent network is a simple and powerful
model, in practice, it is unfortunately hard to train properly.
▸ PROBLEM: In the gradient back-propagation phase, the gradient signal is multiplied a large number of times (once per timestep) by the weight matrix associated with the recurrent connection. Let λ1 be the largest absolute eigenvalue of that matrix:
▸ If |λ1| < 1 (weights are small) => Vanishing gradient problem
▸ If |λ1| > 1 (weights are big) => Exploding gradient problem
▸ SOLUTION:
▸ Exploding gradient problem => clip the norm of the exploding gradients when it is too large (see the sketch below)
▸ Vanishing gradient problem => relax the nonlinear dependency to a linear dependency => LSTM, GRU, etc.
Read more: Razvan Pascanu et al. http://arxiv.org/pdf/1211.5063.pdf
14
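A small sketch of the clipping remedy for exploding gradients: rescale the gradient when its global norm exceeds a threshold (the threshold value here is arbitrary):

```python
import numpy as np

def clip_gradients(grads, max_norm=5.0):
    """Rescale a list of gradient arrays so that their global norm does not exceed max_norm."""
    total_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

# Example: two exploded gradients with a global norm of about 35 are rescaled to norm 5
grads = [np.full((3, 3), 10.0), np.full(3, 10.0)]
clipped = clip_gradients(grads, max_norm=5.0)
print(np.sqrt(sum(np.sum(g ** 2) for g in clipped)))   # ~5.0
```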
LONG SHORT-TERM MEMORY
LONG SHORT-TERM MEMORY (LSTM)
▸ Proposed by Hochreiter & Schmidhuber (1997) and since then
has been modified by many researchers.
▸ The LSTM architecture consists of a set of recurrently
connected subnets, known as memory blocks.
▸ Each memory block consists of:
▸ memory cell - stores the state
▸ input gate - controls what to learn
▸ forget gate - controls what to forget
▸ output gate - controls the amount of memory content to expose
▸ Unlike the traditional recurrent unit, which overwrites its content each timestep, the LSTM unit is able to decide whether to keep the existing memory via the introduced gates.
16
LSTM
[Figure: an LSTM layer unfolded in time over steps 0, 1, …, t, with inputs x0, x1, …, xt, a memory block at each step, and outputs y0, y1, …, yt]
▸ Model parameters:
▸ xt is the input at time t
▸ Weight matrices: Wi, Wf, Wc, Wo, Ui, Uf, Uc, Uo, Vo
▸ Bias vectors: bi, bf, bc, bo
17
The basic unit in the hidden layer of an LSTM is the memory
block that replaces the hidden units in a “traditional” RNN
LSTM MEMORY BLOCK
[Figure: LSTM memory block at time t: input xt, previous state (ht-1, Ct-1), new state (ht, Ct), output yt]
▸ A memory block is a subnet that allows an LSTM unit to adaptively forget, memorise and expose the memory content.
18
it = σ(Wi xt + Ui ht−1 + bi)
C̅t = tanh(Wc xt + Uc ht−1 + bc)
ft = σ(Wf xt + Uf ht−1 + bf)
Ct = ft ⋅ Ct−1 + it ⋅ C̅t
ot = σ(Wo xt + Uo ht−1 + Vo Ct)
ht = ot ⋅ tanh(Ct)
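A direct numpy transcription of the six memory-block equations above for a single timestep (the dimensions and the tiny random initialisation are arbitrary; the bias bo from the parameter slide is added to the output gate):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, C_prev, p):
    """One LSTM memory-block step following the equations on this slide."""
    i_t  = sigmoid(p["Wi"] @ x_t + p["Ui"] @ h_prev + p["bi"])          # input gate
    Cbar = np.tanh(p["Wc"] @ x_t + p["Uc"] @ h_prev + p["bc"])          # candidate content
    f_t  = sigmoid(p["Wf"] @ x_t + p["Uf"] @ h_prev + p["bf"])          # forget gate
    C_t  = f_t * C_prev + i_t * Cbar                                    # new memory cell state
    o_t  = sigmoid(p["Wo"] @ x_t + p["Uo"] @ h_prev + p["Vo"] @ C_t + p["bo"])  # output gate
    h_t  = o_t * np.tanh(C_t)                                           # new hidden state
    return h_t, C_t

n_in, n_hid = 3, 5                      # arbitrary sizes for the example
rng = np.random.default_rng(4)
p = {name: rng.normal(scale=0.1, size=(n_hid, n_in if name[0] == "W" else n_hid))
     for name in ("Wi", "Wc", "Wf", "Wo", "Ui", "Uc", "Uf", "Uo", "Vo")}
p.update({name: np.zeros(n_hid) for name in ("bi", "bc", "bf", "bo")})
h, C = np.zeros(n_hid), np.zeros(n_hid)
h, C = lstm_step(rng.normal(size=n_in), h, C, p)
print(h.shape, C.shape)                 # (5,) (5,)
```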
LSTM MEMORY BLOCK: INPUT GATE
it = σ(Wi xt + Ui ht−1 + bi)
▸ The input gate it controls the degree to which the new
memory content is added to the memory cell
19
LSTM MEMORY BLOCK: CANDIDATES
C̅t = tanh(Wc xt + Uc ht−1 + bc)
▸ The values C̅ t are candidates for the state of the memory
cell (that could be filtered by input gate decision later)
20
LSTM MEMORY BLOCK: FORGET GATE
ft = σ(Wf xt + Uf ht−1 + bf)
▸ If the detected feature seems important, the forget gate ft will
be closed and carry information about it across many timesteps,
otherwise it can reset the memory content.
21
LSTM MEMORY BLOCK: FORGET GATE
▸ Sometimes it’s good to forget.
If you’re analyzing a text corpus and come to the end of a
document you may have no reason to believe that the next
document has any relationship to it whatsoever, and
therefore the memory cell should be reset before the
network gets the first element of the next document.
▸ In many cases, by reset we mean not only immediately setting it to 0, but also gradual resets corresponding to slowly fading cell states.
22
LSTM MEMORY BLOCK: MEMORY CELL
Ct = ft ⋅ Ct−1 + it ⋅ C̅t
▸ The new state of the memory cell Ct is calculated by partially forgetting the existing memory content Ct-1 and adding the new memory content C̅t.
23
LSTM MEMORY BLOCK: OUTPUT GATE
▸ The output gate ot controls the amount of the memory
content to yield to the next hidden state
ot = σ(Wo xt + Uo ht−1 + Vo Ct)
24
LSTM MEMORY BLOCK: HIDDEN STATE
ht = ot ⋅ tanh(Ct)
25
LSTM MEMORY BLOCK: ALL TOGETHER
[Figure: the complete LSTM memory block: input gate it, candidate content C̅t, forget gate ft, memory cell Ct, output gate ot and hidden state ht]
26
yt = f(Vy ht)
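A short usage sketch that continues the lstm_step example above: the memory block is unrolled over a sequence and each hidden state is mapped to an output with yt = f(Vy ht), where f is assumed to be a softmax and Vy is initialised arbitrarily.

```python
import numpy as np

# Continues the lstm_step sketch above (reuses lstm_step, p, n_in and n_hid from it).
def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(5)
n_out, T = 2, 6
Vy = rng.normal(scale=0.1, size=(n_out, n_hid))   # output weights, arbitrary initialisation

h, C = np.zeros(n_hid), np.zeros(n_hid)
ys = []
for t in range(T):
    x_t = rng.normal(size=n_in)          # a made-up input at time t
    h, C = lstm_step(x_t, h, C, p)       # one pass through the memory block
    ys.append(softmax(Vy @ h))           # y_t = f(Vy h_t) with f = softmax
print(len(ys), float(ys[-1].sum()))      # T outputs, each a probability vector summing to 1
```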
GATED RECURRENT UNIT
GATED RECURRENT UNIT (GRU)
▸ Proposed by Cho et al. [2014].
▸ It is similar to LSTM in using gating functions, but differs
from LSTM in that it doesn’t have a memory cell.
▸ Each GRU consists of:
▸ update gate
▸ reset gate
28
▸ Model parameters:
▸ xt is the input at time t
▸ Weight matrices: Wz, Wr, WH, Uz, Ur, UH
GRU 29
zt = σ(Wz xt + Uz ht−1)
rt = σ(Wr xt + Ur ht−1)
Ht = tanh(WH xt + UH (rt ⋅ ht−1))
ht = (1 − zt) ht−1 + zt Ht
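The same four equations as a one-timestep numpy sketch (arbitrary sizes; note that the reset gate is applied to ht-1 before computing the candidate activation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, p):
    """One GRU step following the four equations above."""
    z_t = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev)           # update gate
    r_t = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev)           # reset gate
    H_t = np.tanh(p["WH"] @ x_t + p["UH"] @ (r_t * h_prev))   # candidate activation
    return (1 - z_t) * h_prev + z_t * H_t                     # interpolated new hidden state

n_in, n_hid = 3, 5                      # arbitrary sizes for the example
rng = np.random.default_rng(6)
p = {name: rng.normal(scale=0.1, size=(n_hid, n_in if name.startswith("W") else n_hid))
     for name in ("Wz", "Wr", "WH", "Uz", "Ur", "UH")}
h = np.zeros(n_hid)
h = gru_step(rng.normal(size=n_in), h, p)
print(h.shape)                          # (5,)
```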
GRU: UPDATE GATE
zt = σ(Wz xt + Uz ht−1)
30
▸ The update gate zt decides how much the unit updates its activation or content
GRU: RESET GATE
▸ When rt is close to 0 (gate off), the unit acts as if it were reading the first symbol of the input sequence, allowing it to forget previously computed states
rt = σ(Wr xt + Ur ht−1)
31
GRU: CANDIDATE ACTIVATION
▸ The candidate activation Ht is computed similarly to that of a traditional recurrent unit, but with the reset gate rt applied to the previous hidden state ht-1.
Ht = tanh(WH xt + UH (rt ⋅ ht−1))
32
GRU: HIDDEN STATE
▸ The activation at time t is a linear interpolation between the previous activation ht-1 and the candidate activation Ht
ht = (1 − zt) ht−1 + zt Ht
33
GRU: ALL TOGETHER 34
[Figure: the complete GRU: update gate zt, reset gate rt, candidate activation Ht and hidden state ht]
READ MORE
▸ Supervised Sequence Labelling with Recurrent Neural Networks

http://www.cs.toronto.edu/~graves/preprint.pdf
▸ On the difficulty of training Recurrent Neural Networks

http://arxiv.org/pdf/1211.5063.pdf
▸ Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

http://arxiv.org/pdf/1412.3555v1.pdf
▸ Learning Phrase Representations using RNN Encoder–Decoder for Statistical
Machine Translation

http://arxiv.org/pdf/1406.1078v3.pdf
▸ Understanding LSTM Networks

http://colah.github.io/posts/2015-08-Understanding-LSTMs/
▸ General Sequence Learning using Recurrent Neural Networks

https://www.youtube.com/watch?v=VINCQghQRuM
35
END OF PART 1
▸ @gakhov
▸ linkedin.com/in/gakhov
▸ www.datacrucis.com
36
THANK YOU
