Sequence Modeling: Recurrent and Recursive Nets
Lecture slides for Chapter 10 of Deep Learning
www.deeplearningbook.org
Ian Goodfellow
2016-09-27
(Goodfellow 2016)
Classical Dynamical Systems
s(3) = f(s(2); θ) = f(f(s(1); θ); θ)

Unfolding the equation by repeatedly applying the definition in this way has yielded an expression that does not involve recurrence. Such an expression can now be represented by a traditional directed acyclic computational graph. The unfolded computational graph of equation 10.1 and equation 10.3 is illustrated in figure 10.1.
Figure 10.1: The classical dynamical system described by equation 10.1, illustrated as an unfolded computational graph. Each node represents the state at some time t, and the function f maps the state at t to the state at t + 1. The same parameters (the same value of θ used to parametrize f) are used for all time steps.

As another example, let us consider a dynamical system driven by an external signal x(t): s(t) = f(s(t−1), x(t); θ).
Figure 10.1
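As a concrete illustration of the unrolled recurrence s(t) = f(s(t−1); θ), here is a minimal numpy sketch (not from the slides; choosing f as an affine map followed by tanh is an assumption made only for this example):

```python
import numpy as np

def f(s, theta):
    # One step of the dynamical system; an illustrative affine-plus-tanh choice of f.
    W, b = theta
    return np.tanh(W @ s + b)

def unroll(s1, theta, tau):
    # s(t) = f(s(t-1); theta) applied repeatedly, with the same theta at every step.
    s = s1
    for _ in range(tau - 1):
        s = f(s, theta)
    return s

rng = np.random.default_rng(0)
theta = (0.5 * rng.standard_normal((3, 3)), np.zeros(3))
s5 = unroll(rng.standard_normal(3), theta, tau=5)  # s(5) = f(f(f(f(s(1); θ); θ); θ); θ)
```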
(Goodfellow 2016)
Unfolding Computation Graphs
Depending on the training criterion, this summary might selectively keep some aspects of the past
sequence with more precision than other aspects. For example, if the RNN is used
in statistical language modeling, typically to predict the next word given previous
words, it may not be necessary to store all of the information in the input sequence
up to time t, but rather only enough information to predict the rest of the sentence.
The most demanding situation is when we ask h(t) to be rich enough to allow
one to approximately recover the input sequence, as in autoencoder frameworks
(chapter 14).
Figure 10.2: A recurrent network with no outputs. This recurrent network just processes
information from the input x by incorporating it into the state h that is passed forward
through time. (Left)Circuit diagram. The black square indicates a delay of a single time
step. (Right)The same network seen as an unfolded computational graph, where each
node is now associated with one particular time instance.
Equation 10.5 can be drawn in two different ways. One way to draw the RNN is with a diagram containing one node for every component that might exist in a physical implementation of the model, such as a biological neural network.
Figure 10.2
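A minimal sketch of the recurrence h(t) = f(h(t−1), x(t); θ) that figure 10.2 unfolds, reusing the same parameters at every step (the tanh transition and the weight names W, U, b are assumptions for the example, not taken from the slides):

```python
import numpy as np

def f(h_prev, x, theta):
    # The shared transition applied at every time step: h(t) = f(h(t-1), x(t); theta).
    W, U, b = theta
    return np.tanh(W @ h_prev + U @ x + b)

def unfold(x_seq, h0, theta):
    # Unfold the recurrence over the whole input sequence; the unfolded graph grows
    # with the sequence length, but the parameters theta stay the same.
    states, h = [], h0
    for x in x_seq:
        h = f(h, x, theta)
        states.append(h)
    return states  # h(1), ..., h(tau)
```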
(Goodfellow 2016)
Recurrent Hidden Units
10.2 Recurrent Neural Networks
Armed with the graph unrolling and parameter sharing ideas of section 10.1, we
can design a wide variety of recurrent neural networks.
Figure 10.3: The computational graph to compute the training loss of a recurrent network
that maps an input sequence of x values to a corresponding sequence of output o values.
A loss L measures how far each o is from the corresponding training target y. When using softmax outputs, we assume o is the unnormalized log probabilities; the loss L internally computes ŷ = softmax(o) and compares this to the target y.
Figure 10.3
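A hedged sketch of the forward pass and training loss for the network of figure 10.3, with input-to-hidden weights U, hidden-to-hidden weights W, hidden-to-output weights V, a tanh hidden unit, and a softmax output compared to integer targets (the exact parameterization here is an illustrative assumption):

```python
import numpy as np

def softmax(o):
    e = np.exp(o - o.max())
    return e / e.sum()

def rnn_loss(x_seq, y_seq, params, h0):
    # Forward pass through time, accumulating the per-step negative log-likelihood L(t).
    U, V, W, b, c = params
    h, total_loss = h0, 0.0
    for x, y in zip(x_seq, y_seq):       # y is the integer target class at each step
        h = np.tanh(b + W @ h + U @ x)   # hidden state h(t)
        y_hat = softmax(c + V @ h)       # output distribution from o(t)
        total_loss += -np.log(y_hat[y])  # L(t) = -log P(y(t) | x(1), ..., x(t))
    return total_loss
```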
(Goodfellow 2016)
Recurrence through only the Output
Figure 10.4: An RNN whose only recurrence is the feedback connection from the output
to the hidden layer. At each time step t, the input is x(t), the hidden-layer activations are h(t), the outputs are o(t), the targets are y(t), and the loss is L(t).
Figure 10.4
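For contrast with figure 10.3, a sketch of one step of the figure 10.4 network, where the only recurrence is the previous output o(t−1) feeding back into the hidden layer (weight names and nonlinearities are assumptions for illustration):

```python
import numpy as np

def output_recurrent_step(o_prev, x, params):
    # The hidden layer sees the previous *output* rather than the previous hidden state.
    U, V, W, b, c = params
    h = np.tanh(b + W @ o_prev + U @ x)  # recurrence enters only through o(t-1)
    o = c + V @ h                        # o(t), fed back at the next step
    return h, o
```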
(Goodfellow 2016)
Sequence Input, Single Output
The advantage of eliminating hidden-to-hidden recurrence is that, for any loss function based on comparing the prediction at time t to the
training target at time t, all the time steps are decoupled. Training can thus be
parallelized, with the gradient for each step t computed in isolation. There is no
need to compute the output for the previous time step first, because the training
set provides the ideal value of that output.
Figure 10.5: Time-unfolded recurrent neural network with a single output at the end
of the sequence. Such a network can be used to summarize a sequence and produce a fixed-size representation used as input for further processing.
Figure 10.5
(Goodfellow 2016)
Teacher Forcing
(Left) Train time. (Right) Test time.
Figure 10.6: Illustration of teacher forcing. Teacher forcing is a training technique that is
applicable to RNNs that have connections from their output to their hidden states at the next time step.
Figure 10.6
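A sketch contrasting the two panels of figure 10.6 for a model whose recurrence goes through the output: at train time the ground-truth y(t−1) is fed into the state, while at test time the model's own previous output is fed back (parameter names are assumptions for the example):

```python
import numpy as np

def teacher_forced_step(y_prev_true, x, params):
    # Train time: condition the state on the correct previous target from the data set.
    U, V, W, b, c = params
    h = np.tanh(b + W @ y_prev_true + U @ x)
    return c + V @ h                    # o(t), compared to y(t) by the loss

def free_running_step(o_prev, x, params):
    # Test time: the true y(t-1) is unknown, so feed back the model's own output.
    U, V, W, b, c = params
    h = np.tanh(b + W @ o_prev + U @ x)
    return c + V @ h
```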
(Goodfellow 2016)
Fully Connected Graphical Model
Figure 10.7: Fully connected graphical model for a sequence y(1), y(2), . . . , y(t), . . .: every past observation y(i) may influence the conditional distribution of some y(t) (for t > i), given the previous values. Parametrizing the graphical model directly according to this graph can be very inefficient, with an ever growing number of inputs and parameters for each element of the sequence.
Figure 10.7
(Goodfellow 2016)
RNN Graphical Model
L = Σ_t L(t), where L(t) = −log P(y(t) | y(t−1), y(t−2), . . . , y(1)).
Figure 10.8: Introducing the state variable in the graphical model of the RNN, even though it is a deterministic function of its inputs, helps to see how we can obtain a very efficient parametrization, based on equation 10.5. Every stage in the sequence (for h(t) and y(t)) involves the same structure (the same number of inputs for each node) and can share the same parameters with the other stages.
Figure 10.8
(The conditional distributions for the hidden units are deterministic.)
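The loss on this slide is just the sum of per-step negative log-likelihoods; a tiny sketch (the interface, which takes the probability the model assigned to each observed y(t), is an illustrative assumption):

```python
import numpy as np

def sequence_nll(probs_of_observed):
    # L = sum_t L(t), with L(t) = -log P(y(t) | y(t-1), ..., y(1)).
    return -np.sum(np.log(probs_of_observed))

# e.g. a 3-step sequence whose observed symbols were assigned these probabilities:
loss = sequence_nll(np.array([0.5, 0.25, 0.8]))
```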
(Goodfellow 2016)
Vector to Sequence
Figure 10.9
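Figure 10.9 shows an RNN that maps a single fixed-size vector x to a sequence (as summarized on the sequence-to-sequence slide later). A hedged sketch of one generation step, with x injected at every step through a matrix called R here and the previous output fed back (the exact role of each labeled matrix in the figure is an assumption of this example):

```python
import numpy as np

def vector_to_sequence_step(h_prev, y_prev, x, params):
    # One step of generating a sequence conditioned on a single fixed vector x.
    U, V, W, R, b, c = params
    h = np.tanh(b + W @ h_prev + R @ x + U @ y_prev)  # x conditions every step
    o = c + V @ h                                     # emission for y(t)
    return h, o
```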
(Goodfellow 2016)
Hidden and Output Recurrence
Figure 10.10: A conditional recurrent neural network mapping a variable-length sequence
of x values into a distribution over sequences of y values of the same length. Compared to
figure 10.3, this RNN contains connections from the previous output to the current state.
These connections allow this RNN to model an arbitrary distribution over sequences of y
given sequences of x of the same length. The RNN of figure 10.3 is only able to represent
distributions in which the y values are conditionally independent from each other given the x values.
Figure 10.10
The output values are not forced to be
conditionally independent in this model.
(Goodfellow 2016)
Bidirectional RNN
sequence still has one restriction, which is that the length of both sequences must
be the same. We describe how to remove this restriction in section 10.4.
Figure 10.11: Computation of a typical bidirectional recurrent neural network, meant to learn to map input sequences x to target sequences y, with loss L(t) at each step t. The h recurrence propagates information forward in time (towards the right), while the g recurrence propagates information backward in time (towards the left). Thus at each point t, the output units o(t) can benefit from a relevant summary of the past in their h(t) input and from a relevant summary of the future in their g(t) input.
Figure 10.11
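A minimal sketch of the bidirectional computation in figure 10.11: the h recurrence runs forward over the sequence, the g recurrence runs backward, and the output at each step combines both (weight names and the tanh/affine forms are assumptions for illustration):

```python
import numpy as np

def bidirectional_rnn(x_seq, params):
    (Wh, Uh, bh), (Wg, Ug, bg), (Vh, Vg, c) = params
    hs, h = [], np.zeros_like(bh)
    for x in x_seq:                        # h recurrence: forward in time
        h = np.tanh(Wh @ h + Uh @ x + bh)
        hs.append(h)
    gs, g = [], np.zeros_like(bg)
    for x in reversed(x_seq):              # g recurrence: backward in time
        g = np.tanh(Wg @ g + Ug @ x + bg)
        gs.append(g)
    gs.reverse()                           # align g(t) with h(t)
    # o(t) depends on both a summary of the past (h) and of the future (g).
    return [c + Vh @ h_t + Vg @ g_t for h_t, g_t in zip(hs, gs)]
```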
(Goodfellow 2016)
Sequence to Sequence Architecture
10.4 Encoder-Decoder Sequence-to-Sequence Architectures
We have seen in figure 10.5 how an RNN can map an input sequence to a fixed-size
vector. We have seen in figure 10.9 how an RNN can map a fixed-size vector to a
sequence. We have seen in figures 10.3, 10.4, 10.10 and 10.11 how an RNN can
map an input sequence to an output sequence of the same length.
[Diagram: an encoder RNN reads x(1), x(2), . . . , x(nx) and produces the context C, which a decoder RNN uses to emit y(1), y(2), . . . , y(ny).]
Figure 10.12: Example of an encoder-decoder or sequence-to-sequence RNN architecture, for learning to generate an output sequence (y(1), . . . , y(ny)) given an input sequence (x(1), x(2), . . . , x(nx)). It is composed of an encoder RNN that reads the input sequence and a decoder RNN that generates the output sequence (or computes the probability of a given output sequence). The final hidden state of the encoder RNN is used to compute a generally fixed-size context variable C, which represents a semantic summary of the input sequence and is given as input to the decoder RNN.
Figure 10.12
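A hedged sketch of the encoder-decoder idea in figure 10.12: the encoder reads the whole input into a context C (here its final hidden state), and the decoder generates the output conditioned on C (here by using C as its initial state; other ways of injecting C are possible, and the weight names, fixed output length, and matching hidden sizes are assumptions of this example):

```python
import numpy as np

def encode(x_seq, enc_params):
    # Read the input sequence; return the final hidden state as the context C.
    W, U, b = enc_params
    h = np.zeros_like(b)
    for x in x_seq:
        h = np.tanh(W @ h + U @ x + b)
    return h

def decode(C, n_y, dec_params):
    # Generate n_y outputs, feeding each output back into the next state.
    W, U, V, b, c = dec_params
    h, y, outputs = C, np.zeros(len(c)), []
    for _ in range(n_y):
        h = np.tanh(W @ h + U @ y + b)
        y = c + V @ h
        outputs.append(y)
    return outputs
```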
(Goodfellow 2016)
Deep RNNs
[Diagram panels (a), (b), (c): three ways of making an RNN deeper, with input x, hidden states h (and z), and output y.]
Figure 10.13
(Goodfellow 2016)
Recursive Network
This can be mitigated by introducing skip connections in the hidden-to-hidden path, as
illustrated in figure 10.13c.
10.6 Recursive Neural Networks
Figure 10.14: A recursive network has a computational graph that generalizes that of the recurrent network from a chain to a tree. A variable-size sequence x(1), x(2), . . . , x(t) can be mapped to a fixed-size representation (the output o), with a fixed set of parameters (the weight matrices U, V, W).
Figure 10.14
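A small sketch of the tree-structured computation in figure 10.14: the same composition function (with shared weights) is applied at every internal node of a given tree, mapping a variable-size set of leaves to one fixed-size vector (the affine-plus-tanh composition and the way the tree is encoded are assumptions for illustration):

```python
import numpy as np

def compose(left, right, params):
    # Merge two child representations into their parent, reusing U, W, b everywhere.
    U, W, b = params
    return np.tanh(U @ left + W @ right + b)

def recursive_net(node, leaves, params):
    # `node` is either a leaf index or a (left, right) pair describing a subtree.
    if isinstance(node, int):
        return leaves[node]
    left, right = node
    return compose(recursive_net(left, leaves, params),
                   recursive_net(right, leaves, params), params)

# e.g. the balanced tree over four inputs: tree = ((0, 1), (2, 3))
```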
(Goodfellow 2016)
Exploding Gradients from Function Composition
[Plot: projection of output versus input coordinate.]
Figure 10.15: When composing many nonlinear functions (like the linear-tanh layer shown here), the result is highly nonlinear, typically with most of the values associated with a tiny derivative, some values with a large derivative, and many alternations between increasing and decreasing.
Figure 10.15
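The effect in figure 10.15 can be reproduced in one dimension: repeatedly composing the same linear-tanh unit drives most inputs into flat saturated regions while a narrow band around zero develops an extremely large slope (the weight value and grid below are illustrative, not the figure's actual settings):

```python
import numpy as np

def compose_n(x, w, n):
    # Apply the same 1-D linear-tanh layer n times.
    for _ in range(n):
        x = np.tanh(w * x)
    return x

x = np.linspace(-1.0, 1.0, 9)
print(compose_n(x, w=2.5, n=1))   # still a smooth curve
print(compose_n(x, w=2.5, n=20))  # nearly a step: tiny changes near 0 cause large
                                  # output changes, i.e. the derivative there explodes
```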
(Goodfellow 2016)
LSTM
[Block diagram: input, input gate, forget gate, output gate, state with self-loop, output; × and + nodes indicate gated products and state accumulation.]
Figure 10.16: Block diagram of the LSTM recurrent network “cell.” Cells are connected recurrently to each other, replacing the usual hidden units of ordinary recurrent networks.
Figure 10.16
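A hedged sketch of one update of the LSTM cell in figure 10.16: sigmoidal input, forget, and output gates control a self-looping internal state s, and the cell output h is a gated squashing of that state (weight names follow the standard formulation and are an assumption rather than a transcription of the figure):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, s_prev, params):
    (Wg, Ug, bg), (Wi, Ui, bi), (Wf, Uf, bf), (Wo, Uo, bo) = params
    g = np.tanh(Wg @ h_prev + Ug @ x + bg)    # candidate input feature
    i = sigmoid(Wi @ h_prev + Ui @ x + bi)    # input gate
    f = sigmoid(Wf @ h_prev + Uf @ x + bf)    # forget gate
    o = sigmoid(Wo @ h_prev + Uo @ x + bo)    # output gate
    s = f * s_prev + i * g                    # self-loop: gated state accumulation
    h = o * np.tanh(s)                        # gated cell output
    return h, s
```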
(Goodfellow 2016)
Gradient Clipping
…slowly enough that consecutive steps have approximately the same learning rate. A step size that is appropriate for a relatively linear part of the landscape is often inappropriate and causes uphill motion if we enter a more curved part of the landscape on the next step.
[Two surface plots of J(w, b) over the parameters w and b: without clipping and with clipping.]
Figure 10.17: Example of the effect of gradient clipping in a recurrent network with two parameters w and b. Gradient clipping can make gradient descent perform more reasonably in the vicinity of extremely steep cliffs. These steep cliffs commonly occur in recurrent networks near where the recurrent network behaves approximately linearly.
Figure 10.17
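A common way to realize the "with clipping" panel is to clip the norm of the gradient just before the parameter update, preserving its direction; a minimal sketch (the threshold value and variable names are illustrative):

```python
import numpy as np

def clip_gradient(g, threshold):
    # If the gradient norm exceeds the threshold, rescale g to that norm.
    norm = np.linalg.norm(g)
    if norm > threshold:
        g = g * (threshold / norm)
    return g

# Usage inside plain gradient descent:
# params -= learning_rate * clip_gradient(grad, threshold=1.0)
```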
(Goodfellow 2016)
Networks with explicit Memory
Task network,
controlling the memory
Memory cells
Writing
mechanism
Reading
mechanism
Figure 10.18: A schematic of an example of a network with an explicit memory, capturing
some of the key design elements of the neural Turing machine. In this diagram we
distinguish the “representation” part of the model (the “task network,” here a recurrent net in the bottom) from the “memory” part of the model (the set of cells), which can store facts.
Figure 10.18
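The reading mechanism in figure 10.18 is typically a soft, content-based lookup so that the whole read stays differentiable; a simplified sketch in that spirit (cosine similarity, a sharpening factor beta, and the softmax weighting reflect the usual neural-Turing-machine-style addressing and are assumptions, not a transcription of the figure):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def soft_read(memory, key, beta=1.0):
    # memory: (n_cells, width); key: (width,). Weight each cell by its similarity
    # to the key, then return the weighted average as the value read.
    sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    weights = softmax(beta * sims)
    return weights @ memory
```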