Sequence Modeling:
Recurrent and Recursive
Nets
Lecture slides for Chapter 10 of Deep Learning
www.deeplearningbook.org
Ian Goodfellow
2016-09-27
(Goodfellow 2016)
Classical Dynamical Systems
Consider the classical dynamical system

s(t) = f(s(t−1); θ),      (10.1)

where s(t) is the state of the system at time t. Unfolding the equation by repeatedly applying the definition, for example

s(3) = f(s(2); θ) = f(f(s(1); θ); θ),      (10.3)

yields an expression that does not involve recurrence. Such an expression can be represented by a traditional directed acyclic computational graph; the unfolded computational graph of equation 10.1 and equation 10.3 is illustrated in figure 10.1.

Figure 10.1: The classical dynamical system described by equation 10.1, illustrated as an unfolded computational graph. Each node represents the state at some time t, and the function f maps the state at t to the state at t + 1. The same parameters (the same value of θ used to parametrize f) are used for all time steps.

As another example, let us consider a dynamical system driven by an external signal x(t).
Figure 10.1
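The recurrence and its unrolled form can be made concrete with a short sketch. The affine-plus-tanh choice of f below is an illustrative assumption; any fixed map parametrized by θ would do.

```python
import numpy as np

def f(s, theta):
    # One step of the dynamical system: maps the state at time t to t + 1.
    W, b = theta
    return np.tanh(W @ s + b)

rng = np.random.default_rng(0)
theta = (0.5 * rng.normal(size=(3, 3)), np.zeros(3))  # shared across all steps
s1 = rng.normal(size=3)

# Recurrent form: apply f repeatedly with the same parameters theta.
s2 = f(s1, theta)
s3 = f(s2, theta)

# Unfolded form (equation 10.3): s(3) = f(f(s(1); theta); theta), no recurrence.
assert np.allclose(s3, f(f(s1, theta), theta))
```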
(Goodfellow 2016)
Unfolding Computation
Graphs
training criterion, this summary might selectively keep some aspects of the past
sequence with more precision than other aspects. For example, if the RNN is used
in statistical language modeling, typically to predict the next word given previous
words, it may not be necessary to store all of the information in the input sequence
up to time t, but rather only enough information to predict the rest of the sentence.
The most demanding situation is when we ask h(t) to be rich enough to allow
one to approximately recover the input sequence, as in autoencoder frameworks
(chapter 14).
Figure 10.2: A recurrent network with no outputs. This recurrent network just processes
information from the input x by incorporating it into the state h that is passed forward
through time. (Left)Circuit diagram. The black square indicates a delay of a single time
step. (Right)The same network seen as an unfolded computational graph, where each
node is now associated with one particular time instance.
Equation 10.5 can be drawn in two different ways. One way to draw the RNN is with a diagram containing one node for every component that might exist in a physical implementation of the model, such as a biological neural network.
Figure 10.2
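A minimal sketch of the network in figure 10.2: the state update h(t) = f(h(t−1), x(t); θ) is applied with the same f and θ at every step, so the final h is a fixed-size (and lossy) summary of the whole input sequence. The tanh transition and the sizes are assumptions for illustration.

```python
import numpy as np

def f(h_prev, x, theta):
    # Shared transition: h(t) = tanh(W h(t-1) + U x(t) + b).
    W, U, b = theta
    return np.tanh(W @ h_prev + U @ x + b)

rng = np.random.default_rng(0)
n_h, n_x, T = 4, 3, 5
theta = (0.3 * rng.normal(size=(n_h, n_h)),
         0.3 * rng.normal(size=(n_h, n_x)),
         np.zeros(n_h))

xs = rng.normal(size=(T, n_x))   # input sequence x(1), ..., x(T)
h = np.zeros(n_h)                # initial state h(0)
for x_t in xs:                   # the same theta is reused at every time step
    h = f(h, x_t, theta)
# h is now a fixed-size summary of the task-relevant aspects of x(1..T).
```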
(Goodfellow 2016)
Recurrent Hidden Units
10.2 Recurrent Neural Networks
Armed with the graph unrolling and parameter sharing ideas of section 10.1, we
can design a wide variety of recurrent neural networks.
Figure 10.3: The computational graph to compute the training loss of a recurrent network
that maps an input sequence of x values to a corresponding sequence of output o values.
A loss L measures how far each o is from the corresponding training target y. When using softmax outputs, o is interpreted as the unnormalized log probabilities; the loss internally computes ŷ = softmax(o) and compares it to the target y.
Figure 10.3
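A runnable sketch of the computation in figure 10.3, assuming tanh hidden units, softmax outputs interpreted as unnormalized log probabilities, and a training loss that sums the per-step negative log-likelihoods. The parameters U (input-to-hidden), W (hidden-to-hidden) and V (hidden-to-output) are shared across time; sizes and data are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_h, n_y, T = 3, 8, 4, 6
U = 0.1 * rng.normal(size=(n_h, n_x))
W = 0.1 * rng.normal(size=(n_h, n_h))
V = 0.1 * rng.normal(size=(n_y, n_h))
b, c = np.zeros(n_h), np.zeros(n_y)

def softmax(o):
    e = np.exp(o - o.max())
    return e / e.sum()

xs = rng.normal(size=(T, n_x))            # input sequence
ys = rng.integers(0, n_y, size=T)         # integer targets y(1..T)

h = np.zeros(n_h)
total_loss = 0.0
for t in range(T):
    h = np.tanh(b + W @ h + U @ xs[t])    # h(t) = tanh(b + W h(t-1) + U x(t))
    o = c + V @ h                         # o(t): unnormalized log probabilities
    y_hat = softmax(o)
    total_loss += -np.log(y_hat[ys[t]])   # L(t) = -log P(y(t) | x(1..t))
print("training loss L =", total_loss)
```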
(Goodfellow 2016)
Recurrence through only the Output
Figure 10.4: An RNN whose only recurrence is the feedback connection from the output
to the hidden layer. At each time step t, the input is x(t), the hidden layer activations are h(t), the outputs are o(t), the targets are y(t), and the loss is L(t).
Figure 10.4
(Goodfellow 2016)
Sequence Input, Single Output
is that, for any loss function based on comparing the prediction at time t to the
training target at time t, all the time steps are decoupled. Training can thus be
parallelized, with the gradient for each step t computed in isolation. There is no
need to compute the output for the previous time step first, because the training
set provides the ideal value of that output.
Figure 10.5: Time-unfolded recurrent neural network with a single output at the end
of the sequence. Such a network can be used to summarize a sequence and produce a fixed-size representation used as input for further processing.
Figure 10.5
(Goodfellow 2016)
Teacher Forcing
Figure 10.6: Illustration of teacher forcing. Teacher forcing is a training technique that is
applicable to RNNs that have connections from their output to their hidden states at the next time step. (Left) At train time, we feed the correct output y(t) drawn from the training set as input to h(t+1). (Right) When the model is deployed, the true output is generally not known; we approximate it with the model's output o(t) and feed that back into the model.
Figure 10.6
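The sketch below contrasts the two halves of figure 10.6 for an RNN whose recurrence runs through the output: at train time the ground-truth y(t−1) is fed back into the hidden state (teacher forcing), while at test time the model's own previous output is fed back instead. The tanh/softmax cell and sizes are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_h, n_y, T = 3, 6, 4, 5
U = 0.1 * rng.normal(size=(n_h, n_x))     # input -> hidden
W = 0.1 * rng.normal(size=(n_h, n_y))     # previous output -> hidden
V = 0.1 * rng.normal(size=(n_y, n_h))     # hidden -> output

def softmax(o):
    e = np.exp(o - o.max())
    return e / e.sum()

def step(x_t, prev_out):
    h = np.tanh(U @ x_t + W @ prev_out)
    return softmax(V @ h)

xs = rng.normal(size=(T, n_x))
ys = np.eye(n_y)[rng.integers(0, n_y, size=T)]   # one-hot targets

# Train time: feed the *target* y(t-1) back into the state (teacher forcing).
prev = np.zeros(n_y)
for t in range(T):
    o = step(xs[t], prev)
    loss_t = -np.log(o @ ys[t] + 1e-12)   # cross-entropy against the target y(t)
    prev = ys[t]                          # ground truth, not the model's own output

# Test time: the true output is unknown, so feed back the model's prediction.
prev = np.zeros(n_y)
for t in range(T):
    o = step(xs[t], prev)
    prev = o                              # model output approximates the missing y(t-1)
```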
(Goodfellow 2016)
Fully Connected Graphical Model
Figure 10.7: Fully connected graphical model for a sequence y(1), y(2), . . . , y(t), . . .: every past observation y(i) may influence the conditional distribution of some y(t) (for t > i), given the previous values. Parametrizing the graphical model directly according to this graph may be very inefficient, with an ever growing number of inputs and parameters for each element of the sequence.
Figure 10.7
(Goodfellow 2016)
RNN Graphical Model
L = Σ_t L(t),

where

L(t) = −log P(y(t) | y(t−1), y(t−2), . . . , y(1)).
Figure 10.8: Introducing the state variable in the graphical model of the RNN, even though it is a deterministic function of its inputs, helps to see how we can obtain a very efficient parametrization, based on equation 10.5. Every stage in the sequence (for h(t) and y(t)) involves the same structure (the same number of inputs for each node) and can share the same parameters with the other stages.
Figure 10.8
(The conditional distributions for the
hidden units are deterministic)
(Goodfellow 2016)
Vector to Sequence
Figure 10.9
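A hedged sketch of figure 10.9: a single fixed-size vector x (for example an image embedding in a captioning model) conditions every step of a generating RNN through an extra weight matrix R, while the previously emitted output feeds back into the state through U. The tanh/softmax choices, the sizes, and the sampling loop are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ctx, n_h, n_y, T = 5, 8, 4, 6
R = 0.1 * rng.normal(size=(n_h, n_ctx))   # fixed input vector x -> hidden
U = 0.1 * rng.normal(size=(n_h, n_y))     # previous output y(t-1) -> hidden
W = 0.1 * rng.normal(size=(n_h, n_h))     # hidden -> hidden
V = 0.1 * rng.normal(size=(n_y, n_h))     # hidden -> output

def softmax(o):
    e = np.exp(o - o.max())
    return e / e.sum()

x = rng.normal(size=n_ctx)                # the single fixed-size input
h = np.zeros(n_h)
prev_y = np.zeros(n_y)                    # one-hot of the previous output
sequence = []
for t in range(T):
    h = np.tanh(W @ h + R @ x + U @ prev_y)   # x is re-injected at every step
    p = softmax(V @ h)
    y_t = int(rng.choice(n_y, p=p))           # sample y(t) from the model
    sequence.append(y_t)
    prev_y = np.eye(n_y)[y_t]
print(sequence)
```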
(Goodfellow 2016)
Hidden and Output Recurrence
Figure 10.10: A conditional recurrent neural network mapping a variable-length sequence
of x values into a distribution over sequences of y values of the same length. Compared to
figure 10.3, this RNN contains connections from the previous output to the current state.
These connections allow this RNN to model an arbitrary distribution over sequences of y
given sequences of x of the same length. The RNN of figure 10.3 is only able to represent
distributions in which the y values are conditionally independent from each other given the x values.
Figure 10.10
The output values are not forced to be
conditionally independent in this model.
(Goodfellow 2016)
Bidirectional RNN
sequence still has one restriction, which is that the length of both sequences must
be the same. We describe how to remove this restriction in section 10.4.
Figure 10.11: Computation of a typical bidirectional recurrent neural network, meant to learn to map input sequences x to target sequences y, with loss L(t) at each step t. The h recurrence propagates information forward in time (towards the right) while the g recurrence propagates information backward in time (towards the left). Thus at each point t, the output units o(t) can benefit from a relevant summary of the past in its h(t) input and from a relevant summary of the future in its g(t) input.
Figure 10.11
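A minimal sketch of the bidirectional computation in figure 10.11: a forward recurrence h runs left-to-right over the inputs, a backward recurrence g runs right-to-left, and the output at each step reads both, so o(t) can depend on the whole input sequence. Parametrization and sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_h, n_y, T = 3, 5, 2, 6
Wf, Uf = 0.3 * rng.normal(size=(n_h, n_h)), 0.3 * rng.normal(size=(n_h, n_x))
Wb, Ub = 0.3 * rng.normal(size=(n_h, n_h)), 0.3 * rng.normal(size=(n_h, n_x))
V = 0.3 * rng.normal(size=(n_y, 2 * n_h))   # output reads both h(t) and g(t)

xs = rng.normal(size=(T, n_x))

# Forward states h(1..T): information flows towards increasing t.
h = np.zeros(n_h)
hs = []
for t in range(T):
    h = np.tanh(Wf @ h + Uf @ xs[t])
    hs.append(h)

# Backward states g(T..1): information flows towards decreasing t.
g = np.zeros(n_h)
gs = [None] * T
for t in reversed(range(T)):
    g = np.tanh(Wb @ g + Ub @ xs[t])
    gs[t] = g

# Each output depends on the past (through h) and the future (through g).
outputs = [V @ np.concatenate([hs[t], gs[t]]) for t in range(T)]
```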
(Goodfellow 2016)
Sequence to Sequence
Architecture
10.4 Encoder-Decoder Sequence-to-Sequence Architectures
We have seen in figure 10.5 how an RNN can map an input sequence to a fixed-size
vector. We have seen in figure 10.9 how an RNN can map a fixed-size vector to a
sequence. We have seen in figures 10.3, 10.4, 10.10 and 10.11 how an RNN can
map an input sequence to an output sequence of the same length.
Figure 10.12: Example of an encoder-decoder or sequence-to-sequence RNN architecture, for learning to generate an output sequence (y(1), . . . , y(ny)) given an input sequence (x(1), x(2), . . . , x(nx)). It is composed of an encoder RNN that reads the input sequence and a decoder RNN that generates the output sequence; the encoder's final hidden state is used to compute a fixed-size context variable C, which summarizes the input sequence and conditions the decoder.
Figure 10.12
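A sketch of the encoder-decoder idea in figure 10.12: an encoder RNN folds the variable-length input into a fixed-size context C (here simply its final hidden state), and a decoder RNN generates an output sequence of a different length conditioned on C. The specific cells, the use of the final state as C, and the greedy decoding loop are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_h, n_y, n_in, n_out = 3, 6, 4, 5, 7   # input and output lengths may differ

# Encoder parameters.
We, Ue = 0.2 * rng.normal(size=(n_h, n_h)), 0.2 * rng.normal(size=(n_h, n_x))
# Decoder parameters.
Wd = 0.2 * rng.normal(size=(n_h, n_h))
Ud = 0.2 * rng.normal(size=(n_h, n_y))       # previous output token -> hidden
Cd = 0.2 * rng.normal(size=(n_h, n_h))       # context C -> hidden
V = 0.2 * rng.normal(size=(n_y, n_h))

def softmax(o):
    e = np.exp(o - o.max())
    return e / e.sum()

# Encoder: read the input sequence and summarize it as C.
xs = rng.normal(size=(n_in, n_x))
h = np.zeros(n_h)
for x_t in xs:
    h = np.tanh(We @ h + Ue @ x_t)
C = h                                         # fixed-size context variable

# Decoder: generate an output sequence of a different length, conditioned on C.
s = np.zeros(n_h)
prev = np.zeros(n_y)
ys = []
for t in range(n_out):
    s = np.tanh(Wd @ s + Ud @ prev + Cd @ C)
    p = softmax(V @ s)
    y_t = int(np.argmax(p))                   # greedy decoding for illustration
    ys.append(y_t)
    prev = np.eye(n_y)[y_t]
print(ys)
```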
(Goodfellow 2016)
Deep RNNs
(Three panels, (a)-(c), showing different ways of making an RNN deep; panel (c) introduces skip connections in the hidden-to-hidden path, as referenced below.)
Figure 10.13
(Goodfellow 2016)
Recursive Network
can be mitigated by introducing skip connections in the hidden-to-hidden path, as
illustrated in figure 10.13c.
10.6 Recursive Neural Networks
Figure 10.14: A recursive network has a computational graph that generalizes that of the recurrent network from a chain to a tree. A variable-size sequence x(1), x(2), . . . , x(t) can be mapped to a fixed-size representation (the output o) with a fixed set of parameters (the weight matrices U, V, W).
Figure 10.14
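A sketch of the recursive network in figure 10.14: the same parameters are applied at every internal node of a tree, merging child representations bottom-up until a single fixed-size representation of the whole sequence remains. The balanced pairwise merge order, the tanh combiner, and the power-of-two number of leaves are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_d = 4                                     # representation size
U = 0.3 * rng.normal(size=(n_d, n_d))       # left-child weights
W = 0.3 * rng.normal(size=(n_d, n_d))       # right-child weights
b = np.zeros(n_d)

def merge(left, right):
    # Shared combiner applied at every internal node of the tree.
    return np.tanh(U @ left + W @ right + b)

# Leaves: a variable-length sequence x(1), ..., x(t) of vectors
# (a power of two here, so the balanced pairwise merge works out).
xs = [rng.normal(size=n_d) for _ in range(4)]

# Merge pairwise up the tree until one representation remains.
nodes = list(xs)
while len(nodes) > 1:
    nodes = [merge(nodes[i], nodes[i + 1]) for i in range(0, len(nodes), 2)]
o = nodes[0]                                # fixed-size representation of the sequence
```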
(Goodfellow 2016)
Exploding Gradients from
Function Composition
(Plot: projection of the output versus the input coordinate, for 0 through 5 compositions.)
Figure 10.15: When composing many nonlinear functions (like the linear-tanh layers composed here), the result is highly nonlinear, typically with most values associated with a tiny derivative, some values with a large derivative, and many alternations between increasing and decreasing.
Figure 10.15
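The repeated composition behind figure 10.15 also explains exploding (and vanishing) gradients: composing the same linear map many times scales a perturbation roughly by powers of its spectral radius. The sketch below just measures that growth numerically; the matrix and depths are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 5))
W = 1.2 * W / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius set to 1.2

v = rng.normal(size=5)
for depth in (1, 5, 10, 20, 40):
    u = v.copy()
    for _ in range(depth):
        u = W @ u                 # compose the same map `depth` times
    print(depth, np.linalg.norm(u) / np.linalg.norm(v))
# For large depth the ratio grows roughly like 1.2**depth. Gradients
# back-propagated through many steps of a recurrence behave the same way:
# spectral radius > 1 makes them explode, < 1 makes them vanish.
```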
(Goodfellow 2016)
LSTM
(Block diagram components: input, input gate, forget gate, output gate, state with self-loop, output.)
Figure 10.16: Block diagram of the LSTM recurrent network “cell.” Cells are connected recurrently to each other, replacing the usual hidden units of ordinary recurrent networks. An input feature is computed with a regular artificial neuron unit, and its value can be accumulated into the state if the sigmoidal input gate allows it. The state unit has a linear self-loop whose weight is controlled by the forget gate, and the output of the cell can be shut off by the output gate.
Figure 10.16
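A sketch of one step of the LSTM cell in figure 10.16, using the standard gating equations: sigmoidal input, forget and output gates, a candidate input feature, a state with a linear self-loop weighted by the forget gate, and a tanh-squashed, output-gated hidden value. Shapes and initialization are illustrative; peephole connections are omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, s_prev, params):
    """One LSTM cell update. params holds (W, U, b) for the forget gate f,
    input gate i, output gate o, and the candidate input feature g."""
    Wf, Uf, bf, Wi, Ui, bi, Wo, Uo, bo, Wg, Ug, bg = params
    f = sigmoid(Wf @ h_prev + Uf @ x + bf)   # forget gate: weight of the self-loop
    i = sigmoid(Wi @ h_prev + Ui @ x + bi)   # input gate: how much new input to accumulate
    o = sigmoid(Wo @ h_prev + Uo @ x + bo)   # output gate: how much state to expose
    g = np.tanh(Wg @ h_prev + Ug @ x + bg)   # candidate input feature
    s = f * s_prev + i * g                   # state: linear self-loop gated by f
    h = o * np.tanh(s)                       # hidden output of the cell
    return h, s

rng = np.random.default_rng(0)
n_x, n_h, T = 3, 4, 5
params = []
for _ in range(4):                           # one (W, U, b) triple per gate/feature
    params += [0.2 * rng.normal(size=(n_h, n_h)),
               0.2 * rng.normal(size=(n_h, n_x)),
               np.zeros(n_h)]

xs = rng.normal(size=(T, n_x))
h, s = np.zeros(n_h), np.zeros(n_h)
for x_t in xs:
    h, s = lstm_step(x_t, h, s, params)
```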
(Goodfellow 2016)
Gradient Clipping
Learning rates are typically set to decay slowly enough that consecutive steps have approximately the same learning rate. A step size that is appropriate for a relatively linear part of the landscape is often inappropriate and causes uphill motion if we enter a more curved part of the landscape on the next step.
(Contour plots of the objective J(w, b) over the two parameters w and b: the left panel shows gradient descent without clipping, the right panel with clipping.)
Figure 10.17: Example of the effect of gradient clipping in a recurrent network with two parameters w and b. Gradient clipping can make gradient descent perform more reasonably in the vicinity of extremely steep cliffs. These steep cliffs commonly occur in recurrent networks in the region where the network behaves approximately linearly.
Figure 10.17
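A sketch of norm-based gradient clipping as described for figure 10.17: if the gradient norm exceeds a threshold v, the gradient is rescaled to have norm v before the update, so a step taken near a cliff keeps its direction but not its catastrophic length. The toy objective and the threshold value are assumptions.

```python
import numpy as np

def clip_by_norm(g, v):
    # Rescale g to norm v if ||g|| > v, leaving its direction unchanged.
    norm = np.linalg.norm(g)
    return g * (v / norm) if norm > v else g

def grad_toy(theta):
    # Toy gradient with a "cliff": huge in a narrow region, mimicking the
    # steep cliffs that recurrent objectives develop.
    w, b = theta
    cliff = 1000.0 if abs(w - 1.0) < 0.05 else 1.0
    return np.array([cliff * (w - 1.0), b])

theta = np.array([1.04, 0.5])
lr, v = 0.1, 1.0
for step in range(20):
    g = clip_by_norm(grad_toy(theta), v)   # without clipping, one step near the
    theta = theta - lr * g                 # cliff would throw theta very far away
print(theta)
```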
(Goodfellow 2016)
Networks with explicit Memory
(Diagram components: a task network that controls the memory, memory cells, a writing mechanism, and a reading mechanism.)
Figure 10.18: A schematic of an example of a network with an explicit memory, capturing
some of the key design elements of the neural Turing machine. In this diagram we
distinguish the “representation” part of the model (the “task network,” here a recurrent net) from the “memory” part of the model (the set of cells), which can store facts. The task network learns to control the memory, deciding where to read from and where to write to via the reading and writing mechanisms.
Figure 10.18
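A minimal sketch of the soft, content-based memory access suggested by figure 10.18: a controller emits a key, attention weights over the memory cells come from a softmax over similarities, and reading and writing are weighted sums so the whole mechanism stays differentiable. The similarity measure, the write rule, and the sizes are illustrative assumptions, not the exact neural Turing machine equations.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
n_cells, cell_dim = 8, 5
memory = rng.normal(size=(n_cells, cell_dim))     # the memory cells

# Reading: attend over cells by similarity to a key emitted by the task network.
key = rng.normal(size=cell_dim)
read_weights = softmax(memory @ key)              # soft addressing over all cells
read_vector = read_weights @ memory               # weighted sum instead of a hard lookup

# Writing: blend new content into the cells selected by the write weights.
write_key = rng.normal(size=cell_dim)
write_weights = softmax(memory @ write_key)
new_content = rng.normal(size=cell_dim)
erase = 0.5                                       # fraction of old content to overwrite
memory = (memory * (1 - erase * write_weights[:, None])
          + write_weights[:, None] * new_content)

print(read_vector.shape, memory.shape)
```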