Lecture 6 Smaller Network: RNN
This is our fully connected network. If the input is x1 .... xn, where n is very large and growing, this network would become too large. We now input one xi at a time and re-use the same edge weights.
Recurrent Neural Network
How does an RNN reduce complexity?
[Figure: one cell f unrolled over time: (h0, x1) → f → (h1, y1); (h1, x2) → f → (h2, y2); (h2, x3) → f → (h3, y3); ……]
No matter how long the input/output sequence is, we only need one function f. If the f's were different, it would become a feedforward NN. This may be treated as another compression from the fully connected network.
Given a function f: (h', y) = f(h, x), where h and h' are vectors with the same dimension.
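A minimal numpy sketch of this recurrence. The weight names Wh, Wi, Wo, the tanh nonlinearity, and the linear readout are illustrative assumptions; the slides only specify that one f is re-used at every step.

```python
import numpy as np

def f(h, x, Wh, Wi, Wo, bh):
    """One step of the recurrence: (h', y) = f(h, x)."""
    h_new = np.tanh(Wh @ h + Wi @ x + bh)   # h' has the same dimension as h
    y = Wo @ h_new                          # the step's output is read from h'
    return h_new, y

rng = np.random.default_rng(0)
dim_h, dim_x, dim_y = 4, 3, 2
Wh = rng.normal(size=(dim_h, dim_h))        # one set of weights, shared by all steps
Wi = rng.normal(size=(dim_h, dim_x))
Wo = rng.normal(size=(dim_y, dim_h))
bh = np.zeros(dim_h)

h = np.zeros(dim_h)                         # h0
for t, x in enumerate([rng.normal(size=dim_x) for _ in range(3)], start=1):
    h, y = f(h, x, Wh, Wi, Wo, bh)          # the same f, however long the sequence
    print(f"step {t}: y = {y}")
```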
Deep RNN
[Figure: two stacked recurrent layers unrolled over time. Layer 1: (h0, x1) → f1 → (h1, y1); (h1, x2) → f1 → (h2, y2); (h2, x3) → f1 → (h3, y3); …… Layer 2: (g0, y1) → f2 → (g1, z1); (g1, y2) → f2 → (g2, z2); (g2, y3) → f2 → (g3, z3); ……]
(h', y) = f1(h, x),  (g', z) = f2(g, y)
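Continuing the numpy sketch above (re-using f and the layer-1 weights; the layer-2 weights are fresh, equally illustrative assumptions), stacking just feeds layer 1's output y into layer 2:

```python
# Layer 2 reads layer 1's output y (dimension dim_y) as its input.
Wh2 = rng.normal(size=(dim_h, dim_h))
Wi2 = rng.normal(size=(dim_h, dim_y))
Wo2 = rng.normal(size=(dim_y, dim_h))

h = np.zeros(dim_h)                         # layer-1 state
g = np.zeros(dim_h)                         # layer-2 state
for x in [rng.normal(size=dim_x) for _ in range(3)]:
    h, y = f(h, x, Wh, Wi, Wo, bh)          # (h', y) = f1(h, x)
    g, z = f(g, y, Wh2, Wi2, Wo2, bh)       # (g', z) = f2(g, y)
```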
Bidirectional RNN
[Figure: a forward RNN f1 reads x1, x2, x3 left to right and emits y1, y2, y3; a backward RNN f2 reads x3, x2, x1 right to left and emits z3, z2, z1; at each step a third function f3 combines yt and zt into pt.]
y, h = f1(x, h);  z, g = f2(g, x);  p = f3(y, z)
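A sketch of the bidirectional pass, continuing from the blocks above. The backward weights are fresh, and taking f3 to be plain concatenation is an assumption; the slides leave f3 unspecified.

```python
xs = [rng.normal(size=dim_x) for _ in range(3)]

h = np.zeros(dim_h); ys = []
for x in xs:                                # forward RNN f1 over x1..x3
    h, y = f(h, x, Wh, Wi, Wo, bh)
    ys.append(y)

Whb = rng.normal(size=(dim_h, dim_h))       # backward RNN f2 has its own weights
Wib = rng.normal(size=(dim_h, dim_x))
Wob = rng.normal(size=(dim_y, dim_h))
g = np.zeros(dim_h); zs = []
for x in reversed(xs):                      # backward RNN f2 over x3..x1
    g, z = f(g, x, Whb, Wib, Wob, bh)
    zs.append(z)
zs.reverse()                                # align z1..z3 with y1..y3

# p = f3(y, z): here simply concatenate the forward and backward views.
ps = [np.concatenate([y, z]) for y, z in zip(ys, zs)]
```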
Pyramid RNN
Reducing the number of time steps: each layer (a bidirectional RNN) combines adjacent time steps before passing the sequence up, which significantly speeds up training.
W. Chan, N. Jaitly, Q. Le and O. Vinyals, "Listen, attend and spell: A neural network for large vocabulary conversational speech recognition," ICASSP, 2016
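A sketch of the reduction between layers, concatenating each adjacent pair of step vectors as in Listen, Attend and Spell (the helper name is ours):

```python
import numpy as np

def pyramid_reduce(seq):
    """Halve the number of time steps: each output vector is the
    concatenation of two adjacent input vectors."""
    return [np.concatenate([seq[i], seq[i + 1]]) for i in range(0, len(seq) - 1, 2)]

seq = [np.ones(4) * t for t in range(8)]    # 8 steps of 4-dim vectors
print(len(pyramid_reduce(seq)))             # -> 4 steps of 8-dim vectors
```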
Naïve RNN
Given f: (h', y) = f(h, x):
h' = σ(Wh h + Wi x)
y = softmax(Wo h')
We have ignored the bias. Note that y is computed from h'.
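The same cell written out as a self-contained sketch, with σ as the logistic sigmoid and the bias omitted as on the slide:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def softmax(v):
    e = np.exp(v - v.max())                 # subtract max for numerical stability
    return e / e.sum()

def naive_rnn_step(h, x, Wh, Wi, Wo):
    h_new = sigmoid(Wh @ h + Wi @ x)        # h' = σ(Wh h + Wi x)
    y = softmax(Wo @ h_new)                 # y is computed from h', not h
    return h_new, y
```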
Problems with the naive RNN
When dealing with a time series, it tends to forget old information. When there is a distant dependency of unknown length, we would like it to have a "memory" for it.
It also suffers from the vanishing gradient problem.
A sigmoid layer outputs numbers between 0 and 1 that determine how much of each component should be let through. The pink × gate is point-wise multiplication.
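A toy illustration of such gating (values chosen arbitrarily):

```python
import numpy as np

signal = np.array([2.0, -1.0, 0.5])
gate = 1.0 / (1.0 + np.exp(-np.array([4.0, -4.0, 0.0])))  # sigmoid ≈ [0.98, 0.02, 0.50]
print(gate * signal)    # point-wise ×: how much of each component is let through
```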
LSTM
The core idea is the cell state Ct: it is changed slowly, with only minor linear interactions, so it is very easy for information to flow along it unchanged.
[Figure: the LSTM cell, carrying ht-1 and Ct-1. The forget gate (a sigmoid) determines how much information goes through; the input gate decides what information is to be added to the cell state; the output gate controls what goes into the output.]
Why sigmoid or tanh? The sigmoid outputs in (0, 1) and acts as a gating switch. Since the vanishing gradient problem is already handled in the LSTM, could ReLU replace tanh?
The input gate decides which components are to be updated, and C't provides the candidate contents. The cell state is then updated, and finally we decide what part of the cell state to output.
RNN vs LSTM
Peephole LSTM
Allows “peeping into the memory”
Naïve RNN vs LSTM
c changes slowly: ct is ct-1 plus something.
h changes faster: ht and ht-1 can be very different.
[Figure: the naïve RNN maps (ht-1, xt) to (ht, yt); the LSTM maps (ct-1, ht-1, xt) to (ct, ht, yt).]
Information flow of LSTM
Stack xt on top of ht-1 and transform the stacked vector four ways:
z = tanh( W [xt; ht-1] )   (updating information)
zi = σ( Wi [xt; ht-1] )   (controls the input gate)
zf = σ( Wf [xt; ht-1] )   (controls the forget gate)
zo = σ( Wo [xt; ht-1] )   (controls the output gate)
These four matrix computations should be done concurrently.
Information flow of LSTM with "peephole"
The peephole also stacks ct-1 into the input of each transform; the weight block acting on ct-1 is diagonal:
z = tanh( W [xt; ht-1; ct-1] )
zi, zf and zo are obtained in the same way (with σ).
Information flow of LSTM
ct = zf ⊙ ct-1 + zi ⊙ z
ht = zo ⊙ tanh(ct)
yt = σ(W' ht)
(⊙ is element-wise multiplication.)
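The last two slides combined into a self-contained numpy sketch (weight names follow the slides; biases omitted):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x, h_prev, c_prev, W, Wi, Wf, Wo, Wy):
    v = np.concatenate([x, h_prev])     # stack xt and ht-1
    z  = np.tanh(W @ v)                 # updating information
    zi = sigmoid(Wi @ v)                # input gate
    zf = sigmoid(Wf @ v)                # forget gate
    zo = sigmoid(Wo @ v)                # output gate
    c = zf * c_prev + zi * z            # ct = zf ⊙ ct-1 + zi ⊙ z
    h = zo * np.tanh(c)                 # ht = zo ⊙ tanh(ct)
    y = sigmoid(Wy @ h)                 # yt = σ(W' ht)
    return h, c, y
```

The four products W @ v, Wi @ v, Wf @ v, Wo @ v can be fused into a single matrix multiply, which is why the slide says to compute them concurrently.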
LSTM information flow over time
[Figure: two consecutive steps chained together: (ct-1, ht-1) and xt produce (ct, ht) and yt; then (ct, ht) and xt+1 produce (ct+1, ht+1) and yt+1. Both c and h are carried from one step to the next.]
GRU – gated recurrent unit (more compression)
It combines the forget and input gates into a single update gate. It also merges the cell state and hidden state. This is simpler than the LSTM. There are many other variants too.
[Figure: LSTM cell vs GRU cell, showing the GRU's reset gate and update gate; ×/* denote element-wise multiplication.]
A GRU also takes xt and ht-1 as inputs, performs some calculations, and then passes ht along. What makes it different from the LSTM is that a GRU does not need a cell state to pass values along. The calculations within each iteration ensure that the ht values being passed along either retain a high amount of old information or are jump-started with a high amount of new information.
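A self-contained sketch of one GRU step. The slides do not write the gate equations out, so this follows the common formulation (biases omitted):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_step(x, h_prev, Wr, Wz, Wh):
    v = np.concatenate([x, h_prev])
    r = sigmoid(Wr @ v)                     # reset gate
    z = sigmoid(Wz @ v)                     # update gate (merged forget + input)
    h_cand = np.tanh(Wh @ np.concatenate([x, r * h_prev]))   # candidate state h'
    return z * h_prev + (1.0 - z) * h_cand  # ht: gated mix of old state and h'
```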
Feed-forward vs Recurrent Network
We will turn the recurrent network 90 degrees so the two can be compared.
[Figure: feedforward network x → f1 → a1 → f2 → a2 → f3 → a3 → f4 → y, where t indexes the layer; recurrent network (h0, x1) → f → h1, (h1, x2) → f → h2, (h2, x3) → f → h3, (h3, x4) → f → g → y4, where t indexes the time step.]
1. The feedforward network does not have an input at each step.
2. The feedforward network has different parameters for each layer.
Feedforward: at = ft(at-1) = σ(Wt at-1 + bt)
Recurrent: at = f(at-1, xt) = σ(Wh at-1 + Wi xt + bi)
[Figure: the GRU cell redrawn with layer indices: reset gate r and update gate z, with ht mixing ht-1 and h' through z and 1-z.]
Read with a layer index instead of a time index, at-1 is the output of the (t-1)-th layer and at is the output of the t-th layer. Removing the input xt at each step, the output yt at each step, and the reset gate turns the GRU into the Highway Network of the next slide.
Highway Network
• Residual Network: Deep Residual Learning for Image Recognition, http://arxiv.org/abs/1512.03385
• Highway Network: Training Very Deep Networks, https://arxiv.org/pdf/1507.06228v2.pdf
[Figure: the residual block copies at-1 and adds h' to it; the highway block uses a gate controller z (the red arrow) to mix at-1 and h'.]
h' = σ(W at-1)
z = σ(W' at-1)
at = z ⊙ at-1 + (1-z) ⊙ h'
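The highway layer as a self-contained sketch; the residual block would instead return a_prev + h with no gate:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def highway_layer(a_prev, W, W_gate):
    h = sigmoid(W @ a_prev)                 # h' = σ(W at-1)
    z = sigmoid(W_gate @ a_prev)            # z = σ(W' at-1), the gate controller
    return z * a_prev + (1.0 - z) * h       # at = z ⊙ at-1 + (1-z) ⊙ h'
```

When z ≈ 1 the layer simply copies its input, which is what lets a very deep stack of such layers train.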
[Figure: three networks of different depths, each from input layer to output layer.]
The Highway Network automatically determines the layers needed!
Highway Network Experiments
Grid LSTM
Memory for both time and depth.
[Figure: a standard LSTM maps (c, h) and input x to (c', h') and output y; a Grid LSTM block carries (c, h) along the time axis and (a, b) along the depth axis, mapping (c, h, a, b) to (c', h', a', b').]
Grid LSTM
[Figure: inside the Grid LSTM block, the same gated update (z, zi, zf, zo with tanh) that maps (c, h) to (c', h') is also applied along depth to map (a, b) to (a', b').]
You can generalize this to 3D, and more.
Applications of LSTM / RNN
Neural machine translation (LSTM)
Sequence to sequence chat model
Chat with context
[Figure: example exchanges. U: Hi → M: Hi; U: Hi → M: Hello. With context, the model's reply can depend on the preceding turns.]
Serban, Iulian V., Alessandro Sordoni, Yoshua Bengio, Aaron Courville, and Joelle Pineau, 2015. "Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models."
Baidu’s speech recognition using RNN
Attention
Image caption generation using attention
(From CY Lee lecture)
[Figure: a CNN turns the image into a grid of filter responses, one vector for each region; a match module compares z0 against a region vector, giving a score such as 0.7.]
z0 is an initial parameter; it is also learned.
Image Caption Generation
[Figure: attention to a region. The match scores over the regions (0.7, 0.1, 0.1 / 0.1, 0.0, 0.0) weight the region vectors; the weighted sum, together with z0, produces Word 1 and the next query z1.]
Image Caption Generation
[Figure: the second step. New attention weights (0.0, 0.8, 0.2 / 0.0, 0.0, 0.0) give a new weighted sum, which with z1 produces Word 2 and the next query z2.]
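A sketch of one attention step. The slides show only match scores and a weighted sum; using a dot product as the match function and a softmax to normalize the weights are our assumptions:

```python
import numpy as np

def attend(z, regions):
    scores = np.array([z @ r for r in regions])     # match(z, region vector)
    alphas = np.exp(scores - scores.max())
    alphas /= alphas.sum()                          # normalized attention weights
    context = sum(a * r for a, r in zip(alphas, regions))  # weighted sum
    return context, alphas

rng = np.random.default_rng(1)
regions = [rng.normal(size=5) for _ in range(4)]    # one vector per image region
z0 = rng.normal(size=5)                             # initial query, itself learned
context, alphas = attend(z0, regions)
print(alphas)                                       # where the model is looking
```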
Image Caption Generation
Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron
Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio, “Show,
Attend and Tell: Neural Image Caption Generation with Visual Attention”,
ICML, 2015
Image Caption Generation (more examples from the same paper)
Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo
Larochelle, Aaron Courville, “Describing Videos by Exploiting Temporal Structure”, ICCV,
2015
* Possible project?
Editor's Notes
1. Caption generation can also be done story-style: see http://www.cs.toronto.edu/~mbweb/ and https://github.com/ryankiros/neural-storyteller
2. Another application is summarization.