Lecture 6 Smaller Network: RNN
This is our fully connected network. If the input x1 ... xn is long, with n very large and growing,
this network becomes too large. Instead, we input one xi at a time
and re-use the same edge weights.
Recurrent Neural Network
How does RNN reduce complexity?
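[Figure: the RNN unrolled in time. The same f maps (h0, x1) to (h1, y1), then (h1, x2) to (h2, y2), then (h2, x3) to (h3, y3), and so on.]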
No matter how long the input/output sequence is, we only
need one function f. If the f's were different at each step, it would become a
feedforward NN. This may be viewed as another compression
of the fully connected network.
Given function f: $h', y = f(h, x)$
h and h' are vectors with the same dimension.
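A minimal sketch of this weight sharing, under stated assumptions: the helper names `make_step` and `unroll`, the tanh nonlinearity and the sizes are illustrative choices that do not appear on the slides. It only shows that one f, with a fixed number of parameters, can be applied to sequences of any length.

```python
import numpy as np

def make_step(hidden_dim, input_dim, seed=0):
    """Build one step function f(h, x) -> (h_next, y). Toy parameters, hypothetical."""
    rng = np.random.default_rng(seed)
    W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
    W_x = rng.normal(scale=0.1, size=(hidden_dim, input_dim))

    def f(h, x):
        h_next = np.tanh(W_h @ h + W_x @ x)  # h_next has the same dimension as h
        y = h_next                            # toy output: simply expose the state
        return h_next, y

    n_params = W_h.size + W_x.size
    return f, n_params

def unroll(f, h0, xs):
    """Apply the same f at every time step, threading the state through."""
    h, ys = h0, []
    for x in xs:
        h, y = f(h, x)
        ys.append(y)
    return h, ys

f, n_params = make_step(hidden_dim=4, input_dim=3)
for T in (5, 500):
    h_last, ys = unroll(f, np.zeros(4), np.ones((T, 3)))
    # The parameter count is the same for T = 5 and T = 500.
    print(T, len(ys), h_last.shape, n_params)
```

A fully connected network over the whole sequence would instead need weights for every one of the n inputs, so its size would grow with n.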
Deep RNN
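[Figure: two stacked recurrent layers unrolled over three time steps. The first layer applies f1 to (h0, x1), (h1, x2), (h2, x3) and outputs y1, y2, y3. The second layer applies f2 to (g0, y1), (g1, y2), (g2, y3) and outputs z1, z2, z3. Further layers can be stacked in the same way.]
$h', y = f_1(h, x)$, $g', z = f_2(g, y)$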
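The stacking can be sketched as follows. This is an illustration only: `make_layer` and `deep_rnn` are hypothetical names, and the tanh layer with random weights is an assumed stand-in for f1 and f2.

```python
import numpy as np

def make_layer(state_dim, input_dim, seed):
    """Return a toy step function f(state, inp) -> (new_state, output)."""
    rng = np.random.default_rng(seed)
    W_s = rng.normal(scale=0.1, size=(state_dim, state_dim))
    W_in = rng.normal(scale=0.1, size=(state_dim, input_dim))

    def f(state, inp):
        new_state = np.tanh(W_s @ state + W_in @ inp)
        return new_state, new_state  # the output is the new state in this toy layer

    return f

def deep_rnn(f1, f2, h0, g0, xs):
    """Two stacked layers: the output y of f1 is the input of f2 at the same step."""
    h, g, zs = h0, g0, []
    for x in xs:
        h, y = f1(h, x)  # h', y = f1(h, x)
        g, z = f2(g, y)  # g', z = f2(g, y)
        zs.append(z)
    return np.stack(zs)

f1 = make_layer(state_dim=4, input_dim=3, seed=1)
f2 = make_layer(state_dim=5, input_dim=4, seed=2)  # its input size is f1's output size
zs = deep_rnn(f1, f2, np.zeros(4), np.zeros(5), np.ones((6, 3)))
print(zs.shape)  # (6, 5)
```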
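Bidirectional RNN
[Figure: f1 reads x1, x2, x3 in one direction with states h0 to h3 and outputs y1, y2, y3. f2 reads the same inputs in the opposite direction with states g0 to g3 and outputs z1, z2, z3. At each step f3 combines the two outputs into p1, p2, p3.]
$h', y = f_1(h, x)$, $g', z = f_2(g, x)$
$p = f_3(y, z)$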
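A hedged sketch of the bidirectional wiring. The names `make_layer`, `run` and `bidirectional` are hypothetical, the toy tanh layers stand in for f1 and f2, and choosing concatenation for f3 is an assumption (the slides only say p = f3(y, z)).

```python
import numpy as np

def make_layer(state_dim, input_dim, seed):
    """Return a toy step function f(state, inp) -> (new_state, output)."""
    rng = np.random.default_rng(seed)
    W_s = rng.normal(scale=0.1, size=(state_dim, state_dim))
    W_in = rng.normal(scale=0.1, size=(state_dim, input_dim))

    def f(state, inp):
        new_state = np.tanh(W_s @ state + W_in @ inp)
        return new_state, new_state

    return f

def run(f, state, xs):
    """Run one recurrent layer over xs in the given order and collect its outputs."""
    outs = []
    for x in xs:
        state, out = f(state, x)
        outs.append(out)
    return outs

def bidirectional(f1, f2, f3, h0, g0, xs):
    ys = run(f1, h0, xs)              # forward direction: x1 ... xT
    zs = run(f2, g0, xs[::-1])[::-1]  # reverse direction, re-aligned to x1 ... xT
    return np.stack([f3(y, z) for y, z in zip(ys, zs)])

f1 = make_layer(state_dim=4, input_dim=3, seed=1)
f2 = make_layer(state_dim=5, input_dim=3, seed=2)
f3 = lambda y, z: np.concatenate([y, z])  # assumed combiner

xs = np.random.default_rng(0).normal(size=(5, 3))
p = bidirectional(f1, f2, f3, np.zeros(4), np.zeros(5), xs)
print(p.shape)  # (5, 9)
```

Each p at step t therefore depends on the whole sequence: the forward half has seen x1 to xt and the reverse half has seen xt to xT.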
Pyramid RNN
Reducing the number of time steps
[Figure: a pyramid of stacked bidirectional RNN layers, each with fewer time steps than the layer below.]
Significantly speeds up training
W. Chan, N. Jaitly, Q. Le and O. Vinyals, “Listen, attend and spell: A neural
network for large vocabulary conversational speech recognition,” ICASSP, 2016
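One way to reduce the number of time steps between layers is to concatenate each pair of consecutive outputs of the layer below, which halves the sequence length. The sketch below shows only that reshaping step; `halve_time_steps` is a hypothetical name, and dropping a trailing odd frame is an assumption made here for simplicity.

```python
import numpy as np

def halve_time_steps(outputs):
    """Concatenate each pair of consecutive frames: (T, d) -> (T // 2, 2 * d)."""
    T, d = outputs.shape
    T_even = T - (T % 2)  # assumption: drop the last frame when T is odd
    return outputs[:T_even].reshape(T_even // 2, 2 * d)

frames = np.arange(8 * 3, dtype=float).reshape(8, 3)  # 8 time steps, 3 features
level1 = halve_time_steps(frames)
level2 = halve_time_steps(level1)
print(frames.shape, level1.shape, level2.shape)  # (8, 3) (4, 6) (2, 12)
```

In a full model a recurrent layer would run on each level before the next reduction, so every higher layer processes half as many steps as the one below it.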
Naïve RNN
[Figure: f takes h and x and produces h' and y.]
$h' = \sigma(W^h h + W^i x)$
$y = \mathrm{softmax}(W^o h')$
We have ignored the bias.
Note that y is computed from h'.
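The naïve RNN step can be written out directly. This is a sketch under assumptions: the function name `naive_rnn_step`, the sizes and the random weights are illustrative, and the sigmoid for h' follows the σ used for the recurrent update later in the deck.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    e = np.exp(a - a.max())  # subtract the max for numerical stability
    return e / e.sum()

def naive_rnn_step(h, x, W_h, W_i, W_o):
    """One step h', y = f(h, x), with the bias ignored as on the slide."""
    h_next = sigmoid(W_h @ h + W_i @ x)
    y = softmax(W_o @ h_next)  # y is computed from h', not from h
    return h_next, y

rng = np.random.default_rng(0)
W_h = rng.normal(size=(4, 4))  # hidden to hidden
W_i = rng.normal(size=(4, 3))  # input to hidden
W_o = rng.normal(size=(2, 4))  # hidden to output

h = np.zeros(4)
for x in np.ones((3, 3)):
    h, y = naive_rnn_step(h, x, W_h, W_i, W_o)
print(h.shape, y.shape, y.sum())
```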
Problems with naive RNN
When dealing with a time series, a naive RNN tends
to forget old information. When there is a
long-range dependency of unknown length, we
wish to have a “memory” for it.
Vanishing gradient problem.
The sigmoid layer outputs numbers between 0 and 1 that determine how much of
each component should be let through. The pink X gate is point-wise multiplication.
LSTM
The core idea is the cell state Ct. It changes slowly, with only minor
linear interactions, so it is very easy for information to flow
along it unchanged.
[Figure: an LSTM cell with inputs ht-1 and Ct-1; the forget gate, input gate and output gate are labelled.]
This sigmoid gate determines how much information goes through.
This decides what information is to be added to the cell state.
Output gate: controls what goes into the output.
Why sigmoid or tanh:
Sigmoid: output between 0 and 1, gating as a switch.
The vanishing gradient problem is already handled in LSTM.
Is it OK to replace tanh with ReLU?
The input gate decides which components are to be updated.
C't provides the candidate content for the change.
Updating the cell state
Decide what part of the cell state to output
RNN vs LSTM
Peephole LSTM
Allows “peeping into the memory”
Naïve RNN vs LSTM
c changes slowly: ct is ct-1 plus something
h changes faster: ht and ht-1 can be very different
[Figure: a naïve RNN block maps (xt, ht-1) to (yt, ht); an LSTM block maps (xt, ht-1, ct-1) to (yt, ht, ct).]
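Information flow of LSTM
[Figure: xt and ht-1 are stacked into one vector and multiplied by four weight matrices to give z, zi, zf and zo, which act on the cell state ct-1.]
$z = \tanh(W [x_t; h_{t-1}])$ (updating information)
$z^i = \sigma(W^i [x_t; h_{t-1}])$ (controls the input gate)
$z^f = \sigma(W^f [x_t; h_{t-1}])$ (controls the forget gate)
$z^o = \sigma(W^o [x_t; h_{t-1}])$ (controls the output gate)
These 4 matrix computations should be done concurrently.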
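Information flow of LSTM
[Figure: the same cell with a “peephole”: ct-1 is fed into the gate computations together with xt and ht-1.]
$z = \tanh(W [x_t; h_{t-1}; c_{t-1}])$, where the block of W that multiplies $c_{t-1}$ is diagonal.
$z^i$, $z^f$ and $z^o$ are obtained in the same way.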
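Information flow of LSTM
[Figure: the LSTM cell with inputs xt, ht-1, ct-1, internal signals z, zi, zf, zo and a tanh, and outputs yt, ht, ct.]
$c_t = z^f \odot c_{t-1} + z^i \odot z$
$h_t = z^o \odot \tanh(c_t)$
$y_t = \sigma(W' h_t)$
$\odot$ is element-wise multiplication.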
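The update equations above translate almost line by line into code. This is a minimal sketch without peepholes or biases; the function name `lstm_step`, the argument names and the sizes are assumptions made for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, W, W_i, W_f, W_o, W_out):
    """One LSTM step; returns (h_t, c_t, y_t)."""
    v = np.concatenate([x_t, h_prev])  # stack x_t and h_{t-1}
    z = np.tanh(W @ v)                 # updating information
    z_i = sigmoid(W_i @ v)             # input gate
    z_f = sigmoid(W_f @ v)             # forget gate
    z_o = sigmoid(W_o @ v)             # output gate
    c_t = z_f * c_prev + z_i * z       # c_t is c_{t-1} plus something
    h_t = z_o * np.tanh(c_t)
    y_t = sigmoid(W_out @ h_t)
    return h_t, c_t, y_t

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 3, 4, 2
W, W_i, W_f, W_o = (rng.normal(size=(n_hid, n_in + n_hid)) for _ in range(4))
W_out = rng.normal(size=(n_out, n_hid))

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):
    h, c, y = lstm_step(x, h, c, W, W_i, W_f, W_o, W_out)
print(h.shape, c.shape, y.shape)
```

The four products with `v` are independent of each other, so in practice the four matrices can be stacked into one and computed in a single multiplication.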
Information flow of LSTM
[Figure: two LSTM cells chained in time. The first maps (xt, ht-1, ct-1) to (yt, ht, ct); the second maps (xt+1, ht, ct) to (yt+1, ht+1, ct+1).]
GRU – gated recurrent unit
(more compression)
It combines the forget and input gates into a single update gate.
It also merges the cell state and hidden state. This is simpler
than LSTM. There are many other variants too.
[Figure: a GRU cell shown next to an LSTM cell, with the reset gate and update gate labelled. X and * denote element-wise multiplication.]
GRUs also take xt and ht-1 as inputs. They perform some
calculations and then pass along ht. What makes them different
from LSTMs is that GRUs don't need the cell state to pass values
along. The calculations within each iteration ensure that the ht
values being passed along either retain a high amount of old
information or are jump-started with a high amount of new
information.
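The slides do not spell out the GRU equations, so the sketch below assumes one common formulation: a reset gate r, an update gate z, a candidate state h', and the mix h_t = z * h_{t-1} + (1 - z) * h'. The function name `gru_step` and the weight names are hypothetical, and biases are omitted.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, W_r, W_z, W):
    """One GRU step under the assumed formulation; returns h_t."""
    v = np.concatenate([x_t, h_prev])
    r = sigmoid(W_r @ v)  # reset gate: how much of h_{t-1} feeds the candidate
    z = sigmoid(W_z @ v)  # update gate: how much old information is kept
    h_cand = np.tanh(W @ np.concatenate([x_t, r * h_prev]))
    return z * h_prev + (1.0 - z) * h_cand  # no separate cell state

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W_r, W_z, W = (rng.normal(size=(n_hid, n_in + n_hid)) for _ in range(3))

h = np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):
    h = gru_step(x, h, W_r, W_z, W)
print(h.shape)
```

When z is close to 1 the old state is passed along almost unchanged, and when z is close to 0 the state is replaced by the new candidate, which is the "retain old or jump-start with new" behaviour described above.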
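Feed-forward vs Recurrent Network
[Figure: a feedforward network x → f1 → a1 → f2 → a2 → f3 → a3 → f4 → y, next to a recurrent network in which the same f maps (h0, x1) to h1, (h1, x2) to h2, (h2, x3) to h3, and then (h3, x4) is passed through f and g to give y4.]
1. Feedforward network does not have input at each step
2. Feedforward network has different parameters for each layer
Feedforward: $a_t = f_t(a_{t-1}) = \sigma(W_t a_{t-1} + b_t)$, where t is the layer.
Recurrent: $a_t = f(a_{t-1}, x_t) = \sigma(W^h a_{t-1} + W^i x_t + b^i)$, where t is the time step.
We will turn the recurrent network 90 degrees.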
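[Figure: the GRU cell (inputs xt and ht-1, reset gate r, update gate z, candidate h', the 1-z branch, outputs yt and ht) redrawn as a feedforward layer from at-1 to at.]
No input xt at each step
No output yt at each step
at-1 is the output of the (t-1)-th layer
at is the output of the t-th layer
No reset gate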
at-1
Highway Network
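• Residual Network: Deep Residual Learning for Image Recognition, http://arxiv.org/abs/1512.03385
• Highway Network: Training Very Deep Networks, https://arxiv.org/pdf/1507.06228v2.pdf
[Figure: residual block: at-1 is copied and added (+) to h' to give at. Highway block: a gate controller computes z from at-1, and z controls how much of the copied at-1 (red arrow) and how much of h' go into at.]
$h' = \sigma(W a_{t-1})$
$z = \sigma(W' a_{t-1})$
$a_t = z \odot a_{t-1} + (1 - z) \odot h'$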
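A small sketch of one highway layer following the three equations above. The function name `highway_layer`, the argument name `W_gate` (standing for W') and the sizes are illustrative assumptions; biases are omitted as in the slide.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def highway_layer(a_prev, W, W_gate):
    """a_t = z * a_{t-1} + (1 - z) * h', with h' and z both computed from a_{t-1}."""
    h = sigmoid(W @ a_prev)       # candidate output h'
    z = sigmoid(W_gate @ a_prev)  # gate controller
    return z * a_prev + (1.0 - z) * h

rng = np.random.default_rng(0)
dim, depth = 4, 10
a = rng.normal(size=dim)
for _ in range(depth):
    # Input and output must have the same dimension so that a_{t-1} can be copied.
    a = highway_layer(a, rng.normal(size=(dim, dim)), rng.normal(size=(dim, dim)))
print(a.shape)
```

Where z is close to 1 a layer simply copies its input, which is how a trained highway network can effectively skip layers it does not need.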
[Figure: three highway networks, each drawn from input layer to output layer.]
Highway Network automatically
determines the layers needed!
Highway Network Experiments
Grid LSTM
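[Figure: a standard LSTM block takes c, h and x and produces c', h' and y. A Grid LSTM block takes (c, h) along the time axis and (a, b) along the depth axis and produces (c', h') and (a', b').]
Memory for both time and depth
Grid LSTM
[Figure: inside the Grid LSTM block: inputs h, c, a, b; gate signals z, zi, zf, zo; a tanh; outputs h', c', a', b'.]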
You can generalize this to 3D, and more.
Applications of LSTM / RNN
Neural machine translation
LSTM
Sequence to sequence chat model
Chat with context
[Figure: two example chats in which the user (U) says “Hi” and the machine (M) answers “Hi” or “Hello”.]
Serban, Iulian V., Alessandro Sordoni, Yoshua Bengio, Aaron Courville, and Joelle
Pineau, 2015 "Building End-To-End Dialogue Systems Using Generative Hierarchical
Baidu’s speech recognition using RNN
Attention
Image caption generation using attention
(From CY Lee lecture)
[Figure: a CNN applies filters to the image and produces a vector for each region. z0 is matched against a region vector, giving a match score (0.7 in the example).]
z0 is an initial parameter; it is also learned.
Image Caption Generation
[Figure: the CNN produces a vector for each region. Matching z0 against the six regions gives attention weights 0.7, 0.1, 0.1, 0.1, 0.0, 0.0. The weighted sum of the region vectors is used to produce Word 1 and the next state z1.]
Attention to a region
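The weighted sum over regions can be sketched as below. The slides do not say how the match score is computed, so the dot product followed by a softmax is an assumption, as are the name `attend`, the random region vectors and the sizes. The second part reuses the six weights shown on the slide.

```python
import numpy as np

def attend(z, regions):
    """Return (weights, context) for a query z and a (num_regions, dim) array."""
    scores = regions @ z                     # assumed match function: dot product
    weights = np.exp(scores - scores.max())  # softmax over regions
    weights /= weights.sum()
    context = weights @ regions              # weighted sum of the region vectors
    return weights, context

rng = np.random.default_rng(0)
regions = rng.normal(size=(6, 8))  # one vector per image region
z0 = rng.normal(size=8)            # initial query, learned in the real model

weights, context = attend(z0, regions)
print(weights.shape, context.shape, weights.sum())

# The same weighted sum with the attention weights shown on the slide.
slide_weights = np.array([0.7, 0.1, 0.1, 0.1, 0.0, 0.0])
slide_context = slide_weights @ regions
print(slide_context.shape)
```

In the captioning model this context vector is what the decoder receives when it produces the next word and the next state.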
Image Caption Generation
[Figure: the same pipeline one step later. Starting from z0, Word 1 and z1 have already been produced. New attention weights over the six regions (0.0, 0.8, 0.2, 0.0, 0.0, 0.0) give a new weighted sum, which leads to z2 and Word 2.]
Image Caption Generation
Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron
Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio, “Show,
Attend and Tell: Neural Image Caption Generation with Visual Attention”,
ICML, 2015
Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo
Larochelle, Aaron Courville, “Describing Videos by Exploiting Temporal Structure”, ICCV,
2015
* Possible project?


Editor's Notes

  1. Caption generation can be made story-like. How to generate text in a story style: http://www.cs.toronto.edu/~mbweb/ https://github.com/ryankiros/neural-storyteller
  2. Another application is summarization