Lecture 6 Smaller Network: RNN
This is our fully connected network. If the input is x1 .... xn with n very large and growing, this network would become too large. We therefore input one xi at a time and re-use the same edge weights.
Recurrent Neural Network
How does RNN reduce complexity?
[Figure: the RNN unrolled in time. The same f maps (h0, x1) to (h1, y1), then (h1, x2) to (h2, y2), then (h2, x3) to (h3, y3), and so on.]
No matter how long the input/output sequence is, we only need one function f. If the f's were different, it would become a feedforward NN. This may be treated as another compression relative to the fully connected network.
Given a function f: h', y = f(h, x), where h and h' are vectors with the same dimension.
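To make the shared-f idea concrete, here is a minimal sketch (illustrative only, not from the slides) of unrolling one function over a whole sequence:

```python
def unroll(f, h0, xs):
    """Run one shared step function f over the whole sequence xs."""
    h, ys = h0, []
    for x in xs:
        h, y = f(h, x)   # h', y = f(h, x): the same weights at every step
        ys.append(y)
    return h, ys
```

However long xs grows, the parameter count stays that of the single f.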
Deep RNN
[Figure: two stacked recurrent layers. f1 runs over (h0, x1), (h1, x2), (h2, x3), … producing y1, y2, y3; a second recurrence f2 runs over (g0, y1), (g1, y2), (g2, y3), … producing z1, z2, z3.]
h', y = f1(h, x);  g', z = f2(g, y)
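Under the same illustrative conventions, stacking is just feeding one layer's output sequence into the next (this sketch reuses the unroll helper above; f1 and f2 are any concrete cells, e.g. the naïve RNN given later):

```python
def deep_rnn(f1, f2, h0, g0, xs):
    _, ys = unroll(f1, h0, xs)   # layer 1: h', y = f1(h, x)
    _, zs = unroll(f2, g0, ys)   # layer 2: g', z = f2(g, y)
    return ys, zs
```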
Bidirectional RNN
[Figure: a forward recurrence f1 reads x1, x2, x3 left to right, with y, h = f1(x, h); a backward recurrence f2 reads right to left, with z, g = f2(g, x); at each position a third function combines the two outputs, p = f3(y, z), giving p1, p2, p3.]
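A sketch of the bidirectional combination under the same assumptions (f3 could be concatenation or a small network; the choice is not fixed by the slide):

```python
def bidirectional(f1, f2, f3, h0, g0, xs):
    _, ys = unroll(f1, h0, xs)         # forward pass: y, h = f1(x, h)
    _, zs = unroll(f2, g0, xs[::-1])   # backward pass: z, g = f2(g, x)
    zs = zs[::-1]                      # re-align with the forward order
    return [f3(y, z) for y, z in zip(ys, zs)]   # p = f3(y, z)
```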
Pyramid RNN
Reduces the number of time steps: each (bidirectional) layer combines adjacent hidden states before passing them up, which significantly speeds up training.
W. Chan, N. Jaitly, Q. Le and O. Vinyals, "Listen, attend and spell: A neural network for large vocabulary conversational speech recognition," ICASSP, 2016.
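The paper reduces time steps by combining adjacent outputs between layers; a minimal sketch of that reduction (plain concatenation of consecutive pairs is assumed here):

```python
import numpy as np

def pyramid_reduce(hs):
    """(T, d) hidden states -> (T // 2, 2 * d): each output row
    concatenates two adjacent time steps."""
    T, d = hs.shape
    return hs[: T - (T % 2)].reshape(T // 2, 2 * d)
```

Each pyramid layer halves the sequence length, so the attention model above it sees far fewer steps.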
Naïve RNN
The function f maps (h, x) to (h', y):
  h' = σ( Wh h + Wi x )
  y = softmax( Wo h' )
We have ignored the bias. Note, y is computed from h'.
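A concrete sketch of this cell (the slide's σ is a generic activation; tanh is assumed here for the hidden update, and biases stay omitted as in the slide):

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def naive_rnn_step(h, x, Wh, Wi, Wo):
    h_next = np.tanh(Wh @ h + Wi @ x)   # h' = activation(Wh h + Wi x)
    y = softmax(Wo @ h_next)            # y is computed from h', not h
    return h_next, y
```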
Problems with the naïve RNN
When dealing with a time series, it tends to forget old information. When there is a distant relationship of unknown length, we wish to have a "memory" for it. It also suffers from the vanishing gradient problem.
The sigmoid layer outputs numbers between 0 and 1 that determine how much of each component should be let through. The pink × gate is point-wise multiplication.
LSTM
The core idea is the cell state Ct: it changes slowly, with only minor linear interactions, so it is very easy for information to flow along it unchanged.
[Figure: the LSTM cell, with ht-1 and Ct-1 entering from the left. The forget gate (a sigmoid) determines how much of Ct-1 goes through; the input gate decides what information is added to the cell state; the output gate controls what goes into the output.]
Why sigmoid or tanh? Sigmoid gives 0-1 gating that acts as a switch. Since the vanishing gradient problem is already handled in the LSTM, is it OK for ReLU to replace tanh?
The input gate decides which components are to be updated, and C't provides the candidate contents. The cell state is then updated, and finally we decide what part of the cell state to output.
RNN vs LSTM
Peephole LSTM
Allows “peeping into the memory”
Naïve RNN vs LSTM
c changes slowly: ct is ct-1 plus something.
h changes faster: ht and ht-1 can be very different.
[Figure: the naïve RNN maps (ht-1, xt) to (ht, yt); the LSTM carries two states, mapping (ct-1, ht-1, xt) to (ct, ht, yt).]
Information flow of LSTM
From xt and ht-1 (concatenated), four quantities are computed:
  z  = tanh( W  [xt, ht-1] )   (the updating information)
  zi = σ( Wi [xt, ht-1] )      (controls the input gate)
  zf = σ( Wf [xt, ht-1] )      (controls the forget gate)
  zo = σ( Wo [xt, ht-1] )      (controls the output gate)
These 4 matrix computations should be done concurrently.
Information flow of LSTM: the "peephole"
A peephole LSTM also feeds ct-1 to the gates: z = tanh( W [xt, ht-1, ct-1] ), where the block of W acting on ct-1 is diagonal; zi, zf and zo are obtained in the same way.
Information flow of LSTM
  ct = zf ⊙ ct-1 + zi ⊙ z
  ht = zo ⊙ tanh(ct)
  yt = σ( W' ht )
(⊙: element-wise multiply)
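Putting the two LSTM slides together, a sketch of one full step; the four gate products are fused into a single matrix multiply, matching the remark that the 4 computations should be done concurrently (names and shapes are illustrative):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x, h_prev, c_prev, W, W_out):
    """W stacks the four weight matrices, shape (4n, m + n)."""
    v = np.concatenate([x, h_prev])
    a, ai, af, ao = np.split(W @ v, 4)            # one fused matmul
    z, zi, zf, zo = np.tanh(a), sigmoid(ai), sigmoid(af), sigmoid(ao)
    c = zf * c_prev + zi * z                      # ct = zf (*) ct-1 + zi (*) z
    h = zo * np.tanh(c)                           # ht = zo (*) tanh(ct)
    y = sigmoid(W_out @ h)                        # yt = sigma(W' ht)
    return h, c, y
```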
LSTM information flow (unrolled)
[Figure: the same block repeated over time: (ct-1, ht-1, xt) produce (ct, ht, yt), which together with xt+1 produce (ct+1, ht+1, yt+1), and so on.]
GRU – gated recurrent unit (more compression)
It combines the forget and input gates into a single update gate, and it merges the cell state and hidden state. This is simpler than the LSTM. There are many other variants too.
[Figure: the GRU cell next to the LSTM, with its reset gate and update gate; ×: element-wise multiply.]
GRUs also take xt and ht-1 as inputs. They perform some calculations and then pass along ht. What makes them different from LSTMs is that GRUs don't need a cell state to pass values along. The calculations within each iteration ensure that the ht values being passed along either retain a high amount of old information or are jump-started with a high amount of new information.
Feed-forward vs Recurrent Network
[Figure: a feedforward network x → f1 → a1 → f2 → a2 → f3 → a3 → f4 → y, where t indexes the layer; and a recurrent network where h0 and x1 enter f, then h1 and x2, then h2 and x3, then h3 and x4, with g finally producing y4, where t indexes the time step. We will turn the recurrent network 90 degrees.]
1. A feedforward network does not have input at each step.
2. A feedforward network has different parameters for each layer.
Feedforward: at = ft(at-1) = σ( Wt at-1 + bt )
Recurrent:  at = f(at-1, xt) = σ( Wh at-1 + Wi xt + bi )
[Figure: the GRU at time t. The reset gate r and update gate z are computed from xt and ht-1; the candidate h' is computed from xt and the reset-gated ht-1; then ht = z ⊙ ht-1 + (1−z) ⊙ h'.]
A feedforward layer, where at-1 is the output of the (t-1)-th layer and at is the output of the t-th layer, is like this cell with no input xt at each step, no output yt at each step, and no reset gate.
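A sketch of one GRU step matching the figure (biases omitted; note that some references swap which branch of the update gets z):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_step(x, h_prev, Wr, Wz, Wc):
    v = np.concatenate([x, h_prev])
    r = sigmoid(Wr @ v)                        # reset gate
    z = sigmoid(Wz @ v)                        # update gate
    h_cand = np.tanh(Wc @ np.concatenate([x, r * h_prev]))
    return z * h_prev + (1.0 - z) * h_cand     # ht = z (*) ht-1 + (1-z) (*) h'
```

There is no separate cell state: ht itself carries the slowly changing memory whenever z stays near 1.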
Highway Network
• Residual Network: "Deep Residual Learning for Image Recognition", http://arxiv.org/abs/1512.03385
• Highway Network: "Training Very Deep Networks", https://arxiv.org/pdf/1507.06228v2.pdf
[Figure: the residual block copies at-1 and adds the transform h'; the highway block mixes at-1 and h' through a gate controller z, which controls the copy path (the red arrow).]
  h' = σ( W at-1 )
  z = σ( W' at-1 )
  at = z ⊙ at-1 + (1−z) ⊙ h'
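A sketch of one highway layer following these equations (bias omitted):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def highway_layer(a_prev, W, W_gate):
    h = sigmoid(W @ a_prev)            # h' = sigma(W a_{t-1})
    z = sigmoid(W_gate @ a_prev)       # gate controller z = sigma(W' a_{t-1})
    return z * a_prev + (1 - z) * h    # a_t = z (*) a_{t-1} + (1-z) (*) h'
```

When z saturates at 1 the layer is a pure copy, which is how the network can effectively skip layers it does not need.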
[Figure: input layer to output layer for a plain deep network, a residual network, and a highway network.]
The Highway Network automatically determines the number of layers needed!
Highway Network Experiments
Grid LSTM
[Figure: a standard LSTM maps (c, h) plus input x to (c', h') and output y. A Grid LSTM block keeps memory for both time and depth: along time it maps (c, h) to (c', h'), and along depth it maps (b, a) to (b', a').]
Grid LSTM
[Figure: inside the block, the hidden vectors drive the usual z, zi, zf, zo gates with tanh, updating c to c' and h to h' along time, and (b, a) to (b', a') along depth.]
You can generalize this to 3D, and more.
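A sketch of a 2-D Grid LSTM block under the conventions above (one LSTM transform per dimension, both reading the concatenated hidden vectors; the exact wiring and weight shapes here are an assumption for illustration):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_transform(inp, mem, W):
    """Generic LSTM transform; W has shape (4n, len(inp))."""
    a, ai, af, ao = np.split(W @ inp, 4)
    mem_new = sigmoid(af) * mem + sigmoid(ai) * np.tanh(a)
    hid_new = sigmoid(ao) * np.tanh(mem_new)
    return hid_new, mem_new

def grid_lstm_block(h, c, a, b, W_time, W_depth):
    inp = np.concatenate([h, a])                    # hidden vectors from time and depth
    h_new, c_new = lstm_transform(inp, c, W_time)   # time dimension: (h, c) -> (h', c')
    a_new, b_new = lstm_transform(inp, b, W_depth)  # depth dimension: (a, b) -> (a', b')
    return h_new, c_new, a_new, b_new
```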
Applications of LSTM / RNN
Neural machine translation [figure: LSTM encoder-decoder]
Sequence-to-sequence chat model
Chat with context
[Figure: hierarchical dialogue example; turns such as "U: Hi" and "M: Hello" are fed as context for the next response.]
Serban, Iulian V., Alessandro Sordoni, Yoshua Bengio, Aaron Courville, and Joelle Pineau, 2015, "Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models".
Baidu’s speech recognition using RNN
Attention
Image caption generation using attention (from the CY Lee lecture)
[Figure: a CNN produces a vector for each image region (grids of filters); an initial query z0 is matched against each region vector, e.g. with match score 0.7.]
z0 is an initial parameter; it is also learned.
Image Caption Generation
[Figure: the region vectors get attention weights 0.7, 0.1, 0.1, 0.1, 0.0, 0.0 (attention to a region); their weighted sum, together with z0, drives the decoder, which emits Word 1 and the next query z1.]
Image Caption Generation
[Figure: at the next step the weights shift to 0.0, 0.8, 0.2, 0.0, 0.0, 0.0; the weighted sum with z1 produces Word 2 and z2.]
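A sketch of the attention step used in these slides (a dot-product match is assumed; the lecture's match function could be any small network):

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def attention_step(regions, z):
    """regions: (num_regions, d) CNN vectors; z: (d,) current query.
    Returns the weighted sum fed to the decoder at this step."""
    scores = regions @ z        # match score per region, e.g. 0.7, 0.1, ...
    weights = softmax(scores)   # the 0.7 / 0.1 / ... weights in the figure
    return weights @ regions    # weighted sum = context vector
```

Each decoding step emits a word and a new query (z1, z2, …), so the weights, and hence the attended region, shift from step to step.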
Image Caption Generation
Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio, "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention", ICML, 2015.
Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo Larochelle, Aaron Courville, "Describing Videos by Exploiting Temporal Structure", ICCV, 2015.
* Possible project?
Editor's Notes
1. Caption generation can also be done story-style. See http://www.cs.toronto.edu/~mbweb/ and https://github.com/ryankiros/neural-storyteller
2. Another application is summarization.