Introduction to
Sequence to Sequence Model
2017.03.16 Seminar
Presenter : Hyemin Ahn
Recurrent Neural Networks : For what?
2017-03-28 CPSLAB (EECS) 2
 Humans remember and use patterns in sequences.
• Try 'a b c d e f g …'
• But how about 'z y x w v u t s …'?
 The idea behind RNNs is to make use of sequential information.
 Let's learn the pattern of a sequence, and then utilize it (estimate, generate, etc.)!
 But HOW?
Recurrent Neural Networks : Typical RNNs
[Figure: a typical RNN cell — input x_t enters, the hidden state h_t is fed back with a one-step delay, and the output y_t leaves; U, W, V are the input, recurrent, and output weight matrices]
 RNNs are called "RECURRENT" because they perform the same task for every element of a sequence, with each output depending on the previous computations.
 RNNs have a "memory" which captures information about what has been calculated so far.
 The hidden state h_t captures some information about the sequence:
h_t = f(U x_t + W h_{t-1} + b)
y_t = V h_t + c
 If we use f = tanh, the vanishing/exploding gradient problem happens.
 To overcome this, we use LSTM/GRU.
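The recurrence above can be sketched in a few lines of numpy. This is a minimal illustration, not the seminar's code: the sizes (input_dim=3, hidden_dim=4) and random weights are assumptions, while the names U, W, V, b, c follow the slide's equations.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4  # illustrative sizes (an assumption)

U = rng.normal(size=(hidden_dim, input_dim))   # input-to-hidden weights
W = rng.normal(size=(hidden_dim, hidden_dim))  # hidden-to-hidden weights
V = rng.normal(size=(input_dim, hidden_dim))   # hidden-to-output weights
b = np.zeros(hidden_dim)
c = np.zeros(input_dim)

def rnn_step(x_t, h_prev):
    """h_t = f(U x_t + W h_{t-1} + b), y_t = V h_t + c, with f = tanh."""
    h_t = np.tanh(U @ x_t + W @ h_prev + b)
    y_t = V @ h_t + c
    return h_t, y_t

# Unroll over a short sequence: the SAME weights are reused at every step,
# which is exactly why the network is called "recurrent".
h = np.zeros(hidden_dim)
for x in rng.normal(size=(5, input_dim)):
    h, y = rnn_step(x, h)
```

Because tanh squashes the state into (−1, 1), repeated multiplication by W during backpropagation shrinks or blows up gradients — the vanishing/exploding problem the slide mentions.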
Recurrent Neural Networks : LSTM
 Let's think about a machine that guesses the dinner menu from the items in a shopping bag.
"Umm… Carbonara!"
Recurrent Neural Networks : LSTM
C_t : the cell state — an internal memory unit, like a conveyor belt running through the cell.
[Figure: step-by-step walkthrough of an LSTM cell with input x_t, hidden state h_t, and cell state C_t — first "Forget some memories!", then "Insert some memories!"]
LSTM learns (1) how to forget a memory when h_{t-1} and the new input x_t are given, and (2) then how to add the new memory given h_{t-1} and x_t.
Figures from http://colah.github.io/posts/2015-08-Understanding-LSTMs/
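The forget-then-insert behavior described above can be sketched as one LSTM step in numpy. This is a sketch under assumed sizes (d = 4 for both input and hidden state) with random illustrative weights, not the seminar's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4  # hidden and input size (an assumption for the example)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix per gate, each acting on the concatenation [h_{t-1}, x_t].
Wf, Wi, Wo, Wc = (rng.normal(size=(d, 2 * d)) for _ in range(4))
bf, bi, bo, bc = (np.zeros(d) for _ in range(4))

def lstm_step(x_t, h_prev, C_prev):
    hx = np.concatenate([h_prev, x_t])
    f_t = sigmoid(Wf @ hx + bf)          # forget gate: which memories to drop
    i_t = sigmoid(Wi @ hx + bi)          # input gate: which memories to insert
    o_t = sigmoid(Wo @ hx + bo)          # output gate
    C_tilde = np.tanh(Wc @ hx + bc)      # candidate new memory
    C_t = f_t * C_prev + i_t * C_tilde   # conveyor belt: forget, then insert
    h_t = o_t * np.tanh(C_t)             # expose a gated view of the memory
    return h_t, C_t

h, C = np.zeros(d), np.zeros(d)
for x in rng.normal(size=(3, d)):
    h, C = lstm_step(x, h, C)
```

The cell state C_t is only ever scaled (by f_t) and added to (by i_t ∗ C̃_t), which is what lets gradients flow along it more easily than through a plain tanh recurrence.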
Recurrent Neural Networks : GRU
LSTM:
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)
C_t = f_t ∗ C_{t-1} + i_t ∗ C̃_t
h_t = o_t ∗ tanh(C_t)

Maybe we can simplify this structure efficiently!

GRU:
z_t = σ(W_z · [h_{t-1}, x_t] + b_z)
r_t = σ(W_r · [h_{t-1}, x_t] + b_r)
h̃_t = tanh(W_h · [r_t ∗ h_{t-1}, x_t] + b_h)
h_t = (1 − z_t) ∗ h_{t-1} + z_t ∗ h̃_t
Figures from http://colah.github.io/posts/2015-08-Understanding-LSTMs/
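The simplified GRU equations above translate directly into a numpy step. As before, the size d = 4 and the random weights are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4  # hidden and input size (an assumption)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# The GRU merges the LSTM's cell and hidden state and uses only two gates.
Wz, Wr, Wh = (rng.normal(size=(d, 2 * d)) for _ in range(3))
bz, br, bh = (np.zeros(d) for _ in range(3))

def gru_step(x_t, h_prev):
    hx = np.concatenate([h_prev, x_t])
    z_t = sigmoid(Wz @ hx + bz)        # update gate: how much to rewrite
    r_t = sigmoid(Wr @ hx + br)        # reset gate: how much history to read
    h_tilde = np.tanh(Wh @ np.concatenate([r_t * h_prev, x_t]) + bh)
    return (1 - z_t) * h_prev + z_t * h_tilde  # interpolate old and new state

h = np.zeros(d)
for x in rng.normal(size=(3, d)):
    h = gru_step(x, h)
```

Compared to the LSTM, the GRU needs three weight matrices instead of four and keeps a single state vector, which is the efficiency the slide is pointing at.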
Sequence to Sequence Model: What is it?
[Figure: an LSTM/GRU Encoder reads the input into hidden states h_e(1) … h_e(T_e); an LSTM/GRU Decoder then generates the output through hidden states h_d(1), h_d(2), … — illustrated as a "Western food to Korean food" transition]
Sequence to Sequence Model: Implementation
 The simplest way to implement a sequence-to-sequence model is to just pass the last hidden state of the encoder, h_T, to the first GRU cell of the decoder!
 However, this method gets weaker when the decoder needs to generate a longer sequence, because the whole input must be squeezed into that single vector.
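The handoff can be sketched with plain tanh RNN cells standing in for GRUs, to keep the example short. All sizes and weights here are illustrative assumptions; the point is only that the decoder's initial state is the encoder's final state.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4  # shared state/feature size (an assumption)

U_e, W_e = rng.normal(size=(d, d)), rng.normal(size=(d, d))  # encoder weights
U_d, W_d = rng.normal(size=(d, d)), rng.normal(size=(d, d))  # decoder weights
V = rng.normal(size=(d, d))                                  # output projection

def encode(xs):
    h = np.zeros(d)
    for x in xs:                       # read the whole input sequence...
        h = np.tanh(U_e @ x + W_e @ h)
    return h                           # ...compressed into one vector h_T

def decode(h_T, steps):
    h, y, ys = h_T, np.zeros(d), []    # decoder starts from the encoder's h_T
    for _ in range(steps):             # feed back the previous output y_{t-1}
        h = np.tanh(U_d @ y + W_d @ h)
        y = V @ h
        ys.append(y)
    return ys

source = rng.normal(size=(5, d))
outputs = decode(encode(source), steps=3)
```

Everything the decoder knows about the source lives in that one vector h_T — which is exactly why long outputs degrade and why attention (next slide) helps.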
Sequence to Sequence Model: Attention Decoder
[Figure: a Bidirectional GRU Encoder feeds an Attention GRU Decoder through context vectors c_t]
 For each GRU cell composing the decoder, let's pass the encoder's information differently!
h_i = [→h_i ; ←h_i]  (concatenation of the forward and backward encoder states)
c_i = Σ_{j=1}^{T_x} α_{ij} h_j
α_{ij} = exp(e_{ij}) / Σ_{k=1}^{T_x} exp(e_{ik})
e_{ij} = v_a^T tanh(W_a s_{i-1} + U_a h_j)
s_i = f(s_{i-1}, y_{i-1}, c_i) = (1 − z_i) ∗ s_{i-1} + z_i ∗ s̃_i
z_i = σ(W_z y_{i-1} + U_z s_{i-1})
r_i = σ(W_r y_{i-1} + U_r s_{i-1})
s̃_i = tanh(W y_{i-1} + U[r_i ∗ s_{i-1}] + C c_i)
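The alignment computation above — scores e_ij, softmax weights α_ij, and the context vector c_i — can be sketched in a few lines. The state size d, encoder length T_x, and all weights are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(4)
d, Tx = 4, 6                          # state size, encoder sequence length

W_a, U_a = rng.normal(size=(d, d)), rng.normal(size=(d, d))
v_a = rng.normal(size=d)

h = rng.normal(size=(Tx, d))          # encoder states h_1 .. h_{T_x}
s_prev = rng.normal(size=d)           # previous decoder state s_{i-1}

# e_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j): how well position j matches step i
e = np.array([v_a @ np.tanh(W_a @ s_prev + U_a @ h_j) for h_j in h])

# alpha_ij: softmax over j (subtracting the max for numerical stability)
alpha = np.exp(e - e.max()) / np.exp(e - e.max()).sum()

# c_i: context vector, a weighted sum of ALL encoder states
c = alpha @ h
```

Unlike the last-hidden-state handoff, the decoder recomputes a fresh c_i at every output step, so it can look back at different parts of the input as it generates.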
Sequence to Sequence Model: Example codes
Codes Here @ Github
complete construction, environmental and economics information of biomass com...
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptx
 

Introduction to seq2seq (sequence to sequence) models and RNNs

  • 1. Introduction to Sequence to Sequence Model 2017.03.16 Seminar Presenter : Hyemin Ahn
  • 2. Recurrent Neural Networks : For what? 2017-03-28 CPSLAB (EECS) 2  Humans remember and use the patterns of sequences. • Try ‘a b c d e f g…’ • But how about ‘z y x w v u t s…’ ?  The idea behind RNNs is to make use of sequential information.  Let’s learn the pattern of a sequence, and then utilize it (estimate, generate, etc.)!  But HOW?
  • 3. Recurrent Neural Networks : Typical RNNs 2017-03-28 CPSLAB (EECS) 3 OUTPUT INPUT ONE STEP DELAY HIDDEN STATE  RNNs are called “recurrent” because they perform the same task for every element of a sequence, with the output depending on the previous computations.  RNNs have a “memory” which captures information about what has been computed so far.  The hidden state h_t captures some information about the sequence: h_t = f(U x_t + W h_{t−1} + b), y_t = V h_t + c.  If we use f = tanh, the vanishing/exploding gradient problem happens.  To overcome this, we use LSTM/GRU.
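The recurrence on this slide can be sketched in a few lines of numpy. This is an illustrative toy, not code from the seminar: the dimensions, seed, and weight scales are arbitrary choices, and f = tanh as in the slide.

```python
import numpy as np

# Vanilla RNN step from the slide: h_t = tanh(U x_t + W h_{t-1} + b),
# y_t = V h_t + c. The same weights (U, W, V) are reused at every time step,
# which is what makes the network "recurrent". Sizes here are made up.
rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 4, 8, 3

U = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
V = rng.normal(scale=0.1, size=(output_dim, hidden_dim))
b = np.zeros(hidden_dim)
c = np.zeros(output_dim)

def rnn_step(x_t, h_prev):
    """One recurrence step; h_t carries the 'memory' forward."""
    h_t = np.tanh(U @ x_t + W @ h_prev + b)
    y_t = V @ h_t + c
    return h_t, y_t

# Unroll over a short random input sequence.
h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):
    h, y = rnn_step(x_t, h)
```

Repeated multiplication by W inside tanh is also where the vanishing/exploding gradient problem mentioned on the slide comes from.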
  • 4. Recurrent Neural Networks : LSTM 2017-03-28 CPSLAB (EECS) 4  Let’s think about a machine that guesses the dinner menu from the items in a shopping bag. Umm… Carbonara!
  • 5. Recurrent Neural Networks : LSTM 2017-03-28 CPSLAB (EECS) 5 C_t : cell state, internal memory unit, like a conveyor belt! h_t x_t
  • 6. Recurrent Neural Networks : LSTM 2017-03-28 CPSLAB (EECS) 6 C_t : cell state, internal memory unit, like a conveyor belt! h_t x_t Forget some memories!
  • 7. Recurrent Neural Networks : LSTM 2017-03-28 CPSLAB (EECS) 7 C_t : cell state, internal memory unit, like a conveyor belt! h_t x_t Forget some memories! The LSTM learns (1) how to forget a memory when h_{t−1} and the new input x_t are given, and (2) how to add a new memory given h_{t−1} and x_t.
  • 8. Recurrent Neural Networks : LSTM 2017-03-28 CPSLAB (EECS) 8 C_t : cell state, internal memory unit, like a conveyor belt! h_t x_t Insert some memories! The LSTM learns (1) how to forget a memory when h_{t−1} and the new input x_t are given, and (2) how to add a new memory given h_{t−1} and x_t.
  • 9. Recurrent Neural Networks : LSTM 2017-03-28 CPSLAB (EECS) 9 C_t : cell state, internal memory unit, like a conveyor belt! h_t x_t The LSTM learns (1) how to forget a memory when h_{t−1} and the new input x_t are given, and (2) how to add a new memory given h_{t−1} and x_t.
  • 10. Recurrent Neural Networks : LSTM 2017-03-28 CPSLAB (EECS) 10 C_t : cell state, internal memory unit, like a conveyor belt! h_t x_t The LSTM learns (1) how to forget a memory when h_{t−1} and the new input x_t are given, and (2) how to add a new memory given h_{t−1} and x_t.
  • 11. Recurrent Neural Networks : LSTM 2017-03-28 CPSLAB (EECS) 11 C_t : cell state, internal memory unit, like a conveyor belt! h_t y_t x_t The LSTM learns (1) how to forget a memory when h_{t−1} and the new input x_t are given, and (2) how to add a new memory given h_{t−1} and x_t.
  • 12. Recurrent Neural Networks : LSTM 2017-03-28 CPSLAB (EECS) 12 The LSTM learns (1) how to forget a memory when h_{t−1} and the new input x_t are given, and (2) how to add a new memory given h_{t−1} and x_t. Figures from http://colah.github.io/posts/2015-08-Understanding-LSTMs/
  • 13. Recurrent Neural Networks : LSTM 2017-03-28 CPSLAB (EECS) 13 The LSTM learns (1) how to forget a memory when h_{t−1} and the new input x_t are given, and (2) how to add a new memory given h_{t−1} and x_t. Figures from http://colah.github.io/posts/2015-08-Understanding-LSTMs/
  • 14. Recurrent Neural Networks : LSTM 2017-03-28 CPSLAB (EECS) 14 The LSTM learns (1) how to forget a memory when h_{t−1} and the new input x_t are given, and (2) how to add a new memory given h_{t−1} and x_t. Figures from http://colah.github.io/posts/2015-08-Understanding-LSTMs/
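The two things the LSTM learns here, forgetting and inserting memories into the cell-state "conveyor belt", can be sketched as one numpy step. This is a minimal illustration with made-up dimensions, not the seminar's code; the gate equations follow the standard LSTM formulation shown on the next slide.

```python
import numpy as np

# One LSTM step: the forget gate f_t erases parts of the old cell state
# C_{t-1}, the input gate i_t writes a new candidate memory C~_t, and the
# output gate o_t decides which part of C_t is exposed as h_t.
rng = np.random.default_rng(1)
n_in, n_hid = 4, 6

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix per gate, each acting on the concatenation [h_{t-1}, x_t].
Wf, Wi, Wo, Wc = rng.normal(scale=0.1, size=(4, n_hid, n_hid + n_in))
bf = bi = bo = bc = np.zeros(n_hid)

def lstm_step(x_t, h_prev, C_prev):
    hx = np.concatenate([h_prev, x_t])
    f_t = sigmoid(Wf @ hx + bf)           # forget some memories
    i_t = sigmoid(Wi @ hx + bi)           # decide what to insert
    C_tilde = np.tanh(Wc @ hx + bc)       # candidate new memory
    C_t = f_t * C_prev + i_t * C_tilde    # update the conveyor belt
    o_t = sigmoid(Wo @ hx + bo)
    h_t = o_t * np.tanh(C_t)
    return h_t, C_t

h = C = np.zeros(n_hid)
for x_t in rng.normal(size=(3, n_in)):
    h, C = lstm_step(x_t, h, C)
```

Note that C_t is updated only by elementwise multiplication and addition, which is why gradients flow along it more easily than through a tanh RNN.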
  • 15. Recurrent Neural Networks : GRU 2017-03-28 CPSLAB (EECS) 15 LSTM: f_t = σ(W_f·[h_{t−1}, x_t] + b_f); i_t = σ(W_i·[h_{t−1}, x_t] + b_i); o_t = σ(W_o·[h_{t−1}, x_t] + b_o); C̃_t = tanh(W_C·[h_{t−1}, x_t] + b_C); C_t = f_t ∗ C_{t−1} + i_t ∗ C̃_t; h_t = o_t ∗ tanh(C_t). Maybe we can simplify this structure efficiently! GRU: z_t = σ(W_z·[h_{t−1}, x_t] + b_z); r_t = σ(W_r·[h_{t−1}, x_t] + b_r); h̃_t = tanh(W_h·[r_t ∗ h_{t−1}, x_t] + b_h); h_t = (1 − z_t) ∗ h_{t−1} + z_t ∗ h̃_t. Figures from http://colah.github.io/posts/2015-08-Understanding-LSTMs/
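The simplified GRU update on this slide can be sketched the same way. A toy illustration with arbitrary dimensions: the update gate z_t merges the roles of the LSTM's forget and input gates, and the cell state and hidden state are fused into a single h_t.

```python
import numpy as np

# One GRU step: z_t interpolates between keeping the old state h_{t-1} and
# accepting the candidate h~_t; the reset gate r_t controls how much of the
# old state feeds into the candidate.
rng = np.random.default_rng(2)
n_in, n_hid = 4, 6

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

Wz, Wr, Wh = rng.normal(scale=0.1, size=(3, n_hid, n_hid + n_in))
bz = br = bh = np.zeros(n_hid)

def gru_step(x_t, h_prev):
    hx = np.concatenate([h_prev, x_t])
    z_t = sigmoid(Wz @ hx + bz)                      # update gate
    r_t = sigmoid(Wr @ hx + br)                      # reset gate
    h_tilde = np.tanh(Wh @ np.concatenate([r_t * h_prev, x_t]) + bh)
    return (1.0 - z_t) * h_prev + z_t * h_tilde      # interpolated update

h = np.zeros(n_hid)
for x_t in rng.normal(size=(3, n_in)):
    h = gru_step(x_t, h)
```

With three weight matrices instead of four and no separate cell state, the GRU has fewer parameters per step than the LSTM, which is the "simplify this structure efficiently" point of the slide.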
  • 16. Sequence to Sequence Model: What is it? 2017-03-28 CPSLAB (EECS) 16 h_e(1) h_e(2) h_e(3) h_e(4) h_e(5) LSTM/GRU Encoder → LSTM/GRU Decoder h_d(1) h_d(T_e) Western Food to Korean Food Transition
  • 17. Sequence to Sequence Model: Implementation 2017-03-28 CPSLAB (EECS) 17  The simplest way to implement a sequence to sequence model is to just pass the last hidden state of the encoder h_T to the first GRU cell of the decoder!  However, this method gets weaker when the decoder needs to generate longer sequences.
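The simplest wiring described on this slide can be sketched as follows: run the encoder over the whole source sequence and hand its last hidden state to the decoder as the initial state. Plain tanh cells stand in for GRU/LSTM here, and all names and sizes are illustrative assumptions, not the codes linked from the deck.

```python
import numpy as np

# Minimal seq2seq sketch: the entire input sequence is compressed into the
# encoder's final hidden state, which initializes the decoder. The decoder
# then feeds its own previous output back in at each step.
rng = np.random.default_rng(3)
n_in, n_hid, n_out = 4, 8, 4   # n_out == n_in so outputs can be fed back

U_e, U_d = rng.normal(scale=0.1, size=(2, n_hid, n_in))
W_e, W_d = rng.normal(scale=0.1, size=(2, n_hid, n_hid))
V_d = rng.normal(scale=0.1, size=(n_out, n_hid))

def encode(xs):
    h = np.zeros(n_hid)
    for x_t in xs:
        h = np.tanh(U_e @ x_t + W_e @ h)
    return h          # one fixed-size vector summarizes the whole input

def decode(h0, steps):
    h, y, ys = h0, np.zeros(n_out), []
    for _ in range(steps):
        h = np.tanh(U_d @ y + W_d @ h)   # previous output is the new input
        y = V_d @ h
        ys.append(y)
    return ys

src = rng.normal(size=(5, n_in))
out = decode(encode(src), steps=3)
```

The single-vector bottleneck in `encode` is exactly why this scheme weakens on long outputs, motivating the attention decoder on the next slide.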
  • 18. Sequence to Sequence Model: Attention Decoder 2017-03-28 CPSLAB (EECS) 18 Bidirectional GRU Encoder, Attention GRU Decoder, context c_t  For each GRU cell of the decoder, let's pass the encoder's information differently! h_i = [h→_i; h←_i] (concatenated forward/backward encoder states); c_i = Σ_{j=1}^{T_x} α_ij h_j; s_i = f(s_{i−1}, y_{i−1}, c_i) = (1 − z_i) ∗ s_{i−1} + z_i ∗ s̃_i; z_i = σ(W_z y_{i−1} + U_z s_{i−1}); r_i = σ(W_r y_{i−1} + U_r s_{i−1}); s̃_i = tanh(W y_{i−1} + U (r_i ∗ s_{i−1}) + C c_i); α_ij = exp(e_ij) / Σ_{k=1}^{T_x} exp(e_ik); e_ij = v_a^T tanh(W_a s_{i−1} + U_a h_j)
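The attention weights and context vector on this slide can be sketched directly from the score formula e_ij = v_a^T tanh(W_a s_{i−1} + U_a h_j), with α_ij a softmax over source positions and c_i the α-weighted sum of encoder states. Shapes are illustrative assumptions; H stands in for the bidirectional encoder states h_j.

```python
import numpy as np

# Additive attention score: each decoder step i compares its previous state
# s_{i-1} against every encoder state h_j, normalizes the scores with a
# softmax, and forms a fresh context vector c_i per output step.
rng = np.random.default_rng(4)
n_hid, n_att, T_x = 6, 5, 7

W_a = rng.normal(scale=0.1, size=(n_att, n_hid))
U_a = rng.normal(scale=0.1, size=(n_att, n_hid))
v_a = rng.normal(scale=0.1, size=n_att)

def context(s_prev, H):
    """H: (T_x, n_hid) encoder states; returns (c_i, alpha_i)."""
    e = np.array([v_a @ np.tanh(W_a @ s_prev + U_a @ h_j) for h_j in H])
    alpha = np.exp(e - e.max())       # stable softmax over source positions
    alpha /= alpha.sum()
    return alpha @ H, alpha           # c_i = sum_j alpha_ij h_j

H = rng.normal(size=(T_x, n_hid))
c_i, alpha_i = context(np.zeros(n_hid), H)
```

Because c_i is recomputed at every decoder step, the decoder is no longer limited to the single summary vector of the plain encoder-decoder scheme.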
  • 19. Sequence to Sequence Model: Example codes 2017-03-28 CPSLAB (EECS) 19 Codes Here @ Github