220910_GatedRNN

Empirical Evaluation of Gated Recurrent Neural
Networks on Sequence Modeling
YongSang Yoo
NIPS 2014

Introduction
• Evaluates two gated RNN models (LSTM, GRU) against a tanh-RNN
• The paper runs its experiments on polyphonic music and speech signal modeling datasets, but this talk explains the models mainly through their use in NLP

Background
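• Vanishing gradient problem in vanilla RNNs
: Backpropagation through time repeatedly multiplies the gradient by the same recurrent weight matrix, so the gradient either shrinks toward 0 (vanishes) or diverges (explodes)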
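The effect can be sketched with a scalar stand-in for the recurrent matrix. This is a minimal illustration, not the paper's setup: the function name `gradient_after` and the weights 0.9 and 1.1 are assumptions chosen for the demo, and the RNN is taken to be linear so that the gradient is just a repeated product.

```python
def gradient_after(steps, w):
    """|d h_T / d h_0| for a scalar linear RNN h_t = w * h_{t-1}.

    Hypothetical helper for illustration; w plays the role of the
    recurrent weight matrix.
    """
    grad = 1.0
    for _ in range(steps):
        grad *= w  # every step back in time multiplies by the same weight
    return abs(grad)


if __name__ == "__main__":
    for w in (0.9, 1.1):
        # w < 1 shrinks the gradient toward 0, w > 1 makes it blow up
        print(f"w={w}: gradient after 50 steps = {gradient_after(50, w):.4g}")
```

With a real weight matrix the same role is played by its largest singular value rather than a single scalar.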

Methodology
• LSTM
• Core idea: pass cell state information straight through, without any transformation

Methodology
• LSTM
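i: input gate, whether to write to the cell
f: forget gate, whether to erase the cell
o: output gate, how much of the cell to reveal
g: gate gate (candidate values), how much to write to the cell
i, f, o: passed through sigmoid, so values lie between 0 and 1
g: passed through tanh, so values lie between -1 and 1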

Methodology
• LSTM
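- Forget gate
• Linearly combines the current input x with the hidden state from the previous time step, then applies the sigmoid function
• The result is multiplied by the cell state from time t-1 and sets how much of the old information is remembered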

Methodology
• LSTM
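- Input gate & g gate
• Gates for storing the current information
• The g gate output is multiplied by the input gate, and the product is added to the cell state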

Methodology
• LSTM
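- Output gate
• Used to determine the hidden state; its output feeds into the hidden state
• The cell state computed above is passed through tanh to give a value between -1 and 1, which is then multiplied by the output gate, so the gate acts as a filter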
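The four gates described on the LSTM slides can be put together into a single time step. The following is a minimal sketch on plain Python lists, assuming the standard LSTM formulation with one (W, U, b) triple per gate; the names `lstm_step`, `affine`, and the `params` layout are hypothetical and not taken from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, U, b, x, h):
    """Compute W x + U h + b row by row on plain Python lists."""
    return [
        sum(w * x_j for w, x_j in zip(W_row, x))
        + sum(u * h_j for u, h_j in zip(U_row, h))
        + b_k
        for W_row, U_row, b_k in zip(W, U, b)
    ]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step. params[name] = (W, U, b) for name in i, f, o, g (assumed layout)."""
    i = [sigmoid(a) for a in affine(*params["i"], x, h_prev)]    # input gate, 0..1
    f = [sigmoid(a) for a in affine(*params["f"], x, h_prev)]    # forget gate, 0..1
    o = [sigmoid(a) for a in affine(*params["o"], x, h_prev)]    # output gate, 0..1
    g = [math.tanh(a) for a in affine(*params["g"], x, h_prev)]  # candidate, -1..1
    # Erase part of the old cell state, then add the gated candidate.
    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    # Squash the cell state and let the output gate filter what is revealed.
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c


# Toy usage with one input feature and one hidden unit.
zero = ([[0.0]], [[0.0]], [0.0])
params = {"i": zero, "f": zero, "o": zero, "g": ([[1.0]], [[0.0]], [0.0])}
h, c = lstm_step([1.0], [0.0], [2.0], params)
print(h, c)
```

In the toy usage the zero weights make i, f, and o all equal to 0.5, so half of the old cell state is kept and half of the candidate is written.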

Methodology
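• GRU
• Merges the cell state vector and the hidden state vector into a single state
• Reset gate: decides how much of the previous hidden state to exclude
• Uses (1 - input gate) in place of the forget gate; the two are merged into the update gate
(Figure labels: update gate, reset gate)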
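A GRU step along these lines can be sketched as follows. This is an illustration under assumptions: biases are omitted, the update gate z plays the role of the input gate so that (1 - z) acts as the forget gate, and the names `gru_step`, `matvec`, and the `params` keys are hypothetical.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def matvec(M, v):
    """Matrix-vector product on plain Python lists."""
    return [sum(m * v_j for m, v_j in zip(row, v)) for row in M]


def gru_step(x, h_prev, params):
    """One GRU step without biases. params keys: Wz, Uz, Wr, Ur, Wh, Uh (assumed layout)."""
    # Update gate: how much of the new candidate to write.
    z = [sigmoid(a + b) for a, b in
         zip(matvec(params["Wz"], x), matvec(params["Uz"], h_prev))]
    # Reset gate: how much of the previous hidden state the candidate may see.
    r = [sigmoid(a + b) for a, b in
         zip(matvec(params["Wr"], x), matvec(params["Ur"], h_prev))]
    reset_h = [r_k * h_k for r_k, h_k in zip(r, h_prev)]
    h_cand = [math.tanh(a + b) for a, b in
              zip(matvec(params["Wh"], x), matvec(params["Uh"], reset_h))]
    # Single state: keep (1 - z) of the old value and write z of the candidate.
    return [(1.0 - z_k) * h_k + z_k * c_k
            for z_k, h_k, c_k in zip(z, h_prev, h_cand)]


# Toy usage with one input feature and one hidden unit.
zero = [[0.0]]
params = {name: zero for name in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}
h = gru_step([1.0], [0.8], params)
print(h)
```

Unlike the LSTM, there is no separate cell state and no output gate: the returned vector is both the memory and the exposed hidden state.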

Methodology
• GRU
• The state is updated by addition, which lets the gradient be preserved for longer
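A rough way to see why the additive update helps is to compare gradient factors along the two kinds of path. The sketch below is a scalar simplification with hypothetical function names: it treats the gate value as a constant `keep` and ignores the gradient that flows through the gates themselves.

```python
import math


def tanh_rnn_gradient(w, h0, steps):
    """d h_T / d h_0 for a scalar tanh-RNN h_t = tanh(w * h_{t-1})."""
    h, grad = h0, 1.0
    for _ in range(steps):
        h = math.tanh(w * h)
        grad *= w * (1.0 - h * h)  # chain rule through tanh and the weight
    return grad


def gated_path_gradient(keep, steps):
    """Gradient along the additive path s_t = keep * s_{t-1} + (new content)."""
    return keep ** steps


if __name__ == "__main__":
    print("tanh-RNN  :", tanh_rnn_gradient(w=0.9, h0=0.5, steps=50))
    print("gated path:", gated_path_gradient(keep=0.99, steps=50))
```

When the gate keeps most of the old state (a `keep` value close to 1), the product decays far more slowly than the repeated weight-and-tanh factor of the plain RNN.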

Summary
• Commonalities
1. Keeping old information
- Important features can be retained (forget gate in LSTM, update gate in GRU)
2. Shortcut paths that skip over several temporal steps
- Lets errors back-propagate well
• Differences
1. Whether exposure is controlled (LSTM has an output gate, GRU does not)
2. Location of the input gate + corresponding reset gate
- LSTM: the input and forget gates are independent
- GRU: the input and forget gates are merged into one
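The second difference can be made concrete by writing the two state updates side by side for a single scalar unit. This is a simplified sketch with hypothetical function names; the gate values are passed in directly rather than computed from inputs.

```python
def lstm_state_update(c_prev, candidate, f, i):
    """LSTM: forget gate f and input gate i are chosen independently."""
    return f * c_prev + i * candidate


def gru_state_update(h_prev, candidate, z):
    """GRU: one update gate z; the keep weight is tied to it as (1 - z)."""
    return (1.0 - z) * h_prev + z * candidate


if __name__ == "__main__":
    # LSTM can keep all of the old state and also write all of the candidate.
    print(lstm_state_update(1.0, 0.5, f=1.0, i=1.0))
    # GRU has to trade one off against the other.
    print(gru_state_update(1.0, 0.5, z=0.25))
```

In the GRU the two weights always sum to 1, so the new state is an interpolation between the old state and the candidate, whereas the LSTM can set both gates high or both low.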

