1. QRNN: QUASI-RECURRENT NEURAL NETWORKS
James Bradbury*, Stephen Merity*, Caiming Xiong & Richard Socher
Salesforce Research, Palo Alto, California
arXiv:1611.01576v2 [cs.NE], 21 Nov 2016 (accepted at ICLR 2017)
2. Abstract
- QRNN = an RNN that processes sequences like a CNN
- can process sequential data in parallel across timesteps
- up to 16x faster than an LSTM at train and test time
- makes visual analysis of the weights easy
3. Outline
- Introduction
  - review of RNN/LSTM
- Model (QRNN)
- Variants
- Results
  - sentiment classification
  - language modeling
  - character-level machine translation
- Conclusion
- References
4. Introduction (review of RNN)
- RNNs are the standard model architecture for deep learning approaches to sequence modeling tasks
- sentence classification | word- and character-level language modeling | machine translation | question answering | image captioning | time series forecasting
5. Introduction (review of RNN)
- a network whose architecture contains a loop: the hidden state feeds back into the network at each timestep
- unrolled over time, an RNN becomes very deep, causing vanishing gradients
[Figure: two example RNN tasks: next-word prediction, mapping word2vec("私") ("I") to a next-token distribution ("の": 0.2, "は": 0.3, ...), and time series forecasting, mapping yesterday's stock price (昨日の株価) to a prediction of today's (今日の株価の予測値)]
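For reference, the basic recurrence in the usual Elman-RNN notation (the symbols W, U, b are the standard learned parameters, not defined on the slides):

  h_t = \tanh(W x_t + U h_{t-1} + b)

Each h_t depends on h_{t-1}, which is the source of both the effective depth and the sequential bottleneck discussed next.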
6. Introduction (review of RNN)
- problem: not good at learning very long sequences
  - e.g., document classification | character-level tasks
- why?: each timestep depends on the previous one, so sequential data can't be processed in parallel
7. Introduction (review of LSTM)
- LSTM addresses vanishing gradients by using a memory cell
- LSTM has 3 gates to control information flow
8. Introduction (review of LSTM)
- forget gate to control long-term information (stored in the memory cell c)
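The standard forget-gate equation (usual LSTM notation; σ is the logistic sigmoid, W_f, U_f, b_f are learned parameters):

  f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)

f_t lies in (0, 1) elementwise and scales how much of the previous cell state c_{t-1} is kept.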
9. Introduction (review of LSTM)
- input gate to control the current/short-term information (from x_t and h_{t-1})
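In the same standard notation, the input gate and the candidate cell value it scales:

  i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)
  \tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)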
10. Introduction (review of LSTM)
- update the memory cell, mixing the current candidate with the previous memory cell
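The update combines the two gated terms (standard formulation; ⊙ is elementwise multiplication):

  c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t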
11. Introduction (review of LSTM)
- output gate to control the current hidden-state information passed to the next layer
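And the output gate with the resulting hidden state (standard formulation):

  o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)
  h_t = o_t \odot \tanh(c_t)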
12. Introduction (variants of LSTM)
- a variant that reuses the forget gate instead of a separate input gate (coupled gates)
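In this coupled-gate variant the cell update becomes:

  c_t = f_t \odot c_{t-1} + (1 - f_t) \odot \tilde{c}_t

This is also the form the QRNN's pooling step uses, with the candidate coming from a convolution instead of a recurrent layer.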
25. Conclusion
- QRNN = an RNN that processes sequences like a CNN
- can process sequential data in parallel across timesteps
- up to 16x faster than an LSTM at train and test time
- makes visual analysis of the weights easy
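To make the "RNN that processes sequences like a CNN" idea concrete, below is a minimal NumPy sketch of a single QRNN layer with fo-pooling. The gate equations follow the paper (Z = tanh, F and O = sigmoid, all computed by a width-k masked convolution over the input; then c_t = f_t ⊙ c_{t-1} + (1 - f_t) ⊙ z_t and h_t = o_t ⊙ c_t); the function name, weight shapes, and window construction are my own assumptions, not the authors' code.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def qrnn_layer(X, Wz, Wf, Wo, k=2):
    """Sketch of one QRNN layer with fo-pooling.

    X:          (T, d_in) input sequence
    Wz, Wf, Wo: (k * d_in, d_out) weights of the width-k masked convolutions
    Returns H:  (T, d_out) hidden states
    """
    T, d_in = X.shape
    d_out = Wz.shape[1]

    # Masked (causal) convolution: timestep t sees only x_{t-k+1}..x_t.
    # This stage has no dependencies across time, so it runs in parallel.
    Xpad = np.vstack([np.zeros((k - 1, d_in)), X])
    windows = np.hstack([Xpad[i:i + T] for i in range(k)])  # (T, k*d_in)

    Z = np.tanh(windows @ Wz)   # candidate vectors
    F = sigmoid(windows @ Wf)   # forget gates
    O = sigmoid(windows @ Wo)   # output gates

    # fo-pooling: the only sequential part, and it is purely elementwise
    # (no matrix multiplies inside the loop), hence the speed advantage.
    H = np.zeros((T, d_out))
    c = np.zeros(d_out)
    for t in range(T):
        c = F[t] * c + (1.0 - F[t]) * Z[t]
        H[t] = O[t] * c
    return H

# Tiny usage example with random weights (hypothetical sizes).
rng = np.random.default_rng(0)
T, d_in, d_out, k = 5, 4, 3, 2
X = rng.normal(size=(T, d_in))
Wz, Wf, Wo = (rng.normal(size=(k * d_in, d_out)) for _ in range(3))
print(qrnn_layer(X, Wz, Wf, Wo, k).shape)  # (5, 3)
```

In contrast with the LSTM recurrence above, every matrix multiply here happens before the time loop; the loop itself only does elementwise gating, which is what enables the parallelism and the reported speedups.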