1. QRNN: QUASI-RECURRENT NEURAL NETWORKS
James Bradbury*, Stephen Merity*, Caiming Xiong & Richard Socher
Salesforce Research, Palo Alto, California
arXiv:1611.01576v2 [cs.NE], 21 Nov 2016 (accepted at ICLR 2017)
2. Abstract
- QRNN = an RNN that processes sequences like a CNN
- can process sequential data in parallel across timesteps
- up to 16x faster than an LSTM at train and test time
- makes visual analysis of the weights easy
3. Outline
- Introduction
  - review of RNN/LSTM
- Model (QRNN)
- Variants
- Results
  - sentiment classification
  - language modeling
  - character-level machine translation
- Conclusion
- References
4. Introduction (review of RNN)
- RNNs are the standard model architecture for deep learning approaches to sequence modeling tasks
- sentence classification | word- and character-level language modeling | machine translation | question answering | image captioning | time series forecasting
5. Introduction (review of RNN)
- a network whose architecture contains a loop: the hidden state feeds back into the network at each timestep
- unrolled over time, an RNN becomes very deep, causing vanishing gradients
[Figure: two example RNN tasks: next-word prediction, mapping word2vec("私") ("I") to a next-token distribution ("の": 0.2, "は": 0.3, ...), and time series forecasting, mapping yesterday's stock price (昨日の株価) to a prediction of today's (今日の株価の予測値)]
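For reference, the basic recurrence in the usual Elman-RNN notation (the symbols W, U, b are the standard learned parameters, not defined on the slides):

  h_t = \tanh(W x_t + U h_{t-1} + b)

Each h_t depends on h_{t-1}, which is the source of both the effective depth and the sequential bottleneck discussed next.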
6. Introduction (review of RNN)
- problem: not good at learning very long sequences
  - e.g., document classification | character-level tasks
- why?: each timestep depends on the previous one, so sequential data can't be processed in parallel
7. Introduction (review of LSTM)
- LSTM addresses vanishing gradients by using a memory cell
- LSTM has 3 gates to control information flow
8. Introduction (review of LSTM)
- forget gate to control long-term information (stored in the memory cell c)
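The standard forget-gate equation (usual LSTM notation; σ is the logistic sigmoid, W_f, U_f, b_f are learned parameters):

  f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)

f_t lies in (0, 1) elementwise and scales how much of the previous cell state c_{t-1} is kept.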
9. Introduction (review of LSTM)
- input gate to control the current/short-term information (from x_t and h_{t-1})
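In the same standard notation, the input gate and the candidate cell value it scales:

  i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)
  \tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)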
10. Introduction (review of LSTM)
- update the memory cell, mixing the current candidate with the previous memory cell
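The update combines the two gated terms (standard formulation; ⊙ is elementwise multiplication):

  c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t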
11. Introduction (review of LSTM)
- output gate to control the current hidden-state information passed to the next layer
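And the output gate with the resulting hidden state (standard formulation):

  o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)
  h_t = o_t \odot \tanh(c_t)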
12. Introduction (variants of LSTM)
- a variant that reuses the forget gate instead of a separate input gate (coupled gates)
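In this coupled-gate variant the cell update becomes:

  c_t = f_t \odot c_{t-1} + (1 - f_t) \odot \tilde{c}_t

This is also the form the QRNN's pooling step uses, with the candidate coming from a convolution instead of a recurrent layer.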
25. Conclusion
- QRNN = an RNN that processes sequences like a CNN
- can process sequential data in parallel across timesteps
- up to 16x faster than an LSTM at train and test time
- makes visual analysis of the weights easy
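To make the "RNN that processes sequences like a CNN" idea concrete, below is a minimal NumPy sketch of a single QRNN layer with fo-pooling. The gate equations follow the paper (Z = tanh, F and O = sigmoid, all computed by a width-k masked convolution over the input; then c_t = f_t ⊙ c_{t-1} + (1 - f_t) ⊙ z_t and h_t = o_t ⊙ c_t); the function name, weight shapes, and window construction are my own assumptions, not the authors' code.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def qrnn_layer(X, Wz, Wf, Wo, k=2):
    """Sketch of one QRNN layer with fo-pooling.

    X:          (T, d_in) input sequence
    Wz, Wf, Wo: (k * d_in, d_out) weights of the width-k masked convolutions
    Returns H:  (T, d_out) hidden states
    """
    T, d_in = X.shape
    d_out = Wz.shape[1]

    # Masked (causal) convolution: timestep t sees only x_{t-k+1}..x_t.
    # This stage has no dependencies across time, so it runs in parallel.
    Xpad = np.vstack([np.zeros((k - 1, d_in)), X])
    windows = np.hstack([Xpad[i:i + T] for i in range(k)])  # (T, k*d_in)

    Z = np.tanh(windows @ Wz)   # candidate vectors
    F = sigmoid(windows @ Wf)   # forget gates
    O = sigmoid(windows @ Wo)   # output gates

    # fo-pooling: the only sequential part, and it is purely elementwise
    # (no matrix multiplies inside the loop), hence the speed advantage.
    H = np.zeros((T, d_out))
    c = np.zeros(d_out)
    for t in range(T):
        c = F[t] * c + (1.0 - F[t]) * Z[t]
        H[t] = O[t] * c
    return H

# Tiny usage example with random weights (hypothetical sizes).
rng = np.random.default_rng(0)
T, d_in, d_out, k = 5, 4, 3, 2
X = rng.normal(size=(T, d_in))
Wz, Wf, Wo = (rng.normal(size=(k * d_in, d_out)) for _ in range(3))
print(qrnn_layer(X, Wz, Wf, Wo, k).shape)  # (5, 3)
```

In contrast with the LSTM recurrence above, every matrix multiply here happens before the time loop; the loop itself only does elementwise gating, which is what enables the parallelism and the reported speedups.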