"Improving Language Modeling using Densely Connected Recurrent Neural Networks".

See http://www.fredericgodin.com/publications/ for more info.

Published in:
Data & Analytics


- 1. Skip, residual and densely connected RNN architectures. Fréderic Godin, Ph.D. Researcher, Department of Electronics and Information Systems, IDLab
- 2. Who is Fréderic? Ph.D. Researcher, Deep Learning @ IDLab
  - Main interests: sequence models, hybrid RNN/CNN models
  - Major application domain: Natural Language Processing
    - Noisy data (e.g., Twitter data)
    - Parsing tasks (e.g., Named Entity Recognition)
  - Minor application domain: Computer Vision
    - Lung cancer detection (Kaggle competition, 7th/1972): http://blog.kaggle.com/2017/05/16/data-science-bowl-2017-predicting-lung-cancer-solution-write-up-team-deep-breath/
- 3. Agenda
  1. Recurrent neural networks
  2. Skip, residual and dense connections
  3. Dense connections in practice
- 4. Recurrent neural networks
- 5. Recurrent neural networks
  - Neural network with a cyclic connection
  - Has memory
  - Models variable-length sequences
- 6. E.g.: Unfolded recurrent neural network (diagram: time steps t=1 to t=4 with inputs word1 to word4)
- 7. Stacking recurrent neural networks (diagram: time steps t=1 to t=4 with inputs word1 to word4; the network is deep in time along the sequence and deep in height across the stacked layers)
- 8. Vanishing gradients
  - When updating the weights using backpropagation, the gradient tends to vanish with every neuron it crosses
  - Often caused by the activation function
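
The shrinking gradient described on slide 8 can be illustrated with a toy example. The following is a minimal sketch, assuming a chain of scalar sigmoid units with a fixed weight of 1 (the function names and the choice of sigmoid are illustrative, not taken from the slides). Each unit multiplies the gradient by the sigmoid derivative, which is at most 0.25, so the product shrinks quickly with depth.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def chain_gradient(depth: int, x: float = 0.5, w: float = 1.0) -> float:
    """Gradient of the last unit's output w.r.t. the input x.

    The network is a hypothetical chain h_{k+1} = sigmoid(w * h_k).
    """
    h, grad = x, 1.0
    for _ in range(depth):
        a = sigmoid(w * h)
        # Chain rule: d sigmoid(w*h) / dh = w * a * (1 - a), never above 0.25 * |w|.
        grad *= w * a * (1.0 - a)
        h = a
    return grad


for depth in (1, 2, 5, 10):
    print(depth, chain_gradient(depth))
```

With a weight of 1, the gradient after ten units is bounded by 0.25 to the power of 10, which is why deep stacks need a path that bypasses some of these multiplications.
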
- 9. Backpropagating through stacked RNNs (diagram: time steps t=1 to t=4 with inputs word1 to word4; backpropagation in time along the sequence and backpropagation in height across the stacked layers)
- 10. Mitigating the vanishing gradient problem
  - In time: Long Short-Term Memory (LSTM), with a key equation to model depth in time (equation shown on slide)
  - In height: many techniques exist in convolutional neural networks. This talk: can we apply them in RNNs?
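
The "key equation" on slide 10 did not survive in the transcript. It is presumably the LSTM cell-state update, given here in its standard form as an assumption about what the slide showed. The symbols $f_t$ (forget gate), $i_t$ (input gate) and $\tilde{c}_t$ (candidate cell state) follow the common LSTM notation, not the slides.

```latex
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
```

The additive form lets the gradient flow from $c_t$ to $c_{t-1}$ scaled only by the forget gate, which is what mitigates vanishing gradients along the time axis.
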
- 11. Skip, residual and dense connections
- 12. Skip connection: a direct connection between 2 non-consecutive layers
  - No vanishing gradient
  - 2 main flavors: concatenative skip connections and additive skip connections
  - (Diagram: Layer 1, Layer 2 and Layer 3; Out 1 skips Layer 2 and is merged with its output before Layer 3)
- 13. (Concatenative) skip connection: concatenate output of previous layer and skip connection
  - Advantage: provides the output of the first layer to the third layer without altering it
  - Disadvantage: doubles the input size
  - (Diagram: Layer 3 receives both Out 1 and Out 2)
- 14. Additive skip connection (residual connection)
  - Originates from the image classification domain
  - Residual connection is defined as: (equation shown on slide)
  - (Diagram: Layer 2 computes the "residue", which is added to Out 1; Layer 3 receives Out 1 + 2)
- 15. Additive skip connection (residual connection) in RNN
  - Residual connections do not make sense in RNNs: Layer 2 also depends on h(t-1)
  - (Diagram: Layer 1, Layer 2 and Layer 3 with an additive skip connection; labels x, y, h(t-1) and ht)
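
The equations on slides 14 and 15 were lost in extraction. As a hedged reconstruction, the usual residual formulation from the image classification literature writes the layer output $y$ as the input $x$ plus a learned function $\mathcal{F}$ of that input. The recurrent variant below is an assumption built from the symbols that remain on slide 15 ($x$, $y$, $h_{t-1}$, $h_t$).

```latex
% Feed-forward residual connection: the layer learns the "residue" F(x)
y = \mathcal{F}(x) + x

% Same connection around a recurrent layer: F also depends on the previous state
h_t = \mathcal{F}(x_t, h_{t-1}) + x_t
```

In the second form the layer no longer models a residue of $x_t$ alone, which is the objection raised on slide 15.
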
- 16. Additive skip connection: sum output of previous layer and skip connection
  - Advantage: input size to next layer does not increase
  - Disadvantage: can create noisy input to next layer
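
The trade-off between the two flavors from slides 13 and 16 shows up directly in the size of the vector handed to the next layer. The following is a minimal sketch using plain Python lists; the function names `concat_skip` and `additive_skip` and the example values are hypothetical.

```python
from typing import List

Vector = List[float]


def concat_skip(prev_out: Vector, skip: Vector) -> Vector:
    """Concatenative skip connection: both vectors are passed on unaltered."""
    return prev_out + skip  # list concatenation, so the size is the sum of both


def additive_skip(prev_out: Vector, skip: Vector) -> Vector:
    """Additive skip connection: element-wise sum, so the size stays the same."""
    if len(prev_out) != len(skip):
        raise ValueError("additive skip connections need equally sized vectors")
    return [a + b for a, b in zip(prev_out, skip)]


out1 = [0.1, -0.2, 0.3]   # hypothetical output of Layer 1
out2 = [0.5, 0.5, -0.1]   # hypothetical output of Layer 2

concat_in = concat_skip(out2, out1)   # input to Layer 3, 6 values
add_in = additive_skip(out2, out1)    # input to Layer 3, 3 values

print(len(concat_in), concat_in)
print(len(add_in), add_in)
```

With concatenation, Layer 3 can still read Out 1 exactly as Layer 1 produced it. With addition, Out 1 and Out 2 are mixed into one vector and cannot be separated again.
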
- 17. Densely connecting layers: add a skip connection between every output and every input of every layer
  - Advantage: direct paths between every layer; hierarchy of features as input to every layer
  - Disadvantage: (L-1)*L connections
  - (Diagram: Layer 2 receives Out 1; Layer 3 receives Out 1 and Out 2; Layer 4 receives Out 1, Out 2 and Out 3)
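
The dense pattern from slide 17 can be sketched as a loop in which every layer reads the concatenation of all earlier outputs. This is a toy illustration under stated assumptions: the layers are plain tanh layers with random weights rather than LSTMs, the input vector is treated as the first entry in the feature list, and `dense_forward` and its helpers are hypothetical names.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def make_layer(in_size: int, out_size: int, rng: random.Random) -> Matrix:
    """Random weight matrix with one row per output unit."""
    return [[rng.uniform(-0.05, 0.05) for _ in range(in_size)] for _ in range(out_size)]


def apply_layer(weights: Matrix, x: Vector) -> Vector:
    """Plain tanh layer, used here as a stand-in for a recurrent layer."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


def dense_forward(x: Vector, num_layers: int, hidden: int, seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Run a densely connected stack; return all features and each layer's input size."""
    rng = random.Random(seed)
    features = [x]            # the input plus every layer output produced so far
    input_sizes = []
    for _ in range(num_layers):
        layer_in = [v for f in features for v in f]   # concatenate everything so far
        input_sizes.append(len(layer_in))
        weights = make_layer(len(layer_in), hidden, rng)
        features.append(apply_layer(weights, layer_in))
    return features, input_sizes


features, input_sizes = dense_forward([0.1, 0.2, 0.3, 0.4], num_layers=3, hidden=5)
print(input_sizes)   # the input size grows by `hidden` with every layer
```

The growing input sizes make the cost of dense connections visible: each extra layer widens the input of every layer that follows it.
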
- 18. Densely connected layers in practice
- 19. Language modeling: building a model which captures statistical characteristics of a language. In practice: predicting the next word in a sentence.
- 20. Example architecture (diagram: inputs word1 to word4 pass through an embedding layer, two LSTM layers and a classification layer to predict word2 to word5)
- 21. Training details
  - Stochastic Gradient Descent with learning scheme
  - Uniform initialization [-0.05:0.05]
  - Dropout with probability 0.6
- 22. Experimental results (lower perplexity is better)

  | Model | Hidden states | # Layers | # Params | Perplexity |
  | --- | --- | --- | --- | --- |
  | Stacked LSTM (Zaremba et al., 2014) | 650 | 2 | 20M | 82.7 |
  | Stacked LSTM (Zaremba et al., 2014) | 1500 | 2 | 66M | 78.4 |
  | Stacked LSTM | 200 | 2 | 5M | 100.9 |
  | Stacked LSTM | 200 | 3 | 5M | 108.8 |
  | Stacked LSTM | 350 | 2 | 9M | 87.9 |
  | Densely Connected LSTM | 200 | 2 | 9M | 80.4 |
  | Densely Connected LSTM | 200 | 3 | 11M | 78.5 |
  | Densely Connected LSTM | 200 | 4 | 14M | 76.9 |
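
The parameter counts for the densely connected models in the results on slide 22 can be roughly reproduced with a back-of-the-envelope calculation. The sketch below rests on several assumptions that are not stated on the slides: a vocabulary of 10,000 words, an embedding size of 200, an LSTM with four gates and one bias vector per gate, and a classification layer that reads the concatenation of the embedding and all LSTM outputs. The function names are hypothetical.

```python
def lstm_params(input_size: int, hidden: int) -> int:
    """Weights and biases of one LSTM layer: 4 gates, one bias vector per gate."""
    return 4 * (hidden * (input_size + hidden) + hidden)


def dense_lstm_lm_params(num_layers: int, hidden: int = 200,
                         embed: int = 200, vocab: int = 10_000) -> int:
    """Parameter count of a densely connected LSTM language model (assumed layout)."""
    total = vocab * embed                 # embedding matrix
    feature_size = embed                  # size of the concatenated features so far
    for _ in range(num_layers):
        total += lstm_params(feature_size, hidden)
        feature_size += hidden            # this layer's output joins the features
    total += feature_size * vocab + vocab  # classification layer weights and biases
    return total


for layers in (2, 3, 4):
    print(layers, dense_lstm_lm_params(layers))
```

Under these assumptions the counts come out at about 8.8M, 11.5M and 14.3M, close to the 9M, 11M and 14M reported for 2, 3 and 4 layers. Most of the growth comes from the classification layer, whose input widens with every added LSTM layer.
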
- 23. Character-to-word language modeling (diagram: inputs word1 to word4 pass through an embedding layer, a ConvNet, a highway layer, two LSTM layers and a classification layer to predict word2 to word5)
- 24. Experimental results (lower perplexity is better)

  | Model | Hidden states | # Layers | # Params | Perplexity |
  | --- | --- | --- | --- | --- |
  | Stacked LSTM (Zaremba et al., 2014) | 650 | 2 | 20M | 82.7 |
  | Stacked LSTM (Zaremba et al., 2014) | 1500 | 2 | 66M | 78.4 |
  | CharCNN (Kim et al. 2016) | 650 | 2 | 19M | 78.9 |
  | Densely Connected LSTM | 200 | 3 | 11M | 78.5 |
  | Densely Connected LSTM | 200 | 4 | 14M | 76.9 |
  | Densely Connected CharCNN* | 200 | 4 | 20M | 74.6 |

  *Not published
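
The perplexity figures reported in the experimental results are easier to read with the metric's definition at hand. The following is a minimal sketch, assuming the usual definition of perplexity as the exponential of the average negative log-likelihood per word. The function name and the example probabilities are illustrative and do not come from the slides.

```python
import math
from typing import Sequence


def perplexity(word_probs: Sequence[float]) -> float:
    """Perplexity from the probabilities a model assigned to each correct next word."""
    if not word_probs:
        raise ValueError("need at least one probability")
    avg_nll = -sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(avg_nll)


# A model that always gives the correct word a probability of 1/100
# is as uncertain as a uniform choice among 100 words.
print(perplexity([0.01] * 5))

# More confident predictions on the correct words lower the perplexity.
print(perplexity([0.5, 0.25, 0.5, 0.25]))
```

Read this way, a perplexity of 78.5 means the model is on average about as uncertain as if it were choosing uniformly among roughly 78 words at each step.
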
- 25. Conclusion
- 26. Conclusion
  - Densely connecting all layers improves language modeling performance
  - Avoids vanishing gradients
  - Creates hierarchy of features, available to each layer
  - We use six times fewer parameters to obtain the same result as a stacked LSTM
- 27. Q&A. More details in our publication: Fréderic Godin, Joni Dambre & Wesley De Neve, "Improving Language Modeling using Densely Connected Recurrent Neural Networks", https://arxiv.org/abs/1707.06130
- 28. Fréderic Godin, Ph.D. Researcher Deep Learning, IDLab. E-mail: frederic.godin@ugent.be, Twitter: @frederic_godin, www.fredericgodin.com, idlab.technology / idlab.ugent.be
