
- 1. Neural Turing Machines: Perils and Promise, by Daniel Shank
- 2. Overview: 1. Neural Turing Machines; 2. Applications and Performance; 3. Challenges and Recommendations; 4. Differentiable Neural Computers
- 3. Neural Turing Machines
- 4. What’s a Turing Machine? A model of a computer with a memory tape and read and write heads.
- 5. What’s a Neural Turing Machine? A neural network “controller” coupled to a memory; it learns from sequences (Graves et al. 2014, arXiv:1410.5401v2).
- 6. Neural Turing Machines are differentiable Turing machines: ‘sharp’ functions are made smooth, so the model can be trained with backpropagation.
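
To make "sharp functions made smooth" concrete, here is a minimal sketch of a soft memory read: instead of fetching one row by index, every row is blended in proportion to its similarity to a key. The cosine-plus-softmax form follows the content addressing described in Graves et al. 2014; the function names (`soft_read`, `cosine`) and the toy memory are assumptions made for illustration, not code from the talk.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(u, v):
    """Cosine similarity, with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)

def soft_read(memory, key, beta):
    """Return (weights, read_vector) for a differentiable lookup.

    beta is the key strength: larger values push the weights toward a
    hard, one-row lookup; beta = 0 gives a uniform average of all rows.
    """
    weights = softmax([beta * cosine(row, key) for row in memory])
    width = len(memory[0])
    read = [sum(w * row[j] for w, row in zip(weights, memory))
            for j in range(width)]
    return weights, read

# Hypothetical 3-slot memory with 2 values per slot.
memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, read = soft_read(memory, key=[1.0, 0.0], beta=10.0)
uniform, _ = soft_read(memory, key=[1.0, 0.0], beta=0.0)
```

Because every step is a smooth function of the key and the memory contents, gradients can flow through the read, which is what allows training with backpropagation.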
- 7. Applications and Performance
- 8. Neural Turing Machines can: learn simple algorithms (copy, repeat, recognize simple formal languages...); generalize; do well at language modeling; do well at bAbI
- 9. Generalization on Copy/Repeat task (Graves et al. 2014)
- 10. Neural Turing Machines outperform LSTMs (Graves et al. 2014)
- 11. Balanced parentheses (Tristan Deleu, https://medium.com/snips-ai/)
- 12. bAbI dataset: small vocabulary, stories, context (https://research.facebook.com/research/babi/). Sample:
    1 Mary moved to the bathroom.
    2 John went to the hallway.
    3 Where is Mary? bathroom 1
    4 Daniel went back to the hallway.
    5 Sandra moved to the garden.
    6 Where is Daniel? hallway 4
    7 John moved to the office.
    8 Sandra journeyed to the bathroom.
    9 Where is Daniel? hallway 4
    10 Mary moved to the hallway.
    11 Daniel travelled to the office.
    12 Where is Daniel? office 11
    13 John went back to the garden.
    14 John moved to the bedroom.
    15 Where is Sandra? bathroom 8
    1 Sandra travelled to the office.
    2 Sandra went to the bathroom.
    3 Where is Sandra? bathroom 2
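
The sample above can be read mechanically: statements accumulate as context, question lines carry an answer and the number of the supporting statement, and numbering restarts at 1 for each new story. Below is a rough parsing sketch. It assumes the question, answer, and supporting line numbers are tab-separated, as in the distributed bAbI task files; the slide text itself does not show the separators, and `parse_babi` is a name made up for this example.

```python
def parse_babi(lines):
    """Turn bAbI-style lines into question/answer examples with context."""
    examples = []
    story = []
    for line in lines:
        idx, text = line.strip().split(" ", 1)
        if int(idx) == 1:
            story = []  # numbering restarts, so a new story begins
        if "\t" in text:
            # Question line: question <TAB> answer <TAB> supporting ids
            question, answer, support = text.split("\t")
            examples.append({
                "context": list(story),
                "question": question.strip(),
                "answer": answer,
                "support": [int(s) for s in support.split()],
            })
        else:
            story.append((int(idx), text))
    return examples

sample = [
    "1 Mary moved to the bathroom.",
    "2 John went to the hallway.",
    "3 Where is Mary? \tbathroom\t1",
    "1 Sandra travelled to the office.",
    "2 Sandra went to the bathroom.",
    "3 Where is Sandra? \tbathroom\t2",
]
examples = parse_babi(sample)
```

Each example keeps only the statements seen so far in its own story, which is the "context" a memory-augmented model has to store and query.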
- 13. bAbI results (Yu et al. 2015, Empirical Study on Deep Learning Models for Question Answering)
- 14. Challenges and Recommendations
- 15. Problems: architecture dependent; large number of parameters; doesn’t benefit much from GPU acceleration; hard to train
- 16. Hard to train: numerical instability; using memory is hard; needs smart optimization; difficult to use in practice
- 17. Combating numerical instability with gradient clipping: limits how far a single update can move the parameters; particularly helpful for learning long-range dependencies
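
The slide does not say which form of clipping was used, so the sketch below shows the two common variants: rescaling the whole gradient to a maximum L2 norm, and clamping each component to a fixed range. The function names and thresholds are illustrative assumptions.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale grads so their L2 norm is at most max_norm.

    Returns (clipped_grads, original_norm). The direction of the
    gradient is preserved; only its length changes.
    """
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads), norm
    scale = max_norm / norm
    return [g * scale for g in grads], norm

def clip_by_value(grads, limit):
    """Clamp every component to the range [-limit, limit]."""
    return [max(-limit, min(limit, g)) for g in grads]

clipped, norm = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
clamped = clip_by_value([-20.0, 0.5, 12.0], limit=10.0)
```

Norm clipping keeps the update direction intact, while value clipping is simpler but can change the direction when only some components are large.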
- 18. Loss clipping: cap the total response to a given training batch; helpful in addition to gradient clipping
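- 19. Graves’ RMSprop: an RMSprop variant used to update the weights from backpropagated gradients; used in many of Graves’ RNN papers. With $\epsilon_i$ the gradient of the loss with respect to weight $w_i$:
    $n_i = \alpha n_i + (1 - \alpha)\,\epsilon_i^2$
    $g_i = \alpha g_i + (1 - \alpha)\,\epsilon_i$
    $\Delta_i = \beta \Delta_i - \gamma\,\dfrac{\epsilon_i}{\sqrt{n_i - g_i^2 + \delta}}$
    $w_i = w_i + \Delta_i$
    This is similar to normalizing gradient updates by their variance, which matters for the NTM’s highly variable changes in loss.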
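
As a sketch of how this update rule might look in code, the function below applies one step to a list of weights, keeping a running mean of squared gradients `n`, a running mean of gradients `g`, and the previous update `d` (the Δ term). The default hyperparameter values and the toy quadratic objective are assumptions for illustration, not settings taken from the slides.

```python
import math

def graves_rmsprop_step(w, grads, state,
                        alpha=0.95, beta=0.9, gamma=1e-4, delta=1e-4):
    """Apply one update to every weight and return (new_w, new_state).

    state is a tuple (n, g, d) of per-weight lists: running mean of
    squared gradients, running mean of gradients, and previous update.
    """
    n, g, d = state
    new_w, new_n, new_g, new_d = [], [], [], []
    for wi, ei, ni, gi, di in zip(w, grads, n, g, d):
        ni = alpha * ni + (1 - alpha) * ei * ei
        gi = alpha * gi + (1 - alpha) * ei
        # n - g^2 estimates the gradient variance; delta keeps the
        # square root strictly positive.
        di = beta * di - gamma * ei / math.sqrt(ni - gi * gi + delta)
        new_w.append(wi + di)
        new_n.append(ni)
        new_g.append(gi)
        new_d.append(di)
    return new_w, (new_n, new_g, new_d)

# Toy problem (hypothetical): minimize (w - 3)^2 starting from w = 0.
w = [0.0]
state = ([0.0], [0.0], [0.0])
w1, state1 = graves_rmsprop_step(w, [2 * (w[0] - 3.0)], state)
```

The `beta` term acts as momentum on the update itself, so the step taken is a smoothed version of the variance-normalized gradient.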
- 20. Adam optimizer: works well for many tasks; comes pre-loaded in most ML frameworks; like Graves’ RMSprop, smooths gradients
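
For comparison, here is a plain-Python sketch of a single Adam step using the standard bias-corrected form. In practice the framework's built-in optimizer would be used instead; the defaults below are the commonly quoted ones and `adam_step` is an illustrative name.

```python
import math

def adam_step(w, grads, m, v, t,
              lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count.

    m and v are per-weight running means of the gradient and of the
    squared gradient. Returns (new_w, new_m, new_v).
    """
    new_w, new_m, new_v = [], [], []
    for wi, gi, mi, vi in zip(w, grads, m, v):
        mi = beta1 * mi + (1 - beta1) * gi
        vi = beta2 * vi + (1 - beta2) * gi * gi
        # Bias correction compensates for m and v starting at zero.
        m_hat = mi / (1 - beta1 ** t)
        v_hat = vi / (1 - beta2 ** t)
        new_w.append(wi - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_w, new_m, new_v

# Same hypothetical toy problem: gradient of (w - 3)^2 at w = 0 is -6.
w_next, m_next, v_next = adam_step([0.0], [-6.0], [0.0], [0.0], t=1)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the weight moves by roughly `lr` regardless of how large the gradient is, which is one way the optimizer tames highly variable losses.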
- 21. Attention to initialization: memory initialization is extremely important; poor initialization can prevent convergence; pay particularly close attention to the starting value of the memory
- 22. Short sequences first (“curriculum learning”): 1) feed in short training data; 2) when the loss hits a target, increase the size of the input; 3) repeat
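
The three steps above can be sketched as a small loop. The `train_step` callable stands in for one real training update and is hypothetical; here it is replaced by a toy function whose loss falls the longer a given length is trained, purely so the example runs on its own.

```python
def curriculum(train_step, start_len, max_len, target_loss, max_steps):
    """Grow the sequence length each time the loss reaches target_loss.

    Returns a list of (step, seq_len) pairs recording when each length
    was mastered.
    """
    seq_len = start_len
    schedule = []
    for step in range(max_steps):
        loss = train_step(seq_len)
        if loss <= target_loss:
            schedule.append((step, seq_len))
            if seq_len >= max_len:
                break
            seq_len += 1
    return schedule

def make_toy_train_step():
    """Toy stand-in for training: loss = seq_len / steps spent at it."""
    steps_at_len = {}
    def train_step(seq_len):
        steps_at_len[seq_len] = steps_at_len.get(seq_len, 0) + 1
        return seq_len / steps_at_len[seq_len]
    return train_step

schedule = curriculum(make_toy_train_step(), start_len=1, max_len=3,
                      target_loss=0.5, max_steps=100)
# schedule == [(1, 1), (5, 2), (11, 3)]
```

A real schedule might increase the length by more than one, or sample lengths up to the current maximum; those are design choices the slide leaves open.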
- 23. Differentiable Neural Computers
- 24. Neural Turing Machines “V2”: similar to NTMs, except there is no index-shift-based addressing, the model can ‘allocate’ and ‘deallocate’ memory, and it remembers recent memory use
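
One way to picture "allocating" memory is the allocation weighting described in Graves et al. 2016: each slot has a usage value between 0 and 1, slots are visited from least to most used, and each slot receives write weight in proportion to how free it is and how full the slots before it were. The sketch below is my reading of that formula, not code from the talk, and the usage values are made up.

```python
def allocation_weighting(usage):
    """Return write-allocation weights favoring the least-used slots.

    For slots sorted by ascending usage (phi), the weight of slot
    phi[j] is (1 - u[phi[j]]) times the product of u[phi[i]] for i < j.
    """
    order = sorted(range(len(usage)), key=lambda i: usage[i])
    alloc = [0.0] * len(usage)
    running = 1.0  # product of usages of all freer slots seen so far
    for idx in order:
        alloc[idx] = (1.0 - usage[idx]) * running
        running *= usage[idx]
    return alloc

# Hypothetical usage vector: slot 1 is almost free, slot 0 almost full.
usage = [0.9, 0.1, 0.5]
alloc = allocation_weighting(usage)
```

"Deallocation" then amounts to lowering a slot's usage after it has been read, so that it ranks as free again the next time this weighting is computed.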
- 25. Architecture updates (1) (Graves et al. 2016)
- 26. Architecture updates (2) (Graves et al. 2016)
- 27. Differentiable Neural Computer performance on inference tasks (Graves et al. 2016)
- 28. Differentiable Neural Computer bAbI results (Graves et al. 2016)
- 29. References
    Implementations:
    - TensorFlow: https://github.com/carpedm20/NTM-tensorflow
    - Go: https://github.com/fumin/ntm
    - Torch: https://github.com/kaishengtai/torch-ntm
    - Node.js: https://github.com/gcgibson/NTM
    - Lasagne: https://github.com/snipsco/ntm-lasagne
    - Theano: https://github.com/shawntan/neural-turing-machines
    Papers:
    - Graves et al. 2016: Hybrid computing using a neural network with dynamic external memory
    - Graves et al. 2014: Neural Turing Machines
    - Yu et al. 2015: Empirical Study on Deep Learning Models for Question Answering
    - Rae et al. 2016: Scaling Memory-Augmented Neural Networks with Sparse Reads and Writes
- 30. NTM operations: the convolutional shift parameter has proven to be one of the most problematic, if not the most problematic.
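
For readers unfamiliar with the shift operation, here is a minimal sketch of it as described in Graves et al. 2014: the attention weights over memory slots are circularly convolved with a small distribution over allowed shifts, then sharpened by raising them to a power and renormalizing. The function names and the example distribution are assumptions for illustration.

```python
def circular_shift(weights, shift_dist):
    """Circularly convolve attention weights with a shift distribution.

    shift_dist maps integer offsets (for example -1, 0, +1) to
    probabilities that sum to 1.
    """
    n = len(weights)
    out = [0.0] * n
    for i in range(n):
        for offset, p in shift_dist.items():
            out[(i + offset) % n] += weights[i] * p
    return out

def sharpen(weights, gamma):
    """Raise weights to the power gamma >= 1 and renormalize."""
    powered = [w ** gamma for w in weights]
    total = sum(powered)
    return [p / total for p in powered]

focused = [0.0, 1.0, 0.0, 0.0]              # attention on slot 1
shift_dist = {-1: 0.1, 0: 0.2, 1: 0.7}      # mostly "move right by one"
blurred = circular_shift(focused, shift_dist)   # about [0.1, 0.2, 0.7, 0.0]
sharpened = sharpen(blurred, gamma=3.0)
```

Unless the shift distribution is nearly one-hot, every shift spreads the attention a little, and the blur compounds over time steps; that leakage is one plausible reason this parameter is hard to train.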
