Melanie Warrick, Deep Learning Engineer, Skymind.io at MLconf SF - 11/13/15

Attention Neural Net Model Fundamentals: Neural networks have regained popularity over the last decade because they are demonstrating real-world value in different applications (e.g., targeted advertising, recommender engines, Siri, self-driving cars, facial recognition). Several model types are currently being explored in the field, with recurrent neural networks (RNNs) and convolutional neural networks (CNNs) receiving the most focus. The attention model, a recently developed RNN variant, has started to play a larger role in both natural language processing and image analysis research.

This talk will cover the fundamentals of the attention model structure and how it is applied to visual and speech analysis. I will provide an overview of the model's functionality and math, including a high-level differentiation between soft and hard attention types. The goal is to give you enough of an understanding of what the model is, how it works, and where to apply it.

  1. Attention Models - Melanie Warrick (@nyghtowl)
  2. Overview - Attention - Soft vs Hard - Hard Attention for Computer Vision - Learning Rule - Example Performance
  3. Attention ~ Selective
  4. Attention Mechanism - input focus - sequential | context weights (sketched in code after the slides)
  5. Attention Techniques - Spotlight: varying resolution - Zoom-lens: adds a changing filter size
  6. Attention Decision - Where to look?
  7. Model Types - Soft: reads all input & takes a weighted average of it for the expected output - standard loss derivative. Hard: samples the input & takes a weighted average for the estimated output - policy gradient & variance reduction. (Both are sketched in code after the slides.)
  8. Soft vs Hard Focus Examples [image panels contrasting soft and hard focus]
  9. Soft Attention - Value: context aware - Challenge: scale limitations
  10. Hard Attention - Value: data size & # of computations - Challenge: context & training time
  11. Model Variations - Soft ("differentiable"): NTM (Neural Turing Machine), Memory Network, DRAW (Deep Recurrent Attention Writer), Stack-Augmented Recurrent Nets - Hard: RAM (Recurrent Attention Model), DRAM (Deep Recurrent Attention Model), RL-NTM (Reinforce Neural Turing Machine)
  12. Applications - Memory - Reading / Writing - Language generation - Picture generation - Classifying image objects - Image search - Describing images / videos
  13. Hard Model & Computer Vision
  14. Convolutional Neural Nets
  15. Linear Complexity Growth
  16. Constrained Computations
  17. Recurrent Neural Nets
  18. General Goal - min error | max reward - reward can be sparse & delayed
  19. Deep Recurrent Attention Model
  20. REINFORCE Learning Rule - weight change ∝ reward × gradient of the log-probability of the chosen glimpse (sketched in code after the slides)
  21. Performance Comparison - SVHN (Street View House Numbers dataset)
  22. Performance Comparison - DRAM vs CNN: computation complexity
  23. Last Points - adaptive selection & context - constrained computations - accuracy
  24. References
  ● Neural Turing Machines http://arxiv.org/pdf/1410.5401v2.pdf (Graves et al., 2014)
  ● Reinforcement Learning NTM http://arxiv.org/pdf/1505.00521v1.pdf (Zaremba et al., 2015)
  ● End-To-End Memory Network http://arxiv.org/pdf/1503.08895v4.pdf (Sukhbaatar et al., 2015)
  ● Recurrent Models of Visual Attention http://arxiv.org/pdf/1406.6247v1.pdf (Mnih et al., 2014)
  ● Multiple Object Recognition with Visual Attention http://arxiv.org/pdf/1412.7755v2.pdf (Ba et al., 2014)
  ● Show, Attend and Tell http://arxiv.org/pdf/1502.03044v2.pdf (Xu et al., 2015)
  ● DRAW http://arxiv.org/pdf/1502.04623v2.pdf (Gregor et al., 2015)
  ● Neural Machine Translation by Jointly Learning to Align and Translate http://arxiv.org/pdf/1409.0473v6.pdf (Bahdanau et al., 2014)
  ● Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets http://arxiv.org/pdf/1503.01007v4.pdf (Joulin et al., 2015)
  ● Deep Learning Theory & Applications https://www.youtube.com/watch?v=aUTHdgh1OjI
  ● The Unreasonable Effectiveness of Recurrent Neural Networks https://karpathy.github.io/2015/05/21/rnn-effectiveness/
  ● Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning http://www-anw.cs.umass.edu/~barto/courses/cs687/williams92simple.pdf (Williams, 1992)
  25. References (continued)
  ● Spatial Transformer Networks http://arxiv.org/pdf/1506.02025v1.pdf (Jaderberg et al., 2015)
  ● Recurrent Spatial Transformer Networks http://arxiv.org/pdf/1509.05329v1.pdf (Sønderby et al., 2015)
  ● Spatial Transformer Networks Video https://youtu.be/yGFVO2B8gok
  ● Learning Stochastic Feedforward Neural Networks http://www.cs.toronto.edu/~tang/papers/sfnn.pdf (Tang & Salakhutdinov, 2013)
  ● Learning Stochastic Recurrent Networks http://arxiv.org/pdf/1411.7610v3.pdf (Bayer & Osendorfer, 2015)
  ● Learning Generative Models with Visual Attention http://www.cs.toronto.edu/~tang/papers/sfnn.pdf (Tang et al., 2014)
  26. Special Thanks - Mark Ettinger, Rewon Child, Diogo Almeida, Stanislav Nikolov, Adam Gibson, Tarin Ziyaee, Charlie Tang, Dave Kammeyer
  27. References: Images
  ● http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
  ● http://deeplearning.net/tutorial/lenet.html
  ● https://stats.stackexchange.com/questions/114385/what-is-the-difference-between-convolutional-neural-networks-restricted-boltzma
  ● http://myndset.com/2011/12/15/making-the-switch-where-to-find-the-money-for-your-digital-marketing-strategy/
  ● http://blog.archerhotel.com/spyglass-rooftop-bar-nyc-making-manhattan-look-twice/
  ● http://www.serps-invaders.com/blog/how-to-find-broken-links-on-your-site/
  ● http://arxiv.org/pdf/1502.04623v2.pdf
  ● https://en.wikipedia.org/wiki/Attention
  ● http://web.media.mit.edu/~lieber/Teaching/Context/
  28. Attention Models - Melanie Warrick - @nyghtowl - skymind.io (company) - gitter.im/deeplearning4j/deeplearning4j
  29. Artificial Neural Nets - Input → Hidden → Output - run until the error stops improving (= converges) - loss function on the output: $y_k = \sum_{j=1}^{M} W_{kj} x_j$ (a NumPy version of this formula appears after the slides)
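
The sketches below are not from the deck; they are minimal illustrations of the mechanisms the slides name. First, the soft attention of slides 4, 7, and 9: score every input position, softmax the scores into context weights, and return a weighted average over all of the input. NumPy, the `soft_attention` name, and the toy shapes are all assumptions made for illustration.

```python
import numpy as np

def soft_attention(query, keys, values):
    """Soft attention (slide 7): score every input position, turn the
    scores into context weights with a softmax, and return a weighted
    average over ALL values. Every step is differentiable, so a
    standard loss derivative backpropagates through it."""
    scores = keys @ query                     # one relevance score per position
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights @ values, weights          # weighted average + the weights

# Toy usage: 5 input positions with 8-dim keys/values, one 8-dim query.
rng = np.random.default_rng(0)
keys = rng.normal(size=(5, 8))
values = rng.normal(size=(5, 8))
query = rng.normal(size=8)
context, weights = soft_attention(query, keys, values)  # context: shape (8,)
```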
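Hard attention (slides 7 and 10) uses the same distribution but samples a single glimpse instead of averaging, so only one input location is processed per step; the sampling makes it non-differentiable, which is why it needs the policy gradient sketched afterwards. Again a hypothetical sketch, reusing the same toy shapes:

```python
import numpy as np

def hard_attention(query, keys, values, rng):
    """Hard attention (slide 7): treat the softmax weights as a
    probability distribution over input positions and sample ONE
    glimpse, rather than averaging over every input."""
    scores = keys @ query
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    idx = rng.choice(len(values), p=probs)    # stochastic glimpse selection
    # Keep the log-probability of the choice for the REINFORCE update.
    return values[idx], np.log(probs[idx])
```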
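Slide 20's learning rule is Williams' REINFORCE from the reference list (Williams, 1992): the weight change is proportional to the reward times the gradient of the log-probability of the chosen glimpse, with a baseline subtracted for the variance reduction mentioned on slide 7. The learning rate, baseline default, and function name are illustrative assumptions:

```python
def reinforce_update(weights, grad_log_prob, reward, baseline=0.0, lr=0.01):
    """One REINFORCE step: move the glimpse policy's weights so that
    glimpses which led to reward become more likely. The reward can be
    sparse and delayed (slide 18), e.g. 1 only when the final
    classification is correct and 0 otherwise."""
    return weights + lr * (reward - baseline) * grad_log_prob
```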
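Finally, slide 29's output formula, y_k = Σ_j W_kj x_j, is a plain matrix-vector product; a short NumPy illustration with assumed shapes (3 output units, M = 8 inputs):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))   # weights: 3 output units, M = 8 inputs
x = rng.normal(size=8)        # input vector
y = W @ x                     # y[k] = sum over j of W[k, j] * x[j]
```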
