Paper Reviews on Visual Attention

  1. Paper Reviews on Visual Attention, 2018.3.29, SNU DATAMINING CENTER, MINKI CHUNG
  2. WHO AM I ▸ Chung Minki ▸ BS, KAIST, IE, 2016 ▸ MS, SNU, IE, 2018..?! ▸ Vision Projects ▸ Working on Semantic Image Inpainting
  3. WHAT IS VISUAL ATTENTION ▸ Attention is HOT nowadays ▸ http://openaccess.thecvf.com/CVPR2017_search.py ▸ http://search.iclr2018.smerity.com/search/?query=attention
  4. WHAT IS VISUAL ATTENTION ▸ You may have heard of: ▸ Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio, 2015, ICLR. "Neural Machine Translation by Jointly Learning to Align and Translate" ▸ Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio, 2015, ICML. "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"
  5. WHAT IS VISUAL ATTENTION ▸ And more: ▸ Jimmy Lei Ba, Volodymyr Mnih, Koray Kavukcuoglu, 2015, ICLR. "Multiple Object Recognition with Visual Attention" ▸ Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu, 2015, NIPS. "Spatial Transformer Networks" ▸ Jianlong Fu, Heliang Zheng, Tao Mei, 2017, CVPR. "Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-grained Image Recognition" ▸ Siavash Gorji, James J. Clark, 2017, CVPR. "Attentional Push: A Deep Convolutional Network for Augmenting Image Salience with Shared Attention Modeling in Social Scenes"
  6. WHAT IS VISUAL ATTENTION ▸ Visual attention: attend to certain parts of an image to solve a task more efficiently ▸ Deep learning is a black-box model → attention adds interpretability
  7. TABLE OF CONTENTS ▸ Early Works ▸ Recurrent Attention Model (RAM) ▸ Spatial Transformer Network (STN) ▸ Recent Works on visual attention ▸ in ICLR ▸ in CVPR
  8. PREREQUISITES ▸ CNN, transposed convolution (a.k.a. deconvolution), dilated convolution (see the sketch below) ▸ RNN ▸ MLP ▸ GAN ▸ https://towardsdatascience.com/types-of-convolutions-in-deep-learning-717013397f4d
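The convolution variants listed above differ mainly in how the output grid is sampled from the input. Below is a minimal sketch, assuming PyTorch (the slides do not prescribe a framework), contrasting a standard strided convolution, a transposed convolution, and a dilated convolution purely by their shape behaviour:

```python
# Minimal sketch (PyTorch assumed): standard vs. transposed vs. dilated convolution.
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)                                          # N x C x H x W input

conv = nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1)             # strided conv: downsamples 32 -> 16
deconv = nn.ConvTranspose2d(8, 3, kernel_size=4, stride=2, padding=1)  # transposed conv ("deconvolution"): upsamples 16 -> 32
dilated = nn.Conv2d(3, 8, kernel_size=3, padding=2, dilation=2)        # dilated conv: larger receptive field, same spatial size

y = conv(x)        # torch.Size([1, 8, 16, 16])
z = deconv(y)      # torch.Size([1, 3, 32, 32])
w = dilated(x)     # torch.Size([1, 8, 32, 32])
print(y.shape, z.shape, w.shape)
```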
  9. EARLY WORKS: RAM, STN
  10. RECURRENT ATTENTION MODEL ▸ Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu, 2014, NIPS. "Recurrent Models of Visual Attention" ▸ Google DeepMind, 563 citations ▸ Motivation: confronted by a large image, humans process it sequentially, selecting where to look and what to look at ▸ Tackles a ConvNet limitation: poor scalability with increasing input image size
  11. RECURRENT ATTENTION MODEL ▸ Jimmy Lei Ba, Volodymyr Mnih, Koray Kavukcuoglu, 2015, ICLR. "Multiple Object Recognition with Visual Attention" (DRAM) ▸ A refined version of the RAM architecture ▸ RNN structure built on multi-resolution crops, called glimpses
  12. RECURRENT ATTENTION MODEL ▸ Architecture (figure): the glimpse network encodes WHAT TO SEE (glimpse features are the inputs for rnn(1)), the emission network decides WHERE TO SEE (locates the next glimpse), a context network provides the initial state, and a classification output is produced per step for multiple objects (see the glimpse sketch below)
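As a rough illustration of the glimpse idea, here is a minimal sketch, assuming PyTorch; the patch size, number of scales, and the absence of boundary padding are illustrative simplifications rather than the paper's settings. It crops progressively larger patches around one location and resizes them to a common resolution, so the center is seen sharply and the surroundings coarsely:

```python
# Minimal sketch (PyTorch assumed) of a multi-resolution glimpse around a location.
import torch
import torch.nn.functional as F

def glimpse(image, center, base=8, scales=3):
    """image: 1 x C x H x W; center: (row, col) in pixels; no boundary handling."""
    r, c = center
    patches = []
    for s in range(scales):
        half = base * (2 ** s) // 2                      # patch half-size doubles per scale
        patch = image[:, :, r - half:r + half, c - half:c + half]
        patch = F.interpolate(patch, size=(base, base),  # resize every scale to base x base
                              mode='bilinear', align_corners=False)
        patches.append(patch)
    return torch.cat(patches, dim=1)                     # 1 x (C * scales) x base x base

img = torch.randn(1, 1, 60, 60)
g = glimpse(img, center=(30, 30))
print(g.shape)   # torch.Size([1, 3, 8, 8])
```

The concatenated crop is what the glimpse network encodes, together with the location, before feeding the recurrent core.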
  13. RECURRENT ATTENTION MODEL ▸ Demo: single object classification ▸ https://github.com/kevinzakka/recurrent-visual-attention
  14. RECURRENT ATTENTION MODEL ▸ Training: maximize a lower bound F on the label log-likelihood (extended in the paper to the multiple-object case): log p(y|I, W) = log Σ_l p(l|I, W) p(y|l, I, W) ≥ Σ_l p(l|I, W) log p(y|l, I, W) = F
  15. RECURRENT ATTENTION MODEL ▸ Cont'd: the gradient of F yields a REINFORCE-style update for the stochastic location policy (see the sketch below)
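Because sampling the next glimpse location is non-differentiable, the location policy is trained with a REINFORCE-style gradient. A minimal sketch, assuming PyTorch; the Gaussian scale, the 0/1 reward, and the omission of a baseline term are illustrative simplifications:

```python
# Minimal sketch (PyTorch assumed) of the REINFORCE term for the location policy.
import torch
from torch.distributions import Normal

loc_mean = torch.tanh(torch.randn(4, 2, requires_grad=True))  # predicted (x, y) means for a batch of 4
policy = Normal(loc_mean, 0.1)               # Gaussian location policy around the predicted means
loc_sample = policy.sample()                 # stochastic glimpse locations (no gradient through sampling)
reward = torch.tensor([1., 0., 1., 1.])      # e.g. 1 if the final classification was correct

log_prob = policy.log_prob(loc_sample).sum(dim=1)
loss_reinforce = -(log_prob * reward).mean() # maximize expected reward = minimize -log_prob * reward
loss_reinforce.backward()                    # gradient reaches the parameters that produced loc_mean
```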
  16. RECURRENT ATTENTION MODEL ▸ Experiments & Results ▸ MNIST, SVHN
  17. SPATIAL TRANSFORMER NETWORK ▸ Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu, 2015, NIPS. "Spatial Transformer Networks" ▸ Google DeepMind, 624 citations ▸ Motivation: humans process distorted objects by mentally un-distorting them ▸ ConvNets are not actually invariant to large transformations (invariance is only realised over a deep hierarchy of max-pooling) ▸ https://kevinzakka.github.io/2017/01/18/stn-part2/
  18. SPATIAL TRANSFORMER NETWORK ▸ Architecture: three parts, a localisation network (regresses the transformation parameters 𝜃 from the H × W × C input), a grid generator (sampling grid over the H' × W' × C output), and a sampler ▸ Assume 𝛵𝜃 is a 2D affine transformation A𝜃, so the source coordinates are (x_i^s, y_i^s)^T = 𝛵𝜃(G_i) = A𝜃 [x_i^t, y_i^t, 1]^T
  19. SPATIAL TRANSFORMER NETWORK ▸ For attention, 𝛵𝜃 is constrained to A𝜃 = [[s, 0, t_x], [0, s, t_y]], allowing cropping, translation, and isotropic scaling ▸ With a bilinear sampling kernel the whole module is differentiable and modular (see the sketch below)
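PyTorch exposes the two pieces just described, grid generation from 𝜃 and differentiable bilinear sampling, as library functions; a minimal sketch of the attention-style transform (the scale and translation values are arbitrary, and the localisation network that would predict 𝜃 is omitted):

```python
# Minimal sketch (PyTorch assumed) of the STN sampling step with an attention-style affine.
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 28, 28)                      # input feature map / image

s, tx, ty = 0.5, 0.1, -0.2                         # isotropic scale and translation (crop + zoom)
theta = torch.tensor([[[s, 0., tx],
                       [0., s, ty]]], requires_grad=True)         # 1 x 2 x 3 affine matrix A_theta

grid = F.affine_grid(theta, size=x.shape, align_corners=False)    # N x H x W x 2 sampling grid
y = F.grid_sample(x, grid, mode='bilinear', align_corners=False)  # differentiable bilinear sampler
print(y.shape)   # torch.Size([1, 1, 28, 28])
```

Because both steps are differentiable in 𝜃, the localisation network can be trained end-to-end with the downstream task loss.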
  20. SPATIAL TRANSFORMER NETWORK ▸ Experiments & Results ▸ MNIST ▸ SVHN
  21. SPATIAL TRANSFORMER NETWORK ▸ Experiments & Results ▸ Fine-grained classification (CUB-200-2011 bird classification dataset)
  22. SPATIAL TRANSFORMER NETWORK ▸ Already implemented in TensorLayer
  23. RECURRENT ATTENTIONAL NETWORKS FOR SALIENCY DETECTION ▸ Jason Kuen, Zhenhua Wang, Gang Wang, 2016, CVPR. "Recurrent Attentional Networks for Saliency Detection" ▸ Combines RAM (glimpse mechanism) with STN (differentiability) for saliency detection
  24. RECURRENT ATTENTIONAL NETWORKS FOR SALIENCY DETECTION ▸ Recurrent Attentional Convolutional-Deconvolutional Network (RACDNN) ▸ Architecture
  25. RECURRENT ATTENTIONAL NETWORKS FOR SALIENCY DETECTION ▸ Experiments & Results
  26. RECENT WORKS: ICLR, CVPR
  27. GENERATIVE IMAGE INPAINTING WITH CONTEXTUAL ATTENTION ▸ Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, Thomas S. Huang, 2018, CVPR. "Generative Image Inpainting with Contextual Attention" ▸ Adobe Research
  28. GENERATIVE IMAGE INPAINTING WITH CONTEXTUAL ATTENTION ▸ Architecture ▸ Two-stage (coarse-to-fine) network; the refinement stage uses WGAN losses and a contextual attention branch ▸ Global and local WGAN discriminators ▸ Spatially discounted l1 reconstruction loss: each hole pixel is weighted by 𝛾 raised to its distance from the nearest known pixel (see the sketch below)
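A minimal sketch, assuming PyTorch, of the spatially discounted l1 idea: pixels near the hole boundary keep a weight close to 1, pixels deep inside are down-weighted by 𝛾 to the power of their distance. The rectangular-hole distance approximation and 𝛾 = 0.99 are illustrative assumptions, not the paper's exact implementation:

```python
# Minimal sketch (PyTorch assumed) of a spatially discounted l1 reconstruction loss.
import torch

def discount_mask(h, w, gamma=0.99):
    """Weight gamma**d, d = distance of each hole pixel to the nearest hole edge."""
    ys = torch.arange(h, dtype=torch.float32)
    xs = torch.arange(w, dtype=torch.float32)
    dy = torch.minimum(ys, (h - 1) - ys).unsqueeze(1)     # distance to top/bottom edge
    dx = torch.minimum(xs, (w - 1) - xs).unsqueeze(0)     # distance to left/right edge
    d = torch.minimum(dy.expand(h, w), dx.expand(h, w))
    return gamma ** d                                     # h x w weights in (0, 1]

hole_pred = torch.randn(1, 3, 64, 64)    # generator output inside the hole
hole_gt = torch.randn(1, 3, 64, 64)      # ground truth inside the hole
weights = discount_mask(64, 64)
loss = (weights * (hole_pred - hole_gt).abs()).mean()
print(loss.item())
```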
  29. GENERATIVE IMAGE INPAINTING WITH CONTEXTUAL ATTENTION ▸ Contextual attention: for each foreground (hole) patch f_{x,y} and background patch b_{x',y'}, calculate the cosine similarity s_{x,y,x',y'} = ⟨f_{x,y}/||f_{x,y}||, b_{x',y'}/||b_{x',y'}||⟩, apply a scaled softmax over background positions, and reconstruct the hole as an attention-weighted sum of background patches (see the sketch below)
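A minimal sketch, assuming PyTorch, of that matching step, simplified to 1x1 patches so the cosine similarity reduces to normalized dot products between feature columns; the feature sizes and the softmax scale of 10 are illustrative choices, not the paper's exact configuration:

```python
# Minimal sketch (PyTorch assumed) of cosine-similarity matching for contextual attention.
import torch
import torch.nn.functional as F

fg = torch.randn(64, 16 * 16)    # C x Nf  foreground (hole) feature columns
bg = torch.randn(64, 32 * 32)    # C x Nb  background feature columns

fg_n = F.normalize(fg, dim=0)    # unit norm along the channel dimension
bg_n = F.normalize(bg, dim=0)

sim = bg_n.t() @ fg_n                    # Nb x Nf cosine similarities s_{x,y,x',y'}
attn = F.softmax(10.0 * sim, dim=0)      # scaled softmax over background positions
fg_filled = bg @ attn                    # C x Nf attention-weighted copy of background features
print(fg_filled.shape)                   # torch.Size([64, 256])
```

In the paper this is implemented efficiently as convolutions with the normalized background patches as kernels, followed by transposed convolutions to paste the matched patches back.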
  30. GENERATIVE IMAGE INPAINTING WITH CONTEXTUAL ATTENTION ▸ Experiments & Results
  31. LEARN TO PAY ATTENTION ▸ Saumya Jetley, Nicholas A. Lord, Namhoon Lee, Philip H. S. Torr, 2018, ICLR. "Learn to Pay Attention" ▸ Very simple
  32. LEARN TO PAY ATTENTION ▸ Architecture: intermediate feature maps are scored against a global feature vector with a compatibility function (dot product); the resulting attention maps re-weight the local features used for classification (see the sketch below)
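A minimal sketch, assuming PyTorch, of that dot-product compatibility function and the attention-weighted pooling it drives; the feature sizes are arbitrary, and the learned projection the paper uses to match local and global dimensionalities is omitted:

```python
# Minimal sketch (PyTorch assumed) of a dot-product compatibility function + attention pooling.
import torch
import torch.nn.functional as F

local = torch.randn(1, 512, 14, 14)    # intermediate conv features (B x C x H x W)
g = torch.randn(1, 512)                # global feature vector from the final layer

flat = local.flatten(2)                                # B x C x N, N = H*W
compat = torch.einsum('bc,bcn->bn', g, flat)           # dot-product compatibility score per position
attn = F.softmax(compat, dim=-1)                       # attention map over the N positions
g_a = torch.einsum('bn,bcn->bc', attn, flat)           # attention-weighted local descriptor
print(g_a.shape)                                       # torch.Size([1, 512])
```

The pooled descriptor g_a (one per attended layer) replaces the usual global feature as input to the classifier, which is what makes the attention maps trainable end-to-end.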
  33. LEARN TO PAY ATTENTION ▸ Experiments & Results ▸ Image classification and fine-grained recognition
  34. LEARN TO PAY ATTENTION ▸ Experiments & Results ▸ Weakly supervised semantic segmentation
  35. LOOK CLOSER TO SEE BETTER ▸ Jianlong Fu, Heliang Zheng, Tao Mei, 2017, CVPR. "Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-grained Image Recognition" ▸ Fine-grained image recognition: discriminative region localization + fine-grained feature learning
  36. LOOK CLOSER TO SEE BETTER ▸ Recurrent Attention Convolutional Neural Network (RA-CNN) ▸ Multi-scale networks: classification sub-network, attention proposal sub-network (APN) ▸ Finer-scale network (coarse to fine) ▸ Intra-scale softmax loss for classification, inter-scale pairwise ranking loss for the APN
  37. LOOK CLOSER TO SEE BETTER ▸ RA-CNN architecture: at each scale the APN proposes an attended region, which is cropped and amplified to the next, finer scale by bilinear interpolation
  38. LOOK CLOSER TO SEE BETTER ▸ Training: multi-task loss, the sum of intra-scale classification (softmax) losses and inter-scale pairwise ranking losses; the ranking loss forces the finer scale's confidence on the true class to exceed the coarser scale's by a margin (see the sketch below)
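A minimal sketch, assuming PyTorch, of that multi-task loss with two scales; the 200-class setup mirrors CUB-200-2011, and the 0.05 margin is an illustrative value:

```python
# Minimal sketch (PyTorch assumed): intra-scale classification loss + inter-scale ranking loss.
import torch
import torch.nn.functional as F

logits_s1 = torch.randn(8, 200)            # coarse-scale class logits (batch of 8)
logits_s2 = torch.randn(8, 200)            # finer-scale logits on the amplified region
target = torch.randint(0, 200, (8,))       # ground-truth classes

# intra-scale softmax (cross-entropy) losses
loss_cls = F.cross_entropy(logits_s1, target) + F.cross_entropy(logits_s2, target)

# inter-scale pairwise ranking loss: true-class probability must grow with scale
p1 = F.softmax(logits_s1, dim=1).gather(1, target.unsqueeze(1)).squeeze(1)
p2 = F.softmax(logits_s2, dim=1).gather(1, target.unsqueeze(1)).squeeze(1)
margin = 0.05
loss_rank = torch.clamp(p1 - p2 + margin, min=0).mean()   # hinge on p2 > p1 + margin

loss = loss_cls + loss_rank
```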
  39. LOOK CLOSER TO SEE BETTER ▸ Experiments & Results ▸ CUB-200-2011 Bird Dataset
  40. LOOK CLOSER TO SEE BETTER ▸ Experiments & Results ▸ Stanford Dogs, Stanford Cars
  41. SUMMARY ▸ Attention for efficiency, better performance, interpretability ▸ Many types of attention: ▸ RAM ▸ STN ▸ RAM + STN ▸ Others
  42. ANY Q?
  43. REFERENCES ▸ Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio, 2015, ICLR. "Neural Machine Translation by Jointly Learning to Align and Translate" ▸ Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio, 2015, ICML. "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention" ▸ Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu, 2014, NIPS. "Recurrent Models of Visual Attention" ▸ Jimmy Lei Ba, Volodymyr Mnih, Koray Kavukcuoglu, 2015, ICLR. "Multiple Object Recognition with Visual Attention" ▸ Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu, 2015, NIPS. "Spatial Transformer Networks" ▸ Jason Kuen, Zhenhua Wang, Gang Wang, 2016, CVPR. "Recurrent Attentional Networks for Saliency Detection" ▸ Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, Thomas S. Huang, 2018, CVPR. "Generative Image Inpainting with Contextual Attention" ▸ Saumya Jetley, Nicholas A. Lord, Namhoon Lee, Philip H. S. Torr, 2018, ICLR. "Learn to Pay Attention" ▸ Jianlong Fu, Heliang Zheng, Tao Mei, 2017, CVPR. "Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-grained Image Recognition"
  44. END OF DOCUMENT
