
Deep Visual Saliency - Kevin McGuinness - UPC Barcelona 2017


https://telecombcn-dl.github.io/2017-dlcv/

Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.



  1. [course site] #DLUPC. Visual Saliency (Day 4, Lecture 3). Kevin McGuinness (kevin.mcguinness@dcu.ie), Research Fellow, Insight Centre for Data Analytics, Dublin City University
  2. The importance of visual attention
  3. The importance of visual attention
  4. The importance of visual attention
  5. The importance of visual attention
  6. Why don't we see the changes? We don't really see the whole image; we only focus on small, specific regions: the salient parts. Human beings reliably attend to the same regions when shown the same images.
  7. What we perceive
  8. Where we look
  9. What we actually see
  10. Can we predict where humans will look? Yes: computational models of visual saliency. Why might this be useful?
  11. SalNet: deep visual saliency model. Predict a map of visual attention from image pixels, i.e. find the parts of the image that stand out. ● Feedforward 8-layer "fully convolutional" architecture ● Transfer learning in the bottom 3 layers from a VGG-M model pretrained on ImageNet ● Trained on the SALICON dataset (crowdsourced attention simulated with mouse movements and artificial foveation) ● Evaluated on the MIT300 saliency benchmark: http://saliency.mit.edu/results_mit300.html. Pan, McGuinness, et al. Shallow and Deep Convolutional Networks for Saliency Prediction, CVPR 2016. http://arxiv.org/abs/1603.00845
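To make the "fully convolutional" saliency idea concrete, here is a minimal PyTorch sketch. It is not the 8-layer SalNet itself (no VGG-M transfer, far fewer filters); the `TinySaliencyNet` name and all layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySaliencyNet(nn.Module):
    """Minimal fully convolutional saliency predictor (illustrative only)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # A final 1x1 convolution collapses channels into one saliency map
        self.predict = nn.Conv2d(128, 1, kernel_size=1)

    def forward(self, x):
        s = self.predict(self.features(x))
        # Upsample back to the input resolution; sigmoid keeps values in [0, 1]
        s = F.interpolate(s, size=x.shape[-2:], mode='bilinear',
                          align_corners=False)
        return torch.sigmoid(s)

# Training would typically minimize a pixel-wise loss (e.g. BCE or MSE)
# against ground-truth fixation maps such as those in SALICON.
```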
  12. [Qualitative results figure: image, ground truth, prediction]
  13. [Qualitative results figure: image, ground truth, prediction]
  14. SalGAN: adversarial loss + data loss. Junting Pan, Cristian Canton, Kevin McGuinness, Noel E. O'Connor, Jordi Torres, Elisa Sayrol and Xavier Giro-i-Nieto. SalGAN: Visual Saliency Prediction with Generative Adversarial Networks. arXiv, 2017.
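The key idea in SalGAN is combining a per-pixel data loss with an adversarial loss from a discriminator. Below is a hedged PyTorch sketch of a generator objective in that spirit; the function name `generator_loss` and the weight `alpha` are illustrative assumptions, not the paper's exact formulation or value.

```python
import torch
import torch.nn.functional as F

def generator_loss(pred_map, gt_map, disc_on_pred, alpha=0.05):
    """SalGAN-style combined generator objective (illustrative weighting).

    pred_map:     predicted saliency map, values in [0, 1]
    gt_map:       ground-truth saliency map, values in [0, 1]
    disc_on_pred: discriminator output for the predicted map, in (0, 1)
    """
    # Data loss: per-pixel binary cross-entropy against the ground truth
    data_loss = F.binary_cross_entropy(pred_map, gt_map)
    # Adversarial loss: push the discriminator to label predictions as real
    adv_loss = F.binary_cross_entropy(
        disc_on_pred, torch.ones_like(disc_on_pred))
    return data_loss + alpha * adv_loss
```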
  15. SalNet and SalGAN benchmarks
  16. SALICON. Huang et al., SALICON: Reducing the Semantic Gap in Saliency Prediction by Adapting Deep Neural Networks, ICCV 2015
  17. ML-NET. Cornia et al., A Deep Multi-Level Network for Saliency Prediction. https://arxiv.org/abs/1609.01064
  18. DeepFix: weights initialized from VGG-16 trained on ImageNet; dilated convolutions; location-biased convolutions; Inception layers. Kruthiventi et al., DeepFix: A Fully Convolutional Neural Network for Predicting Human Eye Fixations. https://arxiv.org/abs/1510.02927
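Dilated convolutions, one of the ingredients listed for DeepFix, enlarge the receptive field without downsampling and without extra parameters. A minimal PyTorch illustration (channel counts are arbitrary):

```python
import torch
import torch.nn as nn

# A 3x3 kernel with dilation=2 covers a 5x5 region of the input while
# keeping the same number of weights, and padding=dilation preserves
# the spatial resolution of the feature map.
dilated = nn.Conv2d(512, 512, kernel_size=3, padding=2, dilation=2)

x = torch.randn(1, 512, 30, 40)
print(dilated(x).shape)  # torch.Size([1, 512, 30, 40])
```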
  19. Applications of visual attention: intelligent image cropping, image retrieval, improved image classification
  20. Intelligent image cropping
  21. Image retrieval: query by example. Given: ● an example query image that illustrates the user's information need ● a very large dataset of images. Task: ● rank all images in the dataset according to how likely they are to fulfil the user's information need
  22. Retrieval benchmarks: Oxford Buildings (2007), Paris Buildings (2008), TRECVID INS (2014)
  23. Bags of convolutional features for instance search. Objective: rank images according to relevance to a query image. Local CNN features and BoW: ● pretrained VGG-16 network ● features from conv5 ● L2-norm, PCA, L2-norm ● K-means clustering → BoW ● cosine similarity ● query augmentation, spatial reranking. Scalable, fast, and high-performing on Oxford 5K, Paris 6K and TRECVID INS (see the sketch after slide 24). Mohedano et al. Bags of Local Convolutional Features for Scalable Instance Search, ICMR 2016. http://arxiv.org/abs/1604.04653
  24. Bags of convolutional features for instance search: BoW descriptor pipeline (figure). Mohedano et al. Bags of Local Convolutional Features for Scalable Instance Search, ICMR 2016. http://arxiv.org/abs/1604.04653
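A rough NumPy/scikit-learn sketch of the descriptor pipeline from the two slides above. The helper name `bow_descriptor`, the PCA dimension, and the vocabulary size are illustrative assumptions; PCA and k-means are assumed to have been fit offline on a sample of local features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import normalize

def bow_descriptor(conv5, pca, kmeans):
    """Map an (H, W, C) conv5 feature map to an L2-normalized BoW histogram."""
    local = conv5.reshape(-1, conv5.shape[-1])   # one local feature per position
    local = normalize(local)                     # L2-normalize
    local = normalize(pca.transform(local))      # PCA projection, L2 again
    words = kmeans.predict(local)                # assign each feature a visual word
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(np.float64)
    return normalize(hist.reshape(1, -1))[0]     # L2-normalized BoW vector

# Offline (illustrative sizes), fit on sampled local features from the dataset:
# pca = PCA(n_components=256).fit(sample_features)
# kmeans = KMeans(n_clusters=25000).fit(pca.transform(sample_features))

# Ranking: cosine similarity of L2-normalized descriptors is a dot product:
# scores = db_matrix @ bow_descriptor(query_conv5, pca, kmeans)
```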
  25. Using saliency to improve retrieval. Pipeline figure: two CNNs produce semantic features and a saliency map; the saliency map supplies importance weighting for the features, and the weighted features are pooled (e.g. BoW) into image descriptors
  26. Saliency-weighted retrieval: mean Average Precision

                    Oxford          Paris           INSTRE
                    Global  Local   Global  Local   Global  Local
      No weighting  0.614   0.680   0.621   0.720   0.304   0.472
      Center prior  0.656   0.702   0.691   0.758   0.407   0.546
      Saliency      0.680   0.717   0.716   0.770   0.514   0.617
      QE saliency   -       0.784   -       0.834   -       0.719
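A hedged sketch of the importance-weighting step from slide 25: each local feature's vote in the BoW histogram is scaled by the saliency value at its spatial location. The function name is mine, and it assumes the saliency map has already been resized to the conv feature grid.

```python
import numpy as np
from sklearn.preprocessing import normalize

def saliency_weighted_bow(conv5, saliency, pca, kmeans):
    """Like bow_descriptor above, but each visual-word vote is weighted
    by the (pre-resized, H x W) saliency value at that location."""
    local = normalize(conv5.reshape(-1, conv5.shape[-1]))
    local = normalize(pca.transform(local))
    words = kmeans.predict(local)
    weights = saliency.reshape(-1)               # one weight per location
    hist = np.bincount(words, weights=weights,
                       minlength=kmeans.n_clusters)
    return hist / (np.linalg.norm(hist) + 1e-12) # L2-normalize
```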
  27. Using saliency to improve image classification (headline figure on the slide: 12.4%). Architecture figure: separate RGB and saliency input streams, each with Conv 1 + Batch Norm + Max-Pooling, feeding shared Conv 3-5, FC layers with dropout, and the output layer. Figure credit: Eric Arazo
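A minimal PyTorch sketch of the two-stream idea in the figure: RGB and saliency inputs get separate Conv + BatchNorm + MaxPool stems and are fused before a shared trunk. The `SaliencyAugmentedNet` name, all channel counts, and the fusion point are illustrative assumptions, not the exact architecture from the slide.

```python
import torch
import torch.nn as nn

class SaliencyAugmentedNet(nn.Module):
    """Illustrative two-stream classifier: RGB and saliency processed
    separately, then concatenated before the shared trunk."""
    def __init__(self, num_classes=1000):
        super().__init__()
        self.rgb_stem = nn.Sequential(
            nn.Conv2d(3, 48, kernel_size=7, stride=2, padding=3),
            nn.BatchNorm2d(48), nn.ReLU(inplace=True), nn.MaxPool2d(2))
        self.sal_stem = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=7, stride=2, padding=3),
            nn.BatchNorm2d(16), nn.ReLU(inplace=True), nn.MaxPool2d(2))
        self.trunk = nn.Sequential(
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, num_classes))

    def forward(self, rgb, sal):
        # Fuse the two streams along the channel dimension (48 + 16 = 64)
        x = torch.cat([self.rgb_stem(rgb), self.sal_stem(sal)], dim=1)
        return self.trunk(x)  # class logits
```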
  28. Why does it improve classification accuracy? Examples: acoustic guitar +25%, volleyball +23%
  29. Questions?
