
Deep Learning for Computer Vision: Saliency Prediction (UPC 2016)

Published on: http://imatge-upc.github.io/telecombcn-2016-dlcv/

Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or text captioning.

Published in: Data & Analytics


  1. Day 3 Lecture 2: Saliency Prediction. Elisa Sayrol. Acknowledgments: Junting Pan, Kevin McGuinness and Xavier Giró-i-Nieto. [course site]
  2. Saliency
  3. Saliency: What have you seen?
  4. Saliency: Lighthouse
  5. Saliency: House, Lighthouse
  6. Saliency: Rocks, House, Lighthouse
  7. Saliency
  8. Saliency Map. Original image vs. ground-truth saliency map (eye-fixation map). The goal is to obtain the saliency map of an image: a regression problem, not a classification problem.
  9. Databases: ground-truth generation (eye tracker, mouse clicks).
  10. Databases (train / validation / test):
      SALICON [Jiang’15]: 10,000 / 5,000 / 5,000
      iSUN [Xu’15]: 6,000 / 926 / 2,000
      CAT2000 [Borji’15]: 2,000 / - / 2,000
      MIT300 [Judd’12]: 300 / - / -
      PASCAL-S: 850
      Other databases: http://saliency.mit.edu/datasets.html
  11. Architectures: Junting Net (shallow network). Output: a 2,304-dimensional vector (48×48), reshaped to a 2D map and then upsampled + filtered (figures on the slide: 96×96 and 48×48).
  12. Architectures: Junting Net (shallow network). Winner of the LSUN Challenge 2015!
      Loss function: mean squared error (MSE)
      Weight initialization: Gaussian distribution
      Learning rate: 0.03 down to 0.0001
      Mini-batch size: 128
      Training time: 7 h (SALICON) / 4 h (iSUN)
      Acceleration: SGD + Nesterov momentum (0.9)
      Regularization: maxout norm
      GPU: NVIDIA GTX 980
      Reference: Shallow and Deep Convolutional Networks for Saliency Prediction. Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, Xavier Giro-i-Nieto. CVPR 2016.
  13. Architectures: SalNet (deep network).
      Loss function: mean squared error (MSE)
      Weight initialization: first 3 layers pre-trained with VGG; remaining layers randomly initialized
      Learning rate: 0.01 (halved every 100 iterations)
      Mini-batch size: 2 images, for 24,000 iterations
      Training time: 15 h
      Acceleration: SGD + Nesterov momentum (0.9)
      Regularization: L2 weight decay
      GPU: NVIDIA GTX Titan
      Reference: Shallow and Deep Convolutional Networks for Saliency Prediction. Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, Xavier Giro-i-Nieto. CVPR 2016.
  14. Qualitative Results
  15. Results from the CVPR LSUN Challenge 2015 (iSUN database). Architectures: Junting Net (shallow network), winner of the LSUN Challenge 2015!
  16. Quantitative Results. Metrics from: Saliency and Human Fixations: State-of-the-Art and Study of Comparison Metrics. Nicolas Riche, Matthieu Duvinage, Matei Mancas, Bernard Gosselin and Thierry Dutoit. ICCV 2013.
  17. Architectures: Saliency Unified (very deep network), similar to VGG-16. Reference: Saliency Unified: A Deep Architecture for Simultaneous Eye Fixation Prediction and Salient Object Segmentation. Srinivas S S Kruthiventi, Vennela Gudisa, Jaley H Dholakiya and R. Venkatesh Babu. CVPR 2016.
  18. Quantitative Results. Reference: Saliency Unified: A Deep Architecture for Simultaneous Eye Fixation Prediction and Salient Object Segmentation. Srinivas S S Kruthiventi, Vennela Gudisa, Jaley H Dholakiya and R. Venkatesh Babu. CVPR 2016.
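Slide 8 frames saliency prediction as regression: the network outputs a real-valued map that is scored pixel-wise against the ground-truth fixation map. A minimal NumPy sketch of that MSE objective (the tiny 2×2 maps below are made up purely for illustration):

```python
import numpy as np

def saliency_mse(pred, gt):
    """Pixel-wise mean squared error between a predicted saliency
    map and the ground-truth fixation map (both in [0, 1])."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    return float(np.mean((pred - gt) ** 2))

# Toy example: the prediction is off by 0.5 at every pixel, so MSE = 0.25.
pred = [[0.5, 0.5], [0.5, 0.5]]
gt = [[1.0, 1.0], [0.0, 0.0]]
print(saliency_mse(pred, gt))  # 0.25
```

Treating each pixel as a regression target is what distinguishes this setup from per-image classification losses such as cross-entropy over class labels.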
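The "upsample + filter" post-processing on slide 11 (48×48 network output grown to a larger 2D map) can be sketched in NumPy. The nearest-neighbour upsampling and 3×3 box filter below are illustrative stand-ins; the slide does not specify the exact resampling method or smoothing kernel Junting Net uses:

```python
import numpy as np

def upsample2x(m):
    """Nearest-neighbour 2x upsampling: each source pixel is
    repeated as a 2x2 block (e.g. 48x48 -> 96x96)."""
    return np.kron(m, np.ones((2, 2)))

def box_filter3(m):
    """3x3 box filter to smooth the blocky upsampled map
    (a simple stand-in for Gaussian filtering)."""
    padded = np.pad(m, 1, mode="edge")
    out = np.zeros(m.shape, dtype=np.float64)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out += padded[dy:dy + m.shape[0], dx:dx + m.shape[1]]
    return out / 9.0

coarse = np.random.rand(48, 48)          # network output: 2,304 values
smooth = box_filter3(upsample2x(coarse))  # upsample, then filter
print(smooth.shape)  # (96, 96)
```

In practice frameworks provide bilinear or learned upsampling, but the two-step "enlarge, then smooth" structure is the same.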
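The two learning-rate policies listed on slides 12 and 13 (0.03 decaying to 0.0001 for Junting Net; 0.01 halved every 100 iterations for SalNet) can be written as small schedule functions. The linear shape used for Junting Net below is an assumption, since the slide only gives the endpoints:

```python
def junting_lr(step, total_steps, lr0=0.03, lr1=0.0001):
    """Interpolate linearly from lr0 (0.03) at the first step down
    to lr1 (0.0001) at the last step.  The slide only says
    "0.03 to 0.0001", so the linear shape is assumed."""
    t = step / max(total_steps - 1, 1)
    return lr0 + t * (lr1 - lr0)

def salnet_lr(iteration, lr0=0.01):
    """SalNet-style step decay: start at 0.01 and halve the rate
    every 100 iterations, as listed on the slide."""
    return lr0 * 0.5 ** (iteration // 100)
```

For example, `salnet_lr` gives 0.01 for iterations 0-99, 0.005 for 100-199, and so on, matching the "halved every 100 iterations" rule.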
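Slide 16 cites Riche et al. (ICCV 2013) for saliency comparison metrics. Two standard metrics from that literature, the linear Correlation Coefficient (CC) and Normalized Scanpath Saliency (NSS), are easy to sketch in NumPy; these are illustrative implementations, not the exact evaluation code behind the benchmark numbers:

```python
import numpy as np

def cc(pred, gt):
    """Linear Correlation Coefficient (CC): Pearson correlation
    between the predicted and ground-truth saliency maps."""
    p = (pred - pred.mean()) / (pred.std() + 1e-12)
    g = (gt - gt.mean()) / (gt.std() + 1e-12)
    return float(np.mean(p * g))

def nss(pred, fixations):
    """Normalized Scanpath Saliency (NSS): mean of the z-scored
    predicted map at the human fixation locations (boolean mask)."""
    z = (pred - pred.mean()) / (pred.std() + 1e-12)
    return float(z[fixations].mean())

m = np.arange(16.0).reshape(4, 4)
fixation_mask = np.zeros((4, 4), dtype=bool)
fixation_mask[3, 3] = True  # fixation on the most salient cell
assert nss(m, fixation_mask) > 0
```

CC rewards predictions that are linearly correlated with the whole ground-truth map, while NSS only scores the prediction at the discrete fixation points, which is why benchmarks typically report several metrics side by side.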
