Learning Where and When to Look #reworkRETAIL 2018

Deep learning models achieve superior performance not only in image recognition tasks, but also in predicting where and when users focus their attention. This talk provides an overview of how convolutional neural networks have been trained to predict saliency maps, which describe the probability of fixating the gaze on each image location. Different solutions have been proposed for this task, and our recent work has added a temporal dimension by predicting the gaze scanpath over 360-degree images for VR/AR. These techniques allow simulating eye-tracker data without collecting data from users.
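
As a rough illustration of what a saliency map is, the sketch below builds one from a few made-up fixation points: each fixation contributes a Gaussian blob, and the result is normalized so that the values sum to 1 and can be read as fixation probabilities. The function name `saliency_map`, the grid size, the fixation coordinates, and `sigma` are assumptions chosen for illustration and do not come from the talk. It shows the kind of map the models are trained to predict, not how the networks themselves compute it.

```python
import math


def saliency_map(fixations, width, height, sigma=1.5):
    """Turn (x, y) fixation points into a map whose values sum to 1.

    Hypothetical helper for illustration only; not taken from the talk.
    """
    if not fixations:
        raise ValueError("at least one fixation point is required")
    grid = [[0.0] * width for _ in range(height)]
    for fx, fy in fixations:
        for y in range(height):
            for x in range(width):
                # Squared distance from this pixel to the fixation point.
                d2 = (x - fx) ** 2 + (y - fy) ** 2
                grid[y][x] += math.exp(-d2 / (2.0 * sigma ** 2))
    total = sum(sum(row) for row in grid)
    # Normalize so the map can be read as a probability distribution.
    return [[value / total for value in row] for row in grid]


# Made-up example: two fixations on a 17 x 7 grid.
example_fixations = [(3, 2), (12, 5)]
example_map = saliency_map(example_fixations, width=17, height=7)
```

The most probable locations in `example_map` are the fixation points themselves, with probability falling off smoothly around them.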

https://www.re-work.co/events/deep-learning-in-retail-summit-london-2018

Published in: Data & Analytics

  1. DEEP LEARNING IN RETAIL & ADVERTISING SUMMIT, London, UK, 16th March 2018. Learning Where and When to Look. Xavier Giro-i-Nieto (xavier.giro@upc.edu), Associate Professor, Universitat Politecnica de Catalunya (Technical University of Catalonia). @DocXavi [Slides on GDrive] #reworkRETAIL
  2. A salient team: Kevin McGuinness, Noel E. O’Connor, Cristian Canton, Junting Pan, Marc Assens, Marta Coll, Elisa Sayrol, Jordi Torres
  3. Click to learn more
  4. Let’s play a game!
  5. The importance of visual attention
  6. The importance of visual attention
  7. The importance of visual attention
  8. The importance of visual attention
  9. The importance of visual attention
  10. Why don’t we see the changes? We don’t really see the whole image. We only focus on small specific regions: the salient parts. Human beings reliably attend to the same regions of images when shown
  11. What we perceive
  12. Where we look
  13. What we actually see
  14. Can we predict WHERE humans will look? Yes! Computational models of visual saliency. Why might this be useful?
  15. Johanna Närväinen, Janne Laine, “Looking through their eyes”, VTT Impulse 2016.
  16. Johanna Närväinen, Janne Laine, “Looking through their eyes”, VTT Impulse 2016.
  17. Johanna Närväinen, Janne Laine, “Looking through their eyes”, VTT Impulse 2016.
  18. Image (input), Saliency map (output)
  19. John Markoff, “Scientists see promise in deep learning Programs”, The New York Times (2012). Photo: Keith Penner
  20. Yazmin How, “Interview: Yoshua Bengio, Yann LeCun, Geoffrey Hinton”, RE·WORK (2017).
  21. AlexNet. Orange. A. Krizhevsky, I. Sutskever, G. E. Hinton, “Imagenet classification with deep convolutional neural networks”, NIPS 2012. Photo: Keith Penner
  22. AlexNet: DATA, ALGORITHMS (CNNs), COMPUTATION
  23. JuntingNet: ALGORITHMS (CNNs), DATA, COMPUTATION
  24. End-to-end CNN for Saliency Prediction. J. Pan, X. Giró-i-Nieto, “End-to-end Convolutional Network for Saliency Prediction” (CVPRW 2015)
  25. JuntingNet: Upsample + filter, 2D map. J. Pan, X. Giró-i-Nieto, “End-to-end Convolutional Network for Saliency Prediction” (CVPRW 2015)
  26. [no text on this slide]
  27. JuntingNet: ALGORITHMS (CNNs), DATA, COMPUTATION
  28. SalNet: ALGORITHMS (CNNs), DATA, COMPUTATION
  29. SalNet architecture: Input (size 320 × 240 × 3); Convolution 1 (7 × 7 × 96); Local response normalization; Max pool (kernel 3 × 3, stride 2); Convolution 2 (5 × 5 × 256); Max pool (kernel 3 × 3, stride 2); Convolution 3 (3 × 3 × 512); Convolution 4 (5 × 5 × 512); Convolution 5 (5 × 5 × 512); Convolution 6 (7 × 7 × 256); Convolution 7 (11 × 11 × 128); Convolution 8 (13 × 13 × 1); Deconvolution 1 (8 × 8 × 1, stride 4). Slide label: VGG-M. Pan, Junting, Kevin McGuinness, Elisa Sayrol, Xavier Giro-i-Nieto, and Noel E. O'Connor. "Shallow and deep convolutional networks for saliency prediction." CVPR 2016.
  30. SalNet: ALGORITHMS (CNNs), DATA, COMPUTATION
  31. SalGAN: DATA, COMPUTATION, ALGORITHMS (CNNs), ALGORITHMS (Adversarial)
  32. SalGAN: DATA, COMPUTATION, ALGORITHMS (CNNs), ALGORITHMS (Adversarial)
  33. Credit: Santiago Pascual [slides] [video]
  34. Adversarial Training. Goodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. "Generative adversarial nets." NIPS 2014. Interview with Ian Goodfellow at Re-Work Deep Learning Summit San Francisco, 2018.
  35. SalGAN: Generator, Discriminator. Junting Pan, Cristian Canton, Kevin McGuinness, Noel E. O’Connor, Jordi Torres, Elisa Sayrol and Xavier Giro-i-Nieto. “SalGAN: Visual Saliency Prediction with Generative Adversarial Networks.” CVPRW 2017.
  36. Can we predict WHERE & WHEN humans will look?
  37. Challenge: Create a model able to predict visual scanpaths
  38. SalTiNet: ALGORITHMS (CNNs), DATA, COMPUTATION
  39. SalTiNet: ALGORITHMS (CNNs), DATA, COMPUTATION
  40. Yashas Rai, Jesús Gutiérrez, and Patrick Le Callet. “A Dataset of Head and Eye Movements for 360 Degree Images”. MMSys 2017
  41. Yashas Rai, Jesús Gutiérrez, and Patrick Le Callet. “A Dataset of Head and Eye Movements for 360 Degree Images”. MMSys 2017
  42. SalTiNet: ALGORITHMS (CNNs), DATA, COMPUTATION
  43. Challenge: Create a model able to predict visual scanpaths. SalTiNet solution: Add a TIME dimension to saliency maps with saliency volumes and sample from them.
  44. Saliency Volume (axes: width, height, time). Marc Assens, Kevin McGuinness, Xavier Giro-i-Nieto and Noel E. O’Connor. “SaltiNet: Scan-Path Prediction on 360 Degree Images Using Saliency Volumes.” ICCVW 2017.
  45. Saliency Volume (axes: width, height, time): a grid of zeros. Marc Assens, Kevin McGuinness, Xavier Giro-i-Nieto and Noel E. O’Connor. “SaltiNet: Scan-Path Prediction on 360 Degree Images Using Saliency Volumes.” ICCVW 2017.
  46. Saliency Volume (axes: width, height, time): a grid of zeros with 1s at the fixation points (x, y, t). Marc Assens, Kevin McGuinness, Xavier Giro-i-Nieto and Noel E. O’Connor. “SaltiNet: Scan-Path Prediction on 360 Degree Images Using Saliency Volumes.” ICCVW 2017.
  47. Saliency Volume (axes: width, height, time): convolution applied to the grid of fixation points (x, y, t). Marc Assens, Kevin McGuinness, Xavier Giro-i-Nieto and Noel E. O’Connor. “SaltiNet: Scan-Path Prediction on 360 Degree Images Using Saliency Volumes.” ICCVW 2017.
  48. SalTiNet: Predicted saliency volume. Marc Assens, Kevin McGuinness, Xavier Giro-i-Nieto and Noel E. O’Connor. “SaltiNet: Scan-Path Prediction on 360 Degree Images Using Saliency Volumes.” ICCVW 2017.
  49. SalTiNet: Stochastic sampling. Marc Assens, Kevin McGuinness, Xavier Giro-i-Nieto and Noel E. O’Connor. “SaltiNet: Scan-Path Prediction on 360 Degree Images Using Saliency Volumes.” ICCVW 2017.
  50. Winners of IEEE ICME 2017 Challenge. Marc Assens, Kevin McGuinness, Xavier Giro-i-Nieto and Noel E. O’Connor. “SaltiNet: Scan-Path Prediction on 360 Degree Images Using Saliency Volumes.” ICCVW 2017.
  51. SalTiNet: ALGORITHMS (CNNs), DATA, COMPUTATION
  52. PathGAN: DATA, COMPUTATION, ALGORITHMS (CNN & RNN), ALGORITHMS (Adversarial)
  53. PathGAN: DATA, COMPUTATION, ALGORITHMS (CNN & RNN), ALGORITHMS (Adversarial)
  54. Challenge: Create a model able to predict visual scanpaths. PathGAN solution: Generate a sequence of fixation points with recurrent neural networks (RNNs).
  55. Recurrent Neural Network (RNN). Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural computation 9, no. 8 (1997): 1735-1780.
  56. PathGAN. Marc Assens, Kevin McGuinness, Xavier Giro-i-Nieto and Noel E. O’Connor. “PathGan: Visual Scan-path Prediction with Generative Adversarial Networks”, arXiv 2018.
  57. PathGAN. Marc Assens, Kevin McGuinness, Xavier Giro-i-Nieto and Noel E. O’Connor. “PathGan: Visual Scan-path Prediction with Generative Adversarial Networks”, arXiv 2018.
  58. PathGAN: DATA, COMPUTATION, ALGORITHMS (CNN & RNN), ALGORITHMS (Adversarial)
  59. Saliency maps: JuntingNet, SalNet, SalGAN. Scanpaths: SaltiNet, PathGAN
  60. Timeline 2015, 2016, 2017, 2018: Convolutional Neural Networks (CNNs); Recurrent Networks; Transfer Learning from ImageNet; Adversarial Training; Scanpath Prediction
  61. Deep Learning courses, UPC Barcelona. MSc course (2017); BSc course (2018). 1st edition (2016), 2nd edition (2017), 3rd edition (2018). 1st edition (2017), 2nd edition (2018). Next edition Autumn 2018. Next edition Winter/Spring 2019. Summer School (late June 2018)
  62. Xavier Giro-i-Nieto (@DocXavi). We are looking for both industrial & academic partners: xavier.giro@upc.edu. Slides available at: http://bit.ly/reworkRETAIL #reworkRETAIL
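
The saliency-volume idea from slides 43 to 49 (add a time axis to the saliency map, smooth the fixation points, then sample from the volume) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' SaltiNet implementation: the function names `saliency_volume` and `sample_scanpath`, the Gaussian smoothing used in place of the slide's "Convolution", the one-fixation-per-time-step sampling rule, and all parameter values and coordinates are hypothetical.

```python
import math
import random


def saliency_volume(fixations, width, height, steps, sigma_xy=1.5, sigma_t=1.0):
    """Build a (time, height, width) volume from (x, y, t) fixation points.

    Each fixation is spread with a Gaussian in space and in time, which
    stands in for the convolution step shown on the slides (an assumption).
    """
    volume = [[[0.0] * width for _ in range(height)] for _ in range(steps)]
    for fx, fy, ft in fixations:
        for t in range(steps):
            # Temporal weight: how close this time step is to the fixation.
            wt = math.exp(-((t - ft) ** 2) / (2.0 * sigma_t ** 2))
            for y in range(height):
                for x in range(width):
                    d2 = (x - fx) ** 2 + (y - fy) ** 2
                    volume[t][y][x] += wt * math.exp(-d2 / (2.0 * sigma_xy ** 2))
    return volume


def sample_scanpath(volume, rng):
    """Draw one (x, y, t) fixation per time step, weighted by the volume."""
    path = []
    for t, frame in enumerate(volume):
        cells = [(x, y) for y in range(len(frame)) for x in range(len(frame[0]))]
        weights = [frame[y][x] for x, y in cells]
        if sum(weights) <= 0.0:
            continue  # nothing to sample from in this time step
        x, y = rng.choices(cells, weights=weights, k=1)[0]
        path.append((x, y, t))
    return path


# Made-up example: three fixations over three time steps on a 17 x 7 grid.
example_volume = saliency_volume([(3, 2, 0), (12, 2, 1), (9, 5, 2)], 17, 7, steps=3)
example_path = sample_scanpath(example_volume, random.Random(0))
```

Because the sampling is stochastic, different random seeds give different scanpaths over the same volume, which mirrors the variation between human viewers that a single static saliency map cannot express.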
