
PathGAN: Visual Scanpath Prediction with Generative Adversarial Networks


https://imatge-upc.github.io/pathgan/

We introduce PathGAN, a deep neural network for visual scanpath prediction trained adversarially. A visual scanpath is the sequence of fixation points that a human observer's gaze traces over an image. PathGAN is composed of two parts, a generator and a discriminator. Both parts extract features from images using off-the-shelf networks and train recurrent layers to generate or discriminate scanpaths accordingly. In scanpath prediction, the stochastic nature of the data makes it very difficult to generate realistic predictions with supervised learning strategies, so we adopt adversarial training as a suitable alternative. Our experiments show that PathGAN improves the state of the art in visual scanpath prediction on the iSUN and Salient360! datasets.
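The two-part design described above can be summarised in code. The following is a minimal PyTorch sketch, not the authors' implementation: the VGG-16 backbone, the LSTM sizes, and the (x, y, t) fixation encoding are assumptions chosen only to illustrate how a generator and a conditional discriminator can share the "CNN features + recurrent layers" structure; the real architecture and training details are in the paper and on the project page.

# Illustrative PathGAN-style generator/discriminator pair (assumed layer choices, not the paper's exact model).
import torch
import torch.nn as nn
from torchvision.models import vgg16


class ScanpathGenerator(nn.Module):
    """Image -> sequence of fixation points, here encoded as (x, y, t) in [0, 1]."""
    def __init__(self, seq_len=32, hidden=512):
        super().__init__()
        self.seq_len = seq_len
        self.cnn = vgg16().features            # off-the-shelf feature extractor (pretrained in practice)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.rnn = nn.LSTM(input_size=512, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 3)

    def forward(self, image):
        feats = self.pool(self.cnn(image)).flatten(1)        # (B, 512) image embedding
        feats = feats + 0.1 * torch.randn_like(feats)        # noise makes the scanpaths stochastic
        steps = feats.unsqueeze(1).repeat(1, self.seq_len, 1)
        out, _ = self.rnn(steps)                             # recurrent layers generate the sequence
        return torch.sigmoid(self.head(out))                 # (B, seq_len, 3)


class ScanpathDiscriminator(nn.Module):
    """(image, scanpath) -> probability that the scanpath comes from a human observer."""
    def __init__(self, hidden=512):
        super().__init__()
        self.cnn = vgg16().features
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.rnn = nn.LSTM(input_size=512 + 3, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, image, scanpath):
        feats = self.pool(self.cnn(image)).flatten(1)                  # (B, 512)
        feats = feats.unsqueeze(1).expand(-1, scanpath.size(1), -1)    # condition every fixation on the image
        out, _ = self.rnn(torch.cat([feats, scanpath], dim=-1))
        return torch.sigmoid(self.head(out[:, -1]))                    # real/fake score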


PathGAN: Visual Scanpath Prediction with Generative Adversarial Networks

  1. PathGAN: Visual Scanpath Prediction with Generative Adversarial Networks. Marc Assens, Kevin McGuinness, Noel E. O’Connor, Xavier Giro-i-Nieto. EPIC Third International Workshop on Egocentric Perception, Interaction and Computing, Munich, Germany, 9 September 2018. bit.ly/PathGAN @DocXavi
  2. Let’s play a game!
  3. The importance of visual attention
  4. (image-only slide)
  5. The importance of visual attention
  6. The importance of visual attention
  7. (image-only slide)
  8. The importance of visual attention
  9. The importance of visual attention
  10. Why don’t we see the changes? We don’t really see the whole image; we only focus on small, specific regions: the salient parts. Human beings reliably attend to the same regions when shown the same images.
  11. What we perceive
  12. Where we look
  13. What we actually see
  14. Why predict WHERE humans will look?
  15. Johanna Närväinen, Janne Laine, “Looking through their eyes”, VTT Impulse, 2016.
  16. Yashas Rai, Jesús Gutiérrez, and Patrick Le Callet, “A Dataset of Head and Eye Movements for 360 Degree Images”, MMSys 2017.
  17. Can we predict WHERE humans will look? Yes! We “just” need to collect data...
  18. (image-only slide)
  19. Can we predict WHERE humans will look? Yes! We “just” need to collect data... and train a deep neural network!
  20. Image (input) → Visual scanpath (output)
  21. Image (input) → Saliency map (output)
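Slides 20 and 21 contrast the two output representations: a scanpath is an ordered sequence of fixations, while a saliency map is a single 2D map over the image. The NumPy sketch below illustrates both, and how fixations can be accumulated into a rough saliency map; the (x, y, duration) encoding, the map size and the blur width are arbitrary choices for illustration.

# Two output formats for the same image (illustrative encodings and sizes).
import numpy as np
from scipy.ndimage import gaussian_filter

# Visual scanpath: ordered fixations as (x, y, duration in seconds), coordinates in [0, 1].
scanpath = np.array([[0.52, 0.41, 0.30],
                     [0.60, 0.45, 0.25],
                     [0.31, 0.70, 0.40]])

# Saliency map: one value per pixel of an H x W image, normalized to sum to 1.
H, W = 480, 640
saliency = np.zeros((H, W))
for x, y, duration in scanpath:                  # accumulate fixations, weighted by duration
    saliency[int(y * (H - 1)), int(x * (W - 1))] += duration
saliency = gaussian_filter(saliency, sigma=25)   # spread each fixation over its neighbourhood
saliency /= saliency.sum()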
  22. Our brief history of saliency models: Eva Mohedano, Panagiotis Linardos, Cristian Canton, Junting Pan, Èric Arazo, Marta Coll, Elisa Sayrol, Jordi Torres
  23. Saliency maps: JuntingNet
  24. (image-only slide)
  25. JuntingNet: upsample + filter 2D map. J. Pan, X. Giró-i-Nieto, “End-to-end Convolutional Network for Saliency Prediction”, CVPRW 2015.
  26. Saliency maps: JuntingNet, SalNet
  27. (image-only slide)
  28. SalNet (VGG-M). Junting Pan, Kevin McGuinness, Elisa Sayrol, Xavier Giro-i-Nieto, and Noel E. O'Connor, “Shallow and deep convolutional networks for saliency prediction”, CVPR 2016.
  29. Saliency maps: JuntingNet, SalNet, SalGAN
  30. (image-only slide)
  31. (image-only slide)
  32. SalGAN: generator + discriminator. Junting Pan, Cristian Canton, Kevin McGuinness, Noel E. O’Connor, Jordi Torres, Elisa Sayrol and Xavier Giro-i-Nieto, “SalGAN: Visual Saliency Prediction with Generative Adversarial Networks”, CVPRW 2017.
  33. Saliency maps: JuntingNet, SalNet, SalGAN. Scanpaths: SaltiNet
  34. (image-only slide)
  35. SaltiNet: stochastic sampling. Marc Assens, Kevin McGuinness, Xavier Giro-i-Nieto and Noel E. O’Connor, “SaltiNet: Scan-Path Prediction on 360 Degree Images Using Saliency Volumes”, ICCVW 2017.
  36. Winners of the IEEE ICME 2017 Challenge. Marc Assens, Kevin McGuinness, Xavier Giro-i-Nieto and Noel E. O’Connor, “SaltiNet: Scan-Path Prediction on 360 Degree Images Using Saliency Volumes”, ICCVW 2017.
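Slide 35 points at the key idea behind SaltiNet: predict a saliency volume (a temporal stack of saliency maps) and then sample a scanpath from it stochastically, one fixation per temporal slice, with probability proportional to saliency. The sketch below shows that sampling step in NumPy; the volume shape, normalization and time encoding are placeholder assumptions rather than SaltiNet's exact procedure (see the ICCVW 2017 paper).

# Stochastic scanpath sampling from a saliency volume (illustrative version of the SaltiNet idea).
import numpy as np

def sample_scanpath(saliency_volume, rng):
    """saliency_volume: (T, H, W) non-negative array, one saliency map per temporal slice."""
    T, H, W = saliency_volume.shape
    fixations = []
    for t in range(T):
        probs = saliency_volume[t].ravel()
        probs = probs / probs.sum()                      # treat the slice as a distribution over pixels
        idx = rng.choice(H * W, p=probs)                 # draw one pixel proportionally to its saliency
        y, x = divmod(int(idx), W)
        fixations.append((x / (W - 1), y / (H - 1), t))  # normalized (x, y) plus the slice index as time
    return np.array(fixations)

# Example with a random volume standing in for a network prediction:
scanpath = sample_scanpath(np.random.rand(8, 30, 40), np.random.default_rng(0))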
  37. Saliency maps: JuntingNet, SalNet, SalGAN. Scanpaths: SaltiNet, PathGAN
  38. (image-only slide)
  39. (image-only slide)
  40. Challenge: create a model able to predict visual scanpaths. PathGAN solution: generate a sequence of fixation points with Recurrent Neural Networks (RNNs).
  41. Recurrent Neural Network (RNN). Hochreiter, Sepp, and Jürgen Schmidhuber, “Long short-term memory”, Neural Computation 9, no. 8 (1997): 1735-1780.
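Slide 40 states the core design choice: produce the scanpath as a sequence with an RNN, one fixation at a time. One simple way to realize this is an autoregressive LSTM cell that consumes the image embedding together with the previous fixation and emits the next one, as sketched below; this particular decoding scheme, the 512-d embedding and the fixed number of steps are assumptions for illustration, not necessarily what PathGAN does.

# Autoregressive fixation generation with an LSTM cell (illustrative design).
import torch
import torch.nn as nn

class FixationRNN(nn.Module):
    def __init__(self, img_dim=512, hidden=256):
        super().__init__()
        self.cell = nn.LSTMCell(input_size=img_dim + 3, hidden_size=hidden)
        self.head = nn.Linear(hidden, 3)                 # next (x, y, t)

    def forward(self, img_feat, steps=16):
        B = img_feat.size(0)
        h = torch.zeros(B, self.cell.hidden_size, device=img_feat.device)
        c = torch.zeros_like(h)
        fix = torch.full((B, 3), 0.5, device=img_feat.device)   # start from the image centre
        path = []
        for _ in range(steps):
            h, c = self.cell(torch.cat([img_feat, fix], dim=-1), (h, c))
            fix = torch.sigmoid(self.head(h))            # keep coordinates in [0, 1]
            path.append(fix)
        return torch.stack(path, dim=1)                  # (B, steps, 3)

# Usage with a dummy 512-d image embedding: FixationRNN()(torch.randn(2, 512))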
  42. (image-only slide)
  43. Credit: Santiago Pascual [slides] [video]
  44. Adversarial training in a nutshell: pick a real scanpath x from the training set; show x to D and update its weights to output 1 (real).
  45. Adversarial training in a nutshell: pick a real scanpath x from the training set; show x to D and update its weights to predict “real” (1).
  46. Adversarial training in a nutshell: G generates a synthetic scanpath ẍ (adding noise and dropout); show the synthetic scanpath ẍ to D and update its weights to predict “fake” (0).
  47. Adversarial training in a nutshell: freeze D’s weights; keep showing the same synthetic scanpath ẍ to D; update G’s weights (only G’s weights!) so that D predicts “real” (1); unfreeze D’s weights and repeat.
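Slides 44-47 walk through one round of adversarial training: update D on a real scanpath towards “real”, update D on a generated scanpath towards “fake”, then keep D fixed and update only G so that D labels the generated scanpath “real”. The PyTorch sketch below follows those steps with tiny placeholder networks standing in for the actual generator and discriminator; the sizes, learning rates and random “training set” are assumptions.

# Adversarial training loop following slides 44-47 (placeholder G and D, illustrative only).
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 3 * 8))                 # noise -> 8 fixations
D = nn.Sequential(nn.Linear(3 * 8, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())    # scanpath -> real/fake
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCELoss()
real_scanpaths = torch.rand(100, 3 * 8)        # placeholder for human scanpaths from the training set

for step in range(200):
    # 1) Pick real scanpaths x and update D's weights towards "real" (1);
    #    G generates synthetic scanpaths (its input noise provides the stochasticity)
    #    and D is also updated towards "fake" (0) on them.
    x = real_scanpaths[torch.randint(0, 100, (16,))]
    x_fake = G(torch.randn(16, 16))
    loss_d = bce(D(x), torch.ones(16, 1)) + bce(D(x_fake.detach()), torch.zeros(16, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 2) D stays fixed (only G's optimizer steps): update G so that D predicts "real" (1)
    #    for the synthetic scanpaths, then repeat.
    loss_g = bce(D(x_fake), torch.ones(16, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()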
  48. Multiobjective loss for G = adversarial loss (cGAN) + content loss.
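Slide 48 combines the two objectives into a single multiobjective loss for the generator: a content term that keeps the generated scanpath close to human ones, plus the conditional-GAN adversarial term. A generic formulation is sketched below in LaTeX, where I is the input image, s a human scanpath, z the noise, G the generator and D the conditional discriminator; the MSE content term and the weight \alpha are illustrative assumptions, not necessarily the exact losses used in PathGAN.

% Generic cGAN objectives with a weighted content term (illustrative formulation).
\mathcal{L}_{D} = -\log D(I, s) - \log\bigl(1 - D(I, G(I, z))\bigr)
\mathcal{L}_{G} = \alpha \,\underbrace{\lVert G(I, z) - s \rVert_2^{2}}_{\text{content loss}} \;-\; \underbrace{\log D(I, G(I, z))}_{\text{adversarial loss (cGAN)}}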
  49. Training curves
  50. Model architecture (cGAN). Mirza, Mehdi, and Simon Osindero, “Conditional generative adversarial nets”, arXiv preprint arXiv:1411.1784 (2014).
  51. Model architecture (cGAN)
  52. Model architecture (cGAN)
  53. Qualitative results: iSUN
  54. Quantitative results: iSUN
  55. Qualitative results: Salient360! (2017)
  56. Quantitative results: Salient360! (2017)
  57. (image-only slide)
  58. Saliency maps: JuntingNet, SalNet, SalGAN. Scanpaths: SaltiNet, PathGAN
  59. Next: what about video? Check our other poster, presented by Eva Mohedano: Panagiotis Linardos, Eva Mohedano, Monica Cherto, Cathal Gurrin, Xavier Giro-i-Nieto, “Temporal Saliency Adaptation in Egocentric Videos”, extended abstract at the ECCV Workshop on Egocentric Perception, Interaction and Computing (EPIC), 2018. SalGAN + ConvLSTM saliency maps on the EgoMon dataset.
  60. Thank you! Project page: http://bit.ly/pathgan. Contact: xavier.giro@upc.edu, @DocXavi
