
The impact of visual saliency prediction in image classification


https://imatge.upc.edu/web/publications/impact-visual-saliency-prediction-image-classification

This thesis introduces an architecture that improves the accuracy of a Convolutional Neural Network (CNN) trained for image classification by exploiting visual saliency predictions computed from the original images. The network follows the AlexNet architecture and was trained on 1.2 million images from the ImageNet dataset. Two methodologies were explored to exploit the information in the saliency predictions. The first applied the saliency maps directly to existing layers of the CNN, which in some cases were already trained for classification and in others were initialized with random weights. In the second, the information from the saliency maps was merged through a new branch, trained jointly with the initial CNN. To speed up training, the experiments were run on images downsampled to 128x128; at this size the proposed model achieves a 12.39% increase in Top-1 accuracy with respect to the original CNN while also needing fewer parameters than AlexNet. The methodology that gave the largest improvement at the reduced size was then applied to the original 227x227 images and compared against the network trained only on the original images, yielding a model that increases Top-1 accuracy by 1.72%. All the proposed methodologies were implemented on a network previously trained for classification; the most successful ones were additionally applied when training a network from scratch. Together, the results indicate the best way to add saliency maps to improve classification accuracy.
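The first methodology (applying the saliency map directly to existing layers) boils down to an elementwise multiplication that weights every spatial position by its predicted saliency. A minimal NumPy sketch, not taken from the thesis code; the shapes and the [0, 1] normalisation are our assumptions:

```python
import numpy as np

def apply_saliency(feature_maps: np.ndarray, saliency: np.ndarray) -> np.ndarray:
    """Weight each spatial location of every channel by its saliency.

    feature_maps: (C, H, W) activations (e.g. the 3 RGB channels).
    saliency:     (H, W) map, assumed normalised to [0, 1].
    """
    assert feature_maps.shape[1:] == saliency.shape
    return feature_maps * saliency[np.newaxis, :, :]  # broadcast over channels

rng = np.random.default_rng(0)
image = rng.random((3, 128, 128))      # reduced 128x128 input, as in the thesis
saliency = rng.random((128, 128))      # stand-in for a saliency prediction
weighted = apply_saliency(image, saliency)
print(weighted.shape)  # (3, 128, 128)
```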



  1. 1. The Impact of Visual Saliency Prediction in Image Classification 1 Eric Arazo Sánchez. Advisors: Kevin McGuinness, Eva Mohedano, Xavier Giró-i-Nieto
  2. 2. Introduction - Computer vision 2 Classifier Handcrafted descriptors “guitar” Classifier Learned descriptors Trainable Trainable Classical computer vision Deep Learning “guitar”
  3. 3. Introduction - Imagenet 3 Russakovsky, Olga, et al. “Imagenet large scale visual recognition challenge”. International Journal of Computer Vision (2015).
  4. 4. Imagenet 4 Images: ● 1.2 M train ● 50,000 test ● 1,000 categories Evaluation dataset unpublished before the competition
  5. 5. Imagenet 5 Metrics: ● Top-1 accuracy ● Top-5 accuracy
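Both metrics count a sample as correct when the true class appears among the k highest-scoring predictions (k = 1 and k = 5 on ImageNet). A generic NumPy sketch, not the competition's evaluation code:

```python
import numpy as np

def top_k_accuracy(scores: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of samples whose true label is among the k highest scores.

    scores: (N, num_classes) classifier outputs (logits or probabilities).
    labels: (N,) integer ground-truth classes.
    """
    top_k = np.argsort(scores, axis=1)[:, -k:]       # indices of the k best scores
    hits = (top_k == labels[:, None]).any(axis=1)    # true label among them?
    return float(hits.mean())

# Toy example with 3 samples and 3 classes
scores = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.4, 0.1],
                   [0.3, 0.1, 0.6]])
labels = np.array([1, 1, 0])
print(top_k_accuracy(scores, labels, 1))  # 1/3: only the first sample is right
print(top_k_accuracy(scores, labels, 2))  # 1.0: all true labels are in the top 2
```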
  7. 7. Introduction - Imagenet 7 ILSVRC - Evolution since 2010 Slide credit: Kaiming He (FAIR)
  8. 8. Introduction - Imagenet 8 ILSVRC - Evolution since 2010 Slide credit: Kaiming He (FAIR) Some models have already reached human-level performance. Is this still the Olympic Games of computer vision?
  9. 9. Introduction - Imagenet 9Slide credit: Kaiming He (FAIR) -9.4% 2012 Introduction of the Convolutional Neural Networks (CNN) in the competition with AlexNet ILSVRC - Evolution since 2010
  10. 10. Introduction - AlexNet 10 Ref: Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. NIPS 2012.
  11. 11. Introduction - AlexNet 11 5 Convolutional Layers 3 Fully Connected Layers 1000 softmax Object class
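For context, a back-of-the-envelope parameter count for the single-tower (ungrouped, CaffeNet-style) AlexNet variant; the layer shapes below are the commonly cited ones, not taken from the thesis:

```python
# (kernel_size, in_channels, out_channels) for the 5 convolutional layers
conv_cfg = [(11, 3, 96), (5, 96, 256), (3, 256, 384), (3, 384, 384), (3, 384, 256)]
# (inputs, outputs) for the 3 fully connected layers; conv5 leaves a 6x6x256 volume
fc_cfg = [(256 * 6 * 6, 4096), (4096, 4096), (4096, 1000)]

conv_params = sum(k * k * cin * cout + cout for k, cin, cout in conv_cfg)  # weights + biases
fc_params = sum(n_in * n_out + n_out for n_in, n_out in fc_cfg)

print(f"conv:  {conv_params:,}")               # 3,747,200
print(f"fc:    {fc_params:,}")                 # 58,631,144
print(f"total: {conv_params + fc_params:,}")   # 62,378,344, roughly 62 M
```

Almost 94% of the parameters sit in the fully connected layers, which is one reason training on downsampled 128x128 inputs (a smaller volume entering the first FC layer) is so much cheaper.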
  12. 12. Introduction - CNN 12 LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.
  13. 13. Introduction - CNN 13 LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324. CNNs are very useful in computer vision: ● Reduction of parameters (shared filters) ● Spatial coherence
  14. 14. Introduction - CNN 14 Image captioning Image segmentation
  15. 15. Introduction - CNN 15 Saliency prediction
  16. 16. Introduction - Saliency prediction 16 CNN model Images Saliency maps
  17. 17. Introduction - Saliency prediction 17 CNN for image classification
  18. 18. Objective 18 ● Explore if saliency maps could improve other computer vision tasks
  21. 21. Outline ● Introduction ● Objective ● State-of-the-art ● Methodology ● Conclusions ● Future work 21
  22. 22. State-of-the-art - Saliency prediction 22 SalNet Pan, Junting and McGuinness, Kevin and Sayrol, Elisa and Giro-i-Nieto, Xavier and O'Connor, Noel E. Shallow and Deep Convolutional Networks for Saliency Prediction. CVPR 2016. Trained on SALICON
  23. 23. Saliency prediction 23 Application of saliency:
  24. 24. Saliency prediction 24 Application of saliency: ● In image retrieval ○ Finding the last appearance of an object. Ref: Reyes, Cristian et al. Where is my Phone? Personal Object Retrieval from Egocentric Images (2016)
  25. 25. Saliency prediction 25 Application of saliency: ● In image retrieval ○ Finding the last appearance of an object. ● Object recognition ○ Health care Ref: Reyes, Cristian et al. Where is my Phone? Personal Object Retrieval from Egocentric Images (2016) Ref: Pérez de San Roman, Philippe et al. Saliency Driven Object recognition in egocentric videos with deep CNN. 2016
  26. 26. Saliency prediction - our approach 26
  27. 27. Saliency prediction - our approach 27 AlexNet* SalNet
  28. 28. Outline ● Introduction ● Objective ● State-of-the-art ● Methodology ● Conclusions ● Future work 28
  29. 29. Methodology 29 RGB images
  30. 30. 30 RGB images RGB - The Baseline
  31. 31. 31 RGB images RGB - The Baseline ● 1.2 M images ● 227 x 227
  32. 32. ● 1.2 M images ● 227 x 227 32 RGB images RGB - The Baseline 9 days to train on a computation cluster
  33. 33. RGB - The Baseline 33
  34. 34. RGB - The Baseline 34 9 days 5 days
  35. 35. RGB - The Baseline 35 9 days 5 days 1.5 days
  36. 36. How to introduce saliency predictions? 36 Multiplication Fan-in Network Concatenation
  37. 37. 37 Alexnet Multiplication Fan-in Network Concatenation Alexnet How to introduce saliency predictions?
  38. 38. 38 Multiplication Fan-in Network Concatenation Alexnet Alexnet How to introduce saliency predictions?
  39. 39. 39 Multiplication Fan-in Network Concatenation Alexnet Alexnet Alexnet CNN How to introduce saliency predictions?
  40. 40. 40 Multiplication Fan-in Network Concatenation Where? Alexnet Alexnet Alexnet CNN How to introduce saliency predictions?
  42. 42. 42 Alexnet Alexnet Alexnet CNN Makes sense to use the baseline, which is already trained Multiplication Fan-in Network Concatenation How to introduce saliency predictions?
  43. 43. 43 Alexnet Alexnet Alexnet CNN Makes sense to use the baseline, which is already trained Multiplication Fan-in Network Concatenation Pre-trained CNN How to introduce saliency predictions?
  44. 44. Multiplication vs. Concatenation 44 Three strategies for each of them:
  45. 45. Multiplication vs. Concatenation 45 Three strategies for each of them: RGBS Conv 1 Conv 2 Conv 3 Conv 4 Conv 5 FC 1 FC 1 FC 3 - Output Drop Out Drop Out Batch Norm. Batch Norm. Max-Pooling Max-Pooling Max-Pooling RGB Saliency
  46. 46. Multiplication vs. Concatenation 46 Three strategies for each of them: RGB-1S-2SRGBS Conv 1 Conv 2 Conv 3 Conv 4 Conv 5 FC 1 FC 1 FC 3 - Output Drop Out Drop Out Batch Norm. Batch Norm. Max-Pooling Max-Pooling Max-Pooling RGB Saliency Conv 1 Conv 2 Conv 3 Conv 4 Conv 5 FC 1 FC 1 FC 3 - Output Drop Out Drop Out Batch Norm. Batch Norm. Max-Pooling Max-Pooling Max-Pooling RGB Saliency
  47. 47. Multiplication vs. Concatenation 47 Three strategies for each of them: RGBS RGB-1S-2S RGBS-1S-2S Conv 1 Conv 2 Conv 3 Conv 4 Conv 5 FC 1 FC 1 FC 3 - Output Drop Out Drop Out Batch Norm. Batch Norm. Max-Pooling Max-Pooling Max-Pooling RGB Saliency Conv 1 Conv 2 Conv 3 Conv 4 Conv 5 FC 1 FC 1 FC 3 - Output Drop Out Drop Out Batch Norm. Batch Norm. Max-Pooling Max-Pooling Max-Pooling RGB Saliency Conv 1 Conv 2 Conv 3 Conv 4 Conv 5 FC 1 FC 1 FC 3 - Output Drop Out Drop Out Batch Norm. Batch Norm. Max-Pooling Max-Pooling Max-Pooling RGB Saliency
  48. 48. Multiplication vs. Concatenation 48 RGBS RGBS RGB-1S-2S RGBS-1S-2S Conv 1 Conv 2 Conv 3 Conv 4 Conv 5 FC 1 FC 1 FC 3 - Output Drop Out Drop Out Batch Norm. Batch Norm. Max-Pooling Max-Pooling Max-Pooling RGB Saliency
  50. 50. Multiplication vs. Concatenation 50 RGB-1S-2S RGBS RGB-1S-2S RGBS-1S-2S Conv 1 Conv 2 Conv 3 Conv 4 Conv 5 FC 1 FC 1 FC 3 - Output Drop Out Drop Out Batch Norm. Batch Norm. Max-Pooling Max-Pooling Max-Pooling RGB Saliency
  52. 52. Multiplication vs. Concatenation 52 RGBS-1S-2S RGBS RGB-1S-2S RGBS-1S-2S Conv 1 Conv 2 Conv 3 Conv 4 Conv 5 FC 1 FC 1 FC 3 - Output Drop Out Drop Out Batch Norm. Batch Norm. Max-Pooling Max-Pooling Max-Pooling RGB Saliency
  54. 54. Multiplication vs. Concatenation 54 The best option is concatenation: ● RGBS ● RGB-1S-2S
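At the input level, the winning concatenation variant (RGBS) simply stacks the saliency prediction as a fourth channel, so the conv1 filters grow from 11x11x3 to 11x11x4. A shape-level NumPy sketch with illustrative array sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
rgb = rng.random((3, 227, 227))        # original-size RGB input
saliency = rng.random((1, 227, 227))   # predicted saliency map, one channel

rgbs = np.concatenate([rgb, saliency], axis=0)  # channel-wise stacking
print(rgbs.shape)  # (4, 227, 227)
```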
  55. 55. 55 Multiplication Fan-in Network Concatenation How to introduce saliency predictions?
  57. 57. 57 RGBS RGB-1S-2S Multiplication Fan-in Network Concatenation How to introduce saliency predictions?
  59. 59. 59 Alexnet CNN RGBS RGB-1S-2S Multiplication Fan-in Network Concatenation How to introduce saliency predictions?
  60. 60. 60 Alexnet CNN RGBS RGB-1S-2S Multiplication Fan-in Network Concatenation Where? How to introduce saliency predictions?
  61. 61. Fan-in architecture 61 Three strategies: Fan-in C1.1 Conv 1 Conv 2 Conv 3 Conv 4 Conv 5 FC 1 FC 1 FC 3 - Output Drop Out Drop Out Batch Norm. Batch Norm. Max-Pooling Max-Pooling Max-Pooling RGB Saliency Conv 1 Batch Norm. Max-Pooling
  62. 62. Fan-in architecture 62 Three strategies: Fan-in C1.1 Fan-in C2.1 Conv 1 Conv 2 Conv 3 Conv 4 Conv 5 FC 1 FC 1 FC 3 - Output Drop Out Drop Out Batch Norm. Batch Norm. Max-Pooling Max-Pooling Max-Pooling RGB Saliency Conv 1 Batch Norm. Max-Pooling Conv 1 Conv 2 Conv 3 Conv 4 Conv 5 FC 1 FC 1 FC 3 - Output Drop Out Drop Out Batch Norm. Batch Norm. Max-Pooling Max-Pooling Max-Pooling RGB Saliency Conv 1 Batch Norm. Max-Pooling Conv 2 Batch Norm. Max-Pooling
  63. 63. Fan-in architecture 63 Three strategies: Fan-in C1.1 Fan-in C2.1 Fan-in C2 Conv 1 Conv 2 Conv 3 Conv 4 Conv 5 FC 1 FC 1 FC 3 - Output Drop Out Drop Out Batch Norm. Batch Norm. Max-Pooling Max-Pooling Max-Pooling RGB Saliency Conv 1 Batch Norm. Max-Pooling Conv 1 Conv 2 Conv 3 Conv 4 Conv 5 FC 1 FC 1 FC 3 - Output Drop Out Drop Out Batch Norm. Batch Norm. Max-Pooling Max-Pooling Max-Pooling RGB Saliency Conv 1 Batch Norm. Max-Pooling Conv 2 Batch Norm. Max-Pooling Conv 1 Conv 3 Conv 4 Conv 5 FC 1 FC 1 FC 3 - Output Drop Out Drop Out Batch Norm. Max-Pooling Max-Pooling RGB Saliency Conv 1 Batch Norm. Max-Pooling
  64. 64. Fan-in architecture 64 Fan-in C1.1 Fan-in C1.1 Fan-in C2.1 Fan-in C2 Conv 1 Conv 2 Conv 3 Conv 4 Conv 5 FC 1 FC 1 FC 3 - Output Drop Out Drop Out Batch Norm. Batch Norm. Max-Pooling Max-Pooling Max-Pooling RGB Saliency Conv 1 Batch Norm. Max-Pooling
  66. 66. Fan-in architecture 66 Fan-in C1.1 Fan-in C2.1 Fan-in C2 Fan-in C2.1 Conv 1 Conv 2 Conv 3 Conv 4 Conv 5 FC 1 FC 1 FC 3 - Output Drop Out Drop Out Batch Norm. Batch Norm. Max-Pooling Max-Pooling Max-Pooling RGB Saliency Conv 1 Batch Norm. Max-Pooling Conv 2 Batch Norm. Max-Pooling
  68. 68. Fan-in architecture 68 Fan-in C1.1 Fan-in C2.1 Fan-in C2 Fan-in C2 Conv 1 Conv 3 Conv 4 Conv 5 FC 1 FC 1 FC 3 - Output Drop Out Drop Out Batch Norm. Max-Pooling Max-Pooling RGB Saliency Conv 1 Batch Norm. Max-Pooling
  70. 70. Fan-in architecture 70 The best option is concatenation: ● Fan-in C2.1 ● Fan-in C2
  71. 71. Fan-in architecture 71 The best option is concatenation: ● Fan-in C2.1 ● Fan-in C2 A surprising result for Fan-in C2, since it has fewer parameters than the baseline (12.4% gain); more experiments
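Shape-wise, the fan-in idea is that the saliency map passes through its own small branch and its feature maps are merged with the main branch's activations. A sketch with hypothetical tensor sizes (ours, not the thesis configuration), where the convolutions are stubbed by random activations:

```python
import numpy as np

rng = np.random.default_rng(0)
main_conv2 = rng.random((256, 13, 13))     # main branch, conv2 output (hypothetical size)
saliency_feat = rng.random((64, 13, 13))   # saliency branch output (hypothetical width)

# Fan-in by concatenation: the next conv layer reads 256 + 64 channels
merged = np.concatenate([main_conv2, saliency_feat], axis=0)
print(merged.shape)  # (320, 13, 13)

# Fan-in by multiplication: a single-channel saliency feature gates all channels
gate = rng.random((1, 13, 13))
gated = main_conv2 * gate
print(gated.shape)  # (256, 13, 13)
```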
  72. 72. RGB-C2 (128x128) 72 Fan-in Network Fan-in C2
  74. 74. RGB-C2 (128x128) 74 RGB (baseline) RGB-C2 Fan-in Network Fan-in C2
  76. 76. 76 Multiplication Fan-in Network Concatenation RGBS RGB-1S-2S How to introduce saliency predictions?
  77. 77. 77 Multiplication Fan-in Network Concatenation RGBS RGB-1S-2S Fan-in C2.1 Fan-in C2 How to introduce saliency predictions?
  78. 78. Analysis of per-class improvements 78 Fan-in C2.1 Fan-in C2 RGBS RGB-1S-2S Multiplication Fan-in Network Concatenation
  80. 80. Analysis of per-class improvements 80 Class / Increase of accuracy: Acoustic guitar +25 %, Volleyball +23 %
  81. 81. Analysis of per-class improvements 81 Class / Increase of accuracy: Wrecker, tow car -23 %, Entertainment center -18 %
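The per-class analysis amounts to computing each class's accuracy under the baseline and the proposed model and looking at the difference. A generic sketch with toy predictions (not the thesis data):

```python
import numpy as np

def per_class_accuracy(preds: np.ndarray, labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Accuracy restricted to the samples of each ground-truth class."""
    acc = np.zeros(num_classes)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            acc[c] = (preds[mask] == c).mean()
    return acc

labels   = np.array([0, 0, 1, 1, 2, 2])
baseline = np.array([0, 1, 1, 0, 2, 0])   # baseline predictions
model    = np.array([0, 0, 1, 1, 2, 0])   # saliency-augmented model predictions

delta = per_class_accuracy(model, labels, 3) - per_class_accuracy(baseline, labels, 3)
print(delta)  # per-class change in accuracy
```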
  82. 82. Outline ● Introduction ● Objective ● State-of-the-art ● Methodology ● Conclusions ● Future work 82
  83. 83. ● CNNs trained to predict saliency maps can be used to improve other computer vision tasks such as image classification 83 Conclusions
  84. 84. ● CNNs trained to predict saliency maps can be used to improve other computer vision tasks such as image classification 84 Conclusions Fan-in Network
  86. 86. ● The best way to introduce the saliency maps into a CNN is a Fan-in architecture, which gives the network the freedom to decide how to use them 86 Conclusions
  87. 87. ● The best way to introduce the saliency maps into a CNN is a Fan-in architecture, which gives the network the freedom to decide how to use them 87 Conclusions Fan-in C2.1 Conv 1 Conv 2 Conv 3 Conv 4 Conv 5 FC 1 FC 1 FC 3 - Output Drop Out Drop Out Batch Norm. Batch Norm. Max-Pooling Max-Pooling Max-Pooling RGB Saliency Conv 1 Batch Norm. Max-Pooling Conv 2 Batch Norm. Max-Pooling Fan-in Network Concatenation RGBS Conv 1 Conv 2 Conv 3 Conv 4 Conv 5 FC 1 FC 1 FC 3 - Output Drop Out Drop Out Batch Norm. Batch Norm. Max-Pooling Max-Pooling Max-Pooling RGB Saliency
  89. 89. ● Downsampling the images gives a reliable estimate of the improvements the CNN achieves on the larger images 89 Conclusions 227 x 227 128 x 128
  90. 90. Outline ● Introduction ● Objective ● State-of-the-art ● Methodology ● Conclusions ● Future work 90
  91. 91. Future work 91 ● Several experiments: ○ Fan-in: ■ Fan-in C2 without saliency maps ■ Concatenating instead of multiplying ○ Concatenation only in the first convolutional layer ○ Multiplication and training from scratch ● Once we have a reasonable model, try other saliency models Thank you
