
Deep Learning for Computer Vision (1/4): Image Analytics @ laSalle 2016


Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks that until now had been addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection and image captioning.


  1. @DocXavi Deep Learning for Computer Vision: Image Analytics. 5 May 2016. Xavier Giró-i-Nieto. Master en Creació Multimedia
  2. Densely linked slides
  3. Introduction: Xavier Giro-i-Nieto. Web: https://imatge.upc.edu/web/people/xavier-giro. Associate Professor at Universitat Politecnica de Catalunya (UPC)
  4. Acknowledgments
  5. Acknowledgments
  6. One lecture organized in three parts: Deep ConvNets for Recognition for... Images (global), Objects (local), Video (2D+T)
  7. One lecture organized in three parts: Deep ConvNets for Recognition for... Images (global), Objects (local), Video (2D+T)
  8. Previously, before deep learning... Dog. Slide credit: Jose M Álvarez
  9. Previously, before deep learning... Dog → Learned Representation. Slide credit: Jose M Álvarez
  10. Outline for Part I: Image Analytics... Part I: End-to-end learning (E2E): Dog → Learned Representation
  11. Outline for Part I: Image Analytics... Part I: End-to-end learning (E2E): Learned Representation → Task A (e.g. image classification)
  12. Outline for Part I: Image Analytics... Part I: End-to-end learning (E2E): Learned Representation → Task A (e.g. image classification). Part II: Off-the-shelf features → Task B (e.g. image retrieval)
  13. Outline for Part I: Image Analytics... Part I: End-to-end learning (E2E): Learned Representation → Task A (e.g. image classification). Part II: Off-the-shelf features → Task B (e.g. image retrieval)
  14. E2E: Classification: Supervised learning. Training: Manual Annotations → Model. Test: New Image → Automatic classification
  15. E2E: Classification: LeNet-5. LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.
  16. E2E: Classification: LeNet-5. Demo: 3D Visualization of a Convolutional Neural Network. Harley, Adam W. "An Interactive Node-Link Visualization of Convolutional Neural Networks." In Advances in Visual Computing, pp. 867-877. Springer International Publishing, 2015.
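The LeNet-5 slides describe the architecture only by diagram: alternating convolution and pooling layers, then a small fully connected classifier. As a rough companion, here is a minimal LeNet-5-style network sketched in PyTorch (a framework that postdates these slides); the layer sizes follow the classic 32x32 grayscale-input design and are illustrative.

```python
import torch
import torch.nn as nn

# A LeNet-5-style convnet: alternating convolution and pooling,
# followed by fully connected layers (dimensions are illustrative).
class LeNet5(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 32x32 -> 28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                  # 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),  # 14x14 -> 10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                  # 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Linear(16 * 5 * 5, 120),
            nn.Tanh(),
            nn.Linear(120, 84),
            nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)

logits = LeNet5()(torch.randn(1, 1, 32, 32))  # -> shape (1, 10)
```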
  17. E2E: Classification: Similar to LeNet-5. Demo: Classify MNIST digits with a Convolutional Neural Network. “ConvNetJS is a Javascript library for training Deep Learning models (mainly Neural Networks) entirely in your browser. Open a tab and you're training. No software requirements, no compilers, no installations, no GPUs, no sweat.”
  18. E2E: Classification: Databases. Li Fei-Fei, “How we’re teaching computers to understand pictures,” TED Talk 2014. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). ImageNet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web]
  19. E2E: Classification: Databases. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). ImageNet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web]
  20. E2E: Classification: Databases. Zhou, Bolei, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, and Aude Oliva. "Learning deep features for scene recognition using places database." In Advances in Neural Information Processing Systems, pp. 487-495. 2014. [web] ● 205 scene classes (categories). ● Images: 2.5M train, 20.5k validation, 41k test.
  21. E2E: Classification: ImageNet ILSVRC ● 1000 object classes (categories). ● Images: 1.2M train, 100k test.
  22. E2E: Classification: ImageNet ILSVRC ● Predict 5 classes. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). ImageNet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web]
  23. E2E: Classification: ILSVRC. Image Classification 2012: -9.8%. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). ImageNet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web] Slide credit: Rob Fergus (NYU)
  24. E2E: Classification: AlexNet (Supervision). A. Krizhevsky, I. Sutskever, G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Advances in Neural Information Processing Systems 25 (NIPS 2012). Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015)
  25. E2E: Classification: AlexNet (Supervision). Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015)
  26. E2E: Classification: AlexNet (Supervision). Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015)
  27. E2E: Classification: AlexNet (Supervision). Image credit: Deep Learning Tutorial (Stanford University)
  28. E2E: Classification: AlexNet (Supervision). Image credit: Deep Learning Tutorial (Stanford University)
  29. E2E: Classification: AlexNet (Supervision). Image credit: Deep Learning Tutorial (Stanford University)
  30. E2E: Classification: AlexNet (Supervision). Rectified Linear Unit (non-linearity): f(x) = max(0, x). Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015)
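The ReLU above is a one-liner; a quick sketch of f(x) = max(0, x) in PyTorch:

```python
import torch

relu = lambda x: torch.clamp(x, min=0)  # f(x) = max(0, x)
x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
print(relu(x))        # tensor([0.0000, 0.0000, 0.0000, 1.5000])
print(torch.relu(x))  # the built-in gives the same result
```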
  31. E2E: Classification: AlexNet (Supervision). Dot Product. Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015)
  32. E2E: Classification: ImageNet ILSVRC. ImageNet Classification 2013. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). ImageNet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web] Slide credit: Rob Fergus (NYU)
  33. E2E: Classification: Visualize: ZF. The development of better convnets is reduced to trial-and-error; visualization can help in proposing better architectures. Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer International Publishing.
  34. E2E: Classification: Visualize: ZF. “A convnet model that uses the same components (filtering, pooling) but in reverse, so instead of mapping pixels to features does the opposite.” Zeiler, Matthew D., Graham W. Taylor, and Rob Fergus. "Adaptive deconvolutional networks for mid and high level feature learning." Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011.
  35. E2E: Classification: Visualize: ZF. DeconvNet vs. ConvNet. Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer International Publishing.
  36. E2E: Classification: Visualize: ZF
  37. E2E: Classification: Visualize: ZF
  38. E2E: Classification: Visualize: ZF
  39. E2E: Classification: Visualize: ZF. “To examine a given convnet activation, we set all other activations in the layer to zero and pass the feature maps as input to the attached deconvnet layer.”
  40. E2E: Classification: Visualize: ZF
  41. E2E: Classification: Visualize: ZF. “(i) Unpool: In the convnet, the max pooling operation is non-invertible, however we can obtain an approximate inverse by recording the locations of the maxima within each pooling region in a set of switch variables.”
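The “switch variables” in quote (i) amount to recording the argmax locations during max pooling and reusing them to place values back during unpooling. A minimal sketch of this in PyTorch:

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(2, stride=2, return_indices=True)  # record the "switches"
unpool = nn.MaxUnpool2d(2, stride=2)

x = torch.randn(1, 1, 4, 4)
y, switches = pool(x)           # y: 1x1x2x2, switches: argmax locations
x_approx = unpool(y, switches)  # maxima restored to their positions, zeros elsewhere
```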
  42. E2E: Classification: Visualize: ZF. “(ii) Rectification: The convnet uses ReLU non-linearities, which rectify the feature maps thus ensuring the feature maps are always positive.”
  43. E2E: Classification: Visualize: ZF. “(iii) Filtering: The convnet uses learned filters to convolve the feature maps from the previous layer. To approximately invert this, the deconvnet uses transposed versions of the same filters (as other autoencoder models, such as RBMs), but applied to the rectified maps, not the output of the layer beneath. In practice this means flipping each filter vertically and horizontally.”
  44. E2E: Classification: Visualize: ZF. “(iii) Filtering: The convnet uses learned filters to convolve the feature maps from the previous layer. To approximately invert this, the deconvnet uses transposed versions of the same filters (as other autoencoder models, such as RBMs), but applied to the rectified maps, not the output of the layer beneath. In practice this means flipping each filter vertically and horizontally.”
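Quote (iii) says the deconvnet applies transposed versions of the learned filters, which for a stride-1 2D convolution is the same as convolving with each filter flipped vertically and horizontally. A sketch of that equivalence for a single-channel filter, with the padding chosen so both forms line up:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 8, 8)
w = torch.randn(1, 1, 3, 3)

y = F.conv2d(x, w, padding=1)
# "Deconvolution" in the Zeiler & Fergus sense: apply the transposed filter,
# which conv_transpose2d computes (up to padding bookkeeping).
back = F.conv_transpose2d(y, w, padding=1)
flipped = torch.flip(w, dims=[2, 3])           # flip vertically and horizontally
back2 = F.conv2d(y, flipped, padding=1)        # padding = k - 1 - p = 1 here
print(torch.allclose(back, back2, atol=1e-5))  # True
```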
  45. E2E: Classification: Visualize: ZF. Top 9 activations in a random subset of feature maps across the validation data, projected down to pixel space using the deconvolutional network approach, alongside the corresponding image patches.
  46. E2E: Classification: Visualize: ZF
  47. E2E: Classification: Visualize: ZF
  48. E2E: Classification: Visualize: ZF
  49. E2E: Classification: Visualize: ZF. Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer International Publishing.
  50. E2E: Classification: Visualize: ZF. Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer International Publishing.
  51. E2E: Classification: Visualize: ZF. AlexNet (Layer 1) vs. Clarifai (Layer 1): the smaller stride (2 vs 4) and filter size (7x7 vs 11x11) result in more distinctive features and fewer “dead” features.
  52. E2E: Classification: Visualize: ZF. AlexNet (Layer 2) vs. Clarifai (Layer 2): cleaner features in Clarifai, without the aliasing artifacts caused by the stride of 4 used in AlexNet.
  53. E2E: Classification: Dropout: ZF. Regularization with dropout: reduce overfitting by setting to zero the output of a random portion (typically 50%) of the intermediate neurons. Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580.
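A sketch of dropout at training time, assuming the now-common “inverted dropout” scaling so that no rescaling is needed at test time (the original paper instead scaled the weights at test time):

```python
import torch

def dropout(x, p=0.5, training=True):
    """Zero each activation with probability p; scale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return x
    mask = (torch.rand_like(x) > p).float()
    return x * mask / (1.0 - p)

h = torch.randn(4, 8)
print(dropout(h, p=0.5))  # roughly half the entries zeroed
```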
  54. E2E: Classification: Visualize: ZF
  55. E2E: Classification: Ensembles: ZF
  56. E2E: Classification: ImageNet ILSVRC. ImageNet Classification 2013: -5%. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). ImageNet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web] Slide credit: Rob Fergus (NYU)
  57. E2E: Classification: ImageNet ILSVRC. ImageNet Classification 2013: -5%. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). ImageNet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web] Slide credit: Rob Fergus (NYU)
  58. E2E: Classification
  59. E2E: Classification: GoogLeNet. Movie: Inception (2010)
  60. E2E: Classification: GoogLeNet ● 22 layers, but 12 times fewer parameters than AlexNet. Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. "Going deeper with convolutions." CVPR 2015.
  61. E2E: Classification: GoogLeNet ● Challenges of going deeper: ○ Overfitting, due to the increased number of parameters. ○ Inefficient computation if most weights end up close to zero. Solution: sparsity. How? Inception modules.
  62. E2E: Classification: GoogLeNet
  63. E2E: Classification: GoogLeNet. Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.
  64. E2E: Classification: GoogLeNet. Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.
  65. E2E: Classification: GoogLeNet (NiN). 3x3 and 5x5 convolutions deal with different scales. Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014. [Slides]
  66. E2E: Classification: GoogLeNet (NiN). 1x1 convolutions perform dimensionality reduction (c3 < c2) and add a rectified linear unit (ReLU) non-linearity. Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014. [Slides]
  67. E2E: Classification: GoogLeNet (NiN). In NiN, the cascaded 1x1 convolutions compute reductions after the convolutions. Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014. [Slides]
  68. E2E: Classification: GoogLeNet. In GoogLeNet, the cascaded 1x1 convolutions compute reductions before the expensive 3x3 and 5x5 convolutions.
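A sketch of that reduction with illustrative channel counts: a 1x1 convolution (plus ReLU) shrinks 256 channels to 64 before the expensive 3x3 convolution, cutting the weight count roughly threefold.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 256, 28, 28)  # illustrative feature map

# 1x1 convolution: per-pixel channel mixing + ReLU, reducing 256 -> 64 channels
reduce = nn.Sequential(nn.Conv2d(256, 64, kernel_size=1), nn.ReLU())
conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)

y = conv3(reduce(x))  # (1, 128, 28, 28)

# Weight count of the 3x3 path drops from 256*128*9 to 256*64 + 64*128*9
direct = 256 * 128 * 3 * 3                      # 294,912 weights without reduction
reduced = 256 * 64 * 1 * 1 + 64 * 128 * 3 * 3   # 90,112 weights with it
print(direct, reduced)
```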
  69. E2E: Classification: GoogLeNet. Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.
  70. E2E: Classification: GoogLeNet. 3x3 max pooling introduces some spatial invariance, and adding it as an alternative parallel path proved beneficial.
  71. E2E: Classification: GoogLeNet. Two softmax classifiers at intermediate layers combat the vanishing gradient while providing regularization at training time... and no fully connected layers are needed!
  72. E2E: Classification: GoogLeNet
  73. E2E: Classification: GoogLeNet. NVIDIA, “NVIDIA and IBM Cloud Support ImageNet Large Scale Visual Recognition Challenge” (2015)
  74. E2E: Classification: GoogLeNet. Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. "Going deeper with convolutions." CVPR 2015. [video] [slides] [poster]
  75. E2E: Classification: VGG. Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." International Conference on Learning Representations (2015). [video] [slides] [project]
  76. E2E: Classification: VGG. Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." International Conference on Learning Representations (2015). [video] [slides] [project]
  77. E2E: Classification: VGG: 3x3 Stacks. Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." International Conference on Learning Representations (2015). [video] [slides] [project]
  78. E2E: Classification: VGG ● No pooling between some convolutional layers. ● Convolution stride of 1 (no skipping). Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." International Conference on Learning Representations (2015). [video] [slides] [project]
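The “3x3 stacks” idea behind VGG: two stacked 3x3, stride-1 convolutions cover a 5x5 receptive field (three cover 7x7) with fewer parameters and more non-linearities than a single large filter. The arithmetic, as a small sketch:

```python
# Receptive field of n stacked 3x3, stride-1 convolutions: 2n + 1
for n in (1, 2, 3):
    print(n, "stacked 3x3 convs ->", 2 * n + 1, "x", 2 * n + 1, "receptive field")

C = 64  # channels in and out, illustrative
print("one 5x5 conv:", 5 * 5 * C * C, "weights")       # 102,400
print("two 3x3 convs:", 2 * 3 * 3 * C * C, "weights")  # 73,728
```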
  79. E2E: Classification. 3.6% top-5 error... with 152 layers!
  80. E2E: Classification: ResNet. He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385 (2015). [slides]
  81. E2E: Classification: ResNet ● Deeper networks (34 layers vs. 18) are more difficult to train. Thin curves: training error; bold curves: validation error. He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385 (2015). [slides]
  82. E2E: Classification: ResNet ● Residual learning: reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385 (2015). [slides]
  83. E2E: Classification: ResNet. He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385 (2015). [slides]
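A minimal sketch of the residual idea for the identity-shortcut case, where the stacked layers fit a residual F(x) and the block outputs F(x) + x; the layer choices here are illustrative, not the paper's exact block.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """y = relu(F(x) + x): the layers fit a residual, the shortcut carries x."""
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(self.f(x) + x)

y = ResidualBlock(64)(torch.randn(1, 64, 32, 32))  # same shape in and out
```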
  84. E2E: Classification: Humans. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). ImageNet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web]
  85. E2E: Classification: Humans ● Binary ground-truth annotation from the crowd: “Is this a Border terrier?” Yes / No (crowdsourcing). Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). ImageNet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web]
  86. E2E: Classification: Humans ● Annotation problems: crowdsourcing loss (0.3%); more than 5 object classes. Carlier, Axel, Amaia Salvador, Ferran Cabezas, Xavier Giro-i-Nieto, Vincent Charvillat, and Oge Marques. "Assessment of crowdsourcing and gamification loss in user-assisted object segmentation." Multimedia Tools and Applications (2015): 1-28. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). ImageNet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web]
  87. E2E: Classification: Humans ● Test data collection from one human. [interface] Andrej Karpathy, “What I learned from competing against a ConvNet on ImageNet” (2014)
  88. E2E: Classification: Humans ● Test data collection from one human. [interface] “Aww, a cute dog! Would you like to spend 5 minutes scrolling through 120 breeds of dog to guess what species it is?” Andrej Karpathy, “What I learned from competing against a ConvNet on ImageNet” (2014)
  89. E2E: Classification: Humans. ResNet. NVIDIA, “Mocha.jl: Deep Learning for Julia” (2015)
  90. Let’s play a game!
  91. (image)
  92. What have you seen?
  93. Tower
  94. Tower, House
  95. Tower, House, Rocks
  96. E2E: Saliency. Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB 2015)
  97. E2E: Saliency. Eye tracker vs. mouse click. Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB 2015)
  98. E2E: Saliency: JuntingNet. Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016.
  99. E2E: Saliency: JuntingNet. Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016.
  100. E2E: Saliency: JuntingNet. Large-scale datasets (train / validation / test): SALICON: 10,000 / 5,000 / 5,000; iSUN: 6,000 / 926 / 2,000; CAT2000 [Borji’15]: 2,000 / - / 2,000; MIT300 [Judd’12]: 300 / - / -. Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016.
  101. E2E: Saliency: JuntingNet
  102. E2E: Saliency: JuntingNet. Image input (RGB), 96x96 → ... → 2304 = 48x48 → upsample + filter → 2D map
  103. E2E: Saliency: JuntingNet. 96x96 input → 3 conv layers → ... → 2304 = 48x48 → upsample + filter → 2D map
  104. E2E: Saliency: JuntingNet. ... → 2 dense layers → 2304 = 48x48 → upsample + filter → 2D map
  105. E2E: Saliency: JuntingNet. 96x96 input → 2304 = 48x48 → upsample + filter → 2D map
  106. E2E: Saliency: JuntingNet
  107. E2E: Saliency: JuntingNet. Loss function: Mean Square Error (MSE). Weight initialization: Gaussian distribution. Learning rate: 0.03 to 0.0001. Mini-batch size: 128. Training time: 7h (SALICON) / 4h (iSUN). Acceleration: SGD + Nesterov momentum (0.9). Regularisation: Maxout norm. GPU: NVIDIA GTX 980. Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016.
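JuntingNet was not written in PyTorch, so the following is only a hedged illustration of the recipe in the table above (MSE loss, mini-batches of 128, SGD with Nesterov momentum 0.9, learning rate decayed from 0.03 toward 0.0001), using a stand-in model and random data:

```python
import torch
import torch.nn as nn

# Stand-in model and data, just to make the training recipe concrete.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 96 * 96, 48 * 48))
criterion = nn.MSELoss()  # Euclidean / MSE loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.03,
                            momentum=0.9, nesterov=True)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)

for step in range(10):                  # toy loop; batch size 128
    images = torch.randn(128, 3, 96, 96)
    targets = torch.rand(128, 48 * 48)  # ground-truth saliency maps
    optimizer.zero_grad()
    loss = criterion(model(images), targets)
    loss.backward()
    optimizer.step()
    scheduler.step()                    # decay lr from 0.03 downward
```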
  108. E2E: Saliency: JuntingNet ● Back-propagation with the Euclidean distance. ● Training curve (error vs. number of iterations) for the SALICON database. Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016.
  109. E2E: Saliency: JuntingNet: iSUN. Qualitative results (pixels / ground truth / JuntingNet). Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016.
  110. E2E: Saliency: JuntingNet: iSUN. Qualitative results (pixels / ground truth / JuntingNet). Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016.
  111. E2E: Saliency: JuntingNet: iSUN. Results from the CVPR LSUN Challenge 2015. Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016.
  112. E2E: Saliency: JuntingNet: SALICON. Qualitative results (pixels / ground truth / JuntingNet). Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016.
  113. E2E: Saliency: JuntingNet: SALICON. Qualitative results (pixels / ground truth / JuntingNet). Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016.
  114. E2E: Saliency: JuntingNet: SALICON. Results from the CVPR LSUN Challenge 2015. Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016.
  115. E2E: Saliency: JuntingNet. Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016.
  116. Outline for Part I: Image Analytics. Part I: End-to-end learning (E2E): Learned Representation for Domain A. Part I’: End-to-End Fine-Tuning (FT): transfer to a Fine-tuned Learned Representation for Domain B.
  117. E2E: Fine-tuning. Fine-tuning a pre-trained network. Slide credit: Victor Campos, “Layer-wise CNN surgery for Visual Sentiment Prediction” (ETSETB 2015)
  118. E2E: Fine-tuning. Fine-tuning a pre-trained network. Slide credit: Victor Campos, “Layer-wise CNN surgery for Visual Sentiment Prediction” (ETSETB 2015)
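A common fine-tuning recipe, sketched with torchvision (the model and layer names follow torchvision's AlexNet and are illustrative, not the slides' exact setup): load ImageNet-pretrained weights, replace the classification head for the new task, and train the reused layers with a smaller learning rate, or freeze them.

```python
import torch
import torchvision

# Load an ImageNet-pretrained AlexNet and swap its last layer for a new task
model = torchvision.models.alexnet(weights="IMAGENET1K_V1")
model.classifier[6] = torch.nn.Linear(4096, 2)  # e.g. positive/negative sentiment

# Fine-tune with a smaller learning rate for the reused (pretrained) layers
# than for the freshly initialized classifier head.
optimizer = torch.optim.SGD([
    {"params": model.features.parameters(), "lr": 1e-4},
    {"params": model.classifier.parameters(), "lr": 1e-3},
], momentum=0.9)
# Alternatively, freeze the reused layers entirely:
#   for p in model.features.parameters(): p.requires_grad = False
```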
  119. E2E: Fine-tuning: Sentiments. CNN. Campos, Victor, Amaia Salvador, Xavier Giro-i-Nieto, and Brendan Jou. "Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction." In Proceedings of the 1st International Workshop on Affect & Sentiment in Multimedia, pp. 57-62. ACM, 2015.
  120. E2E: Fine-tuning: Sentiments. Visualizations with fully convolutional networks: true positive, true negative, false positive, false negative. Campos, Victor, Xavier Giro-i-Nieto, and Brendan Jou. “From pixels to sentiments” (submitted). Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully Convolutional Networks for Semantic Segmentation." CVPR 2015.
  121. E2E: Fine-tuning: Cultural events. ChaLearn Workshop. A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., “Cultural Event Recognition with Visual ConvNets and Temporal Models,” CVPR ChaLearn Looking at People Workshop 2015. [slides]
  122. E2E: Fine-tuning: Saliency prediction. VGG + fine-tuned.
  123. E2E: Fine-tuning: Saliency prediction. From scratch vs. VGG + fine-tuned. Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016.
  124. E2E: Fine-tuning: Saliency prediction. Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016.
  125. Outline for Part I: Image Analytics... Part I: End-to-end learning (E2E): Learned Representation → Task A (e.g. image classification). Part II: Off-The-Shelf features (OTS) → Task B (e.g. image retrieval).
  126. Off-The-Shelf (OTS) Features. Razavian, Ali, Hossein Azizpour, Josephine Sullivan, and Stefan Carlsson. "CNN features off-the-shelf: an astounding baseline for recognition." CVPRW 2014.
  127. Off-The-Shelf (OTS) Features ● Intermediate features can be used as regular visual descriptors for any task. Babenko, Artem, et al. "Neural codes for image retrieval." Computer Vision–ECCV 2014.
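Extracting such off-the-shelf descriptors is a forward pass that stops before the final class layer. A sketch with torchvision's AlexNet (the layer indexing is torchvision's, shown here as an assumption about one concrete implementation):

```python
import torch
import torchvision

model = torchvision.models.alexnet(weights="IMAGENET1K_V1").eval()

def fc7_features(batch):
    """Return FC7 activations (4096-d) as off-the-shelf descriptors."""
    with torch.no_grad():
        x = model.features(batch)
        x = model.avgpool(x)
        x = torch.flatten(x, 1)
        x = model.classifier[:6](x)  # stop before the final class layer -> FC7
    return x

desc = fc7_features(torch.randn(2, 3, 224, 224))  # (2, 4096) descriptors
```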
  128. OTS: Classification: Razavian. Pascal VOC 2007. Razavian, Ali, Hossein Azizpour, Josephine Sullivan, and Stefan Carlsson. "CNN features off-the-shelf: an astounding baseline for recognition." CVPRW 2014.
  129. OTS: Classification: Return of the devil. L2-normalization of the features fed to the classifier: accuracy +5%. Chatfield, K., Simonyan, K., Vedaldi, A. and Zisserman, A. Return of the devil in the details: Delving deep into convolutional nets. BMVC 2014.
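The L2-normalization step that buys that accuracy is one line; a sketch:

```python
import torch
import torch.nn.functional as F

desc = torch.randn(10, 4096)             # raw CNN descriptors
desc_l2 = F.normalize(desc, p=2, dim=1)  # unit L2 norm per descriptor
print(desc_l2.norm(dim=1))               # all ones
```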
  130. OTS: Classification: Return of the devil. Three representative architectures considered: AlexNet, ZF, OverFeat; training takes from 5 days (fast) to 3 weeks (slow) on an NVIDIA GTX Titan GPU. Chatfield, K., Simonyan, K., Vedaldi, A. and Zisserman, A. Return of the devil in the details: Delving deep into convolutional nets. BMVC 2014.
  131. OTS: Classification: Return of the devil. Data augmentation. Chatfield, K., Simonyan, K., Vedaldi, A. and Zisserman, A. Return of the devil in the details: Delving deep into convolutional nets. BMVC 2014.
  132. OTS: Classification: Return of the devil. Fisher Kernels (FK) vs. ConvNets (CNN).
  133. OTS: Classification: Return of the devil. Color → grayscale (GS): accuracy -2.5%. Chatfield, K., Simonyan, K., Vedaldi, A. and Zisserman, A. Return of the devil in the details: Delving deep into convolutional nets. BMVC 2014.
  134. OTS: Classification: Return of the devil. Dimensionality reduction by retraining the last layer to smaller sizes: x32 smaller for a -2% accuracy drop. Chatfield, K., Simonyan, K., Vedaldi, A. and Zisserman, A. Return of the devil in the details: Delving deep into convolutional nets. BMVC 2014.
  135. OTS: Retrieval. Ranking. Summary of the paper by Amaia Salvador on Bitsearch. Babenko, Artem, et al. "Neural codes for image retrieval." Computer Vision–ECCV 2014. Springer International Publishing, 2014. 584-599.
  136. OTS: Retrieval. Datasets: Oxford Buildings, INRIA Holidays, UKB.
  137. OTS: Retrieval: FC layers. Pooled from the network of Krizhevsky et al., pretrained with images from ImageNet. Summary of the paper by Amaia Salvador on Bitsearch. Babenko, Artem, et al. "Neural codes for image retrieval." Computer Vision–ECCV 2014.
  138. OTS: Retrieval: FC layers. Off-the-shelf CNN descriptors from fully connected layers prove useful but not superior (w.r.t. FV, VLAD, sparse coding, ...). Babenko, Artem, et al. "Neural codes for image retrieval." Computer Vision–ECCV 2014.
  139. OTS: Retrieval: Conv layers. Convolutional layers have shown better performance than fully connected ones. Razavian et al., A baseline for visual instance retrieval with deep convolutional networks, ICLR 2015.
  140. OTS: Retrieval: Conv layers. Spatial search (extracting N local descriptors from predefined locations) increases performance at a computational cost. Razavian et al., A baseline for visual instance retrieval with deep convolutional networks, ICLR 2015.
  141. OTS: Retrieval: Conv layers. Medium memory footprints. Razavian et al., A baseline for visual instance retrieval with deep convolutional networks, ICLR 2015.
  142. OTS: Summarization. Clustering based on Euclidean distance over FC7 features from AlexNet. Bolaños M, Mestre R, Talavera E, Giró-i-Nieto X, Radeva P. Visual Summary of Egocentric Photostreams by Representative Keyframes. In: IEEE International Workshop on Wearable and Ego-vision Systems for Augmented Experience (WEsAX) 2015. Turin, Italy: 2015.
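The summarization pipeline, sketched with scikit-learn's k-means as a stand-in for the paper's clustering: embed each frame with an FC7 descriptor, cluster by Euclidean distance, and keep the frame nearest each centroid as a keyframe.

```python
import numpy as np
from sklearn.cluster import KMeans

feats = np.random.rand(500, 4096)  # FC7 descriptors, one per frame (stand-in data)

kmeans = KMeans(n_clusters=8, n_init=10).fit(feats)
# For each cluster, pick the frame closest to the centroid as its keyframe
keyframes = [
    int(np.argmin(np.linalg.norm(feats - c, axis=1)))
    for c in kmeans.cluster_centers_
]
print(keyframes)  # indices of 8 representative frames
```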
  143. Thank you! Xavier Giró-i-Nieto. https://imatge.upc.edu/web/people/xavier-giro https://twitter.com/DocXavi https://www.facebook.com/ProfessorXavi xavier.giro@upc.edu
