Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
[course site]
Imagenet Large Scale
Visual Recognition
Challenge (ILSVRC)
Day 2 Lecture 4
Xavier Giró-i-Nieto
2
ImageNet Dataset
Li Fei-Fei, “How we’re teaching computers to understand
pictures” TEDTalks 2014.
Russakovsky, O., Deng,...
3
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visu...
4
ImageNet Challenge
● 1,000 object classes
(categories).
● Images:
○ 1.2 M train
○ 100k test.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual...
Slide credit:
Rob Fergus (NYU)
Image Classification 2012
-9.8%
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S....
AlexNet (Supervision)
7Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC ...
8Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015)
AlexNet (Supervi...
9Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015)
AlexNet (Supervi...
10Image credit: Deep learning Tutorial (Stanford University)
AlexNet (Supervision)
11Image credit: Deep learning Tutorial (Stanford University)
AlexNet (Supervision)
12Image credit: Deep learning Tutorial (Stanford University)
AlexNet (Supervision)
13
Rectified
Linear
Unit
(non-linearity)
f(x) = max(0,x)
Slide credit: Junting Pan, “Visual Saliency Prediction using Deep...
14
Dot Product
Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015)
Al...
ImageNet Classification 2013
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015)...
The development of better
convnets is reduced to
trial-and-error.
16
Zeiler-Fergus (ZF)
Visualization can help in
proposin...
“A convnet model that uses the same
components (filtering, pooling) but in
reverse, so instead of mapping pixels
to featur...
18
Zeiler-Fergus (ZF)
Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Compute...
Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp...
Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp...
21
The smaller stride (2 vs 4) and filter size (7x7 vs 11x11) results
in more distinctive features and fewer “dead" featur...
22
Cleaner features in ZF, without the aliasing artifacts caused by
the stride 4 used in AlexNet.
AlexNet (Layer 2) ZF (La...
23
Regularization with more
dropout: introduced in the
input layer.
Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskev...
24
Zeiler-Fergus (ZF): Results
25
Zeiler-Fergus (ZF): Results
ImageNet Classification 2013
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015)...
E2E: Classification
27
E2E: Classification: GoogLeNet
28Movie: Inception (2010)
E2E: Classification: GoogLeNet
29
● 22 layers, but 12 times fewer parameters than AlexNet.
Szegedy, Christian, Wei Liu, Ya...
E2E: Classification: GoogLeNet
30
E2E: Classification: GoogLeNet
31
Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.
E2E: Classification: GoogLeNet
32
Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.
Multiple
scales
E2E: Classification: GoogLeNet (NiN)
33
3x3 and 5x5 convolutions deal
with different scales.
Lin, Min, Qiang Chen, and Shu...
E2E: Classification: GoogLeNet
34
Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.
Dimensionality...
35
1x1 convolutions does dimensionality
reduction (c3<c2) and accounts for rectified
linear units (ReLU).
Lin, Min, Qiang ...
E2E: Classification: GoogLeNet
36
In GoogLeNet, the Cascaded 1x1 Convolutions compute reductions before the
expensive 3x3 ...
E2E: Classification: GoogLeNet
37
Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.
E2E: Classification: GoogLeNet
38
They somewhat spatial invariance, and has proven a
benefitial effect by adding an altern...
E2E: Classification: GoogLeNet
39
Two Softmax Classifiers at intermediate layers combat the vanishing gradient while
provi...
E2E: Classification: GoogLeNet
40
E2E: Classification: GoogLeNet
41NVIDIA, “NVIDIA and IBM CLoud Support ImageNet Large Scale Visual Recognition Challenge” ...
E2E: Classification: GoogLeNet
42
Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelo...
E2E: Classification: VGG
43
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image...
E2E: Classification: VGG
44
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image...
E2E: Classification: VGG: 3x3 Stacks
45
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large...
E2E: Classification: VGG
46
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image...
E2E: Classification
47
3.6% top 5 error…
with 152 layers !!
E2E: Classification: ResNet
48
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image R...
E2E: Classification: ResNet
49
● Deeper networks (34 is deeper than 18) are more difficult to train.
He, Kaiming, Xiangyu ...
ResNet
50
● Residual learning: reformulate the layers as learning residual functions with
reference to the layer inputs, i...
E2E: Classification: ResNet
51
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image R...
52
Thanks ! Q&A ?
Follow me at
https://imatge.upc.edu/web/people/xavier-giro
@DocXavi
/ProfessorXavi
Upcoming SlideShare
Loading in …5
×

Deep Learning for Computer Vision: ImageNet Challenge (UPC 2016)

6,253 views

Published on

http://imatge-upc.github.io/telecombcn-2016-dlcv/

Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or text captioning.

Published in: Data & Analytics
  • Be the first to comment

Deep Learning for Computer Vision: ImageNet Challenge (UPC 2016)

  1. 1. [course site] Imagenet Large Scale Visual Recognition Challenge (ILSVRC) Day 2 Lecture 4 Xavier Giró-i-Nieto
  2. 2. 2 ImageNet Dataset Li Fei-Fei, “How we’re teaching computers to understand pictures” TEDTalks 2014. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web]
  3. 3. 3 Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web] ImageNet Dataset
  4. 4. 4 ImageNet Challenge ● 1,000 object classes (categories). ● Images: ○ 1.2 M train ○ 100k test.
  5. 5. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web] ● Top 5 error rate ImageNet Challenge
  6. 6. Slide credit: Rob Fergus (NYU) Image Classification 2012 -9.8% Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2014). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web] 6 Based on SIFT + Fisher Vectors ImageNet Challenge
  7. 7. AlexNet (Supervision) 7Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015) Orange A Krizhevsky, I Sutskever, GE Hinton “Imagenet classification with deep convolutional neural networks” Part of: Advances in Neural Information Processing Systems 25 (NIPS 2012)
  8. 8. 8Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015) AlexNet (Supervision)
  9. 9. 9Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015) AlexNet (Supervision)
  10. 10. 10Image credit: Deep learning Tutorial (Stanford University) AlexNet (Supervision)
  11. 11. 11Image credit: Deep learning Tutorial (Stanford University) AlexNet (Supervision)
  12. 12. 12Image credit: Deep learning Tutorial (Stanford University) AlexNet (Supervision)
  13. 13. 13 Rectified Linear Unit (non-linearity) f(x) = max(0,x) Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015) AlexNet (Supervision)
  14. 14. 14 Dot Product Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015) AlexNet (Supervision)
  15. 15. ImageNet Classification 2013 Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web] Slide credit: Rob Fergus (NYU) 15 ImageNet Challenge
  16. 16. The development of better convnets is reduced to trial-and-error. 16 Zeiler-Fergus (ZF) Visualization can help in proposing better architectures. Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer International Publishing.
  17. 17. “A convnet model that uses the same components (filtering, pooling) but in reverse, so instead of mapping pixels to features does the opposite.” Zeiler, Matthew D., Graham W. Taylor, and Rob Fergus. "Adaptive deconvolutional networks for mid and high level feature learning." Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011. 17 Zeiler-Fergus (ZF)
  18. 18. 18 Zeiler-Fergus (ZF) Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer International Publishing. DeconvN et Conv Net
  19. 19. Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer International Publishing. 19 Zeiler-Fergus (ZF)
  20. 20. Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer International Publishing. 20 Zeiler-Fergus (ZF)
  21. 21. 21 The smaller stride (2 vs 4) and filter size (7x7 vs 11x11) results in more distinctive features and fewer “dead" features. AlexNet (Layer 1) ZF (Layer 1) Zeiler-Fergus (ZF): Stride & filter size
  22. 22. 22 Cleaner features in ZF, without the aliasing artifacts caused by the stride 4 used in AlexNet. AlexNet (Layer 2) ZF (Layer 2) Zeiler-Fergus (ZF)
  23. 23. 23 Regularization with more dropout: introduced in the input layer. Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580. Chicago Zeiler-Fergus (ZF): Drop out
  24. 24. 24 Zeiler-Fergus (ZF): Results
  25. 25. 25 Zeiler-Fergus (ZF): Results
  26. 26. ImageNet Classification 2013 Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web] -5% 26 E2E: Classification: ImageNet ILSRVC
  27. 27. E2E: Classification 27
  28. 28. E2E: Classification: GoogLeNet 28Movie: Inception (2010)
  29. 29. E2E: Classification: GoogLeNet 29 ● 22 layers, but 12 times fewer parameters than AlexNet. Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. "Going deeper with convolutions."
  30. 30. E2E: Classification: GoogLeNet 30
  31. 31. E2E: Classification: GoogLeNet 31 Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.
  32. 32. E2E: Classification: GoogLeNet 32 Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014. Multiple scales
  33. 33. E2E: Classification: GoogLeNet (NiN) 33 3x3 and 5x5 convolutions deal with different scales. Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014. [Slides]
  34. 34. E2E: Classification: GoogLeNet 34 Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014. Dimensionality reduction
  35. 35. 35 1x1 convolutions does dimensionality reduction (c3<c2) and accounts for rectified linear units (ReLU). Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014. [Slides] E2E: Classification: GoogLeNet (NiN)
  36. 36. E2E: Classification: GoogLeNet 36 In GoogLeNet, the Cascaded 1x1 Convolutions compute reductions before the expensive 3x3 and 5x5 convolutions.
  37. 37. E2E: Classification: GoogLeNet 37 Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.
  38. 38. E2E: Classification: GoogLeNet 38 They somewhat spatial invariance, and has proven a benefitial effect by adding an alternative parallel path.
  39. 39. E2E: Classification: GoogLeNet 39 Two Softmax Classifiers at intermediate layers combat the vanishing gradient while providing regularization at training time. ...and no fully connected layers needed !
  40. 40. E2E: Classification: GoogLeNet 40
  41. 41. E2E: Classification: GoogLeNet 41NVIDIA, “NVIDIA and IBM CLoud Support ImageNet Large Scale Visual Recognition Challenge” (2015)
  42. 42. E2E: Classification: GoogLeNet 42 Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. "Going deeper with convolutions." CVPR 2015. [video] [slides] [poster]
  43. 43. E2E: Classification: VGG 43 Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." International Conference on Learning Representations (2015). [video] [slides] [project]
  44. 44. E2E: Classification: VGG 44 Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." International Conference on Learning Representations (2015). [video] [slides] [project]
  45. 45. E2E: Classification: VGG: 3x3 Stacks 45 Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." International Conference on Learning Representations (2015). [video] [slides] [project]
  46. 46. E2E: Classification: VGG 46 Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." International Conference on Learning Representations (2015). [video] [slides] [project] ● No poolings between some convolutional layers. ● Convolution strides of 1 (no skipping).
  47. 47. E2E: Classification 47 3.6% top 5 error… with 152 layers !!
  48. 48. E2E: Classification: ResNet 48 He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385 (2015). [slides]
  49. 49. E2E: Classification: ResNet 49 ● Deeper networks (34 is deeper than 18) are more difficult to train. He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385 (2015). [slides] Thin curves: training error Bold curves: validation error
  50. 50. ResNet 50 ● Residual learning: reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385 (2015). [slides]
  51. 51. E2E: Classification: ResNet 51 He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385 (2015). [slides]
  52. 52. 52 Thanks ! Q&A ? Follow me at https://imatge.upc.edu/web/people/xavier-giro @DocXavi /ProfessorXavi

×