Xavier Giro-i-Nieto
xavier.giro@upc.edu
Associate Professor
Universitat Politecnica de Catalunya
Technical University of Catalonia
Image Classification
on ImageNet
#InsightDL2017
2
ImageNet Challenge
● 1,000 object classes
(categories).
● Images:
○ 1.2 M train
○ 100k test.
3
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual
recognition challenge. arXiv preprint arXiv:1409.0575. [web]
ImageNet Dataset
Slide credit:
Rob Fergus (NYU)
-9.8%
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2014). Imagenet large scale visual recognition challenge. arXiv
preprint arXiv:1409.0575. [web] 4
Based on SIFT + Fisher Vectors
ImageNet Challenge: 2012
AlexNet (Supervision)
5
Orange
A Krizhevsky, I Sutskever, GE Hinton “Imagenet classification with deep convolutional neural networks” Part of: Advances in
Neural Information Processing Systems 25 (NIPS 2012)
ImageNet Classification 2013
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv
preprint arXiv:1409.0575. [web]
Slide credit:
Rob Fergus (NYU)
6
ImageNet Challenge: 2013
The development of better
convnets is reduced to
trial-and-error.
7
Zeiler-Fergus (ZF)
Visualization can help in
proposing better architectures.
Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer
International Publishing.
“A convnet model that uses the same
components (filtering, pooling) but in
reverse, so instead of mapping pixels
to features does the opposite.”
Zeiler, Matthew D., Graham W. Taylor, and Rob Fergus. "Adaptive deconvolutional networks for mid and high level feature learning." Computer Vision
(ICCV), 2011 IEEE International Conference on. IEEE, 2011.
8
Zeiler-Fergus (ZF)
9
Zeiler-Fergus (ZF)
Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer
International Publishing.
DeconvN
et
Conv
Net
Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer
International Publishing.
10
Zeiler-Fergus (ZF)
Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer
International Publishing.
11
Zeiler-Fergus (ZF)
12
Regularization with more
dropout: introduced in the
input layer.
Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. R. (2012). Improving neural networks by preventing co-adaptation of
feature detectors. arXiv preprint arXiv:1207.0580.
Chicago
Zeiler-Fergus (ZF): Drop out
13
Zeiler-Fergus (ZF): Results
14
Zeiler-Fergus (ZF): Results
ImageNet Classification 2013
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv
preprint arXiv:1409.0575. [web]
-5%
15
ImageNet Challenge: 2013
16NVIDIA, “NVIDIA and IBM CLoud Support ImageNet Large Scale Visual Recognition Challenge” (2015)
ImageNet Challenge: 2014
17
ImageNet Challenge: 2014
GoogLeNet (Inception)
18Movie: Inception (2010)
19
● 22 layers, but 12 times fewer parameters than AlexNet.
Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov,
Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. "Going deeper with convolutions."
GoogLeNet (Inception)
20
GoogLeNet (Inception)
21
Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.
GoogLeNet (Inception)
22
Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.
Multiple
scales
GoogLeNet (Inception)
GoogLeNet (NiN)
23
3x3 and 5x5 convolutions deal
with different scales.
Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014. [Slides]
24
Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.
Dimensionality
reduction
GoogLeNet (Inception)
25
1x1 convolutions does dimensionality
reduction (c3<c2) and accounts for rectified
linear units (ReLU).
Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014. [Slides]
GoogLeNet (Inception)
26
In GoogLeNet, the Cascaded 1x1 Convolutions compute reductions before the
expensive 3x3 and 5x5 convolutions.
GoogLeNet (Inception)
27
Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.
GoogLeNet (Inception)
28
They somewhat spatial invariance, and has proven a
benefitial effect by adding an alternative parallel path.
GoogLeNet (Inception)
29
Two Softmax Classifiers at intermediate layers combat the vanishing gradient while
providing regularization at training time.
...and no fully connected layers needed !
GoogLeNet (Inception)
30
GoogLeNet (Inception)
31
Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent
Vanhoucke, and Andrew Rabinovich. "Going deeper with convolutions." CVPR 2015. [video] [slides] [poster]
GoogLeNet (Inception)
E2E: Classification: VGG
32
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." ICLR 2015.
[video] [slides] [project]
E2E: Classification: VGG
33
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition."
International Conference on Learning Representations (2015). [video] [slides] [project]
E2E: Classification: VGG: 3x3 Stacks
34
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image
recognition." International Conference on Learning Representations (2015). [video] [slides] [project]
E2E: Classification: VGG
35
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image
recognition." International Conference on Learning Representations (2015). [video] [slides] [project]
● No poolings between some convolutional layers.
● Convolution strides of 1 (no skipping).
36
3.6% top 5 error…
with 152 layers !!
ImageNet Challenge: 2015
E2E: Classification: ResNet
37
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385
(2015). [slides]
E2E: Classification: ResNet
38
● Deeper networks (34 is deeper than 18) are more difficult to train.
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385
(2015). [slides]
Thin curves: training error
Bold curves: validation error
ResNet
39
● Residual learning: reformulate the layers as learning residual functions with
reference to the layer inputs, instead of learning unreferenced functions
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385
(2015). [slides]
E2E: Classification: ResNet
40
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385
(2015). [slides]
41
Learn more
Li Fei-Fei, “How we’re teaching computers to understand
pictures” TEDTalks 2014.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual
recognition challenge. arXiv preprint arXiv:1409.0575. [web]
42
Thanks ! Q&A ?
Follow me at
https://imatge.upc.edu/web/people/xavier-giro
@DocXavi
/ProfessorXavi

Image Classification on ImageNet (D1L3 Insight@DCU Machine Learning Workshop 2017)

  • 1.
    Xavier Giro-i-Nieto xavier.giro@upc.edu Associate Professor UniversitatPolitecnica de Catalunya Technical University of Catalonia Image Classification on ImageNet #InsightDL2017
  • 2.
    2 ImageNet Challenge ● 1,000object classes (categories). ● Images: ○ 1.2 M train ○ 100k test.
  • 3.
    3 Russakovsky, O., Deng,J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web] ImageNet Dataset
  • 4.
    Slide credit: Rob Fergus(NYU) -9.8% Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2014). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web] 4 Based on SIFT + Fisher Vectors ImageNet Challenge: 2012
  • 5.
    AlexNet (Supervision) 5 Orange A Krizhevsky,I Sutskever, GE Hinton “Imagenet classification with deep convolutional neural networks” Part of: Advances in Neural Information Processing Systems 25 (NIPS 2012)
  • 6.
    ImageNet Classification 2013 Russakovsky,O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web] Slide credit: Rob Fergus (NYU) 6 ImageNet Challenge: 2013
  • 7.
    The development ofbetter convnets is reduced to trial-and-error. 7 Zeiler-Fergus (ZF) Visualization can help in proposing better architectures. Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer International Publishing.
  • 8.
    “A convnet modelthat uses the same components (filtering, pooling) but in reverse, so instead of mapping pixels to features does the opposite.” Zeiler, Matthew D., Graham W. Taylor, and Rob Fergus. "Adaptive deconvolutional networks for mid and high level feature learning." Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011. 8 Zeiler-Fergus (ZF)
  • 9.
    9 Zeiler-Fergus (ZF) Zeiler, M.D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer International Publishing. DeconvN et Conv Net
  • 10.
    Zeiler, M. D.,& Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer International Publishing. 10 Zeiler-Fergus (ZF)
  • 11.
    Zeiler, M. D.,& Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer International Publishing. 11 Zeiler-Fergus (ZF)
  • 12.
    12 Regularization with more dropout:introduced in the input layer. Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580. Chicago Zeiler-Fergus (ZF): Drop out
  • 13.
  • 14.
  • 15.
    ImageNet Classification 2013 Russakovsky,O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web] -5% 15 ImageNet Challenge: 2013
  • 16.
    16NVIDIA, “NVIDIA andIBM CLoud Support ImageNet Large Scale Visual Recognition Challenge” (2015) ImageNet Challenge: 2014
  • 17.
  • 18.
  • 19.
    19 ● 22 layers,but 12 times fewer parameters than AlexNet. Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. "Going deeper with convolutions." GoogLeNet (Inception)
  • 20.
  • 21.
    21 Lin, Min, QiangChen, and Shuicheng Yan. "Network in network." ICLR 2014. GoogLeNet (Inception)
  • 22.
    22 Lin, Min, QiangChen, and Shuicheng Yan. "Network in network." ICLR 2014. Multiple scales GoogLeNet (Inception)
  • 23.
    GoogLeNet (NiN) 23 3x3 and5x5 convolutions deal with different scales. Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014. [Slides]
  • 24.
    24 Lin, Min, QiangChen, and Shuicheng Yan. "Network in network." ICLR 2014. Dimensionality reduction GoogLeNet (Inception)
  • 25.
    25 1x1 convolutions doesdimensionality reduction (c3<c2) and accounts for rectified linear units (ReLU). Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014. [Slides] GoogLeNet (Inception)
  • 26.
    26 In GoogLeNet, theCascaded 1x1 Convolutions compute reductions before the expensive 3x3 and 5x5 convolutions. GoogLeNet (Inception)
  • 27.
    27 Lin, Min, QiangChen, and Shuicheng Yan. "Network in network." ICLR 2014. GoogLeNet (Inception)
  • 28.
    28 They somewhat spatialinvariance, and has proven a benefitial effect by adding an alternative parallel path. GoogLeNet (Inception)
  • 29.
    29 Two Softmax Classifiersat intermediate layers combat the vanishing gradient while providing regularization at training time. ...and no fully connected layers needed ! GoogLeNet (Inception)
  • 30.
  • 31.
    31 Szegedy, Christian, WeiLiu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. "Going deeper with convolutions." CVPR 2015. [video] [slides] [poster] GoogLeNet (Inception)
  • 32.
    E2E: Classification: VGG 32 Simonyan,Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." ICLR 2015. [video] [slides] [project]
  • 33.
    E2E: Classification: VGG 33 Simonyan,Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." International Conference on Learning Representations (2015). [video] [slides] [project]
  • 34.
    E2E: Classification: VGG:3x3 Stacks 34 Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." International Conference on Learning Representations (2015). [video] [slides] [project]
  • 35.
    E2E: Classification: VGG 35 Simonyan,Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." International Conference on Learning Representations (2015). [video] [slides] [project] ● No poolings between some convolutional layers. ● Convolution strides of 1 (no skipping).
  • 36.
    36 3.6% top 5error… with 152 layers !! ImageNet Challenge: 2015
  • 37.
    E2E: Classification: ResNet 37 He,Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385 (2015). [slides]
  • 38.
    E2E: Classification: ResNet 38 ●Deeper networks (34 is deeper than 18) are more difficult to train. He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385 (2015). [slides] Thin curves: training error Bold curves: validation error
  • 39.
    ResNet 39 ● Residual learning:reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385 (2015). [slides]
  • 40.
    E2E: Classification: ResNet 40 He,Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385 (2015). [slides]
  • 41.
    41 Learn more Li Fei-Fei,“How we’re teaching computers to understand pictures” TEDTalks 2014. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web]
  • 42.
    42 Thanks ! Q&A? Follow me at https://imatge.upc.edu/web/people/xavier-giro @DocXavi /ProfessorXavi