Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019

Xavier Giro-i-Nieto
xavier.giro@upc.edu
Associate Professor
Universitat Politècnica de Catalunya
Barcelona Supercomputing Center
Neural Architectures
for Still Images
Day 1 Lecture 2
#DLUPC
http://bit.ly/dlcv2019

2
Videolectures
Kevin McGuinness (DLCV 2018)
Xavier Giro-i-Nieto (DLCV 2017)

3
Objectives of this lecture
● Understand the ImageNet task of image classification.
● Know the most popular CNN architectures for computer vision.
● Raise awareness of the importance of data for deep learning models.

4
Perceptron
Weights and bias are the parameters that define the behavior. They must be
learned during training.
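As a minimal sketch of what these parameters do, the perceptron below computes a weighted sum of its inputs plus a bias and applies a step activation. The step activation, the function name `perceptron` and the hand-picked AND parameters are assumptions for illustration; in practice the weights and bias would be learned.

```python
def perceptron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, followed by a step activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > 0 else 0


# Hypothetical hand-picked parameters that make the perceptron behave as a logical AND.
and_weights = [1.0, 1.0]
and_bias = -1.5

and_outputs = [
    perceptron([a, b], and_weights, and_bias)
    for a in (0, 1)
    for b in (0, 1)
]
print(and_outputs)
```

Changing only the weights and the bias changes the function the perceptron computes, which is why training amounts to searching for good values of these parameters.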

5
Fully Connected Layer (FC)
Analyzing images with FC layers is
computationally infeasible.
Example:
1000x1000 grayscale image
1M hidden units
10^12 parameters!
Figure Credit: Ranzato
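The figure on this slide can be reproduced with a short calculation. This is only a sketch: the helper name `fc_parameter_count` is hypothetical, and biases are left out by default (an assumption) so that the result matches the 10^12 quoted above.

```python
def fc_parameter_count(num_inputs, num_units, use_bias=False):
    """Number of learnable parameters in a fully connected layer."""
    weights = num_inputs * num_units          # one weight per (input, unit) pair
    biases = num_units if use_bias else 0     # optional: one bias per unit
    return weights + biases


num_pixels = 1000 * 1000      # 1000x1000 grayscale image, one value per pixel
hidden_units = 1_000_000      # 1M hidden units

fc_params = fc_parameter_count(num_pixels, hidden_units)
print(fc_params)
```

Every hidden unit is connected to every pixel, so the count grows with the product of image size and layer width.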

6
Figure: Tim Hartley (2014)
Convolutional Filter

7
Convolutional Layer (Conv)
The goal is to estimate the parameters of
multiple convolutional filters.
100 convolutional filters
Filter size: 3x3 (single-channel input, biases not counted)
100 x 3 x 3 = 900 parameters
The number of parameters does not
depend on the input image size!
In the end: 900 (Conv) vs 10^12 (FC)
parameters
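The count for the convolutional layer can be sketched the same way. The helper `conv_parameter_count` is a hypothetical name, and the defaults (single input channel, no biases) are assumptions chosen to match the 900 on the slide.

```python
def conv_parameter_count(num_filters, kernel_h, kernel_w, in_channels=1, use_bias=False):
    """Number of learnable parameters in a convolutional layer."""
    weights_per_filter = kernel_h * kernel_w * in_channels
    biases = num_filters if use_bias else 0
    return num_filters * weights_per_filter + biases


# 100 filters of size 3x3 applied to a grayscale (single-channel) image.
conv_params = conv_parameter_count(num_filters=100, kernel_h=3, kernel_w=3)
print(conv_params)
```

The image width and height never appear in the function, because the same filter weights are reused at every image location.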

8
Feature Maps
Each of these learned convolutional
filters detects a different pattern
(“feature”).
The responses of each convolutional
filter at the different image locations
define a feature map.
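A feature map can be sketched as the result of sliding one filter over the image and storing its response at every location. The code below is an illustration under assumptions: the filter is a hand-crafted vertical-edge detector rather than a learned one, no padding is used, the stride is 1, and the function name `feature_map` is hypothetical.

```python
def feature_map(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and collect the responses."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy 4x6 image: dark on the left, bright on the right.
image = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]

# Hand-crafted filter that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

response = feature_map(image, vertical_edge)
for row in response:
    print(row)
```

The response is high only where the filter window straddles the edge and zero over the flat regions, so the feature map records where the pattern occurs.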

9
Conv Layer & Feature Maps
A convolutional layer is a module that transforms a set of input feature maps into
a new set of feature maps that capture more abstract concepts.

10
output feature map
filter of depth=4
Notice that the number of input feature maps defines the depth of the conv filters...

11
output feature map
filter of depth=4
Many feature
maps
...and the number of convolutional filters defines the number of channels of the
output feature map.
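The shape bookkeeping on these two slides can be sketched as follows. Only the filter depth of 4 comes from the figure; the number of filters (8), the 3x3 kernel, the 32x32 input, the absence of padding and the function name `conv_layer_shapes` are assumptions for illustration.

```python
def conv_layer_shapes(in_channels, num_filters, kernel_size, in_h, in_w):
    """Return (filter shape, output shape, weight count) for a stride-1, unpadded conv layer."""
    # Each filter spans all input feature maps, so its depth equals in_channels.
    filter_shape = (kernel_size, kernel_size, in_channels)
    # Each filter produces one output feature map (one output channel).
    out_h = in_h - kernel_size + 1
    out_w = in_w - kernel_size + 1
    output_shape = (out_h, out_w, num_filters)
    weights = num_filters * kernel_size * kernel_size * in_channels
    return filter_shape, output_shape, weights


filter_shape, output_shape, weights = conv_layer_shapes(
    in_channels=4,   # four input feature maps, hence filters of depth 4
    num_filters=8,   # hypothetical number of filters
    kernel_size=3,
    in_h=32,
    in_w=32,
)
print(filter_shape, output_shape, weights)
```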

12
Pooling Layer
Pooling is a downsampling operation
along the spatial dimensions (width,
height).
● It progressively reduces the
spatial size of the
representation, which greatly reduces
the computation.
● It provides invariance to small
local changes.
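Both properties can be seen in a small sketch. The slide does not say which pooling operator is meant, so max pooling with a 2x2 window and stride 2 is assumed here, and the function name `max_pool_2x2` is hypothetical.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 (a trailing odd row or column is dropped)."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[i][j],
                feature_map[i][j + 1],
                feature_map[i + 1][j],
                feature_map[i + 1][j + 1],
            )
            for j in range(0, w - 1, 2)
        ]
        for i in range(0, h - 1, 2)
    ]


original = [
    [1, 0, 2, 0],
    [0, 0, 0, 0],
    [0, 3, 0, 0],
    [0, 0, 0, 4],
]

# Same activations, but the top-left one has moved by one pixel inside its 2x2 window.
shifted = [
    [0, 0, 2, 0],
    [0, 1, 0, 0],
    [0, 3, 0, 0],
    [0, 0, 0, 4],
]

pooled_original = max_pool_2x2(original)
pooled_shifted = max_pool_2x2(shifted)
print(pooled_original, pooled_shifted)
```

The 4x4 map shrinks to 2x2, and the small shift does not change the pooled output as long as the activation stays inside the same pooling window.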

13
Convolutional Neural Networks for Vision
LeNet-5: Several convolutional layers, combined with pooling layers, and followed by a
small number of fully connected layers
#LeNet-5 LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document
recognition. Proceedings of the IEEE, 86(11), 2278-2324.

14
ImageNet Challenge
● 1,000 object classes
(categories).
● Images:
○ 1.2 M train
○ 100k test.

15
ImageNet Challenge
Russakovsky, Olga, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang et al. "Imagenet
large scale visual recognition challenge." International Journal of Computer Vision 115, no. 3 (2015): 211-252. [web]

16
ImageNet Challenge: 2012
Slide credit:
Rob Fergus (NYU)
-9.8%
Based on SIFT + Fisher Vectors

17
AlexNet (SuperVision)
#AlexNet Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep
convolutional neural networks." NIPS 2012

18
Filters learned by AlexNet
Visualization of the 96 filters of size 11 x 11 learned by the bottom layer

19
Filters learned by AlexNet
The first layers learn edges and textures, while deeper layers learn more abstract
concepts.

20
ImageNet Classification 2013
Slide credit:
Rob Fergus (NYU)

21
Zeiler-Fergus (ZF)
The development of better
convnets is reduced to
trial-and-error.
Visualization can help in
proposing better architectures.
Zeiler, M. D., & Fergus, R. Visualizing and understanding convolutional networks. ECCV 2014

22
Zeiler-Fergus (ZF)
Zeiler, M. D., & Fergus, R. Visualizing and understanding convolutional networks. ECCV 2014

23
ImageNet Classification 2013
-5%
Russakovsky, Olga, et al. "Imagenet large scale visual recognition challenge." IJCV 2015. [web]

24
NVIDIA, “NVIDIA and IBM Cloud Support ImageNet Large Scale Visual Recognition Challenge” (2015)

26
GoogLeNet (Inception)
Movie: Inception (2010)

27
22 layers!
#Inception Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir
Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. "Going deeper with
convolutions." CVPR 2015.

29
#NiN Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.

30
Two softmax classifiers at intermediate layers combat the vanishing gradient
while providing regularization at training time.
...and no fully connected layers are needed
(12 times fewer parameters than AlexNet!)

31
#Inception Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan,
Vincent Vanhoucke, and Andrew Rabinovich. "Going deeper with convolutions." CVPR 2015. [video] [slides] [poster]

32
VGG
#VGG Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale
image recognition." ICLR 2015. [video] [slides] [project]

33
VGG

34
VGG: Stacked 3x3 convolutions
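The usual argument for stacking 3x3 convolutions can be sketched numerically: a stack of stride-1 3x3 layers covers the same receptive field as one larger filter while using fewer weights. The channel count of 64, the equal number of input and output channels, the omission of biases and the helper names are assumptions for illustration.

```python
def stacked_receptive_field(num_layers, kernel_size=3):
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)


def conv_weights(kernel_size, channels):
    """Weights of one conv layer with `channels` input and output channels (no biases)."""
    return kernel_size * kernel_size * channels * channels


channels = 64  # hypothetical channel count

rf_two = stacked_receptive_field(2)      # same field as a single 5x5 filter
rf_three = stacked_receptive_field(3)    # same field as a single 7x7 filter

three_3x3 = 3 * conv_weights(3, channels)
one_7x7 = conv_weights(7, channels)
print(rf_two, rf_three, three_3x3, one_7x7)
```

Under these assumptions, three 3x3 layers need 27 C^2 weights against 49 C^2 for a single 7x7 layer, and they also apply three non-linearities instead of one.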

35
VGG: Other details
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image
recognition." ICLR 2015. [video] [slides] [project]
● No pooling between some convolutional layers.
● Convolution stride of 1 (no skipping).

36
3.6% top-5 error…
with 152 layers!!
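For reference, the top-5 error counts a prediction as wrong only when the true label is not among the five highest-scoring classes. The sketch below illustrates the metric on made-up scores for a toy 6-class problem; the function name `top_k_error` and all the numbers are hypothetical.

```python
def top_k_error(scores_per_image, true_labels, k=5):
    """Fraction of images whose true label is not among the k highest-scoring classes."""
    errors = 0
    for scores, label in zip(scores_per_image, true_labels):
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if label not in ranked[:k]:
            errors += 1
    return errors / len(true_labels)


# Made-up class scores for two images.
scores = [
    [0.10, 0.30, 0.05, 0.20, 0.15, 0.20],  # true class 2 has the lowest score
    [0.50, 0.10, 0.10, 0.20, 0.05, 0.05],  # true class 3 has the second-highest score
]
labels = [2, 3]

top5 = top_k_error(scores, labels, k=5)
top1 = top_k_error(scores, labels, k=1)
print(top5, top1)
```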

37
#ResNet He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep residual learning for image
recognition." CVPR 2016. [slides]

38
ResNet
Deeper networks (34 layers vs. 18 layers) are more difficult to train.
#ResNet He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep residual learning for image
recognition." CVPR 2016.

39
ResNet
Residual learning: reformulate the layers as learning residual functions with
reference to the layer inputs, instead of learning unreferenced functions
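The idea can be sketched as a block that outputs F(x) + x, where F is the residual function computed by the stacked layers and x is carried by a skip connection. This is a simplified illustration, not the architecture from the paper: small fully connected layers stand in for convolutions (an assumption), and all names are hypothetical.

```python
def linear(v, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * a for w, a in zip(row, v)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def residual_block(x, w1, b1, w2, b2):
    """Compute relu(F(x) + x), where F is two layers with a ReLU in between."""
    f = linear(relu(linear(x, w1, b1)), w2, b2)       # residual function F(x)
    return relu([fi + xi for fi, xi in zip(f, x)])    # add the skip connection


x = [1.0, 2.0]
zero_w = [[0.0, 0.0], [0.0, 0.0]]
zero_b = [0.0, 0.0]

# With all weights at zero, F(x) = 0 and the block reduces to the identity
# (for non-negative inputs).
identity_out = residual_block(x, zero_w, zero_b, zero_w, zero_b)
print(identity_out)
```

Because the identity mapping corresponds to driving the weights to zero, extra blocks can in principle be added without making the input harder to pass through.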

40
ResNet

41
Canziani, Alfredo, Adam Paszke, and Eugenio Culurciello. "An analysis of deep neural network models for
practical applications." arXiv preprint arXiv:1605.07678 (2016).

42
Canziani, Alfredo, Adam Paszke, and Eugenio Culurciello. "An analysis of deep neural network models for
practical applications." arXiv preprint arXiv:1605.07678 (2016).

45
Ensembles of Models (Hikvision)
● More than 20 models,
including VGG, Inception,
ResNet and variations of
it.
● Novel data
augmentation.
● Novel learning rate
policy.
● …and “some small tricks”

46
The end of the challenge
Electronic Frontier Foundation: “Measuring the Progress of AI Research” (2017)

47
ResNeXt = ResNet + Inception
#ResNeXt Xie, Saining, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. "Aggregated residual
transformations for deep neural networks." CVPR 2017 [code]

48
ResNeXt = ResNet + Inception
#ResNeXt Xie, Saining, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. "Aggregated residual
transformations for deep neural networks." CVPR 2017 [code]

49
DenseNet
Dense block of 5 layers
with a growth rate of k=4
Connect every layer to every other layer with the same feature-map size.
#DenseNet Huang, Gao, Zhuang Liu, Kilian Q. Weinberger, and Laurens van der Maaten. "Densely
connected convolutional networks." CVPR 2017. [code]
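The effect of the growth rate can be sketched by tracking how many channels each layer of a dense block receives: every layer sees the concatenation of the block input and the outputs of all earlier layers, and adds k new feature maps. The block input of 8 channels and the function name `dense_block_channels` are assumptions; k=4 and the 5 layers come from the slide.

```python
def dense_block_channels(input_channels, growth_rate, num_layers):
    """Input channels seen by each layer of a dense block, and the block's output channels."""
    channels_in = []
    current = input_channels
    for _ in range(num_layers):
        channels_in.append(current)   # concatenation of everything produced so far
        current += growth_rate        # each layer contributes `growth_rate` new maps
    return channels_in, current


num_layers = 5
channels_in, channels_out = dense_block_channels(
    input_channels=8,   # hypothetical number of channels entering the block
    growth_rate=4,      # k=4
    num_layers=num_layers,
)

# Number of direct connections among L densely connected layers: L(L+1)/2.
num_connections = num_layers * (num_layers + 1) // 2
print(channels_in, channels_out, num_connections)
```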

50
DenseNet
#DenseNet Huang, Gao, Zhuang Liu, Kilian Q. Weinberger, and Laurens van der Maaten. "Densely
connected convolutional networks." CVPR 2017. [code]

51
Neural Architecture Search (NAS)
#AutoML Real, Esteban, Sherry Moore, Andrew Selle, Saurabh Saxena, Yutaka Leon Suematsu, Jie Tan,
Quoc V. Le, and Alexey Kurakin. "Large-scale evolution of image classifiers." ICML 2017. [blog]

52
#AutoML Real, Esteban, Sherry Moore, Andrew Selle, Saurabh Saxena, Yutaka Leon Suematsu, Jie Tan,
Quoc V. Le, and Alexey Kurakin. "Large-scale evolution of image classifiers." ICML 2017. [blog]

53
#NasNet Zoph, Barret, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le. "Learning transferable
architectures for scalable image recognition." CVPR 2018.

54
#AdaNet Cortes, Corinna, Xavier Gonzalvo, Vitaly Kuznetsov, Mehryar Mohri, and Scott Yang. "Adanet:
Adaptive structural learning of artificial neural networks." ICML 2017. [blog]

56
Learn more
Li Fei-Fei, “How we’re teaching computers to understand
pictures” TEDTalks 2014.

57
Learn more
Andrej Karpathy, Convolutional Neural Networks. Stanford
cs231n.
