Deep convnets for global recognition (Master in Computer Vision Barcelona 2016)

Lecture by Xavier Giro-i-Nieto (UPC) at the Master in Computer Vision Barcelona (March 30, 2016).
http://pagines.uab.cat/mcv/

This lecture provides an overview of computer vision analysis of images at a global scale using deep learning techniques. The session is structured in two blocks: a first one addressing end-to-end learning, and a second one focusing on applications that use off-the-shelf features.

Please submit your feedback as comments on the GDrive source slides:
https://docs.google.com/presentation/d/1ms9Fczkep__9pMCjxtVr41OINMklcHWc74kwANj7KKI/edit?usp=sharing

1. @DocXavi. Module 5 - Lecture 10: Deep Convnets for Global Recognition. 30 March 2016. Xavier Giró-i-Nieto. http://pagines.uab.cat/mcv/
2. Densely linked slides.
3. Acknowledgments: Santi Pascual.
4. Acknowledgments.
5. Two deep lectures in M5: Deep ConvNets for Recognition at Global Scale (today’s lecture) and at Local Scale (next lecture).
6. Previously, in M3... Dog. Slide credit: Jose M Àlvarez.
7. Previously, in M3... Dog → Learned Representation. Slide credit: Jose M Àlvarez.
8. Outline for this session in M5... Dog → Learned Representation. Part I: End-to-end learning (E2E).
9. Outline for this session in M5... Learned Representation → Task A (e.g. image classification). Part I: End-to-end learning (E2E).
10. Outline for this session in M5... Task A (e.g. image classification): Learned Representation, Part I: End-to-end learning (E2E). Task B (e.g. image retrieval): Part II: Off-the-shelf features.
11. Outline for this session in M5... Task A (e.g. image classification): Learned Representation, Part I: End-to-end learning (E2E). Task B (e.g. image retrieval): Part II: Off-the-shelf features.
12. E2E: Classification: LeNet-5. LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.
13. E2E: Classification: LeNet-5. Demo: 3D Visualization of a Convolutional Neural Network. Harley, Adam W. "An Interactive Node-Link Visualization of Convolutional Neural Networks." In Advances in Visual Computing, pp. 867-877. Springer International Publishing, 2015.
14. E2E: Classification: Similar to LeNet-5. Demo: Classify MNIST digits with a Convolutional Neural Network. “ConvNetJS is a Javascript library for training Deep Learning models (mainly Neural Networks) entirely in your browser. Open a tab and you're training. No software requirements, no compilers, no installations, no GPUs, no sweat.”
15. E2E: Classification: Databases. Li Fei-Fei, “How we’re teaching computers to understand pictures,” TEDTalks 2014. Russakovsky, O., et al. (2015). ImageNet Large Scale Visual Recognition Challenge. arXiv:1409.0575. [web]
16. E2E: Classification: Databases. Russakovsky, O., et al. (2015). ImageNet Large Scale Visual Recognition Challenge. arXiv:1409.0575. [web]
17. E2E: Classification: Databases. ● 205 scene classes (categories). ● Images: 2.5M train, 20.5k validation, 41k test. Zhou, Bolei, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, and Aude Oliva. "Learning deep features for scene recognition using places database." In Advances in Neural Information Processing Systems, pp. 487-495. 2014. [web]
18. E2E: Classification: ImageNet ILSVRC. ● 1,000 object classes (categories). ● Images: 1.2M train, 100k test.
19. E2E: Classification: ImageNet ILSVRC. ● Predict 5 classes. Russakovsky, O., et al. (2015). [web]
20. E2E: Classification: ILSVRC. Image Classification 2012: -9.8% error. Slide credit: Rob Fergus (NYU). Russakovsky, O., et al. (2015). [web]
21. E2E: Classification: AlexNet (Supervision). A. Krizhevsky, I. Sutskever, G. E. Hinton. “ImageNet Classification with Deep Convolutional Neural Networks.” Advances in Neural Information Processing Systems 25 (NIPS 2012). Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015).
22. E2E: Classification: AlexNet (Supervision). Slide credit: Junting Pan (ETSETB-UPC 2015).
23. E2E: Classification: AlexNet (Supervision). Slide credit: Junting Pan (ETSETB-UPC 2015).
24. E2E: Classification: AlexNet (Supervision). Image credit: Deep Learning Tutorial (Stanford University).
25. E2E: Classification: AlexNet (Supervision). Image credit: Deep Learning Tutorial (Stanford University).
26. E2E: Classification: AlexNet (Supervision). Image credit: Deep Learning Tutorial (Stanford University).
27. E2E: Classification: AlexNet (Supervision). Rectified Linear Unit (non-linearity): f(x) = max(0, x). Slide credit: Junting Pan (ETSETB-UPC 2015).
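As a minimal illustration of the ReLU non-linearity on slide 27 (my own sketch, not from the lecture), the elementwise f(x) = max(0, x) is two lines of NumPy:

```python
import numpy as np

def relu(x):
    # Elementwise f(x) = max(0, x): negative activations are clipped to zero.
    return np.maximum(0.0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # -> [0.  0.  0.  1.5]
```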
28. E2E: Classification: AlexNet (Supervision). Dot Product. Slide credit: Junting Pan (ETSETB-UPC 2015).
29. E2E: Classification: ImageNet ILSVRC. ImageNet Classification 2013. Slide credit: Rob Fergus (NYU). Russakovsky, O., et al. (2015). [web]
30. E2E: Classification: Visualize: ZF. The development of better convnets is reduced to trial-and-error. Visualization can help in proposing better architectures. Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer International Publishing.
31. E2E: Classification: Visualize: ZF. “A convnet model that uses the same components (filtering, pooling) but in reverse, so instead of mapping pixels to features does the opposite.” Zeiler, Matthew D., Graham W. Taylor, and Rob Fergus. "Adaptive deconvolutional networks for mid and high level feature learning." ICCV 2011.
32. E2E: Classification: Visualize: ZF. DeconvNet vs. ConvNet. Zeiler, M. D., & Fergus, R. (2014).
33. E2E: Classification: Visualize: ZF.
34. E2E: Classification: Visualize: ZF.
35. E2E: Classification: Visualize: ZF.
36. E2E: Classification: Visualize: ZF. “To examine a given convnet activation, we set all other activations in the layer to zero and pass the feature maps as input to the attached deconvnet layer.”
37. E2E: Classification: Visualize: ZF.
38. E2E: Classification: Visualize: ZF. “(i) Unpool: In the convnet, the max pooling operation is non-invertible, however we can obtain an approximate inverse by recording the locations of the maxima within each pooling region in a set of switch variables.”
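A sketch of the switch idea quoted above (my own illustration, not code from the paper; assumes even input dimensions): a 2x2 max pooling records the argmax location of each region, and an approximate inverse places each pooled value back at its recorded position:

```python
import numpy as np

def maxpool_with_switches(x):
    # 2x2 max pooling that also records the argmax location ("switch")
    # of each pooling region, as in Zeiler & Fergus's deconvnet.
    h, w = x.shape
    pooled = np.zeros((h // 2, w // 2))
    switches = np.zeros((h // 2, w // 2), dtype=int)
    for i in range(h // 2):
        for j in range(w // 2):
            patch = x[2*i:2*i+2, 2*j:2*j+2]
            switches[i, j] = patch.argmax()
            pooled[i, j] = patch.max()
    return pooled, switches

def unpool(pooled, switches):
    # Approximate inverse: place each value at its recorded maximum location,
    # leaving the rest of each pooling region at zero.
    h, w = pooled.shape
    out = np.zeros((h * 2, w * 2))
    for i in range(h):
        for j in range(w):
            di, dj = divmod(switches[i, j], 2)
            out[2*i + di, 2*j + dj] = pooled[i, j]
    return out
```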
39. E2E: Classification: Visualize: ZF. “(ii) Rectification: The convnet uses ReLU non-linearities, which rectify the feature maps thus ensuring the feature maps are always positive.”
40. E2E: Classification: Visualize: ZF. “(iii) Filtering: The convnet uses learned filters to convolve the feature maps from the previous layer. To approximately invert this, the deconvnet uses transposed versions of the same filters (as other autoencoder models, such as RBMs), but applied to the rectified maps, not the output of the layer beneath. In practice this means flipping each filter vertically and horizontally.”
41. E2E: Classification: Visualize: ZF. (Same quote as slide 40.)
42. E2E: Classification: Visualize: ZF. “Top 9 activations in a random subset of feature maps across the validation data, projected down to pixel space using our deconvolutional network approach. Corresponding image patches.”
43. E2E: Classification: Visualize: ZF.
44. E2E: Classification: Visualize: ZF.
45. E2E: Classification: Visualize: ZF.
46. E2E: Classification: Visualize: ZF. Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer International Publishing.
47. E2E: Classification: Visualize: ZF. The smaller stride (2 vs 4) and filter size (7x7 vs 11x11) result in more distinctive features and fewer “dead” features. AlexNet (Layer 1) vs. Clarifai (Layer 1).
48. E2E: Classification: Visualize: ZF. Cleaner features in Clarifai, without the aliasing artifacts caused by the stride 4 used in AlexNet. AlexNet (Layer 2) vs. Clarifai (Layer 2).
49. E2E: Classification: Dropout: ZF. Regularization with dropout: reduction of overfitting by setting to zero the output of a portion (typically 50%) of each intermediate neuron. Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580.
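A minimal sketch of dropout at training time (my own illustration, assuming the common "inverted dropout" formulation so no rescaling is needed at test time):

```python
import numpy as np

def dropout(activations, p_drop=0.5, rng=np.random.default_rng()):
    # Inverted dropout: zero out a fraction p_drop of the activations and
    # rescale the survivors so the expected activation is unchanged.
    mask = rng.random(activations.shape) >= p_drop
    return activations * mask / (1.0 - p_drop)

h = np.array([0.3, 1.2, 0.7, 2.0])
print(dropout(h))  # roughly half the entries zeroed on average
```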
50. E2E: Classification: Visualize: ZF.
51. E2E: Classification: Ensembles: ZF.
52. E2E: Classification: ImageNet ILSVRC. ImageNet Classification 2013: -5% error. Slide credit: Rob Fergus (NYU). Russakovsky, O., et al. (2015). [web]
53. E2E: Classification: ImageNet ILSVRC. ImageNet Classification 2013: -5% error. Slide credit: Rob Fergus (NYU). Russakovsky, O., et al. (2015). [web]
54. E2E: Classification.
55. E2E: Classification: GoogLeNet. Movie: Inception (2010).
56. E2E: Classification: GoogLeNet. ● 22 layers, but 12 times fewer parameters than AlexNet. Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. "Going deeper with convolutions."
57. E2E: Classification: GoogLeNet. ● Challenges of going deeper: ○ overfitting, due to the increased number of parameters; ○ inefficient computation if most weights end up close to zero. Solution: sparsity. How? Inception modules.
58. E2E: Classification: GoogLeNet.
59. E2E: Classification: GoogLeNet. Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.
60. E2E: Classification: GoogLeNet. Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.
61. E2E: Classification: GoogLeNet (NiN). 3x3 and 5x5 convolutions deal with different scales. Lin et al., "Network in network." ICLR 2014. [Slides]
62. E2E: Classification: GoogLeNet (NiN). 1x1 convolutions perform dimensionality reduction (c3 < c2) and include rectified linear units (ReLU). Lin et al., "Network in network." ICLR 2014. [Slides]
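To make the dimensionality-reduction point concrete (my own sketch, not from the slides): a 1x1 convolution is just a per-position linear map across channels, so reducing c2 input channels to c3 < c2 output channels shrinks the tensor before the expensive 3x3 and 5x5 convolutions:

```python
import numpy as np

def conv1x1(x, w):
    # x: feature maps with shape (c2, H, W); w: filters with shape (c3, c2).
    # A 1x1 convolution is a channel-mixing matrix multiply at every pixel.
    out = np.tensordot(w, x, axes=([1], [0]))  # shape (c3, H, W)
    return np.maximum(0.0, out)                # followed by ReLU

x = np.random.randn(256, 28, 28)  # c2 = 256 input channels
w = np.random.randn(64, 256)      # c3 = 64 < c2: dimensionality reduction
print(conv1x1(x, w).shape)         # (64, 28, 28)
```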
63. E2E: Classification: GoogLeNet (NiN). In NiN, the cascaded 1x1 convolutions compute reductions after the convolutions. Lin et al., "Network in network." ICLR 2014. [Slides]
64. E2E: Classification: GoogLeNet. In GoogLeNet, the cascaded 1x1 convolutions compute reductions before the expensive 3x3 and 5x5 convolutions.
65. E2E: Classification: GoogLeNet. Lin et al., "Network in network." ICLR 2014.
66. E2E: Classification: GoogLeNet. 3x3 max pooling introduces some spatial invariance, and has proven beneficial as an alternative parallel path.
67. E2E: Classification: GoogLeNet. Two softmax classifiers at intermediate layers combat the vanishing gradient while providing regularization at training time. ...and no fully connected layers are needed!
68. E2E: Classification: GoogLeNet. NVIDIA, “NVIDIA and IBM Cloud Support ImageNet Large Scale Visual Recognition Challenge” (2015).
69. E2E: Classification: GoogLeNet. Szegedy, Christian, et al. "Going deeper with convolutions." CVPR 2015. [video] [slides] [poster]
70. E2E: Classification: VGG. Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." ICLR 2015. [video] [slides] [project]
71. E2E: Classification: VGG. Simonyan & Zisserman, ICLR 2015. [video] [slides] [project]
72. E2E: Classification: VGG: 3x3 Stacks. Simonyan & Zisserman, ICLR 2015. [video] [slides] [project]
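A quick worked count (my own addition, following the argument in the VGG paper) of why 3x3 stacks help: two stacked 3x3 layers cover the same 5x5 receptive field, and three cover 7x7, with fewer parameters and extra non-linearities in between. For C channels in and out:

```python
C = 256  # example channel width
print(2 * (3*3*C*C), "params: two stacked 3x3 layers")   # 18*C^2
print(5*5*C*C,       "params: one 5x5 layer")            # 25*C^2
print(3 * (3*3*C*C), "params: three stacked 3x3 layers") # 27*C^2
print(7*7*C*C,       "params: one 7x7 layer")            # 49*C^2
```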
73. E2E: Classification: VGG. ● No pooling between some convolutional layers. ● Convolution strides of 1 (no skipping). Simonyan & Zisserman, ICLR 2015. [video] [slides] [project]
74. E2E: Classification. 3.6% top-5 error… with 152 layers!
75. E2E: Classification: ResNet. He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385 (2015). [slides]
76. E2E: Classification: ResNet. ● Deeper networks (34 layers vs. 18) are more difficult to train. Thin curves: training error; bold curves: validation error. He et al. (2015). [slides]
77. E2E: Classification: ResNet. ● Residual learning: reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. He et al. (2015). [slides]
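The residual idea in code (a minimal sketch of mine in present-day PyTorch, which postdates this lecture; batch normalization is omitted for brevity): the stacked layers learn F(x) and the block outputs F(x) + x through an identity shortcut:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    # Two 3x3 convolutions learn the residual F(x); the identity shortcut
    # adds the input back, so the block outputs F(x) + x.
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        residual = self.conv2(F.relu(self.conv1(x)))
        return F.relu(residual + x)  # identity shortcut

block = ResidualBlock(64)
print(block(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```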
78. E2E: Classification: ResNet. He et al. (2015). [slides]
79. E2E: Classification: Humans. Russakovsky, O., et al. (2015). [web]
80. E2E: Classification: Humans. “Is this a Border terrier?” Crowdsourcing: Yes / No. ● Binary ground truth annotation from the crowd. Russakovsky, O., et al. (2015). [web]
81. E2E: Classification: Humans. ● Annotation problems: crowdsourcing loss (0.3%); more than 5 object classes. Carlier, Axel, Amaia Salvador, Ferran Cabezas, Xavier Giro-i-Nieto, Vincent Charvillat, and Oge Marques. "Assessment of crowdsourcing and gamification loss in user-assisted object segmentation." Multimedia Tools and Applications (2015): 1-28. Russakovsky, O., et al. (2015). [web]
82. E2E: Classification: Humans. ● Test data collection from one human. [interface] Andrej Karpathy, “What I learned from competing against a computer on ImageNet” (2014).
83. E2E: Classification: Humans. ● Test data collection from one human. [interface] “Aww, a cute dog! Would you like to spend 5 minutes scrolling through 120 breeds of dog to guess what species it is?” Andrej Karpathy (2014).
84. E2E: Classification: Humans. ResNet. NVIDIA, “Mocha.jl: Deep Learning for Julia” (2015).
85. Let’s play a game!
86.
87. What have you seen?
88. Tower.
89. Tower. House.
90. Tower. House. Rocks.
91. E2E: Saliency. Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB 2015).
92. E2E: Saliency. Eye Tracker vs. Mouse Click. Slide credit: Junting Pan (ETSETB 2015).
93. E2E: Saliency: JuntingNet. Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016.
94. E2E: Saliency: JuntingNet. DATA: iSUN [Xu’15], SALICON [Jiang’15]. Pan et al., CVPR 2016.
95. E2E: Saliency: JuntingNet. Large-scale datasets (train / validation / test): SALICON [Jiang’15]: 10,000 / 5,000 / 5,000. iSUN [Xu’15]: 6,000 / 926 / 2,000. CAT2000 [Borji’15]: 2,000 / - / 2,000. MIT300 [Judd’12]: 300 / - / -. Pan et al., CVPR 2016.
96. E2E: Saliency: JuntingNet. ARCHITECTURE [Pan’15]. DATA: iSUN [Xu’15], SALICON [Jiang’15].
97. E2E: Saliency: JuntingNet. Image input (RGB), 96x96 → 2D map, 2304 = 48x48 (upsample + filter).
98. E2E: Saliency: JuntingNet. 3 conv layers. 96x96 → 2304 = 48x48 (upsample + filter).
99. E2E: Saliency: JuntingNet. 2 dense layers. 96x96 → 2304 = 48x48 (upsample + filter).
100. E2E: Saliency: JuntingNet. 96x96 → 2304 = 48x48 (upsample + filter).
101. E2E: Saliency: JuntingNet. ARCHITECTURE [Pan’15]. DATA: iSUN [Xu’15], SALICON [Jiang’15]. SOFTWARE: [Bergstra’10], [Bastien’12].
102. E2E: Saliency: JuntingNet. Loss function: Mean Square Error (MSE). Weight initialization: Gaussian distribution. Learning rate: 0.03 to 0.0001. Mini-batch size: 128. Training time: 7h (SALICON) / 4h (iSUN). Acceleration: SGD + Nesterov momentum (0.9). Regularisation: maxout norm. GPU: NVIDIA GTX 980. Pan et al., CVPR 2016.
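Those hyperparameters translate almost directly into a training step (a sketch of mine in present-day PyTorch, not the authors' Theano code; the `model` below is a trivial placeholder standing in for any network mapping 96x96 RGB inputs to 48x48 saliency maps):

```python
import torch

# Placeholder network: 96x96 RGB input -> flat 48x48 = 2304 saliency map.
model = torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(3 * 96 * 96, 48 * 48))
criterion = torch.nn.MSELoss()  # Mean Square Error loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.03,
                            momentum=0.9, nesterov=True)  # SGD + Nesterov momentum

images = torch.randn(128, 3, 96, 96)  # mini-batch size 128
targets = torch.rand(128, 48 * 48)    # ground-truth saliency maps

optimizer.zero_grad()
loss = criterion(model(images), targets)
loss.backward()
optimizer.step()  # in the paper, lr decays from 0.03 down to 0.0001
```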
103. E2E: Saliency: JuntingNet. ● Back-propagation with the Euclidean distance. ● Training curve for the SALICON database: loss vs. number of iterations (training time). Pan et al., CVPR 2016.
104. E2E: Saliency: JuntingNet: iSUN. Pixels / Ground Truth / JuntingNet. Pan et al., CVPR 2016.
105. E2E: Saliency: JuntingNet: iSUN. Pixels / Ground Truth / JuntingNet. Pan et al., CVPR 2016.
106. E2E: Saliency: JuntingNet: iSUN. Results from CVPR LSUN Challenge 2015. Pan et al., CVPR 2016.
107. E2E: Saliency: JuntingNet: SALICON. Pixels / Ground Truth / JuntingNet. Pan et al., CVPR 2016.
108. E2E: Saliency: JuntingNet: SALICON. Pixels / Ground Truth / JuntingNet. Pan et al., CVPR 2016.
109. E2E: Saliency: JuntingNet: SALICON. Results from CVPR LSUN Challenge 2015. Pan et al., CVPR 2016.
110. E2E: Saliency: JuntingNet. Pan et al., CVPR 2016.
111. Outline for this session in M5... Part I: End-to-end learning (E2E): Domain A → Learned Representation. Transfer → Part I’: End-to-End Fine-Tuning (FT): Domain B → Fine-tuned Learned Representation.
112. E2E: Fine-tuning. Fine-tuning a pre-trained network. Slide credit: Victor Campos, “Layer-wise CNN surgery for Visual Sentiment Prediction” (ETSETB 2015).
113. E2E: Fine-tuning. Fine-tuning a pre-trained network. Slide credit: Victor Campos (ETSETB 2015).
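A minimal sketch of the fine-tuning recipe (my own illustration using a recent torchvision; the experiments in the lecture used Caffe): load a network pre-trained on ImageNet, freeze the early layers, and retrain the final classifier on the new domain:

```python
import torch
import torchvision

model = torchvision.models.alexnet(weights="IMAGENET1K_V1")  # pre-trained on ImageNet

for param in model.features.parameters():
    param.requires_grad = False  # freeze the convolutional layers

# Replace the final layer with a new head for the target domain,
# e.g. positive/negative sentiment (2 classes).
model.classifier[6] = torch.nn.Linear(4096, 2)

# Only the unfrozen parameters are updated during fine-tuning.
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=0.001, momentum=0.9)
```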
114. E2E: Fine-tuning: Sentiments. CNN. Campos, Victor, Amaia Salvador, Xavier Giro-i-Nieto, and Brendan Jou. "Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction." In Proceedings of the 1st International Workshop on Affect & Sentiment in Multimedia, pp. 57-62. ACM, 2015.
115. E2E: Fine-tuning: Sentiments. True positive / true negative / false positive / false negative. Visualizations with fully convolutional networks. Campos, Victor, Xavier Giro-i-Nieto, and Brendan Jou. “From pixels to sentiments” (Submitted). Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully Convolutional Networks for Semantic Segmentation." CVPR 2015.
116. E2E: Fine-tuning: Cultural events. ChaLearn Workshop. A. Salvador, M. Zeppelzauer, D. Manchon-Vizuete, A. Calafell-Orós, and X. Giró-i-Nieto, “Cultural Event Recognition with Visual ConvNets and Temporal Models,” CVPR ChaLearn Looking at People Workshop 2015. [slides]
117. Outline for this session in M5... Task A (e.g. image classification): Learned Representation, Part I: End-to-end learning (E2E). Task B (e.g. image retrieval): Part II: Off-The-Shelf features (OTS).
118. Off-The-Shelf (OTS) Features. Razavian, Ali, Hossein Azizpour, Josephine Sullivan, and Stefan Carlsson. "CNN features off-the-shelf: an astounding baseline for recognition." CVPRW 2014.
119. Off-The-Shelf (OTS) Features. ● Intermediate features can be used as regular visual descriptors for any task. Babenko, Artem, et al. "Neural codes for image retrieval." Computer Vision–ECCV 2014.
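Off-the-shelf feature extraction in a few lines (a sketch of mine with a recent torchvision, in the spirit of the papers above): run images through a pre-trained network and keep a late activation, e.g. the penultimate fully connected layer, as the descriptor:

```python
import torch
import torchvision

model = torchvision.models.alexnet(weights="IMAGENET1K_V1").eval()

def extract_descriptor(images):
    # Forward pass up to the penultimate FC layer; the activations serve
    # as a generic visual descriptor for retrieval, classification, etc.
    with torch.no_grad():
        x = model.avgpool(model.features(images)).flatten(1)
        return model.classifier[:6](x)  # stop before the final class scores

desc = extract_descriptor(torch.randn(4, 3, 224, 224))
print(desc.shape)  # torch.Size([4, 4096])
```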
120. OTS: Classification: Razavian. Pascal VOC 2007. Razavian et al., "CNN features off-the-shelf: an astounding baseline for recognition." CVPRW 2014.
121. OTS: Classification: Return of devil. L2-normalization of the features fed to the classifier: accuracy +5%. Chatfield, K., Simonyan, K., Vedaldi, A. and Zisserman, A. "Return of the devil in the details: Delving deep into convolutional nets." BMVC 2014.
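The L2-normalization step itself is trivial (my own sketch): divide each descriptor by its Euclidean norm before feeding it to the classifier:

```python
import numpy as np

def l2_normalize(descriptors, eps=1e-12):
    # Scale each row (one descriptor per image) to unit Euclidean length.
    norms = np.linalg.norm(descriptors, axis=1, keepdims=True)
    return descriptors / np.maximum(norms, eps)
```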
122. OTS: Classification: Return of devil. Three representative architectures considered: AlexNet, ZF, OverFeat. Training time: 5 days (fast) to 3 weeks (slow) on an NVIDIA GTX Titan GPU. Chatfield et al., BMVC 2014.
123. OTS: Classification: Return of devil. Data augmentation. Chatfield et al., BMVC 2014.
124. OTS: Classification: Return of devil. Fisher Kernels (FK) vs. ConvNets (CNN).
125. OTS: Classification: Return of devil. Color vs. gray scale (GS): accuracy -2.5%. Chatfield et al., BMVC 2014.
126. OTS: Classification: Return of devil. Dimensionality reduction by retraining the last layer to smaller sizes: accuracy -2%, size x32. Chatfield et al., BMVC 2014.
127. OTS: Retrieval. Ranking. Summary of the paper by Amaia Salvador on Bitsearch. Babenko, Artem, et al. "Neural codes for image retrieval." Computer Vision–ECCV 2014. Springer International Publishing, 2014. 584-599.
128. OTS: Retrieval. Oxford Buildings, Inria Holidays, UKB.
129. OTS: Retrieval: FC layers. Pooled from the network from Krizhevsky et al. pretrained with images from ImageNet. Summary of the paper by Amaia Salvador on Bitsearch. Babenko et al., ECCV 2014.
130. OTS: Retrieval: FC layers. Off-the-shelf CNN descriptors from fully connected layers prove useful but not superior (w.r.t. FV, VLAD, sparse coding, ...). Babenko et al., ECCV 2014.
131. OTS: Retrieval: Conv layers. Convolutional layers have shown better performance than fully connected ones. Razavian et al., "A baseline for visual instance retrieval with deep convolutional networks," ICLR 2015.
132. OTS: Retrieval: Conv layers. Spatial search (extracting N local descriptors from predefined locations) increases performance at a computational cost. Razavian et al., ICLR 2015.
133. OTS: Retrieval: Conv layers. Medium memory footprints. Razavian et al., ICLR 2015.
134. OTS: Retrieval: Bag of Words (BoW). Instance retrieval (instance: object, building, person, place…). Mohedano et al., "Bags of Local Convolutional Features for Scalable Instance Search," ICMR 2016.
135. OTS: Retrieval: Bag of Words (BoW). An assignment map of BoF-quantized features can be defined over the spatial locations of the convolutional feature maps. Mohedano et al., ICMR 2016.
136. OTS: Retrieval: Bag of Words (BoW). Mohedano et al., ICMR 2016.
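A sketch of the assignment-map idea (my own illustration, assuming a visual vocabulary of centroids already learned, e.g. with k-means): each spatial location of a convolutional feature map is assigned to its nearest centroid, and the image is described by the histogram of assignments:

```python
import numpy as np

def bow_from_conv_features(feature_map, centroids):
    # feature_map: (C, H, W) activations from a convolutional layer.
    # centroids: (K, C) visual vocabulary, e.g. learned with k-means.
    C, H, W = feature_map.shape
    locals_ = feature_map.reshape(C, H * W).T  # one C-dim descriptor per location
    dists = np.linalg.norm(locals_[:, None, :] - centroids[None, :, :], axis=2)
    assignment_map = dists.argmin(axis=1).reshape(H, W)  # nearest centroid per location
    bow = np.bincount(assignment_map.ravel(), minlength=len(centroids))
    return bow, assignment_map
```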
137. OTS: Summarization. Clustering based on Euclidean distance over FC7 features from AlexNet. Bolaños M, Mestre R, Talavera E, Giró-i-Nieto X, Radeva P. "Visual Summary of Egocentric Photostreams by Representative Keyframes." IEEE International Workshop on Wearable and Ego-vision Systems for Augmented Experience (WEsAX) 2015, Turin, Italy.
138. OTS: Reinforcement learning: Deep Q. (Google) DeepMind. Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. "Playing Atari with deep reinforcement learning." arXiv preprint arXiv:1312.5602 (2013).
139. OTS: Reinforcement learning: Deep Q. Demo: Deep Q learning demo. Mnih et al. (2013).
140. OTS: Reinforcement learning. The object is localized based on visual features from AlexNet FC6. Caicedo, Juan C., and Svetlana Lazebnik. "Active object localization with deep reinforcement learning." ICCV 2015. [Slides by Miriam Bellver] See also: Mnih et al. (2013).
141. ConvNets: Learn more. The New York Times: “The Race Is On to Control Artificial Intelligence, and Tech’s Future” (25/03/2016).
142. ConvNets: Frameworks. Keras http://keras.io/ | Caffe http://caffe.berkeleyvision.org/ | Torch (OverFeat) http://torch.ch/ | Theano http://deeplearning.net/software/theano/ | TensorFlow https://www.tensorflow.org/ | MatConvNet (VLFeat) http://www.vlfeat.org/matconvnet/ | CNTK (Microsoft) http://www.cntk.ai/
143. ConvNets: Frameworks. Source: @fchollet.
144. ConvNets: Learn more. Stanford course CS231n: Convolutional Neural Networks for Visual Recognition.
145. ConvNets: Learn more. ● Reading Group on Tuesdays (UPC) and Wednesdays (UB) at 11am. Schedule, list of papers, slides & videos: https://github.com/imatge-upc/readcv. Open to MCV students. E-mail me at xavier.giro@upc.edu before attending :)
146. ConvNets: Learn more. ● Summer Course “Deep learning for computer vision” (4-8 July 2016). (Very) tentative program at: https://github.com/imatge-upc/etsetb-2016-dlcv. E-mail me at xavier.giro@upc.edu for a free spot (subject to availability).
147. ConvNets: Learn more. ● Broader and more friendly slides for dissemination. [Available on slideshare.net]
148. Thank you! https://imatge.upc.edu/web/people/xavier-giro | https://twitter.com/DocXavi | https://www.facebook.com/ProfessorXavi | xavier.giro@upc.edu. Xavier Giró-i-Nieto.