@DocXavi
Deep Learning for Computer Vision
Image Analytics
5 May 2016
Xavier Giró-i-Nieto
Master en
Creació Multimedia
2
Densely linked slides
3
Introduction
Xavier Giro-i-Nieto
• Web: https://imatge.upc.edu/web/people/xavier-giro
Associate Professor at Universitat Politecnica de Catalunya (UPC)
4
Acknowledgments
5
Acknowledgments
One lecture organized in three parts
6
Images (global) Objects (local)
Deep ConvNets for Recognition for...
Video (2D+T)
Previously, before deep learning...
8
Slide credit: Jose M Àlvarez
Dog
9
Slide credit: Jose M Àlvarez
Dog
Learned
Representation
Previously, before deep learning...
Outline for Part I: Image Analytics...
10
Dog
Learned
Representation
Part I: End-to-end learning (E2E)
11
Learned
Representation
Part I: End-to-end learning (E2E)
Task A
(eg. image classification)
Outline for Part I: Image Analytics...
12
Task A
(eg. image classification)
Learned
Representation
Part I: End-to-end learning (E2E)
Task B
(e.g. image retrieval)
Part II: Off-the-shelf features
E2E: Classification: Supervised learning
14
Manual Annotations
Model
New Image
Automatic
classification
Training
Test
Anchor
E2E: Classification: LeNet-5
15
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.
E2E: Classification: LeNet-5
16
Demo: 3D Visualization of a Convolutional Neural Network
Harley, Adam W. "An Interactive Node-Link Visualization of Convolutional Neural Networks." In Advances in Visual Computing, pp. 867-877. Springer
International Publishing, 2015.
E2E: Classification: Similar to LeNet-5
17
Demo: Classify MNIST digits with a Convolutional Neural Network
“ConvNetJS is a Javascript library for
training Deep Learning models (mainly
Neural Networks) entirely in your browser.
Open a tab and you're training. No software
requirements, no compilers, no installations,
no GPUs, no sweat.”
E2E: Classification: Databases
18
Li Fei-Fei, “How we’re teaching computers to understand
pictures” TEDTalks 2014.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual
recognition challenge. arXiv preprint arXiv:1409.0575. [web]
19
E2E: Classification: Databases
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual
recognition challenge. arXiv preprint arXiv:1409.0575. [web]
20
Zhou, Bolei, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, and Aude Oliva. "Learning deep features for scene recognition
using places database." In Advances in neural information processing systems, pp. 487-495. 2014. [web]
E2E: Classification: Databases
● 205 scene classes (categories).
● Images:
○ 2.5M train
○ 20.5k validation
○ 41k test
21
E2E: Classification: ImageNet ILSVRC
● 1000 object classes (categories).
● Images:
○ 1.2M train
○ 100k test
E2E: Classification: ImageNet ILSVRC
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual
recognition challenge. arXiv preprint arXiv:1409.0575. [web]
● Predict 5 classes.
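The "predict 5 classes" rule is usually scored as a top-5 error: an image counts as correct if its true label is among the five highest-scoring classes. The following is a minimal sketch of that metric, assuming per-class scores are available as plain lists; the function name `top5_error` and the toy scores are illustrative, not taken from the challenge tooling.

```python
def top5_error(scores_per_image, true_labels, k=5):
    """Fraction of images whose true label is NOT among the k best-scored classes."""
    errors = 0
    for scores, label in zip(scores_per_image, true_labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if label not in ranked[:k]:
            errors += 1
    return errors / len(true_labels)


# Toy example with 6 classes (hypothetical scores).
scores = [
    [0.10, 0.30, 0.05, 0.20, 0.15, 0.20],  # true class 2 has the lowest score
    [0.05, 0.10, 0.50, 0.15, 0.10, 0.10],  # true class 2 has the highest score
]
labels = [2, 2]
print(top5_error(scores, labels))
```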
Slide credit:
Rob Fergus (NYU)
Image Classification 2012
-9.8%
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2014). Imagenet large scale visual recognition challenge. arXiv
preprint arXiv:1409.0575. [web]
23
E2E: Classification: ILSVRC
E2E: Classification: AlexNet (Supervision)
24
Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015)
Orange
A Krizhevsky, I Sutskever, GE Hinton “Imagenet classification with deep convolutional neural networks” Part of: Advances in
Neural Information Processing Systems 25 (NIPS 2012)
25
Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015)
E2E: Classification: AlexNet (Supervision)
26
Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015)
E2E: Classification: AlexNet (Supervision)
27
Image credit: Deep learning Tutorial (Stanford University)
E2E: Classification: AlexNet (Supervision)
28
Image credit: Deep learning Tutorial (Stanford University)
E2E: Classification: AlexNet (Supervision)
29
Image credit: Deep learning Tutorial (Stanford University)
E2E: Classification: AlexNet (Supervision)
30
Rectified
Linear
Unit
(non-linearity)
f(x) = max(0,x)
Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015)
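The rectified linear unit f(x) = max(0,x) can be sketched in a few lines. This is an illustrative, dependency-free version that applies the non-linearity element-wise to a small feature map stored as nested lists; the helper names are assumptions made for the example.

```python
def relu(x):
    """Rectified linear unit: f(x) = max(0, x)."""
    return max(0.0, x)


def relu_map(feature_map):
    """Apply ReLU element-wise to a 2D feature map (list of rows)."""
    return [[relu(v) for v in row] for row in feature_map]


activations = [[-1.5, 0.0, 2.0],
               [3.0, -0.2, 0.5]]
print(relu_map(activations))  # negative entries become 0.0, the rest pass through
```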
E2E: Classification: AlexNet (Supervision)
31
Dot Product
Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015)
E2E: Classification: AlexNet (Supervision)
ImageNet Classification 2013
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv
preprint arXiv:1409.0575. [web]
Slide credit:
Rob Fergus (NYU)
32
E2E: Classification: ImageNet ILSVRC
The development of better
convnets is reduced to trial-and-error.
33
E2E: Classification: Visualize: ZF
Visualization can help in
proposing better architectures.
Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer
International Publishing.
“A convnet model that uses the same
components (filtering, pooling) but in
reverse, so instead of mapping pixels
to features does the opposite.”
Zeiler, Matthew D., Graham W. Taylor, and Rob Fergus. "Adaptive deconvolutional networks for mid and high level feature learning." Computer Vision
(ICCV), 2011 IEEE International Conference on. IEEE, 2011.
34
E2E: Classification: Visualize: ZF
Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer
International Publishing.
DeconvNet
ConvNet
35
E2E: Classification: Visualize: ZF
36
E2E: Classification: Visualize: ZF
37
E2E: Classification: Visualize: ZF
38
E2E: Classification: Visualize: ZF
“To examine a given convnet activation, we set all other activations in the layer to zero and pass the feature
maps as input to the attached deconvnet layer.”
39
E2E: Classification: Visualize: ZF
40
E2E: Classification: Visualize: ZF
“(i) Unpool: In the convnet, the max pooling operation is non-invertible, however we can obtain an approximate
inverse by recording the locations of the maxima within each pooling region in a set of switch variables.”
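The unpooling step quoted above can be illustrated with a small sketch: max pooling records where each maximum came from (the "switches"), and unpooling writes each pooled value back to that location, leaving zeros elsewhere. This is a simplified illustration on nested lists, assuming non-overlapping windows and map sizes divisible by the window size; it is not the authors' implementation.

```python
def max_pool_with_switches(fmap, size=2):
    """Non-overlapping max pooling that also records the position of each maximum."""
    pooled, switches = [], []
    for r in range(0, len(fmap), size):
        pooled_row, switch_row = [], []
        for c in range(0, len(fmap[0]), size):
            window = [(fmap[r + dr][c + dc], (r + dr, c + dc))
                      for dr in range(size) for dc in range(size)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            switch_row.append(position)
        pooled.append(pooled_row)
        switches.append(switch_row)
    return pooled, switches


def unpool(pooled, switches, height, width):
    """Approximate inverse: place each pooled value at its recorded location."""
    out = [[0.0] * width for _ in range(height)]
    for pooled_row, switch_row in zip(pooled, switches):
        for value, (r, c) in zip(pooled_row, switch_row):
            out[r][c] = value
    return out


fmap = [[1.0, 2.0, 0.0, 3.0],
        [5.0, 4.0, 1.0, 0.0],
        [0.0, 1.0, 5.0, 6.0],
        [2.0, 0.0, 7.0, 8.0]]
pooled, switches = max_pool_with_switches(fmap)
restored = unpool(pooled, switches, 4, 4)
```

Only the maxima survive the round trip, which is why the inverse is approximate.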
41
E2E: Classification: Visualize: ZF
“(ii) Rectification: The convnet uses ReLU non-linearities, which rectify the feature maps thus ensuring the
feature maps are always positive.”
42
E2E: Classification: Visualize: ZF
“(iii) Filtering: The convnet uses learned filters to convolve the feature maps from the previous layer. To
approximately invert this, the deconvnet uses transposed versions of the same filters (as other autoencoder
models, such as RBMs), but applied to the rectified maps, not the output of the layer beneath. In practice this
means flipping each filter vertically and horizontally.”
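The filter flipping described in the quoted filtering step can be shown on a single 2D kernel. A minimal sketch, assuming the kernel is stored as a list of rows; the name `flip_filter` is illustrative.

```python
def flip_filter(kernel):
    """Flip a 2D kernel vertically and horizontally (a 180-degree rotation)."""
    return [list(reversed(row)) for row in reversed(kernel)]


kernel = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
flipped = flip_filter(kernel)
print(flipped)
```

Flipping twice returns the original kernel, which is a quick sanity check for the operation.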
44
E2E: Classification: Visualize: ZF
45
Top 9 activations in a random subset of feature maps
across the validation data, projected down to pixel space
using our deconvolutional network approach.
Corresponding image patches.
E2E: Classification: Visualize: ZF
46
E2E: Classification: Visualize: ZF
47
E2E: Classification: Visualize: ZF
48
E2E: Classification: Visualize: ZF
Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer
International Publishing.
49
E2E: Classification: Visualize: ZF
Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer
International Publishing.
50
E2E: Classification: Visualize: ZF
51
The smaller stride (2 vs 4) and filter size (7x7 vs 11x11) result in more distinctive features and fewer “dead”
features.
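One way to see the effect of the stride is to compute how densely the first layer samples the input. The sketch below uses the standard output-size formula for a convolution; the input sizes (227 and 224) and the padding of 1 are assumptions chosen for illustration and are not stated on the slide.

```python
def conv_output_size(input_size, kernel_size, stride, padding=0):
    """Spatial size of a convolution output along one dimension."""
    return (input_size - kernel_size + 2 * padding) // stride + 1


# Assumed first-layer settings (illustrative only).
stride4_11x11 = conv_output_size(227, 11, stride=4)            # 11x11 filters, stride 4
stride2_7x7 = conv_output_size(224, 7, stride=2, padding=1)    # 7x7 filters, stride 2
print(stride4_11x11, stride2_7x7)
```

Under these assumptions the stride-2 layer produces a map about twice as large per side, so the image is sampled more densely.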
AlexNet (Layer 1) Clarifai (Layer 1)
E2E: Classification: Visualize: ZF
52
Cleaner features in Clarifai, without the aliasing artifacts caused by the stride 4 used in AlexNet.
AlexNet (Layer 2) Clarifai (Layer 2)
E2E: Classification: Visualize: ZF
53
Regularization with dropout:
Reduction of overfitting by randomly setting
to zero the output of a portion
(typically 50%) of the intermediate
neurons during training.
Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. R. (2012). Improving neural networks by preventing co-adaptation of
feature detectors. arXiv preprint arXiv:1207.0580.
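A minimal sketch of dropout on a list of activations follows. It uses the common "inverted dropout" variant, which rescales the kept activations at training time; this is an assumption made for the example, since the cited paper instead compensates at test time. The function name and the `rng` parameter are illustrative.

```python
import random


def dropout(activations, drop_prob=0.5, training=True, rng=None):
    """Zero each activation with probability drop_prob during training.

    Kept activations are divided by (1 - drop_prob) so the expected value
    is unchanged (inverted dropout). At test time the input is returned as is.
    """
    if not training or drop_prob == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


hidden = [1.0] * 1000
dropped = dropout(hidden, drop_prob=0.5, rng=random.Random(0))
print(sum(1 for v in dropped if v == 0.0), "of", len(dropped), "units were zeroed")
```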
Chicago
E2E: Classification: Dropout: ZF
54
E2E: Classification: Visualize: ZF
55
E2E: Classification: Ensembles: ZF
ImageNet Classification 2013
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv
preprint arXiv:1409.0575. [web]
-5%
Slide credit:
Rob Fergus (NYU)
56
57
E2E: Classification: ImageNet ILSVRC
E2E: Classification
58
E2E: Classification: GoogLeNet
59
Movie: Inception (2010)
E2E: Classification: GoogLeNet
60
● 22 layers, but 12 times fewer parameters than AlexNet.
Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov,
Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. "Going deeper with convolutions."
E2E: Classification: GoogLeNet
61
● Challenges of going deeper:
○ Overfitting, due to the increased number of parameters.
○ Inefficient computation if most weights end up close to zero.
Solution
Sparsity
How ?
Inception modules
E2E: Classification: GoogLeNet
62
E2E: Classification: GoogLeNet
63
Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.
E2E: Classification: GoogLeNet
64
Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.
E2E: Classification: GoogLeNet (NiN)
65
3x3 and 5x5 convolutions deal with
different scales.
Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014. [Slides]
66
1x1 convolutions perform dimensionality reduction (c3<c2) and
are followed by rectified linear units (ReLU).
Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014. [Slides]
E2E: Classification: GoogLeNet (NiN)
67
In NiN, the Cascaded 1x1 Convolutions compute reductions after the
convolutions.
Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014. [Slides]
E2E: Classification: GoogLeNet (NiN)
E2E: Classification: GoogLeNet
68
In GoogLeNet, the Cascaded 1x1 Convolutions compute reductions before the
expensive 3x3 and 5x5 convolutions.
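The saving from placing a 1x1 reduction before an expensive convolution can be estimated by counting multiply-accumulate operations. The sketch below compares a direct 5x5 convolution with a 1x1 reduction followed by the same 5x5 convolution; the feature-map size and channel counts are illustrative assumptions, not figures from the slide.

```python
def conv_macs(height, width, in_channels, out_channels, kernel):
    """Multiply-accumulates of a stride-1, same-size convolution (biases ignored)."""
    return height * width * in_channels * out_channels * kernel * kernel


# Assumed sizes for illustration.
h = w = 28
c_in, c_reduced, c_out = 192, 16, 32

direct = conv_macs(h, w, c_in, c_out, kernel=5)
reduced = (conv_macs(h, w, c_in, c_reduced, kernel=1)
           + conv_macs(h, w, c_reduced, c_out, kernel=5))

print(direct, reduced, round(direct / reduced, 1))
```

With these assumed sizes the reduced path needs roughly a tenth of the operations of the direct one.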
E2E: Classification: GoogLeNet
69
Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.
E2E: Classification: GoogLeNet
70
3x3 max pooling introduces some spatial invariance, and adding it as an
alternative parallel path has proven beneficial.
E2E: Classification: GoogLeNet
71
Two Softmax Classifiers at intermediate layers combat the vanishing gradient while
providing regularization at training time.
...and no fully connected layers needed !
E2E: Classification: GoogLeNet
72
E2E: Classification: GoogLeNet
73
NVIDIA, “NVIDIA and IBM Cloud Support ImageNet Large Scale Visual Recognition Challenge” (2015)
E2E: Classification: GoogLeNet
74
Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent
Vanhoucke, and Andrew Rabinovich. "Going deeper with convolutions." CVPR 2015. [video] [slides] [poster]
E2E: Classification: VGG
75
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition."
International Conference on Learning Representations (2015). [video] [slides] [project]
E2E: Classification: VGG
76
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition."
International Conference on Learning Representations (2015). [video] [slides] [project]
E2E: Classification: VGG: 3x3 Stacks
77
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image
recognition." International Conference on Learning Representations (2015). [video] [slides] [project]
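The motivation for stacking 3x3 convolutions can be checked with simple arithmetic: a stack of three 3x3 layers covers the same 7x7 receptive field as a single 7x7 layer but with fewer weights. A minimal sketch, assuming stride 1, the same number of channels C in and out of every layer, and no biases; the function names and C = 64 are illustrative.

```python
def receptive_field(num_layers, kernel):
    """Receptive field of num_layers stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(num_layers, kernel, channels):
    """Weights in a stack of convolutions with `channels` inputs and outputs each."""
    return num_layers * kernel * kernel * channels * channels


C = 64  # assumed channel count
print(receptive_field(3, 3), receptive_field(1, 7))   # both stacks cover 7x7
print(conv_weights(3, 3, C), conv_weights(1, 7, C))   # 27*C^2 versus 49*C^2
```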
E2E: Classification: VGG
78
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image
recognition." International Conference on Learning Representations (2015). [video] [slides] [project]
● No poolings between some convolutional layers.
● Convolution strides of 1 (no skipping).
E2E: Classification
79
3.6% top 5 error…
with 152 layers !!
E2E: Classification: ResNet
80
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385
(2015). [slides]
E2E: Classification: ResNet
81
● Deeper networks (34 is deeper than 18) are more difficult to train.
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385
(2015). [slides]
Thin curves: training error
Bold curves: validation error
E2E: Classification: ResNet
82
● Residual learning: reformulate the layers as learning residual functions with
reference to the layer inputs, instead of learning unreferenced functions
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385
(2015). [slides]
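Residual learning can be summarized as y = F(x) + x: the layers learn the residual F(x) and a shortcut adds the input back. The sketch below is a deliberately simplified version that uses two small fully connected layers in place of the convolutional layers of the paper and omits batch normalization; all names are illustrative.

```python
def relu_vec(v):
    return [max(0.0, a) for a in v]


def linear(weights, v):
    """Matrix-vector product, with weights given as a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """y = ReLU(F(x) + x), where F is two linear layers with a ReLU in between."""
    residual = linear(w2, relu_vec(linear(w1, x)))   # F(x)
    return relu_vec([r + xi for r, xi in zip(residual, x)])


x = [1.0, 2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(residual_block(x, zeros, zeros))  # with F(x) = 0 the block passes x through
```

When the residual weights are zero the block reduces to the identity for non-negative inputs, which is the behaviour the reformulation is meant to make easy to learn.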
E2E: Classification: ResNet
83
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385
(2015). [slides]
84
E2E: Classification: Humans
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual
recognition challenge. arXiv preprint arXiv:1409.0575. [web]
85
E2E: Classification: Humans
“Is this a Border terrier” ?
Crowdsourcing
Yes
No
● Binary ground truth annotation from the crowd.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual
recognition challenge. arXiv preprint arXiv:1409.0575. [web]
86
● Annotation Problems:
Carlier, Axel, Amaia Salvador, Ferran Cabezas, Xavier Giro-i-Nieto, Vincent Charvillat, and Oge Marques. "Assessment of
crowdsourcing and gamification loss in user-assisted object segmentation." Multimedia Tools and Applications (2015): 1-28.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual
recognition challenge. arXiv preprint arXiv:1409.0575. [web]
Crowdsource loss (0.3%) More than 5 object classes
E2E: Classification: Humans
87
Andrej Karpathy, “What I learned from competing against a computer on ImageNet” (2014)
● Test data collection from one human. [interface]
E2E: Classification: Humans
88
Andrej Karpathy, “What I learned from competing against a computer on ImageNet” (2014)
● Test data collection from one human. [interface]
“Aww, a cute dog! Would you like to spend 5 minutes scrolling through 120
breeds of dog to guess what species it is ?”
E2E: Classification: Humans
E2E: Classification: Humans
89
NVIDIA, “Mocha.jl: Deep Learning for Julia” (2015)
ResNet
90
Let’s play a
game!
91
92
What have you seen?
93
Tower
94
Tower
House
95
Tower
House
Rocks
96
E2E: Saliency
Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB 2015)
97
Eye Tracker Mouse Click
Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB 2015)
E2E: Saliency
98
E2E: Saliency: JuntingNet
Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional
Networks for Saliency Prediction." CVPR 2016
99
E2E: Saliency: JuntingNet
Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional
Networks for Saliency Prediction." CVPR 2016
100
DATASET              TRAIN    VALIDATION   TEST
SALICON              10,000   5,000        5,000    (large scale)
iSUN                 6,000    926          2,000    (large scale)
CAT2000 [Borji’15]   2,000    -            2,000
MIT300 [Judd’12]     300      -            -
E2E: Saliency: JuntingNet
Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional
Networks for Saliency Prediction." CVPR 2016
101
E2E: Saliency: JuntingNet
102
Upsample
+ filter
2D
map
96x96 2304=48x48
IMAGE
INPUT
(RGB)
E2E: Saliency: JuntingNet
103
Upsample
+ filter
2D
map
96x96 2304=48x48
3 CONV LAYERS
E2E: Saliency: JuntingNet
104
Upsample
+ filter
2D
map
96x96 2304=48x48
2 DENSE
LAYERS
E2E: Saliency: JuntingNet
105
Upsample
+ filter
2D
map
96x96 2304=48x48
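The output stage of this network can be sketched as two steps: the last dense layer produces 48×48 = 2304 values, which are reshaped into a 2D map and then upsampled. The sketch below uses nearest-neighbour upsampling by a factor of 2 and omits the filtering step; both the interpolation method and the target size of 96x96 are assumptions for illustration, and the helper names are hypothetical.

```python
def reshape_to_map(flat, side):
    """Reshape a flat vector of side*side values into a square 2D map."""
    if len(flat) != side * side:
        raise ValueError("expected %d values, got %d" % (side * side, len(flat)))
    return [flat[r * side:(r + 1) * side] for r in range(side)]


def upsample_nearest(fmap, factor=2):
    """Nearest-neighbour upsampling: repeat every pixel factor x factor times."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


dense_output = [float(i) for i in range(48 * 48)]   # stand-in for the 2304 network outputs
saliency_48 = reshape_to_map(dense_output, 48)
saliency_96 = upsample_nearest(saliency_48, factor=2)
print(len(saliency_48), len(saliency_96), len(saliency_96[0]))
```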
E2E: Saliency: JuntingNet
106
E2E: Saliency: JuntingNet
107
Loss function: Mean Square Error (MSE)
Weight initialization: Gaussian distribution
Learning rate: 0.03 to 0.0001
Mini-batch size: 128
Training time: 7 h (SALICON) / 4 h (iSUN)
Acceleration: SGD + Nesterov momentum (0.9)
Regularisation: Maxout norm
GPU: NVIDIA GTX 980
E2E: Saliency: JuntingNet
Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional
Networks for Saliency Prediction." CVPR 2016
108
Number of iterations
(Training time)
● Back-propagation with the Euclidean distance.
● Training curve for the SALICON database.
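Training with a Euclidean loss means the network is penalized by the squared pixel-wise difference between the predicted and the ground-truth saliency maps. A minimal sketch of the mean square error between two maps stored as nested lists follows; the toy 2x2 maps are hypothetical values, not data from the paper.

```python
def mse(predicted, target):
    """Mean square error between two 2D maps of the same size."""
    total, count = 0.0, 0
    for p_row, t_row in zip(predicted, target):
        for p, t in zip(p_row, t_row):
            total += (p - t) ** 2
            count += 1
    return total / count


predicted_map = [[0.0, 1.0],
                 [0.5, 0.5]]
ground_truth = [[0.0, 0.0],
                [0.5, 1.0]]
print(mse(predicted_map, ground_truth))
```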
E2E: Saliency: JuntingNet
Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional
Networks for Saliency Prediction." CVPR 2016
109
JuntingNet Ground Truth Pixels
E2E: Saliency: JuntingNet: iSUN
Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional
Networks for Saliency Prediction." CVPR 2016
110
JuntingNet Ground Truth Pixels
E2E: Saliency: JuntingNet: iSUN
Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional
Networks for Saliency Prediction." CVPR 2016
111
Results from CVPR LSUN Challenge 2015
E2E: Saliency: JuntingNet: iSUN
Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional
Networks for Saliency Prediction." CVPR 2016
112
JuntingNet Ground Truth Pixels
E2E: Saliency: JuntingNet: SALICON
Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional
Networks for Saliency Prediction." CVPR 2016
113
JuntingNet Ground Truth Pixels
E2E: Saliency: JuntingNet: SALICON
Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional
Networks for Saliency Prediction." CVPR 2016
114
Results from CVPR LSUN Challenge 2015
E2E: Saliency: JuntingNet: SALICON
Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional
Networks for Saliency Prediction." CVPR 2016
115
E2E: Saliency: JuntingNet
Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional
Networks for Saliency Prediction." CVPR 2016
116
Part I: End-to-end learning (E2E)
Domain B
Fine-tuned
Learned
Representation
Part I’: End-to-End Fine-Tuning (FT)
Part I: End-to-end learning (E2E)
Domain A
Learned
Representation
Part I: End-to-end learning (E2E)
Transfer
Outline for Part I: Image Analytics
117
E2E: Fine-tuning
Fine-tuning a pre-trained network
Slide credit: Victor Campos, “Layer-wise CNN surgery for Visual Sentiment Prediction” (ETSETB 2015)
118
E2E: Fine-tuning
Slide credit: Victor Campos, “Layer-wise CNN surgery for Visual Sentiment Prediction” (ETSETB 2015)
Fine-tuning a pre-trained network
119
E2E: Fine-tuning: Sentiments
CNN
Campos, Victor, Amaia Salvador, Xavier Giro-i-Nieto, and Brendan Jou. "Diving Deep into Sentiment: Understanding
Fine-tuned CNNs for Visual Sentiment Prediction." In Proceedings of the 1st International Workshop on Affect &
Sentiment in Multimedia, pp. 57-62. ACM, 2015.
120
Campos, Victor, Xavier Giro-i-Nieto, and Brendan Jou. “From pixels to sentiments” (Submitted)
Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully Convolutional Networks for Semantic
Segmentation." CVPR 2015
E2E: Fine-tuning: Sentiments
True positive True negative False positive False negative
Visualizations with fully convolutional networks.
121
E2E: Fine-tuning: Cultural events
ChaLearn Workshop
A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., “Cultural
Event Recognition with Visual ConvNets and Temporal Models”, in CVPR ChaLearn Looking at People
Workshop 2015, 2015. [slides]
122
VGG +
Fine tuned
E2E: Fine-tuning: Saliency prediction
123
E2E: Fine-tuning: Saliency prediction
Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional
Networks for Saliency Prediction." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
From
scratch
VGG +
Fine tuned
124
E2E: Fine-tuning: Saliency prediction
Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional
Networks for Saliency Prediction." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
125
Task A
(eg. image classification)
Learned
Representation
Part I: End-to-end learning (E2E)
Task B
(eg. image retrieval)
Part II: Off-The-Shelf features (OTS)
Outline for Part I: Image Analytics...
126
Razavian, Ali, Hossein Azizpour, Josephine Sullivan, and Stefan Carlsson. "CNN features off-the-shelf:
an astounding baseline for recognition." CVPRW 2014
Off-The-Shelf (OTS) Features
● Intermediate features can be used as regular visual descriptors for any task.
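Using intermediate features as descriptors typically amounts to extracting one vector per image, normalizing it, and comparing vectors with a simple similarity. The sketch below ranks a small database by cosine similarity to a query after L2 normalization; the 2D vectors stand in for real CNN features and are hypothetical, as are the function names.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else list(v)


def rank_by_similarity(query, database):
    """Return database indices sorted from most to least similar to the query."""
    q = l2_normalize(query)
    scored = [(sum(a * b for a, b in zip(q, l2_normalize(d))), idx)
              for idx, d in enumerate(database)]
    return [idx for _, idx in sorted(scored, reverse=True)]


query_feature = [1.0, 0.0]
database_features = [[0.0, 1.0], [2.0, 0.1], [1.0, 1.0]]
print(rank_by_similarity(query_feature, database_features))
```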
127
Off-The-Shelf (OTS) Features
Babenko, Artem, et al. "Neural codes for image retrieval." Computer Vision–ECCV 2014
128
OTS: Classification: Razavian
Razavian, Ali, Hossein Azizpour, Josephine Sullivan, and Stefan Carlsson. "CNN features off-the-shelf:
an astounding baseline for recognition." CVPRW 2014
Pascal VOC 2007
129
OTS: Classification: Return of devil
Chatfield, K., Simonyan, K., Vedaldi, A. and Zisserman, A.. Return of the devil in the details: Delving
deep into convolutional nets. BMVC 2014
Classifier
L2-normalization
Accuracy
+5%
Three representative architectures considered:
AlexNet
ZF
OverFeat
5 days (fast)
3 weeks (slow)
@ NVIDIA GTX Titan GPU
130
OTS: Classification: Return of devil
Chatfield, K., Simonyan, K., Vedaldi, A. and Zisserman, A.. Return of the devil in the details: Delving
deep into convolutional nets. BMVC 2014
F
C
131
OTS: Classification: Return of devil
Data augmentation
Chatfield, K., Simonyan, K., Vedaldi, A. and Zisserman, A.. Return of the devil in the details: Delving
deep into convolutional nets. BMVC 2014
132
OTS: Classification: Return of devil
Fisher
Kernels
(FK)
ConvNets
(CNN)
Color
Gray Scale (GS)
Accuracy
-2.5%
133
OTS: Classification: Return of devil
Chatfield, K., Simonyan, K., Vedaldi, A. and Zisserman, A.. Return of the devil in the details: Delving
deep into convolutional nets. BMVC 2014
Dimensionality reduction by retraining the last layer to smaller sizes.
Accuracy
-2%
Size
x32
134
OTS: Classification: Return of devil
Chatfield, K., Simonyan, K., Vedaldi, A. and Zisserman, A.. Return of the devil in the details: Delving
deep into convolutional nets. BMVC 2014
Ranking
Summary of the paper by Amaia Salvador on Bitsearch.
Babenko, Artem, et al. "Neural codes for image retrieval." Computer Vision–ECCV 2014. Springer
International Publishing, 2014. 584-599.
135
OTS: Retrieval
Oxford Buildings Inria Holidays UKB
136
OTS: Retrieval
Pooled from the network from Krizhevsky et al. pretrained with images from
ImageNet.
137
OTS: Retrieval: FC layers
Summary of the paper by Amaia Salvador on Bitsearch.
Babenko, Artem, et al. "Neural codes for image retrieval." Computer Vision–ECCV 2014.
Off-the-shelf CNN descriptors from fully connected layers are useful but not
superior to other descriptors (FV, VLAD, Sparse Coding,...)
138
Babenko, Artem, et al. "Neural codes for image retrieval." Computer Vision–ECCV 2014.
OTS: Retrieval: FC layers
Razavian et al, A baseline for visual instance retrieval with deep convolutional networks, ICLR 2015.
139
Convolutional layers have shown better performance than fully connected ones.
OTS: Retrieval: Conv layers
Razavian et al, A baseline for visual instance retrieval with deep convolutional networks, ICLR 2015.
140
Spatial search (extracting N local descriptors from predefined locations) increases
performance at a computational cost.
Medium memory footprints
Razavian et al, A baseline for visual instance retrieval with deep convolutional networks, ICLR 2015.
141
OTS: Retrieval: Conv layers
142
OTS: Summarization
Bolaños M, Mestre R, Talavera E, Giró-i-Nieto X, Radeva P. Visual Summary of Egocentric Photostreams by Representative
Keyframes. In: IEEE International Workshop on Wearable and Ego-vision Systems for Augmented Experience (WEsAX) 2015. Turin, Italy:
2015
Clustering based on Euclidean distance over FC7 features from AlexNet.
143
Thank you !
https://imatge.upc.edu/web/people/xavier-giro
https://twitter.com/DocXavi
https://www.facebook.com/ProfessorXavi
xavier.giro@upc.edu
Xavier Giró-i-Nieto

More Related Content

What's hot

Deep Learning for Video: Action Recognition (UPC 2018)
Deep Learning for Video: Action Recognition (UPC 2018)Deep Learning for Video: Action Recognition (UPC 2018)
Deep Learning for Video: Action Recognition (UPC 2018)
Universitat Politècnica de Catalunya
 
Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks
Temporal Activity Detection in Untrimmed Videos with Recurrent Neural NetworksTemporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks
Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks
Universitat Politècnica de Catalunya
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Universitat Politècnica de Catalunya
 
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN BarcelonaDeep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
Universitat Politècnica de Catalunya
 
Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...
Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...
Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...
Universitat Politècnica de Catalunya
 
Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...
Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...
Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...
Universitat Politècnica de Catalunya
 
Language and Vision (D3L5 2017 UPC Deep Learning for Computer Vision)
Language and Vision (D3L5 2017 UPC Deep Learning for Computer Vision)Language and Vision (D3L5 2017 UPC Deep Learning for Computer Vision)
Language and Vision (D3L5 2017 UPC Deep Learning for Computer Vision)
Universitat Politècnica de Catalunya
 
Video Analysis (D4L2 2017 UPC Deep Learning for Computer Vision)
Video Analysis (D4L2 2017 UPC Deep Learning for Computer Vision)Video Analysis (D4L2 2017 UPC Deep Learning for Computer Vision)
Video Analysis (D4L2 2017 UPC Deep Learning for Computer Vision)
Universitat Politècnica de Catalunya
 
Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...
Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...
Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...
Universitat Politècnica de Catalunya
 
AI&BigData Lab. Артем Чернодуб "Распознавание изображений методом Lazy Deep ...
AI&BigData Lab. Артем Чернодуб  "Распознавание изображений методом Lazy Deep ...AI&BigData Lab. Артем Чернодуб  "Распознавание изображений методом Lazy Deep ...
AI&BigData Lab. Артем Чернодуб "Распознавание изображений методом Lazy Deep ...
GeeksLab Odessa
 
Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)
Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)
Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)
Universitat Politècnica de Catalunya
 
Deep Video Object Tracking - Xavier Giro - UPC Barcelona 2019
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
 
Curriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object SegmentationCurriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object Segmentation
 
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
 
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
 
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
 
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
 


Deep Learning for Computer Vision (1/4): Image Analytics @ laSalle 2016

  • 1. @DocXavi Deep Learning for Computer Vision Image Analytics 5 May 2016 Xavier Giró-i-Nieto Master en Creació Multimedia
  • 3. Introduction. Xavier Giro-i-Nieto, Associate Professor at Universitat Politecnica de Catalunya (UPC). Web: https://imatge.upc.edu/web/people/xavier-giro
  • 6. One lecture organized in three parts. Deep ConvNets for Recognition for... Images (global), Objects (local), Video (2D+T)
  • 8. Previously, before deep learning... Dog. Slide credit: Jose M Àlvarez
  • 9. Previously, before deep learning... Dog. Learned Representation. Slide credit: Jose M Àlvarez
  • 10. Outline for Part I: Image Analytics... Dog. Learned Representation. Part I: End-to-end learning (E2E)
  • 11. Outline for Part I: Image Analytics... Learned Representation. Part I: End-to-end learning (E2E): Task A (e.g. image classification)
  • 12. Outline for Part I: Image Analytics... Learned Representation. Part I: End-to-end learning (E2E): Task A (e.g. image classification). Part II: Off-the-shelf features: Task B (e.g. image retrieval)
  • 14. E2E: Classification: Supervised learning. Diagram labels: Manual Annotations, Model, New Image, Automatic classification, Training, Test, Anchor
  • 15. E2E: Classification: LeNet-5. LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.
  • 16. E2E: Classification: LeNet-5. Demo: 3D Visualization of a Convolutional Neural Network. Harley, Adam W. "An Interactive Node-Link Visualization of Convolutional Neural Networks." In Advances in Visual Computing, pp. 867-877. Springer International Publishing, 2015.
  • 17. E2E: Classification: Similar to LeNet-5. Demo: Classify MNIST digits with a Convolutional Neural Network. “ConvNetJS is a Javascript library for training Deep Learning models (mainly Neural Networks) entirely in your browser. Open a tab and you're training. No software requirements, no compilers, no installations, no GPUs, no sweat.”
  • 18. E2E: Classification: Databases. Li Fei-Fei, “How we’re teaching computers to understand pictures” TEDTalks 2014. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web]
  • 19. E2E: Classification: Databases. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web]
  • 20. E2E: Classification: Databases ● 205 scene classes (categories). ● Images: ○ 2.5M train ○ 20.5k validation ○ 41k test. Zhou, Bolei, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, and Aude Oliva. "Learning deep features for scene recognition using places database." In Advances in neural information processing systems, pp. 487-495. 2014. [web]
  • 21. E2E: Classification: ImageNet ILSVRC ● 1000 object classes (categories). ● Images: ○ 1.2M train ○ 100k test.
  • 22. E2E: Classification: ImageNet ILSVRC ● Predict 5 classes: a prediction counts as correct when the ground-truth class is among the five guesses (top-5 error). Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web]
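The "predict 5 classes" rule on the slide above can be sketched as a top-k error computation. This is a minimal plain-Python illustration; the function name `top_k_error` and the toy scores are assumptions made for the example, not part of the lecture.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of samples whose true label is not among the k highest scores."""
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score, truncated to k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Toy example: 2 images, 6 classes, made-up scores.
scores = [
    [0.05, 0.10, 0.40, 0.20, 0.15, 0.10],
    [0.50, 0.20, 0.10, 0.10, 0.07, 0.03],
]
labels = [3, 5]

top5 = top_k_error(scores, labels, k=5)  # first image is a hit, second a miss
top1 = top_k_error(scores, labels, k=1)  # both images are misses
```

With five guesses allowed, the first image is counted as correct even though its true class is only the second-highest score, which is why top-5 error is always at most the top-1 error.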
  • 23. E2E: Classification: ILSVRC. Image Classification 2012: -9.8%. Slide credit: Rob Fergus (NYU). Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web]
  • 24. E2E: Classification: AlexNet (Supervision). Orange. A Krizhevsky, I Sutskever, GE Hinton “Imagenet classification with deep convolutional neural networks” Part of: Advances in Neural Information Processing Systems 25 (NIPS 2012). Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015)
  • 25. E2E: Classification: AlexNet (Supervision). Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015)
  • 27. E2E: Classification: AlexNet (Supervision). Image credit: Deep learning Tutorial (Stanford University)
  • 30. E2E: Classification: AlexNet (Supervision). Rectified Linear Unit (non-linearity): f(x) = max(0,x). Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015)
  • 31. E2E: Classification: AlexNet (Supervision). Dot Product. Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB-UPC 2015)
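The two operations named on the slides above, the dot product and the ReLU non-linearity f(x) = max(0,x), combine into a single unit as in the sketch below. The weights and inputs are made-up values, and the bias term is omitted for brevity (an assumption of this example).

```python
def relu(x):
    """Rectified Linear Unit: f(x) = max(0, x)."""
    return max(0.0, x)


def neuron(weights, inputs):
    """Dot product of weights and inputs followed by ReLU (bias omitted)."""
    return relu(sum(w * x for w, x in zip(weights, inputs)))


weights = [0.5, -1.0, 2.0]                   # made-up values for illustration
silent = neuron(weights, [1.0, 2.0, 0.25])   # dot product is -1.0, clipped to 0.0
active = neuron(weights, [2.0, 0.5, 1.0])    # dot product is 2.5, passed through
```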
  • 32. E2E: Classification: ImageNet ILSVRC. ImageNet Classification 2013. Slide credit: Rob Fergus (NYU). Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web]
  • 33. E2E: Classification: Visualize: ZF. The development of better convnets is reduced to trial-and-error. Visualization can help in proposing better architectures. Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer International Publishing.
  • 34. E2E: Classification: Visualize: ZF. “A convnet model that uses the same components (filtering, pooling) but in reverse, so instead of mapping pixels to features does the opposite.” Zeiler, Matthew D., Graham W. Taylor, and Rob Fergus. "Adaptive deconvolutional networks for mid and high level feature learning." Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011.
  • 35. E2E: Classification: Visualize: ZF. DeconvNet and ConvNet. Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer International Publishing.
  • 39. E2E: Classification: Visualize: ZF. “To examine a given convnet activation, we set all other activations in the layer to zero and pass the feature maps as input to the attached deconvnet layer.”
  • 41. E2E: Classification: Visualize: ZF. “(i) Unpool: In the convnet, the max pooling operation is non-invertible, however we can obtain an approximate inverse by recording the locations of the maxima within each pooling region in a set of switch variables.”
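The unpooling step quoted above can be sketched in plain Python: max pooling records a "switch" (the position of each maximum), and unpooling places each pooled value back at that position, leaving zeros elsewhere. The function names, the 2x2 non-overlapping regions and the toy feature map are assumptions of this example, and the sketch expects dimensions divisible by the pooling size.

```python
def max_pool_with_switches(fmap, size=2):
    """Non-overlapping max pooling that also records where each maximum was."""
    pooled, switches = [], []
    for r in range(0, len(fmap), size):
        pooled_row, switch_row = [], []
        for c in range(0, len(fmap[0]), size):
            # All (value, position) pairs inside this pooling region.
            region = [(fmap[i][j], (i, j))
                      for i in range(r, r + size)
                      for j in range(c, c + size)]
            value, position = max(region, key=lambda t: t[0])
            pooled_row.append(value)
            switch_row.append(position)
        pooled.append(pooled_row)
        switches.append(switch_row)
    return pooled, switches


def unpool(pooled, switches, height, width):
    """Approximate inverse: put each pooled value back at its recorded location."""
    out = [[0.0] * width for _ in range(height)]
    for pooled_row, switch_row in zip(pooled, switches):
        for value, (i, j) in zip(pooled_row, switch_row):
            out[i][j] = value
    return out


fmap = [
    [1.0, 3.0, 2.0, 0.0],
    [4.0, 2.0, 1.0, 5.0],
    [0.0, 1.0, 7.0, 2.0],
    [6.0, 2.0, 3.0, 1.0],
]
pooled, switches = max_pool_with_switches(fmap)
restored = unpool(pooled, switches, 4, 4)
```

The restored map keeps only the four maxima in their original positions, which is why the quote calls this an approximate inverse: the non-maximal values are lost.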
  • 42. E2E: Classification: Visualize: ZF. “(ii) Rectification: The convnet uses ReLU non-linearities, which rectify the feature maps thus ensuring the feature maps are always positive.”
  • 43. E2E: Classification: Visualize: ZF. “(iii) Filtering: The convnet uses learned filters to convolve the feature maps from the previous layer. To approximately invert this, the deconvnet uses transposed versions of the same filters (as other autoencoder models, such as RBMs), but applied to the rectified maps, not the output of the layer beneath. In practice this means flipping each filter vertically and horizontally.”
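The last sentence of the filtering step quoted above, flipping each filter vertically and horizontally, amounts to a 180-degree rotation of the kernel. A minimal sketch, with a hypothetical helper name and a made-up 3x3 kernel:

```python
def flip_filter(kernel):
    """Flip a 2D filter vertically and horizontally (a 180-degree rotation)."""
    return [row[::-1] for row in kernel[::-1]]


kernel = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
flipped = flip_filter(kernel)
```

Flipping twice returns the original filter, so the operation is its own inverse.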
  • 45. E2E: Classification: Visualize: ZF. Top 9 activations in a random subset of feature maps across the validation data, projected down to pixel space using our deconvolutional network approach. Corresponding image patches.
  • 49. E2E: Classification: Visualize: ZF. Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer International Publishing.
  • 51. E2E: Classification: Visualize: ZF. AlexNet (Layer 1) vs Clarifai (Layer 1): the smaller stride (2 vs 4) and filter size (7x7 vs 11x11) result in more distinctive features and fewer “dead” features.
  • 52. E2E: Classification: Visualize: ZF. AlexNet (Layer 2) vs Clarifai (Layer 2): cleaner features in Clarifai, without the aliasing artifacts caused by the stride 4 used in AlexNet.
  • 53. E2E: Classification: Dropout: ZF. Regularization with dropout: overfitting is reduced by randomly setting to zero the output of a portion (typically 50%) of the intermediate neurons during training. Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580.
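Dropout as described on the slide above can be sketched in a few lines. Note the assumption: this example uses the "inverted" variant, which rescales the surviving activations at training time, whereas the cited paper instead compensates by scaling the outgoing weights at test time. The function name and parameters are illustrative.

```python
import random


def dropout(activations, drop_prob=0.5, training=True, rng=random):
    """Zero each activation with probability drop_prob during training.

    Surviving activations are divided by the keep probability so that the
    expected value is unchanged (inverted dropout, an assumption here).
    """
    if not training:
        return list(activations)  # test time: every unit is kept
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


random.seed(0)  # fixed seed so the sketch is repeatable
hidden = [1.0] * 10
dropped = dropout(hidden)                    # roughly half the entries become 0.0
unchanged = dropout(hidden, training=False)  # identical to the input
```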
  • 56. E2E: Classification: ImageNet ILSVRC. ImageNet Classification 2013: -5%. Slide credit: Rob Fergus (NYU). Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web]
  • 60. E2E: Classification: GoogLeNet ● 22 layers, but 12 times fewer parameters than AlexNet. Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. "Going deeper with convolutions."
  • 61. E2E: Classification: GoogLeNet ● Challenges of going deeper: ○ Overfitting, due to the increased number of parameters. ○ Inefficient computation if most weights end up close to zero. Solution: sparsity. How? Inception modules.
  • 63. E2E: Classification: GoogLeNet. Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.
  • 65. E2E: Classification: GoogLeNet (NiN). 3x3 and 5x5 convolutions deal with different scales. Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014. [Slides]
  • 66. E2E: Classification: GoogLeNet (NiN). 1x1 convolutions perform dimensionality reduction (c3<c2) and account for rectified linear units (ReLU). Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014. [Slides]
  • 67. E2E: Classification: GoogLeNet (NiN). In NiN, the Cascaded 1x1 Convolutions compute reductions after the convolutions. Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014. [Slides]
  • 68. E2E: Classification: GoogLeNet. In GoogLeNet, the Cascaded 1x1 Convolutions compute reductions before the expensive 3x3 and 5x5 convolutions.
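A quick weight count shows why computing the 1x1 reduction before an expensive 5x5 convolution pays off. The channel counts below (192 input channels, a 16-channel bottleneck, 32 output channels) are illustrative assumptions, not values from the slides, and biases are ignored.

```python
def conv_weights(kernel, c_in, c_out):
    """Number of weights in a kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out


# Direct 5x5 convolution on all 192 input channels.
direct = conv_weights(5, 192, 32)

# 1x1 reduction to 16 channels first, then the 5x5 convolution.
reduced = conv_weights(1, 192, 16) + conv_weights(5, 16, 32)

saving = direct / reduced  # roughly a 10x reduction in weights
```

The same ratio applies to the multiply-accumulate count per output position, since every weight is used once per position.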
  • 69. E2E: Classification: GoogLeNet. Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.
  • 70. E2E: Classification: GoogLeNet. 3x3 max pooling introduces some spatial invariance, and adding it as an alternative parallel path has proven beneficial.
  • 71. E2E: Classification: GoogLeNet. Two Softmax Classifiers at intermediate layers combat the vanishing gradient while providing regularization at training time. ...and no fully connected layers needed!
  • 73. E2E: Classification: GoogLeNet. NVIDIA, “NVIDIA and IBM Cloud Support ImageNet Large Scale Visual Recognition Challenge” (2015)
  • 74. E2E: Classification: GoogLeNet. Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. "Going deeper with convolutions." CVPR 2015. [video] [slides] [poster]
  • 75. E2E: Classification: VGG. Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." International Conference on Learning Representations (2015). [video] [slides] [project]
  • 77. E2E: Classification: VGG: 3x3 Stacks. Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." International Conference on Learning Representations (2015). [video] [slides] [project]
  • 78. E2E: Classification: VGG ● No poolings between some convolutional layers. ● Convolution strides of 1 (no skipping). Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." International Conference on Learning Representations (2015). [video] [slides] [project]
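The idea behind the 3x3 stacks with stride 1 can be checked with simple arithmetic: three stacked 3x3 convolutions see the same 7x7 window as a single 7x7 convolution while using fewer weights. The sketch below assumes every layer keeps the same number of channels; the width of 256 is an illustrative choice, not a value from the slides.

```python
def stacked_receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convolutions."""
    return layers * (kernel - 1) + 1


def stack_weights(kernel, layers, channels):
    """Weights (no biases) when every layer maps `channels` to `channels`."""
    return layers * kernel * kernel * channels * channels


channels = 256  # illustrative width, an assumption of this example

three_3x3_field = stacked_receptive_field(3, 3)    # same window as one 7x7
three_3x3_weights = stack_weights(3, 3, channels)  # 27 * channels**2
one_7x7_weights = stack_weights(7, 1, channels)    # 49 * channels**2
```

The stack also inserts a non-linearity after each of its three layers, where the single 7x7 layer has only one.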
  • 79. E2E: Classification. 3.6% top-5 error… with 152 layers!
  • 80. E2E: Classification: ResNet. He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385 (2015). [slides]
  • 81. E2E: Classification: ResNet ● Deeper networks (34 is deeper than 18) are more difficult to train. Thin curves: training error. Bold curves: validation error. He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385 (2015). [slides]
  • 82. E2E: Classification: ResNet ● Residual learning: reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385 (2015). [slides]
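Residual learning as stated on the slide above can be sketched as y = ReLU(F(x) + x), where F is the function the stacked layers learn and x is carried over by the shortcut. As a simplification (an assumption of this example), F uses two small fully connected layers instead of the convolutional layers of the paper, and the helper names are hypothetical.

```python
def relu_vec(values):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in values]


def linear(weights, values):
    """Matrix-vector product; `weights` is a list of rows."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


def residual_block(x, w1, w2):
    """y = ReLU(F(x) + x) with F(x) = W2 * ReLU(W1 * x)."""
    residual = linear(w2, relu_vec(linear(w1, x)))
    return relu_vec([r + xi for r, xi in zip(residual, x)])


x = [1.0, 2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]

# With all-zero weights the residual F(x) vanishes and the block is the identity.
identity_out = residual_block(x, zeros, zeros)
```

This illustrates why the reformulation helps depth: a block that has nothing useful to add only needs to push its weights towards zero to pass its input through unchanged.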
  • 83. E2E: Classification: ResNet. He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385 (2015). [slides]
  • 84. E2E: Classification: Humans. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web]
  • 85. E2E: Classification: Humans ● Binary ground truth annotation from the crowd. Crowdsourcing: “Is this a Border terrier?” Yes / No. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web]
  • 86. E2E: Classification: Humans ● Annotation Problems: crowdsource loss (0.3%); more than 5 object classes. Carlier, Axel, Amaia Salvador, Ferran Cabezas, Xavier Giro-i-Nieto, Vincent Charvillat, and Oge Marques. "Assessment of crowdsourcing and gamification loss in user-assisted object segmentation." Multimedia Tools and Applications (2015): 1-28. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575. [web]
  • 87. E2E: Classification: Humans ● Test data collection from one human. [interface] Andrej Karpathy, “What I learned from competing against a computer on ImageNet” (2014)
  • 88. E2E: Classification: Humans ● Test data collection from one human. [interface] “Aww, a cute dog! Would you like to spend 5 minutes scrolling through 120 breeds of dog to guess what species it is?” Andrej Karpathy, “What I learned from competing against a computer on ImageNet” (2014)
  • 89. E2E: Classification: Humans. ResNet. NVIDIA, “Mocha.jl: Deep Learning for Julia” (2015)
  • 96. E2E: Saliency. Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB 2015)
  • 97. E2E: Saliency. Eye Tracker, Mouse Click. Slide credit: Junting Pan, “Visual Saliency Prediction using Deep Learning Techniques” (ETSETB 2015)
  • 98. E2E: Saliency: JuntingNet. Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016
  • 100. E2E: Saliency: JuntingNet. Dataset sizes (TRAIN / VALIDATION / TEST). Large Scale: 10,000 / 5,000 / 5,000 and 6,000 / 926 / 2,000. CAT2000 [Borji’15]: 2,000 / - / 2,000. MIT300 [Judd’12]: 300 / - / -. Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016
  • 103. E2E: Saliency: JuntingNet. 3 CONV LAYERS. 2304 = 48x48. Upsample + filter. 2D map 96x96
  • 104. E2E: Saliency: JuntingNet. 2 DENSE LAYERS. 2304 = 48x48. Upsample + filter. 2D map 96x96
  • 107. 107 E2E: Saliency: JuntingNet. Training setup: loss function: Mean Square Error (MSE); weight initialization: Gaussian distribution; learning rate: 0.03 to 0.0001; mini-batch size: 128; training time: 7h (SALICON) / 4h (iSUN); acceleration: SGD + Nesterov momentum (0.9); regularisation: maxout norm; GPU: NVIDIA GTX 980. Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016
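To make the training recipe above concrete, here is a minimal sketch of an MSE loss minimised with SGD plus Nesterov momentum (0.9) while the learning rate moves from 0.03 to 0.0001. Several things are assumptions for illustration only: the linear shape of the learning-rate schedule, the number of steps, and the toy problem (fitting a three-value vector instead of training a network). All function names are hypothetical.

```python
def mse(pred, target):
    """Mean square error between two equally long lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mse_grad(pred, target):
    """Gradient of the MSE with respect to each prediction."""
    n = len(pred)
    return [2.0 * (p - t) / n for p, t in zip(pred, target)]


def linear_lr(step, n_steps, start=0.03, end=0.0001):
    """Learning rate interpolated linearly from `start` to `end` (assumed shape)."""
    if n_steps <= 1:
        return start
    return start + (end - start) * step / (n_steps - 1)


def train(target, n_steps=200, momentum=0.9):
    """Fit weights to `target` with SGD + Nesterov momentum on the MSE."""
    w = [0.0] * len(target)
    v = [0.0] * len(target)
    for step in range(n_steps):
        lr = linear_lr(step, n_steps)
        # Nesterov: evaluate the gradient at the look-ahead position.
        lookahead = [wi + momentum * vi for wi, vi in zip(w, v)]
        grad = mse_grad(lookahead, target)
        v = [momentum * vi - lr * gi for vi, gi in zip(v, grad)]
        w = [wi + vi for wi, vi in zip(w, v)]
    return w


target = [1.0, -2.0, 0.5]
weights = train(target)
final_loss = mse(weights, target)
```

In a real network the gradient would come from back-propagation over mini-batches of 128 images rather than from a closed-form expression.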
  • 108. 108 Number of iterations (Training time) ● Back-propagation with the Euclidean distance. ● Training curve for the SALICON database. E2E: Saliency: JuntingNet Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016
  • 109. 109 JuntingNetGround TruthPixels E2E: Saliency: JuntingNet: iSUN Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016
  • 110. 110 JuntingNetGround TruthPixels E2E: Saliency: JuntingNet: iSUN Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016
  • 111. 111 Results from CVPR LSUN Challenge 2015 E2E: Saliency: JuntingNet: iSUN Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016
  • 112. 112 JuntingNetGround TruthPixels E2E: Saliency: JuntingNet: SALICON Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016
  • 113. 113 JuntingNetGround TruthPixels E2E: Saliency: JuntingNet: SALICON Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016
  • 114. 114 Results from CVPR LSUN Challenge 2015 E2E: Saliency: JuntingNet: SALICON Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016
  • 115. 115 E2E: Saliency: JuntingNet Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016
  • 116. 116 Outline for Part I: Image Analytics. Part I: End-to-end learning (E2E): Domain A, Learned Representation. Transfer. Part I’: End-to-End Fine-Tuning (FT): Domain B, Fine-tuned Learned Representation.
  • 117. 117 E2E: Fine-tuning Fine-tuning a pre-trained network Slide credit: Victor Campos, “Layer-wise CNN surgery for Visual Sentiment Prediction” (ETSETB 2015)
  • 118. 118 E2E: Fine-tuning Slide credit: Victor Campos, “Layer-wise CNN surgery for Visual Sentiment Prediction” (ETSETB 2015) Fine-tuning a pre-trained network
  • 119. 119 E2E: Fine-tuning: Sentiments CNN Campos, Victor, Amaia Salvador, Xavier Giro-i-Nieto, and Brendan Jou. "Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction." In Proceedings of the 1st International Workshop on Affect & Sentiment in Multimedia, pp. 57-62. ACM, 2015.
  • 120. 120 Campos, Victor, Xavier Giro-i-Nieto, and Brendan Jou. “From pixels to sentiments” (Submitted) Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully Convolutional Networks for Semantic Segmentation." CVPR 2015 E2E: Fine-tuning: Sentiments True positive True negative False positive False negative Visualizations with fully convolutional networks.
  • 121. 121 E2E: Fine-tuning: Cultural events ChaLearn Workshop A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., “Cultural Event Recognition with Visual ConvNets and Temporal Models”, in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]
  • 122. 122 VGG + Fine tuned E2E: Fine-tuning: Saliency prediction
  • 123. 123 E2E: Fine-tuning: Saliency prediction Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016. From scratch VGG + Fine tuned
  • 124. 124 E2E: Fine-tuning: Saliency prediction Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016.
  • 125. 125 Task A (eg. image classification) Learned Representation Part I: End-to-end learning (E2E) Task B (eg. image retrieval) Part II: Off-The-Shelf features (OTS) Outline for Part I: Image Analytics...
  • 126. 126 Razavian, Ali, Hossein Azizpour, Josephine Sullivan, and Stefan Carlsson. "CNN features off-the-shelf: an astounding baseline for recognition." CVPRW 2014 Off-The-Shelf (OTS) Features
  • 127. ● Intermediate features can be used as regular visual descriptors for any task. 127 Off-The-Shelf (OTS) Features Babenko, Artem, et al. "Neural codes for image retrieval." Computer Vision–ECCV 2014
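As an illustration of using intermediate features as ordinary visual descriptors, the sketch below ranks a small database by Euclidean distance to a query descriptor. The descriptors here are made-up three-dimensional toy vectors, not real network activations, and the function names and the choice of Euclidean distance are assumptions for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_distance(query, database):
    """Return the database keys sorted from nearest to farthest descriptor."""
    return sorted(database, key=lambda key: euclidean(query, database[key]))


# Toy descriptors standing in for features taken from an intermediate layer.
database = {
    "img_a": [0.9, 0.1, 0.0],
    "img_b": [0.0, 1.0, 0.0],
    "img_c": [0.5, 0.5, 0.0],
}
ranking = rank_by_distance([1.0, 0.0, 0.0], database)
```

The same descriptors could equally feed a classifier or a clustering step; only the final comparison changes.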
  • 128. 128 OTS: Classification: Razavian Razavian, Ali, Hossein Azizpour, Josephine Sullivan, and Stefan Carlsson. "CNN features off-the-shelf: an astounding baseline for recognition." CVPRW 2014 Pascal VOC 2007
  • 129. 129 OTS: Classification: Return of devil Chatfield, K., Simonyan, K., Vedaldi, A. and Zisserman, A.. Return of the devil in the details: Delving deep into convolutional nets. BMVC 2014 Classifier L2-normalization Accuracy +5%
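L2-normalization rescales a feature vector to unit Euclidean length before it is passed to the classifier. A minimal sketch follows; the function name and the small `eps` guard against division by zero are assumptions added for the example.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale `vec` to unit L2 norm; `eps` avoids dividing by zero."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


unit = l2_normalize([3.0, 4.0])
```

After normalization, the dot product between two descriptors equals their cosine similarity, which is one common reason for applying it before a linear classifier.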
  • 130. Three representative architectures considered: AlexNet, ZF and OverFeat. Training time: 5 days (fast) to 3 weeks (slow) on an NVIDIA GTX Titan GPU. 130 OTS: Classification: Return of devil Chatfield, K., Simonyan, K., Vedaldi, A. and Zisserman, A.. Return of the devil in the details: Delving deep into convolutional nets. BMVC 2014
  • 131. 131 OTS: Classification: Return of devil Data augmentation (F, C) Chatfield, K., Simonyan, K., Vedaldi, A. and Zisserman, A.. Return of the devil in the details: Delving deep into convolutional nets. BMVC 2014
  • 132. 132 OTS: Classification: Return of devil Fisher Kernels (FK) ConvNets (CNN)
  • 133. Color Gray Scale (GS) Accuracy -2.5% 133 OTS: Classification: Return of devil Chatfield, K., Simonyan, K., Vedaldi, A. and Zisserman, A.. Return of the devil in the details: Delving deep into convolutional nets. BMVC 2014
  • 134. Dimensionality reduction by retraining the last layer with a smaller output size: accuracy drops by 2% while the descriptor becomes 32 times smaller. 134 OTS: Classification: Return of devil Chatfield, K., Simonyan, K., Vedaldi, A. and Zisserman, A.. Return of the devil in the details: Delving deep into convolutional nets. BMVC 2014
  • 135. Ranking Summary of the paper by Amaia Salvador on Bitsearch. Babenko, Artem, et al. "Neural codes for image retrieval." Computer Vision–ECCV 2014. Springer International Publishing, 2014. 584-599. 135 OTS: Retrieval
  • 136. Oxford Buildings Inria Holidays UKB 136 OTS: Retrieval
  • 137. Pooled from the network from Krizhevsky et al. pretrained with images from ImageNet. 137 OTS: Retrieval: FC layers Summary of the paper by Amaia Salvador on Bitsearch. Babenko, Artem, et al. "Neural codes for image retrieval." Computer Vision–ECCV 2014.
  • 138. Off-the-shelf CNN descriptors from fully connected layers prove useful but not superior to FV, VLAD, Sparse Coding, ... 138 Babenko, Artem, et al. "Neural codes for image retrieval." Computer Vision–ECCV 2014. OTS: Retrieval: FC layers
  • 139. Razavian et al, A baseline for visual instance retrieval with deep convolutional networks, ICLR 2015. 139 Convolutional layers have shown better performance than fully connected ones. OTS: Retrieval: Conv layers
  • 140. OTS: Retrieval: Conv layers Razavian et al, A baseline for visual instance retrieval with deep convolutional networks, ICLR 2015. 140 Spatial search (extracting N local descriptors from predefined locations) increases performance at a higher computational cost.
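The idea of spatial search can be sketched as follows: a set of predefined boxes is laid over the image, one descriptor is extracted per box, and the query is matched against the best box. The multi-level non-overlapping grid used here is an assumption for illustration (the slide only says "predefined locations"), as are the function names; extracting the descriptor of each box with a network is outside the sketch.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def grid_boxes(width, height, levels=3):
    """Level l splits the image into an l x l grid; return (x0, y0, x1, y1) boxes."""
    boxes = []
    for level in range(1, levels + 1):
        for row in range(level):
            for col in range(level):
                boxes.append((col * width // level, row * height // level,
                              (col + 1) * width // level, (row + 1) * height // level))
    return boxes


def spatial_search_distance(query_desc, patch_descs):
    """Distance from the query to the closest of the N local descriptors."""
    return min(euclidean(query_desc, d) for d in patch_descs)


boxes = grid_boxes(90, 60, levels=3)  # N = 1 + 4 + 9 = 14 locations
```

The cost grows with N because every database image needs N descriptor extractions and N comparisons per query, which is the trade-off the slide mentions.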
  • 141. Medium memory footprints Razavian et al, A baseline for visual instance retrieval with deep convolutional networks, ICLR 2015. 141 OTS: Retrieval: Conv layers
  • 142. 142 OTS: Summarization Bolaños M, Mestre R, Talavera E, Giró-i-Nieto X, Radeva P. Visual Summary of Egocentric Photostreams by Representative Keyframes. In: IEEE International Workshop on Wearable and Ego-vision Systems for Augmented Experience (WEsAX) 2015. Turin, Italy: 2015 Clustering based on Euclidean distance over FC7 features from AlexNet.
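A possible way to picture clustering on Euclidean distance followed by keyframe selection is sketched below. It is not the method of the cited paper: the centroid-linkage agglomerative procedure, the fixed number of clusters, and the rule of picking the frame closest to each cluster centroid are all assumptions for the example, and the two-dimensional vectors stand in for 4096-dimensional FC7 features.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def agglomerative(features, n_clusters):
    """Merge the two clusters with the closest centroids until n_clusters remain."""
    clusters = [[i] for i in range(len(features))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ci = centroid([features[k] for k in clusters[i]])
                cj = centroid([features[k] for k in clusters[j]])
                d = euclidean(ci, cj)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def keyframes(features, clusters):
    """For each cluster, pick the frame index closest to the cluster centroid."""
    picks = []
    for members in clusters:
        c = centroid([features[k] for k in members])
        picks.append(min(members, key=lambda k: euclidean(features[k], c)))
    return picks


# Toy features: two visually distinct groups of three frames each.
features = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.1],
            [5.0, 5.0], [5.1, 5.0], [4.9, 5.2]]
clusters = agglomerative(features, n_clusters=2)
summary = keyframes(features, clusters)
```

With real photostreams the number of clusters would usually be set by a distance cutoff rather than fixed in advance.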