SlideShare a Scribd company logo
1 of 30
Download to read offline
Deep Learning in Computer Vision
Deep Learning in Action

24. June ’15
Caner Hazırbaş
Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de
Computer Vision Group
2
6 Postdocs, 16 PhD students
Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de
Research in Computer Vision
3
Image-based 3D

Reconstruction
Shape Analysis Robot Vision
RGB-D Vision
Image

Segmentation
Convex 

Relaxation

Methods
Visual SLAM
Optical Flow
Deep Learning 

in Computer Vision
Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de
How to teach a machine ?
5
edges classifier
(or any other hand-crafted features)
Person
Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de
How to teach a machine ?
6
edges classifier
Person
(or any other hand-crafted features)
N
ota
good
representation
Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de
What is deep learning ?
• Representation learning method

Learning good features automatically from raw data
• Learning representations of data with multiple levels of abstraction
7
Google’s cat detection neural network
Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de
Construction of higher 

levels of abstraction
8
w1
w2
w3
“non-linear”

transformation
1
b
Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de
Going deeper in the network
9
Input

‘Pixels’
1st and 2nd Layers
‘Edges’nvolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations
not conditionally independent of one another given
layers above and below. In contrast, our treatment
g undirected edges enables combining bottom-up
top-down information more e ciently, as shown
ection 4.5.
our approach, probabilistic max-pooling helps to
ress scalability by shrinking the higher layers;
ght-sharing (convolutions) further speeds up the
rithm. For example, inference in a three-layer
work (with 200x200 input images) using weight-
ring but without max-pooling was about 10 times
wer. Without weight-sharing, it was more than 100
es slower.
work that was contemporary to and done indepen-
tly of ours, Desjardins and Bengio (2008) also ap-
d convolutional weight-sharing to RBMs and ex-
mented on small image patches. Our work, how-
, develops more sophisticated elements such as
babilistic max-pooling to make the algorithm more
able.
Experimental results
Learning hierarchical representations
from natural images
first tested our model’s ability to learn hierarchi-
representations of natural images. Specifically, we
ned a CDBN with two hidden layers from the Ky-
natural image dataset.3
The first layer consisted
4 groups (or “bases”)4
of 10x10 pixel filters, while
second layer consisted of 100 bases, each one 10x10
well.5
As shown in Figure 2 (top), the learned first
r bases are oriented, localized edge filters; this re-
is consistent with much prior work (Olshausen &
d, 1996; Bell & Sejnowski, 1997; Ranzato et al.,
6). We note that the sparsity regularization dur-
training was necessary for learning these oriented
e filters; when this term was removed, the algo-
m failed to learn oriented edges.
learned second layer bases are shown in Fig-
2 (bottom), and many of them empirically re-
Figure 2. The first layer bases (top) and the second layer
bases (bottom) learned from natural images. Each second
layer basis (filter) was visualized as a weighted linear com-
bination of the first layer bases.
unlabeled data do not share the same class labels, or
the same generative distribution, as the labeled data.
This framework, where generic unlabeled data improve
performance on a supervised learning task, is known
as self-taught learning. In their experiments, they used
sparse coding to train a single-layer representation,
and then used the learned representation to construct
features for supervised learning tasks.
We used a similar procedure to evaluate our two-layer
CDBN, described in Section 4.1, on the Caltech-101
object classification task.6
The results are shown in
Table 1. First, we observe that combining the first
and second layers significantly improves the classifica-
tion accuracy relative to the first layer alone. Overall,
we achieve 57.7% test accuracy using 15 training im-
ages per class, and 65.4% test accuracy using 30 train-
ing images per class. Our result is competitive with
state-of-the-art results using highly-specialized single
features, such as SIFT, geometric blur, and shape-
context (Lazebnik et al., 2006; Berg et al., 2005; Zhang
et al., 2006).7
Recall that the CDBN was trained en-
6
Details: Given an image from the Caltech-101
dataset (Fei-Fei et al., 2004), we scaled the image so that
Convolutional Deep Belief Networks for Scalable Unsu
Table 2. Test error fo
Labeled training samples 1,000
CDBN 2.62±0.12% 2
Ranzato et al. (2007) 3.21%
Hinton and Salakhutdinov (2006) -
Weston et al. (2008) 2.73%
faces cars elephan
Figure 3. Columns 1-4: the second layer bases (top) and th
categories. Column 5: the second layer bases (top) and the
object categories (faces, cars, airplanes, motorbikes).
0.2
0.4
0.6
Faces
first layer
second layer
third layer
0.2
0.4
0.6
Motorbikes
first layer
second layer
third layer
0.2
0.4
0.6
Cars
first layer
second layer
third layer
Convolutional Deep Belief N
Labeled training sample
CDBN
Ranzato et al. (2007)
Hinton and Salakhutdin
Weston et al. (2008)
faces
Figure 3. Columns 1-4: the seco
categories. Column 5: the secon
object categories (faces, cars, airp
0.2 0.4 0.6 0.8 1
0
0.2
0.4
0.6
Area under the PR curve (AUC)
Faces
first layer
second layer
third layer
0.2 0.4 0.6 0
0
0.2
0.4
0.6
Area under the PR cur
Motorbikes
first layer
second layer
third layer
Features Faces M
First layer 0.39±0.17 0.
Second layer 0.86±0.13 0.
Third layer 0.95±0.03 0.
3rd Layer
‘Object Parts’
4th Layer
‘Objects’
ks for Scalable Unsupervised Learning of Hierarchical Representations
Table 2. Test error for MNIST dataset
1,000 2,000 3,000 5,000 60,000
2.62±0.12% 2.13±0.10% 1.91±0.09% 1.59±0.11% 0.82%
3.21% 2.53% - 1.52% 0.64%
06) - - - - 1.20%
2.73% - 1.83% - 1.50%
elephants chairs faces, cars, airplanes, motorbikes
utional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations
Table 2. Test error for MNIST dataset
Labeled training samples 1,000 2,000 3,000 5,000 60,000
CDBN 2.62±0.12% 2.13±0.10% 1.91±0.09% 1.59±0.11% 0.82%
Ranzato et al. (2007) 3.21% 2.53% - 1.52% 0.64%
Hinton and Salakhutdinov (2006) - - - - 1.20%
Weston et al. (2008) 2.73% - 1.83% - 1.50%
faces cars elephants chairs faces, cars, airplanes, motorbikes
3. Columns 1-4: the second layer bases (top) and the third layer bases (bottom) learned from specific object
es. Column 5: the second layer bases (top) and the third layer bases (bottom) learned from a mixture of four
ategories (faces, cars, airplanes, motorbikes).
Faces
ayer
r 0.4
0.6
Motorbikes
first layer
second layer
third layer 0.4
0.6
Cars
first layer
second layer
third layer
faces
faces
cars
airplanes
motorbikes
Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de
Deep Learning Methods
Unsupervised Methods
• Restricted Boltzmann Machines
• Deep Belief Networks
• Auto encoders: unsupervised feature extraction/learning
10
encode decode
Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de
Deep Learning Methods
Supervised Methods
• Deep Neural Networks
• Recurrent Neural Networks
• Convolutional Neural Networks
11
Vision
Deep CNN
Language
Generating RNN
A group of people
shopping at an outdoor
market.
There are many
vegetables at the
fruit stand.
REVIEWINSIGHT
Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de
How to train a deep network ?
Stochastic Gradient Descent — supervised learning
• show input vector of few examples
• compute the output and the errors
• compute average gradient
• update the weights accordingly
12
Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de
Convolutional Neural Networks
• CNNs are designed to process the data in the form of multiple arrays
(e.g. 2D images, 3D video/volumetric images)
• Typical architecture is composed of series of stages: convolutional layers
and pooling layers
• Each unit is connected to local patches in the feature maps of the
previous layer
13
20
15 x 5415 x 54 8 x 27
4 x 14
50
500 x 1
378 x 1
EAqyB4
conv1 pool1 conv pool2
10%
20
8 x 27
50
Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de
• local connections
Key Idea behind 

Convolutional Networks
14
Convolutional networks take advantage of the properties of natural signals:
Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de
• local connections
Key Idea behind 

Convolutional Networks
15
• shared weights
Convolutional networks take advantage of the properties of natural signals:
Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de
• local connections
Key Idea behind 

Convolutional Networks
16
• shared weights
• pooling
Convolutional networks take advantage of the properties of natural signals:
Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de
• local connections
Key Idea behind 

Convolutional Networks
Convolutional networks take advantage of the properties of natural signals:
17
• shared weights
• pooling • the use of many layers
Person
Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de
Pros & Cons
• Best performing method in many
Computer Vision tasks
• No need of hand-crafted features
• Most applicable method for large-
scale problems, e.g. classification
of 1000 classes
• Easy parallelization on GPUs
18
• Need of huge amount of training
data
• Hard to train (local minima problem,
tuning hyper-parameters)
• Difficult to analyse (to be solved)
Deep Learning Applications

in Computer Vision
Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de
Handwritten Digit Recognition
20
Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de
ImageNet Classification with Deep
Convolutional Neural Networks (AlexNet)
21
Figure 4: (Left) Eight ILSVRC-2010 test images and the five labels consider
The correct label is written under each image, and the probability assigned to
Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de
FlowNet: Learning Optical Flow
with Convolutional Networks
22
in collaboration with University of Freiburg

lmb.informatik.uni-freiburg.de
Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de
FlowNet: Learning Optical Flow
with Convolutional Networks
23
Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de
FlowNet: Learning Optical Flow
with Convolutional Networks
24
96 x 128
9999
192 x 256
6
64
128
256256
512512
512512
10245
x 5
5
x 5 3
x 3
conv6
prediction
conv5_1
conv5conv4_1
conv4conv3_1
conv3
conv2
conv1
136 x 320
7
x 7
384 x 512
refine-
ment
conv1
conv2
conv3
corr
conv_redir
conv3_1 conv4
conv4_1 conv5 conv5_1 conv6
3
64
128
256441 32
473
256
512 512512512
1024
384 x 512
sqrt
prediction
136 x 320
refine-
ment
4 x 512
4 x 51222222222222
kernel
7 x 7
5 x 5
1 x 1
1 x 1
3 x 3
FlowNetSimple
FlowNetCorr
Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de
FlowNet: Learning Optical Flow
with Convolutional Networks
25
128
128
256256
conv3
corr
conv_redir
conv3_1 conv4
conv4_1 conv5
128
256441 32
473
256
512512
sqrt
kernel
1 x 1
1 x 1
3 x 3
Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de
FlowNet: Learning Optical Flow
with Convolutional Networks
26
Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de
From Image to Caption
27
Figure 3 | From image to text. Captions generated by a recurrent neural with permission from ref. 102. When the RNN is given the ability to focus its
Vision
Deep CNN
Language
Generating RNN
A group of people
shopping at an outdoor
market.
There are many
vegetables at the
fruit stand.
A woman is throwing a frisbee in a park.
A little girl sitting on a bed with a teddy bear. A group of people sitting on a boat in the water. A giraffe standing in a forest with
trees in the background.
A dog is standing on a hardwood floor. A stop sign is on a road with a
mountain in the background
REVIEWINSIGHT
self-driving cars60,61
. Companies such as Mobileye and NVIDIA are Microsoft, IBM, Yahoo!, Twitter and Adobe, as well as a quickly
Figure 3 | From image to text. Captions generated by a recurrent neural
network (RNN) taking, as extra input, the representation extracted by a deep
convolution neural network (CNN) from a test image, with the RNN trained to
‘translate’ high-level representations of images into captions (top). Reproduced
with permission from ref. 102. When the RNN is given the ability to focus its
attention on a different location in the input image (middle and bottom; the
lighter patches were given more attention) as it generates each word (bold), we
found86
that it exploits this to achieve better ‘translation’ of images into captions.
Vision
Deep CNN
Language
Generating RNN
A group of people
shopping at an outdoor
market.
There are many
vegetables at the
fruit stand.
A woman is throwing a frisbee in a park.
A little girl sitting on a bed with a teddy bear. A group of people sitting on a boat in the water. A giraffe standing in a forest with
trees in the background.
A dog is standing on a hardwood floor. A stop sign is on a road with a
mountain in the background
REVIEWINSIGHT
Vision
Deep CNN
Language
Generating RNN
A group of people
shopping at an outdoor
market.
There are many
vegetables at the
fruit stand.
A woman is throwing a frisbee in a park. A dog is standing on a hardwood floor. A stop sign is on a road with a
mountain in the background
REVIEWINSIGHT
Deep Learning in Computer Vision
Questions ?
End of

Presentation
Caner Hazırbaş | hazirbas@cs.tum.edu
Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de
References
• Building High-level Features Using Large Scale Unsupervised Learning

Quoc V. Le , Rajat Monga , Matthieu Devin , Kai Chen , Greg S. Corrado , Jeff Dean , Andrew Y. Ng
ICML’12
• Convolutional Deep Belief Networks for Scalable Unsupervised Learning of
Hierarchical Representations

Honglak Lee Roger Grosse Rajesh Ranganath Andrew Y. Ng ICML’09
• ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton NIPS’12
• Gradient-based learning applied to document recognition.

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner Proceedings of the IEEE’98
• FlowNet: Learning Optical Flow with Convolutional Networks

Philipp Fischer, Alexey Dosovitskiy, Eddy Ilg, Philip Häusser, Caner Hazırbaş, Vladimir Golkov,
Patrick van der Smagt, Daniel Cremers, Thomas Brox
29
Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de
References
• Google’s cat detection neural network http://www.resnap.com/image-
selection-technology/deep-learning-image-classification/
• Example auto-encoder : http://nghiaho.com/?p=1765
• SGD : http://blog.datumbox.com/tuning-the-learning-rate-in-gradient-
descent/
30

More Related Content

What's hot

Deep Learning With Neural Networks
Deep Learning With Neural NetworksDeep Learning With Neural Networks
Deep Learning With Neural NetworksAniket Maurya
 
Deep Learning Tutorial
Deep Learning TutorialDeep Learning Tutorial
Deep Learning TutorialAmr Rashed
 
Deep Learning - 인공지능 기계학습의 새로운 트랜드 :김인중
Deep Learning - 인공지능 기계학습의 새로운 트랜드 :김인중Deep Learning - 인공지능 기계학습의 새로운 트랜드 :김인중
Deep Learning - 인공지능 기계학습의 새로운 트랜드 :김인중datasciencekorea
 
Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...
Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...
Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...Seonho Park
 
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A PrimerMDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A PrimerPoo Kuan Hoong
 
Python for Image Understanding: Deep Learning with Convolutional Neural Nets
Python for Image Understanding: Deep Learning with Convolutional Neural NetsPython for Image Understanding: Deep Learning with Convolutional Neural Nets
Python for Image Understanding: Deep Learning with Convolutional Neural NetsRoelof Pieters
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learningJunaid Bhat
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep LearningMustafa Aldemir
 
Handwritten Recognition using Deep Learning with R
Handwritten Recognition using Deep Learning with RHandwritten Recognition using Deep Learning with R
Handwritten Recognition using Deep Learning with RPoo Kuan Hoong
 
Introduction to Deep learning
Introduction to Deep learningIntroduction to Deep learning
Introduction to Deep learningleopauly
 
Machine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis IntroductionMachine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis IntroductionTe-Yen Liu
 
Yann le cun
Yann le cunYann le cun
Yann le cunYandex
 
Unsupervised Feature Learning
Unsupervised Feature LearningUnsupervised Feature Learning
Unsupervised Feature LearningAmgad Muhammad
 
An Introduction to Deep Learning
An Introduction to Deep LearningAn Introduction to Deep Learning
An Introduction to Deep LearningPoo Kuan Hoong
 
Convolutional neural network
Convolutional neural network Convolutional neural network
Convolutional neural network Yan Xu
 
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...Simplilearn
 

What's hot (20)

Deep Learning With Neural Networks
Deep Learning With Neural NetworksDeep Learning With Neural Networks
Deep Learning With Neural Networks
 
Introduction to Deep learning
Introduction to Deep learningIntroduction to Deep learning
Introduction to Deep learning
 
Deep Learning Tutorial
Deep Learning TutorialDeep Learning Tutorial
Deep Learning Tutorial
 
Deep Learning - 인공지능 기계학습의 새로운 트랜드 :김인중
Deep Learning - 인공지능 기계학습의 새로운 트랜드 :김인중Deep Learning - 인공지능 기계학습의 새로운 트랜드 :김인중
Deep Learning - 인공지능 기계학습의 새로운 트랜드 :김인중
 
Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...
Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...
Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...
 
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A PrimerMDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
 
Python for Image Understanding: Deep Learning with Convolutional Neural Nets
Python for Image Understanding: Deep Learning with Convolutional Neural NetsPython for Image Understanding: Deep Learning with Convolutional Neural Nets
Python for Image Understanding: Deep Learning with Convolutional Neural Nets
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
 
Deep learning
Deep learningDeep learning
Deep learning
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep Learning
 
Handwritten Recognition using Deep Learning with R
Handwritten Recognition using Deep Learning with RHandwritten Recognition using Deep Learning with R
Handwritten Recognition using Deep Learning with R
 
Introduction to Deep learning
Introduction to Deep learningIntroduction to Deep learning
Introduction to Deep learning
 
Machine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis IntroductionMachine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis Introduction
 
Yann le cun
Yann le cunYann le cun
Yann le cun
 
Unsupervised Feature Learning
Unsupervised Feature LearningUnsupervised Feature Learning
Unsupervised Feature Learning
 
An Introduction to Deep Learning
An Introduction to Deep LearningAn Introduction to Deep Learning
An Introduction to Deep Learning
 
Convolutional neural network
Convolutional neural network Convolutional neural network
Convolutional neural network
 
Deep learning (2)
Deep learning (2)Deep learning (2)
Deep learning (2)
 
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
 
Image captioning
Image captioningImage captioning
Image captioning
 

Similar to Deep learning in Computer Vision

Report face recognition : ArganRecogn
Report face recognition :  ArganRecognReport face recognition :  ArganRecogn
Report face recognition : ArganRecognIlyas CHAOUA
 
IRJET- Face Recognition using Machine Learning
IRJET- Face Recognition using Machine LearningIRJET- Face Recognition using Machine Learning
IRJET- Face Recognition using Machine LearningIRJET Journal
 
Image De-Noising Using Deep Neural Network
Image De-Noising Using Deep Neural NetworkImage De-Noising Using Deep Neural Network
Image De-Noising Using Deep Neural Networkaciijournal
 
Classification of Images Using CNN Model and its Variants
Classification of Images Using CNN Model and its VariantsClassification of Images Using CNN Model and its Variants
Classification of Images Using CNN Model and its VariantsIRJET Journal
 
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...CSCJournals
 
Obscenity Detection in Images
Obscenity Detection in ImagesObscenity Detection in Images
Obscenity Detection in ImagesAnil Kumar Gupta
 
IRJET-Multiclass Classification Method Based On Deep Learning For Leaf Identi...
IRJET-Multiclass Classification Method Based On Deep Learning For Leaf Identi...IRJET-Multiclass Classification Method Based On Deep Learning For Leaf Identi...
IRJET-Multiclass Classification Method Based On Deep Learning For Leaf Identi...IRJET Journal
 
Image De-Noising Using Deep Neural Network
Image De-Noising Using Deep Neural NetworkImage De-Noising Using Deep Neural Network
Image De-Noising Using Deep Neural Networkaciijournal
 
IMAGE DE-NOISING USING DEEP NEURAL NETWORK
IMAGE DE-NOISING USING DEEP NEURAL NETWORKIMAGE DE-NOISING USING DEEP NEURAL NETWORK
IMAGE DE-NOISING USING DEEP NEURAL NETWORKaciijournal
 
IRJET - Factors Affecting Deployment of Deep Learning based Face Recognition ...
IRJET - Factors Affecting Deployment of Deep Learning based Face Recognition ...IRJET - Factors Affecting Deployment of Deep Learning based Face Recognition ...
IRJET - Factors Affecting Deployment of Deep Learning based Face Recognition ...IRJET Journal
 
A Review on Color Recognition using Deep Learning and Different Image Segment...
A Review on Color Recognition using Deep Learning and Different Image Segment...A Review on Color Recognition using Deep Learning and Different Image Segment...
A Review on Color Recognition using Deep Learning and Different Image Segment...IRJET Journal
 
fuzzy LBP for face recognition ppt
fuzzy LBP for face recognition pptfuzzy LBP for face recognition ppt
fuzzy LBP for face recognition pptAbdullah Gubbi
 
Deep Computer Vision - 1.pptx
Deep Computer Vision - 1.pptxDeep Computer Vision - 1.pptx
Deep Computer Vision - 1.pptxJawadHaider36
 
improving Profile detection using Deep Learning
improving Profile detection using Deep Learningimproving Profile detection using Deep Learning
improving Profile detection using Deep LearningSahil Kaw
 
Rapid object detection using boosted cascade of simple features
Rapid object detection using boosted  cascade of simple featuresRapid object detection using boosted  cascade of simple features
Rapid object detection using boosted cascade of simple featuresHirantha Pradeep
 
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...CSCJournals
 
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...IRJET- Identification of Scene Images using Convolutional Neural Networks - A...
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...IRJET Journal
 
Efficient Approach for Content Based Image Retrieval Using Multiple SVM in YA...
Efficient Approach for Content Based Image Retrieval Using Multiple SVM in YA...Efficient Approach for Content Based Image Retrieval Using Multiple SVM in YA...
Efficient Approach for Content Based Image Retrieval Using Multiple SVM in YA...csandit
 
EFFICIENT APPROACH FOR CONTENT BASED IMAGE RETRIEVAL USING MULTIPLE SVM IN YA...
EFFICIENT APPROACH FOR CONTENT BASED IMAGE RETRIEVAL USING MULTIPLE SVM IN YA...EFFICIENT APPROACH FOR CONTENT BASED IMAGE RETRIEVAL USING MULTIPLE SVM IN YA...
EFFICIENT APPROACH FOR CONTENT BASED IMAGE RETRIEVAL USING MULTIPLE SVM IN YA...cscpconf
 

Similar to Deep learning in Computer Vision (20)

Report face recognition : ArganRecogn
Report face recognition :  ArganRecognReport face recognition :  ArganRecogn
Report face recognition : ArganRecogn
 
IRJET- Face Recognition using Machine Learning
IRJET- Face Recognition using Machine LearningIRJET- Face Recognition using Machine Learning
IRJET- Face Recognition using Machine Learning
 
Image De-Noising Using Deep Neural Network
Image De-Noising Using Deep Neural NetworkImage De-Noising Using Deep Neural Network
Image De-Noising Using Deep Neural Network
 
Classification of Images Using CNN Model and its Variants
Classification of Images Using CNN Model and its VariantsClassification of Images Using CNN Model and its Variants
Classification of Images Using CNN Model and its Variants
 
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...
 
Obscenity Detection in Images
Obscenity Detection in ImagesObscenity Detection in Images
Obscenity Detection in Images
 
IRJET-Multiclass Classification Method Based On Deep Learning For Leaf Identi...
IRJET-Multiclass Classification Method Based On Deep Learning For Leaf Identi...IRJET-Multiclass Classification Method Based On Deep Learning For Leaf Identi...
IRJET-Multiclass Classification Method Based On Deep Learning For Leaf Identi...
 
Image De-Noising Using Deep Neural Network
Image De-Noising Using Deep Neural NetworkImage De-Noising Using Deep Neural Network
Image De-Noising Using Deep Neural Network
 
IMAGE DE-NOISING USING DEEP NEURAL NETWORK
IMAGE DE-NOISING USING DEEP NEURAL NETWORKIMAGE DE-NOISING USING DEEP NEURAL NETWORK
IMAGE DE-NOISING USING DEEP NEURAL NETWORK
 
IRJET - Factors Affecting Deployment of Deep Learning based Face Recognition ...
IRJET - Factors Affecting Deployment of Deep Learning based Face Recognition ...IRJET - Factors Affecting Deployment of Deep Learning based Face Recognition ...
IRJET - Factors Affecting Deployment of Deep Learning based Face Recognition ...
 
A Review on Color Recognition using Deep Learning and Different Image Segment...
A Review on Color Recognition using Deep Learning and Different Image Segment...A Review on Color Recognition using Deep Learning and Different Image Segment...
A Review on Color Recognition using Deep Learning and Different Image Segment...
 
28 01-2021-05
28 01-2021-0528 01-2021-05
28 01-2021-05
 
fuzzy LBP for face recognition ppt
fuzzy LBP for face recognition pptfuzzy LBP for face recognition ppt
fuzzy LBP for face recognition ppt
 
Deep Computer Vision - 1.pptx
Deep Computer Vision - 1.pptxDeep Computer Vision - 1.pptx
Deep Computer Vision - 1.pptx
 
improving Profile detection using Deep Learning
improving Profile detection using Deep Learningimproving Profile detection using Deep Learning
improving Profile detection using Deep Learning
 
Rapid object detection using boosted cascade of simple features
Rapid object detection using boosted  cascade of simple featuresRapid object detection using boosted  cascade of simple features
Rapid object detection using boosted cascade of simple features
 
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...
 
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...IRJET- Identification of Scene Images using Convolutional Neural Networks - A...
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...
 
Efficient Approach for Content Based Image Retrieval Using Multiple SVM in YA...
Efficient Approach for Content Based Image Retrieval Using Multiple SVM in YA...Efficient Approach for Content Based Image Retrieval Using Multiple SVM in YA...
Efficient Approach for Content Based Image Retrieval Using Multiple SVM in YA...
 
EFFICIENT APPROACH FOR CONTENT BASED IMAGE RETRIEVAL USING MULTIPLE SVM IN YA...
EFFICIENT APPROACH FOR CONTENT BASED IMAGE RETRIEVAL USING MULTIPLE SVM IN YA...EFFICIENT APPROACH FOR CONTENT BASED IMAGE RETRIEVAL USING MULTIPLE SVM IN YA...
EFFICIENT APPROACH FOR CONTENT BASED IMAGE RETRIEVAL USING MULTIPLE SVM IN YA...
 

Recently uploaded

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

Deep learning in Computer Vision

  • 1. Deep Learning in Computer Vision Deep Learning in Action
 24. June ’15 Caner Hazırbaş
  • 2. Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de Computer Vision Group 2 6 Postdocs, 16 PhD students
  • 3. Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de Research in Computer Vision 3 Image-based 3D
 Reconstruction Shape Analysis Robot Vision RGB-D Vision Image
 Segmentation Convex 
 Relaxation
 Methods Visual SLAM Optical Flow
  • 4. Deep Learning 
 in Computer Vision
  • 5. Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de How to teach a machine ? 5 edges classifier (or any other hand-crafted features) Person
  • 6. Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de How to teach a machine ? 6 edges classifier Person (or any other hand-crafted features) N ota good representation
  • 7. Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de What is deep learning ? • Representation learning method
 Learning good features automatically from raw data • Learning representations of data with multiple levels of abstraction 7 Google’s cat detection neural network
  • 8. Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de Construction of higher 
 levels of abstraction 8 w1 w2 w3 “non-linear”
 transformation 1 b
  • 9. Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de Going deeper in the network 9 Input
 ‘Pixels’ 1st and 2nd Layers ‘Edges’nvolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations not conditionally independent of one another given layers above and below. In contrast, our treatment g undirected edges enables combining bottom-up top-down information more e ciently, as shown ection 4.5. our approach, probabilistic max-pooling helps to ress scalability by shrinking the higher layers; ght-sharing (convolutions) further speeds up the rithm. For example, inference in a three-layer work (with 200x200 input images) using weight- ring but without max-pooling was about 10 times wer. Without weight-sharing, it was more than 100 es slower. work that was contemporary to and done indepen- tly of ours, Desjardins and Bengio (2008) also ap- d convolutional weight-sharing to RBMs and ex- mented on small image patches. Our work, how- , develops more sophisticated elements such as babilistic max-pooling to make the algorithm more able. Experimental results Learning hierarchical representations from natural images first tested our model’s ability to learn hierarchi- representations of natural images. Specifically, we ned a CDBN with two hidden layers from the Ky- natural image dataset.3 The first layer consisted 4 groups (or “bases”)4 of 10x10 pixel filters, while second layer consisted of 100 bases, each one 10x10 well.5 As shown in Figure 2 (top), the learned first r bases are oriented, localized edge filters; this re- is consistent with much prior work (Olshausen & d, 1996; Bell & Sejnowski, 1997; Ranzato et al., 6). We note that the sparsity regularization dur- training was necessary for learning these oriented e filters; when this term was removed, the algo- m failed to learn oriented edges. learned second layer bases are shown in Fig- 2 (bottom), and many of them empirically re- Figure 2. The first layer bases (top) and the second layer bases (bottom) learned from natural images. Each second layer basis (filter) was visualized as a weighted linear com- bination of the first layer bases. unlabeled data do not share the same class labels, or the same generative distribution, as the labeled data. This framework, where generic unlabeled data improve performance on a supervised learning task, is known as self-taught learning. In their experiments, they used sparse coding to train a single-layer representation, and then used the learned representation to construct features for supervised learning tasks. We used a similar procedure to evaluate our two-layer CDBN, described in Section 4.1, on the Caltech-101 object classification task.6 The results are shown in Table 1. First, we observe that combining the first and second layers significantly improves the classifica- tion accuracy relative to the first layer alone. Overall, we achieve 57.7% test accuracy using 15 training im- ages per class, and 65.4% test accuracy using 30 train- ing images per class. Our result is competitive with state-of-the-art results using highly-specialized single features, such as SIFT, geometric blur, and shape- context (Lazebnik et al., 2006; Berg et al., 2005; Zhang et al., 2006).7 Recall that the CDBN was trained en- 6 Details: Given an image from the Caltech-101 dataset (Fei-Fei et al., 2004), we scaled the image so that Convolutional Deep Belief Networks for Scalable Unsu Table 2. Test error fo Labeled training samples 1,000 CDBN 2.62±0.12% 2 Ranzato et al. (2007) 3.21% Hinton and Salakhutdinov (2006) - Weston et al. (2008) 2.73% faces cars elephan Figure 3. Columns 1-4: the second layer bases (top) and th categories. Column 5: the second layer bases (top) and the object categories (faces, cars, airplanes, motorbikes). 0.2 0.4 0.6 Faces first layer second layer third layer 0.2 0.4 0.6 Motorbikes first layer second layer third layer 0.2 0.4 0.6 Cars first layer second layer third layer Convolutional Deep Belief N Labeled training sample CDBN Ranzato et al. (2007) Hinton and Salakhutdin Weston et al. (2008) faces Figure 3. Columns 1-4: the seco categories. Column 5: the secon object categories (faces, cars, airp 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 Area under the PR curve (AUC) Faces first layer second layer third layer 0.2 0.4 0.6 0 0 0.2 0.4 0.6 Area under the PR cur Motorbikes first layer second layer third layer Features Faces M First layer 0.39±0.17 0. Second layer 0.86±0.13 0. Third layer 0.95±0.03 0. 3rd Layer ‘Object Parts’ 4th Layer ‘Objects’ ks for Scalable Unsupervised Learning of Hierarchical Representations Table 2. Test error for MNIST dataset 1,000 2,000 3,000 5,000 60,000 2.62±0.12% 2.13±0.10% 1.91±0.09% 1.59±0.11% 0.82% 3.21% 2.53% - 1.52% 0.64% 06) - - - - 1.20% 2.73% - 1.83% - 1.50% elephants chairs faces, cars, airplanes, motorbikes utional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations Table 2. Test error for MNIST dataset Labeled training samples 1,000 2,000 3,000 5,000 60,000 CDBN 2.62±0.12% 2.13±0.10% 1.91±0.09% 1.59±0.11% 0.82% Ranzato et al. (2007) 3.21% 2.53% - 1.52% 0.64% Hinton and Salakhutdinov (2006) - - - - 1.20% Weston et al. (2008) 2.73% - 1.83% - 1.50% faces cars elephants chairs faces, cars, airplanes, motorbikes 3. Columns 1-4: the second layer bases (top) and the third layer bases (bottom) learned from specific object es. Column 5: the second layer bases (top) and the third layer bases (bottom) learned from a mixture of four ategories (faces, cars, airplanes, motorbikes). Faces ayer r 0.4 0.6 Motorbikes first layer second layer third layer 0.4 0.6 Cars first layer second layer third layer faces faces cars airplanes motorbikes
  • 10. Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de Deep Learning Methods Unsupervised Methods • Restricted Boltzmann Machines • Deep Belief Networks • Auto encoders: unsupervised feature extraction/learning 10 encode decode
  • 11. Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de Deep Learning Methods Supervised Methods • Deep Neural Networks • Recurrent Neural Networks • Convolutional Neural Networks 11 Vision Deep CNN Language Generating RNN A group of people shopping at an outdoor market. There are many vegetables at the fruit stand. REVIEWINSIGHT
  • 12. Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de How to train a deep network ? Stochastic Gradient Descent — supervised learning • show input vector of few examples • compute the output and the errors • compute average gradient • update the weights accordingly 12
  • 13. Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de Convolutional Neural Networks • CNNs are designed to process the data in the form of multiple arrays (e.g. 2D images, 3D video/volumetric images) • Typical architecture is composed of series of stages: convolutional layers and pooling layers • Each unit is connected to local patches in the feature maps of the previous layer 13 20 15 x 5415 x 54 8 x 27 4 x 14 50 500 x 1 378 x 1 EAqyB4 conv1 pool1 conv pool2 10% 20 8 x 27 50
  • 14. Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de • local connections Key Idea behind 
 Convolutional Networks 14 Convolutional networks take advantage of the properties of natural signals:
  • 15. Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de • local connections Key Idea behind 
 Convolutional Networks 15 • shared weights Convolutional networks take advantage of the properties of natural signals:
  • 16. Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de • local connections Key Idea behind 
 Convolutional Networks 16 • shared weights • pooling Convolutional networks take advantage of the properties of natural signals:
  • 17. Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de • local connections Key Idea behind 
 Convolutional Networks Convolutional networks take advantage of the properties of natural signals: 17 • shared weights • pooling • the use of many layers Person
  • 18. Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de Pros & Cons • Best performing method in many Computer Vision tasks • No need of hand-crafted features • Most applicable method for large- scale problems, e.g. classification of 1000 classes • Easy parallelization on GPUs 18 • Need of huge amount of training data • Hard to train (local minima problem, tuning hyper-parameters) • Difficult to analyse (to be solved)
  • 20. Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de Handwritten Digit Recognition 20
  • 21. Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de ImageNet Classification with Deep Convolutional Neural Networks (AlexNet) 21 Figure 4: (Left) Eight ILSVRC-2010 test images and the five labels consider The correct label is written under each image, and the probability assigned to
  • 22. Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de FlowNet: Learning Optical Flow with Convolutional Networks 22 in collaboration with University of Freiburg
 lmb.informatik.uni-freiburg.de
  • 23. Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de FlowNet: Learning Optical Flow with Convolutional Networks 23
  • 24. Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de FlowNet: Learning Optical Flow with Convolutional Networks 24 96 x 128 9999 192 x 256 6 64 128 256256 512512 512512 10245 x 5 5 x 5 3 x 3 conv6 prediction conv5_1 conv5conv4_1 conv4conv3_1 conv3 conv2 conv1 136 x 320 7 x 7 384 x 512 refine- ment conv1 conv2 conv3 corr conv_redir conv3_1 conv4 conv4_1 conv5 conv5_1 conv6 3 64 128 256441 32 473 256 512 512512512 1024 384 x 512 sqrt prediction 136 x 320 refine- ment 4 x 512 4 x 51222222222222 kernel 7 x 7 5 x 5 1 x 1 1 x 1 3 x 3 FlowNetSimple FlowNetCorr
  • 25. Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de FlowNet: Learning Optical Flow with Convolutional Networks 25 128 128 256256 conv3 corr conv_redir conv3_1 conv4 conv4_1 conv5 128 256441 32 473 256 512512 sqrt kernel 1 x 1 1 x 1 3 x 3
  • 26. Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de FlowNet: Learning Optical Flow with Convolutional Networks 26
  • 27. Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de From Image to Caption 27 Figure 3 | From image to text. Captions generated by a recurrent neural with permission from ref. 102. When the RNN is given the ability to focus its Vision Deep CNN Language Generating RNN A group of people shopping at an outdoor market. There are many vegetables at the fruit stand. A woman is throwing a frisbee in a park. A little girl sitting on a bed with a teddy bear. A group of people sitting on a boat in the water. A giraffe standing in a forest with trees in the background. A dog is standing on a hardwood floor. A stop sign is on a road with a mountain in the background REVIEWINSIGHT self-driving cars60,61 . Companies such as Mobileye and NVIDIA are Microsoft, IBM, Yahoo!, Twitter and Adobe, as well as a quickly Figure 3 | From image to text. Captions generated by a recurrent neural network (RNN) taking, as extra input, the representation extracted by a deep convolution neural network (CNN) from a test image, with the RNN trained to ‘translate’ high-level representations of images into captions (top). Reproduced with permission from ref. 102. When the RNN is given the ability to focus its attention on a different location in the input image (middle and bottom; the lighter patches were given more attention) as it generates each word (bold), we found86 that it exploits this to achieve better ‘translation’ of images into captions. Vision Deep CNN Language Generating RNN A group of people shopping at an outdoor market. There are many vegetables at the fruit stand. A woman is throwing a frisbee in a park. A little girl sitting on a bed with a teddy bear. A group of people sitting on a boat in the water. A giraffe standing in a forest with trees in the background. A dog is standing on a hardwood floor. A stop sign is on a road with a mountain in the background REVIEWINSIGHT
  • 28. Vision Deep CNN Language Generating RNN A group of people shopping at an outdoor market. There are many vegetables at the fruit stand. A woman is throwing a frisbee in a park. A dog is standing on a hardwood floor. A stop sign is on a road with a mountain in the background REVIEWINSIGHT Deep Learning in Computer Vision Questions ? End of
 Presentation Caner Hazırbaş | hazirbas@cs.tum.edu
  • 29. Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de References • Building High-level Features Using Large Scale Unsupervised Learning
 Quoc V. Le , Rajat Monga , Matthieu Devin , Kai Chen , Greg S. Corrado , Jeff Dean , Andrew Y. Ng ICML’12 • Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations
 Honglak Lee Roger Grosse Rajesh Ranganath Andrew Y. Ng ICML’09 • ImageNet Classification with Deep Convolutional Neural Networks
 Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton NIPS’12 • Gradient-based learning applied to document recognition.
 Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner Proceedings of the IEEE’98 • FlowNet: Learning Optical Flow with Convolutional Networks
 Philipp Fischer, Alexey Dosovitskiy, Eddy Ilg, Philip Häusser, Caner Hazırbaş, Vladimir Golkov, Patrick van der Smagt, Daniel Cremers, Thomas Brox 29
  • 30. Deep Learning in Computer VisionCaner Hazırbaş | vision.in.tum.de References • Google’s cat detection neural network http://www.resnap.com/image- selection-technology/deep-learning-image-classification/ • Example auto-encoder : http://nghiaho.com/?p=1765 • SGD : http://blog.datumbox.com/tuning-the-learning-rate-in-gradient- descent/ 30