SlideShare a Scribd company logo
1 of 91
Object Recognition II
Linda Shapiro
EE/CSE 576
with CNN slides from Ross Girshick
1
Outline
• Object detection
• the task, evaluation, datasets
• Convolutional Neural Networks (CNNs)
• overview and history
• Region-based Convolutional Networks (R-CNNs)
2
Image classification
• 𝐾 classes
• Task: assign correct class label to the whole image
Digit classification (MNIST) Object recognition (Caltech-101)
3
Classification vs. Detection
 Dog
Dog
Dog
4
Problem formulation
person
motorbike
Input Desired output
{ airplane, bird, motorbike, person, sofa }
5
Evaluating a detector
Test image (previously unseen)
First detection ...
‘person’ detector predictions
0.9
Second detection ...
0.9
0.6
‘person’ detector predictions
Third detection ...
0.9
0.6
0.2
‘person’ detector predictions
Compare to ground truth
ground truth ‘person’ boxes
0.9
0.6
0.2
‘person’ detector predictions
Sort by confidence
... ... ... ... ...
✓ ✓ ✓
0.9 0.8 0.6 0.5 0.2 0.1
true
positive
(high overlap)
false
positive
(no overlap,
low overlap, or
duplicate)
X X X
Evaluation metric
... ... ... ... ...
0.9 0.8 0.6 0.5 0.2 0.1
✓ ✓ ✓
X X X
𝑝𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛@𝑡 =
#𝑡𝑟𝑢𝑒 𝑝𝑜𝑠𝑖𝑡𝑖𝑣𝑒𝑠@𝑡
#𝑡𝑟𝑢𝑒 𝑝𝑜𝑠𝑖𝑡𝑖𝑣𝑒𝑠@𝑡 + #𝑓𝑎𝑙𝑠𝑒 𝑝𝑜𝑠𝑖𝑡𝑖𝑣𝑒𝑠@𝑡
𝑟𝑒𝑐𝑎𝑙𝑙@𝑡 =
#𝑡𝑟𝑢𝑒 𝑝𝑜𝑠𝑖𝑡𝑖𝑣𝑒𝑠@𝑡
#𝑔𝑟𝑜𝑢𝑛𝑑 𝑡𝑟𝑢𝑡ℎ 𝑜𝑏𝑗𝑒𝑐𝑡𝑠
𝑡
✓
✓ + X
Evaluation metric
Average Precision (AP)
0% is worst
100% is best
mean AP over classes
(mAP)
... ... ... ... ...
0.9 0.8 0.6 0.5 0.2 0.1
✓ ✓ ✓
X X X
Pedestrians
Histograms of Oriented Gradients for Human Detection,
Dalal and Triggs, CVPR 2005
AP ~77%
More sophisticated methods: AP ~90%
(a) average gradient image over training examples
(b) each “pixel” shows max positive SVM weight in the block centered on that pixel
(c) same as (b) for negative SVM weights
(d) test image
(e) its R-HOG descriptor
(f) R-HOG descriptor weighted by positive SVM weights
(g) R-HOG descriptor weighted by negative SVM weights
14
Overview of HOG Method
1. Compute gradients in the region to be described
2. Put them in bins according to orientation
3. Group the cells into large blocks
4. Normalize each block
5. Train classifiers to decide if these are parts of a human
15
Details
• Gradients
[-1 0 1] and [-1 0 1]T were good enough filters.
• Cell Histograms
Each pixel within the cell casts a weighted vote for an
orientation-based histogram channel based on the values
found in the gradient computation. (9 channels worked)
• Blocks
Group the cells together into larger blocks, either R-HOG
blocks (rectangular) or C-HOG blocks (circular).
16
More Details
• Block Normalization
• If you think of the block as a vector v, then the
normalized block is v/norm(v)
They tried 4 different kinds of normalization.
• L1-norm
• sqrt of L1-norm
• L2 norm
• L2-norm followed by clipping
17
Example: Dalal-Triggs pedestrian
detector
1. Extract fixed-sized (64x128 pixel) window at each
position and scale
2. Compute HOG (histogram of gradient) features
within each window
3. Score the window with a linear SVM classifier
4. Perform non-maxima suppression to remove
overlapping detections with lower scores
Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05
18
Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05
19
uncentered
centered
cubic-corrected
diagonal
Sobel
Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05
Outperforms
20
• Histogram of gradient orientations
• Votes weighted by magnitude
• Bilinear interpolation between cells
Orientation: 9 bins (for
unsigned angles)
Histograms in 8x8
pixel cells
Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05
21
Normalize with respect to
surrounding cells
Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05
22
X=
Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05
# features = 15 x 7 x 9 x 4 = 3780
# cells
# orientations
# normalizations by
neighboring cells
23
Training set
24
Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05
pos w neg w
25
pedestrian
Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05
26
Detection examples
27
30
Deformable Parts Model
• Takes the idea a little further
• Instead of one rigid HOG model, we have multiple
HOG models in a spatial arrangement
• One root part to find first and multiple other parts
in a tree structure.
31
The Idea
Articulated parts model
• Object is configuration of parts
• Each part is detectable
Images from Felzenszwalb
32
Deformable objects
Images from Caltech-256
Slide Credit: Duan Tran
33
Deformable objects
Images from D. Ramanan’s dataset
Slide Credit: Duan Tran
34
How to model spatial relations?
• Tree-shaped model
35
36
Hybrid template/parts model
Detections
Template Visualization
Felzenszwalb et al. 2008
37
Pictorial Structures Model
Appearance likelihood Geometry likelihood
38
Results for person matching
39
Results for person matching
40
BMVC 2009
41
2012 State-of-the-art Detector:
Deformable Parts Model (DPM)
42
Felzenszwalb et al., 2008, 2010, 2011, 2012
Lifetime
Achievement
1. Strong low-level features based on HOG
2. Efficient matching algorithms for deformable part-based
models (pictorial structures)
3. Discriminative learning with latent variables (latent SVM)
Why did gradient-based models work?
Average gradient image
43
Generic categories
Can we detect people, chairs, horses, cars, dogs, buses, bottles, sheep …?
PASCAL Visual Object Categories (VOC) dataset
44
Generic categories
Why doesn’t this work (as well)?
Can we detect people, chairs, horses, cars, dogs, buses, bottles, sheep …?
PASCAL Visual Object Categories (VOC) dataset
45
Quiz time
(Back to Girshick)
46
Warm up
This is an average image of which object class?
47
Warm up
pedestrian
48
A little harder
?
49
A little harder
?
Hint: airplane, bicycle, bus, car, cat, chair, cow, dog, dining table
50
A little harder
bicycle (PASCAL)
51
A little harder, yet
?
52
A little harder, yet
?
Hint: white blob on a green background
53
A little harder, yet
sheep (PASCAL)
54
Impossible?
?
55
Impossible?
dog (PASCAL)
56
Impossible?
dog (PASCAL)
Why does the mean look like this?
There’s no alignment between the examples!
How do we combat this? 57
PASCAL VOC detection history
0%
10%
20%
30%
40%
50%
60%
70%
2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
mean
Average
Precision
(mAP)
year
DPM
DPM,
HOG+
BOW
DPM,
MKL
DPM++
DPM++,
MKL,
Selective
Search
Selective
Search,
DPM++,
MKL
41%
41%
37%
28%
23%
17%
Part-based models & multiple
features (MKL)
0%
10%
20%
30%
40%
50%
60%
70%
2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
mean
Average
Precision
(mAP)
year
DPM
DPM,
HOG+
BOW
DPM,
MKL
DPM++
DPM++,
MKL,
Selective
Search
Selective
Search,
DPM++,
MKL
41%
41%
37%
28%
23%
17%
Kitchen-sink approaches
0%
10%
20%
30%
40%
50%
60%
70%
2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
mean
Average
Precision
(mAP)
year
DPM
DPM,
HOG+
BOW
DPM,
MKL
DPM++
DPM++,
MKL,
Selective
Search
Selective
Search,
DPM++,
MKL
41%
41%
37%
28%
23%
17%
increasing complexity & plateau
Region-based Convolutional
Networks (R-CNNs)
0%
10%
20%
30%
40%
50%
60%
70%
2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
mean
Average
Precision
(mAP)
year
DPM
DPM,
HOG+
BOW
DPM,
MKL
DPM++
DPM++,
MKL,
Selective
Search
Selective
Search,
DPM++,
MKL
41%
41%
37%
28%
23%
17%
53%
62%
R-CNN v1
R-CNN v2
[R-CNN. Girshick et al. CVPR 2014]
0%
10%
20%
30%
40%
50%
60%
70%
2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
mean
Average
Precision
(mAP)
year
~1 year
~5 years
Region-based Convolutional
Networks (R-CNNs)
[R-CNN. Girshick et al. CVPR 2014]
Convolutional Neural Networks
• Overview
63
Standard Neural Networks
𝒙 = 𝑥1, … , 𝑥784
𝑇
𝑧𝑗 = 𝑔(𝒘𝑗
𝑇
𝒙) 𝑔 𝑡 =
1
1 + 𝑒−𝑡
“Fully connected”
64
From NNs to Convolutional NNs
• Local connectivity
• Shared (“tied”) weights
• Multiple feature maps
• Pooling
65
Convolutional NNs
• Local connectivity
• Each green unit is only connected to (3)
neighboring blue units
compare
66
Convolutional NNs
• Shared (“tied”) weights
• All green units share the same parameters 𝒘
• Each green unit computes the same function,
but with a different input window
𝑤1
𝑤2
𝑤3
𝑤1
𝑤2
𝑤3
67
Convolutional NNs
• Convolution with 1-D filter: [𝑤3, 𝑤2, 𝑤1]
• All green units share the same parameters 𝒘
• Each green unit computes the same function,
but with a different input window
𝑤1
𝑤2
𝑤3
68
Convolutional NNs
• Convolution with 1-D filter: [𝑤3, 𝑤2, 𝑤1]
• All green units share the same parameters 𝒘
• Each green unit computes the same function,
but with a different input window
𝑤1
𝑤2
𝑤3
69
Convolutional NNs
• Convolution with 1-D filter: [𝑤3, 𝑤2, 𝑤1]
• All green units share the same parameters 𝒘
• Each green unit computes the same function,
but with a different input window
𝑤1
𝑤2
𝑤3
70
Convolutional NNs
• Convolution with 1-D filter: [𝑤3, 𝑤2, 𝑤1]
• All green units share the same parameters 𝒘
• Each green unit computes the same function,
but with a different input window
𝑤1
𝑤2
𝑤3
71
Convolutional NNs
• Convolution with 1-D filter: [𝑤3, 𝑤2, 𝑤1]
• All green units share the same parameters 𝒘
• Each green unit computes the same function,
but with a different input window
𝑤1
𝑤2
𝑤3
72
Convolutional NNs
• Multiple feature maps
• All orange units compute the same function
but with a different input windows
• Orange and green units compute
different functions
𝑤1
𝑤2
𝑤3
𝑤′1
𝑤′2
𝑤′3
Feature map 1
(array of green
units)
Feature map 2
(array of orange
units)
73
Convolutional NNs
• Pooling (max, average)
1
4
0
3
4
3
• Pooling area: 2 units
• Pooling stride: 2 units
• Subsamples feature maps
74
Image
Pooling
Convolution
2D input
75
1989
Backpropagation applied to handwritten zip code recognition,
Lecun et al., 1989 76
Historical perspective – 1980
77
Historical perspective – 1980
Hubel and Wiesel
1962
Included basic ingredients of ConvNets, but no supervised learning algorithm
78
Supervised learning – 1986
Early demonstration that error backpropagation can be used
for supervised training of neural nets (including ConvNets)
Gradient descent training with error backpropagation
79
Supervised learning – 1986
“T” vs. “C” problem Simple ConvNet
80
Practical ConvNets
Gradient-Based Learning Applied to Document Recognition,
Lecun et al., 1998
81
Demo
• http://cs.stanford.edu/people/karpathy/convnetjs/
demo/mnist.html
• ConvNetJS by Andrej Karpathy (Ph.D. student at
Stanford)
Software libraries
• Caffe (C++, python, matlab)
• Torch7 (C++, lua)
• Theano (python)
82
The fall of ConvNets
• The rise of Support Vector Machines (SVMs)
• Mathematical advantages (theory, convex
optimization)
• Competitive performance on tasks such as digit
classification
• Neural nets became unpopular in the mid 1990s
83
The key to SVMs
• It’s all about the features
Histograms of Oriented Gradients for Human Detection,
Dalal and Triggs, CVPR 2005
SVM weights
(+) (-)
HOG features
84
Core idea of “deep learning”
• Input: the “raw” signal (image, waveform, …)
• Features: hierarchy of features is learned from the
raw input
85
• If SVMs killed neural nets, how did they come back
(in computer vision)?
86
What’s new since the 1980s?
• More layers
• LeNet-3 and LeNet-5 had 3 and 5 learnable layers
• Current models have 8 – 20+
• “ReLU” non-linearities (Rectified Linear Unit)
• 𝑔 𝑥 = max 0, 𝑥
• Gradient doesn’t vanish
• “Dropout” regularization
• Fast GPU implementations
• More data
𝑥
𝑔(𝑥)
87
What else? Object Proposals
• Sliding window based object detection
• Object proposals
• Fast execution
• High recall with low # of candidate boxes
Image
Feature
Extraction
Classificaiton
Iterate over window size, aspect
ratio, and location
Image
Feature
Extraction
Classificaiton
Object
Proposal
88
The number of contours wholly enclosed by a bounding box is indicative of
the likelihood of the box containing an object.
89
Ross’s Own System: Region CNNs
Competitive Results
Top Regions for Six Object Classes
Finale
• Object recognition has moved rapidly in the last 12
years to becoming very appearance based.
• The HOG descriptor lead to fast recognition of
specific views of generic objects, starting with
pedestrians and using SVMs.
• Deformable parts models extended that to allow
more objects with articulated limbs, but still
specific views.
• CNNs have become the method of choice; they
learn from huge amounts of data and can learn
multiple views of each object class.
93

More Related Content

Similar to ObjRecog2-17 (1).pptx

The painful removal of tiling artefacts in ToF-SIMS data
The painful removal of tiling artefacts in ToF-SIMS dataThe painful removal of tiling artefacts in ToF-SIMS data
The painful removal of tiling artefacts in ToF-SIMS dataCSIRO
 
Robots, Small Molecules & R
Robots, Small Molecules & RRobots, Small Molecules & R
Robots, Small Molecules & RRajarshi Guha
 
BSSML16 L3. Clusters and Anomaly Detection
BSSML16 L3. Clusters and Anomaly DetectionBSSML16 L3. Clusters and Anomaly Detection
BSSML16 L3. Clusters and Anomaly DetectionBigML, Inc
 
Clustering Methods with R
Clustering Methods with RClustering Methods with R
Clustering Methods with RAkira Murakami
 
Terminological cluster trees for Disjointness Axiom Discovery
Terminological cluster trees for Disjointness Axiom DiscoveryTerminological cluster trees for Disjointness Axiom Discovery
Terminological cluster trees for Disjointness Axiom DiscoveryGiuseppe Rizzo
 
Sean Kandel - Data profiling: Assessing the overall content and quality of a ...
Sean Kandel - Data profiling: Assessing the overall content and quality of a ...Sean Kandel - Data profiling: Assessing the overall content and quality of a ...
Sean Kandel - Data profiling: Assessing the overall content and quality of a ...huguk
 
Using Feature Grouping as a Stochastic Regularizer for High Dimensional Noisy...
Using Feature Grouping as a Stochastic Regularizer for High Dimensional Noisy...Using Feature Grouping as a Stochastic Regularizer for High Dimensional Noisy...
Using Feature Grouping as a Stochastic Regularizer for High Dimensional Noisy...WiMLDSMontreal
 
Advanced database and data mining & clustering concepts
Advanced database and data mining & clustering conceptsAdvanced database and data mining & clustering concepts
Advanced database and data mining & clustering conceptsNithyananthSengottai
 
Genetic Algorithm
Genetic AlgorithmGenetic Algorithm
Genetic AlgorithmSHIMI S L
 
Paris data-geeks-2013-03-28
Paris data-geeks-2013-03-28Paris data-geeks-2013-03-28
Paris data-geeks-2013-03-28Ted Dunning
 
Geneticalgorithms 100403002207-phpapp02
Geneticalgorithms 100403002207-phpapp02Geneticalgorithms 100403002207-phpapp02
Geneticalgorithms 100403002207-phpapp02Amna Saeed
 
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...Vahid Taslimitehrani
 

Similar to ObjRecog2-17 (1).pptx (20)

L13. Cluster Analysis
L13. Cluster AnalysisL13. Cluster Analysis
L13. Cluster Analysis
 
The painful removal of tiling artefacts in ToF-SIMS data
The painful removal of tiling artefacts in ToF-SIMS dataThe painful removal of tiling artefacts in ToF-SIMS data
The painful removal of tiling artefacts in ToF-SIMS data
 
Declarative data analysis
Declarative data analysisDeclarative data analysis
Declarative data analysis
 
Robots, Small Molecules & R
Robots, Small Molecules & RRobots, Small Molecules & R
Robots, Small Molecules & R
 
BSSML16 L3. Clusters and Anomaly Detection
BSSML16 L3. Clusters and Anomaly DetectionBSSML16 L3. Clusters and Anomaly Detection
BSSML16 L3. Clusters and Anomaly Detection
 
Clustering Methods with R
Clustering Methods with RClustering Methods with R
Clustering Methods with R
 
Clustering.pptx
Clustering.pptxClustering.pptx
Clustering.pptx
 
Terminological cluster trees for Disjointness Axiom Discovery
Terminological cluster trees for Disjointness Axiom DiscoveryTerminological cluster trees for Disjointness Axiom Discovery
Terminological cluster trees for Disjointness Axiom Discovery
 
Sean Kandel - Data profiling: Assessing the overall content and quality of a ...
Sean Kandel - Data profiling: Assessing the overall content and quality of a ...Sean Kandel - Data profiling: Assessing the overall content and quality of a ...
Sean Kandel - Data profiling: Assessing the overall content and quality of a ...
 
Using Feature Grouping as a Stochastic Regularizer for High Dimensional Noisy...
Using Feature Grouping as a Stochastic Regularizer for High Dimensional Noisy...Using Feature Grouping as a Stochastic Regularizer for High Dimensional Noisy...
Using Feature Grouping as a Stochastic Regularizer for High Dimensional Noisy...
 
Advanced database and data mining & clustering concepts
Advanced database and data mining & clustering conceptsAdvanced database and data mining & clustering concepts
Advanced database and data mining & clustering concepts
 
Mini datathon
Mini datathonMini datathon
Mini datathon
 
Cluster Analysis for Dummies
Cluster Analysis for DummiesCluster Analysis for Dummies
Cluster Analysis for Dummies
 
Paris Data Geeks
Paris Data GeeksParis Data Geeks
Paris Data Geeks
 
Genetic Algorithm
Genetic AlgorithmGenetic Algorithm
Genetic Algorithm
 
07 learning
07 learning07 learning
07 learning
 
Paris data-geeks-2013-03-28
Paris data-geeks-2013-03-28Paris data-geeks-2013-03-28
Paris data-geeks-2013-03-28
 
Geneticalgorithms 100403002207-phpapp02
Geneticalgorithms 100403002207-phpapp02Geneticalgorithms 100403002207-phpapp02
Geneticalgorithms 100403002207-phpapp02
 
Clustering: A Scikit Learn Tutorial
Clustering: A Scikit Learn TutorialClustering: A Scikit Learn Tutorial
Clustering: A Scikit Learn Tutorial
 
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...
 

Recently uploaded

Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 

Recently uploaded (20)

Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 

ObjRecog2-17 (1).pptx

  • 1. Object Recognition II Linda Shapiro EE/CSE 576 with CNN slides from Ross Girshick 1
  • 2. Outline • Object detection • the task, evaluation, datasets • Convolutional Neural Networks (CNNs) • overview and history • Region-based Convolutional Networks (R-CNNs) 2
  • 3. Image classification • 𝐾 classes • Task: assign correct class label to the whole image Digit classification (MNIST) Object recognition (Caltech-101) 3
  • 5. Problem formulation person motorbike Input Desired output { airplane, bird, motorbike, person, sofa } 5
  • 6. Evaluating a detector Test image (previously unseen)
  • 7. First detection ... ‘person’ detector predictions 0.9
  • 10. Compare to ground truth ground truth ‘person’ boxes 0.9 0.6 0.2 ‘person’ detector predictions
  • 11. Sort by confidence ... ... ... ... ... ✓ ✓ ✓ 0.9 0.8 0.6 0.5 0.2 0.1 true positive (high overlap) false positive (no overlap, low overlap, or duplicate) X X X
  • 12. Evaluation metric ... ... ... ... ... 0.9 0.8 0.6 0.5 0.2 0.1 ✓ ✓ ✓ X X X 𝑝𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛@𝑡 = #𝑡𝑟𝑢𝑒 𝑝𝑜𝑠𝑖𝑡𝑖𝑣𝑒𝑠@𝑡 #𝑡𝑟𝑢𝑒 𝑝𝑜𝑠𝑖𝑡𝑖𝑣𝑒𝑠@𝑡 + #𝑓𝑎𝑙𝑠𝑒 𝑝𝑜𝑠𝑖𝑡𝑖𝑣𝑒𝑠@𝑡 𝑟𝑒𝑐𝑎𝑙𝑙@𝑡 = #𝑡𝑟𝑢𝑒 𝑝𝑜𝑠𝑖𝑡𝑖𝑣𝑒𝑠@𝑡 #𝑔𝑟𝑜𝑢𝑛𝑑 𝑡𝑟𝑢𝑡ℎ 𝑜𝑏𝑗𝑒𝑐𝑡𝑠 𝑡 ✓ ✓ + X
  • 13. Evaluation metric Average Precision (AP) 0% is worst 100% is best mean AP over classes (mAP) ... ... ... ... ... 0.9 0.8 0.6 0.5 0.2 0.1 ✓ ✓ ✓ X X X
  • 14. Pedestrians Histograms of Oriented Gradients for Human Detection, Dalal and Triggs, CVPR 2005 AP ~77% More sophisticated methods: AP ~90% (a) average gradient image over training examples (b) each “pixel” shows max positive SVM weight in the block centered on that pixel (c) same as (b) for negative SVM weights (d) test image (e) its R-HOG descriptor (f) R-HOG descriptor weighted by positive SVM weights (g) R-HOG descriptor weighted by negative SVM weights 14
  • 15. Overview of HOG Method 1. Compute gradients in the region to be described 2. Put them in bins according to orientation 3. Group the cells into large blocks 4. Normalize each block 5. Train classifiers to decide if these are parts of a human 15
  • 16. Details • Gradients [-1 0 1] and [-1 0 1]T were good enough filters. • Cell Histograms Each pixel within the cell casts a weighted vote for an orientation-based histogram channel based on the values found in the gradient computation. (9 channels worked) • Blocks Group the cells together into larger blocks, either R-HOG blocks (rectangular) or C-HOG blocks (circular). 16
  • 17. More Details • Block Normalization • If you think of the block as a vector v, then the normalized block is v/norm(v) They tried 4 different kinds of normalization. • L1-norm • sqrt of L1-norm • L2 norm • L2-norm followed by clipping 17
  • 18. Example: Dalal-Triggs pedestrian detector 1. Extract fixed-sized (64x128 pixel) window at each position and scale 2. Compute HOG (histogram of gradient) features within each window 3. Score the window with a linear SVM classifier 4. Perform non-maxima suppression to remove overlapping detections with lower scores Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05 18
  • 19. Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05 19
  • 20. uncentered centered cubic-corrected diagonal Sobel Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05 Outperforms 20
  • 21. • Histogram of gradient orientations • Votes weighted by magnitude • Bilinear interpolation between cells Orientation: 9 bins (for unsigned angles) Histograms in 8x8 pixel cells Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05 21
  • 22. Normalize with respect to surrounding cells Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05 22
  • 23. X= Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05 # features = 15 x 7 x 9 x 4 = 3780 # cells # orientations # normalizations by neighboring cells 23
  • 25. Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05 pos w neg w 25
  • 26. pedestrian Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05 26
  • 28. 30
  • 29. Deformable Parts Model • Takes the idea a little further • Instead of one rigid HOG model, we have multiple HOG models in a spatial arrangement • One root part to find first and multiple other parts in a tree structure. 31
  • 30. The Idea Articulated parts model • Object is configuration of parts • Each part is detectable Images from Felzenszwalb 32
  • 31. Deformable objects Images from Caltech-256 Slide Credit: Duan Tran 33
  • 32. Deformable objects Images from D. Ramanan’s dataset Slide Credit: Duan Tran 34
  • 33. How to model spatial relations? • Tree-shaped model 35
  • 34. 36
  • 35. Hybrid template/parts model Detections Template Visualization Felzenszwalb et al. 2008 37
  • 36. Pictorial Structures Model Appearance likelihood Geometry likelihood 38
  • 37. Results for person matching 39
  • 38. Results for person matching 40
  • 40. 2012 State-of-the-art Detector: Deformable Parts Model (DPM) 42 Felzenszwalb et al., 2008, 2010, 2011, 2012 Lifetime Achievement 1. Strong low-level features based on HOG 2. Efficient matching algorithms for deformable part-based models (pictorial structures) 3. Discriminative learning with latent variables (latent SVM)
  • 41. Why did gradient-based models work? Average gradient image 43
  • 42. Generic categories Can we detect people, chairs, horses, cars, dogs, buses, bottles, sheep …? PASCAL Visual Object Categories (VOC) dataset 44
  • 43. Generic categories Why doesn’t this work (as well)? Can we detect people, chairs, horses, cars, dogs, buses, bottles, sheep …? PASCAL Visual Object Categories (VOC) dataset 45
  • 44. Quiz time (Back to Girshick) 46
  • 45. Warm up This is an average image of which object class? 47
  • 48. A little harder ? Hint: airplane, bicycle, bus, car, cat, chair, cow, dog, dining table 50
  • 49. A little harder bicycle (PASCAL) 51
  • 50. A little harder, yet ? 52
  • 51. A little harder, yet ? Hint: white blob on a green background 53
  • 52. A little harder, yet sheep (PASCAL) 54
  • 55. Impossible? dog (PASCAL) Why does the mean look like this? There’s no alignment between the examples! How do we combat this? 57
  • 56. PASCAL VOC detection history 0% 10% 20% 30% 40% 50% 60% 70% 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 mean Average Precision (mAP) year DPM DPM, HOG+ BOW DPM, MKL DPM++ DPM++, MKL, Selective Search Selective Search, DPM++, MKL 41% 41% 37% 28% 23% 17%
  • 57. Part-based models & multiple features (MKL) 0% 10% 20% 30% 40% 50% 60% 70% 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 mean Average Precision (mAP) year DPM DPM, HOG+ BOW DPM, MKL DPM++ DPM++, MKL, Selective Search Selective Search, DPM++, MKL 41% 41% 37% 28% 23% 17%
  • 58. Kitchen-sink approaches 0% 10% 20% 30% 40% 50% 60% 70% 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 mean Average Precision (mAP) year DPM DPM, HOG+ BOW DPM, MKL DPM++ DPM++, MKL, Selective Search Selective Search, DPM++, MKL 41% 41% 37% 28% 23% 17% increasing complexity & plateau
  • 59. Region-based Convolutional Networks (R-CNNs) 0% 10% 20% 30% 40% 50% 60% 70% 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 mean Average Precision (mAP) year DPM DPM, HOG+ BOW DPM, MKL DPM++ DPM++, MKL, Selective Search Selective Search, DPM++, MKL 41% 41% 37% 28% 23% 17% 53% 62% R-CNN v1 R-CNN v2 [R-CNN. Girshick et al. CVPR 2014]
  • 60. 0% 10% 20% 30% 40% 50% 60% 70% 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 mean Average Precision (mAP) year ~1 year ~5 years Region-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014]
  • 62. Standard Neural Networks 𝒙 = 𝑥1, … , 𝑥784 𝑇 𝑧𝑗 = 𝑔(𝒘𝑗 𝑇 𝒙) 𝑔 𝑡 = 1 1 + 𝑒−𝑡 “Fully connected” 64
  • 63. From NNs to Convolutional NNs • Local connectivity • Shared (“tied”) weights • Multiple feature maps • Pooling 65
  • 64. Convolutional NNs • Local connectivity • Each green unit is only connected to (3) neighboring blue units compare 66
  • 65. Convolutional NNs • Shared (“tied”) weights • All green units share the same parameters 𝒘 • Each green unit computes the same function, but with a different input window 𝑤1 𝑤2 𝑤3 𝑤1 𝑤2 𝑤3 67
  • 66. Convolutional NNs • Convolution with 1-D filter: [𝑤3, 𝑤2, 𝑤1] • All green units share the same parameters 𝒘 • Each green unit computes the same function, but with a different input window 𝑤1 𝑤2 𝑤3 68
  • 67. Convolutional NNs • Convolution with 1-D filter: [𝑤3, 𝑤2, 𝑤1] • All green units share the same parameters 𝒘 • Each green unit computes the same function, but with a different input window 𝑤1 𝑤2 𝑤3 69
  • 68. Convolutional NNs • Convolution with 1-D filter: [𝑤3, 𝑤2, 𝑤1] • All green units share the same parameters 𝒘 • Each green unit computes the same function, but with a different input window 𝑤1 𝑤2 𝑤3 70
  • 69. Convolutional NNs • Convolution with 1-D filter: [𝑤3, 𝑤2, 𝑤1] • All green units share the same parameters 𝒘 • Each green unit computes the same function, but with a different input window 𝑤1 𝑤2 𝑤3 71
  • 70. Convolutional NNs • Convolution with 1-D filter: [𝑤3, 𝑤2, 𝑤1] • All green units share the same parameters 𝒘 • Each green unit computes the same function, but with a different input window 𝑤1 𝑤2 𝑤3 72
  • 71. Convolutional NNs • Multiple feature maps • All orange units compute the same function but with a different input windows • Orange and green units compute different functions 𝑤1 𝑤2 𝑤3 𝑤′1 𝑤′2 𝑤′3 Feature map 1 (array of green units) Feature map 2 (array of orange units) 73
  • 72. Convolutional NNs • Pooling (max, average) 1 4 0 3 4 3 • Pooling area: 2 units • Pooling stride: 2 units • Subsamples feature maps 74
  • 74. 1989 Backpropagation applied to handwritten zip code recognition, Lecun et al., 1989 76
  • 76. Historical perspective – 1980 Hubel and Wiesel 1962 Included basic ingredients of ConvNets, but no supervised learning algorithm 78
  • 77. Supervised learning – 1986 Early demonstration that error backpropagation can be used for supervised training of neural nets (including ConvNets) Gradient descent training with error backpropagation 79
  • 78. Supervised learning – 1986 “T” vs. “C” problem Simple ConvNet 80
  • 79. Practical ConvNets Gradient-Based Learning Applied to Document Recognition, Lecun et al., 1998 81
  • 80. Demo • http://cs.stanford.edu/people/karpathy/convnetjs/ demo/mnist.html • ConvNetJS by Andrej Karpathy (Ph.D. student at Stanford) Software libraries • Caffe (C++, python, matlab) • Torch7 (C++, lua) • Theano (python) 82
  • 81. The fall of ConvNets • The rise of Support Vector Machines (SVMs) • Mathematical advantages (theory, convex optimization) • Competitive performance on tasks such as digit classification • Neural nets became unpopular in the mid 1990s 83
  • 82. The key to SVMs • It’s all about the features Histograms of Oriented Gradients for Human Detection, Dalal and Triggs, CVPR 2005 SVM weights (+) (-) HOG features 84
  • 83. Core idea of “deep learning” • Input: the “raw” signal (image, waveform, …) • Features: hierarchy of features is learned from the raw input 85
  • 84. • If SVMs killed neural nets, how did they come back (in computer vision)? 86
  • 85. What’s new since the 1980s? • More layers • LeNet-3 and LeNet-5 had 3 and 5 learnable layers • Current models have 8 – 20+ • “ReLU” non-linearities (Rectified Linear Unit) • 𝑔 𝑥 = max 0, 𝑥 • Gradient doesn’t vanish • “Dropout” regularization • Fast GPU implementations • More data 𝑥 𝑔(𝑥) 87
  • 86. What else? Object Proposals • Sliding window based object detection • Object proposals • Fast execution • High recall with low # of candidate boxes Image Feature Extraction Classificaiton Iterate over window size, aspect ratio, and location Image Feature Extraction Classificaiton Object Proposal 88
  • 87. The number of contours wholly enclosed by a bounding box is indicative of the likelihood of the box containing an object. 89
  • 88. Ross’s Own System: Region CNNs
  • 90. Top Regions for Six Object Classes
  • 91. Finale • Object recognition has moved rapidly in the last 12 years to becoming very appearance based. • The HOG descriptor lead to fast recognition of specific views of generic objects, starting with pedestrians and using SVMs. • Deformable parts models extended that to allow more objects with articulated limbs, but still specific views. • CNNs have become the method of choice; they learn from huge amounts of data and can learn multiple views of each object class. 93