Perception and Intelligence Lab.
Scalable Object Detection
using Deep Neural Networks
Saturday, June 10, 2017
Presenter: Junho Cho
Dumitru Erhan, Christian Szegedy, Alexander Toshev, and Dragomir Anguelov
Google, Inc
CVPR 2014
Introduction
+ DeepMultiBox
  - Scalable object detection using a DNN
+ Class-agnostic, scalable object detection
  - Predicts only bounding boxes; it does not know what object is inside each box.
  - Predicts a set of bounding boxes where objects are likely to be.
  - Localize, then recognize.
+ Boxes are generated by a single DNN, which outputs
  • a fixed number of bounding boxes, and
  • a score for each box: the confidence that the box contains an object.
Previous work
+ Common paradigm
  - Detect objects of one particular class at a time.
  - Apply the detector exhaustively over sub-images,
    • at all locations and scales.
  - Successful with discriminatively trained DPMs (deformable part models, PAMI 2010).
  - Requires too much computation.
  - Gets harder as the number of classes grows:
    • a separate detector must be trained per class.
Proposed approach
+ Model
  - Encode the $i$-th object box and its confidence as node values of the last network layer.
+ Bounding box
  - Upper-left and lower-right coordinates, $(x_1, y_1)$ and $(x_2, y_2)$
  - Vector: $l_i = [x_1, y_1, x_2, y_2] \in \mathbb{R}^4$ (4 node values)
  - Coordinates are normalized with respect to the image dimensions.
  - Computed as a linear transform of the last hidden layer.
+ Confidence
  - Confidence score that the box contains an object
  - Score: $c_i \in [0, 1]$ (1 node value)
  - Computed as a linear transform of the last hidden layer followed by a sigmoid.
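A minimal sketch of the box encoding, assuming pixel coordinates with the origin at the upper-left corner and normalization by image width (for x) and height (for y). `normalize_box` and `denormalize_box` are hypothetical helper names, not from the paper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def normalize_box(box: Box, width: int, height: int) -> List[float]:
    """Map a pixel box to l_i = [x1, y1, x2, y2] with coordinates in [0, 1]."""
    x1, y1, x2, y2 = box
    return [x1 / width, y1 / height, x2 / width, y2 / height]


def denormalize_box(l_i: List[float], width: int, height: int) -> Box:
    """Inverse of normalize_box: map a predicted l_i back to pixel coordinates."""
    x1, y1, x2, y2 = l_i
    return (x1 * width, y1 * height, x2 * width, y2 * height)


# A 100x50 box near the upper-left corner of a 400x200 image.
l_i = normalize_box((40, 20, 140, 70), width=400, height=200)
print(l_i)  # [0.1, 0.1, 0.35, 0.35]
```

Normalized coordinates keep the regression targets in the same range regardless of image size.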
+ At inference time, the network outputs
  - $K$ bounding boxes, with $K = 100$ or $200$:
    • box locations $l_i$, $i \in \{1, \dots, K\}$
    • confidences $c_i$, $i \in \{1, \dots, K\}$
[Figure: output layer. Each prediction $i$ uses 5 nodes ($x_1, y_1, x_2, y_2$ for $l_i$ plus one for $c_i$), so $K = 100$ gives 500 output nodes and $K = 200$ gives 1000.]
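The output layer can be sketched in NumPy as two linear maps on the last hidden layer: one producing $4K$ box coordinates and one producing $K$ logits passed through a sigmoid. The hidden size `D = 64` and the random weights are hypothetical stand-ins for the trained network.

```python
import numpy as np


def multibox_head(hidden, w_loc, b_loc, w_conf, b_conf):
    """Map the last hidden layer to K boxes and K confidences.

    hidden: (D,) activations of the last hidden layer
    w_loc: (4K, D), b_loc: (4K,)  linear transform for the box coordinates
    w_conf: (K, D), b_conf: (K,)  linear transform followed by a sigmoid
    """
    k = w_conf.shape[0]
    boxes = (w_loc @ hidden + b_loc).reshape(k, 4)  # row i is l_i = [x1, y1, x2, y2]
    logits = w_conf @ hidden + b_conf
    conf = 1.0 / (1.0 + np.exp(-logits))            # c_i in (0, 1)
    return boxes, conf


rng = np.random.default_rng(0)
D, K = 64, 100  # D is a hypothetical hidden size; K = 100 as on the slide
hidden = rng.standard_normal(D)
boxes, conf = multibox_head(
    hidden,
    0.01 * rng.standard_normal((4 * K, D)), np.zeros(4 * K),
    0.01 * rng.standard_normal((K, D)), np.zeros(K),
)
print(boxes.shape, conf.shape, boxes.size + conf.size)  # (100, 4) (100,) 500
```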
+ Training objective
  - Train the DNN to predict the boxes $l_i$ and their confidences $c_i$
    • such that the highest-scoring boxes match the ground-truth object boxes well.
+ A training image contains $M$ ground-truth (GT) objects, each labeled with a bounding box.
  - GT boxes: $g_j$, $j \in \{1, \dots, M\}$
  - In practice, $K \gg M$.
+ Therefore, optimize only the predictions that best match the ground truth.
+ Formulation as an assignment problem
+ $x$ is an assignment from predicted bounding boxes to GT boxes:
  - $x_{ij} \in \{0, 1\}$, with $i \in \{1, \dots, K\}$ and $j \in \{1, \dots, M\}$
  - $x_{ij} = 1$ means the $i$-th prediction is assigned to the $j$-th true object.
+ Localization loss
[Figure: bipartite assignment between predictions $1, \dots, K$ and GT boxes $1, \dots, M$; an edge from $i$ to $j$ means $x_{ij} = 1$.]
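A sketch of the localization loss, assuming the squared-L2 form $F_{\text{loc}}(x, l) = \frac{1}{2} \sum_{i,j} x_{ij} \lVert l_i - g_j \rVert_2^2$ as we read it from the paper (the slide does not show the formula). Only assigned pairs ($x_{ij} = 1$) contribute.

```python
import numpy as np


def localization_loss(x, l, g):
    """F_loc(x, l) = 1/2 * sum_ij x_ij * ||l_i - g_j||_2^2.

    x: (K, M) binary assignment, x[i, j] = 1 if prediction i is assigned to GT j
    l: (K, 4) predicted boxes, g: (M, 4) GT boxes (normalized coordinates)
    """
    # Squared L2 distance between every prediction and every GT box: shape (K, M)
    sq_dist = ((l[:, None, :] - g[None, :, :]) ** 2).sum(axis=-1)
    return 0.5 * float((x * sq_dist).sum())


l = np.array([[0.1, 0.1, 0.4, 0.4],
              [0.5, 0.5, 0.9, 0.9],
              [0.0, 0.0, 1.0, 1.0]])
g = np.array([[0.1, 0.1, 0.5, 0.4]])
x = np.array([[1], [0], [0]])        # prediction 0 is assigned to GT 0
print(localization_loss(x, l, g))    # ≈ 0.005, i.e. 0.5 * 0.1**2
```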
+ Optimize the box confidences according to the assignment $x$.
+ Confidence loss
+ Term (a)
  - For every predicted box $i$ assigned to a ground truth $j$ ($x_{ij} = 1$), maximize $c_i$.
+ Term (b)
  - $\sum_j x_{ij} = 1$: prediction $i$ has been matched to a ground truth,
    • so the term becomes zero.
  - $\sum_j x_{ij} = 0$: prediction $i$ has not been matched to any ground truth,
    • so minimize $c_i$.
[Figure: the same bipartite assignment between predictions $1, \dots, K$ and GT boxes $1, \dots, M$ ($x_{ij} = 1$), annotated with terms (a) and (b).]
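A sketch of the confidence loss, assuming the form $F_{\text{conf}}(x, c) = -\sum_{i,j} x_{ij} \log c_i - \sum_i (1 - \sum_j x_{ij}) \log(1 - c_i)$, where the first sum is term (a) and the second is term (b). The clipping constant `eps` is an implementation choice, not from the slides.

```python
import numpy as np


def confidence_loss(x, c, eps=1e-12):
    """F_conf(x, c) = -sum_ij x_ij log c_i - sum_i (1 - sum_j x_ij) log(1 - c_i).

    x: (K, M) binary assignment, c: (K,) confidences in (0, 1).
    Confidences are clipped to [eps, 1 - eps] to avoid log(0).
    """
    c = np.clip(c, eps, 1.0 - eps)
    matched = x.sum(axis=1)                              # sum_j x_ij for each prediction i
    term_a = -(x * np.log(c)[:, None]).sum()             # (a): raise c_i of matched boxes
    term_b = -((1 - matched) * np.log(1.0 - c)).sum()    # (b): lower c_i of unmatched boxes
    return float(term_a + term_b)


c = np.array([0.9, 0.2, 0.5])
x = np.array([[1], [0], [0]])   # only prediction 0 is matched to the single GT box
loss = confidence_loss(x, c)    # equals -log(0.9) - log(0.8) - log(0.5)
```

When each prediction is matched to at most one GT box, this is a binary cross-entropy with target 1 for matched predictions and 0 for the rest.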
+ Final loss objective
  - Combination of the localization loss and the confidence loss
+ $\alpha$: balance term
  - Set to 0.3
+ Optimization
  - For each training example, solve for an optimal assignment $x^*$.
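Written out, the objective has the following form. This is our transcription of the paper's formulation (the slides show it only as images): the two sums in $F_{\text{conf}}$ are terms (a) and (b), and the constraint makes every GT box receive exactly one prediction.

```latex
\begin{aligned}
F_{\text{loc}}(x, l) &= \tfrac{1}{2} \sum_{i,j} x_{ij} \, \lVert l_i - g_j \rVert_2^2 \\
F_{\text{conf}}(x, c) &= -\sum_{i,j} x_{ij} \log c_i - \sum_{i} \Big(1 - \sum_{j} x_{ij}\Big) \log (1 - c_i) \\
F(x, l, c) &= \alpha \, F_{\text{loc}}(x, l) + F_{\text{conf}}(x, c), \qquad \alpha = 0.3 \\
x^* &= \arg\min_{x} F(x, l, c) \quad \text{s.t.} \quad x_{ij} \in \{0, 1\}, \quad \sum_{i} x_{ij} = 1 \;\; \forall j
\end{aligned}
```

Since $K \gg M$, most predictions are unmatched and receive gradient only through term (b).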
+ Bipartite matching
  - Solvable in polynomial time,
    • e.g., with the Hungarian method, whose time complexity is $O(n^3)$.
  - The matching is inexpensive:
    • in most cases an image has at most a dozen ground-truth boxes,
    • so it is fast.
[Figure: predictions $1, \dots, K$ matched to GT boxes 1-5.]
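The assignment $x^*$ can be computed as a bipartite matching. Expanding $F = \alpha F_{\text{loc}} + F_{\text{conf}}$ gives $\sum_{i,j} x_{ij} C_{ij}$ plus a term independent of $x$, with $C_{ij} = \frac{\alpha}{2}\lVert l_i - g_j \rVert^2 - \log c_i + \log(1 - c_i)$. The sketch below assumes that loss form and uses SciPy's `linear_sum_assignment` as a stand-in for the Hungarian method; it requires $K \ge M$.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def optimal_assignment(l, c, g, alpha=0.3, eps=1e-12):
    """Solve x* = argmin_x F(x, l, c) with each GT box assigned to one prediction.

    Cost C_ij = alpha/2 * ||l_i - g_j||^2 - log c_i + log(1 - c_i); the
    remaining term -sum_i log(1 - c_i) does not depend on x and is dropped.
    """
    c = np.clip(c, eps, 1.0 - eps)
    sq_dist = ((l[:, None, :] - g[None, :, :]) ** 2).sum(axis=-1)   # (K, M)
    cost = 0.5 * alpha * sq_dist + (np.log(1.0 - c) - np.log(c))[:, None]
    rows, cols = linear_sum_assignment(cost)   # every column (GT) gets a distinct row
    x = np.zeros_like(cost, dtype=int)
    x[rows, cols] = 1
    return x


l = np.array([[0.0, 0.0, 0.5, 0.5],
              [0.5, 0.5, 1.0, 1.0],
              [0.2, 0.2, 0.3, 0.3]])
c = np.array([0.6, 0.6, 0.6])
g = np.array([[0.5, 0.5, 1.0, 1.0],
              [0.0, 0.0, 0.5, 0.5]])
x_star = optimal_assignment(l, c, g)
# prediction 1 -> GT 0, prediction 0 -> GT 1, prediction 2 unmatched
```

With equal confidences the confidence part of the cost is identical for every row, so the match is decided purely by box distance.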
+ Example: 5 GT boxes and $K$ predictions
[Figure: an image with 5 numbered GT boxes and the predicted boxes in red (in practice $K = 100$ or $200$, so there are many more red boxes); the best-matching prediction is found for each GT box.]
+ Optimize the network parameters via backpropagation (BP)
  - Compute the first derivatives of the loss with respect to $l$ and $c$.
  - Evaluate the gradient given $x^*$, then update the network parameters.
  - Train with stochastic gradient descent (SGD).
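With $x^*$ held fixed, the loss is differentiable in the network outputs: $\partial F / \partial l_i = \alpha \sum_j x_{ij}(l_i - g_j)$, and, with $c_i = \sigma(z_i)$, $\partial F / \partial z_i = c_i - \sum_j x_{ij}$. The sketch checks these derivatives (our derivation, assuming the loss form above) against finite differences and takes one SGD step. It treats the outputs `l` and logits `z` directly as parameters; in the real model these gradients are backpropagated into the network weights.

```python
import numpy as np

ALPHA = 0.3  # balance term from the slides


def total_loss(x, l, z, g, alpha=ALPHA):
    """F = alpha * F_loc(x, l) + F_conf(x, c) with c = sigmoid(z); x stays fixed."""
    c = 1.0 / (1.0 + np.exp(-z))
    m = x.sum(axis=1)                                        # sum_j x_ij
    sq = ((l[:, None, :] - g[None, :, :]) ** 2).sum(axis=-1)
    f_loc = 0.5 * (x * sq).sum()
    f_conf = -(m * np.log(c)).sum() - ((1 - m) * np.log(1 - c)).sum()
    return alpha * f_loc + f_conf


def gradients(x, l, z, g, alpha=ALPHA):
    """First derivatives of F with respect to l and the confidence logits z."""
    c = 1.0 / (1.0 + np.exp(-z))
    m = x.sum(axis=1)
    grad_l = alpha * (m[:, None] * l - x @ g)   # alpha * sum_j x_ij (l_i - g_j)
    grad_z = c - m                              # sigmoid folded into the confidence loss
    return grad_l, grad_z


rng = np.random.default_rng(0)
K, M = 5, 2
l, z, g = rng.uniform(size=(K, 4)), rng.standard_normal(K), rng.uniform(size=(M, 4))
x = np.zeros((K, M))
x[1, 0] = x[3, 1] = 1                           # a fixed assignment x*
grad_l, grad_z = gradients(x, l, z, g)

# Central finite differences should agree with the analytic gradients.
h = 1e-6
num_l = np.zeros_like(l)
for idx in np.ndindex(*l.shape):
    d = np.zeros_like(l)
    d[idx] = h
    num_l[idx] = (total_loss(x, l + d, z, g) - total_loss(x, l - d, z, g)) / (2 * h)
num_z = np.array([(total_loss(x, l, z + h * e, g) - total_loss(x, l, z - h * e, g)) / (2 * h)
                  for e in np.eye(K)])

# One SGD step on the outputs (the learning rate is a hypothetical choice).
lr = 0.1
before = total_loss(x, l, z, g)
after = total_loss(x, l - lr * grad_l, z - lr * grad_z, g)
```

Unmatched predictions get no localization gradient; only their confidences are pushed down.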
+ The training principle above is sufficient on its own,
  - but additional modifications make training faster and the model more accurate.
+ Modification
  1. Cluster all training GT locations (all $g_j$ from the training images).
     • Find $K$ clusters/centroids with k-means,
       – where $K$ is the number of predictions,
     • and use them as priors for the predicted locations.
     • This encourages the network to learn a residual with respect to a prior.
  - Each prediction learns from its corresponding prior:
    • the 1st prior goes to the $l_1$ node,
    • …
    • the $K$-th prior goes to the $l_K$ node.
  - So the $l_i$ node predicts a box close to the $i$-th prior.
[Figure: priors $1, \dots, K$, each paired with the prediction $1, \dots, K$ that learns from it.]
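A sketch of computing priors with k-means over normalized GT boxes. It assumes plain Lloyd's k-means with squared Euclidean distance on $[x_1, y_1, x_2, y_2]$ and a deterministic farthest-point initialization; the slides do not specify the distance or initialization. The toy data and `K = 2` are hypothetical (the slides use 100 or 200).

```python
import numpy as np


def kmeans_priors(gt_boxes, k, iters=50):
    """Cluster training GT boxes (N, 4) into k centroids to use as priors."""
    # Farthest-point initialization: start from the first box, then repeatedly
    # add the box farthest from all centroids chosen so far.
    centroids = [gt_boxes[0]]
    for _ in range(1, k):
        d = np.min([((gt_boxes - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(gt_boxes[d.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(iters):
        dist = ((gt_boxes[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)                 # nearest centroid per GT box
        for j in range(k):
            members = gt_boxes[labels == j]
            if len(members):                         # keep empty clusters in place
                centroids[j] = members.mean(axis=0)
    return centroids


# Hypothetical training GT boxes: one group on the left, one on the right.
rng = np.random.default_rng(1)
left = np.array([0.05, 0.2, 0.45, 0.8]) + 0.02 * rng.standard_normal((50, 4))
right = np.array([0.55, 0.2, 0.95, 0.8]) + 0.02 * rng.standard_normal((50, 4))
priors = kmeans_priors(np.vstack([left, right]), k=2)

# Prediction i is parameterized as prior_i plus a learned residual.
residual = np.zeros_like(priors)   # stands in for the network's residual output
predicted = priors + residual
```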
+ Modification
  2. Use these priors in the matching process instead.
     • Find the best match between the $K$ priors and the GT boxes.
     • Compute the confidence loss and the localization loss between each GT box and the coordinates of the prediction that corresponds to its matched prior.
     • This is called prior matching.
       – Hypothesis: it enforces diversification among the predictions.
     • Without it, convergence is slow and the resulting model is of lower quality.
[Figure: the $K$ priors are matched to GT boxes 1-5 (best match); the loss is applied to the prediction corresponding to each matched prior, so the prediction is guided by its prior.]
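A sketch of prior matching, assuming the matching cost is the squared L2 distance between each prior and each GT box (the slides do not give the exact cost). The current predictions do not enter the matching: the prior index decides which prediction is trained on which GT box. The `+ 0.05` network outputs are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def prior_matching(priors, g):
    """Match each GT box to one prior (requires K >= M) by squared L2 distance.

    Returns x (K, M) with x[i, j] = 1 if prior i, and therefore prediction i,
    is responsible for GT box j.
    """
    cost = ((priors[:, None, :] - g[None, :, :]) ** 2).sum(axis=-1)
    rows, cols = linear_sum_assignment(cost)
    x = np.zeros(cost.shape, dtype=int)
    x[rows, cols] = 1
    return x


priors = np.array([[0.0, 0.0, 0.5, 0.5],
                   [0.5, 0.0, 1.0, 0.5],
                   [0.0, 0.5, 0.5, 1.0],
                   [0.5, 0.5, 1.0, 1.0]])
g = np.array([[0.55, 0.45, 0.95, 1.0]])   # one GT box in the lower-right quadrant
x = prior_matching(priors, g)

# The losses then use the prediction that corresponds to the matched prior.
l = priors + 0.05                          # hypothetical network outputs
matched_pred = l[x[:, 0].argmax()]
loc_loss = 0.5 * ((matched_pred - g[0]) ** 2).sum()
conf_target = x.sum(axis=1)                # 1 for prediction 3, 0 for all others
```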
+ Each prediction corresponds to a prior:
  - it learns to predict boxes near its prior,
  - so the prediction is guided by the prior.
First localize (DeepMultiBox):
+ Predict bounding box locations and their confidences.
+ Use the confidence scores and non-maximum suppression (NMS)
  - to obtain a small number of high-confidence boxes.
+ These boxes are expected to represent objects.
Then recognize:
+ A subsequent classifier turns the boxes into object detections.
+ A powerful classifier can be afforded
  - because there are only a few boxes.
+ The paper uses a second DNN for classification:
  - AlexNet (A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet classification with deep convolutional neural networks.)
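A sketch of greedy NMS, the common variant; the slides do not specify which NMS is used. The default threshold of 0.5 Jaccard similarity and `top_k = 10` mirror the experiment settings in this deck. The box values are made up.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def jaccard(a: Box, b: Box) -> float:
    """Jaccard similarity (intersection over union) of two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], threshold: float = 0.5,
        top_k: int = 10) -> List[int]:
    """Keep the highest-scoring box, drop boxes overlapping a kept box by more
    than `threshold`, and repeat; return up to top_k kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(jaccard(boxes[i], boxes[k]) <= threshold for k in keep):
            keep.append(i)
            if len(keep) == top_k:
                break
    return keep


boxes = [(0.1, 0.1, 0.5, 0.5), (0.12, 0.1, 0.52, 0.5), (0.6, 0.6, 0.9, 0.9)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 too much
```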
Experiment results
+ Experiment details
  - Parallel training
    • for faster convergence
  - Boxes are pruned using NMS
    • with a Jaccard similarity threshold of 0.5, where $J(A, B) = \frac{|A \cap B|}{|A \cup B|}$ (intersection over union).
  - Generate more images from the dataset:
    • 0-5%, 5-15%, 15-50%, 50-100% of images.
Experiment results - VOC 2007
+ VOC 2007
  - The PASCAL Visual Object Classes Challenge
  - 20 object classes labeled with bounding boxes
+ Training on VOC 2012 (11,000 images)
  - Trained a box localizer with $K = 100$
  - Trained on a dataset consisting of
    • 10 million crops overlapping some object
      – with at least 0.5 Jaccard similarity,
    • 20 million negative crops
      – with at most 0.2 Jaccard similarity with any object box,
      – labeled with the "background" class label.
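A sketch of labeling training crops with the thresholds on this slide. Two details are assumptions: a positive crop takes the class of its best-overlapping object, and crops with Jaccard similarity strictly between 0.2 and 0.5 are skipped. `label_crop` is a hypothetical helper name.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def jaccard(a: Box, b: Box) -> float:
    """Jaccard similarity (intersection over union) of two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_crop(crop: Box, objects: List[Tuple[Box, str]]) -> Optional[str]:
    """Return a class label, "background", or None (crop not used)."""
    best_j, best_cls = 0.0, None
    for box, cls in objects:
        j = jaccard(crop, box)
        if j > best_j:
            best_j, best_cls = j, cls
    if best_j >= 0.5:
        return best_cls          # positive crop: class of the best-overlapping object
    if best_j <= 0.2:
        return "background"      # negative crop: at most 0.2 with every object box
    return None                  # 0.2 < J < 0.5: ambiguous, skipped (assumption)


objects = [((0.1, 0.1, 0.5, 0.5), "dog"), ((0.6, 0.5, 0.9, 0.9), "sheep")]
print(label_crop((0.1, 0.1, 0.5, 0.5), objects))   # dog
print(label_crop((0.0, 0.6, 0.3, 0.9), objects))   # background
```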
+ Evaluation
  - Take the maximum center square crop
    • and resize it to the network input size of 220x220 (AlexNet).
  - A single network pass yields a hundred candidate boxes.
  - Apply NMS and keep the top 10 highest-scoring detections.
  - Classify them with a 21-way classifier (20 classes + background class).
[Figure: maximum center square crop, resized to an input size of 220x220.]
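A sketch of the maximum center square crop and of mapping a predicted box back to image pixels. It assumes predictions are normalized with respect to the crop, so the resize to 220x220 does not change the mapping. The function names and the 500x375 example size are illustrative.

```python
from typing import Sequence, Tuple


def max_center_square_crop(width: int, height: int) -> Tuple[int, int, int, int]:
    """Largest square centered in the image, as (x1, y1, x2, y2) in pixels."""
    side = min(width, height)
    x1 = (width - side) // 2
    y1 = (height - side) // 2
    return x1, y1, x1 + side, y1 + side


def crop_box_to_image(l_i: Sequence[float],
                      crop: Tuple[int, int, int, int]) -> Tuple[float, ...]:
    """Map a box normalized w.r.t. the crop back to image pixel coordinates."""
    cx1, cy1, cx2, _ = crop
    side = cx2 - cx1
    x1, y1, x2, y2 = l_i
    return (cx1 + x1 * side, cy1 + y1 * side, cx1 + x2 * side, cy1 + y2 * side)


INPUT_SIZE = 220                              # network input size from the slide
crop = max_center_square_crop(500, 375)       # e.g. a 500x375 image
print(crop)                                   # (62, 0, 437, 375)
scale = INPUT_SIZE / (crop[2] - crop[0])      # resize factor from crop to 220x220
```

Anything outside the square (here, 62 pixels on each side) is not seen by the network in this single pass.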
+ Discussion
  - Analysis of the localizer in isolation
  - Additional scales:
    • 3x3 windows, each 60% of the image size
  - Localizing with 10 bounding boxes:
    • max center crop: 45.3%
    • max center crop + 1 additional scale: 48%
  - Shows the importance of looking at the image at several resolutions:
    • results improve with higher-resolution image crops.
  - Better than other reported results:
    • 42% ("What is an object?", CVPR 2010)
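A sketch of the additional scale: a 3x3 grid of windows, each 60% of the image width and height. The slide gives only the grid and the size; the even spacing, with the outer windows touching the image borders, is an assumption. Each window would be run through the localizer like the center crop, and the resulting boxes merged (e.g., with NMS).

```python
from typing import List, Tuple


def grid_windows(width: float, height: float, frac: float = 0.6,
                 n: int = 3) -> List[Tuple[float, float, float, float]]:
    """n x n windows of size frac * (width, height), evenly spaced so the
    outermost windows touch the image borders (spacing is an assumption)."""
    w, h = frac * width, frac * height
    step_x = (width - w) / (n - 1)
    step_y = (height - h) / (n - 1)
    return [(c * step_x, r * step_y, c * step_x + w, r * step_y + h)
            for r in range(n) for c in range(n)]


windows = grid_windows(500, 375)
print(len(windows))  # 9
```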
+ Discussion
  - Post-classification:
    • mAP: 0.29
  - Quite competitive,
    • since the running-time complexity is very low:
    • only the top 10 boxes are used.
[Figure: results compared with DPM.]
[Figure: example detections (max-center crop; full image used). Small objects, such as boats and sheep, are still detectable.]
Experiment results – ILSVRC 2012
+ ILSVRC 2012 Classification with Localization Challenge
  - Localization model with more heuristic methods
    • Inception architecture
  - After classification
  - Far fewer proposals
+ The MultiBox approach can use transfer learning
  - to detect objects it was never specifically trained on,
    • by exploiting similarities with objects it has seen.
  - Figure 5: trained on ImageNet and tested on the VOC test set,
    • and vice versa.
  - Class-agnostic detection was performed.
+ The ImageNet-trained model captures more VOC windows
  - than the VOC-trained model captures ImageNet windows.
  - Hypothesis: the ImageNet class set is richer than the VOC class set.
Contributions
+ Three contributions
  1. A new definition of object detection:
     • a regression problem onto the coordinates of several bounding boxes, together with a confidence score for how likely each box is to contain an object,
     • whereas traditional methods score features within predefined boxes.
  2. A loss function that trains the bounding box predictors as part of network training:
     • the assignment problem is solved in a way that exploits the learning ability of the DNN,
     • trained with backpropagation.
  3. The object box detector is trained in a class-agnostic manner:
     • a scalable way to detect a large number of object classes;
     • with post-classification, it achieves competitive detection results;
     • the box predictor generalizes to unseen classes,
       – so it is flexible enough to be reused for other detection problems.
Discussion and Conclusion
+ Competing method
  - Better detection performance, but more computation
  - OverFeat
    • Efficiently slides a ConvNet over multiple locations and scales.
    • Predicts one bounding box per class.
    • 2 sec/image on a GPU,
    • 40x slower than the GPU implementation of DeepMultiBox.
    • SCR with a centered crop is the method closest to DeepMultiBox:
      – it scores 40.0%, while DeepMultiBox scores 40.94%.
    • DeepMultiBox extracts multiple regions of interest in a single network evaluation.
+ Competing method
  - R-CNN using selective search
    • Proposes 2000 candidate locations per image.
    • Extracts top-layer features from a ConvNet.
    • Uses an SVM trained with hard negatives to classify the locations into VOC classes.
    • 200x more expensive.
+ Current state (a localization network and a categorization network)
  - 5-10 network evaluations:
    • 1 for localization and several more for classification.
  - The cost does not scale linearly with the number of classes to be recognized,
  - which makes the approach very competitive with DPM-like approaches.
+ Future work: build localization and recognition into a single network
  - that extracts both locations and class labels in a single feed-forward pass.
Thank you
Introduction
+ AlexNet (NIPS 2012)
  - Convolution, ReLU, normalization, and pooling together form 1 convolutional layer.
  - 5 convolutional layers
  - 2 fully connected hidden layers
Experiment results – ILSVRC 2012
+ Evaluation
  - Detection@5
    • Produce one box for each of the 5 predicted labels.
      – Counted as positive when at least one box and its associated label are correct,
      – where a box is correct if it has at least 0.5 Jaccard overlap with the GT.
    • Table 2:
      – number of windows chosen after NMS, ranked by confidence score
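A sketch of the Detection@5 check for a single image, following the description above: up to 5 (label, box) pairs, and the image counts as positive if any pair has the right label and at least 0.5 Jaccard overlap with a GT object of that label. `detection_at_5` is a hypothetical name, and the example boxes are made up.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def jaccard(a: Box, b: Box) -> float:
    """Jaccard similarity (intersection over union) of two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def detection_at_5(predictions: List[Tuple[str, Box]],
                   ground_truth: List[Tuple[str, Box]]) -> bool:
    """True if one of the first 5 (label, box) pairs matches a GT object's
    label and overlaps that object with Jaccard >= 0.5."""
    return any(
        p_label == g_label and jaccard(p_box, g_box) >= 0.5
        for p_label, p_box in predictions[:5]
        for g_label, g_box in ground_truth
    )


gt = [("dog", (0.1, 0.1, 0.5, 0.5))]
hit = [("cat", (0.1, 0.1, 0.5, 0.5)), ("dog", (0.12, 0.1, 0.52, 0.5))]
miss = [("cat", (0.1, 0.1, 0.5, 0.5)), ("dog", (0.6, 0.6, 0.9, 0.9))]
print(detection_at_5(hit, gt), detection_at_5(miss, gt))  # True False
```

A well-placed box with the wrong label ("cat") does not count, and neither does the right label with a badly placed box.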
+ Comparison with one-box-per-class
  - A re-implementation of the winning entry of the ILSVRC-2012 "classification with localization" challenge
    • SuperVision (Hinton's group)
      – Code not provided…
  - DeepMultiBox is competitive with 5-10 windows.
  - Two drawbacks of one-box-per-class:
    1. The output scales linearly with the number of classes.
    2. It does not generalize naturally to multiple instances of objects of the same type.
2. Does not generalize naturally to multiple objects of the same type.
+ Generalizing to such scenarios
  - is necessary for real image understanding.
+ DeepMultiBox offers a scalable way to do so.
+ In Fig. 5, it generally captures more objects, more accurately, than a single-box method.
Discussion and Conclusion
+ A novel method for localizing objects in an image
+ Uses a deep CNN as the base feature extraction and learning model.
+ Formulates a multi-box localization cost
  - that takes advantage of a variable number of GT locations per image
  - to learn to predict such locations in unseen images.
+ Results on challenging benchmarks: VOC 2007 and ILSVRC 2012
+ Works well while predicting only very few locations,
  - which are then probed by a subsequent classifier.
+ Scalable, and generalizes across the two datasets:
  - it can predict locations of interest even for classes it was not trained on.
+ Captures multiple instances of the same class,
  - an important feature for better image understanding.
+ Predicting more windows captures more GT bounding boxes,
  - but yields no comparable increase in mAP on VOC 2007.
  - Hypothesis: a classification model trained with hard-negative mining, which learns to model local features, context, and detector confidences jointly, would take better advantage of the proposed windows.

 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 

150424 Scalable Object Detection using Deep Neural Networks

  • 1. Perception and Intelligence Lab.
    Scalable Object Detection using Deep Neural Networks
    Saturday, June 10, 2017
    Presenter: Junho Cho
    Dumitru Erhan, Christian Szegedy, Alexander Toshev, and Dragomir Anguelov (Google, Inc.), CVPR 2014
  • 2. Introduction
    + DeepMultiBox: scalable object detection using a DNN
    + Class-agnostic scalable object detection
      - Predicts bounding boxes only; it does not know what object is inside each box.
      - Predicts a set of bounding boxes where potential objects are located.
      - Localize first, then recognize.
    + The boxes are generated by a single DNN, which outputs
      - a fixed number of bounding boxes, and
      - a score for each box: the confidence that the box contains an object.
  • 3. Previous work
    + Common paradigm: detection of objects of one particular class
      - Detectors operate on sub-images and are applied exhaustively, at all locations and scales.
      - This was successful with the discriminatively trained DPM (PAMI 2010).
      - It requires too much computation.
      - It gets harder as the number of classes grows, because a separate detector is trained per class.
  • 4. Proposed approach
    + Model: encode the i-th object box and its confidence as node values of the last network layer.
    + Bounding box
      - Upper-left (x1, y1) and lower-right (x2, y2) coordinates: l_i = [x1 y1 x2 y2] ∈ ℝ^4, i.e. 4 node values.
      - Coordinates are normalized with respect to the image dimensions.
      - Computed as a linear transform of the last hidden layer.
    + Confidence
      - Confidence score that the box contains an object: c_i ∈ [0, 1], 1 node value.
      - Computed as a linear transform of the last hidden layer followed by a sigmoid.
  • 5. Proposed approach
    + At inference time the network outputs K bounding boxes, with K = 100 or 200:
      - box locations l_i, i ∈ {1, …, K}
      - confidences c_i, i ∈ {1, …, K}
    + Each box uses 5 output nodes (x1, y1, x2, y2, c_i), so K = 100 gives 500 nodes and K = 200 gives 1000 nodes.
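A minimal sketch of how the 5K output nodes could be read back as K boxes and K confidences. The memory layout (all 4K coordinates first, then K confidence logits) and the names `decode_multibox_output` and `sigmoid` are assumptions for illustration; the slides only state that each box uses 4 linear outputs plus 1 sigmoid output.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), normalized to the image size


def sigmoid(z: float) -> float:
    """Logistic function: squashes a confidence logit into [0, 1]."""
    return 1.0 / (1.0 + math.exp(-z))


def decode_multibox_output(raw: List[float], k: int) -> Tuple[List[Box], List[float]]:
    """Split a flat vector of 5*k output values into k boxes and k confidences.

    Hypothetical layout: the first 4*k values are the box coordinates
    l_1..l_K (plain linear outputs); the last k values are confidence
    logits that go through a sigmoid.
    """
    if len(raw) != 5 * k:
        raise ValueError(f"expected {5 * k} values, got {len(raw)}")
    boxes = [tuple(raw[4 * i:4 * i + 4]) for i in range(k)]
    confidences = [sigmoid(z) for z in raw[4 * k:]]
    return boxes, confidences


# Toy example with K = 2 (the slides use K = 100 -> 500 nodes, K = 200 -> 1000 nodes).
raw = [0.1, 0.2, 0.5, 0.6, 0.0, 0.0, 1.0, 1.0, 0.0, 2.0]
boxes, conf = decode_multibox_output(raw, k=2)
print(boxes)                        # [(0.1, 0.2, 0.5, 0.6), (0.0, 0.0, 1.0, 1.0)]
print([round(c, 3) for c in conf])  # [0.5, 0.881]
```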
  • 6. Proposed approach
    + Training objective: train the DNN to predict the boxes l_i and their confidences c_i such that the highest-scoring boxes match the ground-truth object boxes well.
  • 7. Proposed approach
    + A training image contains M ground-truth (GT) objects labeled with bounding boxes g_j, j ∈ {1, …, M}.
    + In practice K ≫ M, so only the predictions that best match the ground truth are optimized.
  • 8. Proposed approach
    + Formulate training as an assignment problem.
      - x is an assignment from predicted bounding boxes to GT boxes: x_ij ∈ {0, 1}, i ∈ {1, …, K}, j ∈ {1, …, M}.
      - x_ij = 1 means the i-th prediction is assigned to the j-th true object.
    + Localization loss
    (Figure: predictions 1 … i … K on one side, GT boxes 1 … j … M on the other, with an edge where x_ij = 1.)
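For reference, the localization loss in the Erhan et al. paper is the squared L2 distance between each prediction and the ground-truth box it is assigned to, restated here in the slides' notation. Only matched pairs (x_ij = 1) contribute:

```latex
F_{\text{match}}(x, l) = \frac{1}{2} \sum_{i=1}^{K} \sum_{j=1}^{M} x_{ij} \, \lVert l_i - g_j \rVert_2^2,
\qquad x_{ij} \in \{0, 1\}
```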
  • 9. Proposed approach
    + Optimize the confidences of the boxes according to the assignment x: confidence loss.
    + Term (a): for every predicted box i assigned to a ground truth j (x_ij = 1), maximize c_i.
    + Term (b):
      - Σ_j x_ij = 1: prediction i has been matched to a ground truth, so the term becomes zero.
      - Σ_j x_ij = 0: prediction i has not been matched to any ground truth, so c_i is minimized.
    (Figure: predictions 1 … i … K and GT boxes 1 … j … M, with x_ij = 1 marking a match; terms (a) and (b) labeled.)
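Written out as in the paper, the confidence loss is a log-loss over the boxes. The first sum is term (a), which rewards a high c_i for matched predictions. The second sum is term (b): it vanishes when Σ_j x_ij = 1 and penalizes a high c_i when Σ_j x_ij = 0:

```latex
F_{\text{conf}}(x, c) = -\sum_{i,j} x_{ij} \log(c_i)
  \;-\; \sum_{i} \Bigl(1 - \sum_{j} x_{ij}\Bigr) \log(1 - c_i)
```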
  • 10. Proposed approach
    + Final loss objective: a combination of the localization loss and the confidence loss.
    + α: balance term between the two losses; 0.3 was used.
    + Optimization: for each training example, solve for an optimal assignment x*.
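A minimal sketch of the combined objective for one fixed assignment. It assumes the loss terms as defined in the paper: F = α·F_match + F_conf with F_match = ½ Σ_ij x_ij ‖l_i − g_j‖² and F_conf = −Σ_ij x_ij log c_i − Σ_i (1 − Σ_j x_ij) log(1 − c_i). The optimal x* is the assignment minimizing F, subject to each GT box being assigned to exactly one prediction. `multibox_loss`, `sq_dist` and the dict encoding of x are hypothetical names chosen for this sketch.

```python
import math

ALPHA = 0.3  # balance term alpha reported on the slide


def sq_dist(a, b):
    """Squared L2 distance between two coordinate vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def multibox_loss(pred_boxes, confidences, gt_boxes, assignment, alpha=ALPHA):
    """F(x, l, c) = alpha * F_match(x, l) + F_conf(x, c) for a fixed assignment.

    `assignment` is a hypothetical dict encoding of x: prediction index i ->
    ground-truth index j for every pair with x_ij = 1.
    """
    # Localization term: 0.5 * ||l_i - g_j||^2 over matched pairs only.
    f_match = sum(0.5 * sq_dist(pred_boxes[i], gt_boxes[j]) for i, j in assignment.items())
    # Confidence term: log-loss that pushes matched c_i up and unmatched c_i down.
    f_conf = 0.0
    for i, c in enumerate(confidences):
        f_conf -= math.log(c) if i in assignment else math.log(1.0 - c)
    return alpha * f_match + f_conf


preds = [(0.1, 0.1, 0.5, 0.5), (0.5, 0.5, 0.9, 0.9)]
confs = [0.9, 0.2]
gts = [(0.1, 0.1, 0.5, 0.6)]
loss = multibox_loss(preds, confs, gts, assignment={0: 0})
print(round(loss, 3))  # 0.33
```

The small α keeps the localization term from dominating the log-loss when box coordinates are far from their targets early in training. This is a design reading of the balance term, not something the slides state.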
  • 11. Proposed approach
    + Bipartite matching
      - Solvable in polynomial time, e.g. with the Hungarian method in O(n³).
      - The matching is inexpensive: in most cases an image has at most a dozen ground-truth boxes, so it is fast.
    (Figure: predictions 1 … K matched to GT boxes 1-5.)
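Why the matching reduces to a bipartite assignment: expand the confidence log-loss using −(1 − Σ_j x_ij)·log(1 − c_i) = −log(1 − c_i) + Σ_j x_ij·log(1 − c_i). The total loss for fixed l and c then becomes Σ_ij x_ij·cost_ij plus a constant, with cost_ij = (α/2)‖l_i − g_j‖² − log c_i + log(1 − c_i). The sketch below minimizes that with a potential-based Hungarian method (the well-known e-maxx formulation, O(M²K)), using GT boxes as rows and predictions as columns. The loss form is taken from the paper. `hungarian` and `multibox_cost` are hypothetical names, and this is an illustration rather than the authors' implementation.

```python
import math
from typing import List

INF = float("inf")


def hungarian(cost: List[List[float]]) -> List[int]:
    """Minimum-cost assignment for an n x m cost matrix with n <= m.

    Returns match[r] = column assigned to row r. Potential-based Hungarian
    method, O(n^2 * m); indices are 1-based internally, column 0 is a sentinel.
    """
    n, m = len(cost), len(cost[0])
    if n > m:
        raise ValueError("need at least as many predictions (columns) as GT boxes (rows)")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (m + 1)   # column potentials
    p = [0] * (m + 1)     # p[j] = row currently matched to column j (0 = free)
    way = [0] * (m + 1)   # predecessor column on the augmenting path
    for row in range(1, n + 1):
        p[0], j0 = row, 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:  # grow the alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip matches along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            match[p[j] - 1] = j - 1
    return match


def multibox_cost(pred_boxes, confidences, gt_boxes, alpha=0.3):
    """cost[j][i] = alpha/2 * ||l_i - g_j||^2 - log(c_i) + log(1 - c_i)."""
    return [[alpha * 0.5 * sum((a - b) ** 2 for a, b in zip(l, g))
             - math.log(c) + math.log(1.0 - c)
             for l, c in zip(pred_boxes, confidences)]
            for g in gt_boxes]


preds = [(0.0, 0.0, 0.5, 0.5), (0.5, 0.5, 1.0, 1.0), (0.2, 0.2, 0.6, 0.6)]
confs = [0.6, 0.7, 0.5]
gts = [(0.5, 0.5, 1.0, 1.0), (0.0, 0.0, 0.5, 0.5)]
print(hungarian(multibox_cost(preds, confs, gts)))  # [1, 0]: GT 0 -> pred 1, GT 1 -> pred 0
```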
  • 12. Proposed approach
    + Example: 5 GT boxes and K predictions; find the best match from GT to predictions. (In practice K = 100 or 200, so there would be many more red prediction boxes.)
    (Figure: numbered GT and prediction boxes on an example image.)
  • 13. Proposed approach
    + Optimize the network parameters via backpropagation (BP):
      - compute the first derivatives of the loss with respect to l and c;
      - update the network parameters after evaluating the gradient given x*;
      - train with stochastic gradient descent.
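Given x*, the first derivatives at the output layer have a simple closed form, assuming the paper's loss. The localization term α·½‖l_i − g_j‖² gives α(l_i − g_j) for a matched box. The confidence log-loss −x log c − (1 − x) log(1 − c), with c = sigmoid(z), gives c_i − Σ_j x*_ij with respect to the logit z_i. The sketch computes only these output gradients; propagating them into the weights and taking the SGD step is left to whatever framework trains the network. `output_gradients` and the dict encoding of x* are hypothetical.

```python
def output_gradients(pred_boxes, confidences, gt_boxes, assignment, alpha=0.3):
    """Gradients of F = alpha * F_match + F_conf at the network outputs, given x*.

    `assignment` (hypothetical encoding) maps prediction i -> GT j for x*_ij = 1.
    Returns (dF/dl_i per box, dF/dz_i per confidence logit z_i with c_i = sigmoid(z_i)).
    """
    grad_l = []
    for i, l in enumerate(pred_boxes):
        if i in assignment:  # dF/dl_i = alpha * (l_i - g_j) for the matched GT j
            g = gt_boxes[assignment[i]]
            grad_l.append([alpha * (a - b) for a, b in zip(l, g)])
        else:                # unmatched boxes receive no localization gradient
            grad_l.append([0.0] * len(l))
    # Log-loss through a sigmoid: dF/dz_i = c_i - sum_j x*_ij.
    grad_z = [c - (1.0 if i in assignment else 0.0) for i, c in enumerate(confidences)]
    return grad_l, grad_z


preds = [(0.1, 0.1, 0.5, 0.5), (0.5, 0.5, 0.9, 0.9)]
confs = [0.9, 0.2]
gts = [(0.1, 0.1, 0.5, 0.6)]
grad_l, grad_z = output_gradients(preds, confs, gts, assignment={0: 0})
# These output gradients would then be back-propagated into the network
# weights and applied with an SGD step; that part depends on the framework.
```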
  • 14. Proposed approach
    + The training principle above is sufficient, but additional modifications make training faster and the model more accurate.
    + Modification 1: cluster all training GT locations (all g_j from the training images).
      - Find K clusters/centroids with k-means, where K is the number of predictions.
      - Use the centroids as priors for the predicted locations; this encourages the network to learn a residual to a prior.
      - Each prediction learns from its corresponding prior: the 1st prior goes to node l_1, …, the K-th prior to node l_K.
      - Node l_i therefore predicts a box close to the i-th prior.
    (Figure: priors 1 … K, each paired with the prediction 1 … K that learns from it.)
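A minimal sketch of Modification 1 under stated assumptions. It runs plain Lloyd's k-means on the 4-d GT coordinates (random initialization from GT boxes is this sketch's choice) and treats each node's output as its prior plus a learned residual. `kmeans_priors` and `predict_from_residual` are hypothetical names, and the toy run uses K = 2 instead of the number of predictions.

```python
import random
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), normalized


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_priors(boxes: Sequence[Box], k: int, iters: int = 20, seed: int = 0) -> List[Box]:
    """Lloyd's k-means on 4-d GT box coordinates; the k centroids become the priors.

    Random initialization from k GT boxes is an assumption of this sketch.
    """
    rng = random.Random(seed)
    centroids = [tuple(b) for b in rng.sample(list(boxes), k)]
    for _ in range(iters):
        clusters: List[List[Box]] = [[] for _ in range(k)]
        for b in boxes:  # assignment step: nearest centroid
            nearest = min(range(k), key=lambda c: sq_dist(b, centroids[c]))
            clusters[nearest].append(b)
        # Update step: mean of each cluster; keep the old centroid if a cluster is empty.
        centroids = [
            tuple(sum(vals) / len(members) for vals in zip(*members)) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
    return centroids


def predict_from_residual(priors: Sequence[Box], residuals: Sequence[Box]) -> List[Box]:
    """Node l_i outputs prior_i + residual_i, so the network only learns the residual."""
    return [tuple(p + r for p, r in zip(prior, res)) for prior, res in zip(priors, residuals)]


gt_boxes = [(0.0, 0.0, 0.2, 0.2), (0.02, 0.0, 0.22, 0.2), (0.0, 0.02, 0.2, 0.22),
            (0.5, 0.5, 1.0, 1.0), (0.52, 0.5, 1.0, 1.0), (0.5, 0.52, 1.0, 1.0)]
priors = kmeans_priors(gt_boxes, k=2)  # toy K = 2; the slides use K = number of predictions
```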
  • 15. Proposed approach
    + Modification 2: use these priors in the matching process instead of the predictions.
      - Find the best match between the K priors and the GT boxes.
      - Compute the confidence and localization losses between each GT box and the coordinates of the prediction that corresponds to its matched prior.
      - This is called prior matching. Hypothesis: it enforces diversification among the predictions.
      - Without it, convergence is slow and the resulting model is of lower quality.
    (Figure: priors 1 … K best-matched to GT boxes 1-5; the loss is computed on the corresponding predictions, so each prediction is guided by its prior.)
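A toy sketch of prior matching. The GT boxes are matched against the fixed priors, and the losses are then computed on the predictions with the same indices. Two choices here are this sketch's own assumptions: squared L2 distance as the prior-to-GT matching cost, and exhaustive search standing in for the Hungarian method (fine only for a handful of boxes). `prior_matching` and `prior_matched_loss` are hypothetical names.

```python
import itertools
import math


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prior_matching(priors, gt_boxes):
    """Match every GT box to a distinct prior, minimizing total squared distance.

    Exhaustive search over injective maps: toy sizes only. The Hungarian
    method would be used for K = 100 or 200.
    """
    best = min(itertools.permutations(range(len(priors)), len(gt_boxes)),
               key=lambda perm: sum(sq_dist(priors[i], g) for i, g in zip(perm, gt_boxes)))
    return {i: j for j, i in enumerate(best)}  # prior/prediction index -> GT index


def prior_matched_loss(pred_boxes, confidences, priors, gt_boxes, alpha=0.3):
    """Match on the priors, then compute the losses on the corresponding predictions."""
    assignment = prior_matching(priors, gt_boxes)
    f_match = sum(0.5 * sq_dist(pred_boxes[i], gt_boxes[j]) for i, j in assignment.items())
    f_conf = -sum(math.log(c) if i in assignment else math.log(1.0 - c)
                  for i, c in enumerate(confidences))
    return alpha * f_match + f_conf, assignment


priors = [(0.0, 0.0, 0.5, 0.5), (0.5, 0.5, 1.0, 1.0), (0.25, 0.25, 0.75, 0.75)]
preds = [(0.1, 0.1, 0.4, 0.4), (0.2, 0.2, 0.6, 0.6), (0.3, 0.3, 0.7, 0.7)]
confs = [0.1, 0.5, 0.2]
gts = [(0.45, 0.5, 1.0, 1.0)]
loss, assignment = prior_matched_loss(preds, confs, priors, gts)
print(assignment)  # {1: 0}: prior 1 is closest, so prediction 1 is trained toward the GT box
```

In this toy case prediction 2 happens to lie closer to the GT box than prediction 1. Prior matching still trains prediction 1, the slot whose prior fits the GT. That is how each prediction stays specialized around its own prior.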
  • 16. Proposed approach
    + Each prediction corresponding to a prior learns to predict near that prior: the prediction is guided by the prior.
    (Figure: example prior and prediction boxes.)
  • 17. Proposed approach
    First localize (DeepMultiBox):
    + Predict bounding box locations and their associated confidences.
    + Use the confidence scores and non-maximum suppression (NMS) to obtain a smaller number of high-confidence boxes.
    + These boxes are meant to represent objects.
    Then recognize:
    + A subsequent classifier can be applied for object detection.
    + Because only a few boxes remain, a powerful classifier is affordable.
    + The paper used a second DNN for classification: AlexNet (A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet classification with deep convolutional neural networks").
  • 18. Experiment results
    + Experiment details
      - Parallel training, for faster convergence.
      - Boxes are pruned using NMS with a Jaccard similarity (intersection / union) threshold of 0.5.
      - More images are generated from the dataset: 0-5%, 5-15%, 15-50%, 50-100% of images.
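The pruning step can be sketched with the common greedy NMS variant, which is an assumption since the slides do not say which variant was used. Sort by confidence, keep a box, and discard later boxes whose Jaccard overlap with any kept box exceeds the threshold. `jaccard`, `nms` and its optional `top_k` cutoff are hypothetical names. The slides later keep the top 10 detections, which `top_k` mimics.

```python
def jaccard(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, threshold=0.5, top_k=None):
    """Greedy NMS: return indices of kept boxes, highest score first.

    A box is suppressed if its overlap with an already kept box is > threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(jaccard(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
            if top_k is not None and len(keep) == top_k:
                break
    return keep


boxes = [(0.0, 0.0, 1.0, 1.0), (0.05, 0.0, 1.0, 1.0), (0.0, 0.0, 0.4, 0.4), (0.6, 0.6, 1.0, 1.0)]
scores = [0.9, 0.8, 0.7, 0.3]
print(nms(boxes, scores))           # [0, 2, 3]: box 1 overlaps box 0 with Jaccard 0.95
print(nms(boxes, scores, top_k=2))  # [0, 2]
```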
  • 19. Experiment results - VOC 2007
    + VOC 2007: the PASCAL Visual Object Classes Challenge, with 20 object classes labeled with bounding boxes.
    + Training on VOC 2012 (11,000 images):
      - trained a K = 100 box localizer;
      - trained on a dataset comprising
        • 10 million crops overlapping some object with at least 0.5 Jaccard similarity, and
        • 20 million negative crops with at most 0.2 Jaccard similarity to any object box, labeled with the "background" class label.
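The crop-labeling rule on this slide can be sketched as follows. A crop takes the class of its best-overlapping object if the Jaccard overlap is at least 0.5, and becomes "background" if its overlap with every object is at most 0.2. Two details are assumptions of this sketch: that crops falling between the two thresholds are discarded, and the hypothetical names `label_crop` and `jaccard`.

```python
def jaccard(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_crop(crop, objects, pos_thresh=0.5, neg_thresh=0.2):
    """Label a training crop from its best Jaccard overlap with the annotated objects.

    objects: list of (box, class_label). Returns the object's class if the best
    overlap is >= pos_thresh, "background" if it is <= neg_thresh, else None
    (crop unused, an assumption of this sketch).
    """
    best_iou, best_label = 0.0, None
    for box, label in objects:
        iou = jaccard(crop, box)
        if iou > best_iou:
            best_iou, best_label = iou, label
    if best_iou >= pos_thresh:
        return best_label
    if best_iou <= neg_thresh:
        return "background"
    return None


objects = [((0.1, 0.1, 0.5, 0.5), "dog"), ((0.6, 0.6, 0.9, 0.9), "cat")]
print(label_crop((0.1, 0.1, 0.5, 0.6), objects))  # dog (Jaccard 0.8)
print(label_crop((0.0, 0.0, 0.1, 0.1), objects))  # background
print(label_crop((0.1, 0.1, 0.5, 1.0), objects))  # None (Jaccard about 0.44)
```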
  • 20. Experiment results - VOC 2007
    + Evaluation
      - Take the maximum centered square crop, resized to the network input size of 220x220 (AlexNet).
      - A single network pass yields a hundred candidate boxes.
      - Apply NMS and keep the top 10 highest-scoring detections.
      - Classify them with a 21-way classifier (20 classes + background).
    (Figure: max square crop, input size 220x220.)
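A sketch of the maximum centered square crop and of mapping predicted boxes back to the original image. It assumes the network predicts coordinates normalized to the crop, consistent with the normalized coordinates described earlier. Because the coordinates are normalized, the resize to 220x220 drops out of the mapping. `max_center_square_crop` and `crop_box_to_image` are hypothetical names.

```python
INPUT_SIZE = 220  # network input resolution from the slide; the crop is resized to this


def max_center_square_crop(width, height):
    """Largest square centered in the image, returned as (x0, y0, side) in pixels."""
    side = min(width, height)
    return (width - side) / 2.0, (height - side) / 2.0, side


def crop_box_to_image(box, crop):
    """Map a box normalized to the crop (x1, y1, x2, y2 in [0, 1]) back to image pixels."""
    x0, y0, side = crop
    x1, y1, x2, y2 = box
    return (x0 + x1 * side, y0 + y1 * side, x0 + x2 * side, y0 + y2 * side)


crop = max_center_square_crop(640, 480)               # (80.0, 0.0, 480)
scale = INPUT_SIZE / crop[2]                           # resize factor, not needed for mapping
print(crop_box_to_image((0.25, 0.25, 0.75, 0.75), crop))  # (200.0, 120.0, 440.0, 360.0)
```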
  • 21. Experiment results - VOC 2007
    + Discussion: analysis of the localizer in isolation
      - Additional scale: 3x3 windows, each 60% of the image size.
      - Localization with 10 bounding boxes: max center crop 45.3%; max center + 1 scale 48%.
      - This shows the importance of looking at the image at several resolutions: results improve with higher-resolution image crops.
      - Better than other reported results: 42% ("What is an object?", CVPR 2010).
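One way to lay out the additional scale is sketched below. The windows are evenly spaced so the outer ones touch the image borders, which is an assumption since the slide only says "3x3 windows of size 60% of image". `grid_windows` is a hypothetical name. Each window would be cropped, resized, and passed through the localizer separately.

```python
def grid_windows(width, height, frac=0.6, n=3):
    """n x n windows, each `frac` of the image size, spread evenly from edge to edge.

    Returns (x1, y1, x2, y2) tuples in pixels, row by row.
    """
    if n < 2:
        raise ValueError("n must be at least 2 for an evenly spaced grid")
    w, h = frac * width, frac * height
    step_x, step_y = (width - w) / (n - 1), (height - h) / (n - 1)
    return [(c * step_x, r * step_y, c * step_x + w, r * step_y + h)
            for r in range(n) for c in range(n)]


windows = grid_windows(100, 100)
print(len(windows))  # 9
```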
  • 22. Experiment results - VOC 2007
    + Discussion
      - After post-classification: mAP of 0.29.
      - Quite competitive, given the very low running-time complexity: only the top 10 boxes are used.
    (Figure/table: comparison with DPM.)
  • 23. Experiment results - VOC 2007
    (Figure: example detections on the max-center crop. The full image is used, yet small objects such as boats and sheep are still detectable.)
  • 24. Experiment results – ILSVRC 2012
    + ILSVRC 2012 Classification with Localization Challenge
      - Localization model with more heuristic methods
      - Inception architecture
  • 25. Experiment results – ILSVRC 2012
    + ILSVRC 2012 Classification with Localization Challenge
      - Localization model with more heuristic methods
      - Inception architecture
      - After classification: far fewer proposals.
  • 26. Experiment results – ILSVRC 2012
    + The MultiBox approach can use transfer learning
      - to detect objects it was never specifically trained on, by exploiting similarities with objects it has seen.
      - Figure 5: trained on ImageNet and tested on the VOC test set, and vice versa.
      - Detection was performed in a class-agnostic manner.
  • 27. Experiment results – ILSVRC 2012
    + The ImageNet-trained model captures more VOC windows than the VOC-trained model captures ImageNet windows.
    + Hypothesis: the ImageNet class set is richer than the VOC class set.
  • 28. Contributions
    + Three contributions
      1. A new definition of object detection
        • as a regression problem to the coordinates of several bounding boxes, plus a confidence score for how likely each box is to contain an object;
        • traditionally, features are scored within predefined boxes.
  • 29. Contributions
    + Three contributions
      2. A loss function that trains the bounding box predictors as part of network training
        • The assignment problem is solved in a way that exploits the learning abilities of the DNN.
        • Trained with backpropagation.
  • 30. Contributions
    + Three contributions
      3. The object box detector is trained in a class-agnostic manner
        • a scalable way to detect a large number of object classes;
        • with post-classification, it achieves competitive detection results;
        • the box predictor generalizes to unseen classes, so it can be re-used for other detection problems.
  • 31. Discussion and Conclusion
    + Competing method: better detection performance, but more computation.
      - OverFeat
        • efficient sliding-window ConvNet evaluated at multiple locations and scales;
        • predicts one bounding box per class;
        • 2 sec/image on GPU, 40x slower than the GPU implementation of DeepMultiBox;
        • SCR with a centered crop is the closest method to DeepMultiBox: it scores 40.0%, while DeepMultiBox scores 40.94%;
        • DeepMultiBox extracts multiple regions of interest in one network evaluation.
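Taking the two OverFeat figures on this slide at face value, the implied DeepMultiBox GPU time per image works out to roughly:

```latex
t_{\text{DeepMultiBox}} \approx \frac{t_{\text{OverFeat}}}{40}
  = \frac{2\ \text{s/image}}{40} = 0.05\ \text{s/image} = 50\ \text{ms/image}
```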
  • 32. Discussion and Conclusion
    + Competing method
      - R-CNN using selective search
        • proposes 2000 candidate locations per image;
        • extracts top-layer features from a ConvNet;
        • classifies the locations into VOC classes with hard-negative-trained SVMs;
        • 200x more expensive.
  • 33. Discussion and Conclusion
    + Current state (localization network plus categorization network)
      - 5-10 network evaluations: 1 for localization and several more for classification.
      - The cost does not scale linearly with the number of classes to be recognized, which makes the approach very competitive with DPM-like approaches.
    + Future work: build localization and recognition into a single network that extracts both locations and class labels in a single feed-forward pass.
  • 34. Thank you
  • 35. Introduction
    + AlexNet (NIPS 2012)
      - convolution, pooling, ReLU, and normalization together form 1 convolutional layer
      - 5 convolutional layers
      - 2 fully-connected hidden layers
  • 36. Experiment results – ILSVRC 2012
    + Evaluation: Detection@5
      - Produce one box for each of the 5 predicted labels.
      - A result is positive when at least one box and its associated label are correct, with a Jaccard overlap of at least 0.5.
      - Table 2: number of windows chosen after NMS, ranked by confidence score.
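The Detection@5 criterion can be sketched as a single predicate. An image counts as correct if any of the (up to) 5 predicted (label, box) pairs has the right label and overlaps a ground-truth box of that label with Jaccard ≥ 0.5. `detection_at_5_correct` is a hypothetical helper, not the official ILSVRC evaluation code.

```python
def jaccard(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def detection_at_5_correct(predictions, gt_objects, thresh=0.5):
    """predictions: up to 5 (label, box) pairs; gt_objects: (label, box) pairs.

    Positive if at least one predicted pair has a correct label and a box
    overlapping a GT box of that label with Jaccard >= thresh.
    """
    return any(p_label == g_label and jaccard(p_box, g_box) >= thresh
               for p_label, p_box in predictions[:5]
               for g_label, g_box in gt_objects)


gt = [("dog", (0.0, 0.0, 1.0, 1.0))]
print(detection_at_5_correct([("cat", (0.0, 0.0, 1.0, 1.0)), ("dog", (0.0, 0.0, 0.9, 1.0))], gt))  # True
print(detection_at_5_correct([("dog", (0.0, 0.0, 0.4, 1.0))], gt))  # False (Jaccard 0.4)
```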
  • 37. Experiment results – ILSVRC 2012
    + Comparison with one-box-per-class
      - A re-implementation of the winning entry of the ILSVRC-2012 "classification with localization" challenge (SuperVision, Hinton); the code was not provided.
      - DeepMultiBox is competitive with 5-10 windows.
      - Two drawbacks of one-box-per-class:
        1. Its output scales linearly with the number of classes.
        2. It does not generalize naturally to multiple instances of objects of the same type.
  • 38. Experiment results – ILSVRC 2012
    2. One-box-per-class does not generalize naturally to multiple objects of the same type.
    + Generalizing to such scenarios is necessary for actual image understanding.
    + DeepMultiBox handles it in a scalable way.
    + In Fig. 5, it generally captures more objects, more accurately, than a single-box method.
  • 39. Discussion and Conclusion
    + A novel method for localizing objects in an image.
    + Uses a deep CNN as the base feature extraction and learning model.
    + Formulates a multi-box localization cost that
      - takes advantage of the number of GT locations in an image, and
      - learns to predict such locations in unseen images.
  • 40. Discussion and Conclusion
    + Results on challenging benchmarks: VOC 2007 and ILSVRC 2012.
    + Works well while predicting only very few locations, which are then probed by a subsequent classifier.
    + Scalable, and generalizes across the two datasets: it can predict locations of interest even for classes it was not trained on.
    + Captures multiple instances of the same class, an important feature for better image understanding.
  • 41. Discussion and Conclusion
    + Predicting more windows captures more GT bounding boxes,
      - but gives no comparable increase in mAP on VOC 2007.
      - Hypothesis: a classification model that uses hard-negative mining and jointly models local features, context, and detector confidences could take better advantage of the proposed windows.