Object Detection
Computer Vision 2
Xavier Giro-i-Nieto
@DocXavi
xavier.giro@upc.edu
Associate Professor
Universitat Politècnica de Catalunya
Spring 2020
Acknowledgements
Amaia Salvador
amaia.salvador@upc.edu
PhD Candidate
Universitat Politècnica de Catalunya
[UPC TelecomBCN 2016] [UPC TelecomBCN 2017]
[UPC TelecomBCN 2018]
Míriam Bellver
miriam.bellver@bsc.edu
PhD Candidate
Barcelona Supercomputing Center
Universitat Politècnica de Catalunya
Andreu Girbau
andreu.girbau@upc.edu
PhD Candidate
Universitat Politècnica de Catalunya
AutomaticTV
[UPC TelecomBCN 2019]
Outline
1. Motivation
2. Datasets
3. Evaluation
4. Neural Architectures
a. Two-stage
b. Single-stage
5. Software implementations
Recap
Figure from Charles Ollion - Olivier Grisel
Object Detection
[Figure: example image with detections labeled CAT, DOG, DUCK]
The task of assigning a label and a bounding box to all objects in the image:
1. We don't know the number of objects in advance.
2. Object detection relies on object proposal and object classification.
Object Detection as Classification
Classes = [cat, dog, duck]
The classifier is applied to one candidate window after another:
Window 1: Cat? NO, Dog? NO, Duck? NO
Window 2: Cat? NO, Dog? NO, Duck? NO
Window 3: Cat? YES, Dog? NO, Duck? NO
Window 4: Cat? NO, Dog? NO, Duck? NO
Object Detection as Classification
Challenge:
A very large number of possibilities:
● position
● scale
● aspect ratio
Question: Do you think it is feasible to evaluate all possibilities?
Solution: If your classifier is fast enough, go for it.
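The sketch below makes the "very large number of possibilities" concrete by enumerating sliding windows over a single image. The image size, window scales, aspect ratios, and stride are hypothetical values chosen for illustration. They are not the settings of any specific detector.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels

def sliding_windows(img_w: int, img_h: int, scales: List[int],
                    aspect_ratios: List[float], stride: int) -> List[Box]:
    """Enumerate every window of each (scale, aspect ratio) at every stride step."""
    windows = []
    for s in scales:
        for ar in aspect_ratios:
            # Keep the window area close to s*s while changing its shape (ar = w / h)
            w = int(round(s * ar ** 0.5))
            h = int(round(s / ar ** 0.5))
            if w > img_w or h > img_h:
                continue  # this window does not fit inside the image
            for y in range(0, img_h - h + 1, stride):
                for x in range(0, img_w - w + 1, stride):
                    windows.append((x, y, x + w, y + h))
    return windows

# Hypothetical settings: 640x480 image, 3 scales, 3 aspect ratios, 8-pixel stride
boxes = sliding_windows(640, 480, scales=[64, 128, 256],
                        aspect_ratios=[0.5, 1.0, 2.0], stride=8)
print(len(boxes))  # more than 20,000 windows for these settings
```

Each of these windows would need its own classifier pass. For this reason, exhaustive search is only practical with a very fast classifier.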
Object Detection with ConvNets?
ConvNets are computationally demanding: we can't test all positions & scales!
Solution: Look at a tiny subset of positions. Choose them wisely :)
Classic Datasets
Dataset | Categories | Training images | Validation / test images
PASCAL | 20 | 6k | 6k validation, 10k test
ILSVRC | 200 | 456k | 60k validation + test
COCO | 80 | 200k | 60k validation + test
Open Images Dataset
Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., ... & Ferrari, V. The open images dataset v4: Unified
image classification, object detection, and visual relationship detection at scale. IJCV 2020. [dataset]
Open Images Dataset v6
Images with a large number of different classes annotated (11
on the left, 7 on the right).
Evaluation metrics: Intersection over Union (IoU)
● Also known as the Jaccard index
● Size of the intersection divided by the size of the union: IoU(A, B) = |A ∩ B| / |A ∪ B|
● Evaluates localization
Figure: Pyimagesearch
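The definition translates directly into a few lines of code. This minimal sketch assumes that boxes are given as (x1, y1, x2, y2) corner coordinates in continuous pixel units. Some datasets store boxes as (x, y, w, h) instead, and some implementations add 1 to widths and heights when using inclusive pixel indices.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x2 > x1 and y2 > y1

def iou(a: Box, b: Box) -> float:
    """Intersection over Union (Jaccard index) of two axis-aligned boxes."""
    # Corners of the intersection rectangle
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # 0 if the boxes do not overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1 / (4 + 4 - 1) = 1/7, about 0.143
```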
Metric: Average Precision (AP) for Object Detection
Consider the case in which your object detection algorithm provides you with:
● The coordinates of each bounding box.
● A confidence score for each bounding box.
[Figure: three predictions with confidence scores 0.7, 0.9 and 0.5]
Rank your predictions based on the confidence score of your object detection algorithm:
[Figure: predictions ranked #1 (0.9), #2 (0.7), #3 (0.5)]
Set a criterion to decide whether each prediction is correct.
Typically, this is a minimum IoU with respect to the bounding boxes from the ground truth annotation:
○ For example, IoU > 0.5. This is referred to as AP0.5.
○ Other popular options: AP0.75, or averaging over a range of IoU thresholds [0.5:0.95] in 0.05 steps.
○ Each GT box can only be assigned to one predicted box.
[Figure: ranked predictions compared with the ground truth: #1 (0.9) True Positive (TP), #2 (0.7) False Positive (FP), #3 (0.5) True Positive (TP)]
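The matching rule can be sketched as a greedy assignment. Predictions are visited in decreasing confidence order, and each one claims the best still-unassigned ground-truth box if their IoU passes the threshold. This is a simplified illustration for a single class, assuming (x1, y1, x2, y2) boxes. The official Pascal VOC and COCO evaluation code differ in some details. For example, VOC marks a prediction as FP if its best-overlapping GT box is already taken, and both benchmarks treat "difficult" or crowd annotations specially.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def match_detections(preds: List[Tuple[Box, float]], gts: List[Box],
                     iou_thr: float = 0.5) -> List[bool]:
    """Label predictions TP (True) / FP (False), listed in decreasing confidence order.

    Each ground-truth box can be assigned to at most one prediction.
    """
    order = sorted(range(len(preds)), key=lambda i: preds[i][1], reverse=True)
    matched = [False] * len(gts)
    is_tp = []
    for i in order:
        box = preds[i][0]
        # Best still-unmatched ground-truth box for this prediction
        best_j, best_iou = -1, 0.0
        for j, gt in enumerate(gts):
            if not matched[j]:
                overlap = iou(box, gt)
                if overlap > best_iou:
                    best_j, best_iou = j, overlap
        if best_j >= 0 and best_iou > iou_thr:
            matched[best_j] = True
            is_tp.append(True)
        else:
            is_tp.append(False)
    return is_tp

gts = [(0, 0, 10, 10)]
preds = [((0, 0, 10, 10), 0.6), ((1, 1, 10, 10), 0.9), ((20, 20, 30, 30), 0.7)]
print(match_detections(preds, gts))  # [True, False, False]
```

The output follows the ranking (0.9, 0.7, 0.5-style order), not the input order. The perfectly aligned box with confidence 0.6 becomes an FP because the higher-ranked duplicate has already claimed the only GT box.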
Compute the points of the Precision-Recall curve by using the confidence scores of the ranked detections as decision thresholds (Thr). In this example there are 4 ground-truth objects, so recall is computed over 4.
Rank | Correct?
1 | True
2 | False
3 | True
Threshold | Precision | Recall
0.9 | 1/1 | 1/4
0.7 | 1/2 | 1/4
0.5 | 2/3 | 2/4
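The table can be reproduced by sweeping the threshold down the ranked list. After keeping the top k detections, precision is TP / k and recall is TP / (number of GT objects). This minimal sketch uses exact fractions so that the output matches the table. The function name is hypothetical.

```python
from fractions import Fraction
from typing import List, Tuple

def pr_points(is_tp_ranked: List[bool], num_gt: int) -> List[Tuple[Fraction, Fraction]]:
    """(precision, recall) obtained when the threshold is set at each ranked detection."""
    points = []
    tp = 0
    for k, correct in enumerate(is_tp_ranked, start=1):
        tp += int(correct)
        # precision = TP / detections kept, recall = TP / ground-truth objects
        points.append((Fraction(tp, k), Fraction(tp, num_gt)))
    return points

# Running example: ranks 1..3 are True, False, True, with 4 ground-truth objects
for precision, recall in pr_points([True, False, True], num_gt=4):
    print(precision, recall)
# 1 1/4
# 1/2 1/4
# 2/3 1/2
```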
In object detection, some GT objects may never receive any prediction. We may consider that trying to find these missing objects with an infinite number of object proposals would drop precision to ≈0.0, but would eventually find all objects, so recall would reach 1.0.
Rank | Correct?
1 | True
2 | False
3 | True
∞ | True(s)
Threshold | Precision | Recall
0.9 | 1/1 | 1/4
0.7 | 1/2 | 1/4
0.5 | 2/3 | 2/4
0.0 | ≈0 | 1
Table inspired by: Jonathan Hui, "mAP (mean Average Precision) for Object Detection" (Medium 2018)
[Figure: the resulting Precision-Recall curve (precision vs. recall, both from 0 to 1)]
“The precision at each recall level r is interpolated by taking the maximum precision (...) for which the
corresponding recall exceeds r.” (from Pascal VOC) [ref]
[ref] Everingham, Mark, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. "The Pascal Visual
Object Classes (VOC) challenge." IJCV 2010.
[Figure: Precision-Recall curve for the ranked detections]
In practice, not all PR pairs need to be computed. Because precision is interpolated, AP for object detection only requires the PR pairs associated with true positives: a false positive lowers precision without increasing recall.
Threshold | Precision | Recall
0.9 | 1/1 | 1/4
0.5 | 2/3 | 2/4
0.0 | ≈0 | 1
[Figure: Precision-Recall curve]
● The AP metric approximates the area under the PR curve.
● There are different methods for this approximation, which may cause inconsistencies between implementations.
● Popular ones:
○ (suggested) "the mean precision at a set of eleven equally spaced recall levels [0, 0.1, ...1]" (from Pascal VOC [ref])
○ "weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight" (scikit-learn).
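The scikit-learn style definition is AP = Σ_n (R_n − R_(n−1)) · P_n, with R_0 = 0. This is the formula behind `sklearn.metrics.average_precision_score`, but that function only sees scored samples. In detection, missed GT objects never appear as detections, so recall must be normalized by the number of GT objects. For that reason, this sketch works directly on the PR pairs from the running example. The function name is hypothetical.

```python
from typing import List, Tuple

def ap_weighted_mean(points: List[Tuple[float, float]]) -> float:
    """AP = sum over thresholds of (R_n - R_{n-1}) * P_n, with R_0 = 0.

    `points` are (precision, recall) pairs ordered by decreasing threshold.
    """
    ap, prev_recall = 0.0, 0.0
    for precision, recall in points:
        ap += (recall - prev_recall) * precision  # weight = increase in recall
        prev_recall = recall
    return ap

# (precision, recall) pairs from the running example
points = [(1.0, 0.25), (0.5, 0.25), (2 / 3, 0.5)]
print(round(ap_weighted_mean(points), 3))  # 0.25 * 1 + 0 * 0.5 + 0.25 * 2/3, about 0.417
```

The 0.7 pair (an FP) contributes nothing because recall does not increase there. Appending the (≈0, 1) end point would not change the result either.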
Here we adopt the approach from Pascal VOC:
● AP is "the mean precision at a set of eleven equally spaced recall levels [0, 0.1, ...1]"
Threshold | Precision | Recall
0.9 | 1/1 | 1/4
0.5 | 2/3 | 2/4
0.0 | ≈0 | 1
Recall | Interpolated precision
0.0 | 1.00
0.1 | 1.00
0.2 | 1.00
0.3 | 0.67
0.4 | 0.67
0.5 | 0.00
... | 0.00
1.0 | 0.00
AP = 0.39
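The 11-point computation is AP = (1/11) · Σ_{r ∈ {0, 0.1, ..., 1}} p_interp(r). Here p_interp(r) is the maximum precision among the PR pairs whose recall exceeds r. The table above applies "exceeds" strictly, so recall 0.5 gets 0.00. The VOC paper's formula, however, writes the condition as r̃ ≥ r, which would give 0.67 at recall 0.5. The `strict` flag below is a hypothetical parameter added only to show both readings on the running example. This is a small instance of the implementation inconsistencies mentioned earlier.

```python
from typing import List, Tuple

def ap_11_point(points: List[Tuple[float, float]], strict: bool = True) -> float:
    """Mean interpolated precision at recall levels 0.0, 0.1, ..., 1.0.

    strict=True: max precision over recalls that exceed r (reading used in the table);
    strict=False: max precision over recalls >= r.
    """
    total = 0.0
    for i in range(11):
        r = i / 10
        candidates = [p for p, rec in points if (rec > r if strict else rec >= r)]
        total += max(candidates) if candidates else 0.0  # no such recall -> precision 0
    return total / 11

# (precision, recall) pairs kept in the table, including the (≈0, 1) end point
points = [(1.0, 0.25), (2 / 3, 0.5), (0.0, 1.0)]
print(round(ap_11_point(points), 2))                # 0.39, as in the table
print(round(ap_11_point(points, strict=False), 2))  # 0.45: recall 0.5 now gets 0.67
```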
Metric: Average Precision w/o confidence scores
What if your object detection algorithm does not provide any confidence score?
[Figure: predictions #1, #2, #3 with no confidence scores to rank them]
If your object detection algorithm does not provide any confidence score:
● Generate N random rankings of your predictions (e.g. N=10) and compute the AP for each one (AP1, AP2, ..., APN).
● Average the N obtained APs to get the final AP.
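The following is a minimal sketch of the random-ranking procedure. It assumes that each unscored prediction has already been labeled TP or FP with the IoU criterion, and it uses the weighted-mean AP (precision at each TP times its recall increment) as the per-run metric. In general, TP/FP labels can depend on the ranking when several predictions overlap the same GT box, so a fuller version would redo the matching for each shuffle. The function names and the fixed seed are illustrative choices.

```python
import random
from typing import List

def ap_from_flags(is_tp_ranked: List[bool], num_gt: int) -> float:
    """Weighted-mean AP: precision at each TP times its recall increment (1 / num_gt)."""
    ap, tp = 0.0, 0
    for k, correct in enumerate(is_tp_ranked, start=1):
        if correct:
            tp += 1
            ap += (tp / k) * (1 / num_gt)  # recall only grows at true positives
    return ap

def ap_without_scores(is_tp: List[bool], num_gt: int, n_runs: int = 10, seed: int = 0) -> float:
    """Average AP over n_runs random rankings of unscored predictions."""
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    aps = []
    for _ in range(n_runs):
        ranking = is_tp[:]
        rng.shuffle(ranking)
        aps.append(ap_from_flags(ranking, num_gt))
    return sum(aps) / len(aps)

# Hypothetical example: 3 unscored predictions (2 correct) and 4 ground-truth objects
print(ap_without_scores([True, False, True], num_gt=4, n_runs=10))
```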
Evaluation metrics: mean Average Precision (mAP)
In the case of Q classes (e.g. car, bike, person…), the mAP averages the AP(q) of each class:
mAP = (1/Q) · [AP(1) + AP(2) + ... + AP(Q)]
● Further reading:
○ Tarang Sangh, “Measuring Object Detection models — mAP — What is Mean Average Precision?” (Medium
2018)
Evaluation metrics: Average Precision (AP)
You can obtain implementations of Average Precision for object detection from:
● TensorFlow
● Microsoft COCO dataset API
Object Detection
There are two main families:
● Two-Stage: Region proposal and then classification
● Single-Stage: A grid in the image where each cell is a
proposal
Region Proposals
● Find “blobby” image regions that are likely to contain objects
● “Class-agnostic” object detector
Slide Credit: CS231n
Region Proposals
Typical object detection/segmentation pipeline:
Object proposal → Refinement and Classification → NMS (Non-Maximum Suppression)
[Figure: classified proposals with scores Dog 0.85, Cat 0.80, Dog 0.75, Cat 0.90]
Region Proposals: from pixels
#SS Uijlings, J. R., Van De Sande, K. E., Gevers, T., & Smeulders, A. W. (2013). Selective search for object recognition. IJCV 2013
#MCG Pont-Tuset, J., Arbelaez, P., Barron, J. T., Marques, F., & Malik, J. (2016). Multiscale combinatorial grouping for image segmentation and object proposal generation. TPAMI 2016
R-CNN
Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014.
R-CNN + Non-Maximum Suppression (NMS)
[Figure: "We expect" (one box per object) vs. "We get" (many overlapping boxes per object)]
Solution: Non-Maximum Suppression + score threshold
#DPM Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2009). Object detection with discriminatively trained part-based models. TPAMI 2009.
Figure: Adrian Rosebrock
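Non-Maximum Suppression plus a score threshold can be sketched as the classic greedy procedure. First, drop low-scoring boxes. Then repeatedly keep the highest-scoring remaining box and discard every box that overlaps it by more than an IoU threshold. This sketch handles one class at a time. In practice, NMS is usually run per class, and thresholds such as 0.5 here are illustrative. torchvision also provides a batched tensor version as `torchvision.ops.nms(boxes, scores, iou_threshold)`.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes: List[Box], scores: List[float], iou_thr: float = 0.5,
        score_thr: float = 0.0) -> List[int]:
    """Greedy non-maximum suppression for one class; returns indices of kept boxes."""
    # Score threshold first, then visit boxes from highest to lowest score
    order = sorted((i for i in range(len(boxes)) if scores[i] >= score_thr),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest-scoring box still in play
        keep.append(best)
        # Suppress every remaining box that overlaps the kept one too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thr]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))                  # [0, 2]: box 1 overlaps box 0 (IoU about 0.68)
print(nms(boxes, scores, score_thr=0.75))  # [0]
```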
R-CNN: Problems
1. Slow at test time: it needs to run a full forward pass of the CNN for each region proposal
2. SVMs and regressors are post-hoc: CNN features are not updated in response to the SVMs and regressors
Slide Credit: CS231n
Fast R-CNN
R-CNN Problem #1: Slow at test time: it needs to run a full forward pass of the CNN for each region proposal
Solution: Share the computation of the convolutional layers between all region proposals of an image
Girshick. Fast R-CNN. ICCV 2015
R-CNN Problem #2&3: SVMs and regressors are post-hoc. Complex training.
Solution: Train it all together, end to end
- Softmax over (K+1) classes (K object classes plus background) and 4 box offsets
- Positive boxes are the proposals with a large Intersection over Union (IoU) with a ground-truth box (IoU ≥ 0.5 in the paper)
Girshick. Fast R-CNN. ICCV 2015
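The 4 box offsets refine each proposal. The sketch below applies them using the parameterization introduced in R-CNN and reused by Fast R-CNN: the center shift is relative to the proposal's width and height, and the width and height are scaled in log space. Real implementations also normalize the offsets with dataset statistics and clip boxes to the image, both of which are omitted here. The function name is hypothetical.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def decode_box(proposal: Box, deltas: Tuple[float, float, float, float]) -> Box:
    """Apply predicted offsets (dx, dy, dw, dh) to a proposal box."""
    x1, y1, x2, y2 = proposal
    pw, ph = x2 - x1, y2 - y1              # proposal width / height
    px, py = x1 + 0.5 * pw, y1 + 0.5 * ph  # proposal center
    dx, dy, dw, dh = deltas
    cx, cy = px + pw * dx, py + ph * dy    # shift the center, relative to box size
    w, h = pw * math.exp(dw), ph * math.exp(dh)  # rescale width / height
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)

# Shift right by 10% of the width and double the width
print(decode_box((0, 0, 10, 20), (0.1, 0.0, math.log(2), 0.0)))  # about (-4, 0, 16, 20)
```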
Fast R-CNN: RoI-Pooling
Hi-res input image (3 x 800 x 600) with region proposal
→ Convolution and Pooling
→ Hi-res conv features (C x H x W) with region proposal (variable size)
→ Max-pool within each grid cell
→ RoI conv features (C x h x w) for the region proposal (fixed size)
→ Fully-connected layers, which expect low-res conv features of size C x h x w
Slide Credit: CS231n. Girshick. Fast R-CNN. ICCV 2015
RoI pooling allows 1) propagating gradients only through the regions of interest, and 2) efficient computation.
Input: convolutional feature map + N regions of interest
Output: tensor of N x 7 x 7 x depth features
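RoI max pooling turns a variable-size region into a fixed-size grid. The region is split into out_h x out_w bins, and each bin keeps its maximum. This pure-Python sketch handles a single channel and assumes the RoI has already been projected onto integer feature-map coordinates. Real implementations work on C channels and N RoIs at once. torchvision, for example, offers `torchvision.ops.roi_pool(input, boxes, output_size, spatial_scale)`.

```python
import math
from typing import List, Tuple

def roi_max_pool(fmap: List[List[float]], roi: Tuple[int, int, int, int],
                 out_h: int, out_w: int) -> List[List[float]]:
    """Max-pool one RoI of a single-channel feature map into a fixed out_h x out_w grid.

    roi = (x1, y1, x2, y2) in integer feature-map cells, x2 and y2 exclusive.
    """
    x1, y1, x2, y2 = roi
    roi_h, roi_w = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_h):
        # Rows covered by output row i (neighbouring bins may share a row)
        r0 = y1 + math.floor(i * roi_h / out_h)
        r1 = y1 + math.ceil((i + 1) * roi_h / out_h)
        row = []
        for j in range(out_w):
            c0 = x1 + math.floor(j * roi_w / out_w)
            c1 = x1 + math.ceil((j + 1) * roi_w / out_w)
            # Max over all feature-map cells inside this bin
            row.append(max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)))
        pooled.append(row)
    return pooled

fmap = [[4 * r + c for c in range(4)] for r in range(4)]  # toy 4x4 feature map
print(roi_max_pool(fmap, (0, 0, 4, 4), 2, 2))  # [[5, 7], [13, 15]]
print(roi_max_pool(fmap, (1, 1, 4, 4), 2, 2))  # [[10, 11], [14, 15]]
```

Both RoIs, one 4x4 and one 3x3, produce the same 2x2 output size. This fixed size is what the fully-connected layers need.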
Fast R-CNN (using VGG-16 CNN on Pascal VOC 2007 dataset)
Metric | R-CNN | Fast R-CNN
Training time | 84 hours | 9.5 hours
(Speedup) | 1x | 8.8x (Faster!)
Test time per image | 47 seconds | 0.32 seconds
(Speedup) | 1x | 146x (FASTER!)
mAP (VOC 2007) | 66.0 | 66.9 (Better!)
Fast R-CNN: Limitation
Metric | R-CNN | Fast R-CNN
Test time per image (region proposals not included) | 47 seconds | 0.32 seconds
(Speedup) | 1x | 146x
Test time per image with Selective Search | 50 seconds | 2 seconds
(Speedup) | 1x | 25x
Once region proposals are included, computing them takes most of Fast R-CNN's test time.
Slide Credit: CS231n
Faster R-CNN
[Figure: Conv layers (up to Conv5_3) → Region Proposal Network → RPN proposals → RoI Pooling → FC6 → FC7 → FC8 → class probabilities (Fast R-CNN head)]
Learn proposals end-to-end, sharing parameters with the classification network.
This network is called the Region Proposal Network (RPN), and the proposals are learned!
Faster R-CNN replaces selective search (SS) with the Region Proposal Network (RPN), which is trained jointly.
#Faster R-CNN Ren, S., He, K., Girshick, R., & Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015.
Region Proposal Network (RPN)
At each position of the conv feature map, the RPN predicts, for each of k anchor boxes:
● Objectness scores (object / no object)
● Bounding box regression
In practice, k = 9 (3 different scales and 3 aspect ratios)
Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015
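The k = 9 anchors at one feature-map position can be sketched as every combination of 3 scales and 3 aspect ratios, centered on that position. The defaults below (box sides 128, 256, and 512 pixels, and height/width ratios 0.5, 1, and 2) follow the Faster R-CNN paper. The center coordinates are a hypothetical example. In a full RPN, the centers come from mapping each feature-map cell back to the image through the network stride (16 for VGG-16's Conv5_3).

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def make_anchors(cx: float, cy: float, scales=(128, 256, 512),
                 ratios=(0.5, 1.0, 2.0)) -> List[Box]:
    """k = len(scales) * len(ratios) anchors centered at (cx, cy).

    Each anchor has area scale**2 and height / width equal to `ratio`.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

anchors = make_anchors(300.0, 200.0)
print(len(anchors))  # 9 anchors at this position
print(anchors[4])    # 256 x 256 square: (172.0, 72.0, 428.0, 328.0)
```

For each of these anchors, the RPN outputs an objectness score and 4 regression offsets.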
Faster R-CNN
Metric | R-CNN | Fast R-CNN | Faster R-CNN
Test time per image (with proposals) | 50 seconds | 2 seconds | 0.2 seconds
(Speedup) | 1x | 25x | 250x
mAP (VOC 2007) | 66.0 | 66.9 | 66.9
Slide Credit: CS231n
Mask R-CNN: Object Detection + Instance Segmentation
He et al. Mask R-CNN. ICCV 2017
Next lecture: Instance & Image Segmentation
Source: Detectron2
Carles Ventura
Two-stage vs Single-stage methods
[Figure: two-stage pipeline. Image pixels → object proposal generation → resample pixels / features for each BBOX → high-quality classifier]
Two-stage methods are computationally too intensive and too slow for real-time applications (e.g. Faster R-CNN runs at 7 FPS).
Instead of having two networks (Region Proposal Network + Classifier Network), one-stage architectures predict bounding boxes and confidences for multiple categories directly with a single network.
One-stage methods
Problem: Too many positions & scales to test
Previously…: OverFeat
#OverFeat Sermanet, Pierre, David Eigen, Xiang Zhang, Michaël Mathieu, Rob Fergus, and Yann LeCun. "OverFeat: Integrated recognition, localization and detection using convolutional networks." ICLR 2014
Solution: If your classifier is fast enough, go for it.
Modern detectors parallelize feature extraction across all locations, so region classification is not slow anymore!
YOLO: You Only Look Once
#YOLO Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016
Proposal-free object detection pipeline:
● Place an S x S grid on the input.
● For each cell of the S x S grid, predict B boxes with confidence scores C (5 x B values) plus class probabilities c.
[Figure: S x S grid on input → bounding boxes + confidence, and class probability map → final detections]
Final detections: Cj * prob(c) > threshold
Each cell predicts:
- For each bounding box:
  - 4 coordinates (x, y, w, h)
  - 1 confidence value
- A number of class probabilities
For Pascal VOC:
- 7x7 grid
- 2 bounding boxes / cell
- 20 classes
7 x 7 x (2 x 5 + 20) = 7 x 7 x 30 tensor = 1470 outputs
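The output size and the final-detection rule can both be sketched in a few lines. The head produces S x S x (B x 5 + C) values. Within a cell, box j's confidence Cj is multiplied by each class probability prob(c), which the cell's B boxes share, and only the products above the threshold are kept. The function names and the example numbers are hypothetical. A real pipeline would also decode the box coordinates and then apply NMS.

```python
from typing import List, Tuple

def yolo_output_size(S: int, B: int, C: int) -> int:
    """Outputs of a YOLO(v1)-style head: S x S cells, each with B * 5 + C values."""
    return S * S * (B * 5 + C)

def filter_cell(box_confs: List[float], class_probs: List[float],
                threshold: float) -> List[Tuple[int, int, float]]:
    """Keep (box j, class c, score) for one cell where Cj * prob(c) > threshold."""
    kept = []
    for j, conf in enumerate(box_confs):
        for c, p in enumerate(class_probs):
            score = conf * p  # class-specific confidence of box j
            if score > threshold:
                kept.append((j, c, score))
    return kept

print(yolo_output_size(S=7, B=2, C=20))  # 1470, as for Pascal VOC
# One cell with 2 boxes and 3 classes: both boxes survive for class 1
print(filter_cell([0.9, 0.3], [0.1, 0.8, 0.1], threshold=0.2))
```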
SSD: Single Shot MultiBox Detector
Liu et al. SSD: Single Shot MultiBox Detector, ECCV 2016
Same idea as YOLO, plus several predictors at different stages of the network, which provide different receptive fields.
YOLOv2
Redmon & Farhadi. YOLO9000: Better, Faster, Stronger. CVPR 2017
YOLOv3
YOLOv2
+ residual blocks
+ skip connections
+ upsampling
+ detection at multiple scales
YOLOv4
Alexey Bochkovskiy, Chien-Yao Wang, Hong-Yuan Mark Liao, "YOLOv4: Optimal Speed and Accuracy of Object Detection" arXiv 2020.
Military Applications & Privacy Risks
RetinaNet
Matching proposal-based performance with a one-stage approach
Problem of one-stage detectors: they evaluate many candidate locations, but only a few contain objects → IMBALANCE, which makes learning inefficient.
Focal loss: the key idea is to lower the loss weight of well-classified samples and increase it for difficult ones.
Lin et al. Focal Loss for Dense Object Detection. ICCV 2017
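The focal loss from Lin et al. is FL(p_t) = −α_t · (1 − p_t)^γ · log(p_t). Here p_t is the probability the model assigns to the true class. The factor (1 − p_t)^γ shrinks the loss of easy, well-classified samples, such as the many background locations. This minimal sketch covers one binary prediction. It uses the paper's reported defaults γ = 2 and α = 0.25 and assumes p lies strictly between 0 and 1.

```python
import math

def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Binary focal loss for one prediction.

    p: predicted probability of the object class; y: 1 (object) or 0 (background).
    """
    p_t = p if y == 1 else 1.0 - p              # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

def cross_entropy(p: float, y: int) -> float:
    """Standard binary cross-entropy, for comparison."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)

# Background location: easy (p = 0.01) vs. hard (p = 0.9) negative
for p in (0.01, 0.9):
    print(p, cross_entropy(p, 0), focal_loss(p, 0))
```

The easy negative keeps less than 0.1% of its cross-entropy loss, while the hard one keeps about 60%. As a result, the many easy background samples no longer dominate training. With γ = 0 the modulating factor disappears, and the loss reduces to α-weighted cross-entropy.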
Overview
Neural Architectures for Object Detection
Two-stage methods
● R-CNN
● Fast R-CNN
● Faster R-CNN
● Mask R-CNN
Single-stage methods
● YOLO
● SSD
● RetinaNet
Software implementations
Most models are publicly available, ready to be used off-the-shelf.
Model | Framework
Faster R-CNN | [torchvision] (suggested), [Detectron2], [Keras]
RetinaNet | [Detectron2] (suggested), [Keras]
Benchmark | [TensorFlow Object Detection API]
YOLOv3 | [PyTorch]
SSD | [PyTorch], [Tutorial on Keras]
Mask R-CNN | [torchvision] (suggested), [PyTorch], [Keras & TF], [tutorial]
The object classes defined in Pascal/COCO are probably not the ones you are interested in. You can adapt (fine-tune) existing models to your own object classes.
Wang, Xin, Thomas E. Huang, Trevor Darrell, Joseph E. Gonzalez, and Fisher Yu. "Frustratingly Simple Few-Shot Object Detection." arXiv preprint arXiv:2003.06957 (2020). [code based on Detectron2]
Software implementations for Mobile
TensorFlow Lite: Object Detection
PyTorch Mobile (no specific solutions for object detection)
Jordi Torres, "TensorFlow or PyTorch?" (2020) [in Catalan]
Next lab: ImageNet models
Dani Fojo
Your questions

The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...Universitat Politècnica de Catalunya
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoUniversitat Politècnica de Catalunya
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Universitat Politècnica de Catalunya
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosUniversitat Politècnica de Catalunya
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Universitat Politècnica de Catalunya
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Universitat Politècnica de Catalunya
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Universitat Politècnica de Catalunya
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Universitat Politècnica de Catalunya
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Universitat Politècnica de Catalunya
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Universitat Politècnica de Catalunya
 

More from Universitat Politècnica de Catalunya (20)

Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Deep Generative Learning for All
Deep Generative Learning for AllDeep Generative Learning for All
Deep Generative Learning for All
 
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
 
The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
 
Open challenges in sign language translation and production
Open challenges in sign language translation and productionOpen challenges in sign language translation and production
Open challenges in sign language translation and production
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
 
Discovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in MinecraftDiscovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in Minecraft
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...
 
Intepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural NetworksIntepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural Networks
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
 
Curriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object SegmentationCurriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object Segmentation
 
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
 

Recently uploaded

2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...elinavihriala
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单vcaxypu
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxbenishzehra469
 
How can I successfully sell my pi coins in Philippines?
How can I successfully sell my pi coins in Philippines?How can I successfully sell my pi coins in Philippines?
How can I successfully sell my pi coins in Philippines?DOT TECH
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单ewymefz
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单ocavb
 
Computer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage sComputer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage sMAQIB18
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundOppotus
 
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPsWebinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPsCEPTES Software Inc
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单yhkoc
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单ukgaet
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单ewymefz
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单ewymefz
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单enxupq
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单nscud
 
Uber Ride Supply Demand Gap Analysis Report
Uber Ride Supply Demand Gap Analysis ReportUber Ride Supply Demand Gap Analysis Report
Uber Ride Supply Demand Gap Analysis ReportSatyamNeelmani2
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .NABLAS株式会社
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单nscud
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay
 

Recently uploaded (20)

2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
How can I successfully sell my pi coins in Philippines?
How can I successfully sell my pi coins in Philippines?How can I successfully sell my pi coins in Philippines?
How can I successfully sell my pi coins in Philippines?
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
Computer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage sComputer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage s
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPsWebinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
Uber Ride Supply Demand Gap Analysis Report
Uber Ride Supply Demand Gap Analysis ReportUber Ride Supply Demand Gap Analysis Report
Uber Ride Supply Demand Gap Analysis Report
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 

Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcelona 2020

  • 1. Object Detection. Computer Vision 2. Xavier Giro-i-Nieto (@DocXavi, xavier.giro@upc.edu), Associate Professor, Universitat Politècnica de Catalunya. Spring 2020
  • 2. Acknowledgements: Amaia Salvador (amaia.salvador@upc.edu), PhD Candidate, Universitat Politècnica de Catalunya [UPC TelecomBCN 2016] [UPC TelecomBCN 2017]
  • 3. Acknowledgements: [UPC TelecomBCN 2018] Míriam Bellver (miriam.bellver@bsc.edu), PhD Candidate, Barcelona Supercomputing Center and Universitat Politècnica de Catalunya; Andreu Girbau (andreu.girbau@upc.edu), PhD Candidate, Universitat Politècnica de Catalunya and AutomaticTV [UPC TelecomBCN 2019]
  • 4. Outline: 1. Motivation 2. Datasets 3. Evaluation 4. Neural Architectures (a. Two-stage, b. Single-stage)
  • 5. Recap Figure from Charles Ollion - Olivier Grisel
  • 6. Recap Figure from Charles Ollion - Olivier Grisel
  • 7. Recap Figure from Charles Ollion - Olivier Grisel
  • 8. Recap Figure from Charles Ollion - Olivier Grisel
  • 9. Object Detection (CAT, DOG, DUCK). The task of assigning a label and a bounding box to all objects in the image: 1. We don’t know the number of objects in advance. 2. Object detection relies on object proposal and object classification.
  • 10. Object Detection as Classification. Classes = [cat, dog, duck]. Cat? NO. Dog? NO. Duck? NO.
  • 11. Object Detection as Classification. Classes = [cat, dog, duck]. Cat? NO. Dog? NO. Duck? NO.
  • 12. Object Detection as Classification. Classes = [cat, dog, duck]. Cat? YES. Dog? NO. Duck? NO.
  • 13. Object Detection as Classification. Classes = [cat, dog, duck]. Cat? NO. Dog? NO. Duck? NO.
  • 14. Object Detection as Classification. Challenge: a very large number of possible windows, varying in ● position ● scale ● aspect ratio. Question: Do you think it is feasible to evaluate all possibilities?
  • 15. Object Detection as Classification. Challenge: a very large number of possible windows, varying in ● position ● scale ● aspect ratio. Solution: If your classifier is fast enough, go for it.
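To get a feel for the size of this search space, the following Python sketch enumerates sliding windows over position, scale and aspect ratio and counts them. The image size, scales, aspect ratios and stride are hypothetical choices for illustration, not values from the slides. Each window would need one classifier evaluation.

```python
from typing import Iterator, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), x2/y2 exclusive


def sliding_windows(img_w: int, img_h: int, scales: Sequence[int],
                    aspect_ratios: Sequence[float], stride: int) -> Iterator[Box]:
    """Enumerate every window of every scale and aspect ratio (w / h)
    at every position of a regular grid with the given stride."""
    for s in scales:
        for r in aspect_ratios:
            # Keep the window area close to s * s while setting w / h = r.
            w = round(s * r ** 0.5)
            h = round(s / r ** 0.5)
            if w > img_w or h > img_h:
                continue  # window does not fit in the image
            for y in range(0, img_h - h + 1, stride):
                for x in range(0, img_w - w + 1, stride):
                    yield (x, y, x + w, y + h)


if __name__ == "__main__":
    # Hypothetical settings: 640x480 image, 4 scales, 3 aspect ratios, stride 4.
    n = sum(1 for _ in sliding_windows(640, 480, (32, 64, 128, 256), (0.5, 1.0, 2.0), 4))
    print(f"{n} windows, each needing one classifier evaluation")
```

Even this coarse grid ignores intermediate scales and ratios. That is why exhaustive evaluation is only viable with a very cheap classifier.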
  • 16. Object Detection with ConvNets? ConvNets are computationally demanding, so we can’t test all positions & scales! Solution: Look at a tiny subset of positions. Choose them wisely :)
  • 17. Outline: 1. Motivation 2. Datasets 3. Evaluation 4. Neural Architectures (a. Two-stage, b. Single-stage)
  • 18. Classic Datasets:
      PASCAL: 20 categories; 6k training, 6k validation and 10k test images
      ILSVRC: 200 categories; 456k training images; 60k validation + test images
      COCO: 80 categories; 200k training images; 60k validation + test images
  • 21. Open Images Dataset. Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., ... & Ferrari, V. The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale. IJCV 2020. [dataset]
  • 22. Open Images Dataset v6, compared with the classic datasets (PASCAL: 20 categories, 6k training, 6k validation and 10k test images; ILSVRC: 200 categories, 456k training and 60k validation + test images; COCO: 80 categories, 200k training and 60k validation + test images). Kuznetsova et al., IJCV 2020. [dataset]
  • 23. Open Images Dataset v6. Kuznetsova et al., IJCV 2020. [dataset]
  • 24. Open Images Dataset v6: images with a large number of different classes annotated (11 on the left, 7 on the right). Kuznetsova et al., IJCV 2020. [dataset]
  • 25. Outline: 1. Motivation 2. Datasets 3. Evaluation 4. Neural Architectures (a. Two-stage, b. Single-stage) 5. Software implementations
  • 26. Evaluation metrics: Intersection over Union (IoU) ● aka Jaccard index ● Size of the intersection divided by the size of the union, $\mathrm{IoU}(A,B) = \frac{|A \cap B|}{|A \cup B|}$ ● Evaluates localization. Figure: Pyimagesearch
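Here is a minimal Python sketch of IoU for two axis-aligned boxes. It assumes continuous (x1, y1, x2, y2) coordinates with x1 < x2 and y1 < y2. Some implementations follow a pixel-index convention that adds 1 to widths and heights, which changes the result slightly for small boxes.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 < x2, y1 < y2


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection over Union (Jaccard index) of two axis-aligned boxes."""
    # Corners of the intersection rectangle (its width/height may be negative).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # 0 if the boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # intersection 1, union 7
```

For the two 2x2 boxes shifted by one unit, the intersection is 1 and the union is 4 + 4 − 1 = 7, so IoU = 1/7.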
  • 27. Metric: Average Precision (AP) for Object Detection. Consider the case in which your object detection algorithm provides: ● coordinates for each bounding box, and ● a confidence score for each bounding box. [Figure: three predicted boxes with confidence scores 0.9, 0.7 and 0.5]
  • 28. Metric: Average Precision (AP) for Object Detection. Rank your predictions by the confidence score of your object detection algorithm: #1 (0.9), #2 (0.7), #3 (0.5).
  • 29. Metric: Average Precision (AP) for Object Detection. Set a criterion to decide whether each prediction is correct. This is typically a minimum IoU with respect to the ground-truth bounding boxes: ○ For example, IoU > 0.5, which is referred to as AP_0.5. ○ Other popular options: AP_0.75, or AP averaged over the IoU thresholds [0.5:0.95] in 0.05 steps. ○ Each GT box can only be assigned to one predicted box. [Figure: ranked predictions #1 (0.9), #2 (0.7) and #3 (0.5) against the ground truth; #1 and #3 are True Positives (TP), #2 is a False Positive (FP)]
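The matching rule can be sketched as a greedy assignment in Python. This is one common variant, assumed here for illustration. Predictions are visited in descending confidence, and each one takes the still-unmatched GT box with the highest IoU above the threshold. A prediction left without a GT box is a false positive. Benchmark toolkits differ in details such as tie-breaking and whether an already-matched GT box blocks a prediction, so use the official evaluation code when comparing against published numbers.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(pred_boxes: Sequence[Box], scores: Sequence[float],
                     gt_boxes: Sequence[Box], iou_thr: float = 0.5) -> List[bool]:
    """Label each prediction TP (True) or FP (False), in descending confidence order.

    Greedy: each prediction takes the unmatched GT box with the highest IoU,
    provided that IoU exceeds iou_thr; a GT box can be matched only once.
    """
    order = sorted(range(len(pred_boxes)), key=lambda i: scores[i], reverse=True)
    matched = [False] * len(gt_boxes)
    is_tp = []
    for i in order:
        best_j, best_iou = -1, iou_thr
        for j, gt in enumerate(gt_boxes):
            overlap = iou(pred_boxes[i], gt)
            if not matched[j] and overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched[best_j] = True  # this GT box cannot be claimed again
        is_tp.append(best_j >= 0)
    return is_tp
```

A second prediction on an already-matched object becomes a false positive. This is how a pattern like True, False, True (slide 30) arises.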
  • 30. Metric: Average Precision (AP) for Object Detection. Compute the points of the Precision-Recall curve by using the confidence scores of the ranked detections as decision thresholds (Thr). The image contains 4 ground-truth objects, so recall is counted out of 4. [Legend: Ground truth; True Positive (TP); False Positive (FP) or False Negative (FN)]
      Rank | Correct? | Threshold | Precision | Recall
      1    | True     | 0.9       | 1/1       | 1/4
      2    | False    | 0.7       | 1/2       | 1/4
      3    | True     | 0.5       | 2/3       | 2/4
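The table can be reproduced with a short Python sketch that walks down the ranking and updates the TP/FP counts. It uses `fractions.Fraction` so the values match the slide exactly. `num_gt` is the number of ground-truth objects (4 here).

```python
from fractions import Fraction
from typing import List, Sequence, Tuple


def pr_points(is_tp: Sequence[bool], num_gt: int) -> List[Tuple[Fraction, Fraction]]:
    """(precision, recall) after each ranked detection, using its score as threshold."""
    tp = fp = 0
    points = []
    for correct in is_tp:  # detections already sorted by descending confidence
        if correct:
            tp += 1
        else:
            fp += 1
        points.append((Fraction(tp, tp + fp), Fraction(tp, num_gt)))
    return points


if __name__ == "__main__":
    # Ranks 1-3 from the slide (True, False, True) and 4 ground-truth objects.
    for p, r in pr_points([True, False, True], num_gt=4):
        print(f"precision={p}, recall={r}")
```

`Fraction` reduces values automatically, so a recall of 2/4 is printed as 1/2.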
  • 31. Metric: Average Precision (AP) for Object Detection. In object detection, some GT objects may never receive any prediction. Suppose we kept searching for the missing objects with an infinite number of object proposals. Precision would drop to ≈0.0, but all objects would eventually be found, so recall would reach 1.0. Table inspired by: Jonathan Hui, “mAP (mean Average Precision) for Object Detection” (Medium 2018).
      Rank | Correct? | Threshold | Precision | Recall
      1    | True     | 0.9       | 1/1       | 1/4
      2    | False    | 0.7       | 1/2       | 1/4
      3    | True     | 0.5       | 2/3       | 2/4
      ∞    | True(s)  | 0.0       | ≈0        | 1
  • 32. Metric: Average Precision (AP) for Object Detection. The PR pairs of the table on slide 31 (thresholds 0.9, 0.7, 0.5 and the limit point 0.0 with precision ≈0 and recall 1) plotted as a Precision-Recall curve. [Figure: precision vs. recall, both axes from 0 to 1.0]
  • 33. Metric: Average Precision (AP) for Object Detection. “The precision at each recall level r is interpolated by taking the maximum precision (...) for which the corresponding recall exceeds r.” (from Pascal VOC) [ref] [Figure: interpolated Precision-Recall curve for the PR pairs of slide 31] [ref] Everingham, Mark, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. "The Pascal Visual Object Classes (VOC) challenge." IJCV 2010.
  • 34. Metric: Average Precision (AP) for Object Detection. Actually, not all PR pairs need to be computed. AP for object detection only requires the PR pairs at the true positives (ranks 1 and 3, plus the limit point at recall 1.0). A false positive leaves recall unchanged and lowers precision, so it never sets the maximum used for interpolation. [Figure: Precision-Recall curve]
  • 35. Metric: Average Precision (AP) for Object Detection. ● The AP metric approximates the area under the PR curve. ● There are different methods for this approximation, which may cause inconsistencies between implementations. ● Popular ones: ○ (suggested) “the mean precision at a set of eleven equally spaced recall levels [0, 0.1, ..., 1]” ○ “weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight” (scikit-learn). [ref] Everingham, Mark, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. "The Pascal Visual Object Classes (VOC) challenge." IJCV 2010.
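The second approximation, the weighted mean described for scikit-learn, is AP = Σ_n (R_n − R_{n−1}) P_n. Below is a minimal Python sketch over the ranked PR pairs. Recall is measured against all ground-truth objects, so missed objects simply never add recall. Passing only the detections to a ranking metric such as scikit-learn's `average_precision_score` would not account for missed objects, because it only sees the positives among the scored items.

```python
from fractions import Fraction
from typing import Sequence


def stepwise_ap(precisions: Sequence[Fraction], recalls: Sequence[Fraction]) -> Fraction:
    """AP = sum_n (R_n - R_{n-1}) * P_n over the ranked detections, with R_0 = 0."""
    ap = Fraction(0)
    prev_recall = Fraction(0)
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p  # only steps that increase recall contribute
        prev_recall = r
    return ap


if __name__ == "__main__":
    # PR pairs of the slide's example: thresholds 0.9, 0.7, 0.5 with 4 GT objects.
    P = [Fraction(1), Fraction(1, 2), Fraction(2, 3)]
    R = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
    print(stepwise_ap(P, R), float(stepwise_ap(P, R)))
```

On the slide's example this gives 1/4 · 1 + 1/4 · 2/3 = 5/12 ≈ 0.42. Compare this with the 11-point value of 0.39 on slide 36: the two approximations disagree on the same detections.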
  • 36. Metric: Average Precision (AP) for Object Detection. In our work, we adopt the approach from Pascal VOC: ● AP is “the mean precision at a set of eleven equally spaced recall levels [0, 0.1, ..., 1]”. PR pairs used (threshold: precision, recall): 0.9: 1/1, 1/4; 0.5: 2/3, 2/4; 0.0: ≈0, 1. Interpolated precision at each recall level:
      Recall    | 0.0  | 0.1  | 0.2  | 0.3  | 0.4  | 0.5  | ...  | 1.0
      Precision | 1.00 | 1.00 | 1.00 | 0.67 | 0.67 | 0.00 | 0.00 | 0.00
      AP = 0.39
    [Figure: Precision-Recall curve]
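The 11-point computation can be checked with the Python sketch below. The `strict` flag is our own addition. Reading “exceeds r” strictly (recall > r) reproduces the slide's AP of 0.39 (13/33), because recall 0.5 then does not count at level 0.5. As we read the VOC paper, its formula takes the maximum over recalls r̃ ≥ r. For the same PR pairs that gives 5/11 ≈ 0.45, so check which convention a given toolkit uses before comparing numbers.

```python
from fractions import Fraction
from typing import Sequence


def eleven_point_ap(precisions: Sequence[Fraction], recalls: Sequence[Fraction],
                    strict: bool = False) -> Fraction:
    """Mean interpolated precision at recall levels 0, 0.1, ..., 1.

    The interpolated precision at level r is the maximum precision among PR points
    whose recall is >= r (strict=False) or > r (strict=True); 0 if there are none.
    """
    total = Fraction(0)
    for i in range(11):
        level = Fraction(i, 10)
        candidates = [p for p, rec in zip(precisions, recalls)
                      if (rec > level if strict else rec >= level)]
        total += max(candidates, default=Fraction(0))
    return total / 11


if __name__ == "__main__":
    # True-positive PR pairs from the slide (ranks 1 and 3, 4 GT objects).
    P = [Fraction(1), Fraction(2, 3)]
    R = [Fraction(1, 4), Fraction(1, 2)]
    print(float(eleven_point_ap(P, R, strict=True)))   # 13/33
    print(float(eleven_point_ap(P, R, strict=False)))  # 5/11
```

Adding the false-positive pair (1/2, 1/4) does not change either result, as slide 34 anticipates.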
  • 37. Metric: Average Precision without confidence scores? What if your object detection algorithm does not provide any confidence score? [Figure: predictions #1, #2, #3]
  • 38. Metric: Average Precision without confidence scores. If your object detection algorithm does not provide any confidence score: ● generate N random rankings of the predictions (e.g., N=10) and compute the AP of each one (AP1, AP2, ..., APN); ● average the N APs to obtain the final AP.
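Here is a Python sketch of this protocol, under stated assumptions. Predictions are re-matched to the ground truth after each random shuffle, because with greedy matching the order can decide which of two overlapping predictions counts as the TP. Matching uses IoU > 0.5, and AP is the step-wise sum Σ_n (R_n − R_{n−1}) P_n. The helper names and the fixed seed are illustrative choices, not part of the original method.

```python
import random
from fractions import Fraction
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_in_order(preds: Sequence[Box], gts: Sequence[Box], iou_thr: float = 0.5) -> List[bool]:
    """TP/FP labels for predictions taken in the given order (greedy, one match per GT)."""
    matched = [False] * len(gts)
    labels = []
    for p in preds:
        best_j, best = -1, iou_thr
        for j, g in enumerate(gts):
            o = iou(p, g)
            if not matched[j] and o > best:
                best_j, best = j, o
        if best_j >= 0:
            matched[best_j] = True
        labels.append(best_j >= 0)
    return labels


def stepwise_ap(labels: Sequence[bool], num_gt: int) -> Fraction:
    """Each TP raises recall by 1/num_gt, weighted by the precision at its rank."""
    ap, tp = Fraction(0), 0
    for rank, correct in enumerate(labels, start=1):
        if correct:
            tp += 1
            ap += Fraction(tp, rank) / num_gt
    return ap


def ap_without_scores(preds: Sequence[Box], gts: Sequence[Box],
                      n_runs: int = 10, seed: int = 0) -> float:
    """Average AP over n_runs random rankings of unscored predictions."""
    rng = random.Random(seed)  # fixed seed for reproducible averages
    total = Fraction(0)
    for _ in range(n_runs):
        order = list(preds)
        rng.shuffle(order)
        total += stepwise_ap(label_in_order(order, gts), len(gts))
    return float(total / n_runs)
```

With one correct and one spurious prediction, each ranking yields AP 1 or 1/2. The average lies between the two and approaches 0.75 as N grows.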
  • 39. Evaluation metrics: mean Average Precision (mAP). With Q classes (e.g., car, bike, person…), the mAP averages the AP(q) of each class: $\mathrm{mAP} = \frac{1}{Q}\sum_{q=1}^{Q}\mathrm{AP}(q)$ ● Further reading: ○ Tarang Sangh, “Measuring Object Detection models — mAP — What is Mean Average Precision?” (Medium 2018)
  • 40. Evaluation metrics: Average Precision (AP). Implementations of Average Precision for object detection are available from: ● TensorFlow ● the Microsoft COCO dataset API
  • 41. Outline: 1. Motivation 2. Datasets 3. Evaluation 4. Neural Architectures (a. Two-stage, b. Single-stage) 5. Software implementations
  • 42. Outline: 1. Motivation 2. Datasets 3. Evaluation 4. Neural Architectures (a. Two-stage, b. Single-stage) 5. Software implementations
  • 43. Object Detection There are two main families: ● Two-Stage: Region proposal and then classification ● Single-Stage: A grid in the image where each cell is a proposal
  • 44. Region Proposals ● Find “blobby” image regions that are likely to contain objects ● “Class-agnostic” object detector. Slide Credit: CS231n
  • 45. Region Proposals. Typical object detection/segmentation pipelines: object proposal → refinement and classification. [Figure: refined proposals labeled Dog 0.85, Cat 0.80, Dog 0.75, Cat 0.90]
  • 46. Region Proposals. Typical object detection/segmentation pipelines: object proposal → refinement and classification → NMS (Non-Maximum Suppression). [Figure: proposals labeled Dog 0.85, Cat 0.80, Dog 0.75, Cat 0.90]
  • 47. Region Proposals from pixels: Selective Search (#SS). Uijlings, J. R., Van De Sande, K. E., Gevers, T., & Smeulders, A. W. (2013). Selective search for object recognition. IJCV 2013
  • 48. Region Proposals from pixels: Multiscale Combinatorial Grouping (#MCG). Pont-Tuset, J., Arbelaez, P., Barron, J. T., Marques, F., & Malik, J. (2016). Multiscale combinatorial grouping for image segmentation and object proposal generation. TPAMI 2016
  • 49. R-CNN. Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014.
  • 50. R-CNN. We expect one box per object, but we get many overlapping detections of the same object. Solution: Non-Maximum Suppression + score threshold.
  • 51. R-CNN + Non-Maximum Suppression (NMS). #DPM Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2009). Object detection with discriminatively trained part-based models. TPAMI 2009. Figure: Adrian Rosebrock
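Below is a minimal Python sketch of the score threshold + greedy NMS step for the boxes of one class (detectors typically run NMS separately per class). The default thresholds are illustrative, not values from the slides.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_thr: float = 0.5,
        score_thr: float = 0.0) -> List[int]:
    """Greedy Non-Maximum Suppression preceded by a score threshold.

    Returns the indices of the kept boxes, highest score first.
    """
    # 1) Drop low-confidence detections.
    order = [i for i in range(len(boxes)) if scores[i] >= score_thr]
    # 2) Visit the remaining boxes from highest to lowest score.
    order.sort(key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # 3) Suppress every remaining box that overlaps the kept one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thr]
    return keep
```

In practice, a library routine such as `torchvision.ops.nms(boxes, scores, iou_threshold)` performs the suppression step on tensors.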
  • 52. R-CNN. Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014.
  • 53. R-CNN: Problems. 1. Slow at test time: a full forward pass of the CNN is needed for each region proposal. 2. SVMs and regressors are post-hoc: the CNN features are not updated in response to the SVMs and regressors. Slide Credit: CS231n
  • 54. Fast R-CNN. R-CNN Problem #1: slow at test time, since a full forward pass of the CNN is needed for each region proposal. Solution: share the computation of the convolutional layers between all region proposals of an image. Girshick, Fast R-CNN. ICCV 2015
  • 55. Fast R-CNN. R-CNN Problems #2 & #3: SVMs and regressors are post-hoc, and training is complex. Solution: train it all together, end to end: - a softmax over (K+1) classes (K object classes plus background) and 4 box offsets per class; - positive boxes are the region proposals with a large Intersection over Union (at least 0.5 in the paper) with a ground-truth box. Girshick, Fast R-CNN. ICCV 2015
  • 56. Fast R-CNN: RoI Pooling. Hi-res input image (3 x 800 x 600) with a region proposal → convolution and pooling → hi-res conv features (C x H x W) with the region proposal (variable size) → max-pool within each cell of a fixed grid → RoI conv features (C x h x w) for the region proposal (fixed size) → fully-connected layers, which expect low-res conv features of size C x h x w. Slide Credit: CS231n. Girshick, Fast R-CNN. ICCV 2015
  • 57. Fast R-CNN: RoI Pooling. RoI pooling allows 1) propagating gradients only through the regions of interest, and 2) efficient computation. Input: a convolutional feature map + N regions of interest. Output: a tensor of N x 7 x 7 x depth features.
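RoI max pooling can be sketched with NumPy as follows. Two assumptions apply. First, the feature map is channels-first (C, H, W) rather than the slide's 7 x 7 x depth layout. Second, each RoI has already been projected to integer feature-map cells, for example by dividing image coordinates by the backbone stride and rounding. Bin boundaries use floor/ceil so that every bin is non-empty even when the region is smaller than 7 x 7.

```python
import numpy as np


def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def roi_max_pool(features: np.ndarray, roi, output_size=(7, 7)) -> np.ndarray:
    """Max-pool one region of a (C, H, W) feature map into a fixed (C, h, w) grid.

    roi = (x1, y1, x2, y2) in integer feature-map cells, x2/y2 exclusive.
    """
    c, H, W = features.shape
    x1, y1 = max(0, roi[0]), max(0, roi[1])
    x2, y2 = min(W, roi[2]), min(H, roi[3])
    rh, rw = y2 - y1, x2 - x1
    if rh <= 0 or rw <= 0:
        raise ValueError("RoI is empty after clipping to the feature map")
    ph, pw = output_size
    out = np.empty((c, ph, pw), dtype=features.dtype)
    for i in range(ph):
        # Bin rows [floor(i*rh/ph), ceil((i+1)*rh/ph)): never empty, may overlap.
        ys, ye = y1 + (i * rh) // ph, y1 + _ceil_div((i + 1) * rh, ph)
        for j in range(pw):
            xs, xe = x1 + (j * rw) // pw, x1 + _ceil_div((j + 1) * rw, pw)
            out[:, i, j] = features[:, ys:ye, xs:xe].max(axis=(1, 2))
    return out


def roi_pool(features: np.ndarray, rois, output_size=(7, 7)) -> np.ndarray:
    """Pool N regions of interest into an (N, C, h, w) tensor."""
    return np.stack([roi_max_pool(features, r, output_size) for r in rois])
```

Every region produces the same fixed-size output, which is what lets the fully-connected layers accept proposals of any size. torchvision provides a comparable operator as `torchvision.ops.roi_pool`.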
  • 58. Fast R-CNN (VGG-16 CNN on the Pascal VOC 2007 dataset; Slide Credit: CS231n):
      Metric              | R-CNN      | Fast R-CNN
      Training time       | 84 hours   | 9.5 hours
      (Speedup)           | 1x         | 8.8x
      Test time per image | 47 seconds | 0.32 seconds
      (Speedup)           | 1x         | 146x
      mAP (VOC 2007)      | 66.0       | 66.9
    Faster! FASTER! Better!
  • 59. Fast R-CNN: Limitation (Slide Credit: CS231n). Test-time speeds do not include region proposals:
      Metric                                    | R-CNN      | Fast R-CNN
      Test time per image                       | 47 seconds | 0.32 seconds
      (Speedup)                                 | 1x         | 146x
      Test time per image with Selective Search | 50 seconds | 2 seconds
      (Speedup)                                 | 1x         | 25x
  • 60. Faster R-CNN. [Figure: conv layers (up to conv5_3) → Region Proposal Network → RPN proposals → RoI pooling → FC6 → FC7 → FC8 → class probabilities; the RoI pooling and FC part is the Fast R-CNN detector] Learn proposals end-to-end, sharing parameters with the classification network. #Faster R-CNN Ren, S., He, K., Girshick, R., & Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015.
  • 61. Faster R-CNN. [Figure: same architecture as slide 60] Learn proposals end-to-end, sharing parameters with the classification network. This network is called the Region Proposal Network (RPN), and the proposals are learnt! #Faster R-CNN Ren, S., He, K., Girshick, R., & Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015.
  • 62. Faster R-CNN replaces Selective Search (SS) with the Region Proposal Network (RPN), which is trained jointly with the detection network.
  • 63. Region Proposal Network (RPN). At each position of the conv feature map, the RPN considers k reference boxes (anchors) and predicts, for each of them, objectness scores (object / no object) and a bounding box regression. In practice, k = 9 (3 different scales and 3 aspect ratios). #Faster R-CNN Ren, S., He, K., Girshick, R., & Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015.
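The k = 9 anchors at one feature-map position can be generated as below. This is a minimal sketch: the scales (128, 256, 512 pixels) and aspect ratios (1:2, 1:1, 2:1) are the values reported in the Faster R-CNN paper, while the `(x1, y1, x2, y2)` box format and the function names are assumptions for illustration. Per position, the RPN outputs 2k objectness scores and 4k box-regression values.

```python
import math


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return len(scales) * len(ratios) anchors (x1, y1, x2, y2) centred at (cx, cy).

    Each anchor has area scale**2 and aspect ratio height / width = ratio.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def rpn_output_channels(k=9):
    """Channels of the RPN heads at each position: (objectness scores, box regression)."""
    return 2 * k, 4 * k


anchors = make_anchors(cx=400, cy=300)
print(len(anchors))           # 9
print(rpn_output_channels())  # (18, 36)
```

For a conv feature map with W x H positions, this gives W x H x k anchors in total, each of which the RPN scores and refines.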
  • 64. Faster R-CNN. Test time per image (with proposals): R-CNN 50 seconds (1x), Fast R-CNN 2 seconds (25x), Faster R-CNN 0.2 seconds (250x). mAP (VOC 2007): 66.0, 66.9 and 66.9 respectively. Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015. Slide Credit: CS231n
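The speedups in this comparison are ratios of the test time per image (proposals included) with respect to R-CNN; the same arithmetic shows that learning the proposals makes Faster R-CNN 10x faster than Fast R-CNN at the same mAP:

```latex
\text{speedup} = \frac{t_{\text{R-CNN}}}{t_{\text{method}}}, \qquad
\frac{50\,\text{s}}{2\,\text{s}} = 25\times, \qquad
\frac{50\,\text{s}}{0.2\,\text{s}} = 250\times, \qquad
\frac{2\,\text{s}}{0.2\,\text{s}} = 10\times
```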
  • 65. Mask R-CNN: Object Detection + Instance Segmentation. He et al. Mask R-CNN. ICCV 2017
  • 66. Next lecture: Instance & Image Segmentation (Carles Ventura). Source: Detectron2
  • 67. Two-stage vs Single-stage methods. Two-stage pipeline: image pixels, then object proposal generation, then resampling of pixels or features for each bounding box, then a high-quality classifier. This is computationally too intensive and too slow for real-time applications (Faster R-CNN: 7 FPS).
  • 68. Two-stage vs Single-stage methods. Instead of having two networks (Region Proposal Network + classifier network) and resampling pixels or features for each bounding box, one-stage architectures predict bounding boxes and confidences for multiple categories directly with a single network.
  • 69. Outline: 1. Motivation 2. Datasets 3. Evaluation 4. Neural Architectures (a. Two-stage, b. Single-stage) 5. Software implementations
  • 70. One-stage methods. Previously… Problem: too many positions & scales to test.
  • 71. OverFeat. #OverFeat Sermanet, Pierre, David Eigen, Xiang Zhang, Michaël Mathieu, Rob Fergus, and Yann LeCun. "OverFeat: Integrated recognition, localization and detection using convolutional networks." ICLR 2014
  • 72. One-stage methods. Previously… Problem: too many positions & scales to test. Solution: if your classifier is fast enough, go for it.
  • 73. One-stage methods. Previously… Problem: too many positions & scales to test. Modern detectors parallelize feature extraction across all locations, so region classification is not slow anymore!
  • 74. YOLO: You Only Look Once. Proposal-free object detection pipeline: an S x S grid is placed on the input, and each cell of the S x S grid predicts B boxes with confidence scores C (5 x B values) plus class probabilities c. #YOLO Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016
  • 75. YOLO: You Only Look Once. Proposal-free object detection pipeline: S x S grid on input, bounding boxes + confidence, class probability map, final detections. Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016
  • 76. YOLO: You Only Look Once. Proposal-free object detection pipeline: S x S grid on input, bounding boxes + confidence, class probability map, final detections. The final detections are the boxes j whose class-specific score Cj * prob(c) exceeds a threshold. Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016
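The thresholding rule Cj * prob(c) > threshold can be sketched for one grid cell as follows. This is a minimal illustration: each cell is assumed to provide B boxes, one confidence Cj per box, and one set of class probabilities shared by its boxes; the function name and the threshold value 0.2 are arbitrary choices for this example, not values taken from the paper.

```python
def yolo_final_detections(boxes, confidences, class_probs, threshold=0.2):
    """Return (box, class_index, score) for every box/class pair of one cell
    whose class-specific score confidence * prob(class) exceeds threshold.

    boxes: the B boxes predicted by the cell
    confidences: one confidence Cj per box
    class_probs: class probabilities prob(c) predicted by the same cell
    """
    detections = []
    for box, conf in zip(boxes, confidences):
        for cls, p in enumerate(class_probs):
            score = conf * p  # class-specific confidence score
            if score > threshold:
                detections.append((box, cls, score))
    return detections


dets = yolo_final_detections(
    boxes=[(0.5, 0.5, 0.2, 0.3), (0.4, 0.6, 0.1, 0.1)],
    confidences=[0.9, 0.1],
    class_probs=[0.1, 0.8, 0.1],
)
print(len(dets))  # 1
```

Only the confident box paired with the likely class survives; the low-confidence box is discarded for every class, which is how the grid of predictions is reduced to a handful of final detections.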
  • 77. YOLO: You Only Look Once. Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016
  • 78. YOLO: You Only Look Once. Each cell predicts:
    - For each bounding box: 4 coordinates (x, y, w, h) and 1 confidence value
    - Some number of class probabilities
    For Pascal VOC: 7x7 grid, 2 bounding boxes per cell, 20 classes, giving a 7 x 7 x (2 x 5 + 20) = 7 x 7 x 30 tensor = 1470 outputs.
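The output-size arithmetic and the content of one cell can be checked with a short sketch. The ordering inside a cell vector (B blocks of x, y, w, h, confidence, followed by the C class probabilities) is an assumption for illustration; the original implementation may lay out the tensor differently. Function names are hypothetical.

```python
def yolo_output_size(S=7, B=2, C=20):
    """Number of values in YOLO's S x S x (B * 5 + C) output tensor."""
    return S * S * (B * 5 + C)


def split_cell(cell, B=2, C=20):
    """Split one cell's (B * 5 + C) vector into B boxes, B confidences and C class probabilities.

    Assumed ordering (hypothetical): B blocks of (x, y, w, h, confidence),
    then the C class probabilities.
    """
    if len(cell) != B * 5 + C:
        raise ValueError(f"expected {B * 5 + C} values, got {len(cell)}")
    boxes = [tuple(cell[5 * b:5 * b + 4]) for b in range(B)]
    confidences = [cell[5 * b + 4] for b in range(B)]
    class_probs = list(cell[5 * B:])
    return boxes, confidences, class_probs


print(yolo_output_size())  # 1470
```

With the Pascal VOC settings, each of the 49 cells contributes 30 numbers, which is where the 1470 outputs come from.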
  • 79. SSD: Single Shot MultiBox Detector. Same idea as YOLO, plus several predictors at different stages of the network, which provide different receptive fields and therefore suit objects of different sizes. Liu et al. SSD: Single Shot MultiBox Detector, ECCV 2016
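One way to see how predictors at different stages cover different object sizes is the default-box scale rule given in the SSD paper, s_k = s_min + (s_max - s_min)(k - 1)/(m - 1) for m feature maps, with s_min = 0.2 and s_max = 0.9. This is a minimal sketch: scales are fractions of the input size, the width and height for an aspect ratio a are s * sqrt(a) and s / sqrt(a), m = 6 is assumed here, released SSD configurations adjust some of these values, and the function names are hypothetical.

```python
import math


def ssd_scales(m=6, s_min=0.2, s_max=0.9):
    """Default-box scale (fraction of input size) for each of m feature maps,
    from the earliest (small boxes) to the deepest (large boxes)."""
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]


def default_box_sizes(scale, ratios=(1.0, 2.0, 0.5)):
    """(width, height) of the default boxes at one scale, one per aspect ratio w/h."""
    return [(scale * math.sqrt(a), scale / math.sqrt(a)) for a in ratios]


print([round(s, 2) for s in ssd_scales()])  # [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]
```

Early, high-resolution feature maps thus get small default boxes and deep, coarse ones get large boxes, matching the growth of the receptive field through the network.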
  • 80. YOLOv2. Redmon & Farhadi. YOLO9000: Better, Faster, Stronger. CVPR 2017
  • 81. YOLOv3: YOLOv2 + residual blocks + skip connections + upsampling + detection at multiple scales
  • 82. YOLOv4. Alexey Bochkovskiy, Chien-Yao Wang, Hong-Yuan Mark Liao, "YOLOv4: Optimal Speed and Accuracy of Object Detection", arXiv 2020.
  • 83. #YOLO Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016
  • 84. Military Applications & Privacy Risks
  • 85. RetinaNet: matching proposal-based performance with a one-stage approach. Problem of one-stage detectors: they evaluate many candidate locations, but only a few of them contain objects. This IMBALANCE makes learning inefficient. Focal loss: the key idea is to lower the loss weight of well-classified samples and increase it for difficult ones. Lin et al. Focal Loss for Dense Object Detection. ICCV 2017
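The effect of the focal loss can be sketched for a single binary (object vs background) prediction, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), where p_t is the probability assigned to the true class. gamma = 2 and alpha = 0.25 are the values reported as working best in the RetinaNet paper; the function names are hypothetical and this is not the paper's implementation.

```python
import math


def cross_entropy(p, y):
    """Binary cross-entropy for foreground probability p and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Easy negative (background predicted with 99% confidence): loss almost vanishes.
print(f"{focal_loss(0.01, 0) / cross_entropy(0.01, 0):.2e}")  # 7.50e-05
# Hard positive (object predicted with only 10% confidence): loss mostly kept.
print(f"{focal_loss(0.1, 1) / cross_entropy(0.1, 1):.4f}")    # 0.2025
```

The thousands of easy background locations therefore contribute very little to the total loss, so training focuses on the few hard examples instead of being dominated by the imbalance.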
  • 87. Neural Architectures for Object Detection. Two-stage methods: ● R-CNN ● Fast R-CNN ● Faster R-CNN ● Mask R-CNN. Single-stage methods: ● YOLO ● SSD ● RetinaNet
  • 88. Software implementations. Most models are publicly available, ready to be used off-the-shelf (suggested option marked):
    - Faster R-CNN: [torchvision] (suggested), [Detectron2], [Keras]
    - RetinaNet: [Detectron2] (suggested), [Keras]
    - Benchmark: [TensorFlow Object Detection API]
    - YOLOv3: [PyTorch]
    - SSD: [PyTorch], [Tutorial on Keras]
    - Mask R-CNN: [torchvision] (suggested), [PyTorch], [Keras & TF], [tutorial]
  • 89. Software implementations. You will probably not be interested in the object classes defined in Pascal/COCO, but you can adapt (fine-tune) existing models to your own object classes. Wang, Xin, Thomas E. Huang, Trevor Darrell, Joseph E. Gonzalez, and Fisher Yu. "Frustratingly Simple Few-Shot Object Detection." arXiv preprint arXiv:2003.06957 (2020). [code based on Detectron2]
  • 90. Software implementations for mobile: TensorFlow Lite (Object Detection); PyTorch Mobile (no specific solutions for object detection)
  • 91. Software implementations. Jordi Torres, "TensorFlow or PyTorch?" (2020) [in Catalan]
  • 93. Next lab: ImageNet models (Dani Fojo)