Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
[course site]
Object Detection
Day 2 Lecture 4
#DLUPC
Amaia Salvador
amaia.salvador@upc.edu
PhD Candidate
Universitat Poli...
Object Detection
CAT, DOG, DUCK
The task of assigning a
label and a bounding box
to all objects in the image
2
Object Detection: Datasets
3
20 categories
6k training images
6k validation images
10k test images
200 categories
456k tra...
Object Detection as Classification
Classes = [cat, dog, duck]
Cat ? NO
Dog ? NO
Duck? NO
4
Classes = [cat, dog, duck]
Cat ? NO
Dog ? NO
Duck? NO
5
Object Detection as Classification
Classes = [cat, dog, duck]
Cat ? YES
Dog ? NO
Duck? NO
6
Object Detection as Classification
Classes = [cat, dog, duck]
Cat ? NO
Dog ? NO
Duck? NO
7
Object Detection as Classification
Problem:
Too many positions & scales to test
Solution: If your classifier is fast enough, go for it
8
Object Detection as ...
Object Detection with ConvNets?
Convnets are computationally demanding. We can’t test all positions & scales !
Solution: L...
Region Proposals
● Find “blobby” image regions that are likely to contain objects
● “Class-agnostic” object detector
● Loo...
Region Proposals
Selective Search (SS) Multiscale Combinatorial Grouping (MCG)
[SS] Uijlings et al. Selective search for o...
Object Detection with Convnets: R-CNN
Girshick et al. Rich feature hierarchies for accurate object detection and semantic ...
R-CNN
Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014
1. Train...
R-CNN
14
We expect: We get:
R-CNN
Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014
1. Train...
R-CNN
Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014
16
R-CNN: Problems
1. Slow at test-time: need to run full forward pass of
CNN for each region proposal
2. SVMs and regressors...
Fast R-CNN
Girshick Fast R-CNN. ICCV 2015
Solution: Share computation of convolutional layers between region proposals for...
Fast R-CNN: Sharing features
Hi-res input image:
3 x 800 x 600
with region
proposal
Convolution
and Pooling
Hi-res conv fe...
Fast R-CNN
Solution: Train it all at together E2E
R-CNN Problem #2&3: SVMs and regressors are post-hoc. Complex training.
...
Fast R-CNN
Slide Credit: CS231n
R-CNN Fast R-CNN
Training Time: 84 hours 9.5 hours
(Speedup) 1x 8.8x
Test time per image 4...
Fast R-CNN: Problem
Slide Credit: CS231n
R-CNN Fast R-CNN
Test time per image 47 seconds 0.32 seconds
(Speedup) 1x 146x
Te...
Faster R-CNN
Conv
layers
Region Proposal Network
FC6
Class probabilities
FC7
FC8
RPN Proposals
RoI
Pooling
Conv5_3
RPN Pro...
Faster R-CNN
Conv
layers
Region Proposal Network
FC6
Class probabilities
FC7
FC8
RPN Proposals
RoI
Pooling
Conv5_3
RPN Pro...
Region Proposal Network
Objectness scores
(object/no object)
Bounding Box Regression
In practice, k = 9 (3 different scale...
Faster R-CNN: Training
Conv
layers
Region Proposal Network
FC6
Class probabilities
FC7
FC8
RPN Proposals
RoI
Pooling
Conv5...
Faster R-CNN
Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015
R-CNN Fa...
Faster R-CNN
28
● Faster R-CNN is the basis of the winners of COCO and
ILSVRC 2015&2016 object detection competitions.
He ...
YOLO: You Only Look Once
29Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016
Proposal-free ...
YOLO: You Only Look Once
Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016 30
YOLO: You Only Look Once
31
Each cell predicts:
- For each bounding box:
- 4 coordinates (x, y, w, h)
- 1 confidence value...
YOLO: You Only Look Once
Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016 32
Dog
Bicycle C...
YOLO: You Only Look Once
Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016 33
+ NMS
+ Score...
SSD: Single Shot MultiBox Detector
Liu et al. SSD: Single Shot MultiBox Detector, ECCV 2016 34
Same idea as YOLO, + severa...
YOLOv2
35
Redmon & Farhadi. YOLO900: Better, Faster, Stronger. CVPR 2017
YOLOv2
36
Results on Pascal VOC 2007
YOLOv2
37
Results on COCO test-dev 2015
Summary
38
Proposal-based methods
● R-CNN
● Fast R-CNN
● Faster R-CNN
● SPPnet
● R-FCN
Proposal-free methods
● YOLO, YOLOv...
Resources
39
● Official implementations:
○ Faster R-CNN [caffe]
○ Yolov2 [darknet]
○ SSD [caffe]
○ R-FCN [caffe][MxNet]
● ...
Questions?
YOLO: Training
41Slide credit: YOLO Presentation @ CVPR 2016
For training, each ground truth
bounding box is matched into ...
YOLO: Training
42Slide credit: YOLO Presentation @ CVPR 2016
For training, each ground truth
bounding box is matched into ...
YOLO: Training
43Slide credit: YOLO Presentation @ CVPR 2016
Optimize class prediction in that
cell:
dog: 1, cat: 0, bike:...
YOLO: Training
44Slide credit: YOLO Presentation @ CVPR 2016
Predicted boxes for this cell
YOLO: Training
45Slide credit: YOLO Presentation @ CVPR 2016
Find the best one wrt ground
truth bounding box, optimize it
...
YOLO: Training
46Slide credit: YOLO Presentation @ CVPR 2016
Increase matched box’s
confidence, decrease
non-matched boxes...
YOLO: Training
47Slide credit: YOLO Presentation @ CVPR 2016
Increase matched box’s
confidence, decrease
non-matched boxes...
YOLO: Training
48Slide credit: YOLO Presentation @ CVPR 2016
For cells with no ground truth
detections, confidences of all...
YOLO: Training
49Slide credit: YOLO Presentation @ CVPR 2016
For cells with no ground truth
detections:
● Confidences of a...
YOLO: Training, formally
50Slide credit: YOLO Presentation @ CVPR 2016
Bounding box
coordinate
regression
Bounding box
sco...
Upcoming SlideShare
Loading in …5
×

Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

1,762 views

Published on

https://telecombcn-dl.github.io/2017-dlcv/

Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.

Published in: Data & Analytics
  • Be the first to comment

Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

  1. 1. [course site] Object Detection Day 2 Lecture 4 #DLUPC Amaia Salvador amaia.salvador@upc.edu PhD Candidate Universitat Politècnica de Catalunya
  2. 2. Object Detection CAT, DOG, DUCK The task of assigning a label and a bounding box to all objects in the image 2
  3. 3. Object Detection: Datasets 3 20 categories 6k training images 6k validation images 10k test images 200 categories 456k training images 60k validation + test images 80 categories 200k training images 60k val + test images
  4. 4. Object Detection as Classification Classes = [cat, dog, duck] Cat ? NO Dog ? NO Duck? NO 4
  5. 5. Classes = [cat, dog, duck] Cat ? NO Dog ? NO Duck? NO 5 Object Detection as Classification
  6. 6. Classes = [cat, dog, duck] Cat ? YES Dog ? NO Duck? NO 6 Object Detection as Classification
  7. 7. Classes = [cat, dog, duck] Cat ? NO Dog ? NO Duck? NO 7 Object Detection as Classification
  8. 8. Problem: Too many positions & scales to test Solution: If your classifier is fast enough, go for it 8 Object Detection as Classification
  9. 9. Object Detection with ConvNets? Convnets are computationally demanding. We can’t test all positions & scales ! Solution: Look at a tiny subset of positions. Choose them wisely :) 9
  10. 10. Region Proposals ● Find “blobby” image regions that are likely to contain objects ● “Class-agnostic” object detector ● Look for “blob-like” regions Slide Credit: CS231n 10
  11. 11. Region Proposals Selective Search (SS) Multiscale Combinatorial Grouping (MCG) [SS] Uijlings et al. Selective search for object recognition. IJCV 2013 [MCG] Arbeláez, Pont-Tuset et al. Multiscale combinatorial grouping. CVPR 2014 11
  12. 12. Object Detection with Convnets: R-CNN Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014 12
  13. 13. R-CNN Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014 1. Train network on proposals 2. Post-hoc training of SVMs & Box regressors on fc7 features 13
  14. 14. R-CNN 14 We expect: We get:
  15. 15. R-CNN Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014 1. Train network on proposals 2. Post-hoc training of SVMs & Box regressors on fc7 features 3. Non Maximum Suppression + score threshold 15
  16. 16. R-CNN Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014 16
  17. 17. R-CNN: Problems 1. Slow at test-time: need to run full forward pass of CNN for each region proposal 2. SVMs and regressors are post-hoc: CNN features not updated in response to SVMs and regressors 3. Complex multistage training pipeline Slide Credit: CS231n 17
  18. 18. Fast R-CNN Girshick Fast R-CNN. ICCV 2015 Solution: Share computation of convolutional layers between region proposals for an image R-CNN Problem #1: Slow at test-time: need to run full forward pass of CNN for each region proposal 18
  19. 19. Fast R-CNN: Sharing features Hi-res input image: 3 x 800 x 600 with region proposal Convolution and Pooling Hi-res conv features: C x H x W with region proposal Fully-connected layers Max-pool within each grid cell RoI conv features: C x h x w for region proposal Fully-connected layers expect low-res conv features: C x h x w Slide Credit: CS231n 19Girshick Fast R-CNN. ICCV 2015
  20. 20. Fast R-CNN Solution: Train it all at together E2E R-CNN Problem #2&3: SVMs and regressors are post-hoc. Complex training. 20Girshick Fast R-CNN. ICCV 2015
  21. 21. Fast R-CNN Slide Credit: CS231n R-CNN Fast R-CNN Training Time: 84 hours 9.5 hours (Speedup) 1x 8.8x Test time per image 47 seconds 0.32 seconds (Speedup) 1x 146x mAP (VOC 2007) 66.0 66.9 Using VGG-16 CNN on Pascal VOC 2007 dataset Faster! FASTER! Better! 21
  22. 22. Fast R-CNN: Problem Slide Credit: CS231n R-CNN Fast R-CNN Test time per image 47 seconds 0.32 seconds (Speedup) 1x 146x Test time per image with Selective Search 50 seconds 2 seconds (Speedup) 1x 25x Test-time speeds don’t include region proposals 22
  23. 23. Faster R-CNN Conv layers Region Proposal Network FC6 Class probabilities FC7 FC8 RPN Proposals RoI Pooling Conv5_3 RPN Proposals 23 Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015 Learn proposals end-to-end sharing parameters with the classification network
  24. 24. Faster R-CNN Conv layers Region Proposal Network FC6 Class probabilities FC7 FC8 RPN Proposals RoI Pooling Conv5_3 RPN Proposals Fast R-CNN 24 Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015 Learn proposals end-to-end sharing parameters with the classification network
  25. 25. Region Proposal Network Objectness scores (object/no object) Bounding Box Regression In practice, k = 9 (3 different scales and 3 aspect ratios) 25 Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015
  26. 26. Faster R-CNN: Training Conv layers Region Proposal Network FC6 Class probabilities FC7 FC8 RPN Proposals RoI Pooling Conv5_3 RPN Proposals 26 Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015 RoI Pooling is not differentiable w.r.t box coordinates. Solutions: ● Alternate training ● Ignore gradient of classification branch w.r.t proposal coordinates ● Make pooling function differentiable (spoiler D3L6)
  27. 27. Faster R-CNN Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015 R-CNN Fast R-CNN Faster R-CNN Test time per image (with proposals) 50 seconds 2 seconds 0.2 seconds (Speedup) 1x 25x 250x mAP (VOC 2007) 66.0 66.9 66.9 Slide Credit: CS231n 27
  28. 28. Faster R-CNN 28 ● Faster R-CNN is the basis of the winners of COCO and ILSVRC 2015&2016 object detection competitions. He et al. Deep residual learning for image recognition. CVPR 2016
  29. 29. YOLO: You Only Look Once 29Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016 Proposal-free object detection pipeline
  30. 30. YOLO: You Only Look Once Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016 30
  31. 31. YOLO: You Only Look Once 31 Each cell predicts: - For each bounding box: - 4 coordinates (x, y, w, h) - 1 confidence value - Some number of class probabilities For Pascal VOC: - 7x7 grid - 2 bounding boxes / cell - 20 classes 7 x 7 x (2 x 5 + 20) = 7 x 7 x 30 tensor = 1470 outputs
  32. 32. YOLO: You Only Look Once Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016 32 Dog Bicycle Car Dining Table Predict class probability for each cell
  33. 33. YOLO: You Only Look Once Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016 33 + NMS + Score threshold
  34. 34. SSD: Single Shot MultiBox Detector Liu et al. SSD: Single Shot MultiBox Detector, ECCV 2016 34 Same idea as YOLO, + several predictors at different stages in the network
  35. 35. YOLOv2 35 Redmon & Farhadi. YOLO900: Better, Faster, Stronger. CVPR 2017
  36. 36. YOLOv2 36 Results on Pascal VOC 2007
  37. 37. YOLOv2 37 Results on COCO test-dev 2015
  38. 38. Summary 38 Proposal-based methods ● R-CNN ● Fast R-CNN ● Faster R-CNN ● SPPnet ● R-FCN Proposal-free methods ● YOLO, YOLOv2 ● SSD
  39. 39. Resources 39 ● Official implementations: ○ Faster R-CNN [caffe] ○ Yolov2 [darknet] ○ SSD [caffe] ○ R-FCN [caffe][MxNet] ● Unofficial ports to other frameworks are likely to exist… eg type “yolo tensorflow” in your browser and pick the one you like best. ● Or… use the newly released Object detection API by Google: SSD, R-FCN & Faster R-CNN (code & pretrained models in tensorflow) Object detection tutorials (project ideas maybe?): ● Toy object detection (squares, circles, etc.) (keras) ● Object detection (pets dataset) (tensorflow)
  40. 40. Questions?
  41. 41. YOLO: Training 41Slide credit: YOLO Presentation @ CVPR 2016 For training, each ground truth bounding box is matched into the right cell
  42. 42. YOLO: Training 42Slide credit: YOLO Presentation @ CVPR 2016 For training, each ground truth bounding box is matched into the right cell
  43. 43. YOLO: Training 43Slide credit: YOLO Presentation @ CVPR 2016 Optimize class prediction in that cell: dog: 1, cat: 0, bike: 0, ...
  44. 44. YOLO: Training 44Slide credit: YOLO Presentation @ CVPR 2016 Predicted boxes for this cell
  45. 45. YOLO: Training 45Slide credit: YOLO Presentation @ CVPR 2016 Find the best one wrt ground truth bounding box, optimize it (i.e. adjust its coordinates to be closer to the ground truth’s coordinates)
  46. 46. YOLO: Training 46Slide credit: YOLO Presentation @ CVPR 2016 Increase matched box’s confidence, decrease non-matched boxes confidence
  47. 47. YOLO: Training 47Slide credit: YOLO Presentation @ CVPR 2016 Increase matched box’s confidence, decrease non-matched boxes confidence
  48. 48. YOLO: Training 48Slide credit: YOLO Presentation @ CVPR 2016 For cells with no ground truth detections, confidences of all predicted boxes are decreased
  49. 49. YOLO: Training 49Slide credit: YOLO Presentation @ CVPR 2016 For cells with no ground truth detections: ● Confidences of all predicted boxes are decreased ● Class probabilities are not adjusted
  50. 50. YOLO: Training, formally 50Slide credit: YOLO Presentation @ CVPR 2016 Bounding box coordinate regression Bounding box score prediction Class score prediction = 1 if box j and cell i are matched together, 0 otherwise = 1 if box j and cell i are NOT matched together = 1 if cell i has an object present

×