Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Object Detection - Míriam Bellver - UPC Barcelona 2018

363 views

Published on

https://telecombcn-dl.github.io/2018-dlcv/

Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.

Published in: Data & Analytics
  • Be the first to comment

Object Detection - Míriam Bellver - UPC Barcelona 2018

  1. 1. Míriam Bellver miriam.bellver@bsc.edu PhD Candidate Barcelona Supercomputing Center Object Detection Day 2 Lecture 1 #DLUPC http://bit.ly/dlcv2018
  2. 2. Object Detection CAT, DOG, DUCK The task of assigning a label and a bounding box to all objects in the image 2
  3. 3. Object Detection as Classification Classes = [cat, dog, duck] Cat ? NO Dog ? NO Duck? NO 3
  4. 4. Object Detection as Classification Classes = [cat, dog, duck] Cat ? NO Dog ? NO Duck? NO 4
  5. 5. Object Detection as Classification Classes = [cat, dog, duck] Cat ? YES Dog ? NO Duck? NO 5
  6. 6. Classes = [cat, dog, duck] Cat ? NO Dog ? NO Duck? NO 6 Object Detection as Classification
  7. 7. Problem: Too many positions & scales to test Solution: If your classifier is fast enough, go for it 7 Object Detection as Classification
  8. 8. Object Detection with ConvNets? Convnets are computationally demanding. We can’t test all positions & scales ! Solution: Look at a tiny subset of positions. Choose them wisely :) 8
  9. 9. Object Detection: Datasets 9 20 categories 6k training images 6k validation images 10k test images 200 categories 456k training images 60k validation + test images 80 categories 200k training images 60k val + test images
  10. 10. Outline 10 Proposal-based methods Proposal-free methods
  11. 11. Region Proposals ● Find “blobby” image regions that are likely to contain objects ● “Class-agnostic” object detector Slide Credit: CS231n 11
  12. 12. Region Proposals Selective Search (SS) Multiscale Combinatorial Grouping (MCG) [SS] Uijlings et al. Selective search for object recognition. IJCV 2013 [MCG] Arbeláez, Pont-Tuset et al. Multiscale combinatorial grouping. CVPR 2014 12
  13. 13. Object Detection with Convnets: R-CNN Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014 13
  14. 14. R-CNN 14 We expect: We get: Non Maximum Suppression + score threshold
  15. 15. R-CNN Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014 15
  16. 16. R-CNN: Problems 1. Slow at test-time: need to run full forward pass of CNN for each region proposal 2. SVMs and regressors are post-hoc: CNN features not updated in response to SVMs and regressors Slide Credit: CS231n 16
  17. 17. Fast R-CNN Girshick Fast R-CNN. ICCV 2015 Solution: Share computation of convolutional layers between region proposals for an image R-CNN Problem #1: Slow at test-time: need to run full forward pass of CNN for each region proposal 17
  18. 18. Fast R-CNN: Sharing features Hi-res input image: 3 x 800 x 600 with region proposal Convolution and Pooling Hi-res conv features: C x H x W with region proposal Fully-connected layers Max-pool within each grid cell RoI conv features: C x h x w for region proposal Fully-connected layers expect low-res conv features: C x h x w Slide Credit: CS231n 18Girshick Fast R-CNN. ICCV 2015
  19. 19. Fast R-CNN Solution: Train it all together end to end R-CNN Problem #2&3: SVMs and regressors are post-hoc. Complex training. 19Girshick Fast R-CNN. ICCV 2015
  20. 20. Fast R-CNN Slide Credit: CS231n R-CNN Fast R-CNN Training Time: 84 hours 9.5 hours (Speedup) 1x 8.8x Test time per image 47 seconds 0.32 seconds (Speedup) 1x 146x mAP (VOC 2007) 66.0 66.9 Using VGG-16 CNN on Pascal VOC 2007 dataset Faster! FASTER! Better! 20
  21. 21. Fast R-CNN: Problem Slide Credit: CS231n R-CNN Fast R-CNN Test time per image 47 seconds 0.32 seconds (Speedup) 1x 146x Test time per image with Selective Search 50 seconds 2 seconds (Speedup) 1x 25x Test-time speeds don’t include region proposals 21
  22. 22. Faster R-CNN Conv layers Region Proposal Network FC6 Class probabilities FC7 FC8 RPN Proposals RoI Pooling Conv5_3 RPN Proposals 22 Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015 Learn proposals end-to-end sharing parameters with the classification network
  23. 23. Faster R-CNN Conv layers Region Proposal Network FC6 Class probabilities FC7 FC8 RPN Proposals RoI Pooling Conv5_3 RPN Proposals Fast R-CNN 23 Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015 Learn proposals end-to-end sharing parameters with the classification network
  24. 24. Region Proposal Network Objectness scores (object/no object) Bounding Box Regression In practice, k = 9 (3 different scales and 3 aspect ratios) 24 Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015
  25. 25. Faster R-CNN Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015 R-CNN Fast R-CNN Faster R-CNN Test time per image (with proposals) 50 seconds 2 seconds 0.2 seconds (Speedup) 1x 25x 250x mAP (VOC 2007) 66.0 66.9 66.9 Slide Credit: CS231n 25
  26. 26. Mask R-CNN 26 He et al. Mask R-CNN. ICCV 2017
  27. 27. Outline 27 Proposal-based methods Proposal-free methods
  28. 28. One-stage methods 28 Problem: Too many positions & scales to test Solution: If your classifier is fast enough, go for it Previously… :
  29. 29. One-stage methods 29 Problem: Too many positions & scales to test Modern detectors parallelize feature extraction across all locations. Region classification is not slow anymore! Previously… :
  30. 30. YOLO: You Only Look Once 30Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016 Proposal-free object detection pipeline
  31. 31. YOLO: You Only Look Once Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016 31
  32. 32. YOLO: You Only Look Once 32 Each cell predicts: - For each bounding box: - 4 coordinates (x, y, w, h) - 1 confidence value - Some number of class probabilities For Pascal VOC: - 7x7 grid - 2 bounding boxes / cell - 20 classes 7 x 7 x (2 x 5 + 20) = 7 x 7 x 30 tensor = 1470 outputs
  33. 33. SSD: Single Shot MultiBox Detector Liu et al. SSD: Single Shot MultiBox Detector, ECCV 2016 33 Same idea as YOLO, + several predictors at different stages in the network
  34. 34. YOLOv2 34 Redmon & Farhadi. YOLO900: Better, Faster, Stronger. CVPR 2017
  35. 35. YOLOv3 35 YOLO v2 + residual blocks + skip connections + upsampling + detection at multiple scales
  36. 36. RetinaNet 36 Matching proposal-based performance with a one-stage approach Problem of one-stage detectors? They evaluate many candidate locations but only a few have objects ---> IMBALANCE, making learning inefficient Key idea is to lower loss weight for well classified samples, increase it for difficult ones. Lin et al. Focal Loss for Dense Object Detection. ICCV 2017
  37. 37. Overview 37
  38. 38. Summary 38 Proposal-based methods ● R-CNN ● Fast R-CNN ● Faster R-CNN ● Mask R-CNN Proposal-free methods ● YOLO ● SSD ● RetinaNet
  39. 39. Questions? 39

×