Deep learning for object detection

A brief summary of common deep learning methods for object detection.

Published in: Technology

  1. Deep learning for object detection. Wenjing Chen. *Created in March 2017; might be outdated by the time you read this. Slide credit: CS231n
  2. Outline
      1. Introduction
      2. Common methods
         Region proposal based methods: R-CNN, Fast R-CNN, Faster R-CNN, R-FCN, Mask R-CNN
         Single shot based methods: YOLO, YOLOv2, SSD
      3. Comparison
  3. Introduction: one image -> one label (classification) versus one image -> labels + bounding boxes (detection)
  4. Region based methods - R-CNN. Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.
  5. Region based methods - Fast R-CNN. Girshick, Ross. "Fast R-CNN." Proceedings of the IEEE International Conference on Computer Vision. 2015.
  6. Region based methods - Faster R-CNN. Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with region proposal networks." Advances in Neural Information Processing Systems. 2015.
  7. Region based methods - Faster R-CNN
  8. Region based methods - R-FCN. Li, Yi, Kaiming He, and Jian Sun. "R-FCN: Object detection via region-based fully convolutional networks." Advances in Neural Information Processing Systems. 2016. Average pooling
  9. Region based methods - Mask R-CNN. He, Kaiming, et al. "Mask R-CNN." arXiv preprint arXiv:1703.06870 (2017). Object instance segmentation:
      • Extends Faster R-CNN by adding a branch for predicting segmentation masks on each RoI
      • Runs at 5 fps
      • Without tricks, outperforms all existing single-model entries on every task in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection
  10. Single shot based method - YOLO. Redmon, Joseph, et al. "You only look once: Unified, real-time object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
      1. Resize the input image to 448*448.
      2. Run a single convolutional network. It predicts B bounding boxes (4 coordinates + confidence) and C class probabilities for each cell of an S*S grid, encoded as an S*S*(B*5+C) tensor.
      3. Apply non-maximum suppression to the S*S*B bounding boxes per image, each carrying C class probabilities.
  11. Single shot based method - YOLOv2. Redmon, Joseph, and Ali Farhadi. "YOLO9000: Better, Faster, Stronger." arXiv preprint arXiv:1612.08242 (2016). Problems with YOLO:
      1. A significant number of localization errors.
      2. Low recall compared to region proposal based methods.
      Improvements:
  12. Single shot based method - SSD. Liu, Wei, et al. "SSD: Single shot multibox detector." European Conference on Computer Vision. Springer International Publishing, 2016. Improvements:
      1. Use small convolutional filters to predict object categories and offsets in bounding box locations.
      2. Use multiple layers for prediction at different scales.
  13. Comparison (figures from the YOLOv2 and SSD papers). R-FCN: 83.6% mAP, 5.8 fps.
  14. PASCAL VOC 2012 leaderboard: http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?challengeid=11&compid=4
  15. Comparison
      Speed: single shot > region based
      Accuracy: region based > single shot
      Complexity: YOLO < SSD ≤ Faster R-CNN < R-FCN < YOLOv2(?)
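
The YOLO slide (slide 10) describes the output encoding and the final non-maximum suppression step only in words. The sketch below illustrates both in plain Python and is not the authors' implementation. The function names (`yolo_output_shape`, `iou`, `nms`), the corner-format boxes `(x1, y1, x2, y2)`, and the IoU threshold of 0.5 are assumptions made for illustration. The values S=7, B=2, C=20 are the PASCAL VOC settings from the YOLO paper and do not appear in the slides.

```python
from typing import List, Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def yolo_output_shape(S: int, B: int, C: int) -> Tuple[int, int, int]:
    """Shape of the YOLO prediction tensor: S*S cells, each holding
    B boxes (4 coordinates + 1 confidence) and C class probabilities."""
    return (S, S, B * 5 + C)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy non-maximum suppression.

    Visits boxes from highest to lowest score and keeps a box only if it
    does not overlap an already kept box by more than iou_threshold.
    Returns the indices of the kept boxes, in descending score order.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    S, B, C = 7, 2, 20  # settings from the YOLO paper, not from the slides
    print(yolo_output_shape(S, B, C))   # tensor shape S*S*(B*5+C)
    print(S * S * B)                    # candidate boxes per image

    # Two heavily overlapping boxes and one separate box (made-up data).
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))
```

In practice, detectors usually run suppression once per class on class-specific scores. The sketch uses a single score per box to keep the idea visible.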
