SlideShare a Scribd company logo
Fast Methods for Deep Learning based
Object Detection
R-CNN: Problems
● Training is a multi-stage pipeline.
○ R-CNN first finetunes a ConvNet on object proposals using log loss.
○ Then, it fits SVMs to ConvNet features. These SVMs act as object detectors, replacing the softmax
classifier learnt by fine-tuning.
○ In the third training stage, bounding-box regressors are learned.
● Training is expensive in space and time.
○ For SVM and bounding-box regressor training, features are extracted from each object proposal in
each image and written to disk.
○ With very deep networks, such as VGG16, this process takes 2.5 GPU-days for the 5k images of the
VOC07 trainval set. These features require hundreds of gigabytes of storage.
● Object detection is slow.
○ At test-time, features are extracted from each object proposal in each test image.
○ Detection with VGG16 takes 47s / image (on a GPU).
Fast R-CNN
Fast R-CNN
Fast R-CNN
Fast R-CNN
Fast R-CNN
Fast R-CNN
Fast R-CNN
Training
Fast R-CNN
Training
● Only calculate features once.
● ROI Pooling layer extracts constant length vector representations of proposals.
● Classify and regress bounding boxes with multi purpose loss for end-to-end
training.
Fast R-CNN
Fast R-CNN: ROI Pooling
Fast R-CNN: ROI Pooling
Fast R-CNN: ROI Pooling
Fast R-CNN: ROI Pooling
Fast R-CNN: ROI Pooling
● Instead of SVM + bounding box regression:
○ SoftMax classifier output
○ Bounding box regression output
● Multi-task training:
Fast R-CNN
● Advantages
○ Training is single-stage, using a multi-task loss
○ Training can update all network layers
○ No disk storage is required for feature caching
○ More accurate 66.9mAP vs 66.0mAP.
○ Faster training time 9.5h vs 84h (x8.8)
○ Faster test time per image: 0.32s vs 47s (x146)
● Problem
○ Test time don’t include region proposals.
○ Test time with region proposals: 2s vs 50s (x25)
● Solution
○ Make the CNN do region proposals too!
Fast R-CNN
● Faster R-CNN: Towards Real-Time Object Detection
with Region Proposal Networks (2015)
○ Shaoqing Ren, Kaiming He, Ross Girshick
● Insert a Region Proposal Network (RPN) after the
last convolutional layer.
● RPN trained to produce region proposals directly;
no need for external region proposals!
● After RPN, use RoI Pooling and an upstream
classifier and bbox regressor just like Fast R-CNN.
Faster R-CNN
● Slide a small window on the already computed
feature map (FREE!).
● Build a small network for:
○ Classifying object or not-object, and
○ Regressing bbox locations
● Position of the sliding window provides
localization information with reference to the
image.
● Box regression provides finer localization
information with reference to this sliding
window
Faster R-CNN: RPN
● In the paper: Ugly pipeline
○ Use alternating optimization to train RPN, then Fast
R-CNN with RPN proposals, etc.
○ More complex than it has to be
● Since publication: Joint training!
○ One network, four losses
■ RPN classification (anchor good / bad)
■ RPN regression (anchor -> proposal)
■ Fast R-CNN classification (over classes)
■ Fast R-CNN regression (proposal -> box)
Faster R-CNN: Training
How Many Anchors Do We Need?
How Many Proposals Do We Need?
● Fast R-CNN used 2000 proposals from selective search.
● Faster R-CNN needs only 300 proposals from the RPN.
● RPN is better than selective search
○ Deep learning vs. classical computer vision
○ Optimized for this task
How Much Data Do We Need?
Also Read:
R-FCN: Object Detection via Region-based Fully
Convolutional Networks
https://arxiv.org/abs/1605.06409
Another Approach For
Speeding Up
Proposals
Another Approach For
Speeding Up
Proposals
Just Don’t Do It
Just RPN From Faster R-CNN
● Much faster than Faster R-CNN!
● But RPN had only object/not object classifier.
Add Classification!
● What about accuracy?
● How well does it handle different object scales?
Add More Scales!
Add More classifiers
SSD: Single Shot MultiBox Detector
SSD: Single Shot MultiBox Detector
Why Does Stride Matter?
● Smaller stride means more scanned
windows.
● Handles close objects better.
○ Need to have enough default boxes to do
accurate matching in each.
● Handles small objects better.
○ Better IoU with objects.
○ More positive windows per object.
● Too little stride is bad
○ Too many windows means too many false
positives to filter.
Improving Accuracy
● Object detection data is unbalanced
○ 1-30 True Positives per image.
○ 8,000 - 25,000 False Positives per image.
● Solution
○ Resample at fixed ratio (1:3)
● Not all negatives are equal!
○ Some are harder than others
● Better Solution
○ Hard negative mining: resample worst-misclassified false positives at fixed ratio.
Improving Accuracy
● Not enough data?
● Solution: Data augmentation
○ Random horizontal flip
○ Random crop
○ Random color distortion
○ Random expansion
How Much Does It Help?
Also Read:
YOLO9000: Better, Faster, Stronger
https://arxiv.org/abs/1612.08242
Speed/accuracy factors in object detectors
● Algorithm: Faster R-CNN / SSD / R-FCN / YOLO / ...
● Backbone: VGG16 / ResNet / MobileNet / etc…
● Input size
● Many other hyperparameters...
Speed/accuracy trade-offs for modern convolutional object
detectors (Google)
Frameworks
● Caffe
○ Faster R-CNN: https://github.com/rbgirshick/py-faster-rcnn
○ SSD: https://github.com/weiliu89/caffe/tree/ssd
● Tensorflow Object Detection API:
○ https://github.com/tensorflow/models/tree/master/research/object_detection
● Detectron:
○ https://github.com/facebookresearch/Detectron
● Many more re-implementations in different languages...
Honorable mentions
● VGG16: https://arxiv.org/abs/1409.1556
● ResNet: https://arxiv.org/abs/1512.03385
● Inception-ResNet: https://arxiv.org/abs/1602.07261
● ResNeXt: https://arxiv.org/abs/1611.05431
● Xception: https://arxiv.org/abs/1610.02357
● DenseNet: https://arxiv.org/abs/1608.06993
● MobileNet: https://arxiv.org/abs/1704.04861
● SqueezeNet: https://arxiv.org/abs/1602.07360
Looking for brilliant researchers
cv@brodmann17.com

More Related Content

What's hot

Visualization of Deep Learning Models (D1L6 2017 UPC Deep Learning for Comput...
Visualization of Deep Learning Models (D1L6 2017 UPC Deep Learning for Comput...Visualization of Deep Learning Models (D1L6 2017 UPC Deep Learning for Comput...
Visualization of Deep Learning Models (D1L6 2017 UPC Deep Learning for Comput...
Universitat Politècnica de Catalunya
 
Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)
Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)
Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)
Universitat Politècnica de Catalunya
 
Object Detection (D2L5 Insight@DCU Machine Learning Workshop 2017)
Object Detection (D2L5 Insight@DCU Machine Learning Workshop 2017)Object Detection (D2L5 Insight@DCU Machine Learning Workshop 2017)
Object Detection (D2L5 Insight@DCU Machine Learning Workshop 2017)
Universitat Politècnica de Catalunya
 
150807 Fast R-CNN
150807 Fast R-CNN150807 Fast R-CNN
150807 Fast R-CNN
Junho Cho
 
Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331
Jihong Kang
 
Deep learning for object detection
Deep learning for object detectionDeep learning for object detection
Deep learning for object detection
Wenjing Chen
 
Introduction to object detection
Introduction to object detectionIntroduction to object detection
Introduction to object detection
Brodmann17
 
DLD meetup 2017, Efficient Deep Learning
DLD meetup 2017, Efficient Deep LearningDLD meetup 2017, Efficient Deep Learning
DLD meetup 2017, Efficient Deep Learning
Brodmann17
 
Deep Learning for Computer Vision: Attention Models (UPC 2016)
Deep Learning for Computer Vision: Attention Models (UPC 2016)Deep Learning for Computer Vision: Attention Models (UPC 2016)
Deep Learning for Computer Vision: Attention Models (UPC 2016)
Universitat Politècnica de Catalunya
 
Object Detection and Recognition
Object Detection and Recognition Object Detection and Recognition
Object Detection and Recognition
Intel Nervana
 
SSD: Single Shot MultiBox Detector (UPC Reading Group)
SSD: Single Shot MultiBox Detector (UPC Reading Group)SSD: Single Shot MultiBox Detector (UPC Reading Group)
SSD: Single Shot MultiBox Detector (UPC Reading Group)
Universitat Politècnica de Catalunya
 
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...
Universitat Politècnica de Catalunya
 
Unsupervised Learning (D2L6 2017 UPC Deep Learning for Computer Vision)
Unsupervised Learning (D2L6 2017 UPC Deep Learning for Computer Vision)Unsupervised Learning (D2L6 2017 UPC Deep Learning for Computer Vision)
Unsupervised Learning (D2L6 2017 UPC Deep Learning for Computer Vision)
Universitat Politècnica de Catalunya
 
Deep Learning for Computer Vision: Transfer Learning and Domain Adaptation (U...
Deep Learning for Computer Vision: Transfer Learning and Domain Adaptation (U...Deep Learning for Computer Vision: Transfer Learning and Domain Adaptation (U...
Deep Learning for Computer Vision: Transfer Learning and Domain Adaptation (U...
Universitat Politècnica de Catalunya
 
Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017
Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017
Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017
Universitat Politècnica de Catalunya
 
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
Universitat Politècnica de Catalunya
 
Detection
DetectionDetection
Detection
simplyinsimple
 
Image Object Detection Pipeline
Image Object Detection PipelineImage Object Detection Pipeline
Image Object Detection Pipeline
Abhinav Dadhich
 
Transfer Learning (D2L4 Insight@DCU Machine Learning Workshop 2017)
Transfer Learning (D2L4 Insight@DCU Machine Learning Workshop 2017)Transfer Learning (D2L4 Insight@DCU Machine Learning Workshop 2017)
Transfer Learning (D2L4 Insight@DCU Machine Learning Workshop 2017)
Universitat Politècnica de Catalunya
 
D1L5 Visualization (D1L2 Insight@DCU Machine Learning Workshop 2017)
D1L5 Visualization (D1L2 Insight@DCU Machine Learning Workshop 2017)D1L5 Visualization (D1L2 Insight@DCU Machine Learning Workshop 2017)
D1L5 Visualization (D1L2 Insight@DCU Machine Learning Workshop 2017)
Universitat Politècnica de Catalunya
 

What's hot (20)

Visualization of Deep Learning Models (D1L6 2017 UPC Deep Learning for Comput...
Visualization of Deep Learning Models (D1L6 2017 UPC Deep Learning for Comput...Visualization of Deep Learning Models (D1L6 2017 UPC Deep Learning for Comput...
Visualization of Deep Learning Models (D1L6 2017 UPC Deep Learning for Comput...
 
Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)
Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)
Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)
 
Object Detection (D2L5 Insight@DCU Machine Learning Workshop 2017)
Object Detection (D2L5 Insight@DCU Machine Learning Workshop 2017)Object Detection (D2L5 Insight@DCU Machine Learning Workshop 2017)
Object Detection (D2L5 Insight@DCU Machine Learning Workshop 2017)
 
150807 Fast R-CNN
150807 Fast R-CNN150807 Fast R-CNN
150807 Fast R-CNN
 
Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331
 
Deep learning for object detection
Deep learning for object detectionDeep learning for object detection
Deep learning for object detection
 
Introduction to object detection
Introduction to object detectionIntroduction to object detection
Introduction to object detection
 
DLD meetup 2017, Efficient Deep Learning
DLD meetup 2017, Efficient Deep LearningDLD meetup 2017, Efficient Deep Learning
DLD meetup 2017, Efficient Deep Learning
 
Deep Learning for Computer Vision: Attention Models (UPC 2016)
Deep Learning for Computer Vision: Attention Models (UPC 2016)Deep Learning for Computer Vision: Attention Models (UPC 2016)
Deep Learning for Computer Vision: Attention Models (UPC 2016)
 
Object Detection and Recognition
Object Detection and Recognition Object Detection and Recognition
Object Detection and Recognition
 
SSD: Single Shot MultiBox Detector (UPC Reading Group)
SSD: Single Shot MultiBox Detector (UPC Reading Group)SSD: Single Shot MultiBox Detector (UPC Reading Group)
SSD: Single Shot MultiBox Detector (UPC Reading Group)
 
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...
 
Unsupervised Learning (D2L6 2017 UPC Deep Learning for Computer Vision)
Unsupervised Learning (D2L6 2017 UPC Deep Learning for Computer Vision)Unsupervised Learning (D2L6 2017 UPC Deep Learning for Computer Vision)
Unsupervised Learning (D2L6 2017 UPC Deep Learning for Computer Vision)
 
Deep Learning for Computer Vision: Transfer Learning and Domain Adaptation (U...
Deep Learning for Computer Vision: Transfer Learning and Domain Adaptation (U...Deep Learning for Computer Vision: Transfer Learning and Domain Adaptation (U...
Deep Learning for Computer Vision: Transfer Learning and Domain Adaptation (U...
 
Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017
Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017
Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017
 
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
 
Detection
DetectionDetection
Detection
 
Image Object Detection Pipeline
Image Object Detection PipelineImage Object Detection Pipeline
Image Object Detection Pipeline
 
Transfer Learning (D2L4 Insight@DCU Machine Learning Workshop 2017)
Transfer Learning (D2L4 Insight@DCU Machine Learning Workshop 2017)Transfer Learning (D2L4 Insight@DCU Machine Learning Workshop 2017)
Transfer Learning (D2L4 Insight@DCU Machine Learning Workshop 2017)
 
D1L5 Visualization (D1L2 Insight@DCU Machine Learning Workshop 2017)
D1L5 Visualization (D1L2 Insight@DCU Machine Learning Workshop 2017)D1L5 Visualization (D1L2 Insight@DCU Machine Learning Workshop 2017)
D1L5 Visualization (D1L2 Insight@DCU Machine Learning Workshop 2017)
 

Similar to Fast methods for deep learning based object detection

Brodmann17 CVPR 2017 review - meetup slides
Brodmann17 CVPR 2017 review - meetup slides Brodmann17 CVPR 2017 review - meetup slides
Brodmann17 CVPR 2017 review - meetup slides
Brodmann17
 
Week5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptxWeek5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptx
fahmi324663
 
R-FCN : object detection via region-based fully convolutional networks
R-FCN :  object detection via region-based fully convolutional networksR-FCN :  object detection via region-based fully convolutional networks
R-FCN : object detection via region-based fully convolutional networks
Entrepreneur / Startup
 
Faster R-CNN - PR012
Faster R-CNN - PR012Faster R-CNN - PR012
Faster R-CNN - PR012
Jinwon Lee
 
物件偵測與辨識技術
物件偵測與辨識技術物件偵測與辨識技術
物件偵測與辨識技術
CHENHuiMei
 
Deep image retrieval - learning global representations for image search - ub ...
Deep image retrieval - learning global representations for image search - ub ...Deep image retrieval - learning global representations for image search - ub ...
Deep image retrieval - learning global representations for image search - ub ...
Universitat de Barcelona
 
Object Detection - Míriam Bellver - UPC Barcelona 2018
Object Detection - Míriam Bellver - UPC Barcelona 2018Object Detection - Míriam Bellver - UPC Barcelona 2018
Object Detection - Míriam Bellver - UPC Barcelona 2018
Universitat Politècnica de Catalunya
 
Object Detection Beyond Mask R-CNN and RetinaNet I
Object Detection Beyond Mask R-CNN and RetinaNet IObject Detection Beyond Mask R-CNN and RetinaNet I
Object Detection Beyond Mask R-CNN and RetinaNet I
Wanjin Yu
 
Deep image retrieval learning global representations for image search
Deep image retrieval  learning global representations for image searchDeep image retrieval  learning global representations for image search
Deep image retrieval learning global representations for image search
Universitat Politècnica de Catalunya
 
D3L4-objects.pdf
D3L4-objects.pdfD3L4-objects.pdf
D3L4-objects.pdf
ssusere945ae
 
object detection paper review
object detection paper reviewobject detection paper review
object detection paper review
Yoonho Na
 
Screen space reflections on Epsilon Engine
Screen space reflections on Epsilon EngineScreen space reflections on Epsilon Engine
Screen space reflections on Epsilon Engine
Imanol Fotia
 
Object Detection is a very powerful field.pptx
Object Detection is a very powerful field.pptxObject Detection is a very powerful field.pptx
Object Detection is a very powerful field.pptx
usmanyaseen16
 
PR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorPR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox Detector
Jinwon Lee
 
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
Universitat Politècnica de Catalunya
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
Edge AI and Vision Alliance
 
Deep Neural Networks Presentation
Deep Neural Networks PresentationDeep Neural Networks Presentation
Deep Neural Networks Presentation
Bohdan Klimenko
 
Knn Algorithm presentation
Knn Algorithm presentationKnn Algorithm presentation
Knn Algorithm presentation
RishavSharma112
 
Aerial detection part3
Aerial detection part3Aerial detection part3
Aerial detection part3
ssuser456ad6
 
Leveraging high level and low-level features for multimedia event detection.2...
Leveraging high level and low-level features for multimedia event detection.2...Leveraging high level and low-level features for multimedia event detection.2...
Leveraging high level and low-level features for multimedia event detection.2...Lu Jiang
 

Similar to Fast methods for deep learning based object detection (20)

Brodmann17 CVPR 2017 review - meetup slides
Brodmann17 CVPR 2017 review - meetup slides Brodmann17 CVPR 2017 review - meetup slides
Brodmann17 CVPR 2017 review - meetup slides
 
Week5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptxWeek5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptx
 
R-FCN : object detection via region-based fully convolutional networks
R-FCN :  object detection via region-based fully convolutional networksR-FCN :  object detection via region-based fully convolutional networks
R-FCN : object detection via region-based fully convolutional networks
 
Faster R-CNN - PR012
Faster R-CNN - PR012Faster R-CNN - PR012
Faster R-CNN - PR012
 
物件偵測與辨識技術
物件偵測與辨識技術物件偵測與辨識技術
物件偵測與辨識技術
 
Deep image retrieval - learning global representations for image search - ub ...
Deep image retrieval - learning global representations for image search - ub ...Deep image retrieval - learning global representations for image search - ub ...
Deep image retrieval - learning global representations for image search - ub ...
 
Object Detection - Míriam Bellver - UPC Barcelona 2018
Object Detection - Míriam Bellver - UPC Barcelona 2018Object Detection - Míriam Bellver - UPC Barcelona 2018
Object Detection - Míriam Bellver - UPC Barcelona 2018
 
Object Detection Beyond Mask R-CNN and RetinaNet I
Object Detection Beyond Mask R-CNN and RetinaNet IObject Detection Beyond Mask R-CNN and RetinaNet I
Object Detection Beyond Mask R-CNN and RetinaNet I
 
Deep image retrieval learning global representations for image search
Deep image retrieval  learning global representations for image searchDeep image retrieval  learning global representations for image search
Deep image retrieval learning global representations for image search
 
D3L4-objects.pdf
D3L4-objects.pdfD3L4-objects.pdf
D3L4-objects.pdf
 
object detection paper review
object detection paper reviewobject detection paper review
object detection paper review
 
Screen space reflections on Epsilon Engine
Screen space reflections on Epsilon EngineScreen space reflections on Epsilon Engine
Screen space reflections on Epsilon Engine
 
Object Detection is a very powerful field.pptx
Object Detection is a very powerful field.pptxObject Detection is a very powerful field.pptx
Object Detection is a very powerful field.pptx
 
PR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorPR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox Detector
 
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
 
Deep Neural Networks Presentation
Deep Neural Networks PresentationDeep Neural Networks Presentation
Deep Neural Networks Presentation
 
Knn Algorithm presentation
Knn Algorithm presentationKnn Algorithm presentation
Knn Algorithm presentation
 
Aerial detection part3
Aerial detection part3Aerial detection part3
Aerial detection part3
 
Leveraging high level and low-level features for multimedia event detection.2...
Leveraging high level and low-level features for multimedia event detection.2...Leveraging high level and low-level features for multimedia event detection.2...
Leveraging high level and low-level features for multimedia event detection.2...
 

Recently uploaded

Toxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and ArsenicToxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and Arsenic
sanjana502982
 
Anemia_ types_clinical significance.pptx
Anemia_ types_clinical significance.pptxAnemia_ types_clinical significance.pptx
Anemia_ types_clinical significance.pptx
muralinath2
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
SAMIR PANDA
 
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptxBREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
RASHMI M G
 
Introduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptxIntroduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptx
zeex60
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
Gokturk Mehmet Dilci
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
yqqaatn0
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
muralinath2
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
Richard Gill
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
Nucleophilic Addition of carbonyl compounds.pptx
Nucleophilic Addition of carbonyl  compounds.pptxNucleophilic Addition of carbonyl  compounds.pptx
Nucleophilic Addition of carbonyl compounds.pptx
SSR02
 
Eukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptxEukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptx
RitabrataSarkar3
 
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
HongcNguyn6
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
University of Maribor
 
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
Wasswaderrick3
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
silvermistyshot
 
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdfDMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
fafyfskhan251kmf
 
Red blood cells- genesis-maturation.pptx
Red blood cells- genesis-maturation.pptxRed blood cells- genesis-maturation.pptx
Red blood cells- genesis-maturation.pptx
muralinath2
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Erdal Coalmaker
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
University of Maribor
 

Recently uploaded (20)

Toxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and ArsenicToxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and Arsenic
 
Anemia_ types_clinical significance.pptx
Anemia_ types_clinical significance.pptxAnemia_ types_clinical significance.pptx
Anemia_ types_clinical significance.pptx
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
 
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptxBREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
 
Introduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptxIntroduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptx
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
 
Nucleophilic Addition of carbonyl compounds.pptx
Nucleophilic Addition of carbonyl  compounds.pptxNucleophilic Addition of carbonyl  compounds.pptx
Nucleophilic Addition of carbonyl compounds.pptx
 
Eukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptxEukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptx
 
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
 
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
 
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdfDMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
 
Red blood cells- genesis-maturation.pptx
Red blood cells- genesis-maturation.pptxRed blood cells- genesis-maturation.pptx
Red blood cells- genesis-maturation.pptx
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
 

Fast methods for deep learning based object detection

  • 1. Fast Methods for Deep Learning based Object Detection
  • 2. R-CNN: Problems ● Training is a multi-stage pipeline. ○ R-CNN first finetunes a ConvNet on object proposals using log loss. ○ Then, it fits SVMs to ConvNet features. These SVMs act as object detectors, replacing the softmax classifier learnt by fine-tuning. ○ In the third training stage, bounding-box regressors are learned. ● Training is expensive in space and time. ○ For SVM and bounding-box regressor training, features are extracted from each object proposal in each image and written to disk. ○ With very deep networks, such as VGG16, this process takes 2.5 GPU-days for the 5k images of the VOC07 trainval set. These features require hundreds of gigabytes of storage. ● Object detection is slow. ○ At test-time, features are extracted from each object proposal in each test image. ○ Detection with VGG16 takes 47s / image (on a GPU).
  • 11. ● Only calculate features once. ● ROI Pooling layer extracts constant length vector representations of proposals. ● Classify and regress bounding boxes with multi purpose loss for end-to-end training. Fast R-CNN
  • 12. Fast R-CNN: ROI Pooling
  • 13. Fast R-CNN: ROI Pooling
  • 14. Fast R-CNN: ROI Pooling
  • 15. Fast R-CNN: ROI Pooling
  • 16. Fast R-CNN: ROI Pooling
  • 17. ● Instead of SVM + bounding box regression: ○ SoftMax classifier output ○ Bounding box regression output ● Multi-task training: Fast R-CNN
  • 18. ● Advantages ○ Training is single-stage, using a multi-task loss ○ Training can update all network layers ○ No disk storage is required for feature caching ○ More accurate 66.9mAP vs 66.0mAP. ○ Faster training time 9.5h vs 84h (x8.8) ○ Faster test time per image: 0.32s vs 47s (x146) ● Problem ○ Test time don’t include region proposals. ○ Test time with region proposals: 2s vs 50s (x25) ● Solution ○ Make the CNN do region proposals too! Fast R-CNN
  • 19. ● Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (2015) ○ Shaoqing Ren, Kaiming He, Ross Girshick ● Insert a Region Proposal Network (RPN) after the last convolutional layer. ● RPN trained to produce region proposals directly; no need for external region proposals! ● After RPN, use RoI Pooling and an upstream classifier and bbox regressor just like Fast R-CNN. Faster R-CNN
  • 20. ● Slide a small window on the already computed feature map (FREE!). ● Build a small network for: ○ Classifying object or not-object, and ○ Regressing bbox locations ● Position of the sliding window provides localization information with reference to the image. ● Box regression provides finer localization information with reference to this sliding window Faster R-CNN: RPN
  • 21. ● In the paper: Ugly pipeline ○ Use alternating optimization to train RPN, then Fast R-CNN with RPN proposals, etc. ○ More complex than it has to be ● Since publication: Joint training! ○ One network, four losses ■ RPN classification (anchor good / bad) ■ RPN regression (anchor -> proposal) ■ Fast R-CNN classification (over classes) ■ Fast R-CNN regression (proposal -> box) Faster R-CNN: Training
  • 22. How Many Anchors Do We Need?
  • 23. How Many Proposals Do We Need? ● Fast R-CNN used 2000 proposals from selective search. ● Faster R-CNN needs only 300 proposals from the RPN. ● RPN is better than selective search ○ Deep learning vs. classical computer vision ○ Optimized for this task
  • 24. How Much Data Do We Need?
  • 25. Also Read: R-FCN: Object Detection via Region-based Fully Convolutional Networks https://arxiv.org/abs/1605.06409
  • 27. Another Approach For Speeding Up Proposals Just Don’t Do It
  • 28. Just RPN From Faster R-CNN ● Much faster than Faster R-CNN! ● But RPN had only object/not object classifier.
  • 29. Add Classification! ● What about accuracy? ● How well does it handle different object scales?
  • 32. SSD: Single Shot MultiBox Detector
  • 33. SSD: Single Shot MultiBox Detector
  • 34. Why Does Stride Matter? ● Smaller stride means more scanned windows. ● Handles close objects better. ○ Need to have enough default boxes to do accurate matching in each. ● Handles small objects better. ○ Better IoU with objects. ○ More positive windows per object. ● Too little stride is bad ○ Too many windows means too many false positives to filter.
  • 35. Improving Accuracy ● Object detection data is unbalanced ○ 1-30 True Positives per image. ○ 8,000 - 25,000 False Positives per image. ● Solution ○ Resample at fixed ratio (1:3) ● Not all negatives are equal! ○ Some are harder than others ● Better Solution ○ Hard negative mining: resample worst-misclassified false positives at fixed ratio.
  • 36. Improving Accuracy ● Not enough data? ● Solution: Data augmentation ○ Random horizontal flip ○ Random crop ○ Random color distortion ○ Random expansion
  • 37. How Much Does It Help?
  • 38. Also Read: YOLO9000: Better, Faster, Stronger https://arxiv.org/abs/1612.08242
  • 39. Speed/accuracy factors in object detectors ● Algorithm: Faster R-CNN / SSD / R-FCN / YOLO / ... ● Backbone: VGG16 / ResNet / MobileNet / etc… ● Input size ● Many other hyperparameters...
  • 40. Speed/accuracy trade-offs for modern convolutional object detectors (Google)
  • 41. Frameworks ● Caffe ○ Faster R-CNN: https://github.com/rbgirshick/py-faster-rcnn ○ SSD: https://github.com/weiliu89/caffe/tree/ssd ● Tensorflow Object Detection API: ○ https://github.com/tensorflow/models/tree/master/research/object_detection ● Detectron: ○ https://github.com/facebookresearch/Detectron ● Many more re-implementations in different languages...
  • 42. Honorable mentions ● VGG16: https://arxiv.org/abs/1409.1556 ● ResNet: https://arxiv.org/abs/1512.03385 ● Inception-ResNet: https://arxiv.org/abs/1602.07261 ● ResNeXt: https://arxiv.org/abs/1611.05431 ● Xception: https://arxiv.org/abs/1610.02357 ● DenseNet: https://arxiv.org/abs/1608.06993 ● MobileNet: https://arxiv.org/abs/1704.04861 ● SqueezeNet: https://arxiv.org/abs/1602.07360
  • 43. Looking for brilliant researchers cv@brodmann17.com