SlideShare a Scribd company logo
Gang Yu
旷 视 研 究 院
Object Detection in Recent 3 Years
Beyond RetinaNet and Mask R-CNN
Schedule of Tutorial
• Lecture 1: Beyond RetinaNet and Mask R-CNN (Gang Yu)
• Lecture 2: AutoML for Object Detection (Xiangyu Zhang)
• Lecture 3: Finegrained Visual Analysis (Xiu-shen Wei)
Outline
• Introduction to Object Detection
• Modern Object detectors
• One Stage detector vs Two-stage detector
• Challenges
• Backbone
• Head
• Pretraining
• Scale
• Batch Size
• Crowd
• NAS
• Fine-Grained
• Conclusion
Outline
• Introduction to Object Detection
• Modern Object detectors
• One Stage detector vs Two-stage detector
• Challenges
• Backbone
• Head
• Pretraining
• Scale
• Batch Size
• Crowd
• NAS
• Fine-Grained
• Conclusion
What is object detection?
What is object detection?
Detection - Evaluation Criteria
Average Precision (AP) and mAP
Figures are from wikipedia
Detection - Evaluation Criteria
mmAP
Figures are from http://cocodataset.org
How to perform a detection?
• Sliding window: enumerate all the windows (up to millions of windows)
• VJ detector: cascade chain
• Fully Convolutional network
• shared computation
Robust Real-time Object Detection; Viola, Jones; IJCV 2001
http://www.vision.caltech.edu/html-files/EE148-2005-Spring/pprs/viola04ijcv.pdf
General Detection Before Deep Learning
• Feature + classifier
• Feature
• Haar Feature
• HOG (Histogram of Gradient)
• LBP (Local Binary Pattern)
• ACF (Aggregated Channel Feature)
• …
• Classifier
• SVM
• Bootsing
• Random Forest
Traditional Hand-crafted Feature: HoG
Traditional Hand-crafted Feature: HoG
General Detection Before Deep Learning
Traditional Methods
• Pros
• Efficient to compute (e.g., HAAR, ACF) on CPU
• Easy to debug, analyze the bad cases
• reasonable performance on limited training data
• Cons
• Limited performance on large dataset
• Hard to be accelerated by GPU
Deep Learning for Object Detection
Based on the whether following the “proposal and refine”
• One Stage
• Example: Densebox, YOLO (YOLO v2), SSD, Retina Net
• Keyword: Anchor, Divide and conquer, loss sampling
• Two Stage
• Example: RCNN (Fast RCNN, Faster RCNN), RFCN, FPN,
MaskRCNN
• Keyword: speed, performance
A bit of History
Image
Feature
Extractor
classification
localization
(bbox)
One stage detector
Densebox (2015) UnitBox (2016) EAST (2017)
YOLO (2015) Anchor Free
Anchor importedYOLOv2 (2016)
SSD (2015)
RON(2017)
RetinaNet(2017)
DSSD (2017)
two stages detector
Image
Feature
Extractor
classification
localization
(bbox)
Proposal
classification
localization
(bbox)
Refine
RCNN (2014) Fast RCNN(2015)
Faster RCNN (2015)
RFCN (2016)
MultiBox(2014)
RFCN++ (2017)
FPN (2017)
Mask RCNN (2017)
OverFeat(2013)
Outline
• Introduction to Object Detection
• Modern Object detectors
• One Stage detector vs Two-stage detector
• Challenges
• Backbone
• Head
• Pretraining
• Scale
• Batch Size
• Crowd
• NAS
• Fine-Grained
• Conclusion
Modern Object detectors
Backbone Head
• Modern object detectors
• RetinaNet
• f1-f7 for backbone, f3-f7 with 4 convs for head
• FPN with ROIAlign
• f1-f6 for backbone, two fcs for head
• Recall vs localization
• One stage detector: Recall is high but compromising the localization ability
• Two stage detector: Strong localization ability
Postprocess
NMS
One Stage detector: RetinaNet
• FPN Structure
• Focal loss
Focal Loss for Dense Object Detection, Lin etc, ICCV 2017 Best student paper
One Stage detector: RetinaNet
• FPN Structure
• Focal loss
Focal Loss for Dense Object Detection, Lin etc, ICCV 2017 Best student paper
Two-Stage detector: FPN/Mask R-CNN
• FPN Structure
• ROIAlign
Mask R-CNN, He etc, ICCV 2017 Best paper
What is next for object detection?
• The pipeline seems to be mature
• There still exists a large gap between existing state-of-arts and product
requirements
• The devil is in the detail
Outline
• Introduction to Object Detection
• Modern Object detectors
• One Stage detector vs Two-stage detector
• Challenges
• Backbone
• Head
• Pretraining
• Scale
• Batch Size
• Crowd
• NAS
• Fine-Grained
• Conclusion
Challenges Overview
• Backbone
• Head
• Pretraining
• Scale
• Batch Size
• Crowd
• NAS
• Fine-grained
Backbone Head Postprocess
NMS
Challenges - Backbone
• Backbone network is designed for classification task but not for
localization task
• Receptive Field vs Spatial resolution
• Only f1-f5 is pretrained but randomly initializing f6 and f7 (if applicable)
Backbone - DetNet
• DetNet: A Backbone network for Object Detection, Li etc, 2018,
https://arxiv.org/pdf/1804.06215.pdf
Backbone - DetNet
Backbone - DetNet
Backbone - DetNet
Backbone - DetNet
Backbone - DetNet
Challenges - Head
• Speed is significantly improved for the two-stage detector
• RCNN - > Fast RCNN -> Faster RCNN - > RFCN
• How to obtain efficient speed as one stage detector like YOLO, SSD?
• Small Backbone
• Light Head
Head – Light head RCNN
• Light-Head R-CNN: In Defense of Two-Stage Object Detector, 2017,
https://arxiv.org/pdf/1711.07264.pdf
Code: https://github.com/zengarden/light_head_rcnn
Head – Light head RCNN
• Backbone
• L: Resnet101
• S: Xception145
• Thin Feature map
• L:C_{mid} = 256
• S: C_{mid} =64
• C_{out} = 10 * 7 * 7
• R-CNN subnet
• A fc layer is connected to the PS ROI pool/Align
Head – Light head RCNN
Head – Light head RCNN
Head – Light head RCNN
• Mobile Version
• ThunderNet: Towards Real-time Generic Object Detection, Qin etc, Arxiv
2019
• https://arxiv.org/abs/1903.11752
Pretraining – Objects365
• ImageNet pretraining is usually employed for backbone training
• Training from Scratch
• Scratch Det claims GN/BN is important
• Rethinking ImageNet Pretraining validates that training time is important
Pretraining – Objects365
• Objects365 Dataset
Pretraining – Objects365
• Pretraining with Objects365 vs ImageNet vs from Sctratch
Pretraining – Objects365
• Pretraining on Backbone or Pretraining on both backbone and head
Pretraining – Objects365
• Results on VOC Detection & VOC Segmentation
Pretraining – Objects365
• Summary
• Pretraining is important to reduce the training time
• Pretraining with a large dataset is beneficial for the performance
Challenges - Scale
• Scale variations is extremely large for object detection
Challenges - Scale
• Scale variations is extremely large for object detection
• Previous works
• Divide and Conquer: SSD, DSSD, RON, FPN, …
• Limited Scale variation
• Scale Normalization for Image Pyramids, Singh etc, CVPR2018
• Slow inference speed
• How to address extremely large scale variation without compromising
inference speed?
Scale - SFace
• SFace: An Efficient Network for Face Detection in Large Scale Variations,
2018, http://cn.arxiv.org/pdf/1804.06559.pdf
• Anchor-based:
• Good localization for the scales which are covered by anchors
• Difficult to address all the scale ranges of faces
• Anchor-free:
• Able to cover various face scales
• Not good for the localization ability
Scale - SFace
Scale - SFace
Scale - SFace
Scale - SFace
• Summary:
• Integrate anchor-based and anchor-free for the scale issue
• A new benchmark for face detection with large scale variations: 4K Face
Challenges - Batchsize
• Small mini-batchsize for general object detection
• 2 for R-CNN, Faster RCNN
• 16 for RetinaNet, Mask RCNN
• Problem with small mini-batchsize
• Long training time
• Insufficient BN statistics
• Inbalanced pos/neg ratio
Batchsize – MegDet
• MegDet: A Large Mini-Batch Object Detector, CVPR2018,
https://arxiv.org/pdf/1711.07240.pdf
Batchsize – MegDet
• Techniques
• Learning rate warmup
• Cross-GPU Batch Normalization
Challenges - Crowd
• NMS is a post-processing step to eliminate multiple responses on one object
instance
• Reasonable for mild crowdness like COCO and VOC
• Will Fail in the case when the objects are in a crowd
Challenges - Crowd
• A few works have been devoted to this topic
• Softnms, Bodla etc, ICCV 2017, http://www.cs.umd.edu/~bharat/snms.pdf
• Relation Networks, Hu etc, CVPR 2018,
https://arxiv.org/pdf/1711.11575.pdf
• Lacking a good benchmark for evaluation in the literature
Crowd - CrowdHuman
• CrowdHuman: A Benchmark for Detecting Human in a Crowd, 2018,
https://arxiv.org/pdf/1805.00123.pdf, http://www.crowdhuman.org/
• A benchmark with Head, Visible Human, Full body bounding-box
• Generalization ability for other head/pedestrian datasets
• Crowdness
Crowd - CrowdHuman
Crowd-CrowdHuman
Crowd-CrowdHuman
• Generalization
• Head
• Pedestrian
• COCO
Conclusion
• The task of object detection is still far from solved
• Details are important to further improve the performance
• Backbone
• Head
• Pretraining
• Scale
• Batchsize
• Crowd
• The improvement of object detection will be a significantly boost for the
computer vision industry
广告部分
• Megvii Detection 知乎专栏
Email: yugang@megvii.com
Object Detection Beyond Mask R-CNN and RetinaNet I

More Related Content

What's hot

Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
changedaeoh
 
Tutorial on Object Detection (Faster R-CNN)
Tutorial on Object Detection (Faster R-CNN)Tutorial on Object Detection (Faster R-CNN)
Tutorial on Object Detection (Faster R-CNN)
Hwa Pyung Kim
 
PR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorPR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox Detector
Jinwon Lee
 
Faster R-CNN: Towards real-time object detection with region proposal network...
Faster R-CNN: Towards real-time object detection with region proposal network...Faster R-CNN: Towards real-time object detection with region proposal network...
Faster R-CNN: Towards real-time object detection with region proposal network...
Universitat Politècnica de Catalunya
 
SSD: Single Shot MultiBox Detector (UPC Reading Group)
SSD: Single Shot MultiBox Detector (UPC Reading Group)SSD: Single Shot MultiBox Detector (UPC Reading Group)
SSD: Single Shot MultiBox Detector (UPC Reading Group)
Universitat Politècnica de Catalunya
 
Mask R-CNN
Mask R-CNNMask R-CNN
Mask R-CNN
Chanuk Lim
 
Mask-RCNN for Instance Segmentation
Mask-RCNN for Instance SegmentationMask-RCNN for Instance Segmentation
Mask-RCNN for Instance Segmentation
Dat Nguyen
 
Single Shot Multibox Detector
Single Shot Multibox DetectorSingle Shot Multibox Detector
Single Shot Multibox Detector
NamHyuk Ahn
 
論文紹介: Fast R-CNN&Faster R-CNN
論文紹介: Fast R-CNN&Faster R-CNN論文紹介: Fast R-CNN&Faster R-CNN
論文紹介: Fast R-CNN&Faster R-CNN
Takashi Abe
 
【チュートリアル】コンピュータビジョンによる動画認識 v2
【チュートリアル】コンピュータビジョンによる動画認識 v2【チュートリアル】コンピュータビジョンによる動画認識 v2
【チュートリアル】コンピュータビジョンによる動画認識 v2
Hirokatsu Kataoka
 
Wasserstein GAN 수학 이해하기 I
Wasserstein GAN 수학 이해하기 IWasserstein GAN 수학 이해하기 I
Wasserstein GAN 수학 이해하기 I
Sungbin Lim
 
物体検知(Meta Study Group 発表資料)
物体検知(Meta Study Group 発表資料)物体検知(Meta Study Group 発表資料)
物体検知(Meta Study Group 発表資料)
cvpaper. challenge
 
Focal loss for dense object detection
Focal loss for dense object detectionFocal loss for dense object detection
Focal loss for dense object detection
DaeHeeKim31
 
(DL輪読)Matching Networks for One Shot Learning
(DL輪読)Matching Networks for One Shot Learning(DL輪読)Matching Networks for One Shot Learning
(DL輪読)Matching Networks for One Shot Learning
Masahiro Suzuki
 
【DL輪読会】HexPlaneとK-Planes
【DL輪読会】HexPlaneとK-Planes【DL輪読会】HexPlaneとK-Planes
【DL輪読会】HexPlaneとK-Planes
Deep Learning JP
 
Faster R-CNN
Faster R-CNNFaster R-CNN
Faster R-CNN
anna8885
 
Image Retrieval Overview (from Traditional Local Features to Recent Deep Lear...
Image Retrieval Overview (from Traditional Local Features to Recent Deep Lear...Image Retrieval Overview (from Traditional Local Features to Recent Deep Lear...
Image Retrieval Overview (from Traditional Local Features to Recent Deep Lear...
Yusuke Uchida
 
複数話者WaveNetボコーダに関する調査
複数話者WaveNetボコーダに関する調査複数話者WaveNetボコーダに関する調査
複数話者WaveNetボコーダに関する調査
Tomoki Hayashi
 
MixMatch: A Holistic Approach to Semi- Supervised Learning
MixMatch: A Holistic Approach to Semi- Supervised LearningMixMatch: A Holistic Approach to Semi- Supervised Learning
MixMatch: A Holistic Approach to Semi- Supervised Learning
harmonylab
 
[DL輪読会]Neural Ordinary Differential Equations
[DL輪読会]Neural Ordinary Differential Equations[DL輪読会]Neural Ordinary Differential Equations
[DL輪読会]Neural Ordinary Differential Equations
Deep Learning JP
 

What's hot (20)

Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
 
Tutorial on Object Detection (Faster R-CNN)
Tutorial on Object Detection (Faster R-CNN)Tutorial on Object Detection (Faster R-CNN)
Tutorial on Object Detection (Faster R-CNN)
 
PR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorPR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox Detector
 
Faster R-CNN: Towards real-time object detection with region proposal network...
Faster R-CNN: Towards real-time object detection with region proposal network...Faster R-CNN: Towards real-time object detection with region proposal network...
Faster R-CNN: Towards real-time object detection with region proposal network...
 
SSD: Single Shot MultiBox Detector (UPC Reading Group)
SSD: Single Shot MultiBox Detector (UPC Reading Group)SSD: Single Shot MultiBox Detector (UPC Reading Group)
SSD: Single Shot MultiBox Detector (UPC Reading Group)
 
Mask R-CNN
Mask R-CNNMask R-CNN
Mask R-CNN
 
Mask-RCNN for Instance Segmentation
Mask-RCNN for Instance SegmentationMask-RCNN for Instance Segmentation
Mask-RCNN for Instance Segmentation
 
Single Shot Multibox Detector
Single Shot Multibox DetectorSingle Shot Multibox Detector
Single Shot Multibox Detector
 
論文紹介: Fast R-CNN&Faster R-CNN
論文紹介: Fast R-CNN&Faster R-CNN論文紹介: Fast R-CNN&Faster R-CNN
論文紹介: Fast R-CNN&Faster R-CNN
 
【チュートリアル】コンピュータビジョンによる動画認識 v2
【チュートリアル】コンピュータビジョンによる動画認識 v2【チュートリアル】コンピュータビジョンによる動画認識 v2
【チュートリアル】コンピュータビジョンによる動画認識 v2
 
Wasserstein GAN 수학 이해하기 I
Wasserstein GAN 수학 이해하기 IWasserstein GAN 수학 이해하기 I
Wasserstein GAN 수학 이해하기 I
 
物体検知(Meta Study Group 発表資料)
物体検知(Meta Study Group 発表資料)物体検知(Meta Study Group 発表資料)
物体検知(Meta Study Group 発表資料)
 
Focal loss for dense object detection
Focal loss for dense object detectionFocal loss for dense object detection
Focal loss for dense object detection
 
(DL輪読)Matching Networks for One Shot Learning
(DL輪読)Matching Networks for One Shot Learning(DL輪読)Matching Networks for One Shot Learning
(DL輪読)Matching Networks for One Shot Learning
 
【DL輪読会】HexPlaneとK-Planes
【DL輪読会】HexPlaneとK-Planes【DL輪読会】HexPlaneとK-Planes
【DL輪読会】HexPlaneとK-Planes
 
Faster R-CNN
Faster R-CNNFaster R-CNN
Faster R-CNN
 
Image Retrieval Overview (from Traditional Local Features to Recent Deep Lear...
Image Retrieval Overview (from Traditional Local Features to Recent Deep Lear...Image Retrieval Overview (from Traditional Local Features to Recent Deep Lear...
Image Retrieval Overview (from Traditional Local Features to Recent Deep Lear...
 
複数話者WaveNetボコーダに関する調査
複数話者WaveNetボコーダに関する調査複数話者WaveNetボコーダに関する調査
複数話者WaveNetボコーダに関する調査
 
MixMatch: A Holistic Approach to Semi- Supervised Learning
MixMatch: A Holistic Approach to Semi- Supervised LearningMixMatch: A Holistic Approach to Semi- Supervised Learning
MixMatch: A Holistic Approach to Semi- Supervised Learning
 
[DL輪読会]Neural Ordinary Differential Equations
[DL輪読会]Neural Ordinary Differential Equations[DL輪読会]Neural Ordinary Differential Equations
[DL輪読会]Neural Ordinary Differential Equations
 

Similar to Object Detection Beyond Mask R-CNN and RetinaNet I

Computer vision for transportation
Computer vision for transportationComputer vision for transportation
Computer vision for transportation
Wanjin Yu
 
[CVPR 2018] Utilizing unlabeled or noisy labeled data (classification, detect...
[CVPR 2018] Utilizing unlabeled or noisy labeled data (classification, detect...[CVPR 2018] Utilizing unlabeled or noisy labeled data (classification, detect...
[CVPR 2018] Utilizing unlabeled or noisy labeled data (classification, detect...
NAVER Engineering
 
150807 Fast R-CNN
150807 Fast R-CNN150807 Fast R-CNN
150807 Fast R-CNN
Junho Cho
 
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
Edge AI and Vision Alliance
 
Object Detection and Recognition
Object Detection and Recognition Object Detection and Recognition
Object Detection and Recognition
Intel Nervana
 
Object extraction from satellite imagery using deep learning
Object extraction from satellite imagery using deep learningObject extraction from satellite imagery using deep learning
Object extraction from satellite imagery using deep learning
Aly Abdelkareem
 
Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331
Jihong Kang
 
FPT17: An object detector based on multiscale sliding window search using a f...
FPT17: An object detector based on multiscale sliding window search using a f...FPT17: An object detector based on multiscale sliding window search using a f...
FPT17: An object detector based on multiscale sliding window search using a f...
Hiroki Nakahara
 
Scalable Deep Learning in ExtremeEarth-phiweek19
Scalable Deep Learning in ExtremeEarth-phiweek19Scalable Deep Learning in ExtremeEarth-phiweek19
Scalable Deep Learning in ExtremeEarth-phiweek19
ExtremeEarth
 
Object Detection with Transformers
Object Detection with TransformersObject Detection with Transformers
Object Detection with Transformers
Databricks
 
Mobile Visual Search: Object Re-Identification Against Large Repositories
Mobile Visual Search: Object Re-Identification Against Large RepositoriesMobile Visual Search: Object Re-Identification Against Large Repositories
Mobile Visual Search: Object Re-Identification Against Large Repositories
United States Air Force Academy
 
kanimozhi2019.pdf
kanimozhi2019.pdfkanimozhi2019.pdf
kanimozhi2019.pdf
AshrafDabbas1
 
Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)
DonghyunKang12
 
OBDPC 2022
OBDPC 2022OBDPC 2022
How static analysis supports quality over 50 million lines of C++ code
How static analysis supports quality over 50 million lines of C++ codeHow static analysis supports quality over 50 million lines of C++ code
How static analysis supports quality over 50 million lines of C++ code
cppfrug
 
Information Exploitation at BBN
Information Exploitation at BBNInformation Exploitation at BBN
Information Exploitation at BBN
Plamen Petrov
 
Week5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptxWeek5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptx
fahmi324663
 
Manycores for the Masses
Manycores for the MassesManycores for the Masses
Manycores for the Masses
Intel® Software
 
2019 cvpr paper overview by Ho Seong Lee
2019 cvpr paper overview by Ho Seong Lee2019 cvpr paper overview by Ho Seong Lee
2019 cvpr paper overview by Ho Seong Lee
Moazzem Hossain
 
2019 cvpr paper_overview
2019 cvpr paper_overview2019 cvpr paper_overview
2019 cvpr paper_overview
LEE HOSEONG
 

Similar to Object Detection Beyond Mask R-CNN and RetinaNet I (20)

Computer vision for transportation
Computer vision for transportationComputer vision for transportation
Computer vision for transportation
 
[CVPR 2018] Utilizing unlabeled or noisy labeled data (classification, detect...
[CVPR 2018] Utilizing unlabeled or noisy labeled data (classification, detect...[CVPR 2018] Utilizing unlabeled or noisy labeled data (classification, detect...
[CVPR 2018] Utilizing unlabeled or noisy labeled data (classification, detect...
 
150807 Fast R-CNN
150807 Fast R-CNN150807 Fast R-CNN
150807 Fast R-CNN
 
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
 
Object Detection and Recognition
Object Detection and Recognition Object Detection and Recognition
Object Detection and Recognition
 
Object extraction from satellite imagery using deep learning
Object extraction from satellite imagery using deep learningObject extraction from satellite imagery using deep learning
Object extraction from satellite imagery using deep learning
 
Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331
 
FPT17: An object detector based on multiscale sliding window search using a f...
FPT17: An object detector based on multiscale sliding window search using a f...FPT17: An object detector based on multiscale sliding window search using a f...
FPT17: An object detector based on multiscale sliding window search using a f...
 
Scalable Deep Learning in ExtremeEarth-phiweek19
Scalable Deep Learning in ExtremeEarth-phiweek19Scalable Deep Learning in ExtremeEarth-phiweek19
Scalable Deep Learning in ExtremeEarth-phiweek19
 
Object Detection with Transformers
Object Detection with TransformersObject Detection with Transformers
Object Detection with Transformers
 
Mobile Visual Search: Object Re-Identification Against Large Repositories
Mobile Visual Search: Object Re-Identification Against Large RepositoriesMobile Visual Search: Object Re-Identification Against Large Repositories
Mobile Visual Search: Object Re-Identification Against Large Repositories
 
kanimozhi2019.pdf
kanimozhi2019.pdfkanimozhi2019.pdf
kanimozhi2019.pdf
 
Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)
 
OBDPC 2022
OBDPC 2022OBDPC 2022
OBDPC 2022
 
How static analysis supports quality over 50 million lines of C++ code
How static analysis supports quality over 50 million lines of C++ codeHow static analysis supports quality over 50 million lines of C++ code
How static analysis supports quality over 50 million lines of C++ code
 
Information Exploitation at BBN
Information Exploitation at BBNInformation Exploitation at BBN
Information Exploitation at BBN
 
Week5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptxWeek5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptx
 
Manycores for the Masses
Manycores for the MassesManycores for the Masses
Manycores for the Masses
 
2019 cvpr paper overview by Ho Seong Lee
2019 cvpr paper overview by Ho Seong Lee2019 cvpr paper overview by Ho Seong Lee
2019 cvpr paper overview by Ho Seong Lee
 
2019 cvpr paper_overview
2019 cvpr paper_overview2019 cvpr paper_overview
2019 cvpr paper_overview
 

More from Wanjin Yu

Architecture Design for Deep Neural Networks III
Architecture Design for Deep Neural Networks IIIArchitecture Design for Deep Neural Networks III
Architecture Design for Deep Neural Networks III
Wanjin Yu
 
Intelligent Multimedia Recommendation
Intelligent Multimedia RecommendationIntelligent Multimedia Recommendation
Intelligent Multimedia Recommendation
Wanjin Yu
 
Architecture Design for Deep Neural Networks II
Architecture Design for Deep Neural Networks IIArchitecture Design for Deep Neural Networks II
Architecture Design for Deep Neural Networks II
Wanjin Yu
 
Architecture Design for Deep Neural Networks I
Architecture Design for Deep Neural Networks IArchitecture Design for Deep Neural Networks I
Architecture Design for Deep Neural Networks I
Wanjin Yu
 
Causally regularized machine learning
Causally regularized machine learningCausally regularized machine learning
Causally regularized machine learning
Wanjin Yu
 
Object Detection Beyond Mask R-CNN and RetinaNet III
Object Detection Beyond Mask R-CNN and RetinaNet IIIObject Detection Beyond Mask R-CNN and RetinaNet III
Object Detection Beyond Mask R-CNN and RetinaNet III
Wanjin Yu
 
Object Detection Beyond Mask R-CNN and RetinaNet II
Object Detection Beyond Mask R-CNN and RetinaNet IIObject Detection Beyond Mask R-CNN and RetinaNet II
Object Detection Beyond Mask R-CNN and RetinaNet II
Wanjin Yu
 
Visual Search and Question Answering II
Visual Search and Question Answering IIVisual Search and Question Answering II
Visual Search and Question Answering II
Wanjin Yu
 
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Wanjin Yu
 
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Wanjin Yu
 
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Wanjin Yu
 
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Wanjin Yu
 
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Wanjin Yu
 
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Wanjin Yu
 
Big Data Intelligence: from Correlation Discovery to Causal Reasoning
Big Data Intelligence: from Correlation Discovery to Causal Reasoning Big Data Intelligence: from Correlation Discovery to Causal Reasoning
Big Data Intelligence: from Correlation Discovery to Causal Reasoning
Wanjin Yu
 

More from Wanjin Yu (15)

Architecture Design for Deep Neural Networks III
Architecture Design for Deep Neural Networks IIIArchitecture Design for Deep Neural Networks III
Architecture Design for Deep Neural Networks III
 
Intelligent Multimedia Recommendation
Intelligent Multimedia RecommendationIntelligent Multimedia Recommendation
Intelligent Multimedia Recommendation
 
Architecture Design for Deep Neural Networks II
Architecture Design for Deep Neural Networks IIArchitecture Design for Deep Neural Networks II
Architecture Design for Deep Neural Networks II
 
Architecture Design for Deep Neural Networks I
Architecture Design for Deep Neural Networks IArchitecture Design for Deep Neural Networks I
Architecture Design for Deep Neural Networks I
 
Causally regularized machine learning
Causally regularized machine learningCausally regularized machine learning
Causally regularized machine learning
 
Object Detection Beyond Mask R-CNN and RetinaNet III
Object Detection Beyond Mask R-CNN and RetinaNet IIIObject Detection Beyond Mask R-CNN and RetinaNet III
Object Detection Beyond Mask R-CNN and RetinaNet III
 
Object Detection Beyond Mask R-CNN and RetinaNet II
Object Detection Beyond Mask R-CNN and RetinaNet IIObject Detection Beyond Mask R-CNN and RetinaNet II
Object Detection Beyond Mask R-CNN and RetinaNet II
 
Visual Search and Question Answering II
Visual Search and Question Answering IIVisual Search and Question Answering II
Visual Search and Question Answering II
 
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
 
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
 
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
 
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
 
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
 
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
 
Big Data Intelligence: from Correlation Discovery to Causal Reasoning
Big Data Intelligence: from Correlation Discovery to Causal Reasoning Big Data Intelligence: from Correlation Discovery to Causal Reasoning
Big Data Intelligence: from Correlation Discovery to Causal Reasoning
 

Recently uploaded

一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
ufdana
 
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
eutxy
 
Internet-Security-Safeguarding-Your-Digital-World (1).pptx
Internet-Security-Safeguarding-Your-Digital-World (1).pptxInternet-Security-Safeguarding-Your-Digital-World (1).pptx
Internet-Security-Safeguarding-Your-Digital-World (1).pptx
VivekSinghShekhawat2
 
How to Use Contact Form 7 Like a Pro.pptx
How to Use Contact Form 7 Like a Pro.pptxHow to Use Contact Form 7 Like a Pro.pptx
How to Use Contact Form 7 Like a Pro.pptx
Gal Baras
 
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
3ipehhoa
 
The+Prospects+of+E-Commerce+in+China.pptx
The+Prospects+of+E-Commerce+in+China.pptxThe+Prospects+of+E-Commerce+in+China.pptx
The+Prospects+of+E-Commerce+in+China.pptx
laozhuseo02
 
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
3ipehhoa
 
Latest trends in computer networking.pptx
Latest trends in computer networking.pptxLatest trends in computer networking.pptx
Latest trends in computer networking.pptx
JungkooksNonexistent
 
History+of+E-commerce+Development+in+China-www.cfye-commerce.shop
History+of+E-commerce+Development+in+China-www.cfye-commerce.shopHistory+of+E-commerce+Development+in+China-www.cfye-commerce.shop
History+of+E-commerce+Development+in+China-www.cfye-commerce.shop
laozhuseo02
 
This 7-second Brain Wave Ritual Attracts Money To You.!
This 7-second Brain Wave Ritual Attracts Money To You.!This 7-second Brain Wave Ritual Attracts Money To You.!
This 7-second Brain Wave Ritual Attracts Money To You.!
nirahealhty
 
原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
3ipehhoa
 
test test test test testtest test testtest test testtest test testtest test ...
test test  test test testtest test testtest test testtest test testtest test ...test test  test test testtest test testtest test testtest test testtest test ...
test test test test testtest test testtest test testtest test testtest test ...
Arif0071
 
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
APNIC
 
guildmasters guide to ravnica Dungeons & Dragons 5...
guildmasters guide to ravnica Dungeons & Dragons 5...guildmasters guide to ravnica Dungeons & Dragons 5...
guildmasters guide to ravnica Dungeons & Dragons 5...
Rogerio Filho
 
BASIC C++ lecture NOTE C++ lecture 3.pptx
BASIC C++ lecture NOTE C++ lecture 3.pptxBASIC C++ lecture NOTE C++ lecture 3.pptx
BASIC C++ lecture NOTE C++ lecture 3.pptx
natyesu
 
1.Wireless Communication System_Wireless communication is a broad term that i...
1.Wireless Communication System_Wireless communication is a broad term that i...1.Wireless Communication System_Wireless communication is a broad term that i...
1.Wireless Communication System_Wireless communication is a broad term that i...
JeyaPerumal1
 
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
keoku
 
Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Multi-cluster Kubernetes Networking- Patterns, Projects and GuidelinesMulti-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Sanjeev Rampal
 
Comptia N+ Standard Networking lesson guide
Comptia N+ Standard Networking lesson guideComptia N+ Standard Networking lesson guide
Comptia N+ Standard Networking lesson guide
GTProductions1
 
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdfJAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
Javier Lasa
 

Recently uploaded (20)

一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
 
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
 
Internet-Security-Safeguarding-Your-Digital-World (1).pptx
Internet-Security-Safeguarding-Your-Digital-World (1).pptxInternet-Security-Safeguarding-Your-Digital-World (1).pptx
Internet-Security-Safeguarding-Your-Digital-World (1).pptx
 
How to Use Contact Form 7 Like a Pro.pptx
How to Use Contact Form 7 Like a Pro.pptxHow to Use Contact Form 7 Like a Pro.pptx
How to Use Contact Form 7 Like a Pro.pptx
 
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
 
The+Prospects+of+E-Commerce+in+China.pptx
The+Prospects+of+E-Commerce+in+China.pptxThe+Prospects+of+E-Commerce+in+China.pptx
The+Prospects+of+E-Commerce+in+China.pptx
 
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
 
Latest trends in computer networking.pptx
Latest trends in computer networking.pptxLatest trends in computer networking.pptx
Latest trends in computer networking.pptx
 
History+of+E-commerce+Development+in+China-www.cfye-commerce.shop
History+of+E-commerce+Development+in+China-www.cfye-commerce.shopHistory+of+E-commerce+Development+in+China-www.cfye-commerce.shop
History+of+E-commerce+Development+in+China-www.cfye-commerce.shop
 
This 7-second Brain Wave Ritual Attracts Money To You.!
This 7-second Brain Wave Ritual Attracts Money To You.!This 7-second Brain Wave Ritual Attracts Money To You.!
This 7-second Brain Wave Ritual Attracts Money To You.!
 
原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
 
test test test test testtest test testtest test testtest test testtest test ...
test test  test test testtest test testtest test testtest test testtest test ...test test  test test testtest test testtest test testtest test testtest test ...
test test test test testtest test testtest test testtest test testtest test ...
 
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
 
guildmasters guide to ravnica Dungeons & Dragons 5...
guildmasters guide to ravnica Dungeons & Dragons 5...guildmasters guide to ravnica Dungeons & Dragons 5...
guildmasters guide to ravnica Dungeons & Dragons 5...
 
BASIC C++ lecture NOTE C++ lecture 3.pptx
BASIC C++ lecture NOTE C++ lecture 3.pptxBASIC C++ lecture NOTE C++ lecture 3.pptx
BASIC C++ lecture NOTE C++ lecture 3.pptx
 
1.Wireless Communication System_Wireless communication is a broad term that i...
1.Wireless Communication System_Wireless communication is a broad term that i...1.Wireless Communication System_Wireless communication is a broad term that i...
1.Wireless Communication System_Wireless communication is a broad term that i...
 
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
 
Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Multi-cluster Kubernetes Networking- Patterns, Projects and GuidelinesMulti-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines
 
Comptia N+ Standard Networking lesson guide
Comptia N+ Standard Networking lesson guideComptia N+ Standard Networking lesson guide
Comptia N+ Standard Networking lesson guide
 
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdfJAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
 

Object Detection Beyond Mask R-CNN and RetinaNet I

  • 1. Gang Yu 旷 视 研 究 院 Object Detection in Recent 3 Years Beyond RetinaNet and Mask R-CNN
  • 2. Schedule of Tutorial • Lecture 1: Beyond RetinaNet and Mask R-CNN (Gang Yu) • Lecture 2: AutoML for Object Detection (Xiangyu Zhang) • Lecture 3: Finegrained Visual Analysis (Xiu-shen Wei)
  • 3. Outline • Introduction to Object Detection • Modern Object detectors • One Stage detector vs Two-stage detector • Challenges • Backbone • Head • Pretraining • Scale • Batch Size • Crowd • NAS • Fine-Grained • Conclusion
  • 4. Outline • Introduction to Object Detection • Modern Object detectors • One Stage detector vs Two-stage detector • Challenges • Backbone • Head • Pretraining • Scale • Batch Size • Crowd • NAS • Fine-Grained • Conclusion
  • 5. What is object detection?
  • 6. What is object detection?
  • 7. Detection - Evaluation Criteria Average Precision (AP) and mAP Figures are from wikipedia
  • 8. Detection - Evaluation Criteria mmAP Figures are from http://cocodataset.org
  • 9. How to perform a detection? • Sliding window: enumerate all the windows (up to millions of windows) • VJ detector: cascade chain • Fully Convolutional network • shared computation Robust Real-time Object Detection; Viola, Jones; IJCV 2001 http://www.vision.caltech.edu/html-files/EE148-2005-Spring/pprs/viola04ijcv.pdf
  • 10. General Detection Before Deep Learning • Feature + classifier • Feature • Haar Feature • HOG (Histogram of Gradient) • LBP (Local Binary Pattern) • ACF (Aggregated Channel Feature) • … • Classifier • SVM • Bootsing • Random Forest
  • 13. General Detection Before Deep Learning Traditional Methods • Pros • Efficient to compute (e.g., HAAR, ACF) on CPU • Easy to debug, analyze the bad cases • reasonable performance on limited training data • Cons • Limited performance on large dataset • Hard to be accelerated by GPU
  • 14. Deep Learning for Object Detection Based on the whether following the “proposal and refine” • One Stage • Example: Densebox, YOLO (YOLO v2), SSD, Retina Net • Keyword: Anchor, Divide and conquer, loss sampling • Two Stage • Example: RCNN (Fast RCNN, Faster RCNN), RFCN, FPN, MaskRCNN • Keyword: speed, performance
  • 15. A bit of History Image Feature Extractor classification localization (bbox) One stage detector Densebox (2015) UnitBox (2016) EAST (2017) YOLO (2015) Anchor Free Anchor importedYOLOv2 (2016) SSD (2015) RON(2017) RetinaNet(2017) DSSD (2017) two stages detector Image Feature Extractor classification localization (bbox) Proposal classification localization (bbox) Refine RCNN (2014) Fast RCNN(2015) Faster RCNN (2015) RFCN (2016) MultiBox(2014) RFCN++ (2017) FPN (2017) Mask RCNN (2017) OverFeat(2013)
  • 16. Outline • Introduction to Object Detection • Modern Object detectors • One Stage detector vs Two-stage detector • Challenges • Backbone • Head • Pretraining • Scale • Batch Size • Crowd • NAS • Fine-Grained • Conclusion
  • 17. Modern Object detectors Backbone Head • Modern object detectors • RetinaNet • f1-f7 for backbone, f3-f7 with 4 convs for head • FPN with ROIAlign • f1-f6 for backbone, two fcs for head • Recall vs localization • One stage detector: Recall is high but compromising the localization ability • Two stage detector: Strong localization ability Postprocess NMS
  • 18. One Stage detector: RetinaNet • FPN Structure • Focal loss Focal Loss for Dense Object Detection, Lin etc, ICCV 2017 Best student paper
  • 19. One Stage detector: RetinaNet • FPN Structure • Focal loss Focal Loss for Dense Object Detection, Lin etc, ICCV 2017 Best student paper
  • 20. Two-Stage detector: FPN/Mask R-CNN • FPN Structure • ROIAlign Mask R-CNN, He etc, ICCV 2017 Best paper
  • 21. What is next for object detection? • The pipeline seems to be mature • There still exists a large gap between existing state-of-arts and product requirements • The devil is in the detail
  • 22. Outline • Introduction to Object Detection • Modern Object detectors • One Stage detector vs Two-stage detector • Challenges • Backbone • Head • Pretraining • Scale • Batch Size • Crowd • NAS • Fine-Grained • Conclusion
  • 23. Challenges Overview • Backbone • Head • Pretraining • Scale • Batch Size • Crowd • NAS • Fine-grained Backbone Head Postprocess NMS
  • 24. Challenges - Backbone • Backbone network is designed for classification task but not for localization task • Receptive Field vs Spatial resolution • Only f1-f5 is pretrained but randomly initializing f6 and f7 (if applicable)
  • 25. Backbone - DetNet • DetNet: A Backbone network for Object Detection, Li etc, 2018, https://arxiv.org/pdf/1804.06215.pdf
  • 31. Challenges - Head • Speed is significantly improved for the two-stage detector • RCNN - > Fast RCNN -> Faster RCNN - > RFCN • How to obtain efficient speed as one stage detector like YOLO, SSD? • Small Backbone • Light Head
  • 32. Head – Light head RCNN • Light-Head R-CNN: In Defense of Two-Stage Object Detector, 2017, https://arxiv.org/pdf/1711.07264.pdf Code: https://github.com/zengarden/light_head_rcnn
  • 33. Head – Light head RCNN • Backbone • L: Resnet101 • S: Xception145 • Thin Feature map • L:C_{mid} = 256 • S: C_{mid} =64 • C_{out} = 10 * 7 * 7 • R-CNN subnet • A fc layer is connected to the PS ROI pool/Align
  • 34. Head – Light head RCNN
  • 35. Head – Light head RCNN
  • 36. Head – Light head RCNN • Mobile Version • ThunderNet: Towards Real-time Generic Object Detection, Qin etc, Arxiv 2019 • https://arxiv.org/abs/1903.11752
  • 37. Pretraining – Objects365 • ImageNet pretraining is usually employed for backbone training • Training from Scratch • Scratch Det claims GN/BN is important • Rethinking ImageNet Pretraining validates that training time is important
  • 38. Pretraining – Objects365 • Objects365 Dataset
  • 39. Pretraining – Objects365 • Pretraining with Objects365 vs ImageNet vs from Sctratch
  • 40. Pretraining – Objects365 • Pretraining on Backbone or Pretraining on both backbone and head
  • 41. Pretraining – Objects365 • Results on VOC Detection & VOC Segmentation
  • 42. Pretraining – Objects365 • Summary • Pretraining is important to reduce the training time • Pretraining with a large dataset is beneficial for the performance
  • 43. Challenges - Scale • Scale variations is extremely large for object detection
  • 44. Challenges - Scale • Scale variations is extremely large for object detection • Previous works • Divide and Conquer: SSD, DSSD, RON, FPN, … • Limited Scale variation • Scale Normalization for Image Pyramids, Singh etc, CVPR2018 • Slow inference speed • How to address extremely large scale variation without compromising inference speed?
  • 45. Scale - SFace • SFace: An Efficient Network for Face Detection in Large Scale Variations, 2018, http://cn.arxiv.org/pdf/1804.06559.pdf • Anchor-based: • Good localization for the scales which are covered by anchors • Difficult to address all the scale ranges of faces • Anchor-free: • Able to cover various face scales • Not good for the localization ability
  • 49. Scale - SFace • Summary: • Integrate anchor-based and anchor-free for the scale issue • A new benchmark for face detection with large scale variations: 4K Face
  • 50. Challenges - Batchsize • Small mini-batchsize for general object detection • 2 for R-CNN, Faster RCNN • 16 for RetinaNet, Mask RCNN • Problem with small mini-batchsize • Long training time • Insufficient BN statistics • Inbalanced pos/neg ratio
  • 51. Batchsize – MegDet • MegDet: A Large Mini-Batch Object Detector, CVPR2018, https://arxiv.org/pdf/1711.07240.pdf
  • 52. Batchsize – MegDet • Techniques • Learning rate warmup • Cross-GPU Batch Normalization
  • 53. Challenges - Crowd • NMS is a post-processing step to eliminate multiple responses on one object instance • Reasonable for mild crowdness like COCO and VOC • Will Fail in the case when the objects are in a crowd
  • 54. Challenges - Crowd • A few works have been devoted to this topic • Softnms, Bodla etc, ICCV 2017, http://www.cs.umd.edu/~bharat/snms.pdf • Relation Networks, Hu etc, CVPR 2018, https://arxiv.org/pdf/1711.11575.pdf • Lacking a good benchmark for evaluation in the literature
  • 55. Crowd - CrowdHuman • CrowdHuman: A Benchmark for Detecting Human in a Crowd, 2018, https://arxiv.org/pdf/1805.00123.pdf, http://www.crowdhuman.org/ • A benchmark with Head, Visible Human, Full body bounding-box • Generalization ability for other head/pedestrian datasets • Crowdness
  • 59. Conclusion • The task of object detection is still far from solved • Details are important to further improve the performance • Backbone • Head • Pretraining • Scale • Batchsize • Crowd • The improvement of object detection will be a significantly boost for the computer vision industry
  • 60. 广告部分 • Megvii Detection 知乎专栏 Email: yugang@megvii.com