A Survey of
Deep Learning-Based
Object Detection
Jiao, Licheng and Zhang, Fan and Liu, Fang and Yang,
Shuyuan and Li, Lingling and Feng, Zhixi and Qu, Rong
IEEE Access, 2019
2022/06/17
◼
◼
• two-stage
• one-stage
• 2019
VisDrone2018
[Shindai+, ICRA 2019]
[Chen+, CVPR2018]
two-stage one-stage
◼two-stage
• Faster R-CNN [Ren+, NeurIPS2015]
◼one-stage
◼one-stage
• YOLO [Redmon+, CVPR2016]
• SSD [Liu+, ECCV2016]
◼two-stage
BBox
two-stage one-stage
end-to-end
two-stage
R-CNN Fast R-CNN
◼R-CNN [Girshick+, CVPR2014]
• CNN
• SVM
•
• CNN
•
◼Fast R-CNN [Girshick, ICCV2015]
•
• RoI (region of interest) pooling
• region proposal
R-CNN
◼Faster R-CNN [Ren+, NeurIPS2015]
• RPN (region proposal network), multi-scale anchors
Fast R-CNN
•
◼Mask R-CNN [He+, ICCV2017]
• ResNet [He+, CVPR2016] -FPN
[Lin+, CVPR2017]
• RoI pooling replaced by RoIAlign
• 1
◼Cascade R-CNN
[Cai and Vasconcelos, CVPR2018]
• IoU
RoIAlign
one-stage
SSD: Single Shot MultiBox Detector
◼ DBox
• BBox NMS
Localization, confidence
[Figure: SSD architecture. A 300×300 input passes through conv layers producing feature maps of size 38×38, 19×19, 10×10, 5×5, 3×3, and 1×1, followed by non-maximum suppression]
[Liu+, ECCV2016]
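The feature-map sizes in the SSD figure determine how many default boxes (DBox) the detector evaluates per image. The sketch below counts them. The feature-map sizes come from the slide; the number of boxes per location (`boxes_per_location`) is an assumption taken from the 300×300 configuration described in the SSD paper, and the function name is hypothetical.

```python
# Feature-map sizes as listed in the SSD figure (300x300 input).
feature_map_sizes = [38, 19, 10, 5, 3, 1]
# Assumption: default boxes per feature-map location, as in the SSD300 paper setup.
boxes_per_location = [4, 6, 6, 6, 4, 4]


def count_default_boxes(sizes, per_location):
    """Total default boxes = sum over maps of (size * size * boxes per location)."""
    return sum(s * s * k for s, k in zip(sizes, per_location))


total_default_boxes = count_default_boxes(feature_map_sizes, boxes_per_location)
print(total_default_boxes)  # 8732
```

With thousands of candidate boxes per image, most of them overlapping, a post-processing step such as NMS is needed to reduce them to a few final BBoxes.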
NMS (Non-maximum Suppression)
◼BBox
• confidence score BBox
• BBox IoU
confidence score BBox
[Figure: non-maximum suppression applied to BBoxes]
[Liu+, ECCV2016]
$IoU = \dfrac{\text{Area of Overlap}}{\text{Area of Union}}$
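The IoU definition and the NMS procedure on this slide can be sketched as follows. This is a minimal greedy NMS, not the exact SSD implementation. It assumes boxes in `[x1, y1, x2, y2]` format with positive area, and the 0.5 threshold and function names are illustrative choices.

```python
import numpy as np


def iou(box, boxes):
    """IoU between one box and an (N, 4) array of boxes in [x1, y1, x2, y2] format."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    # Area of overlap is zero when the boxes do not intersect.
    overlap = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    union = area + areas - overlap
    return overlap / union


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it too much."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending confidence
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        ious = iou(boxes[best], boxes[rest])
        order = rest[ious <= iou_threshold]  # survivors go to the next round
    return keep


boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]
```

In the example, the second box overlaps the first with IoU 81/119 (about 0.68), so it is suppressed; the third box does not overlap and is kept.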
one-stage
◼Feature Pyramid Networks
• RetinaNet [Lin+, ICCV2017]
• Focal Loss
• M2Det [Zhao+, AAAI2019]
• Multi-Level FPN
◼RefineDet [Zhang+, CVPR2018]
• one-stage two-stage
RetinaNet RefineDet
2019
Relation Networks [Hu+, CVPR2018]
◼SSD NMS BBox
•
◼object relation module
•
•
• end to end BBox object relation module
DCNv2 [Zhu+, CVPR2019]
◼DCN [Dai+, ICCV2017]
• receptive field
◼Modulated deformable convolution
• Modulation deformable RoI pooling
[Figure: 3×3 standard convolution vs. deformable convolution]
NAS-FPN [Ghiasi+, CVPR2019]
◼NAS (Neural Architecture Search) for FPN
• RNN Controller
(b)-(f)
NAS-FPN / Proxy task AP
1
DCN
DCNv2
one-stage
2
one-stage
◼PASCAL VOC
◼COCO
• COCO mAP
◼ImageNet
◼VisDrone 2018
◼Open Images
◼Pedestrian detection datasets
• Caltech
• KITTI
• CityPersons
• TDC
• EuroCity Persons
AP mAP COCO mAP
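◼Precision and Recall (IoU threshold 0.5)
• $\text{Precision} = \dfrac{\#\,\text{predicted BBoxes with IoU} \ge 0.5}{\#\,\text{all predicted BBoxes}}$
• $\text{Recall} = \dfrac{\#\,\text{predicted BBoxes with IoU} \ge 0.5}{\#\,\text{all Gt BBoxes}}$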
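A minimal sketch of the Precision and Recall definitions above, for a single image and a single class. The greedy one-to-one matching of predictions to Gt BBoxes is an assumption (benchmarks differ in their matching details), and the function names are hypothetical.

```python
def box_iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(pred_boxes, gt_boxes, iou_threshold=0.5):
    """pred_boxes are assumed to be sorted by descending confidence score."""
    matched = set()  # indices of Gt BBoxes already claimed by a prediction
    true_positives = 0
    for pred in pred_boxes:
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(gt_boxes):
            if j in matched:
                continue
            value = box_iou(pred, gt)
            if value > best_iou:
                best_iou, best_j = value, j
        if best_j is not None and best_iou >= iou_threshold:
            matched.add(best_j)
            true_positives += 1
    precision = true_positives / len(pred_boxes) if pred_boxes else 0.0
    recall = true_positives / len(gt_boxes) if gt_boxes else 0.0
    return precision, recall


preds = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
print(precision_recall(preds, gts))  # precision 1/3, recall 1/2
```

Only the first prediction counts as correct here: the second overlaps a Gt BBox that is already matched, and the third overlaps nothing.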
◼AP (Average Precision)
• $AP = \int_0^1 p(r)\,dr$
• Recall vs Precision AP
•
◼mAP
• AP
• COCO IoU = [0.5, 0.55, … , 0.95] mAP
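The AP integral and the COCO IoU thresholds can be sketched as follows. This is an illustrative approximation, not the official evaluation code: it assumes each detection has already been labeled true or false positive at a given IoU threshold, and it uses the common monotone precision envelope when integrating. The function name and the toy inputs are hypothetical.

```python
import numpy as np


def average_precision(scores, is_tp, num_gt):
    """Area under the precision-recall curve for one class at one IoU threshold."""
    order = np.argsort(-np.asarray(scores, dtype=float))  # descending confidence
    tp = np.asarray(is_tp, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / num_gt
    precision = cum_tp / (cum_tp + cum_fp)
    # Pad the curve so it spans recall 0 to 1.
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    # Assumption: replace precision by its monotone non-increasing envelope.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas wherever recall changes.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))


# IoU thresholds used for COCO mAP: 0.5, 0.55, ..., 0.95.
coco_iou_thresholds = np.linspace(0.5, 0.95, 10)

ap = average_precision(scores=[0.9, 0.8, 0.7, 0.6], is_tp=[1, 0, 1, 1], num_gt=4)
print(ap)  # 0.625
```

mAP is then the mean of AP over classes; COCO mAP additionally averages over the ten IoU thresholds, recomputing the true/false positive labels at each one.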
◼FPN
• Mask R-CNN, NAS-FPN, FCOS [Tian+, ICCV2019]
◼SSD
• WeaveNet [Chen+, arXiv2017] ESSD [Zheng+, arXiv2018]
◼
• RefineDet, R-DAD [Bae, AAAI2019]
◼
• Attention mechanism [Zhang & Kim, CVPR2019]
• SSD [Kong+, ECCV2018]
◼
• DCN DCNv2 15
loss
◼IoU loss
• Unit Box [Yu+, ACM MM 2016]
◼ BBox regression loss
• BBox
[He+, CVPR2019]
• Softer-NMS [He+, arXiv2019]
◼
• Axially Localized Detection
[Cabriel+, Nature Communications 2019]
◼one-stage
• Hard negative mining
[Bucher+, arXiv2016]
◼ Hard mining
• IoU-balanced sampling
[Pang+, CVPR2019]
◼loss
• RetinaNet
• AP-loss
[Chen+, CVPR2019]
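The RetinaNet entry above refers to Focal Loss [Lin+, ICCV2017], which down-weights easy examples so that the many easy background boxes of a one-stage detector do not dominate training. A minimal NumPy sketch for binary labels follows. The defaults `alpha=0.25` and `gamma=2.0` follow the values reported in the RetinaNet paper; the function name and the `eps` clipping are assumptions for illustration.

```python
import numpy as np


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Per-example focal loss: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    p: predicted probability of the positive class, y: labels in {0, 1}.
    """
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    y = np.asarray(y)
    p_t = np.where(y == 1, p, 1.0 - p)          # probability assigned to the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)


easy = focal_loss([0.9], [1])[0]   # well-classified positive
hard = focal_loss([0.1], [1])[0]   # badly classified positive
print(easy < hard)  # True
```

With `gamma=0` the modulating factor disappears and the expression reduces to alpha-weighted cross entropy.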
NMS
◼NMS
• Relation Networks 14
◼ BBox Gt BBox IoU
• IoU-Net learning [Jiang+, ECCV2018]
◼IoU Confidence score
• Fitness NMS [Tychsen-Smith & Petersson, CVPR2018]
◼NMS
• Softer-NMS [He+, arXiv2019]
1
◼
•
◼SSD
• [Jeong+, arXiv2017]
• Context-Aware SSD
[Xiang+, arXiv2018]
◼GAN [Goodfellow, NeurIPS2014]
• Perceptual GAN [Li+, CVPR2017]
◼
◼
• Face Attention Network
[Wang+, arXiv2017]
◼
• Repulsion loss
[Wang+, IEEE Access 2018]
• Occlusion-aware R-CNN
[Zhang+, ECCV2018]
2
◼
•
•
• anchor BBox
Faster R-CNN
SSD
anchor-free
◼anchor
• anchor
• anchor
•
◼anchor-free
• CornerNet [Law and Deng, ECCV2018]
• FCOS [Tian+, ICCV2019]
•
• CenterNet [Duan+, ICCV2019]
◼
• YOLO YOLO9000 [Redmon & Farhadi, CVPR2017]
• WeaveNet [Chen+, arXiv2017] ESSD [Zheng+, arXiv2018]
• Pelee [Wang+, NeurIPS2018]
◼
• RetinaNet 12
• RFBNet [Liu+, ECCV2018]
• pRF
RFBNet RFB module
◼
• ScratchDet [Zhu+, CVPR2019]
•
◼
• DetNet [Li+, ECCV2018]
•
• Light-Head R-CNN [Li+, arXiv2017]
• two-stage
◼
[Hu+, CVPR2017]
◼
[Braun, arXiv2018]
1
2
3
4
◼
•
ISPRS dataset [Audebert+, MDPI 2017]
True positive, false positive, ground truth
◼
[Li+, arXiv2017]
◼
[Li+, arXiv2019]
1
2
3 CAM[Zhou+, arXiv2015]
4 ablation study
◼
1 3 RetinaNet
2 3
SKU-110K
[Goldman+, CVPR2019]
RetinaNet
◼
•
◼
•
• NMS
• confidence
◼

Literature review: A Survey of Deep Learning-Based Object Detection