Paper introduction: A Survey of Deep Learning-Based Object Detection

A Survey of Deep Learning-Based Object Detection
Jiao, Licheng and Zhang, Fan and Liu, Fang and Yang, Shuyuan and Li, Lingling and Feng, Zhixi and Qu, Rong
IEEE Access, 2019
2022/06/17

◼
◼
• two-stage
• one-stage
• 2019
◼
◼
◼
◼

◼
•
◼
•
•
•
•
•
•
(Figure sources: VisDrone2018; [Shindai+, ICRA 2019]; [Chen+, CVPR2018])

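Two-stage vs. one-stage detectors
◼two-stage: first generate region proposals, then classify each proposal and refine its BBox
• Faster R-CNN [Ren+, NeurIPS2015]
◼one-stage: predict class scores and BBoxes directly in a single network pass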
• YOLO [Redmon+, CVPR2016]
• SSD [Liu+, ECCV2016]
◼two-stage
BBox
(Figure: two-stage vs. one-stage pipelines, end-to-end)

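R-CNN and Fast R-CNN
◼R-CNN [Girshick+, CVPR2014]
• extracts a feature from each region proposal with a CNN
• classifies the features with SVMs
• the CNN is run separately on every proposal, which is slow
◼Fast R-CNN [Girshick, ICCV2015]
• RoI (region of interest) pooling extracts a fixed-size feature for each proposal from one shared feature map
• region proposals are still computed by a separate method (e.g. selective search)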
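RoI pooling crops each proposal out of the shared feature map and max-pools it into a fixed grid, so proposals of any size yield features of the same size. A minimal single-channel NumPy sketch; the function name, the 2×2 output grid, and integer (already quantized) RoI coordinates are illustrative assumptions.

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool one RoI of a 2-D (H, W) feature map into a fixed grid.

    roi: (x1, y1, x2, y2) in feature-map cells, inclusive, integer-valued
    (the quantization step that RoIAlign later removes).
    """
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2 + 1, x1:x2 + 1]
    h, w = region.shape
    out_h, out_w = output_size
    out = np.empty(output_size, dtype=feature_map.dtype)
    for i in range(out_h):
        # Bin rows: floor for the start, ceil for the end, so no bin is empty.
        ys, ye = (i * h) // out_h, -((-(i + 1) * h) // out_h)
        for j in range(out_w):
            xs, xe = (j * w) // out_w, -((-(j + 1) * w) // out_w)
            out[i, j] = region[ys:ye, xs:xe].max()
    return out
```

For a 4×4 map `np.arange(16).reshape(4, 4)` and the full-map RoI `(0, 0, 3, 3)`, the 2×2 output is `[[5, 7], [13, 15]]`.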

R-CNN
◼Faster R-CNN [Ren+, NeurIPS2015]
• an RPN (region proposal network) generates region proposals from multi-scale anchors and passes them to Fast R-CNN
◼Mask R-CNN [He+, ICCV2017]
• ResNet [He+, CVPR2016]-FPN [Lin+, CVPR2017] backbone
• replaces RoI pooling with RoIAlign
• 1
◼Cascade R-CNN [Cai and Vasconcelos, CVPR2018]
• a cascade of detection heads trained with increasing IoU thresholds
(Figure: RoIAlign)
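The multi-scale anchors used by the RPN can be sketched as a small set of reference boxes tiled over every feature-map cell. The defaults below (base size 16, scales 8/16/32, aspect ratios 0.5/1/2, stride 16) are assumptions chosen to reproduce the 3 scales × 3 ratios with box areas 128², 256², 512² described in the Faster R-CNN paper; the half-cell centre offset is also an assumption.

```python
import numpy as np

def make_anchors(base_size=16, ratios=(0.5, 1.0, 2.0), scales=(8, 16, 32)):
    """Base anchors centred at the origin as (x1, y1, x2, y2).

    Each anchor has area (base_size * scale)**2 and height/width == ratio.
    """
    anchors = []
    for r in ratios:
        for s in scales:
            area = (base_size * s) ** 2
            w = np.sqrt(area / r)
            h = w * r
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)

def shift_anchors(anchors, feat_h, feat_w, stride=16):
    """Tile the base anchors over every cell of a feat_h x feat_w map."""
    cx = (np.arange(feat_w) + 0.5) * stride
    cy = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)
    shifts = np.stack([cx, cy, cx, cy], axis=-1).reshape(-1, 1, 4)
    return (shifts + anchors[None]).reshape(-1, 4)
```

A 2×3 feature map with 9 base anchors per cell gives 54 anchors; the RPN then scores and regresses each of them.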

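SSD (Single Shot MultiBox Detector)
◼ predicts offsets and class scores for default boxes (DBox) on several feature maps of different resolutions
• overlapping BBoxes are then merged by NMS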
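(Figure [Liu+, ECCV2016]: a 300×300 input passes through a stack of conv layers that produce 38×38, 19×19, 10×10, 5×5, 3×3 and 1×1 feature maps; each map outputs localization and confidence predictions, which are passed to non-maximum suppression)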
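The feature-map sizes in the figure determine how many DBoxes SSD evaluates per image. In this sketch, the boxes-per-cell counts (4, 6, 6, 6, 4, 4) and the scale range s_min = 0.2, s_max = 0.9 are assumptions taken from the SSD300 setting in the SSD paper, not from the slide; the scale formula is the paper's linear spacing s_k = s_min + (s_max - s_min)(k - 1)/(m - 1).

```python
import math

def ssd_default_box_counts(feature_sizes=(38, 19, 10, 5, 3, 1),
                           boxes_per_cell=(4, 6, 6, 6, 4, 4)):
    """Number of DBoxes contributed by each square feature map."""
    return [f * f * b for f, b in zip(feature_sizes, boxes_per_cell)]

def ssd_scales(m=6, s_min=0.2, s_max=0.9):
    """Linearly spaced DBox scales (relative to the input side), one per map."""
    return [s_min + (s_max - s_min) * k / (m - 1) for k in range(m)]

def dbox_wh(scale, aspect_ratio):
    """Relative width and height of a DBox with the given aspect ratio."""
    return scale * math.sqrt(aspect_ratio), scale / math.sqrt(aspect_ratio)
```

Under these assumptions the per-map counts are 5776, 2166, 600, 150, 36 and 4, i.e. 8732 DBoxes in total, which is why a cheap NMS step after prediction matters.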

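NMS (Non-maximum Suppression)
◼Removes duplicate BBoxes detected for the same object
• keep the BBox with the highest confidence score
• discard the remaining BBoxes whose IoU with it exceeds a threshold, then repeat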
(Figure: BBoxes with confidence scores before and after non-maximum suppression [Liu+, ECCV2016])
$$\mathrm{IoU} = \frac{\text{Area of Overlap}}{\text{Area of Union}}$$
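The IoU formula and the greedy NMS procedure can be sketched in plain Python. This assumes boxes given as (x1, y1, x2, y2) corners and a single class (detectors usually run NMS per class); the 0.5 threshold is an illustrative default.

```python
def iou(a, b):
    """IoU of two boxes (x1, y1, x2, y2): overlap area / union area."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: returns indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)          # highest-scoring remaining box
        keep.append(best)
        # drop every remaining box that overlaps the kept one too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```

For example, `nms([(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)], [0.9, 0.8, 0.7])` keeps boxes 0 and 2: the second box overlaps the first with IoU 81/119 ≈ 0.68 and is suppressed.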

Advances in one-stage detectors
◼Feature Pyramid Networks
• RetinaNet [Lin+, ICCV2017]
• Focal Loss
• M2Det [Zhao+, AAAI2019]
• Multi-Level FPN
◼RefineDet [Zhang+, CVPR2018]
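• combines one-stage and two-stage ideas (anchors are refined first, then used for detection)
(Figure: RetinaNet and RefineDet architectures)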
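Focal Loss, used by RetinaNet, is cross-entropy scaled by (1 − p_t)^γ so that the many easy background anchors contribute little. A minimal NumPy sketch for binary (per-class sigmoid) outputs; α = 0.25 and γ = 2 are the values reported in the RetinaNet paper, and summing without normalization is a simplification (RetinaNet normalizes by the number of positive anchors).

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    p: predicted foreground probabilities; y: labels in {0, 1}.
    Returns the loss summed over all entries.
    """
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    y = np.asarray(y)
    p_t = np.where(y == 1, p, 1.0 - p)            # prob. of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return float(np.sum(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))
```

With γ = 0 the expression reduces to α-weighted cross-entropy; with γ = 2 a well-classified anchor (p_t = 0.9) is down-weighted by a factor of 100.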

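Relation Networks [Hu+, CVPR2018]
◼Detectors such as SSD remove duplicate BBoxes with hand-designed NMS
◼object relation module
• models relations between objects with attention over their appearance and geometry
• duplicate BBox removal is learned end to end with the object relation module instead of NMS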
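The object relation module can be sketched as attention in which each object's weight on another combines an appearance term and a geometric term. This single-head NumPy sketch assumes the geometric weights are given as a precomputed non-negative matrix (in the paper they are derived from box geometry), and uses random matrices as stand-ins for the learned projections W_K, W_Q, W_V; the name and the output size d_k are hypothetical simplifications.

```python
import numpy as np

def relation_module(f_app, w_geo, d_k=64, seed=0):
    """Single-head object relation module (simplified sketch).

    f_app: (N, d) appearance features of N objects.
    w_geo: (N, N) non-negative geometric weights w_G[m, n] (assumed given).
    Returns (N, d_k) relation features f_R, one row per object n.
    """
    rng = np.random.default_rng(seed)
    n, d = f_app.shape
    # Random stand-ins for the learned projections (assumption).
    W_K, W_Q, W_V = (rng.standard_normal((d, d_k)) / np.sqrt(d) for _ in range(3))
    k, q, v = f_app @ W_K, f_app @ W_Q, f_app @ W_V
    w_a = k @ q.T / np.sqrt(d_k)         # appearance weight w_A[m, n]
    # w[m, n] = w_G * exp(w_A) normalized over m, computed in the log domain
    logits = np.log(np.maximum(w_geo, 1e-6)) + w_a
    logits -= logits.max(axis=0, keepdims=True)
    w = np.exp(logits)
    w /= w.sum(axis=0, keepdims=True)
    return w.T @ v                       # f_R[n] = sum_m w[m, n] * v[m]
```

Because the weights are learned, the same mechanism can score whether a BBox duplicates a higher-ranked one, which is how the module replaces NMS.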

DCNv2 [Zhu+, CVPR2019]
◼DCN [Dai+, ICCV2017]
• learns sampling offsets so that the receptive field adapts to the object's shape
◼Modulated deformable convolution (a learned modulation weight for each sampling point)
• Modulated deformable RoI pooling
(Figure: sampling locations of a 3×3 standard convolution vs. a 3×3 deformable convolution)
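One output value of a 3×3 modulated deformable convolution can be written as y(p₀) = Σₖ wₖ · x(p₀ + pₖ + Δpₖ) · Δmₖ, where the fractional positions are read with bilinear interpolation. A minimal single-channel NumPy sketch; the function names are hypothetical, and the offsets Δpₖ and modulation scalars Δmₖ are passed in directly rather than predicted by a conv layer as in DCNv2.

```python
import numpy as np

def bilinear(img, y, x):
    """Bilinearly sample a 2-D array at real-valued (y, x); zero outside."""
    h, w = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    val = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                val += img[yy, xx] * max(0.0, 1 - abs(y - yy)) * max(0.0, 1 - abs(x - xx))
    return val

def modulated_deform_conv_at(img, weight, p0, offsets, masks):
    """One output of a 3x3 modulated deformable conv.

    img: (H, W); weight: (3, 3); p0: (y, x) output location;
    offsets: (9, 2) per-tap (dy, dx); masks: (9,) scalars in [0, 1].
    """
    grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]   # regular taps p_k
    out = 0.0
    for k, (dy, dx) in enumerate(grid):
        y = p0[0] + dy + offsets[k, 0]       # p0 + p_k + delta p_k
        x = p0[1] + dx + offsets[k, 1]
        out += weight[dy + 1, dx + 1] * bilinear(img, y, x) * masks[k]
    return out
```

With all offsets zero and all masks one this reduces to an ordinary 3×3 convolution; DCN (v1) corresponds to fixing every mask at one.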

NAS-FPN [Ghiasi+, CVPR2019]
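◼Searches the FPN architecture with NAS (Neural Architecture Search)
• an RNN controller, trained with reinforcement learning, proposes candidate architectures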
(Figure: (b)-(f) NAS-FPN architectures found during the search, with their AP on the proxy task)

◼PASCAL VOC
◼COCO
• COCO mAP
◼ImageNet
◼VisDrone 2018
◼Open Images
◼Pedestrian detection datasets
• Caltech
• KITTI
• CityPersons
• TDC
• EuroCity Persons

Evaluation metrics: AP, mAP, and COCO mAP
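◼Precision and Recall (a predicted BBox counts as correct if IoU ≥ 0.5)
• Precision:
$$\text{Precision} = \frac{\#\,\text{predicted BBoxes with IoU} \ge 0.5}{\#\,\text{all predicted BBoxes}}$$
• Recall:
$$\text{Recall} = \frac{\#\,\text{predicted BBoxes with IoU} \ge 0.5}{\#\,\text{all Gt BBoxes}}$$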
◼AP (Average Precision)
$$\mathrm{AP} = \int_0^1 p(r)\,dr$$
• AP is the area under the precision-recall curve, where p(r) is the precision at recall r
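◼mAP
• the mean of AP over all classes
• COCO: mAP additionally averaged over IoU = [0.5, 0.55, … , 0.95]
(Figure: Precision and Recall illustrated as ratios of BBox counts)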
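AP and COCO-style mAP can be sketched as follows. This assumes detections of one class have already been matched to Gt BBoxes (each Gt matched at most once at the chosen IoU threshold), and computes the area under the precision envelope with all-point interpolation; the official COCO evaluator instead samples 101 recall points, so values can differ slightly.

```python
import numpy as np

def average_precision(scores, is_tp, num_gt):
    """AP for one class = area under the (enveloped) precision-recall curve.

    scores: confidence per detection; is_tp: 1 if matched to a Gt BBox;
    num_gt: number of Gt BBoxes of this class.
    """
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    tp = np.asarray(is_tp, dtype=float)[order]
    tp_cum = np.cumsum(tp)
    precision = tp_cum / np.arange(1, len(tp) + 1)
    recall = tp_cum / num_gt
    mrec = np.concatenate([[0.0], recall, [1.0]])
    mpre = np.concatenate([[0.0], precision, [0.0]])
    # Precision envelope: make precision non-increasing in recall.
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]
    # Sum rectangle areas at the points where recall increases.
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

def coco_style_map(ap_table):
    """ap_table[t][c]: AP of class c at IoU threshold t in 0.5, 0.55, ..., 0.95.

    COCO mAP averages over both classes and thresholds.
    """
    return float(np.mean(ap_table))
```

For three detections ranked TP, FP, TP against two Gt BBoxes, precision is 1 up to recall 0.5 and 2/3 up to recall 1, giving AP = 0.5 + 0.5 × 2/3 = 5/6.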

◼FPN
• Mask R-CNN, NAS-FPN, FCOS [Tian+, ICCV2019]
◼SSD
• WeaveNet [Chen+, arXiv2017] ESSD [Zheng+, arXiv2018]
◼
• RefineDet, R-DAD [Bae, AAAI2019]
◼
• Attention mechanism [Zhang & Kim, CVPR2019]
• SSD [Kong+, ECCV2018]
◼
• DCN DCNv2 15

Improving the loss
◼IoU loss
• UnitBox [Yu+, ACM MM 2016]
◼ BBox regression loss
• BBox regression with uncertainty [He+, CVPR2019]
• Softer-NMS [He+, arXiv2019]
◼
• Axially Localized Detection [Cabriel+, Nature Communications 2019]
◼one-stage
• Hard negative mining [Bucher+, arXiv2016]
◼ Hard mining
• IoU-balanced sampling [Pang+, CVPR2019]
◼loss
• RetinaNet
• AP-loss [Chen+, CVPR2019]
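The IoU loss of UnitBox treats the four box edges jointly instead of regressing each coordinate independently: a pixel predicts its distances (top, bottom, left, right) to the box edges, and the loss is −ln(IoU). A minimal NumPy sketch; averaging over the N pixels and the small `eps` are assumptions added for numerical convenience.

```python
import numpy as np

def unitbox_iou_loss(pred, target, eps=1e-7):
    """IoU loss L = -ln(IoU), averaged over N pixels.

    pred, target: (N, 4) non-negative distances (top, bottom, left, right)
    from a pixel inside the box to the four box edges.
    """
    pt, pb, pl, pr = pred.T
    tt, tb, tl, tr = target.T
    area_p = (pt + pb) * (pl + pr)
    area_t = (tt + tb) * (tl + tr)
    # Both boxes contain the pixel, so the overlap is built from the minima.
    ih = np.minimum(pt, tt) + np.minimum(pb, tb)
    iw = np.minimum(pl, tl) + np.minimum(pr, tr)
    inter = ih * iw
    union = area_p + area_t - inter
    return float(np.mean(-np.log((inter + eps) / (union + eps))))
```

A prediction twice as wide and twice as tall as the target (IoU = 1/4) costs ln 4 ≈ 1.39, and a perfect prediction costs 0.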

Improving NMS
◼NMS
• Relation Networks 14
◼Predict the IoU between each predicted BBox and its Gt BBox
• IoU-Net [Jiang+, ECCV2018]
◼Use the IoU as the confidence score
• Fitness NMS [Tychsen-Smith & Petersson, CVPR2018]
◼NMS
• Softer-NMS [He+, arXiv2019]
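The idea of using a predicted IoU as the ranking score can be sketched as an IoU-guided variant of greedy NMS, as we understand it from IoU-Net: boxes are ranked by predicted localization confidence rather than classification score, and each kept box takes the highest classification score in the cluster it suppresses. The predicted IoUs are assumed to come from a separate head; the function name and the 0.5 threshold are illustrative.

```python
def iou(a, b):
    """IoU of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def iou_guided_nms(boxes, cls_scores, pred_ious, iou_threshold=0.5):
    """Greedy NMS ranked by predicted IoU; returns (index, score) pairs."""
    order = sorted(range(len(boxes)), key=lambda i: pred_ious[i], reverse=True)
    kept = []
    while order:
        best = order.pop(0)   # best-localized remaining box
        cluster = [i for i in order if iou(boxes[best], boxes[i]) > iou_threshold]
        # the survivor inherits the highest classification score of its cluster
        score = max([cls_scores[best]] + [cls_scores[i] for i in cluster])
        kept.append((best, score))
        order = [i for i in order if i not in cluster]
    return kept
```

Here the box that is best localized survives even if a worse-localized duplicate had the higher class score, while that higher score is not lost.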

1
◼
•
◼SSD
• [Jeong+, arXiv2017]
• Context-Aware SSD [Xiang+, arXiv2018]
◼GAN [Goodfellow+, NeurIPS2014]
• Perceptual GAN [Li+, CVPR2017]
◼
◼
• Face Attention Network [Wang+, arXiv2017]
◼
• Repulsion loss [Wang+, IEEE Access 2018]
• Occlusion-aware R-CNN [Zhang+, ECCV2018]

2
◼
•
•
• anchor BBox
(Figure: anchors in Faster R-CNN and SSD)

anchor-free
◼anchor
• anchor
• anchor
•
◼anchor-free
• CornerNet [Law and Deng, ECCV2018]
• FCOS [Tian+, ICCV2019]
•
• CenterNet [Duan+, ICCV2019]
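FCOS, one of the anchor-free detectors listed, regresses from each location inside a Gt BBox its distances (l, t, r, b) to the four edges and down-weights off-centre locations with a center-ness score, √(min(l, r)/max(l, r) · min(t, b)/max(t, b)). A minimal sketch; the helper names are hypothetical.

```python
import math

def fcos_targets(px, py, box):
    """Distances (l, t, r, b) from location (px, py) to a Gt box's edges."""
    x1, y1, x2, y2 = box
    return px - x1, py - y1, x2 - px, y2 - py

def centerness(l, t, r, b):
    """FCOS center-ness: 1 at the box centre, decaying toward the edges."""
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))
```

For the box (0, 0, 10, 10), the centre (5, 5) scores 1.0, while (2, 5) has targets (2, 5, 8, 5) and scores √(2/8) = 0.5.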

◼
• YOLO, YOLO9000 [Redmon & Farhadi, CVPR2017]
• WeaveNet [Chen+, arXiv2017] ESSD [Zheng+, arXiv2018]
• Pelee [Wang+, NeurIPS2018]
◼
• RetinaNet 12
• RFBNet [Liu+, ECCV2018]
• mimics the population receptive field (pRF) of human vision
(Figure: RFBNet and the RFB module)
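The RFB module enlarges the receptive field cheaply with dilated convolutions: a k×k kernel with dilation d covers k + (k − 1)(d − 1) pixels, and stacked stride-1 layers add their extents minus one. A minimal sketch of that arithmetic; the branch configurations in the usage note (kernels 1/3/5, each followed by a 3×3 conv with dilation 1/3/5) are an illustrative assumption loosely following the RFB design.

```python
def effective_kernel(k, dilation):
    """Extent in pixels covered by a k x k kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)

def receptive_field(layers):
    """Receptive field of stacked stride-1 convs given (kernel, dilation) pairs."""
    rf = 1
    for k, d in layers:
        rf += effective_kernel(k, d) - 1
    return rf
```

Under these assumed branches, `receptive_field([(1, 1), (3, 1)])`, `receptive_field([(3, 1), (3, 3)])` and `receptive_field([(5, 1), (3, 5)])` give 3, 9 and 15, so parallel branches see the same location at several scales, as the pRF analogy suggests.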

◼
• ScratchDet [Zhu+, CVPR2019]
•
◼
• DetNet [Li+, ECCV2018]
•
• Light-Head R-CNN [Li+, arXiv2017]
• two-stage

◼
[Braun+, arXiv2018]
1
2
3
4

◼
•
ISPRS dataset [Audebert+, MDPI 2017]
(Figure legend: true positive, false positive, ground truth)

◼
[Li+, arXiv2019]
1
2
3 CAM [Zhou+, arXiv2015]
4 ablation study

◼
1 3 RetinaNet
2 3
SKU-110K [Goldman+, CVPR2019]
RetinaNet

◼
•
◼
•
• NMS
• confidence
◼
