YOLO series:
v1~v5, X, F, and YOWO
2022/06/17
◼
•
• (bbox)
•
• bbox
◼
• DPM [Yan+, CVPR2011]
• R-CNN [Girshick+, CVPR2014]
• YOLO [Redmon+, CVPR2016]
• SSD [Liu+, arXiv2017]
• EfficientDet [Tan+, CVPR2020]
• DeiT [Touvron+, arXiv2020]
[Figure: detection example with bboxes labeled "dog" and "bicycle"; from Redmon+, CVPR2016]
YOLOv1 [Redmon+, CVPR2016]
◼1
•
• :
• :
◼2
• DPM
• R-CNN
◼YOLOv1
•
•
•
YOLOv1:
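◼ S × S grid prediction with the YOLOv1 Network
• Convolutional layers: 24, output S × S × channel
• Fully connected layers: 2, output S × S × (5 × 2 + C)
◼ 2 bboxes per grid cell
• bbox: center (b_x, b_y), width b_w, height b_h
• Confidence: t_o
• Class scores: t_0, t_1, …, t_c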
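The output size above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming S = 7, 2 bboxes per cell, and 20 classes; the function names and the per-cell ordering (bbox groups first, class scores last) are assumptions made for illustration, not the layout of the original implementation.

```python
def yolo_v1_output_shape(s=7, num_boxes=2, num_classes=20):
    """Return the (S, S, depth) shape of the prediction tensor."""
    # Each bbox contributes 5 values: x, y, w, h, and confidence.
    depth = 5 * num_boxes + num_classes
    return (s, s, depth)


def split_cell(cell, num_boxes=2):
    """Split one cell vector into bbox groups and class scores.

    Assumed layout (hypothetical): [bbox1 (5), bbox2 (5), classes (C)].
    """
    boxes = [cell[5 * i:5 * i + 5] for i in range(num_boxes)]
    class_scores = cell[5 * num_boxes:]
    return boxes, class_scores


print(yolo_v1_output_shape())  # (7, 7, 30)
```

With these defaults every grid cell carries 30 numbers: 10 for the two bboxes and 20 for the classes.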
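[Figure: YOLOv1 Network. The feature map is mapped to a 7 × 7 (S × S) output; each cell holds bbox1 (5 values), bbox2 (5 values), and C class values. Classes: 20, convolutional layers: 24, fully connected layers: 2.]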
◼
• bbox
• Center: (X, Y)
• Width
• Height
• Confidence
• Conditional class probabilities
• P(Cat | Object), …
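As a rough illustration of how these outputs combine, the YOLOv1 paper multiplies the conditional class probabilities by the box confidence at test time to obtain a class-specific score for each box. The sketch below assumes plain Python lists; the function names and the example class names are hypothetical.

```python
def class_scores(confidence, class_probs):
    """Multiply the box confidence by each conditional class probability."""
    return [confidence * p for p in class_probs]


def best_class(confidence, class_probs, names):
    """Return the (name, score) pair with the highest class-specific score."""
    scores = class_scores(confidence, class_probs)
    idx = max(range(len(scores)), key=scores.__getitem__)
    return names[idx], scores[idx]


# Hypothetical values for one predicted box.
name, score = best_class(0.8, [0.1, 0.7, 0.2], ["cat", "dog", "bicycle"])
print(name, round(score, 2))
```

A box with high confidence but flat class probabilities therefore scores lower than a box that is both confident and class-specific.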
YOLOv2 [Redmon&Farhadi, CVPR2017]
◼YOLOv1
•
•
• 448 × 448
•
• 5
•
• K-means (𝑘 = 5)
•
• bbox
•
•
•
•
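The K-means step (k = 5) can be sketched as follows. In the YOLOv2 paper the clustering distance is 1 - IoU between a box and a centroid rather than Euclidean distance. This is a minimal pure-Python sketch under that assumption; the function names, the seeding, and the iteration cap are hypothetical choices, not the authors' code.

```python
import random


def wh_iou(box, centroid):
    """IoU of two (w, h) boxes, assuming they share the same top-left corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union


def kmeans_anchors(boxes, k=5, iters=100, seed=0):
    """Cluster (w, h) pairs into k anchor shapes using d = 1 - IoU."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # The smallest 1 - IoU distance is the largest IoU.
            best = max(range(k), key=lambda j: wh_iou(box, centroids[j]))
            clusters[best].append(box)
        new_centroids = []
        for j, members in enumerate(clusters):
            if not members:
                # Keep the old centroid when a cluster becomes empty.
                new_centroids.append(centroids[j])
                continue
            mean_w = sum(w for w, _ in members) / len(members)
            mean_h = sum(h for _, h in members) / len(members)
            new_centroids.append((mean_w, mean_h))
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sorted(centroids)


# Hypothetical ground-truth box sizes: three small and three large boxes.
sizes = [(10, 10), (11, 11), (12, 10), (100, 200), (110, 190), (105, 210)]
print(kmeans_anchors(sizes, k=2))
```

Using IoU as the distance keeps large boxes from dominating the clustering, which a Euclidean distance on width and height would allow.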
Darknet-19
: 18
: 1
YOLOv3 [Redmon&Farhadi, arXiv2018]
◼YOLOv2
• FPN
•
•
• v2
◼FPN [Lin+, CVPR2017]
•
(YOLOv3: 3 scales)
•
• bbox, confidence,
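To make the multi-scale prediction concrete, the sketch below counts the predictions YOLOv3 produces for a 416 × 416 input. The strides (32, 16, 8) and the 3 boxes per cell follow the YOLOv3 paper; the function name and its parameters are assumptions for illustration.

```python
def yolo_v3_grids(input_size=416, strides=(32, 16, 8), boxes_per_cell=3):
    """Return the grid size at each scale and the total number of boxes."""
    grids = [input_size // s for s in strides]
    total = sum(g * g * boxes_per_cell for g in grids)
    return grids, total


print(yolo_v3_grids())  # ([13, 26, 52], 10647)
```

The finest grid (52 × 52) contributes most of the boxes, which is where small objects are expected to be picked up.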
◼YOLOv3
•
[Figure: YOLOv2 predicts from a single feature map; YOLOv3 predicts from a Feature Pyramid Network]
Darknet-53
: 52
: 1
YOLOv4 [Bochkovskiy+, arXiv2020]
◼Backbone
• CSPNet [Wang+, CVPR2020]
◼Head
• YOLOv3
◼Neck
• Spatial Pyramid Pooling (SPP) [He+, TPAMI2015]
• Path Aggregation Network (PAN) [Liu+, CVPR2018]
CSPNet
◼ 2
• Dense Block
• Transition Layer
◼
•
•
Spatial pyramid pooling (SPP)
◼
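The idea of SPP can be sketched on a single-channel feature map: max-pool the map into n × n bins at several levels and concatenate the results, so the output length does not depend on the input size. This follows the fixed-bin formulation of He+; the YOLOv4 neck uses a variant that applies stride-1 max pooling with several kernel sizes and concatenates the maps. The function name and the pyramid levels below are assumptions for illustration.

```python
def spp_single_channel(fmap, levels=(1, 2, 4)):
    """Max-pool a 2-D map into n x n bins per level and concatenate them."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for n in levels:
        for i in range(n):
            # Floor for the start and ceiling for the end, so bins cover the map.
            r0, r1 = (i * h) // n, ((i + 1) * h + n - 1) // n
            for j in range(n):
                c0, c1 = (j * w) // n, ((j + 1) * w + n - 1) // n
                out.append(max(fmap[r][c]
                               for r in range(r0, r1)
                               for c in range(c0, c1)))
    return out


# A 4 x 4 map holding the values 0..15 in row-major order.
fmap = [[4 * r + c for c in range(4)] for r in range(4)]
print(spp_single_channel(fmap, levels=(1, 2)))  # [15, 5, 7, 13, 15]
```

A 6 × 5 map passed through the same levels yields a vector of the same length, which is the property that lets a fixed-size layer follow inputs of varying size.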
Path Aggregation Network (PAN)
◼ Adds a Bottom-up path (b) to FPN (a)
FPN
YOLOv5 and YOWO
◼YOLOv5 [Jocher, GitHub2020]
• YOLOv4
•
•
• YOLOv4 : C
• YOLOv5 : Python
◼YOWO [Köpüklü+, arXiv2021]
• You Only Watch Once
•
YOLOX
◼YOLOX: Exceeding YOLO Series in 2021 [Ge+, arXiv2021]
◼YOLOv3~v5
• P(Obj)
◼YOLOX
•
• 1 anchor box/anchor
• Multi positives
• positive
• bbox
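The "Multi positives" item can be illustrated with a small sketch. The YOLOX paper describes assigning the center 3 × 3 area of each object as positives instead of a single center location. The code below lists those grid cells for one ground-truth center; the function name, the argument order, and the clipping at the grid border are assumptions for illustration.

```python
def center_positives(cx, cy, stride, grid_w, grid_h, radius=1):
    """Return the (row, col) cells in the (2*radius+1)^2 area around a center."""
    col, row = int(cx // stride), int(cy // stride)
    cells = []
    for r in range(row - radius, row + radius + 1):
        for c in range(col - radius, col + radius + 1):
            # Drop cells that fall outside the feature map.
            if 0 <= r < grid_h and 0 <= c < grid_w:
                cells.append((r, c))
    return cells


# Hypothetical object centered at pixel (100, 60) on a stride-8, 80 x 80 grid.
print(len(center_positives(100, 60, 8, 80, 80)))  # 9
```

More positives per object give the anchor-free head more gradient signal than a single center cell would.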
positive
v3~v5
YOLOF
◼You Only Look One-level Feature [Chen+, CVPR2021]
•
•
•
• Single-in Single-out
◼YOLOv1
• 1
• Bbox
◼YOLOv2
• v1
•
• Bbox
◼YOLOv3
• FPN
◼YOLOv4
• YOLOv3 with CSPNet and PAN added
◼YOLOv5
• YOLOv4
•
• YOLOv4 : C
• YOLOv5 : Python
◼YOWO
•
◼YOLOX
•
◼YOLOF
• Single-in Single-out
◼You Only Look Once: Unified, Real-Time Object Detection [Redmon+, CVPR2016]
◼YOLO9000: Better, Faster, Stronger [Redmon&Farhadi, CVPR2017]
◼YOLOv3: An Incremental Improvement [Redmon&Farhadi, arXiv2018]
◼YOLOv4: Optimal Speed and Accuracy of Object Detection [Bochkovskiy+, arXiv2020]
◼You Only Watch Once [Köpüklü+, arXiv2021]
◼YOLOX: Exceeding YOLO Series in 2021 [Ge+, arXiv2021]
◼You Only Look One-level Feature [Chen+, CVPR2021]
[Figure: YOLOv3 network. The Input, labeled (416,416,32), passes through stacked Conv and Residual blocks; features from Scale 1, Scale 2, and Scale 3 are merged by Conv + Upsample and Concatenate, and Conv layers then produce Detection Result1, Detection Result2, and Detection Result3.]

Literature review: YOLO series: v1-v5, X, F, and YOWO