SlideShare a Scribd company logo
1 of 23
Download to read offline
YOLOv3:
An Incremental Improvement
Joseph Redmon, et al., “YOLOv3: An Incremental Improvement”
17th November, 2019
PR12 Paper Review
JinWon Lee
Samsung Electronics
Related PR12
• PR-016:You only look once: Unified, real-time object detection :
https://youtu.be/eTDcoeqj1_w
• PR-023:YOLO9000: Better, Faster, Stronger :
https://youtu.be/6fdclSGgeio
References
• What’s new inYOLO v3?
https://towardsdatascience.com/yolo-v3-object-detection-
53fb7d3bfe6b
• How to implement aYOLO (v3) object detector from scratch in
PyTorch
https://blog.paperspace.com/how-to-implement-a-yolo-object-
detector-in-pytorch/
IOU & mAP
mAP
Introduction
• This is aTECH REPORT.
• “I managed to make some improvements toYOLO. But, honestly,
nothing like super interesting, just a bunch of small changes that
make it better.”
• Better, Not Faster, Stronger(?)
• The last sentence of the document: In closing, do not @me.
(Because I finally quitTwitter).
Bounding Box Prediction
• YOLOv3 predicts bounding boxes using dimension clusters as anchor
boxes.The network predicts 4 coordinates for each bounding box, tx,
ty, tw, th.
• Use sum of squared error loss
Bounding Box Prediction
• YOLOv3 predicts an objectness score for each bounding box using
logistic regression.
• This should be 1 if the bounding box prior overlaps a ground truth
object by more than any other bounding box prior.
• They use the threshold of 0.5. Unlike Faster R-CNN,YOLOv3 only
assigns one bounding box prior for each ground truth object.
Class Prediction
• YOLO v3 now performs multilabel classification for objects detected
in images.
• Softmaxing classes rests on the assumption that classes are mutually
exclusive.
• However, when we have classes like Person and Women in a dataset,
then the above assumption fails.This is the reason why the authors
ofYOLO have refrained from softmaxing the classes. Instead, each
class score is predicted using logistic regression and a threshold is
used to predict multiple labels for an object.
Predictions Across Scales
• The most salient feature ofYOLOv3 is that
it makes detections at three different scales.
• YOLOv3 predicts boxes at 3 scales
• YOLOv3 predicts 3 boxes at each scale
 in total 9 boxes
 So the tensor is N x N x (3 x (4 + 1 + 80))
80
3
N
N
255
Anchor Boxes
• They still use k-means clustering to determine bounding box priors.
They just sort of chose 9 clusters and 3 scales arbitrarily and then
divide up the clusters evenly across scales.
• On the COCO dataset the 9 clusters were:
(10x13), (16x30), (33x23), (30x61), (62x45), (59x119), (116x90), (156x
198), (373x326).
No. of Bounding Boxes
• YOLOv1 predicts 98 boxes (7x7 grid cells, 2 boxes per cell @448x448)
• YOLOv2 predicts 845 boxes (13x13 grid cells, 5 anchor boxes
@416x416)
• YOLOv3 predicts 10,647 boxes (@416x416)
• YOLOv3 predicts more than 10x the number of boxes predicted by
YOLOv2
Feature Extraction
• Darknet-53
• Darknet-53 is better than ResNet-101 and 1.5x
faster. Darknet-53 has similar performance to
ResNet-152 and is 2x faster.
• Darknet-53 also achieves the highest
measured floating point operations per
second.This means the network structure
better utilizes the GPU, making it more
efficient to evaluate and thus faster.
YOLOv3 Architecture
Darknet-53
Similar to Feature Pyramid Network
Training
• Authors still train on full images with no hard negative mining or any
of that stuff.
• They use multi-scale training, lots of data augmentation, batch
normalization, all the standard stuff.
Results
• It is still quite a bit behind other models like RetinaNet in this metric
though.
• However, when we look at the “old” detection metric of mAP at IOU=
0.5 (or AP50 in the chart)YOLOv3 is very strong.
• With the new multi-scale predictionsYOLOv3 has relatively high APs
performances.
Results
ThingsWeTriedThat Didn’tWork
• Anchor box x, y offset predictions
 Using the normal anchor box prediction mechanism where you predict the x,
y offset as a multiple of the box width or height
• Linear x, y predictions instead of logistic
• Focal loss
 Focal loss droppedYOLOv3’s mAP about 2 points
• Dual IOU thresholds and truth assignment
 Faster R-CNN uses two IOU thresholds during training. If a prediction
overlaps the ground truth by 0.7 it is as a positive example, by [0.3-0.7] it is
ignored, less than 0.3 for all ground truth objects it is a negative example.
WhatThis All Means
• YOLOv3 is a good detector. It’s fast, it’s accurate. It’s not as great on
the COCO average AP between .5 and .95 IOU metric. But it’s very
good on the old detection metric of .5 IOU.
• Russakovsky et al report that that humans have a hard time
distinguishing an IOU of .3 from .5!
 Training humans to visually inspect a bounding box with IOU of 0.3 and
distinguish it from one with IOU 0.5 is surprisingly difficult.
Rebuttal
• Graphs have not one but two non-zero origins
Rebuttal
• For PASCALVOC, the IOU threshold was “set deliberately low to
account for inaccuracies in bounding boxes in the ground truth data”
• COCO can have better labelling thanVOC since COCO has
segmentation masks. But there was the lack of justification for
updating mAP.
• Emphasis must mean it de-emphasizes something else, in this case
classification accuracy. A miss-classified example is much more
obvious than a bounding box that is slightly shifted.
mAP’s Problems
• They are both perfect! (mAP = 1.0)
A New Proposal
• mAP is screwed up because all that matters is per-class rank ordering.
• What about getting rid of per-class AP and just doing a global
average precision?
• Or doing an AP calculation per-image and averaging over that?
• Boxes are stupid anyway though, I’m probably a true believer in
masks except I can’t getYOLO to learn them.
Thank you

More Related Content

What's hot

[PR12] You Only Look Once (YOLO): Unified Real-Time Object Detection
[PR12] You Only Look Once (YOLO): Unified Real-Time Object Detection[PR12] You Only Look Once (YOLO): Unified Real-Time Object Detection
[PR12] You Only Look Once (YOLO): Unified Real-Time Object DetectionTaegyun Jeon
 
You only look once: Unified, real-time object detection (UPC Reading Group)
You only look once: Unified, real-time object detection (UPC Reading Group)You only look once: Unified, real-time object detection (UPC Reading Group)
You only look once: Unified, real-time object detection (UPC Reading Group)Universitat Politècnica de Catalunya
 
Object Detection using Deep Neural Networks
Object Detection using Deep Neural NetworksObject Detection using Deep Neural Networks
Object Detection using Deep Neural NetworksUsman Qayyum
 
Object detection and Instance Segmentation
Object detection and Instance SegmentationObject detection and Instance Segmentation
Object detection and Instance SegmentationHichem Felouat
 
Deep learning based object detection basics
Deep learning based object detection basicsDeep learning based object detection basics
Deep learning based object detection basicsBrodmann17
 
YOLO9000 - PR023
YOLO9000 - PR023YOLO9000 - PR023
YOLO9000 - PR023Jinwon Lee
 
A Brief History of Object Detection / Tommi Kerola
A Brief History of Object Detection / Tommi KerolaA Brief History of Object Detection / Tommi Kerola
A Brief History of Object Detection / Tommi KerolaPreferred Networks
 
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object DetectorPR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object DetectorJinwon Lee
 
Real-time object detection coz YOLO!
Real-time object detection coz YOLO!Real-time object detection coz YOLO!
Real-time object detection coz YOLO!J On The Beach
 
yolov3-4-5.pdf
yolov3-4-5.pdfyolov3-4-5.pdf
yolov3-4-5.pdfNguynnhLm4
 
Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331Jihong Kang
 
YOLOv4: optimal speed and accuracy of object detection review
YOLOv4: optimal speed and accuracy of object detection reviewYOLOv4: optimal speed and accuracy of object detection review
YOLOv4: optimal speed and accuracy of object detection reviewLEE HOSEONG
 

What's hot (20)

YOLO v1
YOLO v1YOLO v1
YOLO v1
 
Yolov3
Yolov3Yolov3
Yolov3
 
[PR12] You Only Look Once (YOLO): Unified Real-Time Object Detection
[PR12] You Only Look Once (YOLO): Unified Real-Time Object Detection[PR12] You Only Look Once (YOLO): Unified Real-Time Object Detection
[PR12] You Only Look Once (YOLO): Unified Real-Time Object Detection
 
You only look once: Unified, real-time object detection (UPC Reading Group)
You only look once: Unified, real-time object detection (UPC Reading Group)You only look once: Unified, real-time object detection (UPC Reading Group)
You only look once: Unified, real-time object detection (UPC Reading Group)
 
Object Detection using Deep Neural Networks
Object Detection using Deep Neural NetworksObject Detection using Deep Neural Networks
Object Detection using Deep Neural Networks
 
Yolo
YoloYolo
Yolo
 
Object detection and Instance Segmentation
Object detection and Instance SegmentationObject detection and Instance Segmentation
Object detection and Instance Segmentation
 
Deep learning based object detection basics
Deep learning based object detection basicsDeep learning based object detection basics
Deep learning based object detection basics
 
YOLO9000 - PR023
YOLO9000 - PR023YOLO9000 - PR023
YOLO9000 - PR023
 
A Brief History of Object Detection / Tommi Kerola
A Brief History of Object Detection / Tommi KerolaA Brief History of Object Detection / Tommi Kerola
A Brief History of Object Detection / Tommi Kerola
 
Yolo
YoloYolo
Yolo
 
Yol ov2
Yol ov2Yol ov2
Yol ov2
 
Mask R-CNN
Mask R-CNNMask R-CNN
Mask R-CNN
 
Deep Learning for Computer Vision: Object Detection (UPC 2016)
Deep Learning for Computer Vision: Object Detection (UPC 2016)Deep Learning for Computer Vision: Object Detection (UPC 2016)
Deep Learning for Computer Vision: Object Detection (UPC 2016)
 
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object DetectorPR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector
 
Real-time object detection coz YOLO!
Real-time object detection coz YOLO!Real-time object detection coz YOLO!
Real-time object detection coz YOLO!
 
yolov3-4-5.pdf
yolov3-4-5.pdfyolov3-4-5.pdf
yolov3-4-5.pdf
 
Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331
 
YOLOv4: optimal speed and accuracy of object detection review
YOLOv4: optimal speed and accuracy of object detection reviewYOLOv4: optimal speed and accuracy of object detection review
YOLOv4: optimal speed and accuracy of object detection review
 
YOLO
YOLOYOLO
YOLO
 

Similar to An Incremental Improvement to YOLOv3 Object Detection

Andrii Belas "Overview of object detection approaches: cases, algorithms and...
Andrii Belas  "Overview of object detection approaches: cases, algorithms and...Andrii Belas  "Overview of object detection approaches: cases, algorithms and...
Andrii Belas "Overview of object detection approaches: cases, algorithms and...Lviv Startup Club
 
PR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorPR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorJinwon Lee
 
“Robust Object Detection Under Dataset Shifts,” a Presentation from Arm
“Robust Object Detection Under Dataset Shifts,” a Presentation from Arm“Robust Object Detection Under Dataset Shifts,” a Presentation from Arm
“Robust Object Detection Under Dataset Shifts,” a Presentation from ArmEdge AI and Vision Alliance
 
IRJET - Real Time Object Detection using YOLOv3
IRJET - Real Time Object Detection using YOLOv3IRJET - Real Time Object Detection using YOLOv3
IRJET - Real Time Object Detection using YOLOv3IRJET Journal
 
auto-assistance system for visually impaired person
auto-assistance system for visually impaired personauto-assistance system for visually impaired person
auto-assistance system for visually impaired personshahsamkit73
 
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...Sergey Karayev
 
深度學習在AOI的應用
深度學習在AOI的應用深度學習在AOI的應用
深度學習在AOI的應用CHENHuiMei
 
Backbone can not be trained at once rolling back to pre trained network for p...
Backbone can not be trained at once rolling back to pre trained network for p...Backbone can not be trained at once rolling back to pre trained network for p...
Backbone can not be trained at once rolling back to pre trained network for p...NAVER Engineering
 
presentation on Faster Yolo
presentation on Faster Yolo presentation on Faster Yolo
presentation on Faster Yolo toontown1
 
IRJET - Object Detection using Deep Learning with OpenCV and Python
IRJET - Object Detection using Deep Learning with OpenCV and PythonIRJET - Object Detection using Deep Learning with OpenCV and Python
IRJET - Object Detection using Deep Learning with OpenCV and PythonIRJET Journal
 
Review: You Only Look One-level Feature
Review: You Only Look One-level FeatureReview: You Only Look One-level Feature
Review: You Only Look One-level FeatureDongmin Choi
 
Deep Learning Approach in Characterizing Salt Body on Seismic Images - by Zhe...
Deep Learning Approach in Characterizing Salt Body on Seismic Images - by Zhe...Deep Learning Approach in Characterizing Salt Body on Seismic Images - by Zhe...
Deep Learning Approach in Characterizing Salt Body on Seismic Images - by Zhe...Yan Xu
 
SEMINAR COURSE PRESENTATION on YOLO algorithm for object detection
SEMINAR COURSE PRESENTATION on YOLO algorithm for object detectionSEMINAR COURSE PRESENTATION on YOLO algorithm for object detection
SEMINAR COURSE PRESENTATION on YOLO algorithm for object detectionprasenjitroy98546
 
Real Time Sign Language Recognition Using Deep Learning
Real Time Sign Language Recognition Using Deep LearningReal Time Sign Language Recognition Using Deep Learning
Real Time Sign Language Recognition Using Deep LearningIRJET Journal
 
Jaemin_230701_Simple_Copy_paste.pptx
Jaemin_230701_Simple_Copy_paste.pptxJaemin_230701_Simple_Copy_paste.pptx
Jaemin_230701_Simple_Copy_paste.pptxJAEMINJEONG5
 
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...Simplilearn
 
Decision Forests and discriminant analysis
Decision Forests and discriminant analysisDecision Forests and discriminant analysis
Decision Forests and discriminant analysispotaters
 
Convolutional neural networks 이론과 응용
Convolutional neural networks 이론과 응용Convolutional neural networks 이론과 응용
Convolutional neural networks 이론과 응용홍배 김
 

Similar to An Incremental Improvement to YOLOv3 Object Detection (20)

Andrii Belas "Overview of object detection approaches: cases, algorithms and...
Andrii Belas  "Overview of object detection approaches: cases, algorithms and...Andrii Belas  "Overview of object detection approaches: cases, algorithms and...
Andrii Belas "Overview of object detection approaches: cases, algorithms and...
 
PR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorPR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox Detector
 
“Robust Object Detection Under Dataset Shifts,” a Presentation from Arm
“Robust Object Detection Under Dataset Shifts,” a Presentation from Arm“Robust Object Detection Under Dataset Shifts,” a Presentation from Arm
“Robust Object Detection Under Dataset Shifts,” a Presentation from Arm
 
IRJET - Real Time Object Detection using YOLOv3
IRJET - Real Time Object Detection using YOLOv3IRJET - Real Time Object Detection using YOLOv3
IRJET - Real Time Object Detection using YOLOv3
 
auto-assistance system for visually impaired person
auto-assistance system for visually impaired personauto-assistance system for visually impaired person
auto-assistance system for visually impaired person
 
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
 
深度學習在AOI的應用
深度學習在AOI的應用深度學習在AOI的應用
深度學習在AOI的應用
 
Yolo releases gianmaria
Yolo releases gianmariaYolo releases gianmaria
Yolo releases gianmaria
 
Backbone can not be trained at once rolling back to pre trained network for p...
Backbone can not be trained at once rolling back to pre trained network for p...Backbone can not be trained at once rolling back to pre trained network for p...
Backbone can not be trained at once rolling back to pre trained network for p...
 
presentation on Faster Yolo
presentation on Faster Yolo presentation on Faster Yolo
presentation on Faster Yolo
 
DMS MODULE 1 PRESENTATION.pptx
DMS MODULE 1 PRESENTATION.pptxDMS MODULE 1 PRESENTATION.pptx
DMS MODULE 1 PRESENTATION.pptx
 
IRJET - Object Detection using Deep Learning with OpenCV and Python
IRJET - Object Detection using Deep Learning with OpenCV and PythonIRJET - Object Detection using Deep Learning with OpenCV and Python
IRJET - Object Detection using Deep Learning with OpenCV and Python
 
Review: You Only Look One-level Feature
Review: You Only Look One-level FeatureReview: You Only Look One-level Feature
Review: You Only Look One-level Feature
 
Deep Learning Approach in Characterizing Salt Body on Seismic Images - by Zhe...
Deep Learning Approach in Characterizing Salt Body on Seismic Images - by Zhe...Deep Learning Approach in Characterizing Salt Body on Seismic Images - by Zhe...
Deep Learning Approach in Characterizing Salt Body on Seismic Images - by Zhe...
 
SEMINAR COURSE PRESENTATION on YOLO algorithm for object detection
SEMINAR COURSE PRESENTATION on YOLO algorithm for object detectionSEMINAR COURSE PRESENTATION on YOLO algorithm for object detection
SEMINAR COURSE PRESENTATION on YOLO algorithm for object detection
 
Real Time Sign Language Recognition Using Deep Learning
Real Time Sign Language Recognition Using Deep LearningReal Time Sign Language Recognition Using Deep Learning
Real Time Sign Language Recognition Using Deep Learning
 
Jaemin_230701_Simple_Copy_paste.pptx
Jaemin_230701_Simple_Copy_paste.pptxJaemin_230701_Simple_Copy_paste.pptx
Jaemin_230701_Simple_Copy_paste.pptx
 
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...
 
Decision Forests and discriminant analysis
Decision Forests and discriminant analysisDecision Forests and discriminant analysis
Decision Forests and discriminant analysis
 
Convolutional neural networks 이론과 응용
Convolutional neural networks 이론과 응용Convolutional neural networks 이론과 응용
Convolutional neural networks 이론과 응용
 

More from Jinwon Lee

PR-366: A ConvNet for 2020s
PR-366: A ConvNet for 2020sPR-366: A ConvNet for 2020s
PR-366: A ConvNet for 2020sJinwon Lee
 
PR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision LearnersPR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision LearnersJinwon Lee
 
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...Jinwon Lee
 
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...Jinwon Lee
 
PR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for VisionPR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for VisionJinwon Lee
 
PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...Jinwon Lee
 
PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)Jinwon Lee
 
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...Jinwon Lee
 
PR243: Designing Network Design Spaces
PR243: Designing Network Design SpacesPR243: Designing Network Design Spaces
PR243: Designing Network Design SpacesJinwon Lee
 
PR-231: A Simple Framework for Contrastive Learning of Visual Representations
PR-231: A Simple Framework for Contrastive Learning of Visual RepresentationsPR-231: A Simple Framework for Contrastive Learning of Visual Representations
PR-231: A Simple Framework for Contrastive Learning of Visual RepresentationsJinwon Lee
 
PR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object DetectionPR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object DetectionJinwon Lee
 
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...PR-197: One ticket to win them all: generalizing lottery ticket initializatio...
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...Jinwon Lee
 
PR-183: MixNet: Mixed Depthwise Convolutional Kernels
PR-183: MixNet: Mixed Depthwise Convolutional KernelsPR-183: MixNet: Mixed Depthwise Convolutional Kernels
PR-183: MixNet: Mixed Depthwise Convolutional KernelsJinwon Lee
 
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural NetworksPR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural NetworksJinwon Lee
 
PR-155: Exploring Randomly Wired Neural Networks for Image Recognition
PR-155: Exploring Randomly Wired Neural Networks for Image RecognitionPR-155: Exploring Randomly Wired Neural Networks for Image Recognition
PR-155: Exploring Randomly Wired Neural Networks for Image RecognitionJinwon Lee
 
PR-144: SqueezeNext: Hardware-Aware Neural Network Design
PR-144: SqueezeNext: Hardware-Aware Neural Network DesignPR-144: SqueezeNext: Hardware-Aware Neural Network Design
PR-144: SqueezeNext: Hardware-Aware Neural Network DesignJinwon Lee
 
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...Jinwon Lee
 
PR-108: MobileNetV2: Inverted Residuals and Linear Bottlenecks
PR-108: MobileNetV2: Inverted Residuals and Linear BottlenecksPR-108: MobileNetV2: Inverted Residuals and Linear Bottlenecks
PR-108: MobileNetV2: Inverted Residuals and Linear BottlenecksJinwon Lee
 
PR095: Modularity Matters: Learning Invariant Relational Reasoning Tasks
PR095: Modularity Matters: Learning Invariant Relational Reasoning TasksPR095: Modularity Matters: Learning Invariant Relational Reasoning Tasks
PR095: Modularity Matters: Learning Invariant Relational Reasoning TasksJinwon Lee
 
In datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitIn datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitJinwon Lee
 

More from Jinwon Lee (20)

PR-366: A ConvNet for 2020s
PR-366: A ConvNet for 2020sPR-366: A ConvNet for 2020s
PR-366: A ConvNet for 2020s
 
PR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision LearnersPR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision Learners
 
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
 
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
 
PR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for VisionPR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for Vision
 
PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...
 
PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)
 
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...
 
PR243: Designing Network Design Spaces
PR243: Designing Network Design SpacesPR243: Designing Network Design Spaces
PR243: Designing Network Design Spaces
 
PR-231: A Simple Framework for Contrastive Learning of Visual Representations
PR-231: A Simple Framework for Contrastive Learning of Visual RepresentationsPR-231: A Simple Framework for Contrastive Learning of Visual Representations
PR-231: A Simple Framework for Contrastive Learning of Visual Representations
 
PR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object DetectionPR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object Detection
 
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...PR-197: One ticket to win them all: generalizing lottery ticket initializatio...
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...
 
PR-183: MixNet: Mixed Depthwise Convolutional Kernels
PR-183: MixNet: Mixed Depthwise Convolutional KernelsPR-183: MixNet: Mixed Depthwise Convolutional Kernels
PR-183: MixNet: Mixed Depthwise Convolutional Kernels
 
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural NetworksPR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
 
PR-155: Exploring Randomly Wired Neural Networks for Image Recognition
PR-155: Exploring Randomly Wired Neural Networks for Image RecognitionPR-155: Exploring Randomly Wired Neural Networks for Image Recognition
PR-155: Exploring Randomly Wired Neural Networks for Image Recognition
 
PR-144: SqueezeNext: Hardware-Aware Neural Network Design
PR-144: SqueezeNext: Hardware-Aware Neural Network DesignPR-144: SqueezeNext: Hardware-Aware Neural Network Design
PR-144: SqueezeNext: Hardware-Aware Neural Network Design
 
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...
 
PR-108: MobileNetV2: Inverted Residuals and Linear Bottlenecks
PR-108: MobileNetV2: Inverted Residuals and Linear BottlenecksPR-108: MobileNetV2: Inverted Residuals and Linear Bottlenecks
PR-108: MobileNetV2: Inverted Residuals and Linear Bottlenecks
 
PR095: Modularity Matters: Learning Invariant Relational Reasoning Tasks
PR095: Modularity Matters: Learning Invariant Relational Reasoning TasksPR095: Modularity Matters: Learning Invariant Relational Reasoning Tasks
PR095: Modularity Matters: Learning Invariant Relational Reasoning Tasks
 
In datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitIn datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unit
 

Recently uploaded

Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 

Recently uploaded (20)

Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 

An Incremental Improvement to YOLOv3 Object Detection

  • 1. YOLOv3: An Incremental Improvement Joseph Redmon, et al., “YOLOv3: An Incremental Improvement” 17th November, 2019 PR12 Paper Review JinWon Lee Samsung Electronics
  • 2. Related PR12 • PR-016:You only look once: Unified, real-time object detection : https://youtu.be/eTDcoeqj1_w • PR-023:YOLO9000: Better, Faster, Stronger : https://youtu.be/6fdclSGgeio
  • 3. References • What’s new inYOLO v3? https://towardsdatascience.com/yolo-v3-object-detection- 53fb7d3bfe6b • How to implement aYOLO (v3) object detector from scratch in PyTorch https://blog.paperspace.com/how-to-implement-a-yolo-object- detector-in-pytorch/
  • 5. Introduction • This is aTECH REPORT. • “I managed to make some improvements toYOLO. But, honestly, nothing like super interesting, just a bunch of small changes that make it better.” • Better, Not Faster, Stronger(?) • The last sentence of the document: In closing, do not @me. (Because I finally quitTwitter).
  • 6. Bounding Box Prediction • YOLOv3 predicts bounding boxes using dimension clusters as anchor boxes.The network predicts 4 coordinates for each bounding box, tx, ty, tw, th. • Use sum of squared error loss
  • 7. Bounding Box Prediction • YOLOv3 predicts an objectness score for each bounding box using logistic regression. • This should be 1 if the bounding box prior overlaps a ground truth object by more than any other bounding box prior. • They use the threshold of 0.5. Unlike Faster R-CNN,YOLOv3 only assigns one bounding box prior for each ground truth object.
  • 8. Class Prediction • YOLO v3 now performs multilabel classification for objects detected in images. • Softmaxing classes rests on the assumption that classes are mutually exclusive. • However, when we have classes like Person and Women in a dataset, then the above assumption fails.This is the reason why the authors ofYOLO have refrained from softmaxing the classes. Instead, each class score is predicted using logistic regression and a threshold is used to predict multiple labels for an object.
  • 9. Predictions Across Scales • The most salient feature ofYOLOv3 is that it makes detections at three different scales. • YOLOv3 predicts boxes at 3 scales • YOLOv3 predicts 3 boxes at each scale  in total 9 boxes  So the tensor is N x N x (3 x (4 + 1 + 80)) 80 3 N N 255
  • 10. Anchor Boxes • They still use k-means clustering to determine bounding box priors. They just sort of chose 9 clusters and 3 scales arbitrarily and then divide up the clusters evenly across scales. • On the COCO dataset the 9 clusters were: (10x13), (16x30), (33x23), (30x61), (62x45), (59x119), (116x90), (156x 198), (373x326).
  • 11. No. of Bounding Boxes • YOLOv1 predicts 98 boxes (7x7 grid cells, 2 boxes per cell @448x448) • YOLOv2 predicts 845 boxes (13x13 grid cells, 5 anchor boxes @416x416) • YOLOv3 predicts 10,647 boxes (@416x416) • YOLOv3 predicts more than 10x the number of boxes predicted by YOLOv2
  • 12. Feature Extraction • Darknet-53 • Darknet-53 is better than ResNet-101 and 1.5x faster. Darknet-53 has similar performance to ResNet-152 and is 2x faster. • Darknet-53 also achieves the highest measured floating point operations per second.This means the network structure better utilizes the GPU, making it more efficient to evaluate and thus faster.
  • 13. YOLOv3 Architecture Darknet-53 Similar to Feature Pyramid Network
  • 14. Training • Authors still train on full images with no hard negative mining or any of that stuff. • They use multi-scale training, lots of data augmentation, batch normalization, all the standard stuff.
  • 15. Results • It is still quite a bit behind other models like RetinaNet in this metric though. • However, when we look at the “old” detection metric of mAP at IOU= 0.5 (or AP50 in the chart)YOLOv3 is very strong. • With the new multi-scale predictionsYOLOv3 has relatively high APs performances.
  • 17. ThingsWeTriedThat Didn’tWork • Anchor box x, y offset predictions  Using the normal anchor box prediction mechanism where you predict the x, y offset as a multiple of the box width or height • Linear x, y predictions instead of logistic • Focal loss  Focal loss droppedYOLOv3’s mAP about 2 points • Dual IOU thresholds and truth assignment  Faster R-CNN uses two IOU thresholds during training. If a prediction overlaps the ground truth by 0.7 it is as a positive example, by [0.3-0.7] it is ignored, less than 0.3 for all ground truth objects it is a negative example.
  • 18. WhatThis All Means • YOLOv3 is a good detector. It’s fast, it’s accurate. It’s not as great on the COCO average AP between .5 and .95 IOU metric. But it’s very good on the old detection metric of .5 IOU. • Russakovsky et al report that that humans have a hard time distinguishing an IOU of .3 from .5!  Training humans to visually inspect a bounding box with IOU of 0.3 and distinguish it from one with IOU 0.5 is surprisingly difficult.
  • 19. Rebuttal • Graphs have not one but two non-zero origins
  • 20. Rebuttal • For PASCALVOC, the IOU threshold was “set deliberately low to account for inaccuracies in bounding boxes in the ground truth data” • COCO can have better labelling thanVOC since COCO has segmentation masks. But there was the lack of justification for updating mAP. • Emphasis must mean it de-emphasizes something else, in this case classification accuracy. A miss-classified example is much more obvious than a bounding box that is slightly shifted.
  • 21. mAP’s Problems • They are both perfect! (mAP = 1.0)
  • 22. A New Proposal • mAP is screwed up because all that matters is per-class rank ordering. • What about getting rid of per-class AP and just doing a global average precision? • Or doing an AP calculation per-image and averaging over that? • Boxes are stupid anyway though, I’m probably a true believer in masks except I can’t getYOLO to learn them.