You Only Look Once:
Unified, Real-Time Object Detection
Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi (2016)
The YOLO Detection System
(1) resizes the input image to 448 × 448.
(2) runs a single convolutional network on the image.
(3) thresholds the resulting detections by the model’s confidence.
https://www.jeremyjordan.me/object-detection-one-stage/
Non-maximum suppression
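The YOLO paper notes that large objects, or objects near the border of several cells, can be localized well by more than one cell, and it uses non-maximal suppression to remove these duplicate detections. Below is a minimal greedy NMS sketch. The corner box format (x1, y1, x2, y2), the function names and the 0.5 overlap threshold are illustrative assumptions, not values taken from the slides.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), corner format (assumed)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes: List[Box], scores: List[float],
                        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep box i only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.82, so it is suppressed; the third box does not overlap either and is kept. In practice NMS is usually run separately for each class.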
Bounding Box, Confidence and Class Probability
YOLO reframes object detection as a regression problem.
• The image is divided into an S × S grid, and each grid cell predicts B bounding
boxes (x, y, w, h), a confidence score for each of those boxes, and C class probabilities.
• These predictions are encoded as an S × S × (B ∗ 5 + C) tensor.
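Each of the S × S cells therefore owns a vector of B ∗ 5 + C numbers. In the paper, (x, y) are relative to the bounds of the grid cell and (w, h) are relative to the whole image. The sketch below splits one cell's vector into its boxes and class probabilities. The ordering inside the vector (all B boxes first, then the C class probabilities) and the function name `split_cell` are assumptions made for illustration; the slides do not specify a memory layout.

```python
from typing import List, Tuple

# Assumed layout (illustrative only): [x, y, w, h, confidence] repeated B times,
# followed by the C class probabilities.


def split_cell(vector: List[float], B: int, C: int) -> Tuple[List[Tuple[float, ...]], List[float]]:
    """Split one grid cell's (B * 5 + C)-long prediction into boxes and class probabilities."""
    if len(vector) != B * 5 + C:
        raise ValueError(f"expected {B * 5 + C} values, got {len(vector)}")
    boxes = [tuple(vector[5 * b:5 * b + 5]) for b in range(B)]
    class_probs = vector[5 * B:]
    return boxes, class_probs


vec = [0.5, 0.5, 0.2, 0.3, 0.9,   # box 0: x, y, w, h, confidence
       0.4, 0.6, 0.1, 0.1, 0.2,   # box 1
       0.7, 0.2, 0.1]             # C = 3 class probabilities
boxes, probs = split_cell(vec, B=2, C=3)
print(boxes[0])  # (0.5, 0.5, 0.2, 0.3, 0.9)
print(probs)     # [0.7, 0.2, 0.1]
```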
Bounding Box, Confidence and Class Probability
The confidence of the bounding box
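Formally, we define confidence as $\Pr(\text{Object}) \ast \text{IOU}^{\text{truth}}_{\text{pred}}$. If no object exists in that
cell, the confidence score should be zero; otherwise it should equal the IOU between the predicted box and the ground truth.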
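The definition gives the training target for each box's confidence. At test time the paper multiplies this confidence by the conditional class probabilities Pr(Class_i | Object), which yields Pr(Class_i) ∗ IOU, a class-specific score for each box. The sketch below shows both steps; the function names are hypothetical helpers, not part of any published code.

```python
from typing import List


def confidence_target(object_in_cell: bool, iou_pred_truth: float) -> float:
    """Training target for a box's confidence: 0 if no object, else IOU(pred, truth)."""
    return iou_pred_truth if object_in_cell else 0.0


def class_specific_scores(box_confidence: float, class_probs: List[float]) -> List[float]:
    """Test-time score per class: Pr(Class_i | Object) * (Pr(Object) * IOU)."""
    return [p * box_confidence for p in class_probs]


print(confidence_target(False, 0.8))  # 0.0
print(confidence_target(True, 0.8))   # 0.8
scores = class_specific_scores(0.9, [0.7, 0.2, 0.1])
```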
The Neural Network Architecture
For evaluating YOLO on PASCAL VOC, we use S = 7, B = 2. PASCAL VOC has 20 labelled
classes so C = 20. Our final prediction is a 7 × 7 × (2∗5 + 20) tensor.
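Substituting the PASCAL VOC settings into the general tensor shape shows how many numbers the network outputs per image:

```latex
S \times S \times (B \cdot 5 + C) = 7 \times 7 \times (2 \cdot 5 + 20) = 7 \times 7 \times 30,
\qquad 7 \cdot 7 \cdot 30 = 1470
```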
Loss Function
• The size of the bounding box
• The confidence of the bounding box
• The probability of the class
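The three labels above annotate the loss equation, which did not survive as text. For reference, the sum-squared loss as written in the YOLO paper is reproduced below. Here $\mathbb{1}_{i}^{\text{obj}}$ indicates that an object appears in cell i, $\mathbb{1}_{ij}^{\text{obj}}$ that the j-th box predictor in cell i is responsible for that prediction, and the paper sets $\lambda_{\text{coord}} = 5$ and $\lambda_{\text{noobj}} = 0.5$. The first two lines are the bounding-box terms (the square roots of w and h make the same absolute error count more in small boxes), the third line holds the confidence terms, and the last line is the class-probability term.

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2
 + \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```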
Evaluation Metric
Confusion Matrix
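For object detection, a true positive (TP) is a prediction that matches a ground-truth object (for example, IoU above a threshold), a false positive (FP) is a prediction with no match, and a false negative (FN) is a ground-truth object that no prediction found; true negatives are generally not counted. Precision and recall follow directly from these counts. The counts in the example are hypothetical.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall


# Hypothetical counts: 8 correct detections, 2 false alarms, 4 missed objects.
p, r = precision_recall(tp=8, fp=2, fn=4)
print(p, r)  # 0.8 0.6666666666666666
```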
Intersection Over Union (IOU) and Object Detection
https://devblogs.nvidia.com/exploring-spacenet-dataset-using-digits/
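IoU is the area of overlap between two boxes divided by the area of their union. Since YOLO parameterizes boxes by center, width and height, the sketch below takes boxes in that (x, y, w, h) form and converts them to corners before intersecting; the helper names are illustrative.

```python
from typing import Tuple

BoxXYWH = Tuple[float, float, float, float]  # (center x, center y, width, height)


def to_corners(box: BoxXYWH) -> Tuple[float, float, float, float]:
    """Convert (cx, cy, w, h) to (x1, y1, x2, y2)."""
    x, y, w, h = box
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


def iou_xywh(a: BoxXYWH, b: BoxXYWH) -> float:
    """Intersection over union of two center-format boxes."""
    ax1, ay1, ax2, ay2 = to_corners(a)
    bx1, by1, bx2, by2 = to_corners(b)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


print(iou_xywh((1.0, 1.0, 2.0, 2.0), (2.0, 2.0, 2.0, 2.0)))  # 0.14285714285714285
```

The two 2 × 2 boxes share a 1 × 1 square, so IoU = 1 / (4 + 4 − 1) = 1/7.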
Recall-Precision Curve and Average Precision
https://acutecaretesting.org/en/articles/precision-recall-curves-what-are-they-and-how-are-they-used
Ideally, precision does not decrease as recall increases.
The general definition of Average Precision (AP) is the area under the precision-recall
curve.
https://medium.com/@jonathan_hui/map-mean-average-precision-for-object-detection-45c121a31173
The dataset contains only 5 apples. We collect all the predictions made for apples
in all the images and rank them in descending order of predicted confidence.
The second column indicates whether each prediction is correct. In this example,
a prediction is correct if IoU ≥ 0.5.
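Walking down the ranked list, precision and recall are recomputed after each prediction: precision is the number of correct predictions so far divided by the rank, and recall is that number divided by the total number of ground-truth objects (5 apples here). The correctness list below is hypothetical, not the table from the linked article, and `pr_points` is an illustrative name.

```python
from typing import List, Tuple


def pr_points(is_correct: List[bool], num_ground_truth: int) -> List[Tuple[float, float]]:
    """Given predictions already sorted by descending confidence, return the
    (precision, recall) pair after each rank."""
    points = []
    tp = 0
    for rank, correct in enumerate(is_correct, start=1):
        tp += int(correct)
        points.append((tp / rank, tp / num_ground_truth))
    return points


# Hypothetical ranking of 7 apple predictions against 5 ground-truth apples.
ranked = [True, True, False, False, True, False, True]
for precision, recall in pr_points(ranked, num_ground_truth=5):
    print(f"precision={precision:.2f} recall={recall:.2f}")
```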
Recall-Precision Curve and Average Precision
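For the 11-point interpolated AP, the recall axis from 0 to 1.0 is divided into 11 equally spaced points
(0, 0.1, 0.2, …, 1.0), and AP is the average of the interpolated precision at those points, where the
interpolated precision at recall r is the highest precision observed at any recall ≥ r.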
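The 11-point rule can be written directly as a loop over the recall levels. This is a sketch of the PASCAL VOC 2007-style computation; later VOC editions switched to integrating over all recall points instead. The precision/recall pairs in the example are hypothetical.

```python
from typing import List


def eleven_point_ap(precisions: List[float], recalls: List[float]) -> float:
    """11-point interpolated AP.

    For each recall level r in {0.0, 0.1, ..., 1.0}, take the maximum precision
    among points whose recall is >= r (0 if there are none), then average the 11 values.
    """
    total = 0.0
    for k in range(11):
        r = k / 10
        candidates = [p for p, rec in zip(precisions, recalls) if rec >= r]
        total += max(candidates) if candidates else 0.0
    return total / 11


# Hypothetical precision/recall pairs from a ranked list with 5 ground-truth objects.
precisions = [1.0, 1.0, 2 / 3, 0.5, 0.6, 0.5, 4 / 7]
recalls = [0.2, 0.4, 0.4, 0.4, 0.6, 0.6, 0.8]
ap = eleven_point_ap(precisions, recalls)
```

Here the interpolated precision is 1.0 for r = 0 to 0.4, 0.6 for r = 0.5 and 0.6, 4/7 for r = 0.7 and 0.8, and 0 for r = 0.9 and 1.0, because recall never exceeds 0.8.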
Average Precision (AP) is the area under the precision-recall curve.
mAP (mean average precision) is the average of the AP for each class.
Average Precision and mean Average Precision
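Once AP has been computed for each class, mAP is simply their unweighted mean. The per-class AP values below are hypothetical and only illustrate the arithmetic.

```python
from typing import Dict


def mean_average_precision(ap_per_class: Dict[str, float]) -> float:
    """mAP: unweighted mean of the per-class AP values."""
    if not ap_per_class:
        raise ValueError("need at least one class")
    return sum(ap_per_class.values()) / len(ap_per_class)


# Hypothetical per-class AP values, for illustration only.
example_aps = {"apple": 0.67, "orange": 0.55, "banana": 0.73}
print(mean_average_precision(example_aps))
```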
Experimental Results
Fast YOLO uses a neural network with fewer convolutional layers (9 instead of 24) and fewer filters in those layers.
Comparison to Other Real-Time Systems
The full YOLO model is about 10 mAP points more accurate than Fast YOLO while still running well above
real-time speed.
VOC 2007 Error Analysis
• Correct: correct class and IOU > 0.5
• Localization: correct class, 0.1 < IOU < 0.5
• Similar: class is similar, IOU > 0.1
• Other: class is wrong, IOU > 0.1
• Background: IOU < 0.1 for any object
Localization errors account for more of YOLO’s errors than all other sources
combined. Fast R-CNN makes far fewer localization errors but far more
background errors.
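The VOC 2007 error categories listed above amount to a small decision procedure over IoU thresholds. The sketch below is one way to encode them. The order in which the rules are checked (same-class overlap before wrong-class overlap), the handling of IoU exactly at 0.1 or 0.5, the parameter names, and the caller-supplied `other_class_similar` flag (for example, cat vs. dog) are all assumptions made for illustration.

```python
def error_category(iou_same_class: float, iou_other: float, other_class_similar: bool) -> str:
    """Classify one detection into a VOC 2007 error-analysis category.

    iou_same_class: best IoU with a ground-truth object of the predicted class.
    iou_other: best IoU with a ground-truth object of a different class.
    other_class_similar: hypothetical judgement of whether that other object's
        class is "similar" to the predicted class.
    Boundary cases (IoU exactly 0.1 or 0.5) are resolved by assumption.
    """
    if iou_same_class > 0.5:
        return "Correct"
    if iou_same_class > 0.1:
        return "Localization"
    if iou_other > 0.1:
        return "Similar" if other_class_similar else "Other"
    return "Background"


print(error_category(0.7, 0.0, False))    # Correct
print(error_category(0.3, 0.0, False))    # Localization
print(error_category(0.0, 0.4, True))     # Similar
print(error_category(0.0, 0.4, False))    # Other
print(error_category(0.05, 0.05, False))  # Background
```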
Qualitative Results