Yolo releases gianmaria

YOLO releases
Gianmaria Perillo
Data Scientist, Sferanet
perillo@sferaspa.com

Object Detection Problem
• Image classification is the task of taking an input image and
outputting a class (a cat, a dog, etc.) or the probabilities of the classes
that best describe the image. For humans, this task of recognition is
one of the first skills we learn.
• Object localization is the task of predicting the object in an image as
well as its boundaries. The aim is to locate the object in the image.

Object Detection Problem
Object detection tries to find all the objects in an image and their boundaries.
Classification: CAT
Classification + Localization: CAT
Object Detection: CAT, DOG, DUCK

Traditional Detection Methods
• Feature extraction: Haar, HOG, SIFT …
• Feature selection: PCA, ICA …
• Feature Matching
• Classification: SVM, Logistic Regression, Nearest Neighbor …

Deep Learning Object Detection Methods
A naive approach to the object detection problem would be to take
different regions of interest from the image and use a CNN to classify
the presence of the object within each region.
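The naive approach above can be sketched as a sliding window that enumerates candidate regions. This is only an illustration: the function name, the window size and the stride are assumptions, and the CNN classifier that would score each crop is left out.

```python
from typing import Iterator, Tuple

# (x_min, y_min, x_max, y_max) in pixels
Box = Tuple[int, int, int, int]


def sliding_windows(img_w: int, img_h: int, win: int, stride: int) -> Iterator[Box]:
    """Yield square candidate regions of side `win` that tile the image."""
    for y in range(0, img_h - win + 1, stride):
        for x in range(0, img_w - win + 1, stride):
            yield (x, y, x + win, y + win)


# Hypothetical toy image of 8 x 6 pixels; each region would be cropped
# and passed to a CNN classifier (not shown here).
regions = list(sliding_windows(img_w=8, img_h=6, win=4, stride=2))
print(len(regions), regions[0], regions[-1])
```

Covering several window sizes and aspect ratios multiplies the number of regions, which is why the methods on the following slides restrict or avoid this enumeration.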

Deep Learning Object Detection Methods:
Two-stage detector
The detection happens in two stages:
1. First, the model proposes a set of regions of interest by selective
search or a region proposal network.
2. Then a classifier processes only the region candidates.

Region-CNN (R-CNN)
Uses selective search to extract only about 2000 region proposals from the image.

Fast R-CNN
The regions are extracted not from the image, but from the feature map
generated by a CNN.

Faster R-CNN
Selective search is a slow and
time-consuming process.
Faster R-CNN uses a separate neural network
to generate proposals.
Training and testing are faster than
with R-CNN and Fast R-CNN.

Deep Learning Object Detection Methods:
One-stage detector
In a one-stage detector there is no intermediate task (region
proposals).
A backbone network, usually pre-trained as an image classifier, is used
to extract features from the image.
A grid is used to predict a fixed number of bounding boxes.

You Only Look Once (YOLO)
The basic idea is to divide the image into a grid with a fixed number of cells.
There are three versions of YOLO:
• YOLO v1: Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi, 2015.
• YOLO v2, YOLO9000: Joseph Redmon and Ali Farhadi, 2016.
• YOLO v3: Joseph Redmon and Ali Farhadi, 2018.

YOLO v1
• Divide the input image into an S × S grid.
• Each grid cell predicts B bounding boxes.
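• Each bounding box predicts:
• Confidence = $\Pr(\text{Object}) \cdot \mathrm{IOU}^{\text{truth}}_{\text{pred}}$
• $x, y, w, h$: $(x, y)$ is the bounding-box center, $w$ the width, $h$ the height.
• Each grid cell also predicts C class probabilities.
• Prediction = S × S × (B ∗ 5 + C)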
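As a quick check of the prediction size, the sketch below evaluates S × S × (B ∗ 5 + C). The values S = 7, B = 2 and C = 20 are not from this deck; they are the PASCAL VOC settings reported in the YOLO v1 paper, and the function name is an assumption.

```python
from typing import Tuple


def yolo_v1_output_shape(S: int, B: int, C: int) -> Tuple[int, int, int]:
    """Shape of the YOLO v1 prediction tensor: S x S x (B * 5 + C).

    Each of the B boxes contributes 5 numbers (x, y, w, h, confidence);
    the C class probabilities are shared by all boxes of a cell.
    """
    return (S, S, B * 5 + C)


# Assumed settings (PASCAL VOC in the YOLO v1 paper): 7 x 7 grid, 2 boxes, 20 classes.
shape = yolo_v1_output_shape(S=7, B=2, C=20)
num_values = shape[0] * shape[1] * shape[2]
print(shape, num_values)
```

With these settings the network outputs a 7 × 7 × 30 tensor, i.e. 1470 values per image.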

YOLO v1 : Cost Function
Classification Loss
Localization Loss
Confidence Loss
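For reference, the three components can be written out as the sum-squared-error loss given in the YOLO v1 paper. The notation is that of the paper, not of this deck: $\mathbb{1}_{ij}^{obj}$ is 1 when box $j$ of cell $i$ is responsible for an object, $\mathbb{1}_{i}^{obj}$ is 1 when an object appears in cell $i$, $C_i$ is the confidence, $p_i(c)$ the class probability, and the paper sets $\lambda_{coord} = 5$ and $\lambda_{noobj} = 0.5$.

```latex
\begin{aligned}
\mathcal{L} ={} & \underbrace{\lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}
    \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2
    + \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2
    + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right]}_{\text{localization loss}} \\
& + \underbrace{\sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left(C_i - \hat{C}_i\right)^2
    + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \left(C_i - \hat{C}_i\right)^2}_{\text{confidence loss}} \\
& + \underbrace{\sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in \text{classes}} \left(p_i(c) - \hat{p}_i(c)\right)^2}_{\text{classification loss}}
\end{aligned}
```

The square roots on width and height make an error of a given size count more for small boxes than for large ones.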

YOLO v1 : Pros & Cons
PROS
• Fast.
• Predictions are made from one single network.
• Can be trained end-to-end to improve accuracy.
CONS
• Spatial constraints on bounding box predictions.
• Struggles with small objects that appear in groups.
• Struggles to generalize to objects in new or unusual aspect ratios or
configurations.

YOLO v2
• Batch Normalization
• Anchor-Box
• Dimension Clusters
• Direct location prediction
• Fine-Grained Features
• Darknet-19
• Hierarchical classification

YOLO v2: Anchor Box and Dimension Cluster
YOLO v1 predicts bounding-box coordinates directly, using fully connected
layers on top of the convolutional feature extractor. Faster R-CNN
uses a separate network to predict offsets and confidences for anchor
boxes.
YOLO v2 uses anchor boxes. Instead of hand-picking the priors, k-means is run
on the training-set bounding boxes to find better priors.
The distance measure is independent of the size of the box:
$d(\text{box}, \text{centroid}) = 1 - \mathrm{IOU}(\text{box}, \text{centroid})$
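A minimal sketch of dimension clustering with the distance d = 1 − IOU, using NumPy. It is not the authors' implementation: the function names, the random initialization, the mean-based centroid update and the toy data are all assumptions. Boxes are given as (width, height) and compared as if they shared a corner, so only their shape matters.

```python
import numpy as np


def iou_wh(boxes: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """IOU between (N, 2) boxes and (K, 2) centroids given as (width, height),
    with every box aligned at a common corner. Returns an (N, K) array."""
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0])
             * np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    area_b = boxes[:, 0] * boxes[:, 1]
    area_c = centroids[:, 0] * centroids[:, 1]
    return inter / (area_b[:, None] + area_c[None, :] - inter)


def kmeans_anchors(boxes: np.ndarray, k: int, iters: int = 100, seed: int = 0) -> np.ndarray:
    """Cluster box dimensions with k-means using d = 1 - IOU as the distance."""
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)].astype(float)
    for _ in range(iters):
        dist = 1.0 - iou_wh(boxes, centroids)  # d(box, centroid) = 1 - IOU
        assign = dist.argmin(axis=1)           # nearest centroid for each box
        new = np.array([boxes[assign == j].mean(axis=0) if np.any(assign == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids


# Hypothetical toy data: three small and three large boxes as (width, height).
boxes = np.array([[10, 10], [12, 11], [11, 12],
                  [50, 100], [52, 98], [48, 102]], dtype=float)
anchors = kmeans_anchors(boxes, k=2)
print(anchors)
```

With a Euclidean distance, large boxes would contribute larger errors than small ones; dividing by the union in the IOU removes that dependence on size.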

YOLO v2: Direct Location Prediction
The network predicts 5 bounding boxes at each cell of the output
feature map, and 5 coordinates for each bounding
box: $t_x, t_y, t_w, t_h, t_o$.
$b_x = \sigma(t_x) + c_x$
$b_y = \sigma(t_y) + c_y$
$b_w = p_w e^{t_w}$
$b_h = p_h e^{t_h}$
where $(c_x, c_y)$ is the offset of the cell from the top-left corner of the image
and $p_w, p_h$ are the width and height of the bounding box prior.
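The four equations above translate directly into code. The sketch below is only illustrative: the function names and the sample values are assumptions, and all quantities are expressed in grid-cell units.

```python
import math
from typing import Tuple


def sigmoid(t: float) -> float:
    """Logistic function, which keeps the center offset inside the cell."""
    return 1.0 / (1.0 + math.exp(-t))


def decode_box(t_x: float, t_y: float, t_w: float, t_h: float,
               c_x: float, c_y: float, p_w: float, p_h: float
               ) -> Tuple[float, float, float, float]:
    """Turn raw network outputs into a box (center x, center y, width, height)."""
    b_x = sigmoid(t_x) + c_x   # center x: offset within the cell plus the cell offset
    b_y = sigmoid(t_y) + c_y   # center y
    b_w = p_w * math.exp(t_w)  # width: prior width rescaled
    b_h = p_h * math.exp(t_h)  # height: prior height rescaled
    return b_x, b_y, b_w, b_h


# Hypothetical example: all raw outputs at zero, cell (3, 4), prior of 2.0 x 1.5 cells.
box = decode_box(0.0, 0.0, 0.0, 0.0, c_x=3, c_y=4, p_w=2.0, p_h=1.5)
print(box)  # (3.5, 4.5, 2.0, 1.5)
```

Zero outputs place the center in the middle of the cell and leave the prior size unchanged. Because the sigmoid bounds the offset between 0 and 1, the predicted center cannot leave its cell.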

YOLO v2: Darknet-19
Backbone network with 19
convolutional layers.
1x1 filters compress the
feature map.
Batch normalization stabilizes
training and helps avoid overfitting.
A passthrough layer is added so the
model can use fine-grained features
from earlier layers.

YOLO 9000
YOLO v2 is trained separately for classification and detection.
YOLO9000 proposes a method to jointly train the network on both
tasks.
A new hierarchical dataset is created from COCO and ImageNet, based
on the concepts of synonyms and hyponyms.
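To illustrate how a hierarchy of labels can be used for classification: in the YOLO9000 paper the probability of a label is the product of the conditional probabilities along its path to the root of the tree. The sketch below assumes that rule and a root probability of 1; the tiny tree and all the numbers are hypothetical and are not taken from COCO or ImageNet.

```python
from typing import Dict, Optional

# Hypothetical toy hierarchy: child -> parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "object": None,
    "animal": "object",
    "dog": "animal",
    "terrier": "dog",
}

# Hypothetical conditional probabilities Pr(node | parent) from a classifier.
COND: Dict[str, float] = {"animal": 0.9, "dog": 0.8, "terrier": 0.5}


def absolute_probability(node: str) -> float:
    """Multiply Pr(node | parent) along the path from `node` up to the root."""
    prob = 1.0  # the root is assumed to have probability 1
    while PARENT[node] is not None:
        prob *= COND[node]
        node = PARENT[node]
    return prob


p_terrier = absolute_probability("terrier")  # 0.5 * 0.8 * 0.9
p_dog = absolute_probability("dog")          # 0.8 * 0.9
print(p_terrier, p_dog)
```

A more general label is never less probable than its hyponyms, so a prediction can fall back to "dog" when the finer label is uncertain.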

YOLO v2 : Pros & Cons
PROS
• Faster and more accurate.
• Can detect small objects.
• Joint detection and classification training.
• Hierarchical classification.
CONS
• Pre-processing needed to compute the priors.
• Experimental threshold.

YOLO v3
• Darknet-53
• Multi-scale features
• Residual blocks
• Logistic classifier
• Multi-label classification
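The last two items can be illustrated by comparing a softmax with independent logistic (sigmoid) outputs. This is a sketch with made-up scores: the class names, the logits and the 0.5 threshold are assumptions chosen only to show the effect.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Probabilities that compete with each other and sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoids(logits: List[float]) -> List[float]:
    """Independent logistic outputs, one per class."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


# Hypothetical scores for overlapping labels, e.g. "person", "woman", "car".
logits = [4.0, 4.0, -3.0]
labels_softmax = [p > 0.5 for p in softmax(logits)]
labels_logistic = [p > 0.5 for p in sigmoids(logits)]
print(labels_softmax)   # [False, False, False]
print(labels_logistic)  # [True, True, False]
```

With the softmax the two equally likely labels split the probability mass and neither passes the threshold, while the independent logistic outputs allow both to be assigned to the same box.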

YOLO v3
PROS
• More accurate.
• Multi-scale features.
• Multi-label approach.
ATTEMPTS (tried, but did not help)
• Dual IOU thresholds.
• Focal loss (RetinaNet).
• Linear activation.
