Single Shot Multibox Detector

SSD:
Single Shot Multibox Detector
NamHyuk Ahn

Object Detection
- mean Average Precision (mAP)
• Popular eval metric
• Compute average precision
for single class, and average
them over all classes
• Detections is True-positive
if box is overlap with ground-
truth more than some threshold
(usually use 0.5)

Object Detection
- R-CNN Family
• Most popular detection method in deep learning
• Use region proposal method <- make model slow
• Good accuracy (Faster: 73.2% mAP), but very slow
• R-CNN: 50 sec/img, Fast: 2 sec/img, Faster: 0.2 sec/img (7 FPS)
- YOLO (You Only Look Once)
• Real-time (45 FPS), but low accuracy (63.4% mAP)

YOLO:
You Only Look Once
- Single shot detector model
• Not separate classification and bbox regression
- Divide image into S x S grid (7x7 in paper)
• Within each grid cell, (4+1)*B + C vector,
• B: # of boxes in each grid (2 in paper)
C: # of classes (20 in paper)
(4+1): 4 box coord + 1 box confidence
- Direct prediction using CNN with regression loss

YOLO:
You Only Look Once
- Operate on a single-scale feature map (last pool)
• Bad accuracy with large or small object
- Predict bbox using fc layer
- Hard data augmentation, 448x448 input image
- Use customized CNN architecture

SSD:
- Multi-scale feature maps for detection
• Add conv layer at the end of base network, decrease size progressively
• Concat output of multi-scale feature map at the last layer
- Convolutional predictors for detection
• YOLO use fc layer, but SSD use 3x3 conv kernel

SSD:
- Default boxes and aspect ratios
• Set default boxes at each location, and predict offset relative to
corresponding default box
• output dims: (C+4)K*M*N,
K=# of default box, C=# of classes, MN=feature dims

SSD:
- Default boxes and aspect ratios
• Use 6 default boxes at each feature cell
• { 1, 2, 3, 1/2, 1/3 } aspect ratio boxes + 1 box with 1 aspect ratio
• Set 3 boxes in conv4_3 to reduce computation

SSD:
- Output feature (final layer)
• With given output boxes from multi-scale features, sort them
using class confidence
• Pick top-200 boxes and make each box 7-dim vector
• [ batch_idx, class_confidence, label, box offset…]
• Output feature dim is 7x200
•

Model analysis
- Data argumentation is very important
- More feature map is better
• Lower feature map can capture fine-grained details of object
- More default box shape is better
• If you only 4 boxes, performance drop by 0.9%
• Using variety shape of default box makes predicting box easier
- Astrous VGG is better and faster

Result
- Accuracy is compare to state-of-the-art, and with
real-time

Reference
- Liu, Wei, et al. "SSD: Single Shot MultiBox Detector." arXiv preprint
arXiv:1512.02325 (2015).

Single Shot Multibox Detector

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Single Shot Multibox Detector

Similar to Single Shot Multibox Detector (20)

More from NamHyuk Ahn

More from NamHyuk Ahn (7)

Recently uploaded

Recently uploaded (20)

Single Shot Multibox Detector