© 2019 allegro.ai
Optimizing
SSD Object Detection
for Low-power Devices
Moses Guttmann
CTO, allegro.ai
May 2019
Agenda
● Deep-learning computer vision: Towards embedded deployment
● Single-Shot-Detection: A short overview
● Prior design - A low hanging fruit for optimization
● Data-driven prior optimization
● Results
About allegro.ai
End2end platform optimized for DL based perception / CV
● Automated labeling
● Experiment management
● Dataset management
● Deep learning (Dev)Ops
● Continuous / active learning
Trusted by:
Towards Embedded
Deployment
Embedded Object Detection: Living on the Edge
Model Design equation:
+ Low memory
+ Efficiency (compute OPs)
+ Accuracy
= Bill-of-Materials
General rule for inference -
“large model” equals:
● Accurate
● Many operations
● High memory footprint
[Trade-off triangle: frame rate, accuracy, power draw. Choose two.]
Detection: Towards Embedded Applications
1. Function split: [feature extractor] + [detection heads]
2. Multiple heads for different tasks → shared feature extractor
3. Single-shot models → execution path is not dynamic
4. Use weak feature extractors → low operations count
5. Optional: model quantization → performance boost
☹ DLCV “state of the art” == High memory footprint & low FPS
Model “Zoo” - Which is Best for Me?
"Common" detection tasks: deployable architectures show only slight differences in accuracy.
Larger model → higher accuracy.
Source: https://github.com/amdegroot/ssd.pytorch
Where is the catch?
The problem with detection…
Timed on an Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
Non-optimized PyTorch
Simulated "embedded" CPU deployment
Source: https://github.com/amdegroot/ssd.pytorch
What is Going On?
In detection, getting the actual results means:
● Refining 10-20K suggestions into detections
● Expensive algorithms (e.g. NMS): 30-50% of processing time
Reducing the feature extractor size does not help here.
Opportunity:
● Lower the complexity of the refinement algorithm (hurts accuracy!)
OR
● Reduce the number of suggestions (preserves accuracy?!)
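The refinement cost the slide refers to can be made concrete with a minimal greedy NMS sketch (a generic textbook implementation, not allegro.ai's code). Note how the work per iteration scales with the number of surviving suggestions, which is why pruning priors speeds up post-processing, not just the network:

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression over [x1, y1, x2, y2] boxes.

    Worst-case cost grows quadratically with the number of input
    boxes, so fewer priors means a cheaper refinement stage.
    """
    order = np.argsort(scores)[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the best remaining box against all the others
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = ((boxes[order[1:], 2] - boxes[order[1:], 0])
                 * (boxes[order[1:], 3] - boxes[order[1:], 1]))
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_threshold]  # drop overlapping boxes
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))   # the near-duplicate second box is suppressed
```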
Single-Shot Detection
Not SSD: "many shot"
Two-stage, many-shot detection NNs (not SSD / YOLO):
1. Image → feature space → object-proposal generator
2. Object proposals → resample feature space → classifier
Computationally intensive: FPS depends on the number of proposals.
Speed/accuracy trade-offs for modern convolutional object detectors, arXiv:1611.10012
Single-Shot Detection
● Fast detection with a minor penalty in accuracy
● Predictions of different scales from different layers
● Significantly more proposals (~24K vs. ~6K for F-RCNN)
● Supports small objects
SSD: Single Shot MultiBox Detector, arXiv:1512.02325
Speed/accuracy trade-offs for modern convolutional object detectors, arXiv:1611.10012
YOLO / SSD
SSD: Single Shot MultiBox Detector, arXiv:1512.02325
YOLO9000: Better, Faster, Stronger, arXiv:1612.08242
Origin of Suggestions in Single-Shot
● Prior grid (box / proposal / anchor): a set of priors for each target "pixel" at each resolution
● Localization: mapping between priors and bounding boxes
● High-quality object classifier for every prior type
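For reference, a minimal sketch of how an SSD-style prior grid is typically generated. The feature-map sizes, scales, and aspect ratios below are illustrative assumptions, not the exact MobileNetV2@SSD configuration:

```python
import itertools
import math

def make_priors(feature_map_sizes, scales, aspect_ratios):
    """Generate center-form (cx, cy, w, h) priors, normalized to [0, 1].

    One prior per aspect ratio is placed at every feature-map cell, so
    the total is sum(fm * fm * len(aspect_ratios)): this is the number
    of suggestions the refinement stage later has to digest.
    """
    priors = []
    for fm, scale in zip(feature_map_sizes, scales):
        for i, j in itertools.product(range(fm), repeat=2):
            cx, cy = (j + 0.5) / fm, (i + 0.5) / fm   # cell center
            for ar in aspect_ratios:
                priors.append((cx, cy, scale * math.sqrt(ar),
                               scale / math.sqrt(ar)))
    return priors

# e.g. three hypothetical detection resolutions of a 512x512 model
priors = make_priors([64, 32, 16], [0.04, 0.1, 0.26], [1.0, 2.0, 0.5])
print(len(priors))   # 64*64*3 + 32*32*3 + 16*16*3 = 16128
```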
MobileNetV2@SSD (512x512): Prune Priors
* Number of priors (~total)
source: Allegro.ai - AI research team
Suggested Path for Optimization
● Previous work tuned priors for benchmark datasets; these do not necessarily fit real-world datasets
● The number and shapes of priors should be tailored to the objects (and their sizes)
● Enables selecting between accuracy and performance
● Independent of, and additive to, all other optimizations
Bonus points:
● Run the optimization as part of the pipeline (matched with the data)
● Automatically prune the model execution graph when priors are not generated for a specific scale (e.g. "no big objects in the dataset")
Data-Centric Approach - Toy Example
The size of all objects is known in advance → tune the priors for our specific purpose.
[Axis: prior size/scale, small to large]
source: Allegro.ai - AI research team
Data-Centric Approach - Optimization Example
● Remove the unused prior: #10
● Delete priors: #1, #2, #8
● Reshape the other priors
● Confirm they match all the examples in the dataset
[Axis: prior size/scale, small to large]
source: Allegro.ai - AI research team
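The toy pruning can be sketched as a shape-only IoU check: a prior whose shape matches no box in the dataset can be deleted outright. Everything here (the `shape_iou` helper, the 0.5 threshold, the example shapes) is hypothetical illustration, not the published method:

```python
def shape_iou(wh_a, wh_b):
    """IoU of two boxes sharing the same center: compares shape only."""
    inter = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    union = wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - inter
    return inter / union

def prune_unused(priors, gt_shapes, thresh=0.5):
    """Keep only priors that match at least one dataset box shape."""
    return [p for p in priors
            if any(shape_iou(p, g) >= thresh for g in gt_shapes)]

# hypothetical prior shapes (w, h) and dataset box shapes, normalized
priors = [(0.1, 0.1), (0.5, 0.25), (0.9, 0.9)]
gt = [(0.45, 0.3), (0.12, 0.09)]
print(prune_unused(priors, gt))   # the 0.9x0.9 prior is never matched
```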
Example
Problem Definition
Task: “Pet detector”
Classes: “cat”, “dog”, “bird”
Examples taken from
VOC/COCO “train” sets.
Size: 24K ROIs
Unique priors: 24.5K (36 types)
[Chart: detector-prior match vs. prior size/scale]
source: Allegro.ai - AI research team
Optimized Prior Matching
Task: “Pet detector”
Classes: “cat”, “dog”, “bird”
Examples taken from
VOC/COCO “train” sets.
Size: 24K ROIs
Unique priors: 16K (21 types)
[Chart: detector-prior match vs. prior size/scale]
source: Allegro.ai - AI research team
Method: (I) Collect Statistics (with Augmentations)
“Dataset as Database”
Apply data augmentations
Collect object bounding boxes
source: Allegro.ai - AI research team
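A hedged sketch of step (I): run each annotated box through the same kind of random augmentations used at training time and record the resulting (w, h) population. The scale-jitter range and sample count are made-up stand-ins for the real augmentation pipeline:

```python
import random

def collect_box_stats(annotations, n_augment=10, seed=0):
    """Pass each ground-truth (w, h) box through random augmentations
    (here just scale jitter, as a stand-in for the real training
    pipeline) and collect the resulting normalized shape population."""
    rng = random.Random(seed)
    stats = []
    for w, h in annotations:
        for _ in range(n_augment):
            s = rng.uniform(0.75, 1.33)                  # scale jitter
            stats.append((min(w * s, 1.0), min(h * s, 1.0)))
    return stats

boxes = [(0.2, 0.3), (0.5, 0.4)]       # hypothetical annotations
population = collect_box_stats(boxes)
print(len(population))                 # 2 boxes x 10 augmentations = 20
```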
Method: (II) Partition to Detection Resolutions
Model architecture
Partition the box population by resolution/scale (small to large)
source: Allegro.ai - AI research team
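Step (II) can be sketched as binning the box population by a sqrt(w*h) scale measure. The bin edges below are illustrative assumptions, not the model's real scale boundaries:

```python
import math

def partition_by_scale(box_shapes, scale_edges):
    """Assign each (w, h) box to the detection resolution whose scale
    range contains it; bins are defined by sqrt(w*h) thresholds."""
    bins = [[] for _ in range(len(scale_edges) + 1)]
    for w, h in box_shapes:
        s = math.sqrt(w * h)
        idx = sum(s >= edge for edge in scale_edges)   # count edges passed
        bins[idx].append((w, h))
    return bins

# hypothetical scale boundaries between three detection resolutions
bins = partition_by_scale([(0.05, 0.05), (0.2, 0.2), (0.7, 0.6)],
                          [0.1, 0.4])
print([len(b) for b in bins])   # [1, 1, 1]
```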
Method: (III) Weighted Clustering
Clustering using naive K-means
Data-bias-aware weighting function
Ensures "fair" priors
source: Allegro.ai - AI research team
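Step (III) amounts to Lloyd's K-means with per-sample weights: over-represented box shapes can be down-weighted so rare shapes still get a "fair" prior. A minimal sketch; the actual bias-aware weighting function is dataset-specific and is passed in here as plain per-sample weights:

```python
import numpy as np

def weighted_kmeans(points, weights, k, n_iter=50, seed=0):
    """Lloyd's algorithm where centroid updates use a weighted average,
    so heavily duplicated samples do not dominate the resulting priors."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(points[:, None] - centers[None], axis=2)
        assign = d.argmin(axis=1)              # nearest center per sample
        for c in range(k):
            m = assign == c
            if m.any():
                centers[c] = np.average(points[m], axis=0,
                                        weights=weights[m])
    return centers, assign

# hypothetical (w, h) population: two distinct shape groups
shapes = np.array([[0.1, 0.1], [0.12, 0.11], [0.09, 0.1],
                   [0.6, 0.5], [0.62, 0.48], [0.58, 0.52]])
w = np.ones(len(shapes))   # a real pipeline would down-weight duplicates
centers, assign = weighted_kmeans(shapes, w, k=2)
```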
Method: (IV) Merge Similar/Prune: "Light" Optimization
Small boxes are redundant
Negligible accuracy decrease
source: Allegro.ai - AI research team
Method: (V) Merge Similar/Prune: "Hard"
Optimization II: greedy merging strategy
Decreases the number of priors
Small cost in accuracy
source: Allegro.ai - AI research team
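One possible reading of the greedy "hard" merge, sketched with a shape-only IoU and a midpoint merge rule. The 0.8 threshold and the merge rule are assumptions for illustration, not the published method:

```python
def merge_similar(priors, merge_thresh=0.8):
    """Greedily merge the most-similar pair of prior shapes until no
    pair exceeds the threshold: fewer priors, fewer suggestions to refine."""
    def shape_iou(a, b):
        inter = min(a[0], b[0]) * min(a[1], b[1])
        return inter / (a[0] * a[1] + b[0] * b[1] - inter)

    priors = list(priors)
    while len(priors) > 1:
        # find the pair of priors with the highest shape-IoU
        best = max((shape_iou(priors[i], priors[j]), i, j)
                   for i in range(len(priors))
                   for j in range(i + 1, len(priors)))
        iou, i, j = best
        if iou < merge_thresh:
            break
        a, b = priors[i], priors.pop(j)
        priors[i] = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)  # midpoint merge
    return priors

# hypothetical prior shapes: two near-identical small priors get merged
print(merge_similar([(0.10, 0.10), (0.11, 0.10), (0.50, 0.30)]))
```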
Results
MobileNetV2@SSD (512x512): Prune Results
* Number of priors (~total)
source: Allegro.ai - AI research team
MobileNetV2@SSD (640x480): Prune Results
* Number of priors (~total)
source: Allegro.ai - AI research team
Take-Home Messages
● Successful implementation of data-driven optimization
● Applicable to any SSD meta-architecture (SSD, DSSD, FPN)
● Change the input size and get optimized priors for any input resolution, depending on deployment
Future Work
● AutoML: pruning toward a required accuracy and model footprint
● "Reverse optimization": flag biased datasets that need more examples where objects are underrepresented
● Mask optimization for instance segmentation (Mask R-CNN etc.)
Resources
Tools used:
● allegro.ai deep learning perception platform
● Deep learning framework: PyTorch
Research papers:
● Speed/accuracy trade-offs for modern convolutional object
detectors, arXiv:1611.10012
● SSD: Single Shot MultiBox Detector, arXiv:1512.02325
Thank You!
BACKUP SLIDES
What is Deep Learning Computer Vision?
● Computer vision: classification, detection, segmentation, recognition, ...
● Based on deep learning: "weak AI" for important but limited tasks; data-intensive, large memory footprint, expensive ops
Model inference := perform a CV task on an input image/video
"Accurate" inference := a model trained on 'input data' gives accurate predictions in deployment
Detect = Locate Object + Classify
● Astounding progress
● Data-driven models
● Deployable tech
● Dedicated hardware (optional)
Dragon: 86%
'Off-the-shelf' models are not enough (YOLOv3)
source: YouTube
SSD vs. YOLOv2
YOLO9000: Better, Faster, Stronger, arXiv:1612.08242
How to choose the prior-GT match for training?
Ground Truth = GT
Intersection over Union = IoU
"Good prior"
IoU: not *the* ideal choice for matching with priors
The difference between thresholds is not easy to see unaided.
Question: which priors here should be trained to match the dog's bounding box?
https://www.reddit.com/r/computervision/comments/876h0f/yolo_v3_released/dwd7hpm/
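For reference, the standard IoU criterion behind prior-GT matching (SSD-style training typically assigns a prior to a ground-truth box when their IoU exceeds roughly 0.5, plus the single best prior per GT):

```python
def iou(box_a, box_b):
    """Intersection over Union of two [x1, y1, x2, y2] boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)   # zero if no overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou([0, 0, 10, 10], [5, 5, 15, 15]))   # 25 / 175 ≈ 0.143
```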
