© 2019 allegro.ai
Optimizing
SSD Object Detection
for Low-power Devices
Moses Guttmann
CTO, allegro.ai
May 2019
Agenda
● Deep-learning computer vision: Towards embedded deployment
● Single-Shot-Detection: A short overview
● Prior design - A low hanging fruit for optimization
● Data-driven prior optimization
● Results
About allegro.ai
End2end platform optimized for DL based perception / CV
● Automated labeling
● Experiment management
● Dataset management
● Deep learning (Dev)Ops
● Continuous / active learning
Trusted by:
Towards Embedded
Deployment
Embedded Object Detection: Living on the Edge
Model Design equation:
+ Low memory
+ Efficiency (compute OPs)
+ Accuracy
= Bill-of-Materials
General rule for inference -
“large model” equals:
● Accurate
● Many operations
● High memory footprint
[Trade-off triangle: frame rate, accuracy, power draw. Choose two.]
Detection: Towards Embedded Applications
1. Function split: [feature extractor] + [detection heads]
2. Multiple heads for different tasks → shared feature extractor
3. Single-shot models → execution path is not dynamic
4. Use weak feature extractors → low operations count
5. Optional: model quantization → performance boost
☹ DLCV “state of the art” == High memory footprint & low FPS
Model “Zoo” - Which is Best for Me?
"Common" detection tasks: deployable architectures show only slight differences in accuracy.
Larger model → higher accuracy.
Source: https://github.com/amdegroot/ssd.pytorch
Where is the catch?
The problem with detection…
Timed on an Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
Non-optimized PyTorch
Simulated "embedded" CPU deployment
Source: https://github.com/amdegroot/ssd.pytorch
What is Going On?
In detection, getting the actual results means:
● Refining 10-20K suggestions into detections
● Expensive algorithms (e.g. NMS): 30-50% of processing time
Reducing the feature extractor size does not help here.
Opportunity:
● Lower the complexity of the refinement algorithm (hurts accuracy!)
OR
● Reduce the number of suggestions (preserves accuracy?!)
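The refinement cost the slide refers to can be made concrete with a minimal greedy NMS sketch (a generic textbook implementation, not allegro.ai's code). Note how the work per iteration scales with the number of surviving suggestions, which is why pruning priors speeds up post-processing, not just the network:

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression over [x1, y1, x2, y2] boxes.

    Worst-case cost grows quadratically with the number of input
    boxes, so fewer priors means a cheaper refinement stage.
    """
    order = np.argsort(scores)[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the best remaining box against all the others
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = ((boxes[order[1:], 2] - boxes[order[1:], 0])
                 * (boxes[order[1:], 3] - boxes[order[1:], 1]))
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_threshold]  # drop overlapping boxes
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))   # the near-duplicate second box is suppressed
```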
Single-Shot Detection
Not SSD: "many shot"
Two-stage, many-shot detection NNs (not SSD / YOLO):
1. Image → feature space → object-proposal generator
2. Object proposals → resample feature space → classifier
Computationally intensive: FPS depends on the number of proposals.
Speed/accuracy trade-offs for modern convolutional object detectors, arXiv:1611.10012
Single-Shot Detection
● Fast detection with a minor penalty in accuracy
● Predictions of different scales from different layers
● Significantly more proposals (~24K vs. ~6K for F-RCNN)
● Supports small objects
SSD: Single Shot MultiBox Detector, arXiv:1512.02325
Speed/accuracy trade-offs for modern convolutional object detectors, arXiv:1611.10012
YOLO / SSD
SSD: Single Shot MultiBox Detector, arXiv:1512.02325
YOLO9000: Better, Faster, Stronger, arXiv:1612.08242
Origin of Suggestions in Single-Shot
● Prior grid (box / proposal / anchor): a set of priors for each target "pixel" at each resolution
● Localization: mapping between priors and bounding boxes
● High-quality object classifier for every prior type
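For reference, a minimal sketch of how an SSD-style prior grid is typically generated. The feature-map sizes, scales, and aspect ratios below are illustrative assumptions, not the exact MobileNetV2@SSD configuration:

```python
import itertools
import math

def make_priors(feature_map_sizes, scales, aspect_ratios):
    """Generate center-form (cx, cy, w, h) priors, normalized to [0, 1].

    One prior per aspect ratio is placed at every feature-map cell, so
    the total is sum(fm * fm * len(aspect_ratios)): this is the number
    of suggestions the refinement stage later has to digest.
    """
    priors = []
    for fm, scale in zip(feature_map_sizes, scales):
        for i, j in itertools.product(range(fm), repeat=2):
            cx, cy = (j + 0.5) / fm, (i + 0.5) / fm   # cell center
            for ar in aspect_ratios:
                priors.append((cx, cy, scale * math.sqrt(ar),
                               scale / math.sqrt(ar)))
    return priors

# e.g. three hypothetical detection resolutions of a 512x512 model
priors = make_priors([64, 32, 16], [0.04, 0.1, 0.26], [1.0, 2.0, 0.5])
print(len(priors))   # 64*64*3 + 32*32*3 + 16*16*3 = 16128
```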
MobileNetV2@SSD (512x512): Prune Priors
* Number of priors (~total)
source: Allegro.ai - AI research team
Suggested Path for Optimization
● Previous work tuned priors for benchmark datasets; these do not necessarily fit real-world datasets
● The number and shapes of priors should be tailored to the objects (and their sizes)
● Enables selecting between accuracy and performance
● Independent of, and additive to, all other optimizations
Bonus points:
● Run the optimization as part of the pipeline (matched with the data)
● Automatically prune the model execution graph when priors are not generated for a specific scale (e.g. "no big objects in the dataset")
Data-Centric Approach - Toy Example
The size of all objects is known in advance → tune the priors for our specific purpose.
[Axis: prior size/scale, small to large]
source: Allegro.ai - AI research team
Data-Centric Approach - Optimization Example
● Remove the unused prior: #10
● Delete priors: #1, #2, #8
● Reshape the other priors
● Confirm they match all the examples in the dataset
[Axis: prior size/scale, small to large]
source: Allegro.ai - AI research team
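The toy pruning can be sketched as a shape-only IoU check: a prior whose shape matches no box in the dataset can be deleted outright. Everything here (the `shape_iou` helper, the 0.5 threshold, the example shapes) is hypothetical illustration, not the published method:

```python
def shape_iou(wh_a, wh_b):
    """IoU of two boxes sharing the same center: compares shape only."""
    inter = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    union = wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - inter
    return inter / union

def prune_unused(priors, gt_shapes, thresh=0.5):
    """Keep only priors that match at least one dataset box shape."""
    return [p for p in priors
            if any(shape_iou(p, g) >= thresh for g in gt_shapes)]

# hypothetical prior shapes (w, h) and dataset box shapes, normalized
priors = [(0.1, 0.1), (0.5, 0.25), (0.9, 0.9)]
gt = [(0.45, 0.3), (0.12, 0.09)]
print(prune_unused(priors, gt))   # the 0.9x0.9 prior is never matched
```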
Example
Problem Definition
Task: “Pet detector”
Classes: “cat”, “dog”, “bird”
Examples taken from
VOC/COCO “train” sets.
Size: 24K ROIs
Unique priors: 24.5K (36 types)
[Chart: detector-prior match vs. prior size/scale]
source: Allegro.ai - AI research team
Optimized Prior Matching
Task: “Pet detector”
Classes: “cat”, “dog”, “bird”
Examples taken from
VOC/COCO “train” sets.
Size: 24K ROIs
Unique priors: 16K (21 types)
[Chart: detector-prior match vs. prior size/scale]
source: Allegro.ai - AI research team
Method: (I) Collect Statistics (with Augmentations)
“Dataset as Database”
Apply data augmentations
Collect object bounding boxes
source: Allegro.ai - AI research team
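A hedged sketch of step (I): run each annotated box through the same kind of random augmentations used at training time and record the resulting (w, h) population. The scale-jitter range and sample count are made-up stand-ins for the real augmentation pipeline:

```python
import random

def collect_box_stats(annotations, n_augment=10, seed=0):
    """Pass each ground-truth (w, h) box through random augmentations
    (here just scale jitter, as a stand-in for the real training
    pipeline) and collect the resulting normalized shape population."""
    rng = random.Random(seed)
    stats = []
    for w, h in annotations:
        for _ in range(n_augment):
            s = rng.uniform(0.75, 1.33)                  # scale jitter
            stats.append((min(w * s, 1.0), min(h * s, 1.0)))
    return stats

boxes = [(0.2, 0.3), (0.5, 0.4)]       # hypothetical annotations
population = collect_box_stats(boxes)
print(len(population))                 # 2 boxes x 10 augmentations = 20
```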
Method: (II) Partition to Detection Resolutions
Model architecture
Partition the box population by resolution/scale (small to large)
source: Allegro.ai - AI research team
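Step (II) can be sketched as binning the box population by a sqrt(w*h) scale measure. The bin edges below are illustrative assumptions, not the model's real scale boundaries:

```python
import math

def partition_by_scale(box_shapes, scale_edges):
    """Assign each (w, h) box to the detection resolution whose scale
    range contains it; bins are defined by sqrt(w*h) thresholds."""
    bins = [[] for _ in range(len(scale_edges) + 1)]
    for w, h in box_shapes:
        s = math.sqrt(w * h)
        idx = sum(s >= edge for edge in scale_edges)   # count edges passed
        bins[idx].append((w, h))
    return bins

# hypothetical scale boundaries between three detection resolutions
bins = partition_by_scale([(0.05, 0.05), (0.2, 0.2), (0.7, 0.6)],
                          [0.1, 0.4])
print([len(b) for b in bins])   # [1, 1, 1]
```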
Method: (III) Weighted Clustering
Clustering using naive K-means
Data-bias-aware weighting function
Ensures "fair" priors
source: Allegro.ai - AI research team
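Step (III) amounts to Lloyd's K-means with per-sample weights: over-represented box shapes can be down-weighted so rare shapes still get a "fair" prior. A minimal sketch; the actual bias-aware weighting function is dataset-specific and is passed in here as plain per-sample weights:

```python
import numpy as np

def weighted_kmeans(points, weights, k, n_iter=50, seed=0):
    """Lloyd's algorithm where centroid updates use a weighted average,
    so heavily duplicated samples do not dominate the resulting priors."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(points[:, None] - centers[None], axis=2)
        assign = d.argmin(axis=1)              # nearest center per sample
        for c in range(k):
            m = assign == c
            if m.any():
                centers[c] = np.average(points[m], axis=0,
                                        weights=weights[m])
    return centers, assign

# hypothetical (w, h) population: two distinct shape groups
shapes = np.array([[0.1, 0.1], [0.12, 0.11], [0.09, 0.1],
                   [0.6, 0.5], [0.62, 0.48], [0.58, 0.52]])
w = np.ones(len(shapes))   # a real pipeline would down-weight duplicates
centers, assign = weighted_kmeans(shapes, w, k=2)
```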
Method: (IV) Merge Similar/Prune: "Light" Optimization
Small boxes are redundant
Negligible accuracy decrease
source: Allegro.ai - AI research team
Method: (V) Merge Similar/Prune: "Hard"
Optimization II: greedy merging strategy
Decreases the number of priors
Small cost in accuracy
source: Allegro.ai - AI research team
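One possible reading of the greedy "hard" merge, sketched with a shape-only IoU and a midpoint merge rule. The 0.8 threshold and the merge rule are assumptions for illustration, not the published method:

```python
def merge_similar(priors, merge_thresh=0.8):
    """Greedily merge the most-similar pair of prior shapes until no
    pair exceeds the threshold: fewer priors, fewer suggestions to refine."""
    def shape_iou(a, b):
        inter = min(a[0], b[0]) * min(a[1], b[1])
        return inter / (a[0] * a[1] + b[0] * b[1] - inter)

    priors = list(priors)
    while len(priors) > 1:
        # find the pair of priors with the highest shape-IoU
        best = max((shape_iou(priors[i], priors[j]), i, j)
                   for i in range(len(priors))
                   for j in range(i + 1, len(priors)))
        iou, i, j = best
        if iou < merge_thresh:
            break
        a, b = priors[i], priors.pop(j)
        priors[i] = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)  # midpoint merge
    return priors

# hypothetical prior shapes: two near-identical small priors get merged
print(merge_similar([(0.10, 0.10), (0.11, 0.10), (0.50, 0.30)]))
```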
Results
MobileNetV2@SSD (512x512): Prune Results
* Number of priors (~total)
source: Allegro.ai - AI research team
MobileNetV2@SSD (640x480): Prune Results
* Number of priors (~total)
source: Allegro.ai - AI research team
Take-Home Messages
● Successful implementation of data-driven optimization
● Applicable to any SSD meta-architecture (SSD, DSSD, FPN)
● Change the input size and get optimized priors for any input resolution, depending on deployment
Future Work
● AutoML: pruning toward a required accuracy and model footprint
● "Reverse optimization": flag biased datasets that need more examples where objects are underrepresented
● Mask optimization for instance segmentation (Mask R-CNN etc.)
Resources
Tools used:
● allegro.ai deep learning perception platform
● Deep learning framework: PyTorch
Research papers:
● Speed/accuracy trade-offs for modern convolutional object
detectors, arXiv:1611.10012
● SSD: Single Shot MultiBox Detector, arXiv:1512.02325
Thank You!
BACKUP SLIDES
What is Deep Learning Computer Vision?
● Computer vision: classification, detection, segmentation, recognition, ...
● Based on deep learning: "weak AI" for important but limited tasks; data-intensive, large memory footprint, expensive ops
Model inference := perform a CV task on an input image/video
"Accurate" inference := a model trained on 'input data' gives accurate predictions in deployment
Detect = Locate Object + Classify
● Astounding progress
● Data-driven models
● Deployable tech
● Dedicated hardware (optional)
Dragon: 86%
'Off-the-shelf' models are not enough (YOLOv3)
source: YouTube
SSD vs. YOLOv2
YOLO9000: Better, Faster, Stronger, arXiv:1612.08242
How to choose the prior-GT match for training?
Ground Truth = GT
Intersection over Union = IoU
"Good prior"
IoU: not *the* ideal choice for matching with priors
The difference between thresholds is not easy to see unaided.
Question: which priors here should be trained to match the dog's bounding box?
https://www.reddit.com/r/computervision/comments/876h0f/yolo_v3_released/dwd7hpm/
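For reference, the standard IoU criterion behind prior-GT matching (SSD-style training typically assigns a prior to a ground-truth box when their IoU exceeds roughly 0.5, plus the single best prior per GT):

```python
def iou(box_a, box_b):
    """Intersection over Union of two [x1, y1, x2, y2] boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)   # zero if no overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou([0, 0, 10, 10], [5, 5, 15, 15]))   # 25 / 175 ≈ 0.143
```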
