Semantic segmentation?
But what are you talking about?
Ricardo Guerrero Gómez-Olmedo
Email: ricardo.guerrero@beeva.com
Twitter: @ricgu8086
PoC Semantic Segmentation
Inference Series I
BEE PART OF THE CHANGE
Avenida de Burgos, 16 D, 28036 Madrid
hablemos@beeva.com
www.beeva.com
ToC
1. Detection Vs Semantic Segmentation.
2. What is… inference?
3. Technology
4. Neural… wait for it … Networks
5. Dataset
6. Qualitative results
7. Quantitative results: metrics
8. Conclusions
Making of
Detection Vs Semantic segmentation
Applications of semantic segmentation
Tags:
interactions between objects, commerce, health, Augmented Reality
What is... inference?
1st: explore your data
2nd: train your model
3rd: use it (a.k.a. inference)
Technology
Caffe
● Robust
● Efficient
● Mature
● Huge “Model Zoo”
● In production
● Doesn’t break the API every 3 months
● Still in use in research and in industry
● But slowly losing popularity to TensorFlow, PyTorch, Caffe2 …
Compatibility with custom hardware: Intel Neural Stick.
NOT a special CPU. It’s ASIC hardware: it does only one thing, but it’s the best at it.
DISCLAIMER:
No Neural Stick was used in this POC (yet). It’s
in the roadmap, but this time an AWS instance
was used to limit the uncertainty in the first stage.
One problem at a time
Neural… wait for it… Networks
LeNet-5 by Yann LeCun
ICLR 2017
This happy guy is me.
This is Yann LeCun. World-class Deep Learning expert. Director of Facebook AI Research.
MNIST Dataset
Classify digits in bank checks (1998)
Neural… wait for it… Networks
Fully Convolutional Networks for Semantic Segmentation
Journal: PAMI (accepted May 2016)
Not state-of-the-art!!
FCN: Fully Convolutional Network
Download trained model
FCN output: depth = nº classes
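The "depth = nº classes" annotation can be read as follows: the network produces one score map per class, and the label of each pixel is the index of the channel with the highest score. The sketch below is a pure-Python illustration with invented scores for 3 classes on a 2x2 image. `scores_to_labels` is a hypothetical helper, not part of Caffe or the FCN code; in practice this would typically be a single argmax over the channel axis.

```python
def scores_to_labels(scores):
    """Turn a [class][row][col] score volume into a [row][col] label map."""
    n_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    labels = []
    for r in range(height):
        row = []
        for c in range(width):
            # Pick the class whose score map has the highest value at this pixel.
            best = max(range(n_classes), key=lambda k: scores[k][r][c])
            row.append(best)
        labels.append(row)
    return labels


# Toy volume: 3 classes, 2x2 image (values are invented for illustration).
toy_scores = [
    [[0.90, 0.10], [0.20, 0.30]],  # scores for class 0
    [[0.05, 0.80], [0.10, 0.30]],  # scores for class 1
    [[0.05, 0.10], [0.70, 0.40]],  # scores for class 2
]
label_map = scores_to_labels(toy_scores)
print(label_map)
```

For PASCAL VOC the volume would have 21 channels (20 classes + background) instead of 3.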
Dataset
We focus on this part!!
21 effective classes: 20 + background.
Contains an ignore label (value 255).
Object segmentation Class segmentation
Qualitative results
What is this?
12 = dog
3 = bird
(not visible with
this colormap)
0 = background
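To read a label map like this one, it helps to translate indices back into names. A minimal sketch, assuming the standard PASCAL VOC class order (it agrees with the indices on the slides: 3 = bird, 12 = dog) and the ignore value 255 mentioned later in the talk. `VOC_CLASSES`, `classes_present` and the toy mask are names and data introduced here for illustration only.

```python
VOC_CLASSES = [
    "background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus",
    "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
    "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]
IGNORE_LABEL = 255  # marks pixels that should not be evaluated


def classes_present(label_map):
    """Return {index: name} for every class index found in a 2D label map."""
    found = {value for row in label_map for value in row}
    found.discard(IGNORE_LABEL)  # the ignore label is not a real class
    return {idx: VOC_CLASSES[idx] for idx in sorted(found)}


# Invented 3x4 mask containing background, a bird, a dog and one ignored pixel.
toy_mask = [
    [0, 0, 3, 3],
    [0, 12, 12, 255],
    [0, 12, 12, 0],
]
print(classes_present(toy_mask))
```

Because the stored values are small integers (0 to 20), neighbouring classes can look almost identical under a grayscale colormap, which is why some of them are hard to see in the figure.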
What is this?
How do we get from left to right?
PASCAL VOC 2012 Ground Truth
What does Ground Truth (or GT) mean?
Have you heard about labels, bounding boxes, segmentation masks, etc.?
You don’t train with data (*). You train with annotated data.
* OK, unsupervised learning and semi-supervised learning also exist, but they are not our focus here.
Remember, this is the ignore label
1 = plane
9 = chair
4 = boat
1 = plane
20 = tv monitor
(not visible with this
colormap)
0 = background
Quantitative results & metrics
Why?
How do we compare models?
Metrics
Most common metrics:
● IoU (Intersection over Union), a.k.a. Jaccard Index.
● Pixel accuracy.
Traits:
More natural, closest to what humans expect.
Too restrictive. Easiest to compute.
IoU = Area of Overlap (a.k.a. Area of Intersection) / Area of Union
Pixel accuracy = (Correctly classified pixels / Total pixels) × 100 %
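Both metrics can be computed directly from a predicted label map and its ground truth. The sketch below is a minimal pure-Python version on an invented 2x4 example; `pixel_accuracy` and `class_iou` are illustrative helpers, not the functions used in the PoC.

```python
def _pairs(pred, gt):
    """Yield (predicted, ground-truth) label pairs, pixel by pixel."""
    for pred_row, gt_row in zip(pred, gt):
        yield from zip(pred_row, gt_row)


def pixel_accuracy(pred, gt):
    """Fraction of pixels whose predicted label equals the ground truth."""
    pairs = list(_pairs(pred, gt))
    correct = sum(1 for p, g in pairs if p == g)
    return correct / len(pairs)


def class_iou(pred, gt, cls):
    """Intersection over Union (Jaccard index) for a single class."""
    pairs = list(_pairs(pred, gt))
    intersection = sum(1 for p, g in pairs if p == cls and g == cls)
    union = sum(1 for p, g in pairs if p == cls or g == cls)
    # If the class appears in neither map, IoU is undefined.
    return intersection / union if union else float("nan")


# Invented example: 0 = background, 12 = dog.
gt = [
    [0, 0, 12, 12],
    [0, 0, 12, 12],
]
pred = [
    [0, 12, 12, 12],
    [0, 0, 0, 12],
]
print(pixel_accuracy(pred, gt))  # 6 of 8 pixels match
print(class_iou(pred, gt, 12))   # 3 shared dog pixels out of 5 in the union
```

The same prediction scores 0.75 in pixel accuracy but only 0.6 in IoU for the dog class, because IoU also counts the pixels where just one of the two maps says "dog".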
IoU: Why not just Area of Overlap?
My monitor is here:
Overlap 100%
Pascal VOC Challenge criteria:
IoU >= 0.5 hit ✓
IoU < 0.5 miss X
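The "Overlap 100%" case and the hit/miss criterion can be checked with a few lines of arithmetic. A minimal sketch, assuming axis-aligned boxes given as `(x1, y1, x2, y2)`; the coordinates and helper names are invented for illustration. A prediction that covers the whole image overlaps the monitor completely, yet its IoU is tiny because the union is huge.

```python
def box_area(box):
    """Area of an (x1, y1, x2, y2) box; empty or inverted boxes count as 0."""
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)


def box_intersection(a, b):
    """Area shared by two boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return box_area((x1, y1, x2, y2))


def box_iou(a, b):
    """Intersection over Union of two boxes."""
    inter = box_intersection(a, b)
    union = box_area(a) + box_area(b) - inter
    return inter / union if union else 0.0


def is_hit(pred, gt, threshold=0.5):
    """Hit/miss rule from the slide: IoU >= 0.5 counts as a hit."""
    return box_iou(pred, gt) >= threshold


monitor = (40, 30, 60, 50)       # ground truth: a 20x20 box
whole_image = (0, 0, 100, 100)   # lazy prediction: the entire 100x100 image
good_guess = (42, 32, 62, 52)    # prediction shifted by 2 pixels

overlap = box_intersection(whole_image, monitor) / box_area(monitor)
print(overlap)                          # the monitor is fully covered
print(box_iou(whole_image, monitor))    # but the union is 25x larger
print(is_hit(whole_image, monitor), is_hit(good_guess, monitor))
```

Dividing by the union is what penalizes oversized predictions: covering everything no longer pays off.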
Quantitative results: Pixel accuracy
● First measurement, without using the ignore label (value 255).
Report: the mean pixel accuracy for the testing split (100 images) is 0.685
● Second measurement, using the ignore label.
Report: the mean pixel accuracy for the testing split (100 images) is 0.745
To take away:
● If we don’t use the ignore label, we penalize our model for things we don’t really care about.
● Remember that FCN is not state-of-the-art: state-of-the-art results are much better.
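The effect of the ignore label can be reproduced on a toy example. A minimal sketch, assuming ground-truth pixels with value 255 are the ones to skip; `masked_pixel_accuracy` and the tiny label maps are invented for illustration and are not the PoC's `compute_metrics.py`.

```python
IGNORE_LABEL = 255


def masked_pixel_accuracy(pred, gt, ignore_label=None):
    """Pixel accuracy that optionally skips ground-truth pixels marked as ignore."""
    correct = 0
    total = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if ignore_label is not None and g == ignore_label:
                continue  # this pixel is not evaluated at all
            total += 1
            correct += int(p == g)
    return correct / total if total else float("nan")


# Invented example: the second column of the ground truth is "ignore".
gt = [
    [0, 255, 12, 12],
    [0, 255, 12, 12],
]
pred = [
    [0, 0, 12, 12],
    [0, 12, 12, 0],
]
without_ignore = masked_pixel_accuracy(pred, gt)
with_ignore = masked_pixel_accuracy(pred, gt, ignore_label=IGNORE_LABEL)
print(without_ignore, with_ignore)
```

A network can never output 255, so every ignored pixel counts as an error unless it is masked out; here that alone moves the score from 0.625 to about 0.833.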
Quantitative results: Timing
Dataset total images: 1449.
Limiting the analysis to the first 100.
Total: 766.160 s
Mean: 7.661 s
Variance: 0.005 s²
Median: 7.654 s
>> time python compute_metrics.py
real 12m51.975s
user 12m46.936s
sys 0m2.712s
5.815 s overhead (real time minus total inference time): loading libraries, Caffe engine, restoring network, resizing images, etc.
● AWS M5.large (no GPU)
● Deep Learning AMI Ubuntu Linux - 2.4_Oct2017 - ami-37bb714d
● Image by image, not batch.
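Per-image figures like these can be collected with the standard library alone. A minimal sketch, assuming one forward pass per image; `fake_inference` is a stand-in for the real Caffe call, and it is an assumption that the reported variance is the sample variance rather than the population variance.

```python
import statistics
import time


def time_per_image(run_inference, images):
    """Run inference image by image and return the duration of each call in seconds."""
    durations = []
    for image in images:
        start = time.perf_counter()
        run_inference(image)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Total, mean, variance (in s^2) and median of a list of durations."""
    return {
        "total": sum(durations),
        "mean": statistics.mean(durations),
        "variance": statistics.variance(durations),  # sample variance
        "median": statistics.median(durations),
    }


def fake_inference(image):
    """Placeholder for the network forward pass (hypothetical)."""
    time.sleep(0.001)


stats = summarize(time_per_image(fake_inference, range(5)))
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

Timing only the inference call, and comparing the total against the wall-clock time of the whole script, is what separates the per-image cost from the start-up overhead.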
Conclusions & Future steps
● Deep Learning AMI: well prepared. Not well documented.
● Caffe: easy to use (remember it was already installed). Extremely non-verbose, and the Python code is very readable.
● PASCAL VOC: very good dataset. Semantic segmentation labels are very difficult to load (with the right code it’s just 1 line).
● Semantic segmentation: more complex than other tasks such as classification or object detection.
● Recommendation: Caffe is a very good option, but more modern options such as Caffe2 or PyTorch should be tested and compared.
More info
1. Inference series I: How to use Caffe with AWS’ Deep Learning AMI for Semantic Segmentation
2. Inference series I [2nd round]: How to use Caffe with AWS’ Deep Learning AMI for Semantic Segmentation
Future steps?
Objective:
1. Reduce model size -> AWS Lambda, IoT
2. Increase speed
3. Keep the same accuracy
It’s a tradeoff
How?
1. Model compression
2. Custom hardware accelerators:
a. Intel Neural stick
b. Google TPU
c. NVIDIA Volta (tensor cores)
Any questions?
Ricardo Guerrero Gómez-Olmedo
Email: ricardo.guerrero@beeva.com
Twitter: @ricgu8086
Medium: medium.com/@ricardo.guerrero
IT Researcher | BEEVA LABS
hablemos@beeva.com | www.beeva.com

Meetup Python Madrid 2018: ¿Segmentación semántica? ¿Pero de qué me estás hablando? - Ricardo Guerrero