Meetup Python Madrid 2018: Semantic segmentation? What are you even talking about? - Ricardo Guerrero

Semantic segmentation?
What are you even talking about?
Ricardo Guerrero Gómez-Olmedo
Email: ricardo.guerrero@beeva.com
Twitter: @ricgu8086

PoC Semantic Segmentation
Inference Series I
BEE PART OF THE CHANGE
Avenida de Burgos, 16 D, 28036 Madrid
hablemos@beeva.com
www.beeva.com

3
ToC
1. Detection Vs Semantic Segmentation.
2. What is… inference?
3. Technology
4. Neural… wait for it … Networks
5. Dataset
6. Qualitative results
7. Quantitative results: metrics
8. Conclusions

5
Detection Vs Semantic segmentation

6
Detection Vs Semantic segmentation
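The difference between the two outputs can be shown with a toy example: a detector returns one box per object, while a semantic segmentation model returns a class label for every pixel. The 4×6 label map below is made up for illustration (12 = dog, 0 = background, following the PASCAL VOC indices used later in the slides), and `bounding_box` is a hypothetical helper. The sketch derives the tightest box around the dog and compares its area with the number of dog pixels.

```python
# Hypothetical 4x6 label map: 0 = background, 12 = dog (PASCAL VOC indices).
LABEL_MAP = [
    [0, 0, 0, 0, 0, 0],
    [0, 12, 12, 0, 0, 0],
    [0, 12, 12, 12, 0, 0],
    [0, 0, 0, 0, 0, 0],
]


def bounding_box(label_map, class_id):
    """Return (row_min, col_min, row_max, col_max) covering every pixel of class_id, or None."""
    coords = [(r, c) for r, row in enumerate(label_map)
              for c, value in enumerate(row) if value == class_id]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)


def box_area(box):
    """Number of pixels inside an inclusive (row_min, col_min, row_max, col_max) box."""
    r0, c0, r1, c1 = box
    return (r1 - r0 + 1) * (c1 - c0 + 1)


box = bounding_box(LABEL_MAP, 12)                     # what a detector would report
mask_pixels = sum(row.count(12) for row in LABEL_MAP)  # what segmentation reports
print(box)                         # (1, 1, 2, 3)
print(box_area(box), mask_pixels)  # 6 5
```

The box covers 6 pixels but only 5 belong to the dog: segmentation keeps the exact shape that the box throws away.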

7
Applications of semantic segmentation
Tags:
interactions between objects, commerce, health, Augmented Reality

10
What is... inference?
1st: explore your data
2nd: train your model
3rd: use it (a.k.a. inference)
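A toy sketch of the last two steps, deliberately unrelated to FCN: "training" fits one centroid per class from labelled 1-D samples, and "inference" only applies the frozen result to new inputs. The function names and the data are hypothetical.

```python
def train(samples):
    """'Training': learn one centroid (mean value) per class from (value, label) pairs."""
    sums, counts = {}, {}
    for x, label in samples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def infer(model, x):
    """'Inference': apply the frozen model (nearest centroid); nothing is learned here."""
    return min(model, key=lambda label: abs(model[label] - x))


model = train([(0.9, "cat"), (1.1, "cat"), (4.8, "dog"), (5.2, "dog")])
print(infer(model, 1.3))  # cat
print(infer(model, 4.0))  # dog
```

Training is done once and can be expensive; inference reuses the stored model for every new input, which is the part this PoC measures.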

12
Technology
● Robust
● Efficient
● Mature
● Huge “Model Zoo”
● In production
● Doesn’t break the API every 3 months
● Still in use in research and in industry
● But its popularity is slowly decreasing in favor of
TensorFlow, PyTorch, Caffe2…
Compatibility with custom
hardware: Intel Neural Compute Stick.
It is NOT a special CPU: it is ASIC hardware.
It does only one thing, but it is the best at it.

13
Technology
DISCLAIMER:
No Neural Stick was used in this PoC (yet). It's
on the roadmap, but this time an AWS instance
was used to limit uncertainty in the first stage.
One problem at a time.

14
Neural… wait for it…
Networks

15
Neural… wait for it… Networks
LeNet-5 by Yann LeCun

16
ICLR 2017
This happy guy is me.
MNIST
This is Yann LeCun.
World-class Deep Learning expert.
Director of Facebook AI Research.

17
MNIST Dataset
Classifying digits on bank checks (1998)
MNIST

18
Fully Convolutional Networks for Semantic Segmentation
Journal: PAMI (accepted May 2016)
Not state-of-the-art!!
FCN: Fully Convolutional Network
Download trained model

19

20

21
FCN: Fully Convolutional Network
Depth = number of classes
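Since the output has one channel per class, turning it into a segmentation map means picking, for every pixel, the class with the highest score (in NumPy terms, `scores.argmax(axis=0)` on a `(num_classes, H, W)` array). A pure-Python sketch with made-up scores for 3 classes on a 2×2 image; for PASCAL VOC the depth would be 21. `scores_to_label_map` is a hypothetical name.

```python
def scores_to_label_map(scores):
    """Collapse a (num_classes, H, W) score volume into an (H, W) label map.

    scores[k][r][c] is the score of class k at pixel (r, c); each pixel gets
    the index of its highest-scoring class.
    """
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda k: scores[k][r][c]) for c in range(width)]
        for r in range(height)
    ]


# Made-up scores: 3 classes (depth = 3) for a 2x2 image.
SCORES = [
    [[0.90, 0.10], [0.20, 0.30]],  # class 0 (background)
    [[0.05, 0.80], [0.10, 0.30]],  # class 1
    [[0.05, 0.10], [0.70, 0.40]],  # class 2
]
print(scores_to_label_map(SCORES))  # [[0, 1], [2, 2]]
```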

26
We focus on this part!!
Dataset
21 effective classes: 20 + background.
Contains an ignore label (value 255).
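For reference, the 21 PASCAL VOC 2012 class indices can be written down as a lookup table, together with the ignore value 255 that marks object boundaries and ambiguous regions. The list follows the official VOC class names (the slides abbreviate "aeroplane" as "plane" and "tvmonitor" as "tv monitor"); `class_name` is a hypothetical helper.

```python
# PASCAL VOC 2012 segmentation classes, indexed as they appear in the label PNGs.
VOC_CLASSES = [
    "background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus",
    "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
    "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]
IGNORE_LABEL = 255  # boundaries / ambiguous regions, excluded from scoring


def class_name(label):
    """Map a label value from a VOC segmentation mask to a readable name."""
    if label == IGNORE_LABEL:
        return "ignore"
    return VOC_CLASSES[label]


print(len(VOC_CLASSES))                                # 21
print(class_name(12), class_name(3), class_name(255))  # dog bird ignore
```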

27
Dataset
Object segmentation vs. class segmentation

30
Qualitative results
What is this?

32
Qualitative results
12 = dog
3 = bird (not visible with this colormap)
0 = background
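Whether a class is visible depends on the colormap used for display. The PASCAL VOC devkit colors labels with a palette built by spreading the bits of each class index across the R, G and B channels. The sketch below reproduces that well-known palette in pure Python (it is not the author's plotting code); the inverse table `COLOR_TO_LABEL` is an illustrative addition.

```python
def voc_colormap(n=256):
    """Build the PASCAL VOC palette as a list mapping label index -> (R, G, B).

    Bits 0, 1, 2 of the index go to the most significant bit of R, G, B;
    bits 3, 4, 5 go to the next bit, and so on.
    """
    def bit(value, idx):
        return (value >> idx) & 1

    cmap = []
    for i in range(n):
        r = g = b = 0
        c = i
        for j in range(8):
            r |= bit(c, 0) << (7 - j)
            g |= bit(c, 1) << (7 - j)
            b |= bit(c, 2) << (7 - j)
            c >>= 3
        cmap.append((r, g, b))
    return cmap


CMAP = voc_colormap()
# Inverse lookup: recover label indices from an RGB-decoded label image.
COLOR_TO_LABEL = {color: label for label, color in enumerate(CMAP)}

print(CMAP[0], CMAP[3], CMAP[12])  # (0, 0, 0) (128, 128, 0) (64, 0, 128)
print(CMAP[20], CMAP[255])         # (0, 64, 128) (224, 224, 192)
```

With this palette, 20 = tv monitor is dark blue (0, 64, 128) and the ignore value 255 is a light cream (224, 224, 192).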

33
Qualitative results
What is this?

34
Qualitative results
How do we get from left to right?

35
Qualitative results
PASCAL VOC 2012 Ground Truth

36
What does Ground Truth (or GT) mean?
Have you heard about labels,
bounding boxes, segmentation
masks, etc.?
You don't train with data (*).
You train with annotated data.
* OK, there is also what is called unsupervised learning and
semi-supervised learning, but that's not our focus here.

37
Qualitative results
Remember, this is the ignore label

38
Qualitative results
1 = plane
9 = chair
4 = boat
20 = TV monitor (not visible with this colormap)
0 = background

39
Quantitative results
& metrics

40
Why?
1 = plane
9 = chair
4 = boat
20 = TV monitor (not visible with this colormap)
0 = background
How do we compare models?

41
Metrics
Most common metrics:
● IoU (Intersection over Union), a.k.a. Jaccard index.
● Pixel accuracy.
Traits:
More natural, closest to what humans expect.
Too restrictive. Easiest to compute.
IoU = Area of overlap (a.k.a. area of intersection) / Area of union
Pixel accuracy = Correctly classified pixels / Total pixels (%)
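A minimal sketch of IoU for one class on flattened label maps, plus its average over classes. This is a per-image simplification for illustration: the official PASCAL VOC evaluation accumulates intersections and unions over the whole dataset before dividing, and it also skips pixels with the ignore label. Function names and data are hypothetical.

```python
def class_iou(gt, pred, class_id):
    """IoU for one class over two flattened label maps of equal length.

    Intersection: pixels labelled class_id in both maps.
    Union: pixels labelled class_id in at least one map.
    Returns None when the class appears in neither map.
    """
    intersection = union = 0
    for g, p in zip(gt, pred):
        if g == class_id and p == class_id:
            intersection += 1
        if g == class_id or p == class_id:
            union += 1
    return intersection / union if union else None


def mean_iou(gt, pred, class_ids):
    """Average IoU over the classes that appear in gt or pred."""
    ious = [class_iou(gt, pred, k) for k in class_ids]
    present = [v for v in ious if v is not None]
    return sum(present) / len(present)


# Hypothetical flattened 2x3 maps: 0 = background, 12 = dog.
gt = [0, 0, 0, 0, 12, 12]
pred = [0, 0, 0, 12, 12, 12]
print(class_iou(gt, pred, 0))   # 0.75
print(class_iou(gt, pred, 12))  # 0.6666666666666666
print(mean_iou(gt, pred, [0, 12, 3]))  # class 3 (bird) is absent and skipped
```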

42
Metrics
IoU: why not just the area of overlap?

43
Metrics
My monitor is here:
Overlap?

44
Metrics
My monitor is here:
Overlap?
Overlap 100%
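The monitor example can be reproduced numerically: if the "prediction" is the whole image, it trivially overlaps 100% of the object, while IoU exposes the cheat because the union grows with every extra pixel. The image size and monitor position below are made up for illustration.

```python
def region(r0, c0, r1, c1):
    """Set of (row, col) pixels in the half-open rectangle [r0, r1) x [c0, c1)."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


monitor = region(40, 40, 60, 60)      # hypothetical 20x20 ground-truth monitor
whole_image = region(0, 0, 100, 100)  # "prediction": every pixel of a 100x100 image

overlap = len(monitor & whole_image) / len(monitor)            # fraction of GT covered
iou = len(monitor & whole_image) / len(monitor | whole_image)  # intersection / union
print(overlap)  # 1.0
print(iou)      # 0.04
```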

45
Metrics
Area of overlap, a.k.a. area of intersection
PASCAL VOC Challenge criterion:
IoU ≥ 0.5 → hit ✓
IoU < 0.5 → miss ✗
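The VOC criterion was defined for detection boxes, where IoU can be computed directly from box corners. A sketch with boxes as (x_min, y_min, x_max, y_max) in continuous coordinates; the box format and function names are assumptions for illustration.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def is_hit(pred_box, gt_box, threshold=0.5):
    """PASCAL VOC rule: a prediction counts as a hit when IoU >= threshold."""
    return box_iou(pred_box, gt_box) >= threshold


gt_box = (0, 0, 10, 10)
print(is_hit((2, 0, 12, 10), gt_box))  # True  (IoU = 80 / 120 ≈ 0.67)
print(is_hit((5, 0, 15, 10), gt_box))  # False (IoU = 50 / 150 ≈ 0.33)
```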

47
Quantitative results:
Pixel accuracy
Pixel accuracy:
● First measurement, without using the ignore label
(value 255).
Report: the mean pixel accuracy for the testing
split (100 images) is 0.685.
● Second measurement, using the ignore label.
Report: the mean pixel accuracy for the testing
split (100 images) is 0.745.
To take away:
● If we don't use the ignore label, we penalize our
model for things we don't really care about.
● Remember FCN is not state-of-the-art: state-of-the-art
models achieve much better results.
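The effect of the ignore label can be reproduced on a toy example. The model never predicts 255, so every ground-truth pixel labelled 255 counts as an error unless it is excluded from the count. A sketch on made-up flattened label maps; `pixel_accuracy` is a hypothetical helper, not the script used for the reported numbers.

```python
IGNORE_LABEL = 255  # PASCAL VOC "void" value


def pixel_accuracy(gt, pred, ignore_label=None):
    """Correctly classified pixels / total pixels, skipping GT pixels equal to ignore_label."""
    correct = total = 0
    for g, p in zip(gt, pred):
        if ignore_label is not None and g == ignore_label:
            continue  # this pixel does not count either way
        total += 1
        if g == p:
            correct += 1
    return correct / total


# Made-up maps: 255 marks the boundary around a dog (12).
gt = [0, 0, 255, 12, 12, 255, 0, 0]
pred = [0, 0, 12, 12, 12, 0, 0, 0]
print(pixel_accuracy(gt, pred))                # 0.75
print(pixel_accuracy(gt, pred, IGNORE_LABEL))  # 1.0
```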

48
Quantitative results: Timing
Timing
Dataset total images: 1449.
Limiting the analysis to the first 100.
Total: 766.160 s
Mean: 7.661 s
Variance: 0.005 s²
Median: 7.654 s
>> time python compute_metrics.py
real 12m51.975s
user 12m46.936s
sys 0m2.712s
5.815 s overhead (real time minus total inference
time: 771.975 s - 766.160 s): loading libraries, Caffe
engine, restoring the network, resizing images, etc.
● AWS m5.large (no GPU)
● Deep Learning AMI Ubuntu Linux
- 2.4_Oct2017 - ami-37bb714d
● Images processed one by one, not in batches.
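The timing figures above can be produced with the standard library alone. The sketch times each image separately (matching the image-by-image setup) and parses a `time` field to recover the overhead. `run_inference` is a hypothetical stand-in for the Caffe forward pass, and since the slide does not say whether its variance is the population or the sample variance, population variance is assumed here.

```python
import re
import statistics
import time


def time_per_image(images, run_inference):
    """Time each call of a hypothetical run_inference(image) separately (no batching)."""
    durations = []
    for image in images:
        start = time.perf_counter()
        run_inference(image)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Total, mean, population variance and median of per-image times (seconds)."""
    return {
        "total": sum(durations),
        "mean": statistics.mean(durations),
        "variance": statistics.pvariance(durations),
        "median": statistics.median(durations),
    }


def parse_time_output(value):
    """Convert a bash `time` field such as '12m51.975s' into seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value)
    if match is None:
        raise ValueError(f"unexpected format: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


real = parse_time_output("12m51.975s")
overhead = real - 766.160  # everything that is not per-image inference
print(round(real, 3), round(overhead, 3))  # 771.975 5.815
```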

50
Conclusions
● Deep Learning AMI: well prepared, but not well documented.
● Caffe: easy to use (remember it was already installed). Very
concise; the Python code is very readable.
● PASCAL VOC: very good dataset. Loading the semantic segmentation
labels is very tricky (although with the right code it's just 1 line).
● Semantic segmentation: more complex than other tasks such
as classification or object detection.
● Recommendation: Caffe is a very good option, but more modern
options such as Caffe2 or PyTorch should be tested and
compared.

51
More info
1. Inference series I: How to use Caffe with AWS’ Deep Learning AMI
for Semantic Segmentation
2. Inference series I [2nd round]: How to use Caffe with AWS’ Deep
Learning AMI for Semantic Segmentation

52
Future steps?
Objective:
1. Reduce model size -> AWS Lambda, IoT
2. Increase speed
3. Keep the same accuracy
It’s a tradeoff

53
Future steps?
How?
1. Model compression
2. Custom hardware accelerators:
a. Intel Neural Compute Stick
b. Google TPU
c. NVIDIA Volta (tensor cores)
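The slides do not say which compression technique is planned. As one common example (an assumption, not the author's plan), post-training 8-bit quantization stores each weight as a single byte plus a shared scale and zero point, about 4× smaller than float32 weights, at the cost of a rounding error of at most half a quantization step. A minimal pure-Python sketch of affine quantization for a list of weights; function names and values are hypothetical.

```python
def quantize_uint8(weights):
    """Affine 8-bit quantization: w ≈ scale * (q - zero_point), with q in [0, 255]."""
    w_min = min(min(weights), 0.0)  # make sure 0.0 is exactly representable
    w_max = max(max(weights), 0.0)
    scale = (w_max - w_min) / 255 or 1.0  # avoid division by zero for all-zero weights
    zero_point = round(-w_min / scale)
    q = [min(255, max(0, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map 8-bit codes back to approximate float weights."""
    return [scale * (v - zero_point) for v in q]


weights = [-0.52, 0.0, 0.13, 0.91, -0.07]
q, scale, zp = quantize_uint8(weights)
restored = dequantize(q, scale, zp)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q)                              # [0, 93, 116, 255, 81]
print(max_error <= scale / 2 + 1e-12)  # True
```

Speed and accuracy then have to be re-measured, which is exactly the tradeoff listed above.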

Ricardo Guerrero Gómez-Olmedo
Email: ricardo.guerrero@beeva.com
Twitter: @ricgu8086
Medium: medium.com/@ricardo.guerrero
IT Researcher | BEEVA LABS
hablemos@beeva.com | www.beeva.com