https://telecombcn-dl.github.io/dlmm-2017-dcu/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks that were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course covers the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection and image captioning.
19. Learnable Upsample: Deconvolutional Layer
Slide Credit: CS231n
3 x 3 “deconvolution”, stride 2, pad 1
Input: 2 x 2 → Output: 4 x 4
Input gives weight for filter values.
20. Learnable Upsample: Deconvolutional Layer
Slide Credit: CS231n
3 x 3 “deconvolution”, stride 2, pad 1
Input: 2 x 2 → Output: 4 x 4
Input gives weight for filter values; sum where the output overlaps.
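The two slides above can be sketched in NumPy (a minimal sketch, not the CS231n code): each input value weights a copy of the 3 x 3 filter, copies are placed at stride-2 offsets, and overlapping contributions are summed. Frameworks then crop the padding; e.g. PyTorch's nn.ConvTranspose2d(kernel_size=3, stride=2, padding=1, output_padding=1) trims the full 5 x 5 map below down to the slide's 4 x 4.

```python
import numpy as np

def transposed_conv2d(x, k, stride=2):
    """Each input value weights a copy of the kernel placed at stride
    offsets; overlapping contributions are summed in the output."""
    h, w = x.shape
    kh, kw = k.shape
    out = np.zeros(((h - 1) * stride + kh, (w - 1) * stride + kw))
    for i in range(h):
        for j in range(w):
            out[i * stride:i * stride + kh,
                j * stride:j * stride + kw] += x[i, j] * k
    return out

x = np.ones((2, 2))      # 2 x 2 input
k = np.ones((3, 3))      # 3 x 3 filter
y = transposed_conv2d(x, k, stride=2)
print(y.shape)           # (5, 5) before cropping the padding
print(y[2, 2])           # 4.0 -- four filter copies overlap at the centre
```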
21. Learnable Upsample: Deconvolutional Layer
Warning: checkerboard effect when the kernel size is not divisible by the stride.
Source: distill.pub
22. Learnable Upsample: Deconvolutional Layer
stride = 2, kernel_size = 3
Warning: checkerboard effect when the kernel size is not divisible by the stride.
Source: distill.pub
24. Better Upsampling: Subpixel
Re-arrange the features of the previous convolutional layer to form a higher-resolution output.
Shi et al. Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network. CVPR 2016
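The re-arrangement can be sketched in NumPy (a sketch assuming the channel ordering used by torch.nn.PixelShuffle; Shi et al. call it the periodic shuffling operator):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Re-arrange a (C*r^2, H, W) feature map into (C, H*r, W*r): each
    group of r^2 channels supplies the r x r sub-pixels of one output
    position (channel ordering as in torch.nn.PixelShuffle)."""
    c2, h, w = x.shape
    c = c2 // (r * r)
    out = x.reshape(c, r, r, h, w)      # split channels into sub-pixel offsets
    out = out.transpose(0, 3, 1, 4, 2)  # -> (c, h, r, w, r)
    return out.reshape(c, h * r, w * r)

x = np.arange(4 * 2 * 2).reshape(4, 2, 2)  # 4 channels = 1 output channel x (r=2)^2
y = pixel_shuffle(x, 2)
print(y.shape)       # (1, 4, 4)
print(y[0, :2, :2])  # [[0 4], [8 12]] -- one value from each input channel
```

Because the convolution producing the r^2 channel groups runs at the low resolution, this avoids both the cost of convolving at high resolution and the transposed-convolution overlap pattern.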
26. Skip Connections
Slide Credit: CS231n
Skip connections = better results: they recover low-level features from the early layers.
Long et al. Fully Convolutional Networks for Semantic Segmentation. CVPR 2015
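A minimal sketch of an FCN-style skip connection (hypothetical shapes; the real FCN uses learned bilinear upsampling and 1 x 1 score layers): the coarse scores from a deep layer are upsampled and summed with scores computed from an earlier, higher-resolution layer.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling (the real FCN uses learned
    bilinear upsampling)."""
    return x.repeat(2, axis=-2).repeat(2, axis=-1)

# Hypothetical per-layer class-score maps (num_classes x H x W).
rng = np.random.default_rng(0)
score_deep = rng.standard_normal((21, 8, 8))     # coarse scores, deep layer
score_early = rng.standard_normal((21, 16, 16))  # finer scores, earlier layer

# Skip connection: upsample the deep scores and sum them with the
# scores from the earlier, higher-resolution layer.
fused = upsample2x(score_deep) + score_early
print(fused.shape)   # (21, 16, 16)
```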
27. Dilated Convolutions
Yu & Koltun. Multi-Scale Context Aggregation by Dilated Convolutions. ICLR 2016
Structural change in convolutional layers for dense prediction problems (e.g. image segmentation):
● The receptive field grows exponentially as more layers are added → more context information in deeper layers w.r.t. regular convolutions
● The number of parameters increases only linearly as more layers are added
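The two bullet points can be checked with a little receptive-field arithmetic (a sketch for a stack of 3 x 3 convolutions whose dilation doubles per layer, as in Yu & Koltun):

```python
def receptive_field(dilations, k=3):
    """Receptive field of a stack of k x k convolutions: each layer
    extends it by (k - 1) * dilation."""
    rf = 1
    for d in dilations:
        rf += (k - 1) * d
    return rf

layers = 5
regular = receptive_field([1] * layers)                     # dilation 1 everywhere
dilated = receptive_field([2 ** i for i in range(layers)])  # dilations 1, 2, 4, 8, 16

print(regular)  # 11 -- grows linearly with depth
print(dilated)  # 63 -- grows exponentially with depth
# Both stacks contain the same number of 3 x 3 filters, so the
# parameter count grows only linearly with depth in either case.
```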
29. Instance Segmentation
More challenging than semantic segmentation:
● The number of objects is variable
● There is no unique match between predicted and ground-truth objects (instance IDs cannot be used)
Several lines of attack:
● Proposal-based methods
● Recurrent neural networks
30. Proposal-based
Hariharan et al. Simultaneous Detection and Segmentation. ECCV 2014. Slide Credit: CS231n
Similar to R-CNN, but with external segment proposals: the background is masked out with the mean image.
32. Proposal-based Instance Segmentation: MNC
Dai et al. Instance-aware Semantic Segmentation via Multi-task Network Cascades. CVPR 2016
Won the COCO 2015 challenge (with ResNet).
Faster R-CNN for pixel-level segmentation in a multi-stage cascade strategy:
1. Region proposal network (RPN)
2. Reshape boxes to a fixed size; figure/ground logistic regression
3. Mask out the background, predict the object class
The entire model is learned end-to-end!
33. Proposal-based Instance Segmentation: MNC
Dai et al. Instance-aware Semantic Segmentation via Multi-task Network Cascades. CVPR 2016
Predictions vs. ground truth
34. Proposal-based Instance Segmentation: Mask R-CNN
He et al. Mask R-CNN. arXiv Mar 2017
Faster R-CNN for pixel-level segmentation as a parallel prediction of masks and class labels.
35. Mask R-CNN
He et al. Mask R-CNN. arXiv Mar 2017
● Classification & box detection losses are identical to those in Faster R-CNN
● A new loss term is added for mask prediction: the network outputs a K x m x m volume, where K is the number of categories and m is the size of the (square) mask
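The mask term can be sketched as follows (a minimal NumPy sketch, not the reference implementation): per RoI, only the m x m slice corresponding to the ground-truth class is penalized, with an average per-pixel sigmoid binary cross-entropy, so classes do not compete within the mask.

```python
import numpy as np

def mask_loss(mask_logits, gt_mask, gt_class):
    """Mask R-CNN mask loss for one RoI: the head outputs K x m x m
    logits (one m x m mask per class); only the slice of the
    ground-truth class is penalized, with average per-pixel sigmoid
    binary cross-entropy."""
    logits = mask_logits[gt_class]        # select the gt-class slice: m x m
    p = 1.0 / (1.0 + np.exp(-logits))     # per-pixel sigmoid
    eps = 1e-7
    bce = -(gt_mask * np.log(p + eps) + (1 - gt_mask) * np.log(1 - p + eps))
    return bce.mean()

K, m = 80, 28                         # e.g. 80 COCO categories, 28 x 28 masks
logits = np.zeros((K, m, m))          # all-zero logits -> p = 0.5 everywhere
gt = np.ones((m, m))
print(round(mask_loss(logits, gt, gt_class=3), 4))  # 0.6931 = log(2)
```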
36. Mask R-CNN: RoI Align
He et al. Mask R-CNN. arXiv Mar 2017
Reminder: RoI Pool from Fast R-CNN
● Hi-res input image (3 x 800 x 600) with a region proposal
● Convolution and pooling → hi-res conv features (C x H x W) with the region proposal
● The fully-connected layers expect low-res conv features (C x h x w), so the proposal is projected onto the feature map (coordinates divided by 16 and rounded) and max-pooled within each grid cell → RoI conv features (C x h x w) for the region proposal
● The /16 scaling and rounding cause misalignment, and the rounding is not differentiable!
37. Mask R-CNN: RoI Align
Jaderberg et al. Spatial Transformer Networks. NIPS 2015
Use bilinear interpolation instead of cropping + max-pool. The sampling grid is given by the box coordinates, as an affine mapping with θ₁₂ = θ₂₁ = 0 (translation + scale only).
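The core operation can be sketched in NumPy (a sketch; the hypothetical bilinear_sample below samples one continuous location, whereas RoI Align averages several such samples per output bin):

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Sample a 2-D feature map at a continuous (y, x) location by
    bilinear interpolation of its 4 nearest cells. Unlike RoI Pool's
    rounding, this is exact and differentiable w.r.t. the features."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1 = min(y0 + 1, feat.shape[0] - 1)
    x1 = min(x0 + 1, feat.shape[1] - 1)
    dy, dx = y - y0, x - x0
    return (feat[y0, x0] * (1 - dy) * (1 - dx) +
            feat[y0, x1] * (1 - dy) * dx +
            feat[y1, x0] * dy * (1 - dx) +
            feat[y1, x1] * dy * dx)

feat = np.array([[0.0, 1.0],
                 [2.0, 3.0]])
print(bilinear_sample(feat, 0.5, 0.5))  # 1.5 -- the average of the 4 cells
```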
42. Recurrent Instance Segmentation: Coverage Loss
Romera-Paredes & Torr. Recurrent Instance Segmentation. ECCV 2016. Slide Credit: M. Baradad, ReadCV@UPC
1. Compute the IoU for all pairs of predicted masks Ŷt and ground-truth masks Yt.
43. Recurrent Instance Segmentation: Coverage Loss
Romera-Paredes & Torr. Recurrent Instance Segmentation. ECCV 2016. Slide Credit: M. Baradad, ReadCV@UPC
1. Compute the IoU for all pairs of predicted/ground-truth masks, e.g.:
      Y1    Y2    Y3   ...
Ŷ1   0.9   0.0   0.0  ...
Ŷ2   0.1   0.8   0.1  ...
...
44. Recurrent Instance Segmentation: Coverage Loss
Romera-Paredes & Torr. Recurrent Instance Segmentation. ECCV 2016. Slide Credit: M. Baradad, ReadCV@UPC
2. Find the best matching between predicted and ground-truth masks.
Loss: the sum of the intersections over the union for the best matching, multiplied by -1 (so that better coverage yields a lower loss).
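Steps 1-2 can be sketched in NumPy (a sketch with a hypothetical 3 x 3 IoU matrix; the paper solves the matching as an assignment problem with the Hungarian algorithm, here brute force over permutations for clarity):

```python
import numpy as np
from itertools import permutations

def coverage_loss(iou):
    """Find the one-to-one prediction/ground-truth matching that
    maximizes the summed IoU and return its negation (lower is
    better). Brute force over permutations; the paper uses the
    Hungarian algorithm for the same assignment problem."""
    n = iou.shape[0]
    best = max(sum(iou[i, p[i]] for i in range(n))
               for p in permutations(range(n)))
    return -best

# Hypothetical IoU matrix (rows: predictions, columns: ground truth).
iou = np.array([[0.9, 0.0, 0.0],
                [0.1, 0.8, 0.1],
                [0.0, 0.1, 0.7]])
print(round(coverage_loss(iou), 2))  # -2.4 = -(0.9 + 0.8 + 0.7)
```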
45. Recurrent Instance Segmentation: Coverage Loss
Romera-Paredes & Torr. Recurrent Instance Segmentation. ECCV 2016. Slide Credit: M. Baradad, ReadCV@UPC
3. Also take the confidence scores into account, e.g. s1 = 0.93, s2 = 0.73, s3 = 0.86, s4 = 0.63, s5 = 0.56. Each score st is penalized with the binary cross-entropy fBCE(st, [t ≤ n]), where [·] is the Iverson bracket (1 if the condition is true and 0 otherwise) and n is the number of ground-truth objects.
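The score term can be sketched as follows (a sketch under the reconstruction above; scores taken from the slide, n = 3 ground-truth objects assumed):

```python
import numpy as np

def score_loss(scores, n_objects):
    """Each confidence score s_t should be 1 while objects remain
    (t <= n) and 0 afterwards; the target is the Iverson bracket
    [t <= n] and the penalty is binary cross-entropy."""
    eps = 1e-7
    loss = 0.0
    for t, s in enumerate(scores, start=1):
        target = 1.0 if t <= n_objects else 0.0  # Iverson bracket [t <= n]
        loss += -(target * np.log(s + eps) +
                  (1 - target) * np.log(1 - s + eps))
    return loss

scores = [0.93, 0.73, 0.86, 0.63, 0.56]  # the slide's example scores
print(round(score_loss(scores, n_objects=3), 3))  # 2.353
```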