6. MASK R-CNN
Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick @ FAIR
https://arxiv.org/abs/1703.06870
7. DensePose: Dense Human Pose Estimation In The Wild
Rıza Alp Güler, Natalia Neverova, Iasonas Kokkinos @ FAIR
Replaces Mask R-CNN's mask head with a body-part / position head
https://arxiv.org/abs/1802.00434
8. Problems with Mask R-CNN
● Slow: ~5 fps with a GTX 1080 Ti at 800x1100
● There may be more than one instance in each box
● Performs poorly on objects with low box fill rate (chair, bicycle)
● A pixel may be shared by multiple objects
● Multi-step pipeline - complex to implement and tweak.
9. RetinaNet : Focal Loss for Dense Object Detection
Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár @ FAIR
https://arxiv.org/abs/1708.02002
12. Instance Embedding
Give every pixel an n-dimensional “color”
in an embedding space and cluster in that
space
Papers for today:
● 1703.10277 - Semantic Instance Segmentation via Deep Metric Learning
● 1708.02551 - Semantic Instance Segmentation with a Discriminative Loss
Function
● 1712.08273 - Recurrent Pixel Embedding for Instance Grouping
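The core idea can be illustrated with a toy sketch (all values here are made up): pixels of the same instance get nearby embedding vectors, and instances fall out of a naive distance-based grouping.

```python
import numpy as np

# Toy illustration of instance embedding: every pixel gets an n-dimensional
# "color", and pixels of the same instance end up close in embedding space.
# Here we fake a 4x4 image whose pixels belong to two instances.
H, W, D = 4, 4, 3
rng = np.random.default_rng(0)
labels_gt = np.zeros((H, W), dtype=int)
labels_gt[:, 2:] = 1  # right half is instance 1

# An idealized embedding: each instance sits at its own point, plus tiny noise.
centers = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]])
emb = centers[labels_gt] + 0.01 * rng.standard_normal((H, W, D))

# Naive grouping: sweep pixels, start a new instance at each unassigned pixel
# and absorb everything within a distance threshold of it.
flat = emb.reshape(-1, D)
assigned = -np.ones(len(flat), dtype=int)
next_id = 0
for i in range(len(flat)):
    if assigned[i] == -1:
        close = np.linalg.norm(flat - flat[i], axis=1) < 1.0
        assigned[close & (assigned == -1)] = next_id
        next_id += 1

pred = assigned.reshape(H, W)
print(next_id)  # two instances recovered
```

The papers below differ mainly in how the embedding is trained and in how this grouping step is made robust (seediness scores, mean-shift, GBMS).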
13. Instance Embedding
The 2018 Data Science Bowl on
Kaggle. Instance segmentation on
cell nuclei.
[Figure: original image, semantic segmentation, and the first 7 dimensions of the embedding space, visualized on a nuclei image]
14. Semantic Instance Segmentation via Deep Metric
Learning
Alireza Fathi, Zbigniew Wojna, Vivek Rathod, Peng Wang, Hyun Oh Song, Sergio
Guadarrama, Kevin P. Murphy
Google / UCLA
https://arxiv.org/abs/1703.10277
16. Semantic Instance Segmentation via Deep Metric
Learning
Pairwise pixel loss
Weights are set so that they balance large and small objects, and they sum to 1
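One way to read this slide in code (a NumPy sketch of my understanding of the paper's pairwise loss, using the similarity sigma(p, q) = 2 / (1 + exp(||e_p - e_q||^2)); the weighting scheme is my reconstruction):

```python
import numpy as np

def pairwise_embedding_loss(emb, inst_ids):
    """Pairwise pixel loss sketch: the similarity of two pixels,
    2 / (1 + exp(||e_p - e_q||^2)), is pushed toward 1 for pixels of the
    same instance and toward 0 otherwise. Each pixel's weight is
    1 / (num_instances * its_instance_size), so every instance contributes
    equally and the pair weights sum to 1 regardless of object size."""
    emb = emb.reshape(-1, emb.shape[-1])
    ids = inst_ids.reshape(-1)
    uniq, counts = np.unique(ids, return_counts=True)
    size = dict(zip(uniq, counts))
    w = np.array([1.0 / (len(uniq) * size[i]) for i in ids])

    d2 = ((emb[:, None, :] - emb[None, :, :]) ** 2).sum(-1)
    sim = 2.0 / (1.0 + np.exp(d2))              # in (0, 1]
    same = (ids[:, None] == ids[None, :]).astype(float)
    ww = w[:, None] * w[None, :]                # pair weights, sum to 1
    eps = 1e-12
    loss = -(same * np.log(sim + eps) + (1 - same) * np.log(1 - sim + eps))
    return (ww * loss).sum()
```

Well-separated instances give a near-zero loss, while a collapsed embedding is heavily penalized on the different-instance pairs.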
19. Semantic Instance Segmentation via Deep Metric
Learning
Training the seeds
Pick K (=10) pixels at random and grow a mask around each by thresholding the
embedding distance at various values of τ.
If a grown mask has sufficient overlap with a ground-truth object, the seed pixel
is assigned that object's class.
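A sketch of this seed-labelling step (the k, τ values, and IoU cut-off below are placeholders, not the paper's exact numbers):

```python
import numpy as np

def label_seed_pixels(emb, inst_masks, k=10, taus=(0.5, 1.0, 1.5),
                      iou_min=0.5, seed=0):
    """Pick k random pixels, grow a mask around each one by thresholding
    the embedding distance at several taus, and if any grown mask has
    IoU >= iou_min with a ground-truth instance, record that pixel as a
    good seed for that instance. `inst_masks` is a list of boolean
    H x W masks, one per ground-truth instance."""
    H, W, D = emb.shape
    rng = np.random.default_rng(seed)
    flat = emb.reshape(-1, D)
    picks = rng.choice(H * W, size=k, replace=False)
    seed_label = {}
    for p in picks:
        d = np.linalg.norm(flat - flat[p], axis=1).reshape(H, W)
        for tau in taus:
            grown = d < tau
            for gt_id, gt in enumerate(inst_masks):
                iou = (grown & gt).sum() / max((grown | gt).sum(), 1)
                if iou >= iou_min:
                    seed_label[p] = gt_id
    return seed_label
```

The labelled seeds become the training targets for the seediness score mentioned on the next slides.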
20. Semantic Instance Segmentation via Deep Metric
Learning
Picking the seeds
Unlike NMS, diversity is encouraged in embedding space rather than in image
space.
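A sketch of what embedding-space diversity could look like (the greedy order and the suppression threshold are assumptions, not the paper's exact procedure):

```python
import numpy as np

def pick_diverse_seeds(emb_seeds, scores, k=3, min_dist=1.0):
    """Greedy seed selection, NMS-style: take candidates in decreasing
    score order, but suppress a candidate when it is too close to an
    already-chosen seed in EMBEDDING space rather than in image space."""
    order = np.argsort(-scores)
    chosen = []
    for i in order:
        if all(np.linalg.norm(emb_seeds[i] - emb_seeds[j]) > min_dist
               for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    return chosen
```

Two high-scoring seeds that land on the same instance have nearly identical embeddings, so only one of them survives, while seeds on different instances are kept even if they are spatially adjacent.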
21. Semantic Instance Segmentation via Deep Metric
Learning
● DeepLab v2 (ResNet-101) backbone, pre-trained on COCO
● Training starts with the classification/seediness loss weight at 0; it is
gradually increased to 0.2.
● The backbone is applied over an image pyramid (0.25, 0.5, 1, 2) and the results
are fed to the embedding and seediness heads.
● Evaluated on PASCAL VOC 2012
23. Semantic Instance Segmentation with a
Discriminative Loss Function
Bert De Brabandere, Davy Neven, Luc Van Gool
ESAT-PSI, KU Leuven
https://arxiv.org/abs/1708.02551
https://github.com/DavyNeven/fastSceneUnderstanding
24. Semantic Instance Segmentation with a
Discriminative Loss Function
● Very similar to the previous
paper
● Uses discriminative loss
● Each class is embedded
independently
25. Semantic Instance Segmentation with a
Discriminative Loss Function
Pulling force: pulls pixels toward their instance mean (α = 1, pull threshold δv = 0.5)
Pushing force: pushes instance means apart (β = 1, push threshold δd = 1.5)
Regularization: keeps the instance means near the origin (γ = 0.001)
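With the hyper-parameters on this slide, the three terms can be sketched as follows (a plain-NumPy reading of the discriminative loss; variable names are mine):

```python
import numpy as np

def discriminative_loss(emb, ids, delta_v=0.5, delta_d=1.5,
                        alpha=1.0, beta=1.0, gamma=0.001):
    """Discriminative loss sketch:
    - pulling force: pixels are pulled toward their instance mean, but only
      once they are farther than the pull threshold delta_v;
    - pushing force: instance means repel each other until they are at
      least 2 * delta_d apart;
    - regularization: keeps the means close to the origin."""
    emb = emb.reshape(-1, emb.shape[-1])
    ids = ids.reshape(-1)
    uniq = np.unique(ids)
    mus = np.stack([emb[ids == c].mean(0) for c in uniq])

    # pulling force (variance term)
    l_var = 0.0
    for mu, c in zip(mus, uniq):
        d = np.linalg.norm(emb[ids == c] - mu, axis=1)
        l_var += (np.maximum(d - delta_v, 0.0) ** 2).mean()
    l_var /= len(uniq)

    # pushing force (distance term between instance means)
    l_dist = 0.0
    C = len(uniq)
    if C > 1:
        for a in range(C):
            for b in range(C):
                if a != b:
                    d = np.linalg.norm(mus[a] - mus[b])
                    l_dist += np.maximum(2 * delta_d - d, 0.0) ** 2
        l_dist /= C * (C - 1)

    # regularization on the means
    l_reg = np.linalg.norm(mus, axis=1).mean()
    return alpha * l_var + beta * l_dist + gamma * l_reg
```

Because the pull and push terms are hinged, the loss is exactly zero once every pixel is within δv of its mean and every pair of means is at least 2δd apart, so the embedding is free to stop contracting.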
27. Semantic Instance Segmentation
with a Discriminative Loss
Function
Parsing | mean-shift clustering
1. Pick an unlabeled pixel and take its embedding as the initial estimate of the
instance mean
a. find all pixels whose embedding is close (below a threshold) to the current mean
b. compute the mean of that set in embedding space
c. go to step a. and repeat until convergence (the mean stops changing)
2. Go to step 1. while unlabeled pixels remain
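The procedure above can be sketched directly (the threshold, iteration cap, and the degenerate-case guard are my own):

```python
import numpy as np

def parse_mean_shift(emb, thresh=0.5, max_iter=100):
    """Parse an embedding map into instances by mean-shift: repeatedly
    pick an unlabeled pixel, treat its embedding as the instance mean,
    and alternate (a) collecting pixels within `thresh` of the mean and
    (b) recomputing the mean over that set, until the mean converges.
    The final set becomes one instance."""
    flat = emb.reshape(-1, emb.shape[-1])
    labels = -np.ones(len(flat), dtype=int)
    next_id = 0
    while (labels == -1).any():
        i = np.flatnonzero(labels == -1)[0]
        mean = flat[i].copy()
        for _ in range(max_iter):
            close = np.linalg.norm(flat - mean, axis=1) < thresh
            new_mean = flat[close].mean(0)
            if np.allclose(new_mean, mean):
                break
            mean = new_mean
        labels[close & (labels == -1)] = next_id
        labels[i] = next_id  # guarantee progress even in degenerate cases
        next_id += 1
    return labels.reshape(emb.shape[:-1]), next_id
```

With a well-trained embedding each run converges in a handful of iterations, and the number of instances comes out of the loop for free.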
31. Semantic Instance Segmentation with a
Discriminative Loss Function
● A semantic segmentation mask should be trained alongside; clustering is run
only on pixels that are considered part of an object.
● Probably best to set the pull threshold to 0.
● Unlike a loss on similar/different pixel pairs, with the contrastive loss
information flows between all pixels
● Semantic segmentation quality matters a lot
● No need to balance instance sizes
33. Recurrent Pixel Embedding for Instance Grouping
Shu Kong, Charless Fowlkes
University of California
https://arxiv.org/abs/1712.08273
https://github.com/aimerykong/Recurrent-Pixel-Embedding-for-Instance-Grouping
34. Recurrent Pixel Embedding for Instance Grouping
● Embedding on an n-dimensional sphere
● Pairwise pixel loss, cosine distance
● Main contribution: mean-shift clustering is part of the model and
differentiable
38. Recurrent Pixel Embedding for Instance Grouping
● Cubic convergence guarantees
● May be applied to neighbourhoods or any subset of the image, not only the
whole image
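As I understand it, a single Gaussian Blurring Mean Shift (GBMS) iteration is just two matrix products over the point set, which is what makes it differentiable and easy to unroll inside a network:

```python
import numpy as np

def gbms_step(X, sigma=1.0):
    """One GBMS iteration: every point moves to a Gaussian-weighted mean
    of ALL points (hence "blurring"), so the whole point cloud contracts
    onto its modes. X has shape (num_points, dim)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma ** 2))   # pairwise Gaussian affinities
    P = K / K.sum(0, keepdims=True)      # column-normalize to get weights
    return P.T @ X                       # each point -> weighted mean
```

Running a few steps on two well-separated clusters collapses each cluster onto its own mode while leaving the clusters far apart.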
39. Recurrent Pixel Embedding for Instance Grouping
More on GBMS: http://www.cs.cmu.edu/~aarti/SMLRG/miguel_slides.pdf
40. Recurrent Pixel Embedding for Instance Grouping
● A Gaussian kernel is not appropriate, because distances should be measured
with the cosine distance
● The von Mises-Fisher distribution is used instead: a "Gaussian" on the
sphere's surface
● L2 normalization should be applied after each iteration
41. Recurrent Pixel Embedding for Instance Grouping
Uses the von Mises-Fisher distribution (a Gaussian on the sphere's surface)
instead of the Gaussian kernel
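A spherical blurring mean-shift step under these assumptions (the kernel choice follows the slide; the `kappa` value and the placement of the normalization are my reading, not the paper's exact code):

```python
import numpy as np

def gbms_step_vmf(X, kappa=5.0):
    """Blurring mean-shift step on the unit sphere: affinities use the
    von Mises-Fisher kernel exp(kappa * <x_i, x_j>) on unit vectors
    instead of a Gaussian on Euclidean distance, and the result is
    L2-normalized after the update so points stay on the sphere."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    K = np.exp(kappa * (X @ X.T))        # vMF affinities from cosine sim
    P = K / K.sum(0, keepdims=True)
    Y = P.T @ X                          # weighted mean, leaves the sphere
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)  # project back
```

On two clusters of unit vectors, a few iterations drive within-cluster cosine similarity to 1 while the clusters stay on opposite sides of the sphere.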
42. Recurrent Pixel Embedding for Instance Grouping
● Computing the similarity matrix is expensive, so only a subset of the pixels
(~50%) participates in this phase
● The loss is backpropagated at each iteration of the module
● The iterative application is considered analogous to hard negative mining
● DeepLab-v3 is used as the backbone
43. Comparison

Semantic Instance Segmentation via Deep Metric Learning
● Embedding: euclidean distance
● Loss: pairwise sigmoid-like loss
● Seeds: learned seediness score
● Parsing: expand mask around seeds

Semantic Instance Segmentation with a Discriminative Loss Function
● Embedding: euclidean distance
● Loss: center ≠ center, point -> center
● Seeds: random
● Parsing: mean-shift around seeds

Recurrent Pixel Embedding for Instance Grouping
● Embedding: cosine distance
● Loss: pairwise pixel + GBMS
● Seeds: random
● Parsing: GBMS proposals and simple LR + mean shift
44. Other papers
● End-to-End Instance Segmentation with Recurrent Attention
https://arxiv.org/abs/1605.09410
● Deep Watershed Transform for Instance Segmentation
https://arxiv.org/abs/1611.08303
● Associative Embedding: End-to-End Learning for Joint Detection and
Grouping
http://ttic.uchicago.edu/~mmaire/papers/pdf/affinity_cnn_cvpr2016.pdf
● SGN: Sequential Grouping Networks for Instance Segmentation
https://www.cs.toronto.edu/~urtasun/publications/liu_etal_iccv17.pdf
45. Takeaways
● Use the contrastive loss with a pulling threshold of 0
● Either learn a seediness model or implement GBMS
● The accuracy/speed trade-off is achieved almost exclusively by replacing the
backbone
● Pretrain on COCO
● No one needs more than 64 dimensions of embedding space
● When all else fails, use Mask R-CNN