6. MASK R-CNN
Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick @ FAIR
https://arxiv.org/abs/1703.06870
7. DensePose: Dense Human Pose Estimation In The Wild
Rıza Alp Güler, Natalia Neverova, Iasonas Kokkinos @ FAIR
Replaces Mask R-CNN's mask head with a body-part / position head
https://arxiv.org/abs/1802.00434
8. Problems with Mask R-CNN
● Slow: ~5 fps with a GTX 1080 Ti at 800x1100
● There may be more than one instance in each box
● Performs poorly on objects with low box fill rate (chair, bicycle)
● A pixel may be shared by multiple objects
● Multi-step pipeline - complex to implement and tweak.
9. RetinaNet : Focal Loss for Dense Object Detection
Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár @ FAIR
https://arxiv.org/abs/1708.02002
12. Instance Embedding
Give every pixel an n-dimensional “color”
in an embedding space and cluster in that
space
Papers for today:
● 1703.10277 - Semantic Instance Segmentation via Deep Metric Learning
● 1708.02551 - Semantic Instance Segmentation with a Discriminative Loss
Function
● 1712.08273 - Recurrent Pixel Embedding for Instance Grouping
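The core idea can be illustrated with a toy sketch (all values here are made up): pixels of the same instance get nearby embedding vectors, and instances fall out of a naive distance-based grouping.

```python
import numpy as np

# Toy illustration of instance embedding: every pixel gets an n-dimensional
# "color", and pixels of the same instance end up close in embedding space.
# Here we fake a 4x4 image whose pixels belong to two instances.
H, W, D = 4, 4, 3
rng = np.random.default_rng(0)
labels_gt = np.zeros((H, W), dtype=int)
labels_gt[:, 2:] = 1  # right half is instance 1

# An idealized embedding: each instance sits at its own point, plus tiny noise.
centers = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]])
emb = centers[labels_gt] + 0.01 * rng.standard_normal((H, W, D))

# Naive grouping: sweep pixels, start a new instance at each unassigned pixel
# and absorb everything within a distance threshold of it.
flat = emb.reshape(-1, D)
assigned = -np.ones(len(flat), dtype=int)
next_id = 0
for i in range(len(flat)):
    if assigned[i] == -1:
        close = np.linalg.norm(flat - flat[i], axis=1) < 1.0
        assigned[close & (assigned == -1)] = next_id
        next_id += 1

pred = assigned.reshape(H, W)
print(next_id)  # two instances recovered
```

The papers below differ mainly in how the embedding is trained and in how this grouping step is made robust (seediness scores, mean-shift, GBMS).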
13. Instance Embedding
The 2018 Data Science Bowl on
Kaggle. Instance segmentation on
cell nuclei.
[Figure: original image, semantic segmentation, and the first 7 dimensions of the embedding space, visualized on a nuclei image]
14. Semantic Instance Segmentation via Deep Metric
Learning
Alireza Fathi, Zbigniew Wojna, Vivek Rathod, Peng Wang, Hyun Oh Song, Sergio
Guadarrama, Kevin P. Murphy
Google / UCLA
https://arxiv.org/abs/1703.10277
16. Semantic Instance Segmentation via Deep Metric
Learning
Pairwise pixel loss
Weights are set so that they balance large and small objects, and they sum to 1
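One way to read this slide in code (a NumPy sketch of my understanding of the paper's pairwise loss, using the similarity sigma(p, q) = 2 / (1 + exp(||e_p - e_q||^2)); the weighting scheme is my reconstruction):

```python
import numpy as np

def pairwise_embedding_loss(emb, inst_ids):
    """Pairwise pixel loss sketch: the similarity of two pixels,
    2 / (1 + exp(||e_p - e_q||^2)), is pushed toward 1 for pixels of the
    same instance and toward 0 otherwise. Each pixel's weight is
    1 / (num_instances * its_instance_size), so every instance contributes
    equally and the pair weights sum to 1 regardless of object size."""
    emb = emb.reshape(-1, emb.shape[-1])
    ids = inst_ids.reshape(-1)
    uniq, counts = np.unique(ids, return_counts=True)
    size = dict(zip(uniq, counts))
    w = np.array([1.0 / (len(uniq) * size[i]) for i in ids])

    d2 = ((emb[:, None, :] - emb[None, :, :]) ** 2).sum(-1)
    sim = 2.0 / (1.0 + np.exp(d2))              # in (0, 1]
    same = (ids[:, None] == ids[None, :]).astype(float)
    ww = w[:, None] * w[None, :]                # pair weights, sum to 1
    eps = 1e-12
    loss = -(same * np.log(sim + eps) + (1 - same) * np.log(1 - sim + eps))
    return (ww * loss).sum()
```

Well-separated instances give a near-zero loss, while a collapsed embedding is heavily penalized on the different-instance pairs.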
19. Semantic Instance Segmentation via Deep Metric
Learning
Training the seeds
Pick K (=10) pixels at random and grow a mask around each by thresholding the
embedding distance at various values of τ.
If a grown mask has sufficient overlap with a ground-truth object, the seed pixel
is assigned that object's class.
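A sketch of this seed-labelling step (the k, τ values, and IoU cut-off below are placeholders, not the paper's exact numbers):

```python
import numpy as np

def label_seed_pixels(emb, inst_masks, k=10, taus=(0.5, 1.0, 1.5),
                      iou_min=0.5, seed=0):
    """Pick k random pixels, grow a mask around each one by thresholding
    the embedding distance at several taus, and if any grown mask has
    IoU >= iou_min with a ground-truth instance, record that pixel as a
    good seed for that instance. `inst_masks` is a list of boolean
    H x W masks, one per ground-truth instance."""
    H, W, D = emb.shape
    rng = np.random.default_rng(seed)
    flat = emb.reshape(-1, D)
    picks = rng.choice(H * W, size=k, replace=False)
    seed_label = {}
    for p in picks:
        d = np.linalg.norm(flat - flat[p], axis=1).reshape(H, W)
        for tau in taus:
            grown = d < tau
            for gt_id, gt in enumerate(inst_masks):
                iou = (grown & gt).sum() / max((grown | gt).sum(), 1)
                if iou >= iou_min:
                    seed_label[p] = gt_id
    return seed_label
```

The labelled seeds become the training targets for the seediness score mentioned on the next slides.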
20. Semantic Instance Segmentation via Deep Metric
Learning
Picking the seeds
Unlike NMS, diversity is encouraged in embedding space rather than in image
space.
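A sketch of what embedding-space diversity could look like (the greedy order and the suppression threshold are assumptions, not the paper's exact procedure):

```python
import numpy as np

def pick_diverse_seeds(emb_seeds, scores, k=3, min_dist=1.0):
    """Greedy seed selection, NMS-style: take candidates in decreasing
    score order, but suppress a candidate when it is too close to an
    already-chosen seed in EMBEDDING space rather than in image space."""
    order = np.argsort(-scores)
    chosen = []
    for i in order:
        if all(np.linalg.norm(emb_seeds[i] - emb_seeds[j]) > min_dist
               for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    return chosen
```

Two high-scoring seeds that land on the same instance have nearly identical embeddings, so only one of them survives, while seeds on different instances are kept even if they are spatially adjacent.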
21. Semantic Instance Segmentation via Deep Metric
Learning
● DeepLab v2 (ResNet-101) backbone, pre-trained on COCO
● Training starts with the classification/seediness loss weight at 0; it is
gradually increased to 0.2.
● The backbone is applied over an image pyramid (0.25, 0.5, 1, 2) and the results
are fed to the embedding and seediness heads.
● Evaluated on PASCAL VOC 2012
23. Semantic Instance Segmentation with a
Discriminative Loss Function
Bert De Brabandere, Davy Neven, Luc Van Gool
ESAT-PSI, KU Leuven
https://arxiv.org/abs/1708.02551
https://github.com/DavyNeven/fastSceneUnderstanding
24. Semantic Instance Segmentation with a
Discriminative Loss Function
● Very similar to the previous
paper
● Uses discriminative loss
● Each class is embedded
independently
25. Semantic Instance Segmentation with a
Discriminative Loss Function
Pulling force: pulls pixels toward their instance mean (α = 1, pull threshold δv = 0.5)
Pushing force: pushes instance means apart (β = 1, push threshold δd = 1.5)
Regularization: keeps the instance means near the origin (γ = 0.001)
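With the hyper-parameters on this slide, the three terms can be sketched as follows (a plain-NumPy reading of the discriminative loss; variable names are mine):

```python
import numpy as np

def discriminative_loss(emb, ids, delta_v=0.5, delta_d=1.5,
                        alpha=1.0, beta=1.0, gamma=0.001):
    """Discriminative loss sketch:
    - pulling force: pixels are pulled toward their instance mean, but only
      once they are farther than the pull threshold delta_v;
    - pushing force: instance means repel each other until they are at
      least 2 * delta_d apart;
    - regularization: keeps the means close to the origin."""
    emb = emb.reshape(-1, emb.shape[-1])
    ids = ids.reshape(-1)
    uniq = np.unique(ids)
    mus = np.stack([emb[ids == c].mean(0) for c in uniq])

    # pulling force (variance term)
    l_var = 0.0
    for mu, c in zip(mus, uniq):
        d = np.linalg.norm(emb[ids == c] - mu, axis=1)
        l_var += (np.maximum(d - delta_v, 0.0) ** 2).mean()
    l_var /= len(uniq)

    # pushing force (distance term between instance means)
    l_dist = 0.0
    C = len(uniq)
    if C > 1:
        for a in range(C):
            for b in range(C):
                if a != b:
                    d = np.linalg.norm(mus[a] - mus[b])
                    l_dist += np.maximum(2 * delta_d - d, 0.0) ** 2
        l_dist /= C * (C - 1)

    # regularization on the means
    l_reg = np.linalg.norm(mus, axis=1).mean()
    return alpha * l_var + beta * l_dist + gamma * l_reg
```

Because the pull and push terms are hinged, the loss is exactly zero once every pixel is within δv of its mean and every pair of means is at least 2δd apart, so the embedding is free to stop contracting.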
27. Semantic Instance Segmentation
with a Discriminative Loss
Function
Parsing | mean-shift clustering
1. Pick an unlabeled pixel and take its embedding as the initial estimate of the
instance mean
a. find all pixels whose embedding is close (below a threshold) to the current mean
b. compute the mean of that set in embedding space
c. go to step a. and repeat until convergence (the mean stops changing)
2. Go to step 1. while unlabeled pixels remain
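The procedure above can be sketched directly (the threshold, iteration cap, and the degenerate-case guard are my own):

```python
import numpy as np

def parse_mean_shift(emb, thresh=0.5, max_iter=100):
    """Parse an embedding map into instances by mean-shift: repeatedly
    pick an unlabeled pixel, treat its embedding as the instance mean,
    and alternate (a) collecting pixels within `thresh` of the mean and
    (b) recomputing the mean over that set, until the mean converges.
    The final set becomes one instance."""
    flat = emb.reshape(-1, emb.shape[-1])
    labels = -np.ones(len(flat), dtype=int)
    next_id = 0
    while (labels == -1).any():
        i = np.flatnonzero(labels == -1)[0]
        mean = flat[i].copy()
        for _ in range(max_iter):
            close = np.linalg.norm(flat - mean, axis=1) < thresh
            new_mean = flat[close].mean(0)
            if np.allclose(new_mean, mean):
                break
            mean = new_mean
        labels[close & (labels == -1)] = next_id
        labels[i] = next_id  # guarantee progress even in degenerate cases
        next_id += 1
    return labels.reshape(emb.shape[:-1]), next_id
```

With a well-trained embedding each run converges in a handful of iterations, and the number of instances comes out of the loop for free.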
31. Semantic Instance Segmentation with a
Discriminative Loss Function
● A semantic segmentation mask should be trained alongside; clustering is run
only on pixels that are considered part of an object.
● Probably best to set the pull threshold to 0.
● Unlike a loss on similar/different pixel pairs, with the contrastive loss
information flows between all pixels
● Semantic segmentation quality matters a lot
● No need to balance instance sizes
33. Recurrent Pixel Embedding for Instance Grouping
Shu Kong, Charless Fowlkes
University of California
https://arxiv.org/abs/1712.08273
https://github.com/aimerykong/Recurrent-Pixel-Embedding-for-Instance-Grouping
34. Recurrent Pixel Embedding for Instance Grouping
● Embedding on an n-dimensional sphere
● Pairwise pixel loss, cosine distance
● Main contribution: mean-shift clustering is part of the model and
differentiable
38. Recurrent Pixel Embedding for Instance Grouping
● Cubic convergence guarantees
● May be applied to neighbourhoods or any subset of the image, not only the
whole image
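As I understand it, a single Gaussian Blurring Mean Shift (GBMS) iteration is just two matrix products over the point set, which is what makes it differentiable and easy to unroll inside a network:

```python
import numpy as np

def gbms_step(X, sigma=1.0):
    """One GBMS iteration: every point moves to a Gaussian-weighted mean
    of ALL points (hence "blurring"), so the whole point cloud contracts
    onto its modes. X has shape (num_points, dim)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma ** 2))   # pairwise Gaussian affinities
    P = K / K.sum(0, keepdims=True)      # column-normalize to get weights
    return P.T @ X                       # each point -> weighted mean
```

Running a few steps on two well-separated clusters collapses each cluster onto its own mode while leaving the clusters far apart.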
39. Recurrent Pixel Embedding for Instance Grouping
More on GBMS: http://www.cs.cmu.edu/~aarti/SMLRG/miguel_slides.pdf
40. Recurrent Pixel Embedding for Instance Grouping
● A Gaussian kernel is not appropriate, because distances should be measured
with the cosine distance
● The von Mises-Fisher distribution is used instead: a "Gaussian" on the
sphere's surface
● L2 normalization should be applied after each iteration
41. Recurrent Pixel Embedding for Instance Grouping
Uses the von Mises-Fisher distribution (a Gaussian on the sphere's surface)
instead of the Gaussian kernel
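A spherical blurring mean-shift step under these assumptions (the kernel choice follows the slide; the `kappa` value and the placement of the normalization are my reading, not the paper's exact code):

```python
import numpy as np

def gbms_step_vmf(X, kappa=5.0):
    """Blurring mean-shift step on the unit sphere: affinities use the
    von Mises-Fisher kernel exp(kappa * <x_i, x_j>) on unit vectors
    instead of a Gaussian on Euclidean distance, and the result is
    L2-normalized after the update so points stay on the sphere."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    K = np.exp(kappa * (X @ X.T))        # vMF affinities from cosine sim
    P = K / K.sum(0, keepdims=True)
    Y = P.T @ X                          # weighted mean, leaves the sphere
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)  # project back
```

On two clusters of unit vectors, a few iterations drive within-cluster cosine similarity to 1 while the clusters stay on opposite sides of the sphere.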
42. Recurrent Pixel Embedding for Instance Grouping
● Computing the similarity matrix is expensive, so only a subset of the pixels
(~50%) participates in this phase
● The loss is backpropagated at each iteration of the module
● The iterative application is considered analogous to hard negative mining
● DeepLab-v3 is used as the backbone
43. Comparison

Semantic Instance Segmentation via Deep Metric Learning
● Embedding: euclidean distance
● Loss: pairwise sigmoid-like loss
● Seeds: learned seediness score
● Parsing: expand mask around seeds

Semantic Instance Segmentation with a Discriminative Loss Function
● Embedding: euclidean distance
● Loss: center ≠ center, point -> center
● Seeds: random
● Parsing: mean-shift around seeds

Recurrent Pixel Embedding for Instance Grouping
● Embedding: cosine distance
● Loss: pairwise pixel + GBMS
● Seeds: random
● Parsing: GBMS proposals and simple LR + mean shift
44. Other papers
● End-to-End Instance Segmentation with Recurrent Attention
https://arxiv.org/abs/1605.09410
● Deep Watershed Transform for Instance Segmentation
https://arxiv.org/abs/1611.08303
● Associative Embedding: End-to-End Learning for Joint Detection and
Grouping
http://ttic.uchicago.edu/~mmaire/papers/pdf/affinity_cnn_cvpr2016.pdf
● SGN: Sequential Grouping Networks for Instance Segmentation
https://www.cs.toronto.edu/~urtasun/publications/liu_etal_iccv17.pdf
45. Takeaways
● Use the contrastive loss with a pulling threshold of 0
● Either learn a seediness model or implement GBMS
● The accuracy/speed trade-off is achieved almost exclusively by replacing the
backbone
● Pretrain on COCO
● No one needs more than 64 dimensions of embedding space
● When all else fails, use Mask R-CNN