Instance Segmentation
The first independent seminar #8
Bar Vinograd / 25.03.2018 / Tel Aviv University
Agenda
● What is Instance Segmentation?
● Mask R-CNN Overview
● Instance Embedding (3 papers)
● Summary
What is instance segmentation?
http://cs231n.stanford.edu/index.html
Datasets
● Stills
○ CVPPP leaf segmentation
○ PASCAL VOC
○ COCO
○ CityScapes
○ KITTI Vehicles
○ ...
● Video
○ DAVIS
○ CityScapes
MASK R-CNN
Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick @ FAIR
https://arxiv.org/abs/1703.06870
DensePose: Dense Human Pose Estimation In The
Wild
Rıza Alp Güler, Natalia Neverova, Iasonas Kokkinos @ FAIR
Replaces the mask head with a body part / position head
https://arxiv.org/abs/1802.00434
Problems with Mask R-CNN
● Slow: ~5 fps on a 1080 Ti at 800x1100
● There may be more than one instance in each box
● Performs poorly on objects with a low box fill rate (chair, bicycle)
● A pixel may be shared by multiple objects
● Multi-step: complex to implement and tweak
RetinaNet : Focal Loss for Dense Object Detection
Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár @ FAIR
https://arxiv.org/abs/1708.02002
THE FUTURE
Instance Embedding
Give every pixel an n-dimensional “color”
in an embedding space and cluster in that
space
Papers for today:
● 1703.10277 - Semantic Instance Segmentation via Deep Metric Learning
● 1708.02551 - Semantic Instance Segmentation with a Discriminative Loss
Function
● 1712.08273 - Recurrent Pixel Embedding for Instance Grouping
Instance Embedding
The 2018 Data Science Bowl on
Kaggle. Instance segmentation on
cell nuclei.
Figure panels: original image, semantic segmentation, first 7 dimensions of embedding space
Semantic Instance Segmentation via Deep Metric
Learning
Alireza Fathi, Zbigniew Wojna, Vivek Rathod, Peng Wang, Hyun Oh Song, Sergio
Guadarrama, Kevin P. Murphy
Google / UCLA
https://arxiv.org/abs/1703.10277
Semantic Instance Segmentation via Deep Metric
Learning
Pairwise pixel loss
Weights are set so that they balance large and small objects and sum to 1
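A pairwise pixel loss of this kind can be sketched in NumPy. This is a minimal illustration, not the authors' code: the sigmoid-like similarity 2 / (1 + exp(d²)) and the per-pixel weights (inverse to instance size, normalized so the pair weights sum to 1) are assumptions about the formulation, and `pairwise_embedding_loss` is a hypothetical name.

```python
import numpy as np

def pairwise_embedding_loss(emb, labels):
    """emb: (N, D) embeddings of N sampled pixels; labels: (N,) instance ids."""
    emb = np.asarray(emb, dtype=float)
    labels = np.asarray(labels).ravel()
    # Squared Euclidean distance between every pair of pixels
    # (clipped so that exp() cannot overflow).
    diff = emb[:, None, :] - emb[None, :, :]
    d2 = np.minimum(np.sum(diff ** 2, axis=-1), 50.0)
    # Sigmoid-like similarity: 1 when embeddings coincide, near 0 when far apart.
    sim = 2.0 / (1.0 + np.exp(d2))
    same = labels[:, None] == labels[None, :]
    # Assumed weighting: each pixel is weighted inversely to its instance size,
    # and the pair weights are normalized to sum to 1.
    _, inverse, counts = np.unique(labels, return_inverse=True, return_counts=True)
    w = 1.0 / counts[inverse]
    w_pair = np.outer(w, w)
    w_pair /= w_pair.sum()
    eps = 1e-12
    # Cross-entropy on "same instance" vs "different instance" for every pair.
    log_lik = np.where(same, np.log(sim + eps), np.log(1.0 - sim + eps))
    return -np.sum(w_pair * log_lik)
```

Well-separated instances give a loss near zero, while embeddings that collapse different instances onto the same point are penalized heavily.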
Semantic Instance Segmentation via Deep Metric
Learning
Training the seeds
Pick K (=10) pixels at random and grow a mask around each of them with various
thresholds τ.
If a grown mask has sufficient intersection with a ground truth object, the seed
pixel is assigned that object's class.
Semantic Instance Segmentation via Deep Metric
Learning
Picking the seeds
Unlike NMS, diversity in embedding space is encouraged, rather than spatial
diversity.
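One way to picture seed picking with diversity in embedding space is a greedy selection. The sketch below is an assumption for illustration only: the scoring rule (log seediness plus `alpha` times the log distance to the nearest already-chosen seed), the parameter `alpha`, and the name `pick_seeds` are hypothetical and not taken from the slides.

```python
import numpy as np

def pick_seeds(emb, seediness, k, alpha=1.0):
    """emb: (N, D) pixel embeddings; seediness: (N,) scores in (0, 1]."""
    emb = np.asarray(emb, dtype=float)
    seediness = np.asarray(seediness, dtype=float)
    eps = 1e-12
    # Start with the pixel that has the highest seediness score.
    chosen = [int(np.argmax(seediness))]
    # Embedding-space distance from every pixel to its nearest chosen seed.
    min_dist = np.linalg.norm(emb - emb[chosen[0]], axis=1)
    while len(chosen) < k:
        # Favor pixels that are good seeds AND far from existing seeds
        # in embedding space (not in image space, unlike NMS).
        score = np.log(seediness + eps) + alpha * np.log(min_dist + eps)
        nxt = int(np.argmax(score))
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(emb - emb[nxt], axis=1))
    return chosen
```

With this rule, a second pixel of an already-covered instance is skipped in favor of a slightly weaker seed from a different instance.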
Semantic Instance Segmentation via Deep Metric
Learning
● DeepLab v2 (ResNet-101) backbone, pre-trained on COCO
● Training starts with zero weight on the classification/seediness score; the
weight is gradually increased to 0.2.
● The backbone is applied to an image pyramid (scales 0.25, 0.5, 1, 2) and the
results are fed to the embedding and seediness models.
● Evaluated on PASCAL VOC 2012
Semantic Instance Segmentation with a
Discriminative Loss Function
Bert De Brabandere, Davy Neven, Luc Van Gool
ESAT-PSI, KU Leuven
https://arxiv.org/abs/1708.02551
https://github.com/DavyNeven/fastSceneUnderstanding
Semantic Instance Segmentation with a
Discriminative Loss Function
● Very similar to the previous
paper
● Uses discriminative loss
● Each class is embedded
independently
Semantic Instance Segmentation with a
Discriminative Loss Function
Loss = α · pulling force + β · pushing force + γ · regularization
α = 1, β = 1, γ = 0.001
Push threshold: 1.5
Pull threshold: 0.5
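The three terms can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: the hinges are squared, and the push term activates when two centers are closer than twice the push threshold, which is how I read the paper's formulation; `discriminative_loss` and its argument names are hypothetical.

```python
import numpy as np

def discriminative_loss(emb, labels, delta_pull=0.5, delta_push=1.5,
                        alpha=1.0, beta=1.0, gamma=0.001):
    """emb: (N, D) pixel embeddings; labels: (N,) instance ids."""
    emb = np.asarray(emb, dtype=float)
    labels = np.asarray(labels).ravel()
    ids = np.unique(labels)
    centers = np.stack([emb[labels == i].mean(axis=0) for i in ids])
    # Pulling force: pixels farther than delta_pull from their own
    # instance center are pulled towards it.
    pull = 0.0
    for center, i in zip(centers, ids):
        dist = np.linalg.norm(emb[labels == i] - center, axis=1)
        pull += np.mean(np.maximum(dist - delta_pull, 0.0) ** 2)
    pull /= len(ids)
    # Pushing force: centers closer than 2 * delta_push repel each other
    # (assumed form of the hinge).
    push = 0.0
    if len(ids) > 1:
        cd = np.linalg.norm(centers[:, None] - centers[None, :], axis=-1)
        hinge = np.maximum(2.0 * delta_push - cd, 0.0) ** 2
        off_diag = ~np.eye(len(ids), dtype=bool)
        push = hinge[off_diag].mean()
    # Regularization: keep the centers close to the origin.
    reg = np.mean(np.linalg.norm(centers, axis=1))
    return alpha * pull + beta * push + gamma * reg
```

For two tight clusters that are far apart only the small regularization term remains; if both instances collapse onto one point the push term dominates.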
Semantic Instance Segmentation
with a Discriminative Loss
Function
Parsing | mean-shift clustering
1. Pick an unlabeled pixel and assume its
embedding value is the mean of the instance
a. Find all pixels that are close (below threshold) to the current mean
b. Compute the mean of the new set in embedding space
c. Go to step a and repeat until convergence (the mean stops changing)
2. Go to step 1 if any unlabeled pixels remain
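The parsing steps above can be sketched in NumPy. This is an illustrative sketch, not the authors' implementation: restricting the search to still-unlabeled pixels, the iteration cap `max_iter`, and the name `parse_instances` are assumptions.

```python
import numpy as np

def parse_instances(emb, threshold, max_iter=100):
    """emb: (N, D) embeddings of foreground pixels. Returns (N,) instance ids."""
    emb = np.asarray(emb, dtype=float)
    labels = np.full(len(emb), -1, dtype=int)
    next_id = 0
    while np.any(labels == -1):
        unlabeled = labels == -1
        # Step 1: take an unlabeled pixel as the initial instance mean.
        start = int(np.flatnonzero(unlabeled)[0])
        mean = emb[start]
        members = np.zeros(len(emb), dtype=bool)
        for _ in range(max_iter):
            # Step a: unlabeled pixels within `threshold` of the current mean.
            close = unlabeled & (np.linalg.norm(emb - mean, axis=1) < threshold)
            if not close.any():
                break
            members = close
            # Step b: recompute the mean of the new set.
            new_mean = emb[members].mean(axis=0)
            # Step c: stop once the mean no longer changes.
            if np.allclose(new_mean, mean):
                break
            mean = new_mean
        members[start] = True  # always label the starting pixel, so the loop ends
        labels[members] = next_id
        next_id += 1
    return labels
```

The threshold plays the role of the clustering bandwidth; in practice it would be tied to the margins used during training.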
Semantic Instance Segmentation with a
Discriminative Loss Function
● A semantic segmentation mask should be trained alongside. Clustering is applied
only to pixels that are considered part of an object.
● Probably best to set the pull threshold to 0.
● Unlike a loss on similar/different pixel pairs, with contrastive loss, information
flows between all pixels
● Semantic Segmentation matters a lot
● No need to balance instance sizes
Recurrent Pixel Embedding for Instance Grouping
Shu Kong, Charless Fowlkes
University of California
https://arxiv.org/abs/1712.08273
https://github.com/aimerykong/Recurrent-Pixel-Embedding-for-Instance-Grouping
Recurrent Pixel Embedding for Instance Grouping
● Embedding on an n-dimensional sphere
● Pairwise pixel loss, cosine distance
● Main contribution: mean-shift clustering is part of the model and
differentiable
Recurrent Pixel Embedding for Instance Grouping
● Calibrated cosine distance
Weighted by the size of the instances
Use α = 0.5
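A loss built on the calibrated cosine similarity can be sketched as below. The details are assumptions for illustration: similarity is rescaled to [0, 1] as (1 + cos) / 2, same-instance pairs are penalized by 1 - s, different-instance pairs by max(s - α, 0), and each pixel is weighted inversely to its instance size. `calibrated_cosine_loss` is a hypothetical name.

```python
import numpy as np

def calibrated_cosine_loss(emb, labels, alpha=0.5):
    """emb: (N, D) pixel embeddings; labels: (N,) instance ids."""
    emb = np.asarray(emb, dtype=float)
    labels = np.asarray(labels).ravel()
    # Project the embeddings onto the unit sphere.
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    # Calibrated cosine similarity in [0, 1].
    sim = 0.5 * (1.0 + unit @ unit.T)
    same = labels[:, None] == labels[None, :]
    # Assumed weighting: inverse instance size, so small instances count as
    # much as large ones.
    _, inverse, counts = np.unique(labels, return_inverse=True, return_counts=True)
    w = 1.0 / counts[inverse]
    w_pair = np.outer(w, w)
    # Same instance: push similarity to 1.
    # Different instances: only penalize similarity above the margin alpha.
    per_pair = np.where(same, 1.0 - sim, np.maximum(sim - alpha, 0.0))
    return np.sum(w_pair * per_pair) / np.sum(w_pair)
```

With α = 0.5, embeddings of different instances stop being penalized once they are orthogonal or further apart on the sphere.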
Recurrent Pixel Embedding for Instance Grouping
● Cubic convergence guarantees
● May be applied only to neighbourhoods or any subset of the whole image
Recurrent Pixel Embedding for Instance Grouping
More on GBMS: http://www.cs.cmu.edu/~aarti/SMLRG/miguel_slides.pdf
Recurrent Pixel Embedding for Instance Grouping
● A Gaussian kernel is not appropriate because distance should be measured
as cosine distance
● Uses the von Mises-Fisher distribution instead,
a “gaussian” on the sphere surface
● L2 normalization should be performed
after each iteration
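A few iterations of Gaussian blurring mean-shift with a von Mises-Fisher kernel can be sketched in NumPy. This is a rough sketch, not the paper's module: the concentration parameter `kappa`, the fixed iteration count, and the name `gbms_vmf` are assumptions, and the real model runs these updates as differentiable layers.

```python
import numpy as np

def gbms_vmf(emb, kappa=10.0, n_iter=5):
    """emb: (N, D) pixel embeddings. Returns the blurred unit-norm embeddings."""
    x = np.asarray(emb, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    for _ in range(n_iter):
        # von Mises-Fisher kernel: depends only on cosine similarity.
        kernel = np.exp(kappa * (x @ x.T))
        # Blurring step: every point moves to the kernel-weighted mean
        # of all points.
        x = (kernel @ x) / kernel.sum(axis=1, keepdims=True)
        # L2 normalization after each iteration keeps points on the sphere.
        x = x / np.linalg.norm(x, axis=1, keepdims=True)
    return x
```

Points that start close together on the sphere collapse towards a common direction, while well-separated groups stay apart, which is what makes a simple grouping step sufficient afterwards.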
Recurrent Pixel Embedding for Instance Grouping
Uses the von Mises-Fisher distribution (gaussian on a sphere surface) instead of
the gaussian kernel
Recurrent Pixel Embedding for Instance Grouping
● Computing the similarity matrix is expensive, so only a subset of the pixels
(~50%) participates in this phase
● The loss is backpropagated
at each iteration of the
module
● The iterative application can be seen as analogous to hard negative mining
● DeepLab-v3 is used as the backbone
Comparison
● Semantic Instance Segmentation via Deep Metric Learning
○ Embedding loss: pairwise sigmoid-like loss, Euclidean distance
○ Seeds: learned seediness score
○ Parsing: expand mask around seeds
● Semantic Instance Segmentation with a Discriminative Loss Function
○ Embedding loss: center ≠ center, point -> center, Euclidean distance
○ Seeds: random
○ Parsing: mean-shift around seeds
● Recurrent Pixel Embedding for Instance Grouping
○ Embedding loss: pairwise pixel + GBMS, cosine distance
○ Seeds: random
○ Parsing: GBMS proposals and simple LR + mean shift
Other papers
● End-to-End Instance Segmentation with Recurrent Attention
https://arxiv.org/abs/1605.09410
● Deep Watershed Transform for Instance Segmentation
https://arxiv.org/abs/1611.08303
● Associative Embedding: End-to-End Learning for Joint Detection and
Grouping
http://ttic.uchicago.edu/~mmaire/papers/pdf/affinity_cnn_cvpr2016.pdf
● SGN: Sequential Grouping Networks for Instance Segmentation
https://www.cs.toronto.edu/~urtasun/publications/liu_etal_iccv17.pdf
Takeaways
● Use contrastive loss with pulling threshold 0
● Either learn a seediness model or implement GBMS
● The accuracy/speed trade-off is controlled almost exclusively by the choice of
backbone
● Pretrain on COCO
● No one needs more than 64 dimensions of embedding space
● When all else fails, use Mask R-CNN
Questions?
Thank You!
me@barvinograd.com
