© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark
Semantic Segmentation
Problem statement: Pixel-level classification task
Applications: Brain tissue segmentation
Source: U-Net: Convolutional Networks for Biomedical Image Segmentation, Olaf Ronneberger, Philipp Fischer, Thomas Brox, 2015
Applications: Satellite image land use
source: https://github.com/reachsumit/deep-unet-for-satellite-image-segmentation
Applications: Self-driving cars
source: https://www.youtube.com/watch?v=ATlcEDSPWXY
How does it work?
Source: Fully Convolutional Networks for Semantic Segmentation, Long et al. 2015
Deep Neural Network
Input
RGB or Grayscale Images
Unsigned integer [0,255]
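The input format above can be illustrated with a small sketch. It assumes NumPy, an (H, W) or (H, W, 3) array layout, and scaling to [0, 1]; none of these choices come from the slides, and the function name `to_network_input` is hypothetical.

```python
import numpy as np

def to_network_input(image_u8):
    """Convert an (H, W) or (H, W, 3) uint8 image to float32 (H, W, C) in [0, 1].

    The [0, 1] scaling and channels-last layout are assumptions for illustration.
    """
    if image_u8.dtype != np.uint8:
        raise TypeError("expected an unsigned 8-bit image")
    if image_u8.ndim == 2:
        # Grayscale: add a trailing channel axis so both cases are (H, W, C).
        image_u8 = image_u8[:, :, None]
    return image_u8.astype(np.float32) / 255.0

# Synthetic stand-ins for an RGB image and a grayscale image.
rgb = np.random.default_rng(0).integers(0, 256, size=(4, 6, 3), dtype=np.uint8)
gray = rgb[:, :, 0]
x_rgb = to_network_input(rgb)
x_gray = to_network_input(gray)
print(x_rgb.shape, x_gray.shape, x_rgb.dtype)
```

Real pipelines often also subtract a per-channel mean and divide by a standard deviation; that step is left out here because the slides do not specify it.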
N classes
Output: predict one “heat-map” per class
Softmax across class axis
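As a rough illustration of the output stage, the sketch below applies a softmax across the class axis of an array of per-class score maps and then takes the most likely class per pixel. The (H, W, N) layout, the random scores, and the function name are assumptions for illustration, not part of the original deck.

```python
import numpy as np

def softmax_over_classes(scores):
    """scores: (H, W, N) raw per-class maps. Returns probabilities summing to 1 per pixel."""
    # Subtract the per-pixel maximum for numerical stability.
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 5, 3))   # H=4, W=5, N=3 classes (made-up values)
probs = softmax_over_classes(scores)  # one "heat-map" per class: probs[:, :, c]
label_map = probs.argmax(axis=-1)     # predicted class index for each pixel
print(probs.shape, label_map.shape)
```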
How does it work?
Trained to minimize the softmax cross-entropy loss summed over the pixels (i, j), where each pixel's prediction is a distribution over the N classes:
$$\mathrm{loss} = -\sum_{i,j}^{H,W} \sum_{c}^{N} y_{i,j,c} \, \log(p_{i,j,c})$$
$$\mathrm{loss} = -\sum_{i,j}^{H,W} \log(p_{i,j,c=y_{i,j}})$$
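The two loss expressions above give the same value when y is one-hot in the first form and a class index in the second. The following NumPy sketch checks this on random data; the array shapes, the random inputs, and the function names are assumptions made for illustration.

```python
import numpy as np

def cross_entropy_onehot(probs, onehot):
    """First form: -sum over pixels and classes of y[i,j,c] * log(p[i,j,c])."""
    return -np.sum(onehot * np.log(probs))

def cross_entropy_index(probs, labels):
    """Second form: -sum over pixels of log(p[i,j,c]) at the true class c = y[i,j]."""
    rows, cols = np.indices(labels.shape)
    return -np.sum(np.log(probs[rows, cols, labels]))

rng = np.random.default_rng(0)
H, W, N = 4, 5, 3
scores = rng.normal(size=(H, W, N))
probs = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
labels = rng.integers(0, N, size=(H, W))  # true class index per pixel
onehot = np.eye(N)[labels]                # same labels, one-hot encoded (H, W, N)

loss_a = cross_entropy_onehot(probs, onehot)
loss_b = cross_entropy_index(probs, labels)
print(loss_a, loss_b)
```

The index form avoids materializing the one-hot tensor, which matters when H, W, and N are large.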
Main challenge: capturing multi-scale context
cow?
Strategies for capturing multi-scale context
Source: DeepLab V3, Rethinking Atrous Convolution for Semantic Image Segmentation, Chen et al. 2017
Architectures: HourGlass
Architecture of the full network. The convolution network is based on the VGG16 architecture. The deconvolution
network uses unpooling and deconvolution layers. Source: H. Noh et al. (2015)
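To make the unpooling step mentioned in the caption concrete, here is a minimal single-channel sketch: 2x2 max pooling records which position in each window held the maximum, and unpooling writes each pooled value back to that position, leaving zeros elsewhere. The function names and the fixed 2x2 window are assumptions for illustration; this is not the authors' implementation.

```python
import numpy as np

def max_pool_2x2(x):
    """x: (H, W) with even H and W. Returns pooled values and argmax positions (0..3)."""
    h, w = x.shape
    # Rearrange into (H/2, W/2, 4): the 4 entries of each non-overlapping 2x2 window.
    windows = x.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3).reshape(h // 2, w // 2, 4)
    return windows.max(axis=-1), windows.argmax(axis=-1)

def max_unpool_2x2(pooled, switches):
    """Place each pooled value back at its recorded position; other entries are zero."""
    ph, pw = pooled.shape
    windows = np.zeros((ph, pw, 4), dtype=pooled.dtype)
    rows, cols = np.indices((ph, pw))
    windows[rows, cols, switches] = pooled
    # Invert the window rearrangement used in max_pool_2x2.
    return windows.reshape(ph, pw, 2, 2).transpose(0, 2, 1, 3).reshape(ph * 2, pw * 2)

x = np.array([[1, 2, 0, 0],
              [3, 4, 0, 5],
              [0, 0, 7, 0],
              [6, 0, 0, 0]], dtype=np.float32)
pooled, switches = max_pool_2x2(x)
restored = max_unpool_2x2(pooled, switches)
print(pooled)
print(restored)
```

The unpooled map is sparse, which is why the caption pairs unpooling with deconvolution layers that densify it.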
Architectures: U-Net
Source: U-Net: Convolutional Networks for Biomedical Image Segmentation, Olaf Ronneberger, Philipp Fischer, Thomas Brox, 2015
Architectures: DeepLab V3
Source: Rethinking Atrous Convolution for Semantic Image Segmentation, Liang-Chieh Chen, George Papandreou,
Florian Schroff, Hartwig Adam, 2017
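The atrous (dilated) convolution named in the paper title can be sketched in a few lines of NumPy. The example below is a single-channel version with zero padding and a hypothetical function name; it only shows how the rate spreads the kernel taps apart and is not the DeepLab implementation.

```python
import numpy as np

def atrous_conv2d(x, kernel, rate=1):
    """Single-channel atrous cross-correlation with zero 'same' padding.

    x: (H, W); kernel: (k, k) with odd k; rate: spacing between kernel taps.
    """
    k = kernel.shape[0]
    pad = rate * (k // 2)
    padded = np.pad(x, pad)  # zero padding on all sides
    h, w = x.shape
    out = np.zeros((h, w), dtype=np.float64)
    for a in range(k):
        for b in range(k):
            # Each tap reads the input shifted by a multiple of the rate.
            out += kernel[a, b] * padded[a * rate:a * rate + h, b * rate:b * rate + w]
    return out

def effective_kernel_size(k, rate):
    """Extent of the input covered by a k x k kernel at the given rate."""
    return k + (k - 1) * (rate - 1)

impulse = np.zeros((9, 9))
impulse[4, 4] = 1.0
kernel = np.ones((3, 3))
dense = atrous_conv2d(impulse, kernel, rate=1)    # taps touch a 3x3 neighbourhood
dilated = atrous_conv2d(impulse, kernel, rate=2)  # same 9 taps spread over 5x5
print(effective_kernel_size(3, 1), effective_kernel_size(3, 2))
```

With the same nine weights, rate 2 covers a 5x5 region, so context grows without adding parameters or downsampling the feature map.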
Architectures: DeepLab V3+
Source: Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation, Liang-Chieh Chen, Yukun Zhu,
George Papandreou, Florian Schroff, and Hartwig Adam, 2018
Architectures: and more
See this Medium blog post: Review of deep learning algorithm for semantic segmentation
Fully Convolutional Network
ParseNet
Feature Pyramid Network
Pyramid Scene Parsing network (PSPNet)
Path Aggregation Network (PANet)
Context Encoding Network (EncNet)
Conclusion
The key challenge in semantic segmentation is to efficiently mix local and global context for pixel-wise predictions.
Thank you!
Go Build! https://gluon-cv.mxnet.io/build/examples_segmentation/index.html

Image Segmentation: Approaches and Challenges


Editor's Notes

  • #2 First call deck for a high-level introduction to Apache MXNet.
  • #3 Pixel house landscape by 8bitnoob