SlideShare a Scribd company logo
1 of 34
DeepLabV3+
Encoder-Decoder with Atrous Separable Convolution
for Semantic Image Segmentation
Background
▪ DeepLabV3+ is the latest version of the DeepLab models.
▪ DeepLab V1: Semantic Image Segmentation with Deep Convolutional Nets and
Fully Connected CRFs. ICLR 2015.
▪ DeepLab V2: DeepLab: Semantic Image Segmentation with Deep Convolutional
Nets, Atrous Convolution, and Fully Connected CRFs. TPAMI 2017.
▪ DeepLab V3: Rethinking Atrous Convolution for Semantic Image Segmentation.
arXiv 2017.
▪ DeepLab V3+: Encoder-Decoder with Atrous Separable Convolution for Semantic
Image Segmentation. arXiv 2018.
Semantic Segmentation
▪ Classifying all pixels in an image
into classes.
▪ Classification at the pixel level.
▪ Does not have to separate different
instances of the same class.
▪ Has important applications in
Medical Imaging.
Current Results on Pascal VOC 2012
Motivation and Key Concepts
▪ Use Atrous Convolution and Separable Convolutions to reduce computation.
▪ Combine Atrous Spatial Pyramid Pooling Modules and Encoder-Decoder
Structures.
▪ ASPPs capture contextual information at multiple scales by pooling features at
different resolutions.
▪ Encoder-Decoders can obtain sharp object boundaries.
Architecture Overview
Advanced Convolutions
Convolution (Cross-Correlation) for 1 Channel
Convolution with Zero-Padding Display with Convolution Kernel
Blue maps: inputs, Cyan maps: outputs, Kernel: not displayed
Other Convolutions (Cross-Correlations)
Strided Convolution with Padding Atrous (Dilated) Convolution with r=2
Blue maps: inputs, Cyan maps: outputs, Kernel: not displayed
Atrous Convolution
▪ à trous is French for “with holes”
▪ Atrous Convolution is also known as
Dilated Convolution.
▪ Atrous Convolution with r=1 is the
same as ordinary Convolution
▪ The image on the left shows 1D
atrous convolution
Receptive Field of Atrous Convolutions
▪ Left: r=1, Middle: r=2, Right: r=4
▪ Atrous Convolution has a larger receptive field than normal convolution
with the same number of parameters.
Depth-wise Separable Convolution
▪ A special case of Grouped Convolution.
▪ Separate the convolution operation along the depth (channel) dimension.
▪ It can refer to both (depth -> point) and (point -> depth).
▪ It only has meaning in multi-channel convolutions (cross-correlations).
Review: Multi-Channel 2D Convolution
Exact Shapes and Terminology
▪ Filter: A collection of 𝑪𝒊𝒏 Kernels of shape (𝑲 𝑯, 𝑲 𝑾) concatenated channel-wise.
▪ Input Tensor Shape: 𝑵, 𝑪𝒊𝒏, 𝑯, 𝑾 𝑜𝑟 (𝑵, 𝑯, 𝑾, 𝑪𝒊𝒏)
▪ Filters are 3D, Kernels are 2D, All filters are concatenated to a single 4D array in 2D CNNs.
(𝑵: 𝐵𝑎𝑡𝑐ℎ 𝑁𝑢𝑚𝑏𝑒𝑟, 𝑯: 𝐹𝑒𝑎𝑡𝑢𝑟𝑒 𝐻𝑒𝑖𝑔ℎ𝑡, 𝑾: 𝐹𝑒𝑎𝑡𝑢𝑟𝑒 𝑊𝑖𝑑𝑡ℎ,
𝑪𝒊𝒏: #𝐼𝑛𝑝𝑢𝑡 𝐶ℎ𝑎𝑛𝑛𝑒𝑙𝑠, 𝑪 𝒐𝒖𝒕: #𝑂𝑢𝑡𝑝𝑢𝑡 𝐶ℎ𝑎𝑛𝑛𝑒𝑙𝑠, 𝑲 𝑯: 𝐾𝑒𝑟𝑛𝑒𝑙 𝐻𝑒𝑖𝑔ℎ𝑡, 𝑲 𝑾: 𝐾𝑒𝑟𝑛𝑒𝑙 𝑊𝑖𝑑𝑡ℎ)
Step 1: Convolution on Input Tensor Channels
Step 2: Summation along Input Channel Dimension
Step 3: Add Bias Term
▪ Each kernel of a filter iterates only
1 channel of the input tensor.
▪ The number of filters is 𝐶 𝑜𝑢𝑡. Each
filter generates one output channel.
▪ Each 2D kernel is different from all
other kernels in the 3D filter.
Key Points
Normal Convolution
▪ Top: Input Tensor
▪ Middle: Filter
▪ Bottom: Output Tensor
Depth-wise Separable Convolution
▪ Replace Step 2.
▪ Instead of summation, use point-wise convolution (1x1 convolution).
▪ There is now only one (𝑪𝒊𝒏, 𝑲 𝑯, 𝑲 𝑾) filter.
▪ The number of 1x1 filters is 𝑪 𝒐𝒖𝒕.
▪ Bias is usually included only at the end of both convolution operations.
▪ Usually refers to depth-wise convolution -> point-wise convolution.
▪ Xception uses point-wise convolution -> depth-wise convolution.
Depth-wise Separable Convolution
Characteristics
▪ Depth-wise Separable Convolution can be used as a drop-in replacement for
ordinary convolution in DCNNs.
▪ The number of parameters is reduced significantly (sparse representation).
▪ The number of flops is reduced by several orders of magnitude
(computationally efficient).
▪ There is no significant drop in performance (performance may even improve).
▪ Wall-clock time reduction is less dramatic due to GPU memory access patterns.
Example: Flop Comparison (Padding O, Bias X)
Ordinary Convolution
▪ 𝐻 ∗ 𝑊 ∗ 𝐾 𝐻 ∗ 𝐾 𝑊 ∗ 𝐶𝑖𝑛 ∗ 𝐶 𝑜𝑢𝑡
▪ For a 256x256x3 image with 128 filters
with kernel size of 3x3, the number of
flops would be
256 ∗ 256 ∗ 3 ∗ 3 ∗ 3 ∗ 128 = 226,492,416
▪ There is an 8-fold reduction in the
number of flops.
Depth-wise Separable Convolution
▪ 𝐻 ∗ 𝑊 ∗ 𝐾 𝐻 ∗ 𝐾 𝑊 ∗ 𝐶𝑖𝑛 + 𝐻 ∗ 𝑊 ∗ 𝐶𝑖𝑛 ∗ 𝐶 𝑜𝑢𝑡
▪ Left: Depth Conv, Right: Point Conv
▪ For a 256x256x3 image with 128 filters
and a 3x3 kernel size, the number of
flops would be
256 ∗ 256 ∗ 3 ∗ 3 ∗ 3 + 256 ∗ 256 ∗ 3 ∗ 128
= 1,769,472 + 25,165,824 = 26,935,296
Example: Parameter Comparison (Excluding Bias Term)
Ordinary Convolution
▪ 𝐾 𝐻 ∗ 𝐾 𝑊 ∗ 𝐶𝑖𝑛 ∗ 𝐶 𝑜𝑢𝑡
▪ For a 256x256x3 image with 128 filters
and 3x3 kernel size, the number of
weights would be
3 ∗ 3 ∗ 3 ∗ 128 = 3,456
Depth-wise Separable Convolution
▪ 𝐾 𝐻 ∗ 𝐾 𝑊 ∗ 𝐶𝑖𝑛 + 𝐶𝑖𝑛 ∗ 𝐶 𝑜𝑢𝑡
▪ For a 256x256x3 image with 128 filters
and 3x3 kernel size, the number of flops
would be
3 ∗ 3 ∗ 3 + 3 ∗ 128 = 411
▪ There is also an 8-fold reduction in
parameter numbers.
Atrous Depth-wise Separable Convolution
Architecture Overview
Encoder-Decoder Structures
▪ The Encoder reduces the spatial sizes of feature maps, while extracting higher-
level semantic information.
▪ The Decoder gradually recovers the spatial information.
▪ UNETs are a classical example of encoder-decoder structures.
▪ In DeepLabV3+, DeepLabV3 is used as the encoder.
Architecture Overview
Decoder Layer
Structure
1. Apply 4-fold bilinear up-sampling on the
ASPP outputs.
2. Apply 1x1 Convolution with reduced filter
number on a intermediate feature layer.
3. Concatenate ASPP outputs with
intermediate features.
4. Apply two 3x3 Convolutions.
5. Apply 4-fold bilinear up-sampling.
Purpose & Implementation
▪ The ASPP is poor at capturing fine details.
▪ The decoder is used to improve the
resolution of the image.
▪ The intermediate layer has 1x1
convolutions to reduce channel number.
ASPP: Atrous Spatial Pyramid Pooling
The ASPP Layer
▪ Encodes multi-scale contextual
information through multiple rates.
▪ Concatenate all extracted features and an
up-sampled global average pooling layer
channel-wise.
▪ Use Atrous Depth-wise separable
convolutions for multiple channels.
▪ Bad at capturing sharp object boundaries.
Modified Aligned Xception Network
▪ Xception: Extreme Inception Network.
▪ Backbone network for DeepLabV3+
▪ Uses residual blocks and separable
convolutions.
Explanation of Xception
▪ Takes the “Inception Hypothesis”, which states that cross-channel correlations
and spatial correlations are sufficiently decoupled that it is preferable not to
map them jointly, to the extreme.
▪ The extensive use of separable convolutions and atrous convolutions allows
the model to fit in GPU memory despite the huge number of layers.
▪ Originally applied point-wise convolution before depth-wise convolution.
▪ Invented by François Chollet.
Architecture Review
The End

More Related Content

What's hot

Semantic Segmentation Methods using Deep Learning
Semantic Segmentation Methods using Deep LearningSemantic Segmentation Methods using Deep Learning
Semantic Segmentation Methods using Deep LearningSungjoon Choi
 
Self-supervised Learning Lecture Note
Self-supervised Learning Lecture NoteSelf-supervised Learning Lecture Note
Self-supervised Learning Lecture NoteSangwoo Mo
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks남주 김
 
Optimization for Deep Learning
Optimization for Deep LearningOptimization for Deep Learning
Optimization for Deep LearningSebastian Ruder
 
Computer Vision: Correlation, Convolution, and Gradient
Computer Vision: Correlation, Convolution, and GradientComputer Vision: Correlation, Convolution, and Gradient
Computer Vision: Correlation, Convolution, and GradientAhmed Gad
 
Image Classification using deep learning
Image Classification using deep learning Image Classification using deep learning
Image Classification using deep learning Asma-AH
 
Machine Learning - Object Detection and Classification
Machine Learning - Object Detection and ClassificationMachine Learning - Object Detection and Classification
Machine Learning - Object Detection and ClassificationVikas Jain
 
Enhancement in spatial domain
Enhancement in spatial domainEnhancement in spatial domain
Enhancement in spatial domainAshish Kumar
 
Autoencoders in Deep Learning
Autoencoders in Deep LearningAutoencoders in Deep Learning
Autoencoders in Deep Learningmilad abbasi
 
Survey on Monocular Depth Estimation
Survey on Monocular Depth EstimationSurvey on Monocular Depth Estimation
Survey on Monocular Depth Estimation범준 김
 
[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution
[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution
[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-ResolutionTaegyun Jeon
 
ResNet basics (Deep Residual Network for Image Recognition)
ResNet basics (Deep Residual Network for Image Recognition)ResNet basics (Deep Residual Network for Image Recognition)
ResNet basics (Deep Residual Network for Image Recognition)Sanjay Saha
 
GAN - Theory and Applications
GAN - Theory and ApplicationsGAN - Theory and Applications
GAN - Theory and ApplicationsEmanuele Ghelfi
 
Point processing
Point processingPoint processing
Point processingpanupriyaa7
 
Disentangled Representation Learning of Deep Generative Models
Disentangled Representation Learning of Deep Generative ModelsDisentangled Representation Learning of Deep Generative Models
Disentangled Representation Learning of Deep Generative ModelsRyohei Suzuki
 
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015Jia-Bin Huang
 
Intro To Convolutional Neural Networks
Intro To Convolutional Neural NetworksIntro To Convolutional Neural Networks
Intro To Convolutional Neural NetworksMark Scully
 

What's hot (20)

Semantic Segmentation Methods using Deep Learning
Semantic Segmentation Methods using Deep LearningSemantic Segmentation Methods using Deep Learning
Semantic Segmentation Methods using Deep Learning
 
Self-supervised Learning Lecture Note
Self-supervised Learning Lecture NoteSelf-supervised Learning Lecture Note
Self-supervised Learning Lecture Note
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
 
Optimization for Deep Learning
Optimization for Deep LearningOptimization for Deep Learning
Optimization for Deep Learning
 
Computer Vision: Correlation, Convolution, and Gradient
Computer Vision: Correlation, Convolution, and GradientComputer Vision: Correlation, Convolution, and Gradient
Computer Vision: Correlation, Convolution, and Gradient
 
Bit plane coding
Bit plane codingBit plane coding
Bit plane coding
 
Image Classification using deep learning
Image Classification using deep learning Image Classification using deep learning
Image Classification using deep learning
 
Machine Learning - Object Detection and Classification
Machine Learning - Object Detection and ClassificationMachine Learning - Object Detection and Classification
Machine Learning - Object Detection and Classification
 
Enhancement in spatial domain
Enhancement in spatial domainEnhancement in spatial domain
Enhancement in spatial domain
 
Autoencoders in Deep Learning
Autoencoders in Deep LearningAutoencoders in Deep Learning
Autoencoders in Deep Learning
 
Survey on Monocular Depth Estimation
Survey on Monocular Depth EstimationSurvey on Monocular Depth Estimation
Survey on Monocular Depth Estimation
 
[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution
[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution
[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution
 
Hog and sift
Hog and siftHog and sift
Hog and sift
 
ResNet basics (Deep Residual Network for Image Recognition)
ResNet basics (Deep Residual Network for Image Recognition)ResNet basics (Deep Residual Network for Image Recognition)
ResNet basics (Deep Residual Network for Image Recognition)
 
GAN - Theory and Applications
GAN - Theory and ApplicationsGAN - Theory and Applications
GAN - Theory and Applications
 
Point processing
Point processingPoint processing
Point processing
 
Disentangled Representation Learning of Deep Generative Models
Disentangled Representation Learning of Deep Generative ModelsDisentangled Representation Learning of Deep Generative Models
Disentangled Representation Learning of Deep Generative Models
 
SSD: Single Shot MultiBox Detector (UPC Reading Group)
SSD: Single Shot MultiBox Detector (UPC Reading Group)SSD: Single Shot MultiBox Detector (UPC Reading Group)
SSD: Single Shot MultiBox Detector (UPC Reading Group)
 
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015
 
Intro To Convolutional Neural Networks
Intro To Convolutional Neural NetworksIntro To Convolutional Neural Networks
Intro To Convolutional Neural Networks
 

Similar to DeepLab V3+: Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

Deep learning for image video processing
Deep learning for image video processingDeep learning for image video processing
Deep learning for image video processingYu Huang
 
PR243: Designing Network Design Spaces
PR243: Designing Network Design SpacesPR243: Designing Network Design Spaces
PR243: Designing Network Design SpacesJinwon Lee
 
Quality enhamcment
Quality enhamcmentQuality enhamcment
Quality enhamcmentYara Ali
 
CyberSec_JPEGcompressionForensics.pdf
CyberSec_JPEGcompressionForensics.pdfCyberSec_JPEGcompressionForensics.pdf
CyberSec_JPEGcompressionForensics.pdfMohammadAzreeYahaya
 
Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016Graham Wihlidal
 
Convolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningConvolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningMohamed Loey
 
(Paper Review)3D shape reconstruction from sketches via multi view convolutio...
(Paper Review)3D shape reconstruction from sketches via multi view convolutio...(Paper Review)3D shape reconstruction from sketches via multi view convolutio...
(Paper Review)3D shape reconstruction from sketches via multi view convolutio...MYEONGGYU LEE
 
2021 01-04-learning filter-basis
2021 01-04-learning filter-basis2021 01-04-learning filter-basis
2021 01-04-learning filter-basisJAEMINJEONG5
 
Week5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptxWeek5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptxfahmi324663
 
Adaptive Linear Solvers and Eigensolvers
Adaptive Linear Solvers and EigensolversAdaptive Linear Solvers and Eigensolvers
Adaptive Linear Solvers and Eigensolversinside-BigData.com
 
Lecture 5: Convolutional Neural Network Models
Lecture 5: Convolutional Neural Network ModelsLecture 5: Convolutional Neural Network Models
Lecture 5: Convolutional Neural Network ModelsMohamed Loey
 
150807 Fast R-CNN
150807 Fast R-CNN150807 Fast R-CNN
150807 Fast R-CNNJunho Cho
 
PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...Jinwon Lee
 
MobileNet - PR044
MobileNet - PR044MobileNet - PR044
MobileNet - PR044Jinwon Lee
 
Faster R-CNN - PR012
Faster R-CNN - PR012Faster R-CNN - PR012
Faster R-CNN - PR012Jinwon Lee
 
IJCAI13 Paper review: Large-scale spectral clustering on graphs
IJCAI13 Paper review: Large-scale spectral clustering on graphsIJCAI13 Paper review: Large-scale spectral clustering on graphs
IJCAI13 Paper review: Large-scale spectral clustering on graphsAkisato Kimura
 
Convolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular ArchitecturesConvolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular Architecturesananth
 

Similar to DeepLab V3+: Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation (20)

Deep learning for image video processing
Deep learning for image video processingDeep learning for image video processing
Deep learning for image video processing
 
PR243: Designing Network Design Spaces
PR243: Designing Network Design SpacesPR243: Designing Network Design Spaces
PR243: Designing Network Design Spaces
 
Quality enhamcment
Quality enhamcmentQuality enhamcment
Quality enhamcment
 
CyberSec_JPEGcompressionForensics.pdf
CyberSec_JPEGcompressionForensics.pdfCyberSec_JPEGcompressionForensics.pdf
CyberSec_JPEGcompressionForensics.pdf
 
Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016
 
Convolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningConvolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep Learning
 
(Paper Review)3D shape reconstruction from sketches via multi view convolutio...
(Paper Review)3D shape reconstruction from sketches via multi view convolutio...(Paper Review)3D shape reconstruction from sketches via multi view convolutio...
(Paper Review)3D shape reconstruction from sketches via multi view convolutio...
 
2021 01-04-learning filter-basis
2021 01-04-learning filter-basis2021 01-04-learning filter-basis
2021 01-04-learning filter-basis
 
WT in IP.ppt
WT in IP.pptWT in IP.ppt
WT in IP.ppt
 
Week5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptxWeek5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptx
 
Adaptive Linear Solvers and Eigensolvers
Adaptive Linear Solvers and EigensolversAdaptive Linear Solvers and Eigensolvers
Adaptive Linear Solvers and Eigensolvers
 
Lecture 5: Convolutional Neural Network Models
Lecture 5: Convolutional Neural Network ModelsLecture 5: Convolutional Neural Network Models
Lecture 5: Convolutional Neural Network Models
 
150807 Fast R-CNN
150807 Fast R-CNN150807 Fast R-CNN
150807 Fast R-CNN
 
Cv mini project (1)
Cv mini project (1)Cv mini project (1)
Cv mini project (1)
 
Data compression
Data compressionData compression
Data compression
 
PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...
 
MobileNet - PR044
MobileNet - PR044MobileNet - PR044
MobileNet - PR044
 
Faster R-CNN - PR012
Faster R-CNN - PR012Faster R-CNN - PR012
Faster R-CNN - PR012
 
IJCAI13 Paper review: Large-scale spectral clustering on graphs
IJCAI13 Paper review: Large-scale spectral clustering on graphsIJCAI13 Paper review: Large-scale spectral clustering on graphs
IJCAI13 Paper review: Large-scale spectral clustering on graphs
 
Convolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular ArchitecturesConvolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular Architectures
 

More from Joonhyung Lee

Rethinking Attention with Performers
Rethinking Attention with PerformersRethinking Attention with Performers
Rethinking Attention with PerformersJoonhyung Lee
 
Denoising Unpaired Low Dose CT Images with Self-Ensembled CycleGAN
Denoising Unpaired Low Dose CT Images with Self-Ensembled CycleGANDenoising Unpaired Low Dose CT Images with Self-Ensembled CycleGAN
Denoising Unpaired Low Dose CT Images with Self-Ensembled CycleGANJoonhyung Lee
 
Deep Learning Fast MRI Using Channel Attention in Magnitude Domain
Deep Learning Fast MRI Using Channel Attention in Magnitude DomainDeep Learning Fast MRI Using Channel Attention in Magnitude Domain
Deep Learning Fast MRI Using Channel Attention in Magnitude DomainJoonhyung Lee
 
InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...
InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...
InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...Joonhyung Lee
 
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable ...
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable ...CutMix: Regularization Strategy to Train Strong Classifiers with Localizable ...
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable ...Joonhyung Lee
 
AlphaZero: A General Reinforcement Learning Algorithm that Masters Chess, Sho...
AlphaZero: A General Reinforcement Learning Algorithm that Masters Chess, Sho...AlphaZero: A General Reinforcement Learning Algorithm that Masters Chess, Sho...
AlphaZero: A General Reinforcement Learning Algorithm that Masters Chess, Sho...Joonhyung Lee
 
Squeeze Excitation Networks, The simple idea that won the final ImageNet Chal...
Squeeze Excitation Networks, The simple idea that won the final ImageNet Chal...Squeeze Excitation Networks, The simple idea that won the final ImageNet Chal...
Squeeze Excitation Networks, The simple idea that won the final ImageNet Chal...Joonhyung Lee
 
AlphaGo Zero: Mastering the Game of Go Without Human Knowledge
AlphaGo Zero: Mastering the Game of Go Without Human KnowledgeAlphaGo Zero: Mastering the Game of Go Without Human Knowledge
AlphaGo Zero: Mastering the Game of Go Without Human KnowledgeJoonhyung Lee
 
Deep Learning in Bio-Medical Imaging
Deep Learning in Bio-Medical ImagingDeep Learning in Bio-Medical Imaging
Deep Learning in Bio-Medical ImagingJoonhyung Lee
 

More from Joonhyung Lee (11)

nnUNet
nnUNetnnUNet
nnUNet
 
Rethinking Attention with Performers
Rethinking Attention with PerformersRethinking Attention with Performers
Rethinking Attention with Performers
 
Denoising Unpaired Low Dose CT Images with Self-Ensembled CycleGAN
Denoising Unpaired Low Dose CT Images with Self-Ensembled CycleGANDenoising Unpaired Low Dose CT Images with Self-Ensembled CycleGAN
Denoising Unpaired Low Dose CT Images with Self-Ensembled CycleGAN
 
Deep Learning Fast MRI Using Channel Attention in Magnitude Domain
Deep Learning Fast MRI Using Channel Attention in Magnitude DomainDeep Learning Fast MRI Using Channel Attention in Magnitude Domain
Deep Learning Fast MRI Using Channel Attention in Magnitude Domain
 
InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...
InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...
InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...
 
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable ...
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable ...CutMix: Regularization Strategy to Train Strong Classifiers with Localizable ...
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable ...
 
AlphaZero: A General Reinforcement Learning Algorithm that Masters Chess, Sho...
AlphaZero: A General Reinforcement Learning Algorithm that Masters Chess, Sho...AlphaZero: A General Reinforcement Learning Algorithm that Masters Chess, Sho...
AlphaZero: A General Reinforcement Learning Algorithm that Masters Chess, Sho...
 
Squeeze Excitation Networks, The simple idea that won the final ImageNet Chal...
Squeeze Excitation Networks, The simple idea that won the final ImageNet Chal...Squeeze Excitation Networks, The simple idea that won the final ImageNet Chal...
Squeeze Excitation Networks, The simple idea that won the final ImageNet Chal...
 
AlphaGo Zero: Mastering the Game of Go Without Human Knowledge
AlphaGo Zero: Mastering the Game of Go Without Human KnowledgeAlphaGo Zero: Mastering the Game of Go Without Human Knowledge
AlphaGo Zero: Mastering the Game of Go Without Human Knowledge
 
StarGAN
StarGANStarGAN
StarGAN
 
Deep Learning in Bio-Medical Imaging
Deep Learning in Bio-Medical ImagingDeep Learning in Bio-Medical Imaging
Deep Learning in Bio-Medical Imaging
 

Recently uploaded

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 

Recently uploaded (20)

E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 

DeepLab V3+: Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

  • 1. DeepLabV3+ Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation
  • 2. Background ▪ DeepLabV3+ is the latest version of the DeepLab models. ▪ DeepLab V1: Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. ICLR 2015. ▪ DeepLab V2: DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. TPAMI 2017. ▪ DeepLab V3: Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv 2017. ▪ DeepLab V3+: Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. arXiv 2018.
  • 3. Semantic Segmentation ▪ Classifying all pixels in an image into classes. ▪ Classification at the pixel level. ▪ Does not have to separate different instances of the same class. ▪ Has important applications in Medical Imaging.
  • 4. Current Results on Pascal VOC 2012
  • 5. Motivation and Key Concepts ▪ Use Atrous Convolution and Separable Convolutions to reduce computation. ▪ Combine Atrous Spatial Pyramid Pooling Modules and Encoder-Decoder Structures. ▪ ASPPs capture contextual information at multiple scales by pooling features at different resolutions. ▪ Encoder-Decoders can obtain sharp object boundaries.
  • 8. Convolution (Cross-Correlation) for 1 Channel Convolution with Zero-Padding Display with Convolution Kernel Blue maps: inputs, Cyan maps: outputs, Kernel: not displayed
  • 9. Other Convolutions (Cross-Correlations) Strided Convolution with Padding Atrous (Dilated) Convolution with r=2 Blue maps: inputs, Cyan maps: outputs, Kernel: not displayed
  • 10. Atrous Convolution ▪ à trous is French for “with holes” ▪ Atrous Convolution is also known as Dilated Convolution. ▪ Atrous Convolution with r=1 is the same as ordinary Convolution ▪ The image on the left shows 1D atrous convolution
  • 11. Receptive Field of Atrous Convolutions ▪ Left: r=1, Middle: r=2, Right: r=4 ▪ Atrous Convolution has a larger receptive field than normal convolution with the same number of parameters.
  • 12. Depth-wise Separable Convolution ▪ A special case of Grouped Convolution. ▪ Separate the convolution operation along the depth (channel) dimension. ▪ It can refer to both (depth -> point) and (point -> depth). ▪ It only has meaning in multi-channel convolutions (cross-correlations).
  • 14. Exact Shapes and Terminology ▪ Filter: A collection of 𝑪𝒊𝒏 Kernels of shape (𝑲 𝑯, 𝑲 𝑾) concatenated channel-wise. ▪ Input Tensor Shape: 𝑵, 𝑪𝒊𝒏, 𝑯, 𝑾 𝑜𝑟 (𝑵, 𝑯, 𝑾, 𝑪𝒊𝒏) ▪ Filters are 3D, Kernels are 2D, All filters are concatenated to a single 4D array in 2D CNNs. (𝑵: 𝐵𝑎𝑡𝑐ℎ 𝑁𝑢𝑚𝑏𝑒𝑟, 𝑯: 𝐹𝑒𝑎𝑡𝑢𝑟𝑒 𝐻𝑒𝑖𝑔ℎ𝑡, 𝑾: 𝐹𝑒𝑎𝑡𝑢𝑟𝑒 𝑊𝑖𝑑𝑡ℎ, 𝑪𝒊𝒏: #𝐼𝑛𝑝𝑢𝑡 𝐶ℎ𝑎𝑛𝑛𝑒𝑙𝑠, 𝑪 𝒐𝒖𝒕: #𝑂𝑢𝑡𝑝𝑢𝑡 𝐶ℎ𝑎𝑛𝑛𝑒𝑙𝑠, 𝑲 𝑯: 𝐾𝑒𝑟𝑛𝑒𝑙 𝐻𝑒𝑖𝑔ℎ𝑡, 𝑲 𝑾: 𝐾𝑒𝑟𝑛𝑒𝑙 𝑊𝑖𝑑𝑡ℎ)
  • 15. Step 1: Convolution on Input Tensor Channels
  • 16. Step 2: Summation along Input Channel Dimension
  • 17. Step 3: Add Bias Term ▪ Each kernel of a filter iterates only 1 channel of the input tensor. ▪ The number of filters is 𝐶 𝑜𝑢𝑡. Each filter generates one output channel. ▪ Each 2D kernel is different from all other kernels in the 3D filter. Key Points
  • 18. Normal Convolution ▪ Top: Input Tensor ▪ Middle: Filter ▪ Bottom: Output Tensor
  • 19. Depth-wise Separable Convolution ▪ Replace Step 2. ▪ Instead of summation, use point-wise convolution (1x1 convolution). ▪ There is now only one (𝑪𝒊𝒏, 𝑲 𝑯, 𝑲 𝑾) filter. ▪ The number of 1x1 filters is 𝑪 𝒐𝒖𝒕. ▪ Bias is usually included only at the end of both convolution operations. ▪ Usually refers to depth-wise convolution -> point-wise convolution. ▪ Xception uses point-wise convolution -> depth-wise convolution.
  • 21. Characteristics ▪ Depth-wise Separable Convolution can be used as a drop-in replacement for ordinary convolution in DCNNs. ▪ The number of parameters is reduced significantly (sparse representation). ▪ The number of flops is reduced by several orders of magnitude (computationally efficient). ▪ There is no significant drop in performance (performance may even improve). ▪ Wall-clock time reduction is less dramatic due to GPU memory access patterns.
  • 22. Example: Flop Comparison (Padding O, Bias X) Ordinary Convolution ▪ 𝐻 ∗ 𝑊 ∗ 𝐾 𝐻 ∗ 𝐾 𝑊 ∗ 𝐶𝑖𝑛 ∗ 𝐶 𝑜𝑢𝑡 ▪ For a 256x256x3 image with 128 filters with kernel size of 3x3, the number of flops would be 256 ∗ 256 ∗ 3 ∗ 3 ∗ 3 ∗ 128 = 226,492,416 ▪ There is an 8-fold reduction in the number of flops. Depth-wise Separable Convolution ▪ 𝐻 ∗ 𝑊 ∗ 𝐾 𝐻 ∗ 𝐾 𝑊 ∗ 𝐶𝑖𝑛 + 𝐻 ∗ 𝑊 ∗ 𝐶𝑖𝑛 ∗ 𝐶 𝑜𝑢𝑡 ▪ Left: Depth Conv, Right: Point Conv ▪ For a 256x256x3 image with 128 filters and a 3x3 kernel size, the number of flops would be 256 ∗ 256 ∗ 3 ∗ 3 ∗ 3 + 256 ∗ 256 ∗ 3 ∗ 128 = 1,769,472 + 25,165,824 = 26,935,296
  • 23. Example: Parameter Comparison (Excluding Bias Term) Ordinary Convolution ▪ 𝐾 𝐻 ∗ 𝐾 𝑊 ∗ 𝐶𝑖𝑛 ∗ 𝐶 𝑜𝑢𝑡 ▪ For a 256x256x3 image with 128 filters and 3x3 kernel size, the number of weights would be 3 ∗ 3 ∗ 3 ∗ 128 = 3,456 Depth-wise Separable Convolution ▪ 𝐾 𝐻 ∗ 𝐾 𝑊 ∗ 𝐶𝑖𝑛 + 𝐶𝑖𝑛 ∗ 𝐶 𝑜𝑢𝑡 ▪ For a 256x256x3 image with 128 filters and 3x3 kernel size, the number of flops would be 3 ∗ 3 ∗ 3 + 3 ∗ 128 = 411 ▪ There is also an 8-fold reduction in parameter numbers.
  • 26. Encoder-Decoder Structures ▪ The Encoder reduces the spatial sizes of feature maps, while extracting higher- level semantic information. ▪ The Decoder gradually recovers the spatial information. ▪ UNETs are a classical example of encoder-decoder structures. ▪ In DeepLabV3+, DeepLabV3 is used as the encoder.
  • 28. Decoder Layer Structure 1. Apply 4-fold bilinear up-sampling on the ASPP outputs. 2. Apply 1x1 Convolution with reduced filter number on a intermediate feature layer. 3. Concatenate ASPP outputs with intermediate features. 4. Apply two 3x3 Convolutions. 5. Apply 4-fold bilinear up-sampling. Purpose & Implementation ▪ The ASPP is poor at capturing fine details. ▪ The decoder is used to improve the resolution of the image. ▪ The intermediate layer has 1x1 convolutions to reduce channel number.
  • 29. ASPP: Atrous Spatial Pyramid Pooling
  • 30. The ASPP Layer ▪ Encodes multi-scale contextual information through multiple rates. ▪ Concatenate all extracted features and an up-sampled global average pooling layer channel-wise. ▪ Use Atrous Depth-wise separable convolutions for multiple channels. ▪ Bad at capturing sharp object boundaries.
  • 31. Modified Aligned Xception Network ▪ Xception: Extreme Inception Network. ▪ Backbone network for DeepLabV3+ ▪ Uses residual blocks and separable convolutions.
  • 32. Explanation of Xception ▪ Takes the “Inception Hypothesis”, which states that cross-channel correlations and spatial correlations are sufficiently decoupled that it is preferable not to map them jointly, to the extreme. ▪ The extensive use of separable convolutions and atrous convolutions allows the model to fit in GPU memory despite the huge number of layers. ▪ Originally applied point-wise convolution before depth-wise convolution. ▪ Invented by François Chollet.