Semantic Segmentation
Example - Fully Convolutional Networks for Semantic Segmentation
UC Berkeley
Computer vision
picture source
(https://read01.com/Bng557M.html#.W4T_kXUzbiw)
Semantic segmentation
Each pixel has its own label!
picture source (https://www.quora.com/What-does-the-term-semantic-segmentation-mean-in-the-context-of-Deep-Learning)
Typical way
[Figure: Image -> Model -> Outcome (w x h); cross entropy between Outcome and Label (w x h)]
Loss is calculated for each pixel independently.
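A minimal NumPy sketch of this per-pixel loss. It assumes the model outputs raw class scores of shape (h, w, n_classes) and the label is an integer map of shape (h, w); the function name `pixelwise_cross_entropy` and the toy shapes are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def pixelwise_cross_entropy(logits, labels):
    """Return an (h, w) map holding one cross-entropy value per pixel.

    logits: float array, shape (h, w, n_classes), raw scores from the model.
    labels: int array, shape (h, w), ground-truth class index per pixel.
    """
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick the log-probability of the true class at every pixel.
    true_log_probs = np.take_along_axis(log_probs, labels[..., None], axis=-1)
    return -true_log_probs[..., 0]

# Toy example: 2 x 3 image, 4 classes, all-zero scores (uniform prediction).
logits = np.zeros((2, 3, 4))
labels = np.array([[0, 1, 2], [3, 0, 1]])
loss_map = pixelwise_cross_entropy(logits, labels)  # one loss per pixel
mean_loss = loss_map.mean()                         # scalar used for training
```

Each pixel contributes its own term; the scalar loss is just the average of the map.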
Issue
How to create a dense prediction?
related works:
patchwise training
small model -> small receptive field
post-processing (e.g. superpixel projection, random field regularization, filtering ...)
saturating tanh
restricted receptive field
input shifting and output interlacing
multi-scale pyramid processing
Receptive field
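To make "small model -> small receptive field" concrete, here is a rough sketch of the standard receptive-field recurrence for a stack of convolution and pooling layers. The helper name `receptive_field` and the two example layer stacks are assumptions chosen for illustration.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one output unit.

    layers: list of (kernel_size, stride) pairs, ordered from input to output.
    """
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent units
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

# Hypothetical small model: two 3x3 convolutions.
small = [(3, 1), (3, 1)]
# Hypothetical deeper model: two conv-conv-pool stages.
deeper = [(3, 1), (3, 1), (2, 2), (3, 1), (3, 1), (2, 2)]

rf_small = receptive_field(small)
rf_deeper = receptive_field(deeper)
```

Striding (here, pooling) multiplies the growth of every later layer, which is why deeper classification nets see much more context per output unit.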
Idea
Semantics and location
Global information resolves what while local information resolves where.
global information -> what (semantics)
local information -> where (location)
Idea
Train on entire images instead of patches.
Let receptive fields overlap significantly to improve efficiency.
Transfer learning from a classification net to a fully convolutional network.
For pixelwise prediction, connect coarse outputs to pixels.
Fully convolutional network
Fully Convolutional Networks for Semantic Segmentation
(https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf)
Convert classification net to fully convolutional network
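A small sketch of the conversion idea: a fully connected layer over a fixed-size feature map can be rewritten as a convolution whose kernel covers the whole map, and the same kernel then slides over larger inputs to give a spatial map of class scores. The helper `conv2d_valid`, the sizes, and the random weights are assumptions for illustration only.

```python
import numpy as np

def conv2d_valid(x, kernels):
    """Plain 'valid' cross-correlation.

    x: (c, h, w); kernels: (n_out, c, kh, kw) -> (n_out, h-kh+1, w-kw+1)
    """
    n_out, c, kh, kw = kernels.shape
    _, h, w = x.shape
    out = np.empty((n_out, h - kh + 1, w - kw + 1))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            patch = x[:, i:i + kh, j:j + kw]
            # Contract channel and both spatial axes for every output filter.
            out[:, i, j] = np.tensordot(kernels, patch, axes=3)
    return out

rng = np.random.default_rng(0)
c, k, n_classes = 3, 4, 5

# Fully connected layer acting on a flattened c x k x k feature map.
fc_weights = rng.normal(size=(n_classes, c * k * k))
# The same weights viewed as k x k convolution kernels.
conv_kernels = fc_weights.reshape(n_classes, c, k, k)

# On the original input size both layers give the same scores.
x_fixed = rng.normal(size=(c, k, k))
fc_out = fc_weights @ x_fixed.reshape(-1)
conv_out = conv2d_valid(x_fixed, conv_kernels)   # shape (n_classes, 1, 1)

# On a larger input the convolution yields a coarse map of scores.
x_large = rng.normal(size=(c, 7, 9))
heatmap = conv2d_valid(x_large, conv_kernels)    # shape (n_classes, 4, 6)
```

Every location of `heatmap` equals what the original fully connected layer would output on the corresponding k x k crop, computed without re-running the net per patch.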
Dense prediction
Strategy for upsampling:
Shift-and-stitch
Deconvolution
Shift-and-stitch
picture source (https://www.jianshu.com/p/e534e2be5d7d)
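A toy sketch of shift-and-stitch. As a stand-in for a whole network it uses a single 2x2, stride-2 max pooling layer (an assumption; the real method applies to the full net): the input is shifted by every offset in {0, 1} x {0, 1}, the coarse outputs are interlaced, and the result matches a dense stride-1 evaluation. The function names are hypothetical, and shifting is done here by cropping a padded copy.

```python
import numpy as np

def maxpool2x2(x, stride):
    """2x2 max pooling over a 2-D array with the given stride (no padding)."""
    h, w = x.shape
    rows = range(0, h - 1, stride)
    cols = range(0, w - 1, stride)
    return np.array([[x[r:r + 2, c:c + 2].max() for c in cols] for r in rows])

def shift_and_stitch(x, f=2):
    """Dense output from a stride-f 'network' (here: one pooling layer).

    Assumes x is a float array whose height and width are divisible by f.
    """
    h, w = x.shape
    # Pad bottom/right so every shifted crop keeps the original size.
    padded = np.pad(x, ((0, f), (0, f)), constant_values=-np.inf)
    out = np.empty((h, w))
    for dy in range(f):
        for dx in range(f):
            shifted = padded[dy:dy + h, dx:dx + w]
            coarse = maxpool2x2(shifted, stride=f)  # (h/f, w/f)
            out[dy::f, dx::f] = coarse              # interlace the outputs
    return out

x = np.arange(16.0).reshape(4, 4)
stitched = shift_and_stitch(x, f=2)

# Reference: the same pooling evaluated densely with stride 1.
dense = maxpool2x2(np.pad(x, ((0, 1), (0, 1)), constant_values=-np.inf), stride=1)
```

The cost is f x f forward passes for a downsampling factor f, which is one reason learned upsampling is attractive instead.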
Deconvolution
Deconvolutional network [2015]
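A minimal single-channel sketch of deconvolution (transposed convolution) as an upsampling operator: each input pixel scatters a scaled copy of the kernel into a larger output. The function name and the fixed all-ones kernel are assumptions for illustration; in a network the kernel would be learned.

```python
import numpy as np

def conv_transpose2d(x, kernel, stride):
    """Single-channel transposed convolution of a 2-D array."""
    h, w = x.shape
    kh, kw = kernel.shape
    out = np.zeros(((h - 1) * stride + kh, (w - 1) * stride + kw))
    for i in range(h):
        for j in range(w):
            # Add a copy of the kernel, weighted by the input value.
            r, c = i * stride, j * stride
            out[r:r + kh, c:c + kw] += x[i, j] * kernel
    return out

x = np.array([[1.0, 2.0],
              [3.0, 4.0]])
# With a 2x2 all-ones kernel and stride 2 this reduces to
# nearest-neighbour 2x upsampling.
upsampled = conv_transpose2d(x, np.ones((2, 2)), stride=2)
```

Other kernels give other interpolation schemes, so fixed upsampling is a special case of what the layer can learn.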
Evaluation method
$n_{ij}$ is the number of pixels of class $i$ predicted to be class $j$
there are $n_{cl}$ different classes
$t_i = \sum_j n_{ij}$, total number of pixels of class $i$
pixel accuracy: $\sum_i n_{ii} / \sum_i t_i$
mean accuracy: $(1/n_{cl}) \sum_i n_{ii} / t_i$
mean region intersection over union (IU): $\frac{1}{n_{cl}} \sum_i \frac{n_{ii}}{t_i + \sum_j n_{ji} - n_{ii}}$
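The three metrics can be computed from a confusion matrix as sketched below. The function name `segmentation_metrics` and the 2-class example counts are made up for illustration; the sketch assumes every class occurs at least once in the ground truth.

```python
import numpy as np

def segmentation_metrics(n):
    """Pixel accuracy, mean accuracy and mean IU from a confusion matrix.

    n[i][j] = number of pixels of class i predicted to be class j.
    """
    n = np.asarray(n, dtype=float)
    t = n.sum(axis=1)          # t_i: total pixels of class i
    predicted = n.sum(axis=0)  # sum_j n_ji: pixels predicted as class i
    correct = np.diag(n)       # n_ii
    pixel_acc = correct.sum() / t.sum()
    mean_acc = np.mean(correct / t)
    mean_iu = np.mean(correct / (t + predicted - correct))
    return pixel_acc, mean_acc, mean_iu

# Hypothetical 2-class confusion matrix.
n = [[3, 1],
     [2, 4]]
pixel_acc, mean_acc, mean_iu = segmentation_metrics(n)
```

Here t = (4, 6), so pixel accuracy is 7/10, mean accuracy is (3/4 + 4/6)/2, and mean IU is (3/6 + 4/7)/2.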
Results
Combine coarse and fine feature maps
[Figure: skip architecture]
FCN-32s: conv 7 -> 32x upsampling
FCN-16s: (conv 7 -> 2x upsampling) + (pool 4 -> 1 x 1 conv) -> 16x upsampling
FCN-8s: (conv 7 -> 4x upsampling) + (pool 4 -> 2x upsampling) + (pool 3 -> 1 x 1 conv) -> 8x upsampling
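A shape-level sketch of how the coarse and fine score maps are fused. It assumes a 64 x 64 input, 21 classes, random score maps, and nearest-neighbour upsampling as a stand-in for the deconvolution layers; all of these are illustrative assumptions rather than details from the slides.

```python
import numpy as np

def upsample(scores, factor):
    """Nearest-neighbour upsampling of a (n_classes, h, w) score map."""
    return scores.repeat(factor, axis=1).repeat(factor, axis=2)

n_classes = 21  # assumed number of classes
rng = np.random.default_rng(0)

# Class-score maps for a 64 x 64 input at three strides.
conv7_scores = rng.normal(size=(n_classes, 2, 2))  # stride 32
pool4_scores = rng.normal(size=(n_classes, 4, 4))  # stride 16, after 1 x 1 conv
pool3_scores = rng.normal(size=(n_classes, 8, 8))  # stride 8, after 1 x 1 conv

# FCN-32s: upsample the coarsest prediction directly.
fcn32s = upsample(conv7_scores, 32)

# FCN-16s: fuse 2x-upsampled conv7 scores with pool4 scores, then 16x.
fused16 = upsample(conv7_scores, 2) + pool4_scores
fcn16s = upsample(fused16, 16)

# FCN-8s: fuse the 2x-upsampled stride-16 result with pool3 scores, then 8x.
fused8 = upsample(fused16, 2) + pool3_scores
fcn8s = upsample(fused8, 8)
```

All three outputs have the input resolution; the variants differ only in how much fine-scale information is added before the final upsampling.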
Results
[Figure columns: FCN-8s, SDS [17], Ground Truth, Image]
Importance
FCN for pixelwise prediction
arbitrary-sized inputs
learning and inference on the whole image at a time
leverages supervised pre-trained models
upsampling (deconvolution)
Take home message
more convolution layers, coarser output
combine coarse and fine feature maps (skip architecture)
Deconvolutional network [2015]
Learning Deconvolution Network for Semantic Segmentation
(https://arxiv.org/abs/1505.04366)
Deconvolutional network
[1]
U-Net [2015]
[Figure: U-Net architecture. Input image tile 572x572 -> output segmentation map 388x388 (2 channels). Feature channels: 64, 128, 256, 512, 1024. Operations: conv 3x3, ReLU; copy and crop; max pool 2x2; up-conv 2x2; conv 1x1]
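The spatial sizes in the figure follow from simple arithmetic: each unpadded 3x3 convolution trims 2 pixels, each 2x2 max pool halves the size, and each 2x2 up-conv doubles it. The sketch below traces this; the function name and its default arguments (4 levels, 2 convolutions per level) are assumptions read off the figure.

```python
def unet_output_size(input_size, depth=4, convs_per_level=2):
    """Trace the spatial size through a U-Net with unpadded 3x3 convs."""
    size = input_size
    # Contracting path: convolutions, then 2x2 max pooling.
    for _ in range(depth):
        size -= 2 * convs_per_level  # each 3x3 conv removes a 1-pixel border
        if size % 2:
            raise ValueError("size must be even before 2x2 max pooling")
        size //= 2
    # Bottom of the U.
    size -= 2 * convs_per_level
    # Expanding path: 2x2 up-conv, then convolutions.
    for _ in range(depth):
        size *= 2
        size -= 2 * convs_per_level
    return size  # the final 1x1 conv does not change the size

out_size = unet_output_size(572)
```

For a 572x572 tile this gives 388x388, which is why the output map is smaller than the input and why the skip connections need "copy and crop".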
U-Net: Convolutional Networks for Biomedical Image
Segmentation (https://arxiv.org/abs/1505.04597)
U-Net
U-Net: Convolutional Networks for Biomedical Image
Segmentation (https://arxiv.org/abs/1505.04597)
SegNet [2015, University of Cambridge]
[Figure: SegNet convolutional encoder-decoder architecture. Input: RGB image; output: segmentation. Layers: Conv + Batch Normalisation + ReLU, Pooling, Upsampling, Softmax. Pooling indices are passed from the encoder to the decoder]
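A minimal sketch of the pooling-indices idea: the encoder remembers where each maximum came from, and the decoder upsamples by writing each value back to that position and filling the rest with zeros. The function names and the 4x4 example are assumptions; the sketch handles a single 2-D map with even side lengths.

```python
import numpy as np

def maxpool_with_indices(x):
    """2x2, stride-2 max pooling that also returns the argmax positions.

    indices[i, j] in {0, 1, 2, 3} is the row-major position of the maximum
    inside window (i, j).
    """
    h, w = x.shape
    windows = (x.reshape(h // 2, 2, w // 2, 2)
                .transpose(0, 2, 1, 3)
                .reshape(h // 2, w // 2, 4))
    return windows.max(axis=-1), windows.argmax(axis=-1)

def unpool_with_indices(pooled, indices):
    """Place each pooled value back at its recorded position, zeros elsewhere."""
    ph, pw = pooled.shape
    windows = np.zeros((ph, pw, 4), dtype=pooled.dtype)
    np.put_along_axis(windows, indices[..., None], pooled[..., None], axis=-1)
    return (windows.reshape(ph, pw, 2, 2)
                   .transpose(0, 2, 1, 3)
                   .reshape(ph * 2, pw * 2))

x = np.array([[1.0, 2.0, 0.0, 0.0],
              [3.0, 4.0, 0.0, 5.0],
              [6.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0, 9.0]])
pooled, indices = maxpool_with_indices(x)
restored = unpool_with_indices(pooled, indices)
```

The upsampling step itself has no weights to learn; only the indices need to be stored.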
SegNet
high efficiency
reduces parameters
makes end-to-end training available
My conclusion
Encoder-decoder architecture
Encoder: extract high-level or abstract meanings (semantics)
Decoder: generate instance from abstract meanings
Discriminative model: $P(y \mid x)$
Generative model: $P(x, y)$
Q & A
Reference
[1] A brief introduction to recent segmentation methods (https://www.slideshare.net/mitmul/a-brief-introduction-to-recent-segmentation-methods)
[2] A detailed explanation of shift-and-stitch in the FCN paper (https://www.jianshu.com/p/e534e2be5d7d)
[3] A 2017 Guide to Semantic Segmentation with Deep Learning (http://blog.qure.ai/notes/semantic-segmentation-deep-learning-review)