Image Segmentation
DeepBio
Hyungjoo Cho
Image Segmentation
Pixel-wise prediction (classification)
Image Segmentation
• CNNs
• RNNs
• GANs
Using CNNs
Fully Convolutional Network (FCN)
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Long_Fully_Convolutional_Networks_2015_CVPR_paper.pdf
• End-to-end, Pixel-to-pixel prediction
• Backwards convolution (also called deconvolution or transposed convolution) for up-sampling
• Per-pixel multinomial logistic loss
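The per-pixel multinomial logistic loss above is a softmax cross-entropy evaluated at every pixel and averaged. Below is a minimal pure-Python sketch on a toy 2x2 image with 3 classes; the function names and the toy scores are assumptions for illustration, not code from the FCN paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over one pixel's class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def per_pixel_loss(score_map, label_map):
    """Mean softmax cross-entropy over all pixels.

    score_map[y][x] is a list of class scores,
    label_map[y][x] is the true class index of that pixel.
    """
    losses = []
    for score_row, label_row in zip(score_map, label_map):
        for scores, label in zip(score_row, label_row):
            probs = softmax(scores)
            losses.append(-math.log(probs[label]))
    return sum(losses) / len(losses)

# Toy 2x2 image with 3 classes (values are illustrative only).
scores = [[[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]],
          [[0.0, 0.0, 0.0], [0.1, 0.2, 3.0]]]
labels = [[0, 1], [2, 2]]
loss = per_pixel_loss(scores, labels)
```

A pixel with uniform scores contributes log(number of classes) to the loss, which is a handy sanity check for an untrained network.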
Limitations
• Fixed-size receptive field
• Structure is too simple to capture fine details
Deconvolution Network
http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Noh_Learning_Deconvolution_Network_ICCV_2015_paper.pdf
• Combines unpooling, deconvolution (with crop), and ReLU
• Reconstructs the detailed structure of an object at a finer resolution
• Batch normalization
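Unpooling, as listed above, reverses max pooling by putting each pooled value back at the position where the maximum was found. The sketch below assumes 2x2 pooling with stride 2 on a single-channel map with even dimensions; the helper names are hypothetical.

```python
def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2; also returns the argmax positions (switches)."""
    pooled, switches = [], []
    for y in range(0, len(fmap), 2):
        pooled_row, switch_row = [], []
        for x in range(0, len(fmap[0]), 2):
            window = [(fmap[y + dy][x + dx], (y + dy, x + dx))
                      for dy in range(2) for dx in range(2)]
            value, pos = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            switch_row.append(pos)
        pooled.append(pooled_row)
        switches.append(switch_row)
    return pooled, switches

def unpool(pooled, switches, height, width):
    """Place each pooled value back at its recorded position; all other cells stay 0."""
    out = [[0.0] * width for _ in range(height)]
    for pooled_row, switch_row in zip(pooled, switches):
        for value, (y, x) in zip(pooled_row, switch_row):
            out[y][x] = value
    return out

fmap = [[1.0, 3.0, 2.0, 0.0],
        [4.0, 2.0, 1.0, 5.0],
        [0.0, 1.0, 7.0, 2.0],
        [6.0, 2.0, 3.0, 1.0]]
pooled, switches = max_pool_2x2(fmap)
restored = unpool(pooled, switches, 4, 4)
```

The restored map is sparse, which is why the unpooling step is followed by deconvolution layers that densify it.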
Limitations
• Difficult to train
• Still loses spatial information
U-Net
https://arxiv.org/pdf/1505.04597.pdf
• No unpooling (up-convolution only)
• Skip connections (with concatenation)
• No fully connected layers
• Elastic deformation for data augmentation
Limitations
• Does not use batch normalization
• VGG is not the best choice for feature extraction
Deep contextual networks
http://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/view/11789
• Auxiliary connections and classifiers
• Ensemble
• Lower memory consumption
Limitations
• Does not use batch normalization
• VGG is not the best choice for feature extraction
FusionNet
https://arxiv.org/pdf/1612.05360.pdf
• Skip connections (with summation)
• Residual blocks (shortcut connections)
• Elastic deformation
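To contrast the summation-based skip connection above with the concatenation used in U-Net, here is a small sketch. A feature map is modeled as a list of channels, each a flat list of values; `halve` is a hypothetical stand-in for the convolution layers inside a residual block.

```python
def skip_concat(encoder, decoder):
    """Concatenation skip: channels are stacked, so the channel count grows."""
    return encoder + decoder

def skip_sum(encoder, decoder):
    """Summation skip: matching channels are added element-wise, channel count unchanged."""
    return [[e + d for e, d in zip(e_ch, d_ch)]
            for e_ch, d_ch in zip(encoder, decoder)]

def residual_block(x, transform):
    """Shortcut connection: output = transform(x) + x."""
    return skip_sum(transform(x), x)

def halve(fmap):
    """Hypothetical placeholder for the block's learned layers."""
    return [[0.5 * v for v in channel] for channel in fmap]

encoder = [[1.0, 2.0], [3.0, 4.0]]   # 2 channels, 2 values each
decoder = [[0.5, 0.5], [1.0, 1.0]]
concat_out = skip_concat(encoder, decoder)   # 4 channels
sum_out = skip_sum(encoder, decoder)         # still 2 channels
res_out = residual_block(encoder, halve)
```

Summation keeps the channel count fixed, whereas concatenation doubles it at every skip; the overall memory cost still depends on network depth and feature-map sizes.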
Limitations
• High memory consumption
Pyramid Scene Parsing Net
https://arxiv.org/pdf/1612.01105.pdf
• Pre-trained ResNet-based FCN (feature map at 1/8 of the input size)
• Pyramid pooling & 1x1 conv
• Bilinear interpolation
• Average pooling works better than max pooling
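The pyramid pooling idea above can be sketched as: average-pool the feature map into several coarse grids, up-sample each grid back with bilinear interpolation, and stack the results with the original map. The sketch below works on one channel, omits the 1x1 conv, and assumes bin sizes (1, 2, 3, 6) and corner-aligned interpolation; all names are illustrative.

```python
import math

def adaptive_avg_pool(fmap, bins):
    """Average-pool a 2D map into a bins x bins grid."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(bins):
        y0, y1 = (i * h) // bins, math.ceil((i + 1) * h / bins)
        row = []
        for j in range(bins):
            x0, x1 = (j * w) // bins, math.ceil((j + 1) * w / bins)
            cells = [fmap[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled

def bilinear_resize(src, out_h, out_w):
    """Bilinear interpolation with aligned corners."""
    in_h, in_w = len(src), len(src[0])
    out = []
    for y in range(out_h):
        sy = y * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(sy)
        y1 = min(y0 + 1, in_h - 1)
        wy = sy - y0
        row = []
        for x in range(out_w):
            sx = x * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(sx)
            x1 = min(x0 + 1, in_w - 1)
            wx = sx - x0
            top = src[y0][x0] * (1 - wx) + src[y0][x1] * wx
            bottom = src[y1][x0] * (1 - wx) + src[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out

def pyramid_pooling(fmap, bin_sizes=(1, 2, 3, 6)):
    """Return the original map plus one up-sampled branch per pyramid level."""
    h, w = len(fmap), len(fmap[0])
    branches = [bilinear_resize(adaptive_avg_pool(fmap, b), h, w) for b in bin_sizes]
    return [fmap] + branches

fmap = [[float(y * 6 + x) for x in range(6)] for y in range(6)]
features = pyramid_pooling(fmap)   # 5 maps, each 6x6
```

The 1-bin branch carries the global average to every pixel, which is how the coarsest level injects whole-image context.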
Using RNNs
Multi-Dimensional RNNs
https://arxiv.org/pdf/0705.2011.pdf
• From Alex Graves et al.
• 1D RNNs (including bidirectional RNNs) do not model images well
• Need access to the surrounding context in all directions
• N-dimensional data: at least 2^N hidden layers (one per scan direction)
• The input layer has size 3 (RGB), 1 (grayscale), or the patch size; the softmax output layer has one unit per class
Assume a 3x3 image:
A00 A01 A02
A10 A11 A12
A20 A21 A22
• Input sequence: A00 A01 A02 A10 A11 A12 A20 A21 A22
• Output sequence: O00 O01 O02 O10 O11 O12 O20 O21 O22
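For the 3x3 example above, a 2D RNN hidden state at each pixel depends on the pixel's input and on the hidden states of the previously visited neighbor along each axis. Running the scan from all four corners gives the 2^2 = 4 hidden layers needed for full context. The sketch below uses fixed scalar weights and sums the four layers into the output; both choices are assumptions for illustration, not the paper's trained model.

```python
import math
from itertools import product

def scan_2d(image, flip_rows=False, flip_cols=False,
            w_in=0.5, w_row=0.3, w_col=0.2):
    """One 2D RNN hidden layer scanned from one corner of the image."""
    n_rows, n_cols = len(image), len(image[0])
    rows = range(n_rows - 1, -1, -1) if flip_rows else range(n_rows)
    cols = range(n_cols - 1, -1, -1) if flip_cols else range(n_cols)
    dr = 1 if flip_rows else -1   # offset to the previously visited row
    dc = 1 if flip_cols else -1   # offset to the previously visited column
    hidden = [[0.0] * n_cols for _ in range(n_rows)]
    for i in rows:
        for j in cols:
            prev_row = hidden[i + dr][j] if 0 <= i + dr < n_rows else 0.0
            prev_col = hidden[i][j + dc] if 0 <= j + dc < n_cols else 0.0
            hidden[i][j] = math.tanh(
                w_in * image[i][j] + w_row * prev_row + w_col * prev_col)
    return hidden

def multi_directional(image):
    """Combine the 2^2 scan directions so every output sees the whole image."""
    layers = [scan_2d(image, fr, fc)
              for fr, fc in product([False, True], repeat=2)]
    return [[sum(layer[i][j] for layer in layers)
             for j in range(len(image[0]))]
            for i in range(len(image))]

# A[i][j] from the 3x3 example, with illustrative values.
A = [[0.1 * (3 * i + j) for j in range(3)] for i in range(3)]
forward = scan_2d(A)
O = multi_directional(A)
```

With a single forward scan, O00 would see only A00; the other three directions are what give the corner pixel access to the rest of the image.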
Scene Labeling with LSTM
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Byeon_Scene_Labeling_With_2015_CVPR_paper.pdf
• Non-overlapping patches
• Four separate 2D-LSTM blocks, combined by summation
• The size of the layer corresponds to the number of feature maps
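Splitting the image into non-overlapping patches, as described above, turns it into a coarser grid whose cells are flattened patch vectors; each vector is then one input position for the 2D-LSTM. A minimal sketch, assuming the image dimensions are divisible by the patch size (the function name is hypothetical):

```python
def split_into_patches(image, patch_h, patch_w):
    """Split a 2D image (list of rows) into a grid of flattened, non-overlapping patches."""
    grid = []
    for y in range(0, len(image), patch_h):
        grid_row = []
        for x in range(0, len(image[0]), patch_w):
            patch = [v for row in image[y:y + patch_h] for v in row[x:x + patch_w]]
            grid_row.append(patch)
        grid.append(grid_row)
    return grid

image = [[y * 4 + x for x in range(4)] for y in range(4)]
patches = split_into_patches(image, 2, 2)   # 2x2 grid of 4-value vectors
```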
Turned MD RNNs
https://arxiv.org/pdf/0705.2011.pdf
• Standard MD RNNs are not easy to parallelize
• Solution: rotate the connections by 45 degrees
• Easy to parallelize!!
• This introduces context gaps
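The parallelization argument above comes down to which pixels can be computed at the same time. In a standard 2D scan a pixel waits for its upper and left neighbors, so only pixels on the same anti-diagonal are independent. If the connections are turned so that a pixel depends only on the previous row, a whole row can be processed at once. The sketch below groups pixels into parallel steps under those assumed dependency patterns; it illustrates the scheduling only.

```python
def wavefronts_standard(n_rows, n_cols):
    """Standard scan: (i, j) needs (i-1, j) and (i, j-1).

    Cells with the same i + j do not depend on each other, so each
    anti-diagonal forms one parallel step.
    """
    fronts = [[] for _ in range(n_rows + n_cols - 1)]
    for i in range(n_rows):
        for j in range(n_cols):
            fronts[i + j].append((i, j))
    return fronts

def wavefronts_turned(n_rows, n_cols):
    """Turned scan (assumed): (i, j) needs only cells in row i-1.

    Every row forms one parallel step of equal size.
    """
    return [[(i, j) for j in range(n_cols)] for i in range(n_rows)]

standard = wavefronts_standard(3, 3)
turned = wavefronts_turned(3, 3)
standard_sizes = [len(step) for step in standard]   # uneven step sizes
turned_sizes = [len(step) for step in turned]       # uniform step sizes
```

Uniform, row-sized steps map onto regular array operations, whereas anti-diagonal steps change size at every iteration.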
PyraMiD RNNs
https://arxiv.org/pdf/1506.07452.pdf
• Fills the context gaps
• More spatial information
• For 3D images: PyraMiD needs only 6 pyramids to cover a volume, while the standard MD RNN needs 8 cubes
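The context gaps and their filling can be illustrated by tracing which pixels one pixel can reach through its recurrent connections. The connection offsets below are assumptions chosen for illustration (diagonal-only for the turned scan, plus a straight connection for the pyramid); the direction counts follow the 2^N versus 2N pattern implied by "8 versus 6" for 3D.

```python
def context(offsets, steps):
    """Cells reachable from (0, 0) by following dependency offsets for `steps` rows."""
    reached = {(0, 0)}
    frontier = {(0, 0)}
    for _ in range(steps):
        frontier = {(r + dr, c + dc) for r, c in frontier for dr, dc in offsets}
        reached |= frontier
    return reached

TURNED = [(-1, -1), (-1, 1)]             # assumed: diagonal connections only
PYRAMID = [(-1, -1), (-1, 0), (-1, 1)]   # assumed: extra straight connection

turned_ctx = context(TURNED, 2)
pyramid_ctx = context(PYRAMID, 2)
gaps = pyramid_ctx - turned_ctx          # cells the turned scan never sees

def directions_standard(n_dims):
    """One scan per corner of the N-dimensional volume."""
    return 2 ** n_dims

def directions_pyramid(n_dims):
    """One scan per face of the N-dimensional volume."""
    return 2 * n_dims
```

Here `gaps` holds the checkerboard of skipped cells inside the triangular context, and `directions_standard(3)` versus `directions_pyramid(3)` gives the 8 versus 6 count for 3D images.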
Grid LSTM
https://arxiv.org/pdf/1506.07452.pdf
• Co-authored by Alex Graves
• Connections along the depth dimension as well as the temporal dimension
• 3D Grid LSTM = Multi-dimensional LSTM + memory connection
Using GANs
Pix2Pix
https://arxiv.org/pdf/1611.07004v1.pdf
• Pixel-to-pixel translation
• U-Net + conditional GAN loss
• Also performs well on segmentation tasks
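The conditional GAN loss above can be sketched as two binary cross-entropy terms for the discriminator, plus an adversarial term and a weighted L1 term for the generator. In this sketch `d_real` and `d_fake` are hypothetical discriminator outputs (probabilities) for an (input, real mask) pair and an (input, generated mask) pair, and the weight `lam=100.0` is an assumed default.

```python
import math

def bce(prob, target, eps=1e-12):
    """Binary cross-entropy for one probability and one 0/1 target."""
    return -(target * math.log(prob + eps)
             + (1.0 - target) * math.log(1.0 - prob + eps))

def discriminator_loss(d_real, d_fake):
    """D should score real pairs as 1 and generated pairs as 0."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)

def l1_distance(pred, target):
    """Mean absolute difference between two 2D maps."""
    pairs = [(p, t) for p_row, t_row in zip(pred, target)
             for p, t in zip(p_row, t_row)]
    return sum(abs(p - t) for p, t in pairs) / len(pairs)

def generator_loss(d_fake, pred, target, lam=100.0):
    """Non-saturating adversarial term plus a weighted L1 reconstruction term."""
    return bce(d_fake, 1.0) + lam * l1_distance(pred, target)

target_mask = [[1.0, 0.0], [0.0, 1.0]]
pred_mask = [[0.9, 0.2], [0.1, 0.7]]
d_loss = discriminator_loss(d_real=0.8, d_fake=0.3)
g_loss = generator_loss(d_fake=0.3, pred=pred_mask, target=target_mask)
```

The L1 term keeps the generated mask close to the ground truth, while the adversarial term pushes it to look plausible given the input image.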
Adversarial Networks for the Detection of Aggressive Prostate Cancer
https://arxiv.org/pdf/1702.08014.pdf
• Pix2Pix structure
• Conditional GAN loss
• Instance normalization instead of batch normalization
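The difference between the two normalizations above is where the mean and variance come from: instance normalization computes them per image, batch normalization shares them across the batch. A minimal single-channel sketch, with learned scale and shift omitted and training-mode statistics assumed:

```python
import math

def normalize(values, eps=1e-5):
    """Zero-mean, unit-variance normalization of a flat list of values."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def instance_norm(batch):
    """Statistics per sample: each image is normalized on its own."""
    return [normalize(sample) for sample in batch]

def batch_norm(batch):
    """Statistics shared by all samples in the batch for this channel."""
    flat = [v for sample in batch for v in sample]
    normed = normalize(flat)
    n = len(batch[0])
    return [normed[i * n:(i + 1) * n] for i in range(len(batch))]

# Two single-channel "images" with the same pattern but different brightness.
batch = [[0.0, 1.0, 2.0, 3.0], [10.0, 11.0, 12.0, 13.0]]
inst = instance_norm(batch)
bn = batch_norm(batch)
```

After instance normalization both samples look the same, because the per-image brightness offset is removed; batch normalization keeps the two samples apart.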
Thanks❤️
