UNetEliyaLaialy (2).pptx

Ben-Gurion University of the Negev
Deep Learning Image Processing 2018
Eliya Ben Avraham & Laialy Darwesh
U-Net: Convolutional Networks for Biomedical Image Segmentation
Olaf Ronneberger, Philipp Fischer, and Thomas Brox
University of Freiburg, Germany
https://arxiv.org/pdf/1505.04597.pdf

Topics
  Introduction
  Motivation
  Previous work
  U-NET architecture
  U-NET Training
  Data Augmentation
  Experiments
  Extending U-NET
  Conclusion

Convolutional Neural Networks (CNN)
Introduction
https://www.mathworks.com/videos/introduction-to-deep-learning-what-are-convolutional-neural-networks--1489512765771.html
https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/convolutional_neural_networks.html
 Fewer connections and weights make convolutional layers relatively cheap (compared with fully connected layers) in terms of memory and compute.
 Convolutional networks assume locality (nearby pixels are related), which makes them well suited to images.

Introduction
http://deeplearning.net/software/theano/tutorial/conv_arithmetic.html
https://sites.google.com/site/nttrungmtwiki/home/it/data-science---python/tensorflow/tensorflow-and-deep-learning-part-3?tmpl=%2Fsystem%2Fapp%2Ftemplates%2Fprint%2F&showPrintDialog=1
Convolution Layer
W - Input volume size
F - Receptive field size (filter size)
P - Zero padding used on the border
S - Stride
Output Size = (W−F+2P)/S+1
0 1 2
2 2 0
0 1 2
kernel
 Padding = 0
 Strides = 1
Output Size = (5−3+2∗0)/1+1 = 3
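The output-size formula above can be checked with a few lines of Python. This is only a minimal sketch: the function name `conv_output_size` is made up for illustration, and floor division is assumed for strides that do not divide evenly.

```python
def conv_output_size(w, f, p=0, s=1):
    """Spatial output size of a convolution: (W - F + 2P) / S + 1."""
    # Floor division is an assumption for strides that do not divide evenly.
    return (w - f + 2 * p) // s + 1


# Slide example: input size 5, 3x3 kernel, padding 0, stride 1
print(conv_output_size(5, 3, p=0, s=1))  # 3
```

With P = 0 and S = 1, every 3x3 convolution removes one pixel from each border, so the output is 2 pixels smaller than the input in each dimension.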

Introduction
https://www.saagie.com/blog/object-detection-part1
 The typical use of convolutional networks is on classification tasks, where the output for an image is a single class label.
 In many visual tasks, especially in biomedical image processing, the desired output should include localization: a class label is supposed to be assigned to each pixel.

Introduction
http://cs231n.stanford.edu/slides/2016/winter1516_lecture13.pdf
 Label every pixel!
 Don’t differentiate instances
 Classic computer vision problem
Pixel-wise Semantic Segmentation

Main Motivation
https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/
Biomedical Image Segmentation with U-net
 For example, AlexNet:
 8 layers and millions of parameters, trained on the ImageNet dataset
 1 million training images
 Thousands of training images are usually beyond reach in biomedical tasks
 The desired output should include localization

Main Motivation
Biomedical Image Segmentation with U-net
U-Net
Input image Output segmentation map
 U-net learns segmentation in an end-to-end setting
 Very few annotated images (approx. 30 per application)
 Touching objects of the same class
IEEE International Symposium on Biomedical Imaging (ISBI 2015)

First Task
Ciresan, D.C., Gambardella, L.M., Giusti, A., Schmidhuber, J.: Deep neural networks segment neuronal membranes in electron microscopy images. In: NIPS. pp. 2852–2860 (2012)
Predict the class label of each pixel
 Stacks of Electron microscopy (EM) images
 EM segmentation challenge at ISBI 2012
 30 training images
Training stack Ground truth
Black - neuron membranes
White - cells

Second Task
ISBI 2015- separation of touching objects of the same class
 Light microscopic images (recorded by phase contrast microscopy)
 Part of the ISBI cell tracking challenge 2014 and 2015
Raw image
(HeLa cells)
Generated segmentation mask
(white:foreground, black:background)
Ground truth segmentation.

Challenges
Segmentation of Neuronal Structures in EM

Previous work
Ciresan, D.C., Gambardella, L.M., Giusti, A., Schmidhuber, J.: Deep neural networks segment neuronal membranes in electron microscopy images.
The winner (ISBI 2012) (Ciresan et al.)
 Trained a network in a sliding-window setup to predict the class label of each pixel from a local region (patch) around that pixel
x Slow because the network must be run separately for each patch
 This network can localize
Deep Neural Network
 The training data in terms of patches is much larger than the number of training images
x There is a lot of redundancy

Previous work
Ciresan, D.C., Gambardella, L.M., Giusti, A., Schmidhuber, J.: Deep neural networks segment neuronal membranes in electron microscopy images. In: NIPS. pp. 2852–2860 (2012)
The winner (ISBI 2012)
 Trade-off between localization accuracy and the use of context.
Larger patches: Require more max-pooling layers → reduce the localization accuracy
Small patches: Allow the network to see only little context
We want a good localization and the use of context at the
same time
Deep Neural Network

Previous work (Inspiration)
https://www.azavea.com/blog/2017/05/30/deep-learning-on-aerial-imagery/
Fully convolutional neural network (FCN) architecture for
semantic segmentation
 Localization and the use of context at the same time
 Input image with any size
 Added Simple Decoder (Upsampling + Conv)
 Removed Dense Layers

U-NET Architecture
http://deeplearning.net/software/theano/tutorial/conv_arithmetic.html
[Figure: U-Net architecture, from the input image tile to the output segmentation map (here 2 classes: background and foreground)]
 Contracting path: increase the “What”, reduce the “Where”
 Expanding path: create a high-resolution segmentation map
 Concatenation with high-resolution features from the contracting path
Output Size = (W−F+2P)/S+1, with W - input volume size, F - receptive field size (filter size), P - zero padding used on the border, S - stride
Output Size (first conv) = (572 − 3 + 2*0)/1 + 1 = 570 → 570 x 570
Output Size (second conv) = (570 − 3 + 2*0)/1 + 1 = 568 → 568 x 568
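The two size calculations above can be continued through the whole network. The sketch below traces one spatial dimension. It assumes the layout of the original paper, which the slide text does not spell out: four 2x2 max-pooling steps, four 2x2 up-convolutions, and two unpadded 3x3 convolutions at every resolution level. The function name `unet_output_size` is made up for illustration.

```python
def unet_output_size(size, depth=4, convs_per_level=2, kernel=3):
    """Trace one spatial dimension through a U-Net with unpadded convolutions.

    Assumed layout: `depth` 2x2 max-pooling steps mirrored by `depth` 2x2
    up-convolutions, with `convs_per_level` unpadded convolutions per level.
    """
    shrink = convs_per_level * (kernel - 1)  # pixels lost per level
    for _ in range(depth):  # contracting path
        size -= shrink
        if size % 2:
            raise ValueError("odd size before 2x2 max pooling")
        size //= 2
    size -= shrink  # lowest resolution level
    for _ in range(depth):  # expanding path
        size = size * 2 - shrink
    return size


print(unet_output_size(572))  # 388
```

Under these assumptions a 572 x 572 input tile yields a 388 x 388 segmentation map, so the output covers only the central part of the input tile.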

U-NET Strategy
Overlap-tile strategy for arbitrarily large images
 Segmentation of the yellow area uses input data of the blue area
 Raw data extrapolation by mirroring
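Extrapolation by mirroring can be sketched in plain Python. The helper name `mirror_pad` is hypothetical, and the convention of not repeating the border pixel is an assumption (the slide does not say which mirroring convention is used). For the 572 → 388 network of the original paper the border would be (572 − 388) / 2 = 92 pixels on each side.

```python
def mirror_pad(image, pad):
    """Pad a 2D list-of-lists image by `pad` pixels on every side by mirroring.

    Assumption: the border pixel itself is not repeated.
    Requires pad < min(height, width).
    """
    def reflect(i, n):
        # Map an out-of-range index back into [0, n) by mirroring at the border.
        if i < 0:
            return -i
        if i >= n:
            return 2 * (n - 1) - i
        return i

    h, w = len(image), len(image[0])
    return [[image[reflect(r, h)][reflect(c, w)]
             for c in range(-pad, w + pad)]
            for r in range(-pad, h + pad)]


tile = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
for row in mirror_pad(tile, 1):
    print(row)
# [5, 4, 5, 6, 5]
# [2, 1, 2, 3, 2]
# [5, 4, 5, 6, 5]
# [8, 7, 8, 9, 8]
# [5, 4, 5, 6, 5]
```

Interior tiles can take their context from neighbouring image data, so mirroring is only needed where the context region extends past the image border.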

U-net Training
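Soft-max:
$$ p_k(x) = \frac{\exp(a_k(x))}{\sum_{k'=1}^{K} \exp(a_{k'}(x))} $$
Cross-Entropy loss function:
$$ E = -\sum_{x \in \Omega} w(x) \log\left(p_{\ell(x)}(x)\right) $$
 $k$ - Feature channel ($K$ is the number of classes)
 $a_k(x)$ - The activation in feature channel $k$ at pixel position $x$
 $\ell(x)$ - True label of pixel $x$
 $w(x)$ - Weight map that gives some pixels more importance in training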
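The pixel-wise soft-max and the weighted cross-entropy can be written out directly. This is a small sketch in plain Python, not the authors' training code: here `labels` holds the true class index of each pixel (0-based, an assumption of this sketch) and `weights` holds the per-pixel loss weight w(x). The example values are made up.

```python
import math


def softmax(activations):
    """p_k = exp(a_k) / sum_k' exp(a_k') for one pixel."""
    m = max(activations)  # subtract the maximum for numerical stability
    exps = [math.exp(a - m) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_cross_entropy(activations, labels, weights):
    """E = -sum_x w(x) * log(p_{l(x)}(x)).

    activations: per-pixel lists of K channel activations a_k(x)
    labels:      per-pixel true class index l(x), 0-based here
    weights:     per-pixel weight w(x)
    """
    energy = 0.0
    for a, label, w in zip(activations, labels, weights):
        energy -= w * math.log(softmax(a)[label])
    return energy


# Three pixels, K = 2 classes (0 = background, 1 = foreground); made-up values.
acts = [[2.0, 0.0], [0.0, 0.0], [-1.0, 3.0]]
labels = [0, 1, 1]
weights = [1.0, 1.0, 5.0]
print(weighted_cross_entropy(acts, labels, weights))
```

A pixel with a large weight contributes proportionally more to E, which is how separation borders can be emphasised during training.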

U-net Training
pixel-wise loss weight
 Forces the network to learn the small separation borders between touching cells.
$$ w(x) = w_c(x) + w_0 \cdot \exp\left(-\frac{(d_1(x) + d_2(x))^2}{2\sigma^2}\right) $$
 $w_c(x)$ - weight map to balance the class frequencies
 $w_0 = 10$, $\sigma \approx 5$ pixels
 $d_1(x)$ / $d_2(x)$ - Distance to the border of the nearest cell / second nearest cell
Colors: different instances
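To get a feel for the weight formula, the sketch below evaluates it for single pixels. It assumes the two distances have already been computed (for example with a distance transform per cell instance). The default `w_c = 1.0` is a placeholder, not a value from the paper, and the name `border_weight` is made up for illustration.

```python
import math


def border_weight(d1, d2, w_c=1.0, w0=10.0, sigma=5.0):
    """w(x) = w_c(x) + w0 * exp(-(d1(x) + d2(x))^2 / (2 * sigma^2)).

    d1, d2: distances (in pixels) to the border of the nearest and the
            second nearest cell; w_c is a placeholder class-balancing weight.
    """
    return w_c + w0 * math.exp(-((d1 + d2) ** 2) / (2.0 * sigma ** 2))


# A pixel in a narrow gap between two cells gets a weight close to w_c + w0 ...
print(border_weight(1.0, 1.0))
# ... while a pixel far away from a second cell stays close to w_c.
print(border_weight(20.0, 30.0))
```

Because the extra term decays with (d1 + d2)², only thin background gaps between two nearby cells receive a large weight.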

Data Augmentation
Augment Training Data using Deformation
 Random elastic deformation of the training samples.
 Shift and rotation invariance, plus robustness to deformations, are learned from the augmented samples.
 They use random displacement vectors on a coarse 3 by 3 grid.
 The displacements are sampled from a Gaussian distribution with a standard deviation of 10 pixels.
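One way to turn the coarse grid of random vectors into a smooth per-pixel displacement field is sketched below. Note the swap: the paper computes per-pixel displacements with bicubic interpolation, while this sketch uses bilinear interpolation to stay short. The function name and the (dy, dx) layout are assumptions of this example, and warping the image with the field is left out.

```python
import random


def displacement_field(height, width, grid=3, sigma=10.0, rng=random):
    """Sample (dy, dx) vectors on a coarse grid and interpolate to every pixel.

    Bilinear interpolation is used here for brevity (an assumption of this
    sketch). Returns field[r][c] = (dy, dx). Requires height, width >= 2.
    """
    coarse = [[(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
               for _ in range(grid)] for _ in range(grid)]
    field = []
    for r in range(height):
        gy = r * (grid - 1) / (height - 1)  # row position in grid coordinates
        y0 = min(int(gy), grid - 2)
        ty = gy - y0
        row = []
        for c in range(width):
            gx = c * (grid - 1) / (width - 1)
            x0 = min(int(gx), grid - 2)
            tx = gx - x0
            vec = []
            for k in range(2):  # k = 0: dy, k = 1: dx
                top = (1 - tx) * coarse[y0][x0][k] + tx * coarse[y0][x0 + 1][k]
                bot = (1 - tx) * coarse[y0 + 1][x0][k] + tx * coarse[y0 + 1][x0 + 1][k]
                vec.append((1 - ty) * top + ty * bot)
            row.append(tuple(vec))
        field.append(row)
    return field


field = displacement_field(64, 64, rng=random.Random(0))
print(field[0][0], field[32][32])
```

The same field would have to be applied to the image and to its label mask so that the two stay aligned.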

U-net Training
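Weights initialization
 A good initialization of the weights is extremely important
 Ideally the initial weights should be adapted such that each feature map in the network has approximately unit variance
 Achieved by drawing the initial weights from a Gaussian distribution with standard deviation $\sigma_w$:
$$ 1 = \mathrm{Var}\left(\sum_{i=1}^{N} X_i W_i\right) \;\Rightarrow\; \sigma_w = \sqrt{\frac{1}{N}} $$
 ReLU layers (a ReLU unit is zero for non-positive inputs):
$$ \sigma_w = \sqrt{\frac{2}{N}} $$
 $N$ is the number of incoming nodes of one neuron. For example: 3x3 convolution and 64 feature channels in the previous layer give $N = 3 \cdot 3 \cdot 64 = 576$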
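The ReLU rule can be evaluated for the 3x3, 64-channel example on the slide. This is a minimal sketch using only the standard library; the helper name `relu_init_std` is made up for illustration.

```python
import math
import random


def relu_init_std(kernel_h, kernel_w, in_channels):
    """Standard deviation sqrt(2 / N), with N = incoming nodes of one neuron."""
    n = kernel_h * kernel_w * in_channels
    return math.sqrt(2.0 / n)


sigma_w = relu_init_std(3, 3, 64)  # N = 3 * 3 * 64 = 576
print(round(sigma_w, 4))  # 0.0589

# Draw the weights of one 3x3x64 filter from a zero-mean Gaussian.
rng = random.Random(0)
weights = [rng.gauss(0.0, sigma_w) for _ in range(3 * 3 * 64)]
```

The factor 2 compensates for the ReLU setting roughly half of the activations to zero, which would otherwise shrink the variance from layer to layer.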

Experiments: First task
 The u-net result is better than that of the sliding-window convolutional network (Ciresan et al.), which had been the best method from 2012 until 2015.
Raw image Ground truth
EM segmentation challenge (since ISBI 2012)

Experiments: Second/Third task
DIC-HeLa
PhC-U373
 Strong shape variations
 Weak outer borders, strong irrelevant inner borders
 Cytoplasm has the same structure as the background
ISBI cell tracking challenge 2015

Extending U-NET Architecture
Application scenarios for volumetric segmentation with the 3D u-net
Semi-automated segmentation
https://arxiv.org/abs/1606.06650
 The user annotates some slices of each volume to be segmented
 The network predicts the dense segmentation
Fully automated segmentation
 Trained with annotated slices
 Run on non-annotated volumes

Extending U-NET Architecture
 Voxel size of 1.76 × 1.76 × 2.04 µm³
 Batch normalization (“BN”) before each ReLU
 3 × 3 × 3 convolutions, 2 × 2 × 2 max pooling, upconvolution of 2 × 2 × 2
https://arxiv.org/abs/1606.06650
Input: 132 × 132 × 116 voxel tile
Output: 44 × 44 × 28 voxels
Application scenarios for volumetric segmentation with the 3D u-net
Jun 2016
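The input and output tile sizes on this slide can be reproduced with the same size arithmetic as in 2D. The sketch assumes three 2 × 2 × 2 pooling steps and two unpadded 3 × 3 × 3 convolutions per resolution level. Neither assumption is stated on the slide, but they do reproduce its numbers. The name `valid_unet_size` is hypothetical.

```python
def valid_unet_size(size, pools=3, convs=2, kernel=3):
    """Output size along one axis for a U-Net with unpadded convolutions.

    Assumed layout: `pools` pooling steps and `convs` convolutions per level.
    """
    shrink = convs * (kernel - 1)  # voxels lost per resolution level
    for _ in range(pools):  # analysis path: convolutions, then pooling
        size = (size - shrink) // 2
    size -= shrink  # lowest resolution level
    for _ in range(pools):  # synthesis path: up-convolution, then convolutions
        size = size * 2 - shrink
    return size


tile = (132, 132, 116)
print(tuple(valid_unet_size(n) for n in tile))  # (44, 44, 28)
```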

Extends the previous u-net
 Additional reconstruction layer
 LS is the softmax loss (standard cross entropy loss averaged over all pixels),
LR is the reconstruction loss (standard mean squared error)
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7813160
shifted sigmoid
K = 50 was found to be sufficient
to ensure pre-training convergence
Unsupervised Pre-training for Fully Convolutional Neural Networks
(2016)

Summary and Conclusion
U-net advantages
 Flexible and can be used for any reasonable image masking task
 High accuracy (given proper training, dataset, and training time)
 Doesn’t contain any fully connected layers
 Faster than the sliding-window approach (about 1 second per image)
 Proven to be a very powerful segmentation tool in scenarios with limited data
 Achieves very good performance on different biomedical segmentation applications.
U-net disadvantages
 Larger images need more GPU memory.
 Takes a significant amount of time to train (relatively many layers)
 Pre-trained models are not widely available (it's too task specific)
