U-Net: Convolutional Networks for Biomedical Image
Segmentation
2022/04/26 Changjin Lee
Semantic Segmentation
- Assign each pixel a class label
FCN
● Gradually generate higher-level feature maps as the network gets deeper
● Transposed Convolution to generate segmentation map
● Model variants utilize "skip connections" by summation, fusing higher-level feature maps (context information) with low-level ones (e.g., edges, curves)
U-Net: Idea
● The final feature map from the series of convolutional layers contains high-level, contextual information
● We cannot jump directly from these "higher-level" feature maps to a segmentation map, which requires "low-level", dense predictions
● To remedy this issue, FCN introduces "skip connections" based on summation
● U-Net introduces a better-structured network architecture to bridge this gap
○ U-shape architecture
○ Skip connection by concatenation
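The difference between the two fusion styles can be sketched in a few lines (PyTorch assumed; tensor shapes are illustrative):

```python
import torch

# FCN-style fusion: element-wise summation (shapes must match exactly,
# and the channel count is unchanged).
low = torch.randn(1, 64, 56, 56)   # low-level feature map
high = torch.randn(1, 64, 56, 56)  # upsampled higher-level feature map
fcn_fused = low + high             # still 64 channels

# U-Net-style fusion: concatenation along the channel dimension
# (channel counts add up; the following convs learn how to combine them).
unet_fused = torch.cat([low, high], dim=1)  # 128 channels

print(fcn_fused.shape)   # torch.Size([1, 64, 56, 56])
print(unet_fused.shape)  # torch.Size([1, 128, 56, 56])
```

Summation forces both paths into the same channel layout; concatenation keeps them separate and lets the subsequent convolutions learn the fusion.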
U-Net
● Contracting path
○ channels increase
○ resolution decreases
○ gradually generate higher-level features
● Expansive path
○ transpose convolution
○ channels decrease
○ resolution increases
○ more precise localization
● Skip-connection by concatenation
○ Unlike FCN, U-Net “concatenates” the feature maps from the contracting path with the expansive path
(Figure: contracting path on the left, expansive path on the right, joined by skip connections via concatenation)
U-Net
(Figure: contracting blocks d1–d4 and expansive blocks u1–u4; each skip connection propagates contextual information to the matching abstraction level, lower-level to lower-level and higher-level to higher-level)
Contracting Path
● Each step: resolution halves (1/2), channels double (2x)
● Double Conv
○ 3 x 3 filter
○ no padding - resolution decreases after each conv layer
■ But in practice, padding is often used to keep dimensions matched
● ReLU
● 2 x 2 Max Pooling, stride 2
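A minimal sketch of one contracting step, assuming PyTorch and the paper's 572 x 572 input:

```python
import torch
import torch.nn as nn

# One contracting step, following the slide: two 3x3 convs (no padding,
# as in the original paper) with ReLU, then 2x2 max pooling with stride 2.
def contracting_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=0),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=0),
        nn.ReLU(inplace=True),
    )

block = contracting_block(1, 64)
pool = nn.MaxPool2d(kernel_size=2, stride=2)

x = torch.randn(1, 1, 572, 572)   # input size used in the paper
features = block(x)               # 572 -> 570 -> 568 (2 px lost per conv)
print(features.shape)             # torch.Size([1, 64, 568, 568])
print(pool(features).shape)       # torch.Size([1, 64, 284, 284])
```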
Bottleneck Layer
Expansive Path
● Each step: resolution doubles (2x), channels halve (1/2)
● 2 x 2 Transpose Convolution
● Skip Connection - Concatenation
○ Feature map from the contracting path is cropped to match the dimension for concatenation
● Double Conv
● ReLU
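One expansive step can be sketched as follows (PyTorch assumed; the channel counts and spatial sizes are illustrative, matching one mid-level stage of the architecture):

```python
import torch
import torch.nn as nn

# One expansive step: a 2x2 transpose conv halves the channels and doubles
# the resolution; the skip feature map from the contracting path is then
# center-cropped and concatenated before the double conv.
def center_crop(t, h, w):
    _, _, H, W = t.shape
    top, left = (H - h) // 2, (W - w) // 2
    return t[:, :, top:top + h, left:left + w]

up = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)
double_conv = nn.Sequential(
    nn.Conv2d(128, 64, kernel_size=3), nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3), nn.ReLU(inplace=True),
)

bottom = torch.randn(1, 128, 28, 28)  # feature map from the layer below
skip = torch.randn(1, 64, 64, 64)     # feature map from the contracting path

x = up(bottom)                                        # -> (1, 64, 56, 56)
x = torch.cat([center_crop(skip, 56, 56), x], dim=1)  # -> (1, 128, 56, 56)
out = double_conv(x)
print(out.shape)  # torch.Size([1, 64, 52, 52])
```

The crop is needed because the unpadded convolutions make the contracting-path map larger than the upsampled one.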
Final Layer
● Now we have to assign a class label to each pixel
● 1 x 1 conv layer with n filters, where n is the number of classes
● Final Output: (B, n, H, W), n = # classes
● Softmax to produce segmentation map
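A sketch of the classification head (PyTorch assumed; 64 input channels and 2 classes as in the paper's cell-segmentation setting):

```python
import torch
import torch.nn as nn

# Final layer: a 1x1 conv maps the 64-channel feature map to one score map
# per class; softmax over the channel dim gives per-pixel class probabilities.
n_classes = 2
head = nn.Conv2d(64, n_classes, kernel_size=1)

features = torch.randn(4, 64, 388, 388)  # (B, C, H, W)
logits = head(features)                  # (4, 2, 388, 388)
probs = torch.softmax(logits, dim=1)     # probabilities sum to 1 per pixel

print(logits.shape)
print(float(probs[0, :, 0, 0].sum()))  # ~1.0
```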
Overlap-tile Strategy
● An image of arbitrary size can be used as input
● To use GPU memory efficiently, the input image is divided into "tiles" and each tile is fed into the network
● Since the convolutions are unpadded, the output has a lower resolution than the input
○ In practice, padding is often used to produce equal-resolution output
● Hence, the missing context at the image border is extrapolated by mirroring the input image
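Mirror extrapolation can be done with NumPy's reflect padding (toy sizes; the paper mirrors just enough context to cover the network's receptive field):

```python
import numpy as np

# Overlap-tile border handling: extrapolate the missing context by
# mirroring the image across its border ("reflect" mode), so every
# output pixel of a tile sees a full receptive field.
tile = np.arange(16, dtype=np.float32).reshape(4, 4)
margin = 2  # context needed on each side (92 px in the paper's setting)
padded = np.pad(tile, margin, mode="reflect")
print(padded.shape)  # (8, 8)
```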
Pre-computed Weight Map
d1: distance to the border of the closest cell
d2: distance to the border of the second closest cell
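The border weight in the paper is w(x) = w_c(x) + w0 * exp(-(d1(x) + d2(x))^2 / (2 * sigma^2)), with w0 = 10 and sigma ~ 5 pixels. A sketch of the border term on a toy two-cell mask (SciPy assumed):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# Toy instance mask with two "cells" (labels 1 and 2).
labels = np.zeros((32, 32), dtype=int)
labels[4:12, 4:12] = 1
labels[4:12, 20:28] = 2

w0, sigma = 10.0, 5.0
# Distance from every pixel to each cell; sorting gives d1 <= d2.
dists = np.stack([distance_transform_edt(labels != k) for k in (1, 2)])
dists.sort(axis=0)
d1, d2 = dists[0], dists[1]

# Border term: large only where two cells nearly touch, which forces
# the network to learn the thin separation borders between cells.
w_border = w0 * np.exp(-(d1 + d2) ** 2 / (2 * sigma ** 2))
w_border[labels > 0] = 0  # applied on background pixels only
print(w_border.max())
```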
Data Augmentation
● The method was originally designed for microscopy image segmentation
○ large training datasets are hard to obtain -> heavy augmentation is necessary
● shift and rotation
● elastic deformations
● gray value variations
● smooth deformations via random displacement vectors
● drop-out layers at the end of the contracting path (implicit augmentation)
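Elastic deformation is often implemented as a Gaussian-smoothed random displacement field. A Simard-style sketch, not the paper's exact coarse-grid variant; `alpha` and `sigma` values are illustrative:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_deform(img, alpha=34.0, sigma=4.0, seed=0):
    """Warp img by a smoothed random displacement field.

    alpha scales the displacement magnitude; sigma controls smoothness.
    """
    rng = np.random.default_rng(seed)
    dx = gaussian_filter(rng.uniform(-1, 1, img.shape), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, img.shape), sigma) * alpha
    y, x = np.meshgrid(np.arange(img.shape[0]),
                       np.arange(img.shape[1]), indexing="ij")
    # Bilinear resampling at the displaced coordinates.
    return map_coordinates(img, [y + dy, x + dx], order=1, mode="reflect")

img = np.arange(64 * 64, dtype=np.float32).reshape(64, 64)
warped = elastic_deform(img)
print(warped.shape)  # (64, 64)
```

The same displacement field must be applied to the image and its label mask so the two stay aligned.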
Experiments
References
[1] Long, Shelhamer, Darrell. "Fully Convolutional Networks for Semantic Segmentation." https://arxiv.org/abs/1411.4038
[2] Ronneberger, Fischer, Brox. "U-Net: Convolutional Networks for Biomedical Image Segmentation." https://arxiv.org/abs/1505.04597
