U-Net is a convolutional neural network (CNN) architecture designed for semantic segmentation tasks, especially in the field of medical image analysis. It was introduced by Olaf Ronneberger, Philipp Fischer, and Thomas Brox in 2015. The name "U-Net" comes from its U-shaped architecture.
Key features of the U-Net architecture:
U-Shaped Design: U-Net consists of a contracting path (downsampling) and an expansive path (upsampling). The architecture resembles the letter "U" when visualized.
Contracting Path (Encoder):
The contracting path involves a series of convolutional and pooling layers.
Each convolutional layer is followed by a rectified linear unit (ReLU) activation function and, in many implementations, a normalization layer such as batch normalization.
Pooling layers (usually max pooling) reduce the spatial dimensions, so that deeper layers see a larger portion of the image and capture higher-level features.
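The effect of pooling on the feature-map size can be sketched in plain Python. This is only an illustration: the helper names `max_pool_2x2` and `encoder_shapes` are hypothetical, and the sketch assumes 2x2 max pooling with stride 2, size-preserving (padded) convolutions, 64 channels in the first block, and channel doubling in each later block.

```python
def max_pool_2x2(feature_map):
    """Apply 2x2 max pooling with stride 2 to a 2D list (even height and width)."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for row in range(0, height, 2):
        pooled_row = []
        for col in range(0, width, 2):
            window = (
                feature_map[row][col], feature_map[row][col + 1],
                feature_map[row + 1][col], feature_map[row + 1][col + 1],
            )
            pooled_row.append(max(window))  # keep the strongest response
        pooled.append(pooled_row)
    return pooled


def encoder_shapes(height, width, base_channels=64, depth=4):
    """Trace (channels, height, width) of the conv output in each encoder block.

    Assumes padded convolutions (size preserved) and channel doubling per block.
    """
    shapes = []
    channels = base_channels
    for _ in range(depth):
        shapes.append((channels, height, width))  # after the block's convolutions
        height, width = height // 2, width // 2   # after 2x2 max pooling
        channels *= 2
    return shapes


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 6, 1, 1],
]
print(max_pool_2x2(feature_map))   # [[4, 5], [6, 7]]
print(encoder_shapes(256, 256))
# [(64, 256, 256), (128, 128, 128), (256, 64, 64), (512, 32, 32)]
```

Each pooling step halves the height and width, which is why the contracting path trades spatial resolution for a larger number of feature channels.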
Expansive Path (Decoder):
The expansive path involves a series of upsampling and convolutional layers.
Upsampling is achieved using transposed convolution (also called up-convolution or, loosely, deconvolution).
Skip connections are established between corresponding layers in the contracting and expansive paths. These connections help retain fine-grained spatial information during the upsampling process.
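How a transposed convolution grows the feature map can be sketched with the output-size formula documented for PyTorch's `nn.ConvTranspose2d`. The helper name `transposed_conv_output_size` is hypothetical, and the kernel size of 2 and stride of 2 are assumed here as a common choice, not something the text above specifies.

```python
def transposed_conv_output_size(size, kernel_size=2, stride=2, padding=0,
                                output_padding=0, dilation=1):
    """Output length of a transposed convolution along one spatial axis."""
    return ((size - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


# Trace the spatial size through four upsampling steps, starting from a
# 16x16 bottleneck (an assumed size, e.g. a 256x256 input pooled four times).
size = 16
sizes = [size]
for _ in range(4):
    size = transposed_conv_output_size(size)
    sizes.append(size)

print(sizes)  # [16, 32, 64, 128, 256]
```

With kernel size 2 and stride 2, each step exactly doubles the height and width, mirroring the halving done by each pooling layer in the contracting path.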
Skip Connections:
Skip connections concatenate feature maps from the contracting path to the corresponding layers in the expansive path.
These connections facilitate the fusion of low-level and high-level features, aiding in precise localization.
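A skip connection can be sketched in PyTorch as a concatenation along the channel axis. The function names `center_crop` and `concat_skip` and all tensor sizes are illustrative assumptions. The cropping step only matters when the encoder feature map is larger than the decoder one, as happens with unpadded convolutions.

```python
import torch


def center_crop(feature_map, target_height, target_width):
    """Crop the spatial center of an (N, C, H, W) tensor to the target size."""
    _, _, height, width = feature_map.shape
    top = (height - target_height) // 2
    left = (width - target_width) // 2
    return feature_map[:, :, top:top + target_height, left:left + target_width]


def concat_skip(encoder_features, decoder_features):
    """Concatenate encoder features onto decoder features along the channel axis."""
    _, _, height, width = decoder_features.shape
    cropped = center_crop(encoder_features, height, width)
    return torch.cat([cropped, decoder_features], dim=1)


encoder_features = torch.randn(1, 64, 68, 68)  # from the contracting path
decoder_features = torch.randn(1, 64, 64, 64)  # after upsampling
merged = concat_skip(encoder_features, decoder_features)
print(merged.shape)  # torch.Size([1, 128, 64, 64])
```

The merged tensor has the decoder's spatial size but the channels of both inputs, so the following convolutions can combine low-level detail with high-level context.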
Final Layer:
The final layer typically uses a convolutional layer with a softmax activation function for multi-class segmentation tasks, providing probability scores for each class.
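The final layer can be sketched in PyTorch as a 1x1 convolution followed by a softmax over the class axis. The channel count, image size, and number of classes below are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

num_classes = 3  # assumed number of segmentation classes

# A 1x1 convolution maps the 64 decoder channels (assumed) to one score per class.
final_layer = nn.Conv2d(in_channels=64, out_channels=num_classes, kernel_size=1)

decoder_output = torch.randn(1, 64, 32, 32)       # (N, C, H, W)
logits = final_layer(decoder_output)              # (1, 3, 32, 32)
probabilities = torch.softmax(logits, dim=1)      # per-pixel class probabilities
predicted_classes = probabilities.argmax(dim=1)   # (1, 32, 32) label map

print(probabilities.shape, predicted_classes.shape)
```

At every pixel the probabilities sum to 1 across the class axis, and the argmax gives the segmentation map. In PyTorch training code the softmax is often left out of the model, because `nn.CrossEntropyLoss` expects raw logits.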
U-Net's architecture and skip connections help address the challenge of segmenting objects with varying sizes and shapes, which is often encountered in medical image analysis. Its success in this domain has led to its application in other areas of computer vision as well.
The U-Net architecture has also been extended and modified in various ways, leading to improvements like the U-Net++ architecture and variations with attention mechanisms, which further enhance the segmentation performance.
U-Net's intuitive design and effectiveness in semantic segmentation tasks have made it a cornerstone in the field of medical image analysis and an influential architecture for researchers working on segmentation challenges.
2. What does a U-Net do?
[Figure: an input image is mapped to an output segmentation map; the network learns the segmentation.]
3. WHAT IS U-NET ARCHITECTURE?
The name U-Net is short for "U-network".
U-Net is a U-shaped encoder-decoder network architecture which consists of four encoder blocks
and four decoder blocks that are connected via a bridge. It is a deep learning architecture for image
segmentation: the encoder network downsamples the input image, and the decoder
network upsamples the encoded features to the original image size while learning to segment the
image into different classes. The two networks are connected by "skip connections" that concatenate the
encoder features with the corresponding decoder features, allowing the decoder to recover fine-grained
details lost during the downsampling process. U-Net is popular in medical imaging and is widely used
for tasks such as cell segmentation, tissue segmentation, and organ segmentation.
4. U-NET USAGE FOR DIFFERENT TASKS
• U-Net is a popular deep learning architecture that is primarily used for image
segmentation tasks, but it can also be used in other areas, such as:
1. Medical Image Analysis
2. Computer Vision
3. Generative Models
4. Anomaly Detection
5. Time-series Forecasting
6. Sentiment Analysis
8. U-Net Architecture
[Figure: U-Net architecture, from Ronneberger et al. (2015). Annotation: the expansion phase concatenates its feature maps with high-resolution feature maps from the Contraction Phase.]
9. IMAGE SEGMENTATION
• Image segmentation is a commonly used technique in digital image processing
and analysis that partitions an image into multiple parts or regions, often based on
the characteristics of the pixels in the image.
• The goal of segmentation is to simplify or change the representation of an image
into something that is more meaningful and easier to analyze.
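As a minimal illustration of partitioning an image by pixel characteristics, the sketch below labels pixels by intensity alone. This is simple thresholding, not U-Net. The function name `threshold_segment`, the toy image, and the threshold value are all made up for the example.

```python
def threshold_segment(image, threshold):
    """Label each pixel 1 (foreground) if its intensity exceeds threshold, else 0."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]


# A toy 3x4 grayscale image with a bright region in the upper right.
image = [
    [10, 12, 200, 210],
    [11, 190, 205, 14],
    [9, 13, 15, 12],
]

mask = threshold_segment(image, threshold=100)
for row in mask:
    print(row)
# [0, 0, 1, 1]
# [0, 1, 1, 0]
# [0, 0, 0, 0]
```

The output mask has the same height and width as the input, with one label per pixel. A segmentation network such as U-Net produces the same kind of output, but learns the labeling rule from data instead of using a fixed threshold.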
10. OOP USED IN DEFINING THE DATA SET IN THE IMAGE SEGMENTATION
11. • This code defines a UNet class, which is a subclass of nn.Module from PyTorch's neural network library. The class has the following
attributes:
  • num_classes - number of classes in the target output
  • contracting_11, contracting_21, contracting_31, and contracting_41 - the 1st layer (convolutional) in each of the 4 contracting path blocks
  • contracting_12, contracting_22, contracting_32, and contracting_42 - the 2nd layer (max pooling) in each of the 4 contracting path blocks
  • middle - the middle convolutional layer of the U-Net
  • expansive_11, expansive_21, expansive_31, and expansive_41 - the 1st layer (transposed convolutional) in each of the 4 expanding path blocks
  • expansive_12, expansive_22, expansive_32, and expansive_42 - the 2nd layer (convolutional) in each of the 4 expanding path blocks
  • output - the final output layer, a convolutional operation
• The class has an __init__ method which initializes the UNet class and sets the number of classes. It also creates the contracting and
expanding path blocks using the conv_block method, which creates a sequence of 2 consecutive convolutional layers with a ReLU
activation function and a batch normalization layer.
• The class has a forward method which implements the forward pass of the U-Net. This method passes the input image X through the
contracting path blocks to get the middle layer output and then passes this output through the expanding path blocks to get the final
output. The forward method uses the nn module to perform the required operations.
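The class described above might look roughly like the following sketch. It is a reconstruction from the description, not the original code. The attribute names come from the text, while the channel widths, the `in_channels` and `base_channels` parameters, the padded 3x3 convolutions, and the Conv-ReLU-BatchNorm ordering inside `conv_block` are assumptions.

```python
import torch
import torch.nn as nn


class UNet(nn.Module):
    def __init__(self, num_classes, in_channels=3, base_channels=64):
        super().__init__()
        self.num_classes = num_classes
        b = base_channels  # assumed width of the first block
        # Contracting path: a conv block (_x1) followed by 2x2 max pooling (_x2).
        self.contracting_11 = self.conv_block(in_channels, b)
        self.contracting_12 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.contracting_21 = self.conv_block(b, 2 * b)
        self.contracting_22 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.contracting_31 = self.conv_block(2 * b, 4 * b)
        self.contracting_32 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.contracting_41 = self.conv_block(4 * b, 8 * b)
        self.contracting_42 = nn.MaxPool2d(kernel_size=2, stride=2)
        # Bridge between the two paths.
        self.middle = self.conv_block(8 * b, 16 * b)
        # Expanding path: a transposed conv (_x1) followed by a conv block (_x2).
        # Each conv block sees doubled channels because of the skip concatenation.
        self.expansive_11 = nn.ConvTranspose2d(16 * b, 8 * b, kernel_size=2, stride=2)
        self.expansive_12 = self.conv_block(16 * b, 8 * b)
        self.expansive_21 = nn.ConvTranspose2d(8 * b, 4 * b, kernel_size=2, stride=2)
        self.expansive_22 = self.conv_block(8 * b, 4 * b)
        self.expansive_31 = nn.ConvTranspose2d(4 * b, 2 * b, kernel_size=2, stride=2)
        self.expansive_32 = self.conv_block(4 * b, 2 * b)
        self.expansive_41 = nn.ConvTranspose2d(2 * b, b, kernel_size=2, stride=2)
        self.expansive_42 = self.conv_block(2 * b, b)
        # 1x1 convolution producing one score map per class.
        self.output = nn.Conv2d(b, num_classes, kernel_size=1)

    @staticmethod
    def conv_block(in_channels, out_channels):
        """Two 3x3 convolutions, each followed by ReLU and batch norm (assumed order)."""
        return nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(out_channels),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(out_channels),
        )

    def forward(self, X):
        # Contracting path; keep each block's output for the skip connections.
        c1 = self.contracting_11(X)
        c2 = self.contracting_21(self.contracting_12(c1))
        c3 = self.contracting_31(self.contracting_22(c2))
        c4 = self.contracting_41(self.contracting_32(c3))
        middle = self.middle(self.contracting_42(c4))
        # Expanding path: upsample, concatenate the matching encoder output, convolve.
        e1 = self.expansive_12(torch.cat([self.expansive_11(middle), c4], dim=1))
        e2 = self.expansive_22(torch.cat([self.expansive_21(e1), c3], dim=1))
        e3 = self.expansive_32(torch.cat([self.expansive_31(e2), c2], dim=1))
        e4 = self.expansive_42(torch.cat([self.expansive_41(e3), c1], dim=1))
        return self.output(e4)  # raw class scores (logits), one map per class


# Smaller width than the default to keep the demo light.
model = UNet(num_classes=3, base_channels=16)
model.eval()
with torch.no_grad():
    logits = model(torch.randn(1, 3, 64, 64))
print(logits.shape)  # torch.Size([1, 3, 64, 64])
```

Because the sketch pools four times and uses padded convolutions, the input height and width should be divisible by 16 so that the upsampled feature maps line up with the encoder outputs at each skip connection.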
12. OOP USED IN DEFINING THE MODEL IN THE IMAGE SEGMENTATION
13. U-Net Summary
• Contraction Phase
• Reduces the spatial dimensions, but increases the “what.”
• Expansion Phase
• Recovers object details and the dimensions, which is the “where.”
• Concatenating feature maps from the Contraction phase helps the Expansion
phase with recovering the “where” information.