Deep Learning for Computer Vision – Techniques
for Semantic Segmentation
Saurabh Jha
Agenda
• Deep Learning in medical imaging: opportunities and types of data
• Semantic Segmentation – Evolution
• Architecture – FCN, U-Net
• Exploring Medical Images
• Skin Lesion detection
• Challenges in Medical Images
Deep Learning in medical imaging:
There is a lot of hype
AI in medicine: Rise of the machines (Forbes, 2017)
MIT Technology Review
“They should stop training radiologists now”
Geoffrey Hinton (godfather of deep learning, 2017)
“To the question, will AI replace radiologists. I say the answer is no…”
“…but radiologists who do AI will replace radiologists who don’t”
Curtis Langlotz in 2017
Opportunities – Deep Learning for Medical Imaging
Value Proposition
Level of diagnostic support
Image Acquisition and Reconstruction
Automatic scan planning
Accelerated imaging
Image Enhancement Super-resolution
Semantic Image Segmentation Organ Segmentation
Quantification of Imaging Biomarkers
Computer Aided Interpretation
Computer Aided Diagnosis
Tumour Quantification
Screening
Semantic Segmentation
• Semantic segmentation is understanding an image at the pixel level, i.e., we want to assign each pixel
in the image an object class.
• There are many different approaches to estimating the semantic segmentation of an image. The most
common methods are based on encoder-decoder architectures (similar in shape to an autoencoder, AE), such as FCN and U-Net.
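As a small illustration of "assign each pixel an object class", the sketch below takes per-class score maps and picks the highest-scoring class at every pixel. The score values and the (num_classes, height, width) layout are assumptions made up for this example; they do not come from any particular network.

```python
import numpy as np

# Hypothetical per-class score maps for a 2 x 3 image and 3 classes,
# laid out as (num_classes, height, width).
scores = np.array([
    [[0.90, 0.20, 0.10],
     [0.80, 0.10, 0.30]],   # class 0 (e.g. background)
    [[0.05, 0.70, 0.20],
     [0.10, 0.60, 0.10]],   # class 1
    [[0.05, 0.10, 0.70],
     [0.10, 0.30, 0.60]],   # class 2
])

# Every pixel gets the index of the class with the highest score.
label_map = scores.argmax(axis=0)
print(label_map)
```

The result has the same height and width as the image, with one class index per pixel.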
Evolution: Semantic Segmentation
Pre-CNN era
• Segmentation as clustering
• Segmentation as graph
Pre-FCN era
• Patch-based methods were used to overcome the small-data problem, with limited success (Ciresan, Dan, et al., 2012).
Deep Convolutional Nets for Segmentation
Need to reason about individual pixels!
Success factors?
• Wide receptive field: great
• Spatial invariance: not good
Need to preserve spatial info!
Want both wide receptive field and high spatial resolution
Figure: 2 X 2 window-based max pooling

Feature response map (4 X 4):
2 3 0 0
5 1 0 0
0 0 3 8
0 0 7 2

Locations of the maxima, with magnitude (4 X 4):
0 0 0 0
5 0 0 0
0 0 0 8
0 0 0 0

Code with magnitude (2 X 2):
5 0
0 8
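A minimal sketch of the 2 X 2 window-based max pooling shown in the figure, assuming the 4 X 4 feature response map reads as the array below (the values are our reading of the slide). The function name `max_pool_2x2` is hypothetical, and the sketch assumes even height and width.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Non-overlapping 2 x 2 max pooling; assumes even height and width."""
    h, w = feature_map.shape
    # Split the map into (h/2) x (w/2) blocks of size 2 x 2.
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    # Keep the largest value inside each block.
    return blocks.max(axis=(1, 3))

# Feature response map, assumed to match the slide's figure.
feature_map = np.array([
    [2, 3, 0, 0],
    [5, 1, 0, 0],
    [0, 0, 3, 8],
    [0, 0, 7, 2],
])

pooled = max_pool_2x2(feature_map)
print(pooled)
```

Each output value summarizes one 2 X 2 window, so the spatial resolution halves in both directions.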
• CNNs use filters (also known as kernels) to detect which features,
such as edges, are present throughout an image. A filter is just a matrix of
values, called weights, that are trained to detect specific features. The filter
moves over each part of the image to check whether the feature it is meant to detect
is present. To produce a value representing how confident it is that a specific
feature is present, the filter carries out a convolution operation: an
element-wise product of the filter and an image patch, followed by a sum.
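The sliding "element-wise product and sum" can be sketched in a few lines of NumPy. This is an illustrative sketch only: the image and the vertical-edge filter are hand-picked assumptions (in a CNN the weights are learned), and, as in most deep learning libraries, the filter is applied without flipping (strictly, a cross-correlation).

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image with no padding and stride 1."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            # Element-wise product of patch and filter, then sum.
            out[i, j] = np.sum(patch * kernel)
    return out

# Toy image: dark left half, bright right half.
image = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# Hand-picked filter that responds to dark-to-bright vertical edges.
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0]])

response = conv2d_valid(image, kernel)
print(response)
```

The response is largest in the column where the window straddles the edge and zero where the image is flat.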
Introducing Convolution and Max Pooling Operation
Convolution
3 Major innovations on network architecture
• Removal of fully connected layers
• Deconvolution
• Skip Path
• Downsampling path : capture semantic/contextual information
• Upsampling path : recover spatial information
Fully Convolutional Networks for Semantic Segmentation
Removal of fully connected layers
• Dense output whose size is relative to the input size
• Fully connected layers are replaced with 1 X 1 convolutions that transform feature maps into
class-wise predictions
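A 1 X 1 convolution applies the same linear map to the feature vector at every pixel, which is why it works for any input size. The sketch below assumes a (channels, height, width) layout and uses random weights purely for illustration; `conv1x1` is a hypothetical helper name.

```python
import numpy as np

def conv1x1(features, weights, bias):
    """features: (C_in, H, W); weights: (num_classes, C_in); bias: (num_classes,)."""
    # Same weight matrix applied independently at every pixel location.
    return np.einsum("kc,chw->khw", weights, features) + bias[:, None, None]

rng = np.random.default_rng(0)
weights = rng.normal(size=(3, 8))   # 8 feature channels -> 3 classes (assumed sizes)
bias = np.zeros(3)

features = rng.normal(size=(8, 5, 7))
class_scores = conv1x1(features, weights, bias)
print(class_scores.shape)

# A different spatial size works with the same weights.
other_scores = conv1x1(rng.normal(size=(8, 9, 4)), weights, bias)
print(other_scores.shape)
```

Because no layer fixes the spatial dimensions, the output grid scales with the input, unlike a fully connected layer.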
Upsampling – Unpooling
Nearest Neighbor: Input: 2 X 2, Output: 4 X 4
Max Pooling: Input: 4 X 4, Output: 2 X 2
Max Unpooling: Input: 2 X 2, Output: 4 X 4
Corresponding pairs of downsampling and upsampling layers
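The two unpooling variants can be sketched in NumPy as follows. This is a simplified single-channel sketch with hypothetical helper names; it assumes 2 X 2 non-overlapping windows and even input sizes. Max unpooling needs the positions remembered by its paired max pooling layer, which is why the layers come in corresponding pairs.

```python
import numpy as np

def nearest_neighbor_upsample(x, factor=2):
    """Repeat every value factor times along rows and columns."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def max_pool_with_indices(x):
    """2 x 2 max pooling that also returns the position (0..3) of each maximum."""
    h, w = x.shape
    blocks = (x.reshape(h // 2, 2, w // 2, 2)
               .transpose(0, 2, 1, 3)
               .reshape(h // 2, w // 2, 4))
    return blocks.max(axis=2), blocks.argmax(axis=2)

def max_unpool(pooled, indices):
    """Place each pooled value back at its remembered position; zeros elsewhere."""
    ph, pw = pooled.shape
    out = np.zeros((ph * 2, pw * 2), dtype=pooled.dtype)
    rows = 2 * np.arange(ph)[:, None] + indices // 2
    cols = 2 * np.arange(pw)[None, :] + indices % 2
    out[rows, cols] = pooled
    return out

small = np.array([[1, 2],
                  [3, 4]])
print(nearest_neighbor_upsample(small))

x = np.array([
    [2, 3, 0, 0],
    [5, 1, 0, 0],
    [0, 0, 3, 8],
    [0, 0, 7, 2],
])
pooled, indices = max_pool_with_indices(x)
unpooled = max_unpool(pooled, indices)
print(unpooled)
```

Nearest-neighbor upsampling spreads each value over a block, while max unpooling restores values only at the locations where the maxima originally were.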
Skip Layer model in detail
Skip Path:
• Combine low-level features with high-level features to
handle multiscale objects
• Provide options for different output sizes
• With skip layers we get finer pixel-level output
• Upsampling resolves the size mismatch between
different layers; the features are then combined by a
simple element-wise sum
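The skip fusion described above (upsample the coarse prediction, then sum it with the finer one) can be sketched as below. Nearest-neighbor upsampling is used here only as a simple stand-in for whatever upsampling the network actually uses (for example a learned deconvolution); the helper names and array values are assumptions for illustration.

```python
import numpy as np

def upsample2x(scores):
    """(num_classes, H, W) -> (num_classes, 2H, 2W) by nearest-neighbor repetition."""
    return scores.repeat(2, axis=1).repeat(2, axis=2)

def fuse_skip(coarse_scores, fine_scores):
    """Upsample the coarse scores to the finer grid, then sum element-wise."""
    upsampled = upsample2x(coarse_scores)
    if upsampled.shape != fine_scores.shape:
        raise ValueError("skip features must be exactly 2x the coarse resolution")
    return upsampled + fine_scores

# Hypothetical class scores from a deep (coarse) and a shallow (fine) layer.
coarse = np.arange(8, dtype=float).reshape(2, 2, 2)   # 2 classes, 2 x 2 grid
fine = np.ones((2, 4, 4))                             # 2 classes, 4 x 4 grid

fused = fuse_skip(coarse, fine)
print(fused.shape)
print(fused[0])
```

The fused map has the finer resolution, carrying semantic evidence from the deep layer and spatial detail from the shallow one.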
• The U-Net architecture is an encoder-decoder
architecture. Essentially, it is a deep-learning
framework based on FCNs; it comprises two parts:
• A contracting path, similar to an encoder, that captures
context via a compact feature map.
• A symmetric expanding path, similar to a decoder,
that allows precise localization. This path
recovers boundary information (spatial information)
that is lost through the downsampling and max pooling performed in
the encoder stage.
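To make the two paths concrete, the sketch below only traces feature-map shapes through a U-Net-like network. It rests on several assumptions that are not stated on the slide: padded convolutions that keep height and width, 2 X 2 pooling, 2x upsampling, channels doubling at each level, and a base width of 64 channels. The function `unet_shapes` is hypothetical and performs no actual learning.

```python
def unet_shapes(height, width, base_channels=64, depth=4):
    """Trace (channels, height, width) through a U-Net-like layout.

    Assumes size-preserving convolutions, 2 x 2 pooling in the contracting
    path and 2x upsampling in the expanding path.
    """
    encoder = []
    c, h, w = base_channels, height, width
    for _ in range(depth):
        encoder.append((c, h, w))            # kept for the skip connection
        c, h, w = c * 2, h // 2, w // 2      # pool, then double the channels
    bottleneck = (c, h, w)

    decoder = []
    for skip_c, skip_h, skip_w in reversed(encoder):
        # Upsample to the skip resolution with skip_c channels, then
        # concatenate the encoder features along the channel axis.
        concat = (skip_c + skip_c, skip_h, skip_w)
        # Convolutions after the concatenation reduce back to skip_c channels.
        decoder.append({"concat": concat, "output": (skip_c, skip_h, skip_w)})
    return encoder, bottleneck, decoder

encoder, bottleneck, decoder = unet_shapes(256, 256)
for shape in encoder:
    print("encoder", shape)
print("bottleneck", bottleneck)
for stage in decoder:
    print("decoder", stage["concat"], "->", stage["output"])
```

Under these assumptions the final decoder output has the same height and width as the input, which is what allows a per-pixel prediction.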
Advantages of Using U-Net
1. Computationally efficient
2. Trainable with a small data-set
3. Trained end-to-end
4. Preferable for bio-medical applications
U-Net Architecture
Challenges in Medical Imaging
• Access to large datasets requires partnering with clinical institutions
• Annotations are very expensive (medical experts required)
• Transfer Learning
- Natural Images are extremely different from medical images
- What to do in case of 3D data? Harder to train; requires more data
• Data Augmentations
• Noisy Labels – Agreement between radiologists is low in many cases
• Data Variability – Different machine vendors, different scanning protocols, demographic factors
• GPU memory limitations – Use small batch size
• Imbalanced data – the class balance is severely skewed towards the non-object class
- The majority of non-object samples are easy to discriminate; lesions are challenging to discriminate
- Dedicated loss function (Dice loss)
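A minimal sketch of a soft Dice loss for a binary mask, assuming predicted foreground probabilities in [0, 1] and a 0/1 ground-truth mask of the same shape. The smoothing constant `eps` and the toy 8 X 8 mask are assumptions for illustration; practical formulations vary.

```python
import numpy as np

def soft_dice_loss(probs, target, eps=1e-6):
    """1 - Dice coefficient between predicted probabilities and a binary mask."""
    probs = np.asarray(probs, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    intersection = np.sum(probs * target)
    denominator = np.sum(probs) + np.sum(target)
    dice = (2.0 * intersection + eps) / (denominator + eps)
    return 1.0 - dice

# Toy 8 x 8 mask with a small 2 x 2 lesion: 4 object pixels out of 64.
target = np.zeros((8, 8))
target[3:5, 3:5] = 1.0

all_background = np.zeros((8, 8))   # predicts "no lesion" everywhere
perfect = target.copy()

loss_empty = soft_dice_loss(all_background, target)
loss_perfect = soft_dice_loss(perfect, target)
pixel_accuracy_empty = np.mean((all_background > 0.5) == (target > 0.5))

print(loss_empty, loss_perfect, pixel_accuracy_empty)
```

The all-background prediction is right on 60 of 64 pixels yet gets a Dice loss close to 1, which illustrates why an overlap-based loss can help when the non-object class dominates.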