Deep Learning for Computer Vision – Techniques
for Semantic Segmentation
Saurabh Jha
Agenda
• Deep Learning in medical imaging, opportunities and types of data
• Semantic Segmentation – Evolution
• Architecture – FCN, U-Net
• Exploring Medical Images
• Skin Lesion detection
• Challenges in Medical Images
Deep Learning in medical imaging:
There is a lot of hype.
"AI in medicine: Rise of the machines" (Forbes, 2017)
MIT Technology Review
"They should stop training radiologists now"
– Geoffrey Hinton (godfather of deep learning), 2017
"To the question, will AI replace radiologists, I say the answer is no…"
"…but radiologists who do AI will replace radiologists who don't"
– Curtis Langlotz, 2017
Opportunities – Deep Learning for Medical Imaging
Value proposition by level of diagnostic support:
• Image Acquisition and Reconstruction – automatic scan planning, accelerated imaging
• Image Enhancement – super-resolution
• Semantic Image Segmentation – organ segmentation
• Quantification of Imaging Biomarkers
• Computer Aided Interpretation
• Computer Aided Diagnosis – tumour quantification, screening
What is Medical Imaging?
MR
CT
X-ray
Ultrasound
Semantic Segmentation
• Semantic segmentation is understanding an image at the pixel level, i.e., we want to assign each pixel in the image an object class.
• There are many different approaches for estimating the semantic segmentation of an image. The most common methods are based on autoencoder (AE)-style architectures such as FCN and U-Net.
Evolution: Semantic Segmentation
Pre-CNN era: segmentation as clustering; segmentation as a graph problem
Pre-FCN era: patch-based methods were used to overcome the small-data problem, with limited success (Ciresan, Dan, et al., 2012)
Deep Convolutional Nets for Segmentation
Need to reason about individual pixels!
Success factors?
• Wide receptive field – great
• Spatial invariance – not good: need to preserve spatial information!
Want both a wide receptive field and high spatial resolution.
[Figure: a 4 × 4 feature response map reduced by 2 × 2 window-based max pooling to a 2 × 2 "code with magnitude" that keeps only the strongest activation in each window.]
• CNNs use filters (also known as kernels) to detect which features, such as edges, are present throughout an image. A filter is simply a matrix of values, called weights, that are trained to detect specific features. The filter slides over each part of the image to check whether the feature it is meant to detect is present. To produce a value representing how strongly a specific feature is present, the filter carries out a convolution operation: an element-wise product and sum between two matrices.
Introducing Convolution and Max Pooling Operation
Convolution
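A minimal NumPy sketch of these two operations; the helper names and the toy edge-detection example are illustrative, not taken from the slides:

import numpy as np

def conv2d_valid(image, kernel):
    # "Valid" 2-D convolution (cross-correlation, as in most CNN libraries):
    # slide the kernel over the image and take the element-wise product and sum.
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def maxpool2x2(feature_map):
    # 2 x 2 window-based max pooling: keep only the strongest activation per window.
    h, w = feature_map.shape
    return feature_map.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Toy example: a vertical-edge filter responds strongly where the dark/bright edge is.
image = np.array([[0., 0., 9., 9.],
                  [0., 0., 9., 9.],
                  [0., 0., 9., 9.],
                  [0., 0., 9., 9.]])
edge_filter = np.array([[-1., 0., 1.],
                        [-1., 0., 1.],
                        [-1., 0., 1.]])
response = conv2d_valid(image, edge_filter)   # 4 x 4 -> 2 x 2 feature response map
code = maxpool2x2(response)                   # 2 x 2 -> 1 x 1 pooled code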
Convolutional Auto-Encoder
Encoder-Decoder architecture
Ranzato et al., CVPR 2007
3 major innovations in network architecture:
• Removal of fully connected layers
• Deconvolution (transposed convolution)
• Skip paths
• Downsampling path: captures semantic/contextual information
• Upsampling path: recovers spatial information
Fully Convolutional Networks for Semantic Segmentation
Removal of fully connected layers:
• Dense output whose size scales with the input
• Replace fully connected layers with 1 × 1 convolutions to transform feature maps into class-wise predictions
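A hedged PyTorch sketch of this idea; the channel count (512) and number of classes (21) are illustrative assumptions:

import torch
import torch.nn as nn

features = torch.randn(1, 512, 16, 16)         # feature maps from a convolutional backbone

# Classification head: flattening + a fully connected layer fixes the input size
# and collapses all spatial information into one prediction per image.
fc_head = nn.Linear(512 * 16 * 16, 21)
image_logits = fc_head(features.flatten(1))     # shape (1, 21)

# Fully convolutional head: a 1 x 1 convolution keeps the spatial grid, giving
# class-wise predictions whose size scales with the input.
conv_head = nn.Conv2d(512, 21, kernel_size=1)
pixel_logits = conv_head(features)              # shape (1, 21, 16, 16)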
Upsampling – Unpooling
• Nearest-neighbour unpooling: input 2 × 2 → output 4 × 4, each value copied into a 2 × 2 block
• Max pooling (input 4 × 4 → output 2 × 2) paired with max unpooling (input 2 × 2 → output 4 × 4)
• Corresponding pairs of downsampling and upsampling layers: unpooling places each value back at the position remembered from its paired max-pooling layer
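A small PyTorch sketch of both unpooling variants; the toy feature-map values are illustrative:

import torch
import torch.nn as nn
import torch.nn.functional as F

# Nearest-neighbour upsampling: each value is simply copied into a 2 x 2 block.
small = torch.tensor([[[[1., 2.],
                        [3., 4.]]]])
nn_up = F.interpolate(small, scale_factor=2, mode="nearest")    # 2 x 2 -> 4 x 4

# Max pooling / max unpooling as a corresponding pair: pooling records *where* each
# maximum came from, and unpooling puts the values back at those positions.
x = torch.tensor([[[[2., 3., 0., 0.],
                    [5., 0., 0., 0.],
                    [0., 0., 0., 8.],
                    [0., 0., 0., 0.]]]])
pool = nn.MaxPool2d(2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(2, stride=2)
pooled, indices = pool(x)                       # 4 x 4 -> 2 x 2
restored = unpool(pooled, indices)              # 2 x 2 -> 4 x 4, sparse but position-correct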
Transposed Convolution
Stride 1 and no padding: to upsample again, just pad the original input (blue entries in the figure) with zeros (white entries) and convolve with a learned "deconv" filter.

Downsampling (convolution, stride 1, no padding):
Data (4 × 4):
 1  2  3  4
 6  7  8  9
11 12 13 14
16 17 18 19
Filter (3 × 3):
 0.1  0.2  0.3
 0.2  0.5  0.4
-0.1  0.3  0.1
Result (2 × 2):
13.1 15.1
23.1 25.1

Upsampling – Transposed Convolution:
Padded Result (6 × 6):
   0    0    0    0    0    0
   0    0    0    0    0    0
   0    0 13.1 15.1    0    0
   0    0 23.1 25.1    0    0
   0    0    0    0    0    0
   0    0    0    0    0    0
With the deconv filter initialised to all zeros, the upsampled Result (4 × 4) is all zeros, so the reconstruction Error equals the original Data:
 1  2  3  4
 6  7  8  9
11 12 13 14
16 17 18 19
Learn the deconv filter after a few epochs of SGD:
Deconv Filter (3 × 3):
0.7687  0.00  0.678
0.0953  0.029 0.092
0.2948 -0.02  0.208
Result (4 × 4):
 2.73  2.89  3.57  4.45
 6.01  6.53  8     8.839
11    13.2  13    14
15.7  17    17.8  19.3
Error (4 × 4):
-1.73  -0.89  -0.57  -0.45
-0.01   0.467  0.002  0.161
 0     -1.2    0.009  0
 0.348 -0.01   0.238 -0.3
Total error: 6.373
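A hedged NumPy sketch that reproduces the numbers above; conv2d_valid is a hypothetical helper (the same one used in the earlier convolution sketch), and the deconv-filter values are simply the ones shown on the slide:

import numpy as np

def conv2d_valid(image, kernel):
    # Plain "valid" convolution: element-wise product and sum at every position.
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

data = np.array([[1., 2., 3., 4.],
                 [6., 7., 8., 9.],
                 [11., 12., 13., 14.],
                 [16., 17., 18., 19.]])
filt = np.array([[0.1, 0.2, 0.3],
                 [0.2, 0.5, 0.4],
                 [-0.1, 0.3, 0.1]])

# Downsampling: 4 x 4 -> 2 x 2.
result = conv2d_valid(data, filt)                  # [[13.1, 15.1], [23.1, 25.1]]

# Upsampling (transposed convolution, stride 1): zero-pad the 2 x 2 result back to
# 6 x 6 and convolve with the learned deconv filter to get a 4 x 4 reconstruction.
deconv_filter = np.array([[0.7687, 0.00, 0.678],
                          [0.0953, 0.029, 0.092],
                          [0.2948, -0.02, 0.208]])
reconstruction = conv2d_valid(np.pad(result, 2), deconv_filter)
total_error = np.abs(data - reconstruction).sum()  # roughly the 6.37 reported above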
Skip Layer model in detail
Skip Path:
• Concatenate low-level features with high-level features to handle multi-scale objects
• Provides options for different output sizes
• With skip connections we obtain much finer pixel-level output
• Upsampling is used to resolve the size mismatch between different layers, and the feature maps are combined by a simple element-wise sum
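A hedged PyTorch sketch of an FCN-style skip path; the spatial sizes and class count are illustrative assumptions:

import torch
import torch.nn.functional as F

num_classes = 21
coarse_scores = torch.randn(1, num_classes, 8, 8)    # class scores from a deep, coarse layer
fine_scores = torch.randn(1, num_classes, 16, 16)    # class scores from a shallower, finer layer

# Upsample the coarse prediction to resolve the size mismatch, then combine the two
# score maps with a simple element-wise sum to get a finer, multi-scale-aware output.
upsampled = F.interpolate(coarse_scores, size=fine_scores.shape[-2:],
                          mode="bilinear", align_corners=False)
fused_scores = upsampled + fine_scores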
• The U-Net architecture is essentially an encoder-decoder architecture: a deep-learning framework based on FCNs that comprises two parts:
• A contracting path, similar to an encoder, which captures context in a compact feature map.
• A symmetric expanding path, similar to a decoder, which allows precise localisation. This step is done to retain boundary (spatial) information despite the downsampling and max pooling performed in the encoder stage.
Advantages of Using U-Net
1. Computationally efficient
2. Trainable with a small data set
3. Trained end-to-end
4. Preferable for biomedical applications
U-Net Architecture
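A minimal U-Net-style sketch in PyTorch; the depth, channel widths, and input size are illustrative assumptions rather than the original configuration:

import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_ch=1, num_classes=2):
        super().__init__()
        # Contracting path (encoder): capture context in compact feature maps.
        self.enc1 = double_conv(in_ch, 32)
        self.enc2 = double_conv(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = double_conv(64, 128)
        # Expanding path (decoder): transposed convolutions recover spatial resolution.
        self.up2 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec2 = double_conv(128, 64)      # 128 = 64 (upsampled) + 64 (skip)
        self.up1 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = double_conv(64, 32)       # 64 = 32 (upsampled) + 32 (skip)
        self.head = nn.Conv2d(32, num_classes, 1)   # 1 x 1 conv -> class-wise predictions

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        # Skip connections concatenate encoder features to retain boundary information.
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)                  # per-pixel class scores

logits = TinyUNet()(torch.randn(1, 1, 64, 64))   # -> shape (1, 2, 64, 64)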
Axial view
Exploring Medical Images
Skin Lesion Detection
Challenges in Medical Imaging
• Access to large datasets requires partnering with clinical institutions
• Annotations are very expensive (medical experts required)
• Transfer Learning
- Natural images are extremely different from medical images
- What to do in the case of 3D data? Harder to train; requires more data
• Data Augmentation
• Noisy labels – agreement between radiologists is low in many cases
• Data variability – different machine vendors, different scanning protocols, demographic factors
• GPU memory limitations – forces the use of small batch sizes
• Imbalanced data – the class balance is severely skewed towards the non-object class
- The majority of non-object samples are easy to discriminate; lesions are challenging to discriminate
- Use a dedicated loss function such as the Dice loss (sketched below)
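A hedged PyTorch sketch of a soft Dice loss for binary segmentation; the function and toy tensors are illustrative, not the exact loss used in the talk:

import torch

def dice_loss(logits, target, eps=1e-6):
    # Soft Dice loss (binary case): directly optimises overlap between prediction and
    # ground truth, so it is far less sensitive to a heavily skewed class balance
    # than plain per-pixel cross-entropy.
    probs = torch.sigmoid(logits)                        # per-pixel foreground probability
    intersection = (probs * target).sum(dim=(-2, -1))
    union = probs.sum(dim=(-2, -1)) + target.sum(dim=(-2, -1))
    dice = (2 * intersection + eps) / (union + eps)      # 1.0 means perfect overlap
    return 1 - dice.mean()

# Toy usage: random logits against a sparse lesion mask (~5% foreground pixels).
logits = torch.randn(2, 1, 64, 64)
mask = (torch.rand(2, 1, 64, 64) > 0.95).float()
loss = dice_loss(logits, mask)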
