
World ML Summit – Deep Learning for Computer Vision


Published in: Data & Analytics

  1. Deep Learning for Computer Vision – Techniques for Semantic Segmentation (Saurabh Jha)
  2. Agenda
     • Deep Learning in medical imaging: opportunities and types of data
     • Semantic segmentation: evolution
     • Architectures: FCN, U-Net
     • Exploring medical images
     • Skin lesion detection
     • Challenges in medical images
  3. Deep Learning in medical imaging: there is a lot of hype
     • “AI in medicine: Rise of the machines” (Forbes, 2017)
     • “They should stop training radiologists now” (Geoffrey Hinton, godfather of deep learning, in MIT Technology Review, 2017)
     • “To the question, will AI replace radiologists, I say the answer is no… but radiologists who do AI will replace radiologists who don’t” (Curtis Langlotz, 2017)
  4. Opportunities – Deep Learning for Medical Imaging: value proposition by level of diagnostic support
     • Image acquisition and reconstruction: automatic scan planning, accelerated imaging
     • Image enhancement: super-resolution
     • Semantic image segmentation: organ segmentation, quantification of imaging biomarkers
     • Computer-aided interpretation: computer-aided diagnosis, tumour quantification, screening
  5. What is medical imaging? Modalities include MR, CT, X-ray, and ultrasound.
  6. Semantic Segmentation
     • Semantic segmentation is understanding an image at the pixel level, i.e., we want to assign each pixel in the image an object class.
     • There are many different approaches for estimating the semantic segmentation of an image. The most common methods are based on autoencoder (AE) style encoder-decoder architectures such as FCN and U-Net.
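The per-pixel class assignment can be stated in one line: given per-class score maps, each pixel takes the class with the highest score. A minimal NumPy sketch, where the image size and class count are illustrative assumptions:

```python
import numpy as np

# Hypothetical (H, W, n_classes) score maps for a 4x4 image and 3 classes.
rng = np.random.default_rng(1)
scores = rng.normal(size=(4, 4, 3))

# Semantic segmentation output: one class label per pixel.
labels = scores.argmax(axis=-1)  # shape (4, 4), values in {0, 1, 2}
```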
  7. Evolution: Semantic Segmentation
     • Pre-CNN era: segmentation as clustering; segmentation as a graph problem
     • Pre-FCN era: patch-based methods were used to overcome the small-data problem, with limited success (Ciresan, Dan, et al., 2012)
  8. Deep Convolutional Nets for Segmentation: we need to reason about individual pixels. Success factors of classification CNNs:
     • Wide receptive field: great
     • Spatial invariance: not good, because we need to preserve spatial information
     We want both a wide receptive field and high spatial resolution.
  9. Introducing the Convolution and Max Pooling Operations
     • CNNs make use of filters (also known as kernels) to detect which features, such as edges, are present throughout an image. A filter is just a matrix of values, called weights, that are trained to detect a specific feature. The filter moves over each part of the image to check whether the feature it is meant to detect is present. To produce a value representing how confident it is that the feature is present, the filter carries out a convolution operation: an element-wise product and sum between two matrices.
     • Max pooling with a 2×2 window keeps only the maximum of each window, e.g. the 4×4 feature response map
           0 0 0 0
           5 0 0 0
           0 0 0 8
           0 0 0 0
       pools down to
           5 0
           0 8
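The two operations above can be sketched in a few lines of NumPy (the helper names are my own, not from the slides):

```python
import numpy as np

def conv2d_valid(x, k):
    """Element-wise product and sum of filter k at every valid position of x."""
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def max_pool_2x2(x):
    """Keep the maximum of each non-overlapping 2x2 window."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# The 4x4 feature response map from the slide pools down to [[5, 0], [0, 8]].
fmap = np.array([[0, 0, 0, 0],
                 [5, 0, 0, 0],
                 [0, 0, 0, 8],
                 [0, 0, 0, 0]])
pooled = max_pool_2x2(fmap)
```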
  10. Convolutional Auto-Encoder: an encoder-decoder architecture (Ranzato et al., CVPR 2007)
  11. Fully Convolutional Networks for Semantic Segmentation – 3 major innovations in network architecture
     • Removal of fully connected layers: replace them with 1×1 convolutions that transform feature maps into class-wise predictions, giving dense output whose size is relative to the input
     • Deconvolution
     • Skip paths
     The downsampling path captures semantic/contextual information; the upsampling path recovers spatial information.
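The fully-connected-layer replacement is easy to see in code: a 1×1 convolution is just a per-pixel linear map across channels, so it works for any input size. A sketch, where the channel count and number of classes are illustrative assumptions:

```python
import numpy as np

def conv1x1(features, weights):
    """1x1 convolution: (H, W, C_in) feature maps -> (H, W, n_classes) scores.
    Applied independently at every pixel, so any spatial size works."""
    return features @ weights

feats = np.ones((8, 8, 64))   # hypothetical C_in = 64 feature maps
w = np.full((64, 21), 0.5)    # hypothetical 21-class weight matrix
scores = conv1x1(feats, w)    # (8, 8, 21): dense class-wise predictions
```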
  12. Upsampling – Unpooling
     • Nearest-neighbour unpooling: input 2×2, output 4×4; each value is copied into its 2×2 block
     • Max unpooling: max pooling (input 4×4, output 2×2) remembers which position held each maximum; max unpooling (input 2×2, output 4×4) places each value back at that remembered position and fills the rest with zeros. This relies on corresponding pairs of downsampling and upsampling layers.
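A sketch of the max pooling / max unpooling pair: indices are remembered on the way down and values are scattered back on the way up. The function names and the example matrix are my own:

```python
import numpy as np

def max_pool_with_indices(x):
    """2x2 max pool that also records where each maximum came from."""
    h, w = x.shape
    pooled = np.zeros((h // 2, w // 2))
    idx = np.zeros((h // 2, w // 2, 2), dtype=int)
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            win = x[i:i + 2, j:j + 2]
            r, c = np.unravel_index(np.argmax(win), (2, 2))
            pooled[i // 2, j // 2] = win[r, c]
            idx[i // 2, j // 2] = (i + r, j + c)
    return pooled, idx

def max_unpool(pooled, idx, shape):
    """Place each pooled value back at its recorded position; zeros elsewhere."""
    out = np.zeros(shape)
    for i in range(pooled.shape[0]):
        for j in range(pooled.shape[1]):
            r, c = idx[i, j]
            out[r, c] = pooled[i, j]
    return out

x = np.array([[1, 2, 0, 0],
              [3, 4, 0, 5],
              [0, 0, 7, 0],
              [6, 0, 0, 0]])
p, idx = max_pool_with_indices(x)   # 4x4 -> 2x2
y = max_unpool(p, idx, x.shape)     # 2x2 -> sparse 4x4
```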
  13. Upsampling – Transposed Convolution (stride 1, no padding: the input is padded with zeros before convolving)
     Worked example. Convolving the 4×4 data
          1  2  3  4
          6  7  8  9
         11 12 13 14
         16 17 18 19
     with the 3×3 filter
          0.1  0.2  0.3
          0.2  0.5  0.4
         -0.1  0.3  0.1
     (downsampling) gives the 2×2 result
         13.1 15.1
         23.1 25.1
     For upsampling, the 2×2 result is zero-padded back to the original spatial extent and convolved with a deconvolution filter. An all-zero deconvolution filter produces an all-zero output; after a few epochs of SGD the learned filter
          0.7687  0.00  0.678
          0.0953  0.029 0.092
          0.2948 -0.02  0.208
     reconstructs an approximation of the original data (2.73, 2.89, 3.57, … against 1, 2, 3, …), with a total reconstruction error of about 6.37.
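The downsampling half of this example can be reproduced exactly, and the upsampling half sketched, in NumPy. Here the transposed convolution simply reuses the forward filter for illustration, whereas the slide learns a separate deconvolution filter by SGD:

```python
import numpy as np

def conv2d_valid(x, k):
    """Valid convolution, stride 1: 4x4 input with a 3x3 filter -> 2x2 output."""
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def transposed_conv2d(x, k):
    """Scatter each input value through the kernel (stride 1, no padding):
    a 2x2 input with a 3x3 kernel produces a 4x4 output."""
    kh, kw = k.shape
    out = np.zeros((x.shape[0] + kh - 1, x.shape[1] + kw - 1))
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i:i + kh, j:j + kw] += x[i, j] * k
    return out

data = np.array([[ 1,  2,  3,  4],
                 [ 6,  7,  8,  9],
                 [11, 12, 13, 14],
                 [16, 17, 18, 19]], dtype=float)
filt = np.array([[ 0.1, 0.2, 0.3],
                 [ 0.2, 0.5, 0.4],
                 [-0.1, 0.3, 0.1]])
downsampled = conv2d_valid(data, filt)            # [[13.1, 15.1], [23.1, 25.1]]
upsampled = transposed_conv2d(downsampled, filt)  # back to 4x4
```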
  14. Skip Layer model in detail. Skip path:
     • Concatenate low-level features with high-level features to handle multi-scale objects
     • Provides options for different output sizes
     • With skip layers we get rather finer pixel output
     • Upsampling is used to resolve the size incompatibility between different layers, and combining is done by a simple sum operation
  15. U-Net Architecture
     • The U-Net architecture is an encoder-decoder architecture. Essentially, it is a deep-learning framework based on FCNs; it comprises two parts:
        • A contracting path, similar to an encoder, to capture context via a compact feature map.
        • A symmetric expanding path, similar to a decoder, which allows precise localization. This step retains boundary (spatial) information despite the downsampling and max pooling performed in the encoder stage.
     Advantages of using U-Net:
     1. Computationally efficient
     2. Trainable with a small dataset
     3. Trained end-to-end
     4. Preferable for biomedical applications
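A toy U-Net-style forward pass, showing the contracting path, the skip concatenation, and the 1×1 output convolution. The two-level depth, channel counts, and the use of nearest-neighbour upsampling in place of a transposed convolution are simplifying assumptions, not the real U-Net configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv(x, w):
    """'Same' 3x3 convolution + ReLU: x (H, W, Cin), w (3, 3, Cin, Cout)."""
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    h, wd, _ = x.shape
    out = np.zeros((h, wd, w.shape[3]))
    for i in range(h):
        for j in range(wd):
            out[i, j] = np.tensordot(xp[i:i + 3, j:j + 3], w, axes=3)
    return np.maximum(out, 0)

def pool(x):  # 2x2 max pool per channel
    h, wd, c = x.shape
    return x.reshape(h // 2, 2, wd // 2, 2, c).max(axis=(1, 3))

def up(x):    # nearest-neighbour upsample (stand-in for transposed conv)
    return x.repeat(2, axis=0).repeat(2, axis=1)

def tiny_unet(img, n_classes=2):
    w1 = rng.normal(size=(3, 3, 1, 8));  w2 = rng.normal(size=(3, 3, 8, 16))
    w3 = rng.normal(size=(3, 3, 24, 8)); w_out = rng.normal(size=(8, n_classes))
    e1 = conv(img, w1)                          # contracting path, level 1
    e2 = conv(pool(e1), w2)                     # contracting path, level 2
    d1 = np.concatenate([up(e2), e1], axis=-1)  # skip path: concatenate
    d1 = conv(d1, w3)                           # expanding path
    return d1 @ w_out                           # 1x1 conv -> per-pixel scores

seg = tiny_unet(rng.normal(size=(16, 16, 1)))
```

The skip concatenation is what distinguishes this from a plain auto-encoder: the expanding path sees both the upsampled context (`up(e2)`) and the full-resolution features (`e1`), which is how boundary information survives the pooling.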
  16. Exploring Medical Images – Axial View
  17. Skin Lesion Detection
  18. Challenges in Medical Imaging
     • Access to large datasets requires partnering with clinical institutions
     • Annotations are very expensive (medical experts are required)
     • Transfer learning: natural images are extremely different from medical images; what to do in the case of 3D data? Harder to train, requires more data
     • Data augmentation
     • Noisy labels: agreement between radiologists is low in many cases
     • Data variability: different machine vendors, different scanning protocols, demographic factors
     • GPU memory limitations: use a small batch size
     • Imbalanced data: the class balance is skewed severely towards the non-object class; the majority of non-object samples are easy to discriminate, while lesions are challenging; use a dedicated loss function (Dice loss)
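The dedicated loss mentioned for class imbalance can be sketched as a soft Dice loss. This is one common formulation; exact variants (smoothing terms, per-class averaging) differ across papers:

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss. Because it measures overlap rather than per-pixel
    accuracy, it is far less sensitive to class imbalance than cross-entropy."""
    inter = np.sum(pred * target)
    return 1.0 - (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)

# A tiny lesion covering 9 of 4096 pixels: a perfect prediction still
# gives ~0 loss, and an all-background prediction gives ~1.
mask = np.zeros((64, 64))
mask[30:33, 30:33] = 1.0
print(dice_loss(mask, mask))                 # ≈ 0.0
print(dice_loss(np.zeros((64, 64)), mask))   # ≈ 1.0
```

Note that predicting all background would score 99.8% per-pixel accuracy on this mask, yet its Dice loss is ≈ 1, which is exactly the behaviour wanted when lesions are rare.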
