Deep Learning for Medical Imaging
GEETA CHAUHAN, CTO SVSG
MARCH 5TH, 2018
Agenda
 Use cases for Deep Learning in Medical Imaging
 What is Deep Learning?
 Deep Learning models in Medical Imaging
 Rise of Specialized Compute
 Techniques for Optimization
 E2E Pipeline
 Look into future
 Steps for starting your journey
 References
Source: CBInsights
Deep Learning in Medical Imaging
 Real-time Clinical Diagnostics (Enlitic)
 Whole-body Portable Ultrasound (Butterfly Networks, Baylabs)
 Radiology Assistant, Cloud Imaging AI (Zebra, Arterys)
 Intelligent Stroke Care (Viz.ai)
 Screening Tumor, Diabetic Retinopathy (Google, Enlitic, IBM)
 Oncology (Flatiron Health)
Source: Nature
Skin Cancer
 5.4M cases of non-melanoma skin cancer each year in the US
 20% of Americans will get skin cancer
 Actinic Keratosis (pre-cancer) affects 58M Americans
 78K melanomas each year – 10K deaths
 $8.1B in annual US costs for skin cancer
5
Successes!
 Mammographic mass classification
 Brain Lesions
 Airway leakages
 Diabetic Retinopathy
 Prostate Segmentation
 Breast cancer metastasis
 Skin Lesion Classification
 Bone suppression in Chest X-Rays
6
Source: arXiv:1702.05747
What is Deep Learning?
 AI Neural Networks composed of many layers
 Learn features from data, loosely analogous to human learning
 Automated Feature Learning
 Layers are like Image Filters
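As a toy illustration of the "layers are like image filters" point, here is a plain-NumPy convolution of an image with a hand-crafted edge-detection kernel; in a deep network such kernels are learned from data rather than hand-designed (illustrative sketch, not from the slides):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (no padding), like a single CNN filter."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hand-crafted vertical-edge filter; a CNN *learns* such kernels from data.
edge_kernel = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]], dtype=float)

# Toy "image": dark left half, bright right half -> strong response at the edge.
img = np.zeros((5, 5))
img[:, 3:] = 1.0
response = conv2d(img, edge_kernel)
```

Stacking many such learned filters, with nonlinearities in between, is what makes the feature learning "deep".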
Deep Learning in Medical Imaging
SURVEY OF 300+ PAPERS
8
Source: arXiv:1702.05747
Medical imaging models
 Pre-trained networks with Transfer learning
 U-Net, V-Net, E-Net
 FCN – fully convolutional net with skip connections, Multi-stream CNNs
 TieNet, DenseCNN Encoder + RNN Decoder – Multi-label classification
 FCN + MDP (RL) for 2D/3D Image Registration
9
Source: arXiv:1505.04597
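The defining trick of the U-Net family above is the skip connection: encoder features are concatenated onto the upsampled decoder path, preserving fine spatial detail for segmentation. A minimal NumPy sketch of that data flow (shapes, pooling, and upsampling choices are illustrative assumptions, not the real U-Net layers):

```python
import numpy as np

def max_pool2(x):
    """2x downsampling, as in a U-Net encoder step (x: H x W x C)."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def upsample2(x):
    """2x nearest-neighbour upsampling, as in a U-Net decoder step."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def skip_connect(encoder_feat, decoder_feat):
    """U-Net's key idea: concatenate encoder features onto the decoder
    path along the channel axis, preserving fine spatial detail."""
    return np.concatenate([encoder_feat, decoder_feat], axis=-1)

x = np.random.rand(8, 8, 4)   # encoder feature map
down = max_pool2(x)           # 4 x 4 x 4
up = upsample2(down)          # back to 8 x 8 x 4
merged = skip_connect(x, up)  # 8 x 8 x 8, fed to the next decoder conv
```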
Medical imaging models
10
TIENET – AUTOMATIC LABELS FOR CHEST X-RAYS
Shift towards Specialized Compute
 Special purpose Cloud
 Google TPU, Microsoft Brainwave, Intel Nervana, IBM Power AI, Nvidia v100
 Bare Metal Cloud – Preview AWS, GCE coming April 2018
 Spectrum: CPU, GPU, FPGA, Custom ASICs
 Edge Compute: Hardware accelerators, AI SoCs
 Intel Neural Compute Stick, Nvidia Jetson, Nvidia Drive PX (self-driving cars)
 Architectures
 Cluster Compute, HPC, Neuromorphic, Quantum compute
 Complexity in Software
 Model tuning/optimizations specific to hardware
 Growing need for compilers to optimize based on deployment hardware
 Workload specific compute: Model training, Inference
11
CPU Optimizations
 Leverage high-performance compute tools
 Intel Python, Intel Math Kernel Library (MKL), NNPack (for multi-core CPUs)
 Compile Tensorflow from Source for CPU Optimizations
 Proper Batch size, using all cores & memory
 Proper Data Format
 NCHW for CPUs vs Tensorflow default NHWC
 Use Queues for Reading Data
Source: Intel Research Blog
12
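The NCHW-vs-NHWC point above is just a memory-layout transpose; a small NumPy sketch of the conversion (in real TensorFlow code you would instead pass `data_format='NCHW'` to the conv ops rather than transposing by hand):

```python
import numpy as np

# TensorFlow's default layout is NHWC (batch, height, width, channels);
# MKL-optimized CPU kernels prefer NCHW. The conversion is a transpose.
batch_nhwc = np.random.rand(2, 4, 4, 3)        # e.g. a tiny batch of RGB images

batch_nchw = batch_nhwc.transpose(0, 3, 1, 2)  # NHWC -> NCHW
restored = batch_nchw.transpose(0, 2, 3, 1)    # NCHW -> NHWC, round-trips exactly
```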
Tensorflow CPU Optimizations
 Compile from source
 git clone https://github.com/tensorflow/tensorflow.git
 Run ./configure from Tensorflow source directory
 Select option MKL (CPU) Optimization
 Build pip package for install
 bazel build --config=mkl --copt=-DEIGEN_USE_VML -c opt
//tensorflow/tools/pip_package:build_pip_package
 Install the optimized TensorFlow wheel
 bazel-bin/tensorflow/tools/pip_package/build_pip_package ~/path_to_save_wheel
 pip install --upgrade --user ~/path_to_save_wheel/wheel_name.whl
 Intel Optimized Pip Wheel files
13
Parallelize your models
 Data Parallelism
 Tensorflow Estimator + Experiments
 Parameter Server, Worker cluster
 Intel BigDL Spark Cluster
 Baidu’s Ring AllReduce
 Uber’s Horovod TensorFusion
 HyperTune Google Cloud ML
 Model Parallelism
 Graph too large to fit on one machine
 Tensorflow Model Towers
14
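Baidu's Ring AllReduce (which Horovod builds on) can be sketched as a single-process NumPy simulation; worker count, gradient sizes, and chunking below are illustrative assumptions, not the MPI implementation:

```python
import numpy as np

def ring_allreduce(grads):
    """Toy simulation of ring all-reduce: every worker ends up with the
    elementwise sum of all gradients, while each link carries only
    2*(N-1) chunk-sized messages, independent of gradient size * N."""
    n = len(grads)
    chunks = [list(np.array_split(np.asarray(g, dtype=float), n)) for g in grads]
    # Reduce-scatter: at step s, worker i passes chunk (i - s) % n to its
    # right neighbour, which accumulates it. Messages are gathered first
    # so each step only uses the previous step's values.
    for s in range(n - 1):
        msgs = [((i + 1) % n, (i - s) % n, chunks[i][(i - s) % n].copy())
                for i in range(n)]
        for dst, c, data in msgs:
            chunks[dst][c] = chunks[dst][c] + data
    # Worker i now holds the fully reduced chunk (i + 1) % n.
    # All-gather: pass the finished chunks around the ring, overwriting.
    for s in range(n - 1):
        msgs = [((i + 1) % n, (i + 1 - s) % n, chunks[i][(i + 1 - s) % n].copy())
                for i in range(n)]
        for dst, c, data in msgs:
            chunks[dst][c] = data
    return [np.concatenate(chunks[i]) for i in range(n)]

# Three simulated workers, each with its own local gradient.
grads = [np.arange(6.0), np.ones(6), np.full(6, 2.0)]
reduced = ring_allreduce(grads)
```

In Horovod the same pattern runs over MPI with one process per GPU, which is why single-GPU code replicates across the cluster with few changes.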
Optimizations for Training
Source: Amazon MxNET
15
Workload Partitioning
Source: Amazon MxNET
 Minimize communication time
 Place neighboring layers on same GPU
 Balance workload between GPUs
 Different layers have different memory-compute properties
 A balanced partition across GPUs (as in the MxNET example) trains faster
 LSTM unrolling: ↓ memory, ↑ compute time
 Encode/Decode: ↑ memory
16
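The balancing idea above can be sketched as a greedy contiguous partition of per-layer costs: neighbouring layers stay on the same GPU (cheap communication) while each GPU's load stays near the average. The cost numbers and the heuristic itself are illustrative assumptions, not MxNET's actual placement algorithm:

```python
def partition_layers(layer_costs, n_gpus):
    """Greedily split a layer pipeline into contiguous groups, one per
    GPU, closing a group once it reaches the average per-GPU cost."""
    target = sum(layer_costs) / n_gpus
    groups, current = [[]], 0.0
    for cost in layer_costs:
        if groups[-1] and current + cost > target and len(groups) < n_gpus:
            groups.append([])   # start the next GPU's group
            current = 0.0
        groups[-1].append(cost)
        current += cost
    return groups

# Hypothetical per-layer memory costs: heavy encode/decode ends, light middle.
groups = partition_layers([4, 4, 1, 1, 4, 4], n_gpus=2)
```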
Optimizations for Inferencing
 Graph Transform Tool
 Freeze graph (variables to constants)
 Quantize weights (20 M weights for IV3)
 Inception v3 93 MB → 1.5 MB
 Pruning, Weight Sharing, Deep Compression
 AlexNet 35x smaller, VGG-16 49x smaller
 3x to 4x speedup, 3x to 7x more energy-efficient
17
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
  --in_graph=/tmp/classify_image_graph_def.pb \
  --outputs="softmax" --out_graph=/tmp/quantized_graph.pb \
  --transforms='add_default_attributes
    strip_unused_nodes(type=float, shape="1,299,299,3")
    remove_nodes(op=Identity, op=CheckNumerics)
    fold_constants(ignore_errors=true)
    fold_batch_norms fold_old_batch_norms
    quantize_weights quantize_nodes
    strip_unused_nodes sort_by_execution_order'
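The core idea behind the quantize_weights transform, storing float32 weights as 8-bit codes plus a (min, max) range, can be sketched in NumPy. This is a simplified linear min-max scheme for illustration, not TensorFlow's exact implementation:

```python
import numpy as np

def quantize_weights(w, bits=8):
    """Linear (min-max) quantization: map floats onto 2^bits - 1 levels,
    keeping only the uint8 codes plus the range needed to decode them.
    Storage drops roughly 4x versus float32."""
    levels = 2 ** bits - 1
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = np.round((w - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Recover approximate float weights from the 8-bit codes."""
    return codes.astype(np.float32) * scale + lo

np.random.seed(0)
w = np.random.randn(1000).astype(np.float32)
codes, lo, scale = quantize_weights(w)
w_hat = dequantize(codes, lo, scale)
# Reconstruction error is bounded by half a quantization step.
```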
Cluster Optimizations
 Define your ML Container locally
 Evaluate with different parameters in the cloud
 Use EFS / GFS for data storage and sharing across nodes
 Create separate Data processing container
 Mount EFS/GFS drive on all pods for shared storage
 Avoid GPU Fragmentation problems by bundling jobs
 Placement optimizations – Kubernetes Bundle as pods, Mesos placement constraints
 GPU Drivers bundling in container a problem
 Mount as read-only volume, or use nvidia-docker
18
Uber’s Horovod on Mesos
 Peloton Gang Scheduler
 MPI-based bandwidth-optimized communication
 Code for one GPU, replicates across cluster
 Nested Containers
19
Source: Uber Mesoscon
Pipeline: Google’s TFX
20
 Continuous Training & Serving
 Data Analysis, Transformation, Validation
 Model Training, Validation, Serving
 Warm-Startup
Future: Explainability
21
 Active research area
 Current Techniques
 Activation Heat Maps
 Saliency Maps
 Reconstruct Image
 t-SNE visualization
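A simple, model-agnostic cousin of the heat-map techniques above is occlusion saliency: slide a patch over the image and record how much the model's score drops at each position. A toy NumPy sketch, where the "model" is a made-up scoring function rather than a real network:

```python
import numpy as np

def occlusion_saliency(image, score_fn, patch=2):
    """Occlude each patch-sized region with the image mean and record
    the drop in the model's score; big drops mark important regions."""
    base = score_fn(image)
    h, w = image.shape
    saliency = np.zeros_like(image, dtype=float)
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = image.mean()
            saliency[i:i + patch, j:j + patch] = base - score_fn(occluded)
    return saliency

# Toy "model": scores only the brightness of the image centre.
score = lambda im: float(im[3:5, 3:5].sum())
img = np.zeros((8, 8))
img[3:5, 3:5] = 1.0
sal = occlusion_saliency(img, score)
# The saliency map is nonzero exactly where the model is "looking".
```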
Future: FPGA Hardware Microservices
Project Brainwave Source: Microsoft Research Blog
22
FPGA Optimizations
Brainwave Compiler Source: Microsoft Research Blog
23
Can FPGA Beat GPU Paper:
➢ Optimizing CNNs on Intel FPGA
➢ FPGA vs GPU: 60x faster, 2.3x more energy-efficient
➢ <1% loss of accuracy
ESE on FPGA Paper:
➢ Optimizing LSTMs on Xilinx FPGA
➢ FPGA vs CPU: 43x faster, 40x more energy-efficient
➢ FPGA vs GPU: 3x faster, 11.5x more energy-efficient
Future: Neuromorphic Compute
Intel’s Loihi: Brain-Inspired AI Chip; Neuromorphic Memristors
24
Future: Quantum Computers
Source: opentranscripts.org
+ Personalized Medicine for Cancer Treatment
? Cybersecurity a big challenge
25
Medical Imaging Open Datasets
 http://www.cancerimagingarchive.net/
 Lung Cancer, Skin Cancer, Breast Cancer….
 Kaggle Open Datasets
 Diabetic Retinopathy, Lung Cancer
 Kaggle Data Science Bowl 2018
 https://www.kaggle.com/c/data-science-bowl-2018
 ISIC Skin Cancer Dataset
 https://challenge.kitware.com/#challenge/583f126bcad3a51cc66c8d9a
 Grand Challenges in Medical Image Analysis
 https://grand-challenges.grand-challenge.org/all_challenges/
 And more…
 https://github.com/sfikas/medical-imaging-datasets
26
Where to start your journey?
 Level 1: Just Starting
 Start with Kaggle and other Open Competitions
 Use existing pre-trained networks (like GoogleNet) with the Medical Open Source data
 Level 2: Intermediate
 Experiment with models specific to Medical Imaging space like U-Net/V-Net
 Combine 3rd party data sets for greater insights
 Level 3: Advanced
 Experiment with building new models from scratch
 Level 4: Mature
 Add feedback loop to your models, learning from outcomes
 Experiment with Deep Reinforcement Learning
 Industrialize the ML/DL Pipeline, shared model repository across company
27
Resources
 CBInsights AI in Healthcare Map: https://www.cbinsights.com/research/artificial-intelligence-startups-healthcare/
 DL in Medical Imaging Survey : https://arxiv.org/pdf/1702.05747.pdf
 U-Net: https://arxiv.org/pdf/1505.04597.pdf
 Learning to diagnose from scratch exploiting dependencies in labels: https://arxiv.org/pdf/1710.10501.pdf
 TieNet Chest X-Ray Auto-reporting: https://arxiv.org/pdf/1801.04334.pdf
 Dermatologist level classification of Skin Cancer using DL: https://www.nature.com/articles/nature21056
 Tensorflow Intel CPU Optimized: https://software.intel.com/en-us/articles/tensorflow-optimizations-on-modern-intel-
architecture
 Tensorflow Quantization: https://www.tensorflow.org/performance/quantization
 Deep Compression Paper: https://arxiv.org/abs/1510.00149
 Microsoft’s Project Brainwave: https://www.microsoft.com/en-us/research/blog/microsoft-unveils-project-brainwave/
 Can FPGAs Beat GPUs?: http://jaewoong.org/pubs/fpga17-next-generation-dnns.pdf
 ESE on FPGA: https://arxiv.org/abs/1612.00694
 Intel Spark BigDL: https://software.intel.com/en-us/articles/bigdl-distributed-deep-learning-on-apache-spark
 Baidu’s Paddle-Paddle on Kubernetes: http://blog.kubernetes.io/2017/02/run-deep-learning-with-paddlepaddle-on-
kubernetes.html
 Uber’s Horovod Distributed Training framework for Tensorflow: https://github.com/uber/horovod
 TFX: Tensorflow based production scale ML Platform: https://dl.acm.org/citation.cfm?id=3098021
 Explainable AI: https://www.cc.gatech.edu/~alanwags/DLAI2016/(Gunning)%20IJCAI-16%20DLAI%20WS.pdf
28
Questions?
Contact
http://bit.ly/geeta4c
geeta@svsg.co
@geeta4c
