A World-Leading Science Foundation Ireland Research Centre
Deep Learning for Medical
Image Analysis
Keelin Murphy
July 3rd, 2017
About Me
BA Mathematics TCD
Software Industry 4-5 years
MSc + PhD in Biomedical Image Analysis
UCC (INFANT Research Centre)
Utrecht Medical Center, the Netherlands
Today….
Machine Learning
Deep Learning
Deep Learning for Image Analysis
Deep Learning for Medical Image Analysis
Deep Learning for Neonatal Medical Image Analysis
Artificial Deep Intelligent Machine Learning …..?
[Nested diagram: Deep Learning ⊂ Machine Learning ⊂ Artificial Intelligence]
Machine Learning
Finding patterns in data
Supervised learning – learn by example, like humans
Machine Learning
Training: feed in labelled data (e.g. an image labelled CAR)
Testing: request labels for new data (the system returns e.g. CAR, GIRL, DOG)
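As a tiny, concrete sketch of this train/test loop (the data, labels, and choice of a nearest-neighbour classifier are invented purely for illustration):

# Toy supervised learning example; all data and labels are made up.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.array([[4.0, 1.5], [4.2, 1.3], [0.5, 0.4], [0.6, 0.5]])  # feature vectors
y_train = np.array(['CAR', 'CAR', 'DOG', 'DOG'])                      # labels

clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(X_train, y_train)            # Training: feed in labelled data
print(clf.predict([[4.1, 1.4]]))     # Testing: request a label for new data -> ['CAR']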
How does it learn?
Machine Learning
Everything Else
(“Conventional” Machine Learning)
Neural Networks
(AKA Deep Learning)
Support Vector Machines
Random Forests
Gradient Boosting
Linear Classifiers
Nearest Neighbour Classifiers
……..
“Hand-crafted” Features
Hand-crafting features
Texture
Curvature
What features do humans use?
Penguin Mania
Are humans always the best?
Are humans always the best?
Deep Learning
(Artificial) Neural networks with lots of hidden layers (deep)
The network determines what features are useful
Out of favour until around 2006-2012, when it was revived by:
- Large amounts of data online
- GPU and distributed processing
Source: Alexander Del Toro Barba
https://www.linkedin.com/pulse/how-artificial-intelligence-revolutionizing-finance-del-toro-barba
Neural Networks: Auto-features!
[Diagram: Input Layer → Hidden Layers → Output Layer; each neuron/perceptron = a matrix of weights]
The network outputs a score per class, e.g. Dog 0.1, Cat 0.2, Penguin 0.7
TRUTH: Dog 0.0, Cat 0.0, Penguin 1.0
ERROR = the difference between output and truth for each class
TRAINING: Back-propagation updates the weights to minimize the errors
Deep Neural Networks
Simplest Neural Network Example:
[Diagram: input x = (x1, x2, x3, x4) → hidden layer with 2 neurons → output layer (Dog, Cat)]
Neuron 1: W1·x + b1 = w11·x1 + w12·x2 + w13·x3 + w14·x4 + b1
Neuron 2: W2·x + b2 = w21·x1 + w22·x2 + w23·x3 + w24·x4 + b2
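As a minimal runnable sketch of this computation (all values invented for illustration), the hidden layer is just a matrix-vector product plus a bias vector:

# Minimal sketch of the 2-neuron hidden layer; all values are made up.
import numpy as np

x = np.array([0.5, 0.1, 0.9, 0.3])        # input x1..x4
W = np.array([[0.2, -0.4, 0.7, 0.1],      # neuron 1 weights w11..w14
              [-0.3, 0.5, 0.2, 0.8]])     # neuron 2 weights w21..w24
b = np.array([0.1, -0.2])                 # biases b1, b2

hidden = W.dot(x) + b                     # = (W1·x + b1, W2·x + b2)
print(hidden)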
Deep Neural Networks
Simplest Neural Network Example (continued):
[Diagram as before: x = (x1, x2, x3, x4) → 2 hidden neurons → output layer (Dog, Cat)]
Neuron 1: f(W1·x + b1)
Neuron 2: f(W2·x + b2)
f = activation function (non-linearity)
Deep Neural Networks
Activation Functions, e.g.:
Sigmoid – no longer recommended
ReLU (Rectified Linear Unit) – very popular
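For reference, these two activations have the standard definitions:

\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \mathrm{ReLU}(x) = \max(0,\, x)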
Deep Neural Networks
Simplest Neural Network Example (continued):
[Diagram as before: the hidden-layer outputs n1 = f(W1·x + b1) and n2 = f(W2·x + b2) feed into a Softmax function, which produces the output-layer probabilities for Dog and Cat]
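The softmax function converts the hidden-layer outputs n1, n2 into categorical probabilities that sum to 1 (standard definition):

\mathrm{softmax}(n_i) = \frac{e^{n_i}}{\sum_j e^{n_j}}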
Deep Neural Networks
Simplest Neural Network Example (training):
[Diagram as before: n1 = f(W1·x + b1) and n2 = f(W2·x + b2) → Softmax → output probabilities for Dog and Cat]
TRAINING: the output is compared against the TRUTH (Dog 1.0, Cat 0.0); the difference per output is the ERROR
Back-Propagation: update weights to minimize errors
Deep Neural Networks
Network error is measured by a loss function, L (also called a cost function).
Back-propagation: choose weight changes which move us “downwards” in the loss function L.
Gradient Descent: [plot of L as a function of the weights W11 and W12]
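In symbols, vanilla gradient descent nudges each weight against the gradient of the loss, where \eta is the learning rate (this is the basic update that the optimizers on the next slide refine):

w_{ij} \leftarrow w_{ij} - \eta \frac{\partial L}{\partial w_{ij}}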
Deep Neural Networks
Gradient Descent = basis for many more sophisticated
optimization methods
Optimizer = Method of updating weights based on Loss
Adam, Adagrad, Adamax, RMSProp, etc.
See also http://sebastianruder.com/optimizing-gradient-descent/index.html
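As a hedged illustration of how an optimizer is selected in Keras (the tiny model here is invented only to make the snippet runnable):

# Illustrative only: the one-layer model exists just so the snippet runs.
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam   # swap in RMSprop, Adagrad, Adamax, ...

model = Sequential()
model.add(Dense(2, activation='softmax', input_shape=(4,)))
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=0.001),   # a typical default learning rate
              metrics=['accuracy'])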
Deep Neural Networks
We have discussed Fully Connected Neural Networks
Deep Neural Networks
A (small) 256 × 256 RGB image
Fully connected model:
•256 × 256 × 3 = 196,608 weights PER neuron in the first hidden layer!
•Flattening the input to a vector loses spatial information
What about image analysis?
By Aphex34 - Own work, CC BY-SA 4.0,
https://commons.wikimedia.org/w/index.php?curid=45659236
Convolutional Neural Networks
Input layer (image)
First Hidden layer
(Num channels (features) = 5)
CNN model :
•Neurons arranged in blocks
•Each neuron connects to a small region
of the input (receptive field)
•Neurons in same channel share same
weights
•Weight-sharing → detection of similar
features across the image (see the parameter-count sketch below)
Source: http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution
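To see why weight-sharing matters, compare parameter counts for the 256 × 256 × 3 image from the previous slide (the 3 × 3 kernel size is an assumption for illustration):

# Back-of-the-envelope parameter counts; the 3x3 kernel is an assumption.
per_neuron_fc = 256 * 256 * 3        # fully connected: 196,608 weights per neuron
per_channel_conv = 3 * 3 * 3 + 1     # one 3x3 kernel over 3 input channels, + bias = 28
conv_layer = 5 * per_channel_conv    # 5 shared-weight channels -> 140 parameters total
print(per_neuron_fc, conv_layer)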
Convolutional Neural Networks
Convolutional Neural Networks
Adapted from : http://benanne.github.io/images/architecture.png
MaxPool: reduces dimensionality, helps prevent overfitting
Could also add a “dropout” layer to help with overfitting
# Code Snippet (abridged from the Keras MNIST CNN example linked below;
# input_shape, num_classes, batch_size, epochs and the x/y train/test
# arrays are defined earlier in that example)
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))   # reduce dimensionality
model.add(Dropout(0.25))                    # help prevent overfitting
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
Convolutional Neural Networks
https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py
Anything with patterns in data!
Fraud Detection
Risk Assessment
Speech Recognition
Text Analysis
Targeted advertising
………..
Practical Applications
e.g.
Manufacturing
Photo Management
Driverless Car
Remote Sensing
Medical Images……
Imaging Applications
Image Categorisation
flamingo
Source : ImageNet 2011
(Large Scale Visual Recognition Challenge)
Training data:
1.2 million images
1000 image categories
Image Object Localisation
Source : ImageNet 2013
(Large Scale Visual Recognition Challenge)
Training data:
396,000 images
200 object categories
Image Captioning
"a man is throwing a frisbee in a park“
Source : http://cs.stanford.edu/people/karpathy/neuraltalk2/demo.html
Image Semantic Segmentation
Source :
www.robots.ox.ac.uk/~szheng/CRFasRNN.html
Generating Art
Source: https://www.technologyreview.com/s/608195/machine-creativity-beats-some-modern-art/
Medical Images
X-Ray
CT
Ultrasound
MRI
Medical Images – Brain MRI
3D Brain
The Real 3D Brain
Medical Images - Why AI?
Radiologist
• Error-prone
• Subjective
• Qualitative
Computers
• Tireless
• Objective (repeatable results)
• Quantitative
Deep Learning for Medical Images
Litjens et al “A Survey on Deep Learning in Medical Image Analysis”, 2017
Deep Learning for Medical Images
Litjens et al “A Survey on Deep Learning in Medical Image Analysis”, 2017
Data Science Bowl 2017
Detection of prostate cancer using temporal sequences of
ultrasound data: a large clinical feasibility study
Azizi et al “Detection of prostate cancer using temporal sequences of ultrasound data: a large clinical feasibility study”, 2016
Malignancy
determination in
Prostate Ultrasound
Medical Imaging Applications
Detection of
Tuberculosis in Chest
X-Ray
Kim et al “Deconvolutional Feature Stacking for Weakly-Supervised Semantic Segmentation”, 2016
Medical Imaging Applications
Gao et al “Multi-label Deep Regression and Unordered Pooling for Holistic Interstitial Lung Disease Detection”,
2016
Detecting Patterns of
Interstitial Lung
Disease
(CT)
Medical Imaging Applications
Ghafoorian et al “Location Sensitive Deep Convolutional Neural Networks for Segmentation of White Matter Hyperintensities”,
2016
Segmentation of
White-Matter
Hyperintensities
(MRI)
Medical Imaging Applications
INFANT Research
INFANT Perinatal Research
Pregnancy / Neonates
e.g.
Diagnostic Testing
Improved Monitoring
Newborn Health Monitoring
Nutrition
Brain Injury
www.infantcentre.ie
Hypoxic Ischemic Encephalopathy
Oxygen Deprivation during Birth
Cause of brain injury
2-5 cases per 1000 live births
Wide range of severities and outcomes
Which part of the brain is injured and how severely?
Hypoxic Ischemic Encephalopathy
Day 3-4 MRI acquired
Diffusion-Weighted and T2 (Anatomical)
Baby 1: Diffusion-Weighted MRI
Baby 2: Diffusion-Weighted MRI
Baby 3: Diffusion-Weighted MRI
The Neural Network
(25 subjects – per-pixel classification – round-robin, 5 subjects training)
Fully Convolutional Network with dilated convolutions
Trained on image patches with Data Augmentation
3 x Hidden Convolutional Layers (32 features, 64 features, 96 features)
Loss function : Binary Cross-entropy
Optimizer : Adam
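A minimal Keras sketch along the lines described above; the kernel sizes, dilation rates, and single-channel input shape are illustrative assumptions, not the exact architecture used:

# Sketch only: kernel sizes, dilation rates and input shape are assumptions.
from keras.models import Sequential
from keras.layers import Conv2D

model = Sequential()
model.add(Conv2D(32, (3, 3), dilation_rate=1, activation='relu',
                 padding='same', input_shape=(None, None, 1)))
model.add(Conv2D(64, (3, 3), dilation_rate=2, activation='relu', padding='same'))
model.add(Conv2D(96, (3, 3), dilation_rate=4, activation='relu', padding='same'))
model.add(Conv2D(1, (1, 1), activation='sigmoid'))  # per-pixel probability map
model.compile(loss='binary_crossentropy', optimizer='adam')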
Brain Tissue Segmentation
Segment 8 tissue types
in anatomical scans
NeoBrainS12 Public Challenge
2 subjects fully labelled (training)
5 subjects without labels (test)
Brain Tissue Segmentation
Training Data: Human Segmentation
Test Data: Network Segmentation
Brain Tissue Segmentation
The Neural Network
(2 training subjects – per-pixel classification)
Fully Convolutional Network with dilated convolutions
Trained on image patches with Data Augmentation
Deep residual network with 11 convolutional layers, stacked with batch-normalization and ReLU activation layers.
Loss function : Categorical Cross-entropy
Optimizer : Adam
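A hedged sketch of the residual-block pattern mentioned above, using the Keras functional API (the filter count and kernel size are illustrative assumptions):

# Sketch of one residual block with batch normalization and ReLU activations.
from keras.layers import Conv2D, BatchNormalization, Activation, add

def residual_block(x, n_filters=32):
    # assumes x already has n_filters channels so the skip-add is valid
    y = Conv2D(n_filters, (3, 3), padding='same')(x)
    y = BatchNormalization()(y)
    y = Activation('relu')(y)
    y = Conv2D(n_filters, (3, 3), padding='same')(y)
    y = BatchNormalization()(y)
    y = add([x, y])             # the residual (skip) connection
    return Activation('relu')(y)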
Getting Started with Deep Learning
Preferably with a GPU
e.g. NVIDIA GTX
see also http://timdettmers.com/2017/04/09/which-gpu-for-deep-learning/
Coding Frameworks:
Caffe (Berkeley AI)
Theano (University of Montreal)
Tensorflow (Google)
PyTorch (or Torch)
Higher Level:
Lasagne (layered on Theano)
Keras (layered on Tensorflow/Theano)
Also check out:
Deep learning for JVM (Java, Scala, Hadoop, Spark) https://deeplearning4j.org/
Packages in e.g. Matlab & R (extensive list: http://deeplearning.net/software_links/)
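Once installed, a quick import check confirms the stack is working (assuming Keras on a TensorFlow or Theano backend):

# Quick sanity check; assumes Keras is installed with a working backend.
import keras
from keras import backend as K
print(keras.__version__)   # e.g. '2.0.x'
print(K.backend())         # 'tensorflow' or 'theano'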
Acknowledgements
Kevin McGuinness, INSIGHT centre DCU
Joseph Antony, INSIGHT centre DCU
http://cs231n.github.io/
http://neuralnetworksanddeeplearning.com
http://deeplearning.stanford.edu/wiki/index.php/UFLDL_Tutorial
https://ayearofai.com
https://arxiv.org/ (for research literature)
https://github.com/kailashahirwar/cheatsheets-ai (cheat-sheets for programming)
https://aiexperiments.withgoogle.com (fun stuff with neural nets)
Useful (and fun) resources
Thank You.
Questions?
keelinm@gmail.com
@MurphyKeelin

Editor's Notes

  • #17 Vector notation, dot product
  • #18 Sigmoids saturate gradients during back-propagation. ReLU is fast to compute and works well in many cases; use caution with the learning rate, as incorrect settings can lead to dead neurons which always output 0.
  • #19 We want to convert the output values from the hidden layer into categorical probabilities.
  • #21 L is shown as a function of the weights.
  • #23 Regular neural nets don't scale well to full images. In CIFAR-10, images are only of size 32x32x3 (32 wide, 32 high, 3 color channels), so a single fully-connected neuron in a first hidden layer of a regular neural network would have 32*32*3 = 3072 weights. This amount still seems manageable, but clearly this fully-connected structure does not scale to larger images. For example, an image of more respectable size, e.g. 200x200x3, would lead to neurons that have 200*200*3 = 120,000 weights. Moreover, we would almost certainly want several such neurons, so the parameters would add up quickly! Clearly, this full connectivity is wasteful, and the huge number of parameters would quickly lead to overfitting. In a CNN, by contrast, the neurons in a layer are connected only to a small region of the layer before it, instead of to all of the neurons in a fully-connected manner.
  • #24 Flattening images to vectors loses information
  • #25 Each neuron connects to a small region of the input (receptive field) Weight-sharing -> detection of similar features across the image
  • #26 Central pixel replaced with a weighted sum of itself and surrounding pixels
  • #27 Max pooling reduces dimensionality and helps against overfitting. Dropout disables some neurons on occasion so that the others learn to function better independently; this also helps against overfitting.
  • #36 Discuss modalities, pros/cons?
  • #38 Anisotropy; grayscale; lack of annotations.
  • #39 Imaging protocols which ten years ago might have generated 50 images may now produce thousands, all requiring expert examination. Radiologists are error-prone (tired, distracted, bad day, inexperienced), subjective (lots of disagreement!) and qualitative (“quite big”, “fairly severe”, 2D line measures). Computers are tireless (can work 24/7), objective (repeatable results, 100% agreement) and quantitative (“10% bigger than average”).
  • #46 White-matter hyperintensities are a common finding on brain MR images of patients diagnosed with small vessel disease (SVD) [1], multiple sclerosis [2], Parkinsonism [3], stroke [4], Alzheimer's disease [5] and dementia [6]. They are associated with various measures of decline and are not well understood.
  • #58 Frameworks use primitive functions that NVIDIA has developed specifically for deep learning on GPUs, called cuDNN. Torch is based on the Lua language.