Machine Learning for automatic diagnosis: why your deep neural network might not work by Manon Ansart, PhD Student @AramisLab (Sorbonne Université, Inserm, ICM, CNRS, Inria)
The early diagnosis of neurodegenerative diseases is crucial, as it could lead to early treatment and better chances of stopping the disease process. Machine learning algorithms provide an opportunity to diagnose these diseases earlier through automatic diagnosis, but their application to the medical domain is not straightforward. From data set size to interpretability, we will see why the beautiful, trendy, and complex solution that first comes to mind might not be the best one.
1. Machine Learning for automatic diagnosis: why your deep neural network might not work
April 24th, 2019
2. Projects
Amyloidosis prediction
• Predict the output of amyloid PET
• Application: recruit patients for clinical trials at a lower cost
Predicting the evolution of a diagnosis
• Predict the development of Alzheimer’s disease
• Quantitative review
3. Data uncertainty
Reliability of the label
• The definition of AD is not clear-cut
• Subjective label established by a specialist: sensitivity of 55% and specificity of 85% (Beach et al. 2012)
Noisy data
• Always look at the test–retest variability (Koval et al. 2019)
4. Data uncertainty
Biased missing values
• Example: the proportion of hypertension differs between AD and non-AD subjects, so values are not missing at random
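A quick way to spot such bias is to compare missingness rates across diagnostic groups before any imputation or complete-case analysis. The sketch below uses a hypothetical, simulated cohort (the column names and missingness rates are illustrative, not from the talk's data):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical cohort: hypertension is recorded less often for non-AD
# subjects, so a naive "drop missing rows" step would bias the comparison.
n = 1000
df = pd.DataFrame({
    "diagnosis": rng.choice(["AD", "non-AD"], size=n),
    "hypertension": rng.choice([0.0, 1.0], size=n),
})

# Simulate biased missingness: 40% missing for non-AD, 5% for AD.
miss = np.where(df["diagnosis"] == "non-AD",
                rng.random(n) < 0.40,
                rng.random(n) < 0.05)
df.loc[miss, "hypertension"] = np.nan

# Missingness rate per group: if these differ markedly, the values are not
# missing completely at random, and complete-case analysis is biased.
missing_by_group = df["hypertension"].isna().groupby(df["diagnosis"]).mean()
print(missing_by_group)
```

If the per-group rates differ, deletion or naive imputation will distort the AD vs non-AD comparison, which is exactly the bias the slide warns about.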
5. Small data sets
• ADNI data set: 800 subjects with Mild Cognitive Impairment (MCI), 550 followed for 3 years
• MNIST: 70k examples
6. Small data sets
Amyloidosis prediction, on 2 data sets:
• ADNI: 431 subjects, 6 features ➝ AUC = 69%
• INSIGHT: 318 subjects, 112 features ➝ AUC = 56%
  • with Lasso feature selection: AUC = 64%
  • with 26 domain-knowledge summary variables: AUC = 68%
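The Lasso step above can be sketched as follows. This is a minimal illustration on synthetic data with the same shape as the INSIGHT setting (318 subjects, 112 mostly uninformative features); the penalty strength `C` and the simulated signal are assumptions, not the study's actual configuration:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: few subjects, many features, few of them informative.
X, y = make_classification(n_samples=318, n_features=112, n_informative=8,
                           random_state=0)

# L1-penalised logistic regression zeroes out weak features;
# SelectFromModel keeps only the features with non-zero coefficients.
lasso_select = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.5))

# Selection lives inside the pipeline, so it is refit on each training fold.
pipe = make_pipeline(StandardScaler(), lasso_select,
                     LogisticRegression(max_iter=1000))
auc = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean()
print(f"cross-validated AUC with Lasso selection: {auc:.2f}")
```

Putting the selector inside the pipeline (rather than selecting once on all the data) matters; the data-leakage slide later in the deck shows why.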
7. Small data sets
Impact of smaller data sets:
• Lower performance
• Over-fitting
Over-fitting in deep learning:
• High number of parameters
• Impact on performance
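The over-fitting risk is easy to demonstrate on a small, noisy tabular data set of the kind clinical cohorts produce. The sketch below is a toy illustration (simulated data, arbitrary network size), not a reproduction of any experiment from the talk:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Small, noisy data: 200 subjects, 50 features, only 5 informative,
# with 20% label noise, loosely mimicking an unreliable clinical label.
X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# An over-parameterised network memorises the 100 training subjects...
big = MLPClassifier(hidden_layer_sizes=(256, 256), max_iter=2000,
                    random_state=0).fit(X_tr, y_tr)

train_auc = roc_auc_score(y_tr, big.predict_proba(X_tr)[:, 1])
test_auc = roc_auc_score(y_te, big.predict_proba(X_te)[:, 1])
# ...so the train AUC is near-perfect while the test AUC is much lower.
print(f"train AUC {train_auc:.2f} vs test AUC {test_auc:.2f}")
```

With 70k MNIST examples the same network would generalise far better; with a few hundred subjects, the train-test gap is the warning sign.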
8. Lack of data
Data leakage
• Leaky pipeline: t-test feature selection on the whole data set, then classification with a train–test split. The t-test separating the features has already seen the test data, so performance is over-estimated.
• Alternative: feature selection using domain knowledge (common knowledge built from Papers 1, 2 and 3), then classification with a train–test split.
• Good… if those papers used different data sets.
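The leaky pipeline can be reproduced in a few lines. On pure-noise data, any AUC above chance is an artefact of the leak; scikit-learn's `f_classif` (an ANOVA F-test, equivalent to a t-test for two classes) stands in for the t-test feature selection. This is an illustrative sketch, not the talk's actual experiment:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Pure-noise data: the true AUC is 0.5, so any gain is leakage.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 1000))
y = rng.integers(0, 2, 100)

# LEAKY: select the 10 "best" features on ALL the data, then cross-validate.
X_leaky = SelectKBest(f_classif, k=10).fit_transform(X, y)
leaky_auc = cross_val_score(LogisticRegression(), X_leaky, y,
                            cv=5, scoring="roc_auc").mean()

# CORRECT: put the selector inside the pipeline, so it is refit on each
# training fold and never sees the corresponding test fold.
pipe = make_pipeline(SelectKBest(f_classif, k=10), LogisticRegression())
honest_auc = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean()

print(f"leaky AUC {leaky_auc:.2f} vs honest AUC {honest_auc:.2f}")
```

The leaky estimate looks impressively above chance on data that contains no signal at all, while the honest estimate hovers around 0.5.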
9. Lack of data
Solutions
• Use domain knowledge
• Validate on different data sets
• Pool cohorts to obtain larger ones

Table: performance of amyloidosis prediction using MRI features

  Data set                              AUC
  Trained and tested on INSIGHT         61.9
  Trained on ADNI, tested on INSIGHT    62.0
  Using both, same sample size          61.3
  Using both, full sample size          67.5
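The evaluation protocol behind the table (external validation on one cohort, then pooling both for training) can be sketched as below. Everything here is simulated: the cohorts, the "site effect" offset, and the feature weights are assumptions chosen only to illustrate the protocol, not to reproduce the numbers above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def make_cohort(n, shift):
    """One shared disease signal; `shift` imitates a scanner/site offset."""
    w = np.array([1.0, -1.0, 0.5, 0.0, 0.0])  # hypothetical feature weights
    X = rng.standard_normal((n, 5)) + shift
    y = ((X - shift) @ w + rng.standard_normal(n) > 0).astype(int)
    return X, y

X_a, y_a = make_cohort(400, shift=0.0)   # stand-in for cohort A (e.g. ADNI)
X_b, y_b = make_cohort(300, shift=0.3)   # stand-in for cohort B
X_tr, X_te, y_tr, y_te = train_test_split(X_b, y_b, test_size=0.5,
                                          random_state=0)

# External validation: train on cohort A, test on held-out cohort B subjects.
ext = LogisticRegression().fit(X_a, y_a)
ext_auc = roc_auc_score(y_te, ext.predict_proba(X_te)[:, 1])

# Pooled training: both cohorts together, tested on the same held-out set.
pool = LogisticRegression().fit(np.vstack([X_a, X_tr]),
                                np.concatenate([y_a, y_tr]))
pool_auc = roc_auc_score(y_te, pool.predict_proba(X_te)[:, 1])

print(f"external AUC {ext_auc:.2f}, pooled AUC {pool_auc:.2f}")
```

Testing on a held-out set that neither training configuration has seen keeps the two AUCs comparable, which is the point of the protocol in the table.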
10. Usability
What is the use case?
• Information expected by the user (doctor/patient)
• Inputs (unavailable or expensive features)
Interpretability
• A small increase in accuracy might not be worth the loss of interpretability
11. Take-home
Main issues
• Data uncertainty: unreliable labels, noisy features, biased missing values
➝ Take the time to explore and get to know your data
• Lack of data
➝ Use domain knowledge, pool different data bases, don’t over-fit
• Usability and need for interpretability
➝ It is not only about having the highest accuracy!
12. References
Ansart, Manon, Stéphane Epelbaum, Geoffroy Gagliardi, Olivier Colliot, Didier Dormont, Bruno Dubois, Harald Hampel, Stanley Durrleman, for the Alzheimer’s Disease Neuroimaging Initiative and the INSIGHT-preAD study. 2019. “Reduction of Recruitment Costs in Preclinical AD Trials: Validation of Automatic Pre-Screening Algorithm for Brain Amyloidosis.” Statistical Methods in Medical Research, January. https://doi.org/10.1177/0962280218823036.
Ansart, Manon, Stéphane Epelbaum, Giulia Bassignana, Alexandre Bône, Simona Botani, Tiziana Cattai, Raphaël Couronné, et al. 2019. “Predicting the Progression of Mild Cognitive Impairment Using Machine Learning: A Systematic and Quantitative Review.” (to be published)
Beach, Thomas G., Sarah E. Monsell, Leslie E. Phillips, and Walter Kukull. 2012. “Accuracy of the Clinical Diagnosis of Alzheimer Disease at National Institute on Aging Alzheimer’s Disease Centers, 2005–2010.” Journal of Neuropathology and Experimental Neurology 71 (4): 266–73. https://doi.org/10.1097/NEN.0b013e31824b211b.
Koval, Igor, Stéphanie Allassonnière, and Stanley Durrleman. 2019. “Simulation of Virtual Cohorts Increases Predictive Accuracy of Cognitive Decline in MCI Subjects.” arXiv:1904.02921 [cs, stat], April. http://arxiv.org/abs/1904.02921.
Wen, Junhao, Elina Thibeau-Sutre, Jorge Samper-González, Alexandre Routier, Simona Bottani, Stanley Durrleman, Ninon Burgos, and Olivier Colliot. 2019. “Convolutional Neural Networks for Classification of Alzheimer’s Disease: Overview and Reproducible Evaluation.”