Thoughts on Machine Learning and Artificial Intelligence
Maarten van Smeden, PhD

Leiden University Medical Center, Netherlands

STRATOS Lorenz Meeting

21/09/2018
Interested reader perspective
• Statistician by training

• Limited experience applying machine learning techniques

• Three examples that I think are illustrative of ML/AI in medicine as it is currently applied

• Focus: prediction
Tech company business model
Apple Watch 4
FDA Approval
https://www.statnews.com/2018/09/13/heres-the-data-behind-the-new-apple-watch-ekg-app/?mc_cid=0fbfd65c13&mc_eid=75f1d5aea2
Impressive artificial intelligence
IBM Watson’s win against two Jeopardy! champions in 2011
Reviewer #2
Less impressive artificial intelligence
Warning!
Statistical policing going on
Yesterday’s news
http://www.timvanderzee.com/the-wansink-dossier-an-overview/

Example 1: ML predicting mortality
• Caliber dataset (UK, EHR)

• N = 80,000 patients with pre-existing coronary artery disease

• Predict all-cause mortality (18,000 events, time horizon unclear)

• “used Cox models, random forests and elastic net regression”

• 586 candidate predictors vs 27 pre-selected variables

• Complete case / multiple imputation / missing indicator method

• Cox models: linear main effects only

• Split sample (1/3 test, 2/3 training); design sketched below
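A minimal sketch of this split-sample comparison on simulated data, with a binary fixed-horizon outcome standing in for the survival outcome and scikit-learn models standing in for the Cox / elastic net / random forest survival fits; the sample size, predictors and tuning values are assumptions, not the CALIBER settings.

```python
# Sketch of the split-sample design (2/3 training, 1/3 test) on simulated data;
# a binary outcome replaces the survival outcome for simplicity.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, brier_score_loss

rng = np.random.default_rng(2018)
n, p = 5000, 30                                   # stand-ins, not the CALIBER dimensions
X = rng.normal(size=(n, p))
logit = X[:, :5] @ np.array([0.5, -0.4, 0.3, 0.3, -0.2]) - 1.5
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# 1/3 test, 2/3 training, as in the paper
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, random_state=1)

# elastic-net penalised regression (linear main effects only)
enet = LogisticRegression(penalty="elasticnet", solver="saga",
                          l1_ratio=0.5, C=1.0, max_iter=5000).fit(X_tr, y_tr)
# random forest as the flexible comparator
rf = RandomForestClassifier(n_estimators=500, random_state=1).fit(X_tr, y_tr)

for name, model in [("elastic net", enet), ("random forest", rf)]:
    p_hat = model.predict_proba(X_te)[:, 1]
    print(f"{name:13s} AUC={roc_auc_score(y_te, p_hat):.3f}  "
          f"Brier={brier_score_loss(y_te, p_hat):.3f}")
```

Reporting a proper score such as the Brier score alongside the AUC already touches the calibration issue raised in Example 2.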
Example 1: ML predicting mortality
One take
Linear regression is an example of Machine Learning? If so, what isn’t Machine Learning?
Perhaps more reasonable?
Beam & Kohane, JAMA, 2018
Example 2: lymph node metastases
• Researcher challenge competition

• Whole slide images of women diagnosed with breast cancer

• Training data: N = 270 (110 events); test data: N = 129 (49 events)

• 11 pathologists evaluating the test data

• 390 teams signed up for the competition

• 23 teams submitted 32 algorithms for evaluation
Example 2: lymph node metastases
• Unfair comparison between pathologists and deep learning (DL)

• Pathologists had no access to regularly available diagnostics

• AUC comparison: DL (continuous score) vs pathologists (5-item scale)

• Promising algorithms overrepresented (390 teams -> 32 algorithms submitted)
Example 2: lymph node metastases
• No attention to risk prediction / calibration

• ML: attention to classification only, without probability estimates

• Huge (often implicit) difference between traditional (risk) prediction modeling in medicine and traditional ML

• Probably fine for Netflix recommendations; not so much for real-life medical decision making (illustrated in the sketch below)
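A small illustration of why calibration matters beyond classification: on simulated data, a model that exaggerates risks keeps the same ranking (same AUC) but produces probabilities one should not act on. Nothing here uses the lymph node challenge data.

```python
# A well-discriminating but badly calibrated "model" on simulated data.
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(42)
n = 20_000
true_risk = rng.beta(2, 5, size=n)              # true event probabilities
y = rng.binomial(1, true_risk)

# keep the ranking intact but push risks towards 0 and 1
logit = np.log(true_risk / (1 - true_risk))
overconfident = 1 / (1 + np.exp(-3 * logit))

print("AUC:", round(roc_auc_score(y, overconfident), 3))      # discrimination looks fine
obs_rate, mean_pred = calibration_curve(y, overconfident, n_bins=10)
for pred, obs in zip(mean_pred, obs_rate):
    print(f"predicted {pred:.2f}  observed {obs:.2f}")         # but the risks are way off
```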
Misuse of “risk”
Example 3: 5 types of diabetes
• Patients with newly diagnosed diabetes (N = 8980) 

• 6 continuous variables 

• K-means clustering (‘unsupervised learning’)
Example 3: 5 types of diabetes
BS detection simulation
• Data generated from 2 independent multivariate normal (MVN) distributions with equal pairwise correlations of 0.3

• “Sunday morning simulations”, code: https://github.com/MvanSmeden/DiabetesClusters (a rough analogue is sketched below)
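A rough Python analogue of that simulation idea (the original R code lives at the GitHub link above); the sample sizes and group means below are assumptions for illustration.

```python
# BS detection: k-means asked for 5 clusters when only 2 true groups exist.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
p = 6                                           # 6 continuous variables, as in the paper
cov = np.full((p, p), 0.3)
np.fill_diagonal(cov, 1.0)                      # equal pairwise correlations of 0.3

# only two true groups (assumed sizes and means)
g1 = rng.multivariate_normal(np.zeros(p), cov, size=4500)
g2 = rng.multivariate_normal(np.full(p, 1.0), cov, size=4500)
X = np.vstack([g1, g2])

km = KMeans(n_clusters=5, n_init=20, random_state=1).fit(X)
print("cluster sizes:", np.bincount(km.labels_, minlength=5))
```

Even though only two groups were generated, k-means dutifully returns five non-empty “clusters”, which is the point of the BS detection exercise.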
K-means clustering
“K-means finds a Voronoi partition, only if that partition coincides with a
"clustering" does it have a hope of actually doing clustering”

Max Little: https://twitter.com/MaxALittle/status/970277900871262213
Freak examples?
Probably?
Maybe?
What I observe is:
• Confusion and disagreement about what is and isn’t ML/AI 

• Analyses labeled “ML/AI” have a tendency to concentrate on classification (exceptions exist, e.g. high-dimensional propensity score (PS) approaches that are called “ML”)

• Analyses labeled “ML/AI” in medicine are surprisingly often
done by people not thoroughly trained in statistics

• Basic statistical principles are often forgotten or ignored (e.g. the use of improper scoring rules; see the sketch below)
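A short illustration of the improper scoring rule point: with a rare outcome, classification accuracy cannot tell a useless constant-risk model apart from one that knows who is at risk, whereas proper scoring rules (Brier score, log loss) can. The numbers are simulated purely for illustration.

```python
# Accuracy (improper) vs Brier score and log loss (proper) on simulated data.
import numpy as np
from sklearn.metrics import accuracy_score, brier_score_loss, log_loss

rng = np.random.default_rng(7)
n = 100_000
true_risk = np.where(rng.random(n) < 0.2, 0.30, 0.01)   # a minority at genuinely high risk
y = rng.binomial(1, true_risk)

useless = np.full(n, 0.01)    # same low risk for everyone
informed = true_risk          # knows who is actually at risk

for name, p_hat in [("useless", useless), ("informed", informed)]:
    acc = accuracy_score(y, (p_hat >= 0.5).astype(int))  # both classify everyone as "no event"
    print(f"{name:9s} accuracy={acc:.3f}  "
          f"Brier={brier_score_loss(y, p_hat):.4f}  "
          f"log-loss={log_loss(y, p_hat):.4f}")
```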
Concluding remarks (1)
• Just because an algorithm is novel or flexible doesn’t mean it is
any good, obviously

• Dismissing the potential value of novel “ML/AI” algorithms out of hand doesn’t make sense

• We need more realistic simulations and many applications to
compare the traditional vs more novel / flexible algorithms

• The primary issue in medical applications seems to be with the modelers, not so much with the models
Concluding remarks (2)
• Statisticians should be more involved in the application and
evaluation of novel / flexible algorithms, especially for risk
prediction

• Statisticians should be involved in studying performance of
novel / flexible algorithms (e.g. data hungriness) -> realistic
simulation studies

• Collaboration with computer scientists

• Computationally intensive -> may not be cheap

• Serious experimental design and reporting
Simulation is…
“…it is using simulation for multiplication that I find objectionable. Eight patients are
eight patients and so should remain.”
“All the impressive achievements of
deep learning amount to just curve
fitting”
Judea Pearl