Str-AI-ght to heaven?
Pitfalls for clinical decision support based on AI
Ben Van Calster
Department Development and Regeneration and EPI-centre, KU Leuven
Department Biomedical Data Sciences, LUMC Leiden
Research Ethics Committee, UZ Leuven
ben.vancalster@kuleuven.be; @BenVanCalster
ISUOG World Congress, 16 October 2021
Disclaimer
• Talk last year: “a plea for good methodology”
• This talk builds on that, in the context of AI and machine learning
• There is a lot of hype surrounding AI/ML. It may have potential, but we had better start to get real!
https://lawtomated.com/enough-with-the-a-i-hype-and-why/
Lawtomated
Do not celebrate too early…
Copyright Bas Czerwinski / Getty Images
Julian Alaphilippe, Liège-Bastogne-Liège (Oct. 4th, 2020)
Real winner: Primož Roglič
Deep learning on medical images
Topol. Nat Med 2019;25:44-56. Zhu et al. Front Neurol 2019;10:869.
Titano et al. Nat Med 2018;24:1337-41; Nam et al. Radiology 2019;290:218-28; Ehteshami Bejnordi et al. JAMA 2017;318:2199-210;
Esteva et al. Nature 2017;542:115-8; De Fauw et al. Nat Med 2018;24:1342-50; Raman et al. Eye 2019;33:97-109.
Machine Learning for ‘EHR’ data
Rajkomar et al. Npj Digit Med 2018;1:18.
Rose. JAMA Netw Open 2018;1:e181404.
Reason for popularity?
“Very complex machine learning algorithms are highly flexible,
and hence find relationships we could not see before.
Therefore we make better predictions and better decisions.”
→ Guaranteed success!
Right?
Pitfalls for “predictive analytics”
 1. Poor methodology
 2. Lack of evidence
 3. Considerable heterogeneity
 4. (Financial) conflicts of interest
 5. Actual implementation in clinical practice
1. Methodology matters, not impact factors
Altman DG. BMJ 1994;308:283-284.
Van Calster et al, J Clin Epidemiol, in press.
Altman. BMJ 1994.
Our own frustration paper. JCE 2021.
‘Predictive analytics’: covid-19
Wynants et al. BMJ 2020;369:m1328.
The review found more than 1 paper a day (!)
Results not trustworthy for 97% of the 231 models
Median sample size: 338
Non-representative sample: 42%
Representativeness unclear: 25%
Data analysis problematic: 94%
No model validation at all: 22%
Predictive analytics for covid-19
Wynants et al. BMJ 2020;369:m1328
Deep learning models for covid-19 diagnosis using CT or X-ray (RX)
- No discussion of target population or setting
- Control group (without covid-19):
  o Images from pediatric population
  o Images from a different country
  o Images from different time periods
  o Barely defined, e.g. ‘healthy persons’
- Images from online repository, without further information
- Often no demographic description at all (not even age or sex!)
Covid-19 deep learning: deep failure!
Roberts et al. Nat Mach Intell 2021;3:199-217.
Public covid-19 X-ray (RX) datasets
Santa Cruz et al. Med Image Analysis 2021;74:102225.
Complex algorithms are data hungry
So you dream of having a Porsche?
If you cannot (or don’t want to) pay for it, you may get this...
This also holds for predictive analytics. A fancier model is more expensive.
Currency: GOOD data.
Measurement and data quality
Missing values: the tricky importance of the invisible
Measurement: timing and procedure matter
Outcome: quality labels are key (see e.g. deep learning on medical images)
Beam & Kohane. JAMA 2018;319:1317-1318.
2. Wanted: evidence
• Kleinrouweler (AJOG 2016): 263 models in obstetrics
• Only 23 of these (9%) had been externally validated…
Other examples of model overload:
• 1060 models predicting outcomes after CVD (1990-2015) (Wessler et al, 2017)
• 363 models predicting CVD (Damen et al, 2016)
• 231 models related to Covid-19 (Wynants et al, 2020), and counting!
• 116 models to diagnose ovarian malignancy (Kaijser et al, 2014)
Wessler et al. Diagn Progn Res 2017;1:20. Damen et al. BMJ 2016;353:i2416. Wynants et al. BMJ 2020;369:m1328.
Kleinrouweler et al. AJOG 2016;214:79-90. Kaijser et al. Hum Reprod Update 2014;20:229-62.
Smartphone apps for skin lesions
Freeman et al. BMJ 2020;368:m127
• 9 validation studies covering 6 apps
• 1132 lesions in total (average 126 per study)
• Methodological quality was poor
o Selective inclusions (non-representative)
o Images were taken and selected by clinicians
o Lots of unusable images
Scarce and poor evidence
Radiology AI
Van Leeuwen et al. Eur Radiol 2021;31:3797-3804
• 64/100: no evidence
• 18/100: evidence of diagnostic performance
• 18/100: evidence of potential impact
• Half of the studies were independent, the other half had conflicts of interest
3. Expect (a lot of) heterogeneity
• Changes in care over time
• Differences in care between healthcare systems
• Differences in populations between practices/hospitals/regions
• Differences in hardware, software, and measurement procedures
• Differences in performance between patient subgroups (cf. fairness)
Futoma et al. Lancet Digit Health 2020;2:e489-e492.
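How population differences between centres can hide behind a pooled result can be sketched with a few lines of Python. The data, the centre labels, and the helper name `oe_by_group` are invented for illustration and do not come from Futoma et al. The sketch compares observed events with the sum of predicted risks (observed/expected ratio) per centre and for all patients pooled.

```python
from collections import defaultdict

# Invented toy data: (centre, observed outcome, predicted risk).
# Centre A has a 10% event rate, centre B a 40% event rate;
# the hypothetical model predicts a risk of 0.25 for everyone.
records = (
    [("A", 1, 0.25)] + [("A", 0, 0.25)] * 9
    + [("B", 1, 0.25)] * 4 + [("B", 0, 0.25)] * 6
)


def oe_by_group(rows):
    """Return the observed/expected event ratio per group and for all rows pooled."""
    observed = defaultdict(float)
    expected = defaultdict(float)
    for group, outcome, risk in rows:
        for key in (group, "pooled"):
            observed[key] += outcome
            expected[key] += risk
    return {key: observed[key] / expected[key] for key in observed}


ratios = oe_by_group(records)
for key in sorted(ratios):
    print(key, round(ratios[key], 2))
```

In this toy example the pooled ratio is 1.0, which looks perfectly calibrated, while risks are overestimated in centre A (ratio 0.4) and underestimated in centre B (ratio 1.6).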
https://www.unite.ai/andrew-ng-criticizes-the-culture-of-overfitting-in-machine-learning/.
https://www.youtube.com/watch?v=Gbnep6RJinQ
Procedural heterogeneity
Agniel et al. BMJ 2018;360:k1479.
Hardware/software
Badgeley et al. npj Digit Med 2019;2:31.
Deep learning was better at predicting scanner model and brand (AUC ≥ 0.98) than at predicting hip fracture (AUC 0.78)
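Why a model that recognises the scanner is a concern can be sketched with invented counts (assumptions for illustration, not figures from Badgeley et al.). Suppose 40% of patients imaged on scanner A have a fracture, against 10% on scanner B. A score that only encodes "scanner A or B" then already discriminates fractures without looking at the bone. The helper name `auc_from_counts` is hypothetical.

```python
def auc_from_counts(pos_high, pos_low, neg_high, neg_low):
    """AUC of a binary score, given counts of events (pos) and non-events (neg)
    that received the high or the low score. Tied pairs count for one half."""
    pairs = (pos_high + pos_low) * (neg_high + neg_low)
    concordant = pos_high * neg_low
    tied = pos_high * neg_high + pos_low * neg_low
    return (concordant + 0.5 * tied) / pairs


# Invented counts: scanner A (high score): 40 fractures, 60 without;
# scanner B (low score): 10 fractures, 90 without.
scanner_auc = auc_from_counts(pos_high=40, pos_low=10, neg_high=60, neg_low=90)
print(scanner_auc)
```

With these assumed counts the scanner-only score reaches an AUC of 0.70, so part of an apparently good fracture AUC could reflect where and how the image was taken.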
Where do DL datasets come from anyway?
Kaushal et al. JAMA 2020;324:1212-1213.
Implications?
Van Calster et al. BMC Med 2019;17:230.
THERE IS NO SUCH THING AS A ‘VALIDATED’ MODEL
DL research (Sep 2021)
Perkonigg et al. Nat Comm 2021;12:5678.
4. Proprietary datasets and models
Van Calster et al. JAMIA 2019;26:1651-1654.
https://hai.stanford.edu/news/flying-dark-hospital-ai-tools-arent-well-documented.
Not necessarily bad in principle: financial resources are needed
But it may hamper openness, availability, independent validation
COVID review: companies often did not respond, yet claimed that their model had been used on thousands of patients
Google’s Dermatology Assist (CE label)
https://www.bbc.com/news/technology-57157566.
May 18th, 2021
Google’s Dermatology Assist (CE label)
https://www.statnews.com/2021/06/02/machine-learning-ai-methodology-research-flaws/.
Roxana Daneshjou (Stanford):
- No evaluation on external dataset.
- Insufficient variation in skin types.
- Outcome rarely based on biopsy.
- “I haven't seen data that makes me feel
comfortable with putting this in the hands of
patients or physicians.”
External validation of the Epic Sepsis Model
Wong et al. JAMA Intern Med 2021;181:1065-1070.
Model: penalized logistic regression with 80 variables
Data: 3 healthcare organizations, 2013-2015
AUC according to internal documentation: 0.78-0.83
Validation: 1 academic center, 2018-2019
AUC 0.63, calibration poor (risks way too high)
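The two quantities reported here, AUC (discrimination) and calibration, can be computed on any validation set. Below is a minimal sketch on invented toy data; the helper names `auc` and `observed_expected_ratio` are hypothetical and this is not the analysis of Wong et al. An observed/expected ratio below 1 corresponds to predicted risks that are too high.

```python
from typing import Sequence


def auc(y_true: Sequence[int], risk: Sequence[float]) -> float:
    """Probability that a random event has a higher predicted risk than a
    random non-event; tied pairs count for one half."""
    events = [r for y, r in zip(y_true, risk) if y == 1]
    non_events = [r for y, r in zip(y_true, risk) if y == 0]
    wins = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(events) * len(non_events))


def observed_expected_ratio(y_true: Sequence[int], risk: Sequence[float]) -> float:
    """Observed number of events divided by the sum of predicted risks."""
    return sum(y_true) / sum(risk)


# Invented validation data: 10 patients, 3 events.
outcomes = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
risks = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.5, 0.8, 0.9, 0.6]

print(round(auc(outcomes, risks), 2))
print(round(observed_expected_ratio(outcomes, risks), 2))
```

In this toy set the AUC is about 0.71 and the observed/expected ratio about 0.55: moderate discrimination combined with systematically overestimated risks.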
5. Actual implementation
Logistical/practical issues in fitting the model into the clinical workflow
Psychological issues regarding model use by healthcare staff
Medicolegal: Who is responsible when prediction is wrong?
https://www.statnews.com/2020/03/09/can-you-sue-artificial-intelligence-algorithm-for-malpractice/
Panch et al. npj Digit Med 2019;2:77.
Lack of evidence revisited: impact?
Clinical impact studies: scarce, difficult
Clinical decision support is a complex intervention (Kappen et al, 2018)
Endpoints of impact studies?
- Process-related: ‘easy’, but intermediate
- Long-term patient outcomes: difficult, lower effect sizes expected
Kappen et al. Diagn Progn Res 2018.
So, does medical AI ‘work’?
We still often don’t know!
Trust jeopardized by
- poor methodology
- lack of evidence
- lack of openness.
It may have potential if done well and evidence is gathered.
AI community / academia often shoot themselves in the foot, which is a pity
Academia: wrong incentives (publish or perish)!
Companies: financial conflicts of interest!
That’s (not) all folks…
https://www.technologyreview.com/2019/06/06/239031/training-a-single-ai-model-can-emit-as-much-carbon-as-five-cars-in-their-lifetimes/.
https://spectrum.ieee.org/deep-learning-computational-cost
Thompson et al. IEEE Spectrum 2021.
Hao. MIT Technology Review 2019.

More Related Content

What's hot

Clinical prediction models: development, validation and beyond
Clinical prediction models:development, validation and beyondClinical prediction models:development, validation and beyond
Clinical prediction models: development, validation and beyond
Maarten van Smeden
 
P-values in crisis
P-values in crisisP-values in crisis
P-values in crisis
Laure Wynants
 
Prediction, Big Data, and AI: Steyerberg, Basel Nov 1, 2019
Prediction, Big Data, and AI: Steyerberg, Basel Nov 1, 2019Prediction, Big Data, and AI: Steyerberg, Basel Nov 1, 2019
Prediction, Big Data, and AI: Steyerberg, Basel Nov 1, 2019
Ewout Steyerberg
 
Clinical prediction models
Clinical prediction modelsClinical prediction models
Clinical prediction models
Maarten van Smeden
 
The basics of prediction modeling
The basics of prediction modeling The basics of prediction modeling
The basics of prediction modeling
Maarten van Smeden
 
How to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - StatsworkHow to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - Statswork
Stats Statswork
 
Is it causal, is it prediction or is it neither?
Is it causal, is it prediction or is it neither?Is it causal, is it prediction or is it neither?
Is it causal, is it prediction or is it neither?
Maarten van Smeden
 
Introduction to prediction modelling - Berlin 2018 - Part II
Introduction to prediction modelling - Berlin 2018 - Part IIIntroduction to prediction modelling - Berlin 2018 - Part II
Introduction to prediction modelling - Berlin 2018 - Part II
Maarten van Smeden
 
Evaluation of the clinical value of biomarkers for risk prediction
Evaluation of the clinical value of biomarkers for risk predictionEvaluation of the clinical value of biomarkers for risk prediction
Evaluation of the clinical value of biomarkers for risk prediction
Ewout Steyerberg
 
Thoughts on Machine Learning and Artificial Intelligence
Thoughts on Machine Learning and Artificial IntelligenceThoughts on Machine Learning and Artificial Intelligence
Thoughts on Machine Learning and Artificial Intelligence
Maarten van Smeden
 
Big Data Analytics for Healthcare
Big Data Analytics for HealthcareBig Data Analytics for Healthcare
Big Data Analytics for Healthcare
Chandan Reddy
 
Development and evaluation of prediction models: pitfalls and solutions
Development and evaluation of prediction models: pitfalls and solutionsDevelopment and evaluation of prediction models: pitfalls and solutions
Development and evaluation of prediction models: pitfalls and solutions
Maarten van Smeden
 
Day 1 (Lecture 3): Predictive Analytics in Healthcare
Day 1 (Lecture 3): Predictive Analytics in HealthcareDay 1 (Lecture 3): Predictive Analytics in Healthcare
Day 1 (Lecture 3): Predictive Analytics in Healthcare
Aseda Owusua Addai-Deseh
 
Introduction to prediction modelling - Berlin 2018 - Part I
Introduction to prediction modelling - Berlin 2018 - Part IIntroduction to prediction modelling - Berlin 2018 - Part I
Introduction to prediction modelling - Berlin 2018 - Part I
Maarten van Smeden
 
How to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - StatsworkHow to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - Statswork
Stats Statswork
 
QUANTIFYING THE IMPACT OF DIFFERENT APPROACHES FOR HANDLING CONTINUOUS PREDIC...
QUANTIFYING THE IMPACT OF DIFFERENT APPROACHES FOR HANDLING CONTINUOUS PREDIC...QUANTIFYING THE IMPACT OF DIFFERENT APPROACHES FOR HANDLING CONTINUOUS PREDIC...
QUANTIFYING THE IMPACT OF DIFFERENT APPROACHES FOR HANDLING CONTINUOUS PREDIC...
GaryCollins74
 
Why the EPV≥10 sample size rule is rubbish and what to use instead
Why the EPV≥10 sample size rule is rubbish and what to use instead Why the EPV≥10 sample size rule is rubbish and what to use instead
Why the EPV≥10 sample size rule is rubbish and what to use instead
Maarten van Smeden
 
Data science in health care
Data science in health careData science in health care
Data science in health care
Chetan Khanzode
 
Regression shrinkage: better answers to causal questions
Regression shrinkage: better answers to causal questionsRegression shrinkage: better answers to causal questions
Regression shrinkage: better answers to causal questions
Maarten van Smeden
 
Machine learning in health data analytics and pharmacovigilance
Machine learning in health data analytics and pharmacovigilanceMachine learning in health data analytics and pharmacovigilance
Machine learning in health data analytics and pharmacovigilance
Revathi Boyina
 

What's hot (20)

Clinical prediction models: development, validation and beyond
Clinical prediction models:development, validation and beyondClinical prediction models:development, validation and beyond
Clinical prediction models: development, validation and beyond
 
P-values in crisis
P-values in crisisP-values in crisis
P-values in crisis
 
Prediction, Big Data, and AI: Steyerberg, Basel Nov 1, 2019
Prediction, Big Data, and AI: Steyerberg, Basel Nov 1, 2019Prediction, Big Data, and AI: Steyerberg, Basel Nov 1, 2019
Prediction, Big Data, and AI: Steyerberg, Basel Nov 1, 2019
 
Clinical prediction models
Clinical prediction modelsClinical prediction models
Clinical prediction models
 
The basics of prediction modeling
The basics of prediction modeling The basics of prediction modeling
The basics of prediction modeling
 
How to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - StatsworkHow to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - Statswork
 
Is it causal, is it prediction or is it neither?
Is it causal, is it prediction or is it neither?Is it causal, is it prediction or is it neither?
Is it causal, is it prediction or is it neither?
 
Introduction to prediction modelling - Berlin 2018 - Part II
Introduction to prediction modelling - Berlin 2018 - Part IIIntroduction to prediction modelling - Berlin 2018 - Part II
Introduction to prediction modelling - Berlin 2018 - Part II
 
Evaluation of the clinical value of biomarkers for risk prediction
Evaluation of the clinical value of biomarkers for risk predictionEvaluation of the clinical value of biomarkers for risk prediction
Evaluation of the clinical value of biomarkers for risk prediction
 
Thoughts on Machine Learning and Artificial Intelligence
Thoughts on Machine Learning and Artificial IntelligenceThoughts on Machine Learning and Artificial Intelligence
Thoughts on Machine Learning and Artificial Intelligence
 
Big Data Analytics for Healthcare
Big Data Analytics for HealthcareBig Data Analytics for Healthcare
Big Data Analytics for Healthcare
 
Development and evaluation of prediction models: pitfalls and solutions
Development and evaluation of prediction models: pitfalls and solutionsDevelopment and evaluation of prediction models: pitfalls and solutions
Development and evaluation of prediction models: pitfalls and solutions
 
Day 1 (Lecture 3): Predictive Analytics in Healthcare
Day 1 (Lecture 3): Predictive Analytics in HealthcareDay 1 (Lecture 3): Predictive Analytics in Healthcare
Day 1 (Lecture 3): Predictive Analytics in Healthcare
 
Introduction to prediction modelling - Berlin 2018 - Part I
Introduction to prediction modelling - Berlin 2018 - Part IIntroduction to prediction modelling - Berlin 2018 - Part I
Introduction to prediction modelling - Berlin 2018 - Part I
 
How to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - StatsworkHow to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - Statswork
 
QUANTIFYING THE IMPACT OF DIFFERENT APPROACHES FOR HANDLING CONTINUOUS PREDIC...
QUANTIFYING THE IMPACT OF DIFFERENT APPROACHES FOR HANDLING CONTINUOUS PREDIC...QUANTIFYING THE IMPACT OF DIFFERENT APPROACHES FOR HANDLING CONTINUOUS PREDIC...
QUANTIFYING THE IMPACT OF DIFFERENT APPROACHES FOR HANDLING CONTINUOUS PREDIC...
 
Why the EPV≥10 sample size rule is rubbish and what to use instead
Why the EPV≥10 sample size rule is rubbish and what to use instead Why the EPV≥10 sample size rule is rubbish and what to use instead
Why the EPV≥10 sample size rule is rubbish and what to use instead
 
Data science in health care
Data science in health careData science in health care
Data science in health care
 
Regression shrinkage: better answers to causal questions
Regression shrinkage: better answers to causal questionsRegression shrinkage: better answers to causal questions
Regression shrinkage: better answers to causal questions
 
Machine learning in health data analytics and pharmacovigilance
Machine learning in health data analytics and pharmacovigilanceMachine learning in health data analytics and pharmacovigilance
Machine learning in health data analytics and pharmacovigilance
 

Similar to Str-AI-ght to heaven? Pitfalls for clinical decision support based on AI

Validation of Clinical Artificial Intelligence: Where We Are and Where We Are...
Validation of Clinical Artificial Intelligence: Where We Are and Where We Are...Validation of Clinical Artificial Intelligence: Where We Are and Where We Are...
Validation of Clinical Artificial Intelligence: Where We Are and Where We Are...
Sean Manion PhD
 
Lessons from COVID-19: How Are Data Science and AI Changing Future Biomedical...
Lessons from COVID-19: How Are Data Science and AI Changing Future Biomedical...Lessons from COVID-19: How Are Data Science and AI Changing Future Biomedical...
Lessons from COVID-19: How Are Data Science and AI Changing Future Biomedical...
Jake Chen
 
Clinical Research Informatics Year-in-Review - 2023
Clinical Research Informatics Year-in-Review - 2023Clinical Research Informatics Year-in-Review - 2023
Clinical Research Informatics Year-in-Review - 2023
Peter Embi
 
ai-in-healthcare-202011-201117103639.pptx
ai-in-healthcare-202011-201117103639.pptxai-in-healthcare-202011-201117103639.pptx
ai-in-healthcare-202011-201117103639.pptx
ssuser6b571f
 
20190820 deepest
20190820 deepest 20190820 deepest
20190820 deepest
Ryoungwoo Jang
 
AI in Healthcare
AI in HealthcareAI in Healthcare
AI in Healthcare
Paul Agapow
 
fnano-04-972421.pdf
fnano-04-972421.pdffnano-04-972421.pdf
fnano-04-972421.pdf
EverestTechnomania
 
The Randomized Controlled Trial: The Gold Standard of Clinical Science and a ...
The Randomized Controlled Trial: The Gold Standard of Clinical Science and a ...The Randomized Controlled Trial: The Gold Standard of Clinical Science and a ...
The Randomized Controlled Trial: The Gold Standard of Clinical Science and a ...
marcus evans Network
 
Νικόλαος Κουρεντζής, 8th MedTech Conference
Νικόλαος Κουρεντζής, 8th MedTech ConferenceΝικόλαος Κουρεντζής, 8th MedTech Conference
Νικόλαος Κουρεντζής, 8th MedTech Conference
Starttech Ventures
 
Decentralized trials white paper by Andaman7
Decentralized trials white paper by Andaman7Decentralized trials white paper by Andaman7
Decentralized trials white paper by Andaman7
Lio Naveau
 
Big data in research: possibilities and pitfalls
Big data in research: possibilities and pitfallsBig data in research: possibilities and pitfalls
Big data in research: possibilities and pitfalls
Joppe Nijman
 
Possibilities and pitfalls of AI in PICU
Possibilities and pitfalls of AI in PICUPossibilities and pitfalls of AI in PICU
Possibilities and pitfalls of AI in PICU
Joppe Nijman
 
인공지능 논문작성과 심사에관한요령
인공지능 논문작성과 심사에관한요령인공지능 논문작성과 심사에관한요령
인공지능 논문작성과 심사에관한요령
Namkug Kim
 
The impact of different sources of heterogeneity on loss of accuracy from gen...
The impact of different sources of heterogeneity on loss of accuracy from gen...The impact of different sources of heterogeneity on loss of accuracy from gen...
The impact of different sources of heterogeneity on loss of accuracy from gen...
Levi Waldron
 
ML, biomedical data & trust
ML, biomedical data & trustML, biomedical data & trust
ML, biomedical data & trust
Paul Agapow
 
Digital pathology in developing country
Digital pathology in developing countryDigital pathology in developing country
Digital pathology in developing country
Dr. Ashish lakhey
 
Research in the time of Covid: Surveying impacts on Early Career Researchers
Research in the time of Covid: Surveying impacts on Early Career ResearchersResearch in the time of Covid: Surveying impacts on Early Career Researchers
Research in the time of Covid: Surveying impacts on Early Career Researchers
Rebecca Grant
 
Where AI will (and won't) revolutionize biomedicine
Where AI will (and won't) revolutionize biomedicineWhere AI will (and won't) revolutionize biomedicine
Where AI will (and won't) revolutionize biomedicine
Paul Agapow
 
Deep learning in healthcare: Oppotunities and challenges with Electronic Medi...
Deep learning in healthcare: Oppotunities and challenges with Electronic Medi...Deep learning in healthcare: Oppotunities and challenges with Electronic Medi...
Deep learning in healthcare: Oppotunities and challenges with Electronic Medi...
Thien Q. Tran
 
Cancer tissue evaluation.pptx
Cancer tissue evaluation.pptxCancer tissue evaluation.pptx
Cancer tissue evaluation.pptx
KerenEvangelineI
 

Similar to Str-AI-ght to heaven? Pitfalls for clinical decision support based on AI (20)

Validation of Clinical Artificial Intelligence: Where We Are and Where We Are...
Validation of Clinical Artificial Intelligence: Where We Are and Where We Are...Validation of Clinical Artificial Intelligence: Where We Are and Where We Are...
Validation of Clinical Artificial Intelligence: Where We Are and Where We Are...
 
Lessons from COVID-19: How Are Data Science and AI Changing Future Biomedical...
Lessons from COVID-19: How Are Data Science and AI Changing Future Biomedical...Lessons from COVID-19: How Are Data Science and AI Changing Future Biomedical...
Lessons from COVID-19: How Are Data Science and AI Changing Future Biomedical...
 
Clinical Research Informatics Year-in-Review - 2023
Clinical Research Informatics Year-in-Review - 2023Clinical Research Informatics Year-in-Review - 2023
Clinical Research Informatics Year-in-Review - 2023
 
ai-in-healthcare-202011-201117103639.pptx
ai-in-healthcare-202011-201117103639.pptxai-in-healthcare-202011-201117103639.pptx
ai-in-healthcare-202011-201117103639.pptx
 
20190820 deepest
20190820 deepest 20190820 deepest
20190820 deepest
 
AI in Healthcare
AI in HealthcareAI in Healthcare
AI in Healthcare
 
fnano-04-972421.pdf
fnano-04-972421.pdffnano-04-972421.pdf
fnano-04-972421.pdf
 
The Randomized Controlled Trial: The Gold Standard of Clinical Science and a ...
The Randomized Controlled Trial: The Gold Standard of Clinical Science and a ...The Randomized Controlled Trial: The Gold Standard of Clinical Science and a ...
The Randomized Controlled Trial: The Gold Standard of Clinical Science and a ...
 
Νικόλαος Κουρεντζής, 8th MedTech Conference
Νικόλαος Κουρεντζής, 8th MedTech ConferenceΝικόλαος Κουρεντζής, 8th MedTech Conference
Νικόλαος Κουρεντζής, 8th MedTech Conference
 
Decentralized trials white paper by Andaman7
Decentralized trials white paper by Andaman7Decentralized trials white paper by Andaman7
Decentralized trials white paper by Andaman7
 
Big data in research: possibilities and pitfalls
Big data in research: possibilities and pitfallsBig data in research: possibilities and pitfalls
Big data in research: possibilities and pitfalls
 
Possibilities and pitfalls of AI in PICU
Possibilities and pitfalls of AI in PICUPossibilities and pitfalls of AI in PICU
Possibilities and pitfalls of AI in PICU
 
인공지능 논문작성과 심사에관한요령
인공지능 논문작성과 심사에관한요령인공지능 논문작성과 심사에관한요령
인공지능 논문작성과 심사에관한요령
 
The impact of different sources of heterogeneity on loss of accuracy from gen...
The impact of different sources of heterogeneity on loss of accuracy from gen...The impact of different sources of heterogeneity on loss of accuracy from gen...
The impact of different sources of heterogeneity on loss of accuracy from gen...
 
ML, biomedical data & trust
ML, biomedical data & trustML, biomedical data & trust
ML, biomedical data & trust
 
Digital pathology in developing country
Digital pathology in developing countryDigital pathology in developing country
Digital pathology in developing country
 
Research in the time of Covid: Surveying impacts on Early Career Researchers
Research in the time of Covid: Surveying impacts on Early Career ResearchersResearch in the time of Covid: Surveying impacts on Early Career Researchers
Research in the time of Covid: Surveying impacts on Early Career Researchers
 
Where AI will (and won't) revolutionize biomedicine
Where AI will (and won't) revolutionize biomedicineWhere AI will (and won't) revolutionize biomedicine
Where AI will (and won't) revolutionize biomedicine
 
Deep learning in healthcare: Oppotunities and challenges with Electronic Medi...
Deep learning in healthcare: Oppotunities and challenges with Electronic Medi...Deep learning in healthcare: Oppotunities and challenges with Electronic Medi...
Deep learning in healthcare: Oppotunities and challenges with Electronic Medi...
 
Cancer tissue evaluation.pptx
Cancer tissue evaluation.pptxCancer tissue evaluation.pptx
Cancer tissue evaluation.pptx
 

Recently uploaded

一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 

Recently uploaded (20)

一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(QU毕业证)皇后大学毕业证成绩单

Str-AI-ght to heaven? Pitfalls for clinical decision support based on AI

  • 1. Str-AI-ght to heaven? Pitfalls for clinical decision support based on AI
    Ben Van Calster
    Department Development and Regeneration and EPI-centre, KU Leuven
    Department Biomedical Data Sciences, LUMC Leiden
    Research Ethics Committee, UZ Leuven
    ben.vancalster@kuleuven.be; @BenVanCalster
    ISUOG World Congress, 16 October 2021
  • 2. Disclaimer
    • Talk last year: “a plea for good methodology”
    • This talk builds on that, in the context of AI and machine learning
    • There is a lot of hype surrounding AI/ML. It may have potential, but we had better start to get real!
    https://lawtomated.com/enough-with-the-a-i-hype-and-why/ (Lawtomated)
  • 3. Do not celebrate too early…
    Julian Alaphilippe, Liège-Bastogne-Liège (Oct. 4th, 2020). Real winner: Primož Roglič
    Copyright Bas Czerwinski / Getty Images
  • 4. Deep learning on medical images
    Topol. Nat Med 2019;25:44-56. Zhu et al. Front Neurol 2019;10:869.
    Titano et al. Nat Med 2018;24:1337-41; Nam et al. Radiology 2019;290:218-28; Ehteshami Bejnordi et al. JAMA 2017;318:2199-210; Esteva et al. Nature 2017;542:115-8; De Fauw et al. Nat Med 2018;24:1342-50; Raman et al. Eye 2019;33:97-109.
  • 5. Machine Learning for ‘EHR’ data
    Rajkomar et al. Npj Digit Med 2018;1:18.
    Rose. JAMA Netw Open 2018;1:e181404.
  • 6. Reason for popularity?
    “Very complex machine learning algorithms are highly flexible, and hence find relationships we could not see before. Therefore we make better predictions and better decisions.”
    → Guaranteed success! Right?
  • 7. Pitfalls for “predictive analytics”
    1. Poor methodology
    2. Lack of evidence
    3. Considerable heterogeneity
    4. (Financial) conflicts of interest
    5. Actual implementation in clinical practice
  • 8. 1. Methodology matters, not impact factors
    Altman DG. BMJ 1994;308:283-284.
    Van Calster et al, J Clin Epidemiol, in press.
    Altman. BMJ 1994. Our own frustration paper. JCE 2021.
  • 9. ‘Predictive analytics’: covid-19
    Wynants et al. BMJ 2020;369:m1328.
    The review found more than 1 paper a day (!)
    Results not trustworthy for 97% of the 231 models
    - Median sample size: 338
    - Non-representative sample: 42%
    - Representativity unclear: 25%
    - Data analysis problematic: 94%
    - No model validation at all: 22%
  • 10. Predictive analytics for covid-19
    Wynants et al. BMJ 2020;369:m1328
    Deep learning models for covid-19 diagnosis using CT or RX
    - No discussion of target population or setting
    - Control group (without covid-19):
      • Images from pediatric population
      • Images from a different country
      • Images from different time periods
      • Barely defined, e.g. ‘healthy persons’
    - Images from online repository, without further information
    - Often not any demographic description (not even age or sex!)
  • 11. Covid-19 deep learning: deep failure!
    Roberts et al. Nat Mach Intell 2021;3:199-217.
  • 12. Public covid-19 RX datasets
    Santa Cruz et al. Med Image Analysis 2021;74:102225.
  • 13. Complex algorithms are data hungry
    So you dream of having a Porsche? If you cannot (or don’t want to) pay for it, you may get this...
    This also holds for predictive analytics. More fancy model? More expensive. Currency: GOOD data.
  • 14. Measurement and data quality
    Missing values: the tricky importance of the invisible
    Measurement: timing and procedure matter
    Outcome: quality labels are key (see e.g. deep learning on medical images)
    Beam & Kohane. JAMA 2018;319:1317-1318.
  • 15. 2. Wanted: evidence
    • Kleinrouweler (AJOG 2016): 263 models in obstetrics
    • Only 23 of these (9%) had been externally validated…
    Other examples of model overload:
    • 1060 models predicting outcomes after CVD (1990-2015) (Wessler et al, 2017)
    • 363 models predicting CVD (Damen et al, 2016)
    • 231 models related to Covid-19 (Wynants et al, 2020), and counting!
    • 116 models to diagnose ovarian malignancy (Kaijser et al, 2014)
    Wessler et al. Diagn Progn Res 2017;1:20. Damen et al. BMJ 2016;353:i2416. Wynants et al. BMJ 2020;369:m1328. Kleinrouweler et al. AJOG 2016;214:79-90. Kaijser et al. Hum Reprod Update 2014;20:229-62.
  • 16. Smartphone apps for skin lesions
    Freeman et al. BMJ 2020;368:m127
    • 9 validation studies covering 6 apps
    • 1132 lesions in total (average 126 per study)
    • Methodological quality was poor
      o Selective inclusions (non-representative)
      o Images were taken and selected by clinicians
      o Lots of unusable images
    Scarce and poor evidence
  • 17. Radiology AI
    Van Leeuwen et al. Eur Radiol 2021;31:3797-3804
    • 64/100: no evidence
    • 18/100: evidence of diagnostic performance
    • 18/100: evidence of potential impact
    • Half of the studies were independent, the other half had conflicts of interest
  • 18. 3. Expect (a lot of) heterogeneity
    • Changes in care over time
    • Differences in care between healthcare systems
    • Differences in populations between practices/hospitals/regions
    • Differences in hardware, software, and measurement procedures
    • Differences in performance between patient subgroups (cf. fairness)
    Futoma et al. Lancet Digit Health 2020;2:e489-e492.
  • 20. Procedural heterogeneity
    Agniel et al. BMJ 2018;360:k1479.
  • 21. Hardware/software
    Badgeley et al. npj Digit Med 2019;2:31.
    Deep learning was better at predicting scanner model and brand (AUC ≥ 0.98) than at predicting hip fracture (AUC 0.78)
  • 22. Where do DL datasets come from anyway?
    Kaushal et al. JAMA 2020;324:1212-1213.
  • 23. Implications?
    Van Calster et al. BMC Med 2019;17:230.
    THERE IS NO SUCH THING AS A ‘VALIDATED’ MODEL
  • 24. DL research (Sep 2021)
    Perkonigg et al. Nat Comm 2021;12:5678.
  • 25. 4. Proprietary datasets and models
    Van Calster et al. JAMIA 2019;26:1651-1654.
    https://hai.stanford.edu/news/flying-dark-hospital-ai-tools-arent-well-documented
    Not necessarily bad in principle: financial resources are needed
    But it may hamper openness, availability, independent validation
    COVID review: companies often did not react, but claimed that the model was used on thousands of patients
  • 26. Google’s Dermatology Assist (CE label)
    https://www.bbc.com/news/technology-57157566 (May 18th, 2021)
  • 27. Google’s Dermatology Assist (CE label)
    https://www.statnews.com/2021/06/02/machine-learning-ai-methodology-research-flaws/
    Roxana Daneshjou (Stanford):
    - No evaluation on external dataset.
    - Insufficient variation in skin types.
    - Outcome rarely based on biopsy.
    - “I haven't seen data that makes me feel comfortable with putting this in the hands of patients or physicians.”
  • 28. External validation of EPIC sepsis model
    Wong et al. JAMA Intern Med 2021;181:1065-1070.
    Model: penalized logistic regression with 80 variables
    Data: 3 healthcare organizations, 2013-2015
    AUC according to internal documentation: 0.78-0.83
    Validation: 1 academic center, 2018-2019
    AUC 0.63, calibration poor (risks way too high)
  • 29. 5. Actual implementation
    Logistical/practical issues to fit model in clinical workflow
    Psychological issues regarding model use by healthcare staff
    Medicolegal: Who is responsible when prediction is wrong?
    https://www.statnews.com/2020/03/09/can-you-sue-artificial-intelligence-algorithm-for-malpractice/
    Panch et al. npj Digit Med 2019;2:77.
  • 30. Lack of evidence revisited: impact?
    Clinical impact studies: scarce, difficult
    Clinical decision support is a complex intervention (Kappen et al, 2018)
    Endpoints of impact studies?
    - Process-related: ‘easy’, but intermediate
    - Long-term patient outcomes: difficult, lower effect sizes expected
    Kappen et al. Diagn Progn Res 2018.
  • 31. So, does medical AI ‘work’?
    We still often don’t know!
    Trust jeopardized by
    - poor methodology
    - lack of evidence
    - lack of openness.
    It may have potential if done well and evidence is gathered.
    The AI community / academia often shoots itself in the foot, which is a pity.
    Academia: wrong incentives (publish or perish)!
    Companies: financial conflicts of interest!
  • 32. That’s (not) all folks…
    https://www.technologyreview.com/2019/06/06/239031/training-a-single-ai-model-can-emit-as-much-carbon-as-five-cars-in-their-lifetimes/
    https://spectrum.ieee.org/deep-learning-computational-cost
    Thompson et al. IEEE Spectrum 2021. Hao. MIT Technology Review 2019.
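
Several slides quote discrimination as an AUC and describe calibration as "risks way too high" (for example, the sepsis model validation on slide 28). As a rough illustration of what those two checks measure in an external validation, here is a minimal Python sketch. The function names, the toy outcomes and the predicted risks are all made up for illustration and are not taken from any of the cited studies; the observed/expected ratio is just one simple calibration summary, not necessarily the one the cited authors used.

```python
def auc(outcomes, risks):
    """Probability that a randomly chosen event received a higher predicted
    risk than a randomly chosen non-event (ties count as one half)."""
    events = [r for y, r in zip(outcomes, risks) if y == 1]
    non_events = [r for y, r in zip(outcomes, risks) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    wins = sum(
        1.0 if e > n else 0.5 if e == n else 0.0
        for e in events
        for n in non_events
    )
    return wins / (len(events) * len(non_events))


def observed_expected_ratio(outcomes, risks):
    """Observed number of events divided by the sum of predicted risks.
    A value below 1 means risks are overestimated on average."""
    return sum(outcomes) / sum(risks)


# Hypothetical validation data: 10 patients, 3 events (illustrative only).
outcomes = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
risks = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.5, 0.8, 0.9, 0.6]

print(f"AUC: {auc(outcomes, risks):.2f}")
print(f"O/E ratio: {observed_expected_ratio(outcomes, risks):.2f}")
```

In this toy example the model separates events from non-events only moderately, and the predicted risks add up to more events than were observed, which is the "risks too high" pattern. A real validation would also look at a calibration curve and at performance per center or subgroup, given the heterogeneity discussed on slide 18.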