SlideShare a Scribd company logo
1 of 56
Download to read offline
March 2024, Aachen
Validation and Trustworthiness of AI based
Predictions
Ewout W. Steyerberg, PhD
Professor of Clinical Biostatistics and
Medical Decision Making
Dept of Biomedical Data Sciences
Leiden University Medical Center
Thanks to many for assistance and inspiration
Prediction
“10% risk”
Clinical prediction models
25-Mar-24
3 www.cancercalculators.org
25-Mar-24
4 Insert > Header & footer
Validated  Trustworthy?
25-Mar-24
5 Insert > Header & footer
Data preparation
Veldnorm Medische AI
6
Predictive algorithms: Medical AI
Phase 1
Development AI
algorithm
Phase 2
Validation AI
algorithm
Phase 3
Software
environment
Phase 4
Impact
assessment
Phase 5
Implementation in
medical practice
Phase 6
1 2 3 4 5 6
Trustworthiness of predictions
Mathematical models needed:
• Complex processes
• No simple prediction via a deterministic theory
Modelling assumptions:
• Generally false
• Intelligent guesswork
Medical prediction: y ~ X
Topics: trustworthy predictions
1. What do we need for individual patients?
• Internal vs external validity
• Calibration vs discrimination
2. Trustworthy processes to build a prediction model?
• AI
• Humans
• Requirements
3. Types of uncertainty
• Statistical aspects
• Model uncertainty
• Heterogeneity between contexts of practical application
Validated = trustworthy?
• Classic:
• No validation, only internal  low ranking journal
• 1 convincing validation  top journal
• Modern
• Substantial heterogeneity in performance
 There is no such thing as a validated model
Example in neurotrauma, external validation
25-Mar-24
10 Insert > Header & footer
Modification of val.prob() in rms; val.prob.ci.2()
Steyerberg et al, PLoS Med 2008
well calibrated over prediction
Very heterogenous validations
25-Mar-24
11 Insert > Header & footer
Perspective at validation
25-Mar-24
12 Insert > Header & footer
Summary validation
• Internal: minimum, same underlying population
• External:
• Temporal
• Geographic
• Domain
• Efficient: internal-external cross-validation
Performance assessment
• What is the most commonly reported measure for performance of
prediction models?
• Area under the Receiver Operating Characteristic curve (AUC), or
concordance (c) statistic
• Discrimination = spread of predictions between individuals
• Higher if better predictors in model
• Higher --> more trustworthy??
25-Mar-24
15 Insert > Header & footer
Trustworthiness for individuals
• Calibration = reliability of predictions per individual
25-Mar-24
17 Insert > Header & footer
Modification of val.prob() in rms; val.prob.ci.2()
Steyerberg et al, PLoS Med 2008
well calibrated
Trustworthiness for individuals
• Calibration = reliability of predictions per individual
• True risk estimates UTOPIAN
• Calibration underreported
Topics: trustworthy predictions
1. What do we need for individual patients?
• Internal vs external validity
• Calibration vs discrimination
2. Trustworthy processes to build a prediction model?
• AI
• Humans
3. Types of uncertainty
• Statistical aspects
• Model uncertainty
• Heterogeneity between contexts of practical application
Trustworthiness of ChatGPT
ChatGPT: may be hallucinating
• Simple calculations:
Trustworthiness and AI
• Relation to evidence base
• Other popular terms related to AI
• Fairness
• Equity
 Let’s ask ChatGPT
Trustworthy models?
• Modeling flexibility: friend or foe?
Human oversight on:
• Classical modeling: selection of predictors; nonlinearity; interactions
• AI: hyperparameters; technique CART / RF / XGBoost / nnet / ..
Trustworthiness: poor for human modelers
Red cards and dark skin soccer players
https://psyarxiv.com/qkwst/
25-Mar-24
27 Insert > Header & footer
• 29 teams involving 61 analysts; same dataset; same research question:
whether soccer referees are more likely to give red cards to dark skin
toned players than light skin toned players
• Estimated odds ratios 0.89 –2.93 (median 1.3)
• 20 teams: statistically significant positive effect, 9: non-significant relation
25-Mar-24
28 Insert > Header & footer
Estimated odds ratios by 29 research teams
25-Mar-24
29 Insert > Header & footer
“Logistic regression”
25-Mar-24
30 Insert > Header & footer
Claimed trust in results
25-Mar-24
31 Insert > Header & footer
Trustworthiness: poor for human modelers
• 29 teams involving 61 analysts; same dataset; same research question:
whether soccer referees are more likely to give red cards to dark skin toned
players than light skin toned players
• Estimated odds ratios 0.89 –2.93 (median 1.3).
• 20 teams: statistically significant positive effect, 9: non-significant relation.
• 21 unique combinations of covariates
• “Variation in analysis of complex data may be difficult to
avoid, even by experts with honest intentions”
25-Mar-24
32 Insert > Header & footer
Explainable to humans = trustworthy?
• Explainable AI
• Algorithm trustworthy: if predictions are based on factors that are
acceptable to domain experts instead of on ‘spurious correlations’
• SHAP (SHapley Additive exPlanations) values
“By using SHAP values, researchers and practitioners can gain a
deeper understanding of how different features influence model
predictions, leading to improved model interpretability and trust.”
ChatGPT3.5
Topics: trustworthy predictions
1. What do we need for individual patients?
• Internal vs external validity
• Calibration vs discrimination
2. Trustworthy processes to build a prediction model?
• AI
• Humans
3. Types of uncertainty
• Statistical aspects
• Model uncertainty
• Heterogeneity between contexts of practical application
Approaches to uncertainty quantification
• Sample size
• specifically #events for binary outcome prediction
• ‘patients like you’ and exceptionality
• For risk communication: aleatory uncertainty
• For uncertainty communication: epistemic
Example on presentation by ‘the king of nomograms’
25-Mar-24
36 Insert > Header & footer
25-Mar-24
37 Insert > Header & footer
N = 100?
25-Mar-24
38 Insert > Header & footer
GUSTO data, n=1200
(out of 40,830)
Uncertainty for rare, strong predictor: SHOCK
25-Mar-24
39 Insert > Header & footer
Predictions with and without SHOCK in the model
25-Mar-24
40 Insert > Header & footer
AI: arXiv paper
“Trustworthiness expresses whether a prediction is aligned with the
train data”
• Distance between the new patient and similar patients from the
training data estimates the trustworthiness of a prediction;
resembles the reference data
• Prediction for the new patient is close to the ground truth of the
neighbors in the training set.
Claim
”Effective N is an attractive concept to address epistemiologic uncertainty”
• Analytic solutions for regression models
• Minimum certainty, say, n>10, for model specification: selection / shrinkage?
• Approximate solutions for machine learning models
• Bootstrap
• Effective N: conditional on the model
25-Mar-24
42 Insert > Header & footer
Model uncertainty
• 246 biologists modeling
• 61 analysts in 29 teams on the Red Card problem
• …
• Comparisons between classic vs machine learning
Systematic review
25-Mar-24
44 Insert > Header & footer
Differences in discrimination
Topics: trustworthy predictions
1. What do we need for individual patients?
• Internal vs external validity
• Calibration vs discrimination
2. Trustworthy processes to build a prediction model?
• AI
• Humans
3. Types of uncertainty
• Statistical aspects
• Model uncertainty
• Heterogeneity between contexts of practical application
Heterogeneity
• Study design
• Selection of subjects
• Disease domain
• Measurement of covariates
• Measurement of outcomes
• Associations of covariates with outcome
• Overall outcome rates
Heterogeneity in performance
Steyerberg et al, PLoS Med 2008
Dijkland S et al; J Neurotrauma 2019
25-Mar-24
50 Insert > Header & footer
15 cohorts: 11 RCTs, 4 Observational studies
25-Mar-24
51 Insert > Header & footer
Heterogeneity in case-mix
25-Mar-24
52 Insert > Header & footer
Heterogeneity in predictor effects
25-Mar-24
53 Insert > Header & footer
Heterogeneity in predictions
25-Mar-24
54 Insert > Header & footer
Heterogeneity in individual predictions
25-Mar-24
55 Insert > Header & footer
Conclusions on trustworthy predictions
• Epistemic uncertainty: under the influence of the modeler
• Larger sample sizes
• Modest modeling, limit flexibility
• Heterogeneity: assess differences between settings
• Study design
• Distribution and effects of covariates
• Differences between predictions
• Model predictions suffer from multiple sources of uncertainty
• Transparency: for policy makers / physicians / patients
• Context dependency: Local versus global models
25-Mar-24
56 Insert > Header & footer

More Related Content

Similar to Trustworthiness of AI based predictions Aachen 2024

Myths and Realities of Psychometric Testing
Myths and Realities of Psychometric TestingMyths and Realities of Psychometric Testing
Myths and Realities of Psychometric TestingOPRA Psychology Group
 
Responsible AI in Industry (ICML 2021 Tutorial)
Responsible AI in Industry (ICML 2021 Tutorial)Responsible AI in Industry (ICML 2021 Tutorial)
Responsible AI in Industry (ICML 2021 Tutorial)Krishnaram Kenthapadi
 
Healthcare Analytics Adoption Model
Healthcare Analytics Adoption ModelHealthcare Analytics Adoption Model
Healthcare Analytics Adoption ModelHealth Catalyst
 
Hima_Lakkaraju_XAI_ShortCourse.pptx
Hima_Lakkaraju_XAI_ShortCourse.pptxHima_Lakkaraju_XAI_ShortCourse.pptx
Hima_Lakkaraju_XAI_ShortCourse.pptxPhanThDuy
 
Microsoft: A Waking Giant In Healthcare Analytics and Big Data
Microsoft: A Waking Giant In Healthcare Analytics and Big DataMicrosoft: A Waking Giant In Healthcare Analytics and Big Data
Microsoft: A Waking Giant In Healthcare Analytics and Big DataHealth Catalyst
 
How to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - StatsworkHow to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - StatsworkStats Statswork
 
Machine Learning and Prediction in Medicine
Machine Learning and Prediction in MedicineMachine Learning and Prediction in Medicine
Machine Learning and Prediction in MedicineChad You
 
Data Science for Business Managers - An intro to ROI for predictive analytics
Data Science for Business Managers - An intro to ROI for predictive analyticsData Science for Business Managers - An intro to ROI for predictive analytics
Data Science for Business Managers - An intro to ROI for predictive analyticsAkin Osman Kazakci
 
An intelligent approach to demand forecasting
An intelligent approach to demand forecastingAn intelligent approach to demand forecasting
An intelligent approach to demand forecastingNimai Chand Das Adhikari
 
Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021)
Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021)Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021)
Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021)Krishnaram Kenthapadi
 
Developing and validating statistical models for clinical prediction and prog...
Developing and validating statistical models for clinical prediction and prog...Developing and validating statistical models for clinical prediction and prog...
Developing and validating statistical models for clinical prediction and prog...Evangelos Kritsotakis
 
Data Science Salon: nterpretable Predictive Models in the Healthcare Domain
Data Science Salon: nterpretable Predictive Models in the Healthcare DomainData Science Salon: nterpretable Predictive Models in the Healthcare Domain
Data Science Salon: nterpretable Predictive Models in the Healthcare DomainFormulatedby
 
The End of the Drug Development Casino?
The End of the Drug Development Casino?The End of the Drug Development Casino?
The End of the Drug Development Casino?Paul Agapow
 
Machine Learning in Modern Medicine with Erin LeDell at Stanford Med
Machine Learning in Modern Medicine with Erin LeDell at Stanford MedMachine Learning in Modern Medicine with Erin LeDell at Stanford Med
Machine Learning in Modern Medicine with Erin LeDell at Stanford MedSri Ambati
 
Does big data = big insights?
Does big data = big insights?Does big data = big insights?
Does big data = big insights?Colin Strong
 
Open Data Science Conference (ODSC) Medical AI Webinar
Open Data Science Conference (ODSC) Medical AI WebinarOpen Data Science Conference (ODSC) Medical AI Webinar
Open Data Science Conference (ODSC) Medical AI WebinarSina Bari, M.D.
 
Microsoft: A Waking Giant in Healthcare Analytics and Big Data
Microsoft: A Waking Giant in Healthcare Analytics and Big DataMicrosoft: A Waking Giant in Healthcare Analytics and Big Data
Microsoft: A Waking Giant in Healthcare Analytics and Big DataDale Sanders
 
Shift AI 2020: How to identify and treat biases in ML Models | Navdeep Sharma...
Shift AI 2020: How to identify and treat biases in ML Models | Navdeep Sharma...Shift AI 2020: How to identify and treat biases in ML Models | Navdeep Sharma...
Shift AI 2020: How to identify and treat biases in ML Models | Navdeep Sharma...Shift Conference
 

Similar to Trustworthiness of AI based predictions Aachen 2024 (20)

Myths and Realities of Psychometric Testing
Myths and Realities of Psychometric TestingMyths and Realities of Psychometric Testing
Myths and Realities of Psychometric Testing
 
Elsevier
ElsevierElsevier
Elsevier
 
Responsible AI in Industry (ICML 2021 Tutorial)
Responsible AI in Industry (ICML 2021 Tutorial)Responsible AI in Industry (ICML 2021 Tutorial)
Responsible AI in Industry (ICML 2021 Tutorial)
 
Healthcare Analytics Adoption Model
Healthcare Analytics Adoption ModelHealthcare Analytics Adoption Model
Healthcare Analytics Adoption Model
 
Hima_Lakkaraju_XAI_ShortCourse.pptx
Hima_Lakkaraju_XAI_ShortCourse.pptxHima_Lakkaraju_XAI_ShortCourse.pptx
Hima_Lakkaraju_XAI_ShortCourse.pptx
 
Microsoft: A Waking Giant In Healthcare Analytics and Big Data
Microsoft: A Waking Giant In Healthcare Analytics and Big DataMicrosoft: A Waking Giant In Healthcare Analytics and Big Data
Microsoft: A Waking Giant In Healthcare Analytics and Big Data
 
How to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - StatsworkHow to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - Statswork
 
Machine Learning and Prediction in Medicine
Machine Learning and Prediction in MedicineMachine Learning and Prediction in Medicine
Machine Learning and Prediction in Medicine
 
Data Science for Business Managers - An intro to ROI for predictive analytics
Data Science for Business Managers - An intro to ROI for predictive analyticsData Science for Business Managers - An intro to ROI for predictive analytics
Data Science for Business Managers - An intro to ROI for predictive analytics
 
An intelligent approach to demand forecasting
An intelligent approach to demand forecastingAn intelligent approach to demand forecasting
An intelligent approach to demand forecasting
 
Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021)
Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021)Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021)
Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021)
 
Developing and validating statistical models for clinical prediction and prog...
Developing and validating statistical models for clinical prediction and prog...Developing and validating statistical models for clinical prediction and prog...
Developing and validating statistical models for clinical prediction and prog...
 
Data Science Salon: nterpretable Predictive Models in the Healthcare Domain
Data Science Salon: nterpretable Predictive Models in the Healthcare DomainData Science Salon: nterpretable Predictive Models in the Healthcare Domain
Data Science Salon: nterpretable Predictive Models in the Healthcare Domain
 
The End of the Drug Development Casino?
The End of the Drug Development Casino?The End of the Drug Development Casino?
The End of the Drug Development Casino?
 
Machine Learning in Modern Medicine with Erin LeDell at Stanford Med
Machine Learning in Modern Medicine with Erin LeDell at Stanford MedMachine Learning in Modern Medicine with Erin LeDell at Stanford Med
Machine Learning in Modern Medicine with Erin LeDell at Stanford Med
 
Data science 101
Data science 101Data science 101
Data science 101
 
Does big data = big insights?
Does big data = big insights?Does big data = big insights?
Does big data = big insights?
 
Open Data Science Conference (ODSC) Medical AI Webinar
Open Data Science Conference (ODSC) Medical AI WebinarOpen Data Science Conference (ODSC) Medical AI Webinar
Open Data Science Conference (ODSC) Medical AI Webinar
 
Microsoft: A Waking Giant in Healthcare Analytics and Big Data
Microsoft: A Waking Giant in Healthcare Analytics and Big DataMicrosoft: A Waking Giant in Healthcare Analytics and Big Data
Microsoft: A Waking Giant in Healthcare Analytics and Big Data
 
Shift AI 2020: How to identify and treat biases in ML Models | Navdeep Sharma...
Shift AI 2020: How to identify and treat biases in ML Models | Navdeep Sharma...Shift AI 2020: How to identify and treat biases in ML Models | Navdeep Sharma...
Shift AI 2020: How to identify and treat biases in ML Models | Navdeep Sharma...
 

Recently uploaded

Monoclonal antibody production by hybridoma technology
Monoclonal antibody production by hybridoma technologyMonoclonal antibody production by hybridoma technology
Monoclonal antibody production by hybridoma technologyHasnat Tariq
 
The next social challenge to public health: the information environment.pptx
The next social challenge to public health:  the information environment.pptxThe next social challenge to public health:  the information environment.pptx
The next social challenge to public health: the information environment.pptxTina Purnat
 
Radiation Dosimetry Parameters and Isodose Curves.pptx
Radiation Dosimetry Parameters and Isodose Curves.pptxRadiation Dosimetry Parameters and Isodose Curves.pptx
Radiation Dosimetry Parameters and Isodose Curves.pptxDr. Dheeraj Kumar
 
Giftedness: Understanding Everyday Neurobiology for Self-Knowledge
Giftedness: Understanding Everyday Neurobiology for Self-KnowledgeGiftedness: Understanding Everyday Neurobiology for Self-Knowledge
Giftedness: Understanding Everyday Neurobiology for Self-Knowledgeassessoriafabianodea
 
Introduction to Sports Injuries by- Dr. Anjali Rai
Introduction to Sports Injuries by- Dr. Anjali RaiIntroduction to Sports Injuries by- Dr. Anjali Rai
Introduction to Sports Injuries by- Dr. Anjali RaiGoogle
 
LESSON PLAN ON fever.pdf child health nursing
LESSON PLAN ON fever.pdf child health nursingLESSON PLAN ON fever.pdf child health nursing
LESSON PLAN ON fever.pdf child health nursingSakthi Kathiravan
 
Tans femoral Amputee : Prosthetics Knee Joints.pptx
Tans femoral Amputee : Prosthetics Knee Joints.pptxTans femoral Amputee : Prosthetics Knee Joints.pptx
Tans femoral Amputee : Prosthetics Knee Joints.pptxKezaiah S
 
Presentation on General Anesthetics pdf.
Presentation on General Anesthetics pdf.Presentation on General Anesthetics pdf.
Presentation on General Anesthetics pdf.Prerana Jadhav
 
Nutrition of OCD for my Nutritional Neuroscience Class
Nutrition of OCD for my Nutritional Neuroscience ClassNutrition of OCD for my Nutritional Neuroscience Class
Nutrition of OCD for my Nutritional Neuroscience Classmanuelazg2001
 
ANTI-DIABETICS DRUGS - PTEROCARPUS AND GYMNEMA
ANTI-DIABETICS DRUGS - PTEROCARPUS AND GYMNEMAANTI-DIABETICS DRUGS - PTEROCARPUS AND GYMNEMA
ANTI-DIABETICS DRUGS - PTEROCARPUS AND GYMNEMADivya Kanojiya
 
History and Development of Pharmacovigilence.pdf
History and Development of Pharmacovigilence.pdfHistory and Development of Pharmacovigilence.pdf
History and Development of Pharmacovigilence.pdfSasikiranMarri
 
Basic principles involved in the traditional systems of medicine PDF.pdf
Basic principles involved in the traditional systems of medicine PDF.pdfBasic principles involved in the traditional systems of medicine PDF.pdf
Basic principles involved in the traditional systems of medicine PDF.pdfDivya Kanojiya
 
Measurement of Radiation and Dosimetric Procedure.pptx
Measurement of Radiation and Dosimetric Procedure.pptxMeasurement of Radiation and Dosimetric Procedure.pptx
Measurement of Radiation and Dosimetric Procedure.pptxDr. Dheeraj Kumar
 
Statistical modeling in pharmaceutical research and development.
Statistical modeling in pharmaceutical research and development.Statistical modeling in pharmaceutical research and development.
Statistical modeling in pharmaceutical research and development.ANJALI
 
World-Health-Day-2024-My-Health-My-Right.pptx
World-Health-Day-2024-My-Health-My-Right.pptxWorld-Health-Day-2024-My-Health-My-Right.pptx
World-Health-Day-2024-My-Health-My-Right.pptxEx WHO/USAID
 
CEHPALOSPORINS.pptx By Harshvardhan Dev Bhoomi Uttarakhand University
CEHPALOSPORINS.pptx By Harshvardhan Dev Bhoomi Uttarakhand UniversityCEHPALOSPORINS.pptx By Harshvardhan Dev Bhoomi Uttarakhand University
CEHPALOSPORINS.pptx By Harshvardhan Dev Bhoomi Uttarakhand UniversityHarshChauhan475104
 
systemic bacteriology (7)............pptx
systemic bacteriology (7)............pptxsystemic bacteriology (7)............pptx
systemic bacteriology (7)............pptxEyobAlemu11
 
PULMONARY EDEMA AND ITS MANAGEMENT.pdf
PULMONARY EDEMA AND  ITS  MANAGEMENT.pdfPULMONARY EDEMA AND  ITS  MANAGEMENT.pdf
PULMONARY EDEMA AND ITS MANAGEMENT.pdfDolisha Warbi
 
epilepsy and status epilepticus for undergraduate.pptx
epilepsy and status epilepticus  for undergraduate.pptxepilepsy and status epilepticus  for undergraduate.pptx
epilepsy and status epilepticus for undergraduate.pptxMohamed Rizk Khodair
 
SYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptx
SYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptxSYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptx
SYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptxdrashraf369
 

Recently uploaded (20)

Monoclonal antibody production by hybridoma technology
Monoclonal antibody production by hybridoma technologyMonoclonal antibody production by hybridoma technology
Monoclonal antibody production by hybridoma technology
 
The next social challenge to public health: the information environment.pptx
The next social challenge to public health:  the information environment.pptxThe next social challenge to public health:  the information environment.pptx
The next social challenge to public health: the information environment.pptx
 
Radiation Dosimetry Parameters and Isodose Curves.pptx
Radiation Dosimetry Parameters and Isodose Curves.pptxRadiation Dosimetry Parameters and Isodose Curves.pptx
Radiation Dosimetry Parameters and Isodose Curves.pptx
 
Giftedness: Understanding Everyday Neurobiology for Self-Knowledge
Giftedness: Understanding Everyday Neurobiology for Self-KnowledgeGiftedness: Understanding Everyday Neurobiology for Self-Knowledge
Giftedness: Understanding Everyday Neurobiology for Self-Knowledge
 
Introduction to Sports Injuries by- Dr. Anjali Rai
Introduction to Sports Injuries by- Dr. Anjali RaiIntroduction to Sports Injuries by- Dr. Anjali Rai
Introduction to Sports Injuries by- Dr. Anjali Rai
 
LESSON PLAN ON fever.pdf child health nursing
LESSON PLAN ON fever.pdf child health nursingLESSON PLAN ON fever.pdf child health nursing
LESSON PLAN ON fever.pdf child health nursing
 
Tans femoral Amputee : Prosthetics Knee Joints.pptx
Tans femoral Amputee : Prosthetics Knee Joints.pptxTans femoral Amputee : Prosthetics Knee Joints.pptx
Tans femoral Amputee : Prosthetics Knee Joints.pptx
 
Presentation on General Anesthetics pdf.
Presentation on General Anesthetics pdf.Presentation on General Anesthetics pdf.
Presentation on General Anesthetics pdf.
 
Nutrition of OCD for my Nutritional Neuroscience Class
Nutrition of OCD for my Nutritional Neuroscience ClassNutrition of OCD for my Nutritional Neuroscience Class
Nutrition of OCD for my Nutritional Neuroscience Class
 
ANTI-DIABETICS DRUGS - PTEROCARPUS AND GYMNEMA
ANTI-DIABETICS DRUGS - PTEROCARPUS AND GYMNEMAANTI-DIABETICS DRUGS - PTEROCARPUS AND GYMNEMA
ANTI-DIABETICS DRUGS - PTEROCARPUS AND GYMNEMA
 
History and Development of Pharmacovigilence.pdf
History and Development of Pharmacovigilence.pdfHistory and Development of Pharmacovigilence.pdf
History and Development of Pharmacovigilence.pdf
 
Basic principles involved in the traditional systems of medicine PDF.pdf
Basic principles involved in the traditional systems of medicine PDF.pdfBasic principles involved in the traditional systems of medicine PDF.pdf
Basic principles involved in the traditional systems of medicine PDF.pdf
 
Measurement of Radiation and Dosimetric Procedure.pptx
Measurement of Radiation and Dosimetric Procedure.pptxMeasurement of Radiation and Dosimetric Procedure.pptx
Measurement of Radiation and Dosimetric Procedure.pptx
 
Statistical modeling in pharmaceutical research and development.
Statistical modeling in pharmaceutical research and development.Statistical modeling in pharmaceutical research and development.
Statistical modeling in pharmaceutical research and development.
 
World-Health-Day-2024-My-Health-My-Right.pptx
World-Health-Day-2024-My-Health-My-Right.pptxWorld-Health-Day-2024-My-Health-My-Right.pptx
World-Health-Day-2024-My-Health-My-Right.pptx
 
CEHPALOSPORINS.pptx By Harshvardhan Dev Bhoomi Uttarakhand University
CEHPALOSPORINS.pptx By Harshvardhan Dev Bhoomi Uttarakhand UniversityCEHPALOSPORINS.pptx By Harshvardhan Dev Bhoomi Uttarakhand University
CEHPALOSPORINS.pptx By Harshvardhan Dev Bhoomi Uttarakhand University
 
systemic bacteriology (7)............pptx
systemic bacteriology (7)............pptxsystemic bacteriology (7)............pptx
systemic bacteriology (7)............pptx
 
PULMONARY EDEMA AND ITS MANAGEMENT.pdf
PULMONARY EDEMA AND  ITS  MANAGEMENT.pdfPULMONARY EDEMA AND  ITS  MANAGEMENT.pdf
PULMONARY EDEMA AND ITS MANAGEMENT.pdf
 
epilepsy and status epilepticus for undergraduate.pptx
epilepsy and status epilepticus  for undergraduate.pptxepilepsy and status epilepticus  for undergraduate.pptx
epilepsy and status epilepticus for undergraduate.pptx
 
SYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptx
SYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptxSYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptx
SYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptx
 

Trustworthiness of AI based predictions Aachen 2024

  • 1. March 2024, Aachen Validation and Trustworthiness of AI based Predictions Ewout W. Steyerberg, PhD Professor of Clinical Biostatistics and Medical Decision Making Dept of Biomedical Data Sciences Leiden University Medical Center Thanks to many for assistance and inspiration
  • 3. Clinical prediction models 25-Mar-24 3 www.cancercalculators.org
  • 4. 25-Mar-24 4 Insert > Header & footer
  • 5. Validated  Trustworthy? 25-Mar-24 5 Insert > Header & footer
  • 6. Data preparation Veldnorm Medische AI 6 Predictive algorithms: Medical AI Phase 1 Development AI algorithm Phase 2 Validation AI algorithm Phase 3 Software environment Phase 4 Impact assessment Phase 5 Implementation in medical practice Phase 6 1 2 3 4 5 6
  • 7. Trustworthiness of predictions Mathematical models needed: • Complex processes • No simple prediction via a deterministic theory Modelling assumptions: • Generally false • Intelligent guesswork Medical prediction: y ~ X
  • 8. Topics: trustworthy predictions 1. What do we need for individual patients? • Internal vs external validity • Calibration vs discrimination 2. Trustworthy processes to build a prediction model? • AI • Humans • Requirements 3. Types of uncertainty • Statistical aspects • Model uncertainty • Heterogeneity between contexts of practical application
  • 9. Validated = trustworthy? • Classic: • No validation, only internal  low ranking journal • 1 convincing validation  top journal • Modern • Substantial heterogeneity in performance  There is no such thing as a validated model
  • 10. Example in neurotrauma, external validation 25-Mar-24 10 Insert > Header & footer Modification of val.prob() in rms; val.prob.ci.2() Steyerberg et al, PLoS Med 2008 well calibrated over prediction
  • 11. Very heterogenous validations 25-Mar-24 11 Insert > Header & footer
  • 12. Perspective at validation 25-Mar-24 12 Insert > Header & footer
  • 13. Summary validation • Internal: minimum, same underlying population • External: • Temporal • Geographic • Domain • Efficient: internal-external cross-validation
  • 14. Performance assessment • What is the most commonly reported measure for performance of prediction models? • Area under the Receiver Operating Characteristic curve (AUC), or concordance (c) statistic • Discrimination = spread of predictions between individuals • Higher if better predictors in model • Higher --> more trustworthy??
  • 15. 25-Mar-24 15 Insert > Header & footer
  • 16. Trustworthiness for individuals • Calibration = reliability of predictions per individual
  • 17. 25-Mar-24 17 Insert > Header & footer Modification of val.prob() in rms; val.prob.ci.2() Steyerberg et al, PLoS Med 2008 well calibrated
  • 18. Trustworthiness for individuals • Calibration = reliability of predictions per individual • True risk estimates UTOPIAN • Calibration underreported
  • 19. Topics: trustworthy predictions 1. What do we need for individual patients? • Internal vs external validity • Calibration vs discrimination 2. Trustworthy processes to build a prediction model? • AI • Humans 3. Types of uncertainty • Statistical aspects • Model uncertainty • Heterogeneity between contexts of practical application
  • 20. Trustworthiness of ChatGPT ChatGPT: may be hallucinating • Simple calculations:
  • 21. Trustworthiness and AI • Relation to evidence base • Other popular terms related to AI • Fairness • Equity  Let’s ask ChatGPT
  • 22.
  • 23.
  • 24.
  • 25. Trustworthy models? • Modeling flexibility: friend or foe? Human oversight on: • Classical modeling: selection of predictors; nonlinearity; interactions • AI: hyperparameters; technique CART / RF / XGBoost / nnet / ..
  • 26.
  • 27. Trustworthiness: poor for human modelers Red cards and dark skin soccer players https://psyarxiv.com/qkwst/ 25-Mar-24 27 Insert > Header & footer
  • 28. • 29 teams involving 61 analysts; same dataset; same research question: whether soccer referees are more likely to give red cards to dark skin toned players than light skin toned players • Estimated odds ratios 0.89 –2.93 (median 1.3) • 20 teams: statistically significant positive effect, 9: non-significant relation 25-Mar-24 28 Insert > Header & footer
  • 29. Estimated odds ratios by 29 research teams 25-Mar-24 29 Insert > Header & footer
  • 31. Claimed trust in results 25-Mar-24 31 Insert > Header & footer
  • 32. Trustworthiness: poor for human modelers • 29 teams involving 61 analysts; same dataset; same research question: whether soccer referees are more likely to give red cards to dark skin toned players than light skin toned players • Estimated odds ratios 0.89 –2.93 (median 1.3). • 20 teams: statistically significant positive effect, 9: non-significant relation. • 21 unique combinations of covariates • “Variation in analysis of complex data may be difficult to avoid, even by experts with honest intentions” 25-Mar-24 32 Insert > Header & footer
  • 33. Explainable to humans = trustworthy? • Explainable AI • Algorithm trustworthy: if predictions are based on factors that are acceptable to domain experts instead of on ‘spurious correlations’ • SHAP (SHapley Additive exPlanations) values “By using SHAP values, researchers and practitioners can gain a deeper understanding of how different features influence model predictions, leading to improved model interpretability and trust.” ChatGPT3.5
  • 34. Topics: trustworthy predictions 1. What do we need for individual patients? • Internal vs external validity • Calibration vs discrimination 2. Trustworthy processes to build a prediction model? • AI • Humans 3. Types of uncertainty • Statistical aspects • Model uncertainty • Heterogeneity between contexts of practical application
  • 35. Approaches to uncertainty quantification • Sample size • specifically #events for binary outcome prediction • ‘patients like you’ and exceptionality • For risk communication: aleatory uncertainty • For uncertainty communication: epistemic
  • 36. Example on presentation by ‘the king of nomograms’ 25-Mar-24 36 Insert > Header & footer
  • 37. 25-Mar-24 37 Insert > Header & footer
  • 38. N = 100? 25-Mar-24 38 Insert > Header & footer GUSTO data, n=1200 (out of 40,830)
  • 39. Uncertainty for rare, strong predictor: SHOCK 25-Mar-24 39 Insert > Header & footer
  • 40. Predictions with and without SHOCK in the model 25-Mar-24 40 Insert > Header & footer
  • 41. AI: arXiv paper “Trustworthiness expresses whether a prediction is aligned with the train data” • Distance between the new patient and similar patients from the training data estimates the trustworthiness of a prediction; resembles the reference data • Prediction for the new patient is close to the ground truth of the neighbors in the training set.
  • 42. Claim ”Effective N is an attractive concept to address epistemiologic uncertainty” • Analytic solutions for regression models • Minimum certainty, say, n>10, for model specification: selection / shrinkage? • Approximate solutions for machine learning models • Bootstrap • Effective N: conditional on the model 25-Mar-24 42 Insert > Header & footer
  • 43. Model uncertainty • 246 biologists modeling • 61 analysts in 29 teams on the Red Card problem • … • Comparisons between classic vs machine learning
  • 45.
  • 47. Topics: trustworthy predictions 1. What do we need for individual patients? • Internal vs external validity • Calibration vs discrimination 2. Trustworthy processes to build a prediction model? • AI • Humans 3. Types of uncertainty • Statistical aspects • Model uncertainty • Heterogeneity between contexts of practical application
  • 48. Heterogeneity • Study design • Selection of subjects • Disease domain • Measurement of covariates • Measurement of outcomes • Associations of covariates with outcome • Overall outcome rates
  • 49. Heterogeneity in performance Steyerberg et al, PLoS Med 2008 Dijkland S et al; J Neurotrauma 2019
  • 50. 25-Mar-24 50 Insert > Header & footer
  • 51. 15 cohorts: 11 RCTs, 4 Observational studies 25-Mar-24 51 Insert > Header & footer
  • 52. Heterogeneity in case-mix 25-Mar-24 52 Insert > Header & footer
  • 53. Heterogeneity in predictor effects 25-Mar-24 53 Insert > Header & footer
  • 54. Heterogeneity in predictions 25-Mar-24 54 Insert > Header & footer
  • 55. Heterogeneity in individual predictions 25-Mar-24 55 Insert > Header & footer
  • 56. Conclusions on trustworthy predictions • Epistemic uncertainty: under the influence of the modeler • Larger sample sizes • Modest modeling, limit flexibility • Heterogeneity: assess differences between settings • Study design • Distribution and effects of covariates • Differences between predictions • Model predictions suffer from multiple sources of uncertainty • Transparency: for policy makers / physicians / patients • Context dependency: Local versus global models 25-Mar-24 56 Insert > Header & footer