This document discusses sources of uncertainty in clinical prediction models: aleatory uncertainty, epistemic uncertainty, approximation/estimation uncertainty, model(er) uncertainty, data uncertainty, and population uncertainty. These are illustrated with a model that predicts the risk of ovarian malignancy. When the different sources of uncertainty are accounted for, the predicted risks for individual patients can vary by over 50 percentage points. The document concludes that quantifying uncertainty completely is impossible, and that transparency about uncertainty is important for clinical use and for risk communication with patients.
1. ISCB 2023, August 2023, Milan (Italy)
Sources of uncertainty in
clinical prediction models
Ben Van Calster, PhD
Dept Development & Regeneration, KU Leuven (B)
Dept Biomedical Data Sciences, LUMC Leiden (NL)
Thanks to Laure Wynants and Ewout Steyerberg
2. Validated = trustworthy?
• Classic prediction model validation is done at the population level:
• Discrimination (c-statistic)
• Calibration (calibration curve)
• Clinical utility (net benefit)
• But patients focus on their own risk
• Is that individual risk trustworthy, even if the model is calibrated?
4. Sources of uncertainty
Mateo Dineen, “Elephants in the room”
Altman & Royston 2000: “We believe that the distinction between what is achievable at
the group and individual levels is not well understood.”
5. “Alternative models […], while demonstrating good agreement for describing
patients in the aggregate, are shown to differ considerably for individual patients.”
6. “Human survival is so uncertain that even the best statistical analysis cannot provide
single-number prediction of real use for individual patients.” (Henderson & Keiding 2005)
7. Does (true) individual risk exist?
• The concept of individual risk is debated
• Dismissed as nonsense, not meaningful, … (Stern 2012; Sniderman et al. 2015; critics as far back as the 19th century)
• The “reference class problem”
• Even so:
• A risk estimate may be seen as the level at which one is ‘willing to bet’ on the existence/occurrence of the event (cf. de Finetti; Nau 2001)
• How stable / uncertain is this ‘betting level’?
8. Sources of uncertainty
• Aleatory uncertainty
• Epistemic uncertainty
• Approximation/Estimation uncertainty
• Model(er) uncertainty
• Data uncertainty
• Population uncertainty (heterogeneity)
Hüllermeier & Waegeman. Machine Learning 2021;110:457-506.
Gruber et al. arXiv 2023 (a statistician’s perspective)
9. Illustration: ovarian cancer diagnosis
• Data from 1133 patients from the University Hospitals Leuven, 1999-2015
• Patients with one or more ovarian tumors scheduled for surgery
• Estimate risk of malignancy
• Main model:
• Standard logistic regression
• age (years), max lesion diameter (mm), proportion solid tissue, CA125
(IU/L), bilateral tumors (Y/N), papillations with blood flow (Y/N)
• First 4 predictors: restricted cubic splines (rcs) with 3 knots
• 10 parameters, 37% prevalence, assumed AUC of 0.88 (cf. literature):
minimum required sample size 359 (Riley et al 2020; see the sketch below)
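The 359 can be reproduced from one of the Riley et al. (2020) criteria: precise estimation of the overall outcome proportion (margin of error 0.05, 95% confidence). This is a minimal sketch of that single criterion; the full calculation (e.g. via the pmsampsize package) also checks shrinkage- and R²-based criteria and takes the largest resulting n.

```python
import math

# Riley et al. (2020) criterion: estimate the overall outcome
# proportion to within +/- 0.05 with 95% confidence.
phi, delta = 0.37, 0.05   # anticipated prevalence, margin of error
n = (1.96 / delta) ** 2 * phi * (1 - phi)
print(math.ceil(n))       # -> 359, matching the slide
```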
10. Illustration: ovarian cancer diagnosis
• Random test set of 100 patients; the remaining 1033 form the training pool
• We randomly select 385 cases from the training pool
• First, impute missing CA125 values (31% missing) using regression imputation
• Train the model
• Apply the imputation model and the prediction model to the test set
• Calculate risks for the test set (sketched below)
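A minimal Python sketch of one such run, assuming hypothetical column names (the Leuven data are not public) and a log-scale linear regression for the CA125 imputation; the splines and other modeling details are omitted for brevity.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical column names; the real dataset is not public.
PREDICTORS = ["age", "diameter", "prop_solid", "ca125",
              "bilateral", "papillation_flow"]

def run_once(pool: pd.DataFrame, test: pd.DataFrame, seed: int) -> np.ndarray:
    train = pool.sample(n=385, random_state=seed).copy()
    test = test.copy()
    # Regression imputation: predict missing CA125 (log scale assumed
    # here) from the other predictors, fitted on training data only.
    aux = [c for c in PREDICTORS if c != "ca125"]
    obs = train["ca125"].notna()
    imp = LinearRegression().fit(train.loc[obs, aux],
                                 np.log(train.loc[obs, "ca125"]))
    for df in (train, test):
        miss = df["ca125"].isna()
        if miss.any():
            df.loc[miss, "ca125"] = np.exp(imp.predict(df.loc[miss, aux]))
    # Fit the prediction model (restricted cubic splines omitted here)
    # and return predicted risks for the fixed test set of 100 patients.
    fit = LogisticRegression(max_iter=1000).fit(train[PREDICTORS],
                                                train["malignant"])
    return fit.predict_proba(test[PREDICTORS])[:, 1]
```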
11. Illustration: ovarian cancer diagnosis
10 variations regarding modeling uncertainty:
• Nonlinearity based on first-degree fractional polynomials (Sauerbrei et al 2007)
• Nonlinearity ignored (we don’t like it, but it happens a lot)
• Numerical variables dichotomized at median (stay calm)
• Backward elimination with alpha 1%
• Backward elimination with alpha 20%
• Penalized estimation based on AICc (Harrell 2015)
• Including interactions with CA125, backward elimination at alpha 20%
• Random forest model using very deep trees (min.node.size 2)
• Random forest model with less deep trees (min.node.size 20)
• Use 2 other categorical predictors (ordinal vascularization score, irregular walls)
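A few of these variants expressed in Python/sklearn terms, as an illustration only: the analysis itself used other software, ranger's min.node.size corresponds only roughly to sklearn's min_samples_leaf, and SplineTransformer fits B-splines rather than restricted cubic splines.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer

# Illustrative stand-ins for four of the ten variants above.
variants = {
    # nonlinearity ignored: plain linear logistic regression
    "linear_only": LogisticRegression(max_iter=1000),
    # flexible nonlinearity (B-splines, not restricted cubic splines;
    # in practice apply the splines to the continuous predictors only)
    "splines": make_pipeline(SplineTransformer(n_knots=3),
                             LogisticRegression(max_iter=1000)),
    # very deep trees (cf. ranger's min.node.size = 2)
    "rf_deep": RandomForestClassifier(n_estimators=500, min_samples_leaf=2),
    # less deep trees (cf. ranger's min.node.size = 20)
    "rf_shallow": RandomForestClassifier(n_estimators=500, min_samples_leaf=20),
}
```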
12. Illustration: ovarian cancer diagnosis
3 variations regarding data uncertainty:
• Model tumor size and proportion solid tissue using estimated volumes
• Impute using missing indicator method
• Impute using median imputation conditional on outcome (stay calm)
2 variations regarding population uncertainty:
• Sample the training set from a hospital in Rome (N=1141)
• Sample the training set from a hospital in Malmö (N=1053)
Approximation uncertainty:
• Repeat all of this 100 times (100 random training sets of 385 cases; see the sketch below)
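Putting the pieces together, a sketch of the full multiverse loop. Here variant_fns is hypothetical (one run_once-style function per variant), and pool and test are as in the earlier sketch.

```python
import numpy as np

# 100 random training sets crossed with every modeling / data /
# population variant; each run yields 100 test-set risk predictions.
all_risks = {(rep, name): fn(pool, test, seed=rep)
             for rep in range(100)
             for name, fn in variant_fns.items()}

# Rows: one analysis (training set x variant); columns: patients.
risk_matrix = np.vstack(list(all_risks.values()))
```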
14. Model uncertainty (1st training set)
AUC range 0.95-0.98
ECE range 0.06-0.10
Per-patient risk range across model variants: mean 21 percentage points (2 to 63 across patients)
15. Data uncertainty (1st training set)
AUC range 0.97-0.98
ECE range 0.06-0.07
Per-patient risk range across data variants: mean 6 percentage points (0.2 to 30 across patients)
16. Population uncertainty (1st training set)
AUC range 0.96-0.97
ECE range 0.06-0.07
Per-patient risk range across training populations: mean 12 percentage points (1 to 67 across patients)
17. All uncertainties combined
Mean AUC 0.97 (range 0.83-0.99)
Mean ECE 0.07 (range 0.02-0.24)
Per-patient risk range across all analyses: mean 52 percentage points (5 to >99 across patients)
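These summaries can be derived from the risk matrix of the earlier sketch, assuming a quantile-binned ECE (the slides do not specify which calibration-error variant was used) and hypothetical test-set outcomes y_test.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def ece(y: np.ndarray, p: np.ndarray, n_bins: int = 10) -> float:
    # Binned expected calibration error: size-weighted mean absolute gap
    # between predicted and observed risk over equal-count bins (one
    # common definition; the slides do not say which variant was used).
    order = np.argsort(p)
    return sum(abs(p[b].mean() - y[b].mean()) * len(b) / len(p)
               for b in np.array_split(order, n_bins))

# Performance per analysis (y_test: observed test-set outcomes, 0/1).
aucs = [roc_auc_score(y_test, r) for r in risk_matrix]
eces = [ece(y_test, r) for r in risk_matrix]

# Per-patient spread of predicted risk across all analyses,
# in percentage points (cf. slide 17: mean 52, range 5 to >99).
spread = 100 * (risk_matrix.max(axis=0) - risk_matrix.min(axis=0))
print(spread.mean(), spread.min(), spread.max())
```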
18. Conclusions
• Model predictions suffer from multiple sources of uncertainty
• Transparency: for policy makers / physicians / patients
• Risk communication, shared decision making?
• Quantifying uncertainty completely is impossible (multiverse too large)
• Reporting stability is good, but is only a part of uncertainty
• Riley’s sample size methodology is great, but only a start? (Riley 2020)
• Methods to abstain from prediction: likewise only a start (Myers 2020; Kompa 2021)
19. Conclusions
• Epistemic uncertainty: partly under the modeler’s control
• Larger sample sizes, bias-variance considerations
• Domain knowledge about predictors
• Address heterogeneity between settings during development & validation
• IPD / multicenter (TRIPOD-Cluster: Debray 2023)
• Distribution and effects of covariates (Steyerberg 2019)
• Local versus global model: updating
20. Literature
• Lemeshow et al. Intensive Care Med 1995. Competing models give different individual predictions.
• Henderson & Keiding. J Med Ethics 2005. Competing models give different individual predictions.
• Steyerberg et al. J Clin Epidemiol 2005. Competing models give different individual predictions.
• Stern. arXiv 2012. Competing models give different individual predictions.
• Sniderman et al. JAMA 2015. Competing models give different individual predictions.
• Pate et al. BMC Med 2019. Modeling decisions (predictors, time period, region, imputation).
• Meijerink et al. arXiv 2020.
• Myers et al. npj Digit Med 2020. Tool to identify subgroups in which risks cannot be trusted.
• Pate et al. Diagn Progn Res 2020. Sample size and stability.
• Kompa et al. npj Digit Med 2021. Overview of uncertainty quantification and abstention.
• Hüllermeier & Waegeman. Mach Learn 2021. Overview of uncertainty.
• Thomassen et al. ISCB 2022 / SMDM 2023. Stability using effective sample size (“x people like you”).
• Riley & Collins. Biom J 2023. Stability (4 levels) and quantification.
• Ledger et al. medRxiv 2023. Uncertainty due to the choice of algorithm.
• Gruber et al. arXiv 2023. Overview of uncertainty.