1. A plea for good methodology: the strengths and limitations of approaches to developing prediction models in obstetrics and gynecology
Ben Van Calster
Department of Development and Regeneration, KU Leuven (B)
Department of Biomedical Data Sciences, LUMC (NL)
Research Ethics Committee, University Hospitals Leuven (B)
Epi-Centre, KU Leuven (B)
Glasgow/Leuven, October 16th 2020
3. To explain or to predict?
DESCRIBE / EXPLAIN
• Study independent associations / predictors / risk factors
• Key: effect size per variable
• Not prediction modeling!
PREDICT
• Obtain a system that gives predictions (risk estimates)
• The aim is use in NEW patients: it should work ‘tomorrow’, not only on today’s data
• Key: quality of the predictions
4. Strengths of prediction models
• Help in (shared) clinical decision making
• Objectify predictions
• Patient counseling
• Effect on clinical workflow and outcomes
GOOD METHODOLOGY AND
GOOD REPORTING ARE ESSENTIAL!
Beam and Kohane. JAMA 2018;319:1317-8.
5. Get the objective right
Riley. Nature 2019;572:27-9.
Cronin & Vickers. Urology 2010;76:1298-301.
6. Get the objective right
• Is there a real clinical need for a new model?
• For which outcome, and for which management decision?
• When during the clinical workflow should the prediction be made?
• Does this match with the timing of the predictors?
• Do you have/can you collect data that is (really) fit for purpose?
8. Too many models, too few validations
• 1060 models predicting outcomes after CVD (1990-2015) (Wessler et al, 2017)
• 363 models predicting CVD (Damen et al, 2016)
• 231 models related to Covid-19 (Wynants et al, 2020; living systematic review)
ObGyn related:
• 263 models in obstetrics (Kleinrouweler et al, 2016)
• 116 models to diagnose ovarian malignancy (Kaijser et al, 2014)
Perhaps academic CVs need help, but patients need help more
Thanks to @GSCollins
Wessler et al. Diagn Progn Res 2017;1:20. Damen et al. BMJ 2016;353:i2416. Wynants et al. BMJ 2020;369:m1328.
Kleinrouweler et al. AJOG 2016;214:79-90. Kaijser et al. Hum Reprod Update 2014;20:229-62.
9. Models in obstetrics
Only 23 of 263 models (9%) have been externally validated!
Kleinrouweler et al. AJOG 2016;214:79-90.
10. Knowledge is power (1)
Avoid dichotomization of continuous predictor variables
• Biologically implausible
• Deletes information, leading to worse predictions (lower AUC) (Collins 2016; Steyerberg 2018)
• Only clinical decisions should be binary
(a small simulation sketch follows below)
Collins et al. Stat Med 2016;35:4124-35.
Steyerberg et al. J Clin Epidemiol 2018;98:133-43.
Butts & Ng. Statistical and methodological myths and urban legends, p361-86. Routledge/Taylor & Francis, 2009.
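To make the point concrete, here is a minimal simulation sketch (my own illustration, not from the slides; the data-generating model and all names are assumptions): the same logistic model loses discrimination when its continuous predictor is dichotomized at the median.

```python
# Minimal simulation: dichotomizing a continuous predictor discards
# information and lowers the AUC. Illustrative sketch only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
n = 4000
x = rng.normal(size=n)                          # continuous predictor
p = 1 / (1 + np.exp(-(-1.0 + 1.2 * x)))         # true risk, logistic in x
y = rng.binomial(1, p)                          # binary outcome

x_tr, x_te = x[:2000].reshape(-1, 1), x[2000:].reshape(-1, 1)
y_tr, y_te = y[:2000], y[2000:]
cut = np.median(x_tr)                           # dichotomize at the median

m_cont = LogisticRegression().fit(x_tr, y_tr)
m_dich = LogisticRegression().fit((x_tr > cut).astype(float), y_tr)

auc_cont = roc_auc_score(y_te, m_cont.predict_proba(x_te)[:, 1])
auc_dich = roc_auc_score(y_te, m_dich.predict_proba((x_te > cut).astype(float))[:, 1])
print(f"AUC continuous:   {auc_cont:.3f}")      # typically clearly higher
print(f"AUC dichotomized: {auc_dich:.3f}")
```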
11. Knowledge is power (2)
Use available knowledge, do not always ask the data!
“Bypassing the brain to compute by reflex is a sure recipe for disaster”
Good & Hardin. Common errors in statistics (and how to avoid them). Wiley, 2006.
12. Knowledge is power (3)
Explain how and when predictors are measured, and standardize where reasonably possible:
- Units; e.g. progesterone in ng/ml or nmol/L
- How tumor volume or diameter is calculated
- What is meant by ‘hormonal therapy use’ (Which? When?)
- Smoking
- BMI: measured vs self-reported
If measurement varies across studies, model performance deteriorates (Luijken, 2019; Luijken, 2020). A small unit-conversion sketch follows below.
Luijken et al. Stat Med 2019;38:3444-59.
Luijken et al. J Clin Epidemiol 2020;119:7-18.
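As a trivial but concrete example of unit standardization, the sketch below (my own illustration; the function and constant names are hypothetical) converts progesterone from ng/ml to nmol/L using the molar mass of progesterone, roughly 314.46 g/mol.

```python
# Harmonizing progesterone units before pooling data across centres.
# Illustrative sketch; names are hypothetical.
PROGESTERONE_MOLAR_MASS = 314.46  # g/mol, so 1 ng/ml ≈ 3.18 nmol/L

def progesterone_ng_ml_to_nmol_l(value_ng_ml: float) -> float:
    """Convert a progesterone concentration from ng/ml to nmol/L."""
    return value_ng_ml * 1000.0 / PROGESTERONE_MOLAR_MASS

print(progesterone_ng_ml_to_nmol_l(10.0))  # 10 ng/ml ≈ 31.8 nmol/L
```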
13. Knowledge is power (4): sample size
Suppose you are thinking of buying a Porsche. If you are not willing to pay for it, you may end up with a cheap imitation.
The same applies to developing risk models.
14. The currency is sample size
The more complicated (or ‘fancy’) the modeling strategy,
the more you have to pay with sample size.
(counterfeit money does not help: we need good quality data)
In this respect, avoid a train-test split: it reduces the sample size available for model development, so you are burning your money. A resampling-based alternative is sketched below.
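As a sketch of that resampling alternative (my own illustration of Harrell/Steyerberg-style bootstrap internal validation, not code from the talk), the function below develops the model on all the data and corrects the apparent AUC for optimism:

```python
# Bootstrap internal validation: fit the model on ALL the data, then
# correct its apparent AUC for optimism instead of sacrificing patients
# to a hold-out set. Illustrative sketch.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def bootstrap_corrected_auc(X, y, n_boot=200, seed=0):
    rng = np.random.default_rng(seed)
    full = LogisticRegression().fit(X, y)
    apparent = roc_auc_score(y, full.predict_proba(X)[:, 1])
    optimism = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))      # resample with replacement
        if len(np.unique(y[idx])) < 2:             # degenerate resample; skip
            continue
        m = LogisticRegression().fit(X[idx], y[idx])
        auc_boot = roc_auc_score(y[idx], m.predict_proba(X[idx])[:, 1])
        auc_orig = roc_auc_score(y, m.predict_proba(X)[:, 1])
        optimism.append(auc_boot - auc_orig)       # in-sample over-optimism
    return apparent - float(np.mean(optimism))     # optimism-corrected AUC
```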
15. The currency is sample size
Many have heard of the “10 events per variable” rule
1. Often used incorrectly: it concerns events per candidate predictor parameter, not 10 patients per variable in the final model!
2. It is also outdated: 10 EPV is often not enough. See the new procedure (BMJ 2020), sketched below.
3. Flexible algorithms are data-hungry; EPV >> 10 may be needed (van der Ploeg 2014).
Van der Ploeg et al. BMC Med Res Methodol 2014;14:137.
Riley et al. BMJ 2020;368:m441.
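As a sketch of one criterion in that new procedure (criterion (i) of Riley et al., BMJ 2020, which targets an expected global shrinkage factor of at least 0.9; this is my own implementation, so verify it against the paper):

```python
import math

def min_n_for_shrinkage(p: int, r2_cs: float, shrinkage: float = 0.9) -> int:
    """Minimum development sample size so that the expected global
    shrinkage factor is at least `shrinkage` (Riley et al., BMJ 2020):
        n = p / ((S - 1) * ln(1 - R2_CS / S))
    p: number of candidate predictor parameters;
    r2_cs: anticipated Cox-Snell R-squared of the model."""
    return math.ceil(p / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage)))

# e.g. 10 candidate parameters and an anticipated Cox-Snell R^2 of 0.2:
print(min_n_for_shrinkage(p=10, r2_cs=0.2))  # 398 patients
```

The full procedure adds further criteria (e.g. precision of the overall risk estimate); the pmsampsize package (R/Stata) implements them all.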
16. Knowledge is power (5): missing data
Usually, “empty cells” are “full of information”!
Using only complete cases
- decreases sample size (less money)
- typically leaves a non-representative sample (biased risk estimates)
The presence of a test can be more predictive than the test result itself! See EHR data (Agniel, 2018); a small sketch follows below.
Agniel et al. BMJ 2018;360:k1479.
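A minimal sketch of both points (my own illustration using scikit-learn with toy data; for a real analysis, proper multiple imputation is preferred over this single-imputation example):

```python
# Impute rather than drop incomplete rows, and keep a missingness
# indicator: WHETHER a test was ordered can itself carry signal
# (Agniel et al., 2018). Illustrative sketch with toy data.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

X = np.array([[25.0, 1.2],
              [31.0, np.nan],   # test not ordered: informative in itself
              [47.0, 0.7],
              [38.0, np.nan]])

# add_indicator=True appends a 0/1 column flagging which values were
# missing, so a model can use the "test was (not) done" signal.
imputer = IterativeImputer(add_indicator=True, random_state=0)
print(imputer.fit_transform(X))
```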
17. Model validation: assess calibration!
Key elements of model performance (a computational sketch follows after the analogy below):
• discrimination between patients with and without the event
• calibration (correctness) of the risk estimates
DISCRIMINATION: When it rained, was the estimated chance of rain higher (on average)?
CALIBRATION: For days with an 80% estimated chance of rain, did it rain on 8 out of 10 days?
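A minimal validation sketch (my own illustration using statsmodels and scikit-learn; variable names are hypothetical) computing the c-statistic for discrimination, and the calibration intercept and slope from the logit of the predicted risks:

```python
# Discrimination (AUC) plus calibration intercept and slope
# (targets: intercept 0, slope 1). Illustrative sketch.
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score

def validate(y: np.ndarray, p_hat: np.ndarray):
    auc = roc_auc_score(y, p_hat)                  # discrimination
    lp = np.log(p_hat / (1 - p_hat))               # logit of predicted risks
    slope = sm.Logit(y, sm.add_constant(lp)).fit(disp=0).params[1]
    # Calibration intercept: logistic model with the slope fixed at 1
    intercept = sm.GLM(y, np.ones((len(y), 1)), offset=lp,
                       family=sm.families.Binomial()).fit().params[0]
    return auc, intercept, slope
```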
18. Calibration: the Achilles heel
Miscalibration: the estimated risks are inaccurate.
Patient and clinician are misinformed, which may lead to inappropriate decisions (Van Calster & Vickers, 2015).
Van Calster & Vickers. Med Decis Making 2015;35:162-9.
Van Calster et al. BMC Med 2019;17:230.
19. Performance depends on place and time
One external validation in one hospital does not tell much about a model!
“There is no such thing as a validated model”
Study the heterogeneity in performance across settings and over time
Van Calster et al. BMJ 2020;370:m2614.
20. P-values and significance testing
Very small role in prediction modeling
- Focus is on robust predictions
- Focus is on precision of the performance estimates (e.g. AUC, calibration)
- Focus is on quantifying heterogeneity
- Focus is on qualitative difference between populations
- Focus is on a priori selection of predictors
- further data-driven selection can be based on p-values; a high alpha is recommended (see the sketch below)
(Steyerberg & Van Calster, 2020)
Steyerberg & Van Calster. Eur J Clin Invest 2020;50:e13229.
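If data-driven selection is used at all, a liberal alpha (e.g. 0.5 rather than 0.05) avoids dropping weakly significant but useful predictors. A hedged sketch of backward elimination along those lines (my own illustration, not code from the paper):

```python
# Backward elimination with a deliberately high alpha, so predictors
# are only dropped when they contribute essentially nothing.
import numpy as np
import statsmodels.api as sm

def backward_eliminate(X: np.ndarray, y: np.ndarray, alpha: float = 0.5):
    cols = list(range(X.shape[1]))                 # candidate predictors
    while True:
        fit = sm.Logit(y, sm.add_constant(X[:, cols])).fit(disp=0)
        pvals = fit.pvalues[1:]                    # skip the intercept
        worst = int(np.argmax(pvals))
        if pvals[worst] < alpha or len(cols) == 1:
            return cols, fit                       # kept columns + final fit
        cols.pop(worst)                            # drop the weakest predictor
```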
21. Machine learning popularity
“Typical machine learning algorithms are highly flexible
So will uncover associations we could not find before
Hence better predictions and management decisions”
→ One of the master keys, with guaranteed success!
23. Poor methodology and reporting is common
Christodoulou et al (2019) – 71 studies:
- What was done about missing data? 100% poor or unclear
- How was performance validated? 68% unclear or biased approach
- Was calibration of risk estimates studied? 79% not at all
- Prognostic models: time horizon often ignored completely
Kleinrouweler et al (2016) – 263 models:
- Was calibration studied? 82% not at all
- Was the model fully presented so people can use it? Not for 38% of models
- Was the clinical use discussed? Not for 89% of models
FOLLOW TRIPOD GUIDELINES FOR REPORTING!
www.tripod-statement.org
Christodoulou et al. J Clin Epidemiol 2019;110:12-22.
Kleinrouweler et al. AJOG 2016;214:79-90.
Moons et al. Ann Intern Med 2015;162:W1-73.
24. The harm of poor methodology
Steyerberg et al. J Clin Epidemiol 2018;98:133-43.
25. Resources on prediction modeling
Involve a statistician with knowledge of prediction modeling!
Steyerberg EW. Clinical prediction models (2nd ed). Springer, 2019.
Riley RD et al. Prognosis research in healthcare. OUP, 2019.
Moons KGM et al. Transparent reporting of a multivariable prediction model for individual prognosis and diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med 2015;162:W1-73.
Wynants L et al. Key steps and common pitfalls in developing and validating risk models. BJOG 2017;124:423-32.
Prognosisresearch.com (newly launched website)