Talk on clinical prediction models presented at the Joint Seminar Series in Translational and Clinical Medicine organised by the University of Crete Medical School, the Institute of Molecular Biology and Biotechnology of the Foundation for Research and Technology Hellas (IMBB-FORTH), and the University of Crete Research Center (UCRC), Heraklion [online], Greece, April 7, 2021.
Developing and validating statistical models for clinical prediction and prognosis to improve infection prevention and treatment
1. Developing & validating
clinical prediction models
to improve infection prevention and treatment
Evangelos I. Kritsotakis
Associate Professor of Biostatistics, University of Crete
Honorary Senior Lecturer in Epidemiology, University of Sheffield
e.kritsotakis@uoc.gr
07.04.2021
Joint Seminar Series in Translational and Clinical Medicine
UoC Medical School - IMBB-FORTH – UCRC
2. • Overview of CPM: key concepts, controversies & best practice
– Developing models: What do you want, when, and how?
– Explanation vs prediction
– Overfitting
– Selecting predictors, sample size
– Validation is needed!
– Calibration is essential
– Apparent performance & optimism
– External validation for translational research, but expect heterogeneity
– Statistical validity vs clinical utility (thresholds, DCA)
• Example of (own) applied research: predicting readmission in OPAT patients
• Machine learning / AI ? (exciting, but calm down!)
Outline
5. To explain:
• Study (strength of) independent associations with the outcome, e.g.
– to find risk factors or causes
– to isolate the effect of a primary factor or intervention
Tsioutis C, Kritsotakis EI, et al. Int J Antimicrob Agents 2016; 48(5):492-7
7. To predict:
• Obtain a system (set of variables + model) that estimates the risk of the outcome
• Aim is the use in NEW patients: it should work ‘tomorrow’, not now (validation)
Durojaiye et al. Journal of Antimicrobial Chemotherapy 2021 (In Press)
Calibration
is essential
8. Key concepts: Diagnostic vs Prognostic CPM
van Smeden J Clin Epidemiol 2021;132:142-145.
Estimating the probability of
• having the target condition (prevalence)
vs
• getting the target condition (incidence)
9. Key concepts: What do we mean by CPM?
• “Model development studies aim to derive a prediction model by
selecting the relevant predictors and combining them statistically into a
multivariable model.” TRIPOD statement, 2015
• “… summarize the effects of predictors to provide individualized
predictions of the absolute risk of a diagnostic or prognostic outcome.”
Steyerberg, 2019
• Reasons for wishing to make such personalized predictions include:
to inform treatment or other clinical decisions for individual patients;
to inform patients and their families;
to create clinical risk groups for informing treatment or
for stratifying patients by disease severity in clinical trials
Altman & Royston, 2000
10. CPM: current landscape
Maarten van Smeden tweet (March 17 2021):
Prediction model? My prediction is that somebody already did it
11. CPM: current landscape - example
“Living” systematic review (this is update 3, April 2020): n = 232 CPMs
Conclusion Prediction models for covid-19 are quickly entering the
academic literature to support medical decision making at a time when they
are urgently needed. This review indicates that almost all published
prediction models are poorly reported, and at high risk of bias such
that their reported predictive performance is probably optimistic.
However, we have identified two (one diagnostic and one prognostic)
promising models …
12. BMJ 2020;369:m1328: 208 new models, 24 ext. validations, 3 main types:
CPM: current landscape – example – COVID-19
Main problems:
Participants
• Inappropriate exclusion or study design
Predictors
• Scored “unknown” in imaging studies
Outcome
• Subjective or proxy outcomes
Analysis
• Small sample size
• Inappropriate or incomplete evaluation of performance
PROBAST risk of bias: 97% high, 3% unclear
Moderate to excellent predictive performance, but:
C-statistics ranged 0.71 to 0.99, 0.65 to 0.99 and 0.54 to 0.99 across the three model types
14. Developing a CPM: Why, when, how?
• Is there a clinical need?
Of/for whom?
• Who is eligible to use the model (target population)?
• What specific outcome is predicted (target of prediction)?
• When should the prediction be made (time origin)?
• What is the time horizon for the prediction?
Data on such patients?
• Which predictors are available at that time point?
• What is the quality of the data?
Chen L Ann Transl Med 2020;8(4):71
JUST BECAUSE YOU CAN
CREATE A PREDICTIVE MODEL
DOES NOT MEAN THAT YOU SHOULD
15. When?
• Logistic regression, artificial neural
network, naive Bayes, and random
forest machine learning
algorithms.
• AUCs between 0.85 and 0.92
indicative of excellent predictive
performance
But, the models used future information!
Model predictors included hospital
complications, which can only be known
when hospital stay has ended.
16. Developing a CPM: Why, when, how?
Chen L Ann Transl Med 2020;8(4):71
Proceed:
– avoid dichotomizing,
– penalize where possible,
– do rigorous internal/external validation,
– study model calibration,
– think hard about dealing with missing data,
– report following the TRIPOD guideline
17. CPM: reporting guidelines & risk of bias tools
• TRIPOD: reporting development/validation prediction models
• PROBAST: risk of bias development/validation prediction models
• STARD: reporting diagnostic accuracy studies
• QUADAS-2: risk of bias diagnostic accuracy studies
• REMARK: reporting (tumour marker) prognostic factor studies
• QUIPS: risk of bias in prognostic factor studies
• In development: TRIPOD-AI, TRIPOD-cluster, PROBAST-AI, STARD-AI, QUADAS-AI,
DECIDE-AI etc.
equator-network.org
18. Developing a CPM: models & modelling
• The aim is to derive (from empirical data) a function f():
Binary outcome Y:
Probability of outcome = f (set of predictor variables)
Pr(Y = 1) = f (X)
e.g. logistic regression
Continuous outcome Y:
Mean of outcome = f (set of predictor variables)
μΥ = f (X)
e.g. linear regression
19. Developing a CPM: Regression vs Machine Learning
Breiman. Stat Sci 2001;16:199-231
Data are ‘generated’ inside a black box by nature
The Data Modeling Culture: assume a stochastic data model for the inside of the box and use it for predictions (model or theory driven)
The Algorithmic Modeling Culture: the inside of the box is complex and unknown; find an algorithm that operates well for prediction (data driven)
20. Developing a CPM: regression models
• Linear models (linear combinations of Xs): f(X) = f(b0 + b1X1 + … + bpXp)
• Pr(10-year CHD) = f(age, cholesterol, SBP, diabetes, smoking)
(simplified) Framingham risk score
Link to online calculator
D'Agostino et al. Circulation 2008;117(6):743-53.
21. Common Regression Models
Model Equation
Linear Mean of Y = b0 + b1X1 + b2X2 + b3X3 + …
Logistic Log(odds) of Y = b0 + b1X1 + b2X2 + b3X3 + …
Poisson Log(incidence rate) of Y = b0 + b1X1 + b2X2 + b3X3 + …
Cox Log(hazard rate) of Y = b0 + b1X1 + b2X2 + b3X3 + …
Y = outcome (response) variable
X1, X2, X3, …. = predictor variables
All “linear” models!
22. Logistic regression model
Log(odds) of Y = b0 + b1X1 + b2X2 + b3X3 + … (linearity in logit)
or, equivalently:
Probability of Y = 1 / (1 + e^−(b0 + b1X1 + b2X2 + b3X3 + …))
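The inverse-logit mapping above can be sketched in a few lines of Python; the coefficient values below are hypothetical, purely for illustration:

```python
import math

def predict_risk(intercept, coefs, x):
    """Linear predictor LP = b0 + sum(b_i * x_i), mapped to a
    probability via the inverse logit: 1 / (1 + exp(-LP))."""
    lp = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-lp))

# Hypothetical coefficients: b0 = -2.0, b_age = 0.03, b_diabetes = 0.8
risk = predict_risk(-2.0, [0.03, 0.8], [60, 1])  # a 60-year-old with diabetes
```

In practice the coefficients are estimated from data by maximum likelihood, e.g. with statsmodels or scikit-learn; only the final prediction step is shown here.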
23. Developing a CPM: Overfitting
Overfitting =
Source: https://retrobadge.co.uk/retrobadge/slogans-sayings-
badges/public-enemy-number-one-small-retro-badge/
Overfitting = What you see is not what you get!
“Idiosyncrasies in the data are fitted rather than
generalizable patterns. A model may hence not be
applicable to new patients, even when the setting of
application is very similar to the development setting”
Steyerberg, 2009, Springer, ISBN 978-0-387-77244-8.
24. Developing a CPM: Overfitting
https://twitter.com/LesGuessing/status/997146590442799105
https://www.facebook.com/machinelearningandAI/photo
s/a.888018821351139/1783631865123159/?type=3
25. Developing a CPM: Overfitting
https://www.analyticsvidhya.com/blog/2020/02/underfitting-overfitting-best-fitting-machine-learning/
“Make everything as simple as possible, but not simpler”
Albert Einstein (?)
“Complexity is our enemy. Any fool can make something complicated”
Sir Richard Branson, founder of the Virgin Group
26. Developing a CPM: Overfitting consequences
• Typical calibration plot with overfitting:
Source: Maarten van Smeden
Discrimination (e.g. AUC) may not be affected, but:
Low risks are underestimated
High risks are overestimated
27. Developing a CPM: Overfitting prevention
Three most important steps:
1. Careful preselection of candidate predictors
– Avoid data driven variable selection (e.g. stepwise), especially in small datasets
2. Avoid ignoring information, e.g. unnecessary dichotomization
3. Ensure sufficient sample size
– Use it all on model development (avoid splitting into training and test dataset)
Heinze G, et al. Biom J. 2018
Royston et al. Statistics in Medicine 2006
28. Developing a CPM: Predictor selection
• Highly Recommended: Careful Preselection of Variables
– Subject matter knowledge & clinical reasoning
– Literature review
– Practical constraints
(timeline, availability, data quality, costs)
Discussion between
statistician and
clinical colleagues
at the start
29. Developing a CPM: Predictor selection
• Variable selection algorithms (avoid if possible!)
– Univariable filtering,
– Forward selection,
– Backward elimination
– Stepwise selection
– Best subset selection based on information-theoretic measures (AIC, BIC)
– Change-in-estimate: purposeful selection and augmented backward selection
– LASSO (Least Absolute Shrinkage and Selection Operator)
Heinze G, et al. Biom J. 2018
Good and Hardin, Common Errors in Statistics (and
How to Avoid Them), p. 3, p. 152
30. Developing a CPM: Predictor selection
Opinions on variable selection algorithms:
(Harrell 2001; Steyerberg 2009-19; Burnham & Anderson 2002; Royston & Sauerbrei 2008)
(spectrum ranging from a focus on prediction to a focus on explanation)
31. Developing a CPM: dichotomania!
Dichotomization/categorization is very prevalent:
• Wynants et al. 2020 (COVID prediction models): 48%
• Collins et al. 2011 (Diabetes T2 prediction models): 63%
• Mallett et al. 2010 (Prognostic models in cancer): 70%
32. Developing a CPM: dichotomania!
Consequences of unnecessary dichotomization of predictors:
Biologically implausible step-functions in predicted risk
Loss of information
Source of overfitting when the cut-off is data-driven
e.g. Ensor et al. 2018: dichotomizing BMI equates to throwing away 1/3 of the data
33. Developing a CPM: Sample size & EPV
• Calculating the minimum sample size required is complex! It remains debated and under active research, with major advances in the last few years.
• How much model complexity (e.g. number predictors) can we afford?
• Rules of thumb:
– Logistic regression: At least 10 outcome events per variable
34. Developing a CPM: Sample size & EPV
• EPV = 10
– Number of candidate variables, not variables in the final model
– “Variables” = model parameters (degrees of freedom)
– Should be considered lower bound! (EPV ≥ 10)
• More studies have since scrutinized and refined this rule of thumb
35. Developing a CPM: Sample size
Depends:
• Model complexity (e.g. # predictors, EPV)
• Data structure
• Performance required (e.g. high vs low stake medical decisions)
36. Developing a CPM: Sample size
Riley et al.
Developed methods and software to calculate
sample size that is needed to
• minimise potential overfitting
• estimate probability (risk) precisely
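As a rough sketch, the classical EPV rule of thumb from the earlier slides translates into a minimum sample size as follows. This is only the lower-bound heuristic; the Riley et al. criteria (implemented, for example, in the pmsampsize package for R and Stata) are more rigorous:

```python
import math

def min_n_epv(n_params, event_rate, epv=10):
    """Lower-bound sample size from the events-per-variable heuristic:
    at least `epv` outcome events per candidate model parameter."""
    events_needed = epv * n_params            # e.g. 10 events per parameter
    return math.ceil(events_needed / event_rate)

# e.g. 10 candidate parameters, ~11.5% event rate (roughly as in the OPAT data)
n_min = min_n_epv(10, 0.115)
```

Note that `n_params` counts model parameters (degrees of freedom) among the candidate predictors, not just the variables kept in the final model.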
38. Validating a CPM: Measures of predictive performance
• Discrimination
– Ability of model to rank subjects according to
the risk of the outcome event.
– Trade-off between sensitivity and specificity
– But sensitivity and specificity depend on the threshold, and predictions come on a continuum
– Assessed graphically with a Receiver Operating Characteristic (ROC) curve and numerically by the area under the curve (AUC = c-index)
• Calibration
– Agreement between outcome predictions
from the model and observed outcomes.
– Assessed graphically with calibration plots
– Assessed numerically with the calibration
slope (ideal slope = 1) and calibration
intercept (ideal CITL= 0)
(example plot: calibration slope = 1.05, CITL = 0.00, AUC = 0.75)
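The c-index has a simple rank interpretation that can be computed directly: the probability that a randomly chosen event gets a higher predicted risk than a randomly chosen non-event. A minimal sketch (an O(n²) loop, fine for illustration):

```python
def c_statistic(y_true, y_prob):
    """AUC / c-index: probability that a random event is assigned a
    higher predicted risk than a random non-event (ties count 0.5)."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    nonevents = [p for y, p in zip(y_true, y_prob) if y == 0]
    pairs = concordant = 0.0
    for e in events:
        for ne in nonevents:
            pairs += 1
            if e > ne:
                concordant += 1
            elif e == ne:
                concordant += 0.5
    return concordant / pairs
```

Calibration, by contrast, cannot be read off a single rank statistic; it needs the calibration plot, slope and intercept described above.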
39. Validating a CPM: Optimism
• Predictive performance is optimistic when estimated on the same dataset where
the risk prediction model was developed (“apparent performance”)
• Optimism can be large, especially with small sample size and/or many predictors
• To get a better estimate of the predictive performance:
- Internal validation (other sample / same source)
- External validation (other sample / other source)
Altman & Royston, 2000
Statistical police matter!
40. Validating a CPM: Internal validation
Evaluate performance of CPM on same target population (reproducibility).
Estimate the degree of optimism, produce optimism-corrected measures of
model performance:
– Split-sample validation into ‘training’ and ‘test’ sets (inefficient!)
• Random split (avoid!)
• Non-random split (better!)
– Resampling methods (recommended!)
• Cross-validation
• Bootstrapping
+ repeat all modelling steps!!
41. Validating a CPM: External Validation
• Strongest test of a prediction model in similar or different target populations or
domains (transportability).
– Temporal validation (more recently treated patients from other similar setting)
– Broader/geographic validation (in different areas/centres)
– Different settings (e.g. from adults to children)
• Expect decreased predictive performance due to heterogeneity:
– Different type of patients (case mix)
– Different outcome occurrence
– Differences in care over time
– Differences in treatments
Debray et al. J Clin Epi 2015
Model recalibration or
updating may need to
be considered
for transportability
42. Validating a CPM: Clinical Utility
• Output of models is not binary but a probability
– need to think about cut-offs (decision-thresholds) for decision-making
• Net Benefit
– indicates how many more TP classifications can be made with a model for the same
number of FP classifications, compared to not using a model
MEDICAL DECISION MAKING/NOV–DEC 2006
Decision threshold (pt): probability where expected benefit of
treatment is equal to the expected benefit of avoiding treatment
DCA:
NB over a range
of decision
thresholds
43. • NB = 0.05 at threshold of 10%
Using the model is equivalent to finding 5 outcomes per 100 patients
without unnecessary interventions
Validating a CPM: Decision Curve Analysis
CPM would be helpful for decisions with
a threshold probability of 10% or above.
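The net benefit behind a decision curve is short enough to compute by hand; a minimal sketch using the standard formula NB = TP/n − (FP/n) × pt/(1 − pt):

```python
def net_benefit(y_true, y_prob, pt):
    """Net benefit at decision threshold pt: true positives per patient,
    minus false positives per patient weighted by the odds at pt."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 0)
    return tp / n - (fp / n) * pt / (1 - pt)
```

A decision curve is then just `net_benefit` evaluated over a grid of thresholds, plotted alongside the treat-all and treat-none strategies.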
45. Predicting the risk of hospital admission in OPAT patients
• OPAT = Outpatient Parenteral Antimicrobial Therapy
– Incepted in the UK in the 1990s to allow intravenous antibiotic administration in patients' homes or in outpatient clinics.
– Has been shown to be safe, clinically effective and cost-effective for a wide range of infections when delivered through a formal service with appropriate specialist input and clinical governance.
• Sheffield OPAT service, established 2006, one of the largest in the UK
– Even with careful patient selection, about 1 in every 10 OPAT patients required hospitalization
– Reasons: Worsening of infection/no improvement, New infection, Adverse drug
reaction, etc
– Unclear why this happens, but could this be predicted in advance?
J Antimicrob Chemother 2019; 74: 3125–3127
Clinical Microbiology and Infection 2019; 25(7): 905.e1–905.e7
46. Predicting the risk of hospital admission in OPAT patients
Model
development
+
basic internal
validation
Temporal &
broader
external
validation
+
Clinical utility
(DCA)
47. Predicting the risk of hospital admission in OPAT patients
• Outcome: Unplanned inpatient admission to an acute care hospital
• Prediction horizon: within 30 days of discharge from the OPAT service
• Predictor selection: Clinical reasoning + literature review + availability at OPAT start
• Predictors: 13 candidate predictors
• sex, age, prior hospitalizations in the past 12 months, Charlson
comorbidity score,
• drug-resistant organism, concurrent intravenous antimicrobial
therapy, four antimicrobial classes (penicillin, cephalosporin,
carbapenem and glycopeptide), type of infection, type of vascular
access (peripheral vs. central), mode of delivery
• Multicollinearity check: 10 candidate predictors retained (of the antimicrobial classes, only cephalosporins retained)
• Model: Logistic regression
• Linearity (in logit): Restricted cubic splines
• Sample size: n = 1073 patients / 123 events
[Figure: logit of 30-day hospitalization risk vs. number of prior hospitalisations; panel (d): cubic spline with knots at quartiles for the full sample]
Clinical Microbiology and Infection 2019; 25(7): 905.e1–905.e7
48. Predicting the risk of hospital admission in OPAT patients
Clinical Microbiology and Infection 2019; 25(7): 905.e1–905.e7
49. Predicting the risk of hospital admission in OPAT patients
Final model equation:
• Linear predictor (LP) for a patient :
LP = −3.628 + (0.016 × age in years) + (0.264 × number of prior hospitalizations)
+ (0.103 × Charlson comorbidity score) + (0.248, if self/carer administration)
+ (0.479, if infusion centre) + (0.635, if IV combination therapy)
+ (0.480, if endovascular infection) − (0.337, if respiratory disease)
+ (0.189, if urogenital infection) − (0.037, if bone and joint infection)
− (0.776, if skin and soft tissue infection).
• Probability of 30-day unplanned hospitalization for the same patient:
Probability = 1 / (1 + e^−LP)
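The published equation can be evaluated directly; a sketch using the coefficients exactly as printed on the slide (the function name is mine, and indicator arguments are 0/1 flags):

```python
import math

def opat_30day_risk(age, prior_hosp, charlson, self_carer=0, infusion_centre=0,
                    iv_combo=0, endovascular=0, respiratory=0, urogenital=0,
                    bone_joint=0, ssti=0):
    """30-day unplanned hospitalization risk from the published OPAT model:
    linear predictor LP with the slide's coefficients, then inverse logit."""
    lp = (-3.628 + 0.016 * age + 0.264 * prior_hosp + 0.103 * charlson
          + 0.248 * self_carer + 0.479 * infusion_centre + 0.635 * iv_combo
          + 0.480 * endovascular - 0.337 * respiratory + 0.189 * urogenital
          - 0.037 * bone_joint - 0.776 * ssti)
    return 1.0 / (1.0 + math.exp(-lp))

# e.g. a 70-year-old with 2 prior hospitalizations and Charlson score 1,
# community-nurse delivery, no combination therapy, reference indication
risk = opat_30day_risk(70, 2, 1)
```

This is the same calculation the nomogram on the next slide performs graphically via points.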
50. Predicting the risk of hospital admission in OPAT patients
BJI, bone and joint infection; CN, community nurse; EI, endovascular infection; IC, infusion
centre; IV, intravenous; OT, other indication; RD, respiratory disease; SC, self/carer
administration; SSTI, skin and soft tissue infection; UGI, urogenital infection.
[Nomogram: points assigned for age (years), number of prior hospitalizations, Charlson comorbidity score, mode of delivery (CN/SC/IC), IV combination therapy (No/Yes) and indication for OPAT (OT/EI/RD/UGI/BJI/SSTI); total points mapped to 30-day hospitalization risk from 5% to 95%]
Clinical Microbiology and Infection 2019; 25(7): 905.e1–905.e7
51. Predicting the risk of hospital admission in OPAT patients
• Internal Validation
– (Apparent) AUC: C = 0.72 (95% CI 0.67 - 0.77)
– (validated) Bootstrap optimism-corrected c = 0.70
– Calibration slope 0.99 (95% CI 0.78 - 1.21)
Clinical Microbiology and Infection 2019; 25(7): 905.e1–905.e7
Hosmer-Lemeshow test
p = 0.546
“The prediction model may help
improve OPAT outcomes through better
identification of high-risk patients and
provision of tailored care”
“Our model should be prospectively
validated in an independent diverse
patient population before use in actual
patient care”
54. Predicting the risk of hospital admission in OPAT patients
Durojaiye et al. Journal of Antimicrobial Chemotherapy 2021
Model updating (intercept recalibration)
55. Predicting the risk of hospital admission in OPAT patients
Decision curve analysis (clinical utility)
greater net benefit (i.e. clinically useful)
for a range of risk thresholds (15% to 50%)
Durojaiye et al. Journal of Antimicrobial Chemotherapy 2021
56. Predicting the risk of hospital admission in OPAT patients
• Conclusion
– the prediction model is temporally and externally valid, and clinically useful for the
prediction of 30 day unplanned hospitalization in patients receiving OPAT.
– It may help improve OPAT outcomes through better identification of high-risk
patients and provision of tailored care.
• What’s next?
– Translational research:
– Investigate introduction of CPM into clinical practice (acceptability, presentation)
– Plan an impact study (time series quasi experimental design)
Durojaiye et al. Journal of Antimicrobial Chemotherapy 2021
59. Do ML algorithms outperform traditional regression models?
JAMA Network Open. 2020;3(1):e1918962.
January 10, 2020
• “Traditional statistical methods seem to
be more useful when EPV is small and a
priori knowledge is substantial
• ML could be more suited for a huge bulk
of data, such as omics, radiodiagnostics,
image analysis
• Integration of the two approaches should
be preferred over a unidirectional choice
of either approach.”
Medicina 2020, 56, 455
60. ML/AI: Four myths
1. Big Data will resolve the problems with small datasets
No: Big data, Big controversies and possibly Big errors! (e.g. Ala-Korpela et al. Int J Epidemiol 2021)
2. ML/AI is very different from classical modeling
No: cultural differences, but a continuum
3. ML/AI is better than classical modeling for medical prediction problems
No: there is no supporting evidence in systematic reviews
4. ML/AI leads to better generalizability
No: any prediction model may suffer from poor generalizability