Developing & validating
clinical prediction models
to improve infection prevention and treatment
Evangelos I. Kritsotakis
Associate Professor of Biostatistics, University of Crete
Honorary Senior Lecturer in Epidemiology, University of Sheffield
e.kritsotakis@uoc.gr
07.04.2021
Joint Seminar Series in Translational and Clinical Medicine
UoC Medical School - IMBB-FORTH – UCRC
Outline
• Overview of CPM: key concepts, controversies & best practice
– Developing models: What do you want, when, and how?
– Explanation vs prediction
– Overfitting
– Selecting predictors, sample size
– Validation is needed!
– Calibration is essential
– Apparent performance & optimism
– External validation for translational research, but expect heterogeneity
– Statistical validity vs clinical utility (thresholds, DCA)
• Example of (own) applied research: predicting readmission in OPAT patients
• Machine learning / AI? (exciting, but calm down!)
Clinical prediction models:
Key concepts
Key concepts: Explanation vs Prediction
To explain:
• Study (strength of) independent associations with the outcome, e.g.
– to find risk factors or causes
– to isolate the effect of a primary factor or intervention
Tsioutis C, Kritsotakis EI, et al. Int J Antimicrob Agents 2016; 48(5):492-7
To predict:
http://www.cvriskcalculator.com/
Obtain a system (set of variables + model) that
estimates the risk of the outcome
To predict:
• Obtain a system (set of variables + model) that estimates the risk of the outcome
• Aim is the use in NEW patients: it should work ‘tomorrow’, not now (validation)
Durojaiye et al. Journal of Antimicrobial Chemotherapy 2021 (In Press)
Calibration
is essential
Key concepts: Diagnostic vs Prognostic CPM
van Smeden J Clin Epidemiol 2021;132:142-145.
Estimating probability of
• having the target
condition (prevalence)
vs
• getting the target
condition (incidence)
Key concepts: What do we mean by CPM?
• “Model development studies aim to derive a prediction model by
selecting the relevant predictors and combining them statistically into a
multivariable model.” TRIPOD statement, 2015
• “… summarize the effects of predictors to provide individualized
predictions of the absolute risk of a diagnostic or prognostic outcome.”
Steyerberg, 2019
• Reasons for wishing to make such personalized predictions include:
 to inform treatment or other clinical decisions for individual patients;
 to inform patients and their families;
 to create clinical risk groups for informing treatment or
 for stratifying patients by disease severity in clinical trials
Altman & Royston, 2000
CPM: current landscape
Maarten van Smeden tweet (March 17 2021):
Prediction model? My prediction is that somebody already did it
CPM: current landscape - example
A “living” systematic review: this is update 3 (April 2020), n = 232 CPMs
Conclusion Prediction models for covid-19 are quickly entering the
academic literature to support medical decision making at a time when they
are urgently needed. This review indicates that almost all published
prediction models are poorly reported, and at high risk of bias such
that their reported predictive performance is probably optimistic.
However, we have identified two (one diagnostic and one prognostic)
promising models …
CPM: current landscape – example – COVID-19
BMJ 2020;369:m1328: 208 new models, 24 ext. validations, 3 main types:
Main problems:
Participants
• Inappropriate exclusion or study design
Predictors
• Scored “unknown” in imaging studies
Outcome
• Subjective or proxy outcomes
Analysis
• Small sample size
• Inappropriate or incomplete evaluation of performance
PROBAST
bias risk:
97% high
3% unclear
Moderate to excellent predictive performance, but:
C: 0.71 to 0.99
C: 0.65 to 0.99
C: 0.54 to 0.99
Clinical prediction models:
Key issues in model development
Developing a CPM: Why, when, how?
• Is there a clinical need?
Of/for whom?
• Who is eligible to use the model (target population)?
• What specific outcome is predicted (target of prediction)?
• When should the prediction be made (time origin)?
• What is the time horizon for the prediction?
Data on such patients?
• Which predictors are available at that time point?
• What is the quality of the data?
Chen L Ann Transl Med 2020;8(4):71
JUST BECAUSE YOU CAN
CREATE A PREDICTIVE MODEL
DOES NOT MEAN THAT YOU SHOULD
When?
• Logistic regression, artificial neural
network, naive Bayes, and random
forest machine learning
algorithms.
• AUCs between 0.85 and 0.92
indicative of excellent predictive
performance
But, the models used future information!
Model predictors included hospital
complications, which can only be known
when hospital stay has ended.
Developing a CPM: Why, when, how?
Chen L Ann Transl Med 2020;8(4):71
Proceed:
 avoid dichotomizing,
 penalize where possible,
 do rigorous internal/external
validation,
 study model calibration,
 think hard about dealing with
missing data,
 Report following the TRIPOD guideline
CPM: reporting guidelines & risk of bias tools
• TRIPOD: reporting development/validation prediction models
• PROBAST: risk of bias development/validation prediction models
• STARD: reporting diagnostic accuracy studies
• QUADAS-2: risk of bias diagnostic accuracy studies
• REMARK: reporting (tumour marker) prognostic factor studies
• QUIPS: risk of bias in prognostic factor studies
• In development: TRIPOD-AI, TRIPOD-cluster, PROBAST-AI, STARD-AI, QUADAS-AI,
DECIDE-AI etc.
equator-network.org
Developing a CPM: models & modelling
• The aim is to derive (from empirical data) a function f():
Binary outcome Y:
Probability of outcome = f (set of predictor variables)
Pr(Y = 1) = f (X)
e.g. logistic regression
Continuous outcome Y:
Mean of outcome = f (set of predictor variables)
μΥ = f (X)
e.g. linear regression
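As an illustration of deriving such a function f() from data, a minimal sketch below fits a logistic regression to simulated data; the predictor names (age, crp, diabetes) and coefficients are invented for the example and are not from any model discussed in these slides.

```python
# Minimal sketch: deriving Pr(Y = 1) = f(X) with logistic regression.
# Predictor names and effect sizes are hypothetical illustrations.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "age": rng.normal(65, 10, n),
    "crp": rng.gamma(2.0, 20.0, n),
    "diabetes": rng.binomial(1, 0.3, n),
})
# Simulate an outcome from a known linear predictor (for illustration only)
lp = -6 + 0.05 * df["age"] + 0.01 * df["crp"] + 0.7 * df["diabetes"]
df["y"] = rng.binomial(1, 1 / (1 + np.exp(-lp)))

X = sm.add_constant(df[["age", "crp", "diabetes"]])
model = sm.Logit(df["y"], X).fit(disp=False)
print(model.params)           # fitted b0, b1, ... defining f()
print(model.predict(X)[:5])   # predicted probabilities (here for the development data)
```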
Developing a CPM: Regression vs Machine Learning
Breiman. Stat Sci 2001;16:199-231
Data are ‘generated’ inside a black box by nature
The Modeling Culture The Algorithmic Culture
Assume a stochastic data
model for the inside of the
box, use it for predictions
(model or theory driven)
The inside of the box is complex
and unknown, find an algorithm
that operates well for prediction
(data driven)
Developing a CPM: regression models
• Linear models (linear combinations of Xs): f(X) = f(b0 + Σ bi Xi)
• Pr(10-year CHD) = f (age, cholesterol, SBP, diabetes, smoking)
[Equation of the (simplified) Framingham risk score; link to online calculator]
D'Agostino et al. Circulation 2008;117(6):743-53.
Common Regression Models
Model Equation
Linear Mean of Y = b0 + b1X1 + b2X2 + b3X3 + …
Logistic Log(odds) of Y = b0 + b1X1 + b2X2 + b3X3 + …
Poisson Log(incidence rate) of Y = b0 + b1X1 + b2X2 + b3X3 + …
Cox Log(hazard rate) of Y = b0 + b1X1 + b2X2 + b3X3 + …
Y = outcome (response) variable
X1, X2, X3, …. = predictor variables
All “linear” models!
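To make the "all linear models" point concrete, a short self-contained sketch (simulated data, illustrative variable names) fits the same linear predictor b0 + b1X1 + b2X2 under two different link functions using statsmodels GLM families.

```python
# The same linear predictor under different link functions (illustrative sketch;
# outcome and predictor names are invented, not from the slides).
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.binomial(1, 0.4, 200)})
df["y_bin"] = rng.binomial(1, 0.3, 200)   # binary outcome  -> logistic (logit link)
df["y_cnt"] = rng.poisson(2, 200)         # count outcome   -> Poisson (log link)

logistic = smf.glm("y_bin ~ x1 + x2", df, family=sm.families.Binomial()).fit()
poisson = smf.glm("y_cnt ~ x1 + x2", df, family=sm.families.Poisson()).fit()
# Both report coefficients b0, b1, b2 on the scale of the link function:
# log-odds for the logistic model, log-rate for the Poisson model.
print(logistic.params, poisson.params, sep="\n")
```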
Logistic regression model
Log(odds) of Y = b0 + b1X1 + b2X2 + b3X3 + …   (linearity in the logit)
or, equivalently:
Probability of Y = 1 / (1 + e^−(b0 + b1X1 + b2X2 + b3X3 + …))
[Figure: logistic (S-shaped) curve of P against ΣbX]
Developing a CPM: Overfitting
Overfitting = public enemy number one!
Source: https://retrobadge.co.uk/retrobadge/slogans-sayings-badges/public-enemy-number-one-small-retro-badge/
Overfitting = What you see is not what you get!
“Idiosyncrasies in the data are fitted rather than
generalizable patterns. A model may hence not be
applicable to new patients, even when the setting of
application is very similar to the development setting”
Steyerberg, 2009, Springer, ISBN 978-0-387-77244-8.
Developing a CPM: Overfitting
https://twitter.com/LesGuessing/status/997146590442799105
https://www.facebook.com/machinelearningandAI/photo
s/a.888018821351139/1783631865123159/?type=3
Developing a CPM: Overfitting
https://www.analyticsvidhya.com/blog/2020/02/underfitting-overfitting-best-fitting-machine-learning/
“Make everything as simple as possible, but not simpler”
Albert Einstein (?)
“Complexity is our enemy. Any fool can make something complicated”
Sir Richard Branson, founder of the Virgin Group
Developing a CPM: Overfitting consequences
• Typical calibration plot with overfitting:
Source: Maarten van Smeden
 Discrimination (e.g. AUC) may not be affected, but:
 Low risks are underestimated
 High risks are overestimated
Developing a CPM: Overfitting prevention
Three most important steps:
1. Careful candidate predictors preselection
– Avoid data driven variable selection (e.g. stepwise), especially in small datasets
2. Avoid ignoring information, e.g. unnecessary dichotomization
3. Ensure sufficient sample size
– Use all of it for model development (avoid splitting into a training and a test dataset)
Heinze G, et al. Biom J. 2018
Royston et al. Statistics in Medicine 2006
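The "penalize where possible" advice from the earlier slide can be illustrated with a brief sketch: an L2-penalized (ridge) logistic regression with the penalty strength chosen by cross-validation. The data, number of predictors and settings below are purely illustrative assumptions.

```python
# Penalized (ridge) logistic regression: shrinks coefficients toward zero
# to reduce overfitting. Sample size and predictor count are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(0)
n, p = 300, 12                  # small sample, many candidate predictors
X = rng.normal(size=(n, p))
y = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1]))))

# Cs: grid of inverse penalty strengths; 5-fold cross-validation picks one.
ridge = LogisticRegressionCV(Cs=10, cv=5, penalty="l2",
                             scoring="neg_log_loss", max_iter=5000).fit(X, y)
print(ridge.C_[0])              # selected (inverse) penalty strength
print(ridge.coef_.round(2))     # shrunken coefficients
```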
Developing a CPM: Predictor selection
• Highly Recommended: Careful Preselection of Variables
– Subject matter knowledge & clinical reasoning
– Literature review
– Practical constraints
(timeline, availability, data quality, costs)
Discussion between
statistician and
clinical colleagues
at the start
Developing a CPM: Predictor selection
• Variable selection algorithms (avoid if possible!)
– Univariable filtering,
– Forward selection,
– Backward elimination
– Stepwise selection
– Best subset selection based on information-theoretic measures (AIC, BIC)
– Change-in-estimate: purposeful selection and augmented backward selection
– LASSO (Least Absolute Shrinkage and Selection Operator)
Heinze G, et al. Biom J. 2018
Good and Hardin, Common Errors in Statistics (and
How to Avoid Them), p. 3, p. 152
Developing a CPM: Predictor selection
Opinions on variable selection algorithms:
(Harrell 2001; Steyerberg 2009-19; Burnham & Anderson 2002; Royston & Sauerbrei 2008)
[Figure: spectrum of opinions, ranging from a focus on prediction to a focus on explanation]
Developing a CPM: dichotomania!
Dichotomization/categorization is very prevalent:
• Wynants et al. 2020 (COVID prediction models): 48%
• Collins et al. 2011 (Diabetes T2 prediction models): 63%
• Mallett et al. 2010 (Prognostic models in cancer): 70%
Developing a CPM: dichotomania!
Consequences of unnecessary dichotomization of predictors:
 Biologically implausible step functions in predicted risk
 Loss of information
 Source of overfitting when cut-off is data-driven
e.g. Ensor et al. 2018: dichotomizing BMI equates to throwing away 1/3 of data
Developing a CPM: Sample size & EPV
• Calculating the minimum sample size required is complex! It is still debated and under active
research, with many advances in the last few years.
• How much model complexity (e.g. number predictors) can we afford?
• Rules of thumb:
– Logistic regression: At least 10 outcome events per variable
Developing a CPM: Sample size & EPV
• EPV = 10
– Number of candidate variables, not variables in the final model
– “Variables” = model parameters (degrees of freedom)
– Should be considered a lower bound! (EPV ≥ 10)
• More recent studies have re-examined this rule of thumb (see the Riley et al. sample-size work on the following slides)
Developing a CPM: Sample size
Depends:
• Model complexity (e.g. # predictors, EPV)
• Data structure
• Performance required (e.g. high vs low stake medical decisions)
Developing a CPM: Sample size
Riley et al.
Developed methods and software to calculate
sample size that is needed to
• minimise potential overfitting
• estimate probability (risk) precisely
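As a toy illustration of the events-per-variable heuristic (the crude lower bound discussed above, not the more rigorous Riley et al. criteria), the sketch below checks planned numbers against EPV ≥ 10; all figures are hypothetical.

```python
# Rule-of-thumb check: events per candidate parameter (EPV) should be >= 10.
# This is only the crude heuristic from the earlier slide, not the Riley et al.
# sample-size criteria, which account for outcome prevalence, expected
# performance, and shrinkage.
def epv(n_patients, outcome_proportion, n_candidate_parameters):
    events = n_patients * outcome_proportion
    return events / n_candidate_parameters

# Hypothetical planning scenario: 800 patients, 12% outcome rate,
# 10 candidate parameters (counting dummy variables and spline terms).
print(round(epv(800, 0.12, 10), 1))   # 9.6 -> just below the EPV >= 10 heuristic
```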
Clinical prediction models:
Validation
Validating a CPM: Measures of predictive performance
• Discrimination
– Ability of model to rank subjects according to
the risk of the outcome event.
– Trade-off between sensitivity and specificity
– But sensitivity and specificity depend on the threshold, and predictions come on a continuum.
– Assessed graphically with a Receiver Operating Characteristic (ROC) curve and numerically by
the area under the curve (AUC = c-index)
• Calibration
– Agreement between outcome predictions
from the model and observed outcomes.
– Assessed graphically with calibration plots
– Assessed numerically with the calibration
slope (ideal slope = 1) and calibration
intercept (ideal CITL= 0)
[Example ROC curve and calibration plot: AUC = 0.75, calibration slope = 1.05, CITL = 0.00; a sketch for computing these measures follows below]
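A compact sketch (simulated data) of how these two aspects of performance can be computed from predicted probabilities and observed outcomes: AUC for discrimination, and the calibration slope and calibration-in-the-large obtained by regressing the outcome on the logit of the predictions (with the logit as an offset for the intercept).

```python
# Discrimination (AUC) and calibration (slope, calibration-in-the-large)
# from predicted probabilities p and observed binary outcomes y (simulated here).
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
true_lp = rng.normal(-1.5, 1.0, 2000)
p = 1 / (1 + np.exp(-true_lp))     # a well-calibrated model, for illustration
y = rng.binomial(1, p)

logit_p = np.log(p / (1 - p))
auc = roc_auc_score(y, p)

# Calibration slope: logistic regression of y on logit(p); ideal slope = 1
slope_fit = sm.Logit(y, sm.add_constant(logit_p)).fit(disp=False)
# Calibration-in-the-large: intercept only, with logit(p) as offset; ideal = 0
citl_fit = sm.GLM(y, np.ones((len(y), 1)), family=sm.families.Binomial(),
                  offset=logit_p).fit()
print(round(auc, 2), round(slope_fit.params[1], 2), round(citl_fit.params[0], 2))
```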
Validating a CPM: Optimism
• Predictive performance is optimistic when estimated on the same dataset where
the risk prediction model was developed (“apparent performance”)
• Optimism can be large, especially with small sample size and/or many predictors
• To get a better estimate of the predictive performance:
- Internal validation (other sample / same source)
- External validation (other sample / other source)
Altman & Royston, 2000
Statistical police matter!
Validating a CPM: Internal validation
 Evaluate performance of CPM on same target population (reproducibility).
 Estimate the degree of optimism, produce optimism-corrected measures of
model performance:
– Split-sample validation into ‘training’ and ‘test’ sets (inefficient!)
• Random split (avoid!)
• Non-random split (better!)
– Resampling methods (recommended!)
• Cross-validation
• Bootstrapping (see the sketch after this slide)
+ repeat all modelling steps!!
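A condensed sketch of the recommended bootstrap approach to optimism correction: refit the model in each bootstrap sample, measure the gap between its apparent performance in that sample and its performance on the original data, and subtract the average gap from the apparent AUC. Data are simulated; in a real study every modelling step, including any variable selection, must be repeated inside the loop.

```python
# Bootstrap optimism-corrected AUC (sketch with simulated data).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n, p = 400, 6
X = rng.normal(size=(n, p))
y = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] - 0.8 * X[:, 1]))))

def fit_auc(X_train, y_train, X_eval, y_eval):
    m = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return roc_auc_score(y_eval, m.predict_proba(X_eval)[:, 1])

apparent = fit_auc(X, y, X, y)              # apparent performance on same data
optimism = []
for _ in range(200):                        # 200 bootstrap resamples
    idx = rng.integers(0, n, n)             # sample patients with replacement
    auc_boot = fit_auc(X[idx], y[idx], X[idx], y[idx])  # apparent in bootstrap
    auc_test = fit_auc(X[idx], y[idx], X, y)            # tested on original data
    optimism.append(auc_boot - auc_test)

print(round(apparent, 3), round(apparent - np.mean(optimism), 3))
```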
Validating a CPM: External Validation
• Strongest test of a prediction model in similar or different target populations or
domains (transportability).
– Temporal validation (more recently treated patients from other similar setting)
– Broader/geographic validation (in different areas/centres)
– Different settings (e.g. from adults to children)
• Expect decreased predictive performance due to heterogeneity:
– Different types of patients (case mix)
– Different outcome occurrence
– Differences in care over time
– Differences in treatments
Debray et al. J Clin Epi 2015
Model recalibration or
updating may need to
be considered
for transportability
Validating a CPM: Clinical Utility
• Output of models is not binary but a probability
– need to think about cut-offs (decision-thresholds) for decision-making
• Net Benefit
– indicates how many more TP classifications can be made with a model for the same
number of FP classifications, compared to not using a model
Medical Decision Making, Nov-Dec 2006
Decision threshold (pt): probability where expected benefit of
treatment is equal to the expected benefit of avoiding treatment
DCA:
NB over a range
of decision
thresholds
• NB = 0.05 at threshold of 10%
Using the model is equivalent to finding 5 outcomes per 100 patients
without unnecessary interventions
Validating a CPM: Decision Curve Analysis
CPM would be helpful for decisions with
a threshold probability of 10% or above.
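A short sketch of the net benefit calculation behind a decision curve, NB(pt) = TP/n − (FP/n) × pt/(1 − pt), evaluated over several thresholds and compared with the "treat all" and "treat none" strategies; the predicted risks and outcomes are simulated for illustration only.

```python
# Net benefit over a range of decision thresholds (decision curve sketch).
import numpy as np

rng = np.random.default_rng(0)
lp = rng.normal(-2.0, 1.2, 5000)
p = 1 / (1 + np.exp(-lp))          # model's predicted risks (simulated)
y = rng.binomial(1, p)             # observed outcomes

def net_benefit(y, p, pt):
    n = len(y)
    treat = p >= pt                # patients flagged by the model at threshold pt
    tp = np.sum(treat & (y == 1))
    fp = np.sum(treat & (y == 0))
    return tp / n - fp / n * pt / (1 - pt)

for pt in (0.05, 0.10, 0.20, 0.30):
    nb_model = net_benefit(y, p, pt)
    nb_all = y.mean() - (1 - y.mean()) * pt / (1 - pt)   # treat everyone
    print(f"pt={pt:.2f}  NB(model)={nb_model:.3f}  NB(treat all)={nb_all:.3f}  NB(treat none)=0")
```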
Clinical prediction models:
Example of own applied research
Predicting the risk of hospital admission in OPAT patients
• OPAT = Outpatient Parenteral Antimicrobial Therapy
– Introduced in the UK in the 1990s to allow patients to receive intravenous antibiotics at home
or in outpatient clinics.
– Has been shown to be safe, clinically effective and cost-effective for a wide range of
infections when delivered through a formal service with appropriate specialist input
and clinical governance.
• Sheffield OPAT service, established 2006, one of the largest in the UK
– Even with careful patient selection, about 1 in every 10 OPAT patients required
hospitalization
– Reasons: Worsening of infection/no improvement, New infection, Adverse drug
reaction, etc
– Unclear why this happens, but could this be predicted in advance?
J Antimicrob Chemother 2019; 74: 3125–3127
Clinical Microbiology and Infection 2019; 25(7): 905.e1-905.e7
Predicting the risk of hospital admission in OPAT patients
Model
development
+
basic internal
validation
Temporal &
broader
external
validation
+
Clinical utility
(DCA)
Predicting the risk of hospital admission in OPAT patients
• Outcome: Unplanned inpatient admission to an acute care hospital
• Prediction horizon: within 30 days of discharge from the OPAT service
• Predictor selection: Clinical reasoning + literature review + availability at OPAT start
• Predictors: 13 candidate predictors
• sex, age, prior hospitalizations in the past 12 months, Charlson
comorbidity score,
• drug-resistant organism, concurrent intravenous antimicrobial
therapy, four antimicrobial classes (penicillin, cephalosporin,
carbapenem and glycopeptide), type of infection, type of vascular
access (peripheral vs. central), mode of delivery
• Multicollinearity check: reduced to 10 candidate predictors (of the antimicrobial classes, only cephalosporins retained)
• Model: Logistic regression
• Linearity (in logit): Restricted cubic splines
• Sample size: n = 1073 patients / 123 events
[Figure: restricted cubic spline (knots at quartiles, full sample) of the logit of 30-day hospitalization risk against the number of prior hospitalisations]
Clinical Microbiology and Infection 2019; 25(7): 905.e1-905.e7
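As a minimal illustration of handling nonlinearity in the logit with a restricted (natural) cubic spline, the sketch below uses patsy's cr() basis inside a statsmodels formula; the data are simulated and "age" is just a convenient continuous stand-in, not a re-creation of the published OPAT model.

```python
# Restricted (natural) cubic spline for a continuous predictor in a logistic model.
# Simulated data; the predictor and outcome are purely illustrative.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({"age": rng.normal(60, 15, n)})
lp = -3 + 0.002 * (df["age"] - 60) ** 2          # deliberately nonlinear in the logit
df["admit"] = rng.binomial(1, 1 / (1 + np.exp(-lp)))

# patsy's cr() builds a natural cubic regression spline basis (df = degrees of freedom)
fit = smf.logit("admit ~ cr(age, df=4)", data=df).fit(disp=False)
print(fit.params)   # coefficients of the spline basis terms
```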
Predicting the risk of hospital admission in OPAT patients
Clinical Microbiology and Infection 2019; 25(7): 905.e1-905.e7
Predicting the risk of hospital admission in OPAT patients
Final model equation:
• Linear predictor (LP) for a patient :
LP = −3.628 + (0.016 × age in years) + (0.264 × number of prior hospitalizations)
+ (0.103 × Charlson comorbidity score) + (0.248, if self/carer administration)
+ (0.479, if infusion centre) + (0.635, if IV combination therapy)
+ (0.480, if endovascular infection) − (0.337, if respiratory disease)
+ (0.189, if urogenital infection) − (0.037, if bone and joint infection)
− (0.776, if skin and soft tissue infection).
• Probability of 30-day unplanned hospitalization for the same patient:
Probability = 1 / (1 + e^−LP)
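To make the equation concrete, the sketch below plugs an invented patient profile into the published coefficients and converts the linear predictor to a probability; only the coefficients come from the model above, the example patient is hypothetical.

```python
# Worked example: linear predictor and 30-day hospitalization probability
# using the published coefficients above; the patient profile is hypothetical.
import math

def opat_risk(age, prior_hosp, charlson, self_carer=0, infusion_centre=0,
              iv_combination=0, endovascular=0, respiratory=0,
              urogenital=0, bone_joint=0, ssti=0):
    lp = (-3.628 + 0.016 * age + 0.264 * prior_hosp + 0.103 * charlson
          + 0.248 * self_carer + 0.479 * infusion_centre + 0.635 * iv_combination
          + 0.480 * endovascular - 0.337 * respiratory + 0.189 * urogenital
          - 0.037 * bone_joint - 0.776 * ssti)
    return 1 / (1 + math.exp(-lp))

# Hypothetical 70-year-old with 2 prior hospitalizations, Charlson score 3,
# treated at an infusion centre for a bone and joint infection
print(round(opat_risk(age=70, prior_hosp=2, charlson=3,
                      infusion_centre=1, bone_joint=1), 3))
```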
Predicting the risk of hospital admission in OPAT patients
BJI, bone and joint infection; CN, community nurse; EI, endovascular infection; IC, infusion
centre; IV, intravenous; OT, other indication; RD, respiratory disease; SC, self/carer
administration; SSTI, skin and soft tissue infection; UGI, urogenital infection.
[Nomogram: points are assigned for age (15-95 years), number of prior hospitalizations (0-10), Charlson comorbidity score (0-16), mode of delivery (CN/SC/IC), IV combination therapy (no/yes) and indication for OPAT (OT/EI/RD/UGI/BJI/SSTI); total points (0-30) map to a 30-day hospitalization risk of 5% to 95%]
Clinical Microbiology and Infection 2019; 25(7): 905.e1-905.e7
Predicting the risk of hospital admission in OPAT patients
• Internal Validation
– (Apparent) AUC: C = 0.72 (95% CI 0.67 - 0.77)
– (validated) Bootstrap optimism-corrected c = 0.70
– Calibration slope 0.99 (95% CI 0.78 - 1.21)
Clinical Microbiology and Infection 2019; 25(7): 905.e1-905.e7
Hosmer-Lemeshow test
p = 0.546
“The prediction model may help
improve OPAT outcomes through better
identification of high-risk patients and
provision of tailored care”
“Our model should be prospectively
validated in an independent diverse
patient population before use in actual
patient care”
Predicting the risk of hospital admission in OPAT patients
• External validation
– Derivation cohort (n = 1073, events = 123, 11.5%), Sheffield 2015-2017
– Temporal validation cohort (n = 1087, events=159, 14.6%), Sheffield 2018-2020
– Broad external validation (n = 418, events=117, 28%), Derby 2018-2020
J Antimicrob Chemother 2021;76(8):2204-2212
Predicting the risk of hospital admission in OPAT patients
Temporal
Discrimination
AUC = 0.75
(95%CI 0.71 – 0.79)
Calibration
Slope = 1.05
Intercept = 0.16
Broader
Discrimination
AUC = 0.67
(95%CI 0.61 – 0.73)
Calibration
Slope = 1.01
Intercept = 0.54
JAC 2021
Predicting the risk of hospital admission in OPAT patients
Durojaiye et al. Journal of Antimicrobial Chemotherapy 2021
Model updating (intercept recalibration)
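A minimal sketch of intercept-only recalibration, assuming simulated data: the original coefficients are kept fixed by entering the existing linear predictor as an offset, and only the intercept is re-estimated in the new population.

```python
# Intercept recalibration: re-estimate only the intercept in the new population,
# keeping the original linear predictor (LP) fixed as an offset. Simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 800
lp_original = rng.normal(-2.0, 1.0, n)      # LP computed from the existing model
# Simulate a new setting with a higher baseline risk (shifted intercept)
y_new = rng.binomial(1, 1 / (1 + np.exp(-(lp_original + 0.5))))

recal = sm.GLM(y_new, np.ones((n, 1)), family=sm.families.Binomial(),
               offset=lp_original).fit()
delta = recal.params[0]                     # estimated intercept correction
print(round(delta, 2))                      # updated LP = original LP + delta
```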
Predicting the risk of hospital admission in OPAT patients
Decision curve analysis (clinical utility): greater net benefit (i.e. clinically useful)
for a range of risk thresholds (15% to 50%)
Durojaiye et al. Journal of Antimicrobial Chemotherapy 2021
Predicting the risk of hospital admission in OPAT patients
• Conclusion
– the prediction model is temporally and externally valid, and clinically useful for the
prediction of 30 day unplanned hospitalization in patients receiving OPAT.
– It may help improve OPAT outcomes through better identification of high-risk
patients and provision of tailored care.
• What’s next?
– Translational research:
– Investigate introduction of CPM into clinical practice (acceptability, presentation)
– Plan an impact study (time series quasi experimental design)
Durojaiye et al. Journal of Antimicrobial Chemotherapy 2021
Clinical prediction models:
What about machine learning?
ML is trendy!
[Figure: annual number of PubMed records mentioning "machine learning", rising steeply to roughly 16,000 per year]
Do ML algorithms outperform traditional regression models?
JAMA Network Open. 2020;3(1):e1918962.
January 10, 2020
• “Traditional statistical methods seem to
be more useful when EPV is small and a
priori knowledge is substantial
• ML could be more suited for a huge bulk
of data, such as omics, radiodiagnostics,
image analysis
• Integration of the two approaches should
be preferred over a unidirectional choice
of either approach.”
Medicina 2020, 56, 455
ML/AI: Four myths
1. Big Data will resolve the problems with small datasets
2. ML/AI is very different from classical modeling
3. ML / AI is better than classical modeling for medical prediction problems
4. ML / AI leads to better generalizability
No: Big data, Big controversies and possibly Big errors!
e.g. Ala-Korpela et al. Int J Epidemiol 2021
No: cultural differences, but a continuum
No: systematic reviews provide no supporting evidence
No: any prediction model may suffer from poor generalizability
Thank you
for your attention