1. Bias in COVID-19 models
Learning Machine Learning
Universidad del Rosario, 15/07/2021
Laure Wynants PhD
Maastricht University, Department of Epidemiology
KU Leuven, Department of Development and Regeneration, EPI-Centre
laure.wynants@maastrichtuniversity.nl
@laure_wynants
9. Some more terminology
– you may want to take a screenshot
Statistics / Epi → Machine learning
Prediction → Supervised learning
Outcome variable, dependent variable → Target
Gold standard → Ground truth
Predictor, covariate, independent variable → Feature
Fitting → Learning
Parameter → Weights
Development / validation → Training / test
Sensitivity → Recall
Positive predictive value → Precision
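These are not merely analogous terms but identical quantities. A minimal sketch with made-up labels, using scikit-learn, showing that sensitivity equals recall and positive predictive value equals precision:

```python
# Made-up labels purely to illustrate the terminology mapping.
from sklearn.metrics import precision_score, recall_score

y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]  # gold standard / ground truth
y_pred = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]  # model classifications

# Sensitivity (stats/epi) = recall (ML): true positives / all true cases.
print("sensitivity = recall =", recall_score(y_true, y_pred))   # 0.8

# Positive predictive value (stats/epi) = precision (ML):
# true positives / all positive predictions.
print("PPV = precision =", precision_score(y_true, y_pred))     # 0.8
```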
11. Why bother?
“As of today, we have deployed the system in 16 hospitals, and it is
performing over 1,300 screenings per day”
medRxiv preprint only, 23 March 2020, doi.org/10.1101/2020.03.19.20039354
16. Characteristics of reviewed models II
114 out of 236 models (48%) were available in a format for use in clinical practice.
17. Commonly included predictors
Diagnostic models:
• Vital signs (fever)
• Flu-like signs and symptoms
• Age
• Electrolytes
• Image features
Prognostic models:
• Age
• Comorbidities
• Vital signs
• Image features
• Sex
19. Performance: AUC
• General population models: 0.71 to ≥0.99
• Diagnostic models: 0.65 to ≥0.99
• Diagnostic severity models: 0.80 to ≥0.99
• Diagnostic imaging models: 0.70 to ≥0.99
• Prognostic models: 0.54 to ≥0.99
(prediction horizon varies from 1 to 37 days, if reported)
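For orientation: the AUC is the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case (0.5 = chance, 1.0 = perfect discrimination). A minimal sketch with simulated data showing how the statistic is computed; the ranges above are the reported values, not recomputed here:

```python
# Simulated outcome and noisy risk predictions, purely to show the computation.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)        # outcome labels (case / non-case)
risk = 0.3 * y + rng.random(200)        # predictions that track y imperfectly
print("AUC:", roc_auc_score(y, risk))   # roughly 0.75 for this setup
```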
20. How often can we trust the estimated
predictive performance?
• 187 / 236 models
• 121 / 236 models
• 4 / 236 models
21. Characteristics of reviewed models
Median (IQR)
Sample size: 344 (134 to 748)
Number of events: 70 (37 to 160)
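To see why these medians matter for what follows: under the rough (and much-debated) heuristic of at least 10 outcome events per candidate predictor parameter, the median study could support only a handful of predictors. A back-of-the-envelope sketch:

```python
# Heuristic only: 10 events per parameter is a rule of thumb, not a law;
# modern sample-size guidance for prediction models is more nuanced.
median_events = 70          # median number of events in the reviewed studies
events_per_parameter = 10
print("max candidate parameters ~", median_events // events_per_parameter)  # 7
```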
26. Why care about bias?
• A good model could improve care and reduce costs
• Help allocate scarce resources
27. Poor models can make things worse
• Inaccurate predictions -> harmful decisions (Van Calster & Vickers, Med Dec Mak, 2015)
• ICU scores during H1N1 pandemic (Enfield, Chest, 2011)
30. What is bias anyway?
According to epidemiologists:
“an error in the conception and design of a study – or in the collection, analysis, interpretation, reporting, publication, or review of data – leading to results or conclusions that are systematically (as opposed to randomly) different from truth”
Porta M, ed. A Dictionary of Epidemiology. 6th Edition. Oxford: Oxford University Press, 2014.
32. Risk of bias in prediction models
“We define risk of bias to occur when
shortcomings in study design, conduct, or
analysis could lead to systematically
distorted estimates of a model’s
predictive performance.”
PROBAST
33. The numbers are only as good as the process
producing them
Signalling questions in 4 domains: Participants, Predictors, Outcome, Analysis
38. Issues
The set of images from patients with covid-19 stems from a different source than the set of images from patients without covid-19:
1. Non-covid images not representative of typical patients suspected of having covid-19
• Metadata (e.g. age, comorbidities such as pre-existing chronic lung disease)?
• Alternative diagnoses in the target population include pathology such as heart failure or pulmonary embolism, …
• Predictive performance (AUC, PPV (precision), NPV, calibration) depends on patient case-mix
2. Sets differ systematically in many respects -> spurious correlations -> inflated performance
• Geographical location, time period (pre- or post-12/2019), type of machine, settings of the imaging procedure, image preparation/preprocessing
3. Frankenstein datasets (see the sketch below)
• Combinations of existing databases of images
• The same images are often included more than once
• Train and test sets are no longer independent
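One practical safeguard (a common approach, not something prescribed in the talk): hash the raw image bytes before splitting, so that identical files pulled in from several source databases cannot end up in both the training and test sets. The folder path and file extension below are hypothetical placeholders:

```python
# Deduplicate a combined image collection by content hash before splitting.
import hashlib
from pathlib import Path

def file_digest(path: Path) -> str:
    """Content hash of a file; byte-identical copies share a digest."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

seen = {}            # digest -> first path seen with that content
unique_images = []
for path in Path("combined_dataset").rglob("*.png"):  # hypothetical folder
    digest = file_digest(path)
    if digest in seen:
        print(f"duplicate: {path} == {seen[digest]}")  # same image, two sources
    else:
        seen[digest] = path
        unique_images.append(path)

# Split into train/test only AFTER deduplication, and ideally by patient ID,
# since one patient can contribute several distinct images.
```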
40. Predictors
1. Were predictors defined and assessed in a similar way for all participants?
2. Were predictor assessments made without knowledge of outcome data?
3. Are all predictors available at the time the model is intended to be used?
42. Problem: predict mortality due to covid-19
• Comparable?
• Measured once vs measured throughout the hospital stay?
• Actionable for doctors?
• Are we predicting death or are we diagnosing it (the patient is already dead/dying)?
45. Outcome
1. Was the outcome determined appropriately?
2. Was a pre-specified or standard outcome definition used?
3. Were predictors excluded from the outcome definition?
4. Was the outcome defined and determined in a similar way for all participants?
5. Was the outcome determined without knowledge of predictor information?
6. Was the time interval between predictor assessment and outcome determination appropriate?
47. arXiv:2003.07347v3
Problem: identify people at risk in the general population
Is it appropriate to predict covid-19 hospitalization risk
without data on covid-19 hospitalizations?
49. Analysis
1. Were there a reasonable number of participants with the outcome?
2. Were continuous and categorical predictors handled appropriately?
3. Were all enrolled participants included in the analysis?
4. Were participants with missing data handled appropriately?
5. Was selection of predictors based on univariable analysis avoided?
6. Were complexities in the data (e.g. censoring, competing risks, sampling of control participants) accounted for appropriately?
7. Were relevant model performance measures evaluated appropriately?
8. Were model overfitting and optimism in model performance accounted for?
9. Do predictors and their assigned weights in the final model correspond to the results from the reported multivariable analysis?
50. Problem: predict covid-19 mortality
DOI: 10.1093/cid/ciaa538
Very little data to learn from -> risk of overfitting
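Question 8 of the Analysis checklist asks about exactly this. A minimal simulated sketch of one standard remedy, Harrell-style bootstrap optimism correction, assuming a scikit-learn logistic regression stands in for the model; the whole fitting procedure is repeated in each bootstrap sample:

```python
# Bootstrap optimism correction on simulated data: small n, mostly noise
# predictors, so the apparent AUC overstates true performance.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n, p = 100, 10
X = rng.normal(size=(n, p))
y = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))   # only one real predictor

model = LogisticRegression(max_iter=1000).fit(X, y)
apparent = roc_auc_score(y, model.predict_proba(X)[:, 1])

optimism = []
for _ in range(200):
    idx = rng.integers(0, n, size=n)              # bootstrap resample
    m = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
    boot = roc_auc_score(y[idx], m.predict_proba(X[idx])[:, 1])
    orig = roc_auc_score(y, m.predict_proba(X)[:, 1])
    optimism.append(boot - orig)                  # performance lost out-of-sample

print("apparent AUC:           ", round(apparent, 3))
print("optimism-corrected AUC: ", round(apparent - np.mean(optimism), 3))
```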
51. Handling of missing data for training data: not reported
Excluding patients with missing data leads to biased results when the analyzed individuals are a selective subgroup from the original sample
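A minimal simulated sketch of the alternative to excluding patients: impute the missing values so every row stays in the analysis. Single imputation with scikit-learn's IterativeImputer is shown for brevity; proper multiple imputation (e.g. MICE, with several imputed datasets and pooled estimates) is preferable in practice:

```python
# Impute rather than drop: all 200 simulated patients remain in the analysis.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
mask = rng.random(X.shape) < 0.15    # ~15% of values missing at random
X_missing = X.copy()
X_missing[mask] = np.nan

X_imputed = IterativeImputer(random_state=0).fit_transform(X_missing)
print("rows kept:", len(X_imputed), "of", len(X))   # 200 of 200
```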
53. Analysis
(Signalling questions repeated from slide 49.)
54. Problem: predict covid-19 mortality
DOI: 10.1093/cid/ciaa538
• Some associations may be spurious, and predictors may no longer be important after you take others into account
• Predictors known from previous research to be important may not reach statistical significance (for example, due to small sample size)
• Some predictors are important only after adjustment for other predictors (see the simulated sketch below)
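A small simulation makes the last point concrete: below, x2 genuinely predicts the outcome, but a correlated predictor masks its univariable association, so univariable screening would discard it. A linear outcome is used purely for simplicity:

```python
# Simulated suppression: x2 matters, but only after adjusting for x1.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 500
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + 0.5 * rng.normal(size=n)   # x2 correlated with x1
y = x1 - x2 + rng.normal(size=n)           # both matter, with opposite signs

uni = sm.OLS(y, sm.add_constant(x2)).fit()
multi = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
print("univariable p-value for x2:  ", round(uni.pvalues[1], 3))    # weak
print("multivariable p-value for x2:", round(multi.pvalues[1], 3))  # strong
```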
55. Problem: predict covid-19 mortality
DOI: 10.1093/cid/ciaa538
• How far ahead are we predicting? Not everyone is followed up for the same amount of time (16 hours vs > 1 month)
• Excludes over half of the patients!
• Survival analysis uses the available information on all patients and is more appropriate for this type of data (see the sketch below)
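A minimal simulated sketch of the survival-analysis alternative, using the lifelines package: patients with short follow-up are censored rather than discarded, so every patient contributes to the estimate. The data and the 40-day follow-up window are made up:

```python
# Cox regression with censoring: no patient is excluded for short follow-up.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(4)
n = 300
df = pd.DataFrame({"age": rng.normal(65, 10, n)})
# Hypothetical survival times, shorter on average for older patients.
df["time_days"] = rng.exponential(30 * np.exp(-(df["age"] - 65) / 20))
df["died"] = (df["time_days"] < 40).astype(int)  # death observed in follow-up
df.loc[df["died"] == 0, "time_days"] = 40        # censored at end of follow-up

cph = CoxPHFitter()
cph.fit(df, duration_col="time_days", event_col="died")
cph.print_summary()   # hazard ratio for age, estimated from ALL patients
```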
58. Analysis
(Signalling questions repeated from slide 49.)
59. Problem: predict covid-19 mortality
DOI: 10.1093/cid/ciaa538
• Very little data for testing
• Calibration is not assessed
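Calibration asks whether predicted risks match observed event rates, something discrimination (AUC) alone cannot reveal. A minimal simulated sketch using scikit-learn's calibration_curve, with predictions deliberately inflated so the miscalibration is visible:

```python
# Group patients by predicted risk and compare predicted vs observed rates.
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(5)
true_risk = rng.random(2000)
y = rng.binomial(1, true_risk)           # outcomes drawn from the true risks
pred = np.clip(1.5 * true_risk, 0, 1)    # deliberately overestimates risk

observed, predicted = calibration_curve(y, pred, n_bins=10)
for o, e in zip(observed, predicted):
    print(f"predicted {e:.2f}  observed {o:.2f}")   # observed < predicted
```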
61. Conclusion
• Despite reports of impressive predictive performance, much
of the growing body of literature on prediction research for
covid-19 is of low quality.
• Don't trust good reported performance alone – study design, analysis, and validation matter!
• Prediction is not just a methodological exercise to get the
best performance on your dataset. You need to be able to
trust the predictions for real patients.
62. If it’s not reported, it’s unclear to everyone but yourself
22 items deemed essential for transparent reporting of a prediction model study (the TRIPOD statement)