This document discusses quantifying uncertainty in decision curve analysis when evaluating clinical prediction models. It presents several methods for quantifying uncertainty in net benefit, including confidence intervals, P(useful), and the expected value of perfect information (EVPI). Limited sample sizes and heterogeneity between populations can introduce uncertainty about the optimal clinical strategy. Quantifying this uncertainty is important for understanding the value of further external validation of prediction models, but traditional hypothesis testing is not always appropriate.
2. Impact of prediction models in clinical practice
Example: The IOTA ADNEX model
Multinomial logistic regression model based on age, CA-125, six ultrasound characteristics, and center type
Among patients with ovarian masses, refer patients with the highest risk of malignancy to oncological care/surgery
Wynants L, van Smeden M, McLernon DJ, et al. Three myths about risk thresholds for prediction models. BMC Med 17, 192 (2019).
3. A link between costs, benefits, and the risk threshold to intervene
The risk threshold can be chosen to minimize the expected total costs. For a (calibrated) risk model:

$$t = \frac{C_{FP} - C_{TN}}{C_{FP} + C_{FN} - C_{TP} - C_{TN}}$$

Assuming $C_{TN} = 0$ and recognizing the benefit of intervening when needed, $B_{TP} = C_{FN} - C_{TP}$:

$$t = \frac{C_{FP}}{C_{FP} + B_{TP}}$$

and

$$\frac{t}{1-t} = \frac{C_{FP}}{B_{TP}}$$

Wynants L, van Smeden M, McLernon DJ, et al. Three myths about risk thresholds for prediction models. BMC Med 17, 192 (2019).
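The threshold formula above takes only a few lines to verify in code. A minimal sketch in Python; the cost values are hypothetical, chosen only to illustrate the algebra:

```python
def risk_threshold(c_fp, b_tp):
    """Cost-minimizing risk threshold for a calibrated model:
    t = C_FP / (C_FP + B_TP), assuming C_TN = 0 and B_TP = C_FN - C_TP."""
    return c_fp / (c_fp + b_tp)

# Hypothetical costs: a false positive costs 1 unit, and intervening
# when needed yields a net benefit of 9 units.
t = risk_threshold(c_fp=1.0, b_tp=9.0)
print(t)              # 0.1
print(t / (1.0 - t))  # ~0.111, i.e. the odds form C_FP / B_TP = 1/9
```

The odds form makes the interpretation explicit: a threshold of 10% means accepting up to nine false positives per true positive.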
4. Net Benefit
Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making. 2006;26:565–74.
Van Calster B, Valentin L, Froyman W, Landolfo C, Ceusters J, Testa AC, et al. Validation of models to diagnose ovarian cancer in patients managed surgically or conservatively: multicentre cohort study. BMJ 2020;370:m2614. doi:10.1136/bmj.m2614
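Net benefit, as defined by Vickers & Elkin (2006), weighs true positives against false positives using the threshold odds. A minimal sketch in Python; the toy labels and risks below are made up for illustration:

```python
import numpy as np

def net_benefit(y, risk, t):
    """Net benefit of 'intervene if predicted risk >= t':
    TP/n - FP/n * t/(1-t)  (Vickers & Elkin 2006)."""
    y = np.asarray(y)
    treat = np.asarray(risk) >= t
    n = len(y)
    tp = np.sum(treat & (y == 1))
    fp = np.sum(treat & (y == 0))
    return tp / n - fp / n * t / (1.0 - t)

def net_benefit_treat_all(y, t):
    """Reference strategy: intervene in everyone (treat-none has NB = 0)."""
    prev = np.mean(y)
    return prev - (1.0 - prev) * t / (1.0 - t)

# Toy example at threshold t = 0.10:
y = np.array([1, 1, 0, 0, 0])
risk = np.array([0.40, 0.05, 0.20, 0.08, 0.02])
print(net_benefit(y, risk, 0.10))      # model
print(net_benefit_treat_all(y, 0.10))  # treat all
```

Net benefit is on the scale of net true positives per patient, which is what makes strategies directly comparable in a decision curve.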
5. How certain are you?
“In contemplating a change from a default policy (treat-all or treat-none) to a policy based on estimated risk, policymakers should know the strength of the evidence in favor of the policy change – they need some quantification of uncertainty such as confidence intervals”
Kerr, Marsh, Janes 2019
Kerr KF, Marsh TL, Janes H. The Importance of Uncertainty and Opt-In v. Opt-Out: Best Practices for Decision Curve Analysis. Med Decis Making. 2019 Jul;39(5):491-492.
6. N = 4905, 1039 events
N = 250, 53 events (average external validation sample size)
7. CI around NB_model
Pro
• Bootstrap, closed-form asymptotic, and Bayesian methods available for pointwise CIs or confidence bands for NB_model (Zhang 2018, Marsh 2020, Pfeiffer 2020, Sande 2020, Cruz 2023)
• Reflects uncertainty when treat-none is the only competing strategy
Con
• Does not reflect uncertainty if treat-all or other models/tests are (among the) competing strategies
Zhang Z, et al. Ann Transl Med. 2018;6:308; Marsh TL, et al. Biometrics. 2020;76:843–52; Pfeiffer RM, Gail MH. Biom J. 2020;62:764–76; Sande SZ, et al. Stat Med. 2020;39:2980–3002; Cruz GNF, Korthauer K. arXiv 2023.
9. 95% CI for the difference in NB with all strategies
Van Calster B, Valentin L, Froyman W, Landolfo C, Ceusters J, Testa AC, et al. Validation of models to diagnose ovarian cancer in patients managed surgically or conservatively: multicentre cohort study. BMJ 2020;370:m2614. doi:10.1136/bmj.m2614
https://github.com/mdbrown/rmda
t=1%, n=250: diff vs treat none 0.206 (95% CI 0.153 to 0.254); diff vs treat all -0.003 (95% CI -0.011 to 0.002)
t=10%, n=250: diff vs treat none 0.192 (95% CI 0.140 to 0.245); diff vs treat all 0.066 (95% CI 0.052 to 0.078)
10. Statistical test for ΔNB=0?
• Would delay the introduction of reliable models, and lead to patient harm
  • Traditional decision theory dictates that we choose the model with the highest NB, irrespective of statistical significance
• Would make research unfeasible
  • Sample sizes in the millions are sometimes needed to demonstrate a statistically significant difference between two models
• Only valid for the population the data were gathered in
• Too much uncertainty?
  • If risk-neutral: use the model but continue validation; revise the strategy later if needed
  • If not risk-neutral: depending on (de-)implementation costs, wait to replace well-established practice until uncertainty is “sufficiently” reduced → context-specific α
Vickers AJ, Van Calster B, Wynants L, Steyerberg EW. Decision curve analysis: confidence intervals and hypothesis testing for net benefit. Diagn Progn Res. 2023 Jun 6;7(1):11; Claxton K. The irrelevance of inference: a decision-making approach to the stochastic evaluation of health care technologies. J Health Econ. 1999;18:341–64.
11. 95% CI for the difference in NB
Pro
• Shows uncertainty due to limited sample size
• Bootstrap, closed-form asymptotic, and Bayesian methods available (Vickers 2008, Zhang 2018, Marsh 2020, Pfeiffer 2020, Sande 2020, Cruz 2023)
Con
• No yes/no decision (justify your alpha)
• Only valid in the population the data came from
Vickers AJ, et al. BMC Med Inform Decis Mak. 2008;8:53; Zhang Z, et al. Ann Transl Med. 2018;6:308; Marsh TL, et al. Biometrics. 2020;76:843–52; Pfeiffer RM, Gail MH. Biom J. 2020;62:764–76; Sande SZ, et al. Stat Med. 2020;39:2980–3002; Cruz GNF, Korthauer K. arXiv 2023.
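One way to obtain such an interval is a percentile bootstrap over patients. The sketch below uses simulated data (not the ovarian cancer cohort) and computes a 95% CI for NB_model − NB_treat-all:

```python
import numpy as np

def net_benefit(y, risk, t):
    """NB of 'treat if risk >= t': TP/n - FP/n * t/(1-t)."""
    n = len(y)
    treat = risk >= t
    tp = np.sum(treat & (y == 1))
    fp = np.sum(treat & (y == 0))
    return tp / n - fp / n * t / (1.0 - t)

def bootstrap_ci_delta_nb(y, risk, t, n_boot=2000, seed=0):
    """Percentile bootstrap 95% CI for NB(model) - NB(treat all)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    deltas = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample patients with replacement
        yb, rb = y[idx], risk[idx]
        prev = yb.mean()
        nb_all = prev - (1.0 - prev) * t / (1.0 - t)
        deltas[b] = net_benefit(yb, rb, t) - nb_all
    return np.percentile(deltas, [2.5, 97.5])

# Simulated validation set of n = 250 with roughly 20% events:
rng = np.random.default_rng(1)
y = rng.binomial(1, 0.2, size=250)
risk = np.clip(0.2 + 0.3 * y + rng.normal(0, 0.15, size=250), 0.01, 0.99)
lo, hi = bootstrap_ci_delta_nb(y, risk, t=0.10)
print(lo, hi)
```

Resampling whole patients keeps the joint distribution of outcome and predicted risk intact, so prevalence uncertainty propagates into both strategies.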
13. P(Useful)
• Trivariate meta-analysis model
• Sample logit prevalence, logit sensitivity, and logit specificity for a new center
• Calculate the NB of the model and the competing strategies in the new center
Wynants L, Riley RD, Timmerman D, Van Calster B. Random-effects meta-analysis of the clinical utility of tests and prediction models. Stat Med. 2018;37:2034–52. ERRATUM: https://doi.org/10.1002/sim.9597
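The steps above can be sketched as a Monte Carlo computation. The predictive mean and covariance below are illustrative placeholders, not the fitted meta-analysis estimates:

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_useful(mu, Sigma, t, n_draws=20000, seed=2):
    """Monte Carlo P(useful): share of predicted new-center draws in which
    the model attains the highest net benefit among model / treat all /
    treat none. mu, Sigma describe the predictive distribution of
    (logit prevalence, logit sensitivity, logit specificity)."""
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(mu, Sigma, size=n_draws)
    prev = expit(draws[:, 0])
    sens = expit(draws[:, 1])
    spec = expit(draws[:, 2])
    w = t / (1.0 - t)
    nb_model = sens * prev - (1.0 - spec) * (1.0 - prev) * w
    nb_all = prev - (1.0 - prev) * w
    # treat-none has NB = 0
    return np.mean((nb_model > nb_all) & (nb_model > 0.0))

# Illustrative predictive distribution (not the ADNEX meta-analysis values):
mu = np.array([-1.4, 2.2, 1.5])    # prev ~ 0.20, sens ~ 0.90, spec ~ 0.82
Sigma = np.diag([0.3, 0.4, 0.4])   # between-center heterogeneity
print(p_useful(mu, Sigma, t=0.10))
```

In practice the draws would come from the posterior predictive distribution of the trivariate random-effects model rather than a fixed multivariate normal.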
14. P(Useful)
Systematic review of ADNEX: 37 studies reporting sensitivity/specificity/prevalence
P(useful) = 0.95
Barreñada L, Ledger A, Dhiman P, et al. The ADNEX risk prediction model for ovarian cancer diagnosis: a systematic review and meta-analysis of external validation studies. medRxiv. 2023:2023.07.12.23291935. doi:10.1101/2023.07.12.23291935
15. P(Useful)
2403 patients from 18 centers; 2320 patients from 11 studies
Wynants L, Riley RD, Timmerman D, Van Calster B. Random-effects meta-analysis of the clinical utility of tests and prediction models. Stat Med. 2018;37:2034–52.
16. P(Useful)
Pro
• Bayesian and bootstrap methods available for meta-analysis and single validation (Wynants 2018, Sadatsafavi 2023, Cruz 2023)
• Meta-analysis recognizes that performance (and NB) varies between populations
Con
• No yes/no decision
• Ignores the magnitude of harm: the harm induced in specific centers may be very small or even negligible
Wynants L, et al. Stat Med. 2018;37:2034–52. ERRATUM: https://doi.org/10.1002/sim.9597; Sadatsafavi M, et al. Med Decis Making. 2023;43(5):564-75; Cruz GNF, Korthauer K. 2023. doi:10.48550/arXiv.2308.02067
17. EVPI
• Do not just look at the probability that using the model would lead to harm
• Also look at how much harm it would do if the model is not the optimal strategy
• A value-of-information metric to quantify the expected return on investment in research
Wilson ECF. A Practical Guide to Value of Information Analysis. PharmacoEconomics 33, 105–121 (2015). https://doi.org/10.1007/s40273-014-0219-x
18. Pick a box
Thanks to Mohsen Sadatsafavi for this example
Box A: 0€ or 50€; Box B: 0€ or 100€

Box A   Box B   P
  0       0     0.25
  0     100     0.25
 50       0     0.25
 50     100     0.25
E(reward) = 25   E(reward) = 50

EVPI = expected reward under perfect info − expected reward under current info
EVPI = 62.5€ − 50€ = 12.5€
19. What is the maximum bribe you should pay?
Box A: 0€ or 50€; Box B: 0€ or 100€

Box A   Box B   P      Decision with perfect info   Reward with perfect info
  0       0     0.25   A or B                         0
  0     100     0.25   B                            100
 50       0     0.25   A                             50
 50     100     0.25   B                            100
E(reward) = 25   E(reward) = 50                    E(reward) = 62.5

EVPI = expected reward under perfect info − expected reward under current info
EVPI = 62.5€ − 50€ = 12.5€
This is the expected gain of having perfect information
20. Opportunity loss
Box A: 0€ or 50€; Box B: 0€ or 100€

Box A   Box B   P      Decision (perfect info)   Reward (perfect info)   B−A    Loss under current info
  0       0     0.25   A or B                      0                       0     0
  0     100     0.25   B                         100                     100     0
 50       0     0.25   A                          50                     -50    50
 50     100     0.25   B                         100                      50     0
E(reward) = 25   E(reward) = 50                 E(reward) = 62.5   E(difference) = 25   E(loss) = 12.5

EVPI = expected reward under perfect info − expected reward under current info
EVPI = 62.5€ − 50€ = 12.5€
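The arithmetic in the box example reduces to "expected value of the per-outcome maximum, minus the maximum of the per-box expected values", which is easy to verify in code:

```python
import numpy as np

# The four equally likely (Box A, Box B) reward combinations from the slides.
outcomes = np.array([[0, 0], [0, 100], [50, 0], [50, 100]], dtype=float)
p = np.full(4, 0.25)

# Current information: commit to the box with the best expected reward.
expected_per_box = p @ outcomes            # E(A) = 25, E(B) = 50
reward_current = expected_per_box.max()    # 50.0 -> always pick Box B

# Perfect information: pick the better box in each outcome.
reward_perfect = p @ outcomes.max(axis=1)  # 62.5

evpi = reward_perfect - reward_current
print(evpi)  # 12.5
```

The same two-step pattern (expectation of a maximum versus maximum of expectations) carries over directly to net benefit.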
21. EVPI
[Figure: expected loss, in net true positives/100, plotted against ΔNB]
Wilson ECF. A Practical Guide to Value of Information Analysis. PharmacoEconomics 33, 105–121 (2015). https://doi.org/10.1007/s40273-014-0219-x
22. EVPI for external validation
Sadatsafavi M, Lee TY, Wynants L, Vickers AJ, Gustafson P. Value-of-Information Analysis for External Validation of Risk Prediction Models. Med Decis Making. 2023;43(5):564-75. doi:10.1177/0272989X231178317
23. EVPI
t=1%, n=250: diff vs treat none 0.206 (95% CI 0.153 to 0.254); diff vs treat all -0.003 (95% CI -0.011 to 0.002); P(useful) 0.39; EVPI 0.0006
Scaled to the EU population: a gain of 210 correctly detected cancers/year, equivalent to a gain of 1890 avoided false referrals/year
t=10%, n=250: diff vs treat none 0.192 (95% CI 0.140 to 0.245); diff vs treat all 0.066 (95% CI 0.052 to 0.078); P(useful) 1; EVPI 0
24. EVPI
Cruz GNF, Korthauer K. Bayesian Decision Curve Analysis with bayesDCA. 2023. doi:10.48550/arXiv.2308.02067
25. EVPI – extension to heterogeneous data

N centers: 37
Total N: 9989
NB model (t=0.1): 0.28 (CrI 0.26 to 0.33, PrI 0.09 to 0.67)
NB treat all: 0.25 (CrI 0.18 to 0.29, PrI -0.01 to 0.63)
P(useful): 0.96
VoI, perfect population-level info: 0
VoI, perfect center-level info*: 0.00038
VoI, partial perfect center-level info: 0.00003
VoI, perfect info in the first center: 0.0821

*NB under perfect center-level info − NB under current info:
$$\mathbb{E}_{\psi}\mathbb{E}_{\theta|\psi}\max\{NB_{model}(\theta),\, NB_{all}(\theta),\, 0\} \;-\; \max\{\mathbb{E}_{\psi}\mathbb{E}_{\theta|\psi}NB_{model}(\theta),\, \mathbb{E}_{\psi}\mathbb{E}_{\theta|\psi}NB_{all}(\theta),\, 0\}$$
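The two-level formula above lends itself to Monte Carlo approximation: draw population-level parameters ψ, then center-level parameters θ given ψ. All distributions below are illustrative placeholders, not the fitted meta-analysis model:

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def voi_perfect_center_info(t=0.10, n_psi=1000, n_theta=200, seed=3):
    """Monte Carlo approximation of
    E_psi E_theta|psi max{NB_model, NB_all, 0}
      - max{E NB_model, E NB_all, 0}."""
    rng = np.random.default_rng(seed)
    w = t / (1.0 - t)
    # psi: population-level means of (logit prev, logit sens, logit spec)
    psi = rng.normal([-1.4, 2.2, 1.5], 0.2, size=(n_psi, 3))
    # theta | psi: center-level parameters with between-center heterogeneity
    theta = psi[:, None, :] + rng.normal(0.0, 0.5, size=(n_psi, n_theta, 3))
    prev = expit(theta[..., 0])
    sens = expit(theta[..., 1])
    spec = expit(theta[..., 2])
    nb_model = sens * prev - (1.0 - spec) * (1.0 - prev) * w
    nb_all = prev - (1.0 - prev) * w
    # E_psi E_theta|psi max{NB_model, NB_all, 0}: best strategy per center
    reward_perfect = np.maximum(np.maximum(nb_model, nb_all), 0.0).mean()
    # max{E NB_model, E NB_all, 0}: one strategy fixed for all centers
    reward_current = max(nb_model.mean(), nb_all.mean(), 0.0)
    return reward_perfect - reward_current

print(voi_perfect_center_info())  # non-negative by Jensen's inequality
```

Because max is convex, the expectation of the per-center maximum can never fall below the maximum of the expectations, so this VoI is always ≥ 0.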
26. EVPI
Pro
• Bayesian, bootstrap, and closed-form asymptotic methods available (Sadatsafavi 2023, Cruz 2023)
• Yes/no answer (need to decide how many € you are willing to pay for 1 net TP)
Con
• Interpretation (net number of TPs; expected loss instead of expected difference)
• Perfect information may be utopian (EVSI / expected net gain of sampling)
Sadatsafavi M, Lee TY, Wynants L, Vickers AJ, Gustafson P. Value-of-Information Analysis for External Validation of Risk Prediction Models. Med Decis Making. 2023;43(5):564-75. doi:10.1177/0272989X231178317; Cruz GNF, Korthauer K. Bayesian Decision Curve Analysis with bayesDCA. 2023. doi:10.48550/arXiv.2308.02067
27. Conclusion
• Limited sample sizes and between-center heterogeneity drive uncertainty about the optimal strategy to use in a DCA
• Quantifying the degree of uncertainty is relevant for understanding the value of additional external validation, but beware of mindless NHST
• CIs around ΔNB, P(useful), and EVPI quantify the degree of uncertainty
• In practice, many considerations may play a role in the decision (not) to introduce a model, e.g. lack of face validity, no practical implementation in the workflow, ...
28. References
• Wynants L, Riley RD, Timmerman D, Van Calster B. Random-effects meta-analysis of the clinical utility of tests and prediction models. Stat Med. 2018;37:2034–52.
• Vickers AJ, Van Calster B, Wynants L, Steyerberg EW. Decision curve analysis: confidence intervals and hypothesis testing for net benefit. Diagn Progn Res. 2023 Jun 6;7(1):11.
• Sadatsafavi M, Lee TY, Wynants L, Vickers AJ, Gustafson P. Value-of-Information Analysis for External Validation of Risk Prediction Models. Med Decis Making. 2023;43(5):564-75.
Laure.wynants@maastrichtuniversity.nl