# NY Prostate Cancer Conference - A. Vickers - Session 1: Traditional statistical methods for evaluating prediction models are uninformative: towards a decision analytic approach


### Transcript

• 1. Traditional statistical methods for evaluating prediction models are uninformative: towards a decision analytic approach
• Andrew Vickers
• Department of Epidemiology and Biostatistics
• Memorial Sloan-Kettering Cancer Center
• 2. The Kattan challenge: a clinician comes to you with two models (or tests) and wants to know which to use. What statistical method do you use to help answer the clinician?
• 3. Traditional biostatistical metrics

| Test | Sensitivity | Specificity | PPV | NPV | LR+ | LR− | AUC (Youden) | Brier (mean squared error) |
|---|---|---|---|---|---|---|---|---|
| % Free PSA < 20% | 91% | 40% | 35% | 92% | 1.52 | 0.23 | 0.65 | 0.47 |
| hK2 > 75 pg/ml | 51% | 78% | 45% | 82% | 2.32 | 0.63 | 0.64 | 0.29 |
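As a cross-check on the table, the likelihood ratios follow directly from sensitivity and specificity; a minimal sketch (function names are mine, not from the talk):

```python
def lr_positive(sensitivity, specificity):
    # How much a positive result multiplies the odds of disease
    return sensitivity / (1 - specificity)

def lr_negative(sensitivity, specificity):
    # How much a negative result multiplies the odds of disease
    return (1 - sensitivity) / specificity

# %Free PSA < 20% (sens 91%, spec 40%): LR+ ≈ 1.52, LR− ≈ 0.23
# hK2 > 75 pg/ml  (sens 51%, spec 78%): LR+ ≈ 2.32, LR− ≈ 0.63
```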
• 4. Which test is best?
• Sensitivity / specificity insufficient to determine which test should be used:
• “Depends on whether sensitivity or specificity is more important”
• Traditional biostatistical techniques for evaluating models, markers and tests do not incorporate clinical consequences
• Accordingly, they cannot inform clinical practice
• 5. How do we guide the clinician?
• 6. Prostate cancer: high risk (10%), low risk (90%)
• 7. Prostate cancer: use of the model to determine treatment would avoid 47 unnecessary treatments at the expense of failing to treat 45 patients who do require treatment.
• 8. Discrimination
• Probability of a correct prediction in a pair of discordant patients
• C index or area under the curve (AUC)
• Range: 0.5 (coin flip) – 1 (perfect)
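The C index can be computed directly from this pairwise definition; a minimal sketch (names are mine):

```python
def c_index(risks, events):
    """Probability that, in a discordant pair (one patient with the event,
    one without), the patient with the event was assigned the higher
    predicted risk. Ties count as half correct."""
    correct = pairs = 0
    for r_event, e1 in zip(risks, events):
        if e1 != 1:
            continue
        for r_none, e0 in zip(risks, events):
            if e0 != 0:
                continue
            pairs += 1
            if r_event > r_none:
                correct += 1
            elif r_event == r_none:
                correct += 0.5
    return correct / pairs

# A coin-flip model hovers around 0.5; a perfect model scores 1.0.
```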
• 9. Discrimination rules!
• Most widely reported statistic
• “Predictive accuracy” & “discrimination” used interchangeably
• 10. From the literature
• “The AUC was 0.734 … accurate and useful in patients with micrometastasis in the sentinel lymph node.”
• “AUC of 0.78 … the use of a nomogram may be useful for predicting the probability of pCR and for the design of the proper therapeutic algorithm in locally advanced breast cancer.”
• 11. Discrimination? So what?
• How high a discrimination is “high enough” to warrant use of a model?
• 0.65
• 0.70
• 0.80
• 12. Discrimination? So what?
• What increase in discrimination makes it worth measuring a marker?
• 0.005
• 0.02
• 0.05
• 13. Extremely bad models can have extremely good discrimination
• Take the Kattan nomogram (C index = 0.8) and divide all predictions by 10
• Recurrence risk of 40% => 4%
• Discrimination is unaffected
• High “predictive accuracy”, but we’d tell high-risk men not to worry.
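This invariance is easy to demonstrate: dividing every prediction by 10 is a monotone transform, so every discordant pair is still ordered the same way and the C index is unchanged. A small sketch (illustrative data, not the nomogram's):

```python
def c_index(risks, events):
    # Fraction of discordant pairs (event vs. non-event) ranked correctly
    pairs = [(ri, rj) for ri, ei in zip(risks, events) if ei == 1
                      for rj, ej in zip(risks, events) if ej == 0]
    return sum((ri > rj) + 0.5 * (ri == rj) for ri, rj in pairs) / len(pairs)

risks  = [0.40, 0.45, 0.60, 0.10, 0.35]   # predicted recurrence risks
events = [1, 0, 1, 0, 1]                  # observed recurrence

# Same discrimination, but a 40% risk is now reported as 4%:
assert c_index(risks, events) == c_index([r / 10 for r in risks], events)
```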
• 14. Calibration
• Concordance between predictive and actual risk
• A model is well-calibrated if, for every 100 patients given a risk of X%, close to X have the event
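One way to apply this definition in practice is to bin patients by predicted risk and compare each bin's mean prediction with its observed event rate; a sketch (the decile binning scheme is my choice, not from the talk):

```python
from collections import defaultdict

def calibration_table(predicted, observed, bins=10):
    """For each occupied bin of predicted risk, return (mean predicted
    risk, observed event rate). A well-calibrated model has the two
    close in every bin."""
    groups = defaultdict(list)
    for p, y in zip(predicted, observed):
        groups[min(int(p * bins), bins - 1)].append((p, y))
    return [(sum(p for p, _ in g) / len(g), sum(y for _, y in g) / len(g))
            for _, g in sorted(groups.items())]
```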
• 15. Calibration plot
• 16. Problem with calibration
• No good metric
• No single number expresses calibration
• How much miscalibration is “too much” to allow use of a model?
• 17. Incorporating clinical consequences
• “Depends on whether sensitivity or specificity is more important”
• We need:
• A parameter specifying relative importance of sensitivity vs. specificity
• A way to incorporate that parameter in a statistical methodology
• 18. What is the “depends” parameter?
• Threshold probability
• Predicted probability of disease is p̂
• Define a threshold probability of disease as p_t
• Patient accepts treatment if p̂ ≥ p_t
• 19. Intuitively
• The threshold probability at which a patient will opt for treatment is informative of how a patient weighs the relative harms of false-positive and false-negative results.
• Nothing new:
• Decision analytic result since 1970’s
• 20. Clinical net benefit
• The equivalent number of true positives per patient if the false-positive rate were zero
• 21. Application to models with a continuous endpoint
  1. Select a p_t
  2. Positive test defined as p̂ ≥ p_t
  3. Calculate “Clinical Net Benefit” as: Net benefit = (True positives − False positives × p_t / (1 − p_t)) / n
• 22. Kallikrein panel: weight false positives by 20% ÷ (1 − 20%) = 1/4

| Strategy | Cancers found | Unnecessary biopsies | Net benefit (per 1,000 men) |
|---|---|---|---|
| Biopsy all men with elevated PSA | 277 | 723 | 277 − 723 ÷ 4 = 96.25 |
| Biopsy if risk ≥ 20% on panel | 211 | 276 | 211 − 276 ÷ 4 = 142 |
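The table's arithmetic can be reproduced from the net benefit formula (assuming, as the counts suggest, a cohort of 277 + 723 = 1,000 men; the function name is mine):

```python
def net_benefit(tp, fp, n, p_t):
    """True positives, minus false positives weighted by the odds at the
    threshold probability, per patient."""
    return (tp - fp * p_t / (1 - p_t)) / n

# p_t = 20%, so false positives are weighted by 0.2 / 0.8 = 1/4
nb_all   = net_benefit(tp=277, fp=723, n=1000, p_t=0.20)  # biopsy everyone
nb_model = net_benefit(tp=211, fp=276, n=1000, p_t=0.20)  # biopsy if risk >= 20%
```

Per patient these come to 0.09625 and 0.142, i.e. the 96.25 and 142 per 1,000 men in the table.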
• 23. Net benefit has simple clinical interpretation
• Net benefit of 0.142 at p_t of 20%
• Using the model is equivalent to a strategy that identified 14.2 cancers per 100 patients with no unnecessary biopsies
• 24. Net benefit has simple clinical interpretation
• Difference between model and treat all at p_t of 20%: 0.048
• Divide by the weighting: 0.048 / 0.25 = 0.19
• 19 fewer false positives per 100 patients for an equal number of true positives
• E.g. 19 fewer unnecessary biopsies with no missed cancers
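Stated generally, the slide's arithmetic divides the net benefit difference by the threshold odds to get the net reduction in false positives per patient; a sketch using the slide's rounded figures (function name is mine):

```python
def net_fp_reduction(nb_model, nb_all, p_t):
    """Net benefit difference re-expressed as false positives avoided
    per patient, holding the number of true positives equal."""
    return (nb_model - nb_all) / (p_t / (1 - p_t))

# 0.142 vs. 0.094 at p_t = 20% (weighting 0.25) -> about 0.19,
# i.e. roughly 19 fewer unnecessary biopsies per 100 patients
reduction = net_fp_reduction(0.142, 0.094, 0.20)
```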
• 25. Decision curve analysis
  1. Select a p_t
  2. Positive test defined as p̂ ≥ p_t
  3. Calculate “Clinical Net Benefit” as: Net benefit = (True positives − False positives × p_t / (1 − p_t)) / n
  4. Vary p_t over an appropriate range
  Vickers & Elkin Med Decis Making 2006;26:565–574
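The procedure amounts to sweeping p_t and recomputing net benefit for the model and for the two default strategies (treat all, treat none); a minimal sketch:

```python
def decision_curve(risks, events, thresholds):
    """Net benefit of the model vs. 'treat all' at each threshold
    probability p_t; 'treat none' always has net benefit 0."""
    n = len(risks)
    prevalence = sum(events) / n
    curve = []
    for p_t in thresholds:
        w = p_t / (1 - p_t)                  # weight on false positives
        tp = sum(y for r, y in zip(risks, events) if r >= p_t)
        fp = sum(1 - y for r, y in zip(risks, events) if r >= p_t)
        nb_model = (tp - fp * w) / n
        nb_all = prevalence - (1 - prevalence) * w
        curve.append((p_t, nb_model, nb_all))
    return curve
```

Plotting nb_model against p_t, alongside the treat-all line and the zero line, gives the decision curve.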
• 26. Decision curve analysis
• 27. Gallina vs. Partin
• AUC 0.81
• AUC 0.78
P=0.02
• 28. Decision curve analysis
• 31. Statistical analysis vs. decision analysis

| | Traditional statistical analysis | Traditional decision analysis |
|---|---|---|
| Mathematics | Simple | Can be complex |
| Additional data | Not required | Patient preferences, costs or effectiveness |
| Endpoints | Binary or continuous | Continuous endpoints problematic |
| Assess clinical value? | No | Yes |
• 32. Statistical analysis vs. decision analysis

| | Traditional statistical analysis | Traditional decision analysis | Decision curve analysis |
|---|---|---|---|
| Mathematics | Simple | Can be complex | Simple |
| Additional data | Not required | Patient preferences, costs or effectiveness | Informal, general estimates |
| Endpoints | Binary or continuous | Continuous endpoints problematic | Binary or continuous |
| Assess clinical value? | No | Yes | Yes |
• 33. Decision curve analysis
• Decision curve analysis tells us about the clinical value of a model where accuracy metrics do not
• Decision curve analysis does not require either: