NY Prostate Cancer Conference - A. Vickers - Session 1: Traditional statistical methods for evaluating prediction models are uninformative: towards a decision analytic approach

1. Traditional statistical methods for evaluating prediction models are uninformative: towards a decision analytic approach
   - Andrew Vickers
   - Department of Epidemiology and Biostatistics
   - Memorial Sloan-Kettering Cancer Center

2. The Kattan challenge
   A clinician comes to you with two models (or tests) and wants to know which to use. What statistical method do you use to help answer the clinician?

3. Traditional biostatistical metrics

   | Test             | Sensitivity | Specificity | PPV | NPV | LR+  | LR-  | AUC (Youden) | Brier (mean squared error) |
   |------------------|-------------|-------------|-----|-----|------|------|--------------|----------------------------|
   | % Free PSA < 20% | 91%         | 40%         | 35% | 92% | 1.52 | 0.23 | 0.65         | 0.47                       |
   | hK2 > 75 pg/ml   | 51%         | 78%         | 45% | 82% | 2.32 | 0.63 | 0.64         | 0.29                       |

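As a reference for how these headline numbers relate to one another, here is a minimal sketch (not from the talk) deriving them from 2×2 counts. The counts are hypothetical, chosen only so that sensitivity and specificity match the free-PSA row; PPV and NPV depend on prevalence and will not match the slide.

```python
# Hypothetical 2x2 counts (not the study data): chosen so that
# sensitivity/specificity match the "% Free PSA < 20%" row above.
tp, fn, fp, tn = 91, 9, 60, 40

sensitivity = tp / (tp + fn)               # P(test+ | cancer)    -> 0.91
specificity = tn / (tn + fp)               # P(test- | no cancer) -> 0.40
ppv = tp / (tp + fp)                       # P(cancer | test+)
npv = tn / (tn + fn)                       # P(no cancer | test-)
lr_pos = sensitivity / (1 - specificity)   # LR+                  -> 1.52
lr_neg = (1 - sensitivity) / specificity   # LR-                  -> 0.23

print(sensitivity, specificity, ppv, npv, round(lr_pos, 2), round(lr_neg, 2))
```
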
4. Which test is best?
   - Sensitivity / specificity insufficient to determine which test should be used:
     - “Depends on whether sensitivity or specificity is more important”

5. Conclusion about traditional metrics
   - Traditional biostatistical techniques for evaluating models, markers and tests do not incorporate clinical consequences
   - Accordingly, they cannot inform clinical practice
   How do we guide the clinician?

6. Prostate cancer
   (figure: high risk 10%, low risk 90%)

7. Prostate cancer
   Use of the model to determine treatment would avoid 47 unnecessary treatments at the expense of failing to treat 45 patients who do require treatment.

8. Discrimination
   - Probability of a correct prediction in a pair of discordant patients
   - C index or area under the curve (AUC)
   - Range: 0.5 (coin flip) – 1 (perfect)

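The pairwise definition can be computed directly: over all pairs in which one patient had the event and the other did not, count how often the event patient received the higher predicted risk. A minimal sketch with made-up data (not from the talk):

```python
from itertools import product

def c_index(risks, events):
    """Fraction of (event, non-event) pairs in which the event patient
    received the higher predicted risk; ties count as half."""
    cases = [r for r, e in zip(risks, events) if e == 1]
    controls = [r for r, e in zip(risks, events) if e == 0]
    pairs = list(product(cases, controls))
    score = sum(1.0 if c > k else 0.5 if c == k else 0.0 for c, k in pairs)
    return score / len(pairs)

risks = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]   # made-up predicted risks
events = [1, 1, 0, 1, 0, 0]              # made-up outcomes
print(c_index(risks, events))            # 0.889: 8 of 9 pairs concordant
```
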
9. Discrimination rules!
   - Most widely reported statistic
   - “Predictive accuracy” & “discrimination” used interchangeably

10. From the literature
   - “The AUC was 0.734 … accurate and useful in patients with micrometastasis in the sentinel lymph node.”
   - “AUC of 0.78 … the use of a nomogram may be useful for predicting the probability of pCR and for the design of the proper therapeutic algorithm in locally advanced breast cancer.”

11. Discrimination? So what?
   - How high a discrimination is “high enough” to warrant use of a model?
     - 0.65
     - 0.70
     - 0.80

12. Discrimination? So what?
   - What increase in discrimination makes it worth measuring a marker?
     - 0.005
     - 0.02
     - 0.05

13. Extremely bad models can have extremely good discrimination
   - Take the Kattan nomogram (C index = 0.8) and divide all predictions by 10
     - Recurrence risk of 40% => 4%
   - Discrimination is unaffected
   - High “predictive accuracy”, but we’d tell high-risk men not to worry.

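This is easy to verify, because the AUC depends only on how predictions rank patients: any monotone rescaling, including dividing every prediction by 10, leaves it unchanged. A quick check with simulated data, assuming scikit-learn is available:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
risk = rng.uniform(0.05, 0.95, size=1000)  # simulated predicted risks
event = rng.binomial(1, risk)              # outcomes drawn from those risks

print(roc_auc_score(event, risk))       # original model
print(roc_auc_score(event, risk / 10))  # identical AUC, terrible risk estimates
```
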
14. Calibration
   - Concordance between predicted and actual risk
   - A model is well-calibrated if, for every 100 patients given a risk of X%, close to X have the event

15. Calibration plot (figure)

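A plot like this is typically built by binning patients on predicted risk and comparing each bin’s mean prediction with its observed event rate. A minimal sketch under that assumption, using simulated data as above:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
risk = rng.uniform(0.05, 0.95, size=1000)  # simulated predicted risks
event = rng.binomial(1, risk)              # simulated outcomes

# Deciles of predicted risk: mean prediction vs. observed event rate per bin
edges = np.quantile(risk, np.linspace(0, 1, 11))
bins = np.digitize(risk, edges[1:-1])
pred = [risk[bins == b].mean() for b in range(10)]
obs = [event[bins == b].mean() for b in range(10)]

plt.plot(pred, obs, "o-", label="model")
plt.plot([0, 1], [0, 1], "--", label="perfect calibration")
plt.xlabel("Predicted risk")
plt.ylabel("Observed event rate")
plt.legend()
plt.show()
```
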
16. Problem with calibration
   - No good metric
     - No single number expresses calibration
   - How much miscalibration is “too much” to allow use of a model?

17. Incorporating clinical consequences
   - “Depends on whether sensitivity or specificity is more important”
   - We need:
     - A parameter specifying the relative importance of sensitivity vs. specificity
     - A way to incorporate that parameter in a statistical methodology

18. What is the “depends” parameter?
   - Threshold probability
   - Predicted probability of disease is p̂
   - Define a threshold probability of disease as p_t
   - Patient accepts treatment if p̂ ≥ p_t

19. Intuitively
   - The threshold probability at which a patient will opt for treatment is informative of how that patient weighs the relative harms of false-positive and false-negative results.
   - Nothing new:
     - Decision analytic result since the 1970s

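To spell out that classic result (my reconstruction of the standard expected-utility argument, not a slide from the talk): a patient who is indifferent to treatment exactly at p̂ = p_t is equating the expected benefit of treating true disease with the expected harm of treating unnecessarily.

```latex
% Let B = benefit of treating a true positive, H = harm of treating a false positive.
% Indifference at the threshold p_t means expected benefit equals expected harm:
%   p_t \, B = (1 - p_t) \, H
% so the threshold encodes the harm-to-benefit ratio:
\[
  \frac{H}{B} = \frac{p_t}{1 - p_t}
\]
% e.g. p_t = 20\% implies one false positive costs 1/4 of a true positive's
% benefit -- exactly the weight used on the kallikrein slide below.
```
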
20. Clinical net benefit
   - Number of true positives per patient if the false-positive rate were zero

21. Application to models with a continuous endpoint
   1. Select a p_t
   2. Positive test defined as p̂ ≥ p_t
   3. Calculate “Clinical Net Benefit” as:
      net benefit = (true positives ÷ n) − (false positives ÷ n) × p_t ÷ (1 − p_t)

22. Kallikrein panel: weight false positives by 20% ÷ (1 − 20%)

   | Strategy                         | Cancers found | Unnecessary biopsies | Net benefit (per 1,000 men) |
   |----------------------------------|---------------|----------------------|-----------------------------|
   | Biopsy all men with elevated PSA | 277           | 723                  | 277 − 723 ÷ 4 = 96.25       |
   | Biopsy if risk ≥ 20% on panel    | 211           | 276                  | 211 − 276 ÷ 4 = 142         |

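To make the arithmetic explicit, a minimal sketch using the counts above (the 1,000-man denominator is an assumption, inferred from the net benefit of 0.142 quoted on the next slide):

```python
def net_benefit(tp, fp, n, p_t):
    """True positives per patient, after charging each false positive
    at the threshold odds p_t / (1 - p_t)."""
    weight = p_t / (1 - p_t)          # 0.25 at p_t = 20%
    return tp / n - (fp / n) * weight

n, p_t = 1000, 0.20                   # n inferred, see note above
print(net_benefit(277, 723, n, p_t))  # biopsy all:       0.09625
print(net_benefit(211, 276, n, p_t))  # kallikrein panel: 0.142
```
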
23. Net benefit has a simple clinical interpretation
   - Net benefit of 0.142 at p_t of 20%
   - Using the model is equivalent to a strategy that identified 14.2 cancers per 100 patients with no unnecessary biopsies

24. Net benefit has a simple clinical interpretation
   - Difference between model and treat all at p_t of 20%:
     - 0.048
   - Divide by the weighting: 0.048 ÷ 0.25 = 0.19
     - 19 fewer false positives per 100 patients for an equal number of true positives
     - E.g. 19 fewer unnecessary biopsies with no missed cancers

25. Decision curve analysis
   1. Select a p_t
   2. Positive test defined as p̂ ≥ p_t
   3. Calculate “Clinical Net Benefit” as:
      net benefit = (true positives ÷ n) − (false positives ÷ n) × p_t ÷ (1 − p_t)
   4. Vary p_t over an appropriate range
   Vickers & Elkin. Med Decis Making 2006;26:565–574

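The four steps map directly onto a few lines of code. A minimal end-to-end sketch with simulated data (the software at www.decisioncurveanalysis.org, cited on the final slide, is the reference implementation):

```python
import numpy as np

def net_benefit(risk, event, p_t):
    """Net benefit of treating patients with predicted risk >= p_t."""
    n = len(event)
    positive = risk >= p_t                      # step 2: define a positive test
    tp = np.sum(positive & (event == 1))
    fp = np.sum(positive & (event == 0))
    return tp / n - (fp / n) * p_t / (1 - p_t)  # step 3

rng = np.random.default_rng(2)
risk = rng.uniform(0.01, 0.99, size=1000)       # simulated model predictions
event = rng.binomial(1, risk)                   # simulated outcomes

# Steps 1 and 4: vary p_t; compare the model against "treat all" and "treat none"
for p_t in np.arange(0.05, 0.55, 0.05):
    nb_model = net_benefit(risk, event, p_t)
    nb_all = net_benefit(np.ones_like(risk), event, p_t)  # everyone positive
    print(f"p_t={p_t:.2f}  model={nb_model:.3f}  treat all={nb_all:.3f}  treat none=0")
```
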
26. Decision curve analysis (figure)

27. Gallina vs. Partin (figure)
   - AUC 0.81
   - AUC 0.78
   - P = 0.02

28. Decision curve analysis (figure)

29. Statistical analysis vs. decision analysis

   |                        | Traditional statistical analysis | Traditional decision analysis               |
   |------------------------|----------------------------------|---------------------------------------------|
   | Mathematics            | Simple                           | Can be complex                              |
   | Additional data        | Not required                     | Patient preferences, costs or effectiveness |
   | Endpoints              | Binary or continuous             | Continuous endpoints problematic            |
   | Assess clinical value? | No                               | Yes                                         |

30. Statistical analysis vs. decision analysis

   |                        | Traditional statistical analysis | Traditional decision analysis               | Decision curve analysis     |
   |------------------------|----------------------------------|---------------------------------------------|-----------------------------|
   | Mathematics            | Simple                           | Can be complex                              | Simple                      |
   | Additional data        | Not required                     | Patient preferences, costs or effectiveness | Informal, general estimates |
   | Endpoints              | Binary or continuous             | Continuous endpoints problematic            | Binary or continuous        |
   | Assess clinical value? | No                               | Yes                                         | Yes                         |

31. Decision curve analysis
   - Decision curve analysis tells us about the clinical value of a model where accuracy metrics do not
   - Decision curve analysis does not require either:
     - Additional data
     - Individualized assessment
   - Simple-to-use software is available to implement decision curve analysis
     - www.decisioncurveanalysis.org
