Absence of a gold standard in diagnostic test accuracy research

These slides were presented on June 7 2017 during a pre-conference workshop of the WEON conference (Dutch society for epidemiology)

Published in: Science
  1. 1. Absence of a gold standard in diagnostic test accuracy research, with application in the context of childhood TB. Maarten van Smeden, PhD, post-doctoral researcher, Julius Center for Health Sciences and Primary Care. WEON 2017 pre-conference: Accounting for Measurement Error in Epidemiology, Antwerp, June 7, 2017
  2. 2. Outline • Diagnostic test accuracy • The problem: absence of a gold standard • Possible solution: latent class analysis in context of TB
  3. 3. Diagnostic testing
  4. 4. Diagnostic testing
  5. 5. Diagnostic testing
  6. 6. Diagnostic testing • “New test better than the existing test(s)?” • “(Where to) add new test to diagnostic pathway?” • “Recommend new test in practice guidelines?” Fig from: Bossuyt, BMJ, 2006
  7. 7. Diagnostic test accuracy studies (DTA) • Evaluation of “new” diagnostic tests (=index test) by comparison to a “gold standard” • Misclassification probabilities of index test: sensitivity, specificity, negative/positive predictive values, etc.
  8. 8. Classical DTA analysis. Subjects undergo the index test (T) and gold standard test (GS):
           GS+   GS-
     T+     A     C
     T-     B     D
  9. 9. Classical DTA analysis
 Sensitivity (Se) = A/(A+B)
 Specificity (Sp) = D/(D+C)
           GS+   GS-
     T+     A     C
     T-     B     D
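A minimal sketch of the classical analysis, using the cell labels A to D from the table above. The counts in the usage lines are hypothetical and chosen only for illustration; the predictive values are added because the slides list them among the accuracy measures.

```python
def classical_accuracy(a, b, c, d):
    """Accuracy measures from a 2x2 table against a gold standard.

    a = T+ and GS+, b = T- and GS+, c = T+ and GS-, d = T- and GS-.
    """
    return {
        "Se": a / (a + b),   # sensitivity: positives among GS+
        "Sp": d / (d + c),   # specificity: negatives among GS-
        "PPV": a / (a + c),  # positive predictive value
        "NPV": d / (d + b),  # negative predictive value
    }


# Hypothetical counts, not taken from the slides.
result = classical_accuracy(a=80, b=20, c=30, d=170)
print(result)
```

These formulas are only valid estimates of the index test's accuracy when the reference really is error-free.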
  10. 10. Reporting guideline: STARD
  11. 11. Reporting guideline: STARD “.. a gold standard would be an error-free reference standard”
  12. 12. All that glitters is not gold • Commonly the best available reference standard has Se < 1 and Sp < 1, so it is not a “gold standard”.
 Because: detection limits (e.g. culture), tests that are infeasible or unethical to perform in some patients (e.g. biopsy), observer errors (e.g. MRI), etc.
  13. 13. All that glitters is not gold • Commonly the best available reference standard has Se < 1 and Sp < 1, so it is not a “gold standard”.
 -> The reference standard misclassifies the target condition (= measurement error).
  14. 14. When using imperfect reference standard. Assuming: reference standard Se = 1, index test Sp = Se = 0.7, conditional independence of reference standard and index test.
 [Figure: expected sensitivity of the index test (y-axis, 0.3 to 0.7) plotted against specificity of the reference standard (x-axis, 0.5 to 1.0), with one curve each for disease prevalence 0.05, 0.25 and 0.50.]
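The effect in this figure can be sketched numerically. The snippet below assumes that the plotted quantity is the large-sample "apparent" sensitivity Pr(T = 1 | R = 1), where R is the imperfect reference standard; that reading and the function name are assumptions, not something the slides state explicitly.

```python
def apparent_sensitivity(prev, se_index, sp_index, se_ref, sp_ref):
    """Pr(index positive | reference positive) under conditional independence."""
    # Joint probability that both the index test and the reference are positive.
    both_positive = (prev * se_index * se_ref
                     + (1 - prev) * (1 - sp_index) * (1 - sp_ref))
    # Marginal probability that the reference is positive.
    reference_positive = prev * se_ref + (1 - prev) * (1 - sp_ref)
    return both_positive / reference_positive


# Settings from the slide: reference Se = 1, index Se = Sp = 0.7.
for prev in (0.05, 0.25, 0.50):
    for sp_ref in (0.5, 0.75, 1.0):
        value = apparent_sensitivity(prev, 0.7, 0.7, 1.0, sp_ref)
        print(f"prevalence={prev:.2f} Sp_ref={sp_ref:.2f} apparent Se={value:.3f}")
```

With a perfect reference the apparent sensitivity equals the true value of 0.7. It drops as the reference specificity falls, and drops faster at low prevalence.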
  15. 15. When using imperfect reference standard • Bias, sometimes called “reference standard bias”. The biased estimate is not necessarily a lower bound of Se/Sp
 • Philosophical problems when the index test is believed to be more accurate than the best available reference standard
  16. 16. When using imperfect reference standard. Absence of a gold standard: misclassifications by the reference standard -> no straightforward, valid approaches to estimating the misclassification probabilities of index tests
  17. 17. Tuberculosis (TB). Paulsen, Nature, 2013. [Figure 2.16a, WHO Global TB report 2015: top causes of death worldwide in 2012 (millions of deaths), with deaths from TB among HIV-positive people shown in grey. Causes shown: ischaemic heart disease, stroke, lower respiratory infections, chronic obstructive pulmonary disease, TB, tracheal/bronchus/lung cancers, diarrheal diseases, diabetes mellitus, HIV/AIDS, road injury. Underlying estimates: WHO Global Health Observatory data repository, http://apps.who.int/gho/data/node.main.GHECOD (accessed 27 August 2015).]
  18. 18. Data • 749 hospitalised children with suspected pulmonary TB in Cape Town, South Africa • Study procedures, a number of tests for TB for each subject: • Microscopy • Culture • Xpert (NAAT) • TST (skin test) • Radiography
  19. 19. Primary publication
  20. 20. Primary publication 48%: “possible tuberculosis”
  21. 21. Solution?
  22. 22. Simple latent class model • The idea: $\Pr(T = 1) = \Pr(D = 1)\Pr(T = 1 \mid D = 1) + \Pr(D = 0)\Pr(T = 1 \mid D = 0) = \pi Se + (1 - \pi)(1 - Sp)$, where $\pi = \Pr(D = 1)$ is the prevalence of the target condition
  23. 23. Simple latent class model • With two conditionally independent binary tests ($T_0$ and $T_1$): $\Pr(T_0 = 1, T_1 = 1) = \pi Se_0 Se_1 + (1 - \pi)(1 - Sp_0)(1 - Sp_1)$
  24. 24. Simple latent class model • With J conditionally independent tests (and a bit of algebra): $\Pr(T_1, \ldots, T_J) = \pi \prod_{j=1}^{J} Se_j^{T_j} (1 - Se_j)^{1 - T_j} + (1 - \pi) \prod_{j=1}^{J} Sp_j^{1 - T_j} (1 - Sp_j)^{T_j}$
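The J-test formula can be written out directly. This is a small sketch with hypothetical parameter values; the function name `pattern_probability` is introduced here for illustration only.

```python
from itertools import product


def pattern_probability(pattern, prev, se, sp):
    """Pr(T_1, ..., T_J) for one 0/1 result pattern, assuming conditional independence."""
    diseased = prev          # running product for the D = 1 term
    non_diseased = 1 - prev  # running product for the D = 0 term
    for t, se_j, sp_j in zip(pattern, se, sp):
        diseased *= se_j if t == 1 else 1 - se_j
        non_diseased *= 1 - sp_j if t == 1 else sp_j
    return diseased + non_diseased


# Hypothetical values for three tests.
prev, se, sp = 0.3, [0.9, 0.8, 0.7], [0.95, 0.9, 0.85]
for pattern in product([0, 1], repeat=3):
    print(pattern, round(pattern_probability(pattern, prev, se, sp), 4))
```

The probabilities of all 2^J patterns sum to one, which is a quick sanity check on any implementation of the model.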
  25. 25. Latent class model estimation • Maximum likelihood • Gibbs sampling
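As an illustration of the maximum likelihood route, the sketch below uses the EM algorithm for the two-class conditional independence model. EM is one common way to obtain maximum likelihood estimates; the slides do not say which algorithm was used, and the data here are synthetic (expected pattern frequencies generated from hypothetical parameter values), not the TB data.

```python
import itertools

import numpy as np


def class_terms(patterns, prev, se, sp):
    """Return the D = 1 and D = 0 terms of Pr(T_1..T_J) for each row of `patterns`."""
    diseased = prev * np.prod(se ** patterns * (1 - se) ** (1 - patterns), axis=1)
    non_diseased = (1 - prev) * np.prod(sp ** (1 - patterns) * (1 - sp) ** patterns, axis=1)
    return diseased, non_diseased


def em_latent_class(patterns, counts, prev, se, sp, n_iter=5000):
    """EM iterations for prevalence, sensitivities and specificities."""
    prev, se, sp = float(prev), np.asarray(se, float), np.asarray(sp, float)
    for _ in range(n_iter):
        # E-step: posterior probability of disease for each result pattern.
        diseased, non_diseased = class_terms(patterns, prev, se, sp)
        posterior = diseased / (diseased + non_diseased)
        w_dis = counts * posterior
        w_non = counts * (1 - posterior)
        # M-step: weighted proportions.
        prev = w_dis.sum() / counts.sum()
        se = w_dis @ patterns / w_dis.sum()
        sp = w_non @ (1 - patterns) / w_non.sum()
    return prev, se, sp


# Synthetic example: expected counts for 1000 subjects and three tests.
patterns = np.array(list(itertools.product([0, 1], repeat=3)))
true_prev = 0.3
true_se = np.array([0.9, 0.8, 0.7])
true_sp = np.array([0.95, 0.9, 0.85])
counts = 1000 * sum(class_terms(patterns, true_prev, true_se, true_sp))

est_prev, est_se, est_sp = em_latent_class(
    patterns, counts, prev=0.4, se=[0.85, 0.75, 0.65], sp=[0.9, 0.85, 0.8]
)
print(est_prev, est_se, est_sp)
```

The starting values matter: the likelihood has a mirror-image solution in which the two classes swap labels, so starting values with Se + Sp > 1 for each test are used to steer towards the intended labelling.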
  26. 26. Heuristic model for TB data
  27. 27. Heuristic model for TB data • Conditional independence between all tests is unlikely • Conditional dependence between: Xpert, culture, microscopy, and TST among TB diseased due to “bacterial load” • Bacterial load modelled by a random effect
  28. 28. Modeling dependence
  29. 29. [Figure: pairwise correlation residuals (misfit) for the conditional independence model and the random effects model]
  30. 30. Main results [Figure: results for the conditional independence model and the random effects model]
  31. 31. Is latent class analysis useful? • In TB example, I believe: yes • More realistic than assuming reference standard (culture) has Se = Sp = 1 • Results ‘robust’ to changing prior distributions and conditional dependence structure • Lack of robust alternative approaches for DTA in the absence of a gold standard
  32. 32. Is latent class analysis useful? • But: • Latent class analysis for DTA is still rare
  33. 33. Latent class analysis in diagnostic research Systematic review from 2014 • 69 theoretical papers • 64 applied papers in human research + 47 in veterinary sciences • applications of LCA still not common in human diagnostic research van Smeden, AJE, 2014
  34. 34. Is latent class analysis useful? • But: • Latent class analysis for DTA is still rare • Robustness to misspecification of the conditional dependence structure is a concern
  35. 35. Is latent class analysis useful? • But: • Latent class analysis for DTA is still rare • Robustness to misspecification of the conditional dependence structure is a concern • Identifiability requirements
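One well-known identifiability requirement can be checked by counting. For the two-class conditional independence model with J binary tests there are 2J + 1 parameters (prevalence plus Se and Sp per test) and 2^J - 1 free cell probabilities in the data. The sketch below checks only this counting condition, which is necessary but not sufficient; the slide does not list the specific requirements, so treat this as an illustrative assumption.

```python
def counting_condition(n_tests):
    """Compare parameter count with degrees of freedom for J binary tests."""
    n_parameters = 2 * n_tests + 1          # prevalence + Se_j + Sp_j
    degrees_of_freedom = 2 ** n_tests - 1   # free multinomial cell probabilities
    return n_parameters, degrees_of_freedom, n_parameters <= degrees_of_freedom


for j in range(1, 6):
    params, df, ok = counting_condition(j)
    print(f"J={j}: parameters={params}, degrees of freedom={df}, condition met={ok}")
```

With two tests the condition fails, so extra information such as constraints or informative priors is needed. Three tests meet it exactly, leaving no degrees of freedom to assess model fit.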
  36. 36. Why Bayesian? • Practical arguments: • Model specifications in non-commercial software packages (e.g. randomLCA vs rjags in R) • (Weakly) informative prior distributions can solve non-identifiability problems • Additional calculations (e.g. positive/negative predictive values with credible intervals, CrI)
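The "additional calculations" argument can be sketched as follows: given posterior draws of prevalence, Se and Sp, predictive values and their credible intervals come from transforming each draw. The draws below are stand-ins generated from hypothetical Beta distributions, not output from the TB analysis; in practice they would come from the Gibbs sampler.

```python
import numpy as np


def predictive_values(prev, se, sp):
    """Positive and negative predictive values via Bayes' theorem."""
    ppv = prev * se / (prev * se + (1 - prev) * (1 - sp))
    npv = (1 - prev) * sp / ((1 - prev) * sp + prev * (1 - se))
    return ppv, npv


def summarize(draws, level=0.95):
    """Posterior median with an equal-tailed credible interval."""
    lower, upper = np.quantile(draws, [(1 - level) / 2, 1 - (1 - level) / 2])
    return float(np.median(draws)), float(lower), float(upper)


# Hypothetical stand-in for posterior draws (NOT results from the slides).
rng = np.random.default_rng(2017)
prev_draws = rng.beta(30, 70, size=4000)
se_draws = rng.beta(70, 30, size=4000)
sp_draws = rng.beta(95, 5, size=4000)

ppv_draws, npv_draws = predictive_values(prev_draws, se_draws, sp_draws)
print("PPV median and 95% CrI:", summarize(ppv_draws))
print("NPV median and 95% CrI:", summarize(npv_draws))
```

Because the transformation is applied draw by draw, the interval reflects joint uncertainty in all three parameters without any extra approximation.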
  37. 37. Final remarks • Misclassification in DTA studies is often both the primary topic of study (for the index test) and the problem (when occurring in the reference standard) • Model based estimation of index test accuracy by latent class analysis can be useful • There is some evidence that robustness of the latent class model can be improved when disease status can be verified with certainty in a subset • While the focus of this talk was on DTA, other studies such as “incremental value” studies suffer from the same problems
  38. 38. Acknowledgements Thanks to all co-authors in: Supported by a grant from Canadian Institutes of Health Research (MOP #89857)
