Absence of a gold standard in diagnostic test accuracy research

With application in the context of childhood TB
Maarten van Smeden, PhD
Post-doctoral researcher, Julius Center for Health Sciences and Primary Care
WEON 2017 Pre-conference Accounting for Measurement Error in Epidemiology
Antwerp, June 7, 2017

Outline
• Diagnostic test accuracy
• The problem: absence of a gold standard
• Possible solution: latent class analysis in the context of TB

Diagnostic testing
• “New test better than the existing test(s)?”
• “(Where to) add new test to diagnostic pathway?”
• “Recommend new test in practice guidelines?”
Fig from: Bossuyt, BMJ, 2006

Diagnostic test accuracy studies (DTA)
• Evaluation of “new” diagnostic tests (= index test) by comparison to a “gold standard”
• Misclassification probabilities of the index test: sensitivity, specificity, negative/positive predictive values, etc.

Classical DTA analysis
Subjects undergo the index test (T) and the gold standard test (GS)

        GS +   GS -
T +      A      C
T -      B      D

Sensitivity (Se) = A/(A+B)
Specificity (Sp) = D/(D+C)
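The two formulas above can be sketched in a few lines of Python. The function name and the counts are hypothetical and serve only as an illustration; the cell labels follow the table on the slide (A = T+/GS+, B = T-/GS+, C = T+/GS-, D = T-/GS-).

```python
def sensitivity_specificity(a, b, c, d):
    """Se and Sp from the 2x2 table: a = T+/GS+, b = T-/GS+, c = T+/GS-, d = T-/GS-."""
    se = a / (a + b)  # proportion of GS-positives that the index test detects
    sp = d / (d + c)  # proportion of GS-negatives that the index test clears
    return se, sp


# Hypothetical counts, for illustration only
se, sp = sensitivity_specificity(a=70, b=30, c=90, d=810)
print(f"Se = {se:.2f}, Sp = {sp:.2f}")
```

These estimates are only valid as accuracy measures if GS is truly error-free.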

Reporting guideline: STARD
“.. a gold standard would be an error-free reference standard”

All that glitters is not gold
• Commonly, the best available reference standard has Se < 1 and Sp < 1: it is not a “gold standard”.

Because:
detection limits (e.g. culture), procedures that are infeasible or unethical to perform in some patients (e.g. biopsy), observer errors (e.g. MRI), etc.

-> Misclassifications of the target condition by the reference standard (= measurement error)

When using an imperfect reference standard
Assuming: reference standard Se = 1, index test Se = Sp = 0.7, and conditional independence between the reference standard and the index test
[Figure: expected sensitivity of the index test (y-axis, 0.3 to 0.7) plotted against the specificity of the reference standard (x-axis, 0.5 to 1.0), at disease prevalence = 0.05]
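The size of this bias can be sketched directly from the stated assumptions. The function below is a hypothetical helper, not the author's code: it computes the expected "apparent" sensitivity of the index test, Pr(T+ | reference +), when both tests are conditionally independent given true disease status.

```python
def apparent_sensitivity(prev, se_index, sp_index, se_ref, sp_ref):
    """Expected Pr(T+ | R+) for index test T and reference standard R,
    assuming conditional independence given true disease status."""
    both_pos = prev * se_index * se_ref + (1 - prev) * (1 - sp_index) * (1 - sp_ref)
    ref_pos = prev * se_ref + (1 - prev) * (1 - sp_ref)
    return both_pos / ref_pos


# Values taken from the slide: prevalence 0.05, reference Se = 1, index Se = Sp = 0.7
for sp_ref in (0.5, 0.6, 0.7, 0.8, 0.9, 1.0):
    est = apparent_sensitivity(prev=0.05, se_index=0.7, sp_index=0.7,
                               se_ref=1.0, sp_ref=sp_ref)
    print(f"Sp reference = {sp_ref:.1f}: expected index test Se = {est:.3f}")
```

Under these assumptions the expected estimate equals the true value of 0.7 only when the reference standard has Sp = 1, and it falls to about 0.34 when the reference standard has Sp = 0.5.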

• This bias is sometimes called “reference standard bias”. The biased estimates are not necessarily lower bounds of Se/Sp

• Philosophical problems arise when the index test is believed to be more accurate than the best available reference standard

Absence of a gold standard
Misclassifications by the reference standard ->
no straightforward, valid approaches to estimating the misclassification probabilities of index tests

Tuberculosis (TB)
Paulsen, Nature, 2013
[Figure 2.16a: Top causes of death worldwide in 2012, in millions (axis 0 to 7). Deaths from TB among HIV-positive people are shown in grey. Causes shown: ischaemic heart disease; stroke; lower respiratory infections; chronic obstructive pulmonary disease; TB; tracheal, bronchus, lung cancers; diarrheal diseases; diabetes mellitus; HIV/AIDS; road injury.]
Source data: WHO Global Health Observatory data repository, available at http://apps.who.int/gho/data/node.main.GHECOD (accessed 27 August 2015).
WHO Global TB report 2015

Data
• 749 hospitalised children with suspected pulmonary TB in
Cape Town, South Africa
• Study procedures: each subject underwent a number of tests for TB:
• Microscopy
• Culture
• Xpert (NAAT)
• TST (skin test)
• Radiography

Primary publication
48%: “possible tuberculosis”

Simple latent class model
• The idea:
\[
\Pr(T = 1) = \pi\,\mathrm{Se} + (1 - \pi)(1 - \mathrm{Sp})
           = \Pr(D = 1)\Pr(T = 1 \mid D = 1) + \Pr(D = 0)\Pr(T = 1 \mid D = 0)
\]

• With two conditionally independent binary tests (T0 and T1):
\[
\Pr(T_0 = 1, T_1 = 1) = \pi\,\mathrm{Se}_0\,\mathrm{Se}_1 + (1 - \pi)(1 - \mathrm{Sp}_0)(1 - \mathrm{Sp}_1)
\]
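A worked instance of the two-test formula may help. The parameter values below are hypothetical and chosen only for illustration: prevalence 0.3, sensitivities 0.9 and 0.8, specificities 0.95 and 0.9.

```latex
\[
\begin{aligned}
\Pr(T_0 = 1, T_1 = 1)
  &= \pi\,\mathrm{Se}_0\,\mathrm{Se}_1 + (1 - \pi)(1 - \mathrm{Sp}_0)(1 - \mathrm{Sp}_1) \\
  &= 0.3 \times 0.9 \times 0.8 + 0.7 \times 0.05 \times 0.1 \\
  &= 0.216 + 0.0035 = 0.2195
\end{aligned}
\]
```

Nearly all double-positive results in this example come from the diseased class (0.216 of 0.2195), which is the kind of information the latent class model uses to separate the two classes.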

• With J conditionally independent tests (and a bit of algebra):
\[
\Pr(T_1, \ldots, T_J) = \pi \prod_{j=1}^{J} \mathrm{Se}_j^{T_j} (1 - \mathrm{Se}_j)^{1 - T_j}
  + (1 - \pi) \prod_{j=1}^{J} \mathrm{Sp}_j^{1 - T_j} (1 - \mathrm{Sp}_j)^{T_j}
\]

Latent class model estimation
• Maximum likelihood
• Gibbs sampling
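As a minimal sketch of maximum likelihood estimation, the EM algorithm is one common way to fit the two-class conditional independence model; it is not necessarily the routine used for the analyses in this talk. All names below are hypothetical, and the "data" are error-free expected counts generated from made-up true values, so the fit should approximately return those values.

```python
from itertools import product
from math import prod


def class_likelihoods(pattern, prev, se, sp):
    """Return the (diseased, non-diseased) contributions to Pr(T_1, ..., T_J)."""
    diseased = prev * prod(s if t else 1 - s for t, s in zip(pattern, se))
    healthy = (1 - prev) * prod(1 - s if t else s for t, s in zip(pattern, sp))
    return diseased, healthy


def fit_lca_em(counts, n_tests, n_iter=2000):
    """EM for the two-class conditional independence model.
    counts maps each 0/1 result pattern (a tuple) to its frequency."""
    prev, se, sp = 0.5, [0.7] * n_tests, [0.7] * n_tests  # arbitrary starting values
    total = sum(counts.values())
    for _ in range(n_iter):
        # E-step: expected number of diseased subjects within each pattern
        dis = {}
        for pat, n in counts.items():
            d, h = class_likelihoods(pat, prev, se, sp)
            dis[pat] = n * d / (d + h)
        n_dis = sum(dis.values())
        # M-step: weighted proportions given the expected class memberships
        prev = n_dis / total
        se = [sum(w for p, w in dis.items() if p[j]) / n_dis
              for j in range(n_tests)]
        sp = [sum(counts[p] - w for p, w in dis.items() if not p[j]) / (total - n_dis)
              for j in range(n_tests)]
    return prev, se, sp


# Hypothetical true values, used only to build expected counts for 1000 subjects
true_prev, true_se, true_sp = 0.3, [0.9, 0.8, 0.7, 0.6], [0.95, 0.9, 0.85, 0.8]
counts = {pat: 1000 * sum(class_likelihoods(pat, true_prev, true_se, true_sp))
          for pat in product((0, 1), repeat=4)}

prev_hat, se_hat, sp_hat = fit_lca_em(counts, n_tests=4)
print(round(prev_hat, 3), [round(s, 3) for s in se_hat], [round(s, 3) for s in sp_hat])
```

With real data the starting values matter: the likelihood has a mirror solution in which the two class labels are swapped, and local maxima are possible, so several starting points are advisable.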

Heuristic model for TB data
• Conditional independence between all tests is unlikely
• Conditional dependence between Xpert, culture, microscopy, and TST among TB-diseased subjects due to “bacterial load”
• Bacterial load modelled by a random effect
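To illustrate how a shared random effect induces conditional dependence, the simulation below draws test results among diseased subjects only. The probit link, the intercepts, and the random effect scale are assumptions made for this sketch; the slides do not specify the exact parameterisation of the TB model.

```python
import random
from statistics import NormalDist


def simulate_diseased(n, intercepts, load_sd, seed=1):
    """Simulate binary test results among diseased subjects.
    Each subject has a latent 'bacterial load' b ~ N(0, 1); test j is
    positive with probability Phi(intercepts[j] + load_sd * b)."""
    rng = random.Random(seed)
    phi = NormalDist().cdf
    rows = []
    for _ in range(n):
        b = rng.gauss(0.0, 1.0)
        rows.append([int(rng.random() < phi(a + load_sd * b)) for a in intercepts])
    return rows


def covariance(rows, j, k):
    """Sample covariance between the results of tests j and k."""
    n = len(rows)
    mean_j = sum(r[j] for r in rows) / n
    mean_k = sum(r[k] for r in rows) / n
    return sum(r[j] * r[k] for r in rows) / n - mean_j * mean_k


# load_sd = 0 switches the random effect off (conditional independence)
dependent = simulate_diseased(20000, intercepts=[0.5, 0.5], load_sd=1.0)
independent = simulate_diseased(20000, intercepts=[0.5, 0.5], load_sd=0.0)
print("with random effect:   ", round(covariance(dependent, 0, 1), 3))
print("without random effect:", round(covariance(independent, 0, 1), 3))
```

Subjects with a high simulated load tend to test positive on both tests, which produces a clearly positive covariance within the diseased class; without the random effect the covariance is close to zero.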

Pairwise correlation residuals (misfit)
[Figure panels: conditional independence model; random effects model]

Main results
[Figure panels: conditional independence model; random effects model]

Is latent class analysis useful?
• In the TB example, I believe: yes
• More realistic than assuming reference standard (culture)
has Se = Sp = 1
• Results ‘robust’ to changing prior distributions and
conditional dependence structure
• Lack of robust alternative approaches for DTA in the
absence of a gold standard

• But:
• Latent class analysis for DTA is still rare

Latent class analysis in diagnostic research
Systematic review from 2014
• 69 theoretical papers
• 64 applied papers in human research + 47 in veterinary sciences
• Applications of LCA are still not common in human diagnostic research
van Smeden, AJE, 2014

• But:
• Robustness to misspecification of the conditional dependence structure is a concern

• Identifiability requirements
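One simple identifiability check is to count parameters against degrees of freedom. For the two-class conditional independence model with J binary tests there are 2J + 1 free parameters (prevalence plus Se and Sp per test) and 2^J - 1 independent cells in the cross-classified table. The helper below is a hypothetical sketch of that count; meeting it is necessary but not sufficient for identifiability, and dependence terms such as random effects add further parameters.

```python
def identifiability_count(n_tests):
    """Free parameters versus degrees of freedom for the two-class
    conditional independence model with n_tests binary tests."""
    n_parameters = 2 * n_tests + 1         # prevalence + Se and Sp per test
    degrees_of_freedom = 2 ** n_tests - 1  # independent cells in the 2^J table
    return n_parameters, degrees_of_freedom


for j in (1, 2, 3, 4, 5):
    p, df = identifiability_count(j)
    status = "too few degrees of freedom" if p > df else "counting condition met"
    print(f"J = {j}: {p} parameters, {df} degrees of freedom ({status})")
```

By this count at least three tests are needed without further constraints or prior information.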

Why Bayesian?
• Practical arguments:
• Model specifications in non-commercial software packages (e.g. randomLCA vs rjags in R)
• (Weakly) informative prior distributions can solve non-identifiability problems
• Additional calculations (e.g. positive/negative predictive values with CrI)
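The last point can be sketched as follows: predictive values follow from prevalence, Se and Sp by Bayes' rule, so applying the formula to each posterior draw gives a posterior distribution for PPV and NPV, from which a credible interval can be read off. The function names and the five draws below are hypothetical; in practice the draws would come from the sampler (for example a Gibbs sampler) and would number in the thousands.

```python
def predictive_values(prev, se, sp):
    """Positive and negative predictive values via Bayes' rule."""
    ppv = prev * se / (prev * se + (1 - prev) * (1 - sp))
    npv = (1 - prev) * sp / ((1 - prev) * sp + prev * (1 - se))
    return ppv, npv


def percentile_interval(values, lower=0.025, upper=0.975):
    """Simple percentile interval from a list of draws."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[round(lower * last)], ordered[round(upper * last)]


# Hypothetical posterior draws of (prevalence, Se, Sp), for illustration only
draws = [(0.28, 0.88, 0.94), (0.30, 0.90, 0.95), (0.32, 0.91, 0.96),
         (0.31, 0.89, 0.95), (0.29, 0.92, 0.93)]
ppv_draws = [predictive_values(*d)[0] for d in draws]
npv_draws = [predictive_values(*d)[1] for d in draws]
print("PPV interval:", percentile_interval(ppv_draws))
print("NPV interval:", percentile_interval(npv_draws))
```

Because the transformation is applied draw by draw, the interval reflects the joint uncertainty in prevalence, Se and Sp without any extra approximation.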

Final remarks
• Misclassification in DTA studies is often both the primary topic of study (for the index test) and the problem (when occurring in the reference standard)
• Model-based estimation of index test accuracy by latent class analysis can be useful
• There is some evidence that robustness of the latent class model can be improved when disease status can be verified with certainty in a subset
• While the focus of this talk was on DTA, other studies such as “incremental value” studies suffer from the same problems

Acknowledgements
Thanks to all co-authors in:
Supported by a grant from Canadian Institutes of Health Research (MOP
#89857)
