10. Diagnostic & Screening Tests
Diagnostic and screening tests attempt to reveal an otherwise
hidden truth about patients (i.e., their health status: diseased
or disease-free).
•Physical examination
•Radiographs/Computed Tomography (CT)
•Blood and urine assays
•Cytology (Pap smear, oral brush biopsy)
•Saliva (HIV testing)
11. Discrimination & Classification
“The fundamental principle of diagnostic testing [and
screening] rests on the belief that individuals with disease are
different from individuals without disease and that diagnostic
[and screening] tests can distinguish between these two
groups.”
Riegelman, Studying a Study and Testing a Test, 2000
•Valid (i.e., accurate)
Sensitivity, specificity, ROC
Predictive values
Multiple tests
•Reliable (i.e., precise or repeatable)
Percent agreement
Kappa
12. Discrimination & Classification
Disease status comes from an external source of “truth”
regarding the patients in the population:
•Gold standard or reference standard
Adequate
Independent
Unbiased
Representative
13. Interlude: The Gold Standard
Unbiased
• The procedure used to establish the truth should not bias
the truth.
• Surgery or histology as “truth” → the verified cases will
consist of the more advanced cases
Representative
• Cadaver studies of the TMJ use older specimens, whereas TMJ patients tend to be younger.
• Caries simulations (drilled holes in teeth) versus natural
lesions
14. Interlude: The Gold Standard
Adequate
•Surgery or autopsy (common in imaging studies)
•Time between imaging and surgery/biopsy
•Applies to positive cases
•Negative cases – clinical follow-up
Independent
•Histology provides an independent truth.
•Occasionally all of the available information, including the
test being tested, is used to establish the gold standard
(bone lesions, for example [BFO]). This creates a bias in
favor of the test.
15. Discrimination & Classification
“Appearances to the mind are of four kinds. Things either are
what they appear to be [TP]; or they neither are, nor appear to
be [TN]; or they are, and do not appear to be [FN]; or they are
not, and yet appear to be [FP]. Rightly to aim in all these
cases is the wise man’s task.”
Epictetus (c. 50-120)
Discourses, Bk I, Chp 27
16. Validity: Sensitivity & Specificity
Sensitivity
= Ability of the test to correctly identify those with disease
= Probability of testing positive given the presence of disease
= TP / (TP + FN)
= a / (a + c)
17. Validity: Sensitivity & Specificity
Specificity
= Ability of the test to correctly identify those without disease
= Probability of testing negative given the absence of disease
= TN / (FP + TN)
= d / (b + d)
18. Validity: Sensitivity & Specificity
Assume a population of 1000 people of whom 100 have a
disease. Of these 100 people, the test correctly identifies 80. Of
the 900 disease-free people, the test correctly identifies 800.
Sensitivity = a / (a + c) = 80 / 100 = 80%
Specificity = d / (b + d) = 800 / 900 = 89%
Gordis, 2009, Table 5-1
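The arithmetic above can be sketched in Python (counts taken from the worked example; the variable names are mine):

```python
# 2x2 table from the example: 1000 people, 100 diseased,
# the test finds 80 of them (TP); of 900 disease-free, it clears 800 (TN).
tp, fn = 80, 20    # a, c: diseased who test positive / negative
fp, tn = 100, 800  # b, d: disease-free who test positive / negative

sensitivity = tp / (tp + fn)   # a / (a + c)
specificity = tn / (fp + tn)   # d / (b + d)

print(f"Sensitivity = {sensitivity:.0%}")   # 80%
print(f"Specificity = {specificity:.0%}")   # 89%
```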
19. Validity: Sensitivity & Specificity
Sensitivity and Specificity
• Inherent characteristics of the test
• Stable over different populations with different disease
prevalence
• Useful for comparing performance of two tests
(e.g., Digital versus film mammography / Pisano, NEJM 2005)
• Have a reciprocal relationship with one another
20. Validity: Sensitivity & Specificity
Low cutoff → high sensitivity, low specificity (more false positives)
Moderate cutoff → balance
High cutoff → low sensitivity, high specificity (more false negatives)
Courtesy, S. Fleming, 2011
21. Validity: Receiver Operating Characteristic Curve
X-axis: false positive ratio (1 – specificity)
Y-axis: true positive ratio (sensitivity)
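An ROC curve is traced by sweeping the positivity cutoff over a continuous test result and recording (FPR, TPR) at each cutoff. A minimal Python sketch, using purely hypothetical test scores:

```python
# Hypothetical test scores, for illustration only.
diseased     = [7, 8, 9, 9, 10, 11, 12]   # scores in people with disease
disease_free = [3, 4, 5, 6, 6, 7, 8]      # scores in people without disease

points = []
for cutoff in range(13, 2, -1):           # high cutoff -> low cutoff
    tp = sum(x >= cutoff for x in diseased)       # test positive, diseased
    fp = sum(x >= cutoff for x in disease_free)   # test positive, disease-free
    tpr = tp / len(diseased)              # sensitivity (y-axis)
    fpr = fp / len(disease_free)          # 1 - specificity (x-axis)
    points.append((fpr, tpr))
    print(f"cutoff {cutoff:2d}: FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```

Lowering the cutoff moves the operating point from (0, 0) toward (1, 1), which is the reciprocal sensitivity/specificity trade-off from the previous slide.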
27. Validity: Performance / Predictive Value
Sensitivity and specificity are useful, but
• May be numerically different if obtained on a group of
people with early stages of disease compared with a group
with more advanced disease.
• We do not know ahead of time who has the disease and
who does not. Rather, we get the test results and need to
interpret the findings.
28. Validity: Performance / Predictive Value
Positive Predictive Value
= Ability of the test to correctly identify those who test positive
= Probability of having the disease given a positive test result
= TP / (TP + FP)
= a / (a + b)
29. Validity: Performance / Predictive Value
Negative Predictive Value
= Ability of the test to correctly identify those who test negative
= Probability of not having the disease (i.e., being disease-free)
given a negative test result
= TN / (FN + TN)
= d / (c + d)
30. Validity: Positive & Negative Predictive Values
Assume a population of 1000 people of whom 100 have a
disease. Of these 100 people, the test correctly identifies 80. Of
the 900 disease-free people, the test correctly identifies 800.
Positive PV = a / (a + b) = 80 / 180 = 44%
Negative PV = d / (c + d) = 800 / 820 = 98%
Gordis, 2009, Table 5-7
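The same 2x2 table, now read row-wise across test results, can be sketched in Python (variable names are mine):

```python
# Same counts as the worked example, grouped by test result.
tp, fp = 80, 100   # a, b: everyone who tests positive
fn, tn = 20, 800   # c, d: everyone who tests negative

ppv = tp / (tp + fp)   # a / (a + b)
npv = tn / (fn + tn)   # d / (c + d)

print(f"Positive PV = {ppv:.0%}")   # 44%
print(f"Negative PV = {npv:.0%}")   # 98%
```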
31. Validity: Predictive Values & Prevalence
Assume a test with a sensitivity of 80% and specificity of 90%.
What happens to the predictive values when the prevalence of
the disease varies? To fill in the cells, assume a convenient total
population, in this case 1000.
(Prevalence 10%: 100 diseased, 900 disease-free)
          Diseased   Disease-free
Test +       80           90
Test −       20          810
Positive PV = a / (a + b) = 80 / 170 = 0.4706 = 47.1%
Negative PV = d / (c + d) = 810 / 830 = 0.9759 = 97.6%
After Kramer Clinical Epidemiology and Biostatistics, 1988
32. Validity: Predictive Values & Prevalence
Assume a test with a sensitivity of 80% and specificity of 90%;
the prevalence is now 50% (500 diseased, 500 disease-free).
Positive PV = a / (a + b) = 400 / 450 = 0.8889 = 88.9%
Negative PV = d / (c + d) = 450 / 550 = 0.8182 = 81.8%
After Kramer Clinical Epidemiology and Biostatistics, 1988
33. Validity: Predictive Values & Prevalence
Assume a test with a sensitivity of 80% and specificity of 90%;
the prevalence is now 90% (900 diseased, 100 disease-free).
Positive PV = a / (a + b) = 720 / 730 = 0.9863 = 98.6%
Negative PV = d / (c + d) = 90 / 270 = 0.3333 = 33.3%
After Kramer Clinical Epidemiology and Biostatistics, 1988
34. Validity: Predictive Values & Prevalence
Assume a test with a sensitivity of 80% and specificity of 90%.
Some additional terms:
• Pretest probability = prior probability = prevalence
• Post-test probability = posterior probability =
positive/negative predictive value
• Bayes’ theorem (Thomas Bayes, 1702–61)
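The prevalence dependence worked through on the preceding slides follows directly from Bayes' theorem. A small Python sketch, using a helper function of my own (`predictive_values`) with the slides' test characteristics:

```python
def predictive_values(sens, spec, prev):
    """Post-test probabilities from pretest probability via Bayes' theorem."""
    ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    return ppv, npv

# Sensitivity 80%, specificity 90%, at the three prevalences above.
for prev in (0.10, 0.50, 0.90):
    ppv, npv = predictive_values(0.80, 0.90, prev)
    print(f"prevalence {prev:.0%}: PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

As prevalence rises, PPV rises and NPV falls, reproducing the 47.1%/97.6%, 88.9%/81.8%, and 98.6%/33.3% pairs from the slides.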
45. Multiple Tests: Simultaneous
Suppose in a population of 1000
people, 200 have the disease and
Test A sensitivity = 80%
Test B sensitivity = 90%
Net sensitivity = A+, B+, or both
Step 1: 0.8 x 200 = 160 who are A+
Step 2: 0.9 x 200 = 180 who are B+
Step 3: 0.9 x 160 = 144 who are A+ and B+
Step 4: 160 – 144 = 16 who are A+ only
Step 5: 180 – 144 = 36 who are B+ only
Step 6: 144 + 16 + 36 = 196 who are A+, B+, or both
Step 7: 196/200 = 98%
Courtesy, S. Fleming, 2011
46. Multiple Tests: Simultaneous
Suppose in a population of 1000
people, 800 don’t have the disease
Test A specificity = 60%
Test B specificity = 90%
Net specificity = A- and B-
Step 1: 0.6 x 800 = 480 who are A-
Step 2: 0.9 x 800 = 720 who are B-
Step 3: 0.9 x 480 = 432 who are A- and B-
Step 4: 432/800 = 54%
Courtesy, S. Fleming, 2011
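The stepwise counts on these two slides reduce to two formulas, assuming (as the step-by-step arithmetic does) that the two tests err independently. A Python sketch with the slides' numbers:

```python
# Simultaneous (parallel) testing: call positive if A or B is positive.
sens_a, sens_b = 0.80, 0.90
spec_a, spec_b = 0.60, 0.90

# Net sensitivity: only those negative on BOTH tests are missed.
net_sens = 1 - (1 - sens_a) * (1 - sens_b)

# Net specificity: a disease-free person must be negative on BOTH tests.
net_spec = spec_a * spec_b

print(f"Net sensitivity = {net_sens:.0%}")   # 98%
print(f"Net specificity = {net_spec:.0%}")   # 54%
```

Parallel testing thus raises sensitivity at the cost of specificity; sequential testing does the reverse.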
49. Reliability
Reliability (aka repeatability or precision) is the ability of the
test to give consistent results when performed more than once
on the same individual under the same conditions, even if
conducted by different examiners.
Sources of variability (the antithesis of repeatability)
•Subjects
BP reading (throughout day, sitting/standing, R/L arm)
Serum glucose (throughout day, day of the week)
•Instrumentation
PSA assay (5% variability even when measuring identical blood
sample)
•Observer
Intra-observer
Inter-observer
50. Reliability: Percent Agreement
Percent agreement
= number of tests that agree / total number of tests
= (a + d) / (a + b + c + d)
= 35 / 40
= 0.875 = 87.5%
51. Reliability: Kappa
Measure agreement beyond that expected from chance alone:
Kappa = (percent agreement – chance agreement) / (1 – chance agreement)
Kappa varies between 0 (no agreement beyond chance) and 1 (perfect
agreement); negative values indicate less-than-chance agreement.
< 0.40 Poor agreement
0.40 - 0.75 Fair to good agreement
> 0.75 Excellent
In the example above, chance agreement = 0.695
Kappa = (0.875 – 0.695)/(1 – 0.695) = 0.180/0.305 = 0.590
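In Python, the kappa calculation with the slide's figures:

```python
# Observed agreement (87.5%) and chance-expected agreement (69.5%)
# from the percent-agreement example above.
observed = 0.875
chance = 0.695

# Kappa: agreement achieved beyond chance, as a fraction of the
# agreement attainable beyond chance.
kappa = (observed - chance) / (1 - chance)
print(f"Kappa = {kappa:.3f}")   # 0.590
```

By the scale above, 0.59 falls in the "fair to good" range.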
53. Reliability: Calculating Kappa
Two pathologists independently read and score 75
histopathology slides using their own criteria to subtype the
lesion as Grade II or Grade III
Gordis, 2009, Figure 5-17
58. Screening
“Screening is defined as the presumptive identification of
unrecognized disease or defects by the application of tests,
examinations, or other procedures that can be applied rapidly.”
Friis and Sellers, 2009
“For screening to be of benefit, treatment given during the
detectable preclinical phase must result in a better prognosis
than therapy given after symptoms develop.”
Hennekens and Buring, 1987
59. Screening
Nature of the Disease
•Important health problem
Morbidity/Mortality
•Treatable
Unethical to screen if untreatable, except to prevent transmission
(e.g., early cases of AIDS versus protecting blood supply)
•Relatively high prevalence
Rare disease → PPV is low & cost per case detected is high
Exception: Phenylketonuria (PKU), 1 in 15,000 births, but
consequences are severe (mental retardation), treatment is
simple (dietary restriction), screening tests are simple.
•Detectable preclinical phase (long latency period)
[Timeline: biological onset → screening (detectable preclinical
phase) → symptoms appear → clinical diagnosis → clinical outcome]
60. Screening
Nature of the Test
•Simple
Easy to learn and perform
No complicated patient preparation
•Rapid
To administer
To yield results
•Safe
Screened populations are overwhelmingly healthy – keep them
that way
•Valid and reliable
High sensitivity
Relatively high specificity – accept some FP as there will be
follow-up confirmatory tests, but what is the cost and morbidity
of the follow-up, the cost of mislabeling someone, etc.
61. Screening
Societal Factors
•Cost
Relatively inexpensive
Benefit/cost ratio favorable versus other health care expenditures
•Acceptable
Unpalatable or difficult tests → refusal to participate
63. Resources
Langlotz, Radiology 2003 – supplement to Gordis, especially
for ROC curves.
Pisano et al. NEJM 2005 – example of an application of
concepts.
Linker, AJPH 2012 – an interesting historical perspective on
screening, specifically for scoliosis.
US Preventive Services Task Force (USPSTF) – the source of
many guidelines (and some controversy) regarding screening:
< http://www.uspreventiveservicestaskforce.org/>.