2. Evaluation of a Diagnostic Test
Accuracy or Validity –
The degree to which a measurement or
study reaches a correct conclusion in
determining the ‘True Status’ of the
disease
• Sensitivity and Specificity describe the
validity of the test relative to a gold standard
• The first step in evaluating a test is
determining the true status of the disease
using the ‘Gold Standard’ for the test
3. Definitions
• Sensitivity – the percentage of diseased persons
who have a +ve test
• Specificity – the percentage of non-diseased
persons who have a –ve test
• Predictive Value – the estimated probability of
disease given the test result: PPV & NPV
• PPV – percentage of persons with a +ve test who have the disease
• NPV – percentage of persons with a –ve test who do not have the disease
4. • The greater the sensitivity of a test, the more likely
the test will detect the disease.
Tests with high sensitivity are useful to
rule out the presence of a disease, because a –ve test will
virtually exclude the possibility that the patient has
the disease.
• The greater the specificity of a test, the more
likely it is that persons without the disease will
have a –ve test.
Very specific tests are often used to confirm the
presence of a disease. If a test is highly specific,
a +ve test result strongly suggests the
presence of the disease.
5. Surgical Biopsy (Gold Standard - TRUTH)

FNA results   Disease (Biopsy +ve)    No Disease (Biopsy -ve)   Total
Positive      14 (True Positives)     8 (False Positives)       22
Negative      1 (False Negatives)     91 (True Negatives)       92
TOTAL         15                      99                        114

False Positives correspond to a Type I error; False Negatives correspond to a Type II error (Power = 1 - β)
Sensitivity = 14/15 = 93%
Specificity = 91/99 = 92%
PPV = 14/22 = 64%
NPV = 91/92 = 99%
Prevalence = 15/114 = 13%
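The five measures on this slide all come from the same four cell counts. A minimal Python sketch of that arithmetic follows; the function name `diagnostic_metrics` and the dictionary keys are illustrative choices, not something the slides define.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Validity and predictive-value measures for a 2 x 2 table.

    tp/fp/fn/tn are the true-positive, false-positive, false-negative
    and true-negative counts against the gold standard.
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),   # positives among the diseased
        "specificity": tn / (tn + fp),   # negatives among the non-diseased
        "ppv": tp / (tp + fp),           # diseased among test positives
        "npv": tn / (tn + fn),           # non-diseased among test negatives
        "prevalence": (tp + fn) / total,
    }


# FNA vs surgical biopsy counts from the slide
fna = diagnostic_metrics(tp=14, fp=8, fn=1, tn=91)
for name, value in fna.items():
    print(f"{name}: {value:.0%}")
```

Sensitivity and specificity are computed down the columns (within biopsy status), whereas PPV and NPV are computed across the rows (within test result), which is why only the latter depend on prevalence.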
6. How a Diagnostic Test Helps
• Whether the probability of breast cancer is 13% (prevalence) or
64% (after a +ve FNA), further workup is required!
• But a –ve test result would reduce the
probability that breast cancer is present to 1%
(100% minus NPV)
So!
No biopsy for now…but keep watch
7. Without Palpable Mass

              Surgical Biopsy
FNA Result    Cancer    No Cancer    Total
+             14        8            22
-             1         91           92
Total         15        99           114

Prevalence = 13%
Sensitivity = 93%
Specificity = 92%
PPV = 64%
NPV = 99%

With Palpable Mass

              Surgical Biopsy
FNA Result    Cancer    No Cancer    Total
+             113       15           128
-             8         181          189
Total         121       196          317

Prevalence = 38%
Sensitivity = 93%
Specificity = 92%
PPV = 88%
NPV = 96%
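The comparison above holds sensitivity and specificity fixed and changes only the prevalence. A small sketch of that dependence, using Bayes' rule on the rounded figures from the slide (the function name `predictive_values` is an illustrative choice):

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    tp = sensitivity * prevalence              # diseased, test +ve
    fn = (1 - sensitivity) * prevalence        # diseased, test -ve
    tn = specificity * (1 - prevalence)        # non-diseased, test -ve
    fp = (1 - specificity) * (1 - prevalence)  # non-diseased, test +ve
    return tp / (tp + fp), tn / (tn + fn)


# Same test (93% sensitive, 92% specific) in the two groups of women
for label, prevalence in (("without palpable mass", 0.13),
                          ("with palpable mass", 0.38)):
    ppv, npv = predictive_values(0.93, 0.92, prevalence)
    print(f"{label}: PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

Because the inputs here are the rounded percentages rather than the raw counts, the results agree with the slide's figures only to within about one percentage point.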
8. Two Stage Screening - Net Specificity
Net Sensitivity = 315/500 = 63%
Net Specificity = (7600 + 1710)/9500 = 98%
Assessing the Validity and Reliability of Diagnostic and Screening Test
PPV-?
12. Two Stage vs Simultaneous Testing
• Compared with either test alone, there is :-
– a loss in Net Sensitivity and a gain in Net Specificity
in Two Stage Testing
– a gain in Net Sensitivity and a loss in Net Specificity
in Simultaneous Testing
• The decision to use Two Stage or Simultaneous
Testing depends on the objective and on practical
considerations (such as length of hospital stay, cost,
invasiveness and insurance coverage)
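The direction of these gains and losses can be checked numerically. The sketch below assumes the two tests err independently of each other given true disease status, which the slides do not state. The test characteristics (70%/80% and 90%/90%) are hypothetical values, chosen only because they reproduce the 63% and 98% quoted on slide 8.

```python
def two_stage(sens1, spec1, sens2, spec2):
    """Test 2 is given only to those +ve on test 1; +ve overall means +ve on both."""
    net_sens = sens1 * sens2
    # True negatives: cleared by test 1, plus test-1 false positives cleared by test 2
    net_spec = spec1 + (1 - spec1) * spec2
    return net_sens, net_spec


def simultaneous(sens1, spec1, sens2, spec2):
    """Both tests are given to everyone; +ve overall means +ve on either."""
    net_sens = sens1 + sens2 - sens1 * sens2
    net_spec = spec1 * spec2
    return net_sens, net_spec


# Hypothetical test characteristics (assumed, not taken from the slides)
tests = (0.70, 0.80, 0.90, 0.90)
print("two stage:    sens = {:.0%}, spec = {:.0%}".format(*two_stage(*tests)))
print("simultaneous: sens = {:.0%}, spec = {:.0%}".format(*simultaneous(*tests)))
```

If the errors of the two tests are correlated, the real gains and losses will be smaller than this independence calculation suggests.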
14. Predictive Value and Prevalence

Without palpable masses; Prevalence = 13%

               Surgical biopsy
FNA results    Cancer    No Cancer    Total
positive       14        8            22
negative       1         91           92
Total          15        99           114

Prevalence = 13%
Sensitivity = 93%
Specificity = 92%
PV + = 64%
PV - = 99%

With palpable masses; Prevalence = 38%

               Surgical biopsy
FNA results    Cancer    No Cancer    Total
positive       41        5            46
negative       3         65           68
Total          44        70           114

Prevalence = 38%
Sensitivity = 93%
Specificity = 92%
PV + = 88%
PV - = 96%
15. Specificity & Predictive Value

Suspicious FNA results considered positive

               Surgical biopsy
FNA results    Cancer    No Cancer    Total
positive       41        5            46
negative       3         65           68
Total          44        70           114

Prevalence = 38%
Sensitivity = 93%
Specificity = 92%
PV + = 88%
PV - = 96%

Suspicious FNA results considered negative

               Surgical biopsy
FNA results    Cancer    No Cancer    Total
positive       33        0            33
negative       11        70           81
Total          44        70           114

Prevalence = 38%
Sensitivity = 75%
Specificity = 100%
PV + = 100%
PV - = 87%
19. Decide whether to order ESR or go directly to MRI?
• A 57 yr old man presents with h/o aching low back pain that
persists at rest and is worsened by bending and lifting.
It has been getting progressively worse over the last 6 wks, awakening him at
night.
• Within the past 10 days he has noticed numbness in the Rt buttock
and thigh and weakness in the Rt lower limb.
• He has had no fever but has lost 10 lb in weight in the last 4 months.
• O/E – Temp. is 99.6°F, tenderness over the lower lumbar spine,
decreased sensation over the dorso-lateral aspect of the Rt foot,
weakness of Rt ankle eversion. Deep tendon reflexes normal
• You suspect that the man has a 20% chance of spinal malignancy
• ESR ≥20 mm/h has 78% sensitivity & 67% specificity
• MRI has 95% sensitivity AND 95% specificity!!
• Suppose we have 1000 such patients
20. We can use any of the following methods
• 2 X 2 Table Method
• Likelihood Ratio (gives odds)
• Decision Tree method and
• Bayes' Theorem
[Figure: three "Don't treat / Test / Treat" threshold bars, labelled "No difference", "Test is quite accurate with little risk" and "Test is of low accuracy or risky"]
21. How will the prior probability of 20% change with a +ve ESR?

              Disease
Test (ESR)    D+          D-          Total
T+            (TP) 156    (FP) 264    420
T-            (FN) 44     (TN) 536    580
Total         200         800         1000

Predictive Value of a Positive test PPV (PV+) = 156/420 = 0.37
Increased from 20% to 37%
Predictive Value of a Negative test NPV (PV-) = 536/580 = 0.92
A) What is the probability that the patient doesn’t have the disease though the test is +ve (FP)?
B) What is the probability that the patient does have the disease though the test is –ve (FN)?
A) 264/420 = 0.63 (hence of minimal use as a screening test), and B) 44/580 = 0.08
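The 2 X 2 table method amounts to splitting the 1000 patients first by the prior probability and then by sensitivity and specificity. A minimal sketch (the function name `expected_table` is an illustrative choice):

```python
def expected_table(n, prior, sensitivity, specificity):
    """Expected 2 x 2 cell counts (TP, FP, FN, TN) for n patients."""
    diseased = n * prior
    healthy = n - diseased
    tp = diseased * sensitivity   # diseased who test +ve
    fn = diseased - tp            # diseased who test -ve
    tn = healthy * specificity    # disease-free who test -ve
    fp = healthy - tn             # disease-free who test +ve
    return tp, fp, fn, tn


# ESR: prior 20%, sensitivity 78%, specificity 67%, 1000 patients
tp, fp, fn, tn = expected_table(1000, 0.20, 0.78, 0.67)
print(f"TP = {tp:.0f}, FP = {fp:.0f}, FN = {fn:.0f}, TN = {tn:.0f}")
print(f"PPV = {tp / (tp + fp):.2f}, NPV = {tn / (tn + fn):.2f}")
print(f"P(no disease | +ve) = {fp / (tp + fp):.2f}")
print(f"P(disease | -ve)    = {fn / (tn + fn):.2f}")
```

Changing the arguments to the MRI figures, with the post-ESR probability as the new prior, gives the corresponding table for the next step.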
22. MRI has 95% sensitivity AND 95% specificity
Applied after a +ve ESR (prior probability now 37%, i.e. 370 of 1000 patients), it will have PVs as follows:-

              Disease
Test (MRI)    D+            D-            Total
T+            (TP) 351.5    (FP) 31.5     383
T-            (FN) 18.5     (TN) 598.5    617
Total         370           630           1000

Predictive Value of a Positive test PPV (PV+) = 351.5/383 = 0.918
Predictive Value of a Negative test NPV (PV-) = 598.5/617 = 0.970
23. Likelihood Ratio
LR = (chance of a +ve test among the diseased) / (chance of a +ve test among the disease-free)
   = sensitivity / (1 - specificity), i.e. true-positive rate / false-positive rate
   = 0.78 / (1 - 0.67) = 0.78/0.33 = 2.36
(in the case of ESR, where sensitivity = 78% & specificity = 67%)
Pre-test odds = prior probability / (1 - prior probability)
              = 0.20 / (1 - 0.20) = 0.25
Post-test odds = LR X pre-test odds = (0.78 X 0.2) / (0.33 X 0.8)
               = 0.156/0.264
               = 2.36 X 0.25 = 0.59
Posterior probability (PPV) = post-test odds / (1 + post-test odds)
                            = 0.59 / (1 + 0.59) = 0.37 = 37%
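The same odds arithmetic can be written as two small functions and then chained: the posterior after a +ve ESR becomes the prior for the MRI. The function names are illustrative, and chaining the tests this way assumes their results are independent given true disease status.

```python
def positive_lr(sensitivity, specificity):
    """Likelihood ratio of a +ve result: true-positive rate / false-positive rate."""
    return sensitivity / (1 - specificity)


def update(prior, likelihood_ratio):
    """Convert a prior probability to a posterior probability via odds."""
    pre_odds = prior / (1 - prior)
    post_odds = likelihood_ratio * pre_odds
    return post_odds / (1 + post_odds)


after_esr = update(0.20, positive_lr(0.78, 0.67))       # 20% prior, ESR +ve
after_mri = update(after_esr, positive_lr(0.95, 0.95))  # then MRI +ve
print(f"after +ve ESR: {after_esr:.3f}")
print(f"after +ve MRI: {after_mri:.3f}")
```

The two printed values correspond to the 37% and 0.918 obtained with the 2 X 2 tables on the two preceding slides.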
24. Reliability or Repeatability of a Test
• Factors responsible for variation in the results:
1. Intra subject (within the individual) variation
2. Intra observer variation (variation in the
reading of a test result by the same observer);
the greater the subjective element in the reading,
the greater this error
3. Inter observer variation (variation in the
reading of a test result between observers)
25. Inter observer Variation

                 Reading No. 1
Reading No. 2    Abnormal    Suspect    Doubtful    Normal
Abnormal         A           B          C           D
Suspect          E           F          G           H
Doubtful         I           J          K           L
Normal           M           N          O           P

Percent Agreement = (A + F + K + P) / Total readings X 100
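Percent agreement is the diagonal of the table divided by the total number of readings. A short sketch with invented counts standing in for the cells A to P:

```python
def percent_agreement(table):
    """Percent of readings on which both readings fall in the same category.

    `table` is a square list of lists: rows are Reading No. 2,
    columns are Reading No. 1, in the same category order.
    """
    total = sum(sum(row) for row in table)
    agreed = sum(table[i][i] for i in range(len(table)))  # A + F + K + P
    return 100 * agreed / total


# Hypothetical counts (not from the slides): Abnormal, Suspect, Doubtful, Normal
readings = [
    [20, 3, 1, 0],
    [4, 15, 2, 1],
    [1, 3, 10, 5],
    [0, 1, 4, 30],
]
print(f"Percent agreement = {percent_agreement(readings):.1f}%")
```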
27. Kappa Statistics
• The extent to which two observers (physician/
nurse/ radiologist etc) agree is an important
Index of good quality of care
• Yet some fraction of the agreement between
two observers arises ‘solely by chance’
• What we want to know is: to what extent
did the education/ training that the observers
received improve the quality of their
observations (how far does the percent
agreement between them exceed chance)?
28. Rationale of the kappa statistics
• First we want to know how much better the
agreement between the observers’ readings is than
would be expected by chance
= (% agreement observed - % agreement expected
by chance alone)
• What is the maximum improvement over chance
that the observers could achieve?
= 100% - % agreement expected by chance alone
• Kappa expresses the extent to which the observed
agreement exceeds chance agreement, relative to the
maximum by which the observers could hope to improve
29. • Kappa = ([Percent Agreement Observed] - [Percent Agreement expected by chance alone]) / (100% - [Percent Agreement expected by chance alone])
Landis and Koch suggested that :-
Kappa greater than 0.75 = excellent agreement
Kappa of 0.40 to 0.75 = intermediate to good agreement
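A minimal sketch of the kappa calculation for two observers follows. It takes the agreement expected by chance from the row and column totals, which is the usual approach for Cohen's kappa but is not spelled out on these slides. The counts are invented for illustration.

```python
def cohen_kappa(table):
    """Kappa for a square agreement table (rows: observer 1, columns: observer 2)."""
    n = sum(sum(row) for row in table)
    k = len(table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement: product of the two observers' marginal proportions
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical readings: 85% observed agreement, 50% expected by chance
example = [
    [45, 5],
    [10, 40],
]
print(f"kappa = {cohen_kappa(example):.2f}")
```

Here observed agreement is 85%, yet kappa is lower, because half of that agreement would be expected by chance alone.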
33. Exercise
• PA was used to screen for breast cancer in 2,500 women
with biopsy-proven adenocarcinoma and in
5,000 age-matched control women. The results
of PA were +ve in 1,800 cases and 800 controls.
• What are the sensitivity, specificity and PPV of
PA?
• 72% (1800/2500), 84% (4200/5000) and 69% (1800/2600)
34. • A screening test is used in the same way in two
similar populations, but the proportion of FPs among those
who test +ve in Pop. A is lower than that among
those who test +ve in Pop. B
• What is the likely explanation for this finding?
• Prevalence of disease is higher in Pop. A
• Compare PA and Audiometry tests for hearing
problems using sensitivity and specificity

PA Test            Hearing Prob.
                   D+      D-
T+                 240     40
T-                 60      160

Audiometry Test    Hearing Prob.
                   D+      D-
T+                 240     40
T-                 60      160

More sensitive and less specific
35. • Two pediatricians test for streptococcal infection: one
(X) uses a standard culture test, which is 90%
sensitive and 96% specific, while the other (Y) uses a new
culture test, which is 96% sensitive and 96% specific.
• If 200 patients undergo culture with both tests,
which is correct:-
a) X will correctly identify more people
b) X will correctly identify fewer people
c) X will correctly identify more people without infection
d) The prevalence of streptococcal infection is needed to know who will
correctly identify more people
• Answer: b
36. • In a screening programme for colon cancer, 50-75 yr olds were
screened with the Hemoccult test.
• If the Hemoccult test has 70% sensitivity & 75% specificity and the
prevalence of colon cancer is 12/1000, what is the PPV?
• About 3.3% (per 1,000 screened: 8.4 TP and 247 FP, so 8.4/255.4 = 3.3%; rounding to 8 TP gives 8/255 = 3.1%)
• If the Hemoccult test is –ve no further testing is
done, but if it is +ve Hemoccult test II is done.
• If this second test is also +ve for blood in stool,
more extensive tests are done. What is the
effect on net sensitivity & specificity of this method?
• Net sensitivity is decreased and net specificity increased
37. • Two physicians were asked to classify X-rays as
abnormal or normal independently.

               Physician 2
Physician 1    Abnormal    Normal    Total
Abnormal       40          20        60
Normal         10          30        40
Total          50          50        100

1. What is the simple % agreement between the two physicians? …………..
2. The % agreement between the two physicians, excluding the X-rays that both
classified as normal, is ………………..
3. The value of kappa is ………….
4. How will you rate this kappa? ………….
52. The process of making an objective and systematic analysis of information from all the randomized controlled trials
Editor's Notes
In a perfect world a medical test would always be correct: a +ve test would mean disease and a –ve test would mean ‘no disease’. In reality that is not the case!
Internal Validity – the extent to which the results reflect the true situation.
To improve on it :-
Restricting the type of subjects (Inclusion/Exclusion Criteria)
The environment in which the study is performed
External Validity – Generalizability (sample size!)
The greater the sensitivity of a test, the more likely the test will detect the disease. For FNA, 93% of all breast cancer patients had positive results. A negative result on a test with high sensitivity would virtually exclude the possibility that the patient has the disease.
The greater the specificity of a test, the more likely it is that persons without the disease will be excluded from consideration of having the disease.
Patient Profile extended – 114 such patients were tested with both biopsy and FNA test
PPV-? – 15.6% to 63%
If we do not want False Positives – two stage testing
If we do not want False Negatives – Simultaneous testing
A decision must be reached on whether to order ESR or proceed directly with lumbar MRI. It depends on:
the prior probability of spinal malignancy
the accuracy of ESR in detecting malignancy among those who have it (sensitivity)
its ability to label a person as disease-free among those who are disease-free (specificity)
Application of probabilistic and statistical principles to individual patient is evidence based medicine
evaluating new diagnostic procedures
determining most cost effective approach
evaluating available Tt options
A physician’s best guess (index of suspicion that pt has the disease – prior probability) depends on knowledge of prev. and is revised upwards or downwards depending up on S/S and other characteristics like race/ age/ sex etc
Decision to do test &/ or Tt depends on the risk of the Diagnostic test, the benefits of Tt to patient, the risk of Tt to patient with and without disease and the accuracy of the Test.
If we ask two untrained people off the street to mark a set of X-rays as positive or negative independently, there will be some agreement between their readings just by chance.
The kappa statistic was proposed by Cohen in 1960.
The ROC plot of a given test is obtained by calculating the sensitivity and specificity at every observed value (taken in turn as the cut-off), and then plotting sensitivity (on the Y axis) against 1 - specificity (on the X axis). A test that does not discriminate between normal and abnormal would give a diagonal straight line from the bottom left corner to the top right corner. All points on such a line represent a 1:1 ratio of true-positive rate to false-positive rate. An ideal test would give a rectangular plot passing from the origin at the bottom left-hand corner up to the top left-hand corner and thence across to the top right-hand corner. In reality the ROC curves of many of the tests in common use fall between these extremes. The cut-off point for deciding between normal and abnormal is often chosen near where the ROC curve changes direction from vertical to horizontal. The more the ROC curve arches into the upper left-hand corner, away from the diagonal, the better the test.
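The construction described in this note can be sketched in a few lines. The test values below are invented, and the sketch assumes that higher values indicate disease, so a value at or above the cut-off is called positive.

```python
def roc_points(diseased_values, healthy_values):
    """Return (1 - specificity, sensitivity) pairs, one per candidate cut-off.

    Every observed value is tried as the cut-off, from the highest
    (nobody positive) down to the lowest (everybody positive).
    """
    cutoffs = sorted(set(diseased_values) | set(healthy_values), reverse=True)
    points = [(0.0, 0.0)]  # cut-off above every observed value
    for cutoff in cutoffs:
        sensitivity = sum(v >= cutoff for v in diseased_values) / len(diseased_values)
        false_pos_rate = sum(v >= cutoff for v in healthy_values) / len(healthy_values)
        points.append((false_pos_rate, sensitivity))
    return points


# Hypothetical test values (not from the notes)
diseased = [30, 45, 50, 62]
healthy = [10, 15, 22, 30]
for x, y in roc_points(diseased, healthy):
    print(f"1 - specificity = {x:.2f}, sensitivity = {y:.2f}")
```

Plotting these pairs with sensitivity on the Y axis gives the ROC curve; for this invented data the curve rises almost vertically before turning horizontal, the shape expected of a well-discriminating test.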
The overall odds ratio is then calculated by pooling the data of all the studies. This is also called the "typical" odds ratio. It is based on the difference between the number of deaths in the treatment group (i.e. the number observed, O) and the number of deaths expected in this group if the treatment were ineffective (i.e. the number expected, E). This gives the Observed minus Expected (O - E) statistic. The confidence interval of O - E is also calculated.
The outcome of each study can be estimated separately by calculating its O - E value. If the observed number (O) differs systematically from the expected number (E), there is clear evidence of an effect.
The totalled O - E gives a measure of the overall statistical significance and the effect size.
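One common way of pooling O - E values is the Peto one-step method, sketched below. This is an assumption about what these notes intend, since they do not name the method. The trial counts are invented, and the function names are illustrative.

```python
import math


def o_minus_e(deaths_treated, n_treated, deaths_control, n_control):
    """Observed minus expected deaths in the treated group, and its variance."""
    n = n_treated + n_control
    deaths = deaths_treated + deaths_control
    expected = n_treated * deaths / n  # deaths expected if treatment is ineffective
    variance = (n_treated * n_control * deaths * (n - deaths)) / (n ** 2 * (n - 1))
    return deaths_treated - expected, variance


def pooled_odds_ratio(trials):
    """Pooled odds ratio with an approximate 95% confidence interval."""
    total_oe = 0.0
    total_var = 0.0
    for trial in trials:
        oe, var = o_minus_e(*trial)
        total_oe += oe
        total_var += var
    log_or = total_oe / total_var
    half_width = 1.96 / math.sqrt(total_var)
    return (math.exp(log_or),
            math.exp(log_or - half_width),
            math.exp(log_or + half_width))


# Hypothetical trials: (deaths treated, n treated, deaths control, n control)
trials = [(10, 100, 20, 100), (15, 200, 25, 200)]
odds_ratio, lower, upper = pooled_odds_ratio(trials)
print(f"pooled OR = {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

A pooled odds ratio below 1 with a confidence interval that excludes 1 corresponds to O falling systematically below E across the trials.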