A presentation offering insight into the tricky concepts of the validity and reliability of screening tests used in day-to-day practice, in simple, accessible language.
1. VALIDITY AND RELIABILITY OF
A SCREENING TEST
DR. SHALINI PATTANAYAK
1ST Year Post Graduate Trainee,
Dept. of Community Medicine,
IPGME&R and SSKM Hospital
2. TABLE OF CONTENTS:
1. SCREENING TEST
2. VALIDITY OF A SCREENING TEST
3. POSITIVE PREDICTIVE VALUE (PPV)
4. NEGATIVE PREDICTIVE VALUE (NPV)
5. TESTS IN SERIES AND PARALLEL
6. BAYES’ THEOREM
7. LIKELIHOOD RATIO
3. TABLE OF CONTENTS (contd.) :
8. PROBLEM OF THE BORDERLINE
9. DETERMINING THE CUTOFF POINT
10. RECEIVER OPERATING CHARACTERISTIC CURVE (ROC CURVE)
11. RELIABILITY OF A SCREENING TEST
12. RELATIONSHIP BETWEEN VALIDITY AND RELIABILITY
13. SUMMARY
14. REFERENCES
4. 1. SCREENING TEST
• CONCEPT : The active search for disease among apparently healthy
people is a fundamental aspect of prevention. This is embodied in
“screening”.
• DEFINITION : The search for unrecognized disease or defect by means
of rapidly applied tests, examinations, or other procedures in
apparently healthy individuals.
5. 1. SCREENING TEST (contd.)
• Some examples of screening tests :
Serological testing for HIV ( ELISA, RAPID)
Neonatal screening for hypothyroidism
Screening for cervical carcinoma ( Pap smear, VIA) and breast
carcinoma ( Mammography ).
Screening for developmental anomalies in the fetus ( AFP : Alpha-
fetoprotein).
6. 1. SCREENING TEST (contd.)
• The screening test must satisfy the following criteria:
acceptability
repeatability ( reliability/precision/reproducibility/consistency)
validity/accuracy
7. 2. VALIDITY OF A SCREENING TEST
• VALIDITY : The term “validity” refers to the extent to which a screening test
accurately measures what it purports or is supposed to measure.
• It expresses the ability of a test to separate or distinguish those who
have the disease from those who do not. E.g. glycosuria is a useful
screening test for diabetes, but a more valid test is glucose tolerance
test.
8. 2. VALIDITY OF A SCREENING TEST (contd.)
• Validity refers to how well the assessment tool/ instrument actually
measures what it is intended to measure i.e. underlying outcome of
interest.
• The assessment tool/instrument must be valid for the results to be
accurately applied and interpreted; e.g. a wrongly calibrated weighing
machine gives readings that may be consistent but are not valid.
• Validity also affects how well results can be interpreted when an
individual is assessed in different situations.
9. 2. VALIDITY (contd.)
• VALIDITY has the following components:
Sensitivity
Specificity
Predictive Accuracy
• These components are expressed as percentages.
• Sensitivity and specificity are usually determined by applying the test to
one group of persons known to have the disease and to a reference group
known not to have the disease.
10. 2. VALIDITY (contd.)
• SENSITIVITY: Ability of a Test to identify correctly all those who have
the disease ( True Positive)
• It is a statistical index of diagnostic accuracy, first described
by Yerushalmy in the 1940s.
• Thus, Sensitivity = [TP / (TP + FN)] × 100
11. 2. VALIDITY (contd.)
• SPECIFICITY : Ability of a Test to identify correctly those who do not
have the disease (True Negative).
• Thus, Specificity = [TN / (TN + FP)] × 100
12. 2. VALIDITY (contd.)
Table 1 : Arbitrary screening test results arranged in a 2×2 table

SCREENING TEST RESULT | DISEASED | NON-DISEASED | TOTAL
POSITIVE              | a (TP)   | b (FP)       | a+b
NEGATIVE              | c (FN)   | d (TN)       | c+d
TOTAL                 | a+c      | b+d          | a+b+c+d
13. 2. VALIDITY (contd.)
• Thus, from the table above:
Sensitivity = [a/(a+c)] × 100
Specificity = [d/(b+d)] × 100
[ Index: a= number of TP, b= Number of FP, c= number of FN, d= number of TN;
(a+c)= total no. of diseased, (b+d)= total no. of non-diseased, (a+b)= total no. of
study subjects tested positive, (c+d)= total no. of study subjects tested negative for
a test; (a+b+c+d)= total no. of study participants].
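As a minimal sketch, the two formulas above can be computed directly from the four cells of the 2×2 table; the counts below are hypothetical, for illustration only:

```python
def sensitivity(tp, fn):
    """TP / (TP + FN), expressed as a percentage."""
    return tp * 100 / (tp + fn)

def specificity(tn, fp):
    """TN / (TN + FP), expressed as a percentage."""
    return tn * 100 / (tn + fp)

# Hypothetical 2x2 cells: a = TP, b = FP, c = FN, d = TN
a, b, c, d = 80, 30, 20, 170
print(sensitivity(a, c))  # 80.0 -> the test detects 80% of the diseased
print(specificity(d, b))  # 85.0 -> the test clears 85% of the non-diseased
```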
14. 2. VALIDITY (contd.)
• FALSE POSITIVES: The term “false positive” means that an individual
who does not have the disease under study is falsely labelled as
“diseased” and is subjected to further diagnostic tests, at the cost of
inconvenience, discomfort, anxiety and expenses, till he is finally
diagnosed free from the disease. False positive tests are a burden to
health expenses and community.
15. 2. VALIDITY (contd.)
• FALSE NEGATIVES: The term “false negative” means that an individual
who has a disease under study is told that he does not have the
disease; in other words he is given a ‘false assurance’. Thus a study
participant who tested false negative may ignore the signs and
symptoms of the disease and further delay treatment. This may be
detrimental if the disease is a serious one and no repeat screening is
planned in the coming days or weeks.
16. 3. POSITIVE PREDICTIVE VALUE (PPV)
• Ability of a Screening Test to identify correctly all those who have the
disease , out of all those who test positive on a screening test
• Also called Post Test Probability
Thus, from the table above, PPV = [a/(a+b)] × 100
17. 4. NEGATIVE PREDICTIVE VALUE (NPV)
• Ability of a Screening Test to identify correctly all those who do not
have the disease , out of all those who test negative on a screening
test
Thus, from the table above, NPV = [d/(c+d)] × 100
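As a minimal sketch with hypothetical counts, PPV and NPV can be computed from the rows of the 2×2 table (whereas sensitivity and specificity come from its columns):

```python
def ppv(tp, fp):
    """TP / (TP + FP): proportion of positives that are truly diseased."""
    return tp * 100 / (tp + fp)

def npv(tn, fn):
    """TN / (TN + FN): proportion of negatives that are truly disease-free."""
    return tn * 100 / (tn + fn)

# Hypothetical 2x2 cells: a = TP, b = FP, c = FN, d = TN
a, b, c, d = 80, 30, 20, 170
print(round(ppv(a, b), 1))  # 72.7
print(round(npv(d, c), 1))  # 89.5
```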
18. 5. TESTS IN SERIES AND PARALLEL
• SERIES : One Test after Another
2nd Test is applied only after 1st Test is Positive
• PARALLEL : Both Tests are applied together
• Table 2: Table showing interpretations of a test in series and parallel
19. 5. TESTS IN SERIES AND PARALLEL (contd.)
• IN SERIES:
Combined Sensitivity of 2 tests A & B in series = Sn(A) × Sn(B)
Combined Specificity of 2 tests A & B in series = [Sp(A) + Sp(B)] − [Sp(A) × Sp(B)]
• IN PARALLEL:
Combined Sensitivity of 2 tests A & B in parallel = [Sn(A) + Sn(B)] − [Sn(A) × Sn(B)]
Combined Specificity of 2 tests A & B in parallel = Sp(A) × Sp(B)
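The four combination formulas above translate directly into code; this sketch uses hypothetical sensitivities and specificities for two tests A and B:

```python
def combined_series(sn_a, sp_a, sn_b, sp_b):
    """Series (B applied only when A is positive): sensitivity falls, specificity rises."""
    return sn_a * sn_b, sp_a + sp_b - sp_a * sp_b

def combined_parallel(sn_a, sp_a, sn_b, sp_b):
    """Parallel (both tests applied together): sensitivity rises, specificity falls."""
    return sn_a + sn_b - sn_a * sn_b, sp_a * sp_b

# Hypothetical tests: A = 80% sens / 90% spec, B = 90% sens / 85% spec
sn, sp = combined_series(0.80, 0.90, 0.90, 0.85)
print(round(sn, 3), round(sp, 3))    # 0.72 0.985
sn, sp = combined_parallel(0.80, 0.90, 0.90, 0.85)
print(round(sn, 3), round(sp, 3))    # 0.98 0.765
```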
20. 6. BAYES’ THEOREM
• If the test results are positive , what is the probability that the patient
has the disease?
• If the test is negative , what is the probability that the person doesn’t
have the disease?
• Bayes’ Theorem provides answer
• It was first described by Thomas Bayes, an 18th-century English clergyman.
22. 6. BAYES’ THEOREM (contd.)
• Bayes’ theorem gives the relationship between the PPV of a screening
test and the sensitivity, specificity, and prevalence of the disease in
the population.
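The formula this slide refers to (its figure did not survive conversion) is the standard Bayes' theorem form, PPV = (Se × P) / (Se × P + (1 − Sp) × (1 − P)), where Se is sensitivity, Sp is specificity, and P is prevalence. A small sketch with hypothetical numbers shows how strongly PPV depends on prevalence:

```python
def ppv_bayes(sens, spec, prev):
    """PPV = Se*P / (Se*P + (1 - Sp)*(1 - P)), all quantities as proportions."""
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))

# Hypothetical test: sensitivity 99%, specificity 90%
print(round(ppv_bayes(0.99, 0.90, 0.10), 3))    # 0.524 at 10% prevalence
print(round(ppv_bayes(0.99, 0.90, 0.001), 3))   # 0.01 at 0.1% prevalence
```

Even a highly sensitive and specific test yields mostly false positives when the disease is rare, which is why screening programmes are often targeted at high-risk groups.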
23. 7. LIKELIHOOD RATIO
• LIKELIHOOD RATIO POSITIVE (LR+) : It is the ratio of sensitivity of a
test to the false-positive error rate. It can be summarized by the
equation : [a/(a+c)] / [b/(b+d)].
• LIKELIHOOD RATIO NEGATIVE (LR−) : It is the false-negative error rate
divided by the specificity of a test. In other words, LR− can be
written as : [c/(a+c)] / [d/(b+d)].
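In sensitivity/specificity terms these ratios are LR+ = Se / (1 − Sp) and LR− = (1 − Se) / Sp. A short sketch using the ELISA figures quoted in the slides (sensitivity 99%, specificity 90%):

```python
def lr_positive(sens, spec):
    """LR+ = sensitivity / false-positive error rate."""
    return sens / (1 - spec)

def lr_negative(sens, spec):
    """LR- = false-negative error rate / specificity."""
    return (1 - sens) / spec

# ELISA for HIV, as quoted in the slides: Se = 0.99, Sp = 0.90
print(round(lr_positive(0.99, 0.90), 1))  # 9.9
print(round(lr_negative(0.99, 0.90), 3))  # 0.011
```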
24. 7. LIKELIHOOD RATIO (contd.)
• If the LR+ of a test is large and the LR- is small, it is probably a good
test.
• Experts in test analysis sometimes calculate the ratio of LR+ to LR- to
obtain a measure of separation between the positive and the
negative test.
• Likelihood ratios are not influenced by prevalence of the disease.
25. 7. LIKELIHOOD RATIO (contd.)
• Likelihood Ratio Positive (LR +)
A positive result on ELISA for HIV is 9.9 times more likely to
occur in a subject with HIV infection than in a subject who does
not have HIV infection.
26. 7. LIKELIHOOD RATIO (contd.)
• Likelihood Ratio Negative (LR -) :
Example : If the sensitivity of ELISA is 99% (i.e. 0.99) and the
specificity is 90% (i.e. 0.90), then LR− = 0.01/0.90 ≈ 0.011.
The interpretation is that a negative result is only about
one-hundredth as likely to occur in a person who really has HIV
infection as in a person who does not have HIV infection.
27. 8. PROBLEM OF THE BORDERLINE
Fig 1: figure showing unimodal and bimodal distribution of variables.
28. 9. DETERMINING THE CUT OFF POINT
• The factors to be considered are:
Disease prevalence : when the disease prevalence is high in a
community, the cut off point is set at a lower level
The disease: when the disease under study is lethal and early
intervention markedly improves the prognosis, the cut-off point is set
at a lower level.
30. 10. RECEIVER OPERATING CHARACTERISTIC
CURVE (ROC CURVE)
• Receiver Operating Characteristic curve or ROC curve, is used to decide on
a good cut-off point of continuous variables in clinical tests, e.g. serum
calcium, blood glucose, blood pressure, etc.
• Origin : World War II, where it was developed to evaluate the ability
of radar receiver operators to distinguish true-positive signals from
false-positive and false-negative ones.
31. 10. ROC CURVE (contd.)
• If a group of investigators wanted to determine the best cut-off for a
blood pressure screening program, they might begin by taking a single
initial blood pressure measurement in a large population and then
performing a complete workup for persistent hypertension in all of
the individuals. Each person would have data on a single screening
blood pressure and an ultimate diagnosis concerning the presence or
absence of hypertension. Based on this information, an ROC curve
could be constructed.
33. 10. ROC CURVE (contd.)
Fig. 2: Figure showing the relation between sensitivity and specificity.
34. 10. ROC CURVE (contd.)
• The ideal ROC curve for a test would rise almost vertically from the
lower left corner and then run almost horizontally along the top of
the plot.
35. 10. ROC CURVE (contd.)
One method of comparing different tests is to determine the area
under the ROC curve for each test and to use a statistical test of
significance to decide if the area under one curve differs significantly
from the area under the other curve. The greater the area under the
curve, the better the test is.
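A minimal sketch of the whole procedure (with hypothetical marker values): sweep a set of cut-offs, compute sensitivity and the false-positive rate at each, and estimate the area under the resulting curve with the trapezoidal rule:

```python
def roc_points(diseased, healthy, cutoffs):
    """One (FPR, TPR) point per cut-off; 'positive' means value >= cut-off."""
    points = []
    for c in cutoffs:
        tpr = sum(x >= c for x in diseased) / len(diseased)  # sensitivity
        fpr = sum(x >= c for x in healthy) / len(healthy)    # 1 - specificity
        points.append((fpr, tpr))
    return points

def auc(points):
    """Trapezoidal area under the ROC curve, anchored at (0,0) and (1,1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

# Hypothetical marker values: diseased subjects tend to score higher
diseased, healthy = [3, 6, 9], [1, 4, 7]
print(round(auc(roc_points(diseased, healthy, [2, 5, 8])), 3))  # 0.778
```

With perfectly separated groups the same code returns an AUC of 1.0, the ideal curve described above.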
36. 11. RELIABILITY OF A SCREENING TEST
• Reliability of a screening test, sometimes also known as
reproducibility or precision or consistency, is the ability of a
measurement to give the same result or similar result with repeated
measurements of the same factor. It means that all values obtained
from the same test will be consistent every time, in the same setting.
38. 11. RELIABILITY (contd.)
• The factors that contribute to the variation between the test results
are:
Observer variations
Biological variations
Errors related to technical methods
39. 11. RELIABILITY (contd.)
Observer variations are of the following two types :
• Intraobserver variation: Sometimes variation occurs between two or
more readings of the same test results made by the observer. Tests
and examinations differ in the degree to which subjective factors
enter into observer’s conclusions, and greater the subjective element
in the reading, greater the intraobserver variation in readings is likely
to be.
40. 11. RELIABILITY (contd.)
• Interobserver Variation : Variation between observers may happen,
where two examiners often do not give the same result. The extent to
which the observers agree or disagree is an important issue and
therefore we need to be able to express the extent of agreement in
quantitative terms also called ‘percent agreement.’
41. 11. RELIABILITY (contd.)
Biological or subject variation : The values obtained in measuring
many human characteristics often vary over time, for a short period
or a longer period, such as seasonal variation. The conditions under
which certain tests are conducted, e.g. shortly after eating, post-
exercise, etc. clearly can lead to different results in the same
individual.
42. 11. RELIABILITY (contd.)
Errors related to technical methods:
defective instruments
erroneous calibrations
faulty reagents
the test itself may be inappropriate or unreliable
43. 11. RELIABILITY (contd.)
• Overall Percent Agreement : If a test uses dichotomous variables, i.e.,
two different results (positive or negative), the results may be
arranged into a 2x2 table, and the observer agreement can be
calculated.
• A common way to measure agreement is to calculate the overall
percent agreement. If 90% of observations are in cells a and d, the
overall percent agreement would be 90%.
44. 11. RELIABILITY (contd.)
• Overall Percent Agreement (contd.) :
In the 2×2 table, cells a and d represent agreement, and cells b and c
represent disagreement.
If 90% of the observations are in cells a and d, then the overall
percent agreement would be 90%.
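As a quick sketch, overall percent agreement is just the agreement cells over the table total (the counts here are illustrative):

```python
def percent_agreement(a, b, c, d):
    """Cells a and d are agreements; b and c are disagreements."""
    return (a + d) * 100 / (a + b + c + d)

# Illustrative counts in an observer-vs-observer 2x2 table
print(percent_agreement(30, 7, 3, 60))  # 90.0
```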
45. 11. RELIABILITY (contd.)
• Drawbacks of Percent Agreement :
it does not give an idea about the prevalence of disease in the participants
studied.
it does not show how disagreements occurred- if the positive and negative
test results were evenly distributed between the two observers or did one
observer consistently find more positive outcomes than the other.
it does not define the extent to which agreement between the observers
improves on chance.
46. 11. RELIABILITY (contd.)
KAPPA STATISTICS : The Kappa test is performed to determine the
extent to which the agreement between two observers improved on
chance agreement alone. Even if two observers only guessed about
the presence or absence of a disease or health condition, they
sometimes would agree by chance.
47. 11. RELIABILITY (contd.)
KAPPA STATISTICS (contd.) :
Let us consider an example : two clinicians have examined the same 100
patients in 1 hour and recorded the presence or absence of murmur in each
patient. For 7 patients, the first clinician reports absence of murmur and
the second reports presence of a murmur, and for 3 patients the second
clinician reports the absence and first clinician reports the presence of a
murmur. For 30 patients the clinicians agree on the presence and for 60
patients they agree on the absence of a murmur.
48. 11. RELIABILITY (contd.)
KAPPA STATISTICS (contd.) : The results obtained can be arranged in a
2×2 table as follows :

Clinician 2 \ Clinician 1 | Murmur present | Murmur absent | Total
Murmur present            | a = 30         | b = 7         | (a+b) = 37
Murmur absent             | c = 3          | d = 60        | (c+d) = 63
Total                     | (a+c) = 33     | (b+d) = 67    | (a+b+c+d) = 100
49. 11. RELIABILITY (contd.)
KAPPA STATISTICS (contd.):
• The observed agreement (Ao) is the actual number of observations in
cells a and d.
• The maximum possible agreement is the total number of
observations (N)
• The agreement expected by chance (Ac) is the sum of expected
number of observations in cells a and d.
• Therefore, kappa = (Ao-Ac) / (N-Ac)
50. 11. RELIABILITY (contd.)
KAPPA STATISTICS (contd.) : From the values in the table above, the
following can be calculated :
• Observed agreement (Ao) : 30+60= 90
• Maximum possible agreement (N) : 30+7+3+60 = 100
• Cell a agreement expected by chance : [(30+7)(30+3)]/100 = 12.2
• Cell d agreement expected by chance : [(3+60)(7+60)]/100 = 42.2
• So, total agreement expected by chance (Ac) : 12.2+42.2 = 54.4
• So, kappa : (Ao-Ac) / (N-Ac) = (90-54.4) / (100-54.4) = 0.78 = 78%
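The arithmetic above can be wrapped in a few lines; this sketch reproduces the slide's kappa of 0.78 from the murmur table (a = 30, b = 7, c = 3, d = 60):

```python
def kappa(a, b, c, d):
    """Cohen's kappa from a 2x2 observer-agreement table."""
    n = a + b + c + d
    ao = a + d                         # observed agreement
    exp_a = (a + b) * (a + c) / n      # cell a count expected by chance
    exp_d = (c + d) * (b + d) / n      # cell d count expected by chance
    ac = exp_a + exp_d                 # total agreement expected by chance
    return (ao - ac) / (n - ac)

print(round(kappa(30, 7, 3, 60), 2))  # 0.78
```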
51. 11. RELIABILITY (contd.)
KAPPA STATISTICS (contd.) : kappa ratio can take values from -1 to
+1.
• -1 = perfect disagreement
• 0 = agreement expected by chance
• +1 = perfect agreement.
52. 11. RELIABILITY (contd.)
KAPPA STATISTICS (contd.):
Interpretation:
• <20% = negligible improvement over chance
• 20-40% = minimal improvement over chance
• 40-60% = fair improvement over chance
• 60-80% = good improvement over chance
• >80% = excellent improvement over chance
53. 11. RELIABILITY (contd.)
WEIGHTED KAPPA :
• Kappa test provides valuable data on observer agreement for
diagnosis recorded as ‘present’ or ‘absent’.
• For diagnoses/studies involving 3 or more outcome categories, such as
negative, suspicious, or probable, we use the weighted kappa test.
• The weighted kappa test gives partial credit for agreement that is
close but not perfect.
54. 12. RELATIONSHIP BETWEEN VALIDITY AND
RELIABILITY
Fig 4 : Figure showing the relationship between validity and reliability of a screening
test.
55. 13. SUMMARY
• VALIDITY is the ability of a screening test to accurately measure what
it purports to measure.
• RELIABILITY is that property of a screening test where repeated
measurements of the same variable done on the same subject or
material at the same time will yield consistent results.
56. 13. SUMMARY (contd.)
• Validity has two principal components, sensitivity and specificity,
which can be determined when a test is applied to a group of diseased
individuals and to a reference group of non-diseased individuals.
These two components, along with predictive accuracy, are inherent
properties of a screening test.
• A good screening test should be highly valid and highly reliable at the
same time.
57. 13. SUMMARY (contd.)
• Agreement among observations between two different observers can be
determined by percent agreement or Kappa statistics.
• A good cut off point for continuous variables obtained from a clinical test
can be determined by ROC curve, wherefrom sensitivity and false positive
error rate can also be determined.
• A screening test, which is used to rule out a diagnosis, must have high
degree of sensitivity.
• A confirmatory test, which is used to rule in a disease, must have high
degree of specificity.
58. 14. REFERENCES
• Park K. Park’s Textbook of Preventive and Social Medicine. 26th ed.
Jabalpur : Banarsidas Bhanot Publishers; 2021. p. 152-6.
• Celentano DD, Szklo M. Gordis Epidemiology. 6th ed. Elsevier; 2019.
p. 94-120.
• Katz DL, Elmore JG, Wild DMG, Lucan SC. Jekel’s Epidemiology,
Biostatistics, Preventive Medicine, and Public Health. 4th ed. Elsevier;
2014. p. 81-96.