Diagnostic test and agreement.pptx

DIAGNOSTIC TEST AND
AGREEMENT
“Accuracy is telling the truth…Precision is telling the same story over and over again.”
Yiding Wang
DR. AMIT KUMARTHAKUR
Dept. of Public Health Dentistry
Govt. Dental College & Research Institute, Bengaluru

CONTENT
• Introduction
• Test with dichotomous results
• Test of continuous variables
• Receiver operating characteristic curve
• Use of multiple tests
  - Sequential testing
  - Simultaneous testing
• Predictive value
• Likelihood ratios
• Observer variation
• Measuring agreement
• Summary & conclusion
• References

Introduction
• As with all other elements of healthcare, medical tests should be thoroughly
evaluated in high quality studies.
• Biased results from poorly designed, conducted or analysed studies may
trigger premature dissemination and implementation of a medical test and
mislead physicians to incorrect decisions regarding the care for an individual
patient.
• Avoidance of these perils requires a proper evaluation of medical tests.

• Diagnostic testing, a term that encompasses physical examination, history,
imaging techniques (e.g., x-rays, CT scans), and procedures (e.g., ECG), as
well as laboratory tests, provides the framework for clinical decision making.
• A given diagnostic test is based on the assumption that diseased and healthy
individuals can be accurately and reproducibly differentiated by the test.

Types of diagnostic tests
1. Qualitative diagnostic tests classify patients as diseased or disease-free
according to the presence or absence of a clinical sign or symptom. For
example, an x-ray might confirm or disprove the existence of a fracture.
2. Quantitative diagnostic tests classify patients as diseased or disease-free on
the basis of whether they fall above or below a preselected cutoff value known
as the positivity criterion. This cutoff value is also referred to as the critical
value or referent value.

Screening test vs diagnostic test

VALIDITY OF TESTS
• The term validity refers to the extent to which a test accurately measures what it
purports to measure.
• In other words, validity expresses the ability of a test to separate or
distinguish those who have the disease from those who do not.
• Validity has two components: sensitivity and specificity.

Tests with Dichotomous
Results (Positive or Negative)

Sensitivity
• The term sensitivity was introduced by Yerushalmy in the 1940s as a statistical
index of diagnostic accuracy.
• It has been defined as the ability of a test to identify correctly all those who
have the disease, that is, the "true positives".
• Sensitivity is the probability of a positive test result (that is, the test indicates
the presence of disease) for a patient with the disease.

Specificity
• Specificity, on the other hand, is the probability of a negative test result (that
is, the test does not indicate the presence of disease) for a patient without the
disease.
• The specificity of the test is defined as the ability of the test to identify
correctly those who do not have the disease.

• An ideal screening test should be 100% sensitive and 100% specific. In
practice, this seldom occurs.
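
Both definitions reduce to simple ratios on a 2×2 table of test result against true disease status. A minimal Python sketch follows; the function name and the counts are hypothetical, chosen only for illustration.

```python
def sensitivity_specificity(tp, fn, fp, tn):
    """Return (sensitivity, specificity) from the four cells of a 2x2 table.

    tp: diseased, test positive     fn: diseased, test negative
    fp: disease-free, test positive tn: disease-free, test negative
    """
    sensitivity = tp / (tp + fn)  # share of diseased people who test positive
    specificity = tn / (tn + fp)  # share of disease-free people who test negative
    return sensitivity, specificity


# Hypothetical counts, not taken from the slides
sens, spec = sensitivity_specificity(tp=80, fn=20, fp=90, tn=810)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Note that each ratio uses only one column of the table (diseased or disease-free), which is why neither depends on how common the disease is.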

• Note that to calculate the sensitivity and specificity of a test, we must know
who “really” has the disease and who does not from a source other than the
test we are using.
• We are, in fact, comparing our test results with some “gold standard”—an
external source of “truth” regarding the disease status of each individual in
the population.

• Sometimes this truth may be the result of another test that has been in use,
and sometimes it is the result of a more definitive, and often more invasive,
test (e.g., cardiac catheterization or tissue biopsy).
• However, in real life, when we use a test to identify diseased and non-
diseased persons in a population, we clearly do not know who has the disease
and who does not. (If this were already established, testing would be
pointless.)

• But to quantitatively assess the sensitivity and specificity of a test, we must
have another source of truth with which to compare the test results.

Why is the problem of false positive
important?
• When we conduct a screening program, we often have a large group of people
who screened positive, including both people who really have the disease
(true positives) and people who do not have the disease (false positives).
• The issue of false positives is important because all people who screened
positive are brought back for more sophisticated and more expensive tests. Of
the several problems that result, the first is a burden on the health care system.

• Another is the anxiety and worry induced in persons who have been told that
they have tested positive.
• Considerable evidence indicates that many people who are labeled “positive”
by a screening test never have that label completely erased, even if the results
of a subsequent evaluation are negative.

• For example, children labeled “positive” in a screening program for heart
disease were handled as handicapped by parents and school personnel even
after being told that subsequent more definitive tests were negative.
• In addition, such individuals may be limited in regard to employment and
insurability by erroneous interpretation of positive screening test results, even
if subsequent tests fail to substantiate any positive finding.
• False positives also bring discredit to screening programmes.

Why is the problem of false negatives
important?
• If a person has the disease but is erroneously informed that the test result is
negative, and if the disease is a serious one for which effective intervention is
available, the problem is indeed critical.
• For example, if the disease is a type of cancer that is curable only in its early
stages, a false-negative result could represent a virtual death sentence.

• Thus, the importance of false-negative results depends on the nature and
severity of the disease being screened for, the effectiveness of available
intervention measures, and whether the effectiveness is greater if the
intervention is administered early in the natural history of the disease.

Tests of Continuous Variables
• Some tests measure a continuous variable, such as blood pressure or blood glucose
level, for which there is no “positive” or “negative” result.
• A decision must therefore be made in establishing a cutoff level above which
a test result is considered positive and below which a result is considered
negative.

• Consider a population of 20 diabetics and 20 non-diabetics who are being screened
using a blood sugar test whose scale is shown along the vertical axis from high to
low. The diabetics are represented by blue circles and the non-diabetics by red
circles.

• We see that although blood sugar levels tend to be higher in diabetics than in
non-diabetics, no level clearly separates the two groups; there is some overlap
of diabetics and non-diabetics at every blood sugar level.

• Nevertheless, we must select a cutoff point so that those whose results fall
above the cutoff can be called “positive” and called back for further testing,
and those whose results fall below that point are called “negative” and are
not called back for further testing.

What if a low cutoff level is chosen?
• Very few diabetics would be misdiagnosed.
• A large proportion of the non-diabetics are now identified as positive by the
test.

What if a high cutoff level is chosen?
• Many of the diabetics will not be identified as positive.
• On the other hand, most of the non-diabetics will be correctly identified as
negative.

• The difficulty is that in the real world, no vertical line separates the diabetics
and non-diabetics; they are mixed together and are not even distinguishable by
red or blue circles.
• So if a high cutoff level is used, all those with results below the line will be
assured they do not have the disease and will not be followed further; if the low
cutoff is used, all those with results above the line will be brought back for
further testing.

• The dilemma involved in deciding whether to set a high cutoff or a low cutoff
rests in the problem of the false positives and the false negatives that result
from the testing.
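
The trade-off can be made concrete with a small sketch. The blood sugar values below are made up for illustration (they are not the data behind the slides), and the helper name `classify_at_cutoff` is hypothetical.

```python
# Hypothetical blood sugar values (mg/dL); illustrative only.
diabetics = [95, 110, 125, 140, 155, 170, 185, 200]
non_diabetics = [70, 80, 90, 100, 105, 115, 130, 145]


def classify_at_cutoff(cutoff):
    """Treat results at or above the cutoff as positive; return (sensitivity, specificity)."""
    true_positives = sum(v >= cutoff for v in diabetics)
    false_positives = sum(v >= cutoff for v in non_diabetics)
    sensitivity = true_positives / len(diabetics)
    specificity = (len(non_diabetics) - false_positives) / len(non_diabetics)
    return sensitivity, specificity


for cutoff in (90, 120, 150):
    sens, spec = classify_at_cutoff(cutoff)
    print(f"cutoff {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

With these made-up values, the lowest cutoff catches every diabetic but flags most non-diabetics, while the highest cutoff clears every non-diabetic but misses half of the diabetics.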

Where to Draw the Cutoff-Point??
If the diagnostic (confirmatory) test is expensive or invasive:
• Minimize false positives. That is, use a cut-point with high specificity
If the penalty for missing a case is high (e.g., the disease is fatal and
treatment exists, or disease easily spreads):
• Maximize true positives. That is, use a cut-point with high sensitivity
• Balance severity of false positives against false negatives.

Principles of testing
1. A screening test, which is used to rule out a diagnosis, should have a high
degree of sensitivity.
2. A confirmatory test, which is used to rule in a diagnosis, should have a high
degree of specificity.

Receiver operating characteristic
(ROC) curves
• The term originated in England during the Battle of Britain.
• To decide on a good cutoff point, investigators could construct this curve.
• The ROC shows the values of sensitivity and specificity associated with each
possible cut-point, so that its graph provides a complete picture of the
performance of the test.

• The Y-axis shows the sensitivity of a test, and the X-axis shows the false
positive error rate (1-specificity).
• An ROC curve that consisted of a straight line from the lower left-hand corner
to the upper right-hand corner would signify a test that was no better than
chance.
• The closer the curve comes to the upper left-hand corner, the more accurate
the test (higher sensitivity and higher specificity).

• In contrast, a test with an ROC curve that passes near the upper left corner
(that is, near 100% sensitivity and 0% FPR [100% specificity]) is nearly
perfect at distinguishing disease from no disease.
• An ideal ROC curve for a test would rise almost vertically from the lower left
corner and move horizontally almost along the upper line.
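
As a rough sketch of how the points of such a curve are obtained, the function below sweeps every observed value as a candidate cut-point and records the false-positive rate (X-axis) and sensitivity (Y-axis) at each. The function name and the sample values are hypothetical.

```python
def roc_points(diseased, healthy):
    """Return (false_positive_rate, sensitivity) pairs, strictest cut-point first."""
    cutoffs = sorted(set(diseased) | set(healthy), reverse=True)
    points = [(0.0, 0.0)]  # cut-point above every value: nobody tests positive
    for c in cutoffs:
        sensitivity = sum(v >= c for v in diseased) / len(diseased)
        false_positive_rate = sum(v >= c for v in healthy) / len(healthy)
        points.append((false_positive_rate, sensitivity))
    return points


# Hypothetical test values, illustrative only
diseased = [95, 110, 125, 140, 155, 170, 185, 200]
healthy = [70, 80, 90, 100, 105, 115, 130, 145]

for fpr, sens in roc_points(diseased, healthy):
    print(f"FPR {fpr:.3f}  sensitivity {sens:.3f}")
```

Plotting these pairs gives the curve; for two perfectly separated groups the list would include the upper left corner (0, 1).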

1. Sequential (Two-stage) Testing
• In sequential or two-stage screening, a less expensive, less invasive, or less
uncomfortable test is generally performed first, and those who screen positive
are recalled for further testing with a more expensive, more invasive, or more
uncomfortable test, which may have greater sensitivity and specificity.
• It is hoped that bringing back for further testing only those who screen
positive will reduce the problem of false positives.

Net sensitivity
After finishing both tests, 315 people of the total 500 people with diabetes in
this population of 10,000 will have been correctly called positive: 315/500 =
63% net sensitivity.
Thus, there is a loss in net sensitivity by using both tests.

Net specificity
• To calculate net specificity, note that 7,600 people of the 9,500 in this
population who do not have diabetes were correctly called negative in the
first-stage screening and were not tested further. An additional 1,710 of those
9,500 non-diabetics were correctly called negative in the second-stage
screening. Thus a total of 7,600 + 1,710 of the 9,500 non-diabetics were
correctly called negative: 9,310/9,500 = 98% net specificity.
• Thus, use of both tests has resulted in a gain in net specificity.
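
The net figures for two-stage testing can be reproduced with a short sketch. The stage specificities of 80% and 90% follow from the counts quoted above (7,600/9,500 and 1,710/1,900); the stage sensitivities of 70% and 90% are assumptions, chosen because their product gives the stated 63%. The function name is hypothetical.

```python
def sequential_net(n_diseased, n_healthy, sens1, spec1, sens2, spec2):
    """Net sensitivity and specificity when only stage-1 positives are retested."""
    # Stage 1
    true_pos_1 = n_diseased * sens1
    true_neg_1 = n_healthy * spec1
    false_pos_1 = n_healthy - true_neg_1
    # Stage 2 is applied only to people who were positive at stage 1
    true_pos_2 = true_pos_1 * sens2
    true_neg_2 = false_pos_1 * spec2
    net_sensitivity = true_pos_2 / n_diseased
    net_specificity = (true_neg_1 + true_neg_2) / n_healthy
    return net_sensitivity, net_specificity


# 500 diabetics and 9,500 non-diabetics, as in the slides;
# the sensitivities 0.70 and 0.90 are assumed values.
net_sens, net_spec = sequential_net(500, 9500, 0.70, 0.80, 0.90, 0.90)
print(f"net sensitivity = {net_sens:.0%}, net specificity = {net_spec:.0%}")
```

Under these assumptions net sensitivity is simply the product of the two sensitivities, which is why it can never exceed either one.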

2. Simultaneous Testing
• When two (or more) tests are conducted in parallel
• The goal is to maximize the probability that subjects with the disease (true
positives) are identified (increase sensitivity)
• Consequently, more false positives are also identified (decrease specificity)

• Thus, when two simultaneous tests are used, there is a net gain in sensitivity
(from 80% using test A and 90% using test B to 98% using both tests
simultaneously).
• However, there is a net loss in specificity (net specificity = 54%) compared to
using either test alone (specificity of 60% using test A and 90% using test B).
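
These net values can be checked with a short sketch. It assumes that the two tests err independently of each other within the diseased and within the disease-free group, an assumption the slide does not state explicitly; the function name is hypothetical.

```python
def simultaneous_net(sens_a, spec_a, sens_b, spec_b):
    """Net values when a person counts as positive if either test is positive.

    Assumes the two tests' errors are independent given true disease status.
    """
    # A case is missed only if both tests miss it
    net_sensitivity = 1 - (1 - sens_a) * (1 - sens_b)
    # A disease-free person is cleared only if both tests are negative
    net_specificity = spec_a * spec_b
    return net_sensitivity, net_specificity


net_sens, net_spec = simultaneous_net(sens_a=0.80, spec_a=0.60, sens_b=0.90, spec_b=0.90)
print(f"net sensitivity = {net_sens:.0%}, net specificity = {net_spec:.0%}")
```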

Comparison of Simultaneous and
Sequential Testing
• In a clinical setting, multiple tests are often used simultaneously. For example,
a patient admitted to a hospital may have an array of tests performed at the
time of admission. When multiple tests are used simultaneously to detect a
specific disease, the individual is generally considered to have tested
“positive” if he or she has a positive result on any one or more of the tests.
• The individual is considered to have tested “negative” if he or she tests
negative on all of the tests. The effects of such a testing approach on
sensitivity and specificity differ from those that result from sequential testing.

• In summary, as we have seen previously, when two sequential tests are used
and those who test positive by the first test are brought in for the second test,
there is a net loss in sensitivity, but a net gain in specificity, compared with
either test alone. However, when two simultaneous tests are used, there is a
net gain in sensitivity and a net loss in specificity, compared with either test
alone.
• Given these results, the decision to use either sequential or simultaneous
testing often is based both on the objectives of the testing, including whether
testing is being done for screening or diagnostic purposes, and on practical
considerations related to the setting in which the testing is being done,
including the length of hospital stay, costs, and degree of invasiveness of each
of the tests as well as the extent of third-party insurance coverage.

PREDICTIVE VALUE OF A TEST
• So far, we have asked, “How good is the test at identifying people with the
disease and people without the disease?” This is an important issue,
particularly in screening free-living populations.
• In effect, we are asking, “If we screen a population, what proportion of
people who have the disease will be correctly identified?” This is clearly an
important public health consideration.
• In the clinical setting, however, a different question may be important for the
physician: If the test results are positive in this patient, what is the probability
that this patient has the disease?

• The concept of predictive value is used to assess the performance of a test in
relation to a given frequency of the condition being sought.
• The positive predictive value (PPV) is defined as the proportion of people
with the condition among all those who received a positive test result.
• Similarly, the negative predictive value is the proportion of people without
the condition among all those who received a negative test result.

Assume a population of 1,000
100 have disease
900 do not have disease

• Unlike the sensitivity and specificity of the test, which can be considered
characteristics of the test being used, the predictive value is affected by two
factors: the prevalence of the disease in the population tested and, when the
disease is infrequent, the specificity of the test being used.

• The more prevalent a disease is in a given population, the higher the
predictive value of a positive screening test.
• The predictive value of a positive result falls as disease prevalence declines.
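
The dependence on prevalence can be illustrated with a minimal sketch. The sensitivity and specificity of 90% and the three prevalence values are hypothetical, and the function name is an assumption of this example.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = prevalence * sensitivity
    false_neg = prevalence * (1 - sensitivity)
    false_pos = (1 - prevalence) * (1 - specificity)
    true_neg = (1 - prevalence) * specificity
    ppv = true_pos / (true_pos + false_pos)  # diseased among test positives
    npv = true_neg / (true_neg + false_neg)  # disease-free among test negatives
    return ppv, npv


# Same hypothetical test (90% sensitive, 90% specific) at three prevalences
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(prevalence, 0.90, 0.90)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

The test itself is unchanged across the three runs; only the population differs, and the positive predictive value falls sharply as the disease becomes rarer.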

Relationship between Positive Predictive
Value and Specificity of the Test

Likelihood ratios
• The likelihood ratio for a test result is defined as the ratio between the
probability of observing that result in patients with the disease in question,
and the probability of that result in patients without the disease.
• Likelihood ratios are, clinically, more useful than sensitivity and specificity.
• They provide a summary of how many times more (or less) likely patients
with the disease are to have that particular result than patients without the
disease, and they can also be used to calculate the probability of disease for
individual patients.

Likelihood ratio for a positive test
(LR+)
• LR+ is defined as the probability of an individual with disease having a
positive test divided by the probability of an individual without disease having
a positive test.
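• The formula for calculating LR+ is
$$\mathrm{LR}^{+} = \frac{P(\text{positive test} \mid \text{disease})}{P(\text{positive test} \mid \text{no disease})} = \frac{\text{sensitivity}}{1 - \text{specificity}}$$
where the numerator is the true-positive rate (TP) and the denominator is the
false-positive rate (FP).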

Test A used to diagnose Disease A.
Sensitivity of Test A - 80% or 0.8,
Specificity -94% or 0.94.
LR+ of this test is calculated as 0.8/(1 − 0.94) = 0.8/0.06 ≈ 13.3.
This means that a person with Disease A is about 13 times more likely to have a
positive test than a person who has not got Disease A.

• An LR+ greater than 1 means that a positive test is more likely to occur in
people with the disease than in people without the disease.
• An LR+ less than 1 means that a positive test is less likely to occur in people
with the disease than in people without the disease.

Likelihood ratio for a negative test
• LR− is defined as the probability of an individual with disease having a
negative test divided by the probability of an individual without disease
having a negative test.
• The formula for calculating LR− is
$$\mathrm{LR}^{-} = \frac{P(\text{negative test} \mid \text{disease})}{P(\text{negative test} \mid \text{no disease})}$$

• The numerator in this equation is the converse of sensitivity (1 − sensitivity),
and the denominator is equivalent to specificity.
• Thus the LR− of a test can be calculated by dividing 1 − sensitivity by
specificity: (1 − sensitivity)/specificity.

• The sensitivity of Test A was 80% or 0.8, and the specificity was 94% or
0.94.
• Thus LR− for Test A is (1 − 0.8)/0.94 = 0.2/0.94 ≈ 0.21.
• This means that the probability of having a negative test for individuals with
Disease A is 0.21 times or about one-fifth of that of those without the disease.
• Put another way, individuals without the disease are about five times more
likely to have a negative test than individuals with the disease.

• The lower the negative LR, the more certain you can be that a negative test
indicates the person does not have the disorder.
• If the LR is close to 1, then the test will not provide much information. That
is, the likelihood that a person has, or does not have, a condition will not
change at all if the LR is exactly 1.0 and the diagnostic hypothesis is no closer
to being confirmed or rejected.
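
A short sketch ties the two ratios together and shows how a likelihood ratio updates the probability of disease for an individual patient, by converting the pre-test probability to odds, multiplying by the LR, and converting back. The Test A figures are those used above; the 10% pre-test probability and the function names are hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a probability of disease: post-test odds = pre-test odds x LR."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Test A: sensitivity 0.8, specificity 0.94
lr_pos, lr_neg = likelihood_ratios(0.8, 0.94)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")

# Hypothetical patient with a 10% pre-test probability of Disease A
print(f"after a positive test: {post_test_probability(0.10, lr_pos):.1%}")
print(f"after a negative test: {post_test_probability(0.10, lr_neg):.1%}")
```

An LR of exactly 1 leaves the odds, and therefore the probability, unchanged.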

Observer variation
• All observations are subject to variation (or error). This may be of two
types:
a. Intra-observer variation
b. Inter-observer variation

Intra-observer variation
• If a single observer takes two measurements (e.g., blood pressure, chest
expansion) in the same subject at the same time and obtains a different result
each time, this is termed intra-observer or within-observer variation.
• This is variation between repeated observations by the same observer on the
same subject or material at the same time.
• Intra-observer variation may often be minimized by taking the average of
several replicate measurements at the same time.

Inter-observer variation
• This is variation between different observers on the same subject or material,
also known as between-observer variation.
• Inter-observer variation has occurred if one observer examines a blood-smear
and finds malaria parasite, while a second observer examines the same slide
and finds it normal.

• Observational errors are common in the interpretation of x-rays, ECG
tracings, readings of blood pressure, and studies of histopathological
specimens.
• Observer errors can be minimized by
(a) standardization of procedures for obtaining measurements and classifications
(b) intensive training of all the observers
(c) making use of two or more observers for independent assessment, etc.
• It is probable that these errors can never be eliminated absolutely.

Measuring agreement
• It is important to know when measurements from the same subjects, but taken
using two different instruments, can be used interchangeably. In any situation,
it is unlikely that two different instruments will give identical results for all
subjects.
• The extent to which two different methods of measuring the same variable
can be compared or can be used interchangeably is called agreement between
the methods. This is also sometimes called the comparability of the tests.
• When assessing agreement, we are often measuring the criterion validity or
the construct validity between two tests.

• Two examiners often do not derive the same result. The extent to which
observers agree or disagree is an important issue, whether we are considering
physical examinations, laboratory tests, or other means of assessing human
characteristics.
• We therefore need to be able to express the extent of agreement in
quantitative terms.

Percent agreement
Add the numbers in all of the cells in which the observers agreed (A + F + K + P), divide that sum
by the total number of observations, and multiply the result by 100 to yield a percentage.

• Merely reporting the overall percent agreement is considered inadequate for
numerous reasons
1) The overall percent agreement does not tell the prevalence of the finding in
the subjects studied.
2) It does not tell how the disagreements occurred.
3) Considerable agreement would be expected by chance alone, and the overall
percent agreement does not tell the extent to which the agreement improves on
chance.

Cohen's kappa
• Items such as physical exam findings, radiographic interpretations, or other
diagnostic tests often rely on some degree of subjective interpretation by observers.
• Studies that measure the agreement between two or more observers should include a
statistic that takes into account the fact that observers will sometimes agree or
disagree simply by chance.
• The kappa statistic (or kappa coefficient) is the most commonly used statistic for this
purpose. A kappa of 1 indicates perfect agreement, whereas a kappa of 0 indicates
agreement equivalent to chance.
• A limitation of kappa is that it is affected by the prevalence of the finding under
observation.
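
A minimal sketch of the calculation: kappa compares the observed proportion of agreement (the percent agreement, as a fraction) with the agreement expected by chance from each observer's marginal totals, kappa = (observed − expected) / (1 − expected). The 2×2 table of counts and the function name are hypothetical.

```python
def cohen_kappa(table):
    """Return (observed agreement, chance-expected agreement, kappa).

    table[i][j] = number of subjects placed in category i by observer 1
    and in category j by observer 2.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement: product of the two observers' marginal proportions
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n**2
    kappa = (observed - expected) / (1 - expected)
    return observed, expected, kappa


# Hypothetical ratings of 100 radiographs by two observers (abnormal / normal)
table = [[40, 10],
         [5, 45]]
observed, expected, kappa = cohen_kappa(table)
print(f"observed {observed:.2f}, expected by chance {expected:.2f}, kappa {kappa:.2f}")
```

In this made-up table the observers agree on 85% of films, but half of that agreement would be expected by chance, so kappa is noticeably lower than the raw percent agreement.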

Weighted Kappa
• Sometimes, we are more interested in the agreement across major categories
in which there is meaningful difference. For example, let’s suppose we had
five categories of “helpfulness of noon lectures:” “very helpful,”
“somewhat helpful,” “neutral,” “somewhat a waste,” and “complete waste.”
• In this case, we may not care whether one resident categorizes as “very
helpful” while another categorizes as “somewhat helpful,” but we might care
if one resident categorizes as “very helpful” while another categorizes as
“complete waste.”
• A weighted kappa, which assigns less weight to agreement as categories are
further apart, would be reported in such instances.
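
One way to sketch this is with linear weights, where full credit is given on the diagonal and the credit shrinks in proportion to the distance between the two ratings. Linear weighting is an assumption here (other schemes, such as quadratic weights, are also in use), and the function name and the table of counts are hypothetical.

```python
def weighted_kappa(table):
    """Linearly weighted kappa for ordered categories.

    table[i][j] = number of subjects placed in category i by rater 1
    and in category j by rater 2. Needs at least two categories.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    observed = 0.0
    expected = 0.0
    for i in range(k):
        for j in range(k):
            # Credit is 1 for exact agreement and 0 for the most distant pair
            weight = 1 - abs(i - j) / (k - 1)
            observed += weight * table[i][j] / n
            expected += weight * row_totals[i] * col_totals[j] / n**2
    return (observed - expected) / (1 - expected)


# Hypothetical ratings on a three-point ordered scale
ratings = [[10, 5, 0],
           [5, 10, 5],
           [0, 5, 10]]
print(f"weighted kappa = {weighted_kappa(ratings):.2f}")
```

With only two categories the weights reduce to 1 on the diagonal and 0 elsewhere, so the result equals the unweighted kappa.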

Relationship between validity and
reliability

Summary & Conclusion
• Three important goals of data collection and analysis are the promotion of
accuracy and precision, the reduction of differential and nondifferential
errors, and the reduction in interobserver and intraobserver variability.
• Various statistical methods are available to study the accuracy and usefulness of
screening tests and diagnostic tests in clinical medicine.

• In general, tests with high sensitivity are helpful for screening patients, whereas
tests with high specificity are useful for confirming the diagnosis in patients.
• In clinical medicine and research, it is important to minimize errors in data, so
that they can be used to guide, rather than mislead, the individuals who
provide the care.

References
1. Hunt R J. Percent Agreement, Pearson's Correlation, and Kappa as Measures
of Inter-examiner Reliability. Preventive and Community Dentistry, College of
Dentistry, University of Iowa, Iowa City, Iowa 52242.
2. Metz CB. Basic Principles of ROC Analysis
3. Florkowski CM. Sensitivity, Specificity, Receiver-Operating Characteristic
(ROC) Curves and Likelihood Ratios: Communicating the Performance of
Diagnostic Tests. Clin Biochem Rev Vol 29 Suppl (i) August 2008

4. Viera AJ, Garrett JM. Understanding Interobserver Agreement: The Kappa
Statistic. Fam Med 2005;37(5):360-3.
5. Jekel JF, Katz DL, Elmore JG, Wild DG. Epidemiology, biostatistics, and
preventive medicine. 3rd edition. Elsevier. USA
6. Park K. Textbook of Preventive and Social Medicine. 22nd edition. Bhanot
publishers. Jabalpur

7. Knapp RG, Miller MC. Clinical epidemiology and biostatistics (NMS).
Harwal publishing company. Pennsylvania
8. Gordis L. Epidemiology. 4th edition. Saunders Elsevier. USA
9. Oral health survey: Basic methods. 4th edition.
