Epidemiological Approaches for Evaluation of diagnostic tests.pptx

Epidemiological Approaches for
Evaluation of Diagnostic Tests
DOI: 10.13140/RG.2.2.21924.86400
Available at:
https://www.researchgate.net/publication/373370051_Epidemiological_Approaches_for_Evaluation_of_diagnostic_tests
Bhoj R Singh
Division of Epidemiology, ICAR-IVRI,
Izatnagar-243122, India
brs1762@gmail.com
1

Introduction
Diagnosis of a disease or a problem is the first step towards its solution or treatment.
Clinical Diagnosis, or Provisional Diagnosis, is the first step in diagnosis and is made
after a physical examination of the patient by a clinician. A clinical diagnosis may
or may not be correct; reaching the Final Diagnosis requires Laboratory Investigations,
including gross and microscopic pathological observations and measurement of disease
indicators. Diagnostic tests may be Non-dichotomous (the test gives continuous values
over a range from sub-normal to above-normal) or Dichotomous (the result is either
plus or minus, disease or no disease). To turn a non-dichotomous test into a
dichotomous one, cut-off values must be established from reference values or Gold
Standard test readings, or with the help of Receiver Operating Characteristic (ROC)
curves, Precision-Recall curves, Likelihood Ratios, etc., and statistical agreement
between the true diagnosis and the laboratory diagnosis must then be established (using
Kappa values, Level of Agreement, χ2 statistics). Thereafter, the Accuracy, Precision,
Bias, Sensitivity, Specificity, Positive Predictive Value, and Negative Predictive
Value of the diagnostic test are established for use in clinical practice. Diagnostic
tests are also used to determine the Prevalence (true prevalence, apparent prevalence)
and Incidence of a disease, to estimate the disease burden so that control measures
can be implemented. There are several Phases in the development and use of a
diagnostic assay, from conceptualization of the test through development and
evaluation to identifying flaws in test use and the factors that influence
interpretation. This presentation mainly deals with the epidemiological evaluation
procedures for diagnostic tests.
2

Why do we need diagnostic tests?
Clinical Diagnosis: A diagnosis offered by a clinician based on clinical
examination of the patient. It may not be correct, particularly when a disease
syndrome has overlapping signs and symptoms. It is also referred to as
Provisional Diagnosis. We therefore need diagnostic tests to reduce the
uncertainty in diagnosis by adding Laboratory Diagnosis or
Laboratory Investigation (which also includes gross and histopathological
examination by a pathologist) to the diagnostic procedures used to reach the
Final Diagnosis. Clinical Diagnosis and Laboratory Diagnosis may disagree,
and reaching the Final Diagnosis may then require other investigations,
such as pathological investigations and experimentation.
Diagnostic tests are often interpreted using a dichotomous outcome
(normal/abnormal, diseased/healthy, treat/don't treat), which poses less
difficulty when the test itself is dichotomous (presence or absence of a
pathogen).
However, when a test gives continuous readings instead of dichotomous
results, interpretation is more difficult: the selection of an appropriate
cut-off point to separate 'positive' and 'negative' results introduces a
level of uncertainty that leads to false positive and false negative
outcomes. Therefore, we need to evaluate the test to establish its validity.
Diagnostic tests are also needed to estimate the true prevalence of disease,
taking into account test sensitivity and specificity (applicable when a Gold
Standard test, i.e., a known disease status, is available).
3

Development of a Diagnostic test/ assay
There are several stages in the development of a
diagnostic assay for clinical use, including
– Development of the concept for the diagnostic test.
– Testing of the concept using different diagnostic
platforms.
– Thereafter, four phases in the development of the
diagnostic assay:
• Phase I: Establishing reference values for the test parameter
• Phase II: Analyzing the validity of the test
• Phase III: Analyzing the impact of incorporation of the test in
diagnostic-therapeutic plan.
• Phase IV: Long-term assessment after incorporating the
diagnostic test in clinical practice.
4

Why evaluate a diagnostic test?
• For correct interpretation:
The presence or detection of a pathogen
does not always indicate disease,
and vice versa.
Figure: Factors influencing interpretation of a diagnostic assay.
5

Some Important Definitions Related to
Diagnostic Interpretations
Trait of test | Definition
Accuracy | The level of agreement between the test result and the "true" clinical state.
Bias | The systematic deviation from the "true" clinical state.
Precision | The degree of fluctuation of a series of measurements around the central measurement.
Sensitivity | Proportion of animals with the disease which test positive (true positives). A test with high "sensitivity" from a laboratory perspective is likely to be "sensitive" from an epidemiological perspective.
Specificity | Proportion of animals without the disease which test negative (true negatives). From a laboratory perspective, the ability of the test to react only when the particular analyte (antigen/antibody) is present and not to cross-react with other compounds. A highly "specific" test in the laboratory is also likely to be "specific" from an epidemiological perspective.
PPV | Positive Predictive Value of a diagnostic test is the probability that an animal that tested positive actually has the disease in question.
NPV | Negative Predictive Value of a diagnostic test is the probability that an animal that tested negative actually does not have the disease in question.
True prevalence | Proportion of animals in the population having the disease in question regardless of their test result. It includes both the "true" positives and the "false" negatives.
Apparent prevalence | Proportion of animals in the population giving a positive diagnosis with the diagnostic test irrespective of their true status for the disease in question. It includes both "true" positives and "false" positives.
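The link between true and apparent prevalence defined above can be made concrete. The following is a minimal sketch, assuming the sensitivity (Se) and specificity (Sp) of the test are already known. Apparent prevalence is the sum of true positives and false positives, AP = Se x TP + (1 - Sp) x (1 - TP); solving this for TP gives the Rogan-Gladen estimator, which is not named in this presentation and is used here only as an illustration. The function names and the numbers are hypothetical.

```python
def apparent_prevalence(true_prev, se, sp):
    """Expected proportion of test positives: true positives plus false positives."""
    return se * true_prev + (1 - sp) * (1 - true_prev)


def estimate_true_prevalence(apparent_prev, se, sp):
    """Rogan-Gladen estimate of true prevalence from apparent prevalence.

    Only meaningful when se + sp > 1; the estimate is clipped to [0, 1].
    """
    if se + sp <= 1:
        raise ValueError("test must perform better than chance (se + sp > 1)")
    estimate = (apparent_prev + sp - 1) / (se + sp - 1)
    return min(1.0, max(0.0, estimate))


# Hypothetical example: 10% true prevalence, Se = 0.90, Sp = 0.95
ap = apparent_prevalence(0.10, 0.90, 0.95)
recovered = estimate_true_prevalence(ap, 0.90, 0.95)
print(f"apparent prevalence = {ap:.3f}, recovered true prevalence = {recovered:.3f}")
```

In this example an imperfect test reports more positives (13.5%) than there are diseased animals (10%), and the correction recovers the true figure.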
6

Outcome | Reasons
True Positive | Actual infection or disease is present.
False Positive | Group (antigen/antibody) cross-reaction; non-specific inhibitors of the diagnostic assays; non-specific agglutinins.
True Negative | Actual infection or disease is absent.
False Negative | Natural/induced tolerance; improper timing, improper selection of test, insensitive tests; non-specific inhibitors, toxic substances; antibiotic-induced immunoglobulin suppression; incomplete or blocking antibody.

True Disease Status (Gold Standard test) | Test positive | Test negative | Prevalence
Disease is present | True positive (A) | False negative (B) | True prevalence (A+B)
Disease is absent | False positive (C) | True negative (D) |
Prevalence | Apparent prevalence (A+C) | |
True positive rate = A/(A+B), False positive rate = C/(C+D)
7

When a Gold Standard test is available to
know the True prevalence
True Disease Status (Gold Standard test) | New test positive | New test negative
Disease present | A | B
Disease absent | C | D
Sensitivity (Se) = A/(A+B)
Specificity (Sp) = D/(C+D)
Apparent prevalence (AP) = (A+C)/(A+B+C+D)
True prevalence (TP) = (A+B)/(A+B+C+D)
Positive Predictive Value (PPV) = A/(A+C) = [TP x Se] / [TP x Se + (1-TP) x (1-Sp)]
Negative Predictive Value (NPV) = D/(B+D) = [(1-TP) x Sp] / [(1-TP) x Sp + TP x (1-Se)]
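As a rough check of these relationships, the sketch below computes each quantity from the four cell counts and also from the prevalence-based expressions. It assumes the table layout used on this slide, with true disease status in rows and test result in columns, so that A = true positives, B = false negatives, C = false positives and D = true negatives. The function name and the counts are hypothetical.

```python
def evaluate_against_gold_standard(true_pos, false_neg, false_pos, true_neg):
    """Test characteristics from a 2x2 table of counts (A, B, C, D in slide notation)."""
    n = true_pos + false_neg + false_pos + true_neg
    se = true_pos / (true_pos + false_neg)      # A/(A+B)
    sp = true_neg / (false_pos + true_neg)      # D/(C+D)
    true_prev = (true_pos + false_neg) / n      # (A+B)/N
    return {
        "Se": se,
        "Sp": sp,
        "AP": (true_pos + false_pos) / n,       # (A+C)/N
        "TP": true_prev,
        "PPV": true_pos / (true_pos + false_pos),   # A/(A+C)
        "NPV": true_neg / (false_neg + true_neg),   # D/(B+D)
        # The same predictive values rebuilt from prevalence, Se and Sp
        "PPV_from_prev": true_prev * se
        / (true_prev * se + (1 - true_prev) * (1 - sp)),
        "NPV_from_prev": (1 - true_prev) * sp
        / ((1 - true_prev) * sp + true_prev * (1 - se)),
    }


# Hypothetical counts: 100 diseased and 900 non-diseased animals
result = evaluate_against_gold_standard(true_pos=90, false_neg=10,
                                        false_pos=45, true_neg=855)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Both routes to PPV and NPV give the same values, which is a convenient way to confirm that the cells have been assigned correctly.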
The PPV of a particular test can be improved by
1. Testing "high risk" groups (animals with clinical signs rather than normal
animals)
2. Using a higher cut-off for higher specificity, or running a test with a higher specificity
3. Using multiple tests for interpretation of results (testing in series to increase
specificity).
Sensitivity can be increased by
1. Using a lower cut-off limit
2. Using more than one test in parallel
8

When no Gold Standard test is available to
know the True prevalence:
comparing two diagnostic tests
Results with Diagnostic Test A | Test B positive | Test B negative
Test A positive | A | B
Test A negative | C | D
Observed agreement (OA) = (A+D)/ N
N= A+B+C+D
Expected agreement (EA) = [{(A + B)/N} x {(A + C)/N}] + [{(C + D)/N} x {(B + D)/N}]
Kappa = (OA - EA)/(1-EA)
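The agreement formulas above can be sketched in a few lines. This is an illustration under stated assumptions, not a reference implementation: A and D are the counts on which the two tests agree (both positive, both negative), B and C are the discordant counts, and the level-of-agreement labels follow the interpretation table on this slide. The function names and the counts are hypothetical.

```python
def cohen_kappa(a, b, c, d):
    """Return observed agreement, expected agreement and kappa for a 2x2 table."""
    n = a + b + c + d
    observed = (a + d) / n
    expected = ((a + b) / n) * ((a + c) / n) + ((c + d) / n) * ((b + d) / n)
    return observed, expected, (observed - expected) / (1 - expected)


def agreement_level(kappa):
    """Map kappa to the level of agreement used in the interpretation table.

    The table leaves small gaps between bands (e.g. 0.20 to 0.21); values in a
    gap are assigned to the lower band here, which is an assumption.
    """
    if kappa > 0.90:
        return "Almost Perfect"
    if kappa >= 0.80:
        return "Strong"
    if kappa >= 0.60:
        return "Moderate"
    if kappa >= 0.40:
        return "Weak"
    if kappa >= 0.21:
        return "Minimal"
    return "None"


# Hypothetical counts: 40 both positive, 45 both negative, 15 discordant
oa, ea, kappa = cohen_kappa(a=40, b=10, c=5, d=45)
print(f"OA = {oa:.2f}, EA = {ea:.2f}, kappa = {kappa:.2f} ({agreement_level(kappa)})")
```

Note that the observed agreement (0.85) is always higher than kappa (0.70) when some agreement is expected by chance, so the two should not be read interchangeably.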
Interpretation of Kappa value
Value of Kappa | Level of Agreement | % of Data that are Reliable
0–0.20 | None | 0–4%
0.21–0.39 | Minimal | 4–15%
0.40–0.59 | Weak | 15–35%
0.60–0.79 | Moderate | 35–63%
0.80–0.90 | Strong | 64–81%
>0.90 | Almost Perfect | 82–100%
9

Kappa versus χ2 Statistics
• The McNemar Chi-square (χ2) test processes the data in the off-diagonal elements (cell “B” and cell
“C”), whereas the Kappa analysis focuses on the data in the major diagonal from upper left to lower
right (cell “A” and cell “D”), examining whether the counts along this diagonal differ
significantly from what is expected. Both compare observed with expected values,
but in different ways.
• The Kappa (К) statistic is frequently used to test interrater reliability. The importance of
rater reliability lies in the fact that it represents the extent to which the data collected in
the study are correct representations of the variables measured.
• Kappa is a form of correlation coefficient. Correlation coefficients cannot be directly
interpreted, but a squared correlation coefficient, called the coefficient of
determination (COD), is directly interpretable.
• Squared Kappa is interpreted in a similar way, as the proportion of the data that are reliable
(the “% of Data that are Reliable” column of the Kappa interpretation table).
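The off-diagonal comparison described in the first bullet can be sketched as follows. This is a minimal illustration, assuming the uncorrected form of the McNemar statistic, χ2 = (B - C)² / (B + C) with 1 degree of freedom; a continuity-corrected variant also exists and gives somewhat larger p-values. The function name and the counts are hypothetical.

```python
import math


def mcnemar_test(b, c):
    """McNemar chi-square (1 df, no continuity correction) from discordant counts.

    b: pairs positive by test A only; c: pairs positive by test B only.
    """
    if b + c == 0:
        raise ValueError("no discordant pairs: the statistic is undefined")
    chi2 = (b - c) ** 2 / (b + c)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical discordant counts
chi2, p = mcnemar_test(b=15, c=5)
print(f"McNemar chi-square = {chi2:.2f}, p = {p:.3f}")
```

A small p-value indicates that one test is positive systematically more often than the other. A large p-value only means that the disagreements are balanced; it does not by itself measure how often the tests agree, which is what Kappa addresses.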
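Results with Diagnostic Test A | Test B positive | Test B negative
Test A positive | 102 | 8
Test A negative | 7 | 103
McNemar χ2 = (8 - 7)² / (8 + 7) = 0.067 (p ≈ 0.80): no significant difference between the two tests.
Observed agreement (OA) = (102 + 103)/(102 + 8 + 7 + 103) = 205/220 = 0.932
Expected agreement (EA) = [(110/220) x (109/220)] + [(110/220) x (111/220)] = 0.500
Kappa = (0.932 - 0.500)/(1 - 0.500) = 0.864: strong agreement (64–81% of data reliable).
10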

Non-dichotomous Diagnostic Tests
• Many diagnostic tests give the answer as numerical values on a continuous
scale. It is often up to the clinician to decide the cut-off above or
below which the measurement is considered abnormal and requires
intervention such as treatment. Once a cut-off point is chosen, the continuous data
are converted into dichotomous positive or negative results, and the sensitivity and
specificity at that cut-off can be calculated. Changing the cut-off point trades
sensitivity against specificity.
To decide the best cut-off value, Receiver Operating Characteristic (ROC) curves are
commonly used. To make an ROC curve, calculate specificity and sensitivity at
different cut-off points and then plot sensitivity on the Y-axis against 1 - specificity on the
X-axis for each cut-off. The point nearest the upper left corner is the
best cut-off point. A test whose ROC curve has a larger area under the curve is
better (more discriminatory) than a test whose curve lies nearer to the diagonal line.
But the clinical significance of a particular value dictates the cut-off to be chosen.
Another simple way to draw an ROC curve is:
1. Calculate the True positive rate and False positive rate at different cut-off values
True positive rate = A/(A+B) = Sensitivity, False positive rate = C/(C+D) = 1 -
specificity
2. For each threshold, plot the FPR value on the x-axis and the TPR value on the y-axis, and
join the dots with a line.
3. The area below the line is called the “Area Under the Curve” (AUC). The
higher the AUC, the better the test discriminates between diseased and non-diseased animals.
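The three steps above can be sketched without any plotting library. The example below is an illustration under stated assumptions: readings at or above the cut-off are called positive, the AUC is approximated by the trapezoidal rule over the chosen cut-offs only, and the "best" cut-off is taken as the ROC point nearest the upper left corner. The readings, the disease labels and the function names are hypothetical.

```python
def roc_points(values, diseased, cutoffs):
    """Return (cutoff, FPR, TPR) per cut-off; a value >= cutoff counts as positive."""
    n_pos = sum(diseased)
    n_neg = len(diseased) - n_pos
    points = []
    for cut in cutoffs:
        tp = sum(1 for v, d in zip(values, diseased) if d and v >= cut)
        fp = sum(1 for v, d in zip(values, diseased) if not d and v >= cut)
        points.append((cut, fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule, anchored at (0,0) and (1,1)."""
    xy = sorted({(fpr, tpr) for _, fpr, tpr in points} | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(xy, xy[1:]))


def best_cutoff(points):
    """Cut-off whose ROC point lies nearest the upper left corner (FPR=0, TPR=1)."""
    return min(points, key=lambda pt: pt[1] ** 2 + (1 - pt[2]) ** 2)[0]


# Hypothetical readings: first five animals non-diseased, last five diseased
values = [0.20, 0.30, 0.40, 0.50, 0.60, 0.35, 0.55, 0.70, 0.80, 0.90]
diseased = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
points = roc_points(values, diseased, cutoffs=[0.3, 0.4, 0.5, 0.6, 0.7])
for cut, fpr, tpr in points:
    print(f"cut-off {cut}: FPR = {fpr:.1f}, TPR = {tpr:.1f}")
auc = auc_trapezoid(points)
best = best_cutoff(points)
print(f"AUC (from these cut-offs) = {auc:.2f}, nearest-corner cut-off = {best}")
```

The nearest-corner rule weights false positives and false negatives equally; where one error is clinically costlier than the other, a different cut-off may be preferable.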
11

Source: https://mmuratarat.github.io/2019-10-01/how-to-compute-AUC-plot-ROC-by-hand
12

ROC Curves versus Precision-Recall Curves
• ROC curves summarize the trade-off between the true
positive rate and the false positive rate for a predictive model
using different probability thresholds.
• Precision-Recall curves summarize the trade-off between the
true positive rate (recall) and the positive predictive value (precision) for a
predictive model using different probability thresholds.
• ROC curves are appropriate when the observations are
balanced between the classes, whereas precision-recall curves
are appropriate for imbalanced data sets.
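One way to see why class balance matters is to hold sensitivity and specificity fixed and change only the mix of diseased and healthy animals. The sketch below does this with expected counts. It is a hypothetical illustration: the function name, group sizes and test characteristics are assumptions.

```python
def rates(n_diseased, n_healthy, se, sp):
    """Expected TPR, FPR and precision for a test applied to two groups."""
    true_pos = se * n_diseased
    false_pos = (1 - sp) * n_healthy
    tpr = true_pos / n_diseased            # recall, y-axis of the ROC curve
    fpr = false_pos / n_healthy            # x-axis of the ROC curve
    precision = true_pos / (true_pos + false_pos)
    return tpr, fpr, precision


balanced = rates(n_diseased=500, n_healthy=500, se=0.90, sp=0.90)
imbalanced = rates(n_diseased=50, n_healthy=950, se=0.90, sp=0.90)
print("balanced:   TPR=%.2f FPR=%.2f precision=%.2f" % balanced)
print("imbalanced: TPR=%.2f FPR=%.2f precision=%.2f" % imbalanced)
```

The ROC coordinates are identical in both populations, while precision falls sharply when the diseased class is rare, which is the information a precision-recall curve exposes.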
13

Likelihood Ratios
• Choosing a cut-off for a test and reporting results only as negative or positive
carries the danger of losing important information: abnormally high or low values,
which are clinically more significant than borderline ones for instituting an
effective therapeutic intervention, are no longer distinguished.
• Likelihood ratios (LR) are a way of getting over this difficulty. A positive
likelihood ratio (LR+) expresses in a single value how many times more likely a
particular test result is in a patient with the disease than in a patient without it.
• LRs can be calculated at different cut-offs and also over a range of
readings.
• LR+ = True positive rate, i.e., [A/(A+B)] / False positive rate, i.e.,
[C/(C+D)]
or LR+ = Sensitivity / (1 - Specificity)
• For a test with a continuous value, LR+ can be calculated at any value of
the test result by dividing the probability of that test result in patients with
disease by the probability of the same test result in patients without
disease.
• LRs, since they are calculated separately within the diseased and the non-diseased
groups, do not change with the prevalence of disease in the population being tested.
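The last bullet can be checked numerically. The sketch below computes LR+ as sensitivity / (1 - specificity) from a 2x2 table in which A = true positives, B = false negatives, C = false positives and D = true negatives, and then repeats the calculation with five times as many diseased animals. The conversion from pre-test to post-test probability through odds is a standard companion step that is not part of this presentation and is included as an assumption; function names and counts are hypothetical.

```python
def positive_likelihood_ratio(true_pos, false_neg, false_pos, true_neg):
    """LR+ = sensitivity / (1 - specificity)."""
    se = true_pos / (true_pos + false_neg)
    sp = true_neg / (true_neg + false_pos)
    return se / (1 - sp)


def post_test_probability(pre_test_prob, lr):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


# Hypothetical counts at 10% prevalence, then with five times more diseased animals
lr_low_prev = positive_likelihood_ratio(90, 10, 45, 855)
lr_high_prev = positive_likelihood_ratio(450, 50, 45, 855)
print(f"LR+ at low prevalence = {lr_low_prev:.1f}, at high prevalence = {lr_high_prev:.1f}")
print(f"post-test probability from 10% pre-test = {post_test_probability(0.10, lr_low_prev):.3f}")
```

LR+ stays the same in both populations, whereas the post-test probability depends on the pre-test probability supplied.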
14

Problems of Evaluation of a new
diagnostic test/ assay
• Reference (gold standard) test bias: This is a real problem
when the new test is better than the reference test in sensitivity and/or
specificity, because its additional correct results are counted as errors.
• Solution: It is reasonable to expect that improved scientific knowledge
leads to the development of tests with better sensitivity and
specificity. However, this needs to be substantiated by
detection of the causal agent using newer methods that are more efficient
than culture methods.
• Sample size in diagnostic test evaluation studies:
Diagnostic test evaluation is also a form of prevalence
(cross-sectional) study; the sample size is calculated as
below:
N = 4 x Zα² x P(1 - P) / W²
• P is the expected proportion (sensitivity or specificity), Zα is
the standard normal deviate for a two-sided α, and W is the
desired total width of the confidence interval. The factor 4 appears
because W is the total width, i.e., twice the margin of error.
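A short sketch of this calculation follows. It assumes the form N = 4 x Zα² x P(1 - P) / W², which corresponds to W being the total width of the confidence interval as defined above, and it rounds up to the next whole subject. The function name and the example values are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p, total_width, alpha=0.05):
    """Subjects needed to estimate proportion p with a CI of the given total width.

    For sensitivity, the result counts diseased subjects; for specificity,
    non-diseased subjects (an interpretation assumed here).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided standard normal deviate
    return math.ceil(4 * z ** 2 * p * (1 - p) / total_width ** 2)


# Hypothetical target: expected sensitivity 0.90, 95% CI of total width 0.10
n_required = sample_size_for_proportion(p=0.90, total_width=0.10)
print(f"required sample size = {n_required}")
```

Halving the desired width roughly quadruples the required sample size, which is why the precision target should be fixed before the evaluation study begins.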
15

Quiz
• How is the diagnostic value of a new assay assessed when a Gold Standard test is available?
• What is the role of sample size in the evaluation of a new diagnostic assay, and what are the determinants of
sample size?
• How is the diagnostic value of two new assays compared in the absence of a Gold Standard test?
• Calculate Se, Sp, PPV, NPV, TP, FP, prevalence (for both the tests) and compare k and χ2 statistics
for the following results
Test A (Gold Standard) | Test B positive | Test B negative
Diseased | 205 | 75
Not Diseased | 25 | 205
• Draw an ROC curve for an ELISA test developed for diagnosis of JE using the following
statistics and decide which cut-off is the best
Cut-off | Sp | Se
0.3 | 0.5 | 0.82
0.4 | 0.6 | 0.82
0.5 | 0.75 | 0.78
0.6 | 0.82 | 0.75
0.7 | 0.82 | 0.75
16

References
• https://www.fao.org/3/X4946E/x4946e0b.htm
• https://cegh.net/article/S2213-3984(15)00089-5/fulltext
• https://www.slideshare.net/singh_br1762/epidemiological-method-to-determine-utility-of-a-diagnostic
• https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6788703/
• https://towardsdatascience.com/understanding-the-roc-curve-in-three-visual-steps-795b1399481c
• https://cegh.net/action/showPdf?pii=S2213-3984%2815%2900089-5
• https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3900052
• https://pressbooks.library.upei.ca/montelpare/chapter/measures-of-association-part-2-the-kappa-statistic
• https://www.jstor.org/stable/23214180
17
