Definition
How well a survey measures what it sets out to measure.
Validity can be determined only if there is a reference procedure or "gold standard".
Examples: food-frequency questionnaires vs. food diaries; reported birth weight vs. hospital record.
Screening test
Validity – get the correct result
Sensitivity – correctly classify cases
Specificity – correctly classify non-cases
[Screening and diagnosis are not identical.]
Validity: 1) Sensitivity
Probability (proportion) of correct classification of cases:
cases found / all cases
Validity: 2) Specificity
Probability (proportion) of correct classification of non-cases:
non-cases identified / all non-cases
[Figure: timeline of disease onsets arising at a rate of 2 cases/month]
[Figure: cases classified by stage: pre-detectable, pre-clinical, clinical, old]
[Figure: the full population of cases across all four stages]
What is the prevalence of "the condition"?
Sensitivity of a screening test
Probability (proportion) of correct classification of detectable, pre-clinical cases
[Figure: cases by stage – pre-detectable (8), pre-clinical (10), clinical (6), old (14)]
Sensitivity = correctly classified / total detectable pre-clinical cases (10)
Specificity of a screening test
Probability (proportion) of correct classification of non-cases:
non-cases identified / all non-cases
[Figure: cases by stage – pre-detectable (8), pre-clinical (10), clinical (6), old (14)]
Specificity = correctly classified / total non-cases (162, or 170 if pre-detectable cases are counted as non-cases)
True Disease Status

                          Cases             Non-cases
Screening      Positive   a (true pos.)     b (false pos.)     a + b
Test Results   Negative   c (false neg.)    d (true neg.)      c + d
                          a + c             b + d

Sensitivity = true positives / all cases     = a / (a + c)
Specificity = true negatives / all non-cases = d / (b + d)
True Disease Status

                          Cases      Non-cases
Screening      Positive     140          1,000      1,140
Test Results   Negative      60         19,000     19,060
                            200         20,000

Sensitivity = true positives / all cases     = 140 / 200       = 70%
Specificity = true negatives / all non-cases = 19,000 / 20,000 = 95%
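The arithmetic above can be checked with a short Python sketch (the counts are the ones in the table; the variable names follow the a/b/c/d cell labels):

```python
# Worked 2x2 table from the example: a = true positives, b = false positives,
# c = false negatives, d = true negatives.
a, b, c, d = 140, 1_000, 60, 19_000

sensitivity = a / (a + c)   # true positives / all cases
specificity = d / (b + d)   # true negatives / all non-cases

print(f"Sensitivity = {sensitivity:.0%}")   # 70%
print(f"Specificity = {specificity:.0%}")   # 95%
```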
Interpreting test results: predictive value
Probability (proportion) of those tested who are correctly classified:
cases identified / all positive tests
non-cases identified / all negative tests
True Disease Status

                          Cases             Non-cases
Screening      Positive   a (true pos.)     b (false pos.)     a + b
Test Results   Negative   c (false neg.)    d (true neg.)      c + d
                          a + c             b + d

PPV = true positives / all positives = a / (a + b)
NPV = true negatives / all negatives = d / (c + d)
True Disease Status

                          Cases      Non-cases
Screening      Positive     140          1,000      1,140
Test Results   Negative      60         19,000     19,060
                            200         20,000

PPV = true positives / all positives = 140 / 1,140     = 12.3%
NPV = true negatives / all negatives = 19,000 / 19,060 = 99.7%
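The predictive values use the same table but divide by the row totals (all test results) rather than the column totals (true disease status), as this sketch shows:

```python
# Same 2x2 table as before; predictive values depend on the test-result totals.
a, b, c, d = 140, 1_000, 60, 19_000

ppv = a / (a + b)   # true positives / all positive tests
npv = d / (c + d)   # true negatives / all negative tests

print(f"PPV = {ppv:.1%}")   # 12.3%
print(f"NPV = {npv:.1%}")   # 99.7%
```

Note how a test with 70% sensitivity and 95% specificity still yields a PPV of only 12.3% when the condition is rare (200 cases in 20,200 people).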
Receiver operating characteristic (ROC) curve
Not all tests give a simple yes/no result. Some yield results that are numerical values along a continuous scale of measurement. In these situations, high sensitivity is obtained at the cost of low specificity, and vice versa.
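A minimal sketch of that trade-off, using made-up test values (the data are hypothetical, not from the example above): sweeping the positivity threshold upward raises specificity while lowering sensitivity, and each threshold gives one point on the ROC curve.

```python
# Hypothetical continuous test results for 5 cases and 5 non-cases.
cases     = [3.1, 4.0, 4.6, 5.2, 6.0]
non_cases = [1.0, 1.8, 2.5, 3.3, 4.1]

# Each cut-off yields one (sensitivity, specificity) pair.
for t in (2.0, 3.0, 4.0, 5.0):
    sens = sum(x >= t for x in cases) / len(cases)         # positives among cases
    spec = sum(x < t for x in non_cases) / len(non_cases)  # negatives among non-cases
    print(f"threshold {t}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```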
Reliability
Repeatability – get the same result
– each time
– from each instrument
– from each rater
If the correct result is not known, then only reliability can be examined.
Definition
The degree of stability exhibited when a measurement is repeated under identical conditions.
Lack of reliability may arise from divergences between observers or instruments of measurement, or instability of the attribute being measured.
(from Last, Dictionary of Epidemiology)
Assessment of reliability
– Test-retest reliability
– Equivalence
– Internal consistency
In SPSS: Reliability; Kappa
EXAMPLE OF PERCENT AGREEMENT
Two physicians are each given a set of 100 X-rays to look at independently and asked to judge whether pneumonia is present or absent. When both sets of diagnoses are tallied, it is found that 95% of the diagnoses are the same.
IS PERCENT AGREEMENT GOOD ENOUGH?
Do these two physicians exhibit high diagnostic reliability?
Can there be 95% agreement between two observers without really having good reliability?
Compare the two tables below:

Table 1               MD#1                Table 2               MD#1
                      Yes    No                                 Yes    No
MD#2      Yes           1     3           MD#2      Yes          43     3
          No            2    94                     No            2    52

In both instances, the physicians agree 95% of the time. Are the two physicians equally reliable in the two tables?
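Percent agreement for the two tables can be verified directly (a quick sketch; the nested lists encode the counts above):

```python
# Each table as [[yes/yes, yes/no], [no/yes, no/no]] counts.
table1 = [[1, 3], [2, 94]]
table2 = [[43, 3], [2, 52]]

for name, t in (("Table 1", table1), ("Table 2", table2)):
    n = sum(map(sum, t))
    agreement = (t[0][0] + t[1][1]) / n   # diagonal cells = agreements
    print(f"{name}: {agreement:.0%} agreement")   # 95% in both
```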
USE OF THE KAPPA STATISTIC TO ASSESS RELIABILITY
Kappa is a widely used test of inter- or intra-observer agreement (or reliability) which corrects for chance agreement.
KAPPA VARIES FROM +1 TO −1
+1 means that the two observers are perfectly reliable. They classify everyone exactly the same way.
0 means there is no relationship at all between the two observers' classifications, above the agreement that would be expected by chance.
−1 means the two observers classify exactly the opposite of each other. If one observer says yes, the other always says no.
GUIDE TO USE OF KAPPA IN EPIDEMIOLOGY AND MEDICINE
Kappa > 0.80 is considered excellent
Kappa 0.60–0.80 is considered good
Kappa 0.40–0.60 is considered fair
Kappa < 0.40 is considered poor
WAY TO CALCULATE KAPPA
1. Calculate observed agreement (observations on which the observers agree / total observations). In both Table 1 and Table 2 it is 95%.
2. Calculate expected agreement (chance agreement) based on the marginal totals.
How do we calculate the N expected by chance in each cell? We assume that each cell should reflect the marginal distributions, i.e. the proportion of yes and no answers should be the same within the four-fold table as in the marginal totals.

OBSERVED              MD#1
                      Yes    No
MD#2      Yes           1     3      4
          No            2    94     96
                        3    97    100

EXPECTED              MD#1
                      Yes    No
MD#2      Yes                        4
          No                        96
                        3    97    100
To do this, we find the proportion of answers in either the column (3% and 97%, yes and no respectively, for MD#1) or row (4% and 96%, yes and no respectively, for MD#2) marginal totals, and apply one of the two proportions to the other marginal total. For example, 96% of the row totals are in the "No" category. Therefore, by chance, 96% of MD#1's "No" column should also be "No" for MD#2. 96% of 97 is 93.12.

EXPECTED              MD#1
                      Yes    No
MD#2      Yes                        4
          No               93.12    96
                        3    97    100
By subtraction, all other cells fill in automatically, and each yes/no distribution reflects the marginal distribution. Any cell could have been used to make the calculation, because once one cell is specified in a 2x2 table with fixed marginal distributions, all other cells are also specified.

EXPECTED              MD#1
                      Yes     No
MD#2      Yes         0.12    3.88     4
          No          2.88   93.12    96
                      3      97      100
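The whole expected table can be generated in one step from the marginals (a sketch; the variable names are mine):

```python
# Marginal totals from the observed table (MD#2 rows, MD#1 columns).
row_totals, col_totals, n = [4, 96], [3, 97], 100

# Expected count in each cell = row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]
print(expected)   # [[0.12, 3.88], [2.88, 93.12]]
```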
Now you can see that, just by the operation of chance, 93.24 of the 100 observations (0.12 + 93.12) should have been agreed to by the two observers.
Below is the formula for calculating kappa from expected agreement:

Kappa = (Observed agreement − Expected agreement) / (100% − Expected agreement)

= (95% − 93.24%) / (100% − 93.24%) = 1.76% / 6.76% = 0.26
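Plugging the numbers into the formula gives the same result (a direct arithmetic check, working in proportions rather than percentages):

```python
observed = 0.95     # 95% observed agreement
expected = 0.9324   # (0.12 + 93.12) / 100, agreement expected by chance
kappa = (observed - expected) / (1 - expected)
print(round(kappa, 2))   # 0.26
```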
How good is a Kappa of 0.26?
Kappa > 0.80 is considered excellent
Kappa 0.60–0.80 is considered good
Kappa 0.40–0.60 is considered fair
Kappa < 0.40 is considered poor
In the second example, the observed agreement was also 95%, but the marginal totals were very different.

ACTUAL                MD#1
                      Yes    No
MD#2      Yes                       46
          No                        54
                      45    55     100
Using the same procedure as before, we calculate the expected N in any one cell, based on the marginal totals. For example, the lower right cell is 54% of 55, which is 29.7.

EXPECTED              MD#1
                      Yes    No
MD#2      Yes                       46
          No               29.7     54
                      45    55     100
And, by subtraction, the other cells are as below. The cells which indicate agreement (20.7 and 29.7) add up to 50.4%.

EXPECTED              MD#1
                      Yes     No
MD#2      Yes         20.7   25.3     46
          No          24.3   29.7     54
                      45     55      100
Enter the two agreements into the formula:

Kappa = (Observed agreement − Expected agreement) / (100% − Expected agreement)

= (95% − 50.4%) / (100% − 50.4%) = 44.6% / 49.6% = 0.90

In this example, the observers have the same % agreement, but now they are much different from chance. A kappa of 0.90 is considered excellent.
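Both worked examples can be reproduced with one small general-purpose function (a sketch assuming 2x2 count tables; `cohen_kappa` is my own name, not a library call):

```python
def cohen_kappa(table):
    """Kappa for a 2x2 agreement table [[yes/yes, yes/no], [no/yes, no/no]]."""
    n = sum(map(sum, table))
    observed = (table[0][0] + table[1][1]) / n
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    # Chance agreement: product of matching marginal proportions, summed.
    expected = sum(rows[i] * cols[i] for i in range(2)) / n ** 2
    return (observed - expected) / (1 - expected)

print(round(cohen_kappa([[1, 3], [2, 94]]), 2))   # 0.26 (Table 1: poor)
print(round(cohen_kappa([[43, 3], [2, 52]]), 2))  # 0.9  (Table 2: excellent)
```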