EPI809/Spring 2008
Fisher’s Exact Test

Fisher’s Exact Test is a test for independence in a 2 x 2 table. It is most
useful when the total sample size and the expected cell counts are small. The
test holds the marginal totals fixed and computes the hypergeometric
probability that n11 is at least as large as the observed value.

Useful when E(cell counts) < 5.
Hypergeometric distribution

Example: a 2x2 table with cell counts a, b, c, d. Assuming the marginal
totals are fixed:
M1 = a+b, M2 = c+d, N1 = a+c, N2 = b+d.
For convenience, assume N1 < N2 and M1 < M2.
The possible values of a are 0, 1, ..., min(M1, N1).

The probability distribution of cell count a is hypergeometric:
N = a + b + c + d = N1 + N2 = M1 + M2
Pr(x = a) = N1! N2! M1! M2! / [N! a! b! c! d!]
Mean(x) = M1 N1 / N
Var(x) = M1 M2 N1 N2 / [N² (N - 1)]

Fisher’s exact test is based on this hypergeometric distribution.
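The formulas above can be cross-checked with a short pure-Python sketch
(illustrative only; the course itself uses SAS, and the function names here are
my own). The factorial expression for Pr(x = a) is rewritten with binomial
coefficients, which is algebraically equivalent and avoids huge intermediate
factorials:

```python
from math import comb

def hypergeom_pmf(a, M1, M2, N1):
    """Pr(x = a) with row totals M1, M2 and first-column total N1 fixed.

    N1! N2! M1! M2! / [N! a! b! c! d!] rewritten with binomial coefficients:
    C(M1, a) * C(M2, N1 - a) / C(N, N1), where N = M1 + M2.
    """
    return comb(M1, a) * comb(M2, N1 - a) / comb(M1 + M2, N1)

def hypergeom_mean(M1, N1, N):
    return M1 * N1 / N                      # Mean(x) = M1 N1 / N

def hypergeom_var(M1, M2, N1, N2):
    N = M1 + M2
    return M1 * M2 * N1 * N2 / (N ** 2 * (N - 1))   # Var(x) = M1 M2 N1 N2 / [N^2 (N-1)]

# Margins of the HIV/STDs example used later in the slides:
print(round(hypergeom_pmf(3, 10, 15, 8), 4))   # 0.3332
```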
Fisher’s Exact Test Example

Is HIV infection related to a history (Hx) of STDs in Sub-Saharan African
countries? Test at the 5% level.

                  HIV Infection
Hx of STDs      yes     no    total
yes               3      7       10
no                5     10       15
total             8     17       25
Hypergeometric prob.

The probability of observing this specific table, given fixed marginal
totals, is
Pr(3, 7, 5, 10) = 10! 15! 8! 17! / [25! 3! 7! 5! 10!] = 0.3332

Note that the above is not the p-value. Why? It is the probability of one
table, not the cumulative (tail) probability.
The tail probability is the sum over all tables at least as extreme:
a = 3, 2, 1, 0.
Hypergeometric prob.

Pr(2, 8, 6, 9) = 10! 15! 8! 17! / [25! 2! 8! 6! 9!] = 0.2082
Pr(1, 9, 7, 8) = 10! 15! 8! 17! / [25! 1! 9! 7! 8!] = 0.0595
Pr(0, 10, 8, 7) = 10! 15! 8! 17! / [25! 0! 10! 8! 7!] = 0.0059
Tail prob = 0.3332 + 0.2082 + 0.0595 + 0.0059 = 0.6068
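The same tail sum can be reproduced in a few lines of Python (an illustrative
cross-check, not part of the course's SAS code). Note that summing the exact
probabilities gives 0.6069; the slide's 0.6068 comes from adding the four
already-rounded terms:

```python
from math import comb

# Margins of the observed table: M1 = 10, M2 = 15, N1 = 8, N = 25.
M1, M2, N1 = 10, 15, 8

def pmf(a):
    # Probability of the table whose (1,1) cell equals a, margins fixed
    return comb(M1, a) * comb(M2, N1 - a) / comb(M1 + M2, N1)

# Lower-tail probability: sum over a = 0, 1, 2, 3
tail = sum(pmf(a) for a in range(4))
print(round(tail, 4))   # 0.6069
```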
Fisher’s Exact Test: SAS Code

Data dis;
input STDs $ HIV $ count;
cards;
no no 10
no yes 5
yes no 7
yes yes 3
;
run;
proc freq data=dis order=data;
weight Count;
tables STDs*HIV/chisq fisher;
run;
Pearson chi-square test and Yates correction

Pearson chi-square test:
χ² = ∑i (Oi − Ei)² / Ei
follows a chi-square distribution with df = (r − 1)(c − 1) if Ei ≥ 5.

Yates correction, for a more accurate p-value when Oi and Ei are close to
each other:
χ² = ∑i (|Oi − Ei| − 0.5)² / Ei
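Both statistics can be checked on the HIV/STDs table with a small Python
sketch (illustrative; variable names are my own). One caveat: SAS's
continuity-adjusted statistic appears to truncate |O − E| − 0.5 at zero
before squaring, which would explain the 0.0000 on the output slide, so
this sketch does the same:

```python
# Pearson chi-square for the STDs-by-HIV table, with and without the
# Yates continuity correction (|O - E| - 0.5 is floored at zero, as
# SAS seems to do for its "Continuity Adj. Chi-Square").
obs = [[3, 7], [5, 10]]

row = [sum(r) for r in obs]               # [10, 15]
col = [sum(c) for c in zip(*obs)]         # [8, 17]
n = sum(row)                              # 25

# Expected counts under independence: E_ij = row_i * col_j / n
exp = [[row[i] * col[j] / n for j in range(2)] for i in range(2)]

chi2 = sum((obs[i][j] - exp[i][j]) ** 2 / exp[i][j]
           for i in range(2) for j in range(2))
yates = sum(max(0.0, abs(obs[i][j] - exp[i][j]) - 0.5) ** 2 / exp[i][j]
            for i in range(2) for j in range(2))

print(round(chi2, 4), round(yates, 4))   # 0.0306 0.0
```

The 0.0306 matches the Chi-Square line of the SAS output that follows.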
Fisher’s Exact Test: SAS Output
Statistics for Table of STDs by HIV
Statistic DF Value Prob
--------------------------------------------------------
Chi-Square 1 0.0306 0.8611
Likelihood Ratio Chi-Square 1 0.0308 0.8608
Continuity Adj. Chi-Square 1 0.0000 1.0000
Mantel-Haenszel Chi-Square 1 0.0294 0.8638
Phi Coefficient -0.0350
Contingency Coefficient 0.0350
Cramer's V -0.0350
WARNING: 50% of the cells have expected counts less
than 5. Chi-Square may not be a valid test.
Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F) 10
Left-sided Pr <= F 0.6069
Right-sided Pr >= F 0.7263
Table Probability (P) 0.3332
Two-sided Pr <= P 1.0000
Fisher’s Exact Test

The output consists of three p-values:

Left: Use this when the alternative to independence is negative association
between the variables; that is, the observations tend to lie in the
lower-left and upper-right cells.

Right: Use this when the alternative to independence is positive association
between the variables; that is, the observations tend to lie in the
upper-left and lower-right cells.

2-Tail: Use this when there is no prior directional alternative.
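All three p-values can be reconstructed from the hypergeometric distribution
in a short Python sketch (an illustration; the two-sided rule used here, the
sum of all table probabilities no larger than the observed one, is my
understanding of SAS's definition):

```python
from math import comb

M1, M2, N1 = 10, 15, 8        # margins of the STDs-by-HIV table
a_obs = 3                     # observed (yes, yes) count

def pmf(a):
    return comb(M1, a) * comb(M2, N1 - a) / comb(M1 + M2, N1)

support = range(max(0, N1 - M2), min(M1, N1) + 1)
p_obs = pmf(a_obs)

left  = sum(pmf(a) for a in support if a <= a_obs)               # negative association
right = sum(pmf(a) for a in support if a >= a_obs)               # positive association
two   = sum(pmf(a) for a in support if pmf(a) <= p_obs + 1e-12)  # tables no more likely

print(round(left, 4), round(right, 4), round(two, 4))   # 0.6069 0.7263 1.0
```

These agree with the SAS output above (0.6069, 0.7263, 1.0000); the two-sided
value is 1 here because the observed table is the single most probable one.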
Useful Measures of Association - Nominal Data

Cohen’s Kappa ( κ )

Also referred to as Cohen’s General Index of Agreement. It was originally
developed to assess the degree of agreement between two judges or raters
classifying n items into 2 nominal categories. Subsequent work by Fleiss and
Light extended this statistic to more than 2 categories.
Useful Measures of Association - Nominal Data

Cohen’s Kappa ( κ )

                      Inspector A
Inspector ‘B’       Good       Bad
Good             .33 (a)   .07 (b)    .40 (pB)
Bad              .13 (c)   .47 (d)    .60 (qB)
                 .46 (pA)  .54 (qA)   1.00
Useful Measures of Association - Nominal Data

Cohen’s Kappa ( κ )

Cohen’s κ requires that we calculate two values:
• po : the proportion of cases in which agreement occurs. In our example,
po = .33 + .47 = 0.80.
• pe : the proportion of cases in which agreement would be expected purely by
chance, based upon the marginal frequencies; here
pe = pA pB + qA qB = (.46)(.40) + (.54)(.60) = 0.508 for our data.
Useful Measures of Association - Nominal Data

Cohen’s Kappa ( κ )

Then Cohen’s κ measures the agreement between the two raters and is defined
by

κ = (po − pe) / (1 − pe) = (0.80 − 0.508) / (1 − 0.508) = 0.593
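The arithmetic can be verified with a few lines of Python (illustrative; the
marginals are derived from the cell proportions rather than hard-coded):

```python
# Cohen's kappa from the inspector table's cell proportions (n = 100 items).
cells = [[0.33, 0.07],    # row 1: Inspector B "Good" vs A "Good", "Bad"
         [0.13, 0.47]]    # row 2: Inspector B "Bad"  vs A "Good", "Bad"

pB = cells[0][0] + cells[0][1]    # B's "Good" marginal = 0.40
pA = cells[0][0] + cells[1][0]    # A's "Good" marginal = 0.46

po = cells[0][0] + cells[1][1]               # observed agreement = 0.80
pe = pA * pB + (1 - pA) * (1 - pB)           # chance agreement   = 0.508

kappa = (po - pe) / (1 - pe)
print(round(kappa, 3))   # 0.593
```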
Useful Measures of Association - Nominal Data

Cohen’s Kappa ( κ )

To test the null hypothesis that the true kappa κ = 0, we use the standard
error σκ; then z = κ/σκ ~ N(0, 1), where

σκ = sqrt( pe + pe² − ∑_{i=1}^{k} pi. p.i (pi. + p.i) ) / [ (1 − pe) √n ]

and pi. and p.i refer to the row and column proportions (in the textbook,
ai = pi. and bi = p.i).
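This standard error and the resulting z can be checked against the SAS output
on the next slide with a short Python sketch (illustrative; names are mine):

```python
from math import sqrt

# z-test for H0: kappa = 0, using the slide's standard-error formula
# on the inspector data (n = 100).
cells = [[0.33, 0.07], [0.13, 0.47]]   # rows = Inspector B, cols = Inspector A
n = 100

row = [sum(r) for r in cells]          # p_i. = [0.40, 0.60]
col = [sum(c) for c in zip(*cells)]    # p_.i = [0.46, 0.54]

po = cells[0][0] + cells[1][1]
pe = sum(row[i] * col[i] for i in range(2))
kappa = (po - pe) / (1 - pe)

s = sum(row[i] * col[i] * (row[i] + col[i]) for i in range(2))
se0 = sqrt(pe + pe ** 2 - s) / ((1 - pe) * sqrt(n))   # SE of kappa under H0
z = kappa / se0

print(round(se0, 4), round(z, 2))   # 0.0993 5.98
```

These match the SAS output's "ASE under H0" of 0.0993 and Z of 5.9796.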
Useful Measures of Association - Nominal Data - SAS Code

Data kap;
input B $ A $ prob;
n=100;
count=prob*n;
cards;
Good Good .33
Good Bad .07
Bad Good .13
Bad Bad .47
;
run;
proc freq data=kap order=data;
weight Count;
tables B*A/chisq;
test kappa;
run;
Useful Measures of Association - Nominal Data - SAS Output

The FREQ Procedure
Statistics for Table of B by A
Simple Kappa Coefficient
--------------------------------
Kappa 0.5935
ASE 0.0806
95% Lower Conf Limit 0.4356
95% Upper Conf Limit 0.7514
Test of H0: Kappa = 0
ASE under H0 0.0993
Z 5.9796
One-sided Pr > Z <.0001
Two-sided Pr > |Z| <.0001
Sample Size = 100
McNemar’s Test for Correlated (Dependent) Proportions
Basis / Rationale for the Test

The approximate test previously presented for assessing a difference in
proportions is based upon the assumption that the two samples are
independent.

Suppose, however, that we are faced with a situation where this is not true.
Suppose we randomly select 100 people and find that 20% of them have flu.
Then imagine that we apply some type of treatment to all sampled people; on
a post-test, we again find that 20% have flu.
We might be tempted to suppose that no hypothesis test is required under
these conditions, since the ‘Before’ and ‘After’ p values are identical and
would surely yield a test statistic of 0.00.

The problem with this thinking, however, is that the two sample p values are
dependent, in that each person was assessed twice. It is possible that the
20 people who had flu originally still had flu. It is also possible that the
20 people who had flu on the second test were a completely different set of
20 people!
It is for precisely this type of situation that McNemar’s Test for
Correlated (Dependent) Proportions is applicable.

McNemar’s Test employs two unique features for testing the two proportions:
* a special fourfold (2 x 2) contingency table, and
* a special-purpose chi-square (χ²) test statistic (the approximate test).
Nomenclature for the Fourfold (2 x 2) Contingency Table

       A         B      (A + B)
       C         D      (C + D)
    (A + C)   (B + D)      n

where (A + B) + (C + D) = (A + C) + (B + D) = n = the number of units
evaluated, and df = 1.
Underlying Assumptions of the Test

1. Construct a 2x2 table where the paired observations are the sampling
units.
2. Each observation must represent a single joint event possibility; that
is, it must be classifiable in only one cell of the contingency table.
3. In its exact form, this test may be conducted as a one-sample binomial
test on the B and C cells.
4. The expected frequency (fe) for the B and C cells of the contingency
table must be equal to or greater than 5, where

fe = (B + C) / 2

from the fourfold table.
McNemar’s Test for Correlated (Dependent) Proportions
Sample Problem
A randomly selected group of 120 students taking a
standardized test for entrance into college exhibits a
failure rate of 50%. A company which specializes in
coaching students on this type of test has indicated that it
can significantly reduce failure rates through a four-hour
seminar. The students are exposed to this coaching
session, and re-take the test a few weeks later. The
school board is wondering if the results justify paying this
firm to coach all of the students in the high school. Should
they? Test at the 5% level.
The summary data for this study appear as follows:

Number of Students    Status Before Session    Status After Session
        4                     Fail                     Fail
        4                     Pass                     Fail
       56                     Fail                     Pass
       56                     Pass                     Pass
The data are then entered into the fourfold contingency table:

                   Before
                 Pass    Fail
After   Pass      56      56     112
        Fail       4       4       8
                  60      60     120
Step I: State the null and research hypotheses

H0: π1 = π2
H1: π1 ≠ π2

where π1 and π2 relate to the proportions of observations reflecting changes
in status (the B and C cells in the table).

Step II: α = 0.05
Step III: State the associated test statistic

χ² = ( |B − C| − 1 )² / (B + C)
Step IV: State the distribution of the test statistic when H0 is true

χ² follows a chi-square distribution with 1 df when H0 is true.
Step V: Reject H0 if χ² > 3.84

[Figure: chi-square distribution with df = 1, showing the critical value
3.84.]
Step VI: Calculate the value of the test statistic

χ² = ( |56 − 4| − 1 )² / (56 + 4) = 51² / 60 = 43.35
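The corrected statistic above, and the uncorrected value reported by SAS on
the output slide, can both be reproduced in Python (illustrative; cell labels
follow the fourfold table):

```python
# McNemar's statistic for the coaching data: B and C are the discordant
# cells of the fourfold table (Fail->Pass = 56, Pass->Fail = 4).
B, C = 56, 4

fe = (B + C) / 2                                   # expected-frequency check: 30 >= 5
chi2_corrected = (abs(B - C) - 1) ** 2 / (B + C)   # Step VI, with continuity correction
chi2_uncorrected = (B - C) ** 2 / (B + C)          # the uncorrected form SAS reports

print(chi2_corrected, round(chi2_uncorrected, 4))  # 43.35 45.0667
```

Either way, the statistic far exceeds 3.84, so H0 is rejected.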
McNemar’s Test for Correlated (Dependent) Proportions: SAS Code

Data test;
input Before $ After $ count;
cards;
pass pass 56
pass fail 4
fail pass 56
fail fail 4;
run;
proc freq data=test order=data;
weight Count;
tables Before*After/agree;
run;
McNemar’s Test for Correlated (Dependent) Proportions: SAS Output

Statistics for Table of Before by After

McNemar's Test
------------------------
Statistic (S)    45.0667
DF                     1
Pr > S            <.0001

Sample Size = 120

(SAS reports the statistic without the continuity correction.)
Conclusion: What we have learned

1. Comparison of binomial proportions using the Z and χ² tests.
2. The χ² test for independence of two variables.
3. Fisher’s exact test for independence.
4. McNemar’s test for correlated data.
5. The kappa statistic.
6. Use of SAS PROC FREQ.
Conclusion: Further readings

Read the textbook for:
1. Power and sample size calculation
2. Tests for trends

Mc namer test of correlation

  • 1.
    EPI809/Spring 2008EPI809/Spring 200811 Fisher’s Exact TestFisher’s Exact Test  Fisher’s Exact TestFisher’s Exact Test is a test foris a test for independence in a 2 X 2 table. It is mostindependence in a 2 X 2 table. It is most useful when the total sample size and theuseful when the total sample size and the expected values are small. The test holdsexpected values are small. The test holds the marginal totals fixed and computes thethe marginal totals fixed and computes the hypergeometric probability that nhypergeometric probability that n1111 is atis at least as large as the observed valueleast as large as the observed value  Useful when E(cell counts) < 5.Useful when E(cell counts) < 5.
  • 2.
    EPI809/Spring 2008EPI809/Spring 200822 Hypergeometric distributionHypergeometric distribution  Example: 2x2 table with cell counts a, b, c, d. AssumingExample: 2x2 table with cell counts a, b, c, d. Assuming marginal totals are fixed:marginal totals are fixed: M1 = a+b, M2 = c+d, N1 = a+c, N2 = b+d.M1 = a+b, M2 = c+d, N1 = a+c, N2 = b+d. for convenience assume N1<N2, M1<M2.for convenience assume N1<N2, M1<M2. possible value of a are: 0, 1, …min(M1,N1).possible value of a are: 0, 1, …min(M1,N1).  Probability distribution of cell count a follows aProbability distribution of cell count a follows a hypergeometric distribution:hypergeometric distribution: N = a + b + c + d = N1 + N2 = M1 + M2N = a + b + c + d = N1 + N2 = M1 + M2  Pr (x=a) = N1!N2!M1!M2! / [N!a!b!c!d!]Pr (x=a) = N1!N2!M1!M2! / [N!a!b!c!d!]  Mean (x) = M1N1/ NMean (x) = M1N1/ N  Var (x) = M1M2N1N2 / [NVar (x) = M1M2N1N2 / [N22 (N-1)](N-1)]  Fisher exact test is based on this hypergeometric distr.Fisher exact test is based on this hypergeometric distr.
  • 3.
    EPI809/Spring 2008EPI809/Spring 200833 Fisher’s Exact Test ExampleFisher’s Exact Test Example  Is HIV Infection related to Hx of STDs in SubIs HIV Infection related to Hx of STDs in Sub Saharan African Countries? Test at 5%Saharan African Countries? Test at 5% level.level. yesyes nono totaltotal yesyes 33 77 1010 nono 55 1010 1515 totaltotal 88 1717 HIV Infection Hx of STDs
  • 4.
    EPI809/Spring 2008EPI809/Spring 200844 Hypergeometric prob.Hypergeometric prob.  Probability of observing this specific tableProbability of observing this specific table given fixed marginal totals isgiven fixed marginal totals is Pr (3,7, 5, 10) = 10!15!8!17!/[25!3!7!5!10!]Pr (3,7, 5, 10) = 10!15!8!17!/[25!3!7!5!10!] = 0.3332= 0.3332  Note the above is not the p-value. Why?Note the above is not the p-value. Why?  Not the accumulative probability, or notNot the accumulative probability, or not the tail probability.the tail probability.  Tail prob = sum of all values (a = 3, 2, 1,Tail prob = sum of all values (a = 3, 2, 1, 0).0).
  • 5.
    EPI809/Spring 2008EPI809/Spring 200855 Hypergeometric probHypergeometric prob Pr (2, 8, 6, 9) = 10!15!8!17!/[25!2!8!6!9!]Pr (2, 8, 6, 9) = 10!15!8!17!/[25!2!8!6!9!] = 0.2082= 0.2082 Pr (1, 9, 7, 8) = 10!15!8!17!/[25!1!9!7!8!]Pr (1, 9, 7, 8) = 10!15!8!17!/[25!1!9!7!8!] = 0.0595= 0.0595 Pr (0,10, 8, 7) = 10!15!8!17!/[25!0!10!8!7!]Pr (0,10, 8, 7) = 10!15!8!17!/[25!0!10!8!7!] = 0.0059= 0.0059 Tail prob = .3332+.2082+.0595+.0059 = .Tail prob = .3332+.2082+.0595+.0059 = . 60686068
  • 6.
    EPI809/Spring 2008EPI809/Spring 200866 Data dis;Data dis; input STDs $ HIV $ count;input STDs $ HIV $ count; cards;cards; no no 10no no 10 No Yes 5No Yes 5 yes no 7yes no 7 yes yes 3yes yes 3 ;; run;run; proc freq data=dis order=data;proc freq data=dis order=data; weight Count;weight Count; tables STDs*HIV/chisq fisher;tables STDs*HIV/chisq fisher; run;run; Fisher’s Exact TestFisher’s Exact Test SAS CodesSAS Codes
  • 7.
    EPI809/Spring 2008EPI809/Spring 200877 Pearson Chi-squares testPearson Chi-squares test Yates correctionYates correction  Pearson Chi-squares testPearson Chi-squares test χχ22 = ∑= ∑ii (O(Oii-E-Eii))22 /E/Eii follows a chi-squaresfollows a chi-squares distribution with df = (r-1)(c-1)distribution with df = (r-1)(c-1) if Eif Eii ≥ 5.≥ 5.  Yates correction for more accurate p-valueYates correction for more accurate p-value χχ22 = ∑= ∑ii (|O(|Oii-E-Eii| - 0.5)| - 0.5)22 /E/Eii when Owhen Oii and Eand Eii are close to each other.are close to each other.
  • 8.
    EPI809/Spring 2008EPI809/Spring 200888 Fisher’s Exact Test SAS OutputFisher’s Exact Test SAS Output Statistics for Table of STDs by HIV Statistic DF Value Prob ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Chi-Square 1 0.0306 0.8611 Likelihood Ratio Chi-Square 1 0.0308 0.8608 Continuity Adj. Chi-Square 1 0.0000 1.0000 Mantel-Haenszel Chi-Square 1 0.0294 0.8638 Phi Coefficient -0.0350 Contingency Coefficient 0.0350 Cramer's V -0.0350 WARNING: 50% of the cells have expected counts less than 5. Chi-Square may not be a valid test. Fisher's Exact Test ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Cell (1,1) Frequency (F) 10 Left-sided Pr <= F 0.6069 Right-sided Pr >= F 0.7263 Table Probability (P) 0.3332 Two-sided Pr <= P 1.0000
  • 9.
    EPI809/Spring 2008EPI809/Spring 200899 Fisher’s Exact TestFisher’s Exact Test  The output consists of three p-values:  Left: Use this when the alternative to independence is that there is negative association between the variables. That is, the observations tend to lie in lower left and upper right.  Right: Use this when the alternative to independence is that there is positive association between the variables. That is, the observations tend to lie in upper left and lower right.  2-Tail: Use this when there is no prior alternative.
  • 10.
    EPI809/Spring 2008EPI809/Spring 20081010 Useful Measures of AssociationUseful Measures of Association - Nominal Data- Nominal Data  Cohen’s Kappa (Cohen’s Kappa ( κκ ))  Also referred to as Cohen’s General Index ofAlso referred to as Cohen’s General Index of Agreement. It was originally developed toAgreement. It was originally developed to assess the degree of agreement between twoassess the degree of agreement between two judges or raters assessing n items on thejudges or raters assessing n items on the basis of a nominal classification for 2basis of a nominal classification for 2 categories. Subsequent work by Fleiss andcategories. Subsequent work by Fleiss and Light presented extensions of this statistic toLight presented extensions of this statistic to more than 2 categories.more than 2 categories.
  • 11.
    EPI809/Spring 2008EPI809/Spring 20081111 Useful Measures of AssociationUseful Measures of Association - Nominal Data- Nominal Data  Cohen’s Kappa (Cohen’s Kappa ( κκ )) .33 (a) .07 (b) .13 (c) .47 (d) Inspector A Good Bad Inspector ‘B’ Bad Good .40 (pB) .60 (qB) .46 .54 1.00 (pA) (qA)
  • 12.
    EPI809/Spring 2008EPI809/Spring 20081212 Useful Measures of AssociationUseful Measures of Association - Nominal Data- Nominal Data  Cohen’s Kappa (Cohen’s Kappa ( κκ ))  Cohen’sCohen’s κκ requires that we calculate tworequires that we calculate two values:values: • ppoo : the proportion of cases in which agreement: the proportion of cases in which agreement occurs. In our example, this value equals 0.80.occurs. In our example, this value equals 0.80. • Pe : the proportion of cases in which agreementPe : the proportion of cases in which agreement would have been expected due purely to chance,would have been expected due purely to chance, based upon the marginal frequencies; wherebased upon the marginal frequencies; where pe = pApB + qAqB = 0.508 for our data
  • 13.
    EPI809/Spring 2008EPI809/Spring 20081313 Useful Measures of AssociationUseful Measures of Association - Nominal Data- Nominal Data  Cohen’s Kappa (Cohen’s Kappa ( κκ ))  Then, Cohen’sThen, Cohen’s κκ measures themeasures the agreement between two variables and isagreement between two variables and is defined bydefined by κ = po - pe 1 - pe = 0.593
  • 14.
    EPI809/Spring 2008EPI809/Spring 20081414 Useful Measures of AssociationUseful Measures of Association - Nominal Data- Nominal Data  Cohen’s Kappa (Cohen’s Kappa ( κκ ))  To test the Null Hypothesis that the trueTo test the Null Hypothesis that the true kappakappa κκ = 0, we use the Standard Error:= 0, we use the Standard Error:  then z =then z = κκ//σσκκ~N(0,1)~N(0,1) σκ = 1 (1 - pe) n pe + pe 2 - ∑ pi.p.i (pi.+p.i) i = 1 k where pi. & p.i refer to row and column proportions (in textbook, ai= pi. & bi=p.i)
  • 15.
    EPI809/Spring 2008EPI809/Spring 20081515 Useful Measures of AssociationUseful Measures of Association - Nominal Data- SAS CODES- Nominal Data- SAS CODES DataData kap;kap; input B $ A $ prob;input B $ A $ prob; n=n=100100;; count=prob*n;count=prob*n; cards;cards; Good Good .33Good Good .33 Good Bad .07Good Bad .07 Bad Good .13Bad Good .13 Bad Bad .47Bad Bad .47 ;; runrun;; procproc freqfreq data=kap order=data;data=kap order=data; weight Count;weight Count; tables B*A/chisq;tables B*A/chisq; test kappa;test kappa; runrun;;
  • 16.
    EPI809/Spring 2008EPI809/Spring 20081616 Useful Measures of AssociationUseful Measures of Association - Nominal Data-- Nominal Data- SAS OUTPUTSAS OUTPUT The FREQ Procedure Statistics for Table of B by A Simple Kappa Coefficient ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Kappa 0.5935 ASE 0.0806 95% Lower Conf Limit 0.4356 95% Upper Conf Limit 0.7514 Test of H0: Kappa = 0 ASE under H0 0.0993 Z 5.9796 One-sided Pr > Z <.0001 Two-sided Pr > |Z| <.0001 Sample Size = 100
  • 17.
    EPI809/Spring 2008EPI809/Spring 20081717 McNemar’s Test for CorrelatedMcNemar’s Test for Correlated (Dependent) Proportions(Dependent) Proportions
  • 18.
    EPI809/Spring 2008EPI809/Spring 20081818 McNemar’s Test for CorrelatedMcNemar’s Test for Correlated (Dependent) Proportions(Dependent) Proportions  The approximate test previously presented forThe approximate test previously presented for assessing a difference in proportions is based uponassessing a difference in proportions is based upon the assumption that the two samples arethe assumption that the two samples are independentindependent..  Suppose, however, that we are faced with a situationSuppose, however, that we are faced with a situation where this is not true. Suppose we randomly-selectwhere this is not true. Suppose we randomly-select 100 people, and find that 20% of them have flu. Then,100 people, and find that 20% of them have flu. Then, imagine that we apply some type of treatment to allimagine that we apply some type of treatment to all sampled peoples; and on a post-test, we find that 20%sampled peoples; and on a post-test, we find that 20% have flu.have flu. Basis / Rationale for the TestBasis / Rationale for the Test
  • 19.
    EPI809/Spring 2008EPI809/Spring 20081919 McNemar’s Test for CorrelatedMcNemar’s Test for Correlated (Dependent) Proportions(Dependent) Proportions  We might be tempted to suppose that no hypothesisWe might be tempted to suppose that no hypothesis test is required under these conditions, in that thetest is required under these conditions, in that the ‘Before’ and ‘After’‘Before’ and ‘After’ pp values are identical, and wouldvalues are identical, and would surely result in a test statistic value of 0.00.surely result in a test statistic value of 0.00.  The problem with this thinking, however, is that the twoThe problem with this thinking, however, is that the two samplesample pp values are dependent, in that each personvalues are dependent, in that each person was assessed twice. Itwas assessed twice. It isis possible that the 20 peoplepossible that the 20 people that had flu originally still had flu. It is also possiblethat had flu originally still had flu. It is also possible that the 20 people that had flu on the second testthat the 20 people that had flu on the second test werewere a completely different seta completely different set of 20 people!of 20 people!
  • 20.
    EPI809/Spring 2008EPI809/Spring 20082020 McNemar’s Test for CorrelatedMcNemar’s Test for Correlated (Dependent) Proportions(Dependent) Proportions  It is for precisely this type of situation thatIt is for precisely this type of situation that McNemar’s Test for Correlated (Dependent)McNemar’s Test for Correlated (Dependent) Proportions is applicable.Proportions is applicable.  McNemar’s Test employs two unique featuresMcNemar’s Test employs two unique features for testing the two proportions:for testing the two proportions: * a special fourfold contingency table; with a* a special fourfold contingency table; with a * special-purpose chi-square (* special-purpose chi-square (ΧΧ 22 ) test statistic) test statistic (the approximate test).(the approximate test).
  • 21.
    EPI809/Spring 2008EPI809/Spring 20082121 McNemar’s Test for CorrelatedMcNemar’s Test for Correlated (Dependent) Proportions(Dependent) Proportions Nomenclature for the Fourfold (2 x 2)Nomenclature for the Fourfold (2 x 2) Contingency TableContingency Table AA BB DDCC (B + D)(B + D)(A + C)(A + C) (C + D)(C + D) (A + B)(A + B) nn where (A+B) + (C+D) = (A+C) + (B+D) = n = number of units evaluated and where df = 1
  • 22.
    EPI809/Spring 2008EPI809/Spring 20082222 McNemar’s Test for CorrelatedMcNemar’s Test for Correlated (Dependent) Proportions(Dependent) Proportions 1. Construct a 2x2 table where the paired1. Construct a 2x2 table where the paired observations are the sampling units.observations are the sampling units. 2. Each observation must represent a single2. Each observation must represent a single joint event possibility; that is, classifiable injoint event possibility; that is, classifiable in only one cell of the contingency table.only one cell of the contingency table. 3. In it’s Exact form, this test may be conducted3. In it’s Exact form, this test may be conducted as a One Sample Binomial for the B & C cellsas a One Sample Binomial for the B & C cells Underlying Assumptions of the TestUnderlying Assumptions of the Test
EPI809/Spring 2008 23
McNemar’s Test for Correlated (Dependent) Proportions
Underlying Assumptions of the Test
 4. The expected frequency (fe) for the B and C cells of the contingency table must be equal to or greater than 5, where fe = (B + C) / 2 from the fourfold table.
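The assumption in point 4 is easy to check in code. A small Python sketch (an illustration, not from the original slides), using the discordant counts from the coaching example:

```python
def expected_discordant(b, c):
    """Expected frequency f_e for each discordant cell under H0."""
    return (b + c) / 2

# B = 56, C = 4 from the coaching example in this deck
fe = expected_discordant(56, 4)
print(fe, fe >= 5)   # 30.0 True
```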
EPI809/Spring 2008 24
McNemar’s Test for Correlated (Dependent) Proportions
Sample Problem
A randomly selected group of 120 students taking a standardized test for entrance into college exhibits a failure rate of 50%. A company which specializes in coaching students on this type of test has indicated that it can significantly reduce failure rates through a four-hour seminar. The students are exposed to this coaching session, and re-take the test a few weeks later. The school board is wondering if the results justify paying this firm to coach all of the students in the high school. Should they? Test at the 5% level.
EPI809/Spring 2008 25
McNemar’s Test for Correlated (Dependent) Proportions
Sample Problem
The summary data for this study appear as follows:

Number of Students   Status Before Session   Status After Session
        4                   Fail                   Fail
        4                   Pass                   Fail
       56                   Fail                   Pass
       56                   Pass                   Pass
EPI809/Spring 2008 26
McNemar’s Test for Correlated (Dependent) Proportions
The data are then entered into the Fourfold Contingency table:

                    Before
                 Pass    Fail
After   Pass      56      56     112
        Fail       4       4       8
                  60      60     120
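The fourfold table can be tallied directly from the paired records. Here is a Python sketch (variable names are mine, not from the slides) that reproduces the cell counts from the study's summary data:

```python
from collections import Counter

# (before, after) pairs expanded from the study's summary counts
pairs = ([("pass", "pass")] * 56 + [("fail", "pass")] * 56 +
         [("pass", "fail")] * 4 + [("fail", "fail")] * 4)

cells = Counter(pairs)
a = cells[("pass", "pass")]   # pass both times (concordant)
b = cells[("fail", "pass")]   # fail -> pass (discordant)
c = cells[("pass", "fail")]   # pass -> fail (discordant)
d = cells[("fail", "fail")]   # fail both times (concordant)

print(a, b, c, d, a + b + c + d)   # 56 56 4 4 120
```

Only the discordant cells b and c enter McNemar's statistic; the concordant cells carry no information about a change in the proportions.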
EPI809/Spring 2008 27
McNemar’s Test for Correlated (Dependent) Proportions
 Step I: State the Null & Research Hypotheses
H0: π1 = π2
H1: π1 ≠ π2
where π1 and π2 relate to the proportion of observations reflecting changes in status (the B & C cells in the table)
 Step II: α = 0.05
EPI809/Spring 2008 28
McNemar’s Test for Correlated (Dependent) Proportions
 Step III: State the Associated Test Statistic

Χ² = { |B - C| - 1 }² / (B + C)
EPI809/Spring 2008 29
McNemar’s Test for Correlated (Dependent) Proportions
 Step IV: State the Distribution of the Test Statistic When H0 is True
Χ² ~ χ² with 1 df when H0 is true
EPI809/Spring 2008 30
McNemar’s Test for Correlated (Dependent) Proportions
Step V: Reject H0 if Χ² > 3.84
[Figure: chi-square distribution with df = 1; critical value 3.84]
EPI809/Spring 2008 31
McNemar’s Test for Correlated (Dependent) Proportions
 Step VI: Calculate the Value of the Test Statistic

Χ² = { |56 - 4| - 1 }² / (56 + 4) = 51² / 60 = 43.35

Since 43.35 > 3.84, reject H0.
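The Step VI arithmetic can be verified with a short Python sketch of the continuity-corrected statistic (an illustration, not the course's own code):

```python
def mcnemar_corrected(b, c):
    """McNemar chi-square with the continuity correction:
    (|b - c| - 1)^2 / (b + c)."""
    return (abs(b - c) - 1) ** 2 / (b + c)

# discordant cells from the coaching example: B = 56, C = 4
chi2 = mcnemar_corrected(56, 4)
print(round(chi2, 2))   # 43.35, which exceeds the 3.84 critical value
```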
EPI809/Spring 2008 32
McNemar’s Test for Correlated (Dependent) Proportions - SAS Code

Data test;
  input Before $ After $ Count;
  cards;
pass pass 56
pass fail 4
fail pass 56
fail fail 4
;
run;

proc freq data=test order=data;
  weight Count;
  tables Before*After / agree;
run;
EPI809/Spring 2008 33
McNemar’s Test for Correlated (Dependent) Proportions - SAS Output

Statistics for Table of Before by After

McNemar's Test
------------------------
Statistic (S)    45.0667
DF                     1
Pr > S            <.0001

Sample Size = 120

(This is the statistic without the continuity correction.)
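The SAS value can be reproduced by hand: PROC FREQ reports the uncorrected statistic (B - C)² / (B + C). A Python sketch (not part of the original slides), using the identity P(χ²₁ > x) = erfc(√(x/2)) for the 1-df p-value:

```python
from math import erfc, sqrt

def mcnemar_uncorrected(b, c):
    """McNemar chi-square without the continuity correction."""
    stat = (b - c) ** 2 / (b + c)
    # for a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value

stat, p = mcnemar_uncorrected(56, 4)
print(round(stat, 4), p)   # 45.0667, with p far below .0001
```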
EPI809/Spring 2008 34
Conclusion: What we have learned
1. Comparison of binomial proportions using Z and χ² tests
2. The χ² test for independence of 2 variables
3. Fisher’s exact test for independence
4. McNemar’s test for correlated data
5. The Kappa statistic
6. Use of SAS PROC FREQ
EPI809/Spring 2008 35
Conclusion: Further readings
Read the textbook for
1. Power and sample size calculation
2. Tests for trends
