Categorical Data Analysis (CDA)
Lecture Note
Chapter One
Contingency Tables
By
Ahmed Hasan (ahmed.hasan@mwu.edu.et)
Assistant Professor of Statistics
College of Natural and Computational Science, Department of Statistics
Madda Walabu University
Robe, Bale, Ethiopia
Outline
• Contingency tables
• Probability structure for contingency tables
– Joint, marginal, and conditional probabilities
– Independence of categorical variables
– Poisson and multinomial sampling
• Comparing proportions in two-by-two tables
• The odds ratio
– Inference for the odds ratio
– Odds ratio and relative risk
• Chi-square tests of independence
– Pearson statistic and the chi-square distribution
– Likelihood-ratio statistic
– Tests of independence
• Testing for independence for ordinal data
• Exact inference for small samples
• Association in three-way tables
• Chi-square test of homogeneity
• Chi-square test of goodness-of-fit
Contingency Tables
• Contingency table data (also called a cross-tabulation or crosstab) is data
arranged in table form, with I rows for the categories of X and J columns
for the categories of Y, where each cell contains a frequency count of
outcomes.
• The table displays the frequency (or count) of different
combinations of the variables' categories.
• It is a key tool in categorical data analysis and helps determine
whether and how two or more categorical variables are related.
• A two-way table with I rows and J columns is called an I × J (read I-by-J) table.
• One that cross classifies three variables is called a three-way
contingency table, and so forth.
Key Components of a Contingency Table
• Rows and Columns:
– Each row represents a category or a level of one variable (often called
the row variable).
– Each column represents a category or a level of another variable
(often called the column variable).
• Cells:
– Each cell in the table represents the count or frequency of occurrences
for the specific combination of the row and column variables.
• Marginal Totals:
– The sums of the rows (the row totals) and the sums of the columns
(the column totals) are displayed at the margins of the table.
– These totals are important for understanding the distribution of data
across the variables.
• Grand Total:
– The overall total number of observations in the table, which is the sum
of all the row totals or column totals.
Contingency Tables: Example
• The following table classifies respondents by gender and opinion about the
afterlife. Of the females, 509 said they believed in an afterlife and 116
said they did not or were undecided.
– Does an association exist between gender and belief in an afterlife?
– Is one gender more likely than the other to believe in an afterlife, or
– is belief in an afterlife independent of gender?
• Table 1: Cross-classification of belief in afterlife by gender
• The marginal totals are the sums across the rows and columns (e.g., 625 total
female observations, 907 Yes responses for belief in an afterlife, and 1127 overall).
Belief in Afterlife
Yes No or Undecided Total
Females 509 116 625
Males 398 104 502
Total 907 220 1127
Statistical Analysis with Contingency Tables
• Chi-Square Test of Independence:
– A common statistical test used with contingency tables is the Chi-Square
Test to determine if two categorical variables are independent.
– The test compares the observed frequencies in the table to the
frequencies that would be expected if the variables were independent.
• Measures of Association:
– For 2x2 tables, measures such as odds ratio or relative risk may be
calculated to assess the strength and direction of the relationship
between the variables.
– For larger tables, Cramér's V or the phi coefficient can measure the
strength of association between variables.
• Fisher’s Exact Test:
– When the sample size is small, Fisher’s Exact Test is used instead
of the Chi-Square test to evaluate the significance of the association
between two categorical variables.
Types of Contingency Tables
• 2x2 Table: Involves two variables, each with two
categories (e.g., Yes/No or Success/Failure).
• 2x3 Table: Involves two variables, one with two
categories and the other with three.
• NxM Table: Involves two variables with more than two categories
each; larger tables are common in multi-way contingency analysis.
Probability Structure for Contingency Tables
Joint, Marginal, and Conditional Probabilities
• Probabilities for contingency tables can be of three types – joint, marginal,
or conditional.
• Suppose first that a randomly chosen subject from the population of
interest is classified on X and Y .
– Let πij = P(X = i, Y = j) denote the probability that (X, Y) falls in the cell
in row i and column j.
• The probabilities {πij} form the joint distribution of X and Y .
– They satisfy Σi,j πij = 1.
• The marginal distributions are the row and column totals of the joint
probabilities.
– We denote these by {πi+} for the row variable and {π+j} for the column variable,
where the subscript “+” denotes the sum over the index it replaces.
• For 2 × 2 tables
π1+ = π11 + π12 and π+1 = π11 + π21
Probability Structure for Contingency Tables
• Let us use Roman p in place of Greek π.
• For example: {pij} are cell proportions in a sample joint distribution.
We denote the cell counts by {nij}.
– The marginal frequencies are the row totals {ni+} and the column totals
{n+j}, and n = Σi,j nij denotes the total sample size.
• The sample cell proportions relate to the cell counts by
pij = nij/n
• In many contingency tables, one variable (say, the column variable,
Y) is a response variable and the other (the row variable, X) is an
explanatory variable.
• Then, it is informative to construct a separate probability
distribution for Y at each level of X.
• Such a distribution consists of conditional probabilities for Y , given
the level of X, which is called a conditional distribution.
Probability Structure for Contingency Tables: Example
• Table 1 above classified n = 1127 respondents to a General Social
Survey by their gender and by their belief in an afterlife.
• The table below illustrates the cell count notation for these data.
– For example, n11 = 509, and the related sample joint proportion is
p11 = 509/1127 = 0.45.
• Here, belief in the afterlife is a response variable and gender is an
explanatory variable.
Belief in Afterlife
Yes No or Undecided Total
Females n11 = 509 n12 = 116 n1+ = 625
Males n21 = 398 n22 = 104 n2+ = 502
Total n+1 = 907 n+2 = 220 n = 1127
Probability Structure for Contingency Tables: Example
• We therefore study the conditional distributions of belief
in the afterlife, given gender.
• For females, the proportion of “yes” responses was
509/625 = 0.81 and the proportion of “no” responses
was 116/625 = 0.19.
• The proportions (0.81, 0.19) form the sample conditional
distribution of belief in the afterlife.
• For males, the sample conditional distribution is (0.79,
0.21).
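The joint, marginal, and conditional sample distributions above can be reproduced in a few lines. The following is a minimal sketch (assuming numpy is available) using the counts from Table 1:

```python
import numpy as np

# Rows: Females, Males; Columns: Yes, No or Undecided (Table 1)
counts = np.array([[509, 116],
                   [398, 104]])
n = counts.sum()                               # 1127

p_joint = counts / n                           # {p_ij}, sample joint distribution
p_row = counts.sum(axis=1) / n                 # {p_i+}, row marginal proportions
p_col = counts.sum(axis=0) / n                 # {p_+j}, column marginal proportions

# Conditional distribution of belief (Y) given gender (X):
# divide each row by its row total.
cond = counts / counts.sum(axis=1, keepdims=True)

print(round(float(p_joint[0, 0]), 2))          # 0.45 (Females, Yes)
print(cond[0].round(2))                        # [0.81 0.19] for females
print(cond[1].round(2))                        # [0.79 0.21] for males
```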
Independence of Categorical Variables
• Statistical independence between two categorical variables, X
and Y, implies that knowing the value of X does not change
the probability distribution of Y, and vice versa.
– This means that the conditional probability of Y given X is the
same as the marginal probability of Y alone, at each level of X.
• Statistical independence is, equivalently, the property that all
joint probabilities equal the product of their marginal
probabilities,
πij = πi+π+j for i = 1, . . . , I and j = 1, . . . , J
• When X and Y are independent, each conditional distribution of Y is
identical to the marginal distribution of Y:
P(Y = j | X = i) = πij/πi+ = π+j for every row i.
Poisson and Multinomial Sampling
• A Poisson sampling model treats cell counts {Yij} as independent Poisson
random variables with parameters {μij}.
• The joint probability mass function for potential outcomes {nij} is
then the product of the individual Poisson probabilities P(Yij = nij) for
the IJ cells:
P({nij}) = Πi,j e^(−μij) μij^(nij) / nij!
• When the total sample size n is fixed but either the row or column
totals are not, a multinomial sampling model applies.
• The IJ cells are the possible outcomes. The probability mass
function of the cell counts has the multinomial form
p(n11, . . . , nIJ) = [n! / (n11! × · · · × nIJ!)] Πi,j πij^(nij)
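As a sketch of the two sampling models, the snippet below evaluates both joint pmfs for a small 2 × 2 table with scipy; the cell counts and Poisson means here are illustrative assumptions, not data from these notes:

```python
import numpy as np
from scipy.stats import poisson, multinomial

counts = np.array([[30, 20],
                   [10, 40]])          # assumed observed table
mu = np.array([[28.0, 22.0],
               [12.0, 38.0]])          # assumed Poisson means {mu_ij}

# Poisson sampling: independent cells, joint pmf = product of cell pmfs
poisson_joint = poisson.pmf(counts, mu).prod()

# Multinomial sampling: n fixed, probabilities pi_ij summing to 1 over the IJ cells
n = counts.sum()
pi = mu / mu.sum()
multinomial_joint = multinomial.pmf(counts.ravel(), n=n, p=pi.ravel())

print(poisson_joint, multinomial_joint)
```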
Comparing Proportions in Two-by-Two Tables
• Response variables having two categories are called binary
variables.
• For instance, belief in afterlife is binary when measured with
categories (yes, no).
• Many studies compare two groups on a binary response, Y.
– The data can be displayed in a 2 × 2 contingency table, in which the
rows are the two groups and the columns are the response levels of Y.
• This section presents measures for comparing groups on
binary responses.
Relative Risk
• For 2 × 2 tables, the relative risk is:
Relative risk = π1/π2
• It can be any nonnegative real number.
• A relative risk of 1.00 occurs when π1 = π2, that is, when the
response is independent of the group.
• Two groups with sample proportions p1 and p2 have a sample
relative risk of p1/p2.
Example:
• The following table is from a report on the relationship between
aspirin use and myocardial infarction (heart attacks) by the
Physicians’ Health Study Research Group at Harvard.
Myocardial Infarction
Yes No Total
Placebo 189 10,845 11,034
Aspirin 104 10,933 11,037
Relative Risk
From this table, the sample proportions p1 and p2 are:
• Of the n1 = 11,034 physicians taking placebo, 189 suffered myocardial
infarction (MI) during the study, a proportion of p1 = 189/11,034 = 0.0171.
Relative Risk
• Of the n2 = 11,037 physicians taking aspirin,
104 suffered MI, a proportion of p2 = 0.0094.
• Now, the sample relative risk is
p1/p2 = 0.0171/0.0094 = 1.82
– That is, the proportion of MI cases was 82% higher for the group
taking placebo.
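A minimal sketch of this computation, using the counts quoted above:

```python
# Physicians' Health Study counts quoted above
mi_placebo, n_placebo = 189, 11034
mi_aspirin, n_aspirin = 104, 11037

p1 = mi_placebo / n_placebo      # 0.0171
p2 = mi_aspirin / n_aspirin      # 0.0094
print(round(p1 / p2, 2))         # 1.82 -> 82% higher MI proportion on placebo
```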
Odds Ratio
• The odds ratio occurs as a parameter in the most important type of
model for categorical data.
• For a probability of success π, the odds of success is
odds = π / (1 − π)
• For instance, if π = 0.75, then the odds of success is
0.75/0.25 = 3.
• The odds are nonnegative, with value greater than 1.0
when a success is more likely than a failure.
Odds Ratio
• When odds = 4.0, a success is four times as likely as a failure.
• The probability of success is 0.8, the probability of failure is
0.2, and the odds equal 0.8/0.2 = 4.0.
• We then expect to observe four successes for every one
failure.
• When odds = 1/4, a failure is four times as likely as a success,
we then expect to observe one success for every four failures.
• The success probability itself is a function of the odds:
π = odds / (odds + 1)
• For instance, when odds = 4, then π = 4/(4 + 1) = 0.8.
Odds Ratio
• In 2 × 2 tables, within row 1 the odds of success are
odds1 = π1/(1 − π1), and within row 2 the odds of
success equal odds2 = π2/(1 − π2).
• The ratio of the odds from the two rows,
θ = odds1/odds2 = [π1/(1 − π1)] / [π2/(1 − π2)],
is the odds ratio.
• Whereas the relative risk is a ratio of two probabilities,
the odds ratio 𝜃 is a ratio of two odds.
Properties of Odds Ratio
• The odds ratio can be any nonnegative number.
• When X and Y are independent, π1 = π2, so odds1 = odds2 and
θ = odds1/odds2 = 1.
• The independence value θ = 1 is a baseline for comparison.
• When θ > 1, the odds of success are higher in row 1 than in row 2.
• For instance, when θ = 4, the odds of success in row 1 are four
times the odds of success in row 2.
• Thus, subjects in row 1 are more likely to have successes than are
subjects in row 2; that is, π1 > π2.
• When θ < 1, a success is less likely in row 1 than in row 2; that is, π1 < π2.
Properties of Odds Ratio
• Two values for θ represent the same strength of association, but in
opposite directions, when one value is the inverse of the other.
• When θ = 0.25, for example, the odds of success in row 1 are 0.25
times the odds of success in row 2; equivalently, the odds are
1/0.25 = 4.0 times as high in row 2 as in row 1.
• The odds ratio does not change value when the table orientation
reverses so that the rows become the columns and the columns
become the rows.
• The odds ratio can be defined using joint probabilities as
θ = (π11/π12) / (π21/π22) = (π11 π22) / (π12 π21)
Example
From the table above,
• For the physicians taking placebo, the estimated odds of MI
equal n11/n12 = 189/10,845 = 0.0174.
• Since 0.0174 = 1.74/100, the value 0.0174 means there were
1.74 “yes” outcomes for every 100 “no” outcomes.
• The estimated odds equal 104/10,933 = 0.0095 for those taking
aspirin, or 0.95 “yes” outcomes per every 100 “no” outcomes.
• The sample odds ratio equals 𝜃 = 0.0174/0.0095 = 1.832.
– This also equals the cross-product ratio (189 × 10,933)/(10,845 × 104).
– The estimated odds of MI for male physicians taking placebo equal
1.83 times the estimated odds for male physicians taking aspirin.
– The estimated odds were 83% higher for the placebo group.
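A minimal sketch verifying that the ratio of the two sample odds equals the cross-product ratio:

```python
n11, n12 = 189, 10845    # placebo: MI yes / no
n21, n22 = 104, 10933    # aspirin: MI yes / no

odds1 = n11 / n12                        # 0.0174
odds2 = n21 / n22                        # 0.0095
theta = (n11 * n22) / (n12 * n21)        # cross-product ratio
print(round(odds1 / odds2, 3), round(theta, 3))   # both 1.832
```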
Inference for Odds Ratios and Log Odds Ratios
• Unless the sample size is extremely large, the sampling distribution of the
odds ratio is highly skewed.
• Because of this skewness, statistical inference for the odds ratio uses an
alternative but equivalent measure – its natural logarithm, ln(θ).
• Under independence, θ = 1 and ln(θ) = ln(1) = 0.
• The sample log odds ratio, ln θ̂, has a less skewed, bell-shaped
sampling distribution.
• Its approximating normal distribution has mean ln θ and standard error
SE(ln θ̂) = √(1/n11 + 1/n12 + 1/n21 + 1/n22)
NB: The SE decreases as the cell counts increase.
• The test statistic is the Wald statistic: Z = ln(θ̂) / SE(ln θ̂)
Inference for Odds Ratios and Log Odds Ratios
• Because the sampling distribution is closer to normality for ln θ̂ than for θ̂,
it is better to construct confidence intervals for ln θ.
• Then transform back (that is, take the antilog using the exponential function,
discussed below) to form a confidence interval for θ.
• A large-sample confidence interval for ln θ is
ln θ̂ ± z(α/2) × SE(ln θ̂)
• Back-transforming to the odds ratio scale gives
CI for θ = ( e^(ln θ̂ − z(α/2)×SE), e^(ln θ̂ + z(α/2)×SE) )
Example:
• For the previous example, the natural log of θ̂ equals ln(1.832) = 0.605,
and the SE is
SE(ln θ̂) = √(1/189 + 1/10845 + 1/104 + 1/10933) = 0.123
Inference for Odds Ratios and Log Odds Ratios
• For the population, a 95% confidence interval for lnθ equals
0.605 ± 1.96(0.123), or (0.365, 0.846).
• Back-transforming to the odds ratio scale, the corresponding confidence
interval for θ is
(e^0.365, e^0.846) = (1.44, 2.33)
• This uses the fact that e^x = c is equivalent to ln(c) = x.
– For instance, e^0 = exp(0) = 1 corresponds to ln(1) = 0; similarly,
e^0.7 = exp(0.7) ≈ 2.0 corresponds to ln(2) ≈ 0.7.
• Since the 95% confidence interval (1.44, 2.33) for the odds ratio does not
contain 1.0, the true odds of MI seem different for the two groups.
• We estimate that the odds of MI are at least 44% higher for subjects
taking placebo than for subjects taking aspirin.
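A minimal sketch (assuming numpy) reproducing the Wald interval above via the log scale:

```python
import numpy as np

n11, n12, n21, n22 = 189, 10845, 104, 10933

log_theta = np.log((n11 * n22) / (n12 * n21))    # 0.605
se = np.sqrt(1/n11 + 1/n12 + 1/n21 + 1/n22)      # 0.123
z = 1.96                                         # z_{alpha/2} for 95%

lo, hi = log_theta - z * se, log_theta + z * se  # CI for ln(theta)
print(f"({np.exp(lo):.2f}, {np.exp(hi):.2f})")   # (1.44, 2.33)
```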
Relationship Between Odds Ratio and Relative Risk
Odds ratio = [p1/(1 − p1)] / [p2/(1 − p2)] = Relative risk × (1 − p2)/(1 − p1)
• When 𝑝1and 𝑝2 are both close to zero, the fraction in the last
term of this expression equals approximately 1.0.
– The odds ratio and relative risk then take similar values.
• For example: In the MI study above, the proportion of MI cases is
close to zero in both groups, so the sample odds ratio of 1.83 is
similar to the sample relative risk of 1.82.
Chi-squared Tests of Independence
• Chi-square tests are statistical methods used to determine whether
there is a significant association between categorical variables.
• They are applied in the context of contingency tables, where data
are organized into rows and columns representing the levels of two
categorical variables.
• The Chi-Square Test of Independence determines whether
two categorical variables are independent of each other.
• Null Hypothesis (H₀): The variables are independent
• Alternative Hypothesis (Ha): The variables are not
independent
Pearson Statistic and the Chi-Squared Distribution
• Pearson Chi-Square Test: Developed by Karl Pearson in 1900, it
evaluates whether observed frequencies align with theoretical
expectations for multinomial distributions.
• Consider the null hypothesis (Ho) that cell probabilities equal
certain fixed values {π𝑖𝑗}, for a sample of size n with cell counts
{𝑛𝑖𝑗},
– The values {μ𝑖𝑗 = nπ𝑖𝑗} are expected frequencies.
– They represent the values of the expectations {E(𝑛𝑖𝑗)} when Ho is true.
• The Pearson chi-squared statistic for testing Ho is
X² = Σi,j (nij − μij)²/μij
• This statistic takes its minimum value of zero when all nij = μij.
• For the test of independence, the expected frequencies are estimated by
μij = (ni+ × n+j)/n
where
– ni+ = total count for row i
– n+j = total count for column j
– n = total sample size
Pearson Statistic
• For a fixed sample size, greater differences {𝑛𝑖𝑗 − μ𝑖𝑗} produce
larger X2 values and stronger evidence against Ho.
• The Pearson X2 statistic follows a chi-square distribution
asymptotically as the sample size (n) becomes large.
• Decision Rule:
– Compare the computed χ2 to the critical value from the
chi-square distribution table with df degrees of freedom.
– Reject Ho​ if χ2 exceeds the critical value or if the p-value is
less than the significance level (α).
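As a sketch, scipy's chi2_contingency computes X², its p-value, the df, and the expected frequencies in one call; here it is applied to Table 1 (belief in afterlife by gender):

```python
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[509, 116],
                  [398, 104]])

# correction=False gives the plain Pearson statistic (no Yates correction)
x2, p_value, df, expected = chi2_contingency(table, correction=False)
print(round(float(x2), 2), df, round(float(p_value), 3))  # small X^2, df = 1 -> do not reject Ho
print(expected.round(1))                                  # mu_ij = n_i+ * n_+j / n
```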
Chi-square distribution
• It is a continuous probability distribution used in hypothesis
testing. It is a special case of the gamma distribution.
• Properties:
– It is skewed to the right.
– The shape depends on the degrees of freedom (df).
– As df increases, the distribution approaches normality.
• For a chi-square random variable X ~ χ²(k), the PDF is:
f(x) = x^(k/2 − 1) e^(−x/2) / (2^(k/2) Γ(k/2)), for x > 0
• Where:
– Γ(⋅): Gamma function,
– k: Degrees of freedom.
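A minimal sketch evaluating this PDF with scipy, illustrating the right skew that fades as the df grows:

```python
import numpy as np
from scipy.stats import chi2

x = np.linspace(0.5, 20, 5)
for k in (1, 5, 15):
    # density of a chi-square(k) variable at the chosen x values
    print(k, chi2.pdf(x, df=k).round(4))
```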
Chi-square distribution: Applications
• Goodness-of-Fit Test:
– Tests whether an observed frequency distribution matches an
expected distribution.
• Test of Independence:
– Evaluates whether two categorical variables are independent in a
contingency table.
• Test of Homogeneity:
– Determines if different populations share the same distribution of a
categorical variable.
• Confidence Intervals for Variance:
– For a normal population with variance σ², the chi-square distribution
is used to construct confidence intervals:
( (n − 1)s²/χ²(α/2), (n − 1)s²/χ²(1 − α/2) ), with n − 1 df
• Analysis of Variance (ANOVA):
– Variance components are analyzed using chi-square distributions.
Likelihood-Ratio Statistic
• An alternative to the Pearson χ2, the Likelihood-Ratio Statistic is
based on the log-likelihood.
• For two-way contingency tables with likelihood function based on
the multinomial distribution, the likelihood-ratio statistic is given by
G² = 2 Σi,j nij log(nij/μij)
– This statistic is called the likelihood-ratio chi-squared statistic.
• Like the Pearson statistic, G2 takes its minimum value of 0 when all
nij = μij , and larger values provide stronger evidence against Ho.
• Under Ho, both statistics follow a chi-squared distribution asymptotically,
and they often yield similar conclusions, especially for large sample sizes.
• When some expected frequencies are small (< 5), the chi-squared
approximation can be poor for both statistics (and tends to break down
sooner for G²); small-sample exact methods are then preferable.
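scipy exposes G² through the same routine: passing lambda_="log-likelihood" to chi2_contingency switches the power-divergence statistic from Pearson X² to the likelihood-ratio statistic. A minimal sketch on Table 1:

```python
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[509, 116],
                  [398, 104]])

g2, p_value, df, expected = chi2_contingency(
    table, correction=False, lambda_="log-likelihood")
print(round(float(g2), 3), df, round(float(p_value), 3))   # G^2 with df = 1
```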
Tests of Independence
• In two-way contingency tables with joint probabilities {πij} for
two response variables, the null hypothesis of statistical
independence is
H0: πij = πi+π+j for all i and j
• The marginal probabilities then determine the joint
probabilities.
• To test H0, we identify μij = nπij = nπi+π+j as the expected
frequency.
• Here, μij is the expected value of nij assuming independence.
• Usually, {πi+} and {π+j} are unknown, as is this expected value.
Tests of Independence
• To estimate the expected frequencies, substitute sample proportions for
the unknown marginal probabilities, giving
μ̂ij = n pi+ p+j = (ni+ × n+j)/n
• This is the row total for the cell multiplied by the column total for the cell,
divided by the overall sample size.
• The {𝜇ij} are called estimated expected frequencies.
• For testing independence in I × J contingency tables, the Pearson and
likelihood-ratio statistics equal
X² = Σi,j (nij − μ̂ij)²/μ̂ij and G² = 2 Σi,j nij log(nij/μ̂ij)
• Their large-sample chi-squared distributions have df = (I − 1)(J − 1).
Tests of Independence: Example
• Consider data on gender versus political party identification (Democrat,
Independent, or Republican), and test their independence.
• Table 2: Party identification by gender
Democrat Independent Republican Total
Females 762 327 468 1557
Males 484 239 477 1200
Total 1246 566 945 2757
Tests of Independence: Example
• Ho: political party identification and gender are independent.
• Using the likelihood-ratio test: G² = 2 Σ nij log(nij/μij)
• The expected frequencies are μij = (ni+ × n+j)/n:
μ11 = (n1+ × n+1)/n = (1557 × 1246)/2757 = 703.7
μ12 = (n1+ × n+2)/n = (1557 × 566)/2757 = 319.6
. . .
μ23 = (n2+ × n+3)/n = (1200 × 945)/2757 = 411.3
• Now, G² = 2[762 log(762/703.7) + · · · + 477 log(477/411.3)] = 30.0
• with (I − 1)(J − 1) = (2 − 1)(3 − 1) = 2 degrees of freedom.
• with (I-1)(J-1) =2 degree of freedom
Tests of Independence: Example
• Using the Pearson chi-square statistic:
X² = Σ (nij − μij)²/μij = (762 − 703.7)²/703.7 + · · · + (477 − 411.3)²/411.3 = 30.1
with the same 2 degrees of freedom.
• The tabulated critical value, X²0.05(2), is 5.99.
• Conclusion:
– Since both statistics exceed 5.99 (and the p-values are very small),
we reject Ho of independence.
– Therefore, from both results, we conclude that political party
identification and gender are associated.
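A minimal sketch reproducing both statistics for this example, using the cell counts recovered from the totals above:

```python
import numpy as np
from scipy.stats import chi2_contingency, chi2

table = np.array([[762, 327, 468],    # Females: Dem, Ind, Rep
                  [484, 239, 477]])   # Males:   Dem, Ind, Rep

x2, p, df, mu = chi2_contingency(table, correction=False)
g2, _, _, _ = chi2_contingency(table, correction=False,
                               lambda_="log-likelihood")
print(round(float(x2), 1), round(float(g2), 1), df)  # 30.1, 30.0, df = 2
print(round(float(chi2.ppf(0.95, df=2)), 2))         # 5.99 -> reject Ho at alpha = 0.05
```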
Testing for independence for ordinal data
• When the rows and/or the columns are ordinal, the chi-squared test of
independence using test statistic X2 or G2 ignores the ordering
information.
• Test statistics that use the ordinality by treating ordinal variables as
quantitative rather than qualitative (nominal scale) are usually more
appropriate and provide greater power.
Linear Trend Alternative to Independence
• When the variables are ordinal, a trend association is common.
• As the level of X increases, responses on Y tend to increase toward higher
levels, or responses on Y tend to decrease toward lower levels.
• Eg. A researcher examines the relationship between physical activity level
(low, moderate, high) and overall health rating (poor, fair, good, excellent).
– As physical activity level increases from low to high, the proportion of
responses shifts toward higher health ratings.
Testing for independence for ordinal data
• To detect a trend association, a simple analysis assigns scores
to categories and measures the degree of linear trend.
• The test statistic, which is sensitive to positive or negative
linear trends, utilizes correlation information in the data.
• Let u1 ≤ u2 ≤ · · · ≤ uI denote scores for the rows, and let
v1 ≤ v2 ≤ · · · ≤ vJ denote scores for the columns,
– These scores often reflect the ordinal ranking of the
categories (e.g., 1, 2, 3) but can be adjusted based on the
relative distances or importance of the categories
• Let ū = Σi ui ni+/n denote the weighted mean of the row scores, and
• v̄ = Σj vj n+j/n the weighted mean of the column scores
Testing for independence for ordinal data
• Then, the sample correlation between X and Y is given by:
r = Σi,j (ui − ū)(vj − v̄) nij / √[ Σi (ui − ū)² ni+ × Σj (vj − v̄)² n+j ] = Cov(X, Y)/(SX SY)
• Where
– ui, vj: scores for the i-th row and j-th column, respectively,
– ū, v̄: weighted means of the scores for rows and columns,
– nij: observed frequency in cell (i, j),
– ni+: row totals,
– n+j: column totals.
• Here, independence between the variables implies that its population value ρ
equals zero.
• For testing Ho: independence against the two-sided Ha: ρ ≠ 0, a test statistic is
M² = (n − 1)r², where M² ~ χ² with 1 df under Ho.
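A minimal sketch of this M² test for an arbitrary I × J table; the function implements the formulas above, and the 3 × 3 table fed to it is an assumed example:

```python
import numpy as np
from scipy.stats import chi2

def linear_trend_test(table, u, v):
    """M^2 = (n-1) r^2 test for a linear trend with row/column scores."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    ubar = u @ table.sum(axis=1) / n          # weighted mean of row scores
    vbar = v @ table.sum(axis=0) / n          # weighted mean of column scores
    du, dv = u - ubar, v - vbar
    cov = du @ table @ dv                     # sum_(i,j) (u_i-ubar)(v_j-vbar) n_ij
    r = cov / np.sqrt((du**2 @ table.sum(axis=1)) *
                      (dv**2 @ table.sum(axis=0)))
    m2 = (n - 1) * r**2
    return r, m2, chi2.sf(m2, df=1)           # M^2 ~ chi-square(1) under Ho

# Assumed example table with a visible positive trend
table = [[20, 10, 5],
         [10, 15, 10],
         [5, 10, 20]]
u = np.array([1.0, 2.0, 3.0])                 # row scores
v = np.array([1.0, 2.0, 3.0])                 # column scores
print(linear_trend_test(table, u, v))
```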
Exercise: Testing for independence for ordinal data
Test the association between
Item 1: “A working mother can establish just as warm and secure a
relationship with her children as a mother who does not work,” and
Item 2: “Working women should have paid maternity leave.”
Exercise: Testing for independence for ordinal data
• Using X² and G² with df = 12: G² = 47.576 (P < 0.001) and
X² = 44.961 (P < 0.001).
• There is a “linear trend” in these data, so we can describe this
relationship using a single statistic:
r = Cov(X, Y)/(SX SY)
• To compute r, we need scores for both the row (item 1) categories
and the column (item 2) categories. So, the score for categories:
𝑅𝑜𝑤𝑠: 𝑢1 = 1, 𝑢2 = 2, 𝑢3 = 3, 𝑢4 = 4
Columns: 𝑣1 = 1, 𝑣2 = 2, 𝑣3 = 3, 𝑣4 = 4, 𝑣5 = 5
• From this, r = 0.203 and M² = (884 − 1)(0.203)² = 36.26.
• With df = 1, the p-value for the observed M² is < 0.001.
Exact Inference for Small Samples
• Exact inference methods are used for small samples when large-
sample approximations (e.g., normal or chi-square) may not be
reliable.
• These methods rely on the exact sampling distribution of test statistics
rather than approximations.
• Fisher's Exact Test is a common method for small-sample exact
inference.
• This involves using exact distributions like the binomial or
hypergeometric distribution to compute the exact p-value instead of
approximations.
– Binomial Distribution: For binary response data (success/failure), inference
directly uses the binomial probabilities without relying on asymptotic
normality.
– Hypergeometric Distribution: Applies when sampling is without replacement,
typically from a finite population.
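A minimal sketch of Fisher's exact test with scipy; the 2 × 2 counts are an assumed small-sample example, and scipy computes the p-value from the hypergeometric distribution:

```python
from scipy.stats import fisher_exact

table = [[3, 1],
         [1, 3]]          # assumed small 2x2 table

odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(odds_ratio, round(p_value, 3))
```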
Association in Three-Way Tables
• Three-way tables involve three categorical variables. They help assess how
two variables are associated, considering the levels of a third variable.
• Analysis often examines conditional or marginal associations and uses
measures like conditional odds ratios.
• Types of Associations:
– Marginal Independence: Two variables are marginally independent if
they are independent when the third variable is ignored.
• Example: X and Y are marginally independent if the marginal table
of X and Y (ignoring Z) shows no association.
– Conditional Independence: Two variables are conditionally
independent given the third variable if they are independent within
each level of that variable. Denoted X ⊥ Y | Z.
– Homogeneous Association: The association between two variables 𝑋
and 𝑌 is homogeneous across all levels of 𝑍 if the odds ratios for 𝑋 and
𝑌 are the same at each level of 𝑍.
Measures of Association
• Conditional Odds Ratio: For two binary variables X and Y at a
specific level Z = k, the odds ratio is
OR(XY | Z = k) = (n11k × n22k)/(n12k × n21k)
• where nijk are the observed counts in the three-way table.
• Homogeneous Odds Ratio: If the odds ratio 𝑂𝑅𝑋𝑌∣𝑍=𝑘 is the
same for all 𝑘, then the association between 𝑋 and 𝑌 is
homogeneous across 𝑍.
• Conditional Independence Model: Assumes that 𝑋 and 𝑌 are
independent given 𝑍.
– This can be tested using likelihood ratio statistics or loglinear
models.
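A minimal sketch computing conditional odds ratios for a 2 × 2 × K table; roughly equal values across k suggest homogeneous association, and values near 1 at every k suggest conditional independence. The counts below are assumed for illustration:

```python
import numpy as np

# shape (K, 2, 2): one 2x2 partial table of X by Y per level k of Z
tables = np.array([[[10, 20],
                    [15, 35]],
                   [[30, 25],
                    [20, 18]]])

for k, t in enumerate(tables):
    # OR(XY | Z=k) = n11k * n22k / (n12k * n21k)
    or_k = (t[0, 0] * t[1, 1]) / (t[0, 1] * t[1, 0])
    print(f"OR(XY | Z={k}) = {or_k:.2f}")
```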
Testing for Association
• Likelihood Ratio Test: Compares a simpler (nested) model with a
more complex model.
• Test statistic:
G² = 2 Σi,j,k nijk log(nijk/μ̂ijk), where
– nijk: observed counts,
– μ̂ijk: expected counts under the simpler model.
• Interpretation: A significant 𝐺2 indicates the simpler model does
not fit well, suggesting the inclusion of additional interactions.
• Example: Suppose we study smoking habits (𝑋), exercise levels (𝑌), and
age groups (𝑍) in a population. A three-way table is formed with counts
for each combination.
– To test whether smoking (𝑋) and exercise (𝑌) are independent within age
groups (𝑍)
Chi-Square Test of Homogeneity
• This test compares whether multiple populations have the same
distribution across categories.
• Hypotheses:
– 𝐻0: The distributions across groups are the same.
– 𝐻𝑎: The distributions across groups differ.
• Test Statistic:
X² = Σi,j (nij − μij)²/μij, with μij = (ni+ × n+j)/n
• where nij is the observed count and μij is the expected count under Ho.
• The degrees of freedom for the test are df = (I − 1)(J − 1).
Chi-Square Test of Homogeneity: Example
• A researcher investigates whether the preference for different
types of beverages (e.g., coffee, tea, soda) is the same across
three age groups (young, middle-aged, senior).
• Data Table:
• For example, for the cell (Young, Coffee):
μ11 = (n1+ × n+1)/n = (100 × 150)/300 = 50
• df = (3 − 1)(3 − 1) = 4.
Coffee Tea Soda Total
Young 50 30 20 100
Middle-aged 60 20 20 100
Senior 40 50 10 100
Total 150 100 50 300
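Computationally the homogeneity test is identical to the independence test; a minimal sketch for the beverage table:

```python
import numpy as np
from scipy.stats import chi2_contingency, chi2

table = np.array([[50, 30, 20],    # Young
                  [60, 20, 20],    # Middle-aged
                  [40, 50, 10]])   # Senior

x2, p, df, mu = chi2_contingency(table, correction=False)
print(mu[0, 0])                                      # 50.0, matching the hand computation
print(round(float(x2), 1), df, round(float(p), 4))   # X^2 ~ 22.0, df = 4, p < 0.001
print(round(float(chi2.ppf(0.95, df=4)), 2))         # critical value 9.49 -> reject Ho
```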
Chi-Square Test of Goodness-of-Fit
• This test assesses if observed categorical data fit a specific
theoretical distribution.
• Hypotheses:
– 𝐻0: The observed distribution follows the expected distribution.
– 𝐻𝑎: The observed distribution does not follow the expected distribution.
• Test Statistic: X² = Σi (ni − μi)²/μi
– where ni and μi are the observed and expected counts, respectively
• The expected frequency for each category, μi, is calculated as:
μi = n · pi
• Where,
– 𝑛: Total number of observations,
– 𝑝𝑖: The hypothesized proportion for category 𝑖.
Chi-Square Test of Goodness-of-Fit
• The df for the test is: df = 𝑘−1, where k is number of categories
• If parameters (e.g., mean or variance) are estimated from the data,
the degrees of freedom reduce further:
• df = k−1−(number of parameters estimated).
• Example: A die is rolled 60 times, and the observed frequencies for
the six faces are recorded. The hypothesized distribution is that the
die is fair, so each face should appear with equal probability
(𝑝𝑖=1/6).
Face 1 2 3 4 5 6
Observed (ni) 10 8 12 14 9 7
Chi-Square Test of Goodness-of-Fit
• For a fair die, the expected frequency for each face is:
μi = n · pi = 60 × (1/6) = 10
• Compute:
X² = Σ (ni − μi)²/μi = 3.4
df = 6 − 1 = 5
• Using a chi-square distribution table, compare X² = 3.4 with df = 5. At
α = 0.05, the critical value is approximately 11.07. Since 3.4 < 11.07, we fail
to reject Ho.
• Conclusion: The large p-value indicates that the observed frequencies are
consistent with the expected frequencies under the assumption that the
die is fair. There is no evidence to reject Ho.
• Key Notes: Use the test only when expected frequencies are sufficiently large
(usually μi ≥ 5). For small expected frequencies, use an exact test or combine
categories.
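A minimal sketch of this test with scipy.stats.chisquare, which defaults to equal expected counts:

```python
from scipy.stats import chisquare

observed = [10, 8, 12, 14, 9, 7]        # 60 rolls of the die
stat, p_value = chisquare(observed)     # expected = 10 per face, df = 5
print(stat, round(float(p_value), 3))   # X^2 = 3.4, p ~ 0.64 -> fail to reject Ho
```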
Summary
Test | Purpose | Formula | Key Assumption
Exact Inference | Analyze small samples without large-sample approximations | Depends on the specific test (e.g., hypergeometric for Fisher’s Test) | Assumes exact probabilities of outcomes (discrete distributions)
Three-Way Association | Examine relationships in three-way contingency tables | Loglinear model: log(μijk) = …; Likelihood Ratio Test | Assumes multinomial or Poisson sampling
Chi-Square Homogeneity | Compare distributions across groups | X² = Σ (nij − μij)²/μij | Assumes sufficiently large sample size for expected frequencies
Chi-Square Goodness-of-Fit | Compare observed vs. expected distribution | X² = Σ (ni − μi)²/μi | Assumes categories are mutually exclusive and expected frequencies are sufficiently large (usually > 5)
The End
Thank You!
Introduction to Categorical Data Analysis: Contingency Table

  • 1.
    Categorical Data Analysis(CDA) LectureNote Chapter One Contingency Tables By Ahmed Hasan Assistant Professor of Statistics College of Natural and Computational Science, Department of Statistics Madd Walabu University Robe, Bale,Ethiopia 1 Ahmed Hasan(ahmed.hasan@mwu.edu.et) 11/24/2024
  • 2.
    Outline • Contingency Tables •Probability structure for contingency tables – Joint, marginal, and conditional probabilities – Independence of Categorical Variables – Poisson and multinomial sampling • Comparing proportions in two- by-two sampling • The odds ratio – Inference for odds ratio – Odds ratio and relative risk • Chi-square tests of independence – Pearson statistics and the chi-square distribution – Likelihood-ratio statistic – Tests of independence • Testing for independence for ordinal data • Exact inference for small samples • Association in three-way tables • Chi-square test of homogeneity • Chi-square test of goodness-of- fit 11/24/2024 Ahmed Hasan(ahmed.hasan@mwu.edu.et) 2
  • 3.
    Contingency Tables • Contingencytable data (also called a cross-tabulation or crosstab) is the data arranged in table form having I rows for categories of X and J columns for categories of Y with the cells that contain frequency counts of outcome. • The table displays the frequency (or count) of different combinations of the variables' categories. • It is a key tool in categorical data analysis and helps determine whether and how two or more categorical variables are related. • A two-way table with I rows and J columns is called an I × J (read I– by–J ) table. • One that cross classifies three variables is called a three-way contingency table, and so forth. 3 Ahmed Hasan(ahmed.hasan@mwu.edu.et) 11/24/2024
  • 4.
    Key Components ofa Contingency Table • Rows and Columns: – Each row represents a category or a level of one variable (often called the row variable). – Each column represents a category or a level of another variable (often called the column variable). • Cells: – Each cell in the table represents the count or frequency of occurrences for the specific combination of the row and column variables. • Marginal Totals: – The sums of the rows (the row totals) and the sums of the columns (the column totals) are displayed at the margins of the table. – These totals are important for understanding the distribution of data across the variables. • Grand Total: – The overall total number of observations in the table, which is the sum of all the row totals or column totals. 11/24/2024 Ahmed Hasan(ahmed.hasan@mwu.edu.et) 4
  • 5.
    Contingency Tables: Example •The following classifies the data according to gender and opinion about afterlife. For the females, 509 said they believed in an afterlife and 116 said they did not or were undecided. – Does an association exist between gender and belief in an afterlife? – Is one gender more likely than the other to believe in an afterlife, or – is belief in an afterlife independent of gender? • Table1: Cross Classification of Belief in Afterlife by Gender • The marginal totals are the sums across the rows and columns (e.g., 625 total Female observations, 907 Yes for belief in after life, and 1127 overall). 5 Ahmed Hasan(ahmed.hasan@mwu.edu.et) 11/24/2024 Belief in Afterlife Yes No or Undecided Total Females 509 116 625 Males 398 104 502 Total 907 220 1127
  • 6.
    Statistical Analysis withContingency Tables • Chi-Square Test of Independence: – A common statistical test used with contingency tables is the Chi- Square Test to determine if two categorical variables are independent. – The test compares the observed frequencies in the table to the frequencies that would be expected if the variables were independent. • Measures of Association: – For 2x2 tables, measures such as odds ratio or relative risk may be calculated to assess the strength and direction of the relationship between the variables. – For larger tables, Cramér's V or the phi coefficient can measure the strength of association between variables. • Fisher’s Exact Test: – When the sample size is small, the Fisher’s Exact Test is used instead of the Chi-Square test to evaluate the significance of the association between two categorical variables. 11/24/2024 Ahmed Hasan(ahmed.hasan@mwu.edu.et) 6
  • 7.
    Types of ContingencyTables • 2x2 Table: Involves two variables, each with two categories (e.g., Yes/No or Success/Failure). • 2x3 Table: Involves two variables, one with two categories and the other with three. • NxM Table: Involves more than two categories for each variable, with larger tables being common in multi-way contingency analysis. 11/24/2024 Ahmed Hasan(ahmed.hasan@mwu.edu.et) 7
  • 8.
    Probability Structure forContingency Tables Joint, Marginal, and Conditional Probabilities • Probabilities for contingency tables can be of three types – joint, marginal, or conditional. • Suppose first that a randomly chosen subject from the population of interest is classified on X and Y . – Let πij = P(X = i, Y = j) denote the probability that (X, Y) falls in the cell in row i and column j. • The probabilities {πij} form the joint distribution of X and Y . – They satisfy πij 𝑖,𝑗 = 1. • The marginal distributions are the row and column totals of the joint probabilities. – We denote these by {πi+} for the row variable and {π+j} for the column variable, where the subscript “+” denotes the sum over the index it replaces. • For 2 × 2 tables π1+ = π11 + π12 and π+1 = π11 + π21 8 Ahmed Hasan(ahmed.hasan@mwu.edu.et) 11/24/2024
  • 9.
    Probability Structure forContingency Tables • Let us use Roman p in place of Greek π. • For example: {pij} are cell proportions in a sample joint distribution. We denote the cell counts by {nij}. – The marginal frequencies are the row totals {ni+} and the column totals {n+j}, and n = nij 𝑖,𝑗 denotes the total sample size. • The sample cell proportions relate to the cell counts by pij = nij/n • In many contingency tables, one variable (say, the column variable, Y) is a response variable and the other (the row variable, X) is an explanatory variable. • Then, it is informative to construct a separate probability distribution for Y at each level of X. • Such a distribution consists of conditional probabilities for Y , given the level of X, which is called a conditional distribution. 9 Ahmed Hasan(ahmed.hasan@mwu.edu.et) 11/24/2024
  • 10.
    Probability Structure forContingency Tables: Example 10 • The Table1 above classified n = 1127 respondents to a General Social Survey by their gender and by their belief in an afterlife. • The below table illustrates the cell count notation for these data. – For example, n11 = 509, and the related sample joint proportion is p11 = 509/1127 = 0.45. • Here, belief in the afterlife is a response variable and gender is an explanatory variable. Ahmed Hasan(ahmed.hasan@mwu.edu.et) 11/24/2024 Belief in Afterlife Yes No or Undecided Total Females n11 = 509 n12 = 116 n1+ = 625 Males n21 = 398 n22 = 104 n2+ = 502 Total n+1 = 907 n+2 = 220 n = 1127
  • 11.
    Probability Structure forContingency Tables: Example • We therefore study the conditional distributions of belief in the afterlife, given gender. • For females, the proportion of “yes” responses was 509/625 = 0.81 and the proportion of “no” responses was 116/625 = 0.19. • The proportions (0.81, 0.19) form the sample conditional distribution of belief in the afterlife. • For males, the sample conditional distribution is (0.79, 0.21). 11 Ahmed Hasan(ahmed.hasan@mwu.edu.et) 11/24/2024
  • 12.
    Independence of CategoricalVariables 12 • Statistical independence between two categorical variables, X and Y, implies that knowing the value of X does not change the probability distribution of Y, and vice versa. – This means that the conditional probability of Y given X is the same as the marginal probability of Y alone, at each level of X. • Statistical independence is, equivalently, the property that all joint probabilities equal the product of their marginal probabilities, πij = πi+π+j for i = 1, . . . , I and j = 1, . . . , J • When X and Y are independent, Ahmed Hasan(ahmed.hasan@mwu.edu.et) 11/24/2024
  • 13.
    Poisson and MultinomialSampling • A Poisson sampling model treats cell counts {Yij} as independent Poisson random variables with parameters {μij}. • The joint probability mass function for potential outcomes {nij} is then the product of the individual Poisson probabilities P(Yij = nij) for the IJ cells, or • When the total sample size n is fixed but either the row or column totals are not, a multinomial sampling model applies. • The IJ cells are the possible outcomes. The probability mass function of the cell counts has the multinomial form 11/24/2024 Ahmed Hasan(ahmed.hasan@mwu.edu.et) 13
  • 14.
    Comparing Proportions InTwo-by-two Tables 14 • Response variables having two categories are called binary variables. • For instance, belief in afterlife is binary when measured with categories (yes, no). • Many studies compare two groups on a binary response, Y. – The data can be displayed in a 2 × 2 contingency table, in which the rows are the two groups and the columns are the response levels of Y. • This section presents measures for comparing groups on binary responses. Ahmed Hasan(ahmed.hasan@mwu.edu.et) 11/24/2024
  • 15.
    Relative Risk 15 • For2 × 2 tables, the relative risk is: Relative risk = π1 π2 • It can be any nonnegative real number. • A relative risk of 1.00 occurs when π1 = π2, that is, when the response is independent of the group. • Two groups with sample proportions p1 and p2 have a sample relative risk of p1/p2. Example: • The following table is from a report on the relationship between aspirin use and myocardial infarction (heart attacks) by the Physicians’ Health Study Research Group at Harvard Ahmed Hasan(ahmed.hasan@mwu.edu.et) 11/24/2024
  • 16.
    Relative Risk 16 From this,the sample proportion p1 and p2 can be: • Of the n1 = 11,034 physicians taking placebo, 189 suffered myocardial infarction (MI) during the study, a proportion of p1 = 189/11,034 = 0.0171. Ahmed Hasan(ahmed.hasan@mwu.edu.et) 11/24/2024
  • 17.
    Relative Risk 17 • Ofthe n2 = 11,037 physicians taking aspirin, 104 suffered MI, a proportion of p2 = 0.0094. • Now, the sample relative risk is p1 p2 = 0.0171 0.0094 = 1.82. – Which means: proportion of MI cases was 82% higher for the group taking placebo. Ahmed Hasan(ahmed.hasan@mwu.edu.et) 11/24/2024
  • 18.
    Odds Ratio 18 • Itoccurs as a parameter in the most important type of model for categorical data. • For a probability of success π, the odds of success is 𝑜𝑑𝑑𝑠 = 𝜋 1 − 𝜋 • For instance, if π = 0.75, then the odds of success is 0.75/0.25 = 3. • The odds are nonnegative, with value greater than 1.0 when a success is more likely than a failure. Ahmed Hasan(ahmed.hasan@mwu.edu.et) 11/24/2024
  • 19.
    Odds Ratio • Whenodds = 4.0, a success is four times as likely as a failure. • The probability of success is 0.8, the probability of failure is 0.2, and the odds equal 0.8/0.2 = 4.0. • We then expect to observe four successes for every one failure. • When odds = 1/4, a failure is four times as likely as a success, we then expect to observe one success for every four failures. • The success probability itself is the function of the odds, 𝜋 = 𝑜𝑑𝑑𝑠 𝑜𝑑𝑑𝑠 + 1 • For instance, when odds = 4, then π = 4/(4 + 1) = 0.8. 19 Ahmed Hasan(ahmed.hasan@mwu.edu.et) 11/24/2024
  • 20.
    Odds Ratio • In2 × 2 tables, within row 1 the odds of success are odds1 = π1/(1 − π1), and within row 2 the odds of success equal odds2 = π2/(1 − π2). • The ratio of the odds from the two rows, 𝜃 = 𝑜𝑑𝑑𝑠1 𝑜𝑑𝑑𝑠2 = π1/(1 − π1) π2/(1 − π2) • This is the odds ratio. • Whereas the relative risk is a ratio of two probabilities, the odds ratio 𝜃 is a ratio of two odds. 11/24/2024 Ahmed Hasan(ahmed.hasan@mwu.edu.et) 20
  • 21.
    Properties of OddsRatio • The odds ratio can be any nonnegative number. • When X and Y are independent, π1 = π2, so odds1 = odds2 and θ = odds1/odds2 = 1. • The independence value θ = 1 is a baseline for comparison. • When θ > 1, the odds of success are higher in row 1 than in row 2. • For instance, when θ = 4, the odds of success in row 1 are four times the odds of success in row 2. • Thus, subjects in row 1 are more likely to have successes than are subjects in row 2; that is, π1 > π2. • When θ < 1, a success is less likely in row 1 than in row 2; that is, π1 < π2. 21 Ahmed Hasan(ahmed.hasan@mwu.edu.et) 11/24/2024
  • 22.
    Properties of OddsRatio • Two values for θ represent the same strength of association, but in opposite directions, when one value is the inverse of the other. • When θ = 0.25, for example, the odds of success in row 1 are 0.25 times the odds of success in row 2, or equivalently 1/0.25 = 4.0 times as high in row2 as in row1. • The odds ratio does not change value when the table orientation reverses so that the rows become the columns and the columns become the rows. • The odds ratio can be defined using joint probabilities as 𝜃 = π11/π12 π21/π22 = π11π22 π12π21 22 Ahmed Hasan(ahmed.hasan@mwu.edu.et) 11/24/2024
  • 23.
    Example From the tableabove, • For the physicians taking placebo, the estimated odds of MI equal n11/n12 = 189/10,845 = 0.0174. • Since 0.0174 = 1.74/100, the value 0.0174 means there were 1.74 “yes” outcomes for every 100 “no” outcomes. • The estimated odds equal 104/10,933 = 0.0095 for those taking aspirin, or 0.95 “yes” outcomes per every 100 “no” outcomes. • The sample odds ratio equals 𝜃 = 0.0174/0.0095 = 1.832. – This also equals the cross-product ratio (189 × 10, 933)/(10,845 × 104). – The estimated odds of MI for male physicians taking placebo equal 1.83 times the estimated odds for male physicians taking aspirin. – The estimated odds were 83% higher for the placebo group. 23 Ahmed Hasan(ahmed.hasan@mwu.edu.et) 11/24/2024
  • 24.
    Inference for OddsRatios and Log Odds Ratios • Unless the sample size is extremely large, the sampling distribution of the odds ratio is highly skewed. • Because of this skewness, statistical inference for the odds ratio uses an alternative but equivalent measure – its natural logarithm, ln(θ). • For independency, If θ=1, ln(θ) = ln(1) = 0, meaning the two variables are independent. • The sample ln odds ratio, ln𝜃, has a less skewed sampling distribution that is bell-shaped. • Its approximating normal distribution has a mean of ln𝜃 and a standard error of 𝑆𝐸 ln𝜃 = 1 𝑛11 + 1 𝑛12 + 1 𝑛21 + 1 𝑛22 NB: The SE decreases as the cell counts increase. • The test Statistic is Wald Test: Z = ln(𝜃) 𝑆𝐸(ln𝜃) 24 Ahmed Hasan(ahmed.hasan@mwu.edu.et) 11/25/2024
  • 25.
    Inference for OddsRatios and Log Odds Ratios • Because the sampling distribution is closer to normality for ln𝜃 than 𝜃, it is better to construct confidence intervals for lnθ. • Then transform back (that is, take antilog, using the exponential function, discussed below) to form a confidence interval for θ. • A large-sample confidence interval for lnθ is ln𝜃 ± 𝑍α/2 .SE(ln𝜃) • Back-transform to odds ratio scale CI for θ = (𝑒 ln𝜃 − 𝑍 α/2 .SE , 𝑒 ln𝜃 + 𝑍 α/2 .SE ) Example: • For the previous example, the natural log of 𝜃 equals ln(1.832) = 0.605, and the SE becomes; 𝑆𝐸 = 1 189 + 1 10845 + 1 104 + 1 10933 = 0.123 25 Ahmed Hasan(ahmed.hasan@mwu.edu.et) 11/25/2024
  • 26.
    Inference for OddsRatios and Log Odds Ratios • For the population, a 95% confidence interval for lnθ equals 0.605 ± 1.96(0.123), or (0.365, 0.846). • Back-transform to odds ratio scale: The corresponding confidence interval for θ is (e0.365, e0.846) = (1.44, 2.33) • This is from the equality of ex = c is equivalent to ln(c) = x – For instance, e0 = exp(0) = 1 corresponds to ln(1) = 0; similarly, e0.7 = exp(0.7) = 2.0 corresponds to ln(2) = 0.7] • Since the 95% confidence interval (1.44, 2.33) for the odds ratio does not contain 1.0, the true odds of MI seem different for the two groups. • We estimate that the odds of MI are at least 44% higher for subjects taking placebo than for subjects taking aspirin. 26 Ahmed Hasan(ahmed.hasan@mwu.edu.et) 11/25/2024
  • 27.
    Relationship Between OddsRatio and Relative Risk Odds ratio = 𝑝1/(1 −𝑝1) 𝑝2/(1 − 𝑝2) = Relative risk × ( 1 −𝑝2 1 −𝑝1 ) • When 𝑝1and 𝑝2 are both close to zero, the fraction in the last term of this expression equals approximately 1.0. – The odds ratio and relative risk then take similar values. • For Example: From the previous example MI, proportion of MI cases is close to zero, and then sample odds ratio of 1.83 is similar to the sample relative risk of 1.82. 27 Ahmed Hasan(ahmed.hasan@mwu.edu.et) 11/24/2024
  • 28.
    Chi-squared Tests ofIndependence • The chi-square tests are statistical methods used to determine whether there is a significant association between categorical variables. • It is applied in the context of contingency tables, where data is organized into rows and columns representing different levels of two categorical variables. • The Chi-Square Test of Independence determines whether two categorical variables are independent of each other. • Null Hypothesis (H₀): The variables are independent • Alternative Hypothesis (Ha): The variables are not independent 28 Ahmed Hasan(ahmed.hasan@mwu.edu.et) 11/25/2024
  • 29.
    Pearson Statistic andthe Chi-Squared Distribution • Pearson Chi-Square Test: Developed by Karl Pearson in 1900, it evaluates whether observed frequencies align with theoretical expectations for multinomial distributions. • Consider the null hypothesis (Ho) that cell probabilities equal certain fixed values {π𝑖𝑗}, for a sample of size n with cell counts {𝑛𝑖𝑗}, – The values {μ𝑖𝑗 = nπ𝑖𝑗} are expected frequencies. – They represent the values of the expectations {E(𝑛𝑖𝑗)} when Ho is true. • The Pearson chi-squared statistic for testing H0 is 𝑋2 = (𝑛𝑖𝑗 − μ𝑖𝑗)2 μ𝑖𝑗 , where, • This statistic takes its minimum value of zero when all 𝑛𝑖𝑗 = μ𝑖𝑗. 29 Ahmed Hasan(ahmed.hasan@mwu.edu.et) 11/25/2024 µ𝑖𝑗 = 𝑛𝑖+ ∗ 𝑛+𝑗 𝑛 • ni+​ = total count for row I • n+j​ = total count for column j • n = total sample size
  • 30.
    Pearson Statistic • Fora fixed sample size, greater differences {𝑛𝑖𝑗 − μ𝑖𝑗} produce larger X2 values and stronger evidence against Ho. • The Pearson X2 statistic follows a chi-square distribution asymptotically as the sample size (n) becomes large. • Decision Rule: – Compare the computed χ2 to the critical value from the chi-square distribution table with df degrees of freedom. – Reject Ho​ if χ2 exceeds the critical value or if the p-value is less than the significance level (α). 11/25/2024 Ahmed Hasan(ahmed.hasan@mwu.edu.et) 30
  • 31.
    Chi-square distribution • Itis a continuous probability distribution used in hypothesis testing. It is a special case of the gamma distribution. • Properties: – It is skewed to the right. – The shape depends on the degrees of freedom (df). – As df increases, the distribution approaches normality. • For a chi-square random variable X∼χ2(k), the PDF is: • Where: – Γ(⋅): Gamma function, – k: Degrees of freedom. 11/25/2024 Ahmed Hasan(ahmed.hasan@mwu.edu.et) 31
  • 32.
    Chi-square distribution: Applications •Goodness-of-Fit Test: – Tests whether an observed frequency distribution matches an expected distribution. • Test of Independence: – Evaluates whether two categorical variables are independent in a contingency table. • Test of Homogeneity: – Determines if different populations share the same distribution of a categorical variable. • Confidence Intervals for Variance: – For a normal population with variance ς2, the chi-square distribution is used to construct confidence intervals: • Analysis of Variance (ANOVA): – Variance components are analyzed using chi-square distributions. 11/25/2024 Ahmed Hasan(ahmed.hasan@mwu.edu.et) 32
  • 33.
    Likelihood-Ratio Statistic • Analternative to the Pearson χ2, the Likelihood-Ratio Statistic is based on the log-likelihood. • For two-way contingency tables with likelihood function based on the multinomial distribution, the likelihood-ratio statistic is given by 𝐺2 = 2 𝑛𝑖𝑗𝑙𝑜𝑔 𝑛𝑖𝑗 μij – This statistic is called the likelihood-ratio chi-squared statistic. • Like the Pearson statistic, G2 takes its minimum value of 0 when all nij = μij , and larger values provide stronger evidence against Ho. • Under Ho, the two statistics follows chi-squared distribution, and they often yield similar conclusions, especially for large sample sizes. • Preferred when some expected frequencies are small (<5) 33 Ahmed Hasan(ahmed.hasan@mwu.edu.et) 11/25/2024
  • 34.
    Tests of Independence •In two-way contingency tables with joint probabilities {πij} for two response variables, the null hypothesis of statistical independence is H0: πij = πi+π+j for all i and j • The marginal probabilities then determine the joint probabilities. • To test H0, we identify μij = nπij = nπi+π+j as the expected frequency. • Here, μij is the expected value of nij assuming independence. • Usually, {πi+} and {π+j} are unknown, as is this expected value. 34 Ahmed Hasan(ahmed.hasan@mwu.edu.et) 11/24/2024
  • 35.
    Tests of Independence •To estimate the expected frequencies, substitute sample proportions for the unknown marginal probabilities, giving • This is the row total for the cell multiplied by the column total for the cell, divided by the overall sample size. • The {𝜇ij} are called estimated expected frequencies. • For testing independence in I×J contingency tables, the Pearson and likelihood ratio statistics equal • Their large-sample 𝑋2 distributions have df =(I − 1)(J − 1). 35 Ahmed Hasan(ahmed.hasan@mwu.edu.et) 11/24/2024
  • 36.
    Tests of Independence:Example • Gender Vs Political party identification (Democratic, Independent or Republican) data, and test their independency. 36 Ahmed Hasan(ahmed.hasan@mwu.edu.et) 11/25/2024
  • 37.
    Tests of Independence:Example • Ho: political party identification and gender are Independent Using likelihood test: G2 = 2 nij 𝒍𝒐𝒈 nij µij • The expected frequency, µij = 𝑛𝑖+𝑛+𝑗 𝑛 µ11 = 𝑛1+𝑛+1 𝑛 = µ11 = 1246∗1557 2757 = 703. µ12 = 𝑛1+𝑛+2 𝑛 = µ11 = 1557∗566 2757 = 319.6 …….. µ23 = 𝑛2+𝑛+3 𝑛 = µ23 = 1200∗945 2757 = 411.3 • Now, G2 = 2[762 log 762 703.7 +…+477 log 477 411.3 ] = 30 • with (I-1)(J-1) =2 degree of freedom 37 Ahmed Hasan(ahmed.hasan@mwu.edu.et) 11/25/2024
  • 38.
    Tests of Independence:Example • Using Chi-square statistics: 𝑋2 = (𝑛𝑖𝑗−µ𝑖𝑗)2 µ𝑖𝑗 = (762−703.7)2 703.7 + ⋯ + (477−411.3)2 411.3 𝑋2 = 30.1 with the same degree of freedom 2 • And then the tabulated value, X2 0.05(2) is 5.99 • Conclusion: – This result rejects the independency evidence. – Therefore, from both results, we can conclude that political party identification and gender are associated. 38 Ahmed Hasan(ahmed.hasan@mwu.edu.et) 11/25/2024
  • 39.
    Testing for independencefor ordinal data • When the rows and/or the columns are ordinal, the chi-squared test of independence using test statistic X2 or G2 ignores the ordering information. • Test statistics that use the ordinality by treating ordinal variables as quantitative rather than qualitative (nominal scale) are usually more appropriate and provide greater power. Linear Trend Alternative to Independence • When the variables are ordinal, a trend association is common. • As the level of X increases, responses on Y tend to increase toward higher levels, or responses on Y tend to decrease toward lower levels. • Eg. A researcher examines the relationship between physical activity level (low, moderate, high) and overall health rating (poor, fair, good, excellent). – As physical activity level increases from low to high, the proportion of responses shifts toward higher health ratings. 39 Ahmed Hasan(ahmed.hasan@mwu.edu.et) 11/25/2024
  • 40.
    Testing for independencefor ordinal data • To detect a trend association, a simple analysis assigns scores to categories and measures the degree of linear trend. • The test statistic, which is sensitive to positive or negative linear trends, utilizes correlation information in the data. • Let u1 ≤ u2 ≤ · · ·≤ ul denote scores for the rows, and let v1 ≤ v2 ≤ · · · ≤ vj denote scores for the columns, – These scores often reflect the ordinal ranking of the categories (e.g., 1, 2, 3) but can be adjusted based on the relative distances or importance of the categories • Let 𝒖 = 𝒖𝒊𝒏𝒊+ 𝒊 denote the weighted mean of the row scores, and • 𝒗 = 𝒗𝒋𝒏+𝒋 𝒋 weighted mean of the column scores 40 Ahmed Hasan(ahmed.hasan@mwu.edu.et) 11/24/2024
  • 41.
    Testing for independencefor ordinal data • Then, the correlation between X and Y can be given by: 𝑟 = (𝒖𝒊−𝒖)(𝒗𝒋−𝒗)𝒏𝒊𝒋 𝒊,𝒋 (𝒖𝒊−𝒖)𝟐 𝒏𝒊+ 𝒊 (𝒗𝒋−𝒗)𝟐 𝒏+𝒋 𝒋 = 𝐶𝑜𝑣(𝑋,𝑌) 𝑆𝑋𝑆𝑌 • Where – 𝒖𝒊​, 𝒗𝒋: Scores for the i-th row and j-th column, respectively, – 𝒖, 𝒗: Weighted means of the scores for rows and columns, – 𝒏𝒊𝒋​: Observed frequency in cell (i,j), – 𝒏𝒊+​: Row totals, – 𝒏+𝒋: Column totals. • Here, independence between the variables implies that its population value ρ equals zero. • For testing Ho: independence against the two-sided Ha: ρ ≠ 0, a test statistic is: M2 = (n-1)r2, and M2 ~ X2(1 df) 41 Ahmed Hasan(ahmed.hasan@mwu.edu.et) 11/24/2024
Exercise: Testing for Independence for Ordinal Data
• Test the association between
  – Item 1: "A working mother can establish just as warm and secure a relationship with her children as a mother who does not work." and
  – Item 2: "Working women should have paid maternity leave."
Exercise: Testing for Independence for Ordinal Data
• Using $X^2$ and $G^2$ with df = 12: $G^2 = 47.576$ (P < 0.001) and $X^2 = 44.961$ (P < 0.001).
• There is a "linear trend" in these data, so we can describe this relationship using a single statistic:
  $r = \frac{Cov(X, Y)}{S_X S_Y}$
• To compute r, we need scores for both the row (Item 1) categories and the column (Item 2) categories. Take the category scores:
  Rows: $u_1 = 1, u_2 = 2, u_3 = 3, u_4 = 4$;  Columns: $v_1 = 1, v_2 = 2, v_3 = 3, v_4 = 4, v_5 = 5$
• From this, r = 0.203 and $M^2 = (884 - 1)(0.203)^2 = 36.26$.
• With df = 1, the p-value for the observed $M^2$ is < 0.001.
Exact Inference for Small Samples
• Exact inference methods are used for small samples, when large-sample approximations (e.g., normal or chi-square) may not be reliable.
• These methods rely on the exact sampling distribution of test statistics rather than approximations.
• Fisher's Exact Test is a common method for small-sample exact inference.
• This involves using exact distributions like the binomial or hypergeometric distribution to compute the exact p-value instead of approximations.
  – Binomial Distribution: for binary response data (success/failure), inference directly uses the binomial probabilities without relying on asymptotic normality.
  – Hypergeometric Distribution: applies when sampling is without replacement, typically from a finite population.
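For instance, Fisher's exact test can be run with SciPy on a small 2 × 2 table; the counts below are purely illustrative, not from the lecture data:

```python
from scipy.stats import fisher_exact

# Hypothetical small-sample 2 x 2 table: rows = treatment/control,
# columns = success/failure (illustrative counts only).
table = [[3, 1],
         [1, 3]]
odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
# The p-value comes from the exact hypergeometric distribution of the first
# cell given fixed margins, so no large-sample approximation is involved.
print(odds_ratio, p_value)
```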
Association in Three-Way Tables
• Three-way tables involve three categorical variables. They help assess how two variables are associated while accounting for the levels of a third variable.
• Analysis often examines conditional or marginal associations and uses measures like conditional odds ratios.
• Types of Associations:
  – Marginal Independence: two variables X and Y are marginally independent if the marginal table of X and Y (obtained by collapsing over, i.e., ignoring, Z) shows no association.
  – Conditional Independence: two variables are conditionally independent given the third variable if they are independent within each level of that variable. Denoted X ⊥ Y | Z.
  – Homogeneous Association: the association between two variables X and Y is homogeneous across all levels of Z if the odds ratios for X and Y are the same at each level of Z.
Measures of Association
• Conditional Odds Ratio: for two binary variables X and Y at a specific level Z = k of a 2 × 2 × K table, the sample odds ratio is
  $OR_{XY \mid Z=k} = \frac{n_{11k}\, n_{22k}}{n_{12k}\, n_{21k}}$
• where $n_{ijk}$ are the observed counts in the three-way table.
• Homogeneous Odds Ratio: if the odds ratio $OR_{XY \mid Z=k}$ is the same for all k, then the association between X and Y is homogeneous across Z.
• Conditional Independence Model: assumes that X and Y are independent given Z.
  – This can be tested using likelihood-ratio statistics or loglinear models.
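A short Python sketch computes one sample conditional odds ratio per level of Z; the 2 × 2 × 2 array of counts below is hypothetical, chosen so the association is homogeneous:

```python
import numpy as np

# Hypothetical 2 x 2 x K table, shape (K, 2, 2): one X-Y partial table per level of Z.
table = np.array([[[20, 10],    # Z = 1
                   [ 5, 15]],
                  [[40, 20],    # Z = 2
                   [10, 30]]])

for k, partial in enumerate(table, start=1):
    or_k = (partial[0, 0] * partial[1, 1]) / (partial[0, 1] * partial[1, 0])
    print(f"OR(XY | Z={k}) = {or_k:.2f}")       # 6.00 at both levels
# Equal conditional odds ratios at every level of Z indicate homogeneous association.
```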
Testing for Association
• Likelihood Ratio Test: compares a simpler (nested) model with a more complex model.
• Test statistic:
  $G^2 = 2 \sum_{i,j,k} n_{ijk} \log\frac{n_{ijk}}{\hat{\mu}_{ijk}}$
  – $n_{ijk}$: observed counts.
  – $\hat{\mu}_{ijk}$: expected counts under the simpler model.
• Interpretation: a significant $G^2$ indicates the simpler model does not fit well, suggesting the inclusion of additional interactions.
• Example: suppose we study smoking habits (X), exercise levels (Y), and age groups (Z) in a population. A three-way table is formed with counts for each combination.
  – To test whether smoking (X) and exercise (Y) are independent within age groups (Z), compare the conditional-independence model with the observed counts.
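For the conditional-independence case, the expected counts within each level k of Z are $\hat{\mu}_{ijk} = n_{i+k}\, n_{+jk} / n_{++k}$ and the residual df is K(I − 1)(J − 1). A minimal Python sketch (the function name is ours, assuming all counts are positive):

```python
import numpy as np
from scipy.stats import chi2

def conditional_independence_G2(table):
    """G^2 test of X independent of Y given Z; table has shape (K, I, J)."""
    table = np.asarray(table, dtype=float)
    K, I, J = table.shape
    G2 = 0.0
    for partial in table:                        # one I x J partial table per stratum
        n_k = partial.sum()
        mu = (partial.sum(axis=1, keepdims=True) *
              partial.sum(axis=0, keepdims=True)) / n_k
        G2 += 2 * (partial * np.log(partial / mu)).sum()
    df = K * (I - 1) * (J - 1)
    return G2, df, chi2.sf(G2, df)               # statistic, df, p-value
```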
Chi-Square Test of Homogeneity
• This test compares whether multiple populations have the same distribution across categories.
• Hypotheses:
  – H0: the distributions across groups are the same.
  – Ha: the distributions across groups differ.
• Test Statistic:
  $X^2 = \sum_{i,j} \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}}, \qquad \hat{\mu}_{ij} = \frac{n_{i+}\, n_{+j}}{n}$
• Where $n_{ij}$ is the observed count, and $\hat{\mu}_{ij}$ is the expected count under H0.
• The degrees of freedom for the test are df = (I − 1)(J − 1).
Chi-Square Test of Homogeneity: Example
• A researcher investigates whether the preference for different types of beverages (e.g., coffee, tea, soda) is the same across three age groups (young, middle-aged, senior).
• Data Table:

              Coffee   Tea   Soda   Total
Young             50    30     20     100
Middle-aged       60    20     20     100
Senior            40    50     10     100
Total            150   100     50     300

• For example, for the cell (Young, Coffee):
  $\hat{\mu}_{11} = \frac{n_{1+}\, n_{+1}}{n} = \frac{100 \times 150}{300} = 50$
• df = (3 − 1)(3 − 1) = 4.
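Completing this example in Python: the homogeneity test uses exactly the same computation as the independence test, so `chi2_contingency` applies directly.

```python
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[50, 30, 20],   # Young:       Coffee, Tea, Soda
                  [60, 20, 20],   # Middle-aged
                  [40, 50, 10]])  # Senior

X2, p, df, expected = chi2_contingency(table, correction=False)
print(f"X2 = {X2:.1f}, df = {df}, p = {p:.4f}")
# X2 = 22.0 with df = 4; since 22.0 > 9.49 (the 0.05 critical value),
# reject H0: beverage preferences differ across the age groups.
```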
Chi-Square Test of Goodness-of-Fit
• This test assesses whether observed categorical data fit a specific theoretical distribution.
• Hypotheses:
  – H0: the observed distribution follows the expected distribution.
  – Ha: the observed distribution does not follow the expected distribution.
• Test Statistic:
  $X^2 = \sum_i \frac{(n_i - \mu_i)^2}{\mu_i}$
  – Where $n_i$ and $\mu_i$ are the observed and expected counts, respectively.
• The expected frequency for each category is calculated as $\mu_i = n \cdot p_i$.
• Where
  – n: total number of observations,
  – $p_i$: the hypothesized proportion for category i.
Chi-Square Test of Goodness-of-Fit
• The df for the test is df = k − 1, where k is the number of categories.
• If parameters (e.g., mean or variance) are estimated from the data, the degrees of freedom reduce further:
  df = k − 1 − (number of parameters estimated).
• Example: a die is rolled 60 times, and the observed frequencies for the six faces are recorded. The hypothesized distribution is that the die is fair, so each face should appear with equal probability ($p_i = 1/6$).

Face             1    2    3    4    5    6
Observed (n_i)  10    8   12   14    9    7
Chi-Square Test of Goodness-of-Fit
• For a fair die, the expected frequency for each face is $\mu_i = n \cdot p_i = 60 \times 1/6 = 10$.
• Compute
  $X^2 = \sum_i \frac{(n_i - \mu_i)^2}{\mu_i} = 3.4, \qquad df = 6 - 1 = 5$
• Using a chi-square distribution table, compare $X^2 = 3.4$ with df = 5. At α = 0.05, the critical value is approximately 11.07. Since 3.4 < 11.07, we fail to reject H0.
• Conclusion: the observed frequencies are consistent with the expected frequencies under the assumption that the die is fair; there is no evidence to reject H0.
• Key Notes: use the test only when expected frequencies are sufficiently large (usually $\mu_i \ge 5$). For small expected frequencies, use an exact test or combine categories.
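The same calculation can be done in Python with SciPy's goodness-of-fit routine:

```python
from scipy.stats import chisquare

observed = [10, 8, 12, 14, 9, 7]
expected = [10] * 6                      # mu_i = 60 * (1/6) = 10 for each face
X2, p_value = chisquare(observed, f_exp=expected)
print(f"X2 = {X2:.1f}, p = {p_value:.3f}")
# X2 = 3.4 with df = 5; p is about 0.64 > 0.05, so we fail to reject H0 (fair die).
```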
Summary
• Exact Inference
  – Purpose: analyze small samples without large-sample approximations.
  – Formula: depends on the specific test (e.g., hypergeometric for Fisher's Test).
  – Key assumption: exact probabilities of outcomes (discrete distributions).
• Three-Way Association
  – Purpose: examine relationships in three-way contingency tables.
  – Formula: loglinear model, $\log(\mu_{ijk}) = \ldots$, tested with the likelihood-ratio statistic.
  – Key assumption: multinomial or Poisson sampling.
• Chi-Square Homogeneity
  – Purpose: compare distributions across groups.
  – Formula: $X^2 = \sum_{i,j} \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}}$
  – Key assumption: sufficiently large sample size for expected frequencies.
• Chi-Square Goodness-of-Fit
  – Purpose: compare observed vs. expected distribution.
  – Formula: $X^2 = \sum_i \frac{(n_i - \mu_i)^2}{\mu_i}$
  – Key assumption: categories are mutually exclusive, and expected frequencies are sufficiently large (usually ≥ 5).
The End
Thank You!