Goodness-of-Fit and Contingency Tables
Contingency Tables
Objectives:
1. Test a distribution for goodness of fit, using chi-square.
2. Test two variables for independence, using chi-square.
3. Test proportions for homogeneity, using chi-square.
Recall: Goodness-of-Fit
By “goodness-of-fit” we mean that sample data consisting of observed frequency
counts arranged in a single row or column (called a one-way frequency table) agree
with some particular distribution (such as normal or uniform) being considered. We
will use a hypothesis test for the claim that the observed frequency counts agree with
the claimed distribution.
Definition: A Multinomial Experiment is an experiment that meets the following conditions:
1. The number of trials is fixed.
2. The trials are independent.
3. All outcomes of each trial must be classified into exactly one of several different categories.
4. The probabilities for the different categories remain constant for each trial.
Measuring Disagreement with the Claimed Distribution
We know that sample frequencies typically differ somewhat from the values we theoretically expect, so we consider the key question:
Are the differences between the actual observed frequencies O and the theoretically
expected frequencies E significant?
To measure the discrepancy between the O and E values, we use the test statistic:
Test Statistic: $\chi^2 = \sum \frac{(O - E)^2}{E}$, with $df = k - 1$
O = observed frequency
E = expected frequency
d.f. = number of categories minus 1
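As a minimal sketch of the statistic above, the following Python function sums (O − E)²/E over the categories and returns it together with df = k − 1. The function name `gof_statistic` and the die-roll counts are hypothetical, chosen only for illustration; they do not come from the slides.

```python
def gof_statistic(observed, expected):
    """Return (chi-square statistic, degrees of freedom) for a one-way table."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # Sum (O - E)^2 / E over all k categories.
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # d.f. = number of categories minus 1
    return chi_sq, df


# Hypothetical data: 60 rolls of a die claimed to be fair (E = 10 per face).
observed = [8, 12, 9, 11, 14, 6]
expected = [10] * 6
chi_sq, df = gof_statistic(observed, expected)
print(f"chi-square = {chi_sq:.3f}, df = {df}")
```

A larger statistic means the observed counts sit farther from the claimed distribution, which is why the test is right-tailed.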
Recall: Goodness-of-Fit Notation
O represents the observed frequency of an outcome, found from the sample data.
E represents the expected frequency of an outcome, found by assuming that the distribution is as claimed.
k represents the number of different categories or cells.
n represents the total number of trials (or the total of observed sample values).
p represents the probability that a sample value falls within a particular category.
Requirements:
1. The data have been randomly selected.
2. The sample data consist of frequency counts for each of the different categories.
3. For each category, the expected frequency is at least 5. (The expected frequency for a category is the frequency that would occur if the data actually
have the distribution that is being claimed. There is no requirement that the observed frequency for each category must be at least 5.)
Null and Alternative Hypotheses
H0: The frequency counts agree with the claimed distribution.
H1: The frequency counts do not agree with the claimed distribution.
Critical values:
1. Critical values are found in the χ² table by using k − 1 degrees of freedom, where k is the number of categories.
2. Goodness-of-fit hypothesis tests are always right-tailed.
3. Conducting a goodness-of-fit test requires that we identify the observed frequencies denoted by O, then find the frequencies expected (denoted by E)
with the claimed distribution. There are two different approaches for finding expected frequencies E:
Equal frequencies: $E = \frac{n}{k}$; otherwise: $E = np$
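The two approaches for finding expected frequencies can be sketched as follows. The helper names and the numbers (n = 200, the four claimed probabilities) are hypothetical and serve only to illustrate E = n/k and E = np, along with the requirement that every expected frequency be at least 5.

```python
def expected_equal(n, k):
    """Expected frequencies when all k categories are claimed equally likely: E = n / k."""
    return [n / k] * k


def expected_from_probs(n, probs):
    """Expected frequencies when each category has its own claimed probability: E = n * p."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("claimed probabilities must sum to 1")
    return [n * p for p in probs]


# Hypothetical example: n = 200 trials spread over k = 4 categories.
equal_e = expected_equal(200, 4)
unequal_e = expected_from_probs(200, [0.5, 0.25, 0.15, 0.10])

# Requirement check: every expected frequency should be at least 5.
meets_requirement = all(e >= 5 for e in equal_e + unequal_e)
print(equal_e, unequal_e, meets_requirement)
```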
TI Calculator:
Goodness-of-Fit Test
1. Stat
2. Tests
3. χ² GOF-Test
4. Enter L1 & L2
5. df = k − 1
6. Calculate
TI Calculator: Enter data:
1. Stat
2. Edit
3. ClrList L1 & L2
4. O → L1 & E → L2
Contingency Tables
Contingency Table: A contingency table (or two-way frequency table) is a table consisting of
frequency counts of categorical data corresponding to two different variables. (One variable is used to
categorize rows, and a second variable is used to categorize columns.)
When data can be tabulated in table form in terms of frequencies, several types of hypotheses can
be tested by using the chi-square test.
Test of Independence (of variables):
In a test of independence, we test the null hypothesis that in a contingency table, the row and
column variables are independent. (That is, there is no dependency between the row variable and the
column variable.)
The test of independence of variables is used to determine whether two variables are independent
of or related to each other when a single sample is selected.
Chi-Square Test of Homogeneity: (Test of homogeneity of proportions)
A chi-square test of homogeneity is a test of the claim that different populations have the same
proportions of some characteristics.
The test of homogeneity of proportions is used to determine whether the proportions
for a variable are equal when several samples are selected from different populations.
Contingency Tables Objective, Notation & Requirements
Conduct a hypothesis test of independence between the row variable and column variable in a contingency table.
O represents the observed frequency in a cell of a contingency table.
E represents the expected frequency in a cell, found by assuming that the row and column variables are
independent.
r represents the number of rows in a contingency table (not including labels or row totals).
c represents the number of columns in a contingency table (not including labels or column totals).
Requirements
1. The sample data are randomly selected.
2. The sample data are represented as frequency counts in a two-way table.
3. For every cell in the contingency table, the expected frequency E is at least 5. (There is no requirement that
every observed frequency must be at least 5.)
H0: The row and column variables are independent. (There is no relationship between two variables.)
H1: The row and column variables are dependent. (There is a relationship between two variables.)
Critical values:
1. The critical values are found in the chi-square table (χ² table), with df = (r − 1)(c − 1), where r is the number of rows and c is the number of columns.
2. Tests of independence with a contingency table are always right-tailed.
Test Statistic: $\chi^2 = \sum \frac{(O - E)^2}{E}$, with $df = (r - 1)(c - 1)$
Expected frequency: $E = \frac{(\text{row sum})(\text{column sum})}{\text{grand total}}$
TI Calculator:
Contingency Table
1. Access Matrix (2nd & press x⁻¹)
2. Edit, enter dimensions & cell entries
3. Stat
4. Tests
5. χ²-Test
6. Observed matrix must be A
7. Calculate
Contingency Tables
d.f. = (rows – 1) (columns – 1) = (R – 1)(C – 1)
Test Statistic: $\chi^2 = \sum \frac{(O - E)^2}{E}$
Expected frequency: $E = \frac{(\text{row sum})(\text{column sum})}{\text{grand total}}$
O = observed frequency
E = expected frequency
Example 1: Contingency Tables: Finding Expected Frequency
The cells of this table contain frequency counts. The frequency counts are the observed values, and the expected values are shown in parentheses. The row variable identifies the treatment used for a stress fracture in a foot bone, and the column variable identifies the outcome as a success or failure. Refer to the table and find the expected frequency for the cell in the first row and first column, where the observed frequency is 54.
Treatment | Success | Failure | Total
Surgery | 54 (E = 47.478) | 12 (E = 18.522) | 66
Weight-Bearing Cast | 41 (E = 66.182) | 51 (E = 25.818) | 92
Non-Weight-Bearing Cast for 6 Weeks | 70 (E = 52.514) | 3 (E = 20.486) | 73
Non-Weight-Bearing Cast for Less Than 6 Weeks | 17 (E = 15.826) | 5 (E = 6.174) | 22
Total | 182 | 71 | 253
$E = \frac{(\text{row sum})(\text{column sum})}{\text{grand total}} = \frac{(66)(182)}{253} = 47.478$
Interpretation:
Assuming that success is independent of the treatment, we expect to find that 47.478 of the subjects would be treated with surgery and that treatment would be successful. There is a discrepancy between O = 54 and E = 47.478, and such discrepancies are key components of the test statistic that is a collective measure of the overall disagreement between the observed frequencies and the frequencies expected with independence between the row and column variables.
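The expected-frequency rule can be applied to every cell at once. The sketch below uses the observed counts from the stress-fracture table of Example 1; the function name `expected_frequencies` is a hypothetical helper, not something taken from the slides.

```python
def expected_frequencies(table):
    """Return expected counts E = (row sum)(column sum) / (grand total) for each cell."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    grand_total = sum(row_sums)
    return [[r * c / grand_total for c in col_sums] for r in row_sums]


# Observed counts from Example 1 (columns: success, failure).
observed = [
    [54, 12],  # Surgery
    [41, 51],  # Weight-bearing cast
    [70, 3],   # Non-weight-bearing cast for 6 weeks
    [17, 5],   # Non-weight-bearing cast for less than 6 weeks
]
expected = expected_frequencies(observed)
for row in expected:
    print([round(e, 3) for e in row])
```

The first printed row should agree with the hand calculation (66)(182)/253 for the surgery/success cell.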
Example 2:
Use the same sample data from the previous example with a 0.05 significance level to test the claim that success of the treatment is independent of the type of treatment. What does the result indicate about the increasing trend to use surgery?
REQUIREMENT CHECK:
1. On the basis of the study description, assume random selection & assignment of subjects to the different treatment groups.
2. Results are frequency counts.
3. The expected frequencies are all at least 5. (lowest = 6.174.)
H0: Success is independent of the treatment. (Claim)
H1: Success and the treatment are dependent. RTT
df = (r − 1)(c − 1) = (4 − 1)(2 − 1) = 3 & α = 0.05
TS: $\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(54 - 47.478)^2}{47.478} + \cdots + \frac{(5 - 6.174)^2}{6.174} = 58.393$
OR: P-Value from Table A-4: TS: χ² = 58.393 > highest value (12.838) in Table → P-value < 0.005.
Decision:
a. Reject H0
b. The claim is False
c. There is NOT sufficient evidence to support the claim that success of the treatment is independent of the type of treatment.
Interpretation: Success is dependent on the treatment, and the success rates of 81.8% (54/66), 44.6% (41/92), 95.9% (70/73), and 77.3% (17/22) suggest that the best treatment is to use a non-weight-bearing cast for 6 weeks. These results suggest that the increasing use of surgery is a treatment strategy that is not supported by the evidence. (from the table of example 1)
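The test statistic and the success rates quoted in the interpretation can be reproduced with a short sketch. It assumes the observed counts from the Example 1 table; `chi_square_statistic` is a hypothetical helper name.

```python
def chi_square_statistic(table):
    """Return (chi-square, df) for a two-way table of observed counts."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    grand_total = sum(row_sums)
    chi_sq = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_sums[i] * col_sums[j] / grand_total  # expected frequency
            chi_sq += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(col_sums) - 1)  # (r - 1)(c - 1)
    return chi_sq, df


treatments = ["Surgery", "Weight-bearing cast",
              "Non-weight-bearing cast, 6 weeks",
              "Non-weight-bearing cast, < 6 weeks"]
observed = [[54, 12], [41, 51], [70, 3], [17, 5]]  # columns: success, failure

chi_sq, df = chi_square_statistic(observed)
success_rates = [row[0] / sum(row) for row in observed]

print(f"chi-square = {chi_sq:.3f}, df = {df}")
for name, rate in zip(treatments, success_rates):
    print(f"{name}: {rate:.1%}")
```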
Step 1: H0, H1, claim & tails
Step 2: Calculate the test statistic (TS)
Step 3: Find the critical value (CV) using α
Step 4: Make the decision to
a. Reject or do not reject H0
b. The claim is true or false
c. Restate this decision: There is / is not sufficient evidence to support the claim that…
Example 2 (df = 3, α = 0.05) → CV: χ² = 7.815
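Because these tests are always right-tailed, the reject/do-not-reject part of Step 4 reduces to a single comparison. The sketch below applies it to the (TS, CV) pairs reported in this deck for Examples 2, 4, and 5; the function name `right_tailed_decision` is a hypothetical helper.

```python
def right_tailed_decision(test_statistic, critical_value):
    """Reject H0 when the test statistic falls in the right-tail critical region."""
    if test_statistic > critical_value:
        return "Reject H0"
    return "Do not reject H0"


# (TS, CV) pairs as reported on the slides.
examples = {
    "Example 2": (58.393, 7.815),
    "Example 4": (0.283, 4.605),
    "Example 5": (35.388, 24.996),
}
decisions = {name: right_tailed_decision(ts, cv)
             for name, (ts, cv) in examples.items()}
for name, decision in decisions.items():
    print(f"{name}: {decision}")
```

Whether the claim is then called true or false depends on whether the claim was placed in H0 or H1, so that part of Step 4 is left to the reader.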
Example 3:
A sample of 3 hospitals was selected, and the number of infections for a specific year has been reported. Test the claim that there is a relationship between the hospital and the number of patient infections. (Number of patient infections depends on the hospital.)
Expected frequency: $E = \frac{(\text{row sum})(\text{column sum})}{\text{grand total}}$
H0: The number of infections is independent of the hospital.
H1: The number of infections is dependent on the hospital (claim), RTT
df = (r − 1)(c − 1) = (3 – 1)(3 – 1) = 4 & α = 0.05 → CV: χ² = 9.488
TS: $\chi^2 = \sum \frac{(O - E)^2}{E}$, with $df = (r - 1)(c - 1)$
Decision:
a. Reject H0
b. The claim is True
c. There is sufficient evidence to support the claim that the number of infections is related to the hospital where they occurred.
Example 4:
A researcher wishes to determine whether there is a relationship between the gender of an individual and the amount of alcohol consumed. A sample of 68 people is selected, and the following data are obtained. At α = 0.10, can the researcher conclude that alcohol consumption is related to gender?
Alcohol Consumption (expected frequencies in parentheses):
Gender | Low | Moderate | High | Total
Male | 10 (9.13) | 9 (9.93) | 8 (7.94) | 27
Female | 13 (13.87) | 16 (15.07) | 12 (12.06) | 41
Total | 23 | 25 | 20 | 68
$E = \frac{(\text{row sum})(\text{column sum})}{\text{grand total}}$, for example $E_{1,1} = \frac{(27)(23)}{68} = 9.13$
H0: The amount of alcohol that a person consumes is independent of the individual’s gender.
H1: The amount of alcohol that a person consumes is dependent on the individual’s gender (claim), RTT
df = (r − 1)(c − 1) = (2 – 1)(3 – 1) = 2 & α = 0.10 → CV: χ² = 4.605
TS: $\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(10 - 9.13)^2}{9.13} + \frac{(9 - 9.93)^2}{9.93} + \frac{(8 - 7.94)^2}{7.94} + \frac{(13 - 13.87)^2}{13.87} + \frac{(16 - 15.07)^2}{15.07} + \frac{(12 - 12.06)^2}{12.06} = 0.283$
Decision:
a. Do not Reject H0
b. The claim is False
c. There is not enough evidence to support the claim that the amount of alcohol a person consumes is dependent on the individual’s gender.
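The Example 4 arithmetic can be checked with a sketch that follows the slide's approach of rounding each expected frequency to two decimals before summing. Carrying unrounded expected values instead gives a slightly smaller statistic (about 0.28), so small differences from calculator output are expected. The helper name `chi_square_rounded` is hypothetical.

```python
def chi_square_rounded(table, decimals=2):
    """Chi-square statistic using expected counts rounded as on the slide."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    grand_total = sum(row_sums)
    chi_sq = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            # E = (row sum)(column sum) / (grand total), rounded for hand work.
            e = round(row_sums[i] * col_sums[j] / grand_total, decimals)
            chi_sq += (o - e) ** 2 / e
    return chi_sq


observed = [
    [10, 9, 8],    # Male: low, moderate, high
    [13, 16, 12],  # Female: low, moderate, high
]
chi_sq = chi_square_rounded(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 4.605  # from the chi-square table, df = 2, alpha = 0.10

print(f"chi-square = {chi_sq:.3f}, df = {df}")
print("Reject H0" if chi_sq > critical_value else "Do not reject H0")
```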
Contingency Tables: Test for Homogeneity of Proportions
Chi-Square Test of Homogeneity:
A chi-square test of homogeneity is a test of the claim that different populations have the same proportions of some characteristics.
In conducting a test of homogeneity, we can use the same notation, requirements, test statistic, critical value, and procedures given previously, with this exception: Instead of testing the null hypothesis of independence between the row and column variables, we test the null hypothesis that the different populations have the same proportion of some characteristic.
Assumptions for Homogeneity of Proportions:
1. The data are obtained from a random sample.
2. The expected frequency for each category must be 5 or more.
In a typical test of independence, sample subjects are randomly selected from one
population and values of two different variables are observed.
In a typical chi-square test of homogeneity, subjects are randomly selected from the
different populations separately.
Example 5:
This table lists results from an experiment in which 12 wallets were intentionally lost in each of 16 different cities, including New York City, London, Amsterdam, and so on. Use a 0.05 significance level with the data to test the null hypothesis that the cities have the same proportion of returned wallets. (Note: This “Lost Wallet Test” implies that whether a wallet is returned is dependent on the city in which it was lost.) Test the claim that the proportion of returned wallets is not the same in the 16 different cities.
City | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P
Wallet Returned | 8 | 5 | 7 | 11 | 5 | 8 | 6 | 7 | 3 | 1 | 4 | 2 | 4 | 6 | 4 | 9
Wallet Not Returned | 4 | 7 | 5 | 1 | 7 | 4 | 6 | 5 | 9 | 11 | 8 | 10 | 8 | 6 | 8 | 3
REQUIREMENT CHECK
(1) Based on the description of the study, we will treat the subjects as being randomly selected and
randomly assigned to the different cities.
(2) The results are expressed as frequency counts.
(3) The expected frequencies are all at least 5. (All expected values are either 5.625 or 6.375.) The
requirements are satisfied.
H0: Whether a lost wallet is returned is independent of the city in which it was lost. (p1 = p2 = p3 = … = p16), Claim
H1: A lost wallet being returned depends on the city in which it was lost. (At least one proportion is different from the others), RTT
df = (16 – 1)(2 – 1) = 15 & α = 0.05 → CV: χ² = 24.996
TS: $\chi^2 = \sum \frac{(O - E)^2}{E} = 35.388$
P-value = 0.002
Decision:
a. Reject H0, (Reject independence)
b. The claim is False
c. There is not enough evidence to support the claim that the proportion of returned wallets is independent of the city in which it was lost. (The proportion of returned wallets depends on the city in which they were lost.) (There is sufficient evidence to conclude that the proportion of returned wallets is not the same in the 16 different cities.)
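The lost-wallet statistic can be reproduced from the table in Example 5. In the sketch below the data are laid out as 2 rows by 16 columns, so df = (2 − 1)(16 − 1) = 15, the same value as above; `homogeneity_test` is a hypothetical helper name.

```python
def homogeneity_test(table):
    """Return (chi-square, df, expected counts) for a two-way table."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    grand_total = sum(row_sums)
    expected = [[r * c / grand_total for c in col_sums] for r in row_sums]
    chi_sq = sum((o - e) ** 2 / e
                 for obs_row, exp_row in zip(table, expected)
                 for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(col_sums) - 1)
    return chi_sq, df, expected


returned = [8, 5, 7, 11, 5, 8, 6, 7, 3, 1, 4, 2, 4, 6, 4, 9]       # cities A to P
not_returned = [4, 7, 5, 1, 7, 4, 6, 5, 9, 11, 8, 10, 8, 6, 8, 3]  # cities A to P

chi_sq, df, expected = homogeneity_test([returned, not_returned])
distinct_expected = sorted({e for row in expected for e in row})

print(f"chi-square = {chi_sq:.3f}, df = {df}")
print("distinct expected values:", distinct_expected)
```

Since every city has 12 wallets, only two distinct expected values occur, which matches the requirement check for this example.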
Example 6:
100 people were selected from each of 4 income groups. They were asked if they were “very happy.” The percent for each group who responded yes and the number from the survey are shown in the table. At α = 0.05, test the claim that there is no difference in the proportions.
$E = \frac{n}{k} = \frac{400}{4} = 100$
Expected frequency: $E = \frac{(\text{row sum})(\text{column sum})}{\text{grand total}}$
H0: p1 = p2 = p3 = p4, Claim
H1: At least one of the proportions differs from the others. RTT
df = (R – 1)(C – 1) = (2 – 1)(4 – 1) = 3 & α = 0.05 → CV: χ² = 7.815
TS: $\chi^2 = \sum \frac{(O - E)^2}{E} = 14.149$
Decision:
a. Reject H0
b. The claim is False
c. There is not enough evidence to support the claim that there is no difference in the proportions. Hence the incomes seem to make a difference in the proportions.