2. Goodness-of-Fit and Contingency Tables
Goodness-of-Fit
2
Objectives:
1 Test a distribution for goodness of fit, using chi-square.
2 Test two variables for independence, using chi-square.
3 Test proportions for homogeneity, using chi-square.
3. 3
TI Calculator:
How to enter data:
1. Stat
2. Edit
3. ClrList ๐ณ ๐ & ๐ณ ๐
4. Type in your data in ๐ณ ๐ & ๐ณ ๐
Recall: Correlation & Regression
The linear correlation coefficient r, is a number that measures how well paired sample data fit a
straight-line pattern when graphed . The value of rยฒ is the proportion of the variation in y that is
explained by the linear relationship between x and y. (The linear correlation coefficient:โ1 โค r โค 1)
Regression Line:
Given a collection of paired sample
data, the regression line (or line of best
fit, or least-squares line) is the straight
line that โbestโ fits the scatterplot of the
data.
No significant linear correlation โ
The best predicted y-value is ๐ฆ.
Significant linear correlation โ The
best predicted y-value is found by
Plugging x-value into the regression
equation.
TI Calculator:
Scatter Plot:
1. Press on Y & clear
2. 2nd y, Enter
3. On, Enter
4. Select X1-list: ๐ณ ๐
5. Select Y1-list: ๐ณ ๐
6. Mark: Select
Character
7. Press Zoom & 9 to get
ZoomStat
zx denotes the z score for an individual sample value x
zy is the z score for the corresponding sample value y.
2
2
:
:
, 2
1
T
n
t r d
O
r
r
fS n
r
๏ญ
๏ฝ ๏ฝ ๏ญ
๏ญ
TI Calculator:
Linear Regression - test
1. Stat
2. Tests
3. LinRegTTest
4. Enter ๐ณ ๐ & ๐ณ ๐
5. Freq = 1
6. Choose โ
7. Calculate
๏จ ๏ฉ ๏จ ๏ฉ๏จ ๏ฉ
๏จ ๏ฉ ๏จ ๏ฉ ๏จ ๏ฉ ๏จ ๏ฉ
2 22 2
)
1
:
(
,
x y
n xy x y Z Z
r r
n
n x x n y y
Or
๏ญ
๏ฝ ๏ฝ
๏ญ๏ฉ ๏น ๏ฉ ๏น๏ญ ๏ญ๏ช ๏บ ๏ช ๏บ๏ซ ๏ป ๏ซ ๏ป
๏ฅ ๏ฅ ๏ฅ ๏ฅ
๏ฅ ๏ฅ ๏ฅ ๏ฅ
Test of Hypothesis:
Step 1: H0 :๐ = 0, H1: ๐ โ 0 claim
& Tails
Step 2: TS: ๐ก = ๐
๐โ2
1โ๐2 , OR: r
Step 3: CV using ฮฑ From the
T-table or Correlation (r-table)
Step 4: Make the decision to
a. Reject or not H0
b. The claim is true or false
c. Restate this decision: There is /
is not sufficient evidence to
support the claim thatโฆ
There is a linear Correlation If |r|
โฅ critical value
There is No Correlation If |r| <
critical value
๏จ ๏ฉ
0 1
1 22
0 1
Regression Line:
,
:
int :
,
y b b x
n xy x y
Slope b
n x x
Y ercept b y b x
y x
y x
n n
๏ฝ ๏ซ
๏ญ
๏ฝ
๏ญ
๏ญ ๏ฝ ๏ญ
๏ฝ ๏ฝ
๏ฅ ๏ฅ ๏ฅ
๏ฅ ๏ฅ
๏ฅ ๏ฅ
4. 4
Goodness-of-Fit
Key Concept:
By โgoodness-of-fitโ we mean that sample data consisting of observed frequency counts
arranged in a single row or column (called a one-way frequency table) agree with some
particular distribution (such as normal or uniform) being considered. We will use a hypothesis
test for the claim that the observed frequency counts agree with the claimed distribution.
Definition: A Multinomial Experiment is an experiment that meets the following conditions:
1. The number of trials is fixed.
2. The trials are independent.
3. All outcomes of each trial must be classified into exactly one of several different categories.
4. The probabilities for the different categories remain constant for each trial.
Goodness-of-Fit Test
A goodness-of-fit test is used to test the hypothesis that an observed frequency distribution fits (or
conforms to) some claimed distribution.
Conduct a goodness-of-fit test, which is a hypothesis test to determine whether a single row (or
column) of frequency counts agrees with some specific distribution (such as uniform or normal).
5. 5
Goodness-of-Fit Notation
O represents the observed frequency of an outcome, found from the sample data.
E represents the expected frequency of an outcome, found by assuming that the distribution is as
claimed.
k represents the number of different categories or cells.
n represents the total number of trials (or the total of observed sample values).
p represents the probability that a sample value falls within a particular category.
Requirements:
1. The data have been randomly selected.
2. The sample data consist of frequency counts for each of the different categories.
3. For each category, the expected frequency is at least 5. (The expected frequency for a category is
the frequency that would occur if the data actually have the distribution that is being claimed.
There is no requirement that the observed frequency for each category must be at least 5.)
Null and Alternative Hypotheses
H0: The frequency counts agree with the claimed distribution.
H1: The frequency counts do not agree with the claimed distribution.
6. 6
Goodness-of-Fit Notation
P-values: P-values are provided by technology, or a range of P-values can be
found from ๐2
โ ๐๐๐๐๐.
Critical values:
1. Critical values are found in ๐2
โ ๐๐๐๐๐ by using k โ 1 degrees of freedom,
where k is the number of categories.
2. Goodness-of-fit hypothesis tests are always right-tailed.
3. Conducting a goodness-of-fit test requires that we identify the observed
frequencies denoted by O, then find the frequencies expected (denoted by
E) with the claimed distribution. There are two different approaches for
finding expected frequencies E:
If the expected frequencies are not all equal:
Calculate E = np for each individual category.
2
2
Test Statistic:
( )
1
O E
E
df k
๏ฃ
๏ญ
๏ฝ
๏ฝ ๏ญ
๏ฅ
d.f. = number of categories minus 1
O = observed frequency
E = expected frequency
๐ฐ๐ ๐๐๐๐๐๐๐๐ ๐๐๐๐๐๐๐๐๐๐๐ ๐๐๐ ๐๐๐ ๐๐๐๐๐: ๐ฌ =
๐
๐
๐ธ๐๐ข๐๐ ๐น๐๐๐: ๐ธ =
๐
๐
, otherwise: E = np
7. Example 1:
7
Finding Expected Frequencies:
Equally Likely A single die is rolled 45 times with the following
results. Assuming that the die is fair and all outcomes are equally
likely, find the expected frequency E for each empty cell.
Outcome 1 2 3 4 5 6
Observed Frequency O 13 6 12 9 3 2
Expected Frequency E
๐ธ =
๐
๐
If the die is fair and the outcomes are all
equally likely, we expect that each
outcome should occur about 7.5 times.
=
45
6
= 7.5
b. Not Equally Likely Using the same results from part (a), suppose that we claim that instead
of being fair, the die is loaded so that the outcome of 1 occurs 50% of the time and the other
five outcomes occur 10% of the time. The probabilities are listed in the second row below.
Outcome 1 2 3 4 5 6
Probability 0.5 0.1 0.1 0.1 0.1 0.1
Observed Frequency O 13 6 12 9 3 2
Expected Frequency E
n = 45 & E = np
1: E = np = (45)(0.5) = 22.5
Each of the other five outcomes:
E = np = (45)(0.1) = 4.5
7.5 7.5 7.5 7.5 7.5 7.5
22.5 4.5 4.5 4.5 4.5 4.5
๐ธ๐๐ข๐๐ ๐น๐๐๐: ๐ธ =
๐
๐
,
otherwise: E = np
8. 8
Measuring Disagreement with the Claimed
Distribution
We know that sample frequencies typically differ somewhat from the values
we theoretically expect, so we consider the key question:
Are the differences between the actual observed frequencies O and the
theoretically expected frequencies E significant?
To measure the discrepancy between the O and E values, we use the test statistic
Goodness-of-Fit Notation
d.f. = Number of categories minus 1
O = observed frequency
E = expected frequency
2
2
Test Statistic:
( )
1
O E
E
df k
๏ฃ
๏ญ
๏ฝ
๏ฝ ๏ญ
๏ฅ
9. Example 2:
9
To check whether consumers have any preference among
five flavors of a new fruit soda, a sample of 100 people
provided the following data. Is there enough evidence to
support the claim that there is no preference in the
selection of fruit soda flavors? Let ฮฑ = 0.05.
Cherry Strawberry Orange Lime Grape
Observed 32 28 16 14 10
Expected 20 20 20 20 20
H0: Consumers show no preference
(claim) OR ( p1 = p2 = p3 = p4 = p5 ).
H1: Consumers show a preference.(At
least one of the probabilities is different
from the others.), RTT
df = 5 โ 1 = 4 & ฮฑ = 0.05โ
2
2 ( )
:
O E
TS
E
๏ฃ
๏ญ
๏ฝ ๏ฅ
2 2 2 2 2
(32 20) (28 20) (16 20) (14 20) (10 20)
20 20 20 20 20
๏ญ ๏ญ ๏ญ ๏ญ ๏ญ
๏ฝ ๏ซ ๏ซ ๏ซ ๏ซ
Decision:
a. Reject H0
b. The claim is False
c. There is NOT sufficient
evidence to support the
claim that consumers show
no preference for the flavors.
18.000๏ฝ
Step 1: H0 , H1, claim & Tails
Step 2: TS Calculate (TS)
Step 3: CV using ฮฑ
Step 4: Make the decision to
a. Reject or not H0
b. The claim is true or false
c. Restate this decision: There is
/ is not sufficient evidence to
support the claim thatโฆ
TI Calculator:
Goodness of Fit - test
1. Stat
2. Tests
3. ๐ ๐
๐ฎ๐ถ๐ญ โ ๐ป๐๐๐
4. Enter ๐ณ ๐ & ๐ณ ๐
5. ๐๐ = ๐ง โ ๐
6. Calculate
TI Calculator: Enter data:
1. Stat
2. Edit
3. ClrList ๐ณ ๐ & ๐ณ ๐
4. O โ ๐ณ ๐ & E โ ๐ณ ๐
CV: ๐2 = 9.488
2
2 ( )
T 1: ,S
O E
df k
E
๏ฃ
๏ญ
๏ฝ ๏ฝ ๏ญ๏ฅ
10. Example 3:
10
A researcher reads that firearm-related deaths for people aged 1 to 18
were distributed as follows: 74% were accidental, 16% were homicides,
and 10% were suicides. In her district, there were 68 accidental deaths,
27 homicides, and 5 suicides during the past year. At ฮฑ = 0.10, test the
claim that the percentages are equal to the given claimed values.
Accidental Homicides Suicides
Observed 68 27 5
Expected 74 16 10
H0: p1 = 0.74, p2 = 0.16 and p3 = 0.10 (Claim)
H1: At least one of the proportions is not equal
to the given claimed value. (RTT)
2 2 2
(68 74) (27 16) (5 10)
74 16 10
๏ญ ๏ญ ๏ญ
๏ฝ ๏ซ ๏ซ 10.549๏ฝ
CV: ฮฑ = 0.10, df = k โ 1 = 2โ
Decision:
a. Reject H0
b. The claim is False
c. There is not sufficient evidence to support the
claim that the distribution is 74% accidental,
16% homicides, and 10% suicides.
2
2 ( )
:
O E
TS
E
๏ฃ
๏ญ
๏ฝ ๏ฅ
Step 1: H0 , H1, claim & Tails
Step 2: TS Calculate (TS)
Step 3: CV using ฮฑ
Step 4: Make the decision to
a. Reject or not H0
b. The claim is true or false
c. Restate this decision: There is
/ is not sufficient evidence to
support the claim thatโฆ
ฯยฒ = 4.605
2
2
Test Statistic:
( )
1
O E
E
df k
๏ฃ
๏ญ
๏ฝ
๏ฝ ๏ญ
๏ฅ