Stat 130 chi-square goodness-of-fit test

Chi-Square
Goodness-of-Fit Test
LOZANO, ALDRIN T.

Introduction
The chi-square distribution can be used for
tests concerning frequency distributions, such as:

“If a sample of buyers is given a choice of
automobile colors, will each color be selected with
the same frequency?”

Assumptions

- The data are obtained from a random sample

- The expected frequency for each category must
be 5 or more

Test for Goodness-of-Fit
The chi-square statistic can be used to see
whether a frequency distribution fits a specific
pattern.

This is referred to as the chi-square
goodness-of-fit test.

Observed Frequencies vs Expected
Frequencies
Suppose a market analyst wished to see
whether consumers have any preference among five
flavors of a new fruit soda. A sample of 100 people
provided these data:

Cherry Strawberry Orange Lime Grape
32 28 16 14 10

Frequencies
Since the frequencies for each flavor were
obtained from a sample, these actual frequencies
are called the observed frequencies.

The frequencies obtained by calculation (as if
there were no preference) are called the expected
frequencies.

Frequencies

Frequency Cherry Strawberry Orange Lime Grape
Observed 32 28 16 14 10
Expected 20 20 20 20 20
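Under the no-preference hypothesis every flavor is expected equally often, so each expected frequency is the sample size divided by the number of categories. A minimal R sketch of that calculation follows; the variable names are illustrative and do not come from the slides.

```r
# Observed counts from the fruit soda survey (taken from the table above)
observed <- c(Cherry = 32, Strawberry = 28, Orange = 16, Lime = 14, Grape = 10)

n <- sum(observed)      # sample size: 100
k <- length(observed)   # number of categories: 5

# Equal preference means every category is expected to receive n / k responses
expected <- rep(n / k, k)
names(expected) <- names(observed)

# Stack the two rows to mirror the slide's table
print(rbind(Observed = observed, Expected = expected))
```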

The formula for the chi-square goodness-of-fit
test is:
\[ X^2 = \sum \frac{(O - E)^2}{E} \]
Where:
O – observed or obtained frequency
E – expected or theoretical frequency
and the sum runs over all categories.

The degrees of freedom (df) is:

\[ df = k - 1 \]

Where:
k – number of categories

For the five soda flavors, df = 5 − 1 = 4.

Example
Is there enough evidence to reject the claim
that there is no preference in the selection of fruit
soda flavors, using the data shown previously?
Let α = 0.05.
Frequency Cherry Strawberry Orange Lime Grape
Observed 32 28 16 14 10
Expected 20 20 20 20 20

Solution
Step 1: State the hypotheses and define the claim
Ho: Consumers show no preference for flavors (claim)
Ha: Consumers show a preference

Step 2: Find the critical value
df = 4 and α = 0.05, hence, the critical value from the chi-
square distribution table is 9.488
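The table lookup can be cross-checked in R with the chi-square quantile function qchisq. This is only a sketch; the variable names are illustrative.

```r
alpha <- 0.05
df <- 5 - 1   # five flavor categories, so 4 degrees of freedom

# The test is right-tailed, so the critical value is the (1 - alpha) quantile
critical_value <- qchisq(1 - alpha, df = df)
print(round(critical_value, 3))   # 9.488
```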

Solution
Step 3: Compute X²

\[ X^2 = \sum \frac{(O - E)^2}{E} = 18.0 \]
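To see where the value 18.0 comes from, the per-flavor contributions can be computed one by one. A short R sketch, with illustrative variable names:

```r
observed <- c(32, 28, 16, 14, 10)   # Cherry, Strawberry, Orange, Lime, Grape
expected <- rep(20, 5)              # no-preference expectation

# One term (O - E)^2 / E per flavor
contributions <- (observed - expected)^2 / expected
print(contributions)   # 7.2 3.2 0.8 1.8 5.0

# The test value is the sum of the contributions
x2 <- sum(contributions)
print(x2)              # 18
```

Cherry and Grape contribute most to the statistic, since their counts lie farthest from 20.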

Solution
Step 4: Make the decision
The decision is to reject the null hypothesis, since 18.0 > 9.488

Solution
Step 5: Summarize the results
There is enough evidence to reject the claim that consumers
show no preference for the flavors.
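The same conclusion can be reached with the p-value approach. The sketch below passes the soda counts to R's chisq.test, which tests against equal probabilities when no p argument is given; the object name result is illustrative.

```r
# Goodness-of-fit test against equal probabilities (the default for a vector)
result <- chisq.test(c(32, 28, 16, 14, 10))

print(unname(result$statistic))   # 18
print(unname(result$parameter))   # 4 degrees of freedom
print(result$p.value)             # about 0.0012

# Reject the null hypothesis when the p-value is below alpha = 0.05
print(result$p.value < 0.05)      # TRUE
```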

A good fit
When the observed values
and expected values are close
together, the chi-square test value
will be small.

Then the decision will be not
to reject the null hypothesis—
hence, there is a “good fit.”

Not a good fit

When the observed values
and the expected values are far
apart, the chi-square test value will
be large. Then, the null hypothesis
will be rejected—hence, there is
“not a good fit.”

Chi-Square Goodness-of-Fit
Procedure Summary
Step 1: State the hypotheses and define the claim.
Step 2: Find the critical value. (The test is always right-tailed.)
Step 3: Compute the test value.
Step 4: Make the decision.
Step 5: Summarize the results.
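Steps 2 to 4 of the procedure can be collected into one small function. The helper gof_decision below is hypothetical (it is not part of R or of the slides) and is meant only as a sketch of the critical-value approach.

```r
# Hypothetical helper: critical-value approach to the goodness-of-fit test.
# `p` holds the hypothesized category probabilities (equal by default).
gof_decision <- function(observed,
                         p = rep(1 / length(observed), length(observed)),
                         alpha = 0.05) {
  expected <- sum(observed) * p
  critical <- qchisq(1 - alpha, df = length(observed) - 1)   # Step 2
  x2 <- sum((observed - expected)^2 / expected)              # Step 3
  list(statistic = x2, critical = critical, reject = x2 > critical)  # Step 4
}

print(gof_decision(c(32, 28, 16, 14, 10))$reject)   # TRUE  (soda flavors)
print(gof_decision(c(25, 32, 18, 20))$reject)       # FALSE (class-year counts)
```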

An example in R
Professor Bumblefuss takes a random sample of students
enrolled in Statistics 101 at ABC University. He finds the following:
there are 25 freshmen in the sample, 32 sophomores, 18 juniors,
and 20 seniors. Test the null hypothesis that freshmen,
sophomores, juniors, and seniors are equally represented among
students signed up for Stat 101.

Freshmen Sophomores Juniors Seniors
25 32 18 20

R Implementation
chisq.test(x, y = NULL, correct = TRUE, p = rep(1/length(x),
length(x)), rescale.p = FALSE, simulate.p.value = FALSE, B =
2000)
> chisq.test(c(25,32,18,20))

Chi-squared test for given probabilities

data: c(25, 32, 18, 20)
X-squared = 4.9158, df = 3, p-value = 0.1781
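The output does not show the expected counts or the decision. The sketch below, with illustrative names, pulls both from the object returned by chisq.test. With 95 students and four equally likely class years, each expected count is 95 / 4 = 23.75, and the p-value of 0.1781 is larger than 0.05, so the null hypothesis of equal representation is not rejected.

```r
students <- c(Freshmen = 25, Sophomores = 32, Juniors = 18, Seniors = 20)
fit <- chisq.test(students)

print(fit$expected)         # 23.75 for every class year
print(fit$p.value)          # about 0.178

# Decision at alpha = 0.05: FALSE means "do not reject the null hypothesis"
print(fit$p.value < 0.05)   # FALSE
```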

Another Example
A new casino game involves rolling 3 dice. The winnings are
directly proportional to the total number of sixes rolled. Suppose a
gambler plays the game 100 times, with the following observed
counts:
Number of Sixes    Number of Rolls
0 48
1 35
2 15
3 2

Another Example continued …
The casino becomes suspicious of the gambler and wishes to
determine whether the dice are fair. What do they conclude?

Another Example continued …
If a die is fair, we would expect the probability of rolling a 6 on any
given toss to be 1/6. Assuming the 3 dice are independent (the roll of
one die should not affect the roll of the others), we might assume that
the number of sixes in three rolls is distributed Binomial(3,1/6).

To determine whether the gambler's dice are fair, we may
compare his results with the results expected under this distribution.
The expected values for 0, 1, 2, and 3 sixes under the Binomial(3,1/6)
distribution are the following:

Expected Binomial Distribution values
P1 = P(roll 0 sixes) = P(X=0) = 0.58
P2 = P(roll 1 six ) = P(X=1) = 0.345
P3 = P(roll 2 sixes) = P(X=2) = 0.07
P4 = P(roll 3 sixes) = P(X=3) = 0.005
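These probabilities can be reproduced with R's binomial density function dbinom. The exact values differ slightly from the rounded figures on the slide, which appear to have been rounded so that they sum to 1; the variable name is illustrative.

```r
# P(X = 0), ..., P(X = 3) for X ~ Binomial(3, 1/6)
p_exact <- dbinom(0:3, size = 3, prob = 1 / 6)

print(round(p_exact, 4))   # 0.5787 0.3472 0.0694 0.0046
print(sum(p_exact))        # 1
```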

Expected vs Observed
Since the gambler plays 100 times, the expected counts are the
following:

Number of Sixes Expected Count Observed Count
0 58 48
1 34.5 35
2 7 15
3 0.5 2

Visual Comparison
The two plots shown below provide visual comparison of the
expected and observed values:

Chi-gram
From these graphs, it is
difficult to distinguish differences
between the observed and
expected counts. A visual
representation of the differences
is the chi-gram, which plots the
difference (observed − expected)
divided by the square root of the
expected count, as shown here:
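The values plotted in a chi-gram can be computed directly. The sketch below uses illustrative variable names and draws a simple bar plot only in an interactive session; it is not the code behind the original figure.

```r
observed <- c(48, 35, 15, 2)
expected <- c(58, 34.5, 7, 0.5)

# Chi-gram heights: (observed - expected) / sqrt(expected)
chi_values <- (observed - expected) / sqrt(expected)
print(round(chi_values, 2))   # -1.31  0.09  3.02  2.12

# Draw the chi-gram as a bar plot when running interactively
if (interactive()) {
  barplot(chi_values, names.arg = 0:3,
          xlab = "Number of sixes",
          ylab = "(observed - expected) / sqrt(expected)")
}
```

The large positive bars for 2 and 3 sixes show that the gambler rolled multiple sixes more often than the model predicts.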

Chi-Square Statistic
The chi-square statistic is the sum of the squares of the plotted
values,

\[ X^2 = \frac{(48 - 58)^2}{58} + \frac{(35 - 34.5)^2}{34.5} + \frac{(15 - 7)^2}{7} + \frac{(2 - 0.5)^2}{0.5} \]
\[ X^2 = 1.72 + 0.007 + 9.14 + 4.5 = 15.367 \]

Given this statistic, are the observed values likely under the
assumed model?

Making a decision
In the gambling example above, the chi-square test statistic X² was
calculated to be 15.367. Since k = 4 in this case (the possibilities are 0, 1, 2, and
3 sixes), the test statistic is associated with the chi-square distribution with
k − 1 = 3 degrees of freedom.

If we are interested in a significance level of 0.05, we may reject the
null hypothesis (that the dice are fair) if X² ≥ 7.815, the critical value at
the 0.05 significance level for the chi-square distribution with 3 degrees of
freedom. Since 15.367 is clearly greater than 7.815, we may reject the null
hypothesis that the dice are fair at the 0.05 significance level.
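The critical value 7.815 and the matching p-value can be checked with R's chi-square distribution functions. A short sketch with illustrative variable names:

```r
x2 <- 15.367   # test statistic computed by hand above
df <- 4 - 1    # four categories: 0, 1, 2 or 3 sixes

# Right-tailed critical value at the 0.05 significance level
critical_value <- qchisq(0.95, df = df)
print(round(critical_value, 3))   # 7.815

# Reject the null hypothesis when the statistic reaches the critical value
print(x2 >= critical_value)       # TRUE

# Upper-tail probability of the statistic under the null hypothesis
print(pchisq(x2, df = df, lower.tail = FALSE))
```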

Making a decision
Given this information, the casino can ask the gambler to take his
dice (and business) somewhere else.

R Implementation
> expected <- c(58,34.5,7,0.5)
> observed <- c(48,35,15,2)

> chisq.test(observed, p = (expected/100))

Chi-squared test for given probabilities

data: observed
X-squared = 15.3742, df = 3, p-value = 0.001523
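One caveat: the expected count for three sixes is 0.5, which falls short of the "5 or more" assumption stated earlier, and chisq.test issues a warning in this situation that the chi-squared approximation may be incorrect. One possible cross-check, sketched below, is the Monte Carlo p-value available through the simulate.p.value argument. The seed and the number of replicates B are arbitrary choices, not values from the slides.

```r
observed <- c(48, 35, 15, 2)
p_model  <- c(0.58, 0.345, 0.07, 0.005)   # rounded binomial probabilities

# Inspect the expected counts; the warning is suppressed here on purpose
fit <- suppressWarnings(chisq.test(observed, p = p_model))
print(fit$expected)            # 58.0 34.5 7.0 0.5
print(any(fit$expected < 5))   # TRUE: the approximation is questionable

# Monte Carlo p-value that does not rely on the chi-square approximation
set.seed(1)                    # arbitrary seed, for reproducibility only
sim <- chisq.test(observed, p = p_model, simulate.p.value = TRUE, B = 10000)
print(sim$p.value)
```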

References

http://www.stat.yale.edu/Courses/1997-98/101/chigf.htm

http://www.scribd.com/doc/101960970/10/CHI-SQUARE-GOODNESS-OF-FIT-PROCEDURE-SUMMARY
