Chi square goodness of fit

What is a
Chi-Square Test of Goodness of Fit?

Questions of goodness of fit have become
increasingly important in modern statistics.

Questions of goodness of fit juxtapose complex
observed patterns against hypothesized or
previously observed patterns
to test overall and specific
differences among them.

If the difference between the observed and the
hypothesized value is small, then the FIT IS GOOD.

For example:
Observed       Hypothesized   Difference
51% Females    50% Females    1%

If the difference is BIG, then the FIT IS NOT GOOD.

For example:
Observed       Hypothesized   Difference
50% Females    22% Females    28%

Here is an example:
We want to know if a sample we have selected
matches the national percentage of a certain
ethnic group.

Sample:      2% of the sample is made up of members of this ethnic group
Population:  10% of the population is made up of this ethnic group
Difference:  8%

You will use certain statistical methods
to determine if the goodness of fit is
significant or not.

Here is an example:
Problem – The chair of a statistics department
suspects that some of her faculty are more
popular with students than others.

There are three sections of introductory stats
that are taught at the same time in the morning
by Professors Cauforek, Kerr, and Rector.
66 students are planning on enrolling in one of
the three classes.

What would you expect the number of enrollees
to be in each class if popularity were not an
issue?

With 66 students split evenly across three
sections, we would expect 66 / 3 = 22 in each:

Professor Cauforek   Professor Kerr   Professor Rector
22                   22               22

These are our expected values.
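
As a minimal sketch of that step, the expected counts can be computed by splitting the total evenly across the sections. The variable names here are illustrative assumptions, not part of the original slides.

```python
# Expected enrollment per section if popularity were not an issue:
# every section gets an equal share of the students.
professors = ["Cauforek", "Kerr", "Rector"]
total_students = 66

expected_per_section = total_students / len(professors)
expected = {name: expected_per_section for name in professors}

for name, count in expected.items():
    print(f"Professor {name}: {count:g} expected")
```

An even split is only appropriate because the null hypothesis here says all three sections are equally popular; a different hypothesis would need different expected proportions.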

Now let’s see what was observed.

The number who enrolled in each class was:

Professor Cauforek   Professor Kerr   Professor Rector
31                   25               10

We will test the degree to which the observed
data fit the expected enrollments.

Observed   31   25   10
Expected   22   22   22

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Where:
$\chi^2$ = Chi Square
$\Sigma$ = sum of
$O$ = observed score (here 31, 25, 10)
$E$ = expected score (here 22, 22, 22)

Here is the null-hypothesis:
There is no significant difference between the
expected and the observed number of students
enrolled in three stats professors’ classes.

Now we will compute the $\chi^2$ value and compare
it with the $\chi^2$ critical value.

• If the value exceeds the critical value, then
we will reject the null-hypothesis.
• If the value DOES NOT exceed the critical
value, then we will fail to reject the null-hypothesis.

Let’s compute the $\chi^2$ value.

Expected   22   22   22
Observed   31   25   10

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

OR, with one term per professor:

$$\chi^2 = \frac{(O - E)^2}{E} + \frac{(O - E)^2}{E} + \frac{(O - E)^2}{E}$$

Let’s input each professor’s data into the
equation.

Expected   22   22   22
Observed   31   25   10

$$\chi^2 = \frac{(31 - 22)^2}{22} + \frac{(25 - 22)^2}{22} + \frac{(10 - 22)^2}{22}$$

Now for the calculation:

$$
\begin{aligned}
\chi^2 &= \frac{(31 - 22)^2}{22} + \frac{(25 - 22)^2}{22} + \frac{(10 - 22)^2}{22} \\
       &= \frac{(9)^2}{22} + \frac{(3)^2}{22} + \frac{(-12)^2}{22} \\
       &= \frac{81}{22} + \frac{9}{22} + \frac{144}{22}
\end{aligned}
$$
Convert the fractions into decimals:

$$\chi^2 = \frac{81}{22} + \frac{9}{22} + \frac{144}{22} \approx 3.7 + 0.4 + 6.5$$

Sum the terms:

$$\chi^2 = 3.7 + 0.4 + 6.5 = 10.6$$
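
The same arithmetic can be checked with a short Python sketch. It keeps each term as an exact fraction, so the only rounding happens when printing; the variable names are assumptions made for illustration.

```python
from fractions import Fraction

observed = [31, 25, 10]   # enrollments for Cauforek, Kerr, Rector
expected = [22, 22, 22]   # equal split of 66 students

# One (O - E)^2 / E term per professor, kept as exact fractions.
terms = [Fraction((o - e) ** 2, e) for o, e in zip(observed, expected)]
chi_square = sum(terms)

for o, e, t in zip(observed, expected, terms):
    print(f"({o} - {e})^2 / {e} = {float(t):.2f}")
print(f"chi-square = {float(chi_square):.2f}")
```

The unrounded total is 234/22, roughly 10.64, which agrees with the 10.6 obtained by rounding each term to one decimal place first.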

As a contrasting example, note what the $\chi^2$ value
would be if the observed and expected values
were more similar:

Expected   22   22   22
Observed   24   22   20

$$
\begin{aligned}
\chi^2 &= \frac{(O - E)^2}{E} + \frac{(O - E)^2}{E} + \frac{(O - E)^2}{E} \\
       &= \frac{(24 - 22)^2}{22} + \frac{(22 - 22)^2}{22} + \frac{(20 - 22)^2}{22} \\
       &= \frac{(2)^2}{22} + \frac{(0)^2}{22} + \frac{(-2)^2}{22} \\
       &= \frac{4}{22} + \frac{0}{22} + \frac{4}{22} \\
       &\approx 0.2 + 0.0 + 0.2 \\
       &= 0.4
\end{aligned}
$$
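
To compare the two data sets side by side, the formula can be wrapped in a small helper. This is a sketch under the assumption that both lists are in the same professor order; `chi_square_statistic` is a hypothetical name, not a library function.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


expected = [22, 22, 22]
far_fit = chi_square_statistic([31, 25, 10], expected)    # actual enrollments
close_fit = chi_square_statistic([24, 22, 20], expected)  # contrasting example

print(f"actual enrollments:  {far_fit:.2f}")
print(f"contrasting example: {close_fit:.2f}")
```

Without intermediate rounding the contrasting example comes to 8/22, roughly 0.36; rounding each term to 0.2 before adding gives the 0.4 shown above.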

So the moral of the story is that the closer the
expected and observed values are to one
another, the smaller the Chi-square value and the
greater the goodness of fit (as seen below).

Expected   22   22   22
Observed   24   22   20
$\chi^2 = 0.4$

On the other hand, the farther the expected and
observed values are from one another, the
larger the Chi-square value and the poorer the
goodness of fit (as seen below).

Expected   22   22   22
Observed   31   25   10
$\chi^2 = 10.6$

Now we determine whether a $\chi^2$ of 10.6 exceeds the
critical $\chi^2$ value.

To find the critical $\chi^2$, we must first
determine the degrees of freedom and set
the probability level.

The probability or alpha level means the
probability of a type 1 error we are willing to live
with (i.e., this is the probability of being wrong
when we reject the null hypothesis). Generally
this value is 0.05, which is like saying we are
willing to be wrong 5 out of 100 times when
we reject the null hypothesis.

Degrees of freedom are calculated by taking the
number of groups and subtracting 1.
(Three groups minus 1 = 2)

We now have all of the information we need to
determine the critical $\chi^2$.

We go to the Chi-Square Distribution Table and
locate the degrees of freedom (the row for df = 2).
And then we locate the probability or alpha level
(the column for 0.050):

df    0.100    0.050    0.025
1      2.71     3.84     5.02
2      4.61     5.99     7.38
3      6.25     7.82     9.35
4      7.78     9.49    11.14
5      9.24    11.07    12.83
6     10.64    12.59    14.45
7     12.02    14.07    16.01
8     13.36    15.51    17.54
9     14.68    16.92    19.02
…       …        …        …

Where these two values intersect in the table we
find the critical $\chi^2$: 5.99.
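
The table lookup can be sketched as a nested dictionary keyed by degrees of freedom and alpha level. Only the first six rows of the table are copied here, and the names `CRITICAL` and `critical_value` are illustrative assumptions.

```python
# First rows of the chi-square table above: CRITICAL[df][alpha]
CRITICAL = {
    1: {0.100: 2.71, 0.050: 3.84, 0.025: 5.02},
    2: {0.100: 4.61, 0.050: 5.99, 0.025: 7.38},
    3: {0.100: 6.25, 0.050: 7.82, 0.025: 9.35},
    4: {0.100: 7.78, 0.050: 9.49, 0.025: 11.14},
    5: {0.100: 9.24, 0.050: 11.07, 0.025: 12.83},
    6: {0.100: 10.64, 0.050: 12.59, 0.025: 14.45},
}


def critical_value(num_groups, alpha=0.05):
    """Look up the critical chi-square for df = num_groups - 1."""
    df = num_groups - 1
    return df, CRITICAL[df][alpha]


df, crit = critical_value(num_groups=3, alpha=0.05)
print(f"df = {df}, critical chi-square = {crit}")
```

A lookup like this only works for the degrees of freedom and alpha levels that were typed in; anything outside the copied rows raises a `KeyError`.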

Since the chi-square goodness of fit value (10.6)
exceeds the critical $\chi^2$ (5.99), we will reject the
null hypothesis:
There actually is a significant difference between
the expected and the observed enrollments.
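
Putting the pieces together, the decision rule can be sketched as a single function. The name `goodness_of_fit_decision` is hypothetical, and the critical value is passed in by hand from the table rather than computed.

```python
def goodness_of_fit_decision(observed, expected, critical):
    """Reject the null hypothesis if chi-square exceeds the critical value."""
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    reject_null = statistic > critical
    return statistic, reject_null


statistic, reject_null = goodness_of_fit_decision(
    observed=[31, 25, 10],
    expected=[22, 22, 22],
    critical=5.99,  # df = 2, alpha = 0.05, read from the table
)
verdict = "reject" if reject_null else "fail to reject"
print(f"chi-square = {statistic:.1f}, critical = 5.99: {verdict} the null hypothesis")
```

Where SciPy is available, `scipy.stats.chisquare` offers a cross-check of the statistic; that is a separate tool and is not used in the original slides.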

In summary,
Questions of goodness of fit juxtapose observed
patterns against hypothesized patterns to test
overall and specific differences among them.
