© aSup-2007
CHI SQUARE   
1
The CHI SQUARE Statistic
Tests for Goodness of Fit
and Independence
Preview
• Color is known to affect human moods and
emotions. Sitting in a pale-blue room is more
calming than sitting in a bright-red room
• Based on the known influence of color, Hill
and Barton (2005) hypothesized that the
color of a uniform may influence the outcome
of physical sports contests
• The study does not produce a numerical
score for each participant. Each participant
is simply classified into one of two categories
(winning or losing)
Preview
• The data consist of frequencies or proportions
describing how many individuals are in each
category
• The study uses a hypothesis test to
evaluate the data. The null hypothesis would state
that color has no effect on the outcome of the
contest
• Statistical techniques have been developed
specifically to analyze and interpret data
consisting of frequencies or proportions:
the chi-square tests
PARAMETRIC AND NONPARAMETRIC
STATISTICAL TESTS
• Tests that concern population parameters and
require assumptions about those parameters are
called parametric tests
• Another general characteristic of parametric
tests is that they require a numerical score
for each individual in the sample. In terms
of measurement scales, parametric tests
require data from an interval or a ratio scale
PARAMETRIC AND NONPARAMETRIC
STATISTICAL TESTS
• Often, researchers are confronted with
experimental situations that do not conform to
the requirements of parametric tests. In these
situations, it may not be appropriate to use a
parametric test because it may lead to an
erroneous interpretation of the data
• Fortunately, there are several hypothesis-
testing techniques that provide alternatives to
parametric tests; these are called nonparametric tests
NONPARAMETRIC TEST
• Nonparametric tests are sometimes called
distribution-free tests
• One of the most obvious differences
between parametric and nonparametric
tests is the type of data they use
• All the parametric tests require numerical
scores. For nonparametric tests, the subjects are
usually just classified into categories
NONPARAMETRIC TEST
• Notice that these classifications involve
measurement on nominal or ordinal scales,
and they do not produce numerical values
that can be used to calculate means and
variances
• Nonparametric tests generally are not as
sensitive as parametric tests; nonparametric
tests are more likely to fail to detect a
real difference between two treatments
THE CHI-SQUARE TEST FOR GOODNESS OF FIT
The chi-square test for goodness of fit uses
sample data to test hypotheses about
the shape or proportions of a population
distribution. The test determines how well the
obtained sample proportions fit the
population proportions specified by the null
hypothesis
THE NULL HYPOTHESIS FOR THE GOODNESS OF FIT TEST
• For the chi-square test for goodness of fit, the
null hypothesis specifies the proportion (or
percentage) of the population in each category
• Generally H0 will fall into one of the following
categories:
○ No preference
H0 states that the population is divided equally
among the categories
○ No difference from a known population
H0 states that the proportions for one population are
not different from the proportions that are known to
exist for another population
THE DATA FOR THE GOODNESS OF FIT TEST
• Select a sample of n individuals and count how
many are in each category
• The resulting values are called observed
frequencies (fo)
• A sample of n = 40 participants was given a
personality questionnaire and classified into
one of three personality categories: A, B, or C
Category A   Category B   Category C
    15           19            6
EXPECTED FREQUENCIES
• The general goal of the chi-square test for goodness
of fit is to compare the data (the observed
frequencies) with the null hypothesis
• The problem is to determine how well the data fit the
distribution specified in H0, hence the name goodness of
fit
• Suppose, for example, the null hypothesis states that
the population is distributed into three categories
with the following proportions
Category A   Category B   Category C
   25%          50%          25%
EXPECTED FREQUENCIES
To find the exact frequency expected for each
category, multiply the sample size (n) by the
proportion (or percentage) from the null
hypothesis
25% of 40 = 10 individuals in category A
50% of 40 = 20 individuals in category B
25% of 40 = 10 individuals in category C
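The multiplication above can be sketched in a few lines of Python. This is only an illustration; the variable names are assumptions introduced here, while n = 40 and the 25% / 50% / 25% proportions come from the slides.

```python
# Expected frequency for each category: fe = proportion from H0 * sample size n.
n = 40  # sample size from the personality example

# Proportions specified by the null hypothesis for categories A, B and C.
null_proportions = {"A": 0.25, "B": 0.50, "C": 0.25}

expected = {category: p * n for category, p in null_proportions.items()}

for category, fe in expected.items():
    print(f"Category {category}: fe = {fe:g}")
```

Note that the expected frequencies always add up to n, just as the observed frequencies do.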
THE CHI-SQUARE STATISTIC
• The general purpose of any hypothesis test
is to determine whether the sample data
support or refute a hypothesis about a
population
• In the chi-square test for goodness of fit, the
sample is expressed as a set of observed
frequencies (fo values) and the null
hypothesis is used to generate a set of
expected frequencies (fe values)
THE CHI-SQUARE STATISTIC
• The chi-square statistic simply measures how
well the data (fo) fit the hypothesis (fe)
• The symbol for the chi-square statistic is χ²
• The formula for the chi-square statistic is
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
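As a rough illustration, the formula can be applied to the personality example from the earlier slides (fo = 15, 19, 6 and fe = 10, 20, 10). The helper name `chi_square` is an assumption made for this sketch, and the resulting value is not reported in the slides themselves.

```python
def chi_square(observed, expected):
    """Return the sum of (fo - fe)^2 / fe over all categories."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


observed = [15, 19, 6]   # fo for categories A, B, C
expected = [10, 20, 10]  # fe under H0 (25%, 50%, 25% of n = 40)

chi2 = chi_square(observed, expected)
print(f"chi-square = {chi2:.2f}")  # 2.50 + 0.05 + 1.60 = 4.15
```

A larger value means a larger discrepancy between the data and the null hypothesis.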
LEARNING CHECK
A researcher has developed three different
designs for a computer keyboard. A sample of
n = 60 participants is obtained, and each
individual tests all three keyboards and identifies
his or her favorite.
The frequency distribution of preferences is:
Design A = 23, Design B = 12, Design C = 25.
Use a chi-square test for goodness of fit with
α = .05 to determine whether there are significant
preferences among the three designs
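One way to check an answer to the keyboard exercise is sketched below. It assumes the "no preference" form of H0 (equal expected frequencies), takes df as the number of categories minus one, and uses a critical value of 5.99 taken from a standard chi-square table for df = 2 and α = .05. None of those three details are stated in the slides, so treat them as assumptions of this sketch.

```python
# Observed preferences for the three keyboard designs (from the exercise).
observed = {"A": 23, "B": 12, "C": 25}

n = sum(observed.values())  # 60 participants
k = len(observed)           # 3 categories
fe = n / k                  # assumed H0: no preference, so fe = 20 per design

chi2 = sum((fo - fe) ** 2 / fe for fo in observed.values())

df = k - 1             # assumption: df = categories - 1
critical_value = 5.99  # assumption: table value for df = 2, alpha = .05

reject_h0 = chi2 > critical_value
print(f"chi-square = {chi2:.2f}, df = {df}, reject H0: {reject_h0}")
```

Under these assumptions the statistic falls below the critical value, so the data would not show a significant preference among the designs.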
LEARNING CHECK
According to https://twitter.com/#!/palangmerah,
the percentages of blood types
in Indonesia are:
A: 25.48%,
B: 26.68%,
O: 40.77%,
AB: 6.6%
What are the blood types in our class?
Do they differ from the PMI (Indonesian Red Cross) data?
THE CHI-SQUARE TEST FOR INDEPENDENCE
• The chi-square statistic may also be used to test
whether there is a relationship between two
variables
• For example, a group of students could be
classified in terms of personality (introvert,
extrovert) and in terms of color preference
(red, white, green, or blue).
         RED   WHITE   GREEN   BLUE     ∑
INTRO     10       3      15     22    50
EXTRO     90      17      25     18   150
∑        100      20      40     40   200
OBSERVED AND EXPECTED FREQUENCIES
fo
         RED   WHITE   GREEN   BLUE     ∑
INTRO     10       3      15     22    50
EXTRO     90      17      25     18   150
∑        100      20      40     40   200
fe
         RED   WHITE   GREEN   BLUE     ∑
INTRO     25       5      10     10    50
EXTRO     75      15      30     30   150
∑        100      20      40     40   200
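The slides show the expected frequencies without spelling out how they are obtained. A minimal sketch, assuming the standard rule fe = (row total × column total) / n, reproduces the fe table from the row and column totals; the variable names are illustrative only.

```python
# Observed frequencies: rows are personality types, columns are colors.
observed = [
    [10, 3, 15, 22],   # INTRO: red, white, green, blue
    [90, 17, 25, 18],  # EXTRO: red, white, green, blue
]

row_totals = [sum(row) for row in observed]        # [50, 150]
col_totals = [sum(col) for col in zip(*observed)]  # [100, 20, 40, 40]
n = sum(row_totals)                                # 200

# Assumed rule: fe = row total * column total / n for each cell.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for label, row in zip(["INTRO", "EXTRO"], expected):
    print(label, row)
```

Because each expected value is built from the totals alone, the fe table has the same row and column sums as the fo table.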
OBSERVED AND EXPECTED FREQUENCIES
fo
           R      W      G      B      ∑
INTRO     10      3     15     22     50
EXTRO     90     17     25     18    150
∑        100     20     40     40    200
(fo − fe)²
           R        W        G        B
INTRO   (−15)²   (−2)²     (5)²    (12)²
EXTRO    (15)²    (2)²    (−5)²   (−12)²
fe
           R      W      G      B      ∑
INTRO     25      5     10     10     50
EXTRO     75     15     30     30    150
∑        100     20     40     40    200
OBSERVED AND EXPECTED FREQUENCIES
(fo − fe)² / fe
           R        W        G        B
INTRO      9      0.8      2.5     14.4
EXTRO      3    0.267    0.833      4.8
fe
           R        W        G        B
INTRO     25        5       10       10
EXTRO     75       15       30       30
(fo − fe)²
           R        W        G        B
INTRO    225        4       25      144
EXTRO    225        4       25      144
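The cell-by-cell arithmetic in these tables can be reproduced with a short sketch. The fo and fe values are taken from the slides; the variable names are assumptions made for illustration.

```python
observed = [[10, 3, 15, 22], [90, 17, 25, 18]]  # fo: INTRO row, EXTRO row
expected = [[25, 5, 10, 10], [75, 15, 30, 30]]  # fe: INTRO row, EXTRO row

# Squared deviation (fo - fe)^2 for every cell.
squared = [
    [(fo - fe) ** 2 for fo, fe in zip(o_row, e_row)]
    for o_row, e_row in zip(observed, expected)
]

# Each cell's contribution to chi-square: (fo - fe)^2 / fe.
contributions = [
    [sq / fe for sq, fe in zip(s_row, e_row)]
    for s_row, e_row in zip(squared, expected)
]

for label, row in zip(["INTRO", "EXTRO"], contributions):
    print(label, [round(value, 3) for value in row])
```

The same squared deviation gives a larger contribution in the INTRO row because its expected frequencies are smaller.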
THE CHI-SQUARE STATISTIC
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 35.6$$
df = (C − 1)(R − 1) = (3)(1) = 3
The critical value of χ² at α = .05 is 7.81
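Putting the pieces together, the sketch below recomputes the statistic and the degrees of freedom for the personality-by-color table and compares the result with the critical value quoted on the slide. The decision rule (reject H0 when the statistic exceeds the critical value) is the usual convention and is an assumption here, since the slide does not state a conclusion.

```python
observed = [[10, 3, 15, 22], [90, 17, 25, 18]]  # fo from the slides
expected = [[25, 5, 10, 10], [75, 15, 30, 30]]  # fe from the slides

# Chi-square: sum of (fo - fe)^2 / fe over every cell of the table.
chi2 = sum(
    (fo - fe) ** 2 / fe
    for o_row, e_row in zip(observed, expected)
    for fo, fe in zip(o_row, e_row)
)

rows, cols = len(observed), len(observed[0])
df = (cols - 1) * (rows - 1)  # (C - 1)(R - 1)

critical_value = 7.81  # value given on the slide for df = 3, alpha = .05

# Assumed decision rule: reject H0 when chi2 exceeds the critical value.
reject_h0 = chi2 > critical_value
print(f"chi-square = {chi2:.1f}, df = {df}, reject H0: {reject_h0}")
```

Since 35.6 is well above 7.81, this rule would reject the hypothesis that personality and color preference are independent.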
