Focus Fox
A statistically minded toll collector wonders if drivers are equally
likely to choose each of the three lanes at his toll booth. He
selects a random sample from all the cars that approach the booth
when all three lanes are empty, so that the driver’s choice isn’t
influenced by the cars already at the booth.
Which of the following is the appropriate alternative hypothesis for
addressing this question?
a. The observed number of cars choosing each lane is equal.
b. The observed number of cars choosing each lane is different from the expected
number of cars.
c. The proportions of cars choosing each of the three lanes are equal.
d. The proportion of cars choosing at least one of the lanes is different from the
proportion choosing the other two lanes.
e. The proportions of cars choosing each of the three lanes are all different.
Lane                Left   Center   Right
Number of drivers   137    159      169
Chi-Square Test
We still have 3 conditions we must meet:
Large Sample Size – replaces the Normal condition used for z & t
procedures
- all expected counts must be at least 5
Random & Independent must still be met!
Chi-Square Test
To determine whether a categorical variable has a claimed
distribution, perform a chi-square goodness-of-fit test.
H0: specified distribution of categorical variable is correct
Ha: specified distribution of categorical variable is not correct
Or written symbolically using p_i for the proportion in category i:
H0: p_1 = ____, p_2 = ____, p_3 = ____, …
Ha: at least one of the p_i’s is incorrect
Find expected counts and calculate chi-square statistic
χ² = ∑ (observed – expected)² / expected
P-value is the area to the right of χ² under the density curve of the
chi-square distribution with k – 1 degrees of freedom
(k represents the number of categories for the variable)
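As a minimal sketch of the calculation above, the following Python finds the expected counts, the chi-square statistic, and the degrees of freedom for the toll-booth lane counts from the opening question. It assumes H0 says each lane is chosen by one third of drivers. The function name `chi_square_statistic` is illustrative, not part of the slides.

```python
def chi_square_statistic(observed, proportions):
    """Return (expected counts, chi-square statistic, degrees of freedom)."""
    n = sum(observed)
    # Expected count for each category = sample size * hypothesized proportion
    expected = [n * p for p in proportions]
    # Sum of (observed - expected)^2 / expected over all categories
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # k - 1
    return expected, chi_sq, df


# Lane counts from the toll-booth question: Left, Center, Right
lanes = [137, 159, 169]
expected, chi_sq, df = chi_square_statistic(lanes, [1 / 3, 1 / 3, 1 / 3])
print([round(e, 1) for e in expected], round(chi_sq, 3), df)
```

Each expected count is 465/3 = 155 cars, and the statistic is compared to a chi-square distribution with 3 – 1 = 2 degrees of freedom.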
Chi-Square Test
3 Conditions:
Random – data comes from a random sample or a randomized
experiment.
Large Sample Size – all expected counts are at least 5
Independent – individual observations are independent. When
sampling without replacement, the population is at least 10 times as
large as the sample (10% condition)
Cautions:
- Make sure you are comparing counts, not proportions
- When checking Large Sample Size, make sure to use expected
counts
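The two numeric conditions can be checked mechanically; the Random condition depends on how the data were collected and cannot be checked from the numbers. The sketch below uses the toll-booth expected counts (155 per lane, n = 465). The helper name `check_conditions` and the population size of 10,000 cars are assumptions made only for illustration.

```python
def check_conditions(expected, sample_size, population_size=None):
    """Check the Large Sample Size and 10% conditions for a chi-square test."""
    # Large Sample Size: every EXPECTED count (not observed) is at least 5
    large_sample = all(e >= 5 for e in expected)
    # 10% condition: population at least 10 times the sample size
    if population_size is None:
        ten_percent = None  # cannot be checked without a population size
    else:
        ten_percent = population_size >= 10 * sample_size
    return {"large_sample": large_sample, "ten_percent": ten_percent}


# Hypothetical population of 10,000 cars; expected counts under equal lanes
result = check_conditions([155, 155, 155], sample_size=465, population_size=10_000)
print(result)
```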
Chi-Square Test
Are births evenly distributed across the days of the week? The
one-way table below shows the distribution of births across the
days of the week in a random sample of 140 births from local
records in a large city.
Do these data give significant evidence that local births are not
equally likely on all days of the week?
SPDC:
(expected counts in Plan, graph in Do)
Day:      Sun.   Mon.   Tues.   Wed.   Thurs.   Fri.   Sat.
Births:   13     23     24      20     27       18     15
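One possible working of the Plan and Do steps for the births data is sketched below. Under H0 (births equally likely on all seven days) each expected count is 140/7 = 20. The P-value line uses a closed form for the chi-square right-tail area that holds only when the degrees of freedom are even, which is the case here (df = 6). Variable names are illustrative.

```python
import math

births = {"Sun": 13, "Mon": 23, "Tues": 24, "Wed": 20,
          "Thurs": 27, "Fri": 18, "Sat": 15}

n = sum(births.values())        # sample size
expected = n / len(births)      # same expected count for every day under H0
chi_sq = sum((o - expected) ** 2 / expected for o in births.values())
df = len(births) - 1

# Right-tail area for EVEN df only:
# exp(-h) * sum_{j < df/2} h^j / j!, where h = chi_sq / 2
h = chi_sq / 2
p_value = math.exp(-h) * sum(h ** j / math.factorial(j) for j in range(df // 2))

print(f"expected = {expected}, chi-square = {chi_sq:.2f}, "
      f"df = {df}, P-value = {p_value:.3f}")
```

The statistic comes out to about 7.6 with a P-value near 0.27, so these data do not give convincing evidence that births are unevenly distributed across the days of the week.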
Chi-Square Test
Failing to reject does NOT mean H0 is correct
We can use technology to complete the “Do”:
- Enter observed counts in L1
- Enter expected counts in L2
- STAT over to TESTS
- Select χ² GOF-Test
- Calculate gives the test statistic, df, & P-value
- Draw shows the appropriate distribution with shading
Color    Observed   Expected
Blue     9          14.4
Orange   8          12
Green    12         9.6
Yellow   15         8.4
Red      10         7.8
Brown    6          7.8
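For readers without the calculator, here is a rough Python equivalent of the GOF-Test steps, applied to the color table above. It is a sketch using only the standard library: the two lists play the roles of L1 and L2, and the helper names `chi2_right_tail` and `gof_test` are assumptions, not calculator or library functions.

```python
import math


def chi2_right_tail(x, df):
    """Area to the right of x under the chi-square density with df degrees of freedom."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    # Start from a case with a known closed form, then step up 2 df at a time.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)               # df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))    # df = 1
    while a < df / 2.0:
        # Each step adds h^a * e^(-h) / Gamma(a + 1)
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q


def gof_test(observed, expected):
    """Return (chi-square statistic, df, P-value), like the Calculate option."""
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return chi_sq, df, chi2_right_tail(chi_sq, df)


observed = [9, 8, 12, 15, 10, 6]            # "L1"
expected = [14.4, 12, 9.6, 8.4, 7.8, 7.8]   # "L2"
chi_sq, df, p_value = gof_test(observed, expected)
print(f"chi-square = {chi_sq:.3f}, df = {df}, P-value = {p_value:.4f}")
```

For this table the statistic is about 10.18 on 5 degrees of freedom, with a P-value of roughly 0.07.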
Chi-Square Test
Biologists wish to cross pairs of tobacco plants having genetic
makeup Gg, indicating that each plant has one dominant gene G
and one recessive gene g for color. Each offspring plant will
receive one gene for color from each parent. The Punnett Square
shows the possible combinations of genes received by the
offspring.
The Punnett Square suggests that the expected ratio of green GG to
yellow-green Gg to albino gg tobacco plants should be 1:2:1. The
biologists predict that 25% of the offspring will be green, 50% will
be yellow-green, and 25% will be albino.
              Parent 2
               G    g
Parent 1  G   GG   Gg
          g   Gg   gg
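The 1:2:1 ratio can be reproduced by listing every combination in the Punnett Square. This is a small illustrative sketch; it assumes each parent passes on either of its two genes with equal chance, which is what the square implies.

```python
from collections import Counter
from itertools import product

parent1 = "Gg"
parent2 = "Gg"

# One gene from each parent; sort the pair so "gG" and "Gg" count as the same genotype
genotypes = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))

total = sum(genotypes.values())
proportions = {g: count / total for g, count in genotypes.items()}
print(genotypes)
print(proportions)
```

The counts for GG, Gg, and gg are 1, 2, and 1 out of 4 combinations, matching the predicted 25%, 50%, and 25%.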
Chi-Square Test
To test their hypothesis about the distribution of offspring, the
biologists mate 84 randomly selected pairs of yellow-green
parent plants. Of 84 offspring, 23 plants were green, 50 were
yellow-green, and 11 were albino. Do these data differ
significantly from what the biologists have predicted? Carry out
an appropriate test at the α = 0.05 level to answer.
SPDC:
(expected counts in plan, graph in Do)
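A possible sketch of the Plan and Do steps for the tobacco data follows. Expected counts come from the predicted 25%, 50%, 25% split of 84 offspring. The P-value line uses the fact that, for exactly 2 degrees of freedom, the right-tail area of the chi-square distribution equals exp(–χ²/2); that shortcut does not apply to other df. Variable names are illustrative.

```python
import math

observed = {"green": 23, "yellow-green": 50, "albino": 11}
predicted = {"green": 0.25, "yellow-green": 0.50, "albino": 0.25}

n = sum(observed.values())
expected = {color: n * p for color, p in predicted.items()}

# Large Sample Size check uses the expected counts
assert all(e >= 5 for e in expected.values())

chi_sq = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)
df = len(observed) - 1

# Valid only because df == 2: right-tail area = exp(-chi_sq / 2)
p_value = math.exp(-chi_sq / 2)

alpha = 0.05
reject_h0 = p_value < alpha
print(f"chi-square = {chi_sq:.3f}, df = {df}, "
      f"P-value = {p_value:.4f}, reject H0: {reject_h0}")
```

The statistic is about 6.48 and the P-value about 0.039, which is below α = 0.05, so the data differ significantly from the predicted 1:2:1 distribution.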
Chi-Square Test
If the sample data lead to a statistically significant result, we can
conclude that our variable has a distribution different from the
specified one.
We need a Follow-Up Analysis (the “why”)
Steps:
- Examine which categories of the variable show large
deviations between the observed and expected counts
- Look at the individual terms that are summed to give χ²
- These components show which categories contribute most to the
chi-square statistic
Chi-Square Test
Ex. Tobacco Plant Offspring
Biggest contributor??
More or less than expected??
Follow-Up Analysis:
The largest contributor to the chi-square statistic is Albino
offspring. There were 10 fewer Albino plants than we expected.
Offspring Color   Observed   Expected
Green             23         21
Yellow-green      50         42
Albino            11         21
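The follow-up analysis can be sketched by computing each category's component, (observed – expected)² / expected, from the table above and picking out the largest. Names such as `components` are illustrative.

```python
observed = {"Green": 23, "Yellow-green": 50, "Albino": 11}
expected = {"Green": 21, "Yellow-green": 42, "Albino": 21}

# Individual terms that are summed to give the chi-square statistic
components = {
    color: (observed[color] - expected[color]) ** 2 / expected[color]
    for color in observed
}

largest = max(components, key=components.get)
difference = observed[largest] - expected[largest]  # negative means fewer than expected

for color, value in components.items():
    print(f"{color:13s} {value:.3f}")
print(f"Largest contributor: {largest} (observed - expected = {difference})")
```

The components are roughly 0.19 for green, 1.52 for yellow-green, and 4.76 for albino, so the albino category accounts for most of the statistic, with 10 fewer plants than expected.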

Chi square goodness of fit test
