Focus Fox
A statistically minded toll collector wonders if drivers are equally
likely to choose each of the three lanes at his toll booth. He
selects a random sample from all the cars that approach the booth
when all three lanes are empty, so that the driver’s choice isn’t
influenced by the cars already at the booth.
Which of the following is the appropriate alternative hypothesis for
addressing this question?
a. The observed number of cars choosing each lane is equal.
b. The observed number of cars choosing each lane is different from the expected
number of cars.
c. The proportions of cars choosing each of the three lanes are equal.
d. The proportion of cars choosing at least one of the lanes is different from the
proportion choosing the other two lanes.
e. The proportions of cars choosing each of the three lanes are all different.
Lane                Left    Center    Right
Number of drivers   137     159       169
We still have 3 conditions we must meet:
New condition – Large Sample Size condition
- all expected counts must be at least 5
The Large Sample Size condition takes the place of the Normal
condition used for z & t procedures
Random & Independent must still be met!
To determine whether a categorical variable has a claimed
distribution, perform a chi-square goodness-of-fit test.
H0: specified distribution of categorical variable is correct
Ha: specified distribution of categorical variable is not correct
Or written symbolically, using p_i for the proportion in category i:
H0: p_1 = ____, p_2 = ____, p_3 = ____, ...
Ha: at least one of the p_i's is incorrect
Find the expected counts and calculate the chi-square statistic:
χ² = ∑ (observed – expected)² / expected
The P-value is the area to the right of χ² under the density curve of the
chi-square distribution with k – 1 degrees of freedom
(k represents the number of categories for the variable)
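As a rough sketch of this calculation, the Python below finds the statistic and the right-tail area for the toll-booth lane counts, taking equal lane proportions as H0. The function names are illustrative only. The P-value helper is one assumed way to get the area with just the standard library (a recurrence for the incomplete gamma function); a table or calculator should give the same value.

```python
import math

def chi_square_statistic(observed, expected):
    # Sum of (observed - expected)^2 / expected over all categories.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_p_value(statistic, df):
    # Area to the right of `statistic` under the chi-square curve with df
    # degrees of freedom, i.e. Q(df/2, statistic/2). Built up with the
    # recurrence Q(a + 1, x) = Q(a, x) + x^a * e^(-x) / Gamma(a + 1).
    x = statistic / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-x)               # Q(1, x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))    # Q(1/2, x)
    while a < df / 2:
        q += x ** a * math.exp(-x) / math.gamma(a + 1)
        a += 1
    return q

observed = [137, 159, 169]                     # left, center, right lanes
n, k = sum(observed), len(observed)
expected = [n / k] * k                         # 155 per lane if lanes are equally likely
statistic = chi_square_statistic(observed, expected)
df = k - 1
p_value = chi_square_p_value(statistic, df)
print(f"chi-square = {statistic:.3f}, df = {df}, P-value = {p_value:.4f}")
```

For these counts the statistic is about 3.46 with 2 degrees of freedom, and the P-value is roughly 0.18.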
Random – the data come from a random sample or a randomized experiment
Large Sample Size – all expected counts are at least 5
Independent – individual observations are independent. When
sampling without replacement, the population is at least 10 times as large as
the sample (10% condition)
- Make sure you are comparing counts, not proportions
- When checking Large Sample Size, make sure to use expected
counts, not observed counts
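The condition checks can be sketched in Python. This is a minimal illustration using the toll-booth lane counts with equal lane proportions as H0. The function names and the population size of 10,000 cars are made up for the example.

```python
def large_sample_ok(expected, minimum=5):
    # Large Sample Size: every EXPECTED count (not observed) is at least 5.
    return all(e >= minimum for e in expected)

def ten_percent_ok(sample_size, population_size):
    # 10% condition when sampling without replacement:
    # the population is at least 10 times as large as the sample.
    return population_size >= 10 * sample_size

observed = [137, 159, 169]                        # toll-booth lane counts
n = sum(observed)
expected = [n / len(observed)] * len(observed)    # 155 per lane under H0

print("Large Sample Size met:", large_sample_ok(expected))
# 10_000 is a hypothetical population size, not a figure from the notes.
print("10% condition met:", ten_percent_ok(n, 10_000))
```

The Random condition cannot be checked from the counts alone; it depends on how the data were collected.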
Are births evenly distributed across the days of the week? The
one-way table below shows the distribution of births across the
days of the week in a random sample of 140 births from local
records in a large city.
Do these data give significant evidence that local births are not
equally likely on all days of the week?
(expected counts in Plan, graph in Do)
Day:     Sun.  Mon.  Tues.  Wed.  Thurs.  Fri.  Sat.
Births:  13    23    24     20    27      18    15
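A possible way to check the Plan and Do steps for the births data in Python is sketched below. It assumes H0 is that births are equally likely on all seven days and uses α = 0.05, which is an assumption since the problem does not state a level. Because df = 6 is even, the right-tail area has a short closed form, which is used here in place of a table.

```python
import math

births = [13, 23, 24, 20, 27, 18, 15]   # Sun. through Sat.

n, k = sum(births), len(births)
expected = n / k                        # Plan: 140 / 7 = 20 births per day under H0
assert expected >= 5                    # Large Sample Size condition

# Do: chi-square statistic and degrees of freedom
statistic = sum((o - expected) ** 2 / expected for o in births)
df = k - 1

# For even df, the area to the right of the statistic is
# e^(-x) * (sum of x^i / i! for i = 0 .. df/2 - 1), where x = statistic / 2.
x = statistic / 2
p_value = math.exp(-x) * sum(x ** i / math.factorial(i) for i in range(df // 2))

alpha = 0.05                            # assumed significance level
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"chi-square = {statistic:.2f}, df = {df}, P-value = {p_value:.3f}: {decision}")
```

With these counts the statistic comes out to 7.6 and the P-value is about 0.27, so at the assumed level the sketch fails to reject H0.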
Failing to reject does NOT mean H0 is correct
We can use technology to complete the “Do”
- Enter observed counts in L1
- Enter expected counts in L2
- STAT over to TESTS
- Select χ2 GOF-Test
Calculate gives the test statistic, df, and P-value
Draw will provide the appropriate distribution with shading
Color     Observed   Expected
Blue      9          14.4
Orange    8          12
Green     12         9.6
Yellow    15         8.4
Red       10         7.8
Brown     6          7.8
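The calculator steps have a simple analogue in Python. This sketch treats two lists as the calculator's L1 and L2, using the color counts from the table as sample entries, and reports the statistic and df. The names `l1` and `l2` are just illustrative. It also checks that the two lists have the same total, a quick guard against entry errors.

```python
import math

l1 = [9, 8, 12, 15, 10, 6]               # observed counts: Blue ... Brown
l2 = [14.4, 12, 9.6, 8.4, 7.8, 7.8]      # expected counts, same order

# Observed and expected totals should match (both are 60 here).
assert math.isclose(sum(l1), sum(l2))

# Chi-square statistic: sum of (observed - expected)^2 / expected
statistic = sum((o - e) ** 2 / e for o, e in zip(l1, l2))
df = len(l1) - 1                         # number of categories minus 1
print(f"chi-square = {statistic:.2f}, df = {df}")
```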
Biologists wish to cross pairs of tobacco plants having genetic
makeup Gg, indicating that each plant has one dominant gene G
and one recessive gene g for color. Each offspring plant will
receive one gene for color from each parent. The Punnett Square
shows the possible combinations of genes received by the offspring.
The Punnett Square suggests
that the expected ratio of green
GG to yellow-green Gg to albino
gg tobacco plants should be 1:2:1. The biologists predict that
25% of the offspring will be green, 50% will be yellow-green,
and 25% will be albino.
      G     g
G     GG    Gg
g     Gg    gg
To test their hypothesis about the distribution of offspring, the
biologists mate 84 randomly selected pairs of yellow-green
parent plants. Of 84 offspring, 23 plants were green, 50 were
yellow-green, and 11 were albino. Do these data differ
significantly from what the biologists have predicted? Carry out
an appropriate test at the α = 0.05 level to answer.
(expected counts in Plan, graph in Do)
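One way to check the work on this problem is the short Python sketch below. It builds the expected counts from the predicted 1:2:1 ratio, computes the statistic, and compares the P-value to α = 0.05. The variable names are illustrative. With df = 2 the right-tail area simplifies to e^(-χ²/2), which is used here in place of a table.

```python
import math

observed = {"green": 23, "yellow-green": 50, "albino": 11}
ratio = {"green": 1, "yellow-green": 2, "albino": 1}      # predicted 1:2:1

n = sum(observed.values())                                # 84 offspring
total_parts = sum(ratio.values())
expected = {c: n * r / total_parts for c, r in ratio.items()}   # 21, 42, 21
assert all(e >= 5 for e in expected.values())             # Large Sample Size

statistic = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)
df = len(observed) - 1

# For df = 2, the area to the right of the statistic is e^(-statistic / 2).
p_value = math.exp(-statistic / 2)

alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"chi-square = {statistic:.3f}, df = {df}, P-value = {p_value:.4f}: {decision}")
```

The statistic is about 6.48 and the P-value about 0.039, which is below 0.05.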
If the sample data lead to a statistically significant result, we can
conclude that our variable has a distribution different from the
specified one.
We need a Follow-Up Analysis (the “why”)
- Examine which categories of the variable show large
deviations between the observed and expected counts
- Look at the individual terms that add up to χ²
- These components show which terms contribute most to the
chi-square statistic
Ex. Tobacco Plant Offspring
More or fewer than expected?
The largest contributor to the chi-square statistic is Albino
offspring. There were 10 fewer Albino plants than we expected.
Offspring Color   Observed   Expected
Green             23         21
Yellow-green      50         42
Albino            11         21
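The follow-up analysis can be sketched in Python by listing each component of the statistic next to the direction of its deviation. The variable names are illustrative; the counts are the ones in the table.

```python
rows = [("Green", 23, 21), ("Yellow-green", 50, 42), ("Albino", 11, 21)]

# Each term (observed - expected)^2 / expected is one component of chi-square.
components = {color: (obs - expected) ** 2 / expected
              for color, obs, expected in rows}

for color, obs, expected in rows:
    direction = "more" if obs > expected else "fewer"
    print(f"{color}: component {components[color]:.3f}, "
          f"{abs(obs - expected)} {direction} than expected")

largest = max(components, key=components.get)
print("Largest contributor:", largest)
```

The Albino component (about 4.76) accounts for most of the total of about 6.48.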