Chapter 12: Analysis of Categorical Data 1
Chapter 12
Analysis of Categorical Data
LEARNING OBJECTIVES
This chapter presents several nonparametric statistics that can be used to analyze
categorical data, enabling you to:
1. Understand the chi-square goodness-of-fit test and how to use it.
2. Analyze data using the chi-square test of independence.
CHAPTER TEACHING STRATEGY
Chapter 12 covers the two most prevalent chi-square tests: the chi-square
goodness-of-fit test and the chi-square test of independence. These two techniques are
important because they give the statistician a tool that is particularly useful for analyzing
nominal data (although the categories of a variable can sometimes be ordinal or of a
higher level of measurement). It should be emphasized that in many instances in business
research the data gathered are merely categorical identifications. For
example, in segmenting the market place (consumers or industrial users), information is
gathered regarding gender, income level, geographical location, political affiliation,
religious preference, ethnicity, occupation, size of company, type of industry, etc. On
these variables, the measurement is often a tallying of the frequency of occurrence of
individuals, items, or companies in each category. The subject of the research is given no
"score" or "measurement" other than a 0/1 for being a member or not of a given category.
These two chi-square tests are perfectly tailored to analyze such data.
The chi-square goodness-of-fit test examines the categories of one variable to
determine if the distribution of observed occurrences matches some expected or
theoretical distribution of occurrences. It can be used to determine if some standard or
previously known distribution of proportions is the same as some observed distribution of
proportions. It can also be used to validate the theoretical distribution of occurrences of
phenomena such as random arrivals which are often assumed to be Poisson distributed.
You will note that the degrees of freedom which are k - 1 for a given set of expected
values or for the uniform distribution change to k - 2 for an expected Poisson distribution
and to k - 3 for an expected normal distribution. To conduct a chi-square goodness-of-fit
test to analyze an expected Poisson distribution, the value of lambda must be estimated
from the observed data. This causes the loss of an additional degree of freedom. With
the normal distribution, both the mean and standard deviation of the expected distribution
are estimated from the observed values causing the loss of two additional degrees of
freedom from the k - 1 value.
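The degrees-of-freedom rule in this paragraph can be sketched in a few lines of Python. The function and parameter names are illustrative assumptions, not part of the text; the rule itself (k - 1, less one for each parameter estimated from the sample) is the one stated above.

```python
def gof_degrees_of_freedom(k, estimated_params=0):
    """Degrees of freedom for a chi-square goodness-of-fit test.

    k: number of categories (after any collapsing).
    estimated_params: parameters estimated from the observed data
        (0 for given or uniform expected values, 1 for Poisson,
        2 for normal).
    """
    df = k - 1 - estimated_params
    if df < 1:
        raise ValueError("too few categories for this test")
    return df


print(gof_degrees_of_freedom(6))      # given expected values: k - 1
print(gof_degrees_of_freedom(3, 1))   # Poisson, lambda estimated: k - 2
print(gof_degrees_of_freedom(7, 2))   # normal, mean and s estimated: k - 3
```

Note that k is counted after categories have been collapsed, which is how the solutions in this chapter count it.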
The chi-square test of independence compares the observed frequencies
across the categories of two variables to expected values to determine whether the
two variables are independent. If the variables are not independent,
they are dependent, or related. This allows business researchers to reach some
conclusions about such questions as whether smoking is independent of gender or whether type of
housing preferred is independent of geographic region. The chi-square test of independence
is often used as a tool for preliminary analysis of data gathered in exploratory research
where the researcher has little idea of what variables seem to be related to what variables,
and the data are nominal. This test is particularly useful with demographic type data.
A word of warning is appropriate here. When an expected frequency is small, the
observed chi-square value can be inordinately large thus yielding an increased possibility
of committing a Type I error. The research on this problem has yielded varying results
with some authors indicating that expected values as low as two or three are acceptable
and other researchers demanding that expected values be ten or more. In this text, we
have settled on the fairly widely accepted criterion of five or more.
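As a minimal sketch of how the five-or-more criterion is applied in the solutions that follow, the Python function below folds tail categories into their neighbors until each tail has an expected frequency of at least five. The function name is hypothetical, and it only examines the two tails (which is all the solutions in this chapter require); the sample figures are the expected frequencies from problem 12.7, with observed counts of zero assumed for the open-ended tails.

```python
def collapse_tails(observed, expected, minimum=5.0):
    """Fold tail categories inward until each tail's expected
    frequency is at least `minimum`. Interior cells are not checked."""
    obs, exp = list(observed), list(expected)
    # Lower tail: merge the first category into the second.
    while len(exp) > 1 and exp[0] < minimum:
        obs[1] += obs[0]
        exp[1] += exp[0]
        del obs[0], exp[0]
    # Upper tail: merge the last category into the one before it.
    while len(exp) > 1 and exp[-1] < minimum:
        obs[-2] += obs[-1]
        exp[-2] += exp[-1]
        del obs[-1], exp[-1]
    return obs, exp


# Categories: <10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, >70
f_obs = [0, 16, 44, 61, 56, 35, 19, 0]
f_exp = [3.37, 13.93, 37.88, 63.09, 61.08, 36.22, 12.45, 2.98]
obs, exp = collapse_tails(f_obs, f_exp)
print(obs)
print([round(e, 2) for e in exp])
```

The number of categories k used for the degrees of freedom is the length of the collapsed lists.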
CHAPTER OUTLINE
12.1 Chi-Square Goodness-of-Fit Test
     Testing a Population Proportion Using the Chi-Square Goodness-of-Fit
     Test as an Alternative Technique to the z Test
12.2 Contingency Analysis: Chi-Square Test of Independence
KEY TERMS
Categorical Data
Chi-Square Distribution
Chi-Square Goodness-of-Fit Test
Chi-Square Test of Independence
Contingency Analysis
Contingency Table
SOLUTIONS TO CHAPTER 12
12.1    $f_o$    $f_e$    $(f_o - f_e)^2 / f_e$
        53       68       3.309
        37       42       0.595
        32       33       0.030
        28       22       1.636
        18       10       6.400
        15        8       6.125
Ho: The observed distribution is the same as the expected distribution.
Ha: The observed distribution is not the same as the expected distribution.
Observed $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 18.095$
df = k - 1 = 6 - 1 = 5, α = .05
$\chi^2_{.05,5} = 11.07$
Since the observed $\chi^2 = 18.095 > \chi^2_{.05,5} = 11.07$, the decision is to reject the null
hypothesis.
The observed frequencies are not distributed the same as the expected
frequencies.
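The calculation in 12.1 can be checked with a short Python sketch. The function name is an illustrative assumption; the critical value is the table value quoted in the solution, not something the code derives.

```python
def chi_square_statistic(observed, expected):
    """Sum of (f_o - f_e)^2 / f_e over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


observed = [53, 37, 32, 28, 18, 15]
expected = [68, 42, 33, 22, 10, 8]
chi2 = chi_square_statistic(observed, expected)
critical = 11.07  # table value for alpha = .05, df = 5 (from the solution)
decision = "reject Ho" if chi2 > critical else "fail to reject Ho"
print(round(chi2, 3), decision)
```

Because the code sums unrounded terms, its result can differ from the hand calculation in the third decimal place.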
12.2    $f_o$    $f_e$    $(f_o - f_e)^2 / f_e$
        19       18       0.056
        17       18       0.056
        14       18       0.889
        18       18       0.000
        19       18       0.056
        21       18       0.500
        18       18       0.000
        18       18       0.000
        $\sum f_o$ = 144    $\sum f_e$ = 144    1.557
Ho: The observed frequencies are uniformly distributed.
Ha: The observed frequencies are not uniformly distributed.
$\bar{x} = \frac{\sum f_o}{k} = \frac{144}{8} = 18$
In this uniform distribution, each $f_e$ = 18
df = k - 1 = 8 - 1 = 7, α = .01
$\chi^2_{.01,7} = 18.4753$
Observed $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 1.557$
Since the observed $\chi^2 = 1.557 < \chi^2_{.01,7} = 18.4753$, the decision is to fail to reject
the null hypothesis.
There is no reason to conclude that the frequencies are not uniformly
distributed.
12.3    Number    $f_o$    (Number)($f_o$)
        0         28        0
        1         17       17
        2         11       22
        3          5       15
                  61       54
Ho: The frequency distribution is Poisson.
Ha: The frequency distribution is not Poisson.
$\lambda = \frac{54}{61} \approx 0.9$
                  Expected       Expected
        Number    Probability    Frequency
        0         .4066          24.803
        1         .3659          22.312
        2         .1647          10.047
        ≥ 3       .0628           3.831
Since $f_e$ for ≥ 3 is less than 5, collapse categories 2 and ≥ 3:
        Number    $f_o$    $f_e$      $(f_o - f_e)^2 / f_e$
        0         28       24.803     0.412
        1         17       22.312     1.265
        ≥ 2       16       13.878     0.324
                  61       60.993     2.001
df = k - 2 = 3 - 2 = 1, α = .05
$\chi^2_{.05,1} = 3.84146$
Calculated $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 2.001$
Since the observed $\chi^2 = 2.001 < \chi^2_{.05,1} = 3.84146$, the decision is to fail to reject
the null hypothesis.
There is insufficient evidence to reject the claim that the distribution is Poisson.
The data are consistent with a Poisson distribution.
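The expected Poisson frequencies in 12.3 can be reproduced approximately with the sketch below. The helper name is hypothetical, and the code computes probabilities directly rather than reading them from a Poisson table, so the results agree with the solution only to about two decimal places.

```python
import math


def poisson_expected(lam, max_count, n):
    """Expected frequencies for counts 0..max_count-1 plus a final
    'max_count or more' category, for n observations."""
    probs = [math.exp(-lam) * lam ** x / math.factorial(x)
             for x in range(max_count)]
    probs.append(1.0 - sum(probs))  # upper tail: P(X >= max_count)
    return [p * n for p in probs]


counts = [0, 1, 2, 3]
f_obs = [28, 17, 11, 5]
n = sum(f_obs)
# Lambda is estimated from the sample, which costs one degree of freedom.
lam_hat = sum(c * f for c, f in zip(counts, f_obs)) / n
f_exp = poisson_expected(0.9, 3, n)  # the solution rounds lambda to 0.9
print(round(lam_hat, 3), [round(fe, 2) for fe in f_exp])
```

The last category is treated as "3 or more" so that the expected frequencies sum to n.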
12.4
        Category    f (observed)    Midpoint M    fM        $fM^2$
        10-20         6             15               90      1,350
        20-30        14             25              350      8,750
        30-40        29             35            1,015     35,525
        40-50        38             45            1,710     76,950
        50-60        25             55            1,375     75,625
        60-70        10             65              650     42,250
        70-80         7             75              525     39,375
        n = $\sum f$ = 129          $\sum fM$ = 5,715       $\sum fM^2$ = 279,825
$\bar{x} = \frac{\sum fM}{\sum f} = \frac{5{,}715}{129} = 44.3$
$s = \sqrt{\frac{\sum fM^2 - \frac{(\sum fM)^2}{n}}{n - 1}} = \sqrt{\frac{279{,}825 - \frac{(5{,}715)^2}{129}}{128}} = 14.43$
Ho: The observed frequencies are normally distributed.
Ha: The observed frequencies are not normally distributed.
For Category 10-20                                      Prob
for x = 10, $z = \frac{10 - 44.3}{14.43} = -2.38$       .4913
for x = 20, $z = \frac{20 - 44.3}{14.43} = -1.68$      -.4535
Expected prob.:                                         .0378
For Category 20-30                                      Prob
for x = 20, z = -1.68                                   .4535
for x = 30, $z = \frac{30 - 44.3}{14.43} = -0.99$      -.3389
Expected prob.:                                         .1146
For Category 30-40                                      Prob
for x = 30, z = -0.99                                   .3389
for x = 40, $z = \frac{40 - 44.3}{14.43} = -0.30$      -.1179
Expected prob.:                                         .2210
For Category 40-50                                      Prob
for x = 40, z = -0.30                                   .1179
for x = 50, $z = \frac{50 - 44.3}{14.43} = 0.40$       +.1554
Expected prob.:                                         .2733
For Category 50-60                                      Prob
for x = 60, $z = \frac{60 - 44.3}{14.43} = 1.09$        .3621
for x = 50, z = 0.40                                   -.1554
Expected prob.:                                         .2067
For Category 60-70                                      Prob
for x = 70, $z = \frac{70 - 44.3}{14.43} = 1.78$        .4625
for x = 60, z = 1.09                                   -.3621
Expected prob.:                                         .1004
For Category 70-80                                      Prob
for x = 80, $z = \frac{80 - 44.3}{14.43} = 2.47$        .4932
for x = 70, z = 1.78                                   -.4625
Expected prob.:                                         .0307
For < 10:
Probability between 10 and the mean, 44.3 = (.0378 + .1146 + .2210
+ .1179) = .4913. Probability < 10 = .5000 - .4913 = .0087
For > 80:
Probability between 80 and the mean, 44.3 = (.0307 + .1004 + .2067 + .1554) =
.4932. Probability > 80 = .5000 - .4932 = .0068
        Category    Prob     Expected frequency
        < 10        .0087    .0087(129) =  1.12
        10-20       .0378    .0378(129) =  4.88
        20-30       .1146    14.78
        30-40       .2210    28.51
        40-50       .2733    35.26
        50-60       .2067    26.66
        60-70       .1004    12.95
        70-80       .0307     3.96
        > 80        .0068     0.88
Due to the small sizes of expected frequencies, category < 10 is folded into 10-20
and > 80 into 70-80.
        Category    $f_o$    $f_e$     $(f_o - f_e)^2 / f_e$
        10-20        6        6.00     .000
        20-30       14       14.78     .041
        30-40       29       28.51     .008
        40-50       38       35.26     .213
        50-60       25       26.66     .103
        60-70       10       12.95     .672
        70-80        7        4.84     .964
                                      2.001
Calculated $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 2.001$
df = k - 3 = 7 - 3 = 4, α = .05
$\chi^2_{.05,4} = 9.48773$
Since the observed $\chi^2 = 2.001 < \chi^2_{.05,4} = 9.48773$, the decision is to fail to reject
the null hypothesis. There is not enough evidence to declare that the observed
frequencies are not normally distributed.
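The table lookups in 12.4 can be approximated in Python with the error function from the standard library. This is a sketch under stated assumptions: the function names are hypothetical, and because the code uses unrounded z values, its probabilities differ slightly from the table-based figures above.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def interval_probabilities(edges, mean, sd):
    """Probabilities for (-inf, e0), [e0, e1), ..., [e_last, +inf)."""
    cdf = [normal_cdf(e, mean, sd) for e in edges]
    probs = [cdf[0]]                                   # lower tail
    probs += [b - a for a, b in zip(cdf, cdf[1:])]     # interior classes
    probs.append(1.0 - cdf[-1])                        # upper tail
    return probs


edges = [10, 20, 30, 40, 50, 60, 70, 80]
probs = interval_probabilities(edges, mean=44.3, sd=14.43)
expected = [p * 129 for p in probs]  # n = 129 observations
for p, fe in zip(probs, expected):
    print(round(p, 4), round(fe, 2))
```

The first and last entries are the open-ended tails (< 10 and > 80), which the solution folds into the neighboring classes.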
12.5    Definition               $f_o$    Exp. Prop.    $f_e$                 $(f_o - f_e)^2 / f_e$
        Happiness                 42      .39           227(.39) = 88.53       24.46
        Sales/Profit              95      .12           227(.12) = 27.24      168.55
        Helping Others            27      .18           40.86                   4.70
        Achievement/Challenge     63      .31           70.37                   0.77
                                 227                                          198.48
Ho: The observed frequencies are distributed the same as the expected
frequencies.
Ha: The observed frequencies are not distributed the same as the expected
frequencies.
Observed $\chi^2 = 198.48$
df = k - 1 = 4 - 1 = 3, α = .05
$\chi^2_{.05,3} = 7.81473$
Since the observed $\chi^2 = 198.48 > \chi^2_{.05,3} = 7.81473$, the decision is to reject the
null hypothesis.
The observed frequencies for men are not distributed the same as the
expected frequencies which are based on the responses of women.
12.6    Age      $f_o$    Prop. from survey    $f_e$                   $(f_o - f_e)^2 / f_e$
        10-14    22       .09                  (.09)(212) = 19.08       0.45
        15-19    50       .23                  (.23)(212) = 48.76       0.03
        20-24    43       .22                  46.64                    0.28
        25-29    29       .14                  29.68                    0.02
        30-34    19       .10                  21.20                    0.23
        ≥ 35     49       .22                  46.64                    0.12
                212                                                     1.13
Ho: The distribution of observed frequencies is the same as the distribution of
expected frequencies.
Ha: The distribution of observed frequencies is not the same as the distribution of
expected frequencies.
α = .01, df = k - 1 = 6 - 1 = 5
$\chi^2_{.01,5} = 15.0863$
The observed $\chi^2 = 1.13$
Since the observed $\chi^2 = 1.13 < \chi^2_{.01,5} = 15.0863$, the decision is to fail to reject
the null hypothesis.
There is not enough evidence to declare that the distribution of observed
frequencies is different from the distribution of expected frequencies.
12.7    Age      $f_o$    M     fM        $fM^2$
        10-20    16       15      240       3,600
        20-30    44       25    1,100      27,500
        30-40    61       35    2,135      74,725
        40-50    56       45    2,520     113,400
        50-60    35       55    1,925     105,875
        60-70    19       65    1,235      80,275
                231             $\sum fM$ = 9,155    $\sum fM^2$ = 405,375
$\bar{x} = \frac{\sum fM}{n} = \frac{9{,}155}{231} = 39.63$
$s = \sqrt{\frac{\sum fM^2 - \frac{(\sum fM)^2}{n}}{n - 1}} = \sqrt{\frac{405{,}375 - \frac{(9{,}155)^2}{231}}{230}} = 13.6$
Ho: The observed frequencies are normally distributed.
Ha: The observed frequencies are not normally distributed.
For Category 10-20                                      Prob
for x = 10, $z = \frac{10 - 39.63}{13.6} = -2.18$       .4854
for x = 20, $z = \frac{20 - 39.63}{13.6} = -1.44$      -.4251
Expected prob.                                          .0603
For Category 20-30                                      Prob
for x = 20, z = -1.44                                   .4251
for x = 30, $z = \frac{30 - 39.63}{13.6} = -0.71$      -.2611
Expected prob.                                          .1640
For Category 30-40                                      Prob
for x = 30, z = -0.71                                   .2611
for x = 40, $z = \frac{40 - 39.63}{13.6} = 0.03$       +.0120
Expected prob.                                          .2731
For Category 40-50                                      Prob
for x = 50, $z = \frac{50 - 39.63}{13.6} = 0.76$        .2764
for x = 40, z = 0.03                                   -.0120
Expected prob.                                          .2644
For Category 50-60                                      Prob
for x = 60, $z = \frac{60 - 39.63}{13.6} = 1.50$        .4332
for x = 50, z = 0.76                                   -.2764
Expected prob.                                          .1568
For Category 60-70                                      Prob
for x = 70, $z = \frac{70 - 39.63}{13.6} = 2.23$        .4871
for x = 60, z = 1.50                                   -.4332
Expected prob.                                          .0539
For < 10:
Probability between 10 and the mean = .0603 + .1640 + .2611 = .4854
Probability < 10 = .5000 - .4854 = .0146
For > 70:
Probability between 70 and the mean = .0120 + .2644 + .1568 + .0539 = .4871
Probability > 70 = .5000 - .4871 = .0129
        Age      Probability    $f_e$
        < 10     .0146          (.0146)(231) =  3.37
        10-20    .0603          (.0603)(231) = 13.93
        20-30    .1640          37.88
        30-40    .2731          63.09
        40-50    .2644          61.08
        50-60    .1568          36.22
        60-70    .0539          12.45
        > 70     .0129           2.98
The expected frequencies for categories < 10 and > 70 are less than 5.
Collapse the < 10 into 10-20 and > 70 into 60-70.
        Age      $f_o$    $f_e$     $(f_o - f_e)^2 / f_e$
        10-20    16       17.30     0.10
        20-30    44       37.88     0.99
        30-40    61       63.09     0.07
        40-50    56       61.08     0.42
        50-60    35       36.22     0.04
        60-70    19       15.43     0.83
                                    2.45
df = k - 3 = 6 - 3 = 3, α = .05
$\chi^2_{.05,3} = 7.81473$
Observed $\chi^2 = 2.45$
Since the observed $\chi^2 = 2.45 < \chi^2_{.05,3} = 7.81473$, the decision is to fail to reject the null
hypothesis.
There is no reason to reject that the observed frequencies are normally
distributed.
12.8    Number       f     (f)·(number)
        0            18      0
        1            28     28
        2            47     94
        3            21     63
        4            16     64
        5            11     55
        6 or more     9     54
        $\sum f$ = 150      $\sum f \cdot (\text{number})$ = 358
$\lambda = \frac{\sum f \cdot (\text{number})}{\sum f} = \frac{358}{150} \approx 2.4$
Ho: The observed frequencies are Poisson distributed.
Ha: The observed frequencies are not Poisson distributed.
        Number       Probability    $f_e$
        0            .0907          (.0907)(150) = 13.61
        1            .2177          (.2177)(150) = 32.66
        2            .2613          39.20
        3            .2090          31.35
        4            .1254          18.81
        5            .0602           9.03
        6 or more    .0358           5.36
        $f_o$    $f_e$     $(f_o - f_e)^2 / f_e$
        18       13.61     1.42
        28       32.66     0.66
        47       39.20     1.55
        21       31.35     3.42
        16       18.81     0.42
        11        9.03     0.43
         9        5.36     2.47
                          10.37
The observed $\chi^2 = 10.37$
α = .01, df = k - 2 = 7 - 2 = 5, $\chi^2_{.01,5} = 15.0863$
Since the observed $\chi^2 = 10.37 < \chi^2_{.01,5} = 15.0863$, the decision is to fail to reject
the null hypothesis.
There is not enough evidence to reject the claim that the observed
frequencies are Poisson distributed.
12.9    H0: p = .28        n = 270        x = 62
        Ha: p ≠ .28
                            $f_o$    $f_e$                 $(f_o - f_e)^2 / f_e$
        Spend More           62      270(.28) =  75.6      2.44656
        Don't Spend More    208      270(.72) = 194.4      0.95144
        Total               270                 270.0      3.39800
The observed value of $\chi^2$ is 3.398
α = .05 and α/2 = .025     df = k - 1 = 2 - 1 = 1
$\chi^2_{.025,1} = 5.02389$
Since the observed $\chi^2 = 3.398 < \chi^2_{.025,1} = 5.02389$, the decision is to fail to
reject the null hypothesis.
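Problem 12.9 uses the goodness-of-fit test as an alternative to the z test for a population proportion. The sketch below, using the figures from 12.9 and illustrative variable names, shows why the two are interchangeable: with two categories, the chi-square statistic equals the square of the z statistic.

```python
import math

n, x, p0 = 270, 62, 0.28

# z statistic for a single proportion
p_hat = x / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# chi-square goodness-of-fit statistic on the two categories
observed = [x, n - x]
expected = [n * p0, n * (1 - p0)]
chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

print(round(z, 3), round(z ** 2, 3), round(chi2, 3))
```

Algebraically, both statistics reduce to $(x - np_0)^2 / (n p_0 (1 - p_0))$, so the agreement is exact apart from floating-point rounding.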
12.10   H0: p = .30        n = 180        x = 42
        Ha: p ≠ .30
                            $f_o$    $f_e$               $(f_o - f_e)^2 / f_e$
        Provide              42      180(.30) =  54      2.6666
        Don't Provide       138      180(.70) = 126      1.1429
        Total               180                 180      3.8095
The observed value of $\chi^2$ is 3.8095
α = .05 and α/2 = .025     df = k - 1 = 2 - 1 = 1
$\chi^2_{.025,1} = 5.02389$
Since the observed $\chi^2 = 3.8095 < \chi^2_{.025,1} = 5.02389$, the decision is to fail to
reject the null hypothesis.
12.11
                         Variable Two
        Variable One     203     326     529
                          68     110     178
                         271     436     707
Ho: Variable One is independent of Variable Two.
Ha: Variable One is not independent of Variable Two.
$e_{11} = \frac{(529)(271)}{707} = 202.77$     $e_{12} = \frac{(529)(436)}{707} = 326.23$
$e_{21} = \frac{(178)(271)}{707} = 68.23$      $e_{22} = \frac{(178)(436)}{707} = 109.77$
Observed frequencies, with expected frequencies in parentheses:
                         Variable Two
        Variable One     203 (202.77)    326 (326.23)    529
                          68  (68.23)    110 (109.77)    178
                         271             436             707
$\chi^2 = \frac{(203 - 202.77)^2}{202.77} + \frac{(326 - 326.23)^2}{326.23} + \frac{(68 - 68.23)^2}{68.23} + \frac{(110 - 109.77)^2}{109.77}$
= .00 + .00 + .00 + .00 = 0.00
α = .05, df = (c-1)(r-1) = (2-1)(2-1) = 1
$\chi^2_{.05,1} = 3.84146$
Since the observed $\chi^2 = 0.00 < \chi^2_{.05,1} = 3.84146$, the decision is to fail to reject
the null hypothesis.
Variable One is independent of Variable Two.
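The expected-frequency rule used in 12.11, (row total)(column total)/N, generalizes to any r x c table. The following Python sketch applies it to the 12.11 data; the function names are illustrative assumptions, not part of the text.

```python
def expected_frequencies(table):
    """Expected cell counts under independence:
    (row total)(column total) / N for each cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    expected = expected_frequencies(table)
    chi2 = sum((fo - fe) ** 2 / fe
               for obs_row, exp_row in zip(table, expected)
               for fo, fe in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


table = [[203, 326],
         [68, 110]]
expected = expected_frequencies(table)
chi2, df = chi_square_independence(table)
print([[round(e, 2) for e in row] for row in expected], round(chi2, 2), df)
```

The same two functions can be applied to the larger tables in the problems that follow by passing the observed counts row by row.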
12.12
                         Variable Two
        Variable One      24     13     47     58    142
                          93     59    187    244    583
                         117     72    234    302    725
Ho: Variable One is independent of Variable Two.
Ha: Variable One is not independent of Variable Two.
$e_{11} = \frac{(142)(117)}{725} = 22.92$     $e_{12} = \frac{(142)(72)}{725} = 14.10$
$e_{13} = \frac{(142)(234)}{725} = 45.83$     $e_{14} = \frac{(142)(302)}{725} = 59.15$
$e_{21} = \frac{(583)(117)}{725} = 94.08$     $e_{22} = \frac{(583)(72)}{725} = 57.90$
$e_{23} = \frac{(583)(234)}{725} = 188.17$    $e_{24} = \frac{(583)(302)}{725} = 242.85$
Observed frequencies, with expected frequencies in parentheses:
                         Variable Two
        Variable One      24 (22.92)    13 (14.10)     47 (45.83)     58 (59.15)     142
                          93 (94.08)    59 (57.90)    187 (188.17)   244 (242.85)    583
                         117            72            234            302             725
$\chi^2 = \frac{(24 - 22.92)^2}{22.92} + \frac{(13 - 14.10)^2}{14.10} + \frac{(47 - 45.83)^2}{45.83} + \frac{(58 - 59.15)^2}{59.15}$
$\quad + \frac{(93 - 94.08)^2}{94.08} + \frac{(59 - 57.90)^2}{57.90} + \frac{(187 - 188.17)^2}{188.17} + \frac{(244 - 242.85)^2}{242.85}$
= .05 + .09 + .03 + .02 + .01 + .02 + .01 + .01 = 0.24
α = .01, df = (c-1)(r-1) = (4-1)(2-1) = 3
$\chi^2_{.01,3} = 11.3449$
Since the observed $\chi^2 = 0.24 < \chi^2_{.01,3} = 11.3449$, the decision is to fail to
reject the null hypothesis.
Variable One is independent of Variable Two.
12.13
                                 Social Class
        Number of Children    Lower    Middle    Upper
        0                       7       18         6       31
        1                       9       38        23       70
        2 or 3                 34       97        58      189
        >3                     47       31        30      108
                               97      184       117      398
Ho: Social Class is independent of Number of Children.
Ha: Social Class is not independent of Number of Children.
$e_{11} = \frac{(31)(97)}{398} = 7.56$       $e_{31} = \frac{(189)(97)}{398} = 46.06$
$e_{12} = \frac{(31)(184)}{398} = 14.33$     $e_{32} = \frac{(189)(184)}{398} = 87.38$
$e_{13} = \frac{(31)(117)}{398} = 9.11$      $e_{33} = \frac{(189)(117)}{398} = 55.56$
$e_{21} = \frac{(70)(97)}{398} = 17.06$      $e_{41} = \frac{(108)(97)}{398} = 26.32$
$e_{22} = \frac{(70)(184)}{398} = 32.36$     $e_{42} = \frac{(108)(184)}{398} = 49.93$
$e_{23} = \frac{(70)(117)}{398} = 20.58$     $e_{43} = \frac{(108)(117)}{398} = 31.75$
Observed frequencies, with expected frequencies in parentheses:
                                 Social Class
        Number of Children    Lower         Middle        Upper
        0                      7  (7.56)    18 (14.33)     6  (9.11)     31
        1                      9 (17.06)    38 (32.36)    23 (20.58)     70
        2 or 3                34 (46.06)    97 (87.38)    58 (55.56)    189
        >3                    47 (26.32)    31 (49.93)    30 (31.75)    108
                              97           184           117            398
$\chi^2 = \frac{(7 - 7.56)^2}{7.56} + \frac{(18 - 14.33)^2}{14.33} + \frac{(6 - 9.11)^2}{9.11} + \frac{(9 - 17.06)^2}{17.06}$
$\quad + \frac{(38 - 32.36)^2}{32.36} + \frac{(23 - 20.58)^2}{20.58} + \frac{(34 - 46.06)^2}{46.06} + \frac{(97 - 87.38)^2}{87.38}$
$\quad + \frac{(58 - 55.56)^2}{55.56} + \frac{(47 - 26.32)^2}{26.32} + \frac{(31 - 49.93)^2}{49.93} + \frac{(30 - 31.75)^2}{31.75}$
= .04 + .94 + 1.06 + 3.81 + .98 + .28 + 3.16 + 1.06 + .11 + 16.25 +
7.18 + .10 = 34.97
α = .05, df = (c-1)(r-1) = (3-1)(4-1) = 6
$\chi^2_{.05,6} = 12.5916$
Since the observed $\chi^2 = 34.97 > \chi^2_{.05,6} = 12.5916$, the decision is to reject the
null hypothesis.
Number of children is not independent of social class.
12.14
                     Type of Music Preferred
        Region    Rock    R&B    Country    Classical
        NE        140      32       5          18        195
        S         134      41      52           8        235
        W         154      27       8          13        202
                  428     100      65          39        632
Ho: Type of music preferred is independent of region.
Ha: Type of music preferred is not independent of region.
$e_{11} = \frac{(195)(428)}{632} = 132.06$     $e_{23} = \frac{(235)(65)}{632} = 24.17$
$e_{12} = \frac{(195)(100)}{632} = 30.85$      $e_{24} = \frac{(235)(39)}{632} = 14.50$
$e_{13} = \frac{(195)(65)}{632} = 20.06$       $e_{31} = \frac{(202)(428)}{632} = 136.80$
$e_{14} = \frac{(195)(39)}{632} = 12.03$       $e_{32} = \frac{(202)(100)}{632} = 31.96$
$e_{21} = \frac{(235)(428)}{632} = 159.15$     $e_{33} = \frac{(202)(65)}{632} = 20.78$
$e_{22} = \frac{(235)(100)}{632} = 37.18$      $e_{34} = \frac{(202)(39)}{632} = 12.47$
Observed frequencies, with expected frequencies in parentheses:
                     Type of Music Preferred
        Region    Rock            R&B           Country       Classical
        NE        140 (132.06)    32 (30.85)     5 (20.06)    18 (12.03)    195
        S         134 (159.15)    41 (37.18)    52 (24.17)     8 (14.50)    235
        W         154 (136.80)    27 (31.96)     8 (20.78)    13 (12.47)    202
                  428            100            65            39            632
$\chi^2 = \frac{(140 - 132.06)^2}{132.06} + \frac{(32 - 30.85)^2}{30.85} + \frac{(5 - 20.06)^2}{20.06} + \frac{(18 - 12.03)^2}{12.03}$
$\quad + \frac{(134 - 159.15)^2}{159.15} + \frac{(41 - 37.18)^2}{37.18} + \frac{(52 - 24.17)^2}{24.17} + \frac{(8 - 14.50)^2}{14.50}$
$\quad + \frac{(154 - 136.80)^2}{136.80} + \frac{(27 - 31.96)^2}{31.96} + \frac{(8 - 20.78)^2}{20.78} + \frac{(13 - 12.47)^2}{12.47}$
= .48 + .04 + 11.31 + 2.96 + 3.97 + .39 + 32.04 + 2.91 + 2.16 + .77 +
7.86 + .02 = 64.91
α = .01, df = (c-1)(r-1) = (4-1)(3-1) = 6
$\chi^2_{.01,6} = 16.8119$
Since the observed $\chi^2 = 64.91 > \chi^2_{.01,6} = 16.8119$, the decision is to
reject the null hypothesis.
Type of music preferred is not independent of region of the country.
12.15
                        Transportation Mode
        Industry       Air    Train    Truck
        Publishing      32      12       41       85
        Comp.Hard.       5       6       24       35
                        37      18       65      120
H0: Transportation Mode is independent of Industry.
Ha: Transportation Mode is not independent of Industry.
$e_{11} = \frac{(85)(37)}{120} = 26.21$     $e_{21} = \frac{(35)(37)}{120} = 10.79$
$e_{12} = \frac{(85)(18)}{120} = 12.75$     $e_{22} = \frac{(35)(18)}{120} = 5.25$
$e_{13} = \frac{(85)(65)}{120} = 46.04$     $e_{23} = \frac{(35)(65)}{120} = 18.96$
Observed frequencies, with expected frequencies in parentheses:
                        Transportation Mode
        Industry       Air           Train         Truck
        Publishing     32 (26.21)    12 (12.75)    41 (46.04)     85
        Comp.Hard.      5 (10.79)     6  (5.25)    24 (18.96)     35
                       37            18            65            120
$\chi^2 = \frac{(32 - 26.21)^2}{26.21} + \frac{(12 - 12.75)^2}{12.75} + \frac{(41 - 46.04)^2}{46.04}$
$\quad + \frac{(5 - 10.79)^2}{10.79} + \frac{(6 - 5.25)^2}{5.25} + \frac{(24 - 18.96)^2}{18.96}$
= 1.28 + .04 + .55 + 3.11 + .11 + 1.34 = 6.43
α = .05, df = (c-1)(r-1) = (3-1)(2-1) = 2
$\chi^2_{.05,2} = 5.99147$
Since the observed $\chi^2 = 6.43 > \chi^2_{.05,2} = 5.99147$, the decision is to
reject the null hypothesis.
Transportation mode is not independent of industry.
12.16
                             Number of Bedrooms
        Number of Stories    ≤ 2     3      ≥ 4
        1                    116    101      57     274
        2                     90    325     160     575
                             206    426     217     849
H0: Number of Stories is independent of number of bedrooms.
Ha: Number of Stories is not independent of number of bedrooms.
$e_{11} = \frac{(274)(206)}{849} = 66.48$      $e_{21} = \frac{(575)(206)}{849} = 139.52$
$e_{12} = \frac{(274)(426)}{849} = 137.48$     $e_{22} = \frac{(575)(426)}{849} = 288.52$
$e_{13} = \frac{(274)(217)}{849} = 70.03$      $e_{23} = \frac{(575)(217)}{849} = 146.97$
$\chi^2 = \frac{(116 - 66.48)^2}{66.48} + \frac{(101 - 137.48)^2}{137.48} + \frac{(57 - 70.03)^2}{70.03}$
$\quad + \frac{(90 - 139.52)^2}{139.52} + \frac{(325 - 288.52)^2}{288.52} + \frac{(160 - 146.97)^2}{146.97}$
$\chi^2$ = 36.89 + 9.68 + 2.42 + 17.58 + 4.61 + 1.16 = 72.34
α = .10, df = (c-1)(r-1) = (3-1)(2-1) = 2
$\chi^2_{.10,2} = 4.60517$
Since the observed $\chi^2 = 72.34 > \chi^2_{.10,2} = 4.60517$, the decision is to
reject the null hypothesis.
Number of stories is not independent of number of bedrooms.
12.17
                         Mexican Citizens
        Type of Store    Yes    No
        Dept.             24    17     41
        Disc.             20    15     35
        Hard.             11    19     30
        Shoe              32    28     60
                          87    79    166
Ho: Citizenship is independent of store type
Ha: Citizenship is not independent of store type
$e_{11} = \frac{(41)(87)}{166} = 21.49$     $e_{31} = \frac{(30)(87)}{166} = 15.72$
$e_{12} = \frac{(41)(79)}{166} = 19.51$     $e_{32} = \frac{(30)(79)}{166} = 14.28$
$e_{21} = \frac{(35)(87)}{166} = 18.34$     $e_{41} = \frac{(60)(87)}{166} = 31.45$
$e_{22} = \frac{(35)(79)}{166} = 16.66$     $e_{42} = \frac{(60)(79)}{166} = 28.55$
Observed frequencies, with expected frequencies in parentheses:
                         Mexican Citizens
        Type of Store    Yes           No
        Dept.            24 (21.49)    17 (19.51)     41
        Disc.            20 (18.34)    15 (16.66)     35
        Hard.            11 (15.72)    19 (14.28)     30
        Shoe             32 (31.45)    28 (28.55)     60
                         87            79            166
$\chi^2 = \frac{(24 - 21.49)^2}{21.49} + \frac{(17 - 19.51)^2}{19.51} + \frac{(20 - 18.34)^2}{18.34} + \frac{(15 - 16.66)^2}{16.66}$
$\quad + \frac{(11 - 15.72)^2}{15.72} + \frac{(19 - 14.28)^2}{14.28} + \frac{(32 - 31.45)^2}{31.45} + \frac{(28 - 28.55)^2}{28.55}$
= .29 + .32 + .15 + .17 + 1.42 + 1.56 + .01 + .01 = 3.93
α = .05, df = (c-1)(r-1) = (2-1)(4-1) = 3
$\chi^2_{.05,3} = 7.81473$
Since the observed $\chi^2 = 3.93 < \chi^2_{.05,3} = 7.81473$, the decision is to fail to
reject the null hypothesis.
Citizenship is independent of type of store.
12.18   α = .01, k = 7, df = k - 1 = 6
H0: The observed distribution is the same as the expected distribution
Ha: The observed distribution is not the same as the expected distribution
Use: $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$
critical $\chi^2_{.01,6} = 16.8119$
        $f_o$    $f_e$    $(f_o - f_e)^2$    $(f_o - f_e)^2 / f_e$
        214      206       64                0.311
        235      232        9                0.039
        279      268      121                0.451
        281      284        9                0.032
        264      268       16                0.060
        254      232      484                2.086
        211      206       25                0.121
                                             3.100
$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 3.100$
Since the observed value of $\chi^2 = 3.1 < \chi^2_{.01,6} = 16.8119$, the decision is to fail to
reject the null hypothesis. The observed distribution is not different from the
expected distribution.
12.19
                       Variable 2
        Variable 1     12    23    21     56
                        8    17    20     45
                        7    11    18     36
                       27    51    59    137
$e_{11}$ = 11.04     $e_{12}$ = 20.85     $e_{13}$ = 24.12
$e_{21}$ = 8.87      $e_{22}$ = 16.75     $e_{23}$ = 19.38
$e_{31}$ = 7.09      $e_{32}$ = 13.40     $e_{33}$ = 15.50
$\chi^2 = \frac{(12 - 11.04)^2}{11.04} + \frac{(23 - 20.85)^2}{20.85} + \frac{(21 - 24.12)^2}{24.12}$
$\quad + \frac{(8 - 8.87)^2}{8.87} + \frac{(17 - 16.75)^2}{16.75} + \frac{(20 - 19.38)^2}{19.38}$
$\quad + \frac{(7 - 7.09)^2}{7.09} + \frac{(11 - 13.40)^2}{13.40} + \frac{(18 - 15.50)^2}{15.50}$
= .084 + .222 + .403 + .085 + .004 + .020 + .001 + .430 + .402 = 1.652
df = (c-1)(r-1) = (2)(2) = 4, α = .05
$\chi^2_{.05,4} = 9.48773$
Since the observed value of $\chi^2 = 1.652 < \chi^2_{.05,4} = 9.48773$, the decision is to fail
to reject the null hypothesis.
12.20
                        Location
        Customer        NE     W      S
        Industrial      230    115     68    413
        Retail          185    143     89    417
                        415    258    157    830
$e_{11} = \frac{(413)(415)}{830} = 206.5$      $e_{21} = \frac{(417)(415)}{830} = 208.5$
$e_{12} = \frac{(413)(258)}{830} = 128.38$     $e_{22} = \frac{(417)(258)}{830} = 129.62$
$e_{13} = \frac{(413)(157)}{830} = 78.12$      $e_{23} = \frac{(417)(157)}{830} = 78.88$
Observed frequencies, with expected frequencies in parentheses:
                        Location
        Customer        NE             W               S
        Industrial      230 (206.5)    115 (128.38)    68 (78.12)    413
        Retail          185 (208.5)    143 (129.62)    89 (78.88)    417
                        415            258            157            830
$\chi^2 = \frac{(230 - 206.5)^2}{206.5} + \frac{(115 - 128.38)^2}{128.38} + \frac{(68 - 78.12)^2}{78.12}$
$\quad + \frac{(185 - 208.5)^2}{208.5} + \frac{(143 - 129.62)^2}{129.62} + \frac{(89 - 78.88)^2}{78.88}$
= 2.67 + 1.39 + 1.31 + 2.65 + 1.38 + 1.30 = 10.70
α = .10 and df = (c - 1)(r - 1) = (3 - 1)(2 - 1) = 2
$\chi^2_{.10,2} = 4.60517$
Since the observed $\chi^2 = 10.70 > \chi^2_{.10,2} = 4.60517$, the decision is to reject the
null hypothesis.
Type of customer is not independent of geographic region.
12.21   Cookie Type        $f_o$
        Chocolate Chip     189
        Peanut Butter      168
        Cheese Cracker     155
        Lemon Flavored     161
        Chocolate Mint     216
        Vanilla Filled     165
        $\sum f_o$ = 1,054
Ho: Cookie sales are uniformly distributed across kinds of cookie.
Ha: Cookie sales are not uniformly distributed across kinds of cookie.
If cookie sales are uniformly distributed, then $f_e = \frac{\sum f_o}{\text{no. of kinds}} = \frac{1{,}054}{6} = 175.67$
        $f_o$    $f_e$      $(f_o - f_e)^2 / f_e$
        189      175.67     1.01
        168      175.67     0.33
        155      175.67     2.43
        161      175.67     1.23
        216      175.67     9.26
        165      175.67     0.65
                           14.91
The observed $\chi^2 = 14.91$
α = .05, df = k - 1 = 6 - 1 = 5
$\chi^2_{.05,5} = 11.0705$
Since the observed $\chi^2 = 14.91 > \chi^2_{.05,5} = 11.0705$, the decision is to reject the
null hypothesis.
Cookie sales are not uniformly distributed by kind of cookie.
12.22
                       Gender
        Bought Car     M        F
        Y                207       65      272
        N                811      984    1,795
                       1,018    1,049    2,067
Ho: Purchasing a car or not is independent of gender.
Ha: Purchasing a car or not is not independent of gender.
$e_{11} = \frac{(272)(1{,}018)}{2{,}067} = 133.96$       $e_{12} = \frac{(272)(1{,}049)}{2{,}067} = 138.04$
$e_{21} = \frac{(1{,}795)(1{,}018)}{2{,}067} = 884.04$   $e_{22} = \frac{(1{,}795)(1{,}049)}{2{,}067} = 910.96$
Observed frequencies, with expected frequencies in parentheses:
                       Gender
        Bought Car     M               F
        Y              207 (133.96)     65 (138.04)      272
        N              811 (884.04)    984 (910.96)    1,795
                       1,018           1,049           2,067
$\chi^2 = \frac{(207 - 133.96)^2}{133.96} + \frac{(65 - 138.04)^2}{138.04} + \frac{(811 - 884.04)^2}{884.04} + \frac{(984 - 910.96)^2}{910.96}$
= 39.82 + 38.65 + 6.03 + 5.86 = 90.36
α = .05, df = (c-1)(r-1) = (2-1)(2-1) = 1
$\chi^2_{.05,1} = 3.841$
Since the observed $\chi^2 = 90.36 > \chi^2_{.05,1} = 3.841$, the decision is to reject the
null hypothesis.
Purchasing a car is not independent of gender.
12.23   Arrivals    $f_o$    ($f_o$)(Arrivals)
        0           26         0
        1           40        40
        2           57       114
        3           32        96
        4           17        68
        5           12        60
        6            8        48
        $\sum f_o$ = 192      $\sum (f_o)(\text{arrivals})$ = 426
$\lambda = \frac{\sum (f_o)(\text{arrivals})}{\sum f_o} = \frac{426}{192} \approx 2.2$
Ho: The observed frequencies are Poisson distributed.
Ha: The observed frequencies are not Poisson distributed.
        Arrivals    Probability    $f_e$
        0           .1108          (.1108)(192) = 21.27
        1           .2438          (.2438)(192) = 46.81
        2           .2681          51.48
        3           .1966          37.75
        4           .1082          20.77
        5           .0476           9.14
        6           .0249           4.78
        $f_o$    $f_e$     $(f_o - f_e)^2 / f_e$
        26       21.27     1.05
        40       46.81     0.99
        57       51.48     0.59
        32       37.75     0.88
        17       20.77     0.68
        12        9.14     0.89
         8        4.78     2.17
                           7.25
Observed $\chi^2 = 7.25$
α = .05, df = k - 2 = 7 - 2 = 5
$\chi^2_{.05,5} = 11.0705$
Since the observed $\chi^2 = 7.25 < \chi^2_{.05,5} = 11.0705$, the decision is to fail to reject
the null hypothesis. There is not enough evidence to reject the claim that the
observed frequency of arrivals is Poisson distributed.
12.24   Ho: The distribution of observed frequencies is the same as the
        distribution of expected frequencies.
        Ha: The distribution of observed frequencies is not the same as the distribution of
        expected frequencies.
        Soft Drink      $f_o$    Proportions    $f_e$                       $(f_o - f_e)^2 / f_e$
        Classic Coke    361      .206           (.206)(1726) = 355.56        0.08
        Pepsi           272      .145           (.145)(1726) = 250.27        1.89
        Diet Coke       192      .085           146.71                      13.98
        Mt. Dew         121      .063           108.74                       1.38
        Dr. Pepper       94      .059           101.83                       0.60
        Sprite          102      .062           107.01                       0.23
        Others          584      .380           655.86                       7.87
        $\sum f_o$ = 1,726                                                  26.03
Calculated $\chi^2 = 26.03$
α = .05, df = k - 1 = 7 - 1 = 6
$\chi^2_{.05,6} = 12.5916$
Since the observed $\chi^2 = 26.03 > \chi^2_{.05,6} = 12.5916$, the decision is to reject the
null hypothesis.
The observed frequencies are not distributed the same as the expected frequencies
from the national poll.
12.25
                   Position
        Years    Manager    Programmer    Operator    Systems Analyst
        0-3         6          37            11            13             67
        4-8        28          16            23            24             91
        > 8        47          10            12            19             88
                   81          63            46            56            246
$e_{11} = \frac{(67)(81)}{246} = 22.06$     $e_{23} = \frac{(91)(46)}{246} = 17.02$
$e_{12} = \frac{(67)(63)}{246} = 17.16$     $e_{24} = \frac{(91)(56)}{246} = 20.72$
$e_{13} = \frac{(67)(46)}{246} = 12.53$     $e_{31} = \frac{(88)(81)}{246} = 28.98$
$e_{14} = \frac{(67)(56)}{246} = 15.25$     $e_{32} = \frac{(88)(63)}{246} = 22.54$
$e_{21} = \frac{(91)(81)}{246} = 29.96$     $e_{33} = \frac{(88)(46)}{246} = 16.46$
$e_{22} = \frac{(91)(63)}{246} = 23.30$     $e_{34} = \frac{(88)(56)}{246} = 20.03$
Observed frequencies, with expected frequencies in parentheses:
                   Position
        Years    Manager       Programmer    Operator      Systems Analyst
        0-3       6 (22.06)    37 (17.16)    11 (12.53)    13 (15.25)         67
        4-8      28 (29.96)    16 (23.30)    23 (17.02)    24 (20.72)         91
        > 8      47 (28.98)    10 (22.54)    12 (16.46)    19 (20.03)         88
                 81            63            46            56                246
$\chi^2 = \frac{(6 - 22.06)^2}{22.06} + \frac{(37 - 17.16)^2}{17.16} + \frac{(11 - 12.53)^2}{12.53} + \frac{(13 - 15.25)^2}{15.25}$
$\quad + \frac{(28 - 29.96)^2}{29.96} + \frac{(16 - 23.30)^2}{23.30} + \frac{(23 - 17.02)^2}{17.02} + \frac{(24 - 20.72)^2}{20.72}$
$\quad + \frac{(47 - 28.98)^2}{28.98} + \frac{(10 - 22.54)^2}{22.54} + \frac{(12 - 16.46)^2}{16.46} + \frac{(19 - 20.03)^2}{20.03}$
= 11.69 + 22.94 + .19 + .33 + .13 + 2.29 + 2.10 + .52 + 11.20 + 6.98 +
1.21 + .05 = 59.63
α = .01, df = (c-1)(r-1) = (4-1)(3-1) = 6
$\chi^2_{.01,6} = 16.8119$
Since the observed $\chi^2 = 59.63 > \chi^2_{.01,6} = 16.8119$, the decision is to reject the
null hypothesis. Position is not independent of number of years of experience.
12.26   H0: p = .43        n = 315        α = .05
        Ha: p ≠ .43        x = 120        α/2 = .025
                                      $f_o$    $f_e$                     $(f_o - f_e)^2 / f_e$
        More Work, More Business      120      (.43)(315) = 135.45       1.76
        Others                        195      (.57)(315) = 179.55       1.33
        Total                         315                   315.00       3.09
The calculated value of $\chi^2$ is 3.09
α = .05 and α/2 = .025     df = k - 1 = 2 - 1 = 1
$\chi^2_{.025,1} = 5.02389$
Since $\chi^2 = 3.09 < \chi^2_{.025,1} = 5.02389$, the decision is to fail to reject the null
hypothesis.
12.27
                                 Type of College or University
        Number of Children    Community College    Large University    Small College
        0                         25                  178                  31           234
        1                         49                  141                  12           202
        2                         31                   54                   8            93
        ≥ 3                       22                   14                   6            42
                                 127                  387                  57           571
Ho: Number of Children is independent of Type of College or University.
Ha: Number of Children is not independent of Type of College or University.
$e_{11} = \frac{(234)(127)}{571} = 52.05$      $e_{31} = \frac{(93)(127)}{571} = 20.68$
$e_{12} = \frac{(234)(387)}{571} = 158.60$     $e_{32} = \frac{(93)(387)}{571} = 63.03$
$e_{13} = \frac{(234)(57)}{571} = 23.36$       $e_{33} = \frac{(93)(57)}{571} = 9.28$
$e_{21} = \frac{(202)(127)}{571} = 44.93$      $e_{41} = \frac{(42)(127)}{571} = 9.34$
$e_{22} = \frac{(202)(387)}{571} = 136.91$     $e_{42} = \frac{(42)(387)}{571} = 28.47$
$e_{23} = \frac{(202)(57)}{571} = 20.16$       $e_{43} = \frac{(42)(57)}{571} = 4.19$
Observed frequencies, with expected frequencies in parentheses:
                                 Type of College or University
        Number of Children    Community College    Large University    Small College
        0                     25 (52.05)           178 (158.60)        31 (23.36)       234
        1                     49 (44.93)           141 (136.91)        12 (20.16)       202
        2                     31 (20.68)            54  (63.03)         8  (9.28)        93
        ≥ 3                   22  (9.34)            14  (28.47)         6  (4.19)        42
                             127                   387                 57               571
$\chi^2 = \frac{(25 - 52.05)^2}{52.05} + \frac{(178 - 158.60)^2}{158.60} + \frac{(31 - 23.36)^2}{23.36} + \frac{(49 - 44.93)^2}{44.93}$
$\quad + \frac{(141 - 136.91)^2}{136.91} + \frac{(12 - 20.16)^2}{20.16} + \frac{(31 - 20.68)^2}{20.68} + \frac{(54 - 63.03)^2}{63.03}$
$\quad + \frac{(8 - 9.28)^2}{9.28} + \frac{(22 - 9.34)^2}{9.34} + \frac{(14 - 28.47)^2}{28.47} + \frac{(6 - 4.19)^2}{4.19}$
= 14.06 + 2.37 + 2.50 + 0.37 + 0.12 + 3.30 + 5.15 + 1.29 + 0.18 +
17.16 + 7.35 + 0.78 = 54.63
α = .05, df = (c - 1)(r - 1) = (3 - 1)(4 - 1) = 6
$\chi^2_{.05,6} = 12.5916$
Since the observed $\chi^2 = 54.63 > \chi^2_{.05,6} = 12.5916$, the decision is to reject the
null hypothesis.
Number of children is not independent of type of College or University.
12.28 The observed chi-square is 30.18 with a p-value of .0000043. The chi-square
goodness-of-fit test indicates that there is a significant difference between the
observed frequencies and the expected frequencies. The distribution of responses
to the question is not the same for adults between 21 and 30 years of age as it
is for others. Marketing and sales people might reorient their 21- to 30-year-old
efforts away from home improvement and pay more attention to leisure
travel/vacation, clothing, and home entertainment.
12.29 The observed chi-square value for this test of independence is 5.366. The
associated p-value of .252 indicates failure to reject the null hypothesis. There is
not enough evidence here to say that color choice is dependent upon gender.
Automobile marketing people do not have to worry about which colors especially
appeal to men or to women because these data give no evidence that car color preference depends on gender. In
addition, design and production people can determine car color quotas based on
other variables.

  • 3.
    Chapter 12: Analysisof Categorical Data 3 SOLUTIONS TO CHAPTER 16 12.1 f0 0 2 0 )( f ff e− fe 53 68 3.309 37 42 0.595 32 33 0.030 28 22 1.636 18 10 6.400 15 8 6.125 Ho: The observed distribution is the same as the expected distribution. Ha: The observed distribution is not the same as the expected distribution. Observed ∑ − = e e f ff 2 02 )( χ = 18.095 df = k - 1 = 6 - 1 = 5, α = .05 χ2 .05,5 = 11.07 Since the observed χ2 = 18.095 > χ2 .05,5 = 11.07, the decision is to reject the null hypothesis. The observed frequencies are not distributed the same as the expected frequencies. 12.2 f0 fe 0 2 0 )( f ff e− 19 18 0.056 17 18 0.056 14 18 0.889 18 18 0.000 19 18 0.056 21 18 0.500 18 18 0.000 18 18 0.000 Σfo = 144 Σfe = 144 1.557
  • 4.
    Chapter 12: Analysisof Categorical Data 4 Ho: The observed frequencies are uniformly distributed. Ha: The observed frequencies are not uniformly distributed. 8 1440 == ∑ k f x = 18 In this uniform distribution, each fe = 18 df = k – 1 = 8 – 1 = 7, α = .01 χ2 .01,7 = 18.4753 Observed ∑ − = e e f ff 2 02 )( χ = 1.557 Since the observed χ2 = 1.557 < χ2 .01,7 = 18.4753, the decision is to fail to reject the null hypothesis There is no reason to conclude that the frequencies are not uniformly distributed. 12.3 Number f0 (Number)(f0) 0 28 0 1 17 17 2 11 22 3 5 15 54 Ho: The frequency distribution is Poisson. Ha: The frequency distribution is not Poisson. λ = 61 54 =0.9 Expected Expected Number Probability Frequency 0 .4066 24.803 1 .3659 22.312 2 .1647 10.047 > 3 .0628 3.831 Since fe for > 3 is less than 5, collapse categories 2 and >3:
  • 5.
    Chapter 12: Analysisof Categorical Data 5 Number fo fe 0 2 0 )( f ff e− 0 28 24.803 0.412 1 17 22.312 1.265 >2 16 13.878 0.324 61 60.993 2.001 df = k - 2 = 3 - 2 = 1, = .05 χ2 .05,1 = 3.84146 Calculated ∑ − = e e f ff 2 02 )( χ = 2.001 Since the observed χ2 = 2.001 < χ2 .05,1 = 3.84146, the decision is to fail to reject the null hypothesis. There is insufficient evidence to reject the distribution as Poisson distributed. The conclusion is that the distribution is Poisson distributed. 12.4 Category f(observed) Midpt. fm fm2 10-20 6 15 90 1,350 20-30 14 25 350 8,750 30-40 29 35 1,015 35,525 40-50 38 45 1,710 76,950 50-60 25 55 1,375 75,625 60-70 10 65 650 42,250 70-80 7 75 525 39,375 n = Σf = 129 Σfm = 5,715 Σfm2 = 279,825 129 715,5 == ∑ ∑ f fm x = 44.3 s = 128 129 )715,5( 825,279 1 )( 22 2 − = − −∑ ∑ n n fM fM = 14.43 Ho: The observed frequencies are normally distributed. Ha: The observed frequencies are not normally distributed.
  • 6.
    Chapter 12: Analysisof Categorical Data 6 For Category 10 - 20 Prob z = 43.14 3.4410 − = -2.38 .4913 z = 43.14 3.4420 − = -1.68 - .4535 Expected prob.: .0378 For Category 20-30 Prob for x = 20, z = -1.68 .4535 z = 43.14 3.4430 − = -0.99 -.3389 Expected prob: .1146 For Category 30 - 40 Prob for x = 30, z = -0.99 .3389 z = 43.14 3.4440 − = -0.30 -.1179 Expected prob: .2210 For Category 40 - 50 Prob for x = 40, z = -0.30 .1179 z = 43.14 3.4450 − = 0.40 +.1554 Expected prob: .2733 For Category 50 - 60 Prob z = 43.14 3.4460 − = 1.09 .3621 for x = 50, z = 0.40 -.1554 Expected prob: .2067
  • 7.
    Chapter 12: Analysisof Categorical Data 7 For Category 60 - 70 Prob z = 43.14 3.4470 − = 1.78 .4625 for x = 60, z = 1.09 -.3621 Expected prob: .1004 For Category 70 - 80 Prob z = 43.14 3.4480 − = 2.47 .4932 for x = 70, z = 1.78 -.4625 Expected prob: .0307 For < 10: Probability between 10 and the mean, 44.3 = (.0378 + .1145 + .2210 + .1179) = .4913. Probability < 10 = .5000 - .4913 = .0087 For > 80: Probability between 80 and the mean, 44.3 = (.0307 + .1004 + .2067 + .1554) = .4932. Probability > 80 = .5000 - .4932 = .0068 Category Prob expected frequency < 10 .0087 .0087(129) = 0.99 10-20 .0378 .0378(129) = 4.88 20-30 .1146 14.78 30-40 .2210 28.51 40-50 .2733 35.26 50-60 .2067 26.66 60-70 .1004 12.95 70-80 .0307 3.96 > 80 .0068 0.88 Due to the small sizes of expected frequencies, category < 10 is folded into 10-20 and >80 into 70-80.
  • 8.
    Chapter 12: Analysisof Categorical Data 8 Category fo fe 0 2 0 )( f ff e− 10-20 6 5.87 .003 20-30 14 14.78 .041 30-40 29 28.51 .008 40-50 38 32.26 .213 50-60 25 26.66 .103 60-70 10 12.95 .672 70-80 7 4.84 .964 2.004 Calculated ∑ − = e e f ff 2 02 )( χ = 2.004 df = k - 3 = 7 - 3 = 4, α = .05 χ2 .05,1 = 9.48773 Since the observed χ2 = 2.004 > χ2 .05,4 = 9.48773, the decision is to fail to reject the null hypothesis. There is not enough evidence to declare that the observed frequencies are not normally distributed. 12.5 Definition fo Exp.Prop. fe 0 2 0 )( f ff e− Happiness 42 .39 227(.39)= 88.53 24.46 Sales/Profit 95 .12 227(.12)= 27.24 168.55 Helping Others 27 .18 40.86 4.70 Achievement/ Challenge 63 .31 70.34 0.77 227 198.48 Ho: The observed frequencies are distributed the same as the expected frequencies. Ha: The observed frequencies are not distributed the same as the expected frequencies. Observed χ2 = 198.48 df = k – 1 = 4 – 1 = 3, α = .05 χ2 .05,3 = 7.81473
  • 9.
    Chapter 12: Analysisof Categorical Data 9 Since the observed χ2 = 198.48 > χ2 .05,3 = 7.81473, the decision is to reject the null hypothesis. The observed frequencies for men are not distributed the same as the expected frequencies which are based on the responses of women. 12.6 Age fo Prop. from survey fe 0 2 0 )( f ff e− 10-14 22 .09 (.09)(212)=19.08 0.45 15-19 50 .23 (.23)(212)=48.76 0.03 20-24 43 .22 46.64 0.28 25-29 29 .14 29.68 0.02 30-34 19 .10 21.20 0.23 > 35 49 .22 46.64 0.12 212 1.13 Ho: The distribution of observed frequencies is the same as the distribution of expected frequencies. Ha: The distribution of observed frequencies is not the same as the distribution of expected frequencies. α = .01, df = k - 1 = 6 - 1 = 5 χ2 .01,5 = 15.0863 The observed χ2 = 1.13 Since the observed χ2 = 1.13 < χ2 .01,5 = 15.0863, the decision is to fail to reject the null hypothesis. There is not enough evidence to declare that the distribution of observed frequencies is different from the distribution of expected frequencies.
  • 10.
    Chapter 12: Analysisof Categorical Data 10 12.7 Age fo m fm fm2 10-20 16 15 240 3,600 20-30 44 25 1,100 27,500 30-40 61 35 2,135 74,725 40-50 56 45 2,520 113,400 50-60 35 55 1,925 105,875 60-70 19 65 1,235 80,275 231 Σfm = 9,155 Σfm2 = 405,375 231 155,9 == ∑ n fM x = 39.63 s = 230 231 )155,9( 375,405 1 )( 22 2 − = − −∑ ∑ n n fM fM = 13.6 Ho: The observed frequencies are normally distributed. Ha: The observed frequencies are not normally distributed. For Category 10-20 Prob z = 6.13 63.3910 − = -2.18 .4854 z = 6.13 63.3920 − = -1.44 -.4251 Expected prob. .0603 For Category 20-30 Prob for x = 20, z = -1.44 .4251 z = 6.13 63.3930 − = -0.71 -.2611 Expected prob. .1640 For Category 30-40 Prob for x = 30, z = -0.71 .2611 z = 6.13 63.3940 − = 0.03 +.0120 Expected prob. .2731
  • 11.
    Chapter 12: Analysisof Categorical Data 11 For Category 40-50 Prob z = 6.13 63.3950 − = 0.76 .2764 for x = 40, z = 0.03 -.0120 Expected prob. .2644 For Category 50-60 Prob z = 6.13 63.3960 − = 1.50 .4332 for x = 50, z = 0.76 -.2764 Expected prob. .1568 For Category 60-70 Prob z = 6.13 63.3970 − = 2.23 .4871 for x = 60, z = 1.50 -.4332 Expected prob. .0539 For < 10: Probability between 10 and the mean = .0603 + .1640 + .2611 = .4854 Probability < 10 = .5000 - .4854 = .0146 For > 70: Probability between 70 and the mean = .0120 + .2644 + .1568 + .0539 = .4871 Probability > 70 = .5000 - .4871 = .0129 Age Probability fe < 10 .0146 (.0146)(231) = 3.37 10-20 .0603 (.0603)(231) = 13.93 20-30 .1640 37.88 30-40 .2731 63.09 40-50 .2644 61.08
  • 12.
    Chapter 12: Analysisof Categorical Data 12 50-60 .1568 36.22 60-70 .0539 12.45 > 70 .0129 2.98 Categories < 10 and > 70 are less than 5. Collapse the < 10 into 10-20 and > 70 into 60-70. Age fo fe 0 2 0 )( f ff e− 10-20 16 17.30 0.10 20-30 44 37.88 0.99 30-40 61 63.09 0.07 40-50 56 61.08 0.42 50-60 35 36.22 0.04 60-70 19 15.43 0.83 2.45 df = k - 3 = 6 - 3 = 3, α = .05 χ2 .05,3 = 7.81473 Observed χ2 = 2.45 Since the observed χ2 < χ2 .05,3 = 7.81473, the decision is to fail to reject the null hypothesis. There is no reason to reject that the observed frequencies are normally distributed. 12.8 Number f (f)⋅ (number) 0 18 0 1 28 28 2 47 94 3 21 63 4 16 64 5 11 55 6 or more 9 54 Σf = 150 Σf⋅(number) = 358 λ = 150 358 = ⋅ ∑ ∑ f numberf = 2.4 Ho: The observed frequencies are Poisson distributed. Ha: The observed frequencies are not Poisson distributed.
  • 13.
    Chapter 12: Analysisof Categorical Data 13 Number Probability fe 0 .0907 (.0907)(150 = 13.61 1 .2177 (.2177)(150) = 32.66 2 .2613 39.20 3 .2090 31.35 4 .1254 18.81 5 .0602 9.03 6 or more .0358 5.36 fo fe 0 2 0 )( f ff e− 18 13.61 1.42 28 32.66 0.66 47 39.66 1.55 21 31.35 3.42 16 18.81 0.42 11 9.03 0.43 9 5.36 2.47 10.37 The observed χ2 = 10.27 α = .01, df = k – 2 = 7 – 2 = 5, χ2 .01,5 = 15.0863 Since the observed χ2 = 10.27 < χ2 .01,5 = 15.0863, the decision is to fail to reject the null hypothesis. There is not enough evidence to reject the claim that the observed frequencies are Poisson distributed. 12.9 H0: p = .28 n = 270 x = 62 Ha: p ≠ .28 fo fe 0 2 0 )( f ff e− Spend More 62 270(.28) = 75.6 2.44656 Don't Spend More 208 270(.72) = 194.4 0.95144 Total 270 270.0 3.39800
  • 14.
    Chapter 12: Analysisof Categorical Data 14 The observed value of χ2 is 3.398 α = .05 and α/2 = .025 df = k - 1 = 2 - 1 = 1 χ2 .025,1 = 5.02389 Since the observed χ2 = 3.398 < χ2 .025,1 = 5.02389, the decision is to fail to reject the null hypothesis. 12.10 H0: p = .30 n = 180 x= 42 Ha: p ≠ .30 f0 fe 0 2 0 )( f ff e− Provide 42 180(.30) = 54 2.6666 Don't Provide 138 180(.70) = 126 1.1429 Total 180 180 3.8095 The observed value of χ2 is 3.8095 α = .05 and α/2 = .025 df = k - 1 = 2 - 1 = 1 χ2 .025,1 = 5.02389 Since the observed χ2 = 3.8095 < χ2 .025,1 = 5.02389, the decision is to fail to reject the null hypothesis.
  • 15.
    Chapter 12: Analysisof Categorical Data 15 12.11 Variable Two Variable One 203 326 529 17868 110 271 436 707 Ho: Variable One is independent of Variable Two. Ha: Variable One is not independent of Variable Two. e11 = 707 )271)(529( = 202.77 e12 = 707 )436)(529( = 326.23 e21 = 707 )178)(271( = 68.23 e22 = 707 )178)(436( = 109.77 Variable Two Variable One (202.77) 203 (326.23) 326 529 178 (68.23) 68 (109.77) 110 271 436 707 χ2 = 77.202 )77.202203( 2 − + 23.326 )23.326326( 2 − + 23.68 )23.668( 2 − + 77.109 )77.109110( 2 − = .00 + .00 + .00 + .00 = 0.00 α = .05, df = (c-1)(r-1) = (2-1)(2-1) = 1 χ2 .05,1 = 3.84146 Since the observed χ2 = 0.00 < χ2 .05,1 = 3.84146, the decision is to fail to reject the null hypothesis. Variable One is independent of Variable Two.
  • 16.
    Chapter 12: Analysisof Categorical Data 16 12.12 Variable Two Variable One 24 13 47 58 142 58393 59 187 244 117 72 234 302 725 Ho: Variable One is independent of Variable Two. Ha: Variable One is not independent of Variable Two. e11 = 725 )117)(142( = 22.92 e12 = 725 )72)(142( = 14.10 e13 = 725 )234)(142( = 45.83 e14 = 725 )302)(142( = 59.15 e21 = 725 )117)(583( = 94.08 e22 = 725 )72)(583( = 57.90 e23 = 725 )234)(583( = 188.17 e24 = 725 )302)(583( = 242.85 Variable Two Variable One (22.92) 24 (14.10) 13 (45.83) 47 (59.15) 58 142 583 (94.08) 93 (57.90) 59 (188.17) 187 (242.85) 244 117 72 234 302 725 χ2 = 92.22 )92.2224( 2 − + 10.14 )10.1413( 2 − + 83.45 )83.4547( 2 − + 15.59 )15.5958( 2 − + 08.94 )08.9493( 2 − + 90.57 )90.5759( 2 − + 17.188 )17.188188( 2 − + 85.242 )85.242244( 2 − = .05 + .09 + .03 + .02 + .01 + .02 + .01 + .01 = 0.24
  • 17.
    Chapter 12: Analysisof Categorical Data 17 α = .01, df = (c-1)(r-1) = (4-1)(2-1) = 3 χ2 .01,3 = 11.3449 Since the observed χ2 = 0.24 < χ2 .01,3 = 11.3449, the decision is to fail to reject the null hypothesis. Variable One is independent of Variable Two. 12.13 Social Class Number of Children Lower Middle Upper 0 1 2 or 3 >3 7 18 6 31 70 189 108 9 38 23 34 97 58 47 31 30 97 184 117 398 Ho: Social Class is independent of Number of Children. Ha: Social Class is not independent of Number of Children. e11 = 398 )97)(31( = 7.56 e31 = 398 )97)(189( = 46.06 e12 = 398 )184)(31( = 14.3 e32 = 398 )184)(189( = 87.38 e13 = 398 )117)(31( = 9.11 e33 = 398 )117)(189( = 55.56 e21 = 398 )97)(70( = 17.06 e41 = 398 )97)(108( = 26.32 e22 = 398 )184)(70( = 32.36 e42 = 398 )184)(108( = 49.93 e23 = 398 )117)(70( = 20.58 e43 = 398 )117)(108( = 31.75
  • 18.
    Chapter 12: Analysisof Categorical Data 18 Social Class Number of Children Lower Middle Upper 0 1 2 or 3 >3 (7.56) 7 (14.33) 18 (9.11) 6 31 70 189 108 (17.06) 9 (32.36) 38 (20.58) 23 (46.06) 34 (87.38) 97 (55.56) 58 (26.32) 47 (49.93) 31 (31.75) 30 97 184 117 398 χ2 = 56.7 )56.77( 2 − + 33.14 )33.1418( 2 − + 11.9 )11.96( 2 − + 06.17 )06.179( 2 − + 36.32 )36.3238( 2 − + 58.20 )58.2023( 2 − + 06.46 )06.4634( 2 − + 38.87 )38.8797( 2 − + 56.55 )56.5558( 2 − + 32.26 )32.2647( 2 − + 93.49 )93.4931( 2 − + 75.31 )75.3130( 2 − = .04 + .94 + 1.06 + 3.81 + .98 + .28 + 3.16 + 1.06 + .11 + 16.25 + 7.18 + .10 = 34.97 α = .05, df = (c-1)(r-1) = (3-1)(4-1) = 6 χ2 .05,6 = 12.5916 Since the observed χ2 = 34.97 > χ2 .05,6 = 12.5916, the decision is to reject the null hypothesis. Number of children is not independent of social class.
  • 19.
    Chapter 12: Analysisof Categorical Data 19 12.14 Type of Music Preferred Region Rock R&B Coun Clssic 195 235 202 632 NE 140 32 5 18 S 134 41 52 8 W 154 27 8 13 428 100 65 39 Ho: Type of music preferred is independent of region. Ha: Type of music preferred is not independent of region. e11 = 632 )428)(195( = 132.6 e23 = 632 )65)(235( = 24.17 e12 = 632 )100)(195( = 30.85 e24 = 632 )39)(235( = 14.50 e13 = 632 )65)(195( = 20.06 e31 = 632 )428)(202( = 136.80 e14 = 632 )39)(195( = 12.03 e32 = 632 )100)(202( = 31.96 e21 = 632 )428)(235( = 159.15 e33 = 632 )65)(202( = 20.78 e22 = 632 )100)(235( = 37.18 e34 = 632 )39)(202( = 12.47 Type of Music Preferred Region Rock R&B Coun Clssic 195 235 202 632 NE (132.06) 140 (30.85) 32 (20.06) 5 (12.03) 18 S (159.15) 134 (37.18) 41 (24.17) 52 (14.50) 8 W (136.80) 154 (31.96) 27 (20.78) 8 (12.47) 13 428 100 65 39
  • 20.
    Chapter 12: Analysisof Categorical Data 20 χ2 = 06.132 )06.132141( 2 − + 85.30 )85.3032( 2 − + 06.20 )06.205( 2 − + 03.12 )03.1218( 2 − + 15.159 )15.159134( 2 − + 18.37 )18.3741( 2 − + 17.24 )17.2452( 2 − + 50.14 )50.148( 2 − + 80.136 )80.136154( 2 − + 96.31 )96.3127( 2 − + 78.20 )78.208( 2 − + 47.12 )47.1213( 2 − = .48 + .04 + 11.31 + 2.96 + 3.97 + .39 + 32.04 + 2.91 + 2.16 + .77 + 7.86 + .02 = 64.91 α = .01, df = (c-1)(r-1) = (4-1)(3-1) = 6 χ2 .01,6 = 16.8119 Since the observed χ2 = 64.91 > χ2 .01,6 = 16.8119, the decision is to reject the null hypothesis. Type of music preferred is not independent of region of the country. 12.15 Transportation Mode Industry Air Train Truck 85 35 120 Publishing 32 12 41 Comp.Hard. 5 6 24 37 18 65 H0: Transportation Mode is independent of Industry. Ha: Transportation Mode is not independent of Industry. e11 = 120 )37)(85( = 26.21 e21 = 120 )37)(35( = 10.79 e12 = 120 )18)(85( = 12.75 e22 = 120 )18)(35( = 5.25 e13 = 120 )65)(85( = 46.04 e23 = 120 )65)(35( = 18.96
  • 21.
    Chapter 12: Analysisof Categorical Data 21 Transportation Mode Industry Air Train Truck 85 35 120 Publishing (26.21) 32 (12.75) 12 (46.04) 41 Comp.Hard. (10.79) 5 (5.25) 6 (18.96) 24 37 18 65 χ2 = 21.26 )21.2632( 2 − + 75.12 )75.1212( 2 − + 04.46 )04.4641( 2 − + 79.10 )79.105( 2 − + 25.5 )25.56( 2 − + 96.18 )96.1824( 2 − = 1.28 + .04 + .55 + 3.11 + .11 + 1.34 = 6.43 α = .05, df = (c-1)(r-1) = (3-1)(2-1) = 2 χ2 .05,2 = 5.99147 Since the observed χ2 = 6.431 > χ2 .05,2 = 5.99147, the decision is to reject the null hypothesis. Transportation mode is not independent of industry. 12.16 Number of Bedrooms Number of Stories < 2 3 > 4 274 575 1 116 101 57 2 90 325 160 206 426 217 849 H0: Number of Stories is independent of number of bedrooms. Ha: Number of Stories is not independent of number of bedrooms. e11 = 849 )206)(274( = 66.48 e21 = 849 )206)(575( = 139.52 e12 = 849 )426)(274( = 137.48 e22 = 849 )426)(575( = 288.52
  • 22.
    Chapter 12: Analysisof Categorical Data 22 e13 = 849 )217)(274( = 70.03 e23 = 849 )217)(575( = 146.97 χ2 = 52.139 )52.13990( 2 − + 48.137 )48.137101( 2 − + 03.70 )03.7057( 2 − + 52.139 )52.13990( 2 − + 52.288 )52.288325( 2 − + 97.146 )97.146160( 2 − = χ2 = 36.89 + 9.68 + 2.42 + 17.58 + 4.61 + 1.16 = 72.34 α = .10 df = (c-1)(r-1) = (3-1)(2-1) = 2 χ2 .10,2 = 4.60517 Since the observed χ2 = 72.34 > χ2 .10,2 = 4.60517, the decision is to reject the null hypothesis. Number of stories is not independent of number of bedrooms. 12.17 Mexican Citizens Type of Store Yes No 41 35 30 60 Dept. 24 17 Disc. 20 15 Hard. 11 19 Shoe 32 28 87 79 166 Ho: Citizenship is independent of store type Ha: Citizenship is not independent of store type e11 = 166 )87)(41( = 21.49 e31 = 166 )87)(30( = 15.72 e12 = 166 )79)(41( = 19.51 e32 = 166 )79)(30( = 14.28 e21 = 166 )87)(35( = 18.34 e41 = 166 )87)(60( = 31.45
  • 23.
    Chapter 12: Analysisof Categorical Data 23 e22 = 166 )79)(35( = 16.66 e42 = 166 )79)(60( = 28.55 Mexican Citizens Type of Store Yes No 41 35 30 60 Dept. (21.49) 24 (19.51) 17 Disc. (18.34) 20 (16.66) 15 Hard. (15.72) 11 (14.28) 19 Shoe (31.45) 32 (28.55) 28 87 79 166 χ2 = 49.21 )49.2124( 2 − + 51.19 )51.1917( 2 − + 34.18 )34.1820( 2 − + 66.16 )66.1615( 2 − + 72.15 )72.1511( 2 − + 28.14 )28.1419( 2 − + 45.31 )45.3132( 2 − + 55.28 )55.2828( 2 − = .29 + .32 + .15 + .17 + 1.42 + 1.56 + .01 + .01 = 3.93 α = .05, df = (c-1)(r-1) = (2-1)(4-1) = 3 χ2 .05,3 = 7.81473 Since the observed χ2 = 3.93 < χ2 .05,3 = 7.81473, the decision is to fail to reject the null hypothesis. Citizenship is independent of type of store.
  • 24.
    Chapter 12: Analysisof Categorical Data 24 12.18 α = .01, k = 7, df = 6 H0: The observed distribution is the same as the expected distribution Ha: The observed distribution is not the same as the expected distribution Use: ∑ − = e e f ff 2 02 )( χ critical χ2 .01,7 = 18.4753 fo fe (f0-fe)2 0 2 0 )( f ff e− 214 206 64 0.311 235 232 9 0.039 279 268 121 0.451 281 284 9 0.032 264 268 16 0.060 254 232 484 2.086 211 206 25 0.121 3.100 ∑ − = e e f ff 2 02 )( χ = 3.100 Since the observed value of χ2 = 3.1 < χ2 .01,7 = 18.4753, the decision is to fail to reject the null hypothesis. The observed distribution is not different from the expected distribution. 12.19 Variable 2 Variable 1 12 23 21 56 8 17 20 45 7 11 18 36 27 51 59 137 e11 = 11.00 e12 = 20.85 e13 = 24.12 e21 = 8.87 e22 = 16.75 e23 = 19.38
  • 25.
    Chapter 12: Analysisof Categorical Data 25 e31 = 7.09 e32 = 13.40 e33 = 15.50 χ2 = 04.11 )04.1112( 2 − + 85.20 )85.2023( 2 − + 12.24 )12.2421( 2 − + 87.8 )87.88( 2 − + 75.16 )75.1617( 2 − + 38.19 )38.1920( 2 − + 09.7 )09.77( 2 − + 40.13 )40.1311( 2 − + 50.15 )50.1518( 2 − = .084 + .222 + .403 + .085 + .004 + .020 + .001 + .430 + .402 = 1.652 df = (c-1)(r-1) = (2)(2) = 4 α = .05 χ2 .05,4 = 9.48773 Since the observed value of χ2 = 1.652 < χ2 .05,4 = 9.48773, the decision is to fail to reject the null hypothesis. 12.20 Location NE W S Customer Industrial 230 115 68 413 Retail 185 143 89 417 415 258 157 830 e11 = 830 )415)(413( = 206.5 e21 = 830 )415)(417( = 208.5 e12 = 830 )258)(413( = 128.38 e22 = 830 )258)(417( = 129.62 e13 = 830 )157)(413( = 78.12 e23 = 830 )157)(417( = 78.88
  • 26.
    Chapter 12: Analysisof Categorical Data 26 Location NE W S Customer Industrial (206.5) 230 (128.38) 115 (78.12) 68 413 Retail (208.5) 185 (129.62) 143 (78.88) 89 417 415 258 157 830 χ2 = 5.206 )5.206230( 2 − + 38.128 )38.128115( 2 − + 12.78 )12.7868( 2 − + 5.208 )5.208185( 2 − + 62.129 )62.129143( 2 − + 88.78 )88.7889( 2 − = 2.67 + 1.39 + 1.31 + 2.65 + 1.38 + 1.30 = 10.70 α = .10 and df = (c - 1)(r - 1) = (3 - 1)(2 - 1) = 2 χ2 .10,2 = 4.60517 Since the observed χ2 = 10.70 > χ2 .10,2 = 4.60517, the decision is to reject the null hypothesis. Type of customer is not independent of geographic region. 12.21 Cookie Type fo Chocolate Chip 189 Peanut Butter 168 Cheese Cracker 155 Lemon Flavored 161 Chocolate Mint 216 Vanilla Filled 165 Σfo = 1,054 Ho: Cookie Sales is uniformly distributed across kind of cookie. Ha: Cookie Sales is not uniformly distributed across kind of cookie. If cookie sales are uniformly distributed, then fe = 6 054,1 . 0 = ∑ kindsno f = 175.67
  • 27.
    Chapter 12: Analysisof Categorical Data 27 fo fe 0 2 0 )( f ff e− 189 175.67 1.01 168 175.67 0.33 155 175.67 2.43 161 175.67 1.23 216 175.67 9.26 165 175.67 0.65 14.91 The observed χ2 = 14.91 α = .05 df = k - 1 = 6 - 1 = 5 χ2 .05,5 = 11.0705 Since the observed χ2 = 14.91 > χ2 .05,5 = 11.0705, the decision is to reject the null hypothesis. Cookie Sales is not uniformly distributed by kind of cookie. 12.22 Gender M F Bought Car Y 207 65 272 N 811 984 1,795 1,018 1,049 2,067 Ho: Purchasing a car or not is independent of gender. Ha: Purchasing a car or not is not independent of gender. e11 = 067,2 )018,1)(272( = 133.96 e12 = 067,2 )049,1)(27( = 138.04 e21 = 067,2 )018,1)(795,1( = 884.04 e22 = 067,2 )049,1)(795,1( = 910.96
  • 28.
    Chapter 12: Analysisof Categorical Data 28 Gender M F Bought Car Y (133.96) 207 (138.04) 65 272 N (884.04) 811 (910.96) 984 1,795 1,018 1,049 2,067 χ2 = 96.133 )96.133207( 2 − + 04.138 )04.13865( 2 − + 04.884 )04.884811( 2 − + 96.910 )96.910984( 2 − = 39.82 + 38.65 + 6.03 + 5.86 = 90.36 α = .05 df = (c-1)(r-1) = (2-1)(2-1) = 1 χ2 .05,1 = 3.841 Since the observed χ2 = 90.36 > χ2 .05,1 = 3.841, the decision is to reject the null hypothesis. Purchasing a car is not independent of gender. 12.23 Arrivals fo (fo)(Arrivals) 0 26 0 1 40 40 2 57 114 3 32 96 4 17 68 5 12 60 6 8 48 Σfo = 192 Σ(fo)(arrivals) = 426 λ = 192 426))(( 0 0 = ∑ ∑ f arrivalsf = 2.2 Ho: The observed frequencies are Poisson distributed. Ha: The observed frequencies are not Poisson distributed.
  • 29.
    Chapter 12: Analysisof Categorical Data 29 Arrivals Probability fe 0 .1108 (.1108)(192) = 21.27 1 .2438 (.2438)(192) = 46.81 2 .2681 51.48 3 .1966 37.75 4 .1082 20.77 5 .0476 9.14 6 .0249 4.78 fo fe 0 2 0 )( f ff e− 26 21.27 1.05 40 46.81 2.18 57 51.48 0.59 32 37.75 0.88 17 20.77 0.68 12 9.14 0.89 8 4.78 2.17 8.44 Observed χ2 = 8.44 α = .05 df = k - 2 = 7 - 2 = 5 χ2 .05,5 = 11.0705 Since the observed χ2 = 8.44 < χ2 .05,5 = 11.0705, the decision is to fail to reject the null hypothesis. There is not enough evidence to reject the claim that the observed frequency of arrivals is Poisson distributed.
  • 30.
    Chapter 12: Analysisof Categorical Data 30 12.24 Ho: The distribution of observed frequencies is the same as the distribution of expected frequencies. Ha: The distribution of observed frequencies is not the same as the distribution of expected frequencies. Soft Drink fo proportions fe 0 2 0 )( f ff e− Classic Coke 361 .206 (.206)(1726) = 355.56 0.08 Pepsi 272 .145 (.145)(1726) = 250.27 1.89 Diet Coke 192 .085 146.71 13.98 Mt. Dew 121 .063 108.74 1.38 Dr. Pepper 94 .059 101.83 0.60 Sprite 102 .062 107.01 0.23 Others 584 .380 655.86 7.87 ∑fo = 1,726 26.03 Calculated χ2 = 26.03 α = .05 df = k - 1 = 7 - 1 = 6 χ2 .05,6 = 12.5916 Since the observed χ2 = 26.03 > χ2 .05,6 = 12.5916, the decision is to reject the null hypothesis. The observed frequencies are not distributed the same as the expected frequencies from the national poll. 12.25 Position Manager Programmer Operator Systems Analyst Years 0-3 6 37 11 13 67 4-8 28 16 23 24 91 > 8 47 10 12 19 88 81 63 46 56 246
  • 31.
    Chapter 12: Analysisof Categorical Data 31 e11 = 246 )81)(67( = 22.06 e23 = 246 )46)(91( = 17.02 e12 = 246 )63)(67( = 17.16 e24 = 246 )56)(91( = 20.72 e13 = 246 )46)(67( = 12.53 e31 = 246 )81)(88( = 28.98 e14 = 246 )56)(67( = 15.25 e32 = 246 )63)(88( = 22.54 e21 = 246 )81)(91( = 29.96 e33 = 246 )46)(88( = 16.46 e22 = 246 )63)(91( = 23.30 e34 = 246 )56)(88( = 20.03 Position Manager Programmer Operator Systems Analyst Years 0-3 (22.06) 6 (17.16) 37 (12.53) 11 (15.25) 13 67 4-8 (29.96) 28 (23.30) 16 (17.02) 23 (20.72) 24 91 > 8 (28.98) 47 (22.54) 10 (16.46) 12 (20.03) 19 88 81 63 46 56 246 χ2 = 06.22 )06.226( 2 − + 16.17 )16.1737( 2 − + 53.12 )53.1211( 2 − + 25.15 )25.1513( 2 − + 96.29 )96.2928( 2 − + 30.23 )30.2316( 2 − + 02.17 )02.1723( 2 − + 72.20 )72.2024( 2 − + 98.28 )98.2847( 2 − + 54.22 )54.2210( 2 − + 46.16 )46.1612( 2 − + 03.20 )03.2019( 2 − = 11.69 + 22.94 + .19 + .33 + .13 + 2.29 + 2.1 + .52 + 11.2 + 6.98 + 1.21 + .05 = 59.63
  • 32.
    Chapter 12: Analysisof Categorical Data 32 α = .01 df = (c-1)(r-1) = (4-1)(3-1) = 6 χ2 .01,6 = 16.8119 Since the observed χ2 = 59.63 > χ2 .01,6 = 16.8119, the decision is to reject the null hypothesis. Position is not independent of number of years of experience. 12.26 H0: p = .43 n = 315 α =.05 Ha: p ≠ .43 x = 120 α/2 = .025 fo fe 0 2 0 )( f ff e− More Work, More Business 120 (.43)(315) = 135.45 1.76 Others 195 (.57)(315) = 179.55 1.33 Total 315 315.00 3.09 The calculated value of χ2 is 3.09 α = .05 and α/2 = .025 df = k - 1 = 2 - 1 = 1 χ2 .025,1 = 5.02389 Since χ2 = 3.09 < χ2 .025,1 = 5.02389, the decision is to fail to reject the null hypothesis. 12.27 Type of College or University Community College Large University Small College Number of Children 0 25 178 31 234 1 49 141 12 202 2 31 54 8 93 >3 22 14 6 42 127 387 57 571 Ho: Number of Children is independent of Type of College or University. Ha: Number of Children is not independent of Type of College or University.
  • 33.
    Chapter 12: Analysisof Categorical Data 33 e11 = 571 )127)(234( = 52.05 e31 = 571 )127)(93( = 20.68 e12 = 571 )387)(234( = 158.60 e32 = 571 )387)(193( = 63.03 e13 = 571 )57)(234( = 23.36 e33 = 571 )57)(93( = 9.28 e21 = 571 )127)(202( = 44.93 e41 = 571 )127)(42( = 9.34 e22 = 571 )387)(202( = 136.91 e42 = 571 )387)(42( = 28.47 e23 = 571 )57)(202( = 20.16 e43 = 571 )57)(42( = 4.19 Type of College or University Community College Large University Small College Number of Children 0 (52.05) 25 (158.60) 178 (23.36) 31 234 1 (44.93) 49 (136.91) 141 (20.16) 12 202 2 (20.68) 31 (63.03) 54 (9.28) 8 93 >3 (9.34) 22 (28.47) 14 (4.19) 6 42 127 387 57 571 χ2 = 05.52 )05.5225( 2 − + 6.158 )6.158178( 2 − + 36.23 )36.2331( 2 − + 93.44 )93.4449( 2 − + 91.136 )91.136141( 2 − + 16.20 )16.2012( 2 − + 68.20 )68.2031( 2 − + 03.63 )03.6354( 2 − + 28.9 )28.98( 2 − + 34.9 )34.922( 2 − + 47.28 )47.2814( 2 − + 19.4 )19.46( 2 − =
  • 34.
    Chapter 12: Analysisof Categorical Data 34 14.06 + 2.37 + 2.50 + 0.37 + 0.12 + 3.30 + 5.15 + 1.29 + 0.18 + 17.16 + 7.35 + 0.78 = 54.63 α = .05, df= (c - 1)(r - 1) = (3 - 1)(4 - 1) = 6 χ2 .05,6 = 12.5916 Since the observed χ2 = 54.63 > χ2 .05,6 = 12.5916, the decision is to reject the null hypothesis. Number of children is not independent of type of College or University. 12.28 The observed chi-square is 30.18 with a p-value of .0000043. The chi-square goodness-of-fit test indicates that there is a significant difference between the observed frequencies and the expected frequencies. The distribution of responses to the question are not the same for adults between 21 and 30 years of age as they are to others. Marketing and sales people might reorient their 21 to 30 year old efforts away from home improvement and pay more attention to leisure travel/vacation, clothing, and home entertainment. 12.29 The observed chi-square value for this test of independence is 5.366. The associated p-value of .252 indicates failure to reject the null hypothesis. There is not enough evidence here to say that color choice is dependent upon gender. Automobile marketing people do not have to worry about which colors especially appeal to men or to women because car color is independent of gender. In addition, design and production people can determine car color quotas based on other variables.