9 chi square

1Slide
© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
John Loucks
St. Edward’s
University
...
...
...
..
SLIDES .BY

2Slide
Chapter 12
Comparing Multiple Proportions,
Test of Independence and Goodness of Fit
 Testing the Equality of Population Proportions
for Three or More Populations
 Goodness of Fit Test
 Test of Independence

3Slide
Comparing Multiple Proportions,
Test of Independence and Goodness of Fit
 In this chapter we introduce three additional
hypothesis-testing procedures.
 The test statistic and the distribution used are based
on the chi-square (c2) distribution.
 In all cases, the data are categorical.

4Slide
Using the notation
p1 = population proportion for population 1
p2 = population proportion for population 2
and pk = population proportion for population k
H0: p1 = p2 = . . . = pk
Ha: Not all population proportions are equal
Testing the Equality of Population Proportions
The hypotheses for the equality of population
proportions for k > 3 populations are as follows:

5Slide
 If H0 cannot be rejected, we cannot detect a
difference among the k population proportions.
 If H0 can be rejected, we can conclude that not all k
population proportions are equal.
 Further analyses can be done to conclude which
population proportions are significantly different
from others.

6Slide
 Example: Finger Lakes Homes
Finger Lakes Homes manufactures three models of
prefabricated homes, a two-story colonial, a log cabin,
and an A-frame. To help in product-line planning,
management would like to compare the customer
satisfaction with the three home styles.
p1 = proportion likely to repurchase a Colonial
for the population of Colonial owners
p2 = proportion likely to repurchase a Log Cabin
for the population of Log Cabin owners
p3 = proportion likely to repurchase an A-Frame
for the population of A-Frame owners

7Slide
We begin by taking a sample of owners from each of
the three populations.
Each sample contains categorical data indicating
whether the respondents are likely or not likely to
repurchase the home.

8Slide
Home Owner
Colonial Log A-Frame Total
Likely to Yes 100 81 83 264
Repurchase No 35 20 41 96
Total 135 101 124 360
 Observed Frequencies (sample results)

9Slide
Next, we determine the expected frequencies under
the assumption H0 is correct.
If a significant difference exists between the observed
and expected frequencies, H0 can be rejected.
(Row Total)(Column Total)
Total Sample Size
ij
i j
e 
Expected Frequencies
Under the Assumption H0 is True

10Slide
Home Owner
Colonial Log A-Frame Total
Likely to Yes 99.00 74.07 90.93 264
Repurchase No 36.00 26.93 33.07 96
Total 135 101 124 360
 Expected Frequencies (computed)

11Slide
2
2
( )ij ij
i j ij
f e
e
c

 
Next, compute the value of the chi-square test statistic.
Note: The test statistic has a chi-square distribution with
k – 1 degrees of freedom, provided the expected
frequency is 5 or more for each cell.
fij = observed frequency for the cell in row i and column j
eij = expected frequency for the cell in row i and column j
under the assumption H0 is true
where:

12Slide
Obs. Exp. Sqd. Sqd. Diff. /
Likely to Home Freq. Freq. Diff. Diff. Exp. Freq.
Repurch. Owner fij eij (fij - eij) (fij - eij)2
(fij - eij)2
/eij
Yes Colonial 97 97.50 -0.50 0.2500 0.0026
Yes Log Cab. 83 72.94 10.06 101.1142 1.3862
Yes A-Frame 80 89.56 -9.56 91.3086 1.0196
No Colonial 38 37.50 0.50 0.2500 0.0067
No Log Cab. 18 28.06 -10.06 101.1142 3.6041
No A-Frame 44 34.44 9.56 91.3086 2.6509
Total 360 360 c2
= 8.6700
 Computation of the Chi-Square Test Statistic.

13Slide
where  is the significance level and
there are k - 1 degrees of freedom
p-value approach:
Critical value approach:
Reject H0 if p-value < 
 Rejection Rule
2 2
c cReject H0 if

14Slide
 Rejection Rule (using  = .05)
c2
5.991
Do Not Reject H0 Reject H0
With  = .05 and
k - 1 = 3 - 1 = 2
degrees of freedom
Reject H0 if p-value < .05 or c2 > 5.991

15Slide
 Conclusion Using the p-Value Approach
The p-value <  . We can reject the null hypothesis.
Because c2 = 8.670 is between 9.210 and 7.378, the
area in the upper tail of the distribution is between
.01 and .025.
Area in Upper Tail .10 .05 .025 .01 .005
c2 Value (df = 2) 4.605 5.991 7.378 9.210 10.597
Actual p-value is .0131

16Slide
We have concluded that the population proportions
for the three populations of home owners are not
equal.
To identify where the differences between
population proportions exist, we will rely on a
multiple comparisons procedure.

17Slide
Multiple Comparisons Procedure
We begin by computing the three sample proportions.
We will use a multiple comparison procedure known
as the Marascuillo procedure.
1
100Colonial: .741
135
p  
2
81Log Cabin: .802
101
p  
3
83A-Frame: .669
124
p  

18Slide
 Marascuillo Procedure
We compute the absolute value of the pairwise
difference between sample proportions.
1 2 .741 .802 .061p p   
1 3 .741 .669 .072p p   
2 3 .802 .669 .133p p   
Colonial and Log Cabin:
Colonial and A-Frame:
Log Cabin and A-Frame:

19Slide
 Critical Values for the Marascuillo Pairwise Comparison
For each pairwise comparison compute a critical value
as follows:
2
; 1
(1 )(1 ) j ji i
ij k
i j
p pp p
CV
n n
c 

 
For  = .05 and k = 3: c2 = 5.991

20Slide
i jp p CVij
Significant if
Pairwise Comparison i jp p > CVij
Colonial vs. Log Cabin
Colonial vs. A-Frame
Log Cabin vs. A-Frame
.061
.072
.133
.0923
.0971
.1034
Not Significant
Not Significant
Significant
 Pairwise Comparison Tests

21Slide
Test of Independence
e
i j
ij 
(Row Total)(Column Total)
Sample Size
1. Set up the null and alternative hypotheses.
2. Select a random sample and record the observed
frequency, fij , for each cell of the contingency table.
3. Compute the expected frequency, eij , for each cell.
H0: The column variable is independent of
the row variable
Ha: The column variable is not independent
of the row variable

22Slide
c2
2



( )f e
e
ij ij
ijji
5. Determine the rejection rule.
Reject H0 if p -value <  or .
2 2
c c
4. Compute the test statistic.
where  is the significance level and,
with n rows and m columns, there are
(n - 1)(m - 1) degrees of freedom.

23Slide
Each home sold by Finger Lakes Homes can be
classified according to price and to style. Finger
Lakes’ manager would like to determine if the
price of the home and the style of the home are
independent variables.
 Example: Finger Lakes Homes (B)

24Slide
Price Colonial Log Split-Level A-Frame
The number of homes sold for each model and
price for the past two years is shown below. For
convenience, the price of the home is listed as either
$99,000 or less or more than $99,000.
> $99,000 12 14 16 3
< $99,000 18 6 19 12
 Example: Finger Lakes Homes (B)

25Slide
 Hypotheses
H0: Price of the home is independent of the
style of the home that is purchased
Ha: Price of the home is not independent of the
style of the home that is purchased

26Slide
 Expected Frequencies
Price Colonial Log Split-Level A-Frame Total
< $99K
> $99K
Total 30 20 35 15 100
12 14 16 3 45
18 6 19 12 55

27Slide
 Rejection Rule
2
.05 7.815c With  = .05 and (2 - 1)(4 - 1) = 3 d.f.,
Reject H0 if p-value < .05 or c2 > 7.815
c2
2 2 2
18 16 5
16 5
6 11
11
3 6 75
6 75




 
( . )
.
( )
. .
( . )
.
.
= .1364 + 2.2727 + . . . + 2.0833 = 9.149
 Test Statistic

28Slide
Because c2 = 9.145 is between 7.815 and 9.348, the
.05 and .025.
c2 Value (df = 3) 6.251 7.815 9.348 11.345 12.838

29Slide
 Conclusion Using the Critical Value Approach
We reject, at the .05 level of significance,
the assumption that the price of the home is
independent of the style of home that is
purchased.
c2 = 9.145 > 7.815

30Slide
Goodness of Fit Test:
Multinomial Probability Distribution
1. State the null and alternative hypotheses.
H0: The population follows a multinomial
distribution with specified probabilities
for each of the k categories
Ha: The population does not follow a
multinomial distribution with specified
probabilities for each of the k categories

31Slide
2. Select a random sample and record the observed
frequency, fi , for each of the k categories.
3. Assuming H0 is true, compute the expected
frequency, ei , in each category by multiplying the
category probability by the sample size.

32Slide
c2
2
1




( )f e
e
i i
ii
k
4. Compute the value of the test statistic.
Note: The test statistic has a chi-square distribution
with k – 1 df provided that the expected frequencies
are 5 or more for all categories.
fi = observed frequency for category i
ei = expected frequency for category i
k = number of categories
where:

33Slide
where  is the significance level and
there are k - 1 degrees of freedom
p-value approach:
Critical value approach:
Reject H0 if p-value < 
5. Rejection rule:
2 2
c cReject H0 if

34Slide
Multinomial Distribution Goodness of Fit Test
 Example: Finger Lakes Homes (A)
Finger Lakes Homes manufactures four models of
prefabricated homes, a two-story colonial, a log cabin,
a split-level, and an A-frame. To help in production
planning, management would like to determine if
previous customer purchases indicate that there is a
preference in the style selected.

35Slide
Split- A-
Model Colonial Log Level Frame
# Sold 30 20 35 15
The number of homes sold of each model for 100
sales over the past two years is shown below.
 Example: Finger Lakes Homes (A)

36Slide
 Hypotheses
where:
pC = population proportion that purchase a colonial
pL = population proportion that purchase a log cabin
pS = population proportion that purchase a split-level
pA = population proportion that purchase an A-frame
H0: pC = pL = pS = pA = .25
Ha: The population proportions are not
pC = .25, pL = .25, pS = .25, and pA = .25

37Slide
 Rejection Rule
c2
7.815
Do Not Reject H0 Reject H0
With  = .05 and
k - 1 = 4 - 1 = 3
degrees of freedom
Reject H0 if p-value < .05 or c2 > 7.815.

38Slide
 Expected Frequencies
 Test Statistic
c2
2 2 2 2
30 25
25
20 25
25
35 25
25
15 25
25







( ) ( ) ( ) ( )
e1 = .25(100) = 25 e2 = .25(100) = 25
e3 = .25(100) = 25 e4 = .25(100) = 25
= 1 + 1 + 4 + 4
= 10

39Slide
Because c2 = 10 is between 9.348 and 11.345, the
.025 and .01.
c2 Value (df = 3) 6.251 7.815 9.348 11.345 12.838

40Slide
 Conclusion Using the Critical Value Approach
We reject, at the .05 level of significance,
the assumption that there is no home style
preference.
c2 = 10 > 7.815

41Slide
Goodness of Fit Test: Normal Distribution
1. State the null and alternative hypotheses.
3. Compute the expected frequency, ei , for each interval.
(Multiply the sample size by the probability of a
normal random variable being in the interval.
2. Select a random sample and
a. Compute the mean and standard deviation.
b. Define intervals of values so that the expected
frequency is at least 5 for each interval.
c. For each interval, record the observed frequencies
H0: The population has a normal distribution
Ha: The population does not have a normal distribution

42Slide
4. Compute the value of the test statistic.
c2
2
1




( )f e
e
i i
ii
k
c2
2
1




( )f e
e
i i
ii
k
5. Reject H0 if (where  is the significance level
and there are k - 3 degrees of freedom).
2 2
c c

43Slide
 Example: IQ Computers
IQ Computers (one better than HP?) manufactures
and sells a general purpose microcomputer. As part
of a study to evaluate sales personnel, management
wants to determine, at a .05 significance level, if the
annual sales volume (number of units sold by a
salesperson) follows a normal probability
distribution.

44Slide
A simple random sample of 30 of the salespeople
was taken and their numbers of units sold are listed
below.
 Example: IQ Computers
(mean = 71, standard deviation = 18.54)
33 43 44 45 52 52 56 58 63 64
64 65 66 68 70 72 73 73 74 75
83 84 85 86 91 92 94 98 102 105

45Slide
 Hypotheses
Ha: The population of number of units sold
does not have a normal distribution with
mean 71 and standard deviation 18.54.
H0: The population of number of units sold
has a normal distribution with mean 71
and standard deviation 18.54.

46Slide
 Interval Definition
To satisfy the requirement of an expected
frequency of at least 5 in each interval we will
divide the normal distribution into 30/5 = 6
equal probability intervals.

47Slide
 Interval Definition
Areas
= 1.00/6
= .1667
7153.02
71  .43(18.54) = 63.03 78.97
88.98 = 71 + .97(18.54)

48Slide
 Observed and Expected Frequencies
1
-2
1
0
-1
1
5
5
5
5
5
5
30
6
3
6
5
4
6
30
Less than 53.02
53.02 to 63.03
63.03 to 71.00
71.00 to 78.97
78.97 to 88.98
More than 88.98
i fi ei fi - ei
Total

49Slide
2 2 2 2 2 2
2 (1) ( 2) (1) (0) ( 1) (1)
1.600
5 5 5 5 5 5
c
 
      
 Test Statistic
With  = .05 and k - p - 1 = 6 - 2 - 1 = 3 d.f.
(where k = number of categories and p = number
of population parameters estimated), 2
.05 7.815c 
Reject H0 if p-value < .05 or c2 > 7.815.
 Rejection Rule

50Slide
The p-value >  . We cannot reject the null hypothesis.
There is little evidence to support rejecting the
assumption the population is normally distributed with
 = 71 and  = 18.54.
Because c2 = 1.600 is between .584 and 6.251 in the
Chi-Square Distribution Table, the area in the upper tail
of the distribution is between .90 and .10.
c2 Value (df = 3) .584 6.251 7.815 9.348 11.345

51Slide
End of Chapter 12

9 chi square

Recommended

Recommended

More Related Content

Similar to 9 chi square

Similar to 9 chi square (20)

More from AASHISHSHRIVASTAV1

More from AASHISHSHRIVASTAV1 (20)

Recently uploaded

Recently uploaded (20)

9 chi square