• To compare more than two groups we use the following tests:
1. Chi Square Tests
2. ANOVA Tests
Introduction
• In Section 7 we learned how to compare the percentages of two groups for similarity.
• When you wish to assess or compare the percentages of more than 2 groups for similarity you will employ the Chi-Square test.
There are two types of Chi-Square tests:
1. Goodness of Fit – does the distribution of the data fit a
specific pattern?
2. Test of Independence – is there any difference in the
distribution of data for two or more groups?
Chi Square Tests
Example:
You sample 100 people in NY arrested for drunk driving last year and note their age.
Age        16 - 25   26 - 35   36 - 45   46 - 55   56+
Arrested   32        25        19        16        8
Based on this data can you conclude the proportion of people arrested is different or the same for each age group?
The formal hypotheses we are testing are:
• H0: Drunk drivers are distributed equally across all age categories
• H1: Drunk drivers are not distributed equally across all age categories
• To decide whether to accept H0 or reject H0 in favor of H1, we calculate the goodness of fit “Test Statistic” as shown on the next slide.
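A minimal sketch of the goodness of fit calculation for the drunk driving table, under two assumptions that are not stated on this slide: the test statistic is the usual Pearson chi-square sum of (observed – expected)^2 / expected, and the decision uses the same "1 – CHIDIST greater than 90%" rule applied later for the test of independence. The helper chi2_right_tail is a hypothetical stand-in for Excel's CHIDIST and only handles even degrees of freedom.

```python
import math

# Observed arrests by age group, taken from the table above.
observed = [32, 25, 19, 16, 8]

# Under H0 the 100 arrests are spread equally over the 5 age groups.
expected = [sum(observed) / len(observed)] * len(observed)

# Pearson goodness of fit statistic: sum of (O - E)^2 / E over all categories.
ts = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom: number of categories minus 1.
df = len(observed) - 1


def chi2_right_tail(x, df):
    """Right-tail chi-square probability (the quantity CHIDIST returns).

    Assumption: closed form used here is valid for even df only.
    """
    if df % 2 != 0:
        raise ValueError("this sketch only handles even degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(
        half ** i / math.factorial(i) for i in range(df // 2)
    )


p_value = chi2_right_tail(ts, df)
confidence = 1 - p_value

print(f"TS = {ts:.2f}, df = {df}, 1 - CHIDIST = {confidence:.4f}")
print("Reject H0" if confidence > 0.90 else "Do not reject H0")
```

With the counts above the statistic works out to 330 / 20 = 16.5 on 4 degrees of freedom, so under the assumed 90% rule H0 would be rejected.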
Goodness of Fit I
Goodness of Fit II
Goodness of Fit III
• Note that our goodness of fit test does not always have to be
a test for an equal distribution across categories.
• For example, reconsidering our drunk driving example, we could have also tested whether the distribution is the same as last year's or the same as in another state.
• Let’s consider another example here.
Goodness of Fit IV
Example:
In 2000, American Express asked a sample of 400 women over 45 who made the financial decisions in their household.
In 2010, they asked 400 women again.
Is the distribution of responses in 2010 the same as in 2000, or has it shifted?
Goodness of Fit V
Goodness of Fit VI
• If you are instead interested in determining if there is a difference in data distributions for different groups, then we conduct a test of independence.
• This is the second type of Chi-Square test.
Test of Independence I
Example:
A sample of 300 adults were asked if they favor giving high school teachers more freedom to punish students for acting violently.
Results are shown by gender:
         Favor   Against   No Opinion   Total
Men      93      70        12           175
Women    87      32        6            125
Total    180     102       18           300
A natural question here is whether opinions differ by gender.
The formal hypotheses we are testing are:
• H0: Opinions do not differ by gender – gender and opinion are independent of each other
• H1: Opinions do differ by gender – gender and opinion are dependent on one another
To decide whether to accept H0 or reject H0 in favor of H1, we calculate the “Test Statistic” associated with the test of independence.
Test of Independence II
To calculate our test statistic we follow the steps outlined below.
Step 1/2:
– First determine the numbers we would expect in each cell if opinion and gender were independent.
– If they were independent we would expect to see the same percent of men answering in favor as women, the same percent against, and so on.
– We calculate this as shown below. For example, if opinion is independent of gender then we would expect 60% of the males to be in favor, or .60 x 175 = 105.
Expected   Favor             Against            No Opinion         Total
Male       .60 x 175 = 105   .34 x 175 = 59.5   .06 x 175 = 10.5   175
Female     .60 x 125 = 75    .34 x 125 = 42.5   .06 x 125 = 7.5    125
Total      180/300 = 60.0%   102/300 = 34.0%    18/300 = 6.0%      300
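The expected counts in Step 1/2 can be sketched in a few lines of Python. This only restates the slide's arithmetic (column percentage times row total); the variable names are illustrative.

```python
# Observed counts from the survey table
# (rows: men, women; columns: favor, against, no opinion).
observed = [
    [93, 70, 12],
    [87, 32, 6],
]

row_totals = [sum(row) for row in observed]         # men, women
col_totals = [sum(col) for col in zip(*observed)]   # favor, against, no opinion
grand_total = sum(row_totals)

# Expected count in a cell = column share of the whole sample x row total,
# e.g. (180 / 300) x 175 for men in favor.
expected = [
    [row_total * col_total / grand_total for col_total in col_totals]
    for row_total in row_totals
]

for label, row in zip(["Male", "Female"], expected):
    print(label, [round(value, 1) for value in row])
```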
Test of Independence III
Step 2/2: We then calculate our test statistic as follows:
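A minimal sketch of Step 2/2 for the gender example, assuming the test statistic is the usual Pearson chi-square sum of (observed – expected)^2 / expected over all six cells, and assuming the decision rule used later in this section (reject H0 if 1 – CHIDIST is greater than 90%, with (R-1)(C-1) degrees of freedom). The right-tail formula exp(-TS/2) is a closed form that holds only for 2 degrees of freedom.

```python
import math

# Observed counts (rows: men, women; columns: favor, against, no opinion).
observed = [[93, 70, 12], [87, 32, 6]]

# Expected counts under independence, as computed in Step 1/2.
expected = [[105.0, 59.5, 10.5], [75.0, 42.5, 7.5]]

# Pearson chi-square statistic summed over every cell of the table.
ts = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) x (columns - 1) = 1 x 2 = 2.
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Right-tail chi-square probability; exp(-x / 2) is exact for df = 2 only.
p_value = math.exp(-ts / 2)
confidence = 1 - p_value

print(f"TS = {ts:.2f}, df = {df}, 1 - CHIDIST = {confidence:.3f}")
print("Reject H0" if confidence > 0.90 else "Do not reject H0")
```

Under these assumptions the statistic is about 8.25 and 1 – CHIDIST is about 98%, which is above the 90% threshold.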
Test of Independence IV
Test of Independence V
Let’s consider another example:
A sample of students across the US were asked their GPA and if they binge drink regularly (defined as 5+ drinks at one time more than three times per week).
Results are shown below by GPA.
              Binge   Do Not Binge   Total
High GPA      1,260   3,588          4,848
Average GPA   2,157   4,186          6,343
Low GPA       441     497            938
Total         3,858   8,271          12,129
Test of Independence VI
Let’s consider another example (continued):
• Does the percent that binge differ by quality of student?
The formal hypotheses we are testing are:
– H0: Percent that binge or do not binge are the same by quality of student – showing independence
– H1: Percent that binge or do not binge are not the same by quality of student – showing dependence
To decide whether to accept H0 or reject H0 in favor of H1, we calculate the “Test Statistic” associated with the test of independence.
Test of Independence VII
• To calculate our test statistic we first calculate the number of students expected to fall into each cell if there is no relationship between binge drinking and quality of student, as shown below. For example, if binge drinking is independent of student quality then we would expect 31.81% of the best students to binge, or .3181 x 4,848 = 1,542.05.
Expected      Binge                   Do Not Binge            Total
High GPA      1,542.05                3,305.95                4,848
Average GPA   2,017.59                4,325.41                6,343
Low GPA       298.36                  639.64                  938
Total         3,858/12,129 = 31.81%   8,271/12,129 = 68.19%   12,129
Test of Independence VIII
Next we calculate our test statistic as follows:
• We reject H0 and conclude H1 is true if the value of the Excel function 1 – CHIDIST(TS Value, (R-1)(C-1)) is greater than 90%, where R and C are the number of rows and columns in the table.
• In this example 1 – CHIDIST(189.78, 2) = 1 – 0 = 1 or 100%
• Hence we reject H0 and conclude that binge drinking is dependent on quality of student.
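The binge drinking test can be reproduced end to end with a short Python sketch. It assumes the test statistic is the Pearson chi-square sum of (observed – expected)^2 / expected, which agrees with the 189.78 quoted above, and it replaces the Excel right-tail function with the closed form exp(-TS/2), which is valid only because this table has 2 degrees of freedom.

```python
import math

# Observed counts (rows: high, average, low GPA; columns: binge, do not binge).
observed = [
    [1260, 3588],
    [2157, 4186],
    [441, 497],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected counts if binge drinking is independent of student quality.
expected = [
    [row_total * col_total / grand_total for col_total in col_totals]
    for row_total in row_totals
]

# Pearson chi-square statistic over all six cells.
ts = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (R - 1)(C - 1) = 2 x 1 = 2.
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Right-tail chi-square probability; exp(-x / 2) is exact for df = 2 only.
p_value = math.exp(-ts / 2)
confidence = 1 - p_value

print(f"TS = {ts:.2f}, df = {df}, 1 - CHIDIST = {confidence:.4f}")
print("Reject H0" if confidence > 0.90 else "Do not reject H0")
```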
Test of Independence IX
• In Section 7 we learned how to compare the averages or means
of two groups for similarity.
• When you wish to assess or compare the averages or means
of more than 2 groups for similarity you will conduct an
Analysis of Variance or ANOVA test.
Analysis of Variance (ANOVA) I
Example:
Fifteen fourth graders were selected and assigned to 3 groups in order to assess 3 different math teaching methods. At the end of the semester, each student was given a common math test.
Results of the tests by group follow.
Method 1   Method 2   Method 3
48         55         84
73         85         68
51         70         95
61         69         74
87         90         67
Based on this data can you conclude any difference in the three teaching methods based on these test scores?
Analysis of Variance (ANOVA) II
Our formal hypothesis we are testing is:
– H0: Teaching methods do not differ
– H1: There is a difference in teaching methods
To decide whether to accept H0 or reject H0 in favor of H1, we
calculate the “Test Statistic” associated with the ANOVA test.
This entails six steps.
Analysis of Variance (ANOVA) III
• Step 1:
– Calculate the following for each teaching method:
Analysis of Variance (ANOVA) IV
Analysis of Variance (ANOVA) V
• Step 2:
–
Analysis of Variance (ANOVA) VI
• Step 2 (continued)
Analysis of Variance (ANOVA) VII
• Step 3:
–
• Step 4:
• Calculate MSB = SSB / (k – 1)
where k = # of categories
MSB = 492/2
MSB = 246
• Step 5:
• Calculate MSW = SSW / (n – k)
where n = total number of observations in study
MSW = 2,384/12
MSW = 198.7
Analysis of Variance (ANOVA) VIII
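A minimal sketch of how the values SSB = 492 and SSW = 2,384 used above can be obtained. It makes two assumptions. First, SSB and SSW are the standard between-group and within-group sums of squares. Second, the fifteen scores are listed method by method, five per method. That reading reproduces the slide's numbers.

```python
from statistics import mean

# Test scores by teaching method (assumed grouping: five scores per method).
groups = {
    "Method 1": [48, 73, 51, 61, 87],
    "Method 2": [55, 85, 70, 69, 90],
    "Method 3": [84, 68, 95, 74, 67],
}

k = len(groups)                                     # number of categories
n = sum(len(scores) for scores in groups.values())  # total observations
grand_mean = mean(x for scores in groups.values() for x in scores)

# Between-group sum of squares: group size x squared distance of the
# group mean from the grand mean, summed over groups.
ssb = sum(
    len(scores) * (mean(scores) - grand_mean) ** 2
    for scores in groups.values()
)

# Within-group sum of squares: squared distance of each score from
# its own group mean, summed over all scores.
ssw = sum(
    (x - mean(scores)) ** 2
    for scores in groups.values()
    for x in scores
)

msb = ssb / (k - 1)   # Step 4
msw = ssw / (n - k)   # Step 5

print(f"SSB = {ssb:.1f}, SSW = {ssw:.1f}")
print(f"MSB = {msb:.1f}, MSW = {msw:.1f}")
```

The unrounded SSB is 492.4, so MSB comes out as 246.2 rather than the rounded 246 shown above.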
• Step 5:
• Calculate the Test Statistic TS = MSB / MSW
TS = 246/198.7
TS = 1.24
• Step 6:
• Reject H0 and accept H1 that there is a difference between groups if 1 – FDIST(TS value, k – 1, n – k) is greater than 90%.
FDIST(1.24, 2, 12) = 0.324
1 – 0.324 = 0.676 or 67.6%
Since 67.6% is less than 90%, we don’t have any statistical proof to say the teaching methods are different.
Analysis of Variance (ANOVA) IX
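The last two steps can be checked with a short Python sketch that starts from the MSB and MSW values on the slides. The helper f_right_tail_df1_2 is a hypothetical stand-in for Excel's FDIST; it uses a closed form that holds only when the first degrees of freedom value (k – 1) equals 2, which is the case for three teaching methods.

```python
# Mean squares taken from the slides, with k groups and n observations.
msb = 246.0
msw = 198.7
k = 3
n = 15

# Test statistic: ratio of between-group to within-group mean square.
ts = msb / msw

df1 = k - 1   # 2
df2 = n - k   # 12


def f_right_tail_df1_2(f_value, df2):
    """Right-tail F probability for df1 = 2 (the quantity FDIST returns).

    Assumption: this closed form is only valid when df1 is exactly 2.
    """
    return (1 + 2 * f_value / df2) ** (-df2 / 2)


p_value = f_right_tail_df1_2(ts, df2)
confidence = 1 - p_value

print(f"TS = {ts:.2f}, FDIST = {p_value:.3f}, 1 - FDIST = {confidence:.3f}")
print("Reject H0" if confidence > 0.90 else "Do not reject H0")
```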
• We can also do ANOVA in Excel:
First you input the data in Excel.
Analysis of Variance (ANOVA) I (Excel)
You then choose the ANOVA Single Factor feature within the data analysis area.
Analysis of Variance (ANOVA) II (Excel)
You then highlight your data, check off the labels option and input where you wish to anchor the output.
Analysis of Variance (ANOVA) III (Excel)
Your test statistic value
One minus this value gives you your probability of 68%
Analysis of Variance (ANOVA) IV (Excel)
9.1 Home Mail Corporation sells products by mail. The company’s management wants to find out if the number of orders received on each of the five days of the week is the same. The company took a sample of 400 orders received during a four-week period. The following table lists the frequency distribution for these orders by the day of the week. Conduct a goodness of fit test to determine if the orders are equally distributed by day of the week.
Day of Week        Mon   Tue   Wed   Thur   Fri
Number of Orders   92    68    65    86     89
9.2 One hundred auto drivers, who were stopped by police for some violation, were also checked to see if they were wearing their seat belts. The following table shows the results of this survey. Conduct a test of independence to determine if seat belt usage differs by gender.
         Wearing Seatbelt   Not Wearing Seatbelt
Male     34                 21
Female   30                 15
Section 9 Exercises I
9.3 Two drugs were administered to two groups of patients to cure the same disease. One group of 60 patients and another group of 40 patients were selected. The following table gives information about the number of patients who were cured and not cured by drug. Conduct a test of independence to determine if the two drugs are equally effective in curing the patients.
          Cured   Not Cured
Drug I    46      14
Drug II   18      22
Section 9 Exercises II
9.4 A large company buys thousands of light bulbs every year. The company is currently considering four brands of light bulbs to choose from. Before the company decides which light bulbs to buy, it wants to investigate if the mean life of the four types of light bulbs is the same. The company’s facilities department randomly selected a few bulbs of each type and tested them. The following table lists the number of hours (in thousands) that each of the bulbs by brand survived before burning out. Conduct an ANOVA in Excel to determine if the mean life of these four brands is the same or different. If you have time, also try to do it by hand.
Brand 1   Brand 2   Brand 3   Brand 4
23        19        23        26
24        23        27        24
19        18        25        21
26        24        26        29
22        20        23        28
23        22        21        24
25        19        27        28
Section 9 Exercises III