3. Example
• Gulfstream Aerospace Company produced three different
prototypes as candidates for mass production as the company’s
newest large-cabin business jet, the Gulfstream IV. Each of the
three prototypes has slightly different features, which may bring
about differences in performance. Therefore, as part of the
decision-making process concerning which model to produce,
company engineers are interested in determining whether the
three proposed models have about the same average flight range.
Each of the models is assigned a random choice of 10 flight routes
and departure times, and the flight range on a full standard fuel
tank is measured (the planes carry additional fuel on the test
flights, to allow them to land safely at certain destination points).
Range data for the three prototypes, in nautical miles (measured to
the nearest 10 miles), are as follows.
4. Data
Prototype A Prototype B Prototype C
4,420 4,230 4,110
4,540 4,220 4,090
4,380 4,100 4,070
4,550 4,300 4,160
4,210 4,420 4,230
4,330 4,110 4,120
4,400 4,230 4,000
4,340 4,280 4,200
4,390 4,090 4,150
4,510 4,320 4,220
Do all three prototypes have the same average range?
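The question above can be checked numerically. The following is a minimal sketch in plain Python, not part of the original slides: the helper name `one_way_anova` is hypothetical, and the calculation assumes the one-way layout described in the example (three independent samples of 10 flights each).

```python
# Flight ranges in nautical miles, copied from the data table above.
ranges = {
    "A": [4420, 4540, 4380, 4550, 4210, 4330, 4400, 4340, 4390, 4510],
    "B": [4230, 4220, 4100, 4300, 4420, 4110, 4230, 4280, 4090, 4320],
    "C": [4110, 4090, 4070, 4160, 4230, 4120, 4000, 4200, 4150, 4220],
}

def one_way_anova(groups):
    """Hypothetical helper: return (SST, SSE, F) for a list of samples."""
    k = len(groups)                           # number of samples
    n = sum(len(g) for g in groups)           # total sample size
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-sample (treatment) sum of squares.
    sst = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-sample (error) sum of squares.
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    f_stat = (sst / (k - 1)) / (sse / (n - k))
    return sst, sse, f_stat

sst, sse, f_stat = one_way_anova(list(ranges.values()))
print(f"SST = {sst:.1f}, SSE = {sse:.1f}, F = {f_stat:.2f}")
```

The resulting F has 2 and 27 degrees of freedom and is compared with the critical value of the F distribution at the chosen significance level.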
5. Introduction
• ANOVA is a technique for testing hypotheses
about differences among several population
means
• It was developed by R. A. Fisher
• The total variation in a data set is split into two
components: between-group and within-group variation
6. ANOVA – What does it tell us?
• ANOVA = Analysis of Variance
• ANOVA will tell us whether we have sufficient
evidence to say that measurements from at
least one sample differ significantly from at
least one other.
– It will not tell us which ones differ, or how many
differ
7. Relationship Amongst t Test, Analysis of Variance, Analysis of
Covariance, & Regression
Metric Dependent Variable
• One Independent Variable
– Binary: t Test
• One or More Independent Variables
– Categorical (Factorial): Analysis of Variance
   – One Factor: One-Way Analysis of Variance
   – More than One Factor: N-Way Analysis of Variance
– Categorical and Interval: Analysis of Covariance
– Interval: Regression
8. One-Way Analysis of Variance
Marketing researchers are often interested in
examining the differences in the mean values of the
dependent variable for several categories of a single
independent variable or factor. For example:
• Do the various segments differ in terms of their
volume of product consumption?
• Do the brand evaluations of groups exposed to
different commercials vary?
• What is the effect of consumers' familiarity with the
store (measured as high, medium, and low) on
preference for the store?
9. ANOVA vs. t-test
• ANOVA is like a t-test among multiple data
sets simultaneously
– t-tests can only be done between two data
sets, or between one set and a “true” value
• ANOVA uses the F distribution instead of
the t-distribution
• ANOVA assumes that all of the data sets
have equal variances
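For exactly two data sets the two tests agree: the ANOVA F statistic equals the square of the pooled-variance t statistic. The sketch below illustrates this with Prototypes A and B from the flight-range data. It is an illustration only; the helper names `mean` and `sum_sq` are hypothetical.

```python
import math

# Prototype A and Prototype B ranges from the data table (nautical miles).
a = [4420, 4540, 4380, 4550, 4210, 4330, 4400, 4340, 4390, 4510]
b = [4230, 4220, 4100, 4300, 4420, 4110, 4230, 4280, 4090, 4320]

def mean(xs):
    return sum(xs) / len(xs)

def sum_sq(xs):
    """Sum of squared deviations from the sample mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

n_a, n_b = len(a), len(b)

# Two-sample t statistic with a pooled variance (equal-variance assumption).
pooled_var = (sum_sq(a) + sum_sq(b)) / (n_a + n_b - 2)
t_stat = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# One-way ANOVA with k = 2: the error mean square is the pooled variance.
grand_mean = mean(a + b)
sst = n_a * (mean(a) - grand_mean) ** 2 + n_b * (mean(b) - grand_mean) ** 2
f_stat = (sst / (2 - 1)) / pooled_var

print(f"t^2 = {t_stat ** 2:.3f}, F = {f_stat:.3f}")
```

Both statistics rely on the same pooled variance, which is why the equal-variance assumption carries over from the t-test to ANOVA.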
10. Conducting One-Way ANOVA
1. Identify the Dependent and Independent Variables
2. Decompose the Total Variation
3. Measure the Effects
4. Test the Significance
5. Interpret the Results
11. Completely randomized design
Population 1        Population 2        …   Population k
Mean = µ1           Mean = µ2           …   Mean = µk
Variance = σ1²      Variance = σ2²      …   Variance = σk²
We want to know something about how the
populations compare. Do they have the same mean?
We can collect random samples from each
population, which gives us the following data.
12. Completely randomized design
Mean = M1           Mean = M2           …   Mean = Mk
Variance = s1²      Variance = s2²      …   Variance = sk²
N1 cases            N2 cases            …   Nk cases
Suppose we want to compare 3 college majors in a
business school by the average annual income
their graduates earn after completing a postgraduate
degree. We collect the following data (in Rs 1000s)
based on random surveys.
14. Completely randomized design
Can the dean conclude that there are
differences among the majors' incomes?
H0: µ1 = µ2 = µ3
HA: not all of the means are equal
In this problem we must take into account:
1) The variance between samples, or the actual
differences by major. This is called the sum of
squares for treatment (SST) (columns)
15. Completely randomized design
2) The variance within samples, or the variance
of incomes within a single major. This is called
the sum of squares for error (SSE).
Recall that when we sample, the sample statistics
will almost always differ somewhat from the
population values. We account for this
through #2, or the SSE.
16. F-Statistic
For this test, we will calculate an F statistic, which
is used to compare variances.
$$F = \frac{SST/(k-1)}{SSE/(n-k)}$$
SST=sum of squares for treatment (columns)
SSE=sum of squares for error
k = the number of populations
n = total sample size
17. F-statistic
Intuitively, the F statistic is:
$$F = \frac{\text{explained variance}}{\text{unexplained variance}}$$
Explained variance is the difference between
majors
Unexplained variance is the difference based on
random sampling for each group
18. Decision Based on ANOVA
• F (calculated) > F (critical or theoretical)
• Reject H0
• P < 0.05 (chosen significance level)
• Reject H0
19. Calculating SST
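$$SST = \sum_{i=1}^{k} n_i (M_i - \bar{M})^2$$
$\bar{M}$ = grand mean: the sum of all values for all groups divided by the
total sample size (this equals $\sum M_i / k$ when the samples are of
equal size)
$M_i$ = mean for each sample
$n_i$ = size of each sample
k = the number of populations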
21. Calculating SST
Note that when M1 = M2 = M3, then SST=0 which
would support the null hypothesis.
In this example, the samples are of equal size,
but the analysis can also be run with samples
of varying size.
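The SST arithmetic for the majors example can be sketched as follows. The sample means 29, 33.5 and 37 are the ones that appear in the worked SSE calculation; the group size of 6 is an assumption inferred from the equal sample sizes and the 15 error degrees of freedom (n - k = 15 with k = 3).

```python
# M1, M2, M3 from the worked example.
sample_means = [29.0, 33.5, 37.0]
n_i = 6  # assumed: 18 observations split equally across the 3 samples

# With equal sample sizes the grand mean is the mean of the sample means.
grand_mean = sum(sample_means) / len(sample_means)

# SST = sum over samples of n_i * (M_i - grand mean)^2
sst = sum(n_i * (m - grand_mean) ** 2 for m in sample_means)

print(f"grand mean = {grand_mean:.2f}, SST = {sst:.1f}")
```

If all three sample means were equal, every term in the sum would be zero and SST would be 0.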
22. Calculating SSE
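$$SSE = \sum_{i=1}^{k} \sum_{t=1}^{n_i} (X_{it} - M_i)^2$$
In other words, it is the sum of squared deviations from the sample mean,
computed within each sample and then added together.
$$SSE = \sum_t (X_{1t} - M_1)^2 + \sum_t (X_{2t} - M_2)^2 + \sum_t (X_{3t} - M_3)^2$$
SSE = [(27-29)² + (22-29)² +…+ (29-29)²]
+ [(23-33.5)² + (36-33.5)² +…]
+ [(48-37)² + (35-37)² +…+ (29-37)²]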
SSE = 819.5
23. Statistical Output
When you estimate this information in a computer
program, it will typically be presented in a table as
follows:
Source of     df     Sum of           Mean                F-ratio
Variation            squares          squares
Treatment     k-1    SST              MST = SST/(k-1)     F = MST/MSE
Error         n-k    SSE              MSE = SSE/(n-k)
Total         n-1    SS = SST+SSE
24. Calculating F for our example
$$F = \frac{193/2}{819.5/15}$$
F = 1.77
Our calculated F is compared to the critical value
from the F-distribution, F(α; k-1, n-k), with
k-1 numerator degrees of freedom and
n-k denominator degrees of freedom.
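As a check on the arithmetic, the F ratio can be rebuilt from the quoted sums of squares. This is a small sketch: SST = 193 and SSE = 819.5 are taken from the example, and n = 18 is inferred from the denominator degrees of freedom rather than stated in the slides.

```python
sst, sse = 193.0, 819.5   # sums of squares quoted in the example
k = 3                     # number of majors (populations)
n = 18                    # inferred: n - k = 15 denominator df

df_treatment, df_error = k - 1, n - k
mst = sst / df_treatment  # mean square for treatment
mse = sse / df_error      # mean square for error
f_stat = mst / mse

# Print the result in the layout of the ANOVA output table.
print(f"{'Source':<10}{'df':>4}{'SS':>9}{'MS':>9}{'F':>7}")
print(f"{'Treatment':<10}{df_treatment:>4}{sst:>9.1f}{mst:>9.2f}{f_stat:>7.2f}")
print(f"{'Error':<10}{df_error:>4}{sse:>9.1f}{mse:>9.2f}")
print(f"{'Total':<10}{n - 1:>4}{sst + sse:>9.1f}")
```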
25. The Results
At the 95% confidence level (α = .05), our critical F is 3.68.
In this case, 1.77 < 3.68, so we fail to reject the
null hypothesis.
The dean is puzzled by these results because
just by eyeballing the data, it looks like finance
majors make more money.
26. New Data
Many other factors may determine the salary
level, such as GPA. The dean decides to
collect new data selecting one student
randomly from each major with the following
average grades.
27. New data
Average   Accounting      Marketing      Finance         M(b)
A+        41              45             51              M(b1) = 45.67
A         36              38             45              M(b2) = 39.67
B+        27              33             31              M(b3) = 30.33
B         32              29             35              M(b4) = 32
C+        26              31             32              M(b5) = 29.67
C         23              25             27              M(b6) = 25
M(t)      M(t)1 = 30.83   M(t)2 = 33.5   M(t)3 = 36.83
Grand mean = 33.72
28. Randomized Block Design
Now the data in the 3 samples are not
independent; they are matched by GPA level.
Just like before, matched samples are superior
to unmatched samples because they provide
more information. In this case, we have
added a factor that may account for some of
the SSE.
29. Two way ANOVA
Now SS(total) = SST + SSB + SSE
Where SSB = the variability among blocks,
where a block is a matched group of
observations from each of the populations
We can calculate a two-way ANOVA to test our
null hypothesis.
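The effect of blocking on the error term can be sketched with the new income data (rows are the grade blocks, columns are the majors). This is an illustration in plain Python rather than output from the original slides; the variable names are hypothetical.

```python
# Incomes from the new data table; rows are grade blocks A+, A, B+, B, C+, C.
incomes = [
    [41, 45, 51],
    [36, 38, 45],
    [27, 33, 31],
    [32, 29, 35],
    [26, 31, 32],
    [23, 25, 27],
]  # columns: Accounting, Marketing, Finance

b = len(incomes)       # number of blocks (grade levels)
k = len(incomes[0])    # number of treatments (majors)
grand_mean = sum(map(sum, incomes)) / (b * k)
col_means = [sum(row[j] for row in incomes) / b for j in range(k)]
row_means = [sum(row) / k for row in incomes]

ss_total = sum((x - grand_mean) ** 2 for row in incomes for x in row)
sst = b * sum((m - grand_mean) ** 2 for m in col_means)  # between majors
ssb = k * sum((m - grand_mean) ** 2 for m in row_means)  # between blocks

sse_one_way = ss_total - sst        # error if the blocks are ignored
sse_blocked = ss_total - sst - ssb  # error after removing block variation

f_one_way = (sst / (k - 1)) / (sse_one_way / (b * k - k))
f_blocked = (sst / (k - 1)) / (sse_blocked / ((k - 1) * (b - 1)))

print(f"SST = {sst:.2f}, SSB = {ssb:.2f}")
print(f"SSE without blocks = {sse_one_way:.2f}, with blocks = {sse_blocked:.2f}")
print(f"F without blocks = {f_one_way:.2f}, with blocks = {f_blocked:.2f}")
```

The printout shows how much of the one-way error sum of squares is absorbed by the blocks, which is why the treatment F ratio changes once GPA is taken into account. With blocking, the error degrees of freedom are (k-1)(b-1) rather than n-k.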
30. Example: Solving with Excel
• Vishal Foods Ltd is a leading manufacturer of biscuits. The
company has launched a new brand in the four metros;
Delhi, Mumbai, Kolkata and Chennai. After one month, the
company realizes that there is a difference in the retail
price per pack of biscuits across cities. Before the launch,
the company had promised its employees and newly-
appointed retailers that the biscuits would be sold at a
uniform price in the country. The difference in price can
tarnish the image of the company. In order to make a quick
inference, the company collected data about the price from
six randomly selected stores across the four cities. Based on
the sample information, the price per pack of the biscuits,
in rupees, is given in the table
32. Hypotheses
• Null hypothesis: all the means are equal
• Alternative hypothesis: not all the means are
equal (at least one mean differs)
33. Excel Path
• Select Data Analysis
• From Data Analysis dialogue box, select Anova:
Single Factor
• Click OK
• In the Anova: Single Factor dialogue box, enter
the location of the samples in the variable Input
Range box. Select Grouped by Columns.
• Enter the value of α in the Alpha box
• Click OK
34. Minitab Path
• Select Stat from the menu bar
• A pull down menu will appear- select ANOVA
• Another pull down menu will appear- select One
Way Unstacked
• One Way dialogue box will appear
• By using select, place samples in the Responses
(in separate columns) box and place the
confidence level
• Click OK
• F and p values will appear in the output box
35. Output Sheet: Excel
df for SST is (4-1) = 3, df for SSE is (24-4) = 20, df for total is (24-1) =
23. F theoretical is 3.098, whereas F calculated is 43.03. As F
calculated is greater than F theoretical, F falls in the rejection region.
The null hypothesis is rejected.
Anova: Single Factor
SUMMARY
Groups Count Sum Average Variance
Column 1 6 132 22 0.2
Column 2 6 117.5 19.58333 0.641667
Column 3 6 106 17.66667 0.566667
Column 4 6 123.5 20.58333 0.441667
ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 59.70833 3 19.90278 43.03303 6.54E-09 3.098391
Within Groups 9.25 20 0.4625
Total 68.95833 23
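The ANOVA block of this output can be reproduced from the SUMMARY block alone, because the between-groups sum of squares depends only on the group averages and the within-groups sum of squares is the sum of (count - 1) × variance over the groups. A minimal sketch, using the rounded values printed in the output sheet:

```python
# (count, sum, variance) for each column, copied from the SUMMARY block.
summary = [
    (6, 132.0, 0.2),
    (6, 117.5, 0.641667),
    (6, 106.0, 0.566667),
    (6, 123.5, 0.441667),
]

k = len(summary)                               # number of groups
n = sum(count for count, _, _ in summary)      # total observations
grand_mean = sum(total for _, total, _ in summary) / n

# Between groups: count * (group average - grand mean)^2, summed over groups.
ss_between = sum(c * (s / c - grand_mean) ** 2 for c, s, _ in summary)
# Within groups: (count - 1) * sample variance, summed over groups.
ss_within = sum((c - 1) * v for c, _, v in summary)

f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
print(f"SS between = {ss_between:.5f}, SS within = {ss_within:.4f}, F = {f_stat:.2f}")
```

Small differences in the last decimal places can arise because the variances in the output sheet are rounded.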
36. Conclusion
• There is enough evidence to believe that there
is a significant difference in the prices across
four cities
37. Two Way ANOVA
• A company which produces stationery items wants to
diversify into the photocopy paper manufacturing business.
The company has decided to first test market the product in
three areas termed as the north area, central area, and the
south area. The company takes a random sample of five
salesmen S1, S2, S3, S4 and S5 for this purpose. The sales
volume generated by these five salesmen, in thousand
rupees, and total sales in different regions are given in the
table.
• Use a randomized block design analysis to examine:
– Whether the salesmen significantly differ in performance?
– Whether there is a significant difference in terms of sales
capacity between the regions?
– Take 95% confidence level
38. Data
Region       Salesmen                          Region’s
             S1    S2    S3    S4    S5        Total
North        24    30    26    23    32        135
Central      22    32    27    25    31        137
South        23    28    25    22    32        130
Salesmen’s
Total        69    90    78    70    95        402
39. Hypotheses
• Divided into two parts
• For treatments (columns)
• For blocks (rows)
• For treatments: Null hypothesis: all the treatment
means are equal; Alternative hypothesis: not all the
treatment means are equal
• For blocks: Null hypothesis: all the block means
are equal; Alternative hypothesis: not all the block
means are equal
40. Excel Path
• Select Data Analysis
• From Data Analysis dialogue box, select Anova:
Two Factor Without Replication
• Click OK
• In the Anova: Two Factor dialogue box, enter the
location of the samples in the variable Input
Range box
• Enter the value of α in the Alpha box
• Click OK
41. Output Sheet: Excel
The F calculated for columns is 30.17, which is greater than the F theoretical
of 3.84, so the null hypothesis is rejected.
The F calculated for rows is 1.71, which is less than the F theoretical of 4.46,
so the null hypothesis is not rejected.
Anova: Two-Factor Without Replication
SUMMARY Count Sum Average Variance
Row 1 5 135 27 15
Row 2 5 137 27.4 17.3
Row 3 5 130 26 16.5
Column 1 3 69 23 1
Column 2 3 90 30 4
Column 3 3 78 26 1
Column 4 3 70 23.33333 2.333333
Column 5 3 95 31.66667 0.333333
ANOVA
Source of Variation SS df MS F P-value F crit
Rows 5.2 2 2.6 1.714286 0.2401 4.45897
Columns 183.0667 4 45.76667 30.17582 7.09E-05 3.837853
Error 12.13333 8 1.516667
Total 200.4 14
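The sums of squares and F ratios in this output can be cross-checked by hand. The sketch below is one way to do so in plain Python; the helper `two_way_anova` is hypothetical and covers only the two-factor layout without replication used here (rows are regions, columns are salesmen).

```python
# Sales in thousand rupees, copied from the data table (columns S1..S5).
sales = {
    "North":   [24, 30, 26, 23, 32],
    "Central": [22, 32, 27, 25, 31],
    "South":   [23, 28, 25, 22, 32],
}

def two_way_anova(rows):
    """Hypothetical helper: two-factor ANOVA without replication."""
    r, c = len(rows), len(rows[0])
    grand = sum(map(sum, rows)) / (r * c)
    row_means = [sum(row) / c for row in rows]
    col_means = [sum(row[j] for row in rows) / r for j in range(c)]
    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in rows for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_error = ss_error / ((r - 1) * (c - 1))
    return {
        "ss_rows": ss_rows,
        "ss_cols": ss_cols,
        "ss_error": ss_error,
        "f_rows": (ss_rows / (r - 1)) / ms_error,
        "f_cols": (ss_cols / (c - 1)) / ms_error,
    }

result = two_way_anova(list(sales.values()))
for name, value in result.items():
    print(f"{name} = {value:.4f}")
```

The F ratios are then compared with the F crit values shown in the output sheet for (2, 8) and (4, 8) degrees of freedom.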
42. Conclusion
• There is enough evidence to believe that there
is a significant difference in the sales
performance of the five salesmen. On the other
hand, there is no evidence of a significant
difference in sales capacity among the three
regions
43. Illustrative Applications of One-Way
Analysis of Variance
We illustrate the concepts discussed in this chapter
using the data presented in the next table
The department store is attempting to determine the
effect of in-store promotion (X) on sales (Y)
The null hypothesis is that the category means are
equal:
H0: µ1 = µ2 = µ3.
45. One-Way ANOVA: Effect of In-store Promotion on
Store Sales
Cell means
Level of Promotion   Count   Mean
High (1)             10      8.300
Medium (2)           10      6.200
Low (3)              10      3.700
TOTAL                30      6.067
Source of Variation          Sum of squares   df   Mean square   F ratio   F crit. (α = 0.05)
Between groups (Promotion)   106.067          2    53.033        17.944    3.35
Within groups (Error)        79.800           27   2.956
TOTAL                        185.867          29   6.409
46. Illustrative Applications of One-Way
Analysis of Variance
• From the table we see that for 2 and 27 degrees of freedom,
the critical value of F is 3.35 at the 95% level. Because the
calculated value of F is greater than the critical value, we
reject the null hypothesis.
47. SPSS Windows
One-way ANOVA can be efficiently performed using the
program COMPARE MEANS and then One-way ANOVA.
To select this procedure using SPSS for Windows click:
Analyze>Compare Means>One-Way ANOVA …
N-way analysis of variance and analysis of covariance
can be performed using GENERAL LINEAR MODEL. To
select this procedure using SPSS for Windows click:
Analyze>General Linear Model>Univariate …
48. SPSS Windows: One-Way ANOVA
1. Select ANALYZE from the SPSS menu bar.
2. Click COMPARE MEANS and then ONE-WAY ANOVA.
3. Move “Sales [sales]” into the DEPENDENT LIST box.
4. Move “In-Store Promotion[promotion]” to the FACTOR
box.
5. Click OPTIONS
6. Click Descriptive.
7. Click CONTINUE.
8. Click OK.