2. The Basic Idea
We want to determine whether different "treatments" have different
effects by comparing two measures of the variability of the data:
one that captures how much of the variation is random, and one
that captures how much is due to the treatment.
3. Outline
Basic ANOVA
The Set-Up
An Example
The Model
Treatment sum of squares SSTR
More sum of squares
An ANOVA F Test Example
Comparing ANOVA F test with Kruskal-Wallis test
6. The Hypotheses
▶ We have data divided into k categories called "treatments."
The word "treatment" refers to the application of chemicals or
other methods to improve crop yield on plots of land.
▶ For the jth treatment, we obtain numbers
$Y_{1j}, Y_{2j}, \dots, Y_{n_j j}$
which indicate how well the jth treatment worked to improve
crop output, or how well the jth drug worked for the patients
who took it.
▶ We test
$H_0 \colon \mu_1 = \mu_2 = \cdots = \mu_k$
versus
$H_1 \colon$ not all the $\mu_j$'s are equal.
10. Notations 1
▶ $k$ = number of treatments. For example, we want to test the
effectiveness of $k$ drugs.
▶ $n_j$ = size of the sample from the jth treatment.
▶ $n = \sum_{j=1}^{k} n_j$ is the total number of sample points.
▶ $Y_{ij}$ = ith sample point from the jth treatment.
14. Notations 2
▶ $T_{\cdot j} = \sum_{i=1}^{n_j} Y_{ij}$ is the sum of the numbers in the jth treatment.
▶ $T_{\cdot\cdot} = \sum_{j=1}^{k} T_{\cdot j}$ is the sum of all the numbers $Y_{ij}$.
▶ $\bar{Y}_{\cdot j} = \dfrac{T_{\cdot j}}{n_j} = \dfrac{1}{n_j} \sum_{i=1}^{n_j} Y_{ij}$ is the sample mean of the jth treatment.
▶ $\bar{Y}_{\cdot\cdot} = \dfrac{T_{\cdot\cdot}}{n}$ is the average of all sample points.
19. An example

Treatment                                            A    B    C
data                                                 1    6    9
                                                     3    5    8
                                                          1    7
$T_{\cdot j}$                                        4   12   24   $T_{\cdot\cdot} = 40$       totals
$n_j$                                                2    3    3   $n = 8$                     sample sizes
$\bar{Y}_{\cdot j}$                                  2    4    8   $\bar{Y}_{\cdot\cdot} = 5$  averages
$S_j^2$                                              2    7    1                               sample variances
$(n_j - 1)S_j^2$                                     2   14    2   $SSE = 18$                  error
$n_j(\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2$   18    3   27   $SSTR = 48$                 treatment
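The bookkeeping in this table can be reproduced with a short Python sketch (not part of the original slides); the raw data are taken directly from the table above.

```python
# One-way ANOVA bookkeeping for the small example:
# A = (1, 3), B = (6, 5, 1), C = (9, 8, 7).
import numpy as np

groups = {"A": [1, 3], "B": [6, 5, 1], "C": [9, 8, 7]}
data = {name: np.asarray(y, dtype=float) for name, y in groups.items()}

n = sum(len(y) for y in data.values())                 # total sample size, n = 8
grand_mean = sum(y.sum() for y in data.values()) / n   # Y_bar_.. = T_.. / n

# SSE: pooled within-group variability, sum of (n_j - 1) * S_j^2
sse = sum((len(y) - 1) * y.var(ddof=1) for y in data.values())

# SSTR: between-group variability, sum of n_j * (Y_bar_.j - Y_bar_..)^2
sstr = sum(len(y) * (y.mean() - grand_mean) ** 2 for y in data.values())

print(grand_mean, sse, sstr)   # 5.0 18.0 48.0
```

Note `ddof=1` in `var`, which gives the sample variance $S_j^2$ (dividing by $n_j - 1$ rather than $n_j$).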
23. The theory
▶ The theory of ANOVA is based on the model $Y_{ij} = \mu_j + \epsilon_{ij}$,
where $\mu_j$ is the average effect (true mean) of treatment j and
the $\epsilon_{ij}$ are independent normal variables, $\epsilon_{ij} \sim N(0, \sigma^2)$.
▶ Equivalently, $Y_{ij} \sim N(\mu_j, \sigma^2)$.
▶ Let $\mu$ be the true mean of the total population:
$\mu = \dfrac{1}{n} \sum_{j=1}^{k} n_j \mu_j.$
Then
$\bar{Y}_{\cdot j} \sim N\!\left(\mu_j, \dfrac{\sigma^2}{n_j}\right)$ and $\bar{Y}_{\cdot\cdot} \sim N\!\left(\mu, \dfrac{\sigma^2}{n}\right).$
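A quick simulation illustrates the model: with large samples, each group average $\bar{Y}_{\cdot j}$ concentrates near its true mean $\mu_j$. The means and $\sigma$ below are made-up illustration values, not from the slides.

```python
# Simulation sketch of the ANOVA model Y_ij = mu_j + eps_ij, eps_ij ~ N(0, sigma^2).
# mu and sigma are hypothetical values chosen only for illustration.
import numpy as np

rng = np.random.default_rng(0)
mu = [2.0, 4.0, 8.0]   # true treatment means mu_j (hypothetical)
sigma = 1.0
n_j = 10_000           # large samples so each average settles near mu_j

samples = [m + sigma * rng.standard_normal(n_j) for m in mu]
group_means = [y.mean() for y in samples]

# Each Y_bar_.j is approximately N(mu_j, sigma^2 / n_j), so it lies
# within a few multiples of sigma / sqrt(n_j) = 0.01 of mu_j.
print([round(m, 2) for m in group_means])
```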
26. Treatment sum of squares
$SSTR = \sum_{j=1}^{k} n_j (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2$
When the treatments are different, the treatment sum of squares
gets larger.
Theorem 1
$SSTR = \sum_{j=1}^{k} n_j (\bar{Y}_{\cdot j} - \mu)^2 - n(\bar{Y}_{\cdot\cdot} - \mu)^2.$
Theorem 2
$E(SSTR) = (k-1)\sigma^2 + \sum_{j=1}^{k} n_j (\mu_j - \mu)^2.$
27. Proof of Theorem 2
According to our model,
$\bar{Y}_{\cdot j} \sim N(\mu_j, \sigma^2/n_j), \quad \bar{Y}_{\cdot\cdot} \sim N(\mu, \sigma^2/n).$
Therefore,
$E[(\bar{Y}_{\cdot\cdot} - \mu)^2] = \mathrm{Var}(\bar{Y}_{\cdot\cdot}) = \dfrac{\sigma^2}{n},$
$\mathrm{Var}(\bar{Y}_{\cdot j} - \mu) = \mathrm{Var}(\bar{Y}_{\cdot j}) = \dfrac{\sigma^2}{n_j}.$
The variance can also be computed using $\mathrm{Var}(X) = E(X^2) - E(X)^2$:
$\dfrac{\sigma^2}{n_j} = \mathrm{Var}(\bar{Y}_{\cdot j} - \mu) = E[(\bar{Y}_{\cdot j} - \mu)^2] - \big(E(\bar{Y}_{\cdot j} - \mu)\big)^2 = E[(\bar{Y}_{\cdot j} - \mu)^2] - (\mu_j - \mu)^2.$
28. Proof of Theorem 2, continued
So
$E[(\bar{Y}_{\cdot j} - \mu)^2] = \dfrac{\sigma^2}{n_j} + (\mu_j - \mu)^2.$
Therefore,
$E(SSTR) = \sum_j n_j E[(\bar{Y}_{\cdot j} - \mu)^2] - n E[(\bar{Y}_{\cdot\cdot} - \mu)^2]$
$= \sum_j n_j \left( \dfrac{\sigma^2}{n_j} + (\mu_j - \mu)^2 \right) - n \cdot \dfrac{\sigma^2}{n}$
$= k\sigma^2 + \sum_j n_j (\mu_j - \mu)^2 - \sigma^2$
$= (k-1)\sigma^2 + \sum_j n_j (\mu_j - \mu)^2.$
29. Sum of squares formula
Example
data vector $(Y_{ij}) = (1, 3;\ 6, 5, 1;\ 9, 8, 7)$
$(\bar{Y}_{\cdot j}) = (2, 2;\ 4, 4, 4;\ 8, 8, 8)$
34. Sum of squares identity
Theorem
$SSTOT = SSTR + SSE.$
This identity represents
$\sum_{i,j} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = \sum_{i,j} (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2 + \sum_{i,j} (Y_{ij} - \bar{Y}_{\cdot j})^2.$
Theorem
Suppose that $\mu_1 = \mu_2 = \cdots = \mu_k = \mu$ is true. Then
$\dfrac{SSTR}{\sigma^2} \sim \chi^2_{k-1}, \quad \dfrac{SSE}{\sigma^2} \sim \chi^2_{n-k}.$
Furthermore, SSTR and SSE are independent.
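The identity SSTOT = SSTR + SSE can be checked numerically on the earlier small example (treatments A, B, C with SSTR = 48 and SSE = 18); this sketch is not part of the original slides.

```python
# Numerical check of SSTOT = SSTR + SSE on the data 1,3 / 6,5,1 / 9,8,7.
import numpy as np

groups = [np.array([1., 3.]), np.array([6., 5., 1.]), np.array([9., 8., 7.])]
all_y = np.concatenate(groups)
grand = all_y.mean()

sstot = ((all_y - grand) ** 2).sum()                            # total variability
sstr = sum(len(y) * (y.mean() - grand) ** 2 for y in groups)    # between groups
sse = sum(((y - y.mean()) ** 2).sum() for y in groups)          # within groups

print(sstot, sstr + sse)   # 66.0 66.0
```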
36. F test
Theorem
Under the same conditions,
$F = \dfrac{SSTR/(k-1)}{SSE/(n-k)} \sim F_{k-1,\,n-k},$
and the null hypothesis ($\mu_1 = \mu_2 = \cdots = \mu_k = \mu$) should be
rejected at the level $\alpha$ of significance if the test statistic satisfies
$F \geq F_{1-\alpha,\,k-1,\,n-k}$.
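The critical value $F_{1-\alpha,\,k-1,\,n-k}$ is usually read from a table, but it can also be computed with scipy's F distribution. As a sketch, here it is for the small example's sizes ($k = 3$, $n = 8$):

```python
# Critical value F_{1-alpha, k-1, n-k} via the F quantile function (ppf).
from scipy.stats import f

alpha, k, n = 0.05, 3, 8
crit = f.ppf(1 - alpha, k - 1, n - k)   # F_{0.95, 2, 5}
print(round(crit, 2))   # 5.79
```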
38. The problem

k = 3 treatments            Drug A   Drug B   Drug C
column # j                  1        2        3
sample size $n_j$           7        8        10
mean $\bar{Y}_{\cdot j}$    80       88       90
variance $S_j^2$            5.2      4.8      5.4

Are these drugs different?
40. Finding totals and averages

treatment                                      Drug A   Drug B   Drug C
j                                              1        2        3
$n_j$                                          7        8        10
$\bar{Y}_{\cdot j}$                            80       88       90
$T_{\cdot j} = n_j \cdot \bar{Y}_{\cdot j}$    560      704      900

Therefore,
$T_{\cdot\cdot} = \sum_j T_{\cdot j} = 2164$
and
$\bar{Y}_{\cdot\cdot} = \dfrac{T_{\cdot\cdot}}{n} = \dfrac{2164}{25} = 86.56.$
43. Finding SSTR and MSTR

treatment                                                 Drug A   Drug B   Drug C
j                                                         1        2        3
$n_j$                                                     7        8        10
$\bar{Y}_{\cdot j}$                                       80       88       90
$n_j \cdot (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2$  301.24   16.59    118.34

Therefore,
$SSTR = \sum_j n_j \cdot (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2 = 436.16.$
The number of degrees of freedom of SSTR is $k - 1 = 2$. So
$MSTR = \dfrac{SSTR}{k - 1} = \dfrac{436.16}{2} = 218.08.$
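SSTR and MSTR need only the summary statistics, not the raw data. A short sketch with the sample sizes and group means from the slide:

```python
# SSTR and MSTR from the summary statistics for Drugs A, B, C.
n_j = [7, 8, 10]
ybar_j = [80.0, 88.0, 90.0]

n = sum(n_j)                                               # 25
grand = sum(nj * yb for nj, yb in zip(n_j, ybar_j)) / n    # 2164 / 25 = 86.56

sstr = sum(nj * (yb - grand) ** 2 for nj, yb in zip(n_j, ybar_j))
mstr = sstr / (len(n_j) - 1)                               # k - 1 = 2 degrees of freedom
print(round(sstr, 2), round(mstr, 2))   # 436.16 218.08
```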
48. Finding SSE and MSE

treatment                Drug A   Drug B   Drug C
$n_j$                    7        8        10
$S_j^2$                  5.2      4.8      5.4
$(n_j - 1) \cdot S_j^2$  31.2     33.6     48.6

The sum of squared errors (SSE) measures random error and the
variability of the data. It tells us nothing about the treatments.
$SSE = \sum_j (n_j - 1) \cdot S_j^2 = 113.4.$
The number of degrees of freedom is $\sum_j (n_j - 1) = n - k = 22$, and the mean
squared error (MSE) is
$MSE = \dfrac{SSE}{n - k} = \dfrac{113.4}{22} = 5.15.$
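SSE and MSE likewise follow directly from the per-group sample variances on the slide:

```python
# SSE and MSE from the per-group sample variances for Drugs A, B, C.
n_j = [7, 8, 10]
s2_j = [5.2, 4.8, 5.4]

sse = sum((nj - 1) * s2 for nj, s2 in zip(n_j, s2_j))   # pooled within-group SS
dfe = sum(nj - 1 for nj in n_j)                         # n - k = 22
mse = sse / dfe
print(round(sse, 1), dfe, round(mse, 2))   # 113.4 22 5.15
```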
54. ANOVA, F test
▶ The test statistic is
$F = \dfrac{SSTR/(k-1)}{SSE/(n-k)} = \dfrac{MSTR}{MSE} = \dfrac{218.08}{5.15} = 42.3.$
▶ In ANOVA the F test is always right-tailed.
▶ When the test statistic F is large, we conclude that there is a
significant difference between the drugs. This is because the
numerator measures the difference between the treatments,
and the denominator measures the mean variability of the data.
▶ The critical value is $F_{1-\alpha,\,k-1,\,n-k} = F_{0.95,\,2,\,22} = 3.44$.
▶ Since the test statistic is much larger than the critical value,
we reject $H_0$ and conclude that the drugs are different. But
we don't know if they make people better or worse!
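The whole decision rule can be checked in a few lines with scipy, using MSTR and MSE from the previous slides:

```python
# F test for the drug example: statistic, critical value, and right-tailed p-value.
from scipy.stats import f

k, n = 3, 25
mstr, mse = 218.08, 5.15
F = mstr / mse

crit = f.ppf(0.95, k - 1, n - k)   # critical value F_{0.95, 2, 22}
p = f.sf(F, k - 1, n - k)          # right-tailed p-value, P(F_{2,22} > F)

print(round(F, 1), round(crit, 2), F > crit)   # 42.3 3.44 True
```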
57. Summary of results
The traditional way to summarize the results is by the following
chart with either the critical F value or the p-value in the last
column.

Source      SS       df   MS       F      p
Treatment   436.16   2    218.08   42.3   2.9 × 10⁻⁸
Error       113.4    22   5.15
Total       549.56   24

The conclusion is that at least one of the drugs is different from
the other two. We need to do additional tests to see which one is
different.
61. ANOVA F test
In planning for future staffing, the ages of 19 hospital staff
members were analyzed. Three groups (nurses, doctors, and x-ray
techs) were chosen. At α = 0.05, can it be concluded that the
average ages of the three groups differ? (See Excel workbook:
Chapter 12.)

Source      SS        df   MS       F      p
Treatment   1190.48   2    595.24   5.96   0.0116
Error       1598.05   16   99.88
Total       2788.53   18

Since F is big (or, the p-value is small), we reject $H_0$ and conclude
that the average ages of the three groups differ.
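The table's p-value can be recovered from its F statistic alone, as the right tail of the $F_{2,16}$ distribution:

```python
# p-value for the hospital-staffing example: p = P(F_{2,16} > 5.96).
from scipy.stats import f

p = f.sf(5.96, 2, 16)
print(round(p, 4))   # 0.0116
```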
62. Kruskal-Wallis test
We could also work out this problem using the nonparametric
Kruskal-Wallis test.
The Kruskal-Wallis statistic is
$B = \dfrac{12}{n(n+1)} \cdot \sum_{j=1}^{k} \dfrac{R_j^2}{n_j} - 3(n+1) = 6.63,$
where $R_j$ is the sum of the ranks in the jth group, and the critical value is
$\chi^2_{1-\alpha,\,k-1} = \chi^2_{0.95,\,2} = 5.99.$
So we reject $H_0$ and conclude that the average ages of the three
groups differ.
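The chi-square critical value (and the decision) can be checked with scipy; $B = 6.63$ is the statistic reported above, since the raw ranks are not shown here.

```python
# Kruskal-Wallis decision: compare B against the chi^2_{0.95, k-1} critical value.
from scipy.stats import chi2

B = 6.63                     # Kruskal-Wallis statistic from the slide
crit = chi2.ppf(0.95, 2)     # chi^2_{0.95, 2}
print(round(crit, 2), B > crit)   # 5.99 True
```

With raw data in hand, `scipy.stats.kruskal(group1, group2, group3)` would compute both the statistic and its p-value directly.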