2. The Basic Idea
We want to determine whether different "treatments" have different
effects by comparing two measures of the variability of the data:
one that captures how much of the variation is random, and one
that captures how much is due to the treatment.
3. Outline
Basic ANOVA
The Set-Up
An Example
The Model
Treatment sum of squares SSTR
More sum of squares
An ANOVA F Test Example
Comparing ANOVA F test with Kruskal-Wallis test
6. The Hypotheses
▶ We have data divided into k categories called "treatments."
The word "treatment" refers to the application of chemicals or
other methods to improve crop yield on plots of land.
▶ For the jth treatment, we obtain numbers
$Y_{1j}, Y_{2j}, \dots, Y_{n_j j}$
which indicate how well the jth treatment worked to improve
crop output, or how well the jth drug worked for the patients
who took it.
▶ We test
$H_0 \colon \mu_1 = \mu_2 = \cdots = \mu_k$
versus
$H_1 \colon$ not all the $\mu_j$'s are equal.
10. Notations 1
▶ $k$ = number of treatments. For example, we want to test the
effectiveness of $k$ drugs.
▶ $n_j$ = size of the sample from the jth treatment.
▶ $n = \sum_{j=1}^{k} n_j$ is the total number of sample points.
▶ $Y_{ij}$ = ith sample point from the jth treatment.
14. Notations 2
▶ $T_{\cdot j} = \sum_{i=1}^{n_j} Y_{ij}$ is the sum of the numbers in the jth treatment.
▶ $T_{\cdot\cdot} = \sum_{j=1}^{k} T_{\cdot j}$ is the sum of all the numbers $Y_{ij}$.
▶ $\bar{Y}_{\cdot j} = \dfrac{T_{\cdot j}}{n_j} = \dfrac{1}{n_j} \sum_{i=1}^{n_j} Y_{ij}$ is the sample mean of the jth treatment.
▶ $\bar{Y}_{\cdot\cdot} = \dfrac{T_{\cdot\cdot}}{n}$ is the average of all sample points.
19. An example

Treatment                                            A    B    C
data                                                 1    6    9
                                                     3    5    8
                                                          1    7
$T_{\cdot j}$                                        4   12   24   $T_{\cdot\cdot} = 40$       totals
$n_j$                                                2    3    3   $n = 8$                     sample sizes
$\bar{Y}_{\cdot j}$                                  2    4    8   $\bar{Y}_{\cdot\cdot} = 5$  averages
$S_j^2$                                              2    7    1                               sample variances
$(n_j - 1)S_j^2$                                     2   14    2   $SSE = 18$                  error
$n_j(\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2$   18    3   27   $SSTR = 48$                 treatment
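The bookkeeping in this table can be reproduced with a short Python sketch (not part of the original slides); the raw data are taken directly from the table above.

```python
# One-way ANOVA bookkeeping for the small example:
# A = (1, 3), B = (6, 5, 1), C = (9, 8, 7).
import numpy as np

groups = {"A": [1, 3], "B": [6, 5, 1], "C": [9, 8, 7]}
data = {name: np.asarray(y, dtype=float) for name, y in groups.items()}

n = sum(len(y) for y in data.values())                 # total sample size, n = 8
grand_mean = sum(y.sum() for y in data.values()) / n   # Y_bar_.. = T_.. / n

# SSE: pooled within-group variability, sum of (n_j - 1) * S_j^2
sse = sum((len(y) - 1) * y.var(ddof=1) for y in data.values())

# SSTR: between-group variability, sum of n_j * (Y_bar_.j - Y_bar_..)^2
sstr = sum(len(y) * (y.mean() - grand_mean) ** 2 for y in data.values())

print(grand_mean, sse, sstr)   # 5.0 18.0 48.0
```

Note `ddof=1` in `var`, which gives the sample variance $S_j^2$ (dividing by $n_j - 1$ rather than $n_j$).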
23. The theory
▶ The theory of ANOVA is based on the model $Y_{ij} = \mu_j + \epsilon_{ij}$,
where $\mu_j$ is the average effect (true mean) of treatment j and
the $\epsilon_{ij}$ are independent normal variables, $\epsilon_{ij} \sim N(0, \sigma^2)$.
▶ Equivalently, $Y_{ij} \sim N(\mu_j, \sigma^2)$.
▶ Let $\mu$ be the true mean of the total population:
$\mu = \dfrac{1}{n} \sum_{j=1}^{k} n_j \mu_j.$
Then
$\bar{Y}_{\cdot j} \sim N\!\left(\mu_j, \dfrac{\sigma^2}{n_j}\right)$ and $\bar{Y}_{\cdot\cdot} \sim N\!\left(\mu, \dfrac{\sigma^2}{n}\right).$
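A quick simulation illustrates the model: with large samples, each group average $\bar{Y}_{\cdot j}$ concentrates near its true mean $\mu_j$. The means and $\sigma$ below are made-up illustration values, not from the slides.

```python
# Simulation sketch of the ANOVA model Y_ij = mu_j + eps_ij, eps_ij ~ N(0, sigma^2).
# mu and sigma are hypothetical values chosen only for illustration.
import numpy as np

rng = np.random.default_rng(0)
mu = [2.0, 4.0, 8.0]   # true treatment means mu_j (hypothetical)
sigma = 1.0
n_j = 10_000           # large samples so each average settles near mu_j

samples = [m + sigma * rng.standard_normal(n_j) for m in mu]
group_means = [y.mean() for y in samples]

# Each Y_bar_.j is approximately N(mu_j, sigma^2 / n_j), so it lies
# within a few multiples of sigma / sqrt(n_j) = 0.01 of mu_j.
print([round(m, 2) for m in group_means])
```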
26. Treatment sum of squares
$SSTR = \sum_{j=1}^{k} n_j (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2$
When the treatments are different, the treatment sum of squares
gets larger.
Theorem 1
$SSTR = \sum_{j=1}^{k} n_j (\bar{Y}_{\cdot j} - \mu)^2 - n(\bar{Y}_{\cdot\cdot} - \mu)^2.$
Theorem 2
$E(SSTR) = (k-1)\sigma^2 + \sum_{j=1}^{k} n_j (\mu_j - \mu)^2.$
27. Proof of Theorem 2
According to our model,
$\bar{Y}_{\cdot j} \sim N(\mu_j, \sigma^2/n_j), \quad \bar{Y}_{\cdot\cdot} \sim N(\mu, \sigma^2/n).$
Therefore,
$E[(\bar{Y}_{\cdot\cdot} - \mu)^2] = \mathrm{Var}(\bar{Y}_{\cdot\cdot}) = \dfrac{\sigma^2}{n},$
$\mathrm{Var}(\bar{Y}_{\cdot j} - \mu) = \mathrm{Var}(\bar{Y}_{\cdot j}) = \dfrac{\sigma^2}{n_j}.$
The variance can also be computed using $\mathrm{Var}(X) = E(X^2) - E(X)^2$:
$\dfrac{\sigma^2}{n_j} = \mathrm{Var}(\bar{Y}_{\cdot j} - \mu) = E[(\bar{Y}_{\cdot j} - \mu)^2] - \big(E(\bar{Y}_{\cdot j} - \mu)\big)^2 = E[(\bar{Y}_{\cdot j} - \mu)^2] - (\mu_j - \mu)^2.$
28. Proof of Theorem 2, continued
So
$E[(\bar{Y}_{\cdot j} - \mu)^2] = \dfrac{\sigma^2}{n_j} + (\mu_j - \mu)^2.$
Therefore,
$E(SSTR) = \sum_j n_j E[(\bar{Y}_{\cdot j} - \mu)^2] - n E[(\bar{Y}_{\cdot\cdot} - \mu)^2]$
$= \sum_j n_j \left( \dfrac{\sigma^2}{n_j} + (\mu_j - \mu)^2 \right) - n \cdot \dfrac{\sigma^2}{n}$
$= k\sigma^2 + \sum_j n_j (\mu_j - \mu)^2 - \sigma^2$
$= (k-1)\sigma^2 + \sum_j n_j (\mu_j - \mu)^2.$
29. Sum of squares formula
Example
data vector $(Y_{ij}) = (1, 3;\ 6, 5, 1;\ 9, 8, 7)$
$(\bar{Y}_{\cdot j}) = (2, 2;\ 4, 4, 4;\ 8, 8, 8)$
34. Sum of squares identity
Theorem
$SSTOT = SSTR + SSE.$
This identity represents
$\sum_{i,j} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = \sum_{i,j} (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2 + \sum_{i,j} (Y_{ij} - \bar{Y}_{\cdot j})^2.$
Theorem
Suppose that $\mu_1 = \mu_2 = \cdots = \mu_k = \mu$ is true. Then
$\dfrac{SSTR}{\sigma^2} \sim \chi^2_{k-1}, \quad \dfrac{SSE}{\sigma^2} \sim \chi^2_{n-k}.$
Furthermore, SSTR and SSE are independent.
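The identity SSTOT = SSTR + SSE can be checked numerically on the earlier small example (treatments A, B, C with SSTR = 48 and SSE = 18); this sketch is not part of the original slides.

```python
# Numerical check of SSTOT = SSTR + SSE on the data 1,3 / 6,5,1 / 9,8,7.
import numpy as np

groups = [np.array([1., 3.]), np.array([6., 5., 1.]), np.array([9., 8., 7.])]
all_y = np.concatenate(groups)
grand = all_y.mean()

sstot = ((all_y - grand) ** 2).sum()                            # total variability
sstr = sum(len(y) * (y.mean() - grand) ** 2 for y in groups)    # between groups
sse = sum(((y - y.mean()) ** 2).sum() for y in groups)          # within groups

print(sstot, sstr + sse)   # 66.0 66.0
```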
36. F test
Theorem
Under the same conditions,
$F = \dfrac{SSTR/(k-1)}{SSE/(n-k)} \sim F_{k-1,\,n-k},$
and the null hypothesis ($\mu_1 = \mu_2 = \cdots = \mu_k = \mu$) should be
rejected at the level $\alpha$ of significance if the test statistic satisfies
$F \geq F_{1-\alpha,\,k-1,\,n-k}$.
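The critical value $F_{1-\alpha,\,k-1,\,n-k}$ is usually read from a table, but it can also be computed with scipy's F distribution. As a sketch, here it is for the small example's sizes ($k = 3$, $n = 8$):

```python
# Critical value F_{1-alpha, k-1, n-k} via the F quantile function (ppf).
from scipy.stats import f

alpha, k, n = 0.05, 3, 8
crit = f.ppf(1 - alpha, k - 1, n - k)   # F_{0.95, 2, 5}
print(round(crit, 2))   # 5.79
```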
38. The problem

k = 3 treatments            Drug A   Drug B   Drug C
column # j                  1        2        3
sample size $n_j$           7        8        10
mean $\bar{Y}_{\cdot j}$    80       88       90
variance $S_j^2$            5.2      4.8      5.4

Are these drugs different?
40. Finding totals and averages

treatment                                      Drug A   Drug B   Drug C
j                                              1        2        3
$n_j$                                          7        8        10
$\bar{Y}_{\cdot j}$                            80       88       90
$T_{\cdot j} = n_j \cdot \bar{Y}_{\cdot j}$    560      704      900

Therefore,
$T_{\cdot\cdot} = \sum_j T_{\cdot j} = 2164$
and
$\bar{Y}_{\cdot\cdot} = \dfrac{T_{\cdot\cdot}}{n} = \dfrac{2164}{25} = 86.56.$
43. Finding SSTR and MSTR

treatment                                                 Drug A   Drug B   Drug C
j                                                         1        2        3
$n_j$                                                     7        8        10
$\bar{Y}_{\cdot j}$                                       80       88       90
$n_j \cdot (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2$  301.24   16.59    118.34

Therefore,
$SSTR = \sum_j n_j \cdot (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2 = 436.16.$
The number of degrees of freedom of SSTR is $k - 1 = 2$. So
$MSTR = \dfrac{SSTR}{k - 1} = \dfrac{436.16}{2} = 218.08.$
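SSTR and MSTR need only the summary statistics, not the raw data. A short sketch with the sample sizes and group means from the slide:

```python
# SSTR and MSTR from the summary statistics for Drugs A, B, C.
n_j = [7, 8, 10]
ybar_j = [80.0, 88.0, 90.0]

n = sum(n_j)                                               # 25
grand = sum(nj * yb for nj, yb in zip(n_j, ybar_j)) / n    # 2164 / 25 = 86.56

sstr = sum(nj * (yb - grand) ** 2 for nj, yb in zip(n_j, ybar_j))
mstr = sstr / (len(n_j) - 1)                               # k - 1 = 2 degrees of freedom
print(round(sstr, 2), round(mstr, 2))   # 436.16 218.08
```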
48. Finding SSE and MSE

treatment                Drug A   Drug B   Drug C
$n_j$                    7        8        10
$S_j^2$                  5.2      4.8      5.4
$(n_j - 1) \cdot S_j^2$  31.2     33.6     48.6

The sum of squared errors (SSE) measures random error and the
variability of the data. It tells us nothing about the treatments.
$SSE = \sum_j (n_j - 1) \cdot S_j^2 = 113.4.$
The number of degrees of freedom is $\sum_j (n_j - 1) = n - k = 22$, and the mean
squared error (MSE) is
$MSE = \dfrac{SSE}{n - k} = \dfrac{113.4}{22} = 5.15.$
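SSE and MSE likewise follow directly from the per-group sample variances on the slide:

```python
# SSE and MSE from the per-group sample variances for Drugs A, B, C.
n_j = [7, 8, 10]
s2_j = [5.2, 4.8, 5.4]

sse = sum((nj - 1) * s2 for nj, s2 in zip(n_j, s2_j))   # pooled within-group SS
dfe = sum(nj - 1 for nj in n_j)                         # n - k = 22
mse = sse / dfe
print(round(sse, 1), dfe, round(mse, 2))   # 113.4 22 5.15
```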
54. ANOVA, F test
▶ The test statistic is
$F = \dfrac{SSTR/(k-1)}{SSE/(n-k)} = \dfrac{MSTR}{MSE} = \dfrac{218.08}{5.15} = 42.3.$
▶ In ANOVA the F test is always right-tailed.
▶ When the test statistic F is large, we conclude that there is a
significant difference between the drugs. This is because the
numerator measures the difference between the treatments,
and the denominator measures the mean variability of the data.
▶ The critical value is $F_{1-\alpha,\,k-1,\,n-k} = F_{0.95,\,2,\,22} = 3.44$.
▶ Since the test statistic is much larger than the critical value,
we reject $H_0$ and conclude that the drugs are different. But
we don't know if they make people better or worse!
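The whole decision rule can be checked in a few lines with scipy, using MSTR and MSE from the previous slides:

```python
# F test for the drug example: statistic, critical value, and right-tailed p-value.
from scipy.stats import f

k, n = 3, 25
mstr, mse = 218.08, 5.15
F = mstr / mse

crit = f.ppf(0.95, k - 1, n - k)   # critical value F_{0.95, 2, 22}
p = f.sf(F, k - 1, n - k)          # right-tailed p-value, P(F_{2,22} > F)

print(round(F, 1), round(crit, 2), F > crit)   # 42.3 3.44 True
```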
57. Summary of results
The traditional way to summarize the results is by the following
chart with either the critical F value or the p-value in the last
column.

Source      SS       df   MS       F      p
Treatment   436.16   2    218.08   42.3   2.9 × 10⁻⁸
Error       113.4    22   5.15
Total       549.56   24

The conclusion is that at least one of the drugs is different from
the other two. We need to do additional tests to see which one is
different.
61. ANOVA F test
In planning for future staffing, the ages of 19 hospital staff
members were analyzed. Three groups (nurses, doctors, and x-ray
techs) were chosen. At α = 0.05, can it be concluded that the
average ages of the three groups differ? (See Excel workbook:
Chapter 12.)

Source      SS        df   MS       F      p
Treatment   1190.48   2    595.24   5.96   0.0116
Error       1598.05   16   99.88
Total       2788.53   18

Since F is big (or, the p-value is small), we reject $H_0$ and conclude
that the average ages of the three groups differ.
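The table's p-value can be recovered from its F statistic alone, as the right tail of the $F_{2,16}$ distribution:

```python
# p-value for the hospital-staffing example: p = P(F_{2,16} > 5.96).
from scipy.stats import f

p = f.sf(5.96, 2, 16)
print(round(p, 4))   # 0.0116
```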
62. Kruskal-Wallis test
We could also work out this problem using the nonparametric
Kruskal-Wallis test.
The Kruskal-Wallis statistic is
$B = \dfrac{12}{n(n+1)} \cdot \sum_{j=1}^{k} \dfrac{R_j^2}{n_j} - 3(n+1) = 6.63,$
where $R_j$ is the sum of the ranks in the jth group, and the critical value is
$\chi^2_{1-\alpha,\,k-1} = \chi^2_{0.95,\,2} = 5.99.$
So we reject $H_0$ and conclude that the average ages of the three
groups differ.
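The chi-square critical value (and the decision) can be checked with scipy; $B = 6.63$ is the statistic reported above, since the raw ranks are not shown here.

```python
# Kruskal-Wallis decision: compare B against the chi^2_{0.95, k-1} critical value.
from scipy.stats import chi2

B = 6.63                     # Kruskal-Wallis statistic from the slide
crit = chi2.ppf(0.95, 2)     # chi^2_{0.95, 2}
print(round(crit, 2), B > crit)   # 5.99 True
```

With raw data in hand, `scipy.stats.kruskal(group1, group2, group3)` would compute both the statistic and its p-value directly.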