A full lecture presentation on ANOVA.
Areas covered include:
a. Definition and purpose of ANOVA
b. One-way ANOVA
c. Factorial ANOVA
d. Multiple ANOVA
e. MANOVA
f. Post hoc tests: types
g. An easy step-by-step process for calculating a post hoc test
3. INTRODUCTION
What is ANOVA?
It is a statistical procedure used to test the degree to which
the means of 2 or more groups differ in an experiment.
- In ANOVA we look at whether the group means vary from each
other by more than the individual differences within the
groups would lead us to expect.
- Developed by Sir Ronald A. Fisher in the 1920s.
4. Purpose
The purpose of ANOVA is much the same as that of the t
tests: the goal is to determine whether the mean
differences that are obtained for sample data are
sufficiently large to justify a conclusion that there are
mean differences between the populations from which
the samples were obtained.
5. Why Anova?
1. Running many two-sample t tests is problematic because it increases the risk
of a Type I error: the more t tests you run, the greater the risk
of a Type I error.
- At the 0.05 level of significance, with one hundred comparisons about 5
will show a difference when none exists.
- The difference between ANOVA and the t tests is that
ANOVA can be used in situations where two or
more means are being compared, whereas the t tests are
limited to situations where only two means are involved.
- ANOVA keeps the Type I error rate at the chosen alpha level when a study is
comparing more than two population means.
2. ANOVA allows us to see whether there are differences between the
means with a single OMNIBUS test.
- It tests whether the explained variance in a set of data
is significantly greater than the unexplained variance.
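The inflation of the Type I error rate described above can be put into numbers. This minimal sketch assumes the tests are independent (pairwise t tests that share groups are not, so it is only an approximation): the chance of at least one false positive in m tests at level alpha is 1 - (1 - alpha)^m, and the expected number of false positives is m × alpha. The helper names are illustrative.

```python
def familywise_error(alpha, m):
    """Probability of at least one Type I error across m independent tests."""
    return 1 - (1 - alpha) ** m


def expected_false_positives(alpha, m):
    """Expected number of tests that reject H0 by chance when every H0 is true."""
    return alpha * m


for m in (1, 3, 10, 100):
    print(f"{m:>3} tests: P(at least one Type I error) = {familywise_error(0.05, m):.3f}, "
          f"expected false positives = {expected_false_positives(0.05, m):.2f}")
```

With three groups there are already three pairwise t tests, which is why a single omnibus F test is preferred.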
6. CONDITIONS FOR ANOVA
1. Data must be experimental
2. With many experimental designs the sample sizes
must be equal for various factor level combinations
7. Assumptions
1) Normality: The values in each group are normally
distributed.
2) Homogeneity of variances: The variance within
each group should be equal for all groups.
3) Independence of error: The error (variation of each
value around its own group mean) should be
independent for each value.
8. Types of ANOVA
- One-way ANOVA
  - One-way repeated measures ANOVA
- Factorial ANOVA
- Multiple ANOVA
- MANOVA
10. Variance
Why do scores vary?
1. What contributes to differences in scores?
- Individual differences
- Which group you are in
There is variation any time the data values are not all
identical, and this variation can come from different
sources, such as the model or the factor.
- There is always leftover variation that cannot be
explained by any other source. This source is called the
error.
11. Variation can be:
a) Within-group variation: variability or
differences within particular groups
b) Between-group variation: differences depending on
which group one is in (which treatment was received)
- We are actually examining the ratio of the
differences (variance) due to treatment to the
variance due to individual differences. If the
ratio is large enough, there is a significant impact from
treatment.
12.
We can assess this variation by evaluating the ratio of
the between-group mean square (MST) to the within-group mean square (MSE)
and conducting an F test.
This allows us to compare multiple means at once.
Between-group variance reflects differences in the way
the groups were treated; within-group variance reflects
individual differences.
Null hypothesis: there is no difference between the population
means (they are all equal).
Alternative hypothesis: at least one population mean differs
from the others.
We are comparing variance estimates:
Variance = SS/df
13. Degrees of freedom
These are the number of values that are free to vary once
certain parameters have been estimated.
Usually this is one less than the sample size, but in
general it is the number of values minus the number of
parameters being estimated. It is abbreviated as df.
14. Variance (MS)
The sample variance is the average squared deviation
from the mean, found by dividing the variation (the sum of
squares, SS) by the degrees of freedom.
It is abbreviated as MS, for mean square:
Variance (MS) = variation / df, i.e. MS = SS/df
15. F
F is the F statistic: the ratio of two sample variances.
There will be an F statistic for each source except for the
error and the total.
The MS column contains the variances.
The F test statistic for each source is the MS for that row
divided by the MS of the error row.
16. F
F requires a pair of degrees of freedom, one for the
numerator and another for the denominator.
The numerator df is the degrees of freedom for the
source.
The denominator df is the df for the error row.
The F test is always right-tailed.
17. Notes on ANOVA
MS(Total) isn't actually part of the ANOVA table,
but it represents the sample variance of the response
variable, so it's useful to find.
The total df is one less than the sample size.
To finish the hypothesis test, you need either the critical F value or the
p-value.
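The p-value route can be sketched without printed tables. Because the F test is right-tailed, the p-value is the area to the right of the calculated F. In SciPy this is `scipy.stats.f.sf(F, df1, df2)`; the standard-library sketch below instead uses the identity P(F > f) = I_x(df2/2, df1/2) with x = df2/(df2 + df1·f), evaluating the regularized incomplete beta function I with the continued-fraction method popularized by Numerical Recipes. The function names are hypothetical, and this is an illustration rather than a validated numerical library.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for a, b > 0 and 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_right_tail_p(f_value, df1, df2):
    """P(F > f_value) for an F distribution with (df1, df2) degrees of freedom."""
    if f_value <= 0:
        return 1.0
    x = df2 / (df2 + df1 * f_value)
    return regularized_incomplete_beta(df2 / 2.0, df1 / 2.0, x)


# A value near the tabled 5% critical point should give a p-value close to 0.05.
print(round(f_right_tail_p(3.885, 2, 12), 4))
```

Reject H0 when the p-value is below alpha; this is equivalent to the calculated F exceeding the critical F.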
18. One-way ANOVA
Measures the outcome of an intervention in subjects over
time.
There is one independent variable (for example, time on
therapy) that has multiple levels, and one dependent variable.
19. One-way ANOVA
Determines whether the means of 2 or more independent
groups are significantly different from one another.
Only 1 independent variable (factor/grouping
variable) with ≥3 levels
Only 1 dependent variable
Grouping variable: nominal
Outcome variable: interval or ratio
20. Steps
1. State the null and alternative hypotheses
2. State alpha
3. Calculate the degrees of freedom
4. State the decision rule
5. Calculate the test statistic:
   - Calculate the variance between samples
   - Calculate the variance within samples
   - Calculate the F statistic
6. If F is significant, perform a post hoc test
7. State the results and conclusion
21. Steps
1. State the null and alternative hypotheses
$H_0: \mu_1 = \mu_2 = \dots = \mu_k$ (all population means are equal)
$H_a$: not all of the $\mu_j$ are equal; at least one population has a different
mean
2. State alpha: 0.05
3. Calculate the degrees of freedom: (k - 1) between samples and (n - k) within samples ((n - 1) in total)
k = no. of samples
n = total number of observations
4. State the decision rule:
if the calculated F value > the table value of F, reject H0
5. Calculate the test statistic
22. Calculating variance between samples
1. Calculate the mean of each sample.
2. Calculate the grand average.
3. Take the difference between the mean of each sample and the
grand average.
4. Square these deviations, multiply each by its sample size, and add
them up; the total is the sum of squares between samples (SSC).
5. Divide the total obtained in step 4 by its degrees of
freedom (k - 1) to calculate the mean sum of squares between
samples (MSC).
23. Calculating variance within samples
1. Calculate the mean value of each sample.
2. Take the deviations of the various items in a sample from
the mean value of their respective sample.
3. Square these deviations and obtain the total, which gives the sum
of squares within samples (SSE).
4. Divide the total obtained in step 3 by its degrees of
freedom (n - k) to calculate the mean sum of squares within
samples (MSE).
24. Mean sum of squares
1. MSC (between samples)
MSC = SSC/(k - 1)
2. MSE (within samples)
MSE = SSE/(n - k)
where
k = no. of samples
n = total number of observations
25. Calculation of the F statistic
F = variability between groups / variability within
groups
F statistic = MSC/MSE
Compare the F statistic with the critical F value,
which is found in the F distribution
tables for the degrees of freedom (k - 1, n - k).
If the calculated value of F is > the table value, H0 is rejected.
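The steps on slides 22 to 25 can be sketched in a few lines of Python. This is a minimal standard-library illustration; the function name `one_way_anova` is hypothetical, and the scores are those tabulated for the preschool visual-acuity example (X1, X2 and X3 on slide 31). Because it works from unrounded means, its figures can differ slightly from hand-rounded values.

```python
def one_way_anova(groups):
    """Return SSC, SSE, MSC, MSE, F and (df between, df within) for a list of samples."""
    k = len(groups)                                  # number of samples
    n = sum(len(g) for g in groups)                  # total number of observations
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n          # grand average
    # Between samples: squared deviation of each sample mean from the grand
    # average, weighted by the sample size.
    ssc = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    # Within samples: squared deviation of each item from its own sample mean.
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    msc = ssc / (k - 1)
    mse = sse / (n - k)
    return {"SSC": ssc, "SSE": sse, "MSC": msc, "MSE": mse,
            "F": msc / mse, "df": (k - 1, n - k)}


scores = [[8, 10, 8, 9, 10], [5, 5, 8, 7, 9], [3, 3, 5, 4, 6]]
result = one_way_anova(scores)
for key in ("SSC", "SSE", "MSC", "MSE", "F"):
    print(f"{key} = {result[key]:.2f}")
print("df =", result["df"])
```

The returned F is then compared with the table value of F for the returned degrees of freedom, exactly as in the decision rule above.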
26.
[Figure: within-group variance compared with between-group variance]
When the between-group variance is large relative to
the within-group variance, the F statistic will
be larger; if it is > the critical value, it is
statistically significant.
Conclusion: at least one of the group means is
significantly different from the other group
means.
28. Example:
Three samples of preschool children, obtained from normal
populations with equal variances, were grouped according
to their visual acuity (6/24, 6/36 and 6/60) as measured during
a vision screening. Their scores for ease of performing daily
activities were assessed and recorded on a scale of 1-10 (1 coded
for being unable to perform a particular task and 10 for
most ease in performing a task). Test the hypothesis that
the mean scores of the three populations are equal.
30. Variance BETWEEN samples
Sum of squares between samples (SSC), with sample means M1 = 9, M2 = 6.8,
M3 = 4.2 and grand average 6.67:
$SSC = n_1(M_1 - \bar{X})^2 + n_2(M_2 - \bar{X})^2 + n_3(M_3 - \bar{X})^2$
$SSC = 5(9 - 6.67)^2 + 5(6.8 - 6.67)^2 + 5(4.2 - 6.67)^2 \approx 57.73$
Calculation of the mean sum of squares between samples (MSC):
$MSC = \frac{SSC}{k - 1} = \frac{57.73}{3 - 1} \approx 28.87$
k = no. of samples, n = total no. of observations
31. Variance WITHIN samples
X1    (X1 - M1)²   X2   (X2 - M2)²   X3   (X3 - M3)²
8     1            5    3.24         3    1.44
10    1            5    3.24         3    1.44
8     1            8    1.44         5    0.64
9     0            7    0.04         4    0.04
10    1            9    4.84         6    3.24
Sum   4                 12.8              6.8
Sum of squares within samples (SSE) = 4 + 12.8 + 6.8 = 23.6
Calculation of the mean sum of squares within samples (MSE):
$MSE = \frac{SSE}{n - k} = \frac{23.6}{15 - 3} \approx 1.97$
32. Calculation of the ratio F
$F = \frac{\text{variability between groups}}{\text{variability within groups}} = \frac{MSC}{MSE} = \frac{28.87}{1.97} \approx 14.65$
The table value of F at the 5% level of significance for df (2, 12) is 3.89.
The calculated value of F > the table value, so
H0 is rejected. Hence there is a significant difference among the
group means.
33. Post hoc test
Uses: post hoc tests are designed for situations in
which the researcher has already obtained a significant
omnibus F test. The researcher then analyses the differences
among the means to find which means are
significantly different from each other.
34. Types of post hoc tests
Depending on the research design and research question:
Bonferroni (more flexible)
- Used when only some pairs of sample means are to be tested
- The desired alpha level is divided by the number of comparisons
Tukey's HSD procedure
- Used when all pairs of sample means are to be tested
Scheffe's procedure (when sample sizes are unequal)
35. Tukey's honest significant difference
Tukey's test calculates a new critical value that can be used
to evaluate whether the difference between any two
means is significant.
The critical value is a little different because it is expressed as the
mean difference that has to be exceeded to achieve
significance.
Put simply, one calculates a single critical value and then the
differences between all possible pairs of means; each
difference is then compared with the Tukey critical value.
If the difference is larger than the Tukey value, the
comparison is significant.
36. Tukey HSD
The statistic for comparing two treatment means is
$t_{HSD} = \frac{M_1 - M_2}{\sqrt{MS_w \cdot (1/n)}}$
1. M1 and M2 are the treatment group means being compared
2. MSw is the mean square within (error) from the overall F test
3. n is the sample size of each treatment group
Each $t_{HSD}$ is compared with the critical value of the studentized range
statistic q for k means and $df_w$ degrees of freedom.
37. Post hoc for our data
$t_{HSD} = \frac{M_i - M_j}{\sqrt{MS_w \cdot (1/n)}}$
M1 = 9, M2 = 6.8, M3 = 4.2, n = 5
MSw = 1.97 (the MSE from the F test, with dfw = 12)
$\sqrt{1.97 \times (1/5)} = \sqrt{0.394} \approx 0.6277$
1. M1 vs M2 = (9 - 6.8)/0.6277 ≈ 3.50
2. M1 vs M3 = (9 - 4.2)/0.6277 ≈ 7.65
38. Cont.
3. M2 vs M3 = (6.8 - 4.2)/0.6277 ≈ 4.14
M1 vs M2 = 3.50
M1 vs M3 = 7.65
M2 vs M3 = 4.14
Using Tukey's studentized range (q) table with k = 3 means and dfw = 12,
the critical value at the 0.05 level is 3.77. The M1 vs M3 and M2 vs M3
comparisons exceed 3.77 and are statistically significant; the M1 vs M2
comparison (3.50) does not, so means M1 and M2 do not differ significantly.
39. THE ANOVA OUTPUT TABLE
The ANOVA table is composed of rows; each row
represents one source of variation.
For each source of variation:
The variation is in the SS column.
The degrees of freedom are in the df column.
The variance is in the MS column.
The MS value is found by dividing the SS by the df.
40. The ANOVA output table
Source                 SS (variation)   df   MS (variance)   F
Explained
Error
Total
41. LET'S FILL OUR OUTPUT TABLE WITH THE VALUES WE CALCULATED
Source                        SS      df   MS      F
Explained (between groups)    57.73   2    28.87
Error (within groups)         23.60   12   1.97
Total                         81.33   14
Divide SS by df to get MS.
42. Find F
Source      SS      df   MS      F
Explained   57.73   2    28.87   14.65
Error       23.60   12   1.97
Total       81.33   14
F = MS(explained)/MS(error) = 28.87/1.97 ≈ 14.65
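The Tukey comparisons for this example can be recomputed from the raw scores on slide 31. This is a minimal standard-library sketch that assumes equal group sizes, as the slide formula does; MSw is computed from the data rather than typed in. The critical value 3.77 is the tabled studentized-range value for k = 3 means and df_w = 12 at alpha = 0.05; it is hard-coded, not computed. The function and variable names are illustrative.

```python
from itertools import combinations
from math import sqrt

# Scores from the visual-acuity example (slide 31): three samples of 5 children.
groups = {
    "X1": [8, 10, 8, 9, 10],
    "X2": [5, 5, 8, 7, 9],
    "X3": [3, 3, 5, 4, 6],
}

# Tabled studentized-range critical value for k = 3, df_w = 12, alpha = 0.05.
# Hard-coded from a Tukey table; this sketch does not compute it.
Q_CRIT = 3.77


def mean(xs):
    return sum(xs) / len(xs)


def tukey_hsd(groups, q_crit):
    """Return (group_i, group_j, t_hsd, significant) for every pair of groups.

    Assumes equal group sizes, matching t_hsd = (Mi - Mj) / sqrt(MSw * (1/n)).
    """
    sizes = {len(xs) for xs in groups.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes equal group sizes")
    n = sizes.pop()
    means = {name: mean(xs) for name, xs in groups.items()}
    # MSw is the pooled within-group mean square (MSE) from the one-way ANOVA.
    sse = sum((x - means[name]) ** 2 for name, xs in groups.items() for x in xs)
    df_w = n * len(groups) - len(groups)
    msw = sse / df_w
    se = sqrt(msw * (1 / n))
    results = []
    for a, b in combinations(groups, 2):
        t_hsd = abs(means[a] - means[b]) / se
        results.append((a, b, t_hsd, t_hsd > q_crit))
    return results


for a, b, t_hsd, significant in tukey_hsd(groups, Q_CRIT):
    print(f"{a} vs {b}: t_hsd = {t_hsd:.2f}, significant at 0.05: {significant}")
```

Computing from unrounded means gives values a hundredth or so away from the hand calculation; the significance decisions do not depend on that rounding.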
43. Violations of Assumptions
Normality
Choose the non-parametric Kruskal-Wallis H test,
which does not require the assumption of normality.
Homogeneity of variances
Use the Welch test, the Brown-Forsythe test or the
Kruskal-Wallis H test.
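When normality is doubtful, the Kruskal-Wallis H test named above replaces the scores by their ranks. The sketch below computes H, with the usual correction for tied ranks, for the visual-acuity scores from the one-way example. The function names are illustrative; `scipy.stats.kruskal` is the library routine that computes the same tie-corrected statistic together with a p-value. H is compared with a chi-square distribution with k - 1 degrees of freedom (5.99 at the 0.05 level for k = 3).

```python
from collections import Counter


def average_ranks(values):
    """Rank values 1..N, giving tied values the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # sorted positions i..j hold ranks i+1..j+1
        for p in range(i, j + 1):
            ranks[order[p]] = shared
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic, corrected for ties."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Divide by 1 - sum(t^3 - t) / (N^3 - N), where t is the size of each set of ties.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))


scores = [[8, 10, 8, 9, 10], [5, 5, 8, 7, 9], [3, 3, 5, 4, 6]]
h = kruskal_wallis_h(scores)
print(f"H = {h:.2f}; compare with the chi-square critical value 5.99 (df = 2, alpha = 0.05)")
```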
45. Factorial ANOVA
Introduction
• Often, we wish to study 2 (or more) factors in a
single experiment, for example to:
– Compare two or more treatment protocols
– Compare scores of people who are young, middle-aged,
and elderly
• The baseline experiment will therefore have two
factors as independent variables:
– Treatment type
– Age groups
46. Two-Way ANOVA
Factorial ANOVA has:
One dependent variable
interval or ratio, with a normal distribution
Two independent variables
nominal (they define the groups) and independent of each other
Three hypothesis tests:
Test the effect of each independent variable, controlling for
the effects of the other independent variable
One: H0: factor A has no impact on the outcome (no relationship)
Two: H0: factor B has no impact on the outcome
Three: Test the interaction effect for combinations of categories
H0: there is no interaction effect of factors
A and B on the outcome
48. Assumptions for the Two-Factor ANOVA
The validity of the ANOVA presented here
depends on three assumptions common to other
hypothesis tests:
1. The observations within each sample must be
independent of each other
2. The populations from which the samples are selected
must be normally distributed
3. The populations from which the samples are selected
must have equal variances
(homogeneity of variance)
49. Factorial ANOVA asks three questions
and tests three null hypotheses
• First:
• Does Factor 1 have any
impact on the Outcome?
• Null: The groups defined by Factor 1 will have the same Mean
Outcome.
• Second:
• Does Factor 2 have any impact on the Outcome?
• Null: The groups defined by Factor 2 will have the same
Mean Outcome.
• Third:
• Do Factor 1 and Factor 2 interact in influencing Outcome?
• Null: No combination of Factor 1 and Factor 2 produces
unusually high or unusually low mean Outcome scores.
50. Plotting the Main Effects
Plot the means of each group (defined as a combination of
Factor 1 and Factor 2)
If all the null hypotheses are true, all the points will have
about the same Mean Outcome level.
51. The Null Result: No Effects
The two row means are
the same
The two column means
are the same
All groups have the
same mean score
Neither factor had any
effect
52. Main effects
[Figure: two panels of group means]
Panel 1: row means the same; column means differ
Panel 2: row means differ; column means the same
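53. Degrees of freedom for the two-factor ANOVA
$df_{total} = N - 1$
$df_{within\ treatments} = \sum df_{inside\ each\ treatment} = N - k$
$df_{between\ treatments} = k - 1$, where k is the number of treatment combinations
$df_A = (\text{number of levels of factor A}) - 1$
$df_B = (\text{number of levels of factor B}) - 1$
$df_{A \times B} = df_{between\ treatments} - df_A - df_B$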
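The degrees-of-freedom rules above can be written as a small helper for a balanced design with a levels of factor A, b levels of factor B and the same number of scores in every cell. The function name is illustrative. For the anxiety example later in the lecture (2 schools × 3 dosages, 6 participants per cell) it gives 35 total, 30 within, 1 for school, 2 for dosage and 2 for the interaction.

```python
def two_factor_df(a_levels, b_levels, n_per_cell):
    """Degrees of freedom for a balanced A x B design (same n in every cell)."""
    k = a_levels * b_levels          # number of treatment combinations (cells)
    big_n = k * n_per_cell           # total number of scores
    df_a = a_levels - 1
    df_b = b_levels - 1
    df_between = k - 1
    return {
        "total": big_n - 1,
        "within": big_n - k,         # sum over cells of (n_per_cell - 1)
        "between": df_between,
        "A": df_a,
        "B": df_b,
        "AxB": df_between - df_a - df_b,
    }


print(two_factor_df(2, 3, 6))
```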
54. The total sum of squares can be partitioned into
“explained” and “unexplained” components,
i.e.
• Explained sum of squares component (variation
explained by differences between groups)
• Unexplained sum of squares component (variation
due to differences within groups)
55. EXAMPLE
A researcher wants to test a new anti-anxiety medication.
They measure the anxiety of 36 participants on three
different dosages of the medication: 0 mg, 50 mg and 100
mg. The participants are also divided based on which school
they are attending, which the researchers hypothesize will also
affect anxiety levels.
Anxiety is rated on a scale of 1-10, with 10 being high
intensity and 1 being low intensity.
Use alpha = 0.05 to conduct your analysis.
56. Example
                        0 mg   50 mg   100 mg
High school students     3       5        7
                         4       4        8
                         5       3        7
                         3       5        8
                         4       5        7
                         3       5        7
College students         1       5        9
                         2       4        8
                         1       3        9
                         1       5        8
                         1       5        7
                         2       4        9
57. Factorial ANOVA will produce an F-ratio for each
main effect and for each interaction.
• Main effect: school attended – F ratio.
• Main effect: dosage – F ratio.
• Interaction effect: school attended by dosage – F
ratio
58. We will consider the effect of multiple independent
variables on a single dependent variable,
i.e.:
First independent variable: dosage
Level 1: 0 mg
Level 2: 50 mg
Level 3: 100 mg
Second independent variable: school attended
Level 1: High school students
Level 2: College students
59. Cont.
We will be comparing 6 groups (3 levels of dosage × 2 levels
of stage of education).
The procedure by which we analyze the sums of squares
among the 6 groups based on 2 independent variables
(stage of education and dosage) is called factorial
ANOVA.
Thus we will organize the data for easy analysis.
61. State the null hypotheses
There is no significant difference in anxiety level between
levels of education.
There is no significant difference in anxiety level between
drug dosages.
There is no significant interaction between level of
education and drug dosage in their effect on
anxiety.
62. Next, we begin by calculating the sums of squares for stage
of education.
We organize the data set with stage of education in the
column headers:
63. High school students   College students
    3                      1
    4                      2
    5                      1
    3                      1
    4                      1
    3                      2
    5                      5
    4                      4
    3                      3
    5                      5
    5                      5
    5                      4
    7                      9
    8                      8
    7                      9
    8                      8
    7                      7
    7                      9
64. Next we calculate the dosage Sums of Squares
We reorder the data so that we can calculate sums of squares
for the dosage
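65. We will compute the between-group sums of squares for dosage
                              0 mg    50 mg   100 mg
                              3       5       7
                              4       4       8
                              5       3       7
                              3       5       8
                              4       5       7
                              3       5       7
                              1       5       9
                              2       4       8
                              1       3       9
                              1       5       8
                              1       5       7
                              2       4       9
Group mean                    2.5     4.42    7.83
Grand mean                    4.92    4.92    4.92
(mean - grand mean)²          5.86    0.25    8.47
n × (mean - grand mean)²      70.32   3       101.64
(n = 12 scores per dosage group)
SS(dosage) = 70.32 + 3 + 101.64 = 174.96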
66. Here is how we reorder the data to calculate the error
(within-groups sum of squares).
(Note that the table is split across the following
slides because of the amount of data being
calculated.)
67. Dosage   Stage of education   Score   Group mean   Deviation   Deviation²
0 mg     High school          3       3.6          -0.6        0.36
0 mg     High school          4       3.6          0.4         0.16
0 mg     High school          5       3.6          1.4         1.96
0 mg     High school          3       3.6          -0.6        0.36
0 mg     High school          4       3.6          0.4         0.16
0 mg     High school          3       3.6          -0.6        0.36
0 mg     College              1       1.3          -0.3        0.09
0 mg     College              2       1.3          0.7         0.49
0 mg     College              1       1.3          -0.3        0.09
0 mg     College              1       1.3          -0.3        0.09
68. Dosage   Stage of education   Score   Group mean   Deviation   Deviation²
0 mg     College              1       1.3          -0.3        0.09
0 mg     College              2       1.3          0.7         0.49
50 mg    High school          5       4.5          0.5         0.25
50 mg    High school          4       4.5          -0.5        0.25
50 mg    High school          3       4.5          -1.5        2.25
50 mg    High school          5       4.5          0.5         0.25
50 mg    High school          5       4.5          0.5         0.25
50 mg    High school          5       4.5          0.5         0.25
50 mg    College              5       4.3          0.7         0.49
50 mg    College              4       4.3          -0.3        0.09
50 mg    College              3       4.3          -1.3        1.69
69. Dosage   Stage of education   Score   Group mean   Deviation   Deviation²
50 mg    College              5       4.3          0.7         0.49
50 mg    College              5       4.3          0.7         0.49
50 mg    College              4       4.3          -0.3        0.09
100 mg   High school          7       7.3          -0.3        0.09
100 mg   High school          8       7.3          0.7         0.49
100 mg   High school          7       7.3          -0.3        0.09
100 mg   High school          8       7.3          0.7         0.49
100 mg   High school          7       7.3          -0.3        0.09
100 mg   High school          7       7.3          -0.3        0.09
100 mg   College              9       8.3          0.7         0.49
71. The sums of squares calculated so far
Dependent variable: anxiety
Source                          Sum of squares   df   MS   F   Sig. level
Level of education              2.25             1
Dosage                          174.96           2
Level of education × dosage
Error                           16.22            30
Total SS
72. Calculate the sum of squares
for the interaction effect
Here is a simple way to calculate the sum of squares
for the interaction between dosage and stage of
education.
We calculate the total sum of squares for the data
being analyzed and then subtract the other sums of
squares we calculated earlier, i.e.
SS for interaction effect = total SS - (SS for dosage + SS for level of education + SS for error)
73. So we calculate the total sum of squares to help us get the interaction effect.
Here is our data again:
76. To get the interaction effect (level of education × dosage):
Total SS = Σx² - (Σx)²/N = 1081 - 177²/36 = 1081 - 870.25 = 210.75
SS(interaction) = total SS - (SS error + SS dosage + SS level of education)
= 210.75 - 174.96 - 2.25 - 16.22 = 17.32
Source                          SS       df
Level of education              2.25     1
Dosage                          174.96   2
Level of education × dosage     17.32    2
Error                           16.22    30
Total                           210.75   35
77. To determine the degrees of freedom for error,
we take the number of subjects (36) and subtract the number of
subgroups (6): 36 - 6 = 30.
78. Source                      SS       df   MS      F
Level of education              2.25     1    2.25    4.17
Dosage                          174.96   2    87.48   162.00
Level of education × dosage     17.32    2    8.66    16.04
Error                           16.22    30   0.54
Total                           210.75   35
Significance level: 0.05. Critical values: F(1, 30) = 4.17; F(2, 30) = 3.32.
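The two-factor table for the anxiety example can be checked with a short standard-library script. This is an illustrative sketch for a balanced design (the function and variable names are hypothetical). It works from unrounded means, so its sums of squares can differ from hand calculations that round the group means. It obtains the interaction SS as the between-cells SS minus the two main-effect SS, which is algebraically the same as total SS - (dosage + education + error).

```python
# Anxiety scores from the example, keyed by (school, dosage); 6 participants per cell.
data = {
    ("High school", "0 mg"): [3, 4, 5, 3, 4, 3],
    ("High school", "50 mg"): [5, 4, 3, 5, 5, 5],
    ("High school", "100 mg"): [7, 8, 7, 8, 7, 7],
    ("College", "0 mg"): [1, 2, 1, 1, 1, 2],
    ("College", "50 mg"): [5, 4, 3, 5, 5, 4],
    ("College", "100 mg"): [9, 8, 9, 8, 7, 9],
}


def mean(xs):
    return sum(xs) / len(xs)


def two_way_anova(data):
    """SS, df, MS and F for a balanced two-factor design with one score list per cell."""
    schools = {s for s, _ in data}
    dosages = {d for _, d in data}
    scores = [x for cell in data.values() for x in cell]
    grand = mean(scores)

    def main_effect_ss(levels, position):
        # Weighted squared deviations of each level's marginal mean from the grand mean.
        ss = 0.0
        for level in levels:
            xs = [x for key, cell in data.items() if key[position] == level for x in cell]
            ss += len(xs) * (mean(xs) - grand) ** 2
        return ss

    ss_school = main_effect_ss(schools, 0)
    ss_dosage = main_effect_ss(dosages, 1)
    ss_cells = sum(len(cell) * (mean(cell) - grand) ** 2 for cell in data.values())
    ss_error = sum((x - mean(cell)) ** 2 for cell in data.values() for x in cell)
    ss_total = sum((x - grand) ** 2 for x in scores)
    # Equals ss_total - (ss_school + ss_dosage + ss_error), because ss_total = ss_cells + ss_error.
    ss_inter = ss_cells - ss_school - ss_dosage

    df_school, df_dosage = len(schools) - 1, len(dosages) - 1
    df_error = len(scores) - len(data)               # N - number of cells
    ms_error = ss_error / df_error
    table = {}
    for name, ss, df in (("School", ss_school, df_school),
                         ("Dosage", ss_dosage, df_dosage),
                         ("School x Dosage", ss_inter, df_school * df_dosage)):
        table[name] = (ss, df, ss / df, ss / df / ms_error)
    table["Error"] = (ss_error, df_error, ms_error, None)
    table["Total"] = (ss_total, len(scores) - 1, None, None)
    return table


for source, (ss, df, ms, f) in two_way_anova(data).items():
    ms_text = f"{ms:8.2f}" if ms is not None else ""
    f_text = f"{f:8.2f}" if f is not None else ""
    print(f"{source:<16}{ss:8.2f}{df:4d}{ms_text:>9}{f_text:>9}")
```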
79. If the F ratio is greater than the critical F value, we
reject the null hypothesis and conclude that the result
is statistically significant. If the F ratio is smaller than
the critical F value, we fail to reject the null
hypothesis.
80. • In this case we reject all three stated null hypotheses:
1. Main effect of level of education: "There is no significant
difference in anxiety level between levels of education" is rejected.
This result is borderline: F = 4.17 essentially equals the critical
value F(1, 30) = 4.17, and the exact p-value reported by SPSS (0.0499)
is only just below 0.05.
2. Main effect of dosage: "There is no significant difference in
anxiety level between drug dosages" is rejected.
3. Interaction effect between dosage and level of
education: "There is no significant interaction between
level of education and drug dosage in their effect on
anxiety" is rejected.
81.
Having rejected the three null hypotheses, the
researcher goes on to carry out post hoc tests using
Tukey's HSD, Bonferroni or Scheffe's procedure.
In this case you can use Tukey's honest significant
difference, following the formula and procedure
discussed in the previous slides. Post hoc comparisons are
needed only for dosage, because school has just two levels.
82. FACTORIAL ANALYSIS ON SPSS
LET'S HAVE A LOOK AT HOW FACTORIAL ANALYSIS IS DONE ON SPSS
USING OUR EXAMPLE.
HERE IS OUR QUESTION AGAIN.
Researchers wanted to test a new anti-anxiety medication. They measured the
anxiety of 36 participants on 3 different dosages of the medication: 0 mg, 50 mg
and 100 mg. Participants are also divided based on which school they are attending,
which researchers hypothesized will also affect anxiety levels. Anxiety is rated on a
scale of 1-10, with 10 being ‘high anxiety’ and 1 being ‘low anxiety’. Use a significance
level (alpha) of 0.05 to conduct your analysis.
83.
                        0 mg   50 mg   100 mg
High school students     3       5        7
                         4       4        8
                         5       3        7
                         3       5        8
                         4       5        7
                         3       5        7
College students         1       5        9
                         2       4        8
                         1       3        9
                         1       5        8
                         1       5        7
                         2       4        9
We have 2 independent variables (school and dosage) and 1 dependent variable
(anxiety level).
84. State the null hypotheses
There is no significant difference in anxiety level between
levels of education.
There is no significant difference in anxiety level between
drug dosages.
There is no significant interaction between level of
education and drug dosage in their effect on
anxiety.
85. Next we go to SPSS, starting with Variable View.
* Remember to code the variables properly.
90. Cont.
Our factorial ANOVA tested for 3 things: school, dosage
and the interaction between school and dosage.
School had an F value of 4.17 and a significance level of
0.04988,
so we can reject the null hypothesis and say
there is a difference between high school and college students
in terms of anxiety.
91. Cont.
For dosage, the significance level is reported as .000 (i.e. p < .001), so we
reject the null hypothesis: there is a significant difference
between the anxiety levels of students administered 0 mg,
50 mg and 100 mg.
For the interaction between school and dosage, we
also reject the null hypothesis: there is a significant
interaction between dosage and school.
92. SUMMARY OF FACTORIAL ANALYSIS ON SPSS
1) Select Analyze → General Linear Model →
2) Univariate →
3) Add the dependent variable and the fixed factor(s) →
4) Click on PLOTS →
5) Transfer the independent variables from Factors into
Horizontal Axis and Separate Lines →
6) Add →
7) Continue
93. FACTORIAL ANALYSIS ON SPSS
8) Click the POST HOC button to open Univariate: Post Hoc
Multiple Comparisons for Observed Means
9) Transfer the independent variable (IV) from Factor(s) to Post
Hoc Tests for, then select Tukey, Bonferroni or Scheffe
10) Click CONTINUE to return to the Univariate dialog
box.
11) Click OPTIONS
94. Cont.
12) Transfer the IV from the Factor(s) and Factor Interactions box
into the
Display Means for box → tick Descriptive statistics
13) Click CONTINUE
14) Click OK
*...End..*