The document discusses analysis of variance (ANOVA) which is used to compare the means of three or more groups. It explains that ANOVA avoids the problems of multiple t-tests by providing an omnibus test of differences between groups. The key steps of ANOVA are outlined, including partitioning variation between and within groups to calculate an F-ratio. A large F value indicates more difference between groups than expected by chance alone.
2. ANOVA:ANOVA:
Learning ObjectivesLearning Objectives
• Underlying methodological principles ofUnderlying methodological principles of
ANOVAANOVA
• Statistical principles: Partitioning ofStatistical principles: Partitioning of
variabilityvariability
• Summary table for one-way ANOVASummary table for one-way ANOVA
3. tt-Test vs ANOVA:-Test vs ANOVA:
The Case for Multiple GroupsThe Case for Multiple Groups
• tt-tests can be used to compare mean differences for two-tests can be used to compare mean differences for two
groupsgroups
• Between Subject DesignsBetween Subject Designs
• Within Subject DesignsWithin Subject Designs
• Paired Samples DesignsPaired Samples Designs
• The test allows us to make a judgment concerningThe test allows us to make a judgment concerning
whether or not the observed differences are likely to havewhether or not the observed differences are likely to have
occurred by chance.occurred by chance.
• Although helpful the t-test has its problemsAlthough helpful the t-test has its problems
4. Multiple GroupsMultiple Groups
• Two groups are often insufficientTwo groups are often insufficient
• No difference between Therapy X and Control:No difference between Therapy X and Control:
Are other therapies effective?Are other therapies effective?
• .05mmg/l Alcohol does not decrease memory.05mmg/l Alcohol does not decrease memory
ability relative to a control: What about otherability relative to a control: What about other
doses?doses?
• What if one control group is not enough?What if one control group is not enough?
5. Multiple Groups 2Multiple Groups 2
• With multiple groups we could make multipleWith multiple groups we could make multiple
comparisons using t-testscomparisons using t-tests
• ProblemProblem: We would expect some differences in: We would expect some differences in
means to be significant by chance alonemeans to be significant by chance alone
• How would we know which ones to trust?How would we know which ones to trust?
6. Comparing Multiple Groups:Comparing Multiple Groups:
TheThe ““One-WayOne-Way”” DesignDesign
• Independent variable (Independent variable (““factorfactor””):):
• Dependent variable (Dependent variable (““measurementmeasurement””):):
• Analysis: Variation between and within conditionsAnalysis: Variation between and within conditions
.......
Alcohol Level (Dose in mg/kg)
0 (Control) 10 20
7. Analysis of Variance (F test):Analysis of Variance (F test):
AdvantagesAdvantages
• Provides an omnibus test & avoids multiple t-tests andProvides an omnibus test & avoids multiple t-tests and
spurious significant resultsspurious significant results
• it is more stable since it relies on all of the data ...it is more stable since it relies on all of the data ...
• recall from your work on Std Error and T-testsrecall from your work on Std Error and T-tests
• the smaller the sample the lessthe smaller the sample the less
stable are the populationstable are the population
parameter estimatesparameter estimates..
8. What does ANOVA do?What does ANOVA do?
• Provides an F ratio that has an underlyingProvides an F ratio that has an underlying
distribution which we use to determine statisticaldistribution which we use to determine statistical
significance between groups (just like a t-test or asignificance between groups (just like a t-test or a
z-test)z-test)
• e.g., Take an experiment in which subjects aree.g., Take an experiment in which subjects are
randomly allocated to 3 groupsrandomly allocated to 3 groups
• The means and std deviations will all be different fromThe means and std deviations will all be different from
each othereach other
• We expect this because that is the nature of samplingWe expect this because that is the nature of sampling
(as you know!)(as you know!)
9. The question is …The question is …
are the groups more different thanare the groups more different than
we would expect by chance?we would expect by chance?
10. How does ANOVA work?How does ANOVA work?
• Instead of dealing with means as data points weInstead of dealing with means as data points we
deal with variationdeal with variation
• There is variation (variance) within groups (data)There is variation (variance) within groups (data)
• There is variance between group means (ExptlThere is variance between group means (Exptl
Effect)Effect)
• If groups are equivalent then the varianceIf groups are equivalent then the variance
between and within groups will be equal.between and within groups will be equal.
• Expected variation is used to calculate statisticalExpected variation is used to calculate statistical
significance in the same way that expectedsignificance in the same way that expected
differences in means are used in t-tests or z-testsdifferences in means are used in t-tests or z-tests
11. The basic ANOVA situationThe basic ANOVA situation
Two variables: 1 Categorical, 1 Quantitative
Main Question: Do the (means of) the quantitative
variables depend on which group (given by
categorical variable) the individual is in?
If categorical variable has only 2 values:
• 2-sample t-test
ANOVA allows for 3 or more groups
12. An example ANOVA situationAn example ANOVA situation
Subjects: 25 patients with blisters
Treatments: Treatment A, Treatment B, Placebo
Measurement: # of days until blisters heal
Data [and means]:
• A: 5,6,6,7,7,8,9,10 [7.25]
• B: 7,7,8,9,9,10,10,11 [8.875]
• P: 7,9,9,10,10,10,11,12,13 [10.11]
Are these differences significant?
13. Informal InvestigationInformal Investigation
Graphical investigation:
• side-by-side box plots
• multiple histograms
Whether the differences between the groups are
significant depends on
• the difference in the means
• the standard deviations of each group
• the sample sizes
ANOVA determines P-value from the F statistic
14. Side by Side BoxplotsSide by Side Boxplots
PBA
13
12
11
10
9
8
7
6
5
treatment
days
15. What does ANOVA do?What does ANOVA do?
At its simplest (there are extensions) ANOVA tests theAt its simplest (there are extensions) ANOVA tests the
following hypotheses:following hypotheses:
HH00: The means of all the groups are equal.: The means of all the groups are equal.
HHaa: Not all the means are equal: Not all the means are equal
doesndoesn’’t say how or which ones differ.t say how or which ones differ.
Can follow up withCan follow up with ““multiple comparisonsmultiple comparisons””
Note: we usually refer to the sub-populations asNote: we usually refer to the sub-populations as
““groupsgroups”” when doing ANOVA.when doing ANOVA.
16. Assumptions of ANOVAAssumptions of ANOVA
• each group is approximately normaleach group is approximately normal
• check this by looking at histograms or usecheck this by looking at histograms or use
assumptionsassumptions
• can handle some non-normality, but notcan handle some non-normality, but not
severe outlierssevere outliers
• standard deviations of each group arestandard deviations of each group are
approximately equalapproximately equal
• rule of thumb: ratio of largest to smallestrule of thumb: ratio of largest to smallest
sample st. dev. must be less than 2:1sample st. dev. must be less than 2:1
17. Normality CheckNormality Check
We should check for normality using:
• assumptions about population
• histograms for each group
With such small data sets, there really isn’t a
really good way to check normality from data,
but we make the common assumption that
physical measurements of people tend to be
normally distributed.
18. Notation for ANOVANotation for ANOVA
• n = number of individuals all together
• k = number of groups
• = mean for entire data set is
Group i has
• ni = # of individuals in group i
• xij = value for individual j in group i
• = mean for group i
• si = standard deviation for group i
ix
x
19. How ANOVA works (outline)How ANOVA works (outline)
ANOVA measures two sources of variation in the data and
compares their relative sizes
• variation BETWEEN groups
• for each data value look at the difference between
its group mean and the overall mean
• variation WITHIN groups
• for each data value we look at the difference
between that value and the mean of its group
20. The ANOVA F-statistic is a ratio of the
Between Group Variaton divided by the
Within Group Variation:
F =
Between
Within
=
MSB
MSW
A large F is evidence against H0, since it
indicates that there is more difference
between groups than within groups.
21. How are these computationsHow are these computations
made?made?
We want to measure the amount of variation due
to BETWEEN group variation and WITHIN group
variation
For each data value, we calculate its contribution
to:
• BETWEEN group variation:
• WITHIN group variation:
xi − x( )
2
2
)( iij xx −
22. An even smaller exampleAn even smaller example
Suppose we have three groups
• Group 1: 5.3, 6.0, 6.7
• Group 2: 5.5, 6.2, 6.4, 5.7
• Group 3: 7.5, 7.2, 7.9
We get the following statistics:
SUMMARY
Groups Count Sum Average Variance
Column1 3 18 6 0.49
Column2 4 23.8 5.95 0.176667
Column3 3 22.6 7.533333 0.123333
23. Analysis of Variance for days
Source DF SS MS F P
treatment 2 34.74 17.37 6.45 0.006
Error 22 59.26 2.69
Total 24 94.00
ANOVA OutputANOVA Output
1 less than # of
groups
# of data values - # of groups
(equals df for each group
added together)
1 less than # of individuals
(just like other situations)
24. ANOVA OutputANOVA Output
MSB = SSB / DFB
MSW = SSW / DFW
Analysis of Variance for days
Source DF SS MS F P
treatment 2 34.74 17.37 6.45 0.006
Error 22 59.26 2.69
Total 24 94.00
F = MSB / MSW
P-value
comes from
F(DFB,DFW)
(P-values for the F statistic are in Table E)
25. So How big is F?So How big is F?
Since F is
Mean Square Between / Mean Square Within
= MSB / MSW
A large value of F indicates relatively more
difference between groups than within groups
(evidence against H0)
To get the P-value, we compare to F(I-1,n-I)-distribution
• I-1 degrees of freedom in numerator (# groups -1)
• n - I degrees of freedom in denominator (rest of df)
26. WhereWhere’’s the Difference?s the Difference?
Analysis of Variance for days
Source DF SS MS F P
treatmen 2 34.74 17.37 6.45 0.006
Error 22 59.26 2.69
Total 24 94.00
Individual 95% CIs For Mean
Based on Pooled StDev
Level N Mean StDev ----------+---------+---------+------
A 8 7.250 1.669 (-------*-------)
B 8 8.875 1.458 (-------*-------)
P 9 10.111 1.764 (------*-------)
----------+---------+---------+------
Pooled StDev = 1.641 7.5 9.0 10.5
Once ANOVA indicates that the groups do not all
appear to have the same means, what do we do?
Clearest difference: P is worse than A (CI’s don’t overlap)