SlideShare a Scribd company logo
1 of 63
U N I V E R S I T Y O F S O U T H F L O R I D A //
1
Analysis of Variance
Dr. S. Shivendu
U N I V E R S I T Y O F S O U T H F L O R I D A // 2
Objectives
Analysis of Variance
Differentiate the ANOVA and ANCOVA
statistical methods.
01
Identify real-world situations where the
ANOVA and ANCOVA methods for statistical
inference are applied.
02
Visualize statistical data using SAS Studio
03
U N I V E R S I T Y O F S O U T H F L O R I D A // 3
Agenda
Analysis of Variance
Analysis of Variance
Concepts
ANOVA
Concepts
ANCOVA
Concepts
Non-Parametric ANOVA
Concepts
Data Visualization
SAS procedures
U N I V E R S I T Y O F S O U T H F L O R I D A // 4
ANOVA is short form of ANalysis Of VAriance
Used with 3 or more groups to test for MEAN DIFFS.
E.g., caffeine study with 3 groups:
No caffeine
Mild dose
Jolt group
Level is value, kind or amount of treatment
Treatment Group is people who get specific level treatment
Treatment Effect is size of difference in means
What is ANOVA?
ANOVA
The test you choose depends on level of measurement:
Independent Dependent Statistical Test
Dichotomous Interval-ratio Independent Samples t-test
Dichotomous
Nominal Nominal Cross Tabs
Dichotomous Dichotomous
Nominal Interval-ratio ANOVA
Dichotomous Dichotomous
Interval-ratio Interval-ratio Correlation and OLS Regression
Dichotomous
ANOVA
• Sometimes we want to know whether the mean level on one
continuous variable (such as income) is different for each group
relative to the others in a nominal variable (such as degree received).
• We could use descriptive statistics (the mean income) to compare the
groups (sociology BA vs. MA vs. PhD).
• However, as sociologists, we usually want to use a sample to
determine whether groups are different in the population.
ANOVA
• ANOVA is an inferential statistics technique that allows you to
compare the mean level on one interval-ratio variable (such as
income) for each group relative to the others in a nominal variable
(such as degree).
• If you had only two groups to compare, ANOVA would give the same
answer as an independent samples t-test.
ANOVA
What if three racial groups had incomes distributed like this in your
sample?
All Groups
Groups Broken Down
Isn’t it conceivable that the differences are due to natural random variability
between samples? Would you want to claim they are different in the
population?
Income in $Ks
Income in $Ks
ANOVA
Now…What if three racial groups had incomes distributed like this in
your sample?
All Groups Combined
Groups Separated Out
Doesn’t it now appear that the groups may be different regardless of sampling
variability? Would you feel comfortable claiming the groups are different in the
population?
Income in $Ks
Income in $Ks
ANOVA
• Conceptually, ANOVA compares the variance within groups to
the overall variance between all the groups to determine
whether the groups appear distinct from each other or if they
look quite the same.
Different groups, different means.
Y-bar Y-bar Y-bar
Similar groups, similar means.
Y-bars
Categories
of Nominal
Variable
Measures on
Continuous
Variable
3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 10 11 12 13 14 15 16
ANOVA
• When the groups have little variation within themselves, but large
variation between them, it would appear that they are distinct and
that their means are different.
Y-bar Y-bar Y-bar Y-bars
Different groups, different means. Similar groups, similar means.
ANOVA
• When the groups have a lot of variation within themselves, but little
variation between them, it would appear that they are similar and
that their means are not really different (perhaps they differ only
because of peculiarities of the particular sample).
Y-bar Y-bar Y-bar Y-bars
Similar groups, similar means.
Different groups, different means.
ANOVA
• Let’s call the the between groups variation: Between Variance: Between
Sum of Squares, BSS/df
• Let’s call the within groups variation:
Within Variance: Within Sum of Squares, WSS/df
• ANOVA compares Between Variance to Within Variance through a ratio we will
call F. F = BSS/g-1
WSS/n-g
Y-bar Y-bar Y-bar
ANOVA
• So what are these BSS and WSS things?
• Remember our friend “Variance?”
• What we are doing is separating out our dependent variable’s overall
variance for all groups into that which is attributable to the deviations
of groups’ means from the overall mean (B) and deviations of
individuals’ scores from their own group’s mean (W).
ANOVA
Variance:
Deviation Yi – Y-bar
Squared Deviation (Yi – Y-bar)2
Sum of Squares Σ(Yi – Y-bar)2
Variance Σ(Yi – Y-bar)2
n – 1
ANOVA
F = BSS/g-1
WSS/n-g
As WSS gets larger, F gets smaller.
As WSS gets smaller, F gets larger.
So, as F gets smaller, the groups are less distinct. As F gets larger, the
groups are more distinct.
Y-bar Y-bar Y-bar
Y-bar Y-bar Y-bar
ANOVA
• In repeated sampling, if there were no group differences, that ratio
“F” would be distributed in a particular way.
Distribution of “F” over repeated sampling,
recording a new F every time.
ANOVA
• The F distribution is like the normal curve, t distribution, and chi-squared distribution: we know
that there is a critical F that demarcates the value beyond which the values of the rarest 5% of F’s
will fall.
Most extreme
5% of F’s
Critical F
What if your sample’s F
were this large?
ANOVA
• One other thing to note is that, like chi-squared, there are different F
distributions depending on the degrees of freedom (df) of the BSS and the WSS.
E.g., F(g-1, n-g).
• BSS df: # of groups in your nominal variable minus 1
• WSS df: # of cases in sample minus number of groups
Most extreme
5% of F’s
Critical F
ANOVA
Look what I found on the web. F distribution changes shape
depending on your sample size and the number of groups you
are comparing:
ANOVA
Null: Group means are equal; 1 = 2 = …= g
Alternative: At least one group’s mean is different
So… if F is larger than the critical F:
Reject the Null in favor of the alternative!
ANOVA
• If your F is in the most extreme 5% of F’s that could occur by chance if
your two variables were unrelated, you have good evidence that your
F might not have come from a population where the means for each
group are equal.
• So, essentially, ANOVA uses your sample to tell you whether, in the
population, you have overlapping group distributions (no difference
between means) or fairly distinct group distributions (differences
between means).
ANOVA
Question: We know that at least one mean is different from another.
We don’t know which one. Now what?
Answer: We do a post-hoc test to see which means are different from
each other.
A post-hoc test (“after the fact test”) is a series of independent samples
t-tests comparing each group’s mean to each of the others’ means.
ANOVA
Conducting the t-tests, you will have g(g-1)/2 pairs of t-tests, where g is
number of groups
If there are 4 groups, then we will have 4 (4-1)/2 or 6 comparisons.
But before you go off doing all these t-tests, remember that with
statistics we play the odds.
Answer:
5
ANOVA
Because of the likelihood of multiple comparison errors, statisticians have created
ways to reduce the multiple comparison error rate.
One of these is the Bonferroni, which adjusts the -level for each comparison by
the number of comparisons. This lowers the likelihood of rejection in each test,
making the joint -level equal to the original -level.
In our example, .05/6 = .008. So -level for each comparison becomes .008. The
combined likelihood of a type 1 error will be .05.
Test
When do we use post hoc tests?
• a. after a significant overall F test
• b. after a nonsignificant overall F test
• c. in place of an overall F test
• d. when we want to determine the impact of different factors
But before we move forward-
• Assumptions of ANOVA
• There are three primary assumptions in ANOVA:
• The responses for each factor level have a normal population distribution.
• These distributions have the same variance.
• The data are independent.
Two-way ANOVA
Main Effects
• Marginal mean of level of factor A: The mean of the level of factor A
across all levels of factor B.
• The main effects of factor A refer to how the marginal means of levels
of factor A change as the level of A change
• In the absence of interactions, the main effects have a straightforward
interpretation: What happens to the mean as we change the level of
factor A and keep the level of factor B fixed.
Interactions
• There is an interaction between A and B if the difference in means for
the different levels of factor A changes as the level of factor B
changes.
• If there are interactions, the main effects no longer have a clear
interpretation. Need to examine the means of all combinations of
levels of A and B (e.g., by using an interaction plot).
0
2
4
6
8
10
12
14
Low B High B
Mean
Response
Low A
High A
0
2
4
6
8
10
12
14
Low B High B
Mean
Response
Low A
High A
F tests for the Two-way ANOVA
• Test for the difference between the levels of the
main factors A and B
F=
MS(A)
MSE
F=
MS(B)
MSE
Rejection region: F > F,a-1 ,n-ab F > F, b-1, n-ab
• Test for interaction between factors A and
B F=
MS(AB)
MSE
Rejection region: F > F,(a-1)(b-1),n-ab
SS(A)/(a-1) SS(B)/(b-1)
SS(AB)/(a-1)(b-1)
SSE/(n-ab)
Required conditions:
1. The response distributions are normal
2. The treatment variances are equal.
3. The samples are independent simple random samples.
Note: There are a*b populations (and samples), one for each
combination of levels of factor A and B.
• Example 15.3 – continued( Xm15-03)
F tests for the Two-way ANOVA
Convenience Quality Price
TV 491 677 575
TV 712 627 614
TV 558 590 706
TV 447 632 484
TV 479 683 478
TV 624 760 650
TV 546 690 583
TV 444 548 536
TV 582 579 579
TV 672 644 795
Newspaper 464 689 803
Newspaper 559 650 584
Newspaper 759 704 525
Newspaper 557 652 498
Newspaper 528 576 812
Newspaper 670 836 565
Newspaper 534 628 708
Newspaper 657 798 546
Newspaper 557 497 616
Newspaper 474 841 587
• Example 15.3 – continued
• Test of the difference in mean sales between the three marketing
strategies
H0: conv. = quality = price
H1: At least two mean sales are different
F tests for the Two-way ANOVA
ANOVA
Source of Variation SS df MS F P-value F crit
Medium 13172.0 1 13172.0 1.42 0.2387 4.02
Strategy 98838.6 2 49419.3 5.33 0.0077 3.17
Interaction 1609.6 2 804.8 0.09 0.9171 3.17
Error 501136.7 54 9280.3
Total 614757.0 59
Factor A Marketing strategies
• Example 15.3 – continued
• Test of the difference in mean sales between the three
marketing strategies
H0: conv. = quality = price
H1: At least two mean sales are different
F = MS(Marketing strategy)/MSE = 5.33
Fcritical = F,a-1,n-ab = F.05,3-1,60-(3)(2) = 3.17; (p-value = .0077)
• At 5% significance level there is evidence to infer that
differences in weekly sales exist among the marketing
strategies.
F tests for the Two-way ANOVA
MS(A)/MSE
• Example 15.3 - continued
• Test of the difference in mean sales between the two
advertising media
H0: TV. = Nespaper
H1: The two mean sales differ
F tests for the Two-way ANOVA
Factor B = Advertising media
ANOVA
Source of Variation SS df MS F P-value F crit
Medium 13172.0 1 13172.0 1.42 0.2387 4.02
Strategy 98838.6 2 49419.3 5.33 0.0077 3.17
Interaction 1609.6 2 804.8 0.09 0.9171 3.17
Error 501136.7 54 9280.3
Total 614757.0 59
• Example 15.3 - continued
• Test of the difference in mean sales between the two
advertising media
H0: TV. = Nespaper
H1: The two mean sales differ
F = MS(Media)/MSE = 1.42
Fcritical = F,a-1,n-ab = F.05,2-1,60-(3)(2) = 4.02 (p-value = .2387)
• At 5% significance level there is insufficient evidence to
infer that differences in weekly sales exist between the
two advertising media.
F tests for the Two-way ANOVA
MS(B)/MSE
• Example 15.3 - continued
• Test for interaction between factors A and B
H0: TV*conv. = TV*quality =…=newsp.*price
H1: At least two means differ
F tests for the Two-way ANOVA
Interaction AB = Marketing*Media
ANOVA
Source of Variation SS df MS F P-value F crit
Medium 13172.0 1 13172.0 1.42 0.2387 4.02
Stategy 98838.6 2 49419.3 5.33 0.0077 3.17
Interaction 1609.6 2 804.8 0.09 0.9171 3.17
Error 501136.7 54 9280.3
Total 614757.0 59
• Example 15.3 - continued
• Test for interaction between factor A and B
H0: TV*conv. = TV*quality =…=newsp.*price
H1: At least two means differ
F = MS(Marketing*Media)/MSE = .09
Fcritical = F,(a-1)(b-1),n-ab = F.05,(3-1)(2-1),60-(3)(2) = 3.17 (p-value= .9171)
• At 5% significance level there is insufficient evidence to
infer that the two factors interact to affect the mean
weekly sales.
MS(AB)/MSE
F tests for the Two-way ANOVA
23-41
Analysis of Covariance
• A procedure for comparing treatment means that incorporates
information on a quantitative explanatory variable, X, sometimes called
a covariate.
• The procedure, ANCOVA, is a combination of ANOVA with regression.
23-42
Example: Calf Weight Gain
• An animal scientist wishes to examine the impact of a pair of new dietary
supplements on calf weight gain (response).
• Three treatments are defined: standard diet, standard diet + supplement Q, and
standard diet + supplement R.
• All new calves from a large herd are available for use as study units. She selects 30
calves for study. Calves are randomized to the three diets at random (completely
randomized design).
• Initial weights are recorded, then calves are placed on the diets. At the end of
four weeks the final weight is taken and weight gain is computed.
• Simple analysis of variance and associated multiple comparisons procedures
indicate no significant differences in weight gain between the two supplementary
diets, but big differences between the supplemental diets and the standard diet.
• Is this the end of the story? …
23-43
ANOVA Results
Average Weight Gain
(Response g/day)
Standard
Diet
+ Supplement Q
+ Supplement R
xx
x xx x x xx
xxx
xxx x xx
xx x
x x xx x x x
Simple ANOVA of a one-way classification would suggest no difference
between Supplements Q and R but both different from Standard diet.
23-44
Initial Weights
Initial Weight
Standard
Diet
+ Supplement Q
+ Supplement R
xx x
x xx x x xx
xx
x xxxx x xx
x x x
xx xx x x x
Plotting of the initial weights by group shows that the groups were not equal
when it came to initial weights.
23-45
Weight Gain to Initial
Weight
Standard Diet
Weight (kg)
wF
1
age
wi
1
wi
2
wgain
1
wF
2
2
gain
w
If animals come into the study at different ages, they have different
initial weights and are at different points on the growth curve.
Expected weight gains will be different depending on age at entry
into study.
23-46
Regression of Initial Weight to
Weight Gain
Initial
Weight
(x)
Weight
Gain
(g/day)
(Y)
wi
1
wi
2
wgain
1
wgain
2
If we disregard the age of
the animal but instead
focus on the initial weight,
we see that there is a
linear relationship between
initial weight and the
weight gain expected.
23-47
Covariates
Initial weight in the previous example is a covariable or covariate.
A covariate is a disturbing variable (confounder), that is, it is known to have an
effect on the response. Usually, the covariate can be measured but often we may
not be able to control its effect through blocking.
In the EXAMPLE, had the animal scientist known that the calves were very
variable in initial weight (or age), she could have:
• Created blocks of 3 or 6 equal weight animals, and randomized
treatments to calves within these blocks.
• This would have entailed some cost in terms of time spent sorting the
calves and then keeping track of block membership over the life of the
study.
• It was much easier to simply record the calf initial weight and then use
analysis of covariance for the final analysis.
• In many cases, due to the continuous nature of the covariate, blocking is
just not feasible.
23-48
Expectations under Ho
Initial
Weight
(x)
Expected
Weight
Gain
(g/day)
(Y)
Under Ho: no treatment
effects.
If all animals had come in with the same initial
weight, All three treatments would produce the
same weight gain.
Average Weight Animal
23-49
Expectations under HA
Initial
Weight
(x)
Expected
Weight
Gain
(g/day)
(Y)
Standard Diet (c)
+ Supplement Q (q)
+ Supplement R (r)
Under Ha: Significant
Treatment
effects
Average Weight Animal
WGs
WGQ
WGR
Different treatments
produce different weight
gains for animals of the
same initial weight.
23-50
Different Initial Weights
Initial
Weight
(x)
Expected
Weight
Gain
(g/day)
(Y)
cc c
c cc c c cc
qq
q qqqq q qq
r r r
rr rr r r r
Under Ho: no treatment
effects.
If the average initial weights in the treatment
groups differ, the observed weight gains will
be different, even if treatments have no
effect.
WGs
WGQ
WGR
23-51
Observed Responses under HA
Initial
Weight
(x)
Weight
Gain
(g/day)
(Y)
cc c
c cc c c cc
qq
q qqqq q qq
r r r
rr rr r r r
q
q
q
q
q
q
q
q
q
q
c
c
c
c
c
c
c
c
c
c
r
r
r
r
r
r
r
r
r
r Standard Diet
+ Supplement Q
+ Supplement R
Under HA: Significant
Treatment
effects
Suppose now that different supplements actually do increase weight gain.
This translates to animals in different treatment groups following different, but
parallel regression lines with initial weight.
WGR
WGs
WGQ
What difference in weight gain is due to Initial weight and what is due to Treatment?
23-52
Observed Group
Means
Initial
Weight
(x)
Weight
Gain
(g/day)
(Y)
cc c
c cc c c cc
qq
q qqqq q qq
r r r
rr rr r r r
q
q
q
q
q
q
q
q
q
q
c
c
c
c
c
c
c
c
c
c
r
r
r
r
r
r
r
r
r
r
Standard Diet
+ Supplement Q
+ Supplement R
yc
yr
yq
Unadjusted
treatment means
Simple one-way classification ANOVA (without
accounting for initial weight) gives us the
wrong answer!
23-53
Predicted Average
Responses
Initial
Weight
(x)
Weight
Gain
(g/day)
(Y)
cc c
c cc c c cc
qq
q qqqq q qq
r r r
rr rr r r r
q
q
q
q
q
q
q
q
q
q
c
c
c
c
c
c
c
c
c
c
r
r
r
r
r
r
r
r
r
r
Standard Diet
+ Supplement Q
+ Supplement R
 |
yq X x

 |
yc X x

 |
yr X x

X x

Expected weight gain is computed
for treatments for the average initial
weight and comparisons are then
made.
Adjusted treatment
means
23-54
ANCOVA: Objectives
The objective of an analysis of covariance is to
compare the treatment means after adjusting for
differences among the treatments due to
differences in the covariate levels for the
treatments groups.
The analysis proceeds by combining a regression
model with an analysis of variance model.
23-55
Model
ij i ij
E(y ) x
= m+ a + b
The i, i=1,…,t, are estimates of how each of the t treatments modifies the
overall mean response. (The index j=1,…,n, runs over the n replicates
for each treatment.)
The slope coefficient, , is a measure of how the average response changes
as the value of the covariate changes.
The analysis proceeds by fitting a linear regression model with dummy
variables to code for the different treatment levels.
23-56
A Priori Assumptions
The covariate is related to the response, and can account for variation
in the response.
Check with a scatterplot of Y vs. X.
The covariate is NOT related to the treatments.
If Y is related to X, then the variance of the treatment differences is
increased relative to that obtained from an ANOVA model
without X, which results in a loss of precision.
The treatment’s regression equations are linear in the
covariate.
Check with a scatterplot of Y vs. X, for each treatment. Non-linearity
can be accommodated (e.g. polynomial terms, transforms), but
analysis may be more complex.
The regression lines for the different treatments are parallel.
This means there is only one slope in the Y vs. X plots. Non-parallel
lines can be accommodated, but this complicates the analysis
since differences in treatments will now depend on the value of
23-57
Example
Four different formulations of an industrial glue are
being tested. The tensile strength (response) of the glue
is known to be related to the thickness as applied. Five
observations on strength (Y) in pounds, and thickness
(X) in 0.01 inches are made for each formulation.
Here:
• There are t=4 treatments (formulations of glue).
• Covariate X is thickness of applied glue.
• Each treatment is replicated n=5 times at different
values of X.
Formulation Strength Thickness
1 46.5 13
1 45.9 14
1 49.8 12
1 46.1 12
1 44.3 14
2 48.7 12
2 49.0 10
2 50.1 11
2 48.5 12
2 45.2 14
3 46.3 15
3 47.1 14
3 48.9 11
3 48.2 11
3 50.3 10
4 44.7 16
4 43.0 15
4 51.0 10
4 48.1 12
4 46.8 11
23-58
Formulation Profiles
40.0
44.0
48.0
52.0
16 15 10 12 11
Thickness (X)
Strength
(Y)
Form_1 Form_2 Form_3 Form_4
23-59
SAS Program data glue;
input Formulation Strength Thickness;
datalines;
1 46.5 13
1 45.9 14
1 49.8 12
1 46.1 12
1 44.3 14
2 48.7 12
2 49.0 10
2 50.1 11
2 48.5 12
2 45.2 14
3 46.3 15
3 47.1 14
3 48.9 11
3 48.2 11
3 50.3 10
4 44.7 16
4 43.0 15
4 51.0 10
4 48.1 12
4 46.8 11
;
run;
proc glm;
class formulation;
model strength = thickness formulation
/ solution ;
lsmeans formulation / stderr pdiff;
run;
The basic model is a combination of
regression and one-way classification.
U N I V E R S I T Y O F S O U T H F L O R I D A // 60
23-60
Output: Use Type III SS to test significance of each variable
Source DF Squares Mean Square F Value Pr > F
Model 4 66.31065753 16.57766438 10.17 0.0003
Error 15 24.44684247 1.62978950
Corrected Total 19 90.75750000
R-Square Coeff Var Root MSE Strength Mean
0.730636 2.691897 1.276632 47.42500
Source DF Type I SS Mean Square F Value Pr > F
Thickness 1 63.50120135 63.50120135 38.96 <.0001
Formulation 3 2.80945618 0.93648539 0.57 0.6405
Source DF Type III SS Mean Square F Value Pr > F
Thickness 1 53.20115753 53.20115753 32.64 <.0001
Formulation 3 2.80945618 0.93648539 0.57 0.6405
Standard
Parameter Estimate Error t Value Pr > |t|
Intercept 58.93698630 B 2.21321008 26.63 <.0001
Thickness -0.95445205 0.16705494 -5.71 <.0001
Formulation 1 -0.00910959 B 0.80810401 -0.01 0.9912
Formulation 2 0.62554795 B 0.82451389 0.76 0.4598
Formulation 3 0.86732877 B 0.81361075 1.07 0.3033
Formulation 4 0.00000000 B . . .
Regression on
thickness is
significant.
No formulation
differences.
Divide by MSE
to get mean
squares.
MSE
U N I V E R S I T Y O F S O U T H F L O R I D A // 61
23-61
Least Squares Means
(Adjusted Formulation means
computed at the average value of
Thickness [=12.45])
The GLM Procedure
Least Squares Means
Strength Standard LSMEAN
Formulation LSMEAN Error Pr > |t| Number
1 47.0449486 0.5782732 <.0001 1
2 47.6796062 0.5811616 <.0001 2
3 47.9213870 0.5724527 <.0001 3
4 47.0540582 0.5739134 <.0001 4
Least Squares Means for effect Formulation
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: Strength
i/j 1 2 3 4
1 0.4574 0.3011 0.9912
2 0.4574 0.7695 0.4598
3 0.3011 0.7695 0.3033
4 0.9912 0.4598 0.3033
U N I V E R S I T Y O F S O U T H F L O R I D A // 62
1. ANOVA is a useful statistical method, especially when one can randomize treatment.
2. If in experimental design, one suspects covariates that impact response variable, then one should use ANCOVA
3. Unlike regression analysis, though assumptions for ANOVA or ANCOVA are important, statistical inference is robust even if assumptions only mildly hold.
Key Takeaway
U N I V E R S I T Y O F S O U T H F L O R I D A // 63
Next Class
CLASS 6: Mar.
16
Class 6_Stats_Module Categorical Data and Statistical Analysis
 Categorical input and categorical output
 χ 2 test
 Categorical inputs with continuous outputs: ANOVA
 Kruskal-Wallis test
Quiz 6: Based on Class 6 Readings
Class 6_SAS_Module Statistical Analysis I
 Chap 9 (part) of LBS
 SAS Assignment 6 posted: Due before class 7

More Related Content

Similar to Analysis of Variance

Statistics for Anaesthesiologists
Statistics for AnaesthesiologistsStatistics for Anaesthesiologists
Statistics for Anaesthesiologistsxeonfusion
 
Statistical Tools
Statistical ToolsStatistical Tools
Statistical ToolsZyrenMisaki
 
12-5-03.ppt
12-5-03.ppt12-5-03.ppt
12-5-03.pptMUzair21
 
(Individuals With Disabilities Act Transformation Over the Years)D
(Individuals With Disabilities Act Transformation Over the Years)D(Individuals With Disabilities Act Transformation Over the Years)D
(Individuals With Disabilities Act Transformation Over the Years)DSilvaGraf83
 
(Individuals With Disabilities Act Transformation Over the Years)D
(Individuals With Disabilities Act Transformation Over the Years)D(Individuals With Disabilities Act Transformation Over the Years)D
(Individuals With Disabilities Act Transformation Over the Years)DMoseStaton39
 
Anova - One way and two way
Anova - One way and two wayAnova - One way and two way
Anova - One way and two wayAbarnaPeriasamy3
 
Anova lecture
Anova lectureAnova lecture
Anova lecturedoublem44
 
One Way Anova
One Way AnovaOne Way Anova
One Way Anovashoffma5
 
Parametric & non-parametric
Parametric & non-parametricParametric & non-parametric
Parametric & non-parametricSoniaBabaee
 
Statistical inference: Statistical Power, ANOVA, and Post Hoc tests
Statistical inference: Statistical Power, ANOVA, and Post Hoc testsStatistical inference: Statistical Power, ANOVA, and Post Hoc tests
Statistical inference: Statistical Power, ANOVA, and Post Hoc testsEugene Yan Ziyou
 
Assessment 4 ContextRecall that null hypothesis tests are of.docx
Assessment 4 ContextRecall that null hypothesis tests are of.docxAssessment 4 ContextRecall that null hypothesis tests are of.docx
Assessment 4 ContextRecall that null hypothesis tests are of.docxgalerussel59292
 
One way repeated measure anova
One way repeated measure anovaOne way repeated measure anova
One way repeated measure anovaAamna Haneef
 
Assessment 4 ContextRecall that null hypothesis tests are of.docx
Assessment 4 ContextRecall that null hypothesis tests are of.docxAssessment 4 ContextRecall that null hypothesis tests are of.docx
Assessment 4 ContextRecall that null hypothesis tests are of.docxfestockton
 
HomeworkPartnership Tax Year and Limited Liability Partnerships.docx
HomeworkPartnership Tax Year and Limited Liability Partnerships.docxHomeworkPartnership Tax Year and Limited Liability Partnerships.docx
HomeworkPartnership Tax Year and Limited Liability Partnerships.docxadampcarr67227
 

Similar to Analysis of Variance (20)

Statistics for Anaesthesiologists
Statistics for AnaesthesiologistsStatistics for Anaesthesiologists
Statistics for Anaesthesiologists
 
Statistical Tools
Statistical ToolsStatistical Tools
Statistical Tools
 
12-5-03.ppt
12-5-03.ppt12-5-03.ppt
12-5-03.ppt
 
12-5-03.ppt
12-5-03.ppt12-5-03.ppt
12-5-03.ppt
 
(Individuals With Disabilities Act Transformation Over the Years)D
(Individuals With Disabilities Act Transformation Over the Years)D(Individuals With Disabilities Act Transformation Over the Years)D
(Individuals With Disabilities Act Transformation Over the Years)D
 
(Individuals With Disabilities Act Transformation Over the Years)D
(Individuals With Disabilities Act Transformation Over the Years)D(Individuals With Disabilities Act Transformation Over the Years)D
(Individuals With Disabilities Act Transformation Over the Years)D
 
Shovan anova main
Shovan anova mainShovan anova main
Shovan anova main
 
Anova - One way and two way
Anova - One way and two wayAnova - One way and two way
Anova - One way and two way
 
Analysis of Variance
Analysis of VarianceAnalysis of Variance
Analysis of Variance
 
Anova test
Anova testAnova test
Anova test
 
Anova lecture
Anova lectureAnova lecture
Anova lecture
 
One Way Anova
One Way AnovaOne Way Anova
One Way Anova
 
ANOVA II
ANOVA IIANOVA II
ANOVA II
 
Parametric & non-parametric
Parametric & non-parametricParametric & non-parametric
Parametric & non-parametric
 
Statistical inference: Statistical Power, ANOVA, and Post Hoc tests
Statistical inference: Statistical Power, ANOVA, and Post Hoc testsStatistical inference: Statistical Power, ANOVA, and Post Hoc tests
Statistical inference: Statistical Power, ANOVA, and Post Hoc tests
 
Assessment 4 ContextRecall that null hypothesis tests are of.docx
Assessment 4 ContextRecall that null hypothesis tests are of.docxAssessment 4 ContextRecall that null hypothesis tests are of.docx
Assessment 4 ContextRecall that null hypothesis tests are of.docx
 
One way repeated measure anova
One way repeated measure anovaOne way repeated measure anova
One way repeated measure anova
 
Bus 173_4.pptx
Bus 173_4.pptxBus 173_4.pptx
Bus 173_4.pptx
 
Assessment 4 ContextRecall that null hypothesis tests are of.docx
Assessment 4 ContextRecall that null hypothesis tests are of.docxAssessment 4 ContextRecall that null hypothesis tests are of.docx
Assessment 4 ContextRecall that null hypothesis tests are of.docx
 
HomeworkPartnership Tax Year and Limited Liability Partnerships.docx
HomeworkPartnership Tax Year and Limited Liability Partnerships.docxHomeworkPartnership Tax Year and Limited Liability Partnerships.docx
HomeworkPartnership Tax Year and Limited Liability Partnerships.docx
 

More from Michael770443

Discrete Choice Model - Part 2
Discrete Choice Model - Part 2Discrete Choice Model - Part 2
Discrete Choice Model - Part 2Michael770443
 
Discrete Choice Model
Discrete Choice ModelDiscrete Choice Model
Discrete Choice ModelMichael770443
 
Categorical Data and Statistical Analysis
Categorical Data and Statistical AnalysisCategorical Data and Statistical Analysis
Categorical Data and Statistical AnalysisMichael770443
 
Segmentation: Clustering and Classification
Segmentation: Clustering and ClassificationSegmentation: Clustering and Classification
Segmentation: Clustering and ClassificationMichael770443
 
Introduction to Statistical Methods
Introduction to Statistical MethodsIntroduction to Statistical Methods
Introduction to Statistical MethodsMichael770443
 
Overview of Statistical Concepts
Overview of Statistical ConceptsOverview of Statistical Concepts
Overview of Statistical ConceptsMichael770443
 

More from Michael770443 (9)

Discrete Choice Model - Part 2
Discrete Choice Model - Part 2Discrete Choice Model - Part 2
Discrete Choice Model - Part 2
 
Discrete Choice Model
Discrete Choice ModelDiscrete Choice Model
Discrete Choice Model
 
Categorical Data and Statistical Analysis
Categorical Data and Statistical AnalysisCategorical Data and Statistical Analysis
Categorical Data and Statistical Analysis
 
Classification
ClassificationClassification
Classification
 
Segmentation: Clustering and Classification
Segmentation: Clustering and ClassificationSegmentation: Clustering and Classification
Segmentation: Clustering and Classification
 
Regression Analysis
Regression AnalysisRegression Analysis
Regression Analysis
 
Linear Regression
Linear RegressionLinear Regression
Linear Regression
 
Introduction to Statistical Methods
Introduction to Statistical MethodsIntroduction to Statistical Methods
Introduction to Statistical Methods
 
Overview of Statistical Concepts
Overview of Statistical ConceptsOverview of Statistical Concepts
Overview of Statistical Concepts
 

Recently uploaded

Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.arsicmarija21
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 

Recently uploaded (20)

Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 

Analysis of Variance

  • 1. U N I V E R S I T Y O F S O U T H F L O R I D A // 1 Analysis of Variance Dr. S. Shivendu
  • 2. U N I V E R S I T Y O F S O U T H F L O R I D A // 2 Objectives Analysis of Variance Differentiate the ANOVA and ANCOVA statistical methods. 01 Identify real-world situations where the ANOVA and ANCOVA methods for statistical inference are applied. 02 Visualize statistical data using SAS Studio 03
  • 3. U N I V E R S I T Y O F S O U T H F L O R I D A // 3 Agenda Analysis of Variance Analysis of Variance Concepts ANOVA Concepts ANCOVA Concepts Non-Parametric ANOVA Concepts Data Visualization SAS procedures
  • 4. U N I V E R S I T Y O F S O U T H F L O R I D A // 4 ANOVA is short form of ANalysis Of VAriance Used with 3 or more groups to test for MEAN DIFFS. E.g., caffeine study with 3 groups: No caffeine Mild dose Jolt group Level is value, kind or amount of treatment Treatment Group is people who get specific level treatment Treatment Effect is size of difference in means What is ANOVA?
  • 5. ANOVA The test you choose depends on level of measurement: Independent Dependent Statistical Test Dichotomous Interval-ratio Independent Samples t-test Dichotomous Nominal Nominal Cross Tabs Dichotomous Dichotomous Nominal Interval-ratio ANOVA Dichotomous Dichotomous Interval-ratio Interval-ratio Correlation and OLS Regression Dichotomous
  • 6. ANOVA • Sometimes we want to know whether the mean level on one continuous variable (such as income) is different for each group relative to the others in a nominal variable (such as degree received). • We could use descriptive statistics (the mean income) to compare the groups (sociology BA vs. MA vs. PhD). • However, as sociologists, we usually want to use a sample to determine whether groups are different in the population.
  • 7. ANOVA • ANOVA is an inferential statistics technique that allows you to compare the mean level on one interval-ratio variable (such as income) for each group relative to the others in a nominal variable (such as degree). • If you had only two groups to compare, ANOVA would give the same answer as an independent samples t-test.
  • 8. ANOVA What if three racial groups had incomes distributed like this in your sample? All Groups Groups Broken Down Isn’t it conceivable that the differences are due to natural random variability between samples? Would you want to claim they are different in the population? Income in $Ks Income in $Ks
  • 9. ANOVA Now…What if three racial groups had incomes distributed like this in your sample? All Groups Combined Groups Separated Out Doesn’t it now appear that the groups may be different regardless of sampling variability? Would you feel comfortable claiming the groups are different in the population? Income in $Ks Income in $Ks
  • 10. ANOVA • Conceptually, ANOVA compares the variance within groups to the overall variance between all the groups to determine whether the groups appear distinct from each other or if they look quite the same. Different groups, different means. Y-bar Y-bar Y-bar Similar groups, similar means. Y-bars Categories of Nominal Variable Measures on Continuous Variable 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 10 11 12 13 14 15 16
  • 11. ANOVA • When the groups have little variation within themselves, but large variation between them, it would appear that they are distinct and that their means are different. Y-bar Y-bar Y-bar Y-bars Different groups, different means. Similar groups, similar means.
  • 12. ANOVA • When the groups have a lot of variation within themselves, but little variation between them, it would appear that they are similar and that their means are not really different (perhaps they differ only because of peculiarities of the particular sample). Y-bar Y-bar Y-bar Y-bars Similar groups, similar means. Different groups, different means.
  • 13. ANOVA • Let’s call the the between groups variation: Between Variance: Between Sum of Squares, BSS/df • Let’s call the within groups variation: Within Variance: Within Sum of Squares, WSS/df • ANOVA compares Between Variance to Within Variance through a ratio we will call F. F = BSS/g-1 WSS/n-g Y-bar Y-bar Y-bar
  • 14. ANOVA • So what are these BSS and WSS things? • Remember our friend “Variance?” • What we are doing is separating out our dependent variable’s overall variance for all groups into that which is attributable to the deviations of groups’ means from the overall mean (B) and deviations of individuals’ scores from their own group’s mean (W).
  • 15. ANOVA Variance: Deviation Yi – Y-bar Squared Deviation (Yi – Y-bar)2 Sum of Squares Σ(Yi – Y-bar)2 Variance Σ(Yi – Y-bar)2 n – 1
  • 16. ANOVA F = BSS/g-1 WSS/n-g As WSS gets larger, F gets smaller. As WSS gets smaller, F gets larger. So, as F gets smaller, the groups are less distinct. As F gets larger, the groups are more distinct. Y-bar Y-bar Y-bar Y-bar Y-bar Y-bar
  • 17. ANOVA • In repeated sampling, if there were no group differences, that ratio “F” would be distributed in a particular way. Distribution of “F” over repeated sampling, recording a new F every time.
  • 18. ANOVA • The F distribution is like the normal curve, t distribution, and chi-squared distribution: we know that there is a critical F that demarcates the value beyond which the values of the rarest 5% of F’s will fall. Most extreme 5% of F’s Critical F What if your sample’s F were this large?
  • 19. ANOVA • One other thing to note is that, like chi-squared, there are different F distributions depending on the degrees of freedom (df) of the BSS and the WSS. E.g., F(g-1, n-g). • BSS df: # of groups in your nominal variable minus 1 • WSS df: # of cases in sample minus number of groups Most extreme 5% of F’s Critical F
  • 20. ANOVA Look what I found on the web. F distribution changes shape depending on your sample size and the number of groups you are comparing:
  • 21. ANOVA Null: Group means are equal; 1 = 2 = …= g Alternative: At least one group’s mean is different So… if F is larger than the critical F: Reject the Null in favor of the alternative!
  • 22. ANOVA • If your F is in the most extreme 5% of F’s that could occur by chance if your two variables were unrelated, you have good evidence that your F might not have come from a population where the means for each group are equal. • So, essentially, ANOVA uses your sample to tell you whether, in the population, you have overlapping group distributions (no difference between means) or fairly distinct group distributions (differences between means).
  • 23. ANOVA Question: We know that at least one mean is different from another. We don’t know which one. Now what? Answer: We do a post-hoc test to see which means are different from each other. A post-hoc test (“after the fact test”) is a series of independent samples t-tests comparing each group’s mean to each of the others’ means.
  • 24. ANOVA Conducting the t-tests, you will have g(g-1)/2 pairs of t-tests, where g is number of groups If there are 4 groups, then we will have 4 (4-1)/2 or 6 comparisons. But before you go off doing all these t-tests, remember that with statistics we play the odds. Answer: 5
  • 25. ANOVA Because of the likelihood of multiple comparison errors, statisticians have created ways to reduce the multiple comparison error rate. One of these is the Bonferroni, which adjusts the -level for each comparison by the number of comparisons. This lowers the likelihood of rejection in each test, making the joint -level equal to the original -level. In our example, .05/6 = .008. So -level for each comparison becomes .008. The combined likelihood of a type 1 error will be .05.
  • 26. Test When do we use post hoc tests? • a. after a significant overall F test • b. after a nonsignificant overall F test • c. in place of an overall F test • d. when we want to determine the impact of different factors
  • 27. But before we move forward- • Assumptions of ANOVA • There are three primary assumptions in ANOVA: • The responses for each factor level have a normal population distribution. • These distributions have the same variance. • The data are independent.
  • 29. Main Effects • Marginal mean of level of factor A: The mean of the level of factor A across all levels of factor B. • The main effects of factor A refer to how the marginal means of levels of factor A change as the level of A change • In the absence of interactions, the main effects have a straightforward interpretation: What happens to the mean as we change the level of factor A and keep the level of factor B fixed.
  • 30. Interactions • There is an interaction between A and B if the difference in means for the different levels of factor A changes as the level of factor B changes. • If there are interactions, the main effects no longer have a clear interpretation. Need to examine the means of all combinations of levels of A and B (e.g., by using an interaction plot).
  • 31. 0 2 4 6 8 10 12 14 Low B High B Mean Response Low A High A 0 2 4 6 8 10 12 14 Low B High B Mean Response Low A High A
  • 32. F tests for the Two-way ANOVA • Test for the difference between the levels of the main factors A and B F= MS(A) MSE F= MS(B) MSE Rejection region: F > F,a-1 ,n-ab F > F, b-1, n-ab • Test for interaction between factors A and B F= MS(AB) MSE Rejection region: F > F,(a-1)(b-1),n-ab SS(A)/(a-1) SS(B)/(b-1) SS(AB)/(a-1)(b-1) SSE/(n-ab)
  • 33. Required conditions: 1. The response distributions are normal 2. The treatment variances are equal. 3. The samples are independent simple random samples. Note: There are a*b populations (and samples), one for each combination of levels of factor A and B.
  • 34. • Example 15.3 – continued( Xm15-03) F tests for the Two-way ANOVA Convenience Quality Price TV 491 677 575 TV 712 627 614 TV 558 590 706 TV 447 632 484 TV 479 683 478 TV 624 760 650 TV 546 690 583 TV 444 548 536 TV 582 579 579 TV 672 644 795 Newspaper 464 689 803 Newspaper 559 650 584 Newspaper 759 704 525 Newspaper 557 652 498 Newspaper 528 576 812 Newspaper 670 836 565 Newspaper 534 628 708 Newspaper 657 798 546 Newspaper 557 497 616 Newspaper 474 841 587
  • 35. • Example 15.3 – continued • Test of the difference in mean sales between the three marketing strategies H0: conv. = quality = price H1: At least two mean sales are different F tests for the Two-way ANOVA ANOVA Source of Variation SS df MS F P-value F crit Medium 13172.0 1 13172.0 1.42 0.2387 4.02 Strategy 98838.6 2 49419.3 5.33 0.0077 3.17 Interaction 1609.6 2 804.8 0.09 0.9171 3.17 Error 501136.7 54 9280.3 Total 614757.0 59 Factor A Marketing strategies
  • 36. • Example 15.3 – continued • Test of the difference in mean sales between the three marketing strategies H0: conv. = quality = price H1: At least two mean sales are different F = MS(Marketing strategy)/MSE = 5.33 Fcritical = F,a-1,n-ab = F.05,3-1,60-(3)(2) = 3.17; (p-value = .0077) • At 5% significance level there is evidence to infer that differences in weekly sales exist among the marketing strategies. F tests for the Two-way ANOVA MS(A)/MSE
  • 37. • Example 15.3 - continued • Test of the difference in mean sales between the two advertising media H0: TV. = Nespaper H1: The two mean sales differ F tests for the Two-way ANOVA Factor B = Advertising media ANOVA Source of Variation SS df MS F P-value F crit Medium 13172.0 1 13172.0 1.42 0.2387 4.02 Strategy 98838.6 2 49419.3 5.33 0.0077 3.17 Interaction 1609.6 2 804.8 0.09 0.9171 3.17 Error 501136.7 54 9280.3 Total 614757.0 59
  • 38. • Example 15.3 - continued • Test of the difference in mean sales between the two advertising media H0: TV. = Nespaper H1: The two mean sales differ F = MS(Media)/MSE = 1.42 Fcritical = F,a-1,n-ab = F.05,2-1,60-(3)(2) = 4.02 (p-value = .2387) • At 5% significance level there is insufficient evidence to infer that differences in weekly sales exist between the two advertising media. F tests for the Two-way ANOVA MS(B)/MSE
  • 39. • Example 15.3 - continued • Test for interaction between factors A and B H0: TV*conv. = TV*quality =…=newsp.*price H1: At least two means differ F tests for the Two-way ANOVA Interaction AB = Marketing*Media ANOVA Source of Variation SS df MS F P-value F crit Medium 13172.0 1 13172.0 1.42 0.2387 4.02 Stategy 98838.6 2 49419.3 5.33 0.0077 3.17 Interaction 1609.6 2 804.8 0.09 0.9171 3.17 Error 501136.7 54 9280.3 Total 614757.0 59
  • 40. • Example 15.3 - continued • Test for interaction between factor A and B H0: TV*conv. = TV*quality =…=newsp.*price H1: At least two means differ F = MS(Marketing*Media)/MSE = .09 Fcritical = F,(a-1)(b-1),n-ab = F.05,(3-1)(2-1),60-(3)(2) = 3.17 (p-value= .9171) • At 5% significance level there is insufficient evidence to infer that the two factors interact to affect the mean weekly sales. MS(AB)/MSE F tests for the Two-way ANOVA
  • 41. 23-41 Analysis of Covariance • A procedure for comparing treatment means that incorporates information on a quantitative explanatory variable, X, sometimes called a covariate. • The procedure, ANCOVA, is a combination of ANOVA with regression.
  • 42. 23-42 Example: Calf Weight Gain • An animal scientist wishes to examine the impact of a pair of new dietary supplements on calf weight gain (response). • Three treatments are defined: standard diet, standard diet + supplement Q, and standard diet + supplement R. • All new calves from a large herd are available for use as study units. She selects 30 calves for study. Calves are randomized to the three diets at random (completely randomized design). • Initial weights are recorded, then calves are placed on the diets. At the end of four weeks the final weight is taken and weight gain is computed. • Simple analysis of variance and associated multiple comparisons procedures indicate no significant differences in weight gain between the two supplementary diets, but big differences between the supplemental diets and the standard diet. • Is this the end of the story? …
  • 43. 23-43 ANOVA Results Average Weight Gain (Response g/day) Standard Diet + Supplement Q + Supplement R xx x xx x x xx xxx xxx x xx xx x x x xx x x x Simple ANOVA of a one-way classification would suggest no difference between Supplements Q and R but both different from Standard diet.
  • 44. 23-44 Initial Weights Initial Weight Standard Diet + Supplement Q + Supplement R xx x x xx x x xx xx x xxxx x xx x x x xx xx x x x Plotting of the initial weights by group shows that the groups were not equal when it came to initial weights.
  • 45. 23-45 Weight Gain to Initial Weight Standard Diet Weight (kg) wF 1 age wi 1 wi 2 wgain 1 wF 2 2 gain w If animals come into the study at different ages, they have different initial weights and are at different points on the growth curve. Expected weight gains will be different depending on age at entry into study.
  • 46. 23-46 Regression of Initial Weight to Weight Gain Initial Weight (x) Weight Gain (g/day) (Y) wi 1 wi 2 wgain 1 wgain 2 If we disregard the age of the animal but instead focus on the initial weight, we see that there is a linear relationship between initial weight and the weight gain expected.
  • 47. 23-47 Covariates Initial weight in the previous example is a covariable or covariate. A covariate is a disturbing variable (confounder), that is, it is known to have an effect on the response. Usually, the covariate can be measured but often we may not be able to control its effect through blocking. In the EXAMPLE, had the animal scientist known that the calves were very variable in initial weight (or age), she could have: • Created blocks of 3 or 6 equal weight animals, and randomized treatments to calves within these blocks. • This would have entailed some cost in terms of time spent sorting the calves and then keeping track of block membership over the life of the study. • It was much easier to simply record the calf initial weight and then use analysis of covariance for the final analysis. • In many cases, due to the continuous nature of the covariate, blocking is just not feasible.
  • 48. 23-48 Expectations under Ho Initial Weight (x) Expected Weight Gain (g/day) (Y) Under Ho: no treatment effects. If all animals had come in with the same initial weight, All three treatments would produce the same weight gain. Average Weight Animal
  • 49. 23-49 Expectations under HA Initial Weight (x) Expected Weight Gain (g/day) (Y) Standard Diet (c) + Supplement Q (q) + Supplement R (r) Under Ha: Significant Treatment effects Average Weight Animal WGs WGQ WGR Different treatments produce different weight gains for animals of the same initial weight.
  • 50. 23-50 Different Initial Weights Initial Weight (x) Expected Weight Gain (g/day) (Y) cc c c cc c c cc qq q qqqq q qq r r r rr rr r r r Under Ho: no treatment effects. If the average initial weights in the treatment groups differ, the observed weight gains will be different, even if treatments have no effect. WGs WGQ WGR
  • 51. 23-51 Observed Responses under HA Initial Weight (x) Weight Gain (g/day) (Y) cc c c cc c c cc qq q qqqq q qq r r r rr rr r r r q q q q q q q q q q c c c c c c c c c c r r r r r r r r r r Standard Diet + Supplement Q + Supplement R Under HA: Significant Treatment effects Suppose now that different supplements actually do increase weight gain. This translates to animals in different treatment groups following different, but parallel regression lines with initial weight. WGR WGs WGQ What difference in weight gain is due to Initial weight and what is due to Treatment?
  • 52. 23-52 Observed Group Means Initial Weight (x) Weight Gain (g/day) (Y) cc c c cc c c cc qq q qqqq q qq r r r rr rr r r r q q q q q q q q q q c c c c c c c c c c r r r r r r r r r r Standard Diet + Supplement Q + Supplement R yc yr yq Unadjusted treatment means Simple one-way classification ANOVA (without accounting for initial weight) gives us the wrong answer!
  • 53. 23-53 Predicted Average Responses Initial Weight (x) Weight Gain (g/day) (Y) cc c c cc c c cc qq q qqqq q qq r r r rr rr r r r q q q q q q q q q q c c c c c c c c c c r r r r r r r r r r Standard Diet + Supplement Q + Supplement R  | yq X x   | yc X x   | yr X x  X x  Expected weight gain is computed for treatments for the average initial weight and comparisons are then made. Adjusted treatment means
  • 54. 23-54 ANCOVA: Objectives The objective of an analysis of covariance is to compare the treatment means after adjusting for differences among the treatments due to differences in the covariate levels for the treatments groups. The analysis proceeds by combining a regression model with an analysis of variance model.
  • 55. 23-55 Model ij i ij E(y ) x = m+ a + b The i, i=1,…,t, are estimates of how each of the t treatments modifies the overall mean response. (The index j=1,…,n, runs over the n replicates for each treatment.) The slope coefficient, , is a measure of how the average response changes as the value of the covariate changes. The analysis proceeds by fitting a linear regression model with dummy variables to code for the different treatment levels.
  • 56. 23-56 A Priori Assumptions The covariate is related to the response, and can account for variation in the response. Check with a scatterplot of Y vs. X. The covariate is NOT related to the treatments. If Y is related to X, then the variance of the treatment differences is increased relative to that obtained from an ANOVA model without X, which results in a loss of precision. The treatment’s regression equations are linear in the covariate. Check with a scatterplot of Y vs. X, for each treatment. Non-linearity can be accommodated (e.g. polynomial terms, transforms), but analysis may be more complex. The regression lines for the different treatments are parallel. This means there is only one slope in the Y vs. X plots. Non-parallel lines can be accommodated, but this complicates the analysis since differences in treatments will now depend on the value of
  • 57. 23-57 Example Four different formulations of an industrial glue are being tested. The tensile strength (response) of the glue is known to be related to the thickness as applied. Five observations on strength (Y) in pounds, and thickness (X) in 0.01 inches are made for each formulation. Here: • There are t=4 treatments (formulations of glue). • Covariate X is thickness of applied glue. • Each treatment is replicated n=5 times at different values of X. Formulation Strength Thickness 1 46.5 13 1 45.9 14 1 49.8 12 1 46.1 12 1 44.3 14 2 48.7 12 2 49.0 10 2 50.1 11 2 48.5 12 2 45.2 14 3 46.3 15 3 47.1 14 3 48.9 11 3 48.2 11 3 50.3 10 4 44.7 16 4 43.0 15 4 51.0 10 4 48.1 12 4 46.8 11
  • 58. 23-58 Formulation Profiles 40.0 44.0 48.0 52.0 16 15 10 12 11 Thickness (X) Strength (Y) Form_1 Form_2 Form_3 Form_4
  • 59. 23-59 SAS Program data glue; input Formulation Strength Thickness; datalines; 1 46.5 13 1 45.9 14 1 49.8 12 1 46.1 12 1 44.3 14 2 48.7 12 2 49.0 10 2 50.1 11 2 48.5 12 2 45.2 14 3 46.3 15 3 47.1 14 3 48.9 11 3 48.2 11 3 50.3 10 4 44.7 16 4 43.0 15 4 51.0 10 4 48.1 12 4 46.8 11 ; run; proc glm; class formulation; model strength = thickness formulation / solution ; lsmeans formulation / stderr pdiff; run; The basic model is a combination of regression and one-way classification.
  • 60. U N I V E R S I T Y O F S O U T H F L O R I D A // 60 23-60 Output: Use Type III SS to test significance of each variable Source DF Squares Mean Square F Value Pr > F Model 4 66.31065753 16.57766438 10.17 0.0003 Error 15 24.44684247 1.62978950 Corrected Total 19 90.75750000 R-Square Coeff Var Root MSE Strength Mean 0.730636 2.691897 1.276632 47.42500 Source DF Type I SS Mean Square F Value Pr > F Thickness 1 63.50120135 63.50120135 38.96 <.0001 Formulation 3 2.80945618 0.93648539 0.57 0.6405 Source DF Type III SS Mean Square F Value Pr > F Thickness 1 53.20115753 53.20115753 32.64 <.0001 Formulation 3 2.80945618 0.93648539 0.57 0.6405 Standard Parameter Estimate Error t Value Pr > |t| Intercept 58.93698630 B 2.21321008 26.63 <.0001 Thickness -0.95445205 0.16705494 -5.71 <.0001 Formulation 1 -0.00910959 B 0.80810401 -0.01 0.9912 Formulation 2 0.62554795 B 0.82451389 0.76 0.4598 Formulation 3 0.86732877 B 0.81361075 1.07 0.3033 Formulation 4 0.00000000 B . . . Regression on thickness is significant. No formulation differences. Divide by MSE to get mean squares. MSE
  • 61. U N I V E R S I T Y O F S O U T H F L O R I D A // 61 23-61 Least Squares Means (Adjusted Formulation means computed at the average value of Thickness [=12.45]) The GLM Procedure Least Squares Means Strength Standard LSMEAN Formulation LSMEAN Error Pr > |t| Number 1 47.0449486 0.5782732 <.0001 1 2 47.6796062 0.5811616 <.0001 2 3 47.9213870 0.5724527 <.0001 3 4 47.0540582 0.5739134 <.0001 4 Least Squares Means for effect Formulation Pr > |t| for H0: LSMean(i)=LSMean(j) Dependent Variable: Strength i/j 1 2 3 4 1 0.4574 0.3011 0.9912 2 0.4574 0.7695 0.4598 3 0.3011 0.7695 0.3033 4 0.9912 0.4598 0.3033
  • 62. U N I V E R S I T Y O F S O U T H F L O R I D A // 62 1. ANOVA is a useful statistical method, especially when one can randomize treatment. 2. If in experimental design, one suspects covariates that impact response variable, then one should use ANCOVA 3. Unlike regression analysis, though assumptions for ANOVA or ANCOVA are important, statistical inference is robust even if assumptions only mildly hold. Key Takeaway
  • 63. U N I V E R S I T Y O F S O U T H F L O R I D A // 63 Next Class CLASS 6: Mar. 16 Class 6_Stats_Module Categorical Data and Statistical Analysis  Categorical input and categorical output  χ 2 test  Categorical inputs with continuous outputs: ANOVA  Kruskal-Wallis test Quiz 6: Based on Class 6 Readings Class 6_SAS_Module Statistical Analysis I  Chap 9 (part) of LBS  SAS Assignment 6 posted: Due before class 7