© Charles T. Diebold, Ph.D., 9152013. All Rights Reserved. .docx

© Charles T. Diebold, Ph.D., 9/15/2013. All Rights Reserved.
Page 1 of 16
Repeated Measures ANOVA Tutorial:
RSCH-8250 Advanced Quantitative Reasoning
Charles T. Diebold, Ph.D.
September 15, 2013
How to cite this document:
Diebold, C. T. (2013, September 15). Repeated measures
ANOVA tutorial: RSCH-8250 advanced quantitative
reasoning. Available from [email protected]
Table of Contents
Assignment and Tutorial Introduction
...............................................................................................
................. 2
Section 1: SPSS Specification of the Assignment
...............................................................................................
2
Descriptive Statistics

...............................................................................................
....................................... 3
Repeated Measures ANOVA
...............................................................................................
.......................... 3
Section 2: Annotated Example SPSS Output, Write Up Guide,
and Sample APA Table ..................................... 6
...............................................................................................
....................................... 6
Sphericity Assumption
...............................................................................................
.................................... 7
Tests of Within-Subjects Effects
...............................................................................................
..................... 7
Post Hoc: Profile Plot and Statistical Pairwise Comparison via
Estimated Marginal Means ............................ 8
Tests of Within-Subjects Contrasts
...............................................................................................
............... 10
Test of Between-Subjects Effects
...............................................................................................
.................. 10
Multivariate Tests for Repeated Measures
...............................................................................................
..... 11
Results Write Up Guide
....................................................................................... ........
................................ 11
Sample APA Table
...............................................................................................
....................................... 12
For the Inquisitive: Sphericity and Within-Subjects Effects
Redux ............................................................... 13

Page 2 of 16
Repeated Measures ANOVA Tutorial:
RSCH-8250 Advanced Quantitative Reasoning
Assignment and Tutorial Introduction
This tutorial is intended to assist RSCH-8250 students in
completing the Week 4 application assignment. I
recommend that you use this tutorial as your first line of
instruction; then, if you have time, study the
textbook chapter, Dr. Morrow’s video’s, or other resources
noted in the classroom.
3rd edition of Field textbook:
Review Chapter 13 in the Field textbook and complete repeated
measures analysis of variance in Smart Alex's
Task #2 on p. 504, using the Tutormarks.sav data set from the
Field text.
4th edition of Field textbook:
Review Chapter 14 in the Field textbook and complete repeated
measures analysis of variance in Smart Alex's
Task #2 on p. 589, using the Tutormarks.sav data set from the
Field text.

The objective of the exercise is to conduct and interpret a
repeated measures ANOVA using the following four
variables: tutor1, tutor2, tutor3, tutor4. These represent scores
on the same assignment by eight students from
four different lecturers.
The tutorial contains two sections. Section 1 provides step-by-
step graphic user interface (GUI) screenshot for
specifying the assignment in SPSS. If you follow the steps you
will produce correct SPSS output. Section 2
presents and interprets output for a different set of variables,
includes a results write up guide, and sample APA
style table (the variables and data in Section 2 are “made up”
and do not reflect real research).
Section 1: SPSS Specification of the Assignment
Open the dataset, the Variable View screenshot is shown below.
There are four variables in the dataset as
described above.
Page 3 of 16
ives (below
left). A Descriptives dialogue appears; select and
move all four variables into the Variable(s) box (below right).

Then click OK, which will produce descriptive
statistic output for each lecturers score.
Repeated Measures ANOVA
Go
(below left). A Repeated Measures Define
Factor(s) dialogue appears (below right).
Page 4 of 16
In the Within-Subjects Factor Name box, change “factor1” to
“lecturer” and type in 4 for the number of levels
(this is the number of repeated measures, in this case there were
4 lecturers (below left), then click the Add
button. Lecturer(4) now appears as in below right. Click the
Define button.
After clicking the Define button, the dialogue below left will
appear. Highlight and move all four variables into
the Within-subjects Variables box as shown below right. Click

the Plots button.
Page 5 of 16
After clicking the Plots button, the dialogue below left appears.
Move lecturer to the Horizontal Axis box; click
the Add button, which will add lecturer to the Plots box as
shown below right. Click the Continue button.
After clicking the Continue button you are returned to the
Repeated Measures dialogue shown bottom right on
previous page. Click the Options button. In the Options
dialogue, move (OVERALL) and lecturer to the
Display Means for box and check the box next to Compare main
effects. Under Confidence Interval adjustment,
click the down arrow and select Bonferroni. Under the Display
options, check Descriptive statistics and
Estimates of effect size.
Clicking Continue will produce the output needed for the
assignment.

Page 6 of 16
Section 2: Annotated Example SPSS Output, Write Up Guide,
and Sample APA Table
The example output shown below uses variables different from
the Week 4 assignment. The purpose is to
explain key elements of the output, point out what to focus on,
and demonstrate how to interpret and report the
results in APA statistical style. The output presented below is
not in order of appearance in SPSS output, but
rearranged to address specific teaching objectives and
interpretation tasks.
The variables in the example output are five quiz scores. For
context, you can think of these as a parallel-forms
design with each quiz equivalent in terms of content and
difficulty but given under varying conditions. For
example, the conditions might be different types of noise, each
presented at the same decibel level (e.g., 100dB,
which is about the noise level alongside a lawnmower):
Quiz Noise Condition
1 Nearby crowd of conversing people
2 Orchestra music
3 Pop music
4 Radio shock news
5 Jack hammers

The first part of the output, Within-Subjects Factors, confirms
that we set up the repeated measures
successfully. Here we expect and have the five quizzes listed as
dependent variables. In your assignment you
should see each of the four tutors listed (if not, do not pass go,
go back and restart the SPSS specification as
detailed in Section 1 of this tutorial.
The Descriptive Statistics portion of the output provides the
mean and standard deviation for each of the five
quizzes. This information is needed for the APA table (see
Sample APA Table section of this tutorial).
Within-Subjects Factors
Measure: MEASURE_1
quizzes Dependent
Variable
1 quiz1
2 quiz2
3 quiz3

4 quiz4
5 quiz5
Mean Std. Deviation N
quiz1 7.47 2.481 105
uiz2 7.98 1.623 105
quiz3 7.98 2.308 105
quiz4 7.80 2.280 105
quiz5 7.87 1.765 105
Page 7 of 16
Sphericity Assumption
Mauchly’s test of sphericity needs to be reported and decision
needs to be made whether to use the sphericity
assumed, Greenhouse-Geisser, or Huynh-Feldt results in the
Tests of Within-Subjects Effects. Sphericity for
repeated measures is similar to homogeneity of variance for
between-groups ANOVA. If the variances of each
repeated measure is equal, and if the covariances (and, thus,
correlations) of each pair of repeated measures is
equal, then there is compound symmetry and, as a result,
sphericity is satisfied. When violated (p < .05), the F
test is too liberal (increased Type I error), incorrectly
concluding statistical significance. Here, the sphericity

assumption was violated, Mauchly’s W(9, N = 105) = 93.85, p <
.001. So, in the Tests of Within-Subjects
Effects, we cannot use the sphericity assumed results. Instead,
for purposes of this assignment, we need to
choose between the Greenhouse-Geisser adjusted results or the
Huynh-Feldt adjusted results. In this case, we
would not draw different conclusions using either, but as a
general rule, use Greenhouse-Geisser if its epsilon
value is less than .75, otherwise use Huynh-Feldt1
Mauchly's Test of Sphericitya
.
Measure: MEASURE_1
Within Subjects
Effect
Mauchly's W Approx. Chi-
Square
df Sig. Epsilonb
Greenhouse-
Geisser
Huynh-Feldt Lower-bound
quizzes .400 93.851 9 .000 .640 .657 .250
Tests the null hypothesis that the error covariance matrix of the
orthonormalized transformed dependent variables is
proportional to an
identity matrix.

a. Design: Intercept
Within Subjects Design: quizzes
b. May be used to adjust the degrees of freedom for the
averaged tests of significance. Corrected tests are displayed in
the Tests of
Within-Subjects Effects table.
The Greenhouse-Geisser adjusted test of mean differences
across the five repeated quiz measures was
statistically significant, F(2.56, 266.10) = 3.049. p = .037, ηp2
= η2 = .0282, ω2 = .0153
(a small effect).
Measure: MEASURE_1
Source Type III Sum of
Squares
df Mean Square F Sig. Partial Eta
Squared
quizzes
Sphericity Assumed 18.819 4 4.705 3.049 .017 .028
Greenhouse-Geisser 18.819 2.559 7.355 3.049 .037 .028
Huynh-Feldt 18.819 2.629 7.159 3.049 .035 .028

Lower-bound 18.819 1.000 18.819 3.049 .084 .028
Error(quizzes)
Sphericity Assumed 641.981 416 1.543
Greenhouse-Geisser 641.981 266.100 2.413
Huynh-Feldt 641.981 273.385 2.348
Lower-bound 641.981 104.000 6.173
1 There are other approaches to dealing with violation of
sphericity, but they are beyond the scope of this course.
2 In a oneway ANOVA, whether between-subjects or within-
subjects, partial eta squared and eta squared are the same.
3 ω2 calculated using formula 14.1 in Field’s 4th edition
(2013).
Page 8 of 16
Post Hoc: Profile Plot and Statistical Pairwise Comparison via
Estimated Marginal Means
The test of within-subjects effect was statistically significant,
indicating there was a difference in means
“somewhere” among the five quizzes. To figure out which ones
were different we have to dig deeper with post
hoc analysis.
Though statistically significant differences cannot be inferred
from a plot of the quizzes, it is a good place to
start to get a sense of what’s going on. In the plot below we see
that the crowd of conversing people (quiz1) was

associated with the lowest
of the five quiz scores.
Orchestra music (quiz2)
and pop music (quiz3)
were associated with the
highest scores. Radio shock
news (quiz4) and
jackhammers (quiz5) were
associated with scores
higher than those for crowd
noise and somewhat lower
than the two music
conditions. Had this been
real research, one would
have selected the noise
conditions with some a
priori theoretical
explanation for expected
differences and interpret
the actual result in light of
the theoretical expectations
(I leave such to your
scientific imagination).
Visual depictions can be misleading, which is why we rely on
statistical tests to determine which quiz means
were different from the others. Tests of pairwise comparisons
are part of the Estimated Marginal Means (EMM)
portion of the output. The first two parts of the EMM output
provide mean, standard error, and 95% confidence
intervals for the mean. These are useful for reference, but the
meat (or tofu) is in the output labeled Pairwise
Comparisons (see next page).

1. Grand Mean
Measure: MEASURE_1
Mean Std. Error 95% Confidence Interval
Lower Bound Upper Bound
7.819 .176 7.470 8.168
Estimates
Measure: MEASURE_1
quizzes Mean Std. Error 95% Confidence Interval
1 7.467 .242 6.987 7.947
2 7.981 .158 7.667 8.295
3 7.981 .225 7.534 8.428
4 7.800 .223 7.359 8.241
5 7.867 .172 7.525 8.208
Page 9 of 16
There is redundancy in the Pairwise Comparisons output, so be
careful not to repeat yourself in the results write
up. For example, the pairwise comparison of quiz 1 with quiz 2

is the same as the pairwise comparison of quiz 2
with quiz 1. You will avoid redundant results if you consider
only the quiz numbers in the 2nd column that are
numbered higher than the quiz number in the 1st column. For
example, for the five rows of information
associated with 1st column quiz 3, only consider rows 4 and 5.
In this example, only two pairs of quizzes statistically
significantly differed using Bonferroni adjusted p values.
The crowd conversing condition (quiz1) had a lower mean (MD
= -0.514, p = .049) than the orchestra music
condition (quiz2), and a lower mean (MD = -0.514, p = .001)
than the pop music condition (quiz3).
Notice in the table that there are significance values of 1.000.
Just like p cannot equal .000, it cannot equal
1.000. In such cases, report as p > .999).
Also notice that quiz2 and quiz3 had the same mean and the
same mean difference from quiz1, but one had a p
value of .049 and the other .001. If curious, see the “For the
Inquisitive…” section of this tutorial.
Pairwise Comparisons
Measure: MEASURE_1
(I) quizzes (J) quizzes Mean Difference
(I-J)
Std. Error Sig.b 95% Confidence Interval for
Differenceb

1
2 -.514* .179 .049 -1.028 -.001
3 -.514* .126 .001 -.874 -.154
4 -.333 .137 .168 -.727 .060
5 -.400 .215 .658 -1.017 .217
2
1 .514* .179 .049 .001 1.028
3 .000 .164 1.000 -.469 .469
4 .181 .173 1.000 -.316 .678
5 .114 .129 1.000 -.255 .483
3
1 .514* .126 .001 .154 .874
2 .000 .164 1.000 -.469 .469
4 .181 .143 1.000 -.229 .591
5 .114 .205 1.000 -.475 .703
4
1 .333 .137 .168 -.060 .727
2 -.181 .173 1.000 -.678 .316

3 -.181 .143 1.000 -.591 .229
5 -.067 .212 1.000 -.676 .542
5
1 .400 .215 .658 -.217 1.017
2 -.114 .129 1.000 -.483 .255
3 -.114 .205 1.000 -.703 .475
4 .067 .212 1.000 -.542 .676
Based on estimated marginal means
*. The mean difference is significant at the .05 level.
b. Adjustment for multiple comparisons: Bonferroni.
Page 10 of 16
The linear, quadratic, cubic, and fourth-order contrasts are
appropriate when the independent variable is
interval, such as repeated measures one month apart in time or
equal interval dosage increases. In this example
(and for the assignment) the independent variable is not interval
level, but nominal, reflecting different
environmental conditions under which the quiz was taken, so
these contrasts do not apply.

However, suppose quiz1 was taken after drinking a glass of
water (the control condition), quiz2 after 1 cup of
coffee, quiz3 after 2 cups of coffee, quiz4 after 3 cups of
coffee, and quiz5 after 4 cups of coffee. The
nonsignificant result for the linear contrast, p = .091, would
indicate that as coffee increases linearly there is not
a corresponding linear improvement in quiz score. The
significant quadratic result, p = .006, would indicate that
as coffee increases linearly, quiz results increase to a plateau
then decrease—a curvilinear effect that can be
visually seen in the profile plot. So, in this scenario of the quiz
conditions, coffee helps up to a point, then hurts
quiz performance.
Measure: MEASURE_1
Source quizzes Type III Sum of
Squares
Squared
quizzes
Linear 4.024 1 4.024 2.917 .091 .027
Quadratic 8.686 1 8.686 7.858 .006 .070
Cubic 6.095 1 6.095 2.323 .131 .022
Order 4 .014 1 .014 .013 .910 .000

Error(quizzes)
Linear 143.476 104 1.380
Quadratic 114.956 104 1.105
Cubic 272.905 104 2.624
Order 4 110.644 104 1.064
Test of Between-Subjects Effects
For a oneway within subjects repeated measures ANOVA, there
is no between-subjects effect because there is
no grouping factor. Nonetheless, a between-subjects test output
is produced, but is simply a test of the intercept
and of no importance or value. Ignore it.
Tests of Between-Subjects Effects
Measure: MEASURE_1
Transformed Variable: Average
Source Type III Sum of
Squares
Squared
Intercept 32097.190 1 32097.190 1974.033 .000 .950
Error 1691.010 104 16.260

Page 11 of 16
Multivariate Tests for Repeated Measures
Explication of the multivariate test of repeated measures is
beyond the scope of this course and is not as simple
a matter, as some suggest, to use when sphericity is violated.
Ignore it.
Multivariate Tests
Value F Hypothesis df Error df Sig. Partial Eta
Squared
Pillai's trace .152 4.539a 4.000 101.000 .002 .152
Wilks' lambda .848 4.539a 4.000 101.000 .002 .152
Hotelling's trace .180 4.539a 4.000 101.000 .002 .152
Roy's largest root .180 4.539a 4.000 101.000 .002 .152
Each F tests the multivariate effect of quizzes. These tests are
based on the linearly independent pairwise
comparisons among the estimated marginal means.
a. Exact statistic
Results Write Up Guide

Begin the write up by describing the context of the research and
the variables. If known, state how each variable
was operationalized, for example: “Overall GPA was measured
on the traditional 4-point scale from 0 (F) to 4
(A)”, or “Satisfaction was measured on a 5-point likert-type
scale from 1 (not at all satisfied) to 5 (extremely
satisfied).” Please pay attention to APA style for reporting scale
anchors (see p. 91 and p. 105 in the 6th edition
of the APA Manual).
Report descriptive statistics such as minimum, maximum, mean,
and standard deviation for each metric
variable. For nominal variables, report percentage for each level
of the variable, for example: “Of the total
sample (N = 150) there were 40 (26.7%) males and 110 (73.3%)
females.” Keep in mind that a sentence that
includes information in parentheticals must still be a sentence
(and make sense) if the parentheticals are
removed. For example: “Of the total sample there were 40 males
and 110 females.”
State the purpose of the analysis or provide the guiding research
question(s). If you use research questions, do
not craft them such that they can be answered with a yes or no.
Instead, craft them so that they will have a
quantitative answer. For example: “What is the strength and
direction of relationship between X and Y?” or
“What is the difference in group means on X between males and
females?”
Present null and alternative hypothesis sets applicable to the
analysis. For repeated measures ANOVA there
would be a hypothesis set for the main effect of the within-
group factor (i.e., mean differences among the
repeated measures).

State assumptions or other considerations for the analysis, and
report the actual statistical result for relevant
tests. For this course, the only repeated measures ANOVA
consideration that needs to be presented and
discussed is for the sphericity assumption. Even if violated, you
must still report and interpret the remaining
results.
Report and interpret the within-subjects effect, as well as any
post hoc analysis as needed. Be sure to include the
actual statistical results in text—examples were provided within
the annotated output section of this tutorial.
Page 12 of 16
Don’t forget to interpret (i.e., make sense of) the results. Draw
conclusions about rejecting or failing to reject
each null. If needed, summarize the results, without statistics,
in a concluding sentence or paragraph.
Provide APA style tables appropriate to the analysis. Do not use
SPSS table output, it is not in APA style.
Example APA tables for a repeated measures ANOVA are
shown below using the results from the example
output in this tutorial. Although one would typically not
duplicate information in text and tables, it is important
to demonstrate competence in both ways of reporting the
results; so, you cannot just provide tables, you must
also report the relevant statistical results within the textual
write up.
The complete SPSS output should be submitted as a separate

file or pasted into the write up document (but in an
appendix). Do not intermingle SPSS output and write up. The
only exception to this is if you want to include an
SPSS graph; they are not easily converted to APA style, so I
will permit them within the body of the write up.
Sample APA Table
Table 1
Repeated Measures Quiz Condition, Mean, Standard Deviation,
and Pairwise Comparisons (N = 105)
Mean difference in upper diagonal
p in lower diagonala
Quiz # Condition M SD 1 2 3 4 5
1 Crowd conversing 7.47 2.48 -0.51 -0.51 -0.33 -0.40
2 Orchestra music 7.98 1.62 .049 0.00 0.18 0.11
3 Pop music 7.98 2.31 .001 > .999 0.18 0.11
4 Radio shock news 7.80 2.28 .168 > .999 > .999 -0.07
5 Jack hammers 7.87 1.77 .658 > .999 > .999 > .999
aBonferroni adjusted for multiple comparisons.
Page 13 of 16
For the Inquisitive: Sphericity and Within-Subjects Effects

Redux
If the variances of each repeated measure is equal, and if the
covariances (and, thus, correlations) of each pair of
repeated measures is equal, then there is compound symmetry
and, as a result, sphericity is satisfied. When
violated (p < .05), the F test is too liberal (increased Type I
error), incorrectly concluding statistical
significance. In the example output the sphericity assumption
was violated, Mauchly’s W(9, N = 105) = 93.85, p
< .001.
To get a sense of why there was not sphericity, we can examine
the variances (or standard deviations) and pairwise correlations
of
the repeated measures.
From the Descriptive Statistics output we see that the standard
deviations ranged from 1.623 to 2.481. Thus, the variances,
being
the square of the standard deviations, ranged from 2.634 to
6.155,
which, on their face, seem far from being relatively equal.
From the correlation matrix we see
that the pairwise correlations
ranged from .445 (quiz4 with
quiz5) to .858 (quiz1 with quiz3).
The standard deviations of quiz1,
quiz3, and quiz4 appear relatively
equal, and so do their correlations:
.858, .829, and .796. I would
hypothesis that if just these three
were analyzed in repeated

measures ANOVA, sphericity
would be satisfied. I tested the
hypothesis and sphericity was
satisfied (see output below),
Mauchly’s W(2, N = 105) = 2.343,
p = .310.
Notice that the Greenhouse-
Geisser and Huynh-Feldt epsilon
values were .978 and .997,
respectively. The maximum
possible value is 1.0, indicating
perfect symmetry.
Measure: MEASURE_1
Within Subjects Effect Mauchly's W Approx. Chi-
Square
df Sig. Epsilonb
Greenhouse-
Geisser
threequizzes .978 2.343 2 .310 .978 .997 .500
Mean Std. Deviation N
quiz1 7.47 2.481 105

quiz2 7.98 1.623 105
quiz3 7.98 2.308 105
quiz4 7.80 2.280 105
quiz5 7.87 1.765 105
Correlations
quiz1 quiz2 quiz3 quiz4 quiz5
quiz1
Pearson Correlation 1 .673** .858** .829** .504**
Sig. (2-tailed) .000 .000 .000 .000
N 105 105 105 105 105
quiz2
Pearson Correlation .673** 1 .688** .633** .700**
Sig. (2-tailed) .000 .000 .000 .000
N 105 105 105 105 105
quiz3
Pearson Correlation .858** .688** 1 .796** .493**
Sig. (2-tailed) .000 .000 .000 .000
N 105 105 105 105 105
quiz4
Pearson Correlation .829** .633** .796** 1 .445**

Sig. (2-tailed) .000 .000 .000 .000
N 105 105 105 105 105
quiz5
Pearson Correlation .504** .700** .493** .445** 1
Sig. (2-tailed) .000 .000 .000 .000
N 105 105 105 105 105
**. Correlation is significant at the 0.01 level (2-tailed).
Page 14 of 16
By comparison, with all five quizzes the Greenhouse-Geisser
and Huynh-Feldt epsilon values were .640 and
.657, respectively. While 1.0 is the maximum epsilon value, the
minimum—shown as the lower-bound
epsilon—depends on the number of repeated measures.
Specifically, the minimum epsilon is calculated as 1 ÷
(# of repeated measures – 1). In the case of the five quizzes,
this is 1 ÷ (5 – 1) = 1 ÷ 4 = .250, the lower-bound
value shown in the output.
Measure: MEASURE_1
Within Subjects
Effect

Mauchly's W Approx. Chi-
Square
df Sig. Epsilonb
Greenhouse-
Geisser
quizzes .400 93.851 9 .000 .640 .657 .250
Before further examination of Greenhouse-Geisser and Huynh-
Feldt, I want to return to a technical point about
sphericity, itself. Previously, I stated that if there was
compound symmetry (equal variance and covariance),
symmetry would be satisfied. Compound symmetry is a
sufficient but
not necessary condition for sphericity. Even if there is not
compound
symmetry, sphericity is technically tested and satisfied if the
pairwise
difference between the repeated measures have equal variance.
With five repeated measures, there would be 10 such pairwise
differences (quiz1 minus quiz2, quiz1 minus quiz3, etc.). Using
syntax
compute commands I actually calculated the 10 pairwise
difference
variables. The variances of these 10 variables, as shown in the
output at
left, ranged from 1.656 to 4.858, indicating unequal variance as
expected.

Back to epsilon and tests of within-subjects effects.
Greenhouse-
Geisser and Huynh-Feldt are adjustments to the df value for the
repeated measures effect (here, the effect of the various quiz
environmental conditions) and the df error value.
Measure: MEASURE_1
Source Type III Sum
of Squares
Squared
Observed
Powera
quizzes
Sphericity Assumed 18.819 4 4.705 3.049 .017 .028 .805
Greenhouse-Geisser 18.819 2.559 7.355 3.049 .037 .028 .662
Huynh-Feldt 18.819 2.629 7.159 3.049 .035 .028 .670
Lower-bound 18.819 1.000 18.819 3.049 .084 .028 .409
Error(quizzes)
Sphericity Assumed 641.981 416 1.543
Greenhouse-Geisser 641.981 266.100 2.413

Huynh-Feldt 641.981 273.385 2.348
Lower-bound 641.981 104.000 6.173
a. Computed using alpha = .05
N Variance
q1minusq2 105 3.368
q1minusq3 105 1.656
q1minusq4 105 1.974
q1minusq5 105 4.858
q2minusq3 105 2.808
q2minusq4 105 3.150
q2minusq5 105 1.737
q3minusq4 105 2.150
q3minusq5 105 4.429
q4minusq5 105 4.736
Valid N (listwise) 105
Page 15 of 16
With sphericity assumed df quizzes = 4 because there were 5

quizzes ( i.e., df = number of repeated measures –
1), and df error = 416 (N – 1 times number of repeated measures
– 1 = 105 – 1 times 5 -1 = 104 x 4 = 416). The
Greenhouse-Geisser correction value (i.e., epsilon value) was
.640. The sphericity assumed effect df times .640
is the corrected effect df for Greenhouse-Geiser (4 x.640 = 2.6).
Similarly, the sphericity assumed error df times
.640 is the corrected error df for Greenhouse-Geiser (416 x .640
= 226.24, within rounding error of the output
value). The Huynh-Feldt df effect and error adjustments are
calculated the same way, but using epsilon value of
.657.
Because the same adjustment (multiplication by a constant
value) is made to both the effect df and the error df,
the F value and partial eta squared are unchanged. What differs
is that significance of the F value is tested using
different df values, so p will not be the same. In this example,
the F value is 3.049. When evaluated at 4 and 416
df (sphericity assumed), p = .017; but when evaluated at 2.559
and 266.100 (Greenhouse-Geisser), p = .037; and
for 2.629 and 273.385 (Huynh-Feldt), p = .035.
Notice that the p values get larger as the epsilon adjustment
value gets smaller. This helps to avoid the increased
risk of Type I error when sphericity is violated. It also,
however, decreases power4
. In the Tests of Within-
Subjects Effects on previous page, I included a power column.
It is highest with sphericity assumed (in this
example .805) and decreased to .670 for Huynh-Feldt and to
.662 for Greenhouse-Geisser (which had the lowest
epsilon value).

It should be clear that there can be statistical significance (p <
.05) with sphericity, but the effect may not be
statistically significant when sphericity is violated and F test
adjustments are made.
Greenhouse-Geisser may underestimate epsilon resulting in too
much correction and Huynh-Feldt may
overestimate epsilon resulting in not enough correction. As a
general rule, use Greenhouse-Geisser if its epsilon
value is less than .75, otherwise use Huynh-Feldt. You can also
average the two adjustments by taking the
average of the two p values, even though this is not technically
correct. Technically, you average the two
epsilon values and compute new effect df and error df adjusted
values. In this example, the average of the .640
and .657 epsilon values is .6485. Adjusted effect df would be 4
x .6485 = 2.594, and adjusted df error would be
416 x .6485 = 269.776. Unfortunately you cannot correctly
compute the p value using Excel’s fdist function
because it truncates the df values. Also, I am not aware of any
online calculator that works with decimal df
values.
Finally, recall that quiz2 and quiz3 had equal mean and equal
mean difference from quiz1, but the quiz1:quiz2
pairwise p = .049 and the quiz1:quiz3 pairwise p = .001. Why
the difference? In a nutshell, the quiz1:quiz3
difference had smaller standard error and, all else equal, p
decreases as standard error decreases. More
precisely, pairwise comparisons constitute a t test, where t =
mean difference ÷ standard error.
Measure: MEASURE_1

(I) quizzes (J) quizzes Mean Difference
(I-J)
Std. Error Sig.b 95% Confidence Interval for
Differenceb
1
2 -.514* .179 .049 -1.028 -.001
3 -.514* .126 .001 -.874 -.154
4 Field (2013) stated that “sphericity creates loss of power” (p.
547). I suspect this is a typo and he meant to state that “lack of
sphericity creates loss of power.
Page 16 of 16
For the quiz1:quiz2 comparison, t(103) = -.514 ÷ .179 = -2.872,
whereas for the quiz1:quiz3 comparison, t(103)
= -.514 ÷ .126 = 4.079. The t value is larger for the quiz1:quiz3
comparison because you are dividing the same
mean difference (-.514) by a smaller standard error. And, for
the same N, a larger t value has a smaller p value.
What does it mean that the quiz1:quiz3 difference had a smaller
standard error than the quiz1:quiz2 difference?
If you create a new variable, such as q1minusq2, by subtracting

the quiz2 scores from the quiz1 scores, and do
similar to create q1minusq3, we can look at the descriptive
statistics for each of the two newly created variables.
The standard error (SE) is a function of the standard deviation
(SD) and the sample size (N), such that
N
SDSE = . Because N = 105 for both variables, the issue boil
downs to differences in the standard deviation.
For the same mean difference and N, the variable with the
smaller standard deviation, in this case, q1minusq3,
will have a larger t value and smaller p value.
N Mean Std. Deviation Variance
Statistic Statistic Std. Error Statistic Statistic
q1minusq2 105 -.5143 .17909 1.83510 3.368
q1minusq3 105 -.5143 .12559 1.28687 1.656
Valid N (listwise) 105
Conceptually, the differences in quiz1 scores and quiz3 scores
were more homogeneous (less spread out) and
the quiz1 and quiz3 scores, themselves, were more highly
correlated (r[103] = .858), than the quiz1 and quiz2
scores (r[103] = .673. This is visually apparent in the
scatterplots below. The quiz1:quiz2 plot on the left is
more scattered than the quiz1:quiz3 plot on the right.
So, the mystery of why quiz2 and quiz3 had equal mean and
equal mean difference from quiz1, but the

quiz1:quiz2 pairwise p = .049 and the quiz1:quiz3 pairwise p =
.001 is solved!

© Charles T. Diebold, Ph.D., 9152013. All Rights Reserved. .docx

Recommended

Recommended

More Related Content

Similar to © Charles T. Diebold, Ph.D., 9152013. All Rights Reserved. .docx

Similar to © Charles T. Diebold, Ph.D., 9152013. All Rights Reserved. .docx (19)

More from LynellBull52

More from LynellBull52 (20)

Recently uploaded

Recently uploaded (20)

© Charles T. Diebold, Ph.D., 9152013. All Rights Reserved. .docx