Repeated Measures ANOVA

Introduction

The repeated measures ANOVA is a member of the ANOVA family. ANOVA is short for Analysis Of Variance. All ANOVAs compare one or more mean scores with each other; they are tests for differences in mean scores (Statistics Solutions, 2012). The repeated measures ANOVA is the equivalent of the one-way ANOVA for related, rather than independent, groups, and is the extension of the dependent t-test. A repeated measures ANOVA is also referred to as a within-subjects ANOVA or an ANOVA for correlated samples. The repeated factor is called a "within"-subjects factor because comparisons are made multiple times ("repeated") "within" the same subject rather than across ("between") different subjects (Vincent & Weir, n.d.).

The within-subjects ANOVA is appropriate for repeated measures designs (e.g., pretest-posttest designs), within-subjects experimental designs, matched designs, or multiple measures (Newsom, 2012).

Like t-tests, repeated measures ANOVA gives the statistical tools to determine whether or not change has occurred over time. T-tests compare average scores at two different time periods for a single group of subjects; repeated measures ANOVA compares the average score at multiple time periods for a single group of subjects (Laerd Statistics, 2013).

Examples

- Taking a self-esteem measure before, after, and at follow-up of a psychological intervention;
- A measure taken over time to track change, such as a motivation score upon entry to a new program, 6 months into the program, 1 year into the program, and at exit from the program;
- A measure repeated across multiple conditions, such as a measure taken under experimental condition A, condition B, and condition C; and
- Several related, comparable measures (e.g., sub-scales of an IQ test).

Since repeated measures are collected on the same subjects, the means of those measures are dependent. A particular subject's scores will be more alike than scores collected from multiple subjects, meaning that there is less variability from measure to measure than is observed from person to person in a simple ANOVA.

Repeated measures ANOVA separates the two sources of variance: measures and persons. This separation decreases MSE, the random variation (sampling error) component, because there are now two sources of known variation (subjects and measures) instead of just one (subjects) as in simple ANOVA. The variation in scores due to differences between subjects is separated from variation due to differences from measure to measure within a subject.

Instead of comparing treatment effects across groups of different subjects, treatment effects are compared across multiple measures in the same subjects. Each subject provides his or her own "control" value for the comparison. Consequently, this type of design is more sensitive to differences (i.e., it requires smaller differences in the dependent variable to reject the null hypothesis) than are between-subjects designs.

Assumptions

1. Dependent variable

The dependent variable should be measured at the interval or ratio level (i.e., it is continuous). Examples of variables that meet this criterion include revision time (measured in hours), intelligence (measured using IQ score), exam performance (measured from 0 to 100), weight (measured in kg), and so forth (Laerd Statistics, 2013).
2. Independent variable

The independent variable should consist of at least two categorical "related groups" or "matched pairs". "Related groups" indicates that the same subjects are present in both groups. The reason it is possible to have the same subjects in each group is that each subject has been measured on two occasions on the same dependent variable.

For example, a researcher might have measured 10 individuals' performance in a spelling test (the dependent variable) before and after they underwent a new form of computerized teaching method to improve spelling. He would like to know whether the computer training improved their spelling performance. The first related group consists of the subjects at the beginning of (prior to) the computerized spelling training, and the second related group consists of the same subjects, but now at the end of the computerized training. The repeated measures ANOVA can also be used to compare different subjects, but this does not happen very often (Laerd Statistics, 2013).

3. No significant outliers in the differences

There should be no significant outliers in the differences between the two related groups. Outliers are simply single data points within the data that do not follow the usual pattern (e.g., in a study of 100 students' IQ scores, where the mean score was 108 with only a small variation between students, one student had a score of 156, which is very unusual and may even put her in the top 1% of IQ scores globally). The problem with outliers is that they can have a negative effect on the repeated measures ANOVA, distorting the differences between the related groups (whether increasing or decreasing the scores on the dependent variable), which reduces the accuracy of the results. Fortunately, when using SPSS to run a repeated measures ANOVA, possible outliers can easily be detected.
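Outside SPSS, a quick screen of the difference scores can be done by hand. The sketch below flags points falling more than 1.5 interquartile ranges beyond the quartiles (the familiar boxplot rule); the data and the `iqr_outliers` helper are illustrative, not taken from the text.

```python
from statistics import quantiles

def iqr_outliers(diffs):
    """Flag difference scores outside 1.5 * IQR of the quartiles (boxplot rule)."""
    q1, _, q3 = quantiles(diffs, n=4)           # quartile cut points of the differences
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr     # boxplot whisker limits
    return [d for d in diffs if d < lo or d > hi]

# Hypothetical pre/post difference scores; 21 clearly breaks the usual pattern.
diffs = [2, 3, 1, 2, 4, 3, 2, 21]
print(iqr_outliers(diffs))  # -> [21]
```

A flagged score is not automatically discarded; as with the IQ example above, it should first be checked for data-entry error and substantive plausibility.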
4. Normally distributed dependent variable

The distribution of the differences in the dependent variable between the two or more related groups should be approximately normal. The repeated measures ANOVA requires only approximately normal data because it is quite "robust" to violations of normality, meaning that the assumption can be violated somewhat and still provide valid results. The Shapiro-Wilk test of normality, which is easily run in SPSS, can test this assumption (Laerd Statistics, 2013).

5. Sphericity

Sphericity means that the variances of the differences between all combinations of related groups are equal. Unfortunately, repeated measures ANOVAs are particularly susceptible to violating the assumption of sphericity, which causes the test to become too liberal (i.e., leads to an increase in the Type I error rate, the likelihood of detecting a statistically significant result when there isn't one) and to a loss of power that leads to an increase in Type II error (Discovering Statistics, n.d.). Mauchly's Test of Sphericity in SPSS tests whether the data have met or failed this assumption (Laerd Statistics, 2013).

Use of Repeated Measures ANOVA

In repeated measures ANOVA, the independent variable has categories called levels or related groups. Lowry (1999) states that in the correlated-samples ANOVA, the number of conditions is three or more: A|B|C, A|B|C|D, and so forth. Thus, for k = 3, each row of the layout below represents one subject measured under each of the k conditions:

Subject   A                         B                         C
1         subj1 under condition A   subj1 under condition B   subj1 under condition C
2         subj2 under condition A   subj2 under condition B   subj2 under condition C
3         subj3 under condition A   subj3 under condition B   subj3 under condition C
…and so on for all n subjects.

For instance, when measurements are repeated over time, such as when measuring changes in blood pressure due to an exercise-training program, the independent variable is time. Each level (or related group) is a specific time point; hence, for the exercise-training study, there would be three time points, and each time point would be a level of the independent variable.

Where measurements are made under different conditions, the conditions are the levels (or related groups) of the independent variable (e.g., type of cake is the independent variable, with chocolate, caramel, and lemon cake as the levels of the independent variable). A schematic of a
different-conditions repeated measures design follows the same layout as the table above. It should be noted that the levels of the independent variable are often referred to not as conditions but as treatments. The independent variable is more commonly referred to as the within-subjects factor (Laerd Statistics, 2013).

Formula for Repeated Measures ANOVA

The statistic used in repeated measures ANOVA is F, the same statistic as in simple ANOVA, but now computed using the ratio of the variation "within" the subjects to the "error" variation (Vincent & Weir, n.d.).
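The formula image itself does not survive in this copy of the paper; reconstructed from the definitions used in the worked example later on, the test statistic is:

```latex
F = \frac{MS_{\text{between}}}{MS_{\text{error}}}
  = \frac{SS_{\text{between}}/(k-1)}{SS_{\text{error}}/\left[(k-1)(n-1)\right]},
\qquad
SS_{\text{error}} = SS_{\text{within}} - SS_{\text{subjects}},
```

where k is the number of repeated measures and n is the number of subjects.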
The observed between-measures variance is an estimate of the variation between measures that would be expected in the population under the conditions of the study. The observed error variance is an estimate of the variation that would be expected to occur as a result of sampling error alone. If the observed (computed) value of F is significantly higher than the value expected from sampling variation alone, then the variance between groups is larger than would be expected from sampling error alone. In other words, at least one mean differs from the others enough to cause large variation between the measures (Vincent & Weir, n.d.).

The logic of repeated measures ANOVA is that any differences found between treatments can be explained by only two factors:
1. Treatment effect
2. Error or chance

A large value of F means that much of the overall variation in scores is due to the experimental manipulation rather than to random variation between participants (Sussex Edu, n.d.). A small value of F means that the variation in scores produced by the experimental manipulation is small compared with random variation between participants; the systematic variation is small relative to the random, "error" variation (Sussex Edu, n.d.).

Understanding One-Way Repeated-Measures ANOVA

In many studies using the one-way repeated-measures design, the levels of a within-subjects factor represent multiple observations on a scale over time or under different conditions. However, for some studies, levels of a within-subjects factor may represent scores from different scales, and the focus may be on evaluating differences in means among these scales. In such a setting the scales must be commensurable for the ANOVA significance tests to be meaningful.
That is, the scales must measure individuals on the same metric, and the difference scores between scales must be interpretable (Oak edu, n.d.).

In some studies, individuals are matched on one or more variables so that individuals within a set are similar on the matching variable(s), while individuals not in the same set are dissimilar. The number of individuals within a set is equal to the number of levels of the factor. The individuals within a set are then observed under the various levels of this factor. The matching process for these designs is likely to produce correlated responses on the dependent variable, like those of repeated-measures designs. Consequently, the data from these studies can be analyzed as if the factor were a within-subjects factor.

SPSS conducts a standard univariate F test if the within-subjects factor has only two levels. Three types of tests are conducted if the within-subjects factor has more than two levels: the standard univariate F test, alternative univariate tests, and multivariate tests. All three types of tests evaluate the same hypothesis: the population means are equal for all levels of the factor. The choice of which test to report should be made prior to viewing the results (Oak edu, n.d.). The choice of analysis depends on complex relationships between the degree of sphericity violation and sample size (Park, Cho & Ki, 2009).

The standard univariate ANOVA F test is not recommended when the within-subjects factor has more than two levels because one of its assumptions, the sphericity assumption, is commonly violated, and the ANOVA F test yields inaccurate p values to the extent that this assumption is violated.

The alternative univariate tests take into account violations of the sphericity assumption. These tests employ the same calculated F statistic as the standard univariate test, but its associated p value potentially differs. In determining the p value, an epsilon statistic is
calculated from the sample data to assess the degree to which the sphericity assumption is violated. The numerator and denominator degrees of freedom of the standard test are multiplied by epsilon to obtain a corrected set of degrees of freedom for the tabled F value and to determine its p value (Oak edu, n.d.).

The multivariate test does not require the assumption of sphericity. Difference scores are computed by comparing scores from different levels of the within-subjects factor. For example, for a within-subjects factor with three levels, difference scores might be computed between the first and second levels and between the second and third levels. The multivariate test then evaluates whether the population means for these two sets of difference scores are simultaneously equal to zero. This test evaluates not only the means associated with these two sets of difference scores but also whether the mean of the difference scores between the first and third levels of the factor is equal to zero, as well as linear combinations of these difference scores.

The SPSS repeated-measures procedure computes the difference scores used in the analysis for us. However, these difference scores do not become part of the data file, and we may therefore not be aware that the multivariate test is conducted on them. Applied statisticians tend to prefer the multivariate test to the standard or the alternative univariate test because the multivariate test and its follow-up tests have a close conceptual link to each other (Oak edu, n.d.).

If the initial hypothesis that the means are equal is rejected and there are more than two means, then follow-up tests are conducted to determine which of the means differ significantly from each other. Although more complex comparisons can be performed, most researchers choose to conduct pairwise comparisons. These comparisons may be evaluated with SPSS using
the paired-samples t test procedure, and a Bonferroni approach or Holm's sequential Bonferroni procedure can be used to control the Type I error rate across the multiple pairwise tests.

Hypothesis for Repeated Measures ANOVA

All three types of repeated measures ANOVA tests evaluate the same hypothesis (Khelifa, n.d.). The repeated measures ANOVA tests whether there are any differences between related population means. The null hypothesis (H0) states that the means are equal:

H0: µ1 = µ2 = µ3 = … = µk

where µ = population mean and k = number of related groups.

The alternative hypothesis (HA) states that the related population means are not equal (at least one mean is different from another mean):

HA: at least two means are significantly different

Example

An experimenter wants to look at the effects of practice on manual dexterity scores. Four people are randomly sampled and tested at three different times. Does practice change manual dexterity scores? Test with α = .05.

Person   Session 1   Session 2   Session 3   P
A        3           3           6           12
B        2           2           2           6
C        1           1           4           6
D        2           4           6           12
         T1 = 8      T2 = 10     T3 = 18     G = 36
k = 3, n = 4, N = 12, ΣX² = 140

Step 1: State the Hypotheses
H0: µ1 = µ2 = µ3
Ha: At least one treatment mean is different from the others.

Step 2: Determine Fcrit
If the example had been an independent-measures ANOVA, we would use df between and df within and find:
Fcrit(2, 9) at α = .05 → 4.26
However, the example is a repeated measures ANOVA, so we use df between and df error and find:
Fcrit(2, 6) at α = .05 → 5.14

Step 3: Compute the Statistic
For a repeated measures ANOVA:

SS_total = ΣX² − G²/N = 140 − 36²/12 = 140 − 108 = 32
SS_between = ΣT²/n − G²/N = (8² + 10² + 18²)/4 − 108 = 122 − 108 = 14
SS_within = ΣX² − ΣT²/n = 140 − 122 = 18
SS_subjects = ΣP²/k − G²/N = (12² + 6² + 6² + 12²)/3 − 108 = 120 − 108 = 12
SS_error = SS_within − SS_subjects = 18 − 12 = 6

df_between = k − 1 = 2
df_within = N − k = 9
df_subjects = n − 1 = 3
df_error = (k − 1)(n − 1) = 6
df_total = N − 1 = 11
Source     SS    df    MS    Fobt    Fcrit
Between    14    2     7     7.0     5.14
Within     18    9
Subjects   12    3
Error      6     6     1
Total      32    11

Conclusion: Since Fobt = 7.0 exceeds Fcrit = 5.14, reject H0; at least one session mean differs from the others.

Alternate test

The Friedman test is a non-parametric statistical test used to detect differences in treatments across multiple test attempts. The Friedman analysis of variance by ranks is an alternative to one-way repeated measures ANOVA when the dependent variable is not normally distributed. When using the Friedman test it is important to use a sample size of at least 12 participants to obtain accurate p values.

Advantages

Repeated measures is the method of using the same participants in different experimental manipulations (Field, 2011). A repeated measures design, in using the same participants for all manipulations, allows the researcher to exclude the effects of individual differences that could occur in independent groups (Howitt & Crammer, 2011). Factors such as IQ, ability, age, and other important variables remain the same (Field, 2011). Because the same participants are used, it requires fewer participants than other designs, such as independent designs.

The important point is that a small but consistent difference can be detected in the face of large overall differences among the subjects. Indeed, the difference between conditions is very
small relative to the differences among subjects. It is because the conditions can be compared within each of the subjects that the small difference becomes apparent. Differences between subjects are taken into account and are therefore not error.

Removing variance due to differences between subjects from the error variance greatly increases the power of significance tests. Therefore, within-subjects designs are almost always more powerful than between-subjects designs. Since power is such an important consideration in the design of experiments, within-subjects designs are generally preferable to between-subjects designs. This design is also very economical, as sample members are recruited only once for treatment administration (Choudhury, 2009).

Disadvantages

The use of the same participants leads to difficulties in counteracting problems of order effects and a need for additional materials. An effect observed could be due to boredom affecting concentration and performance, such as reaction time and accuracy, caused by repetition (Pan, Shell & Schleifer, 1994; Bergh & Vrana, 1998; Dsowen, 2011). Effects could also be due to practice causing participants' results to improve because they were given more chances to practice and become familiar with the task (Collie, Maruff, Darby & McStephen, 2003; Dsowen, 2011). The order effects of an experiment can be reduced by counterbalancing (Field, 2011). This involves randomly assigning the order of the experimental manipulations participants are exposed to. For example, half of the participants would be exposed to Condition A and then Condition B, and the other half exposed to Condition B and then Condition A (Howitt & Crammer, 2011; Dsowen, 2011). The results collected should then be less affected by factors such as boredom and practice. Researchers can also provide opportunities to take a break
during the experiment to counteract boredom and loss of concentration (Pan, Shell & Schleifer, 1994; Dsowen, 2011).

If a study were testing how Factor A and Factor B affected memory, the researcher would require a different list of words for participants to memorize for Factors A and B, whereas in independent groups the same list could be used for each factor, because each group only sees the material once (Nilsson, Soil & Sullivan, 1994). Using repeated measures reduces the individual differences of participants, but it instead produces problems with individual differences between the materials or environments the participants are exposed to. Therefore, a result may be due to these differences in materials rather than to the independent variables in question. The materials must be carefully examined to ensure equal quality in factors such as difficulty (Riedel, Klaassen, Deutz, Someren & Praag, 1999; Dsowen, 2011).

Conclusion

The advantages and disadvantages of repeated measures must be weighed against the benefits of using independent groups. Each study must give careful consideration to which design would best meet its needs. Problems related to the design must be reduced so that they have as little effect on the results as possible. No method is without difficulty, and the researcher must decide which would best produce results for the study under investigation (Dsowen, 2011).
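The manual dexterity example worked through above can be checked with a short script. The sketch below computes the repeated measures sums of squares and F ratio from scratch, using only the Python standard library, and reproduces the values in the summary table (SS_between = 14, SS_error = 6, F = 7.0).

```python
# Repeated measures ANOVA by hand, reproducing the manual dexterity example:
# 4 subjects (rows) measured in 3 sessions (columns).
scores = [
    [3, 3, 6],   # Person A
    [2, 2, 2],   # Person B
    [1, 1, 4],   # Person C
    [2, 4, 6],   # Person D
]

n = len(scores)        # subjects: 4
k = len(scores[0])     # measures: 3
N = n * k              # observations: 12

G = sum(sum(row) for row in scores)                     # grand total: 36
sum_x2 = sum(x * x for row in scores for x in row)      # sum of squared scores: 140
T = [sum(row[j] for row in scores) for j in range(k)]   # session totals: 8, 10, 18
P = [sum(row) for row in scores]                        # person totals: 12, 6, 6, 12

ss_total = sum_x2 - G**2 / N                          # 140 - 108 = 32
ss_between = sum(t**2 for t in T) / n - G**2 / N      # 122 - 108 = 14
ss_within = sum_x2 - sum(t**2 for t in T) / n         # 140 - 122 = 18
ss_subjects = sum(p**2 for p in P) / k - G**2 / N     # 120 - 108 = 12
ss_error = ss_within - ss_subjects                    # 18 - 12 = 6

df_between = k - 1              # 2
df_error = (k - 1) * (n - 1)    # 6
F = (ss_between / df_between) / (ss_error / df_error)  # 7.0 / 1.0 = 7.0

print(F)  # -> 7.0
```

With SciPy installed, `scipy.stats.f.sf(F, df_between, df_error)` could supply a p value, but the critical-value comparison in the example (7.0 > 5.14) already settles the test at α = .05.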