Editorial Board

Volume 1
History of Psychology
Donald K. Freedheim, PhD
Case Western Reserve University
Cleveland, Ohio

Volume 2
Research Methods in Psychology
John A. Schinka, PhD
University of South Florida
Tampa, Florida
Wayne F. Velicer, PhD
University of Rhode Island
Kingston, Rhode Island

Volume 3
Biological Psychology
Michela Gallagher, PhD
Johns Hopkins University
Baltimore, Maryland
Randy J. Nelson, PhD
Ohio State University
Columbus, Ohio

Volume 4
Experimental Psychology
Alice F. Healy, PhD
University of Colorado
Boulder, Colorado
Robert W. Proctor, PhD
Purdue University
West Lafayette, Indiana

Volume 5
Personality and Social Psychology
Theodore Millon, PhD
Institute for Advanced Studies in Personology and Psychopathology
Coral Gables, Florida
Melvin J. Lerner, PhD
Florida Atlantic University
Boca Raton, Florida

Volume 6
Developmental Psychology
Richard M. Lerner, PhD
M. Ann Easterbrooks, PhD
Jayanthi Mistry, PhD
Tufts University
Medford, Massachusetts

Volume 7
Educational Psychology
William M. Reynolds, PhD
Humboldt State University
Arcata, California
Gloria E. Miller, PhD
University of Denver
Denver, Colorado

Volume 8
Clinical Psychology
George Stricker, PhD
Adelphi University
Garden City, New York
Thomas A. Widiger, PhD
University of Kentucky
Lexington, Kentucky

Volume 9
Health Psychology
Arthur M. Nezu, PhD
Christine Maguth Nezu, PhD
Pamela A. Geller, PhD
Drexel University
Philadelphia, Pennsylvania

Volume 10
Assessment Psychology
John R. Graham, PhD
Kent State University
Kent, Ohio
Jack A. Naglieri, PhD
George Mason University
Fairfax, Virginia

Volume 11
Forensic Psychology
Alan M. Goldstein, PhD
John Jay College of Criminal Justice–CUNY
New York, New York

Volume 12
Industrial and Organizational Psychology
Walter C. Borman, PhD
University of South Florida
Tampa, Florida
Daniel R. Ilgen, PhD
Michigan State University
East Lansing, Michigan
Richard J. Klimoski, PhD
George Mason University
Fairfax, Virginia
My efforts in this work are proudly dedicated to Katherine, Christy, and John C. Schinka.
J. A. S.

This work is dedicated to Sue, the perfect companion for life’s many journeys and the center of my personal universe.
W. F. V.
Handbook of Psychology Preface

Psychology at the beginning of the twenty-first century has become a highly diverse field of scientific study and applied technology. Psychologists commonly regard their discipline as the science of behavior, and the American Psychological Association has formally designated 2000 to 2010 as the “Decade of Behavior.” The pursuits of behavioral scientists range from the natural sciences to the social sciences and embrace a wide variety of objects of investigation. Some psychologists have more in common with biologists than with most other psychologists, and some have more in common with sociologists than with most of their psychological colleagues. Some psychologists are interested primarily in the behavior of animals, some in the behavior of people, and others in the behavior of organizations. These and other dimensions of difference among psychological scientists are matched by equal if not greater heterogeneity among psychological practitioners, who currently apply a vast array of methods in many different settings to achieve highly varied purposes.

Psychology has been rich in comprehensive encyclopedias and in handbooks devoted to specific topics in the field. However, there has not previously been any single handbook designed to cover the broad scope of psychological science and practice. The present 12-volume Handbook of Psychology was conceived to occupy this place in the literature. Leading national and international scholars and practitioners have collaborated to produce 297 authoritative and detailed chapters covering all fundamental facets of the discipline, and the Handbook has been organized to capture the breadth and diversity of psychology and to encompass interests and concerns shared by psychologists in all branches of the field.

Two unifying threads run through the science of behavior. The first is a common history rooted in conceptual and empirical approaches to understanding the nature of behavior. The specific histories of all specialty areas in psychology trace their origins to the formulations of the classical philosophers and the methodology of the early experimentalists, and appreciation for the historical evolution of psychology in all of its variations transcends individual identities as being one kind of psychologist or another. Accordingly, Volume 1 in the Handbook is devoted to the history of psychology as it emerged in many areas of scientific study and applied technology.

A second unifying thread in psychology is a commitment to the development and utilization of research methods suitable for collecting and analyzing behavioral data. With attention both to specific procedures and their application in particular settings, Volume 2 addresses research methods in psychology.

Volumes 3 through 7 of the Handbook present the substantive content of psychological knowledge in five broad areas of study: biological psychology (Volume 3), experimental psychology (Volume 4), personality and social psychology (Volume 5), developmental psychology (Volume 6), and educational psychology (Volume 7). Volumes 8 through 12 address the application of psychological knowledge in five broad areas of professional practice: clinical psychology (Volume 8), health psychology (Volume 9), assessment psychology (Volume 10), forensic psychology (Volume 11), and industrial and organizational psychology (Volume 12). Each of these volumes reviews what is currently known in these areas of study and application and identifies pertinent sources of information in the literature. Each discusses unresolved issues and unanswered questions and proposes future directions in conceptualization, research, and practice. Each of the volumes also reflects the investment of scientific psychologists in practical applications of their findings and the attention of applied psychologists to the scientific basis of their methods.

The Handbook of Psychology was prepared for the purpose of educating and informing readers about the present state of psychological knowledge and about anticipated advances in behavioral science research and practice. With this purpose in mind, the individual Handbook volumes address the needs and interests of three groups. First, for graduate students in behavioral science, the volumes provide advanced instruction in the basic concepts and methods that define the fields they cover, together with a review of current knowledge, core literature, and likely future developments. Second, in addition to serving as graduate textbooks, the volumes offer professional psychologists an opportunity to read and contemplate the views of distinguished colleagues concerning the central thrusts of research and leading edges of practice in their respective fields. Third, for psychologists seeking to become conversant with fields outside their own specialty and for persons outside of psychology seeking information about psychological matters, the Handbook volumes serve as a reference source for expanding their knowledge and directing them to additional sources in the literature.

The preparation of this Handbook was made possible by the diligence and scholarly sophistication of the 25 volume editors and co-editors who constituted the Editorial Board. As Editor-in-Chief, I want to thank each of them for the pleasure of their collaboration in this project. I compliment them for having recruited an outstanding cast of contributors to their volumes and then working closely with these authors to achieve chapters that will stand each in their own right as valuable contributions to the literature. I would like finally to express my appreciation to the editorial staff of John Wiley and Sons for the opportunity to share in the development of this project and its pursuit to fruition, most particularly to Jennifer Simon, Senior Editor, and her two assistants, Mary Porterfield and Isabel Pratt. Without Jennifer’s vision of the Handbook and her keen judgment and unflagging support in producing it, the occasion to write this preface would not have arrived.

IRVING B. WEINER
Tampa, Florida
Volume Preface

A scientific discipline is defined in many ways by the research methods it employs. These methods can be said to represent the common language of the discipline’s researchers. Consistent with the evolution of a lexicon, new research methods frequently arise from the development of new content areas. By every available measure (number of researchers, number of publications, number of journals, number of new subdisciplines), psychology has undergone tremendous growth over the last half-century. This growth is reflected in a parallel increase in the number of new research methods available.

As we were planning and editing this volume, we discussed on many occasions the extent to which psychology and the available research methods have become increasingly complex over the course of our careers. When our generation of researchers began their careers in the late 1960s and early 1970s, experimental design was largely limited to simple between-group designs, and data analysis was dominated by a single method, the analysis of variance. A few other approaches were employed, but only by a limited number of researchers. Multivariate statistics had been developed, but multiple regression analysis was the only method that was applied with any frequency. Factor analysis was used almost exclusively as a method in scale development. Classical test theory was the basis of most psychological and educational measures. For most researchers, the analysis of data from studies that did not meet either the design or the measurement assumptions required for an analysis of variance was covered by a single book on nonparametric statistics by Siegel (1956). As a review of the contents of this volume illustrates, the choice of experimental and analytic methods available to the present-day researcher is much broader. It would be fair to say that the researcher in the 1960s had to formulate research questions to fit the available methods. Currently, there are research methods available to address most research questions.

In the history of science, an explosion of knowledge is usually the result of an advance in technology, new theoretical models, or unexpected empirical findings. Advances in research methods have occurred as the result of all three factors, typically in an interactive manner. Some of the specific factors include advances in instrumentation and measurement technology, the availability of inexpensive desktop computers to perform complex methods of data analysis, increased computer capacity allowing for more intense analysis of larger datasets, computer simulations that permit the evaluation of procedures across a wide variety of situations, new approaches to data analysis and statistical control, and advances in companion sciences that opened pathways to the exploration of behavior and created new areas of research specialization and collaboration.

Consider the advances since the publication of the first edition of Kirk’s (1968) text on experimental design. At that time most studies were relatively small-N experiments conducted in psychology laboratories. Research activity has subsequently exploded in applied and clinical areas, with a proliferation of new journals largely dedicated to quasi-experimental studies and studies in the natural environment (e.g., in neuropsychology and health psychology). Techniques such as polymerase chain reaction allow psychologists to test specific genes as risk candidates for behavioral disorders. These studies rely on statistical procedures that are still largely ignored by many researchers (e.g., logistic regression, structural equation modeling). Brain imaging procedures such as magnetic resonance imaging, magnetoencephalography, and positron-emission tomography provide cognitive psychologists and neuropsychologists with the opportunity to study cortical activity on-line. Clinical trials involving behavioral interventions applied to large, representative samples are commonplace in health psychology. Research employing each of these procedures requires not only highly specific and rigorous research methods but also special methods for handling and analyzing extremely large volumes of data. Even in more traditional areas of research that continue to rely on group experimental designs, issues of measuring practical significance, determination of sample size and power, and procedures for handling nuisance variables are now important concerns. Not surprisingly, the third edition of Kirk’s (1995) text has grown in page length by 60%.

Our review of these trends leads to several conclusions, which are reflected in the selection of topics covered by the chapters in this volume. Six features appear to characterize the evolution in research methodology in psychology.

First, there has been a focus on the development of procedures that employ statistical control rather than experimental control. Because most of the recent growth involves research in areas that preclude direct control of independent variables, multivariate statistics and the development of methods such as path analysis and structural equation modeling have been critical developments. The use of statistical control has allowed psychology to move from the carefully controlled confines of the laboratory to the natural environment.

Second, there has been an increasing focus on construct-driven, or latent-variable, research. A construct is defined by multiple observed variables. Constructs can be viewed as more reliable and more generalizable than a single observed variable. Constructs serve to organize a large set of observed variables, resulting in parsimony. Constructs are also theoretically based. This theory-based approach serves to guide study design, the choice of variables, the data analysis, and the data interpretation.

Third, there has been an increasing emphasis on the development of new measures and new measurement models. This is not a new trend but an acceleration of an old trend. The behavioral sciences have always placed the most emphasis on the issue of measurement. With the movement of the field out of the laboratory, combined with advances in technology, the repertoire of measures, the quality of the measures, and the sophistication of the measurement models have all increased dramatically.

Fourth, there is increasing recognition of the importance of the temporal dimension in understanding a broad range of psychological phenomena. We have become a more intervention-oriented science, recognizing not only the complexity of treatment effects but also the importance of the change in patterns of the effects over time. The effects of an intervention may be very different at different points in time. New statistical models for temporal data have been developed in response.

Fifth, new methods of analysis have been developed that no longer require the assumption of a continuous, equal-interval, normally distributed variable. Previously, researchers had the choice between very simple but limited methods of data analysis that corresponded to the properties of the measure or more complex, sophisticated methods of analysis that assumed, often inappropriately, that the measure met very rigid assumptions. New methods have been developed for categorical, ordinal, or simply nonnormal variables that can perform an equally sophisticated analysis.

Sixth, the importance of individual differences is increasingly emphasized in intervention studies. Psychology has always been interested in individual differences, but methods of data analysis have focused almost entirely on the relationships between variables. Individuals were studied as members of groups, and individual differences served only to inflate the error variance. New techniques permit researchers to focus on the individual and model individual differences. This becomes increasingly important as we recognize that interventions do not affect everyone in exactly the same way and that interventions are becoming more and more tailored to the individual.

The text is organized into four parts. The first part, titled “Foundations of Research,” addresses issues that are fundamental to all behavioral science research. The focus is on study design, data management, data reduction, and data synthesis. The first chapter, “Experimental Design” by Roger E. Kirk, provides an overview of the basic considerations that go into the design of a study. Once, a chapter on this topic would have had to devote a great deal of attention to computational procedures. The availability of computers permits a shift in focus to the conceptual rather than the computational issues. The second chapter, “Exploratory Data Analysis” by John T. Behrens and Chong-ho Yu, reminds us of the fundamental importance of looking at data in the most basic ways as a first step in any data analysis. In some ways this represents a “back to the future” chapter. Advances in computer-based graphical methods have brought a great deal of sophistication to this very basic first step.

The third chapter, “Power: Basics, Practical Problems, and Possible Solutions” by Rand R. Wilcox, reflects the critical change in focus for psychological research. Originally, the central focus of a test of significance was on controlling Type I error rates. The late Jacob Cohen emphasized that researchers should be equally concerned with Type II errors. This resulted in an emphasis on the careful planning of a study and a concern with effect size and selecting the appropriate sample size. Wilcox updates and extends these concepts. Chapter 4, “Methods for Handling Missing Data” by John W. Graham, Patricio E. Cumsille, and Elvira Elek-Fisk, describes the impressive statistical advances in addressing the common practical problem of missing observations. Previously, researchers had relied on a series of ad hoc procedures, often resulting in very inaccurate estimates. The new statistical procedures allow the researcher to articulate the assumptions about the reason the data are missing and to make very sophisticated estimates of the missing values based on all the available information. This topic has taken on even more importance with the increasing emphasis on longitudinal studies and the inevitable problem of attrition.

The fifth chapter, “Preparatory Data Analysis” by Linda S. Fidell and Barbara G. Tabachnick, describes methods of preprocessing data before the application of other methods of statistical analysis. Extreme values can distort the results of the data analysis if not addressed. Diagnostic methods can preprocess the data so that complex procedures are not unduly affected by a limited number of cases that often are the result of some type of error. The last two chapters in this part, “Factor Analysis” by Richard L. Gorsuch and “Clustering and Classification Methods” by Glenn W. Milligan and Stephen C. Hirtle, describe two widely employed parsimony methods. Factor analysis operates in the variable domain and attempts to reduce a set of p observed variables to a smaller set of m factors. These factors, or latent variables, are fewer in number than the observed variables and are more easily interpreted. Cluster analysis operates in the person domain and attempts to reduce a set of N individuals to a set of k clusters. Cluster analysis serves to explore the relationships among individuals and organize the set of individuals into a limited number of subtypes that share essential features. These methods are basic to the development of construct-driven methods and the focus on individual differences.

The second part, “Research Methods in Specific Content Areas,” addresses research methods and issues as they apply to specific content areas. Content areas were chosen in part to parallel the other volumes of the Handbook. More important, however, we attempted to sample content areas from a broad spectrum of specialization with the hope that these chapters would provide insights into methodological concerns and solutions that would generalize to other areas. Chapter 8, “Clinical Forensic Psychology” by Kevin S. Douglas, Randy K. Otto, and Randy Borum, addresses research methods and issues that occur in assessment and treatment contexts. For each task that is unique to clinical forensic psychology research, they provide examples of the clinical challenges confronting the psychologist, identify problems faced when researching the issues or constructs, and describe not only research strategies that have been employed but also their strengths and limitations. In Chapter 9, “Psychotherapy Outcome Research,” Evelyn S. Behar and Thomas D. Borkovec address the methodological issues that need to be considered for investigators to draw the strongest and most specific cause-and-effect conclusions about the active components of treatments, human behavior, and the effectiveness of therapeutic interventions.

The field of health psychology is largely defined by three topics: the role of behavior (e.g., smoking) in the development and prevention of disease, the role of stress and emotion as psychobiological influences on disease, and psychological aspects of acute and chronic illness and medical care. Insight into the methodological issues and solutions for research in each of these topical areas is provided by Timothy W. Smith in Chapter 10, “Health Psychology.”

At one time, most behavioral experimentation was conducted by individuals whose training focused heavily on animal research. Now many neuroscientists, trained in various fields, conduct research in animal learning and publish findings that are of interest to psychologists in many fields. The major goal of Chapter 11, “Animal Learning” by Russell M. Church, is to transfer what is fairly common knowledge in experimental animal psychology to investigators with limited exposure to this area of research. In Chapter 12, “Neuropsychology,” Russell M. Bauer, Elizabeth C. Leritz, and Dawn Bowers provide a discussion of neuropsychological inference, an overview of major approaches to neuropsychological research, and a review of newer techniques, including functional neuroimaging, electrophysiology, magnetoencephalography, and reversible lesion methods. In each section, they describe the conceptual basis of the technique, outline its strengths and weaknesses, and cite examples of how it has been used in addressing conceptual problems in neuropsychology.

Whatever their specialty area, when psychologists evaluate a program or policy, the question of impact is often at center stage. The last chapter in this part, “Program Evaluation” by Melvin M. Mark, focuses on key methods for estimating the effects of policies and programs in the context of evaluation. Additionally, Mark addresses several noncausal forms of program evaluation research that are infrequently addressed in methodological treatises.

The third part is titled “Measurement Issues.” Advances in measurement typically combine innovation in technology and progress in theory. As our measures become more sophisticated, the areas of application also increase.

Mood emerged as a seminal concept within psychology during the 1980s, and its prominence has continued unabated ever since. In Chapter 14, “Mood Measurement: Current Status and Future Directions,” David Watson and Jatin Vaidya examine current research regarding the underlying structure of mood, describe and evaluate many of the most important mood measures, and discuss several issues related to the reliability and construct validity of mood measurement. In Chapter 15, “Measuring Personality and Psychopathology,” Leslie C. Morey uses objective self-report methods of measurement to illustrate contemporary procedures for scale development and validation, addressing issues critical to all measurement methods, such as theoretical articulation, situational context, and the need for discriminant validity.

The appeal of circular models lies in the combination of a circle’s aesthetic (organizational) simplicity and its powerful potential to describe data in uniquely compelling substantive and geometric ways, as has been demonstrated in describing interpersonal behavior and occupational interests. In Chapter 16, “The Circumplex Model: Methods and Research Applications,” Michael B. Gurtman and Aaron L. Pincus discuss the application of the circumplex model to the descriptions of individuals, comparisons of groups, and evaluations of constructs and their measures.
Chapter 17, “Item Response Theory and Measuring Abilities” by Karen M. Schmidt and Susan E. Embretson, describes the types of formal models that have been designed to guide measure development. For many years, most tests of ability and achievement have relied on classical test theory as a framework to guide both measure development and measure evaluation. Item response theory updates this model in many important ways, permitting the development of a new generation of measures of abilities and achievement that are particularly appropriate for a more interactive model of assessment. The last chapter of this part, “Growth Curve Analysis in Contemporary Psychological Research” by John J. McArdle and John R. Nesselroade, describes new quantitative methods for the study of change in developmental psychology. The methods permit the researcher to model a wide variety of different patterns of developmental change over time.

The final part, “Data Analysis Methods,” addresses statistical procedures that have been developed recently and are still not widely employed by many researchers. They are typically dependent on the availability of high-speed computers and permit researchers to investigate novel and complex research questions. Chapter 19, “Multiple Linear Regression” by Leona Aiken, Stephen G. West, and Steven C. Pitts, describes the advances in multiple linear regression that permit applications of this very basic method to the analysis of complex data sets and the incorporation of conceptual models to guide the analysis. The testing of theoretical predictions and the identification of implementation problems are the two major foci of this chapter. Chapter 20, “Logistic Regression” by Alfred DeMaris, describes a method parallel to multiple regression analysis for categorical outcome variables. The procedure has been developed primarily outside of psychology and is now being used much more frequently to address psychological questions. Chapter 21, “Meta-Analysis” by Frank L. Schmidt and John E. Hunter, describes procedures that have been developed for the quantitative integration of research findings across multiple studies. Previously, research findings were integrated in narrative form and were subject to the biases of the reviewer. The method also focuses attention on the importance of effect size estimation.

Chapter 22, “Survival Analysis” by Judith D. Singer and John B. Willett, describes a recently developed method for analyzing longitudinal data. One approach is to code whether an event has occurred at a given occasion. By switching the focus to the time to the occurrence of the event, a much more powerful and sophisticated analysis can be performed. Again, the development of this procedure has occurred largely outside psychology, but it is being employed much more frequently. In Chapter 23, “Time Series Analysis,” Wayne Velicer and Joseph L. Fava describe a method for studying the change in a single individual over time. Instead of a single observation on many subjects, this method relies on many observations on a single subject. In many ways, this method is the prime exemplar of longitudinal research methods.

Chapter 24, “Structural Equation Modeling” by Jodie B. Ullman and Peter M. Bentler, describes a very general method that combines three key themes: constructs or latent variables, statistical control, and theory to guide data analysis. First employed as an analytic method little more than 20 years ago, the method is now widely disseminated in the behavioral sciences. Chapter 25, “Ordinal Analysis of Behavioral Data” by Jeffrey D. Long, Du Feng, and Norman Cliff, discusses the assumptions that underlie many of the widely used statistical methods and describes a parallel series of methods of analysis that assume only that the measure provides ordinal information. The last chapter, “Latent Class and Latent Transition Analysis” by Stephanie T. Lanza, Brian P. Flaherty, and Linda M. Collins, describes a new method for analyzing change over time. It is particularly appropriate when the change process can be conceptualized as a series of discrete states.

In completing this project, we realized that we were very fortunate in several ways. Irving Weiner’s performance as editor-in-chief was simply wonderful. He applied just the right mix of obsessive concern and responsive support to keep things on schedule. His comments on issues of emphasis, perspective, and quality were insightful and inevitably on target.

We continue to be impressed with the professionalism of the authors that we were able to recruit into this effort. Consistent with their reputations, these individuals delivered chapters of exceptional quality, making our burden pale in comparison to other editorial experiences. Because of the length of the project, we shared many contributors’ experiences: marriages, births, illnesses, family crises. A definite plus for us has been the formation of new friendships and professional liaisons.

Our editorial tasks were also aided greatly by the generous assistance of our reviewers, most of whom will be quickly recognized by our readers for their own expertise in research methodology. We are pleased to thank James Algina, Phipps Arabie, Patti Barrows, Betsy Jane Becker, Lisa M. Brown, Barbara M. Byrne, William F. Chaplin, Pat Cohen, Patrick J. Curran, Glenn Curtiss, Richard B. Darlington, Susan Duncan, Brian Everitt, Kerry Evers, Ron Gironda, Lisa Harlow, Michael R. Harwell, Don Hedeker, David Charles Howell, Lawrence J. Hubert, Bradley E. Huitema, Beth Jenkins, Herbert W. Marsh, Rosemarie A. Martin, Scott E. Maxwell, Kevin R. Murphy, Gregory Norman, Daniel J. Ozer, Melanie Page, Mark D. Reckase, Charles S. Reichardt, Steven Reise, Joseph L. Rogers, Joseph Rossi, James Rounds, Shlomo S. Sawilowsky, Ian Spence, James H. Steiger, Xiaowu Sun, Randall C. Swaim, David Thissen, Bruce Thompson, Terence J. G. Tracey, Rod Vanderploeg, Paul F. Velleman, Howard Wainer, Douglas Williams, and several anonymous reviewers for their thorough work and good counsel.

We finish this preface with a caveat. Readers will inevitably discover several contradictions or disagreements across the chapter offerings. Inevitably, researchers in different areas solve similar methodological problems in different ways. These differences are reflected in the offerings of this text, and we have not attempted to mediate these differing viewpoints. Rather, we believe that the serious researcher will welcome the opportunity to review solutions suggested or supported by differing approaches. For flaws in the text, however, the usual rule applies: We assume all responsibility.

JOHN A. SCHINKA
WAYNE F. VELICER

REFERENCES

Kirk, R. E. (1968). Experimental design: Procedures for the behavioral sciences. Pacific Grove, CA: Brooks/Cole.
Kirk, R. E. (1995). Experimental design: Procedures for the behavioral sciences (3rd ed.). Pacific Grove, CA: Brooks/Cole.
Siegel, S. (1956). Nonparametric statistics for the behavioral sciences. New York: McGraw-Hill.
Contents

Handbook of Psychology Preface
Irving B. Weiner

Volume Preface
John A. Schinka and Wayne F. Velicer

Contributors

PART ONE: FOUNDATIONS OF RESEARCH ISSUES: STUDY DESIGN, DATA MANAGEMENT, DATA REDUCTION, AND DATA SYNTHESIS

1 EXPERIMENTAL DESIGN
Roger E. Kirk

2 EXPLORATORY DATA ANALYSIS
John T. Behrens and Chong-ho Yu

3 POWER: BASICS, PRACTICAL PROBLEMS, AND POSSIBLE SOLUTIONS
Rand R. Wilcox

4 METHODS FOR HANDLING MISSING DATA
John W. Graham, Patricio E. Cumsille, and Elvira Elek-Fisk

5 PREPARATORY DATA ANALYSIS
Linda S. Fidell and Barbara G. Tabachnick

6 FACTOR ANALYSIS
Richard L. Gorsuch

7 CLUSTERING AND CLASSIFICATION METHODS
Glenn W. Milligan and Stephen C. Hirtle

PART TWO: RESEARCH METHODS IN SPECIFIC CONTENT AREAS

8 CLINICAL FORENSIC PSYCHOLOGY
Kevin S. Douglas, Randy K. Otto, and Randy Borum

9 PSYCHOTHERAPY OUTCOME RESEARCH
Evelyn S. Behar and Thomas D. Borkovec

10 HEALTH PSYCHOLOGY
Timothy W. Smith

11 ANIMAL LEARNING
Russell M. Church

12 NEUROPSYCHOLOGY
Russell M. Bauer, Elizabeth C. Leritz, and Dawn Bowers

13 PROGRAM EVALUATION
Melvin M. Mark

PART THREE: MEASUREMENT ISSUES

14 MOOD MEASUREMENT: CURRENT STATUS AND FUTURE DIRECTIONS
David Watson and Jatin Vaidya

15 MEASURING PERSONALITY AND PSYCHOPATHOLOGY
Leslie C. Morey

16 THE CIRCUMPLEX MODEL: METHODS AND RESEARCH APPLICATIONS
Michael B. Gurtman and Aaron L. Pincus

17 ITEM RESPONSE THEORY AND MEASURING ABILITIES
Karen M. Schmidt and Susan E. Embretson

18 GROWTH CURVE ANALYSIS IN CONTEMPORARY PSYCHOLOGICAL RESEARCH
John J. McArdle and John R. Nesselroade

PART FOUR: DATA ANALYSIS METHODS

19 MULTIPLE LINEAR REGRESSION
Leona S. Aiken, Stephen G. West, and Steven C. Pitts

20 LOGISTIC REGRESSION
Alfred DeMaris

21 META-ANALYSIS
Frank L. Schmidt and John E. Hunter

22 SURVIVAL ANALYSIS
Judith D. Singer and John B. Willett

23 TIME SERIES ANALYSIS
Wayne F. Velicer and Joseph L. Fava

24 STRUCTURAL EQUATION MODELING
Jodie B. Ullman and Peter M. Bentler

25 ORDINAL ANALYSIS OF BEHAVIORAL DATA
Jeffrey D. Long, Du Feng, and Norman Cliff

26 LATENT CLASS AND LATENT TRANSITION ANALYSIS
Stephanie T. Lanza, Brian P. Flaherty, and Linda M. Collins

Author Index

Subject Index
Contributors

Leona S. Aiken, PhD
Department of Psychology
Arizona State University
Tempe, Arizona

Russell M. Bauer, PhD
Department of Clinical and Health Psychology
University of Florida
Gainesville, Florida

Evelyn S. Behar, MS
Department of Psychology
Pennsylvania State University
University Park, Pennsylvania

John T. Behrens, PhD
Cisco Networking Academy Program
Cisco Systems, Inc.
Phoenix, Arizona

Peter M. Bentler, PhD
Department of Psychology
University of California
Los Angeles, California

Thomas D. Borkovec, PhD
Department of Psychology
Pennsylvania State University
University Park, Pennsylvania

Randy Borum, PsyD
Department of Mental Health Law & Policy
Florida Mental Health Institute
University of South Florida
Tampa, Florida

Dawn Bowers, PhD
Department of Clinical and Health Psychology
University of Florida
Gainesville, Florida

Russell M. Church, PhD
Department of Psychology
Brown University
Providence, Rhode Island

Norman Cliff, PhD
Professor of Psychology Emeritus
University of Southern California
Los Angeles, California

Linda M. Collins, PhD
The Methodology Center
Pennsylvania State University
University Park, Pennsylvania

Patricio E. Cumsille, PhD
Escuela de Psicologia
Universidad Católica de Chile
Santiago, Chile

Alfred DeMaris, PhD
Department of Sociology
Bowling Green State University
Bowling Green, Ohio

Kevin S. Douglas, PhD, LLB
Department of Mental Health Law & Policy
Florida Mental Health Institute
University of South Florida
Tampa, Florida

Du Feng, PhD
Human Development and Family Studies
Texas Tech University
Lubbock, Texas

Elvira Elek-Fisk, PhD
The Methodology Center
Pennsylvania State University
University Park, Pennsylvania

Susan E. Embretson, PhD
Department of Psychology
University of Kansas
Lawrence, Kansas

Joseph L. Fava, PhD
Cancer Prevention Research Center
University of Rhode Island
Kingston, Rhode Island
Linda S. Fidell, PhD
Department of Psychology
California State University
Northridge, California

Brian P. Flaherty, MS
The Methodology Center
Pennsylvania State University
University Park, Pennsylvania

Richard L. Gorsuch, PhD
Graduate School of Psychology
Fuller Theological Seminary
Pasadena, California

John W. Graham, PhD
Department of Biobehavioral Health
Pennsylvania State University
University Park, Pennsylvania

Michael B. Gurtman, PhD
Department of Psychology
University of Wisconsin-Parkside
Kenosha, Wisconsin

Stephen C. Hirtle, PhD
School of Information Sciences
University of Pittsburgh
Pittsburgh, Pennsylvania

John E. Hunter, PhD
Department of Psychology
Michigan State University
East Lansing, Michigan

Roger E. Kirk, PhD
Department of Psychology and Neuroscience
Baylor University
Waco, Texas

Stephanie T. Lanza, MS
The Methodology Center
Pennsylvania State University
University Park, Pennsylvania

Elizabeth C. Leritz, MS
Department of Clinical and Health Psychology
University of Florida
Gainesville, Florida

Jeffrey D. Long, PhD
Department of Educational Psychology
University of Minnesota
Minneapolis, Minnesota

Melvin M. Mark, PhD
Department of Psychology
Pennsylvania State University
University Park, Pennsylvania

John J. McArdle, PhD
Department of Psychology
University of Virginia
Charlottesville, Virginia

Glenn W. Milligan, PhD
Department of Management Sciences
Ohio State University
Columbus, Ohio

Leslie C. Morey, PhD
Department of Psychology
Texas A&M University
College Station, Texas

John R. Nesselroade, PhD
Department of Psychology
University of Virginia
Charlottesville, Virginia

Randy K. Otto, PhD
Department of Mental Health Law & Policy
Florida Mental Health Institute
University of South Florida
Tampa, Florida

Aaron L. Pincus, PhD
Department of Psychology
Pennsylvania State University
University Park, Pennsylvania

Steven C. Pitts, PhD
Department of Psychology
University of Maryland, Baltimore County
Baltimore, Maryland

Karen M. Schmidt, PhD
Department of Psychology
University of Virginia
Charlottesville, Virginia

Frank L. Schmidt, PhD
Department of Management and Organization
University of Iowa
Iowa City, Iowa

Judith D. Singer, PhD
Graduate School of Education
Harvard University
Cambridge, Massachusetts
Timothy W. Smith, PhD
Department of Psychology
University of Utah
Salt Lake City, Utah

Barbara G. Tabachnick, PhD
Department of Psychology
California State University
Northridge, California

Jodie B. Ullman, PhD
Department of Psychology
California State University
San Bernardino, California

Jatin Vaidya
Department of Psychology
University of Iowa
Iowa City, Iowa

Wayne F. Velicer, PhD
Cancer Prevention Research Center
University of Rhode Island
Kingston, Rhode Island

David Watson, PhD
Department of Psychology
University of Iowa
Iowa City, Iowa

Stephen G. West, PhD
Department of Psychology
Arizona State University
Tempe, Arizona

Rand R. Wilcox, PhD
Department of Psychology
University of Southern California
Los Angeles, California

John B. Willett, PhD
Graduate School of Education
Harvard University
Cambridge, Massachusetts

Chong-ho Yu, PhD
Cisco Networking Academy Program
Cisco Systems, Inc.
Chandler, Arizona
PART ONE
FOUNDATIONS OF RESEARCH ISSUES: STUDY DESIGN, DATA MANAGEMENT, DATA REDUCTION, AND DATA SYNTHESIS
CHAPTER 1
Experimental Design
ROGER E. KIRK

SOME BASIC EXPERIMENTAL DESIGN CONCEPTS 3
THREE BUILDING BLOCK DESIGNS 4
Completely Randomized Design 4
Randomized Block Design 6
Latin Square Design 9
CLASSIFICATION OF EXPERIMENTAL DESIGNS 10
FACTORIAL DESIGNS 11
Completely Randomized Factorial Design 11
Alternative Models 14
Randomized Block Factorial Design 19
FACTORIAL DESIGNS WITH CONFOUNDING 21
Split-Plot Factorial Design 21
Confounded Factorial Designs 24
Fractional Factorial Designs 25
HIERARCHICAL DESIGNS 27
Hierarchical Designs With One or Two Nested Treatments 27
Hierarchical Design With Crossed and Nested Treatments 28
EXPERIMENTAL DESIGNS WITH A COVARIATE 29
REFERENCES 31

SOME BASIC EXPERIMENTAL DESIGN CONCEPTS

Experimental design is concerned with the skillful interrogation of nature. Unfortunately, nature is reluctant to reveal her secrets. Joan Fisher Box (1978) observed in her biography of her father, Ronald A. Fisher, "Far from behaving consistently, however, Nature appears vacillating, coy, and ambiguous in her answers" (p. 140). Her most effective tool for confusing researchers is variability, in particular variability among participants or experimental units. But two can play the variability game. By comparing the variability among participants treated differently to the variability among participants treated alike, researchers can make informed choices between competing hypotheses in science and technology.

We must never underestimate nature; she is a formidable foe. Carefully designed and executed experiments are required to learn her secrets. An experimental design is a plan for assigning participants to experimental conditions and the statistical analysis associated with the plan (Kirk, 1995, p. 1). The design of an experiment involves a number of interrelated activities:

1. Formulation of statistical hypotheses that are germane to the scientific hypothesis. A statistical hypothesis is a statement about (a) one or more parameters of a population or (b) the functional form of a population. Statistical hypotheses are rarely identical to scientific hypotheses; they are testable formulations of scientific hypotheses.
2. Determination of the experimental conditions (independent variable) to be manipulated, the measurement (dependent variable) to be recorded, and the extraneous conditions (nuisance variables) that must be controlled.
3. Specification of the number of participants required and the population from which they will be sampled.
4. Specification of the procedure for assigning the participants to the experimental conditions.
5. Determination of the statistical analysis that will be performed.

In short, an experimental design identifies the independent, dependent, and nuisance variables and indicates the way in which the randomization and statistical aspects of an experiment are to be carried out.

Analysis of Variance

Analysis of variance (ANOVA) is a useful tool for understanding the variability in designed experiments. The seminal ideas for both ANOVA and experimental design can be traced
to Ronald A. Fisher, a statistician who worked at the Rothamsted Experimental Station. According to Box (1978, p. 100), Fisher developed the basic ideas of ANOVA between 1919 and 1925. The first hint of what was to come appeared in a 1918 paper in which Fisher partitioned the total variance of a human attribute into portions attributed to heredity, environment, and other factors. The analysis of variance table for a two-treatment factorial design appeared in a 1923 paper published with M. A. Mackenzie (Fisher & Mackenzie, 1923). Fisher referred to the table as a convenient way of arranging the arithmetic. In 1924 Fisher (1925) introduced the Latin square design in connection with a forest nursery experiment. The publication in 1925 of his classic textbook Statistical Methods for Research Workers and a short paper the following year (Fisher, 1926) presented all the essential ideas of analysis of variance. The textbook (Fisher, 1925, pp. 244–249) included a table of the critical values of the ANOVA test statistic in terms of a function called z, where

$$z = \tfrac{1}{2}\left(\ln \hat{\sigma}^2_{\text{Treatment}} - \ln \hat{\sigma}^2_{\text{Error}}\right).$$

The statistics $\hat{\sigma}^2_{\text{Treatment}}$ and $\hat{\sigma}^2_{\text{Error}}$ denote, respectively, treatment and error variance. A more convenient form of Fisher's z table that did not require looking up log values was developed by George Snedecor (1934). His critical values are expressed in terms of the function $F = \hat{\sigma}^2_{\text{Treatment}} / \hat{\sigma}^2_{\text{Error}}$ that is obtained directly from the ANOVA calculations. He named it F in honor of Fisher. Fisher's field of experimentation, agriculture, was a fortunate choice because results had immediate application with assessable economic value, because simplifying assumptions such as normality and independence of errors were usually tenable, and because the cost of conducting experiments was modest.

Three Principles of Good Experimental Design

The publication of Fisher's Statistical Methods for Research Workers and his 1935 The Design of Experiments gradually led to the acceptance of what today is considered to be the cornerstone of good experimental design: randomization. It is hard to imagine the hostility that greeted the suggestion that participants or experimental units should be randomly assigned to treatment levels. Before Fisher's work, most researchers used systematic schemes, not subject to the laws of chance, to assign participants. According to Fisher, random assignment has several purposes. It helps to distribute the idiosyncratic characteristics of participants over the treatment levels so that they do not selectively bias the outcome of the experiment. Also, random assignment permits the computation of an unbiased estimate of error effects (those effects not attributable to the manipulation of the independent variable), and it helps to ensure that the error effects are statistically independent.

Fisher popularized two other principles of good experimentation: replication and local control or blocking. Replication is the observation of two or more participants under identical experimental conditions. Fisher observed that replication enables a researcher to estimate error effects and obtain a more precise estimate of treatment effects. Blocking, on the other hand, is an experimental procedure for isolating variation attributable to a nuisance variable. As the name suggests, nuisance variables are undesired sources of variation that can affect the dependent variable. There are many sources of nuisance variation. Differences among participants constitute one source. Other sources include variation in the presentation of instructions to participants, changes in environmental conditions, and the effects of fatigue and learning when participants are observed several times. Three experimental approaches are used to deal with nuisance variables:

1. Holding the variable constant.
2. Assigning participants randomly to the treatment levels so that known and unsuspected sources of variation among the participants are distributed over the entire experiment and do not affect just one or a limited number of treatment levels.
3. Including the nuisance variable as one of the factors in the experiment.

The last experimental approach uses local control or blocking to isolate variation attributable to the nuisance variable so that it does not appear in estimates of treatment and error effects. A statistical approach also can be used to deal with nuisance variables. The approach is called analysis of covariance and is described in the last section of this chapter. The three principles that Fisher vigorously championed, namely randomization, replication, and local control, remain the cornerstones of good experimental design.

THREE BUILDING BLOCK DESIGNS

Completely Randomized Design

One of the simplest experimental designs is the randomization and analysis plan that is used with a t statistic for independent samples. Consider an experiment to compare the effectiveness of two diets for obese teenagers. The independent variable is the two kinds of diets; the dependent variable is the amount of weight loss two months after going on a diet. For notational convenience, the two diets are called treatment A. The levels of treatment A corresponding to the specific diets are denoted
by the lowercase letter a and a subscript: $a_1$ denotes one diet and $a_2$ denotes the other. A particular but unspecified level of treatment A is denoted by $a_j$, where j ranges over the values 1 and 2. The amount of weight loss in pounds two months after participant i went on diet j is denoted by $Y_{ij}$.

The null and alternative hypotheses for the weight-loss experiment are, respectively,

$$H_0\colon \mu_1 - \mu_2 = 0 \qquad H_1\colon \mu_1 - \mu_2 \neq 0,$$

where $\mu_1$ and $\mu_2$ denote the mean weight loss of the respective populations. Assume that 30 girls who want to lose weight are available to participate in the experiment. The researcher assigns $n = 15$ girls to each of the $p = 2$ diets so that each of the $(np)!/(n!)^p = 155{,}117{,}520$ possible assignments has the same probability. This is accomplished by numbering the girls from 1 to 30 and drawing numbers from a random numbers table. The first 15 numbers drawn between 1 and 30 are assigned to treatment level $a_1$; the remaining 15 numbers are assigned to $a_2$. The layout for this experiment is shown in Figure 1.1. The girls who were assigned to treatment level $a_1$ are called $\text{Group}_1$; those assigned to treatment level $a_2$ are called $\text{Group}_2$. The mean weight losses of the two groups of girls are denoted by $\bar{Y}_{\cdot 1}$ and $\bar{Y}_{\cdot 2}$.

Figure 1.1 Layout for a t independent-samples design. Thirty girls are randomly assigned to two levels of treatment A with the restriction that 15 girls are assigned to each level. The mean weight loss in pounds for the girls in treatment levels $a_1$ and $a_2$ is denoted by $\bar{Y}_{\cdot 1}$ and $\bar{Y}_{\cdot 2}$, respectively.

The t independent-samples design involves randomly assigning participants to two levels of a treatment. A completely randomized design, which is described next, extends this design strategy to two or more treatment levels. The completely randomized design is denoted by the letters CR-p, where CR stands for "completely randomized" and p is the number of levels of the treatment.

Again, consider the weight-loss experiment and suppose that the researcher wants to evaluate the effectiveness of three diets. The null and alternative hypotheses for the experiment are, respectively,

$$H_0\colon \mu_1 = \mu_2 = \mu_3 \qquad H_1\colon \mu_j \neq \mu_{j'} \text{ for some } j \text{ and } j'.$$

Assume that 45 girls who want to lose weight are available to participate in the experiment. The girls are randomly assigned to the three diets with the restriction that 15 girls are assigned to each diet. The layout for the experiment is shown in Figure 1.2. A comparison of the layout in this figure with that in Figure 1.1 for a t independent-samples design reveals that they are the same except that the completely randomized design has three treatment levels.

Figure 1.2 Layout for a completely randomized design (CR-3 design). Forty-five girls are randomly assigned to three levels of treatment A with the restriction that 15 girls are assigned to each level. The mean weight loss in pounds for the girls in treatment levels $a_1$, $a_2$, and $a_3$ is denoted by $\bar{Y}_{\cdot 1}$, $\bar{Y}_{\cdot 2}$, and $\bar{Y}_{\cdot 3}$, respectively.

The t independent-samples design can be thought of as a special case of a completely randomized design. When p is equal to two, the layouts and randomization plans for the designs are identical.

Thus far I have identified the null hypothesis that the researcher wants to test, $\mu_1 = \mu_2 = \mu_3$, and described the manner in which the participants are assigned to the three treatment levels. In the following paragraphs I discuss the composite nature of an observation, describe the classical model equation for a CR-p design, and examine the meaning of the terms treatment effect and error effect.

An observation, which is a measure of the dependent variable, can be thought of as a composite that reflects the effects of the (a) independent variable, (b) individual characteristics of the participant or experimental unit, (c) chance fluctuations in the participant's performance, (d) measurement and recording errors that occur during data collection,
and (e) any other nuisance variables such as environmental conditions that have not been controlled. Consider the weight loss of the fifth participant in treatment level $a_2$. Suppose that two months after beginning the diet this participant has lost 13 pounds ($Y_{52} = 13$). What factors have affected the value of $Y_{52}$? One factor is the effectiveness of the diet. Other factors are her weight prior to starting the diet, the degree to which she stayed on the diet, and the amount she exercised during the two-month trial, to mention only a few. In summary, $Y_{52}$ is a composite that reflects (a) the effects of treatment level $a_2$, (b) effects unique to the participant, (c) effects attributable to chance fluctuations in the participant's behavior, (d) errors in measuring and recording the participant's weight loss, and (e) any other effects that have not been controlled. Our conjectures about $Y_{52}$ or any of the other 44 observations can be expressed more formally by a model equation. The classical model equation for the weight-loss experiment is

$$Y_{ij} = \mu + \alpha_j + \varepsilon_{i(j)} \qquad (i = 1, \ldots, n;\ j = 1, \ldots, p),$$

where

$Y_{ij}$ is the weight loss for participant i in treatment level $a_j$.
$\mu$ is the grand mean of the three weight-loss population means.
$\alpha_j$ is the treatment effect for population j and is equal to $\mu_j - \mu$. It reflects the effects of diet $a_j$.
$\varepsilon_{i(j)}$ is the within-groups error effect associated with $Y_{ij}$ and is equal to $Y_{ij} - \mu - \alpha_j$. It reflects all effects not attributable to treatment level $a_j$. The notation i(j) indicates that the ith participant appears only in treatment level j. Participant i is said to be nested within the jth treatment level. Nesting is discussed in the section titled "Hierarchical Designs."

According to the equation for this completely randomized design, each observation is the sum of three parameters: $\mu$, $\alpha_j$, and $\varepsilon_{i(j)}$. The values of the parameters in the equation are unknown but can be estimated from sample data.

The meanings of the terms grand mean, $\mu$, and treatment effect, $\alpha_j$, in the model equation seem fairly clear; the meaning of error effect, $\varepsilon_{i(j)}$, requires a bit more explanation. Why do observations, $Y_{ij}$s, in the same treatment level vary from one participant to the next? This variation must be due to differences among the participants and to other uncontrolled variables because the parameters $\mu$ and $\alpha_j$ in the model equation are constants for all participants in the same treatment level. To put it another way, observations in the same treatment level are different because the error effects, $\varepsilon_{i(j)}$s, for the observations are different. Recall that error effects reflect idiosyncratic characteristics of the participants (those characteristics that differ from one participant to another) and any other variables that have not been controlled. Researchers attempt to minimize the size of error effects by holding constant the sources of variation that might contribute to the error effects and by the judicious choice of an experimental design. Designs that are described next permit a researcher to isolate and remove some sources of variation that would ordinarily be included in the error effects.

Randomized Block Design

The two designs just described use independent samples. Two samples are independent if, for example, a researcher randomly samples from two populations or randomly assigns participants to p groups. Dependent samples, on the other hand, can be obtained by any of the following procedures.

1. Observe each participant under each treatment level in the experiment, that is, obtain repeated measures on the participants.
2. Form sets of participants who are similar with respect to a variable that is correlated with the dependent variable. This procedure is called participant matching.
3. Obtain sets of identical twins or littermates, in which case the participants have similar genetic characteristics.
4. Obtain participants who are matched by mutual selection, for example, husband and wife pairs or business partners.

In the behavioral and social sciences, the participants are often people whose aptitudes and experiences differ markedly. Individual differences are inevitable, but it is often possible to isolate or partition out a portion of these effects so that they do not appear in estimates of the error effects. One design for accomplishing this is the design used with a t statistic for dependent samples. As the name suggests, the design uses dependent samples. A t dependent-samples design also uses a more complex randomization and analysis plan than does a t independent-samples design. However, the added complexity is often accompanied by greater power, a point that I will develop later in connection with a randomized block design.

Let's reconsider the weight-loss experiment. It is reasonable to assume that ease of losing weight is related to the amount by which a girl is overweight. The design of the experiment can be improved by isolating this nuisance variable. Suppose that instead of randomly assigning 30 participants to the treatment levels, the researcher formed pairs of participants
so that prior to going on a diet the participants in each pair are overweight by about the same amount. The participants in each pair constitute a block or set of matched participants. A simple way to form blocks of matched participants is to rank them from least to most overweight. The participants ranked 1 and 2 are assigned to block one, those ranked 3 and 4 are assigned to block two, and so on. In this example, 15 blocks of dependent samples can be formed from the 30 participants. After all of the blocks have been formed, the two participants in each block are randomly assigned to the two diets. The layout for this experiment is shown in Figure 1.3. If the researcher's hunch is correct that ease of losing weight is related to the amount by which a girl is overweight, this design should result in a more powerful test of the null hypothesis, $\mu_{\cdot 1} - \mu_{\cdot 2} = 0$, than would a t test for independent samples. As we will see, the increased power results from isolating the nuisance variable (the amount by which the girls are overweight) so that it does not appear in the estimate of the error effects.

Figure 1.3 Layout for a t dependent-samples design. Each block contains two girls who are overweight by about the same amount. The two girls in a block are randomly assigned to the treatment levels. The mean weight loss in pounds for the girls in treatment levels $a_1$ and $a_2$ is denoted by $\bar{Y}_{\cdot 1}$ and $\bar{Y}_{\cdot 2}$, respectively.

Earlier we saw that the layout and randomization procedures for a t independent-samples design and a completely randomized design are the same except that a completely randomized design can have more than two treatment levels. The same comparison can be drawn between a t dependent-samples design and a randomized block design. A randomized block design is denoted by the letters RB-p, where RB stands for "randomized block" and p is the number of levels of the treatment. The four procedures for obtaining dependent samples that were described earlier can be used to form the blocks in a randomized block design. The procedure that is used does not affect the computation of significance tests, but the procedure does affect the interpretation of the results. The results of an experiment with repeated measures generalize to a population of participants who have been exposed to all of the treatment levels. However, the results of an experiment with matched participants generalize to a population of participants who have been exposed to only one treatment level. Some writers reserve the designation randomized block design for this latter case. They refer to a design with repeated measurements in which the order of administration of the treatment levels is randomized independently for each participant as a subjects-by-treatments design. A design with repeated measurements in which the order of administration of the treatment levels is the same for all participants is referred to as a subjects-by-trials design. I use the designation randomized block design for all three cases.

Of the four ways of obtaining dependent samples, the use of repeated measures on the participants typically results in the greatest homogeneity within the blocks. However, if repeated measures are used, the effects of one treatment level should dissipate before the participant is observed under another treatment level. Otherwise the subsequent observations will reflect the cumulative effects of the preceding treatment levels. There is no such restriction, of course, if carryover effects such as learning or fatigue are the researcher's principal interest. If blocks are composed of identical twins or littermates, it is assumed that the performance of participants having identical or similar heredities will be more homogeneous than the performance of participants having dissimilar heredities. If blocks are composed of participants who are matched by mutual selection (e.g., husband and wife pairs or business partners), a researcher should ascertain that the participants in a block are in fact more homogeneous with respect to the dependent variable than are unmatched participants. A husband and wife often have similar political attitudes; the couple is less likely to have similar mechanical aptitudes.

Suppose that in the weight-loss experiment the researcher wants to evaluate the effectiveness of three diets, denoted by $a_1$, $a_2$, and $a_3$. The researcher suspects that ease of losing weight is related to the amount by which a girl is overweight. If a sample of 45 girls is available, the blocking procedure described in connection with a t dependent-samples design can be used to form 15 blocks of participants. The three participants in a block are matched with respect to the nuisance variable, the amount by which a girl is overweight. The layout for this experiment is shown in Figure 1.4. A comparison of the layout in this figure with that in Figure 1.3 for a t dependent-samples design reveals that they are the same except that the randomized block design has $p = 3$ treatment levels. When $p = 2$, the layouts and randomization plans for the designs are identical. In this and later examples, I assume that all of the treatment levels and blocks of interest are represented in the experiment. In other words, the treatment levels and blocks represent fixed effects. A discussion of the case in which either the treatment levels or blocks or both are randomly sampled from a population of levels, the mixed and
8 Experimental Design i j is the residual error effect associated with Yi j and is equal to Yi j − − ␣ j − i . It reﬂects all effects not attributable to treatment level aj and Blocki. According to the model equation for this randomized block design, each observation is the sum of four parameters: ,␣ j , i , and i j . A residual error effect is that portion of an observation that remains after the grand mean, treatment effect, and block effect have been subtracted from it; that is, i j = Yi j − − ␣ j − i . The sum of the squared errorFigure 1.4 Layout for a randomized block design (RB-3 design). Each effects for this randomized block design,block contains three girls who are overweight by about the same amount.The three girls in a block are randomly assigned to the treatment levels. The i2j = (Yi j − − ␣ j − i )2 ,mean weight loss in pounds for the girls in treatment levels a1, a2, and a3 isdenoted by Y ·1 , Y ·2 , and Y ·3 , respectively. The mean weight loss for thegirls in Block1, Block2, . . . , Block15 is denoted by Y 1· , Y 2· , . . . , Y 15· , will be smaller than the sum for the completely randomizedrespectively. design, i( j) = 2 (Yi j − − ␣ j )2 ,random effects cases, is beyond the scope of this chapter. Thereader is referred to Kirk (1995, pp. 256–257, 265–268). if i2 is not equal to zero for one or more blocks. This idea is A randomized block design enables a researcher to test illustrated in Figure 1.5, where the total sum of squares andtwo null hypotheses. degrees of freedom for the two designs are partitioned. The F H0: ·1 = ·2 = ·3 statistic that is used to test the null hypothesis can be thought (Treatment population means are equal.) of as a ratio of error and treatment effects, H0: 1· = 2· = · · · = 15· f (error effects) + f (treatment effects) F= (Block population means are equal.) f (error effects)The second hypothesis, which is usually of little interest, where f ( ) denotes a function of the effects in parentheses. 
Itstates that the population weight-loss means for the 15 levels is apparent from an examination of this ratio that the smallerof the nuisance variable are equal. The researcher expects a the sum of the squared error effects, the larger the F statistictest of this null hypothesis to be signiﬁcant. If the nuisance and, hence, the greater the probability of rejecting a false nullvariable represented by the blocks does not account for an ap-preciable proportion of the total variation in the experiment,little has been gained by isolating the effects of the variable.Before exploring this point, I describe the model equation foran RB-p design. The classical model equation for the weight-loss experi- SSWGment is p(n Ϫ 1) ϭ 42 Yi j = + ␣ j + i + i j (i = 1, . . . , n; j = 1, . . . , p),where SSRES Yi j is the weight loss for the participant in Block i and (n Ϫ 1)(p Ϫ 1) ϭ 28 treatment level aj. Figure 1.5 Partition of the total sum of squares (SSTOTAL) and degrees of is the grand mean of the three weight-loss popula- freedom (np − 1 = 44) for CR-3 and RB-3 designs. The treatment and tion means. within-groups sums of squares are denoted by, respectively, SSA and SSWG. The block and residual sums of squares are denoted by, respectively, SSBL ␣j is the treatment effect for population j and is equal to and SSRES. The shaded rectangles indicate the sums of squares that are used · j − . It reﬂects the effect of diet aj. to compute the error variance for each design: MSWG = SSWG/ p(n − 1) i is the block effect for population i and is equal to and MSRES = SSRES/(n − 1)( p − 1). If the nuisance variable (SSBL) in the randomized block design accounts for an appreciable portion of the total sum i· − . It reﬂects the effect of the nuisance variable of squares, the design will have a smaller error variance and, hence, greater in Blocki. power than the completely randomized design.
Thus, by isolating a nuisance variable that accounts for an appreciable portion of the total variation in a randomized block design, a researcher is rewarded with a more powerful test of a false null hypothesis.

As we have seen, blocking with respect to the nuisance variable (the amount by which the girls are overweight) enables the researcher to isolate this variable and remove it from the error effects. But what if the nuisance variable doesn't account for any of the variation in the experiment? In other words, what if all of the block effects in the experiment are equal to zero? In this unlikely case, the sum of the squared error effects for the randomized block and completely randomized designs will be equal. In this case, the randomized block design will be less powerful than the completely randomized design because its error variance, the denominator of the F statistic, has $n - 1$ fewer degrees of freedom than the error variance for the completely randomized design. It should be obvious that the nuisance variable should be selected with care. The larger the correlation between the nuisance variable and the dependent variable, the more likely it is that the block effects will account for an appreciable proportion of the total variation in the experiment.

Latin Square Design

The Latin square design described in this section derives its name from an ancient puzzle that was concerned with the number of different ways that Latin letters can be arranged in a square matrix so that each letter appears once in each row and once in each column. An example of a 3 × 3 Latin square is shown in Figure 1.6. In this figure I have used the letter $a$ with subscripts in place of Latin letters. The Latin square design is denoted by the letters LS-p, where LS stands for "Latin square" and p is the number of levels of the treatment. A Latin square design enables a researcher to isolate the effects of not one but two nuisance variables. The levels of one nuisance variable are assigned to the rows of the square; the levels of the other nuisance variable are assigned to the columns. The levels of the treatment are assigned to the cells of the square.

Figure 1.6 Three-by-three Latin square, where $a_j$ denotes one of the $j = 1, \ldots, p$ levels of treatment A; $b_k$ denotes one of the $k = 1, \ldots, p$ levels of nuisance variable B; and $c_l$ denotes one of the $l = 1, \ldots, p$ levels of nuisance variable C. Each level of treatment A appears once in each row and once in each column as required for a Latin square.

Let's return to the weight-loss experiment. With a Latin square design the researcher can isolate the effects of the amount by which girls are overweight and the effects of a second nuisance variable, for example, genetic predisposition to be overweight. A rough measure of the second nuisance variable can be obtained by asking a girl's parents whether they were overweight as teenagers: $c_1$ denotes neither parent overweight, $c_2$ denotes one parent overweight, and $c_3$ denotes both parents overweight. This nuisance variable can be assigned to the columns of the Latin square. Three levels of the amount by which girls are overweight can be assigned to the rows of the Latin square: $b_1$ is less than 15 pounds, $b_2$ is 15 to 25 pounds, and $b_3$ is more than 25 pounds. The advantage of being able to isolate two nuisance variables comes at a price. The randomization procedures for a Latin square design are more complex than those for a randomized block design. Also, the number of rows and columns of a Latin square must each equal the number of treatment levels, which is three in the example. This requirement can be very restrictive. For example, it was necessary to restrict the continuous variable of the amount by which girls are overweight to only three levels. The layout of the LS-3 design is shown in Figure 1.7.

Figure 1.7 Layout for a Latin square design (LS-3 design) that is based on the Latin square in Figure 1.6. Treatment A represents three kinds of diets; nuisance variable B represents amount by which the girls are overweight; and nuisance variable C represents genetic predisposition to be overweight. The girls in Group$_1$, for example, received diet $a_1$, were less than fifteen pounds overweight ($b_1$), and neither parent had been overweight as a teenager ($c_1$). The mean weight loss in pounds for the girls in the nine groups is denoted by $\bar{Y}_{\cdot 111}, \bar{Y}_{\cdot 123}, \ldots, \bar{Y}_{\cdot 331}$.
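A cyclic arrangement is one simple way to generate a Latin square like the one in Figure 1.6. The Python sketch below is an illustration under stated assumptions, not the randomization procedure the chapter refers to: it builds a cyclic $p \times p$ square, checks the Latin property, and then randomly permutes the rows, the columns, and the treatment labels, each of which preserves that property. For $p = 3$ this can reach every 3 × 3 Latin square, but for $p \geq 4$ some Latin squares cannot be produced this way, so it should not be treated as a complete randomization scheme.

```python
import random


def cyclic_latin_square(p):
    """Return a p x p Latin square in which cell (k, l) holds treatment index (k + l) mod p."""
    return [[(k + l) % p for l in range(p)] for k in range(p)]


def is_latin_square(square):
    """True if each treatment index 0..p-1 appears exactly once in every row and every column."""
    p = len(square)
    symbols = set(range(p))
    if not all(len(row) == p and set(row) == symbols for row in square):
        return False
    return all({square[k][l] for k in range(p)} == symbols for l in range(p))


def randomized_latin_square(p, rng=random):
    """Randomly permute the rows, columns, and treatment labels of a cyclic square.

    Each permutation preserves the Latin property. For p >= 4 this cannot reach
    every Latin square of order p, so it is only a simplified randomization.
    """
    base = cyclic_latin_square(p)
    rows = rng.sample(range(p), p)     # new order of the rows (levels of B)
    cols = rng.sample(range(p), p)     # new order of the columns (levels of C)
    labels = rng.sample(range(p), p)   # random relabeling of the treatment levels
    return [[labels[base[r][c]] for c in cols] for r in rows]


square = randomized_latin_square(3, random.Random(2024))
for row in square:
    print("  ".join(f"a{j + 1}" for j in row))
```

Each printed row is one level of nuisance variable B, each column one level of C, and the entries are the diets assigned to those cells.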
The design in Figure 1.7 enables the researcher to test three null hypotheses:

$$H_0\colon \mu_{1\cdot\cdot} = \mu_{2\cdot\cdot} = \mu_{3\cdot\cdot} \qquad \text{(Treatment population means are equal.)}$$

$$H_0\colon \mu_{\cdot 1\cdot} = \mu_{\cdot 2\cdot} = \mu_{\cdot 3\cdot} \qquad \text{(Row population means are equal.)}$$

$$H_0\colon \mu_{\cdot\cdot 1} = \mu_{\cdot\cdot 2} = \mu_{\cdot\cdot 3} \qquad \text{(Column population means are equal.)}$$

The first hypothesis states that the population means for the three diets are equal. The second and third hypotheses make similar assertions about the population means for the two nuisance variables. Tests of these nuisance variables are expected to be significant. As discussed earlier, if the nuisance variables do not account for an appreciable proportion of the total variation in the experiment, little has been gained by isolating the effects of the variables.

The classical model equation for this version of the weight-loss experiment is

$$Y_{ijkl} = \mu + \alpha_j + \beta_k + \gamma_l + \varepsilon_{jkl} + \varepsilon_{i(jkl)} \qquad (i = 1, \ldots, n;\ j = 1, \ldots, p;\ k = 1, \ldots, p;\ l = 1, \ldots, p),$$

where

$Y_{ijkl}$ is the weight loss for the $i$th participant in treatment level $a_j$, row $b_k$, and column $c_l$.
$\alpha_j$ is the treatment effect for population $j$ and is equal to $\mu_{j\cdot\cdot} - \mu$. It reflects the effect of diet $a_j$.
$\beta_k$ is the row effect for population $k$ and is equal to $\mu_{\cdot k\cdot} - \mu$. It reflects the effect of nuisance variable $b_k$.
$\gamma_l$ is the column effect for population $l$ and is equal to $\mu_{\cdot\cdot l} - \mu$. It reflects the effects of nuisance variable $c_l$.
$\varepsilon_{jkl}$ is the residual effect that is equal to $\mu_{jkl} - \mu_{j\cdot\cdot} - \mu_{\cdot k\cdot} - \mu_{\cdot\cdot l} + 2\mu$.
$\varepsilon_{i(jkl)}$ is the within-cell error effect associated with $Y_{ijkl}$ and is equal to $Y_{ijkl} - \mu - \alpha_j - \beta_k - \gamma_l - \varepsilon_{jkl}$.

According to the model equation for this Latin square design, each observation is the sum of six parameters: $\mu$, $\alpha_j$, $\beta_k$, $\gamma_l$, $\varepsilon_{jkl}$, and $\varepsilon_{i(jkl)}$. The sum of the squared within-cell error effects for the Latin square design,

$$\sum\sum \varepsilon_{i(jkl)}^2 = \sum\sum (Y_{ijkl} - \mu - \alpha_j - \beta_k - \gamma_l - \varepsilon_{jkl})^2,$$

will be smaller than the sum for the randomized block design,

$$\sum\sum \varepsilon_{ij}^2 = \sum\sum (Y_{ij} - \mu - \alpha_j - \pi_i)^2,$$

if the combined effects of $\beta_k^2$, $\gamma_l^2$, and $\varepsilon_{jkl}^2$ are greater than $\pi_i^2$. The benefits of isolating two nuisance variables are a smaller error variance and increased power.

Thus far I have described three of the simplest experimental designs: the completely randomized design, randomized block design, and Latin square design. The three designs are called building block designs because complex experimental designs can be constructed by combining two or more of these simple designs (Kirk, 1995, p. 40). Furthermore, the randomization procedures, data analysis, and model assumptions for complex designs represent extensions of those for the three building block designs. The three designs provide the organizational structure for the design nomenclature and classification scheme that is described next.

CLASSIFICATION OF EXPERIMENTAL DESIGNS

A classification scheme for experimental designs is given in Table 1.1. The designs in the category systematic designs do not use random assignment of participants or experimental units and are of historical interest only. According to Leonard and Clark (1939), agricultural field research employing systematic designs on a practical scale dates back to 1834. Over the last 80 years systematic designs have fallen into disuse because designs employing random assignment are more likely to provide valid estimates of treatment and error effects and can be analyzed using the powerful tools of statistical inference such as analysis of variance. Experimental designs using random assignment are called randomized designs. The randomized designs in Table 1.1 are subdivided into categories based on (a) the number of treatments, (b) whether participants are assigned to relatively homogeneous blocks prior to random assignment, (c) presence or absence of confounding, (d) use of crossed or nested treatments, and (e) use of a covariate.

The letters p and q in the abbreviated designations denote the number of levels of treatments A and B, respectively. If a design includes a third and fourth treatment, say treatments C and D, the number of their levels is denoted by r and t, respectively. In general, the designation for designs with two or more treatments includes the letters CR, RB, or LS to indicate the building block design. The letter F or H is added to the designation to indicate that the design is, respectively, a factorial design or a hierarchical design. For example, the F in the designation CRF-pq indicates that it is a factorial design; the CR and pq indicate that the design was constructed by combining two completely randomized designs with p and q treatment levels. The letters CF, PF, FF, and AC are added to the designation if the design is, respectively, a confounded factorial design, partially confounded factorial design, fractional factorial design, or analysis of covariance design.
Three of these designs are described later. Because of space limitations, I cannot describe all of the designs in Table 1.1. I will focus on those designs that are potentially the most useful in the behavioral and social sciences.

TABLE 1.1 Classification of Experimental Designs

Experimental Design (Abbreviated Designation$^{a}$)

I. Systematic Designs (selected examples).
   1. Beavan's chessboard design.
   2. Beavan's half-drill strip design.
   3. Diagonal square design.
   4. Knut Vik square design.
II. Randomized Designs With One Treatment.
   A. Experimental units randomly assigned to treatment levels.
      1. Completely randomized design (CR-p).
   B. Experimental units assigned to relatively homogeneous blocks or groups prior to random assignment.
      1. Balanced incomplete block design (BIB-p).
      2. Cross-over design (CO-p).
      3. Generalized randomized block design (GRB-p).
      4. Graeco-Latin square design (GLS-p).
      5. Hyper-Graeco-Latin square design (HGLS-p).
      6. Latin square design (LS-p).
      7. Lattice balanced incomplete block design (LBIB-p).
      8. Lattice partially balanced incomplete block design (LPBIB-p).
      9. Lattice unbalanced incomplete block design (LUBIB-p).
      10. Partially balanced incomplete block design (PBIB-p).
      11. Randomized block design (RB-p).
      12. Youden square design (YBIB-p).
III. Randomized Designs With Two or More Treatments.
   A. Factorial designs: designs in which all treatments are crossed.
      1. Designs without confounding.
         a. Completely randomized factorial design (CRF-pq).
         b. Generalized randomized block factorial design (GRBF-pq).
         c. Randomized block factorial design (RBF-pq).
      2. Design with group-treatment confounding.
         a. Split-plot factorial design (SPF-p·q).
      3. Designs with group-interaction confounding.
         a. Latin square confounded factorial design (LSCF-$p^k$).
         b. Randomized block completely confounded factorial design (RBCF-$p^k$).
         c. Randomized block partially confounded factorial design (RBPF-$p^k$).
      4. Designs with treatment-interaction confounding.
         a. Completely randomized fractional factorial design (CRFF-$p^{k-i}$).
         b. Graeco-Latin square fractional factorial design (GLSFF-$p^k$).
         c. Latin square fractional factorial design (LSFF-$p^k$).
         d. Randomized block fractional factorial design (RBFF-$p^{k-i}$).
   B. Hierarchical designs: designs in which one or more treatments are nested.
      1. Designs with complete nesting.
         a. Completely randomized hierarchical design (CRH-pq(A)).
         b. Randomized block hierarchical design (RBH-pq(A)).
      2. Designs with partial nesting.
         a. Completely randomized partial hierarchical design (CRPH-pq(A)r).
         b. Randomized block partial hierarchical design (RBPH-pq(A)r).
         c. Split-plot partial hierarchical design (SPH-p·qr(B)).
IV. Randomized Designs With One or More Covariates.
   A. Designs that include a covariate have the letters AC added to the abbreviated designation as in the following examples.
      1. Completely randomized analysis of covariance design (CRAC-p).
      2. Completely randomized factorial analysis of covariance design (CRFAC-pq).
      3. Latin square analysis of covariance design (LSAC-p).
      4. Randomized block analysis of covariance design (RBAC-p).
      5. Split-plot factorial analysis of covariance design (SPFAC-p·q).
V. Miscellaneous Designs (select examples).
   1. Solomon four-group design.
   2. Interrupted time-series design.

$^{a}$The abbreviated designations are discussed later.

It is apparent from Table 1.1 that a wide array of designs is available to researchers. Unfortunately, there is no universally accepted designation for the various designs; some designs have as many as five different names. For example, the completely randomized design has been called a one-way classification design, single-factor design, randomized group design, simple randomized design, and single variable experiment. Also, a variety of design classification schemes have been proposed. The classification scheme in Table 1.1 owes much to Cochran and Cox (1957, chaps. 4–13) and Federer (1955, pp. 11–12).

A quick perusal of Table 1.1 reveals why researchers sometimes have difficulty selecting an appropriate experimental design: there are a lot of designs from which to choose. Because of the wide variety of designs available, it is important to identify them clearly in research reports. One often sees statements such as "a two-treatment factorial design was used." It should be evident that a more precise description is required. This description could refer to 10 of the 11 factorial designs in Table 1.1.

Thus far, the discussion has been limited to designs with one treatment and one or two nuisance variables. In the following sections I describe designs with two or more treatments that are constructed by combining several building block designs.

FACTORIAL DESIGNS

Completely Randomized Factorial Design

Factorial designs differ from those described previously in that two or more treatments can be evaluated simultaneously
in an experiment. The simplest factorial design from the standpoint of randomization, data analysis, and model assumptions is based on a completely randomized design and, hence, is called a completely randomized factorial design. A two-treatment completely randomized factorial design is denoted by the letters CRF-pq, where p and q denote the number of levels, respectively, of treatments A and B.

In the weight-loss experiment, a researcher might be interested in knowing also whether walking on a treadmill for 20 minutes a day would contribute to losing weight, as well as whether the difference between the effects of walking or not walking on the treadmill would be the same for each of the three diets. To answer these additional questions, a researcher can use a two-treatment completely randomized factorial design. Let treatment A consist of the three diets ($a_1$, $a_2$, and $a_3$) and treatment B consist of no exercise on the treadmill ($b_1$) and exercise for 20 minutes a day on the treadmill ($b_2$). This design is a CRF-32 design, where 3 is the number of levels of treatment A and 2 is the number of levels of treatment B. The layout for the design is obtained by combining the treatment levels of a CR-3 design with those of a CR-2 design so that each treatment level of the CR-3 design appears once with each level of the CR-2 design and vice versa. The resulting design has 3 × 2 = 6 treatment combinations as follows: $a_1b_1$, $a_1b_2$, $a_2b_1$, $a_2b_2$, $a_3b_1$, $a_3b_2$. When treatment levels are combined in this way, the treatments are said to be crossed. The use of crossed treatments is a characteristic of all factorial designs. The layout of the design with 30 girls randomly assigned to the six treatment combinations is shown in Figure 1.8.

Figure 1.8 Layout for a two-treatment completely randomized factorial design (CRF-32 design). Thirty girls are randomly assigned to six combinations of treatments A and B with the restriction that five girls are assigned to each combination. The mean weight loss in pounds for girls in the six groups is denoted by $\bar{Y}_{\cdot 11}, \bar{Y}_{\cdot 12}, \ldots, \bar{Y}_{\cdot 32}$.

The classical model equation for the weight-loss experiment is

$$Y_{ijk} = \mu + \alpha_j + \beta_k + (\alpha\beta)_{jk} + \varepsilon_{i(jk)} \qquad (i = 1, \ldots, n;\ j = 1, \ldots, p;\ k = 1, \ldots, q),$$

where

$Y_{ijk}$ is the weight loss for participant $i$ in treatment combination $a_jb_k$.
$\mu$ is the grand mean of the six weight-loss population means.
$\alpha_j$ is the treatment effect for population $a_j$ and is equal to $\mu_{j\cdot} - \mu$. It reflects the effect of diet $a_j$.
$\beta_k$ is the treatment effect for population $b_k$ and is equal to $\mu_{\cdot k} - \mu$. It reflects the effects of exercise condition $b_k$.
$(\alpha\beta)_{jk}$ is the interaction effect for populations $a_j$ and $b_k$ and is equal to $\mu_{jk} - \mu_{j\cdot} - \mu_{\cdot k} + \mu$. Interaction effects are discussed later.
$\varepsilon_{i(jk)}$ is the within-cell error effect associated with $Y_{ijk}$ and is equal to $Y_{ijk} - \mu - \alpha_j - \beta_k - (\alpha\beta)_{jk}$. It reflects all effects not attributable to treatment level $a_j$, treatment level $b_k$, and the interaction of $a_j$ and $b_k$.

The CRF-32 design enables a researcher to test three null hypotheses:

$$H_0\colon \mu_{1\cdot} = \mu_{2\cdot} = \mu_{3\cdot} \qquad \text{(Treatment A population means are equal.)}$$

$$H_0\colon \mu_{\cdot 1} = \mu_{\cdot 2} \qquad \text{(Treatment B population means are equal.)}$$

$$H_0\colon \mu_{jk} - \mu_{jk'} - \mu_{j'k} + \mu_{j'k'} = 0 \text{ for all } j \text{ and } k \qquad \text{(All } A \times B \text{ interaction effects equal zero.)}$$

The last hypothesis is unique to factorial designs. It states that the joint effects (interaction) of treatments A and B are equal to zero for all combinations of the two treatments. Two treatments are said to interact if any difference in the dependent variable for one treatment is different at two or more levels of the other treatment.

Thirty girls are available to participate in the weight-loss experiment and have been randomly assigned to the six treatment combinations with the restriction that five girls are assigned to each combination.
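The crossing of treatment levels and the restricted random assignment described above can be sketched in a few lines of Python. This is a hypothetical helper, not code from the chapter: participants are simply numbered 1 to 30, `itertools.product` forms the $3 \times 2 = 6$ combinations, and a shuffled list of participants is cut into consecutive groups of five.

```python
import itertools
import random


def crf_assignment(a_levels, b_levels, n_per_cell, rng=random):
    """Randomly assign numbered participants to the crossed combinations of a CRF-pq design.

    Every combination of an A level with a B level receives exactly n_per_cell participants.
    """
    combos = [a + b for a, b in itertools.product(a_levels, b_levels)]
    participants = list(range(1, len(combos) * n_per_cell + 1))
    rng.shuffle(participants)
    return {combo: sorted(participants[i * n_per_cell:(i + 1) * n_per_cell])
            for i, combo in enumerate(combos)}


layout = crf_assignment(["a1", "a2", "a3"], ["b1", "b2"], n_per_cell=5, rng=random.Random(32))
for combo, girls in layout.items():
    print(combo, girls)
```

Passing a seeded `random.Random` makes the assignment reproducible for reporting; omitting it uses the module-level generator.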
The data, weight loss for each girl, are given in Table 1.2. A descriptive summary of the data (sample means and standard deviations) is given in Table 1.3.

TABLE 1.2 Weight-Loss Data for the Diet ($a_j$) and Exercise Conditions ($b_k$)

a1b1   a1b2   a2b1   a2b2   a3b1   a3b2
  7      7      9     10     15     13
 13     14      4      5     10     16
  9     11      7      7     12     20
  5      4     14     15      5     19
  1      9     11     13      8     12

TABLE 1.3 Descriptive Summary of the Weight-Loss Data: Means ($\bar{Y}$) and Standard Deviations ($S$)

Entries are mean (standard deviation) weight loss in pounds; for example, the $a_1b_1$ entry gives $\bar{Y}_{\cdot 11} = 7.0$ and $S_{\cdot 11} = 4.0$.

                               Diet a1      Diet a2      Diet a3      Mean (SD)
No treadmill exercise (b1)     7.0 (4.0)    9.0 (3.4)    10.0 (3.4)    8.7 (3.8)
Treadmill exercise (b2)        9.0 (3.4)   10.0 (3.7)    16.0 (3.2)   11.7 (4.6)
Mean (SD)                      8.0 (3.8)    9.5 (3.6)    13.0 (4.4)

An examination of Table 1.3 suggests that diet $a_3$ resulted in more weight loss than did the other diets and 20 minutes a day on the treadmill was beneficial. The analysis of variance for the weight-loss data is summarized in Table 1.4, which shows that the null hypotheses for treatments A and B can be rejected. We know that at least one contrast or difference among the diet population means is not equal to zero. Also, from Tables 1.3 and 1.4 we know that 20 minutes a day on the treadmill resulted in greater weight loss than did the no-exercise condition. The A × B interaction test is not significant. When two treatments interact, a graph in which treatment-combination population means are connected by lines will always reveal at least two nonparallel lines for one or more segments of the lines. The nonsignificant interaction test in Table 1.4 tells us that there is no reason for believing that the population difference in weight loss between the treadmill and no-treadmill conditions is different for the three diets. If the interaction had been significant, our interest would have shifted from interpreting the tests of treatments A and B to understanding the nature of the interaction. Procedures for interpreting interactions are described by Kirk (1995, pp. 370–372, 377–389).

TABLE 1.4 Analysis of Variance for the Weight-Loss Data

Source                    SS          df    MS         F      p
Treatment A (Diet)        131.6667     2    65.8334    4.25   .026
Treatment B (Exercise)     67.5000     1    67.5000    4.35   .048
A × B                      35.0000     2    17.5000    1.13   .340
Within cell               372.0000    24    15.5000
Total                     606.1667    29

Statistical Significance Versus Practical Significance

The rejection of the null hypotheses for the diet and exercise treatments is not very informative. We know in advance that the hypotheses are false. As John Tukey (1991) wrote, "the effects of A and B are always different, in some decimal place, for any A and B. Thus asking 'Are the effects different?' is foolish" (p. 100). Furthermore, rejection of a null hypothesis tells us nothing about the size of the treatment effects or whether they are important or large enough to be useful, that is, their practical significance. In spite of numerous criticisms of null hypothesis significance testing, researchers continue to focus on null hypotheses and p values. The focus should be on the data and on what the data tell the researcher about the scientific hypothesis. This is not a new idea. It was originally touched on by Karl Pearson in 1901 and more explicitly by Fisher in 1925. Fisher (1925) proposed that researchers supplement null hypothesis significance tests with measures of strength of association. Since then, more than 40 supplementary measures of effect magnitude have been proposed (Kirk, 1996). The majority of the measures fall into one of two categories: measures of strength of association and measures of effect size (typically, standardized mean differences). Hays (1963) introduced a measure of strength of association that can assist a researcher in assessing the importance or usefulness of a treatment: omega squared, $\hat{\omega}^2$. Omega squared estimates the proportion of the population variance in the dependent variable accounted for by a treatment. For experiments with several treatments, as in the weight-loss experiment, partial omega squared is computed. For example, the proportion of variance in the dependent variable, $Y$, accounted for by treatment A eliminating treatment B and the A × B interaction is denoted by $\hat{\omega}^2_{Y|A\cdot B,AB}$. Similarly, $\hat{\omega}^2_{Y|B\cdot A,AB}$ denotes the proportion of the variance accounted for by treatment B eliminating treatment A and the A × B interaction. For the weight-loss experiment, the partial omega squareds for treatments A and B are, respectively,

$$\hat{\omega}^2_{Y|A\cdot B,AB} = \frac{(p - 1)(F_A - 1)}{(p - 1)(F_A - 1) + npq} = \frac{(3 - 1)(4.247 - 1)}{(3 - 1)(4.247 - 1) + (5)(3)(2)} = 0.18,$$

$$\hat{\omega}^2_{Y|B\cdot A,AB} = \frac{(q - 1)(F_B - 1)}{(q - 1)(F_B - 1) + npq} = \frac{(2 - 1)(4.355 - 1)}{(2 - 1)(4.355 - 1) + (5)(3)(2)} = 0.10.$$
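The entries in Table 1.4 and the effect-magnitude statistics used in this section can be reproduced from the raw scores in Table 1.2. The Python sketch below is illustrative only: it assumes equal cell ns (as the classical computations do), uses the within-cell mean square as the error term, and takes 3.53 as the .05 studentized range critical value for three means and 24 degrees of freedom; that value comes from standard tables and is not given in the chapter. Besides the sums of squares, it computes the two partial omega squareds, Hedges's g for diets $a_1$ and $a_3$, and the Tukey HSD interval for $\mu_{1\cdot} - \mu_{3\cdot}$ that are interpreted in the paragraphs that follow.

```python
import math

# Weight-loss data from Table 1.2: five girls in each diet (a) by exercise (b) cell.
DATA = {
    ("a1", "b1"): [7, 13, 9, 5, 1],
    ("a1", "b2"): [7, 14, 11, 4, 9],
    ("a2", "b1"): [9, 4, 7, 14, 11],
    ("a2", "b2"): [10, 5, 7, 15, 13],
    ("a3", "b1"): [15, 10, 12, 5, 8],
    ("a3", "b2"): [13, 16, 20, 19, 12],
}


def mean(xs):
    return sum(xs) / len(xs)


def crf_anova(data):
    """Classical sums of squares for a balanced CRF-pq design (equal cell ns assumed)."""
    a_levels = sorted({a for a, _ in data})
    b_levels = sorted({b for _, b in data})
    p, q, n = len(a_levels), len(b_levels), len(next(iter(data.values())))
    cell = {key: mean(ys) for key, ys in data.items()}
    grand = mean(list(cell.values()))
    a_mean = {a: mean([cell[a, b] for b in b_levels]) for a in a_levels}
    b_mean = {b: mean([cell[a, b] for a in a_levels]) for b in b_levels}
    ss_a = n * q * sum((m - grand) ** 2 for m in a_mean.values())
    ss_b = n * p * sum((m - grand) ** 2 for m in b_mean.values())
    ss_cells = n * sum((m - grand) ** 2 for m in cell.values())
    ss_wc = sum((y - cell[key]) ** 2 for key, ys in data.items() for y in ys)
    return {"p": p, "q": q, "n": n, "a_mean": a_mean,
            "ss_a": ss_a, "ss_b": ss_b, "ss_ab": ss_cells - ss_a - ss_b,
            "ss_wc": ss_wc, "df_wc": p * q * (n - 1)}


def partial_omega_sq(df_effect, f, n, p, q):
    """Partial omega squared: df(F - 1) / [df(F - 1) + npq]."""
    return df_effect * (f - 1) / (df_effect * (f - 1) + n * p * q)


r = crf_anova(DATA)
p, q, n = r["p"], r["q"], r["n"]
ms_wc = r["ss_wc"] / r["df_wc"]
f_a = r["ss_a"] / (p - 1) / ms_wc
f_b = r["ss_b"] / (q - 1) / ms_wc
omega_a = partial_omega_sq(p - 1, f_a, n, p, q)
omega_b = partial_omega_sq(q - 1, f_b, n, p, q)

# Hedges's g for diets a1 and a3, with the square root of MS within cell as the pooled SD.
g = abs(r["a_mean"]["a1"] - r["a_mean"]["a3"]) / math.sqrt(ms_wc)

# Tukey HSD interval for mu_1. - mu_3.; each diet mean is based on n * q = 10 girls.
Q_CRIT = 3.53  # studentized range, alpha = .05, 3 means, 24 df (assumed from standard tables)
half_width = Q_CRIT * math.sqrt(ms_wc / (n * q))
diff = r["a_mean"]["a1"] - r["a_mean"]["a3"]
ci = (diff - half_width, diff + half_width)

print(f"SSA={r['ss_a']:.4f} SSB={r['ss_b']:.4f} SSAB={r['ss_ab']:.4f} SSWC={r['ss_wc']:.4f}")
print(f"F_A={f_a:.3f} F_B={f_b:.3f} omega_A={omega_a:.2f} omega_B={omega_b:.2f}")
print(f"g={g:.2f} CI=({ci[0]:.1f}, {ci[1]:.1f})")
```

Note that $F_B = 67.5/15.5 \approx 4.355$, which is the value that belongs in the omega squared computation for treatment B.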
Following Cohen's (1988, pp. 284–288) guidelines for interpreting omega squared,

.010 is a small association
.059 is a medium association
.138 is a large association,

we conclude that the diets accounted for a large proportion of the population variance in weight loss. This is consistent with our perception of the differences between the weight-loss means for the three diets: girls on diet $a_3$ lost five more pounds than did those on $a_1$. Certainly, any girl who is anxious to lose weight would want to be on diet $a_3$. Likewise, the medium association between the exercise conditions and weight loss is practically significant: Walking on the treadmill resulted in a mean weight loss 3 pounds greater than that for the no-exercise condition (11.7 versus 8.7 pounds). Based on Tukey's HSD statistic, 95% confidence intervals for the three pairwise contrasts among the diet means are

$$-5.9 < \mu_{1\cdot} - \mu_{2\cdot} < 2.9$$

$$-9.4 < \mu_{1\cdot} - \mu_{3\cdot} < -0.6$$

$$-7.9 < \mu_{2\cdot} - \mu_{3\cdot} < 0.9.$$

Because the confidence interval for $\mu_{1\cdot} - \mu_{3\cdot}$ does not contain 0, we can be confident that diet $a_3$ is superior to diet $a_1$. Hedges's (1981) effect size for the difference between diets $a_1$ and $a_3$ is

$$g = \frac{|\bar{Y}_{\cdot 1\cdot} - \bar{Y}_{\cdot 3\cdot}|}{\hat{\sigma}_{\text{Pooled}}} = \frac{|8.0 - 13.0|}{3.937} = 1.27,$$

a large effect.

Unfortunately, there is no statistic that measures practical significance. The determination of whether results are important or useful must be made by the researcher. However, confidence intervals and measures of effect magnitude can help the researcher make this decision. If our discipline is to progress as it should, researchers must look beyond significance tests and p values and focus on what their data tell them about the phenomenon under investigation. For a fuller discussion of this point, see Kirk (2001).

Alternative Models

Thus far, I have described the classical model equation for several experimental designs. This model and associated procedures for computing sums of squares assume that all cell $n$s in multitreatment experiments are equal. If the cell $n$s are not equal, some researchers use one of the following procedures to obtain approximate tests of null hypotheses: (a) estimate the missing observations under the assumption that the treatments do not interact, (b) randomly set aside data to reduce all cell $n$s to the same size, and (c) use an unweighted-means analysis. The latter approach consists of performing an ANOVA on the cell means and then multiplying the sums of squares by the harmonic mean of the cell $n$s. None of these procedures is entirely satisfactory. Fortunately, exact solutions to the unequal cell $n$ problem exist. Two solutions that are described next are based on a regression model and a cell means model. Unlike the classical model approach, the regression and cell means model approaches require a computer and software for manipulating matrices.

Suppose that halfway through the weight-loss experiment the third participant in treatment combination $a_2b_2$ ($Y_{322} = 7$) moved to another area of the country and dropped out of the experiment. The loss of this participant resulted in unequal cell $n$s. Cell $a_2b_2$ has four participants; the other cells have five participants. The analysis of the weight-loss data using the regression model is described next.

Regression Model

A qualitative regression model equation with $h - 1 = (p - 1) + (q - 1) + (p - 1)(q - 1) = 5$ independent variables ($X_{i1}, X_{i2}, \ldots, X_{i2}X_{i3}$) and $h = 6$ parameters ($\beta_0, \beta_1, \ldots, \beta_5$),

$$Y_i = \beta_0 + \underbrace{\beta_1 X_{i1} + \beta_2 X_{i2}}_{A\ \text{effects}} + \underbrace{\beta_3 X_{i3}}_{B\ \text{effects}} + \underbrace{\beta_4 X_{i1}X_{i3} + \beta_5 X_{i2}X_{i3}}_{A \times B\ \text{effects}} + e_i,$$

can be formulated so that tests of selected parameters of the regression model provide tests of null hypotheses for A, B, and A × B in the weight-loss experiment. Tests of the following null hypotheses for this regression model are of particular interest:

$$H_0\colon \beta_1 = \beta_2 = 0$$

$$H_0\colon \beta_3 = 0$$

$$H_0\colon \beta_4 = \beta_5 = 0$$

In order for tests of these null hypotheses to provide tests of ANOVA null hypotheses, it is necessary to establish a correspondence between the five independent variables of the regression model equation and the $(p - 1) + (q - 1) + (p - 1)(q - 1) = 5$ treatment and interaction effects of the CRF-32 design. One way to establish this correspondence is to code the independent variables of the regression model as
follows:

$$X_{i1} = \begin{cases} 1, & \text{if an observation is in } a_1 \\ -1, & \text{if an observation is in } a_3 \\ 0, & \text{otherwise} \end{cases}$$

$$X_{i2} = \begin{cases} 1, & \text{if an observation is in } a_2 \\ -1, & \text{if an observation is in } a_3 \\ 0, & \text{otherwise} \end{cases}$$

$$X_{i3} = \begin{cases} 1, & \text{if an observation is in } b_1 \\ -1, & \text{if an observation is in } b_2 \end{cases}$$

$$X_{i1}X_{i3} = \text{product of coded values associated with } a_1 \text{ and } b_1$$

$$X_{i2}X_{i3} = \text{product of coded values associated with } a_2 \text{ and } b_1$$

This coding scheme, which is called effect coding, produced the $\mathbf{X}$ matrix in Table 1.5. The $\mathbf{y}$ vector in Table 1.5 contains weight-loss observations for the six treatment combinations. The first column vector, $\mathbf{x}_0$, in the $\mathbf{X}$ matrix contains ones; the second through the sixth column vectors contain coded values for $X_{i1}, X_{i2}, \ldots, X_{i2}X_{i3}$. To save space, only a portion of the 29 rows of $\mathbf{X}$ and $\mathbf{y}$ is shown. As mentioned earlier, observation $Y_{322}$ is missing. Hence, each of the treatment combinations contains five observations except for $a_2b_2$, which contains four.

TABLE 1.5 Data Vector, $\mathbf{y}$ (29 × 1), and $\mathbf{X}$ Matrix (29 × 6) for the Regression Model

Cell    y     x0    x1    x2    x3    x1x3   x2x3
a1b1     7     1     1     0     1      1      0
       ...
         1     1     1     0     1      1      0
a1b2     7     1     1     0    −1     −1      0
       ...
         9     1     1     0    −1     −1      0
a2b1     9     1     0     1     1      0      1
       ...
        11     1     0     1     1      0      1
a2b2    10     1     0     1    −1      0     −1
       ...
        13     1     0     1    −1      0     −1
a3b1    15     1    −1    −1     1     −1     −1
       ...
         8     1    −1    −1     1     −1     −1
a3b2    13     1    −1    −1    −1      1      1
       ...
        12     1    −1    −1    −1      1      1

F statistics for testing hypotheses for selected regression parameters are obtained by dividing a regression mean square, MSR, by an error mean square, MSE, where $\text{MSR} = \text{SSR}/df_{\text{reg}}$ and $\text{MSE} = \text{SSE}/df_{\text{error}}$. The regression sum of squares, SSR, that reflects the contribution of independent variables $X_1$ and $X_2$ over and above the contribution of $X_3$, $X_1X_3$, and $X_2X_3$ is given by the difference between two error sums of squares, SSE, as follows:

$$\mathrm{SSR}(\overbrace{X_1\,X_2}^{A} \mid \overbrace{X_3}^{B}\;\overbrace{X_1X_3\,X_2X_3}^{A \times B}) = \mathrm{SSE}(\overbrace{X_3}^{B}\;\overbrace{X_1X_3\,X_2X_3}^{A \times B}) - \mathrm{SSE}(\overbrace{X_1\,X_2}^{A}\;\overbrace{X_3}^{B}\;\overbrace{X_1X_3\,X_2X_3}^{A \times B})$$

An error sum of squares is given by

$$\mathrm{SSE}(\ ) = \mathbf{y}'\mathbf{y} - [(\mathbf{X}_i'\mathbf{X}_i)^{-1}(\mathbf{X}_i'\mathbf{y})]'(\mathbf{X}_i'\mathbf{y}),$$

where the $\mathbf{X}_i$ matrix contains the first column, $\mathbf{x}_0$, of $\mathbf{X}$ and the columns corresponding to the independent variables contained in SSE( ). For example, the $\mathbf{X}$ matrix used in computing SSE($X_3\,X_1X_3\,X_2X_3$) contains four columns: $\mathbf{x}_0$, $\mathbf{x}_3$, $\mathbf{x}_1\mathbf{x}_3$, and $\mathbf{x}_2\mathbf{x}_3$. The regression sum of squares corresponding to SSA in ANOVA is

$$\mathrm{SSR}(X_1\,X_2 \mid X_3\,X_1X_3\,X_2X_3) = \mathrm{SSE}(X_3\,X_1X_3\,X_2X_3) - \mathrm{SSE}(X_1\,X_2\,X_3\,X_1X_3\,X_2X_3) = 488.1538 - 360.7500 = 127.4038$$

with $p - 1 = 2$ degrees of freedom. This sum of squares is used in testing the regression null hypothesis $H_0\colon \beta_1 = \beta_2 = 0$. Because of the correspondence between the regression and ANOVA parameters, a test of this regression null hypothesis is equivalent to testing the ANOVA null hypothesis for treatment A.

The regression sum of squares corresponding to SSB in ANOVA is

$$\mathrm{SSR}(X_3 \mid X_1\,X_2\,X_1X_3\,X_2X_3) = \mathrm{SSE}(X_1\,X_2\,X_1X_3\,X_2X_3) - \mathrm{SSE}(X_1\,X_2\,X_3\,X_1X_3\,X_2X_3) = 436.8000 - 360.7500 = 76.0500$$

with $q - 1 = 1$ degree of freedom.
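The nested-model comparisons above can be sketched without a matrix library. The following Python code is a minimal illustration under stated assumptions: it rebuilds the 29 effect-coded rows of Table 1.5 (with $Y_{322} = 7$ dropped), fits each model by solving the normal equations $\mathbf{X}_i'\mathbf{X}_i\mathbf{b} = \mathbf{X}_i'\mathbf{y}$ with a small hand-written Gauss-Jordan routine (`solve` and `sse` are helpers written for this sketch, not library functions), and forms each SSR as a difference of two error sums of squares.

```python
# Table 1.2 data with Y_322 = 7 removed (the a2b2 cell keeps four scores).
CELLS = {
    ("a1", "b1"): [7, 13, 9, 5, 1],
    ("a1", "b2"): [7, 14, 11, 4, 9],
    ("a2", "b1"): [9, 4, 7, 14, 11],
    ("a2", "b2"): [10, 5, 15, 13],
    ("a3", "b1"): [15, 10, 12, 5, 8],
    ("a3", "b2"): [13, 16, 20, 19, 12],
}
# Effect coding for treatment A (X1, X2) and treatment B (X3).
A_CODE = {"a1": (1, 0), "a2": (0, 1), "a3": (-1, -1)}
B_CODE = {"b1": 1, "b2": -1}

rows, y = [], []
for (a, b), scores in CELLS.items():
    x1, x2 = A_CODE[a]
    x3 = B_CODE[b]
    for score in scores:
        rows.append({"x1": x1, "x2": x2, "x3": x3, "x1x3": x1 * x3, "x2x3": x2 * x3})
        y.append(score)


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    m = len(a)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]


def sse(columns):
    """SSE = y'y - b'X'y for a model with an intercept (x0) and the named columns."""
    X = [[1] + [row[c] for c in columns] for row in rows]
    k = len(X[0])
    xtx = [[sum(xr[i] * xr[j] for xr in X) for j in range(k)] for i in range(k)]
    xty = [sum(xr[i] * yi for xr, yi in zip(X, y)) for i in range(k)]
    b = solve(xtx, xty)
    return sum(yi * yi for yi in y) - sum(bi * ti for bi, ti in zip(b, xty))


FULL = ["x1", "x2", "x3", "x1x3", "x2x3"]
sse_full = sse(FULL)
ssr_a = sse(["x3", "x1x3", "x2x3"]) - sse_full
ssr_b = sse(["x1", "x2", "x1x3", "x2x3"]) - sse_full
ssr_ab = sse(["x1", "x2", "x3"]) - sse_full
ms_error = sse_full / (len(y) - len(FULL) - 1)  # N - h = 29 - 6 = 23 df

for name, ss, df in [("A", ssr_a, 2), ("B", ssr_b, 1), ("AxB", ssr_ab, 2)]:
    print(f"{name}: SSR = {ss:.4f}, F = {ss / df / ms_error:.2f}")
print(f"Error: SSE = {sse_full:.4f}")
```

The differences reproduce 127.4038, 76.0500, and 27.7885, with an error sum of squares of 360.7500. For real analyses, a tested least-squares routine from a numerical library is preferable to hand-rolled elimination; the sketch only makes each step of the SSE formula visible.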
The regression sum of squares corresponding to SSA × B in ANOVA is

$$\mathrm{SSR}(X_1X_3\,X_2X_3 \mid X_1\,X_2\,X_3) = \mathrm{SSE}(X_1\,X_2\,X_3) - \mathrm{SSE}(X_1\,X_2\,X_3\,X_1X_3\,X_2X_3) = 388.5385 - 360.7500 = 27.7885$$

with $(p - 1)(q - 1) = 2$ degrees of freedom.

The regression error sum of squares corresponding to SSWCELL in ANOVA is

$$\mathrm{SSE}(X_1\,X_2\,X_3\,X_1X_3\,X_2X_3) = 360.7500$$

with $N - h = 29 - 6 = 23$ degrees of freedom. The total sum of squares is

$$\mathrm{SSTO} = \mathbf{y}'\mathbf{y} - \mathbf{y}'\mathbf{J}\mathbf{y}N^{-1} = 595.7931,$$

where $\mathbf{J}$ is a 29 × 29 matrix of ones and $N = 29$, the number of weight-loss observations. The total sum of squares has $N - 1 = 28$ degrees of freedom. The analysis of the weight-loss data is summarized in Table 1.6. The null hypotheses $\beta_1 = \beta_2 = 0$ and $\beta_3 = 0$ can be rejected. Hence, independent variables $X_1$ or $X_2$ as well as $X_3$ contribute to predicting the dependent variable. As we see in the next section, the F statistics in Table 1.6 are identical to the ANOVA F statistics for the cell means model.

TABLE 1.6 Analysis of Variance for the Weight-Loss Data (Observation $Y_{322}$ Is Missing)

Source                    SS          df                    MS         F      p
X1 X2 | X3 X1X3 X2X3      127.4038    p − 1 = 2             63.7019    4.06   .031
X3 | X1 X2 X1X3 X2X3       76.0500    q − 1 = 1             76.0500    4.85   .038
X1X3 X2X3 | X1 X2 X3       27.7885    (p − 1)(q − 1) = 2    13.8943    0.89   .426
Error                     360.7500    N − h = 23            15.6848
Total                     595.7931    N − 1 = 28

Cell Means Model

The classical model equation for a CRF-pq design,

$$Y_{ijk} = \mu + \alpha_j + \beta_k + (\alpha\beta)_{jk} + \varepsilon_{i(jk)} \qquad (i = 1, \ldots, n;\ j = 1, \ldots, p;\ k = 1, \ldots, q),$$

focuses on the grand mean, treatment effects, and interaction effects. The cell means model equation for the CRF-pq design,

$$Y_{ijk} = \mu_{jk} + \varepsilon_{i(jk)} \qquad (i = 1, \ldots, n;\ j = 1, \ldots, p;\ k = 1, \ldots, q),$$

focuses on cell means, where $\mu_{jk}$ denotes the mean in cell $a_j$ and $b_k$. Although I described the classical model first, this is not the order in which the models evolved historically. According to Urquhart, Weeks, and Henderson (1973), Fisher's early development of ANOVA was conceptualized by his colleagues in terms of cell means. It was not until later that cell means were given a linear structure in terms of the grand mean and model effects, that is, $\mu_{jk} = \mu + \alpha_j + \beta_k + (\alpha\beta)_{jk}$. The classical model equation for a CRF-pq design uses four parameters, $\mu + \alpha_j + \beta_k + (\alpha\beta)_{jk}$, to represent one parameter, $\mu_{jk}$. Because of this structure, the classical model is overparameterized. For example, the expectation of the classical model equation for the weight-loss experiment contains 12 parameters: $\mu$, $\alpha_1$, $\alpha_2$, $\alpha_3$, $\beta_1$, $\beta_2$, $(\alpha\beta)_{11}$, $(\alpha\beta)_{12}$, $(\alpha\beta)_{21}$, $(\alpha\beta)_{22}$, $(\alpha\beta)_{31}$, $(\alpha\beta)_{32}$. However, there are only six cell means from which to estimate the 12 parameters. When there are missing cells in multitreatment designs, a researcher is faced with the question of which parameters or parametric functions are estimable. For a discussion of this and other problems, see Hocking (1985), Hocking and Speed (1975), Searle (1987), and Timm (1975).

The cell means model avoids the problems associated with overparameterization. A population mean can be estimated for each cell that contains one or more observations. Thus, the model is fully parameterized. Unlike the classical model, the cell means model does not impose a structure on the analysis of data. Consequently, the model can be used to test hypotheses about any linear combination of population cell means. It is up to the researcher to decide which tests are meaningful or useful based on the original research hypotheses, the way the experiment was conducted, and the data that are available.

I will use the weight-loss data in Table 1.2 to illustrate the computational procedures for the cell means model. Again, we will assume that observation $Y_{322}$ is missing.
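Before turning to those computations, the overparameterization argument above can be checked by counting. The sketch below is an illustration rather than anything from the chapter: it writes the expected value of each of the six cells under the classical model as a row of 0/1 coefficients on the 12 parameters $\mu, \alpha_1, \ldots, (\alpha\beta)_{32}$ and computes the rank of that matrix with exact rational arithmetic (`matrix_rank` is a helper defined here). The rank is 6, so only six linearly independent functions of the 12 parameters can be estimated, whereas the cell means model has exactly six parameters and a full-rank matrix. The last lines compute SSTO as $\mathbf{y}'\mathbf{y} - (\sum y)^2/N$ for the 29 remaining observations, which is what $\mathbf{y}'\mathbf{y} - \mathbf{y}'\mathbf{J}\mathbf{y}N^{-1}$ reduces to because $\mathbf{J}$ is a matrix of ones.

```python
from fractions import Fraction
from itertools import product


def matrix_rank(matrix):
    """Rank of a matrix by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col] != 0:
                factor = rows[i][col] / rows[rank][col]
                rows[i] = [u - factor * v for u, v in zip(rows[i], rows[rank])]
        rank += 1
        if rank == len(rows):
            break
    return rank


p, q = 3, 2
cells = list(product(range(p), range(q)))   # (j, k) in the order a1b1, a1b2, ..., a3b2

# Classical model: one row per cell, one column per parameter
# mu, alpha_1..alpha_3, beta_1..beta_2, (alpha beta)_11..(alpha beta)_32.
classical = []
for j, k in cells:
    row = [1]
    row += [int(jj == j) for jj in range(p)]
    row += [int(kk == k) for kk in range(q)]
    row += [int(cell == (j, k)) for cell in cells]
    classical.append(row)

# Cell means model: one parameter mu_jk per cell.
cell_means = [[int(i == c) for c in range(len(cells))] for i in range(len(cells))]

print("classical:", len(classical[0]), "parameters, rank", matrix_rank(classical))
print("cell means:", len(cell_means[0]), "parameters, rank", matrix_rank(cell_means))

# Total sum of squares for the 29 observations (Y_322 removed): y'y - (sum y)^2 / N.
y = [7, 13, 9, 5, 1, 7, 14, 11, 4, 9, 9, 4, 7, 14, 11,
     10, 5, 15, 13, 15, 10, 12, 5, 8, 13, 16, 20, 19, 12]
ssto = sum(v * v for v in y) - sum(y) ** 2 / len(y)
print(f"SSTO = {ssto:.4f}")
```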
Again, we will assume that observation $Y_{322}$ is missing. The null hypothesis for treatment A is

$$H_0\colon \mu_{1\cdot} = \mu_{2\cdot} = \mu_{3\cdot}.$$

An equivalent null hypothesis that is used with the cell means model is

$$H_0\colon \begin{aligned} \mu_{1\cdot} - \mu_{2\cdot} &= 0 \\ \mu_{2\cdot} - \mu_{3\cdot} &= 0. \end{aligned} \tag{1.1}$$

In terms of cell means, this hypothesis can be expressed as

$$H_0\colon \begin{aligned} \frac{\mu_{11} + \mu_{12}}{2} - \frac{\mu_{21} + \mu_{22}}{2} &= 0 \\ \frac{\mu_{21} + \mu_{22}}{2} - \frac{\mu_{31} + \mu_{32}}{2} &= 0, \end{aligned} \tag{1.2}$$

where $\mu_{1\cdot} = (\mu_{11} + \mu_{12})/2$, $\mu_{2\cdot} = (\mu_{21} + \mu_{22})/2$, and so on. In matrix notation, the null hypothesis is

$$H_0\colon \underset{(p-1)\times h}{\mathbf{C}_A}\;\underset{h\times 1}{\boldsymbol{\mu}} = \underset{(p-1)\times 1}{\mathbf{0}}, \qquad \frac{1}{2}\begin{bmatrix} 1 & 1 & -1 & -1 & 0 & 0 \\ 0 & 0 & 1 & 1 & -1 & -1 \end{bmatrix}\begin{bmatrix} \mu_{11} \\ \mu_{12} \\ \mu_{21} \\ \mu_{22} \\ \mu_{31} \\ \mu_{32} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix},$$

where p is the number of levels of treatment A and h is the number of cell means. In order for the null hypothesis $\mathbf{C}_A\boldsymbol{\mu} = \mathbf{0}$ to be testable, the $\mathbf{C}_A$ matrix must be of full row rank; that is, its rows must be linearly independent. The maximum number of such rows is $p - 1$, which is why it is necessary to express the null hypothesis as Equation 1.1 or 1.2. An estimator of the null hypothesis, $\mathbf{C}_A\boldsymbol{\mu} - \mathbf{0}$, is incorporated in the formula for computing a sum of squares. For example, the estimator appears as $\mathbf{C}_A\hat{\boldsymbol{\mu}} - \mathbf{0}$ in the formula for the treatment A sum of squares,

$$SSA = (\mathbf{C}_A\hat{\boldsymbol{\mu}} - \mathbf{0})'\,[\mathbf{C}_A(\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}_A']^{-1}\,(\mathbf{C}_A\hat{\boldsymbol{\mu}} - \mathbf{0}), \tag{1.3}$$

where $\hat{\boldsymbol{\mu}}$ is a vector of sample cell means. Equation 1.3 simplifies to

$$SSA = (\mathbf{C}_A\hat{\boldsymbol{\mu}})'\,[\mathbf{C}_A(\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}_A']^{-1}\,(\mathbf{C}_A\hat{\boldsymbol{\mu}})$$

because $\mathbf{0}$ is a vector of zeros. In the formula, $\mathbf{C}_A$ is a coefficient matrix that defines the null hypothesis, $\hat{\boldsymbol{\mu}} = [(\mathbf{X}'\mathbf{X})^{-1}(\mathbf{X}'\mathbf{y})] = [\bar{Y}_{\cdot 11}, \bar{Y}_{\cdot 12}, \ldots, \bar{Y}_{\cdot 32}]'$, and $\mathbf{X}$ is a structural matrix. The structural matrix for the weight-loss experiment is given in Table 1.7.

TABLE 1.7 Data Vector, y (29 × 1), and X Matrix (29 × 6) for the Cell Means Model

Cell    y     x1  x2  x3  x4  x5  x6
a1b1    7     1   0   0   0   0   0
        ...
        1     1   0   0   0   0   0
a1b2    7     0   1   0   0   0   0
        ...
        9     0   1   0   0   0   0
a2b1    9     0   0   1   0   0   0
        ...
        11    0   0   1   0   0   0
a2b2    10    0   0   0   1   0   0
        ...
        13    0   0   0   1   0   0
a3b1    15    0   0   0   0   1   0
        ...
        8     0   0   0   0   1   0
a3b2    13    0   0   0   0   0   1
        ...
        12    0   0   0   0   0   1

The structural matrix is coded as follows:

x1 = 1 if an observation is in a1b1, 0 otherwise
x2 = 1 if an observation is in a1b2, 0 otherwise
x3 = 1 if an observation is in a2b1, 0 otherwise
...
x6 = 1 if an observation is in a3b2, 0 otherwise

For the weight-loss data, the sum of squares for treatment A is

$$SSA = (\mathbf{C}_A\hat{\boldsymbol{\mu}})'\,[\mathbf{C}_A(\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}_A']^{-1}\,(\mathbf{C}_A\hat{\boldsymbol{\mu}}) = 127.4038$$

with $p - 1 = 2$ degrees of freedom.

The null hypothesis for treatment B is

$$H_0\colon \mu_{\cdot 1} = \mu_{\cdot 2}.$$
An equivalent null hypothesis that is used with the cell means model is

$$H_0\colon \mu_{\cdot 1} - \mu_{\cdot 2} = 0.$$

In terms of cell means, this hypothesis is expressed as

$$H_0\colon \frac{\mu_{11} + \mu_{21} + \mu_{31}}{3} - \frac{\mu_{12} + \mu_{22} + \mu_{32}}{3} = 0.$$

In matrix notation, the null hypothesis is

$$H_0\colon \underset{(q-1)\times h}{\mathbf{C}_B}\;\underset{h\times 1}{\boldsymbol{\mu}} = \underset{(q-1)\times 1}{\mathbf{0}}, \qquad \frac{1}{3}\begin{bmatrix} 1 & -1 & 1 & -1 & 1 & -1 \end{bmatrix}\begin{bmatrix} \mu_{11} \\ \mu_{12} \\ \mu_{21} \\ \mu_{22} \\ \mu_{31} \\ \mu_{32} \end{bmatrix} = [0],$$

where q is the number of levels of treatment B and h is the number of cell means. The sum of squares for treatment B is

$$SSB = (\mathbf{C}_B\hat{\boldsymbol{\mu}})'\,[\mathbf{C}_B(\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}_B']^{-1}\,(\mathbf{C}_B\hat{\boldsymbol{\mu}}) = 76.0500$$

with $q - 1 = 1$ degree of freedom.

The null hypothesis for the A × B interaction is

$$H_0\colon \mu_{jk} - \mu_{jk'} - \mu_{j'k} + \mu_{j'k'} = 0 \quad \text{for all } j \text{ and } k.$$

For the weight-loss data, the interaction null hypothesis is

$$H_0\colon \begin{aligned} \mu_{11} - \mu_{12} - \mu_{21} + \mu_{22} &= 0 \\ \mu_{21} - \mu_{22} - \mu_{31} + \mu_{32} &= 0. \end{aligned}$$

The two rows of the null hypothesis correspond to the two sets of means connected by crossed lines in Figure 1.9. In matrix notation, the null hypothesis is

$$H_0\colon \underset{(p-1)(q-1)\times h}{\mathbf{C}_{A\times B}}\;\underset{h\times 1}{\boldsymbol{\mu}} = \underset{(p-1)(q-1)\times 1}{\mathbf{0}}, \qquad \begin{bmatrix} 1 & -1 & -1 & 1 & 0 & 0 \\ 0 & 0 & 1 & -1 & -1 & 1 \end{bmatrix}\begin{bmatrix} \mu_{11} \\ \mu_{12} \\ \mu_{21} \\ \mu_{22} \\ \mu_{31} \\ \mu_{32} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$

Figure 1.9 Two interaction terms of the form $\mu_{jk} - \mu_{jk'} - \mu_{j'k} + \mu_{j'k'}$ are obtained from the crossed lines by subtracting the two $\mu_{jk}$s connected by a dashed line from the two $\mu_{jk}$s connected by a solid line.

The sum of squares for the A × B interaction is

$$SSA{\times}B = (\mathbf{C}_{A\times B}\hat{\boldsymbol{\mu}})'\,[\mathbf{C}_{A\times B}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}_{A\times B}']^{-1}\,(\mathbf{C}_{A\times B}\hat{\boldsymbol{\mu}}) = 27.7885$$

with $(p-1)(q-1) = 2$ degrees of freedom. The within-cell sum of squares is

$$SSWCELL = \mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\mu}}'(\mathbf{X}'\mathbf{y}) = 360.7500,$$

where $\mathbf{y}$ is the vector of weight-loss observations, $[7\;13\;9\;\ldots\;12]'$. The within-cell sum of squares has $N - h = 29 - 6 = 23$ degrees of freedom. The total sum of squares is

$$SSTO = \mathbf{y}'\mathbf{y} - \mathbf{y}'\mathbf{J}\mathbf{y}N^{-1} = 595.7931,$$

where $\mathbf{J}$ is a $29 \times 29$ matrix of ones and $N = 29$, the number of weight-loss observations. The total sum of squares has $N - 1 = 28$ degrees of freedom.

The analysis of the weight-loss data is summarized in Table 1.8. The F statistics in Table 1.8 are identical to those in Table 1.6, where the regression model was used.

The cell means model is extremely versatile. It can be used when observations are missing and when entire cells are missing. It allows a researcher to test hypotheses about any linear combination of population cell means. It has an important advantage over the regression model. With the cell means model, there is never any ambiguity about the hypothesis that is tested because an estimator of the null hypothesis, $\mathbf{C}\hat{\boldsymbol{\mu}} - \mathbf{0}$, appears in the formula for a sum of squares. Lack of space prevents a discussion of the many other advantages of the model; the reader is referred to Kirk (1995, pp. 289–301, 413–431). However, before leaving the subject, the model will be used to test a null hypothesis for weighted means.
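Equation 1.3 and its B and A × B counterparts can be sketched in a few lines of Python. Because each column of $\mathbf{X}$ is a cell indicator, $\mathbf{X}'\mathbf{X}$ is diagonal with the cell sizes $n_{jk}$ on the diagonal, so $(\mathbf{X}'\mathbf{X})^{-1} = \mathrm{diag}(1/n_{jk})$ and $\hat{\boldsymbol{\mu}}$ is just the vector of cell means. The scores below are the Table 1.9 data grouped by cell; treating the missing $Y_{322}$ as the value 7 in cell $a_2b_2$ is an assumption, chosen because it reproduces the $a_2b_2$ mean of 10.75 used in the weighted-means example that follows. The function name `hypothesis_ss` is hypothetical, and the 2 × 2 inverse is written out by hand only to keep the sketch dependency-free.

```python
from fractions import Fraction as Fr

# Weight-loss scores by cell (a_j, b_k) from Table 1.9. Assumption: the missing
# Y_322 is the score 7 in cell a2b2, which leaves that cell with mean 10.75.
cells = {
    (1, 1): [5, 7, 1, 9, 13],
    (1, 2): [4, 7, 14, 9, 11],
    (2, 1): [7, 4, 9, 11, 14],
    (2, 2): [5, 13, 15, 10],
    (3, 1): [8, 5, 10, 12, 15],
    (3, 2): [13, 16, 12, 20, 19],
}
order = sorted(cells)                                     # a1b1, a1b2, ..., a3b2
n = [len(cells[c]) for c in order]                        # diagonal of X'X
mu_hat = [Fr(sum(cells[c]), len(cells[c])) for c in order]  # (X'X)^-1 X'y


def hypothesis_ss(C):
    """Equation 1.3: (C mu_hat)' [C (X'X)^-1 C']^-1 (C mu_hat), for C with 1 or 2 rows."""
    v = [sum(Fr(c) * m for c, m in zip(row, mu_hat)) for row in C]
    # C (X'X)^-1 C', using (X'X)^-1 = diag(1 / n_jk)
    M = [[sum(Fr(a) * Fr(b) / nj for a, b, nj in zip(r1, r2, n)) for r2 in C] for r1 in C]
    if len(C) == 1:
        return v[0] ** 2 / M[0][0]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    inv = [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]
    return sum(v[i] * inv[i][j] * v[j] for i in range(2) for j in range(2))


half, third = Fr(1, 2), Fr(1, 3)
C_A = [[half, half, -half, -half, 0, 0], [0, 0, half, half, -half, -half]]
C_B = [[third, -third] * 3]
C_AB = [[1, -1, -1, 1, 0, 0], [0, 0, 1, -1, -1, 1]]

# Weighted-means version of C_A: entries are +/- n_jk / n_j. and zero.
n_j = [n[0] + n[1], n[2] + n[3], n[4] + n[5]]
C_A_weighted = [
    [Fr(n[0], n_j[0]), Fr(n[1], n_j[0]), -Fr(n[2], n_j[1]), -Fr(n[3], n_j[1]), 0, 0],
    [0, 0, Fr(n[2], n_j[1]), Fr(n[3], n_j[1]), -Fr(n[4], n_j[2]), -Fr(n[5], n_j[2])],
]

y = [score for c in order for score in cells[c]]
ss_wcell = sum(s * s for s in y) - sum(m * sum(cells[c]) for c, m in zip(order, mu_hat))
ss_total = sum(s * s for s in y) - Fr(sum(y) ** 2, len(y))

results = {
    "SSA": hypothesis_ss(C_A),
    "SSB": hypothesis_ss(C_B),
    "SSAxB": hypothesis_ss(C_AB),
    "SSWCELL": ss_wcell,
    "SSTO": ss_total,
    "SSA (weighted)": hypothesis_ss(C_A_weighted),
}
for name, value in results.items():
    print(f"{name:>14}: {float(value):9.4f}")
```

Under these assumptions the printed values match Table 1.8 (127.4038, 76.0500, 27.7885, 360.7500, 595.7931) and the weighted-means SSA of 128.2375. Only the small bracketed matrix needs inverting; a larger design would normally hand that step to a linear-algebra library.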
Occasionally, researchers collect data in which the sample sizes are proportional to the population sizes. This might occur, for example, in survey research. When cell ns are unequal, a researcher has a choice between computing unweighted means or weighted means. Unweighted means are simple averages of cell means. These are the means that were used in the previous analyses. Weighted means are weighted averages of cell means in which the weights are the sample cell sizes, $n_{jk}$. Consider again the weight-loss data in which observation $Y_{322}$ is missing. Unweighted and weighted sample means for treatment level $a_2$, where observation $Y_{322}$ is missing, are, respectively,

$$\hat{\mu}_{2\cdot} = \frac{\hat{\mu}_{21} + \hat{\mu}_{22}}{q} = \frac{9.00 + 10.75}{2} = 9.88$$

$$\hat{\mu}_{2\cdot} = \frac{n_{21}\hat{\mu}_{21} + n_{22}\hat{\mu}_{22}}{n_{2\cdot}} = \frac{5(9.00) + 4(10.75)}{9} = 9.78;$$

$n_{j\cdot}$ is the number of observations in the jth level of treatment A. The null hypothesis using weighted cell means for treatment A is

$$H_0\colon \begin{aligned} \frac{n_{11}\mu_{11} + n_{12}\mu_{12}}{n_{1\cdot}} - \frac{n_{21}\mu_{21} + n_{22}\mu_{22}}{n_{2\cdot}} &= 0 \\ \frac{n_{21}\mu_{21} + n_{22}\mu_{22}}{n_{2\cdot}} - \frac{n_{31}\mu_{31} + n_{32}\mu_{32}}{n_{3\cdot}} &= 0. \end{aligned}$$

The coefficient matrix for computing SSA is

$$\mathbf{C}_A = \begin{bmatrix} \frac{5}{10} & \frac{5}{10} & -\frac{5}{9} & -\frac{4}{9} & 0 & 0 \\ 0 & 0 & \frac{5}{9} & \frac{4}{9} & -\frac{5}{10} & -\frac{5}{10} \end{bmatrix},$$

where the entries in $\mathbf{C}_A$ are $\pm n_{jk}/n_{j\cdot}$ and zero. The sum of squares and mean square for treatment A are, respectively,

$$SSA = (\mathbf{C}_A\hat{\boldsymbol{\mu}})'\,[\mathbf{C}_A(\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}_A']^{-1}\,(\mathbf{C}_A\hat{\boldsymbol{\mu}}) = 128.2375$$

$$MSA = SSA/(p-1) = 128.2375/(3-1) = 64.1188.$$

The F statistic and p value for treatment A are

$$F = \frac{MSA}{MSWCELL} = \frac{64.1188}{15.6848} = 4.09, \qquad p = .030,$$

where MSWCELL is obtained from Table 1.8. The null hypothesis is rejected. This is another example of the versatility of the cell means model. A researcher can test hypotheses about any linear combination of population cell means.

In most research situations, sample sizes are not proportional to population sizes. Unless a researcher has a compelling reason to weight the sample means proportional to the sample sizes, unweighted means should be used.

TABLE 1.8 Analysis of Variance for the Weight-Loss Data (Observation Y322 is missing)

Source                    SS         df                    MS        F      p
Treatment A (Diet)        127.4038   p − 1 = 2             63.7019   4.06   .031
Treatment B (Exercise)     76.0500   q − 1 = 1             76.0500   4.85   .038
A × B                      27.7885   (p − 1)(q − 1) = 2    13.8943   0.89   .426
Within cell               360.7500   N − h = 23            15.6848
Total                     595.7931   N − 1 = 28

Randomized Block Factorial Design

Next I describe a factorial design that is constructed from two randomized block designs. The design is called a randomized block factorial design and is denoted by RBF-pq. The RBF-pq design is obtained by combining the levels of an RB-p design with those of an RB-q design so that each level of the RB-p design appears once with each level of the RB-q design and vice versa. The design uses the blocking technique described in connection with an RB-p design to isolate variation attributable to a nuisance variable while simultaneously evaluating two or more treatments and associated interactions.

In discussing the weight-loss experiment, I hypothesized that ease of losing weight is related to the amount by which a girl is overweight. If the hypothesis is correct, a researcher can improve on the CRF-32 design by isolating this nuisance variable. Suppose that instead of randomly assigning 30 girls to the six treatment combinations in the diet experiment, the researcher formed blocks of six girls such that the girls in a block are overweight by about the same amount. One way to form the blocks is to rank the girls from the least to the most overweight. The six least overweight girls are assigned to block 1. The next six girls are assigned to block 2 and so on. In this example, five blocks of dependent samples can be formed from the 30 participants. Once the girls have been assigned to the blocks, the girls in each block are randomly assigned to the six treatment combinations. The layout for this experiment is shown in Figure 1.10.

The classical model equation for the experiment is

$$Y_{ijk} = \mu + \pi_i + \alpha_j + \beta_k + (\alpha\beta)_{jk} + (\alpha\beta\pi)_{jki} \quad (i = 1, \ldots, n;\; j = 1, \ldots, p;\; k = 1, \ldots, q),$$
where

  $Y_{ijk}$ is the weight loss for the participant in Block$_i$ and treatment combination $a_jb_k$.
  $\mu$ is the grand mean of the six weight-loss population means.
  $\pi_i$ is the block effect for population i and is equal to $\mu_{i\cdot\cdot} - \mu$. It reflects the effect of the nuisance variable in Block$_i$.
  $\alpha_j$ is the treatment effect for population $a_j$ and is equal to $\mu_{\cdot j\cdot} - \mu$. It reflects the effect of diet $a_j$.
  $\beta_k$ is the treatment effect for population $b_k$ and is equal to $\mu_{\cdot\cdot k} - \mu$. It reflects the effects of exercise condition $b_k$.
  $(\alpha\beta)_{jk}$ is the interaction effect for populations $a_j$ and $b_k$ and is equal to $\mu_{\cdot jk} - \mu_{\cdot j\cdot} - \mu_{\cdot\cdot k} + \mu$.
  $(\alpha\beta\pi)_{jki}$ is the residual error effect for treatment combination $a_jb_k$ and Block$_i$.

Figure 1.10 Layout for a two-treatment randomized block factorial design (RBF-32 design). Each block contains six girls who are overweight by about the same amount. The girls in a block are randomly assigned to the six treatment combinations.

The design enables a researcher to test four null hypotheses:

  $H_0\colon \mu_{1\cdot\cdot} = \mu_{2\cdot\cdot} = \cdots = \mu_{5\cdot\cdot}$ (Block population means are equal.)
  $H_0\colon \mu_{\cdot 1\cdot} = \mu_{\cdot 2\cdot} = \mu_{\cdot 3\cdot}$ (Treatment A population means are equal.)
  $H_0\colon \mu_{\cdot\cdot 1} = \mu_{\cdot\cdot 2}$ (Treatment B population means are equal.)
  $H_0\colon \mu_{\cdot jk} - \mu_{\cdot jk'} - \mu_{\cdot j'k} + \mu_{\cdot j'k'} = 0$ for all j and k (All A × B interaction effects equal zero.)

The hypothesis that the block population means are equal is of little interest because the blocks represent different amounts by which the girls are overweight.

The data for the RBF-32 design are shown in Table 1.9. The same data were analyzed earlier using a CRF-32 design. Each block in Table 1.9 contains six girls who at the beginning of the experiment were overweight by about the same amount. The ANOVA for these data is given in Table 1.10.

TABLE 1.9 Weight-Loss Data for the Diet (aj) and Exercise Conditions (bk)

          a1b1   a1b2   a2b1   a2b2   a3b1   a3b2
Block1      5      4      7      5      8     13
Block2      7      7      4      7      5     16
Block3      1     14      9     13     10     12
Block4      9      9     11     15     12     20
Block5     13     11     14     10     15     19

TABLE 1.10 Analysis of Variance for the Weight-Loss Data

Source                      SS         df   MS        F      p
Blocks                      209.3333    4   52.3333   6.43   .002
Treatments                  234.1667    5
  Treatment A (Diet)        131.6667    2   65.8334   8.09   .003
  Treatment B (Exercise)     67.5000    1   67.5000   8.30   .009
  A × B                      35.0000    2   17.5000   2.15   .142
Residual                    162.6667   20    8.1333
Total                       606.1667   29

A comparison of Table 1.10 with Table 1.4 reveals that the RBF-32 design is more powerful than the CRF-32 design. Consider, for example, treatment A. The F statistic for the randomized block factorial design is F(2, 20) = 8.09, p = .003; the F for the completely randomized factorial design is F(2, 24) = 4.25, p = .026. The randomized block factorial design is more powerful because the nuisance variable (the amount by which participants are overweight) has been removed from the residual error variance. A schematic partition of the total sum of squares and degrees of freedom for the two designs is shown in Figure 1.11.
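The partition behind Table 1.10 can be reproduced from the Table 1.9 scores with the usual computational formulas for a balanced design: each sum of squares is a sum of squared totals divided by the number of scores in each total, minus the correction term $(\sum Y)^2/N$. The sketch below is illustrative; the helper `ss_from_totals` is a hypothetical name. It also shows numerically the point made with Figure 1.11: SSBLOCKS plus SSRESIDUAL recovers the CRF-32 within-cell sum of squares of 372.0.

```python
from fractions import Fraction

# Table 1.9: rows Block1..Block5; columns a1b1, a1b2, a2b1, a2b2, a3b1, a3b2
data = [
    [5, 4, 7, 5, 8, 13],
    [7, 7, 4, 7, 5, 16],
    [1, 14, 9, 13, 10, 12],
    [9, 9, 11, 15, 12, 20],
    [13, 11, 14, 10, 15, 19],
]
p, q = 3, 2
n = len(data)                                            # number of blocks
columns = [(j, k) for j in range(p) for k in range(q)]   # (diet, exercise) of each column
scores = [y for row in data for y in row]
N = len(scores)
correction = Fraction(sum(scores) ** 2, N)               # (sum Y)^2 / N


def ss_from_totals(totals, size):
    """Sum of squared totals divided by the number of scores per total, minus correction."""
    return Fraction(sum(t * t for t in totals), size) - correction


cell_totals = [sum(row[c] for row in data) for c in range(p * q)]
a_totals = [sum(t for t, (j, _) in zip(cell_totals, columns) if j == jj) for jj in range(p)]
b_totals = [sum(t for t, (_, k) in zip(cell_totals, columns) if k == kk) for kk in range(q)]

ss_total = sum(y * y for y in scores) - correction
ss_blocks = ss_from_totals([sum(row) for row in data], p * q)
ss_cells = ss_from_totals(cell_totals, n)
ss_a = ss_from_totals(a_totals, n * q)
ss_b = ss_from_totals(b_totals, n * p)
ss_ab = ss_cells - ss_a - ss_b
ss_res = ss_total - ss_blocks - ss_cells
ms_res = ss_res / ((n - 1) * (p * q - 1))

for name, ss in [("Blocks", ss_blocks), ("A", ss_a), ("B", ss_b),
                 ("A x B", ss_ab), ("Residual", ss_res), ("Total", ss_total)]:
    print(f"{name:>8}: {float(ss):9.4f}")
print(f"F for treatment A: {float(ss_a / (p - 1) / ms_res):.2f}")
print(f"SSBLOCKS + SSRESIDUAL = {float(ss_blocks + ss_res):.1f}")  # the CRF-32 SSWCELL
```

Exact fractions are used so the sums of squares match the table to the last digit; floating point would work equally well for reporting to four decimals.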
It is apparent from Figure 1.11 that the SSRESIDUAL will always be smaller than the SSWCELL if the SSBLOCKS is greater than zero. The larger the SSBLOCKS in a randomized block factorial design, the greater the reduction in the SSRESIDUAL.

Figure 1.11 Schematic partition of the total sum of squares and degrees of freedom for CRF-32 and RBF-32 designs (CRF-32: SSWCELL = 372.0 with pq(n − 1) = 24 df; RBF-32: SSRES = 162.7 with (n − 1)(pq − 1) = 20 df). The shaded rectangles indicate the sums of squares that are used to compute the error variance for each design: MSWCELL = SSWCELL/pq(n − 1) and MSRES = SSRES/(n − 1)(pq − 1). If the nuisance variable (SSBL) in the RBF-32 design accounts for an appreciable portion of the total sum of squares, the design will have a smaller error variance and, hence, greater power than the CRF-32 design.

FACTORIAL DESIGNS WITH CONFOUNDING

Split-Plot Factorial Design

As we have just seen, an important advantage of a randomized block factorial design relative to a completely randomized factorial design is greater power. However, if either p or q in a two-treatment randomized block factorial design is moderately large, the number of treatment combinations in each block can be prohibitively large. For example, an RBF-45 design has blocks of size 4 × 5 = 20. Obtaining blocks with 20 matched participants or observing each participant 20 times is generally not feasible. In the late 1920s Ronald A. Fisher and Frank Yates addressed the problem of prohibitively large block sizes by developing confounding schemes in which only a portion of the treatment combinations in an experiment are assigned to each block. Their work was extended in the 1940s by David J. Finney (1945, 1946) and Oscar Kempthorne (1947). One design that achieves a reduction in block size is the two-treatment split-plot factorial design. The term split-plot comes from agricultural experimentation in which the levels of, say, treatment A are applied to relatively large plots of land (the whole plots). The whole plots are then split or subdivided, and the levels of treatment B are applied to the subplots within each whole plot.

A two-treatment split-plot factorial design is constructed by combining two building block designs: a completely randomized design having p levels of treatment A and a randomized block design having q levels of treatment B. The assignment of participants to the treatment combinations is carried out in two stages. Consider the weight-loss experiment again. Suppose that we ranked the 30 participants from least to most overweight. The participants ranked 1 and 2 are assigned to block 1, those ranked 3 and 4 are assigned to block 2, and so on. This procedure produces 15 blocks each containing two girls who are similar with respect to being overweight. In the first stage of randomization the 15 blocks of girls are randomly assigned to the three levels of treatment A with five blocks in each level. In the second stage of randomization the two girls in each block are randomly assigned to the two levels of treatment B. An exception to this randomization procedure must be made when treatment B is a temporal variable such as successive learning trials or periods of time. Trial 2, for example, cannot occur before Trial 1.
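The two-stage randomization just described can be sketched as follows. This is an illustration, not a prescribed procedure: participants are identified only by their overweight rank (1 = least overweight), the function name `spf_assignment` is hypothetical, and a seeded `random.Random` is used only so that a run can be reproduced. Stage 1 shuffles the 15 blocks and deals five to each diet; stage 2 shuffles the two girls within each block across the two exercise conditions, the step that would be dropped when treatment B is a temporal variable.

```python
import random


def spf_assignment(n_participants=30, p=3, q=2, seed=None):
    """Two-stage randomization for an SPF-p.q design.

    Participants are labeled by overweight rank (1 = least overweight).
    Returns a dict mapping (block, a_level, b_level) -> participant rank.
    """
    rng = random.Random(seed)
    ranks = list(range(1, n_participants + 1))
    # Form blocks of q adjacent ranks: (1, 2), (3, 4), ...
    blocks = [ranks[i:i + q] for i in range(0, n_participants, q)]
    if len(blocks) % p != 0:
        raise ValueError("number of blocks must be a multiple of p")

    # Stage 1: randomly assign whole blocks to the p levels of treatment A,
    # with the same number of blocks at each level.
    block_ids = list(range(len(blocks)))
    rng.shuffle(block_ids)
    per_level = len(blocks) // p
    a_level_of = {b: idx // per_level + 1 for idx, b in enumerate(block_ids)}

    # Stage 2: within each block, randomly assign the q members to the levels of B.
    assignment = {}
    for b, members in enumerate(blocks):
        shuffled = members[:]
        rng.shuffle(shuffled)
        for k, person in enumerate(shuffled, start=1):
            assignment[(b + 1, a_level_of[b], k)] = person
    return assignment


plan = spf_assignment(seed=2024)
for (block, a, b), person in sorted(plan.items())[:4]:
    print(f"Block{block}: a{a}b{b} <- participant ranked {person}")
```

Dealing shuffled block indices in equal runs enforces the restriction that each level of A receives the same number of blocks.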
The layout for a split-plot factorial design with three levels of treatment A and two levels of treatment B is shown in Figure 1.12. Treatment A is called a between-blocks treatment; B is a within-blocks treatment. The designation for a two-treatment split-plot factorial design is SPF-p·q. The p preceding the dot denotes the number of levels of the between-blocks treatment; the q after the dot denotes the number of levels of the within-blocks treatment. Hence, the design in Figure 1.12 is an SPF-3·2 design.

Figure 1.12 Layout for a two-treatment split-plot factorial design (SPF-3·2 design). The 3n blocks are randomly assigned to the p = 3 levels of treatment A with the restriction that n blocks are assigned to each level of A. The n blocks assigned to each level of treatment A constitute a group of blocks. In the second stage of randomization, the two matched participants in a block are randomly assigned to the q = 2 levels of treatment B.

An RBF-32 design contains 3 × 2 = 6 treatment combinations and has blocks of size six. The SPF-3·2 design in Figure 1.12 contains the same six treatment combinations, but the block size is only two. The advantage of the split-plot factorial design, a smaller block size, is achieved by confounding groups of blocks with treatment A. Consider the sample means $\bar{Y}_{\cdot 1\cdot}$, $\bar{Y}_{\cdot 2\cdot}$, and $\bar{Y}_{\cdot 3\cdot}$ in Figure 1.12. The differences among the means reflect the differences among the three groups as well as the differences among the three levels of treatment A. To put it another way, we cannot tell how much of the differences among the three sample means is attributable to the differences among Group1, Group2, and Group3 and how much is attributable to the differences among treatment levels a1, a2, and a3. For this reason, the three groups and treatment A are said to be completely confounded.

The use of confounding to reduce the block size in an SPF-p·q design involves a tradeoff that needs to be made explicit. The RBF-32 design uses the same error variance, MSRESIDUAL, to test hypotheses for treatments A and B and the A × B interaction. The two-treatment split-plot factorial design, however, uses two error variances. MSBLOCKS within A, denoted by MSBL(A), is used to test treatment A; a different and usually much smaller error variance, MSRESIDUAL, is used to test treatment B and the A × B interaction. As a result, the power of the tests for B and the A × B interaction is greater than that for A. Hence, a split-plot factorial design is a good design choice if a researcher is more interested in treatment B and the A × B interaction than in treatment A. When both treatments and the A × B interaction are of equal interest, a randomized block factorial design is a better choice if the larger block size is acceptable. If a large block size is not acceptable and the researcher is primarily interested in treatments A and B, an alternative design choice is the confounded factorial design. This design, which is described later, achieves a reduction in block size by confounding groups of blocks with the A × B interaction. As a result, tests of treatments A and B are more powerful than the test of the A × B interaction.

Earlier, an RBF-32 design was used for the weight-loss experiment because the researcher was interested in tests of treatments A and B and the A × B interaction. For purposes of comparison, I analyze the same weight-loss data as if an SPF-3·2 design had been used even though, as we will see, this is not a good design choice. But first I describe the classical model equation for a two-treatment split-plot factorial design.

The classical model equation for the weight-loss experiment is

$$Y_{ijk} = \mu + \alpha_j + \pi_{i(j)} + \beta_k + (\alpha\beta)_{jk} + (\beta\pi)_{ki(j)} \quad (i = 1, \ldots, n;\; j = 1, \ldots, p;\; k = 1, \ldots, q),$$
where

  $Y_{ijk}$ is the weight loss for the participant in Block$_{i(j)}$ and treatment combination $a_jb_k$.
  $\mu$ is the grand mean of the six weight-loss population means.
  $\alpha_j$ is the treatment effect for population $a_j$ and is equal to $\mu_{\cdot j\cdot} - \mu$. It reflects the effect of diet $a_j$.
  $\pi_{i(j)}$ is the block effect for population i and is equal to $\mu_{ij\cdot} - \mu_{\cdot j\cdot}$. The block effect is nested within $a_j$.
  $\beta_k$ is the treatment effect for population $b_k$ and is equal to $\mu_{\cdot\cdot k} - \mu$. It reflects the effects of exercise condition $b_k$.
  $(\alpha\beta)_{jk}$ is the interaction effect for populations $a_j$ and $b_k$ and is equal to $\mu_{\cdot jk} - \mu_{\cdot j\cdot} - \mu_{\cdot\cdot k} + \mu$.
  $(\beta\pi)_{ki(j)}$ is the residual error effect for treatment level $b_k$ and Block$_{i(j)}$ and is equal to $Y_{ijk} - \mu - \alpha_j - \pi_{i(j)} - \beta_k - (\alpha\beta)_{jk}$.

The design enables a researcher to test three null hypotheses:

  $H_0\colon \mu_{\cdot 1\cdot} = \mu_{\cdot 2\cdot} = \mu_{\cdot 3\cdot}$ (Treatment A population means are equal.)
  $H_0\colon \mu_{\cdot\cdot 1} = \mu_{\cdot\cdot 2}$ (Treatment B population means are equal.)
  $H_0\colon \mu_{\cdot jk} - \mu_{\cdot jk'} - \mu_{\cdot j'k} + \mu_{\cdot j'k'} = 0$ for all j and k (All A × B interaction effects equal zero.)

The weight-loss data from Tables 1.2 and 1.9 are recast in the form of an SPF-3·2 design in Table 1.11. The ANOVA for these data is given in Table 1.12. The null hypothesis for treatment B can be rejected. However, the null hypotheses for treatment A and the A × B interaction cannot be rejected. The denominator of the F statistic for treatment A [MSBL(A) = 20.1667] is almost twice as large as the denominator for the tests of B and A × B (MSRES = 10.8333). A feeling for the relative power of the test of treatment A for the SPF-3·2, CRF-32, and RBF-32 designs can be obtained by comparing their F statistics and p values:

Treatment A

$$\begin{aligned}
\text{SPF-}3{\cdot}2\text{ design:}\quad F &= \frac{131.6667/2}{242.0000/12} = \frac{65.8334}{20.1667} = 3.26, & p &= .074\\
\text{CRF-32 design:}\quad F &= \frac{131.6667/2}{372.0000/24} = \frac{65.8334}{15.5000} = 4.25, & p &= .026\\
\text{RBF-32 design:}\quad F &= \frac{131.6667/2}{162.6667/20} = \frac{65.8334}{8.1333} = 8.09, & p &= .003
\end{aligned}$$

For testing treatment A, the SPF-3·2 design is the least powerful. Clearly, if one's primary interest is in the effectiveness of the three diets, the SPF-3·2 design is a poor choice. However, the SPF-3·2 design fares somewhat better if one's primary interests are in treatment B and the A × B interaction:

Treatment B

$$\begin{aligned}
\text{SPF-}3{\cdot}2\text{ design:}\quad F &= \frac{67.5000/1}{130.0000/12} = \frac{67.5000}{10.8333} = 6.23, & p &= .028\\
\text{CRF-32 design:}\quad F &= \frac{67.5000/1}{372.0000/24} = \frac{67.5000}{15.5000} = 4.35, & p &= .048\\
\text{RBF-32 design:}\quad F &= \frac{67.5000/1}{162.6667/20} = \frac{67.5000}{8.1333} = 8.30, & p &= .009
\end{aligned}$$

A × B interaction

$$\begin{aligned}
\text{SPF-}3{\cdot}2\text{ design:}\quad F &= \frac{35.0000/2}{130.0000/12} = \frac{17.5000}{10.8333} = 1.62, & p &= .239\\
\text{CRF-32 design:}\quad F &= \frac{35.0000/2}{372.0000/24} = \frac{17.5000}{15.5000} = 1.13, & p &= .340\\
\text{RBF-32 design:}\quad F &= \frac{35.0000/2}{162.6667/20} = \frac{17.5000}{8.1333} = 2.15, & p &= .142
\end{aligned}$$

TABLE 1.11 Weight-Loss Data for the Diet (aj) and Exercise Conditions (bk)

                          Treatment   Treatment
                          level b1    level b2
Group1 (a1)   Block1          5           4
              Block2          7           7
              Block3          1          14
              Block4          9           9
              Block5         13          11
Group2 (a2)   Block6          7           5
              Block7          4           7
              Block8          9          13
              Block9         11          15
              Block10        14          10
Group3 (a3)   Block11         8          13
              Block12         5          16
              Block13        10          12
              Block14        12          20
              Block15        15          19

TABLE 1.12 Analysis of Variance for the Weight-Loss Data

Source                        SS         df   MS        F              p
1. Between blocks             373.6667   14
2. Treatment A (Diet)         131.6667    2   65.8334   [2/3]a 3.26    .074
3. Blocks within A            242.0000   12   20.1667
4. Within blocks              232.5000   15
5. Treatment B (Exercise)      67.5000    1   67.5000   [5/7] 6.23     .028
6. A × B                       35.0000    2   17.5000   [6/7] 1.62     .239
7. Residual                   130.0000   12   10.8333
8. Total                      606.1667   29

a The fraction [2/3] indicates that the F statistic was obtained by dividing the mean square in row two by the mean square in row three.
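The split-plot partition in Table 1.12 can be reproduced from Table 1.11 in the same way as the randomized block partition; what changes is that the between-pairs variation is split into treatment A and blocks within A. The sketch below assumes the balanced Table 1.11 layout (15 blocks of two scores, five blocks per diet), and the function name `spf_anova` is hypothetical. It also forms the degrees-of-freedom-weighted average of the two error variances, the quantity the text compares with MSWCELL next.

```python
from fractions import Fraction

# Table 1.11: for each diet a1, a2, a3, five blocks of (b1, b2) scores
groups = [
    [(5, 4), (7, 7), (1, 14), (9, 9), (13, 11)],       # a1: Block1-Block5
    [(7, 5), (4, 7), (9, 13), (11, 15), (14, 10)],     # a2: Block6-Block10
    [(8, 13), (5, 16), (10, 12), (12, 20), (15, 19)],  # a3: Block11-Block15
]


def spf_anova(groups):
    """Sums of squares, df, and mean squares for a balanced SPF-p.q design."""
    p, n, q = len(groups), len(groups[0]), len(groups[0][0])
    scores = [y for g in groups for blk in g for y in blk]
    cf = Fraction(sum(scores) ** 2, len(scores))         # correction term
    ss_total = sum(y * y for y in scores) - cf
    ss_between = Fraction(sum(sum(blk) ** 2 for g in groups for blk in g), q) - cf
    ss_a = Fraction(sum(sum(map(sum, g)) ** 2 for g in groups), n * q) - cf
    b_totals = [sum(blk[k] for g in groups for blk in g) for k in range(q)]
    ss_b = Fraction(sum(t * t for t in b_totals), n * p) - cf
    cell_totals = [sum(blk[k] for blk in g) for g in groups for k in range(q)]
    ss_ab = Fraction(sum(t * t for t in cell_totals), n) - cf - ss_a - ss_b
    ss = {
        "A": ss_a,
        "BL(A)": ss_between - ss_a,                      # blocks within A
        "B": ss_b,
        "AxB": ss_ab,
        "RES": ss_total - ss_between - ss_b - ss_ab,     # within blocks minus B and AxB
    }
    df = {"A": p - 1, "BL(A)": p * (n - 1), "B": q - 1,
          "AxB": (p - 1) * (q - 1), "RES": p * (n - 1) * (q - 1)}
    ms = {key: ss[key] / df[key] for key in ss}
    return ss, df, ms


ss, df, ms = spf_anova(groups)
print("SSBL(A) =", float(ss["BL(A)"]), " SSRES =", float(ss["RES"]))
print("F(A)   =", round(float(ms["A"] / ms["BL(A)"]), 2))   # tested against MSBL(A)
print("F(B)   =", round(float(ms["B"] / ms["RES"]), 2))     # tested against MSRES
print("F(AxB) =", round(float(ms["AxB"] / ms["RES"]), 2))   # tested against MSRES

# df-weighted average of the two error variances
pooled = (df["BL(A)"] * ms["BL(A)"] + df["RES"] * ms["RES"]) / (df["BL(A)"] + df["RES"])
print("pooled error variance =", float(pooled))
```

The two error terms come out as 20.1667 and 10.8333, and their pooled value is 15.5, the CRF-32 MSWCELL.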
The SPF-3·2 design is the first design I have described that involves two different building block designs: a CR-p design and an RB-q design. Also, it is the first design that has two error variances: one for testing the between-blocks effects and another for testing the within-blocks effects. A weighted average of the two error variances is equal to MSWCELL in a CRF-pq design, where the weights are the degrees of freedom of the two error variances. This can be shown using the mean squares from Tables 1.4 and 1.12:

$$\frac{p(n-1)\,MSBL(A) + p(n-1)(q-1)\,MSRESIDUAL}{p(n-1) + p(n-1)(q-1)} = MSWCELL$$

$$\frac{3(5-1)(20.1667) + 3(5-1)(2-1)(10.8333)}{3(5-1) + 3(5-1)(2-1)} = 15.5000$$

A schematic partition of the total sum of squares and degrees of freedom for the CRF-32 and SPF-3·2 designs is shown in Figure 1.13.

Figure 1.13 Schematic partition of the total sum of squares and degrees of freedom for CRF-32 and SPF-3·2 designs (CRF-32: SSWCELL = 372.0 with pq(n − 1) = 24 df; SPF-3·2: SSBL(A) = 242.0 with p(n − 1) = 12 df and SSRES = 130.0 with p(n − 1)(q − 1) = 12 df). The shaded rectangles indicate the sums of squares that are used to compute the error variance for each design. The SPF-3·2 design has two error variances: MSBL(A) = SSBL(A)/p(n − 1) is used to test treatment A; MSRES = SSRES/p(n − 1)(q − 1) is used to test treatment B and the A × B interaction. The within-blocks error variance, MSRES, is usually much smaller than the between-blocks error variance, MSBL(A). As a result, tests of treatment B and the A × B interaction are more powerful than the test of treatment A.

Confounded Factorial Designs

As we have seen, an SPF-p·q design is not the best design choice if a researcher's primary interest is in testing treatments A and B. The RBF-pq design is a better choice if blocks of size p × q are acceptable. If this block size is too large, an alternative choice is a two-treatment confounded factorial design. This design confounds an interaction with groups of blocks. As a result, the test of the interaction is less powerful than tests of treatments A and B. Confounded factorial designs are constructed from either a randomized block design or a Latin square design. The designs are denoted by, respectively, RBCF-$p^k$ and LSCF-$p^k$, where RB and LS identify the building block design, C indicates that the interaction is completely confounded with groups of blocks, F indicates a factorial design, and $p^k$ indicates that the design has k treatments each having p levels. The simplest randomized block confounded factorial design has two treatments with two levels each. Consider the RBCF-$2^2$ design in Figure 1.14. The A × B interaction is completely confounded with Group1 and Group2, as I will now show. An interaction effect for treatments A and B has the general form $\mu_{jk} - \mu_{jk'} - \mu_{j'k} + \mu_{j'k'}$. Let $\mu_{ijkz}$ denote the population mean for the ith block, jth level of A, kth level of B, and zth group. For the design in Figure 1.14, the A × B interaction effect is

$$\mu_{\cdot 111} - \mu_{\cdot 122} - \mu_{\cdot 212} + \mu_{\cdot 221}$$

or

$$(\mu_{\cdot 111} + \mu_{\cdot 221}) - (\mu_{\cdot 122} + \mu_{\cdot 212}).$$
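Writing the interaction and the group comparison as coefficient vectors over the four cells of the RBCF-$2^2$ design makes the confounding concrete. In the sketch below (illustrative only), group membership is generated by the parity of j + k, an assumed construction rule that places $a_1b_1$ and $a_2b_2$ in Group1 and $a_1b_2$ and $a_2b_1$ in Group2, as the subscripts $z$ above indicate.

```python
# RBCF-2^2 design: levels j (treatment A) and k (treatment B) are coded 1 or 2.
# Assumed construction: assign cells to groups by the parity of j + k, which puts
# a1b1 and a2b2 in Group1 and a1b2 and a2b1 in Group2 (as in Figure 1.14).
cells = [(1, 1), (1, 2), (2, 1), (2, 2)]
group = {(j, k): 1 if (j + k) % 2 == 0 else 2 for j, k in cells}


def interaction_coef(j, k):
    """Coefficient of mu_jk in mu_11 - mu_12 - mu_21 + mu_22."""
    return (-1) ** (j + k)


def group_coef(j, k):
    """Coefficient of mu_jk in (sum of Group1 cells) - (sum of Group2 cells)."""
    return 1 if group[(j, k)] == 1 else -1


ab_contrast = [interaction_coef(j, k) for j, k in cells]
group_contrast = [group_coef(j, k) for j, k in cells]
print(group)
print(ab_contrast, group_contrast)
print("A x B completely confounded with groups:", ab_contrast == group_contrast)
```

Both vectors come out as [1, -1, -1, 1], so any estimate of the A × B interaction is also an estimate of the Group1 versus Group2 difference.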
The difference between the effects of Group1 and Group2,

$$(\mu_{\cdot 111} + \mu_{\cdot 221}) - (\mu_{\cdot 122} + \mu_{\cdot 212}),$$

involves the same contrast among means as the A × B interaction effect. Hence, the two sets of effects are completely confounded because we cannot determine how much of the difference $(\mu_{\cdot 111} + \mu_{\cdot 221}) - (\mu_{\cdot 122} + \mu_{\cdot 212})$ is attributable to the A × B interaction and how much is attributable to the difference between Group1 and Group2.

Figure 1.14 Layout for a two-treatment randomized block confounded factorial design (RBCF-$2^2$ design). A score in the ith block, jth level of treatment A, kth level of treatment B, and zth group is denoted by $Y_{ijkz}$.

The RBCF-$p^k$ design, like the SPF-p·q design, has two error variances: one for testing the between-blocks effects and a different and usually much smaller error variance for testing the within-blocks effects. In the RBCF-$p^k$ design, treatments A and B are within-block treatments and are evaluated with greater power than the A × B interaction, which is a between-block component. Researchers need to understand the tradeoff that is required when a treatment or interaction is confounded with groups to reduce the size of blocks. The power of the test of the confounded effects is generally less than the power of tests of the unconfounded effects. Hence, if possible, researchers should avoid confounding effects that are the major focus of an experiment. Sometimes, however, confounding is necessary to obtain a reasonable block size. If the power of the confounded effects is not acceptable, the power always can be increased by using a larger number of blocks.

One of the characteristics of the designs that have been described so far is that all of the treatment combinations appear in the experiment. The fractional factorial design that is described next does not share this characteristic. As the name suggests, a fractional factorial design includes only a fraction of the treatment combinations of a complete factorial design.

Fractional Factorial Designs

Two kinds of confounding have been described thus far: group-treatment confounding in an SPF-p·q design and group-interaction confounding in an RBCF-$p^k$ design. A third form of confounding, treatment-interaction confounding, is used in a fractional factorial design. This kind of confounding reduces the number of treatment combinations that must be included in a multitreatment experiment to some fraction ($\frac{1}{2}$, $\frac{1}{3}$, $\frac{1}{4}$, $\frac{1}{8}$, $\frac{1}{9}$, and so on) of the total number of treatment combinations. A CRF-22222 design has 32 treatment combinations. By using a $\frac{1}{2}$ or $\frac{1}{4}$ fractional factorial design, the number of treatment combinations that must be included in the experiment can be reduced to, respectively, $\frac{1}{2}(32) = 16$ or $\frac{1}{4}(32) = 8$.

The theory of fractional factorial designs was developed for $2^k$ and $3^k$ designs by Finney (1945, 1946) and extended by Kempthorne (1947) to designs of the type $p^k$, where p is a prime number that denotes the number of levels of each treatment and k denotes the number of treatments. Fractional factorial designs are most useful for pilot experiments and exploratory research situations that permit follow-up experiments to be performed. Thus, a large number of treatments, typically six or more, can be investigated efficiently in an initial experiment, with subsequent experiments designed to focus on the most promising independent variables.

Fractional factorial designs have much in common with confounded factorial designs. The latter designs achieve a reduction in the number of treatment combinations that must be included in a block. Fractional factorial designs achieve a reduction in the number of treatment combinations in the experiment. The reduction in the size of an experiment comes at a price, however.
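Both the saving and its price can be sketched for the small CRFF-$2^{3-1}$ case. The sketch assumes the common ±1 coding of two-level treatments (level 1 as −1, level 2 as +1) and keeps the half fraction whose A·B·C product is −1, that is, the defining relation I = −ABC. That choice is an assumption here, made because it yields exactly the four combinations $a_1b_1c_1$, $a_1b_2c_2$, $a_2b_1c_2$, $a_2b_2c_1$ whose means appear in Figure 1.15. Two effects are treated as aliases when their contrast columns agree up to sign, so they produce the same sum of squares. The helper names `contrast` and `aliased` are hypothetical.

```python
from itertools import product
from math import prod

# Assumed coding: level 1 of a treatment -> -1, level 2 -> +1.
full = list(product([-1, 1], repeat=3))          # complete 2^3 design: 8 runs of (A, B, C)
# Half fraction with defining relation I = -ABC: keep runs whose A*B*C product is -1.
half = [run for run in full if prod(run) == -1]

level = {-1: 1, 1: 2}
print(["a{}b{}c{}".format(*(level[x] for x in run)) for run in half])

FACTORS = "ABC"


def contrast(effect, runs):
    """Contrast column for an effect such as 'A' or 'BC': product of the factor codes."""
    return [prod(run[FACTORS.index(f)] for f in effect) for run in runs]


def aliased(e1, e2, runs):
    """Effects are aliases if their columns agree up to sign (same sum of squares)."""
    c1, c2 = contrast(e1, runs), contrast(e2, runs)
    return c1 == c2 or c1 == [-x for x in c2]


for main, partner in [("A", "BC"), ("B", "AC"), ("C", "AB")]:
    print(f"{main} ~ {partner}: aliased in half = {aliased(main, partner, half)}, "
          f"aliased in full = {aliased(main, partner, full)}")

k = 5
print(2 ** k, 2 ** (k - 1), 2 ** (k - 2))  # CRF-2^5, one-half and one-fourth fractions
```

In the complete $2^3$ design each main effect has its own contrast, but in the half fraction A shares its column (up to sign) with B × C, B with A × C, and C with A × B; the last line reproduces the 32, 16, and 8 treatment combinations mentioned for a $2^5$ design.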
Considerable ambiguity may exist in interpreting the results of an experiment when the design includes only one half or one third of the treatment combinations. Ambiguity occurs because two or more names can be given to each sum of squares. For example, a sum of squares might be attributed to the effects of treatment A and the BCDE interaction. The two or more names given to the same sum of squares are called aliases. In a one-half fractional factorial design, all sums of squares have two aliases. In a one-third fractional factorial design, all sums of squares have three aliases, and so on. Treatments are customarily aliased with higher-order interactions that are assumed to equal zero. This helps to minimize but does not eliminate ambiguity in interpreting the outcome of an experiment.

Fractional factorial designs are constructed from completely randomized, randomized block, and Latin square designs and denoted by, respectively, CRFF-$p^{k-1}$, RBFF-$p^{k-1}$, and LSFF-$p^k$. Let's examine the designation CRFF-$2^{5-1}$. The letters CR indicate that the building block design is a completely randomized design; FF indicates that it is a fractional factorial design; and $2^5$ indicates that each of the five treatments has two levels. The $-1$ in $2^{5-1}$ indicates that the design is a one-half fraction of a complete $2^5$ factorial design. This follows because the designation for a one-half fraction of a $2^5$ factorial design can be written as $\frac{1}{2}2^5 = 2^{-1}2^5 = 2^{5-1}$. A one-fourth fraction of a $2^5$ factorial design is denoted by CRFF-$2^{5-2}$ because $\frac{1}{4}2^5 = \frac{1}{2^2}2^5 = 2^{-2}2^5 = 2^{5-2}$.

To conserve space, I describe a small CRFF-$2^{3-1}$ design. A fractional factorial design with only three treatments is unrealistic, but the small size simplifies the presentation. The layout for the design is shown in Figure 1.15. On close inspection of Figure 1.15, it is apparent that the CRFF-$2^{3-1}$ design contains the four treatment combinations of a CRF-$2^2$ design. For example, if we ignore treatment C, the design in Figure 1.15 has the following combinations of treatments A

Figure 1.15 Layout for a three-treatment completely randomized fractional factorial design (CRFF-$2^{3-1}$ design). A score for the ith participant in treatment combination $a_jb_kc_l$ is denoted by $Y_{ijkl}$. The 4n participants are randomly assigned to the treatment combinations with the restriction that n participants are assigned to each combination. The mean for the participants in the four groups is denoted by $\bar{Y}_{\cdot 111}$, $\bar{Y}_{\cdot 122}$, $\bar{Y}_{\cdot 212}$, and $\bar{Y}_{\cdot 221}$.

A × C interaction are two names for another source of variation, as are A × B and C. Hence, the F statistics

$$F = \frac{MSA}{MSWCELL} \quad\text{and}\quad F = \frac{MSB{\times}C}{MSWCELL}$$

test the same sources of variation. If F = MSA/MSWCELL is significant, a researcher does not know whether it is because treatment A is significant, the B × C interaction is significant, or both.

At this point you are probably wondering why anyone would use such a design; after all, experiment