Theory driven
Should always be piloted first
Manipulation check: a question or set of questions designed to determine whether the independent variable was manipulated successfully.
Confirmatory (fixed design) vs. exploratory approach
Reliability
Validity
Construct validity: Are you actually measuring what you intend to measure?
Face: looks like it’s measuring what it claims to measure
Predictive criterion: can we make predictions?
Internal: If a study can plausibly demonstrate the causal relationship between a treatment and an outcome measure, it is said to have internal validity.
External: generalizability
History: Things that have changed in the participants’ environments other than those forming a direct part of the inquiry (e.g., occurrence of a major air disaster during a study of the effectiveness of a desensitization programme for persons with a fear of air travel).
Testing: Changes occurring as a result of practice and experience gained by participants on any pre-tests (e.g., asking opinions about factory farming of animals before some intervention may lead respondents to think about the issues and develop more negative attitudes).
Instrumentation: Some aspect(s) of the way participants were measured changed between pre-test and post-test (e.g., raters in an observational study using a wider or narrower definition of a particular behavior as they get more familiar with the situation).
Regression (to the mean): If participants are chosen because they are unusual or atypical (e.g., high scores), later testing will tend to give less unusual scores; e.g., in an intervention programme with pupils with learning difficulties where the ten highest-scoring pupils in a special unit are matched with ten of the lowest-scoring pupils in a mainstream school, regression effects will tend to show the former performing relatively worse on a subsequent test. See further details on p. 142.
Mortality: Participants dropping out of the study (e.g., in a study of an adult literacy programme, selective drop-out of those who are making little progress).
Maturation: Growth, change or development in participants unrelated to the treatment in the enquiry (e.g., in an evaluation of an extended athletics training programme with teenagers, intervening changes in height, weight and general maturity).
Maturation includes aging processes (simply growing older, cognitive development) and physiological states (hunger, fatigue, attention span, motivation).
Selection: Initial differences between groups prior to involvement in the enquiry (e.g., the use of an arbitrary non-random rule to produce two groups ensures they differ in one respect, which may correlate with others).
Selection by maturation interaction: Predisposition of groups to grow apart (or together if initially different); e.g., use of groups of boys and girls initially matched on physical strength in a study of a fitness programme.
Experimenter bias: The experimenter’s expectations can influence research results in subtle ways.
Ambiguity about causal direction: Does A cause B, or B cause A? (e.g., in any correlational study, unless it is known that A precedes B, or vice versa, or some other logical analysis is possible).
Diffusion of treatments: When one group learns information or otherwise inadvertently receives aspects of a treatment intended only for a second group (e.g., in a quasi-experimental study of two classes in the same school).
Compensatory equalization of treatments: If one group receives “special” treatment, there will be organizational and other pressures for a control group to receive it (e.g., nurses in a hospital study may improve the treatment of a control group on grounds of fairness).
Compensatory rivalry: As above, but an effect on the participants themselves. Referred to as the “John Henry” effect, after the steel worker who killed himself through over-exertion to prove his superiority to the new steam drill (e.g., when a group in an organization sees itself under threat from a planned change in another part of the organization and improves performance).
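The regression-to-the-mean threat described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the model (observed score = stable true score + independent measurement error), the score scale, and the function name `simulate_regression_to_mean` are all hypothetical and are not taken from the lecture material.

```python
import random
import statistics


def simulate_regression_to_mean(n=1000, top_fraction=0.1, seed=0):
    """Select the highest scorers on test 1; return their mean on test 1 and on test 2."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    # Assumed model: observed score = true score + independent measurement error.
    true_scores = [rng.gauss(100, 10) for _ in range(n)]
    test1 = [t + rng.gauss(0, 10) for t in true_scores]
    test2 = [t + rng.gauss(0, 10) for t in true_scores]

    # Choose participants *because* they scored unusually high on test 1.
    k = int(n * top_fraction)
    top = sorted(range(n), key=lambda i: test1[i], reverse=True)[:k]

    first_mean = statistics.mean([test1[i] for i in top])
    retest_mean = statistics.mean([test2[i] for i in top])
    return first_mean, retest_mean


first_mean, retest_mean = simulate_regression_to_mean()
print(f"selected group, test 1 mean: {first_mean:.1f}")
print(f"selected group, test 2 mean: {retest_mean:.1f}")
```

No treatment is applied between the two tests, yet the selected group scores closer to the overall average on retest. In a real study, that drop could be mistaken for a treatment effect.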
The methods for increasing internal validity (or controlling for extraneous variables) usually involve attempting to ensure that extraneous factors are exerting an equal effect on all groups studied. If this occurs, the extraneous variables are essentially “cancelled out”; that is, any observed differences between groups on a DV cannot be due to an extraneous variable that is operating equally in all groups. As you’ll see, there are also some methods of control that involve attempting to eliminate extraneous factors or control for their influence, rather than having them operate equally in all the groups.
Random assignment (or randomization) of subjects to groups is considered the most “powerful” method for controlling extraneous variables. By definition, random assignment means that, for all subjects in the experiment, the probability of being assigned to a particular group is the same. For instance, say you are studying the effects of a new psychotherapy on two groups: a therapy group vs. a wait list group. If you are using random assignment, you would take steps to ensure that every subject has a 50-50 chance of being assigned to each of these groups. So you wouldn’t place the first 20 people who sign up, or those who appear most motivated for treatment, or those who need therapy the most in one condition over the other. Instead, assignment into groups is determined on a random basis. The PURPOSE of random assignment is to obtain equivalency among groups. It is the most powerful method of obtaining equivalency in that all extraneous characteristics (even those the researcher hasn’t measured or thought of) will theoretically be evenly distributed between all the groups.
Random assignment vs. random selection: Random selection refers to methods of selecting subjects to participate in the research study (e.g., picking every 10th student in the UDM directory). In contrast, random assignment takes place after the subjects have been selected.
Specifically, random selection means subjects (Ss) have an equal chance of being included IN THE STUDY; random assignment means the Ss we have already selected have an equal chance of being assigned to the various GROUPS or TREATMENT CONDITIONS.
Matching: Matching is a procedure that can control for the effects of a specific extraneous variable. It involves identifying subjects (usually through a pretest) who are similar in terms of their status on the extraneous variable, then grouping similar subjects and randomly assigning members of the matched group to the treatment conditions. For example, suppose we are concerned about Ss’ intelligence in the therapy study. We could administer an IQ assessment (WAIS) before treatment and create matched pairs of subjects with similar IQ scores, then randomly assign one member of each pair to the treatment (Tx) condition and the other to the control group. Matching is particularly useful when the sample size is small and we cannot count on random assignment to ensure equivalency among groups on extraneous variables.
Blocking: Blocking involves studying the effects of an extraneous variable (usually a pre-existing subject characteristic, such as gender or intelligence) to determine if and to what degree it is accounting for scores on the dependent variable. In other words, blocking basically involves making the extraneous variable another independent variable. For instance, in the therapy study, the researcher might divide all subjects into two groups (or blocks): a higher intelligence group (e.g., IQ > 110) and a lower intelligence group (IQ < 110), in addition to the therapy vs. waitlist groups. Now the research has two independent variables with two conditions each, Therapy (therapy vs. waitlist) and IQ (high vs. low), and 4 experimental groups (see next slide).
This procedure allows the researcher, in conducting statistical analyses of the results, to separate out the effects of therapy from the effects of intelligence, as well as identify a possible interaction (how they work in combination) between the variables. Some people confuse blocking with matching. Matching is a procedure used to ENSURE EQUIVALENCY in terms of extraneous variables. Blocking is designed to determine the effects of extraneous variables. Note that when matching was used in our sample study, no independent variable was added to the study, and therefore two experimental groups remained. By contrast, in using blocking we added a new independent variable with 2 groups. Keep in mind that this will make our study more complicated: the more conditions we have, the larger the sample size we need and the more complicated interpreting our results becomes.
Holding extraneous variables constant: This method completely eliminates the effects of an extraneous variable. It involves including only subjects who are homogeneous in terms of their status on the extraneous variable. For example, in the therapy study, the researcher might choose to select only highly intelligent people for the study (experimental and control), eliminating those with average or lower IQ from the mix, or might look only at females and exclude males. Therefore, any differences between the two groups on the DV cannot be attributed to intelligence. The cost is reduced generalizability: if only subjects of above-average intelligence were studied, we would have no way of knowing if the therapy was effective, ineffective or even harmful for individuals of lower intelligence.
Statistical control: The previous methods involve research design strategies; that is, they have to do with how experimental groups are formed and how subjects are assigned to these groups. By contrast, statistical control increases internal validity after the data are collected; that is, it involves statistically adjusting the data.
In essence, it is like post-hoc matching after the data are obtained. DV scores are adjusted so that subjects are equalized in terms of their status on one or more extraneous variables (this involves complex statistical calculations beyond the scope of this lecture). We typically assume the covariate is linearly related to the DV (e.g., therapy is more effective for high IQ and less effective for low IQ), but the relationship could also be an inverted U (as in the Yerkes-Dodson relationship); maybe therapy is more effective for average IQ than for low or high IQ. Then we run into more problems. We could transform the data by converting to z-scores and squaring them (squaring removes the minus sign, so -1 SD becomes equal to +1 SD). This would make the relationship linear and put all scores in the same direction, but we would need solid evidence to justify this.
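To make the design-based methods above concrete, here is a minimal sketch of random assignment and of matched-pair assignment for the therapy vs. waitlist example. The function names, subject IDs and IQ scores are hypothetical, invented for illustration only.

```python
import random


def random_assignment(subjects, groups=("therapy", "waitlist"), seed=None):
    """Shuffle subjects, then deal them out so group sizes are as equal as possible."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    assignment = {g: [] for g in groups}
    for i, subject in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(subject)
    return assignment


def matched_pair_assignment(pretest, groups=("therapy", "waitlist"), seed=None):
    """Pair subjects with adjacent pretest scores, then randomize within each pair.

    `pretest` maps subject id -> pretest score. With an odd number of
    subjects, the highest scorer is left unassigned.
    """
    rng = random.Random(seed)
    ranked = sorted(pretest, key=pretest.get)  # subject ids ordered by score
    assignment = {g: [] for g in groups}
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # chance decides which member gets which condition
        assignment[groups[0]].append(pair[0])
        assignment[groups[1]].append(pair[1])
    return assignment


# Hypothetical pretest IQ scores for eight subjects.
iq = {"s1": 95, "s2": 128, "s3": 102, "s4": 117,
      "s5": 88, "s6": 110, "s7": 123, "s8": 99}

print(random_assignment(list(iq), seed=1))
print(matched_pair_assignment(iq, seed=1))
```

In the matched version, each pair of subjects with similar IQ is split across the two conditions, so the groups are equated on IQ by construction, while chance still decides who receives therapy.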
Procedures for Maximizing Internal Validity
A researcher can maximize internal validity by taking steps to minimize the potential threats to internal validity. Fraenkel and Wallen (1993) suggest four general ways in which these threats can be minimized:
1.) Standardization of the conditions under which the research study is carried out will help minimize threats to internal validity from history and instrumentation.
2.) Obtaining as much information as possible about the participants in the research study aids in minimizing threats to internal validity from mortality and selection.
3.) Obtaining as much information as possible about the procedural details of the research study, for example, where and when the study occurs, minimizes threats to internal validity from history and instrumentation.
4.) Choosing an appropriate research design can help control most other threats to internal validity.
The following are some specific suggestions for minimizing the potential threat to internal validity from each of the sources mentioned earlier.
History: The use of a control group, selected from the same population as the experimental group(s) and which experiences the same concurrent history as the experimental group(s), can help eliminate most of the effects of history. Also, the shorter the duration of an experiment, the less likely history will be a threat.
Maturation: The effects of maturation, like the effects of history, can be minimized by the use of a control group selected from the same population as the experimental group(s). Also, like the effects of history, the effects of maturation tend to be minimized in studies of short duration.
Testing: The use of a research design that does not include a pretest can eliminate testing as a potential threat to internal validity. If baseline or pretreatment data are needed, the use of unobtrusive measures (data collection techniques of which the participant is unaware) may minimize the effects of testing.
It also may help for a researcher to use different but equivalent forms of a test for pre-testing and post-testing.
Instrumentation: Careful specification and control of the measurement procedures can eliminate most of the instrumentation threat. Standardized instruments, standardized administration or data collection procedures, and the training of observers are among the procedures that help control the instrumentation threat.
Statistical regression: Avoiding samples made up only of extreme scorers (that is, samples from which average scorers are excluded) will minimize the threat due to statistical regression.
Mortality: Choosing large groups and ensuring that they are representative of the population from which they were selected can minimize mortality threats. The use of follow-up procedures with a portion of those who leave the study or who were initially unavailable can further minimize mortality as a threat.
Selection: Random selection and random assignment of subjects minimize selection as a threat to internal validity. If random selection and assignment are not possible, the use of certain statistical techniques, as part of a careful quasi-experimental design, can adjust for group differences and thereby minimize selection as a threat.
Selection: Findings being specific to the group or sample studied. If the sample is not representative of the population, you’re in big trouble. Address through random selection.
Setting: Findings being specific to, or dependent on, the particular context in which the study took place.
History: Specific and unique historical experiences may determine or affect the findings.
Construct effects: The particular constructs studied may be specific to the group studied.
Selection X Treatment Interaction: The term “interaction” means that some variable has one effect under one set of circumstances, but a different effect under another set of circumstances. The term implies that a given effect is not generalizable; that is, it doesn’t work the same way under all circumstances, e.g., a new treatment that is effective for the elderly but isn’t effective for adolescents.
An interaction between selection and treatment means that the effects of a given treatment would not generalize to other members of the population of interest (or target population). For instance, an independent variable that has an effect when college students are studied may not demonstrate the same effect with non-college students. Another example: when subjects are volunteers, they may be more motivated than non-volunteers and, as a result, more responsive to the experimental treatment. Thus, results may not generalize to individuals who are not inclined to volunteer for research studies.
History X Treatment Interaction: This means that the effects of a treatment do not generalize beyond the setting and/or time period in which the experiment was done. For instance, one might argue that the social psychology experiments on conformity and obedience to authority done in the 1950s and early 1960s might not, if conducted today, turn out the same way, given the emphasis on individuality and cynicism toward authority in today’s society.
Testing X Treatment Interaction: In some instances, the results of research in which pretests are used might not generalize to cases in which pretests are not used. For instance, a pretest may “sensitize” subjects to the purpose of the research study or otherwise increase their susceptibility to respond to treatment.
This phenomenon is known as pretest sensitization.
An example of pretest sensitization might be a study that tests the effect of a film designed to reduce racism, where one group views the film and another does not. Say that the study includes a pretest for both groups that asks many questions about the subjects’ current racial attitudes. As a result, the subjects’ awareness of racial issues might be increased or activated. Thus, the group that views the film might be motivated to pay close attention to it and follow its advice. The results of the study might not generalize to a study that does not include a pretest.
Demand Characteristics: The term “demand characteristics” refers to cues in the research setting that allow subjects to guess the research hypothesis. As a result of these cues, subjects may behave differently than they would in field (“real world”) settings. For instance, compliant subjects might act to confirm the hypothesis, or negativistic subjects might try to disprove the hypothesis.
Hawthorne Effect: The Hawthorne effect refers to the tendency of subjects to behave differently due to the mere fact that they are participating in research. The term comes from a finding of studies carried out just outside Chicago at the Hawthorne Plant of the Western Electric Company in the 1920s and early 1930s. Workers in these studies increased their output following any change in their environment, such as increases or decreases in room illumination and temperature. Apparently, their changes in behavior were connected to the attention they received as research participants, rather than to the specific effects of the environmental changes.
Thus, the results of studies in which subjects know they are participating in research may not generalize to non-research settings.
Order Effects (aka carryover effects and multiple treatment interference): This is a threat to external validity in studies with a repeated measures design, or studies in which the same subjects are exposed to more than one treatment. For example, suppose that each subject in an educational experiment receives three different treatments: A, B, and C. Also suppose that Treatment C is found to produce significantly greater learning gains than Treatment A and Treatment B. It could be that Treatment C’s superiority occurred because it followed the previous interventions; if so, results could not be safely generalized to a situation in which subjects received Treatment C only. Is it Treatment C, or the combination of A + B + C?
Random selection
Stratified random sampling
Cluster sampling
Naturalistic Research
External validity at the expense of internal validity
Analogue research: Social psychology studies in the 1950s and 1960s (e.g., the Milgram study) were analogues of real-world phenomena in a controlled, experimental setting. Analogue studies have a high degree of internal validity (they control lots of variables), but potentially at the expense of external validity. The less the laboratory setting resembles the real world, the bigger this problem potentially is.
Single- and Double-Blind Research: In a single-blind study, the participant is unaware of the condition or treatment to which they have been assigned. In a double-blind study, both the experimenter and the participant are unaware of assignment (reduces experimenter bias, the Hawthorne effect and other demand characteristics).
Required Counterbalancing: In the A, B, C education study, we would test all possible orders of the treatments (complete counterbalancing; a Latin square design is a more economical alternative that uses only a subset of the orders):
A B C, A C B, B C A, B A C, C A B, C B A
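The six treatment orders listed above can be generated mechanically. The sketch below uses Python's standard `itertools.permutations`; the participant IDs and the rotation scheme for handing out orders are hypothetical, added only for illustration.

```python
from itertools import permutations

treatments = ["A", "B", "C"]

# Complete counterbalancing: every possible order of the three treatments.
orders = [" ".join(p) for p in permutations(treatments)]
print(orders)

# Hypothetical participant ids, assigned to the orders in rotation so that
# each order is used equally often when the sample size is a multiple of 6.
participants = [f"P{i}" for i in range(1, 13)]
schedule = {p: orders[i % len(orders)] for i, p in enumerate(participants)}
for participant, order in schedule.items():
    print(participant, order)
```

Because every order is used equally often, any advantage Treatment C gains from following A and B in some orders is balanced by the orders in which it comes first.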
Mook: many psychological investigations are accused of “failure to generalize to the real world” because of bias or artificiality of setting.
True Experimental Research
Quasi-Experimental Research
Correlational Research: To be covered at a later date. The researcher does not manipulate the variables of interest; that is, the researcher does not control who gets what level of each independent variable. Variables are measured rather than manipulated.
Developmental Research
Time-Series Design: Involves taking multiple measurements over time (usually multiple pretest and post-test measures) in order to assess the effects of an independent variable. For instance, let’s say you were assessing the effects of a new program in junior high school aimed at smoking prevention. One way to assess the program’s effect is to administer multiple assessments before (pretests) and after (post-tests) the introduction of the program. This type of research is sometimes referred to as an interrupted time series design, because the series of measurements on the dependent variable is interrupted by the administration of the treatment (repeated measures).
The advantage is that it allows us to rule out many threats to internal validity, such as maturation, regression, and testing.
In a one-group interrupted time series design, like the one described above, the major threat to internal validity is history. An event that occurs at the same time as the administration of treatment can account for observed changes in the post-test phase. For example, with the smoking prevention study, let’s assume that the cost of cigarettes increased significantly at the start of the study. The use of multiple pretests and post-tests would not control for this. Instead, the way to control for history is to use a two-group time series design; e.g., take the same measurements from students in a “control” school that is comparable to the one studied. However, in this second school you don’t present the program. You then compare the two schools over time.
Since the historical event will occur equally to both student groups, if the treatment group shows an effect and the other doesn’t, then you can attribute the effect to the program rather than a possible historical event.
Single-Subject Designs: Well suited for research on behavior modification.
AB design: A great deal of variability in the target behavior poses a major threat to this design. It would be very difficult to establish a reliable baseline or the effect of treatment.
Reversal or withdrawal design: The ABAB design has a number of advantages over the ABA design. First of all, if the behavior changes in the predicted direction after the second introduction of the treatment, there is additional confirmation that the treatment is responsible for observed changes on the DV. Second, an ABA study ends with the withdrawal condition, so the subject is left back in the condition the researcher was trying to change in the first place; the ABAB design ends with the treatment in place.
Multiple Baseline Design: Used when a reversal design is not possible. It may not be possible or ethical to remove treatment, or the target behavior may not return to baseline after treatment is withdrawn. Multiple baseline designs do not involve withdrawing the treatment. They instead involve applying the treatment sequentially (i.e., across different baselines). The treatment may be applied sequentially to different behaviors of the same subject (multiple baseline across behaviors), to the same subject in different settings (multiple baseline across settings), or to the same behavior in different subjects (multiple baseline across subjects). For example, we might test a treatment across multiple behaviors in the treatment of an autistic child.
Qualitative Research
Previously covered
Assignment to Group
Assignment to group is designated by a letter at the beginning of each line (i.e., group) that describes how the group was assigned. The major types of assignment are:
R = random assignment
N = nonequivalent groups
C = assignment by cutoff
Experimental realism vs. mundane (real world) realism
Solomon Asch’s “line conformity” study
Between-Subjects Design: different groups of subjects are randomly assigned to the levels of the independent variable
Within-Subjects Design
Mixed Design
Single-Subject Design
Vulnerable to History, Maturation, Selection, Mortality
No Random Assignment
No Random Assignment
No Random Assignment
Post-test only design: NOTE – this is not the same thing as a case study. Typically the case study has multiple sources of data (usually qualitative, but some may be quantitative) extended over time, and there is also information about the context.
Between groups designs vs. within groups designs
Two group designs
Post-test-only randomized control trial (RCT): Random allocation of participants to an experimental group (given the experimental “treatment”) and a “no treatment” control group. Post-tests of the two groups compared.
Post-test-only two treatment comparison: Random allocation of participants to experimental group 1 (given experimental treatment 1) or to experimental group 2 (given experimental treatment 2). Post-tests of the two groups compared.
Pre-test post-test RCT: Random allocation of participants to an experimental group (given the experimental treatment) and a “no treatment” control group. Pre-test to post-test changes of individuals in the two groups compared.
Pre-test post-test two treatment comparison: Random allocation of participants to experimental group 1 (given experimental treatment 1) or to experimental group 2 (given experimental treatment 2). Pre-test to post-test changes of individuals in the two groups compared.
Three (or more) group simple designs: It is possible to extend any of the above two group designs by including additional experimental groups (given different experimental “treatments”). The RCTs retain a “no treatment” control group.
Factorial designs: Two (or more) independent variables (IVs) involved (e.g., type of music and number of decibels). Each IV studied at two or more levels. Random allocation of participants to groups covering all possible combinations of levels of the different IVs. Can be post-test only or pre-test post-test.
Parametric designs: Several “levels” of an IV covered, with random allocation of participants to groups, to get a view of the effect of the IV over a range of values. Can be post-test only or pre-test post-test (could be used to better understand medication dosing).
Seen it used in fMRI research.
Matched pairs designs: Establishing pairs of participants with similar scores on a variable known to be related to the dependent variable of the experiment. Random allocation of members of the pairs to different experimental groups (or to an experimental and a control group). This approach can be used in several two group designs. Attractive, but can introduce complexities both in setting up and in interpreting.
Repeated measures designs: Designs where the same participant is tested under two or more experimental treatments or conditions (or in both an experimental and a control condition). Can be thought of as the extreme example of matched pairs designs.
Pretty standard classic randomized control trial (RCT) design.
Controls for most significant threats to validity (at least internal validity).
A possible problem is pre-test sensitization.
These next 3 designs do not control for unique intra-session history (reduced by testing Ss individually rather than in groups).
Even with individual sessions, history can be uncontrolled if all of the experimental group is run before the control group (or vice versa).
However, if the groups are run simultaneously, different experimenters are required, which can introduce experimenter differences that become a form of intra-session history confounded with treatment.
Analysis: Don’t run multiple t-tests. Consider a t-test on change scores (O1 to O2 change vs. O3 to O4 change), or compare O2 vs. O4 controlling for pretest scores (particularly if concerned about regression to the mean, although with random assignment this is probably not a problem).
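The change-score analysis mentioned above can be sketched as follows: compute each participant's pre-to-post gain (O2 - O1 in the treatment group, O4 - O3 in the control group) and compare the two sets of gains with a single independent-samples t statistic. The scores below are invented for illustration, the function name `gain_score_t` is hypothetical, and the pooled-variance formula assumes roughly equal variances in the two groups.

```python
import math
import statistics


def gain_score_t(pre_treat, post_treat, pre_ctrl, post_ctrl):
    """Pooled-variance t statistic comparing change scores between two groups."""
    gains_t = [post - pre for pre, post in zip(pre_treat, post_treat)]  # O2 - O1
    gains_c = [post - pre for pre, post in zip(pre_ctrl, post_ctrl)]    # O4 - O3
    n1, n2 = len(gains_t), len(gains_c)
    pooled_var = ((n1 - 1) * statistics.variance(gains_t)
                  + (n2 - 1) * statistics.variance(gains_c)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(gains_t) - statistics.mean(gains_c)) / standard_error
    return t, n1 + n2 - 2  # t statistic and its degrees of freedom


# Hypothetical symptom scores (lower = better) for five Ss per group.
pre_treat, post_treat = [20, 22, 19, 24, 21], [15, 16, 15, 17, 16]
pre_ctrl, post_ctrl = [21, 20, 23, 22, 19], [20, 20, 21, 21, 19]

t, df = gain_score_t(pre_treat, post_treat, pre_ctrl, post_ctrl)
print(f"t({df}) = {t:.2f}")
```

The resulting t value would then be compared against the critical value for the stated degrees of freedom. This is one test on one pair of change-score samples, not several separate t-tests.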
Possible analyses: one-way ANOVA (O2, O4, O5, O6), or a 2 x 2 factorial ANOVA of intervention (X vs. control) by pretesting (pretest vs. no pretest) to determine the effects of pretest sensitization, the treatment effect, and the potential interaction of treatment and pretesting.
2 x 2 ANOVA:
               No X    X
Pretested      O4      O2
Unpretested    O6      O5
Solid design, but rarely practical (need a lot of participants).
Similar to Static-Group Comparison, but with random assignment to conditions.
Similar to design 4 (with pretests), but can be useful if you have significant concerns about pre-test sensitization.
Statistics: t-test to compare O1 and O2.
The primary advantage of a within-subjects design is that it is more powerful statistically than a between-subjects design. In statistical terminology, the power of a research design refers to its ability to detect effects of the independent variable. A powerful design is able to detect effects of the independent variable more easily than less powerful designs. Within-subjects designs are more powerful because the participants in all experimental conditions are identical in every way (after all, they are the same individuals). When this is the case, none of the observed variability in responses to the various conditions can be due to preexisting differences between participants in the groups.
The main drawback is order effects, which must be controlled by counterbalancing. With four conditions, complete counterbalancing would require 24 different possible orders.
Alternatively, a Latin square design may be used to control for order effects. In a Latin square design, each condition appears once at each ordinal position (1st, 2nd, 3rd, etc.), and each condition precedes and follows every other condition once. For example, if a within-subjects design has four conditions, a Latin square would involve administering the conditions in four different orders, as shown.
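A Latin square with the properties described above (each condition once in each position, and each condition immediately preceding and following every other condition once) is often called a balanced Latin square. The sketch below uses one common construction for an even number of conditions. The condition labels A to D and the function name are placeholders, and the orders it produces may differ from those on the slide.

```python
def balanced_latin_square(conditions):
    """Build a balanced Latin square; this construction needs an even number of conditions."""
    n = len(conditions)
    if n % 2 != 0:
        raise ValueError("this construction needs an even number of conditions")
    # The first row follows the pattern 0, 1, n-1, 2, n-2, ...
    first = [0]
    low, high = 1, n - 1
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1
    # Each later row shifts every entry of the first row by one more step (mod n).
    return [[conditions[(x + shift) % n] for x in first] for shift in range(n)]


square = balanced_latin_square(["A", "B", "C", "D"])
for row in square:
    print(" ".join(row))
```

Each row is the order given to one subgroup of participants, so four orders replace the 24 needed for complete counterbalancing.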
Factorial designs: Two (or more) independent variables (IVs) involved (e.g., type of music and number of decibels). Each IV studied at two or more levels. Random allocation of participants to groups covering all possible combinations of levels of the different IVs. Can be post-test only or pre-test post-test.
Example: violent lyrics (high vs. low) and provocation (high vs. low).
We can also depict a factorial design in design notation. Because of the treatment level combinations, it is useful to use subscripts on the treatment (X) symbol. We can see in the figure that there are four groups, one for each combination of levels of factors. It is also immediately apparent that the groups were randomly assigned and that this is a posttest-only design (typically a no-no).
Probably the easiest way to begin understanding factorial designs is by looking at an example. Let's imagine a design where we have an educational program and would like to look at a variety of program variations to see which works best. For instance, we would like to vary the amount of time the children receive instruction, with one group getting 1 hour of instruction per week and another getting 4 hours per week. And we'd like to vary the setting, with one group getting the instruction in-class (probably pulled off into a corner of the classroom) and the other group being pulled out of the classroom for instruction in another room. We could think about having four separate groups to do this, but when we are varying the amount of time in instruction, what setting would we use: in-class or pull-out? And when we are studying setting, what amount of instruction time would we use: 1 hour, 4 hours, or something else?
With factorial designs, we don't have to compromise when answering these questions. We can have it both ways if we cross each of our two time-in-instruction conditions with each of our two settings. Let's begin by defining some terms. In factorial designs, a factor is a major independent variable. In this example we have two factors: time in instruction and setting. A level is a subdivision of a factor. In this example, time in instruction has two levels and setting has two levels. Sometimes we depict a factorial design with a numbering notation. In this example, we can say that we have a 2 x 2 (spoken "two-by-two") factorial design. In this notation, the number of numbers tells you how many factors there are and the number values tell you how many levels. If I said I had a 3 x 4 factorial design, you would know that I had 2 factors and that one factor had 3 levels while the other had 4. The order of the numbers makes no difference, and we could just as easily term this a 4 x 3 factorial design.
The number of different treatment groups that we have in any factorial design can easily be determined by multiplying through the number notation. For instance, in our example we have 2 x 2 = 4 groups. In our notational example, we would need 3 x 4 = 12 groups.
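The "multiply through the notation" rule can be checked by enumerating the cells directly. In this sketch the helper name `factorial_cells` is hypothetical. The time and setting levels come from the example above, while the factor names `f1` and `f2` in the 3 x 4 design are placeholders.

```python
from itertools import product
from math import prod


def factorial_cells(factors):
    """Return every combination of levels; `factors` maps factor name -> list of levels."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


# The 2 x 2 design from the example: time in instruction crossed with setting.
design = {
    "time": ["1 hour/week", "4 hours/week"],
    "setting": ["in-class", "pull-out"],
}
cells = factorial_cells(design)
for cell in cells:
    print(cell)

# The number of groups equals the product of the numbers of levels.
print(len(cells), prod(len(levels) for levels in design.values()))

# A 3 x 4 design with placeholder level labels.
three_by_four = {"f1": ["a", "b", "c"], "f2": [1, 2, 3, 4]}
print(len(factorial_cells(three_by_four)))
```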
Let's begin by looking at the "null" case. The null case is a situation where the treatments have no effect. This figure assumes that even if we didn't give the training we could expect that students would score a 5 on average on the outcome test. You can see in this hypothetical case that all four groups score an average of 5 and therefore the row and column averages must be 5. You can't see the lines for both levels in the graphs because one line falls right on top of the other. It’s important to note that we are looking at the averages for each cell.
A main effect is an outcome that is a consistent difference between levels of a factor. For instance, we would say there’s a main effect for setting if we find a statistical difference between the averages for the in-class and pull-out groups, at all levels of time in instruction. The first figure depicts a main effect of time. For all settings, the 4 hour/week condition worked better than the 1 hour/week one. It is also possible to have a main effect for setting (and none for time).
In the second main effect graph, we see that in-class training was better than pull-out training for all amounts of time.
Finally, it is possible to have a main effect on both variables simultaneously, as depicted in the third main effect figure. In this instance 4 hours/week always works better than 1 hour/week and the in-class setting always works better than pull-out. Essentially there is a cumulative effect: in-class treatment is better than out-of-class treatment, 4 hours is better than 1 hour, and the combination of 4 hours plus in-class instruction is best.
If we could only look at main effects, factorial designs would be useful. But, because of the way we combine levels in factorial designs, they also enable us to examine the interaction effects that exist between factors. An interaction effect exists when differences on one factor depend on the level you are on another factor. It's important to recognize that an interaction is between factors, not levels. We wouldn't say there's an interaction between 4 hours/week and in-class treatment. Instead, we would say that there's an interaction between time and setting, and then we would go on to describe the specific levels involved.
How do you know if there is an interaction in a factorial design? There are three ways you can determine there's an interaction. First, when you run the statistical analysis, the statistical table will report on all main effects and interactions. Second, you know there's an interaction when you can't talk about the effect on one factor without mentioning the other factor. If you can say at the end of your study that time in instruction makes a difference, then you know that you have a main effect and not an interaction (because you did not have to mention the setting factor when describing the results for time). On the other hand, when you have an interaction it is impossible to describe your results accurately without mentioning both factors. Finally, you can always spot an interaction in the graphs of group means: whenever there are lines that are not parallel there is an interaction present! If you check out the main effect graphs above, you will notice that all of the lines within a graph are parallel. In contrast, for all of the interaction graphs, you will see that the lines are not parallel.
In this first interaction effect graph, we see that one combination of levels (4 hours/week and in-class setting) does better than the other three.
In this second interaction we have a more complex "cross-over" interaction. Here, at 1 hour/week the pull-out group does better than the in-class group while at 4 hours/week the reverse is true. Furthermore, both of these combinations of levels do equally well.
Summary
Factorial design has several important features. First, it has great flexibility for exploring or enhancing the “signal” (treatment) in our studies. Whenever we are interested in examining treatment variations, factorial designs should be strong candidates as the designs of choice. Second, factorial designs are efficient. Instead of conducting a series of independent studies we are effectively able to combine these studies into one. Finally, factorial designs are the only effective way to examine interaction effects.
So far, we have only looked at a very simple 2 x 2 factorial design structure. You may want to look at some factorial design variations to get a deeper understanding of how they work. You may also want to examine how we approach the statistical analysis of factorial experimental designs.
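The "non-parallel lines" rule can be made concrete with cell means. The sketch below, in Python, computes both main effects and the interaction contrast (a difference of differences) for a 2 x 2 table. All cell means are hypothetical values chosen to mimic the null, main-effect, and cross-over patterns described in these notes, and the function name `effects_2x2` is an assumption for illustration. A real analysis would test these effects statistically (for example, with a two-way ANOVA) rather than compare raw means.

```python
def effects_2x2(means):
    """Return (time_effect, setting_effect, interaction) from 2 x 2 cell means.

    means[setting][time] holds the average outcome for one cell, with
    settings "in-class"/"pull-out" and times "1h"/"4h".
    """
    ic1, ic4 = means["in-class"]["1h"], means["in-class"]["4h"]
    po1, po4 = means["pull-out"]["1h"], means["pull-out"]["4h"]

    # Main effect of time: 4h average minus 1h average, across settings.
    time_effect = (ic4 + po4) / 2 - (ic1 + po1) / 2
    # Main effect of setting: in-class average minus pull-out average.
    setting_effect = (ic1 + ic4) / 2 - (po1 + po4) / 2
    # Interaction: does the effect of time differ between settings?
    # Zero means the two lines in the plot of means are parallel.
    interaction = (ic4 - ic1) - (po4 - po1)
    return time_effect, setting_effect, interaction


# Hypothetical cell means (outcome test scores).
null_case = {"in-class": {"1h": 5, "4h": 5}, "pull-out": {"1h": 5, "4h": 5}}
time_only = {"in-class": {"1h": 5, "4h": 7}, "pull-out": {"1h": 5, "4h": 7}}
cross_over = {"in-class": {"1h": 5, "4h": 7}, "pull-out": {"1h": 7, "4h": 5}}
```

With these made-up numbers, the cross-over table has both main effects equal to zero while the interaction contrast is not, which is why such a result cannot be described without mentioning both factors.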
Matched designs: matched variables correlate with the DV; measurement of the matched variable is unlikely to influence the treatment effect.
Repeated measures designs: order effects unlikely; IVs lend themselves to repeated measurement; participants would likely be exposed to all treatments in real life; individual differences likely to mask treatment effects.
Simple two group designs: order effects likely; IV(s) don’t lend themselves to repeated measurement; Ss may be sensitized by pretesting or matching; participants not likely to get all treatments in real life.
Before-after / pre-post design: pre-testing unlikely to affect Tx effects; concerns about whether random assignment has produced equivalent groups; individual differences may mask Tx effects.
Factorial designs: interested in more than one IV; interaction effects are a concern.
Parametric designs: IV(s) have a range of values or levels of interest; wish to investigate the form or nature of the relationship between IV and DV.
Problems
Random assignment: there are practical and ethical problems in achieving random assignment to different experimental treatments or conditions (e.g., withholding the treatment from a no-treatment control group). Random assignment is often feasible only in atypical circumstances or with selected respondents, leading to questions of generalizability. A problem with both lab experiments and field experiments.
Validity: possibly decreased internal validity relative to lab experiments.
Ethical issues
Control (control is lost when we move out of the lab).
Advantages
Generalizability
Validity (reduced demand characteristics); possibly increased external validity
Participant availability
Pretest-posttest designs
History: something other than the quasi-independent variable that occurred between the pretest and posttest caused the observed change.
Maturation: normal changes that occur over time, such as those associated with development, may be mistakenly attributed to the quasi-independent variable.
Regression (to the mean): when participants were selected because they had extreme scores, their scores may change in the direction of the mean between pretest and posttest even if the quasi-independent variable has no effect.
Pretest sensitization: taking the pretest changes participants’ reactions to the posttest.
Two or more nonequivalent groups
Selection bias: the researcher erroneously concludes that the quasi-independent variable caused the difference between the groups when, in fact, the groups differed even before the occurrence of the quasi-independent variable. In a true experiment, random assignment eliminates this confound.
Local history: an extraneous event occurs in one group but not in the other(s), and this event, not the quasi-independent variable, caused the difference between the groups; also called a selection-by-history interaction. For example, consider a tobacco prevention program compared across two school classes, where the teacher of one class is a smoker and develops lung cancer.
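The regression-to-the-mean threat listed above can be illustrated with a small simulation. This is a sketch in Python under stated assumptions: scores are modelled as a stable true score plus independent measurement error, the population mean of 100 and standard deviations of 10 are arbitrary, and no treatment of any kind is applied between the two tests.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Each observed score = stable true score + independent measurement error.
true_scores = [rng.gauss(100, 10) for _ in range(5000)]
pretest = [t + rng.gauss(0, 10) for t in true_scores]
posttest = [t + rng.gauss(0, 10) for t in true_scores]  # no treatment applied

# Select the participants with the most extreme (highest) pretest scores.
cutoff = sorted(pretest)[-500]
selected = [i for i, score in enumerate(pretest) if score >= cutoff]


def mean(values):
    return sum(values) / len(values)


pre_mean = mean([pretest[i] for i in selected])
post_mean = mean([posttest[i] for i in selected])
# The selected group's posttest mean moves back toward the population mean
# even though nothing was done to these participants between the two tests.
```

In a one-group pretest-posttest study, a drop like this in a high-scoring group could easily be mistaken for a treatment effect.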
Potentially vulnerable to History, Instrumentation (?)
Potential external validity concerns of Testing X Treatment, Reactive Arrangements, Multiple treatment interference.
Similar to #4, but without random assignment. Vulnerable to Selection X Maturation.
Need to consider various effects when studying development. For example, studying PTSD in veterans. Possible influence of age of veteran: consider differences between Vietnam vets (60’s) and vets from current wars (20’s through early 40’s). Cohort effects: differences in life experiences, generational differences. Compare how Vietnam vets were treated coming home vs. WW2 vets and OIF/OEF/OND vets. Time of measurement effects: if we measure PTSD yearly starting in 2000, might there have been an effect after the Twin Towers attack on 9/11/2001? We might see spikes in PTSD Sx related to time of measurement in 2001 and 2002.
Is a quasi-experimental design – the quasi-independent variable is time itself.
Longitudinal studies assess changes over an extended period of time by looking at the same groups of subjects for months or even years. Looking at academic and social development, we may choose a small sample from each of the low- and high-income areas and assess them on various measures every six months for a period of ten years. The results of longitudinal studies can provide valuable qualitative and quantitative data regarding the differences in development between various groups. The major concern with longitudinal research, aside from the obvious lack of control, randomization, and standardization, is the length of time it takes to complete the study. Imagine starting a project that must be constantly maintained for a period of ten or more years. The subject mortality rate due to illness, relocation, and other factors alone could result in major concerns, not to mention the amount of energy and time that must be devoted to the research.
Longitudinal designs involve repeated measures on the same variables for the same group or groups on an extended series of occasions. Such studies might either precede or follow some intervention or other event and examine its effects over time. These designs are, in principle, very attractive. They avoid many problems of cross-sectional studies, where such matters are handled retrospectively and depend on participants’ recall of past events. However, they are difficult and complex to run, and typically call for considerable resources. Some problems include:
• Sample attrition
• Need to devise measures that can be repeated
• Need for special methods of data analysis (repeated measures analysis)
Takes a snapshot: cannot differentiate age-related effects from cohort or generational effects (e.g., differences between Vietnam vets and OIF/OEF/OND vets).
One way to reduce the amount of time and the mortality rate in a developmental study is to assess different ages at the same time rather than using the same groups over an extended period. A cross sectional study might look at the same theory regarding academic and social development but assess a small group of three year olds, six year olds, nine year olds and 12 year olds at the same time. The assumption is that the differences between the age ranges represent natural development and that if a longitudinal study had been used, similar results would be found. The obvious benefit is in the length of time it takes to complete the study, but the assumption that the six year old group will achieve the same academic and social development as the nine year old group may be invalid. Maybe there’s some time of measurement effect in 2010 – maybe related to the real-estate market crash.
Advantages and disadvantages
The most important advantage of cross sectional studies is that in general they are quick and cheap. As there is no follow up, fewer resources are required to run the study. Cross sectional studies are the best way to determine prevalence and are useful at identifying associations that can then be more rigorously studied using a cohort study or randomised controlled study. The most important problem with this type of study is differentiating cause and effect from simple association. For example, a study finding an association between low CD4 counts and HIV infection does not demonstrate whether HIV infection lowers CD4 levels or low CD4 levels predispose to HIV infection. Moreover, male homosexuality is associated with both but causes neither (another example of a confounding variable). Often there are a number of plausible explanations.
For example, if a study shows a negative relation between height and age it could be concluded that people lose height as they get older, that younger generations are getting taller, or that tall people have a reduced life expectancy when compared with short people. Cross sectional studies do not provide an explanation for their findings. Rare conditions cannot efficiently be studied using cross sectional studies because even in large samples there may be no one with the disease. In this situation it is better to study a cross sectional sample of patients who already have the disease (a case series). In this way it was found in 1983 that of 1000 patients with AIDS, 727 were homosexual or bisexual men and 236 were intravenous drug abusers [6]. The conclusion that individuals in these two groups had a higher relative risk was inescapable. The natural history of HIV infection was then studied using cohort studies, and the efficacy of treatments via case controlled studies and randomised clinical trials.
Examples
An example of a cross sectional study was the prevalence study of skull fractures in children admitted to hospital in Edinburgh from 1983 to 1989 [7]. Note that although the study period was seven years it was not a longitudinal or cohort study because information about each subject was recorded at a single point in time. A questionnaire based cross sectional study explored the relation between A&E attendance and alcohol consumption in elderly persons [9]. A recent example can be found in the BMJ, in which the prevalence of serious eye disease in a London population was evaluated [10].
Key points: cross sectional studies
• Are the best way to determine prevalence
• Are relatively quick
• Can study multiple outcomes
• Do not themselves differentiate between cause and effect or the sequence of events
Cross sequential studies combine both longitudinal and cross sectional methods in an attempt to both shorten the length of the research and minimize developmental assumptions. For this method, groups of children of different ages (three, six, and nine, for example) may be studied for a period of three years, both to assess developmental changes and to check that the typical three year old is similar to the typical six year old after three years of development.
Single-Subjects Designs: well suited for research on behavior modification.
AB design: a great deal of variability in the target behavior poses a major threat to this design, because it becomes very difficult to establish a reliable baseline or to detect the effect of treatment.
Reversal or withdrawal (ABA) design: same as AB, but the treatment is then removed to see whether the behavior returns to baseline.
ABAB design: has a number of advantages over the ABA design. First, if the behavior changes in the predicted direction after the second introduction of the treatment, there is additional confirmation that the treatment is responsible for observed changes on the DV. Second, a study that ends with the withdrawal condition (ABA) leaves the subject back in the condition the researcher was trying to change in the first place, whereas ABAB ends with the treatment in place.
Multiple Baseline Design: used when a reversal design is not possible. It may not be possible or ethical to remove treatment, or the target behavior may not return to baseline after treatment is withdrawn. Multiple baseline designs do not involve withdrawing the treatment; instead, the treatment is applied sequentially (i.e., across different baselines). The treatment may be applied sequentially to different behaviors of the same subject (multiple baseline across behaviors), to the same subject in different settings (multiple baseline across settings), or to the same behavior in different subjects (multiple baseline across subjects). For example, one might test a treatment across multiple behaviors of an autistic child.
Fixed Designs for Psychological Research
Fixed Designs: Experimental & Quasi-Experimental
Grant Heller, Ph.D.
PYC 5040
Fixed Designs• Theory driven• Should always be piloted first• Manipulation check may be useful• Confirmatory (fixed design) vs. Exploratory approach• Reliability• Construct Validity – Face – Predictive criterion – Internal – External (generalizability)
Experimental Research• Issues o Random selection and assignment o Group equivalence• Control & Comparison groups o Control (no treatment, wait list, placebo) o Comparison (standard treatment)• Assessing the impact of the Intervention / Manipulation o Manipulation checks o Treatment fidelity
Validity• Internal Validity o extent to which the changes in the study DV can be attributed to changes in the IV• External Validity o extent to which the results can be generalized
Threats to Internal Validity (Campbell & Stanley, 1963)• History• Testing• Instrumentation• Statistical Regression (to the mean)• Differential Mortality• Maturation• Selection• Selection X Maturation (interaction)• Experimenter Bias• Ambiguity about causal direction (A → B or B → A?)• Diffusion of treatments• Compensatory equalization of treatments• Compensatory rivalry
Threats to Internal Validity• Remember the acronym: MRS SMITH – Maturation – Regression to the mean – Selection of subjects – Selection by maturation interaction – Mortality – Instrumentation – Testing – History
Controlling for Threats to Internal Validity• Random assignment *** – Random assignment vs. Random selection• Matching – To ensure equivalency between groups• Blocking – To determine effects of extraneous variables• Holding extraneous variables constant – Reduces generalizability• Controlling for effects of extraneous variables (covariates) statistically – ANCOVA, MMR, partial correlation, etc.
Blocking example
Without blocking:   Therapy | Wait List
With blocking on IQ:
  IQ > 110:         Therapy | Wait List
  IQ < 110:         Therapy | Wait List
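The blocking example on this slide can be sketched as random assignment carried out separately within each IQ block. The Python below is an illustration only: the participant IDs and IQ scores are invented, the function name `blocked_assignment` is hypothetical, and treating a score of exactly 110 as part of the lower block is an assumption the slide does not address.

```python
import random


def blocked_assignment(participants, block_of, conditions, seed=0):
    """Randomly assign participants to conditions separately within each block."""
    rng = random.Random(seed)
    blocks = {}
    for person in participants:
        blocks.setdefault(block_of(person), []).append(person)

    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)
        # Deal the shuffled members out to the conditions in turn so each
        # block contributes (nearly) equal numbers to every condition.
        for position, person in enumerate(members):
            assignment[person] = conditions[position % len(conditions)]
    return assignment


# Hypothetical participants: (id, IQ score).
people = [("p1", 95), ("p2", 120), ("p3", 102), ("p4", 131),
          ("p5", 88), ("p6", 115), ("p7", 107), ("p8", 125)]


def iq_block(person):
    # Assumption: a score of exactly 110 falls in the lower block.
    return "IQ > 110" if person[1] > 110 else "IQ < 110"


result = blocked_assignment(people, iq_block, ["Therapy", "Wait List"])
```

Because assignment happens inside each block, both conditions end up with the same mix of high- and low-IQ participants, and the blocking variable can later be examined as a factor in its own right.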
Maximizing Internal Validity (Fraenkel & Wallen, 1993)
1. Standardization of conditions – Minimize history & instrumentation
2. Obtain as much info on participants as possible – Minimize mortality & selection
3. Tighten up procedures of study – Minimize history & instrumentation
4. Choose appropriate research design – Helps control most threats to internal validity
Threats to External Validity (generalizability) (LeCompte & Goetz, 1982)• Selection – Address through random selection• Setting• History• Construct effects
Threats to External Validity cont.• Selection X Treatment Interaction• History X Treatment Interaction• Testing X Treatment Interaction• Demand Characteristics• Hawthorne Effect• Order Effects (aka carryover effects & multiple treatment interference)
Ways to Increase External Validity• Random sampling/selection *** – Stratified random sampling – Cluster sampling• Naturalistic Research – External validity at the expense of internal validity – Analogue research• Single- and Double-Blind Research• Counterbalancing
Defense against threats to validity• for External Validity o Random selection of subjects• for Internal Validity o Random assignment to conditions
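The distinction between the two kinds of randomization on this slide can be sketched in a few lines of Python. The population list, sample size, and group sizes below are hypothetical placeholders; only the two steps themselves (select at random, then assign at random) come from the slide.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame standing in for the population of interest.
population = [f"person_{i}" for i in range(1000)]

# Random selection: decides who gets INTO the study (external validity).
sample = rng.sample(population, 40)

# Random assignment: decides which condition each sampled person receives
# (internal validity, by making the groups equivalent on average).
shuffled = sample[:]
rng.shuffle(shuffled)
treatment, control = shuffled[:20], shuffled[20:]
```

A study can use either step without the other: a convenience sample can still be randomly assigned, and a randomly selected sample can still be compared using pre-existing groups.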
Internal vs. External Validity• Tradeoff between Internal & External Validity• How do we prioritize one over the other? – Most would argue in favor of internal validity• Mook (1983) In Defense of External Invalidity – “to what populations, settings, and so on, do we want the effect to be generalized?” (p. 379) – “we are not making generalizations, but testing them.” (p. 380) – Lab experiments allow us to test theory, find out what is possible, and break down phenomena.
Specific Research Designs & Strategies• True Experimental Research – Random assignment to groups, receive different levels of manipulated variable• Quasi-Experimental Research – Random assignment is not possible (pre-existing groups)• Correlational Research – To be covered at a later date – Variables measured rather than manipulated• Developmental Research• Time-Series Design• Single-Subjects Designs• Qualitative Research – To be covered at a later date
Experimental Fixed Designs• Assignment of Ss to different conditions• Manipulation of at least 1 variable (IV)• Measurement of effects of manipulation on 1 or more variables (DV’s)• Control of all other variables• Experimental realism vs. Mundane realism• Demand characteristics – Deception: but at what cost?• Expectancy effects – Double blind procedures
3 Essential Properties of a Well Designed Experiment (Leary, 2004)
1. Manipulation of 1 or more Independent Variables (IVs)
2. Random assignment to groups – Assure initial group equivalence
3. Adequate control of extraneous variables
3 Aspects of Experimental Design• 1.) the number of independent variables (IV’s)• 2.) the number of treatment conditions – Levels of IV’s• 3.) whether the same or different subjects are used in each treatment condition.
Types of Experiments• Between-Subjects Design• Within-Subjects Design (repeated measures)• Mixed-Design – Combines between & within subjects designs• Single-Subject Design
Three Pre-Experimental Designs
• 1.) The one-shot case study
    X  O
Vulnerable to: History, Maturation, Selection, Mortality, Selection X Treatment
Avoid!
Three Pre-Experimental Designs
• 2.) The one-group pretest-posttest design
    O1  X  O2
Vulnerable to: History, Maturation, Testing, Instrumentation, Regression (?), Selection X Maturation, Selection X Treatment
Avoid!
Three Pre-Experimental Designs
• 3.) The static group comparison
    X  O1
    ------
       O2
Vulnerable to: Selection, Mortality, Selection X Maturation, Maturation (?), Selection X Treatment
Avoid!
Designs to Avoid• Post-test only design – Problem: impossible to determine change from pre-treatment (no baseline measure) – Suggestion: improve design or adopt case study methodology• Post-test only non-equivalent groups – Problem: no baseline measure, so any differences between groups cannot be attributed to treatment – Suggestions: incorporate a pre-test; employ random assignment when possible; consider case study• Pre-test post-test single group design – Problem: widely used, but vulnerable to history, maturation, regression. – Suggestion: add 2nd pre-tested no-treatment control group
True Experimental Designs• Two group designs – Post-test-only randomized control trial (RCT) – Post-test-only two treatment comparison – Pre-test post-test RCT – Pre-test post-test two treatment comparison• Three (or more) group simple designs• Factorial designs• Parametric designs• Matched pairs designs• Repeated measures designs – Within-groups design
Three True Experimental Designs
• 4.) The pretest-posttest control group design
    R  O1  X  O2
    R  O3     O4
Three True Experimental Designs
• 5.) The Solomon four-group design
    R  O1  X  O2
    R  O3     O4
    R      X  O5
    R         O6
Three True Experimental Designs
• 6.) The posttest-only control group design
    R  X  O1
    R     O2
Within-Subjects Designs
• Advantages
  – Increased statistical power
    • Fewer participants needed
• Disadvantages
  – Order effects
    • Address through counterbalancing
    • Latin square design
    • Carryover effects may still exist

            1st      2nd      3rd      4th
  Group 1   0 mg     100 mg   600 mg   300 mg
  Group 2   100 mg   300 mg   0 mg     600 mg
  Group 3   300 mg   600 mg   100 mg   0 mg
  Group 4   600 mg   0 mg     300 mg   100 mg
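The Latin square on this slide can be checked mechanically. The Python sketch below copies the four dose orders from the slide and tests two properties: that every dose appears once in each group's sequence and once in each ordinal position, and that every dose is immediately followed by each other dose exactly once. The function names are assumptions made for illustration.

```python
# Dose orders (mg) copied from the slide: one row per group,
# one column per ordinal position (1st to 4th).
square = [
    [0, 100, 600, 300],
    [100, 300, 0, 600],
    [300, 600, 100, 0],
    [600, 0, 300, 100],
]


def is_latin_square(rows):
    """Each condition appears once per row (group) and once per column (position)."""
    conditions = set(rows[0])
    rows_ok = all(
        len(row) == len(conditions) and set(row) == conditions for row in rows
    )
    cols_ok = all(set(col) == conditions for col in zip(*rows))
    return len(rows) == len(conditions) and rows_ok and cols_ok


def is_carryover_balanced(rows):
    """Each condition is immediately followed by every other condition exactly once."""
    pairs = [(a, b) for row in rows for a, b in zip(row, row[1:])]
    n = len(rows[0])
    return len(set(pairs)) == len(pairs) == n * (n - 1)
```

The slide's square passes both checks, so position effects and immediate (first-order) sequence effects are spread evenly across doses. As the slide notes, carryover effects may still exist; balancing distributes them evenly rather than removing them.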
Posttest-Only One-Way Designs
• Randomized groups design
    Initial sample → Random assignment to groups → IV manipulated → DV measured
• Matched-subjects design
    Initial sample → Matched into blocks → Ss in each block randomly assigned to groups → IV manipulated → DV measured
• Repeated measures design
    Initial sample → Receives 1 level of IV → DV measured → Receives another level of the IV → DV measured
Pretest-Posttest-Only One-Way Designs
• Randomized groups design
    Initial sample → DV measured (pretest) → Random assignment to groups → IV manipulated → DV measured (posttest)
• Matched-subjects design
    Initial sample → DV measured (pretest) → Match into blocks → Ss in blocks randomly assigned to groups → IV manipulated → DV measured (posttest)
• Repeated measures design
    Initial sample → DV measured (pretest) → Receive one level of IV → DV measured (posttest #1) → Receive another level of IV → DV measured (posttest #2)
2 X 2 Factorial Design
                         Independent Variable A
                         A1          A2
  Independent     B1
  Variable B      B2

Also notated:
    R  X11  O
    R  X12  O
    R  X21  O
    R  X22  O
3 X 2 Factorial Design
                         Independent Variable A
                         A1          A2          A3
  Independent     B1
  Variable B      B2
2 X 2 X 2 Factorial Design
                         Independent Variable A
                         A1                        A2
                         Independent Variable B    Independent Variable B
                         B1          B2            B1          B2
  Independent     C1
  Variable C      C2
2 X 2 X 2 Factorial Design
Same design, different notation
    A1                      A2
    B1        B2            B1        B2
    C1  C2    C1  C2        C1  C2    C1  C2
Experimental Designs: when to use• Matched designs – Matched variables correlate with DV; measurement of matched variable unlikely to influence treatment effect• Repeated measures designs – Order effects unlikely; IV’s lend to repeated measurement; would likely be exposed in real life; individual differences likely to mask treatment effects• Simple two group designs – Order effects likely; IV(s) don’t lend to repeated measurement; Ss may be sensitized by pretesting or matching; not likely to get all treatments in real life.• Before-after / pre-post design – Pre-testing unlikely to affect Tx effects; concerns whether random assignment has produced equivalent groups; individual differences may mask Tx effects• Factorial designs – Interested in > 1 IV & interaction effects a concern• Parametric designs – IV(s) have a range of values or levels of interest; wish to investigate form or nature of relationship between IV and DV
Quasi-experiments
“A research design involving an experimental approach but where random assignment to treatment and comparison groups has not been used” (Campbell & Stanley, 1963).
Quasi-experiments• Experimental approach, but random assignment not used• Typically employ naturally occurring groups – Classrooms, clinics, organizations, geographic areas, etc.• Generally do not possess same degree of internal validity as true experiments
Common threats to internal validity of quasi-experimental designs• Pretest-posttest designs – History – Maturation – Regression (to the mean) – Pretest sensitization• Two or more nonequivalent groups – Selection bias – Local history