This document summarizes key aspects of Chapter 5 from the book "Research Methods: Building a Knowledge Base" which discusses experimental designs. It defines key terminology used in experiments such as independent variable, dependent variable, experimental condition, control condition, and levels of independent variables. It describes the three key features of experiments as manipulating variables, controlling the environment, and assigning participants to conditions. It provides an example experiment that examined the effects of maternal behavior in rats.
Malec, T., & Newman, M. (2013). Research methods: Building a knowledge base. San Diego, CA: Bridgepoint Education, Inc. ISBN-13: 9781621785743; ISBN-10: 1621785742.
Chapter 5: Experimental Designs – Determining Cause-and-Effect Relationships
Chapter Contents
· Experiment Terminology
· Key Features of Experiments
· Experimental Validity
· Experimental Designs
· Analyzing Experiments
· Wrap-Up: Avoiding Error
· Critiquing a Quantitative Study
· Mixed Methods Research Designs
One of the oldest debates within psychology concerns the relative contributions that biology and the environment make in shaping our thoughts, feelings, and behaviors. Do we become who we are because it is hard-wired into our DNA or in response to early experiences? Do people take on their parents' personality quirks because they carry their parents' genes or because they grew up in their parents' homes? There are, in fact, several ways to address these types of questions. A consortium of researchers at the University of Minnesota has spent the past 2 decades comparing pairs of identical and fraternal twins to tease apart the contributions of genes and environment. You can read more at the research group's website, Minnesota Center for Twin and Family Research, http://mctfr.psych.umn.edu/.
Researchers at the University of Minnesota work with twins in order to study the impact of genetics versus upbringing on personality traits.
An alternative to using twin pairs to separate genetic and environmental influences is the use of experimental designs, which have the primary goal of explaining the causes of behavior. Recall from Chapter 2 (Section 2.1, Overview of Research Designs) that experiments can speak to cause and effect because the experimenter has control over the environment and is able to manipulate variables. One particularly ingenious example comes from the laboratory of Michael Meaney, a professor of psychiatry and neurology at McGill University, using female rats as experimental subjects (Francis, Diorio, Liu, & Meaney, 1999). Meaney's research revealed that the parenting ability of female rats can be reliably classified based on how attentive they are to their rat pups, as well as how much time they spend grooming the pups. The question tackled in this study was whether these behaviors were learned from the rats' own mothers or transmitted genetically. To answer this question experimentally, Meaney and colleagues had to think very carefully about the comparisons they wanted to make. It would have been insufficient to simply compare the offspring of good and bad mothers; this approach could not distinguish between genetic and environmental pathways.
Instead, Meaney decided to use a technique called cross-fostering: switching rat pups from one mother to another as soon as they were born. This resulted in four combinations of rats: (1) those born to inattentive mothers but raised by attentive ones, (2) those born to attentive mothers but raised by inattentive ones, (3) those born to and raised by attentive mothers, and (4) those born to and raised by inattentive mothers. Meaney then tested the rat pups several months later and observed the way they behaved with their own offspring. The setup of this experiment allowed Meaney to make clear comparisons between the influence of birth mothers and the rearing process. At the end of the study, the conclusion was crystal clear: Maternal behavior is all about the environment. Those rat pups that ultimately grew up to be inattentive mothers were those who had been raised by inattentive mothers.
This final chapter is dedicated to experimental designs, in which the primary goal is to explain behavior. Experimental designs rank highest on the continuum of control (see Figure 5.1) because the experimenter can manipulate variables, minimize extraneous variables, and assign participants to conditions. The chapter begins with an overview of the key features of experiments and then covers the importance of both internal and external validity of experiments. From there, the discussion moves to the process of designing and analyzing experiments and a summary of strategies for minimizing error in experiments. It concludes with guidelines for critiquing a quantitative study.
Figure 5.1: Experimental designs on the continuum of control
5.1 Experiment Terminology
A variable is any factor that has more than one value, such as height.
Before we dive into the details, it is important to cover the terminology that we will use to describe different aspects of experimental designs. Much of this will be familiar from previous chapters, with a few new additions. First, let's review the basics.

Recall that a variable is any factor that has more than one value. For example, height is a variable because people can be short, tall, or anywhere in between. Depression is a variable because people can experience a wide range of symptoms, from mild to severe. The independent variable (IV) is the variable that is manipulated by the experimenter in order to test hypotheses about cause. The dependent variable (DV) is the variable that is measured by the experimenter in order to assess the effects of the independent variable. For example, in an experiment testing the hypothesis that fear causes prejudice, fear would be the independent variable and prejudice would be the dependent variable. To keep these terms straight, it is helpful to think of the main goal of experimental designs: We test hypotheses about cause by manipulating an independent variable and then looking for changes in a dependent variable. Thus, our independent variable causes changes in the dependent variable; for example, fear is hypothesized to cause changes in prejudice.
Any manipulation of independent variables results in two or more versions of the variable. One common way to describe the versions of the independent variable is in terms of different groups, or conditions. The most basic experiments have two conditions: The experimental condition receives the treatment designed to test the hypothesis, while the control condition does not receive this treatment. In our fear and prejudice example, the participants who make up the experimental condition would be made to feel afraid, while the participants who make up the control condition would not. This setup allows us to test whether introducing fear to one group of participants leads them to express more prejudice than the other group of participants, who are not made fearful.
Another common way to describe these versions is in terms of levels of the independent variable. Levels describe the specific set of circumstances created by manipulating a variable. For example, in the fear and prejudice experiment, the variable of fear would have two levels: afraid and not afraid. There are countless ways to introduce fear into the experiment. One option would be to adopt the technique used by the Stanford social psychologist Stanley Schachter (1959), who led participants to believe they would be exposed to a series of painful electric shocks. In Schachter's study, the painful shocks never happened, but they did induce a fearful state as people anticipated them. So those at the "afraid" level of the independent variable might be told to expect these shocks, while those at the "not afraid" level of the independent variable would not be given this expectation.
At this stage, it may seem odd to have two sets of vocabulary terms, "levels" and "conditions," for the same concept. However, there is a subtle difference in how these terms are used once we get into advanced experimental designs. As the designs become more complex, it is often necessary to expand independent variables to include several groups and multiple variables. Once this happens, we will need different terminology to distinguish between the versions of one variable and the combinations of multiple variables. We will return to this complexity later in the chapter, in Section 5.4, Experimental Designs.
5.2 Key Features of Experiments
The overview of research designs (Chapter 2, Section 2.1) described the overall process of experiments in the following way: A researcher controls the environment as much as possible so that all participants have the same experience. He or she then manipulates, or changes, one key variable, and then measures the outcomes in another key variable. In this section, we will examine this process of control in more detail. Experiments can be distinguished from all other designs by three key features: manipulating variables, controlling the environment, and assigning people to groups comprising experimental and control conditions.
Manipulating Variables
The most crucial element of an experiment is that the researcher must manipulate, or change, some key variable. To study the effects of hunger, for example, a researcher could manipulate the amount of food given to the participants. Or, to study the effects of temperature, the experimenter could raise and lower the temperature of the thermostat in the laboratory. Because these factors are under the researchers' direct control, they can feel more confident that changing them contributes to changes in the dependent variables.
In Chapter 2 we discussed the main shortcoming of correlational research: These designs do not allow us to make causal statements. As you'll recall from that chapter (as well as from Chapter 4), correlational research is designed to predict one variable from another.
One of the examples in Chapter 2 concerned the correlation between income levels and happiness, with the goal of trying to predict happiness levels based on knowing people's income level. If we measure these as they occur in the real world, we cannot say for sure which variable causes the other. However, we could settle this question relatively quickly with the right experiment. Let's say we bring two groups into the laboratory and give one group $100 and the second group nothing. If the first group were happier at the end of the study, this would support the idea that money really does buy happiness. Of course, this is a rather simplistic look at the connection between money and happiness, but because it manipulated levels of money, this study would bring us closer to making causal statements about the effects of money.
To manipulate variables, it is necessary to have at least two versions of the variable. That is, to study the effects of money, we need a comparison group that does not receive money. To study the effects of hunger, we would need both a hungry and a not-hungry group. Having two versions of the variable distinguishes experimental designs from the structured observations discussed in Chapter 3 (Observational Research), in which all participants received the same set of conditions in the laboratory. Even the most basic experiment must have two sets of conditions, which are often an experimental group and a control group. But, as we will see later in this chapter, experiments can become much more complex. You might have one experimental group and two control groups, or five degrees of food deprivation, ranging from 0 to 12 hours without food. Your decisions about the number and nature of these groups will depend on consideration of both your hypotheses and previous literature.
When it comes to the manipulation of variables, there are three options available. First, environmental manipulations involve changing some aspect of the setting. Environmental manipulations are perhaps the most common in psychology studies, and they include everything from varying the temperature to varying the amount of money people receive. The key is to change the way different groups of people experience their time in the laboratory: it is either hot or cold, and they either receive or don't receive $100. Second, instructional manipulations involve changing the way a task is described in order to change participants' mind-sets. For example, you could give all participants the same math test but describe it as an intelligence test for one group and a problem-solving task for another. Because an intelligence test is thought to have implications for life success, you might expect participants in this group to be more nervous about their scores. Finally, an invasive manipulation involves taking measures to change internal, physiological processes and is usually conducted in medical settings. For example, studies of new drugs involve administering the drug to volunteers to determine whether it has an effect on some physical or psychological symptom. Or studies of cardiovascular health often involve having participants run on a treadmill to measure how the heart functions under stress.
Medical studies often use invasive manipulation to change internal, physiological processes.
Finally, there is one qualification to the rule that we must manipulate a variable. In many experiments, researchers divide up participants based on an inherent difference (e.g., gender) or personality measures (e.g., self-esteem or neuroticism) that capture stable individual characteristics among people. The idea behind these personality measures is that someone scoring high on a measure of neuroticism (for example) would be expected to be more neurotic across situations than someone scoring lower on the measure. Using this technique allows us to compare how, for example, men and women, or people with high and low self-esteem, respond to manipulations. When existent differences are used in an experimental context, they are referred to as quasi-independent variables; "quasi," or "nearly," because they are being measured, not manipulated, by the experimenter, and thus do not meet the criteria for a regular independent variable. Because these variables are not manipulated, an experimenter cannot make causal statements about them. In order for a study to count as an experiment, these quasi-independent variables would have to be combined with a true independent variable. This could be as simple as comparing how men and women respond to a new antidepressant drug: gender would be quasi-independent, while drug type would be a true independent variable.
Controlling the Environment
The second important element of experimental designs is that the researcher has a high degree of control over the environment. In addition to manipulating variables, a researcher conducting an experiment ensures that the other aspects of the environment are the same for all participants. For instance, if you were interested in the effects of temperature on people's mood, you could manipulate temperature levels in the laboratory so that some people experienced warmer temperatures and other people cooler temperatures. But it would be equally important to make sure that other potential influences on mood were the same for both groups. That is, you would want to make sure that the "warm" and "cool" groups were tested in the same room, around the same time of day, and by the same experimenter.
The overall goal, then, is to control extraneous variables, or variables that add noise to your hypothesis test. In essence, the more you are able to control extraneous variables, the more confidence you can have in the results of your hypothesis test. As we will discuss in the section "Experimental Validity," the impact of extraneous variables can vary in a study. Let's say we conduct the study on temperature and mood, and all of our participants are in a windowless room with a flickering fluorescent light. This would likely have an influence on mood, making everyone a little bit grumpy, but cause few problems for our hypothesis test because it would affect everyone equally. Table 5.1 shows hypothetical data from two variations of this study, using a 10-point scale to measure mood ratings. In the top row, participants were in a well-lit room; we can see that participants in the cooler room reported being in a better mood (i.e., an 8 versus a 5). In the bottom row, all participants were in the windowless room with flickering lights. These numbers suggest that people were still in a better mood in the cooler room (5) than in the warm room (2), but the flickering fluorescent light had a constant dampening effect on everyone's mood.
Table 5.1: Influence of an extraneous variable

                                      Cool Room    Warm Room
Variation 1: Well-Lit                     8            5
Variation 2: Flickering Fluorescent       5            2
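The additive logic behind Table 5.1 can be checked with a few lines of arithmetic. The sketch below (Python, used purely for illustration; the variable names are invented) uses the book's hypothetical mood ratings:

```python
# Hypothetical mood ratings (10-point scale) from Table 5.1.
well_lit = {"cool": 8, "warm": 5}
flickering = {"cool": 5, "warm": 2}

# The flickering light lowers every rating by the same constant amount,
# so the cool-versus-warm difference that tests the hypothesis is unchanged.
diff_well_lit = well_lit["cool"] - well_lit["warm"]        # 3
diff_flickering = flickering["cool"] - flickering["warm"]  # 3
print(diff_well_lit, diff_flickering)
```

Because this extraneous variable affects both groups equally, it shifts the overall mood level but not the size of the temperature effect; a confound, by contrast, would shift the two groups unequally.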
Assigning People to Conditions
The third key feature of experimental designs is that the researcher can assign people to receive different conditions, or versions, of the independent variable. This is an important piece of the experimental process: The experimenter not only controls the options (warm versus cool room; $100 versus no money, etc.), he or she also gets to control which participants get each option. Whereas a correlational design might assess the relationship between current mood and choosing the warm room, an experimental design will have some participants assigned to the warm room and then measure the effects on their mood. In other words, an experimenter is able to make causal statements because that person causes things to happen.
The most common, and most preferable, way to assign people to conditions is through a process called random assignment. An experimenter who uses random assignment makes a separate decision for each participant as to which group he or she will be assigned to before the participant arrives. As the term implies, this decision is made randomly: by flipping a coin, using a random number table, drawing numbers out of an envelope, or some other random process. The overall goal is to try to balance out existent differences among people, as illustrated in Figure 5.2. So, for example, some people might generally be more comfortable in warm rooms, while others might be more comfortable in cold rooms. If each person who shows up for the study has an equal chance of being in either group, then the groups in the sample should reflect the same distribution of differences as the population.

Figure 5.2: Random assignment
The 25 participants in our sample consist of a mix of happy and sad people. The goal of random assignment is to have these differences distributed equally across the experimental conditions. Thus, the two groups on the right each consist of six happy and six sad people, and our random assignment was successful.
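The coin-flip decision process described above can be sketched in a few lines. This is an illustrative sketch, not from the book; the participant labels and condition names are invented:

```python
import random

def random_assignment(participants, conditions=("cool", "warm"), seed=None):
    """Make a separate, random condition decision for each participant,
    before anyone arrives and independent of any impression of them."""
    rng = random.Random(seed)  # seed is only to make the example reproducible
    return {person: rng.choice(conditions) for person in participants}

groups = random_assignment(["P01", "P02", "P03", "P04", "P05", "P06"], seed=42)
```

Because each participant has an equal chance of landing in either condition, pre-existing differences should, on average, spread evenly across the groups. With small samples the split can still come out uneven, which is what motivates the matched variant discussed below.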
Another significant advantage of forming groups through random assignment is that it helps to avoid bias in the selection and assignment of subjects. For example, it would be a bad idea to assign people to groups based on a first impression of them, because participants might be placed in the cold room if they arrived at the laboratory dressed in warm clothing. Experimenters who make decisions about condition assignments ahead of time can be more confident that the independent variable is responsible for changes in the dependent variable.
It is worth highlighting the difference here between random selection and random assignment (discussed in Chapter 4, Section 4.3, Sampling From the Population). Random selection means that the sample of participants is chosen at random from the population, as with the probability sampling methods discussed in Chapter 4. However, most psychology experiments use a convenience sample of individuals who volunteer to complete the study. This means that the sample is often far from fully random. Even so, a researcher can still make sure that the group assignments are random so that each condition contains an equal representation of the sample.
In some cases, most notably when samples are small, random assignment may not be sufficient to balance an important characteristic that might affect the results of a particular study. Imagine conducting a study that compared two strategies for teaching students complex math skills. In this example, it would be especially important to make sure that both groups contained a mix of individuals with, say, average and above-average intelligence. For this reason, it would be necessary to take extra steps to ensure that intelligence was equally distributed among the groups, which can be accomplished with a variation on random assignment called matched random assignment. This requires the experimenter to obtain scores on an important matching variable (in this case, intelligence), rank participants based on the matching variable, and then randomly assign people to conditions. Figure 5.3 shows how this process would unfold in our math skills study. First, participants are given an IQ test to measure existing differences in intelligence. Second, the experimenter ranks participants based on these scores, from highest to lowest. Third, the experimenter would move down this list in order and randomly assign each participant to one of the conditions. This process still contains an element of random assignment, but adding the extra step of rank ordering ensures a more balanced distribution of intelligence test scores across the conditions.

Figure 5.3
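One common way to implement the three steps just described (rank on the matching variable, then randomly assign while moving down the list) is to split each adjacent pair of ranked participants at random between the two conditions. The sketch below assumes that pairwise variant; the IQ scores, labels, and group names are invented:

```python
import random

def matched_random_assignment(scores, seed=None):
    """Rank participants on the matching variable (highest first), then
    move down the list pair by pair, randomly sending one member of each
    pair to each condition."""
    rng = random.Random(seed)  # seed is only for reproducibility
    ranked = sorted(scores, key=scores.get, reverse=True)
    groups = {"A": [], "B": []}
    for i in range(0, len(ranked), 2):
        pair = ranked[i:i + 2]
        rng.shuffle(pair)             # the random-assignment element
        groups["A"].append(pair[0])
        if len(pair) > 1:
            groups["B"].append(pair[1])
    return groups

iq = {"P1": 120, "P2": 95, "P3": 110, "P4": 88, "P5": 101, "P6": 132}
groups = matched_random_assignment(iq, seed=1)
```

The rank ordering guarantees that the two highest scorers land in different conditions, and likewise for each pair down the list, so the matching variable stays balanced even in a small sample.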
The 20 participants in our sample represent a mix of very high, average, and very low intelligence test scores (measured 1–100). The goal of matched random assignment is to ensure that this variation is distributed equally across the two conditions. The experimenter would first rank participants by intelligence test scores (top box) and then distribute these participants alternately between the conditions. The result is that both groups (lower boxes) contain a good mix of high, average, and low scores.

Research: Making an Impact
The Stanford Prison Experiment
The landmark 1971 Stanford Prison Experiment had an extremely widespread impact on multiple fields and real-life settings such as prison reform, the ethics of human research, and terrorism and torture tactics (Clements, 1999; Haney & Zimbardo, 1998). The study, conducted by Philip Zimbardo and his colleagues at Stanford University, placed volunteer participants in a simulated prison environment and randomly assigned them to play the roles of "guards" and "prisoners." The 24 participants had been selected because their personality traits marked them as "good apples" and they had no previous evidence of antisocial behavior (Haney, Banks, & Zimbardo, 1973). Nevertheless, the simulated prison quickly took on the characteristics of a real prison, with simulated situations of dominance, dehumanization, severe psychological distress, and the unexpected phenomena of social control, obedience, and effects of power (Haney, 2002; Banuazizi & Movahedi, 1975). The prisoners were subjected to humiliation similar to what has been seen in prison scandals such as Abu Ghraib: nakedness, sexual humiliation, verbal torment, chains, and bags over prisoners' heads. Although planned to run for 2 weeks, the experiment was stopped after only 6 days due to the severe psychological harm it was causing the prisoners and the unexpected behavior of the prison guards.
The study led to major reform in the ethical guidelines for psychological research and humane treatment, and in fact has never been replicated in a scientific setting. Following publication of this experiment, the Supreme Court saw an influx of cases regarding prisoner treatment and the structure of the prison system. In an interesting twist, one of the original researchers, Craig Haney, was inspired to pursue a career in prison reform based on what he learned from this study. Zimbardo himself went on to testify on behalf of the soldiers accused of abuse at Abu Ghraib, highlighting the powerful influence of social roles (e.g., prison guard) on our behavior. An overwhelming majority of social and psychological research has found the punitive punishment system to be not only ineffective but actually deleterious to prisoner behavior and recidivism. The suggestions from current research include calls to restructure prison "power" dynamics in an effort to increase prisoner safety and reduce guard brutality.
5.3 Experimental Validity
Chapter 2 (Section 2.2) discussed the concept of validity, or the degree to which measures capture the constructs that they were designed to capture. For example, a measure of happiness needs to actually capture differences in people's levels of happiness. In this section, we return to the subject of validity in an experimental context. Similar to our earlier discussion, validity refers here to whether the experimental results are demonstrating what we think they are demonstrating. We will cover two types of validity that are relevant to experimental designs. The first is internal validity, which assesses the degree to which results can be attributed to independent variables. The second is external validity, which assesses how well the results generalize to situations beyond the specific conditions laid out in the experiment. Taken together, internal and external validity provide a way to assess the merits of an experiment. However, each of these has its own threats and remedies, as discussed in the following sections.
Internal Validity
In order to have a high degree of internal validity, experimenters strive for maximum control over extraneous variables. That is, they try to design experiments so that the independent variable is the only cause of differences between groups. But, of course, no study is ever perfect, and there will always be some degree of error. In many cases, errors are the result of unavoidable causes, such as the health or mood of the participants on the day of the experiment. In other cases, errors are caused by factors that are, in fact, under the experimenter's control. In this section, we will focus on several of these more manageable threats to internal validity and discuss strategies for reducing their influence.
Experimental Confounds
To avoid threats to the internal validity of an experiment, it is important to control and minimize the influence of extraneous variables that might add noise to a hypothesis test. In many cases, extraneous variables can be considered relatively minor nuisances, as when our mood experiment was accidentally run in a depressing room. But now, let's say we ran our study on temperature and mood, and owing to a lack of careful planning, we accidentally placed all of the warm-room participants in a sunny room and the cool-room participants in a windowless room. We might very well find that the warm-room participants were in a much better mood. But would this be the result of warm temperatures or the result of exposure to sunshine? Unfortunately, we would be unable to tell the difference because of a confounding variable, or confound (called a third variable in the case of correlational studies). A confounding variable changes systematically with the independent variable. In this example, room lighting would be confounded with room temperature because all of the warm-room participants were also exposed to sunshine, and all of the cool-room participants to artificial lighting. This combination of variables would leave us unable to determine which variable actually had the effect on mood. The result would be that our groups differed in more than one way, which would seriously hinder our ability to say that the independent variable (the room) caused the dependent variable (mood) to change.
The demeanor of the person running a study may be a confounding variable.
It may sound like an oversimplification, but the way to avoid confounds is to be very careful in designing experiments. By ensuring that groups are alike in every way but the experimental condition, one can generally prevent confounds. This is somewhat easier said than done because confounds can come from unexpected places. For example, most studies involve the use of multiple research assistants who manage data collection and interact with participants. Some of these assistants might be more or less friendly than others, so it is important to make sure each of them interacts with participants in all conditions. If your friendliest assistant works with everyone in the warm-room group, for example, it would result in a confound (friendly versus unfriendly assistants) between room and research assistant. Consequently, you would be unable to separate the influence of your independent variable (the room) from that of the confound (your research assistant).
Selection Bias
Internal validity can also be threatened when groups are different before the manipulation, which is known as selection bias. Selection bias causes problems because these inherent differences might be the driving factor behind the results. Imagine you are testing a new program that will help people stop smoking. You might decide to ask for volunteers who are ready to quit smoking and put them through a 6-week program. But by asking for volunteers (a remarkably common error), you gather a group of people who are already somewhat motivated to stop smoking. Thus, it is difficult to separate the effects of your new program from the effects of this a priori motivation.
One easy way to avoid this problem is through either random or matched random assignment. In the stop-smoking example, you could still ask for volunteers, but then randomly assign these volunteers to one of the two programs. Because both groups would consist of people motivated to quit smoking, this would help to cancel out the effects of motivation. Another way to minimize selection bias is to use the same people in both conditions so that they serve as their own control. In the stop-smoking example, you could assign volunteers first to one program and then to the other. However, you might run into a problem with this approach: participants who successfully quit smoking in the first program would not benefit from the second program. This technique is known as a within-subject design, and we will discuss its advantages and disadvantages in the subsection "Within-Subject Designs" in Section 5.4, Experimental Designs.
Differential Attrition
Despite your best efforts at random assignment, you could still have a biased sample at the end of a study as a result of differential attrition. The problem of differential attrition (sometimes called the mortality threat) occurs when subjects drop out of experimental groups for different reasons. Let’s say you’re conducting a study of the effects of exercise on depression levels. You manage to randomly assign people to either 1 week of regular exercise or 1 week of regular therapy. At first glance, it appears that the exercise group shows a dramatic drop in depression symptoms. But then you notice that approximately one third of the people in this group dropped out before completing the study. Chances are you are left with those who are most motivated to exercise, to overcome their depression, or both. Thus, you are unable to isolate the effects of your independent variable on depression symptoms. While you cannot prevent people from dropping out of your study, you can look carefully at those who do. In many cases, you can spot a pattern and use it to guide future research. For example, it may be possible to discover a profile of people who dropped out of the exercise study and use this knowledge to increase retention for the next attempt.
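Looking carefully at dropouts can begin with something as simple as comparing attrition rates by condition. The Python sketch below uses fabricated records for the exercise/therapy example (the one-third dropout figure is hard-coded purely to mirror the scenario above; no real data are involved):

```python
# Fabricated records: each entry stores the assigned condition and
# whether the participant finished the study.
participants = (
    [{"condition": "exercise", "completed": i % 3 != 0} for i in range(30)]
    + [{"condition": "therapy", "completed": True} for _ in range(30)]
)

def attrition_rate(records, condition):
    """Fraction of a condition's participants who dropped out."""
    group = [p for p in records if p["condition"] == condition]
    dropped = sum(1 for p in group if not p["completed"])
    return dropped / len(group)

# A large gap between the two rates is the signature of differential attrition.
print(round(attrition_rate(participants, "exercise"), 2))  # 0.33
print(round(attrition_rate(participants, "therapy"), 2))   # 0.0
```

In a real study, the same per-condition comparison could be extended to baseline characteristics (age, symptom severity, motivation scores) to build the kind of dropout profile the text describes.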
Outside Events
As much as we strive to control the laboratory environment, participants are often influenced by events in the outside world. These events, sometimes called history effects, are often large-scale events such as political upheavals and natural disasters. The threat to research is that it becomes difficult to tell whether participants’ responses are the result of the independent variable or the historical event(s). One great example of this comes from a paper published by social psychologist Ryan Brown, now a professor at the University of Oklahoma, on the effects of receiving different types of affirmative action as people were selected for a leadership position. The goal was to determine the best way to frame affirmative action in order to avoid undermining the recipient’s confidence (Brown, Charnsangavej, Keough, Newman, & Rentfrow, 2000). For about a week during the data collection process, students at the University of Texas, where the study was being conducted, were protesting on the main lawn about a controversial lawsuit regarding affirmative action policies. The result was that participants arriving for this laboratory study had to pass through a swarm of people holding signs that either denounced or supported affirmative action. These types of outside events are difficult, if not impossible, to control. But, because these researchers were aware of the protests, they made a decision to exclude from the study data gathered from participants during the week of the protests, thus minimizing the effects of these outside events.
Expectancy Effects
One final set of threats to internal validity results from the influence of expectancies on people’s behavior. This can cause trouble for experimental designs in three related ways. First, experimenter expectancies can cause researchers to see what they expect to see, leading to subtle bias in favor of their hypotheses. In a clever demonstration of this phenomenon, the psychologist Robert Rosenthal asked his graduate students at Harvard University to train groups of rats to run a maze (Rosenthal & Fode, 1963). He also told them that based on a pretest, the rats had been classified as either bright or dull. As you might have guessed, these labels were pure fiction, but they still influenced the way that the students treated the rats. Rats labeled “bright” were given more encouragement and learned the maze much more quickly than rats labeled “dull.” Rosenthal later extended this line of work to teachers’ expectations of their students (Rosenthal & Jacobson, 1968) and found support for the same conclusion: People often bring about the results they expect by behaving in a particular way.
One common way to avoid experimenter expectancies is to have participants interact with a researcher who is “blind” (i.e., unaware) to the condition that each participant is in. The researcher may be fully aware of the research hypothesis, but his or her behavior is unlikely to affect the results. In the Rosenthal and Fode (1963) study, the graduate students’ behavior influenced the rats’ learning speed only because they were aware of the labels “bright” and “dull.” If these had not been assigned, the rats would have been treated fairly equally across the conditions.
Second, participants in a research study often behave differently based on their own expectancies about the goals of the study. These expectancies often develop in response to demand characteristics, or cues in the study that lead participants to guess the hypothesis. In a well-known study conducted at the University of Wisconsin, psychologists Leonard Berkowitz and Anthony LePage found that participants would behave more aggressively (by delivering electric shocks to another participant) if a gun was in the room than if there were no gun present (Berkowitz & LePage, 1967). This finding has some clear implications for gun control policies, suggesting that the mere presence of guns increases the likelihood of violence. However, a common critique of this study is that participants may have quickly clued in to its purpose and figured out how they were “supposed” to behave. That is, the gun served as a demand characteristic, possibly making participants act more aggressively because they thought it was expected of them.
To minimize demand characteristics, researchers use a variety of techniques, all of which attempt to hide the true purpose of the study from participants. One common strategy is to use a cover story, or a misleading statement about what is being studied. In Chapter 1 (Section 1.4, Hypotheses and Theories, and Section 1.7, Ethics in Research), we discussed Milgram’s famous obedience studies, which discovered that people were willing to obey orders to deliver dangerous levels of electric shocks to other people. In order to disguise the purpose of the study, Milgram described it to people as a study of punishment and learning. And the affirmative action study by Ryan Brown and colleagues (Brown et al., 2000) was presented as a study of leadership styles. The goal in using these cover stories is to give participants a compelling explanation for what they experience during the study and to direct their attention away from the research hypothesis.
Another strategy is to use the unrelated-experiments technique, which leads participants to believe that they are completing two different experiments during one laboratory session. The experimenter can use this bit of deception to present the independent variable during the first experiment and then measure the dependent variable during the second experiment. For example, a study by Harvard psychologist Margaret Shih and colleagues (Shih, Pittinsky, & Ambady, 1999) recruited Asian American females and asked them to complete two supposedly unrelated studies. In the first, they were asked to read and form impressions of one of two magazine articles; these articles were designed to make them focus on either their Asian American identity or their female identity. In the second experiment, they were asked to complete a math test as quickly as possible. The goal of this study was to examine the effects on math performance of priming different aspects of identity. Based on previous research, these authors predicted that priming an Asian American identity would remind participants of positive stereotypes regarding Asians and math performance, whereas priming a female identity would remind participants of negative stereotypes regarding women and math performance. As expected, priming an Asian American identity led this group of participants to do better on a math test than did priming a female identity. The unrelated-experiments technique was especially useful for this study because it kept participants from connecting the independent variable (magazine article prime) with the dependent variable (math test).
A final way in which expectancies shape behavior is the placebo effect, meaning that change can result from the mere expectation that change will occur. Imagine you wanted to test the hypothesis that alcohol causes people to become aggressive. One relatively easy way to do this would be to give alcohol to a group of volunteers (aged 21 and older) and then measure how aggressive they became in response to being provoked. The problem with this approach is that people also expect alcohol to change their behavior, so you might see changes in aggression simply because of these expectations. Fortunately, there is an easy solution: Add a placebo control group to your study that mimics the experimental condition in every way but one. In this case, you might tell all participants that they will be drinking a mix of vodka and orange juice but only add vodka to half of the participants’ drinks. The orange-juice-only group serves as the placebo control, so any differences between this group and the alcohol group can be attributed to the alcohol itself.
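A quick way to see why the placebo group isolates the drug’s effect is to simulate the design. The toy model below is entirely invented (the effect sizes are arbitrary numbers chosen for illustration, not findings from any study): because both groups expect alcohol, subtracting the placebo group’s mean removes the expectancy component and leaves the alcohol effect alone.

```python
import random

rng = random.Random(0)  # seeded so the simulation is reproducible

def aggression_score(got_alcohol, believes_alcohol):
    """Toy model with invented numbers: expecting alcohol raises the score
    by 1 point (placebo effect); actually drinking it adds 2 more."""
    score = rng.gauss(5.0, 1.0)  # baseline aggression with random noise
    if believes_alcohol:
        score += 1.0
    if got_alcohol:
        score += 2.0
    return score

# Everyone is told the drink contains vodka; only half actually get it.
alcohol_group = [aggression_score(True, True) for _ in range(200)]
placebo_group = [aggression_score(False, True) for _ in range(200)]

mean = lambda xs: sum(xs) / len(xs)
# Both groups share the expectation, so the difference isolates alcohol itself.
print(round(mean(alcohol_group) - mean(placebo_group), 1))
```

Without the placebo group, comparing the alcohol group to a no-drink control would conflate the 2-point pharmacological effect with the 1-point expectancy effect.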
External Validity
In order to attain a high degree of external validity in their experiments, researchers strive for maximum realism in the laboratory environment. External validity means that the results extend beyond the particular set of circumstances created in a single study. Recall that science is a cumulative discipline and that knowledge grows one study at a time. Thus, each study is more meaningful to the extent that it sheds light on a real phenomenon and to the extent that the results generalize to other studies. Let’s examine each of these criteria separately.
Mundane Realism
The first component of external validity is the extent to which an experiment captures the real-world phenomenon under study. One popular question in the area of aggression research is whether rejection by a peer group leads to aggression. That is, when people are rejected from a group, do they lash out and behave aggressively toward the members of that group? Researchers must find realistic ways to manipulate rejection and measure aggression without infringing on participants’ welfare. Given the need to strike this balance, how real can things get in the laboratory? How do we study real-world phenomena without sacrificing internal validity?
The answer is to strive for mundane realism, meaning that the research replicates the psychological conditions of the real-world phenomenon (sometimes referred to as ecological validity). In other words, we need not re-create the phenomenon down to the last detail; instead, we aim to make the laboratory setting feel like the real world. Researchers studying aggressive behavior and rejection have developed some rather clever ways of doing this, including allowing participants to administer loud noise blasts or serve large quantities of hot sauce to those who rejected them. Psychologically, these acts feel like aggressive revenge because participants are able to lash out against those who rejected them, with the intent of causing harm, even though the behaviors themselves may differ from the ways people exact revenge in the real world.
In a 1996 study, Tara MacDonald and her colleagues at Queen’s University in Ontario, Canada, examined the relationship between alcohol and condom use (MacDonald, Zanna, & Fong, 1996). The authors pointed out a puzzling set of real-world data: Most people reported that they would use condoms when engaging in casual sex, but the rates of unprotected sex (i.e., having sexual intercourse without a condom) were also remarkably high. In this study, the authors found that alcohol was a key factor in causing “common sense to go out the window” (p. 763), resulting in a decreased likelihood of condom use. But how on earth might they study this phenomenon in the laboratory? In the authors’ words, “even the most ambitious of scientists would have to conclude that it is impossible to observe the effects of intoxication on actual condom use in a controlled laboratory setting” (p. 765).
To solve this dilemma, MacDonald and colleagues developed a clever technique for studying people’s intentions to use condoms. Participants were randomly assigned to either an alcohol or placebo condition, and then they viewed a video depicting a young couple that was faced with the dilemma of whether to have unprotected sex. At the key decision point in the video, the tape was stopped and participants were asked what they would do in the situation. As predicted, participants who were randomly assigned to consume alcohol said they would be more willing to proceed with unprotected sex. While this laboratory study does not capture the full experience of making decisions about casual sex, it does a nice job of capturing the psychological conditions involved.
Generalizing Results
The second component of external validity is the extent to which research findings generalize to other studies. Generalizability refers to the extent to which the results extend to other studies, using a wide variety of populations and a wide variety of operational definitions (sometimes referred to as population validity). If we conclude that rejection causes people to become more aggressive, for example, this conclusion should ideally carry over to other studies of the same phenomenon, using different ways of manipulating rejection and different ways of measuring aggression. If we want to conclude that alcohol reduces intentions to use condoms, we would need to test this relationship in a variety of settings, from laboratories to nightclubs, using different measures of intentions.
Thus, each study that we conduct is limited in its conclusions. In order for your particular idea to take hold in the scientific literature, it must be replicated, or repeated in different contexts. These replications can take one of four forms. First, exact replication involves trying to re-create the original experiment as closely as possible in order to verify the findings. This type of replication is often the first step following a surprising result, and it helps researchers to gain more confidence in the patterns. The second and much more common method, conceptual replication, involves testing the relationship between conceptual variables using new operational definitions. Conceptual replications would include testing our aggression hypotheses using new measures or examining the link between alcohol and condom use in different settings. For example, rejection might be operationalized in one study by having participants be chosen last for a group project. A conceptual replication might take a different approach: operationalizing rejection by having participants be ignored during a group conversation or voted out of the group. Likewise, a conceptual replication might change the operationalization of aggression by having one study measure the delivery of loud blasts of noise and another measure the amount of hot sauce that people give to their rejecters. Each variation studies the same concept (aggression or rejection) but uses slightly different operationalizations. If all of these variations yielded similar results, this would provide further evidence of the underlying ideas: in this case, that rejection causes people to be more aggressive.
The third method, participant replication, involves repeating the study with a new population of participants. These types of replication are usually driven by a compelling theory as to why the two populations differ. For example, you might reasonably hypothesize that the decision to use condoms is guided by a different set of considerations among college students than among older, single adults. Finally, constructive replication re-creates the original experiment but adds elements to the design. These additions are typically designed to either rule out alternative explanations or extend knowledge about the variables under study. In our rejection and aggression example, you might test whether males and females respond the same way or perhaps compare the impact of being rejected by a group versus an individual.
Internal Versus External Validity
We have focused on two ways to assess validity in the context of experimental designs. Internal validity assesses the degree to which results can be attributed to independent variables; external validity assesses how well results generalize beyond the specific conditions of the experiment. In an ideal world, studies would have a high degree of both of these. That is, we would feel completely confident that our independent variable was the only cause of differences in our dependent variable, and our experimental paradigm would perfectly capture the real-world phenomenon under study.
In reality, though, there is often a trade-off between internal and external validity. In MacDonald and colleagues’ study on condom use (MacDonald et al., 1996), the researchers sacrificed some realism in order to conduct a tightly controlled study of participants’ intentions. In Berkowitz and LePage’s (1967) study on the effect of weapons, the researchers risked the presence of a demand characteristic in order to study reactions to actual weapons. These types of trade-offs are always made based on the goals of the experiment. To give you a better sense of how researchers make these compromises, let’s evaluate three fictional examples.
Scenario 1: Time Pressure and Stereotyping
Dr. Bob is interested in whether people are more likely to rely on stereotypes when they are in a hurry. In a well-controlled laboratory experiment, participants are asked to categorize ambiguous shapes as either squares or circles, and half of these participants are given a short time limit to accomplish the task. The independent variable is the presence or absence of time pressure, and the dependent variable is the extent to which people use stereotypes in their classification of ambiguous shapes. Dr. Bob hypothesizes that people will be more likely to use stereotypes when they are in a hurry because they will have fewer cognitive resources to carefully consider all aspects of the situation. Dr. Bob takes great care to have all participants meet in the same room. He uses the same research assistant every time, and the study is always conducted in the morning. Consistent with his hypothesis, Dr. Bob finds that people seem to use shape stereotypes more under time pressure.
The internal validity of this study appears high: Dr. Bob has controlled for other influences on participants’ attention span by collecting all of his data in the morning. He has also minimized error variance by using the same room and the same research assistant. In addition, Dr. Bob has created a tightly controlled study of stereotyping through the use of circles and squares. Had he used photographs of people (rather than shapes), the attractiveness of these people might have influenced participants’ judgments. But here’s the trade-off: By studying the social phenomenon of stereotyping using geometric shapes, Bob has removed the social element of the study, thereby posing a threat to mundane realism. The psychological meaning of stereotyping shapes is rather different from the meaning of stereotyping people, which makes this study relatively low in external validity.
Scenario 2: Hunger and Mood
Dr. Jen is interested in the effects of hunger on mood; not surprisingly, she predicts that people will be happier when they are well fed. She tests this hypothesis with a lengthy laboratory experiment, requiring participants to be confined to a laboratory room for 12 hours with very few distractions. Participants have access to a small pile of magazines to help pass the time. Half of the participants are allowed to eat during this time, and the other half is deprived of food for the full 12 hours. Dr. Jen, a naturally friendly person, collects data from the food-deprivation group on a Saturday afternoon, while her grumpy research assistant, Mike, collects data from the well-fed group on a Monday morning. Her independent variable is food deprivation, with participants either not deprived of food or deprived for 12 hours. Her dependent variable consists of participants’ self-reported mood ratings. When Dr. Jen analyzes the data, she is shocked to discover that participants in the food-deprivation group were much happier than those in the well-fed group.
Compared with our first scenario, this study seems high on external validity. To test her predictions about food deprivation, Dr. Jen actually deprives her participants of food. One possible problem with external validity is that participants are confined to a laboratory setting during the deprivation period with only a small pile of magazines to read. That is, participants may be more affected by hunger when they do not have other things to distract them. In the real world, people are often hungry but distracted by paying attention to work, family, or leisure activities. But Dr. Jen has sacrificed some external validity for the sake of controlling how participants spend their time during the deprivation period. The larger problem with her study has to do with internal validity. Dr. Jen has accidentally confounded two additional variables with her independent variable: Participants in the deprivation group have a different experimenter, and data are collected at a different time of day. Thus, Dr. Jen’s surprising results most likely reflect the fact that everyone is in a better mood on Saturday than on Monday and that Dr. Jen is more pleasant to spend 12 hours with than Mike is.
Scenario 3: Math Tutoring and Graduation Rates
Dr. Liz is interested in whether specialized math tutoring can help increase graduation rates among female math majors. To test her hypothesis, she solicits female volunteers for a math skills workshop by placing flyers around campus, as well as by sending email announcements to all math majors. The independent variable is whether participants are in the math skills workshop, and the dependent variable is whether participants graduate with a math degree. Those who volunteer for the workshop are given weekly skills tutoring, along with informal discussion groups designed to provide encouragement and increase motivation. At the end of the study, Dr. Liz is pleased to see that participants in the workshops are twice as likely as nonparticipants to stick with the major and graduate.
The obvious strength of this study is its external validity. Dr. Liz has provided math tutoring to math majors, and she has observed a difference in graduation rates. Thus, this study is very much embedded in the real world. But, as you might expect, this external validity comes at a cost to internal validity. The biggest flaw is that Dr. Liz has recruited volunteers for her workshops, resulting in selection bias for her sample. People who volunteer for extra math tutoring are likely to be more invested in completing their degree and might also have more time available to dedicate to their education. Dr. Liz would also need to be mindful of how many people drop out of her study. If significant numbers of participants withdrew, she could have a problem with differential attrition, so that the most motivated people stayed with the workshops. One relatively easy fix for this study would have been to ask for volunteers more generally, and then randomly assign these volunteers to take part in either the math tutoring workshops or a different type of workshop. While the sample might still have been less than random, Dr. Liz would at least have had the power to assign participants to different groups.
A Note on Qualitative Research Validity and Reliability
As discussed in Chapter 3, the validity of a quantitative study hinges on whether the experimental results demonstrate what we think they are demonstrating; reliability refers to whether the experimental results will yield the same or similar results in other experiments. The concepts of validity and reliability in qualitative research do not carry the same meanings as they do in quantitative research, nor is the goal to generalize the results to individuals, sites, or places outside of those under study. As Creswell (2009) notes, qualitative validity “means the researcher checks for the accuracy of the findings by employing certain procedures, while qualitative reliability indicates that the researcher’s approach is consistent across different researchers and different projects” (p. 190). Because qualitative research does not include experimental results, numerical output, and data analyses, many qualitative researchers argue that they must evaluate the quality of their results differently and focus more on the trustworthiness or overall worth of their data. Thus, they ask: Are the findings worthy of attention? And how do you evaluate them?
To evaluate the trustworthiness, or the validity and reliability, of qualitative studies, Guba and Lincoln (1994) proposed the alternative criteria outlined in Table 5.2, which are utilized by many qualitative researchers.
Table 5.2: Criteria for evaluating quantitative research and qualitative research

Quantitative criterion: Internal validity
· Assesses whether the independent variable is the only possible explanation for the dependent variable
Qualitative alternative: Credibility
· Used to assess “the accuracy of the identification and description of the subject of the study” (Smith & Davis, 2010, p. 51)
· Examines whether the research is credible or believable from the perspective of the participant

Quantitative criterion: External validity
· Evaluates whether the results can be applied to different populations and settings
Qualitative alternative: Transferability
· Focuses on the transferability of findings to other settings and groups
· Transferability is enhanced by providing thorough and clear reports so that the results can be transferred to a different context

Quantitative criterion: Objectivity
· Empirical findings that can be confirmed by others and corrected through subsequent research
Qualitative alternative: Confirmability
· The extent to which the qualitative report “is accurate, unbiased, and can be confirmed by others” (Smith & Davis, 2010, p. 51)
· Confirmability is enhanced by having other researchers review drafts and point out inconsistencies, contradictions, and biases
· Confirmability is also enhanced by providing thorough reports regarding the procedures that were used to check and recheck the data

Quantitative criterion: Reliability
· Assesses whether the methods will yield similar results in other studies
Qualitative alternative: Dependability
· “The extent to which the researcher believes the same results would be produced if the study were replicated” (Smith & Davis, 2010, p. 51)
· Emphasizes the need for the researcher to account for and describe the changes that occur in the setting and how these changes affected the way the researcher approached the study
There have been lengthy debates about the value of including alternative sets of criteria for judging qualitative research. Although the criteria for evaluating validity and reliability in quantitative and qualitative research may seem similar and appear to be mere relabeling of concepts, it should be noted that the procedures utilized to assess them are not. For example, to ensure validity in qualitative studies, researchers employ one or more of the following strategies:
· Triangulation: Using multiple sources of data collection to build justification for themes. If multiple sources of data confirm themes, this process can be considered to add to the validity of the study.
· Member checking: Asking participants to review the final report and confirm whether the descriptions or themes are accurate.
· Providing rich, thick descriptions: Describing in detail the setting, participants, and procedures. This process can add to the validity of the findings.
· Clarifying researcher bias: Self-reflection on any bias the researcher brings to the study helps create an open and honest report.
· Presenting negative or discrepant information: Discussing any information that runs counter to the themes.
· Peer debriefing: Utilizing peer debriefers who review and ask questions about the study.
· Spending prolonged time in the field: Spending long periods of time in the field allows the researcher to develop in-depth understandings of the phenomenon of interest. The more experience a researcher has with participants in their natural setting, the more valid the findings will be.
· External auditors: Employing an independent reviewer who is not familiar with the research or project and who can provide an objective assessment of the study.
To determine whether qualitative research approaches are consistent or reliable, researchers must first ensure that all steps of the procedures are documented thoroughly. Gibbs (2007) suggests the following strategies:
· Checking transcripts: The researcher checks written transcripts against tape-recorded information to ensure that mistakes were not made during transcription.
· Ensuring codes are stable: Verifying that a shift in the meaning of codes did not occur during the process of coding. This is accomplished by constantly comparing data with codes and providing detailed descriptions of the codes.
· Coordinating communication: The researcher communicates the analyses to coders through regular documented meetings.
· Cross-checking codes: The researcher cross-checks codes developed by other researchers and compares the results with his or her own.
5.3 Experimental Validity
Chapter 2 (Section 2.2) discussed the concept of validity, or the degree to which measures capture the constructs that they were designed to capture. For example, a measure of happiness needs to actually capture differences in people’s levels of happiness. In this section, we return to the subject of validity in an experimental context. Similar to our earlier discussion, validity refers here to whether the experimental results are demonstrating what we think they are demonstrating. We will cover two types of validity that are relevant to experimental designs. The first is internal validity, which assesses the degree to which results can be attributed to independent variables. The second is external validity, which assesses how well the results generalize to situations beyond the specific conditions laid out in the experiment. Taken together, internal and external validity provide a way to assess the merits of an experiment. However, each of these has its own threats and remedies, as discussed in the following sections.
Internal Validity
In order to have a high degree of internal validity, experimenters strive for maximum control over extraneous variables. That is, they try to design experiments so that the independent variable is the only cause of differences between groups. But, of course, no study is ever perfect, and there will always be some degree of error. In many cases, errors are the result of unavoidable causes, such as the health or mood of the participants on the day of the experiment. In other cases, errors are caused by factors that are, in fact, under the experimenter’s control. In this section, we will focus on several of these more manageable threats to internal validity and discuss strategies for reducing their influence.
Experimental Confounds
To avoid threats to the internal validity of an experiment, it is important to control and minimize the influence of extraneous variables that might add noise to a hypothesis test. In many cases, extraneous variables can be considered relatively minor nuisances, as when our mood experiment was accidentally run in a depressing room. But now, let’s say we ran our study on temperature and mood, and owing to a lack of careful planning, we accidentally placed all of the warm-room participants in a sunny room, and the cool-room participants in a windowless room. We might very well find that the warm-room participants were in a much better mood. But would this be the result of warm temperatures or the result of exposure to sunshine? Unfortunately, we would be unable to tell the difference because of a confounding variable, or confound (in the case of correlational studies, a third variable). The confounding variable changes systematically with the independent variable. In this example, room lighting would be confounded with room temperature because all of the warm-room participants were also exposed to sunshine, and all of the cool-room participants to artificial lighting. This combination of variables would leave us unable to determine which variable actually had the effect on mood. The result would be that our groups differed in more than one way, which would seriously hinder our ability to say that the independent variable (the room) caused the dependent variable (mood) to change.
The demeanor of the person running a study may be a confounding variable.
It may sound like an oversimplification, but the way to avoid confounds is to be very careful in designing experiments. By ensuring that groups are alike in every way but the experimental condition, one can generally prevent confounds. This is somewhat easier said than done because confounds can come from unexpected places. For example, most studies involve the use of multiple research assistants who manage data collection and interact with participants. Some of these assistants might be more or less friendly than others, so it is important to make sure each of them interacts with participants in all conditions. If your friendliest assistant works with everyone in the warm-room group, for example, it would result in a confounding variable (friendly versus unfriendly assistants) between room and research assistant. Consequently, you would be unable to separate the influence of your independent variable (the room) from that of the confound (your research assistant).
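To make the idea concrete, here is a minimal sketch (in Python, with hypothetical condition and assistant names) of how one might schedule research assistants so that each works in every condition, preventing assistant demeanor from varying systematically with the independent variable:

```python
import itertools
import random

def balanced_schedule(participants, conditions, assistants, seed=0):
    """Assign each participant a (condition, assistant) pair so that every
    assistant works equally often in every condition."""
    rng = random.Random(seed)
    shuffled = participants[:]
    rng.shuffle(shuffled)
    # Cycling through every condition-assistant pairing keeps any one
    # assistant from being tied to a single condition (a confound).
    pairings = itertools.cycle(itertools.product(conditions, assistants))
    return {person: next(pairings) for person in shuffled}

schedule = balanced_schedule(
    participants=[f"P{i}" for i in range(8)],
    conditions=["warm room", "cool room"],
    assistants=["friendly RA", "reserved RA"],
)
for person, (condition, assistant) in sorted(schedule.items()):
    print(person, "->", condition, "with", assistant)
```

With eight participants and four condition-assistant pairings, each pairing is used exactly twice, so friendliness is balanced across rooms rather than confounded with them.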
Selection Bias
Internal validity can also be threatened when groups are different before the manipulation, which is known as selection bias. Selection bias causes problems because these inherent differences might be the driving factor behind the results. Imagine you are testing a new program that will help people stop smoking. You might decide to ask for volunteers who are ready to quit smoking and put them through a 6-week program. But by asking for volunteers (a remarkably common error), you gather a group of people who are already somewhat motivated to stop smoking. Thus, it is difficult to separate the effects of your new program from the effects of this a priori motivation.
One easy way to avoid this problem is through either random or matched-random assignment. In the stop-smoking example, you could still ask for volunteers, but then randomly assign these volunteers to one of the two programs. Because both groups would consist of people motivated to quit smoking, this would help to cancel out the effects of motivation. Another way to minimize selection bias is to use the same people in both conditions so that they serve as their own control. In the stop-smoking example, you could assign volunteers first to one program and then to the other. However, you might run into a problem with this approach: participants who successfully quit smoking in the first program would not benefit from the second program. This technique is known as a within-subject design, and we will discuss its advantages and disadvantages in the subsection "Within-Subject Designs" in Section 5.4, Experimental Designs.
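A random-assignment step like the one described above can be sketched as follows; the group labels and seed are illustrative, not from the text:

```python
import random

def random_assignment(volunteers, groups=("new program", "standard program"), seed=42):
    """Shuffle the volunteer pool, then split it evenly across groups so that
    pre-existing motivation is distributed by chance rather than by choice."""
    rng = random.Random(seed)
    pool = list(volunteers)
    rng.shuffle(pool)
    size = len(pool) // len(groups)
    return {group: pool[i * size:(i + 1) * size] for i, group in enumerate(groups)}

assignment = random_assignment([f"volunteer_{i}" for i in range(20)])
print({group: len(members) for group, members in assignment.items()})
```

Because every volunteer has the same chance of landing in either program, motivation to quit should be roughly equal across the two groups.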
Differential Attrition
Despite your best efforts at random assignment, you could still have a biased sample at the end of a study as a result of differential attrition. The problem of differential attrition (sometimes called the mortality threat) occurs when subjects drop out of experimental groups for different reasons. Let's say you're conducting a study of the effects of exercise on depression levels. You manage to randomly assign people to either 1 week of regular exercise or 1 week of regular therapy. At first glance, it appears that the exercise group shows a dramatic drop in depression symptoms. But then you notice that approximately one third of the people in this group dropped out before completing the study. Chances are you are left with those who are most motivated to exercise, to overcome their depression, or both. Thus, you are unable to isolate the effects of your independent variable on depression symptoms. While you cannot prevent people from dropping out of your study, you can look carefully at those who do. In many cases, you can spot a pattern and use it to guide future research. For example, it may be possible to discover a profile of people who dropped out of the exercise study and use this knowledge to increase retention for the next attempt.
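The dropout pattern described above can be checked with a quick per-group calculation; the enrollment and completion numbers below are hypothetical:

```python
def attrition_rates(enrolled, completed):
    """Dropout rate per group; a large gap between groups suggests
    differential attrition rather than a treatment effect."""
    return {group: 1 - len(completed[group]) / len(enrolled[group])
            for group in enrolled}

# Hypothetical counts: 30 enrolled per group, unequal completion.
enrolled = {"exercise": list(range(30)), "therapy": list(range(30))}
completed = {"exercise": list(range(20)), "therapy": list(range(28))}
rates = attrition_rates(enrolled, completed)
print(rates)
```

Here roughly one third of the exercise group drops out versus a small fraction of the therapy group, which is exactly the kind of asymmetry that should make a researcher cautious about interpreting the group difference.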
Outside Events
As much as we strive to control the laboratory environment, participants are often influenced by events in the outside world. These events, sometimes called history effects, are often large-scale events such as political upheavals and natural disasters. The threat to research is that it becomes difficult to tell whether participants' responses are the result of the independent variable or the historical event(s). One great example of this comes from a paper published by social psychologist Ryan Brown, now a professor at the University of Oklahoma, on the effects of receiving different types of affirmative action as people were selected for a leadership position. The goal was to determine the best way to frame affirmative action in order to avoid undermining the recipient's confidence (Brown, Charnsangavej, Keough, Newman, & Rentfrow, 2000). For about a week during the data collection process, students at the University of Texas, where the study was being conducted, were protesting on the main lawn about a controversial lawsuit regarding affirmative action policies. The result was that participants arriving for this laboratory study had to pass through a swarm of people holding signs that either denounced or supported affirmative action. These types of outside events are difficult, if not impossible, to control. But, because these researchers were aware of the protests, they made a decision to exclude from the study data gathered from participants during the week of the protests, thus minimizing the effects of these outside events.
Expectancy Effects
One final set of threats to internal validity results from the influence of expectancies on people's behavior. This can cause trouble for experimental designs in three related ways. First, experimenter expectancies can cause researchers to see what they expect to see, leading to subtle bias in favor of their hypotheses. In a clever demonstration of this phenomenon, the psychologist Robert Rosenthal asked his graduate students at Harvard University to train groups of rats to run a maze (Rosenthal & Fode, 1963). He also told them that based on a pretest, the rats had been classified as either bright or dull. As you might have guessed, these labels were pure fiction, but they still influenced the way that the students treated the rats. Rats labeled "bright" were given more encouragement and learned the maze much more quickly than rats labeled "dull." Rosenthal later extended this line of work to teachers' expectations of their students (Rosenthal & Jacobson, 1968) and found support for the same conclusion: People often bring about the results they expect by behaving in a particular way.
One common way to avoid experimenter expectancies is to have participants interact with a researcher who is "blind" (i.e., unaware) to the condition that each participant is in. The researcher may be fully aware of the research hypothesis, but his or her behavior is unlikely to affect the results. In the Rosenthal and Fode (1963) study, the graduate students' behavior influenced the rats' learning speed only because they were aware of the labels "bright" and "dull." If these had not been assigned, the rats would have been treated fairly equally across the conditions.
Second, participants in a research study often behave differently based on their own expectancies about the goals of the study. These expectancies often develop in response to demand characteristics, or cues in the study that lead participants to guess the hypothesis. In a well-known study conducted at the University of Wisconsin, psychologists Leonard Berkowitz and Anthony LePage found that participants would behave more aggressively (by delivering electric shocks to another participant) if a gun was in the room than if there were no gun present (Berkowitz & LePage, 1967). This finding has some clear implications for gun control policies, suggesting that the mere presence of guns increases the likelihood of violence. However, a common critique of this study is that participants may have quickly clued in to its purpose and figured out how they were "supposed" to behave. That is, the gun served as a demand characteristic, possibly making participants act more aggressively because they thought it was expected of them.
To minimize demand characteristics, researchers use a variety of techniques, all of which attempt to hide the true purpose of the study from participants. One common strategy is to use a cover story, or a misleading statement about what is being studied. In Chapter 1 (Section 1.4, Hypotheses and Theories, and Section 1.7, Ethics in Research), we discussed Milgram's famous obedience studies, which discovered that people were willing to obey orders to deliver dangerous levels of electric shocks to other people. In order to disguise the purpose of the study, Milgram described it to people as a study of punishment and learning. And the affirmative action study by Ryan Brown and colleagues (Brown et al., 2000) was presented as a study of leadership styles. The goal in using these cover stories is to give participants a compelling explanation for what they experience during the study and to direct their attention away from the research hypothesis.
Another strategy is to use the unrelated-experiments technique, which leads participants to believe that they are completing two different experiments during one laboratory session. The experimenter can use this bit of deception to present the independent variable during the first experiment and then measure the dependent variable during the second experiment. For example, a study by Harvard psychologist Margaret Shih and colleagues (Shih, Pittinsky, & Ambady, 1999) recruited Asian American females and asked them to complete two supposedly unrelated studies. In the first, they were asked to read and form impressions of one of two magazine articles; these articles were designed to make them focus on either their Asian American identity or their female identity. In the second experiment, they were asked to complete a math test as quickly as possible. The goal of this study was to examine the effects on math performance of priming different aspects of identity. Based on previous research, these authors predicted that priming an Asian American identity would remind participants of positive stereotypes regarding Asians and math performance, whereas priming a female identity would remind participants of negative stereotypes regarding women and math performance. As expected, priming an Asian American identity led this group of participants to do better on a math test than did priming a female identity. The unrelated-experiments technique was especially useful for this study because it kept participants from connecting the independent variable (magazine article prime) with the dependent variable (math test).
A final way in which expectancies shape behavior is the placebo effect, meaning that change can result from the mere expectation that change will occur. Imagine you wanted to test the hypothesis that alcohol causes people to become aggressive. One relatively easy way to do this would be to give alcohol to a group of volunteers (aged 21 and older) and then measure how aggressive they became in response to being provoked. The problem with this approach is that people also expect alcohol to change their behavior, so you might see changes in aggression simply because of these expectations. Fortunately, there is an easy solution: Add a placebo control group to your study that mimics the experimental condition in every way but one. In this case, you might tell all participants that they will be drinking a mix of vodka and orange juice but only add vodka to half of the participants' drinks. The orange-juice-only group serves as the placebo control, so any differences between this group and the alcohol group can be attributed to the alcohol itself.
External Validity
In order to attain a high degree of external validity in their experiments, researchers strive for maximum realism in the laboratory environment. External validity means that the results extend beyond the particular set of circumstances created in a single study. Recall that science is a cumulative discipline and that knowledge grows one study at a time. Thus, each study is more meaningful to the extent that it sheds light on a real phenomenon and to the extent that the results generalize to other studies. Let's examine each of these criteria separately.
Mundane Realism
The first component of external validity is the extent to which an experiment captures the real-world phenomenon under study. One popular question in the area of aggression research is whether rejection by a peer group leads to aggression. That is, when people are rejected from a group, do they lash out and behave aggressively toward the members of that group? Researchers must find realistic ways to manipulate rejection and measure aggression without infringing on participants' welfare. Given the need to strike this balance, how real can things get in the laboratory? How do we study real-world phenomena without sacrificing internal validity?
The answer is to strive for mundane realism, meaning that the research replicates the psychological conditions of the real-world phenomenon (sometimes referred to as ecological validity). In other words, we need not re-create the phenomenon down to the last detail; instead, we aim to make the laboratory setting feel like the real world. Researchers studying aggressive behavior and rejection have developed some rather clever ways of doing this, including allowing participants to administer loud noise blasts or serve large quantities of hot sauce to those who rejected them. Psychologically, these acts feel like aggressive revenge because participants are able to lash out against those who rejected them, with the intent of causing harm, even though the behaviors themselves may differ from the ways people exact revenge in the real world.
In a 1996 study, Tara MacDonald and her colleagues at Queen's University in Ontario, Canada, examined the relationship between alcohol and condom use (MacDonald, Zanna, & Fong, 1996). The authors pointed out a puzzling set of real-world data: Most people reported that they would use condoms when engaging in casual sex, but the rates of unprotected sex (i.e., having sexual intercourse without a condom) were also remarkably high. In this study, the authors found that alcohol was a key factor in causing "common sense to go out the window" (p. 763), resulting in a decreased likelihood of condom use. But how on earth might they study this phenomenon in the laboratory? In the authors' words, "even the most ambitious of scientists would have to conclude that it is impossible to observe the effects of intoxication on actual condom use in a controlled laboratory setting" (p. 765).
To solve this dilemma, MacDonald and colleagues developed a clever technique for studying people's intentions to use condoms. Participants were randomly assigned to either an alcohol or placebo condition, and then they viewed a video depicting a young couple that was faced with the dilemma of whether to have unprotected sex. At the key decision point in the video, the tape was stopped and participants were asked what they would do in the situation. As predicted, participants who were randomly assigned to consume alcohol said they would be more willing to proceed with unprotected sex. While this laboratory study does not capture the full experience of making decisions about casual sex, it does a nice job of capturing the psychological conditions involved.
Generalizing Results
The second component of external validity is the extent to which research findings generalize to other studies. Generalizability refers to the extent to which the results extend to other studies, using a wide variety of populations and a wide variety of operational definitions (sometimes referred to as population validity). If we conclude that rejection causes people to become more aggressive, for example, this conclusion should ideally carry over to other studies of the same phenomenon, using different ways of manipulating rejection and different ways of measuring aggression. If we want to conclude that alcohol reduces intentions to use condoms, we would need to test this relationship in a variety of settings, from laboratories to nightclubs, using different measures of intentions.
Thus, each study that we conduct is limited in its conclusions. In order for your particular idea to take hold in the scientific literature, it must be replicated, or repeated in different contexts. These replications can take one of four forms. First, exact replication involves trying to re-create the original experiment as closely as possible in order to verify the findings. This type of replication is often the first step following a surprising result, and it helps researchers to gain more confidence in the patterns. The second and much more common method, conceptual replication, involves testing the relationship between conceptual variables using new operational definitions. Conceptual replications would include testing our aggression hypotheses using new measures or examining the link between alcohol and condom use in different settings. For example, rejection might be operationalized in one study by having participants be chosen last for a group project. A conceptual replication might take a different approach: operationalizing rejection by having participants be ignored during a group conversation or voted out of the group. Likewise, a conceptual replication might change the operationalization of aggression by having one study measure the delivery of loud blasts of noise and another measure the amount of hot sauce that people give to their rejecters. Each variation studies the same concept (aggression or rejection) but uses slightly different operationalizations. If all of these variations yielded similar results, this would provide further evidence of the underlying ideas: in this case, that rejection causes people to be more aggressive.
The third method, participant replication, involves repeating the study with a new population of participants. These types of replication are usually driven by a compelling theory as to why the two populations differ. For example, you might reasonably hypothesize that the decision to use condoms is guided by a different set of considerations among college students than among older, single adults. Finally, constructive replication re-creates the original experiment but adds elements to the design. These additions are typically designed to either rule out alternative explanations or extend knowledge about the variables under study. In our rejection and aggression example, you might test whether males and females respond the same way or perhaps compare the impact of being rejected by a group versus an individual.
Internal Versus External Validity
We have focused on two ways to assess validity in the context of experimental designs. Internal validity assesses the degree to which results can be attributed to independent variables; external validity assesses how well results generalize beyond the specific conditions of the experiment. In an ideal world, studies would have a high degree of both of these. That is, we would feel completely confident that our independent variable was the only cause of differences in our dependent variable, and our experimental paradigm would perfectly capture the real-world phenomenon under study.
In reality, though, there is often a trade-off between internal and external validity. In MacDonald and colleagues' study on condom use (MacDonald et al., 1996), the researchers sacrificed some realism in order to conduct a tightly controlled study of participants' intentions. In Berkowitz and LePage's (1967) study on the effect of weapons, the researchers risked the presence of a demand characteristic in order to study reactions to actual weapons. These types of trade-offs are always made based on the goals of the experiment. To give you a better sense of how researchers make these compromises, let's evaluate three fictional examples.
Scenario 1: Time Pressure and Stereotyping
Dr. Bob is interested in whether people are more likely to rely on stereotypes when they are in a hurry. In a well-controlled laboratory experiment, participants are asked to categorize ambiguous shapes as either squares or circles, and half of these participants are given a short time limit to accomplish the task. The independent variable is the presence or absence of time pressure, and the dependent variable is the extent to which people use stereotypes in their classification of ambiguous shapes. Dr. Bob hypothesizes that people will be more likely to use stereotypes when they are in a hurry because they will have fewer cognitive resources to carefully consider all aspects of the situation. Dr. Bob takes great care to have all participants meet in the same room. He uses the same research assistant every time, and the study is always conducted in the morning. Consistent with his hypothesis, Dr. Bob finds that people seem to use shape stereotypes more under time pressure.
The internal validity of this study appears high: Dr. Bob has controlled for other influences on participants' attention span by collecting all of his data in the morning. He has also minimized error variance by using the same room and the same research assistant. In addition, Dr. Bob has created a tightly controlled study of stereotyping through the use of circles and squares. Had he used photographs of people (rather than shapes), the attractiveness of these people might have influenced participants' judgments. But here's the trade-off: By studying the social phenomenon of stereotyping using geometric shapes, Bob has removed the social element of the study, thereby posing a threat to mundane realism. The psychological meaning of stereotyping shapes is rather different from the meaning of stereotyping people, which makes this study relatively low in external validity.
Scenario 2: Hunger and Mood
Dr. Jen is interested in the effects of hunger on mood; not surprisingly, she predicts that people will be happier when they are well fed. She tests this hypothesis with a lengthy laboratory experiment, requiring participants to be confined to a laboratory room for 12 hours with very few distractions. Participants have access to a small pile of magazines to help pass the time. Half of the participants are allowed to eat during this time, and the other half is deprived of food for the full 12 hours. Dr. Jen, a naturally friendly person, collects data from the food-deprivation groups on a Saturday afternoon, while her grumpy research assistant, Mike, collects data from the well-fed group on a Monday morning. Her independent variable is food deprivation, with participants either not deprived of food or deprived for 12 hours. Her dependent variable consists of participants' self-reported mood ratings. When Dr. Jen analyzes the data, she is shocked to discover that participants in the food-deprivation group were much happier than those in the well-fed group.
Compared with our first scenario, this study seems high on external validity. To test her predictions about food deprivation, Dr. Jen actually deprives her participants of food. One possible problem with external validity is that participants are confined to a laboratory setting during the deprivation period with only a small pile of magazines to read. That is, participants may be more affected by hunger when they do not have other things to distract them. In the real world, people are often hungry but distracted by paying attention to work, family, or leisure activities. But Dr. Jen has sacrificed some external validity for the sake of controlling how participants spend their time during the deprivation period. The larger problem with her study has to do with internal validity. Dr. Jen has accidentally confounded two additional variables with her independent variable: Participants in the deprivation group have a different experimenter and data are collected at a different time of day. Thus, Dr. Jen's surprising results most likely reflect the fact that everyone is in a better mood on Saturday than on Monday and that Dr. Jen is more pleasant to spend 12 hours with than Mike is.
Scenario 3: Math Tutoring and Graduation Rates
Dr. Liz is interested in whether specialized math tutoring can help increase graduation rates among female math majors. To test her hypothesis, she solicits female volunteers for a math skills workshop by placing flyers around campus, as well as by sending email announcements to all math majors. The independent variable is whether participants are in the math skills workshop, and the dependent variable is whether participants graduate with a math degree. Those who volunteer for the workshop are given weekly skills tutoring, along with informal discussion groups designed to provide encouragement and increase motivation. At the end of the study, Dr. Liz is pleased to see that participants in the workshops are twice as likely as nonparticipants to stick with the major and graduate.
The obvious strength of this study is its external validity. Dr. Liz has provided math tutoring to math majors, and she has observed a difference in graduation rates. Thus, this study is very much embedded in the real world. But, as you might expect, this external validity comes at a cost to internal validity. The biggest flaw is that Dr. Liz has recruited volunteers for her workshops, resulting in selection bias for her sample. People who volunteer for extra math tutoring are likely to be more invested in completing their degree and might also have more time available to dedicate to their education. Dr. Liz would also need to be mindful of how many people drop out of her study. If significant numbers of participants withdrew, she could have a problem with differential attrition, so that the most motivated people stayed with the workshops. One relatively easy fix for this study would have been to ask for volunteers more generally, and then randomly assign these volunteers to take part in either the math tutoring workshops or a different type of workshop. While the sample might still have been less than random, Dr. Liz would at least have had the power to assign participants to different groups.
A Note on Qualitative Research Validity and Reliability
As discussed in Chapter 3, the validity of a quantitative study hinges on whether the experimental results demonstrate what we think they are demonstrating; reliability refers to whether the experimental results will yield the same or similar results in other experiments. The concepts of validity and reliability in qualitative research do not carry the same meanings as they do in quantitative research, nor is the goal to generalize the results to individuals, sites, or places outside of those under study. As Creswell (2009) notes, qualitative validity "means the researcher checks for the accuracy of the findings by employing certain procedures, while qualitative reliability indicates that the researcher's approach is consistent across different researchers and different projects" (p. 190). Because qualitative research does not include experimental results, numerical output, and data analyses, many qualitative researchers argue that they must evaluate the quality of their results differently and focus more on the trustworthiness or overall worth of their data. Thus, they ask, are the findings worthy of attention? And, how do you evaluate them?
To evaluate the trustworthiness or the validity and reliability of qualitative studies, Guba and Lincoln (1994) proposed the alternative criteria outlined in Table 5.2, which are utilized by many qualitative researchers.
Table 5.2: Criteria for evaluating quantitative research and qualitative research

Internal validity (quantitative criterion)
· Assesses whether the independent variable is the only possible explanation for the dependent variable
Credibility (qualitative alternative)
· Used to assess "the accuracy of the identification and description of the subject of the study" (Smith & Davis, 2010, p. 51)
· Examines whether the research is credible or believable from the perspective of the participant

External validity (quantitative criterion)
· Evaluates whether the results can be applied to different populations and settings
Transferability (qualitative alternative)
· Focuses on the transferability of findings to other settings and groups
· Transferability is enhanced by providing thorough and clear reports so that the results can be transferred to a different context

Objectivity (quantitative criterion)
· Empirical findings that can be confirmed by others and corrected through subsequent research
Confirmability (qualitative alternative)
· The extent to which the qualitative report "is accurate, unbiased, and can be confirmed by others" (Smith & Davis, 2010, p. 51)
· Confirmability is enhanced by having other researchers review drafts and point out inconsistencies, contradictions, and biases
· Confirmability is also enhanced by providing thorough reports regarding the procedures that were used to check and recheck the data

Reliability (quantitative criterion)
· Assesses whether the methods will yield similar results in other studies
Dependability (qualitative alternative)
· "The extent to which the researcher believes the same results would be produced if the study were replicated" (Smith & Davis, 2010, p. 51)
· Emphasizes the need for the researcher to account for and describe the changes that occur in the setting and how these changes affected the way the researcher approached the study
There have been lengthy debates about the value of including alternative sets of criteria for judging qualitative research. Although the criteria for evaluating validity and reliability in quantitative and qualitative research may seem similar and appear to be mere relabeling of concepts, it should be noted that the procedures utilized to assess them are not. For example, to ensure validity in qualitative studies, researchers employ one or more of the following strategies:
· Triangulation: Using multiple sources of data collection to build justification for themes. If multiple sources of data confirm themes, this process can be considered to add to the validity of the study.
· Member checking: Asking participants to review the final report and confirm whether the descriptions or themes are accurate.
· Providing rich, thick descriptions: Describing in detail the setting, participants, and procedures. This process can add to the validity of the findings.
· Clarifying researcher bias: Self-reflection on any bias the researcher brings to the study helps create an open and honest report.
· Presenting negative or discrepant information: The researcher discussing any information that runs counter to the themes.
· Peer debriefing: Utilizing peer debriefers who review and ask questions about the study.
· Spending prolonged time in the field: Spending long periods of time in the field allows the researcher to develop in-depth understandings of the phenomenon of interest. The more experience a researcher has with participants in their natural setting, the more valid the findings will be.
· External auditors: Employing an independent reviewer who is not familiar with the research or project and who can provide an objective assessment of the study.
To determine whether qualitative research approaches are consistent or reliable, researchers must first ensure that all steps of the procedures are documented thoroughly. Gibbs (2007) suggests the following strategies:
· Checking transcripts: The researcher checks written transcripts against tape-recorded information to ensure that mistakes were not made during transcription.
· Ensuring codes are stable: Verifying that a shift in the meaning of codes did not occur during the process of coding. This is accomplished by constantly comparing data with codes and providing detailed descriptions of the codes.
· Coordinating communication: The researcher communicates the analyses to coders through regular documented meetings.
· Cross-checking codes: The researcher cross-checks codes developed by other researchers and compares the results with his or her own.
5.4 Experimental Designs
There are three types of experimental designs that can be used in research studies: pre-experimental designs, quasi-experimental designs, and true experimental designs. Each design differs in the degree to which it controls for confounding or hidden variables; in turn, the degree of control affects the internal validity of the study. Pre-experimental designs have little to almost no control; instead, they involve studying a single group or unbalanced groups that are not randomly assigned and then introducing an intervention or treatment during the study. Quasi-experimental designs offer some control and may or may not include a control group, but the participants are not randomly assigned to groups. True experimental designs give the researcher maximum control and involve randomly assigning participants and manipulating the independent variable. As we will see later in this discussion, true experiments include randomized group selection and assignment, a control group, and a large degree of control over confounding variables.
The best types of designs are those that offer control over confounding variables and include random assignment of the participants. Although true experiments provide a much stronger and more valid design, sometimes they cannot be used for a study for particular reasons. The following sections will discuss examples of the various types of designs. This is not an exhaustive list, and researchers can modify designs or combine them in various ways.
Pre-Experimental Designs
Pre-experimental designs follow basic experimental procedures but cannot show cause-and-effect relationships. Some researchers argue that pre-experimental designs confer so little control that they have minimal scientific value. As a result, these designs should be used only when making tentative hypotheses and should be followed up with more controlled research (Leedy & Ormrod, 2010). As we will see in the following three sections, pre-experimental designs generally enroll a single group, although some designs include experimental and control groups that are not randomly selected.
The three pre-experimental designs we describe next are very limited in terms of the conclusions that can be drawn from them. In comparison, quasi-experimental designs (described later in this section) confer more control over confounding variables.
One-Shot Case Study
One-shot case studies are the most basic type of experimental design. Because they do not include a control or comparison group, it is impossible to know whether the outcome results would have been better if the intervention had not been provided. This type of pre-experimental design involves using one group (Group A), introducing an intervention (X), and then administering a posttest observation (O) to determine the effects of the intervention, as shown below:
Group A: X_____________O
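The weakness of the X-then-O layout can be made concrete with a small, hypothetical simulation (the scores and numbers below are invented for illustration and do not come from the chapter):

```python
import random

random.seed(42)  # reproducible hypothetical data

# One-shot case study (Group A: X ... O): a single group receives
# the intervention X, then one posttest observation O is recorded.
posttest_scores = [random.gauss(75, 10) for _ in range(30)]
observed_mean = sum(posttest_scores) / len(posttest_scores)

# The design yields only this single number. With no pretest and no
# control group, there is no baseline to compare it against, so the
# same mean is equally consistent with a large benefit, no effect,
# or even a harmful intervention.
print(round(observed_mean, 1))
```

Whatever mean the posttest produces, nothing in the design tells us what the group would have scored without X, which is exactly the internal-validity problem the text describes.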
Additionally, since this type of design does not use a pretest, it is impossible to determine whether changes within the group have actually taken place. As a result, one-shot case studies have low internal validity.
In addition, because one-shot case studies confer little to no control over confounding variables, variables such as maturation or other environmental con