The plan for today
• Quick causal inference recap
• Experimental design
• Ethics
• Next week: more experimental design
The Fundamental Problem of Causal Inference
• It is impossible to observe any unit we’re interested in (e.g., person, country, firm, school) both when it has and has not been changed by a causal action
• Only in physics and chemistry are units (particles, molecules)
interchangeable (“exchangeable”) enough that we don’t have to worry
about this
• If I give 100 euros to Mary and she gets happier than she was before, we
fundamentally cannot know how happy she would have been if I had not
given her 100 euros
• We can use theory, intuition, anecdote, data to come up with a (very) good
guess
• But we can never be sure
That’s experiments in theory, what about in practice?
Real-world implementation of experiments is difficult!
• When you assign a unit (e.g. person) to treatment, they may not actually take
that treatment
• You give them a drug but they don’t take it
• You send them a YouTube video to watch but they don’t watch it, or they
mute it and don’t pay attention
• Same for the control group
• They may go out and find the drug themselves, or stumble on the YouTube video
That’s experiments in theory, what about in practice?
• “Compliers are subjects who will take the treatment if and only if they were
assigned to the treatment group…
• Non-compliers are composed of the three remaining subgroups:
• Always-takers are subjects who will always take the treatment even if they
were assigned to the control group
• Never-takers are subjects who will never take the treatment even if they
were assigned to the treatment group
• Defiers are subjects who will do the opposite of their treatment assignment status” https://en.wikipedia.org/wiki/Local_average_treatment_effect
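These four subgroups can be expressed as a small lookup on a subject’s potential take-up, where `d0` and `d1` are whether the subject would take the treatment if assigned to control or to treatment (an illustrative sketch, not code from the source):

```python
# d0 = would take treatment if assigned to control; d1 = if assigned to treatment.
# Compliers take it if and only if assigned; the other three are the non-compliers.
def subgroup(d0: int, d1: int) -> str:
    if (d0, d1) == (0, 1):
        return "complier"      # takes treatment if and only if assigned to it
    if (d0, d1) == (1, 1):
        return "always-taker"  # takes it even when assigned to control
    if (d0, d1) == (0, 0):
        return "never-taker"   # refuses it even when assigned to treatment
    return "defier"            # (1, 0): does the opposite of the assignment

print(subgroup(0, 1))  # complier
```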
Experimental design
• Experiments have three “main” components:
• Treatment
• Randomization into treatment and control groups
• Measurement of outcome
• Let’s look at each of these components in turn
• Also look at groupings of these that form common ‘types’
of experiments
Designing a treatment
Good treatments
• “One hopes that the treatment alters values of the independent variable (e.g., causes subjects to think about campaign finance in terms of free speech) or induces certain beliefs among participants (e.g., how much they will get paid).” (Druckman 2020 p.82)
• The treatment should:
• Be efficacious
• Fit with the theoretical construct the researcher is interested in
• Vitamin D and…beach holiday? Multivitamin? Stern lecture from doctor?
• Support for Putin and…seeing an official arrested for corruption? Watching a Navalny video about regime corruption? Reading a TI report about Russian corruption levels?
• Have a basis in theory
• How will the knowledge gained from the experiment fit in with other things we know about the world?
Designing a treatment
Validation, piloting
• “When it comes to evaluating treatments, researchers should not
trust themselves to validate them.
• A crucial step taken in the design of an experiment entails validating
the intervention with a sample that matches the experimental
participants and/or the participants themselves.”
• “One need not test the outcome variables of interest but instead assess whether participants interpret and react to the intervention as presumed (e.g., increased anxiety or social trust)”
Designing a treatment
Validation, piloting 2
• Piloting has the advantage of allowing one to evaluate different approaches before implementing the actual experiment
• Ideally, one pilots on a sample drawn from the same population as
the experiment
• If that is not possible, however, one should carefully think about possible differences between the pilot sample and the experimental sample”
Designing a treatment
Manipulation checks
• “In addition to piloting, one can incorporate a manipulation check
into the experiment itself to empirically assess whether respondents
receive and perceive the treatment as intended.”
• Example: experiment on whether seeing a news report from Fox
News leads people to vote for Republicans more than a CNN report
• Manipulation check: ask what the source of the clip was
• Downsides: extra cost, be careful with outcome measurement
Measurement and validity
Druckman 2020 p.87, 93
• Experiments are usually* taken to have good internal validity and ‘statistical conclusion validity’
• Good treatment design, measurement, randomization will help ensure the first three of these types of validity
External validity and generalizability
Druckman p.94-102
• “External validity means generalizing across 1) samples, 2) settings, 3)
treatments, and 4) outcome measures”
• What is being generalized?
• Existence of an effect? Precise effect size?
• To what are you generalizing?
• What population?
• The answers to these questions depend on the goals of the experiment
External validity and realism/naturalism
• Does it matter how realistic your treatment is?
• What is feasible and ethical?
• Example:
• Outcome: voting in an election
• Conceptual treatment: watching advertisements for a candidate
• Practical/actual treatment:
• Have participants watch 30 minutes of the news with advertisements
interspersed?
• Show a series of only advertisements? How many? How many times?
Other kinds of treatments
Encouragement design
• Intent-to-treat estimator
• “randomly incentivize subjects recruited via survey to follow one of two Twitter accounts programmed to retweet posts by politically influential users. Subjects were periodically quizzed about the contents of their Twitter feeds and surveyed again to gauge the effect of exposure to counter-attitudinal social media content.” (Guess 2021)
• Shows the trade-off between naturalism and strength of treatment
• “Like the offline world, online environments are crowded and multifaceted, with many competing demands on users’ attention.”
• People just don’t see or pay attention to stuff!
• “at least in an intent-to-treat world, manipulating a single post, ad impression, or account exposure may not in itself be expected to produce measurably large effects.”
Ethics
Morton & Williams Chapters 11-13
• Experiments must be
ethical!
• Harm or risk to participants
• Changing of important real-
world outcomes (e.g.,
elections)
• Deception
Ethics
Morton & Williams
• Benefits vs. risks
• Harms
• Psychological harm
• Invasion of privacy or confidentiality
Ethics
Morton & Williams
• Probability and magnitude of harm
• Compare to daily life and routine risks
• Vulnerable subjects
• Prisoners, children, disabled
• When possible, experiments need to get informed consent
• Not always feasible! This may be a foreign concept or may interfere with the
experiment
• “Informed consent has become a mainstay of research with human subjects because it
serves two purposes: (1) it ensures that the subjects are voluntarily participating and
that their autonomy is protected and (2) it provides researchers with legal protections in
case of unexpected events.”
Ethics
Morton & Williams Chapter 13
• Deception
• Concerns about contaminating a subject pool
• If you must use deception, you should probably debrief
Population and sample
• The population you wish to generalize to may be:
• All adult residents of Ireland
• All adult voters of Ireland
• Residents of Dublin between 18 and 45 years of age
• Or perhaps the population is irrelevant
• Your experiment will need to define a sample of that population on which your treatment will be applied
Sampling
Druckman p.109-120
• How homogenous do you think the treatment is?
• If you’re interested in attitudes towards pension reform, your sample may need sufficient young and old people
• Pharmaceuticals and biological sex
• Urban vs. rural residents
• Cost, generalizability, practicality
Sampling: Random samples
• Dial random telephone numbers
• Pick names out of a list (phonebook) randomly
• Where do you get the list??
• Not always legal or feasible
Druckman p.109-120
Sampling: Convenience samples
• Take whoever is convenient
• or whoever selects into your sample
• Put up posters, send out emails, buy advertisements
• Talk to people on the street
• Cheaper and easier, but sharply limits generalizability
Druckman p.109-120
Sampling: Weighting
• “Weighting requires that one obtain descriptive data of the target
population, typically demographics.
• For example, when the population includes all Americans, one can use the U.S. Census…for demographic population figures.
• One then computes weights that account for each respondent’s
probability of being included in the sample
• For example, if the population consists of 50% men but the
sample contains only 40% men, then male sample respondents
will be weighted to count more in computations from the sample
(and women will be counted less)
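The slide’s example can be worked through directly: each group’s weight is its population share divided by its sample share, so men get 0.5 / 0.4 = 1.25 and women 0.5 / 0.6 ≈ 0.83. A minimal sketch (data and names are illustrative):

```python
# Post-stratification weights: population share / sample share per group.
population_shares = {"men": 0.50, "women": 0.50}
sample_shares = {"men": 0.40, "women": 0.60}

weights = {g: population_shares[g] / sample_shares[g] for g in population_shares}
# men: 1.25 (count more), women: ~0.83 (count less)

# A weighted mean then up-weights men and down-weights women.
# Each tuple is (group, outcome), e.g. 1 = holds a particular attitude.
respondents = [("men", 1.0), ("men", 0.0), ("women", 1.0), ("women", 1.0), ("women", 0.0)]
num = sum(weights[g] * y for g, y in respondents)
den = sum(weights[g] for g, _ in respondents)
weighted_mean = num / den
```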
Druckman p.109-120
Sampling: Weighting
• Survey researchers commonly use weights, even with many
probability samples, to ensure the accuracy of observational
inferences (e.g., the percentage of men who hold a particular
attitude)” (Druckman 2020 p.117)
• Consider weighting if:
• effects are heterogeneous in a way you can correct for
• you care about the population
• you are interested in precise effect size
Druckman p.109-120
Sample size and power
https://egap.org/resource/10-things-to-know-about-statistical-power/
• “Power is the ability to distinguish signal from noise.”
• “If our experiments are highly-powered, we can be confident that if there truly is a treatment effect, we’ll be able to see it.”
• We want to avoid false negatives and false positives
• Example:
• “Now suppose an experiment instead used subjects’ income as an outcome variable.
• Incomes can vary pretty widely – in some places, it is not uncommon for people to
have neighbors that earn two, ten, or one hundred times their daily wages.
• When noise is high, experiments have more trouble.
• A treatment that increased workers’ incomes by 1% would be difficult to detect, because incomes differ by so much in the first place.”
Sample size and power
https://egap.org/resource/10-things-to-know-about-statistical-power/
• The three ingredients of statistical power:
• Strength of the treatment
• Background noise
• As the background noise of your outcome variables increases, the power of
your experiment decreases
• To the extent that it is possible, try to select outcome variables that have low
variability
• In practical terms, this means comparing the standard deviation of the outcome variable to the expected treatment effect size
• Sample size
• See link for formula and calculator, but also beware! Power is a slippery thing
Sample size and power
https://egap.org/resource/10-things-to-know-about-statistical-power/
• https://www.stat.ubc.ca/~rollin/stats/ssize/n2.html
• https://machinelearningmastery.com/statistical-power-and-power-
analysis-in-python/
• “Statistical power is the probability of a hypothesis test of finding an effect if there is an effect to be found.
• A power analysis can be used to estimate the minimum sample size required for an experiment, given a desired significance level, effect size, and statistical power.”
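The linked pages give formulas and calculators; a minimal sketch of the idea, using the standard normal-approximation sample-size formula for comparing two group means, n per group = 2·((z₁₋α/₂ + z_power)·sd / effect)² (the names `effect` and `sd` are mine, not from the source):

```python
# Per-group sample size for a two-group comparison of means,
# via the normal approximation. Larger noise (sd) or smaller
# effects blow the required n up quadratically.
from statistics import NormalDist

def n_per_group(effect, sd, alpha=0.05, power=0.80):
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)          # desired power
    return 2 * ((z_alpha + z_power) * sd / effect) ** 2

# A half-standard-deviation effect needs roughly 63 subjects per group;
# the slide's 1%-of-income effect in a very noisy outcome needs vastly more.
print(round(n_per_group(effect=0.5, sd=1.0)))   # ~63 per group
print(round(n_per_group(effect=0.01, sd=1.0)))  # on the order of 150,000 per group
```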
Randomization
Random assignment to treatment and control groups
• So you’ve got your experimental design, a sample of people to
experiment on
• Now you need to assign people to treatment and control
• Otherwise it wouldn’t be an experiment!
• Simple randomization
• Complete simple randomization
• Block and cluster randomization
Randomization: Simple random assignment
Druckman 2020, p.109-120
• “Simple random assignment is a term of art, referring to a procedure—a die roll
or coin toss—that gives each subject an identical probability of being assigned
to the treatment group
• The practical drawback of simple random assignment is that when N is small,
random chance can create a treatment group that is larger or smaller than
what the researcher intended.” (FEDAI p.36)
• “A useful special case of simple random assignment is complete random
assignment, where exactly m of N units are assigned to the treatment group with
equal probability.”
• Be careful about defining random: things like birthday may not be completely random in a formal sense
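The two procedures from the slide can be sketched in a few lines (illustrative Python, not FEDAI’s code): simple assignment flips a fair coin per subject, so the treated count varies; complete assignment fixes exactly m of N treated.

```python
import random

def simple_ra(n, p=0.5):
    """Each unit independently treated with probability p; treated count varies."""
    return [int(random.random() < p) for _ in range(n)]

def complete_ra(n, m):
    """Exactly m of n units treated, with all such assignments equally likely."""
    assignment = [1] * m + [0] * (n - m)
    random.shuffle(assignment)
    return assignment

random.seed(1)
print(sum(simple_ra(10)))       # anywhere from 0 to 10
print(sum(complete_ra(10, 5)))  # always exactly 5
```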
Block randomization
https://egap.org/resource/10-things-to-know-about-randomization/
• It is possible, when randomizing, to specify the balance of particular
factors you care about between treatment and control groups
• even though it is not possible to specify which particular units are
selected for either group
• For example, you can specify that treatment and control groups
contain equal ratios of men to women
Block randomization
https://egap.org/resource/10-things-to-know-about-randomization/
• Why is this desirable?
• Not because our estimate of the average treatment effect would otherwise be biased, but because it could be really noisy.
• Suppose that a random assignment happened to generate a very male treatment group and a very female control group. We would observe a correlation between gender and treatment status. If we were to estimate a treatment effect, that treatment effect would still be unbiased because gender did not cause treatment status.
• However, it would be more difficult to reject the null hypothesis that it was not our treatment but gender that was producing the effect.
• In short, the imbalance produces a noisy estimate, which makes it more difficult for us to be confident in our estimates.
Block randomization
https://cran.r-project.org/web/packages/randomizr/vignettes/randomizr_vignette.html
• “Block random assignment (sometimes known as stratified random assignment) is a powerful tool when used well.
• In this design, subjects are sorted into blocks (strata) according to
their pre-treatment covariates, and then complete random
assignment is conducted within each block.
• For example, a researcher might block on gender, assigning
exactly half of the men and exactly half of the women to
treatment.”
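The procedure in the quote can be sketched in Python (the name `block_ra` echoes randomizr’s function, but this is not its implementation): sort subjects into blocks by a pre-treatment covariate, then do complete random assignment within each block.

```python
import random
from collections import defaultdict

def block_ra(covariates):
    """covariates: one block label per subject. Returns 0/1 assignments
    with (close to) half of each block treated."""
    by_block = defaultdict(list)
    for i, block in enumerate(covariates):
        by_block[block].append(i)
    assignment = [0] * len(covariates)
    for indices in by_block.values():
        m = len(indices) // 2            # treat half of each block (rounding down)
        for i in random.sample(indices, m):  # complete RA within the block
            assignment[i] = 1
    return assignment

genders = ["man"] * 6 + ["woman"] * 4
z = block_ra(genders)
# Exactly 3 of the 6 men and 2 of the 4 women are treated, by construction:
print(sum(z[:6]), sum(z[6:]))  # 3 2
```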
Block randomization
https://cran.r-project.org/web/packages/randomizr/vignettes/randomizr_vignette.html
• “Why block?
• The first reason is to signal to future readers that treatment effect heterogeneity may be of interest: is the treatment effect different for men versus women? Of course, such heterogeneity could be explored if complete random assignment had been used, but blocking on a covariate defends a researcher (somewhat) against claims of data dredging.
• The second reason is to increase precision. If the blocking variables
are predictive of the outcome (i.e., they are correlated with the
outcome), then blocking may help to decrease sampling variability. It’s
important, however, not to overstate these advantages. The gains from
a blocked design can often be realized through covariate adjustment
alone.”
Cluster randomization
https://cran.r-project.org/web/packages/randomizr/vignettes/randomizr_vignette.html
• Assigning units to treatment or control as a cluster
• “Housemates in households: whole households are assigned to treatment or control
• Students in classrooms: whole classrooms are assigned to treatment or control
• Residents in towns or villages: whole communities are assigned to treatment or
control”
• Don’t do this unless you really have to!
• “Clustered assignment decreases the effective sample size of an experiment. In the extreme case when outcomes are perfectly correlated with clusters, the experiment has an effective sample size equal to the number of clusters. When outcomes are perfectly uncorrelated with clusters, the effective sample size is equal to the number of subjects. Almost all cluster-assigned experiments fall somewhere in the middle of these two extremes.”
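A sketch of cluster assignment (the name `cluster_ra` echoes randomizr’s function, but this is illustrative Python, not its implementation): randomize at the cluster level, then give every unit its cluster’s assignment.

```python
import random

def cluster_ra(cluster_of):
    """cluster_of: one cluster label per unit (e.g. household id).
    Half of the clusters (rounding down) are assigned to treatment."""
    clusters = sorted(set(cluster_of))
    treated_clusters = set(random.sample(clusters, len(clusters) // 2))
    return [int(c in treated_clusters) for c in cluster_of]

# Three households of three residents each: whole households share an assignment.
households = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
z = cluster_ra(households)
# Units in the same household always get the same assignment:
print(all(z[i] == z[j] for i in range(9) for j in range(9)
          if households[i] == households[j]))  # True
```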
Experiment cookbook
Druckman p.234+
• Big picture idea
• Short (i.e., few pages) document on the general topic and why it is
relevant to understanding social, political, and/or economic
phenomena
• Detailed literature review
• An exhaustive search of research on the topic, and detailed
descriptions of speci
fi
c studies
• It is here that the researcher should identify speci
fi
c gaps in
existing knowledge.
41
Experiment cookbook
Druckman p.234+
• Research question(s) and outcomes
• Given the identification of a gap in existing work, the next step is to put forth a specific question (or questions) to be addressed
• This includes identifying the precise outcome variable(s) of interest
Experiment cookbook
Druckman p.234+
• Theory and hypotheses
• Development of a theory and hypotheses to be tested
• Researchers should take their time to derive concrete and specific predictions
• As part of this step, potential mediators and/or moderators should be specified
• Also, in putting forth predictions, one must be careful to isolate the comparisons to be used.
Experiment cookbook
Druckman p.234+
• Research design
• Discussion of the designs used by others who have addressed
similar questions, and how the proposed design connects with
previous work. In many cases, the ideal strategy is to utilize and
extend prior designs.
• Discussion of how such a design will provide data relevant to the
larger questions.
Experiment cookbook
Druckman p.234+
• Research design (cont’d)
• Identifying where the data will come from, which includes:
• Consideration of the sample and any potential biases.
• Detailed measures and where the measures were obtained—that
is, where have they been used in prior studies? The measures
need to clearly connect to the hypotheses, including the
outcome variables and mediators/moderators.
Experiment cookbook
Druckman p.234+
• Research design (continued more)
• In many cases, the design may be too practically complex (e.g.,
number of experimental conditions relative to realistic sample size),
and decisions must be made on what can be trimmed without
interfering with the goal of the study.
• For original data collection, pre-tests of stimuli, question wordings,
etc., are critical to ensure the approach has content and construct
validity.
• Issues related to internal and external validity should be discussed.
Experiment cookbook
Druckman p.234+
• Data collection document
• If the project involves original data collection, a step-by-step plan
needs to be put forth so as not to later forget such details as
recruitment, implementation, etc.
Experiment cookbook
Druckman p.234+
• Data analysis plan
• There needs to be a clear data analysis plan—how exactly will the
data be used to test hypotheses? The researcher should directly
connect the design and measures to the hypotheses.
• This often involves making a table with each measure and how it maps on to specific hypotheses.
Experiment cookbook
Druckman p.234+
• Then
• Do the experiment
Next time
• More on specific experimental designs
• Take a look at the readings — choose chapters that are interesting to you
• Assignment 1!
• Due Sunday