This document provides an overview of randomized assignment as a treatment-comparison design for impact evaluations. It discusses:
- The internal and external validity of randomized assignment designs
- How randomized assignment produces statistically equivalent treatment and comparison groups
- The steps involved in randomized assignment, including defining the eligible population, selecting an evaluation sample, and randomly assigning units to treatment and comparison
- Considerations for the level of randomization and risks like spillovers and non-compliance
- How to estimate impacts by comparing average outcomes between treatment and comparison groups
- Checklists for ensuring a valid randomized assignment design
- Potential strengths and limitations of randomized assignment evaluations
- Different types of randomized controlled trial designs based on level of assignment and
4. Randomized assignment of
treatment
Gold standard of impact evaluation
• Uses a random process, or chance, to decide who is
granted access to the program and who is not
• Every eligible unit (for example, an individual,
household, business, school, hospital, or community) has
the same probability of being selected for treatment by a
program
11/12/2022 4
Commonly referred to as randomized controlled trials, randomized evaluations,
experimental evaluations, and social experiments
5. Why is randomized assignment also a fair
and transparent way to assign scarce program resources?
6. • Randomized assignment is a rule that can be easily explained by program
managers
• Understood by key constituents
• Considered fair in many circumstances
• Shields program managers from potential accusations of favoritism or
corruption
7. A number of programs routinely use lotteries to select participants from the pool of eligible individuals, primarily because of their advantages for administration and governance. Lotteries are a valuable operational tool.
Case 1
Case 2
Africa
8. Case 1: A public lottery method
• In Côte d’Ivoire, following a period of crisis, the government introduced a temporary employment program.
• The program offered short-term employment opportunities, mostly to clean or rehabilitate roads.
• Given the attractiveness of the benefits, many more youth applied than places were available.
• What would be a transparent and fair way of allocating the benefits among applicants? A public lottery.
9. Case 1: A public lottery method (continued)
• If N spots were available for the program, the N applicants who had drawn the lowest numbers were selected.
• The lottery process was organized separately for men and women.
• The process was well accepted by participants.
• It helped give the program an image of fairness and transparency in a post-conflict environment.
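The lottery rule described for Case 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the applicant names, group sizes, number of spots, and the seed are hypothetical, and the function name `run_public_lottery` is not taken from the program itself.

```python
import random

def run_public_lottery(applicants, spots_by_group, seed=None):
    """Each applicant draws a distinct number; the lowest numbers in each group win.

    applicants: list of (name, group) tuples (hypothetical data).
    spots_by_group: dict mapping group -> number of available spots (N).
    """
    rng = random.Random(seed)
    selected = {}
    for group, n_spots in spots_by_group.items():
        members = [name for name, g in applicants if g == group]
        # Every member draws one distinct number from 1..len(members).
        numbers = rng.sample(range(1, len(members) + 1), len(members))
        draws = sorted(zip(numbers, members))
        # The N lowest numbers are selected for the program.
        selected[group] = [name for _, name in draws[:n_spots]]
    return selected

# Hypothetical applicant pool: the lottery is run separately for men and women.
applicants = [(f"M{i}", "men") for i in range(10)] + [(f"W{i}", "women") for i in range(8)]
winners = run_public_lottery(applicants, {"men": 3, "women": 2}, seed=2022)
print(winners)
```

Recording the seed (or holding the draw in public) is one way to make the process documentable after the fact.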
11. When we randomly assign units to treatment and comparison groups, the randomized assignment process in itself will produce two groups that have a high probability of being statistically identical, as long as the number of potential units to which we apply the randomized assignment process is sufficiently large.
13. Population of eligible units = 1,000 people (40% women, 20% blue eyes)
Random assignment
Treatment group: 500 people (40% women, 20% blue eyes)
Comparison group: 500 people (40% women, 20% blue eyes)
14. • In general, if the population of eligible units is large enough, the randomized assignment mechanism will ensure that any characteristic of the population transfers to both the treatment group and the comparison group.
• Just as observed characteristics such as sex or eye color transfer to both the treatment group and the comparison group, characteristics that are more difficult to observe (unobserved variables), such as motivation, preferences, or other personality traits, will logically also be distributed equally across the treatment and comparison groups.
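A small simulation can make this concrete. The sketch below builds a synthetic population of 1,000 people with roughly 40% women and 20% blue eyes, splits it at random, and prints each group's shares. The population is simulated, the seed is arbitrary, and the helper names are illustrative only.

```python
import random

def simulate_assignment(n=1000, share_women=0.40, share_blue=0.20, seed=0):
    """Create a synthetic population and split it at random into two equal groups."""
    rng = random.Random(seed)
    population = [
        {"woman": rng.random() < share_women, "blue_eyes": rng.random() < share_blue}
        for _ in range(n)
    ]
    rng.shuffle(population)  # the randomized assignment step
    half = n // 2
    return population[:half], population[half:]

def share(group, key):
    """Fraction of a group that has the given characteristic."""
    return sum(person[key] for person in group) / len(group)

treatment, comparison = simulate_assignment()
for key in ("woman", "blue_eyes"):
    print(key, round(share(treatment, key), 3), round(share(comparison, key), 3))
```

The two groups will not match exactly in any single draw; the point is that the gaps shrink as the number of units grows.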
15. So what?
Having two groups that are similar in every way guarantees that the estimated counterfactual approximates the true value of the outcome in the absence of treatment, and that once the program is implemented, the estimated impacts will not suffer from selection bias.
16. Before a program starts
• Use baseline data from our evaluation sample to verify that there are in fact no systematic differences in observed characteristics between the treatment and comparison groups
After we launch the program
• If we observe differences in outcomes between the treatment and comparison groups, we will know that those differences can be explained only by the introduction of the program, since by construction the two groups were identical at baseline and are exposed to the same external environmental factors over time.
In this sense, the comparison group controls for all factors that might also explain the outcome of interest.
17. Randomized assignment eliminates the observed and unobserved factors that might otherwise plausibly explain the difference in outcomes, so we can be confident that our estimated impact constitutes the true impact of the program.
18. To estimate the impact of the program:
Impact = outcome under treatment - our estimate of the counterfactual
• Outcome under treatment: the mean outcome of the randomly assigned treatment group
• Estimate of the counterfactual: the mean outcome of the randomly assigned comparison group
19. In some cases, however, it is not necessary to include all units in the evaluation
20. Example: evaluating the effect of cash bonuses on the probability that mothers will get their children vaccinated
Population of eligible units = 1 million mothers
Random selection (preserves characteristics)
Evaluation sample = 1,000 mothers
Randomized assignment (preserves characteristics)
Treatment group and comparison group
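The two random steps in this example (selection into the sample, then assignment within the sample) can be sketched as follows. The identifiers are placeholder integers rather than real records, the 500/500 split is an assumption, and the seed is arbitrary.

```python
import random

rng = random.Random(42)  # fixed seed so the draw can be documented and reproduced

# Placeholder IDs standing in for 1 million eligible mothers.
eligible_ids = range(1_000_000)

# Step 1: random selection of the evaluation sample (supports external validity).
evaluation_sample = rng.sample(eligible_ids, 1000)

# Step 2: randomized assignment within the sample (supports internal validity).
rng.shuffle(evaluation_sample)
treatment_ids = set(evaluation_sample[:500])
comparison_ids = set(evaluation_sample[500:])

print(len(treatment_ids), len(comparison_ids))
```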
21. Randomized Assignment as a Program Allocation Rule: Conditional Cash Transfers and Education in Mexico
Progresa program, now called “Prospera”
● The communities and households eligible for the program were determined based on a poverty index created from census data and baseline data collection.
● About two-thirds of the localities (314 out of 495) were randomly selected to receive the program in the first two years, and the remaining 181 served as a comparison group before entering the program in the third year.
22. Randomized Assignment as a Program Allocation Rule: Conditional Cash Transfers and Education in Mexico
Progresa program, now called “Prospera”
● The largest increase in enrollment was among girls who had completed grade 6.
● The likely reason is that girls tend to drop out of school at greater rates as they get older, so they were given a slightly larger transfer to stay in school past the primary grade levels.
● These short-term impacts were then extrapolated to predict the longer-term impact of the Progresa program on lifetime schooling and earnings.
24. Internal Validity
The comparison group provides an accurate estimate of the counterfactual, so that we are estimating the true impact of the program.
Before a program starts
• Random assignment
• Two groups statistically equivalent
After we launch the program
• Differences in outcomes between the treatment and comparison groups are explained only by the introduction of the program
• The two groups are exposed to the same external environmental factors
26. External Validity
• Evaluation sample should accurately represent the population of eligible units
• The results of the evaluation can then be generalized to the population of eligible
units
• Use of random sampling to ensure that the evaluation sample accurately reflects the
population of eligible units
• So that impacts identified in the evaluation sample can be extrapolated to the
population
27. Randomized assignment and random sampling in an impact evaluation
• Randomized assignment of treatment without a random sample of population units: internally valid estimates of impact, but the estimated impacts may not be generalizable to the population of eligible units
• Random sample of population units without randomized assignment of treatment: the sample would be representative, but the comparison group may not be valid, thus jeopardizing internal validity
28. When Can Randomized Assignment Be Used?
1. When the eligible population is greater than the number of program spaces available
2. When a program needs to be gradually phased in until it covers the entire eligible population
29. Steps in Randomized Assignment to Treatment
1. Define eligible units
2. Select the evaluation sample (external validity)
3. Randomize assignment to treatment (internal validity)
30. Step 1: Define the units that are eligible for the program
• Depending on the particular program, a unit could be a person, a
health center, a school, a business, or even an entire village or
municipality.
• The population of eligible units consists of those for which we are
interested in knowing the impact of our program.
• For example, if we are implementing a training program for primary
school teachers in rural areas, then primary school teachers in urban
areas or secondary school teachers would not belong to our
population of eligible units.
31. Step 2: Select the evaluation sample
• The main reason to sample is to limit data collection costs
• Imagine an evaluation in which the population of eligible units includes tens of thousands of teachers across every school in the country
• We need to collect detailed information on teacher pedagogical knowledge and practice
• Interviewing and assessing every teacher in the country would be costly and infeasible
• Based on power calculations, we might determine that to answer the evaluation question, it is sufficient to take a sample of 1,000 teachers distributed over 200 schools
• As long as the sample of teachers is representative of the whole population of teachers, any results found in the evaluation will be externally valid
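To give a feel for what a power calculation involves, here is a minimal sketch using the standard normal-approximation formula for comparing two means, inflated by a design effect for clustered samples. It is not the calculation behind the 1,000-teacher figure above: the effect size (0.2 SD) and the intra-cluster correlation (0.1) are assumptions chosen for illustration, and only the cluster size of 5 (1,000 teachers over 200 schools) comes from the slide.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_arm(effect_size_sd, alpha=0.05, power=0.80, cluster_size=1, icc=0.0):
    """Approximate units needed per arm to detect a difference in means.

    effect_size_sd: minimum detectable effect in standard deviation units (assumed).
    cluster_size, icc: units per cluster and intra-cluster correlation (assumed).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    n_simple = 2 * (z_alpha + z_power) ** 2 / effect_size_sd ** 2
    # Clustering inflates the required sample by the design effect.
    design_effect = 1 + (cluster_size - 1) * icc
    return ceil(n_simple * design_effect)

n_individual = sample_size_per_arm(0.2)
n_clustered = sample_size_per_arm(0.2, cluster_size=5, icc=0.1)
print(n_individual, n_clustered)
```

The clustered figure is larger because teachers in the same school tend to resemble one another, so each additional teacher adds less independent information.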
32. Step 3: Randomize assignment to treatment
• Form the treatment and comparison groups from the units in the
evaluation sample through randomized assignment.
• It is important to decide on the rule before generating the random
numbers
• Whether we use a public lottery, a roll of dice, or computer-
generated random numbers, it is important to document the process
to ensure that it is transparent.
33. At What Level Do We Perform Randomized Assignment?
Possible levels: individual, household, business, community, region
In general, the level at which units are randomly assigned to treatment and comparison groups will be greatly affected by where and how the program is being implemented.
34. Level of Randomized Assignment
At very high levels of assignment, for example regions or provinces in a country, it can become difficult to perform an impact evaluation, because the number of regions or provinces in most countries is not sufficiently large to yield balanced treatment and comparison groups.
Example: 6 provinces
Treatment group: 3 provinces
Comparison group: 3 provinces
Three provinces per group are insufficient to ensure that the baseline characteristics of the treatment and comparison groups are balanced.
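The six-province problem can be illustrated with a simulation. The sketch below repeatedly splits a set of units at random and records the average absolute gap in a baseline characteristic between the two halves. The baseline values are simulated standard-normal draws, and the comparison with 600 units is an assumption added for contrast.

```python
import random
import statistics

def average_baseline_gap(n_units, n_draws=500, seed=1):
    """Average absolute treatment-comparison gap in a simulated baseline variable."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_draws):
        baseline = [rng.gauss(0, 1) for _ in range(n_units)]
        rng.shuffle(baseline)  # random assignment: first half treated
        half = n_units // 2
        gap = statistics.fmean(baseline[:half]) - statistics.fmean(baseline[half:])
        gaps.append(abs(gap))
    return statistics.fmean(gaps)

gap_six_provinces = average_baseline_gap(6)
gap_many_units = average_baseline_gap(600)
print(round(gap_six_provinces, 3), round(gap_many_units, 3))
```

With only three units per group, sizeable chance imbalances are the norm rather than the exception.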
35. Level of Randomized Assignment
• For randomized assignment to yield unbiased estimates of impact, it is
important to ensure that time-bound external factors (such as the weather or
local election cycles) are on average the same in the treatment and comparison
groups.
• As the level of assignment increases, it becomes increasingly unlikely that these
factors will be balanced across treatment and comparison groups
36. Two particular types of risk to consider when choosing the level of assignment
1. Spillovers: occur when the treatment group directly or indirectly affects outcomes in the comparison group (or vice versa)
2. Imperfect compliance: occurs when some members of the comparison group participate in the program, or some members of the treatment group do not
37. Estimating impact under randomized assignment
Treatment: average (Y) for the treatment group = 100
Comparison: average (Y) for the comparison group = 80
Impact = 100 - 80 = 20
The impact of the program is simply the difference between the average outcome (Y) for the treatment group and the average outcome (Y) for the comparison group
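The calculation above can be written as a short function that also reports a standard error for the difference. The outcome values are made up so that the group averages match the 100 and 80 on the slide; the unequal-variance standard error is one common choice, not necessarily the one used in the source.

```python
import math
import statistics

def estimate_impact(treatment_outcomes, comparison_outcomes):
    """Difference in mean outcomes and its (unequal-variance) standard error."""
    impact = statistics.fmean(treatment_outcomes) - statistics.fmean(comparison_outcomes)
    se = math.sqrt(
        statistics.variance(treatment_outcomes) / len(treatment_outcomes)
        + statistics.variance(comparison_outcomes) / len(comparison_outcomes)
    )
    return impact, se

# Hypothetical outcomes with group averages of 100 and 80.
treatment_y = [90, 100, 110, 95, 105]
comparison_y = [70, 80, 90, 75, 85]
impact, se = estimate_impact(treatment_y, comparison_y)
print(impact, se)  # impact is 20.0, standard error is 5.0
```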
38. Checklist: Randomized Assignment
Checklist 1
• Are the baseline characteristics balanced? Compare the baseline characteristics of the treatment group and the comparison group.
Checklist 2
• Has any noncompliance with the assignment occurred? Check whether all eligible units have received the treatment and that no ineligible units have received the treatment.
• If noncompliance has occurred, the instrumental variable method needs to be used.
39. Checklist: Randomized Assignment
Checklist 3
• Are the numbers of units in the treatment and comparison groups sufficiently large? If not, combine randomized assignment with difference-in-differences.
Checklist 4
• Is there any reason to believe that outcomes for some units may somehow depend on the assignment of other units? Could there be an impact of the treatment on units in the comparison group?
40. Strengths of Randomization
01 A well-designed RCT tells the clearest causal story possible
02 RCTs are easy to analyze, as results are driven by the difference in mean outcomes between treatment and control
03 RCTs are a fair and transparent means of program assignment
41. Strengths of Randomization
04 Randomization does not have to affect an entire project or program
05 Well-designed RCTs can open the black box of how impact happens. Even when the causal chain of a program is too complex to unravel, RCTs can still offer insights on the conditions under which impacts occur.
42. Things that can go wrong with randomized controlled trials
1. Evaluating interventions where adoption or participation is far lower than expected
2. Underpowered evaluations: in reality, the actual power of many RCTs is only around 50%
3. Researcher capture: researchers may be more interested in carrying out a study that produces an academic publication
43. Things that can go wrong with randomized controlled trials
4. Not getting buy-in or sufficient oversight for randomization
5. Getting the standard errors wrong: studies that do not adjust standard errors for clustering may incorrectly conclude that an impact is significant
6. Self-contamination: occurs when the control group is exposed to the same intervention or another intervention that affects the same outcomes
44. Things that can go wrong with randomized controlled trials
7. Reporting biased findings: many studies focus unduly on significant coefficients, often the positive ones, discounting “perverse” (negative) and insignificant results
8. Measuring the wrong outcomes: a common reason that important outcomes are not measured is that unintended consequences, which should ideally have been captured in the theory of change, were ignored.
45. Things that can go wrong with randomized controlled trials
9. Looking at the stars
• Researchers can miss the fact that a highly statistically significant impact is actually rather small in absolute terms and too small to be of interest to policy makers.
• Where there is a clear single outcome of the intervention, cost-effectiveness is a good way of reporting impact, preferably in a table of comparisons with other interventions affecting the same outcome.
46. Types of Randomized Controlled Trial Designs
1. Level of assignment
2. Different approaches to random assignment
3. Type of treatment combinations assessed
47. Level of Assignment
Simple RCT
• The unit of assignment is the same as the unit of treatment and measurement
• An example could be a business development program for small and medium-sized enterprises in which eligible enterprises are randomly assigned to treatment and control groups.
• The outcomes could include firm-level sales, profitability, and employment
Cluster RCT
• The unit of assignment contains multiple treatment units
• In practical terms, it is more feasible to randomly assign a service with shared community infrastructure, such as electrification or water supply, at community or block level rather than at household level.
• Cluster assignment helps to contain spillover effects and contamination.
48. Different approaches 1: Oversubscription
• Applies when there is excess demand for a program or the eligible population exceeds that which can be served with available resources
• Random selection, such as a lottery, can then be used to determine which of the eligible applicants are included and which are in the control group
• Random selection into the program can be the fairest and most transparent means of deciding who gets in.
49. Different approaches 2: Altered threshold randomization
• Enables random assignment by slightly altering the eligibility threshold
• By relaxing the threshold, it is possible to identify a larger eligible population than can be treated, within which treatment is assigned randomly.
• For example, if the eligibility criterion for a nutrition program is households with children aged up to 24 months, this threshold could be raised to 30 months.
50. Different approaches 3: Pipeline or stepped-wedge designs
• Randomize the order of treatment, rather than the treatment itself
• All units of assignment will receive the program over time
• It is the time of entry to the program that is randomly assigned
• For example, if budgetary and logistical constraints prevent the immediate nationwide rollout of a program, it may be possible to randomly select units that will receive the program during the first stage
51. Different approaches 4: Encouragement designs
• Used for programs and policies that are universally available but not universally adopted
• The treatment group is provided with an encouragement to take up the intervention
• Rather than randomly assigning access to the program, researchers randomly assign an encouragement.
52. Different approaches 4: Encouragement designs (continued)
• Encouragement can be a small incentive, letter, postcard, or phone call that reminds people of their eligibility and details steps to enroll in the program.
• Effective encouragement leads to higher take-up of the program in the treatment group than in the control group.
• It is the impact of receiving an encouragement to take up the program that is evaluated (and its indirect effect on program take-up), rather than the direct impact of the program itself.
53. Different approaches 5: Stratification or prior matching
• Can be used to ensure balance with a smaller sample size
• Prior matching pairs units (e.g., communities) based on observed characteristics, then randomly assigns one unit of each pair to the treatment group and the other to control.
• For example, if two communities are particularly remote, or large, or have minority ethnic populations, prior matching ensures that one of them goes in the treatment group and the other in control.
• Stratification ensures that both treatment and control have the same proportion of units for the variables used for the stratification (e.g., low/medium/high income, rural/urban, poor/nonpoor), and can also help to increase power and facilitate subgroup analysis
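Prior matching can be sketched as sorting units on an observed characteristic, pairing neighbors, and flipping a coin within each pair. The community names, distances, and seed below are hypothetical, and matching on a single variable is a simplification; real designs often match on several characteristics at once.

```python
import random

def pair_matched_assignment(units, key, seed=None):
    """Pair units that are adjacent on an observed characteristic, then randomize within pairs.

    With an odd number of units, the last unit in sorted order is left unassigned.
    """
    rng = random.Random(seed)
    ordered = sorted(units, key=key)
    treatment, control = [], []
    for i in range(0, len(ordered) - 1, 2):
        pair = [ordered[i], ordered[i + 1]]
        rng.shuffle(pair)  # coin flip within the matched pair
        treatment.append(pair[0])
        control.append(pair[1])
    return treatment, control

# Hypothetical communities described by distance to the nearest town (km).
communities = [
    {"name": f"C{i}", "distance_km": d}
    for i, d in enumerate([5, 80, 12, 75, 40, 44])
]
treatment, control = pair_matched_assignment(
    communities, key=lambda c: c["distance_km"], seed=3
)
print([c["name"] for c in treatment], [c["name"] for c in control])
```

By construction, each remote community in one group has a similarly remote counterpart in the other.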
54. Different approaches 6: Multiple treatment arms and treated control groups
• Helps to evaluate multiple research questions that would otherwise require several trials
• Two main advantages in comparison to separate trials:
1) a reduction in administrative burden
2) improved efficiency by using shared information.
55. Different approaches 7: Factorial designs (multiple treatment arms and treated control groups)
• A special type of multiple-arm study in which one arm receives multiple interventions.
• For example, one arm gets intervention A, another gets intervention B, the third gets both A and B, and the fourth is untreated.
• Allows testing of whether different interventions are complements or substitutes
• It is often claimed that there is complementarity between different interventions (e.g., microfinance and business development, input subsidies and extension services, and improved water and hygiene education)
• Factorial designs allow that claim to be tested
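In a two-by-two factorial design, the complementarity claim can be checked with an interaction contrast: the effect of A and B together minus the sum of their separate effects. The sketch below computes it from arm means. The outcome values are invented for illustration, and the arm labels `"none"`, `"A"`, `"B"`, and `"A+B"` are naming assumptions.

```python
import statistics

def factorial_effects(outcomes_by_arm):
    """Return (effect of A alone, effect of B alone, interaction) from arm means."""
    m = {arm: statistics.fmean(values) for arm, values in outcomes_by_arm.items()}
    effect_a = m["A"] - m["none"]
    effect_b = m["B"] - m["none"]
    # Positive interaction suggests complements; negative suggests substitutes.
    interaction = m["A+B"] - m["A"] - m["B"] + m["none"]
    return effect_a, effect_b, interaction

# Hypothetical outcomes for the four arms.
outcomes = {
    "none": [10, 12],
    "A": [13, 15],
    "B": [12, 14],
    "A+B": [18, 20],
}
effect_a, effect_b, interaction = factorial_effects(outcomes)
print(effect_a, effect_b, interaction)  # 3.0 2.0 3.0
```

In a real study the interaction would need its own standard error, and detecting interactions typically requires larger samples than detecting main effects.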
56. Different approaches 8: Crossover designs (multiple treatment arms and treated control groups)
• Crossover designs are related to factorial designs, but treatments are sequential, rather than simultaneous.
• This means that the third arm gets B followed by A, and an additional arm might get A followed by B, rather than a factorial design where A and B are given together.
• This can test if intervention sequencing matters, but requires more treatment arms than a factorial design.
57. Different approaches 9: Parallel design
• Most commonly used study design
• Subjects are randomized to one or more study arms and each
study arm will be allocated a different intervention
• After randomization each participant will stay in their assigned
treatment arm for the duration of the study
• Parallel group design can be applied to many diseases and allows
running experiments simultaneously in a number of groups, and
groups can be in separate locations.
• The randomized patients in parallel groups should not
inadvertently contaminate the other group by unplanned co-
interventions or cross-overs.
58. Standardized Mean Difference (SMD)
• The mean difference (MD) is the difference between the means of the treatment group and the control group
• The SMD is the MD divided by the standard deviation (SD), derived from either or both of the groups
• Depending on how this SD is calculated, the SMD has several versions, such as Cohen's d, Glass's Δ, and Hedges’ g
• When the outcome is measured in different units across trials, we use the SMD to combine the outcomes in meta-analyses
• The SMD serves as an easy way to judge the magnitude of the effect: general rules of thumb described by Cohen suggest that an SMD of 0.2 represents a “small” effect, an SMD of 0.5 represents a “medium” effect, and an SMD of 0.8 represents a “large” effect
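The three versions differ only in the denominator and a small-sample correction. The sketch below uses the common textbook definitions: Cohen's d with the pooled SD, Glass's Δ with the control group's SD, and Hedges' g as Cohen's d times the approximate correction factor 1 - 3/(4(n1 + n2) - 9). The data are hypothetical.

```python
import math
import statistics

def standardized_mean_differences(treatment, control):
    """Cohen's d, Glass's delta, and Hedges' g for two independent groups."""
    n_t, n_c = len(treatment), len(control)
    md = statistics.fmean(treatment) - statistics.fmean(control)
    var_t, var_c = statistics.variance(treatment), statistics.variance(control)
    # Pooled SD weights each group's variance by its degrees of freedom.
    pooled_sd = math.sqrt(((n_t - 1) * var_t + (n_c - 1) * var_c) / (n_t + n_c - 2))
    cohens_d = md / pooled_sd
    glass_delta = md / math.sqrt(var_c)  # control-group SD only
    hedges_g = cohens_d * (1 - 3 / (4 * (n_t + n_c) - 9))  # small-sample correction
    return {"cohens_d": cohens_d, "glass_delta": glass_delta, "hedges_g": hedges_g}

# Hypothetical outcome data.
smd = standardized_mean_differences([90, 100, 110, 95, 105], [70, 80, 90, 75, 85])
print(smd)
```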
60. Recall: HISP
• Objective of HISP: to reduce what poor households spend on primary care and medicine and ultimately to improve health outcomes.
• Although many outcome indicators could be considered for the program evaluation, the government is particularly interested in analyzing the effects of HISP on per capita yearly out-of-pocket expenditures (subsequently referred to simply as health expenditures).
61. HISP
• HISP was rolled out as a pilot, and the 100 treatment villages were selected
randomly from among all of the rural villages in the country,
• So, the treatment villages should, on average, have the same characteristics as
the untreated rural villages in the country.
• Luckily, at the time of the baseline and follow-up surveys, the survey
firm collected data on an additional 100 rural villages that were not
offered the program.
• Those 100 villages were also randomly selected from the population of rural
villages in the country
62. HISP
• Thus the way that the two groups of villages were chosen ensures that they have
statistically identical characteristics, except that the 100 treatment villages
received HISP and the 100 comparison villages did not.
• Given randomized assignment of treatment, we are quite confident that no
external factors other than HISP would explain any differences in outcomes
between the treatment and comparison villages
63. Validating the assumption: do eligible households in the treatment and comparison villages have similar characteristics at baseline?
68. The only statistically significant differences are
• Number of years of education of the head of household
• Distance to hospital
Those differences are small: only 0.16 years, or less than 6 percent of the comparison group’s average years of education, and 2.91 kilometers, or less than 3 percent of the comparison group’s average distance to a hospital.
• Even with a randomized experiment on a
large sample, a small number of
differences can be expected because of
chance and the properties of the statistical
test.
• Using standard significance levels of 5
percent we could expect differences in
about 5 percent of characteristics to be
statistically significant, though we would
not expect the magnitude of these
differences to be large.
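The "about 5 percent" expectation can be checked by simulation. The sketch below generates many baseline characteristics that are unrelated to assignment by construction, tests each for a treatment-comparison difference at the 5 percent level, and reports the share flagged as significant. The number of characteristics, the sample size, the seed, and the use of a normal-approximation test are all assumptions for illustration.

```python
import math
import random
import statistics
from statistics import NormalDist

def share_significant(n_characteristics=200, n_units=400, seed=7):
    """Share of purely random baseline characteristics that test as 'significant' at 5%."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(0.975)  # two-sided 5% threshold
    half = n_units // 2
    significant = 0
    for _ in range(n_characteristics):
        values = [rng.gauss(0, 1) for _ in range(n_units)]
        treated, comparison = values[:half], values[half:]
        se = math.sqrt(
            statistics.variance(treated) / half + statistics.variance(comparison) / half
        )
        z = (statistics.fmean(treated) - statistics.fmean(comparison)) / se
        significant += abs(z) > critical
    return significant / n_characteristics

false_positive_share = share_significant()
print(false_positive_share)
```

The printed share should land near 0.05, which is why a handful of significant baseline differences is not by itself evidence that randomization failed.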
71. Given that we have a valid comparison group, we can find the impact of the HISP simply by taking the difference between the average out-of-pocket health expenditures of households in the treatment villages and randomly assigned comparison villages in the follow-up period.
The impact is a reduction of US$10.14 over two years.
74. • Replicating the result through a linear regression analysis yields the same result as the t-test
• In a multivariate regression analysis, we find that the program has reduced the expenditures of the enrolled households by US$10.01 over two years, which is nearly identical to the linear regression result.
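The reason a simple linear regression reproduces the difference in means is that the least-squares slope on a 0/1 treatment indicator equals the treatment mean minus the comparison mean. The sketch below verifies this on made-up data; the numbers are hypothetical and are not the HISP survey data.

```python
import statistics

def ols_slope(x, y):
    """Least-squares slope of y on x for a single regressor with an intercept."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    denominator = sum((xi - mean_x) ** 2 for xi in x)
    return numerator / denominator

# Hypothetical data: 1 = treatment village household, 0 = comparison.
treated = [1, 1, 1, 1, 0, 0, 0, 0]
spending = [8, 10, 12, 10, 18, 20, 22, 20]

slope = ols_slope(treated, spending)
difference_in_means = (
    statistics.fmean(s for t, s in zip(treated, spending) if t == 1)
    - statistics.fmean(s for t, s in zip(treated, spending) if t == 0)
)
print(slope, difference_in_means)  # both are -10.0
```

Adding baseline covariates, as in the multivariate regression, should change the estimate only slightly when assignment was random, since the covariates are balanced across groups.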
75. With randomized assignment, we can be confident that no factors
are present that are systematically different between the treatment
and comparison groups that might also explain the difference in
health expenditures.
Thus the only plausible reason that poor households in treatment
communities have lower expenditures than households in comparison
villages is that the first group received the health insurance program and the
other group did not
76. Key Messages
• Properly conducted randomized controlled trials ensure balanced characteristics between those with and without interventions, so that differences in outcomes are due only to the intervention
• Random assignment can be done in several ways, some of which only alter the sequencing, eligibility threshold, or incentives to use an intervention, rather than overall project rollout.
• Most randomized controlled trials are cluster designs, where the unit of assignment contains multiple treated units