1. Using Social Experiments to Evaluate Policies
Webinar
Prof. Marcos Vera-Hernández
UCL & IFS
m.vera@ucl.ac.uk
marcos.verahernandez@gmail.com
3. The Royal Swedish Academy of Sciences has decided to award the
Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred
Nobel 2019 to:
• Abhijit Banerjee
Massachusetts Institute of Technology, Cambridge, USA
• Esther Duflo
Massachusetts Institute of Technology, Cambridge, USA
• Michael Kremer
Harvard University, Cambridge, USA
• “for their experimental approach to alleviating global poverty”
4. Why?
Social experiments are a tool to ascertain whether social programmes/
government policies work
We can learn which policies are better and hence alleviate
poverty by choosing the most cost-effective ones
5. Programme for the webinar
• Why do we need social experiments?
• What is a social experiment?
• Are social experiments an academic curiosity?
• Steps to conduct a social experiment
• Basic analysis of a social experiment
• Deviation from the basic randomisation design
• Efficacy vs. Effectiveness studies
• Advantages and potential problems of social experiments
6. Why do we need social experiments?
Or why a naive evaluation method does not work
7. Example of a Social Programme
Conditional Cash Transfer Programme
• Started mid 1990s in Brazil and Mexico, currently in more than 20 countries
• Mothers receive a monthly cash transfer:
• if their children attend school, and
• if they are up to date with preventive health care
8. Example of a Social Programme
Conditional Cash Transfer Programme
• Programme is implemented at municipality level
• Mothers receive a monthly cash transfer:
• if their children attend school, and
• if they are up to date with preventive health care
• Remember: CCT = Conditional Cash Transfer
9. Naive evaluation of a CCT
• A government decides to implement a CCT programme
• It will implement the CCT in some towns and not others, so as to be
able to compare the outcomes of towns with and without the
programme
• What towns will the Minister of Social Protection choose to
implement the CCT programme?
10. Naive evaluation of a CCT
• Scenario A: Power-hungry minister
The Minister’s dream is to become President in the next election
The Minister wants the CCT programme to look successful, even if it
is not
The Minister implements the CCT programme in the towns with the
best mayors
11. Naive Evaluation of a CCT
The sample of towns (each mayor represents his/her town)
Bad Mayors = non-CCT towns
Good Mayors = CCT towns
In the power-hungry minister scenario, the majority of orange (best mayors) would have
been CCT towns
12. Naive evaluation of a CCT
• The government implements the CCT for two years
• At the end of the two years, it compares school attendance rates of towns with and
without the CCT
• It finds that school attendance rates are higher in the towns with the CCT
13. Naive evaluation of a CCT
• The Minister claims that the CCT programme was a success because it increased school
attendance rates
• However, others might be sceptical:
• CCT Towns –> Best Mayors –> They had recruited the best headteachers
• CCT Towns –> Best Mayors –> They had taken advantage of other government
programmes (e.g. free school lunches)
14. • Comparing CCT vs. non-CCT towns
overestimates the CCT programme
effect.
• Because the comparison includes
the “best mayor effects.”
• Even if the programme had been
completely unsuccessful, the
comparison would have made it
look successful.
Unfortunately, we will never know because nobody knows for certain how large the "best mayor
effect" is. Moreover, the Minister might never acknowledge that the programme was implemented
in the towns with the best mayors
[Bar chart: Schooling Attendance Rates (0–80%), CCT Towns vs. Non-CCT Towns, with bars decomposed into Baseline level, Best Mayors effect, and Effect of CCT Program]
15. Naive evaluation of a CCT
• Scenario B: A Benevolent Minister
The Minister is really worried about the welfare of the poorest
The Minister instructs her civil servants to implement the CCT in the
poorest municipalities
There is data on municipality poverty, but it is 3 years old. Civil
servants can use their subjective knowledge to establish which
are the poorest municipalities
16. Naive evaluation of a CCT
• The government implements the CCT for two years
• At the end of the two years, it compares school attendance rates of towns with and
without the CCT
• It finds that school attendance rates are lower in the towns with the CCT
17. Naive evaluation of a CCT
• The Leader of the Opposition claims that the Benevolent Minister should be sacked
because the CCT programme did not deliver higher school attendance rates
• But we know that:
• Benevolent Minister -> CCT Towns –> Poorest Towns-> Low school attendance rates
18. • Comparing CCT vs. non-CCT towns
underestimates the CCT programme
effect.
• Because CCT towns are poorer (have
less wealth), their attendance in the
absence of the programme would
have been lower than in non-CCT towns
• If the programme had not been
implemented, the attendance rate
would have been even lower in CCT
towns
Unfortunately, we will never know because nobody knows the effect of poverty/wealth on attendance
rate (nobody knows how big the orange parts are). The Benevolent Minister is frustrated because she
knows that her CCT programme could have been successful, but she cannot show it.
[Bar chart: Schooling Attendance Rates (0–90%), CCT Towns vs. Non-CCT Towns, with bars decomposed into Baseline level, Effect of Wealth, and Effect of CCT Program]
19. Naive evaluation
• We have seen that when politicians / civil servants choose who receives a programme, a
naive comparison of treated/untreated towns/individuals will give biased results.
• The same problem occurs if the programme is on offer and people decide whether to take
it up or not.
• We need a tool to evaluate social programmes and prevent this bias from happening.
• Experiments are an excellent tool to do this.
21. • Take a sample of individuals (towns, or firms).
• Flipping a coin (or using a computer program),
allocate them to treatment and control
• Treatment -> receives the intervention
• Control -> does not receive the intervention
• Instead of having a power-hungry or a benevolent
minister choosing which towns receive the
CCT, we will use a coin.
• Social experiment = Randomised Controlled Trial of a
social programme
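The coin flip above can be done with a computer's random number generator. A minimal sketch, with hypothetical town names (a fixed seed keeps the allocation reproducible):

```python
import random

# Sketch of the allocation step (hypothetical town names).
random.seed(42)  # fix the seed so the allocation is reproducible

towns = [f"town_{i:02d}" for i in range(20)]
shuffled = random.sample(towns, k=len(towns))  # a random permutation

treatment = shuffled[:10]  # these towns receive the CCT
control = shuffled[10:]    # these towns serve as controls

print(len(treatment), len(control))
```

Every town ends up in exactly one arm, and the allocation depends on nothing but chance.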
22. Social experiment
• Why does replacing the minister with the coin make a difference?
• If you divide the towns with a coin, you will get a very similar number of
"good mayors" or "poor towns" in the heads (CCT towns) as in the
tails (non-CCT towns).
23. Social experiment
The sample of towns (each mayor represents his/her town)
Tails = non-CCT towns
Heads = CCT towns
By contrast, in the power-hungry minister scenario, the majority of orange (best mayors) would have
been CCT towns
24. Power-hungry minister vs. Social Experiment
Thanks to the randomization of the social experiment, the share of "best mayors" is the
same in the CCT and non-CCT towns. Hence, when we compare attendance rates of
treatment and control towns, the only difference is the green part, the true effect of the CCT
programme. That is what we want!
[Two bar charts of Schooling Attendance Rates (0–80%), CCT Towns vs. Non-CCT Towns, each decomposed into Baseline level, Best Mayors effect, and Effect of CCT Program: one for the power-hungry minister scenario and one for the social experiment]
25. Social experiment
• Not only one factor (how good the mayor is), but many others (such as school quality,
household wealth, distance to school, etc.) will be important for the outcome of
interest: school attendance.
• Thanks to the randomization, all factors/variables that are important for the outcome
of interest will be equally distributed between treatment and control. This is because the
only difference between the two groups is the flip of a coin.
• Of course, the other difference is that the treatment group receives the intervention, and
the control group does not.
• Hence, by comparing treatment and control, we can estimate the difference that the
programme makes (i.e. the effect of the programme). So, the "naive" comparison
works if the allocation to the programme is random (and will not work if it is not).
27. § The Job Training Partnership Act (JTPA), US
§ Randomized trials of training programs for the disadvantaged
§ Very credible evaluation found no (or possibly negative) effects for youths; youth training
budget subsequently cut substantially
§ The Self-Sufficiency Project (SSP), Canada
§ Large randomized trials of wage subsidies for income-assistance recipients
§ Very credible evidence that a sufficient subsidy can induce some income-assistance
recipients to find and hold a fulltime job
§ Influential in welfare-reform in several countries
§ The Employment, Retention and Advancement (ERA) demonstration
§ Support and incentives for those in work to encourage them to retain jobs and advance
§ First large-scale RCT in UK in social sciences
Examples of experiments to evaluate interventions
28. § Oregon Health Insurance Experiment, US
§ Experiment conducted in the State of Oregon
§ Health Insurance given to poor individuals
§ By means of a lottery, not enough money for everyone eligible
§ Conditional Cash Transfers in many developing countries
§ PROGRESA (Mexico), Red de Proteccion Social (Nicaragua)
§ More generally, many evaluations in developing countries are done through
randomised experiments, as the 2019 Nobel Prize in Economics attests.
Examples of experiments to evaluate interventions
30. 1. Design the intervention that is going to be tested, and the reference
population
2. Design the experiment (individual/cluster, required sample, logistics)
3. Ethical approval
• Also a good idea to register the experiment in an Experimental Registry, e.g.
• https://www.socialscienceregistry.org/
4. Communicate/agree with stakeholders the process of randomisation (private
or public as in PROGRESA)
5. Collect a baseline sample (not always necessary)
• This is the background information about individuals/towns, etc.
• Typically interviewers visit individuals/households, but other methods are possible (telephone
calls, internet surveys, administrative databases)
• It will probably require individuals to consent to participate in the study
Steps to conduct an experiment
31. 6. Conduct the randomisation
7. Implement intervention in the Treatment group
8. Collect data post/during intervention:
• Outcomes post/during intervention (most important)!!!
• Other info that helps you to explain how the impact took place (intermediate outcomes)
or that helps you explain the lack of impact (be prepared for all possible outcomes)
• Background information (if baseline was not collected)
9. Undertake analysis
10. Present the results to policy makers and stakeholders
11. Register the results in the Experimental Registry
12. Write report, publications, etc.
Steps to conduct an experiment
33. Checking that the randomization worked well
We said that:
“Thanks to the randomization, all factors/variables that are important for the
outcome of interest will be equally distributed in treatment and control. This is
because the only difference between the two groups is the flip of a coin”
We can check this in the data: there should be no statistically significant differences
in background characteristics between Treatment and Control
34. Checking that the randomisation worked well
• We can only do the test between Treatment and Control with variables that
we have collected or that are available in the databases that we are using
for the study
• But the idea is that there will be no differences either in the variables
that are not available. The randomisation should have "equalised" the
characteristics of treatment and control, independently of whether we
observe them or not.
• Even if the randomization has worked well, we expect to see one statistically
significant difference for every 20 variables tested (at the 5% level)
35. Checking that the randomization worked well
• We can only use variables not affected by the intervention (age,
parents’ education, etc)
• The best option is to use the outcome variables themselves, collected at baseline.
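A minimal sketch of such a balance check, using made-up baseline data (say, mothers' years of schooling) and a two-sample t-statistic computed by hand:

```python
import statistics
from math import sqrt

# Hypothetical baseline variable (e.g. mothers' years of schooling);
# all numbers are made up for illustration.
treat = [6.0, 7.5, 5.0, 8.0, 6.5, 7.0, 5.5, 6.0]
control = [6.5, 7.0, 5.5, 7.5, 6.0, 6.5, 5.0, 6.5]

def t_stat(a, b):
    """Welch t-statistic for the difference in means of a and b."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / sqrt(va / len(a) + vb / len(b))

t = t_stat(treat, control)
# As a rough rule, |t| > 1.96 flags a statistically significant
# imbalance at the 5% level; repeat for each baseline variable.
print(round(t, 3))
```

In practice you would run this test for every available background variable, remembering that roughly one in twenty will be significant at the 5% level by chance.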
36. Practical computation of the impact of the intervention
y_i = outcome variable of observation i, e.g. school attendance of child i
ȳ_T = average of the outcome variable in the treatment group
ȳ_C = average of the outcome variable in the control group
Estimate of the Impact = ȳ_T − ȳ_C
Graphically…
37. Social experiment data: with (unfeasible) disaggregation vs. as collected
The data collected will not tell us what share of the attendance rate is due to “the best mayors effect.” But it will tell us
the difference between treatment and control, which corresponds with the programme effect!
[Two bar charts of Schooling Attendance Rates (0–80%), CCT Towns vs. Non-CCT Towns: left, the (unfeasible) decomposition into Baseline level, Best Mayors effect, and Effect of CCT Program; right, the attendance rates as actually collected]
ȳ_T = average of outcome variable in the treatment group = 60%
ȳ_C = average of outcome variable in the control group = 50%
Estimate of the Impact = ȳ_T − ȳ_C = 10 percentage points
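The computation above takes only a few lines of code. A sketch with made-up attendance data chosen to match the slide's 60% and 50% averages:

```python
import statistics

# Made-up attendance data matching the slide: 1 = attends school.
y_treat = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]    # treatment towns, mean 60%
y_control = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # control towns, mean 50%

y_bar_T = statistics.mean(y_treat)    # average outcome, treatment
y_bar_C = statistics.mean(y_control)  # average outcome, control
impact = y_bar_T - y_bar_C            # estimated impact: ~10 points

print(y_bar_T, y_bar_C, round(impact, 2))
```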
38. Practical computation of the impact of the intervention
• But in practice, we obtain exactly the same numerical value if we estimate the simple univariate
regression:
y_i = α + β·T_i + ε_i,
where T_i = 1 if observation i is in the Treatment Group, and
T_i = 0 if observation i is in the Control Group
Estimate of β = Estimate of the Impact = ȳ_T − ȳ_C
The advantage of using the regression is that you will also obtain the Standard Error
and p-values for statistical significance
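A pure-Python sketch of this equivalence, on made-up data (in practice a statistics package such as Stata, R, or Python's statsmodels would report the standard error directly). With a 0/1 treatment dummy, the OLS slope equals the difference in group means, and the fitted values are just the group means:

```python
import statistics
from math import sqrt

# Made-up data; with a binary regressor, OLS reduces to group means.
y_treat = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]    # treatment group outcomes
y_control = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # control group outcomes

n_T, n_C = len(y_treat), len(y_control)
y_bar_T, y_bar_C = statistics.mean(y_treat), statistics.mean(y_control)
beta = y_bar_T - y_bar_C  # OLS slope on the treatment dummy

# Residual sum of squares: the fitted values are the group means
rss = sum((y - y_bar_T) ** 2 for y in y_treat) \
    + sum((y - y_bar_C) ** 2 for y in y_control)
s2 = rss / (n_T + n_C - 2)                # residual variance
se_beta = sqrt(s2 * (1 / n_T + 1 / n_C))  # standard error of beta

t = beta / se_beta  # compare |t| with 1.96 for the 5% level
print(round(beta, 2), round(se_beta, 3), round(t, 2))
```

With such a tiny sample, the 10-point estimate is far from statistically significant (|t| ≈ 0.43), which previews why sample-size and power calculations matter.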
39. Practical computation of the impact of the intervention
It is standard to include some other covariates, X_i, in the regression, i.e. to use a multivariate
regression:
y_i = α + β·T_i + γ·X_i + ε_i
The advantage is that you usually gain precision in the estimate of β (the confidence interval
for β gets narrower).
To gain precision, it is especially helpful to include the outcome variable measured at
baseline as a covariate.
It is very important that the X_i variables cannot be affected by the intervention.
Beware: a multivariate regression is potentially very different when T_i is randomized than
when it is not. If it is not randomized, it potentially has the same problems as the "naive
approach."
40. Practical computation of the impact of the intervention
• It is not enough that the estimate of β is positive (or negative). It must also be statistically
different from zero at an acceptable confidence level (usually 95% but sometimes 90%).
• This will pop up directly in the output of the statistical/econometric software.
• However, the sample will need to be large enough for the experiment to have enough power.
We will not be looking at these methods here. If interested, I discuss methods to choose the
sample size in this paper: https://www.ifs.org.uk/publications/7844
• Cluster-level treatment means that if anyone from a group is allocated/offered the
treatment, everyone else in the same group is too
• For instance, there are treatment and control towns: anyone eligible in a
treatment town gets the treatment, and no one eligible in a control town
gets the intervention.
• Other examples:
• Districts or provinces
• Schools
• Hospitals or Primary Health Care centres
Individual vs. Cluster level treatment
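A sketch of cluster-level allocation with hypothetical towns and children: towns are randomised, and every eligible child simply inherits the assignment of their town.

```python
import random

# Hypothetical children nested in towns; assignment happens at the
# town (cluster) level and children inherit their town's arm.
random.seed(7)

children = [
    {"child": "c1", "town": "A"}, {"child": "c2", "town": "A"},
    {"child": "c3", "town": "B"}, {"child": "c4", "town": "B"},
    {"child": "c5", "town": "C"}, {"child": "c6", "town": "C"},
    {"child": "c7", "town": "D"}, {"child": "c8", "town": "D"},
]

towns = sorted({c["town"] for c in children})
treated_towns = set(random.sample(towns, k=len(towns) // 2))

for c in children:
    c["treated"] = c["town"] in treated_towns  # whole cluster together

print(sorted(treated_towns))
```

By construction, two children from the same town can never end up in different arms, which is exactly what distinguishes cluster from individual randomisation.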
• Cluster-based allocation requires a larger sample than individual-based
allocation
• Also, we need not only a larger sample, but also a sufficient
number of "groups" (villages, schools, etc.), which means higher
transportation costs, etc.
• A larger sample usually means higher costs, but sometimes individual
randomization is not feasible or desirable.
Individual vs. Cluster level treatment
44. Oversubscription
• Some interventions are very popular
• Demand exceeds available slots
• A lottery can be run among the people who applied for the
intervention
• It is also a good way to justify the need to randomise
• A disadvantage is that those who are randomised out will probably
look for similar programmes and try to register for them
• Very important to keep this in mind when designing the questionnaires
45. Staggered implementation
• Politicians are sometimes against randomization
• However, there are many occasions where the available funds are
not enough to implement the programme widely
• So, the programme can first be implemented in one set of towns, with
the remaining towns added later
• The important thing is that which towns go first and which go
second must be decided at random
• You are randomizing the order of implementation
• Because all towns will be treated in the end, it will be difficult to
evaluate the long-term impact of the programme
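A sketch of randomising the order of implementation, with hypothetical towns:

```python
import random

# Hypothetical towns; only the order of roll-out is randomised.
random.seed(3)

towns = [f"town_{i}" for i in range(8)]
order = random.sample(towns, k=len(towns))  # random ordering

phase1 = order[:4]  # treated in year 1 (the "treatment" group)
phase2 = order[4:]  # treated in year 2 ("control" during year 1)

# During year 1, compare phase1 (already treated) with phase2 (not
# yet treated); once phase2 is treated, no untreated comparison
# group remains, so long-run impacts are hard to evaluate.
print(phase1, phase2)
```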
46. Within-group randomization
• Example: Evaluate a new computer system for hospitals
• All hospitals want to participate… difficult to leave one out!
• An alternative:
• In some hospitals (randomly chosen) implement it in Cardiology
• In another group of hospitals (also randomly chosen), implement it in Internal
Medicine
• All hospital managers are happy because they all got the programme
• Compare the performance of the Cardiology department in hospitals that
received the intervention in the cardiology dept. vs. hospitals that received the
intervention in Internal Medicine.
• Caveat: possible contamination between departments
47. Encouragement design
• Useful when we cannot exclude anyone from the programme (for instance, free
childcare available to all)
• We take a sample and randomly divide it into "T" and "C". We then decrease the
participation costs for those chosen as "T":
• Incentive for participating
• A letter providing them with info to participate
• To estimate the impact, you have to use Instrumental Variables (see the course
materials).
y_i = α + β·P_i + ε_i,
where P_i = 1 if observation i participated, and
P_i = 0 if observation i did not participate
We would use the encouragement variable, which is random, as the instrument.
This method only gives us the impact of the intervention for those who participated thanks to receiving the
encouragement.
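With a binary encouragement and binary participation (and no covariates), the instrumental-variables estimate reduces to the Wald estimator: the effect of the encouragement on the outcome divided by its effect on participation. A sketch with made-up numbers:

```python
import statistics

# Made-up data: the first group was randomly encouraged (Z = 1),
# the second was not (Z = 0). P records actual participation.
z1_P = [1, 1, 1, 0, 1, 1, 0, 1]  # participation when encouraged (75%)
z1_y = [9, 8, 9, 5, 8, 9, 6, 8]  # outcomes when encouraged
z0_P = [0, 1, 0, 0, 0, 1, 0, 0]  # participation when not encouraged (25%)
z0_y = [5, 8, 6, 5, 6, 9, 5, 6]  # outcomes when not encouraged

itt_y = statistics.mean(z1_y) - statistics.mean(z0_y)        # effect of Z on y
first_stage = statistics.mean(z1_P) - statistics.mean(z0_P)  # effect of Z on P

late = itt_y / first_stage  # impact for those induced to participate
print(late)  # prints 3.0
```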
49. Efficacy study
• The intervention is implemented at its best
• Very well trained and capable staff
• Ideal population setting
• Under very good supervision
• Usually not implemented by the government, but by an academic or NGO
• Sometimes, it includes several variations of the intervention to see which one works
better
• The question of interest is: if the intervention is implemented very well,
will it work?
• Usually, it is researcher-initiated and run at a small scale
• But this might not represent how the program will be implemented in real life when it
is run at scale by the government (effectiveness)
• It is a proof of concept
50. Effectiveness study
• The intervention is implemented under real life
conditions:
• By the staff that will implement it in real life
• With real life training
• Under “normal” supervision
• For the population/areas that will benefit from the
intervention
• Usually implemented by the government
52. Advantages
• If the experiment works well, the estimates are valid without the need for
further assumptions (not the case for the other methods that we will see)
• Hence, the results are difficult to criticise and have a good chance of being
influential
• The method is easy to understand, which increases the chances of its being used for policy
• The case of PROGRESA in Mexico
53. Potential problems
• It is useful to classify the problems that experiments can have in:
• Problems with the internal validity of the experiment
• The result of the experiment is not correct
• Problems with the external validity of the experiment
• The result of the experiment is correct, but might not be very informative on what
the result would be in other contexts/populations, etc.
54. § Non-compliance can arise from subject behaviour.
§ Substitution bias: Individuals in the control group obtain the treatment (or a
close substitute) outside the scope of the study
§ Dropouts: Some individuals in the treatment group do not undertake or
complete the treatment
§ Non-compliance can come from programme administrators
§ Staff may be unfamiliar with or poorly trained in evaluation methods, or may
have ethical concerns about denying treatment to the control group.
§ Sample Attrition
§ We cannot collect the outcomes of certain individuals because we lose them (they change
address,…)
Threats to internal validity
55. • The individuals might know that they are part of an evaluation
• This might change the behaviour of the treatment group (they might exert more
effort). This is called Hawthorne effects
• Or the control group might feel discouraged that they were not chosen, also
called John Henry effects
Threats to internal validity
56. Threats to the external validity
§ Over-generalization
§ The impact of the treatment on the group studied may not be informative about
the impact on other groups.
§ If the experiment was conducted in one region, the results might not apply to other
regions
§ If it was an efficacy study, the results of an effectiveness study might be quite
different
§ General equilibrium effects and displacement effects.
§ Spillovers might occur with a full programme that do not occur with a small trial
§ For instance, a small trial in which cash is given to a few community members will not increase
prices, but prices might increase when the programme is rolled out and everyone receives
cash.
57. Threats to the external validity
§ Randomization bias:
• Individuals who are willing to participate in a trial (and possibly be randomly
assigned to a control group) might not be representative of those individuals
who would apply to participate in the subsequent programme (with certain
treatment)
• Think of the ads in the metro asking for volunteers for medical trials. Are those
who accept representative of the average individual in the population?
§ Choice of outcome variable
• Failure to measure a correct or comprehensive outcome
• For example, short effects may not be indicative of long run effects
58. • Experiments are considered the gold standard for the evaluation of
interventions
• However, it is important to realize that they are not infallible and can have
problems
• When experiments are not feasible or desirable, there are other methods
to evaluate the impact of a policy (e.g. difference-in-differences, regression
discontinuity), which are also part of this course.
59. Many thanks!
Feel free to contact me at:
m.vera@ucl.ac.uk
marcos.verahernandez@gmail.com