Alexis Diamond - quasi experiments

Presentation at "Impact Evaluation for Financial Inclusion" (January 2013)

CGAP and the UK Department for International Development (DFID) convened over 70 funders, practitioners, and researchers for a workshop on impact evaluation for financial inclusion in January 2013. Hosted by DFID in London, the workshop was an opportunity for participants to engage with leading researchers on the latest impact evaluation methods and to discuss other items on the impact evaluation agenda.


  1. An Introduction to Impact Evaluation in Observational (Non-Experimental) Settings. Alexis Diamond, Development Impact Department.
  2. Goals for this Presentation
     • To explain key differences between randomized experiments (RCTs) and observational studies
     • To briefly sketch some of the most important methods of causal inference in observational studies, showing how they might be applied to answer questions in access-to-finance projects, and offering practical guidance on:
       o Matching (an estimator, or a tool for designing observational studies?)
       o Differences-in-differences
       o Encouragement design (instrumental variable, or "IV", regression)
       o Regression discontinuity design
       o Synthetic control methods
  3. Basic concepts
     • Observational study: a comparison of treated and control groups in which the objective is to estimate cause-and-effect relationships, without the benefit of random assignment. Observational studies are also known as quasi-experiments or natural experiments.
     • In a randomized experiment, random chance forms the comparison groups (treatment and control), making the groups comparable in terms of both measurable characteristics and characteristics that cannot be measured.
     • Generally, if assumptions are met, causal conclusions follow—but generally only in randomized experiments do we KNOW the assumptions are met; otherwise, the assumptions aren't testable.
  4. Experiments (RCTs) vs. observational studies
     "That's not an experiment you have there, that's an experience." — Sir R. A. Fisher (England, 1890–1962)
     Why do the two differ? Because of selection bias (the presence of confounders), as reflected in the controversy over Pitt/Khandker and Roodman/Morduch.
  5. Selection bias: "perfect implementation"
     A microfinance project is reporting the ex-post impact indicator, $/day, for participants and non-participants:

     i   Yi (observed)   Treatment status
     1   5               Treatment
     2   6               Treatment
     3   4               Treatment
     4   4               Control
     5   2               Control
     6   6               Control

  6. Selection bias: "perfect implementation"
     (Same table as slide 5.) Average for the treatment group: $5/day.
  7. Selection bias: "perfect implementation"
     (Same table as slide 5.) Average for the treatment group: $5/day. Average for the control group: $4/day.
  8. Selection bias: "perfect implementation"
     (Same table as slide 5.) Average for the treatment group: $5/day. Average for the control group: $4/day. Difference = +$1/day.
  9. Selection bias: "perfect implementation"
     How should one think about that result, +$1/day? Does it mean the project has positive impact?
  10. Selection bias: "perfect implementation"
      How should one think about that result, +$1/day? Does it mean the project has positive impact? Impact on whom?

      i   Yi   Yi(1)   Yi(0)   Treatment status   Yi(1) − Yi(0)
      1   5    5       ?       Treatment          ?
      2   6    6       ?       Treatment          ?
      3   4    4       ?       Treatment          ?
      4   4    ?       4       Control            ?
      5   2    ?       2       Control            ?
      6   6    ?       6       Control            ?
  11. Selection bias: "perfect implementation"
      Avg treatment effect for the treated (ATT) = +3
      Avg treatment effect for the controls (ATC) = −1
      Avg treatment effect (ATE) = (ATT + ATC) / 2 = +1 (with equal group sizes, the ATE is the average of the unit-level effects)

      i   Yi   Yi(1)   Yi(0)   Treatment status   Yi(1) − Yi(0)
      1   5    5       2       Treatment          +3
      2   6    6       3       Treatment          +3
      3   4    4       1       Treatment          +3
      4   4    3       4       Control            −1
      5   2    1       2       Control            −1
      6   6    5       6       Control            −1

      That simple $1/day difference we identified earlier = ATT + BIAS, where BIAS = mean Y(0) among treated − mean Y(0) among controls = 2 − 4 = −2, so +3 + (−2) = +1.
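To make this arithmetic concrete, here is a minimal Python sketch (not from the presentation; the numbers are taken from the table above) computing ATT, ATC, ATE, the naive difference, and the selection bias:

```python
# Minimal sketch of the potential-outcomes arithmetic on slide 11.
import numpy as np

y1 = np.array([5, 6, 4, 3, 1, 5])   # Yi(1): outcome if treated
y0 = np.array([2, 3, 1, 4, 2, 6])   # Yi(0): outcome if untreated
treated = np.array([True, True, True, False, False, False])

effects = y1 - y0
att = effects[treated].mean()    # +3.0
atc = effects[~treated].mean()   # -1.0
ate = effects.mean()             # +1.0 (average over all six units)

# We only ever observe Yi(1) for treated units and Yi(0) for controls:
y_obs = np.where(treated, y1, y0)
naive = y_obs[treated].mean() - y_obs[~treated].mean()   # +1.0

# Selection bias: how the groups would differ even without treatment.
bias = y0[treated].mean() - y0[~treated].mean()          # -2.0
assert np.isclose(naive, att + bias)   # naive difference = ATT + BIAS
```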
  12. Selection bias: Ignore it at your peril
      Identifying impacts requires identifying Y(1) and Y(0) for the same units. BIAS can be positive/negative, big/small, observed/hidden…
      (Same table as slide 10, with the counterfactual cells shown as "?" because they are unobserved.)
  13. Observational studies: Are they credible? Yes, but…
      "A judgment-free method for dealing with problems of sample selection bias is the Holy Grail of the evaluation literature, but this search reflects more the aspirations of researchers than any plausible reality…" — Rajeev Dehejia, "Practical Propensity Score Matching"
      • Some have tried to set up tests for observational methods: e.g., "Can method X (matching, regression, IV, etc.) recover the true (experimental) benchmark?"
      • Such efforts have generally failed to conclusively validate observational studies.
  14. Observational studies: Are they credible? Yes, but…
      "History abounds with examples where causality has ultimately found general acceptance without any experimental evidence… The evidence of a causal effect of smoking on lung cancer is now generally accepted, without any direct experimental evidence to support it… At the same time, the long road toward general acceptance of the causal interpretation… shows the difficulties in gaining acceptance for causal claims without randomization." — Guido Imbens, "Better LATE than Nothing"
  15. Why bother with observational studies?
     • Studies that start as perfect RCTs often end as broken RCTs, not "gold-standard" RCTs. These broken RCTs may be better than many observational studies, but there is no bright line distinguishing broken RCTs from observational studies.
     • Standard RCTs cannot address many important policy issues (e.g., macroeconomic questions, or cases with general-equilibrium effects more broadly).
     • Other issues are difficult to address with RCTs, setting up a trade-off between rigor and relevance. What's better—the RCT in a lab setting, or the equivalent observational study?
     • RCTs are often more expensive, time-consuming, and fragile than the alternatives—they can be high-risk and are not always strategic.
  16. More advantages of observational studies
     • Sometimes you can use pre-existing data, which has time and cost advantages (though there are clear trade-offs):
       o Typical out-of-pocket time/cost of a World Bank RCT: > 1 year and $500K
       o Occasionally RCTs can be done cheaply and easily, especially in a place like India (there are examples costing < $50,000)
       o With administrative data, observational studies may have no (or trivial) out-of-pocket costs and can be completed in days or weeks
     • Sometimes you want to apply observational methods to experimental data
     • Good for hypothesis generation
     • Avoids an RCT's ethical considerations
  17. Methodology #1: Matching
      You: My clients enjoy big impacts from our bank's financing.
      Critic: Compared to whom? Where's the control group?
      You: OK, I'll go find one—and then you'll see!
  18. Methodology #1: Matching
      You: My clients enjoy big impacts from our bank's financing.
      Critic: Compared to whom? Where's the control group?
      You: OK, I'll go find one—and then you'll see!
      E.g.: Boonperm/Haughton, "Thailand Village Fund" (2009)
      [Figure: scatterplot of one treated unit and Controls 1–3 in covariate space, with X1 = education on the horizontal axis and X2 = age on the vertical axis.]
  19. Methodology #1: Matching
      Same example, same scatterplot, but rescaled with X1 (education) multiplied by 2. Rescaling changes which control is nearest to the treated unit.
      [Figure: the slide-18 scatterplot shown before and after rescaling X1.]
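The scale-dependence on slides 18–19 is easy to demonstrate. Below is a minimal 1-nearest-neighbor matching sketch; the covariate values are invented for illustration, and standardization is shown as one common (not the only) remedy:

```python
# Minimal 1-nearest-neighbor matching sketch. Covariate values are
# invented; they only illustrate that the match depends on scaling.
import numpy as np

def nearest_control(x_treated, X_control):
    """Index of the control unit closest in Euclidean distance."""
    return int(np.argmin(np.linalg.norm(X_control - x_treated, axis=1)))

# Columns: X1 = years of education, X2 = age.
x_t = np.array([12.0, 40.0])
X_c = np.array([[12.0, 30.0],    # control 0: same education, younger
                [18.0, 39.0]])   # control 1: more education, same age

print(nearest_control(x_t, X_c))            # -> 1

# Rescale X1 (education) by 2, as on slide 19: the chosen match flips.
s = np.array([2.0, 1.0])
print(nearest_control(x_t * s, X_c * s))    # -> 0

# A common remedy is to standardize covariates (or use Mahalanobis
# distance, or match on a propensity score) so the distance metric is
# not an accident of measurement units.
mu, sd = X_c.mean(axis=0), X_c.std(axis=0)
print(nearest_control((x_t - mu) / sd, (X_c - mu) / sd))
```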
  20. Matching: Points to consider
     • Matching is (unfortunately) as much art as science, and there are more methodological varieties of matching than there are flavors of ice cream.
     • Widespread agreement that matching is, at a minimum, a useful pre-processing step to reduce model dependence. Unfortunately, no consensus on balance tests/diagnostics.
     • A hugely important benefit of matching is that it is performed "blind to the answer"—comparing favorably with regression.
     • Matching helps with selection bias due to observed variables (confounders)—it does not help with unobserved confounders. For the latter, one can (and should) do sensitivity analysis.
  21. Methodology #2: Differences-in-Differences (D-i-D)
      You: My clients enjoy big impacts from our bank's financing.
      Critic: Compared to whom? Where's the control group?
      You: OK, I'll go find one—and then you'll see!
      Critic: Too many unobservables. It's a waste of time.
      You: Well, can you assume my control group's growth rate (e.g., near zero) is a good proxy for the treatment group's counterfactual growth rate (without the loan)?
      D-i-D: subtract one before/after difference from the other. It addresses observed confounders (under regression assumptions) and unobserved time-invariant confounders common to the treatment and control groups. See Kondo's work in the Philippines (ADB).
  22. Diffs-in-Diffs: Points to consider
      [Figure: income over time for the treated and control groups, before and after treatment. The treated counterfactual is imputed by adding the pre-treatment difference to the control group's "after" level; the gap between the observed treated outcome and this counterfactual is the estimated ATET. NOTE: Circles are observed; the square (counterfactual) is unobserved (imputed).]
  23. Diffs-in-Diffs: Points to consider
     • If matching is implausible, why would D-i-D be plausible? Does the parallel-trend assumption seem easier to believe?
     • The parallel-trend assumption must hold over the whole time period, implying that the composition of the two groups should remain constant over time.
     • D-i-D benefits from "placebo tests" run on pre-treatment periods.
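As a quick illustration (with invented group means, not data from the talk), the D-i-D computation is just:

```python
# Minimal difference-in-differences sketch with invented group means.
# Under the parallel-trend assumption, the control group's before/after
# change stands in for the treated group's counterfactual change.
treated_before, treated_after = 4.0, 7.0   # e.g., mean $/day (invented)
control_before, control_after = 3.0, 4.0

counterfactual_after = treated_before + (control_after - control_before)
atet = treated_after - counterfactual_after          # 7 - 5 = 2.0

# Equivalent "difference of differences" form:
assert atet == (treated_after - treated_before) - (control_after - control_before)
```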
  24. Methodology #3: Encouragement Design
      You: Well, can you assume my control group's growth rate (e.g., near zero) is a good proxy for the treatment group's counterfactual growth rate (without the loan)?
      Critic: No, also not credible.
      You: OK, how about a natural experiment? Our FI established additional info kiosks in 100 villages to encourage loan take-up—these villages were not chosen at random, but it was "practically" random.
      The encouragement (the "instrument", assumed "as good as random") has an effect (for some) on the probability of finance. The method leverages this "exogenous" variation to overcome potential bias from both observed and unobserved confounders.
  25. Encouragement Design: Points to consider
     • Encouragement design requires strong assumptions:
       o The encouragement must really be random or almost random, and must have no direct effect on impacts (only an indirect effect via treatment)
       o The encouragement must NEVER discourage take-up (no "defiers")
       o Causal estimates are restricted to "compliers" only… (Who are they?)
       o Also, for credible results, the encouragement had better be effective
     • Strange quirk: different answers, from different models, can all be "correct", because the complier populations may differ
     • The design was popular, but is now more disparaged in observational work
     • Again, sensitivity tests are available and should be run
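A minimal simulated sketch of the Wald/IV estimator behind an encouragement design (all numbers invented; "encouraged" plays the role of the instrument, and the simulation builds in the no-defiers assumption):

```python
# Minimal Wald/IV sketch for an encouragement design, on simulated data.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
encouraged = rng.random(n) < 0.5      # info kiosk in the village (instrument)

u = rng.random(n)                     # latent "type" of each unit
always_taker = u < 0.2                # takes the loan regardless
complier = (u >= 0.2) & (u < 0.6)     # takes the loan only if encouraged
loan = always_taker | (complier & encouraged)   # no defiers by construction

true_effect = 2.0                     # effect of the loan (homogeneous here)
y = 1.0 + true_effect * loan + rng.normal(0.0, 1.0, n)

# Wald estimator: intent-to-treat effect on Y, scaled by the effect on take-up.
itt_y = y[encouraged].mean() - y[~encouraged].mean()
itt_d = loan[encouraged].mean() - loan[~encouraged].mean()
late = itt_y / itt_d
print(late)   # close to 2.0 -- and it is an effect for COMPLIERS only
```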
  26. Methodology #4: Regression Discontinuity Design
      You: OK, how about a natural experiment? Our FI established additional info kiosks in 100 villages to encourage loan take-up—these villages were not chosen at random, but it was "practically" random.
      Critic: I don't buy it. Rollout was in fact strategic, not random.
      You: OK, I'll try again. This bank always provides extra lines of credit at great terms to customers with credit scores above a certain threshold. Let's compare results for customers just above and just below the threshold.
      Treatment is assumed as good as random at the threshold if the discontinuity is sharp. RDD addresses observed and unobserved confounders. What question will the RDD design above answer?
  27. Regression Discontinuity Design: Points to consider
     • Generally considered a very strong design: the US Dept. of Education classifies it in the same category as the RCT
     • Only informative for those at the discontinuity threshold
     • No "gaming" the threshold allowed (ideally, the threshold is unknown to the subjects, or outside the subjects' control)
     • Relatively low statistical power, requiring much larger sample sizes than RCTs or other observational methods
     • Watch out for contamination by other treatments at the same discontinuity
     • Sensitivity tests are available to probe the plausibility of the assumptions
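A minimal sharp-RDD sketch on simulated data (the 650 cutoff, jump size, and bandwidth are all invented), comparing local linear fits on each side of the threshold:

```python
# Minimal sharp regression-discontinuity sketch on simulated data.
import numpy as np

rng = np.random.default_rng(1)
n, cutoff = 5_000, 650.0
score = rng.uniform(500.0, 800.0, n)    # running variable: credit score
treated = score >= cutoff               # sharp rule: extra credit line
# Outcome trends smoothly in the score; treatment adds a jump of 3.0.
y = 0.01 * score + 3.0 * treated + rng.normal(0.0, 1.0, n)

h = 25.0                                # bandwidth around the cutoff
left = (score >= cutoff - h) & (score < cutoff)
right = (score >= cutoff) & (score <= cutoff + h)

def limit_at_cutoff(x, yy):
    """Local linear fit y ~ a + b*(x - cutoff); return the intercept a."""
    slope, intercept = np.polyfit(x - cutoff, yy, 1)
    return intercept

effect = (limit_at_cutoff(score[right], y[right])
          - limit_at_cutoff(score[left], y[left]))
print(effect)   # close to 3.0 -- valid only for customers near the cutoff
```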
  28. Methodology #5: Synthetic control method
      Critic: I don't buy it. It must've been strategic, not random.
      You: OK, I'll try again. This bank always offers extra lines of credit at great terms to customers with credit scores above a certain threshold. Let's compare results for customers just above and just below the threshold.
      Critic: I'm not interested in only a narrow set of borrowers.
      You: Last try. How about we do an in-depth case study of a greenfield microfinance institution, asking about the social-welfare impact on the neighboring community?
      The synthetic control method allows inference for a single treated unit. The approach addresses observed and unobserved confounders.
  29. Methodology #5: Synthetic control method
      [Figure: "Estimating Average Impact on Household Consumption in a Single Village": household consumption, 1995–2010, for the treated district (Kabil) versus its synthetic control district.]
  30. Synthetic controls: Points to consider
     • The only method allowing rigorous quantitative causal inference for a single treated unit
     • Enormous growth in popularity in the last 5 years
     • Particularly well suited to case studies exploring program impacts at the village/city/state/country level
     • Requires time-series data and many control units
     • Placebo tests are available to assess the plausibility of the critical assumptions
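A minimal sketch of the weight-fitting idea on simulated panel data (all numbers invented): choose non-negative weights, summing to one, over the control units so the weighted average tracks the treated unit's pre-treatment path, then read the post-treatment gap as the estimated impact.

```python
# Minimal synthetic-control sketch on simulated panel data.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
T0, T1, J = 10, 5, 8              # pre-periods, post-periods, control units
controls = 10.0 + rng.normal(0.0, 1.0, (T0 + T1, J)).cumsum(axis=0)

# Simulated treated unit: a fixed blend of controls, plus +2.0 after T0.
true_w = np.array([0.5, 0.3, 0.2] + [0.0] * (J - 3))
treated = controls @ true_w
treated[T0:] += 2.0

def pre_treatment_gap(w):
    """Squared error between treated and synthetic paths, pre-treatment."""
    return float(np.sum((treated[:T0] - controls[:T0] @ w) ** 2))

res = minimize(
    pre_treatment_gap,
    x0=np.full(J, 1.0 / J),
    method="SLSQP",
    bounds=[(0.0, 1.0)] * J,                                  # w_j >= 0
    constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
)
synthetic = controls @ res.x
gap = treated - synthetic
print(gap[T0:].mean())   # close to 2.0, the simulated post-treatment impact
```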
  31. Elaborate theories, multiple tests
      When asked what can be done in observational studies to clarify the step from association to causation, Fisher replied: "Make your theories elaborate." (Cochran)
      This is sage advice, but often misunderstood. Fisher didn't mean you should make your theories and explanations complicated. He meant: when constructing a causal hypothesis, envisage as many different consequences of its truth as possible, and plan observational studies to discover whether each holds.
      • Creating and testing elaborate theories is particularly helpful for indirectly testing for hidden biases (unconfoundedness).
  32–37. Final thoughts (built up one bullet per slide):
     • Ex-ante, be clear as to the standard of evidence (it will depend on the purpose of your inquiry and on who your audience is)
     • Also ex-ante, be clear regarding the treatment, covariates, units, and assumptions
     • Try to adjust for (eliminate) differences in observed characteristics while remaining blind to the answer
     • Run diagnostics/sensitivity tests for unobserved (hidden) bias
     • Devise and test multiple "elaborate theories". Invest in learning about the substantive problem to be solved, and be skeptical of your own results.
