Propensity Score Matching (PSM) Module 4

The goal of this course is to provide policy analysts and project managers with the tools for evaluating the impact of a project, program or policy. The course covers the methods that can be used to measure the impact of a project, program or policy on the well-being of individuals and households. It also addresses how the results of an impact evaluation may be put to use: for example, to improve the design of projects and programs, as an input into cost-benefit analysis, and as a basis for policy decisions.

Published in: Government & Nonprofit

Propensity Score Matching (PSM) Module 4

  1. 1. Module 4: Propensity Score Matching (PSM). Shahid Khandker, International Food Policy Research Institute (IFPRI)
  2. 2. Non-Experimental Methods: Constructing Counterfactual. From a large group of controls, find those similar to participants in pre-treatment characteristics. Focus on pre-treatment characteristics because they are not affected by the program. PSM matches program and control areas on observed pre-treatment characteristics.
  3. 3. How PSM Works: 1. Construct a statistical comparison group based on a model of the probability of participating, given observed characteristics. 2. Participants are then matched on the basis of this probability, or propensity score, to non-participants. 3. Average treatment effect of the program = mean difference in outcomes across these two groups.
  4. 4. The Propensity Score: summarizes the characteristics of households into an index, P(X)=Pr(T=1|X). To calculate: 1. Use a representative sample survey of eligible non-participants and participants. 2. Estimate a probit/logit model of program participation T as a function of all exogenous variables X in the data likely to affect participation. The predicted probability of participation is Pr(T=1|X).
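The estimation step on the slide above can be sketched in a few lines. This is a minimal illustration only, assuming a logit model fitted by plain gradient ascent in standard-library Python; the function names, the learning rate, the iteration count, and the toy data are all hypothetical and do not come from the slides. In practice a statistics package would be used for this step.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logit(X, T, lr=0.1, n_iter=5000):
    """Fit Pr(T=1|X) by gradient ascent on the mean log-likelihood.

    X is a list of covariate lists, T a list of 0/1 participation flags.
    Returns coefficients with the intercept first.
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for x, t in zip(X, T):
            p = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))
            err = t - p
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity_scores(X, beta):
    """Predicted probability of participation for each unit."""
    return [sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x))) for x in X]

# Hypothetical toy data: one pre-treatment covariate per household.
X = [[0.2], [0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5]]
T = [0, 0, 0, 1, 0, 1, 1, 1]

beta = fit_logit(X, T)
scores = propensity_scores(X, beta)
```

Each entry of `scores` is the estimated P(X) for one household; these are the values that the matching steps later in the deck operate on.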
  5. 5. The Propensity Score: Rosenbaum and Rubin (1983) show that, under certain assumptions, matching on P(X) is as good as matching on X. The necessary assumptions are conditional independence and common support.
  6. 6. Assumption 1: Conditional Independence. Given a set of observable covariates X that are not affected by treatment, potential outcomes $Y^T$, $Y^C$ are independent of treatment assignment T: $(Y_i^T, Y_i^C) \perp T_i \mid X_i$. Interpretation: participation is based entirely on observed characteristics. The assumption is not directly testable; whether it holds depends on features of the program itself.
  7. 7. Assumption 2: Common Support. Treatment observations have comparison units “nearby” in the propensity score distribution. [Figure: two panels plotting the density of propensity scores (horizontal axis, 0 to 1) for participants and nonparticipants, with the region of common support marked. Panel 1: example of common support. Panel 2: weak common support.]
  8. 8. Assumption 2: Common Support. If the observations dropped for lack of common support are a nonrandom subset of the sample, the estimated treatment effect may be biased. It may be useful to examine the characteristics of dropped units to help interpret this potential bias.
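One way to make the common support check concrete is sketched below. It assumes a simple min-max overlap rule (keep units whose score lies between the larger of the two group minima and the smaller of the two group maxima); the slides do not name a specific rule, so this choice, the function names, and the scores are illustrative assumptions.

```python
def common_support(ps_treated, ps_control):
    """Overlap region of two score distributions under a min-max rule (an assumption)."""
    lo = max(min(ps_treated), min(ps_control))
    hi = min(max(ps_treated), max(ps_control))
    return lo, hi

def split_by_support(scores, lo, hi):
    """Return (kept, dropped) index lists so dropped units can be inspected."""
    kept = [i for i, p in enumerate(scores) if lo <= p <= hi]
    dropped = [i for i, p in enumerate(scores) if not (lo <= p <= hi)]
    return kept, dropped

# Hypothetical propensity scores.
ps_treated = [0.35, 0.50, 0.62, 0.80, 0.93]
ps_control = [0.05, 0.12, 0.30, 0.41, 0.55, 0.70]

lo, hi = common_support(ps_treated, ps_control)
kept_t, dropped_t = split_by_support(ps_treated, lo, hi)
kept_c, dropped_c = split_by_support(ps_control, lo, hi)
```

Returning the dropped indices alongside the kept ones makes it easy to tabulate the characteristics of dropped units, which is what the slide recommends for judging potential bias.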
  9. 9. Assumption 2: Common Support. Better matching with a larger sample of nonparticipants. If the two samples come from different surveys: X across surveys should reflect the same concept; use a similar questionnaire, the same interviewers or interviewer training, and the same survey period. Participants and nonparticipants should also be drawn from the same economic environment/geographic area.
  10. 10. Assumption 2: Common Support. Balancing is also needed: although a treated unit and its matched comparator might have the same P(X), they are not necessarily similar. Also need to check whether observations with the same P(X) have the same distribution of observable covariates independent of treatment status: $\hat{p}(X \mid T=1) = \hat{p}(X \mid T=0)$.
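A rough sketch of a balancing check follows. It assumes one common approach: split units into propensity-score strata and compare the mean of a covariate between treated and comparison units within each stratum using a Welch t-statistic. The stratum cutoff, the function names, and the data are hypothetical, and the slides do not prescribe this particular statistic.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch t-statistic for equality of means of two samples."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))

def balance_by_stratum(ps, x, t, cutoffs):
    """Compare covariate x across treatment status within propensity-score strata.

    Returns one (lower, upper, t_statistic) tuple per stratum; the statistic is
    None when a stratum has fewer than two units in either group.
    """
    edges = [0.0] + list(cutoffs) + [1.0]
    results = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        members = [i for i, p in enumerate(ps)
                   if lo <= p < hi or (hi == 1.0 and p == 1.0)]
        xt = [x[i] for i in members if t[i] == 1]
        xc = [x[i] for i in members if t[i] == 0]
        if len(xt) < 2 or len(xc) < 2:
            results.append((lo, hi, None))
        else:
            results.append((lo, hi, welch_t(xt, xc)))
    return results

# Hypothetical scores, treatment flags, and one covariate.
ps = [0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85]
t = [0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1]
x = [1.0, 1.2, 0.9, 1.1, 1.3, 0.8, 2.0, 2.2, 1.9, 2.1, 2.3, 1.8]

results = balance_by_stratum(ps, x, t, cutoffs=[0.5])
```

A large absolute t-statistic in a stratum suggests the covariate is not balanced there, even though the units share similar propensity scores.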
  11. 11. Matching. After calculating propensity scores, need to decide on a method to match non-participants with participants: 1. Nearest neighbor: match on closest P(X) (or closest five neighbors). 2. Caliper: the difference in propensity scores for a participant and its closest neighbor may still be very high; this can be avoided by imposing a “tolerance” on the maximum propensity score distance (the caliper).
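Nearest-neighbor matching with an optional caliper can be illustrated as follows. This is a minimal sketch assuming matching with replacement; the function names, the caliper value, and the data are hypothetical.

```python
def nearest_neighbor_match(ps_treated, ps_control, k=1, caliper=None):
    """For each treated unit, pick the k comparison units closest in propensity score.

    Matching is with replacement. If a caliper is given, neighbors farther away
    than the caliper are discarded, so a treated unit may end up unmatched.
    """
    matches = []
    for p in ps_treated:
        ranked = sorted(range(len(ps_control)), key=lambda j: abs(ps_control[j] - p))
        chosen = ranked[:k]
        if caliper is not None:
            chosen = [j for j in chosen if abs(ps_control[j] - p) <= caliper]
        matches.append(chosen)
    return matches

def att_from_matches(y_treated, y_control, matches):
    """Mean gap between each matched treated outcome and its matches' mean outcome."""
    gaps = [y_treated[i] - sum(y_control[j] for j in m) / len(m)
            for i, m in enumerate(matches) if m]
    return sum(gaps) / len(gaps)

# Hypothetical scores and outcomes.
ps_treated = [0.30, 0.55, 0.90]
ps_control = [0.10, 0.28, 0.52, 0.60]
y_treated = [12.0, 15.0, 20.0]
y_control = [8.0, 10.0, 12.0, 13.0]

matches = nearest_neighbor_match(ps_treated, ps_control, k=1, caliper=0.1)
att = att_from_matches(y_treated, y_control, matches)
```

In this toy data the third participant has no comparison unit within the caliper and is left unmatched, which is exactly the situation the tolerance is meant to expose.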
  12. 12. Matching. After calculating propensity scores, need to decide on a method to match non-participants with participants: 3. Stratification: partitions the common support into different strata and calculates program impact in each interval. 4. Kernel: non-parametric matching estimator; uses weighted averages of all observations in the non-participant group to represent the counterfactual.
  13. 13. Calculating Treatment Impact. The matching method creates weights: $ATT^{PSM} = \frac{1}{N_T} \sum_{i \in T} \left[ Y_i^T - \sum_{j \in C} W(i,j)\, Y_j^C \right]$, where $N_T$ is the number of participants $i$ and $W(i,j)$ weights comparison units $j$ by the propensity score distribution of participants.
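The weighted comparison on this slide (each participant's outcome minus a weighted average of comparison outcomes, averaged over the N_T participants) can be sketched with kernel weights. The Gaussian kernel, the bandwidth of 0.1, the function names, and the data are assumptions made for illustration; the slides do not specify them.

```python
import math

def kernel_weights(p_i, ps_control, bandwidth=0.1):
    """Weights W(i, j) for one participant, normalized to sum to 1.

    Uses a Gaussian kernel in the propensity-score distance (an assumption).
    """
    raw = [math.exp(-0.5 * ((p_j - p_i) / bandwidth) ** 2) for p_j in ps_control]
    total = sum(raw)
    return [w / total for w in raw]

def att_kernel(ps_treated, y_treated, ps_control, y_control, bandwidth=0.1):
    """Average over participants of own outcome minus the weighted counterfactual."""
    gaps = []
    for p_i, y_i in zip(ps_treated, y_treated):
        w = kernel_weights(p_i, ps_control, bandwidth)
        counterfactual = sum(w_j * y_j for w_j, y_j in zip(w, y_control))
        gaps.append(y_i - counterfactual)
    return sum(gaps) / len(gaps)

# Hypothetical scores and outcomes.
ps_treated = [0.30, 0.55, 0.90]
y_treated = [12.0, 15.0, 20.0]
ps_control = [0.10, 0.28, 0.52, 0.60]
y_control = [8.0, 10.0, 12.0, 13.0]

att = att_kernel(ps_treated, y_treated, ps_control, y_control)
```

Every comparison unit contributes to every participant's counterfactual, with units closer in propensity score receiving more weight; nearest-neighbor matching is the special case in which the weights are concentrated on the closest units.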
  14. 14. PSM and Regression-Based Methods. Consistent OLS estimates of the ATE can be calculated under the assumption of conditional exogeneity. Hirano et al. (2003): run a weighted least squares regression of the outcome on treatment and other covariates, weighting by the inverse of a nonparametric estimate of the propensity score. This leads to a fully efficient estimator.
  15. 15. PSM and Regression-Based Methods. $Y_{it} = \alpha + \beta T_{i1} + \varepsilon_{it}$. Hirano et al. (2003): for the ATE for the population, the weights are $1/\hat{P}(X)$ for participants and $1/(1-\hat{P}(X))$ for nonparticipants.
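The weighting scheme on this slide can be illustrated with a simplified case. The sketch below assumes the regression contains only a constant and the treatment indicator (no other covariates, unlike the fuller specification described in the deck); in that case the weighted least squares coefficient on T equals the difference between the weighted mean outcomes of the two groups. The function name and data are hypothetical.

```python
def ipw_ate(y, t, ps):
    """Inverse-propensity-weighted ATE with weights 1/P for participants
    and 1/(1-P) for nonparticipants.

    Equivalent to WLS of y on a constant and t alone (a simplifying assumption).
    """
    w = [1.0 / p if ti == 1 else 1.0 / (1.0 - p) for ti, p in zip(t, ps)]

    def weighted_mean(group):
        num = sum(wi * yi for wi, yi, ti in zip(w, y, t) if ti == group)
        den = sum(wi for wi, ti in zip(w, t) if ti == group)
        return num / den

    return weighted_mean(1) - weighted_mean(0)

# Hypothetical outcomes, treatment flags, and estimated propensity scores.
y = [10.0, 12.0, 14.0, 9.0, 8.0, 11.0]
t = [1, 1, 1, 0, 0, 0]
ps = [0.8, 0.6, 0.5, 0.4, 0.3, 0.2]

ate = ipw_ate(y, t, ps)
```

Participants with a low estimated propensity and nonparticipants with a high one receive the largest weights, so scores very close to 0 or 1 can dominate the estimate.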
  16. 16. Conclusions. PSM is useful where unobserved heterogeneity does not determine program participation. Whether this is actually the case depends on the unique features of the program itself. Baseline data on a wide range of pre-program characteristics can better specify P(X)=Pr(T=1|X). PSM imposes fewer constraints on functional form and the distribution of the error term.
  17. 17. Case Studies: PROPENSITY SCORE MATCHING (PSM)
  18. 18. How PSM Works: 1. Construct a statistical comparison (i.e., counterfactual) group based on a model of the probability of participating, given observed characteristics. 2. Participants are then matched on the basis of this probability, or propensity score, to non-participants. 3. Average treatment effect of the program = mean difference in outcomes across these two groups.
  19. 19. Case Study 1: Farmer-Field School in Peru
  20. 20. Overview. Godtland et al. (2004): impact of a pilot farmer-field-school (FFS) program in Peru on farmers’ knowledge of pest management practices related to potato cultivation. FFS was started in 1998 by scientists in collaboration with CARE-Peru.
  21. 21. Program Design. Program not randomized; farmers self-selected. Large sample of nonparticipants, drawn from: • Villages where the FFS program existed • Villages without the FFS program but with other programs run by CARE-Peru • Control villages, similar to the FFS villages in observable characteristics such as climate, distance to district capitals, and infrastructure
  22. 22. Estimating Program Effect. A simple comparison of knowledge levels across participants and non-participants would yield biased estimates of the program effect. Non-participants would therefore need to be matched to participants over a set of common characteristics, to ensure comparability. Initial assumption: selection on observed characteristics.
  23. 23. Generating Common Support. Three methods: 1. Choosing a propensity score cutoff • No formal rule: choose threshold = 0.6. 2. Choosing a comparison group • Nearest-neighbor (5) matching: 5 non-participants to each participant, within proposed 0.01 bound • Those not matched are dropped. 3. Constructing a weighted match for each participant • Using the full sample of nonparticipants • Nonparametric kernel regression method.
  24. 24. Evaluating Comparability. Balancing tests: whether the means of the observable variables for each group are significantly different. 1. Propensity score cutoff / NN (5) methods • Divide each comparison and treatment group into two strata, ordered by propensity scores • Within each stratum, t-test of equality of means across samples for each X in the farmer participation equation. 2. Weighted match method • Tests for equality of means conducted across samples of participants and their weighted matches.
  25. 25. Evaluating Comparability. In general, across methods, the null hypothesis that means do not differ across the two samples is not rejected; common support validated. A regression method was also used, with no substantial differences from the alternative methods. Results do suggest that farmers who participated in the field-level school program have better knowledge of integrated pest management (IPM). Improved knowledge about IPM has tended to increase farm productivity.
  26. 26. Case Study 2: Trabajar Workfare Program in Argentina
  27. 27. Overview. Trabajar: workfare program set up in Argentina during the economic crisis in 1997. Jalan and Ravallion (2003): measure net income gains from participating. Participants must engage in work to receive benefits: 80% of Trabajar workers came from the poorest 20% of the Argentine population. Not randomized; no time for a baseline.
  28. 28. Difficulties in Measuring Net Income Gains. No baseline data and no randomization. Measurement of foregone income, and hence construction of a proper counterfactual, was therefore a challenge. Participants also need not have been unemployed prior to joining Trabajar.
  29. 29. Approach: Use of Multiple Surveys. Jalan and Ravallion were able to construct a counterfactual using contemporaneous survey data on a large sample of non-participants. A post-intervention national survey was conducted of about 2,800 participants and non-participants; both groups came from a similar economic environment.
  30. 30. PSM Application. 1. Kernel density estimation used to match the sample of participants and non-participants over common values of propensity scores. Excluded: • Non-participants for whom the estimated density was equal to zero • 2% of the nonparticipant sample from the top and bottom of the distribution
  31. 31. PSM Application. 2. Estimates of the average treatment effect were constructed based on nearest-neighbor, nearest five neighbors, and kernel-weighted matching. Average gains of about half of the maximum monthly Trabajar wage of US$200 were realized.
  32. 32. Does Selection on Observables Hold? Jalan and Ravallion test for potential remaining selection bias on unobserved characteristics by applying the Sargan-Wu-Hausman test
  33. 33. Sargan-Wu-Hausman Test. On the sample of participants and matched non-participants, ran an OLS regression of income on: (1) the propensity score and the residuals from the logit participation equation; (2) additional control variables Z that exclude the instruments (provincial dummies) used to identify exogenous variation in income gains. If the coefficient on the residuals ≠ 0, unobserved selection bias may continue to pose a problem.
  34. 34. Results: Test for Unobserved Bias. The test was used to detect selection bias only in the nearest-neighbor estimates (one participant matched to one non-participant, which lent itself more readily to a regression-based approach). Results: • Coefficient on residuals not statistically significant under the null • Coefficient on propensity score similar to the average impact in the nearest-neighbor matching estimate
