IFPRI- RIMS Workshop

The International Food Policy Research Institute (IFPRI) organized a three-day training workshop on ‘Monitoring and Evaluation Methods’ on 10-12 March 2014 in New Delhi, India. The workshop is part of an IFAD grant to IFPRI to partner in the Monitoring and Evaluation (M&E) component of ongoing projects in the region. It is intended to be a collaborative effort among project directors, M&E leaders, and M&E experts. Detailed discussions will cover the evaluation routines, including sampling, questionnaire development, data collection and management techniques, and production of an evaluation report. The workshop is designed to build a better understanding of the M&E needs of projects at different stages of implementation. Both generic M&E issues and project-specific needs will be addressed. The objective is to produce a work plan for M&E in the IFAD projects and to identify possibilities for collaboration between IFPRI and project leaders.


Transcript

  • 1. Nicholas Minot (IFPRI/Uganda), Atsuko Toda (IFAD/Vietnam), Nguyen Ngoc Ahn (DEPOCEN). RIMS+ surveys: A tool for project design and evaluation
  • 2. Background on RIMS in Vietnam: Results and Information Management System (RIMS)
      - 3rd-level results are associated with project impact on child malnutrition and household living standards
      - The IFPRI project focuses on the household survey used to collect third-level results
  • 3. Background on RIMS: RIMS survey guidelines
      - Should be implemented for large, national IFAD projects
      - Should be done before the project and at the end of the project
      - Sample size: 900 beneficiary households
      - Returning to the same households is not recommended
          - Concern about concentration of IFAD program efforts
          - Administrative complications of finding old households
  • 4. Background on RIMS: RIMS questionnaire
      - Objective is to measure assets and child nutrition
      - Divided into three sections
          - Section 1 – Household demographics
          - Section 2 – Housing, assets, and food security
          - Section 3 – Anthropometry
  • 5. Background on RIMS: Standardization of RIMS questionnaire
      - Ensures comparability across countries
      - Makes analysis relatively quick
      - Assures quality
      - But allows little flexibility in questionnaire design & analysis
      - Does not collect intermediary indicators
  • 6. Changes in RIMS+: Overview of changes
      Change | Rationale
      1. Expanded questionnaire | Collect additional information to diagnose farmer constraints, improve design of interventions, and measure impact on intermediate indicators
      2. Use of control group | Better measurement of impact of project by controlling for broader changes in rural conditions
      3. Additional training and supervision | Improve quality of data
      4. GPS to geo-reference households | Facilitate return to same households (panel) and better supervision of enumerators
      5. Flexible questionnaire & analysis | Address information needs of the IFAD project and IFAD planning in general
  • 7. Changes in RIMS+: 1. Expanded questionnaire
      RIMS+ | RIMS | New info in RIMS+
      A. Member characteristics | 1. Household demographics | + ethnicity, school attendance, & reasons for not attending
      B. Housing | 2. Survey questions | + roof material, ownership status, location of toilet
      C. Assets | 2. Survey questions | + agricultural equipment
      D. Land | (no info) | Farm size, ownership, irrigation, distance
      E. Crop production | (no info) | Production, sales, & prices for 25 crops; cost of 6 inputs
      F. Livestock & fisheries | (no info) | Herd size, sales, & costs for 12 types of animals, use of vet services, type of feeding
  • 8. Changes in RIMS+: 1. Expanded questionnaire (continued)
      RIMS+ | RIMS | New info in RIMS+
      G. Extension & market access | (no info) | Access to extension, who uses, cooperatives, details of sales, distance to markets
      H. Non-farm activities | (no info) | Income and business expenses for 11 non-farm income sources, gender roles
      I. Food security | 2. Survey questions | + coping strategies and quality of diet
      J. Credit & borrowing | (no info) | Access to credit, info on loans received
      K. Socio-Economic Development Plan | (no info) | Knowledge of and participation in SEDP process
      L. Risk & vulnerability | (no info) | Perceived risk of six natural disasters
      M. Anthropometry | 3. Anthropometry | No new information
  • 9. Changes in RIMS+: 2. Use of control group
      - Control group is 300 households that are similar to beneficiaries but not in project area
      - Useful to control for changes in rural areas due to other factors
      Example | Beneficiary households | Control households | Impact according to current before-after comparison | Actual impact using info from control group
      Example 1 | Income rises 8% | Income rises 4% due to economic growth | Suggests that project caused 8% increase in income | Actually, only a 4% increase due to project
      Example 2 | Income does not change | Income falls 4% due to drought | Suggests that project had no effect | Actually, 4% increase in income due to project
  • 10. Changes in RIMS+: 2. Use of control group (continued)
      [Figure: outcome indicator over time (before project, after project) for beneficiary households and the control group, marking the before-after difference and the actual effect of the project. One line is the hypothetical path of beneficiary households without the project, based on growth in the control group.]
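The two control-group examples can be reproduced with a few lines of arithmetic. This is a minimal sketch, not part of the original slides: the function names are hypothetical, and subtracting percentage changes is only an approximation that treats them as percentage points.

```python
def before_after_estimate(beneficiary_change_pct: float) -> float:
    """Naive estimate: attribute the whole change among beneficiaries to the project."""
    return beneficiary_change_pct


def control_adjusted_estimate(beneficiary_change_pct: float,
                              control_change_pct: float) -> float:
    """Subtract the change observed in the control group (percentage points)."""
    return beneficiary_change_pct - control_change_pct


# (beneficiary change, control change) in percent, taken from the two slide examples
examples = {"Example 1": (8.0, 4.0), "Example 2": (0.0, -4.0)}

for name, (beneficiary, control) in examples.items():
    naive = before_after_estimate(beneficiary)
    adjusted = control_adjusted_estimate(beneficiary, control)
    print(f"{name}: before-after = {naive:+.1f}%, with control group = {adjusted:+.1f}%")
```

In both examples the control-adjusted figure is 4 percentage points, while the before-after comparison overstates the impact in the first case and misses it in the second.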
  • 11. Changes in RIMS+: 3. Additional training and supervision
      - Because the questionnaire is longer and somewhat more complicated, enumerators need additional training & supervision
      - IFPRI & DEPOCEN prepared a detailed enumerator manual
      - DEPOCEN provided 5 days of training plus testing of the questionnaire
      - DEPOCEN also provided additional supervision during data collection, which is particularly important in the first week of data collection
  • 12. Changes in RIMS+: 4. Use of GPS units
      - GPS units are sometimes used in RIMS surveys
      - Main purpose is to make it easier to find the household to interview in a later round of the survey
      - Additional benefit of verifying that enumerators have visited households in the village
  • 13. Changes in RIMS+: 5. Flexible questionnaire & analysis of results
      - Original RIMS is analyzed in a “black box”
          - Advantage is that analysis is fast, reliable, and comparable
          - But little opportunity to customize results for the project
      - RIMS+ questionnaire can be customized for the project
      Type of IFAD project | Possible customization of questionnaire
      Farmer training & extension | Access to extension, sources of info, perception of usefulness, adoption of advice, yield
      Linking farmers to market | Travel time to markets, types of buyers, degree of competition, prices received, share sold
      Promotion of non-farm enterprises | Number & composition of NFEs, profitability, training needs, perceived constraints, factors affecting success
      Improved access to credit | Sources of credit, interest rates paid, use of credit, reasons for use of informal credit, factors affecting repayment rate
  • 14. Changes in RIMS+: 5. Flexible questionnaire & analysis of results
      - RIMS+ analysis can be customized to address questions relevant for project design & implementation
          - Is access to extension services different for female-headed farmers?
          - Can pepper be successfully grown by small-scale farmers with limited resources?
          - Is targeting landless households more (or less) pro-poor than targeting farmers with less than 0.5 hectares?
          - Is satisfaction with project services higher in one district than in another?
  • 15. Cost and implementation issues: Expanded questionnaire
      - More information and a more complicated questionnaire
          - Requires additional training and supervision
          - Longer interview time (at least double)
      - Requires a new data entry program
          - Separate data entry in CSPro for 1200 questionnaires
          - At least 2 days to prepare the CSPro data entry form
          - Another 2 days for training in data entry in CSPro, in addition to RIMS training
      - Increased complexity in analysis and reporting
  • 16. Cost and implementation issues: Use of control group
      - Increased workload with financial implications (additional 300 non-project households)
      - Implementing the survey in a non-project area is more difficult due to logistics and cooperation
      - Data entry in both RIMS and CSPro
          - RIMS software to enter RIMS core questions for 900 beneficiary households
          - Data entry in CSPro for the full questionnaire for the 1200-household sample
      - Additional training/supervision
      - Project managers do not see immediate benefit
  • 17. Cost and implementation issues: Use of GPS
      - Increased training time (1/2 day) and additional time at household (10 minutes)
      - Not easy to use due to language barrier
      - Additional burden, since interviewers already have to carry weight and scale
  • 18. Cost and implementation issues: Cost estimates
      Component | First-time costs | Per-survey costs
      Expanded questionnaire in data collection | Already carried out under IFAD-IFPRI Partnership | Interview time is approximately doubled
      Use of control group | No fixed cost | Increases field costs by 50-100%
      Additional training & supervision | Enumerator manual prepared under Partnership | Approximately US$ 10-15k per survey
      Use of GPS units | Cost to purchase = US$ 100 x 20 units = US$ 2000 | Modest: GPS units can be shared across projects or rented
      Analysis of data | Large initial cost of preparing analysis programs, already undertaken by Partnership | For standard analysis, negligible. For customized analysis, requires Stata skills
  • 19. Results of Vietnam RIMS+: Questions
      - Which crops are pro-poor?
      - How does crop commercialization vary across farmers?
      - Do female-headed farmers have equal access to modern inputs?
      - How important is income from non-farm activities?
      - How do farmers perceive the risks of natural disasters?
      - Is food security threatened by crop commercialization?
      - How involved are farmers in the preparation of the Socio-Economic Development Plans?
      - Will raising farmer income improve child nutrition?
  • 20. Results of Vietnam RIMS+: Which crops are pro-poor?
      - Rice is grown by the majority of the poor, but by fewer high-income households
      - Maize, groundnut, red onion, bananas, tea, and vegetables are grown by both poor and non-poor
      - Avocado, mango, durian, pepper, sugarcane, coffee, and cashew are grown disproportionately by high-income farms
      - This is not to say they can’t be grown by poor farmers, but any untargeted support to these crops will not be pro-poor
  • 21. Results of Vietnam RIMS+: Is input use less among female-headed households?
      - Not much evidence that input use per hectare is lower
      - But smaller farm sizes lead to smaller crop production and lower income
  • 22. Results of Vietnam RIMS+: What is the importance of non-farm income?
      - Even the 20% of farms with the smallest area (less than 0.10 hectares) earn the bulk of their income from crop production
      - 45% of the smallest farms rent, sharecrop, borrow, or illegally use other land
  • 23. Results of Vietnam RIMS+: How do farmers perceive the risk of different natural disasters?
      - Perception of disaster risk varies by province
      - Also, perception of likely losses is greater for poor households
  • 24. Results of Vietnam RIMS+: Is food security threatened by commercialization?
      - Commercialization is defined as the share of the value of crop production that is sold
      - Relationship holds even after controlling for per capita income and farm size in regression analysis
  • 25. Results of Vietnam RIMS+: Will raising farmer income improve child nutrition?
      - Yes, but the effect is weak
      - Many other variables influence child nutrition: sanitation, health care, education, child rearing practices, etc.
      [Figure: lowess curves of length/height-for-age Z-score and weight-for-length/height Z-score (vertical axis: Z-scores) against log of per capita income (horizontal axis).]
  • 26. Summary & conclusions: When is RIMS+ most suitable?
      - RIMS+ surveys are probably not suitable for all IFAD projects because of additional costs
      - Conditions under which it is most suitable:
          - IFAD project design is flexible and can be revised in light of new information from the survey
          - IFAD project focuses on a new topic or new region, so there is a need for information
          - There are gaps in knowledge about farm household livelihoods and behavior relevant to the project
          - IFAD project is relatively large, implying an adequate M&E budget
  • 27. Summary & conclusions: Additional issues
      - Size of control group
          - At the moment, 900 treatment households (to meet the standard RIMS requirement) and 300 control households
          - But typically the control group is of similar size to the treatment group
      - It would reduce costs to develop a Core Module and additional modules that are selected depending on the project (e.g. agricultural marketing, credit, extension)
      - RIMS+ would require additional capacity building for IFAD project staff
      - The project has prepared an enumerator manual and data entry programs and could also prepare implementation guidelines if needed
  • 28. Objective of Impact Evaluation
      Measure the effect of the program on its beneficiaries (and eventually on its non-beneficiaries) by answering the counterfactual question:
      - How would individuals who participated in a program have fared in the absence of the program?
      - How would those who were not exposed to the program have fared in the presence of the program?
      Two main problems arise: confounding factors and selection biases.
  • 29. Comparing averages
      - Individual-level measure of impact: what would the outcome (e.g. farm income) have been had he/she not participated in the program (in our case, the treatment)?
      - Compare the individual with the program to the same individual without the program, at the same time? We can never observe both: a missing data problem.
      - Instead: average impact on given groups of individuals
      - Compare the mean outcome in a group of participants (treatment group) to the mean outcome in a similar group of non-participants (control group)
      - Average treatment effect on the treated (ATT):
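The ATT formula itself does not appear in the transcript. As a hedged reconstruction, the standard potential-outcomes definition is given below; the notation is an assumption, not taken from the slide: $Y_i(1)$ and $Y_i(0)$ are the outcomes of individual $i$ with and without the program, and $D_i = 1$ marks participants. The second line shows why a simple comparison of group means only recovers the ATT when the two groups would have looked the same without the program.

```latex
\[
\mathrm{ATT} \;=\; E\bigl[\,Y_i(1) - Y_i(0) \;\big|\; D_i = 1\,\bigr]
\]
\[
E[\,Y_i \mid D_i = 1\,] - E[\,Y_i \mid D_i = 0\,]
\;=\; \mathrm{ATT}
\;+\; \underbrace{E[\,Y_i(0) \mid D_i = 1\,] - E[\,Y_i(0) \mid D_i = 0\,]}_{\text{selection bias}}
\]
```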
  • 30. Building a control group
      - Compare what is comparable.
      - “Treatment” and “Control” groups must look the same if there was no program.
      - Generally, those individuals who benefit from the program initially differ from those who don’t.
          - External selection: programs are explicitly targeted (particular areas, particular individuals).
          - Self selection: the decision to participate is voluntary.
      - Problem with comparing beneficiaries and non-beneficiaries: the difference can be attributed to either the impact or the original differences.
      - SELECTION BIAS: when individuals or groups are selected, or self select, for treatment on characteristics that may also affect their outcomes.
  • 31. Program selection does not lead to selection bias (from Bernard 2006)
      [Diagram: the initial population, Quintile I (poorer) to Quintile V (richer), is divided by selection into a treatment group (receives procedure X) and a control group (does not receive procedure X). Impact = Y Exp – Y Control]
  • 32. Program selection leads to selection bias
      [Diagram: the initial population, Quintile I (poorer) to Quintile V (richer), is divided by selection into a treatment group (receives procedure X) and a control group (does not receive procedure X). Impact ≠ Y Exp – Y Control]
  • 33. “Sign” of the selection bias (1): Program targeted on “worse-off” households
      [Figure: treatment and control outcomes. Observed difference is negative, compared with the actual impact.]
  • 34. “Sign” of the selection bias (2): Program targeted on “better-off” households
      [Figure: treatment and control outcomes. Observed difference is very large, compared with the actual impact.]
  • 35. Impact evaluation for policy decisions
      - Impact evaluations are needed for:
          - curtailing inefficient programs,
          - scaling up interventions,
          - adjusting program benefits,
          - selecting among various program alternatives.
      - The Mexican Progresa/Oportunidades evaluation became influential because of
          - the innovative nature of the program
          - its impact evaluation, which provided credible and strong evidence
  • 36. Role of qualitative data
      - Qualitative data are a key supplement to quantitative impact evaluations, providing complementary perspectives on a program’s performance.
      - Employ mixed methods (Bamberger, Rao & Woolcock 2010).
      - Approaches include FGD, expert elicitation, and key informant interviews (Rao and Woolcock 2003).
      - Useful in three ways:
          1. Can be used to develop hypotheses as to how and why the program would work
          2. Before quantitative IE results become available, qualitative work can provide quick insights into what is happening in the program
          3. In the analysis stage, it can provide context and explanations for the quantitative results
  • 37. Focusing on quantitative methods
      - A central feature of IE is the use of longitudinal data to apply “difference-in-differences” or “double difference” methods.
      - These methods rely on baseline data collected before project implementation and follow-up data collected after it starts, to develop a “before/after” comparison.
      - Data are collected from households receiving the program and from those that do not (“with the program” / “without the program”).
  • 38. Double difference methods: continued
      - Why are both “before/after” and “with/without” data necessary?
      - Suppose data were collected only from beneficiaries.
      - Suppose that, between the baseline and follow-up, some adverse event occurs, with the benefits of the program being more than offset by the damage from the bad event. These effects would show up in the difference over time in the intervention group, in addition to the effects attributable to the program.
      - More generally, restricting the evaluation to only “before/after” comparisons makes it impossible to separate program impacts from the influence of other events that affect beneficiary households.
      - To guard against this, add a second dimension to the evaluation design that includes data on households “with” and “without” the program.
  • 39. Illustration of double difference
      Survey round | Intervention group (Group I) | Control group (Group C) | Difference across groups
      Follow-up | I1 | C1 | I1 – C1
      Baseline | I0 | C0 | I0 – C0
      Difference across time | I1 – I0 | C1 – C0 | Double-difference: (I1 – C1) – (I0 – C0)
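The double-difference table can be computed directly from household-level outcomes in the four cells. The sketch below is illustrative only: the function name and the income figures are hypothetical, and it checks that differencing across groups first or across time first gives the same answer.

```python
from statistics import mean


def double_difference(i0, i1, c0, c1):
    """Double difference from outcome lists.

    i0, i1: intervention group at baseline and follow-up
    c0, c1: control group at baseline and follow-up
    """
    I0, I1, C0, C1 = mean(i0), mean(i1), mean(c0), mean(c1)
    across_groups = (I1 - C1) - (I0 - C0)   # bottom-right cell of the table
    across_time = (I1 - I0) - (C1 - C0)     # same quantity, differenced the other way
    assert abs(across_groups - across_time) < 1e-9
    return across_groups


# Hypothetical household incomes, for illustration only
intervention_baseline = [100, 120, 110]
intervention_followup = [115, 130, 124]
control_baseline = [105, 115, 125]
control_followup = [110, 118, 132]

impact = double_difference(intervention_baseline, intervention_followup,
                           control_baseline, control_followup)
print(f"Double-difference estimate: {impact}")
```

With these made-up numbers the intervention group gains 13 and the control group gains 5, so the estimate is 8 even though the intervention group started out poorer.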
  • 40. Randomization
      - With random program assignment, all individuals have the same chance of receiving the program.
      - With a well-done randomized evaluation design, beneficiaries and non-beneficiaries have, on average, the same observed and, more important, unobserved characteristics (the latter being more difficult to control for).
      - In this way a credible basis for comparison is established, freed from selectivity concerns, and the direction of causality is certain.
      - A further advantage of a randomized design is that program impact is easy to calculate and easier to understand and explain.
      - Heckman and Smith (1995), however, point out two problems:
          - Randomization bias: the process of randomization itself leads to a different beneficiary pool than would otherwise have been treated
          - Substitution bias: non-beneficiaries obtain similar treatments from different sources, a form of “contamination.”
  • 41. Matching
      - Matching methods of program evaluation construct a comparison group by “matching” treatment households to comparison group households based on observable characteristics.
      - The impact is estimated as the average difference between the outcome for each treatment household and a weighted average of outcomes of similar comparison group households from the matched sample.
      - Matching methods differ in the selection of the matched comparison and in how these weighted average differences in outcomes are constructed.
      - One popular approach is propensity score matching (PSM).
  • 42. Regression discontinuity
      - The regression discontinuity design (RDD) is a method that can be used for programs that have a continuous eligibility index with a clearly defined cutoff score to determine eligibility.
      - To apply RDD, two main conditions are needed:
          1. A continuous eligibility index.
          2. A clearly defined cutoff score, that is, a point on the index above or below which the population is classified as eligible for the program.
  • 43. RDD: continued
      - The regression discontinuity measures the difference in post-intervention outcomes, such as incomes, between units on either side of, and near, the eligibility cutoff
      - The difference is estimated using regression based on the sub-sample around the cutoff point
  • 44. Encouragement design
      - Encouragement design is useful when the intervention cannot be randomly administered to some and not others.
      - The method requires that a randomly selected group of beneficiaries receive extra encouragement to undertake the intervention.
      - Encouragement: additional information or incentives.
      - By randomizing encouragement and carefully tracking outcomes for those who do and do not receive encouragement, it is possible to obtain reliable estimates of the effects of the encouragement and of the intervention itself.
      - Compare results for the randomly selected encouraged group vs. results for the randomly selected not-encouraged group. This quantity of interest, known as the “Intention-to-Treat” effect, or ITT, is the effect of the encouragement itself.
  • 45. Encouragement design: continued
      - The effect of the treatment is obtained by adjusting the ITT by the amount of non-compliance
      - LATE = ITT / Compliance rate
      - Compliance rate = fraction of subjects that were treated in the treatment group – fraction of subjects that were treated in the control group
      - With a 100% compliance rate, LATE = ITT: all those assigned to the treatment take the treatment and all those assigned to the control do not.
      - The compliance rate can be thought of as the fraction of subjects that fall into the sub-population of “compliers”, the group for whom the decision to take treatment was directly affected by the assignment.
      - This is the group induced by the encouragement to take advantage of the treatment.
  • 46. Finally on encouragement design
      - Compliers: the group of people that actually stick to the experimental protocol, taking the treatment if assigned to the treatment group and not if assigned to control.
      - For policy, compliers are the only ones who are actually affected by the encouragement.
      - Usually, the compliance rate < 1
      - LATE estimates the effect of treatment only for the sub-population of compliers; it does not constitute the effect of the treatment for the whole sample.
      - Special case: when the control group can be excluded from taking the treatment, non-compliance can only occur in the treatment group and LATE = ATT
      - In general, the compliance rate depends on the encouragement.
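The ITT and LATE calculations for an encouragement design can be illustrated as follows. This is a sketch with hypothetical function names and made-up data; here the "treatment group" is the randomly encouraged group and the "control group" is the not-encouraged group.

```python
from statistics import mean


def itt_and_late(outcome_encouraged, outcome_not_encouraged,
                 treated_encouraged, treated_not_encouraged):
    """Return (ITT, compliance rate, LATE) for an encouragement design.

    outcome_*: outcomes for each subject in the encouraged / not-encouraged group
    treated_*: 1 if the subject actually took up the intervention, else 0
    """
    itt = mean(outcome_encouraged) - mean(outcome_not_encouraged)
    compliance_rate = mean(treated_encouraged) - mean(treated_not_encouraged)
    if compliance_rate == 0:
        raise ValueError("encouragement did not change take-up; LATE is undefined")
    return itt, compliance_rate, itt / compliance_rate


# Hypothetical data: 3 of 4 encouraged subjects take up, 1 of 4 not-encouraged does
itt, compliance, late = itt_and_late(
    outcome_encouraged=[12, 14, 10, 16],
    outcome_not_encouraged=[10, 11, 9, 10],
    treated_encouraged=[1, 1, 1, 0],
    treated_not_encouraged=[0, 0, 1, 0],
)
print(f"ITT = {itt}, compliance rate = {compliance}, LATE = {late}")
```

Because only half of the sample changes behaviour in response to the encouragement, the effect on compliers (LATE) is twice the ITT in this example.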
  • 47. Power calculation
      - Power: the ability of a study to detect an impact. Conducting a power calculation is a crucial step in impact evaluation design.
      - Power calculation: a calculation of the sample size required for the impact evaluation, which depends on the minimum effect size and the required level of confidence.
  • 48. Power: continued
      - We discuss the basic intuition behind power calculations by focusing on the simplest case: an evaluation conducted using an RCT, assuming that noncompliance is not an issue.
      - Power calculations indicate the minimum sample size needed to conduct an IE.
      - They help assess whether existing data sets are large enough for the purpose of conducting an impact evaluation.
      - They help avoid collecting too much information, which can be very costly.
  • 49. Large samples better resemble the population (both treatment and control) (World Bank 2008)
  • 50. Type I and Type II error
      - A type I error is made when an evaluation concludes that a program has had an impact, when in reality it had no impact.
      - A type II error occurs when an evaluation concludes that the program has had no impact, when in fact it has had an impact.
      - The likelihood of a type I error can be set by a parameter called the “confidence level.”
      - Many factors affect the likelihood of committing a type II error, but the sample size is crucial.
  • 51. Power: continued
      - If the average weight of 50,000 treated units is the same as the average weight of 50,000 comparison units, then one can probably conclude with confidence that the program has had no impact.
      - By contrast, if a sample of two treatment children weigh on average the same as a sample of two comparison children, it is harder to reach a reliable conclusion.
      - The power (or statistical power) of an impact evaluation is the probability that it will detect a difference between the treatment and comparison groups, when in fact one exists. An impact evaluation has high power if there is a low risk of not detecting real program impacts, that is, of committing a type II error.
  • 52. Power calculations: continued (World Bank 2008)
      - Involves the following steps:
          1. Does the program create clusters?
          2. What is the outcome indicator?
          3. Is it required to compare program impacts between subgroups?
          4. What is the minimum level of impact that would justify the investment made in the intervention?
          5. What is a reasonable level of power for the evaluation being conducted?
          6. What are the baseline mean and variance of the outcome indicators?
  • 53. Power calculations: continued
      - Power calculations involve different steps, depending on whether the program randomly assigns benefits among clusters or simply assigns benefits randomly among all units in a population.
      - No clusters: take a random sample of the entire population
      - If impacts are to be compared between subgroups (for example, male and female), a larger sample is needed
      - What is the minimum level of impact below which the program will be treated as not successful?
      - For an evaluation to identify small differences in mean outcomes, the sample will need to be larger, so the minimum detectable effect should be chosen carefully
      - There can be different power levels; the standard is 80 percent, i.e. finding an impact in 80 percent of cases when one has occurred
      - Get the baseline mean and variance right: more variance in the baseline will require a larger sample to capture the effect
      - Consider the sensitivity of the sample size to assumptions: a lower expected impact, higher variance in the outcome indicator, or a higher power level each requires a larger sample
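For the unclustered case, the steps above feed into a standard sample-size formula for comparing two means. The sketch below uses the normal approximation n = 2 (z_alpha + z_power)^2 sd^2 / mde^2 per group; the function name and the example inputs are hypothetical, and the formula assumes equal group sizes, equal variances, a two-sided test, and full compliance.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(mde, sd, power=0.80, alpha=0.05):
    """Units needed in each of the treatment and comparison groups.

    mde:   minimum detectable effect (difference in mean outcomes)
    sd:    standard deviation of the outcome at baseline
    power: probability of detecting an impact of size mde (1 - type II error rate)
    alpha: type I error rate of a two-sided test
    """
    if mde <= 0 or sd <= 0:
        raise ValueError("mde and sd must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / mde) ** 2)


# Sensitivity to assumptions: smaller effects or higher power need larger samples
for mde in (0.5, 0.3, 0.2):
    for power in (0.80, 0.90):
        n = sample_size_per_group(mde=mde, sd=1.0, power=power)
        print(f"mde = {mde} sd, power = {power:.0%}: {n} per group")
```

Halving the minimum detectable effect roughly quadruples the required sample, which is why the slide stresses choosing it carefully.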
  • 54. Power with clusters
      - In the presence of clustering, the guiding principle is that the number of clusters matters more than the number of individuals within the clusters. A sufficient number of clusters is required to test whether a program has had an impact by comparing outcomes in samples of treatment and comparison units.
      - If the district is the cluster: compared with 2 districts, 100 districts are more likely to give treatment and comparison groups that are similar on average, but can be costly
      - All steps 1-6 apply as before, plus one additional question:
          - How variable is the outcome indicator within clusters?
      - In general, higher intra-cluster correlation in outcomes increases the number of clusters required to achieve a given power level: less is gained by adding one more person from the same village than by adding one from another village
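One common way to quantify the effect of intra-cluster correlation is the design effect, 1 + (m - 1) x ICC, where m is the number of units sampled per cluster. The slides do not name this formula, so the sketch below is an assumption; the function names and input values are hypothetical, and equal cluster sizes are assumed.

```python
from math import ceil


def design_effect(cluster_size, icc):
    """Variance inflation from sampling cluster_size units per cluster."""
    return 1 + (cluster_size - 1) * icc


def clusters_per_group(n_unclustered, cluster_size, icc):
    """Clusters needed per group to match the power of n_unclustered independent units."""
    inflated_n = n_unclustered * design_effect(cluster_size, icc)
    return ceil(inflated_n / cluster_size)


# Illustrative inputs: 300 units per group would suffice without clustering,
# and 20 households are interviewed in each village
for icc in (0.0, 0.05, 0.20):
    k = clusters_per_group(n_unclustered=300, cluster_size=20, icc=icc)
    print(f"ICC = {icc:.2f}: {k} villages per group ({k * 20} households)")
```

As the intra-cluster correlation rises, extra households from the same village add less information, so more villages are needed.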
  • 55. In this project
      - There are both clustered and unclustered interventions