Chris Nicoletti
Activity #267: Analysing the socio-economic impact of the Water Hibah on beneficiary households and communities (Stage 1)
Impact Evaluation Training Curriculum
Session 3
April 18, 2013
IMPACT EVALUATION AND MEASURING RESULTS
From Promises into Evidence
This material constitutes supporting material for the "Impact Evaluation in Practice" book. This additional material is made freely available, but please acknowledge its use as follows: Gertler, P. J.; Martinez, S., Premand, P., Rawlings, L. B. and Christel M. J. Vermeersch, 2010, Impact Evaluation in Practice: Ancillary Material, The World Bank, Washington DC (www.worldbank.org/ieinpractice). The content of this presentation reflects the views of the authors and not necessarily those of the World Bank.
Some of the data collection management material was developed by Adam Ross.
Outline: topics being covered
Tuesday - Session 1: INTRODUCTION AND OVERVIEW
1) Introduction
2) Why is evaluation valuable?
3) What makes a good evaluation?
4) How to implement an evaluation?
Wednesday - Session 2: EVALUATION DESIGN
5) Causal Inference
6) Choosing your IE method/design
7) Impact Evaluation Toolbox
Thursday - Session 3: SAMPLE DESIGN AND DATA COLLECTION
9) Sample Designs
10) Types of Error and Biases
11) Data Collection Plans
12) Data Collection Management
Friday - Session 4: INDICATORS & QUESTIONNAIRE DESIGN
1) Results chain/logic models
2) SMART indicators
3) Questionnaire Design
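Impact Evaluation Project Timeline
[Figure: project timeline. Intervention track: Design the Intervention, Rollout the Intervention, Ongoing Monitoring and Process Evaluation. Evaluation track: Design the Impact Evaluation, Collect Baseline Data, Collect Follow-up Data, Endline Analysis. A marker labeled "Scope of this session" highlights part of the timeline.]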
Assumptions of this presentation
• We are planning a prospective impact evaluation.
• We need baseline and follow-up survey data on treatment and control groups to measure program impact.
• We are going to collect our own data for the impact evaluation.
Before collecting your own data
• Can you use existing data?
  • Regular surveys (Census, DHS survey, IRAP, etc.)
  • Regular monitoring (annual achievement tests)
  • Administrative records (health records, school enrollment, etc.)
• In many instances, administrative data is insufficient, of poor quality, or not at the scale you would like.
Before collecting your data
• Who should collect the data?
  • Bureau of statistics – may have good capacity and can be a worthwhile place to invest in further capacity.
  • University – a social science program will often have data collection experience.
  • External firm – depends on the questions, complexity and magnitude.
• When do you need to start?
  • Keep in mind that procurement, training and data collection all take time.
Objectives of sampling and data collection…
• We need data that:
1. Accurately reflects the reality of the population (remember the external validity discussion from yesterday).
2. Is representative of the entire eligible population.
3. Allows policy makers and analysts to make real-time, informed decisions.
4. Has minimal sampling and non-sampling error.
Types of errors: Sampling Error
• Sampling error: the result of observing a sample of n households (the sample size or the “evaluation sample”) rather than all N households in the target population.
• Remember our diagram from yesterday…
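A small simulation can make sampling error concrete. The sketch below is illustrative only: the population is made up (normally distributed values, not Water Hibah data) and `sampling_error_spread` is a hypothetical helper name. It draws repeated simple random samples of n households and measures how much the sample mean moves from sample to sample; that spread is the sampling error, and it shrinks as n grows.

```python
import random
import statistics


def sampling_error_spread(population, n, draws=500, seed=0):
    """Standard deviation of the sample mean across repeated simple random samples of size n."""
    rng = random.Random(seed)
    means = [statistics.mean(rng.sample(population, n)) for _ in range(draws)]
    return statistics.stdev(means)


# Hypothetical population of N = 5,000 household values (illustrative only).
_rng = random.Random(42)
population = [_rng.gauss(100, 20) for _ in range(5000)]

small = sampling_error_spread(population, n=25)
large = sampling_error_spread(population, n=400)
print(f"spread of sample means, n=25:  {small:.2f}")
print(f"spread of sample means, n=400: {large:.2f}")
```

Observing all N households would make this spread zero; any remaining error would then be non-sampling error.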
Types of error: Non-Sampling Error
• Non-sampling error: the result of errors in survey development and execution. Some examples are:
  • Measurement error – when the answers recorded are different from the actual values.
  • Selection bias – results from imperfections in the sample frame or deficiencies in the sample selection process.
  • Non-response – when we don’t get answers from people on certain questions and/or from entire households.
• What are some others that you can think of???
Source of Error | Examples | Strategies to minimize error
Planning and interpretation | Inadequate definitions of concepts, terms or populations. | Ensure all concepts, terms and populations are defined precisely through consultation between data users and survey designers.
Sample selection | Inadequate list from which sample is selected; biased sample selection. | Check list for accuracy, duplicates and missing units; use appropriate selection procedures (see “Bias and Accuracy” below).
Survey methods | Inappropriate method (e.g., mail survey for a very complicated topic). | Choose an appropriate method and test thoroughly.
Questionnaire | Loaded, misleading or ambiguous questions, poor layout or sequencing. | Use plain English, clear questions and logical layout; test thoroughly.
Interviewers | Leading respondents, making assumptions, misunderstanding or mis-recording answers. | Provide clear interviewer instructions and appropriate training, including exercises and field supervision.
Respondents | Refusals, memory problems, rounding answers, protecting personal interests or integrity. | Promote survey through public media; ensure confidentiality; if interviewer-based, use well-trained, impartial interviewers and probing techniques; if mail-based, use a well-written introductory letter.
Processing | Errors in data entry, coding or editing. | Adequately train and supervise processing staff; check a sample of each person’s work.
Estimation | Incorrect weighting, errors in calculation of estimates. | Ensure that skilled statisticians undertake estimation.
This table was extracted from: http://www.oesr.qld.gov.au/about-statistics/survey-methods/#Sources
We will now discuss the types of sampling…
• Random sampling
• Systematic sampling
• Stratified sampling
• Convenience sampling
• Snowball sampling
• Multi-stage sampling
• Probability Proportional to Size
Let’s discuss each one…
Random Sampling
• Random sampling is the purest form of probability sampling. Each member of the population has an equal and known chance of being selected.
  • Can anyone explain what is meant by probability sampling?
  • Does IndII use this currently?
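As a minimal sketch of simple random sampling, the example below draws n units without replacement from a sampling frame, so each unit has the same known selection chance n/N. The frame of household IDs and the function name `simple_random_sample` are hypothetical and only for illustration.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every unit has the same chance n/N."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


# Hypothetical sampling frame of 1,000 household IDs.
frame = [f"HH-{i:04d}" for i in range(1, 1001)]
sample = simple_random_sample(frame, n=50, seed=1)
print(len(sample), "households selected; inclusion probability =", 50 / len(frame))
```

Fixing the seed makes the draw reproducible, which helps when the selection has to be documented or audited.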
Systematic sampling
• Systematic sampling is often used instead of random sampling. After the required sample size has been calculated, every k-th record is selected from a list of population members. Its only advantage over the random sampling technique is simplicity.
  • Rather than rolling dice or holding a lottery, you simply choose every 10th person on the list.
  • Is this still random???
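One common way to keep systematic sampling a probability method is to pick the starting point at random and then take every k-th record. The sketch below assumes that approach; the function name and the frame are hypothetical. When the list length is a multiple of k, each record has the same 1-in-k chance of selection.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Select every k-th unit after a random start, where k = N // n."""
    k = len(frame) // n  # sampling interval
    if k < 1:
        raise ValueError("sample size n cannot exceed the frame size")
    start = random.Random(seed).randrange(k)  # random start in [0, k)
    # Truncate to n in case the frame length is not an exact multiple of n.
    return frame[start::k][:n]


# Hypothetical list of 1,000 household IDs; every 10th is taken.
frame = [f"HH-{i:04d}" for i in range(1, 1001)]
sample = systematic_sample(frame, n=100, seed=3)
print(sample[:3])
```

If the list is sorted in a way that cycles with the interval k, the sample can be biased, so the ordering of the list deserves a check before using this method.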
Stratified Sampling
• Stratified sampling is a commonly used probability method that reduces sampling error. A stratum is a subset of the population that shares at least one common characteristic.
  • The researcher first identifies the relevant strata, and then random sampling is used to select a sufficient number of subjects from each stratum.
  • Stratified sampling is often used when one or more of the strata in the population have a low incidence relative to the other strata.
  • What are some cases when this would be important?
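The two steps of stratified sampling (group the frame into strata, then sample at random within each stratum) can be sketched as follows. The frame, the stratum labels and the function name are hypothetical; equal allocation per stratum is an assumption chosen to show how a low-incidence stratum can still receive enough observations.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Group units into strata, then draw a simple random sample within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {
        name: rng.sample(members, min(n_per_stratum, len(members)))
        for name, members in strata.items()
    }


# Hypothetical frame: 50 connected and 950 unconnected households.
households = [(i, "connected" if i < 50 else "unconnected") for i in range(1000)]
sample = stratified_sample(households, stratum_of=lambda hh: hh[1], n_per_stratum=30, seed=7)
for name, members in sample.items():
    print(name, len(members))
```

With equal allocation the rare stratum is oversampled relative to its population share, so population-level estimates would need sampling weights.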
Convenience Sampling
• Convenience sampling is used in exploratory research where the researcher is interested in getting an inexpensive approximation of the truth.
  • The sample is selected because they are convenient.
  • This nonprobability method is often used during preliminary research efforts to get a gross estimate of the results, without incurring the cost or time required to select a random sample.
  • Examples: “That person looks like they will answer my survey…,” “That house is closer…” or, “That village is less expensive to get to…”
Probability proportional to size
• Probability proportional to size (PPS) is a sampling technique for use with surveys in which the probability of selecting a sampling unit (e.g., village, zone, district, school) is proportional to the size of its population.
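One widely used way to implement PPS is systematic selection along the cumulative size totals. The sketch below assumes that variant; the village names, household counts and the function name `pps_systematic` are hypothetical. A unit whose size exceeds the sampling interval can be selected more than once.

```python
import random
from itertools import accumulate


def pps_systematic(sizes, n, seed=None):
    """Select n units with probability proportional to size (systematic PPS)."""
    names = list(sizes)
    cumulative = list(accumulate(sizes[name] for name in names))
    interval = cumulative[-1] / n  # total size divided by number of selections
    start = random.Random(seed).uniform(0, interval)  # random start
    picks, idx = [], 0
    for j in range(n):
        point = start + j * interval
        # Advance to the unit whose cumulative size range contains the point.
        while idx < len(cumulative) - 1 and cumulative[idx] < point:
            idx += 1
        picks.append(names[idx])
    return picks


# Hypothetical villages with their number of households.
villages = {"Village A": 120, "Village B": 300, "Village C": 80, "Village D": 500}
print(pps_systematic(villages, n=2, seed=5))
```

Larger villages cover a longer stretch of the cumulative scale, so they are more likely to contain one of the selection points.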
Multi-stage sampling
• Multistage sampling is a complex form of sampling in which two or more levels of units are embedded one in the other.
  • The first stage consists of constructing the clusters that will be used to sample from. In the second stage, a sample of primary units is randomly selected from each cluster. All ultimate units (individuals, for instance) selected at the last step of this procedure are then surveyed.
  • This technique involves taking random samples from preceding random samples.
  • You pick multistage sampling to maximize the efficiency of your design!!!
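A two-stage design is the simplest case of taking random samples from preceding random samples. The sketch below assumes villages as clusters and households as ultimate units; the frame, the sample sizes and the function name are hypothetical.

```python
import random


def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: sample clusters. Stage 2: sample units within each sampled cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }


# Hypothetical frame: 20 villages with 30 households each.
villages = {
    f"village_{v:02d}": [f"v{v:02d}_hh{h:02d}" for h in range(30)]
    for v in range(20)
}
sample = two_stage_sample(villages, n_clusters=5, n_per_cluster=7, seed=11)
print({name: len(hhs) for name, hhs in sample.items()})
```

Field teams only visit the sampled villages, which is where the cost savings of a multi-stage design come from.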
But what about sample design?
• How do sampling techniques factor into Sample Design?
  • Sampling methods are just one part of the sample design.
  • Remember that sample design can be very complex, because we are trying to capture enough information in our sample to be able to test outcomes.
  • Remember from Tuesday…
A good sample design requires expert knowledge…
• A good summary is provided by Duflo (2006):
  • The power of the design is the probability that, for a given effect size and a given statistical significance level, we will be able to reject the hypothesis of zero effect. Sample sizes, as well as other (evaluation & sample) design choices, will affect the power of an experiment.
• There are lots of things to consider, such as:
  • The impact estimator to be used
  • The test parameters (power level, significance level)
  • The minimum detectable effect
  • Characteristics of the sampled (target) population: population sizes for potential levels of sampling, means, standard deviations, and intra-unit correlation coefficients (if multistage sampling is used)
  • The sample design to be used for the sample survey
The basic process is this…
• Level of Power
• Level of Hypothesis Tests
• Correlations in outcomes within groups (ICCs)
• Mean and Variance of outcomes & MDES
And the mathematical formula is this…
$$MDES = \left(t_{(1-k)} + t_{\alpha}\right) \sqrt{\frac{1}{P(1-P)}} \sqrt{\frac{\sigma^2}{N}}$$
The minimum detectable effect size (MDES) for a given t-test statistic (t) with power (k), significance level (α), outcome variance (σ²), sample size (N) and portion of subjects allocated to the treatment group (P).
This equation can show a lot of things:
(1) Trade-off between power and size: when the size (significance level) of the test decreases, t_α increases, so that the minimum effect size increases for a given power.
(2) The MDES drives the whole equation: if you have a larger MDES, the sample size (N) may be lower to test at the same level of significance (α).
(3) The MDES is minimized when the portion of subjects allocated to the treatment group is equal to 0.5.
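The formula can be turned into a few lines of code to see how the pieces interact. This is a sketch under stated assumptions, not the calculation used for the Water Hibah design: it uses standard normal critical values in place of t values (reasonable for large N), treats the test as two-sided by default, and the function name `mdes` and its defaults are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mdes(n, p=0.5, sigma=1.0, power=0.80, alpha=0.05, two_sided=True):
    """Minimum detectable effect size for sample size n and treatment share p.

    Assumption: normal critical values approximate the t values in the formula.
    """
    z = NormalDist()
    t_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    t_power = z.inv_cdf(power)
    return (t_power + t_alpha) * sqrt(1 / (p * (1 - p))) * sqrt(sigma**2 / n)


# Illustrative values only (sigma = 1 means the result is in standard deviations).
print(f"N=1000, P=0.5: {mdes(1000):.3f}")
print(f"N=1000, P=0.3: {mdes(1000, p=0.3):.3f}")
print(f"N=4000, P=0.5: {mdes(4000):.3f}")
```

Moving P away from 0.5 raises the MDES, and quadrupling N halves it, which matches points (2) and (3).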
Additional factors…
• The equation becomes more complex when introducing other factors, such as:
  • Multi-stage designs, clusters, etc.
  • Unknown variances on your outcome variables
  • Costs of treatment that do not allow for equal treatment and control groups
  • Budget ceilings
  • Multiple treatments with different MDES (Example: expenditures on water vs. impacts on education levels)
  • Grouped errors
TEST MULTIPLE DESIGNS…
• In practice, the survey expert/statistician will test a number of different design options to determine which one will be the most efficient, given the information at hand.
  • Try different sampling strategies.
  • If you have baseline and endline data, you can improve efficiency by using a Diff-in-Diff estimator.
  • Incorporate the adjustment to standard errors from evaluation design choice.
  • Calculate ICCs to determine if clustering or multi-stage designs are better.
  • Check various stratifications to see if this improves efficiency.
  • Adjust significance levels based on the rigor that is needed.
  • Incorporate marginal costs, based on previous work and/or input from the field teams.
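To illustrate the ICC step, the sketch below uses the standard one-way ANOVA estimator of the intra-cluster correlation for equal-sized clusters, together with the usual design effect 1 + (m - 1) × ICC for clusters of size m. The data and function names are hypothetical, and the equal-cluster-size restriction is a simplifying assumption.

```python
from statistics import mean


def icc_oneway(clusters):
    """One-way ANOVA estimator of the intra-cluster correlation (equal-sized clusters)."""
    k = len(clusters)
    m = len(clusters[0])
    if any(len(c) != m for c in clusters):
        raise ValueError("this sketch assumes equal-sized clusters")
    grand = mean(x for c in clusters for x in c)
    # Mean squares between and within clusters.
    msb = m * sum((mean(c) - grand) ** 2 for c in clusters) / (k - 1)
    msw = sum((x - mean(c)) ** 2 for c in clusters for x in c) / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)


def design_effect(icc, m):
    """Variance inflation from sampling m units per cluster instead of a simple random sample."""
    return 1 + (m - 1) * icc


# Hypothetical outcome values for 3 villages with 4 households each.
villages = [[10, 12, 11, 13], [20, 19, 22, 21], [15, 14, 16, 15]]
rho = icc_oneway(villages)
print(f"ICC = {rho:.2f}, design effect for m=14: {design_effect(rho, 14):.2f}")
```

A high ICC means households within a village look alike, so adding villages tends to buy more precision than adding households per village.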
Typically you calculate MDES with different sample sizes
Power Function Corresponding to Different Sample Sizes
Columns give the Minimum Detectable Effect, D (as fraction of standard deviation)
Sample Size of Each Design Group | D = 0 | D = .05 | D = .10 | D = .15 | D = .20
1500 | .05 | .17 | .39 | .65 | .86
1750 | .05 | .18 | .43 | .71 | .90
2000 | .05 | .20 | .47 | .76 | .93
2500 | .05 | .22 | .54 | .84 | .97
3000 | .05 | .25 | .61 | .89 | .98
3500 | .05 | .27 | .67 | .93 | .99
4000 | .05 | .30 | .72 | .95 | .995
4500 | .05 | .32 | .76 | .97 | .999
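A power table of this kind can be generated with a short function. The sketch below is a generic normal-approximation calculation, not the authors' method: `variance_factor` is a hypothetical parameter in which 2 corresponds to a plain difference in means between two independent, equal-sized groups, and larger values stand in for design effects or more complex estimators. As an observation about the numbers only, a one-sided 5% test with `variance_factor=8` comes close to most of the values tabulated above.

```python
from math import sqrt
from statistics import NormalDist


def power(n_per_group, effect_sd, alpha=0.05, variance_factor=2.0, two_sided=False):
    """Approximate power to detect an effect expressed in standard deviation units.

    Assumption: the estimator's standard error is sqrt(variance_factor / n_per_group).
    """
    z = NormalDist()
    se = sqrt(variance_factor / n_per_group)
    crit = z.inv_cdf(1 - (alpha / 2 if two_sided else alpha))
    # For two-sided tests this ignores the (usually negligible) opposite tail.
    return z.cdf(effect_sd / se - crit)


# Illustrative grid: rows are sample sizes, columns are effect sizes.
for n in (1500, 3000, 4500):
    row = [power(n, d, variance_factor=8.0) for d in (0.0, 0.05, 0.10, 0.15, 0.20)]
    print(n, " ".join(f"{p:.2f}" for p in row))
```

Reading across a row shows how power rises with the effect size; reading down a column shows how it rises with the sample size.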
Evaluation Design and Sample Design...
• Evaluation Design
• Sample Design
Allow for estimation and hypothesis testing of the impacts!!!
How does this all tie together?
In the case of the Phase 1 Water Hibah?
An example: Socio-econ impact of Endline Water Hibah
• Provision of services to villages and households under the Water Hibah is not determined by randomization, but by assessment and WTP.
• The dataset design exhibits some characteristics of a controlled experiment with connected and unconnected households, but the connection decision is not determined by randomization.
• Household matching is not an efficient method given the potential discrepancies we identified in the pilot test, and it does not work very well with the sample design that was chosen.
• Village-level matching is not feasible because there are usually connected and unconnected households in a single village (locality).
• The design we have chosen is a pretest-posttest nonequivalent-control-group quasi-experimental design that will use regression-adjusted Difference-in-Difference impact estimators.
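The core of a Difference-in-Difference estimate is the change in the connected group minus the change in the unconnected group. The sketch below shows only this unadjusted version with made-up numbers; the regression-adjusted estimator named in the design would additionally control for covariates, which is not shown here. The function name and data are hypothetical.

```python
from statistics import mean


def diff_in_diff(treat_pre, treat_post, control_pre, control_post):
    """Unadjusted DiD: change in the treatment group minus change in the control group."""
    treat_change = mean(treat_post) - mean(treat_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treat_change - control_change


# Hypothetical outcome values at baseline (pre) and endline (post).
connected_pre, connected_post = [10, 12], [15, 17]
unconnected_pre, unconnected_post = [10, 12], [12, 14]
print(diff_in_diff(connected_pre, connected_post, unconnected_pre, unconnected_post))
```

Subtracting the control group's change removes trends that affect both groups, which is why the design needs both baseline and endline data on both groups.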
An example: Socio-econ impact of Endline Water Hibah
• Outcome indicators: we have simplified versions of them in the baseline, but they have been modified for endline → use the baseline dataset to calculate ICCs.
• Highest variation in outcome indicators was identified across villages (localities) → the primary sample unit is the village.
• The # of households in the village was found to improve the efficiency of the design → stratify villages based on the # of households.
• Marginal costs of village visit vs. household visit were included.
• The final sample design that was identified is referred to as: Stratified Multi-stage sampling with 250 villages and 7-14 households per experimental group = 7,000 hhs.
Data Collection Management
Now that we have a design in place… what do you do?
Training & Field Control…
• Initial Steps
• Publicity
• Coordination with local officials
• Recruitment of Staff
• Training
• Piloting
• Fielding
• Field Control
Publicity
• Conduct an awareness campaign
• Have a regular column in a newspaper
• Have regular segments in the broadcast media
• Prepare brochures for distribution
• Sometimes this is not feasible, but it is nice to make your eligible sample aware of the possibility that they will be surveyed.
Coordination
• Ministry Officials
• Local Officials
• Local VIPs/Village Leaders
• Key Stakeholders
• IndII projects involve a number of parties (e.g., LGs, PDAMs, households, etc.). What are the coordination efforts taken on IndII projects?
Team Composition
Field Managers
• Tracking and reporting on field issues
• Applying the field sampling methodology
• Assigning units for enumeration
• Disposition of cases and field validation checks
• Completing the central office receipt control sheets
• Giving out daily assignments to Field Interviewers and ensuring that daily data collection activities run smoothly
Team Composition
Editors
• Applying the field sampling methodology
• Field checking each completed survey for internal consistency and completeness
• Marking the disposition of surveys as complete
• Assuring that surveys are receipted appropriately and delivered to the central office for data entry
Team Composition
Enumerators
• Completing the interviews and ensuring that all appropriate units are surveyed
• Assisting supervisors in applying the field sampling methodology
• Data checks and editing
• Documenting the status of particular cases and assigning disposition codes
Training is essential!!!
Thorough training is essential to ensure that both the interviewers and supervisors have the necessary knowledge and skills to collect valid and reliable data. The purpose of training is to:
• Ensure a standardized application of the survey materials
• Clarify the rationale of the study and study protocol
• Motivate interviewers
• Provide practical suggestions
• Improve the overall quality of the data
• Allow 2-3 weeks for training and pilot testing
Training topics
• Opening and logistics of training
• Introduction to the Project
• Survey design and methodology
• Sampling and enumeration
• Introduction to field supervision
• Detailed review of each survey module
• Survey logistics
• Role playing and interview techniques
• Gaining cooperation
• Reducing Bias
• Ethics in survey research
• Gender issues in conducting an interview
• Controlling an interview - Probing
Pilot testing is mandatory…
• The main purpose of pilot testing is to catch potential problems before they become costly mistakes.
• It is typically used if an instrument or method of data collection is being used for the first time, or for the first time with a particular group.
• Pilot testing provides information on how long data collection can be expected to take and a preview of how difficult items will be to complete.
  • The latter is important because, with proper advance notice, you can modify questions and possibly even the way you collect information.
Quality Control is crucial…
Valid and reliable data is based on rigorous quality control standards
• Observation
• Editing
• Spot Checks
• Re-interviews
• Validation
For tomorrow…
We will talk more about quality control and questionnaire design practices.