Advantages & Challenges of collecting & using longitudinal studies for research and policy.
Marta Favara, Senior Research Officer & Paul Dornan, Senior Policy Officer
Young Lives, University of Oxford
DFID Statistics Conference
6 September 2016
1. Advantages and challenges of collecting and using
longitudinal studies for research and policy
Marta Favara, Senior Research Officer
Paul Dornan, Senior Policy Officer
Young Lives, University of Oxford
Professional Development Conference
6th September, 2016
2. Outline of this presentation
1. Value of longitudinal (cohort) studies vs. cross-sectional data & RCTs
2. Overview of the Young Lives study, a unique multi-country, multi-cohort longitudinal
dataset
3. Main areas of policy-relevant research
4. Behind the scenes
– Processes in place for designing and implementing the survey questionnaire
– Multi-cohort longitudinal data: main challenges and risk-mitigation strategies
3. Longitudinal cohort studies
• Allow a holistic approach
• Enhance understanding of how outcomes are shaped:
– Allow links to be identified between earlier circumstances and later (long-term) outcomes
– Identify what shapes later well-being and when differences emerge
• Testing the ‘dynamics’ of social processes:
– Enable evaluation of the differing impacts of continuing circumstances (or one-off
changes) on later well-being, for example the consequences of chronic poverty
RCT
• RCTs can give precise answers to specific questions, evaluating the specific
changes in well-being attributable to a particular programme, but:
– They can only answer the question posed by the trial.
– External validity concerns
– Not able to look at long-term effects (cohort maintenance, costs)
Cross-sectional
• Representativeness
• Easier and cheaper to administer
• Useful for drawing a picture about a specific aspect of the society (e.g. DHS).
Value of longitudinal (cohort) studies vs. cross sectional data & RCT
4. These are not competing methodologies: rather, each can be employed to triangulate
between methods, and one can be used to inform the others (particularly relevant in
developing countries).
Triangulate between methods
(C = cross-sectional, L = longitudinal, RCT = randomised controlled trial)
C, L • Observe the problem
L • Understand the origin of the problem (t-n, …, t-2, t-1, t, t+1, t+2, …, t+n)
L • Identify areas worth examining in greater detail
RCT • Test different solutions and their effectiveness in the short and medium term
L • Understand the post-intervention dynamics and the effectiveness in the long run
C • Draw a representative snapshot of the intervention status quo
6. Young Lives: a unique multi-country, multi-cohort
longitudinal study
7. – Sentinel site sampling; four-stage sampling process (region, district/province, sentinel
site, random sampling of children within sites);
– Purposively over-sampled poor areas (40% urban / 60% rural) using different poverty
indicators in each country
Ethiopia India Peru Vietnam
Sampling design
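The last stage of the sampling process above (random sampling of children within sentinel sites) can be sketched as follows. This is a minimal illustration, not the study's actual procedure: the site names, child IDs and the function `sample_children_within_sites` are hypothetical, and the figure of 100 children per site is the younger-cohort target quoted on the country sampling slides later in the deck. The earlier stages (regions, districts, sites) were selected purposively and are not modelled here.

```python
import random


def sample_children_within_sites(eligible_by_site, n_per_site, seed=0):
    """Draw a simple random sample of children within each sentinel site.

    eligible_by_site: dict mapping a site name to a list of eligible child IDs
    n_per_site: number of children to draw in every site
    seed: fixed seed so the draw can be reproduced (an assumption of this sketch)
    """
    rng = random.Random(seed)
    sample = {}
    for site, children in eligible_by_site.items():
        if len(children) < n_per_site:
            raise ValueError(f"site {site!r} has too few eligible children")
        # random.sample draws without replacement, so no child is picked twice
        sample[site] = rng.sample(children, n_per_site)
    return sample


# Hypothetical listing: 3 sites, each with 250 eligible child IDs.
eligible = {
    f"site_{s}": [f"site_{s}_child_{i}" for i in range(250)]
    for s in range(1, 4)
}
selected = sample_children_within_sites(eligible, n_per_site=100)
print({site: len(ids) for site, ids in selected.items()})
```

Because only the final stage is random, a sample drawn this way is representative of the selected sites rather than of the country as a whole, which is consistent with the pro-poor over-sampling described above.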
8. 1. Longitudinal data covering a period of 15 years from early childhood to adulthood
2. A life-course approach, very relevant for policy design (early childhood, middle
childhood and adolescence)
3. Cross-cohort, cross-country comparisons
– Compare two cohorts at the same age (trends, exposure to different policy context)
– A new generation (Children of YL’s children)
4. Use (panel) sibling data to investigate how household or community circumstances
affect child outcomes at the same age; explore intra-household dynamics; control for
the influence of past events and circumstances
5. Comprehensive set of information collected at community and household level
(caregiver, YL child, a subsample of (younger) siblings and the children of the YL
children)
Nice features of YL data
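The cross-cohort comparison "at the same age" can be illustrated with a small lookup that finds the survey rounds in which the two cohorts were the same age. The approximate ages per round used below are assumptions made for this sketch (only ages 15 and 22 are mentioned elsewhere in the deck); the dictionary `AGES_BY_ROUND` and the function name are hypothetical.

```python
# Approximate age of each cohort at each survey round.
# ASSUMPTION: illustrative values, not taken from the slides.
AGES_BY_ROUND = {
    "YC": {1: 1, 2: 5, 3: 8, 4: 12, 5: 15},    # younger cohort
    "OC": {1: 8, 2: 12, 3: 15, 4: 19, 5: 22},  # older cohort
}


def same_age_comparisons(ages_by_round):
    """Return (age, YC round, OC round) for every age observed in both cohorts."""
    yc_round_at_age = {age: rnd for rnd, age in ages_by_round["YC"].items()}
    pairs = []
    for oc_round, age in sorted(ages_by_round["OC"].items()):
        if age in yc_round_at_age:
            pairs.append((age, yc_round_at_age[age], oc_round))
    return pairs


for age, yc_round, oc_round in same_age_comparisons(AGES_BY_ROUND):
    print(f"age {age}: compare YC in round {yc_round} with OC in round {oc_round}")
```

Under these assumed ages, each match pairs a later round of the younger cohort with an earlier round of the older cohort, which is what lets the same age be observed in two different policy contexts.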
9. 1. Nutrition & Health
2. Education: School Effectiveness; Learning Trajectories and Skills Formation
over the Life-cycle
3. Pathways to and from Marriage and Parenthood
4. Transition to the Labour Market
5. Poverty & Inequality
Main areas for policy relevant research
10. Nutrition & Health
• What are the long-run effects of early childhood malnutrition? What are the impacts on
the development of cognitive skills and psycho-social competencies?
• What are the incidence, extent and determinants of growth recovery and failure in adolescence?
• What are the nature and determinants of maternal malnutrition during the life-cycle and the
implications for maternal and child outcomes?
• What predicts risky behaviours (smoking, drinking, drugs, criminal behaviours)?
Main areas for policy relevant research (1)
11. Education: School Effectiveness
• What are the characteristics of effective schools (including teacher and management
characteristics, public vs. private schools, language of instruction, etc.)?
• What lessons may be learned across contexts concerning school effectiveness and
educational policies?
Education: Learning Trajectories and Skills Formation over the Life-cycle
• To what extent is schooling important in shaping children’s learning and the formation of
cognitive, non-cognitive and technical skills? At which stages of the educational life-course
is schooling more or less ‘critical’?
• At what stages do learning gaps emerge, widen or narrow?
Main areas for policy relevant research (2)
12. Pathways to and from Marriage and Parenthood
• What are the (early) social and economic predictors of getting married, cohabiting or having
a child during the teen years?
– What is the role of parental and childhood expectations and aspirations, as well as
gender norms and preferences?
– How do changes in the labour market affect young people’s relationships and decisions
around marriage and parenthood?
• What are the social and economic consequences of getting married, cohabiting or having a
child during the teen years? (e.g. women’s economic participation)
• What affects the quality of married life and decision-making for married/cohabiting
adolescent girls, boys and couples?
Main areas for policy relevant research (3)
13. Transition to the Labour Market
• What happens to young women and men when they leave education and enter the labour
market at the ages of 15 and 22? How many of them are employed (and self-employed),
unemployed, inactive or underemployed?
• How do their background and experiences as children shape their access to the labour market?
• What skills facilitate the transition to the labour market and to “quality” jobs? To what
extent are education and training effectively equipping youth with the “right” skills for the
labour market?
• To what extent have young people realised their childhood aspirations? What role do
expectations play?
• How is the school-to-work transition of young people related to other parallel key early-life
transitions, including cohabitation, marriage and childbearing? How do young people reconcile
paid activities with other responsibilities?
Main areas for policy relevant research (4)
14. Poverty & Inequality
• Exploring the links between childhood poverty, the strategies people use to earn their living
and the assets available to them, and the implications for children’s long-term life chances.
• How do inequalities interact in the ways they impact on children’s development potential?
• How do inequalities, including gender inequalities, evolve during early, middle and later
childhood?
• The impact of transfers and social protection.
Main areas for policy relevant research (5)
15. – Challenge 1. Cohort maintenance
– Challenge 2. Getting comparable measures over time
– Challenge 3. Cross-country coordination/comparability
– Challenge 4. Ensuring high-quality data
– Challenge 5. Data collection methods: switch to CAPI
Multi-cohort longitudinal data: Main Challenges
16. Challenges:
– Some attrition is inevitable
– Cohort is relatively small for a longitudinal study
– Study period is relatively long (three-year gap between waves)
Risk-mitigating strategies:
– Collecting detailed contact information
– Importance of tracking
– Maintaining continuity of social contact and trust between researchers and families
– Reducing refusal rates as much as possible:
₋ Importance of explaining what we’re doing
₋ Reciprocity
₋ Ensuring no respondents are overloaded (by different elements/sub-studies)
₋ Compensation (losing a day of work has a big impact on income)
Challenges: 1. Cohort maintenance & attrition
17. …and we have been quite successful!
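Attrition rates by country and cohort (YC = Younger Cohort, OC = Older Cohort)
Country     YC      OC      Overall
Ethiopia    2.2%    8.4%    4.3%
India       2.6%    4.3%    3.2%
Peru        6.3%    10.3%   7.3%
Vietnam     2.9%    9.9%    5.3%
Total       3.6%    8.1%    5.0%
[Figure: country panels for Ethiopia, India, Peru and Vietnam]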
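One way to read the "Overall" column is as a cohort-size-weighted average of the YC and OC rates. The sketch below checks that reading under two assumptions that are not stated on the slide: that the overall rate is indeed size-weighted, and that the cohorts have the nominal sizes of 2,000 (YC) and 1,000 (OC) quoted elsewhere in the deck. The function name `overall_attrition` is hypothetical.

```python
def overall_attrition(rate_yc, rate_oc, n_yc, n_oc):
    """Size-weighted average of two cohort attrition rates (in percent)."""
    return (rate_yc * n_yc + rate_oc * n_oc) / (n_yc + n_oc)


# ASSUMPTION: nominal cohort sizes; actual country samples differ somewhat.
N_YC, N_OC = 2000, 1000

ethiopia = overall_attrition(2.2, 8.4, N_YC, N_OC)
india = overall_attrition(2.6, 4.3, N_YC, N_OC)
print(f"Ethiopia: {ethiopia:.1f}%  India: {india:.1f}%")
```

With nominal sizes this matches the Ethiopia and India rows to one decimal place; the other rows would need the actual cohort sizes in each country, which the slide does not give.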
18. Challenges:
– The questions need to change as the children grow up
– Keep as many questions as possible the same across rounds (panel variables)
– Asking the same questions of the YC as we did the OC in earlier rounds (core base
variables)
– Ensure comparability over time (e.g. cognitive tests, using Item Response Theory)
Limitations for comparability:
- Switch from PAPI to CAPI
- Some changes in the structure of the questionnaire are inevitable
- ‘Getting stuck’ with the errors of the past for the sake of maintaining comparability across
rounds
Challenges: 2. Getting comparable measures over time
19. Benefits:
– How patterns of relationships are similar/different across countries.
– Understanding why and how specific policies or programmes are effective in one
country.
– Comparative analysis can give greater confidence that evidence found in one country is
applicable to others.
– Learning in relation to methods: trying to develop measures that can be used across
cultures.
Challenges:
– Constructing a questionnaire that suits different national contexts.
– Ethical committee approval and country-specific sensitivities.
– Dealing with different fieldwork processes.
Risk-mitigating strategies:
– Define research priorities and relevant survey questions in each country
– Allow for some country variations
– Translation and back-translation are key to ensuring consistency
– Continuity of country team leaders and fieldworker coordinators.
Challenges: 3. Cross-country coordination and comparability
20. Challenges:
– Maintaining increasingly complex survey instruments
– Maintaining strong coordination and liaison between Quant/Qual/ School survey teams
– Participant recall
Risk mitigating strategies:
– Piloting and training are crucial!
₋ Ensure research questions work in the field and are consistent with local situations
and children’s ages
₋ Ensure questionnaires are not too long / burdensome
₋ Train teams and learn from practical experience of field work
₋ Produce accurate instrument manuals and protocols
₋ Ensure that good data collection systems are in place
– Consistency checks are embedded in CAPI and some information is prefilled; any
remaining inconsistencies can be resolved ex post
Challenges: 4. Quality of the data
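The kind of consistency check that can be embedded in a CAPI application, comparing an interview record against prefilled information from the previous round, might look like the sketch below. The field names (`age_years`, `sex`, `time_use_hours`) and the three rules are hypothetical examples chosen for illustration; they are not the checks used in the Young Lives instruments.

```python
def find_inconsistencies(record, previous_round):
    """Return a list of problems found in one interview record."""
    problems = []
    # A respondent cannot be younger than at the previous visit.
    if record["age_years"] < previous_round["age_years"]:
        problems.append("age_years is lower than in the previous round")
    # Prefilled, time-invariant information should not change between rounds.
    if record["sex"] != previous_round["sex"]:
        problems.append("sex differs from the prefilled value")
    # Hours reported for a typical day must fit into 24 hours.
    total_hours = sum(record["time_use_hours"].values())
    if total_hours > 24:
        problems.append(f"time use adds up to {total_hours} hours")
    return problems


previous = {"age_years": 12, "sex": "F"}
current = {
    "age_years": 11,
    "sex": "F",
    "time_use_hours": {"school": 8, "work": 6, "sleep": 9, "play": 3},
}
for problem in find_inconsistencies(current, previous):
    print("CHECK:", problem)
```

Running such checks during the interview lets the fieldworker correct the answer on the spot; the same function could be re-run on the full dataset to flag what is left for ex-post cleaning.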
21. • CAPI, introduced in R4, is a different way of doing surveys (e.g. it changes the
dynamics of the interview)
Benefits:
– Eliminates data entry errors
– Shows how work is progressing
– Avoids mistakes before they happen (embedded skip patterns)
Challenges:
– Requires more time at the front end (building the programme)
– Fieldworkers need to become familiar with a new instrument
– Data management and transfer systems need to be put in place
– Responsibilities are devolved to the in-country data managers (in Peru and Vietnam)
Risk mitigating strategies:
– Extra effort at the front end in programming
– Piloting and testing the application is crucial!
– Training country data managers and fieldworkers on data management and
transfer systems.
Challenges: 5. Introducing CAPI
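An embedded skip pattern means that the application, rather than the fieldworker, decides which question comes next based on earlier answers. The sketch below shows the idea with a hypothetical four-question module; the question IDs, wording and the `ask_if` convention are assumptions made for illustration and do not reflect any particular CAPI package.

```python
# Each question may carry an "ask_if" rule that looks at the answers so far.
QUESTIONS = [
    {"id": "enrolled", "text": "Is the child currently enrolled in school?"},
    {"id": "grade", "text": "Which grade is the child in?",
     "ask_if": lambda answers: answers.get("enrolled") == "yes"},
    {"id": "reason_not_enrolled", "text": "Why is the child not enrolled?",
     "ask_if": lambda answers: answers.get("enrolled") == "no"},
    {"id": "worked", "text": "Did the child work for pay in the last 12 months?"},
]


def run_interview(responses):
    """Walk through QUESTIONS, skipping any whose ask_if rule is not met.

    responses: dict of question id -> answer, standing in for the respondent.
    Returns the ids actually asked and the answers recorded.
    """
    answers = {}
    asked = []
    for question in QUESTIONS:
        rule = question.get("ask_if")
        if rule is not None and not rule(answers):
            continue  # skipped automatically, so it cannot be asked by mistake
        asked.append(question["id"])
        answers[question["id"]] = responses[question["id"]]
    return asked, answers


asked, answers = run_interview(
    {"enrolled": "no", "reason_not_enrolled": "work", "worked": "yes"}
)
print(asked)
```

Writing and testing rules like these is part of the extra front-end programming effort noted above, which is the price of preventing routing errors in the field.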
24. Young Lives in a nutshell
₋ Multi-disciplinary study that aims to:
- improve understanding of childhood poverty and inequalities
- provide evidence to improve policies & practice
₋ Young Lives components:
Household survey (child, caregiver, younger siblings, children of the YL children,
community representatives); longitudinal qualitative research; school survey, run in parallel
to rounds 2 and 5 of the household survey.
₋ Following nearly 12,000 children in 4 countries: Ethiopia; India (Andhra Pradesh &
Telangana); Peru and Vietnam
₋ Over a 15-year period: first data collected in 2002, with 5 survey rounds
₋ Two age cohorts in each country:
- 2,000 children born in 2001-02
- 1,000 children born in 1994-95
₋ Collaboration:
- Partners in each study country
- Publicly archived survey data (UK Data Archive and listed on the World Bank Micro Data
website) and core-funded by DFID
25. Step 1: Design The Survey Questionnaire
Step 2: Tracking and preparing CAPI programme
Step 3: Training and piloting
Step 4: Fieldwork
Step 5: Data cleaning, validations
Step 6: Preliminary analysis and Research
Six steps from design to the field
26. – Demographic information (hh roster), socio economic indicators (wealth index,
food consumption)
– Health information and anthropometrics (YL child, parents, siblings and children of
YL children)
– Education history (all hh members) and cognitive skills (YL child, siblings)
– Subjective wellbeing and psychosocial competencies (YL child, siblings)
– Employment status/history and time use (all hh members)
– Job related skills
– Job and Educational Aspirations/expectations (YL child, parents)
– Expectations about marriage and parenthood (YL child, parents)
– Fertility history
– Marriage/cohabitation history
– Control over assets (intra-household decision making)
– Social norms indicators
– Knowledge on SRH and access to contraceptives
– Sexual behaviours, risky behaviours and criminal activities (Peru)
Information collected
27. Source: Outes and Dercon, 2008
Non-random attrition
₋ Attriting households (R1-R2) tend to have fewer assets, poorer access to services and
utilities and are less educated (more in Ethiopia and India than Peru and Vietnam)
(Panel A)
₋ These averages hide substantial variation between different types of attriting
households (Panel B)
₋ The presence of non-random attrition does not necessarily imply attrition bias: no
attrition bias found when looking (Ethiopia is an exception)
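A first check for non-random attrition is to compare the baseline characteristics of households that later left the study with those that stayed. The sketch below does this with a difference in means and a Welch-type t statistic. The wealth-index values are invented for illustration, the function name is hypothetical, and this is not the method used by Outes and Dercon (2008).

```python
from math import sqrt
from statistics import mean, variance


def compare_baseline(attriters, stayers):
    """Difference in baseline means (attriters minus stayers) and its t statistic."""
    diff = mean(attriters) - mean(stayers)
    # Standard error allowing the two groups to have different variances
    se = sqrt(variance(attriters) / len(attriters)
              + variance(stayers) / len(stayers))
    return diff, diff / se


# HYPOTHETICAL round-1 wealth index values (0 = poorest, 1 = richest).
wealth_attriters = [0.21, 0.30, 0.25, 0.18, 0.27]
wealth_stayers = [0.35, 0.42, 0.31, 0.38, 0.40, 0.29, 0.36]

diff, t_stat = compare_baseline(wealth_attriters, wealth_stayers)
print(f"difference in mean wealth index: {diff:.3f} (t = {t_stat:.2f})")
```

A negative difference would indicate that attriting households were poorer at baseline. As the slide notes, this shows only that attrition is non-random; whether it biases estimates has to be tested separately on the relationships of interest.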
28. Ethiopia
Sampling design (1)
Four-stage sampling process:
1. Regions (Amhara, Oromia, SNNPR, Tigray
and Addis Ababa, accounting for 96% of
national population)
2. Woredas (districts) (3-5 districts in each
region, 20 in total)
3. Kebele (at least 1 for each woreda)
4. 100 young children (born in 2001-02)
and 50 older children (born in 1994-5)
were selected within those sites.
Criteria to select districts:
1. Districts with food deficit profile
2. Districts which capture diversity across
regions and ethnicities in both urban and
rural areas
3. Manageable costs in terms of tracking for
future rounds
Comparison with DHS and WMS 2000:
Poor hh are over-sampled, but YL covers the
diversity of children in the country, including
up to the 75th percentile of the Ethiopian
population.
29. India
Sampling design (2)
Four-stage sampling process:
1. AP, Telangana
2. Districts
3. 20 sentinel sites (defined by mandal)
4. 100 young children (born in 2001-02) and 50
older children (born in 1994-5) were randomly
selected within those sites.
Criteria to (rank &) select districts & mandals:
1. Economic development (per capita income, %
of urban population)
2. Human development (female literacy, infant
mortality, etc.)
3. Infrastructure (total length of road per
100km2, n. of hospital beds per 10,000
people).
- One poor and one non-poor district/mandal in
each region/district (districts selected for
sampling covered approximately 28% of the
state population )
Comparison to the DHS 1998/9:
YL hh seem to be slightly wealthier than the
average household in Andhra Pradesh. The YL sample
covers the diversity of children in poor households
in Andhra Pradesh.
30. Peru
Sampling design (3)
Sampling process:
1. Sample frame at district level, excluding the
richest 5% of districts based on the 2001
poverty map
2. Districts divided in population groups
ordered by poverty index and randomly
selected to cover rural, urban, peri-urban
coastal, mountain and amazon areas
(random selection proportional to district
population)
3. Within the selected districts a village was
randomly chosen
4. Within each village the street blocks were
counted and randomly numbered to select
the starting point.
Comparison to the DHS 2000:
YL covers the diversity of children and hh in Peru
31. Vietnam
Sampling design (4)
Four-stage sampling process:
1. Regions (5 of 8 regions: North-East region, Red
River Delta, City, South Central Coast, Mekong
Delta)
2. Provinces (5 in total, 1 per region: Lao Cai,
Hung Yen, Da Nang, Phu Yen, Ben Tre)
3. Sentinel sites (4 communes per province: 2 poor,
1 average and 1 above-average commune)
4. 100 young children (born in 2001-02) and 50
older children (born in 1994-5) were selected
within those sites.
Criteria to rank communes:
1. Development of infrastructure,
2. Percentage of poor households in the
commune
3. Child malnutrition status.
Comparison to the DHS and VHLSS 2002:
The urban sector is under-represented . YL includes
hh with on average less access to basic services and
slightly poorer than the average in Viet Nam. YL
sample covers the diversity of children in the
country.