INDIA HUMAN
DEVELOPMENT SURVEY
(IHDS)
TRAINING PROGRAM
MARCH 16, 2016
How to merge two rounds?
Merging Household Files
Relationship between IHDS-I
and IHDS-II households
IHDS-I sample (N=41,554)
Replacement households in IHDS-II (N=2,134)
Split households from round 1 (N=5,397)
Reinterview households (N=34,621)
Attrition (N=6,911)
 Most important concept in merging two data files
1. Some households in round 1 have no match in round 2, and vice versa
2. Some households in round 1 match more than one household in round 2
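The two points above can be seen in a toy merge. This is only a sketch with made-up ids: `hhid2005` and `hhsplit` are hypothetical variable names, not the IHDS identifiers.

```stata
* Toy data only: four "round 2" households carrying a round 1 id
clear
input hhid2005 hhsplit
1 1
1 2
3 1
. 1
end
* household 1 split in two; the last row is a new household with no round 1 id
tempfile r2
save `r2'

* Three "round 1" households, one row each
clear
input hhid2005
1
2
3
end

* 1:m because one round 1 household can match several round 2 households
merge 1:m hhid2005 using `r2'
tabulate _merge

* _merge==1: round 1 only (household 2), _merge==2: round 2 only (new household),
* _merge==3: in both rounds (household 1 twice, household 3 once)
count if _merge == 3
```

Household 2 plays the role of attrition, the row with a missing id plays the role of a replacement household, and household 1 appears twice because it split.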
Any questions?
 Who were chosen for reinterview?
 Recontact rate of 83%? What does it mean?
 How were replacement households chosen?
 What is a split household?
What is needed to merge
household files?
1. Round 1 household file – N=41,554
2. Round 2 household file – N=42,152
 (Why are there more cases in round 2?)
3. Linking file (N=42,152): gives the round 1 identification codes for all round 2 households that were reinterviewed; the linking codes are missing for the 2,134 households that are new
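The counts on the slides add up as follows, and it is worth confirming that the identifiers really are unique before merging. The sketch below assumes the file names used on these slides (`round2HH`, `linkhh`) and that the linking file stores the round 1 id in `HHID2005`.

```stata
* Why round 2 has more cases: reinterviewed + split + replacement households
display 34621 + 5397          // households seen in both rounds
display 34621 + 5397 + 2134   // all round 2 households

* isid stops with an error if the listed variables do not uniquely identify rows
use round2HH, clear
isid STATEID DISTID PSUID HHID HHSPLITID

use linkhh, clear
isid STATEID DISTID PSUID HHID HHSPLITID

* New (replacement) households should have no round 1 id
count if missing(HHID2005)
```

If `isid` fails on either file, a `merge 1:1` on the same keys will fail too, so this is a cheap check to run first.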
Step 1 – Link round 2 data to linking file to get round 1 ID
 use linkhh, clear
 sort STATEID DISTID PSUID HHID HHSPLITID
 merge 1:1 STATEID DISTID PSUID HHID HHSPLITID using round2HH, gen(_mergeR2link)
 sort STATEID DISTID PSUID HHID2005 HHSPLITID2005
 save round2HH_plus, replace
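A quick way to check the result of Step 1 is to tabulate the merge indicator. This is a sketch that assumes the file `round2HH_plus` and the indicator `_mergeR2link` named above exist as described.

```stata
use round2HH_plus, clear

* Every round 2 household should be found in the linking file (code 3)
tabulate _mergeR2link
count if _mergeR2link != 3

* Households without a round 1 id are the new (replacement) households
count if missing(HHID2005)
```

Since the linking file and the round 2 household file are both reported with N=42,152, any row that is not coded 3 deserves a closer look.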
Step 2 – Merge this Round 2+ file with Round 1 file
 use round1HH, clear
 rename HHID HHID2005
 rename HHSPLITID HHSPLITID2005
 sort STATEID DISTID PSUID HHID2005 HHSPLITID2005
 merge 1:m STATEID DISTID PSUID HHID2005 HHSPLITID2005 using round2HH_plus, gen(_mergeR1R2)
 sort STATEID DISTID PSUID HHID HHSPLITID
 save mergedHHR1R2, replace
The merged file is a superset of both rounds
 Households surveyed in both rounds: N=40,018
 Households surveyed in round 1 only (attrition): N=6,911
 Households surveyed in round 2 only (replacement): N=2,134
 Total: N=49,063
 Keep only _mergeR1R2==3 for panel analysis (N=40,018)
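The panel subset can be built roughly as follows. This is a sketch, not the official IHDS procedure: `nR2` and the output file `panelHHR1R2` are hypothetical names introduced here.

```stata
use mergedHHR1R2, clear

* 1 = round 1 only, 2 = round 2 only, 3 = surveyed in both rounds
tabulate _mergeR1R2

* Keep the panel households
keep if _mergeR1R2 == 3

* Count round 2 households per round 1 household; values above 1 indicate splits
bysort STATEID DISTID PSUID HHID2005 HHSPLITID2005: generate nR2 = _N
tabulate nR2

save panelHHR1R2, replace
```

Because split households repeat the round 1 record, round 1 variables are duplicated across the splits; `nR2` makes it easy to restrict or reweight when that matters.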
Merging Individual Files
Relationship between IHDS-I
and IHDS-II individuals
IHDS-I sample (N=215,754)
New individuals, new HH (N=9,760)
New individuals in round 1 HH (N=43,822)
Reinterviewed individuals (N=150,995)
HH attrition (N=29,299)
Individual attrition in reinterviewed HH (N=35,464)
 Most important concept in merging two data files
1. Even reinterviewed households have new members (births, marriages)
2. Even reinterviewed households have some members who are no longer there (deaths, marriages, migration)
What is needed to merge individual files?
1. Round 1 individual file: N=215,754
2. Round 2 individual file: N=204,568
 (Why are there fewer cases in round 2?)
3. Linking file (N=204,568): gives the round 1 identification codes for all round 2 individuals who were reinterviewed; the linking codes are missing for individuals who are new in round 2
Step 1 – Link round 2 data to linking file to get round 1 ID
 use linkind, clear
 sort STATEID DISTID PSUID HHID HHSPLITID PERSONID
 merge 1:1 STATEID DISTID PSUID HHID HHSPLITID PERSONID using round2IND, gen(_mergeR2link)
 sort STATEID DISTID PSUID HHID2005 HHSPLITID2005 PERSONID2005
 save round2IND_plus, replace
Step 2 – Merge this Round 2+ file with Round 1 file
 use round1IND, clear
 rename HHID HHID2005
 rename HHSPLITID HHSPLITID2005
 rename PERSONID PERSONID2005
 sort STATEID DISTID PSUID HHID2005 HHSPLITID2005 PERSONID2005
 merge 1:m STATEID DISTID PSUID HHID2005 HHSPLITID2005 PERSONID2005 using round2IND_plus, gen(_mergeR1R2)
 sort STATEID DISTID PSUID HHID HHSPLITID PERSONID
 save mergedINDR1R2, replace
The merged file is a superset of both rounds
 Individuals surveyed in both rounds: N=150,988
 Individuals surveyed in round 1 only (attrition/death/migration): N=64,766
 Individuals surveyed in round 2 only (replacement/new): N=53,580
 Total: N=269,334
 Keep only _mergeR1R2==3 for panel analysis (N=150,988)
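The three groups account for the sizes of both individual files. The sketch below first checks that arithmetic with the counts copied from the slides, then keeps the panel individuals; it assumes `mergedINDR1R2` and `_mergeR1R2` exist as named in Step 2.

```stata
* Counts copied from the slides
display 150988 + 64766            // round 1 individual file: 215754
display 150988 + 53580            // round 2 individual file: 204568
display 150988 + 64766 + 53580    // merged file: 269334

use mergedINDR1R2, clear
tabulate _mergeR1R2

* Individuals observed in both rounds
keep if _mergeR1R2 == 3
```

Comparing the `tabulate` output with these sums is a simple way to confirm that the merge behaved as expected.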
Ever-married woman file linkage
Same process as individual file linkage
 One thing to note: there was no ever-married woman file for 2004-5, so you will be merging with the household file from 2004-5
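One way this could look is sketched below. It is an assumption-laden illustration: `round2EW` is a hypothetical name for the round 2 ever-married woman file, and the sketch assumes that file carries the same household identifiers (`STATEID DISTID PSUID HHID HHSPLITID`) as the other round 2 files.

```stata
* Round 1 household file with its ids renamed to the 2005 names
use round1HH, clear
rename HHID HHID2005
rename HHSPLITID HHSPLITID2005
tempfile r1hh
save `r1hh'

* Attach round 1 household ids to each woman (many women per household)
use round2EW, clear
merge m:1 STATEID DISTID PSUID HHID HHSPLITID using linkhh, ///
    keep(master match) gen(_mergeEWlink)

* Bring in the round 1 household record; women in new households stay unmatched
merge m:1 STATEID DISTID PSUID HHID2005 HHSPLITID2005 using `r1hh', ///
    keep(master match) gen(_mergeEWR1)
tabulate _mergeEWR1
```

Both merges are `m:1` because several women can belong to the same household. Variables that share a name in the two files would need renaming first if both versions are wanted.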
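Merging Caution
Merging overwrites variables
 Variables with the same name in both files are not kept side by side: after the merge only one copy remains (for matched cases Stata keeps the values from the file in memory by default)
 So if you want to keep variables from round 1 and round 2 separate, rename all round 1 variables before merging
 Typically we use the commands
 rename * x*
 rename xSTATEID STATEID, etc., to restore the names of the merge keys
 So xr05 will be age in 2004-5 and r05 will be age in 2011-12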
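The renaming step can be tried on toy data before running it on the real files. The values below are made up, and only two key variables are used to keep the sketch short.

```stata
* Toy "round 2" file: age is r05
clear
input STATEID HHID2005 r05
1 101 41
1 102 58
end
tempfile r2
save `r2'

* Toy "round 1" file with the same variable name for age
clear
input STATEID HHID2005 r05
1 101 34
1 102 51
end

* Prefix every round 1 variable with x, then restore the merge keys
rename * x*
rename (xSTATEID xHHID2005) (STATEID HHID2005)

merge 1:1 STATEID HHID2005 using `r2'

* Both ages are now available: xr05 (round 1) and r05 (round 2)
list STATEID HHID2005 xr05 r05
```

Without the rename, only one `r05` would remain after the merge and the other round's ages would be lost.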
