A measurement error model approach to survey
data integration: combining information from two
surveys
Jae Kwang Kim (joint work with Seho Park)
Iowa State University
2017 SAE conference, Paris
July 11th, 2017
Survey data integration
Goal: combine information from multiple surveys.
Three situations:
1. Multiple samples from one target population
2. One sample from each of multiple populations
3. Multiple samples from multiple populations
Small area estimation can be viewed as a special case of survey data integration, in which the multiple sub-populations are the domains of interest.
Kim (ISU) Survey Data Integration 7/11/2017 2 / 25
Motivation
The USAID Bureau for Food Security (BFS) sponsors the Food and Nutrition Technical Assistance III project (FANTA).
Its key technical areas are food security, maternal and child health, agriculture, and livelihoods strengthening.
Motivation
FANTA has two projects: the Feed the Future (FTF) and Food for Peace (FFP) development projects.
The FFP project was conducted by ICF International, and the FTF project by UNC MEASURE.
Two surveys were conducted in 2013 in selected departments of Guatemala: San Marcos, Totonicapan, Quiche, Quetzaltenango, and Huehuetenango.
Map of Guatemala
FFP and FTF Projects in Guatemala
Figure: Selected Departments in Guatemala
Overlap Area
Figure: FTF ZOI (Zone of Influence) and FFP Project Implementation Area for Guatemala
Overlap Area
Table: Overlap Area: Departments and Municipalities
Department: Municipalities
San Marcos: Sibinal, Tajumulco
Totonicapan: Momostenango, Santa Lucia La Reforma
Huehuetenango: Chiantla, Concepcion Huista, Jacaltenango, San Antonio Huista, Todos Santos
Quetzaltenango: San Juan Ostuncalco
Quiche: Chichicastenango, (Santa Maria) Nebaj, Uspantan, Cunen, San Juan Cotzal
Common Indicators
Each survey has its own set of indicators; 11 indicators common to both surveys were chosen for study.
The common indicators concern women's nutritional status, children's well-being, and household poverty.
Common Indicators
Table: Common Indicators
Daily Per Capita Expenditures (PCE): average daily per capita consumption, in constant 2010 USD
Prevalence of Poverty (PP): percentage of people living on less than US$1.25 per capita per day
Mean Depth of Poverty (MDP): average of the differences between total daily …
Prevalence of Households with Hunger (HHS): prevalence of households with moderate or severe hunger
Prevalence of Underweight Women: proportion of women eligible for BMI measurement (not currently pregnant and not within 2 months of delivery) whose BMI is less than 18.5
Women's Dietary Diversity Score (WDDS): mean number of food groups consumed by women of reproductive age (15-49 years)
Common Indicators
Table: Common Indicators (cont'd)
Prevalence of Stunted Children: prevalence of stunted children under five years of age (0-59 months)
Prevalence of Wasted Children: prevalence of wasted children under five years of age (0-59 months)
Prevalence of Underweight Children: prevalence of underweight children under five years of age (0-59 months)
Prevalence of Children Receiving a Minimum Acceptable Diet (MAD): prevalence of children 6-23 months receiving a minimum acceptable diet
Prevalence of Exclusive Breastfeeding (EBF): prevalence of exclusive breastfeeding of children under six months of age
Estimates from two surveys
Table: Daily Per Capita Expenditure
Department        FFP/ICF (N / Mean / S.E.)    FTF/UNC (N / Mean / S.E.)    T-statistic
San Marcos        1419 / 0.558 / 0.014         981 / 1.166 / 0.018          -23.376
Totonicapan       1654 / 0.388 / 0.015         181 / 0.896 / 0.039          -5.505
Huehuetenango     877 / 0.456 / 0.023          1535 / 1.140 / 0.018         -30.587
Quetzaltenango    628 / 0.695 / 0.022          60 / 1.325 / 0.112           -26.179
Quiche            1288 / 0.382 / 0.015         1350 / 1.045 / 0.015         -12.179
Estimates from two surveys
Table: Prevalence of Households with Hunger (%)
Department        FFP/ICF (N / Mean / S.E.)    FTF/UNC (N / Mean / S.E.)    T-statistic
San Marcos        1419 / 3.76 / 0.50           981 / 15.35 / 1.08           -9.733
Totonicapan       1654 / 11.79 / 0.87          181 / 15.01 / 2.72           -1.125
Huehuetenango     877 / 8.91 / 0.91            1535 / 15.58 / 0.87          -5.323
Quetzaltenango    628 / 6.84 / 0.91            60 / 9.94 / 3.96             -0.765
Quiche            1288 / 7.13 / 0.74           1350 / 9.73 / 0.77           -2.430
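The T-statistics column appears to be the usual statistic for comparing two independent survey estimates, (mean_FFP − mean_FTF) / sqrt(SE_FFP² + SE_FTF²). The sketch below assumes that form and recomputes it from the rounded means and standard errors in the hunger table; the recomputed values differ from the reported ones by less than about 0.03, which is consistent with rounding. The name `two_sample_t` is illustrative only.

```python
import math

# (department, FFP mean, FFP s.e., FTF mean, FTF s.e., reported t), hunger prevalence in %
HHS_ROWS = [
    ("San Marcos", 3.76, 0.50, 15.35, 1.08, -9.733),
    ("Totonicapan", 11.79, 0.87, 15.01, 2.72, -1.125),
    ("Huehuetenango", 8.91, 0.91, 15.58, 0.87, -5.323),
    ("Quetzaltenango", 6.84, 0.91, 9.94, 3.96, -0.765),
    ("Quiche", 7.13, 0.74, 9.73, 0.77, -2.430),
]


def two_sample_t(mean1, se1, mean2, se2):
    """Difference of two independent estimates divided by its standard error (assumed form)."""
    return (mean1 - mean2) / math.sqrt(se1 ** 2 + se2 ** 2)


for dept, m1, s1, m2, s2, reported in HHS_ROWS:
    t = two_sample_t(m1, s1, m2, s2)
    print(f"{dept:<15} recomputed {t:7.3f}   reported {reported:7.3f}")
```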
Data Structure
Table: Data Structure (o = observed)
            X    Ya   Yb
Sample A    o    o
Sample B    o         o
Goal: Synthetic data imputation
Table: Data Structure after imputation (o = observed or imputed)
            X    Ya   Yb
Sample A    o    o    o
Sample B    o    o    o
Methodology
Steps
1. Specify a measurement error model.
2. Derive the prediction model using Bayes' theorem.
3. Estimate the parameters with the EM algorithm.
4. Generate imputed values from the prediction model.
Step 1: Model specification
Assume that Sample A is the gold standard, that is, Ya = Y (the true value).
Structural equation model:
Ya ∼ f1(ya | x; θ1).
The observations in Sample A can be used for model diagnostics.
Measurement error model:
Yb ∼ f2(yb | ya; θ2).
Assume the measurement error is nondifferential:
f(yb | x, ya) = f(yb | ya),
that is, once ya is known, x carries no further information about yb.
For dichotomous y-variables, the measurement error model becomes a misclassification model.
Step 2: Prediction model
The prediction model is the model for the counterfactual (unobserved) outcome, conditional on the observed values.
Prediction model for Yb in Sample A (by the nondifferential assumption):
p(yb | x, ya) = f2(yb | ya; θ2).
Prediction model for Ya in Sample B: using Bayes' formula, we can derive
$$p(y_a \mid x, y_b) = \frac{f_1(y_a \mid x; \theta_1)\, f_2(y_b \mid y_a; \theta_2)}{\int f_1(y_a \mid x; \theta_1)\, f_2(y_b \mid y_a; \theta_2)\, dy_a}.$$
The prediction model can be used to obtain the best prediction of Yai for i ∈ SB.
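For the Gaussian PCE model used in the application (ya = xβ + e, yb = α0 + α1 ya + u), the Bayes formula has a closed form: p(ya | x, yb) is normal with mean (σ_u² xβ + α1 σ_e² (yb − α0)) / (σ_u² + α1² σ_e²). The sketch below, an illustration rather than the authors' code, evaluates the Bayes ratio numerically on a grid and compares it with that closed form; the function names and parameter values are hypothetical.

```python
import math


def normal_pdf(y, mean, sd):
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def posterior_mean_grid(mu, yb, a0, a1, s_e, s_u, lo=-20.0, hi=20.0, n=20001):
    """E(Ya | x, yb) from Bayes' formula, with mu = x_i * beta.

    Numerator integrand y * f1 * f2, denominator integrand f1 * f2, both
    approximated on a uniform grid (the grid spacing cancels in the ratio).
    """
    step = (hi - lo) / (n - 1)
    num = den = 0.0
    for k in range(n):
        y = lo + k * step
        joint = normal_pdf(y, mu, s_e) * normal_pdf(yb, a0 + a1 * y, s_u)  # f1 * f2
        num += y * joint
        den += joint
    return num / den


def posterior_mean_closed(mu, yb, a0, a1, s_e, s_u):
    """Closed-form conditional mean for the Gaussian structural + measurement model."""
    return (s_u ** 2 * mu + a1 * s_e ** 2 * (yb - a0)) / (s_u ** 2 + a1 ** 2 * s_e ** 2)


args = dict(mu=2.0, yb=1.5, a0=0.5, a1=0.8, s_e=1.0, s_u=0.5)
print(posterior_mean_grid(**args), posterior_mean_closed(**args))
```

The closed form is what an EM implementation for this model would use in the E-step; the grid version only serves as a check that it agrees with the general Bayes formula.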
Step 3: Parameter estimation - EM algorithm
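E-step: compute
$$Q_1(\theta_1 \mid \text{data}; \hat\theta^{(t)}) = \sum_{i \in S_A} w_{i,a} \log f_1(y_{ai} \mid x_i; \theta_1) + \sum_{i \in S_B} w_{i,b}\, E\{\log f_1(Y_a \mid x_i; \theta_1) \mid x_i, y_{bi}; \hat\theta^{(t)}\}$$
$$Q_2(\theta_2 \mid \text{data}; \hat\theta^{(t)}) = \sum_{i \in S_A} w_{i,a}\, E\{\log f_2(Y_b \mid y_{ai}; \theta_2) \mid x_i, y_{ai}; \hat\theta^{(t)}\} + \sum_{i \in S_B} w_{i,b}\, E\{\log f_2(y_{bi} \mid Y_a; \theta_2) \mid x_i, y_{bi}; \hat\theta^{(t)}\},$$
where w_{i,a} and w_{i,b} are the survey weights of Samples A and B, and the conditional expectations are computed from the prediction model in Step 2.
M-step: update the parameters by maximizing Q1 and Q2 with respect to θ1 and θ2, respectively.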
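A minimal sketch of these E- and M-steps for the Gaussian PCE model from the application slide, assuming homoscedastic normal errors so that the E-step only needs the conditional mean m_i and variance v of Ya for units in Sample B. It follows Q1 and Q2 as written, including the Sample A term in Q2; that term carries no information about θ2 and mainly slows convergence without changing the fixed point. This is not the authors' implementation: `em_pce_model`, the starting values, and the simulated data are illustrative assumptions, and NumPy is required.

```python
import numpy as np


def em_pce_model(x_a, ya, w_a, x_b, yb, w_b, max_iter=2000, tol=1e-8):
    """Weighted EM sketch for ya = x @ beta + e, yb = a0 + a1 * ya + u.

    e ~ N(0, s2e), u ~ N(0, s2u). Sample A observes (x, ya); Sample B observes (x, yb).
    """
    x_all = np.vstack([x_a, x_b])
    w_all = np.concatenate([w_a, w_b])
    sw = np.sqrt(w_all)
    # Start theta1 from weighted least squares on Sample A; theta2 from a crude guess.
    sw_a = np.sqrt(w_a)
    beta = np.linalg.lstsq(x_a * sw_a[:, None], ya * sw_a, rcond=None)[0]
    s2e = np.sum(w_a * (ya - x_a @ beta) ** 2) / np.sum(w_a)
    a0, a1, s2u = 0.0, 1.0, float(np.var(yb))
    for _ in range(max_iter):
        old = np.concatenate([beta, [s2e, a0, a1, s2u]])
        # E-step: for i in S_B, p(ya | x, yb) is normal with mean m_i and variance v.
        v = 1.0 / (1.0 / s2e + a1 ** 2 / s2u)
        m = v * ((x_b @ beta) / s2e + a1 * (yb - a0) / s2u)
        # M-step for theta1: weighted least squares of (ya in A, m in B) on x.
        t = np.concatenate([ya, m])
        beta = np.linalg.lstsq(x_all * sw[:, None], t * sw, rcond=None)[0]
        s2e = (np.sum(w_all * (t - x_all @ beta) ** 2) + v * np.sum(w_b)) / np.sum(w_all)
        # M-step for theta2: response is E[Yb] = a0 + a1 * ya in A (current theta2), yb in B.
        r = np.concatenate([a0 + a1 * ya, yb])
        z = np.column_stack([np.ones_like(t), t])
        lhs = z.T @ (w_all[:, None] * z)
        lhs[1, 1] += v * np.sum(w_b)  # E[Ya^2] = m^2 + v for units in S_B
        a0_new, a1_new = np.linalg.solve(lhs, z.T @ (w_all * r))
        resid = r - (a0_new + a1_new * t)
        s2u = (np.sum(w_all * resid ** 2) + s2u * np.sum(w_a)
               + a1_new ** 2 * v * np.sum(w_b)) / np.sum(w_all)
        a0, a1 = a0_new, a1_new
        if np.max(np.abs(np.concatenate([beta, [s2e, a0, a1, s2u]]) - old)) < tol:
            break
    return {"beta": beta, "s2e": s2e, "a0": a0, "a1": a1, "s2u": s2u}


def simulate(n, rng, beta=(1.0, 2.0), s_e=1.0, a0=0.5, a1=0.8, s_u=0.5):
    """Hypothetical data from the PCE model; returns x (with intercept), ya, yb."""
    x = np.column_stack([np.ones(n), rng.normal(size=n)])
    ya = x @ np.array(beta) + rng.normal(scale=s_e, size=n)
    yb = a0 + a1 * ya + rng.normal(scale=s_u, size=n)
    return x, ya, yb


rng = np.random.default_rng(2017)
x_a, ya, _ = simulate(2000, rng)  # Sample A keeps (x, ya)
x_b, _, yb = simulate(2000, rng)  # Sample B keeps (x, yb)
est = em_pce_model(x_a, ya, np.ones(2000), x_b, yb, np.ones(2000))
print(est)
```

With equal weights this is ordinary maximum likelihood on the combined samples; with survey weights it maximizes a weighted pseudo-likelihood, and variance estimation would need a separate step not shown here.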
Step 4: Best prediction
Using the measurement error model, we can predict yai by
$$\hat y_{ai} = E(Y_a \mid x_i, y_{bi}), \quad i \in S_B.$$
A prediction estimator of µ = E(Ya) is then
$$\hat\mu^{*} = \frac{\sum_{i \in S_A} w_{i,a}\, y_{ai} + \sum_{i \in S_B} w_{i,b}\, \hat y_{ai}}{\sum_{i \in S_A} w_{i,a} + \sum_{i \in S_B} w_{i,b}}.$$
Reference: Kim, Berg, and Park (2016). Statistical matching using fractional imputation. Survey Methodology, 42, 19–40.
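The estimator μ̂* is a weighted mean that uses the observed y_ai in Sample A and the predictions ŷ_ai in Sample B. The sketch below transcribes it directly with made-up toy numbers; in practice ŷ_ai would come from the fitted prediction model (for the Gaussian PCE model, the conditional mean E(Ya | x_i, y_bi)). The function name is hypothetical.

```python
def prediction_estimator(w_a, ya, w_b, ya_hat):
    """mu_hat*: weighted mean of observed ya (units in S_A) and predicted ya (units in S_B)."""
    numerator = sum(w * y for w, y in zip(w_a, ya)) + sum(w * y for w, y in zip(w_b, ya_hat))
    denominator = sum(w_a) + sum(w_b)
    return numerator / denominator


# Toy (made-up) weights and values: two units in S_A, one unit in S_B.
mu_hat = prediction_estimator(w_a=[10.0, 20.0], ya=[1.0, 2.0], w_b=[30.0], ya_hat=[4.0])
print(mu_hat)  # (10*1 + 20*2 + 30*4) / (10 + 20 + 30)
```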
Application to FANTA project
1. Model for PCE:
$$y_{ai} = x_i \beta + e_i, \qquad y_{bi} = \alpha_0 + \alpha_1 y_{ai} + u_i,$$
where e_i ∼ N(0, σ_e²) and u_i ∼ N(0, σ_u²).
2. Model for HHS prevalence:
$$y_{ai} \sim \text{Bernoulli}(\pi_i), \qquad y_{bi} \sim \text{Bernoulli}\{p\, y_{ai} + q(1 - y_{ai})\},$$
where logit(π_i) = x_i β and p, q ∈ (0, 1), so that p = P(y_bi = 1 | y_ai = 1) and q = P(y_bi = 1 | y_ai = 0).
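For the HHS model, the Bayes formula of Step 2 reduces to a two-point calculation: with π_i = logit⁻¹(x_i β),
P(Y_ai = 1 | x_i, y_bi) = π_i f(y_bi | 1) / {π_i f(y_bi | 1) + (1 − π_i) f(y_bi | 0)}, where f(1 | 1) = p and f(1 | 0) = q. The sketch below is illustrative; `prob_ya1` and the parameter values are hypothetical, and in practice β, p, and q would come from the EM fit.

```python
import math


def prob_ya1(x_beta, yb, p, q):
    """P(Ya = 1 | x, Yb = yb) when logit(pi) = x_beta,
    P(Yb = 1 | Ya = 1) = p and P(Yb = 1 | Ya = 0) = q."""
    pi = 1.0 / (1.0 + math.exp(-x_beta))
    lik1 = p if yb == 1 else 1.0 - p  # f2(yb | ya = 1)
    lik0 = q if yb == 1 else 1.0 - q  # f2(yb | ya = 0)
    return pi * lik1 / (pi * lik1 + (1.0 - pi) * lik0)


# Hypothetical values: pi = 0.5 (x_beta = 0), p = 0.9, q = 0.2.
print(prob_ya1(0.0, 1, 0.9, 0.2))  # 0.45 / (0.45 + 0.10)
print(prob_ya1(0.0, 0, 0.9, 0.2))  # 0.05 / (0.05 + 0.40)
```

Because y_ai is binary, this probability is also the best prediction ŷ_ai = E(Y_a | x_i, y_bi) that enters μ̂* for units in Sample B.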
Model Diagnostics for PCE model
Figure panels: Fitted Values vs. Residuals (x-axis: fitted values; y-axis: residuals) and Normal Q-Q Plot (x-axis: theoretical quantiles; y-axis: sample quantiles).
Results: PCE Indicator
Estimates, with standard errors in parentheses:
Department        FFP              FTF              Combined
San Marcos        0.558 (0.030)    1.165 (0.038)    0.563 (0.026)
Totonicapan       0.388 (0.030)    0.895 (0.085)    0.331 (0.028)
Quiche            0.382 (0.030)    1.045 (0.031)    0.396 (0.026)
Huehuetenango     0.456 (0.044)    1.140 (0.036)    0.479 (0.027)
Quetzaltenango    0.695 (0.044)    1.325 (0.232)    0.795 (0.043)
Results for HHS Indicator (%)
Estimates, with standard errors in parentheses:
Department        FFP             FTF             Combined
San Marcos        3.76 (1.01)     15.35 (2.22)    3.77 (1.00)
Totonicapan       11.79 (1.70)    15.01 (6.00)    12.08 (1.60)
Quiche            7.13 (1.50)     9.73 (1.57)     7.19 (1.42)
Huehuetenango     8.91 (1.90)     15.58 (2.00)    8.75 (1.90)
Quetzaltenango    6.84 (1.80)     9.94 (8.25)     6.85 (1.70)
Concluding remarks
Survey data integration using a measurement error model was considered.
The prediction of the counterfactual outcome is obtained by Bayes' theorem.
Parameter estimation uses the EM algorithm.
A Bayesian approach can also be developed (not discussed here).
Extension of the structural equation model to a GLMM is in progress.
