Response Rates Impact Data Quality, But not How you Might Think

www.rti.orgRTI International is a registered trademark and a trade name of Research Triangle Institute.
Response Rates Impact Data Quality,
but Not How you Might Think
Based on 2 papers:
Eckman, S and Koch, A. “The Relationship between Response Rates,
Sampling Method and Data Quality: Evidence from the European Social
Survey” Under Review
Eckman, S, Himelein, K and Dever, J. “Innovative Sample Designs Using
GIS Technology" forthcoming in Advances in Comparative Survey
Methods: Multicultural, Multinational and Multiregional Context.
Stephanie Eckman, RTI Fellow

Motivation
Relationship between RR & Data Quality
High response rates signal data are good quality
Response rates uncorrelated with data quality
– High RR survey no more accurate than low (Keeter et al, 2000)
– Merkle & Edelman (2002)
– Groves & Peytcheva (2008)
2

RR NR bias (Merkle Edelman)
3 Merkle & Edelman 2002

RRs do not Correlate with Nonresponse Bias
4 Groves & Peytcheva 2008

Motivation
Relationship between RR & Data Quality
High response rates signal data are good quality
Response rates uncorrelated with data quality
– High RR survey not more accurate than low (Keeter et al, 2000)
– Merkle & Edelman (2002)
– Groves & Peytcheva (2008)
But maybe high response rates are a sign that data are crap?
5

Data Quality
Total Survey Error Framework
– Undercoverage
– Nonresponse
– Measurement error
– Editing error
– Processing error
– etc.
Misrepresentation error
– Undercoverage + Nonresponse
Tradeoff between undercoverage & NR
– Eckman & Kreuter 2017
6
Image: http://makeagif.com/dkjuuc

European Social Survey
7 waves
30+ countries
Central Committee sets standards
– Core questionnaire
– Minimum effective sample size
– Paradata collection
– Documentation
– Face to face attempts
– RR standard 70%
Our data: 136 country-rounds in first 6 waves
7

Sampling Methods in Analysis
8
Sampling
Method Includes
Field Staff Involvement in
Selecting
nHousehold Person
Individual
Register
None None
70
Household
Register
Household Register
Address Register
None
Interviewer
None
Interviewer
41
Household
Walk
Listing
Random Walk
Lister
Lister
Interviewer
Interviewer
25

2 Measures of Data Quality
External measure:
– How different is ESS from Labor Force Survey?
– On 6 categorical variables: age, gender, HH size, marital status, etc.
– Index of dissimilarity measures how different 2 surveys are
– Average over 6 variables
– Assumes LFS is higher quality
Internal measure:
– 50% of all respondents from gender heterogeneous couples should be
women
– ‫ܫ‬஼,௝ > 1.96 indicates significant deviation from 50%
10
‫ܫ‬஼,௝ =
% female஼,௝ − 50
50 ∗ 50/݊
‫ܦ‬௒,஼,௝ = 0.5 ∗ ෍ |ܻ஼,௝,௞
ாௌௌ
− ܻ஼,௝,௞
௅ிௌ
|
௞

2 DVs, 2 IVs
Dependent variables: misrepresentation error
– External measure
– Internal measure
Independent variables
– RR
– Sampling method
Joint effect of RR and sampling method on data quality
11

2 Measures vs RR, by Sample Type
12

Regression Models
13
Estimated Regression Coefficients

Implications
High RRs might signal that you have problems with your data
– When interviewers select samples
– Interviewers seem to manipulate selection process to keep RRs high
Note that ESS does better random walk than other surveys
– Listing should be done by someone other than interviewer
Other problems with random walk
– Walker effects
– No probabilities of selection
14

Possible Solutions
What are some alternatives to random walk?
– Satellite Photos
– Reverse Geocoding
– Qibla Method
– Geosampling
– Listing with Drones
15

GIS Resources
Turn by turn directions on phone
Satellite images
– Daytime images
– Small-sat revolution
– Nighttime lights
Other remote sensing data
How can we exploit these resources for sampling?
– And avoid random walks problems
16

Reverse Geocoding
Geocoding: address → coordinate
Reverse geocoding: coordinate → address
– Select random points in segment
– Identify closest address
– Many online tools
– Used in Italy ISSP 2009, 2011
18

Example of Reverse Geocoding
19

20

21

Qibla Method
Qibla is Arabic for “in the direction of Mecca”
Given random starting coordinate
– Interviewer walks in the direction of Mecca
– Selects first HH encountered
22

Geosampling
Select first stage units
– Administrative units
– Or 1km squares
Select second stage units
– Smaller squares
Visit and interview all households in smaller unit
25

Geosampling: First & Second Stage
26

Eliminates separate listing
step
Still vulnerable to interviewer
manipulation
Possible QC by interviewer
GPS tracks? (Himelein et al,
2014)
Geosampling: Second & Third Stage
27

Use of UAVs for Listing
RTI has tested listing from drone images
– Galapagos & Guatemala
28
Amer et al 2016

Listing with Drones
29
Amer et al 2016

Listing with Drones
30
Amer et al 2016

Listing with Drones
Still testing use of drones
Legal issues
Use local staff to code from images
31

Conclusions
Ideal method:
– Removes influence of interviewer
– Results in equal probability sample of HUs
– With known probabilities
No alternative is perfect
– High involvement of interviewers
– High data requirements
Drones may prove useful
32

33
Stephanie Eckman, PhD
Fellow, Survey Research Division
RTI International
seckman@rti.org
stepheckman.com

Response Rates Impact Data Quality, But not How you Might Think

Recommended

Recommended

More Related Content

Viewers also liked

Viewers also liked (11)

Similar to Response Rates Impact Data Quality, But not How you Might Think

Similar to Response Rates Impact Data Quality, But not How you Might Think (20)

More from Stephanie Eckman

More from Stephanie Eckman (6)

Recently uploaded

Recently uploaded (20)

Response Rates Impact Data Quality, But not How you Might Think

Editor's Notes