As survey costs increase and response rates decrease, researchers are looking to alternative methods to collect data from study subjects. Passive data are data collected from subjects without posing questions and recording responses. Examples of passive data are: location data collected from smartphones; applications installed on smartphones; activity data from fitness devices such as Fitbits. Because they are collected without subject involvement, passive data may offer a way to reduce the burden borne by our research subjects while also allowing us to collect the high-quality data needed for social science research. However, preliminary research into how to collect and analyze passive data is needed. In this talk, I present three research studies which use passive data to improve the quality and/or reduce the burden of survey data. The talk will focus on what we have learned and what research remains to be done.
Three Studies on Supplementing Survey Data with Passive Data
1. www.rti.org RTI International is a registered trademark and a trade name of Research Triangle Institute.
3 Studies on Supplementing
Survey Data with Passive Data
Stephanie Eckman
2. Crisis (or Changes) in Traditional Survey Research
Traditional model
– Lengthy instrument
– High RR
– High coverage, random sample
New model
– Short instruments
– High incentives, low RRs
– Non-probability
Can Passive
Data Offer a
Way Forward?
3. Passive Data – Macro Level
Unobtrusive Measures:
Nonreactive Research in the
Social Sciences
– Webb, Campbell, Schwartz &
Sechrest
– 1971
4. Passive Data – Micro Level
AP Photo, from psys.org from techradar.com
5. Research Question
Can we use passive data to make surveys better?
response rates
data collection costs
measurement error
Motivated misreporting
Socially desirable reporting
6. Three Studies
Important issues
I won’t discuss
– Consent to passive data
– Nonprobability samples
– Ethics of passive data
Always-on location data
Location & device data
Fitness data
7. STUDY1
“Augmenting Survey Data with
Always-On Location”
with Rob Chew, Sam Goree, Herschel Sanders,
Mike Carpenter, Nick Baldasaro, Robert Furberg
8. STUDY1
Dual Data Collection
Convenience sample
– RTI employees with iPhones
Survey data
– Intake survey
– Daily survey for 2 weeks
– Outtake survey
Passive data
– Moves app: always-on location
1,928
coordinates
Median:
5 / subject / day
12. STUDY1
Potentially in Coordinates
Exercise behavior
Religious service attendance
Home location — correlated with income, race
Whether respondent visits daycare or school
Work location and hours
13. STUDY1
Where Did Our Subject Go?
Matched coordinates to 3 sources
– Google Places
– Yelp
– Foursquare
29% of
coordinates
had no match (homes?)
Remainder had
> 10 matches on average
Disagreement
between databases
Ex: grocery store / liquor
store / dentist office
Agreed on large places
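The matching step on this slide can be sketched roughly as follows. This is a minimal illustration, not the study's actual procedure: the 50 m radius, the `match_places` helper, and the sample records are all hypothetical, and real lookups would go through each provider's own API rather than a local list.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def match_places(lat, lon, places, radius_m=50):
    """Return all places within radius_m of the coordinate, nearest first.

    radius_m is an assumed threshold, not a value taken from the study.
    """
    hits = []
    for p in places:
        d = haversine_m(lat, lon, p["lat"], p["lon"])
        if d <= radius_m:
            hits.append((d, p))
    hits.sort(key=lambda pair: pair[0])
    return [p for _, p in hits]


# Hypothetical records standing in for three places databases (A, B, C).
places = [
    {"source": "A", "name": "Grocery store", "lat": 35.9050, "lon": -78.8650},
    {"source": "B", "name": "Liquor store", "lat": 35.9051, "lon": -78.8651},
    {"source": "C", "name": "Dentist office", "lat": 35.9049, "lon": -78.8649},
    {"source": "A", "name": "Airport", "lat": 35.8801, "lon": -78.7880},
]

matches = match_places(35.9050, -78.8650, places)
categories = {p["name"] for p in matches}  # distinct labels for one coordinate
```

In this toy case one coordinate matches three nearby places with three different labels, which is the kind of disagreement between databases the slide describes.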
15. STUDY1
Research Needed
Train ML model to predict R characteristics
from location data
– Study with gold standard data
Google, Facebook, SafeGraph, Foursquare
– Likely won’t share algorithms with us
– Slightly different emphasis
– Can we partner with them?
16. STUDY2
“Can Passive Data Replace Active
Data in Smartphone Surveys?”
with Tobias Konitzer & David Rothschild
18. STUDY2
Survey Data
Overrepresents:
Women
Low education
Questionnaire
Demographics
Race
Marital status
Presence of children
4 political knowledge items
6 political engagement items
1,971
respondents
From Pollfish database
20. STUDY2
Historical location data
– July 2017 – April 2018
– When app in use
– 304 coordinates per R
(median)
Passive Data: Special Delivery
Applications installed
– Facebook
– Snapchat
– Walmart
– VeryDice
21. STUDY2
Methods
Impute
– Race
– Marital Status
– Presence of children
– Political Knowledge index
– Political Engagement index
Compare 2 models:
– Age, gender only
– Age, gender, passive data
Language, OS, phone model
Home block group
Home location imputed:
– Phone location: 10pm – 6am
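One way the home-location rule above could be sketched: keep only pings recorded between 10pm and 6am and take the most frequent rounded coordinate. The modal-cell rule, the three-decimal rounding, and the `impute_home` helper are assumptions made for illustration; the slide states only the overnight window. Mapping the result to a block group would need census geography and is not shown.

```python
from collections import Counter
from datetime import datetime


def is_night(ts, start_hour=22, end_hour=6):
    """True if the timestamp falls in the overnight window (10pm to 6am)."""
    return ts.hour >= start_hour or ts.hour < end_hour


def impute_home(pings, precision=3):
    """Return the most common rounded (lat, lon) among overnight pings.

    pings is an iterable of (timestamp, lat, lon) tuples.
    Returns None when there are no overnight pings.
    """
    cells = Counter(
        (round(lat, precision), round(lon, precision))
        for ts, lat, lon in pings
        if is_night(ts)
    )
    if not cells:
        return None
    return cells.most_common(1)[0][0]


# Hypothetical pings for one respondent.
pings = [
    (datetime(2018, 3, 1, 23, 15), 35.9941, -78.8986),  # night
    (datetime(2018, 3, 2, 2, 40), 35.9942, -78.8987),   # night, same cell
    (datetime(2018, 3, 2, 13, 5), 35.9050, -78.8650),   # daytime, ignored
    (datetime(2018, 3, 2, 22, 30), 35.9941, -78.8986),  # night
    (datetime(2018, 3, 3, 1, 10), 36.0010, -78.9000),   # night, elsewhere
]

home = impute_home(pings)
```

Because this location data is recorded only when the app is in use, some respondents may have few or no overnight pings, so returning `None` rather than guessing seems the safer default.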
24. STUDY3
Data Sources
AddHealth study Wave 5
– Link responses to data from Fitbit, Jawbone, etc
Comparison of 2 sources on sleep and exercise
– Survey responses
– Device data
Both subject to error
25. STUDY3
Latent Class/Variable Models to Uncover Truth
2 sources, both measured with error
Similar to Oberski et al JASA, 2017
– Comparison of survey & administrative data
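A small simulation can illustrate why neither source can be treated as the truth. This is not the latent class/variable model itself, only a sketch under classical measurement error assumptions (independent, mean-zero errors); the means and error sizes are invented for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)  # fixed seed so the run is repeatable
n = 20_000

# Hypothetical true hours of sleep, variance 1.
true_sleep = [rng.gauss(7.0, 1.0) for _ in range(n)]

# Each source observes the truth plus its own independent error.
survey = [t + rng.gauss(0.0, 1.0) for t in true_sleep]   # error sd 1.0
tracker = [t + rng.gauss(0.0, 0.5) for t in true_sleep]  # error sd 0.5

r_observed = pearson(survey, tracker)

# Reliability = true variance / (true variance + error variance).
rel_survey = 1.0 / (1.0 + 1.0 ** 2)
rel_tracker = 1.0 / (1.0 + 0.5 ** 2)

# Under these assumptions the two sources correlate at
# sqrt(rel_survey * rel_tracker), even though both measure the same truth.
r_expected = math.sqrt(rel_survey * rel_tracker)
```

The agreement between the two observed measures falls well short of 1 even though both track the same underlying value, which is the motivation for modeling the true value as latent.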
26. STUDY3
Latent Class/Variable Models to Uncover Truth
[Diagram]
True values: amount of sleep, amount of exercise
Observed values: sleep and exercise measures
Sources: fitness tracker, survey
27. Discussion
Can we use passive data to make surveys better?
Info hidden in always-on location data
Ask fewer questions
– Lower cost
– Higher RR
Potential to reduce ME
– Offer clients alternatives
28. Discussion
What to worry about
Accuracy of sensor data
Have a plan before you collect data
Security
– Hacking of passive data
– Confidentiality
Selection bias