ICF International’s research assesses the feasibility of using data fusion to combine alcohol survey data collected from a dual-frame random digit dialing (RDD) telephone sample with data collected from an opt-in web panel.
RDD telephone samples have the benefit of being based on probability sampling, yet they are challenged by decreasing response rates and increasing costs. Data collection via an opt-in web panel raises concerns regarding population representativeness, yet costs a fraction of what a telephone sample does. A hybrid approach that fuses RDD survey data with opt-in panel data is attractive from a cost perspective.
The source of the RDD data is the thirteenth iteration of the National Alcohol Survey (N13), a dual-frame telephone survey conducted on landline and cell phones. N13 covers a number of alcohol-related topics, including alcohol consumption and behaviors; effects of alcohol on individual lives and the lives of others; expenditures for alcohol; alcohol-attributed and non-attributed health conditions; and perceptions about alcohol, as well as related factors such as emotional well-being.
This presentation was originally given at the 2015 American Association for Public Opinion Research (AAPOR) Annual Conference.
For more info, visit: www.icfi.com/SurveyResearch
Combining a Probability Based Telephone Sample with an Opt-in Web Panel
1. 1icfi.com |
Combining a Probability Based Telephone Sample with an Opt-in Web Panel
Randal ZuWallack
James Dayton
Naomi Freedner-Maguire
ICF International
Katherine J. Karriker-Jaffe, PhD
Thomas K. Greenfield, PhD
Alcohol Research Group, Public Health Institute
3. 3icfi.com |
Acronyms to Know
• RR – Response Rate
• NAS – National Alcohol Survey
• N1-N14 – Iterations of the NAS (N1 = the first iteration, N14 = the 14th iteration)
• CATI – Computer Assisted Telephone Interview
• RDD – Random Digit Dial
• F2F – Face-to-Face interview
4. 4icfi.com |
National Alcohol Survey (NAS)
• Surveys adults ages 18 and older in the United States to measure a number of alcohol-related topics
• Questionnaire:
– Alcohol consumption and behavior
– Effects of alcohol on individual lives and the lives of others
– Perceptions about alcohol
– Emotional well-being
– …and more.
• Length averages 45 min
6. 6icfi.com |
Motivation
2. Survey length
Current drinkers: 48 min
Former drinkers: 41 min
Abstainers: 34 min
[Figure: break-off rate (0% to 50%) by survey length in minutes (0 to 60) for current drinkers, abstainers, and former drinkers. Marked on the chart: the point where drinker status is determined, and the average length for partial versus mid-terminate interviews.]
7. 7icfi.com |
Motivation
3. Modernizing NAS
• First NAS (“N1”), 1964: F2F
• N10, 2000: RDD, CATI
• N12, 2009: Dual frame, CATI
• N14, 2019: ??
– New technologies
– Emerging methods
– “Fit-for-purpose”
8. 8icfi.com |
Proposed approach
• Combine probability-based RDD with nonprobability Web panel using data fusion (aka statistical matching)
• Why Web panel? Cost
• Why data fusion? Not ALL IN on Web
• IDEA:
1. Use a probability-based RDD sample to identify who’s who in the population
2. Use Web panel to measure behaviors, attitudes (what, where, when?)
9. 9icfi.com |
Data Fusion
[Diagram: two trees. The first splits 18+ adults into current drinkers (wine drinker, beer drinker, spirits drinker; all that apply), former drinkers, and abstainers. The second uses the same split of 18+ adults and adds Who? What? Where? When? detail for wine, beer, and spirits.]
• RDD measures population—%wine drinkers, etc.
• Web measures depth of information conditional on who’s who
10. 10icfi.com |
Data Fusion
Survey A: X, Y
Survey B: X, Z
Match on XA = XB
Matched data: X, Y, Z
• Critical: Conditional independence
– Y is independent of Z given X
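The matching idea on this slide can be sketched in a few lines. This is a minimal illustration, not the procedure used in the study: the function name `fuse`, the field names, and the toy records are all hypothetical, and donors are matched on exact equality of X for simplicity. The fused file is only meaningful if Y is independent of Z given X.

```python
from collections import defaultdict


def fuse(survey_a, survey_b, link_keys):
    """Attach the Z variables of a Survey B donor to each Survey A record
    that has the same values on the linking variables X."""
    donors = defaultdict(list)
    for rec in survey_b:
        donors[tuple(rec[k] for k in link_keys)].append(rec)

    fused = []
    used = defaultdict(int)  # how many times each X cell has donated so far
    for rec in survey_a:
        x = tuple(rec[k] for k in link_keys)
        pool = donors.get(x)
        if not pool:
            continue  # no donor with the same X, so the record stays unmatched
        donor = pool[used[x] % len(pool)]  # rotate through donors in the cell
        used[x] += 1
        z = {k: v for k, v in donor.items() if k not in link_keys}
        fused.append({**rec, **z})
    return fused


# Hypothetical toy data: X = (sex, wine), Y = stress, Z = drinks_per_day
survey_a = [
    {"sex": "F", "wine": 1, "stress": 3},
    {"sex": "M", "wine": 0, "stress": 1},
    {"sex": "F", "wine": 0, "stress": 2},
]
survey_b = [
    {"sex": "F", "wine": 1, "drinks_per_day": 2},
    {"sex": "M", "wine": 0, "drinks_per_day": 4},
]
fused = fuse(survey_a, survey_b, ["sex", "wine"])
```

In this toy example the third Survey A record has no donor with the same X and is left out, which mirrors the practical problem of sparse matching cells.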
11. 11icfi.com |
Data
• NAS N13 extract
– Dual-frame RDD, CATI
– Average interview length: nearly 45 minutes
– National sample, oversamples in geographic areas with high black or Hispanic densities
– Data collection ongoing since October 2014. Data extracted on December 23, 2014
– 3358 completed interviews (1336 cell phone, 2022 landline).
• NAS Web experiment
– Shortened version of the N13 questionnaire focused on alcohol consumption.
– Average interview length: 20 minutes
– Conducted Jan 23-27, 2015 by Schlesinger Associates (http://www.schlesingerassociates.com/online_solutions.aspx)
– 841 completed surveys
12. 12icfi.com |
Methods
• Focus on
– Current drinkers
• RDD: 1932 interviews out of 3358 (57%)
• Web: 657 out of 841 (78%)
– Alcohol consumption and behaviors
• How often drinking wine in past 12 months
• How often drinking beer in past 12 months
• How often drinking spirits in past 12 months
• How often drinking any alcohol in past 12 months
• Typical number of drinks when drinking wine
• Typical number of drinks when drinking spirits
• Typical number of drinks when drinking beer
• Number of times drinking 12 or more drinks
• Number of times drinking 8-11 drinks
• Number of times drinking 5-7 drinks
• Number of times getting drunk
• Maximum number of drinks in a single day
• How often do you drink when spending a quiet evening at home?
• How often do you drink at bars, taverns, or cocktail lounges?
• How often do you drink when spending time with friends in a public place, such as a park, street, or parking lot?
• How often do you drink at a party in someone’s home?
• Question 1: Are alcohol consumption and behaviors conditionally independent of other topics on the survey?
13. 13icfi.com |
Conditional Independence
Other topics on the survey:
• Effects of drinking
• Help for drinking problem
• Drug use
• Perceptions of drinking
• Drinking injuries & illnesses
• Emotional health
• Education
• Other people’s drinking
• Stressful events
• Neighborhood characteristics
• Ethnic experiences
Correlation Analysis
1. We created 91 “other” variables (+16 drinking behavior variables)
2. Formed variable clusters using oblique principal component cluster analysis (SAS PROC VARCLUS)
– Variables assigned to clusters based on correlation with other variables in the cluster
– Used Spearman rank correlation
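For readers unfamiliar with the Spearman rank correlation used as the input to the clustering, a small sketch follows. It covers only the correlation itself, not PROC VARCLUS, and the helper names (`average_ranks`, `pearson`, `spearman`) are illustrative rather than taken from the study. Spearman's coefficient is the Pearson correlation of the ranks, with tied values sharing their average rank, which suits ordinal survey items.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# A monotone but nonlinear relationship still has rank correlation 1
rho = spearman([1, 2, 3, 4], [1, 4, 9, 16])
```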
14. 14icfi.com |
Conditional Independence
• As expected, the 16 drinking behaviors clustered together
• 1 perception item clustered with them: “How much do you agree or disagree with the following statements…I drink to be sociable.”
• The drinking variables formed 3 clusters:
– Beer consumption and heavy drinking
– Wine consumption and home drinking
– Social drinking (bars, with friends, at parties)
• “Other” variables formed 22 other clusters
15. 15icfi.com |
Conditional Independence
3. Calculated partial correlations of drinking clusters with “other” clusters
– Used 1st principal component scores for each cluster (outcome from the clustering algorithm)
– Imputation
– 3 stages for partial correlations
1. Design variables
2. + Wine, beer, spirits indicators and demos and general health status
3. + Key drinking variables

Squared Correlations                            Mean     Min      Max
Design variables                                0.0260   0.0000   0.1889
Design and demographics                         0.0147   0.0000   0.1158
Design, demographics, and drinking variables    0.0069   0.0000   0.0452

Control variables:
Gender, Age, Race/ethnicity, Tenure, FT student under 30, Presence of children, Marital status, Educational attainment, Employment status
General health, Quality of life, Physical activity, Beer Drinker, Wine Drinker, Spirits Drinker, Drink quiet eve at home, Kept drinking wanted to stop
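The staged partial correlations can be illustrated with a short sketch. It is an assumption-laden toy, not the study's code: the function names and the data are hypothetical, and it uses the standard recursion that removes one control variable at a time. Squaring the result gives the kind of squared correlation reported in the table.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def partial_corr(data, x, y, controls=()):
    """Correlation of data[x] and data[y] after removing the linear effect
    of the control variables, peeling off one control per recursion step."""
    if not controls:
        return pearson(data[x], data[y])
    *rest, w = controls
    r_xy = partial_corr(data, x, y, rest)
    r_xw = partial_corr(data, x, w, rest)
    r_yw = partial_corr(data, y, w, rest)
    return (r_xy - r_xw * r_yw) / ((1 - r_xw ** 2) * (1 - r_yw ** 2)) ** 0.5


# Hypothetical scores: both outcomes depend on the control, and their
# leftover parts are unrelated, so the raw correlation is spurious.
data = {
    "control": [-3, -1, 1, 3],
    "drinking_score": [-2, -2, 0, 4],
    "other_score": [-4, 2, -2, 4],
}
raw = partial_corr(data, "drinking_score", "other_score")
adjusted = partial_corr(data, "drinking_score", "other_score", ["control"])
```

Here the raw correlation is sizeable while the partial correlation is essentially zero, the pattern one would hope to see for variables that are conditionally independent given X.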
16. 16icfi.com |
The Split
• Linking variables (X): Wine, beer, spirits indicators and demos and general health status and key drinking variables
• RDD (Y): 52 total variables conditionally independent of drinking behaviors
– “Other” clusters where 0 of the 3 partial correlations were significant at 0.05 level (7 clusters representing 28 variables)
– Visual inspection: 7 more clusters (24 variables) were marginally correlated (max<0.08)
• Web (Z): 55 total variables
– 3 drinking behavior clusters (17 variables)
– 8 clusters correlated with drinking behaviors (38 variables)
• Question 1: Are alcohol consumption and behaviors conditionally independent of other topics on the survey? YES and NO
• Question 2: Conditional on X, does Web = Phone (Zw=Zp)?
17. 17icfi.com |
Web = Phone?
• We explore differences in phone response versus web response using adjusted means and frequencies
• Dependent: drinking variables
• Independent:
– Controls: wine, beer, spirits indicators, demos, general health status, and key drinking variables
– Web or phone
• Do differences remain between the RDD phone sample and the Web panel after controlling for the matching variables?
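One simple way to see what "adjusted" means here is direct standardization, sketched below. This is a swapped-in, simpler technique than a regression with control variables and is not claimed to be the method behind the slides; the function name, field names, and records are hypothetical. Each mode's mean is computed within cells of the control variables and then averaged using the same cell weights for both modes.

```python
from collections import defaultdict


def standardized_means(records, outcome, mode_key, control_keys):
    """Mean of `outcome` per mode, weighting control cells identically
    across modes (cells are weighted by their combined sample size)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    cell_sizes = defaultdict(int)
    for rec in records:
        cell = tuple(rec[k] for k in control_keys)
        sums[(cell, rec[mode_key])] += rec[outcome]
        counts[(cell, rec[mode_key])] += 1
        cell_sizes[cell] += 1

    modes = sorted({rec[mode_key] for rec in records})
    # keep only cells observed in every mode so the comparison is like-for-like
    shared = [c for c in cell_sizes
              if all(counts.get((c, m), 0) > 0 for m in modes)]
    total = sum(cell_sizes[c] for c in shared)
    return {
        m: sum(cell_sizes[c] / total * sums[(c, m)] / counts[(c, m)]
               for c in shared)
        for m in modes
    }


# Hypothetical records: one control variable (beer_drinker), two modes
records = [
    {"mode": "phone", "beer_drinker": 1, "typical_drinks": 2},
    {"mode": "phone", "beer_drinker": 1, "typical_drinks": 2},
    {"mode": "web", "beer_drinker": 1, "typical_drinks": 3},
    {"mode": "phone", "beer_drinker": 0, "typical_drinks": 1},
    {"mode": "web", "beer_drinker": 0, "typical_drinks": 2},
    {"mode": "web", "beer_drinker": 0, "typical_drinks": 2},
]
adjusted = standardized_means(records, "typical_drinks", "mode", ["beer_drinker"])
```

A mode difference that survives this kind of adjustment cannot be explained by the two samples having different mixes of the control variables.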
18. 18icfi.com |
Adjusted drinker distribution RDD vs Web
• Differences in the distribution of drinker types: web consistently higher
• But drinker type will be a control in the matching
• What about consumption for these drinker types?

Drinker types   RDD   Web
Wine            79%   86%
Beer            65%   72%
Spirits         67%   76%
19. 19icfi.com |
Adjusted Frequencies
• Higher weekly consumption for Web vs RDD
– Spirits (not shown) is same pattern

Frequency of beer drinking   RDD     Web
<1 per mo                    26.6%   22.3%
1 per mo                     20.5%   19.1%
2-3 per mo                   19.9%   20.3%
1 per wk                     21.7%   24.5%
3-4 per wk or more           11.3%   13.8%

Frequency of wine drinking   RDD     Web
<1 per mo                    31.0%   23.3%
1 per mo                     20.8%   18.7%
2-3 per mo                   18.9%   19.9%
1 per wk                     16.9%   20.7%
3-4 per wk or more           12.5%   17.4%
20. 20icfi.com |
Adjusted Means

Typical drinks on drink day   RDD    Web
Wine drinkers                 2.01   2.27
Beer drinkers                 2.57   2.75
Spirits drinkers              2.38   2.56
Max drinks                    3.51   3.51

p-values shown on the chart: <.0001, 0.9733, 0.0028, 0.0195
21. 21icfi.com |
Adjusted Frequencies
• Question 2: Conditional on X, does Web = Phone (Zw=Zp)? YES AND NO

Heavy drinking in past 12 months   RDD     Web
12+ drinks                         2.6%    1.9%
8-11 drinks                        5.2%    4.7%
5-7 drinks                         14.8%   15.8%
Drunk                              38.3%   54.0%
22. 22icfi.com |
Matching
• Used propensity score matching due to small sample sizes and many combinations of age, race/ethnicity, educational attainment, etc.
– Allowed multiple matches to Web, but only if the Pscore distance was <=0.01
– Not all web cases used (distance too far)
• Ex.: Female wine drinkers: 831 RDD, 399 Web

                                       RDD     RDD/Web
Typical number of wine drinks          1.46    1.79
Drank enough to feel drunk (past yr)   35.5%   49.9%

Frequency of wine drinking   RDD     RDD/Web
<1 per mo                    36.2%   27.6%
1 per mo                     19.0%   13.2%
2-3 per mo                   16.6%   18.0%
1 per wk                     14.1%   20.2%
3-4 per wk or more           14.1%   21.0%
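The matching rule on this slide can be sketched as caliper matching on precomputed propensity scores. This is a hedged illustration only: it assumes the 0.01 threshold is a maximum distance between an RDD case's score and a Web case's score, and the function name, case IDs, and scores are hypothetical. Estimating the scores themselves (for example with a logistic regression on the linking variables) is outside the sketch.

```python
def caliper_match(rdd_scores, web_scores, caliper=0.01):
    """Map each RDD case to every Web case whose propensity score lies
    within `caliper` of its own; also report Web cases never used."""
    matches = {}
    for rdd_id, p in rdd_scores.items():
        near = [web_id for web_id, q in web_scores.items()
                if abs(p - q) <= caliper]
        if near:
            matches[rdd_id] = near  # multiple Web matches are allowed
    used = {web_id for ids in matches.values() for web_id in ids}
    unused = sorted(set(web_scores) - used)  # distance too far from all RDD cases
    return matches, unused


# Hypothetical propensity scores keyed by case ID
rdd_scores = {"r1": 0.30, "r2": 0.52, "r3": 0.90}
web_scores = {"w1": 0.305, "w2": 0.295, "w3": 0.525, "w4": 0.70}
matches, unused = caliper_match(rdd_scores, web_scores)
```

In this toy run one RDD case finds no Web case inside the caliper and one Web case is never used, the two kinds of loss the slide alludes to.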
23. 23icfi.com |
Matching
• Ex.: Male beer drinkers
– 682 RDD, 165 Web

                                       RDD     RDD/Web
Typical number of beer drinks          2.66    2.66
Drank enough to feel drunk (past yr)   52.8%   65.4%

Frequency of beer drinking   RDD     RDD/Web
<1 per mo                    19.9%   18.5%
1 per mo                     17.6%   12.0%
2-3 per mo                   18.6%   14.7%
1 per wk                     26.4%   27.5%
3-4 per wk or more           17.5%   27.3%
24. 24icfi.com |
Cost Benefit

RDD survey length (min)           45        25
Landline (completes per hour)     0.265     0.375
Cell phone (completes per hour)   0.175     0.205
Wt avg, 40% cell                  0.229     0.307
RDD CPI ($30/hr)                  $131.00   $97.72
Web CPI                           $0.00     $7.50
Total CPI                         $131.00   $105.22
Savings (relative cost)           1.00      0.80

• Recruit more than needed for matching; assume 50% more for cost analysis.
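The arithmetic behind the cost table can be reproduced as a check. The sketch assumes, based on how the numbers fit together, that the landline and cell phone rows are completes per interviewer hour and that the last row is the ratio of total CPI to the 45-minute RDD-only CPI; the variable and function names are illustrative.

```python
HOURLY_RATE = 30.0  # dollars per interviewer hour, from the table
CELL_SHARE = 0.40   # share of completes from the cell phone frame


def rdd_cpi(landline_rate, cell_rate):
    """RDD cost per interview from frame-specific completes per hour."""
    blended = (1 - CELL_SHARE) * landline_rate + CELL_SHARE * cell_rate
    return HOURLY_RATE / blended


cpi_45 = rdd_cpi(0.265, 0.175)   # 45-minute RDD-only design
cpi_25 = rdd_cpi(0.375, 0.205)   # 25-minute RDD interview
web_cpi = 7.50                   # Web CPI from the table
total_25 = cpi_25 + web_cpi      # fused design: shorter RDD plus Web
relative_cost = total_25 / cpi_45

print(f"RDD CPI: ${cpi_45:.2f} vs ${cpi_25:.2f}")
print(f"Total CPI (25 min + Web): ${total_25:.2f}")
print(f"Relative cost: {relative_cost:.2f}")
```

Under these assumptions the fused design costs about 80 cents for every dollar spent on the 45-minute RDD-only design, matching the bottom row of the table.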
25. 25icfi.com |
Summary
• Data fusion is a model-based process of putting two (or more) disparate data sets together if they are conditionally independent
• The second condition of our data fusion model is that the data sources must represent the same population
– Web and phone responses are different even after adjusting for the demographic differences and drinker types.
• Is it mode effects or model failure?
• Benefits of this model:
– Average interview length reduced
– Considerable cost savings
26. 26icfi.com |
Mode Effects
• Higher consumption on the web
– Higher frequency of drinking on web
– No “January effect”: phone data collected in fall 2014, web in Jan 2015.
– Order effects? Web: primacy. Phone: recency.
– Social desirability?
B2c. And how often do you usually have beer or malt beverages?
02 More than once a day
03 Once a day
04 Nearly every day
05 Three or Four times a week
06 Once or twice a week
07 Two or three times a month
08 About once a month
09 Less than once a month but at least once a year
10 Less than once a year
11 Never
27. 27icfi.com |
Mode effects
– Social desirability?
• self-administered vs interviewer-administered
• direction and magnitude consistent with literature
• Frequency of getting drunk
– +16 points on web
B21. How often in the last twelve months did you drink enough to feel drunk?
01 Every day or nearly every day
02 Three to four times a week
03 Once or twice a week
04 Once to three times a month
05 Less than once a month
06 Once in those 12 months
07 Never in those 12 months
28. 28icfi.com |
Next Steps
• Confirm the conditional independence for the 2 variable sets
– Based on a second larger sample
• Understand mode effects
– Is Web doing a better job at measuring consumption?
– Ask the NAS questions to a split sample: 50% CATI, 50% Web
• Must be same population
• Panel variance
– Want to explore the consistency of the NAS measures
• Multiple samples from the same panel (within panel variability)
• For different panels (between panel variability)
29. 29icfi.com |
Acknowledgements
• Thanks to Schlesinger Associates for recruiting web panelists free of charge for this experiment
– Special thanks to Svetla Ninova and Jason Horine from Schlesinger Associates
• For more information, please contact:
– Randy.Zuwallack@icfi.com
• Visit: icfi.com/SurveyResearch