Random digit dialing cell phone surveys and surveillance systems: data quality, data weighting strategies, and bias
1. Random digit dialing cell phone surveys and surveillance systems: data quality, data weighting strategies, and bias
Cristine D. Delnevo, PhD, MPH & Daniel A. Gundersen, MA
UMDNJ‐School of Public Health
Randal S. ZuWallack, MS & Frederica R. Conrey, PhD
ICF Macro International
Presented at 137th Annual Meeting & Exposition
Philadelphia, PA
November 7‐11, 2009
Work supported in part by the National Cancer Institute (R21CA129474) and a contract from the New
Jersey Department of Health and Senior Services, through funding from the Cigarette Tax.
2. Wireless substitution (US), 2005–2008
Source: Blumberg & Luke. Wireless substitution: Early release of estimates from the National Health Interview Survey, July–December 2008. National Center for Health Statistics. May 2009.
3. Demographic differences
(Chart: Wireless substitution by age (and time))
• Gender: men more likely than women to be wireless only
• SES: Adults living in poverty (30.9%) and near poverty (23.8%) more likely than higher‐income adults (16.0%) to be wireless only
• Region: Wireless substitution highest in South (21.3%) and Midwest (20.8%) vs. Northeast (11.4%) or West (17.2%)
• Race/Ethnicity: Wireless substitution highest among black (21.4%) and Hispanic (25.0%) adults vs. non‐Hispanic white adults (16.6%)
Source: Blumberg & Luke. Wireless substitution: Early release of estimates from the National Health Interview Survey, July–December 2008. National Center for Health Statistics. May 2009.
4. Biased health estimates?
• Potential for biased health estimates due to sample undercoverage remains a real, growing threat to RDD health surveys
• Cell‐phone‐only adults also differ with respect to health behaviors, so the validity of some health estimates based on traditional RDD surveys is increasingly questionable
5. Health estimates by phone status
NHIS July–December 2007 (%)
Health indicator | Has a landline telephone | Wireless-only | No telephone
5+ alcoholic drinks in 1 day | 17.7 | 37.3 | 27.1
Current smoker | 18.0 | 30.6 | 38.6
Uninsured | 13.7 | 28.7 | 44.1
Has a usual place for care | 87.5 | 68.0 | 61.8
Flu vaccination | 32.7 | 16.6 | 20.9
Ever tested for HIV | 34.7 | 47.6 | 45.8
Source: Blumberg SJ, Luke JV. Coverage bias in traditional telephone surveys of low‐income and young adults. Public Opin Q. 2007;71:734–749.
6. Cigarette smoking among young adults, 2003–2005, NHIS & BRFSS
Source: Delnevo, Gundersen & Hagman (2008). Declining estimated prevalence of alcohol drinking and smoking among young adults nationally: artifacts of sample undercoverage? American Journal of Epidemiology.
7. New challenges: Wireless Mostly?
(Chart: Telephone Status, NHIS July–December 2008)
• The percentage of adults living in wireless‐mostly households has been increasing
• Who are they? (demographic & health behaviors?)
• Will they respond to landline surveys?
Source: Blumberg & Luke. Wireless substitution: Early release of estimates from the National Health Interview Survey, July–December 2008. National Center for Health Statistics. May 2009.
8. Reactions?
• Starting in 2009, the Centers for Disease Control and Prevention (CDC) is requiring states to incorporate cell phone interviews in their regular BRFSS sample
• Yet there are no widely accepted methods of evaluating data quality or data weighting, particularly for state and local area surveys.
• AAPOR Cell Phone Task Force report
9. This special session:
• Analysis of data quality of cell phone surveys
• Demonstrate weighting procedures for merging cell phone samples with landline samples
• An assessment of bias in landline‐only surveys, and
10. Assessing Data Quality of Cell Phone Random Digit Dial Surveys
Frederica R. Conrey
Randy ZuWallack
11. Data Quality
• Quality of Responses
• Quality of Sample
12. 2008 New Jersey Adult Tobacco Survey
• Random Digit Dial landline (RR=19%) and cell phone (RR=16%)
• Short version
– 49 Questions (min=37; max=70)
– 534 Cell completes
– 468 Landline completes
13. Survival Analysis
• Different people get different surveys because of skip patterns
• Survival analysis
– Measures the impact of survey mode on nonresponse
– Controls for differences in survey length
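The survival-analysis idea above can be sketched with a Kaplan–Meier estimator over question number. This is a minimal illustration, not the authors' code: it assumes each respondent contributes the question at which they first failed (an item nonresponse or a break-off), or is censored at the last question their skip path reached. Censoring is what lets respondents who were asked different numbers of questions share one curve. The function name `kaplan_meier` and the input layout are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate over question number.

    times  -- question number at which each respondent failed or was censored
    events -- 1 if the respondent failed at that question, 0 if censored
              (e.g., reached the end of their skip path without failing)
    Returns a list of (question, survival probability) at each failure time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    failures = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # every respondent leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = failures.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]  # failures and censorings at t leave after t
    return curve
```

In practice one curve would be computed per phone mode (cell and landline) and the two compared.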
15. Item Non-Response
Failure rates by phone mode
Phone mode | Mean | Median | Std. dev.
Cell | 3.3% | 2.2% | 7.4%
Landline | 3.4% | 2.0% | 5.5%
Survival predicted by phone mode: Hazard=1.00, p>.95
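A per-respondent item nonresponse rate like the one summarized above can be sketched as the share of asked items that received a non-substantive answer. The answer codes `"DK"` and `"REF"` are hypothetical, and it is an assumption that the slide's standard deviation is the sample (n − 1) version used by `statistics.stdev`.

```python
import statistics

# Hypothetical codes for "don't know" and "refused".
NONSUBSTANTIVE = {"DK", "REF"}


def item_nonresponse_rate(answers):
    """Share of asked items answered DK or refused.

    answers -- responses to the items this respondent was actually asked;
               skipped items are simply absent, so skip patterns are respected.
    """
    if not answers:
        raise ValueError("respondent was asked no items")
    missing = sum(1 for a in answers if a in NONSUBSTANTIVE)
    return missing / len(answers)


def summarize(rates):
    """Mean, median, and sample standard deviation of respondent rates."""
    return {
        "mean": statistics.mean(rates),
        "median": statistics.median(rates),
        "std": statistics.stdev(rates),
    }
```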
16. Open Ended Response
• Open end reports of the events of 2 recalled commercials
– 251 respondents were asked at least one open ended question
Total responses per open end by phone mode (OE = open end; DK = don't know):
Phone mode | Responses / OE | DK / OE
Cell | .52 | .49
Landline | .48 | .55
p | .56 | .33
18. Unit Non-Response
Data quality is threatened if
– Response rates are low AND
– The people who DO NOT respond are different from those who DO.
If cell phone respondents are less likely to respond, and differ from landline respondents, then there is nonresponse bias.
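The two conditions on this slide correspond to a standard deterministic expression for nonresponse bias in a respondent mean (a textbook formulation, not one given on the slides). With N population members, M of whom would not respond, Ȳ_r the mean among respondents and Ȳ_m the mean among nonrespondents:

```latex
\operatorname{Bias}(\bar{y}_r) = \bar{Y}_r - \bar{Y}
  = \frac{M}{N}\left(\bar{Y}_r - \bar{Y}_m\right)
```

The bias vanishes if either the nonresponse share M/N or the respondent–nonrespondent difference is zero, which is why both conditions are needed.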
19. Survival Analysis
• Cell and landline respondents may get different surveys
• Response rates alone don't tell the story
• Survival analysis tells whether cell respondents are more likely to break off given the same survey length
20. Survival Model
(Chart: percentage of sample breaking off, by number of survey questions (0–80), cell vs. landline; p<.001)
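The slides report p<.001 for the cell vs. landline break-off curves but do not say which test produced it (the hazard reported for item nonresponse suggests a proportional-hazards model may have been used). As a swapped-in illustration, the sketch below implements a two-sample log-rank test, a common nonparametric comparison of survival curves. The `(time, event)` data layout is an assumption: time is the question number reached, event is 1 for a break-off and 0 for a completed (censored) interview.

```python
import math


def log_rank(group1, group2):
    """Two-sample log-rank test.

    Each group is a list of (time, event) pairs: time = question number
    reached, event = 1 for a break-off, 0 for censored (finished).
    Returns (chi-square statistic with 1 df, p-value).
    """
    pooled = [(t, e, 1) for t, e in group1] + [(t, e, 2) for t, e in group2]
    event_times = sorted({t for t, e, _ in pooled if e})
    observed1 = expected1 = variance = 0.0
    for t in event_times:
        at_risk = [(tt, e, g) for tt, e, g in pooled if tt >= t]
        n = len(at_risk)
        n1 = sum(1 for _, _, g in at_risk if g == 1)
        d = sum(1 for tt, e, _ in at_risk if tt == t and e)
        d1 = sum(1 for tt, e, g in at_risk if tt == t and e and g == 1)
        observed1 += d1
        expected1 += d * n1 / n  # expected break-offs in group 1 under H0
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("no information to compare the groups")
    stat = (observed1 - expected1) ** 2 / variance
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```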
21. What does a difference in survey survival mean?
• Cell respondents quit sooner than landline respondents.
• The sample under‐represents cell phone users
• The longer the survey, the worse the nonresponse bias
• The solution?
– Careful weighting
– Short surveys
22. In a population study of tobacco use behavior…
Minimal difference between cell and landline in response quality
• No difference emerged in item nonresponse
• No difference emerged in richness of open end responses.
Substantial difference between cell and landline in response rate
• Sample quality may be threatened if cell phone surveys are too long or weighted incorrectly
24. Thanks
• Cris Delnevo
• Dan Gundersen
• NCI (R21CA129474), New Jersey Dept of Health and Senior Services
25. Dual Frame
A. Adults in landline households with no cell phone,
B. Adults in landline households with a cell phone, and
C. Adults in non‐landline households with a cell phone (cell only).
26. Dual Frame
Common designs:
Dual frame w/ no overlap: Landline (A+B) + Cell (C)
Dual frame w/ overlap: Landline (A+B) + Cell (B+C)
Uncommon design:
Dual frame w/ no overlap: Landline (A) + Cell (B+C)
27. Weighting Challenges
• Challenge 1: How do we put the dual frames together?
• Challenge 2: Differential Nonresponse
28. Challenge 1
• How do we put the dual frames together?
• No overlap
– Estimate of cell‐only population size?
• Internal estimate
• External estimate: NHIS (Blumberg et al.)
• With overlap
– Must determine group membership
– Adjust for multiple selection probabilities
– Estimate of phone group population sizes?
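For the overlap design, dual users (group B) can be selected from either frame, so their weights must be adjusted to avoid double counting. One common approach, shown here as an illustration rather than the authors' method, is a Hartley-type composite: landline-frame dual users keep a fraction λ of their weight and cell-frame dual users keep 1 − λ. The slides do not say how λ should be chosen; λ = 0.5 is an arbitrary default, and the `Respondent` type is hypothetical.

```python
from dataclasses import dataclass

# Groups that can be sampled from each frame (A/B/C as defined on the Dual Frame slide).
ALLOWED = {"landline": {"A", "B"}, "cell": {"B", "C"}}


@dataclass
class Respondent:
    frame: str     # "landline" or "cell": frame the case was sampled from
    group: str     # "A" landline-only, "B" dual user, "C" cell-only
    weight: float  # design weight within its own frame
    y: float       # outcome, e.g. 1 = current smoker, 0 = not


def composite_weight(r: Respondent, lam: float) -> float:
    """Split dual users' weight between the two frames."""
    if r.group not in ALLOWED[r.frame]:
        raise ValueError(f"group {r.group} cannot come from the {r.frame} frame")
    if r.group != "B":
        return r.weight
    return r.weight * (lam if r.frame == "landline" else 1.0 - lam)


def composite_mean(sample, lam=0.5):
    """Weighted mean of y over both frames using composite weights."""
    weights = [composite_weight(r, lam) for r in sample]
    return sum(w * r.y for w, r in zip(weights, sample)) / sum(weights)
```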
29. Cell Survey
• “In addition to your cell phone, is there at least one telephone inside your home that is currently working and is not a cell phone? Do not include telephones only used for business or telephones only used for computers or fax machines.”
– ‘yes’ = dual user
– ‘no’ = cell‐only
30. Landline Survey
• “In addition to your residential landline telephone, do you also use one or more cell phone numbers?”
– ‘yes’ = dual user
– ‘no’ = landline only
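The two screener questions above map each respondent to one of the A/B/C groups defined on the Dual Frame slide. A minimal sketch; the function name and the boolean encoding of the answer are hypothetical.

```python
def phone_group(frame: str, answered_yes: bool) -> str:
    """Map the sampling frame and screener answer to group A, B, or C.

    frame        -- "cell" (asked the Cell Survey question) or
                    "landline" (asked the Landline Survey question)
    answered_yes -- True if the respondent reported also having the other phone type
    """
    if frame == "cell":
        return "B" if answered_yes else "C"  # dual user vs. cell-only
    if frame == "landline":
        return "B" if answered_yes else "A"  # dual user vs. landline-only
    raise ValueError(f"unknown frame: {frame!r}")
```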
31. Example 1: Colorado
• Combine with BRFSS
• Group membership
– Known for cell
– Unknown for landline
• Limited to dual frame w/ no overlap
– Used 15% (NHIS state estimates) for merging landline and cell
– Poststratified dual sample to age and sex.
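The merging step in the Colorado example can be sketched as rescaling each frame so it represents a fixed population share: landline cases (A+B) get 85% and cell-only cases (C) get the external 15% estimate. This is an illustrative sketch under that assumption; the function name and the `population` parameter are hypothetical, and the subsequent poststratification to age and sex is not shown.

```python
def scale_to_phone_shares(landline_weights, cell_only_weights,
                          cell_only_share=0.15, population=1.0):
    """Rescale two frames so they represent fixed shares of the population.

    landline_weights  -- design weights for all landline-frame cases (A+B)
    cell_only_weights -- design weights for cell-frame cases screened as cell-only (C)
    cell_only_share   -- external estimate of the cell-only share (e.g. NHIS)
    population        -- adult population total (1.0 yields weights as shares)
    """
    if not 0 < cell_only_share < 1:
        raise ValueError("cell_only_share must be between 0 and 1")
    land_factor = (1 - cell_only_share) * population / sum(landline_weights)
    cell_factor = cell_only_share * population / sum(cell_only_weights)
    return ([w * land_factor for w in landline_weights],
            [w * cell_factor for w in cell_only_weights])
```

Relative weights within each frame are preserved; only the totals change.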
32. Example 2
• You are midway through a landline survey and want to add cell phones. You don't know who has a cell phone and who doesn't. What are your options?
1) Add cell only
2) Add cell and dual‐users
33. Challenge 2
• Differential Nonresponse
• Cell‐only adults are overrepresented when conducting cell phone surveys, due to differences in
– Contact rate
– Cooperation rate
• Those who rely more on their cell phone will be easier to reach.
38. Our Goal
• Rebalance on cell reliance
• Rebalance on landline reliance
39. Measuring Telephone Reliance
• Cell only, landline only
• Classify dual users
– “Of all the telephone calls that you receive, are…”
• All or almost all calls received on a cell phone? (cell‐mostly)
• Some received on a cell phone and some on a regular landline phone? (true‐dual)
• Very few or none received on a cell phone? (landline‐mostly)
40. Telephone Reliance
From landline phone to cell phone:
Landline Only (0) | Landline Mostly (1) | True Dual (2) | Cell Mostly (3) | Cell Only (4)
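The reliance scale above can be assigned from phone ownership plus the call-share question for dual users. A minimal sketch; the answer strings in `CALL_SHARE` are hypothetical codes for the three response options.

```python
from enum import IntEnum


class Reliance(IntEnum):
    LANDLINE_ONLY = 0
    LANDLINE_MOSTLY = 1
    TRUE_DUAL = 2
    CELL_MOSTLY = 3
    CELL_ONLY = 4


# Hypothetical codes for "Of all the telephone calls that you receive, are..."
CALL_SHARE = {
    "all or almost all on cell": Reliance.CELL_MOSTLY,
    "some on cell, some on landline": Reliance.TRUE_DUAL,
    "very few or none on cell": Reliance.LANDLINE_MOSTLY,
}


def reliance(has_landline: bool, has_cell: bool, call_share=None) -> Reliance:
    """Score telephone reliance from 0 (landline only) to 4 (cell only)."""
    if has_landline and has_cell:
        if call_share not in CALL_SHARE:
            raise ValueError("dual users need a call-share answer")
        return CALL_SHARE[call_share]
    if has_cell:
        return Reliance.CELL_ONLY
    if has_landline:
        return Reliance.LANDLINE_ONLY
    raise ValueError("respondent reports no telephone service")
```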
41. Response Propensity
• Adjust for differential nonresponse by benchmarking against NHIS
• Logistic regression model
– Dependent: Survey type
• 1 = observe cell user in national cell phone survey
• 0 = observe cell user in NHIS
– Independent: Cell phone reliance (1–4), age, race
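A simplified version of this propensity model can be sketched with reliance (1–4) as the only, linear predictor; age and race are dropped, so results will not match the Colorado adjusted figures exactly. The grouped inputs are the shares from the Data sources slide. The NHIS sample size is not reported, so `N_NHIS = 500` is a placeholder; with an intercept in the model it shifts only the intercept, not the slope. Cell-sample cases are then reweighted by the inverse odds of appearing in the cell survey, exp(−(b0 + b1·k)); the intercept cancels when the shares are renormalized.

```python
import math

# Shares of cell users by reliance score (Data sources slide).
CELL_SHARES = {1: 0.08, 2: 0.27, 3: 0.23, 4: 0.42}  # national cell sample
NHIS_SHARES = {1: 0.23, 2: 0.42, 3: 0.17, 4: 0.18}  # NHIS cell users
N_CELL = 500
N_NHIS = 500  # ASSUMPTION: not reported; affects only the intercept


def fit_logistic(rows, max_iter=50, tol=1e-10):
    """Newton-Raphson fit of P(y=1) = 1 / (1 + exp(-(b0 + b1*x))).

    rows -- (x, y, count) tuples, i.e. grouped binary data
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, n in rows:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid, info = n * (y - p), n * p * (1.0 - p)
            g0 += resid
            g1 += resid * x
            h00 += info
            h01 += info * x
            h11 += info * x * x
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if max(abs(step0), abs(step1)) < tol:
            break
    return b0, b1


def nr_adjust(distribution, b1):
    """Reweight a cell-sample distribution (in %) by inverse odds of selection."""
    raw = {k: share * math.exp(-b1 * k) for k, share in distribution.items()}
    total = sum(raw.values())
    return {k: 100 * v / total for k, v in raw.items()}


rows = ([(k, 1, N_CELL * s) for k, s in CELL_SHARES.items()]
        + [(k, 0, N_NHIS * s) for k, s in NHIS_SHARES.items()])
b0, b1 = fit_logistic(rows)
colorado = {1: 9, 2: 26, 3: 19, 4: 46}  # Colorado cell sample, w/o NR adj
adjusted = nr_adjust(colorado, b1)
```

Applying the national slope to the Colorado sample reflects the stated assumption that the national cell sample measures the relevant odds ratios.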
42. Data sources
Cell phone reliance | National cell sample (n=500) | NHIS
Landline mostly | 8% | 23%
True dual | 27% | 42%
Cell mostly | 23% | 17%
Cell only | 42% | 18%
Cell users | 100% | 100%
43. Applying the model
• Applied to same data: poststratification
• Applied to independent data
– Assumption: national cell sample measures the odds ratio for observing a cell‐only respondent in a cell sample relative to a dual user.
• State, local surveys
44. Applying the model
Colorado cell sample (NR adj = nonresponse adjustment):
Cell phone reliance | w/o NR adj | w/ NR adj
Landline mostly (1) | 9% | 26%
True dual (2) | 26% | 37%
Cell mostly (3) | 19% | 15%
Cell only (4) | 46% | 21%
Total cell sample | 100% | 100%
45. Applying the model
• Assume landline only is 20% (we don't know)
Phone group | CO
Landline only | 20%
Landline mostly | 21%
True dual | 30%
Cell mostly | 12%
Cell only | 17% (NHIS state estimate = 15%)
Total population | 100%
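The Colorado figures above appear to follow from simple arithmetic: cell users make up the 80% of adults who are not landline-only, split according to the nonresponse-adjusted cell-sample distribution. The sketch below reproduces the rounded values on the slide (for example, cell only = 21% × 0.8 = 16.8%, about 17%); the function name is hypothetical.

```python
def population_distribution(cell_sample_pct, landline_only_pct):
    """Combine an assumed landline-only share with the NR-adjusted cell sample.

    Cell users make up (100 - landline_only_pct)% of adults, split according
    to the nonresponse-adjusted cell-sample distribution (in %).
    """
    cell_users = (100 - landline_only_pct) / 100
    result = {"Landline only": float(landline_only_pct)}
    for group, pct in cell_sample_pct.items():
        result[group] = pct * cell_users
    return result


# Colorado cell sample w/ NR adj (sums to 99% because of rounding).
co_adjusted = {"Landline mostly": 26, "True dual": 37, "Cell mostly": 15, "Cell only": 21}
co_population = population_distribution(co_adjusted, landline_only_pct=20)
```

The resulting cell-only share (16.8%) can then be checked against the NHIS state estimate of 15%.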
46. City Sample (%)
Phone group | Landline sample | Cell sample w/o NR adj | Cell sample w/ NR adj | Combined samples w/o NR adj | Combined samples w/ NR adj
Cell‐only | ‐ | 43.5 | 18.9 | 35.5 | 13.4
Cell‐mostly | 12.4 | 25.1 | 19.8 | 13.8 | 12.2
True Dual | 30.1 | 23.6 | 35.5 | 18.7 | 25.1
Landline‐mostly | 19.1 | 7.9 | 25.8 | 9.9 | 17.2
Landline‐only | 38.4 | ‐ | ‐ | 23.9 | 32.1
47. Conclusions
• Dual‐frame
– There are ways to combine the data, even when we don't have a full picture of group membership.
• Differential nonresponse
– Response propensity model rebalances the cell sample based on cell reliance.
– Can be applied at state and local levels when no benchmarks exist.
– Next steps: explore a response propensity model for landline.
49. Examining the bias in landline only surveys: How does the cell phone only population differ from the landline population on health indicators, and are estimates from landline surveys biased?
Daniel A. Gundersen, MA, UMDNJ‐SPH
Cristine D. Delnevo, PhD, MPH, UMDNJ‐SPH
Randy S. ZuWallack, MS, ICF Macro
50. Cell Phone Substitution and RDD surveys
• RDD surveys (e.g., BRFSS) have traditionally sampled only household telephones (i.e., landlines)
• Until the early 2000s, the rate of cell‐phone‐only households was small
• Since the mid‐2000s, the rate of substitution has grown substantially
– 6.7% of adults in 2005 to 18.4% in 2008 nationally¹
• Higher among certain demographic groups¹
– Young adults
– Hispanics and Blacks
– Poor and near poor
51. What is bias due to coverage error in the sampling frame?
• Bias arises when the non‐covered population differs from the covered population on some variable of interest
– If the proportion of the population not covered is small, bias will be small
– If the difference between the covered and noncovered is small, bias will be small
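Both conditions follow from the standard coverage-bias identity (a textbook result, not written out on the slide): bias = Ȳ_c − Ȳ = (N_nc/N) × (Ȳ_c − Ȳ_nc), where c is the covered and nc the noncovered population. The illustration below plugs in the smoking estimates from the CO BRFSS vs. cell-only table (landline 15.29%, cell-only 28.14%) together with an ASSUMED 15% cell-only share, and ignores adults with no phone; the function name is hypothetical.

```python
def coverage_bias(covered_mean, noncovered_mean, noncovered_share):
    """Bias of the covered-population mean as an estimate of the full mean.

    bias = (N_nc / N) * (covered_mean - noncovered_mean)
    Returns (bias, full-population mean, relative bias).
    """
    bias = noncovered_share * (covered_mean - noncovered_mean)
    full_mean = covered_mean - bias
    return bias, full_mean, bias / full_mean


# Illustration only: smoking %, with an assumed 15% cell-only share.
bias, full_mean, rel_bias = coverage_bias(15.29, 28.14, 0.15)
```

Under these assumptions the landline estimate understates smoking by about 1.9 percentage points, a relative bias of roughly −11%.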
52. Previous Research
• Data from Jan 2004–June 2005 NHIS found²
– Greater than 1 percentage point bias in binge drinking, smoking prevalence, usual place for medical care, receiving influenza vaccine
• Data from 2007 NHIS found³
– Bias increased slightly for past year binge drinking and receiving influenza vaccine
– These biases were larger among young adults and low income persons
• Data from 2001–2005 BRFSS on 18–24 year olds found⁴
– Prevalence of binge drinking, heavy drinking, and cigarette smoking declined during 2003–2005; coincided with large increase in wireless substitution among young adults
– NHIS and NSDUH did not observe similar declines during this period
53. Our Study:
• Objective:
– Assess bias in landline RDD estimates of select health indicators due to exclusion of the cell‐phone‐only population
• Data Source and Instrument:
– Cell phone RDD of adults in Colorado (n=501)
• May to September 2008
• Instrument was shortened version of BRFSS
– BRFSS from same data collection period (n=4,527)
54. Methodology
• Cell phone sample:
– Design weights account for probability of selection
• BRFSS
– Standard BRFSS design weight accounts for strata, number of landlines, and number of adults in the household
– Poststratified by age(7)*sex*race
• Merged data
– Design weights scaled to represent share of population by phone status
– Poststratified by age(7)*sex*race
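The two weighting steps above can be sketched as follows. The landline design weight is shown schematically as stratum weight × number of adults ÷ number of residential landlines; the exact BRFSS variable definitions and any caps are not given here, so treat the formula as an assumption. Poststratification then scales weights to known totals within each age × sex × race cell; the cell labels and control totals in any call are hypothetical.

```python
from collections import defaultdict


def landline_design_weight(stratum_weight, n_adults, n_landlines):
    """Schematic landline design weight (ASSUMED form).

    More adults -> larger weight (one adult is interviewed per household);
    more landlines -> smaller weight (more chances of the household being dialed).
    """
    if n_adults < 1 or n_landlines < 1:
        raise ValueError("need at least one adult and one landline")
    return stratum_weight * n_adults / n_landlines


def poststratify(cases, control_totals):
    """Scale weights so they sum to known population totals within each cell.

    cases          -- list of (cell, weight), cell e.g. (age_group, sex, race)
    control_totals -- {cell: population total}
    """
    sample_totals = defaultdict(float)
    for cell, w in cases:
        sample_totals[cell] += w
    empty = set(control_totals) - set(sample_totals)
    unknown = set(sample_totals) - set(control_totals)
    if empty or unknown:
        raise ValueError(f"collapse cells first; empty={empty}, unknown={unknown}")
    return [w * control_totals[cell] / sample_totals[cell] for cell, w in cases]
```

Empty cells are common with small samples such as n=501, which is why the sketch refuses to proceed rather than silently dropping population totals.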
55. Statistical Analyses
• Comparisons of BRFSS landline and cell‐only data are based on design weights
• Comparisons of BRFSS landline and merged data use weights poststratified to the demographic makeup of CO
• We assume the merged data to be unbiased (i.e., no coverage error due to cell‐only exclusion)
• All analyses conducted in Stata v.10.1 to account for complex sampling design
56. Table. CO BRFSS vs. CO Cell Only, May–September 2008 (n=5,028)
Health indicator | BRFSS landline, % (95% CI) | Cell only, % (95% CI) | Difference, % (95% CI)
Smoking* | 15.29 (±1.14) | 28.14 (±4.46) | -12.85 (±4.60)
Ever had HIV test* | 36.64 (±1.85) | 52.51 (±5.09) | -15.87 (±5.41)
Having health insurance* | 88.36 (±1.11) | 72.46 (±4.38) | 15.9 (±4.51)
Having primary care provider* | 85.91 (±1.16) | 60.39 (±4.85) | 25.52 (±5.00)
Not affording care due to cost* | 12.27 (±1.08) | 20.42 (±3.94) | -8.15 (±4.08)
*p<.05; data weighted to correct for sampling design
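The difference column in the table above is consistent with treating the landline and cell-only estimates as independent and combining their 95% CI half-widths in quadrature. The sketch below is that approximation, not a reproduction of the design-based Stata output, so its half-widths can differ from the table by about 0.01 to 0.02 points.

```python
import math

# (landline %, half-width, cell-only %, half-width), from the table above.
ROWS = {
    "Smoking": (15.29, 1.14, 28.14, 4.46),
    "Ever had HIV test": (36.64, 1.85, 52.51, 5.09),
    "Having health insurance": (88.36, 1.11, 72.46, 4.38),
    "Having primary care provider": (85.91, 1.16, 60.39, 4.85),
    "Not affording care due to cost": (12.27, 1.08, 20.42, 3.94),
}


def difference_ci(p1, h1, p2, h2):
    """Difference of two independent estimates and its 95% CI half-width.

    Half-widths are combined in quadrature: sqrt(h1**2 + h2**2).
    """
    diff = p1 - p2
    half = math.sqrt(h1 ** 2 + h2 ** 2)
    return diff, half, abs(diff) > half  # True if the CI excludes zero


results = {name: difference_ci(*vals) for name, vals in ROWS.items()}
```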
67. Summary of Findings
• Bias is present not only among those with high wireless substitution rates
• Smoking prevalence underestimated among those with higher wireless substitution rates
• Ever had an HIV test substantially underestimated among all groups
– Relative bias large among those with high wireless substitution rates (young adults, non‐whites, low SES)
• Health care insurance and having a primary care provider are underestimated among non‐whites, but overestimated among other groups
– Relative bias is small
68. Implications for study design and analysis
• When possible, include an RDD sample of the cell‐phone‐only population (BRFSS now does this)
– If you can't, be aware of the potential for bias and interpret findings accordingly
• If you're analyzing landline RDD data from past years
– Interpret findings with potential bias in mind
– Historical trends may show artificial changes due to coverage error
– Wireless substitution rates differ by geographic region, so the problem may be smaller in certain areas
• A bias present today may not be the same historically or in the future
– Characteristics of the early adopters may differ from those of today's cell‐only population or of later adopters (laggards)
69. Limitations
• Unable to assess bias in some subpopulations due to small sample size
• Study does not account for the cell‐phone‐mostly population
70. References
1. Blumberg SJ, Luke JV. (2009). Wireless substitution: Early release of estimates from the National Health Interview Survey, July–December 2008. National Center for Health Statistics.
2. Blumberg SJ, Luke JV, Cynamon ML. (2006). Telephone coverage and health survey estimates: evaluating the need for concern about wireless substitution. American Journal of Public Health. 96(5): 926–931.
3. Blumberg SJ, Luke JV. (2009). Reevaluating the need for concern regarding noncoverage bias in landline surveys. American Journal of Public Health. 99(10): 1806–1810.
4. Delnevo CD, Gundersen DA, Hagman BT. (2008). Declining estimated prevalence of alcohol drinking and smoking among young adults nationally: artifacts of sample undercoverage? American Journal of Epidemiology. 167(1): 15–19.
71. Contact Info
Cristine Delnevo, PhD, MPH delnevo@umdnj.edu
Daniel A. Gundersen, MA gunderda@umdnj.edu
Randal ZuWallack Randal.Zuwallack@macrointernational.com
Riki Conrey Frederica.Conrey@macrointernational.com