Pres jsm jul31_boudreaux

Design Effect Anomalies in the American
Community Survey
Michel Boudreaux
Joint Statistical Meetings
July 29, 2012
San Diego, CA

Funded by a grant from the Robert Wood Johnson Foundation

Overview

• Background
• Irregularities in the DEFF of the ACS
• Potential drivers
• Practical significance for analysts

2

Design Effects
• A relative measure of sample efficiency
$$\mathrm{DEFF} = \frac{\sigma^2_{\mathrm{complex}}}{\sigma^2_{\mathrm{SRS}}}$$
• Sensitive to design elements
– Stratification (-)
– Intra-cluster correlation (+)
– Variance in the weights (+)
• Unique to each variable
• Typically 2-4
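The two variance-inflating elements listed above can be illustrated with Kish's standard approximations: 1 + L for unequal weights and 1 + (m - 1)ρ for clustering. This is a rough sketch rather than the computation behind these slides; the function names and the example weights, cluster size, and intra-cluster correlation are illustrative assumptions.

```python
def weight_effect(weights):
    """Kish's 1 + L: n * sum(w^2) / (sum(w))^2, i.e. 1 + CV^2 of the weights."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


def cluster_effect(avg_cluster_size, icc):
    """Kish's clustering effect: 1 + (m - 1) * rho."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


def approx_deff(weights, avg_cluster_size, icc):
    """Approximate DEFF as the product of the two effects (assumes they act independently)."""
    return weight_effect(weights) * cluster_effect(avg_cluster_size, icc)


# Hypothetical inputs, chosen only for illustration.
example_weights = [50, 50, 100, 100, 200]
print(weight_effect(example_weights))   # equal weights would give exactly 1.0
print(approx_deff(example_weights, avg_cluster_size=2.5, icc=0.4))
```

With these made-up inputs the weighting effect is about 1.3 and the clustering effect about 1.6, so the approximate DEFF lands near 2, inside the typical 2-4 range.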
3

ACS Sample Design
• Separate design for HU and GQ
• Housing Unit Design
– Frame: MAF
– Each county chosen with certainty
– Sub-county strata defined on population and RR
– 1/3 Non-responders sampled for personal
interviews
– PUMS created as a systematic sample such that a
1% sample of each state is formed

4

Construction of Full Sample Weight

• Inverse of the probability of selection (BW)
• CAPI Sub-sample adjustment (SSF)
• Seasonal response adjustment
• Non-interview adjustment
• Mode bias adjustment
• Raking to control totals at the sub-county
level

5

Complex Variance Estimation

• Successive Difference Replication
– Similar to BRR w/Fay’s adjustment
– Geographic sort order is informative
• Replicate Weights (80 replicates)
– 1 of 3 replicate factors applied to each case
• 1.0 (50% of cases), 0.3, 1.7
• Factor assigned using a Hadamard Matrix (RF)
– BW adjusted with replicate factor
– Weighting process repeated
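The three replicate factors listed above arise from combining two ±1 entries of a Hadamard matrix. The sketch below uses the factor formula from the Census Bureau's ACS methodology documentation, f = 1 + 2^(-3/2)·a1 - 2^(-3/2)·a2. The small 8x8 Sylvester matrix and the choice of rows are assumptions made for illustration; the ACS uses a larger matrix (one column per replicate) and its own row assignment.

```python
def sylvester_hadamard(order):
    """Build a Hadamard matrix of +1/-1 entries; order must be a power of two."""
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def replicate_factor(a1, a2):
    """Combine two +/-1 Hadamard entries into a replicate factor."""
    c = 2 ** -1.5
    return 1 + c * a1 - c * a2


H = sylvester_hadamard(8)
row1, row2 = H[1], H[2]          # hypothetical row assignment for one case
factors = [round(replicate_factor(a, b), 1) for a, b in zip(row1, row2)]
print(factors)
```

Any two distinct rows of a Hadamard matrix agree in exactly half of their columns, so half of the factors come out as 1.0 and the rest split between 0.3 and 1.7.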

6

SDR Formula

$$\sigma_x = \sqrt{\frac{4}{R} \sum_{r=1}^{R} (x_r - x_0)^2}$$
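As a minimal sketch of the SDR calculation, the code below computes the variance as 4/R times the sum of squared deviations of the replicate estimates from the full-sample estimate, then forms a design effect against a simple-random-sample variance. The estimates, the four replicates (the ACS uses 80), and the sample size are made-up values for illustration.

```python
import math


def sdr_variance(full_estimate, replicate_estimates):
    """Successive difference replication variance: (4 / R) * sum((x_r - x_0)^2)."""
    r = len(replicate_estimates)
    return 4.0 / r * sum((x - full_estimate) ** 2 for x in replicate_estimates)


def design_effect(complex_variance, srs_variance):
    """DEFF: variance under the complex design over variance under SRS."""
    return complex_variance / srs_variance


# Hypothetical full-sample and replicate estimates of a proportion.
x0 = 0.150
replicates = [0.151, 0.149, 0.152, 0.148]
var_sdr = sdr_variance(x0, replicates)
se_sdr = math.sqrt(var_sdr)

# SRS variance of a proportion, p(1 - p) / n, with a made-up sample size.
n = 100_000
var_srs = x0 * (1 - x0) / n
print(se_sdr, design_effect(var_sdr, var_srs))
```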

7

Design Effects in the 2009 PUMS
[Bar chart, 2009 ACS PUMS: DEFF for Health Insurance, Poverty, Personal Earnings, People in Rental Housing, and HU that are Rental. Series: National ACS, State ACS, National CPS. Bar labels shown: 21.7, 10.0, 7.0, 5.0, 2.0. Average ratio of nation to state: 3.00.]

8

Design Effects in the 2010 PUMS
[Bar chart, 2010 ACS PUMS: DEFF for Health Insurance, Poverty, Personal Earnings, People in Rental Housing, and HU that are Rental. Series: National, State Average. Bar labels shown: 21.2, 8.8, 6.4, 4.2, 1.8. Average ratio of nation to state: 2.88.]

9

Design Effect for Health Insurance in 2008
Internal File (Derived from AFF)
[Bar chart: DEFF for health insurance. National: 8.01; State Ave: 2.96.]

10

2008 Design Factors (Published vs. Derived)
• DF’s are ratios of standard errors
– Used as a ratio-adjuster in place of SDR

[Bar chart: design factors for National and State Average. Series: Derived from PUMS, Published. Bar labels shown: 2.5, 1.8, 1.8, 1.7.]

11

Using an Alternative Complex Variance
Estimator
• Taylor Series
– Strata: PUMA; Cluster: Household
[Bar chart: Taylor series DEFF for Health Insurance, Poverty, Personal Earnings, People in Rental Housing, and HU that are Rental. Series: National ACS, State Average. Bar labels shown: 4.7, 3.2, 1.8, 1.1, 0.5.]
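A Taylor series (linearization) estimate with PUMA as the stratum and household as the cluster can be sketched as below. This is a generic textbook estimator for a weighted mean, not the software used for these results; it assumes clusters are sampled with replacement within strata, and the function name and toy data are hypothetical.

```python
from collections import defaultdict


def taylor_variance_of_mean(records):
    """Linearized variance of a weighted mean.

    records: iterable of (stratum, cluster, weight, y) tuples.
    Treats clusters as sampled with replacement within strata (an assumption).
    """
    records = list(records)
    total_w = sum(w for _, _, w, _ in records)
    mean = sum(w * y for _, _, w, y in records) / total_w

    # Sum the linearized values w * (y - mean) / W within each cluster.
    cluster_totals = defaultdict(float)
    for stratum, cluster, w, y in records:
        cluster_totals[(stratum, cluster)] += w * (y - mean) / total_w

    by_stratum = defaultdict(list)
    for (stratum, _), z in cluster_totals.items():
        by_stratum[stratum].append(z)

    variance = 0.0
    for zs in by_stratum.values():
        n_h = len(zs)
        if n_h < 2:
            continue  # a single-cluster stratum contributes nothing in this sketch
        z_bar = sum(zs) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in zs)
    return mean, variance


# Hypothetical data: (PUMA, household, weight, insured indicator).
data = [("A", 1, 1.0, 0), ("A", 2, 1.0, 1), ("B", 3, 1.0, 0), ("B", 4, 1.0, 1)]
mean, var = taylor_variance_of_mean(data)
print(mean, var)
```

Unlike SDR, this estimator uses only the full-sample weight, so variation across the replicate weights cannot affect it.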

12

Summary of Anomaly
• National DEFF is far larger than expectation
– Versus literature (Kish, 2003)
– Versus state average
– Versus CPS
• Consistently present
– Across Variables
– Across years and PUMS/internal file
• Important Exceptions
– Personal Earnings (low correlation)
– Published Design Factors
– Taylor Series estimator

13

Potential Causes

• Over-sampling
– PUMS sampling rate set in each state to 1%
– Moderate geographic oversampling, but not likely
• Heterogeneity in the weights (1+L)
– 1+L in line with expectations; not correlated with
geography (for full weight and replicates)
• Rounding of the weights
– ACS rounds, CPS does not
– Ruled this out
• Sample Size
– Definitely differentiates states from nation

14

Results by sample size (2009)
[Bar chart: DEFF and ratio of national to state for the Full Sample, 50% Sample, and 7% Sample. Bar labels shown: 7.0, 4.8, 2.6, 2.6, 2.2, 1.5.]

15

Why Sample Size?

• As sample size increases, the probability of
capturing a meaningful outlier increases
• Only potential outlier in SDR versus Taylor
Series is the variation across weights.
$$MSE_i = \frac{1}{80} \sum_{r=1}^{80} (w_{ir} - w_{i0})^2$$
for i = 1, …, n and r = 1, …, 80
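The per-case MSE can be computed directly from a case's full-sample weight and its replicate weights. The sketch below uses two hypothetical cases with four replicate weights each (the ACS has 80) and flags outliers with the 14,317 cutoff, the 95th percentile reported for the MSE distribution; the function names are illustrative.

```python
def replicate_weight_mse(full_weight, replicate_weights):
    """Mean squared deviation of a case's replicate weights from its full-sample weight."""
    r = len(replicate_weights)
    return sum((w - full_weight) ** 2 for w in replicate_weights) / r


def flag_outliers(mse_values, cutoff):
    """Flag cases whose MSE is at or above a chosen cutoff."""
    return [m >= cutoff for m in mse_values]


# Hypothetical cases: (full-sample weight, replicate weights).
cases = [
    (100.0, [100.0, 30.0, 170.0, 100.0]),
    (300.0, [300.0, 90.0, 510.0, 300.0]),
]
mse = [replicate_weight_mse(w0, reps) for w0, reps in cases]
print(mse)                                  # [2450.0, 22050.0]
print(flag_outliers(mse, cutoff=14317))     # [False, True]
```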

17

Histogram of MSE

MSE Percentiles
5%: 169.8
25%: 997.4
50%: 1,782.2
75%: 3,415.8
95%: 14,317

18

Results w/o Upper 5% of MSE Distribution
[Bar chart: DEFF for Health Insurance, Poverty, Personal Earnings, People in Rental Housing, and HU that are Rental, with and without the upper 5% of the MSE distribution. Series: Nat'l (Full Sample), Nat'l (Excl. Outliers), State (Excl. Outliers). Bar labels shown include 21.7, 10.0, 7.7, 7.0, 5.0, 2.6, 2.0, 1.4, 1.6. Average ratio of nation to state: 3.0 with outliers, 1.5 without outliers.]

19

Design Effect of Health Insurance

[Chart: DEFF of health insurance (axis 0.0 to 8.0) against number of outliers (axis labels from 240 to 17,665, plus a U.S. entry). Annotation: Effect appears non-linear.]
20

Outlier Correlates
                               MSE 0-14,316    MSE 14,317+ (Outlier)
Full Sample Wt, Mean (SD)      89 (47.6)       322 (85.3)
Median Replicate, Mean (SD)    89 (47.8)       323 (85.9)
Mode/GQ*
  HU Mail, %                   99.6            0.34
  HU CATI/CAPI, %              85.3            14.9
  GQ, %                        97.3            2.7
* Significant at p<0.05; remains significant after adjusting for age, race, and sex
NOTE: CATI/CAPI is grouped together in PUMS

21

One Potential Reason for Outlying MSE

• Recall the weighting strategy for the replicates
– BW * RF * SSF …
• Potentially some correlation between RF and
SSF that causes larger MSE in CAPI cases,
relative to Mail/CATI
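To make the BW * RF * SSF product concrete, the sketch below shows how a sub-sampling factor scales the spread of a case's replicate weights. It is a simplified illustration, not the Census Bureau's weighting code: it ignores every later adjustment, takes SSF = 3 from the 1-in-3 follow-up rate mentioned earlier, and uses a made-up base weight and a short, hypothetical run of replicate factors.

```python
def replicate_weights(base_weight, ssf, replicate_factors):
    """Replicate weights as BW * RF * SSF, ignoring all later adjustments."""
    return [base_weight * rf * ssf for rf in replicate_factors]


def mse_around_full_weight(base_weight, ssf, replicate_factors):
    """Mean squared deviation of replicate weights from the full weight BW * SSF."""
    full = base_weight * ssf
    reps = replicate_weights(base_weight, ssf, replicate_factors)
    return sum((w - full) ** 2 for w in reps) / len(reps)


factors = [1.0, 0.3, 1.7, 1.0]   # hypothetical run of replicate factors
bw = 40.0                        # made-up base weight

mse_mail = mse_around_full_weight(bw, ssf=1.0, replicate_factors=factors)
mse_capi = mse_around_full_weight(bw, ssf=3.0, replicate_factors=factors)
print(mse_capi / mse_mail)       # the deviation scales with SSF squared
```

Under these assumptions a sub-sampled case has roughly nine times the MSE of an otherwise identical case, which would be one way for outlying MSE to concentrate among CAPI cases.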

22

Limitations

• Our hypothesis that outlying MSE causes this
anomaly fails to explain 2 results
– Why do we fail to find an effect for characteristics
with low intra-class correlations (earnings)?
– The group (CATI/CAPI) with the highest rate and
frequency of outliers does not have the highest
DEFF

23

Design Effects by Mode/GQ
                MSE (mean)   % Outliers   # Outliers   DEFF of Insurance
HU Mail         2,005        0.34         6,814        4.5
HU CATI/CAPI    7,110        14.9         142,427      1.9
GQ              3,713        2.7          2,256        2.0

24

Practical Significance

• ACS is designed for local area estimation
• National standard errors already small
– National S.E. for health insurance: 0.06
– National S.E. w/o outliers: 0.03
– National S.E. assuming state DEFF: 0.01
• Two practical areas of concern
– Multi-year file: larger sample size
– Large sub-groups
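The link between a design effect and a standard error can be sketched as follows: variance is proportional to DEFF, so swapping one DEFF for another rescales the SE by the square root of their ratio. The function name and inputs are made up for illustration and are not the figures reported above.

```python
import math


def rescale_se(observed_se, observed_deff, alternative_deff):
    """Standard error implied by swapping one design effect for another.

    Variance is proportional to DEFF, so the SE scales with its square root.
    """
    return observed_se * math.sqrt(alternative_deff / observed_deff)


# Made-up inputs for illustration only.
print(rescale_se(observed_se=0.10, observed_deff=16.0, alternative_deff=4.0))  # 0.05
```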

25

National Design Effects in Single Year vs.
Multi-year
[Bar chart: national DEFF for Coverage, Poverty, Earnings, and People in Rentals. Series: Single Year, Multi-Year. Bar labels shown: 34.3, 21.7, 12.4, 13.5, 10.0, 7.0, 2.0, 2.5.]

26

Average State Design Effects in Single Year
vs. Multi-year
[Bar chart: average state DEFF for Coverage, Poverty, Earnings, and People in Rentals. Series: Single Year, Multi-Year. Bar labels shown: 4.6, 4.4, 4.1, 4.2, 2.7, 2.8, 1.1, 1.1.]

27

Large Sub-Groups
[Bar chart: DEFF (axis 0 to 7) by race group: NHOPI, AIAN, White, Asian, Multiple, Black, Other. Callout: SE (%) for Whites: observed 0.05; assuming average DEFF 0.03.]

28

Summary/Recommendations

• National Design Effects appear too large
– National SE’s upwardly biased
• Potentially driven by CAPI Sub-sampling
– Only apparent at aggregated domains
• Further investigation into RF by SSF interaction
– Accurate reflection of sample design or fixable in the
weights?
• Analysts who wish to avoid this can adopt an
alternative variance method (Taylor series)

29

Acknowledgements

• Funding
– RWJF grant to SHADAC
– Interdisciplinary Doctoral Fellowship Program
(Univ. of MN)
• Collaborators
– Peter Graven
– Michael Davern
– Kathleen Call

30

Michel Boudreaux
boudr019@umn.edu

Sign up to receive our newsletter and updates at
www.shadac.org
@shadac
