VIP Independent Call Girls in Taloja 🌹 9920725232 ( Call Me ) Mumbai Escorts ...
Pres jsm jul31_boudreaux
1. Design Effect Anomalies in the American
Community Survey
Michel Boudreaux
Joint Statistical Meetings
July 29, 2012
San Diego, CA
Funded by a grant from the Robert Wood Johnson Foundation
2. Overview
• Background
• Irregularities in the DEFF of the ACS
• Potential drivers
• Practical significance for analysts
2
3. Design Effects
• A relative measure of sample efficiency
𝜎2
𝑐𝑜𝑚𝑝𝑙𝑒𝑥
𝐷𝐸𝐹𝐹 =
𝜎2
𝑆𝑅𝑆
• Sensitive to design elements
– Stratification (-)
– Intra-cluster correlation (+)
– Variance in the weights (+)
• Unique to each variable
• Typically 2-4
3
4. ACS Sample Design
• Separate design for HU and GQ
• Housing Unit Design
– Frame: MAF
– Each county chosen with certainty
– Sub-county strata defined on population and RR
– 1/3 Non-responders sampled for personal
interviews
– PUMS created as a systematic sample such that a
1% sample of each state is formed
4
5. Construction of Full Sample Weight
• Inverse of the probability of selection (BW)
• CAPI Sub-sample adjustment (SSF)
• Seasonal response adjustment
• Non-interview adjustment
• Mode bias adjustment
• Raking to control totals at the sub-county
level
5
6. Complex Variance Estimation
• Successive Difference Replication
– Similar to BRR w/Fay’s adjustment
– Geographic sort order is informative
• Replicate Weights (80 replicates)
– 1 of 3 replicate factors applied to each case
• 1.0 (50% of cases), 0.3, 1.7
• Factor assigned using a Hadamard Matrix (RF)
– BW adjusted with replicate factor
– Weighting process repeated
6
8. Design Effects in the 2009 PUMS
25.0
21.7 2009 ACS PUMS
20.0 Average ratio of Nation
to State: 3.00
15.0
10.0
10.0 National ACS
7.0
5.0 State ACS
5.0
2.0 National CPS
0.0
Health Poverty Personal People in HU that are
Insurance Earnings Rental Rental
Housing
8
9. Design Effects in the 2010 PUMS
25.0
21.2 2010 ACS PUMS
20.0 Average ratio of Nation
to State: 2.88
15.0
10.0 8.8 National
6.4 State Average
5.0 4.2
1.8
0.0
Health Poverty Personal People in HU that are
Insurance Earnings Rental Rental
Housing
9
10. Design Effect for Health Insurance in 2008
Internal File (Derived from AFF)
9.00
8.01
8.00
7.00
6.00
5.00
4.00
2.96
3.00
2.00
1.00
0.00
National State Ave
10
11. 2008 Design Factors (Published vs. Derived)
• DF’s are ratios of standard errors
– Used as a ratio-adjuster in place of SDR
3.0
2.5
2.5
2.0 1.8 1.8 1.7
1.5 Derived from PUMS
1.0 Published
0.5
0.0
National State Average
11
12. Using an Alternative Complex Variance
Estimator
• Taylor Series
– Strata: PUMA; Cluster: Household
5.0 4.7
4.0 3.2
3.0
2.0 1.8
1.1 National ACS
1.0 0.5 State Average
0.0
Health Poverty Personal People in HU that
Insurance Earnings Rental are Rental
Housing
12
13. Summary of Anomaly
• National DEFF is far larger than expectation
– Versus literature (Kish, 2003)
– Versus state average
– Versus CPS
• Consistently present
– Across Variables
– Across years and PUMS/internal file
• Important Exceptions
– Personal Earnings (low correlation)
– Published Design Factors
– Taylor Series estimator
13
14. Potential Causes
• Over-sampling
– PUMS sampling rate set in each state to 1%
– Moderate geographic oversampling, but not likely
• Heterogeneity in the weights (1+L)
– 1+L in line with expectations; not correlated with
geography (for full weight and replicates)
• Rounding of the weights
– ACS rounds, CPS does not
– Ruled this out
• Sample Size
– Definitely differentiates states from nation
14
15. Results by sample size (2009)
8.0
7.0
7.0
6.0
5.0 4.8
Full Sample
4.0
50% Sample
3.0 2.6 2.6 7% Sample
2.2
2.0 1.5
1.0
0.0
DEFF Ratio of National to State
15
16. Results by sample size (2009)
8.0
7.0
7.0
6.0
5.0 4.8
Full Sample
4.0
50% Sample
3.0 2.6 2.6 7% Sample
2.2
2.0 1.5
1.0
0.0
DEFF Ratio of National to State
16
17. Why Sample Size?
• As sample size increases, the probability of
capturing a meaningful outlier increases
• Only potential outlier in SDR versus Taylor
Series is the variation across weights.
80
1
𝑀𝑆𝐸 𝑖 = (𝑤 𝑖𝑟 − 𝑤 𝑖0 )2
80
𝑟=1
For i= (1…n) and r= (1…80)
17
19. Results w/o Upper 5% of MSE Distribution
With Outliers
25.0 Average ratio of Nation
21.7 to State: 3.0
W/o Outliers
20.0
Average ratio of Nation
to State: 1.5
15.0
10.0
10.0
7.7
7.0
Nat'l (Full Sample)
5.0 5.0
5.0 Nat'l (Excld Outlier)
2.6 2.0 1.4 1.6 State (Ecld. Outlier)
0.0
Health Poverty Personal People in HU that
Insurance Earnings Rental are Rental
Housing
19
20. Design Effect of Health Insurance
0.0
2.0
3.0
4.0
5.0
6.0
7.0
8.0
1.0
240
309
394
496
584
676
765
934
1,201
1,369
1,412
1,517
1,963
2,040
2,445
Effect Appears Non-Linear
2,548
Number of Outliers 2,686
2,859
3,056
3,472
4,027
4,931
5,379
6,920
9,839
U.S.
17,665
20
21. Outlier Correlates
MSE
0-14,316 14,317+ (Outlier)
Full Sample Wt, Mean (SD) 89 (47.6) 322 (85.3)
Median Replicate, Mean (SD) 89 (47.8) 323 (85.9)
Mode/GQ*
HU Mail, % 99.6 .34
HU CATI/CAPI, % 85.3 14.9
GQ, % 97.3 2.7
* Significant at p<0.05; Remains significant after adjusting for age, race, and
sex
NOTE: CATI/CAPI is grouped together in PUMS
21
22. One Potential Reason for Outlying MSE
• Recall the weighting strategy for the replicates
– BW * RF * SSF …
• Potentially some correlation between RF and
SSF that causes larger MSE in CAPI cases,
relative to Mail/CATI
22
23. Limitations
• Our hypothesis that outlying MSE cause this
anomaly fails to explain 2 results
– Why do we fail to find an effect for characteristics
with low intra-class correlations (earnings)?
– The group (CATI/CAPI) with the highest rate and
frequency of outliers does not have the highest
DEFF
23
24. Design Effects by Mode/GQ
MSE % Outliers # Outliers DEFF of Insurance
(mean)
HU Mail 2005 .34 6,814 4.5
HU CATI/CAPI 7110 14.9 142,427 1.9
GQ 3713 2.7 2,256 2.0
24
25. Practical Significance
• ACS is designed for local area estimation
• National standard errors already small
– National S.E. for health insurance: 0.06
– National S.E. w/o outliers: 0.03
– National S.E. assuming state DEFF: 0.01
• Two practical areas of concern
– Multi-year file: larger sample size
– Large sub-groups
25
26. National Design Effects in Single Year vs.
Multi-year
40.0
34.3
35.0
30.0
25.0
21.7
20.0 Single Year
15.0 12.4 13.5 Multi-Year
10.0
10.0 7.0
5.0 2.0 2.5
0.0
Coverage Poverty Earnings People in
Rentals
26
27. Average State Design Effects in Single Year
vs. Multi-year
5.0
4.6
4.5 4.4
4.1 4.2
4.0
3.5
3.0 2.7 2.8
2.5 Single Year
2.0 Multi-Year
1.5
1.1 1.1
1.0
0.5
0.0
Coverage Poverty Earnings People in
Rentals
27
28. Large Sub-Groups
SE (%) for Whites
7 Observed: 0.05
Assuming Ave DEFF: 0.03
6
5
4
DEFF
3
2
1
0
NHOPI AIAN White Asian Multiple Black Other
28
29. Summary/Recommendations
• National Design Effects appear to large
– National SE’s upwardly biased
• Potentially driven by CAPI Sub-sampling
– Only apparent at aggregated domains
• Further investigation into RF by SSF interaction
– Accurate reflection of sample design or fixable in the
weights?
• Analysts that wish to avoid this can adopt
alternative variance method (TS)
29
30. Acknowledgements
• Funding
– RWJF grant to SHADAC
– Interdisciplinary Doctoral Fellowship Program
(Univ. of MN)
• Collaborators
– Peter Graven
– Michael Davern
– Kathleen Call
30
31. Michel Boudreaux
boudr019@umn.edu
Sign up to receive our newsletter and updates at
www.shadac.org
@shadac