Are State-Level Estimates
for the American Housing
Survey Feasible?
Ernest Lawley, Stephen Ash, Brian Shaffer,
Kathy Zha
Demographic Statistical Methods Division
Sample Design and Estimation
US Census Bureau
August 2015
Disclaimer
This presentation is released to inform interested parties of
ongoing research and to encourage discussion of work in
progress. The views expressed on statistical,
methodological, technical, or operational issues are those
of the authors and not necessarily those of the U.S.
Census Bureau.
2
Today’s discussion
1. Background
2. State-Level Estimation & Results
3. Conclusion
3
Background
4
AHS-N Background
• The American Housing Survey – National
sample (AHS-N) is a 30-year longitudinal survey
of housing units (occupied and vacant)
• Produces estimates at the national level and for
each census division (9) and census region (4)
– New England, Middle Atlantic, East North Central,
West North Central, South Atlantic, East South
Central, West South Central, Mountain, Pacific
– Northeast, Midwest, South, West
5
AHS-N Background
• Data available from prior design: 1985 - 2013
• Redesigned in 2015
• Data collection currently under way
• Sample selection is a two-stage process
– First stage: Primary Sampling Unit (PSU) selection
– Second stage: Housing Unit (HU) selection
6
AHS-N Sample Design
1. Primary Sampling Unit (PSU) selection
First Stage
• A PSU consists of a county or group of counties.
• If 100,000+ housing units, Self-Representing (SR).
– All SR PSUs were selected (170)
• If <100,000 HUs, Non-Self-Representing (NSR)
– NSR PSUs grouped into strata; 1 PSU per stratum was
selected (224), with probability proportional to size.
• The “prior design” (i.e., the design used for this
study) selected PSUs by REGION
7
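The first-stage rules above can be sketched in a few lines of Python. This is a minimal illustration, assuming the 100,000-housing-unit cutoff from the slide and a single probability-proportional-to-size (PPS) draw per stratum; the function names, PSU labels, and housing unit counts are hypothetical and are not AHS data or Census Bureau code.

```python
import random

SR_CUTOFF_HU = 100_000  # slide: PSUs with 100,000+ housing units are self-representing


def classify_psu(hu_count: int) -> str:
    """Return 'SR' for self-representing PSUs, 'NSR' otherwise."""
    return "SR" if hu_count >= SR_CUTOFF_HU else "NSR"


def pps_probabilities(stratum: dict[str, int]) -> dict[str, float]:
    """Selection probability of each NSR PSU in a stratum, proportional to its HU count."""
    total = sum(stratum.values())
    return {psu: hu / total for psu, hu in stratum.items()}


def select_one_pps(stratum: dict[str, int], rng: random.Random) -> str:
    """Draw exactly one PSU from the stratum with probability proportional to size."""
    psus = list(stratum)
    return rng.choices(psus, weights=[stratum[p] for p in psus], k=1)[0]


# Hypothetical stratum with made-up HU counts (all below the SR cutoff)
example_stratum = {"WA PSU": 40_000, "CA PSU": 80_000, "NM PSU": 30_000}
probs = pps_probabilities(example_stratum)
chosen = select_one_pps(example_stratum, random.Random(2015))
```

Under PPS, a selected PSU's first-stage weight is the inverse of its selection probability; for the hypothetical NM PSU that would be 1 / 0.2 = 5.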
AHS-N Sample Design (cont.)
8
Census Regions
AHS-N Sample Design (cont.)
9
AHS-N Sample Design (cont.)
10
[Map of the West Region states (Washington, Oregon, California, Nevada, Idaho, Montana, Wyoming, Utah, Colorado, Arizona, New Mexico, Alaska, Hawaii), labeling the Chelan, Douglas, Kittitas PSU, with the callout “Selected this PSU!”]
AHS-N Sample Design (cont.)
2. Housing Unit (HU) selection
Second Stage
• Systematic Sampling
• Within-PSU sampling rate set so that every housing
unit had the same overall probability of selection
11
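Because a housing unit's overall selection probability is the product of its PSU's selection probability and the within-PSU rate, a self-weighting design sets the within-PSU rate to the overall rate divided by the PSU's probability (for SR PSUs that probability is 1). The sketch below assumes a hypothetical overall rate and uses fractional-interval systematic sampling with a random start; it illustrates the idea and is not the AHS production procedure.

```python
import random


def within_psu_rate(overall_rate: float, psu_prob: float) -> float:
    """Within-PSU rate such that psu_prob * rate == overall_rate (self-weighting)."""
    rate = overall_rate / psu_prob
    if rate > 1.0:
        raise ValueError("overall rate is too large for this PSU's selection probability")
    return rate


def systematic_sample(units: list, rate: float, rng: random.Random) -> list:
    """Fractional-interval systematic sample: random start in [0, k), then every k-th unit."""
    k = 1.0 / rate  # sampling interval; k >= 1 because rate <= 1
    pos = rng.uniform(0.0, k)
    picks = []
    while pos < len(units):
        picks.append(units[int(pos)])
        pos += k
    return picks


overall = 1 / 2000                          # hypothetical overall sampling rate
sr_rate = within_psu_rate(overall, 1.0)     # SR PSU: selected with certainty
nsr_rate = within_psu_rate(overall, 0.25)   # hypothetical NSR PSU selected with probability 0.25
sample = systematic_sample(list(range(10_000)), nsr_rate, random.Random(1))
```

Here the NSR PSU is sampled at 1 in 500 so that 0.25 × 1/500 equals the 1 in 2,000 used in the SR PSU, and a 10,000-unit PSU yields 20 sampled units.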
AHS-N Weighting Procedures
1. Nonresponse adjustment
– Cell-based method to ensure the adjusted sum of
weights for interviewed units = total eligible
weight.
2. First stage (PSU) adjustment
− Aligns NSR PSU weights to the stratum housing
unit totals from the census
3. Adjust Estimate to Independent Housing Unit
Estimate.
4. Adjust Estimate to Population Controls
12
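A minimal sketch of the first two weighting steps, assuming each record carries a nonresponse cell, a weight, and an interview status; the field names, status codes, and numbers are hypothetical. The nonresponse step scales interviewed weights in each cell by (eligible weight / interviewed weight), which makes the adjusted interviewed total equal the eligible total, as the slide states. The first-stage factor follows the speaker notes: the stratum's census HU total divided by the PSU-based estimate of that total.

```python
from collections import defaultdict


def nonresponse_adjust(records: list[dict]) -> list[float]:
    """Cell-based nonresponse adjustment.

    Each record has 'cell', 'weight', and 'status' in
    {'interview', 'noninterview', 'ineligible'}. Within each cell, interviewed
    weights are scaled by (eligible weight / interviewed weight); noninterviews
    and ineligibles get 0. Assumes every cell has at least one interview
    (production systems collapse cells that do not).
    """
    eligible = defaultdict(float)
    interviewed = defaultdict(float)
    for r in records:
        if r["status"] != "ineligible":
            eligible[r["cell"]] += r["weight"]
        if r["status"] == "interview":
            interviewed[r["cell"]] += r["weight"]
    return [
        r["weight"] * eligible[r["cell"]] / interviewed[r["cell"]]
        if r["status"] == "interview" else 0.0
        for r in records
    ]


def first_stage_factor(stratum_census_hu: float, psu_census_hu: float, psu_weight: float) -> float:
    """Stratum census HU total divided by the estimate from the selected PSU (HUs x PSU weight)."""
    return stratum_census_hu / (psu_census_hu * psu_weight)


records = [  # hypothetical cases
    {"cell": "A", "weight": 100.0, "status": "interview"},
    {"cell": "A", "weight": 100.0, "status": "interview"},
    {"cell": "A", "weight": 100.0, "status": "noninterview"},
    {"cell": "B", "weight": 200.0, "status": "interview"},
    {"cell": "B", "weight": 200.0, "status": "ineligible"},
]
adjusted = nonresponse_adjust(records)
factor = first_stage_factor(150_000, 40_000, 4.0)  # made-up stratum and PSU counts
```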
AHS-N Variance Estimation
We want to evaluate variances (standard errors) to
determine the feasibility of State-Level estimates.
Replication Method (Fay and Train 1995)
• Successive Differences Replication (SDR) used for
Self-Representing (SR) PSUs
• Balanced Repeated Replication (BRR) used for
Non-Self-Representing (NSR) PSUs
– BRR accounts for PSU selection (i.e., first-stage
selection)
• 160 replicate weights created for each interviewed
housing unit
13
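The slide's standard error formula appeared as an image and is not reproduced here. The sketch below assumes the form commonly used with Census Bureau Fay-type BRR and SDR replicates, SE = sqrt((4/R) × Σ_r (θ̂_r − θ̂_0)²) with R = 160; treat the multiplier of 4 as an assumption. It also shows, with a single ratio adjustment standing in for the full weighting chain, how each replicate is re-weighted before its estimate is formed. All names and numbers are hypothetical.

```python
import math


def replicate_totals(base_weights, replicate_factors, y, control_total=None):
    """Weighted total of y for each replicate.

    replicate_factors[r][i] is the factor for unit i in replicate r. If control_total
    is given, each replicate's weights are ratio-adjusted to it, standing in for
    re-running every weighting step on every replicate.
    """
    totals = []
    for factors in replicate_factors:
        w = [b * f for b, f in zip(base_weights, factors)]
        if control_total is not None:
            ratio = control_total / sum(w)
            w = [wi * ratio for wi in w]
        totals.append(sum(wi * yi for wi, yi in zip(w, y)))
    return totals


def replicate_se(full_estimate, replicate_estimates, multiplier=4.0):
    """SE = sqrt(multiplier / R * sum_r (theta_r - theta_0)^2); multiplier 4 is an assumption."""
    R = len(replicate_estimates)
    ss = sum((t - full_estimate) ** 2 for t in replicate_estimates)
    return math.sqrt(multiplier / R * ss)


# Toy check: 160 hypothetical replicate estimates alternating around 100
se = replicate_se(100.0, [101.0, 99.0] * 80)
```

Note that a total forced to equal a control in every replicate gets an SE of zero, which is why controlled totals need sub-domain analysis to judge precision.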
AHS-N Variance Estimation (cont.)
14
AHS-N Variance Estimation (cont.)
15
State-Level Estimation & Results
16
SR vs. NSR Percentages
Population Division 1980 housing unit counts, aggregated
by state into SR and NSR counties
17
Methods for State-Level Estimates
• Method 1: “Brute Force”—Sum all SR and
NSR cases in each state
• Method 2: Synthetic Method—Distribute
NSR cases to each state within region, then
sum
• Method 3: Adjust to individual state housing unit and
population control totals, with raking
18
Methods for State-Level Estimates
CHECKS:
• State estimates must have small CVs (less than 10-15%)
• Compare each state’s estimate to the Population Division
state-level housing unit estimate to ensure it is in the
“same ballpark”
• “Good” states will be used to produce state-level
estimates of total HUs as well as smaller domains
(e.g., estimates of owners, renters, occupied, vacant,
family composition, income, housing cost, housing
quality, neighborhood quality, fuel type)
19
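The CV check above is simple arithmetic: CV = 100 × SE / estimate. The sketch below turns the slide's “less than 10-15%” guideline into three labels; the band names and the example state's figures are hypothetical.

```python
def cv_percent(estimate: float, se: float) -> float:
    """Coefficient of variation in percent: 100 * SE / estimate."""
    return 100.0 * se / estimate


def cv_band(cv: float, good: float = 10.0, limit: float = 15.0) -> str:
    """Hypothetical labels for the slide's 'less than 10-15%' check."""
    if cv < good:
        return "good"
    if cv < limit:
        return "marginal"
    return "fail"


# Hypothetical state: 2,000,000 HUs estimated with an SE of 60,000
state_cv = cv_percent(2_000_000, 60_000)
```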
METHOD 1—“BRUTE FORCE”
Sum all SR and NSR cases in each state
NOTE: %Diff = 100 * (Method 1 – Control)/Control
20
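A minimal sketch of Method 1 and the %Diff comparison, assuming each interviewed case is available as a (state, final weight) pair; the cases and the control total are made up for illustration.

```python
from collections import defaultdict


def brute_force_totals(cases):
    """Method 1: sum the final weights of all interviewed SR and NSR cases by their own state."""
    totals = defaultdict(float)
    for state, final_weight in cases:
        totals[state] += final_weight
    return dict(totals)


def pct_diff(estimate: float, control: float) -> float:
    """%Diff = 100 * (estimate - control) / control, as defined on the slide."""
    return 100.0 * (estimate - control) / control


cases = [("CA", 1500.0), ("CA", 2250.0), ("AZ", 1800.0)]  # hypothetical (state, weight) pairs
totals = brute_force_totals(cases)
diff_ca = pct_diff(totals["CA"], 4000.0)  # hypothetical control total
```

Because each case counts only in the state where it sits, a state with no sample in its NSR counties (Hawaii, per the speaker notes) gets no NSR contribution at all under this method.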
METHOD 2—SYNTHETIC METHOD
• Each NSR sample (and interviewed) case
receives a final weight
• This weight is distributed proportionally to each
state within the region
– This is necessary because we don’t know which other
PSUs in the region (which are likely in other states)
belong to the sample case’s stratum
21
METHOD 2—SYNTHETIC METHOD
EXAMPLE
Suppose we sample and successfully interview
a house in Lassen County, CA. This house
receives a final weight of 2,250. We do not
know the other PSUs in the West Region that
the Lassen County PSU is representing.
Because of this, we have to “spread out” this
weight of 2,250 across all states in the West
Region.
22
METHOD 2—SYNTHETIC METHOD
EXAMPLE (cont.)
Weight=2,250
Percent of housing units in each of the 13 West
Region states, and the share of the Lassen County
weight each state receives:
23
METHOD 2—SYNTHETIC METHOD
Proportionally distribute NSR Weights to each
State throughout region, then sum each state’s
SR and (synthetically adjusted) NSR weights
NOTE: %Diff = 100 * (Method 2 – Control)/Control
24
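Method 2 can be sketched as follows, assuming SR and NSR cases are available as (state, final weight) pairs and that each state's share is its housing unit count divided by the region's total. The three-state “region” and its shares are made up and are not the slide's West Region figures; the NSR case weighted 2,250 plays the role of the Lassen County example.

```python
from collections import defaultdict


def synthetic_totals(sr_cases, nsr_cases, region_of, state_hu):
    """Method 2 sketch: SR weights stay in their own state; NSR weights are pooled by
    region and spread over the region's states in proportion to HU counts.

    sr_cases, nsr_cases: iterables of (state, final_weight)
    region_of: {state: region}
    state_hu: {state: housing-unit count used to form the shares}
    """
    totals = defaultdict(float)
    for state, w in sr_cases:
        totals[state] += w
    # Spreading is linear, so pooling NSR weight by region first gives the same result
    # as spreading each case separately.
    nsr_by_region = defaultdict(float)
    for state, w in nsr_cases:
        nsr_by_region[region_of[state]] += w
    region_hu = defaultdict(float)
    for state, hu in state_hu.items():
        region_hu[region_of[state]] += hu
    for state, hu in state_hu.items():
        region = region_of[state]
        totals[state] += nsr_by_region[region] * hu / region_hu[region]
    return dict(totals)


region_of = {"CA": "West", "WA": "West", "NV": "West"}  # toy region
state_hu = {"CA": 50.0, "WA": 30.0, "NV": 20.0}          # made-up shares
sr_cases = [("CA", 1000.0), ("WA", 500.0)]
nsr_cases = [("CA", 2250.0), ("NV", 750.0)]
estimates = synthetic_totals(sr_cases, nsr_cases, region_of, state_hu)
```

All NSR weight stays within the region, but a California NSR case now also contributes to Washington and Nevada, which is how the synthetic method gives some NSR representation to a state like Hawaii that has no NSR sample of its own.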
METHOD 3—Use State-Level Control
Totals
• Sum all SR and NSR cases in each state (i.e., the
“brute force” method), then make adjustments based on
state-level and population control totals
• Control totals obtained from the Census Bureau’s
Population Division
- Population Division control totals by county,
aggregated to state totals by SR and NSR
- Totals were raked to Black and Hispanic population totals
This is the method currently used; it will change based on
other ongoing AHS research (yesterday’s session 137,
8:50 a.m., “Results of Calibration Research for the 2015
American Housing Survey”)
25
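Raking (iterative proportional fitting) alternately ratio-adjusts weights to each set of marginal control totals until all margins agree. The sketch below is a generic version, assuming hypothetical “black” and “hispanic” category flags on each unit and made-up control totals; it does not reproduce the actual AHS raking dimensions or controls.

```python
from collections import defaultdict


def rake(weights, categories, controls, max_iter=100, tol=1e-10):
    """Raking (iterative proportional fitting) to marginal control totals.

    categories: {dimension: [label for each unit]}
    controls:   {dimension: {label: control total}}; each dimension's controls
                should sum to the same grand total or the loop cannot converge.
    """
    w = list(weights)
    for _ in range(max_iter):
        max_change = 0.0
        for dim, labels in categories.items():
            sums = defaultdict(float)
            for wi, lab in zip(w, labels):
                sums[lab] += wi
            factors = {lab: controls[dim][lab] / s for lab, s in sums.items()}
            max_change = max(max_change, max(abs(f - 1.0) for f in factors.values()))
            w = [wi * factors[lab] for wi, lab in zip(w, labels)]
        if max_change < tol:  # every margin already matched its control
            break
    return w


# Hypothetical 2x2 example: four units, each starting at weight 100
cats = {
    "black": ["yes", "yes", "no", "no"],
    "hispanic": ["yes", "no", "yes", "no"],
}
ctrls = {
    "black": {"yes": 250.0, "no": 150.0},
    "hispanic": {"yes": 120.0, "no": 280.0},
}
raked = rake([100.0] * 4, cats, ctrls)
```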
METHOD 3—Use State-Level &
Population Control Totals
NOTE: %Diff = 100 * (Method 3 – Control)/Control
26
METHOD 3—Use State-Level &
Population Control Totals
27
A Few Sub-Domains of the AHS
•Total Occupied
•Total Vacant
•Seasonal
•New Construction
•Mobile Homes
QUESTION OF THE DAY: What’s a good sample
size within a sub-domain to yield a good estimate?
METHOD 3—Use State-Level &
Population Control Totals
28
Estimated Values of Sub-Domains
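The speaker notes describe getting an average weight from these values and then an estimated sample size for each sub-domain. A minimal sketch, assuming sub-domain cases carry roughly the state's average final weight; the state total, interview count, and sub-domain estimates are hypothetical.

```python
def average_weight(state_total_estimate: float, state_interviews: int) -> float:
    """Average final weight: the state's estimated total divided by its interviewed cases."""
    return state_total_estimate / state_interviews


def estimated_sample_sizes(subdomain_estimates: dict[str, float], avg_weight: float) -> dict[str, int]:
    """Approximate interviewed cases behind each sub-domain estimate (estimate / average weight)."""
    return {name: round(est / avg_weight) for name, est in subdomain_estimates.items()}


# Hypothetical state: 400,000 HUs estimated from 1,000 interviews
avg_w = average_weight(400_000, 1_000)
sizes = estimated_sample_sizes(
    {"Total Vacant": 48_000, "Mobile Homes": 10_000, "Seasonal": 2_000}, avg_w
)
```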
Ongoing Research
•Determination of sufficient sample size for sub-domains
•Sample sizes must produce estimates with CVs < 15%
•Some sub-domains may be suppressed due to high CVs
29
METHOD 3—Use State-Level &
Population Control Totals
METHOD 3—Use State-Level &
Population Control Totals
30
Are these sample sizes sufficient to calculate feasible sub-domain
estimates?
METHOD 3—Use State-Level &
Population Control Totals
31
State-Level CVs for Each Sub-Domain
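The speaker notes mention Generalized Variance Functions (GVFs) for judging sub-domain precision. The sketch below assumes the common GVF form relvariance = a + b/X, so CV = sqrt(a + b/X); the parameters a and b and the estimates are hypothetical. It flags sub-domains at or above the 15% CV limit and solves for the smallest total that meets it.

```python
import math


def gvf_cv(x: float, a: float, b: float) -> float:
    """CV (%) of an estimated total x under a GVF with relvariance a + b / x."""
    return 100.0 * math.sqrt(a + b / x)


def min_publishable_total(max_cv: float, a: float, b: float) -> float:
    """Smallest total whose GVF-based CV does not exceed max_cv (in percent)."""
    target = (max_cv / 100.0) ** 2
    if target <= a:
        raise ValueError("no total can reach this CV with the given a")
    return b / (target - a)


def suppression_flags(estimates: dict[str, float], a: float, b: float, max_cv: float = 15.0) -> dict[str, bool]:
    """True where a sub-domain estimate's CV is at or above the threshold."""
    return {name: gvf_cv(x, a, b) >= max_cv for name, x in estimates.items()}


a, b = 0.0, 1000.0  # hypothetical GVF parameters
flags = suppression_flags({"Total Occupied": 100_000, "Seasonal": 20_000}, a, b)
x_min = min_publishable_total(15.0, a, b)
```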
Conclusion
Future Research
•Data from 2015 (New Design)
•Improvements to Synthetic Method
– Use more known information when distributing NSR weights
•Small Area Estimation Techniques?
•Use of Calibration with HU/Pop Controls in new design
– Raking used with prior design
•Possible inclusion of more states
32
Conclusion
“Just when you think you have all the answers, I
CHANGE THE QUESTIONS!”
– Rowdy Roddy Piper
33
QUESTIONS?
34
Ernie Lawley
Ernest.R.Lawley@census.gov
Steve Ash
Stephen.Eliot.Ash@census.gov
Brian Shaffer
Brian.Shaffer@census.gov
Kathy Zha
Kathy.Zha@census.gov

Editor's Notes

  1. First, I will give a brief description of the sample design and our current weighting procedures. Then, I’ll talk about the methodology we’re studying. Next, I’ll show how we applied the method to the 2013 AHS. And finally, I’ll summarize what we found and what we believe are our next steps.
  2. The American Housing Survey is a 30-year longitudinal survey of the US housing stock. It is used to produce estimates of many characteristics of our housing units, both occupied and vacant.
  3. The 2015 redesign is in its final enumeration period. We are taking this time to review our current methods and see if we can make any improvements, including current research evaluating the feasibility of state-level estimates. We currently only have data for the prior 1985-2013 design, which was sampled at the region level. The 2015 sample was selected at the division level. Are these designs sufficient for creating state-level estimates? That is the question we will attempt to answer. Over the next few slides, I’ll introduce the sample design and describe our current weighting procedures.
  4. To get our sample, we first selected our PSUs. We defined PSUs as self-representing if they had 100,000 or more housing units and non-self-representing if they had fewer than 100,000. All SR PSUs were in the sample. We grouped the NSR PSUs into strata based on similarities between the PSUs and selected one PSU per stratum, with probability proportional to the PSU’s size (its number of housing units). NOTE: THINK OF SR PSUs AS COUNTIES WITH LARGE POPULATIONS (GENERALLY ONE COUNTY PER PSU) AND NSR PSUs AS COLLECTIONS OF COUNTIES WITH SMALLER POPULATIONS (GROUPS OF 2 OR MORE, UP TO 8 COUNTIES PER PSU).
  5. This slide defines the Census Regions. AN EXAMPLE OF NSR SAMPLING PROCESS: Let’s focus on the West Region, particularly Washington.
  6. Let’s take a look at counties and PSUs in Washington State. NOTE: THESE DATA ARE MADE UP FOR EXAMPLE PURPOSES. All counties with 100,000 or more housing units are considered SR PSUs. These are denoted as WHITE counties. Counties with fewer than 100,000 housing units are grouped together into Primary Sampling Units (PSUs), as denoted by common colors. Each colored county group (or PSU) is then matched with other PSUs in the West Region. This matching is done by demographers who determine which PSUs are most similar to each other based on economics and demographics. A group of PSUs is referred to as a STRATUM. For this example, let’s focus on the GREEN PSU, i.e., Chelan, Douglas, and Kittitas counties.
  7. Here’s a look at the West Region by county. Demographers determined that the Chelan/Douglas/Kittitas PSU has similarities to 8 other PSUs in the West Region, and grouped them into a STRATUM. This made-up stratum contains PSUs in Washington, California, Alaska, Montana, Wyoming, Nevada, Colorado, and 2 in New Mexico. From this stratum, one PSU will be chosen to represent all of the PSUs within the stratum. The PSUs are assigned probabilities of selection based on each PSU’s total number of housing units. For our example, let’s say the California PSU was chosen (Siskiyou/Modoc/Shasta/Lassen PSU). That means the housing units chosen in this California PSU will represent all housing units in the entire stratum, including the counties that make up the PSUs in green. So houses in California also represent houses in Washington, Alaska, Montana, Wyoming, Nevada, etc. This spread of NSR representation throughout the West Region makes calculating state-level estimates quite challenging. To add to the challenge, record keeping in 1985 wasn’t very good: records exist for the selected PSUs, but records DO NOT exist for the PSUs that each selected PSU represented.
  8. Once we selected our PSUs, we systematically selected our housing units so that their overall probabilities of selection were the same.
  9. 1. Our weighting procedures start with a nonresponse adjustment. We use a cell-based method to ensure the adjusted sum of weights for interviewed units = total eligible weight. 2. Using census data, we essentially counted all the housing units in the stratum and divided that count by the sample-based estimate for the stratum (the census HU count in our selected PSU multiplied by the PSU weight). 3. Adjust to independent housing unit estimates. 4. Adjust to population controls from the Census Bureau’s Population Division.
  10. For each sample case, the unbiased weight is multiplied by replicate factors to produce unbiased replicate weights. These unbiased weights are further adjusted through the weighting procedures just as the full sample is weighted (refer to the prior slides). By applying all of the weighting adjustments to each replicate, the final replicate weights reflect the impact of the weighting adjustments on the variance.
  11. There are 160 replicates for each sampled case. Use this formula to create standard errors, where $\hat{\theta}_r$ is the estimate summed using replicate weight $r$ (there will be 160 of these sums) and $\hat{\theta}_0$ is the estimate summed using the full-sample (“original”) weights.
  12. The expanded example of the Standard Error formula utilizing the 160 replicates.
  13. States with higher SR percentages will yield better state-level estimates. This is because the NSR portion of the sample was selected at the region level (rather than the state level): NSR selection yields estimates that represent strata which may span several different states. For the purpose of this study, we will only consider states whose SR percentage under the 1985 design is at or near 75% or above. 13 states: Arizona, California, Connecticut, DC, Florida, Hawaii, Illinois, Maryland, Massachusetts, New Jersey, New York, Pennsylvania, and Rhode Island. FYI, the 1980 is NOT A TYPO. The prior design was selected in 1985 using the 1980 Census. Because we are using the prior design (remember that the new design is still in the field and we have yet to receive that data), we are limited to SR/NSR percentages from the 1980 Census. For this study, we will use data collected from the 2013 AHS.
  14. %Diff = 100 * (Method 1 – Control)/Control. THE “CONTROL” COLUMN REPRESENTS A CONTROL TOTAL FROM POPULATION DIVISION OF HOUSING UNITS ONLY. I’ll explain why this is an important note a few slides later. Note some large CVs (Arizona, Maryland) and large %Diff values (Hawaii, Pennsylvania, Rhode Island). Why is Hawaii’s %Diff so large? In 1985, AHS only selected SR sample cases from Hawaii. NSR selection within the West Region didn’t allocate any sample cases to Hawaii, even though there were eligible cases in NSR counties. This is different from Connecticut, DC, and Rhode Island, because those states do not have any NSR counties. Because Hawaii has no NSR sample cases (thus sample weight = 0) in the AHS, there is no AHS representation of its NSR counties. The synthetic method drew NSR cases from other states in the West Region, which is why the underestimate for Hawaii under Method 2 was cut down to around 11%. Because AHS selected no sample in Hawaii’s NSR counties, and thus always underestimates Hawaii, we will eliminate Hawaii as a candidate.
  15. Remember from a prior slide (slide 10) that record keeping in 1985 wasn’t very good: records exist for the selected PSUs, but records DO NOT exist for the PSUs that each selected PSU represented. In our example, if we knew which PSUs Lassen County represented, then we could “spread out” the 2,250 weight among those PSUs. But because of the lack of records, we cannot make any assumptions about the PSUs represented; thus we have to spread out the 2,250 among states in the entire region. This is not efficient, but we do not have a choice. For the new 2015 design, we will know which PSUs the selected PSUs represent, and we will be able to spread out weights among only those PSUs. Also, the 2015 sample was not selected by REGION, but rather by DIVISION, which is a smaller breakdown of REGION (there are 9 divisions within the 4 regions).
  16. Repeat the process of distributing weights for all NSR housing units by state in each region.
  17. %Diff = 100 * (Method 2 – Control)/Control. CVs look mostly pretty good; large %Diff values in Arizona and Rhode Island.
  18. CVs look great here! Remember that THE “CONTROL” COLUMN REPRESENTED A CONTROL TOTAL FROM POPULATION DIVISION OF HOUSING UNITS ONLY. Method 3 uses the housing unit control total along with population controls, specifically controlling for the Black and Hispanic populations. You’ll notice that states with larger populations in these groups had their estimates slightly altered by these population controls (Arizona, California, DC, Maryland). These SE/CV values look pretty good at the total HU count level. One more analysis, using Generalized Variance Functions (GVFs), will allow us to determine whether subdomain estimates are feasible (subdomains are counts such as occupied, vacant, owner, renter, etc.).
  19. CVs looked good for Total Housing Unit counts for each of the states, but what about analysis of subdomains? For this analysis we will look at columns of Table 1-1 in the AHS Publication: Seasonal, Total Occupied, Total Vacant, New Construction, and Mobile Homes
  20. From these values we can obtain average weights, then obtain an estimated sample size for each sub-domain.
  21. How large a sample in each sub-domain is enough to obtain good estimates? That is still being researched. Notice the small sample sizes for sub-domains in DC and Rhode Island. This probably spells doom for those states.
  22. Look at the CVs for each state’s sub-domain estimates based on sample size. We want to keep these below 15%. Greyed-out cells are those with CVs greater than 15%. This will help us target states that are likely candidates for elimination, as well as sub-domains to be suppressed. We are currently researching sufficient sample sizes along with CV values at various estimation points.