Alternative Allocation Design for
the Occupational Employment
Statistics (OES) Survey
Ernest Lawley, Bureau of Labor Statistics
Marie C. Stetser, Bureau of Labor Statistics
Dr. Eduardas Valaitis, American University
OEUS ANNUAL MEETING 2007
Washington, DC
Alternative Allocation Design for the
Occupational Employment
Statistics (OES) Survey
• Occupational Employment Statistics (OES)
Survey
• Frame Development
• Frame Stratification
• Sample Requirements
• Prior Allocation Design
• Current Allocation Design
• Calculating Sh (standard error)
• Reliability
OES Survey
• Partnership with 50 States + DC, Guam,
Puerto Rico, US Virgin Islands
• Measures occupational employment and
wages within 300+ industry groups*
– Approximately 800 detailed occupations
(SOC)
– Broken down by MSA—aggregated Statewide
and Nationwide
*using 4-digit and 5-digit NAICS codes
Frame Development
• Quarterly Census of Employment and Wages (QCEW)
– Collects non-railroad data for all business establishments for 50
States + DC, PR, USVI
– Data includes pertinent information for each establishment such
as: Trade Name, Legal Name, Address information, and Monthly
Employment for the past 12 months
– Data compiled into Bureau’s Longitudinal Database (LDB)
• Railroad Frame File
– Collected by Bureau’s Office of Safety and Health (OSH)
• Guam Frame File
– Collected by one of the BLS Regional Offices
All three elements combined; OES Frame≈6.7 million
business establishments
Frame Stratification
• Frame initially stratified geographically
– Approximately 600 geographic areas
• Approximately 400 State/Metropolitan Statistical Areas (MSAs)
• Approximately 200 Non-MSA Areas (“rural”)
• Frame further stratified by detailed industry (NAICS 4-
digit, selected NAICS 5-digit)
– Approximately 350 industries
– Industry is related to occupation
• Approximately 170,000 total non-empty strata
– Each business establishment in the nation fits into exactly one of
these defined strata
– Each non-empty stratum contains one business establishment to
hundreds of business establishments
Frame Stratification
State 1
  MSA X: Industry 1, Industry 2
  MSA Y: Industry 1, Industry 2
State 2
  MSA X: Industry 1, Industry 2
  MSA Z: Industry 1, Industry 2
Sample Requirements
• Sample allocated by stratum
• Sample Allocation≈1.2 million establishments
• Individual State Sample Sizes (∑≈1.2 million)
– Confidential value for each State
– Based on State employment population
– Last modified in 1996
Example (hypothetical; exact values are confidential):
State        State Sample Size
California   120,000
Texas        100,000
New York     100,000
Florida       85,000
And so forth… Σ≈1.2 million
Prior Allocation Design
“Proportional-to-Employment”
• Maximum Employment
– Maximum monthly employment value in LDB for each
establishment
STEPS:
1. Sum max employment values across stratum, Nh
2. Sum max employment values across state, ΣNh
3. Look up Individual State Sample Size, n
4. Calculate stratum allocation: nh=n∙(Nh/ΣNh)
5. Repeat calculation for all strata, approx. 170,000
times
Note: n may require iterative reduction to work
minimum sample allocation requirements for each
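The steps above can be sketched in a few lines of Python. This is only an illustration: the stratum labels, employment totals, and state sample size are hypothetical, and the sketch leaves out rounding and the minimum-allocation adjustment mentioned in the note.

```python
def proportional_allocation(stratum_employment, state_sample_size):
    """Allocate a state's sample n across strata in proportion to N_h.

    stratum_employment maps a stratum label to N_h, the summed maximum
    employment of the establishments in that stratum.
    """
    total_employment = sum(stratum_employment.values())  # step 2: sum of N_h
    return {
        stratum: state_sample_size * n_h / total_employment  # step 4
        for stratum, n_h in stratum_employment.items()
    }


# Hypothetical strata (area, industry) and employment totals.
employment = {
    ("MSA X", "Industry 1"): 50_000,
    ("MSA X", "Industry 2"): 30_000,
    ("MSA Y", "Industry 1"): 20_000,
}
allocation = proportional_allocation(employment, state_sample_size=1_000)
for stratum, n_h in allocation.items():
    print(stratum, n_h)
```

Because only N_h enters the calculation, two strata with the same employment always receive the same sample, whatever their occupational mix.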
Prior Allocation Design
• Advantages
– Simple
– Strata with larger populations are allocated
more sample
• Is this necessarily an advantage?
Prior Allocation Design
“A sample should allocate most heavily to those
strata where the least amount of certainty
exists.”
Causes for uncertainty (less reliability)
within a sampled stratum:
• Undersampling a large population
• Undersampling where there is a large
variability in occupations
Prior Allocation Design
• Disadvantage
– Estimates in smaller strata that have large
occupational variability may not be reliable
due to allocation of smaller sample size
Prior Allocation Design
EXAMPLE
Which of these cells should be allocated more sample?
Accommodation/Food Services Industry
• 90% of all employees work in 88 occupations
• 12.8 million workers in this industry
• Using “Proportional Allocation”: 120,000 establishments
Wholesale Trade Industry
• 90% of all employees work in 175 occupations
• 6.1 million workers in this industry
• Using “Proportional Allocation”: 72,000 establishments
Current Allocation Design
Neyman Allocation
$$ n_h = n \cdot \frac{N_h \cdot S_h}{\sum_{h=1}^{H} (N_h \cdot S_h)} $$
n=Individual State “fixed” sample size
Nh = sum of stratum frame employees
Sh represents an occupational
variability measure within a stratum
Occupations for each stratum (or cell)
obtained from recent estimates
file; weighted data
Denominator summed overall by
state
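A minimal sketch of this allocation, assuming each stratum is described by a pair (N_h, S_h). The stratum labels and all numbers are hypothetical, and rounding and minimum sample sizes are ignored.

```python
def neyman_allocation(strata, state_sample_size):
    """Allocate n across strata in proportion to N_h * S_h.

    strata maps a stratum label to (N_h, S_h): frame employment and the
    occupational variability measure for that stratum.
    """
    weights = {h: n_h * s_h for h, (n_h, s_h) in strata.items()}
    total = sum(weights.values())  # denominator, summed over the state
    return {h: state_sample_size * w / total for h, w in weights.items()}


# Hypothetical strata: one large but homogeneous, one small but diverse.
strata = {
    "large, homogeneous": (60_000, 0.05),
    "small, diverse": (20_000, 0.30),
}
alloc = neyman_allocation(strata, state_sample_size=1_000)
for stratum, n_h in alloc.items():
    print(f"{stratum}: {n_h:.1f}")
```

In this made-up case the smaller stratum receives about two thirds of the sample, because its variability measure is six times larger while its employment is only three times smaller.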
Current Allocation Design
Neyman Allocation:
$$ n_h = n \cdot \frac{N_h \cdot S_h}{\sum_{h=1}^{H} (N_h \cdot S_h)} $$
Proportional Allocation:
$$ n_h = n \cdot \frac{N_h}{\sum_{h=1}^{H} N_h} $$
Notice that the “Occupational Variability” measure, Sh, is the only “adjustment” from the Proportional Allocation formula.
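One way to see the relationship between the two formulas: if every stratum had the same variability measure, say S_h = S (an assumption made here only for illustration), the common factor would cancel and the Neyman allocation would reduce to the proportional one.

```latex
n_h = n \cdot \frac{N_h \cdot S}{\sum_{h=1}^{H} (N_h \cdot S)}
    = n \cdot \frac{S \cdot N_h}{S \cdot \sum_{h=1}^{H} N_h}
    = n \cdot \frac{N_h}{\sum_{h=1}^{H} N_h}
```

The two designs therefore differ only to the extent that S_h varies across strata.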
Calculating Sh
1. Calculate a “coefficient of variation” for
each occupation within an industry.
2. Determine the 90th percentile of occupations
within each industry.
3. Sh (for each industry) is calculated by
obtaining the weighted mean of CVs for
the 90th-percentile occupations within
each industry.
Calculating Sh
Step 1: Calculating a “coefficient of variation” for
each occupation within stratum
– Using most recent weighted estimates file:
• Count # of employees in each occupation for each business
establishment (call this yi)
• Count # of employees total for each business establishment
(call this xi)
• Sample weight, wi, represents the number of business
establishments that each establishment on the estimates
file (i) represents
• Create a “weighted ratio” Rw = Σ(wi∙yi)/Σ(wi∙xi), summed over
a defined cell
– Note: This ratio is the ratio of occupational employment to
overall employment; ratio will always be ≤ 1.
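A short sketch of the weighted ratio for one occupation in one cell. The (wi, yi, xi) values are borrowed from the restaurant example later in the deck; the function name is just an illustrative choice.

```python
def weighted_ratio(records):
    """Return R_w = sum(w_i * y_i) / sum(w_i * x_i) for one occupation.

    records is a list of (w_i, y_i, x_i) tuples, one per sampled
    establishment in the cell: sample weight, employment in the
    occupation, and total employment.
    """
    numerator = sum(w * y for w, y, _ in records)
    denominator = sum(w * x for w, _, x in records)
    return numerator / denominator


# Waitress/Waiter counts for two sampled establishments in one cell.
records = [(5, 8, 16), (1, 32, 60)]
r_w = weighted_ratio(records)
print(round(r_w, 3))
```

Since yi can never exceed xi for any establishment, the ratio stays at or below 1 regardless of the weights.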
Calculating Sh
• CV formula (unweighted)
– Derived from variance formula
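– Relative variance (CV²) for an original variate Yi:
$$ CV_Y^2 = \frac{S_Y^2}{\bar{Y}^2} = \frac{\sum_{i}^{N} (Y_i - \bar{Y})^2}{(N-1) \cdot \bar{Y}^2} $$
– Using a little algebra (remember R=y/x):
$$ CV_Y = \frac{S_y}{\bar{y}} = \frac{S_y}{R \cdot \bar{x}} = \frac{1}{\bar{x}} \cdot \frac{S_y}{R} $$
$$ CV_Y = \frac{1}{\bar{x}} \cdot \frac{\sqrt{\dfrac{\sum_{i=1}^{N} (y_i - R \cdot x_i)^2}{N-1}}}{R} $$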
Calculating Sh
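• CV formula (for each defined “Sh cell”),
summed by cell (including weights):
$$ CV_{Y_R} \approx \frac{1}{\bar{x}} \cdot \frac{\sqrt{\dfrac{\sum_{i=1}^{n} \left[ w_i (y_i - R_w \cdot x_i) \right]^2}{\sum_{i} w_i - 1}}}{R_w} $$
• Note: x-bar is a weighted average.
$$ \bar{x} = \frac{\sum_{i}^{n} w_i x_i}{\sum_{i}^{n} w_i} $$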
Calculating Sh
EXAMPLE (hypothetical cell with 2 sampled business establishments)
• Restaurant ABC; represents 5 businesses
• What is ABC’s weight?
• Restaurant XYZ; represents itself (1 business)
• What is XYZ’s weight?
ABC’s Staffing Pattern
Occupation # employed
Waitress/Waiter 8
Cook 4
Dishwasher 2
Janitor 1
Manager 1
TOTAL 16
XYZ’s Staffing Pattern
Occupation # employed
Waitress/Waiter 32
Cook 15
Dishwasher 10
Manager 3
TOTAL 60
Calculations for ABC
Occupation        yi   wi∙yi      xi   wi∙xi
Waitress/Waiter   8    5∙8=40     16   5∙16=80
Cook              4    5∙4=20     16   5∙16=80
Dishwasher        2    5∙2=10     16   5∙16=80
Janitor           1    5∙1=5      16   5∙16=80
Manager           1    5∙1=5      16   5∙16=80
Calculations for XYZ
Occupation        yi   wi∙yi      xi   wi∙xi
Waitress/Waiter   32   1∙32=32    60   1∙60=60
Cook              15   1∙15=15    60   1∙60=60
Dishwasher        10   1∙10=10    60   1∙60=60
Manager           3    1∙3=3      60   1∙60=60
Calculating Sh
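$$ CV_{Y_R} \approx \frac{1}{\bar{x}} \cdot \frac{\sqrt{\dfrac{\sum_{i=1}^{n} \left[ w_i (y_i - R_w \cdot x_i) \right]^2}{\sum_{i} w_i - 1}}}{R_w} $$
Waitress/Waiter inputs:
ABC: yi=8, wi∙yi=5∙8=40, xi=16, wi∙xi=5∙16=80
XYZ: yi=32, wi∙yi=1∙32=32, xi=60, wi∙xi=1∙60=60
Waitress/Waiter:
$$ CV_{Y_R} \approx \frac{1}{\frac{80+60}{5+1}} \cdot \frac{\sqrt{\dfrac{\left(40 - \frac{40+32}{140} \cdot 80\right)^2 + \left(32 - \frac{40+32}{140} \cdot 60\right)^2}{6-1}}}{\frac{40+32}{140}} \approx 0.060 $$
Example: CVs for Occupations
Occupation        CV
Waitress/Waiter   0.060
Cook              0
Dishwasher        0.271
Janitor           1.626
Manager           0.203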
The smaller the CV
value, the less diverse
the occupation is within
the defined cell.
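The CV table can be checked with a short script. This sketch assumes the weighted formula applies the weight inside the squared term, [wi(yi − Rw∙xi)]², which is the reading that matches the Waitress/Waiter arithmetic shown on the slide. The function and variable names are illustrative, and XYZ is taken to employ zero janitors.

```python
from math import sqrt


def weighted_cv(records):
    """Weighted CV of one occupation within one cell.

    records is a list of (w_i, y_i, x_i): sample weight, employment in
    the occupation, and total employment of each sampled establishment.
    """
    sum_w = sum(w for w, _, _ in records)
    sum_wx = sum(w * x for w, _, x in records)
    sum_wy = sum(w * y for w, y, _ in records)
    if sum_wy == 0:
        raise ValueError("occupation has no employment in this cell")
    r_w = sum_wy / sum_wx    # weighted ratio R_w
    x_bar = sum_wx / sum_w   # weighted mean establishment size
    squared = sum((w * (y - r_w * x)) ** 2 for w, y, x in records)
    return sqrt(squared / (sum_w - 1)) / (r_w * x_bar)


ABC_WEIGHT, ABC_TOTAL = 5, 16
XYZ_WEIGHT, XYZ_TOTAL = 1, 60
# Occupation -> (employment at ABC, employment at XYZ)
staffing = {
    "Waitress/Waiter": (8, 32),
    "Cook": (4, 15),
    "Dishwasher": (2, 10),
    "Janitor": (1, 0),
    "Manager": (1, 3),
}
cvs = {
    occupation: weighted_cv(
        [(ABC_WEIGHT, abc, ABC_TOTAL), (XYZ_WEIGHT, xyz, XYZ_TOTAL)]
    )
    for occupation, (abc, xyz) in staffing.items()
}
for occupation, cv in cvs.items():
    print(f"{occupation:16s} {cv:.3f}")
```

Cook comes out at exactly 0 because both restaurants employ cooks at the same 25% share of staff, while Janitor has the largest CV because only one of the two establishments reports the occupation.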
Calculating Sh
Step 2: Avoiding “atypical” occupations
within each cell:
• Conservative approach: utilize the 90th
percentile until further research is done
• Exclude the bottom 10th percentile of
occupations
Calculating Sh
Step 3: A CV is created for each occupation
within a defined cell—How are occupations
within a cell “combined” to create one value
for the cell?
– Weighted mean of 90th-percentile occupations
• Obtain occupational proportion for each cell
• Obtain Sh by calculating weighted mean of the top 90th
percentile of occupations
– Less prevalent (bottom 10%) occupations are eliminated
– Sh=weighted mean of 90th-percentile CVs within defined cell
Calculating Sh
Example (sorted in “proportional order”)
Weighted Mean of CVs of All Occupations
Occupation        CV      Proportion     Product
Waitress/Waiter   0.060   72/140≈0.51    0.060*0.51≈0.03
Cook              0       35/140=0.25    0*0.25=0
Dishwasher        0.271   20/140≈0.14    0.271*0.14≈0.04
Manager           0.203   8/140≈0.06     0.203*0.06≈0.01
Janitor           1.626   5/140≈0.04     1.626*0.04≈0.07
90th percentile (look at proportions): Waitress/Waiter, Cook, and Dishwasher
• 90th-percentile Occupations
– Weighted mean=Sh=Σ “products”
≈ 0.03 + 0 + 0.04 = 0.07
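The table's calculation can be sketched as follows. The cutoff rule used here (keep adding occupations in descending order of proportion until the cumulative share reaches 90%) is an assumption about how the 90th percentile is applied. As on the slide, Sh is the sum of CV × proportion over the retained occupations, with no renormalization.

```python
def occupational_variability(occupations, coverage=0.90):
    """Return (S_h, retained occupation names) for one cell.

    occupations is a list of (name, cv, weighted_employment) tuples.
    Occupations are ranked by employment share and retained until the
    cumulative share reaches `coverage`.
    """
    total = sum(emp for _, _, emp in occupations)
    ranked = sorted(occupations, key=lambda occ: occ[2], reverse=True)
    s_h, cumulative, kept = 0.0, 0.0, []
    for name, cv, emp in ranked:
        if cumulative >= coverage:
            break  # remaining occupations are the less prevalent tail
        proportion = emp / total
        s_h += cv * proportion
        cumulative += proportion
        kept.append(name)
    return s_h, kept


# CVs and weighted employment (out of 140) from the restaurant example.
cell = [
    ("Waitress/Waiter", 0.060, 72),
    ("Cook", 0.0, 35),
    ("Dishwasher", 0.271, 20),
    ("Manager", 0.203, 8),
    ("Janitor", 1.626, 5),
]
s_h, kept = occupational_variability(cell)
print(kept, round(s_h, 2))
```

Dropping the tail keeps the Janitor CV of 1.626 from contributing, even though it is by far the largest value in the cell.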
Calculating Sh
Defining Sh “cell”
– Normality of individual CVs
– Sufficient amount of data to create reliable estimate of
occupational variability (Sh)
Calculating Sh
Aggregation by National Industry (Industry-only)
Concerns:
– Assumption that national aggregates of industry will produce
accurate CVs and Sh values
• Aggregation necessary due to lack of data for finely-detailed cells
• 88.6% of industry MSA-BOS staffing patterns were similar to
corresponding nationally-aggregated industry staffing patterns
(α=0.10)
Reliability
• Problem of small populations in geographic areas
• Desire to produce similar reliability in large and small areas
– Example: Utilizing the Neyman Allocation method illustrated,
Chicago takes up approximately 54% of Illinois’s sample allocation;
this may lead to a possibly unreliable sample in non-Chicago areas
within Illinois
Reliability
How to “spread out” sample allocation?
Bankier (1988): Power Allocations: Determining Sample Sizes for
Subnational Areas
• Adjust exponent for Nh (numerator and denominator) in the
Neyman Allocation
• Drops Chicago’s value to approx. 34% of IL’s sample allocation
$$ n_h = n \cdot \frac{N_h \cdot S_h}{\sum_{h=1}^{H} (N_h \cdot S_h)} $$
Nh = sum of stratum frame employees
Sh represents an occupational
variability measure within a stratum
Occupations for each stratum (or
cell) obtained from recent
estimates file; weighted data
Denominator summed overall by state
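A rough sketch of the exponent adjustment, assuming it amounts to replacing N_h with N_h raised to a power q in both the numerator and the denominator (q = 1 gives back the Neyman allocation; q = 0.5 is the square-root version). The area names and figures below are hypothetical and are not the Illinois data.

```python
def power_allocation(strata, state_sample_size, exponent=0.5):
    """Allocate n in proportion to (N_h ** exponent) * S_h.

    strata maps a label to (N_h, S_h). With exponent=1.0 this is the
    Neyman allocation; smaller exponents spread sample toward strata
    with small N_h.
    """
    weights = {h: (n_h ** exponent) * s_h for h, (n_h, s_h) in strata.items()}
    total = sum(weights.values())
    return {h: state_sample_size * w / total for h, w in weights.items()}


# Hypothetical areas with equal variability, so only N_h drives the split.
areas = {
    "Large metro": (4_000_000, 0.10),
    "Small metro": (90_000, 0.10),
    "Rural": (40_000, 0.10),
}
neyman = power_allocation(areas, 10_000, exponent=1.0)
square_root = power_allocation(areas, 10_000, exponent=0.5)
for area in areas:
    print(f"{area}: {neyman[area]:.0f} -> {square_root[area]:.0f}")
```

With these made-up numbers the large metro's share falls from roughly 97% to 80% of the state sample, the same direction of change as the Chicago figures quoted above.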
Reliability
[Figure: Total Allocation for Illinois. Bar chart of allocation (axis from 0 to 20,000) by area: Bloomington, Champaign, Chicago, Danville, Rock Island, Decatur, Kankakee, Peoria, Rockford, E. St. Louis, Springfield, BOS 1, BOS 2, BOS 3, BOS 4. Series: Neyman 90th and Neyman 90th (SqRoot).]
Alternative Allocation Design for
the Occupational Employment
Statistics (OES) Survey
QUESTIONS?
lawley.ernest@bls.gov

Editor's Notes

  1. QCEW/LDB will be referenced in a couple of future slides
  2. State/MSA: an MSA is split into two or more parts if it crosses a state line; e.g., Kansas City is divided into Kansas City, MO and Kansas City, KS.
  3. Now the frame has been created and stratified; time to allocate the sample: 6.7 million frame → 1.2 million sample
  4. Last modified in 1996 and hasn’t been modified since due to possible shifting of budgets from state-to-state. States do not like when budgets are shifted because it takes money away from one State and gives it to another.
  5. Maximum Employment: refer to QCEW/LDB slide 4
  6. An objective of a good sample design is to produce similar reliability of estimates within stratum; namely, similar occupational reliability for each defined area.
  7. Note: you want to select an appropriate sample (and not short-change) in strata where populations are large, even though occupations are not so variable. This is due to the fact that you still want to create reliable estimates. The “proportional to employment” design ensured that large populations would not be undersampled, but did not ensure a proper sample for populations where large occupational variability existed.
  8. Inefficiency: too much sample is wasted on (occupationally) homogeneous strata; better to collect more heterogeneous information.
  9. Notice that the Accommodation/Food Services industry contains fewer occupations, and thus is more homogeneous, than the Wholesale Trade industry, which contains more occupations (it is more heterogeneous, i.e., exhibits more occupational diversity). Proportional Allocation does not consider occupational variability within a stratum (or aggregated stratum). Is there a method that takes into consideration both frame size AND occupational variability?
  10. Notice that increasing either value in the numerator increases the allocation (the denominator is fixed for each state). Question: what do we use for Sh? Sh is defined as the “key variable” of interest; it measures some sort of “variability”. We are measuring employment and wages for 800 different occupations by stratum. Within each stratum exists an “occupational variance”. Where to obtain occupational information? Occupational information is NOT on the QCEW (slide 4). It must be obtained from the most readily available source: the most recent OES estimates file.
  11. How to calculate Sh? VERY CAREFULLY! For simplicity, an Sh cell will be defined as industry-only (disregard geography).
  12. Including weights in formula.
  13. We do not want to include any unusual occupations when aggregating CV values; these unusual occupations tend to have high CV values and may heavily (and unnecessarily) influence occupational variance, Sh.
  14. Put step 1 and step 2 together, calculate weighted mean of CVs for 90th percentile occupations within industry.
  15. 90th percentile: .51+.25+.14=.90 Note: Utilizing the 90th-percentile weighted mean is a more objective measure of Sh than using the “unweighted mean of typical occupations” method. Choosing typical occupations for an industry may be a subjective measure.
  16. How are we going to group CVs to get a weighted average for Sh? Best grouping was determined nationally by industry. That is, take a weighted mean of all CVs for the 90th percentile of occupations in each national industry cell (or stratum).
  17. Does staffing pattern of a local industry match staffing pattern of national industry? Does staffing pattern of education industry in Miami match national education staffing pattern? Does staffing pattern of education industry in Salt Lake City match national education staffing pattern?
  18. How are we going to group CVs to get a weighted average for Sh? Best grouping was determined nationally by industry. That is, take a weighted mean of all CVs for the 90th percentile of occupations in each national industry cell (or stratum).
  19. Neyman Allocation using occupational variability measure for Sh takes care of the problem of unreliable estimates within industries that may have a small number of employees relative to occupational variability, but what about unreliable estimates in areas of states where there are small populations? Bloomington, Champaign, Danville, Kankakee, Springfield
  20. Adjusted exponent to Nh and tested; acceptable level of “spread” of allocation when exponent=0.5.