Predicting Long-Term Unemployment for Workforce System Clients

Insights from Machine Learning
Jessica Smith Stockham

Objective
• Inform how the U.S. Department of Labor (DOL) workforce system can direct limited follow-up resources to the clients who are most likely to have trouble finding or keeping a job
• Use machine learning to identify the characteristics of workforce system clients who are most likely to be unemployed 1 year after exiting workforce system services

Method
[Diagram: 107 predictors → Tree Algorithm → Unemployed or Employed]
• Fit 2 different “tree-based” machine learning algorithms
  • Decision Trees
  • Random Forests
• Decision trees are relatively simple and quick to fit but can suffer from over-fitting (i.e., they fit the training dataset too closely and do not generalize well to future data)
• Random Forests are more computationally intensive and more complicated, but generalize better to future data
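The over-fitting contrast above can be illustrated with a minimal sketch. It assumes scikit-learn and uses synthetic data from `make_classification` as a stand-in; it does not use the WIOA records, and the sample size, noise level, and class weights are illustrative choices, not the settings used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the client data (NOT the WIOA records):
# 107 predictors, roughly one third in the positive ("unemployed") class,
# and 20% label noise so that memorizing the training set does not pay off.
X, y = make_classification(
    n_samples=2000, n_features=107, n_informative=10,
    weights=[0.67, 0.33], flip_y=0.2, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# An unconstrained tree keeps splitting until it fits the training data.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# A forest averages many trees, each grown on a bootstrap sample.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Store (training accuracy, test accuracy) for each model.
scores = {
    name: (model.score(X_train, y_train), model.score(X_test, y_test))
    for name, model in [("decision tree", tree), ("random forest", forest)]
}
for name, (train_acc, test_acc) in scores.items():
    print(f"{name}: train accuracy {train_acc:.2f}, test accuracy {test_acc:.2f}")
```

A large gap between training and test accuracy for the single tree is the over-fitting symptom described above. Limiting tree size (for example with `max_leaf_nodes`) is one common way to narrow it.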

Data
1. DOL Performance Records
  • Administrative data with characteristics of individuals served, workforce services provided, and employment outcomes for 4 quarters after exit
  • PY 2018 Q2 WIOA Performance Records Public Use Data File
2. O*Net Skills Dataset
  • Mapping of occupation codes to 35 skills (e.g., reading comprehension, active listening)
  • Every occupation is rated on each skill on a 0-5 scale
  • E.g., Chief Executives have a score of 4.88 on Active Listening and a score of 0 on Equipment Maintenance.
1 Active Learning
2 Active Listening
3 Complex Problem Solving
4 Coordination
5 Critical Thinking
6 Equipment Maintenance
7 Equipment Selection
8 Installation
9 Instructing
10 Judgment and Decision Making
11 Learning Strategies
12 Management of Financial Resources
13 Management of Material Resources
14 Management of Personnel Resources
15 Mathematics
16 Monitoring
17 Negotiation
18 Operation Monitoring
19 Operation and Control
20 Operations Analysis
21 Persuasion
22 Programming
23 Quality Control Analysis
24 Reading Comprehension
25 Repairing
26 Science
27 Service Orientation
28 Social Perceptiveness
29 Speaking
30 Systems Analysis
31 Systems Evaluation
32 Technology Design
33 Time Management
34 Troubleshooting
35 Writing

Data Scope and Limitations
• Data Scope
  • ~1 million workforce system clients
  • Client’s most recent spell of service at the workforce center
  • Adults ages 25-65 served by the workforce system from July 1, 2016 to December 31, 2018
  • 50 states (excludes territories)
  • Employment outcomes available
• Limitation: About half of the raw data file was lost when I merged in the O*Net skills ratings (due to missingness on the most recent occupation variable)
  • Results may not generalize to the broader workforce system population
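The data loss from the merge can be measured before any records are dropped. The sketch below assumes pandas; the column names (`client_id`, `occupation_code`, `active_listening`) and all rows are hypothetical, apart from the 4.88 Active Listening rating for Chief Executives quoted in the Data slide.

```python
import pandas as pd

# Hypothetical client records; "occupation_code" is missing for some clients.
clients = pd.DataFrame({
    "client_id": [1, 2, 3, 4],
    "occupation_code": ["11-1011", None, "47-2111", None],
})
# Hypothetical skill ratings keyed by occupation code (0-5 scale).
# The second rating is made up for illustration.
skills = pd.DataFrame({
    "occupation_code": ["11-1011", "47-2111"],
    "active_listening": [4.88, 3.0],
})

# A left merge keeps every client; indicator=True adds a "_merge" column
# recording whether a matching occupation was found.
merged = clients.merge(skills, on="occupation_code", how="left", indicator=True)
matched = merged["_merge"] == "both"

share_lost = 1 - matched.mean()
analysis_sample = merged.loc[matched].drop(columns="_merge")
print(f"Share of clients without skill ratings: {share_lost:.0%}")
```

Keeping the unmatched rows in `merged` also makes it possible to compare their characteristics with the matched rows, which is one way to check how far results might generalize.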

Participant Trends
• OUTCOME:
  • 33% of participants are unemployed 1 yr later
• SELECT FEATURES
  • 53% have only a high school diploma/GED or less
  • 43% are White
  • 28% are Black
  • Age range is diverse
[Chart: Age distribution. 25-30: 20%; 31-40: 27%; 41-50: 24%; 51-65: 29%]

Decision Tree: 15 Most Important Features to predict unemployment
• Age 51-65 (vs Age 25-30)
• Education Level Less than HS (vs having a BA or higher)
• Duration (in days) of workforce system service receipt
• Living in LA, FL, MA, CT, NH, IN (vs California)
• Veteran status
• Being long-term unemployed prior to receiving workforce system services
• Being Black or providing “no response” to the race/ethnicity data element (vs being White)
• The following job skills: programming, monitoring


Random Forest: 15 Most Important Features to predict unemployment
• Age 51-65 (vs Age 25-30)
• Education Level Less than HS (vs having a BA or higher)
• Duration (in days) of workforce system service receipt
• Living in LA or FL (vs CA)
• Providing “no response” to the race/ethnicity data element (vs being White)
• The following job skills: programming, monitoring, reading comprehension, math, writing, operation monitoring
• Being long-term unemployed prior to receiving workforce system services
• Veteran status


Prediction Results
Model | Best Parameters | Accuracy on Validation Dataset (Share of Correctly Classified Cases) | Accuracy on Test Dataset (Share of Correctly Classified Cases)
Decision Tree | max # of leaf nodes = 100 | 67% | 67%
Random Forest | max depth of tree = 10; max features = 14; N estimators = 100 | 67% | 67%

Takeaways
• Decision Tree and Random Forest models have the same level of accuracy (67%) on the validation and test data.
• However, they vary somewhat in the most important predictive features
• 67% accuracy is not much better than a coin flip, and because 33% of participants are unemployed 1 year later, always predicting “employed” would be correct about 67% of the time if that share holds in the test data
• Future research: Improve predictive power
  • Modify features (add interaction and higher order terms)
  • Restrict the data scope to a more homogeneous subset of the workforce system clients, such as low-income adults age 30-40 in California.
  • Diagnose who is missing data on O*Net skill ratings
  • Try additional machine learning algorithms
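One way to put the 67% accuracy in context is to compare it with a majority-class baseline, i.e., always predicting the more common outcome. The short calculation below is a sketch that assumes the 33% unemployment share from the Participant Trends slide also holds in the evaluation data; the function name is illustrative.

```python
def majority_class_accuracy(positive_share: float) -> float:
    """Accuracy of always predicting the more common of two classes."""
    return max(positive_share, 1.0 - positive_share)


unemployed_share = 0.33  # share unemployed 1 year after exit (from the slides)
model_accuracy = 0.67    # accuracy reported for both models

baseline = majority_class_accuracy(unemployed_share)
print(f"Majority-class baseline: {baseline:.0%}")
print(f"Gain over baseline: {model_accuracy - baseline:+.1%}")
```

Under that assumption the models add little over the baseline, which is why metrics such as recall for the unemployed class may be worth reporting alongside accuracy in future work.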

Project website
• https://jhsmith22.github.io/workforce_ml/

Extra: Fitting a Machine Learning Model
1. Engineer the features: clean data and recode values as needed. Convert categorical features into binary dummies.
2. Split data into “training” and “test” datasets. Hold the “test” data in reserve until Step #5.
3. Try out a range of model parameters on the training dataset, leveraging 5-fold cross-validation to create a more robust fit.
4. Pick the best model parameters based on prediction accuracy (How well does the model trained on the “training dataset” predict the outcomes in the “validation” dataset?)
5. Assess how well my model generalizes to unseen data. Evaluate how accurately the model predicts the outcomes in the “test” dataset.
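The five steps above can be sketched end to end with pandas and scikit-learn. This is an illustration under stated assumptions, not the project's code: the data are randomly generated, the column names are hypothetical, and the parameter grid is an arbitrary example.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Made-up data standing in for the client records (hypothetical columns).
rng = np.random.default_rng(0)
n = 600
df = pd.DataFrame({
    "education": rng.choice(["less_than_hs", "hs_or_ged", "ba_or_higher"], size=n),
    "service_days": rng.integers(1, 400, size=n),
    "veteran": rng.integers(0, 2, size=n),
})
risk = np.where(df["education"] == "less_than_hs", 0.5, 0.25)
df["unemployed"] = (rng.random(n) < risk).astype(int)

# Step 1: convert categorical features into binary dummies.
X = pd.get_dummies(df.drop(columns="unemployed"), columns=["education"])
y = df["unemployed"]

# Step 2: split into training and test data; the test data stay untouched.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Steps 3-4: try a range of parameters with 5-fold cross-validation and
# keep the setting with the best mean accuracy on the validation folds.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_leaf_nodes": [5, 10, 50, 100]},
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Step 5: evaluate the refitted best model once on the held-out test data.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(search.best_score_, 2), round(test_accuracy, 2))
```

With 5-fold cross-validation, the “validation” dataset in Step 4 is each held-out fold of the training data in turn, so `best_score_` is an average over five validation folds.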

Demographic & Socioeconomic Features
# | Predictor | Coding | Description
1 | Age | Continuous | Age in years at program entry
2 | Sex | Categorical | Male, Female, neither. Omitted “Male” for interpretability.
3 | Race/Ethnicity | Categorical | Hispanic, Asian (not Hispanic), Black (not Hispanic), Native Hawaiian/Pacific Islander/American Indian/Alaska Native (not Hispanic), White (not Hispanic), Multiple Race (not Hispanic). Omitted “White” for interpretability.
4 | Education Level | Categorical | Less than HS, HS diploma or GED, some post-secondary, postsecondary technical or vocational certificate, Associate’s degree, Bachelor’s Degree or higher. Omitted “Bachelor’s Degree or higher” for interpretability.
5 | Veteran Status | Binary | Flag for veteran
6 | Low-income Status | Binary | Flag for low-income at entry
7 | English as a Second Language | Binary | Flag for English as a Second Language at entry
8 | Single Parent | Binary | Flag for single parent at entry
9 | Criminal History | Categorical | Yes, no, or refused to answer. Omitted “no” for interpretability.
10 | Long-term unemployed | Binary | Flag for being unemployed for 27 or more consecutive weeks
11 | Public Assistance Status | Binary | Flag for receipt of TANF, SNAP, SSI, or other reported assistance
12 | State | Categorical | State that submitted the participant data. Omitted CA for interpretability.
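The “omitted for interpretability” coding in the table can be sketched as dummy coding with a dropped reference category. The example assumes pandas; the rows and column names are made up, and only the category labels follow the table.

```python
import pandas as pd

# Made-up example rows; category labels follow the feature table.
clients = pd.DataFrame({
    "education": [
        "Less than HS",
        "HS diploma or GED",
        "Bachelor's Degree or higher",
        "Less than HS",
    ],
    "veteran": [0, 1, 0, 0],
})

REFERENCE = "Bachelor's Degree or higher"  # the omitted (reference) category

# One binary column per education level, then drop the reference level so
# the remaining dummies are read as "vs Bachelor's Degree or higher".
dummies = pd.get_dummies(clients["education"], prefix="education", dtype=int)
dummies = dummies.drop(columns=f"education_{REFERENCE}")

features = pd.concat([clients[["veteran"]], dummies], axis=1)
print(features)
```

A client in the reference category has zeros in every remaining education column, which is how results such as “Less than HS (vs having a BA or higher)” get their comparison group.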

Workforce System Experience Features
# | Name | Coding | Description
13 | 2017 Exit Year | Binary | 2017 rather than 2016 Exit Year
14 | Service Duration | Continuous | Cumulative days of service
15 | Number of Spells of Service Receipt | Continuous | Count of the number of cycles of “entry” and “exit” into workforce system services

Recent Occupation Skills Features
# | Name | Coding | Description
16 | Skill Rating (for each of the 35 skills) for the client’s most recent occupation | Continuous | Rating between 0 - 5
