Predicting Long-Term Unemployment for Workforce System Clients
Insights from Machine Learning
Jessica Smith Stockham
Objective
 Inform how the U.S. Department of Labor (DOL) workforce
system can prioritize limited follow-up resources to those
who are most likely to have the most trouble finding or
keeping a job
 Gain insights from machine learning on the characteristics
of workforce system clients that are most likely to be
unemployed 1 year after exiting workforce system
services
Method
[Diagram: 107 predictors -> tree algorithm -> predicted outcome (Unemployed or Employed)]
 Fit 2 different “tree-based” machine learning algorithms
 Decision Trees
 Random Forests
 Decision trees are relatively simple and quick but can
suffer from over-fitting (i.e., they predict your training
dataset too well but do not generalize to future data)
 Random Forests are computationally intensive and more
complicated, but generalize better to future data
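The trade-off described above can be illustrated with scikit-learn, although the slides do not say which library was used. This is a minimal sketch on synthetic data; the dataset, the noise level, and the 75/25 split are all assumptions chosen for illustration, not the author's setup:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary outcome; flip_y=0.2 reassigns 20% of labels at random
# (an assumed noise level, used only to make over-fitting visible).
X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),  # grown without limits
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = {
        "train": model.score(X_train, y_train),  # accuracy on data the model saw
        "test": model.score(X_test, y_test),     # accuracy on held-out data
    }
    print(f"{name}: train={scores[name]['train']:.2f} test={scores[name]['test']:.2f}")
```

The unrestricted tree fits its training split perfectly, so any drop in its test score is the over-fitting pattern the slide describes. Comparing the two test scores shows how the forest fares on the same held-out data.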
Data
 1. DOL Performance Records
 Administrative data with characteristics of individuals
served, workforce services provided, and employment outcomes
for 4 quarters after exit
 PY 2018 Q2 WIOA Performance Records Public Use Data File
 2. O*Net Skills Dataset
 Mapping of occupation codes to 35 skills (e.g., reading
comprehension, active listening)
 Every occupation is rated on each skill on a 0-5 scale
 E.g., Chief Executives have a score of 4.88 on Active
Listening and a score of 0 on Equipment Maintenance.
1 Active Learning
2 Active Listening
3 Complex Problem Solving
4 Coordination
5 Critical Thinking
6 Equipment Maintenance
7 Equipment Selection
8 Installation
9 Instructing
10 Judgment and Decision Making
11 Learning Strategies
12 Management of Financial Resources
13 Management of Material Resources
14 Management of Personnel Resources
15 Mathematics
16 Monitoring
17 Negotiation
18 Operation Monitoring
19 Operation and Control
20 Operations Analysis
21 Persuasion
22 Programming
23 Quality Control Analysis
24 Reading Comprehension
25 Repairing
26 Science
27 Service Orientation
28 Social Perceptiveness
29 Speaking
30 Systems Analysis
31 Systems Evaluation
32 Technology Design
33 Time Management
34 Troubleshooting
35 Writing
Data Scope and Limitations
 Data Scope
 ~1 million workforce system clients
 Client’s most recent spell of service at the workforce center
 Adults age 25 - 65 served by the workforce system from July 1, 2016
to December 31, 2018
 50 states (excludes territories)
 Employment outcomes available
 Limitation = Lost about half the raw data file when I merged in O*Net
skills ratings (due to missingness on the most recent occupation
variable)
 Results may not generalize to the broader workforce system population
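The record loss described above is what an inner join on occupation code produces when the occupation field is blank. This is a minimal sketch in plain Python, assuming a hypothetical field name (`occupation_code`) and toy records. Only the two Chief Executives ratings come from the slides; the occupation codes, the second occupation, and all client rows are made up for illustration:

```python
# Toy O*NET-style lookup: occupation code -> {skill: rating on a 0-5 scale}.
# The first entry uses the two Chief Executives ratings quoted on the slide;
# the second entry and its code are invented placeholders.
SKILLS = {
    "11-1011": {"Active Listening": 4.88, "Equipment Maintenance": 0.0},
    "99-9999": {"Active Listening": 3.00, "Equipment Maintenance": 2.50},
}

# Hypothetical client records.
clients = [
    {"id": 1, "occupation_code": "11-1011"},
    {"id": 2, "occupation_code": None},       # most recent occupation missing
    {"id": 3, "occupation_code": "99-9999"},
    {"id": 4, "occupation_code": ""},         # blank also counts as missing
]


def merge_skills(clients, skills):
    """Inner join: keep only clients whose occupation code has skill ratings."""
    merged = []
    for client in clients:
        ratings = skills.get(client["occupation_code"])
        if ratings is None:
            continue  # client is dropped because no ratings can be attached
        merged.append({**client, **ratings})
    return merged


merged = merge_skills(clients, SKILLS)
share_lost = 1 - len(merged) / len(clients)
print(f"kept {len(merged)} of {len(clients)} clients; lost {share_lost:.0%}")
```

Tabulating the dropped rows by other characteristics (for example state or education level) would be one way to start the missing-data diagnosis the author lists under future research.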
Participant Trends
 OUTCOME:
 33% of participants are
unemployed 1 yr later
 SELECT FEATURES
 53% have only a high
school diploma/GED or
less
 43% are White
 28% are Black
 Age range is diverse
[Chart: share of participants by age group. 25-30: 20%; 31-40: 27%; 41-50: 24%; 51-65: 29%]
Decision Tree: 15 Most Important Features
to predict unemployment
 Age 51-65 (vs Age 25-30)
 Education Level Less than HS (vs
having a BA or higher)
 Duration (in days) of workforce system
service receipt
 Living in LA, FL, MA, CT, NH, IN (vs
California)
 Veteran status
 Being long-term unemployed prior to
receiving workforce system services
 Being Black or providing “no response”
to the race/ethnicity data element (vs
being White)
 The following job skills: programming,
monitoring
Random Forest: 15 Most Important Features
to predict unemployment
 Age 51-65 (vs Age 25-30)
 Education Level Less than HS (vs
having a BA or higher)
 Duration (in days) of workforce system
service receipt
 Living in LA or FL (vs CA)
 Providing “no response” to the
race/ethnicity data element (vs being
White)
 The following job skills: programming,
monitoring, reading comprehension,
math, writing, operation monitoring
 Being long-term unemployed prior to
receiving workforce system services
 Veteran status
Prediction Results
Model | Best Parameters | Accuracy on Validation Dataset | Accuracy on Test Dataset
Decision Tree | max # of leaf nodes = 100 | 67% | 67%
Random Forest | max depth of tree = 10; max features = 14; N estimators = 100 | 67% | 67%
Accuracy = share of correctly classified cases.
Takeaways
 Decision Tree and Random Forest models have the same
level of accuracy on the validation and test data.
 However, they differ somewhat in the most important
predictive features
 67% accuracy is not much better than a coin flip
 Future research: Improve predictive power
 Modify features (add interaction and higher-order terms)
 Restrict the data scope to a more homogeneous subset of the
workforce system clients, such as low-income adults age 30-40
in California.
 Diagnose who is missing data on O*Net skill ratings
 Try additional machine learning algorithms
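One way to put the 67% figure in context is to compare it with a majority-class baseline rather than a coin flip. This is a small sketch, assuming the 33% unemployment share from the Participant Trends slide also holds in the evaluation data (the slides do not state this); the function name is hypothetical:

```python
def majority_baseline_accuracy(positive_share):
    """Accuracy of always predicting the more common class."""
    return max(positive_share, 1 - positive_share)


unemployed_share = 0.33   # from the Participant Trends slide
model_accuracy = 0.67     # accuracy reported for both models

baseline = majority_baseline_accuracy(unemployed_share)
lift = model_accuracy - baseline
print(f"baseline = {baseline:.2f}, model = {model_accuracy:.2f}, lift = {lift:+.2f}")
```

Under that assumption, the two models add little over always predicting "employed", which is consistent with the slide's own caution about the 67% figure.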
Project website
 https://jhsmith22.github.io/workforce_ml/
Extra: Fitting a Machine Learning Model
 1. Engineer the features: clean data and recode values as needed.
Convert categorical features into binary dummies.
 2. Split data into “training” and “test” datasets. Hold the “test”
data in reserve until Step #5.
 3. Try out a range of model parameters on the training dataset,
leveraging 5-fold cross-validation to create a more robust fit.
 4. Pick the best model parameters based on prediction accuracy
(How well does the model trained on the “training dataset” predict
the outcomes in the “validation” dataset?)
 5. Assess how well my model generalizes to unseen data. Evaluate
how accurately the model predicts the outcomes in the “test”
dataset.
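Steps 2 through 5 above map naturally onto scikit-learn, although the slides do not say which library was used. This is a minimal sketch on synthetic data: the dataset, the candidate grids, and the 80/20 split are assumptions, and only the parameter values 100, 10, and 14 are taken from the best parameters reported in the results:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the engineered feature matrix (step 1 is assumed done).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=6,
                           weights=[0.67, 0.33], random_state=0)

# Step 2: hold out a test set that is not touched until step 5.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Step 3: candidate parameter grids (illustrative; each includes a value
# reported as best on the results slide).
searches = {
    "decision_tree": GridSearchCV(
        DecisionTreeClassifier(random_state=0),
        {"max_leaf_nodes": [10, 100]},
        cv=5, scoring="accuracy"),
    "random_forest": GridSearchCV(
        RandomForestClassifier(n_estimators=100, random_state=0),
        {"max_depth": [5, 10], "max_features": [4, 14]},
        cv=5, scoring="accuracy"),
}

results = {}
for name, search in searches.items():
    search.fit(X_train, y_train)  # steps 3-4: 5-fold CV selects the best parameters
    results[name] = {
        "best_params": search.best_params_,
        "validation_accuracy": search.best_score_,      # mean accuracy over the 5 folds
        "test_accuracy": search.score(X_test, y_test),  # step 5: refit model on test data
    }

for name, r in results.items():
    print(name, r["best_params"],
          f"val={r['validation_accuracy']:.2f}", f"test={r['test_accuracy']:.2f}")
```

With 5-fold cross-validation, the "validation" accuracy is the average score across the five held-out folds of the training data, so the test set stays unseen until the final evaluation.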
Demographic & Socioeconomic Features
# | Predictor | Coding | Description
1 | Age | Continuous | Age in years at program entry
2 | Sex | Categorical | Male, Female, neither. Omitted “Male” for interpretability.
3 | Race/Ethnicity | Categorical | Hispanic, Asian (not Hispanic), Black (not Hispanic), Native Hawaiian/Pacific Islander/American Indian/Alaska Native (not Hispanic), White (not Hispanic), Multiple Race (not Hispanic). Omitted “White” for interpretability.
4 | Education Level | Categorical | Less than HS, HS diploma or GED, some post-secondary, postsecondary technical or vocational certificate, Associate’s degree, Bachelor’s Degree or higher. Omitted “Bachelor’s Degree or higher” for interpretability.
5 | Veteran Status | Binary | Flag for veteran
6 | Low-income Status | Binary | Flag for low-income at entry
7 | English as a Second Language | Binary | Flag for English as a Second Language at entry
8 | Single Parent | Binary | Flag for single parent at entry
9 | Criminal History | Categorical | Yes, no, or refused to answer. Omitted “no” for interpretability.
10 | Long-term unemployed | Binary | Flag for being unemployed for 27 or more consecutive weeks
11 | Public Assistance Status | Binary | Flag for receipt of TANF, SNAP, SSI, or other reported assistance
12 | State | Categorical | State that submitted the participant data. Omitted CA for interpretability.
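The "Omitted ... for interpretability" notes in this table describe reference-category dummy coding: each category except one baseline gets its own 0/1 column, so every effect is read relative to the omitted group. This is a minimal sketch in plain Python, where the function name, the column naming scheme, and the sample values are hypothetical:

```python
# Education categories as listed in the feature table.
EDUCATION_LEVELS = [
    "Less than HS",
    "HS diploma or GED",
    "Some post-secondary",
    "Postsecondary technical or vocational certificate",
    "Associate's degree",
    "Bachelor's Degree or higher",
]


def dummy_code(value, categories, reference, prefix):
    """Return 0/1 indicators for every category except the reference one."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return {
        f"{prefix}={category}": int(value == category)
        for category in categories
        if category != reference
    }


row = dummy_code("HS diploma or GED", EDUCATION_LEVELS,
                 reference="Bachelor's Degree or higher", prefix="education")
print(row)

# A client in the reference category is represented by all zeros.
baseline_row = dummy_code("Bachelor's Degree or higher", EDUCATION_LEVELS,
                          reference="Bachelor's Degree or higher", prefix="education")
print(sum(baseline_row.values()))
```

Six education levels therefore become five binary predictors, which is how a handful of categorical variables expands into a much larger predictor count.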
Workforce System Experience Features
# | Name | Coding | Description
13 | 2017 Exit Year | Binary | 2017 rather than 2016 Exit Year
14 | Service Duration | Continuous | Cumulative days of service
15 | Number of Spells of Service Receipt | Continuous | Count of the number of cycles of “entry” and “exit” into workforce system services
Recent Occupation Skills Features
# | Name | Coding | Description
16 | Skill Rating (for each of the 35 skills) for the client’s most recent occupation | Continuous | Rating between 0 - 5
