Abstract.
How can universities determine which adult continuing education programs students will enroll in to prepare themselves for meaningful jobs? For that matter, how should students determine which types of adult continuing education will help them achieve their career objectives? Finally, how can employers determine which types of adult continuing education programs produce candidates who have the capability to add value to the hiring organization?
The purpose of this paper is to answer these questions through analysis of Occupational Employment Statistics (OES) data provided by the U.S. Department of Labor's Bureau of Labor Statistics (BLS). The data science team attempted to build a predictive model of the expected number of employees in 22 occupational categories at both the national and state levels, based on features present in the BLS data set. A statistically significant predictive model would help educational institutions, students, and employers decide how best to deploy their scarce resources. The team explored several supervised and unsupervised machine learning techniques, including regression analysis, decision trees (including random forest), and clustering, using the scikit-learn and pandas modules in Python to look for meaningful patterns in the data set. Although the team was unable to identify a meaningful predictive model during its exploration, it created visualizations in Tableau that provide useful insight and have marginal predictive value.
2. Outline
Executive Summary
Data Set Description
Data Wrangling
Computation and Analysis
Visualization Analytics
Labor Implications
Education Implications
Future Research
3. Executive Summary
● Our objective is to predict labor market behavior
so that institutions of higher learning can plan
further ahead what sorts of programs they will
offer in adult continuing education.
● Motivated by research from the Georgetown
University Center on Education & the Workforce
4. Data Set
● Bureau of Labor Statistics
o We pulled data from the Occupational Employment
Statistics (OES) program covering 2009 to 2014
o We ingested data at the national, state, and local
levels
o This data set includes employment numbers, mean
wages, jobs per thousand, and other descriptors of over
820 detailed occupations
5. Data Wrangling
● Pulling the data
● Coding for the removal of atypical characters
● Manual cleaning
● New code for the removal of new atypical characters
● Database construction (PostgreSQL)
● The normalization process
● Pulling and formatting data for analysis using SQL
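The character-removal steps listed above might look something like the following minimal sketch. The slides do not say which characters were removed, so the placeholder markers (`*`, `**`, `#`) and the helper name `clean_numeric` are assumptions made purely for illustration.

```python
import re

# Assumed placeholder markers standing in for missing or suppressed values.
# These are illustrative; the slides do not list the actual characters.
PLACEHOLDERS = {"", "*", "**", "#"}


def clean_numeric(cell):
    """Convert a raw text cell to a float, or None when no number is available."""
    text = cell.strip()
    if text in PLACEHOLDERS:
        return None
    # Drop thousands separators, currency signs, and other stray characters.
    stripped = re.sub(r"[^0-9.\-]", "", text)
    try:
        return float(stripped)
    except ValueError:
        # Nothing numeric survived the cleanup (for example, "n/a").
        return None


# Hypothetical raw cells, not values taken from the BLS files.
raw_row = ["1,234,560", "$45,210", "**", " 3.2 ", "n/a"]
cleaned = [clean_numeric(cell) for cell in raw_row]
print(cleaned)
```

Returning None rather than 0 keeps missing values distinguishable from real zeros when the rows are later loaded into the database.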
6. Computation & Analysis
● How are the variables related?
● Simple linear regressions
○ using pandas, numpy, scipy and pylab
○ x = Annual Mean Salary
○ y = Total Employment
● No meaningful results
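As a rough illustration of the simple regression described above, the sketch below fits a least-squares line by hand using only the standard library. The function name `simple_linear_regression` is hypothetical, and the input values are made up for demonstration; they are not figures from the OES data.

```python
def simple_linear_regression(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a one-variable fit, R-squared is the squared correlation.
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared


# Made-up illustrative values, NOT figures from the OES data.
annual_mean_salary = [30000.0, 45000.0, 60000.0, 75000.0, 90000.0]
total_employment = [120000.0, 80000.0, 150000.0, 60000.0, 110000.0]

slope, intercept, r_squared = simple_linear_regression(
    annual_mean_salary, total_employment
)
print(f"slope={slope:.4f} intercept={intercept:.2f} r_squared={r_squared:.4f}")
```

With the scipy stack named on the slide, `scipy.stats.linregress` reports the same slope and intercept along with a p-value and standard error. An R-squared close to zero, as in this toy data, means salary explains almost none of the variation in employment.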
7. Other linear regressions
y = Total Employment; x is:
                   Ann Mean Sal   Jobs/1000    Ann Mean Sal Delta   Ann Sal 10th %tile
Beta Coefficient   -0.048         -3053.75     -0.0013              -0.0029
Constant           0.0000014      114534.87    0.0000012            0.000002
R-squared          0.003          0.006        0.001                0.001
P-value            0.04           0.007        0.234                0.143
Slope              -0.47          -3053.75     -1.31                -2.9
Intercept          138194.22      114534.87    118382.99            118829.23
Standard Error     0.237          1134.65      1.10                 1.98
9. Labor Market Implications
• Imbalanced labor market recovery
• Labor market stratification is a key concern
• STEM occupations have highest wage growth potential
based on 5-year trend
10. Education Implications
• Role of education as a driver of social mobility or
reinforcer of status quo
• Program selection > institutional selection
• Value of adult continuing education to skilled end users
(students and employers) will continue to rise due to
increasing specialization and technical needs
11. Incorporating Feedback
● Incorporate BLS API to continuously feed
new data into pipeline
● Include course/program evaluation ratings
from students or schools
● Build a web application to combine employer
ratings or survey data with BLS data
12. Future Research
● Classify occupations as STEM or non-STEM and apply
K-means clustering for novel insights
● Combine data sets from different sources (e.g. National
Center for Education Statistics) with different features to
create a meaningful predictive model
● Build a recommendation engine to analyze risk/return
trade-offs for different programs and occupations
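The K-means idea mentioned above could be prototyped along the lines of this minimal sketch, written in plain Python so it runs without extra packages. The two features (a scaled mean salary and a scaled wage growth figure), the six data points, and the function name `k_means` are all assumptions for illustration; none of them come from the team's analysis.

```python
import math


def k_means(points, k, iterations=20):
    """Cluster points into k groups with the standard assign/update loop."""
    # Deterministic start: use the first k points as the initial centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Made-up (scaled salary, scaled wage growth) pairs for six occupations.
occupations = [
    (0.90, 0.80), (0.20, 0.10), (0.85, 0.90),
    (0.15, 0.20), (0.80, 0.85), (0.10, 0.15),
]
labels, centroids = k_means(occupations, k=2)
print(labels)
```

Comparing the resulting cluster labels with a STEM/non-STEM flag would show whether the two groupings line up. Since the team already used scikit-learn, `sklearn.cluster.KMeans` would be the more practical choice on real data.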
Editor's Notes
BRIAN
Michael
-Thought the data ingestion would be relatively straightforward...was sadly mistaken. http://www.bls.gov/oes/home.htm
Michael
Jenny
Explore what the distribution of our data is using descriptive statistics
Wanted to explore the data through regression analysis and didn’t find anything meaningful
Go through the ipython notebook demonstration to show the graph
Wanted to see what features were related, and how each feature would be correlated to total employment
Shift gears to exploring our data through visual analytics
Jenny
We ran several other linear regressions with y as Total Employment and x as Annual Mean Salary, Jobs Per Thousand, Change in Annual Mean Salary, and the Annual Salary 10th Percentile
Beta - for every unit increase in x, y will increase/decrease by that coefficient
Annual Mean Salary
When the annual mean salary is 0, total employment is equal to 0.0000014
For every unit increase in annual mean salary,
Jobs Per 1000 Delta -
When the jobs per 1000 is equal to 0, total employment equals 114,535.
For every unit increase in the jobs per 1000, total employment decreases by 3054 units
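To make the reading of the Jobs Per 1000 coefficients concrete, here is a small worked sketch that plugs values into the fitted line. The intercept and slope are the figures from the Jobs/1000 column of the regression table; the function name `predicted_total_employment` is hypothetical.

```python
# Values read from the Jobs/1000 column of the regression table.
INTERCEPT = 114534.87
SLOPE = -3053.75


def predicted_total_employment(jobs_per_thousand):
    """Evaluate the fitted line: y = intercept + slope * x."""
    return INTERCEPT + SLOPE * jobs_per_thousand


at_zero = predicted_total_employment(0.0)
at_one = predicted_total_employment(1.0)
# The difference between the two predictions is the slope itself.
print(round(at_zero), round(at_one), round(at_one - at_zero))
```

With an R-squared of 0.006 for this regression, these predictions explain very little of the variation in total employment, which matches the team's conclusion that the regressions were not meaningful.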
Tableau Chart #1
Total employment by state 2009-2014 - Green = growth, darker the green, the more significant the growth
What is the change in total employment?
Chart 2: Employment delta over time
2010 was a rough year; CA had the most significant decrease by count
Things get slightly better
Go through specific examples
Chart 3: Percent change in employment over time
Talk about changes by occupation
Chart 4: Stratification of employment in order by mean salary
Chart 5: Annual Salary Delta over time
Shows the percentage change in mean salary from the previous year
-Filter to the Computer industry
-Filter to DC, VA, MD
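The year-over-year figure behind Chart 5 can be computed as in the sketch below. The salary values and the function name `percent_change_by_year` are made up for illustration and are not taken from the BLS data or the Tableau workbook.

```python
def percent_change_by_year(salary_by_year):
    """Return the percentage change in salary from each previous year."""
    years = sorted(salary_by_year)
    changes = {}
    for prev_year, year in zip(years, years[1:]):
        previous = salary_by_year[prev_year]
        changes[year] = (salary_by_year[year] - previous) / previous * 100.0
    return changes


# Made-up mean salaries for one hypothetical occupation.
mean_salary = {2009: 50000.0, 2010: 51000.0, 2011: 50490.0}
changes = percent_change_by_year(mean_salary)
for year, change in changes.items():
    print(year, f"{change:+.1f}%")
```

The first year has no entry because there is no earlier year to compare against.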
Edwin
Helpful for people looking to make a career transition
We chose the right Certificate program
We will be employable
Transferable skills