Employee Service Time Forecasting by Machine Learning Methods
Introduction
Staff attrition has a profound effect on the operation and
performance of a company. Statistics show that many
companies now suffer heavy losses from sudden employee
attrition, which forces urgent policy adjustments and wastes
budget resources.
Modeling
Conclusion
Business Intelligence & Analytics
http://www.stevens.edu/howe/academics/graduate/business-intelligence-analytics
Di Zhu, Tingting Jiang Nov 23, 2015
Instructors: Germán G. Creamer, Theodoros Lappas
Data Description
Variable   Description
agelvl     Age
edlvl      Education Level
gender     Gender
patco      Occupation Category
stemocc    STEM Occupations
supervis   Supervisory Status
toa        Type of Appointment
worksch    Work Schedule
workstat   Work Status
los        Average Length of Service
Data Source:
• www.opm.gov/data/
• www.regulations.gov/#!home
Target Variable:
• Average Length of Service
Length of the Data:
• 36,166 examples
• 10 variables
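The poster does not say how the categorical variables were encoded or how the length-of-service target was turned into classes. The sketch below shows one common preparation, under the assumptions that each predictor is one-hot encoded and that `los` is split into two classes at a threshold. The records, category codes, and threshold are hypothetical and are not taken from the OPM data.

```python
# Hypothetical records using three of the variable names from Table 1.
# The category codes and los values are made up for illustration only.
records = [
    {"agelvl": "C", "edlvl": "13", "gender": "F", "los": 4.2},
    {"agelvl": "F", "edlvl": "17", "gender": "M", "los": 15.8},
    {"agelvl": "C", "edlvl": "17", "gender": "M", "los": 9.1},
]


def one_hot(rows, columns):
    """Give every (column, category) pair its own 0/1 feature."""
    vocab = sorted({(c, r[c]) for r in rows for c in columns})
    index = {pair: i for i, pair in enumerate(vocab)}
    encoded = []
    for r in rows:
        vec = [0] * len(vocab)
        for c in columns:
            vec[index[(c, r[c])]] = 1
        encoded.append(vec)
    return vocab, encoded


def binarize(values, threshold):
    """Assumed target definition: 1 if service is longer than the threshold."""
    return [1 if v > threshold else 0 for v in values]


features = ["agelvl", "edlvl", "gender"]
vocab, X = one_hot(records, features)

# Assumption: use the middle value of the sorted los column as the threshold.
threshold = sorted(r["los"] for r in records)[len(records) // 2]
y = binarize([r["los"] for r in records], threshold)

for row, label in zip(X, y):
    print(row, label)
```

Each encoded row has exactly one active feature per original variable, so classifiers that expect numeric input can consume the categorical columns directly.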
[Figure: three-layer modeling framework]
Layer 1: Model Learning (Raw Data, Data Processing, Cleaned Data, Logistic Regression, CART, AdaBoosting, Naïve Bayes, Evaluation, Optimal Algorithm)
Layer 2: Sentiment Analysis (Web Data Set, Data Grabbing, Sentiment Analysis, Text Mining Result)
Layer 3: Model in Use (New Data Item, Model, Prediction)
A. Model Learning & Evaluation
Table 1. Variable Descriptions
Table 2. Testing Results
Figure 1. ROC and Precision-Recall Curves
References
[1] G. Creamer and Y. Freund, Automated Trading with Boosting and Expert Weighting, Quantitative Finance, Vol. 10, No. 4, pp. 401–420, 2010.
[2] G. Creamer, Learning with decision trees: ADTrees, bagging, and random forests, Stevens Institute of Technology, Oct. 2015.
B. Sentiment Analysis
Survey question: Do you agree to extend OPT for F-1 students with STEM degrees?
• We downloaded and analyzed 3,977 comments on Nov 6, 2015, when there were over 4,000 comments in total.
• In this case, 78% of the net users support this policy.
AdaBoosting [1]:
• We selected the AdaBoosting algorithm because it yields the best result on the testing data set and reduces model bias efficiently [2].
• The accuracy rate is 76%.
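To make the boosting idea concrete, here is a minimal sketch of discrete AdaBoost over one-feature decision stumps, written in plain Python. It is not the authors' implementation: the poster does not state the software, base learner, or number of rounds used, so the function names, the 0/1 feature format, and the toy data are all assumptions for illustration.

```python
import math


def stump_predict(x, feature, polarity):
    """A stump votes +polarity when the 0/1 feature is set, else -polarity."""
    return polarity if x[feature] else -polarity


def train_adaboost(X, y, rounds=3):
    """Discrete AdaBoost. X: rows of 0/1 features, y: labels in {-1, +1}."""
    n, d = len(X), len(X[0])
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, feature, polarity)
    for _ in range(rounds):
        # Pick the stump with the lowest weighted error.
        best = None
        for feature in range(d):
            for polarity in (1, -1):
                err = sum(w for w, xi, yi in zip(weights, X, y)
                          if stump_predict(xi, feature, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, feature, polarity)
        err, feature, polarity = best
        if err >= 0.5:  # no stump is better than chance, stop early
            break
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, feature, polarity))
        # Increase the weight of misclassified rows, then renormalize.
        weights = [w * math.exp(-alpha * yi * stump_predict(xi, feature, polarity))
                   for w, xi, yi in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(x, f, p) for a, f, p in ensemble)
    return 1 if score >= 0 else -1


# Toy data (hypothetical): the label is the majority vote of three binary features.
# No single stump classifies it perfectly, but three boosted stumps do.
X = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [1 if sum(x) >= 2 else -1 for x in X]

model = train_adaboost(X, y, rounds=3)
train_accuracy = sum(predict(model, x) == t for x, t in zip(X, y)) / len(X)
print(len(model), train_accuracy)
```

On this toy problem each individual stump misclassifies a quarter of the rows, while the weighted vote of three stumps recovers the majority rule. That is the sense in which boosting reduces the bias of a weak base learner.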
To address this attrition problem, we
constructed a model to
predict staff service time
using both machine learning
and sentiment analysis.
We proposed a refined model to predict employee service
time by applying machine learning and sentiment analysis.
Future work:
• Involve more topics in order to build an instructive and
comprehensive HR decision support system.
• Consider optimization analysis to tune a specified set of
parameters for the maximum benefit to the company.
Algorithm            Accuracy  AUC    Confusion Matrix
Logistic Regression  0.728     0.808  3771  1414
                                      1522  4079
CART                 0.754     0.830  4286  1646
                                      1007  3847
AdaBoosting          0.755     0.844  4195  1545
                                      1098  3948
Naïve Bayes          0.698     0.768  4285  2245
                                      1008  3248
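The accuracy column can be cross-checked against the confusion matrices, since accuracy is the sum of the diagonal divided by the total count. The short sketch below recomputes it from the numbers in Table 2. The dictionary layout and function name are illustrative choices. The poster does not label the rows and columns of the matrices, so only accuracy, which does not depend on that orientation, is derived here.

```python
# Confusion matrices copied from Table 2 (row/column labels are not given
# on the poster, so no precision or recall is derived here).
confusion = {
    "Logistic Regression": [[3771, 1414], [1522, 4079]],
    "CART": [[4286, 1646], [1007, 3847]],
    "AdaBoosting": [[4195, 1545], [1098, 3948]],
    "Naive Bayes": [[4285, 2245], [1008, 3248]],
}


def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = matrix[0][0] + matrix[1][1]
    total = sum(sum(row) for row in matrix)
    return correct / total


for name, matrix in confusion.items():
    total = sum(sum(row) for row in matrix)
    print(f"{name}: accuracy={accuracy(matrix):.3f} on {total} test examples")
```

All four matrices sum to the same number of test examples, and the recomputed accuracies agree with the reported values to three decimals.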
[Pie chart: Net Users' Opinion Survey. Agree 78%, Disagree 22%]
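The poster does not describe how each comment was labeled as agreeing or disagreeing. As a rough illustration of how an agree/disagree share like the one above can be tallied, the sketch below uses a tiny keyword lexicon. The word lists, the scoring rule, and the sample comments are all hypothetical and are not drawn from the regulations.gov data or from the authors' text-mining pipeline.

```python
import re

# Hypothetical keyword lexicons; a real analysis would use a far richer model.
SUPPORT_WORDS = {"support", "agree", "favor", "benefit"}
OPPOSE_WORDS = {"oppose", "disagree", "against", "unfair"}


def classify(comment):
    """Label a comment by counting supportive versus opposing keywords."""
    words = re.findall(r"[a-z']+", comment.lower())
    score = sum(w in SUPPORT_WORDS for w in words) - sum(w in OPPOSE_WORDS for w in words)
    if score > 0:
        return "agree"
    if score < 0:
        return "disagree"
    return "neutral"


def agree_share(comments):
    """Fraction of non-neutral comments that are labeled 'agree'."""
    labels = [classify(c) for c in comments]
    decided = [label for label in labels if label != "neutral"]
    if not decided:
        return 0.0
    return sum(label == "agree" for label in decided) / len(decided)


# Invented sample comments, for illustration only.
comments = [
    "I strongly support the extension.",
    "I agree, STEM graduates benefit the economy.",
    "I am against this rule.",
    "I support this proposal.",
]

print(f"agree share: {agree_share(comments):.0%}")
```

Keyword counting of this kind misreads negation and sarcasm, so it should be taken only as a picture of the tallying step, not of the classification quality behind the reported figures.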
Results
