PREDICTING EMPLOYEE ATTRITION
1.1 OBJECTIVE AND SCOPE OF THE STUDY
 The objective of this project is to predict, for each employee,
the likelihood of attrition, i.e. to find out who is more likely to
leave the organization.
 It will help organizations find ways to prevent attrition or
plan the hiring of new candidates in advance.
 Attrition proves to be a costly and time-consuming problem
for the organization, and it also leads to loss of productivity.
 The scope of the project extends to companies in all
industries.
1.2 ANALYTICS APPROACH
 Check for missing values in the data and, if there are any,
process the data accordingly.
 Understand how the features are related to our target
variable, attrition
 Convert the target variable into numeric form
 Apply feature selection and feature engineering to make the
data model-ready
 Apply various algorithms to check which one is the most
suitable
 Draw out recommendations based on our analysis.
1.3 DATA SOURCES
 For this project, an HR dataset named ‘IBM HR Analytics
Employee Attrition & Performance’ has been picked, which
is available on the IBM website.
 The data contains records of 1,470 employees.
 It has information about each employee’s current employment
status, the total number of companies worked for in the past,
the total number of years at the current company and in the
current role, education level, distance from home, monthly
income, etc.
1.4 TOOLS AND TECHNIQUES
 We have selected Python as our analytics tool.
 Python offers many packages for data analysis, such as Pandas,
NumPy, Matplotlib, Seaborn, etc.
 Algorithms such as Logistic Regression, Random Forest,
Support Vector Machine and XGBoost have been used for
prediction.
2.1 IMPORTING LIBRARY AND DATA EXTRACTION
 Importing Packages
 Data Extraction
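The two steps above can be sketched as follows. This is a minimal illustration, not the original code from the slides: the small inline CSV is a hypothetical stand-in for the dataset file, whose name and path are not given here.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for the real dataset file.
sample_csv = io.StringIO(
    "Age,Attrition,Department,MonthlyIncome\n"
    "41,Yes,Sales,5993\n"
    "49,No,Research & Development,5130\n"
    "37,Yes,Research & Development,2090\n"
)

# With the real data, pd.read_csv would be given the path of the CSV file.
attrition_df = pd.read_csv(sample_csv)
print(attrition_df)
```

The DataFrame name `attrition_df` matches the one used later when dropping variables.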
2.2 EXPLORATORY DATA ANALYSIS
 Exploratory data analysis refers to the process of performing initial
investigations on the data to discover patterns, spot inconsistencies,
test hypotheses and check assumptions with the help of graphical
representations
 Displaying First 5 Rows
 Displaying rows and columns
 Identifying Missing Values
 Count of “Yes” and “No” values of Attrition
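The four checks listed above map onto standard pandas calls. The sketch below uses a small hypothetical DataFrame (the real dataset has 1,470 rows), so the printed values are illustrative only.

```python
import pandas as pd

# Hypothetical toy data; column names follow the dataset's naming style.
attrition_df = pd.DataFrame({
    "Age": [41, 49, 37, 33, 27, 32],
    "Attrition": ["Yes", "No", "Yes", "No", "No", "No"],
    "DistanceFromHome": [1, 8, 2, 3, 2, 2],
})

print(attrition_df.head())                         # first 5 rows
print(attrition_df.shape)                          # (rows, columns)
missing = attrition_df.isnull().sum()              # missing values per column
counts = attrition_df["Attrition"].value_counts()  # count of "Yes" and "No"
print(missing)
print(counts)
```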
2.3 VISUALIZATION (EDA)
 Attrition V/s “Age”
 Attrition V/s “Distance from Home”
 Attrition V/s “Job Satisfaction”
 Attrition V/s “Performance Rating”
 Attrition V/s “Training Times Last Year”
 Attrition V/s “Work Life Balance”
 Attrition V/s “Years At Company”
 Attrition V/s “Years in Current Role”
 Attrition V/s “Years Since Last Promotion”
 Attrition V/s Categorical Variables
Attrition V/s “Gender, Marital status and Overtime”
Attrition V/s “Department, Job Role, and Business Travel”
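Behind each "Attrition V/s" chart is a simple cross-tabulation of a feature against the target. A minimal sketch with hypothetical data, showing the attrition rate per overtime group (the numbers are made up for illustration):

```python
import pandas as pd

# Hypothetical toy data for one categorical feature and the target.
df = pd.DataFrame({
    "OverTime": ["Yes", "Yes", "No", "No", "No", "Yes", "No", "No"],
    "Attrition": ["Yes", "No", "No", "No", "Yes", "Yes", "No", "No"],
})

# Share of "Yes" / "No" attrition within each OverTime group (rows sum to 1).
rates = pd.crosstab(df["OverTime"], df["Attrition"], normalize="index")
print(rates)
```

Calling `rates.plot(kind="bar")` on such a table (with Matplotlib installed) gives a grouped bar chart of the same information.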
Data Pre-Processing-
Steps Involved –
 Taking care of missing data and dropping non-relevant
features
 Feature extraction
 Converting categorical features into numeric form
Binarization of the converted categorical features
 Feature scaling
 Understanding correlation of features with each other
 Splitting data into training and test data sets
 Data pre-processing refers to a data mining technique that transforms
raw data into an understandable format
 It is useful in making the data ready for analysis
3.1 FEATURE SELECTION
 Feature selection is the process of selecting those features that
contribute most to the prediction variable or output.
Benefits of feature selection:
 Improves model performance
 Improves accuracy
 Provides a better understanding of the data
Dropping non-relevant variables
# Drop all constant (fixed) and non-relevant variables
attrition_df.drop(['DailyRate', 'EmployeeCount', 'EmployeeNumber', 'HourlyRate',
                   'MonthlyRate', 'Over18', 'PerformanceRating', 'StandardHours',
                   'StockOptionLevel', 'TrainingTimesLastYear'],
                  axis=1, inplace=True)
Check number of rows and columns
Features Extraction
3.2 FEATURE ENGINEERING
Label Encoding
 Label encoding refers to converting categorical variables into numeric
form so that they become machine-readable.
 It is an important pre-processing step for structured datasets in supervised
learning.
 Fit and transform the required columns of the data, and then replace the
existing text data with the new encoded data.
Convert categorical variables into numeric variables
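A minimal sketch of this step with scikit-learn's `LabelEncoder`, applied to two hypothetical text columns. `LabelEncoder` assigns integers in sorted order of the labels, so "No" becomes 0 and "Yes" becomes 1.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data with two categorical (text) columns.
df = pd.DataFrame({
    "Attrition": ["Yes", "No", "No", "Yes"],
    "Gender": ["Male", "Female", "Female", "Male"],
})

# Fit and transform each column, replacing the text with the encoded values.
for col in ["Attrition", "Gender"]:
    df[col] = LabelEncoder().fit_transform(df[col])

print(df)
```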
 One Hot Encoder
 It is used to perform “binarization” of the categorical features and
include the resulting columns as features to train the model.
 It takes a column of categorical data that has been label
encoded and splits it into multiple columns, one per category.
 The numbers are replaced by 1s and 0s: each row has a 1 in the
column of its category and 0s in the others.
Applying “One Hot Encoder” on Label Encoded features
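A minimal sketch of one-hot encoding a label-encoded column with scikit-learn's `OneHotEncoder`. The column and its three categories are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical label-encoded column with three categories (0, 1, 2).
department = np.array([[2], [1], [1], [0]])

encoder = OneHotEncoder()
# The encoder returns a sparse matrix by default; convert it for display.
one_hot = encoder.fit_transform(department).toarray()
print(one_hot)
```

Each row now contains a single 1 in the column that corresponds to its original category.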
Feature Scaling
 Feature scaling is a method used to standardize the range of
independent variables or features of the data
 It is also known as data normalization
 It is used to bring the features onto a comparable scale, e.g. centred
around zero with variances in the same range
 The two most popular methods of feature scaling are standardization
(zero mean, unit variance) and normalization (rescaling to a fixed
range such as 0 to 1)
Scaling the features
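The two scaling methods can be compared side by side. This sketch uses a hypothetical two-feature array; which scaler was used in the original project is not stated, so both are shown.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical features on very different scales (e.g. age and income).
X = np.array([[25.0, 2000.0], [35.0, 5000.0], [45.0, 8000.0]])

standardized = StandardScaler().fit_transform(X)  # zero mean, unit variance
normalized = MinMaxScaler().fit_transform(X)      # rescaled to [0, 1]

print(standardized)
print(normalized)
```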
Correlation Matrix
• Correlation is a statistical measure of how one variable
moves/changes in relation to another variable.
• It is a bivariate analysis measure that describes the association
between pairs of variables.
Usefulness of Correlation matrix –
 If two variables are closely correlated, then we can predict one
variable from the other.
 Correlation plays a vital role in locating the important variables
on which other variables depend.
 It is used as the foundation for various modeling techniques.
 Proper correlation analysis leads to better understanding of data.
Plotting correlation matrix
Correlation matrix Plot
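A correlation matrix can be computed directly from a numeric DataFrame. The sketch below uses hypothetical, already-encoded data; the values are illustrative and are not results from the project.

```python
import pandas as pd

# Hypothetical encoded data: 1 = Yes, 0 = No for the binary columns.
df = pd.DataFrame({
    "Attrition": [1, 0, 0, 1, 0, 1],
    "OverTime": [1, 0, 0, 1, 1, 1],
    "YearsAtCompany": [1, 10, 8, 2, 6, 3],
})

corr = df.corr()  # pairwise Pearson correlation coefficients
print(corr.round(2))
```

With Seaborn installed, `sns.heatmap(corr, annot=True)` is a common way to turn such a matrix into a plot.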
Splitting data into train and test
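A minimal sketch of the split using scikit-learn. The 80/20 ratio, the stratification and the random seed are assumptions for illustration; the split actually used in the project is not stated.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 20 samples, 2 features, imbalanced target (15 "No", 5 "Yes").
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 15 + [1] * 5)

# stratify=y keeps the Yes/No proportion the same in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)
```

Stratifying is worth considering for attrition data, where the "Yes" class is usually the minority.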
 The process of modeling means training a machine learning
algorithm to predict the labels from the features, tuning it for
the business need, and validating it on holdout data.
 Models used for employee attrition:
 Logistic Regression
 Random Forest
 Support vector machine
 XGBoost
Model building -
4.1 LOGISTIC REGRESSION
 Logistic Regression is one of the most basic and widely used
machine learning algorithms for solving a classification problem.
 It is a method used to predict a categorical dependent variable (Y)
from one or more independent variables (X).
 Linear Regression equation
Y = β0 + β1X + ε
 Y stands for the dependent variable that needs to be predicted.
 β0 is the Y-intercept, i.e. the point where the line crosses
the y-axis.
 β1 is the slope of the line (the slope can be negative or positive
depending on the relationship between the dependent variable and
the independent variable).
 X represents the independent variable that is used to predict
the dependent value.
 ε denotes the error term.
 Sigmoid Function
p(x) = 1 / (1 + e^-(β0 + β1x))
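To make the role of the sigmoid concrete: logistic regression passes the linear combination β0 + β1x through the sigmoid so that the output is a probability between 0 and 1. A small sketch, with hypothetical coefficient values chosen only for illustration:

```python
import math


def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(x, beta0, beta1):
    """Probability of the positive class for a single feature value x."""
    return sigmoid(beta0 + beta1 * x)


# Hypothetical coefficients: beta0 = -1.0, beta1 = 0.5
print(predict_probability(2.0, -1.0, 0.5))  # linear part is 0, so p = 0.5
print(predict_probability(8.0, -1.0, 0.5))  # linear part is 3, so p is close to 1
```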
 Building Logistic Regression Model
 Testing the Model
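Building and testing the model can be sketched with scikit-learn as below. Synthetic data from `make_classification` stands in for the prepared attrition features, and the settings (`max_iter`, split ratio, seeds) are assumptions, so the resulting accuracy is not comparable to the project's figures.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the pre-processed attrition data.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Build (fit) the model on the training data.
log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train, y_train)

# Test the model on the held-out data.
y_pred = log_reg.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"accuracy: {accuracy:.2f}")
```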
 Confusion Matrix
 A confusion matrix is one of the most commonly used tools for
evaluating classification models.
 It avoids "confusion" by tabulating the counts of actual versus
predicted classes.
Standard table of confusion matrix -
                Predicted 0            Predicted 1
Actual 0        True Negative (TN)     False Positive (FP)
Actual 1        False Negative (FN)    True Positive (TP)
In the table, positive class = 1 and negative class = 0.
 Creating confusion matrix
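A minimal sketch using scikit-learn's `confusion_matrix` on hypothetical labels. For binary labels 0 and 1 the matrix is laid out as [[TN, FP], [FN, TP]], so `ravel()` yields the four counts in that order.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical actual and predicted labels (1 = attrition, 0 = no attrition).
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(tn, fp, fn, tp, accuracy)
```

Here two leavers are caught (TP), one is missed (FN), and one stayer is wrongly flagged (FP).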
 AUC score
 Receiver Operating Characteristic (ROC)
 The ROC curve plots the true positive rate against the false
positive rate as the classification threshold is varied.
 The model's performance is summarized using the Area Under the
Curve (AUC).
 The AUC, also referred to as the index of accuracy (A) or
concordance index, summarizes the ROC curve in a single number.
The higher the area, the better the model.
 Plotting ROC curve
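The points of the ROC curve and the AUC can be computed with scikit-learn. The labels and predicted scores below are a small hypothetical example, not project data.

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical true labels and predicted probabilities of the positive class.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# One (fpr, tpr) point per threshold; these are the coordinates of the curve.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)
print(fpr, tpr, auc)
```

Plotting `fpr` against `tpr` (for example with Matplotlib's `plt.plot(fpr, tpr)`) draws the ROC curve itself.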
 ROC Curve For Logistic Regression
Using the Logistic Regression algorithm, we obtained an accuracy score
of 79% and a roc_auc score of 0.77.
4.2 RANDOM FOREST
• Random Forest is a supervised learning algorithm.
• It builds a forest of classification trees, each trained on a
bootstrap sample of the data (bagging), and aggregates their predictions.
• In Random Forest, only a random subset of the features is
considered by the algorithm when splitting a node.
 Building Random Forest Model
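A minimal sketch with scikit-learn's `RandomForestClassifier`. The data are synthetic and the hyperparameters (100 trees, square-root feature subsets) are assumptions, since the original settings are not given.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the pre-processed attrition data.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Each tree sees a bootstrap sample; each split considers sqrt(n_features) features.
forest = RandomForestClassifier(
    n_estimators=100, max_features="sqrt", random_state=0
)
forest.fit(X_train, y_train)
accuracy = forest.score(X_test, y_test)
print(f"accuracy: {accuracy:.2f}")
```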
 Testing the Model
 Confusion Matrix
 AUC score
 Plotting ROC curve
Using the Random Forest algorithm, we obtained an accuracy score of 79%
and a roc_auc score of 0.76.
 ROC Curve For Random Forest
4.3 SUPPORT VECTOR MACHINE
 SVM is a supervised machine learning algorithm used for both
regression and classification problems.
 The objective is to find a hyperplane in an N-dimensional space (N being
the number of features) that separates the classes with the maximum margin.
 Hyperplanes
 Hyperplanes are decision boundaries
that help segregate the data points.
 The dimension of the hyperplane
depends upon the number of features.
 Support Vectors
 These are data points that are closest to the hyperplane and
influence the position and orientation of the hyperplane.
 Used to maximize the margin of the classifier.
 Considered as critical elements of a dataset
 Kernel Technique
 Used when non-linear decision boundaries are needed
 The kernel implicitly maps the data into a higher-dimensional space;
there the separating hyperplane is no longer a line but, for example, a plane
 Since we have a non-linear
classification problem, the kernel
used here is the Radial Basis
Function (rbf)
 Helps in segregating data that are
not linearly separable.
 Building SVM Model
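The effect of the RBF kernel is easiest to see on data that no straight line can separate. The sketch below uses scikit-learn's `make_circles` (two concentric rings) as hypothetical data and compares a linear kernel with `rbf`; it illustrates the kernel idea and is not the project's model.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: a classic example of non-linearly separable data.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

linear_acc = SVC(kernel="linear").fit(X_train, y_train).score(X_test, y_test)
rbf_acc = SVC(kernel="rbf").fit(X_train, y_train).score(X_test, y_test)
print(f"linear kernel: {linear_acc:.2f}, rbf kernel: {rbf_acc:.2f}")
```

On such data the linear kernel performs poorly, while the RBF kernel separates the rings almost perfectly.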
 Testing SVM Model
 Confusion Matrix
 AUC Score
 Plotting ROC Curve
Using the SVM algorithm, we obtained an accuracy score of 79% and a
roc_auc score of 0.77.
 ROC Curve For SVM
4.4 XGBOOST
 XGBoost is a decision-tree-based ensemble Machine Learning algorithm
that uses a gradient boosting framework.
 XGBoost belongs to a family of boosting algorithms that convert weak
learners into strong learners.
 It is a sequential process: trees are grown one after another, each using
the information from the previously grown trees, so that the errors of the
previous model are corrected by the next predictor.
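The sequential idea can be illustrated with a hand-rolled boosting loop. Note that this is plain gradient boosting for squared error built from scikit-learn decision stumps, not XGBoost itself (no regularization or second-order terms), and the data, learning rate and number of rounds are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical regression data: learn y = sin(x) from 200 points.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0])

learning_rate = 0.5
prediction = np.zeros_like(y)
errors = []
for _ in range(20):
    residual = y - prediction  # what the ensemble so far still gets wrong
    stump = DecisionTreeRegressor(max_depth=1).fit(X, residual)  # weak learner
    prediction += learning_rate * stump.predict(X)  # correct the previous errors
    errors.append(np.mean((y - prediction) ** 2))

print(f"MSE after 1 tree: {errors[0]:.3f}, after 20 trees: {errors[-1]:.3f}")
```

Each weak learner only has to model the remaining error, which is how many weak learners add up to a strong one.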
 Advantages of XGBoost -
 Regularization
 Parallel Processing
 High Flexibility
 Handling Missing Values
 Tree Pruning
 Built-in Cross-Validation
 Building XGBoost Model
 Testing the Model
 Confusion Matrix
 AUC Score
 Plotting ROC Curve
Using the XGBoost algorithm, we obtained an accuracy score of 82% and a
roc_auc score of 0.81.
 ROC Curve For XGBoost Model
4.5 COMPARISON OF MODELS
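Model                     Accuracy    roc_auc
Logistic Regression       79%         0.77
Random Forest             79%         0.76
Support Vector Machine    79%         0.77
XGBoost                   82%         0.81
 It can be observed from the table that XGBoost outperforms all other models.
 Hence, based on these results, we can conclude that XGBoost is the best
model to predict future employee attrition for this company.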
KEY FINDINGS
 The dataset does not feature any missing values or any redundant
features.
 The strongest positive correlations with the target variable are:
distance from home, job satisfaction, marital status, overtime and
business travel
 The strongest negative correlations with the target variable are:
performance rating and training times last year
RECOMMENDATIONS
 Transportation should be provided to employees living in the same
area, or else transportation allowance should be provided.
 Plan and allocate projects in such a way to avoid the use of
overtime.
 Employees who hit their two-year anniversary should be identified
as potentially having a higher risk of leaving.
 Gather information on industry benchmarks to determine if the
company is providing competitive wages.
THANK YOU

More Related Content

What's hot

Employee Attrition Analysis
Employee Attrition AnalysisEmployee Attrition Analysis
Employee Attrition Analysis
KrisGhimireMLSASCPCM
 
Exploratory data analysis in R - Data Science Club
Exploratory data analysis in R - Data Science ClubExploratory data analysis in R - Data Science Club
Exploratory data analysis in R - Data Science Club
Martin Bago
 
Project report on attrition analysis
Project report on attrition analysis Project report on attrition analysis
Project report on attrition analysis
mohanapriya301
 
HR Analytics: Using Machine Learning to Predict Employee Turnover - Matt Danc...
HR Analytics: Using Machine Learning to Predict Employee Turnover - Matt Danc...HR Analytics: Using Machine Learning to Predict Employee Turnover - Matt Danc...
HR Analytics: Using Machine Learning to Predict Employee Turnover - Matt Danc...
Sri Ambati
 
Employee Attrition
Employee AttritionEmployee Attrition
Employee Attrition
Vinay sattur
 
Predicting house price
Predicting house pricePredicting house price
Predicting house price
Divya Tiwari
 
Predictive Analytics - An Overview
Predictive Analytics - An OverviewPredictive Analytics - An Overview
Predictive Analytics - An Overview
MachinePulse
 
Data Management in R
Data Management in RData Management in R
Data Management in R
Sankhya_Analytics
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
Gopal Sakarkar
 
Exploratory data analysis data visualization
Exploratory data analysis data visualizationExploratory data analysis data visualization
Exploratory data analysis data visualization
Dr. Hamdan Al-Sabri
 
Analytics in hr
Analytics in hrAnalytics in hr
Analytics in hr
sonalimadhusmitajena1
 
Project Title : A study on Job Satisfaction of Employees at Bright Future Con...
Project Title : A study on Job Satisfaction of Employees at Bright Future Con...Project Title : A study on Job Satisfaction of Employees at Bright Future Con...
Project Title : A study on Job Satisfaction of Employees at Bright Future Con...
Rahul Chatterjee
 
Tableau Presentation
Tableau PresentationTableau Presentation
Tableau Presentation
Andrea Bissoli
 
Linear regression
Linear regressionLinear regression
Linear regression
MartinHogg9
 
Probability Theory for Data Scientists
Probability Theory for Data ScientistsProbability Theory for Data Scientists
Probability Theory for Data Scientists
Ferdin Joe John Joseph PhD
 
Project report on E recruitment
Project report on E recruitmentProject report on E recruitment
Project report on E recruitment
RajniKesharwani
 
HR Analytics, Done Right
HR Analytics, Done RightHR Analytics, Done Right
HR Analytics, Done Right
Trendwise Analytics
 
Descriptive Statistics and Data Visualization
Descriptive Statistics and Data VisualizationDescriptive Statistics and Data Visualization
Descriptive Statistics and Data Visualization
Douglas Joubert
 
Employee attrition
Employee attritionEmployee attrition
Employee attrition
Lineesh Kanaran
 
Hr analytics project
Hr analytics projectHr analytics project
Hr analytics project
Jatin Saini
 

What's hot (20)

Employee Attrition Analysis
Employee Attrition AnalysisEmployee Attrition Analysis
Employee Attrition Analysis
 
Exploratory data analysis in R - Data Science Club
Exploratory data analysis in R - Data Science ClubExploratory data analysis in R - Data Science Club
Exploratory data analysis in R - Data Science Club
 
Project report on attrition analysis
Project report on attrition analysis Project report on attrition analysis
Project report on attrition analysis
 
HR Analytics: Using Machine Learning to Predict Employee Turnover - Matt Danc...
HR Analytics: Using Machine Learning to Predict Employee Turnover - Matt Danc...HR Analytics: Using Machine Learning to Predict Employee Turnover - Matt Danc...
HR Analytics: Using Machine Learning to Predict Employee Turnover - Matt Danc...
 
Employee Attrition
Employee AttritionEmployee Attrition
Employee Attrition
 
Predicting house price
Predicting house pricePredicting house price
Predicting house price
 
Predictive Analytics - An Overview
Predictive Analytics - An OverviewPredictive Analytics - An Overview
Predictive Analytics - An Overview
 
Data Management in R
Data Management in RData Management in R
Data Management in R
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
 
Exploratory data analysis data visualization
Exploratory data analysis data visualizationExploratory data analysis data visualization
Exploratory data analysis data visualization
 
Analytics in hr
Analytics in hrAnalytics in hr
Analytics in hr
 
Project Title : A study on Job Satisfaction of Employees at Bright Future Con...
Project Title : A study on Job Satisfaction of Employees at Bright Future Con...Project Title : A study on Job Satisfaction of Employees at Bright Future Con...
Project Title : A study on Job Satisfaction of Employees at Bright Future Con...
 
Tableau Presentation
Tableau PresentationTableau Presentation
Tableau Presentation
 
Linear regression
Linear regressionLinear regression
Linear regression
 
Probability Theory for Data Scientists
Probability Theory for Data ScientistsProbability Theory for Data Scientists
Probability Theory for Data Scientists
 
Project report on E recruitment
Project report on E recruitmentProject report on E recruitment
Project report on E recruitment
 
HR Analytics, Done Right
HR Analytics, Done RightHR Analytics, Done Right
HR Analytics, Done Right
 
Descriptive Statistics and Data Visualization
Descriptive Statistics and Data VisualizationDescriptive Statistics and Data Visualization
Descriptive Statistics and Data Visualization
 
Employee attrition
Employee attritionEmployee attrition
Employee attrition
 
Hr analytics project
Hr analytics projectHr analytics project
Hr analytics project
 

Similar to Predicting Employee Attrition

Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Boston Institute of Analytics
 
PythonML.pptx
PythonML.pptxPythonML.pptx
PythonML.pptx
Hussain395748
 
Machine learning Mind Map
Machine learning Mind MapMachine learning Mind Map
Machine learning Mind Map
Ashish Patel
 
Machine Learning.pdf
Machine Learning.pdfMachine Learning.pdf
Machine Learning.pdf
BeyaNasr1
 
IRJET - Stock Market Prediction using Machine Learning Algorithm
IRJET - Stock Market Prediction using Machine Learning AlgorithmIRJET - Stock Market Prediction using Machine Learning Algorithm
IRJET - Stock Market Prediction using Machine Learning Algorithm
IRJET Journal
 
Stock Market Prediction Using ANN
Stock Market Prediction Using ANNStock Market Prediction Using ANN
Stock Market Prediction Using ANN
Krishna Mohan Mishra
 
AIRLINE FARE PRICE PREDICTION
AIRLINE FARE PRICE PREDICTIONAIRLINE FARE PRICE PREDICTION
AIRLINE FARE PRICE PREDICTION
IRJET Journal
 
PREDICTING BANKRUPTCY USING MACHINE LEARNING ALGORITHMS
PREDICTING BANKRUPTCY USING MACHINE LEARNING ALGORITHMSPREDICTING BANKRUPTCY USING MACHINE LEARNING ALGORITHMS
PREDICTING BANKRUPTCY USING MACHINE LEARNING ALGORITHMS
IJCI JOURNAL
 
And Then There Are Algorithms - Danilo Poccia - Codemotion Rome 2018
And Then There Are Algorithms - Danilo Poccia - Codemotion Rome 2018And Then There Are Algorithms - Danilo Poccia - Codemotion Rome 2018
And Then There Are Algorithms - Danilo Poccia - Codemotion Rome 2018
Codemotion
 
Performance Comparisons among Machine Learning Algorithms based on the Stock ...
Performance Comparisons among Machine Learning Algorithms based on the Stock ...Performance Comparisons among Machine Learning Algorithms based on the Stock ...
Performance Comparisons among Machine Learning Algorithms based on the Stock ...
IRJET Journal
 
Predict Backorder on a supply chain data for an Organization
Predict Backorder on a supply chain data for an OrganizationPredict Backorder on a supply chain data for an Organization
Predict Backorder on a supply chain data for an Organization
Piyush Srivastava
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-Learn
Benjamin Bengfort
 
DA ST-1 SET-B-Solution.pdf we also provide the many type of solution
DA ST-1 SET-B-Solution.pdf we also provide the many type of solutionDA ST-1 SET-B-Solution.pdf we also provide the many type of solution
DA ST-1 SET-B-Solution.pdf we also provide the many type of solution
gitikasingh2004
 
IRJET- Study and Evaluation of Classification Algorithms in Data Mining
IRJET- Study and Evaluation of Classification Algorithms in Data MiningIRJET- Study and Evaluation of Classification Algorithms in Data Mining
IRJET- Study and Evaluation of Classification Algorithms in Data Mining
IRJET Journal
 
Student Performance Predictor
Student Performance PredictorStudent Performance Predictor
Student Performance Predictor
IRJET Journal
 
Performance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsPerformance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning Algorithms
Dinusha Dilanka
 
A tour of the top 10 algorithms for machine learning newbies
A tour of the top 10 algorithms for machine learning newbiesA tour of the top 10 algorithms for machine learning newbies
A tour of the top 10 algorithms for machine learning newbies
Vimal Gupta
 
Big Data Analytics.pptx
Big Data Analytics.pptxBig Data Analytics.pptx
Big Data Analytics.pptx
Kaviya452563
 
Parameter Estimation User Guide
Parameter Estimation User GuideParameter Estimation User Guide
Parameter Estimation User Guide
Andy Salmon
 
Open06
Open06Open06
Open06
butest
 

Similar to Predicting Employee Attrition (20)

Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
PythonML.pptx
PythonML.pptxPythonML.pptx
PythonML.pptx
 
Machine learning Mind Map
Machine learning Mind MapMachine learning Mind Map
Machine learning Mind Map
 
Machine Learning.pdf
Machine Learning.pdfMachine Learning.pdf
Machine Learning.pdf
 
IRJET - Stock Market Prediction using Machine Learning Algorithm
IRJET - Stock Market Prediction using Machine Learning AlgorithmIRJET - Stock Market Prediction using Machine Learning Algorithm
IRJET - Stock Market Prediction using Machine Learning Algorithm
 
Stock Market Prediction Using ANN
Stock Market Prediction Using ANNStock Market Prediction Using ANN
Stock Market Prediction Using ANN
 
AIRLINE FARE PRICE PREDICTION
AIRLINE FARE PRICE PREDICTIONAIRLINE FARE PRICE PREDICTION
AIRLINE FARE PRICE PREDICTION
 
PREDICTING BANKRUPTCY USING MACHINE LEARNING ALGORITHMS
PREDICTING BANKRUPTCY USING MACHINE LEARNING ALGORITHMSPREDICTING BANKRUPTCY USING MACHINE LEARNING ALGORITHMS
PREDICTING BANKRUPTCY USING MACHINE LEARNING ALGORITHMS
 
And Then There Are Algorithms - Danilo Poccia - Codemotion Rome 2018
And Then There Are Algorithms - Danilo Poccia - Codemotion Rome 2018And Then There Are Algorithms - Danilo Poccia - Codemotion Rome 2018
And Then There Are Algorithms - Danilo Poccia - Codemotion Rome 2018
 
Performance Comparisons among Machine Learning Algorithms based on the Stock ...
Performance Comparisons among Machine Learning Algorithms based on the Stock ...Performance Comparisons among Machine Learning Algorithms based on the Stock ...
Performance Comparisons among Machine Learning Algorithms based on the Stock ...
 
Predict Backorder on a supply chain data for an Organization
Predict Backorder on a supply chain data for an OrganizationPredict Backorder on a supply chain data for an Organization
Predict Backorder on a supply chain data for an Organization
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-Learn
 
DA ST-1 SET-B-Solution.pdf we also provide the many type of solution
DA ST-1 SET-B-Solution.pdf we also provide the many type of solutionDA ST-1 SET-B-Solution.pdf we also provide the many type of solution
DA ST-1 SET-B-Solution.pdf we also provide the many type of solution
 
IRJET- Study and Evaluation of Classification Algorithms in Data Mining
IRJET- Study and Evaluation of Classification Algorithms in Data MiningIRJET- Study and Evaluation of Classification Algorithms in Data Mining
IRJET- Study and Evaluation of Classification Algorithms in Data Mining
 
Student Performance Predictor
Student Performance PredictorStudent Performance Predictor
Student Performance Predictor
 
Performance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsPerformance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning Algorithms
 
A tour of the top 10 algorithms for machine learning newbies
A tour of the top 10 algorithms for machine learning newbiesA tour of the top 10 algorithms for machine learning newbies
A tour of the top 10 algorithms for machine learning newbies
 
Big Data Analytics.pptx
Big Data Analytics.pptxBig Data Analytics.pptx
Big Data Analytics.pptx
 
Parameter Estimation User Guide
Parameter Estimation User GuideParameter Estimation User Guide
Parameter Estimation User Guide
 
Open06
Open06Open06
Open06
 

Recently uploaded

一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
eudsoh
 
Template xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptxTemplate xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptx
TeukuEriSyahputra
 
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
hqfek
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
Vietnam Cotton & Spinning Association
 
A gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented GenerationA gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented Generation
dataschool1
 
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
uevausa
 
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
asyed10
 
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
agdhot
 
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
eoxhsaa
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
ytypuem
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
 
社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .
NABLAS株式会社
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
ElizabethGarrettChri
 
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
lzdvtmy8
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
SaffaIbrahim1
 
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
osoyvvf
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
jitskeb
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
bmucuha
 

Recently uploaded (20)

一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
 
Template xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptxTemplate xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptx
 
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
 
A gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented GenerationA gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented Generation
 
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
 
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
 
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
 
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 
社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
 
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
 
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 

Predicting Employee Attrition

  • 2.
  • 3. 1.1 OBJECTIVE AND SCOPE OF THE STUDY  The objective of this project is to predict the attrition rate for each employee, to find out who’s more likely to leave the organization.  It will help organizations to find ways to prevent attrition or to plan in advance the hiring of new candidate.  Attrition proves to be a costly and time consuming problem for the organization and it also leads to loss of productivity.  The scope of the project extends to companies in all industries.
  • 4. 1.2 ANALYTICS APPROACH • Check for missing values in the data and, if any are found, process the data accordingly. • Understand how the features are related to our target variable, attrition. • Convert the target variable into numeric form. • Apply feature selection and feature engineering to make the data model-ready. • Apply various algorithms to check which one is the most suitable. • Draw out recommendations based on our analysis.
  • 5. 1.3 DATA SOURCES • For this project, an HR dataset named ‘IBM HR Analytics Employee Attrition & Performance’ has been picked, which is available on the IBM website. • The data contains records of 1,470 employees. • It has information about each employee’s current employment status, the total number of companies worked for in the past, the total number of years at the current company and in the current role, their education level, distance from home, monthly income, etc.
  • 6. 1.4 TOOLS AND TECHNIQUES • We have selected Python as our analytics tool. • Python offers many packages for data analysis, such as Pandas, NumPy, Matplotlib and Seaborn. • Algorithms such as Logistic Regression, Random Forest, Support Vector Machine and XGBoost have been used for prediction.
  • 8. 2.1 IMPORTING LIBRARY AND DATA EXTRACTION • Importing Libraries
  • 9. • Importing Packages • Data Extraction
  • 10. 2.2 EXPLORATORY DATA ANALYSIS • Refers to the process of performing initial investigations on the data so as to discover patterns, spot inconsistencies, test hypotheses and check assumptions with the help of graphical representations • Displaying First 5 Rows
  • 11. • Displaying rows and columns
  • 13. • Count of “Yes” and “No” values of Attrition
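The count of “Yes” and “No” values, together with the conversion of the target variable into numeric form listed in the analytics approach, can be sketched in plain Python. The sample values below are hypothetical and only stand in for the Attrition column of the real dataset; the 1/0 mapping is an assumption about how the target was coded.

```python
from collections import Counter

# Hypothetical sample of the Attrition column (the real dataset has 1,470 rows).
attrition = ["Yes", "No", "No", "Yes", "No", "No", "No"]

# Tally how often each label occurs.
counts = Counter(attrition)
share_yes = counts["Yes"] / len(attrition)

# Map the text labels to 1/0 so a classifier can use them as a target.
target = [1 if value == "Yes" else 0 for value in attrition]

print(counts, round(share_yes, 3), target)
```

A strongly unequal split between the two labels is worth noting early, because accuracy alone can then look good for a model that rarely predicts attrition.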
  • 14. 2.3 VISUALIZATION (EDA) • Attrition vs. “Age”
  • 15. • Attrition vs. “Distance from Home”
  • 16. • Attrition vs. “Job Satisfaction”
  • 17. • Attrition vs. “Performance Rating”
  • 18. • Attrition vs. “Training Times Last Year”
  • 19. • Attrition vs. “Work Life Balance”
  • 20. • Attrition vs. “Years At Company”
  • 21. • Attrition vs. “Years in Current Role”
  • 22. • Attrition vs. “Years Since Last Promotion”
  • 23. • Attrition vs. Categorical Variables
  • 24. Attrition vs. “Gender, Marital Status and Overtime”
  • 25. Attrition vs. “Department, Job Role, and Business Travel”
  • 27. Data Pre-Processing • Refers to a data mining technique that transforms raw data into an understandable format • Useful in making the data ready for analysis. Steps involved: • Taking care of missing data and dropping non-relevant features • Feature extraction • Converting categorical features into numeric form • Binarization of the converted categorical features • Feature scaling • Understanding the correlation of features with each other • Splitting data into training and test data sets
  • 28. 3.1 FEATURE SELECTION • The process of selecting those features that contribute most to the prediction variable or output. Benefits of feature selection: • Improves model performance • Improves accuracy • Provides a better understanding of the data
  • 29. Dropping non-relevant variables
    # Drop all fixed and non-relevant variables
    attrition_df.drop(['DailyRate', 'EmployeeCount', 'EmployeeNumber', 'HourlyRate', 'MonthlyRate', 'Over18', 'PerformanceRating', 'StandardHours', 'StockOptionLevel', 'TrainingTimesLastYear'], axis=1, inplace=True)
    • Check number of rows and columns
  • 31. Label Encoding • Label encoding refers to converting categorical variables into numeric form so that they are machine-readable. • It is an important pre-processing step for structured datasets in supervised learning. • Fit and transform the required columns of the data, and then replace the existing text data with the new encoded data.
  • 32. Convert categorical variables into numeric variables
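The deck's own encoding code is not part of this transcript, so the following is only a minimal plain-Python sketch of what label encoding does: each distinct category gets an integer code. The helper names `fit_label_encoder` and `transform` are hypothetical, and the sorted-order assignment is an assumption (it mirrors the behaviour of scikit-learn's `LabelEncoder`).

```python
def fit_label_encoder(values):
    """Assign an integer code to each distinct category, in sorted order."""
    return {category: code for code, category in enumerate(sorted(set(values)))}


def transform(values, mapping):
    """Replace every text value by its integer code."""
    return [mapping[value] for value in values]


# Illustrative values for a categorical column such as BusinessTravel.
business_travel = ["Travel_Rarely", "Travel_Frequently", "Non-Travel", "Travel_Rarely"]

mapping = fit_label_encoder(business_travel)
encoded = transform(business_travel, mapping)

print(mapping)
print(encoded)
```

The integer codes carry no real ordering, which is why a one-hot step usually follows for nominal features.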
  • 33. • One Hot Encoder • It is used to perform “binarization” of the categorical features so that they can be included as features to train the model. • It takes a column of categorical data that has been label encoded and splits it into multiple columns, one per category. • The numbers are replaced by 1s and 0s, depending on which column has what value.
  • 34. Applying “One Hot Encoder” on Label Encoded features
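As a rough illustration of the binarization step, the sketch below turns label-encoded integers into one column per category. The function name `one_hot` and the sample codes are hypothetical; the slides presumably relied on a library encoder, which is not shown in the transcript.

```python
def one_hot(codes, n_categories):
    """Turn each integer code into a row of 0s with a single 1."""
    rows = []
    for code in codes:
        row = [0] * n_categories
        row[code] = 1  # mark the column that matches this category
        rows.append(row)
    return rows


# Hypothetical label-encoded column with three categories (0, 1, 2).
encoded = [2, 1, 0, 2]
binary_columns = one_hot(encoded, n_categories=3)

for row in binary_columns:
    print(row)
```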
  • 35. Feature Scaling • Feature scaling is a method used to standardize the range of the independent variables or features of the data. • It is also known as data normalization. • It is used to scale the features to a range centred around zero so that their variances are in the same range. • The two most popular methods of feature scaling are standardization and normalization.
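The two methods named on this slide can be sketched with the standard library alone. The income figures are made up for illustration, and which of the two methods the deck actually applied is not stated in the transcript.

```python
from statistics import mean, pstdev


def standardize(values):
    """Standardization: subtract the mean and divide by the standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Normalization: rescale linearly so the values fall between 0 and 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical monthly incomes.
monthly_income = [2000, 4000, 6000, 8000]

z_scores = standardize(monthly_income)
scaled = min_max(monthly_income)

print(z_scores)
print(scaled)
```

Standardized values have mean 0 and standard deviation 1, while min-max values keep the original shape of the distribution inside a fixed range.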
  • 37. Correlation Matrix • Correlation is a statistical technique which determines how one variable moves/changes in relation to another variable. • It is a bivariate analysis measure which describes the association between different variables. Usefulness of the correlation matrix: • If two variables are closely correlated, then we can predict one variable from the other. • Correlation plays a vital role in locating the important variables on which other variables depend. • It is used as the foundation for various modeling techniques. • Proper correlation analysis leads to a better understanding of the data.
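Each cell of a correlation matrix is a pairwise coefficient. Assuming the Pearson coefficient was used (the transcript does not say), one cell can be computed as in the sketch below; the toy columns are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(var_x * var_y)


# Hypothetical columns: two tenure features and a 1/0 attrition target.
years_at_company = [1, 3, 5, 7, 9]
years_in_role = [1, 2, 4, 6, 8]
attrition = [1, 1, 0, 0, 0]

r_features = pearson(years_at_company, years_in_role)
r_target = pearson(years_at_company, attrition)

print(round(r_features, 3), round(r_target, 3))
```

A value close to +1 or -1 between two features suggests they carry overlapping information, while the sign of the coefficient against the target shows the direction of the relationship.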
  • 40. Splitting data into train and test
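A minimal sketch of a shuffled train/test split follows. The 80/20 ratio, the seed and the helper name `split_train_test` are assumptions, since the slide's code and parameters are not in the transcript.

```python
import random


def split_train_test(rows, labels, test_size=0.2, seed=42):
    """Shuffle the row indices once, then cut them into test and train parts."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(rows) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(data, idx):
        return [data[i] for i in idx]

    return (pick(rows, train_idx), pick(rows, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))


# Hypothetical data: ten rows with one feature each and a 1/0 label.
rows = [[i] for i in range(10)]
labels = [i % 2 for i in range(10)]

X_train, X_test, y_train, y_test = split_train_test(rows, labels)
print(len(X_train), len(X_test))
```

Keeping the test rows untouched until evaluation is what makes the reported accuracy and AUC figures meaningful as estimates for unseen employees.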
  • 42. Model building • The process of modeling means training a machine learning algorithm to predict the labels from the features, tuning it for the business need, and validating it on holdout data. • Models used for employee attrition: • Logistic Regression • Random Forest • Support Vector Machine • XGBoost
  • 43. 4.1 LOGISTIC REGRESSION • Logistic Regression is one of the most basic and widely used machine learning algorithms for solving classification problems. • It is used to predict a dependent variable (Y) from one or more independent variables (X) when the dependent variable is categorical.
  • 44. • Linear Regression equation • Y stands for the dependent variable that needs to be predicted. • β0 is the Y-intercept, the point where the line crosses the y-axis. • β1 is the slope of the line (it can be negative or positive, depending on the relationship between the dependent and the independent variable). • X represents the independent variable that is used to predict the dependent value. • ε denotes the error term.
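The equation itself did not survive in the transcript. Using the symbols defined on this slide, the linear regression equation is presumably the first line below; the second line is the standard logistic form, which passes the same linear expression through the sigmoid function so that the output is a probability between 0 and 1.

```latex
Y = \beta_0 + \beta_1 X + \epsilon

P(Y = 1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}
```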
  • 46. • Building Logistic Regression Model
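The model-building code is not included in the transcript. As a sketch of what fitting a logistic regression involves, the plain-Python version below uses batch gradient descent on the log-loss; the learning rate, epoch count, function names and toy data are all assumptions, and a library implementation would normally be used instead.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for row, label in zip(X, y):
            prediction = sigmoid(bias + sum(w * v for w, v in zip(weights, row)))
            error = prediction - label
            grad_b += error
            for j, v in enumerate(row):
                grad_w[j] += error * v
        bias -= lr * grad_b / len(X)
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
    return weights, bias


def predict_proba(X, weights, bias):
    return [sigmoid(bias + sum(w * v for w, v in zip(weights, row))) for row in X]


# Hypothetical standardized feature; 1 = left the company, 0 = stayed.
X = [[-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5]]
y = [0, 0, 0, 1, 1, 1]

weights, bias = fit_logistic(X, y)
probabilities = predict_proba(X, weights, bias)
predictions = [1 if p >= 0.5 else 0 for p in probabilities]
print(weights, bias, predictions)
```

The 0.5 cut-off used to turn probabilities into labels is itself a choice; lowering it flags more employees as at risk at the cost of more false alarms.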
  • 48. • Confusion Matrix • The confusion matrix is one of the most commonly used tools for evaluating classification models. • It avoids "confusion" by tabulating the actual values against the predicted values. • In the standard confusion matrix table, Positive class = 1 and Negative class = 0.
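The four cells of the standard table can be counted directly from actual and predicted labels. This is a minimal sketch with invented labels; the function name `confusion_counts` is hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives, with 1 as the positive class."""
    pairs = list(zip(actual, predicted))
    return {
        "TP": sum(1 for a, p in pairs if a == 1 and p == 1),
        "TN": sum(1 for a, p in pairs if a == 0 and p == 0),
        "FP": sum(1 for a, p in pairs if a == 0 and p == 1),
        "FN": sum(1 for a, p in pairs if a == 1 and p == 0),
    }


# Hypothetical labels: 1 = attrition, 0 = no attrition.
actual = [1, 0, 0, 1, 0, 1, 0, 0]
predicted = [1, 0, 1, 0, 0, 1, 0, 0]

cm = confusion_counts(actual, predicted)
accuracy = (cm["TP"] + cm["TN"]) / len(actual)
print(cm, accuracy)
```

For attrition, the FN cell (employees who left but were predicted to stay) is usually the costly one, so it deserves attention alongside overall accuracy.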
  • 49. • Creating confusion matrix • AUC score
  • 50. • Receiver Operating Characteristic (ROC) • The ROC curve plots the true positive rate of a classification model against its false positive rate as the decision threshold is varied. • The model's ability to separate the classes is measured by the Area Under the Curve (AUC). • The AUC, also referred to as the index of accuracy (A) or concordance index, summarises the ROC curve in a single number. The higher the area, the better the model.
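The concordance interpretation of the AUC can be made concrete: it is the share of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The sketch below computes it that way on invented scores; it is an illustration, not the deck's own code.

```python
def roc_auc(actual, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for a, s in zip(actual, scores) if a == 1]
    negatives = [s for a, s in zip(actual, scores) if a == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and predicted probabilities.
actual = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(actual, scores)
print(auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the roc_auc scores reported for each model.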
  • 52. • ROC Curve For Logistic Regression • Using the Logistic Regression algorithm, we got an accuracy score of 79% and a roc_auc score of 0.77.
  • 53. 4.2 RANDOM FOREST • Random Forest is a supervised learning algorithm. • It builds a forest of classification trees using the bagging technique, training each tree on a random sample of the data, and aggregates their predictions. • In Random Forest, only a random subset of the features is taken into consideration by the algorithm for splitting a node.
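The sources of randomness described on this slide, plus the final aggregation, can be sketched without growing any actual trees. All function names are hypothetical, and using the square root of the feature count as the subset size is an assumption (a common default for classification).

```python
import random


def bootstrap_sample(rows, rng):
    """Bagging step: draw len(rows) rows with replacement for one tree."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(n_features, rng):
    """Pick roughly sqrt(n_features) candidate features for one node split."""
    k = max(1, int(n_features ** 0.5))
    return sorted(rng.sample(range(n_features), k))


def majority_vote(tree_predictions):
    """Aggregate the class predicted by each tree into one answer."""
    return max(set(tree_predictions), key=tree_predictions.count)


rng = random.Random(0)
rows = list(range(10))  # stand-in for ten training rows

sample = bootstrap_sample(rows, rng)
candidates = random_feature_subset(9, rng)
prediction = majority_vote([1, 0, 1])

print(sample, candidates, prediction)
```

Because each tree sees a different sample and different candidate features, the trees make different mistakes, and the vote tends to be more stable than any single tree.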
  • 54. • Building Random Forest Model
  • 55. • Testing the Model • Confusion Matrix
  • 56. • AUC score • Plotting ROC curve
  • 57. • ROC Curve For Random Forest • Using the Random Forest algorithm, we got an accuracy score of 79% and a roc_auc score of 0.76.
  • 58. 4.3 SUPPORT VECTOR MACHINE • SVM is a supervised machine learning algorithm used for both regression and classification problems. • The objective is to find a hyperplane in an N-dimensional space, N being the number of features, that separates the data points of the classes. • Hyperplanes • Hyperplanes are decision boundaries that help segregate the data points. • The dimension of the hyperplane depends upon the number of features.
  • 59. • Support Vectors • These are the data points that are closest to the hyperplane and influence its position and orientation. • They are used to maximize the margin of the classifier. • They are considered the critical elements of a dataset.
  • 60. • Kernel Technique • Used when a non-linear decision boundary is needed • The kernel implicitly maps the data into a higher-dimensional space, where the classes can be separated by a plane even though no straight line separates them in the original space • Since we have a non-linear classification problem, the kernel used here is the Radial Basis Function (rbf) • Helps in segregating data that are not linearly separable.
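The Radial Basis Function kernel measures similarity between two points through their squared distance. The sketch below shows the formula only; the `gamma` value is an arbitrary assumption, since the transcript does not give the SVM's parameters.

```python
from math import exp


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel: exp(-gamma * squared Euclidean distance between x and y)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-gamma * squared_distance)


# Hypothetical two-feature points.
same = rbf_kernel([1.0, 2.0], [1.0, 2.0])
near = rbf_kernel([0.0, 0.0], [1.0, 0.0])
far = rbf_kernel([0.0, 0.0], [3.0, 4.0])

print(same, near, far)
```

Identical points score 1 and the value decays towards 0 with distance; a larger `gamma` makes the decay faster and the decision boundary more flexible.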
  • 62. • Testing SVM Model • Confusion Matrix
  • 63. • AUC Score • Plotting ROC Curve
  • 64. • ROC Curve For SVM • Using the SVM algorithm, we got an accuracy score of 79% and a roc_auc score of 0.77.
  • 65. 4.4 XG BOOST • XGBoost is a decision-tree-based ensemble machine learning algorithm that uses a gradient boosting framework. • XGBoost belongs to a family of boosting algorithms that convert weak learners into strong learners. • It is a sequential process: trees are grown one after the other, each using information from the previously grown trees, so that the errors of the previous model are corrected by the next predictor. • Advantages of XGBoost: • Regularization • Parallel Processing • High Flexibility • Handling Missing Values • Tree Pruning • Built-in Cross-Validation
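The sequential idea, in which each new tree corrects the errors left by the previous ones, can be shown with a deliberately simplified sketch: one-split trees (stumps) fitted to the residuals of a squared-error loss. This is not XGBoost itself, which adds regularization, pruning and other refinements; the function names, learning rate and number of rounds are assumptions.

```python
def fit_stump(x, residuals):
    """Find the single threshold on x that best fits the residuals."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for v, r in zip(x, residuals) if v <= threshold]
        right = [r for v, r in zip(x, residuals) if v > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, value):
    threshold, left_mean, right_mean = stump
    return left_mean if value <= threshold else right_mean


def boost(x, y, n_rounds=20, learning_rate=0.3):
    """Add stumps one at a time, each fitted to the current residuals."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [target - p for target, p in zip(y, predictions)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, v)
                       for p, v in zip(predictions, x)]
    return base, stumps, predictions


# Hypothetical single feature and 1/0 target.
x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 0, 1, 1, 1]

base, stumps, fitted = boost(x, y)
print([round(value, 3) for value in fitted])
```

Each round shrinks the remaining error by a fixed fraction here, which is the behaviour the learning rate controls in real boosting libraries as well.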
  • 67. • Testing the Model • Confusion Matrix
  • 68. • AUC Score • Plotting ROC Curve
  • 69. • ROC Curve For XGBoost Model • Using the XGBoost algorithm, we got an accuracy score of 82% and a roc_auc score of 0.81.
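  • 70. 4.5 COMPARISON OF MODELS • It can be observed from the table that XGBoost outperforms all other models: it reached an accuracy of 82% and a roc_auc score of 0.81, against 79% accuracy and roc_auc scores of 0.76 to 0.77 for Logistic Regression, Random Forest and SVM. • Hence, based on these results, we can conclude that XGBoost is the best of the four models for predicting future employee attrition for this company.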
  • 72. KEY FINDINGS • The dataset does not contain any missing values or any redundant features. • The strongest positive correlations with the target feature are: distance from home, job satisfaction, marital status, overtime and business travel. • The strongest negative correlations with the target feature are: performance rating and training times last year.
  • 74. RECOMMENDATIONS • Transportation should be provided to employees living in the same area, or else a transportation allowance should be provided. • Plan and allocate projects so as to avoid the use of overtime. • Employees who hit their two-year anniversary should be identified as potentially having a higher risk of leaving. • Gather information on industry benchmarks to determine whether the company is providing competitive wages.