Phase 1: Project overview
1.Business Problem
1.1 Objective:
To predict why and when employees are most likely to leave the company using a Machine Learning
model, so that actions can be taken to improve employee retention and, where needed, new hiring can
be planned in advance. We have data on both current and former employees, and our target variable Y
indicates whether an employee has left the company, from which the model estimates the probability of
leaving. This project falls under what is commonly known as "HR Analytics" or "People Analytics".
1.2 Challenges:
The main challenge is that the data provided contains a much larger percentage of active employees than
of ex-employees, which is the class we need to predict. We therefore balance the data using the SMOTE
technique before modelling the Y variable.
1.3 Real World Impact:
This solution is relevant to any company with employees: it helps estimate the likelihood of an active
employee leaving and lets management take retention decisions in advance. It also helps avoid the true
cost of replacing an employee, which comes from the time spent interviewing and finding a replacement,
sign-on bonuses, and the loss of productivity for several months while the new employee gets accustomed
to the role. A study by the Center for American Progress found that companies typically pay about
one-fifth of an employee's salary to replace that employee.
2.Dataset
2.1 Data Fields:
• Attrition: Whether the employee is still with the company or is an ex-employee.
• Age: 18 to 60 years old
• Gender: Female or Male
• Department: Research & Development, Sales, Human Resources.
• BusinessTravel: Travel_Rarely, Travel_Frequently, Non-Travel.
• DistanceFromHome: Distance between the company and their home in miles.
• MonthlyIncome: Employee's numeric monthly income.
• MaritalStatus: Married, Single, Divorced.
• Education: Level of education.
• EducationField: Life Sciences, Medical, Marketing, Technical Degree, Other.
• EnvironmentSatisfaction: 1 'Low' 2 'Medium' 3 'High' 4 'Very High'.
• RelationshipSatisfaction: 1 'Low' 2 'Medium' 3 'High' 4 'Very High'.
• JobInvolvement: 1 'Low' 2 'Medium' 3 'High' 4 'Very High'.
• JobRole: Sales Executive, Research Scientist, Laboratory Technician, Manufacturing Director, Healthcare Representative, etc.
• JobSatisfaction: 1 'Low' 2 'Medium' 3 'High' 4 'Very High'.
• OverTime: Whether they work overtime or not.
• NumCompaniesWorked: Number of companies they worked for before joining IBM.
• PerformanceRating: 1 'Low' 2 'Good' 3 'Excellent' 4 'Outstanding'.
• YearsAtCompany: Years they worked for IBM.
• WorkLifeBalance: 1 'Bad' 2 'Good' 3 'Better' 4 'Best'.
• YearsSinceLastPromotion: Years passed since their last promotion.
2.2 Datasets:
In this case study, an HR dataset containing data for 1,470 employees, with various attributes about each
employee, was sourced from Kaggle. I will use this dataset to predict when employees are going to quit by
understanding the main drivers of employee churn. Only a single dataset is present:
• WA_Fn-UseC_-HR-Employee-Attrition.csv
2.3 Data Understanding & Tools:
• The data comes from a Kaggle competition, so it can be downloaded directly for this solution. If we
want to productionize the model on live data, we would need to build a data pipeline; cloud solutions
and SQL-based pipelines are commonly used in companies for this purpose.
• For this particular instance we can use the Pandas and Numpy libraries to process the data, as the data
is in CSV format (see the short sketch after this list).
• As the data is company specific, additional data can be acquired with a good business understanding of
the domain.
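As a minimal sketch of this step, assuming the CSV file from section 2.2 sits in the working directory, the dataset can be loaded and inspected with Pandas:

import pandas as pd

# Load the HR attrition dataset listed in section 2.2
df = pd.read_csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")

print(df.shape)                        # expect 1,470 rows
print(df["Attrition"].value_counts())  # distribution of the target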
3.Solutions to similar problems:
3.1 Solution Approach and Problem Type:
This project falls under what is commonly known as "HR Analytics" or "People
Analytics". I will be using a step-by-step, systematic approach with a method that could
be applied to a variety of ML problems. Some of the Machine Learning algorithms considered are:
• Logistic Regression.
• Random Forest.
• Decision Trees.
• K Nearest Neighbours.
We can use cross-validation to understand and compare model performances,
and can also run Grid Search to find the best possible hyper-parameters for each model.
4.References:
• https://www.netsuite.com/portal/resource/articles/human-resources/employee-turnover-kpis-metrics.shtml
• https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
• https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
• https://towardsdatascience.com/oversampling-and-undersampling-5e2bbaf56dcf
Phase 2 : EDA and Feature Extraction
EDA and feature extraction are valuable for understanding the likelihood of an employee leaving
the company. We will explore the data using these steps:
• Employee Data understanding and insights.
• Removing Duplicates and imputing missing values.
• Checking correlation
• Univariate analysis
• Multivariate analysis
• Outliers
• Binarizing Categorical variables.
• Over sampling and Under sampling.
• Data Transformation (Normalization)
1.Libraries :
The following libraries have been used:
Descriptions of these libraries are as follows:
• Pandas for Dataframe operations
• Numpy for Numeric operations
• Matplotlib and Seaborn are Data Visualisation libraries
• Scikit-Learn for all the Machine learning algorithms.
• imbalanced-learn for re-sampling techniques to handle strong between-class imbalance.
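The exact import cell is not reproduced on the slide; a typical set of imports covering the libraries listed above would look like this:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, classification_report, roc_auc_score

from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler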
2.EDA
We will start with understanding of the data:
• Data shape.
• Datatypes of each variable.
• Unique values of each variable.
2.1 Finding NA and Null values.
Found no Null or NA values across the data set
2.2 Dropping variables that are not important for the analysis.
Three variables (EmployeeCount, Over18 and StandardHours) have only one unique value.
We also note that EmployeeNumber does not carry any meaning for the analysis, so we
drop this column as well.
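A minimal sketch of this step, using the column names as they appear in the dataset:

# Columns with a single unique value carry no information for the model
constant_cols = [col for col in df.columns if df[col].nunique() == 1]
print(constant_cols)   # ['EmployeeCount', 'Over18', 'StandardHours']

# EmployeeNumber is only an identifier, so it is dropped as well
df = df.drop(columns=constant_cols + ["EmployeeNumber"])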
2.3 Correlation Analysis :
2.3.1 Correlations of Target variable Y
• Our Target Variable Y has most positive correlations with :
PerformanceRating 0.002889
MonthlyRate 0.015170
NumCompaniesWorked 0.043494
DistanceFromHome 0.077924
• Target variable Y has most negative correlations with :
TotalWorkingYears -0.171063
JobLevel -0.169105
YearsInCurrentRole -0.160545
MonthlyIncome -0.159840
Age -0.159205
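Assuming the target is encoded as a numeric column (1 for ex-employees, 0 for active employees), these correlations can be reproduced roughly as follows (the numeric_only argument assumes a recent Pandas version):

# Encode the target: 1 = ex-employee (left), 0 = still active
df["Attrition_Y"] = (df["Attrition"] == "Yes").astype(int)

# Correlation of every numeric feature with the target, sorted
correlations = df.corr(numeric_only=True)["Attrition_Y"].drop("Attrition_Y").sort_values()
print(correlations.tail())   # most positive correlations
print(correlations.head())   # most negative correlations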
2.3.2 Correlation analysis using a heatmap plot :
Most of the values appear to be weakly correlated with each other. But there are lots of insights
here to be had :
• Job level and total working years are highly correlated.
(i.e., the longer you work the higher job level you achieve)
• Age is correlated with JobLevel and Education.
(i.e., older employees tend to be more educated and hold higher job levels)
• HourlyRate, DailyRate, and MonthlyRate are completely uncorrelated with each other.
• MonthlyIncome is highly correlated to Job Level.
• Monthly Income and total working years are highly correlated.
• Performance rating and percentage salary hike are highly correlated.
• Years in current role and years at company are highly correlated.
(i.e., sticking with company for long can promote your role)
• Years with current manager and years at company are highly correlated.
• WorkLifeBalance correlates with almost none of the other numeric values.
• Number of companies worked at is weakly correlated with the time spent at the company.
(which might indicate a higher likelihood of leaving)
• A higher performance rating goes with a bigger percentage salary hike.
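A heatmap like the one discussed above can be drawn with Seaborn; the figure size and colour map here are arbitrary choices:

import matplotlib.pyplot as plt
import seaborn as sns

# Correlation matrix of all numeric features
corr_matrix = df.corr(numeric_only=True)

plt.figure(figsize=(14, 10))
sns.heatmap(corr_matrix, cmap="RdBu_r", center=0)
plt.title("Correlation heatmap of numeric features")
plt.tight_layout()
plt.show()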
2.4 Descriptive statistics :
The describe() function gives descriptive statistics that summarize the central tendency,
dispersion and shape of a dataset's distribution. The important takeaways are:
• Mean age of employees is 37
• Most people get a promotion in 2-5 years
• Average tenure at the company is 7 years
• No one has a performance rating under 3
• Most people get training 2-3 times a year
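These figures can be read straight from the standard Pandas summary:

# Central tendency, dispersion and shape of every numeric column
print(df.describe().T)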
2.5 Visualization :
2.5.1 Trend of attrition for age :
• There is a sudden rise in attrition around age 25.
• Most employees who leave do so before their early 30s, and a very similar spike appears around age 31.
• Beyond that, the attrition trend decreases as age increases.
• So employees aged around 25 and those between 28 and 32 should be identified as potentially having a higher risk of leaving.
2.5.1 Trend of attrition for age :
(a) Most people who leave do so in their early 30s, while the bulk of the current workforce falls between 35 and 40.
For older employees, typically beyond 35, the attrition rate becomes low.
(b) The average age of male employees who leave is 34, whereas it is 33 for female employees.
2.5.2 Years at company trend analysis :
• The highest attrition rate occurs in the first year of the job. Over 20% of all employees who left did so in their first year.
• The vast majority of the workforce has been with the company for under 10 years.
• The maximum tenure at the company is 40 years.
• The average years at company is 6 for female employees and 5 for male employees.
2.5.3 Overtime analysis :
• A majority of those who leave work overtime, so we can say with reasonable confidence that overtime is related to attrition.
• About 60% of the employees working overtime are men, with an average age of 34 years; the remainder are women with an
average age of 31 years.
2.5.4 Distance From Home trend analysis :
• The average distance from home is 8.92 miles for currently active employees and 10.63 miles for ex-employees.
• Hence an employee is more likely to quit if the distance is more than 10 miles.
• Married employees, in particular, tend to leave the company when the distance is large, which on average
comes out to approximately 11 miles.
2.5.5 Analysis based on monthly income :
• The bar graph plots Age vs Monthly Income, and the line follows the trend of attrition with age.
• We see that as the age of an employee increases, the monthly income also increases and the attrition trend
decreases.
• So older employees are, overall, more loyal employees.
2.5.5 Analysis based on monthly income :
(a) As the job level increases, the monthly income increases. Hence we can conclude that an employee who stays with the
company and copes well with the work gets good increments in job level and income, and in the long run becomes a loyal employee.
(b) The average monthly income for a female ex-employee is Rs. 4770, while for a male ex-employee it is Rs. 4798.
• Leaving the attrition factor aside, the general monthly income of female employees is higher than that of male employees.
2.5.6 Analysis based on Business Travel :
(a) We see that a good majority of the employees who need to travel frequently leave the company.
(b) The count of male employees who travel frequently and leave the company is higher than that of female
employees who travel frequently.
2.5.7 Number of companies worked analysis :
• Employees who had worked for only one company before are more likely to leave, i.e. employees who hit
their overall two-year anniversary should be identified as potentially having a higher risk of leaving.
• The average age of those employees is approximately 30 years.
2.5.8 Trend of Attrition for total working years :
• The attrition rate decreases, i.e. an employee is less likely to leave, as the number of working years increases.
• Employees who have between 5 and 9 years of experience should be identified as potentially having a higher risk of leaving.
• The average total working years is 9 for ex-employees and 12 for active employees.
2.5.9 Trend of Attrition for years with current manager :
• A large number of leavers leave within 6 months of being assigned to their current manager.
• A similar repeated pattern appears at 2-3 years and again at the 7th year, where the attrition rate among
ex-employees rises and later returns to normal. Employees at these points should be identified as potentially
having a higher risk of leaving, and managers should be aware of this pattern.
2.5.10 Attrition analysis for Department :
• Of all the employees who left, about 65% were from the Research & Development department.
• The next largest share is the Sales department, with about 30% of the total.
• This attrition is concentrated in employees aged 33-35; this group should be identified as at risk of leaving, and
advance retention steps should be taken for employees of this age across departments.
2.5.11 Attrition analysis for Marital status :
• Most of the ex-employees belong to the single and married classes.
• The average monthly income of married employees is the highest, followed by that of single employees.
• Married employees also have the highest average tenure at the company, about 6.5 years, followed by single employees
with about 4.5 years.
3. Random Findings
• Compared to other roles, Human Resources employees generally get promoted faster.
• Human Resources employees also report slightly lower job satisfaction compared to other roles.
Phase 3 : Data Transformation.
1.Undersampling & Oversampling (SMOTE)
• The dataset provided has a much larger proportion of active employees than ex-employees, so a
machine learning model may become biased towards the active class. To avoid this imbalance
we use the SMOTE technique, so that the minority class becomes proportionate to the majority class.
• The initial counts of the unique target values were [0: 1233, 1: 237].
• After oversampling and undersampling, the count of the minority class has increased
and that of the majority class has decreased.
• The final proportion of values comes out to be [0: 770, 1: 616], which is reasonably balanced.
• This transformation of the dataset is important to avoid bias in the model's output.
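A hedged sketch of this re-sampling step with imbalanced-learn; the 0.5 and 0.8 sampling ratios and the random seed are assumptions, chosen only because they roughly reproduce the [0: 770, 1: 616] counts reported above:

import pandas as pd
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from imblearn.pipeline import Pipeline

# Feature matrix with categorical variables binarized, and the encoded target
X = pd.get_dummies(df.drop(columns=["Attrition", "Attrition_Y"]))
y = df["Attrition_Y"]

# Oversample the minority class, then undersample the majority class
resampler = Pipeline(steps=[
    ("over", SMOTE(sampling_strategy=0.5, random_state=42)),       # assumed ratio
    ("under", RandomUnderSampler(sampling_strategy=0.8, random_state=42)),  # assumed ratio
])
X_res, y_res = resampler.fit_resample(X, y)
print(y_res.value_counts())   # roughly [0: 770, 1: 616]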
2. Feature Scaling
2.1 Normalization of Data
• The resampled data needs to be normalized to a common range of values,
so that the model is not biased towards variables with large absolute values.
• The data has been normalized to values between 0 and 1, independently of the statistical
distribution each variable follows.
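A minimal sketch of this step with scikit-learn's MinMaxScaler, applied here to the resampled features from the previous step:

from sklearn.preprocessing import MinMaxScaler

# Rescale every feature to the [0, 1] range, column by column
scaler = MinMaxScaler(feature_range=(0, 1))
X_scaled = scaler.fit_transform(X_res)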
3.Train and Test data splitting
• The data has been split into training data and test data, and the model is trained on the training data.
• Both the dependent and independent variables are split into training and test sets.
• This is done so that model performance can be measured on unseen test data, giving a realistic
estimate of how the model will behave on new employees.
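The split itself can be done with scikit-learn; the 75/25 ratio, stratification and random seed below are assumptions rather than values taken from the slides:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y_res, test_size=0.25, random_state=42, stratify=y_res
)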
Phase 4 : Building Machine Learning Models
1. Baseline algorithms
• First we use a range of baseline algorithms (using out-of-the-box hyper-parameters)
before we move on to more sophisticated solutions.
• We use a baseline prediction algorithm to know whether the predictions for a given
algorithm are good or not.
• A baseline prediction algorithm provides a set of predictions that we can evaluate as
we would any predictions for our problem, such as classification.
• The scores from these algorithms provide the required point of comparison when
evaluating all other machine learning algorithms on our problem.
• Once established, we can comment on how much better a given algorithm is.
The algorithms considered in this section are: Logistic Regression, Random Forest, KNN,
Decision Tree Classifier.
From the above results, we can conclude that:
• The Random Forest Classifier appears to be the best-fitting model, with an average
accuracy of approximately 90.2%.
• The Decision Tree and Logistic Regression models also perform well, with average
accuracies of approximately 82.4% and 77.5% respectively. We shall proceed with these models for the analysis.
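A sketch of how such a baseline comparison can be run with cross-validation; the fold count and random seeds are assumptions, and the accuracy figures quoted above come from the original runs rather than from this snippet:

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

baseline_models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),  # max_iter raised to avoid convergence warnings
    "Random Forest": RandomForestClassifier(random_state=42),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "KNN": KNeighborsClassifier(),
}

for name, model in baseline_models.items():
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
    print(f"{name}: mean accuracy = {scores.mean():.3f} (+/- {scores.std():.3f})")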
2. Logistic Regression
• Logistic Regression is a Machine Learning classification algorithm that is used to predict
the probability of a categorical dependent variable. Logistic Regression is classification
algorithm that is not as sophisticated as the ensemble methods. Hence, it provides us with
a good benchmark.
• We use the LogisticRegression() estimator from the sklearn library to fit the data.
from sklearn.linear_model import LogisticRegression
• The mean prediction accuracy comes out to be approximately 77.8%.
• On fitting the data with logistic regression we get:
an intercept value of 1.16107 and the following coefficient values:
array([[-0.60427658, -0.66403848, 1.01115777, -0.17174707, -1.16585296,
0.259884 , -0.23246306, -1.86911886, -0.5026541 , -1.20096013,
-0.16554763, 0.32295026, 1.05840619, 1.78505991, -0.17840222,
-0.26499954, -0.73665015, -0.73702936, -0.6300062 , -0.63448951,
-0.6229809 , 1.14153067, -1.35572966, 2.03116262, -1.03586988,
1.41016837, 0.92204911, -0.45429208, 0.38475649, -0.35644322,
-0.11705673, -0.16685064, -0.2734763 , 0.80221059, 0.99447046,
1.33874149, -0.24631437, -0.14315613, -0.31208634, 0.06669798,
-0.01705458, 0.94274675, 0.56312458, 0.97563792]])
2.1 Confusion Matrix, Classification Report & ROC Curve.
Confusion matrix:
[[165  28]
 [ 52 102]]
AUC score = 77.6%.
The precision for predicting our required Y variable (attrition) is approximately 75%.
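A sketch of how these figures can be produced with out-of-the-box hyper-parameters (only max_iter is raised here, as an assumption, to avoid convergence warnings):

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report, roc_auc_score

log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train, y_train)
y_pred = log_reg.predict(X_test)

print(log_reg.score(X_test, y_test))              # mean accuracy (~0.78 reported above)
print(log_reg.intercept_, log_reg.coef_)          # intercept and coefficient values
print(confusion_matrix(y_test, y_pred))           # confusion matrix, e.g. [[165 28] [52 102]]
print(classification_report(y_test, y_pred))      # precision / recall / F1 per class
print(roc_auc_score(y_test, log_reg.predict_proba(X_test)[:, 1]))  # AUC score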
3. Random Forest Classifier
• Random Forest is a machine learning method that is capable of solving both regression and classification problems. It
is a form of ensemble learning, as it relies on an ensemble of decision trees: it aggregates classification (or
regression) trees.
• Random Forest fits a number of decision tree classifiers on various sub-samples of the dataset and uses
averaging to improve the predictive accuracy and control over-fitting. Random Forest can handle a large
number of features, and is helpful for estimating which of your variables are important in the underlying data
being modeled.
• Using the same train/test split with the chosen parameters gives us a mean score of approximately 92%.
• The scores of all the splits are as follows, and their mean gives the overall score of the model.
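A minimal sketch of this step; the exact hyper-parameters behind the ~92% mean score are not reproduced on the slide, so defaults plus a fixed random seed are assumed:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rf = RandomForestClassifier(n_estimators=100, random_state=42)
split_scores = cross_val_score(rf, X_train, y_train, cv=5, scoring="accuracy")
print(split_scores)         # score of each split
print(split_scores.mean())  # overall mean score of the model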
4. Decision Tree classifier
• Decision tree classifier is a machine learning algorithm used for both classification and regression
tasks, that predicts value of a target variable by learning simple decision rules inferred from the input
features.
• Decision trees are structured as a hierarchical tree-like structure, where each internal node represents
a feature or attribute, and each branch represents a decision rule based on that attribute. The leaf
nodes represent the final predicted outcome or class label.
• They are also capable of handling nonlinear relationships between features and the target variable.
• Using the same train/test split with the chosen parameters gives us a score of approximately 87.6%.
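A minimal sketch of the fit and scoring step (default parameters assumed; the parameters behind the 87.6% figure are not shown here):

from sklearn.tree import DecisionTreeClassifier

dt = DecisionTreeClassifier(random_state=42)
dt.fit(X_train, y_train)
print(dt.score(X_test, y_test))   # mean accuracy on the held-out test data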
5. Grid Search for fine tuning of hyper parameters.
• Grid search works by creating a grid of all possible combinations of the hyper-parameter values specified
by the user. It then trains and evaluates the model using each combination of hyper-parameters and selects
the one that yields the best performance based on a predefined evaluation metric, such as accuracy,
precision, or F1 score.
• It systematically explores all possible combinations of hyper-parameters, ensuring that the best combination
is found within the specified search space. However, this exhaustive search can be computationally expensive.
5.1 Grid Search for Random forest classifier.
• Fitting the same train/test split with the specified parameter grid gives us the best possible hyper-parameters
as:
Best score = 0.9056670382757339
Best parameters = {'max_features': 1, 'n_estimators': 178}
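A sketch of this grid search, with the grid restricted to the two hyper-parameters that appear in the reported best combination; the candidate ranges themselves are assumptions:

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

param_grid = {
    "n_estimators": list(range(10, 200, 7)),   # assumed candidate values
    "max_features": list(range(1, 11)),        # assumed candidate values
}
grid_rf = GridSearchCV(RandomForestClassifier(random_state=42),
                       param_grid, cv=5, scoring="accuracy")
grid_rf.fit(X_train, y_train)
print(grid_rf.best_score_)    # reported above as ~0.906
print(grid_rf.best_params_)   # reported above as {'max_features': 1, 'n_estimators': 178}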
5.2 Grid Search for Decision trees classifier.
• Fitting the same train/test split with the specified parameter grid gives us the best possible
hyper-parameters as:
Best score = 0.8896137963275589
Best parameters = {'criterion': 'entropy', 'max_depth': 13, 'random_state': 17}
Conclusion :
Retention plans :
The major indicators of employees leaving include:
• Age: Employees in the young age bracket of 25-35 are more likely to leave. Hence, efforts should be made to
clearly articulate the company's long-term plan for retention, as well as to provide incentives that help employees
progress to higher job levels.
• Monthly Income: People on higher wages are less likely to leave the company. Hence, efforts should be made to
gather information from the current local market to determine whether the company is paying competitive monthly wages.
• Over Time: People who work overtime are more likely to leave the company. Hence, efforts must be made to
scope projects up front with adequate support and staffing so as to reduce overtime.
• YearsWithCurrManager: A large number of leavers leave within 6 months of being assigned their current manager. Using line
manager details for each employee, the company should determine which managers have experienced the largest
number of resignations over the past year. Extracting patterns from the employees who have resigned
may reveal recurring patterns in employee departures, in which case action can be taken accordingly.
• DistanceFromHome: Employees who live further from work are more likely to leave the company. Hence,
efforts should be made to provide support in the form of transportation for groups of employees living in the same
area, or in the form of allowances. Initial screening of employees based on their home location is not
recommended, as it would be regarded as a form of discrimination as long as employees make it to work on time
every day.
• TotalWorkingYears: More experienced employees are less likely to leave. Employees who have between 5 and 8
years of experience should be identified as potentially having a higher risk of leaving.
• YearsAtCompany: Longer-tenured employees are less likely to leave. Employees who hit their two-year anniversary should
be identified as potentially having a higher risk of leaving.
The company should look deeper into the Human Resources roles to understand which aspects of the job people are not
satisfied with. Frequent communication and one-on-ones are strongly recommended.
While the company does not need to worry too much about people who have worked for 2-4 companies, it is still
worth paying attention to male employees who have worked for more than 5 companies.

More Related Content

Similar to Employee Retension Capstone Project - Neeraj Bubby.pptx

Webinar - How to Prepare for a Pay Equity Analysis Series Ep 2 Diagnose
Webinar - How to Prepare for a Pay Equity Analysis Series Ep 2 DiagnoseWebinar - How to Prepare for a Pay Equity Analysis Series Ep 2 Diagnose
Webinar - How to Prepare for a Pay Equity Analysis Series Ep 2 DiagnosePayScale, Inc.
 
What it Takes to Make the Fortune 100 Best Companies to Work ForÂŽ List
What it Takes to Make the Fortune 100 Best Companies to Work ForÂŽ ListWhat it Takes to Make the Fortune 100 Best Companies to Work ForÂŽ List
What it Takes to Make the Fortune 100 Best Companies to Work ForÂŽ ListGreat Place to WorkÂŽ US
 
Hiring for success — what makes the difference? Why are so many organisations...
Hiring for success — what makes the difference? Why are so many organisations...Hiring for success — what makes the difference? Why are so many organisations...
Hiring for success — what makes the difference? Why are so many organisations...Steven Jagger
 
HR analytics
HR analyticsHR analytics
HR analyticsLewis Garrad
 
IBM HR Analytics Employee Attrition & Performance
IBM HR Analytics Employee Attrition & PerformanceIBM HR Analytics Employee Attrition & Performance
IBM HR Analytics Employee Attrition & PerformanceShivangiKrishna
 
Employee Engagement
Employee Engagement Employee Engagement
Employee Engagement Seta Wicaksana
 
10 HR Metrics Every Company Should Track .pdf
10 HR Metrics Every Company Should Track .pdf10 HR Metrics Every Company Should Track .pdf
10 HR Metrics Every Company Should Track .pdfnguyenanvuong2007
 
Science-Based Hiring: An Actionable Guide
Science-Based Hiring: An Actionable GuideScience-Based Hiring: An Actionable Guide
Science-Based Hiring: An Actionable GuideNatasha Ouslis
 
INFORMATION SYSTEMS CASE STUDYBrainstorm ideas for a new informa.docx
INFORMATION SYSTEMS CASE STUDYBrainstorm ideas for a new informa.docxINFORMATION SYSTEMS CASE STUDYBrainstorm ideas for a new informa.docx
INFORMATION SYSTEMS CASE STUDYBrainstorm ideas for a new informa.docxjaggernaoma
 
Hr analytics – demystified!
Hr analytics – demystified!Hr analytics – demystified!
Hr analytics – demystified!Arun Krishnan
 
HR Analytics: New Insights and New Capabilities?
HR Analytics: New Insights and New Capabilities?HR Analytics: New Insights and New Capabilities?
HR Analytics: New Insights and New Capabilities?Lewis Garrad
 
Hiring for success-uk-web
Hiring for success-uk-webHiring for success-uk-web
Hiring for success-uk-webGary Fay
 
Measuring Employee Productivity
Measuring Employee ProductivityMeasuring Employee Productivity
Measuring Employee ProductivitySeta Wicaksana
 
Assignment week4 day3 data analytics
Assignment week4 day3 data analyticsAssignment week4 day3 data analytics
Assignment week4 day3 data analyticsGirish Nookella
 
5StaffingMetricsForYourWorkweek
5StaffingMetricsForYourWorkweek5StaffingMetricsForYourWorkweek
5StaffingMetricsForYourWorkweekEric Scalese
 
Fahim Karim: Attrition Prevention
Fahim Karim: Attrition PreventionFahim Karim: Attrition Prevention
Fahim Karim: Attrition PreventionEdunomica
 
NHRD HR Analytics Presentation
NHRD HR Analytics PresentationNHRD HR Analytics Presentation
NHRD HR Analytics PresentationSupriya Thankappan
 
How organizations have changed their performance reviews
How organizations have changed their performance reviewsHow organizations have changed their performance reviews
How organizations have changed their performance reviewsGroSum
 
Presentation final.pptx
Presentation final.pptxPresentation final.pptx
Presentation final.pptxkomalsharma581366
 

Similar to Employee Retension Capstone Project - Neeraj Bubby.pptx (20)

Webinar - How to Prepare for a Pay Equity Analysis Series Ep 2 Diagnose
Webinar - How to Prepare for a Pay Equity Analysis Series Ep 2 DiagnoseWebinar - How to Prepare for a Pay Equity Analysis Series Ep 2 Diagnose
Webinar - How to Prepare for a Pay Equity Analysis Series Ep 2 Diagnose
 
What it Takes to Make the Fortune 100 Best Companies to Work ForÂŽ List
What it Takes to Make the Fortune 100 Best Companies to Work ForÂŽ ListWhat it Takes to Make the Fortune 100 Best Companies to Work ForÂŽ List
What it Takes to Make the Fortune 100 Best Companies to Work ForÂŽ List
 
Hiring for success — what makes the difference? Why are so many organisations...
Hiring for success — what makes the difference? Why are so many organisations...Hiring for success — what makes the difference? Why are so many organisations...
Hiring for success — what makes the difference? Why are so many organisations...
 
HR analytics
HR analyticsHR analytics
HR analytics
 
IBM HR Analytics Employee Attrition & Performance
IBM HR Analytics Employee Attrition & PerformanceIBM HR Analytics Employee Attrition & Performance
IBM HR Analytics Employee Attrition & Performance
 
Employee Engagement
Employee Engagement Employee Engagement
Employee Engagement
 
10 HR Metrics Every Company Should Track .pdf
10 HR Metrics Every Company Should Track .pdf10 HR Metrics Every Company Should Track .pdf
10 HR Metrics Every Company Should Track .pdf
 
Science-Based Hiring: An Actionable Guide
Science-Based Hiring: An Actionable GuideScience-Based Hiring: An Actionable Guide
Science-Based Hiring: An Actionable Guide
 
INFORMATION SYSTEMS CASE STUDYBrainstorm ideas for a new informa.docx
INFORMATION SYSTEMS CASE STUDYBrainstorm ideas for a new informa.docxINFORMATION SYSTEMS CASE STUDYBrainstorm ideas for a new informa.docx
INFORMATION SYSTEMS CASE STUDYBrainstorm ideas for a new informa.docx
 
Hr analytics – demystified!
Hr analytics – demystified!Hr analytics – demystified!
Hr analytics – demystified!
 
HR Analytics: New Insights and New Capabilities?
HR Analytics: New Insights and New Capabilities?HR Analytics: New Insights and New Capabilities?
HR Analytics: New Insights and New Capabilities?
 
Hiring for success
Hiring for successHiring for success
Hiring for success
 
Hiring for success-uk-web
Hiring for success-uk-webHiring for success-uk-web
Hiring for success-uk-web
 
Measuring Employee Productivity
Measuring Employee ProductivityMeasuring Employee Productivity
Measuring Employee Productivity
 
Assignment week4 day3 data analytics
Assignment week4 day3 data analyticsAssignment week4 day3 data analytics
Assignment week4 day3 data analytics
 
5StaffingMetricsForYourWorkweek
5StaffingMetricsForYourWorkweek5StaffingMetricsForYourWorkweek
5StaffingMetricsForYourWorkweek
 
Fahim Karim: Attrition Prevention
Fahim Karim: Attrition PreventionFahim Karim: Attrition Prevention
Fahim Karim: Attrition Prevention
 
NHRD HR Analytics Presentation
NHRD HR Analytics PresentationNHRD HR Analytics Presentation
NHRD HR Analytics Presentation
 
How organizations have changed their performance reviews
How organizations have changed their performance reviewsHow organizations have changed their performance reviews
How organizations have changed their performance reviews
 
Presentation final.pptx
Presentation final.pptxPresentation final.pptx
Presentation final.pptx
 

More from Boston Institute of Analytics

NLP Based project presentation: Analyzing Automobile Prices
NLP Based project presentation: Analyzing Automobile PricesNLP Based project presentation: Analyzing Automobile Prices
NLP Based project presentation: Analyzing Automobile PricesBoston Institute of Analytics
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionBoston Institute of Analytics
 
Analyzing Movie Reviews : Machine learning project
Analyzing Movie Reviews : Machine learning projectAnalyzing Movie Reviews : Machine learning project
Analyzing Movie Reviews : Machine learning projectBoston Institute of Analytics
 
Data Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationData Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationBoston Institute of Analytics
 
Combating Fraudulent Transactions: A Deep Dive into Credit Card Fraud Detection
Combating Fraudulent Transactions: A Deep Dive into Credit Card Fraud DetectionCombating Fraudulent Transactions: A Deep Dive into Credit Card Fraud Detection
Combating Fraudulent Transactions: A Deep Dive into Credit Card Fraud DetectionBoston Institute of Analytics
 
Predicting Liver Disease in India: A Machine Learning Approach
Predicting Liver Disease in India: A Machine Learning ApproachPredicting Liver Disease in India: A Machine Learning Approach
Predicting Liver Disease in India: A Machine Learning ApproachBoston Institute of Analytics
 
Employee Churn Prediction: Artificial Intelligence Project Presentation
Employee Churn Prediction: Artificial Intelligence Project PresentationEmployee Churn Prediction: Artificial Intelligence Project Presentation
Employee Churn Prediction: Artificial Intelligence Project PresentationBoston Institute of Analytics
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationBoston Institute of Analytics
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
Nmap project presentation : Unlocking Network Secrets: Mastering Port Scannin...
Nmap project presentation : Unlocking Network Secrets: Mastering Port Scannin...Nmap project presentation : Unlocking Network Secrets: Mastering Port Scannin...
Nmap project presentation : Unlocking Network Secrets: Mastering Port Scannin...Boston Institute of Analytics
 
Cyber Security Project Presentation : Essential Reconnaissance Tools and Tech...
Cyber Security Project Presentation : Essential Reconnaissance Tools and Tech...Cyber Security Project Presentation : Essential Reconnaissance Tools and Tech...
Cyber Security Project Presentation : Essential Reconnaissance Tools and Tech...Boston Institute of Analytics
 
Identifying and Eradicating Web Application Vulnerabilities : Cyber Security ...
Identifying and Eradicating Web Application Vulnerabilities : Cyber Security ...Identifying and Eradicating Web Application Vulnerabilities : Cyber Security ...
Identifying and Eradicating Web Application Vulnerabilities : Cyber Security ...Boston Institute of Analytics
 
Cyber Security Project Presentation: Unveiling Reconnaissance Tools and Techn...
Cyber Security Project Presentation: Unveiling Reconnaissance Tools and Techn...Cyber Security Project Presentation: Unveiling Reconnaissance Tools and Techn...
Cyber Security Project Presentation: Unveiling Reconnaissance Tools and Techn...Boston Institute of Analytics
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 

More from Boston Institute of Analytics (20)

E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
NLP Based project presentation: Analyzing Automobile Prices
NLP Based project presentation: Analyzing Automobile PricesNLP Based project presentation: Analyzing Automobile Prices
NLP Based project presentation: Analyzing Automobile Prices
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
Analyzing Movie Reviews : Machine learning project
Analyzing Movie Reviews : Machine learning projectAnalyzing Movie Reviews : Machine learning project
Analyzing Movie Reviews : Machine learning project
 
Data Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationData Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health Classification
 
Combating Fraudulent Transactions: A Deep Dive into Credit Card Fraud Detection
Combating Fraudulent Transactions: A Deep Dive into Credit Card Fraud DetectionCombating Fraudulent Transactions: A Deep Dive into Credit Card Fraud Detection
Combating Fraudulent Transactions: A Deep Dive into Credit Card Fraud Detection
 
Predicting Liver Disease in India: A Machine Learning Approach
Predicting Liver Disease in India: A Machine Learning ApproachPredicting Liver Disease in India: A Machine Learning Approach
Predicting Liver Disease in India: A Machine Learning Approach
 
Employee Churn Prediction: Artificial Intelligence Project Presentation
Employee Churn Prediction: Artificial Intelligence Project PresentationEmployee Churn Prediction: Artificial Intelligence Project Presentation
Employee Churn Prediction: Artificial Intelligence Project Presentation
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Nmap project presentation : Unlocking Network Secrets: Mastering Port Scannin...
Nmap project presentation : Unlocking Network Secrets: Mastering Port Scannin...Nmap project presentation : Unlocking Network Secrets: Mastering Port Scannin...
Nmap project presentation : Unlocking Network Secrets: Mastering Port Scannin...
 
Cyber Security Project Presentation : Essential Reconnaissance Tools and Tech...
Cyber Security Project Presentation : Essential Reconnaissance Tools and Tech...Cyber Security Project Presentation : Essential Reconnaissance Tools and Tech...
Cyber Security Project Presentation : Essential Reconnaissance Tools and Tech...
 
Identifying and Eradicating Web Application Vulnerabilities : Cyber Security ...
Identifying and Eradicating Web Application Vulnerabilities : Cyber Security ...Identifying and Eradicating Web Application Vulnerabilities : Cyber Security ...
Identifying and Eradicating Web Application Vulnerabilities : Cyber Security ...
 
Cyber Security Project Presentation: Unveiling Reconnaissance Tools and Techn...
Cyber Security Project Presentation: Unveiling Reconnaissance Tools and Techn...Cyber Security Project Presentation: Unveiling Reconnaissance Tools and Techn...
Cyber Security Project Presentation: Unveiling Reconnaissance Tools and Techn...
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 

Recently uploaded

Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)Dr. Mazin Mohamed alkathiri
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupJonathanParaisoCruz
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfadityarao40181
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxJiesonDelaCerna
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 

Recently uploaded (20)

Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized Group
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdf
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptx
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 

Employee Retension Capstone Project - Neeraj Bubby.pptx

  • 1.
  • 2. Phase 1: Project overview 1.Business Problem 1.1 Objective: To predict why and when employees are most likely to leave the company using a Machine Learning model, so that actions can be implemented to improve employee retention as well as possibly planning new hiring in advance. Here we have data on former employees, where our target variable Y which is the probability of an employee leaving the company. This project would fall under what is commonly known as "HR Anlytics", "People Analytics". 1.2 Challenges: The main challenge is that the data provided has more percentage of employees who are active, but not about Ex-employees which we require. To incorparate a balanced data using SMOTE technique and perform accordingly to provide the Y variable. 1.3 Real World Impact: Overall impact of this solution would impact every possible company having employees, in understanding likelihood of an active employee leaving the company and take decisions in advance for the retention. It would also help save the true cost of replacing an employee which is caused due to the amount of time spent to interview and find a replacement, sign-on bonuses, and the loss of productivity for several months while the new employee gets accustomed to the new role. A study by the Center for American Progress found that companies typically pay about one-fifth of an employee’s salary to replace that employee.
  • 3. 2.Dataset 2.1 Data Fields: • Attrition: Whether employees are still with the company or a Ex-employee • Age: 18 to 60 years old • Gender: Female or Male • Department: Research & Development, Sales, Human Resources. • BusinessTravel: Travel_Rarely, Travel_Frequently, Non-Travel. • DistanceFromHome: Distance between the company and their home in miles. • MonthlyIncome: Employees numeric monthly income. • MaritalStatus: Married, Single, Divorced. • Education: Level of education. • EducationField: Life Sciences, Medical, Marketing,Technical Degree,Other. • EnvironmentSatisfaction: 1 'Low' 2 'Medium' 3 'High' 4 'Very High'. • RelationshipSatisfaction: 1 'Low' 2 'Medium' 3 'High' 4 'Very High'. • JobInvolvement: 1 'Low' 2 'Medium' 3 'High' 4 'Very High'. • JobRole: Sales Executive,Research Science, Laboratory Tec, Manufacturing, Healthcare Rep, etc. • JobSatisfaction: 1 'Low' 2 'Medium' 3 'High' 4 'Very High'. • OverTime: Whether they work overtime or not. • NumCompaniesWorked: Number of companies they worked for before joinging IBM. • PerformanceRating: 1 'Low' 2 'Good' 3 'Excellent' 4 'Outstanding'. • YearsAtCompany: Years they worked for IBM. • WorkLifeBalance: 1 'Bad' 2 'Good' 3 'Better' 4 'Best'. • YearsSinceLastPromotion: Years passed since their last promotion.
  • 4. 2.2 Datasets: In this case study, a HR dataset was sourced from which contains employee data for 1,470 employees with various information about the employees. I will use this dataset to predict when employees are going to quit by understanding the main drivers of employee churn. Only a single dataset is present: • WA_Fn-UseC_-HR-Employee-Attrition.csv 2.3 Data Understanding & Tools: • Data comes from a Kaggle competition so it can be downloaded directly for the solution but if we want to productionize the live data we might have to make a data pipeline for the same. Cloud solutions and SQL queries for data pipelines are very commonly seen in companies which can be used effectively. • For this particular instance we can use Pandas and Numpy libraries to process the data as we have data in CSV format. • As the data is company specific additional data can be acquired by having business understanding of the same.
  • 5. 3.Solutions to similar problems: 3.1 Solution Approach and Problem Type: This project would fall under what is commonly known as "HR Anlytics", "People Analytics". I will be usign a step-by-step systematic approach using a method that could be used for a variety of ML problems. Some of the Machine learning algorihms are: • Logistic Regression. • Random Forest. • Decision Trees. • K Nearest Neighbours. We can use cross validation technique to understand, compare model performances & can also implement Grid Search for the best possible hyper parameters of models.
  • 6. 4.References: • https://www.netsuite.com/portal/resource/articles/human-resources/employee-turnover- kpis- metrics.shtml • https://scikitearn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html • https://scikitlearn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html •https://towardsdatascience.com/oversampling-and-undersampling-5e2bbaf56dcf
  • 7. Phase 2 : EDA and Feature Extraction EDA and Feature extraction are valua understand the likelihood of an employee leaving the company. We will explore the data using this steps :- • Employee Data understanding and insights. • Removing Duplicates and imputing missing values. • Checking correlation • Univariate analysis • Multivariate analysis • Outliers •Binarizing Categorical variables. • Over sampling and Under sampling. • Data Transformation (Normalization)
  • 8. 1.Libraries : Following libraries have been used: Description of these libraries are as follows:- • Pandas for Dataframe operations • Numpy for Numeric operations • Matplotlib and Seaborn are Data Visualisation libraries • Scikit-Learn for all the Machine learning algorithms. • imbalanced-learn for re-sampling techniques for strong between-class imbalance
  • 9. 2.EDA We will start with understanding of the data: •Data shape. •Datatypes of each variable •Unique values of each variable 2.1 Finding NA and Null values. Found no Null or NA values across the data set 2.2 Dropping not so important variables according to data. Found 3 variables EmployeeCount, Over18, StandardHours to have only one unique value. It is also noticed that Employee number doesn't have much meaning for the analysis, so we are dropping this column as well.
  • 10. 2.3 Correlation Analysis : 2.3.1 Correlations of Target variable Y • Our Target Variable Y has most positive correlations with : PerformanceRating 0.002889 MonthlyRate 0.015170 NumCompaniesWorked 0.043494 DistanceFromHome 0.077924 • Target variable Y has most negative correlations with : TotalWorkingYears -0.171063 JobLevel -0.169105 YearsInCurrentRole -0.160545 MonthlyIncome -0.159840 Age -0.159205
  • 11. 2.3 Correlation analysis using a heatmap plot :
  • 12. Most of the values appear to be weakly correlated with each other. But there are lots of insights here to be had : • Job level and total working years are highly correlated. (i.e., the longer you work the higher job level you achieve) • Age is correlated JobLevel and Education. (i.e., the older we are the more educated and successful you are) • HourlyRate, DailyRate, and MonthlyRate are completely uncorrelated with each other. • MonthlyIncome is highly correlated to Job Level. • Monthly Income and total working years are highly correlated. • Performance rating and percentage salary hike are highly correlated. • Years in current role and years at company are highly correlated. (i.e., sticking with company for long can promote your role) • Years with current manager and years at company are highly correlated. • Work life Balance correlates to pretty much none of the numeric values • Number of companies worked at is weakly correlated with the time spent at the company. (might indicate we're likely the leave) • If performance rating is high , then bigger percent salary hike.
  • 13. 2.4 Descriptive statistics : The describe() function gives Descriptive statistics include that summarize the central tendency, dispersion and shape of a dataset's distribution. The below are the important takeaways : • Mean age of employees is 37 • Most people get a promotion in 2-5 years • Average time employed at is 7 years • No one has a performance rating under 3 • Most people get training 2-3 times a year
  • 14. 2.5 Visualization: 2.5.1 Trend of attrition by age: • There is a sudden spike in attrition around age 25. • Most employees who leave do so before their early 30s, and a very similar pattern appears around age 31. • The attrition trend then decreases as age increases. • Employees around 25 and between 28 and 32 should therefore be identified as potentially having a higher risk of leaving.
  • 15. 2.5.1 Trend of attrition by age: (a) (b) (a) Most people who leave do so in their early 30s. The bulk of the current workforce falls between 35 and 40, and for older employees, typically after 35, the attrition rate becomes low. (b) The average age of male employees who leave is 34, versus 33 for female employees.
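A possible way to produce a comparable age-vs-attrition chart; the choice of a KDE plot is an assumption about how the figures in panels (a) and (b) were made:

```python
# Possible reproduction of the age-vs-attrition comparison; the use of a
# KDE plot is an assumption about how the chart on the slide was made.
sns.kdeplot(data=df, x="Age", hue="Attrition", common_norm=False)
plt.title("Age distribution: active vs. ex-employees")
plt.show()
```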
  • 16. 2.5.2 Years-at-company trend analysis: • The highest attrition rate occurs in the first year on the job: over 20% of all employees who left did so in their first year. • The vast majority of the workforce has been with the company for under 10 years. • The maximum tenure at the company is 40 years. • The average years at the company is 6 for female employees and 5 for male employees.
  • 17. 2.5.3 Overtime analysis: • A clear majority of the employees who leave work overtime, which strongly suggests overtime is related to attrition. • About 60% of those working overtime are men, with an average age of 34 years; the remainder are women, with an average age of 31 years.
  • 18. 2.5.4 Distance-from-home trend analysis: • The average distance from home is 8.92 miles for currently active employees and 10.63 miles for ex-employees. • An employee is therefore more likely to quit when the distance is more than about 10 miles. • The majority of married employees tend to leave when the distance is large, which on average comes out to approximately 11 miles.
  • 19. 2.5.5 Analysis based on monthly income: • The bar graph plots age against monthly income, and the line follows the trend of attrition with age. • As the age of an employee increases, monthly income also increases and the attrition trend decreases. • An older employee is therefore, on the whole, a more loyal employee.
  • 20. 2.5.5 Analysis based on monthly income: (a) (b) (a) As job level increases, monthly income increases. A good employee who stays with the company and copes with the work receives increments in job level and income, and in the long run becomes a loyal employee. (b) The average monthly income of a female ex-employee is Rs. 4,770, while for a male ex-employee it is Rs. 4,798. • Setting attrition aside, the general monthly income of female employees is higher than that of male employees.
  • 21. 2.5.6 Analysis based on business travel: (a) (b) (a) A good majority of the employees who need to travel frequently leave the company. (b) The count of male employees who travel frequently and leave the company is higher than that of female employees who travel frequently.
  • 22. 2.5.7 Number-of-companies-worked analysis: (a) (b) • Employees who had worked at only one company before are more likely to leave; in other words, employees who hit their overall two-year anniversary should be identified as potentially having a higher risk of leaving. • The average age of these employees is approximately 30 years.
  • 23. 2.5.8 Trend of attrition by total working years: (a) (b) • The attrition rate decreases, i.e. an employee is less likely to leave, as the number of working years increases. • Employees with between 5 and 9 years of experience should be identified as potentially having a higher risk of leaving. • The average total working years is 9 for ex-employees and 12 for active employees.
  • 24. 2.5.8 Trend of attrition by years with current manager: (a) (b) • A large number of leavers leave about 6 months after their current manager does. • The same pattern repeats at 2-3 years and again around the 7th year, where the attrition rate of ex-employees rises and later returns to normal. Employees at these points should be identified as potentially having a higher risk of leaving, and this pattern should be watched.
  • 25. 2.5.9 Attrition analysis by department: (a) • Of all employees who left, 65% came from the Research & Development department. • The next is the Sales department, with 30% of the total. • This attrition is concentrated among employees aged 33-35; employees of this age across departments should be identified as at risk of leaving, and steps should be taken in advance.
  • 26. 2.5.10 Attrition analysis by marital status: (a) • Most ex-employees belong to the single and married classes. • The average monthly income of married employees is the highest, followed by that of single employees. • Married employees also have the highest average years at the company, at 6.5 years, followed by single employees at 4.5 years.
  • 27. 3. Random findings • Compared to other roles, human resources employees generally get promoted faster. • Human resources employees have slightly lower job satisfaction compared to other roles.
  • 28. Phase 3: Data Transformation. 1. Undersampling & Oversampling (SMOTE) • The dataset has a larger proportion of active employees than ex-employees, so a machine learning model may become biased towards the active (majority) class. To correct this imbalance we use the SMOTE technique, so that the minority class becomes proportionate to the majority class. • The initial counts of the unique values were [0: 1233, 1: 237]. • After oversampling and undersampling, the minority class count has increased and the majority class count has decreased. • The final counts come out to [0: 770, 1: 616], which is a decent balance. • This transformation step is important for the model, to avoid bias in our output.
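A sketch of the resampling step, assuming the categorical variables have already been binarized; the sampling_strategy values 0.5 and 0.8 are inferred from the class counts reported above and should be treated as assumptions:

```python
# Over-sampling the minority class with SMOTE, then under-sampling the
# majority class. Assumes categorical variables are one-hot encoded first;
# the sampling_strategy values are inferred from the reported class counts.
from collections import Counter

X = pd.get_dummies(df.drop(columns=["Attrition", "Y"]))
y = df["Y"]
print(Counter(y))                                   # {0: 1233, 1: 237}

over = SMOTE(sampling_strategy=0.5)                 # minority -> ~50% of majority
under = RandomUnderSampler(sampling_strategy=0.8)   # majority shrunk to ratio 0.8

X_res, y_res = over.fit_resample(X, y)
X_res, y_res = under.fit_resample(X_res, y_res)
print(Counter(y_res))                               # roughly {0: 770, 1: 616}
```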
  • 29. 2. Feature Scaling 2.1 Normalization of data • The resampled data needs to be normalized to a common range of values so that the model is not biased towards variables with large magnitudes. • The data has been normalized to values between 0 and 1, independently of the statistical distribution each variable follows. 3. Train and test data splitting • The data is split into training and test sets, and the model is trained on the training data. • Both the dependent and independent variables are split into training and test portions. • This is done so that the model generalizes well when unseen test data is fed to it.
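A minimal sketch of the normalization and the train/test split, continuing from the resampled data in the previous step; the test_size and random_state values are assumptions:

```python
# Normalization to the [0, 1] range followed by the train/test split;
# test_size and random_state are assumptions.
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X_res)

X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y_res, test_size=0.25, random_state=17, stratify=y_res
)
```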
  • 30. Phase 4: Building Machine Learning Models 1. Baseline algorithms • First we run a range of baseline algorithms (with out-of-the-box hyper-parameters) before moving on to more sophisticated solutions. • A baseline prediction algorithm tells us whether the predictions of a given algorithm are good or not. • It provides a set of predictions that we can evaluate just as we would any predictions for our classification problem. • The scores from these algorithms provide the point of comparison needed when evaluating all other machine learning algorithms on our problem. • Once established, we can comment on how much better a given algorithm is. The algorithms considered in this section are: Logistic Regression, Random Forest, KNN, and Decision Tree Classifier.
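A sketch of the baseline comparison with out-of-the-box hyper-parameters; the 5-fold cross-validation is an assumption, since the slides only report the resulting mean accuracies:

```python
# Baseline comparison with out-of-the-box hyper-parameters; 5-fold
# cross-validation is an assumption (only mean accuracies appear on the slides).
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(),
    "KNN": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(),
}

for name, model in models.items():
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```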
  • 31. From the above results, we can conclude that: • The Random Forest classifier appears to be the best-fitting model, with an average accuracy of approximately 90.2%. • The Decision Tree and Logistic Regression models also perform well, with average accuracies of approximately 82.4% and 77.5% respectively. We shall proceed with these models for the analysis.
  • 32. 2. Logistic Regression • Logistic Regression is a machine learning classification algorithm used to predict the probability of a categorical dependent variable. It is not as sophisticated as the ensemble methods, so it provides a good benchmark. • We use the LogisticRegression() class of the sklearn library to fit the data: from sklearn.linear_model import LogisticRegression • The mean prediction accuracy comes out to approximately 77.8%. • On fitting the logistic regression model we obtain an intercept of 1.16107 and the following coefficients: array([[-0.60427658, -0.66403848, 1.01115777, -0.17174707, -1.16585296, 0.259884, -0.23246306, -1.86911886, -0.5026541, -1.20096013, -0.16554763, 0.32295026, 1.05840619, 1.78505991, -0.17840222, -0.26499954, -0.73665015, -0.73702936, -0.6300062, -0.63448951, -0.6229809, 1.14153067, -1.35572966, 2.03116262, -1.03586988, 1.41016837, 0.92204911, -0.45429208, 0.38475649, -0.35644322, -0.11705673, -0.16685064, -0.2734763, 0.80221059, 0.99447046, 1.33874149, -0.24631437, -0.14315613, -0.31208634, 0.06669798, -0.01705458, 0.94274675, 0.56312458, 0.97563792]])
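A sketch of the fit that produces the accuracy, intercept, and coefficients quoted above (the max_iter setting is an assumption to ensure convergence):

```python
# Fit producing the accuracy, intercept, and coefficients quoted above.
log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train, y_train)

print(log_reg.score(X_test, y_test))  # mean accuracy (~0.78 on the slide)
print(log_reg.intercept_)             # intercept term
print(log_reg.coef_)                  # one coefficient per input feature
```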
  • 33. 2.1 Confusion Matrix, Classification Report & ROC Curve. Confusion matrix: [[165 28] [52 102]]. AUC score = 77.6. Precision for predicting our target Y variable (leavers) is 75.
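The evaluation on this slide can be reproduced along these lines (the slide shows only the resulting numbers, so this is a sketch, not the original code):

```python
# Confusion matrix, classification report, and AUC for the fitted model.
from sklearn.metrics import confusion_matrix, classification_report, roc_auc_score

y_pred = log_reg.predict(X_test)
y_prob = log_reg.predict_proba(X_test)[:, 1]

print(confusion_matrix(y_test, y_pred))       # e.g. [[165 28] [52 102]]
print(classification_report(y_test, y_pred))  # precision, recall, F1 per class
print(roc_auc_score(y_test, y_prob))          # AUC (~0.776 on the slide)
```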
  • 34. 3. Random Forest Classifier • Random Forest is a machine learning method capable of solving both regression and classification problems. It is a branch of ensemble learning, as it relies on an ensemble of decision trees, aggregating classification (or regression) trees. • Random Forest fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve predictive accuracy and control over-fitting. It can handle a large number of features and is helpful for estimating which of the variables are important in the underlying data being modeled. • Fitting on the same train/test split with the parameters below gives a mean score of approximately 92%. • The scores of the individual splits are shown below, and their mean gives the overall score of the model.
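A sketch of the Random Forest run with default hyper-parameters on the same splits; the fold count and random_state are assumptions:

```python
# Random Forest with default hyper-parameters on the same splits;
# fold count and random_state are assumptions.
rf = RandomForestClassifier(random_state=17)
rf_scores = cross_val_score(rf, X_train, y_train, cv=5, scoring="accuracy")
print(rf_scores)         # score of each split
print(rf_scores.mean())  # overall mean score (~0.92 on the slide)
```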
  • 35. 4. Decision Tree Classifier • A decision tree classifier is a machine learning algorithm used for both classification and regression tasks; it predicts the value of a target variable by learning simple decision rules inferred from the input features. • Decision trees are structured as a hierarchical tree, where each internal node represents a feature or attribute and each branch represents a decision rule based on that attribute. The leaf nodes represent the final predicted outcome or class label. • They are also capable of handling nonlinear relationships between the features and the target variable. • Fitting on the same train/test split with the parameters below gives a score of approximately 87.6%.
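A sketch of the Decision Tree run on the same split; only random_state is set here, which is an assumption:

```python
# Decision Tree on the same train/test split; only random_state is set,
# which is an assumption.
dt = DecisionTreeClassifier(random_state=17)
dt.fit(X_train, y_train)
print(dt.score(X_test, y_test))  # ~0.876 on the slide
```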
  • 36. 5. Grid search for fine-tuning of hyper-parameters. • Grid search works by creating a grid of all possible combinations of the hyper-parameter values specified by the user. It then trains and evaluates a model for each combination and selects the one that yields the best performance on a predefined evaluation metric, such as accuracy, precision, or F1 score. • It systematically explores all combinations, ensuring that the best one is found within the specified search space; however, this exhaustive search can be computationally expensive. 5.1 Grid search for the Random Forest classifier. • Fitting the same train/test split with the above input parameters gives the best hyper-parameters as: Best score = 0.9056670382757339 Best parameters = {'max_features': 1, 'n_estimators': 178}
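A sketch of the grid search for the Random Forest; the parameter grid here is an assumption, chosen simply so that it contains the best values reported above:

```python
# Grid search for the Random Forest; the parameter grid is an assumption
# chosen to contain the best values reported above.
param_grid_rf = {
    "n_estimators": [50, 100, 178, 250],
    "max_features": [1, 2, 4, "sqrt"],
}
grid_rf = GridSearchCV(
    RandomForestClassifier(random_state=17),
    param_grid_rf, cv=5, scoring="accuracy",
)
grid_rf.fit(X_train, y_train)
print(grid_rf.best_score_)
print(grid_rf.best_params_)  # e.g. {'max_features': 1, 'n_estimators': 178}
```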
  • 37. 5.2 Grid search for the Decision Tree classifier. • Fitting the same train/test split with the above input parameters gives the best hyper-parameters as: Best score = 0.8896137963275589 Best parameters = {'criterion': 'entropy', 'max_depth': 13, 'random_state': 17}
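A sketch of the grid search for the Decision Tree; the grid values are assumptions that include the best combination reported above (the reported output suggests random_state was itself placed in the grid, which is unusual but valid):

```python
# Grid search for the Decision Tree; grid values are assumptions containing
# the reported best combination (random_state appears in the grid because
# the slide's output lists it among the best parameters).
param_grid_dt = {
    "criterion": ["gini", "entropy"],
    "max_depth": list(range(3, 20)),
    "random_state": [17],
}
grid_dt = GridSearchCV(DecisionTreeClassifier(), param_grid_dt, cv=5, scoring="accuracy")
grid_dt.fit(X_train, y_train)
print(grid_dt.best_score_)
print(grid_dt.best_params_)  # e.g. {'criterion': 'entropy', 'max_depth': 13, ...}
```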
  • 39. Conclusion: Retention plans. The major indicators of employees leaving include: • Age: Employees in the young age bracket of 25-35 are more likely to leave. Hence, efforts should be made to clearly articulate the company's long-term plan for retention, as well as to provide incentives to upgrade job level. • Monthly Income: People on higher wages are less likely to leave the company. Hence, efforts should be made to gather information from the current local market to determine whether the company is paying competitive monthly wages. • Over Time: People who work overtime are more likely to leave the company. Hence, efforts must be made to scope projects appropriately up front, with adequate support and staffing, so as to reduce overtime. • YearsWithCurrManager: A large number of leavers leave about 6 months after their current managers. Using line-manager details for each employee, the company should determine which managers have experienced the largest numbers of resignations over the past year. Extracting patterns among the employees who have resigned may reveal recurring reasons for leaving, in which case action can be taken accordingly.
  • 40. • DistanceFromHome: Employees who live further from work are more likely to leave the company. Hence, efforts should be made to provide support in the form of transportation for groups of employees living in the same area, or in the form of allowances. Initial screening of candidates based on their home location is not recommended, as it would be regarded as a form of discrimination as long as employees make it to work on time every day. • TotalWorkingYears: More experienced employees are less likely to leave. Employees with between 5 and 8 years of experience should be identified as potentially having a higher risk of leaving. • YearsAtCompany: Longer-tenured employees are less likely to leave. Employees who hit their two-year anniversary should be identified as potentially having a higher risk of leaving. The company should also look deeper into human resources roles to understand which parts of the job people are not satisfied with. Frequent communication and one-on-ones are strongly recommended. While the company doesn't need to worry too much about people who have worked for 2-4 companies, it's still worth paying attention to male employees who have been through more than 5 companies.