IS6030
NAME: AYANK GUPTA UCID:M12388639
Background: IBM’s HR Analytics
Motivation: To uncover the factors that lead to employee attrition
Goal:
1. Explore the data set using SQL and R
2. Visualize the data in Tableau using an interactive dashboard
3. Build a random forest model that could help us predict employee attrition and identify the
factors leading to it.
Data: IBM's employee attrition data
The data is available at the URL below (Kaggle repository):
https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset/data
Description of the data:
Contains various employee identifiers such as Age, Gender, and ID,
and various metrics such as length of stay in the company and average monthly salary.
In total it has around 37 columns for us to explore and make a little more
meaningful.
PROJECT INDEX
➢ CHAPTER 1: DATA PREPARATION
➢ Performing the completeness check of each variable – examine if missing values are present;
➢ Performing the validity check of each variable – examine if abnormal values are present;
➢ Cleaning the data based on the results of the completeness and validity checks;
➢ Summarizing the distribution of each variable (what tables and figures will you present?)
➢ CHAPTER 2: Descriptive Study (XY plots and correlation studies)
➢ Studying the X-Y plots between the different variables.
➢ Performing various data exploration analyses
➢ CHAPTER 3: Statistical Modelling
➢ Preparing a model to predict the relationship between the independent variables and the dependent
variable
➢ CHAPTER 4: Visualizing Using Tableau
➢ CHAPTER 5: Project Summary (report)
CHAPTER 1: DATA PREPARATION
➢ Data Explanation:
S.No | Column Name | Column Definition | Data Type
1 | Age | Age of the employee | Numeric
2 | Attrition | Whether the employee is still in the company | Categorical
3 | BusinessTravel | Opportunity to travel | Categorical
4 | DailyRate | Daily rate | Numeric
5 | Department | Employee's department | Categorical
6 | DistanceFromHome | Employee's distance from home | Categorical
7 | Education | Level of education | Categorical
8 | EducationField | Field of education | Categorical
10 | EmployeeNumber | Unique employee identifier | Numeric
11 | EnvironmentSatisfaction | Factor for employee satisfaction | Categorical
12 | Gender | Employee gender | Categorical
13 | HourlyRate | Hourly rate | Numeric
14 | JobInvolvement | Involvement in the job | Categorical
15 | JobLevel | Level of the job | Categorical
16 | JobRole | Role in the job | Categorical
17 | JobSatisfaction | Satisfaction score of the employee | Numeric
18 | MaritalStatus | Married or not | Categorical
19 | MonthlyIncome | Monthly income | Categorical
20 | MonthlyRate | Monthly salary | Numeric
21 | NumCompaniesWorked | Number of companies worked at before | Numeric
22 | Over18 | Whether the employee is 18 or older | Categorical
23 | OverTime | Whether the employee works overtime | Numeric
24 | PercentSalaryHike | % salary hike | Categorical
25 | PerformanceRating | Performance rating of the employee | Numeric
26 | RelationshipSatisfaction | Relationship satisfaction rating | Categorical
27 | StandardHours | Standard working hours | Numeric
28 | StockOptionLevel | Stock option level available | Categorical
29 | TotalWorkingYears | # working years | Numeric
30 | TrainingTimesLastYear | # trainings | Numeric
31 | WorkLifeBalance | Work-life balance | Numeric
32 | YearsAtCompany | # years working for the same company | Numeric
33 | YearsInCurrentRole | # years in current role | Numeric
34 | YearsSinceLastPromotion | # years since last promotion | Numeric
35 | YearsWithCurrManager | # years with the current manager | Numeric
➢ Data Normalization:
The data is in good form, as it has all the columns required for analysis and prediction.
The data can be randomly divided into two data sets, i.e. training and test sets, for the prediction
algorithm
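The random split described above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the code used in the project; the function name `train_test_split`, the 30% test fraction, the seed, and the stand-in rows are all assumptions.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in for the real rows: ten hypothetical employee numbers.
employee_numbers = list(range(1, 11))
train, test = train_test_split(employee_numbers)
print(len(train), len(test))
```

Fixing the seed means the same employees land in the test set each time, which makes later model comparisons repeatable.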
➢ Data Cleaning:
1. Performing the completeness check of each variable
a. The whole data set is unique at the EmployeeNumber level.
b. Are there any missing values?
c. Bad columns
All the columns are aptly named, except that I had to create an age bucket column
(above 30 and below 30) to support the planned analysis by age group.
Inconsistency in data types corrected:
I observed that a few of the data types were not consistent
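The checks above (uniqueness at the employee level, missing values, and the age bucket column) could be expressed in SQL roughly as below. This is a sketch run through Python's built-in `sqlite3` module, not the project's actual queries; the table name `employee`, the sample rows, and the choice to put age 30 itself in the "Above 30" bucket are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (EmployeeNumber INTEGER, Age INTEGER, MonthlyIncome INTEGER)"
)
# Made-up rows for illustration only; one income is deliberately missing.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, 41, 6000), (2, 49, 5100), (4, 37, None), (5, 27, 2900)],
)

# Uniqueness: the row count must equal the number of distinct employee numbers.
total, distinct_ids = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT EmployeeNumber) FROM employee"
).fetchone()
is_unique = total == distinct_ids

# Completeness: count the rows where a column is NULL.
missing_income = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE MonthlyIncome IS NULL"
).fetchone()[0]

# Age bucket: the boundary handling (30 counts as 'Above 30') is an assumption.
buckets = conn.execute(
    "SELECT EmployeeNumber, "
    "CASE WHEN Age >= 30 THEN 'Above 30' ELSE 'Below 30' END AS AgeBucket "
    "FROM employee ORDER BY EmployeeNumber"
).fetchall()

print(is_unique, missing_income, buckets)
```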
➢ Using SQL for general statistics, data description and data manipulation
After loading the Excel file into SQL, let's compute some basic statistics.
We will find the statistics of the variables below:
1. YearsWithCurrManager
2. YearsSinceLastPromotion
3. YearsInCurrentRole
4. YearsAtCompany
5. WorkLifeBalance
6. PerformanceRating
7. MonthlyIncome
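Basic statistics for the variables listed above might be computed with SQL aggregates along these lines. This is a sketch using Python's built-in `sqlite3` module; the table name `employee` and the three sample rows are invented for illustration, and only two of the seven columns are shown to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (YearsAtCompany INTEGER, MonthlyIncome INTEGER)")
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(2, 3000), (5, 4000), (10, 8000)],
)

# Column names come from a fixed list, so formatting them into the query is safe here.
columns = ["YearsAtCompany", "MonthlyIncome"]
stats = {}
for col in columns:
    # MIN, MAX and AVG are standard SQL aggregate functions.
    stats[col] = conn.execute(
        f"SELECT MIN({col}), MAX({col}), AVG({col}) FROM employee"
    ).fetchone()

for col, (lowest, highest, mean) in stats.items():
    print(col, lowest, highest, round(mean, 2))
```

Adding the remaining column names to `columns` (and to the table) extends the same loop to all seven variables.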
Note: Contrary to popular belief, females on average are paid more than males.
Note: Another surprise: employees below 30 earn more on average than their more experienced
counterparts
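Group averages like the ones in the two notes above are typically obtained with `GROUP BY`. The sketch below uses Python's built-in `sqlite3` module; the table name, the rows, and the age boundary are assumptions, and the invented values say nothing about the real result. Returning the group size next to each average makes it easier to judge whether a difference rests on enough employees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (Gender TEXT, Age INTEGER, MonthlyIncome INTEGER)")
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Female", 28, 4000), ("Female", 45, 6000), ("Male", 29, 3000), ("Male", 50, 5000)],
)

# Average income and head count per gender.
by_gender = conn.execute(
    "SELECT Gender, COUNT(*), AVG(MonthlyIncome) "
    "FROM employee GROUP BY Gender ORDER BY Gender"
).fetchall()

# Average income and head count per age bucket (30 counted as 'Above 30' by assumption).
by_age = conn.execute(
    "SELECT CASE WHEN Age >= 30 THEN 'Above 30' ELSE 'Below 30' END AS AgeBucket, "
    "COUNT(*), AVG(MonthlyIncome) "
    "FROM employee GROUP BY AgeBucket ORDER BY AgeBucket"
).fetchall()

print(by_gender)
print(by_age)
```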
Now let's move the analysis to R. First, we need to connect the SQL database to R.
Next, let's check the structure of the data.
Finally, let's check the statistical summary of the data set for any discrepancies.
A few basic summaries
Let's look at a few visualizations in R
Creating a machine learning model (random forest) to predict employee attrition
Now use the varImpPlot function to find the most important factors
As we can see, a few important factors in predicting attrition are OverTime, MonthlyIncome,
TotalWorkingYears and JobRole.
Hence we can study these factors in more detail in the Tableau
dashboard
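One of the importance measures that `varImpPlot` in R's randomForest package can plot is the mean decrease in Gini impurity. The sketch below shows the underlying idea for a single split in plain Python: a feature that separates leavers from stayers well reduces impurity more. It is not a random forest and not the project's code; the helper names and the eight toy records are invented for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(feature_values, labels):
    """Impurity decrease from splitting the labels by each distinct feature value."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Impurity after the split is the size-weighted impurity of the groups.
    weighted = sum(len(group) / n * gini(group) for group in groups.values())
    return gini(labels) - weighted


# Made-up toy records, not taken from the data set.
attrition = ["Yes", "Yes", "No", "No", "No", "No", "Yes", "No"]
overtime = ["Yes", "Yes", "No", "No", "No", "Yes", "Yes", "No"]
gender = ["M", "F", "M", "F", "M", "F", "M", "F"]

print("OverTime:", gini_decrease(overtime, attrition))
print("Gender:", gini_decrease(gender, attrition))
```

In a real forest this decrease is accumulated over every split that uses the feature and averaged across all trees, which is what produces the ranking in the plot.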
Learning about the insights using Tableau dashboards.
I tried to make the dashboard completely interactive, so that even a non-technical user could derive
insights from it.
A few of the observations:
1. Most of the employees are from Life Sciences, closely followed by Medical and
Marketing.
a. The fewest employees belong to HR
2. ~16% of the employees in general leave the company per year.
3. Employees above 30 outnumber employees below 30.
a. The largest group is males above 30.
b. The smallest group is females below 30.
In the large interactive boxes above we can also look at various metrics that will be very helpful to
HR, such as:
1. Avg working hours of the selected employees
2. Avg years in the company
3. Avg salary hike
4. Avg salary
Now we select the population that left the company, and we see a drastic change.
If we compare these results with those for the people who stayed in the company, the
difference is clear.
Summary and conclusions of the analysis
The points below help uncover the reasons why employees left the company:
1. The average salary of the employees who left was almost 33% less than that of those who
stayed.
2. The average salary hike of the people who stayed was marginally higher than that of the
people who left.
3. The average working years of the people who stayed were ~3 years more than those of the people who
left
a. This suggests that experienced people are reluctant to switch companies
4. Years with manager: On average, the people who stayed had spent more time with their manager
than those who left
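A comparison such as "almost 33% less" is a relative difference taken against the group that stayed. A minimal sketch of that arithmetic is below; the function name `percent_lower` and the two averages are hypothetical and are not figures from the dashboard.

```python
def percent_lower(left_avg, stayed_avg):
    """How much lower left_avg is than stayed_avg, as a percentage of stayed_avg."""
    return (stayed_avg - left_avg) / stayed_avg * 100.0


# Hypothetical averages chosen only to illustrate the calculation.
example = percent_lower(4000.0, 6000.0)
print(round(example, 1))
```

Using the stayers' average as the denominator matters: the same gap expressed against the leavers' average would give a larger percentage.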
Difficulties faced
1. The assignment coincided with other examinations, so it was hard to find time to
complete it.
2. Mastering Tableau was challenging but worthwhile.
3. Finding the data set was also difficult.
