HR analytics project
JATIN SAINI
MS BUSINESS ANALYTICS UNIVERSITY OF CINCINNATI
HR ANALYTICS REPORT –
DATA MANAGEMENT PROJECT
Introduction
This report investigates why the best and most experienced employees leave prematurely. We use
exploratory data analysis and multi-factor linear regression to understand the relationship
between the predictor and response variables. After analysing the data, we fit a linear regression
model and check the assumptions of the linear model. The data is divided into 3 classes based on
salary: high, medium and low.
Data Description
In this study we use data available on Kaggle (https://www.kaggle.com/ludobenistant/hr-analytics)
satisfaction_level: Satisfaction Level
last_evaluation: Last Evaluation
number_project: Number of Projects
average_montly_hours: Average Monthly Hours
time_spend_company: Time Spent at the Company
work_accident: Whether they have had a work accident
promotion_last_5years: Whether they have had a promotion in last 5 years
sales: Department (sales)
salary: Salary (high/medium/low)
left: Whether the employee has left (left=1)
Cleaning Dataset
Table 1: Overview of dataset
Figure 1 shows no null or missing values in the data.
Figure 1
From the values in the data table, we infer that the data is already normalised and
was assembled by joining the employee record table, the employee evaluation table and the work
history table. Hence, we do not have to normalise the data.
As seen in the initial data exploration, satisfaction level and last evaluation are on a scale
of 0 to 1, while work accident and promotion record are binary [0,1]. Since this data
does not have employee ID or employee name as attributes, it is difficult to identify
duplicate rows directly. Under the assumption that satisfaction level, last evaluation, number of
projects handled, time spent in the company and salary group cannot all be the same for any 2
employees, we check for duplicates by comparing these columns together.
After removing duplicates based on 9 variables, we find 11,739 distinct rows. This step
eliminates rows that may have entered the dataset through a system error or
some other cause that we are not aware of.
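The duplicate check described above can be sketched as follows. This is a minimal illustration, not the code used for the report: it keys on the five attributes named in the text (the full set of 9 variables is not listed), the function name drop_duplicates and the sample rows are hypothetical, and the column names are taken from the data description.

```python
# Columns assumed to identify an employee, taken from the five attributes
# named in the text. The report's own check used 9 variables.
KEY_COLUMNS = (
    "satisfaction_level",
    "last_evaluation",
    "number_project",
    "time_spend_company",
    "salary",
)


def drop_duplicates(rows, key_columns=KEY_COLUMNS):
    """Keep the first row seen for each distinct combination of key columns."""
    seen = set()
    unique_rows = []
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows


# Hypothetical sample rows: the first two are identical on every key column.
sample = [
    {"satisfaction_level": 0.38, "last_evaluation": 0.53, "number_project": 2,
     "time_spend_company": 3, "salary": "low", "left": 1},
    {"satisfaction_level": 0.38, "last_evaluation": 0.53, "number_project": 2,
     "time_spend_company": 3, "salary": "low", "left": 1},
    {"satisfaction_level": 0.80, "last_evaluation": 0.86, "number_project": 5,
     "time_spend_company": 6, "salary": "medium", "left": 1},
]

deduped = drop_duplicates(sample)
print(len(sample), "rows before,", len(deduped), "rows after")
```

Keeping the first occurrence preserves the original row order, which makes it easy to compare row counts before and after the step.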
We now examine a descriptive summary of the data:
TABLE 2: Descriptive Summary of Variables
Variable                     Min    1st Quartile  Median  Mean  3rd Quartile  Max
Satisfaction Level           0.09   0.44          0.64    0.61  0.82          1
Last Evaluation              0.36   0.56          0.72    0.72  0.87          1
Number of Projects Handled   2      3             4       3.8   5             7
Average Monthly Hours        96     156           200     201   245           310
Time Spent in the Company    2      3             3       3.5   4             10
Work Accident                0      0             0       0.15  0             1
Promoted in last 5 years     0      0             0       0.02  0             1
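A summary of this kind (minimum, quartiles, median, mean, maximum) can be computed per variable as sketched below. The function name describe and the sample values are hypothetical, and the "inclusive" quantile method is an assumption; the report does not state which quantile definition was used.

```python
import statistics


def describe(values):
    """Return min, quartiles, mean and max for one numeric variable."""
    # method="inclusive" treats the data as the full population of interest;
    # this choice is an assumption, not taken from the report.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(values),
        "q3": q3,
        "max": max(values),
    }


# Hypothetical values for number_project, not taken from the dataset.
projects = [2, 3, 4, 5, 7]
summary = describe(projects)
for name, value in summary.items():
    print(f"{name}: {value}")
```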
Next, we look for correlations between variables. A common assumption is that
employees who leave have a lower satisfaction level than employees who stay. We
try to confirm this belief from the data by plotting correlations between variables.
FIGURE 2: The descriptive summary for each variable
The correlation matrix in figure 2 shows that satisfaction_level and work accident are
negatively correlated with employees leaving, while time spent in the company is
positively correlated.
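The sign of such a correlation can be checked with a Pearson coefficient, sketched here in plain Python. The function name pearson and the sample values are hypothetical and chosen only to illustrate a negative relationship between satisfaction_level and left; they are not taken from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: satisfied employees stay (0), dissatisfied ones leave (1).
satisfaction = [0.9, 0.8, 0.4, 0.1]
left = [0, 0, 1, 1]

r = pearson(satisfaction, left)
print(f"correlation between satisfaction_level and left: {r:.2f}")
```

A negative coefficient means the two variables move in opposite directions; its magnitude, between 0 and 1, indicates how strong the linear relationship is.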
Below are paired plots and correlation between satisfaction_level, time spent in the
company, work accident and employees left status.
FIGURE 3
The plot below shows the satisfaction level of employees by salary group and
employment status.
Figure 4: Satisfaction levels for different Salary Groups with Employee Status
Figure 4 indicates that the satisfaction level of employees who stay is greater than 0.5 in
most cases, while employees who leave generally have a satisfaction level below 0.5.
This supports the common assumption that higher employee satisfaction helps
retention.
Another plot that helps us get a sense of the data is the workload plot by
department for employees leaving and staying, shown in figure 5.
Figure 5: Workload for Employees across different Departments
From figure 5 we can see that employees handling more than 4 projects tend to
leave the company, and that this holds across approximately all departments. The
important insight is that people with more than 4 projects have a higher tendency to leave.
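The pattern read from figure 5 can also be checked numerically by comparing leave rates above and below the 4-project threshold within each department. The sketch below is illustrative only: the function name leave_rate_by_workload and the sample rows are hypothetical, and the column names sales, number_project and left follow the data description.

```python
from collections import defaultdict


def leave_rate_by_workload(rows, threshold=4):
    """Fraction of employees who left, per (department, more than `threshold` projects)."""
    counts = defaultdict(lambda: [0, 0])  # key -> [number who left, total]
    for row in rows:
        key = (row["sales"], row["number_project"] > threshold)
        counts[key][0] += row["left"]
        counts[key][1] += 1
    return {key: left / total for key, (left, total) in counts.items()}


# Hypothetical rows, not taken from the dataset.
rows = [
    {"sales": "IT", "number_project": 6, "left": 1},
    {"sales": "IT", "number_project": 5, "left": 1},
    {"sales": "IT", "number_project": 3, "left": 0},
    {"sales": "hr", "number_project": 7, "left": 1},
    {"sales": "hr", "number_project": 2, "left": 0},
    {"sales": "hr", "number_project": 4, "left": 1},
]

rates = leave_rate_by_workload(rows)
for (department, heavy), rate in sorted(rates.items()):
    label = "more than 4 projects" if heavy else "4 projects or fewer"
    print(f"{department}, {label}: {rate:.2f}")
```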
Methodology
From the above data exploration, we can infer that:
1. the higher the satisfaction level of employees, the lower their chances of leaving;
2. a higher number of projects, or a higher workload, leads to higher chances of leaving;
3. employees who have had a work accident have lower chances of leaving.
These inferences can now be expressed in the following equation:
Chances of leaving = 0.35 + 0.04 * Time_Spend_Company - 0.45 * Satisfaction_level - 0.10 * Work_accident
A ‘Chances of leaving’ value of 0.5 or higher indicates a high chance of the employee
leaving the job.
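As a worked example, the fitted equation and the 0.5 cut-off can be applied to individual employees as sketched below. The coefficients are the ones reported above; the function names and the two example employees are hypothetical.

```python
def chances_of_leaving(time_spend_company, satisfaction_level, work_accident):
    """Evaluate the fitted linear model using the coefficients reported above."""
    return (
        0.35
        + 0.04 * time_spend_company
        - 0.45 * satisfaction_level
        - 0.10 * work_accident
    )


def likely_to_leave(score, threshold=0.5):
    """Apply the report's rule: a score of 0.5 or higher means high chances of leaving."""
    return score >= threshold


# Hypothetical employee A: 5 years in the company, very low satisfaction, no accident.
score_a = chances_of_leaving(time_spend_company=5, satisfaction_level=0.1, work_accident=0)
# Hypothetical employee B: 3 years in the company, high satisfaction, no accident.
score_b = chances_of_leaving(time_spend_company=3, satisfaction_level=0.8, work_accident=0)

print(f"A: {score_a:.3f} -> likely to leave: {likely_to_leave(score_a)}")
print(f"B: {score_b:.3f} -> likely to leave: {likely_to_leave(score_b)}")
```

For employee A the score is 0.35 + 0.20 - 0.045 = 0.505, just above the cut-off; for employee B it is 0.35 + 0.12 - 0.36 = 0.11, well below it. Because the model is linear, the score is not bounded to the range 0 to 1 and should be read as an index rather than a probability.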
Note that the best-fitting model does not include the number of projects handled by
employees as a factor in determining whether an employee will leave the company,
and that satisfaction level appears to be the most influential of all the factors.
Inferences
From the analysis conducted, we can infer the following key points:
1. The dataset provided was a combination of 3 different tables, and it would have been
helpful if we had received the data in that form.
2. There is a linear correlation between the chances of an employee leaving and the time
spent in the company, the employee's satisfaction level with the job, and whether the
employee had an accident during their years on the job.
3. Employees handling more than 4 projects have higher chances of leaving the job. The
relationship between projects handled and an employee's chances of leaving is not
linear.
4. Satisfaction level (presented on a scale of 0 to 1) has a strong impact on the chances
of an employee leaving, and this is valid across all salary groups.
5. The general notion that employees with a higher satisfaction level have lower chances of
leaving holds true, as seen from the final model.