HR Analytics: Why are our best and most experienced employees leaving prematurely?
Erik Bebernes
  
Introduction

This project uses a dataset I found on Kaggle, in which a company has been having difficulty retaining its best and most experienced employees. The data frame consists of 14,999 observations of 10 variables, which are:
  
	
  
names(hr)
 [1] "satisfaction_level"    "last_evaluation"       "number_project"
 [4] "average_montly_hours"  "time_spend_company"    "Work_accident"
 [7] "left"                  "promotion_last_5years" "sales"
[10] "salary"
  	
  
	
  
Satisfaction Level – employee's overall job satisfaction level, based on a survey
Last Evaluation – employee's performance score given by their manager
Number of projects – how many projects an employee has been involved in
Average monthly hours – mean hours worked by the employee per month
Time spend company – years the employee has worked for the company
Work accident – binary variable; 1 indicates the employee has had an accident in the workplace
Left – binary variable; 1 indicates the employee has left, 0 that the employee is still at the company
Promotion last 5 years – binary variable signaling whether the employee has been promoted
Sales – categorical variable for job type
Salary – categorical variable (low, medium, high) for how much the employee is paid annually
  	
  
	
  
My approach to this project can be summarized in the following steps:
1.) Clean and structure the data set, including imputing missing values if necessary
2.) Create subsets of the best employees who left and who stayed
3.) Create discrete factor variables and perform association rules analysis
4.) Classify employees through decision tree analysis
5.) Find any significant correlations, and differences in correlations between said subsets
6.) Exploratory visualization analysis in an attempt to explain any discrepancies in correlations
7.) Run a random forest algorithm, as well as a logistic regression, to confirm significant relationships between the variables
8.) Provide conclusions and recommendations for management
  
	
  
	
  
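#stringsAsFactors = TRUE keeps sales and salary as factors (no longer the default from R 4.0 on)
HR_comma_sep <- read.csv("~/Downloads/HR_comma_sep.csv", header = TRUE, stringsAsFactors = TRUE)
View(HR_comma_sep)
hr <- HR_comma_sep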
  
	
  
	
  
	
  
 
	
  
Cleaning and structuring the dataset

At first glance the dataset seems clean, but to make sure I’m going to use the “Amelia” package to identify any missingness.

library(Amelia)
missmap(hr)

[Figure: missmap(hr) output]

This shows that there is no missing data.
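
As a numeric cross-check on the missingness map, the same question can be answered with base R alone. The sketch below is only an illustration on a small made-up data frame (toy_hr and its values are hypothetical, not the Kaggle data); with the real data one would presumably pass hr instead.

```r
# Hypothetical toy data standing in for hr; values are made up for illustration.
toy_hr <- data.frame(
  satisfaction_level = c(0.38, 0.80, NA, 0.72),
  number_project     = c(2L, 5L, 7L, NA),
  left               = c(1L, 1L, 0L, 0L)
)

# Count missing values per column; all zeros would mean no missing data.
na_per_column <- colSums(is.na(toy_hr))
print(na_per_column)

# Single TRUE/FALSE answer for the whole data frame.
any_missing <- any(is.na(toy_hr))
print(any_missing)
```

A result of all zeros from colSums(is.na(hr)) would say the same thing as an empty missingness map.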
  
> str(hr)
'data.frame':	14999 obs. of  10 variables:
 $ satisfaction_level   : num  0.38 0.8 0.11 0.72 0.37 0.41 0.1 0.92 0.89 0.42 ...
 $ last_evaluation      : num  0.53 0.86 0.88 0.87 0.52 0.5 0.77 0.85 1 0.53 ...
 $ number_project       : int  2 5 7 5 2 2 6 5 5 2 ...
 $ average_montly_hours : int  157 262 272 223 159 153 247 259 224 142 ...
 $ time_spend_company   : int  3 6 4 5 3 3 4 5 5 3 ...
 $ Work_accident        : int  0 0 0 0 0 0 0 0 0 0 ...
 $ left                 : int  1 1 1 1 1 1 1 1 1 1 ...
 $ promotion_last_5years: int  0 0 0 0 0 0 0 0 0 0 ...
 $ sales                : Factor w/ 10 levels "accounting","hr",..: 8 8 8 8 8 8 8 8 8 8 ...
 $ salary               : Factor w/ 3 levels "high","low","medium": 2 3 3 2 2 2 2 2 2 2 ...
  
	
  
Subsets

hrbestleft <- hr[which(hr$last_evaluation > .72 & hr$left == 1),]
#employees with high evaluations who left the company

hrbeststay <- hr[which(hr$last_evaluation > .72 & hr$left == 0),]
#employees with high evaluations who stayed at the company
  
	
  
Creating Discrete Variables and Association Rules Analysis

quantile(hr$average_montly_hours, .33)
quantile(hr$average_montly_hours, .67)
hr$Hours_Discrete[hr$average_montly_hours <= 69] <- 'low'
hr$Hours_Discrete[hr$average_montly_hours > 69 & hr$average_montly_hours < 134] <- 'average'
hr$Hours_Discrete[hr$average_montly_hours >= 134] <- 'high'

quantile(hr$satisfaction_level, .33)
quantile(hr$satisfaction_level, .67)
quantile(hr$satisfaction_level, .8)

#note: satisfaction_level is on a 0 to 1 scale (see str(hr)), so cutoffs of 43 and 68 put every row in 'low'; .43 and .68 may have been intended
hr$Sat_Discrete[hr$satisfaction_level <= 43] <- 'low'
hr$Sat_Discrete[hr$satisfaction_level > 43 & hr$satisfaction_level < 68] <- 'average'
hr$Sat_Discrete[hr$satisfaction_level >= 68] <- 'high'
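
One way to avoid hand-typed cutoffs is to feed the quantiles straight into cut(), so the bins are always on the variable's own scale. This is a sketch under assumptions: toy_sat is a made-up vector on the 0 to 1 scale, and the 33%/67% split mirrors the quantile() calls above rather than reproducing the bins used in this analysis.

```r
# Hypothetical satisfaction scores on a 0-1 scale (made up for illustration).
toy_sat <- c(0.10, 0.11, 0.37, 0.38, 0.41, 0.42, 0.72, 0.80, 0.89, 0.92)

# Compute the cutoffs from the data instead of typing them in.
cutoffs <- quantile(toy_sat, probs = c(0.33, 0.67))

# Bin into three labels; -Inf/Inf make sure every value lands in a bin.
sat_discrete <- cut(
  toy_sat,
  breaks = c(-Inf, cutoffs, Inf),
  labels = c("low", "average", "high")
)

print(table(sat_discrete))
```

Because the breaks come from the data, each of the three bins is guaranteed to be populated whenever the two quantiles differ.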
  
	
  
library(arules)
hr$Work_accident <- as.factor(hr$Work_accident)
hr$left <- as.factor(hr$left)
hr$promotion_last_5years <- as.factor(hr$promotion_last_5years)
hr$Hours_Discrete <- as.factor(hr$Hours_Discrete)
hr$Sat_Discrete <- as.factor(hr$Sat_Discrete)
names(hr)
hrassoc <- hr[,c(6,7,8,9,10,11,12)]
rules <- apriori(hrassoc, parameter = list(support = .2, confidence = .7))

#since the majority of employees haven't left, it will be a good idea to reduce support and increase confidence

rules <- apriori(hrassoc, parameter = list(support = .05, confidence = .95))

#still not getting any interesting rules, so I'll make a new dataset with only left =1

hrleft <- hr[which(hrassoc$left==1),]
hrleft <- hrleft[,c(6:12)]
rules <- apriori(hrleft, parameter = list(support = .3, confidence = 1))
inspect(rules)
  
	
  
	
  	
  	
     lhs                                         rhs                support   confidence lift
[1]  {}                                       => {left=1}           1.0000000 1          1
[2]  {}                                       => {Sat_Discrete=low} 1.0000000 1          1
[3]  {salary=medium}                          => {left=1}           0.3688043 1          1
[4]  {salary=medium}                          => {Sat_Discrete=low} 0.3688043 1          1
[5]  {salary=low}                             => {left=1}           0.6082330 1          1
[6]  {salary=low}                             => {Sat_Discrete=low} 0.6082330 1          1
[7]  {Hours_Discrete=high}                    => {left=1}           0.9106693 1          1
[8]  {Hours_Discrete=high}                    => {Sat_Discrete=low} 0.9106693 1          1
[9]  {Work_accident=0}                        => {left=1}           0.9526743 1          1
[10] {Work_accident=0}                        => {Sat_Discrete=low} 0.9526743 1          1
[11] {promotion_last_5years=0}                => {left=1}           0.9946794 1          1
[12] {promotion_last_5years=0}                => {Sat_Discrete=low} 0.9946794 1          1
[13] {left=1}                                 => {Sat_Discrete=low} 1.0000000 1          1
[14] {Sat_Discrete=low}                       => {left=1}           1.0000000 1          1
[15] {salary=medium, Hours_Discrete=high}     => {left=1}           0.3385606 1          1
[16] {salary=medium, Hours_Discrete=high}     => {Sat_Discrete=low} 0.3385606 1          1
[17] {Work_accident=0, salary=medium}         => {left=1}           0.3480818 1          1
[18] {Work_accident=0, salary=medium}         => {Sat_Discrete=low} 0.3480818 1          1
[19] {promotion_last_5years=0, salary=medium} => {left=1}           0.3674041 1          1
[20] {promotion_last_5years=0, salary=medium} => {Sat_Discrete=low} 0.3674041 1          1
[21] {left=1, salary=medium}                  => {Sat_Discrete=low} 0.3688043 1          1
[22] {salary=medium, Sat_Discrete=low}        => {left=1}           0.3688043 1          1
[23] {salary=low, Hours_Discrete=high}        => {left=1}           0.5527863 1          1
[24] {salary=low, Hours_Discrete=high}        => {Sat_Discrete=low} 0.5527863 1          1
[25] {Work_accident=0, salary=low}            => {left=1}           0.5816298 1          1
[26] {Work_accident=0, salary=low}            => {Sat_Discrete=low} 0.5816298 1          1
[27] {promotion_last_5years=0, salary=low}    => {left=1}           0.6043125 1          1
[28] {promotion_last_5years=0, salary=low}    => {Sat_Discrete=low} 0.6043125 1          1
[29] {left=1, salary=low}                     => {Sat_Discrete=low} 0.6082330 1          1
[30] {salary=low, Sat_Discrete=low}           => {left=1}           0.6082330 1          1
  
	
  
Most interesting rules:
1.) Of the people who left, 99% never received a promotion
2.) 95% never had an accident
3.) 60% were low salary
4.) 100% had low job satisfaction
  
	
  
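These rules point to a few relationships between the variables that may help explain why some employees are leaving. Of the employees who left, 99% never received a promotion, 95% never had an accident, and 60% were low salary. The 100% low job satisfaction figure should be read with caution: satisfaction_level is recorded on a 0 to 1 scale, so the cutoffs of 43 and 68 used above place every employee in the 'low' bin, and because this subset contains only employees who left, every rule has a lift of 1. Satisfaction may still be significant in determining leaving vs. staying, so next I’m going to look at correlations between satisfaction and the numeric variables.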
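
For readers unfamiliar with the three columns in the rules table, they can be computed by hand. The sketch below uses a tiny made-up data frame (toy and its values are hypothetical) and the standard definitions: support is the share of rows matching both sides of the rule, confidence is that share divided by the share matching the left-hand side, and lift is confidence divided by the share matching the right-hand side.

```r
# Hypothetical toy data; not the HR dataset.
toy <- data.frame(
  salary = c("low", "low", "low", "medium", "medium", "high", "low", "medium"),
  left   = c(1, 1, 0, 1, 0, 0, 1, 0)
)

lhs <- toy$salary == "low"   # rule left-hand side:  {salary=low}
rhs <- toy$left == 1         # rule right-hand side: {left=1}

support    <- mean(lhs & rhs)          # P(lhs and rhs)
confidence <- support / mean(lhs)      # P(rhs | lhs)
lift       <- confidence / mean(rhs)   # P(rhs | lhs) / P(rhs)

print(c(support = support, confidence = confidence, lift = lift))
```

A lift of 1 means the left-hand side adds no information about the right-hand side. In a subset where every row has left = 1, mean(rhs) is 1, so any rule pointing at {left=1} necessarily has confidence 1 and lift 1, and its support is simply the share of leavers matching the left-hand side.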
  
	
  
Correlation Analysis

Using all employees in the dataset:

cor(hr[,1:5])
                     satisfaction_level last_evaluation number_project average_montly_hours
satisfaction_level           1.00000000       0.1050212     -0.1429696          -0.02004811
last_evaluation              0.10502121       1.0000000      0.3493326           0.33974180
number_project              -0.14296959       0.3493326      1.0000000           0.41721063
average_montly_hours        -0.02004811       0.3397418      0.4172106           1.00000000
time_spend_company          -0.10086607       0.1315907      0.1967859           0.12775491
                     time_spend_company
satisfaction_level           -0.1008661
last_evaluation               0.1315907
number_project                0.1967859
average_montly_hours          0.1277549
time_spend_company            1.0000000
  
	
  
	
  
	
  
The plot and output above show correlations between the numeric variables for all employees. Managers seem to give higher evaluation scores to employees who work more hours and who have more projects. However, there is a negative correlation between employee satisfaction and number of projects. It should be interesting to see how this compares to correlations using just the best employees.
  
	
  
Correlations using just the best and most experienced employees that left:

> hrbestleft <- hr[which(hr$last_evaluation >= .72 & hr$left == 1),]
> cor(hrbestleft[,1:5])
                     satisfaction_level last_evaluation number_project
satisfaction_level            1.0000000       0.3611564     -0.7370609
last_evaluation               0.3611564       1.0000000     -0.2150533
number_project               -0.7370609      -0.2150533      1.0000000
average_montly_hours         -0.4771749      -0.1261519      0.5217016
time_spend_company            0.6582700       0.3147566     -0.3644283
                     average_montly_hours time_spend_company
satisfaction_level             -0.4771749          0.6582700
last_evaluation                -0.1261519          0.3147566
number_project                  0.5217016         -0.3644283
average_montly_hours            1.0000000         -0.1572702
time_spend_company             -0.1572702          1.0000000
  
	
  
	
  
	
  
There are some very notable differences here, including the massive negative correlation between number of projects and satisfaction level (-0.74) and the large negative correlation between average monthly hours and satisfaction level (-0.48). This probably means that managers are overworking their best employees, which leads to lower satisfaction levels. It’s worth looking at the data visually to see if this is in fact the case. I’ll also run a decision tree analysis, which may serve as a confirmation.
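
Comparing two correlation matrices by eye is error-prone, so it can help to subtract one from the other. The sketch below is an assumption-laden illustration on simulated data (toy, group, hours and satisfaction are all made up, with a negative link deliberately built into group "b"); it is not a rerun of the analysis above.

```r
set.seed(1)  # reproducible toy data

# Hypothetical data: in group "b", satisfaction is built to fall as hours rise.
n <- 200
group <- rep(c("a", "b"), each = n)
hours <- rnorm(2 * n, mean = 200, sd = 30)
satisfaction <- ifelse(
  group == "b",
  -0.01 * (hours - 200) + rnorm(2 * n, sd = 0.1),  # negative link in group b
  rnorm(2 * n, sd = 0.3)                           # no link in group a
)
toy <- data.frame(group, hours, satisfaction)

cor_all <- cor(toy[, c("hours", "satisfaction")])
cor_b   <- cor(toy[toy$group == "b", c("hours", "satisfaction")])

# Negative entries: the correlation is more negative in the subset than overall.
cor_diff <- cor_b - cor_all
print(round(cor_diff, 2))
```

With the real data, the same subtraction could presumably be applied to cor(hrbestleft[,1:5]) and cor(hr[,1:5]), since both matrices share the same rows and columns.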
  
	
  
Interpreting Correlation Differences Visually

Do the best employees work more hours?

Comparing these histograms, it’s clear that employees who score higher on manager evaluations are working considerably more hours than the workforce as a whole.

Do the best employees work on more projects?

Yes, the best employees usually have more projects. For the workforce as a whole there is a downward trend as the number of projects increases, and almost the opposite can be said for the best employees (until you get to 6 projects).

Have the best employees been working at the company for a longer period of time?

Almost all of the best employees have been at the company for at least four years; perhaps this is related to “learning by doing.” It’s also a sufficient amount of time to prove to managers that they are high performing. The dataset as a whole shows an abundance of employees who have been there for 2 and 3 years. Let’s see if anyone is being promoted.

As you can see above, hardly any of the best performing employees have been promoted in the last five years. In fact, it’s only 0.2%. It must be discouraging to these employees to be highly evaluated and not be rewarded for it.

Next I’m going to look at the relationship between job type and salary. Are there noticeable differences in pay between different departments of the company? And how many employees are in each department?

A couple of things I noticed while looking at this graph are that a majority of the good employees are on the low end of the salary spectrum and that most of them are working in sales, support, and technical roles. However, I made the same graph using the dataset as a whole and didn’t see much of a difference, so I’ll put these observations aside for now.

As I mentioned earlier during my association rules analysis, satisfaction is most likely significant in determining why the best employees are leaving. The plot below is an attempt to see that relationship visually, where the green density is the subset of the best employees that left, the red density is the best employees that have stayed, and the blue density is the entire dataset.
  
	
  
	
  
	
  
	
  
library(ggplot2)
p1 <- ggplot() +
  geom_density(data = hrbestleft, aes(satisfaction_level), fill = 'green', alpha = .3) +
  geom_density(data = hrbeststay, aes(satisfaction_level), fill = 'red', alpha = .3) +
  geom_density(data = hr, aes(satisfaction_level), fill = 'blue', alpha = .3) +
  theme_light(base_size = 16) + xlab("Satisfaction Level") + ylab("") +
  ggtitle("Satisfaction Levels of Subsets")
  
	
  
	
  
	
  
The best employees that left (green) are what really stand out here. Many of them have very low satisfaction levels (<.25), then there is a lull, and then there is another group with satisfaction levels greater than .6. It’s difficult to say why this might be. Perhaps there is a difference in how the employees interpret satisfaction. It’s possible that they still enjoyed their job despite being overworked and not being promoted. I think the best way to figure this out is through a decision tree analysis, where those who left will be classified more accurately. But first, I want to combine average monthly hours and satisfaction into a plot. Since I noticed earlier that the good employees that left were working a lot more hours, there should be a strong relationship between the two.
  
	
  
#alpha is a fixed value rather than a mapped variable, so it is set outside aes()
plot6 <- ggplot(hr, aes(satisfaction_level, average_montly_hours, color = left)) +
  geom_point(alpha = .3) + ggtitle("Hours and Satisfaction")
  
	
  
	
  
 
	
  
These distributions are very tight, which tells me that the decision tree will be a great addition to my analysis. The blue box must be underperforming employees, those that have not been working many hours and aren’t that satisfied, whereas the other two blue distributions, judging by the density plots on the previous page, are high performing employees. My next plot is another confirmation of that hypothesis, but this time I’m adding years spent at the company.

The cluster on the right has a lot of employees that have been at the company for a long time. I think the lack of promotions may have something to do with them leaving.
  
	
  
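Decision Tree Analysis

Decision trees are best used on small datasets, so in order to get a few simple rules (and to avoid over-fitting the model) I fit the tree on a small sample of the data. The sample() call below was meant to draw 2%, but because sample() rescales the prob vector to sum to 1, about 6% of the rows (984 of 14,999) end up in the training sample.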
  	
  
	
  
install.packages("party")
library(party)
set.seed(421)
#prob is rescaled to sum to 1, so about 6% of rows get ind == 1 (0.02 / 0.32)
ind <- sample(2, nrow(hr), replace = TRUE, prob = c(0.02,0.3))
traindata <- hr[ind==1,]
testdata <- hr[ind==2,]
form <- left~satisfaction_level+average_montly_hours+time_spend_company+last_evaluation
hrtree <- ctree(form, data = traindata, controls = ctree_control(maxsurrogate = 3))
#confusion table on the training sample
table(predict(hrtree), traindata$left)
plot(hrtree, type = "simple")
?ctree
print(hrtree)
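
The split sizes depend on how sample() treats its prob argument: the weights need not sum to 1 and are rescaled, so c(0.02, 0.3) does not mean "2% versus 30%". The sketch below checks this on a hypothetical row count (n_rows is made up; the seed and the prob vector are copied from the call above).

```r
set.seed(421)

# Same call shape as above, on a hypothetical row count.
n_rows <- 100000
ind <- sample(2, n_rows, replace = TRUE, prob = c(0.02, 0.3))

# sample() rescales prob to sum to 1, so group 1 gets 0.02 / (0.02 + 0.3) of the rows.
expected_share <- 0.02 / (0.02 + 0.3)
observed_share <- mean(ind == 1)

print(c(expected = expected_share, observed = observed_share))
```

If a 2% training sample is what is wanted, prob = c(0.02, 0.98) would express that directly.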
  
	
  
	
  
	
  
Using the variables time spent at company, satisfaction, average monthly hours and last evaluation (what I think are the most important variables based on the visualizations I made), I was able to come up with a few rules that help classify employees into the leaving and staying categories. Here are my key takeaways:
  
1.) Employees with low satisfaction levels who haven't been at the company long (2 years or less) will generally stay.
2.) Employees with low satisfaction levels who have been at the company between 2 and 5 years (more than 2, at most 4) leave.
3.) Employees with high satisfaction levels who have been working for less than or equal to 4 years stay.
4.) High-performing employees with high satisfaction who have been at the company more than 4 years leave when they are working too many hours (more than 216 a month on average).

Decision tree from plot(hrtree, type = "simple"). Each leaf (*) shows its size n and y = (share who stayed, share who left):

1) satisfaction_level ≤ 0.46; p < 0.001
  2) time_spend_company ≤ 4; p < 0.001
    3) time_spend_company ≤ 2; p = 0.001
      4)* n = 21, y = (0.952, 0.048)
    3) time_spend_company > 2
      5)* n = 217, y = (0.258, 0.742)
  2) time_spend_company > 4
    6)* n = 46, y = (0.891, 0.109)
1) satisfaction_level > 0.46
  7) time_spend_company ≤ 4; p < 0.001
    8)* n = 562, y = (0.984, 0.016)
  7) time_spend_company > 4
    9) last_evaluation ≤ 0.8; p < 0.001
      10)* n = 61, y = (0.951, 0.049)
    9) last_evaluation > 0.8
      11) average_montly_hours ≤ 216; p < 0.001
        12)* n = 18, y = (1, 0)
      11) average_montly_hours > 216
        13) time_spend_company ≤ 5; p = 0.001
          14)* n = 37, y = (0.081, 0.919)
        13) time_spend_company > 5
          15)* n = 22, y = (0.273, 0.727)
	
  
This tree classifies 91.5% of the training sample correctly, which is pretty good considering how simple it is. If I were to show management one graph, it would be this one: it identifies clear-cut patterns and confirms much of what I had been hypothesizing in my previous analyses.
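The accuracy figure can be recovered from the tree itself. The sketch below, in base R, copies each leaf's size and class shares from the tree plot and assumes every leaf predicts its majority class; the data frame and its column names are illustrative, not part of the original analysis.

```r
# Terminal nodes as read off the tree plot: size and class shares (stayed, left).
leaves <- data.frame(
  node   = c(4, 5, 6, 8, 10, 12, 14, 15),
  n      = c(21, 217, 46, 562, 61, 18, 37, 22),
  p_stay = c(0.952, 0.258, 0.891, 0.984, 0.951, 1.000, 0.081, 0.273),
  p_left = c(0.048, 0.742, 0.109, 0.016, 0.049, 0.000, 0.919, 0.727)
)

# A leaf predicts its majority class, so it gets n * max(share) cases right.
leaves$correct <- leaves$n * pmax(leaves$p_stay, leaves$p_left)

training_n        <- sum(leaves$n)
training_accuracy <- sum(leaves$correct) / training_n

print(training_n)
print(round(training_accuracy, 3))
```

The result lands close to the 91.5% quoted above. Because it is computed on the same rows the tree was fit to, it is a training accuracy rather than an estimate of performance on new employees.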
  
	
  
Random Forest and Logistic Regression
	
  
Before offering my final advice to management, I want to see how accurately I can predict who is going to leave. An accurate machine learning model will allow the company to focus on specific employees, perhaps offering them a raise or reducing their hours before they decide to leave. First I'm going to try a logistic regression, which estimates, for each observation, the probability that a binary dependent variable equals 1. Any employee with a predicted probability greater than .5 is classified as leaving. Let's see how it goes:
	
  
Logistic Regression:
	
  
#creating a test and training set using dplyr
library(dplyr)
set.seed(142)
train <- sample_frac(hr, .7)
# assumes sample_frac() kept hr's original row names; check this on your dplyr version
sid <- as.numeric(rownames(train))
test <- hr[-sid, ]

# glmmodel is the logistic regression fitted on train; the fitting call is not shown here
fitted.results <- predict(glmmodel, newdata = test, type = "response")
#type = response converts logits to predicted probabilities
new <- mutate(test, fitted.results)
predicted.to.leave <- filter(new, fitted.results > .5)
predicted.to.stay <- filter(new, fitted.results < .5)
View(predicted.to.stay)
summary(predicted.to.stay$left)
summary(predicted.to.leave$left)
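The fit, score and threshold steps can be sketched end to end in base R. This example runs on synthetic data, not the HR dataset: the toy data frame, its coefficients and the model formula are all assumptions made for illustration, and the split uses row indices directly rather than row names.

```r
# Synthetic stand-in for the HR data (assumption: not the real dataset).
set.seed(142)
n <- 2000
toy <- data.frame(
  satisfaction_level = runif(n),
  time_spend_company = sample(2:10, n, replace = TRUE)
)
# Simulated outcome: lower satisfaction and longer tenure raise the odds of leaving.
true_logit <- 1 - 4 * toy$satisfaction_level + 0.2 * toy$time_spend_company
toy$left <- rbinom(n, size = 1, prob = plogis(true_logit))

# 70/30 split by row index.
train_rows <- sample(seq_len(n), size = floor(0.7 * n))
train <- toy[train_rows, ]
test  <- toy[-train_rows, ]

# Fit the logistic regression and score the held-out rows.
toy_model <- glm(left ~ satisfaction_level + time_spend_company,
                 data = train, family = binomial)
prob_leave <- predict(toy_model, newdata = test, type = "response")

# Classify with the 0.5 cutoff and compare with the simulated outcome.
predicted_left <- as.integer(prob_leave > 0.5)
accuracy <- mean(predicted_left == test$left)

print(table(predicted = predicted_left, actual = test$left))
print(round(accuracy, 3))
```

Computing accuracy as the mean of matching labels gives a single number to compare against the other models, which the summary() calls above only provide indirectly.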
  
	
  
The model ended up being only 79.4% accurate, which is okay, but considering the decision tree was 91%, I think I can come up with a better model. A random forest combines the predictions of many decision trees (by majority vote for classification) and can work very well. Let's try that:
	
  
# shuffle the row indices, then use the first two thirds for training and the rest for testing
randindex <- sample(1:dim(hr)[1])
cutpoint2_3 <- floor(2 * dim(hr)[1] / 3)
traindata <- hr[randindex[1:cutpoint2_3], ]
testdata <- hr[randindex[(cutpoint2_3 + 1):dim(hr)[1]], ]
library(randomForest)
rfmodel <- randomForest(factor(left) ~ satisfaction_level + number_project +
                          average_montly_hours + time_spend_company +
                          promotion_last_5years + last_evaluation,
                        data = traindata)

# error rates (out-of-bag and per class) as trees are added
plot9 <- plot(rfmodel, ylim = c(0, 0.36))
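The shuffle-and-cut split used for the random forest can be checked in isolation. This base R sketch uses the 15000 rows quoted in the introduction in place of dim(hr)[1]; the seed is hypothetical, since the original sets none at this point.

```r
n_rows <- 15000                      # row count quoted in the introduction
set.seed(1)                          # hypothetical seed; the original sets none here
randindex   <- sample(seq_len(n_rows))
cutpoint2_3 <- floor(2 * n_rows / 3)

# First two thirds of the shuffled indices train the model, the rest test it.
train_rows <- randindex[1:cutpoint2_3]
test_rows  <- randindex[(cutpoint2_3 + 1):n_rows]

print(length(train_rows))
print(length(test_rows))
print(length(intersect(train_rows, test_rows)))
```

The two sets do not overlap, and the test set size matches the 5000 cases in the confusion matrix reported for the random forest.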
  
	
  
	
  
	
  
In the error plot, the false positive and false negative error rates are very low, which is a good sign. Let's see how accurate the model is when I try it on a test set.
	
  
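library(caret)
prediction <- predict(rfmodel, testdata)
# confusionMatrix() compares two factors, so left is converted to match the predictions
confusionMatrix(prediction, factor(testdata$left))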
Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 3786   48
         1   10 1156

               Accuracy : 0.9884
                 95% CI : (0.985, 0.9912)
    No Information Rate : 0.7592
    P-Value [Acc > NIR] : < 2.2e-16

                  Kappa : 0.9679
 Mcnemar's Test P-Value : 1.184e-06

            Sensitivity : 0.9974
            Specificity : 0.9601
         Pos Pred Value : 0.9875
         Neg Pred Value : 0.9914
             Prevalence : 0.7592
         Detection Rate : 0.7572
   Detection Prevalence : 0.7668
      Balanced Accuracy : 0.9787

       'Positive' Class : 0
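The headline statistics follow directly from the four cell counts. Here is a small base R sketch that recomputes them by hand; the variable names are illustrative, and the counts are copied from the matrix above.

```r
# Counts copied from the confusion matrix (rows = prediction, columns = reference).
cm <- matrix(c(3786, 10, 48, 1156), nrow = 2,
             dimnames = list(Prediction = c("0", "1"), Reference = c("0", "1")))

total    <- sum(cm)
accuracy <- sum(diag(cm)) / total

# The "positive" class is 0 (stayed), so sensitivity describes stayers.
sensitivity <- cm["0", "0"] / sum(cm[, "0"])   # stayers correctly predicted to stay
specificity <- cm["1", "1"] / sum(cm[, "1"])   # leavers correctly predicted to leave

# Cohen's kappa: agreement beyond what the row and column totals give by chance.
expected_agreement <- sum(rowSums(cm) * colSums(cm)) / total^2
cohen_kappa <- (accuracy - expected_agreement) / (1 - expected_agreement)

print(round(c(accuracy = accuracy, sensitivity = sensitivity,
              specificity = specificity, kappa = cohen_kappa), 4))
```

Since the goal is to spot employees who will leave, the specificity of 0.9601 is the figure to watch: it is the share of actual leavers that the model flags.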
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  
	
  
The model is 98.84% accurate on the test set, which will prove to be very beneficial in identifying employees who are likely to leave in the future. Which variables are most important in leaving vs. staying?
	
  
importance(rfmodel)
                      MeanDecreaseGini
satisfaction_level         1226.048093
number_project              665.390311
average_montly_hours        536.922188
time_spend_company          664.193153
promotion_last_5years         4.487941
last_evaluation             430.694068
	
  
According to the random forest model, satisfaction, number of projects and time spent at the company are the three most important variables.
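The ranking is easier to read when the scores are sorted and expressed as shares of the total. This is a minimal base R sketch using the MeanDecreaseGini values printed above; the share column is added for illustration and is not something importance() returns.

```r
# MeanDecreaseGini values copied from the importance(rfmodel) output.
gini <- c(satisfaction_level    = 1226.048093,
          number_project        = 665.390311,
          average_montly_hours  = 536.922188,
          time_spend_company    = 664.193153,
          promotion_last_5years = 4.487941,
          last_evaluation       = 430.694068)

# Sort from most to least important and express each score as a share of the total.
ranked    <- sort(gini, decreasing = TRUE)
share_pct <- round(100 * ranked / sum(ranked), 1)

print(data.frame(MeanDecreaseGini = ranked, share_pct = share_pct))
```

Number of projects and time at the company are nearly tied, and promotion in the last five years contributes almost nothing to the splits. These scores measure how much each variable reduces node impurity in the forest; they are not tests of statistical significance.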
  
	
  
	
  
	
  
	
  
Conclusion and Recommendations
	
  
I very much enjoyed learning more about this dataset. I performed so many types of analyses because retaining a company's best employees is extremely important. High turnover is costly, and if a company wants to grow, it needs the right people leading the way. I've worked for organizations in the past that have had high turnover rates, and while you want underperforming employees to leave, you want your best workers to grow with you.
	
  
What I found most useful in this project were the visualizations, the decision tree and the random forest algorithm. They can all be used in different ways. If management wants a basic understanding of what's going on, I would show them the visuals; if they want to know what patterns are harming them, I would go over the decision tree; and if they want to know which employees will leave in the future, the random forest model would be helpful. Based on all of those, here are the two key points management should know concerning why their best and most experienced employees are leaving prematurely:
	
  
1.) They are being overworked: it's common for managers to take advantage of employees who do a good job by giving them a heavier workload. This is costing the company, because those employees are deciding to leave.
2.) They aren't being promoted: good employees expect to be rewarded. There is a large group of employees with high satisfaction levels who have been at the company for more than four years, but they decided to leave because there isn't any career growth.
	
  
There are a couple of simple, obvious actions management can take. They shouldn't work their best employees more than anyone else, and they should promote those employees after 3 or 4 years. In time, I think they will find that although the company will be less productive in the short run, reducing the turnover rate of their best employees will lead to incremental growth.
	
  
	
  	
  

Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 

HR Analytics Project EEB

HR Analytics: Why are our best and most experienced employees leaving prematurely?
Erik Bebernes

Introduction

This project uses a dataset I found on Kaggle, in which a company has been having difficulty retaining its best and most experienced employees. The data frame consists of 14,999 observations of 10 variables, which are:

names(hr)
 [1] "satisfaction_level"    "last_evaluation"       "number_project"
 [4] "average_montly_hours"  "time_spend_company"    "Work_accident"
 [7] "left"                  "promotion_last_5years" "sales"
[10] "salary"

Satisfaction Level - employee's overall job satisfaction level, based on a survey
Last Evaluation - employee's performance score given by their manager
Number of projects - how many projects an employee has been involved in
Average monthly hours - mean hours worked by the employee per month
Time spend company - years the employee has worked for the company
Work accident - binary variable; 1 indicates the employee has had an accident in the workplace
Left - 1 if the employee has left, 0 if the employee is still at the company
Promotion last 5 years - binary variable signaling whether the employee has been promoted
Sales - categorical variable for job type
Salary - categorical variable (low, medium, high) for how much the employee is paid annually

My approach to this project can be summarized in the following steps:
1.) Clean and structure the data set, including imputing missing values if necessary
2.) Create subsets of the best employees who left and who stayed
3.) Create discrete factor variables and perform association rules analysis
4.) Classify employees through decision tree analysis
5.) Find any significant correlations, and differences in correlations between said subsets
6.) Exploratory visualization analysis in an attempt to explain any discrepancies in correlations
7.) Run a random forest algorithm to confirm significant relationships between the variables, as well as a logistic regression
8.) Provide conclusions and recommendations for management

HR_comma_sep <- read.csv("~/Downloads/HR_comma_sep.csv", header = TRUE)
View(HR_comma_sep)
hr <- HR_comma_sep

Cleaning and structuring the dataset

At first glance the dataset seems clean, but to make sure I'm going to use the "Amelia" package to check for missing values.

library(Amelia)
missmap(hr)

This shows that there is no missing data.

> str(hr)
'data.frame':   14999 obs. of  10 variables:
 $ satisfaction_level   : num  0.38 0.8 0.11 0.72 0.37 0.41 0.1 0.92 0.89 0.42 ...
 $ last_evaluation      : num  0.53 0.86 0.88 0.87 0.52 0.5 0.77 0.85 1 0.53 ...
 $ number_project       : int  2 5 7 5 2 2 6 5 5 2 ...
 $ average_montly_hours : int  157 262 272 223 159 153 247 259 224 142 ...
 $ time_spend_company   : int  3 6 4 5 3 3 4 5 5 3 ...
 $ Work_accident        : int  0 0 0 0 0 0 0 0 0 0 ...
 $ left                 : int  1 1 1 1 1 1 1 1 1 1 ...
 $ promotion_last_5years: int  0 0 0 0 0 0 0 0 0 0 ...
 $ sales                : Factor w/ 10 levels "accounting","hr",..: 8 8 8 8 8 8 8 8 8 8 ...
 $ salary               : Factor w/ 3 levels "high","low","medium": 2 3 3 2 2 2 2 2 2 2 ...

Subsets

hrbestleft <- hr[which(hr$last_evaluation > .72 & hr$left == 1), ]
#employees with high evaluations who left the company

hrbeststay <- hr[which(hr$last_evaluation > .72 & hr$left == 0), ]
#employees with high evaluations who stayed with the company

Creating Discrete Variables and Association Rules Analysis

quantile(hr$average_montly_hours, .33)
quantile(hr$average_montly_hours, .67)
hr$Hours_Discrete[hr$average_montly_hours <= 69] <- 'low'
hr$Hours_Discrete[hr$average_montly_hours > 69 & hr$average_montly_hours < 134] <- 'average'
hr$Hours_Discrete[hr$average_montly_hours >= 134] <- 'high'

quantile(hr$satisfaction_level, .33)
quantile(hr$satisfaction_level, .67)
quantile(hr$satisfaction_level, .8)

hr$Sat_Discrete[hr$satisfaction_level <= 43] <- 'low'
hr$Sat_Discrete[hr$satisfaction_level > 43 & hr$satisfaction_level < 68] <- 'average'
hr$Sat_Discrete[hr$satisfaction_level >= 68] <- 'high'

library(arules)
hr$Work_accident <- as.factor(hr$Work_accident)
hr$left <- as.factor(hr$left)
hr$promotion_last_5years <- as.factor(hr$promotion_last_5years)
hr$Hours_Discrete <- as.factor(hr$Hours_Discrete)
hr$Sat_Discrete <- as.factor(hr$Sat_Discrete)
names(hr)
#keep only the categorical columns (6 to 12) for the rules
hrassoc <- hr[, c(6, 7, 8, 9, 10, 11, 12)]
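
Because satisfaction_level is stored on a 0 to 1 scale (see the str() output), band cutoffs need to be expressed on that same scale. One way to guarantee this is to take the cutoffs straight from quantile() and let cut() assign the bands. The following is only a minimal sketch on synthetic values; the vector `satisfaction` and the name `sat_band` are assumptions for illustration and are not part of the original analysis.

```r
# Synthetic stand-in for satisfaction_level (0-1 scale); NOT the Kaggle data.
set.seed(1)
satisfaction <- round(runif(1000, min = 0.09, max = 1), 2)

# Tercile cutoffs computed from the data instead of typed in by hand.
cuts <- unname(quantile(satisfaction, probs = c(0.33, 0.67)))

# cut() maps each value to a labelled band; -Inf and Inf catch the extremes.
sat_band <- cut(satisfaction,
                breaks = c(-Inf, cuts, Inf),
                labels = c("low", "average", "high"))

# Every band should be populated, roughly 33% / 34% / 33%.
table(sat_band)
```

The same pattern would apply to average_montly_hours; checking table() on the result is a quick way to confirm that no band is empty before running apriori().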
rules <- apriori(hrassoc, parameter = list(support = .2, confidence = .7))
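
The support and confidence arguments passed to apriori() are simple proportions, and lift is derived from them. As a rough illustration, the sketch below computes all three by hand for a single rule, {salary = low} => {left = 1}, on a small made-up data frame. The data frame `toy` and its values are hypothetical and chosen only to keep the arithmetic easy to follow.

```r
# Toy data (hypothetical values, not the HR dataset).
toy <- data.frame(
  salary = c("low", "low", "low", "medium", "medium", "high", "low", "medium"),
  left   = c(1, 1, 0, 0, 1, 0, 1, 0)
)

# Rule under inspection: {salary = low} => {left = 1}
lhs <- toy$salary == "low"
rhs <- toy$left == 1

support    <- mean(lhs & rhs)         # share of all rows matching both sides
confidence <- support / mean(lhs)     # share of lhs rows that also match rhs
lift       <- confidence / mean(rhs)  # confidence relative to the base rate of rhs

c(support = support, confidence = confidence, lift = lift)
```

A lift close to 1 means the left-hand side tells you nothing beyond the base rate. In particular, if every row in a dataset has left = 1, any rule predicting left = 1 has confidence 1 and lift 1 regardless of its left-hand side.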
#since the majority of employees haven't left, it will be a good idea to reduce support and increase confidence

rules <- apriori(hrassoc, parameter = list(support = .05, confidence = .95))

#still not getting any interesting rules, so I'll make a new dataset with only left = 1

hrleft <- hr[which(hrassoc$left == 1), ]
hrleft <- hrleft[, c(6:12)]
rules <- apriori(hrleft, parameter = list(support = .3, confidence = 1))
inspect(rules)

     lhs                                         rhs                support   confidence lift
[1]  {}                                       => {left=1}           1.0000000 1          1
[2]  {}                                       => {Sat_Discrete=low} 1.0000000 1          1
[3]  {salary=medium}                          => {left=1}           0.3688043 1          1
[4]  {salary=medium}                          => {Sat_Discrete=low} 0.3688043 1          1
[5]  {salary=low}                             => {left=1}           0.6082330 1          1
[6]  {salary=low}                             => {Sat_Discrete=low} 0.6082330 1          1
[7]  {Hours_Discrete=high}                    => {left=1}           0.9106693 1          1
[8]  {Hours_Discrete=high}                    => {Sat_Discrete=low} 0.9106693 1          1
[9]  {Work_accident=0}                        => {left=1}           0.9526743 1          1
[10] {Work_accident=0}                        => {Sat_Discrete=low} 0.9526743 1          1
[11] {promotion_last_5years=0}                => {left=1}           0.9946794 1          1
[12] {promotion_last_5years=0}                => {Sat_Discrete=low} 0.9946794 1          1
[13] {left=1}                                 => {Sat_Discrete=low} 1.0000000 1          1
[14] {Sat_Discrete=low}                       => {left=1}           1.0000000 1          1
[15] {salary=medium, Hours_Discrete=high}     => {left=1}           0.3385606 1          1
[16] {salary=medium, Hours_Discrete=high}     => {Sat_Discrete=low} 0.3385606 1          1
[17] {Work_accident=0, salary=medium}         => {left=1}           0.3480818 1          1
[18] {Work_accident=0, salary=medium}         => {Sat_Discrete=low} 0.3480818 1          1
[19] {promotion_last_5years=0, salary=medium} => {left=1}           0.3674041 1          1
[20] {promotion_last_5years=0, salary=medium} => {Sat_Discrete=low} 0.3674041 1          1
[21] {left=1, salary=medium}                  => {Sat_Discrete=low} 0.3688043 1          1
[22] {salary=medium, Sat_Discrete=low}        => {left=1}           0.3688043 1          1
[23] {salary=low, Hours_Discrete=high}        => {left=1}           0.5527863 1          1
[24] {salary=low, Hours_Discrete=high}        => {Sat_Discrete=low} 0.5527863 1          1
[25] {Work_accident=0, salary=low}            => {left=1}           0.5816298 1          1
[26] {Work_accident=0, salary=low}            => {Sat_Discrete=low} 0.5816298 1          1
[27] {promotion_last_5years=0, salary=low}    => {left=1}           0.6043125 1          1
[28] {promotion_last_5years=0, salary=low}    => {Sat_Discrete=low} 0.6043125 1          1
[29] {left=1, salary=low}                     => {Sat_Discrete=low} 0.6082330 1          1
[30] {salary=low, Sat_Discrete=low}           => {left=1}           0.6082330 1          1

Most interesting rules:
1.) Of the people who left, 99% never received a promotion
2.) 95% never had an accident
3.) 61% were low salary
4.) 100% had low job satisfaction

These rules point to a few relationships between the variables that may explain why some employees are leaving. Of the employees who left, 99% never received a promotion, 95% never had an accident, 61% were low salary and 100% fell in the low job satisfaction band. The last figure needs care: satisfaction_level is recorded on a 0 to 1 scale, so the cutoffs of 43 and 68 used earlier place every employee in the 'low' band, and the 100% reflects that coding rather than the survey scores. I still expect satisfaction to be significant in determining leaving vs. staying, so next I'm going to look at correlations between satisfaction and the numeric variables.

Correlation Analysis

Using all employees in the dataset:

cor(hr[,1:5])
                     satisfaction_level last_evaluation number_project average_montly_hours time_spend_company
satisfaction_level           1.00000000       0.1050212     -0.1429696          -0.02004811         -0.1008661
last_evaluation              0.10502121       1.0000000      0.3493326           0.33974180          0.1315907
number_project              -0.14296959       0.3493326      1.0000000           0.41721063          0.1967859
average_montly_hours        -0.02004811       0.3397418      0.4172106           1.00000000          0.1277549
time_spend_company          -0.10086607       0.1315907      0.1967859           0.12775491          1.0000000

The above plot and output show correlations between the numeric variables for all employees. Managers seem to give higher evaluation scores to employees who work more hours and who have more projects; however, there is a negative correlation between employee satisfaction and number of projects. It should be interesting to see how this compares to correlations using just the best employees.

Correlations using just the best and most experienced employees that left:

> hrbestleft <- hr[which(hr$last_evaluation >= .72 & hr$left == 1), ]
> cor(hrbestleft[,1:5])
                     satisfaction_level last_evaluation number_project average_montly_hours time_spend_company
satisfaction_level            1.0000000       0.3611564     -0.7370609           -0.4771749          0.6582700
last_evaluation               0.3611564       1.0000000     -0.2150533           -0.1261519          0.3147566
number_project               -0.7370609      -0.2150533      1.0000000            0.5217016         -0.3644283
average_montly_hours         -0.4771749      -0.1261519      0.5217016            1.0000000         -0.1572702
time_spend_company            0.6582700       0.3147566     -0.3644283           -0.1572702          1.0000000

There are some very notable differences here, including the large negative correlation (-0.74) between number of projects and satisfaction level and the sizeable negative correlation (-0.48) between average monthly hours and satisfaction level. This suggests that managers are overworking their best employees, which leads to lower satisfaction levels. It's worth looking at the data visually to see if this is in fact the case. I'll also run a decision tree analysis, which may serve as a confirmation.

Interpreting Correlation Differences Visually

Do the best employees work more hours?

Comparing these histograms, it's clear that employees who score higher on manager evaluations are working considerably more hours than the workforce as a whole.

Do the best employees work on more projects?
Yes, the best employees usually have more projects. For the workforce as a whole, the number of employees trends downward as the number of projects increases, and almost the opposite is true for the best employees (until you get to 6 projects).

Have the best employees been working at the company for a longer period of time?

Almost all of the best employees have been at the company for at least four years; perhaps this can be related to "learning by doing." It's also a sufficient amount of time to prove to managers that they are high performing. The dataset as a whole shows that there is an abundance of employees who have been there for 2 and 3 years. Let's see if anyone is being promoted.
As you can see above, hardly any of the best performing employees have been promoted in the last five years. In fact, it's only 0.2%. It must be discouraging to these employees to be highly evaluated and not be rewarded for it.

Next I'm going to look at the relationship between job type and salary. Are there noticeable differences in pay between different departments of the company? And how many employees are in each department?

A couple of things I noticed while looking at this graph are that a majority of the good employees are on the low end of the salary spectrum and that most of them are working in sales, support and technical roles. However, I made the same graph using the dataset as a whole and didn't see much of a difference, so I'll put these observations aside for now.

As I mentioned earlier during my association rules analysis, satisfaction is most likely significant in determining why the best employees are leaving. The plot below is an attempt to see that relationship visually, where the green density is the subset of the best employees that left, the red density is the best employees that have stayed, and the blue density is the entire dataset.
library(ggplot2)
p1 <- ggplot() +
  geom_density(data = hrbestleft, aes(satisfaction_level), fill = 'green', alpha = .3) +
  geom_density(data = hrbeststay, aes(satisfaction_level), fill = 'red', alpha = .3) +
  geom_density(data = hr, aes(satisfaction_level), fill = 'blue', alpha = .3) +
  theme_light(base_size = 16) + xlab("Satisfaction Level") + ylab("") +
  ggtitle("Satisfaction Levels of Subsets")

The best employees that left (green) are what really stand out here. Many of them have very low satisfaction levels (<.25), then there is a lull, and then there is another group with satisfaction levels greater than .6. It's difficult to say why this might be. Perhaps there is a difference in how the employees interpret satisfaction. It's possible that they still enjoyed their job despite being overworked and not being promoted. I think the best way to figure this out is through a decision tree analysis, where those who left will be classified more accurately. But first, I want to combine average monthly hours and satisfaction into a plot. Since I noticed earlier that the good employees that left were working a lot more hours, there should be a strong relationship between the two.

#alpha is a fixed setting, so it goes in geom_point() rather than aes()
plot6 <- ggplot(hr, aes(satisfaction_level, average_montly_hours, color = left)) +
  geom_point(alpha = .3) + ggtitle("Hours and Satisfaction")
These distributions are very tight, which tells me that the decision tree will be a great addition to my analysis. The blue box must be underperforming employees: those who have not been working many hours and aren't that satisfied. The other two blue clusters, judging by the density plots on the previous page, are high-performing employees. My next plot is another confirmation of that hypothesis, but this time I'm adding years spent at the company.
The cluster on the right has a lot of employees who have been at the company for a long time. I think the lack of promotions may have something to do with them leaving.

Decision Tree Analysis

To get a few simple rules (and to avoid over-fitting the model), I trained the tree on a small sample of the data (984 rows, about 6.6%).

install.packages("party")
library(party)
set.seed(421)
#sample() rescales prob to sum to 1, so about 0.02/0.32 = 6.25% of rows get ind == 1
ind <- sample(2, nrow(hr), replace = TRUE, prob = c(0.02, 0.3))
traindata <- hr[ind == 1, ]
testdata <- hr[ind == 2, ]
form <- left ~ satisfaction_level + average_montly_hours + time_spend_company + last_evaluation
hrtree <- ctree(form, data = traindata, controls = ctree_control(maxsurrogate = 3))
#confusion table on the training sample
table(predict(hrtree), traindata$left)
plot(hrtree, type = "simple")
?ctree
print(hrtree)

The plotted tree, written out as text (y gives the proportions of left = 0 and left = 1 in each leaf):

1) satisfaction_level (p < 0.001)
   <= 0.46: 2) time_spend_company (p < 0.001)
      <= 4: 3) time_spend_company (p = 0.001)
         <= 2: 4) n = 21, y = (0.952, 0.048)
         > 2:  5) n = 217, y = (0.258, 0.742)
      > 4:  6) n = 46, y = (0.891, 0.109)
   > 0.46: 7) time_spend_company (p < 0.001)
      <= 4: 8) n = 562, y = (0.984, 0.016)
      > 4:  9) last_evaluation (p < 0.001)
         <= 0.8: 10) n = 61, y = (0.951, 0.049)
         > 0.8:  11) average_montly_hours (p < 0.001)
            <= 216: 12) n = 18, y = (1, 0)
            > 216:  13) time_spend_company (p = 0.001)
               <= 5: 14) n = 37, y = (0.081, 0.919)
               > 5:  15) n = 22, y = (0.273, 0.727)

Using the variables time spent at company, satisfaction, average monthly hours and last evaluation (what I think are the most important variables based on the visualizations I made), I was able to come up with a few rules that help classify employees into the leaving and staying categories. Here are my key takeaways:
1.) Employees with low satisfaction levels who haven't been at the company long (2 years or less) will generally stay.
2.) Employees with low satisfaction levels who have been at the company more than 2 and up to 4 years leave.
3.) Employees with high satisfaction levels who have been working for less than or equal to 4 years stay.
4.) High performing employees with high satisfaction who have been at the company >4 years leave when they are working too many hours (more than 216 per month).

The tree classifies 91.5% of the training sample correctly, which is pretty good considering how simple it is. If I were to show management one graph it would be this one: it identifies clear-cut patterns and confirms much of what I had been hypothesizing with my previous analyses.

Random Forest and Logistic Regression

Before offering my final advice to management, I want to see how accurately I can predict who is going to leave. An accurate machine learning algorithm will allow the company to focus on specific employees, perhaps offering them a raise or reducing their hours before they decide to leave. First I'm going to try a logistic regression, which estimates the probability of a binary dependent variable for each observation. Any predicted probability greater than .5 will be treated as a prediction that the employee will leave.
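
The logistic regression code in this write-up calls predict() on a fitted object named glmmodel, but the glm() call that creates it is not shown. As a rough sketch of what such a fit and the .5 cutoff could look like, the example below uses synthetic data with made-up coefficients. The data frame `sim`, the model name `glm_sketch` and the two predictors chosen are all assumptions for illustration; they are not the author's actual model.

```r
# Synthetic data with hypothetical coefficients; NOT the HR dataset.
set.seed(142)
n <- 2000
sim <- data.frame(
  satisfaction_level   = runif(n),
  average_montly_hours = round(runif(n, 96, 310))
)
# Assumed relationship: lower satisfaction and longer hours raise the odds of leaving.
logit <- 1 - 4 * sim$satisfaction_level + 0.005 * (sim$average_montly_hours - 200)
sim$left <- rbinom(n, size = 1, prob = plogis(logit))

# 70/30 train/test split by explicit row index.
train_idx <- sample(n, size = floor(0.7 * n))
train <- sim[train_idx, ]
test  <- sim[-train_idx, ]

# Fit the logistic regression on the training rows.
glm_sketch <- glm(left ~ satisfaction_level + average_montly_hours,
                  data = train, family = binomial)

# type = "response" returns probabilities; classify with the 0.5 cutoff.
prob <- predict(glm_sketch, newdata = test, type = "response")
pred <- as.integer(prob > 0.5)

# Share of test rows classified correctly, plus the full confusion table.
accuracy <- mean(pred == test$left)
table(predicted = pred, actual = test$left)
accuracy
```

Splitting by an explicit index vector, as here, does not depend on row names surviving the sampling step, which makes the train and test sets easy to verify as disjoint.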
Let's see how it goes:

Logistic Regression:

#creating a test and training set using dplyr
library(dplyr)
set.seed(142)
train <- sample_frac(hr, .7)
sid <- as.numeric(rownames(train))
test <- hr[-sid, ]

#glmmodel is the logistic regression fitted on train (the glm() call is not shown)
fitted.results <- predict(glmmodel, newdata = test, type = "response")
#type = response converts logits to predicted probabilities
new <- mutate(test, fitted.results)
predicted.to.leave <- filter(new, fitted.results > .5)
predicted.to.stay <- filter(new, fitted.results <= .5)
View(predicted.to.stay)
summary(predicted.to.stay$left)
summary(predicted.to.leave$left)

The model ended up being only 79.4% accurate, which is okay, but considering the decision tree was 91%, I think I can come up with a better model. Random forest works by combining the predictions of many decision trees and can work very well. Let's try that:

randindex <- sample(1:dim(hr)[1])
cutpoint2_3 <- floor(2 * dim(hr)[1] / 3)
traindata <- hr[randindex[1:cutpoint2_3], ]
testdata <- hr[randindex[(cutpoint2_3 + 1):dim(hr)[1]], ]
library(randomForest)
rfmodel <- randomForest(factor(left) ~ satisfaction_level + number_project +
                          average_montly_hours + time_spend_company +
                          promotion_last_5years + last_evaluation,
                        data = traindata)

plot9 <- plot(rfmodel, ylim = c(0, 0.36))

The false positive and false negative errors are very low, which is a good sign. Let's see how accurate the model is when I try it on a test set.

library(caret)
prediction <- predict(rfmodel, testdata)
confusionMatrix(prediction, testdata$left)

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 3786   48
         1   10 1156

               Accuracy : 0.9884
                 95% CI : (0.985, 0.9912)
    No Information Rate : 0.7592
    P-Value [Acc > NIR] : < 2.2e-16

                  Kappa : 0.9679
 Mcnemar's Test P-Value : 1.184e-06

            Sensitivity : 0.9974
            Specificity : 0.9601
         Pos Pred Value : 0.9875
         Neg Pred Value : 0.9914
             Prevalence : 0.7592
         Detection Rate : 0.7572
   Detection Prevalence : 0.7668
      Balanced Accuracy : 0.9787

       'Positive' Class : 0

The model is 98.84% accurate on the test set, which should prove very beneficial in identifying employees who are likely to leave in the future. What variables are most important in leaving vs. staying?

importance(rfmodel)
                      MeanDecreaseGini
satisfaction_level         1226.048093
number_project              665.390311
average_montly_hours        536.922188
time_spend_company          664.193153
promotion_last_5years         4.487941
last_evaluation             430.694068

According to the random forest model, satisfaction, number of projects and time spent at the company are the three most important variables.
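
The headline statistics in the confusion matrix above can be recomputed directly from its four counts, which is a useful sanity check. The sketch below copies those counts into a matrix and derives accuracy, sensitivity and specificity by hand. The object name `cm` is an assumption for illustration, and the positive class is taken to be 0 (stayed), as the output states.

```r
# Counts copied from the confusion matrix above
# (rows = prediction, columns = reference; matrix() fills column by column).
cm <- matrix(c(3786, 10, 48, 1156), nrow = 2,
             dimnames = list(Prediction = c("0", "1"), Reference = c("0", "1")))

# Accuracy: correctly classified rows over all rows.
accuracy <- sum(diag(cm)) / sum(cm)

# With 0 (stayed) as the positive class:
# sensitivity = stayers predicted to stay / all actual stayers
sensitivity <- cm["0", "0"] / sum(cm[, "0"])
# specificity = leavers predicted to leave / all actual leavers
specificity <- cm["1", "1"] / sum(cm[, "1"])

round(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity), 4)
```

Because the positive class is "stayed", the number that matters most for retention is the specificity: about 96% of the employees who actually left were flagged, and 48 leavers were missed.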
  • 18.
Conclusion and Recommendations

I very much enjoyed learning more about this dataset. I performed so many types of analyses because retaining a company’s best employees is extremely important. High turnover is costly, and a company that wants to grow needs the right people leading the way. I’ve worked for organizations in the past that have had high turnover rates, and while you want underperforming employees to leave, you want your best workers to grow with you.

What I found most useful in this project were the visualizations, the decision tree and the random forest algorithm. Each can be used in a different way. If management wants a basic understanding of what’s going on, I would show them the visuals. If they want to know what patterns are harming them, I would go over the decision tree. If they want to know which employees will leave in the future, the random forest model would be helpful. Based on all of those, here are the two key points management should know concerning why their best and most experienced employees are leaving prematurely:

1.) They are being overworked: it’s common for managers to take advantage of employees who do a good job by giving them a heavier workload. This is costing the company, because those employees are deciding to leave.
2.) They aren’t being promoted: good employees expect to be rewarded. There is a large group of employees with high satisfaction levels who have been at the company for more than four years, but they decided to leave because there isn’t any career growth.

There are a couple of simple, obvious actions management can take. They shouldn’t work their best employees harder than anyone else, and they should promote those employees after 3 or 4 years. In time, I think they will find that although the company will be less productive in the short run, reducing turnover among its best employees will lead to incremental growth.