Background

Pas de Poisson is a fishing conglomerate headquartered in Montreal, QC. Its fleet is based remotely in two locations: Halifax, NS, and St. John’s, Newfoundland. The St. John’s fleet primarily works the near-shore fishing grounds of Nova Scotia and Newfoundland, within 12 nautical miles of shore. The Halifax fleet, however, has fishing deployments much further offshore, in most cases using U.S. territorial waters in the North Atlantic under the CANAM bilateral agreements.

The entire crew of the St. John’s fleet are Canadian residents. Hiring managers ensure that 90% of the deck hands working on the Halifax fleet are foreign workers, as the labor rate is significantly lower. The Halifax turnover rate is 6 times that of St. John’s because constantly rough weather in the North Atlantic creates exceptionally poor working conditions, although the work pays well.

Executive Summary

The hiring managers of Pas de Poisson sought the guidance of a consulting firm to determine which nationality among the foreign workforce entering Canada would have the highest probability that a judge would approve their appeal to remain, and would subsequently be employable in the country.

Establishing a model to determine which candidates to hire provided exceptional cost-saving opportunities. In the past, if the company was informed that one of its new foreign national workers had not been granted an appeal while that worker was on a fishing deployment (deployments at times last over 45 days), the trawler was forced to return to port. A vessel having to return meant lost revenue opportunity, as it could no longer fish, and unexpected fuel expenses for the return transit. Furthermore, the penalties for knowingly employing an illegal foreign worker were harsh from both the Canadian and U.S. fisheries enforcement agencies.

Data Integrity

• Source: Rattle library
• Name: “Green: Refugee Appeal”
• “Cleaning” steps
  • Used the Transform tab to remove missing and ignored data after comparing the original and “cleaned” OOB error rates. Additionally, the categorical variable “judges” was deemed statistically insignificant for our purposes, so it was omitted, increasing the integrity of the dataset.
• Steps: To fulfill the hiring strategy, we targeted the information in the data (using Rattle, R, and Excel) needed to select future hires based on the probability of an approved appeal.

Forest Model

• Imported the data and rescaled.
• Created a forest model with default options.
• OOB error = 30.62%, Type 1 error = 16.12%, Type 2 error = 65.5%, AUC = 0.644.
• Our business requires more focus on Type 1 error than on Type 2 error.
• Checked the trend of errors and variable importance.
• Created a sample of (35, 35).
• OOB estimate of error rate: 35.83%, Type 1 error rate = 35.02%, Type 2 error rate = 37.77%, AUC = 0.653.
• Error rate increased and Type 1 increased: not good.
• No major change, although Type 2 decreased.
• Look for a better one. Prune the trees at minimum complexity.
• Here trees = 421 and complexity = 0.2913.
• Now, OOB estimate of error rate: 29.32%, AUC = 0.646, Type 1 error = 14.28571%, Type 2 error = 65.55%.
• Type 2 is still large, but we are not much concerned about that.
• Best model so far.

Forest Model (continued)

• Set the importance level of Type 1 and Type 2 error rates by sampling the data (35, 35).
• randomForest(formula = IMO_decision ~ ., data = crs$dataset[crs$sample, c(crs$input, crs$target)], ntree = 421, mtry = 5, sampsize = c(35, 35), importance = TRUE, replace = FALSE, na.action = na.roughfix)
• OOB estimate of error rate: 36.48%, Type 1 error rate = 36.4%, Type 2 error rate = 36.6%.
• OOB increased. Type 1 increased, as expected. Not a good solution.
• Our best solution so far is:
  • 95% CI: 0.5462-0.6554 (DeLong)
  • OOB estimate of error rate: 29.32%, Type 1 error rate = 14.28%, Type 2 error rate = 65.6%.
• Run the evaluation on the test data set to get the final result.
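
The sampsize = c(35, 35) setting above asks randomForest to draw a fixed number of cases from each class for every tree. As a rough illustration of what one such draw looks like, the base R sketch below takes 35 row indices per class without replacement. The data and the helper name balanced_sample are hypothetical, and this is not the package's internal code.

```r
# Hypothetical toy outcome vector: 280 "no" and 120 "yes" (made-up counts).
set.seed(42)
decision <- factor(c(rep("no", 280), rep("yes", 120)))

# Draw n_per_class row indices from each class without replacement,
# mirroring sampsize = c(35, 35) with replace = FALSE for a single tree.
balanced_sample <- function(y, n_per_class) {
  idx_by_class <- split(seq_along(y), y)
  unlist(lapply(idx_by_class, function(idx) sample(idx, n_per_class)),
         use.names = FALSE)
}

idx <- balanced_sample(decision, 35)
table(decision[idx])
```

Balancing the draw this way gives the minority class equal weight in each tree, which fits the pattern reported above: Type 2 error falls while Type 1 error rises.
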
  
	
   	
   	
   	
  	
  
Final Confusion Matrix - Forest Model

Boosting Model

• Run the boosting model with default options.
• OOB estimate of error rate: 21.8%.
• Type 1 error rate is 6.9%, Type 2 error rate is 61.1%. Look for error trends and importance of variables.
• Analysis: success and language are major predictors.
• Training error is high initially, trending downward as the number of iterations increases.
• Try to find the point where the error graph becomes constant.
• The 1’s shown in the graph depict the trend, but the trend changes again beyond iteration 50.
• Build more iterations to determine the trend and the point after which the error rate is constant.
• Analysis: success and language are major predictors.
• Build the model with iterations = 200.
• Analysis: the trend seems clear. After 140 iterations, the error rate graph becomes constant.
• Set the iterations to 140 and continue the boosting model.
• Analysis: OOB error is 21.2%, but Type 2 errors are very large.
• AUC = 68%. Still room for improvement. Set the importance (loss) matrix. We need less Type 2 error.
• Call:
ada(IMO_decision ~ ., data = crs$dataset[crs$train, c(crs$input, crs$target)], control = rpart.control(maxdepth = 30, cp = 0.01, minsplit = 20, xval = 10), parms = list(split = "information", loss = matrix(c(0, 1, 1.5, 0), byrow = TRUE, nrow = 2)), iter = 140)
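
To make the loss matrix in the call above easier to read, the base R sketch below prints it with labels and applies it to a made-up confusion table. It assumes the rpart convention that rows index the true class and columns the predicted class, and it assumes the class order "no", "yes"; neither is stated on the slide. The confusion counts are hypothetical.

```r
# The loss matrix from the call above, with hypothetical row/column labels.
# Assumption: rows are the true class, columns the predicted class,
# and the classes are ordered "no", "yes".
loss <- matrix(c(0, 1, 1.5, 0), byrow = TRUE, nrow = 2,
               dimnames = list(truth = c("no", "yes"),
                               predicted = c("no", "yes")))

# Hypothetical confusion counts (made up for illustration only).
confusion <- matrix(c(60, 5, 20, 15), byrow = TRUE, nrow = 2,
                    dimnames = dimnames(loss))

# Total cost is the element-wise product summed over all cells.
total_cost <- sum(loss * confusion)
total_cost
```

Under these assumptions the 1.5 sits in the cell for a true "yes" predicted as "no", so missed approvals cost 50% more than the opposite mistake. That matches the stated aim of reducing Type 2 error.
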
  	
  
Final Confusion Matrix - Boosting Model

• Analysis: best so far, although Type 2 error is still big.
• Giving more importance doesn’t help.
• No major change in ROC.
Comparison of Models

[Figure panels: Forest Model; Boosting Model]

Conclusion

With the best dataset, there is strong statistical evidence that Czechoslovakia (Exhibit 1) is the nation with the highest probability of winning an appeal, based on data analyzed in MS Excel. Furthermore, Exhibit 2 shows that 29% of all applicants are denied their appeal. The rater, the person who assesses the merit of a case going forward, is correct 81% of the time when he or she predicts an appeal denial; conversely, the rater is correct only 48% of the time when predicting that the judge will grant the appeal. Finally, the data shows that most applicants who seek an appeal have a higher approval probability with the courts in Montreal than in Toronto.

As with the appeal data above, the same inferences can be drawn from the individual judge data. For the judges tree (Exhibit 3), if we assume that the rater predicts success for 33-34% of claimants, 72% of those positive predictions are cases to be heard by judges who ARE NOT Heald, Hugessen, Iacobucci, MacGuigan, Pratte, or Stone. We can infer that Desjardins, Mahoney, Marceau, and Urie ARE the judges with the highest probability of ruling positively on an appeal. Therefore, as Desjardins is from Montreal and rules favorably on Czechoslovakian nationals, it would behoove the company to create a goal-congruent strategy that favors those results.
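
The rater accuracy figures quoted above are row-wise shares of a cross-tabulation: among cases where the rater predicted a given outcome, how often the judge agreed. The base R sketch below shows that calculation on made-up data. The vectors are hypothetical and do not reproduce the 81% and 48% figures.

```r
# Hypothetical toy data (made up): rater's recommendation vs. judge's decision.
rater    <- c("no", "no", "no", "no", "no", "yes", "yes", "yes", "yes")
decision <- c("no", "no", "no", "no", "yes", "yes", "yes", "no", "no")

tab <- table(rater = rater, decision = decision)

# Row-wise proportions: within each rater prediction, the share of each outcome.
agreement <- prop.table(tab, margin = 1)

denial_correct <- unname(agreement["no", "no"])    # predicted denial, was denied
grant_correct  <- unname(agreement["yes", "yes"])  # predicted grant, was granted
c(denial_correct = denial_correct, grant_correct = grant_correct)
```
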
  
Exhibit 1

Appeal Rate by Nation

NATION            APPROVED APPEAL RATE
CZECHOSLOVAKIA    73%
SRI LANKA         36%
EL SALVADOR       36%
ARGENTINA         25%
IRAN              25%
CHINA             22%
BULGARIA          7%
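
The deck says this table was produced in MS Excel. A comparable group-wise rate can be sketched in base R as below, assuming one row per appeal with a nation and a decision column. The data frame and its values are hypothetical and are not the study data.

```r
# Hypothetical toy records (made up; not the study data).
appeals <- data.frame(
  nation   = c("A", "A", "A", "A", "B", "B", "B", "B", "B"),
  decision = c("yes", "yes", "yes", "no", "no", "no", "yes", "no", "no")
)

# Share of approved appeals within each nation.
rate_by_nation <- tapply(appeals$decision == "yes", appeals$nation, mean)

# Present as whole percentages, highest first.
sort(round(100 * rate_by_nation), decreasing = TRUE)
```
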
  
Exhibit 2

Exhibit 3

Predictive Modeling using R

  • 1. Background   Pas  de  Poisson  is  a  fishing  conglomerate  headquartered  in  Montreal,  CN.    The   fleet  is  located  remotely  in  two  loca?ons,  Halifax,  NS  and  St.  John’s   Newfoundland.    The  St.  John’s  fleets  primarily  work  the  near  shore  fishing   grounds  of  Nova  Sco?a  and  Newfoundland  within  12  nau?cal  miles  from   shore.    The  Halifax  loca?ons,  however,  have  fishing  deployments  that  are   located  much  further  offshore,  and  in  most  cases,  using  U.S.  territorial  waters   in  the  North  Atlan?c  under  the  CANAM  bilateral  agreements.       The  en?re  crew  of  the  St.  Johns  fleet  are  Canadian  residents.    Hiring  managers   ensure  that  90%  of  the  deck  hands  working  on  the  Halifax  fleet  are  foreign   workers  as  the  labor  rate  is  significantly  lower  and  the  turnover  rate  is  6  ?mes   the  rate  of  St.  Johns  because  the  weather  is  constantly  rough  in  the  North   Atlan?c  crea?ng  excep?onally  poor  working  condi?ons,  but  paying  well.  
  • 2. Execu?ve  Summary   The  hiring  managers  of  Pas  de  Poissen  sought  the  guidance  of  a  consul?ng   firm  to  determine  which  of  the  na?onality  of  the  foreign  work  force,  entering   Canada,  would  have  the  highest  probability  that  a  judge  would  approve  their   appeal  to  remain,  and  subsequently  be  employable  in  the  country.       Establishing  a  model  to  best  determine  which  candidates  to  hire  provided   excep?onal  cost  saving  opportuni?es.    In  the  past,  if  the  company  was   informed  that  one  of  their  new  foreign  na?onal  workers  was  not  granted  an   appeal,  and  was  ac?vely  on  a  fishing  deployment,  at  ?mes  las?ng  for  over  45   days,  the  trawler  was  forced  to  return  to  port.    A  vessel  having  to  return   equated  to  missed  opportunis?c  revenue,  as  it  could  no  longer  fish,  and   unexpected  fuel  expenses  for  return  transit.    Furthermore,  the  penalty  for   knowing  employing  an  illegal  foreign  worker  was  harsh  from  both  the   Canadian  and  U.S  fisheries  enforcement  agencies.  
  • 3. Data  Integrity   •  Source:  Ra[le  Library   •  Name:  “Green:  Refugee  Appeal”   •  “Cleaning”  steps   •  Used  transform  tag  to  remove  missing  and  ignored  data  a`er   comparing  the  original  and  “cleaned”  OOB  error  rates.     Addi?onally,  the  categorical  data  “judges”  was  deemed  to  be   sta?s?cally  insignificant  for  our  purposes,  hence  it  was  omi[ed   thus  increasing  the  integrity  of  the  the  dataset.   •  Steps:    In  order  to  fulfill  the  hiring  strategy  we  targeted  informa?on,  from   the  data  (using  ra[le,  R  and  excel),  that  would  serve  to  determine  the   informa?on  necessary  to  depict  future  hires  based  on  the  probability  to   determine  an  approved  appeal.    
  • 4. Forest  Model   !  Imported  the  data  and  Rescaled     !  Created  a  Forest  model  with  default  op?ons   !  OOB  error  =30.62%  ,  Type  1=  16.12  %  and  Type  2  =65.5  %error,  AUC  =  0.644   !  Our  business  requires  more  focus  on  Type  1  error  rather  than  Type  2  error   !  Checked  the  trend  of  errors  and  importance   !  Created  a  sample  of  35,35   !  OOB  es?mate  of    error  rate:  35.83%,  Type  1  error  rate  =  35.02%,  Type  2  error  rate  =     37.77%,  AUC  =  0.653   !  Error  rate  increased,  Type  1  increased-­‐  not  good   !  No  major  change,  although  type  2  decreased   !  Look  for  a  be[er  one.  Prune  the  trees  at  minimum  complexity   !  Here  tree  =  421  and  complexity  =  0.2913   !  Now,  OOB  es?mate  of    error  rate:  29.32%  ,  AUC  =  0.646,  Type  1  error  =  14.28571%,   Type  2  error=  65.55%     !  Type  2  is  s?ll  large  but  we  are  not  much  concerned  about  that.   !  Best  model  so  far  
  • 5. Forest Model
    • Checked the variable importance and the Type 1 and Type 2 error rates with the data sampled at (35,35)
      • randomForest(formula = IMO_decision ~ ., data = crs$dataset[crs$sample, c(crs$input, crs$target)], ntree = 421, mtry = 5, sampsize = c(35, 35), importance = TRUE, replace = FALSE, na.action = na.roughfix)
      • OOB estimate of error rate: 36.48%, Type 1 error rate = 36.4%, Type 2 error rate = 36.6%
      • OOB error increased and Type 1 error increased, as expected. Not a good solution
    • Our best solution so far is:
      • 95% CI for the AUC: 0.5462-0.6554 (DeLong)
      • OOB estimate of error rate: 29.32%, Type 1 error rate = 14.28%, Type 2 error rate = 65.6%
    • Run the evaluation on the test data set to get the final result.
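    As a rough consistency check on the forest results, the overall error rate should be a class-weighted average of the Type 1 and Type 2 rates. The Python sketch below solves that relation for the implied share of the class that the Type 1 rate is measured on. It assumes each triple of rates was computed on the same observations, and the function name is illustrative only. The rates are the ones reported on the two Forest Model slides.

```python
def implied_majority_share(overall, type1, type2):
    """Share w of observations in the class whose error rate is `type1`.

    Solves overall = w * type1 + (1 - w) * type2 for w.
    """
    if type1 == type2:
        raise ValueError("share is not identifiable when both rates are equal")
    return (type2 - overall) / (type2 - type1)


# (overall, Type 1, Type 2) in percent, as reported on the slides.
reported = {
    "default forest": (30.62, 16.12, 65.5),
    "sample size 35,35": (35.83, 35.02, 37.77),
    "421 trees": (29.32, 14.28571, 65.55),
}
shares = {name: implied_majority_share(*rates) for name, rates in reported.items()}
for name, share in shares.items():
    print(f"{name}: implied share = {share:.3f}")
```

    All three runs imply roughly the same split, about 0.71 to 0.29, which suggests the reported rates are internally consistent under this assumption.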
  • 6. Final Confusion Matrix - Forest Model
  • 7. Boosting Model
    • Run the Boosting model with default options
    • OOB estimate of error rate: 21.8%
    • Type 1 error rate is 6.9%, Type 2 error rate is 61.1%. Look for error trends and the importance of variables.
    • Analysis: Success and language are major predictors
    • Training error is high initially and decreases as the number of iterations increases.
    • Try to find the point where the error graph becomes constant.
    • The 1’s shown in the graph depict the trend, but the trend changes again beyond iteration 50.
    • Build more iterations to identify the trend and the point after which the error rate is constant.
    • Build the model with iterations = 200
    • Analysis: The trend seems clear. After 140 iterations, the error rate graph becomes constant.
    • Set the iterations to 140 and continue with the boosting model.
    • Analysis: OOB error is 21.2%, but Type 2 errors are very large.
    • AUC = 68%. Still room for improvement. Set the loss matrix to weight the errors; we need fewer Type 2 errors.
    • Call: ada(IMO_decision ~ ., data = crs$dataset[crs$train, c(crs$input, crs$target)], control = rpart.control(maxdepth = 30, cp = 0.01, minsplit = 20, xval = 10), parms = list(split = "information", loss = matrix(c(0, 1, 1.5, 0), byrow = TRUE, nrow = 2)), iter = 140)
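    The `loss` matrix in the call on this slide penalizes one kind of misclassification 1.5 times as heavily as the other. The Python sketch below shows what such a matrix means for a single decision. It assumes the rpart convention that rows are the actual class and columns the predicted class, and that the second class is the approved appeal. It illustrates minimum-expected-loss classification in general and is not a description of what `ada` does internally.

```python
def min_expected_loss_class(p_class2, loss):
    """Return 1 or 2, whichever predicted class has the lower expected loss.

    loss[i][j] is the cost of predicting class j+1 when the truth is class i+1.
    """
    probs = [1.0 - p_class2, p_class2]
    expected = [sum(probs[i] * loss[i][j] for i in range(2)) for j in range(2)]
    return 1 if expected[0] <= expected[1] else 2


# Same values as matrix(c(0, 1, 1.5, 0), byrow = TRUE, nrow = 2) in the call.
LOSS = [[0.0, 1.0],
        [1.5, 0.0]]

# Break-even probability of class 2: below it, predicting class 1 is cheaper.
threshold = LOSS[0][1] / (LOSS[0][1] + LOSS[1][0])
print(f"predict class 2 when P(class 2) > {threshold:.2f}")
print(min_expected_loss_class(0.45, LOSS), min_expected_loss_class(0.35, LOSS))
```

    Under these assumptions the cutoff moves from 0.5 to 0.4, so more cases are labeled as class 2. That trades some Type 1 errors for fewer Type 2 errors.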
  • 8. Final Confusion Matrix - Boosting Model
    • Analysis: Best so far, although Type 2 error is still large
    • Giving more importance doesn’t help
    • No major change in ROC.
  • 9. Comparison of Models
    Forest Model | Boosting Model
  • 10. Conclusion
    The cleaned dataset shows, with strong statistical significance, that Czechoslovakia (Exhibit 1) is the nation whose nationals have the highest probability of winning an appeal, based on data analyzed in MS Excel. Furthermore, Exhibit 2 shows that 29% of all applicants are denied their appeal. The rater, the person who assesses the merit of a case before it goes forward, is correct 81% of the time when predicting that an appeal will be denied, but only 48% of the time when predicting that the judge will grant it. Finally, the data shows that most applicants who seek an appeal have a higher probability of approval with the courts in Montreal than in Toronto.
    The same kind of inference drawn from the appeal data above can be drawn from the individual judge data. For the judges tree (Exhibit 3), if we assume that the rater predicts success for 33-34% of claimants, 72% of those positive predictions are cases to be heard by judges who ARE NOT Heald, Hugessen, Iacobucci, MacGuigan, Pratte, or Stone. We can infer that Desjardins, Mahoney, Marceau, and Urie ARE the judges with the highest probability of ruling positively on an appeal. Therefore, as Desjardins is from Montreal and rules favorably on Czechoslovakian nationals, it would behoove the company to create a goal-congruent strategy that favors those results.
  • 11. Exhibit 1: Appeal Rate by Nation
    NATION            APPROVED APPEAL RATE
    CZECHOSLOVAKIA    73%
    SRI LANKA         36%
    EL SALVADOR       36%
    ARGENTINA         25%
    IRAN              25%
    CHINA             22%
    BULGARIA          7%