GBM & Random Forest in H2O

Mark Landry
Presentation Outline

•  Algorithm Background
  o  Decision Trees
  o  Random Forest
  o  Gradient Boosted Machines (GBM)
•  H2O Implementations
  o  Code examples
  o  Description of parameters and general usage
Decision Trees: Concept

•  Separate the data according to a series of questions
  o  Age > 9.5?
•  The questions are found automatically to optimize separation of the data points by the "target"

Example decision tree: predicting survival of Titanic passengers
(Source: Wikimedia, CART tree of Titanic survivors)
Decision Trees: Practical Use

Strengths
•  Non-linear
•  Robust to correlated features
•  Robust to feature distributions
•  Robust to missing values
•  Simple to comprehend
•  Fast to train
•  Fast to score

Weaknesses
•  Poor accuracy
•  Cannot project (extrapolate beyond the training data)
•  Inefficiently fits linear relationships
Improved Decision Trees: Ensembles

•  Bootstrap aggregation (bagging): the basis of Random Forest
  o  Fit many trees against different samples of the data and average them together
•  Boosting: the basis of GBM
  o  Fits consecutive trees where each solves for the net error of the prior trees
(Both ideas are sketched in a short code example below.)
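To make the two ideas concrete, here is a minimal standalone R sketch (not H2O code): it uses the rpart package and synthetic data, so every name and value is illustrative only.

  # Toy bagging vs. boosting with rpart on synthetic data (illustrative only)
  library(rpart)

  set.seed(1)
  df <- data.frame(x1 = runif(500), x2 = runif(500))
  df$y <- 2 * df$x1 + sin(10 * df$x2) + rnorm(500, sd = 0.1)

  ## Bagging: fit many trees to bootstrap samples of the rows, then average them
  bag_matrix <- replicate(50, {
    idx <- sample(nrow(df), replace = TRUE)
    fit <- rpart(y ~ x1 + x2, data = df[idx, ])
    predict(fit, df)
  })
  bag_pred <- rowMeans(bag_matrix)

  ## Boosting: each new tree fits the net error (residual) of the trees so far;
  ## only a damped fraction of its prediction is added to the running solution
  boost_pred <- rep(mean(df$y), nrow(df))
  shrinkage  <- 0.1
  for (m in 1:50) {
    df$resid_y <- df$y - boost_pred
    fit <- rpart(resid_y ~ x1 + x2, data = df)
    boost_pred <- boost_pred + shrinkage * predict(fit, df)
  }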
Random Forest

Conceptual
•  Combine multiple decision trees, each fit to a random sample of the original data
•  Randomly samples
  o  Rows
  o  Columns
•  Reduce variance, with minimal increase in bias

Practical
•  Strengths
  o  Easy to use
    •  Few parameters
    •  Well-established default values for parameters
  o  Robust
  o  Competitive accuracy on most data sets
•  Weaknesses
  o  Slow to score
  o  Lack of transparency
(An example call is sketched below.)
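A hedged sketch of fitting a Random Forest from R with H2O follows; the file names and the "target" response column are placeholders, and parameter defaults can differ across H2O versions.

  library(h2o)
  h2o.init()                               # start (or connect to) a local H2O cluster

  train <- h2o.importFile("train.csv")     # hypothetical training file
  valid <- h2o.importFile("valid.csv")     # hypothetical validation file

  rf <- h2o.randomForest(
    x = setdiff(names(train), "target"),   # predictor columns
    y = "target",                          # response column (placeholder name)
    training_frame = train,
    validation_frame = valid,
    ntrees = 100,                          # number of trees in the forest
    max_depth = 20,                        # maximum depth of each tree
    mtries = -1                            # columns sampled per split (-1 = H2O default)
  )
  h2o.performance(rf, valid)               # validation metrics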
Gradient Boosted Machines (GBM)

Conceptual
•  Boosting: ensemble of weak learners*
•  Fits consecutive trees where each solves for the net loss of the prior trees
•  Results of new trees are applied partially to the entire solution

Practical
•  Strengths
  o  Often the best possible model
  o  Robust
  o  Directly optimizes the cost function
•  Weaknesses
  o  Overfits
    •  Need to find the proper stopping point
  o  Sensitive to noise and extreme values
  o  Several hyper-parameters
  o  Lack of transparency

* the notion of "weak" is being challenged in practice
(An example call is sketched below.)
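A comparable hedged sketch for GBM, reusing the train and valid frames from the Random Forest sketch above; the parameter values are illustrative, not recommendations.

  gbm <- h2o.gbm(
    x = setdiff(names(train), "target"),
    y = "target",
    training_frame = train,
    validation_frame = valid,   # watch validation error to pick the stopping point
    ntrees = 500,               # upper bound on the number of trees
    learn_rate = 0.05,          # fraction of each new tree applied to the solution
    max_depth = 5               # shallow trees keep the learners "weak"
  )
  h2o.performance(gbm, valid)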
Trees in H2O

•  Individual tree fitting is performed in parallel
•  Shared histograms calculate cut-points
•  Greedy search of histogram bins, optimizing squared error
(The histogram granularity is illustrated with the nbins parameter below.)
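As one illustration of the histogram-based split search, the nbins parameter controls how many histogram bins (and therefore candidate cut-points) are considered per column; this sketch assumes the frames from the earlier examples and an H2O version that exposes nbins this way.

  gbm_fine <- h2o.gbm(
    x = setdiff(names(train), "target"),
    y = "target",
    training_frame = train,
    ntrees = 100,
    nbins = 64    # more bins = finer-grained candidate cut-points (illustrative value)
  )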
  
Explore Further through Examples

Prerequisites for following along:
•  I have H2O installed
•  I have R installed
•  I have the H2O World data sets
(A getting-started sketch follows.)
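A hedged getting-started sketch matching the three prerequisites above; the data file name is a placeholder for whichever H2O World data set you downloaded.

  install.packages("h2o")                          # R package for H2O (from CRAN)
  library(h2o)
  h2o.init(nthreads = -1)                          # launch a local cluster on all cores

  data   <- h2o.importFile("h2o_world_train.csv")  # hypothetical local path
  splits <- h2o.splitFrame(data, ratios = 0.75)    # 75/25 train/validation split
  train  <- splits[[1]]
  valid  <- splits[[2]]
  dim(train)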
