Gbm.more GBM in H2O

1. H2O – The Open Source Math Engine
   H2O and Gradient Boosting
2. What is gradient boosting?
   gbm is a boosted ensemble of decision trees, fitted in a stagewise
   forward fashion to minimize a loss function; i.e., a gbm is a sum of
   decision trees, where each new tree corrects the errors of the
   previous forest.
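A minimal sketch of that stagewise loop for squared-error regression (not H2O's implementation; the toy data, tree depth, and learning rate are illustrative assumptions):

```python
# Stagewise gradient boosting for squared-error regression:
# each new shallow tree is fit to the residuals of the current ensemble.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate, num_trees = 0.1, 100
prediction = np.full_like(y, y.mean())    # f_0: constant initial model
trees = []

for m in range(num_trees):
    residual = y - prediction             # negative gradient of squared error
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

def predict(X_new):
    # the gbm is literally a sum of trees plus the initial constant
    return y.mean() + learning_rate * sum(t.predict(X_new) for t in trees)

print("train MSE:", np.mean((y - prediction) ** 2))
```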
3. Why gradient boosting
   - Performs variable selection during the fitting process
     - Highly collinear explanatory variables: glm with backwards/forwards
       selection is unstable (see the sketch below)
   - Interactions: searches to a specified depth
   - Captures nonlinearities in the data
     - e.g., airlines on-time performance: gbm captures a change in 2001
       without the analyst having to model it
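A quick illustration of the variable-selection point, with synthetic data and scikit-learn's gbm standing in for any implementation (nothing here is from the slides): with a near-duplicate predictor, the boosted trees concentrate importance on one copy instead of producing unstable coefficients.

```python
# Two highly collinear predictors: the trees simply split on one of them,
# and the irrelevant feature is never selected.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.01, size=500)   # near-duplicate of x1
x3 = rng.normal(size=500)                    # irrelevant noise feature
X = np.column_stack([x1, x2, x3])
y = 2 * x1 + rng.normal(scale=0.1, size=500)

gbm = GradientBoostingRegressor(n_estimators=100).fit(X, y)
print("feature importances:", gbm.feature_importances_.round(3))
# importance piles onto x1/x2; x3 gets essentially none
```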
4. Why gradient boosting, continued
   - Naturally handles unscaled data (unlike glm, particularly with
     L1/L2 penalties); see the sketch below
   - Handles ordinal data, e.g. income: [$10k, $20k], ($20k, $40k],
     ($40k, $100k], ($100k, inf)
   - Relatively insensitive to long-tailed distributions and outliers
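The unscaled-data point follows from the fact that tree splits depend only on the ordering of a feature's values, so any strictly monotone rescaling leaves the fitted predictions unchanged. A sketch with synthetic data (scikit-learn again standing in for an arbitrary gbm implementation):

```python
# A strictly monotone transform (here log) changes the split thresholds
# but not which samples land on each side, so predictions match.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(2)
X = rng.uniform(1, 100, size=(300, 1))
y = (X[:, 0] > 50).astype(float) + rng.normal(scale=0.05, size=300)

gbm_raw = GradientBoostingRegressor(n_estimators=50, random_state=0).fit(X, y)
gbm_log = GradientBoostingRegressor(n_estimators=50, random_state=0).fit(np.log(X), y)

# should print True: same sample partitions, hence same predictions
print(np.allclose(gbm_raw.predict(X), gbm_log.predict(np.log(X))))
```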
5. Gradient boosting works well
   On the right dataset, gbm classification will outperform both glm and
   random forest. It has demonstrated good performance on a variety of
   classification problems:
   - Hugh Miller, team leader and winner of the KDD Cup 2009 Slow
     Challenge, used gbm as the main model to predict telco customer churn
   - KDD Cup 2013, Author-Paper Identification Challenge: 3 of the 4
     winners incorporated gbm
   - many Kaggle winners
   - results at previous employers
6. Inference algorithm (simplified)
   1. Initialize K predictors f_k,m=0(x)
   2. For m = 1:num_trees
      a. Normalize the current predictions into class probabilities p_k
      b. For k = 1:num_classes
         i.   Compute the pseudo-residual r = y - p_k
         ii.  Fit a regression tree to targets r with data X
         iii. For each terminal region, compute the multiplier that best
              reduces the deviance loss
         iv.  f_k,m+1(x) = f_k,m(x) + region multiplier
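A runnable sketch of those steps for K classes, using scikit-learn regression trees as the base learners. The softmax normalization, the learning rate, and the one-step leaf-multiplier formula (Friedman's multinomial-deviance update) are standard choices the slide leaves implicit:

```python
# Simplified multiclass gradient boosting (multinomial deviance).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbm(X, y, num_classes, num_trees=50, learning_rate=0.1, max_depth=2):
    """y holds integer class labels 0..num_classes-1."""
    n = X.shape[0]
    Y = np.eye(num_classes)[y]              # one-hot targets
    F = np.zeros((n, num_classes))          # 1. initialize K predictors f_k = 0
    ensemble = []
    for m in range(num_trees):              # 2. for m = 1:num_trees
        # a. normalize current predictions into probabilities (softmax)
        P = np.exp(F - F.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)
        stage = []
        for k in range(num_classes):        # b. for k = 1:num_classes
            r = Y[:, k] - P[:, k]           # i.  pseudo-residual r = y - p_k
            tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, r)  # ii.
            # iii. per terminal region, a one-step Newton estimate of the
            # multiplier that best reduces the deviance
            leaf = tree.apply(X)
            gamma = np.zeros(tree.tree_.node_count)
            for node in np.unique(leaf):
                rs = r[leaf == node]
                denom = np.sum(np.abs(rs) * (1 - np.abs(rs)))
                gamma[node] = ((num_classes - 1) / num_classes
                               * rs.sum() / (denom + 1e-12))
            # iv. f_k <- f_k + learning_rate * region multiplier
            F[:, k] += learning_rate * gamma[leaf]
            stage.append((tree, gamma))
        ensemble.append(stage)
    return ensemble
```

Prediction repeats the same sum: accumulate learning_rate * gamma[tree.apply(X_new)] per class over all stages, then apply the same softmax to get class probabilities.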
7. Regression tree, 1
   [Figure: a two-dimensional feature space with axes X1 and X2,
   recursively partitioned into rectangular regions R1-R4 by axis-parallel
   splits at thresholds 2, 7, and 1]
8. Regression tree, 2
   A 1-level regression tree: 2 terminal nodes; the split decision
   minimizes squared error.

   Data (9 observations):
   X: 1      1      1      2       2       2       3      4      4
   R: 0.333  0.333  0.333  -0.333  -0.333  -0.333  0.667  0.333  -0.333

   split   left_sum  right_sum  left_mle  right_mle  left_err  right_err  total_err
   1 to 2  2.00      -0.33      0.67      -0.06      0.00      0.98       0.98
   2 to 3  1.00      0.67       0.17      0.22       1.50      0.52       2.02
   3 to 4  1.67      0.00       0.24      0.00       1.71      0.22       1.94
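The table's split search can be reproduced in a few lines (a sketch; it follows the stated minimize-squared-error rule, so the rounded numbers may differ slightly from the slide's table):

```python
# Exhaustive split search for a 1-level regression tree (a stump):
# predict the mean on each side of a candidate split and keep the
# split with the smallest total squared error.
import numpy as np

X = np.array([1, 1, 1, 2, 2, 2, 3, 4, 4], dtype=float)
R = np.array([0.333, 0.333, 0.333, -0.333, -0.333, -0.333,
              0.667, 0.333, -0.333])

for threshold in (1.5, 2.5, 3.5):   # the "1 to 2", "2 to 3", "3 to 4" splits
    left, right = R[X < threshold], R[X >= threshold]
    left_err = np.sum((left - left.mean()) ** 2)
    right_err = np.sum((right - right.mean()) ** 2)
    print(f"split at {threshold}: means {left.mean():.2f} / {right.mean():.2f},"
          f" total squared error {left_err + right_err:.2f}")
```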
9. but gbm has pain points
   - Slow to fit
   - Slow to predict
   - Data size limitations: downsampling is often required
   - Many implementations are single-threaded
   - Parameters are difficult to understand
   - Fit with a grid search, choose on a holdout set (see the sketch below):
     - interaction levels / depths: [1, 5, 10, 15]
     - trees: [10, 100, 1000, 5000]
     - learning rate: [0.1, 0.01, 0.001]
     - this is often an overnight job
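That grid, written out against a holdout set (a sketch with synthetic data and scikit-learn; running all 48 combinations, some with 5000 deep trees, is exactly why this is often an overnight job):

```python
# Grid search over depth / tree count / learning rate, scored on a holdout.
import itertools
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.normal(size=(2000, 10))
y = (X[:, 0] * X[:, 1] + X[:, 2] > 0).astype(int)
X_fit, X_hold, y_fit, y_hold = train_test_split(
    X, y, test_size=0.3, random_state=0)

grid = itertools.product([1, 5, 10, 15],          # interaction depth
                         [10, 100, 1000, 5000],   # number of trees
                         [0.1, 0.01, 0.001])      # learning rate
best = max(
    (GradientBoostingClassifier(max_depth=d, n_estimators=n,
                                learning_rate=lr)
        .fit(X_fit, y_fit)
        .score(X_hold, y_hold), d, n, lr)
    for d, n, lr in grid)
print("holdout accuracy, depth, trees, rate:", best)
```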
10. h2o can help
    - multicore
    - distributed
    - parallel
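What that looks like with the current h2o Python package (a sketch; the file path, response column name, and parameter values are placeholders, not from the slides):

```python
# H2O fits the GBM in parallel across the cores (or nodes) of the
# cluster that h2o.init() starts or attaches to.
import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator

h2o.init()                                    # start / connect to a cluster
frame = h2o.import_file("train.csv")          # placeholder path
frame["label"] = frame["label"].asfactor()    # placeholder response column

gbm = H2OGradientBoostingEstimator(ntrees=100, max_depth=5, learn_rate=0.1)
gbm.train(x=[c for c in frame.columns if c != "label"],
          y="label", training_frame=frame)
print(gbm.model_performance())
```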
11. Questions?
12. gbm intuition
    Why should this work well?
13. "Universe is sparse. Life is messy. Data is sparse & messy." - Lao Tzu
