GBM in H2O – Presentation Transcript

    • H2O – The Open Source Math Engine: H2O and Gradient Boosting
    • What is gradient boosting?
      A gbm is a boosted ensemble of decision trees, fitted in a stagewise forward fashion to minimize a loss function; i.e. a gbm is a sum of decision trees, where each new tree corrects the errors of the previous forest.
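
    A note on the "each new tree corrects the errors of the previous forest" idea: the loop below is a minimal squared-error boosting sketch. It is an illustrative toy built on scikit-learn regression stumps rather than H2O's implementation, and the synthetic data, stump depth, and learning rate are choices made for this example.

        import numpy as np
        from sklearn.tree import DecisionTreeRegressor

        # Toy data: a noisy nonlinear target (illustrative only).
        rng = np.random.RandomState(0)
        X = rng.uniform(0, 10, size=(200, 1))
        y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=200)

        learning_rate = 0.1
        n_trees = 100

        # Start from a constant prediction (the mean minimizes squared error).
        prediction = np.full_like(y, y.mean())
        trees = []

        for m in range(n_trees):
            residual = y - prediction                  # errors of the current "forest"
            tree = DecisionTreeRegressor(max_depth=1)  # a shallow tree (stump)
            tree.fit(X, residual)                      # fit the new tree to the residuals
            prediction += learning_rate * tree.predict(X)
            trees.append(tree)

        # The boosted model is the initial constant plus the shrunken sum of trees.
        print("training MSE:", np.mean((y - prediction) ** 2))
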
    • Why gradient boosting?
      • Performs variable selection during the fitting process (see the sketch below)
        • Highly collinear explanatory variables: glm with backwards/forwards selection is unstable
      • Interactions: will search to a specified depth
      • Captures nonlinearities in the data
        • Example, airlines on-time performance: gbm captures a change in 2001 without the analyst having to model it explicitly
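
    To illustrate the variable-selection point, the sketch below fits a boosted model on synthetic data with one informative feature, a nearly collinear copy of it, and pure noise columns, then prints split-based importances. It uses scikit-learn rather than H2O, and the data-generating process is invented for the example.

        import numpy as np
        from sklearn.ensemble import GradientBoostingClassifier

        rng = np.random.RandomState(0)
        n = 2000
        x1 = rng.normal(size=n)
        x2 = x1 + rng.normal(scale=0.01, size=n)    # nearly collinear copy of x1
        noise = rng.normal(size=(n, 3))             # irrelevant columns
        X = np.column_stack([x1, x2, noise])
        y = (x1 + 0.5 * x1 ** 2 > 0.2).astype(int)  # nonlinear function of x1 only

        model = GradientBoostingClassifier(n_estimators=200, max_depth=2).fit(X, y)

        # Features that are rarely chosen for splits get little or no importance,
        # so the noise columns are effectively deselected during fitting.
        print(np.round(model.feature_importances_, 3))
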
    • Why gradient boosting, more
      • Naturally handles unscaled data (unlike glm, particularly with L1/L2 penalties)
      • Handles ordinal data, e.g. income: [$10k,$20k], ($20k,$40k], ($40k,$100k], ($100k,inf) (see the encoding sketch below)
      • Relatively insensitive to long-tailed distributions and outliers
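
    One common way to use the ordinal-data point is to encode the ordered bins as ordered integers, so a single tree split such as income_rank <= 1 separates low bins from high bins. The encoding and the tiny frame below are illustrative choices, not something prescribed by the slide.

        import pandas as pd

        # Ordered income bins as they might appear in raw data.
        bins = ["[10k,20k]", "(20k,40k]", "(40k,100k]", "(100k,inf)"]
        df = pd.DataFrame(
            {"income_bin": ["(20k,40k]", "[10k,20k]", "(100k,inf)", "(40k,100k]"]}
        )

        # Map each bin to its rank; a tree can then split on income_rank <= 1, etc.,
        # respecting the ordering without any rescaling of the raw values.
        rank = {b: i for i, b in enumerate(bins)}
        df["income_rank"] = df["income_bin"].map(rank)
        print(df)
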
    • Gradient boosting works well
      On the right dataset, gbm classification will outperform both glm and random forest. It demonstrates good performance on various classification problems:
      • Hugh Miller, team leader, winner of the KDD Cup 2009 Slow Challenge: gbm was the main model used to predict telco customer churn
      • KDD Cup 2013, Author-Paper Identification Challenge: 3 of the 4 winners incorporated gbm
      • Many Kaggle winners
      • Results at previous employers
    • Inference algorithm (simplified) (a sketch follows this list)
      1. Initialize the K class predictors f_k,m=0(x)
      2. For m = 1:num_trees
         a. Normalize the current predictions to class probabilities p_k
         b. For k = 1:num_classes
            i.   Compute the pseudo-residual r = y - p_k
            ii.  Fit a regression tree to targets r with data X
            iii. For each terminal region, compute the multiplier that minimizes the deviance loss
            iv.  f_k,m+1(x) = f_k,m(x) + region multiplier
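
    A sketch of this loop, assuming the loss is the multinomial deviance, so the terminal-region multiplier is the usual one-step Newton estimate (K-1)/K * sum(r) / sum(|r|(1-|r|)). It uses scikit-learn regression trees for illustration; the function names and defaults are mine, not H2O's.

        import numpy as np
        from sklearn.tree import DecisionTreeRegressor

        def softmax(F):
            """Row-wise softmax: turn the K score columns into class probabilities."""
            e = np.exp(F - F.max(axis=1, keepdims=True))
            return e / e.sum(axis=1, keepdims=True)

        def fit_gbm_multiclass(X, y, num_trees=50, num_classes=3,
                               learn_rate=0.1, max_depth=3):
            """y holds integer class labels 0..num_classes-1."""
            n = X.shape[0]
            Y = np.eye(num_classes)[y]       # one-hot targets y_k
            F = np.zeros((n, num_classes))   # 1. initialize the K predictors to 0
            ensemble = []                    # per stage: one (tree, multipliers) per class

            for m in range(num_trees):
                P = softmax(F)                        # 2a. normalize current predictions
                stage = []
                for k in range(num_classes):          # 2b. one tree per class
                    r = Y[:, k] - P[:, k]             # i.  pseudo-residual
                    tree = DecisionTreeRegressor(max_depth=max_depth)
                    tree.fit(X, r)                    # ii. regression tree on the residuals
                    leaves = tree.apply(X)            # terminal region of each observation
                    gamma = {}
                    for leaf in np.unique(leaves):
                        idx = leaves == leaf
                        num = r[idx].sum()
                        den = (np.abs(r[idx]) * (1.0 - np.abs(r[idx]))).sum()
                        # iii. Newton-step multiplier for the multinomial deviance
                        gamma[leaf] = (num_classes - 1) / num_classes * num / (den + 1e-12)
                    # iv. update the class-k predictor by the region multipliers
                    F[:, k] += learn_rate * np.array([gamma[l] for l in leaves])
                    stage.append((tree, gamma))
                ensemble.append(stage)
            return ensemble, F
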
    • Regression tree, 1
      [Figure: the (X1, X2) feature space partitioned by axis-aligned splits into terminal regions R1-R4]
    • Regression tree, 2
      A 1-level regression tree: 2 terminal nodes; split decision: minimize squared error.

      Data (9 observations) and residual targets:
        X:  1      1      1       2       2       2      3      4       4
        R:  0.333  0.333  0.333  -0.333  -0.333  -0.333  0.667  0.333  -0.333

      Candidate splits and errors:
        split   left_sum  right_sum  left_mle  right_mle  left_err  right_err  total_err
        1 to 2      2.00      -0.33      0.67      -0.06      0.00       0.98       0.98
        2 to 3      1.00       0.67      0.17       0.22      1.50       0.52       2.02
        3 to 4      1.67       0.00      0.24       0.00      1.71       0.22       1.94
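
    The table above can be read as a split search: for each candidate split point, compute the left and right node estimates and the resulting errors, then keep the split with the smallest total. The sketch below follows the plain squared-error criterion on the slide's 9 observations; its output may not match the slide's table exactly, since the slide's column definitions are not fully spelled out.

        import numpy as np

        # The 9 observations and residual targets from the slide.
        X = np.array([1, 1, 1, 2, 2, 2, 3, 4, 4], dtype=float)
        R = np.array([0.333, 0.333, 0.333, -0.333, -0.333, -0.333,
                      0.667, 0.333, -0.333])

        print("split  left_mean  right_mean  left_err  right_err  total_err")
        for label, split in [("1 to 2", 1.5), ("2 to 3", 2.5), ("3 to 4", 3.5)]:
            left, right = R[X < split], R[X >= split]
            left_mean, right_mean = left.mean(), right.mean()
            left_err = ((left - left_mean) ** 2).sum()     # squared error, left node
            right_err = ((right - right_mean) ** 2).sum()  # squared error, right node
            print(f"{label}  {left_mean:9.3f}  {right_mean:10.3f}  "
                  f"{left_err:8.3f}  {right_err:9.3f}  {left_err + right_err:9.3f}")

        # The 1-level tree keeps the split with the smallest total squared error.
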
    • ...but gbm has pain points
      • Slow to fit
      • Slow to predict
      • Data size limitations: downsampling is often required
      • Many implementations are single threaded
      • Parameters are difficult to understand
      • Fit with searching, choose with a holdout set (see the sketch after this list):
        • Interaction levels / depths: [1, 5, 10, 15]
        • Trees: [10, 100, 1000, 5000]
        • Learning rate: [0.1, 0.01, 0.001]
        • This is often an overnight job
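
    A sketch of the "fit with searching, choose with holdout" workflow over the grid from the slide. It uses scikit-learn's GradientBoostingClassifier and a synthetic dataset purely for illustration; with the full grid this loop really is slow, which is the slide's point.

        from itertools import product

        from sklearn.datasets import make_classification
        from sklearn.ensemble import GradientBoostingClassifier
        from sklearn.model_selection import train_test_split

        # Toy data standing in for a real problem.
        X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
        X_fit, X_hold, y_fit, y_hold = train_test_split(X, y, test_size=0.3,
                                                        random_state=0)

        # The grid from the slide: depths x tree counts x learning rates.
        depths = [1, 5, 10, 15]
        ntrees = [10, 100, 1000, 5000]
        rates = [0.1, 0.01, 0.001]

        best = None
        for depth, n, rate in product(depths, ntrees, rates):   # 48 fits -- slow
            model = GradientBoostingClassifier(max_depth=depth, n_estimators=n,
                                               learning_rate=rate)
            model.fit(X_fit, y_fit)
            score = model.score(X_hold, y_hold)   # choose with the holdout set
            if best is None or score > best[0]:
                best = (score, depth, n, rate)

        print("best holdout accuracy %.3f with depth=%d, trees=%d, rate=%g" % best)
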
    • h2o can help: multicore, distributed, parallel (see the sketch below)
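
    For comparison, a minimal sketch of fitting a GBM through H2O's Python client. This uses the current H2O-3 API, which may differ from the H2O version this deck describes, and the file path and response column name below are placeholders.

        import h2o
        from h2o.estimators.gbm import H2OGradientBoostingEstimator

        # Start (or connect to) a local H2O cluster; H2O parallelizes tree
        # building across the cluster's cores and nodes.
        h2o.init()

        # Placeholder dataset path and response column -- substitute your own.
        frame = h2o.import_file("path/to/data.csv")
        predictors = [c for c in frame.columns if c != "response"]

        model = H2OGradientBoostingEstimator(ntrees=100, max_depth=5, learn_rate=0.1)
        model.train(x=predictors, y="response", training_frame=frame)

        print(model.model_performance(frame))
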
    • Questions?
    • gbm intuition: why should this work well?
    • Universe is sparse. Life is messy. Data is sparse & messy. - Lao Tzu