
- 1. H2O – The Open Source Math Engine: H2O and Gradient Boosting
- 2. What is gradient boosting? A gbm is a boosted ensemble of decision trees, fitted in a stagewise forward fashion to minimize a loss function; that is, a gbm is a sum of decision trees in which each new tree corrects the errors of the previous forest.
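The "sum of trees" claim can be checked directly. A minimal sketch, assuming scikit-learn's `GradientBoostingRegressor` as a stand-in implementation: the fitted model's prediction is exactly the initial constant fit plus the learning-rate-scaled output of every tree.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.1, size=200)

gbm = GradientBoostingRegressor(n_estimators=50, learning_rate=0.1, max_depth=2)
gbm.fit(X, y)

# Reassemble the prediction by hand: start from the constant initial fit,
# then add each tree's learning-rate-scaled correction in order.
pred = gbm.init_.predict(X).ravel()
for stage in gbm.estimators_:          # each stage holds one regression tree
    pred += gbm.learning_rate * stage[0].predict(X)

assert np.allclose(pred, gbm.predict(X))
```

Each tree in the loop is fitted to the residual errors left by the trees before it, which is the "each new tree corrects the previous forest" idea in code.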
- 3. Why gradient boosting? Performs variable selection during the fitting process • With highly collinear explanatory variables, glm backwards/forwards selection is unstable • Interactions: searches to a specified depth • Captures nonlinearities in the data, e.g. on the airlines on-time performance data, gbm captures the change in 2001 without the analyst having to model it
- 4. Why gradient boosting, continued: Naturally handles unscaled data (unlike glm, particularly with L1/L2 penalties) • Handles ordinal data, e.g. income: [$10k,$20k], ($20k,$40k], ($40k,$100k], ($100k,∞) • Relatively insensitive to long-tailed distributions and outliers
- 5. gradient boosting works well: On the right dataset, gbm classification will outperform both glm and random forest • Demonstrates good performance on varied classification problems • Hugh Miller's team won the KDD Cup 2009 Slow Challenge with gbm as the main model to predict telco customer churn • KDD Cup 2013 Author-Paper Identification Challenge: 3 of the 4 winners incorporated gbm • many Kaggle winners • results at previous employers
- 6. Fitting algorithm (simplified) 1. Initialize K predictors f_k,m=0(x) = 0 2. for m = 1:num_trees a. normalize the current predictions into class probabilities p_k b. for k = 1:num_classes i. compute the pseudo-residual r = y_k – p_k ii. fit a regression tree to targets r with data X iii. for each terminal region, compute the multiplier that minimizes the deviance loss iv. f_k,m+1(x) = f_k,m(x) + region multiplier
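The loop above can be sketched in a few lines. This is a hedged simplification, assuming scikit-learn's `DecisionTreeRegressor` as the base learner, and it uses each tree's plain region-mean prediction rather than the exact deviance-minimizing region multiplier, to keep the sketch short.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbm(X, y, num_classes, num_trees=20, lr=0.5):
    """Simplified multiclass gradient boosting: one tree per class per round."""
    F = np.zeros((len(X), num_classes))           # f_k,m(x), initialized to 0
    for m in range(num_trees):
        # normalize current predictions into class probabilities (softmax)
        P = np.exp(F) / np.exp(F).sum(axis=1, keepdims=True)
        for k in range(num_classes):
            r = (y == k).astype(float) - P[:, k]  # pseudo-residual r = y_k - p_k
            tree = DecisionTreeRegressor(max_depth=2).fit(X, r)
            F[:, k] += lr * tree.predict(X)       # f_k,m+1 = f_k,m + correction
    return F

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)           # toy 2-class problem
F = fit_gbm(X, y, num_classes=2)
acc = (F.argmax(axis=1) == y).mean()
```

Even with stumps of depth 2 and the region-mean shortcut, the staged corrections drive training accuracy well above chance on this toy diagonal boundary.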
- 7. Regression tree, 1: [diagram: successive splits on X1 and X2 partition the feature space into terminal regions R1–R4]
- 8. Regression tree, 2: a 1-level regression tree has 2 terminal nodes; the split decision minimizes squared error. Data (9 observations) and errors:

  X: 1, 1, 1, 2, 2, 2, 3, 4, 4
  R: 0.333, 0.333, 0.333, -0.333, -0.333, -0.333, 0.667, 0.333, -0.333

  | split  | left_sum | right_sum | left_mle | right_mle | left_err | right_err | total_err |
  |--------|----------|-----------|----------|-----------|----------|-----------|-----------|
  | 1 to 2 | 2.00     | -0.33     | 0.67     | -0.06     | 0.00     | 0.98      | 0.98      |
  | 2 to 3 | 1.00     | 0.67      | 0.17     | 0.22      | 1.50     | 0.52      | 2.02      |
  | 3 to 4 | 1.67     | 0.00      | 0.24     | 0.00      | 1.71     | 0.22      | 1.94      |
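The split search on this slide can be sketched directly: for each candidate split, fit the left/right means and keep the split with the smallest total squared error. The data are the slide's 9 observations; the winning split (between X=1 and X=2, total error ≈ 0.98) matches the table.

```python
import numpy as np

X = np.array([1, 1, 1, 2, 2, 2, 3, 4, 4])
R = np.array([0.333, 0.333, 0.333, -0.333, -0.333, -0.333, 0.667, 0.333, -0.333])

def split_error(threshold):
    """Total squared error of a 1-level tree splitting at X <= threshold,
    with each side predicting its mean."""
    left, right = R[X <= threshold], R[X > threshold]
    return (((left - left.mean()) ** 2).sum() +
            ((right - right.mean()) ** 2).sum())

# candidate splits "1 to 2", "2 to 3", "3 to 4"
errors = {t: split_error(t) for t in (1, 2, 3)}
best = min(errors, key=errors.get)   # best split is between X=1 and X=2
```

A real implementation evaluates every candidate threshold of every feature this way, which is one reason gbm fitting is slow.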
- 9. but gbm has pain points: Slow to fit • Slow to predict • Data size limitations: downsampling is often required • Many implementations are single-threaded • Parameters are difficult to understand • Fit with a grid search, choose with a holdout set: interaction depths [1, 5, 10, 15] • trees: [10, 100, 1000, 5000] • learning rate: [0.1, 0.01, 0.001] • this is often an overnight job
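The "fit with searching, choose with holdout" workflow looks like the sketch below. It assumes scikit-learn's `GradientBoostingClassifier` and `GridSearchCV` as stand-ins for whichever implementation is used, and shrinks the slide's overnight-sized grid so the example runs in seconds; the full grid above is what makes this an overnight job.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=0)

# A deliberately tiny version of the slide's grid (depths, trees, learning rate)
grid = {
    "max_depth": [1, 5],          # interaction depth
    "n_estimators": [10, 100],    # number of trees
    "learning_rate": [0.1, 0.01],
}
search = GridSearchCV(GradientBoostingClassifier(random_state=0), grid, cv=3)
search.fit(X, y)
# search.best_params_ holds the chosen depth / tree count / learning rate
```

The cost is multiplicative: every combination is refit from scratch on every fold, so the slide's 4 × 4 × 3 grid means 48 full fits per fold.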
- 10. h2o can help: multicore, distributed, parallel
- 11. Questions?
- 12. gbm intuition: Why should this work well?
- 13. Universe is sparse. Life is messy. Data is sparse & messy. - Lao Tzu
