Gradient Boosting
New stuff, possibilities and tricks
Alex Natekin

Such boosting, wow, much learning
Boost our plan for today:
• GBM as of May 2017
• Inside the black box
• Lesser known capabilities
Main GBM libraries:

• LightGBM (Microsoft):
  • Leaf-wise tree growth
  • Histogram-based trees
  • Feature & data parallel split search
  • Common tasks

• XGBoost (tree_method = "hist"; see the sketch below):
  • Regularized tree structure
  • (new) histogram-based trees
  • Feature parallel split search
  • Common tasks + full customization

• H2O:
  • Vanilla + TONS of tweaks
  • Histogram-based optimisation
  • Feature parallel split search
  • Common tasks, some extensions

• CRAN packages:
  • Vanilla
  • Some tree implementations are plain bad
  • As extensible as one wants
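As a rough illustration of the histogram modes above, here is a minimal sketch assuming the Python xgboost and lightgbm packages; the synthetic data and all parameter values are placeholders, not recommendations.

```python
import numpy as np
import xgboost as xgb
import lightgbm as lgb

# Toy data, purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# XGBoost: the fast histogram algorithm is selected via tree_method="hist"
xgb_model = xgb.train(
    {"objective": "binary:logistic", "tree_method": "hist", "max_depth": 6},
    xgb.DMatrix(X, label=y),
    num_boost_round=100,
)

# LightGBM: histogram trees grown leaf-wise; num_leaves caps tree complexity
lgb_model = lgb.train(
    {"objective": "binary", "num_leaves": 31, "learning_rate": 0.1},
    lgb.Dataset(X, label=y),
    num_boost_round=100,
)
```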
Current competition:
Such challenge, wow, much kaggle
Next big boost: ?
GBM as of May 2017

• A lot of implementations: great, good and so-so:
  • Multi-platform solutions outperform everything else: xgboost, lightgbm and h2o
  • There are many niche packages with specialised boosters, losses and tweaks

• GBM benchmarks:
  • https://github.com/szilard/benchm-ml
  • https://medium.com/data-design/exact-xgboost-and-fast-histogram-xgboost-training-speed-comparison-17f95cee68b5
  • https://medium.com/data-design/benchmarking-lightgbm-how-fast-is-lightgbm-vs-xgboost-7b5484746ac4

• The next big thing, GBM on GPU, is currently in active development:
  • https://blog.h2o.ai/2017/05/machine-learning-on-gpus/
  • Xgboost also has its own GPU implementation, but H2O wrapped it under its framework
Inside the black box

• Variable importance
• Partial dependency plots
• Distillation and GBM reconstruction
• GBM variable importance:
  • Mostly implemented as gains and frequencies across splits; don't trust them blindly
  • A better approach for a black box: shuffle each variable and look at the change in loss (see the sketch below)
  • Nice packages: https://github.com/limexp/xgbfir/ + https://github.com/Far0n/xgbfi
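A minimal sketch of the shuffling (permutation) importance idea, assuming a scikit-learn style model; GradientBoostingRegressor and the synthetic data are placeholders for whatever GBM and dataset you actually use, and in practice the shuffling should be done on held-out data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=500, n_features=5, random_state=0)
model = GradientBoostingRegressor().fit(X, y)

baseline = mean_squared_error(y, model.predict(X))
rng = np.random.default_rng(0)
for j in range(X.shape[1]):
    X_perm = X.copy()
    rng.shuffle(X_perm[:, j])  # break the link between feature j and the target
    loss = mean_squared_error(y, model.predict(X_perm))
    # Importance = how much the loss degrades when the feature is shuffled
    print(f"feature {j}: loss increase = {loss - baseline:.3f}")
```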

• Partial dependence plots:
  • Fix all variables at their mean values and plot prediction grids over the chosen variable(s)
  • Useful for overall model validation and for highlighting strong interactions
  • Very useful for validating key features and (chosen) interactions (see the sketch below)
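A minimal sketch of the "fix everything at the mean and sweep one variable" plot described above, assuming matplotlib and a scikit-learn style placeholder model; note that a full partial dependence plot averages predictions over the data rather than plugging in means, so this is the slide's simplified variant.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=5, random_state=0)
model = GradientBoostingRegressor().fit(X, y)

feature = 0                                     # variable to inspect
grid = np.linspace(X[:, feature].min(), X[:, feature].max(), 50)

# Hold every other variable at its mean, sweep the chosen one over the grid
X_grid = np.tile(X.mean(axis=0), (len(grid), 1))
X_grid[:, feature] = grid

plt.plot(grid, model.predict(X_grid))
plt.xlabel(f"feature {feature}")
plt.ylabel("prediction")
plt.title("Dependence plot with other features fixed at their means")
plt.show()
```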

• GBM distillation and reconstruction:
  • Use XGBoost's predict_leaf_indices functionality to encode each sample by the leaves it falls into
  • Refit a sparse linear model on that encoding: Lasso, glmnet, glinternet
  • You can actually refit the whole ensemble this way (see the sketch below)
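A minimal sketch of the leaf-index distillation idea, assuming the Python xgboost package, where leaf indices come from predict(..., pred_leaf=True) (the slide's predict_leaf_indices refers to the same capability); the one-hot encoding plus LassoCV refit stands in for glmnet/glinternet.

```python
import numpy as np
import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import OneHotEncoder

X, y = make_regression(n_samples=500, n_features=5, random_state=0)
dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({"max_depth": 3, "eta": 0.1}, dtrain, num_boost_round=50)

# Leaf index of every sample in every tree: shape (n_samples, n_trees)
leaves = booster.predict(dtrain, pred_leaf=True)

# One-hot encode leaf membership and refit a sparse linear model on it
encoded = OneHotEncoder(handle_unknown="ignore").fit_transform(leaves)
distilled = LassoCV(cv=5).fit(encoded, y)
print("non-zero leaf weights:", np.sum(distilled.coef_ != 0))
```

The non-zero coefficients then point at the leaves (i.e. the rule segments) that actually carry signal, which is the starting point for reconstructing a simpler model.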
Random cool stuff

• Varying tree complexity
• Tuning: discrete random search FTW
• RL: boosting for Minecraft
• You can tweak GBM a lot:
  • Change tree depth across iterations (shallower trees first, deeper ones afterwards); see the sketch below
  • The same applies to other parameters (deeper trees might need more randomness)
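A minimal sketch of changing tree depth across iterations, assuming the Python xgboost package and its xgb_model continuation argument; the stage sizes, depths and subsample value are arbitrary placeholders.

```python
import xgboost as xgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=500, n_features=5, random_state=0)
dtrain = xgb.DMatrix(X, label=y)

# Stage 1: shallow trees for the first iterations
booster = xgb.train({"max_depth": 2, "eta": 0.1}, dtrain, num_boost_round=50)

# Stage 2: keep boosting the same model, now with deeper, more random trees
booster = xgb.train(
    {"max_depth": 6, "eta": 0.1, "subsample": 0.7},
    dtrain,
    num_boost_round=50,
    xgb_model=booster,  # resume from the shallow-tree model
)
print("total trees:", len(booster.get_dump()))
```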

• Tuning GBM:
  • Better to tune the learning rate (alpha / eta / shrinkage) with a fixed number of trees
  • Packages bring extra hyperparameter tweaks; histogram resolution is often a useful one
  • Discrete random search works really well and significantly decreases tuning time (see the sketch below)
  • H2O has it off-the-shelf: https://blog.h2o.ai/2016/06/h2o-gbm-tuning-tutorial-for-r/
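A minimal sketch of discrete random search, assuming the Python xgboost package; the candidate grids, the budget of 10 random draws and the fixed 200 trees are placeholders rather than tuned values.

```python
import random
import numpy as np
import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)
dtrain, dvalid = xgb.DMatrix(X_tr, label=y_tr), xgb.DMatrix(X_va, label=y_va)

# Discrete grids: each hyperparameter only takes a handful of sensible values
grid = {
    "eta": [0.03, 0.1, 0.3],
    "max_depth": [3, 5, 7],
    "subsample": [0.7, 1.0],
    "colsample_bytree": [0.7, 1.0],
}

random.seed(0)
best_score, best_params = np.inf, None
for _ in range(10):  # far fewer fits than the 36-point full grid
    params = {k: random.choice(v) for k, v in grid.items()}
    booster = xgb.train(params, dtrain, num_boost_round=200)
    score = mean_squared_error(y_va, booster.predict(dvalid))
    if score < best_score:
        best_score, best_params = score, params
print("best validation MSE:", best_score, "with", best_params)
```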

• GBM for random cool tasks:
  • A strange yet working demo of RL in Minecraft: https://arxiv.org/pdf/1603.04119.pdf
  • Some custom GBMs for NER with CRFs: http://proceedings.mlr.press/v38/chen15b.pdf
Summary

1. LightGBM seems like the go-to GBM library of 2017
2. Expect a lot of cool news about GPU support (especially from H2O + Xgboost)
3. Don't forget about model inspection and PDPs
4. We have the distillation capabilities, so why are we not using them?
5. Random search helps a lot with tuning
Thanks!

Alexey Natekin (DM Labs, OpenDataScience): "Gradient boosting: possibilities, features and tricks beyond standard kaggle-style tasks"
