Gradient Boosting
New stuff, possibilities and tricks
Alex Natekin

Such boosting, wow, much learning
Boost our plan for today:
• GBM as of May 2017
• Inside the black box
• Lesser known capabilities
Main GBM libraries:

• LightGBM (Microsoft):
  • Leaf-wise tree growth
  • Histogram-based trees
  • Feature & data parallel split search
  • Common tasks

• XGBoost (tree_method = "hist"; see the sketch below):
  • Regularized tree structure
  • (new) histogram-based trees
  • Feature parallel split search
  • Common tasks + full customization

• H2O:
  • Vanilla + TONS of tweaks
  • Histogram-based optimisation
  • Feature parallel split search
  • Common tasks, some extensions

• CRAN packages:
  • Vanilla
  • Some tree implementations are plain bad
  • As extensible as one wants
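As a rough illustration of the histogram modes above, here is a minimal sketch assuming the Python xgboost and lightgbm packages; the synthetic data and all parameter values are placeholders, not recommendations.

```python
import numpy as np
import xgboost as xgb
import lightgbm as lgb

# Toy data, purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# XGBoost: the fast histogram algorithm is selected via tree_method="hist"
xgb_model = xgb.train(
    {"objective": "binary:logistic", "tree_method": "hist", "max_depth": 6},
    xgb.DMatrix(X, label=y),
    num_boost_round=100,
)

# LightGBM: histogram trees grown leaf-wise; num_leaves caps tree complexity
lgb_model = lgb.train(
    {"objective": "binary", "num_leaves": 31, "learning_rate": 0.1},
    lgb.Dataset(X, label=y),
    num_boost_round=100,
)
```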
Current competition:
Such challenge, wow, much kaggle
Next big boost: ?
GBM as of May 2017

• A lot of implementations: great, good and so-so:
  • Multi-platform solutions outperform everything else: xgboost, lightgbm and h2o
  • There are many niche packages with specialised boosters, losses and tweaks

• GBM benchmarks:
  • https://github.com/szilard/benchm-ml
  • https://medium.com/data-design/exact-xgboost-and-fast-histogram-xgboost-training-speed-comparison-17f95cee68b5
  • https://medium.com/data-design/benchmarking-lightgbm-how-fast-is-lightgbm-vs-xgboost-7b5484746ac4

• The next big thing, GBM on GPU, is currently in active development:
  • https://blog.h2o.ai/2017/05/machine-learning-on-gpus/
  • Xgboost also has its own GPU implementation, but H2O wrapped it under its framework
Inside the black box

• Variable importance
• Partial dependency plots
• Distillation and GBM reconstruction
• GBM variable importance:
  • Mostly implemented as gains and frequencies across splits; don't trust them blindly
  • A better approach for a black box: shuffle each variable and look at the change in loss (see the sketch below)
  • Nice packages: https://github.com/limexp/xgbfir/ + https://github.com/Far0n/xgbfi
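A minimal sketch of the shuffling (permutation) importance idea, assuming a scikit-learn style model; GradientBoostingRegressor and the synthetic data are placeholders for whatever GBM and dataset you actually use, and in practice the shuffling should be done on held-out data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=500, n_features=5, random_state=0)
model = GradientBoostingRegressor().fit(X, y)

baseline = mean_squared_error(y, model.predict(X))
rng = np.random.default_rng(0)
for j in range(X.shape[1]):
    X_perm = X.copy()
    rng.shuffle(X_perm[:, j])  # break the link between feature j and the target
    loss = mean_squared_error(y, model.predict(X_perm))
    # Importance = how much the loss degrades when the feature is shuffled
    print(f"feature {j}: loss increase = {loss - baseline:.3f}")
```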

• Partial dependence plots:
  • Fix all variables at their mean values and plot prediction grids over the chosen variable(s)
  • Useful for overall model validation and for highlighting strong interactions
  • Very useful for validating key features and (chosen) interactions (see the sketch below)
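A minimal sketch of the "fix everything at the mean and sweep one variable" plot described above, assuming matplotlib and a scikit-learn style placeholder model; note that a full partial dependence plot averages predictions over the data rather than plugging in means, so this is the slide's simplified variant.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=5, random_state=0)
model = GradientBoostingRegressor().fit(X, y)

feature = 0                                     # variable to inspect
grid = np.linspace(X[:, feature].min(), X[:, feature].max(), 50)

# Hold every other variable at its mean, sweep the chosen one over the grid
X_grid = np.tile(X.mean(axis=0), (len(grid), 1))
X_grid[:, feature] = grid

plt.plot(grid, model.predict(X_grid))
plt.xlabel(f"feature {feature}")
plt.ylabel("prediction")
plt.title("Dependence plot with other features fixed at their means")
plt.show()
```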

• GBM distillation and reconstruction:
  • Use XGBoost's predict_leaf_indices functionality to encode each sample by the leaves it falls into
  • Refit a sparse linear model on that encoding: Lasso, glmnet, glinternet
  • You can actually refit the whole ensemble this way (see the sketch below)
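A minimal sketch of the leaf-index distillation idea, assuming the Python xgboost package, where leaf indices come from predict(..., pred_leaf=True) (the slide's predict_leaf_indices refers to the same capability); the one-hot encoding plus LassoCV refit stands in for glmnet/glinternet.

```python
import numpy as np
import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import OneHotEncoder

X, y = make_regression(n_samples=500, n_features=5, random_state=0)
dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({"max_depth": 3, "eta": 0.1}, dtrain, num_boost_round=50)

# Leaf index of every sample in every tree: shape (n_samples, n_trees)
leaves = booster.predict(dtrain, pred_leaf=True)

# One-hot encode leaf membership and refit a sparse linear model on it
encoded = OneHotEncoder(handle_unknown="ignore").fit_transform(leaves)
distilled = LassoCV(cv=5).fit(encoded, y)
print("non-zero leaf weights:", np.sum(distilled.coef_ != 0))
```

The non-zero coefficients then point at the leaves (i.e. the rule segments) that actually carry signal, which is the starting point for reconstructing a simpler model.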
Random cool stuff

• Varying tree complexity
• Tuning: discrete random search FTW
• RL: boosting for Minecraft
• You can tweak GBM a lot:
  • Change tree depth across iterations (shallower trees first, deeper ones afterwards); see the sketch below
  • The same applies to other parameters (deeper trees might need more randomness)
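A minimal sketch of changing tree depth across iterations, assuming the Python xgboost package and its xgb_model continuation argument; the stage sizes, depths and subsample value are arbitrary placeholders.

```python
import xgboost as xgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=500, n_features=5, random_state=0)
dtrain = xgb.DMatrix(X, label=y)

# Stage 1: shallow trees for the first iterations
booster = xgb.train({"max_depth": 2, "eta": 0.1}, dtrain, num_boost_round=50)

# Stage 2: keep boosting the same model, now with deeper, more random trees
booster = xgb.train(
    {"max_depth": 6, "eta": 0.1, "subsample": 0.7},
    dtrain,
    num_boost_round=50,
    xgb_model=booster,  # resume from the shallow-tree model
)
print("total trees:", len(booster.get_dump()))
```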

• Tuning GBM:
  • Better to tune the learning rate (alpha / eta / shrinkage) with a fixed number of trees
  • Packages bring extra hyperparameter tweaks; histogram resolution is often a useful one
  • Discrete random search works really well and significantly decreases tuning time (see the sketch below)
  • H2O has it off-the-shelf: https://blog.h2o.ai/2016/06/h2o-gbm-tuning-tutorial-for-r/
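A minimal sketch of discrete random search, assuming the Python xgboost package; the candidate grids, the budget of 10 random draws and the fixed 200 trees are placeholders rather than tuned values.

```python
import random
import numpy as np
import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)
dtrain, dvalid = xgb.DMatrix(X_tr, label=y_tr), xgb.DMatrix(X_va, label=y_va)

# Discrete grids: each hyperparameter only takes a handful of sensible values
grid = {
    "eta": [0.03, 0.1, 0.3],
    "max_depth": [3, 5, 7],
    "subsample": [0.7, 1.0],
    "colsample_bytree": [0.7, 1.0],
}

random.seed(0)
best_score, best_params = np.inf, None
for _ in range(10):  # far fewer fits than the 36-point full grid
    params = {k: random.choice(v) for k, v in grid.items()}
    booster = xgb.train(params, dtrain, num_boost_round=200)
    score = mean_squared_error(y_va, booster.predict(dvalid))
    if score < best_score:
        best_score, best_params = score, params
print("best validation MSE:", best_score, "with", best_params)
```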

• GBM for random cool tasks:
  • A strange yet working demo of RL in Minecraft: https://arxiv.org/pdf/1603.04119.pdf
  • Some custom GBMs for NER with CRFs: http://proceedings.mlr.press/v38/chen15b.pdf
Summary

1. LightGBM seems like the go-to GBM library of 2017
2. Expect a lot of cool news about GPU support (especially from H2O + Xgboost)
3. Don't forget about model inspection and PDPs
4. We have the distillation capabilities, so why are we not using them?
5. Random search helps a lot with tuning
Thanks!

Alexey Natekin (DM Labs, OpenDataScience): "Gradient boosting: possibilities, features and tricks beyond standard kaggle-style tasks"
