Gradient Boosting in Practice:
XGBoost and LightGBM
Gabriel Cypriano
source: http://blog.kaggle.com/2017/01/23/a-kaggle-master-explains-gradient-boosting
Decision Trees review
source: https://intelligentjava.wordpress.com/2015/04/28/machine-learning-decision-tree
Overfitting review
source: https://en.wikipedia.org/wiki/Overfitting
Overfitting with Decision Trees
Gradient Boosting
source: https://blog.bigml.com/2017/03/14/introduction-to-boosted-trees
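For reference, the standard stage-wise formulation behind this slide (not from the talk itself); here eta is the learning_rate that appears later in the tuning slide:

F_0(x) = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma)
r_{im} = -\left[ \frac{\partial L(y_i, F(x_i))}{\partial F(x_i)} \right]_{F = F_{m-1}}
F_m(x) = F_{m-1}(x) + \eta \, h_m(x), \quad \text{where the tree } h_m \text{ is fit to the residuals } r_{im}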
Regularization review
source: https://www.r-bloggers.com/an-attempt-to-understand-boosting-algorithms
Regularization review
Ridge (L2) Lasso (L1)
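The Ridge/Lasso review maps directly onto XGBoost's regularized objective (as given in the XGBoost paper and docs): lambda is reg_lambda (L2/Ridge), alpha is reg_alpha (L1/Lasso), and gamma penalizes the number of leaves T:

\text{Obj} = \sum_i l(y_i, \hat{y}_i) + \sum_k \Omega(f_k),
\qquad \Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 + \alpha \sum_{j=1}^{T} |w_j|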
XGBoost
Feature Engineering (or lack thereof)
● OK with outliers
● OK with non-standardized features
● OK with collinear features
● OK with NaN values
Feature Engineering (or lack thereof)
● Got NaN’s?
○ set them to -999
○ set missing=-999
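A minimal sketch of both options on toy data (the data and sentinel value here are illustrative):

import numpy as np
from xgboost import XGBClassifier

X = np.array([[1.0, np.nan], [2.0, 3.0], [np.nan, 0.5], [4.0, 1.0]])
y = np.array([0, 1, 0, 1])

# Option 1: leave the NaNs in; XGBoost learns a default branch for them.
clf = XGBClassifier(n_estimators=10)
clf.fit(X, y)

# Option 2: encode missing values as a sentinel and declare it via `missing`.
X_filled = np.where(np.isnan(X), -999, X)
clf = XGBClassifier(n_estimators=10, missing=-999)
clf.fit(X_filled, y)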
XGBoost Parameter Tuning
● n_estimators
● max_depth
● learning_rate
● reg_lambda
● reg_alpha
● subsample
● colsample_bytree
● gamma
yes, it’s combinatorial
XGBoost Parameter Tuning
RandomizedSearchCV and GridSearchCV to the rescue.
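A sketch of RandomizedSearchCV over the parameters listed above; the distributions are illustrative choices, not recommendations from the talk:

from scipy.stats import randint, uniform
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBClassifier

param_dist = {
    'n_estimators': randint(100, 1000),
    'max_depth': randint(3, 10),
    'learning_rate': uniform(0.01, 0.3),
    'reg_lambda': uniform(0.0, 10.0),
    'reg_alpha': uniform(0.0, 1.0),
    'subsample': uniform(0.5, 0.5),        # samples from [0.5, 1.0]
    'colsample_bytree': uniform(0.5, 0.5),
    'gamma': uniform(0.0, 5.0),
}
search = RandomizedSearchCV(XGBClassifier(), param_dist,
                            n_iter=50, cv=3, scoring='roc_auc')
# search.fit(X, y)  # X, y: your training data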
XGBoost Parameter Tuning
How not to do grid search: one exhaustive grid over every parameter (3 * 2 * 15 * 3 = 270 models).
XGBoost Parameter Tuning
A better way: fix some parameters and search a few at a time (1 * 1 * 3 * 5 * 3 = 45 models).
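One way to arrive at those 45 candidates; which parameters get which values is an assumption here, since the slide only gives the counts:

from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# 1 * 1 * 3 * 5 * 3 = 45 models: n_estimators and learning_rate fixed,
# three values of max_depth, five of reg_lambda, three of subsample.
param_grid = {
    'max_depth': [3, 5, 7],
    'reg_lambda': [0.0, 0.1, 1.0, 10.0, 100.0],
    'subsample': [0.6, 0.8, 1.0],
}
grid = GridSearchCV(XGBClassifier(n_estimators=200, learning_rate=0.1),
                    param_grid, cv=3, scoring='roc_auc')
# grid.fit(X, y)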
Evaluation Metric
XGBoost Ensembles
VotingClassifier with voting='soft' for combining multiple XGBoost models and optimizing for multiple metrics.
metric      ensemble   xgb_auc   xgb_precision   xgb_log_loss
AUC         0.84       0.84      0.76            0.84
Precision   0.71       0.44      0.94            0.71
Log Loss    0.42       0.49      0.48            0.38
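A sketch of that soft-voting ensemble; the three base models stand in for copies tuned separately for AUC, precision, and log loss (the hyperparameters here are placeholders):

from sklearn.ensemble import VotingClassifier
from xgboost import XGBClassifier

xgb_auc = XGBClassifier(max_depth=5, learning_rate=0.1)
xgb_precision = XGBClassifier(max_depth=3, learning_rate=0.05)
xgb_log_loss = XGBClassifier(max_depth=7, learning_rate=0.1)

ensemble = VotingClassifier(
    estimators=[('auc', xgb_auc),
                ('precision', xgb_precision),
                ('log_loss', xgb_log_loss)],
    voting='soft',  # average predicted class probabilities
)
# ensemble.fit(X, y)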
LightGBM
LightGBM and its advantages
● OK with NaN values
● OK with categorical features
● Faster training than XGBoost
● Often better results
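A minimal sketch of the first two points, on made-up data: NaNs and pandas category columns go in as-is, with no imputation or one-hot encoding:

import numpy as np
import pandas as pd
import lightgbm as lgb

df = pd.DataFrame({
    'age': [25.0, np.nan, 40.0, 33.0],
    'city': pd.Categorical(['SP', 'RJ', 'SP', 'BH']),
})
y = [0, 1, 0, 1]

clf = lgb.LGBMClassifier(n_estimators=10, min_child_samples=1)
clf.fit(df, y)  # the category dtype is detected automatically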
Trees: Categorical features vs One-Hot-Encoded features
source: https://medium.com/data-design/visiting-categorical-features-and-encoding-in-decision-trees-53400fa65931
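The gist of the figure, in code: one-hot encoding explodes a categorical column into binary columns that a tree can only split one level at a time, while a native categorical column lets LightGBM split on sets of levels:

import pandas as pd

s = pd.Series(['SP', 'RJ', 'SP', 'BH'], name='city')

one_hot = pd.get_dummies(s)     # three 0/1 columns, one split per level
native = s.astype('category')   # one column; LightGBM can split on level subsets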
Tree Growth Strategies
XGBoost: level-wise (depth-wise) growth
LightGBM: leaf-wise (best-first) growth
source: https://www.analyticsvidhya.com/blog/2017/06/which-algorithm-takes-the-crown-light-gbm-vs-xgboost
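The parameters that express each strategy (the values here are illustrative defaults):

from xgboost import XGBClassifier
import lightgbm as lgb

# XGBoost grows level-wise, so tree size is bounded by depth.
xgb_model = XGBClassifier(max_depth=6)

# LightGBM grows leaf-wise, so tree size is bounded by the leaf count;
# capping max_depth as well helps keep leaf-wise trees from overfitting.
lgb_model = lgb.LGBMClassifier(num_leaves=31, max_depth=-1)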
The good
● Often yields good results
● Reduced need for feature engineering
● Fast to train a single model
● Good choice if you only have one shot at the problem
● GPU support
● Scikit-learn API
● Great for ensembling and optimizing for multiple metrics
The bad
● Too many parameters
● Slow to tune parameters
● GPU config can be tough (try Docker)
● No GPU support in the scikit-learn API (XGBoost)
Thanks!
gabrielcs.me
vagas.creditas.com.br
somostera.com
