Demystifying XGBoost
All Hail XGBoost
Context
● What is XGBoost?
● Evolution of tree-based algorithms
● Why does XGBoost perform so well?
● King of all algorithms
● Boosting
● Ensemble methods
● Kaggle projects that use XGBoost
● XGBoost in Code
What is XGBoost?
XGBoost is a decision-tree-based ensemble Machine Learning algorithm that uses a gradient boosting framework. In prediction problems involving unstructured data (images, text, etc.), artificial neural networks tend to outperform other approaches; for small-to-medium structured or tabular data, however, decision-tree-based algorithms like XGBoost are considered best-in-class.
Evolution of Tree-Based Algorithms
Why does XGBoost perform so well?
Why is XGBoost the King of All Other Machine Learning Algorithms?
● Speed and performance: originally written in C++, it is comparatively faster than other ensemble classifiers and useful for very large datasets that don't fit into memory.
● Consistently outperforms other algorithms: it has shown better performance on a variety of machine learning benchmark datasets.
● Wide variety of tuning parameters: XGBoost has built-in parameters for cross-validation, regularization, user-defined objective functions, missing values, tree parameters, a scikit-learn-compatible API, etc. (see the sketch below).
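As a rough sketch of those knobs (the toy data here is made up; the parameter and function names are from the standard xgboost Python package), built-in cross-validation, regularization, and missing-value handling look like this:

import numpy as np
import xgboost as xgb

# Hypothetical toy data: 100 rows, 5 features.
X = np.random.rand(100, 5)
y = np.random.rand(100)

# DMatrix is XGBoost's internal data structure; NaN marks missing values.
dtrain = xgb.DMatrix(X, label=y, missing=np.nan)

params = {
    "objective": "reg:squarederror",  # user-definable objective
    "max_depth": 4,                   # a tree parameter
    "lambda": 1.0,                    # L2 regularization
    "alpha": 0.1,                     # L1 regularization
}

# Built-in k-fold cross-validation over the boosting rounds.
cv_results = xgb.cv(params, dtrain, num_boost_round=50, nfold=5)
print(cv_results.tail())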
Boosting
Boosting is a sequential technique that works on the principle of an ensemble: it combines a set of weak learners and delivers improved prediction accuracy.
The basic idea behind boosting algorithms is to build a weak model, draw conclusions about feature importance and parameters, and then use those conclusions to build a new, stronger model that capitalizes on the misclassification error of the previous model and tries to reduce it.
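A minimal from-scratch sketch of that loop (illustrative only, not XGBoost's actual internals): each new weak learner is fit to the residual errors of the ensemble built so far.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost(X, y, n_rounds=50, learning_rate=0.1):
    # Start from a constant prediction: the mean of the targets.
    pred = np.full(len(y), y.mean())
    learners = []
    for _ in range(n_rounds):
        residual = y - pred  # what the current ensemble still gets wrong
        tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)  # weak learner
        pred += learning_rate * tree.predict(X)  # correct a fraction of the error
        learners.append(tree)
    return learners, pred

Each round fits a small tree to the previous model's mistakes and nudges the ensemble toward them, which is exactly the sequential error-reduction idea described above.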
Ensemble Methods
Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone.
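As a small illustration (the estimator choices here are arbitrary examples), scikit-learn's VotingRegressor averages the predictions of several constituent models:

from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression

# Average the predictions of two different learners; the combination is
# often more accurate than either constituent model on its own.
ensemble = VotingRegressor(estimators=[
    ("linear", LinearRegression()),
    ("forest", RandomForestRegressor(n_estimators=100)),
])
# Usage: ensemble.fit(X_train, y_train); ensemble.predict(X_test)

Averaging combines learners in parallel, while boosting combines them sequentially; XGBoost belongs to the boosting family.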
So should we use just XGBoost all the time?
When it comes to Machine Learning (or
even life for that matter), there is no free
lunch.
Kaggle Projects That Use XGBoost
● Predicting Gold Glove
● Sloan Digital Sky Survey Classification
● Boston Housing Price
● Santander Customer Transaction Prediction
● Microsoft Malware Prediction, etc.
XGBoost in Code
Using the XGBoost Regressor is just like using any other regression approach, such as Random Forest. We follow the same steps (sketched after this list):
● Clean the Data
● Select the important parameters
● Make a training and test set
● Fit the model to your training set
● Evaluate your model on your test data
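A minimal end-to-end sketch of those five steps with the scikit-learn-compatible XGBRegressor (the CSV file and the "price" target column are hypothetical placeholders):

import pandas as pd
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

# 1. Clean the data (hypothetical dataset; dropping missing rows for brevity).
df = pd.read_csv("housing.csv").dropna()

# 2. Select the features and the target.
X = df.drop(columns=["price"])
y = df["price"]

# 3. Make a training and test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# 4. Fit the model to the training set.
model = XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X_train, y_train)

# 5. Evaluate the model on the test data.
print("Test MSE:", mean_squared_error(y_test, model.predict(X_test)))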
Thank you
Datacamp Instructor | Simpliv | Head Boy at Gitgirl
Co-Organiser, Pydata Port Harcourt | Phschoolofai
@emekaboris
