Demystifying XGBoost
All Hail XGBoost
Context
● What is XGBoost?
● Evolution of tree-based algorithms
● Why does XGBoost perform so well?
● King of all algorithms
● Boosting
● Ensemble methods
● Kaggle projects that use XGBoost
● XGBoost in Code
What is XGBoost?
XGBoost is a decision-tree-based ensemble Machine Learning algorithm that uses a gradient boosting framework. In prediction problems involving unstructured data (images, text, etc.), artificial neural networks tend to outperform other approaches; for small-to-medium structured or tabular data, however, decision-tree-based algorithms like XGBoost are considered best-in-class.
Evolution of Tree-Based Algorithms
Why does XGBoost perform so well?
Why is XGBoost the King of All Other Machine Learning Algorithms?
● Speed and performance: originally written in C++, it is comparatively faster than other ensemble classifiers and useful for very large datasets that don't fit into memory.
● Consistently outperforms other algorithms: it has shown better performance on a variety of machine learning benchmark datasets.
● Wide variety of tuning parameters: XGBoost has built-in parameters for cross-validation, regularization, user-defined objective functions, missing values, tree parameters, a scikit-learn-compatible API, etc. (see the sketch below).
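As a rough sketch of those knobs (the toy data here is made up; the parameter and function names are from the standard xgboost Python package), built-in cross-validation, regularization, and missing-value handling look like this:

import numpy as np
import xgboost as xgb

# Hypothetical toy data: 100 rows, 5 features.
X = np.random.rand(100, 5)
y = np.random.rand(100)

# DMatrix is XGBoost's internal data structure; NaN marks missing values.
dtrain = xgb.DMatrix(X, label=y, missing=np.nan)

params = {
    "objective": "reg:squarederror",  # user-definable objective
    "max_depth": 4,                   # a tree parameter
    "lambda": 1.0,                    # L2 regularization
    "alpha": 0.1,                     # L1 regularization
}

# Built-in k-fold cross-validation over the boosting rounds.
cv_results = xgb.cv(params, dtrain, num_boost_round=50, nfold=5)
print(cv_results.tail())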
Boosting
Boosting is a sequential technique that works on the principle of an ensemble: it combines a set of weak learners and delivers improved prediction accuracy.
The basic idea behind boosting algorithms is to build a weak model, draw conclusions about feature importance and parameters, and then use those conclusions to build a new, stronger model that capitalizes on the misclassification error of the previous model and tries to reduce it.
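A minimal from-scratch sketch of that loop (illustrative only, not XGBoost's actual internals): each new weak learner is fit to the residual errors of the ensemble built so far.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost(X, y, n_rounds=50, learning_rate=0.1):
    # Start from a constant prediction: the mean of the targets.
    pred = np.full(len(y), y.mean())
    learners = []
    for _ in range(n_rounds):
        residual = y - pred  # what the current ensemble still gets wrong
        tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)  # weak learner
        pred += learning_rate * tree.predict(X)  # correct a fraction of the error
        learners.append(tree)
    return learners, pred

Each round fits a small tree to the previous model's mistakes and nudges the ensemble toward them, which is exactly the sequential error-reduction idea described above.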
Ensemble Methods
Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone.
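As a small illustration (the estimator choices here are arbitrary examples), scikit-learn's VotingRegressor averages the predictions of several constituent models:

from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression

# Average the predictions of two different learners; the combination is
# often more accurate than either constituent model on its own.
ensemble = VotingRegressor(estimators=[
    ("linear", LinearRegression()),
    ("forest", RandomForestRegressor(n_estimators=100)),
])
# Usage: ensemble.fit(X_train, y_train); ensemble.predict(X_test)

Averaging combines learners in parallel, while boosting combines them sequentially; XGBoost belongs to the boosting family.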
So should we use just XGBoost all the time?
When it comes to Machine Learning (or
even life for that matter), there is no free
lunch.
Kaggle Projects That Use XGBoost
● Predicting Gold Glove
● Sloan Digital Sky Survey Classification
● Boston Housing Price
● Santander Customer Transaction Prediction
● Microsoft Malware Prediction, etc.
XGBoost in Code
Using the XGBoost Regressor is just like using any other regression approach, such as Random Forest. We follow the same steps (sketched after this list):
● Clean the Data
● Select the important parameters
● Make a training and test set
● Fit the model to your training set
● Evaluate your model on your test data
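A minimal end-to-end sketch of those five steps with the scikit-learn-compatible XGBRegressor (the CSV file and the "price" target column are hypothetical placeholders):

import pandas as pd
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

# 1. Clean the data (hypothetical dataset; dropping missing rows for brevity).
df = pd.read_csv("housing.csv").dropna()

# 2. Select the features and the target.
X = df.drop(columns=["price"])
y = df["price"]

# 3. Make a training and test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# 4. Fit the model to the training set.
model = XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X_train, y_train)

# 5. Evaluate the model on the test data.
print("Test MSE:", mean_squared_error(y_test, model.predict(X_test)))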
Thank you
Datacamp Instructor | Simpliv | Head Boy at Gitgirl
Co-Organiser, Pydata Port Harcourt | Phschoolofai
@emekaboris
