XGBoost: the algorithm that wins every
competition
Poznań University of Technology; April 28th, 2016
meet.ml #1 - Applied Big Data and Machine Learning
By Jarosław Szymczak
Data science competitions
Some recent competitions from Kaggle
(Binary classification problem)
Product Classification Challenge
(Multi-class classification problem)
Store Sales
(Regression problem)
Some recent competitions from Kaggle
Features: 93
Some recent competitions from Kaggle
CTR – click through rate
(~ equal to probability of click)
Features:
Some recent competitions from Kaggle
Sales prediction for a store at a certain date
Features:
• store id
• date
• school and state holidays
• store type
• assortment
• promotions
• competition
Evaluation of solutions
[Plot: logarithmic loss for a positive example, as a function of the predicted probability]
• Root Mean Square
Percentage Error (RMSPE)
• Binary logarithmic loss
(loss is capped at ~35 for
a single observation)
• Multi-class logarithmic loss
What algorithms do we tackle these problems with?
Ensemble methods and the errors they target:
• variance-related errors → bagging, random forest
• bias-related errors → adaptive boosting, gradient boosting
Shall we play tennis?
Decision tree standard example – shall we play tennis?
But one tree for complicated problems is simply not enough, and we will need to reduce errors coming from:
• variance
• bias
Bagging – bootstrap aggregation

Our initial dataset: 1 2 3 4 5 6
First draw (with replacement): 2 5 5 6 → Model 1
Second draw: 2 3 3 4 5 5 → Model 2
Third draw: 1 1 3 4 4 6 → Model 3
Fourth draw: 1 2 2 5 → Model 4
Final model: the average of Models 1–4
Random forest
AdaBoost – adaptive boosting

[Toy dataset: ten examples labelled + and −; misclassified examples are reweighted before each round]

h1: ε1 = 0.30, α1 = 0.42
h2: ε2 = 0.21, α2 = 0.65
h3: ε3 = 0.14, α3 = 0.92

H = 0.42·h1 + 0.65·h2 + 0.92·h3
Bias related errors
Bagging
Random forest
Adaptive boosting
Gradient boosting
Variance related errors
Another boosting approach
What if, instead of reweighting examples, we corrected the prediction errors directly?
We add a new model to the one we already have, so:
Ypred = X1 + NEW
and the new model is fit to NEW = Y − X1
our residual
Note:
the residual is (up to sign) the gradient of a single observation's error contribution in one of the most common evaluation measures for regression: root mean squared error (RMSE)
Gradient descent
Gradient boosting
1. Fit a model to the initial data. It can be the same algorithm as in the further steps, or something very simple (like uniform probabilities, or the average target in regression).
2. Fit the pseudo-residuals. This works for any loss function that aggregates the error from examples (e.g. log-loss or RMSE, but not AUC) and whose gradient can be calculated at the example level (that gradient is called the pseudo-residual).
3. Finalization: sum up all the models.
XGBoost – eXtreme Gradient Boosting
by Tianqi Chen

Used for:
• classification
• regression
• ranking
with custom loss functions

Interfaces for Python and R; can be executed on YARN. Custom tree-building algorithm.
XGBoost – handling the features
Numeric values
• for each numeric feature, XGBoost finds the best available split (it is always a binary split)
• the algorithm is designed to work with numeric values only
Nominal values
• need to be converted to numeric ones
• the classic way is to perform one-hot encoding / get dummies (for all values)
• for variables with large cardinality, other, more sophisticated methods may be needed
Missing values
• XGBoost handles missing values separately
• a missing value is always routed to the branch in which it minimizes the loss (so it is effectively treated as either a very large or a very small value)
• this is a great advantage over, e.g., the scikit-learn gradient boosting implementation
XGBoost – loss functions
Used as optimization objective
• log-loss (optimized in a complicated manner) for the objectives binary:logistic and reg:logistic (they differ only by evaluation measure, all the rest is the same)
• softmax (log-loss transformed to handle multiple classes) for the objective multi:softmax
• RMSE (root mean squared error) for the objective reg:linear
Only measured
• AUC – area under the ROC curve
• MAE – mean absolute error
• error / merror – error rate for binary / multi-class classification
Custom
• only measured: simply implement the evaluation function
• used as optimization objective: provided that your measure is an aggregation of element-wise losses and is differentiable,
• you create a function for the element-wise gradient (the first derivative of the loss with respect to the prediction),
• then the hessian (its second derivative),
• and that's it — the algorithm will approximate your loss function with a Taylor series (the pre-defined loss functions are implemented in the same manner)
XGBoost – setup for a Python environment
Installing XGBoost in your environment
Check out this wonderful tutorial to install Docker with the Kaggle Python stack:
http://goo.gl/wSCkmC
http://xgboost.readthedocs.org/en/latest/build.html
Link as above; expect potential troubles
Clone: https://github.com/dmlc/xgboost
At revision: 219e58d453daf26d6717e2266b83fca0071f3129
And follow the instructions in it (it is, of course, an older version)
XGBoost code example on Otto data
Train dataset
Test dataset
Sample submission
XGBoost – simple use example (direct)
Link to script: https://goo.gl/vwQeXs
• for direct use we need to specify the number of classes
• scikit-learn conventional names
• to account for example importance, we can assign weights to examples in the DMatrix (not done here)
XGBoost – feature importance
Link to script: http://pastebin.com/uPK5aNkf
XGBoost – simple use example (in scikit-learn style)
Link to script: https://goo.gl/IPxKh5
XGBoost – most common parameters for the tree booster
subsample
• ratio of instances to sample per boosting round
• example values: 0.6, 0.9
colsample_bytree
• ratio of columns used for whole-tree creation
• example values: 0.1, …, 0.9
colsample_bylevel
• ratio of columns sampled for each split level
• example values: 0.1, …, 0.9
eta
• how fast the algorithm will learn (shrinkage)
• example values: 0.01, 0.05, 0.1
max_depth
• maximum number of consecutive splits
• example values: 1, …, 15 (for a dozen or so or more, it needs to be set together with the regularization parameters)
min_child_weight
• minimum weight of the children in a leaf; needs to be adapted to each measure
• example values: 1 (for linear regression this is simply one example; for classification it is based on the second derivative, the hessian, of the loss)
XGBoost – regularization parameters for the tree booster
alpha
• L1 penalty (sum of absolute values) on the leaf weights, added to the objective function
lambda
• L2 penalty (sum of squares) on the leaf weights, added to the objective function
gamma
• an L0-style penalty: multiplied by the number of leaves in a tree, it is used to decide whether to make a split
Note: lambda ends up in a denominator because of various transformations; in the original objective it is a "classic" L2 penalty.
XGBoost – some cool things not necessarily present elsewhere
Watchlist
You can create a small validation set to prevent overfitting, without the need to guess how many iterations would be enough.
Warm start
Thanks to the nature of gradient boosting, you can take some already working solution and start the "improvement" from it.
Easy cut-off at any round when classifying
Simple but powerful while tuning the parameters; it allows you to recover from an overfitted model in an easy way.
Last, but not least – who won?
Winning solutions
The winning solution consists of over 20 xgboost models
that each need about two hours to train when running three
models in parallel on my laptop. So I think it could be done
within 24 hours. Most of the models individually achieve a
very competitive (top 3 leaderboard) score.
by Gert Jacobusse
I did quite a bit of manual feature engineering and my models
are entirely based on xgboost. Feature engineering is a
combination of “brutal force” (trying different
transformations, etc that I know) and “heuristic” (trying to
think about drivers of the target in real world settings). One
thing I learned recently was entropy based features, which
were useful in my model.
by Owen Zhang
Winning solutions
by Gilberto Titericz Jr and Stanislav Semenov
Even deeper dive
• XGBoost:
• https://github.com/dmlc/xgboost/blob/master/demo/README.md
• More info about machine learning:
• Uczenie maszynowe i sieci neuronowe.
Krawiec K., Stefanowski J., Wydawnictwo PP, Poznań, 2003. (2 wydanie 2004)
• The Elements of Statistical Learning: Data Mining, Inference, and Prediction
by Trevor Hastie, Robert Tibshirani, Jerome Friedman (available on-line for free)
• scikit-learn and pandas official pages:
• http://scikit-learn.org/
• http://pandas.pydata.org/
• Kaggle blog and forum:
• http://blog.kaggle.com/ (No Free Hunch)
• https://www.datacamp.com/courses/kaggle-python-tutorial-on-machine-learning
• https://www.datacamp.com/courses/intro-to-python-for-data-science
• Python basics:
• https://www.coursera.org/learn/interactive-python-1
• https://www.coursera.org/learn/interactive-python-2