1. Introduction
Definition of “Ensemble Learning”:
In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive
performance than could be obtained from any of the constituent learning algorithms alone.
[Diagram: four ensemble approaches (01 Bagging, 02 Boosting, 03 Bayesian model combination, 04 Stacking) combined into an ensemble model]
In these slides, we will focus on the ‘Boosting’ algorithms !
https://en.wikipedia.org/wiki/Boosting_(machine_learning)
1. Introduction
• There are lots of data types on Kaggle + most people make a baseline by using XGBoost and LightGBM
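A minimal sketch of such a baseline (my own illustration, not from the deck; it uses a built-in scikit-learn dataset as a stand-in for a Kaggle one and assumes xgboost is installed):

```python
# Minimal XGBoost baseline: default-ish model, train/validation split, accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)          # stand-in tabular dataset
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = XGBClassifier(n_estimators=200, learning_rate=0.1, max_depth=4)
model.fit(X_tr, y_tr)
print("baseline accuracy:", accuracy_score(y_val, model.predict(X_val)))
```

A LightGBM baseline looks almost identical, with `lightgbm.LGBMClassifier` in place of `XGBClassifier`.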
1. Introduction
• Among the 29 challenge winning solutions published at Kaggle’s blog during 2015, 17 solutions used XGBoost !
• Among these solutions, eight solely used XGBoost to train the model, while most others combined XGBoost
with neural nets in ensembles.
• For comparison, the second most popular method, deep neural nets, was used in 11 solutions.
2. Ensemble Summary
[ Bagging (bootstrap aggregating) ]
• Used to improve stability and accuracy by reducing variance and avoiding overfitting.
• Bootstrap sample : from the training set $D$ of size $n$, draw new training sets $D_i$ of size $n'$ ($n' = n$) by sampling with replacement
• For categorical data (classification), combine the models by voting ! In regression, average (mean) the outputs ! A minimal code sketch follows below.
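As a hedged sketch of bagging in code (my own, using scikit-learn's BaggingClassifier rather than anything shown in the deck):

```python
# Bagging: each base tree is fit on a bootstrap sample (n' = n, with replacement)
# and the ensemble combines the trees' predictions by voting.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
bagging = BaggingClassifier(n_estimators=50, random_state=0)  # default base model: a decision tree
print("CV accuracy:", cross_val_score(bagging, X, y, cv=5).mean())
```

For regression, `BaggingRegressor` averages the base models' outputs instead of voting.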
2. Ensemble Summary
[ Bagging (bootstrap aggregating) ]
• e.g. Random Forest
• For solving :
1. Underfitting with high bias
2. Overfitting with high variance (bagging mainly addresses the variance side)
2. Ensemble Summary
https://quantdare.com/what-is-the-difference-between-bagging-and-boosting/
[ Boosting ]
• “Can a set of weak learners create a single strong learner?”
• Weak learner : a classifier that is only slightly correlated with the true classification
• Compared to bagging, boosting gives more weight to misclassified data to improve classification accuracy
• We will focus only on AdaBoost in this presentation
2. Ensemble Summary
https://www.gormanalysis.com/blog/guide-to-model-stacking-i-e-meta-ensembling/
[ Stacking ]
• Involves training a learning algorithm to combine the predictions of several other learning algorithms.
• First, all of the other algorithms are trained on the available data; then a combiner algorithm is trained to make a final prediction using all the predictions of the other algorithms as additional inputs.
• Example : 4 people (Bob, Kate, Mark, Sue) threw 187 darts at a board !
• 150 samples (train data) + 27 samples (test data)
2. Ensemble Summary
https://www.gormanalysis.com/blog/guide-to-model-stacking-i-e-meta-ensembling/
[ Stacking ]
• Fit each base model (M1, M2) to the full training dataset
- i.e. test_meta’s M1 and M2 columns are filled using base models trained on all of the train folds
• Fit a new model (the stacking model) on these predictions ! Optionally, include other features from the original training dataset or engineered features
• The main point to take home is that we are using the predictions of the base models as features (i.e. meta features); a code sketch follows below
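scikit-learn's StackingClassifier automates a version of this procedure (a hedged sketch of my own, not the manual fold-by-fold recipe from the gormanalysis post): out-of-fold predictions of the base models become the meta features for a final combiner model.

```python
# Stacking: base models (here playing the role of M1, M2) + a combiner model
# trained on their cross-validated predictions (the meta features).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
stack = StackingClassifier(
    estimators=[("m1", RandomForestClassifier(n_estimators=100, random_state=0)),
                ("m2", KNeighborsClassifier())],
    final_estimator=LogisticRegression(max_iter=1000),
)
print("CV accuracy:", cross_val_score(stack, X, y, cv=5).mean())
```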
3. Random Forest
• Before we understand XGBoost, we have to know ‘Gradient Boosting’...
• Before we understand ‘Gradient Boosting’, we have to know ‘Boosting’, ‘Adaboost’, and ‘Random Forest’
https://www.youtube.com/watch?v=J4Wdy0Wc_xQ&t=0s
Step 1) Create a bootstrapped dataset
Randomly selected with replacement !
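A tiny numpy sketch of step 1 (my own illustration with a toy dataset): draw row indices with replacement, so some rows repeat and some are left out.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))              # toy dataset: 6 samples, 3 features
y = rng.integers(0, 2, size=6)

idx = rng.choice(len(X), size=len(X), replace=True)   # bootstrap: sample with replacement
X_boot, y_boot = X[idx], y[idx]
print("rows drawn (duplicates allowed):", idx)
```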
3. Random Forest
Step 2) Create a decision tree using the bootstrapped dataset !
Only consider a random subset of variables at each step !
You’d do this 100’s of times !
+ The blue node can be made by 2 or more variables
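In scikit-learn terms, these two ingredients map onto `n_estimators` (build hundreds of trees) and `max_features` (consider only a random subset of variables at each split); a hedged sketch of my own:

```python
# Random forest: many trees on bootstrap samples, random feature subset per split.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(n_estimators=300, max_features="sqrt", random_state=0)
forest.fit(X, y)
```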
3. Random Forest
Step 3) How do we use it ?
- Run a new sample down each tree and record its vote (YES / NO), etc.
- The majority vote wins : YES !
- Bootstrapping the data plus using the aggregate (of the votes) to make a decision is called ‘Bagging’
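A small sketch of the voting step (my own illustration with made-up votes): collect every tree's prediction for the new sample and take the majority class.

```python
import numpy as np

tree_votes = np.array([1, 1, 0, 1, 1, 0, 1])   # hypothetical votes of 7 trees (1 = YES)
majority = np.bincount(tree_votes).argmax()     # aggregate the votes
print("ensemble says:", "YES" if majority == 1 else "NO")
```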
3. Random Forest
Step 4) How do we know if it’s any good ?
- The out-of-bag (OOB) dataset is composed of the samples that are not in the bootstrapped dataset
- Run each OOB sample through the trees that were built without it and tally their votes (Yes, Yes, NO, ...)
- We can also get an OOB error from this step (the proportion of OOB samples that are classified incorrectly)
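scikit-learn exposes this directly through the `oob_score` option (a hedged sketch of my own; the video does the OOB bookkeeping by hand):

```python
# Each sample is scored only by the trees that did NOT see it in their bootstrap
# sample; 1 - oob_score_ is the out-of-bag (OOB) error estimate.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=0)
forest.fit(X, y)
print("OOB error:", 1 - forest.oob_score_)
```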
4. Adaboost
• Before we read the XGBoost theory, recall Boosting !
- Converts weak learners to strong ones / sequential model / random sampling with replacement
[Timeline]
- 1988 : “Thoughts on Hypothesis Boosting”
- 1995 : AdaBoost (“A decision-theoretic generalization of on-line learning and an application to boosting”)
- Generalization of AdaBoost as Gradient Boosting
- 2016 : XGBoost
https://www.slideshare.net/freepsw/boosting-bagging-vs-boosting
https://www.youtube.com/watch?v=LsK-xG1cLYA
4. Adaboost
• In a random forest, there is no predetermined maximum depth
• In contrast, in a forest of trees made with AdaBoost, each tree is a STUMP :
- it uses one variable (feature) to make a single decision
- therefore its accuracy is low on its own (it is a weak learner)
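In code, a stump is simply a depth-1 decision tree, and that is also the default base learner of scikit-learn's AdaBoostClassifier (a hedged sketch of my own, not from the video):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

stump = DecisionTreeClassifier(max_depth=1)                 # one split on one feature: a weak learner
ada = AdaBoostClassifier(n_estimators=50, random_state=0)   # sequentially combines many such stumps

print("single stump CV accuracy:", cross_val_score(stump, X, y, cv=5).mean())
print("AdaBoost CV accuracy:    ", cross_val_score(ada, X, y, cv=5).mean())
```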
4. Adaboost
• As you know, the sequence is important in AdaBoost
- the next stump is made under the influence of the previous stump
- i.e. the errors that the second stump makes influence how the third stump is made
• Sample weight initialization : $w_i = \frac{1}{\#\,\text{total samples}}$
• Let’s start !
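A one-line sketch of the initialization (my own; the 8 samples match the 1/8 weights used on the following slides):

```python
import numpy as np

n_samples = 8
w = np.full(n_samples, 1.0 / n_samples)   # every sample starts with weight 1/8
print(w)
```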
4. Adaboost
1) Make the first stump / use the Gini index
2) Find the errors (misclassified samples) / determine the Amount of Say
3) Give new weights to (modify the weights of) the misclassified samples
Gini index = probability of misclassification
- Gini is 0 for a node that is all signal or all background
- $\mathrm{Gini} = \left(\sum_{i=1}^{n} W_i\right) P(1 - P)$
Gini for the three candidate stumps (sum of $P(1-P)$ over the two leaves of each stump) :
- $\frac{3}{5}\cdot\frac{2}{5} + \frac{2}{3}\cdot\frac{1}{3} = 0.24 + 0.22 = 0.46$
- $\frac{3}{6}\cdot\frac{3}{6} + \frac{1}{2}\cdot\frac{1}{2} = 0.25 + 0.25 = 0.50$
- $\frac{3}{3}\cdot\frac{0}{3} + \frac{4}{5}\cdot\frac{1}{5} = 0 + 0.16 = 0.16$ (lowest Gini, so this split becomes the first stump)
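A short sketch (my own) that recomputes these sums; the (yes, no) counts per leaf are read off the fractions above, and swapping which class counts as "yes" does not change $P(1-P)$:

```python
# Impurity of a candidate stump: sum over its two leaves of P * (1 - P),
# where P is the fraction of one class in that leaf.
def stump_impurity(leaves):
    return sum(p * (1 - p) for p in (yes / (yes + no) for yes, no in leaves))

candidates = {
    "stump 1": [(3, 2), (2, 1)],   # leaf (yes, no) counts -> 0.46
    "stump 2": [(3, 3), (1, 1)],   # -> 0.50
    "stump 3": [(3, 0), (1, 4)],   # -> 0.16, the purest split
}
for name, leaves in candidates.items():
    print(name, round(stump_impurity(leaves), 2))
```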
4. Adaboost
1) Make the first stump / use the Gini index
2) Find the errors (misclassified samples) / determine the Amount of Say
3) Give new weights to (modify the weights of) the misclassified samples
$\text{Total Error} = \sum \text{Error} = \frac{1}{8}$

Amount of Say : $\alpha_m = \frac{1}{2}\log\left(\frac{1 - \text{Total Error}}{\text{Total Error}}\right)$, where $\text{Total Error} = \sum_{y_i \neq k_m(x_i)} w_i^{(m)} \Big/ \sum_{i=1}^{N} w_i^{(m)}$

$\alpha_m = \frac{1}{2}\log\left(\frac{1 - 1/8}{1/8}\right) = \frac{1}{2}\log 7 = \mathbf{0.97}$
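A two-line numeric check of this value (my own), using the natural log as the slide does:

```python
import numpy as np

total_error = 1 / 8
amount_of_say = 0.5 * np.log((1 - total_error) / total_error)   # 0.5 * log(7)
print(round(amount_of_say, 2))                                   # 0.97
```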
4. Adaboost
Amount of Say : $\alpha_m = \frac{1}{2}\log\left(\frac{1 - \text{Total Error}}{\text{Total Error}}\right)$, where $\text{Total Error} = \sum_{y_i \neq k_m(x_i)} w_i^{(m)} \Big/ \sum_{i=1}^{N} w_i^{(m)}$
1) Make the first stump / use the Gini index
2) Find the errors (misclassified samples) / determine the Amount of Say
3) Give new weights to (modify the weights of) the misclassified samples
[ Amount of Say graph ]
- Error ↓ ⇒ Amount of Say ↑
- Error = 1/2 ⇒ Amount of Say = 0
- Error ↑ ⇒ Amount of Say ↓ (it becomes negative for Error > 1/2)
4. Adaboost
1) Make the first stump / use the Gini index
2) Find the errors (misclassified samples) / determine the Amount of Say
3) Give new weights to (modify the weights of) the misclassified samples
• Now we look at how to modify the sample weights !
4. Adaboost
1) Make the first stump / use the Gini index
2) Find the errors (misclassified samples) / determine the Amount of Say
3) Give new weights to (modify the weights of) the misclassified samples
$\text{New sample weight} = \text{sample weight} \times e^{-\text{Amount of Say}}$ (for correctly classified samples)
$\text{New sample weight} = \text{sample weight} \times e^{+\text{Amount of Say}}$ (for misclassified samples)
4. Adaboost
1) Make the first stump / use the Gini index
2) Find the errors (misclassified samples) / determine the Amount of Say
3) Give new weights to (modify the weights of) the misclassified samples
• Normalize the new sample weights so that their sum = 1
• The weighted Gini index will then put more emphasis on correctly classifying this sample (the one misclassified by the previous stump); a code sketch of the update follows below
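A numpy sketch (my own) of the whole update for the 8-sample example: the misclassified sample is scaled up by $e^{+0.97}$, the rest are scaled down by $e^{-0.97}$, and everything is renormalized to sum to 1. The resulting weights (roughly 0.07 for each correctly classified sample and roughly 0.5 for the misclassified one) are what the cumulative ranges on the next slide are built from.

```python
import numpy as np

n = 8
w = np.full(n, 1.0 / n)                    # initial sample weights
amount_of_say = 0.97                       # from the Amount of Say slide
misclassified = np.zeros(n, dtype=bool)
misclassified[3] = True                    # assume the 4th row was the error

w *= np.where(misclassified, np.exp(amount_of_say), np.exp(-amount_of_say))
w /= w.sum()                               # normalize so the weights sum to 1
print(np.round(w, 2))                      # ~[0.07 0.07 0.07 0.5 0.07 0.07 0.07 0.07]
```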
4. Adaboost
• Let’s think about the modified weights. How do they change the learning method ?
- we can make a new collection of samples that contains duplicate copies of the samples, drawn according to the new weights :
0.00 – 0.07 : 1st row sample
0.07 – 0.14 : 2nd row sample
0.14 – 0.21 : 3rd row sample
0.21 – 0.70 : 4th row sample
...
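A sketch of that resampling step (my own): draw a new collection of the same size, where each draw lands on row i with probability equal to its new weight, which is exactly what picking uniform random numbers against the cumulative ranges above does.

```python
import numpy as np

rng = np.random.default_rng(0)
new_weights = np.array([0.07, 0.07, 0.07, 0.49, 0.07, 0.07, 0.07, 0.07])
p = new_weights / new_weights.sum()        # make the probabilities sum exactly to 1

# Rows with large weights (the previously misclassified 4th row) tend to be duplicated.
idx = rng.choice(len(p), size=len(p), p=p)
print("rows in the new collection:", idx)
```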