1. Introduction
Definition of “Ensemble Learning”:
In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive
performance than could be obtained from any of the constituent learning algorithms alone.
[Diagram: four ensemble approaches (01 Bagging, 02 Boosting, 03 Bayesian model combination, 04 Stacking) combined into an ensemble model]
In these slides, we will focus on the ‘Boosting’ algorithms !
https://en.wikipedia.org/wiki/Boosting_(machine_learning)
1. Introduction
• There are lots of data types on Kaggle + most people make a baseline by using XGBoost and LightGBM
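A minimal sketch of such a baseline (my own illustration, not from the deck; it uses a built-in scikit-learn dataset as a stand-in for a Kaggle one and assumes xgboost is installed):

```python
# Minimal XGBoost baseline: default-ish model, train/validation split, accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)          # stand-in tabular dataset
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = XGBClassifier(n_estimators=200, learning_rate=0.1, max_depth=4)
model.fit(X_tr, y_tr)
print("baseline accuracy:", accuracy_score(y_val, model.predict(X_val)))
```

A LightGBM baseline looks almost identical, with `lightgbm.LGBMClassifier` in place of `XGBClassifier`.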
1. Introduction
• Among the 29 challenge winning solutions published at Kaggle’s blog during 2015, 17 solutions used XGBoost !
• Among these solutions, eight solely used XGBoost to train the model, while most others combined XGBoost
with neural nets in ensembles.
• For comparison, the second most popular method, deep neural nets, was used in 11 solutions.
2. Ensemble Summary
[ Bagging (bootstrap aggregating) ]
• Used to improve stability and accuracy by reducing variance and avoiding overfitting.
• Bootstrap sample : from the training set $D$ of size $n$, draw new training sets $D_i$ of size $n'$ ($n' = n$) by sampling with replacement
• For categorical data (classification), combine the models by voting ! In regression, average (mean) the outputs ! A minimal code sketch follows below.
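As a hedged sketch of bagging in code (my own, using scikit-learn's BaggingClassifier rather than anything shown in the deck):

```python
# Bagging: each base tree is fit on a bootstrap sample (n' = n, with replacement)
# and the ensemble combines the trees' predictions by voting.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
bagging = BaggingClassifier(n_estimators=50, random_state=0)  # default base model: a decision tree
print("CV accuracy:", cross_val_score(bagging, X, y, cv=5).mean())
```

For regression, `BaggingRegressor` averages the base models' outputs instead of voting.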
2. Ensemble Summary
[ Bagging (bootstrap aggregating) ]
• e.g. Random Forest
• For solving :
1. Underfitting with high bias
2. Overfitting with high variance (bagging mainly addresses the variance side)
2. Ensemble Summary
https://quantdare.com/what-is-the-difference-between-bagging-and-boosting/
[ Boosting ]
• “Can a set of weak learners create a single strong learner?”
• Weak learner : a classifier that is only slightly correlated with the true classification
• Compared to bagging, boosting gives more weight to misclassified data to improve classification accuracy
• We will focus only on AdaBoost in this presentation
2. Ensemble Summary
https://www.gormanalysis.com/blog/guide-to-model-stacking-i-e-meta-ensembling/
[ Stacking ]
• Involves training a learning algorithm to combine the predictions of several other learning algorithms.
• First, all of the other algorithms are trained on the available data; then a combiner algorithm is trained to make a final prediction using all the predictions of the other algorithms as additional inputs.
• Example : 4 people (Bob, Kate, Mark, Sue) threw 187 darts at a board !
• 150 samples (train data) + 27 samples (test data)
2. Ensemble Summary
https://www.gormanalysis.com/blog/guide-to-model-stacking-i-e-meta-ensembling/
[ Stacking ]
• Fit each base model (M1, M2) to the full training dataset
- i.e. test_meta’s M1 and M2 columns are filled using base models trained on all of the train folds
• Fit a new model (the stacking model) on these predictions ! Optionally, include other features from the original training dataset or engineered features
• The main point to take home is that we are using the predictions of the base models as features (i.e. meta features); a code sketch follows below
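scikit-learn's StackingClassifier automates a version of this procedure (a hedged sketch of my own, not the manual fold-by-fold recipe from the gormanalysis post): out-of-fold predictions of the base models become the meta features for a final combiner model.

```python
# Stacking: base models (here playing the role of M1, M2) + a combiner model
# trained on their cross-validated predictions (the meta features).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
stack = StackingClassifier(
    estimators=[("m1", RandomForestClassifier(n_estimators=100, random_state=0)),
                ("m2", KNeighborsClassifier())],
    final_estimator=LogisticRegression(max_iter=1000),
)
print("CV accuracy:", cross_val_score(stack, X, y, cv=5).mean())
```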
3. Random Forest
• Before we understand XGBoost, we have to know ‘Gradient Boosting’...
• Before we understand ‘Gradient Boosting’, we have to know ‘Boosting’, ‘Adaboost’, and ‘Random Forest’
https://www.youtube.com/watch?v=J4Wdy0Wc_xQ&t=0s
Step 1) Create a bootstrapped dataset
Randomly selected with replacement !
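A tiny numpy sketch of step 1 (my own illustration with a toy dataset): draw row indices with replacement, so some rows repeat and some are left out.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))              # toy dataset: 6 samples, 3 features
y = rng.integers(0, 2, size=6)

idx = rng.choice(len(X), size=len(X), replace=True)   # bootstrap: sample with replacement
X_boot, y_boot = X[idx], y[idx]
print("rows drawn (duplicates allowed):", idx)
```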
3. Random Forest
Step 2) Create a decision tree using the bootstrapped dataset !
Only consider a random subset of variables at each step !
You’d do this 100’s of times !
+ The blue node can be made by 2 or more variables
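In scikit-learn terms, these two ingredients map onto `n_estimators` (build hundreds of trees) and `max_features` (consider only a random subset of variables at each split); a hedged sketch of my own:

```python
# Random forest: many trees on bootstrap samples, random feature subset per split.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(n_estimators=300, max_features="sqrt", random_state=0)
forest.fit(X, y)
```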
3. Random Forest
Step 3) How do we use it ?
- Run a new sample down each tree and record its vote (YES / NO), etc.
- The majority vote wins : YES !
- Bootstrapping the data plus using the aggregate (of the votes) to make a decision is called ‘Bagging’
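A small sketch of the voting step (my own illustration with made-up votes): collect every tree's prediction for the new sample and take the majority class.

```python
import numpy as np

tree_votes = np.array([1, 1, 0, 1, 1, 0, 1])   # hypothetical votes of 7 trees (1 = YES)
majority = np.bincount(tree_votes).argmax()     # aggregate the votes
print("ensemble says:", "YES" if majority == 1 else "NO")
```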
3. Random Forest
Step 4) How do we know if it’s any good ?
- The out-of-bag (OOB) dataset is composed of the samples that are not in the bootstrapped dataset
- Run each OOB sample through the trees that were built without it and tally their votes (Yes, Yes, NO, ...)
- We can also get an OOB error from this step (the proportion of OOB samples that are classified incorrectly)
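scikit-learn exposes this directly through the `oob_score` option (a hedged sketch of my own; the video does the OOB bookkeeping by hand):

```python
# Each sample is scored only by the trees that did NOT see it in their bootstrap
# sample; 1 - oob_score_ is the out-of-bag (OOB) error estimate.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=0)
forest.fit(X, y)
print("OOB error:", 1 - forest.oob_score_)
```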
4. Adaboost
• Before we read the XGBoost theory, recall Boosting !
- Converts weak learners to strong ones / sequential model / random sampling with replacement
[Timeline]
- 1988 : “Thoughts on Hypothesis Boosting”
- 1995 : AdaBoost (“A decision-theoretic generalization of on-line learning and an application to boosting”)
- Generalization of AdaBoost as Gradient Boosting
- 2016 : XGBoost
https://www.slideshare.net/freepsw/boosting-bagging-vs-boosting
https://www.youtube.com/watch?v=LsK-xG1cLYA
4. Adaboost
• In a random forest, there is no predetermined maximum depth
• In contrast, in a forest of trees made with AdaBoost, each tree is a STUMP :
- it uses one variable (feature) to make a single decision
- therefore its accuracy is low on its own (it is a weak learner)
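In code, a stump is simply a depth-1 decision tree, and that is also the default base learner of scikit-learn's AdaBoostClassifier (a hedged sketch of my own, not from the video):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

stump = DecisionTreeClassifier(max_depth=1)                 # one split on one feature: a weak learner
ada = AdaBoostClassifier(n_estimators=50, random_state=0)   # sequentially combines many such stumps

print("single stump CV accuracy:", cross_val_score(stump, X, y, cv=5).mean())
print("AdaBoost CV accuracy:    ", cross_val_score(ada, X, y, cv=5).mean())
```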
4. Adaboost
• As you know, the sequence is important in AdaBoost
- the next stump is made under the influence of the previous stump
- i.e. the errors that the second stump makes influence how the third stump is made
• Sample weight initialization : $w_i = \frac{1}{\#\,\text{total samples}}$
• Let’s start !
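A one-line sketch of the initialization (my own; the 8 samples match the 1/8 weights used on the following slides):

```python
import numpy as np

n_samples = 8
w = np.full(n_samples, 1.0 / n_samples)   # every sample starts with weight 1/8
print(w)
```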
4. Adaboost
1) Make the first stump / use the Gini index
2) Find the errors (misclassified samples) / determine the Amount of Say
3) Give new weights to (modify the weights of) the misclassified samples
Gini index = probability of misclassification
- Gini is 0 for a node that is all signal or all background
- $\mathrm{Gini} = \left(\sum_{i=1}^{n} W_i\right) P(1 - P)$
Gini for the three candidate stumps (sum of $P(1-P)$ over the two leaves of each stump) :
- $\frac{3}{5}\cdot\frac{2}{5} + \frac{2}{3}\cdot\frac{1}{3} = 0.24 + 0.22 = 0.46$
- $\frac{3}{6}\cdot\frac{3}{6} + \frac{1}{2}\cdot\frac{1}{2} = 0.25 + 0.25 = 0.50$
- $\frac{3}{3}\cdot\frac{0}{3} + \frac{4}{5}\cdot\frac{1}{5} = 0 + 0.16 = 0.16$ (lowest Gini, so this split becomes the first stump)
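A short sketch (my own) that recomputes these sums; the (yes, no) counts per leaf are read off the fractions above, and swapping which class counts as "yes" does not change $P(1-P)$:

```python
# Impurity of a candidate stump: sum over its two leaves of P * (1 - P),
# where P is the fraction of one class in that leaf.
def stump_impurity(leaves):
    return sum(p * (1 - p) for p in (yes / (yes + no) for yes, no in leaves))

candidates = {
    "stump 1": [(3, 2), (2, 1)],   # leaf (yes, no) counts -> 0.46
    "stump 2": [(3, 3), (1, 1)],   # -> 0.50
    "stump 3": [(3, 0), (1, 4)],   # -> 0.16, the purest split
}
for name, leaves in candidates.items():
    print(name, round(stump_impurity(leaves), 2))
```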
4. Adaboost
1) Make the first stump / use the Gini index
2) Find the errors (misclassified samples) / determine the Amount of Say
3) Give new weights to (modify the weights of) the misclassified samples
$\text{Total Error} = \sum \text{Error} = \frac{1}{8}$

Amount of Say : $\alpha_m = \frac{1}{2}\log\left(\frac{1 - \text{Total Error}}{\text{Total Error}}\right)$, where $\text{Total Error} = \sum_{y_i \neq k_m(x_i)} w_i^{(m)} \Big/ \sum_{i=1}^{N} w_i^{(m)}$

$\alpha_m = \frac{1}{2}\log\left(\frac{1 - 1/8}{1/8}\right) = \frac{1}{2}\log 7 = \mathbf{0.97}$
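A two-line numeric check of this value (my own), using the natural log as the slide does:

```python
import numpy as np

total_error = 1 / 8
amount_of_say = 0.5 * np.log((1 - total_error) / total_error)   # 0.5 * log(7)
print(round(amount_of_say, 2))                                   # 0.97
```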
4. Adaboost
Amount of Say : $\alpha_m = \frac{1}{2}\log\left(\frac{1 - \text{Total Error}}{\text{Total Error}}\right)$, where $\text{Total Error} = \sum_{y_i \neq k_m(x_i)} w_i^{(m)} \Big/ \sum_{i=1}^{N} w_i^{(m)}$
1) Make the first stump / use the Gini index
2) Find the errors (misclassified samples) / determine the Amount of Say
3) Give new weights to (modify the weights of) the misclassified samples
[ Amount of Say graph ]
- Error ↓ ⇒ Amount of Say ↑
- Error = 1/2 ⇒ Amount of Say = 0
- Error ↑ ⇒ Amount of Say ↓ (it becomes negative for Error > 1/2)
4. Adaboost
1) Make the first stump / use the Gini index
2) Find the errors (misclassified samples) / determine the Amount of Say
3) Give new weights to (modify the weights of) the misclassified samples
• Now we look at how to modify the sample weights !
4. Adaboost
1) Make the first stump / use the Gini index
2) Find the errors (misclassified samples) / determine the Amount of Say
3) Give new weights to (modify the weights of) the misclassified samples
$\text{New sample weight} = \text{sample weight} \times e^{-\text{Amount of Say}}$ (for correctly classified samples)
$\text{New sample weight} = \text{sample weight} \times e^{+\text{Amount of Say}}$ (for misclassified samples)
4. Adaboost
1) Make the first stump / use the Gini index
2) Find the errors (misclassified samples) / determine the Amount of Say
3) Give new weights to (modify the weights of) the misclassified samples
• Normalize the new sample weights so that their sum = 1
• The weighted Gini index will then put more emphasis on correctly classifying this sample (the one misclassified by the previous stump); a code sketch of the update follows below
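A numpy sketch (my own) of the whole update for the 8-sample example: the misclassified sample is scaled up by $e^{+0.97}$, the rest are scaled down by $e^{-0.97}$, and everything is renormalized to sum to 1. The resulting weights (roughly 0.07 for each correctly classified sample and roughly 0.5 for the misclassified one) are what the cumulative ranges on the next slide are built from.

```python
import numpy as np

n = 8
w = np.full(n, 1.0 / n)                    # initial sample weights
amount_of_say = 0.97                       # from the Amount of Say slide
misclassified = np.zeros(n, dtype=bool)
misclassified[3] = True                    # assume the 4th row was the error

w *= np.where(misclassified, np.exp(amount_of_say), np.exp(-amount_of_say))
w /= w.sum()                               # normalize so the weights sum to 1
print(np.round(w, 2))                      # ~[0.07 0.07 0.07 0.5 0.07 0.07 0.07 0.07]
```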
4. Adaboost
• Let’s think about the modified weights. How do they change the learning method ?
- we can make a new collection of samples that contains duplicate copies of the samples, drawn according to the new weights :
0.00 – 0.07 : 1st row sample
0.07 – 0.14 : 2nd row sample
0.14 – 0.21 : 3rd row sample
0.21 – 0.70 : 4th row sample
...
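A sketch of that resampling step (my own): draw a new collection of the same size, where each draw lands on row i with probability equal to its new weight, which is exactly what picking uniform random numbers against the cumulative ranges above does.

```python
import numpy as np

rng = np.random.default_rng(0)
new_weights = np.array([0.07, 0.07, 0.07, 0.49, 0.07, 0.07, 0.07, 0.07])
p = new_weights / new_weights.sum()        # make the probabilities sum exactly to 1

# Rows with large weights (the previously misclassified 4th row) tend to be duplicated.
idx = rng.choice(len(p), size=len(p), p=p)
print("rows in the new collection:", idx)
```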