Machine Learning Specialization
Boosting
Emily Fox & Carlos Guestrin
Machine Learning Specialization
University of Washington
©2015-2016 Emily Fox & Carlos Guestrin
Machine Learning Specialization2
Simple (weak) classifiers are good!
©2015-2016 Emily Fox & Carlos Guestrin
Logistic regression w. simple features
Shallow decision trees
Decision stumps, e.g., Income>$100K? (Yes → Safe, No → Risky)
Low variance. Learning is fast! But high bias…
Machine Learning Specialization3
Finding a classifier that’s just right
©2015-2016 Emily Fox & Carlos Guestrin
[Plot: classification error vs. model complexity, showing the train error and true error curves — a weak learner sits at low complexity; we need a stronger learner]
Option 1: add more features or depth
Option 2: ?????
Machine Learning Specialization4
Boosting question
“Can a set of weak learners be combined to
create a stronger learner?” Kearns and Valiant (1988)
Yes! Schapire (1990)
Boosting
Amazing impact: simple approach, widely used in industry, wins most Kaggle competitions
©2015-2016 Emily Fox & Carlos Guestrin
Machine Learning Specialization
Ensemble classifier
©2015-2016 Emily Fox & Carlos Guestrin
Machine Learning Specialization6
A single classifier
Input: x → Classifier → Output: ŷ = f(x)
-  Either +1 or -1
©2015-2016 Emily Fox & Carlos Guestrin
Example (decision stump): Income>$100K? (Yes → Safe, No → Risky)
Machine Learning Specialization7 ©2015-2016 Emily Fox & Carlos Guestrin
Income>$100K? (Yes → Safe, No → Risky)
Machine Learning Specialization8 ©2015-2016 Emily Fox & Carlos Guestrin
Ensemble methods: Each classifier “votes” on the prediction
xi = (Income=$120K, Credit=Bad, Savings=$50K, Market=Good)
f1: Income>$100K? (Yes → Safe, No → Risky) → f1(xi) = +1
f2: Credit history? (Good → Safe, Bad → Risky) → f2(xi) = -1
f3: Savings>$100K? (Yes → Safe, No → Risky) → f3(xi) = -1
f4: Market conditions? (Good → Safe, Bad → Risky) → f4(xi) = +1
Combine? Ensemble model with learned coefficients:
F(xi) = sign(w1 f1(xi) + w2 f2(xi) + w3 f3(xi) + w4 f4(xi))
Machine Learning Specialization10 ©2015-2016 Emily Fox & Carlos Guestrin
Prediction with ensemble
Coefficients: w1 = 2, w2 = 1.5, w3 = 1.5, w4 = 0.5
f1: Income>$100K? (Yes → Safe, No → Risky) → f1(xi) = +1
f2: Credit history? (Good → Safe, Bad → Risky) → f2(xi) = -1
f3: Savings>$100K? (Yes → Safe, No → Risky) → f3(xi) = -1
f4: Market conditions? (Good → Safe, Bad → Risky) → f4(xi) = +1
F(xi) = sign(w1 f1(xi) + w2 f2(xi) + w3 f3(xi) + w4 f4(xi))
= sign(2·(+1) + 1.5·(-1) + 1.5·(-1) + 0.5·(+1)) = sign(-0.5) = -1 → Risky
Machine Learning Specialization12
Ensemble classifier in general
•  Goal:
- Predict output y
•  Either +1 or -1
- From input x
•  Learn ensemble model:
- Classifiers: f1(x),f2(x),…,fT(x)
- Coefficients: ŵ1,ŵ2,…,ŵT
•  Prediction:
©2015-2016 Emily Fox & Carlos Guestrin
ŷ = sign( Σ_{t=1}^T ŵt ft(x) )
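The prediction rule above is just the sign of a weighted vote. Here is a minimal sketch in Python, assuming each weak classifier is a function returning +1 or -1 (the helper name predict_ensemble and the toy stumps are illustrative, not from the course):

```python
def predict_ensemble(classifiers, coefficients, x):
    """Return sign( sum_t w_t * f_t(x) ) as +1 or -1."""
    score = sum(w * f(x) for w, f in zip(coefficients, classifiers))
    return +1 if score > 0 else -1

# Toy weak classifiers mirroring the slides' decision stumps (hand-built, not learned here)
f1 = lambda x: +1 if x["income"] > 100 else -1      # Income > $100K?
f2 = lambda x: +1 if x["credit"] == "good" else -1  # Credit history?
f3 = lambda x: +1 if x["savings"] > 100 else -1     # Savings > $100K?
f4 = lambda x: +1 if x["market"] == "good" else -1  # Market conditions?

xi = {"income": 120, "credit": "bad", "savings": 50, "market": "good"}
print(predict_ensemble([f1, f2, f3, f4], [2.0, 1.5, 1.5, 0.5], xi))
# score = 2*(+1) + 1.5*(-1) + 1.5*(-1) + 0.5*(+1) = -0.5  ->  -1 (Risky)
```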
Machine Learning Specialization
Boosting
©2015-2016 Emily Fox & Carlos Guestrin
Machine Learning Specialization14
Training a classifier
©2015-2016 Emily Fox & Carlos Guestrin
Training data → Learn classifier → f(x) → Predict: ŷ = sign(f(x))
Machine Learning Specialization15
Learning decision stump
©2015-2016 Emily Fox & Carlos Guestrin
Credit Income y
A $130K Safe
B $80K Risky
C $110K Risky
A $110K Safe
A $90K Safe
B $120K Safe
C $30K Risky
C $60K Risky
B $95K Safe
A $60K Safe
A $98K Safe
Income?
- > $100K: 3 Safe, 1 Risky → ŷ = Safe
- ≤ $100K: 4 Safe, 3 Risky → ŷ = Safe
Machine Learning Specialization16
Boosting = Focus learning on “hard” points
©2015-2016 Emily Fox & Carlos Guestrin
Training data → Learn classifier → f(x) → Predict: ŷ = sign(f(x)) → Evaluate: learn where f(x) makes mistakes
Boosting: focus the next classifier on places where f(x) does less well
Machine Learning Specialization17
Learning on weighted data:
More weight on “hard” or more important points
•  Weighted dataset:
- Each xi,yi weighted by αi
•  More important point = higher weight αi
•  Learning:
- Data point i counts as αi data points
•  E.g., αi = 2 ⇒ count the point twice
©2015-2016 Emily Fox & Carlos Guestrin
Machine Learning Specialization18
Learning a decision stump on weighted data
©2015-2016 Emily Fox & Carlos Guestrin
Credit Income y Weight α
A $130K Safe 0.5
B $80K Risky 1.5
C $110K Risky 1.2
A $110K Safe 0.8
A $90K Safe 0.6
B $120K Safe 0.7
C $30K Risky 3
C $60K Risky 2
B $95K Safe 0.8
A $60K Safe 0.7
A $98K Safe 0.9
Income?
- > $100K: total weight of Safe = 2, Risky = 1.2 → ŷ = Safe
- ≤ $100K: total weight of Safe = 3, Risky = 6.5 → ŷ = Risky
Increase weight α of harder/misclassified points
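To make the weighted split concrete, here is a small sketch (the helper name is illustrative; the toy data is copied from the ≤ $100K branch of the table above): each leaf predicts the class carrying the larger total weight, not the larger count.

```python
def weighted_leaf_prediction(labels, alpha):
    """Predict the class whose data points carry more total weight."""
    safe_weight = sum(a for y, a in zip(labels, alpha) if y == "Safe")
    risky_weight = sum(a for y, a in zip(labels, alpha) if y == "Risky")
    return "Safe" if safe_weight >= risky_weight else "Risky"

# ≤ $100K branch of the table above: Safe weights sum to 3.0, Risky weights to 6.5
labels = ["Risky", "Safe", "Risky", "Risky", "Safe", "Safe", "Safe"]
alpha = [1.5, 0.6, 3.0, 2.0, 0.8, 0.7, 0.9]
print(weighted_leaf_prediction(labels, alpha))  # -> "Risky"
```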
Machine Learning Specialization19
Learning from weighted data in general
•  Usually, learning from weighted data
- Data point i counts as αi data points
•  E.g., gradient ascent for logistic
regression:
©2015-2016 Emily Fox & Carlos Guestrin
Unweighted gradient step:
w_j^(t+1) ← w_j^(t) + η Σ_{i=1}^N h_j(xi) ( 1[yi = +1] − P(y = +1 | xi, w^(t)) )
Weighted gradient step — weigh each point's contribution to the sum over data points by αi:
w_j^(t+1) ← w_j^(t) + η Σ_{i=1}^N αi h_j(xi) ( 1[yi = +1] − P(y = +1 | xi, w^(t)) )
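A minimal sketch of one such weighted gradient-ascent step (assuming a NumPy feature matrix; the function name and signature are illustrative, not the course's implementation):

```python
import numpy as np

def weighted_gradient_step(H, y, w, alpha, eta=0.1):
    """One gradient-ascent step for logistic regression with data weights alpha.

    H: (N, D) matrix of features h_j(x_i); y: (N,) array of labels in {+1, -1};
    w: (D,) coefficients; alpha: (N,) data weights."""
    p = 1.0 / (1.0 + np.exp(-H @ w))            # P(y = +1 | x_i, w)
    indicator = (y == +1).astype(float)         # 1[y_i = +1]
    gradient = H.T @ (alpha * (indicator - p))  # weighted sum over data points
    return w + eta * gradient
```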
Machine Learning Specialization20
Boosting = Greedy learning ensembles from data
©2015-2016 Emily Fox & Carlos Guestrin
Training data
Predict ŷ	=	sign(f1(x))	
Learn classifier
f1(x)	
Weighted data
Learn classifier & coefficient
ŵ,f2(x)	
Predict ŷ	=	sign(ŵ1	f1(x)	+	ŵ2	f2(x))	
Higher weight
for points where
f1(x) is wrong
Machine Learning Specialization
AdaBoost
©2015-2016 Emily Fox & Carlos Guestrin
Machine Learning Specialization23
AdaBoost: learning ensemble
[Freund & Schapire 1999]
•  Start same weight for all points: αi = 1/N
•  For t = 1,…,T
- Learn ft(x) with data weights αi
- Compute coefficient ŵt
- Recompute weights αi
•  Final model predicts by:
©2015-2016 Emily Fox & Carlos Guestrin
ŷ = sign( Σ_{t=1}^T ŵt ft(x) )
Machine Learning Specialization
Computing coefficient ŵt
©2015-2016 Emily Fox & Carlos Guestrin
Machine Learning Specialization26
AdaBoost: Computing coefficient ŵt of classifier ft(x)
•  ft(x) is good ⇒ ft has low training error
•  Measuring error on weighted data?
- Just the weighted # of misclassified points
©2015-2016 Emily Fox & Carlos Guestrin
Is ft(x) good? Yes → ŵt large. No → ŵt small.
Machine Learning Specialization28
Weighted classification error
©2015-2016 Emily Fox & Carlos Guestrin
Data point (“Sushi was great”, y, α=1.2): hide the label, let the learned classifier predict ŷ, then compare. Correct! Contributes 0 to the weight of mistakes and 1.2 to the weight of correct points.
Data point (“Food was OK”, y, α=0.5): Mistake! Contributes 0.5 to the weight of mistakes and 0 to the weight of correct points.
Machine Learning Specialization29
Weighted classification error
•  Total weight of mistakes: sum of αi over the misclassified points
•  Total weight of all points: sum of αi over all points
•  Weighted error measures the fraction of the total weight that is mistakes:
- weighted_error = (total weight of mistakes) / (total weight of all points)
- Best possible value is 0.0
©2015-2016 Emily Fox & Carlos Guestrin
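A small sketch of that computation (the helper name weighted_error is illustrative); the example reuses the two data points from the previous slide, one correct with α=1.2 and one mistake with α=0.5:

```python
def weighted_error(predictions, labels, alpha):
    """Total weight of mistakes divided by total weight of all points."""
    weight_of_mistakes = sum(a for p, y, a in zip(predictions, labels, alpha) if p != y)
    return weight_of_mistakes / sum(alpha)

# One correct point (weight 1.2) and one mistake (weight 0.5)
print(weighted_error([+1, +1], [+1, -1], [1.2, 0.5]))  # 0.5 / 1.7 ≈ 0.294
```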
Machine Learning Specialization30
AdaBoost: Formula for computing
coefficient ŵt of classifier ft(x)
©2015-2016 Emily Fox & Carlos Guestrin
ŵt = ½ ln( (1 − weighted_error(ft)) / weighted_error(ft) )
where weighted_error(ft) is measured on the training data.
Is ft(x) good? Yes → weighted_error(ft) is small, the ratio (1 − weighted_error(ft)) / weighted_error(ft) is large, and ŵt is large. No → the ratio is close to (or below) 1, and ŵt is small (or negative).
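The formula is a one-liner in code; a hedged sketch (assuming 0 < weighted_error < 1):

```python
import math

def compute_coefficient(weighted_error):
    """AdaBoost coefficient: 0.5 * ln((1 - error) / error)."""
    return 0.5 * math.log((1.0 - weighted_error) / weighted_error)

print(compute_coefficient(0.2))  # ≈ 0.69: good classifier, large positive coefficient
print(compute_coefficient(0.5))  # = 0.0: no better than random, zero coefficient
```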
Machine Learning Specialization31
AdaBoost: learning ensemble
•  Start same weight for all points: αi = 1/N
•  For t = 1,…,T
- Learn ft(x) with data weights αi
- Compute coefficient ŵt
- Recompute weights αi
•  Final model predicts by:
©2015-2016 Emily Fox & Carlos Guestrin
ŷ = sign( Σ_{t=1}^T ŵt ft(x) )
ŵt = ½ ln( (1 − weighted_error(ft)) / weighted_error(ft) )
Machine Learning Specialization
Recompute weights αi
©2015-2016 Emily Fox & Carlos Guestrin
Machine Learning Specialization34
AdaBoost: Updating weights αi based on where
classifier ft(x) makes mistakes
©2015-2016 Emily Fox & Carlos Guestrin
Did ft get xi right? Yes → decrease αi. No → increase αi.
Machine Learning Specialization36
AdaBoost: Formula for
updating weights αi
©2015-2016 Emily Fox & Carlos Guestrin
Did ft get xi right? Multiply αi by:
- Yes (ft(xi) = yi): αi ← αi e^(−ŵt) → weight decreases
- No (ft(xi) ≠ yi): αi ← αi e^(ŵt) → weight increases
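A minimal sketch of this update (helper name assumed, not from the course):

```python
import math

def update_weights(alpha, predictions, labels, w_t):
    """Down-weight correct points by e^(-w_t); up-weight mistakes by e^(w_t)."""
    return [a * math.exp(-w_t if p == y else w_t)
            for a, p, y in zip(alpha, predictions, labels)]
```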
Machine Learning Specialization37
AdaBoost: learning ensemble
•  Start same weight for all points: αi = 1/N
•  For t = 1,…,T
- Learn ft(x) with data weights αi
- Compute coefficient ŵt
- Recompute weights αi
•  Final model predicts by:
©2015-2016 Emily Fox & Carlos Guestrin
ŷ = sign( Σ_{t=1}^T ŵt ft(x) )
ŵt = ½ ln( (1 − weighted_error(ft)) / weighted_error(ft) )
αi ← αi e^(−ŵt), if ft(xi) = yi
αi ← αi e^(ŵt), if ft(xi) ≠ yi
Machine Learning Specialization39
AdaBoost: Normalizing weights αi
©2015-2016 Emily Fox & Carlos Guestrin
If xi is often misclassified, its weight αi gets very large; if xi is often correct, its weight αi gets very small. This can cause numerical instability after many iterations.
Normalize the weights to add up to 1 after every iteration:
αi ← αi / Σ_{j=1}^N αj
Machine Learning Specialization40
AdaBoost: learning ensemble
•  Start same weight for all points: αi = 1/N
•  For t = 1,…,T
- Learn ft(x) with data weights αi
- Compute coefficient ŵt
- Recompute weights αi
- Normalize weights αi
•  Final model predicts by:
©2015-2016 Emily Fox & Carlos Guestrin
ŷ = sign( Σ_{t=1}^T ŵt ft(x) )
ŵt = ½ ln( (1 − weighted_error(ft)) / weighted_error(ft) )
αi ← αi e^(−ŵt), if ft(xi) = yi
αi ← αi e^(ŵt), if ft(xi) ≠ yi
αi ← αi / Σ_{j=1}^N αj
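Putting the steps together, here is a compact sketch of the full loop above (the weak-learner training routine fit_weighted_stump is a hypothetical callback, not shown; interfaces follow the earlier sketches, and the sketch assumes 0 < weighted error < 1 at every round):

```python
import math

def adaboost(X, y, T, fit_weighted_stump):
    """Train T rounds of AdaBoost; returns a prediction function x -> +1/-1."""
    N = len(y)
    alpha = [1.0 / N] * N                        # start: same weight for all points
    classifiers, coefficients = [], []
    for t in range(T):
        f_t = fit_weighted_stump(X, y, alpha)    # learn f_t(x) with data weights alpha
        pred = [f_t(x) for x in X]
        err = sum(a for a, p, yi in zip(alpha, pred, y) if p != yi) / sum(alpha)
        w_t = 0.5 * math.log((1.0 - err) / err)  # compute coefficient
        alpha = [a * math.exp(-w_t if p == yi else w_t)   # recompute weights
                 for a, p, yi in zip(alpha, pred, y)]
        total = sum(alpha)
        alpha = [a / total for a in alpha]       # normalize weights
        classifiers.append(f_t)
        coefficients.append(w_t)

    def predict(x):                              # final model: sign of the weighted vote
        score = sum(w * f(x) for w, f in zip(coefficients, classifiers))
        return +1 if score > 0 else -1
    return predict
```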
Machine Learning Specialization
AdaBoost example
©2015-2016 Emily Fox & Carlos Guestrin
Machine Learning Specialization43
t=1: Just learn a classifier on original data
©2015-2016 Emily Fox & Carlos Guestrin
Learned decision stump f1(x)	Original data
Machine Learning Specialization44
Updating weights αi
©2015-2016 Emily Fox & Carlos Guestrin
Learned decision stump f1(x)	 New data weights αi 	
Boundary
Increase weight αi
of misclassified points
Machine Learning Specialization45
t=2: Learn classifier on weighted data
Learned decision stump f2(x)
on weighted data	
Weighted data: using αi
chosen in previous iteration
f1(x)	
©2015-2016 Emily Fox & Carlos Guestrin
Machine Learning Specialization46
Ensemble becomes weighted
sum of learned classifiers
©2015-2016 Emily Fox & Carlos Guestrin
Ensemble = ŵ1 f1(x) + ŵ2 f2(x), with ŵ1 = 0.61 and ŵ2 = 0.53
Machine Learning Specialization47 ©2015-2016 Emily Fox & Carlos Guestrin
Decision boundary of ensemble classifier
after 30 iterations
training_error = 0
Machine Learning Specialization
AdaBoost summary
©2015-2016 Emily Fox & Carlos Guestrin
Machine Learning Specialization50
AdaBoost: learning ensemble
•  Start same weight for all points: αi = 1/N
•  For t = 1,…,T
- Learn ft(x) with data weights αi
- Compute coefficient ŵt
- Recompute weights αi
- Normalize weights αi
•  Final model predicts by:
©2015-2016 Emily Fox & Carlos Guestrin
ŷ = sign( Σ_{t=1}^T ŵt ft(x) )
ŵt = ½ ln( (1 − weighted_error(ft)) / weighted_error(ft) )
αi ← αi e^(−ŵt), if ft(xi) = yi
αi ← αi e^(ŵt), if ft(xi) ≠ yi
αi ← αi / Σ_{j=1}^N αj
Machine Learning Specialization
Boosted decision stumps
©2015-2016 Emily Fox & Carlos Guestrin
Machine Learning Specialization52
Boosted decision stumps
•  Start same weight for all points: αi = 1/N
•  For t = 1,…,T
- Learn ft(x): pick decision stump with lowest
weighted training error according to αi
	
- Compute coefficient ŵt
- Recompute weights αi
- Normalize weights αi
•  Final model predicts by:
©2015-2016 Emily Fox & Carlos Guestrin
ŷ = sign( Σ_{t=1}^T ŵt ft(x) )
Machine Learning Specialization53
Finding best next decision stump ft(x)
©2015-2016 Emily Fox & Carlos Guestrin
Consider splitting on each feature (Income>$100K?, Credit history?, Savings>$100K?, Market conditions?) and compute each candidate stump's weighted_error; here the income stump has weighted_error = 0.2 and the others have 0.35, 0.3, and 0.4.
Pick the stump with the lowest weighted training error:
ft = Income>$100K? (Yes → Safe, No → Risky), weighted_error(ft) = 0.2
ŵt = ½ ln( (1 − weighted_error(ft)) / weighted_error(ft) ) = ½ ln(0.8 / 0.2) ≈ 0.69
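A sketch of that selection step (candidate stumps are assumed to be prediction functions x → ±1, as in the earlier sketches):

```python
def best_stump(candidates, X, y, alpha):
    """Return the candidate classifier with the lowest weighted training error."""
    def werr(f):
        return sum(a for a, x, yi in zip(alpha, X, y) if f(x) != yi) / sum(alpha)
    return min(candidates, key=werr)
```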
Machine Learning Specialization54
Boosted decision stumps
•  Start same weight for all points: αi = 1/N
•  For t = 1,…,T
- Learn ft(x): pick decision stump with lowest
weighted training error according to αi
	
- Compute coefficient ŵt
- Recompute weights αi
- Normalize weights αi
•  Final model predicts by:
©2015-2016 Emily Fox & Carlos Guestrin
ŷ = sign( Σ_{t=1}^T ŵt ft(x) )
Machine Learning Specialization55
Updating weights αi
©2015-2016 Emily Fox & Carlos Guestrin
αi ← αi e^(−ŵt) = αi e^(−0.69) = αi / 2, if ft(xi) = yi
αi ← αi e^(ŵt) = αi e^(0.69) = 2αi, if ft(xi) ≠ yi
Credit  Income  y      ŷ      Previous weight α  New weight α
A       $130K   Safe   Safe   0.5                0.5/2 = 0.25
B       $80K    Risky  Risky  1.5                0.75
C       $110K   Risky  Safe   1.5                2 * 1.5 = 3
A       $110K   Safe   Safe   2                  1
A       $90K    Safe   Risky  1                  2
B       $120K   Safe   Safe   2.5                1.25
C       $30K    Risky  Risky  3                  1.5
C       $60K    Risky  Risky  2                  1
B       $95K    Safe   Risky  0.5                1
A       $60K    Safe   Risky  1                  2
A       $98K    Safe   Risky  0.5                1
Stump used: Income>$100K? (Yes → Safe, No → Risky)
Machine Learning Specialization
Boosting convergence & overfitting
©2015-2016 Emily Fox & Carlos Guestrin
Machine Learning Specialization57
Boosting question revisited
“Can a set of weak learners be combined to
create a stronger learner?” Kearns and Valiant (1988)
Yes! Schapire (1990)
Boosting
©2015-2016 Emily Fox & Carlos Guestrin
Machine Learning Specialization58
After some iterations,
training error of boosting goes to zero!!!
©2015-2016 Emily Fox & Carlos Guestrin
[Plot: training error vs. iterations of boosting, for boosted decision stumps on a toy dataset]
Training error of ensemble of 30 decision stumps = 0%
Training error of 1 decision stump = 22.5%
Machine Learning Specialization60
AdaBoost Theorem
Under some technical conditions…
Training error of
boosted classifier → 0
as T→∞	
©2015-2016 Emily Fox & Carlos Guestrin
[Plot: training error vs. iterations of boosting]
Training error may oscillate a bit, but will generally decrease and eventually become 0!
Machine Learning Specialization61
Condition of AdaBoost Theorem
Under some technical conditions…
Training error of
boosted classifier → 0
as T→∞	
©2015-2016 Emily Fox & Carlos Guestrin
Condition = at every t, we can find a weak learner with weighted_error(ft) < 0.5.
This is not always possible. Extreme example: no classifier can separate a +1 sitting exactly on top of a -1.
Nonetheless, boosting often yields great training error.
Machine Learning Specialization62 ©2015-2016 Emily Fox & Carlos Guestrin
Decision trees on loan data: 8% training error, 39% test error → overfitting.
Boosted decision stumps on loan data: 28.5% training error, 32% test error → better fit & lower test error.
Machine Learning Specialization63
Boosting tends to be robust to overfitting
©2015-2016 Emily Fox & Carlos Guestrin
Test set performance stays about constant for many iterations
⇒ less sensitive to the choice of T
Machine Learning Specialization65
But boosting will eventually overfit,
so must choose max number of components T
©2015-2016 Emily Fox & Carlos Guestrin
Best test error around 31%
Test error eventually
increases to 33% (overfits)
Machine Learning Specialization66
How do we decide when to stop boosting? Choosing T is like selecting parameters in other ML approaches (e.g., λ in regularization):
- Not on training data: not useful, since training error keeps improving as T increases
- Never ever ever ever on test data
- Validation set: if the dataset is large
- Cross-validation: for smaller datasets
©2015-2016 Emily Fox & Carlos Guestrin
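A sketch of the validation-set option (assuming the per-round classifiers and coefficients from the earlier AdaBoost sketch were kept; the helper name is illustrative): train for a generous number of rounds, then keep the ensemble prefix with the lowest validation error.

```python
def choose_T(classifiers, coefficients, X_val, y_val):
    """Pick the number of boosting rounds T with the lowest validation error."""
    best_T, best_err = 0, float("inf")
    for T in range(1, len(classifiers) + 1):
        def predict(x, T=T):
            score = sum(w * f(x) for w, f in zip(coefficients[:T], classifiers[:T]))
            return +1 if score > 0 else -1
        err = sum(predict(x) != yi for x, yi in zip(X_val, y_val)) / len(y_val)
        if err < best_err:
            best_T, best_err = T, err
    return best_T
```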
Machine Learning Specialization
Summary of boosting
©2015-2016 Emily Fox & Carlos Guestrin
Machine Learning Specialization68
Variants of boosting and related algorithms
©2015-2016 Emily Fox & Carlos Guestrin
There are hundreds of variants of boosting; the most important:
•  Gradient boosting: like AdaBoost, but useful beyond basic classification
Many other approaches learn ensembles; the most important:
•  Random forests: bagging — pick random subsets of the data, learn a tree on each subset, average the predictions
- Simpler than boosting & easier to parallelize
- Typically higher error than boosting for the same number of trees (# iterations T)
Machine Learning Specialization69
Impact of boosting
(spoiler alert... HUGE IMPACT)
•  Extremely useful in computer vision: standard approach for face detection, for example
•  Used by most winners of ML competitions (Kaggle, KDD Cup, …): malware classification, credit fraud detection, ads click-through rate estimation, sales forecasting, ranking webpages for search, Higgs boson detection, …
•  Most deployed ML systems use model ensembles: coefficients chosen manually, with boosting, with bagging, or others
©2015-2016 Emily Fox & Carlos Guestrin
Amongst the most useful ML methods ever created
Machine Learning Specialization70
What you can do now…
•  Identify the notion of an ensemble classifier
•  Formalize ensembles as the weighted
combination of simpler classifiers
•  Outline the boosting framework –
sequentially learn classifiers on weighted data
•  Describe the AdaBoost algorithm
-  Learn each classifier on weighted data
-  Compute coefficient of classifier
-  Recompute data weights
-  Normalize weights
•  Implement AdaBoost to create an ensemble of
decision stumps
•  Discuss convergence properties of AdaBoost &
how to pick the maximum number of iterations T
©2015-2016 Emily Fox & Carlos Guestrin
