An Introduction to boosting

An Introduction to Boosting Yoav Freund Banter Inc.

Plan of talk ,[object Object],[object Object],[object Object],[object Object],[object Object]

Toy Example ,[object Object],[object Object],[object Object],Human Voice Male Female

Generative modeling Voice Pitch Probability mean1 var1 mean2 var2

Discriminative approach Voice Pitch No. of mistakes

Ill-behaved data Voice Pitch Probability mean1 mean2 No. of mistakes

Traditional Statistics vs. Machine Learning Data Estimated world state Predictions Actions Statistics Decision Theory Machine Learning

Comparison of methodologies Mismatch problems Performance measure Goal Model Misclassifications Outliers Misclassification rate Likelihood Classification rule Probability estimates Discriminative Generative

A weak learner weak learner A weak rule h h Weighted training set (x1, y1 , w1 ),(x2, y2 , w2 ) … (xn, yn , wn ) instances x1,x2,x3,…,xn labels y1,y2,y3,…,yn The weak requirement: Feature vector Binary label Non-negative weights sum to 1

The boosting process Final rule: Sign [ ] h1   h2   hT   weak learner h1 (x1,y1, 1/n ), … (xn,yn, 1/n ) weak learner h2 (x1,y1, w1 ), … (xn,yn, wn ) h3 (x1,y1, w1 ), … (xn,yn, wn ) h4 (x1,y1, w1 ), … (xn,yn, wn ) h5 (x1,y1, w1 ), … (xn,yn, wn ) h6 (x1,y1, w1 ), … (xn,yn, wn ) h7 (x1,y1, w1 ), … (xn,yn, wn ) h8 (x1,y1, w1 ), … (xn,yn, wn ) h9 (x1,y1, w1 ), … (xn,yn, wn ) hT (x1,y1, w1 ), … (xn,yn, wn )

Adaboost ,[object Object],[object Object],[object Object],[object Object],[object Object]

Main property of adaboost ,[object Object],[object Object]

Adaboost as gradient descent ,[object Object],[object Object],[object Object],[object Object]

Margins view Project Prediction = + - + + + + + + - - - - - - - w Correct Mistakes Margin Cumulative # examples Mistakes Correct Margin =

Adaboost et al. Loss Correct Margin Mistakes Brownboost Logitboost Adaboost = 0-1 loss

One coordinate at a time ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

What is a good weak learner? ,[object Object],[object Object],[object Object],[object Object],[object Object]

Alternating Trees Joint work with Llew Mason

Decision Trees X>3 Y>5 -1 +1 -1 +1 -1 -1 no yes yes no X Y 3 5

Decision tree as a sum -0.2 -0.2 X Y Y>5 +0.2 -0.3 yes no X>3 -0.1 no yes +0.1 +0.1 -0.1 +0.2 -0.3 +1 -1 -1 sign

An alternating decision tree -0.2 +0.7 X Y +0.1 -0.1 +0.2 -0.3 sign Y>5 +0.2 -0.3 yes no X>3 -0.1 no yes +0.1 Y<1 0.0 no yes +0.7 +1 -1 -1 +1

Example: Medical Diagnostics ,[object Object],[object Object],[object Object],[object Object]

Adtree for Cleveland heart-disease diagnostics problem

Cross-validated accuracy 0.8% 16.5% 16 Boost Stumps 0.5% 20.2% 446 C5.0 + boosting 0.5% 27.2% 27 C5.0 0.6% 17.0% 6 ADtree Test error variance Average test error Number of splits Learning algorithm

Curious phenomenon Boosting decision trees Using < 10,000 training examples we fit > 2,000,000 parameters

Explanation using margins Margin 0-1 loss

Explanation using margins Margin 0-1 loss No examples with small margins!!

Theorem For any convex combination and any threshold No dependence on number of weak rules that are combined!!! Schapire, Freund, Bartlett & Lee Annals of stat. 98 Probability of mistake Fraction of training example with small margin Size of training sample VC dimension of weak rules

An Introduction to boosting

Recommended

Recommended

More Related Content

What's hot

What's hot (19)

Viewers also liked

Viewers also liked (20)

Similar to An Introduction to boosting

Similar to An Introduction to boosting (20)

More from butest

More from butest (20)

An Introduction to boosting

Editor's Notes