SlideShare a Scribd company logo
1 of 34
1
Sparse vs. Ensemble Approaches
to Supervised Learning
Greg Grudic
Modified by Longin Jan Latecki
Some slides are by Piyush Rai
Intro AI Ensembles
2
Goal of Supervised Learning?
• Minimize the probability of model
prediction errors on future data
• Two Competing Methodologies
– Build one really good model
• Traditional approach
– Build many models and average the results
• Ensemble learning (more recent)
Intro AI Ensembles
3
The Single Model Philosophy
• Motivation: Occam’s Razor
– “one should not increase, beyond what is
necessary, the number of entities required to
explain anything”
• Infinitely many models can explain any
given dataset
– Might as well pick the smallest one…
Intro AI Ensembles
4
Which Model is Smaller?
• In this case
• It’s not always easy to define small!
( )
( )
1
3 5 7
2
ˆ sin( )
ˆ ...
3! 5! 7!
y f x x
or
x x x
y f x x
= =
= = - + - +
Intro AI Ensembles
5
Exact Occam’s Razor Models
• Exact approaches find optimal solutions
• Examples:
– Support Vector Machines
• Find a model structure that uses the smallest
percentage of training data (to explain the rest of it).
– Bayesian approaches
• Minimum description length
Intro AI Ensembles
6
How Do Support Vector Machines
Define Small?
Maximized
Margin
Minimize the number
of Support Vectors!
Intro AI Ensembles
7
Approximate Occam’s Razor
Models
• Approximate solutions use a greedy search
approach which is not optimal
• Examples
– Kernel Projection Pursuit algorithms
• Find a minimal set of kernel projections
– Relevance Vector Machines
• Approximate Bayesian approach
– Sparse Minimax Probability Machine Classification
• Find a minimum set of kernels and features
Intro AI Ensembles
8
Other Single Models: Not
Necessarily Motivated by Occam’s
Razor
• Minimax Probability Machine (MPM)
• Trees
– Greedy approach to sparseness
• Neural Networks
• Nearest Neighbor
• Basis Function Models
– e.g. Kernel Ridge Regression
Intro AI Ensembles
9
Ensemble Philosophy
• Build many models and combine them
• Only through averaging do we get at the
truth!
• It’s too hard (impossible?) to build a single
model that works best
• Two types of approaches:
– Models that don’t use randomness
– Models that incorporate randomness
Intro AI Ensembles
10
Ensemble Approaches
• Bagging
– Bootstrap aggregating
• Boosting
• Random Forests
– Bagging reborn
Intro AI Ensembles
11
Bagging
• Main Assumption:
– Combining many unstable predictors to produce a
ensemble (stable) predictor.
– Unstable Predictor: small changes in training data
produce large changes in the model.
• e.g. Neural Nets, trees
• Stable: SVM (sometimes), Nearest Neighbor.
• Hypothesis Space
– Variable size (nonparametric):
• Can model any function if you use an appropriate predictor
(e.g. trees)
Intro AI Ensembles
12
The Bagging Algorithm
For
• Obtain bootstrap sample from the
training data
• Build a model from bootstrap data
1:
m M
=
( ) ( )
{ }
1 1
, ,..., ,
N N
D y y
= x x
Given data:
m
D
D
( )
m
G x m
D
Intro AI Ensembles
13
The Bagging Model
• Regression
• Classification:
– Vote over classifier outputs
( )
1
1
ˆ
M
m
m
y G
M =
= å x
( ) ( )
1 ,..., M
G G
x x
Intro AI Ensembles
14
Bagging Details
• Bootstrap sample of N instances is obtained
by drawing N examples at random, with
replacement.
• On average each bootstrap sample has
63% of instances
– Encourages predictors to have
uncorrelated errors
• This is why it works
Intro AI Ensembles
15
Bagging Details 2
• Usually set
– Or use validation data to pick
• The models need to be unstable
– Usually full length (or slightly pruned) decision
trees.
~ 30
M =
M
( )
m
G x
Intro AI Ensembles
16
Boosting
– Main Assumption:
• Combining many weak predictors (e.g. tree stumps
or 1-R predictors) to produce an ensemble predictor
• The weak predictors or classifiers need to be stable
– Hypothesis Space
• Variable size (nonparametric):
– Can model any function if you use an appropriate
predictor (e.g. trees)
Intro AI Ensembles
17
Commonly Used Weak Predictor
(or classifier)
A Decision Tree Stump (1-R)
Intro AI Ensembles
18
Boosting
Each classifier is
trained from a weighted
Sample of the training
Data
( )
m
G x
Intro AI Ensembles
19
Boosting (Continued)
• Each predictor is created by using a biased
sample of the training data
– Instances (training examples) with high error
are weighted higher than those with lower error
• Difficult instances get more attention
– This is the motivation behind boosting
Intro AI Ensembles
20
Background Notation
• The function is defined as:
• The function is the natural logarithm
( )
1 if is true
0 otherwise
s
I s
ì
ï
ï
= í
ï
ï
î
( )
log x
( )
I s
Intro AI Ensembles
21
The AdaBoost Algorithm
(Freund and Schapire, 1996)
1. Initialize weights
2. For
a) Fit classifier to data using weights
b) Compute
c) Compute
d) Set
1:
m M
=
( ) ( )
{ }
1 1
, ,..., ,
N N
D y y
= x x
Given data:
( )
( )
1
1
N
i i m i
i
m N
i
i
w I y G
err
w
=
=
¹
=
å
å
x
( )
( )
log 1 /
m m m
err err
a = -
( )
( )
exp , 1,...,
i i m i m i
w w I y G i N
a
é ù
¬ ¹ =
ê ú
ë û
x
( ) { }
1,1
m
G Î -
x
1/ , 1,...,
i
w N i N
= =
i
w
Intro AI Ensembles
22
The AdaBoost Model
( )
1
ˆ sgn
M
m m
m
y G
a
=
é ù
ê ú
=
ê ú
ë û
å x
AdaBoost is NOT used for Regression!
Intro AI Ensembles
23
The Updates in Boosting
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
-5
-4
-3
-2
-1
0
1
2
3
4
5
Alpha for Boosting
errm

m
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
0
10
20
30
40
50
60
70
80
90
100
Re-weighting Factor for Boosting
errm
w
*
exp(

m
)
Intro AI Ensembles
24
Boosting Characteristics
Simulated data: test error
rate for boosting with
stumps, as a function of
the number of iterations.
Also shown are the test
error rate for a single
stump, and a 400 node
tree.
Intro AI Ensembles
25
•Misclassification
•Exponential (Boosting)
•Binomial Deviance
(Cross Entropy)
•Squared Error
•Support Vectors
Loss Functions for
( )
( )
sgn
I f y
¹
( )
exp yf
-
{ }
1, 1 ,
y f
Î - + Î Â
( )
( )
log 1 exp 2yf
+ -
( )
2
y f
-
( ) ( )
1 1
yf I yf
- >
g
Intro AI Ensembles
Correct Classification
Incorrect Classification
Intro AI Ensembles 26
Intro AI Ensembles 27
Intro AI Ensembles 28
Intro AI Ensembles 29
Intro AI Ensembles 30
Intro AI Ensembles 31
32
Other Variations of Boosting
• Gradient Boosting
– Can use any cost function
• Stochastic (Gradient) Boosting
– Bootstrap Sample: Uniform random sampling
(with replacement)
– Often outperforms the non-random version
Intro AI Ensembles
33
Gradient Boosting
Intro AI Ensembles
34
Boosting Summary
• Good points
– Fast learning
– Capable of learning any function (given appropriate weak learner)
– Feature weighting
– Very little parameter tuning
• Bad points
– Can overfit data
– Only for binary classification
• Learning parameters (picked via cross validation)
– Size of tree
– When to stop
• Software
– http://www-stat.stanford.edu/~jhf/R-MART.html
Intro AI Ensembles

More Related Content

What's hot

Design and Develop SQL DDL statements which demonstrate the use of SQL objec...
 Design and Develop SQL DDL statements which demonstrate the use of SQL objec... Design and Develop SQL DDL statements which demonstrate the use of SQL objec...
Design and Develop SQL DDL statements which demonstrate the use of SQL objec...bhavesh lande
 
Single image haze removal
Single image haze removalSingle image haze removal
Single image haze removalMohsinGhazi2
 
Neuro-fuzzy systems
Neuro-fuzzy systemsNeuro-fuzzy systems
Neuro-fuzzy systemsSagar Ahire
 
Digital image processing question bank
Digital image processing question bankDigital image processing question bank
Digital image processing question bankYaseen Albakry
 
Expectation Maximization and Gaussian Mixture Models
Expectation Maximization and Gaussian Mixture ModelsExpectation Maximization and Gaussian Mixture Models
Expectation Maximization and Gaussian Mixture Modelspetitegeek
 
Machine learning of structured outputs
Machine learning of structured outputsMachine learning of structured outputs
Machine learning of structured outputszukun
 
Introduction to loaders
Introduction to loadersIntroduction to loaders
Introduction to loadersTech_MX
 
Probabilistic Reasoning
Probabilistic ReasoningProbabilistic Reasoning
Probabilistic ReasoningJunya Tanaka
 
Speaker Recognition using Gaussian Mixture Model
Speaker Recognition using Gaussian Mixture Model Speaker Recognition using Gaussian Mixture Model
Speaker Recognition using Gaussian Mixture Model Saurab Dulal
 
Loader and Its types
Loader and Its typesLoader and Its types
Loader and Its typesParth Dodiya
 
Bayesian Networks - A Brief Introduction
Bayesian Networks - A Brief IntroductionBayesian Networks - A Brief Introduction
Bayesian Networks - A Brief IntroductionAdnan Masood
 

What's hot (20)

Design and Develop SQL DDL statements which demonstrate the use of SQL objec...
 Design and Develop SQL DDL statements which demonstrate the use of SQL objec... Design and Develop SQL DDL statements which demonstrate the use of SQL objec...
Design and Develop SQL DDL statements which demonstrate the use of SQL objec...
 
Cg colg
Cg colgCg colg
Cg colg
 
Truth management system
Truth  management systemTruth  management system
Truth management system
 
Expert System
Expert SystemExpert System
Expert System
 
Single image haze removal
Single image haze removalSingle image haze removal
Single image haze removal
 
Neuro-fuzzy systems
Neuro-fuzzy systemsNeuro-fuzzy systems
Neuro-fuzzy systems
 
Digital image processing question bank
Digital image processing question bankDigital image processing question bank
Digital image processing question bank
 
Expectation Maximization and Gaussian Mixture Models
Expectation Maximization and Gaussian Mixture ModelsExpectation Maximization and Gaussian Mixture Models
Expectation Maximization and Gaussian Mixture Models
 
Deep learning
Deep learningDeep learning
Deep learning
 
Machine learning of structured outputs
Machine learning of structured outputsMachine learning of structured outputs
Machine learning of structured outputs
 
Classification metrics
Classification metrics Classification metrics
Classification metrics
 
PAC Learning
PAC LearningPAC Learning
PAC Learning
 
Single Pass Assembler
Single Pass AssemblerSingle Pass Assembler
Single Pass Assembler
 
Introduction to loaders
Introduction to loadersIntroduction to loaders
Introduction to loaders
 
Probabilistic Reasoning
Probabilistic ReasoningProbabilistic Reasoning
Probabilistic Reasoning
 
Fuzzy c means manual work
Fuzzy c means manual workFuzzy c means manual work
Fuzzy c means manual work
 
Speaker Recognition using Gaussian Mixture Model
Speaker Recognition using Gaussian Mixture Model Speaker Recognition using Gaussian Mixture Model
Speaker Recognition using Gaussian Mixture Model
 
Bayesian learning
Bayesian learningBayesian learning
Bayesian learning
 
Loader and Its types
Loader and Its typesLoader and Its types
Loader and Its types
 
Bayesian Networks - A Brief Introduction
Bayesian Networks - A Brief IntroductionBayesian Networks - A Brief Introduction
Bayesian Networks - A Brief Introduction
 

Similar to Ensemble Learning bagging, boosting and stacking

Summary.ppt
Summary.pptSummary.ppt
Summary.pptbutest
 
in5490-classification (1).pptx
in5490-classification (1).pptxin5490-classification (1).pptx
in5490-classification (1).pptxMonicaTimber
 
Getting started with Machine Learning
Getting started with Machine LearningGetting started with Machine Learning
Getting started with Machine LearningGaurav Bhalotia
 
PPT - AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
PPT - AutoML-Zero: Evolving Machine Learning Algorithms From ScratchPPT - AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
PPT - AutoML-Zero: Evolving Machine Learning Algorithms From ScratchJisang Yoon
 
Experimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles BakerExperimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles BakerDatabricks
 
Cs854 lecturenotes01
Cs854 lecturenotes01Cs854 lecturenotes01
Cs854 lecturenotes01Mehmet Çelik
 
Machine Learning : why we should know and how it works
Machine Learning : why we should know and how it worksMachine Learning : why we should know and how it works
Machine Learning : why we should know and how it worksKevin Lee
 
Machine Learning - Lecture2.pptx
Machine Learning - Lecture2.pptxMachine Learning - Lecture2.pptx
Machine Learning - Lecture2.pptxNsitTech
 
(Machine Learning) Ensemble learning
(Machine Learning) Ensemble learning (Machine Learning) Ensemble learning
(Machine Learning) Ensemble learning Omkar Rane
 
Data Science and Machine Learning with Tensorflow
 Data Science and Machine Learning with Tensorflow Data Science and Machine Learning with Tensorflow
Data Science and Machine Learning with TensorflowShubham Sharma
 
XGBoost: the algorithm that wins every competition
XGBoost: the algorithm that wins every competitionXGBoost: the algorithm that wins every competition
XGBoost: the algorithm that wins every competitionJaroslaw Szymczak
 
Machine Learning Lecture 3 Decision Trees
Machine Learning Lecture 3 Decision TreesMachine Learning Lecture 3 Decision Trees
Machine Learning Lecture 3 Decision Treesananth
 
Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018digitalzombie
 

Similar to Ensemble Learning bagging, boosting and stacking (20)

Summary.ppt
Summary.pptSummary.ppt
Summary.ppt
 
in5490-classification (1).pptx
in5490-classification (1).pptxin5490-classification (1).pptx
in5490-classification (1).pptx
 
Getting started with Machine Learning
Getting started with Machine LearningGetting started with Machine Learning
Getting started with Machine Learning
 
Lecture 1
Lecture 1Lecture 1
Lecture 1
 
lec1.ppt
lec1.pptlec1.ppt
lec1.ppt
 
AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)
 
PPT - AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
PPT - AutoML-Zero: Evolving Machine Learning Algorithms From ScratchPPT - AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
PPT - AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
 
Experimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles BakerExperimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles Baker
 
Cs854 lecturenotes01
Cs854 lecturenotes01Cs854 lecturenotes01
Cs854 lecturenotes01
 
Machine Learning : why we should know and how it works
Machine Learning : why we should know and how it worksMachine Learning : why we should know and how it works
Machine Learning : why we should know and how it works
 
Machine Learning - Lecture2.pptx
Machine Learning - Lecture2.pptxMachine Learning - Lecture2.pptx
Machine Learning - Lecture2.pptx
 
(Machine Learning) Ensemble learning
(Machine Learning) Ensemble learning (Machine Learning) Ensemble learning
(Machine Learning) Ensemble learning
 
Data Science and Machine Learning with Tensorflow
 Data Science and Machine Learning with Tensorflow Data Science and Machine Learning with Tensorflow
Data Science and Machine Learning with Tensorflow
 
XGBoost: the algorithm that wins every competition
XGBoost: the algorithm that wins every competitionXGBoost: the algorithm that wins every competition
XGBoost: the algorithm that wins every competition
 
ML_in_QM_JC_02-10-18
ML_in_QM_JC_02-10-18ML_in_QM_JC_02-10-18
ML_in_QM_JC_02-10-18
 
ai4.ppt
ai4.pptai4.ppt
ai4.ppt
 
supervised.pptx
supervised.pptxsupervised.pptx
supervised.pptx
 
Machine Learning Lecture 3 Decision Trees
Machine Learning Lecture 3 Decision TreesMachine Learning Lecture 3 Decision Trees
Machine Learning Lecture 3 Decision Trees
 
ai4.ppt
ai4.pptai4.ppt
ai4.ppt
 
Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018
 

Recently uploaded

handbook on reinforce concrete and detailing
handbook on reinforce concrete and detailinghandbook on reinforce concrete and detailing
handbook on reinforce concrete and detailingAshishSingh1301
 
Research Methodolgy & Intellectual Property Rights Series 1
Research Methodolgy & Intellectual Property Rights Series 1Research Methodolgy & Intellectual Property Rights Series 1
Research Methodolgy & Intellectual Property Rights Series 1T.D. Shashikala
 
Interfacing Analog to Digital Data Converters ee3404.pdf
Interfacing Analog to Digital Data Converters ee3404.pdfInterfacing Analog to Digital Data Converters ee3404.pdf
Interfacing Analog to Digital Data Converters ee3404.pdfragupathi90
 
Artificial Intelligence in due diligence
Artificial Intelligence in due diligenceArtificial Intelligence in due diligence
Artificial Intelligence in due diligencemahaffeycheryld
 
Autodesk Construction Cloud (Autodesk Build).pptx
Autodesk Construction Cloud (Autodesk Build).pptxAutodesk Construction Cloud (Autodesk Build).pptx
Autodesk Construction Cloud (Autodesk Build).pptxMustafa Ahmed
 
5G and 6G refer to generations of mobile network technology, each representin...
5G and 6G refer to generations of mobile network technology, each representin...5G and 6G refer to generations of mobile network technology, each representin...
5G and 6G refer to generations of mobile network technology, each representin...archanaece3
 
Artificial intelligence presentation2-171219131633.pdf
Artificial intelligence presentation2-171219131633.pdfArtificial intelligence presentation2-171219131633.pdf
Artificial intelligence presentation2-171219131633.pdfKira Dess
 
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdflitvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdfAlexander Litvinenko
 
Seizure stage detection of epileptic seizure using convolutional neural networks
Seizure stage detection of epileptic seizure using convolutional neural networksSeizure stage detection of epileptic seizure using convolutional neural networks
Seizure stage detection of epileptic seizure using convolutional neural networksIJECEIAES
 
15-Minute City: A Completely New Horizon
15-Minute City: A Completely New Horizon15-Minute City: A Completely New Horizon
15-Minute City: A Completely New HorizonMorshed Ahmed Rahath
 
Tembisa Central Terminating Pills +27838792658 PHOMOLONG Top Abortion Pills F...
Tembisa Central Terminating Pills +27838792658 PHOMOLONG Top Abortion Pills F...Tembisa Central Terminating Pills +27838792658 PHOMOLONG Top Abortion Pills F...
Tembisa Central Terminating Pills +27838792658 PHOMOLONG Top Abortion Pills F...drjose256
 
Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Ramkumar k
 
History of Indian Railways - the story of Growth & Modernization
History of Indian Railways - the story of Growth & ModernizationHistory of Indian Railways - the story of Growth & Modernization
History of Indian Railways - the story of Growth & ModernizationEmaan Sharma
 
Passive Air Cooling System and Solar Water Heater.ppt
Passive Air Cooling System and Solar Water Heater.pptPassive Air Cooling System and Solar Water Heater.ppt
Passive Air Cooling System and Solar Water Heater.pptamrabdallah9
 
Seismic Hazard Assessment Software in Python by Prof. Dr. Costas Sachpazis
Seismic Hazard Assessment Software in Python by Prof. Dr. Costas SachpazisSeismic Hazard Assessment Software in Python by Prof. Dr. Costas Sachpazis
Seismic Hazard Assessment Software in Python by Prof. Dr. Costas SachpazisDr.Costas Sachpazis
 
electrical installation and maintenance.
electrical installation and maintenance.electrical installation and maintenance.
electrical installation and maintenance.benjamincojr
 
Fuzzy logic method-based stress detector with blood pressure and body tempera...
Fuzzy logic method-based stress detector with blood pressure and body tempera...Fuzzy logic method-based stress detector with blood pressure and body tempera...
Fuzzy logic method-based stress detector with blood pressure and body tempera...IJECEIAES
 
CLOUD COMPUTING SERVICES - Cloud Reference Modal
CLOUD COMPUTING SERVICES - Cloud Reference ModalCLOUD COMPUTING SERVICES - Cloud Reference Modal
CLOUD COMPUTING SERVICES - Cloud Reference ModalSwarnaSLcse
 
Final DBMS Manual (2).pdf final lab manual
Final DBMS Manual (2).pdf final lab manualFinal DBMS Manual (2).pdf final lab manual
Final DBMS Manual (2).pdf final lab manualBalamuruganV28
 
Maher Othman Interior Design Portfolio..
Maher Othman Interior Design Portfolio..Maher Othman Interior Design Portfolio..
Maher Othman Interior Design Portfolio..MaherOthman7
 

Recently uploaded (20)

handbook on reinforce concrete and detailing
handbook on reinforce concrete and detailinghandbook on reinforce concrete and detailing
handbook on reinforce concrete and detailing
 
Research Methodolgy & Intellectual Property Rights Series 1
Research Methodolgy & Intellectual Property Rights Series 1Research Methodolgy & Intellectual Property Rights Series 1
Research Methodolgy & Intellectual Property Rights Series 1
 
Interfacing Analog to Digital Data Converters ee3404.pdf
Interfacing Analog to Digital Data Converters ee3404.pdfInterfacing Analog to Digital Data Converters ee3404.pdf
Interfacing Analog to Digital Data Converters ee3404.pdf
 
Artificial Intelligence in due diligence
Artificial Intelligence in due diligenceArtificial Intelligence in due diligence
Artificial Intelligence in due diligence
 
Autodesk Construction Cloud (Autodesk Build).pptx
Autodesk Construction Cloud (Autodesk Build).pptxAutodesk Construction Cloud (Autodesk Build).pptx
Autodesk Construction Cloud (Autodesk Build).pptx
 
5G and 6G refer to generations of mobile network technology, each representin...
5G and 6G refer to generations of mobile network technology, each representin...5G and 6G refer to generations of mobile network technology, each representin...
5G and 6G refer to generations of mobile network technology, each representin...
 
Artificial intelligence presentation2-171219131633.pdf
Artificial intelligence presentation2-171219131633.pdfArtificial intelligence presentation2-171219131633.pdf
Artificial intelligence presentation2-171219131633.pdf
 
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdflitvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
 
Seizure stage detection of epileptic seizure using convolutional neural networks
Seizure stage detection of epileptic seizure using convolutional neural networksSeizure stage detection of epileptic seizure using convolutional neural networks
Seizure stage detection of epileptic seizure using convolutional neural networks
 
15-Minute City: A Completely New Horizon
15-Minute City: A Completely New Horizon15-Minute City: A Completely New Horizon
15-Minute City: A Completely New Horizon
 
Tembisa Central Terminating Pills +27838792658 PHOMOLONG Top Abortion Pills F...
Tembisa Central Terminating Pills +27838792658 PHOMOLONG Top Abortion Pills F...Tembisa Central Terminating Pills +27838792658 PHOMOLONG Top Abortion Pills F...
Tembisa Central Terminating Pills +27838792658 PHOMOLONG Top Abortion Pills F...
 
Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)
 
History of Indian Railways - the story of Growth & Modernization
History of Indian Railways - the story of Growth & ModernizationHistory of Indian Railways - the story of Growth & Modernization
History of Indian Railways - the story of Growth & Modernization
 
Passive Air Cooling System and Solar Water Heater.ppt
Passive Air Cooling System and Solar Water Heater.pptPassive Air Cooling System and Solar Water Heater.ppt
Passive Air Cooling System and Solar Water Heater.ppt
 
Seismic Hazard Assessment Software in Python by Prof. Dr. Costas Sachpazis
Seismic Hazard Assessment Software in Python by Prof. Dr. Costas SachpazisSeismic Hazard Assessment Software in Python by Prof. Dr. Costas Sachpazis
Seismic Hazard Assessment Software in Python by Prof. Dr. Costas Sachpazis
 
electrical installation and maintenance.
electrical installation and maintenance.electrical installation and maintenance.
electrical installation and maintenance.
 
Fuzzy logic method-based stress detector with blood pressure and body tempera...
Fuzzy logic method-based stress detector with blood pressure and body tempera...Fuzzy logic method-based stress detector with blood pressure and body tempera...
Fuzzy logic method-based stress detector with blood pressure and body tempera...
 
CLOUD COMPUTING SERVICES - Cloud Reference Modal
CLOUD COMPUTING SERVICES - Cloud Reference ModalCLOUD COMPUTING SERVICES - Cloud Reference Modal
CLOUD COMPUTING SERVICES - Cloud Reference Modal
 
Final DBMS Manual (2).pdf final lab manual
Final DBMS Manual (2).pdf final lab manualFinal DBMS Manual (2).pdf final lab manual
Final DBMS Manual (2).pdf final lab manual
 
Maher Othman Interior Design Portfolio..
Maher Othman Interior Design Portfolio..Maher Othman Interior Design Portfolio..
Maher Othman Interior Design Portfolio..
 

Ensemble Learning bagging, boosting and stacking

  • 1. 1 Sparse vs. Ensemble Approaches to Supervised Learning Greg Grudic Modified by Longin Jan Latecki Some slides are by Piyush Rai Intro AI Ensembles
  • 2. 2 Goal of Supervised Learning? • Minimize the probability of model prediction errors on future data • Two Competing Methodologies – Build one really good model • Traditional approach – Build many models and average the results • Ensemble learning (more recent) Intro AI Ensembles
  • 3. 3 The Single Model Philosophy • Motivation: Occam’s Razor – “one should not increase, beyond what is necessary, the number of entities required to explain anything” • Infinitely many models can explain any given dataset – Might as well pick the smallest one… Intro AI Ensembles
  • 4. 4 Which Model is Smaller? • In this case • It’s not always easy to define small! ( ) ( ) 1 3 5 7 2 ˆ sin( ) ˆ ... 3! 5! 7! y f x x or x x x y f x x = = = = - + - + Intro AI Ensembles
  • 5. 5 Exact Occam’s Razor Models • Exact approaches find optimal solutions • Examples: – Support Vector Machines • Find a model structure that uses the smallest percentage of training data (to explain the rest of it). – Bayesian approaches • Minimum description length Intro AI Ensembles
  • 6. 6 How Do Support Vector Machines Define Small? Maximized Margin Minimize the number of Support Vectors! Intro AI Ensembles
  • 7. 7 Approximate Occam’s Razor Models • Approximate solutions use a greedy search approach which is not optimal • Examples – Kernel Projection Pursuit algorithms • Find a minimal set of kernel projections – Relevance Vector Machines • Approximate Bayesian approach – Sparse Minimax Probability Machine Classification • Find a minimum set of kernels and features Intro AI Ensembles
  • 8. 8 Other Single Models: Not Necessarily Motivated by Occam’s Razor • Minimax Probability Machine (MPM) • Trees – Greedy approach to sparseness • Neural Networks • Nearest Neighbor • Basis Function Models – e.g. Kernel Ridge Regression Intro AI Ensembles
  • 9. 9 Ensemble Philosophy • Build many models and combine them • Only through averaging do we get at the truth! • It’s too hard (impossible?) to build a single model that works best • Two types of approaches: – Models that don’t use randomness – Models that incorporate randomness Intro AI Ensembles
  • 10. 10 Ensemble Approaches • Bagging – Bootstrap aggregating • Boosting • Random Forests – Bagging reborn Intro AI Ensembles
  • 11. 11 Bagging • Main Assumption: – Combining many unstable predictors to produce a ensemble (stable) predictor. – Unstable Predictor: small changes in training data produce large changes in the model. • e.g. Neural Nets, trees • Stable: SVM (sometimes), Nearest Neighbor. • Hypothesis Space – Variable size (nonparametric): • Can model any function if you use an appropriate predictor (e.g. trees) Intro AI Ensembles
  • 12. 12 The Bagging Algorithm For • Obtain bootstrap sample from the training data • Build a model from bootstrap data 1: m M = ( ) ( ) { } 1 1 , ,..., , N N D y y = x x Given data: m D D ( ) m G x m D Intro AI Ensembles
  • 13. 13 The Bagging Model • Regression • Classification: – Vote over classifier outputs ( ) 1 1 ˆ M m m y G M = = å x ( ) ( ) 1 ,..., M G G x x Intro AI Ensembles
  • 14. 14 Bagging Details • Bootstrap sample of N instances is obtained by drawing N examples at random, with replacement. • On average each bootstrap sample has 63% of instances – Encourages predictors to have uncorrelated errors • This is why it works Intro AI Ensembles
  • 15. 15 Bagging Details 2 • Usually set – Or use validation data to pick • The models need to be unstable – Usually full length (or slightly pruned) decision trees. ~ 30 M = M ( ) m G x Intro AI Ensembles
  • 16. 16 Boosting – Main Assumption: • Combining many weak predictors (e.g. tree stumps or 1-R predictors) to produce an ensemble predictor • The weak predictors or classifiers need to be stable – Hypothesis Space • Variable size (nonparametric): – Can model any function if you use an appropriate predictor (e.g. trees) Intro AI Ensembles
  • 17. 17 Commonly Used Weak Predictor (or classifier) A Decision Tree Stump (1-R) Intro AI Ensembles
  • 18. 18 Boosting Each classifier is trained from a weighted Sample of the training Data ( ) m G x Intro AI Ensembles
  • 19. 19 Boosting (Continued) • Each predictor is created by using a biased sample of the training data – Instances (training examples) with high error are weighted higher than those with lower error • Difficult instances get more attention – This is the motivation behind boosting Intro AI Ensembles
  • 20. 20 Background Notation • The function is defined as: • The function is the natural logarithm ( ) 1 if is true 0 otherwise s I s ì ï ï = í ï ï î ( ) log x ( ) I s Intro AI Ensembles
  • 21. 21 The AdaBoost Algorithm (Freund and Schapire, 1996) 1. Initialize weights 2. For a) Fit classifier to data using weights b) Compute c) Compute d) Set 1: m M = ( ) ( ) { } 1 1 , ,..., , N N D y y = x x Given data: ( ) ( ) 1 1 N i i m i i m N i i w I y G err w = = ¹ = å å x ( ) ( ) log 1 / m m m err err a = - ( ) ( ) exp , 1,..., i i m i m i w w I y G i N a é ù ¬ ¹ = ê ú ë û x ( ) { } 1,1 m G Î - x 1/ , 1,..., i w N i N = = i w Intro AI Ensembles
  • 22. 22 The AdaBoost Model ( ) 1 ˆ sgn M m m m y G a = é ù ê ú = ê ú ë û å x AdaBoost is NOT used for Regression! Intro AI Ensembles
  • 23. 23 The Updates in Boosting 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 -5 -4 -3 -2 -1 0 1 2 3 4 5 Alpha for Boosting errm  m 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 10 20 30 40 50 60 70 80 90 100 Re-weighting Factor for Boosting errm w * exp(  m ) Intro AI Ensembles
  • 24. 24 Boosting Characteristics Simulated data: test error rate for boosting with stumps, as a function of the number of iterations. Also shown are the test error rate for a single stump, and a 400 node tree. Intro AI Ensembles
  • 25. 25 •Misclassification •Exponential (Boosting) •Binomial Deviance (Cross Entropy) •Squared Error •Support Vectors Loss Functions for ( ) ( ) sgn I f y ¹ ( ) exp yf - { } 1, 1 , y f Î - + Î Â ( ) ( ) log 1 exp 2yf + - ( ) 2 y f - ( ) ( ) 1 1 yf I yf - > g Intro AI Ensembles Correct Classification Incorrect Classification
  • 32. 32 Other Variations of Boosting • Gradient Boosting – Can use any cost function • Stochastic (Gradient) Boosting – Bootstrap Sample: Uniform random sampling (with replacement) – Often outperforms the non-random version Intro AI Ensembles
  • 34. 34 Boosting Summary • Good points – Fast learning – Capable of learning any function (given appropriate weak learner) – Feature weighting – Very little parameter tuning • Bad points – Can overfit data – Only for binary classification • Learning parameters (picked via cross validation) – Size of tree – When to stop • Software – http://www-stat.stanford.edu/~jhf/R-MART.html Intro AI Ensembles