Bagging, Boosting, Random Forest, AdaBoost
In supervised learning, our goal is to learn a predictor h(x) with high accuracy (low
error) from training data {(x1,y1),…,(xn,yn)}.
Decision Tree
No single classifier is perfect. Examples that are not correctly classified by one classifier may
be correctly classified by other classifiers. We can exploit this by using an Ensemble
classifier. General Idea of Ensemble Classifier:
The primary principle behind the ensemble model is that a group of weak learners come
together to form a strong learner. Ensembles of classifiers combine individual classifiers to
improve performance: they combine the classification results from the different
classifiers into a final output using unweighted voting or weighted voting.
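A minimal sketch of this voting idea, assuming scikit-learn is available; the base estimators, dataset, and parameters below are only illustrative choices, not part of the original notes:

from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic data.
X, y = make_classification(n_samples=500, random_state=0)

# Combine three different classifiers by majority vote.
ensemble = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=3)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB()),
    ],
    voting="hard",  # unweighted majority voting; pass weights=[...] for weighted voting
)
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))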
Bias/Variance Tradeoff:
Ensemble methods that minimize variance
– Bagging
– Random Forests
Ensemble methods that minimize bias
– Functional Gradient Descent
– Boosting
– Ensemble Selection
o In Bagging, our goal is to reduce variance.
o Bagging combines many unstable predictors to produce a stable ensemble
predictor.
o We independently draw many training sets S’.
o We train a model on each S’ and finally average the predictions.
Bagging Algorithm
• Training
o Given a dataset S, at each iteration i, a training set Si is sampled with replacement from S
(i.e. bootstrapping)
o A classifier Ci is learned for each Si
• Classification: given an unseen sample X,
o Each classifier Ci returns its class prediction
o The bagged classifier H counts the votes and assigns the class with the most votes to X
• Regression: bagging can also be applied to the prediction of continuous values by taking the
average of the individual predictions.
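The algorithm above can be sketched with scikit-learn, assuming decision trees as the base learners; the synthetic dataset and the choice of 50 estimators are illustrative assumptions:

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the 50 base learners is trained on a bootstrap sample S_i
# drawn with replacement from S; the default base learner is a decision tree.
bagging = BaggingClassifier(n_estimators=50, bootstrap=True, random_state=0)
bagging.fit(X_train, y_train)

# Classification: each base learner votes and the majority class is returned.
print("test accuracy:", bagging.score(X_test, y_test))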
The Bagging Model
Bagging is more powerful than a single decision tree.
Bagging is used when our objective is to reduce the variance of a decision tree.
We create several subsets of data from the training sample, chosen randomly with
replacement. Each subset is then used to train its own decision tree, so we end up with
an ensemble of models. The average of the predictions from the numerous trees is used,
which is more robust than a single decision tree.
Random Forest is an extension of bagging. It takes one additional step: in addition to
building each tree on a random subset of the data, it also makes a random selection of
features rather than using all features to grow the trees. When we have numerous such
random trees, the result is called a Random Forest.
These are the steps taken to implement a Random Forest:
o Consider a training data set with X observations and Y features. First, a sample
from the training data set is drawn randomly with replacement.
o A tree is grown to its largest possible depth on this sample, using a random subset
of features at each split.
o These steps are repeated, and the final prediction is based on the
collection of predictions from the n trees.
Advantages of using Random Forest technique:
o It handles high-dimensional data sets very well.
o It handles missing values and maintains accuracy for missing data.
Disadvantages of using Random Forest technique:
Since the final prediction is the mean of the predictions from the individual trees, it may
not give precise continuous values for regression problems.
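A minimal Random Forest sketch following the steps above, assuming scikit-learn; the dataset, number of trees, and "sqrt" feature subset are illustrative assumptions:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree is grown on a bootstrap sample and, at every split,
# only a random subset of the features is considered.
forest = RandomForestClassifier(
    n_estimators=100,
    max_features="sqrt",  # random feature selection at each split
    random_state=0,
)
forest.fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))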
Boosting is another ensemble procedure for building a collection of predictors. In other
words, we fit trees consecutively, usually on re-weighted samples, and at each step the
objective is to reduce the net error from the prior trees.
If a given input is misclassified by a hypothesis, its weight is increased so that the
next hypothesis is more likely to classify it correctly; combining the entire set
at the end converts weak learners into better-performing models.
Gradient Boosting is an extension of the boosting procedure:
Gradient Boosting = Gradient Descent + Boosting
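A small sketch of gradient boosting using scikit-learn; the synthetic dataset and the chosen number of estimators, learning rate, and depth are illustrative assumptions:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Trees are added sequentially; each new tree is fit to the residual
# (negative gradient of the loss) left by the current ensemble.
gbm = GradientBoostingClassifier(
    n_estimators=200,
    learning_rate=0.1,  # step size of the gradient-descent part
    max_depth=3,
    random_state=0,
)
gbm.fit(X_train, y_train)
print("test accuracy:", gbm.score(X_test, y_test))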
Bagging vs. Boosting
o Sampling: In bagging, various training data subsets are randomly drawn with replacement
from the whole training dataset. In boosting, each new subset contains the components that
were misclassified by previous models.
o Goal: Bagging attempts to tackle the over-fitting issue; boosting tries to reduce bias.
o When to use: If the classifier is unstable (high variance), apply bagging. If the classifier is
steady and straightforward (high bias), apply boosting.
o Weighting: In bagging, every model receives an equal weight; in boosting, models are
weighted by their performance.
o Objective: Bagging aims to decrease variance, not bias; boosting aims to decrease bias,
not variance.
o Combination: Bagging is the easiest way of combining predictions that belong to the same
type; boosting is a way of combining predictions that belong to different types.
o Independence: In bagging, every model is constructed independently; in boosting, new
models are affected by the performance of the previously developed models.
Comparison between Bagging and Boosting
Bagging and Boosting are two types of ensemble learning. Both decrease the
variance of a single estimate by combining several estimates from different models, so
the result may be a model with higher stability.
• If the difficulty of the single model is over-fitting, then Bagging is the best option.
• If the problem is that the single model gets very low performance, Boosting could
generate a combined model with lower error, as it optimizes the advantages and reduces
the pitfalls of the single model.
Similarities between Bagging and Boosting –
1. Both are ensemble methods to get N learners from 1 learner.
2. Both generate several training data sets by random sampling.
3. Both make the final decision by averaging the N learners (or by taking the majority of
them, i.e., majority voting).
4. Both are good at reducing variance and provide higher stability.
Differences Between Bagging and Boosting –
1. Bagging is the simplest way of combining predictions that belong to the same type.
Boosting is a way of combining predictions that belong to different types.
2. Bagging aims to decrease variance, not bias. Boosting aims to decrease bias, not variance.
3. In bagging, each model receives equal weight. In boosting, models are weighted according
to their performance.
4. In bagging, each model is built independently. In boosting, new models are influenced by
the performance of previously built models.
5. In bagging, different training data subsets are randomly drawn with replacement from
the entire training dataset. In boosting, every new subset contains the elements that were
misclassified by previous models.
6. Bagging tries to solve the over-fitting problem. Boosting tries to reduce bias.
7. If the classifier is unstable (high variance), then apply bagging. If the classifier is stable
and simple (high bias), then apply boosting.
8. Example algorithms: Random Forest (bagging) vs. Gradient Boosting (boosting).
Boosting is a general ensemble method that creates a strong classifier from a number of
weak classifiers. This is done by building a model from the training data, then creating a
second model that attempts to correct the errors from the first model. Models are added
until the training set is predicted perfectly or a maximum number of models are added.
AdaBoost was the first really successful boosting algorithm developed for binary
classification. It is the best starting point for understanding boosting.
AdaBoost Ensemble
Weak models are added sequentially, trained using the weighted training data.
The process continues until a pre-set number of weak learners have been created (a user
parameter) or no further improvement can be made on the training dataset.
Once completed, you are left with a pool of weak learners each with a stage value.
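A minimal AdaBoost sketch using scikit-learn, assuming the default decision-stump base learner; the dataset and the pre-set number of weak learners (50) are illustrative assumptions:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Weak learners are added sequentially, each trained on a
# re-weighted version of the training data.
ada = AdaBoostClassifier(n_estimators=50, random_state=0)
ada.fit(X_train, y_train)

print("test accuracy:", ada.score(X_test, y_test))
# Each fitted weak learner carries a stage value (its weight in the ensemble).
print("first few stage values:", ada.estimator_weights_[:5])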
Data Preparation for AdaBoost
This section lists some heuristics for best preparing your data for AdaBoost.
• Quality Data: Because the ensemble method keeps trying to correct misclassifications in
the training data, you need to be careful that the training data is of high quality.
• Outliers: Outliers will force the ensemble down the rabbit hole of working hard to correct for
cases that are unrealistic. These could be removed from the training dataset (a simple
filtering sketch follows this list).
• Noisy Data: Noisy data, specifically noise in the output variable, can be problematic. If
possible, attempt to isolate and clean these cases from your training dataset.
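One simple way (among many) to screen outliers before fitting AdaBoost, as mentioned in the Outliers point above; the synthetic data and the 3-standard-deviation threshold are illustrative assumptions, not a prescription from the original notes:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))          # illustrative feature matrix
y = rng.integers(0, 2, size=1000)       # illustrative binary labels

# Drop rows whose features lie more than 3 standard deviations from the mean.
z_scores = np.abs((X - X.mean(axis=0)) / X.std(axis=0))
mask = (z_scores < 3).all(axis=1)
X_clean, y_clean = X[mask], y[mask]
print("kept", X_clean.shape[0], "of", X.shape[0], "rows")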
Making Predictions with AdaBoost
Predictions are made by calculating the weighted sum of the weak classifiers.
For a new input instance, each weak learner produces a predicted value of either +1.0 or
-1.0. The predicted values are weighted by each weak learner's stage value, and the prediction
of the ensemble model is taken as the sum of the weighted predictions. If the sum is
positive, the first class is predicted; if negative, the second class is predicted.
For example, 5 weak classifiers may predict the values 1.0, 1.0, -1.0, 1.0, -1.0. From a
majority vote, it looks like the model would predict a value of 1.0, the first class. But suppose
these same 5 weak classifiers have the stage values 0.2, 0.5, 0.8, 0.2 and 0.9 respectively.
Calculating the weighted sum of these predictions results in an output of -0.8, which would
be an ensemble prediction of -1.0, the second class.
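The worked example above can be reproduced directly; the predictions and stage values below are the ones from the text, not from a fitted model:

# Five weak-learner predictions and their stage values, as in the example above.
predictions = [1.0, 1.0, -1.0, 1.0, -1.0]
stage_values = [0.2, 0.5, 0.8, 0.2, 0.9]

weighted_sum = sum(p * w for p, w in zip(predictions, stage_values))
ensemble_prediction = 1.0 if weighted_sum > 0 else -1.0

print(weighted_sum)          # -0.8
print(ensemble_prediction)   # -1.0, i.e. the second class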