Bagging – an ensemble learning method
Presented by
Muhammad Aqib FA18-BSE-024
Muhammad Hesham FA18-BSE-027
Muhammad Ibrar FA18-BSE-029
Muhammad Subtain FA18-BSE-047
Zain-ul-Abideen FA18-BSE-050
Contents
● 1_Ensemble learning
● 2_What is Bagging
● 2.1_Applications of Bagging
● 2.2_Bagging vs Boosting
● 2.3_How bagging works
● 2.3.1_Bootstrapping
● 2.3.2_Parallel training
● 2.3.3_Aggregation
● 2.4_Benefits and challenges of bagging
● 2.5_Ease of implementation
● 2.6_Reduction of variance
● 2.7_Loss of interpretability
● 2.8_Computationally expensive
Ensemble learning
❖ In statistics and machine learning, ensemble methods use multiple
learning algorithms to obtain better predictive performance than could
be obtained from any of the constituent learning algorithms alone.
❖ Ensemble learning is the process by which multiple models,
such as classifiers or experts, are strategically generated and
combined to solve a particular computational intelligence
problem.
Ensemble Learning Types
Bagging
Boosting
Stacking
Bagging
❖ Bootstrap aggregating, also called bagging, is a machine learning
ensemble meta-algorithm designed to improve the stability and
accuracy of machine learning algorithms used in statistical
classification and regression. It also reduces variance and helps to
avoid overfitting.
❖ E.g., it is commonly used with decision trees.
Visual representation of the bagging model
Applications of Bagging
❖ Provides stability.
❖ Used with decision trees.
❖ Increases the accuracy of machine learning algorithms used in
statistical classification and regression.
❖ Improves the performance of network intrusion detection
systems.
Bagging Model
Algorithm
● Initialize the dataset of N data points.
● Create multiple bootstrap samples by drawing N points from the
dataset with replacement.
● Train one base model on each bootstrap sample, in parallel and
independently of the other models.
● Aggregate the base models' predictions: majority vote for
classification, average for regression.
● End.
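The slides do not reproduce code for these steps, so the following is a
minimal from-scratch sketch in Python. The scikit-learn decision trees and
the synthetic dataset are illustrative assumptions, not taken from the
slides.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
rng = np.random.default_rng(0)
learners = []

# Steps 1-3: draw one bootstrap sample (rows drawn with replacement)
# per base learner and train each learner independently.
for _ in range(10):
    idx = rng.integers(0, len(X), size=len(X))   # sample with replacement
    learners.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# Step 4: aggregate by majority vote across the base learners.
votes = np.stack([m.predict(X) for m in learners])   # (n_learners, n_samples)
final = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("accuracy of the aggregated vote:", (final == y).mean())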
Bagging implementation using Python
● Bagging Classifier Python Code Example
● We have a Google-Stock-Price-Prediction dataset; let's implement a
bagging classifier.
● The dataset is downloaded from https://www.kaggle.com/datasets
● Each step is defined and explained explicitly.
Bagging implementation using Python
● Pandas and NumPy are the Python libraries used in this implementation,
so we load them with import statements.
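The import cell itself is not visible in the extracted slides; a plausible
version, with the scikit-learn modules that the later steps rely on added
as an assumption, would be:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier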
Load the Dataset
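The loading code is likewise not shown. A hedged sketch, continuing from
the imports above and assuming the Kaggle CSV is named
Google_Stock_Price.csv with a label column named target (both names are
placeholders, not from the slides):

df = pd.read_csv("Google_Stock_Price.csv")   # hypothetical file name
X = df.drop(columns=["target"]).values       # feature matrix
y = df["target"].values                      # class labels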
Split the dataset into training and testing
● test_size defines the size (fraction or count) of the test set.
● random_state makes the split reproducible, so the same split is
produced on every run.
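A sketch of the split, continuing from the loading step; the 0.2 and 42
values are illustrative:

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,      # 20% of the rows go to the test set
    random_state=42)    # fixed seed makes the split reproducible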
Creating sub samples to train models
● K-fold is a validation technique in which we split the data into k
subsets.
● The seed is used to initialize the random number generator.
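A sketch of the k-fold setup; ten splits and the seed value are
illustrative:

seed = 7   # initializes the random number generator for reproducibility
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)   # k = 10 subsets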
Defining the decision tree algorithm
Classification model for bagging
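The model-definition code for this slide and the previous one is not
shown; a sketch follows (n_estimators is illustrative, and scikit-learn
versions before 1.2 use base_estimator instead of estimator):

tree = DecisionTreeClassifier()   # the high-variance base learner
model = BaggingClassifier(estimator=tree, n_estimators=100,
                          random_state=seed)   # one tree per bootstrap sample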
Train models with accuracy
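A sketch of training and scoring across the folds defined earlier:

# Each entry in `results` is the accuracy on one held-out fold.
results = cross_val_score(model, X_train, y_train, cv=kfold,
                          scoring="accuracy")
print("per-fold accuracy:", results)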
Mean accuracy
● Mean accuracy result is 95%
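The mean is simply the average of the per-fold scores:

print("mean accuracy: %.2f%%" % (results.mean() * 100))   # slides report 95%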
Model’s accuracy
● Now we can conclude that the individual models (weak learners) overfit
the data and have high variance, but the aggregated result has reduced
variance and is more trustworthy.
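To make this conclusion concrete, one way to check it with the objects
defined above (the exact numbers depend on the data):

single = cross_val_score(DecisionTreeClassifier(), X_train, y_train, cv=kfold)
bagged = cross_val_score(model, X_train, y_train, cv=kfold)
print("single tree: mean %.3f, std %.3f" % (single.mean(), single.std()))
print("bagged:      mean %.3f, std %.3f" % (bagged.mean(), bagged.std()))
# The bagged standard deviation is typically the smaller one:
# aggregation reduces variance.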
Bagging vs Boosting
Bagging
● Data partition is random.
● Reduces variance.
Boosting
● Misclassified data is given higher importance.
● Increases prediction accuracy.
How bagging works
Bootstrapping
Bootstrapping in bagging may be row sampling with random replacement,
column/feature sampling with random replacement, or both.
In bootstrapping, the dataset is resampled into one subset per base
learner. The subsets may share individual rows or columns but are never
identical, and each base learner is then trained and tested on its own
sample, as the sketch below shows.
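A minimal NumPy sketch of drawing one bootstrap sample; it assumes the
feature matrix X from the implementation section:

import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols = X.shape

row_idx = rng.integers(0, n_rows, size=n_rows)   # row sampling with replacement
col_idx = rng.integers(0, n_cols, size=n_cols)   # optional feature sampling with replacement

X_sample = X[row_idx][:, col_idx]   # one bootstrap sample for one base learner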
Parallel training
As described on the previous slide, the base learners in bagging are
trained in parallel. Each base learner exists independently and does not
depend on any other base learner, so each one can be trained and tested
on its own.
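scikit-learn exploits exactly this independence: BaggingClassifier accepts
an n_jobs parameter that fits the base learners in parallel across CPU
cores (the other parameter values here are illustrative):

from sklearn.ensemble import BaggingClassifier

parallel_model = BaggingClassifier(n_estimators=100,
                                   n_jobs=-1,       # -1 = use every available core
                                   random_state=0)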
Aggregation
After the base learners are tested, their results are aggregated: the
final prediction is the aggregate (majority vote for classification,
average for regression) of the base learners' outputs.
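A toy sketch of the aggregation step for classification, using majority
vote over the base learners' predicted labels:

import numpy as np

# Rows = base learners, columns = samples; values are predicted class labels.
predictions = np.array([[0, 1, 1],
                        [0, 1, 0],
                        [1, 1, 0]])
final = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, predictions)
print(final)   # -> [0 1 0]
# For regression, the aggregate is the mean: predictions.mean(axis=0)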
Abstract-level diagram
Advantages and disadvantages of Bagging
There are a number of key advantages and challenges that the
bagging method presents when used for classification or regression
problems.
The key benefits of bagging include:
Ease of implementation
Python libraries such as scikit-learn (also known as sklearn) make it
easy to combine the predictions of base learners or estimators to
improve model performance.
Reduction of variance
Bagging can reduce the variance within a learning algorithm. This is
particularly helpful with high-dimensional data, where missing values can
lead to higher variance, making a model more prone to overfitting and
preventing accurate generalization to new datasets.
The disadvantages of bagging
Loss of interpretability:
It is difficult to draw precise business insights through bagging due to
the averaging involved across predictions. While the output is more
precise than any individual data point, a more accurate or complete
dataset could also yield more precision within a single classification or
regression model.
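Computationally expensive:
Bagging slows down and grows more resource-intensive as the number of
base learners increases, since every model must be trained, stored, and
queried at prediction time; it is therefore poorly suited to real-time
applications.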
