The document discusses bagging, an ensemble machine learning method. Bagging (bootstrap aggregating) uses multiple models fitted on random subsets of a dataset to improve stability and accuracy compared to a single model. It works by training base models in parallel on random samples with replacement of the original dataset and aggregating their predictions. Key benefits are reduced variance, easier implementation through libraries like scikit-learn, and improved performance over single models. However, bagging results in less interpretable models compared to a single model.
1. Bagging – an ensemble learning method
Presented by
Muhammad Aqib FA18-BSE-024
Muhammad Hesham FA18-BSE-027
Muhammad Ibrar FA18-BSE-029
Muhammad Subtain FA18-BSE-047
Zain-ul-Abideen FA18-BSE-050
2. Contents
● 1_Ensemble learning
● 2_What is Bagging
● 2.1_Applications of Bagging
● 2.2_Bagging vs Boosting
● 2.3_How Bagging works
● 2.3.1_Bootstrapping
● 2.3.2_Parallel training
● 2.3.3_Aggregation
● 2.4_Benefits and challenges of Bagging
● 2.5_Ease of implementation
● 2.6_Reduction of variance
● 2.7_Loss of interpretability
● 2.8_Computationally expensive
3. Ensemble learning
❖ In statistics and machine learning, ensemble methods use multiple
learning algorithms to obtain better predictive performance than could
be obtained from any of the constituent learning algorithms alone.
❖ Ensemble learning is the process by which multiple models,
such as classifiers or experts, are strategically generated and
combined to solve a particular computational intelligence
problem.
5. Bagging
❖ Bootstrap aggregating, also called bagging, is a machine learning
ensemble meta-algorithm designed to improve the stability and
accuracy of machine learning algorithms used in statistical
classification and regression. It also reduces variance and helps to
avoid overfitting.
❖ E.g. it is commonly used with decision trees.
7. Applications of Bagging
❖ Provides stability.
❖ Used in decision trees.
❖ Increases the accuracy of machine learning algorithms used in
statistical classification and regression.
❖ Improves the performance of network intrusion detection
systems.
9. Algorithm
● Create multiple bootstrap samples by drawing data points from the
original dataset at random with replacement.
● Train a separate base model on each bootstrap sample; the models
are independent of one another and can be trained in parallel.
● Aggregate the predictions of the base models: majority vote for
classification, average for regression (a minimal sketch follows below).
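A minimal from-scratch sketch of these steps, assuming a small synthetic dataset and decision trees as base learners (both are illustrative choices, not from the slides):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)  # assumed toy data
rng = np.random.default_rng(0)

models = []
for _ in range(10):
    # Step 1: bootstrap sample (sampling rows with replacement)
    idx = rng.integers(0, len(X), size=len(X))
    # Step 2: train an independent base learner on that sample
    models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# Step 3: aggregate by majority vote (labels are 0/1 here)
votes = np.stack([m.predict(X) for m in models])
y_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print("training accuracy of the bagged ensemble:", (y_pred == y).mean())
```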
10. Bagging implementation using Python
● Bagging classifier Python code example.
● We have a Google-Stock-Price-Prediction dataset; let's implement a
bagging classifier.
● The dataset is downloaded from https://www.kaggle.com/datasets
● Each step is defined and explained explicitly.
11. Bagging implementation using Python
● Pandas and NumPy are the Python libraries used in the implementation, so we
load them with import statements, as in the sketch below.
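The slides do not reproduce the code itself, so the following is a hedged reconstruction; the CSV file name is an assumption about how the Kaggle dataset might be stored locally:

```python
import numpy as np
import pandas as pd

# File name is an assumption; the slides only say the data comes from Kaggle.
df = pd.read_csv("Google_Stock_Price_Train.csv")
print(df.head())  # quick check that the data loaded as expected
```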
13. Split the dataset into training and testing
● test_size defines the proportion of the data placed in the test set.
● random_state fixes the random seed so the same split is reproduced
every time, as shown below.
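A sketch of the split, assuming X and y have already been built from the dataframe; the 80/20 ratio and the seed value are assumptions:

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,    # 20% of the rows go to the test set (assumed ratio)
    random_state=42,  # fixed seed so the split is reproducible
)
```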
14. Creating sub-samples to train models
● K-fold is a validation technique in which we split the data into k
subsets.
● The seed method is used to initialize the random number generator so
the splits are reproducible (see the sketch below).
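A sketch of the k-fold sub-sampling using scikit-learn's KFold, assuming the X_train from the previous split; the number of splits and the seed value are assumptions:

```python
import numpy as np
from sklearn.model_selection import KFold

np.random.seed(7)  # seed the global random number generator

kfold = KFold(n_splits=10, shuffle=True, random_state=7)
for train_idx, test_idx in kfold.split(X_train):
    # Each iteration yields one sub-sample; a base model would be
    # trained on the rows of X_train indexed by train_idx.
    pass
```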
19. Model’s accuracy
● We can now conclude that the individual models (weak learners) overfit the data
and have high variance, while the aggregated result has reduced variance and is
more trustworthy.
20. Bagging vs boosting
Bagging:
● Data partitioning is random (each model sees an independent bootstrap sample).
● Primarily reduces variance.
Boosting:
● Misclassified data points are given higher weight in later rounds.
● Primarily reduces bias, increasing prediction accuracy.
22. Bootstrapping
Bootstrapping in bagging may be row sampling with random replacement or
column/feature sampling with random replacement.
In bootstrapping, the dataset is divided into samples for the base learners
under the condition that no two samples are identical: they may share some
rows or columns, but they are never exactly the same. Each base learner is
then trained and tested on its own sample (a minimal sketch follows).
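A minimal NumPy sketch of both sampling schemes, assuming X is a feature matrix (the slides show no code for this step):

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols = X.shape  # X is a placeholder feature matrix

# Row sampling with replacement: some rows repeat, others are left out
row_idx = rng.integers(0, n_rows, size=n_rows)
row_sample = X[row_idx]

# Column/feature sampling with replacement
col_idx = rng.integers(0, n_cols, size=n_cols)
col_sample = X[:, col_idx]
```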
23. Parallel training
As noted on the previous slide, the base learners in bagging are trained in
parallel. Each base learner exists independently and does not depend on any
other base learner, so each one can be trained and tested independently
(see the sketch below).
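A sketch of parallel training with joblib, assuming a feature matrix X and labels y; the library choice, base learner, and ensemble size are assumptions (scikit-learn's bagging classes expose the same idea through an n_jobs parameter):

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.tree import DecisionTreeClassifier

def train_one(seed):
    # Each base learner fits its own bootstrap sample, independently
    # of every other learner.
    idx = np.random.default_rng(seed).integers(0, len(X), size=len(X))
    return DecisionTreeClassifier().fit(X[idx], y[idx])

# Because no learner depends on another, the fits can run on
# separate cores at the same time.
models = Parallel(n_jobs=-1)(delayed(train_one)(s) for s in range(10))
```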
24. Aggregation
After the base learners are tested, their results are aggregated, and the
final result is the aggregate of the results obtained from the base learners:
a majority vote for classification or an average for regression, as
illustrated below.
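A sketch of the aggregation step, reusing the trained models from the previous sketch and assuming binary 0/1 labels:

```python
import numpy as np

# Predictions from every trained base learner on the test set
all_preds = np.stack([m.predict(X_test) for m in models])

# Classification: majority vote across base learners (0/1 labels assumed)
final_class = (all_preds.mean(axis=0) >= 0.5).astype(int)

# Regression: average of the base learners' outputs
final_value = all_preds.mean(axis=0)
```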
26. Advantages and disadvantages of Bagging
There are a number of key advantages and challenges that the
bagging method presents when used for classification or regression
problems.
The key benefits of bagging include:
27. Ease of implementation
Python libraries such as scikit-learn (also known as sklearn) make it
easy to combine the predictions of base learners or estimators to improve
model performance, as the sketch below shows.
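A minimal sketch with scikit-learn's BaggingClassifier; the base learner and ensemble size are assumptions, and the base learner is passed as estimator in recent scikit-learn versions:

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# One class wraps the whole bootstrap/train/aggregate loop
bag = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # base learner ("estimator" in scikit-learn >= 1.2)
    n_estimators=10,                     # number of bootstrap models
    random_state=42,
)
bag.fit(X_train, y_train)
print(bag.score(X_test, y_test))
```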
28. Reduction of variance
Bagging can reduce the variance within a learning algorithm. This is
particularly helpful with high-dimensional data, where missing values can
lead to higher variance, making the model more prone to overfitting and
preventing accurate generalization to new datasets. The sketch below
illustrates the effect.
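A hedged illustration of the effect on synthetic data (the dataset, fold count, and ensemble size are all assumptions): the fold-to-fold spread of scores is typically smaller for the bagged ensemble than for a single tree.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=50, random_state=0)

single = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
bagged = cross_val_score(
    BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0),
    X, y, cv=10,
)

# A lower standard deviation across folds reflects the variance
# reduction described above.
print("single tree: mean %.3f, std %.3f" % (single.mean(), single.std()))
print("bagged:      mean %.3f, std %.3f" % (bagged.mean(), bagged.std()))
```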
29. The disadvantages of bagging
Loss of interpretability:
It is difficult to draw precise business insights through bagging because
of the averaging involved across predictions. While the output is more
precise than any individual data point, a more accurate or complete dataset
could also yield comparable precision within a single classification or
regression model.