22PCOAM16 – Unit 3, Session 22: Ensemble Learning
1.
14/05/2025
Department of Computer Science & Engineering (SB-ET)
III B.Tech - I Semester
MACHINE LEARNING
SUBJECT CODE: 22PCOAM16
Academic Year: 2024-2025
by
Dr. M. Gokilavani
GNITC
Department of CSE (SB-ET)
2.
22PCOAM16 MACHINE LEARNING
UNIT – III
Syllabus
Learning with Trees – Decision Trees – Constructing Decision Trees –
Classification and Regression Trees – Ensemble Learning – Boosting –
Bagging – Different ways to Combine Classifiers – Basic Statistics –
Gaussian Mixture Models – Nearest Neighbor Methods – Unsupervised
Learning – K-Means Algorithm
3.
TEXTBOOK:
• Stephen Marsland, Machine Learning: An Algorithmic Perspective, Second Edition, Chapman and Hall/CRC, Machine Learning and Pattern Recognition Series, 2014.
REFERENCES:
• Tom M. Mitchell, Machine Learning, First Edition, McGraw Hill Education, 2013.
• Ethem Alpaydin, Introduction to Machine Learning, Third Edition (Adaptive Computation and Machine Learning Series), MIT Press, 2014.
No of Hours Required: 13
4.
Ensemble Learning
• Ensemble learning is a technique in machine learning that
combines the predictions from multiple individual models to
achieve better predictive performance than any single model
alone.
• The fundamental idea is to leverage the strengths and compensate
for the weaknesses of various models by aggregating their
predictions.
5.
Types of Ensemble Learning
There are two main types of ensemble methods:
• Bagging (Bootstrap Aggregating): Models are trained
independently on different subsets of the data, and their results
are averaged or voted on.
• Boosting: Models are trained sequentially, with each one
learning from the mistakes of the previous model.
6.
Bagging Algorithm
• Bootstrap aggregating, also known as bagging, is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression.
• It decreases variance and helps to avoid overfitting.
• It is usually applied to decision tree methods.
• Bagging is a special case of the model averaging approach.
7.
Description of the Technique
• Suppose a set D of d tuples. At the i-th iteration, a training set Di of d tuples is selected from D via row sampling with replacement (i.e., the bootstrap), so the same tuple of D can appear more than once in Di.
• Then a classifier model Mi is learned for each training set Di.
• Each classifier Mi returns its class prediction.
• The bagged classifier M* counts the votes and assigns the class with the most votes to X (an unknown sample), as sketched below.
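A minimal sketch of this voting scheme in Python, assuming scikit-learn decision trees as the base classifiers; the helper names train_bagged and predict_bagged are illustrative, not from the slides:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_bagged(X, y, n_models=10, seed=0):
    """Train one decision tree Mi per bootstrap sample Di of (X, y)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)   # row sampling with replacement (bootstrap)
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def predict_bagged(models, X):
    """Bagged classifier M*: majority vote over the base models' predictions."""
    preds = np.array([m.predict(X) for m in models])   # shape (n_models, n_samples)
    # For each sample (column), pick the most frequent class; assumes integer labels.
    return np.array([np.bincount(col).argmax() for col in preds.T])
```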
8.
Implementation of Bagging
• Step 1: Multiple subsets are created from the original data set with an equal number of tuples, selecting observations with replacement.
• Step 2: A base model is created on each of these subsets.
• Step 3: Each model is learned in parallel on its own training set, independently of the others.
• Step 4: The final predictions are determined by combining the predictions from all the models, as in the sketch below.
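These four steps are also available ready-made; a hedged sketch using scikit-learn's BaggingClassifier on synthetic data (the dataset and parameter values are illustrative, not from the slides):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Illustrative synthetic data; any labelled dataset would do.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 10 base models (decision trees by default), each fit on its own bootstrap subset.
bag = BaggingClassifier(n_estimators=10, random_state=0)
bag.fit(X_train, y_train)

print("test accuracy:", accuracy_score(y_test, bag.predict(X_test)))
```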
Bagging Classifier
• Bagging, or bootstrap aggregating, is a type of ensemble learning in which multiple base models are trained independently and in parallel on different subsets of the training data.
• In a bagging classifier, the final prediction is made by aggregating the predictions of all base models using majority voting.
• In regression problems, the final prediction is made by averaging the predictions of all base models; this is known as bagging regression. Both aggregation rules are illustrated below.
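A small illustration of the two aggregation rules; the prediction arrays below are made-up numbers:

```python
import numpy as np

# Predictions from three base models for four (classification) / three (regression) samples.
clf_preds = np.array([[0, 1, 1, 0],
                      [0, 1, 0, 0],
                      [1, 1, 1, 0]])
reg_preds = np.array([[2.0, 3.1, 0.5],
                      [1.8, 2.9, 0.7],
                      [2.2, 3.0, 0.6]])

# Bagging classifier: majority vote per sample (column).
votes = np.array([np.bincount(col).argmax() for col in clf_preds.T])
print(votes)                     # [0 1 1 0]

# Bagging regression: average per sample (column).
print(reg_preds.mean(axis=0))    # approximately [2.0 3.0 0.6]
```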
12.
Bootstrap Method
• The bootstrap method is a powerful and widely used statistical technique for estimating the distribution of a statistic by resampling with replacement from the original data.
• The bootstrap method is a resampling technique that allows you to
estimate the properties of an estimator (such as its variance or bias) by
repeatedly drawing samples from the original data.
• It was introduced by Bradley Efron in 1979 and has since become a widely
used tool in statistical inference.
• The bootstrap method is useful in situations where the theoretical sampling
distribution of a statistic is unknown or difficult to derive analytically.
13.
Bootstrap Method
• Bootstrapping is a statistical procedure that resamples a single data set to create many simulated samples.
• This process allows for the "calculation of standard errors, confidence intervals, and hypothesis testing," according to a post on bootstrapping statistics from statistician Jim Frost.
• It can be used to estimate summary statistics such as the mean and standard
deviation.
14.
Bootstrap Method
• Bootstrap Method or Bootstrapping is a statistical technique for estimating an entire
population quantity by averaging estimates from multiple smaller data samples.
15.
Implementation of Bootstrap
The procedure can be summarized as follows (a sketch follows the steps):
Step 1: Choose the number of bootstrap samples to take.
Step 2: Choose your sample size; for each bootstrap sample, draw a sample with replacement of the size you selected.
Step 3: Calculate the statistic of interest on each sample, then calculate the average of the computed sample statistics.
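A minimal sketch of the procedure in Python, estimating the mean of a small made-up sample together with its bootstrap standard error and confidence interval (the data values and the number of resamples are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.array([4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.3, 4.4])  # made-up observations

n_boot = 1000                 # Step 1: number of bootstrap samples
size = len(data)              # Step 2: sample size (here, same as the original data)

# Step 2 (continued): draw each bootstrap sample with replacement.
samples = rng.choice(data, size=(n_boot, size), replace=True)

# Step 3: compute the statistic (mean) on each sample, then summarize.
boot_means = samples.mean(axis=1)
print("bootstrap estimate of the mean:", boot_means.mean())
print("bootstrap standard error:", boot_means.std(ddof=1))
print("95% confidence interval:", np.percentile(boot_means, [2.5, 97.5]))
```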
16.
Example
• The Random Forest model uses bagging with decision tree base models, which individually have high variance.
• It makes a random feature selection when growing each tree.
• Several such random trees together make a Random Forest, as in the sketch below.
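A hedged sketch using scikit-learn's RandomForestClassifier (the dataset and parameter values are illustrative, not from the slides):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 bagged decision trees, each grown with random feature selection at every split.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

print("test accuracy:", accuracy_score(y_test, forest.predict(X_test)))
```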
17.
Boosting Algorithm
• Boosting is an ensemble modeling technique designed to create a strong
classifier by combining multiple weak classifiers.
• The process involves building models sequentially, where each new model
aims to correct the errors made by the previous ones.
• AdaBoost was the first really successful boosting algorithm developed for
the purpose of binary classification.
• AdaBoost is short for Adaptive Boosting and is a very popular boosting
technique that combines multiple “weak classifiers” into a single “strong
classifier”.
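A hedged sketch of AdaBoost with scikit-learn's AdaBoostClassifier (the dataset and parameters are illustrative, not from the slides):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 50 weak classifiers (shallow decision trees by default), trained sequentially.
ada = AdaBoostClassifier(n_estimators=50, random_state=0)
ada.fit(X_train, y_train)

print("test accuracy:", accuracy_score(y_test, ada.predict(X_test)))
```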
Implementation of Boosting
• Initially, a model is built using the training data.
• Subsequent models are then trained to address the mistakes of their
predecessors.
• Boosting assigns weights to the data points in the original dataset.
• Higher weights: Instances that were misclassified by the previous model
receive higher weights.
• Lower weights: Instances that were correctly classified receive lower
weights.
20.
Implementation of Boosting
• Training on weighted data: The subsequent model learns from the
weighted dataset, focusing its attention on harder-to-learn examples (those
with higher weights).
• This iterative process continues until:
• The entire training dataset is accurately predicted, or
• A predefined maximum number of models is reached.
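As a concrete instance of this weighting scheme, AdaBoost computes, for a weak classifier with weighted error e, the coefficient alpha = 0.5 * ln((1 - e) / e); the weight of every misclassified point is multiplied by exp(alpha), the weight of every correctly classified point by exp(-alpha), and all weights are then renormalized to sum to 1.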
21.
Boosting Algorithm
• Step 1: Initialize the dataset and assign an equal weight to each data point.
• Step 2: Provide this as input to the model and identify the wrongly classified
data points.
• Step 3: Increase the weight of the wrongly classified data points and decrease
the weights of correctly classified data points. And then normalize the weights of
all data points.
• Step 4: if (got required results)
Goto step 5
else
Goto step 2
• Step 5: End
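A minimal from-scratch sketch of Steps 1-5 for binary labels in {-1, +1}, using decision stumps as the weak learners (the stopping threshold and the number of rounds are illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=20):
    """y must contain labels -1 and +1. Returns the weak models and their coefficients."""
    n = len(X)
    w = np.full(n, 1.0 / n)                  # Step 1: equal weight for every data point
    models, alphas = [], []
    for _ in range(n_rounds):                # Steps 2-4: iterate
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)     # Step 2: train on the weighted data
        pred = stump.predict(X)
        miss = pred != y                     # wrongly classified data points
        err = np.clip(w[miss].sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        # Step 3: raise weights of misclassified points, lower the rest, then normalize.
        w *= np.exp(alpha * np.where(miss, 1.0, -1.0))
        w /= w.sum()
        models.append(stump)
        alphas.append(alpha)
        if err < 1e-6:                       # Step 4: stop once the results are good enough
            break
    return models, alphas                    # Step 5: end

def adaboost_predict(models, alphas, X):
    """Strong classifier: sign of the coefficient-weighted vote of the weak classifiers."""
    scores = sum(a * m.predict(X) for a, m in zip(alphas, models))
    return np.sign(scores)
```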
22.
Advantages of Boosting
• Improved Accuracy: By combining multiple weak learners, it enhances predictive accuracy for both classification and regression tasks.
• Robustness to Overfitting: Unlike traditional models, it dynamically adjusts weights, which helps to prevent overfitting.
• Handles Imbalanced Data Well: It prioritizes misclassified points, making it effective for imbalanced datasets.
• Better Interpretability: The sequential nature of boosting helps break down decision-making, making the model more interpretable.
23.
Topics to be covered in the next session (Session 23)
• Different ways to Combine Classifiers
Thank you!!!