Ensemble Methods (Bagging and Boosting)
Agenda
Ensemble Learning
Bagging
Boosting
Conclusion
Ensemble Learning
• What is Ensemble Learning?
• How does Ensemble Learning work?
• How did Ensemble Learning come into existence?
Ensemble Learning
Ensemble learning in machine learning
refers to the technique of combining
predictions from multiple models to
improve overall performance and
accuracy.
By aggregating diverse models such as
decision trees or neural networks,
ensemble methods like bagging and
boosting enhance robustness, reduce
overfitting, and yield more accurate and
stable predictions for various tasks.
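As a quick illustration of this idea, the sketch below combines three different base models with a simple majority vote. It assumes scikit-learn is available; the synthetic dataset, the choice of models, and all parameter values are illustrative additions rather than part of the original slides.

```python
# Minimal ensemble-learning sketch: three diverse models, one combined vote.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Combine three different base models; the ensemble votes on the final label.
ensemble = VotingClassifier(estimators=[
    ("tree", DecisionTreeClassifier(max_depth=5)),
    ("lr", LogisticRegression(max_iter=1000)),
    ("nb", GaussianNB()),
], voting="hard")
ensemble.fit(X_train, y_train)
print("ensemble accuracy:", ensemble.score(X_test, y_test))
```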
Bagging
What is Bagging?
• Bootstrapping
• Diverse Models
• Aggregation
• Reduction of Overfitting
[Diagram: the original Dataset is bootstrapped into Dataset 1 ... Dataset n; Model 1 ... Model n are trained on these subsets and then combined into a single Ensemble Model.]
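A minimal sketch of the pipeline in the diagram above, written by hand so the bootstrapping step is visible: sample n datasets with replacement, train one model per dataset, and aggregate by majority vote. It assumes NumPy and scikit-learn; the number of models and the use of decision trees are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
rng = np.random.default_rng(0)

models = []
for _ in range(25):                         # Model 1 ... Model n
    idx = rng.integers(0, len(X), len(X))   # Dataset i: sample with replacement
    models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# Ensemble model: majority vote over the individual predictions.
votes = np.stack([m.predict(X) for m in models])
ensemble_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print("training accuracy of the bagged ensemble:", (ensemble_pred == y).mean())
```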
Advantages of Bagging
Improved
Accuracy
1
Enhanced
Robustnes
s
2
Increased
Stability
3
Reduction
of
Overfitting
4
Easy
Parallelization
5
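Advantage 5 follows from the fact that each base model is trained independently on its own bootstrap sample, so the models can be fit in parallel. The sketch below shows one way this looks in scikit-learn via the n_jobs parameter; the dataset and all other settings are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # named base_estimator in older scikit-learn releases
    n_estimators=100,
    n_jobs=-1,          # train the 100 trees across all available CPU cores
    random_state=42,
)
bagging.fit(X, y)
print("training accuracy:", bagging.score(X, y))
```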
What is boosting?

Sequential Training of Models
• Boosting builds a series of weak learners sequentially.
• Each new model pays more attention to the data points that the previous models misclassified.

Focus on Misclassified Samples
• Boosting algorithms identify and prioritize misclassified samples during the training process.
• More emphasis is given to the data points that are difficult to classify correctly.

Weighted Aggregation of Predictions
• Predictions from individual models are combined with different weights.
• Models that perform well are given higher weights, while models that struggle with certain samples are given lower weights.
• The final prediction is a weighted sum of the predictions from all models.
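A bare-bones sketch of these three steps, loosely following the AdaBoost recipe: labels are mapped to {-1, +1}, sample weights grow on misclassified points, and the final prediction is a weighted vote. It assumes NumPy and scikit-learn and is meant to show the mechanics, not to replace a library implementation such as AdaBoostClassifier.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=1)
y = np.where(y == 0, -1, 1)

n = len(X)
weights = np.full(n, 1.0 / n)        # every point starts equally important
learners, alphas = [], []

for _ in range(20):                  # sequential training of weak learners
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=weights)
    pred = stump.predict(X)
    err = weights[pred != y].sum()   # weighted error of this learner
    alpha = 0.5 * np.log((1 - err) / (err + 1e-10))
    weights *= np.exp(-alpha * y * pred)   # misclassified points gain weight
    weights /= weights.sum()
    learners.append(stump)
    alphas.append(alpha)

# Weighted aggregation: better learners (larger alpha) count for more.
scores = sum(a * m.predict(X) for a, m in zip(alphas, learners))
final_pred = np.sign(scores)
print("training accuracy:", (final_pred == y).mean())
```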
[Diagram: the Training Set is sampled into Subset 1 ... Subset n; weak learners are trained one after another, with each learner's false predictions guiding the training of the next, and all learners are combined into the Overall Prediction.]
Advantages of Boosting
Handles Noisy Data
Boosting corrects misclassified samples, reducing the impact of noisy data on the final model's accuracy.
Feature Importance
Boosting reveals which features matter most for accurate predictions, aiding effective feature selection (see the sketch after this list).
Adaptability to Different Data
Boosting algorithms adapt to diverse and complex datasets, making them suitable for various applications, from text classification to image recognition.
Reduction of Bias and Variance
Boosting strikes a balance between bias and variance, resulting in models that generalize well to new data by combining predictions of multiple weak learners.
Improved Accuracy
Boosting corrects errors made by previous models, leading to highly accurate predictions through sequential model refinement.
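To make the Feature Importance point concrete, the sketch below fits a gradient-boosted model and ranks features by their learned importance. It assumes scikit-learn; the breast-cancer dataset and the parameter values are illustrative choices, not part of the original slides.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

data = load_breast_cancer()
model = GradientBoostingClassifier(n_estimators=100, random_state=0)
model.fit(data.data, data.target)

# Rank features by how much they contributed to the boosted ensemble.
ranked = sorted(zip(data.feature_names, model.feature_importances_),
                key=lambda t: t[1], reverse=True)
for name, score in ranked[:5]:
    print(f"{name}: {score:.3f}")
```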
Differences Between Bagging and Boosting
Criteria | Bagging | Boosting
Combining predictions | The simplest way of combining predictions that belong to the same type. | A way of combining predictions that belong to different types.
Focus area | Aims to decrease variance, not bias. | Aims to decrease bias, not variance.
Weights | Each model receives equal weight. | Models are weighted according to their performance.
How models are built | Each model is built independently. | New models are influenced by the performance of previously built models.
Classifier training | Base classifiers are trained in parallel. | Base classifiers are trained sequentially.
Example | Random Forest uses bagging. | AdaBoost uses boosting.
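The two examples from the table can be compared directly. The sketch below evaluates a Random Forest (bagging: independent, equally weighted trees) and AdaBoost (boosting: sequential, performance-weighted learners) on the same synthetic data; it assumes scikit-learn, and all settings are chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=7)

bagging_model = RandomForestClassifier(n_estimators=200, random_state=7)
boosting_model = AdaBoostClassifier(n_estimators=200, random_state=7)

for name, model in [("Random Forest (bagging)", bagging_model),
                    ("AdaBoost (boosting)", boosting_model)]:
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean cross-validation accuracy {score:.3f}")
```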
Real-World Applications of Bagging
• Credit Scoring
• Customer Churn Prediction
• Stock Market Forecasting
• Medical Diagnosis

Real-World Applications of Boosting
• Web Search Ranking
• Fraud Detection
• Face Detection
• Natural Language Processing
Summary
1. Enhanced Accuracy and Stability
2. Robustness to Data Challenges
3. Continuous Improvement and
Exploration
Thank you


Editor's Notes

  • #8 1. **Bootstrap Aggregating (Bagging)**: Bagging is an ensemble learning technique that combines predictions from multiple base models. It operates by creating multiple subsets of the original dataset through bootstrapping (sampling with replacement). 2. **Diverse Models**: Bagging uses a collection of diverse base models, such as decision trees, which are trained independently on these subsets. Each model learns different patterns from the data, enhancing overall model diversity. 3. **Aggregation**: After training, predictions from all individual models are combined, often through averaging (for regression) or voting (for classification). This aggregation smoothens out individual model errors and improves overall accuracy and robustness. 4. **Reduction of Overfitting**: Bagging reduces overfitting because it averages out the noise present in individual models, leading to a more generalized and reliable ensemble model. It is a fundamental technique used in ensemble learning to enhance predictive performance.
  • #10 1. Improved Accuracy: Bagging reduces variance and overfitting by combining predictions from multiple models, which often leads to more accurate and reliable predictions than any individual model. 2. Enhanced Robustness: By training multiple models on different subsets of the data, bagging reduces the impact of outliers and noisy data points, making the overall model more robust and resistant to errors. 3. Increased Stability: Bagging stabilizes the learning process; because it averages or combines predictions, it smooths out fluctuations in the training data, making the model's predictions more stable and consistent. 4. Reduction of Overfitting: By training multiple models on different subsets of the data and combining their predictions, bagging reduces the chance that any single model memorizes noise or outliers in the training data, leading to a more generalized ensemble that predicts accurately on new, unseen data. 5. Easy Parallelization: The independent nature of the base models allows them to be trained simultaneously on different subsets of data, speeding up training, which is particularly useful for large datasets.
  • #11 Explanation of Boosting: Boosting is an ensemble learning technique that improves model accuracy by converting weak learners into strong learners through a sequential learning process. Unlike bagging, boosting trains models sequentially, where each model corrects the errors made by its predecessor; it focuses on learning from mistakes, continually giving more attention to misclassified samples. Key points: Sequential Training of Models: boosting builds a series of weak learners (e.g., decision trees) sequentially, and each new model pays more attention to the data points that the previous models misclassified. Focus on Misclassified Samples: boosting algorithms identify and prioritize misclassified samples during training, giving more emphasis to the data points that are difficult to classify correctly. Weighted Aggregation of Predictions: predictions from individual models are combined with different weights; models that perform well receive higher weights, models that struggle with certain samples receive lower weights, and the final prediction is a weighted sum of the predictions from all models.
  • #17 Enhanced Accuracy and Stability: Ensemble techniques like bagging and boosting improve model accuracy by combining diverse models, ensuring more reliable predictions even in complex scenarios. Robustness to Data Challenges: Ensemble methods handle overfitting and outliers effectively, crucial for managing noisy or imbalanced datasets, making them indispensable in real-world applications. Continuous Improvement and Exploration: Regular experimentation with diverse base models, coupled with hyperparameter tuning, enables continuous learning, allowing adaptation to the latest advancements in ensemble techniques, ensuring optimal model performance.