This document discusses ensemble modeling techniques. It begins with an introduction to ensemble models and their advantages over single models in reducing biases, variability, and inaccuracies. It then explains how ensemble models work by combining the predictions from multiple machine learning models. Common ensemble methods like bagging and boosting are described, along with the mathematics of reducing bias, variance, and noise. Bagging is explained in more detail, including the bagging algorithm and an example of bagging ensembles using R. The document concludes by outlining topics to cover in subsequent sections, such as boosting, comparing bagging and boosting, and gradient boosting machines.
2. Road Map
Introduction
Ensemble models and possible drawbacks of a single specific model
How ensemble models work, with an example
Frequently used ensemble methods and mathematics
Bagging and Bagging Algorithm
Bagging ensembles using R
Comparison of results
Continue with...
3. Introduction
Many of you might have studied and practiced different classification as well as regression algorithms.
Also, a modeler often uses only one model at a time.
Ever wondered what would happen if we could combine more than one classification model?
Would the resulting combination be more accurate or less variable?
We will answer these questions shortly.
4. Ensemble models and possible drawbacks of a single specific model
Ensembles are the answers to these questions
It is the process of running two or more related but different machine learning models and then synthesizing their results into a single predictive model
A single specific model, by contrast, can suffer from:
Biases
High variability
Outright inaccuracies
5. How ensemble models work and an example of an ensemble
Producing a distribution of simple ML models on subsets of the original data
Combining the distributions into one aggregated model
Random Forest
It is a group of multiple decision trees, each built on different sample data, that evaluate different factors and/or weight common variables differently.
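As a concrete illustration (not part of the original slides), here is a minimal random forest sketch in R, assuming the randomForest package and the built-in iris data set:

```r
# Each tree is grown on a different bootstrap sample of the rows and
# considers a random subset of variables at each split; class
# predictions are aggregated across trees by majority vote.
library(randomForest)

set.seed(42)
rf <- randomForest(Species ~ ., data = iris,
                   ntree = 500,  # number of trees in the ensemble
                   mtry  = 2)    # variables tried at each split
print(rf)  # out-of-bag error estimate and confusion matrix
```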
7. Frequently used ensemble methods and mathematics
Bagging
Boosting
The distance between the predicted value (ŷ) and the actual value (y) should be small.
E[(y − ŷ)²] = Bias² + Variance + Noise
Bias - the average distance between the predictions and the actual values.
Variance - the variability in the predictions.
Noise - a lower bound on the prediction error that any predictor can achieve (the irreducible error).
If we want to minimize E[(y − ŷ)²], we have to minimize all three components.
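Written out in full (a standard identity that the slide states informally), for y = f(x) + ε with E[ε] = 0 and Var(ε) = σ²:

```latex
\mathbb{E}\!\left[(y - \hat{y})^2\right]
  = \underbrace{\left(f(x) - \mathbb{E}[\hat{y}]\right)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\!\left[\left(\hat{y} - \mathbb{E}[\hat{y}]\right)^2\right]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{Noise}}
```

Bagging mainly attacks the Variance term, while boosting mainly attacks the Bias term; the Noise term is irreducible.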
8. Bagging and Bagging Algorithm
Bagging stands for Bootstrap Aggregation
Bagging is a way to decrease the variance of your predictions by generating additional training sets from the original data through different combinations with repetition
Bagging Algorithm
1. Samples (with replacement) are repeatedly taken from the data set, so that each record has an equal probability of being selected and each sample is the same size as the original training data set. These are bootstrapped samples.
2. Train a model on each sample and record its predictions.
3. The bagging ensemble prediction is defined as the class with the most votes (for classification) or the average of the predictions made (for regression).
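A minimal sketch of these three steps in R (not from the original slides), assuming rpart decision trees as the base learner and the built-in iris data:

```r
library(rpart)  # decision trees as the base learner

set.seed(42)
n_models <- 25
n <- nrow(iris)

# Steps 1 and 2: draw bootstrap samples (same size as the data, with
# replacement), train one tree per sample, and record its predictions.
votes <- sapply(seq_len(n_models), function(i) {
  idx  <- sample(n, n, replace = TRUE)
  tree <- rpart(Species ~ ., data = iris[idx, ], method = "class")
  as.character(predict(tree, newdata = iris, type = "class"))
})

# Step 3: the ensemble prediction is the class with the most votes
# across the n_models trees.
bagged <- apply(votes, 1, function(v) names(which.max(table(v))))
mean(bagged == iris$Species)  # accuracy of the bagged ensemble
```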
9. Bagging Ensembles using R
A small case study using R, showing how ensemble bagging works.
The data source is the UCI Machine Learning Repository - Car Evaluation Data Set
Regression models are used
[Code listing: Bagging in R]
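The original code listing did not survive extraction; what follows is a minimal sketch of how such a case study might look, assuming the ipred package's bagging() function and the UCI Car Evaluation Data Set (column names taken from the UCI documentation). It fits a classification ensemble, since the data set's target variable (class) is categorical:

```r
library(ipred)  # provides bagging() for bootstrap-aggregated trees

# Car Evaluation Data Set from the UCI repository
url  <- "https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data"
cars <- read.csv(url, header = FALSE,
                 col.names = c("buying", "maint", "doors", "persons",
                               "lug_boot", "safety", "class"),
                 stringsAsFactors = TRUE)

set.seed(42)
fit <- bagging(class ~ ., data = cars,
               nbagg = 25,    # number of bootstrap replications
               coob  = TRUE)  # compute the out-of-bag error estimate
fit$err  # out-of-bag misclassification error
```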
11. Continue with...
Boosting and Boosting in R
Bagging and Boosting case study in Python
Bagging-Boosting comparison
The famous GBM (Gradient Boosting Machine)
GBM in R as well as in Python, with case studies