2. Let's Discover…
01 – Intro to Ensembles: Let's understand what ensemble modelling is.
02 – Titanic Example: This example will give a better idea of ensembles.
03 – What is Bagging: Random Forest, the bias-variance trade-off & much more…
04 – What is Boosting: The boosting process…
4. The Anatomy of Decision Trees
What a Tree Looks Like
• Splitting: the process of dividing a node into sub-nodes.
• Decision Node: when a sub-node is divided into further sub-nodes, it is called a decision node.
• Terminal Node: nodes that do not split further are called leaf or terminal nodes.
• Branch/Sub-Tree: a subsection of the entire tree.
• Parent/Child Node: a node that is divided into sub-nodes is called the parent node, and its sub-nodes are called the children of the parent node.
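To make these terms concrete, here is a minimal sketch that trains a tiny tree and prints its structure, assuming scikit-learn is available; the toy data and feature names are invented for illustration. In the printed output, the top split is the root, intermediate splits are decision nodes, and the "class: …" lines are leaf/terminal nodes.

# Minimal sketch of tree anatomy, assuming scikit-learn is installed.
# The tiny "play basketball" dataset below is invented for illustration.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[30, 85], [27, 90], [22, 70], [18, 65], [25, 80], [20, 60]]  # [temperature, humidity]
y = [0, 0, 1, 1, 0, 1]                                            # play: yes (1) / no (0)

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["temperature", "humidity"]))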
5. Maths Behind Decision Trees
• There are two algorithms for constructing a decision tree – Gini Index and ID3 (Iterative Dichotomiser 3).
• Here we will focus on ID3, which uses Entropy and Information Gain as its metrics. We will now find the root node using the ID3 algorithm.
• Entropy – the measure of uncertainty/impurity in the data:

$H(S) = -\sum_{i} p(i) \log_2 p(i)$

where
H – entropy
i – a class in the dataset
p(i) – the proportion of elements of class i in the dataset S
S – the current dataset, e.g. the basketball data

Remember that, for a binary classification problem, if all examples are positive or all are negative, the entropy is 0, i.e. low. If half of the examples are positive and half are negative, the entropy is 1, i.e. high.
[Figure: a sample set of five balls, B B B R R – three blue (B) and two red (R)]
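As a worked check on the mixed B/R set above, here is a small sketch; the entropy helper is our own, not from the slides. ID3 would then pick the split whose children have the lowest weighted entropy, i.e. the highest information gain.

# Worked entropy example for the set {B, B, B, R, R} shown above.
import math

def entropy(labels):
    """Shannon entropy: H(S) = -sum_i p(i) * log2(p(i))."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

print(entropy(["B", "B", "B", "R", "R"]))  # ~0.971: an impure, mixed set
print(entropy(["B"] * 5))                  # 0 (prints -0.0): a pure set has no impurity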
6. Ensemble Methods
Phone a Friend in KBC vs. the Audience Poll
Asking a single person resembles a single decision tree.
Asking a group of people resembles multiple decision trees, i.e. a Random Forest.
8. What is Bagging
• Decision trees tend to overfit the data, which increases variance and makes the model vulnerable.
• Bagged trees average many models to reduce this variance.
• Although bagging is most often applied to decision trees, it can be used with any type of method.
• In addition to reducing variance, it also helps avoid overfitting.
• Bagging stands for Bootstrap AGGregatING.
• Bootstrap sampling means sampling rows from the training data with replacement.
• This means that a single training example can be repeated more than once in a sample (see the sketch below).
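A minimal sketch of bootstrap sampling, assuming NumPy; the slides do not prescribe a library.

# Bootstrap sampling: draw rows from the training data WITH replacement.
import numpy as np

rng = np.random.default_rng(seed=42)
N = 10                                  # size of the original training set
X = np.arange(N)                        # stand-in for the 10 training rows

m = N // 2                              # common choice for bagged trees: m = N/2
sample = rng.choice(X, size=m, replace=True)
print(sample)                           # some rows may repeat, others never appear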
9. How Bagging Works
Step 1 – Draw a sample of m rows, with replacement, from the original training set, where m is a number less than or equal to N.
Note – With bagged trees, a common choice for m is one half of N.
Step 2 – Train a decision tree on each newly created bootstrapped sample.
Steps 1 and 2 can be repeated n times; typically, the more trees, the better the model.
Step 3 – To generate a prediction, we simply average the predictions of the models trained on the bootstrapped samples to get a final prediction (see the sketch below).
Bagging can dramatically reduce the variance of unstable models (e.g. decision trees), leading to improved predictions.
Averaging reduces variance but leaves the bias unchanged.
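Here is a sketch of the three steps above, assuming scikit-learn and NumPy; the toy regression data is invented for illustration.

# Manual bagging: bootstrap (step 1), train trees (step 2), average (step 3).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))              # toy inputs
y = np.sin(X).ravel() + rng.normal(0, 0.3, size=100)

N = len(X)
m = N // 2            # step 1: sample size, using the common choice m = N/2
n_trees = 50

trees = []
for _ in range(n_trees):
    idx = rng.choice(N, size=m, replace=True)       # step 1: bootstrap sample
    trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))  # step 2: train

# Step 3: average the per-tree predictions for the final prediction.
X_new = np.array([[0.5]])
print(np.mean([t.predict(X_new) for t in trees]))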
14. What is Boosting
An Overview
• Boosting is a sequential process in which each subsequent model attempts to correct the errors of the previous model.
• The succeeding models therefore depend on the previous model.
• Boosting gives misclassified samples a higher preference/weight.
• It is a method to "boost" a weak learning algorithm (a single tree) into a strong learning algorithm.
15. What is Boosting
End-to-End Process
Let's understand how boosting works, step by step:
1. A subset is created from the original dataset.
2. Initially, all data points are given equal weights, and a base model is created on this subset.
3. This model is used to make predictions on the whole dataset.
4. Errors are calculated from the actual and predicted values.
5. The observations that are incorrectly predicted are given higher weights.
6. Another model is created, and predictions are made on the dataset.
7. This model tries to correct the errors of the previous model.
8. Similarly, multiple models are created, each reducing the errors of the previous one.
9. The final model is the weighted mean of all the models (see the sketch below).
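The nine steps above essentially describe the AdaBoost algorithm. Here is a minimal sketch, assuming scikit-learn; the slides do not name a library. By default, AdaBoostClassifier uses depth-1 decision trees ("stumps") as its weak learners.

# Minimal AdaBoost sketch; toy data generated with make_classification.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=200, random_state=0)

# Each boosting round re-weights the misclassified points (steps 4-5);
# the final model is a weighted combination of all rounds (step 9).
model = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
print(model.score(X, y))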
16. Difference between Bagging & Boosting
Bagging:
• Individual trees are independent of each other; bagging improves performance by aggregating the results of weak learners.
• The training stage is parallel: each model is built independently.
• Its purpose is to reduce variance, and it may solve the problem of overfitting.
Boosting:
• Individual trees are not independent of each other, because each tree corrects the results of the previous trees.
• The training stage is sequential: weights are assigned to the data.
• Its purpose is to reduce bias, and it may increase overfitting.
17. How Boosting Works
Boosting Process
Let's walk through the boosting process on a small example:
1. Suppose we take a dataset D that has 10 observations.
2. When the model is fitted, it creates base learners sequentially; here we have the first base learner, BL1.
3. The records are passed to BL1 and we see how the model performs.
4. Let's assume that the 3rd, 4th and 5th records are incorrectly classified.
5. The next learner, BL2, takes these incorrectly classified records, adds more weight to them, and retrains.
6. If the errors persist, BL3 is created, and this process continues until the errors are brought to a minimum.
[Figure: dataset D with records 1–10 passed to Base Learner 1; its incorrectly classified records (3rd, 4th & 5th) are re-weighted and passed to Base Learner 2]
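Here is a sketch of the re-weighting in steps 4 and 5 above, assuming scikit-learn and NumPy; the "double the weight" rule is a deliberate simplification of the exact boosting weight update, used only to illustrate the mechanics.

# Up-weighting misclassified records before the next base learner.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 2))            # dataset D with 10 observations
y = (X[:, 0] + X[:, 1] > 0).astype(int)

weights = np.ones(10) / 10              # step 2: all points start with equal weight

bl1 = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=weights)
wrong = bl1.predict(X) != y             # step 4: find the misclassified records

weights[wrong] *= 2.0                   # step 5: add more weight to them (simplified rule)
weights /= weights.sum()                # renormalise the weights

bl2 = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=weights)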
23. Advantages of Trees
• Highly accurate: almost half of data science challenges are won by tree-based models.
• Easy to use: easy to implement, with good performance achievable with little tuning.
• Easy to interpret and control.
• Controls overfitting (when used as an ensemble).
• Trains quickly and scales up to large datasets.