DECODING ML MODELS
BAGGING, BOOSTING & OTHERS
Let's Discover…
01  Intro to Ensembles: let's understand what ensemble modelling is.
02  Titanic Example: this example will give a better idea of ensembles.
03  What is Bagging: Random Forest, the bias-variance trade-off & much more…
04  What is Boosting: the boosting process…
Let's Dive In…
The Anatomy of Decision Trees
What the Trees Look Like
Splitting: the process of dividing a node into sub-nodes.
Decision Node: when a sub-node is divided into further sub-nodes, it is called a decision node.
Terminal Node: nodes that do not split further are called leaf or terminal nodes.
Branch/Sub-Tree: a subsection of the entire tree.
Parent/Child Node: a node that is divided into sub-nodes is called the parent node, and its sub-nodes are called children of the parent node.
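To make these terms concrete, here is a minimal sketch (not part of the original deck) that fits a small tree and prints its structure; the toy features and labels are invented for illustration.

```python
# Minimal sketch (not from the deck): fit a tiny decision tree and print its
# structure so that splits, decision nodes and leaf (terminal) nodes are visible.
from sklearn.tree import DecisionTreeClassifier, export_text

# Each row: [outlook_sunny, temperature_high]; target: played basketball (1) or not (0)
X = [[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [0, 0]]
y = [1, 1, 0, 0, 1, 0]

tree = DecisionTreeClassifier(criterion="entropy", max_depth=2).fit(X, y)

# Lines with a feature threshold are decision nodes (splits); "class:" lines are terminal nodes.
print(export_text(tree, feature_names=["outlook_sunny", "temperature_high"]))
```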
Maths Behind Decision Trees
• There are two common approaches to constructing a decision tree – the Gini Index and ID3 (Iterative Dichotomiser 3).
• Here we will focus on ID3. It uses Entropy and Information Gain as metrics, so we will now find the root node using the ID3 algorithm.
• Entropy – the measure of uncertainty/impurity in the data:
H(S) = − Σᵢ p(i) · log₂ p(i)
where
H – entropy
i – a class in the dataset
p(i) – the proportion of elements of class i in the dataset S
S – the current dataset, e.g. the basketball data
Remember that for a binary classification problem, if all examples are positive or all are negative, the entropy is 0, i.e. low. If half of the examples are positive and half are negative, the entropy is 1, i.e. high.
[Illustration: groups of blue (B) and red (R) balls – a group that is all one colour has entropy 0, while a mixed group has higher entropy.]
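A minimal Python sketch (not part of the original deck) of the entropy formula above, using the blue/red ball example:

```python
# Minimal sketch (not from the deck): compute H(S) = -sum_i p(i) * log2 p(i)
# for a list of class labels, as defined above.
from collections import Counter
import math

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

print(entropy(["B", "B", "B"]))             # pure set -> 0.0 (low)
print(entropy(["B", "B", "R", "R"]))        # 50/50 mix -> 1.0 (high)
print(entropy(["B", "B", "B", "R", "R"]))   # 3 blue, 2 red -> ~0.971
```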
Ensemble Methods
Phone a Friend in KBC vs Audience Poll
Asking a single person is like relying on a single decision tree.
Asking a group of people is like combining multiple decision trees.
Decision Trees → Random Forest
Image Source: Google
Bagging
What is Bagging
• Decision Trees tend to overfit and increase variance, which makes the model fragile.
• Bagged trees average many models to reduce that variance.
• Although bagging is most often applied to decision trees, it can be used with any type of model.
• In addition to reducing variance, it also helps avoid overfitting.
• Bagging stands for Bootstrap AGGregatING.
• Bootstrap sampling means sampling rows from the training data with replacement.
• This means that a single training example can appear more than once in a sample.
Image Source: Google
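A minimal sketch (not part of the original deck) of bootstrap sampling; the array names and sizes are illustrative:

```python
# Minimal sketch (not from the deck): draw one bootstrap sample, i.e. sample rows
# with replacement, so some rows repeat and others are left out.
import numpy as np

rng = np.random.default_rng(42)
X = np.arange(10).reshape(10, 1)          # 10 training rows, ids 0..9
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

m = len(X) // 2                           # the deck's common choice: m = N/2
idx = rng.integers(0, len(X), size=m)     # row indices sampled with replacement
X_boot, y_boot = X[idx], y[idx]
print(idx)                                # repeated indices show repeated rows
```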
How Bagging Works
Step 1 – Draw m samples with replacement from the original training set, where m is a number less than or equal to N (the size of the training set).
Note – With bagged trees, a common choice for m is one-half of N.
Step 2 – Train a decision tree on each newly created bootstrap sample.
Steps 1 and 2 can be repeated n times; typically, the more trees, the better the model.
Step 3 – To generate a prediction, simply average the predictions from the models built on these samples to get the final prediction.
Bagging can dramatically reduce the variance of unstable models (e.g. decision trees), leading to improved predictions.
Image Source: Google
Averaging reduces variance but leaves the bias unchanged.
Bagging Algorithm
Image Source: Google
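The algorithm on this slide was shown as an image; here is a minimal sketch (not part of the original deck) of the same bagging loop on an illustrative synthetic dataset. In practice the same idea is available off the shelf, e.g. in scikit-learn's BaggingClassifier.

```python
# Minimal sketch (not from the deck) of bagged trees: fit n_trees decision trees
# on bootstrap samples of size m = N/2 and combine them by majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

n_trees, m = 25, len(X) // 2
trees = []
for _ in range(n_trees):
    idx = rng.integers(0, len(X), size=m)                       # Step 1: bootstrap sample
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))  # Step 2: fit a tree

# Step 3: average the per-tree predictions and threshold (majority vote)
votes = np.mean([t.predict(X) for t in trees], axis=0)
y_hat = (votes >= 0.5).astype(int)
print("bagged-ensemble training accuracy:", (y_hat == y).mean())
```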
Boosting
What is Boosting
An Overview
• Boosting is a sequential process in which each subsequent model attempts to correct the errors of the previous model.
• The succeeding models therefore depend on the previous models.
• Boosting gives misclassified samples a higher preference/weight.
• It is a method to "boost" a weak learning algorithm (a single tree) into a strong learning algorithm.
Image Source: Google
What is Boosting
End to End Process
Let’s understand the way boosting works in the steps:
1. A subset is created from the original dataset.
2. Initially, all data points are given equal weights & a base model
is created on this subset.
3. This model is used to make predictions on the whole dataset.
4. Errors are calculated using actual values & predicted values.
5. The observations which are incorrectly predicted, given higher
weights.
6. Another model is created & predictions are made on the
dataset.
7. The model tries to correct the error from the previous data
set.
8. Similarly, multiple models are created to reduce the errors
of the previous model.
9. The final model is the weighted mean of all models.
Image Source: Google
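This reweighting procedure is essentially what AdaBoost implements. A minimal usage sketch (not part of the original deck), with an illustrative synthetic dataset:

```python
# Minimal sketch (not from the deck): AdaBoost follows the weighted, sequential
# procedure listed above; the dataset and hyper-parameters are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each new weak learner focuses on the samples the previous ones got wrong.
model = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=0)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```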
Difference between Bagging & Boosting
Bagging
• Individual trees are independent of each other; bagging improves performance by aggregating the results of weak learners.
• The training stage is parallel: each model is built independently.
• The purpose of bagging is to reduce variance; it may solve the problem of over-fitting.
Boosting
• Individual trees are not independent of each other, because each tree corrects the results of the previous trees.
• The training stage is sequential: weights are assigned to the data.
• The purpose of boosting is to reduce bias; it may increase overfitting.
How Boosting Works
Boosting Process
Let's walk through how boosting works:
1. Suppose we take a dataset D that has 10 observations.
2. When the model is fitted, it creates base learners sequentially; here we have the first base learner (BL1).
3. The records are passed to BL1 and we see how the model has performed.
4. Let's assume that the 3rd, 4th and 5th records are incorrectly classified.
5. The next learner, BL2, takes these incorrectly classified records, gives them more weight and retrains.
6. If errors remain, BL3 is created, and this process continues until the errors are brought to a minimum.
[Diagram: dataset D (records 1–10) fed to Base Learner 1; the incorrectly classified records (3rd, 4th & 5th) are passed with higher weights to Base Learner 2.]
Boosting Algorithm
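The algorithm on this slide was shown as an image; below is a minimal AdaBoost-style sketch (not part of the original deck) of the weight-update loop described above, for labels in {-1, +1}. The function names are illustrative.

```python
# Minimal sketch (not from the deck) of an AdaBoost-style boosting loop:
# reweight misclassified samples each round and combine the weak learners
# by their weighted vote. Assumes y contains only -1 and +1.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=10):
    n = len(X)
    w = np.full(n, 1.0 / n)                     # start with equal weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)        # weak learner on weighted data
        pred = stump.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # this learner's say in the final vote
        w *= np.exp(-alpha * y * pred)          # boost weights of misclassified rows
        w /= w.sum()
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def adaboost_predict(X, learners, alphas):
    votes = sum(a * l.predict(X) for l, a in zip(learners, alphas))
    return np.sign(votes)                       # weighted vote of all weak learners
```

scikit-learn's AdaBoostClassifier implements this loop (with refinements) out of the box.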
Advantages of Trees
• Highly accurate: a large share of data-science competitions are won by tree-based models.
• Easy to use: simple to implement and gives good performance with little tuning.
• Easy to interpret and control.
• Controls overfitting (when used in an ensemble).
• Trains quickly and scales to large datasets.
Thank you