AI HACKERS
ENSEMBLE LEARNING
INTRODUCTION TO ENSEMBLE LEARNING
Definition
• An ensemble consists of a set of individually trained classifiers (such as neural networks or decision
trees) whose predictions are combined when classifying novel instances
Source: http://jair.org/papers/paper614.html
ENSEMBLE MODELS
Combine Model Predictions Into Ensemble Predictions
The three most popular methods for combining the predictions from different models are:
• Bagging. Building multiple models (typically of the same type) from different subsamples of the training
dataset.
• Boosting. Building multiple models (typically of the same type) each of which learns to fix the
prediction errors of a prior model in the chain.
• Voting. Building multiple models (typically of differing types) and using simple statistics (such as the mean or a majority vote) to combine their predictions.
BAGGING
• Performs best with algorithms that have high variance
• Operates via equal weighting of models
• Settles on a result using majority voting
• Employs multiple instances of the same classifier on one dataset
• Builds models on smaller datasets created by sampling with replacement (bootstrap samples)
• Works best when the classifier is unstable (decision trees, for example), since this instability produces models of differing accuracy from whose results a majority can be drawn
• Bagging can hurt a stable model by introducing artificial variability from which inaccurate conclusions may be drawn
UNDERSTANDING IRIS DATASET
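The original slide presumably showed the iris data itself; a minimal sketch to load and inspect it with scikit-learn:

from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target  # 150 samples, 4 features, 3 classes
print(iris.feature_names)      # sepal/petal length and width
print(iris.target_names)       # setosa, versicolor, virginica
print(X.shape, y.shape)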
BAGGING – DECISION TREE
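The original slide showed a figure (see the scikit-learn iris decision-tree example in the references); as a baseline to bag, a single decision tree might be fit like this (a sketch under that assumption):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=7)
tree = DecisionTreeClassifier(random_state=7).fit(X_train, y_train)
print(tree.score(X_test, y_test))  # accuracy of a single, unbagged tree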
BAGGING – IN SCIKIT LEARN
• model = BaggingClassifier(base_estimator=choice, n_estimators=X, random_state=seed)
• Where base_estimator can be a classifier of our choice
• n_estimators is the number of estimators you want to build
• random_state sets the seed if you want to reproduce results across different runs
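A minimal runnable sketch of the call above, assuming bagged decision trees on the iris data (newer scikit-learn releases, 1.2+, name the argument estimator instead of base_estimator):

from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
seed = 7
model = BaggingClassifier(base_estimator=DecisionTreeClassifier(),  # 'estimator=' in scikit-learn >= 1.2
                          n_estimators=100,
                          random_state=seed)
model.fit(X, y)
print(model.score(X, y))  # training accuracy; use cross-validation (next slide) for a fair estimate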
CROSS VALIDATION
kfold = model_selection.KFold(n_splits=n, random_state=seed)
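A sketch of evaluating the bagging model with k-fold cross-validation (recent scikit-learn versions require shuffle=True whenever random_state is set on KFold):

from sklearn import model_selection
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
n, seed = 10, 7
model = BaggingClassifier(base_estimator=DecisionTreeClassifier(), n_estimators=100, random_state=seed)
kfold = model_selection.KFold(n_splits=n, shuffle=True, random_state=seed)
results = model_selection.cross_val_score(model, X, y, cv=kfold)
print(results.mean())  # mean accuracy across the n folds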
RANDOM FOREST
• An extension of bagged decision trees
• Samples of the training dataset are taken with replacement, but the trees are constructed in a way that reduces the correlation between the individual classifiers
• Rule of thumb: not all features are considered at each split; only a random subset is (typically the square root of the number of features for classification)
RANDOM FOREST V/S BAGGED FOREST
• Bagged forest: all predictor variables are considered for each tree
• Random forest: only a random subset of the predictor variables is considered at each split, which helps avoid overfitting
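A sketch contrasting the two with RandomForestClassifier (an illustrative assumption: max_features=None makes every split consider all features, which behaves like bagged trees, while max_features='sqrt' gives the usual random-forest behaviour):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
bagged_like = RandomForestClassifier(n_estimators=100, max_features=None, random_state=7)      # every split sees all features
random_forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=7)  # each split sees a random subset
print(cross_val_score(bagged_like, X, y, cv=10).mean())
print(cross_val_score(random_forest, X, y, cv=10).mean())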
EXTRA TREES
• Similar to Random Forest
• Differs in that the splits of the trees in a Random Forest are deterministic (the best threshold is chosen), whereas they are drawn at random in Extremely Randomized Trees
• The next split is the best split among random uniform splits in the selected variables for the current tree
IMPACT:
The original paper contains a bias-variance analysis
Extra Trees (ET) can be a bit worse when there is a high number of noisy features (in high-dimensional datasets)
Further reading: https://orbi.uliege.be/bitstream/2268/9357/1/geurts-mlj-advance.pdf
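A minimal sketch with scikit-learn's ExtraTreesClassifier, which shares the RandomForestClassifier interface (parameter values are illustrative):

from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = ExtraTreesClassifier(n_estimators=100, random_state=7)  # split thresholds drawn at random
print(cross_val_score(model, X, y, cv=10).mean())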
BOOSTING
• Instead of assigning equal weight to every model, boosting assigns varying weights to the classifiers and derives its final result from weighted voting.
• Operates via weighted voting
• Algorithm proceeds iteratively; new models are influenced by previous ones
• New models become experts for instances classified incorrectly by earlier models
• Can be used without weights by using resampling, with probability determined by weights
• Works well if the classifiers are not too complex
• Also works well with weak learners such as shallow decision trees
• Adaptive Boosting (AdaBoost) is a popular boosting algorithm and was the first successful one
• LogitBoost (derived from AdaBoost) is another; it uses additive logistic regression and handles multi-class problems
• Gradient Boosting is the most sophisticated boosting algorithm
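A minimal sketch of the two boosting classifiers in scikit-learn on the iris data (parameter values are illustrative assumptions):

from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
ada = AdaBoostClassifier(n_estimators=50, random_state=7)           # reweights misclassified samples each round
gbm = GradientBoostingClassifier(n_estimators=100, random_state=7)  # fits each new tree to the gradient of the loss
print(cross_val_score(ada, X, y, cv=10).mean())
print(cross_val_score(gbm, X, y, cv=10).mean())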
LOGIT BOOST V/S GRADIENT BOOST
• Gradient Boosting minimizes an arbitrary differentiable loss function (the exponential loss recovers AdaBoost), whereas LogitBoost minimizes the logistic regression (log-likelihood) loss.
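In scikit-learn the distinction shows up in the loss parameter of GradientBoostingClassifier; a sketch on a binary problem, since the exponential loss only supports two classes (the 'log_loss' name assumes a recent release, older ones call it 'deviance'):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
logistic_like = GradientBoostingClassifier(loss="log_loss", random_state=7)     # logistic (LogitBoost-style) loss
adaboost_like = GradientBoostingClassifier(loss="exponential", random_state=7)  # exponential loss, recovers AdaBoost
print(cross_val_score(logistic_like, X, y, cv=5).mean())
print(cross_val_score(adaboost_like, X, y, cv=5).mean())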
VOTING ENSEMBLE
• Combines the predictions from multiple machine learning algorithms (typically of differing types)
• Predictions of the sub-models can be weighted, but specifying the weights for classifiers manually or even heuristically is difficult. More advanced methods can learn how to best weight the predictions from sub-models; this is called stacking (stacked generalization), and older scikit-learn releases do not provide it (version 0.22 added a StackingClassifier).
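A minimal VotingClassifier sketch combining models of differing types (the sub-model choices are illustrative assumptions):

from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("tree", DecisionTreeClassifier(random_state=7)),
                ("svm", SVC(random_state=7))],
    voting="hard")  # majority vote; voting="soft" averages predicted probabilities instead
print(cross_val_score(ensemble, X, y, cv=10).mean())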
STACKING?
• Trains multiple different learners (as opposed to bagging/boosting, which train many instances of a single type of learner)
• Each learner uses a subset of data
• A "combiner" is trained on a validation segment
• Stacking uses a meta learner (as opposed to bagging/boosting which use voting schemes)
• Difficult to analyze theoretically ("black magic")
• Level-1 → meta learner
• Level-0 → base classifiers
• Can also be used for numeric prediction (regression)
• The best algorithms to use for base models are smooth, global learners
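A minimal manual stacking sketch (model choices are assumptions for illustration): level-0 base classifiers produce out-of-fold predictions via cross-validation, and a level-1 meta-learner is trained on them; scikit-learn 0.22+ wraps this pattern as StackingClassifier.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Level-0: base classifiers
base_models = [DecisionTreeClassifier(random_state=7),
               SVC(probability=True, random_state=7)]

# Out-of-fold probability predictions play the role of the validation segment the combiner sees
meta_features = np.hstack([cross_val_predict(m, X, y, cv=5, method="predict_proba")
                           for m in base_models])

# Level-1: the meta-learner ("combiner") is trained on the stacked predictions
meta_learner = LogisticRegression(max_iter=1000).fit(meta_features, y)
print(meta_learner.score(meta_features, y))  # optimistic; hold out a separate test set in practice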
THANK YOU
• REFERENCES
• https://machinelearningmastery.com/ensemble-machine-learning-algorithms-python-scikit-learn/
• http://scikit-learn.org/stable/auto_examples/tree/plot_iris.html#sphx-glr-auto-examples-tree-plot-iris-py
