2. Agenda
● Introduction to Ensemble Methods
● Families of ensemble methods
● Random Forest
● AdaBoost
● Gradient Tree Boosting
● Voting Classifier
● XGBoost
3. Introduction to Ensemble Methods
● Ensemble methods are techniques that create multiple models and then
combine them to produce improved results.
● Usually improves accuracy compared to the individual base models
● Very popular in machine learning competitions
5. Bagging
● Building multiple models (typically of the same type) from different
subsamples of the training dataset.
● Drawing multiple samples from the same training dataset; the number of
samples is configured by n_estimators
● Training one model on each drawn sample
● The final prediction is a function of the predictions of all models (a
minimal sketch follows below)
BaggingClassifier/Regressor RandomForestClassifier/Regressor ExtraTreesClassifier/Regressor
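A minimal sketch of bagging with scikit-learn's BaggingClassifier; the dataset, hyperparameter values, and the parameter name `estimator` (it is `base_estimator` in scikit-learn < 1.2) are illustrative assumptions, not part of the original slides.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each of the n_estimators models is trained on a bootstrap sample
# drawn from the same training set.
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=50,
    random_state=42,
)
bagging.fit(X_train, y_train)
print("Test accuracy:", bagging.score(X_test, y_test))
```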
6. Random Forest
● Used for both classification and regression
● Samples of the training dataset are taken with replacement
● Models are trained on these subsamples
● The final result is a function of the results of all participating models
● Reduces the variance of the base learning method
● The base algorithm is usually a decision tree
● Only a random subset of features is considered at each split, which reduces
correlation between trees (see the sketch below)
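A minimal sketch using scikit-learn's RandomForestClassifier; the dataset and the max_features="sqrt" setting are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# max_features="sqrt" restricts each split to a random subset of
# features, which decorrelates the individual trees.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                random_state=0)
forest.fit(X_train, y_train)
print("Test accuracy:", forest.score(X_test, y_test))
```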
8. Boosting
● Building multiple models (typically of the same type) each of which learns to
fix the prediction errors of a prior model in the chain.
● Creating a strong predictor from many weak learners
● This is done by building a model from the training data, then creating a
second model that attempts to correct the errors of the first model.
● Models are added until the training set is predicted perfectly or a maximum
number of models is reached (a minimal sketch follows below).
AdaBoost Gradient Tree Boosting XGBoost
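As an illustration of the boosting chain described above, here is a minimal sketch using scikit-learn's GradientBoostingClassifier on a synthetic dataset; all hyperparameter values are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Each new tree is fitted to the errors of the ensemble built so far;
# n_estimators caps how many corrective models are added.
gbt = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 random_state=1)
gbt.fit(X_train, y_train)
print("Test accuracy:", gbt.score(X_test, y_test))
```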
9. AdaBoost
● Best suited for binary classification
● Steps are as follows (see the sketch after this list)
- Train a weak learner on the training data
- Increase the weights of misclassified samples
- Higher-weight samples have a greater influence on the next model's
training
- The final prediction is a weighted function of all the participating models
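A minimal sketch of these steps using scikit-learn's AdaBoostClassifier; the decision-stump weak learner and all hyperparameters are illustrative assumptions (the parameter is named `base_estimator` in scikit-learn < 1.2).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

# Depth-1 trees (decision stumps) are the classic AdaBoost weak learner;
# each round reweights misclassified samples before fitting the next stump.
ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=50,
    random_state=7,
)
ada.fit(X_train, y_train)
print("Test accuracy:", ada.score(X_test, y_test))
```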
12. XGBoost
● An advanced implementation of the gradient boosting algorithm
● Regularized boosting to prevent overfitting
● Built-in mechanism for handling missing values
● Can continue training from an already trained model
● Built-in cross-validation (a sketch follows below)
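A minimal sketch, assuming the xgboost Python package is installed; the dataset and the regularization value are illustrative assumptions.

```python
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

# reg_lambda adds L2 regularization to the boosting objective;
# missing values in X are handled by a learned default split direction.
model = xgb.XGBClassifier(n_estimators=100, reg_lambda=1.0)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```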
13. Voting
● Building multiple models (typically of differing types) and using a simple or
weighted majority as the prediction
● Participating learning algorithms can be SVM, k-nearest neighbors, logistic
regression, or bagging and boosting methods.
● Here the participating learners should be strong.
● Weights can be assigned to the different algorithms (see the sketch below).
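A minimal sketch of a weighted soft-voting ensemble with scikit-learn's VotingClassifier; the choice of learners and weights is an illustrative assumption.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# voting="soft" averages predicted probabilities; the weights let some
# learners count more in the final vote.
voting = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier()),
        ("svc", SVC(probability=True)),  # probability=True enables soft voting
    ],
    voting="soft",
    weights=[2, 1, 2],
)
voting.fit(X, y)
print("Training accuracy:", voting.score(X, y))
```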
16. Visit : www.zekeLabs.com for more details
THANK YOU
Let us know how we can help your organization upskill its
employees to stay current in the ever-evolving IT industry.
Get in touch:
www.zekeLabs.com | +91-8095465880 | info@zekeLabs.com