2. Introduction to Ensemble Learning
• Ensemble Learning combines several machine learning
models (weak learners) on a single problem. The intuition is that when you
combine several weak learners, they can form a strong learner.
• Each weak learner is fitted on the training set and produces its
own predictions. The final prediction is computed by combining
the results from all the weak learners. In this manner, it usually
produces a more accurate solution than a single model would.
3. Basic Ensemble Learning Techniques
1. Max Voting: Each model's prediction counts as one vote. The final
prediction is the one with the most votes.
2. Averaging: The final prediction is the average of all predictions. This
is used for regression problems, for example random forest regression,
where the final result is the average of the predictions from the
individual decision trees.
3. Weighted Average: A base model with higher predictive power is
more important and is therefore given more weight; the
weights sum to 1.
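The three basic rules can be sketched in plain Python. The prediction values below are made-up illustrative numbers, not from the slides:

```python
from collections import Counter

# Hypothetical predictions from three base models for five samples
preds = [
    [0, 1, 1, 0, 1],  # model A
    [0, 1, 0, 0, 1],  # model B
    [1, 1, 1, 0, 0],  # model C
]

# Max voting: each model's prediction is one vote per sample;
# the majority class wins.
def max_vote(model_preds):
    return [Counter(sample).most_common(1)[0][0] for sample in zip(*model_preds)]

# Averaging (regression): the final prediction is the mean of all predictions.
def average(model_preds):
    return [sum(sample) / len(sample) for sample in zip(*model_preds)]

# Weighted average: weights reflect each model's predictive power
# and are assumed to sum to 1.
def weighted_average(model_preds, weights):
    return [sum(w * p for w, p in zip(weights, sample))
            for sample in zip(*model_preds)]

print(max_vote(preds))                           # [0, 1, 1, 0, 1]
print(weighted_average(preds, [0.5, 0.3, 0.2]))
```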
4. Advanced Ensemble Learning Techniques
1. Stacking
• Here, various estimators are combined to
reduce their biases.
• Predictions from each estimator are stacked
together and used as input to a final
estimator (meta-model).
• Training of the final estimator happens via
cross-validation.
5. Stacking - Methodology
1. Divide the training set into K folds, for example 10.
2. Train a base model (say, a decision tree) on 9 folds and
make predictions on the 10th fold.
3. Repeat until you have a prediction for each fold.
4. Fit the base model on the whole training set.
5. Use the model to make predictions on the test set.
6. Repeat steps 2 to 5 for the other base models (for example
kNN).
7. Use the out-of-fold predictions on the training set as
features to build a new model (the meta-model).
8. Make final predictions on the test set by feeding the base
models' test-set predictions to the meta-model.
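The steps above can be sketched with scikit-learn's `StackingClassifier`, whose `cv` parameter automates the K-fold out-of-fold predictions. The dataset and model choices here are illustrative assumptions, not from the slides:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Illustrative synthetic dataset
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Base models (decision tree and kNN, as in the slides); with cv=10 the
# out-of-fold predictions from 10 folds become the meta-model's features.
stack = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier(random_state=0)),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(),  # the meta-model
    cv=10,
)
stack.fit(X_train, y_train)
print(stack.score(X_test, y_test))
```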
6. Advanced Ensemble Learning Techniques
2. Blending
• Uses a holdout/validation set from the training set to make
predictions, i.e., predictions are made on the holdout set only.
• The holdout set and the predictions are used to build a model which
is run on the test set.
• It is simpler than stacking and prevents information leakage into the
model.
• However, blending uses less data, which may lead to overfitting.
7. Blending - Methodology
1. Split the training set into training and
validation sets.
2. Fit the base models on the training set.
3. Make predictions on the validation and
test sets.
4. Use the validation set and its predictions to build
a final model (the meta-model).
5. Make final predictions with this model on the
test-set predictions.
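A minimal sketch of the blending steps above, assuming scikit-learn base models; the dataset and model choices are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=600, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# Step 1: carve a holdout/validation set out of the training set
X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train, random_state=0)

# Step 2: fit base models on the reduced training set
base_models = [DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr),
               KNeighborsClassifier().fit(X_tr, y_tr)]

# Step 3: make predictions on the validation and test sets
val_preds = np.column_stack([m.predict_proba(X_val)[:, 1] for m in base_models])
test_preds = np.column_stack([m.predict_proba(X_test)[:, 1] for m in base_models])

# Step 4: the meta-model is trained on the holdout predictions only
meta = LogisticRegression().fit(val_preds, y_val)

# Step 5: final predictions on the test-set predictions
print(meta.score(test_preds, y_test))
```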
8. Advanced Ensemble Learning Techniques
3. Bagging
• Takes random samples of the data (with replacement), builds a model
on each sample, and averages their predictions.
• Also known as bootstrap aggregating.
• Aggregates the results from several models in order to obtain a
generalized result.
9. Bagging - Methodology
1. Create multiple subsets from original
dataset with replacement.
2. Build a base model for each of the
subsets.
3. Run all the models in parallel.
4. Combine predictions from all models
to obtain the final prediction.
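The four steps above map directly onto scikit-learn's `BaggingClassifier`; its default base estimator is a decision tree, and the other parameter values here are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Steps 1-3: 25 bootstrap subsets (sampling with replacement), one model
# (a decision tree by default) per subset, trained in parallel (n_jobs=-1).
bag = BaggingClassifier(
    n_estimators=25,
    bootstrap=True,
    n_jobs=-1,
    random_state=0,
)
bag.fit(X_train, y_train)
# Step 4: predictions are combined (here, averaged/voted) across all models
print(bag.score(X_test, y_test))
```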
10. Advanced Ensemble Learning Techniques
4. Boosting
• If a data point is incorrectly predicted by the first model, and likely
by the next (perhaps by all models), will combining the predictions provide
better results? Such situations are taken care of by boosting.
• Boosting is a sequential process, where each subsequent model
attempts to correct the errors of the previous model.
• Reduces bias and variance by converting weak learners into strong
learners.
11. Boosting - Methodology
1. Create a subset from the original data.
2. Build an initial model with this data.
3. Run predictions on the whole dataset.
4. Calculate the error using the predictions
and the actual values.
5. Assign more weight to the incorrect
predictions.
6. Create another model that attempts to fix
errors from the last model.
7. Run predictions on the entire dataset with
the new model.
8. Create several models with each model
aiming at correcting the errors generated by
the previous one.
9. Obtain the final prediction as a weighted
mean of all the models' predictions.
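AdaBoost is a classic instance of this loop: each round reweights the misclassified points, and the final prediction is a weighted vote over all the models. A minimal sketch with scikit-learn's `AdaBoostClassifier` (its default base learner is a shallow decision tree; the dataset is illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 50 sequential weak learners; each one focuses on the points the previous
# ones got wrong, and the final prediction is their weighted vote.
boost = AdaBoostClassifier(n_estimators=50, random_state=0)
boost.fit(X_train, y_train)
print(boost.score(X_test, y_test))
```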
12. …and many more methods as well.
Libraries for Ensemble Learning