AI HACKERS
ENSEMBLE LEARNING
INTRODUCTION TO ENSEMBLE LEARNING
Definition
• An ensemble consists of a set of individually trained classifiers (such as neural networks or decision
trees) whose predictions are combined when classifying novel instances
Source: http://jair.org/papers/paper614.html
ENSEMBLE MODELS
Combine Model Predictions Into Ensemble Predictions
The three most popular methods for combining the predictions from different models are:
• Bagging. Building multiple models (typically of the same type) from different subsamples of the training
dataset.
• Boosting. Building multiple models (typically of the same type) each of which learns to fix the
prediction errors of a prior model in the chain.
• Voting. Building multiple models (typically of differing types) and using simple statistics (like the mean) to combine their predictions.
BAGGING
• Performs best with algorithms that have high variance
• Operates via equal weighting of models
• Settles on a result using majority voting
• Employs multiple instances of the same classifier on one dataset
• Builds models on smaller datasets obtained by sampling with replacement
• Works best when the classifier is unstable (decision trees, for example), as this instability creates models of differing accuracy and results from which to draw a majority
• Bagging can hurt a stable model by introducing artificial variability, from which inaccurate conclusions may be drawn
UNDERSTANDING IRIS DATASET
BAGGING – DECISION TREE
BAGGING – IN SCIKIT LEARN
• model = BaggingClassifier(base_estimator=choice, n_estimators=X, random_state=seed)
• where base_estimator is the classifier of our choice
• n_estimators is the number of estimators you want to build
• random_state sets the seed if you want to reproduce results across different runs and models
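• A minimal end-to-end sketch of the call above, using a decision tree as the base estimator on the Iris dataset (n_estimators=100 and seed 7 are illustrative choices, not values from the slides; newer scikit-learn releases rename base_estimator to estimator):

from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
seed = 7

# 100 bagged decision trees; base_estimator is called estimator in scikit-learn >= 1.2
model = BaggingClassifier(base_estimator=DecisionTreeClassifier(),
                          n_estimators=100,
                          random_state=seed)
model.fit(X, y)
print(model.predict(X[:5]))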
CROSS VALIDATION
kfold = model_selection.KFold(n_splits=n, shuffle=True, random_state=seed)  # newer scikit-learn requires shuffle=True when a random_state is given
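• A minimal sketch of evaluating a bagged model with the k-fold object above (10 splits and seed 7 are illustrative assumptions):

from sklearn import model_selection
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier

X, y = load_iris(return_X_y=True)
seed = 7

# 10-fold cross-validation of a bagged ensemble (defaults to decision-tree base estimators)
kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
model = BaggingClassifier(n_estimators=100, random_state=seed)
results = model_selection.cross_val_score(model, X, y, cv=kfold)
print("Mean accuracy: %.3f" % results.mean())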
RANDOM FOREST
• extension of bagged decision trees
• Samples of the training dataset are taken with replacement, but the trees are constructed in a way that
reduces the correlation between individual classifiers
• Rule of thumb: not all features are considered at each split; only a random subset is (typically about √p of the p features for classification)
RANDOM FOREST V/S BAGGED FOREST
• Bagged Forest: all predictor variables are considered at each split of every tree
• Random Forest: only a random subset of predictor variables is considered at each split, which helps avoid overfitting
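• A minimal random forest sketch: max_features caps how many predictor variables each split may consider, which is the extra randomization a random forest adds on top of bagging (max_features=2 is an illustrative choice for Iris's four features):

from sklearn import model_selection
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=7)

# Each split considers only 2 of the 4 predictors, decorrelating the trees
model = RandomForestClassifier(n_estimators=100, max_features=2, random_state=7)
results = model_selection.cross_val_score(model, X, y, cv=kfold)
print("Random forest mean accuracy: %.3f" % results.mean())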
EXTRA TREES
• Similar to Random Forest
• Differs in that the splits of the trees in a Random Forest are deterministic (the best threshold is searched for), whereas they are drawn at random in Extremely Randomized Trees
• The next split is the best split among randomly (uniformly) drawn candidate splits in the selected variables for the current tree
Impact: the paper linked below contains a bias-variance analysis; Extra Trees can be a bit worse when there is a high number of noisy features (in high-dimensional datasets). A minimal code sketch follows the further-reading link.
Further reading: https://orbi.uliege.be/bitstream/2268/9357/1/geurts-mlj-advance.pdf
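• A minimal Extremely Randomized Trees sketch; the interface mirrors the random forest above, the difference being that split thresholds are drawn at random rather than optimized:

from sklearn import model_selection
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier

X, y = load_iris(return_X_y=True)
kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=7)

# Candidate split points are drawn at random per feature; the best of those is kept
model = ExtraTreesClassifier(n_estimators=100, max_features=2, random_state=7)
results = model_selection.cross_val_score(model, X, y, cv=kfold)
print("Extra trees mean accuracy: %.3f" % results.mean())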
BOOSTING
• Instead of assigning equal weighting to models, boosting assigns varying weights to classifiers, and derives its ultimate result
based on weighted voting.
• Operates via weighted voting
• Algorithm proceeds iteratively; new models are influenced by previous ones
• New models become experts for instances classified incorrectly by earlier models
• Can be used without weights by using resampling, with probability determined by weights
• Works well if classifiers are not too complex
• Also works well with weak learners like decision trees
• Adaptive Boosting (AdaBoost) is a popular boosting algorithm and was the first practically successful one
• LogitBoost (derived from AdaBoost) is another, which uses additive logistic regression and handles multi-class problems
• Gradient Boosting is one of the most sophisticated boosting algorithms
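• A minimal AdaBoost sketch with decision stumps (max_depth=1) as the weak learners; 50 estimators and seed 7 are illustrative choices:

from sklearn import model_selection
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=7)

# Each stump gets a weight based on its accuracy; later stumps focus on earlier mistakes
model = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                           n_estimators=50, random_state=7)
results = model_selection.cross_val_score(model, X, y, cv=kfold)
print("AdaBoost mean accuracy: %.3f" % results.mean())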
LOGIT BOOST V/S GRADIENT BOOST
• LogitBoost minimizes error using the logistic (log-likelihood) loss function, whereas Gradient Boosting can minimize any differentiable loss function (with the exponential loss it recovers AdaBoost).
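• A minimal gradient boosting sketch; the values shown (100 stages, learning_rate=0.1) are the library defaults, spelled out here for illustration:

from sklearn import model_selection
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_iris(return_X_y=True)
kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=7)

# Each stage fits a tree to the gradient of the loss left by the previous stages
model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=7)
results = model_selection.cross_val_score(model, X, y, cv=kfold)
print("Gradient boosting mean accuracy: %.3f" % results.mean())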
VOTING ENSEMBLE
• Combines the predictions from multiple machine learning algorithms.
• Predictions of the sub-models can be weighted, but specifying the weights for classifiers manually, or even heuristically, is difficult. More advanced methods can learn how to best weight the predictions from sub-models; this is called stacking (stacked aggregation) and was, at the time, not provided in scikit-learn.
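• A minimal (hard) voting sketch combining three different model types; the particular sub-models are an illustrative choice:

from sklearn import model_selection
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=7)

# Hard voting: each sub-model gets one equal vote on the predicted class
ensemble = VotingClassifier(estimators=[
    ('logistic', LogisticRegression(max_iter=1000)),
    ('cart', DecisionTreeClassifier()),
    ('svm', SVC()),
])
results = model_selection.cross_val_score(ensemble, X, y, cv=kfold)
print("Voting ensemble mean accuracy: %.3f" % results.mean())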
STACKING?
• Trains multiple different types of learners (as opposed to bagging/boosting, which reuse a single type of learner)
• Each learner uses a subset of data
• A "combiner" is trained on a validation segment
• Stacking uses a meta learner (as opposed to bagging/boosting which use voting schemes)
• Difficult to analyze theoretically ("black magic")
• Level-1 → meta learner
• Level-0 → base classifiers
• Can also be used for numeric prediction (regression)
• The best algorithms to use for base models are smooth, global learners
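• A minimal manual stacking sketch, assuming out-of-fold class probabilities from the level-0 classifiers are used as training features for a level-1 meta-learner (newer scikit-learn releases, 0.22+, also ship a StackingClassifier that automates this):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Level-0 base classifiers
level0 = [DecisionTreeClassifier(random_state=7), SVC(probability=True, random_state=7)]

# Out-of-fold predictions keep the meta-learner from seeing labels its inputs were fit on
meta_features = np.hstack([
    cross_val_predict(clf, X, y, cv=5, method='predict_proba') for clf in level0
])

# Level-1 meta-learner trained on the base models' predictions
meta_learner = LogisticRegression(max_iter=1000).fit(meta_features, y)
print("Meta-learner training accuracy: %.3f" % meta_learner.score(meta_features, y))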
THANK YOU
• REFERENCES
• https://machinelearningmastery.com/ensemble-machine-learning-algorithms-python-scikit-learn/
• http://scikit-learn.org/stable/auto_examples/tree/plot_iris.html#sphx-glr-auto-examples-tree-plot-iris-py
