Ensemble learning Techniques

Explain Bagging, Boosting and Voting

Published in: Data & Analytics

1. AI HACKERS ENSEMBLE LEARNING
2. INTRODUCTION TO ENSEMBLE LEARNING
Definition
• An ensemble consists of a set of individually trained classifiers (such as neural networks or decision trees) whose predictions are combined when classifying novel instances.
Source: http://jair.org/papers/paper614.html
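As a minimal sketch of the "combined" step in this definition, assuming each trained classifier has already produced one label per novel instance, a plain majority vote can be written as below. The function name `majority_vote` and the sample labels are illustrative only, not part of any library.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model label lists into one ensemble label per instance.

    predictions_per_model: list of equal-length lists, one list per classifier.
    Ties go to the label encountered first (Counter.most_common ordering).
    """
    n_instances = len(predictions_per_model[0])
    combined = []
    for i in range(n_instances):
        # Collect the i-th prediction from every classifier and take the mode.
        votes = Counter(model_preds[i] for model_preds in predictions_per_model)
        combined.append(votes.most_common(1)[0][0])
    return combined

# Three hypothetical classifiers labelling four novel instances.
preds = [
    ["setosa", "versicolor", "virginica", "versicolor"],
    ["setosa", "virginica",  "virginica", "versicolor"],
    ["setosa", "versicolor", "setosa",    "virginica"],
]
print(majority_vote(preds))  # → ['setosa', 'versicolor', 'virginica', 'versicolor']
```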
3. ENSEMBLE MODELS
Combine Model Predictions Into Ensemble Predictions
The three most popular methods for combining the predictions from different models are:
• Bagging. Building multiple models (typically of the same type) from different subsamples of the training dataset.
• Boosting. Building multiple models (typically of the same type), each of which learns to fix the prediction errors of the previous model in the chain.
• Voting. Building multiple models (typically of differing types) and combining their predictions with simple statistics (like calculating the mean).
4. BAGGING
• Performs best with algorithms that have high variance
• Weights all models equally
• Settles on a result by majority voting (or by averaging, for regression)
• Trains multiple instances of the same classifier on one dataset
• Builds each model on a bootstrap sample: a sample of the training set drawn with replacement, typically of the same size as the original
• Works best when the classifier is unstable (decision trees, for example): small changes in the training data produce noticeably different models, and the majority vote is drawn from that disagreement
• Can hurt a stable model: resampling introduces artificial variability, and the ensemble may draw less accurate conclusions from it
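The mechanics listed above (bootstrap samples drawn with replacement, one model per sample, an equal-weight majority vote) can be sketched in plain Python. This is an illustrative sketch, not scikit-learn's implementation: the base learner is a hypothetical one-feature decision stump (`fit_stump`), chosen only to keep the example short, and every function name here is made up for the example.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Hypothetical base learner: the single threshold on a 1-D feature that
    makes the fewest training errors. Returns (threshold, left_label, right_label)."""
    labels = sorted(set(ys))
    best = None
    for t in sorted(set(xs)):
        for left in labels:
            for right in labels:
                errors = sum((left if x <= t else right) != y for x, y in zip(xs, ys))
                if best is None or errors < best[0]:
                    best = (errors, t, left, right)
    return best[1:]

def predict_stump(stump, x):
    t, left, right = stump
    return left if x <= t else right

def bagging_fit(xs, ys, n_estimators=25, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        # Bootstrap sample: n draws WITH replacement (same size as the data,
        # so some instances repeat and others are left out).
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def bagging_predict(models, x):
    # Equal weighting: every model gets exactly one vote; the majority wins.
    votes = Counter(predict_stump(m, x) for m in models)
    return votes.most_common(1)[0][0]

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
models = bagging_fit(xs, ys)
print([bagging_predict(models, x) for x in (0, 9)])  # → [0, 1]
```

A stump is a deliberately weak, fairly stable learner; swapping in a deeper, less stable tree is where bagging usually pays off, as the slide notes.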
5. UNDERSTANDING THE IRIS DATASET
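The slide's figures did not survive extraction. Assuming scikit-learn is installed, one way to inspect the dataset is the bundled `load_iris` loader, which returns the classic 150-flower iris data.

```python
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)             # (150, 4): 150 flowers, 4 measurements each
print(iris.feature_names)  # sepal length/width and petal length/width, in cm
print(iris.target_names)   # the three species used as class labels
```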
6. BAGGING – DECISION TREE
7. BAGGING – IN SCIKIT-LEARN
• model = BaggingClassifier(estimator=choice, n_estimators=X, random_state=seed)
• estimator: the base classifier of our choice (called base_estimator before scikit-learn 1.2; the old name was removed in 1.4)
• n_estimators: the number of base estimators to build
• random_state: the seed, so that results can be reproduced across runs and compared fairly between models
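8. CROSS VALIDATION
• kfold = model_selection.KFold(n_splits=n, shuffle=True, random_state=seed)
• random_state only takes effect when shuffle=True; current scikit-learn versions raise a ValueError if it is set without shuffling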
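Putting slides 6 to 8 together, a bagged-decision-tree run on the iris data might look like the sketch below, assuming a recent scikit-learn. The values `n_estimators=100`, `n_splits=10` and `seed = 7` are arbitrary choices for illustration, not values from the slides. Passing the tree positionally sidesteps the `base_estimator`/`estimator` rename across versions.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
seed = 7

# shuffle=True is needed for random_state to have any effect.
# (iris is stored sorted by class, so unshuffled folds would be badly skewed.)
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)

# First positional argument is the base estimator in every scikit-learn version.
model = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=seed)

scores = cross_val_score(model, X, y, cv=kfold)
print(f"Bagged trees: {scores.mean():.3f} (+/- {scores.std():.3f})")
```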
9. RANDOM FOREST
• An extension of bagged decision trees
• Samples of the training dataset are taken with replacement, but the trees are constructed in a way that reduces the correlation between individual classifiers
• Rule of thumb: not all features are considered; each split chooses from a random subset of the features
10. RANDOM FOREST V/S BAGGED FOREST
• Bagged forest: all predictor variables are candidates at every split of every tree
• Random forest: only a random subset of predictor variables is considered at each split, which decorrelates the trees and can help avoid overfitting
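In scikit-learn, `RandomForestClassifier` with `max_features=None` lets every split consider all features, which essentially gives bagged trees, while `max_features="sqrt"` restricts each split to a random subset. A sketch with arbitrary hyperparameters; on a 4-feature dataset like iris, don't expect a large gap between the two scores.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)
kfold = KFold(n_splits=10, shuffle=True, random_state=7)

models = {
    # Every split may consider all 4 features: essentially bagged trees.
    "bagged trees": RandomForestClassifier(
        n_estimators=100, max_features=None, random_state=7),
    # Every split considers a random subset of about sqrt(4) = 2 features.
    "random forest": RandomForestClassifier(
        n_estimators=100, max_features="sqrt", random_state=7),
}

results = {}
for name, model in models.items():
    results[name] = cross_val_score(model, X, y, cv=kfold).mean()
    print(f"{name}: {results[name]:.3f}")
```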
11. EXTRA TREES
• Similar to random forest
• Differs in how splits are chosen: a random forest searches for the best threshold on each sampled feature (deterministic given the sample), whereas Extremely Randomized Trees draw the thresholds at random
• The next split is the best split among random uniform splits in the variables selected for the current node
• IMPACT: the paper contains a bias-variance analysis; ET is a bit worse when there is a high number of noisy features (in high-dimensional datasets)
Further reading: https://orbi.uliege.be/bitstream/2268/9357/1/geurts-mlj-advance.pdf
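To make the difference in split selection concrete, the sketch below contrasts the two rules at a single node in plain Python. It assumes the candidate features have already been drawn and uses Gini impurity as the split score (an assumption; the slide names no criterion). `best_split` and `extra_trees_split` are hypothetical helpers, not library functions.

```python
import random

def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def split_impurity(xs, ys, t):
    """Size-weighted Gini impurity of the two children produced by x <= t."""
    left = [y for x, y in zip(xs, ys) if x <= t]
    right = [y for x, y in zip(xs, ys) if x > t]
    n = len(ys)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_split(columns, ys):
    """Random-forest style: try every observed threshold on each candidate
    feature and return the lowest-impurity (impurity, feature, threshold)."""
    return min((split_impurity(xs, ys, t), j, t)
               for j, xs in enumerate(columns)
               for t in sorted(set(xs)))

def extra_trees_split(columns, ys, rng):
    """Extra-Trees style: draw ONE threshold uniformly at random per candidate
    feature, then keep the best of those random splits."""
    candidates = []
    for j, xs in enumerate(columns):
        t = rng.uniform(min(xs), max(xs))
        candidates.append((split_impurity(xs, ys, t), j, t))
    return min(candidates)

# Two candidate features (columns) for six instances.
columns = [[1, 2, 3, 4, 5, 6], [3, 1, 4, 1, 5, 9]]
ys = [0, 0, 0, 1, 1, 1]
print(best_split(columns, ys))                          # → (0.0, 0, 3)
print(extra_trees_split(columns, ys, random.Random(0)))
```

The random rule skips the threshold search, which makes each tree cheaper to grow and more varied, at the cost of splits that are individually no better (and usually worse) than the exhaustive ones.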
12. BOOSTING
• Instead of assigning equal weight to every model, boosting assigns varying weights to the classifiers and derives its final result by weighted voting
• The algorithm proceeds iteratively; each new model is influenced by the previous ones
• New models become experts on the instances that earlier models classified incorrectly
• Can be used without instance weights by resampling the training set, with each instance's selection probability determined by its weight
• Works well when the classifiers are not too complex, i.e. with weak learners such as shallow decision trees
• AdaBoost (Adaptive Boosting) is a popular boosting algorithm and was the first successful one
• LogitBoost (derived from AdaBoost) is another; it uses additive logistic regression and handles multi-class problems
• Gradient boosting is the most general of these: it can minimize any differentiable loss function
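The reweighting loop described above can be sketched as a minimal AdaBoost for binary labels in {-1, +1}, using one-feature decision stumps as the weak learners. This is an illustrative sketch under those assumptions (reweighting rather than resampling, all function names hypothetical), not a reference implementation.

```python
import math

def fit_weighted_stump(xs, ys, w):
    """Weak learner: 1-D threshold stump minimising the WEIGHTED error.
    Predicts `polarity` for x <= t and -polarity otherwise (labels are +1/-1)."""
    best = None
    for t in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(wi for x, y, wi in zip(xs, ys, w)
                      if (polarity if x <= t else -polarity) != y)
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best

def stump_predict(t, polarity, x):
    return polarity if x <= t else -polarity

def adaboost(xs, ys, n_rounds):
    n = len(xs)
    w = [1.0 / n] * n                            # start with equal instance weights
    ensemble = []                                # (alpha, threshold, polarity) triples
    for _ in range(n_rounds):
        err, t, pol = fit_weighted_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0) and division by 0
        alpha = 0.5 * math.log((1 - err) / err)  # this model's weight in the final vote
        ensemble.append((alpha, t, pol))
        # Misclassified points (y * prediction = -1) are multiplied by e^alpha,
        # correct ones by e^-alpha, so the next stump focuses on the mistakes.
        w = [wi * math.exp(-alpha * y * stump_predict(t, pol, x))
             for x, y, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, x):
    # Weighted vote: sign of the alpha-weighted sum of stump predictions.
    score = sum(alpha * stump_predict(t, pol, x) for alpha, t, pol in ensemble)
    return 1 if score >= 0 else -1

xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]    # the -1 class sits in the middle: no single stump fits
model = adaboost(xs, ys, n_rounds=3)
print([adaboost_predict(model, x) for x in xs])  # → [1, 1, -1, -1, 1, 1]
```

On this toy data the best single stump still misclassifies two of the six points, but three boosting rounds produce a weighted vote that labels every training point correctly.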
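13. LOGITBOOST V/S GRADIENT BOOSTING
• LogitBoost minimizes the logistic loss (the binomial log-likelihood of additive logistic regression); the exponential loss is the one minimized by AdaBoost
• Gradient boosting is not tied to one loss: each new model is fit to the negative gradient of a chosen differentiable loss (scikit-learn's GradientBoostingClassifier defaults to the logistic loss and also offers the exponential loss, which recovers AdaBoost)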
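The losses involved can be written down for labels y ∈ {-1, +1} and an additive model F built from weak learners h_m. The notation (α_m for model weights, ν for a learning rate, r_im for the pseudo-residuals) is introduced here for illustration and follows the usual additive-model presentation rather than anything on the slides.

```latex
\begin{aligned}
F_M(x) &= \sum_{m=1}^{M} \alpha_m h_m(x), \qquad y \in \{-1, +1\} \\
L_{\text{AdaBoost}}(y, F) &= e^{-yF(x)} \\
L_{\text{LogitBoost}}(y, F) &= \log\!\left(1 + e^{-2yF(x)}\right),
  \qquad p(x) = \frac{1}{1 + e^{-2F(x)}} \\
r_{im} &= -\left[\frac{\partial L\big(y_i, F(x_i)\big)}{\partial F(x_i)}\right]_{F = F_{m-1}},
  \qquad F_m(x) = F_{m-1}(x) + \nu\, h_m(x)
\end{aligned}
```

In gradient boosting, h_m is fit to the pseudo-residuals r_im, so plugging in either of the first two losses gives an AdaBoost-like or a LogitBoost-like procedure; here the step ν plays the role of α_m.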
14. VOTING ENSEMBLE
• Combines the predictions from multiple machine learning algorithms
• Predictions of the sub-models can be weighted, but specifying the weights manually or even heuristically is difficult
• More advanced methods can learn how best to weight the predictions from sub-models; this is called stacking (stacked aggregation), which scikit-learn provides as StackingClassifier and StackingRegressor since version 0.22
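A voting ensemble of differently typed sub-models might be set up with scikit-learn's `VotingClassifier` as sketched below. The choice of sub-models, the `weights=[1, 1, 2]` setting and the fold count are arbitrary assumptions for illustration; the weights show the kind of manual weighting the slide calls hard to choose well.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
kfold = KFold(n_splits=10, shuffle=True, random_state=7)

# Sub-models of differing types, each with a name.
estimators = [
    ("logistic", LogisticRegression(max_iter=1000)),
    ("cart", DecisionTreeClassifier(random_state=7)),
    ("svm", SVC()),
]

# voting="hard": each sub-model casts one vote and the majority label wins.
ensemble = VotingClassifier(estimators, voting="hard")
scores = cross_val_score(ensemble, X, y, cv=kfold)
print(f"Equal weights: {scores.mean():.3f}")

# Manually chosen weights: here the SVM's vote counts double (arbitrary choice).
weighted = VotingClassifier(estimators, voting="hard", weights=[1, 1, 2])
weighted_scores = cross_val_score(weighted, X, y, cv=kfold)
print(f"Weighted:      {weighted_scores.mean():.3f}")
```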
15. STACKING?
• Combines learners built by different algorithms (as opposed to bagging/boosting, which usually reuse a single base algorithm)
• Each base learner is trained on a subset of the data
• A "combiner" is trained on a validation segment, i.e. on base-learner predictions for data those learners did not see
• Stacking uses a meta learner (as opposed to bagging/boosting, which use voting schemes)
• Difficult to analyze theoretically ("black magic")
• Level-1 → meta learner
• Level-0 → base classifiers
• Can also be used for numeric prediction (regression)
• The meta learner works best as a relatively smooth, global model: the base learners already do most of the work, and a simple combiner is less likely to overfit
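The level-0/level-1 structure maps onto scikit-learn's `StackingClassifier`, where `final_estimator` is the meta learner and `cv` controls how the base models' out-of-fold predictions are produced to train it. A sketch assuming scikit-learn 0.22 or later; the particular base and meta learners are illustrative choices, not taken from the slides.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Level-0: base classifiers of different types.
level0 = [
    ("cart", DecisionTreeClassifier(random_state=7)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
]

# Level-1: a simple, smooth meta learner, trained on the base models'
# out-of-fold predictions (5 internal folds), not on their training-set fits.
stack = StackingClassifier(
    estimators=level0,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

kfold = KFold(n_splits=10, shuffle=True, random_state=7)
scores = cross_val_score(stack, X, y, cv=kfold)
print(f"Stacked ensemble: {scores.mean():.3f}")
```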
16. THANK YOU
REFERENCES
• https://machinelearningmastery.com/ensemble-machine-learning-algorithms-python-scikit-learn/
• http://scikit-learn.org/stable/auto_examples/tree/plot_iris.html#sphx-glr-auto-examples-tree-plot-iris-py