Explain Bagging, Boosting and Voting

Published in:
Data & Analytics

- 1. AI HACKERS ENSEMBLE LEARNING
- 2. INTRODUCTION TO ENSEMBLE LEARNING
  - Definition: An ensemble consists of a set of individually trained classifiers (such as neural networks or decision trees) whose predictions are combined when classifying novel instances.
  - Source: http://jair.org/papers/paper614.html
- 3. ENSEMBLE MODELS
  - Combine model predictions into ensemble predictions. The three most popular methods for combining the predictions from different models are:
  - Bagging. Building multiple models (typically of the same type) from different subsamples of the training dataset.
  - Boosting. Building multiple models (typically of the same type), each of which learns to fix the prediction errors of the prior model in the chain.
  - Voting. Building multiple models (typically of differing types) and combining their predictions with simple statistics (such as the mean).
- 4. BAGGING
  - Performs best with algorithms that have high variance
  - Operates via equal weighting of models
  - Settles on a result by majority voting
  - Employs multiple instances of the same classifier on one dataset
  - Builds each model on a subsample of the training data drawn with replacement
  - Works best when the classifier is unstable (decision trees, for example), because that instability produces models that differ from one another and gives the majority vote varied results to draw on
  - Can hurt a stable model by introducing artificial variability, which may lead to inaccurate conclusions
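
The bagging recipe on this slide can be sketched by hand as follows. This is a minimal illustration rather than the deck's own code: the iris data, the 25 trees, the seed, and the names `trees`, `votes`, and `bagged_pred` are all assumptions, and each bootstrap sample is drawn at the full training-set size instead of a smaller one.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(7)   # assumed seed, for reproducibility only
n_models = 25                    # assumed ensemble size

trees = []
for _ in range(n_models):
    # Bootstrap sample: draw row indices with replacement.
    idx = rng.integers(0, len(X), size=len(X))
    trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# Every tree gets one equally weighted vote per instance.
votes = np.stack([tree.predict(X) for tree in trees])   # shape: (n_models, n_samples)

# Majority vote: the most frequent class label in each column wins.
bagged_pred = np.array([np.bincount(col, minlength=3).argmax() for col in votes.T])
train_accuracy = np.mean(bagged_pred == y)
```

Accuracy is measured on the training data here only to keep the sketch short; a held-out set or cross-validation gives a fairer estimate.
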
- 5. UNDERSTANDING IRIS DATASET
- 6. BAGGING – DECISION TREE
- 7. BAGGING – IN SCIKIT LEARN
  - model = BaggingClassifier(base_estimator=choice, n_estimators=X, random_state=seed)
  - base_estimator can be any classifier of our choice (scikit-learn 1.2 renamed this parameter to estimator)
  - n_estimators is the number of estimators to build
  - random_state takes a seed, so that results can be reproduced when comparing different models
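- 8. CROSS VALIDATION
  - kfold = model_selection.KFold(n_splits=n, shuffle=True, random_state=seed)
  - Recent scikit-learn versions raise an error if random_state is set without shuffle=True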
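
Putting the two previous slides together, a bagged decision tree can be scored with k-fold cross-validation roughly like this. The seed, the 10 folds, and the 100 estimators are assumed values, not taken from the deck. The base classifier is passed positionally so that the sketch does not depend on whether the installed scikit-learn calls the parameter `base_estimator` or `estimator`.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
seed = 7  # assumed value

# shuffle=True lets random_state control how rows are assigned to folds.
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)

# The first positional argument is the classifier to be bagged.
model = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=seed)

scores = cross_val_score(model, X, y, cv=kfold)   # one accuracy value per fold
mean_accuracy = scores.mean()
```
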
- 9. RANDOM FOREST
  - An extension of bagged decision trees
  - Samples of the training dataset are taken with replacement, but the trees are constructed in a way that reduces the correlation between individual classifiers
  - Rule of thumb: not all features are selected; only a random subset is considered when choosing a split
- 10. RANDOM FOREST V/S BAGGED FOREST
  - Bagged forest: all predictor variables are available to each tree
  - Random forest: only a random subset of predictor variables is considered (in the standard algorithm, a fresh subset at each split), which decorrelates the trees and can help avoid overfitting
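
One way to see the difference in scikit-learn is through the `max_features` parameter of `RandomForestClassifier`. With `max_features=None` every split may use all features, which amounts to bagged trees; with `"sqrt"` each split sees only a random subset. The dataset, seed, and variable names below are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# All features are candidates at every split: behaves like a bagged forest.
bagged_forest = RandomForestClassifier(n_estimators=100, max_features=None, random_state=7)

# Only sqrt(n_features) randomly chosen features are candidates at each split.
random_forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=7)

bagged_score = cross_val_score(bagged_forest, X, y, cv=5).mean()
random_score = cross_val_score(random_forest, X, y, cv=5).mean()
```

On a dataset as small as iris (four features) the two scores are usually close; the decorrelation effect matters more when there are many predictors.
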
- 11. EXTRA TREES
  - Similar to random forest
  - They differ in how splits are chosen: a random forest searches for the best split point on each candidate feature, whereas Extremely Randomized Trees (ET) draw split points at random
  - The next split is the best among uniformly random candidate splits on the selected variables
  - Impact: the paper below contains a bias-variance analysis; ET can be a bit worse when there is a high number of noisy features (in high-dimensional datasets)
  - Further reading: https://orbi.uliege.be/bitstream/2268/9357/1/geurts-mlj-advance.pdf
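
In scikit-learn, extremely randomized trees are available as `ExtraTreesClassifier`. The following is a small sketch on the iris data; the seed, the number of trees, and the variable names are assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Split thresholds are drawn at random for each candidate feature and the
# best of those random splits is kept. By default scikit-learn trains each
# tree on the whole training set (bootstrap=False) rather than on a bootstrap sample.
extra_trees = ExtraTreesClassifier(n_estimators=100, random_state=7)

extra_trees_score = cross_val_score(extra_trees, X, y, cv=5).mean()
```
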
- 12. BOOSTING
  - Instead of assigning equal weighting to models, boosting assigns varying weights to classifiers and derives its final result by weighted voting
  - The algorithm proceeds iteratively; new models are influenced by previous ones
  - New models become experts on instances classified incorrectly by earlier models
  - Can be used with learners that do not accept instance weights by resampling the training data, with sampling probabilities determined by the weights
  - Works well if classifiers are not too complex
  - Also works well with weak learners such as shallow decision trees
  - Adaptive Boosting (AdaBoost) is a popular boosting algorithm and was the first successful one
  - LogitBoost (derived from AdaBoost) is another; it uses additive logistic regression and handles multi-class problems
  - Gradient boosting is a more general boosting algorithm that can optimize any differentiable loss function
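
The reweighting idea can be sketched with a bare-bones binary AdaBoost built from decision stumps. This is an illustrative simplification, not a production implementation: it uses only two of the iris classes, recodes the labels as -1/+1, and the round count and all variable names are assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
keep = y != 0                                  # versicolor vs. virginica only
X, y = X[keep], np.where(y[keep] == 1, -1, 1)  # labels recoded to -1 / +1

n_rounds = 20                                  # assumed number of boosting rounds
weights = np.full(len(y), 1.0 / len(y))        # start with uniform instance weights
stumps, alphas = [], []

for _ in range(n_rounds):
    # Weak learner: a depth-1 tree trained on the currently weighted data.
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(X, y, sample_weight=weights)
    pred = stump.predict(X)

    # Weighted error, clipped so the logarithm below stays finite.
    err = np.clip(weights[pred != y].sum(), 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)      # this model's voting weight

    # Misclassified instances gain weight, correctly classified ones lose it.
    weights *= np.exp(-alpha * y * pred)
    weights /= weights.sum()

    stumps.append(stump)
    alphas.append(alpha)

# Final prediction: sign of the weighted vote over all stumps.
vote = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
ensemble_pred = np.where(vote >= 0, 1, -1)
train_accuracy = np.mean(ensemble_pred == y)
```

For real use, scikit-learn's `AdaBoostClassifier` and `GradientBoostingClassifier` cover the same ground with multi-class support.
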
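- 13. LOGIT BOOST V/S GRADIENT BOOST
  - AdaBoost can be viewed as minimizing an exponential loss function, whereas LogitBoost minimizes the logistic loss (the loss behind logistic regression)
  - Gradient boosting generalizes both: each new model is fit to the negative gradient of a chosen differentiable loss function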
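
To make the two loss functions concrete, the sketch below evaluates both on a few margins, where the margin is the label (-1 or +1) times the model's raw score. The function names and sample margins are assumptions chosen for illustration.

```python
import numpy as np

def exponential_loss(margin):
    """Exponential loss exp(-margin), the loss associated with AdaBoost."""
    return np.exp(-margin)

def logistic_loss(margin):
    """Logistic loss log(1 + exp(-margin)), the loss associated with LogitBoost."""
    return np.log1p(np.exp(-margin))

# Negative margins are misclassified instances, positive margins are correct ones.
margins = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
exp_values = exponential_loss(margins)
log_values = logistic_loss(margins)
```

The exponential loss grows much faster for badly misclassified instances, which is one reason the logistic loss tends to be less sensitive to outliers and label noise.
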
- 14. VOTING ENSEMBLE
  - Combines the predictions from multiple machine learning algorithms
  - Predictions of the sub-models can be weighted, but specifying the weights for classifiers manually or even heuristically is difficult
  - More advanced methods can learn how best to weight the predictions from sub-models; this is called stacking (stacked generalization). scikit-learn provides it from version 0.22 onward (StackingClassifier and StackingRegressor); earlier versions do not
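
A voting ensemble over models of differing types might look like this in scikit-learn. The three base models, the equal weights, and the seed are assumptions picked for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

voting = VotingClassifier(
    estimators=[
        ("logistic", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(random_state=7)),
        ("svm", SVC(random_state=7)),
    ],
    voting="hard",       # majority vote on predicted class labels
    weights=[1, 1, 1],   # manually chosen weights; equal here
)

voting_score = cross_val_score(voting, X, y, cv=5).mean()
```

Setting `voting="soft"` averages predicted class probabilities instead, which requires every base model to implement `predict_proba` (for `SVC` that means `probability=True`).
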
- 15. STACKING?
  - Trains multiple kinds of learners (as opposed to bagging/boosting, which train many instances of a single kind of learner)
  - Each learner uses a subset of data
  - A "combiner" is trained on a validation segment
  - Stacking uses a meta learner (as opposed to bagging/boosting, which use voting schemes)
  - Difficult to analyze theoretically ("black magic")
  - Level-0 → base classifiers
  - Level-1 → meta learner
  - Can also be used for numeric prediction (regression)
  - The best algorithms to use for base models are smooth, global learners
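
A stacked ensemble along these lines can be sketched with scikit-learn's `StackingClassifier`, which assumes version 0.22 or later. The choice of base classifiers, the logistic-regression meta learner, and the seed are assumptions. The `cv` argument plays the role of the validation segment: the meta learner is trained on out-of-fold predictions from the base classifiers.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

stack = StackingClassifier(
    # Level-0: base classifiers of differing types.
    estimators=[
        ("tree", DecisionTreeClassifier(random_state=7)),
        ("knn", KNeighborsClassifier()),
    ],
    # Level-1: meta learner that combines the base predictions.
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,  # base predictions for the meta learner come from 5-fold cross-validation
)

stack_score = cross_val_score(stack, X, y, cv=5).mean()
```
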
- 16. THANK YOU
  - REFERENCES
  - https://machinelearningmastery.com/ensemble-machine-learning-algorithms-python-scikit-learn/
  - http://scikit-learn.org/stable/auto_examples/tree/plot_iris.html#sphx-glr-auto-examples-tree-plot-iris-py
