This document discusses ensemble machine learning methods. It introduces classifier ensembles and describes three common ensemble methods: bagging, boosting, and random forests. For each method, it explains the basic idea, how the method works, and its advantages and disadvantages. Bagging constructs multiple classifiers from bootstrap samples of the training data and aggregates their predictions through voting. Boosting builds classifiers sequentially by focusing on misclassified examples. Random forests create decision trees from random subsets of features and samples. Ensembles can improve performance over single classifiers by reducing variance.
What is an "ensemble learner"? How can we combine different base learners into an ensemble in order to improve the overall classification performance? In this lecture, we provide some answers to these questions.
Ensemble Learning is a technique that creates multiple models and then combines them to produce improved results.
Ensemble learning usually produces more accurate solutions than a single model would.
3. Outline
What is the general idea?
Which ensemble methods?
Bagging
Boosting
Random Forests
Prof. Pier Luca Lanzi
4. Ensemble Methods
Construct a set of classifiers from the training data.
Predict the class label of previously unseen records by aggregating the predictions made by multiple classifiers.
5. What is the General Idea?
6. Building Model Ensembles
Basic idea: build different "experts" and let them vote.
Advantage: often improves predictive performance.
Disadvantage: usually produces output that is very hard to analyze. However, there are approaches that aim to produce a single comprehensible structure.
7. Why Does It Work?
Suppose there are 25 base classifiers.
Each classifier has an error rate ε = 0.35.
Assume the classifiers are independent.
The probability that the ensemble classifier makes a wrong prediction (i.e., that a majority of at least 13 of the 25 classifiers is wrong) is:
∑_{i=13}^{25} C(25, i) ε^i (1 − ε)^(25−i) = 0.06
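The figure on the slide can be checked numerically; a minimal sketch using only Python's standard library:

```python
from math import comb

n, eps = 25, 0.35

# the ensemble errs when a majority (at least 13 of 25) of classifiers err
p_wrong = sum(comb(n, i) * eps**i * (1 - eps)**(n - i) for i in range(13, n + 1))
print(round(p_wrong, 2))  # the slide's quoted value is 0.06
```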
8. How to Generate an Ensemble?
Bootstrap Aggregating (Bagging)
Boosting
Random Forests
9. What is Bagging? (Bootstrap Aggregation)
Analogy: diagnosis based on multiple doctors' majority vote.
Training: given a set D of d tuples, at each iteration i a training set Di of d tuples is sampled with replacement from D (i.e., a bootstrap sample), and a classifier model Mi is learned from each training set Di.
Classification: to classify an unknown sample X, each classifier Mi returns its class prediction; the bagged classifier M* counts the votes and assigns the class with the most votes to X.
Prediction: bagging can also be applied to the prediction of continuous values by taking the average of the individual predictions for a given test tuple.
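As a side note on the bootstrap itself: a sample of d tuples drawn with replacement from D contains, on average, about 1 − 1/e ≈ 63.2% of the distinct tuples in D. A minimal sketch (the dataset size, seed, and trial count are arbitrary choices for this demo):

```python
import random

rng = random.Random(42)
d = 10_000   # number of tuples in D (arbitrary for the demo)
trials = 20

# average fraction of distinct tuples appearing in a bootstrap sample of size d
frac = sum(
    len({rng.randrange(d) for _ in range(d)}) / d
    for _ in range(trials)
) / trials
print(round(frac, 3))  # close to 1 - 1/e ≈ 0.632
```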
10. What is Bagging?
Combining predictions by voting/averaging: the simplest way; each model receives equal weight.
"Idealized" version:
Sample several training sets of size n (instead of just having one training set of size n)
Build a classifier for each training set
Combine the classifiers' predictions
11. More on Bagging
Bagging works because it reduces variance by voting/averaging. Note: in some pathological hypothetical situations the overall error might increase.
Usually, the more classifiers the better.
Problem: we only have one dataset!
Solution: generate new ones of size n by bootstrap, i.e., sampling from it with replacement.
Can help a lot if the data is noisy.
Can also be applied to numeric prediction.
Aside: the bias-variance decomposition was originally only known for numeric prediction.
12. Bagging Classifiers
Model generation:
Let n be the number of instances in the training data.
For each of t iterations:
Sample n instances from the training set (with replacement)
Apply the learning algorithm to the sample
Store the resulting model
Classification:
For each of the t models, predict the class of the instance using that model.
Return the class that is predicted most often.
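The pseudocode above can be sketched end to end in plain Python. This is a minimal illustration, not a production implementation: the base learner is a hypothetical one-dimensional decision stump, and the toy dataset is made up for the demo:

```python
import random
from collections import Counter

def train_stump(X, y):
    """Learn a 1-D decision stump: predict `sign` if x >= threshold, else -sign."""
    best = None
    for t in sorted(set(X)):
        for sign in (1, -1):
            err = sum((sign if x >= t else -sign) != yi for x, yi in zip(X, y))
            if best is None or err < best[0]:
                best = (err, t, sign)
    _, t, sign = best
    return lambda x: sign if x >= t else -sign

def bagged_classifier(X, y, t_iters=25, seed=0):
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(t_iters):
        # sample n instances with replacement (bootstrap) and store the model
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))
    # classify by returning the class predicted most often across the t models
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]

X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [-1, -1, -1, 1, 1, 1]
predict = bagged_classifier(X, y)
print(predict(1.5), predict(5.5))  # -1 1
```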
13. Bias-Variance Decomposition
Used to analyze how much the selection of any specific training set affects performance.
Assume infinitely many classifiers, built from different training sets of size n.
For any learning scheme:
Bias = expected error of the combined classifier on new data
Variance = expected error due to the particular training set used
Total expected error ≈ bias + variance
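For squared error the decomposition can be written out explicitly, using the φ(x,T) notation of the later slides and, for simplicity, assuming noise-free labels:

```latex
\underbrace{\mathbb{E}_T\big[(y - \varphi(x,T))^2\big]}_{\text{total expected error}}
= \underbrace{\big(y - \mathbb{E}_T[\varphi(x,T)]\big)^2}_{\text{bias: error of the combined classifier}}
+ \underbrace{\mathbb{E}_T\big[(\varphi(x,T) - \mathbb{E}_T[\varphi(x,T)])^2\big]}_{\text{variance: error due to the particular training set}}
```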
14. When does Bagging work? 14
Learning algorithm is unstable, if small changes to the
training set cause large changes in the learned classifier
If the learning algorithm is unstable, then
Bagging almost always improves performance
Bagging stable classifiers is not a good idea
Which ones are unstable?
Neural nets, decision trees, regression trees, linear
regression
Which ones are stable?
K-nearest neighbors
15. Why does Bagging work?
Let T = {(xn, yn)} be the training set containing N examples
Let {Tk} be a sequence of training sets, each containing N examples
independently sampled from T; for instance, each Tk can be generated
using bootstrap
Let P be the underlying distribution of T
Bagging replaces the prediction φ(x,T) of the model built from the single
training set with the majority (or average) of the predictions given by the models {φ(x,Tk)}
φA(x,P) = ET(φ(x,Tk))
The algorithm is unstable if perturbing the training set can cause
significant changes in the predictor constructed
When the algorithm is unstable, Bagging can improve the accuracy of the predictor
16. Why does Bagging work?
It is possible to prove that
(y − ET[φ(x,T)])² ≤ ET[(y − φ(x,T))²]
Thus, the aggregated predictor has a smaller (mean-squared) error
How much smaller the error is depends on how unequal the two sides of
[ET φ(x,T)]² ≤ ET[φ²(x,T)]
are
If the algorithm is stable, the two sides will be nearly equal
The more highly variable the φ(x,T) are, the more improvement
aggregation produces
However, in mean-squared error φA never does worse than φ
17. Bagging with costs
Bagging unpruned decision trees is known to produce good
probability estimates
Where, instead of voting, the individual classifiers'
probability estimates are averaged
Note: this can also improve the success rate
Can use this with minimum-expected cost approach
for learning problems with costs
Problem: not interpretable
MetaCost re-labels training data using bagging with costs
and then builds single tree
18. Randomization
Can randomize learning algorithm instead of input
Some algorithms already have a random component:
e.g. initial weights in neural net
Most algorithms can be randomized, e.g. greedy algorithms:
Pick one of the N best options at random instead of always
picking the best option
E.g.: attribute selection in decision trees
More generally applicable than bagging:
e.g. random subsets in nearest-neighbor scheme
Can be combined with bagging
19. What is Boosting?
Analogy: Consult several doctors, based on a combination of
weighted diagnoses—weight assigned based on the previous
diagnosis accuracy
How does boosting work?
Weights are assigned to each training tuple
A series of k classifiers is iteratively learned
After a classifier Mi is learned, the weights are updated to allow
the subsequent classifier, Mi+1, to pay more attention to the
training tuples that were misclassified by Mi
The final M* combines the votes of each individual classifier,
where the weight of each classifier's vote is a function of its
accuracy
The boosting algorithm can be extended for the prediction of
continuous values
Comparing with bagging: boosting tends to achieve greater
accuracy, but it also risks overfitting the model to misclassified data
20. What is the Basic Idea?
Suppose there are just 5 training examples {1,2,3,4,5}
Initially each example has a 0.2 (1/5) probability of being sampled
1st round of boosting samples (with replacement) 5 examples:
{2, 4, 4, 3, 2} and builds a classifier from them
Suppose examples 2, 3, 5 are correctly predicted by this classifier,
and examples 1, 4 are wrongly predicted:
Weight of examples 1 and 4 is increased,
Weight of examples 2, 3, 5 is decreased
2nd round of boosting samples again 5 examples, but now
examples 1 and 4 are more likely to be sampled
And so on … until some convergence is achieved
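The round above can be traced numerically. This sketch uses the AdaBoost.M1 update (multiply the weights of correctly classified examples by e/(1−e), then renormalize), which is one concrete way to realize the increase/decrease described on the slide:

```python
import numpy as np

# Five training examples; initially each has probability 1/5 of being sampled.
weights = np.full(5, 0.2)

# Round 1: suppose the classifier built from {2, 4, 4, 3, 2} predicts
# examples 2, 3, 5 correctly and examples 1, 4 wrongly (1-based indexing).
correct = np.array([False, True, True, False, True])  # examples 1..5

error = weights[~correct].sum()   # weighted error e = 0.2 + 0.2 = 0.4
beta = error / (1 - error)        # e / (1 - e) = 2/3
weights[correct] *= beta          # shrink the weights of correct examples
weights /= weights.sum()          # renormalize into a sampling distribution

# Misclassified examples 1 and 4 are now more likely to be sampled in round 2.
print(weights)  # examples 1 and 4 -> 0.25; examples 2, 3, 5 -> ~0.167
```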
21. Boosting
Also uses voting/averaging
Weights models according to performance
Iterative: new models are influenced
by the performance of previously built ones
Encourages the new model to become an “expert” for instances
misclassified by earlier models
Intuitive justification: models should be experts that
complement each other
Several variants
Boosting by sampling,
the weights are used to sample the data for training
Boosting by weighting,
the weights are used by the learning algorithm
22. AdaBoost.M1
Model generation
Assign equal weight to each training instance
For t iterations:
Apply learning algorithm to weighted dataset,
store resulting model
Compute model’s error e on weighted dataset
If e = 0 or e ≥ 0.5:
Terminate model generation
For each instance in dataset:
If classified correctly by model:
Multiply instance’s weight by e/(1-e)
Normalize weight of all instances
Classification
Assign weight = 0 to all classes
For each of the t (or fewer) models:
For the class this model predicts,
add -log(e/(1-e)) to this class’s weight
Return class with highest weight
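The pseudocode above can be sketched in Python. Decision stumps as the base learner, the toy data, and the handling of e = 0 (a perfect model is kept with a tiny placeholder error so its vote weight stays finite) are my own illustrative choices:

```python
import math
import numpy as np

def fit_weighted_stump(X, y, w):
    """Base learner: one-level decision tree minimizing weighted error."""
    best, best_err = None, np.inf
    for f in range(X.shape[1]):
        for thr in np.unique(X[:, f]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, f] - thr) > 0, 1, -1)
                err = w[pred != y].sum()
                if err < best_err:
                    best_err, best = err, (f, thr, pol)
    return best, best_err

def predict_stump(stump, X):
    f, thr, pol = stump
    return np.where(pol * (X[:, f] - thr) > 0, 1, -1)

def adaboost_m1(X, y, t=10):
    """Model generation, following the slide's pseudocode."""
    n = len(X)
    w = np.full(n, 1.0 / n)               # equal initial weights
    models = []
    for _ in range(t):
        stump, e = fit_weighted_stump(X, y, w)
        if e >= 0.5:                      # too weak: terminate generation
            break
        if e == 0:                        # perfect: keep it with a tiny e
            models.append((stump, 1e-10))
            break
        models.append((stump, e))
        correct = predict_stump(stump, X) == y
        w[correct] *= e / (1 - e)         # multiply weight by e/(1-e)
        w /= w.sum()                      # normalize the weights
    return models

def adaboost_predict(models, X):
    """Classification: each model votes with weight -log(e / (1 - e))."""
    score = np.zeros(len(X))
    for stump, e in models:
        score += -math.log(e / (1 - e)) * predict_stump(stump, X)
    return np.where(score >= 0, 1, -1)

# Toy problem: a diagonal boundary that no single stump can match exactly.
rng = np.random.default_rng(2)
X = rng.random((200, 2))
y = np.where(X[:, 0] + X[:, 1] > 1, 1, -1)
models = adaboost_m1(X, y, t=10)
acc = np.mean(adaboost_predict(models, X) == y)
```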
23. Example: AdaBoost
Base classifiers: C1, C2, …, CT
Error rate:
εi = (1/N) Σj=1..N wj δ(Ci(xj) ≠ yj)
Importance of a classifier:
αi = (1/2) ln((1 − εi)/εi)
24. Example: AdaBoost
Weight update:
wi(j+1) = (wi(j)/Zj) × exp(−αj) if Cj(xi) = yi
wi(j+1) = (wi(j)/Zj) × exp(αj) if Cj(xi) ≠ yi
where Zj is the normalization factor
If any intermediate round produces an error rate higher than
50%, the weights are reverted back to 1/n and the
resampling procedure is repeated
Classification:
C*(x) = argmaxy Σj=1..T αj δ(Cj(x) = y)
25. Illustrating AdaBoost
[Figure: the data points for training, with equal initial weights assigned to each point]
27. AdaBoost (Freund and Schapire, 1997)
Given a set of d class-labeled tuples, (X1, y1), …, (Xd, yd)
Initially, all the weights of tuples are set the same (1/d)
Generate k classifiers in k rounds. At round i,
Tuples from D are sampled (with replacement) to form a
training set Di of the same size
Each tuple’s chance of being selected is based on its weight
A classification model Mi is derived from Di
Its error rate is calculated using Di as a test set
If a tuple is misclassified, its weight is increased, o.w. it is
decreased
Error rate: err(Xj) is the misclassification error of tuple Xj
(1 if misclassified, 0 otherwise)
Classifier Mi’s error rate is the sum of the weights of the
misclassified tuples:
error(Mi) = Σj=1..d wj × err(Xj)
The weight of classifier Mi’s vote is
log((1 − error(Mi)) / error(Mi))
28. What is a Random Forest?
Random forests (RF) are a combination of tree predictors
Each tree depends on the values of a random vector sampled
independently and with the same distribution for all trees in
the forest
The generalization error of a forest of tree classifiers depends
on the strength of the individual trees in the forest and the
correlation between them
Using a random selection of features to split each node yields
error rates that compare favorably to AdaBoost, and are
more robust with respect to noise
29. How do Random Forests Work?
D = training set; F = set of tests
k = number of trees in the forest; n = number of tests considered per node
for i = 1 to k do:
build data set Di by sampling with replacement from D
learn tree Ti from Di (the original formulation uses the TILDE tree learner):
at each node:
choose the best split from a random subset of F of size n
allow aggregates and refinement of aggregates in tests
make predictions according to the majority vote of the set of k trees
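A compact version of this procedure is sketched below. The tree learner (small depth-limited trees in place of TILDE), the parameter values, and the toy data are my own simplifications:

```python
import numpy as np

def best_split(X, y, feats):
    """Best (feature, threshold) among `feats` by misclassification count."""
    best, best_err = None, np.inf
    for f in feats:
        for thr in np.unique(X[:, f])[:-1]:   # both sides stay non-empty
            left = X[:, f] <= thr
            err = 0
            for mask in (left, ~left):
                _, counts = np.unique(y[mask], return_counts=True)
                err += mask.sum() - counts.max()   # minority count in this side
            if err < best_err:
                best_err, best = err, (f, thr)
    return best

def grow_tree(X, y, n_sub, rng, depth):
    vals, counts = np.unique(y, return_counts=True)
    majority = vals[np.argmax(counts)]
    if depth == 0 or len(vals) == 1:
        return ("leaf", majority)
    # Random decision: consider only n_sub randomly chosen features here.
    feats = rng.choice(X.shape[1], size=min(n_sub, X.shape[1]), replace=False)
    split = best_split(X, y, feats)
    if split is None:
        return ("leaf", majority)
    f, thr = split
    left = X[:, f] <= thr
    return ("node", f, thr,
            grow_tree(X[left], y[left], n_sub, rng, depth - 1),
            grow_tree(X[~left], y[~left], n_sub, rng, depth - 1))

def tree_predict_one(tree, x):
    while tree[0] == "node":
        _, f, thr, l, r = tree
        tree = l if x[f] <= thr else r
    return tree[1]

def random_forest_fit(X, y, k=15, n_sub=1, depth=3, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(k):
        idx = rng.integers(0, n, size=n)   # bootstrap: sample with replacement
        trees.append(grow_tree(X[idx], y[idx], n_sub, rng, depth))
    return trees

def forest_predict(trees, X):
    """Majority vote of the k trees (labels are -1/+1, so sum = vote margin)."""
    preds = np.array([[tree_predict_one(t, x) for x in X] for t in trees])
    return np.where(preds.sum(axis=0) >= 0, 1, -1)

rng = np.random.default_rng(4)
X = rng.random((200, 2))
y = np.where(X[:, 0] + X[:, 1] > 1, 1, -1)
trees = random_forest_fit(X, y)
acc = np.mean(forest_predict(trees, X) == y)
```

Setting `n_sub` equal to the full number of features would reduce this to plain bagging of trees, matching the point made on the next slide.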
30. Random Forests
Ensemble method tailored for decision tree classifiers
Creates k decision trees, where each tree is independently
generated based on random decisions
Bagging using decision trees can be seen as a special case of
random forests where the random decisions are the random
creations of the bootstrap samples
31. Two examples of random decisions in decision forests
At each internal tree node, randomly select F attributes, and
evaluate just those attributes to choose the partitioning attribute
Tends to produce larger trees than when all attributes are
considered for selection at each node, but different classes will
eventually be assigned to different leaf nodes anyway
Saves processing time in the construction of each individual
tree, since just a subset of attributes is considered at each
internal node
At each internal tree node, evaluate the quality of all possible
partitioning attributes, but randomly select one of the F best
attributes to label that node (based on InfoGain, etc.)
Unlike the previous approach, does not save processing time
32. Properties of Random Forests
Easy to use (“off-the-shelf”), with only two parameters
(number of trees, percentage of variables considered for each split)
Very high accuracy
No overfitting when the number of trees is large (so choose it high)
Insensitive to the choice of split percentage (~20%)
Returns an estimate of variable importance
33. Out of the bag
For every tree grown, about one-third of the cases are out-of-bag
(out of the bootstrap sample). Abbreviated oob.
Put these oob cases down the corresponding tree and get response
estimates for them.
For each case n, average (or take the plurality vote of) the response
estimates over all the times that n was oob, to get a test-set estimate ŷn for yn.
Averaging the loss over all n gives the test-set
estimate of the prediction error.
The only adjustable parameter in RF is m, the number of variables considered at each split.
The default value for m is √M, where M is the total number of variables, but RF is
not sensitive to the value of m over a wide range.
34. Variable Importance
Because of the need to know which variables are important in the
classification, RF has three different ways of looking at variable
importance
Measure 1
To estimate the importance of the mth variable, in the oob
cases for the kth tree, randomly permute all values of the mth
variable
Put these altered oob x-values down the tree and get
classifications.
Proceed as though computing a new internal error rate.
The amount by which this new error exceeds the original test
set error is defined as the importance of the mth variable.
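Measure 1 can be sketched directly. Here a fixed stand-in rule plays the role of a fitted tree, and the data and names are my own; in a real forest the errors would be computed on each tree's oob cases:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.random((500, 2))
y = np.where(X[:, 0] > 0.5, 1, -1)       # only variable 0 matters here

# Stand-in for a fitted tree: a fixed rule "learned" elsewhere.
def predict(X):
    return np.where(X[:, 0] > 0.5, 1, -1)

base_err = np.mean(predict(X) != y)      # error on the (oob-like) cases

importance = []
for m in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, m] = rng.permutation(Xp[:, m])  # randomly permute the m-th variable
    # Importance: how much the error grows once variable m is scrambled.
    importance.append(np.mean(predict(Xp) != y) - base_err)

print(importance)  # variable 0 gets a large importance, variable 1 gets none
```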
35. Variable Importance
For the nth case in the data, its margin at the end of a run is the
proportion of votes for its true class minus the maximum of the
proportion of votes for each of the other classes
The 2nd measure of importance of the mth variable is the average
lowering of the margin across all cases when the mth variable is
randomly permuted as in method 1
The third measure is the count of how many margins are lowered
minus the number of margins raised
36. Summary of Random Forests
Random forests are an effective tool in prediction.
Forests give results competitive with boosting and adaptive
bagging, yet do not progressively change the training set.
Random inputs and random features produce good results in
classification, less so in regression.
For larger data sets, we can gain accuracy by combining
random features with boosting.
37. Summary
Ensembles in general improve predictive accuracy
Good results are reported for most application domains,
unlike algorithm variations, whose success is more
dependent on the application domain/dataset
Improvement in accuracy, but interpretability decreases
Much more difficult for the user to interpret an ensemble
of classification models than a single classification model
Diversity of the base classifiers in the ensemble is important
Trade-off between each base classifier’s error and
diversity
Maximizing classifier diversity tends to increase the error
of each individual base classifier