This document discusses ensemble learning methods. It begins by introducing the concept of ensemble learning, which combines multiple learning algorithms to obtain better predictive performance than any of the constituent algorithms could achieve alone. It then discusses several popular ensemble methods, including boosting, bagging, random forests, and DECORATE. Boosting works by iteratively training weak learners on reweighted versions of the data to focus on examples that previous learners misclassified. Bagging trains learners on randomly resampled subsets of the data and combines them by averaging or voting. Random forests add further randomness to bagging by selecting a random subset of features at each split. DECORATE improves ensembles by adding artificial training examples that encourage diversity.
Ensemble Learning: The Wisdom of Crowds (of Machines)
1. Ensemble Learning: The Wisdom of Crowds (of Machines)
Lior Rokach
Department of Information Systems Engineering
Ben-Gurion University of the Negev
2. About Me
Prof. Lior Rokach
Department of Information Systems Engineering
Faculty of Engineering Sciences
Head of the Machine Learning Lab
Ben-Gurion University of the Negev
Email: liorrk@bgu.ac.il
http://www.ise.bgu.ac.il/faculty/liorr/
PhD (2004) from Tel Aviv University
3. The Condorcet Jury Theorem
• If each voter has an independent probability p of being correct, and M is the probability that a majority of voters is correct, then p > 0.5 implies M > p.
• Moreover, for all p > 0.5, M approaches 1 as the number of voters approaches infinity.
• This theorem was proposed by the Marquis de Condorcet in 1784.
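A quick numerical check of the theorem, as a minimal Python sketch: M is simply a binomial tail probability, so it can be computed directly.

```python
from math import comb

def majority_correct(p, n):
    """P(a majority of n independent voters is correct); n odd."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

for n in (1, 11, 101):
    print(n, round(majority_correct(0.6, n), 3))
# p = 0.6 gives M ~ 0.600, 0.753, 0.98: M > p, and M -> 1 as n grows
```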
4. Francis Galton
• Galton promoted statistics and invented the concept of correlation.
• In 1906 Galton visited a livestock fair and stumbled upon an intriguing contest.
• An ox was on display, and the villagers were invited to guess the animal's weight.
• Nearly 800 gave it a go and, not surprisingly, not one hit the exact mark: 1,198 pounds.
• Astonishingly, however, the average of those 800 guesses came very close indeed: 1,197 pounds.
5. The Wisdom of Crowds
Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies and Nations
• Under certain controlled conditions, the aggregation of information in groups results in decisions that are often superior to those that could have been made by any single member, even an expert.
• This imitates our second nature of seeking several opinions before making any crucial decision: we weigh the individual opinions and combine them to reach a final decision.
6. Committees of Experts
– "… a medical school that has the objective that all students, given a problem, come up with an identical solution"
• There is not much point in setting up a committee of experts from such a group; such a committee will not improve on the judgment of an individual.
• Consider: there needs to be disagreement for the committee to have the potential to be better than an individual.
7. Does it always work?
• Not all crowds (groups) are wise.
– Example: crazed investors in a stock market bubble.
8. Key Criteria
• Diversity of opinion
– Each person should have private information, even if it's just an eccentric interpretation of the known facts.
• Independence
– People's opinions aren't determined by the opinions of those around them.
• Decentralization
– People are able to specialize and draw on local knowledge.
• Aggregation
– Some mechanism exists for turning private judgments into a collective decision.
9. Teaser: How good are ensemble methods?
Let’s look at the Netflix Prize Competition…
10. Began October 2006
• Supervised learning task
– Training data is a set of users and the ratings (1, 2, 3, 4, 5 stars) those users have given to movies.
– Construct a classifier that, given a user and an unrated movie, correctly classifies that movie as either 1, 2, 3, 4, or 5 stars.
• $1 million prize for a 10% improvement over Netflix's current movie recommender/classifier (RMSE = 0.9514).
12. Learning biases
• Occam's razor
"Among the theories that are consistent with the data, select the simplest one."
• Epicurus' principle
"Keep all theories that are consistent with the data" [not necessarily with equal weights]
E.g., Bayesian learning, ensemble learning
13. Strong and Weak Learners
• Strong (PAC) Learner
– Takes labeled data for training
– Produces a classifier which can be arbitrarily accurate
– The objective of machine learning
• Weak (PAC) Learner
– Takes labeled data for training
– Produces a classifier which is more accurate than random guessing
14. Ensembles of classifiers
• Given some training data $D_{train} = \{(\mathbf{x}_n, y_n);\ n = 1, \ldots, N_{train}\}$
• Inductive learning
$L: D_{train} \to h(\cdot)$, where $h(\cdot): X \to Y$
• Ensemble learning
$L_1: D_{train} \to h_1(\cdot)$
$L_2: D_{train} \to h_2(\cdot)$
$\ldots$
$L_T: D_{train} \to h_T(\cdot)$
Ensemble: $\{h_1(\cdot), h_2(\cdot), \ldots, h_T(\cdot)\}$
15. Classification by majority voting
[Diagram (Alberto Suárez, 2012): a new instance x is classified by T = 7 classifiers, whose predictions are 1, 1, 1, 2, 1, 2, 1. Class 1 accumulates 5 votes and class 2 accumulates 2, so the final class is 1.]
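The tally in the diagram amounts to a few lines of Python:

```python
from collections import Counter

# Majority vote over the T = 7 predictions from the diagram above.
predictions = [1, 1, 1, 2, 1, 2, 1]
final_class, votes = Counter(predictions).most_common(1)[0]
print(final_class, votes)  # -> 1 5
```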
17. Boosting
• Learners
– Strong learners are very difficult to construct
– Constructing weak learners is relatively easy
• Strategy
– Derive a strong learner from weak learners
– Boost weak classifiers into a strong learner
18. Construct Weak Classifiers
• Using different data distributions
– Start with uniform weighting
– During each step of learning:
• Increase the weights of the examples which are not correctly learned by the weak learner
• Decrease the weights of the examples which are correctly learned by the weak learner
• Idea
– Focus on difficult examples which were not correctly classified in the previous steps
19. Combine Weak Classifiers
• Weighted voting
– Construct the strong classifier by weighted voting of the weak classifiers
• Idea
– A better weak classifier gets a larger weight
– Iteratively add weak classifiers
• Increase the accuracy of the combined classifier through minimization of a cost function
20. AdaBoost (Adaptive Boosting) (Freund and Schapire, 1997)
Generate a sequence of base-learners, each focusing on the previous one's errors (Freund and Schapire, 1996).
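A minimal from-scratch sketch of the reweighting scheme from slides 18-19, assuming binary labels in {-1, +1} and scikit-learn decision stumps as the weak learners (scikit-learn also ships a ready-made AdaBoostClassifier):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=50):
    """Discrete AdaBoost sketch; assumes labels y in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                 # start with uniform weights
    stumps, alphas = [], []
    for _ in range(T):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)    # weak learner on weighted data
        pred = stump.predict(X)
        err = w[pred != y].sum()            # weighted training error
        if err >= 0.5:                      # weak learner must beat chance
            break
        alpha = 0.5 * np.log((1 - err) / (err + 1e-12))
        w *= np.exp(-alpha * y * pred)      # up-weight mistakes, down-weight hits
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)

    def predict(X_new):
        score = sum(a * s.predict(X_new) for a, s in zip(alphas, stumps))
        return np.sign(score)               # weighted vote of the stumps
    return predict
```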
32. Training Errors vs Test Errors
Performance on the 'letter' dataset (Schapire et al., 1997)
[Plot of training error and test error per boosting round:]
• Training error drops to 0 on round 5
• Test error continues to drop after round 5 (from 8.4% to 3.1%)
35. BrownBoost
• Reduces the weight given to misclassified examples
• Good (only) for very noisy data
36. Bagging
Bootstrap AGGregatING
• Employs the simplest way of combining predictions that belong to the same type.
• Combining can be realized with voting or averaging.
• Each model receives equal weight.
• "Idealized" version of bagging:
– Sample several training sets of size n (instead of just having one training set of size n)
– Build a classifier for each training set
– Combine the classifiers' predictions
• This improves performance in almost all cases if the learning scheme is unstable.
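A minimal sketch of the bootstrap-and-vote procedure, assuming numpy feature arrays and nonnegative integer class labels:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def bagging_fit(X, y, T=25):
    """Train T trees, each on a bootstrap resample of the n training rows."""
    n = len(y)
    return [DecisionTreeClassifier().fit(X[i], y[i])
            for i in (rng.integers(0, n, size=n) for _ in range(T))]

def bagging_predict(models, X):
    """Plurality vote; assumes nonnegative integer class labels."""
    votes = np.stack([m.predict(X) for m in models])
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```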
37. Wagging
Weighted AGGregatING
• A variant of bagging in which each classifier is trained on the entire training set, but each instance is stochastically assigned a weight.
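By analogy with the bagging sketch above, a minimal wagging sketch; the weight distribution is an assumption here (one common variant draws instance weights from an exponential distribution, the "continuous Poisson" scheme used in MultiBoosting):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def wagging_fit(X, y, T=25):
    """Every tree sees the full training set, under random instance weights."""
    models = []
    for _ in range(T):
        w = rng.exponential(scale=1.0, size=len(y))  # stochastic weights
        models.append(DecisionTreeClassifier().fit(X, y, sample_weight=w))
    return models
```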
38. Random Forests
1. Choose T, the number of trees to grow.
2. Choose m, the number of variables used to split each node. m ≪ M, where M is the number of input variables. m is held constant while growing the forest.
3. Grow T trees. When growing each tree do the following:
(a) Construct a bootstrap sample of size n sampled from S_n with replacement, and grow a tree from this bootstrap sample.
(b) When growing the tree, at each node select m variables at random and use them to find the best split.
(c) Grow the tree to its maximal extent. There is no pruning.
4. To classify a point X, collect the votes from every tree in the forest and then use majority voting to decide on the class label.
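These steps correspond closely to scikit-learn's RandomForestClassifier, shown here as one concrete realization (the make_classification data is just a stand-in):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# n_estimators is T; max_features plays the role of m (variables tried at
# each split); trees are grown unpruned by default, matching step (c).
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                random_state=0).fit(X, y)
print(forest.predict(X[:5]))
```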
39. Variations of Random Forests
• Random Split Selection (Dietterich, 2000)
– Grow multiple trees
– When splitting, choose the split uniformly at random from the K best splits
– Can be used with or without pruning
• Random Subspace (Ho, 1998)
– Grow multiple trees
– Each tree is grown using a fixed subset of variables
– Do a majority vote or averaging to combine votes from the different trees
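A minimal sketch of the Random Subspace idea, again assuming numpy arrays and nonnegative integer labels: each tree sees only a fixed random subset of k feature columns, at training and at prediction time.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def random_subspace_fit(X, y, T=25, k=5):
    """Each tree is grown on a fixed random subset of k feature columns."""
    models = []
    for _ in range(T):
        feats = rng.choice(X.shape[1], size=k, replace=False)
        models.append((feats, DecisionTreeClassifier().fit(X[:, feats], y)))
    return models

def random_subspace_predict(models, X):
    """Majority vote; assumes nonnegative integer class labels."""
    votes = np.stack([m.predict(X[:, feats]) for feats, m in models])
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```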
40. DECORATE (Melville & Mooney, 2003)
• Change the training data by adding new artificial training examples that encourage diversity in the resulting ensemble.
• Improves accuracy when the training set is small, and therefore resampling and reweighting the training set has limited ability to generate diverse alternative hypotheses.
41.-43. Overview of DECORATE
[Diagrams: the base learner is first trained on the labeled training examples plus artificial examples, producing classifier C1. At each subsequent iteration, new artificial examples are generated and the base learner is retrained on the augmented data, adding C2, then C3, to the current ensemble.]
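The loop in the diagrams can be sketched in a few lines. This is a deliberately simplified illustration under stated assumptions (numeric features modeled as per-feature Gaussians; artificial labels sampled inversely to the ensemble's class distribution so the next member is pushed to disagree; the full algorithm additionally rejects new members that hurt ensemble accuracy, which is omitted here). It is not the authors' exact procedure.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def decorate_fit(X, y, T=10):
    """Simplified DECORATE sketch (illustration only)."""
    classes = np.unique(y)
    ensemble = [DecisionTreeClassifier().fit(X, y)]
    for _ in range(T - 1):
        # Artificial examples from per-feature Gaussians fitted to X.
        X_art = rng.normal(X.mean(axis=0), X.std(axis=0) + 1e-9, size=X.shape)
        # Ensemble's class distribution on the artificial points...
        votes = np.stack([m.predict(X_art) for m in ensemble])
        proba = np.stack([(votes == c).mean(axis=0) for c in classes], axis=1)
        # ...inverted and renormalized, so rarely predicted classes become
        # likely artificial labels, which encourages diversity.
        inv = 1.0 / (proba + 1e-9)
        inv /= inv.sum(axis=1, keepdims=True)
        y_art = np.array([rng.choice(classes, p=p) for p in inv])
        ensemble.append(DecisionTreeClassifier().fit(
            np.vstack([X, X_art]), np.concatenate([y, y_art])))
    return ensemble
```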
49. Members Dependency
• Dependent methods: there is an interaction between the learning runs (e.g., AdaBoost)
– Model-guided instance selection: the classifiers constructed in previous iterations are used for selecting the training set in the subsequent iteration.
– Incremental batch learning: the classification produced in one iteration is given as prior knowledge (a new feature) to the learning algorithm in the subsequent iteration.
• Independent methods (e.g., Bagging)
50. Cascading
• Cascade learners in order of complexity.
• Use learner dj only if the preceding ones are not confident.
51. Diversity
• Manipulating the inducer
• Manipulating the training sample
• Changing the target attribute representation
• Partitioning the search space: each member is trained on a different search subspace.
• Hybridization: diversity is obtained by using various base inducers or ensemble strategies.
52. Measuring the Diversity
• Pairwise measures calculate the average of a particular distance metric between all possible pairings of members in the ensemble, such as the Q-statistic or the kappa-statistic.
• The non-pairwise measures either use the idea of entropy or calculate the correlation of each ensemble member with the averaged output.
53. Kappa-Statistic
$$\kappa_{i,j} = \frac{\theta_{i,j} - \hat{\theta}_{i,j}}{1 - \hat{\theta}_{i,j}}$$
where $\theta_{i,j}$ is the proportion of instances on which the classifiers i and j agree with each other on the training set, and $\hat{\theta}_{i,j}$ is the probability that the two classifiers agree by chance.
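A small sketch of the computation, with the chance-agreement term estimated from the two classifiers' label marginals (the standard estimate):

```python
import numpy as np

def kappa(pred_i, pred_j, labels):
    """Pairwise kappa between two classifiers' predictions on the same set."""
    pred_i, pred_j = np.asarray(pred_i), np.asarray(pred_j)
    theta = (pred_i == pred_j).mean()          # observed agreement
    theta_hat = sum((pred_i == c).mean() * (pred_j == c).mean()
                    for c in labels)           # chance agreement from marginals
    return (theta - theta_hat) / (1 - theta_hat)

print(kappa([1, 1, 2, 2, 1], [1, 2, 2, 2, 1], labels=[1, 2]))  # ~ 0.615
```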
54. How crowded should the crowd be? Ensemble Selection
• Why bother?
– Desired accuracy
– Computational cost
• Predetermine the ensemble size
• Use a certain criterion to stop training
• Pruning
56. Multi-strategy Ensemble Learning
• Combines several ensemble strategies.
• MultiBoosting, an extension to AdaBoost formed by adding wagging-like features, can harness both AdaBoost's high bias and variance reduction and wagging's superior variance reduction.
• It produces decision committees with lower error than either AdaBoost or wagging.
58. Why use Ensembles?
• Statistical reasons: out of many classifier models with similar training/test errors, which one shall we pick? If we just pick one at random, we risk the possibility of choosing a really poor one.
– Combining/averaging them may prevent us from making one such unfortunate decision.
• Computational reasons: every time we run a classification algorithm, we may find different local optima.
– Combining their outputs may allow us to find a solution that is closer to the global minimum.
• Too little data / too much data:
– Generate multiple classifiers by resampling the available data, or by using mutually exclusive subsets of the available data.
• Representational reasons: the classifier space may not contain the solution to a given particular problem. However, an ensemble of such classifiers may.
– For example, linear classifiers cannot solve non-linearly separable problems, but their combination can.
60. There's no real Paradox…
• Ideally, all committee members would be right about everything!
• If not, they should be wrong about different things.
61. No Free Lunch Theorem in Machine Learning (Wolpert, 2001)
• "Or to put it another way, for any two learning algorithms, there are just as many situations (appropriately weighted) in which algorithm one is superior to algorithm two as vice versa, according to any of the measures of 'superiority'."
62. So why develop new algorithms?
• The science of pattern recognition is mostly concerned with choosing the most appropriate algorithm for the problem at hand.
• This requires some a priori knowledge: data distribution, prior probabilities, complexity of the problem, the physics of the underlying phenomenon, etc.
• The No Free Lunch theorem tells us that, unless we have some a priori knowledge, simple classifiers (or complex ones, for that matter) are not necessarily better than others. However, given some a priori information, certain classifiers may better MATCH the characteristics of certain types of problems.
• The main challenge of the pattern recognition professional is then to identify the correct match between the problem and the classifier!
…which is yet another reason to arm yourself with a diverse PR arsenal!
63. Ensembles and the No Free Lunch Theorem
• Ensembles combine the strengths of each classifier to make a super-learner.
• But… an ensemble only improves classification if the component classifiers perform better than chance.
– This cannot be guaranteed a priori.
• Proven effective in many real-world applications.
64. Ensembles and the Optimal Bayes Rule
• Given a finite amount of data, many hypotheses are typically equally good. How can the learning algorithm select among them?
• Optimal Bayes classifier recipe: take a weighted majority vote of all hypotheses, weighted by their posterior probability.
• That is, put most weight on hypotheses consistent with the data.
• Hence, ensemble learning may be viewed as an approximation of the Optimal Bayes rule (which is provably the best possible classifier).
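In symbols, a standard formulation of the recipe above (not taken verbatim from the slides):

```latex
\hat{y}(\mathbf{x}) \;=\; \arg\max_{y \in Y} \sum_{h \in H} P(y \mid \mathbf{x}, h)\, P(h \mid D_{train})
```

An ensemble approximates the intractable sum over the whole hypothesis space H with a finite set $\{h_1, \ldots, h_T\}$ of sampled hypotheses.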
65. Bias and Variance Decomposition
• Bias
– The hypothesis space made available by a particular classification method does not include sufficient hypotheses.
• Variance
– The hypothesis space made available is too large for the training data, and the selected hypothesis may not be accurate on unseen data.
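For squared loss, this decomposition has the standard form below (a regression-style statement; the slides use the informal classification analogue of the same idea):

```latex
\mathbb{E}\!\left[(y - \hat{f}(\mathbf{x}))^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(\mathbf{x})] - f(\mathbf{x})\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\!\left[\left(\hat{f}(\mathbf{x}) - \mathbb{E}[\hat{f}(\mathbf{x})]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}
```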
66. Bias and Variance: Decision Trees
• Small trees have high bias.
• Large trees have high variance. Why?
(from Elder, John. From Trees to Forests and Rule Sets - A Unified Overview of Ensemble Methods, 2007)
67. For Any Model (Not only decision trees)
• Given a target function
• A model with many parameters
– Generally low bias
– Fits the data well
– Yields high variance
• A model with few parameters
– Generally high bias
– May not fit the data well
– The fit does not change much for different data sets (low variance)
68. Bias-Variance and Ensemble Learning
• Bagging: there exists empirical and theoretical evidence that bagging acts as a variance-reduction machine (i.e., it reduces the variance part of the error).
• AdaBoost: empirical evidence suggests that AdaBoost reduces both the bias and the variance parts of the error. In particular, it seems that bias is mostly reduced in early iterations, while variance is reduced in later ones.
70. Occam's razor
• The explanation of any phenomenon should make as few assumptions as possible, eliminating those that make no difference in the observable predictions of the explanatory hypothesis or theory.
71. Contradiction with Occam's Razor
• Ensembles seem to contradict Occam's Razor
– More rounds -> more classifiers for voting -> more complicated
– With 0 training error, a more complicated classifier may perform worse
72. Two Razors (Domingos, 1999)
• First razor: given two models with the same generalization error, the simpler one should be preferred because simplicity is desirable in itself.
• On the other hand, within KDD Occam's razor is often used in a quite different sense, which can be stated as:
• Second razor: given two models with the same training-set error, the simpler one should be preferred because it is likely to have lower generalization error.
• Domingos: the first one is largely uncontroversial, while the second one, taken literally, is false.
73. Summary
• "Two heads are better than none. One hundred heads are so much better than one."
– Dearg Doom, The Tain, Horslips, 1973
• "Great minds think alike, clever minds think together." – L. Zoref, 2011
• But they must be different and specialised.
• And it might be an idea to select only the best of them for the problem at hand.