This document provides an overview of Naive Bayes classification. It begins with background on classification methods, then covers Bayes' theorem and how it relates to Bayesian and maximum likelihood classification. The document introduces Naive Bayes classification, which makes a strong independence assumption to simplify probability calculations. It discusses algorithms for discrete and continuous features, and addresses common issues like dealing with zero probabilities. The document concludes by outlining some applications of Naive Bayes classification and its advantages of simplicity and effectiveness for many problems.
Naive Bayes is a classifier based on Bayes' theorem. It predicts membership probabilities for each class, i.e., the probability that a given record or data point belongs to a particular class.
3. Background
There are three ways to build a classifier:
a) Model a classification rule directly
Examples: k-NN, decision trees, perceptron, SVM
b) Model the probability of class membership given the input data
Example: perceptron with the cross-entropy cost
c) Build a probabilistic model of the data within each class
Examples: Naive Bayes, model-based classifiers
a) and b) are examples of discriminative classification
c) is an example of generative classification
b) and c) are both examples of probabilistic classification
4. Bayes Theorem
Given a hypothesis h and data D which bears on the hypothesis:
P(h): prior probability of h, before seeing the data
P(D): marginal probability of the data D (the evidence)
P(D|h): conditional probability of D given h: the likelihood
P(h|D): conditional probability of h given D: the posterior probability
Bayes' theorem relates them: $P(h \mid D) = \dfrac{P(D \mid h)\,P(h)}{P(D)}$
5. Maximum A Posteriori
Based on Bayes' theorem, we can compute the maximum a posteriori (MAP) hypothesis for the data.
We are interested in the best hypothesis from some space H given the observed training data D.
H: the set of all candidate hypotheses.
$$h_{MAP} = \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} \frac{P(D \mid h)\,P(h)}{P(D)} = \arg\max_{h \in H} P(D \mid h)\,P(h)$$
Note that we can drop P(D), as the probability of the data is constant (and independent of the hypothesis).
6. Maximum Likelihood
Now assume that all hypothesis are equally probable a prior, i.e. P(hi ) = P(hj ) for all
hi, hj belong to H.
This is called assuming a uniform prior. It simplifies computing the posterior:
h argmaxP(D| h)
h H
ML
This hypothesis is called the maximum likelihood hypothesis.
5
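To see the two rules side by side, here is a minimal Python sketch (not from the slides; the hypothesis space, priors, and likelihoods are made-up illustrative numbers):

```python
# Hypothetical hypothesis space with illustrative priors P(h) and likelihoods P(D|h).
priors = {"h1": 0.6, "h2": 0.3, "h3": 0.1}
likelihoods = {"h1": 0.2, "h2": 0.5, "h3": 0.9}

# MAP: maximize P(D|h) * P(h); P(D) is constant across hypotheses and is dropped.
h_map = max(priors, key=lambda h: likelihoods[h] * priors[h])

# ML: assume a uniform prior, so maximize the likelihood P(D|h) alone.
h_ml = max(likelihoods, key=likelihoods.get)

print(h_map)  # h2 (score 0.5 * 0.3 = 0.15, the largest product)
print(h_ml)   # h3 (the largest likelihood, 0.9)
```

Note that the two rules can disagree: a strong enough prior on h1 or h2 outweighs h3's higher likelihood.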
7. Bayesian Classifier
The classification problem may be formalized using a posteriori probabilities:
P(C|X) = the probability that the sample tuple X = <x1,…,xk> is of class C.
E.g. P(class=N | outlook=sunny, windy=true, …)
Idea: assign to sample X the class label C such that P(C|X) is maximal.
8. Estimating a posteriori probabilities
Bayes theorem:
P(C|X) = P(X|C)·P(C) / P(X)
P(X) is constant for all classes.
P(C) = relative frequency of class C samples.
The C such that P(C|X) is maximum is the C such that P(X|C)·P(C) is maximum.
Problem: computing P(X|C) directly is infeasible — with k features it requires estimating the full joint distribution over all feature-value combinations.
9. Naive Bayes
Bayes classification:
$$P(C \mid \mathbf{X}) \propto P(\mathbf{X} \mid C)\,P(C) = P(X_1,\dots,X_n \mid C)\,P(C)$$
Difficulty: learning the joint probability $P(X_1,\dots,X_n \mid C)$.
Naive Bayes classification
- Assumption: all input features are conditionally independent given the class!
$$P(X_1, X_2, \dots, X_n \mid C) = P(X_1 \mid X_2,\dots,X_n, C)\,P(X_2,\dots,X_n \mid C) = P(X_1 \mid C)\,P(X_2 \mid C)\cdots P(X_n \mid C)$$
- MAP classification rule: for $\mathbf{x} = (x_1, x_2, \dots, x_n)$, assign the label $c^*$ if
$$[P(x_1 \mid c^*)\cdots P(x_n \mid c^*)]\,P(c^*) > [P(x_1 \mid c)\cdots P(x_n \mid c)]\,P(c),\qquad c \neq c^*,\; c \in \{c_1,\dots,c_L\}$$
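As a quick sketch of how the factorization is used (illustrative Python; the two conditionals and the priors are taken from the tennis example on the slides that follow, and the helper names are my own):

```python
from math import prod

# P(X_j = v | C = c), here just two features of the later tennis example:
cond = {
    ("Outlook", "Sunny", "Yes"): 2/9, ("Outlook", "Sunny", "No"): 3/5,
    ("Wind", "Strong", "Yes"): 3/9,   ("Wind", "Strong", "No"): 3/5,
}
prior = {"Yes": 9/14, "No": 5/14}

def nb_score(x, c):
    # Unnormalized posterior [prod_j P(x_j|c)] * P(c): the conditional
    # independence assumption is what lets us multiply per-feature terms.
    return prod(cond[(feat, val, c)] for feat, val in x) * prior[c]

x = (("Outlook", "Sunny"), ("Wind", "Strong"))
print(max(prior, key=lambda c: nb_score(x, c)))  # No
```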
10. Naive Bayes
Algorithm: Discrete-Valued Features
- Learning Phase: Given a training set S,
  for each target value $c_i$ $(c_i = c_1, \dots, c_L)$:
    $\hat P(C = c_i) \leftarrow$ estimate $P(C = c_i)$ with the examples in S;
    for every feature value $x_{jk}$ of each feature $X_j$ $(j = 1,\dots,n;\ k = 1,\dots,N_j)$:
      $\hat P(X_j = x_{jk} \mid C = c_i) \leftarrow$ estimate $P(X_j = x_{jk} \mid C = c_i)$ with the examples in S;
  Output: conditional probability tables; for $X_j$, $N_j \times L$ elements.
- Test Phase: Given an unknown instance $\mathbf{X}' = (a_1, \dots, a_n)$,
  look up the tables to assign the label $c^*$ to $\mathbf{X}'$ if
$$[\hat P(a_1 \mid c^*)\cdots \hat P(a_n \mid c^*)]\,\hat P(c^*) > [\hat P(a_1 \mid c)\cdots \hat P(a_n \mid c)]\,\hat P(c),\qquad c \neq c^*,\; c \in \{c_1,\dots,c_L\}$$
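A compact sketch of the learning phase (plain-Python frequency counting; the function and variable names are my own, and no smoothing is applied yet):

```python
from collections import Counter, defaultdict

def fit_discrete_nb(X, y):
    """Estimate priors P(C=c) and conditionals P(X_j = x_jk | C = c) by relative frequency."""
    class_counts = Counter(y)
    priors = {c: n / len(y) for c, n in class_counts.items()}
    joint = defaultdict(int)  # joint[(j, value, c)] = count among class-c examples
    for xs, c in zip(X, y):
        for j, v in enumerate(xs):
            joint[(j, v, c)] += 1
    cond = {k: n / class_counts[k[2]] for k, n in joint.items()}
    return priors, cond

# Tiny usage on two rows shaped like the weather data of the next slide:
priors, cond = fit_discrete_nb([("Sunny", "Hot"), ("Sunny", "Cool")], ["No", "Yes"])
print(priors)                     # {'No': 0.5, 'Yes': 0.5}
print(cond[(0, "Sunny", "Yes")])  # 1.0
```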
12. Example
Learning Phase:

Outlook      Play=Yes  Play=No
Sunny        2/9       3/5
Overcast     4/9       0/5
Rain         3/9       2/5

Temperature  Play=Yes  Play=No
Hot          2/9       2/5
Mild         4/9       2/5
Cool         3/9       1/5

Humidity     Play=Yes  Play=No
High         3/9       4/5
Normal       6/9       1/5

Wind         Play=Yes  Play=No
Strong       3/9       3/5
Weak         6/9       2/5

Priors: P(Play=Yes) = 9/14, P(Play=No) = 5/14
13. Example
Test Phase:
- Given a new instance, predict its label:
x' = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
- Look up the tables obtained in the learning phase:
P(Outlook=Sunny|Play=Yes) = 2/9          P(Outlook=Sunny|Play=No) = 3/5
P(Temperature=Cool|Play=Yes) = 3/9       P(Temperature=Cool|Play=No) = 1/5
P(Humidity=High|Play=Yes) = 3/9          P(Humidity=High|Play=No) = 4/5
P(Wind=Strong|Play=Yes) = 3/9            P(Wind=Strong|Play=No) = 3/5
P(Play=Yes) = 9/14                       P(Play=No) = 5/14
- Decision making with the MAP rule (the scores are unnormalized posteriors):
P(Yes|x') ∝ [ P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes) ] P(Play=Yes) = 0.0053
P(No|x') ∝ [ P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No) ] P(Play=No) = 0.0206
Since P(Yes|x') < P(No|x'), we label x' as "No".
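The two scores can be reproduced directly from the tables (a quick arithmetic check in Python; the fractions are the deck's own):

```python
# Unnormalized posterior scores for x' = (Sunny, Cool, High, Strong)
p_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)
p_no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)
print(round(p_yes, 4), round(p_no, 4))   # 0.0053 0.0206
print("Yes" if p_yes > p_no else "No")   # No
```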
14. Naive Bayes
Algorithm: Continuous-Valued Features
- A continuous feature takes innumerably many values, so frequency tables are not applicable.
- The conditional probability is often modeled with the normal (Gaussian) distribution:
$$\hat P(X_j \mid C = c_i) = \frac{1}{\sqrt{2\pi}\,\sigma_{ji}} \exp\!\left(-\frac{(X_j - \mu_{ji})^2}{2\sigma_{ji}^2}\right)$$
$\mu_{ji}$: mean (average) of the values of feature $X_j$ over the examples for which $C = c_i$
$\sigma_{ji}$: standard deviation of the values of feature $X_j$ over the examples for which $C = c_i$
- Learning Phase: for $\mathbf{X} = (X_1, \dots, X_n)$ and $C = c_1, \dots, c_L$,
  output $n \times L$ normal distributions and the priors $\hat P(C = c_i)$, $i = 1, \dots, L$.
- Test Phase: given an unknown instance $\mathbf{X}' = (a_1, \dots, a_n)$,
  instead of looking up tables, calculate the conditional probabilities with the normal distributions obtained in the learning phase,
  then apply the MAP rule to make a decision.
15. Naive Bayes
Example: Continuous-Valued Features
- Temperature is naturally continuous-valued.
Yes: 25.2, 19.3, 18.5, 21.7, 20.1, 24.3, 22.8, 23.1, 19.8
No: 27.3, 30.1, 17.4, 29.5, 15.1
- Estimate the mean and standard deviation for each class:
$$\mu = \frac{1}{N}\sum_{n=1}^{N} x_n, \qquad \sigma^2 = \frac{1}{N-1}\sum_{n=1}^{N} (x_n - \mu)^2$$
$\mu_{Yes} = 21.64$, $\sigma_{Yes} = 2.35$
$\mu_{No} = 23.88$, $\sigma_{No} = 7.09$
- Learning Phase: output two Gaussian models for P(temp|C):
$$\hat P(x \mid \text{Yes}) = \frac{1}{2.35\sqrt{2\pi}} \exp\!\left(-\frac{(x - 21.64)^2}{2 \cdot 2.35^2}\right)$$
$$\hat P(x \mid \text{No}) = \frac{1}{7.09\sqrt{2\pi}} \exp\!\left(-\frac{(x - 23.88)^2}{2 \cdot 7.09^2}\right)$$
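The estimates and the two class-conditional densities can be reproduced with the standard library (a sketch; `statistics.stdev` uses the same N−1 denominator as the formula above):

```python
import math
from statistics import mean, stdev

temps_yes = [25.2, 19.3, 18.5, 21.7, 20.1, 24.3, 22.8, 23.1, 19.8]
temps_no  = [27.3, 30.1, 17.4, 29.5, 15.1]

mu_yes, sd_yes = mean(temps_yes), stdev(temps_yes)  # ~21.64, ~2.35
mu_no,  sd_no  = mean(temps_no),  stdev(temps_no)   # ~23.88, ~7.09

def gaussian(x, mu, sd):
    """Normal density, used as the class-conditional P(x | C)."""
    return math.exp(-((x - mu) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))

# Class-conditional likelihoods of an unseen temperature, ready for the MAP rule:
x = 24.0
print(gaussian(x, mu_yes, sd_yes), gaussian(x, mu_no, sd_no))
```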
16. Uses of Naive Bayes classification
Text Classification
Spam Filtering
Hybrid Recommender System
- Recommender Systems apply machine learning and data mining techniques for
filtering unseen information and can predict whether a user would like a given
resource
Online Application
- Simple Emotion Modeling
17. Why text classification?
Learning which articles are of interest
Classify web pages by topic
Information extraction
Internet filters
19. Naive Bayes Approach
Build the vocabulary as the list of all distinct words that appear in all the documents of the training set.
Remove stop words and markup.
The words in the vocabulary become the attributes, assuming that classification is independent of the positions of the words.
Each document in the training set becomes a record with frequencies for each word in the vocabulary.
Train the classifier on the training set by computing the prior probability of each class and the conditional probabilities of the attributes.
Evaluate the results on the test data.
20. Text Classification Algorithm: Naive Bayes
With add-one (Laplace) smoothing, the word likelihoods are estimated as
$$\hat P(t \mid c) = \frac{T_{ct} + 1}{\sum_{t'} T_{ct'} + B}$$
$T_{ct}$ – number of occurrences of word t in the documents of class c
$\sum_{t'} T_{ct'}$ – total number of words in the documents of class c
$B$ – number of distinct words across all classes (the vocabulary size)
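Under those definitions, the smoothed estimate can be sketched as follows (hypothetical helper and toy data; the formula is the add-one estimate given above):

```python
from collections import Counter

def word_likelihoods(docs_in_class, vocabulary):
    """P(t|c) = (T_ct + 1) / (sum_t' T_ct' + B), with add-one smoothing."""
    T = Counter(w for doc in docs_in_class for w in doc)  # T_ct per word t
    total = sum(T.values())                               # sum over t' of T_ct'
    B = len(vocabulary)                                   # distinct words overall
    return {t: (T[t] + 1) / (total + B) for t in vocabulary}

vocab = {"ball", "game", "election", "vote"}
sports_docs = [["ball", "game", "game"], ["ball"]]
print(word_likelihoods(sports_docs, vocab)["game"])  # (2+1)/(4+4) = 0.375
```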
21. Relevant Issues
Violation of Independence Assumption
Zero conditional probability Problem
22. Violation of Independence Assumption
Naive Bayesian classifiers assume that the effect of an attribute value on a given
class is independent of the values of the other attributes. This assumption is called
class conditional independence. It is made to simplify the computations involved and,
in this sense, is considered “naive.”
23. Improvement
Bayesian belief networks are graphical models which, unlike naive Bayesian classifiers, allow the representation of dependencies among subsets of attributes.
Bayesian belief networks can also be used for classification.
24. Zero conditional probability Problem
If a given class and feature value never occur together in the training set then the
frequency-based probability estimate will be zero.
This is problematic since it will wipe out all information in the other probabilities when
they are multiplied.
It is therefore often desirable to incorporate a small-sample correction in all
probability estimates such that no probability is ever set to be exactly zero.
25. Naive Bayes Laplace Correction
To eliminate zeros, we use add-one or Laplace smoothing, which simply adds one to
each count
26. Example
Suppose that for the class buys_computer = yes we have a training database D containing 1000 tuples:
0 tuples with income = low,
990 tuples with income = medium, and
10 tuples with income = high.
The probabilities of these events, without the Laplacian correction, are 0, 0.990 (from 990/1000), and 0.010 (from 10/1000), respectively.
Using the Laplacian correction for the three quantities, we pretend that we have 1 more tuple for each income value. In this way, we instead obtain the probabilities 1/1003 ≈ 0.001, 991/1003 ≈ 0.988, and 11/1003 ≈ 0.011, respectively.
The "corrected" probability estimates are close to their "uncorrected" counterparts, yet the zero probability value is avoided.
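The corrected figures follow from adding one to each count and three to the denominator (quick check in Python):

```python
counts = {"low": 0, "medium": 990, "high": 10}
n, k = sum(counts.values()), len(counts)   # 1000 tuples, 3 income values
corrected = {v: (c + 1) / (n + k) for v, c in counts.items()}
print(corrected)  # low: 1/1003 ~ 0.001, medium: 991/1003 ~ 0.988, high: 11/1003 ~ 0.011
```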
27. Advantages
- Easy to implement
- Requires only a small amount of training data to estimate the parameters
- Good results obtained in most cases
28. Disadvantages
- Assumption of class conditional independence, and therefore a loss of accuracy when it is violated
- In practice, dependencies exist among variables
  E.g., hospital patients: profile (age, family history, etc.), symptoms (fever, cough, etc.), disease (lung cancer, diabetes, etc.)
  Dependencies among these cannot be modelled by a naive Bayes classifier
30. Conclusions
Naive Bayes is:
- Really easy to implement and often works well
- Often a good first thing to try
- Commonly used as a “punching bag” for smarter algorithms
31. References
http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/mlbook/ch6.pdf
Han, Kamber & Pei, Data Mining: Concepts and Techniques, 3rd Edition, ISBN: 9780123814791
http://en.wikipedia.org/wiki/Naive_Bayes_classifier
http://www.slideshare.net/ashrafmath/naive-bayes-15644818
http://www.slideshare.net/gladysCJ/lesson-71-naive-bayes-classifier