3. A Naive Bayes classifier
• is a probabilistic machine learning model used for classification tasks.
• The crux of the classifier is Bayes’ theorem.
• Using Bayes’ theorem, we can find the probability
of A happening, given that B has occurred (see the sketch after this list).
• Here, B is the evidence and A is the hypothesis.
• The assumption made here is that the predictors/features are
independent.
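A minimal sketch of Bayes’ theorem in Python; the probability values below are illustrative assumptions, not taken from the slides:

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a = 0.3          # P(A): prior probability of the hypothesis A (assumed value)
p_b_given_a = 0.8  # P(B|A): likelihood of the evidence B given A (assumed value)
p_b = 0.5          # P(B): marginal probability of the evidence B (assumed value)

p_a_given_b = p_b_given_a * p_a / p_b  # posterior P(A|B)
print(p_a_given_b)  # 0.48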
12. Example: test phase
–Given a new instance,
• x’=(Outlook=Sunny, Temperature=Cool, Humidity=High,
Wind=Strong)
–Look up the tables
–Apply the MAP rule
P(Outlook=Sunny|Play=Yes) = 2/9
P(Temperature=Cool|Play=Yes) = 3/9
P(Humidity=High|Play=Yes) = 3/9
P(Wind=Strong|Play=Yes) = 3/9
P(Play=Yes) = 9/14
P(Outlook=Sunny|Play=No) = 3/5
P(Temperature=Cool|Play=No) = 1/5
P(Humidity=High|Play=No) = 4/5
P(Wind=Strong|Play=No) = 3/5
P(Play=No) = 5/14
P(Yes|x’) ∝ [P(Sunny|Yes)P(Cool|Yes)P(High|Yes)P(Strong|Yes)]P(Play=Yes) = 0.0053
P(No|x’) ∝ [P(Sunny|No)P(Cool|No)P(High|No)P(Strong|No)]P(Play=No) = 0.0206
Since P(Yes|x’) < P(No|x’), we label x’ as “No”.
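The look-up-and-multiply computation above can be reproduced in a few lines of Python; the numbers are exactly the table entries from this slide:

import math

# Conditional probabilities and priors read off the look-up tables
yes_factors = [2/9, 3/9, 3/9, 3/9, 9/14]  # Sunny, Cool, High, Strong, P(Play=Yes)
no_factors  = [3/5, 1/5, 4/5, 3/5, 5/14]  # Sunny, Cool, High, Strong, P(Play=No)

score_yes = math.prod(yes_factors)  # ≈ 0.0053
score_no  = math.prod(no_factors)   # ≈ 0.0206

# MAP rule: choose the class with the larger unnormalized posterior
print("Yes" if score_yes > score_no else "No")  # -> No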
13. Relevant Issues
• Violation of Independence Assumption
– For many real-world tasks, 𝑃(𝑋1,⋅⋅⋅, 𝑋𝑛|𝐶) ≠ 𝑃(𝑋1|𝐶) ⋅⋅⋅ 𝑃(𝑋𝑛|𝐶)
– Nevertheless, naïve Bayes works surprisingly well anyway!
• Zero Conditional Probability Problem
– If no training example contains the attribute value 𝑋𝑗 = 𝑎𝑗𝑘, then 𝑃(𝑋𝑗 = 𝑎𝑗𝑘|𝐶 = 𝑐𝑖) = 0
– In this circumstance, during test, 𝑃(𝑥1|𝑐𝑖) ⋅⋅⋅ 𝑃(𝑎𝑗𝑘|𝑐𝑖) ⋅⋅⋅ 𝑃(𝑥𝑛|𝑐𝑖) = 0
– As a remedy, conditional probabilities are estimated with the m-estimate:
𝑃(𝑋𝑗 = 𝑎𝑗𝑘|𝐶 = 𝑐𝑖) = (𝑛𝑐 + 𝑚𝑝) / (𝑛 + 𝑚)
𝑛𝑐: number of training examples for which 𝑋𝑗 = 𝑎𝑗𝑘 and 𝐶 = 𝑐𝑖
𝑛: number of training examples for which 𝐶 = 𝑐𝑖
𝑝: prior estimate (usually, 𝑝 = 1/𝑡 for 𝑡 possible values of 𝑋𝑗)
𝑚: weight given to the prior (number of "virtual" examples, 𝑚 ≥ 1)
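A minimal sketch of this smoothing formula in Python; the function name and the example counts are illustrative assumptions:

def m_estimate(n_c, n, p, m):
    """m-estimate of P(Xj = ajk | C = ci): (n_c + m*p) / (n + m)."""
    return (n_c + m * p) / (n + m)

# Assumed example: the attribute value never occurs with the class (n_c = 0),
# the class has n = 9 training examples, Xj has t = 3 values (p = 1/3), m = 3.
print(m_estimate(n_c=0, n=9, p=1/3, m=3))  # 0.0833... instead of a hard zero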
14. Gaussian Naive Bayes classifier
• In Gaussian Naive Bayes, continuous values associated with each
feature are assumed to be distributed according to a Gaussian
distribution.
• A Gaussian distribution is also called a Normal distribution.
• When plotted, it gives a bell-shaped curve that is symmetric
about the mean of the feature values.
15. Gaussian Naive Bayes classifier
• The updated table of prior probabilities for the Outlook feature is as
follows:
• The likelihood of the features is assumed to be Gaussian; hence, the
conditional probability is given by:
𝑃(𝑥𝑖|𝑦) = (1 / √(2𝜋𝜎𝑦²)) exp(−(𝑥𝑖 − 𝜇𝑦)² / (2𝜎𝑦²))
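A minimal sketch of this Gaussian likelihood in Python; in practice μ and σ are estimated per class from the training data, and the numbers below are illustrative assumptions:

import math

def gaussian_likelihood(x, mu, sigma):
    """P(x | class) under the assumption x ~ Normal(mu, sigma^2)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# Assumed class statistics for a continuous feature, e.g. temperature
print(gaussian_likelihood(x=66.0, mu=73.0, sigma=6.0))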
16. Summary
• Naïve Bayes is based on the independence assumption
• Training is very easy and fast; it just requires considering each attribute in each
class separately
• Testing is straightforward; it just involves looking up tables or calculating conditional
probabilities with normal distributions
• A popular generative model
• Performance is competitive with most state-of-the-art classifiers, even when the
independence assumption is violated
• Many successful applications, e.g., spam mail filtering
• Apart from classification, naïve Bayes can do more…
Editor's Notes
That is, the presence of one particular feature does not affect the others.
Hence it is called naïve.
Bayes’ Theorem finds the probability of an event occurring given the probability of another event that has already occurred. Bayes’ theorem is stated mathematically as the following equation:
𝑃(𝐶|𝑋) = 𝑃(𝑋|𝐶) 𝑃(𝐶) / 𝑃(𝑋)
where X and C are events and P(X) ≠ 0.
Basically, we are trying to find the probability of event C, given that event X is true. Event X is also termed the evidence.
P(C) is the prior probability of C (i.e., the probability of the event before the evidence is seen). The evidence is an attribute value of an unknown instance (here, event X).
P(C|X) is the posterior probability of C, i.e., the probability of the event after the evidence is seen.