3. A Naive Bayes classifier
• is a probabilistic machine learning model used for classification tasks.
• The crux of the classifier is Bayes’ theorem.
• Using Bayes’ theorem, we can find the probability
of A happening, given that B has occurred (see the sketch after this list).
• Here, B is the evidence and A is the hypothesis.
• The assumption made here is that the predictors/features are
independent.
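A minimal sketch of Bayes’ theorem in Python; the probability values below are illustrative assumptions, not taken from the slides:

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a = 0.3          # P(A): prior probability of the hypothesis A (assumed value)
p_b_given_a = 0.8  # P(B|A): likelihood of the evidence B given A (assumed value)
p_b = 0.5          # P(B): marginal probability of the evidence B (assumed value)

p_a_given_b = p_b_given_a * p_a / p_b  # posterior P(A|B)
print(p_a_given_b)  # 0.48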
12. Example: test phase
–Given a new instance,
• x’=(Outlook=Sunny, Temperature=Cool, Humidity=High,
Wind=Strong)
–Look up the tables
–Apply the MAP rule
P(Outlook=Sunny|Play=Yes) = 2/9
P(Temperature=Cool|Play=Yes) = 3/9
P(Humidity=High|Play=Yes) = 3/9
P(Wind=Strong|Play=Yes) = 3/9
P(Play=Yes) = 9/14
P(Outlook=Sunny|Play=No) = 3/5
P(Temperature=Cool|Play=No) = 1/5
P(Humidity=High|Play=No) = 4/5
P(Wind=Strong|Play=No) = 3/5
P(Play=No) = 5/14
P(Yes|x’) ∝ [P(Sunny|Yes)P(Cool|Yes)P(High|Yes)P(Strong|Yes)]P(Play=Yes) = 0.0053
P(No|x’) ∝ [P(Sunny|No)P(Cool|No)P(High|No)P(Strong|No)]P(Play=No) = 0.0206
Since P(Yes|x’) < P(No|x’), we label x’ as “No”.
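The look-up-and-multiply computation above can be reproduced in a few lines of Python; the numbers are exactly the table entries from this slide:

import math

# Conditional probabilities and priors read off the look-up tables
yes_factors = [2/9, 3/9, 3/9, 3/9, 9/14]  # Sunny, Cool, High, Strong, P(Play=Yes)
no_factors  = [3/5, 1/5, 4/5, 3/5, 5/14]  # Sunny, Cool, High, Strong, P(Play=No)

score_yes = math.prod(yes_factors)  # ≈ 0.0053
score_no  = math.prod(no_factors)   # ≈ 0.0206

# MAP rule: choose the class with the larger unnormalized posterior
print("Yes" if score_yes > score_no else "No")  # -> No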
13. Relevant Issues
• Violation of Independence Assumption
– For many real-world tasks, 𝑃(𝑋1,⋅⋅⋅, 𝑋𝑛|𝐶) ≠ 𝑃(𝑋1|𝐶) ⋅⋅⋅ 𝑃(𝑋𝑛|𝐶)
– Nevertheless, naïve Bayes works surprisingly well anyway!
• Zero Conditional Probability Problem
– If no training example contains the attribute value 𝑋𝑗 = 𝑎𝑗𝑘, then 𝑃(𝑋𝑗 = 𝑎𝑗𝑘|𝐶 = 𝑐𝑖) = 0
– In this circumstance, during test, 𝑃(𝑥1|𝑐𝑖) ⋅⋅⋅ 𝑃(𝑎𝑗𝑘|𝑐𝑖) ⋅⋅⋅ 𝑃(𝑥𝑛|𝑐𝑖) = 0
– As a remedy, conditional probabilities are estimated with the m-estimate:
𝑃(𝑋𝑗 = 𝑎𝑗𝑘|𝐶 = 𝑐𝑖) = (𝑛𝑐 + 𝑚𝑝) / (𝑛 + 𝑚)
𝑛𝑐: number of training examples for which 𝑋𝑗 = 𝑎𝑗𝑘 and 𝐶 = 𝑐𝑖
𝑛: number of training examples for which 𝐶 = 𝑐𝑖
𝑝: prior estimate (usually, 𝑝 = 1/𝑡 for 𝑡 possible values of 𝑋𝑗)
𝑚: weight given to the prior (number of "virtual" examples, 𝑚 ≥ 1)
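A minimal sketch of this smoothing formula in Python; the function name and the example counts are illustrative assumptions:

def m_estimate(n_c, n, p, m):
    """m-estimate of P(Xj = ajk | C = ci): (n_c + m*p) / (n + m)."""
    return (n_c + m * p) / (n + m)

# Assumed example: the attribute value never occurs with the class (n_c = 0),
# the class has n = 9 training examples, Xj has t = 3 values (p = 1/3), m = 3.
print(m_estimate(n_c=0, n=9, p=1/3, m=3))  # 0.0833... instead of a hard zero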
14. Gaussian Naive Bayes classifier
• In Gaussian Naive Bayes, continuous values associated with each
feature are assumed to be distributed according to a Gaussian
distribution.
• A Gaussian distribution is also called a Normal distribution.
• When plotted, it gives a bell-shaped curve that is symmetric
about the mean of the feature values.
15. Gaussian Naive Bayes classifier
• The updated table of prior probabilities for the Outlook feature is as
follows:
• The likelihood of the features is assumed to be Gaussian; hence, the
conditional probability is given by:
𝑃(𝑥𝑖|𝑦) = (1 / √(2𝜋𝜎𝑦²)) exp(−(𝑥𝑖 − 𝜇𝑦)² / (2𝜎𝑦²))
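A minimal sketch of this Gaussian likelihood in Python; in practice μ and σ are estimated per class from the training data, and the numbers below are illustrative assumptions:

import math

def gaussian_likelihood(x, mu, sigma):
    """P(x | class) under the assumption x ~ Normal(mu, sigma^2)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# Assumed class statistics for a continuous feature, e.g. temperature
print(gaussian_likelihood(x=66.0, mu=73.0, sigma=6.0))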
16. Summary
• Naïve Bayes is based on the independence assumption
• Training is very easy and fast; it just requires considering each attribute in each
class separately
• Testing is straightforward; it just involves looking up tables or calculating conditional
probabilities with normal distributions
• A popular generative model
• Performance is competitive with most state-of-the-art classifiers, even when the
independence assumption is violated
• Many successful applications, e.g., spam mail filtering
• Apart from classification, naïve Bayes can do more…
Editor's Notes
That is, the presence of one particular feature does not affect the others.
Hence it is called naïve.
Bayes’ Theorem finds the probability of an event occurring given the probability of another event that has already occurred. Bayes’ theorem is stated mathematically as the following equation:
𝑃(𝐶|𝑋) = 𝑃(𝑋|𝐶) 𝑃(𝐶) / 𝑃(𝑋)
where X and C are events and P(X) ≠ 0.
Basically, we are trying to find the probability of event C, given that event X is true. Event X is also termed the evidence.
P(C) is the prior probability of C (i.e., the probability of the event before the evidence is seen). The evidence is an attribute value of an unknown instance (here, event X).
P(C|X) is the posterior probability of C, i.e., the probability of the event after the evidence is seen.