2. Algorithm
• A Naïve Bayesian model is easy to build, with no complicated iterative parameter estimation, which makes it particularly useful for very large datasets.
• Despite its simplicity, the Naïve Bayesian classifier often does surprisingly well and is widely used.
Assume that there are n features in the dataset; then X = {x_1, x_2, …, x_n}.
3. Naïve Bayes - Details
• Bayes classification:
  P(C | X) ∝ P(X | C) P(C) = P(x_1, …, x_n | C) P(C)
  Difficulty: learning the joint probability P(x_1, …, x_n | C)
• Naïve Bayes classification: assume that all input features are conditionally independent given the class!
  P(x_1, x_2, …, x_n | C) = P(x_1 | C) P(x_2 | C) ⋯ P(x_n | C)
4. Naïve Bayes
• NB classification rule:
  For a given X = (x_1, x_2, …, x_n) and L classes c_1, c_2, …, c_L, the vector X is assigned to class c* when
  [P(x_1 | c*) ⋯ P(x_n | c*)] P(c*)  >  [P(x_1 | c) ⋯ P(x_n | c)] P(c),   c ≠ c*,  c ∈ {c_1, …, c_L}
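To make the rule concrete, here is a minimal Python sketch of the MAP decision under the independence assumption. The function and argument names are illustrative, not from the slides; `likelihood` stands for whatever per-feature model is used (lookup table, Gaussian, etc.).

```python
def naive_bayes_classify(x, classes, prior, likelihood):
    """Assign x to the class c* maximizing P(c) * prod_j P(x_j | c)."""
    best_class, best_score = None, -1.0
    for c in classes:
        score = prior[c]                      # P(c)
        for j, x_j in enumerate(x):
            score *= likelihood(j, x_j, c)    # P(x_j | c), independence assumption
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```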
5. Naïve Bayes
• Algorithm: Continuous-Valued Features
  – The conditional probability is often modeled with the normal (Gaussian) distribution:
    P̂(X_j | C = c_i) = 1 / (√(2π) σ_ji) · exp( −(X_j − μ_ji)² / (2 σ_ji²) )
    μ_ji : mean (average) of the feature values X_j of the examples for which C = c_i
    σ_ji : standard deviation of the feature values X_j of the examples for which C = c_i
  – Learning Phase: for X = (X_1, …, X_n) and C = c_1, …, c_L,
    Output: n × L normal distributions and P(C = c_i), i = 1, …, L
  – Test Phase: Given an unknown instance X′ = (a_1, …, a_n),
    • Instead of looking up tables, calculate the conditional probabilities with the normal distributions obtained in the learning phase
    • Apply the MAP rule to make a decision
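As a sketch, the class-conditional density above translates directly into code (the function name is illustrative):

```python
import math

def gaussian_likelihood(x, mean, variance):
    """Normal density N(x; mean, variance), used as the estimate of P(X_j = x | C = c_i)."""
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

print(gaussian_likelihood(0.0, 0.0, 1.0))  # ~0.3989, the standard normal density at its mean
```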
6. Example 3: Naïve Bayes Classifier with Continuous Attributes
• Problem: classify whether a given person is male or female based on the measured features. The features include: height, weight, and foot size.
Training: example training set below.

Sex (o/p class)   Height (ft)   Weight (lbs)   Foot size (inches)
male              6             180            12
male              5.92          190            11
male              5.58          170            12
male              5.92          165            10
female            5             100            6
female            5.5           150            8
female            5.42          130            7
female            5.75          150            9
7. Example 3
• Solution
• Phase 1: Training
• The classifier created from the training set using a Gaussian distribution assumption would be:
sex      mean (height)   variance (height)   mean (weight)   variance (weight)   mean (foot size)   variance (foot size)
male     5.855           3.50E-02            176.25          1.23E+02            11.25              9.17E-01
female   5.4175          9.72E-02            132.5           5.58E+02            7.5                1.67E+00
We have equiprobable classes from the dataset, so P(male)= P(female) = 0.5.
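The means and variances above can be reproduced with a short script; a sketch, assuming the sample variance (n − 1 in the denominator), which is what the tabulated values correspond to:

```python
from statistics import mean, variance   # variance() uses the sample (n - 1) form

train = {
    "male":   {"height": [6, 5.92, 5.58, 5.92],
               "weight": [180, 190, 170, 165],
               "foot":   [12, 11, 12, 10]},
    "female": {"height": [5, 5.5, 5.42, 5.75],
               "weight": [100, 150, 130, 150],
               "foot":   [6, 8, 7, 9]},
}

# (mean, variance) for every (class, feature) pair
params = {sex: {feat: (mean(vals), variance(vals)) for feat, vals in feats.items()}
          for sex, feats in train.items()}

print(params["male"]["height"])    # (5.855, 0.03503...)  -> 3.50E-02 in the table
print(params["female"]["weight"])  # (132.5, 558.33...)   -> 5.58E+02 in the table
```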
8. Example 3
• Phase 2: Testing
• Below is a sample X to be classified as a male or female.
sex           height (ft)   weight (lbs)   foot size (inches)
to identify   6             130            8
Solution:
X = {6, 130, 8}
Given this information, we wish to determine which is greater, p(male|X) or p(female|X).
p(male|X) = P(male)*P(height|male)*P(weight|male)*P(foot size|male) / evidence
p(female|X) = P(female)*P(height|female)*P(weight|female)*P(foot size|female) / evidence
9. Example 3
• The evidence (also termed normalizing constant) may be calculated
since the sum of the posteriors equals one.
• evidence = P(male)*P(height|male)*P(weight|male)*P(foot size|male) +
P(female)*P(height|female)*P(weight|female)*P(foot size|female)
• The evidence may be ignored since it is a positive constant that is the same for both classes. (Normal densities are always positive.)
10. Example 3
• We now determine the sex of the sample.
• P(male) = 0.5
• P(height|male) = 1.5789 (A probability density greater than 1 is OK; it is the area under the density curve, not the density itself, that must equal 1.)
• P(weight|male) = 5.9881e-06
• P(foot size|male) = 1.3112e-3
• numerator of p(male|X) = their product = 6.1984e-09
11. Example 3
• P(female) = 0.5
• P(height|female) = 2.2346e-1
• P(weight|female) = 1.6789e-2
• P(foot size|female) = 2.8669e-1
• numerator of p(female|X) = their product = 5.3778e-04
Result:
Since the posterior numerator of p(female|X) is greater than that of p(male|X), the sample is classified as female.
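The figures in this example can be reproduced with a short check; a sketch using the means and variances from the training table (values hard-coded here for clarity):

```python
import math

def gauss(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# (mean, variance) per feature, copied from the training table above
male   = {"height": (5.855, 3.5033e-02), "weight": (176.25, 1.2292e+02), "foot": (11.25, 9.1667e-01)}
female = {"height": (5.4175, 9.7225e-02), "weight": (132.5, 5.5833e+02), "foot": (7.5, 1.6667e+00)}
sample = {"height": 6, "weight": 130, "foot": 8}

def numerator(params, prior=0.5):
    p = prior
    for feat, value in sample.items():
        mu, var = params[feat]
        p *= gauss(value, mu, var)   # prior times product of Gaussian likelihoods
    return p

print(numerator(male))    # ~6.2e-09
print(numerator(female))  # ~5.4e-04  -> larger, so the sample is classified as female
```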
12. Naïve Bayes
• Algorithm: Discrete-Valued Features
  – Learning Phase: Given a training set S,
    For each target value c_i (c_i = c_1, …, c_L):
      P̂(C = c_i) ← estimate P(C = c_i) with examples in S;
    For every feature value x_jk of each feature X_j (j = 1, …, n; k = 1, …, N_j):
      P̂(X_j = x_jk | C = c_i) ← estimate P(X_j = x_jk | C = c_i) with examples in S;
    Output: conditional probability tables; for X_j, N_j × L elements
  – Test Phase: Given an unknown instance X′ = (a_1, …, a_n),
    Look up the tables to assign the label c* to X′ if
    [P̂(a_1 | c*) ⋯ P̂(a_n | c*)] P̂(c*)  >  [P̂(a_1 | c) ⋯ P̂(a_n | c)] P̂(c),   c ≠ c*,  c ∈ {c_1, …, c_L}
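Both estimation steps are plain relative-frequency counts; a minimal sketch (names are illustrative):

```python
from collections import Counter, defaultdict

def train_discrete_nb(examples, labels):
    """Estimate P(C = c_i) and P(X_j = x_jk | C = c_i) by relative frequencies."""
    class_counts = Counter(labels)
    priors = {c: n / len(labels) for c, n in class_counts.items()}

    counts = defaultdict(int)                 # counts[(j, value, c)]
    for x, c in zip(examples, labels):
        for j, value in enumerate(x):
            counts[(j, value, c)] += 1
    cond = {key: n / class_counts[key[2]]     # estimate of P(X_j = value | C = c)
            for key, n in counts.items()}
    return priors, cond
```

Note that unseen (feature value, class) pairs get probability zero in this plain-frequency sketch; in practice Laplace smoothing is often added.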
13. Example
• Example: Play Tennis
Given a new instance, predict its label
x’=(Outlook=Sunny, Temperature=Cool,
Humidity=High, Wind=Strong)
14. Example
• Learning Phase
Outlook Play=Yes Play=No
Sunny 2/9 3/5
Overcast 4/9 0/5
Rain 3/9 2/5
Temperature Play=Yes Play=No
Hot 2/9 2/5
Mild 4/9 2/5
Cool 3/9 1/5
Humidity Play=Yes Play=No
High 3/9 4/5
Normal 6/9 1/5
Wind Play=Yes Play=No
Strong 3/9 3/5
Weak 6/9 2/5
P(Play=Yes) = 9/14 P(Play=No) = 5/14
We have four features; for each one we calculate the conditional probability table above.
15. Example
• Test Phase
– Given a new instance, predict its label
x’=(Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
– Look up the tables obtained in the learning phase
– Decision making with the MAP rule
P(Outlook=Sunny|Play=No) = 3/5
P(Temperature=Cool|Play=No) = 1/5
P(Humidity=High|Play=No) = 4/5
P(Wind=Strong|Play=No) = 3/5
P(Play=No) = 5/14
P(Outlook=Sunny|Play=Yes) = 2/9
P(Temperature=Cool|Play=Yes) = 3/9
P(Humidity=High|Play=Yes) = 3/9
P(Wind=Strong|Play=Yes) = 3/9
P(Play=Yes) = 9/14
P(Yes|x’): [P(Sunny|Yes)P(Cool|Yes)P(High|Yes)P(Strong|Yes)]P(Play=Yes) = 0.0053
P(No|x’): [P(Sunny|No) P(Cool|No)P(High|No)P(Strong|No)]P(Play=No) = 0.0206
Since P(Yes|x’) < P(No|x’), we label x’ as “No”.
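The two scores can be reproduced directly from the tables above:

```python
# Posterior numerators for x' = (Sunny, Cool, High, Strong)
p_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)
p_no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)
print(round(p_yes, 4), round(p_no, 4))   # 0.0053 0.0206 -> predict "No"
```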
17. Example 2: Training dataset
age     income   student   credit_rating   buys_computer
<=30    high     no        fair            no
<=30    high     no        excellent       no
31…40   high     no        fair            yes
>40     medium   no        fair            yes
>40     low      yes       fair            yes
>40     low      yes       excellent       no
31…40   low      yes       excellent       yes
<=30    medium   no        fair            no
<=30    low      yes       fair            yes
>40     medium   yes       fair            yes
<=30    medium   yes       excellent       yes
31…40   medium   no        excellent       yes
31…40   high     yes       fair            yes
>40     medium   no        excellent       no

Classes:
C1: buys_computer = ‘yes’
C2: buys_computer = ‘no’

Data sample:
X = (age<=30, income=medium, student=yes, credit_rating=fair)
18. Naïve Bayesian Classifier: Example 2
• Compute P(X|Ci) for each class
P(age=“<30” | buys_computer=“yes”) = 2/9=0.222
P(age=“<30” | buys_computer=“no”) = 3/5 =0.6
P(income=“medium” | buys_computer=“yes”)= 4/9 =0.444
P(income=“medium” | buys_computer=“no”) = 2/5 = 0.4
P(student=“yes” | buys_computer=“yes”) = 6/9 =0.667
P(student=“yes” | buys_computer=“no”)= 1/5=0.2
P(credit_rating=“fair” | buys_computer=“yes”)=6/9=0.667
P(credit_rating=“fair” | buys_computer=“no”)=2/5=0.4
• X = (age<=30, income=medium, student=yes, credit_rating=fair)
• Priors: P(buys_computer=“yes”) = 9/14, P(buys_computer=“no”) = 5/14
P(X|Ci) :      P(X|buys_computer=“yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
               P(X|buys_computer=“no”)  = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
P(X|Ci)*P(Ci): P(X|buys_computer=“yes”) * P(buys_computer=“yes”) = 0.028
               P(X|buys_computer=“no”)  * P(buys_computer=“no”)  = 0.007
X belongs to class “buys_computer = yes”
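The same kind of quick check reproduces these products:

```python
# Posterior numerators for X = (age<=30, income=medium, student=yes, credit_rating=fair)
p_yes = 0.222 * 0.444 * 0.667 * 0.667 * (9/14)
p_no  = 0.6 * 0.4 * 0.2 * 0.4 * (5/14)
print(round(p_yes, 3), round(p_no, 3))   # 0.028 0.007 -> buys_computer = "yes"
```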
19. Summary
• Naïve Bayes: the conditional independence assumption
• Training is very easy and fast: it just requires considering each attribute in each class separately
• Testing is straightforward: just look up tables or calculate conditional probabilities with the estimated distributions
• A popular generative model
• Performance is competitive with most state-of-the-art classifiers, even when the independence assumption is violated
• Many successful applications, e.g., spam mail filtering
• A good candidate as a base learner in ensemble learning
• Apart from classification, naïve Bayes can do more…