SlideShare a Scribd company logo
1 of 15
Naive Bayes Classifier
Bayesian Methods
 Our focus this lecture
 Learning and classification methods based on
probability theory.
 Bayes theorem plays a critical role in probabilistic
learning and classification.
 Uses prior probability of each category given no
information about an item.
 Categorization produces a posterior probability
distribution over the possible categories given a
description of an item.
Basic Probability Formulas
 Product rule
 Sum rule
 Bayes theorem
 Theorem of total probability, if event Ai is
mutually exclusive and probability sum to 1
)
(
)
|
(
)
(
)
|
(
)
( A
P
A
B
P
B
P
B
A
P
B
A
P 


)
(
)
(
)
(
)
( B
A
P
B
P
A
P
B
A
P 







n
i
i
i A
P
A
B
P
B
P
1
)
(
)
|
(
)
(
)
(
)
(
)
|
(
)
|
(
D
P
h
P
h
D
P
D
h
P 
Bayes Theorem
 Given a hypothesis h and data D which bears on the
hypothesis:
 P(h): independent probability of h: prior probability
 P(D): independent probability of D
 P(D|h): conditional probability of D given h:
likelihood
 P(h|D): conditional probability of h given D: posterior
probability
)
(
)
(
)
|
(
)
|
(
D
P
h
P
h
D
P
D
h
P 
Does patient have cancer or not?
 A patient takes a lab test and the result comes back positive.
It is known that the test returns a correct positive result in
only 99% of the cases and a correct negative result in only
95% of the cases. Furthermore, only 0.03 of the entire
population has this disease.
1. What is the probability that this patient has cancer?
2. What is the probability that he does not have cancer?
3. What is the diagnosis?
Maximum A Posterior
 Based on Bayes Theorem, we can compute the
Maximum A Posterior (MAP) hypothesis for the data
 We are interested in the best hypothesis for some
space H given observed training data D.
)
|
(
argmax D
h
P
h
H
h
MAP


)
(
)
(
)
|
(
argmax
D
P
h
P
h
D
P
H
h

)
(
)
|
(
argmax h
P
h
D
P
H
h

H: set of all hypothesis.
Note that we can drop P(D) as the probability of the data is constant
(and independent of the hypothesis).
Maximum Likelihood
 Now assume that all hypotheses are equally
probable a priori, i.e., P(hi ) = P(hj ) for all hi,
hj belong to H.
 This is called assuming a uniform prior. It
simplifies computing the posterior:
 This hypothesis is called the maximum
likelihood hypothesis.
)
|
(
max
arg h
D
P
h
H
h
ML


Desirable Properties of Bayes Classifier
 Incrementality: with each training example,
the prior and the likelihood can be updated
dynamically: flexible and robust to errors.
 Combines prior knowledge and observed
data: prior probability of a hypothesis
multiplied with probability of the hypothesis
given the training data
 Probabilistic hypothesis: outputs not only a
classification, but a probability distribution
over all classes
Bayes Classifiers
Assumption: training set consists of instances of different classes
described cj as conjunctions of attributes values
Task: Classify a new instance d based on a tuple of attribute values
into one of the classes cj  C
Key idea: assign the most probable class using Bayes
Theorem.
MAP
c
)
,
,
,
|
(
argmax 2
1 n
j
C
c
MAP x
x
x
c
P
c
j



)
,
,
,
(
)
(
)
|
,
,
,
(
argmax
2
1
2
1
n
j
j
n
C
c x
x
x
P
c
P
c
x
x
x
P
j 



)
(
)
|
,
,
,
(
argmax 2
1 j
j
n
C
c
c
P
c
x
x
x
P
j



Parameters estimation
 P(cj)
 Can be estimated from the frequency of classes in the
training examples.
 P(x1,x2,…,xn|cj)
 O(|X|n•|C|) parameters
 Could only be estimated if a very, very large number of
training examples was available.
 Independence Assumption: attribute values are
conditionally independent given the target value: naïve
Bayes.


i
j
i
j
n c
x
P
c
x
x
x
P )
|
(
)
|
,
,
,
( 2
1 



i
j
i
j
C
c
NB c
x
P
c
P
c
j
)
|
(
)
(
max
arg
Properties
 Estimating instead of greatly
reduces the number of parameters (and the data
sparseness).
 The learning step in Naïve Bayes consists of
estimating and based on the
frequencies in the training data
 An unseen instance is classified by computing the
class that maximizes the posterior
 When conditioned independence is satisfied, Naïve
Bayes corresponds to MAP classification.
)
|
( j
i c
x
P )
|
,
,
,
( 2
1 j
n c
x
x
x
P 
)
( j
c
P
)
|
( j
i c
x
P
Example. ‘Play Tennis’ data
Day Outlook Temperature Humidity Wind Play
Tennis
Day1 Sunny Hot High Weak No
Day2 Sunny Hot High Strong No
Day3 Overcast Hot High Weak Yes
Day4 Rain Mild High Weak Yes
Day5 Rain Cool Normal Weak Yes
Day6 Rain Cool Normal Strong No
Day7 Overcast Cool Normal Strong Yes
Day8 Sunny Mild High Weak No
Day9 Sunny Cool Normal Weak Yes
Day10 Rain Mild Normal Weak Yes
Day11 Sunny Mild Normal Strong Yes
Day12 Overcast Mild High Strong Yes
Day13 Overcast Hot Normal Weak Yes
Day14 Rain Mild High Strong No
Question: For the day <sunny, cool, high, strong>, what’s
the play prediction?
Naive Bayes solution
Classify any new datum instance x=(a1,…aT) as:
 To do this based on training examples, we need to estimate the
parameters from the training examples:
 For each target value (hypothesis) h
 For each attribute value at of each datum instance
)
(
estimate
:
)
(
ˆ h
P
h
P 
)
|
(
estimate
:
)
|
(
ˆ h
a
P
h
a
P t
t 



t
t
h
h
Bayes
Naive h
a
P
h
P
h
P
h
P
h )
|
(
)
(
max
arg
)
|
(
)
(
max
arg x
Based on the examples in the table, classify the following datum x:
x=(Outl=Sunny, Temp=Cool, Hum=High, Wind=strong)
 That means: Play tennis or not?
 Working:
)
|
(
)
|
(
)
|
(
)
|
(
)
(
max
arg
)
|
(
)
(
max
arg
)
|
(
)
(
max
arg
]
,
[
]
,
[
]
,
[
h
strong
Wind
P
h
high
Humidity
P
h
cool
Temp
P
h
sunny
Outlook
P
h
P
h
a
P
h
P
h
P
h
P
h
no
yes
h
t
t
no
yes
h
no
yes
h
NB











x
no
x
PlayTennis
answer
no
strong
P
no
high
P
no
cool
P
no
sunny
P
no
P
yes
strong
P
yes
high
P
yes
cool
P
yes
sunny
P
yes
P
etc
no
PlayTennis
strong
Wind
P
yes
PlayTennis
strong
Wind
P
no
PlayTennis
P
yes
PlayTennis
P


















)
(
:
)
|
(
)
|
(
)
|
(
)
|
(
)
(
0053
.
0
)
|
(
)
|
(
)
|
(
)
|
(
)
(
.
60
.
0
5
/
3
)
|
(
33
.
0
9
/
3
)
|
(
36
.
0
14
/
5
)
(
64
.
0
14
/
9
)
(
0.0206
Underflow Prevention
 Multiplying lots of probabilities, which are
between 0 and 1 by definition, can result in
floating-point underflow.
 Since log(xy) = log(x) + log(y), it is better to
perform all computations by summing logs of
probabilities rather than multiplying
probabilities.
 Class with highest final un-normalized log
probability score is still the most probable.





positions
i
j
i
j
C
c
NB c
x
P
c
P
c )
|
(
log
)
(
log
argmax
j

More Related Content

Similar to bayesNaive algorithm in machine learning

Bayesian Learning- part of machine learning
Bayesian Learning-  part of machine learningBayesian Learning-  part of machine learning
Bayesian Learning- part of machine learningkensaleste
 
Bayesian Learning by Dr.C.R.Dhivyaa Kongu Engineering College
Bayesian Learning by Dr.C.R.Dhivyaa Kongu Engineering CollegeBayesian Learning by Dr.C.R.Dhivyaa Kongu Engineering College
Bayesian Learning by Dr.C.R.Dhivyaa Kongu Engineering CollegeDhivyaa C.R
 
Naive_hehe.pptx
Naive_hehe.pptxNaive_hehe.pptx
Naive_hehe.pptxMahimMajee
 
An introduction to Bayesian Statistics using Python
An introduction to Bayesian Statistics using PythonAn introduction to Bayesian Statistics using Python
An introduction to Bayesian Statistics using Pythonfreshdatabos
 
[系列活動] Machine Learning 機器學習課程
[系列活動] Machine Learning 機器學習課程[系列活動] Machine Learning 機器學習課程
[系列活動] Machine Learning 機器學習課程台灣資料科學年會
 
Introduction to Machine Learning Aristotelis Tsirigos
Introduction to Machine Learning Aristotelis Tsirigos Introduction to Machine Learning Aristotelis Tsirigos
Introduction to Machine Learning Aristotelis Tsirigos butest
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learningbutest
 
Unit-2 Bayes Decision Theory.pptx
Unit-2 Bayes Decision Theory.pptxUnit-2 Bayes Decision Theory.pptx
Unit-2 Bayes Decision Theory.pptxavinashBajpayee1
 
Course on Bayesian computational methods
Course on Bayesian computational methodsCourse on Bayesian computational methods
Course on Bayesian computational methodsChristian Robert
 
San Antonio short course, March 2010
San Antonio short course, March 2010San Antonio short course, March 2010
San Antonio short course, March 2010Christian Robert
 
Statistical Machine________ Learning.ppt
Statistical Machine________ Learning.pptStatistical Machine________ Learning.ppt
Statistical Machine________ Learning.pptSandeepGupta229023
 

Similar to bayesNaive algorithm in machine learning (20)

06 Machine Learning - Naive Bayes
06 Machine Learning - Naive Bayes06 Machine Learning - Naive Bayes
06 Machine Learning - Naive Bayes
 
Naive Bayes.pptx
Naive Bayes.pptxNaive Bayes.pptx
Naive Bayes.pptx
 
Bayesian Learning- part of machine learning
Bayesian Learning-  part of machine learningBayesian Learning-  part of machine learning
Bayesian Learning- part of machine learning
 
Bayesian Learning by Dr.C.R.Dhivyaa Kongu Engineering College
Bayesian Learning by Dr.C.R.Dhivyaa Kongu Engineering CollegeBayesian Learning by Dr.C.R.Dhivyaa Kongu Engineering College
Bayesian Learning by Dr.C.R.Dhivyaa Kongu Engineering College
 
UNIT II (7).pptx
UNIT II (7).pptxUNIT II (7).pptx
UNIT II (7).pptx
 
Naive_hehe.pptx
Naive_hehe.pptxNaive_hehe.pptx
Naive_hehe.pptx
 
Naive bayes
Naive bayesNaive bayes
Naive bayes
 
ppt
pptppt
ppt
 
Bayes Theorem.pdf
Bayes Theorem.pdfBayes Theorem.pdf
Bayes Theorem.pdf
 
Lecture10 - Naïve Bayes
Lecture10 - Naïve BayesLecture10 - Naïve Bayes
Lecture10 - Naïve Bayes
 
An introduction to Bayesian Statistics using Python
An introduction to Bayesian Statistics using PythonAn introduction to Bayesian Statistics using Python
An introduction to Bayesian Statistics using Python
 
[系列活動] Machine Learning 機器學習課程
[系列活動] Machine Learning 機器學習課程[系列活動] Machine Learning 機器學習課程
[系列活動] Machine Learning 機器學習課程
 
Supervised models
Supervised modelsSupervised models
Supervised models
 
Introduction to Machine Learning Aristotelis Tsirigos
Introduction to Machine Learning Aristotelis Tsirigos Introduction to Machine Learning Aristotelis Tsirigos
Introduction to Machine Learning Aristotelis Tsirigos
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 
Unit-2 Bayes Decision Theory.pptx
Unit-2 Bayes Decision Theory.pptxUnit-2 Bayes Decision Theory.pptx
Unit-2 Bayes Decision Theory.pptx
 
Naive.pdf
Naive.pdfNaive.pdf
Naive.pdf
 
Course on Bayesian computational methods
Course on Bayesian computational methodsCourse on Bayesian computational methods
Course on Bayesian computational methods
 
San Antonio short course, March 2010
San Antonio short course, March 2010San Antonio short course, March 2010
San Antonio short course, March 2010
 
Statistical Machine________ Learning.ppt
Statistical Machine________ Learning.pptStatistical Machine________ Learning.ppt
Statistical Machine________ Learning.ppt
 

Recently uploaded

Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 

Recently uploaded (20)

Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 

bayesNaive algorithm in machine learning

  • 2. Bayesian Methods  Our focus this lecture  Learning and classification methods based on probability theory.  Bayes theorem plays a critical role in probabilistic learning and classification.  Uses prior probability of each category given no information about an item.  Categorization produces a posterior probability distribution over the possible categories given a description of an item.
  • 3. Basic Probability Formulas  Product rule  Sum rule  Bayes theorem  Theorem of total probability, if event Ai is mutually exclusive and probability sum to 1 ) ( ) | ( ) ( ) | ( ) ( A P A B P B P B A P B A P    ) ( ) ( ) ( ) ( B A P B P A P B A P         n i i i A P A B P B P 1 ) ( ) | ( ) ( ) ( ) ( ) | ( ) | ( D P h P h D P D h P 
  • 4. Bayes Theorem  Given a hypothesis h and data D which bears on the hypothesis:  P(h): independent probability of h: prior probability  P(D): independent probability of D  P(D|h): conditional probability of D given h: likelihood  P(h|D): conditional probability of h given D: posterior probability ) ( ) ( ) | ( ) | ( D P h P h D P D h P 
  • 5. Does patient have cancer or not?  A patient takes a lab test and the result comes back positive. It is known that the test returns a correct positive result in only 99% of the cases and a correct negative result in only 95% of the cases. Furthermore, only 0.03 of the entire population has this disease. 1. What is the probability that this patient has cancer? 2. What is the probability that he does not have cancer? 3. What is the diagnosis?
  • 6. Maximum A Posterior  Based on Bayes Theorem, we can compute the Maximum A Posterior (MAP) hypothesis for the data  We are interested in the best hypothesis for some space H given observed training data D. ) | ( argmax D h P h H h MAP   ) ( ) ( ) | ( argmax D P h P h D P H h  ) ( ) | ( argmax h P h D P H h  H: set of all hypothesis. Note that we can drop P(D) as the probability of the data is constant (and independent of the hypothesis).
  • 7. Maximum Likelihood  Now assume that all hypotheses are equally probable a priori, i.e., P(hi ) = P(hj ) for all hi, hj belong to H.  This is called assuming a uniform prior. It simplifies computing the posterior:  This hypothesis is called the maximum likelihood hypothesis. ) | ( max arg h D P h H h ML  
  • 8. Desirable Properties of Bayes Classifier  Incrementality: with each training example, the prior and the likelihood can be updated dynamically: flexible and robust to errors.  Combines prior knowledge and observed data: prior probability of a hypothesis multiplied with probability of the hypothesis given the training data  Probabilistic hypothesis: outputs not only a classification, but a probability distribution over all classes
  • 9. Bayes Classifiers Assumption: training set consists of instances of different classes described cj as conjunctions of attributes values Task: Classify a new instance d based on a tuple of attribute values into one of the classes cj  C Key idea: assign the most probable class using Bayes Theorem. MAP c ) , , , | ( argmax 2 1 n j C c MAP x x x c P c j    ) , , , ( ) ( ) | , , , ( argmax 2 1 2 1 n j j n C c x x x P c P c x x x P j     ) ( ) | , , , ( argmax 2 1 j j n C c c P c x x x P j   
  • 10. Parameters estimation  P(cj)  Can be estimated from the frequency of classes in the training examples.  P(x1,x2,…,xn|cj)  O(|X|n•|C|) parameters  Could only be estimated if a very, very large number of training examples was available.  Independence Assumption: attribute values are conditionally independent given the target value: naïve Bayes.   i j i j n c x P c x x x P ) | ( ) | , , , ( 2 1     i j i j C c NB c x P c P c j ) | ( ) ( max arg
  • 11. Properties  Estimating instead of greatly reduces the number of parameters (and the data sparseness).  The learning step in Naïve Bayes consists of estimating and based on the frequencies in the training data  An unseen instance is classified by computing the class that maximizes the posterior  When conditioned independence is satisfied, Naïve Bayes corresponds to MAP classification. ) | ( j i c x P ) | , , , ( 2 1 j n c x x x P  ) ( j c P ) | ( j i c x P
  • 12. Example. ‘Play Tennis’ data Day Outlook Temperature Humidity Wind Play Tennis Day1 Sunny Hot High Weak No Day2 Sunny Hot High Strong No Day3 Overcast Hot High Weak Yes Day4 Rain Mild High Weak Yes Day5 Rain Cool Normal Weak Yes Day6 Rain Cool Normal Strong No Day7 Overcast Cool Normal Strong Yes Day8 Sunny Mild High Weak No Day9 Sunny Cool Normal Weak Yes Day10 Rain Mild Normal Weak Yes Day11 Sunny Mild Normal Strong Yes Day12 Overcast Mild High Strong Yes Day13 Overcast Hot Normal Weak Yes Day14 Rain Mild High Strong No Question: For the day <sunny, cool, high, strong>, what’s the play prediction?
  • 13. Naive Bayes solution Classify any new datum instance x=(a1,…aT) as:  To do this based on training examples, we need to estimate the parameters from the training examples:  For each target value (hypothesis) h  For each attribute value at of each datum instance ) ( estimate : ) ( ˆ h P h P  ) | ( estimate : ) | ( ˆ h a P h a P t t     t t h h Bayes Naive h a P h P h P h P h ) | ( ) ( max arg ) | ( ) ( max arg x
  • 14. Based on the examples in the table, classify the following datum x: x=(Outl=Sunny, Temp=Cool, Hum=High, Wind=strong)  That means: Play tennis or not?  Working: ) | ( ) | ( ) | ( ) | ( ) ( max arg ) | ( ) ( max arg ) | ( ) ( max arg ] , [ ] , [ ] , [ h strong Wind P h high Humidity P h cool Temp P h sunny Outlook P h P h a P h P h P h P h no yes h t t no yes h no yes h NB            x no x PlayTennis answer no strong P no high P no cool P no sunny P no P yes strong P yes high P yes cool P yes sunny P yes P etc no PlayTennis strong Wind P yes PlayTennis strong Wind P no PlayTennis P yes PlayTennis P                   ) ( : ) | ( ) | ( ) | ( ) | ( ) ( 0053 . 0 ) | ( ) | ( ) | ( ) | ( ) ( . 60 . 0 5 / 3 ) | ( 33 . 0 9 / 3 ) | ( 36 . 0 14 / 5 ) ( 64 . 0 14 / 9 ) ( 0.0206
  • 15. Underflow Prevention  Multiplying lots of probabilities, which are between 0 and 1 by definition, can result in floating-point underflow.  Since log(xy) = log(x) + log(y), it is better to perform all computations by summing logs of probabilities rather than multiplying probabilities.  Class with highest final un-normalized log probability score is still the most probable.      positions i j i j C c NB c x P c P c ) | ( log ) ( log argmax j