
A Short Intro to Naive Bayesian Classifiers


  1. Naïve Bayes Classifiers

     Andrew W. Moore
     Professor
     School of Computer Science
     Carnegie Mellon University
     www.cs.cmu.edu/~awm
     awm@cs.cmu.edu
     412-268-7599

     A short introduction to Naïve Bayes Classifiers. For more on the probability theory surrounding them, please see Andrew's lecture on Probability for Data Mining. These notes assume you have already met Bayesian networks.

     Note to other teachers and users of these slides: Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: http://www.cs.cmu.edu/~awm/tutorials . Comments and corrections gratefully received.

     Copyright © 2004, Andrew W. Moore
  2. A simple Bayes Net

     Structure: J is the parent of C, Z, and R.

     J: Person is a Junior
     C: Brought Coat to Classroom
     Z: Live in zipcode 15213
     R: Saw “Return of the King” more than once
  3. A simple Bayes Net

     What parameters are stored in the CPTs of this Bayes Net?

     J: Person is a Junior
     C: Brought Coat to Classroom
     Z: Live in zipcode 15213
     R: Saw “Return of the King” more than once
  4. A simple Bayes Net

     P(J) = ?
     P(C|J) = ?    P(Z|J) = ?    P(R|J) = ?
     P(C|~J) = ?   P(Z|~J) = ?   P(R|~J) = ?

     J: Person is a Junior
     C: Brought Coat to Classroom
     Z: Live in zipcode 15213
     R: Saw “Return of the King” more than once

     Suppose we have a database from 20 people who attended a lecture. How could we use that to estimate the values in this CPT?
  5. A simple Bayes Net

     P(J) = (# Juniors in the database) / (# people in the database)

     P(C|J) = (# people who are Juniors and brought a coat) / (# Juniors)

     P(C|~J) = (# coat-bringers who are not Juniors) / (# people who are not Juniors)

     ...and similarly for the Z and R entries.

     J: Person is a Junior
     C: Brought Coat to Classroom
     Z: Live in zipcode 15213
     R: Saw “Return of the King” more than once
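The count-based CPT estimates on slide 5 can be sketched in a few lines of Python. The mini-database below is invented for illustration (the 20-person database the slides mention is not shown); each record flags Junior, Coat, Zipcode-15213, and Return-of-the-King status.

```python
# Hypothetical 8-person database; fields: (junior, coat, zip_15213, saw_rotk).
# All values are made up for illustration.
records = [
    (True,  True,  False, True),
    (True,  False, True,  True),
    (True,  True,  True,  False),
    (False, False, False, False),
    (False, True,  False, False),
    (False, False, True,  True),
    (False, False, False, False),
    (True,  True,  False, True),
]

def frac(count, total):
    """Fraction of count over total, guarding against an empty denominator."""
    return count / total if total else 0.0

juniors = [r for r in records if r[0]]
non_juniors = [r for r in records if not r[0]]

# P(J): fraction of the database who are Juniors.
p_j = frac(len(juniors), len(records))
# P(C|J): fraction of Juniors who brought a coat.
p_c_given_j = frac(sum(r[1] for r in juniors), len(juniors))
# P(C|~J): fraction of non-Juniors who brought a coat.
p_c_given_nj = frac(sum(r[1] for r in non_juniors), len(non_juniors))
```

The Z and R entries are estimated the same way, indexing `r[2]` and `r[3]` instead of `r[1]`.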
  6. A Naïve Bayes Classifier

     Output attribute: J. Input attributes: C, Z, R.

     P(J) = ?
     P(C|J) = ?    P(Z|J) = ?    P(R|J) = ?
     P(C|~J) = ?   P(Z|~J) = ?   P(R|~J) = ?

     J: Person is a Junior
     C: Brought Coat to Classroom
     Z: Live in zipcode 15213
     R: Saw “Return of the King” more than once

     A new person shows up at class wearing an “I live right above the Manor Theater where I saw all the Lord of The Rings Movies every night” overcoat. What is the probability that they are a Junior?
  7. Naïve Bayes Classifier Inference

     P(J | C ∧ ¬Z ∧ R)
       = P(J ∧ C ∧ ¬Z ∧ R) / P(C ∧ ¬Z ∧ R)
       = P(J ∧ C ∧ ¬Z ∧ R) / [ P(J ∧ C ∧ ¬Z ∧ R) + P(¬J ∧ C ∧ ¬Z ∧ R) ]
       = P(C|J) P(¬Z|J) P(R|J) P(J) /
         [ P(C|J) P(¬Z|J) P(R|J) P(J) + P(C|¬J) P(¬Z|¬J) P(R|¬J) P(¬J) ]
  8. The General Case

     Y is the parent of X1, X2, …, Xm.

     1. Estimate P(Y=v) as the fraction of records with Y=v.
     2. Estimate P(Xi=u | Y=v) as the fraction of “Y=v” records that also have Xi=u.
     3. To predict the Y value given observations of all the Xi values, compute

        Y^predict = argmax_v P(Y=v | X1=u1, …, Xm=um)
  9. Naïve Bayes Classifier

     Y^predict = argmax_v P(Y=v | X1=u1, …, Xm=um)
               = argmax_v P(Y=v ∧ X1=u1 ∧ … ∧ Xm=um) / P(X1=u1 ∧ … ∧ Xm=um)
               = argmax_v P(X1=u1 ∧ … ∧ Xm=um | Y=v) P(Y=v) / P(X1=u1 ∧ … ∧ Xm=um)
               = argmax_v P(X1=u1 ∧ … ∧ Xm=um | Y=v) P(Y=v)

     Because of the structure of the Bayes Net:

     Y^predict = argmax_v P(Y=v) ∏_{j=1..m} P(Xj=uj | Y=v)
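The three-step recipe on slide 8 plus the argmax-of-products on slide 9 amount to only a few dozen lines for discrete attributes. This is a minimal sketch with invented toy data (no smoothing, so unseen values get probability zero), not the lecture's own code:

```python
from collections import Counter, defaultdict

class NaiveBayes:
    """Minimal discrete naive Bayes: count-based estimates, argmax prediction."""

    def fit(self, X, y):
        n = len(y)
        self.classes = sorted(set(y))
        self.class_count = Counter(y)
        # Step 1: P(Y=v) as the fraction of records with Y=v.
        self.prior = {v: self.class_count[v] / n for v in self.classes}
        # Step 2: counts for P(Xj=u | Y=v), keyed by (class, attribute index).
        self.cond = defaultdict(Counter)
        for row, v in zip(X, y):
            for j, u in enumerate(row):
                self.cond[(v, j)][u] += 1
        return self

    def predict(self, row):
        # Step 3: argmax over classes of P(Y=v) * prod_j P(Xj=uj | Y=v).
        best, best_p = None, -1.0
        for v in self.classes:
            p = self.prior[v]
            for j, u in enumerate(row):
                p *= self.cond[(v, j)][u] / self.class_count[v]
            if p > best_p:
                best, best_p = v, p
        return best

# Tiny invented example: two binary attributes, classes 'a' and 'b'.
X = [(1, 0), (1, 1), (0, 0), (0, 1), (1, 0)]
y = ['a', 'a', 'b', 'b', 'a']
nb = NaiveBayes().fit(X, y)
```

Here `nb.predict((1, 0))` returns `'a'` and `nb.predict((0, 1))` returns `'b'`, matching the class whose rows the query most resembles.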
  10. More Facts About Naïve Bayes Classifiers

      • Naïve Bayes Classifiers can be built with real-valued inputs*
      • Rather Technical Complaint: Bayes Classifiers don’t try to be maximally discriminative---they merely try to honestly model what’s going on*
      • Zero probabilities are painful for Joint and Naïve. A hack (justifiable with the magic words “Dirichlet Prior”) can help*.
      • Naïve Bayes is wonderfully cheap. And survives 10,000 attributes cheerfully!

      *See future Andrew Lectures
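The “Dirichlet Prior” hack mentioned in the zero-probabilities bullet is, in its simplest form, add-alpha (Laplace) smoothing of the count-based estimates. A one-line sketch (the function name and signature are mine, not from the slides):

```python
def smoothed_estimate(count, class_total, n_values, alpha=1.0):
    """Estimate P(X=u | Y=v) with a symmetric Dirichlet(alpha) prior.

    count:       number of Y=v records with X=u
    class_total: number of Y=v records
    n_values:    number of distinct values X can take
    alpha=1 gives classic Laplace smoothing; alpha=0 recovers the raw fraction.
    """
    return (count + alpha) / (class_total + alpha * n_values)
```

An attribute value never seen with a class now gets a small positive probability (e.g. 0 of 10 with two possible values becomes 1/12) instead of zeroing out the whole product.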
  11. What you should know

      • How to build a Bayes Classifier
      • How to predict with a Bayes Classifier
