Naïve Bayes

Essential Probability Concepts
• Marginalization:

   P(B) = Σ_{v ∈ values(A)} P(B ∧ A = v)

• Conditional Probability:

   P(A | B) = P(A ∧ B) / P(B)
   P(A ∧ B) = P(A | B) × P(B)

• Bayes' Rule:

   P(A | B) = P(B | A) × P(A) / P(B)

• Independence:

   A ⊥ B  ⟺  P(A ∧ B) = P(A) × P(B)  ⟺  P(A | B) = P(A)
   A ⊥ B | C  ⟺  P(A ∧ B | C) = P(A | C) × P(B | C)
Density Estimation

Recall the Joint Distribution...
                      alarm                     ¬alarm
            earthquake  ¬earthquake   earthquake  ¬earthquake
burglary       0.01        0.08          0.001       0.009
¬burglary      0.01        0.09          0.01        0.79
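The joint table above supports the marginalization rule directly. A minimal sketch (not from the slides) that stores the joint as a dict keyed by variable values and marginalizes by summation:

```python
# Joint distribution over (burglary, earthquake, alarm), taken from the table.
joint = {
    # (burglary, earthquake, alarm): probability
    (True,  True,  True):  0.01,
    (True,  False, True):  0.08,
    (True,  True,  False): 0.001,
    (True,  False, False): 0.009,
    (False, True,  True):  0.01,
    (False, False, True):  0.09,
    (False, True,  False): 0.01,
    (False, False, False): 0.79,
}

# Marginalization: P(burglary) = sum over all rows in which burglary holds.
p_burglary = sum(p for (b, e, a), p in joint.items() if b)
p_alarm = sum(p for (b, e, a), p in joint.items() if a)

print(round(p_burglary, 3))  # 0.1
print(round(p_alarm, 3))     # 0.19
```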
How Can We Obtain a Joint Distribution?

Option 1: Elicit it from an expert human

Option 2: Build it up from simpler probabilistic facts
• e.g., if we knew

   P(a) = 0.7     P(b | a) = 0.2     P(b | ¬a) = 0.1

  then we could compute P(a ∧ b)

Option 3: Learn it from data...

Based on slide by Andrew Moore
Learning a Joint Distribution

Step 1: Build a JD table for your attributes in which the probabilities are unspecified:

A B C  Prob
0 0 0   ?
0 0 1   ?
0 1 0   ?
0 1 1   ?
1 0 0   ?
1 0 1   ?
1 1 0   ?
1 1 1   ?

Step 2: Then, fill in each row with

   P̂(row) = (number of records matching row) / (total number of records)

A B C  Prob
0 0 0  0.30
0 0 1  0.05
0 1 0  0.10
0 1 1  0.05
1 0 0  0.05
1 0 1  0.10
1 1 0  0.25
1 1 1  0.10

(e.g., 0.25 is the fraction of all records in which A and B are true but C is false)

Slide © Andrew Moore
Example of Learning a Joint PD

This Joint PD was obtained by learning from three attributes in the UCI "Adult" Census Database [Kohavi 1995]

Slide © Andrew Moore
Density Estimation

• Our joint distribution learner is an example of something called Density Estimation
• A Density Estimator learns a mapping from a set of attributes to a probability

   Input Attributes → Density Estimator → Probability

Slide © Andrew Moore
Density Estimation

Compare it against the two other major kinds of models:

   Input Attributes → Regressor → Prediction of real-valued output
   Input Attributes → Classifier → Prediction of categorical output
   Input Attributes → Density Estimator → Probability

Slide © Andrew Moore
Evaluating Density Estimation

Test-set criterion for estimating performance on future data:

   Regressor → Prediction of real-valued output:  test set Accuracy
   Classifier → Prediction of categorical output:  test set Accuracy
   Density Estimator → Probability:  ?

Slide © Andrew Moore
Evaluating a Density Estimator

• Given a record x, a density estimator M can tell you how likely the record is:

   P̂(x | M)

• The density estimator can also tell you how likely the dataset is:

   P̂(dataset | M) = P̂(x1 ∧ x2 ∧ ... ∧ xn | M) = ∏_{i=1}^{n} P̂(xi | M)

  – Under the assumption that all records were independently generated from the Density Estimator's JD (that is, i.i.d.)

Slide by Andrew Moore
Example Small Dataset: Miles Per Gallon

From the UCI repository (thanks to Ross Quinlan)
• 192 records in the training set

mpg    modelyear  maker
good   75to78     asia
bad    70to74     america
bad    75to78     europe
bad    70to74     america
bad    70to74     america
bad    70to74     asia
bad    70to74     asia
bad    75to78     america
:      :          :
bad    70to74     america
good   79to83     america
bad    75to78     america
good   79to83     america
bad    75to78     america
good   79to83     america
good   79to83     america
bad    70to74     america
good   75to78     europe
bad    75to78     europe

Slide by Andrew Moore
On this dataset,

   P̂(dataset | M) = ∏_{i=1}^{n} P̂(xi | M) = 3.4 × 10^-203   (in this case)

Slide by Andrew Moore
Log Probabilities

• For decent-sized data sets, this product will underflow
• Therefore, since probabilities of datasets get so small, we usually use log probabilities

   log P̂(dataset | M) = log ∏_{i=1}^{n} P̂(xi | M) = Σ_{i=1}^{n} log P̂(xi | M)

Based on slide by Andrew Moore
On the same 192-record MPG training set,

   log P̂(dataset | M) = Σ_{i=1}^{n} log P̂(xi | M) = -466.19   (in this case)

Slide by Andrew Moore
Pros/Cons of the Joint Density Estimator

The Good News:
• We can learn a Density Estimator from data.
• Density estimators can do many good things...
  – Can sort the records by probability, and thus spot weird records (anomaly detection)
  – Can do inference
  – Ingredient for Bayes Classifiers (coming very soon...)

The Bad News:
• Density estimation by directly learning the joint is trivial, mindless, and dangerous

Slide by Andrew Moore
The Joint Density Estimator on a Test Set

• An independent test set with 196 cars has a much worse log-likelihood
  – Actually it's a billion quintillion quintillion quintillion quintillion times less likely
• Density estimators can overfit...
  ...and the full joint density estimator is the overfittest of them all!

Slide by Andrew Moore
Overfitting Density Estimators

   log P̂(dataset | M) = Σ_{i=1}^{n} log P̂(xi | M) = -∞  if for any i, P̂(xi | M) = 0

If this ever happens, the joint PDE learns there are certain combinations that are impossible

Slide by Andrew Moore; copyright © Andrew W. Moore
Curse of Dimensionality

Slide by Christopher Bishop
The Joint Density Estimator on a Test Set

• The only reason that the test set didn't score -∞ is that the code was hard-wired to always predict a probability of at least 1/10^20

We need Density Estimators that are less prone to overfitting...

Slide by Andrew Moore
The Naïve Bayes Classifier
Bayes' Rule

• Recall Bayes' Rule:

   P(hypothesis | evidence) = P(evidence | hypothesis) × P(hypothesis) / P(evidence)

• Equivalently, we can write:

   P(Y = yk | X = xi) = P(Y = yk) P(X = xi | Y = yk) / P(X = xi)

  where X is a random variable representing the evidence and Y is a random variable for the label

• This is actually short for:

   P(Y = yk | X = xi) = P(Y = yk) P(X1 = xi,1 ∧ ... ∧ Xd = xi,d | Y = yk) / P(X1 = xi,1 ∧ ... ∧ Xd = xi,d)

  where Xj denotes the random variable for the j-th feature
Naïve Bayes Classifier

Idea: Use the training data to estimate P(X | Y) and P(Y).
Then, use Bayes rule to infer P(Y | Xnew) for new data.

• Recall that estimating the full joint probability distribution is not practical:

   P(Y = yk | X = xi) = P(Y = yk) P(X1 = xi,1 ∧ ... ∧ Xd = xi,d | Y = yk) / P(X1 = xi,1 ∧ ... ∧ Xd = xi,d)

  – the prior P(Y = yk): easy to estimate from data
  – the class-conditional joint P(X1 = xi,1 ∧ ... ∧ Xd = xi,d | Y = yk): impractical, but necessary
  – the denominator P(X1 = xi,1 ∧ ... ∧ Xd = xi,d): unnecessary, as it turns out
Naïve Bayes Classifier

Problem: estimating the joint PD or CPD isn't practical
  – Severely overfits, as we saw before

However, if we make the assumption that the attributes are independent given the class label, estimation is easy!

   P(X1, X2, ..., Xd | Y) = ∏_{j=1}^{d} P(Xj | Y)

• In other words, we assume all attributes are conditionally independent given Y
• Often this assumption is violated in practice, but more on that later...
Training Naïve Bayes

Estimate P(Xj | Y) and P(Y) directly from the training data by counting!

Sky    Temp   Humid   Wind    Water   Forecast   Play?
sunny  warm   normal  strong  warm    same       yes
sunny  warm   high    strong  warm    same       yes
rainy  cold   high    strong  warm    change     no
sunny  warm   high    strong  cool    change     yes

P(play) = 3/4                      P(¬play) = 1/4
P(Sky = sunny | play) = 1          P(Sky = sunny | ¬play) = 0
P(Humid = high | play) = 2/3       P(Humid = high | ¬play) = 1
...                                ...
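The counting above can be sketched in a few lines. This is my own minimal implementation on the four-record play dataset from the slide (unsmoothed maximum-likelihood estimates; the helper names are illustrative, not from the slides):

```python
from collections import Counter

# The play dataset: (Sky, Temp, Humid, Wind, Water, Forecast, Play?)
data = [
    ("sunny", "warm", "normal", "strong", "warm", "same",   "yes"),
    ("sunny", "warm", "high",   "strong", "warm", "same",   "yes"),
    ("rainy", "cold", "high",   "strong", "warm", "change", "no"),
    ("sunny", "warm", "high",   "strong", "cool", "change", "yes"),
]

# P(Y): class counts / total records
label_counts = Counter(row[-1] for row in data)
p_y = {y: c / len(data) for y, c in label_counts.items()}

# Counts for P(Xj | Y): (attribute index, value, label) -> count
cond = Counter((j, row[j], row[-1]) for row in data for j in range(6))

def p_x_given_y(j, value, label):
    # Unsmoothed estimate: count within class / class size
    return cond[(j, value, label)] / label_counts[label]

print(p_y["yes"])                      # 0.75
print(p_x_given_y(0, "sunny", "yes"))  # 1.0
print(p_x_given_y(2, "high", "yes"))   # 2/3
```

Note that P(Sky = sunny | ¬play) comes out exactly 0, which is the zero-count problem the next slide addresses.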
Laplace Smoothing

• Notice that some probabilities estimated by counting might be zero
  – Possible overfitting!
• Fix by using Laplace smoothing:
  – Adds 1 to each count

   P(Xj = v | Y = yk) = (cv + 1) / (Σ_{v' ∈ values(Xj)} cv' + |values(Xj)|)

  where
  – cv is the count of training instances with a value of v for attribute j and class label yk
  – |values(Xj)| is the number of values Xj can take on
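The smoothed estimate translates directly to code. A sketch (function name and signature are my own):

```python
def laplace_estimate(counts, value, values):
    # counts: value -> count of training instances with that value for
    #         attribute j within class yk
    # values: all possible values of Xj
    c_v = counts.get(value, 0)
    total = sum(counts.get(v, 0) for v in values)
    return (c_v + 1) / (total + len(values))

# Sky within class play = yes from the slide: 3 sunny, 0 rainy.
print(laplace_estimate({"sunny": 3}, "sunny", ["sunny", "rainy"]))  # (3+1)/(3+2) = 0.8
print(laplace_estimate({"sunny": 3}, "rainy", ["sunny", "rainy"]))  # (0+1)/(3+2) = 0.2
```

Note that the unseen value "rainy" now gets probability 1/5 instead of 0, which matches the 4/5 and 1/5 entries on the following slides.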
Training Naïve Bayes with Laplace Smoothing

Estimate P(Xj | Y) and P(Y) directly from the training data by counting with Laplace smoothing:

Sky    Temp   Humid   Wind    Water   Forecast   Play?
sunny  warm   normal  strong  warm    same       yes
sunny  warm   high    strong  warm    same       yes
rainy  cold   high    strong  warm    change     no
sunny  warm   high    strong  cool    change     yes

P(play) = 3/4                      P(¬play) = 1/4
P(Sky = sunny | play) = 4/5        P(Sky = sunny | ¬play) = 1/3
P(Humid = high | play) = 3/5       P(Humid = high | ¬play) = 2/3
...                                ...
Using the Naïve Bayes Classifier

• Now, we have

   P(Y = yk | X = xi) = P(Y = yk) ∏_{j=1}^{d} P(Xj = xi,j | Y = yk) / P(X = xi)

  where the denominator P(X = xi) is constant for a given instance, and so irrelevant to our prediction
  – In practice, we use log-probabilities to prevent underflow

• To classify a new point x (where xj is the j-th attribute value of x):

   h(x) = arg max_{yk} P(Y = yk) ∏_{j=1}^{d} P(Xj = xj | Y = yk)
        = arg max_{yk} log P(Y = yk) + Σ_{j=1}^{d} log P(Xj = xj | Y = yk)
The Naïve Bayes Classifier Algorithm

• For each class label yk
  – Estimate P(Y = yk) from the data
  – For each value xi,j of each attribute Xj
    • Estimate P(Xj = xi,j | Y = yk)
• Classify a new point via:

   h(x) = arg max_{yk} log P(Y = yk) + Σ_{j=1}^{d} log P(Xj = xj | Y = yk)

• In practice, the independence assumption often doesn't hold, but Naïve Bayes performs very well despite it
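The whole classification rule fits in a short function. A sketch using the smoothed tables from the running play example (the dict layout is my own choice, not from the slides):

```python
import math

# P(Y) and the relevant P(Xj | Y) entries from the Laplace-smoothed tables.
prior = {"yes": 3/4, "no": 1/4}
cpt = {  # cpt[attribute][(value, label)] -> P(Xj = value | Y = label)
    "Sky":   {("rainy", "yes"): 1/5,  ("rainy", "no"): 2/3},
    "Temp":  {("warm", "yes"): 4/5,   ("warm", "no"): 1/3},
    "Humid": {("normal", "yes"): 2/5, ("normal", "no"): 1/3},
}

def classify(x):
    # x: attribute name -> observed value
    # arg max over classes of log P(Y) + sum_j log P(Xj = xj | Y)
    scores = {}
    for label in prior:
        score = math.log(prior[label])
        for attr, value in x.items():
            score += math.log(cpt[attr][(value, label)])
        scores[label] = score
    return max(scores, key=scores.get)

print(classify({"Sky": "rainy", "Temp": "warm", "Humid": "normal"}))  # yes
```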
Computing Probabilities (Not Just Predicting Labels)

• The NB classifier gives predictions, not probabilities, because we ignore P(X) (the denominator in Bayes rule)
• Can produce probabilities by:
  – For each possible class label yk, compute

      P̃(Y = yk | X = x) = P(Y = yk) ∏_{j=1}^{d} P(Xj = xj | Y = yk)

    This is the numerator of Bayes rule, and is therefore off the true probability by a factor of α that makes the probabilities sum to 1
  – α is given by

      α = 1 / Σ_{k=1}^{#classes} P̃(Y = yk | X = x)

  – The class probability is then given by

      P(Y = yk | X = x) = α P̃(Y = yk | X = x)
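Normalizing by α is one line once the unnormalized scores are in hand. A sketch with toy unnormalized scores (invented for illustration):

```python
# Unnormalized Bayes-rule numerators P̃(Y = yk | X = x), one per class.
scores = {"yes": 0.048, "no": 1/54}

# alpha makes the values sum to 1
alpha = 1 / sum(scores.values())
probs = {y: alpha * s for y, s in scores.items()}

print(round(probs["yes"], 3))  # 0.722
print(round(probs["no"], 3))   # 0.278
```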
Naïve Bayes Applications

• Text classification
  – Which e-mails are spam?
  – Which e-mails are meeting notices?
  – Which author wrote a document?
• Classifying mental states
  – Figure: learning P(BrainActivity | WordCategory) for "People Words" vs. "Animal Words"
  – Pairwise classification accuracy: 85%
The Naïve Bayes Graphical Model

• Nodes denote random variables
• Edges denote dependency
• Each node has an associated conditional probability table (CPT), conditioned upon its parents

   Label (hypothesis):      Y, with CPT P(Y)
   Attributes (evidence):   X1, ..., Xi, ..., Xd, each a child of Y, with CPTs P(X1 | Y), ..., P(Xi | Y), ..., P(Xd | Y)
Example NB Graphical Model

Data:

Sky    Temp   Humid   Play?
sunny  warm   normal  yes
sunny  warm   high    yes
rainy  cold   high    no
sunny  warm   high    yes

Graph: Play → Sky, Temp, Humid

CPTs estimated from the data (the conditionals use Laplace smoothing):

Play?  P(Play)
yes    3/4
no     1/4

Sky    Play?  P(Sky | Play)
sunny  yes    4/5
rainy  yes    1/5
sunny  no     1/3
rainy  no     2/3

Temp   Play?  P(Temp | Play)
warm   yes    4/5
cold   yes    1/5
warm   no     1/3
cold   no     2/3

Humid  Play?  P(Humid | Play)
high   yes    3/5
norm   yes    2/5
high   no     2/3
norm   no     1/3
Example NB Graphical Model

• Some redundancies in the CPTs can be eliminated: each attribute's rows sum to 1 within a class, so for a two-valued attribute one row per class determines the other
Example Using NB for Classification

Goal: Predict label for x = (rainy, warm, normal)

57

[Graphical model: Play → Sky, Temp, Humid]

Play?   P(Play)
yes     3/4
no      1/4

Sky     Play?   P(Sky | Play)
sunny   yes     4/5
rainy   yes     1/5
sunny   no      1/3
rainy   no      2/3

Temp    Play?   P(Temp | Play)
warm    yes     4/5
cold    yes     1/5
warm    no      1/3
cold    no      2/3

Humid   Play?   P(Humid | Play)
high    yes     3/5
norm    yes     2/5
high    no      2/3
norm    no      1/3

h(x) = \arg\max_{y_k} \Big[ \log P(Y = y_k) + \sum_{j=1}^{d} \log P(X_j = x_j \mid Y = y_k) \Big]
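The argmax decision rule can be run directly against the tables. A small sketch that hard-codes the slide's CPT values (the function and variable names are illustrative):

```python
import math

# CPT values read off the slide's tables (Laplace-smoothed conditionals).
prior = {"yes": 3/4, "no": 1/4}
cpt = {
    "Sky":   {("sunny", "yes"): 4/5, ("rainy", "yes"): 1/5,
              ("sunny", "no"):  1/3, ("rainy", "no"):  2/3},
    "Temp":  {("warm", "yes"): 4/5, ("cold", "yes"): 1/5,
              ("warm", "no"):  1/3, ("cold", "no"):  2/3},
    "Humid": {("high", "yes"): 3/5, ("norm", "yes"): 2/5,
              ("high", "no"):  2/3, ("norm", "no"):  1/3},
}

def h(x):
    """argmax_y  log P(Y = y) + sum_j log P(X_j = x_j | Y = y)."""
    scores = {}
    for y in prior:
        s = math.log(prior[y])
        for attr, v in zip(("Sky", "Temp", "Humid"), x):
            s += math.log(cpt[attr][(v, y)])  # log-space sum avoids underflow
        scores[y] = s
    return max(scores, key=scores.get)

print(h(("rainy", "warm", "norm")))  # prints: yes
```

Working in log space matches the slide's formula and, as noted earlier in the deck, prevents underflow when there are many attributes.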
Example Using NB for Classification

Predict label for:
x = (rainy, warm, normal)

58

[Graphical model: Play → Sky, Temp, Humid]

Play?   P(Play)
yes     3/4
no      1/4

Sky     Play?   P(Sky | Play)
sunny   yes     4/5
rainy   yes     1/5
sunny   no      1/3
rainy   no      2/3

Temp    Play?   P(Temp | Play)
warm    yes     4/5
cold    yes     1/5
warm    no      1/3
cold    no      2/3

Humid   Play?   P(Humid | Play)
high    yes     3/5
norm    yes     2/5
high    no      2/3
norm    no      1/3

P(yes) · P(rainy | yes) · P(warm | yes) · P(norm | yes) = 3/4 · 1/5 · 4/5 · 2/5 = 0.048
P(no) · P(rainy | no) · P(warm | no) · P(norm | no) = 1/4 · 2/3 · 1/3 · 1/3 ≈ 0.019

0.048 > 0.019, so predict PLAY
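The prediction compares two unnormalized class scores; as the deck notes, dividing by their sum (the factor α) turns them into true probabilities. A hypothetical sketch for this query point:

```python
# Unnormalized scores = numerator of Bayes' rule for each class,
# using the CPT values from the slide's tables.
score = {
    "yes": (3/4) * (1/5) * (4/5) * (2/5),  # P(yes)·P(rainy|yes)·P(warm|yes)·P(norm|yes)
    "no":  (1/4) * (2/3) * (1/3) * (1/3),  # P(no)·P(rainy|no)·P(warm|no)·P(norm|no)
}

# alpha makes the probabilities sum to 1 (we never needed P(X) itself).
alpha = 1 / sum(score.values())
posterior = {y: alpha * s for y, s in score.items()}
print(posterior)  # yes ≈ 0.72, no ≈ 0.28 → predict PLAY
```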
	
  
Naïve Bayes Summary

Advantages:
• Fast to train (single scan through the data)
• Fast to classify
• Not sensitive to irrelevant features
• Handles real and discrete data
• Handles streaming data well

Disadvantages:
• Assumes independence of features

59

Slide by Eamonn Keogh

09_NaiveBayes.pdf

  • 2. Essen,al  Probability  Concepts   • Marginaliza,on:     •  Condi,onal  Probability:       •  Bayes’  Rule:     • Independence:   2   P(A | B) = P(A ^ B) P(B) P(A ^ B) = P(A | B) ⇥ P(B) A? ?B $ P(A ^ B) = P(A) ⇥ P(B) $ P(A | B) = P(A) P(A | B) = P(B | A) ⇥ P(A) P(B) A? ?B | C $ P(A ^ B | C) = P(A | C) ⇥ P(B | C) P(B) = X v2values(A) P(B ^ A = v)
  • 4. Recall  the  Joint  Distribu,on...   4   alarm ¬alarm earthquake ¬earthquake earthquake ¬earthquake burglary 0.01 0.08 0.001 0.009 ¬burglary 0.01 0.09 0.01 0.79
  • 5. How  Can  We  Obtain  a  Joint  Distribu,on?   Op#on  1:    Elicit  it  from  an  expert  human   Op#on  2:    Build  it  up  from  simpler  probabilis,c  facts   • e.g,  if  we  knew        P(a) = 0.7 ! !P(b|a) = 0.2 ! ! P(b|¬a) = 0.1"          then,  we  could  compute  P(a ∧ b)" Op#on  3:    Learn  it  from  data...     5   Based  on  slide  by  Andrew  Moore  
  • 6. Learning  a  Joint  Distribu,on   Build  a  JD  table  for  your   aTributes  in  which  the   probabili,es  are  unspecified   Then,  fill  in  each  row  with:   records of number total row matching records ) row ( ˆ = P A B C Prob 0 0 0 ? 0 0 1 ? 0 1 0 ? 0 1 1 ? 1 0 0 ? 1 0 1 ? 1 1 0 ? 1 1 1 ? A B C Prob 0 0 0 0.30 0 0 1 0.05 0 1 0 0.10 0 1 1 0.05 1 0 0 0.05 1 0 1 0.10 1 1 0 0.25 1 1 1 0.10 Frac,on  of  all  records  in  which   A  and  B  are  true  but  C  is  false   Step  1:   Step  2:   Slide  ©  Andrew  Moore  
  • 7. Example  of  Learning  a  Joint  PD   This  Joint  PD  was  obtained  by  learning  from  three   aTributes  in  the  UCI  “Adult”  Census  Database  [Kohavi  1995]   Slide  ©  Andrew  Moore  
  • 8. Density  Es,ma,on   • Our  joint  distribu,on  learner  is  an  example  of   something  called  Density  Es#ma#on   • A  Density  Es,mator  learns  a  mapping  from  a  set  of   aTributes  to  a  probability   Density   Es,mator   Probability   Input   ATributes   Slide  ©  Andrew  Moore  
  • 9. Regressor   Predic,on  of   real-­‐valued   output   Input   ATributes   Density  Es,ma,on   Compare  it  against  the  two  other  major  kinds  of  models:   Classifier   Predic,on  of   categorical   output   Input   ATributes   Density   Es,mator   Probability   Input   ATributes   Slide  ©  Andrew  Moore  
  • 10. Evalua,ng  Density  Es,ma,on   Test  set   Accuracy   ?   Test  set   Accuracy   Test-­‐set  criterion  for   es,ma,ng  performance   on  future  data   Regressor   Predic,on  of   real-­‐valued   output   Input   ATributes   Classifier   Predic,on  of   categorical   output   Input   ATributes   Density   Es,mator   Probability   Input   ATributes   Slide  ©  Andrew  Moore  
  • 11. • Given  a  record  x,  a  density  es,mator  M  can  tell  you   how  likely  the  record  is:   • The  density  es,mator  can  also  tell  you  how  likely  the   dataset  is:   – Under  the  assump,on  that  all  records  were  independently   generated  from  the  Density  Es,mator’s  JD        (that  is,  i.i.d.)     Evalua,ng  a  Density  Es,mator   P̂(x | M) t | M) = P̂(x1 ^ x2 ^ . . . ^ xn | M) = n Y i=1 P̂(xi | M) dataset   Slide  by  Andrew  Moore  
  • 12. Example  Small  Dataset:  Miles  Per  Gallon   From  the  UCI  repository  (thanks  to  Ross  Quinlan)   • 192  records  in  the  training  set   mpg modelyear maker good 75to78 asia bad 70to74 america bad 75to78 europe bad 70to74 america bad 70to74 america bad 70to74 asia bad 70to74 asia bad 75to78 america : : : : : : : : : bad 70to74 america good 79to83 america bad 75to78 america good 79to83 america bad 75to78 america good 79to83 america good 79to83 america bad 70to74 america good 75to78 europe bad 75to78 europe Slide  by  Andrew  Moore  
  • 13. Example  Small  Dataset:  Miles  Per  Gallon   From  the  UCI  repository  (thanks  to  Ross  Quinlan)   • 192  records  in  the  training  set   mpg modelyear maker good 75to78 asia bad 70to74 america bad 75to78 europe bad 70to74 america bad 70to74 america bad 70to74 asia bad 70to74 asia bad 75to78 america : : : : : : : : : bad 70to74 america good 79to83 america bad 75to78 america good 79to83 america bad 75to78 america good 79to83 america good 79to83 america bad 70to74 america good 75to78 europe bad 75to78 europe P̂(dataset | M) = n Y i=1 P̂(xi | M) = 3.4 ⇥ 10 203 (in this case) Slide  by  Andrew  Moore  
  • 14. Log  Probabili,es   • For  decent  sized  data  sets,  this  product  will  underflow   • Therefore,  since  probabili,es  of  datasets  get  so  small,   we  usually  use  log  probabili,es   log P̂(dataset | M) = log n Y i=1 P̂(xi | M) = n X i=1 log P̂(xi | M) P̂(dataset | M) = n Y i=1 P̂(xi | M) = 3.4 ⇥ 10 203 (in this case) Based  on  slide  by  Andrew  Moore  
  • 15. Example  Small  Dataset:  Miles  Per  Gallon   From  the  UCI  repository  (thanks  to  Ross  Quinlan)   • 192  records  in  the  training  set   mpg modelyear maker good 75to78 asia bad 70to74 america bad 75to78 europe bad 70to74 america bad 70to74 america bad 70to74 asia bad 70to74 asia bad 75to78 america : : : : : : : : : bad 70to74 america good 79to83 america bad 75to78 america good 79to83 america bad 75to78 america good 79to83 america good 79to83 america bad 70to74 america good 75to78 europe bad 75to78 europe Slide  by  Andrew  Moore   log P̂(dataset | M) = n X i=1 log P̂(xi | M) = 466.19 (in this case)
  • 16. Pros/Cons  of  the  Joint  Density  Es,mator   The  Good  News:   • We  can  learn  a  Density  Es,mator  from  data.   • Density  es,mators  can  do  many  good  things…   – Can  sort  the  records  by  probability,  and  thus  spot  weird   records  (anomaly  detec,on)   – Can  do  inference   – Ingredient  for  Bayes  Classifiers  (coming  very  soon...)     The  Bad  News:   • Density  es,ma,on  by  directly  learning  the  joint  is   trivial,  mindless,  and  dangerous   Slide  by  Andrew  Moore  
  • 17. The  Joint  Density  Es,mator  on  a  Test  Set   • An  independent  test  set  with  196  cars  has  a  much   worse  log-­‐likelihood   – Actually  it’s  a  billion  quin,llion  quin,llion  quin,llion   quin,llion  ,mes  less  likely   • Density  es,mators  can  overfit...    ...and  the  full  joint  density  es,mator  is  the   overfikest  of  them  all!   Slide  by  Andrew  Moore  
  • 18. Copyright  ©  Andrew  W.  Moore   Overfikng  Density  Es,mators   If  this  ever  happens,  the   joint  PDE  learns  there  are   certain  combina,ons  that   are  impossible   log P̂(dataset | M) = n X i=1 log P̂(xi | M) = 1 if for any i, P̂(xi | M) = 0 Slide  by  Andrew  Moore  
  • 19. Curse  of  Dimensionality   Slide  by  Christopher  Bishop  
  • 20. The  Joint  Density  Es,mator  on  a  Test  Set   • The  only  reason  that  the  test  set  didn’t  score  -­‐∞  is   that  the  code  was  hard-­‐wired  to  always  predict  a   probability  of  at  least  1/1020   We  need  Density  Es-mators  that  are  less  prone  to   overfi7ng...   Slide  by  Andrew  Moore  
  • 21. The  Naïve  Bayes  Classifier   21  
  • 22. Bayes’  Rule   • Recall  Baye’s  Rule:   • Equivalently,  we  can  write:   where  X  is  a  random  variable  represen,ng  the  evidence  and   Y  is  a  random  variable  for  the  label   • This  is  actually  short  for:    where  Xj  denotes  the  random  variable  for  the  j  th  feature   22   P(hypothesis | evidence) = P(evidence | hypothesis) ⇥ P(hypothesis) P(evidence) P(Y = yk | X = xi) = P(Y = yk)P(X1 = xi,1 ^ . . . ^ Xd = xi,d | Y = yk) P(X1 = xi,1 ^ . . . ^ Xd = xi,d) P(Y = yk | X = xi) = P(Y = yk)P(X = xi | Y = yk) P(X = xi)
  • 23. Naïve  Bayes  Classifier   Idea:    Use  the  training  data  to  es,mate      Then,  use  Bayes  rule  to  infer                                                  for  new  data     • Recall  that  es,ma,ng  the  joint  probability  distribu,on                                                                                                                is  not  prac,cal     23   P(X | Y ) P(Y ) and                                .   P(Y |Xnew) P(Y = yk | X = xi) = P(Y = yk)P(X1 = xi,1 ^ . . . ^ Xd = xi,d | Y = yk) P(X1 = xi,1 ^ . . . ^ Xd = xi,d) P(X1, X2, . . . , Xd | Y ) Easy  to  es,mate   from  data   Unnecessary,  as  it  turns  out   Imprac,cal,  but  necessary  
  • 24. Naïve  Bayes  Classifier   Problem:    es,ma,ng  the  joint  PD  or  CPD  isn’t  prac,cal   – Severely  overfits,  as  we  saw  before   However,  if  we  make  the  assump,on  that  the  aTributes   are  independent  given  the  class  label,  es,ma,on  is  easy!         • In  other  words,  we  assume  all  aTributes  are   condi-onally  independent  given  Y   • Onen  this  assump,on  is  violated  in  prac,ce,  but  more   on  that  later…   24   P(X1, X2, . . . , Xd | Y ) = d Y j=1 P(Xj | Y )
  • 25. Training  Naïve  Bayes   Es,mate                            and                              directly  from  the  training   data  by  coun,ng!           P(play) = ? ! ! ! ! ! ! P(¬play) = ?" P(Sky = sunny | play) = ? ! ! P(Sky = sunny | ¬play) = ?" P(Humid = high | play) = ? ! ! P(Humid = high | ¬play) = ?" ... ! ! ! ! ! ! ! ! ! ..." 25 Sky   Temp   Humid   Wind   Water   Forecast   Play?   sunny   warm   normal   strong   warm   same   yes   sunny   warm   high   strong   warm   same   yes   rainy   cold   high   strong   warm   change   no   sunny   warm   high   strong   cool   change   yes   P(Xj | Y ) P(Y )
  • 26. Training  Naïve  Bayes   Es,mate                            and                              directly  from  the  training   data  by  coun,ng!           P(play) = ? ! ! ! ! ! ! P(¬play) = ?" P(Sky = sunny | play) = ? ! ! P(Sky = sunny | ¬play) = ?" P(Humid = high | play) = ? ! ! P(Humid = high | ¬play) = ?" ... ! ! ! ! ! ! ! ! ! ..." 26 Sky   Temp   Humid   Wind   Water   Forecast   Play?   sunny   warm   normal   strong   warm   same   yes   sunny   warm   high   strong   warm   same   yes   rainy   cold   high   strong   warm   change   no   sunny   warm   high   strong   cool   change   yes   P(Xj | Y ) P(Y )
  • 27. Training  Naïve  Bayes   Es,mate                            and                              directly  from  the  training   data  by  coun,ng!           P(play) = 3/4 ! ! ! ! ! ! P(¬play) = 1/4" P(Sky = sunny | play) = ? ! ! P(Sky = sunny | ¬play) = ?" P(Humid = high | play) = ? ! ! P(Humid = high | ¬play) = ?" ... ! ! ! ! ! ! ! ! ! ..." 27 Sky   Temp   Humid   Wind   Water   Forecast   Play?   sunny   warm   normal   strong   warm   same   yes   sunny   warm   high   strong   warm   same   yes   rainy   cold   high   strong   warm   change   no   sunny   warm   high   strong   cool   change   yes   P(Xj | Y ) P(Y )
  • 28. Training  Naïve  Bayes   Es,mate                            and                              directly  from  the  training   data  by  coun,ng!           P(play) = 3/4 ! ! ! ! ! ! P(¬play) = 1/4" P(Sky = sunny | play) = ? ! ! P(Sky = sunny | ¬play) = ?" P(Humid = high | play) = ? ! ! P(Humid = high | ¬play) = ?" ... ! ! ! ! ! ! ! ! ! ..." 28 Sky   Temp   Humid   Wind   Water   Forecast   Play?   sunny   warm   normal   strong   warm   same   yes   sunny   warm   high   strong   warm   same   yes   rainy   cold   high   strong   warm   change   no   sunny   warm   high   strong   cool   change   yes   P(Xj | Y ) P(Y )
  • 29. Training  Naïve  Bayes   Es,mate                            and                              directly  from  the  training   data  by  coun,ng!           P(play) = 3/4 ! ! ! ! ! ! P(¬play) = 1/4" P(Sky = sunny | play) = 1 ! ! P(Sky = sunny | ¬play) = ?" P(Humid = high | play) = ? ! ! P(Humid = high | ¬play) = ?" ... ! ! ! ! ! ! ! ! ! ..." 29 Sky   Temp   Humid   Wind   Water   Forecast   Play?   sunny   warm   normal   strong   warm   same   yes   sunny   warm   high   strong   warm   same   yes   rainy   cold   high   strong   warm   change   no   sunny   warm   high   strong   cool   change   yes   P(Xj | Y ) P(Y )
  • 30. Training  Naïve  Bayes   Es,mate                            and                              directly  from  the  training   data  by  coun,ng!           P(play) = 3/4 ! ! ! ! ! ! P(¬play) = 1/4" P(Sky = sunny | play) = 1 ! ! P(Sky = sunny | ¬play) = ?" P(Humid = high | play) = ? ! ! P(Humid = high | ¬play) = ?" ... ! ! ! ! ! ! ! ! ! ..." 30 Sky   Temp   Humid   Wind   Water   Forecast   Play?   sunny   warm   normal   strong   warm   same   yes   sunny   warm   high   strong   warm   same   yes   rainy   cold   high   strong   warm   change   no   sunny   warm   high   strong   cool   change   yes   P(Xj | Y ) P(Y )
  • 31. Training  Naïve  Bayes   Es,mate                            and                              directly  from  the  training   data  by  coun,ng!           P(play) = 3/4 ! ! ! ! ! ! P(¬play) = 1/4" P(Sky = sunny | play) = 1 ! ! P(Sky = sunny | ¬play) = 0" P(Humid = high | play) = ? ! ! P(Humid = high | ¬play) = ?" ... ! ! ! ! ! ! ! ! ! ..." 31 Sky   Temp   Humid   Wind   Water   Forecast   Play?   sunny   warm   normal   strong   warm   same   yes   sunny   warm   high   strong   warm   same   yes   rainy   cold   high   strong   warm   change   no   sunny   warm   high   strong   cool   change   yes   P(Xj | Y ) P(Y )
  • 32. Training  Naïve  Bayes   Es,mate                            and                              directly  from  the  training   data  by  coun,ng!           P(play) = 3/4 ! ! ! ! ! ! P(¬play) = 1/4" P(Sky = sunny | play) = 1 ! ! P(Sky = sunny | ¬play) = 0" P(Humid = high | play) = ? ! ! P(Humid = high | ¬play) = ?" ... ! ! ! ! ! ! ! ! ! ..." 32 Sky   Temp   Humid   Wind   Water   Forecast   Play?   sunny   warm   normal   strong   warm   same   yes   sunny   warm   high   strong   warm   same   yes   rainy   cold   high   strong   warm   change   no   sunny   warm   high   strong   cool   change   yes   P(Xj | Y ) P(Y )
  • 33. Training  Naïve  Bayes   Es,mate                            and                              directly  from  the  training   data  by  coun,ng!           P(play) = 3/4 ! ! ! ! ! ! P(¬play) = 1/4" P(Sky = sunny | play) = 1 ! ! P(Sky = sunny | ¬play) = 0" P(Humid = high | play) = 2/3 ! P(Humid = high | ¬play) = ?" ... ! ! ! ! ! ! ! ! ! ..." 33 Sky   Temp   Humid   Wind   Water   Forecast   Play?   sunny   warm   normal   strong   warm   same   yes   sunny   warm   high   strong   warm   same   yes   rainy   cold   high   strong   warm   change   no   sunny   warm   high   strong   cool   change   yes   P(Xj | Y ) P(Y )
  • 34. Training  Naïve  Bayes   Es,mate                            and                              directly  from  the  training   data  by  coun,ng!           P(play) = 3/4 ! ! ! ! ! ! P(¬play) = 1/4" P(Sky = sunny | play) = 1 ! ! P(Sky = sunny | ¬play) = 0" P(Humid = high | play) = 2/3 ! P(Humid = high | ¬play) = ?" ... ! ! ! ! ! ! ! ! ! ..." 34 Sky   Temp   Humid   Wind   Water   Forecast   Play?   sunny   warm   normal   strong   warm   same   yes   sunny   warm   high   strong   warm   same   yes   rainy   cold   high   strong   warm   change   no   sunny   warm   high   strong   cool   change   yes   P(Xj | Y ) P(Y )
  • 35. Training  Naïve  Bayes   Es,mate                            and                              directly  from  the  training   data  by  coun,ng!           P(play) = 3/4 ! ! ! ! ! ! P(¬play) = 1/4" P(Sky = sunny | play) = 1 ! ! P(Sky = sunny | ¬play) = 0" P(Humid = high | play) = 2/3 ! P(Humid = high | ¬play) = 1" ... ! ! ! ! ! ! ! ! ! ..." 35 Sky   Temp   Humid   Wind   Water   Forecast   Play?   sunny   warm   normal   strong   warm   same   yes   sunny   warm   high   strong   warm   same   yes   rainy   cold   high   strong   warm   change   no   sunny   warm   high   strong   cool   change   yes   P(Xj | Y ) P(Y )
  • 36. Training  Naïve  Bayes   Es,mate                            and                              directly  from  the  training   data  by  coun,ng!           P(play) = 3/4 ! ! ! ! ! ! P(¬play) = 1/4" P(Sky = sunny | play) = 1 ! ! P(Sky = sunny | ¬play) = 0" P(Humid = high | play) = 2/3 ! P(Humid = high | ¬play) = 1" ... ! ! ! ! ! ! ! ! ! ..." " 36 Sky   Temp   Humid   Wind   Water   Forecast   Play?   sunny   warm   normal   strong   warm   same   yes   sunny   warm   high   strong   warm   same   yes   rainy   cold   high   strong   warm   change   no   sunny   warm   high   strong   cool   change   yes   P(Xj | Y ) P(Y )
  • 37. Laplace  Smoothing   • No,ce  that  some  probabili,es  es,mated  by  coun,ng   might  be  zero   – Possible  overfikng!   • Fix  by  using  Laplace  smoothing:   – Adds  1  to  each  count   where     – cv  is  the  count  of  training  instances  with  a  value  of  v  for   aTribute  j  and  class  label  yk! – |values(Xj)| is  the  number  of  values  Xj  can  take  on   37   P(Xj = v | Y = yk) = cv + 1 X v02values(Xj ) cv0 + |values(Xj)|
  • 38. Training  Naïve  Bayes  with  Laplace  Smoothing   Es,mate                            and                              directly  from  the  training   data  by  coun,ng  with  Laplace  smoothing:           P(play) = 3/4 ! ! ! ! ! ! P(¬play) = 1/4" P(Sky = sunny | play) = 4/5 ! P(Sky = sunny | ¬play) = ?" P(Humid = high | play) = ? ! ! P(Humid = high | ¬play) = ?" ... ! ! ! ! ! ! ! ! ! ..." 38 Sky   Temp   Humid   Wind   Water   Forecast   Play?   sunny   warm   normal   strong   warm   same   yes   sunny   warm   high   strong   warm   same   yes   rainy   cold   high   strong   warm   change   no   sunny   warm   high   strong   cool   change   yes   P(Xj | Y ) P(Y )
  • 39. Training  Naïve  Bayes  with  Laplace  Smoothing   Es,mate                            and                              directly  from  the  training   data  by  coun,ng  with  Laplace  smoothing:           P(play) = 3/4 ! ! ! ! ! ! P(¬play) = 1/4" P(Sky = sunny | play) = 4/5 ! P(Sky = sunny | ¬play) = 1/3" P(Humid = high | play) = ? ! ! P(Humid = high | ¬play) = ?" ... ! ! ! ! ! ! ! ! ! ..." 39 Sky   Temp   Humid   Wind   Water   Forecast   Play?   sunny   warm   normal   strong   warm   same   yes   sunny   warm   high   strong   warm   same   yes   rainy   cold   high   strong   warm   change   no   sunny   warm   high   strong   cool   change   yes   P(Xj | Y ) P(Y )
  • 40. Training  Naïve  Bayes  with  Laplace  Smoothing   Es,mate                            and                              directly  from  the  training   data  by  coun,ng  with  Laplace  smoothing:           P(play) = 3/4 ! ! ! ! ! ! P(¬play) = 1/4" P(Sky = sunny | play) = 4/5 ! P(Sky = sunny | ¬play) = 1/3" P(Humid = high | play) = 3/5 ! P(Humid = high | ¬play) = ?" ... ! ! ! ! ! ! ! ! ! ..." 40 Sky   Temp   Humid   Wind   Water   Forecast   Play?   sunny   warm   normal   strong   warm   same   yes   sunny   warm   high   strong   warm   same   yes   rainy   cold   high   strong   warm   change   no   sunny   warm   high   strong   cool   change   yes   P(Xj | Y ) P(Y )
  • 41. Training  Naïve  Bayes  with  Laplace  Smoothing   Es,mate                            and                              directly  from  the  training   data  by  coun,ng  with  Laplace  smoothing:           P(play) = 3/4 ! ! ! ! ! ! P(¬play) = 1/4" P(Sky = sunny | play) = 4/5 ! P(Sky = sunny | ¬play) = 1/3" P(Humid = high | play) = 3/5 ! P(Humid = high | ¬play) = 2/3" ... ! ! ! ! ! ! ! ! ! ..." 41 Sky   Temp   Humid   Wind   Water   Forecast   Play?   sunny   warm   normal   strong   warm   same   yes   sunny   warm   high   strong   warm   same   yes   rainy   cold   high   strong   warm   change   no   sunny   warm   high   strong   cool   change   yes   P(Xj | Y ) P(Y )
  • 42. Using  the  Naïve  Bayes  Classifier   • Now,  we  have   –   In  prac,ce,  we  use  log-­‐probabili,es  to  prevent  underflow   • To  classify  a  new  point  x,   42   P(Y = yk | X = xi) = P(Y = yk) Qd j=1 P(Xj = xi,j | Y = yk) P(X = xi) This  is  constant  for  a  given  instance,   and  so  irrelevant  to  our  predic,on   h(x) = arg max yk P(Y = yk) d Y j=1 P(Xj = xj | Y = yk) = arg max yk log P(Y = yk) + d X j=1 log P(Xj = xj | Y = yk) j  th  aTribute  value  of  x! h(x) = arg max yk P(Y = yk) d Y j=1 P(Xj = xj | Y = yk) = arg max yk log P(Y = yk) + d X j=1 log P(Xj = xj | Y = yk)
  • 43. 43 The  Naïve  Bayes  Classifier  Algorithm   • For  each  class  label  yk! – Es,mate  P(Y = yk) from  the  data   – For  each  value  xi,j    of  each  aTribute  Xi! • Es,mate  P(Xi = xi,j | Y = yk)" • Classify  a  new  point  via:   • In  prac,ce,  the  independence  assump,on  doesn’t   onen  hold  true,  but  Naïve  Bayes  performs  very  well   despite  it   h(x) = arg max yk log P(Y = yk) + d X j=1 log P(Xj = xj | Y = yk)
  • 44. Compu,ng  Probabili,es   (Not  Just  Predic,ng  Labels)   • NB  classifier  gives  predic,ons,  not  probabili,es,  because   we  ignore  P(X)    (the  denominator  in  Bayes  rule)   • Can  produce  probabili,es  by:   – For  each  possible  class  label  yk ,  compute     – α  is  given  by   – Class  probability  is  given  by   44   P̃(Y = yk | X = x) = P(Y = yk) d Y j=1 P(Xj = xj | Y = yk) This  is  the  numerator  of  Bayes  rule,  and  is   therefore  off  the  true  probability  by  a  factor   of  α  that  makes  probabili,es  sum  to  1   ↵ = 1 P#classes k=1 P̃(Y = yk | X = x) P(Y = yk | X = x) = ↵P̃(Y = yk | X = x)
  • 45. Naïve  Bayes  Applica,ons   • Text  classifica,on   – Which  e-­‐mails  are  spam?   – Which  e-­‐mails  are  mee,ng  no,ces?   – Which  author  wrote  a  document?   • Classifying  mental  states   People  Words   Animal  Words   Learning  P(BrainActivity | WordCategory)" Pairwise  Classifica,on   Accuracy:  85%  
  • 46. The  Naïve  Bayes  Graphical  Model   • Nodes  denote  random  variables   • Edges  denote  dependency   • Each  node  has  an  associated  condi,onal  probability   table  (CPT),  condi,oned  upon  its  parents   46   ATributes  (evidence)   Labels  (hypotheses)   … Y" X1" Xi! Xd! … P(Y)" P(X1 | Y)" P(Xi | Y)" P(Xd | Y)"
  • 47. Example  NB  Graphical  Model   47   … Play   Sky   Temp   Humid   … Sky   Temp   Humid   Play?   sunny   warm   normal   yes   sunny   warm   high   yes   rainy   cold   high   no   sunny   warm   high   yes   Data:  
  • 48. Example  NB  Graphical  Model   48   … Play   Sky   Temp   Humid   … Sky   Temp   Humid   Play?   sunny   warm   normal   yes   sunny   warm   high   yes   rainy   cold   high   no   sunny   warm   high   yes   Play?   P(Play)   yes   no   Data:  
  • 49. Example NB Graphical Model (same graph and data as slide 47)
P(Play): yes = 3/4, no = 1/4
  • 50. Example NB Graphical Model (same graph and data as slide 47)
P(Play): yes = 3/4, no = 1/4
P(Sky | Play): sunny|yes = ?, rainy|yes = ?, sunny|no = ?, rainy|no = ?
  • 51. Example NB Graphical Model (same graph and data as slide 47)
P(Play): yes = 3/4, no = 1/4
P(Sky | Play): sunny|yes = 4/5, rainy|yes = 1/5, sunny|no = 1/3, rainy|no = 2/3
  • 52. Example NB Graphical Model (same graph and data as slide 47)
P(Play): yes = 3/4, no = 1/4
P(Sky | Play): sunny|yes = 4/5, rainy|yes = 1/5, sunny|no = 1/3, rainy|no = 2/3
P(Temp | Play): warm|yes = ?, cold|yes = ?, warm|no = ?, cold|no = ?
  • 53. Example NB Graphical Model (same graph and data as slide 47)
P(Play): yes = 3/4, no = 1/4
P(Sky | Play): sunny|yes = 4/5, rainy|yes = 1/5, sunny|no = 1/3, rainy|no = 2/3
P(Temp | Play): warm|yes = 4/5, cold|yes = 1/5, warm|no = 1/3, cold|no = 2/3
  • 54. Example NB Graphical Model (same graph and data as slide 47)
P(Play): yes = 3/4, no = 1/4
P(Sky | Play): sunny|yes = 4/5, rainy|yes = 1/5, sunny|no = 1/3, rainy|no = 2/3
P(Temp | Play): warm|yes = 4/5, cold|yes = 1/5, warm|no = 1/3, cold|no = 2/3
P(Humid | Play): high|yes = ?, norm|yes = ?, high|no = ?, norm|no = ?
  • 55. Example NB Graphical Model (same graph and data as slide 47)
P(Play): yes = 3/4, no = 1/4
P(Sky | Play): sunny|yes = 4/5, rainy|yes = 1/5, sunny|no = 1/3, rainy|no = 2/3
P(Temp | Play): warm|yes = 4/5, cold|yes = 1/5, warm|no = 1/3, cold|no = 2/3
P(Humid | Play): high|yes = 3/5, norm|yes = 2/5, high|no = 2/3, norm|no = 1/3
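The CPT values on these slides are consistent with add-one (Laplace) smoothed counts over the four data rows shown; for instance, all three "yes" rows are sunny, and (3+1)/(3+2) = 4/5. That reading is my own inference, not something the slides state, but the sketch below reproduces every table entry under that assumption.

```python
from collections import Counter

# The four data rows from the example slides: (Sky, Temp, Humid, Play?).
rows = [("sunny", "warm", "normal", "yes"),
        ("sunny", "warm", "high",   "yes"),
        ("rainy", "cold", "high",   "no"),
        ("sunny", "warm", "high",   "yes")]

def cpt(attr_idx, values, label):
    """Add-one (Laplace) smoothed estimate of P(attr = v | Play = label)."""
    obs = Counter(r[attr_idx] for r in rows if r[3] == label)
    n = sum(obs.values())
    # One pseudo-count per possible value of the attribute.
    return {v: (obs[v] + 1) / (n + len(values)) for v in values}
```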
  • 56. Example NB Graphical Model
• There are some redundancies in the CPTs that can be eliminated: each column sums to 1, so e.g. P(rainy | yes) = 1 − P(sunny | yes)
P(Play): yes = 3/4, no = 1/4
P(Sky | Play): sunny|yes = 4/5, rainy|yes = 1/5, sunny|no = 1/3, rainy|no = 2/3
P(Temp | Play): warm|yes = 4/5, cold|yes = 1/5, warm|no = 1/3, cold|no = 2/3
P(Humid | Play): high|yes = 3/5, norm|yes = 2/5, high|no = 2/3, norm|no = 1/3
  • 57. Example Using NB for Classification
Goal: Predict the label for x = (rainy, warm, normal), using the CPTs from slide 56 and
h(x) = argmax_{yk} [ log P(Y = yk) + Σ_{j=1}^{d} log P(Xj = xj | Y = yk) ]
  • 58. Example Using NB for Classification
Predict the label for x = (rainy, warm, normal):
score(yes) ∝ 3/4 · 1/5 · 4/5 · 2/5 = 0.048
score(no) ∝ 1/4 · 2/3 · 1/3 · 1/3 ≈ 0.019
score(yes) > score(no), so predict PLAY
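The prediction on this slide can be checked numerically with the CPT values read off slide 56. This is my own verification sketch, not code from the deck:

```python
from math import log

# CPT values from the example: class prior, then the likelihood of each
# observed attribute value in x = (rainy, warm, normal) given Play.
prior = {"yes": 3/4, "no": 1/4}
lik = {
    "yes": [1/5, 4/5, 2/5],   # P(rainy|yes), P(warm|yes), P(normal|yes)
    "no":  [2/3, 1/3, 1/3],   # P(rainy|no),  P(warm|no),  P(normal|no)
}

# Log-space score for each class, as in the argmax formula.
score = {yk: log(prior[yk]) + sum(log(p) for p in lik[yk]) for yk in prior}
prediction = max(score, key=score.get)  # "yes", i.e. predict PLAY
```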
  • 59. Naïve Bayes Summary
Advantages:
• Fast to train (single scan through the data)
• Fast to classify
• Not sensitive to irrelevant features
• Handles real and discrete data
• Handles streaming data well
Disadvantages:
• Assumes independence of features
Slide by Eamonn Keogh