Expectation Propagation
Theory and Application

Dong Guo
Research Workshop 2013, Hulu Internal

See more details in:
http://dongguo.me/blog/2014/01/01/expectation-propagation/
http://dongguo.me/blog/2013/12/01/bayesian-ctr-prediction-for-bing/
Outline
• Overview
• Background
• Theory
• Applications
OVERVIEW
Bayesian Paradigm
• Infer the posterior distribution: Prior + Data → Posterior
• Make decisions based on the posterior

Note: the LDA figure is from Wikipedia, and the right figure is from the paper ‘Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft’s Bing Search Engine’.
Bayesian inference methods
• Exact inference
  – Belief propagation (exact on tree-structured graphs)
• Approximate inference
  – Stochastic (sampling)
  – Deterministic
    • Assumed density filtering
    • Expectation propagation
    • Variational Bayes
Message passing
• A form of communication used in multiple domains of computer science
  – Parallel computing (MPI)
  – Object-oriented programming
  – Inter-process communication
  – Bayesian inference
• A family of methods to infer posterior distributions
Expectation Propagation
• Belongs to the message passing family
• An approximate method (iteration is needed)
• Very popular in Bayesian inference, especially in graphical models
Researchers
• Thomas Minka
  – EP was proposed in his PhD thesis
• Kevin P. Murphy
  – Machine Learning: A Probabilistic Perspective
BACKGROUND
Background
• (Truncated) Gaussian
• Exponential family
• Graphical model
• Factor graph
• Belief propagation
• Moment matching
Gaussian and Truncated Gaussian
• Gaussian operations are the basis of EP inference
  – Gaussian + / × / ÷ Gaussian
  – Gaussian integrals
• The truncated Gaussian is used in many EP applications
• See details here
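Truncating a Gaussian and then summarizing it by its mean and variance is the core operation behind many EP applications. A minimal sketch in Python (parameter values are my own, for illustration): compute the moments of N(μ, σ²) restricted to x > 0 via the inverse Mills ratio, and cross-check against `scipy.stats.truncnorm`.

```python
import numpy as np
from scipy.stats import norm, truncnorm

mu, sigma = 0.5, 1.0                           # untruncated N(mu, sigma^2)
beta = (0.0 - mu) / sigma                      # truncation point x > 0, standardized
lam = norm.pdf(beta) / (1.0 - norm.cdf(beta))  # inverse Mills ratio
mean_t = mu + sigma * lam                      # mean shifts into the kept region
var_t = sigma**2 * (1.0 - lam * (lam - beta))  # variance always shrinks

# cross-check against scipy's truncnorm (bounds given in standard units)
rv = truncnorm(a=beta, b=np.inf, loc=mu, scale=sigma)
assert np.isclose(mean_t, rv.mean()) and np.isclose(var_t, rv.var())
```

Note that the truncated distribution is not Gaussian; reporting only (mean_t, var_t) is exactly the moment-matching step discussed later.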
  
Exponential family distribution
• Very good summary in Wikipedia

  q(z) = h(z) g(η) exp{η^T u(z)}

• Sufficient statistics of the Gaussian distribution: (x, x²)
• Typical distributions

Note: the above 4 figures are from Wikipedia.
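For the Gaussian, the exponential-family form can be checked numerically: with u(x) = (x, x²), the natural parameters are η = (μ/σ², −1/(2σ²)). A small sketch (the parameter values are my own):

```python
import numpy as np
from scipy.stats import norm

mu, sigma2 = 1.5, 0.8
# Natural parameters of N(mu, sigma2) with sufficient statistics u(x) = (x, x^2)
eta1 = mu / sigma2
eta2 = -1.0 / (2.0 * sigma2)
# Log normalizer A(eta) (the -ln g(eta) term): mu^2/(2 sigma2) + 0.5 ln(2 pi sigma2)
A = mu**2 / (2 * sigma2) + 0.5 * np.log(2 * np.pi * sigma2)

x = 0.7
log_q = eta1 * x + eta2 * x**2 - A             # eta^T u(x) - A(eta)
assert np.isclose(log_q, norm.logpdf(x, loc=mu, scale=np.sqrt(sigma2)))
```

Working in natural parameters makes multiplying and dividing Gaussians (as EP does) a matter of adding and subtracting η.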
  
Graphical Models
• Directed graph (Bayesian Network)

  P(x) = ∏_{k=1}^{K} p(x_k | pa_k)

• Undirected graph (Conditional Random Field)

[Figures: example directed and undirected graphs over variables x1–x4]
Factor graph
• Expresses the relations between variable nodes explicitly
  – A relation on an edge → a factor node
• Hides the difference between BN and CRF during inference
• Makes inference more intuitive

[Figures: a graph over x1–x4 and its factor-graph form with explicit factor nodes, e.g. fa, fc]
BELIEF PROPAGATION
Belief Propagation Overview
• An exact Bayesian method to infer marginal distributions (on tree-structured factor graphs)
  – ‘sum-product’ message passing
• Key components
  – Calculating the posterior distribution of a variable node
  – Two kinds of messages
Posterior distribution of a variable node
• Factor graph

  p(X) = ∏_{s∈ne(x)} F_s(x, X_s), for any variable x in the graph

  p(x) = Σ_{X\x} p(X) = Σ_{X\x} ∏_{s∈ne(x)} F_s(x, X_s) = ∏_{s∈ne(x)} Σ_{X_s} F_s(x, X_s) = ∏_{s∈ne(x)} μ_{f_s→x}(x)

  in which μ_{f_s→x}(x) = Σ_{X_s} F_s(x, X_s)

Note: the figure is from the book ‘Pattern Recognition and Machine Learning’.
Message: factor → variable node
• Factor graph

  μ_{f_s→x}(x) = Σ_{x_1} … Σ_{x_M} f_s(x, x_1, …, x_M) ∏_{x_m∈ne(f_s)\x} μ_{x_m→f_s}(x_m),

  in which {x_1, …, x_M} is the set of variables on which the factor f_s depends

Note: the figure is from the book ‘Pattern Recognition and Machine Learning’.
Message: variable → factor node
• Factor graph

  μ_{x_m→f_s}(x_m) = ∏_{l∈ne(x_m)\f_s} μ_{f_l→x_m}(x_m)

Summary: the posterior distribution is determined only by the factors!

Note: the figure is from the book ‘Pattern Recognition and Machine Learning’.
Whole steps of BP
• Steps to calculate the posterior distribution of a given variable node
  – Step 1: construct the factor graph
  – Step 2: treat the variable node as the root, and initialize the messages sent from the leaf nodes
  – Step 3: apply the message passing steps recursively until the root node receives messages from all of its neighbors
  – Step 4: get the marginal distribution by multiplying all incoming messages

Note: the figures are from the book ‘Pattern Recognition and Machine Learning’.
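The steps above can be sketched on a tiny binary chain x1 – f12 – x2 – f23 – x3, with x2 as the root; the factor tables here are made up for illustration, and the result is checked against brute-force marginalization.

```python
import numpy as np

# Chain x1 - f12 - x2 - f23 - x3, each variable binary; factor tables are arbitrary.
f12 = np.array([[1.0, 2.0], [3.0, 1.0]])     # f12[x1, x2]
f23 = np.array([[2.0, 1.0], [1.0, 4.0]])     # f23[x2, x3]

# Step 2: x2 is the root; leaves x1, x3 send uniform (all-ones) messages.
m_x1_to_f12 = np.ones(2)
m_x3_to_f23 = np.ones(2)
# Step 3: factor -> variable messages sum out the other variable (sum-product).
m_f12_to_x2 = (f12 * m_x1_to_f12[:, None]).sum(axis=0)
m_f23_to_x2 = (f23 * m_x3_to_f23[None, :]).sum(axis=1)
# Step 4: marginal at the root = product of incoming messages, normalized.
p_x2 = m_f12_to_x2 * m_f23_to_x2
p_x2 /= p_x2.sum()

# Brute-force check: p(x2) is proportional to sum over x1, x3 of f12[x1,x2] * f23[x2,x3]
brute = (f12[:, :, None] * f23[None, :, :]).sum(axis=(0, 2))
assert np.allclose(p_x2, brute / brute.sum())
```

On this chain BP reproduces the exact marginal, as it does on any tree.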
  
BP: example
• Infer the marginal distribution of x_3
• Infer the marginal distributions of all variables

Note: the figures are from the book ‘Pattern Recognition and Machine Learning’.
The posterior is sometimes intractable
• Example
  – Infer the mean of a Gaussian distribution:

    p(x | θ) = (1 − w) N(x | θ, I) + w N(x | 0, aI)
    p(θ) = N(θ | 0, bI)

  – Ad predictor

Note: the figure is from the book ‘Pattern Recognition and Machine Learning’.
Distribution Approximation

Approximate p(x) with q(x), which belongs to the exponential family,
such that q(x) = h(x) g(η) exp{η^T u(x)}.

  KL(p ∥ q) = −∫ p(x) ln (q(x)/p(x)) dx = −∫ p(x) ln q(x) dx + ∫ p(x) ln p(x) dx
            = −∫ p(x) ln g(η) dx − ∫ p(x) η^T u(x) dx + const
            = −ln g(η) − η^T E_{p(x)}[u(x)] + const,

where the const terms are independent of the natural parameter η.

Minimize KL(p ∥ q) by setting the gradient with respect to η to zero:
  ⇒ −∇ ln g(η) = E_{p(x)}[u(x)]
By leveraging formula (2.226) in PRML:
  ⇒ E_{q(x)}[u(x)] = −∇ ln g(η) = E_{p(x)}[u(x)]
Moment matching

It is called moment matching when q(x) is a Gaussian distribution; then u(x) = (x, x²)^T
  ⇒ ∫ q(x) x dx = ∫ p(x) x dx, and ∫ q(x) x² dx = ∫ p(x) x² dx
  ⇒ mean_{q(x)} = ∫ q(x) x dx = ∫ p(x) x dx = mean_{p(x)},
    variance_{q(x)} = ∫ q(x) x² dx − (mean_{q(x)})² = ∫ p(x) x² dx − (mean_{p(x)})² = variance_{p(x)}

• Moments of a distribution

  k-th moment: M_k = ∫_a^b x^k f(x) dx
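A quick numerical sketch of moment matching (the mixture parameters below are arbitrary): fit a single Gaussian q(x) to a two-component mixture by computing the first two moments on a grid, then compare them with the closed-form mixture moments.

```python
import numpy as np
from scipy.stats import norm

# Two-component Gaussian mixture p(x); parameters chosen arbitrarily.
w = np.array([0.7, 0.3])
mus = np.array([-1.0, 2.0])
sigmas = np.array([1.0, 0.5])

# First and second moments of p on a fine grid (tails beyond +-12 are negligible).
x = np.linspace(-12.0, 12.0, 240001)
dx = x[1] - x[0]
p = sum(wi * norm.pdf(x, m, s) for wi, m, s in zip(w, mus, sigmas))
m1 = (p * x).sum() * dx
m2 = (p * x**2).sum() * dx

# Moment-matched Gaussian q(x) = N(m1, m2 - m1^2)
mean_q, var_q = m1, m2 - m1**2

# Closed-form mixture moments for comparison
m1_exact = (w * mus).sum()
m2_exact = (w * (sigmas**2 + mus**2)).sum()
assert np.isclose(mean_q, m1_exact, atol=1e-6)
assert np.isclose(var_q, m2_exact - m1_exact**2, atol=1e-6)
```

The matched Gaussian is the KL(p ∥ q)-optimal member of the family, even though it is unimodal while p is not.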
EXPECTATION PROPAGATION
= Belief Propagation + Moment matching?
Key Idea
• Approximate each factor with a Gaussian distribution
• Approximate corresponding factor pairs one by one?
• Approximate each factor in turn in the context of all remaining factors (proposed by Minka)

  Refine factor f̃_j(θ) by ensuring that q^new(θ) ∝ f̃_j(θ) q^{\j}(θ) is close to f_j(θ) q^{\j}(θ),
  in which q^{\j}(θ) = q(θ) / f̃_j(θ).
EP: The detailed steps

1. Initialize all of the approximating factors f̃_i(θ).
2. Initialize the posterior approximation by setting q(θ) ∝ ∏_i f̃_i(θ).
3. Until convergence:
   (a) Choose a factor f̃_j(θ) to refine.
   (b) Remove f̃_j(θ) from the posterior by division: q^{\j}(θ) = q(θ) / f̃_j(θ).
   (c) Get the new posterior q^new(θ) by setting its sufficient statistics equal to those of f_j(θ) q^{\j}(θ) / z_j
       (i.e. minimize KL( f_j(θ) q^{\j}(θ) / z_j ∥ q^new(θ) )), in which z_j = ∫ f_j(θ) q^{\j}(θ) dθ.
   (d) Get the refined factor: f̃_j(θ) = K q^new(θ) / q^{\j}(θ).
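For Gaussian approximating factors, the divisions in steps (b) and (d) are just subtractions of natural parameters. A tiny sketch with made-up numbers:

```python
def gauss_div(m1, v1, m2, v2):
    """N(m1, v1) / N(m2, v2), up to a constant, is again Gaussian:
    subtract natural parameters (precision, precision-times-mean)."""
    tau = 1.0 / v1 - 1.0 / v2          # precisions subtract
    nu = m1 / v1 - m2 / v2             # precision-times-mean subtracts
    return nu / tau, 1.0 / tau         # back to (mean, variance)

# Removing a site N(1.0, 2.0) from a posterior N(0.5, 0.4) gives the cavity:
m_c, v_c = gauss_div(0.5, 0.4, 1.0, 2.0)
assert v_c > 0.4                        # removing information widens the Gaussian
```

Note the result can have negative precision if the site is sharper than the posterior; EP tolerates such "negative-variance" sites as long as the overall q stays proper.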
Example: The clutter problem
• Infer the mean of a Gaussian distribution
• Want to try MLE, but:

  p(x | θ) = (1 − w) N(x | θ, I) + w N(x | 0, aI)
  p(θ) = N(θ | 0, bI)

• Approximate with
  q(θ) = N(θ | m, vI), and each factor f̃_n(θ) = N(θ | m_n, v_n I)
  – Approximate the Gaussian mixture using a Gaussian

Note: the figure is from the book ‘Pattern Recognition and Machine Learning’.
Example: The clutter problem (2)
• Approximate a complex factor (e.g. a Gaussian mixture) with a Gaussian

  f_n(θ) is shown in blue, f̃_n(θ) in red, and q^{\n}(θ) in green.
  Remember that the variance of q^{\n}(θ) is usually very small, so f̃_n(θ) only needs to approximate f_n(θ) over a small range.

Note: the above 2 figures are from the book ‘Pattern Recognition and Machine Learning’.
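Putting the pieces together, here is a sketch of EP for the 1-D clutter problem, following steps (a)–(d) and the closed-form tilted-distribution moments from PRML §10.7.1; the data-generation settings (w, a, b, θ, n, seed) are my own choices for illustration.

```python
import numpy as np
from scipy.stats import norm

# 1-D clutter problem: p(x|theta) = (1-w) N(x|theta,1) + w N(x|0,a), theta ~ N(0,b).
w, a, b, theta_true, n = 0.2, 10.0, 100.0, 2.0, 50
rng = np.random.default_rng(0)
is_clutter = rng.random(n) < w
x = np.where(is_clutter, rng.normal(0.0, np.sqrt(a), n),
                         rng.normal(theta_true, 1.0, n))

# Gaussian sites in natural form: tau = 1/v (precision), nu = m/v.
tau_site, nu_site = np.zeros(n), np.zeros(n)
tau0, nu0 = 1.0 / b, 0.0                      # prior term

for _ in range(20):                           # EP sweeps until (rough) convergence
    for i in range(n):                        # (a) pick a site to refine
        # (b) cavity q^{\i} = q / site_i, by subtracting natural parameters
        tau_c = tau0 + tau_site.sum() - tau_site[i]
        nu_c = nu0 + nu_site.sum() - nu_site[i]
        m_c, v_c = nu_c / tau_c, 1.0 / tau_c
        # (c) moments of the tilted distribution f_i(theta) q^{\i}(theta)
        #     (closed form from PRML section 10.7.1)
        Z = (1 - w) * norm.pdf(x[i], m_c, np.sqrt(v_c + 1)) \
            + w * norm.pdf(x[i], 0.0, np.sqrt(a))
        rho = 1 - w * norm.pdf(x[i], 0.0, np.sqrt(a)) / Z   # P(point is not clutter)
        m_new = m_c + rho * v_c / (v_c + 1) * (x[i] - m_c)
        v_new = v_c - rho * v_c**2 / (v_c + 1) \
            + rho * (1 - rho) * v_c**2 * (x[i] - m_c)**2 / (v_c + 1)**2
        # (d) refined site = new posterior / cavity, again in natural parameters
        tau_site[i] = 1.0 / v_new - tau_c
        nu_site[i] = m_new / v_new - nu_c

post_mean = (nu0 + nu_site.sum()) / (tau0 + tau_site.sum())
print(post_mean)
```

With mostly inlier data, the posterior mean lands near θ because ρ downweights the clutter points automatically.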
  
Application: Bayesian CTR predictor for Bing
• See the details here
  – Inference, step by step
  – Making predictions
• Some insights
  – The variance of each feature decreases after every exposure
  – A sample with more active features has a bigger total variance
• Independence assumption across features
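The update behind this predictor can be sketched as an online Bayesian probit step over Gaussian weight beliefs, a simplified, single-example version of the rule in the Web-Scale Bayesian CTR paper; the function and variable names here are mine.

```python
import numpy as np
from scipy.stats import norm

def ctr_update(mu, var, active, y, beta=1.0):
    """One online Bayesian-probit update (sketch). mu, var: per-feature Gaussian
    beliefs; active: indices of features present in this sample; y: label in
    {-1, +1}; beta: probit noise scale."""
    s2 = beta**2 + var[active].sum()           # total score variance
    s = np.sqrt(s2)
    t = y * mu[active].sum() / s
    v = norm.pdf(t) / norm.cdf(t)              # truncated-Gaussian correction
    w = v * (v + t)                            # 0 < w < 1
    mu[active] += y * (var[active] / s) * v    # means move toward the label
    var[active] *= 1.0 - (var[active] / s2) * w  # variances shrink every exposure
    return mu, var

mu, var = np.zeros(5), np.ones(5)
mu, var = ctr_update(mu, var, active=np.array([0, 2]), y=+1)
assert mu[0] > 0 and var[0] < 1.0              # belief moved up, uncertainty reduced
assert mu[1] == 0 and var[1] == 1.0            # inactive features untouched
```

The variance update illustrates the first insight above, and the score variance s² growing with the number of active features illustrates the second.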
  
Experimentation
• The dataset is very inhomogeneous

• Performance

  Model   FTRL    OWLQN   Ad predictor
  AUC     0.638   0.641   0.639

  – Other metrics
• Pros: speed, low parameter-tuning cost, online learning support, interpretability, easy to add more factors
• Cons: no sparsity
• Code
Application: XBOX skill rating system
• See details in pp. 793–798 of Machine Learning: A Probabilistic Perspective

Note: the figure is from the paper ‘TrueSkill: A Bayesian Skill Rating System’.
Apply to all Bayesian models
• Infer.NET (Microsoft / Bishop)
  – A framework for running Bayesian inference in graphical models
  – Model-based machine learning
References
• Books
  – Chapters 2, 8, and 10 of Pattern Recognition and Machine Learning
  – Chapter 22 of Machine Learning: A Probabilistic Perspective
• Papers
  – A family of algorithms for approximate Bayesian inference
  – From belief propagation to expectation propagation
  – TrueSkill: A Bayesian Skill Rating System
  – Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft’s Bing Search Engine
• Roadmap for EP

More Related Content

What's hot

Context-aware Recommendation: A Quick View
Context-aware Recommendation: A Quick ViewContext-aware Recommendation: A Quick View
Context-aware Recommendation: A Quick ViewYONG ZHENG
 
Aspect Extraction Performance With Common Pattern of Dependency Relation in ...
Aspect Extraction Performance With Common Pattern of  Dependency Relation in ...Aspect Extraction Performance With Common Pattern of  Dependency Relation in ...
Aspect Extraction Performance With Common Pattern of Dependency Relation in ...Nurfadhlina Mohd Sharef
 
Machine translation survey - vol1
Machine translation survey  - vol1Machine translation survey  - vol1
Machine translation survey - vol1gohyunwoong
 
Machine Learning for Recommender Systems MLSS 2015 Sydney
Machine Learning for Recommender Systems MLSS 2015 SydneyMachine Learning for Recommender Systems MLSS 2015 Sydney
Machine Learning for Recommender Systems MLSS 2015 SydneyAlexandros Karatzoglou
 
IE: Named Entity Recognition (NER)
IE: Named Entity Recognition (NER)IE: Named Entity Recognition (NER)
IE: Named Entity Recognition (NER)Marina Santini
 
Probabilistic approach to reliable localization
Probabilistic approach to reliable localizationProbabilistic approach to reliable localization
Probabilistic approach to reliable localizationNaokiAkai2
 
Dimension reduction techniques[Feature Selection]
Dimension reduction techniques[Feature Selection]Dimension reduction techniques[Feature Selection]
Dimension reduction techniques[Feature Selection]AAKANKSHA JAIN
 
บทที่ 6 การไหลในทางน้ำเปิด Open Channel Flow + คลิป (Fluid Mechanics)
บทที่ 6 การไหลในทางน้ำเปิด Open Channel Flow + คลิป (Fluid Mechanics)บทที่ 6 การไหลในทางน้ำเปิด Open Channel Flow + คลิป (Fluid Mechanics)
บทที่ 6 การไหลในทางน้ำเปิด Open Channel Flow + คลิป (Fluid Mechanics)AJ. Tor วิศวกรรมแหล่งนํา้
 
บทที่ 2 การเคลื่อนที่แนวตรง
บทที่ 2 การเคลื่อนที่แนวตรงบทที่ 2 การเคลื่อนที่แนวตรง
บทที่ 2 การเคลื่อนที่แนวตรงThepsatri Rajabhat University
 
ของไหล 2
ของไหล 2ของไหล 2
ของไหล 2luanrit
 
รวมสูตรฟิสิกส์ ม.6
รวมสูตรฟิสิกส์ ม.6รวมสูตรฟิสิกส์ ม.6
รวมสูตรฟิสิกส์ ม.6Mu PPu
 
Word_Embedding.pptx
Word_Embedding.pptxWord_Embedding.pptx
Word_Embedding.pptxNameetDaga1
 
Covariance Matrix Adaptation Evolution Strategy (CMA-ES)
Covariance Matrix Adaptation Evolution Strategy (CMA-ES)Covariance Matrix Adaptation Evolution Strategy (CMA-ES)
Covariance Matrix Adaptation Evolution Strategy (CMA-ES)Hossein Abedi
 

What's hot (20)

Pumping lemma (1)
Pumping lemma (1)Pumping lemma (1)
Pumping lemma (1)
 
Session-Based Recommender Systems
Session-Based Recommender SystemsSession-Based Recommender Systems
Session-Based Recommender Systems
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Context-aware Recommendation: A Quick View
Context-aware Recommendation: A Quick ViewContext-aware Recommendation: A Quick View
Context-aware Recommendation: A Quick View
 
Aspect Extraction Performance With Common Pattern of Dependency Relation in ...
Aspect Extraction Performance With Common Pattern of  Dependency Relation in ...Aspect Extraction Performance With Common Pattern of  Dependency Relation in ...
Aspect Extraction Performance With Common Pattern of Dependency Relation in ...
 
Machine translation survey - vol1
Machine translation survey  - vol1Machine translation survey  - vol1
Machine translation survey - vol1
 
Machine Learning for Recommender Systems MLSS 2015 Sydney
Machine Learning for Recommender Systems MLSS 2015 SydneyMachine Learning for Recommender Systems MLSS 2015 Sydney
Machine Learning for Recommender Systems MLSS 2015 Sydney
 
IE: Named Entity Recognition (NER)
IE: Named Entity Recognition (NER)IE: Named Entity Recognition (NER)
IE: Named Entity Recognition (NER)
 
Probabilistic approach to reliable localization
Probabilistic approach to reliable localizationProbabilistic approach to reliable localization
Probabilistic approach to reliable localization
 
Dimension reduction techniques[Feature Selection]
Dimension reduction techniques[Feature Selection]Dimension reduction techniques[Feature Selection]
Dimension reduction techniques[Feature Selection]
 
Absa project
Absa projectAbsa project
Absa project
 
A short history of MCMC
A short history of MCMCA short history of MCMC
A short history of MCMC
 
บทที่ 6 การไหลในทางน้ำเปิด Open Channel Flow + คลิป (Fluid Mechanics)
บทที่ 6 การไหลในทางน้ำเปิด Open Channel Flow + คลิป (Fluid Mechanics)บทที่ 6 การไหลในทางน้ำเปิด Open Channel Flow + คลิป (Fluid Mechanics)
บทที่ 6 การไหลในทางน้ำเปิด Open Channel Flow + คลิป (Fluid Mechanics)
 
El analisis de sentimientos
El analisis de sentimientosEl analisis de sentimientos
El analisis de sentimientos
 
บทที่ 2 การเคลื่อนที่แนวตรง
บทที่ 2 การเคลื่อนที่แนวตรงบทที่ 2 การเคลื่อนที่แนวตรง
บทที่ 2 การเคลื่อนที่แนวตรง
 
ของไหล 2
ของไหล 2ของไหล 2
ของไหล 2
 
รวมสูตรฟิสิกส์ ม.6
รวมสูตรฟิสิกส์ ม.6รวมสูตรฟิสิกส์ ม.6
รวมสูตรฟิสิกส์ ม.6
 
Word_Embedding.pptx
Word_Embedding.pptxWord_Embedding.pptx
Word_Embedding.pptx
 
Link Analysis
Link AnalysisLink Analysis
Link Analysis
 
Covariance Matrix Adaptation Evolution Strategy (CMA-ES)
Covariance Matrix Adaptation Evolution Strategy (CMA-ES)Covariance Matrix Adaptation Evolution Strategy (CMA-ES)
Covariance Matrix Adaptation Evolution Strategy (CMA-ES)
 

Similar to Expectation propagation

ガウス過程入門
ガウス過程入門ガウス過程入門
ガウス過程入門ShoShimoyama
 
VAE-type Deep Generative Models
VAE-type Deep Generative ModelsVAE-type Deep Generative Models
VAE-type Deep Generative ModelsKenta Oono
 
Gaussian processing
Gaussian processingGaussian processing
Gaussian processing홍배 김
 
Kernel estimation(ref)
Kernel estimation(ref)Kernel estimation(ref)
Kernel estimation(ref)Zahra Amini
 
Probability cheatsheet
Probability cheatsheetProbability cheatsheet
Probability cheatsheetJoachim Gwoke
 
Delayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsDelayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsChristian Robert
 
Testing for mixtures by seeking components
Testing for mixtures by seeking componentsTesting for mixtures by seeking components
Testing for mixtures by seeking componentsChristian Robert
 
A nonlinear approximation of the Bayesian Update formula
A nonlinear approximation of the Bayesian Update formulaA nonlinear approximation of the Bayesian Update formula
A nonlinear approximation of the Bayesian Update formulaAlexander Litvinenko
 
pptx - Psuedo Random Generator for Halfspaces
pptx - Psuedo Random Generator for Halfspacespptx - Psuedo Random Generator for Halfspaces
pptx - Psuedo Random Generator for Halfspacesbutest
 
pptx - Psuedo Random Generator for Halfspaces
pptx - Psuedo Random Generator for Halfspacespptx - Psuedo Random Generator for Halfspaces
pptx - Psuedo Random Generator for Halfspacesbutest
 
Unbiased Bayes for Big Data
Unbiased Bayes for Big DataUnbiased Bayes for Big Data
Unbiased Bayes for Big DataChristian Robert
 
Deep Learning for Cyber Security
Deep Learning for Cyber SecurityDeep Learning for Cyber Security
Deep Learning for Cyber SecurityAltoros
 
proposal_pura
proposal_puraproposal_pura
proposal_puraErick Lin
 
Bayesian Deep Learning
Bayesian Deep LearningBayesian Deep Learning
Bayesian Deep LearningRayKim51
 
MLHEP 2015: Introductory Lecture #1
MLHEP 2015: Introductory Lecture #1MLHEP 2015: Introductory Lecture #1
MLHEP 2015: Introductory Lecture #1arogozhnikov
 
Information in the Weights
Information in the WeightsInformation in the Weights
Information in the WeightsMark Chang
 

Similar to Expectation propagation (20)

ガウス過程入門
ガウス過程入門ガウス過程入門
ガウス過程入門
 
talk MCMC & SMC 2004
talk MCMC & SMC 2004talk MCMC & SMC 2004
talk MCMC & SMC 2004
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
VAE-type Deep Generative Models
VAE-type Deep Generative ModelsVAE-type Deep Generative Models
VAE-type Deep Generative Models
 
Gaussian processing
Gaussian processingGaussian processing
Gaussian processing
 
Kernel estimation(ref)
Kernel estimation(ref)Kernel estimation(ref)
Kernel estimation(ref)
 
Probability cheatsheet
Probability cheatsheetProbability cheatsheet
Probability cheatsheet
 
Delayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsDelayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithms
 
Testing for mixtures by seeking components
Testing for mixtures by seeking componentsTesting for mixtures by seeking components
Testing for mixtures by seeking components
 
A nonlinear approximation of the Bayesian Update formula
A nonlinear approximation of the Bayesian Update formulaA nonlinear approximation of the Bayesian Update formula
A nonlinear approximation of the Bayesian Update formula
 
pptx - Psuedo Random Generator for Halfspaces
pptx - Psuedo Random Generator for Halfspacespptx - Psuedo Random Generator for Halfspaces
pptx - Psuedo Random Generator for Halfspaces
 
pptx - Psuedo Random Generator for Halfspaces
pptx - Psuedo Random Generator for Halfspacespptx - Psuedo Random Generator for Halfspaces
pptx - Psuedo Random Generator for Halfspaces
 
PhysicsSIG2008-01-Seneviratne
PhysicsSIG2008-01-SeneviratnePhysicsSIG2008-01-Seneviratne
PhysicsSIG2008-01-Seneviratne
 
Unbiased Bayes for Big Data
Unbiased Bayes for Big DataUnbiased Bayes for Big Data
Unbiased Bayes for Big Data
 
Deep Learning for Cyber Security
Deep Learning for Cyber SecurityDeep Learning for Cyber Security
Deep Learning for Cyber Security
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
proposal_pura
proposal_puraproposal_pura
proposal_pura
 
Bayesian Deep Learning
Bayesian Deep LearningBayesian Deep Learning
Bayesian Deep Learning
 
MLHEP 2015: Introductory Lecture #1
MLHEP 2015: Introductory Lecture #1MLHEP 2015: Introductory Lecture #1
MLHEP 2015: Introductory Lecture #1
 
Information in the Weights
Information in the WeightsInformation in the Weights
Information in the Weights
 

More from Dong Guo

Convex optimization methods
Convex optimization methodsConvex optimization methods
Convex optimization methodsDong Guo
 
AlphaGo zero
AlphaGo zeroAlphaGo zero
AlphaGo zeroDong Guo
 
DQN (Deep Q-Network)
DQN (Deep Q-Network)DQN (Deep Q-Network)
DQN (Deep Q-Network)Dong Guo
 
机器学习概述
机器学习概述机器学习概述
机器学习概述Dong Guo
 
Additive model and boosting tree
Additive model and boosting treeAdditive model and boosting tree
Additive model and boosting treeDong Guo
 
Feature selection
Feature selectionFeature selection
Feature selectionDong Guo
 
Logistic Regression
Logistic RegressionLogistic Regression
Logistic RegressionDong Guo
 
Machine learning Introduction
Machine learning IntroductionMachine learning Introduction
Machine learning IntroductionDong Guo
 

More from Dong Guo (8)

Convex optimization methods
Convex optimization methodsConvex optimization methods
Convex optimization methods
 
AlphaGo zero
AlphaGo zeroAlphaGo zero
AlphaGo zero
 
DQN (Deep Q-Network)
DQN (Deep Q-Network)DQN (Deep Q-Network)
DQN (Deep Q-Network)
 
机器学习概述
机器学习概述机器学习概述
机器学习概述
 
Additive model and boosting tree
Additive model and boosting treeAdditive model and boosting tree
Additive model and boosting tree
 
Feature selection
Feature selectionFeature selection
Feature selection
 
Logistic Regression
Logistic RegressionLogistic Regression
Logistic Regression
 
Machine learning Introduction
Machine learning IntroductionMachine learning Introduction
Machine learning Introduction
 

Recently uploaded

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 

Recently uploaded (20)

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Exploring the Future Potential of AI-Enabled Smartphone Processors

Expectation propagation

  • 1. Expectation Propagation: Theory and Application. Dong Guo, Research Workshop 2013, Hulu Internal. See more details in http://dongguo.me/blog/2014/01/01/expectation-propagation/ and http://dongguo.me/blog/2013/12/01/bayesian-ctr-prediction-for-bing/
  • 4. Bayesian Paradigm. Infer the posterior distribution: Prior + Data → Posterior → Make decision. Note: the figure of LDA is from Wikipedia, and the right figure is from the paper 'Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft's Bing Search Engine'.
  • 5. Bayesian inference methods. Exact inference: belief propagation. Approximate inference: stochastic (sampling) and deterministic; the deterministic family includes assumed density filtering, expectation propagation, and variational Bayes.
  • 6. Message passing. A form of communication used in multiple domains of computer science: parallel computing (MPI), object-oriented programming, inter-process communication, and Bayesian inference, where it names a family of methods for inferring posterior distributions.
  • 7. Expectation Propagation. Belongs to the message passing family; an approximate method (iteration is needed); very popular in Bayesian inference, especially in graphical models.
  • 8. Researchers. Thomas Minka: EP was proposed in his PhD thesis. Kevin P. Murphy: author of Machine Learning: A Probabilistic Perspective.
  • 10. Background: (truncated) Gaussian, exponential family, graphical models, factor graphs, belief propagation, moment matching.
  • 11. Gaussian and Truncated Gaussian. Gaussian operations are the basis for EP inference: addition, multiplication, and division of Gaussians, and Gaussian integrals. The truncated Gaussian is used in many EP applications. See details here.
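A minimal sketch of the Gaussian operations the slide refers to: written in natural-parameter form (precision 1/σ² and precision-mean μ/σ²), multiplying two Gaussian densities simply adds the parameters, and dividing subtracts them. The function names here are illustrative, not from the talk.

```python
def to_natural(mu, sigma2):
    """(mean, variance) -> (precision-mean, precision)."""
    return mu / sigma2, 1.0 / sigma2

def from_natural(rho, tau):
    """(precision-mean, precision) -> (mean, variance)."""
    return rho / tau, 1.0 / tau

def gaussian_multiply(m1, v1, m2, v2):
    """Product of two Gaussian densities is (proportional to) a
    Gaussian: natural parameters add."""
    r1, t1 = to_natural(m1, v1)
    r2, t2 = to_natural(m2, v2)
    return from_natural(r1 + r2, t1 + t2)

def gaussian_divide(m1, v1, m2, v2):
    """Ratio of two Gaussian densities (used in EP's remove-a-factor
    step): natural parameters subtract."""
    r1, t1 = to_natural(m1, v1)
    r2, t2 = to_natural(m2, v2)
    return from_natural(r1 - r2, t1 - t2)

# N(0, 1) * N(1, 1) -> N(0.5, 0.5)
m, v = gaussian_multiply(0.0, 1.0, 1.0, 1.0)
```

Division is the exact inverse of multiplication here, which is why EP can cleanly remove one approximating factor from the posterior and later put a refined version back.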
  • 12. Exponential family distribution. Very good summary on Wikipedia: q(z) = h(z) g(η) exp{η^T u(z)}. The sufficient statistics of the Gaussian distribution are (x, x²). Typical distributions are shown. Note: the 4 figures above are from Wikipedia.
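For the Gaussian case mentioned on the slide, the sufficient statistics u(x) = (x, x²) pair with natural parameters η = (μ/σ², −1/(2σ²)). A small round-trip sketch (helper names are mine):

```python
def natural_params(mu, sigma2):
    """Natural parameters eta of N(mu, sigma2) when the sufficient
    statistics are u(x) = (x, x^2)."""
    return (mu / sigma2, -0.5 / sigma2)

def mean_params(eta1, eta2):
    """Invert back to (mean, variance)."""
    sigma2 = -0.5 / eta2
    return (eta1 * sigma2, sigma2)

# round trip for N(2, 4)
e1, e2 = natural_params(2.0, 4.0)
mu, sigma2 = mean_params(e1, e2)
```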
  • 13. Graphical Models. Directed graph (Bayesian network): P(x) = ∏_{k=1}^{K} p(x_k | pa_k). Undirected graph (e.g. conditional random field).
  • 14. Factor graph. Expresses the relations between variable nodes explicitly: each relation on an edge becomes a factor node. This hides the difference between BNs and CRFs during inference and makes inference more intuitive.
  • 16. Belief Propagation Overview. An exact Bayesian method to infer marginal distributions via 'sum-product' message passing. Key components: calculating the posterior distribution of a variable node, and two kinds of messages.
  • 17. Posterior distribution of a variable node. In a (tree-structured) factor graph the joint factorizes around any variable x as p(X) = ∏_{s∈ne(x)} F_s(x, X_s), so p(x) = Σ_{X\x} p(X) = ∏_{s∈ne(x)} Σ_{X_s} F_s(x, X_s) = ∏_{s∈ne(x)} μ_{f_s→x}(x), in which μ_{f_s→x}(x) = Σ_{X_s} F_s(x, X_s). Note: the figure is from the book 'Pattern Recognition and Machine Learning'.
  • 18. Message: factor → variable node. μ_{f_s→x}(x) = Σ_{x_1} … Σ_{x_M} f_s(x, x_1, …, x_M) ∏_{m∈ne(f_s)\x} μ_{x_m→f_s}(x_m), in which {x_1, …, x_M} is the set of variables on which the factor f_s depends. Note: the figure is from the book 'Pattern Recognition and Machine Learning'.
  • 19. Message: variable → factor node. μ_{x_m→f_s}(x_m) = ∏_{l∈ne(x_m)\f_s} μ_{f_l→x_m}(x_m). Summary: the posterior distribution is determined only by the factors! Note: the figure is from the book 'Pattern Recognition and Machine Learning'.
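The two message rules above can be demonstrated end to end on a toy factor graph. A minimal sketch with two binary variables, g1(x1) — x1 — f(x1, x2) — x2 (the graph and the numbers are made up for illustration), comparing the message-passing marginal against brute-force summation:

```python
# unary factor on x1 and pairwise factor between x1 and x2
g1 = {0: 0.3, 1: 0.7}
f = {(0, 0): 0.9, (0, 1): 0.1,
     (1, 0): 0.2, (1, 1): 0.8}

# variable -> factor: product of the other incoming factor messages
# (x1 has only g1 besides f, so the message is just g1)
msg_x1_to_f = {x1: g1[x1] for x1 in (0, 1)}

# factor -> variable: sum out x1, weighted by the incoming message
msg_f_to_x2 = {x2: sum(f[(x1, x2)] * msg_x1_to_f[x1] for x1 in (0, 1))
               for x2 in (0, 1)}

# marginal of x2 = product of all incoming messages, normalized
z = sum(msg_f_to_x2.values())
marginal_x2 = {x2: msg_f_to_x2[x2] / z for x2 in (0, 1)}

# brute-force marginalization for comparison
brute = {x2: sum(g1[x1] * f[(x1, x2)] for x1 in (0, 1)) for x2 in (0, 1)}
zb = sum(brute.values())
brute = {k: val / zb for k, val in brute.items()}
# marginal_x2 == brute == {0: 0.41, 1: 0.59}
```

On a tree, this local computation reproduces the exact marginal, which is the content of slide 17.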
  • 20. Whole steps of BP. Steps to calculate the posterior distribution of a given variable node: Step 1: construct the factor graph. Step 2: treat the variable node as root, and initialize the messages sent from the leaf nodes. Step 3: apply the message passing steps recursively until the root node receives messages from all of its neighbors. Step 4: get the marginal distribution by multiplying all incoming messages. Note: the figures are from the book 'Pattern Recognition and Machine Learning'.
  • 21. BP: example. Infer the marginal distribution of x_3; infer the marginal distributions of all variables. Note: the figures are from the book 'Pattern Recognition and Machine Learning'.
  • 22. The posterior is sometimes intractable. Example: infer the mean of a Gaussian distribution under the clutter model, p(x | θ) = (1 − w) N(x | θ, I) + w N(x | 0, aI), with prior p(θ) = N(θ | 0, bI). Another example: the ad predictor. Note: the figure is from the book 'Pattern Recognition and Machine Learning'.
  • 23. Distribution Approximation. Approximate p(x) with q(x), which belongs to the exponential family: q(x) = h(x) g(η) exp{η^T u(x)}. Then KL(p || q) = −∫ p(x) ln (q(x)/p(x)) dx = −∫ p(x) ln q(x) dx + ∫ p(x) ln p(x) dx = −ln g(η) − η^T E_{p(x)}[u(x)] + const, where the const terms are independent of the natural parameter η. Minimize KL(p || q) by setting the gradient with respect to η to zero: −∇ln g(η) = E_{p(x)}[u(x)]. By leveraging formula (2.226) in PRML: E_{q(x)}[u(x)] = −∇ln g(η) = E_{p(x)}[u(x)].
  • 24. Moment matching. It is called moment matching when q(x) is a Gaussian distribution: then u(x) = (x, x²)^T, so ∫ q(x) x dx = ∫ p(x) x dx and ∫ q(x) x² dx = ∫ p(x) x² dx, hence the mean and variance of q(x) equal those of p(x). The k'th moment of a distribution: M_k = ∫_a^b x^k f(x) dx.
  • 25. EXPECTATION PROPAGATION = Belief Propagation + Moment matching?
  • 26. Key Idea. Approximate each factor with a Gaussian distribution. Approximate corresponding factor pairs one by one? Approximate each factor in turn in the context of all the remaining factors (proposed by Minka): refine factor f̃_j(θ) by ensuring that q^new(θ) ∝ f̃_j(θ) q^{\j}(θ) is close to f_j(θ) q^{\j}(θ), in which q^{\j}(θ) = q(θ) / f̃_j(θ).
  • 27. EP: the detailed steps. 1. Initialize all of the approximating factors f̃_i(θ). 2. Initialize the posterior approximation by setting q(θ) ∝ ∏_i f̃_i(θ). 3. Until convergence: (a) choose a factor f̃_j(θ) to refine; (b) remove f̃_j(θ) from the posterior by division: q^{\j}(θ) = q(θ) / f̃_j(θ); (c) get the new posterior by setting the sufficient statistics of q^new(θ) equal to those of f_j(θ) q^{\j}(θ) / z_j (i.e. minimize KL(f_j(θ) q^{\j}(θ) / z_j || q^new(θ))), in which z_j = ∫ f_j(θ) q^{\j}(θ) dθ; (d) get the refined factor: f̃_j(θ) = z_j q^new(θ) / q^{\j}(θ).
  • 28. Example: the clutter problem. Infer the mean of a Gaussian distribution; we might try MLE, but the mixture likelihood makes that awkward: p(x | θ) = (1 − w) N(x | θ, I) + w N(x | 0, aI), with prior p(θ) = N(θ | 0, bI). Approximate with q(θ) = N(θ | m, vI) and each factor f̃_n(θ) = N(θ | m_n, v_n I), i.e. approximate a mixture of Gaussians using a Gaussian. Note: the figure is from the book 'Pattern Recognition and Machine Learning'.
  • 29. Example: the clutter problem (2). Approximate a complex factor (e.g. a mixture of Gaussians) with a Gaussian: f_n(θ) in blue, f̃_n(θ) in red, and q^{\n}(θ) in green. Remember that the variance of q^{\n}(θ) is usually very small, so f̃_n(θ) only needs to approximate f_n(θ) in a small range. Note: the 2 figures above are from the book 'Pattern Recognition and Machine Learning'.
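The EP steps of slide 27 applied to the clutter problem can be sketched in the 1-D case, following the closed-form moment updates in PRML §10.7.2. This is an illustrative sketch, not the talk's code: the sites are stored unnormalized in natural form (the scale z_j is dropped since only the posterior shape is needed), and the hyperparameters w, a, b and the data are made up.

```python
import math

def npdf(x, m, v):
    """Density of N(x | m, v)."""
    return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

def ep_clutter(data, w=0.5, a=10.0, b=100.0, iters=20):
    """EP for the 1-D clutter problem: likelihood factors
    f_n(t) = (1-w) N(x_n | t, 1) + w N(x_n | 0, a), prior N(t | 0, b).
    Each f_n gets a Gaussian site in natural form (tau = 1/v, rho = m/v)."""
    n = len(data)
    site_tau = [0.0] * n              # site precisions (start flat)
    site_rho = [0.0] * n              # site precision-means
    post_tau, post_rho = 1.0 / b, 0.0 # posterior starts as the prior
    for _ in range(iters):
        for i, x in enumerate(data):
            # (b) remove site i -> cavity distribution
            cav_tau = post_tau - site_tau[i]
            cav_rho = post_rho - site_rho[i]
            if cav_tau <= 0:          # skip numerically unstable updates
                continue
            cv, cm = 1.0 / cav_tau, cav_rho / cav_tau
            # (c) moments of f_i * cavity (closed form for the clutter model)
            z = (1 - w) * npdf(x, cm, cv + 1.0) + w * npdf(x, 0.0, a)
            r = 1.0 - w * npdf(x, 0.0, a) / z    # responsibility of the signal
            new_m = cm + r * cv * (x - cm) / (cv + 1.0)
            new_v = cv - r * cv ** 2 / (cv + 1.0) \
                    + r * (1 - r) * cv ** 2 * (x - cm) ** 2 / (cv + 1.0) ** 2
            post_tau, post_rho = 1.0 / new_v, new_m / new_v
            # (d) refined site = new posterior / cavity (natural params subtract)
            site_tau[i] = post_tau - cav_tau
            site_rho[i] = post_rho - cav_rho
    return post_rho / post_tau, 1.0 / post_tau   # posterior mean, variance

# data clustered near theta = 2
mean, var = ep_clutter([1.8, 2.0, 2.2, 2.1])
```

The loop is exactly steps (a)-(d) above; the only problem-specific piece is the closed-form moment computation inside (c).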
  • 30. Application: Bayesian CTR predictor for Bing. See the details here: inference step by step; making predictions. Some insights: the variance of each feature increases after every exposure; a sample with more features will have a bigger variance; features are assumed independent.
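A sketch of the online Bayesian-probit update in the style of the Bing ad predictor, where each feature weight carries a Gaussian belief N(μ_i, σ_i²) and one labeled impression updates the active features in closed form. The function names, β default, and data are illustrative; for the exact model see the Web-Scale Bayesian CTR paper.

```python
import math

def pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2 * math.pi)

def cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2)))

def v_fn(t):
    """First truncated-Gaussian correction function N(t)/Phi(t)."""
    return pdf(t) / cdf(t)

def w_fn(t):
    """Second correction function v(t) * (v(t) + t)."""
    return v_fn(t) * (v_fn(t) + t)

def update(means, variances, active, y, beta=1.0):
    """One online update for label y in {+1, -1} (click / no click):
    shift the means of the active features toward the observed outcome
    and shrink their variances."""
    total_mean = sum(means[i] for i in active)
    total_var = beta ** 2 + sum(variances[i] for i in active)
    s = math.sqrt(total_var)
    t = y * total_mean / s
    for i in active:
        means[i] += y * (variances[i] / s) * v_fn(t)
        variances[i] *= 1.0 - (variances[i] / total_var) * w_fn(t)

# two features with prior N(0, 1) each; observe one click
means, variances = [0.0, 0.0], [1.0, 1.0]
update(means, variances, active=[0, 1], y=+1)
```

In this sketch each update shifts the active means toward the observed label and shrinks their variances, so the model grows more confident about frequently seen features.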
  • 31. Experimentation. The dataset is very inhomogeneous. Performance (AUC): FTRL 0.638, OWLQN 0.641, Ad predictor 0.639. Other metrics: pros are speed, low parameter-tuning cost, online learning support, interpretability, and support for adding more factors; the con is sparsity. Code.
  • 32. Application: XBOX skill rating system. See details on pages 793-798 of Machine Learning: A Probabilistic Perspective. Note: the figure is from the paper 'TrueSkill: A Bayesian Skill Rating System'.
  • 33. Apply to all Bayesian models. Infer.NET (Microsoft/Bishop): a framework for running Bayesian inference in graphical models; model-based machine learning.
  • 34. References. Books: Chapters 2/8/10 of Pattern Recognition and Machine Learning; Chapter 22 of Machine Learning: A Probabilistic Perspective. Papers: A Family of Algorithms for Approximate Bayesian Inference; From Belief Propagation to Expectation Propagation; TrueSkill: A Bayesian Skill Rating System; Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft's Bing Search Engine; Roadmap for EP.