Christopher M. Bishop's tutorial on graphical models



  1. Part 1: Graphical Models. Machine Learning Techniques for Computer Vision. Christopher M. Bishop, Microsoft Research Cambridge. ECCV 2004, Prague.
  2. About this Tutorial
     • Learning is the new frontier in computer vision
     • Focus on concepts
        ◦ not lists of algorithms
        ◦ not technical details
     • Graduate level
     • Please ask questions!
  3. Overview
     • Part 1: Graphical models
        ◦ directed and undirected graphs
        ◦ inference and learning
     • Part 2: Unsupervised learning
        ◦ mixture models, EM
        ◦ variational inference, model complexity
        ◦ continuous latent variables
     • Part 3: Supervised learning
        ◦ decision theory
        ◦ linear models, neural networks
        ◦ boosting, sparse kernel machines
  4. Probability Theory
     • Sum rule: p(x) = Σ_y p(x, y)
     • Product rule: p(x, y) = p(y | x) p(x)
     • From these we have Bayes' theorem: p(y | x) = p(x | y) p(y) / p(x)
        ◦ with normalization p(x) = Σ_y p(x | y) p(y)
  5. Role of the Graphs
     • New insights into existing models
     • Motivation for new models
     • Graph-based algorithms for calculation and computation
        ◦ c.f. Feynman diagrams in physics
  6. Decomposition
     • Consider an arbitrary joint distribution p(a, b, c)
     • By successive application of the product rule: p(a, b, c) = p(a) p(b | a) p(c | a, b)
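The product-rule decomposition can be checked numerically on a toy distribution; all numbers below are hypothetical, chosen only so the table sums to one:

```python
# Toy joint distribution over three binary variables (x1, x2, x3)
joint = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.05, (0, 1, 0): 0.15, (0, 1, 1): 0.10,
    (1, 0, 0): 0.20, (1, 0, 1): 0.05, (1, 1, 0): 0.05, (1, 1, 1): 0.30,
}

def marginal(keep):
    """Sum out all variables whose index is not in `keep`."""
    out = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

p1, p12 = marginal([0]), marginal([0, 1])

# p(x1, x2, x3) = p(x1) p(x2 | x1) p(x3 | x1, x2): an identity, verified here
for (a, b, c), p in joint.items():
    factorized = p1[(a,)] * (p12[(a, b)] / p1[(a,)]) * (p / p12[(a, b)])
    assert abs(factorized - p) < 1e-12
print("decomposition holds for every configuration")
```

The decomposition is exact for any distribution; graphical models arise when some of the conditioning variables can be dropped.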
  7. Directed Acyclic Graphs
     • Joint distribution p(x) = Π_i p(x_i | pa_i), where pa_i denotes the parents of i
     • No directed cycles
  8. Undirected Graphs
     • Provided p(x) > 0, the joint distribution is a product of non-negative functions over the cliques C of the graph: p(x) = (1/Z) Π_C ψ_C(x_C), where the ψ_C are the clique potentials and Z is a normalization constant
  9. Conditioning on Evidence
     • Variables may be hidden (latent) or visible (observed)
     • Latent variables may have a specific interpretation, or may be introduced to permit a richer class of distributions
  10. Conditional Independences
      • x is independent of y given z if, for all values of z, p(x, y | z) = p(x | z) p(y | z)
      • For undirected graphs this is given by graph separation!
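A minimal sketch of this definition, using a hypothetical discrete distribution constructed so that the conditional independence holds by design:

```python
import itertools

# Hypothetical tables built so that p(x, y, z) = p(z) p(x|z) p(y|z),
# which makes x independent of y given z by construction.
pz = {0: 0.3, 1: 0.7}
px_z = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}   # p(x | z)
py_z = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.6, 1: 0.4}}   # p(y | z)

joint = {(x, y, z): pz[z] * px_z[z][x] * py_z[z][y]
         for x, y, z in itertools.product([0, 1], repeat=3)}

def cond_indep(joint):
    """Brute-force check of p(x, y | z) = p(x | z) p(y | z) for all values."""
    for z in [0, 1]:
        pz_val = sum(p for (x, y, zz), p in joint.items() if zz == z)
        for x, y in itertools.product([0, 1], repeat=2):
            pxy = joint[(x, y, z)] / pz_val
            px = sum(joint[(x, yy, z)] for yy in [0, 1]) / pz_val
            py = sum(joint[(xx, y, z)] for xx in [0, 1]) / pz_val
            if abs(pxy - px * py) > 1e-12:
                return False
    return True

print(cond_indep(joint))
```

Graph separation lets one read off such independences without any computation; the brute-force check above is only feasible for tiny state spaces.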
  11. "Explaining Away"
      • Conditional independence for directed graphs is similar, but with one subtlety
      • Illustration: pixel colour in an image, where image colour depends on surface colour and lighting colour
  12. Directed versus Undirected
  13. Example: State Space Models
      • Hidden Markov model
      • Kalman filter
  14. Example: Bayesian SSM
  15. Example: Factorial SSM
      • Multiple hidden sequences
      • Avoid exponentially large hidden space
  16. Example: Markov Random Field
      • Typical application: image region labelling
  17. Example: Conditional Random Field
  18. Inference
      • Simple example: Bayes' theorem
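As a concrete sketch of inference by Bayes' theorem, the class labels and numbers below are invented for illustration, loosely echoing the earlier pixel-colour example:

```python
# Infer a hypothetical surface colour c from an observed pixel value v:
#   p(c | v) = p(v | c) p(c) / p(v),  where  p(v) = sum_c p(v | c) p(c)
prior = {"red": 0.4, "green": 0.6}         # p(c), made-up numbers
likelihood = {"red": 0.8, "green": 0.3}    # p(v | c) for the observed v

evidence = sum(likelihood[c] * prior[c] for c in prior)            # p(v)
posterior = {c: likelihood[c] * prior[c] / evidence for c in prior}
print(posterior)
```

Observing v shifts belief toward "red" because the likelihood favours it, even though the prior favours "green".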
  19. Message Passing
      • Example: a chain of nodes
      • Find the marginal for a particular node
         ◦ for M-state nodes, cost of naive summation is O(M^N)
         ◦ exponential in the length N of the chain
         ◦ but we can exploit the graphical structure (conditional independences)
  20. Message Passing
      • Joint distribution p(x) = (1/Z) ψ_{1,2}(x_1, x_2) ψ_{2,3}(x_2, x_3) ··· ψ_{N-1,N}(x_{N-1}, x_N)
      • Exchange sums and products
  21. Message Passing
      • Express the marginal as a product of messages: p(x_n) ∝ μ_α(x_n) μ_β(x_n)
      • Recursive evaluation of messages
      • Find Z by normalizing
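The chain scheme in the three slides above can be sketched as follows, using hypothetical random pairwise potentials: the naive sum costs O(M^N), while the two recursively evaluated messages cost O(N M^2):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
M, N = 3, 5                                  # M-state nodes, chain of length N
# Hypothetical unnormalized pairwise potentials, one per link of the chain
psi = [rng.random((M, M)) + 0.1 for _ in range(N - 1)]

def brute_marginal(k):
    """p(x_k) by explicit summation over all M**N joint configurations."""
    p = np.zeros(M)
    for x in product(range(M), repeat=N):
        w = 1.0
        for n in range(N - 1):
            w *= psi[n][x[n], x[n + 1]]
        p[x[k]] += w
    return p / p.sum()                       # normalizing finds Z implicitly

def message_marginal(k):
    """Same marginal as a product of two messages, evaluated recursively."""
    alpha = np.ones(M)                       # message flowing in from the left
    for n in range(k):
        alpha = psi[n].T @ alpha
    beta = np.ones(M)                        # message flowing in from the right
    for n in reversed(range(k, N - 1)):
        beta = psi[n] @ beta
    p = alpha * beta                         # product of incoming messages
    return p / p.sum()

assert np.allclose(brute_marginal(2), message_marginal(2))
print(message_marginal(2))
```

Each message is just a matrix-vector product, which is what makes the scheme linear in the chain length.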
  22. Belief Propagation
      • Extension to general tree-structured graphs
      • At each node:
         ◦ form product of incoming messages and local evidence
         ◦ marginalize to give outgoing message
         ◦ one message in each direction across every link
      • Fails if there are loops
  23. Junction Tree Algorithm
      • An efficient exact algorithm for a general graph
         ◦ applies to both directed and undirected graphs
         ◦ compile original graph into a tree of cliques
         ◦ then perform message passing on this tree
      • Problem:
         ◦ cost is exponential in size of largest clique
         ◦ many vision models have intractably large cliques
  24. Loopy Belief Propagation
      • Apply belief propagation directly to general graph
         ◦ need to keep iterating
         ◦ might not converge
      • State-of-the-art performance in error-correcting codes
  25. Max-product Algorithm
      • Goal: find the most probable configuration x* = argmax_x p(x)
         ◦ define the maximal probability, then recover x* by back-tracking
      • Message-passing algorithm with "sum" replaced by "max"
      • Example: Viterbi algorithm for HMMs
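A sketch of max-product on the same kind of chain as before, with hypothetical potentials: replacing "sum" with "max" and keeping back-pointers recovers the Viterbi-style recursion:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
M, N = 3, 5                                  # M-state nodes, chain of length N
psi = [rng.random((M, M)) + 0.1 for _ in range(N - 1)]   # hypothetical potentials

def brute_argmax():
    """Most probable configuration by enumerating all M**N states."""
    best, best_w = None, -1.0
    for x in product(range(M), repeat=N):
        w = 1.0
        for n in range(N - 1):
            w *= psi[n][x[n], x[n + 1]]
        if w > best_w:
            best, best_w = list(x), w
    return best

def max_product():
    """Max-product messages with back-pointers (Viterbi-style decoding)."""
    mu = np.ones(M)
    back = []
    for n in range(N - 1):
        scores = mu[:, None] * psi[n]        # scores[i, j]: best prefix ending i, then j
        back.append(scores.argmax(axis=0))
        mu = scores.max(axis=0)
    path = [int(mu.argmax())]                # best final state
    for bp in reversed(back):                # follow back-pointers to the start
        path.append(int(bp[path[-1]]))
    return path[::-1]

assert max_product() == brute_argmax()
print(max_product())
```

As with sum-product, the cost drops from exponential to linear in the chain length.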
  26. Inference and Learning
      • Data set D = {x_1, …, x_N}
      • Likelihood function (independent observations): p(D | w) = Π_n p(x_n | w)
      • Maximize the (log) likelihood with respect to the parameters w
      • Predictive distribution p(x | w_ML)
  27. Regularized Maximum Likelihood
      • Prior p(w), posterior p(w | D) ∝ p(D | w) p(w)
      • MAP (maximum posterior): w_MAP = argmax_w p(D | w) p(w)
      • Predictive distribution p(x | w_MAP)
      • Not really Bayesian
  28. Bayesian Learning
      • Key idea is to marginalize over unknown parameters, rather than make point estimates
         ◦ avoids severe over-fitting of ML and MAP
         ◦ allows direct model comparison
      • Parameters are now latent variables
      • Bayesian learning is an inference problem!
  29. Bayesian Learning
  30. Bayesian Learning
  31. And Finally … the Exponential Family
      • Many distributions can be written in the form p(x | η) = g(η) f(x) exp{η^T u(x)}
      • Includes:
         ◦ Gaussian
         ◦ Dirichlet
         ◦ Gamma
         ◦ Multinomial
         ◦ Wishart
         ◦ Bernoulli
         ◦ …
      • Building blocks in graphs to give rich probabilistic models
  32. Illustration: the Gaussian
      • Use precision (inverse variance) λ = 1/σ²
      • In standard form p(x | μ, λ) = (λ/2π)^{1/2} exp{−λ(x − μ)²/2}
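Writing the Gaussian out in exponential-family form is standard algebra (expand the square in the exponent and collect terms in x and x²):

```latex
p(x \mid \mu, \lambda)
  = \left(\frac{\lambda}{2\pi}\right)^{1/2}
    \exp\!\left\{-\frac{\lambda}{2}(x-\mu)^2\right\}
  = \exp\!\left\{\lambda\mu\, x \;-\; \frac{\lambda}{2}\, x^2
      \;-\; \frac{\lambda\mu^2}{2} \;+\; \frac{1}{2}\ln\frac{\lambda}{2\pi}\right\}
```

so the natural parameters are η = (λμ, −λ/2) and the sufficient statistics are u(x) = (x, x²), matching the exponential-family form on the previous slide.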
  33. Maximum Likelihood
      • Likelihood function (independent observations): p(D | η) = Π_n p(x_n | η)
      • Depends on the data only via sufficient statistics of fixed dimension, Σ_n u(x_n)
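For the Gaussian the fixed-dimension sufficient statistics are Σ_n x_n and Σ_n x_n²; a small sketch with made-up data showing that the ML fit needs nothing else:

```python
import numpy as np

# Hypothetical 1-D data; the ML Gaussian fit depends on the data only
# through (N, sum x, sum x^2), whatever the sample size.
x = np.array([1.2, 0.7, 2.1, 1.5, 0.9, 1.8])
N, s1, s2 = len(x), x.sum(), (x ** 2).sum()

mu_ml = s1 / N                      # ML mean
var_ml = s2 / N - mu_ml ** 2        # ML (biased) variance

assert np.isclose(mu_ml, x.mean())
assert np.isclose(var_ml, x.var())
print(mu_ml, var_ml)
```

This is why exponential-family models scale well: the data can be compressed to a few sufficient statistics before any fitting is done.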
  34. Conjugate Priors
      • Prior has the same functional form as the likelihood
      • Hence the posterior is of the same form
      • Can interpret the prior as a number of effective observations
      • Examples:
         ◦ Gaussian for the mean of a Gaussian
         ◦ Gaussian-Wishart for the mean and precision of a Gaussian
         ◦ Dirichlet for the parameters of a discrete distribution
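A minimal sketch of a conjugate update, for the first example in the list (the mean of a Gaussian with known precision); all numbers are hypothetical:

```python
import numpy as np

# Prior: mu ~ N(m0, 1/p0); observations x_n ~ N(mu, 1/lam) with known lam.
m0, p0 = 0.0, 1.0          # prior mean and prior precision (made-up)
lam = 4.0                  # known observation precision (made-up)
x = np.array([0.9, 1.1, 1.3, 0.7])
N = len(x)

# Conjugacy: the posterior over mu is again Gaussian, with
pN = p0 + N * lam                          # posterior precision
mN = (p0 * m0 + lam * x.sum()) / pN        # posterior mean

# The prior acts like p0/lam effective observations of value m0.
print(mN, pN)
```

As N grows, mN is pulled toward the sample mean and the prior's influence fades, exactly as the "effective observations" reading suggests.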
  35. Summary of Part 1
      • Directed graphs
      • Undirected graphs
      • Inference by message passing: belief propagation