# Christopher M. Bishop's tutorial on graphical models

## by butest on Apr 26, 2010

Uploaded via SlideShare as Microsoft PowerPoint

## Presentation Transcript

• Part 1: Graphical Models. Machine Learning Techniques for Computer Vision, ECCV 2004, Prague. Christopher M. Bishop, Microsoft Research Cambridge
• Learning is the new frontier in computer vision
• Focus on concepts
• not lists of algorithms
• not technical details
• Overview
• Part 1: Graphical models
• directed and undirected graphs
• inference and learning
• Part 2: Unsupervised learning
• mixture models, EM
• variational inference, model complexity
• continuous latent variables
• Part 3: Supervised learning
• decision theory
• linear models, neural networks,
• boosting, sparse kernel machines
• Probability Theory
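• Sum rule: $p(x) = \sum_y p(x, y)$
• Product rule: $p(x, y) = p(y \mid x)\, p(x)$
• From these we have Bayes’ theorem: $p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)}$
• with normalization $p(x) = \sum_y p(x \mid y)\, p(y)$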
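As a quick numerical check of these rules, here is a minimal Python sketch over a small discrete joint distribution p(x, y). The probability table is hypothetical, chosen only so that the numbers are easy to follow; the functions compute marginals with the sum rule, the conditional with the product rule, and the posterior with Bayes' theorem, then compare against reading p(y | x) straight off the joint.

```python
# Hypothetical joint distribution p(x, y) over binary x and y (illustrative numbers).
joint = {
    (0, 0): 0.30, (0, 1): 0.10,
    (1, 0): 0.15, (1, 1): 0.45,
}

def marginal_x(joint):
    """Sum rule: p(x) = sum_y p(x, y)."""
    px = {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
    return px

def marginal_y(joint):
    """Sum rule for the other variable: p(y) = sum_x p(x, y)."""
    py = {}
    for (x, y), p in joint.items():
        py[y] = py.get(y, 0.0) + p
    return py

def conditional_x_given_y(joint):
    """Product rule rearranged: p(x | y) = p(x, y) / p(y)."""
    py = marginal_y(joint)
    return {(x, y): p / py[y] for (x, y), p in joint.items()}

def bayes_y_given_x(joint):
    """Bayes' theorem: p(y | x) = p(x | y) p(y) / p(x),
    with the normalizer p(x) = sum_y p(x | y) p(y)."""
    py = marginal_y(joint)
    px_given_y = conditional_x_given_y(joint)
    posterior = {}
    for (x, y) in joint:
        numerator = px_given_y[(x, y)] * py[y]
        normalizer = sum(px_given_y[(x, y2)] * py[y2] for y2 in py)
        posterior[(x, y)] = numerator / normalizer
    return posterior

post = bayes_y_given_x(joint)
# p(y=1 | x=1) read directly from the joint, for comparison
direct = joint[(1, 1)] / marginal_x(joint)[1]
print(round(post[(1, 1)], 4), round(direct, 4))
```

With this table both routes give p(y = 1 | x = 1) = 0.45 / 0.6 = 0.75.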
• Role of the Graphs
• New insights into existing models
• Motivation for new models
• Graph based algorithms for calculation and computation
• cf. Feynman diagrams in physics
• Decomposition
• Consider an arbitrary joint distribution $p(x, y, z)$
• By successive application of the product rule: $p(x, y, z) = p(x)\, p(y, z \mid x) = p(x)\, p(y \mid x)\, p(z \mid x, y)$
• Directed Acyclic Graphs
• Joint distribution $p(\mathbf{x}) = \prod_i p(x_i \mid \mathrm{pa}_i)$, where $\mathrm{pa}_i$ denotes the parents of $i$
• No directed cycles
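A minimal sketch of the directed factorization, assuming a hypothetical three-node DAG a → c ← b over binary variables with made-up conditional probability tables. Because each factor is a normalized conditional, the product over nodes is automatically a normalized joint, with no extra constant needed.

```python
from itertools import product

# Hypothetical DAG a -> c <- b over binary variables.
parents = {"a": (), "b": (), "c": ("a", "b")}
# cpt[node][(value, parent_values)] = p(x_node = value | pa_node = parent_values)
cpt = {
    "a": {(1, ()): 0.3, (0, ()): 0.7},
    "b": {(1, ()): 0.6, (0, ()): 0.4},
    "c": {},
}
p_c1 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}  # p(c=1 | a, b), illustrative
for (a, b), p in p_c1.items():
    cpt["c"][(1, (a, b))] = p
    cpt["c"][(0, (a, b))] = 1.0 - p

# Any order works for the product; nodes are listed parents-first for readability.
order = ["a", "b", "c"]

def joint(assignment):
    """p(x) = prod_i p(x_i | pa_i) for a full assignment {node: value}."""
    prob = 1.0
    for node in order:
        pa_values = tuple(assignment[p] for p in parents[node])
        prob *= cpt[node][(assignment[node], pa_values)]
    return prob

# Summing over all 2^3 configurations should give 1 (up to rounding).
total = sum(joint(dict(zip(order, values))) for values in product((0, 1), repeat=3))
print(total)
```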
• Undirected Graphs
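• Provided $p(\mathbf{x}) > 0$, the joint distribution is a product of non-negative functions over the cliques of the graph, $p(\mathbf{x}) = \frac{1}{Z} \prod_C \psi_C(\mathbf{x}_C)$, where $\psi_C(\mathbf{x}_C)$ are the clique potentials and $Z$ is a normalization constant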
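Unlike the directed case, an undirected model needs the normalization constant Z explicitly. The sketch below assumes a hypothetical three-node chain x1 - x2 - x3 with binary states and an illustrative pairwise potential psi(u, v) = e if u == v else 1 on each of its two cliques, and computes Z by brute-force enumeration, which is only feasible for tiny graphs.

```python
import math
from itertools import product

def psi(u, v, coupling=1.0):
    """Illustrative positive pairwise potential: exp(coupling) if neighbours agree, else 1."""
    return math.exp(coupling) if u == v else 1.0

# Maximal cliques of the chain x1 - x2 - x3, as index pairs into the state tuple.
cliques = [(0, 1), (1, 2)]

def unnormalized(x):
    """prod_C psi_C(x_C): product of clique potentials, before dividing by Z."""
    value = 1.0
    for i, j in cliques:
        value *= psi(x[i], x[j])
    return value

states = list(product((0, 1), repeat=3))
Z = sum(unnormalized(x) for x in states)  # partition function by enumeration
p = {x: unnormalized(x) / Z for x in states}
```

For this potential, two configurations have both neighbours agreeing, four have one agreement and two have none, so Z = 2e² + 4e + 2 = 2(e + 1)².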
• Conditioning on Evidence
• Variables may be hidden (latent) or visible (observed)
• Latent variables may have a specific interpretation, or may be introduced to permit a richer class of distribution
• Conditional Independences
• $x$ is independent of $y$ given $z$ if, for all values of $z$, $p(x, y \mid z) = p(x \mid z)\, p(y \mid z)$
• For undirected graphs this is given by graph separation!
• “Explaining Away”
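• C.I. for directed graphs is similar, but with one subtlety: observing a common child makes its parents dependent
• Illustration: pixel colour in an image
(Figure: directed graph in which surface colour and lighting colour are both parents of image colour)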
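The explaining-away effect can be reproduced numerically. This is a minimal sketch assuming a binary version of the pixel-colour example with hypothetical probabilities: s (surface is red) and l (lighting is red) are independent parents of i (pixel looks red). Conditional probabilities are computed by brute-force summation of the joint.

```python
from itertools import product

# Hypothetical binary pixel-colour model: s -> i <- l.
p_s = {1: 0.3, 0: 0.7}  # p(surface is red)
p_l = {1: 0.2, 0: 0.8}  # p(lighting is red)
p_i1 = {(0, 0): 0.05, (0, 1): 0.8, (1, 0): 0.8, (1, 1): 0.95}  # p(i=1 | s, l)

def joint(s, l, i):
    pi = p_i1[(s, l)]
    return p_s[s] * p_l[l] * (pi if i == 1 else 1.0 - pi)

def prob(query, evidence):
    """p(query | evidence) by enumeration; both are dicts {name: value}."""
    num = den = 0.0
    for s, l, i in product((0, 1), repeat=3):
        x = {"s": s, "l": l, "i": i}
        if all(x[k] == v for k, v in evidence.items()):
            p = joint(s, l, i)
            den += p
            if all(x[k] == v for k, v in query.items()):
                num += p
    return num / den

# Marginally, s and l are independent:
print(prob({"s": 1}, {}), prob({"s": 1}, {"l": 1}))
# Given the image, red lighting "explains away" a red surface:
print(prob({"s": 1}, {"i": 1}), prob({"s": 1}, {"i": 1, "l": 1}))
```

Here p(s = 1) and p(s = 1 | l = 1) are both 0.3, but once a red pixel is observed, additionally learning that the lighting is red lowers the probability of a red surface from about 0.64 (0.249 / 0.389) to about 0.34 (0.057 / 0.169).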
• Directed versus Undirected
• Example: State Space Models
• Hidden Markov model
• Kalman filter
• Example: Bayesian SSM
• Example: Factorial SSM
• Multiple hidden sequences
• Avoid exponentially large hidden space
• Example: Markov Random Field
• Typical application: image region labelling
• Example: Conditional Random Field
• Inference
• Simple example: Bayes’ theorem
• Message Passing
• Example
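• Find marginal for a particular node: $p(x_i) = \sum_{x_1} \cdots \sum_{x_{i-1}} \sum_{x_{i+1}} \cdots \sum_{x_N} p(x_1, \ldots, x_N)$
• for $M$-state nodes, cost is $O(M^N)$
• exponential in length of chain
• but we can exploit the graphical structure (conditional independences)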
• Message Passing
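• Joint distribution $p(\mathbf{x}) = \frac{1}{Z}\, \psi_{1,2}(x_1, x_2)\, \psi_{2,3}(x_2, x_3) \cdots \psi_{N-1,N}(x_{N-1}, x_N)$
• Exchange sums and products: $p(x_i) = \frac{1}{Z} \Big[\sum_{x_{i-1}} \psi_{i-1,i}(x_{i-1}, x_i) \cdots \Big[\sum_{x_1} \psi_{1,2}(x_1, x_2)\Big] \cdots\Big] \Big[\sum_{x_{i+1}} \psi_{i,i+1}(x_i, x_{i+1}) \cdots \Big[\sum_{x_N} \psi_{N-1,N}(x_{N-1}, x_N)\Big] \cdots\Big]$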
• Message Passing
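• Express as product of messages: $p(x_i) = \frac{1}{Z}\, \mu_\alpha(x_i)\, \mu_\beta(x_i)$
• Recursive evaluation of messages: $\mu_\alpha(x_i) = \sum_{x_{i-1}} \psi_{i-1,i}(x_{i-1}, x_i)\, \mu_\alpha(x_{i-1})$ and $\mu_\beta(x_i) = \sum_{x_{i+1}} \psi_{i,i+1}(x_i, x_{i+1})\, \mu_\beta(x_{i+1})$
• Find $Z$ by normalizing: $Z = \sum_{x_i} \mu_\alpha(x_i)\, \mu_\beta(x_i)$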
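A minimal sketch comparing brute-force marginalization on a chain with forward and backward message passing, assuming a hypothetical chain of N = 5 nodes with M = 3 states and the same arbitrary positive pairwise potential on every link. Nodes are indexed from 0 in the code, and psis[k][a][b] plays the role of the potential between node k and node k + 1. The brute-force version enumerates all M^N configurations, while the message version costs O(N M^2).

```python
from itertools import product

def chain_marginals_bruteforce(psis, M):
    """All node marginals by enumerating every joint configuration."""
    N = len(psis) + 1
    marg = [[0.0] * M for _ in range(N)]
    for x in product(range(M), repeat=N):
        w = 1.0
        for k, psi_k in enumerate(psis):
            w *= psi_k[x[k]][x[k + 1]]
        for i in range(N):
            marg[i][x[i]] += w
    Z = sum(marg[0])
    return [[v / Z for v in row] for row in marg]

def chain_marginals_messages(psis, M):
    """All node marginals from forward (alpha) and backward (beta) messages."""
    N = len(psis) + 1
    alpha = [[1.0] * M for _ in range(N)]  # alpha message into the first node is 1
    beta = [[1.0] * M for _ in range(N)]   # beta message into the last node is 1
    for i in range(1, N):
        alpha[i] = [sum(psis[i - 1][a][b] * alpha[i - 1][a] for a in range(M))
                    for b in range(M)]
    for i in range(N - 2, -1, -1):
        beta[i] = [sum(psis[i][a][b] * beta[i + 1][b] for b in range(M))
                   for a in range(M)]
    result = []
    for i in range(N):
        unnorm = [alpha[i][s] * beta[i][s] for s in range(M)]
        Z = sum(unnorm)  # the same Z is obtained at every node
        result.append([u / Z for u in unnorm])
    return result

# Hypothetical potentials: 5 nodes, 3 states, one shared positive table.
psi = [[2.0, 1.0, 0.5], [1.0, 3.0, 1.0], [0.5, 1.0, 2.0]]
psis = [psi] * 4
fast = chain_marginals_messages(psis, 3)
slow = chain_marginals_bruteforce(psis, 3)
```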
• Belief Propagation
• Extension to general tree-structured graphs
• At each node:
• form product of incoming messages and local evidence
• marginalize to give outgoing message
• one message in each direction across every link
• Fails if there are loops
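Belief propagation on a tree can be sketched in the same style. The example assumes a hypothetical star-shaped tree (node 0 linked to nodes 1, 2 and 3) with binary states, made-up local evidence and one symmetric pairwise potential shared by all links. Each message multiplies the sender's local evidence by its incoming messages (excluding the one from the recipient) and sums out the sender's state; `functools.lru_cache` ensures each directed message is computed once, matching one message in each direction across every link. The beliefs are checked against brute-force enumeration.

```python
from functools import lru_cache
from itertools import product

M = 2  # binary states
edges = [(0, 1), (0, 2), (0, 3)]  # hypothetical star-shaped tree
phi = {0: [1.0, 1.0], 1: [0.9, 0.1], 2: [0.4, 0.6], 3: [0.7, 0.3]}  # local evidence
psi = [[1.5, 0.5], [0.5, 1.5]]  # symmetric, so link orientation does not matter

neighbours = {i: [] for i in phi}
for a, b in edges:
    neighbours[a].append(b)
    neighbours[b].append(a)

@lru_cache(maxsize=None)
def message(src, dst):
    """m_{src->dst}(x_dst): local evidence at src times incoming messages
    (except the one from dst), times psi, summed over x_src."""
    incoming = [message(k, src) for k in neighbours[src] if k != dst]
    out = []
    for x_dst in range(M):
        total = 0.0
        for x_src in range(M):
            term = phi[src][x_src] * psi[x_src][x_dst]
            for m in incoming:
                term *= m[x_src]
            total += term
        out.append(total)
    return tuple(out)  # immutable, safe to share from the cache

def belief(i):
    """Marginal p(x_i): local evidence times all incoming messages, normalized."""
    b = list(phi[i])
    for k in neighbours[i]:
        b = [bv * mv for bv, mv in zip(b, message(k, i))]
    Z = sum(b)
    return [v / Z for v in b]

def brute_force(i):
    """Marginal p(x_i) by summing the full product of potentials."""
    marg = [0.0] * M
    for x in product(range(M), repeat=len(phi)):
        w = 1.0
        for n in phi:
            w *= phi[n][x[n]]
        for a, b in edges:
            w *= psi[x[a]][x[b]]
        marg[x[i]] += w
    Z = sum(marg)
    return [v / Z for v in marg]
```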
• Junction Tree Algorithm
• An efficient exact algorithm for a general graph
• applies to both directed and undirected graphs
• compile original graph into a tree of cliques
• then perform message passing on this tree
• Problem:
• cost is exponential in size of largest clique
• many vision models have intractably large cliques
• Loopy Belief Propagation
• Apply belief propagation directly to general graph
• need to keep iterating
• might not converge
• State-of-the-art performance in error-correcting codes
• Max-product Algorithm
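• Goal: find $\mathbf{x}^{\max} = \arg\max_{\mathbf{x}} p(\mathbf{x})$
• define $p(\mathbf{x}^{\max}) = \max_{\mathbf{x}} p(\mathbf{x}) = \max_{x_1} \cdots \max_{x_N} p(\mathbf{x})$
• then exchange maximizations and products, since $\max(ab, ac) = a \max(b, c)$ for $a \ge 0$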
• Message passing algorithm with “sum” replaced by “max”
• Example:
• Viterbi algorithm for HMMs
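A minimal Viterbi sketch for a hypothetical two-state, two-symbol HMM (all probabilities invented for illustration). It runs the max-product recursion along the chain, keeps back-pointers to recover the maximizing state sequence, and is checked against brute-force search over all state sequences. For long sequences one would normally work with log probabilities to avoid underflow; plain products are kept here for readability.

```python
from itertools import product

# Hypothetical HMM parameters.
init = [0.6, 0.4]                 # p(z_0 = s)
trans = [[0.7, 0.3], [0.4, 0.6]]  # trans[i][j] = p(z_t = j | z_{t-1} = i)
emit = [[0.9, 0.1], [0.2, 0.8]]   # emit[s][k] = p(obs_t = k | z_t = s)

def viterbi(obs):
    """Max-product message passing along the chain, then backtracking."""
    S = len(init)
    # delta[s]: highest probability of any state path ending in s at the current step
    delta = [init[s] * emit[s][obs[0]] for s in range(S)]
    back = []
    for o in obs[1:]:
        ptr, new = [], []
        for j in range(S):
            best = max(range(S), key=lambda i: delta[i] * trans[i][j])
            ptr.append(best)
            new.append(delta[best] * trans[best][j] * emit[j][o])
        back.append(ptr)
        delta = new
    last = max(range(S), key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path)), delta[last]

def brute_force(obs):
    """Score every state sequence explicitly and keep the best one."""
    def score(z):
        p = init[z[0]] * emit[z[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= trans[z[t - 1]][z[t]] * emit[z[t]][obs[t]]
        return p
    best = max(product(range(len(init)), repeat=len(obs)), key=score)
    return list(best), score(best)

obs = [0, 0, 1, 1, 0]
```

For this observation sequence the most probable state sequence is [0, 0, 1, 1, 0], with joint probability of about 0.0141.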
• Inference and Learning
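• Data set $D = \{x_1, \ldots, x_N\}$
• Likelihood function (independent observations): $p(D \mid \boldsymbol{\theta}) = \prod_{n=1}^{N} p(x_n \mid \boldsymbol{\theta})$
• Maximize (log) likelihood: $\boldsymbol{\theta}_{\mathrm{ML}} = \arg\max_{\boldsymbol{\theta}} \ln p(D \mid \boldsymbol{\theta})$
• Predictive distribution $p(x \mid \boldsymbol{\theta}_{\mathrm{ML}})$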
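For a concrete case, the sketch below assumes a univariate Gaussian model with parameters (μ, σ²) and a handful of hypothetical observations. Setting the derivatives of the log likelihood to zero gives the sample mean and the (biased, divide-by-N) sample variance; the log-likelihood function is included so the maximum can be checked against perturbed parameter values.

```python
import math

def gaussian_log_likelihood(data, mu, var):
    """ln p(D | mu, var) = sum_n ln N(x_n | mu, var) for independent observations."""
    return sum(-0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
               for x in data)

def gaussian_ml(data):
    """Closed-form maximizer: sample mean and divide-by-N sample variance."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    return mu, var

data = [2.1, 1.9, 2.4, 1.6, 2.0]  # hypothetical observations
mu_ml, var_ml = gaussian_ml(data)
# The plug-in predictive distribution is then N(x | mu_ml, var_ml).
```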
• Regularized Maximum Likelihood
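• Prior $p(\boldsymbol{\theta})$, posterior $p(\boldsymbol{\theta} \mid D) \propto p(D \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta})$
• MAP (maximum posterior): $\boldsymbol{\theta}_{\mathrm{MAP}} = \arg\max_{\boldsymbol{\theta}} p(\boldsymbol{\theta} \mid D)$
• Predictive distribution $p(x \mid \boldsymbol{\theta}_{\mathrm{MAP}})$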
• Not really Bayesian
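Under a hypothetical Gaussian setting, suppose the noise variance σ² is known and the mean has a Gaussian prior N(μ | μ0, σ0²). The log posterior is then the log likelihood plus a quadratic penalty on μ, which is exactly a regularized maximum likelihood objective, and its maximizer is a precision-weighted average of the prior mean and the data. This is a minimal sketch under those assumptions, not a general MAP routine.

```python
def gaussian_mean_map(data, noise_var, prior_mean, prior_var):
    """MAP estimate of a Gaussian mean, assuming known noise variance and a
    Gaussian prior N(mu | prior_mean, prior_var). The log posterior is
    -sum_n (x_n - mu)^2 / (2 noise_var) - (mu - prior_mean)^2 / (2 prior_var) + const,
    whose maximizer is a precision-weighted average."""
    n = len(data)
    prior_prec = 1.0 / prior_var
    noise_prec = 1.0 / noise_var
    return (prior_prec * prior_mean + noise_prec * sum(data)) / (prior_prec + n * noise_prec)

data = [2.1, 1.9, 2.4, 1.6, 2.0]  # hypothetical observations
mu_map = gaussian_mean_map(data, noise_var=0.5, prior_mean=0.0, prior_var=1.0)
```

With five points summing to 10, σ² = 0.5 and a N(0, 1) prior, the estimate is 20/11 ≈ 1.82, pulled towards the prior mean from the ML value of 2.0; as σ0² grows, it approaches the ML solution. It is still a single point estimate, which is why the slide calls it "not really Bayesian".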
• Bayesian Learning
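• Key idea is to marginalize over unknown parameters, $p(x \mid D) = \int p(x \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta} \mid D)\, d\boldsymbol{\theta}$, rather than make point estimates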
• avoids severe over-fitting of ML and MAP
• allows direct model comparison
• Parameters are now latent variables
• Bayesian learning is an inference problem!
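In a hypothetical Gaussian example, the Bayesian treatment keeps the whole posterior over the mean μ and integrates it out when predicting. Assuming a Gaussian prior on μ and a known noise variance (conjugate choices), both steps have closed forms: the posterior over μ is Gaussian, and the predictive distribution is a Gaussian whose variance is the noise variance plus the posterior variance of μ. This is a minimal sketch under those assumptions.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def gaussian_mean_posterior(data, noise_var, prior_mean, prior_var):
    """Posterior N(mu | post_mean, post_var), assuming known noise variance
    and a Gaussian prior N(mu | prior_mean, prior_var)."""
    post_prec = 1.0 / prior_var + len(data) / noise_var
    post_var = 1.0 / post_prec
    post_mean = post_var * (prior_mean / prior_var + sum(data) / noise_var)
    return post_mean, post_var

def gaussian_predictive(data, noise_var, prior_mean, prior_var):
    """Marginalize over mu instead of plugging in a point estimate:
    p(x | D) = integral N(x | mu, noise_var) N(mu | post_mean, post_var) dmu
             = N(x | post_mean, noise_var + post_var)."""
    post_mean, post_var = gaussian_mean_posterior(data, noise_var, prior_mean, prior_var)
    return post_mean, noise_var + post_var

data = [2.1, 1.9, 2.4, 1.6, 2.0]  # hypothetical observations
pred_mean, pred_var = gaussian_predictive(data, noise_var=0.5, prior_mean=0.0, prior_var=1.0)
```

Compared with plugging in a point estimate, the predictive variance here is 0.5 + 1/11 rather than 0.5, reflecting the remaining uncertainty about μ.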
• And Finally … the Exponential Family
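• Many distributions can be written in the form $p(\mathbf{x} \mid \boldsymbol{\eta}) = h(\mathbf{x})\, g(\boldsymbol{\eta}) \exp\{\boldsymbol{\eta}^{\mathrm{T}} \mathbf{u}(\mathbf{x})\}$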
• Includes:
• Gaussian
• Dirichlet
• Gamma
• Multinomial
• Wishart
• Bernoulli
• Building blocks in graphs to give rich probabilistic models
• Illustration: the Gaussian
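• Use precision (inverse variance) $\tau = 1/\sigma^2$: $\mathcal{N}(x \mid \mu, \tau^{-1}) = \left(\frac{\tau}{2\pi}\right)^{1/2} \exp\left\{-\frac{\tau}{2}(x - \mu)^2\right\}$
• In standard form: $\boldsymbol{\eta} = (\tau\mu, -\tau/2)^{\mathrm{T}}$, $\mathbf{u}(x) = (x, x^2)^{\mathrm{T}}$, $h(x) = (2\pi)^{-1/2}$, $g(\boldsymbol{\eta}) = \tau^{1/2} \exp(-\tau\mu^2/2)$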
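The exponential-family form of the Gaussian can be checked pointwise. This sketch assumes the convention u(x) = (x, x²), so η = (τμ, -τ/2), h(x) = (2π)^(-1/2) and g(η) = (-2η₂)^(1/2) exp(η₁²/(4η₂)), which is τ^(1/2) exp(-τμ²/2) rewritten in terms of η.

```python
import math

def gaussian_pdf(x, mu, tau):
    """N(x | mu, 1/tau), parameterized by the precision tau = 1/sigma^2."""
    return math.sqrt(tau / (2 * math.pi)) * math.exp(-0.5 * tau * (x - mu) ** 2)

def natural_params(mu, tau):
    """eta = (tau * mu, -tau / 2) for sufficient statistics u(x) = (x, x^2)."""
    return (tau * mu, -0.5 * tau)

def expfam_pdf(x, eta):
    """h(x) g(eta) exp(eta^T u(x)) with u(x) = (x, x^2)."""
    eta1, eta2 = eta
    h = 1.0 / math.sqrt(2 * math.pi)
    g = math.sqrt(-2 * eta2) * math.exp(eta1 ** 2 / (4 * eta2))
    return h * g * math.exp(eta1 * x + eta2 * x ** 2)
```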
• Maximum Likelihood
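• Likelihood function (independent observations): $p(D \mid \boldsymbol{\eta}) = \Big(\prod_{n=1}^{N} h(\mathbf{x}_n)\Big)\, g(\boldsymbol{\eta})^N \exp\Big\{\boldsymbol{\eta}^{\mathrm{T}} \sum_{n=1}^{N} \mathbf{u}(\mathbf{x}_n)\Big\}$
• Depends on data via sufficient statistics $\sum_{n} \mathbf{u}(\mathbf{x}_n)$ of fixed dimension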
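Because the likelihood sees the data only through the summed sufficient statistics, a Gaussian can be fitted from three running totals (N, Σx, Σx²) whatever the size of the data set, and totals from separate batches simply add. A minimal sketch with hypothetical data; note that the one-pass variance formula s2/N - μ² can lose precision when the variance is small relative to the mean, so production code often prefers Welford-style updates.

```python
def sufficient_stats(data):
    """(N, sum of x, sum of x^2): fixed size no matter how many points."""
    return len(data), sum(data), sum(x * x for x in data)

def merge(a, b):
    """Statistics from two batches combine by addition."""
    return tuple(x + y for x, y in zip(a, b))

def ml_from_stats(n, s1, s2):
    """Gaussian ML mean and (divide-by-N) variance from the totals alone."""
    mu = s1 / n
    var = s2 / n - mu * mu
    return mu, var

data = [2.1, 1.9, 2.4, 1.6, 2.0]  # hypothetical observations
batch_a, batch_b = data[:2], data[2:]
stats = merge(sufficient_stats(batch_a), sufficient_stats(batch_b))
mu_ml, var_ml = ml_from_stats(*stats)
```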
• Conjugate Priors
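• Prior has same functional form as likelihood: $p(\boldsymbol{\eta} \mid \boldsymbol{\chi}, \nu) = f(\boldsymbol{\chi}, \nu)\, g(\boldsymbol{\eta})^{\nu} \exp\{\nu\, \boldsymbol{\eta}^{\mathrm{T}} \boldsymbol{\chi}\}$
• Hence posterior is of the form $p(\boldsymbol{\eta} \mid D, \boldsymbol{\chi}, \nu) \propto g(\boldsymbol{\eta})^{\nu + N} \exp\Big\{\boldsymbol{\eta}^{\mathrm{T}} \Big(\sum_{n=1}^{N} \mathbf{u}(\mathbf{x}_n) + \nu \boldsymbol{\chi}\Big)\Big\}$
• Can interpret prior as $\nu$ effective observations of value $\boldsymbol{\chi}$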
• Examples:
• Gaussian for the mean of a Gaussian
• Gaussian-Wishart for mean and precision of Gaussian
• Dirichlet for the parameters of a discrete distribution
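The Dirichlet case shows the effective-observation reading most directly: with a Dirichlet(α) prior on the parameters of a discrete distribution, the posterior is Dirichlet(α + counts), so each α_k behaves like α_k pseudo-observations of category k. A minimal sketch with a hypothetical uniform prior and invented data; it also checks that updating in two batches gives the same posterior as a single update.

```python
from collections import Counter

def dirichlet_posterior(alpha, observations):
    """Dirichlet(alpha) prior + categorical observations -> Dirichlet(alpha + counts)."""
    counts = Counter(observations)
    return [a + counts.get(k, 0) for k, a in enumerate(alpha)]

def posterior_mean(alpha):
    """Mean of a Dirichlet: alpha_k / sum(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]

prior = [1.0, 1.0, 1.0]   # hypothetical: one pseudo-observation per category
obs = [0, 2, 2, 1, 2, 2]  # hypothetical categorical data
post = dirichlet_posterior(prior, obs)
sequential = dirichlet_posterior(dirichlet_posterior(prior, obs[:3]), obs[3:])
```

Here the counts are (1, 1, 4), so the posterior parameters are (2, 2, 5) and the posterior mean for category 2 is 5/9.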
• Summary of Part 1
• Directed graphs
• Undirected graphs
• Inference by message passing: belief propagation