1. Computer vision: models, learning and inference. Chapter 10: Graphical Models. Please send errata to s.prince@cs.ucl.ac.uk

2. Independence
• Two variables x1 and x2 are independent if their joint probability distribution factorizes as Pr(x1, x2) = Pr(x1) Pr(x2).

3. Conditional independence
• The variable x1 is said to be conditionally independent of x3 given x2 when x1 and x3 are independent for fixed x2.
• When this is true the joint density factorizes in a certain way and is hence redundant.
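In equations (a standard restatement; the slide expresses this in words), conditional independence of x1 and x3 given x2 means

```latex
% x1 is conditionally independent of x3 given x2
\Pr(x_1 \mid x_2, x_3) = \Pr(x_1 \mid x_2),
\qquad\text{equivalently}\qquad
\Pr(x_1, x_2, x_3) = \Pr(x_1 \mid x_2)\,\Pr(x_3 \mid x_2)\,\Pr(x_2).
```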
4. Conditional independence
• Consider the joint pdf of three discrete variables x1, x2, x3.

5. Conditional independence
• Consider the joint pdf of three discrete variables x1, x2, x3.
• The three marginal distributions show that no pair of variables is independent.
6. Conditional independence
• Consider the joint pdf of three discrete variables x1, x2, x3.
• The three marginal distributions show that no pair of variables is independent.
• But x1 is conditionally independent of x3 given x2.
7. Graphical models
• A graphical model is a graph-based representation that makes both factorization and conditional independence relations easy to establish.
• Two important types:
  – Directed graphical model or Bayesian network
  – Undirected graphical model or Markov network

8. Directed graphical models
• A directed graphical model represents a probability distribution that factorizes as a product of conditional probability distributions, where pa[n] denotes the parents of node n.
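The factorization referred to above (the standard directed-model formula; the equation image is not reproduced in the transcript) is

```latex
\Pr(x_{1\ldots N}) \;=\; \prod_{n=1}^{N} \Pr\!\left(x_n \mid x_{\mathrm{pa}[n]}\right).
```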
9. Directed graphical models
• To visualize the graphical model from a factorization:
  – add one node per random variable and draw an arrow to each variable from each of its parents.
• To extract the factorization from a graphical model:
  – add one term Pr(xn | x_pa[n]) per node in the graph;
  – if a node has no parents, just add Pr(xn).

10. Example 1

11. Example 1
• The highlighted nodes form the Markov blanket of variable x8: its parents, children, and the parents of its children.

12. Example 1
• If there is no route between two variables and they share no ancestors, they are independent.
13. Example 1
• A variable is conditionally independent of all others given its Markov blanket.

14. Example 1
• General rule: (given as an equation on the slide).
15. Example 2
The joint pdf of this graphical model factorizes as shown below.

16. Example 2
The joint pdf of this graphical model factorizes as shown below; it describes the original example from slide 4.
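The factorization itself appears only as an equation image on the slides. Assuming the chain runs x1 → x2 → x3 (consistent with the head-to-tail description on the next slide and with the entry counts on the redundancy slide), it reads

```latex
\Pr(x_1, x_2, x_3) \;=\; \Pr(x_1)\,\Pr(x_2 \mid x_1)\,\Pr(x_3 \mid x_2).
```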
17. Example 2
Here the arrows meet head to tail at x2, and so x1 is conditionally independent of x3 given x2.
General rule: (given as an equation on the slide).

18. Example 2
Algebraic proof: no dependence on x3 implies that x1 is conditionally independent of x3 given x2.
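A sketch of the omitted algebraic step, under the factorization assumed above:

```latex
\Pr(x_1 \mid x_2, x_3)
= \frac{\Pr(x_1)\,\Pr(x_2 \mid x_1)\,\Pr(x_3 \mid x_2)}
       {\Pr(x_3 \mid x_2)\sum_{x_1}\Pr(x_1)\,\Pr(x_2 \mid x_1)}
= \frac{\Pr(x_1)\,\Pr(x_2 \mid x_1)}{\Pr(x_2)}
= \Pr(x_1 \mid x_2),
```

which has no dependence on x3.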
19. Redundancy
Conditional independence can be thought of as redundancy in the full distribution: the full joint table has 4 × 3 × 2 = 24 entries, whereas the factorized form needs only 4 + 3×4 + 2×3 = 22 entries.
The redundancy here is only very small, but with larger models it can be very significant.

20. Example 3
Plate-notation examples: mixture of Gaussians, t-distribution, factor analyzer.
• Blue boxes = plates. Interpretation: repeat the contents of the box the number of times given in its bottom-right corner.
• Bullet (solid dot) = variables that are not treated as uncertain.

21. Undirected graphical models
The probability distribution factorizes as a product over C potential functions (each returning a non-negative number), divided by the partition function (a normalization constant).
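In symbols (the standard form of this factorization; the equation image is not in the transcript):

```latex
\Pr(x_{1\ldots N}) \;=\; \frac{1}{Z}\prod_{c=1}^{C}\phi_c\!\left(x_{1\ldots N}\right),
\qquad
Z \;=\; \sum_{x_{1\ldots N}}\;\prod_{c=1}^{C}\phi_c\!\left(x_{1\ldots N}\right).
```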
22. Undirected graphical models
The probability distribution factorizes as above. The partition function (normalization constant) is, for large systems, intractable to compute.

23. Alternative form
The model can be written as a Gibbs distribution, in which each potential is replaced by a cost function (which can be positive or negative).
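The standard statement of this form (assumed from context; the equation images are omitted from the transcript):

```latex
\Pr(x_{1\ldots N}) \;=\; \frac{1}{Z}\exp\!\left[-\sum_{c=1}^{C}\psi_c\!\left(x_{1\ldots N}\right)\right],
\qquad
\psi_c\!\left(x_{1\ldots N}\right) \;=\; -\log\!\left[\phi_c\!\left(x_{1\ldots N}\right)\right].
```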
24. Cliques
It is better to write the undirected model as a product over cliques, where each potential depends only on the subset of variables in its clique.

25. Undirected graphical models
• To visualize the graphical model from a factorization:
  – draw one node per random variable;
  – for every clique, draw a connection from every node to every other node in the clique.
• To extract the factorization from a graphical model:
  – add one term to the factorization per maximal clique (a fully connected subset of nodes to which no further node can be added while remaining fully connected).

26. Conditional independence
• Much simpler than for directed models: one set of nodes is conditionally independent of another given a third if the third set separates them (i.e. blocks any path from the first set to the second).

27. Example 1
Represents the factorization shown below.
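The factorization is shown only as an equation image on the slide. Consistent with the next slide's description (x2 blocks the route between x1 and x3, so the cliques are presumably {x1, x2} and {x2, x3}), it would read

```latex
\Pr(x_1, x_2, x_3) \;=\; \frac{1}{Z}\,\phi_1(x_1, x_2)\,\phi_2(x_2, x_3).
```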
28. Example 1
By inspection of the graphical model: x1 is conditionally independent of x3 given x2, as the route from x1 to x3 is blocked by x2.

29. Example 1
Algebraically: no dependence on x3 implies that x1 is conditionally independent of x3 given x2.
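A sketch of the omitted algebra, under the factorization assumed above (the 1/Z terms cancel):

```latex
\Pr(x_1 \mid x_2, x_3)
= \frac{\phi_1(x_1, x_2)\,\phi_2(x_2, x_3)}
       {\phi_2(x_2, x_3)\sum_{x_1}\phi_1(x_1, x_2)}
= \frac{\phi_1(x_1, x_2)}{\sum_{x_1}\phi_1(x_1, x_2)},
```

which has no dependence on x3.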
30. Example 2
• Variables x1 and x2 form a clique (they are connected to each other).
• But it is not a maximal clique, as we can add x3 and it is connected to both.

31. Example 2
The graphical model implies a factorization over the single maximal clique {x1, x2, x3}.

32. Example 2
Or it could be written as a product of pairwise terms... but this is less general.
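The two forms contrasted on slides 31 and 32 are presumably

```latex
\Pr(x_1,x_2,x_3) = \frac{1}{Z}\,\phi_{123}(x_1,x_2,x_3)
\qquad\text{versus}\qquad
\Pr(x_1,x_2,x_3) = \frac{1}{Z}\,\phi_{12}(x_1,x_2)\,\phi_{23}(x_2,x_3)\,\phi_{13}(x_1,x_3),
```

where the second is less general because a product of pairwise potentials cannot represent every possible interaction among all three variables at once.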
33. Comparing directed and undirected models
Executive summary:
• Some conditional independence patterns can be represented by both directed and undirected models.
• Some can be represented only by directed models.
• Some can be represented only by undirected models.
• Some can be represented by neither.

34. Comparing directed and undirected models
• These models represent the same independence / conditional independence relations.
• There is no undirected model that can describe these relations.

35. Comparing directed and undirected models
• There is no directed model that can describe these relations.
• Closest example, but not the same.

36. Graphical models in computer vision
Chain model (hidden Markov model): interpreting sign language sequences.

37. Graphical models in computer vision
Tree model: parsing the human body.

38. Graphical models in computer vision
Grid model (Markov random field): semantic segmentation (blue nodes).

39. Graphical models in computer vision
Chain model (Kalman filter): tracking contours.
40. Inference in models with many unknowns
• Ideally we would compute the full posterior distribution Pr(w1...N | x1...N).
• But for most models this is a very large discrete distribution and is intractable to compute.
• Other solutions:
  – find the MAP solution
  – find marginal posterior distributions
  – maximum marginals
  – sampling the posterior

41. Finding the MAP solution
• Still difficult to compute: we must search through a very large number of states to find the best one.

42. Marginal posterior distributions
• Compute one distribution for each variable wn.
• Obviously cannot be computed by computing the full distribution and explicitly marginalizing.
43. Maximum marginals
• Take the maximum of the marginal posterior distribution for each variable wn.
• The resulting combined state may have probability zero: the states can be individually probable but never co-occur.
44. Sampling the posterior
• Draw samples from the posterior Pr(w1...N | x1...N):
  – use the samples as a representation of the distribution;
  – select the sample with the highest probability as a point sample;
  – compute empirical max-marginals, i.e. look at the marginal statistics of the samples.

45. Drawing samples: directed models
To sample from a directed model, use ancestral sampling:
• work through the graphical model, sampling one variable at a time;
• always sample a variable's parents before sampling the variable itself;
• condition on previously sampled values.

46. Ancestral sampling example

47. Ancestral sampling example
To generate one sample:
1. Sample x1* from Pr(x1)
2. Sample x2* from Pr(x2 | x1*)
3. Sample x4* from Pr(x4 | x1*, x2*)
4. Sample x3* from Pr(x3 | x2*, x4*)
5. Sample x5* from Pr(x5 | x3*)
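A minimal Python sketch of this procedure for the five-variable example. The conditional probability tables below are hypothetical placeholders (the slides do not give numbers); only the sampling order mirrors the five steps above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical conditional probability tables for binary variables.
# Each maps the parents' values to a distribution over the child's states.
p_x1 = [0.6, 0.4]                                               # Pr(x1)
p_x2_given_x1 = {0: [0.7, 0.3], 1: [0.2, 0.8]}                  # Pr(x2 | x1)
p_x4_given_x1_x2 = {(0, 0): [0.5, 0.5], (0, 1): [0.9, 0.1],
                    (1, 0): [0.3, 0.7], (1, 1): [0.6, 0.4]}     # Pr(x4 | x1, x2)
p_x3_given_x2_x4 = {(0, 0): [0.8, 0.2], (0, 1): [0.4, 0.6],
                    (1, 0): [0.1, 0.9], (1, 1): [0.5, 0.5]}     # Pr(x3 | x2, x4)
p_x5_given_x3 = {0: [0.9, 0.1], 1: [0.25, 0.75]}                # Pr(x5 | x3)

def draw(p):
    """Draw one state index from a discrete distribution given as a probability vector."""
    return int(rng.choice(len(p), p=p))

def ancestral_sample():
    # Sample each variable only after all of its parents have been sampled,
    # conditioning on the values drawn so far.
    x1 = draw(p_x1)
    x2 = draw(p_x2_given_x1[x1])
    x4 = draw(p_x4_given_x1_x2[(x1, x2)])
    x3 = draw(p_x3_given_x2_x4[(x2, x4)])
    x5 = draw(p_x5_given_x3[x3])
    return x1, x2, x3, x4, x5

print(ancestral_sample())
```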
48. Drawing samples: undirected models
• Can't use ancestral sampling: there is no sense of parents / children and we don't have the conditional probability distributions.
• Instead use a Markov chain Monte Carlo (MCMC) method:
  – generate a series of samples (chain);
  – each depends on the previous sample (Markov);
  – generation is stochastic (Monte Carlo).
• Example MCMC method = Gibbs sampling.

49. Gibbs sampling
To generate a new sample x in the chain:
  – sample each dimension in any order;
  – to update the nth dimension xn:
    • fix the other N-1 dimensions;
    • draw from the conditional distribution Pr(xn | x1...N\n).
Get samples by selecting from the chain:
  – needs a burn-in period;
  – choose samples spaced apart so they are not correlated.
50. Gibbs sampling example: bivariate normal distribution

51. Gibbs sampling example: bivariate normal distribution
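A minimal Python sketch of Gibbs sampling for a bivariate normal, in the spirit of the example above. The means, standard deviations, and correlation are illustrative assumptions, not values from the slides. For a bivariate normal each full conditional is itself a univariate normal, so every update is an exact conditional draw.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed parameters of the target bivariate normal.
mu1, mu2 = 0.0, 0.0     # means
s1, s2 = 1.0, 1.0       # standard deviations
rho = 0.8               # correlation

def gibbs_bivariate_normal(n_samples, burn_in=500, thin=5):
    samples = []
    x1, x2 = 0.0, 0.0   # arbitrary initialization
    for t in range(burn_in + n_samples * thin):
        # Update x1 from Pr(x1 | x2), a univariate normal.
        m1 = mu1 + rho * (s1 / s2) * (x2 - mu2)
        x1 = rng.normal(m1, s1 * np.sqrt(1.0 - rho ** 2))
        # Update x2 from Pr(x2 | x1), also a univariate normal.
        m2 = mu2 + rho * (s2 / s1) * (x1 - mu1)
        x2 = rng.normal(m2, s2 * np.sqrt(1.0 - rho ** 2))
        # Discard the burn-in period and keep only every `thin`-th sample
        # so that the retained samples are less correlated.
        if t >= burn_in and (t - burn_in) % thin == 0:
            samples.append((x1, x2))
    return np.array(samples)

samples = gibbs_bivariate_normal(1000)
print(samples.mean(axis=0))          # should be close to (mu1, mu2)
print(np.corrcoef(samples.T)[0, 1])  # should be close to rho
```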
52. Learning in directed models
Use the standard maximum likelihood formulation, where x_{i,n} is the nth dimension of the ith training example.
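The omitted formula is presumably the standard maximum likelihood criterion for a directed model, in which each conditional distribution can be fit independently:

```latex
\hat{\boldsymbol\theta} \;=\; \operatorname*{argmax}_{\boldsymbol\theta}
\;\prod_{i=1}^{I}\prod_{n=1}^{N}\Pr\!\left(x_{i,n} \mid x_{i,\mathrm{pa}[n]},\,\boldsymbol\theta\right).
```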
53. Learning in undirected models
Write the model in the form of a Gibbs distribution and use the maximum likelihood formulation.
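Under that Gibbs form, the log-likelihood criterion (a standard statement, assumed from context) is

```latex
\hat{\boldsymbol\theta} \;=\; \operatorname*{argmax}_{\boldsymbol\theta}
\;\sum_{i=1}^{I}\Bigl[-\log Z(\boldsymbol\theta)\;-\;\sum_{c=1}^{C}\psi_c(\mathbf{x}_i,\boldsymbol\theta)\Bigr],
```

where the first term contains the partition function.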
54. Learning in undirected models
PROBLEM: to compute the first term we must sum over all possible states, which is intractable.

55. Contrastive divergence
Some algebraic manipulation (shown as equations on the slide).

56. Contrastive divergence
Now approximate the intractable term using samples, where x_j* is one of J samples from the distribution. These can be generated using Gibbs sampling. In practice it is possible to run the MCMC chain for just one iteration and the method still works acceptably.
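A sketch of the approximation being described (the standard contrastive divergence gradient, assumed from context, with ψ denoting the sum of the cost functions): the expectation over the model distribution, which requires the intractable sum over all states, is replaced by an average over the J MCMC samples,

```latex
\frac{\partial}{\partial\boldsymbol\theta}\sum_{i=1}^{I}\log \Pr(\mathbf{x}_i \mid \boldsymbol\theta)
\;\approx\;
-\sum_{i=1}^{I}\frac{\partial\psi(\mathbf{x}_i,\boldsymbol\theta)}{\partial\boldsymbol\theta}
\;+\;\frac{I}{J}\sum_{j=1}^{J}\frac{\partial\psi(\mathbf{x}_j^{*},\boldsymbol\theta)}{\partial\boldsymbol\theta}.
```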
57. Contrastive divergence

58. Conclusions
We can characterize joint distributions as:
  – graphical models;
  – sets of conditional independence relations;
  – factorizations.
There are two types of graphical model, representing different but overlapping subsets of the possible conditional independence relations:
  – directed (learning easy, sampling easy);
  – undirected (learning hard, sampling hard).