07 cv mil_modeling_complex_densities


  1. 1. Computer vision: models, learning and inference Chapter 7 Modeling complex densities Please send errata to s.prince@cs.ucl.ac.uk
  2. 2. Structure • Densities for classification • Models with hidden variables – Mixture of Gaussians – t-distributions – Factor analysis • EM algorithm in detail • Applications Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 2
  3. 3. Models for machine vision Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 3
  4. 4. Face Detection Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 4
  5. 5. Type 3: Pr(x|w) - Generative How to model Pr(x|w)? – Choose an appropriate form for Pr(x) – Make parameters a function of w – Function takes parameters θ that define its shape Learning algorithm: learn parameters θ from training data x,w Inference algorithm: Define prior Pr(w) and then compute Pr(w|x) using Bayes' rule Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 5
  6. 6. Classification Model Or writing in terms of class conditional density functions Parameters μ0, Σ0 learnt just from the data where w=0; similarly, parameters μ1, Σ1 learnt just from the data where w=1 Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 6
  7. 7. Inference algorithm: Define prior Pr(w) and then compute Pr(w|x) using Bayes’ rule Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 7
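A minimal sketch of this generative classification recipe in Python, assuming one full-covariance Gaussian per class and equal priors; the `faces` and `non_faces` arrays standing in for the training vectors are hypothetical. (For 10800-dimensional image vectors the densities underflow, so in practice the comparison is done with log densities.)

    import numpy as np
    from scipy.stats import multivariate_normal

    def fit_gaussian(X):
        # Learn class-conditional parameters from the data of one class only
        return X.mean(axis=0), np.cov(X, rowvar=False)

    def posterior_face(x, params0, params1, prior1=0.5):
        # Bayes' rule: Pr(w=1|x) = Pr(x|w=1)Pr(w=1) / sum_k Pr(x|w=k)Pr(w=k)
        l0 = multivariate_normal.pdf(x, mean=params0[0], cov=params0[1]) * (1 - prior1)
        l1 = multivariate_normal.pdf(x, mean=params1[0], cov=params1[1]) * prior1
        return l1 / (l0 + l1)

    # params0 = fit_gaussian(non_faces)   # data where w=0
    # params1 = fit_gaussian(faces)       # data where w=1
    # Pr_face = posterior_face(x_new, params0, params1)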
  8. 8. Experiment 1000 non-faces, 1000 faces. 60x60x3 images = 10800 x 1 vectors. Equal priors Pr(y=1)=Pr(y=0)=0.5. 75% performance on test set. Not very good! Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 8
  9. 9. Results (diagonal covariance) Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 9
  10. 10. The plan... Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 10
  11. 11. Structure • Densities for classification • Models with hidden variables – Mixture of Gaussians – t-distributions – Factor analysis • EM algorithm in detail • Applications Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 11
  12. 12. Hidden (or latent) Variables Key idea: represent density Pr(x) as marginalization of joint density with another variable h that we do not see Will also depend on some parameters: Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 12
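The relation the slide refers to is the standard marginalization over the hidden variable (written here in the book's notation; the equation itself is an image in the original deck):

    \Pr(x \mid \theta) = \int \Pr(x, h \mid \theta)\, dh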
  13. 13. Hidden (or latent) Variables Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 13
  14. 14. Expectation Maximization An algorithm specialized to fitting pdfs which are the marginalization of a joint distribution Defines a lower bound on log likelihood and increases bound iteratively Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 14
  15. 15. Lower bound Lower bound is a function of parameters θ and a set of probability distributions qi(hi) Expectation Maximization (EM) algorithm alternates E-steps and M-steps E-Step – Maximize bound w.r.t. distributions qi(hi) M-Step – Maximize bound w.r.t. parameters θ Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 15
  16. 16. Lower bound Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 16
  17. 17. E-Step & M-Step E-Step – Maximize bound w.r.t. distributions qi(hi) M-Step – Maximize bound w.r.t. parameters θ Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 17
  18. 18. E-Step & M-Step Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 18
  19. 19. Structure • Densities for classification • Models with hidden variables – Mixture of Gaussians – t-distributions – Factor analysis • EM algorithm in detail • Applications Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 19
  20. 20. Mixture of Gaussians (MoG) Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 20
  21. 21. ML Learning Q. Can we learn this model with maximum likelihood? A. Yes, but using a brute force approach is tricky • If you take derivative and set to zero, can't solve -- the log of the sum causes problems • Have to enforce constraints on parameters • covariances must be positive definite • weights must sum to one Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 21
  22. 22. MoG as a marginalization Define a variable and then write Then we can recover the density by marginalizing Pr(x,h) Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 22
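Written out (the slide's equations are images in the original), the construction is the standard one: a discrete hidden variable h in {1, ..., K} selects a component with probability λk, and marginalizing over h recovers the mixture:

    \Pr(h = k) = \lambda_k, \qquad \Pr(x \mid h = k) = \text{Norm}_x[\mu_k, \Sigma_k]

    \Pr(x) = \sum_{k=1}^{K} \Pr(x, h = k) = \sum_{k=1}^{K} \lambda_k\, \text{Norm}_x[\mu_k, \Sigma_k]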
  23. 23. MoG as a marginalization Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 23
  24. 24. MoG as a marginalization Define a variable and then write Note: • This gives us a method to generate data from MoG • First sample Pr(h), then sample Pr(x|h) • The hidden variable h has a clear interpretation – it tells you which Gaussian created data point x Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 24
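A small sketch of that two-stage (ancestral) sampling procedure, assuming the MoG parameters are given as arrays `weights`, `means`, `covs`:

    import numpy as np

    def sample_mog(weights, means, covs, n, rng=np.random.default_rng()):
        # First sample the hidden variable h ~ Pr(h) (which Gaussian), ...
        h = rng.choice(len(weights), size=n, p=weights)
        # ... then sample x ~ Pr(x|h) from the chosen Gaussian
        return np.stack([rng.multivariate_normal(means[k], covs[k]) for k in h]), h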
  25. 25. Expectation Maximization for MoG GOAL: to learn parameters from training data E-Step – Maximize bound w.r.t. distributions qi(hi) M-Step – Maximize bound w.r.t. parameters θ Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 25
  26. 26. E-Step We'll call this the responsibility of the kth Gaussian for the ith data point Repeat this procedure for every datapoint! Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 26
  27. 27. M-Step Take derivative, equate to zero and solve (Lagrange multipliers for λ) Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 27
  28. 28. M-Step Update means, covariances and weights according to responsibilities of datapoints Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 28
  29. 29. Iterate until no further improvement E-Step M-Step Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 29
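Putting the two steps together, a compact EM loop for a mixture of Gaussians might look like the sketch below. It uses the standard closed-form updates rather than code from the book; `X` is an assumed N x D data matrix and `K` the chosen number of components.

    import numpy as np
    from scipy.stats import multivariate_normal

    def em_mog(X, K, n_iter=100, rng=np.random.default_rng(0)):
        N, D = X.shape
        lam = np.full(K, 1.0 / K)                       # mixture weights
        mu = X[rng.choice(N, K, replace=False)]         # initialise means at random points
        Sigma = np.array([np.cov(X, rowvar=False) + 1e-6 * np.eye(D) for _ in range(K)])
        for _ in range(n_iter):
            # E-step: responsibility r[i, k] of the kth Gaussian for the ith data point
            r = np.stack([lam[k] * multivariate_normal.pdf(X, mu[k], Sigma[k])
                          for k in range(K)], axis=1)
            r /= r.sum(axis=1, keepdims=True)
            # M-step: update weights, means and covariances from the responsibilities
            Nk = r.sum(axis=0)
            lam = Nk / N
            mu = (r.T @ X) / Nk[:, None]
            for k in range(K):
                d = X - mu[k]
                Sigma[k] = (r[:, k, None] * d).T @ d / Nk[k] + 1e-6 * np.eye(D)
            # (in practice, also monitor the log likelihood and stop when it plateaus)
        return lam, mu, Sigma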
  30. 30. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 30
  31. 31. Different flavours... Full covariance, diagonal covariance, same covariance Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 31
  32. 32. Local Minima Start from three random positions L = 98.76 L = 96.97 L = 94.35 Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 32
  33. 33. Means of face/non-face model Classification: 84% (9% improvement!) Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 33
  34. 34. The plan... Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 34
  35. 35. Structure • Densities for classification • Models with hidden variables – Mixture of Gaussians – t-distributions – Factor analysis • EM algorithm in detail • Applications Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 35
  36. 36. Student t-distributions Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 36
  37. 37. Student t-distributions motivation The normal distribution is not very robust – a single outlier can completely throw it off because the tails fall off so fast... (figure: normal distribution fit vs. normal and t-distribution fits w/ one extra datapoint) Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 37
  38. 38. Student t-distributions Univariate student t-distribution Multivariate student t-distribution Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 38
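As a quick illustration of those definitions (a sketch using SciPy's standard Student-t and normal densities, not code from the book), the t density puts far more mass in the tails than a normal with the same location and scale:

    from scipy.stats import norm, t

    for x in [0.0, 2.0, 4.0, 6.0]:
        # Compare tail behaviour at increasing distance from the mean
        print(x, norm.pdf(x), t.pdf(x, df=2))   # df is the degrees of freedom, nu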
  39. 39. t-distribution as a marginalization Define hidden variable h Can be expressed as a marginalization Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 39
  40. 40. Gamma distribution Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 40
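For reference (the slide itself is an image), the Gamma density used as the mixing distribution here is the standard one in a shape/rate parameterisation:

    \text{Gam}_h[\alpha, \beta] = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, h^{\alpha - 1} \exp[-\beta h]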
  41. 41. t-distribution as a marginalization Define hidden variable h Things to note: • Again this provides a method to sample from the t-distribution • Variable h has a clear interpretation: • Each datum drawn from a Gaussian, mean μ • Covariance depends inversely on h • Can think of this as an infinite mixture (sum becomes integral) of Gaussians w/ same mean, but different variances Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 41
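A small sketch of that sampling route (hedged: variable names are mine): draw h from a Gamma with shape and rate both ν/2, then draw x from a Gaussian whose covariance is scaled inversely by h.

    import numpy as np

    def sample_t(mu, Sigma, nu, n, rng=np.random.default_rng()):
        # h ~ Gam[nu/2, nu/2] in shape/rate form; NumPy takes shape/scale, so scale = 2/nu
        h = rng.gamma(nu / 2.0, 2.0 / nu, size=n)
        # x|h ~ Norm[mu, Sigma/h]: the smaller h is, the broader the Gaussian
        return np.stack([rng.multivariate_normal(mu, Sigma / hi) for hi in h])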
  42. 42. t-distribution Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 42
  43. 43. EM for t-distributions GOAL: to learn parameters from training data E-Step – Maximize bound w.r.t. distributions qi(hi) M-Step – Maximize bound w.r.t. parameters θ Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 43
  44. 44. E-Step Extract expectations Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 44
  45. 45. M-Step Where... Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 45
  46. 46. Updates No closed form solution for ν. Must optimize bound – since it is only one number we can optimize by just evaluating with a fine grid of values and picking the best Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 46
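A sketch of that grid search, assuming the E-step has already produced the per-datapoint expectations E[h_i] and E[log h_i] (the only quantities the ν-dependent part of the bound needs); the bound expression below is the standard one for this model, reconstructed rather than copied from the slides:

    import numpy as np
    from scipy.special import gammaln

    def update_nu(E_h, E_log_h, grid=np.linspace(0.1, 100.0, 1000)):
        # nu-dependent part of the EM bound for the t-distribution, summed over the data
        def bound(nu):
            a = nu / 2.0
            return np.sum(a * np.log(a) - gammaln(a) + (a - 1.0) * E_log_h - a * E_h)
        # Evaluate on a fine grid of values and pick the best
        return max(grid, key=bound)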
  47. 47. EM algorithm for t-distributions Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 47
  48. 48. The plan... Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 48
  49. 49. Structure • Densities for classification • Models with hidden variables – Mixture of Gaussians – t-distributions – Factor analysis • EM algorithm in detail • Applications Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 49
  50. 50. Factor Analysis Compromise between – Full covariance matrix (desirable but many parameters) – Diagonal covariance matrix (no modelling of covariance) Models full covariance in a subspace Φ Mops everything else up with diagonal component Σ Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 50
  51. 51. Data Density Consider modelling distribution of 100x100x3 RGB images Gaussian w/ spherical covariance: 1 covariance matrix parameter Gaussian w/ diagonal covariance: Dx covariance matrix parameters Full Gaussian: ~Dx^2 covariance matrix parameters PPCA: ~Dx Dh covariance matrix parameters Factor Analysis: ~Dx (Dh+1) covariance matrix parameters (full covariance in subspace + diagonal) Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 51
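Plugging in the slide's numbers (Dx = 100 * 100 * 3 = 30000) together with an illustrative, assumed subspace dimension Dh = 400 shows why the compromise matters:

    Dx, Dh = 100 * 100 * 3, 400          # data dimension, assumed subspace dimension
    print(Dx * (Dx + 1) // 2)            # full covariance: ~450 million free parameters
    print(Dx)                            # diagonal covariance: 30000 parameters
    print(Dx * (Dh + 1))                 # factor analysis: ~12 million parameters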
  52. 52. Subspaces Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 52
  53. 53. Factor Analysis Compromise between – Full covariance matrix (desirable but many parameters) – Diagonal covariance matrix (no modelling of covariance) Models full covariance in a subspace Φ Mops everything else up with diagonal component Σ Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 53
  54. 54. Factor Analysis as a Marginalization Let's define Then it can be shown (not obvious) that Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 54
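The definitions and the resulting marginal (shown as images on the slide) are, in the book's notation:

    \Pr(h) = \text{Norm}_h[0, I], \qquad \Pr(x \mid h) = \text{Norm}_x[\mu + \Phi h, \Sigma]

    \Pr(x) = \int \Pr(x \mid h)\,\Pr(h)\, dh = \text{Norm}_x[\mu, \Phi\Phi^{T} + \Sigma]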
  55. 55. Sampling from factor analyzer Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 55
  56. 56. Factor analysis vs. MoG Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 56
  57. 57. E-Step • Compute posterior over hidden variables • Compute expectations needed for M-Step Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 57
  58. 58. E-Step Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 58
  59. 59. M-Step • Optimize parameters w.r.t. EM bound where... Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 59
  60. 60. M-Step • Optimize parameters w.r.t. EM bound where... giving expressions: Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 60
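Those expressions fit in a few lines of NumPy. The following is a sketch of the standard factor-analysis EM updates (posterior moments in the E-step, then Φ and the diagonal Σ in the M-step) rather than a transcription of the slides; `X` is an assumed N x D data matrix and `Dh` the chosen subspace dimension.

    import numpy as np

    def em_factor_analysis(X, Dh, n_iter=100, rng=np.random.default_rng(0)):
        N, Dx = X.shape
        mu = X.mean(axis=0)
        Xc = X - mu                                   # centre the data once
        Phi = rng.standard_normal((Dx, Dh))           # factor matrix (Dx x Dh)
        sig = X.var(axis=0) + 1e-6                    # diagonal of Sigma
        for _ in range(n_iter):
            # E-step: moments of the posterior Pr(h|x) for every datapoint
            A = np.linalg.inv(np.eye(Dh) + Phi.T @ (Phi / sig[:, None]))   # posterior covariance
            Eh = Xc @ (Phi / sig[:, None]) @ A                             # rows are E[h_i]
            sumEhh = N * A + Eh.T @ Eh                                     # sum_i E[h_i h_i^T]
            # M-step: closed-form updates for Phi and the diagonal noise component
            Phi = (Xc.T @ Eh) @ np.linalg.inv(sumEhh)
            sig = np.maximum((Xc ** 2).sum(axis=0) - ((Eh @ Phi.T) * Xc).sum(axis=0),
                             1e-6 * N) / N
        return mu, Phi, sig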
  61. 61. Face model Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 61
  62. 62. Sampling from 10 parameter model To generate: • Choose factor loadings, hi from standard normal distribution • Multiply by factors, Φ • Add mean, μ • (should add random noise component εi w/ diagonal cov Σ) Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 62
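A small sketch of that generation recipe, reusing parameters in the form returned by the EM sketch above:

    import numpy as np

    def sample_fa(mu, Phi, sig, n, rng=np.random.default_rng()):
        h = rng.standard_normal((n, Phi.shape[1]))               # loadings h ~ Norm[0, I]
        eps = rng.standard_normal((n, len(mu))) * np.sqrt(sig)   # noise with diagonal covariance
        return mu + h @ Phi.T + eps                              # x = mu + Phi h + eps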
  63. 63. Combining Models Can combine three models in various combinations • Mixture of t-distributions = mixture of Gaussians + t-distributions • Robust subspace models = t-distributions + factor analysis • Mixture of factor analyzers = mixture of Gaussians + factor analysis Or combine all models to create mixture of robust subspace models Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 63
  64. 64. Structure • Densities for classification • Models with hidden variables – Mixture of Gaussians – t-distributions – Factor analysis • EM algorithm in detail • Applications Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 64
  65. 65. Expectation Maximization Problem: Optimize cost functions of the form Discrete case / Continuous case Solution: Expectation Maximization (EM) algorithm (Dempster, Laird and Rubin 1977) Key idea: Define lower bound on log-likelihood and increase at each iteration Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 65
  66. 66. Expectation Maximization Defines a lower bound on log likelihood and increases bound iteratively Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 66
  67. 67. Lower bound Lower bound is a function of parameters θ and a set of probability distributions {qi(hi)} Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 67
  68. 68. E-Step & M-Step E-Step – Maximize bound w.r.t. distributions {qi(hi)} M-Step – Maximize bound w.r.t. parameters θ Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 68
  69. 69. E-Step: Update {qi(hi)} so that bound equals log likelihood for this θ Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 69
  70. 70. M-Step: Update θ to maximum Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 70
  71. 71. Missing Parts of Argument Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 71
  72. 72. Missing Parts of Argument 1. Show that this is a lower bound 2. Show that the E-Step update is correct 3. Show that the M-Step update is correct Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 72
  73. 73. Jensen's Inequality or… similarly Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 73
  74. 74. Lower Bound for EM Algorithm Where we have used Jensen's inequality Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 74
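In symbols (reconstructed here in standard form, since the slides show the equations as images): Jensen's inequality for the concave logarithm gives \log \mathrm{E}[y] \geq \mathrm{E}[\log y], and applying it to the marginal likelihood yields the bound

    \sum_i \log \Pr(x_i \mid \theta)
      = \sum_i \log \int q_i(h_i)\, \frac{\Pr(x_i, h_i \mid \theta)}{q_i(h_i)}\, dh_i
      \;\geq\; \sum_i \int q_i(h_i) \log \frac{\Pr(x_i, h_i \mid \theta)}{q_i(h_i)}\, dh_i
      = \mathcal{B}\big[\{q_i(h_i)\}, \theta\big]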
  75. 75. E-Step – Optimize bound w.r.t. {qi(hi)} Constant w.r.t. qi(hi) Only this term matters Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 75
  76. 76. E-Step Kullback-Leibler divergence – distance between probability distributions. We are maximizing the negative distance (i.e. minimizing distance) Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 76
  77. 77. Use this relation log[y] ≤ y-1 Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 77
  78. 78. Kullback-Leibler Divergence So the cost function must be non-negative In other words, the best we can do is choose qi(hi) so that this is zero Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 78
  79. 79. E-Step So the cost function must be non-negative The best we can do is choose qi(hi) so that this is zero. How can we do this? Easy – choose posterior Pr(h|x) Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 79
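Written out (again a standard reconstruction rather than the slide images): the gap between the log likelihood and the bound is a sum of Kullback-Leibler divergences, each non-negative by the relation above, so the bound is tight exactly when each qi equals the posterior:

    \sum_i \log \Pr(x_i \mid \theta) - \mathcal{B}\big[\{q_i(h_i)\}, \theta\big]
      = \sum_i \int q_i(h_i) \log \frac{q_i(h_i)}{\Pr(h_i \mid x_i, \theta)}\, dh_i \;\geq\; 0,
      \qquad \hat{q}_i(h_i) = \Pr(h_i \mid x_i, \theta^{[t]})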
  80. 80. M-Step – Optimize bound w.r.t. θ In the M-Step we optimize expected joint log likelihood with respect to parameters θ (Expectation w.r.t. distribution from E-Step) Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 80
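That is, in the standard notation (dropping the θ-independent entropy term of the bound),

    \hat{\theta}^{[t+1]} = \underset{\theta}{\operatorname{argmax}}\;
      \sum_i \int \hat{q}_i(h_i) \log \Pr(x_i, h_i \mid \theta)\, dh_i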
  81. 81. E-Step & M-Step E-Step – Maximize bound w.r.t. distributions qi(hi) M-Step – Maximize bound w.r.t. parameters θ Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 81
  82. 82. Structure • Densities for classification • Models with hidden variables – Mixture of Gaussians – t-distributions – Factor analysis • EM algorithm in detail • Applications Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 82
  83. 83. Face Detection (not a very realistic model – face detection really done using discriminative methods) Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 83
  84. 84. Object recognition Model as J independent patches (figure: t-distributions vs. normal distributions) Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 84
  85. 85. Segmentation Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 85
  86. 86. Face Recognition Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 86
  87. 87. Pose Regression Training data: frontal faces x, non-frontal faces w. Learning: fit factor analysis model to joint distribution of x,w. Inference: given new x, infer w (conditional distribution of normal) Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 87
  88. 88. Conditional Distributions If … then … Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 88
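The slide's equations are images; the result being used is the standard conditional of a jointly normal vector (stated here as a reconstruction, not copied from the book):

    \text{If}\quad \Pr\!\left(\begin{bmatrix} x \\ w \end{bmatrix}\right)
      = \text{Norm}\!\left[\begin{bmatrix} \mu_x \\ \mu_w \end{bmatrix},
        \begin{bmatrix} \Sigma_{xx} & \Sigma_{xw} \\ \Sigma_{wx} & \Sigma_{ww} \end{bmatrix}\right],
    \quad\text{then}\quad
    \Pr(w \mid x) = \text{Norm}_w\!\big[\mu_w + \Sigma_{wx}\Sigma_{xx}^{-1}(x - \mu_x),\;
        \Sigma_{ww} - \Sigma_{wx}\Sigma_{xx}^{-1}\Sigma_{xw}\big]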
  89. 89. Modeling transformations Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 89
