# 07 cv mil_modeling_complex_densities



1. Computer vision: models, learning and inference. Chapter 7: Modeling complex densities. Please send errata to s.prince@cs.ucl.ac.uk. (Slides ©2011 Simon J.D. Prince.)
2. Structure: • Densities for classification • Models with hidden variables (Mixture of Gaussians, t-distributions, Factor analysis) • EM algorithm in detail • Applications
3. Models for machine vision
4. Face detection
5. Type 3: Pr(x|w) (generative). How to model Pr(x|w)? Choose an appropriate form for Pr(x); make its parameters a function of w; the function takes parameters θ that define its shape. Learning algorithm: learn the parameters θ from training data x, w. Inference algorithm: define a prior Pr(w), then compute Pr(w|x) using Bayes' rule.
6. Classification model. Writing in terms of class-conditional density functions: the parameters μ0, Σ0 are learnt only from the data S0 where w = 0; similarly, the parameters μ1, Σ1 are learnt only from the data S1 where w = 1.
7. Inference algorithm: define a prior Pr(w), then compute Pr(w|x) using Bayes' rule.
8. Experiment: 1000 non-faces, 1000 faces; 60×60×3 images = 10800×1 vectors; equal priors Pr(y=1) = Pr(y=0) = 0.5. Result: 75% performance on the test set. Not very good!
9. Results (diagonal covariance)
10. The plan...
11. Structure (recap of the outline on slide 2)
12. Hidden (or latent) variables. Key idea: represent the density Pr(x) as the marginalization of a joint density with another variable h that we do not observe. The joint density will also depend on some parameters θ.
13. Hidden (or latent) variables
14. Expectation maximization: an algorithm specialized to fitting pdfs that are the marginalization of a joint distribution. It defines a lower bound on the log likelihood and increases the bound iteratively.
15. Lower bound. The lower bound is a function of the parameters θ and a set of probability distributions qi(hi). The expectation maximization (EM) algorithm alternates E-steps and M-steps. E-step: maximize the bound w.r.t. the distributions qi(hi). M-step: maximize the bound w.r.t. the parameters θ.
16. Lower bound
17. E-step and M-step. E-step: maximize the bound w.r.t. the distributions qi(hi). M-step: maximize the bound w.r.t. the parameters θ.
18. E-step and M-step
19. Structure (recap of the outline on slide 2)
20. Mixture of Gaussians (MoG)
21. ML learning. Q: Can we learn this model with maximum likelihood? A: Yes, but a brute-force approach is tricky: if you take the derivative and set it to zero, you cannot solve in closed form (the log of the sum causes problems), and you have to enforce constraints on the parameters (covariances must be positive definite; weights must sum to one).
22. MoG as a marginalization. Define a variable h and then write the joint density Pr(x, h); we can then recover the density Pr(x) by marginalizing over h.
23. MoG as a marginalization
24. MoG as a marginalization. Define a variable h and then write the joint density. Note: this gives us a method to generate data from a MoG (first sample from Pr(h), then sample from Pr(x|h)), and the hidden variable h has a clear interpretation: it tells you which Gaussian created data point x.
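Written out in the book's style of notation (a reconstruction of the standard mixture-of-Gaussians marginalization, so treat it as a sketch rather than the slide's exact typesetting):

```latex
\Pr(h = k) = \lambda_k, \qquad
\Pr(\mathbf{x} \mid h = k) = \mathrm{Norm}_{\mathbf{x}}[\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k]
\;\;\Longrightarrow\;\;
\Pr(\mathbf{x}) = \sum_{k=1}^{K} \lambda_k \, \mathrm{Norm}_{\mathbf{x}}[\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k]
```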
25. Expectation maximization for MoG. Goal: learn the parameters from training data. E-step: maximize the bound w.r.t. the distributions qi(hi). M-step: maximize the bound w.r.t. the parameters θ.
26. E-step. We'll call this the responsibility of the kth Gaussian for the ith data point. Repeat this procedure for every data point!
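The responsibility referred to above can be written via Bayes' rule (reconstructed from the mixture definition; a sketch, not the slide's exact equation):

```latex
r_{ik} \;=\; q_i(h_i = k) \;=\; \Pr(h_i = k \mid \mathbf{x}_i)
\;=\; \frac{\lambda_k \, \mathrm{Norm}_{\mathbf{x}_i}[\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k]}
           {\sum_{j=1}^{K} \lambda_j \, \mathrm{Norm}_{\mathbf{x}_i}[\boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j]}
```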
27. M-step. Take the derivative, equate it to zero and solve (using a Lagrange multiplier to make the weights λ sum to one).
28. M-step. Update the means, covariances and weights according to the responsibilities of the data points.
29. Iterate until no further improvement: E-step, M-step, E-step, M-step, ...
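The alternation above can be sketched in code. This is a minimal 1-D illustration, assuming NumPy is available; the names (`em_mog`, `lam`, `mu`, `sigma2`) are mine, not from the book's materials, and the quantile-based initialization is a convenience, not the random restarts the slides describe.

```python
import numpy as np

def em_mog(x, K, n_iter=50):
    """Fit a 1-D mixture of K Gaussians by EM (illustrative sketch)."""
    n = len(x)
    lam = np.full(K, 1.0 / K)                      # mixture weights
    mu = np.quantile(x, (np.arange(K) + 0.5) / K)  # spread initial means over the data
    sigma2 = np.full(K, np.var(x))                 # broad initial variances
    for _ in range(n_iter):
        # E-step: responsibility r[i, k] of the kth Gaussian for datum i.
        log_p = (np.log(lam)
                 - 0.5 * np.log(2.0 * np.pi * sigma2)
                 - 0.5 * (x[:, None] - mu) ** 2 / sigma2)
        log_p -= log_p.max(axis=1, keepdims=True)  # for numerical stability
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, variances from the responsibilities.
        Nk = r.sum(axis=0)
        lam = Nk / n
        mu = (r * x[:, None]).sum(axis=0) / Nk
        sigma2 = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
    return lam, mu, sigma2
```

On two well-separated clusters the recovered means land near the true centres; as the local-minima slide warns, on harder data the answer depends on initialization.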
30. (figure only)
31. Different flavours: full covariance, diagonal covariance, same covariance.
32. Local minima. Starting from three random positions gives three different solutions: L = 98.76, L = 96.97, L = 94.35.
33. Means of the face/non-face model. Classification: 84% (a 9% improvement!)
34. The plan...
35. Structure (recap of the outline on slide 2)
36. Student t-distributions
37. Student t-distributions: motivation. The normal distribution is not very robust: a single outlier can completely throw it off because the tails fall off so fast. (Figure: normal fits with and without one extra data point, compared with t-distribution fits.)
38. Student t-distributions. The univariate Student t-distribution; the multivariate Student t-distribution.
39. The t-distribution as a marginalization. Define a hidden variable h; the t-distribution can then be expressed as a marginalization over h.
40. Gamma distribution
41. The t-distribution as a marginalization. Define the hidden variable h. Things to note: again this provides a method to sample from the t-distribution; the variable h has a clear interpretation: each datum is drawn from a Gaussian with mean μ whose covariance depends inversely on h. We can think of this as an infinite mixture (the sum becomes an integral) of Gaussians with the same mean but different covariances.
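In symbols, the marginalization sketched on these slides is (reconstructed from the standard scale-mixture result, in the book's style of notation; treat as a sketch):

```latex
\Pr(\mathbf{x}) \;=\; \int \mathrm{Norm}_{\mathbf{x}}\!\left[\boldsymbol{\mu}, \boldsymbol{\Sigma}/h\right]
\mathrm{Gam}_{h}\!\left[\nu/2, \nu/2\right] dh
\;=\; \mathrm{Stud}_{\mathbf{x}}\!\left[\boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu\right]
```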
42. t-distribution
43. EM for t-distributions. Goal: learn the parameters from training data. E-step: maximize the bound w.r.t. the distributions qi(hi). M-step: maximize the bound w.r.t. the parameters θ.
44. E-step. Extract the expectations.
45. M-step. Where...
46. Updates. There is no closed-form solution for ν, so we must optimize the bound directly; since ν is a single number, we can optimize it by simply evaluating the bound on a fine grid of values and picking the best.
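The grid trick for ν can be sketched as follows. To stay self-contained, this evaluates the univariate t log likelihood itself on the grid rather than the full EM bound, and every name (`t_log_pdf`, `best_nu`) is illustrative rather than from the book's code.

```python
import math

def t_log_pdf(x, mu, sigma2, nu):
    """Log density of a univariate Student t with location mu, scale sigma2."""
    z2 = (x - mu) ** 2 / sigma2
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi * sigma2)
            - 0.5 * (nu + 1) * math.log(1 + z2 / nu))

def best_nu(data, mu, sigma2, grid):
    """Evaluate the log likelihood at each nu on the grid; keep the best."""
    return max(grid, key=lambda nu: sum(t_log_pdf(x, mu, sigma2, nu)
                                        for x in data))
```

Data with heavy outliers pushes the chosen ν down (heavier tails), while near-Gaussian data pushes it up.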
47. EM algorithm for t-distributions
48. The plan...
49. Structure (recap of the outline on slide 2)
50. Factor analysis. A compromise between a full covariance matrix (desirable, but many parameters) and a diagonal covariance matrix (no modelling of covariance). It models full covariance in a subspace Φ and mops everything else up with a diagonal component Σ.
51. Data density. Consider modelling the distribution of 100×100×3 RGB images: • Gaussian w/ spherical covariance: 1 covariance parameter • Gaussian w/ diagonal covariance: Dx covariance parameters • full Gaussian: ~Dx² covariance parameters • PPCA: ~DxDh covariance parameters • factor analysis: ~Dx(Dh + 1) covariance parameters (full covariance in a subspace + diagonal).
52. Subspaces
53. Factor analysis (repeat of slide 50)
54. Factor analysis as a marginalization. Let's define the prior over h and the conditional Pr(x|h); then it can be shown (it is not obvious) that the marginal density is again a normal.
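Spelled out (a reconstruction of the standard factor-analysis marginalization in the book's style of notation; treat as a sketch):

```latex
\Pr(\mathbf{h}) = \mathrm{Norm}_{\mathbf{h}}[\mathbf{0}, \mathbf{I}], \qquad
\Pr(\mathbf{x}\mid\mathbf{h}) = \mathrm{Norm}_{\mathbf{x}}[\boldsymbol{\mu} + \boldsymbol{\Phi}\mathbf{h}, \boldsymbol{\Sigma}]
\;\;\Longrightarrow\;\;
\Pr(\mathbf{x}) = \mathrm{Norm}_{\mathbf{x}}\!\left[\boldsymbol{\mu}, \boldsymbol{\Phi}\boldsymbol{\Phi}^{T} + \boldsymbol{\Sigma}\right]
```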
55. Sampling from a factor analyzer
56. Factor analysis vs. MoG
57. E-step. Compute the posterior over the hidden variables, and compute the expectations needed for the M-step.
58. E-step
59. M-step. Optimize the parameters w.r.t. the EM bound, where...
60. M-step. Optimize the parameters w.r.t. the EM bound, where... giving the expressions:
61. Face model
62. Sampling from a 10-parameter model. To generate: choose the factor loadings hi from a standard normal distribution; multiply by the factors Φ; add the mean μ. (One should also add a random noise component εi with diagonal covariance Σ.)
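The generation recipe above can be sketched directly: draw hidden h from a standard normal, multiply by the factors, add the mean, then add diagonal noise. The dimensions (4 observed, 2 hidden) and all names are illustrative toys, assuming NumPy is available.

```python
import numpy as np

def sample_factor_analyzer(mu, Phi, sigma_diag, n, rng):
    """Draw n samples x = mu + Phi h + eps from a factor analyzer."""
    Dx, Dh = Phi.shape
    h = rng.standard_normal((n, Dh))                          # factor loadings
    eps = rng.standard_normal((n, Dx)) * np.sqrt(sigma_diag)  # diagonal noise
    return mu + h @ Phi.T + eps                               # n x Dx samples

rng = np.random.default_rng(0)
mu = np.zeros(4)
Phi = rng.standard_normal((4, 2))          # toy factor matrix
sigma_diag = np.full(4, 0.1)               # diagonal noise variances
x = sample_factor_analyzer(mu, Phi, sigma_diag, 100_000, rng)
# The empirical covariance should approach Phi Phi^T + diag(sigma_diag).
```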
63. Combining models. The three models can be combined in various ways: a mixture of t-distributions = mixture of Gaussians + t-distributions; robust subspace models = t-distributions + factor analysis; a mixture of factor analyzers = mixture of Gaussians + factor analysis. Or combine all of them to create a mixture of robust subspace models.
64. Structure (recap of the outline on slide 2)
65. Expectation maximization. Problem: optimize cost functions of the form shown (discrete case and continuous case). Solution: the expectation maximization (EM) algorithm (Dempster, Laird and Rubin 1977). Key idea: define a lower bound on the log likelihood and increase it at each iteration.
66. Expectation maximization. Defines a lower bound on the log likelihood and increases the bound iteratively.
67. Lower bound. The lower bound is a function of the parameters θ and a set of probability distributions {qi(hi)}.
68. E-step and M-step. E-step: maximize the bound w.r.t. the distributions {qi(hi)}. M-step: maximize the bound w.r.t. the parameters θ.
69. E-step: update {qi(hi)} so that the bound equals the log likelihood for the current θ.
70. M-step: update θ to the maximum of the bound.
71. Missing parts of the argument
72. Missing parts of the argument: 1. show that this is a lower bound; 2. show that the E-step update is correct; 3. show that the M-step update is correct.
73. Jensen's inequality: since log is concave, the log of an expectation is greater than or equal to the expectation of the log.
74. Lower bound for the EM algorithm, where we have used Jensen's inequality.
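The chain of steps, reconstructed in the book's style of notation (a sketch of the standard derivation; the inequality is Jensen's, applied to the concave log):

```latex
\sum_i \log \Pr(\mathbf{x}_i \mid \boldsymbol{\theta})
= \sum_i \log \int q_i(\mathbf{h}_i)\,
      \frac{\Pr(\mathbf{x}_i, \mathbf{h}_i \mid \boldsymbol{\theta})}{q_i(\mathbf{h}_i)} \, d\mathbf{h}_i
\;\ge\; \sum_i \int q_i(\mathbf{h}_i)
      \log \frac{\Pr(\mathbf{x}_i, \mathbf{h}_i \mid \boldsymbol{\theta})}{q_i(\mathbf{h}_i)} \, d\mathbf{h}_i
\;=\; \mathcal{B}\!\left[\{q_i(\mathbf{h}_i)\}, \boldsymbol{\theta}\right]
```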
75. E-step: optimize the bound w.r.t. {qi(hi)}. One term is constant w.r.t. qi(hi); only the other term matters.
76. E-step. The Kullback-Leibler divergence is a measure of "distance" between probability distributions; we are maximizing the negative distance (i.e. minimizing the distance).
77. Use the relation log[y] <= y - 1.
78. Kullback-Leibler divergence. So this cost function must be non-negative; in other words, the best we can do is to choose qi(hi) so that it is zero.
79. E-step. So the cost function must be non-negative, and the best we can do is choose qi(hi) so that it is zero. How can we do this? Easy: choose the posterior Pr(hi|xi, θ).
80. M-step: optimize the bound w.r.t. θ. In the M-step we optimize the expected joint log likelihood with respect to the parameters θ (the expectation is taken w.r.t. the distributions from the E-step).
81. E-step and M-step (recap of slide 17)
82. Structure (recap of the outline on slide 2)
83. Face detection. (Not a very realistic model: face detection is really done using discriminative methods.)
84. Object recognition. Model the object as J independent patches; compare t-distributions against normal distributions.
85. Segmentation
86. Face recognition
87. Pose regression. Training data: frontal faces x and non-frontal faces w. Learning: fit a factor analysis model to the joint distribution of x and w. Inference: given a new x, infer w (the conditional distribution of a normal).
88. Conditional distributions. If the joint distribution of two variables is normal, then the conditional distribution of one given the other is also normal.
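The "if... then..." on this slide is the standard conditional of a joint normal (reconstructed; a sketch, not the slide's exact typesetting):

```latex
\Pr\!\left(\begin{bmatrix}\mathbf{x}\\ \mathbf{w}\end{bmatrix}\right)
= \mathrm{Norm}\!\left[
  \begin{bmatrix}\boldsymbol{\mu}_x\\ \boldsymbol{\mu}_w\end{bmatrix},
  \begin{bmatrix}\boldsymbol{\Sigma}_{xx} & \boldsymbol{\Sigma}_{xw}\\
                 \boldsymbol{\Sigma}_{wx} & \boldsymbol{\Sigma}_{ww}\end{bmatrix}\right]
\;\;\Longrightarrow\;\;
\Pr(\mathbf{w}\mid\mathbf{x})
= \mathrm{Norm}_{\mathbf{w}}\!\left[
  \boldsymbol{\mu}_w + \boldsymbol{\Sigma}_{wx}\boldsymbol{\Sigma}_{xx}^{-1}(\mathbf{x}-\boldsymbol{\mu}_x),\;
  \boldsymbol{\Sigma}_{ww} - \boldsymbol{\Sigma}_{wx}\boldsymbol{\Sigma}_{xx}^{-1}\boldsymbol{\Sigma}_{xw}\right]
```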
89. Modeling transformations