Expectation Maximization and Gaussian Mixture Models


  1. Expectation Maximization and Mixture of Gaussians
  2. Recommend me some music! Discover groups of similar songs… My Music Collection: Only my railgun (bpm 120), Bach Sonata #1 (bpm 60), and other tracks at bpm 125 and bpm 90.
  3. Recommend me some music! Discover groups of similar songs… The collection falls into groups around bpm 120 and bpm 60.
  4. K-means: an unsupervised classification method
  5. 1. Initialize K “means” µk, one for each class (e.g. use random starting points, or choose K random points from the set). Here K = 2.
  6.–14. 2. Phase 1: Assign each point to the closest mean µk. 3. Phase 2: Update the means of the new clusters. (These two phases repeat across slides 6–14 as the algorithm iterates.)
  15. 4. When the means do not change anymore, the clustering is DONE.
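To make slides 5–15 concrete, here is a minimal NumPy sketch of the K-means loop. The function name, the random initialization, and the convergence test are my own choices, and it omits the empty-cluster edge case.

```python
import numpy as np

def kmeans(X, K, n_iter=100, rng=np.random.default_rng(0)):
    """K-means: alternate Phase 1 (assign) and Phase 2 (update means)."""
    # 1. Initialize K "means" by choosing K random points from the set
    means = X[rng.choice(len(X), K, replace=False)]
    for _ in range(n_iter):
        # Phase 1: assign each point to the closest mean
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Phase 2: update the means of the new clusters
        new_means = np.array([X[labels == k].mean(axis=0) for k in range(K)])
        # 4. When the means do not change anymore, clustering is DONE
        if np.allclose(new_means, means):
            break
        means = new_means
    return means, labels
```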
  16. In K-means, a point can only have one class. But what about points that lie in between groups? e.g. Jazz + Classical.
  17. The famous “GMM”: Gaussian Mixture Model
  18. p(X) = N(X | µ, Σ), with mean µ and variance Σ. Gaussian == “Normal” distribution.
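The density behind the N(X | µ, Σ) notation is not spelled out on the slide; for a D-dimensional X it is the standard multivariate normal:

```latex
\mathcal{N}(X \mid \mu, \Sigma) =
  \frac{1}{(2\pi)^{D/2}\,|\Sigma|^{1/2}}
  \exp\!\Big(-\tfrac{1}{2}(X-\mu)^{\top}\Sigma^{-1}(X-\mu)\Big)
```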
  19. p(X) = N(X | µ, Σ) + N(X | µ, Σ): add two Gaussians together.
  20. p(X) = N(X | µ1, Σ1) + N(X | µ2, Σ2). Example: two components with different variances.
  21. p(X) = π1 N(X | µ1, Σ1) + π2 N(X | µ2, Σ2), where the mixing coefficients satisfy Σ_{k=1..K} πk = 1. Example: π1 = 0.7, π2 = 0.3.
  22. p(X) = Σ_{k=1}^{K} πk N(X | µk, Σk). Example: K = 2.
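As a sketch of evaluating this K = 2 mixture in code, with illustrative weights, means, and covariances of my own choosing:

```python
import numpy as np
from scipy.stats import multivariate_normal

pi = [0.7, 0.3]                              # mixing coefficients, sum to 1
means = [np.zeros(2), np.array([3.0, 3.0])]  # one mean per Gaussian
covs = [np.eye(2), 0.5 * np.eye(2)]          # one covariance per Gaussian

def mixture_density(x):
    """p(x) = sum_k pi_k * N(x | mu_k, Sigma_k)."""
    return sum(p * multivariate_normal.pdf(x, mean=m, cov=c)
               for p, m, c in zip(pi, means, covs))

print(mixture_density([1.0, 1.0]))
```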
  23.–24. K-means is a classifier. A Mixture of Gaussians is a probability model. We can USE it as a “soft” classifier.
  25. Parameters to fit to data. K-means: mean µk. Mixture of Gaussians: mean µk, covariance Σk, and mixing coefficient πk.
  26. EM for GMM
  27. 1. Initialize means µk. 2. E Step: assign each point to a cluster. 3. M Step: given the clusters, refine the mean µk of each cluster k. 4. Stop when the change in the means is small.
  28. 1. Initialize Gaussian* parameters: means µk, covariances Σk, and mixing coefficients πk. 2. E Step: assign each point Xn an assignment score γ(znk) for each cluster k (e.g. 0.5 / 0.5). 3. M Step: given the scores, adjust µk, Σk, πk for each cluster k. 4. Evaluate the likelihood. If the likelihood or the parameters converge, stop. (*There are K Gaussians.)
  29. 1. Initialize µk, Σk, πk, one for each Gaussian k. Tip! Use the K-means result to initialize: µk ← µk, Σk ← cov(cluster(k)), πk ← (number of points in k) / (total number of points).
  30. 2. E Step: For each point Xn, determine its assignment score to each Gaussian k (e.g. 0.7 / 0.3). The latent variable γ(znk) is called a “responsibility”: how much is this Gaussian k responsible for this point Xn?
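The equation image on this slide did not survive extraction; the standard responsibility it describes is the posterior probability of component k given Xn:

```latex
\gamma(z_{nk}) =
  \frac{\pi_k\,\mathcal{N}(X_n \mid \mu_k, \Sigma_k)}
       {\sum_{j=1}^{K} \pi_j\,\mathcal{N}(X_n \mid \mu_j, \Sigma_j)}
```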
  31. 3. M Step: For each Gaussian k, update the parameters using the new γ(znk), the responsibility for each Xn. Mean of Gaussian k: find the mean that “fits” the assignment scores best.
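The mean update shown on this slide is the responsibility-weighted average of the points:

```latex
N_k = \sum_{n=1}^{N} \gamma(z_{nk}), \qquad
\mu_k^{\text{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\,X_n
```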
  32. 3. M Step: For each Gaussian k, update the parameters using the new γ(znk). Covariance matrix of Gaussian k, using the mean we just calculated.
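The covariance update, plugging in the new mean from the previous slide:

```latex
\Sigma_k^{\text{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})
  \,(X_n - \mu_k^{\text{new}})(X_n - \mu_k^{\text{new}})^{\top}
```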
  33. 3. M Step: For each Gaussian k, update the parameters using the new γ(znk). Mixing coefficient for Gaussian k: effective number of points in k over the total number of points (e.g. 105.6 / 200).
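In symbols, with N the total number of points:

```latex
\pi_k^{\text{new}} = \frac{N_k}{N}
```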
  34. 4. Evaluate the log likelihood. If the likelihood or the parameters converge, stop; else go to Step 2 (E Step). The likelihood is the probability that the data X was generated by the parameters you found, i.e. correctness!
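Putting slides 28–34 together, a minimal NumPy sketch of EM for a GMM. Names and the random initialization are my own, and for brevity it skips numerical safeguards such as covariance regularization and log-sum-exp:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=100, tol=1e-6, rng=np.random.default_rng(0)):
    """EM for a Gaussian Mixture Model (slides 28-34)."""
    N, D = X.shape
    # 1. Initialize: K random points as means, identity covariances, uniform weights
    mu = X[rng.choice(N, K, replace=False)].copy()
    sigma = np.array([np.eye(D) for _ in range(K)])
    pi = np.full(K, 1.0 / K)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # 2. E step: responsibility gamma[n, k] of Gaussian k for point X_n
        weighted = np.column_stack([
            pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=sigma[k])
            for k in range(K)])
        gamma = weighted / weighted.sum(axis=1, keepdims=True)
        # 3. M step: refit mu, sigma, pi from the responsibilities
        Nk = gamma.sum(axis=0)                      # effective points per cluster
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
        pi = Nk / N
        # 4. Log likelihood: ln p(X) = sum_n ln sum_k pi_k N(X_n | mu_k, sigma_k)
        ll = np.log(weighted.sum(axis=1)).sum()
        if abs(ll - prev_ll) < tol:                 # converged: stop
            break
        prev_ll = ll
    return mu, sigma, pi, gamma
```

In practice one would initialize from a K-means run, as slide 29 suggests, rather than from random points.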
  35. (Figure only; no text on this slide.)
  36. General EM, with observed variables X and hidden variables Z: 1. Initialize parameters θ_old. 2. E Step: evaluate p(Z | X, θ_old). 3. M Step: evaluate θ_new = argmax_θ Q(θ, θ_old), where the expected complete-data log likelihood is Q(θ, θ_old) = Σ_Z p(Z | X, θ_old) ln p(X, Z | θ). 4. Evaluate the log likelihood. If the likelihood or the parameters converge, stop; else θ_old ← θ_new and go to the E Step.
  37. K-means can be formulated as EM; EM for Gaussian Mixtures; EM for Bernoulli Mixtures; EM for Bayesian Linear Regression.
  38. “Expectation”: calculate the fixed, data-dependent parameters of the function Q. “Maximization”: once the parameters of Q are known, it is fully determined, so now we can maximize it.
  39. We learned how to cluster data in an unsupervised manner. Gaussian Mixture Models are useful for modeling data with “soft” cluster assignments. Expectation Maximization is a method used when we have a model with latent variables (values we don’t know, but estimate with each step).
  40. My question: what other applications could use EM? How about EM of GMMs?
