# Expectation Maximization and Gaussian Mixture Models

### Comments

• These slides are great. Clearly explained in a very pragmatic manner. How does one deal with non-invertible covariance matrices?
• @Wesley Kyle: Generally the covariance matrix is square and invertible. However, for matrices that are not invertible, a pseudo-inverse can be used.
• Very nice, fast and simple explanation!
• Awesome, thanks!
### Expectation Maximization and Gaussian Mixture Models

1. Expectation Maximization and Mixture of Gaussians
2. Motivating example: “Recommend me some music!” and “Discover groups of similar songs…”. My Music Collection contains songs tagged with tempo, e.g. Only my railgun (bpm 120), Bach Sonata #1 (bpm 60), and others at bpm 125 and bpm 90.
3. The same collection, now with the songs grouped by tempo: one group around bpm 120 (e.g. Only my railgun) and one around bpm 60 (e.g. Bach Sonata #1).
4. An unsupervised classification method: K-means.
5. Step 1: Initialize K “means” µk, one for each class, e.g. using random starting points or choosing K random points from the set. (Example: K = 2, giving µ1 and µ2.)
6.–14. Step 2, Phase 1: Assign each point to the closest mean µk (a hard 0/1 assignment). Step 3, Phase 2: Update the means of the new clusters. (Slides 6–14 repeat these two phases over successive iterations.)
15. Step 4: When the means no longer change, the clustering is done. (See the code sketch below.)
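
A minimal NumPy sketch of the K-means loop from slides 5–15. The function name and structure are mine, not from the slides; it assumes Euclidean distance and that no cluster goes empty during the updates.

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Plain K-means on an (N, D) data matrix X."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize the K means by choosing K random points from the set.
    means = X[rng.choice(len(X), size=K, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iters):
        # Phase 1: assign each point to its closest mean (hard 0/1 assignment).
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Phase 2: update each mean to the centroid of its new cluster.
        # (Assumes every cluster keeps at least one point.)
        new_means = np.array([X[labels == k].mean(axis=0) for k in range(K)])
        # Step 4: stop when the means no longer change.
        if np.allclose(new_means, means):
            break
        means = new_means
    return means, labels
```
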
16. In K-means, a point can only have one class. But what about points that lie in between groups, e.g. Jazz + Classical?
17. The famous “GMM”: Gaussian Mixture Model
18. p(X) = N(X | µ, Σ), where µ is the mean and Σ the variance. Gaussian == “Normal” distribution.
19. p(X) = N(X | µ, Σ) + N(X | µ, Σ)
20. p(X) = N(X | µ1, Σ1) + N(X | µ2, Σ2). (The example figure shows two Gaussians with different means and variances.)
21. p(X) = π1 N(X | µ1, Σ1) + π2 N(X | µ2, Σ2), where the πk are mixing coefficients with ∑ₖ πk = 1. Example: π1 = 0.7, π2 = 0.3.
22. In general, for K components: p(X) = ∑ₖ πk N(X | µk, Σk), k = 1…K. Example: K = 2. (A sketch of evaluating this density follows below.)
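
A small sketch of evaluating this mixture density with SciPy, using the slide's example weights π1 = 0.7, π2 = 0.3; the means and variances are invented for illustration (a slow cluster around bpm 60 and a fast one around bpm 120):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Example parameters: pi_1 = 0.7, pi_2 = 0.3 as on the slide; the bpm means
# and variances below are made up for illustration.
pis   = [0.7, 0.3]
means = [np.array([60.0]), np.array([120.0])]
covs  = [np.array([[25.0]]), np.array([[100.0]])]

def mixture_pdf(x, pis, means, covs):
    """p(x) = sum_k pi_k * N(x | mu_k, Sigma_k)"""
    return sum(pi * multivariate_normal.pdf(x, mean=m, cov=c)
               for pi, m, c in zip(pis, means, covs))

print(mixture_pdf(np.array([70.0]), pis, means, covs))  # density at bpm 70
```
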
23.–24. K-means is a classifier; a Mixture of Gaussians is a probability model. But we can USE a Mixture of Gaussians as a “soft” classifier.
25. Parameters to fit to the data. K-means: the mean µk of each cluster. Mixture of Gaussians: the mean µk, covariance Σk, and mixing coefficient πk of each Gaussian.
26. EM for GMM
27. K-means, restated: 1. Initialize the means µk. 2. E Step: assign each point to a cluster (a hard 0/1 assignment). 3. M Step: given the clusters, refine the mean µk of each cluster k. 4. Stop when the change in the means is small.
28. EM for GMM: 1. Initialize the Gaussian* parameters: means µk, covariances Σk, and mixing coefficients πk. 2. E Step: assign each point Xn an assignment score γ(znk) for each cluster k (a soft score, e.g. 0.5/0.5). 3. M Step: given the scores, adjust µk, Σk, πk for each cluster k. 4. Evaluate the likelihood; if the likelihood or the parameters converge, stop. (*There are K Gaussians.)
29. Step 1: Initialize µk, Σk, πk, one set for each Gaussian k. Tip: use a K-means result to initialize: µk ← µk (the K-means mean), Σk ← cov(cluster(k)), πk ← (number of points in k) / (total number of points). (See the sketch below.)
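
A sketch of that initialization tip, assuming X is the (N, D) data matrix and labels is a K-means assignment like the one returned by the kmeans sketch earlier (the function name is mine):

```python
import numpy as np

def init_from_kmeans(X, labels, K):
    """Initialize GMM parameters from a K-means clustering, per the slide's tip."""
    pis, means, covs = [], [], []
    for k in range(K):
        Xk = X[labels == k]                    # points assigned to cluster k
        means.append(Xk.mean(axis=0))          # mu_k <- K-means mean of cluster k
        covs.append(np.cov(Xk, rowvar=False))  # Sigma_k <- cov(cluster(k)); needs >= 2 points
        pis.append(len(Xk) / len(X))           # pi_k <- points in k / total points
    return np.array(pis), np.array(means), np.array(covs)
```
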
30. Step 2, the E Step: for each point Xn, determine its assignment score to each Gaussian k (e.g. 0.7/0.3). γ(znk) = πk N(Xn | µk, Σk) / ∑j πj N(Xn | µj, Σj) is called a “responsibility”: how much is this Gaussian k responsible for this point Xn? (znk is the latent variable.)
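
A sketch of the E step under the same assumptions (X is (N, D); pis, means, covs hold the current parameters); the normalization over components is the standard responsibility formula:

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, pis, means, covs):
    """Responsibilities gamma(z_nk), shape (N, K), each row summing to 1."""
    # Unnormalized score: how strongly Gaussian k "claims" each point.
    resp = np.column_stack([
        pis[k] * multivariate_normal.pdf(X, mean=means[k], cov=covs[k])
        for k in range(len(pis))
    ])
    # Normalize across the K Gaussians (e.g. the .7/.3 split on the slide).
    return resp / resp.sum(axis=1, keepdims=True)
```
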
31. Step 3, the M Step: for each Gaussian k, update the parameters using the new γ(znk). Mean of Gaussian k: µk = (1/Nk) ∑n γ(znk) Xn, with Nk = ∑n γ(znk); i.e. find the mean that “fits” the assignment scores best.
32. M Step, continued: covariance matrix of Gaussian k: Σk = (1/Nk) ∑n γ(znk) (Xn − µk)(Xn − µk)ᵀ, using the mean µk just calculated.
33. M Step, continued: mixing coefficient of Gaussian k: πk = Nk / N, the effective number of points assigned to k over the total number of points (e.g. 105.6/200). (The three updates are sketched together below.)
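
The three M-step updates from slides 31–33 in one sketch (these are the standard GMM update equations; the variable names are mine):

```python
import numpy as np

def m_step(X, resp):
    """Update pi_k, mu_k, Sigma_k from responsibilities resp = gamma(z_nk), shape (N, K)."""
    N, D = X.shape
    Nk = resp.sum(axis=0)                  # effective number of points per Gaussian
    pis = Nk / N                           # mixing coefficients, e.g. 105.6 / 200
    means = (resp.T @ X) / Nk[:, None]     # responsibility-weighted means
    covs = np.empty((len(Nk), D, D))
    for k in range(len(Nk)):
        d = X - means[k]                   # deviations from the new mean
        covs[k] = (resp[:, k, None] * d).T @ d / Nk[k]  # weighted covariance
    return pis, means, covs
```
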
34. Step 4: Evaluate the log likelihood, ln p(X) = ∑n ln ∑k πk N(Xn | µk, Σk). If the likelihood or the parameters converge, stop; else go to Step 2 (the E step). The likelihood is the probability that the data X was generated by the parameters you found, i.e. correctness! (See the loop sketch below.)
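
A sketch of the convergence check and the outer EM loop, assuming the e_step and m_step sketches above are in scope (the tolerance and iteration cap are arbitrary choices of mine):

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(X, pis, means, covs):
    """ln p(X) = sum_n ln( sum_k pi_k N(x_n | mu_k, Sigma_k) )"""
    density = sum(pis[k] * multivariate_normal.pdf(X, mean=means[k], cov=covs[k])
                  for k in range(len(pis)))
    return np.log(density).sum()

def fit_gmm(X, pis, means, covs, tol=1e-6, max_iters=200):
    """Alternate E and M steps until the log likelihood stops improving."""
    ll_old = -np.inf
    for _ in range(max_iters):
        resp = e_step(X, pis, means, covs)        # Step 2: E step
        pis, means, covs = m_step(X, resp)        # Step 3: M step
        ll = log_likelihood(X, pis, means, covs)  # Step 4: evaluate
        if abs(ll - ll_old) < tol:                # converged: stop
            break
        ll_old = ll
    return pis, means, covs, ll
```
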
35. (Figure only.)
36. The general EM algorithm: 1. Initialize the parameters θ_old. 2. E Step: evaluate p(Z | X, θ_old), where Z are the hidden (latent) variables and X the observed variables. 3. M Step: evaluate θ_new = argmax over θ of Q(θ, θ_old), the expected complete-data log likelihood (written out below). 4. Evaluate the log likelihood; if the likelihood or the parameters converge, stop, else set θ_old ← θ_new and go to the E step.
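
Written out, the quantity maximized in the M step (the slide's equations did not survive extraction; this is the standard formulation, as in Bishop's PRML §9.4):

```latex
Q(\theta, \theta^{\text{old}}) = \sum_{Z} p(Z \mid X, \theta^{\text{old}}) \,\ln p(X, Z \mid \theta),
\qquad
\theta^{\text{new}} = \arg\max_{\theta} Q(\theta, \theta^{\text{old}})
```
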
37. K-means can be formulated as EM; EM for Gaussian Mixtures; EM for Bernoulli Mixtures; EM for Bayesian Linear Regression.
38. “Expectation”: calculate the fixed, data-dependent parameters of the function Q. “Maximization”: once the parameters of Q are known, Q is fully determined, so now we can maximize it.
39. Summary: we learned how to cluster data in an unsupervised manner; Gaussian Mixture Models are useful for modeling data with “soft” cluster assignments; Expectation Maximization is a method used when we have a model with latent variables (values we don’t know, but estimate with each step).
40. My question: what other applications could use EM? How about EM of GMMs?