Expectation Maximization and Mixture of Gaussians




1. Expectation Maximization and Mixture of Gaussians
2. "Recommend me some music!" Discover groups of similar songs in My Music Collection. [Figure: songs with tempo labels, e.g. Only my railgun (bpm 120), Bach Sonata #1 (bpm 60)]
3. "Recommend me some music!" Discover groups of similar songs. [Figure: the same collection grouped into a fast (bpm 120) cluster and a slow (bpm 60) cluster]
4. K-means: an unsupervised classifying method
5. Step 1: Initialize K "means" μ_k, one for each class. E.g. use random starting points, or choose K random points from the set. (Example: K = 2, with means μ_1 and μ_2.)
6. Step 2, Phase 1: Assign each point to the closest mean μ_k. Step 3, Phase 2: Update the means of the new clusters.
7. to 14. Phases 1 and 2 repeat: each point is reassigned to its closest mean μ_k, then the means of the new clusters are updated. [Figures: successive K-means iterations, with the two means moving toward the two groups of points]
15. Step 4: When the means do not change anymore → clustering DONE.
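Taken together, slides 5-15 describe the standard K-means loop. Here is a minimal sketch in NumPy, assuming Euclidean distance and the "choose K random points" initialization from slide 5 (the function and variable names are mine, not from the slides):

```python
import numpy as np

def kmeans(X, K, max_iters=100, seed=0):
    """X: (N, D) array of points; returns the K means and a label per point."""
    rng = np.random.default_rng(seed)
    # Step 1: choose K random points from the set as the initial means.
    means = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(max_iters):
        # Phase 1: assign each point to its closest mean.
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Phase 2: update each mean to the centroid of its new cluster.
        new_means = np.array([X[labels == k].mean(axis=0) for k in range(K)])
        # Step 4: when the means do not change anymore, clustering is DONE.
        if np.allclose(new_means, means):
            break
        means = new_means
    return means, labels

# e.g. two tempo clusters, as in the music example:
means, labels = kmeans(np.array([[60.], [62.], [118.], [125.]]), K=2)
```

A robust implementation would also guard against a cluster becoming empty; that is omitted here to keep the two phases easy to see.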
16. In K-means, a point can only have one class. But what about points that lie in between groups, e.g. Jazz + Classical?
17. The famous "GMM": Gaussian Mixture Model
18. p(X) = N(X | μ, Σ), with mean μ and variance Σ. Gaussian == "Normal" distribution.
19. Adding two Gaussians: p(X) = N(X | μ, Σ) + N(X | μ, Σ)
20. p(X) = N(X | μ_1, Σ_1) + N(X | μ_2, Σ_2). [Figure: example of two components with different means and variances]
21. p(X) = π_1 N(X | μ_1, Σ_1) + π_2 N(X | μ_2, Σ_2), where the mixing coefficients satisfy Σ_{k=1}^{K} π_k = 1. Example: π_1 = 0.7, π_2 = 0.3.
22. In general: p(X) = Σ_{k=1}^{K} π_k N(X | μ_k, Σ_k). Example: K = 2.
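As a quick check of the formula on slide 22, here is a sketch that evaluates a 1-D, K = 2 mixture density with SciPy; the π values are the 0.7/0.3 example from slide 21, while the means and standard deviations are made-up bpm-style numbers:

```python
import numpy as np
from scipy.stats import norm

pi = np.array([0.7, 0.3])      # mixing coefficients, sum to 1
mu = np.array([60.0, 120.0])   # component means (illustrative bpm values)
sd = np.array([5.0, 10.0])     # component standard deviations

def gmm_pdf(x):
    # p(x) = sum_k pi_k N(x | mu_k, sigma_k)
    return sum(p * norm.pdf(x, loc=m, scale=s) for p, m, s in zip(pi, mu, sd))

print(gmm_pdf(65.0))  # higher density near the bpm-60 component
```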
23.-24. K-means is a classifier; a Mixture of Gaussians is a probability model. We can USE it as a "soft" classifier.
25. Parameter to fit to data for K-means: the mean μ_k. Parameters to fit for a Mixture of Gaussians: the mean μ_k, the covariance Σ_k, and the mixing coefficient π_k.
26. EM for GMM
27. 1. Initialize means μ_k. 2. E Step: assign each point to a cluster. 3. M Step: given the clusters, refine the mean μ_k of each cluster k. 4. Stop when the change in means is small.
28. 1. Initialize the Gaussian* parameters: means μ_k, covariances Σ_k, and mixing coefficients π_k. 2. E Step: assign each point X_n an assignment score γ(z_nk) for each cluster k (e.g. 0.5/0.5). 3. M Step: given the scores, adjust μ_k, Σ_k, π_k for each cluster k. 4. Evaluate the likelihood. If the likelihood or parameters converge, stop. (*There are K Gaussians.)
29. Step 1: Initialize μ_k, Σ_k, π_k, one set for each Gaussian k. Tip: use a K-means result to initialize: μ_k ← the K-means mean, Σ_k ← cov(cluster k), π_k ← (number of points in cluster k) / (total number of points). (A sketch of this tip follows.)
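A sketch of that initialization tip, reusing the output of any K-means run; the helper name and array shapes are my own:

```python
import numpy as np

def init_from_kmeans(X, means, labels, K):
    """X: (N, D) points; means/labels: a K-means result; returns (mus, sigmas, pis)."""
    mus = means.copy()                                        # mu_k <- K-means mean
    sigmas = np.array([np.cov(X[labels == k], rowvar=False)   # Sigma_k <- cov(cluster k)
                       for k in range(K)])
    pis = np.array([(labels == k).mean()                      # pi_k <- points in k / total
                    for k in range(K)])
    return mus, sigmas, pis
```

This assumes every cluster has at least two points so the covariance is defined.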
30. Step 2 (E Step): For each point X_n, determine its assignment score to each Gaussian k: γ(z_nk) = π_k N(X_n | μ_k, Σ_k) / Σ_j π_j N(X_n | μ_j, Σ_j). Here z_nk is a latent variable, and γ(z_nk) is called a "responsibility": how much is this Gaussian k responsible for this point X_n? (e.g. .7 and .3 for a point between two Gaussians)
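A sketch of the E step with SciPy's multivariate normal density (function names are mine; mus, sigmas, pis are the current parameters):

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, mus, sigmas, pis):
    """Return the (N, K) responsibilities gamma(z_nk)."""
    K = len(pis)
    # Unnormalized scores pi_k * N(X_n | mu_k, Sigma_k) for every point and Gaussian.
    scores = np.column_stack([
        pis[k] * multivariate_normal.pdf(X, mean=mus[k], cov=sigmas[k])
        for k in range(K)
    ])
    # Normalize each row: the responsibilities for a point sum to 1 (e.g. .7 and .3).
    return scores / scores.sum(axis=1, keepdims=True)
```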
31. Step 3 (M Step): For each Gaussian k, update the parameters using the new γ(z_nk). Mean of Gaussian k: μ_k = (1/N_k) Σ_n γ(z_nk) X_n, where N_k = Σ_n γ(z_nk) is the total responsibility collected by this Gaussian. Find the mean that "fits" the assignment scores best.
32. Covariance matrix of Gaussian k: Σ_k = (1/N_k) Σ_n γ(z_nk) (X_n − μ_k)(X_n − μ_k)^T, using the μ_k just calculated!
33. Mixing coefficient of Gaussian k: π_k = N_k / (total # of points), e.g. 105.6/200.
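The three updates from slides 31-33 in one M-step sketch (again with my own names; gamma is the (N, K) responsibility matrix from the E step):

```python
import numpy as np

def m_step(X, gamma):
    """Return updated (mus, sigmas, pis) from points X (N, D) and responsibilities."""
    N, K = gamma.shape
    Nk = gamma.sum(axis=0)                 # effective number of points per Gaussian
    mus = (gamma.T @ X) / Nk[:, None]      # mu_k = sum_n gamma(z_nk) X_n / N_k
    sigmas = []
    for k in range(K):
        d = X - mus[k]                     # deviations from the just-calculated mean
        # Sigma_k = sum_n gamma(z_nk) (X_n - mu_k)(X_n - mu_k)^T / N_k
        sigmas.append((gamma[:, k, None] * d).T @ d / Nk[k])
    pis = Nk / N                           # pi_k = N_k / N (e.g. 105.6/200)
    return mus, np.array(sigmas), pis
```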
34. Step 4: Evaluate the log likelihood ln p(X) = Σ_n ln [Σ_k π_k N(X_n | μ_k, Σ_k)]. If the likelihood or parameters converge, stop. Else go to Step 2 (E step). Likelihood is the probability that the data X was generated by the parameters you found, i.e. correctness!
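A matching sketch of the convergence check, reusing the hypothetical e_step/m_step helpers above:

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(X, mus, sigmas, pis):
    # ln p(X) = sum_n ln sum_k pi_k N(X_n | mu_k, Sigma_k)
    scores = np.column_stack([
        pis[k] * multivariate_normal.pdf(X, mean=mus[k], cov=sigmas[k])
        for k in range(len(pis))
    ])
    return np.log(scores.sum(axis=1)).sum()

# EM loop: alternate E and M steps until the log likelihood stops improving.
# ll_old = -np.inf
# while True:
#     gamma = e_step(X, mus, sigmas, pis)
#     mus, sigmas, pis = m_step(X, gamma)
#     ll_new = log_likelihood(X, mus, sigmas, pis)
#     if abs(ll_new - ll_old) < 1e-6:
#         break
#     ll_old = ll_new
```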
36. General EM: 1. Initialize parameters θ_old. 2. E Step: evaluate p(Z | X, θ_old), where Z are the hidden variables and X the observed variables. 3. M Step: evaluate θ_new = argmax_θ Q(θ, θ_old), where Q(θ, θ_old) = Σ_Z p(Z | X, θ_old) ln p(X, Z | θ) is the expected complete-data log likelihood. 4. Evaluate the log likelihood. If the likelihood or parameters converge, stop. Else θ_old ← θ_new and go to the E Step.
37. • K-means can be formulated as EM • EM for Gaussian Mixtures • EM for Bernoulli Mixtures • EM for Bayesian Linear Regression
38. "Expectation": calculate the fixed, data-dependent parameters of the function Q. "Maximization": once the parameters of Q are known, it is fully determined, so now we can maximize Q.
39. We learned how to cluster data in an unsupervised manner. Gaussian Mixture Models are useful for modeling data with "soft" cluster assignments. Expectation Maximization is a method used when we have a model with latent variables (values we don't know, but estimate with each step).
40. My question: What other applications could use EM? How about EM of GMMs?
