
Machine Learning

  1. Machine Learning. Devdatt Dubhashi, Department of Computer Science and Engineering, Chalmers University, Gothenburg, Sweden. LP3 2007. Dubhashi, Machine Learning
  2. Outline: 1. k-Means Clustering; 2. Mixtures of Gaussians and EM Algorithm.
  3. Outline: 1. k-Means Clustering; 2. Mixtures of Gaussians and EM Algorithm.
  4. Clustering. Data set {x_1, · · · , x_N} of N observations of a random d-dimensional Euclidean variable x. The goal is to partition the data set into K clusters (K known). Intuitively, the points within a cluster must be "close" to each other compared to points outside the cluster.
  5. Cluster centers and assignments. Find a set of centers µ_k, k ∈ [K]. Assign each data point to one of the centers so as to minimize the sum of the squares of the distances to the assigned centers.
  6. Assignment and Distortion. Introduce binary indicator variables r_n,k := 1 if x_n is assigned to µ_k, and 0 otherwise. Minimize the distortion measure J := Σ_{n∈[N]} Σ_{k∈[K]} r_n,k ||x_n − µ_k||².
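The distortion measure can be evaluated in a few lines of Python. This is a 1-D toy sketch; the data set, centers and assignment matrix below are made up for illustration only.

```python
# Distortion J = sum over n,k of r[n][k] * ||x_n - mu_k||^2, here for 1-D data.
# The data, centers and assignments are illustrative, not from the slides.
def distortion(xs, mus, r):
    return sum(r[n][k] * (xs[n] - mus[k]) ** 2
               for n in range(len(xs))
               for k in range(len(mus)))

xs = [0.0, 1.0, 9.0, 10.0]                # four 1-D data points
mus = [0.5, 9.5]                          # two cluster centers
r = [[1, 0], [1, 0], [0, 1], [0, 1]]      # each point assigned to its nearest center
print(distortion(xs, mus, r))             # 4 * 0.25 = 1.0
```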
  7. Two Step Optimization. Start with some initial values of µ_k. The basic iteration consists of two steps, repeated until convergence. E: Minimize J wrt r_n,k keeping µ_k fixed. M: Minimize J wrt µ_k keeping r_n,k fixed.
  8. Two Step Optimization: E Step. Minimize J wrt r_n,k keeping µ_k fixed: r_n,k := 1 if k = argmin_j ||x_n − µ_j||², and 0 otherwise.
  9. Two Step Optimization: M Step. Minimize J wrt µ_k keeping r_n,k fixed: J is a quadratic function of µ_k, so setting the derivative to zero gives Σ_{n∈[N]} r_n,k (x_n − µ_k) = 0, hence µ_k = (Σ_n r_n,k x_n) / (Σ_n r_n,k). In words: set µ_k to be the mean of the points assigned to cluster k, hence the name K-Means Algorithm.
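The two-step iteration can be sketched as a minimal k-means loop. This is pure Python on 1-D toy data for brevity; it assumes every cluster always keeps at least one point, which a real implementation must guard against.

```python
# Minimal k-means sketch (1-D). Alternates assignment (E) and mean
# update (M) for a fixed number of iterations; toy data, no empty-cluster guard.
def kmeans(xs, mus, iters=10):
    for _ in range(iters):
        # E step: assign each point to its nearest center
        assign = [min(range(len(mus)), key=lambda k: (x - mus[k]) ** 2)
                  for x in xs]
        # M step: move each center to the mean of its assigned points
        mus = [sum(x for x, a in zip(xs, assign) if a == k) /
               sum(1 for a in assign if a == k)
               for k in range(len(mus))]
    return mus, assign

xs = [0.0, 1.0, 2.0, 9.0, 10.0, 11.0]
mus, assign = kmeans(xs, [0.0, 5.0])
print(mus)     # [1.0, 10.0]
print(assign)  # [0, 0, 0, 1, 1, 1]
```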
  10. K-Means Algorithm Analysis. Since J decreases at each iteration, convergence is guaranteed. But the algorithm may converge to a local rather than a global optimum.
  11. K-Means Algorithm: Example. [Figure: iteration snapshots on 2-D data, panels (a) and (b).]
  12. K-Means Algorithm: Example. [Figure: iteration snapshots, panels (c) and (d).]
  13. K-Means Algorithm: Example. [Figure: iteration snapshots, panels (e) and (f).]
  14. K-Means Algorithm: Example. [Figure: iteration snapshots, panels (g) and (h).]
  15. K-Means Algorithm: Example. [Figure: iteration snapshot, panel (i).]
  16. K-Means and Image Segmentation. Image segmentation problem: partition an image into regions of homogeneous visual appearance, corresponding to objects or parts of objects. Each pixel is a 3-dimensional point corresponding to the intensities of the red, blue and green channels. Perform K-means and redraw the image, replacing each pixel by the corresponding center µ_k.
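A toy sketch of this pixel clustering, in pure Python: the four "pixels" and the initial centers below are made up, and a real implementation would read actual image data and iterate the center updates.

```python
# Assign each RGB pixel to its nearest color center, then redraw the
# pixel with that center's color; data and centers are illustrative only.
def nearest(pixel, centers):
    return min(range(len(centers)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(pixel, centers[k])))

pixels = [(250, 10, 10), (240, 20, 5), (10, 10, 245), (0, 5, 250)]
centers = [(255, 0, 0), (0, 0, 255)]          # one red, one blue center
labels = [nearest(p, centers) for p in pixels]
segmented = [centers[k] for k in labels]      # each pixel replaced by its center
print(labels)                                 # [0, 0, 1, 1]
```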
  17. K-Means Algorithm: Example. [Figure: segmented image.]
  18. K-Means Algorithm: Example. [Figure: original image.]
  19. K-Means Algorithm: Example. [Figure.]
  20. K-Means Algorithm: Example. [Figure.]
  21. K-Means and Data Compression. Lossy as opposed to lossless compression: we accept some errors in reconstruction in return for a higher rate of compression. Instead of storing all the N data points, store only the identity of the assigned cluster, and the cluster centers. Significant savings provided K << N. Each data point is approximated by its nearest center µ_k: the code-book vectors. New data is compressed by finding the nearest center and storing only the label k of the corresponding cluster. This scheme is called Vector Quantization.
  22. K-Means and Data Compression: Example. Suppose the original image has N pixels comprising {R, G, B} values which are stored with 8 bits of precision. Then the total space required is 24N bits. If instead we first run K-means and transmit only the label of the corresponding cluster for each pixel, this takes log K bits per pixel, for a total of N log K bits. We also need to transmit the K code-book vectors, which needs 24K bits. In the example, the original image has 240 × 180 = 43,200 pixels, requiring 24 × 43,200 = 1,036,800 bits. The compressed images require 43,248 (K = 2), 86,472 (K = 3) and 173,040 (K = 10) bits.
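The bit counts on this slide can be reproduced directly, assuming each per-pixel label takes ⌈log₂ K⌉ whole bits (which is what matches the figures quoted):

```python
import math

# Uncompressed: N pixels at 24 bits each. Compressed: ceil(log2 K) label
# bits per pixel, plus 24 bits for each of the K code-book vectors.
def compressed_bits(n_pixels, k):
    return n_pixels * math.ceil(math.log2(k)) + 24 * k

N = 240 * 180                     # 43,200 pixels
print(24 * N)                     # 1,036,800 bits uncompressed
print(compressed_bits(N, 2))      # 43,248 bits
print(compressed_bits(N, 3))      # 86,472 bits
print(compressed_bits(N, 10))     # 173,040 bits
```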
  23. Mixtures of Gaussians: Motivation. Pure Gaussian distributions have limitations when it comes to modelling real-life data. Example: "Old Faithful" eruption durations. The data forms two dominant clumps. A single Gaussian can't model this data well. A linear superposition of two Gaussians does much better.
  24. Old Faithful Eruptions. [Figure: two scatter plots of the eruption data.]
  25. Mixtures of Gaussians: Modelling. Linear combinations of Gaussians can give rise to complex distributions p(x). By using a sufficient number of Gaussians, and adjusting their means and covariances as well as the linear combination coefficients, one can model almost any continuous density to arbitrary accuracy.
  26. Mixtures of Gaussians: Definition. Superposition of Gaussians of the form p(x) := Σ_{k∈[K]} π_k N(x | µ_k, Σ_k). Each Gaussian density N(x | µ_k, Σ_k) is a component of the mixture, with its own mean and covariance. The parameters π_k are the mixing coefficients and satisfy 0 ≤ π_k ≤ 1 and Σ_k π_k = 1.
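A 1-D version of this density is easy to evaluate directly. The mixing coefficients, means and standard deviations below are made up for illustration.

```python
import math

# p(x) = sum_k pi_k * N(x | mu_k, sigma_k^2), written out for 1-D Gaussians.
def gauss(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def mixture_density(x, pis, mus, sigmas):
    return sum(p * gauss(x, m, s) for p, m, s in zip(pis, mus, sigmas))

pis, mus, sigmas = [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0]
print(mixture_density(0.0, pis, mus, sigmas))  # equal contribution from both components
```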
  27. Mixtures of Gaussians: Definition. [Figure: panels (a) and (b), three components with mixing coefficients 0.5, 0.2 and 0.3.]
  28. Mixtures of Gaussians: Definition. [Figure: panel (a), three components with mixing coefficients 0.5, 0.2 and 0.3.]
  29. Equivalent Definition: Latent Variable. We can introduce a latent variable z such that exactly one component is 1 and the rest are zeros, with p(z_k = 1) = π_k. This variable gives the component. Given z, the conditional distribution is p(x | z_k = 1) = N(x | µ_k, Σ_k). Inverting this using Bayes' rule, γ(z_k) := p(z_k = 1 | x) = p(z_k = 1) p(x | z_k = 1) / Σ_j p(z_j = 1) p(x | z_j = 1) = π_k N(x | µ_k, Σ_k) / Σ_j π_j N(x | µ_j, Σ_j) is the posterior probability, or responsibility, that component k takes for observation x.
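The responsibilities γ(z_k) can be computed in the same 1-D toy setting; the parameters are again illustrative.

```python
import math

# gamma_k = pi_k N(x | mu_k, sigma_k) / sum_j pi_j N(x | mu_j, sigma_j)
def gauss(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def responsibilities(x, pis, mus, sigmas):
    weighted = [p * gauss(x, m, s) for p, m, s in zip(pis, mus, sigmas)]
    total = sum(weighted)
    return [w / total for w in weighted]

g = responsibilities(1.0, [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0])
print(g)  # the component at mu = 2 takes most of the responsibility for x = 1
```

By construction the responsibilities for any x are non-negative and sum to one.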
  30. Mixtures and Responsibilities. [Figure: panels (a) and (b).]
  31. Mixtures and Responsibilities. [Figure: panels (b) and (c).]
  32. Learning Mixtures. Suppose we have a data set of observations represented by an N × D matrix X := {x_1, · · · , x_N}, and we want to model it as a mixture of K Gaussians. We need to find the mixing coefficients π_k and the parameters of the component models, µ_k and Σ_k.
  33. Learning Mixtures: The Means. Start with the log-likelihood function: ln p(X | π, µ, Σ) = Σ_{n∈[N]} ln ( Σ_{k∈[K]} π_k N(x_n | µ_k, Σ_k) ). Setting the derivative wrt µ_k to zero, and assuming Σ_k is invertible, gives µ_k = (1/N_k) Σ_{n∈[N]} γ(z_n,k) x_n, where N_k := Σ_{n∈[N]} γ(z_n,k).
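As a small numeric check of this update, with made-up responsibilities for one component on a toy 1-D data set:

```python
# mu_k = (1/N_k) * sum_n gamma_nk * x_n, with N_k the effective point count.
# Data and responsibilities below are illustrative only.
xs = [0.0, 1.0, 4.0]
gamma_k = [0.9, 0.8, 0.1]      # responsibility of component k for each point
Nk = sum(gamma_k)              # effective number of points, 1.8
mu_k = sum(g * x for g, x in zip(gamma_k, xs)) / Nk
print(Nk, mu_k)                # roughly 1.8 and 1.2 / 1.8 ≈ 0.667
```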
  34. Learning Mixtures: The Means. Interpret N_k as the "effective number of points" assigned to cluster k. Note that the mean µ_k for the kth Gaussian component is given by a weighted mean of all points in the data set. The weighting factor for data point x_n is given by the posterior probability, or responsibility, of component k for generating x_n.
  35. Learning Mixtures: The Covariances. Setting the derivative wrt Σ_k to zero, and assuming Σ_k is invertible, gives Σ_k = (1/N_k) Σ_{n∈[N]} γ(z_n,k)(x_n − µ_k)(x_n − µ_k)^T, which is the same as the single-Gaussian solution, but with an average weighted by the corresponding posterior probabilities.
  36. Learning Mixtures: Mixing Coefficients. Setting the derivative wrt π_k to zero, and taking into account that Σ_k π_k = 1 (Lagrange multipliers!), gives π_k = N_k / N. The mixing coefficient for the kth component is the average responsibility that the component takes for explaining the data set.
  37. Learning Mixtures: EM Algorithm. 1. Initialize the means, covariances and mixing coefficients, then repeat: 2. E Step: Evaluate the responsibilities using the current parameters: γ(z_n,k) = π_k N(x_n | µ_k, Σ_k) / Σ_j π_j N(x_n | µ_j, Σ_j). 3. M Step: Re-estimate the parameters using the current responsibilities: µ_k^new = (1/N_k) Σ_n γ(z_n,k) x_n; Σ_k^new = (1/N_k) Σ_n γ(z_n,k)(x_n − µ_k^new)(x_n − µ_k^new)^T; π_k^new = N_k / N, where N_k := Σ_n γ(z_n,k).
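The whole loop can be sketched for 1-D, two-component data. This is a pure-Python toy sketch on made-up data, with no safeguards against degenerate components or other numerical issues that real EM code needs.

```python
import math

# 1-D EM for a Gaussian mixture: alternate responsibilities (E) and
# parameter re-estimation (M) for a fixed number of iterations.
def gauss(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def em(xs, pis, mus, sigmas, iters=20):
    for _ in range(iters):
        # E step: responsibility of each component for each point
        gammas = []
        for x in xs:
            w = [p * gauss(x, m, s) for p, m, s in zip(pis, mus, sigmas)]
            t = sum(w)
            gammas.append([wi / t for wi in w])
        # M step: weighted re-estimates of means, std devs and mixing coeffs
        for k in range(len(pis)):
            nk = sum(g[k] for g in gammas)
            mus[k] = sum(g[k] * x for g, x in zip(gammas, xs)) / nk
            sigmas[k] = math.sqrt(sum(g[k] * (x - mus[k]) ** 2
                                      for g, x in zip(gammas, xs)) / nk)
            pis[k] = nk / len(xs)
    return pis, mus, sigmas

xs = [0.0, 0.5, 1.0, 8.0, 8.5, 9.0]
pis, mus, sigmas = em(xs, [0.5, 0.5], [1.0, 8.0], [1.0, 1.0])
print(round(mus[0], 2), round(mus[1], 2))  # roughly 0.5 and 8.5
```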
  38. EM Algorithm: Example. [Figure: iteration snapshots, panels (a) and (b).]
  39. EM Algorithm: Example. [Figure: iteration snapshots, panels (c) and (d).]
  40. EM Algorithm: Example. [Figure: iteration snapshots, panels (e) and (f).]
  41. EM vs K-Means. K-means performs a hard assignment of data points to clusters, i.e., each data point is assigned to a unique cluster. The EM algorithm makes a soft assignment based on posterior probabilities. K-means can be derived as a limit of the EM algorithm applied to a particular instance of Gaussian mixtures (components with a shared covariance εI, as ε → 0).
