Machine Learning

Transcript

  • 1. Machine Learning. Devdatt Dubhashi, Department of Computer Science and Engineering, Chalmers University, Gothenburg, Sweden. LP3 2007.
  • 2. Outline: 1. k-Means Clustering. 2. Mixtures of Gaussians and the EM Algorithm.
  • 4. Clustering. Data set $\{x_1, \cdots, x_N\}$ of $N$ observations of a random $d$-dimensional Euclidean variable $x$. The goal is to partition the data set into $K$ clusters ($K$ known). Intuitively, the points within a cluster must be “close” to each other compared to points outside the cluster.
  • 5. Cluster centers and assignments. Find a set of centers $\mu_k$, $k \in [K]$. Assign each data point to one of the centers so as to minimize the sum of the squares of the distances to the assigned centers.
  • 6. Assignment and Distortion. Introduce binary indicator variables $r_{nk} := 1$ if $x_n$ is assigned to $\mu_k$, and $0$ otherwise. Minimize the distortion measure $J := \sum_{n \in [N]} \sum_{k \in [K]} r_{nk} \|x_n - \mu_k\|^2$ (see the sketch below).
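As a concrete illustration of the distortion measure, here is a minimal NumPy sketch; the array names X, mu and r are hypothetical, not from the slides.

```python
import numpy as np

def distortion(X, mu, r):
    """J = sum_n sum_k r[n, k] * ||x_n - mu_k||^2 for data X (N x d),
    centers mu (K x d) and binary assignments r (N x K)."""
    # Squared distance from every point to every center, shape (N, K).
    sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    return float((r * sq_dists).sum())
```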
  • 7. Two Step Optimization. Start with some initial values of $\mu_k$. The basic iteration consists of two steps, repeated until convergence. E: minimize $J$ w.r.t. $r_{nk}$ keeping $\mu_k$ fixed. M: minimize $J$ w.r.t. $\mu_k$ keeping $r_{nk}$ fixed.
  • 8. Two Step Optimization: E Step. Minimize $J$ w.r.t. $r_{nk}$ keeping $\mu_k$ fixed: $r_{nk} := 1$ if $k = \arg\min_j \|x_n - \mu_j\|^2$, and $0$ otherwise.
  • 9. Two Step Optimization: M Step. Minimize $J$ w.r.t. $\mu_k$ keeping $r_{nk}$ fixed: $J$ is a quadratic function of $\mu_k$, so setting the derivative to zero gives $\sum_{n \in [N]} r_{nk} (x_n - \mu_k) = 0$, hence $\mu_k = \frac{\sum_n r_{nk} x_n}{\sum_n r_{nk}}$. In words: set $\mu_k$ to be the mean of the points assigned to cluster $k$, hence the name K-means algorithm (a sketch of the full iteration follows below).
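A minimal sketch of the two-step iteration in NumPy; initialising the centers with K randomly chosen data points is an assumption of mine, not something prescribed by the slides.

```python
import numpy as np

def k_means(X, K, n_iter=100, seed=0):
    """Alternate the E step (assign points to nearest center) and the
    M step (recompute each center as the mean of its assigned points)."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)]  # initial centers
    for _ in range(n_iter):
        # E step: index of the nearest center for every point.
        sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        assign = sq_dists.argmin(axis=1)
        # M step: mean of the points assigned to each cluster
        # (keep the old center if a cluster happens to be empty).
        new_mu = np.array([X[assign == k].mean(axis=0) if np.any(assign == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):  # J can no longer decrease
            break
        mu = new_mu
    return mu, assign
```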
  • 10. K-Means Algorithm Analysis. Since $J$ decreases at each iteration, convergence is guaranteed. But it may converge to a local rather than a global optimum.
  • 11.–15. K-Means Algorithm: Example. [Figures: successive E and M steps of K-means on a two-dimensional data set, panels (a)–(i).]
  • 16. K-Means and Image Segmentation. Image segmentation problem: partition an image into regions of homogeneous visual appearance, corresponding to objects or parts of objects. Each pixel is a 3-dimensional point corresponding to the intensities of the red, blue and green channels. Perform K-means and redraw the image, replacing each pixel by the corresponding center $\mu_k$ (see the sketch below).
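One possible sketch of this segmentation step, using scikit-learn's KMeans rather than the hand-rolled loop above; the function name segment_image and the image array layout are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def segment_image(img, K):
    """Quantise an H x W x 3 RGB image (values 0..255) down to K colours."""
    pixels = img.reshape(-1, 3).astype(float)        # each pixel is a 3-d point
    km = KMeans(n_clusters=K, n_init=10).fit(pixels)
    # Replace every pixel by the center of its assigned cluster.
    quantised = km.cluster_centers_[km.labels_]
    return quantised.reshape(img.shape).astype(np.uint8)
```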
  • 17.–20. K-Means Algorithm: Example. [Figures: the original image and its K-means segmentations for several values of $K$.]
  • 21. K-Means and Data Compression. Lossy, as opposed to lossless, compression: we accept some errors in reconstruction in return for a higher rate of compression. Instead of storing all the $N$ data points, store only the identity of the assigned cluster and the cluster centers. Significant savings provided $K \ll N$. Each data point is approximated by its nearest center $\mu_k$: the code-book vectors. New data are compressed by finding the nearest center and storing only the label $k$ of the corresponding cluster. This scheme is called Vector Quantization.
  • 22. K-Means and Data Compression: Example. Suppose the original image has $N$ pixels comprising $\{R, G, B\}$ values stored with 8 bits of precision. Then the total space required is $24N$ bits. If instead we first do $K$-means and transmit only the label of the corresponding cluster for each pixel, this takes about $\log_2 K$ bits per pixel (rounded up), for a total of roughly $N \log_2 K$ bits. We also need to transmit the $K$ code-book vectors, which takes $24K$ bits. In the example, the original image has 240 × 180 = 43,200 pixels, requiring 24 × 43,200 = 1,036,800 bits. The compressed images require 43,248 (K = 2), 86,472 (K = 3) and 173,040 (K = 10) bits (the arithmetic is spelled out in the sketch below).
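A small sketch of the bit-count arithmetic behind these numbers; the function name and the assumption of exactly ceil(log2 K) bits per label are mine, not the slides'.

```python
import math

def compressed_bits(n_pixels, K, bits_per_channel=8):
    """Naive vector-quantisation cost: one label per pixel plus the code book."""
    label_bits = n_pixels * math.ceil(math.log2(K))   # per-pixel cluster label
    codebook_bits = K * 3 * bits_per_channel          # K RGB code-book vectors
    return label_bits + codebook_bits

n = 240 * 180                                          # 43,200 pixels
print(24 * n)                                          # 1,036,800 bits for the raw image
print([compressed_bits(n, K) for K in (2, 3, 10)])     # [43248, 86472, 173040]
```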
  • 23. Mixtures of Gaussians: Motivation. Pure Gaussian distributions have limitations when it comes to modelling real-life data. Example: “Old Faithful” eruption durations. The data form two dominant clumps. A single Gaussian can't model this data well. A linear superposition of two Gaussians does much better.
  • 24. Old Faithful Eruptions. [Figures: two scatter plots of the Old Faithful eruption data, durations roughly 1–6 minutes against waiting times roughly 40–100 minutes.]
  • 25. Mixtures of Gaussians: Modelling. A linear combination of Gaussians can give rise to complex distributions $p(x)$. By using a sufficient number of Gaussians and adjusting their means and covariances, as well as the linear combination coefficients, one can model almost any continuous density to arbitrary accuracy.
  • 26. Mixtures of Gaussians: Definition. Superposition of Gaussians of the form $p(x) := \sum_{k \in [K]} \pi_k \mathcal{N}(x \mid \mu_k, \Sigma_k)$. Each Gaussian density $\mathcal{N}(x \mid \mu_k, \Sigma_k)$ is a component of the mixture with its own mean and covariance. The parameters $\pi_k$ are mixing coefficients and satisfy $0 \le \pi_k \le 1$ and $\sum_k \pi_k = 1$ (see the sketch below).
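A minimal sketch of evaluating such a mixture density with SciPy; the parameter lists pis, mus and covs, and the example values, are hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_pdf(x, pis, mus, covs):
    """p(x) = sum_k pi_k * N(x | mu_k, Sigma_k)."""
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=cov)
               for pi, mu, cov in zip(pis, mus, covs))

# Two one-dimensional components with mixing coefficients 0.3 and 0.7.
print(mixture_pdf(0.5, [0.3, 0.7], [0.0, 2.0], [1.0, 0.5]))
```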
  • 27.–28. Mixtures of Gaussians: Definition. [Figures: contours of the three components of a mixture with mixing coefficients 0.5, 0.3 and 0.2, and the resulting mixture density.]
  • 29. Equivalent Definition: Latent Variable. Can introduce a latent variable $z$ such that exactly one component $z_k$ is 1 and the rest are zeros, with $p(z_k = 1) = \pi_k$. This variable indicates the component. Given $z$, the conditional distribution is $p(x \mid z_k = 1) = \mathcal{N}(x \mid \mu_k, \Sigma_k)$. Inverting this using Bayes' rule, $\gamma(z_k) := p(z_k = 1 \mid x) = \frac{p(z_k = 1)\, p(x \mid z_k = 1)}{\sum_j p(z_j = 1)\, p(x \mid z_j = 1)} = \frac{\pi_k \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_j \pi_j \mathcal{N}(x \mid \mu_j, \Sigma_j)}$ is the posterior probability, or responsibility, that component $k$ takes for observation $x$ (see the sketch below).
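A short sketch of the responsibility computation following the formula above; the parameter values are made up purely for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(x, pis, mus, covs):
    """gamma_k = pi_k N(x | mu_k, Sigma_k) / sum_j pi_j N(x | mu_j, Sigma_j)."""
    weighted = np.array([pi * multivariate_normal.pdf(x, mean=mu, cov=cov)
                         for pi, mu, cov in zip(pis, mus, covs)])
    return weighted / weighted.sum()

print(responsibilities(1.0, [0.3, 0.7], [0.0, 2.0], [1.0, 0.5]))  # sums to 1
```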
  • 30.–31. Mixtures and Responsibilities. [Figures: samples from the mixture coloured by generating component, the same samples without component labels, and the samples coloured by responsibilities, panels (a)–(c).]
  • 32. Learning Mixtures. Suppose we have a data set of observations represented by an $N \times D$ matrix $X := \{x_1, \cdots, x_N\}$ and we want to model it as a mixture of $K$ Gaussians. We need to find the mixing coefficients $\pi_k$ and the parameters of the component models, $\mu_k$ and $\Sigma_k$.
  • 33. Learning Mixtures: The Means. Start with the log-likelihood function: $\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n \in [N]} \ln \left( \sum_{k \in [K]} \pi_k \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right)$. Setting the derivative w.r.t. $\mu_k$ to zero, and assuming $\Sigma_k$ is invertible, gives $\mu_k = \frac{1}{N_k} \sum_{n \in [N]} \gamma(z_{nk})\, x_n$, where $N_k := \sum_{n \in [N]} \gamma(z_{nk})$.
  • 34. Learning Mixtures: The Means. Interpret $N_k$ as the “effective number of points” assigned to cluster $k$. Note that the mean $\mu_k$ for the $k$th Gaussian component is a weighted mean of all points in the data set. The weighting factor for data point $x_n$ is given by the posterior probability, or responsibility, of component $k$ for generating $x_n$.
  • 35. Learning Mixtures: The Covariances. Setting the derivative w.r.t. $\Sigma_k$ to zero, and assuming $\Sigma_k$ is invertible, gives $\Sigma_k = \frac{1}{N_k} \sum_{n \in [N]} \gamma(z_{nk}) (x_n - \mu_k)(x_n - \mu_k)^T$, which is the same as the single-Gaussian solution but with an average weighted by the corresponding posterior probability.
  • 36. Learning Mixtures: Mixing Coefficients. Setting the derivative w.r.t. $\pi_k$ to zero, and taking into account that $\sum_k \pi_k = 1$ (Lagrange multipliers!), gives $\pi_k = \frac{N_k}{N}$. The mixing coefficient for the $k$th component is the average responsibility that the component takes for explaining the data set.
  • 37. Learning Mixtures: EM Algorithm. 1. Initialize the means, covariances and mixing coefficients, and repeat: 2. E Step: evaluate responsibilities using the current parameters: $\gamma(z_{nk}) = \frac{\pi_k \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_j \pi_j \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}$. 3. M Step: re-estimate the parameters using the current responsibilities: $\mu_k^{\text{new}} = \frac{1}{N_k} \sum_n \gamma(z_{nk})\, x_n$, $\Sigma_k^{\text{new}} = \frac{1}{N_k} \sum_n \gamma(z_{nk}) (x_n - \mu_k^{\text{new}})(x_n - \mu_k^{\text{new}})^T$, $\pi_k^{\text{new}} = \frac{N_k}{N}$, where $N_k := \sum_n \gamma(z_{nk})$ (see the sketch below).
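A compact sketch of the whole EM loop in NumPy/SciPy; the initialisation scheme, the fixed iteration count and the small ridge added to the covariances are my assumptions, and a production implementation would also monitor the log-likelihood for convergence.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=50, seed=0):
    """EM for a K-component Gaussian mixture on data X of shape (N, D)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    pis = np.full(K, 1.0 / K)
    mus = X[rng.choice(N, size=K, replace=False)]            # initial means
    covs = np.array([np.cov(X.T) + 1e-6 * np.eye(D) for _ in range(K)])
    for _ in range(n_iter):
        # E step: responsibilities gamma(z_nk), shape (N, K).
        gamma = np.column_stack([pis[k] * multivariate_normal.pdf(X, mus[k], covs[k])
                                 for k in range(K)])
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M step: re-estimate parameters from the current responsibilities.
        Nk = gamma.sum(axis=0)                                # effective counts
        mus = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mus[k]
            covs[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
        pis = Nk / N
    return pis, mus, covs, gamma
```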
  • 38.–40. EM Algorithm: Example. [Figures: successive iterations of EM on a two-dimensional data set, panels (a)–(f).]
  • 41. EM vs K-Means. K-means performs a hard assignment of data points to clusters, i.e. each data point is assigned to a unique cluster. The EM algorithm makes a soft assignment based on posterior probabilities. K-means can be derived as the limit of the EM algorithm applied to a particular instance of Gaussian mixtures.