- 1. Machine Learning Devdatt Dubhashi, Department of Computer Science and Engineering, Chalmers University, Gothenburg, Sweden. LP3 2007
- 2. Outline 1 k-Means Clustering 2 Mixtures of Gaussians and EM Algorithm
- 4. Clustering Given a data set {x_1, ..., x_N} of N observations of a random d-dimensional Euclidean variable x, the goal is to partition the data set into K clusters (K known). Intuitively, the points within a cluster must be “close” to each other compared to points outside the cluster.
- 5. Cluster centers and assignments Find a set of centers µ_k, k ∈ [K], and assign each data point to one of the centers so as to minimize the sum of the squared distances from the points to their assigned centers.
- 6. Assignment and Distortion Introduce binary indicator variables r_{n,k} := 1 if x_n is assigned to µ_k, and 0 otherwise. Minimize the distortion measure J := Σ_{n∈[N]} Σ_{k∈[K]} r_{n,k} ||x_n − µ_k||².
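The distortion J is straightforward to compute; below is a minimal sketch in Python with NumPy (the names X, centers and assign are illustrative, with assign[n] = k encoding r_{n,k} = 1):

```python
import numpy as np

def distortion(X, centers, assign):
    """J = sum over n of ||x_n - mu_{k(n)}||^2 for the assigned centers.

    X: (N, d) data, centers: (K, d) centers mu_k, assign: (N,) integer labels.
    """
    diffs = X - centers[assign]   # x_n - mu_{assign[n]} for every point n
    return np.sum(diffs ** 2)
```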
- 7. Two Step Optimization Start with some initial values of µ_k. The basic iteration consists of two steps, repeated until convergence. E: minimize J w.r.t. r_{n,k} keeping µ_k fixed. M: minimize J w.r.t. µ_k keeping r_{n,k} fixed.
- 8. Two Step Optimization: E Step Minimize J w.r.t. r_{n,k} keeping µ_k fixed: r_{n,k} := 1 if k = argmin_j ||x_n − µ_j||², and 0 otherwise.
- 9. Two Step Optimization: M Step Minimize J w.r.t. µ_k keeping r_{n,k} fixed: J is a quadratic function of µ_k, so setting the derivative to zero gives Σ_{n∈[N]} r_{n,k} (x_n − µ_k) = 0, hence µ_k = (Σ_n r_{n,k} x_n) / (Σ_n r_{n,k}). In words: set µ_k to be the mean of the points assigned to cluster k, hence the name K-means algorithm.
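Putting the two steps together gives the full algorithm. A minimal sketch, assuming NumPy and initializing the centers by sampling K data points (one common choice among several); empty clusters are not handled:

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    N, d = X.shape
    centers = X[rng.choice(N, size=K, replace=False)]   # initial mu_k
    for _ in range(max_iter):
        # E step: assign each point to its nearest center (minimize J w.r.t. r_{n,k})
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # M step: recompute each center as the mean of its points (minimize J w.r.t. mu_k)
        new_centers = np.array([X[assign == k].mean(axis=0) for k in range(K)])
        if np.allclose(new_centers, centers):           # J has stopped decreasing
            break
        centers = new_centers
    return centers, assign
```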
- 10. K-Means Algorithm Analysis Since J decreases at each iteration, convergence is guaranteed. But the algorithm may converge to a local rather than a global optimum.
- 11. K-Means Algorithm: Example [Figures, slides 11–15: panels (a)–(i) showing successive assignment and update steps of K-means on a 2-D data set]
- 16. K-Means and Image Segmentation The image segmentation problem: partition an image into regions of homogeneous visual appearance, corresponding to objects or parts of objects. Each pixel is a 3-dim point corresponding to the intensities of the red, blue and green channels. Perform K-means and redraw the image, replacing each pixel by its corresponding center µ_k.
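As a sketch of the segmentation recipe just described, reusing the hypothetical kmeans function from above (any (H, W, 3) RGB array works):

```python
import numpy as np

def segment_image(image, K):
    """Quantize an (H, W, 3) RGB image to K colors via K-means."""
    H, W, _ = image.shape
    pixels = image.reshape(-1, 3).astype(float)    # each pixel as a 3-dim point
    centers, assign = kmeans(pixels, K)            # kmeans sketch from above
    return centers[assign].reshape(H, W, 3).astype(image.dtype)
```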
- 17. K-Means Algorithm: Example [Figures, slides 17–20: the original image and its K-means segmentations]
- 21. K-Means and Data Compression Lossy compression, as opposed to lossless, accepts some error in reconstruction in return for a higher rate of compression. Instead of storing all N data points, store only the identity of the assigned cluster for each point, plus the cluster centers. This gives significant savings provided K << N. Each data point is approximated by its nearest center µ_k; the centers are called code-book vectors. New data is compressed by finding the nearest center and storing only the label k of the corresponding cluster, a scheme called vector quantization; see the sketch below.
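A minimal sketch of the encode/decode pair (the names vq_encode and vq_decode are illustrative; centers come from a K-means run as above):

```python
import numpy as np

def vq_encode(X, centers):
    """Compress: keep only the label of the nearest code-book vector per point."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def vq_decode(labels, centers):
    """Decompress: approximate each point by its code-book vector mu_k."""
    return centers[labels]
```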
- 22. K-Means and Data Compression: Example Suppose the original image has N pixels comprising {R, G, B} values, each stored with 8 bits of precision; then the total space required is 24N bits. If instead we first run K-means and transmit only the label of the corresponding cluster for each pixel, this takes log K bits per pixel, for a total of N log K bits. We also need to transmit the K code-book vectors, which needs 24K bits. In the example, the original image has 240 × 180 = 43,200 pixels, requiring 24 × 43,200 = 1,036,800 bits. The compressed images require 43,248 (K = 2), 86,472 (K = 3) and 173,040 (K = 10) bits.
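The slide's bit counts can be checked in a few lines (a sketch; it assumes ceil(log2 K) bits per pixel label and 24 bits per code-book vector, which reproduces the figures above):

```python
import math

def compressed_bits(n_pixels, K):
    label_bits = math.ceil(math.log2(K))       # bits per pixel label
    return n_pixels * label_bits + 24 * K      # all labels + K code-book vectors

N = 240 * 180                                  # 43,200 pixels
print(24 * N)                                  # 1,036,800 bits for the raw image
for K in (2, 3, 10):
    print(K, compressed_bits(N, K))            # 43,248 / 86,472 / 173,040 bits
```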
- 23. Mixtures of Gaussians: Motivation Pure Gaussian distributions have limitations when it comes to modelling real-life data. Example: “Old Faithful” eruption durations form two dominant clumps. A single Gaussian cannot model this data well; a linear superposition of two Gaussians does much better.
- 24. Old Faithful Eruptions [Figures: scatter plots of the Old Faithful eruption data, fit by a single Gaussian and by a mixture of two Gaussians]
- 25. Mixtures of Gaussians: Modelling A linear combination of Gaussians can give rise to complex distributions p(x). By using a sufficient number of Gaussians and adjusting their means and covariances, as well as the linear combination coefficients, one can model almost any continuous density to arbitrary accuracy.
- 26. Mixtures of Gaussians: Definition A superposition of Gaussians of the form p(x) := Σ_{k∈[K]} π_k N(x | µ_k, Σ_k). Each Gaussian density N(x | µ_k, Σ_k) is a component of the mixture with its own mean and covariance. The parameters π_k are mixing coefficients and satisfy 0 ≤ π_k ≤ 1 and Σ_k π_k = 1.
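Evaluating such a density is one line per component; a minimal sketch assuming SciPy (pis, mus, Sigmas hold the π_k, µ_k, Σ_k):

```python
from scipy.stats import multivariate_normal

def mixture_pdf(x, pis, mus, Sigmas):
    """p(x) = sum_k pi_k * N(x | mu_k, Sigma_k)."""
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=Sigma)
               for pi, mu, Sigma in zip(pis, mus, Sigmas))
```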
- 27. Mixtures of Gaussians: Definition [Figures, slides 27–28: three mixture components with mixing coefficients 0.5, 0.3 and 0.2, and the resulting mixture density p(x)]
- 29. Equivalent Definition: Latent Variable Introduce a latent variable z such that exactly one component z_k is 1 and the rest are zeros, with p(z_k = 1) = π_k; this variable indicates the component. Given z, the conditional distribution is p(x | z_k = 1) = N(x | µ_k, Σ_k). Inverting this using Bayes' rule, γ(z_k) := p(z_k = 1 | x) = p(z_k = 1) p(x | z_k = 1) / Σ_j p(z_j = 1) p(x | z_j = 1) = π_k N(x | µ_k, Σ_k) / Σ_j π_j N(x | µ_j, Σ_j) is the posterior probability, or responsibility, that component k takes for observation x.
- 30. Mixtures and Responsibilities [Figures, slides 30–31: panels (a)–(c) illustrating a sample from a mixture and the corresponding responsibilities]
- 32. Learning Mixtures Suppose we have a data set of observations represented by an N × D matrix X := {x_1, ..., x_N} and we want to model it as a mixture of K Gaussians. We need to find the mixing coefficients π_k and the parameters of the component models, µ_k and Σ_k.
- 33. Learning Mixtures: The Means Start with the log-likelihood function: ln p(X | π, µ, Σ) = Σ_{n∈[N]} ln Σ_{k∈[K]} π_k N(x_n | µ_k, Σ_k). Setting the derivative w.r.t. µ_k to zero, and assuming Σ_k is invertible, gives µ_k = (1/N_k) Σ_{n∈[N]} γ(z_{n,k}) x_n, where N_k := Σ_{n∈[N]} γ(z_{n,k}).
- 34. Learning Mixtures: The Means Interpret N_k as the “effective number of points” assigned to cluster k. Note that the mean µ_k for the kth Gaussian component is a weighted mean of all points in the data set, where the weighting factor for data point x_n is the posterior probability, or responsibility, of component k for generating x_n.
- 35. Learning Mixtures: The Covariances Setting the derivative w.r.t. Σ_k to zero, and assuming Σ_k is invertible, gives Σ_k = (1/N_k) Σ_{n∈[N]} γ(z_{n,k}) (x_n − µ_k)(x_n − µ_k)^T, which is the same as the single-Gaussian solution but with an average weighted by the corresponding posterior probabilities.
- 36. Learning Mixtures: Mixing Coefficients Setting the derivative w.r.t. π_k to zero, and taking into account that Σ_k π_k = 1 (Lagrange multipliers!), gives π_k = N_k / N. The mixing coefficient for the kth component is the average responsibility that the component takes for explaining the data set.
- 37. Learning Mixtures: EM Algorithm 1. Initialize the means, covariances and mixing coefficients, then repeat: 2. E step: evaluate the responsibilities using the current parameters: γ(z_{n,k}) = π_k N(x_n | µ_k, Σ_k) / Σ_j π_j N(x_n | µ_j, Σ_j). 3. M step: re-estimate the parameters using the current responsibilities: µ_k^new = (1/N_k) Σ_n γ(z_{n,k}) x_n, Σ_k^new = (1/N_k) Σ_n γ(z_{n,k}) (x_n − µ_k^new)(x_n − µ_k^new)^T, π_k^new = N_k / N, where N_k := Σ_n γ(z_{n,k}).
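One full E-plus-M pass under this notation might look as follows (a sketch assuming NumPy and SciPy; initialization, e.g. from a K-means run, is left to the caller, and in practice the step is repeated until the log-likelihood stops increasing):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, pis, mus, Sigmas):
    N, K = X.shape[0], len(pis)
    # E step: responsibilities gamma(z_{n,k}) from the current parameters
    dens = np.column_stack([multivariate_normal.pdf(X, mean=mus[k], cov=Sigmas[k])
                            for k in range(K)])            # (N, K) component densities
    weighted = dens * np.asarray(pis)                      # pi_k * N(x_n | mu_k, Sigma_k)
    gamma = weighted / weighted.sum(axis=1, keepdims=True)
    # M step: re-estimate parameters from the current responsibilities
    Nk = gamma.sum(axis=0)                                 # effective points per cluster
    mus_new = (gamma.T @ X) / Nk[:, None]
    Sigmas_new = []
    for k in range(K):
        diff = X - mus_new[k]
        Sigmas_new.append((gamma[:, k, None] * diff).T @ diff / Nk[k])
    pis_new = Nk / N
    return pis_new, mus_new, Sigmas_new
```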
- 38. EM Algorithm: Example [Figures, slides 38–40: panels (a)–(f) showing successive iterations of EM fitting a two-component Gaussian mixture to 2-D data]
- 41. EM vs K-Means K-means performs a hard assignment of data points to clusters, i.e. each data point is assigned to a unique cluster, while the EM algorithm makes a soft assignment based on posterior probabilities. K-means can be derived as a limit of the EM algorithm applied to a particular instance of Gaussian mixtures.
