# Machine Learning


1. Machine Learning. Devdatt Dubhashi, Department of Computer Science and Engineering, Chalmers University, Gothenburg, Sweden. LP3 2007.
2. Outline: 1. k-Means Clustering; 2. Mixtures of Gaussians and the EM Algorithm.
4. Clustering. Data set {x_1, . . . , x_N} of N observations of a random d-dimensional Euclidean variable x. The goal is to partition the data set into K clusters (K known). Intuitively, the points within a cluster must be "close" to each other compared to points outside the cluster.
5. Cluster Centers and Assignments. Find a set of centers µ_k, k ∈ [K], and assign each data point to one of the centers so as to minimize the sum of the squared distances to the assigned centers.
6. Assignment and Distortion. Introduce binary indicator variables r_{n,k} := 1 if x_n is assigned to µ_k, and 0 otherwise. Minimize the distortion measure J := Σ_{n∈[N]} Σ_{k∈[K]} r_{n,k} ||x_n − µ_k||².
7. Two-Step Optimization. Start with some initial values of the µ_k. The basic iteration consists of two steps, repeated until convergence. E: minimize J w.r.t. r_{n,k} keeping µ_k fixed. M: minimize J w.r.t. µ_k keeping r_{n,k} fixed.
8. Two-Step Optimization: E Step. Minimize J w.r.t. r_{n,k} keeping µ_k fixed: r_{n,k} := 1 if k = argmin_j ||x_n − µ_j||², and 0 otherwise.
9. Two-Step Optimization: M Step. Minimize J w.r.t. µ_k keeping r_{n,k} fixed: J is a quadratic function of µ_k, so setting the derivative to zero gives Σ_{n∈[N]} r_{n,k} (x_n − µ_k) = 0, hence µ_k = (Σ_n r_{n,k} x_n) / (Σ_n r_{n,k}). In words: set µ_k to the mean of the points assigned to cluster k; hence the name K-means algorithm.
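The two alternating steps can be sketched in a few lines of NumPy (a minimal illustration; the function name `kmeans` and the optional `init` argument are choices of this sketch, not from the slides):

```python
import numpy as np

def kmeans(X, K, n_iter=20, init=None, seed=0):
    """Plain K-means: alternate the E step (assignment) and M step (mean update)."""
    rng = np.random.default_rng(seed)
    # Initialize centers with K distinct data points unless init is given.
    centers = (np.array(init, dtype=float) if init is not None
               else X[rng.choice(len(X), size=K, replace=False)].astype(float))
    for _ in range(n_iter):
        # E step: assign each point to its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        # M step: move each center to the mean of its assigned points.
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    J = ((X - centers[labels]) ** 2).sum()  # distortion measure
    return centers, labels, J
```

Each iteration can only lower (or keep) J, which is why convergence is guaranteed, though possibly to a local optimum.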
10. K-Means Algorithm: Analysis. Since J decreases at each iteration, convergence is guaranteed. But the algorithm may converge to a local rather than a global optimum.
11. K-Means Algorithm: Example. [Figures: panels (a) through (i), successive E and M steps of K-means on a two-dimensional data set.]
16. K-Means and Image Segmentation. Image segmentation problem: partition an image into regions of homogeneous visual appearance, corresponding to objects or parts of objects. Each pixel is a 3-dimensional point corresponding to the intensities of the red, green and blue channels. Perform K-means and redraw the image, replacing each pixel by the corresponding center µ_k.
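As a sketch, the recipe on this slide might look as follows (NumPy only; `segment_image` is a hypothetical helper introduced here, and a flat two-color array stands in for a real photograph):

```python
import numpy as np

def segment_image(img, K, n_iter=10, seed=0):
    """Posterize an RGB image: run K-means on its pixels, then repaint
    every pixel with its assigned cluster center."""
    H, W, _ = img.shape
    X = img.reshape(-1, 3).astype(float)  # pixels as 3-d points
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iter):
        # Assign each pixel to its nearest center, then update the centers.
        labels = ((X[:, None] - centers[None]) ** 2).sum(axis=-1).argmin(axis=1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return centers[labels].reshape(H, W, 3)
```

The output image contains at most K distinct colors, one per cluster center.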
17. K-Means Segmentation: Examples. [Figures: the original image and K-means segmentations of it for several values of K.]
21. K-Means and Data Compression. Lossy compression, as opposed to lossless, accepts some error in reconstruction in return for a higher rate of compression. Instead of storing all N data points, store only the identity of the assigned cluster for each point, together with the cluster centers. This yields significant savings provided K << N. Each data point is approximated by its nearest center µ_k; the centers are called code-book vectors. New data is compressed by finding the nearest center and storing only the label k of the corresponding cluster. The scheme is called vector quantization.
22. K-Means and Data Compression: Example. Suppose the original image has N pixels comprising {R, G, B} values, each stored with 8 bits of precision. Then the total space required is 24N bits. If instead we first run K-means and transmit only the label of the corresponding cluster for each pixel, this takes log_2 K bits per pixel, for a total of N log_2 K bits. We also need to transmit the K code-book vectors, which takes 24K bits. In the example, the original image has 240 × 180 = 43,200 pixels, requiring 24 × 43,200 = 1,036,800 bits. The compressed images require 43,248 (K = 2), 86,472 (K = 3) and 173,040 (K = 10) bits.
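The arithmetic on this slide can be checked directly, assuming each pixel label is stored with a whole number ceil(log_2 K) of bits (which reproduces the quoted figures; `vq_bits` is a name introduced here):

```python
import math

def vq_bits(n_pixels, K, bits_per_channel=8):
    """Storage for a vector-quantized RGB image: ceil(log2 K) bits per
    pixel label, plus 3 * bits_per_channel bits per code-book vector."""
    per_pixel = math.ceil(math.log2(K))
    return n_pixels * per_pixel + K * 3 * bits_per_channel

N = 240 * 180                 # 43,200 pixels
print(24 * N)                 # raw storage: 1,036,800 bits
print(vq_bits(N, 2))          # 43,248 bits
print(vq_bits(N, 3))          # 86,472 bits (2 bits per label)
print(vq_bits(N, 10))         # 173,040 bits (4 bits per label)
```

Note that for K = 3 and K = 10 the per-pixel cost is rounded up to 2 and 4 bits respectively, which is what makes the slide's totals come out exactly.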
23. Mixtures of Gaussians: Motivation. Pure Gaussian distributions have limitations when it comes to modelling real-life data. Example: the "Old Faithful" eruption durations form two dominant clumps. A single Gaussian cannot model this data well, but a linear superposition of two Gaussians does much better.
24. Old Faithful Eruptions. [Figure: two scatter plots of the Old Faithful eruption data.]
25. Mixtures of Gaussians: Modelling. A linear combination of Gaussians can give rise to complex distributions p(x). By using a sufficient number of Gaussians, and adjusting their means and covariances as well as the linear combination coefficients, we can model almost any continuous density to arbitrary accuracy.
26. Mixtures of Gaussians: Definition. A superposition of Gaussians of the form p(x) := Σ_{k∈[K]} π_k N(x | µ_k, Σ_k). Each Gaussian density N(x | µ_k, Σ_k) is a component of the mixture, with its own mean and covariance. The parameters π_k are mixing coefficients and satisfy 0 ≤ π_k ≤ 1 and Σ_k π_k = 1.
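For concreteness, the mixture density can be evaluated numerically as follows (a small NumPy sketch; the names `gauss_pdf` and `mixture_pdf` are introduced here):

```python
import numpy as np

def gauss_pdf(x, mu, Sigma):
    """Gaussian density N(x | mu, Sigma) at a single d-dimensional point x."""
    d = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / norm

def mixture_pdf(x, pis, mus, Sigmas):
    """p(x) = sum_k pi_k * N(x | mu_k, Sigma_k)."""
    return sum(p * gauss_pdf(x, m, S) for p, m, S in zip(pis, mus, Sigmas))
```

Because the π_k sum to 1 and each component integrates to 1, the mixture is itself a valid probability density.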
27. Mixtures of Gaussians: Definition. [Figure: contours of a mixture with mixing coefficients 0.5, 0.3 and 0.2.]
29. Equivalent Definition: Latent Variable. Introduce a K-dimensional binary latent variable z in which exactly one component equals 1 and the rest are 0, with p(z_k = 1) = π_k. This variable indicates which component generated the observation. Given z, the conditional distribution is p(x | z_k = 1) = N(x | µ_k, Σ_k). Inverting this using Bayes' rule, γ(z_k) := p(z_k = 1 | x) = p(z_k = 1) p(x | z_k = 1) / Σ_j p(z_j = 1) p(x | z_j = 1) = π_k N(x | µ_k, Σ_k) / Σ_j π_j N(x | µ_j, Σ_j) is the posterior probability, or responsibility, that component k takes for observation x.
30. Mixtures and Responsibilities. [Figures: samples from the mixture and the corresponding responsibilities, panels (a) through (c).]
32. Learning Mixtures. Suppose we have a data set of observations represented by an N × D matrix X := {x_1, . . . , x_N}, which we want to model as a mixture of K Gaussians. We need to find the mixing coefficients π_k and the parameters of the component models, µ_k and Σ_k.
33. Learning Mixtures: The Means. Start with the log-likelihood function: ln p(X | π, µ, Σ) = Σ_{n∈[N]} ln ( Σ_{k∈[K]} π_k N(x_n | µ_k, Σ_k) ). Setting the derivative w.r.t. µ_k to zero, and assuming Σ_k is invertible, gives µ_k = (1/N_k) Σ_{n∈[N]} γ(z_{n,k}) x_n, where N_k := Σ_{n∈[N]} γ(z_{n,k}).
34. Learning Mixtures: The Means. Interpret N_k as the "effective number of points" assigned to cluster k. Note that the mean µ_k of the k-th Gaussian component is a weighted mean of all points in the data set, where the weighting factor for data point x_n is the posterior probability, or responsibility, of component k for generating x_n.
35. Learning Mixtures: The Covariances. Setting the derivative w.r.t. Σ_k to zero, and assuming Σ_k is invertible, gives Σ_k = (1/N_k) Σ_{n∈[N]} γ(z_{n,k}) (x_n − µ_k)(x_n − µ_k)^T, which is the same as the single-Gaussian solution, but with an average weighted by the corresponding posterior probabilities.
36. Learning Mixtures: Mixing Coefficients. Setting the derivative w.r.t. π_k to zero, taking into account the constraint Σ_k π_k = 1 (Lagrange multipliers!), gives π_k = N_k / N. The mixing coefficient of the k-th component is the average responsibility that the component takes for explaining the data set.
37. Learning Mixtures: EM Algorithm. 1. Initialize the means, covariances and mixing coefficients, then repeat: 2. E step: evaluate the responsibilities using the current parameters: γ(z_{n,k}) = π_k N(x_n | µ_k, Σ_k) / Σ_j π_j N(x_n | µ_j, Σ_j). 3. M step: re-estimate the parameters using the current responsibilities: µ_k^new = (1/N_k) Σ_n γ(z_{n,k}) x_n; Σ_k^new = (1/N_k) Σ_n γ(z_{n,k}) (x_n − µ_k^new)(x_n − µ_k^new)^T; π_k^new = N_k / N, where N_k := Σ_n γ(z_{n,k}).
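Putting the E and M steps together, a minimal NumPy implementation might look like this (the function names and the optional `init_means` argument are choices of this sketch; a small ridge 1e-6 · I is added to keep the covariances invertible):

```python
import numpy as np

def mvn_pdf(X, mu, Sigma):
    """N(x | mu, Sigma) evaluated at every row of X."""
    D = len(mu)
    diff = X - mu
    inv = np.linalg.inv(Sigma)
    norm = np.sqrt((2 * np.pi) ** D * np.linalg.det(Sigma))
    return np.exp(-0.5 * np.einsum('ni,ij,nj->n', diff, inv, diff)) / norm

def em_gmm(X, K, n_iter=50, init_means=None, seed=0):
    """EM for a Gaussian mixture, following the E/M updates on the slide."""
    N, D = X.shape
    rng = np.random.default_rng(seed)
    mus = (np.array(init_means, dtype=float) if init_means is not None
           else X[rng.choice(N, K, replace=False)].astype(float))
    Sigmas = np.stack([np.cov(X.T) + 1e-6 * np.eye(D)] * K)
    pis = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E step: responsibilities gamma[n, k].
        dens = np.stack([pis[k] * mvn_pdf(X, mus[k], Sigmas[k])
                         for k in range(K)], axis=1)
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M step: re-estimate means, covariances and mixing coefficients.
        Nk = gamma.sum(axis=0)
        mus = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mus[k]
            Sigmas[k] = ((gamma[:, k, None] * diff).T @ diff / Nk[k]
                         + 1e-6 * np.eye(D))
        pis = Nk / N
    return pis, mus, Sigmas, gamma
```

Each EM iteration increases (or leaves unchanged) the log-likelihood, mirroring the distortion argument for K-means.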
38. EM Algorithm: Example. [Figures: EM iterations on a two-cluster data set, panels (a) through (f).]
41. EM vs K-Means. K-means performs a hard assignment of data points to clusters, i.e. each data point is assigned to a unique cluster. The EM algorithm makes a soft assignment based on posterior probabilities. K-means can be derived as the limit of the EM algorithm applied to a particular instance of Gaussian mixtures: shared covariances εI with ε → 0.
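This limit can be illustrated numerically: with a shared covariance εI, the responsibilities reduce to a softmax of −||x − µ_k||² / (2ε), which hardens to the 0/1 K-means assignment as ε → 0 (a small sketch with values chosen here for illustration):

```python
import numpy as np

def responsibilities(x, mus, eps):
    """GMM responsibilities for one point under equal mixing coefficients
    and shared covariance eps * I: a softmax over -||x - mu_k||^2 / (2 eps)."""
    logits = np.array([-np.sum((x - m) ** 2) / (2 * eps) for m in mus])
    g = np.exp(logits - logits.max())   # numerically stable softmax
    return g / g.sum()

x = np.array([1.0, 0.0])
mus = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
for eps in [1.0, 0.1, 0.01]:
    print(eps, responsibilities(x, mus, eps))
# As eps shrinks, the responsibilities approach the hard assignment (1, 0).
```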