Statistical Clustering: k-means, Gaussian Mixtures, Variational Inference 22-FEB-2012
What is Clustering? Design Considerations • Features • Dimension • Model: Distance / Cost • Bias / Variance
Why do we care?
Scope of Talk – Main Take-Away Point: It's all about the posterior $p(L \mid D)$
• K-means: how does it work, math behind it, issues
• GMM: how does it work, math behind it, issues
• Variational: just the facts
Nested view, from most general to most specific: Variational Inference; GMM, EM (Graph Cuts, Spectral Clustering); K-means, vector quantization
Scope of Talk – Main Take-Away Point: It's all just posterior estimation
Nested view, from most general to most specific: Variational / MCMC; GMM; K-means / vector quantization
K-means – How It Works
Goal: represent a data set in terms of K clusters, each of which is summarized by a prototype $\boldsymbol{\mu}_k$.
Iterative two-step process:
• E-step: assign each data point to the nearest prototype
• M-step: update each prototype to be the mean of its cluster
Simple version: Euclidean distance, which requires whitening the data.
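The two-step loop above can be sketched in NumPy as follows. This is a minimal sketch of Lloyd's algorithm, assuming the data have already been whitened and using randomly chosen data points as the initial prototypes (an assumption; the talk recommends k-means++ for initialization). The function name `kmeans` and its parameters are hypothetical.

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Lloyd's algorithm with squared Euclidean distance.

    X is an (N, D) array that is assumed to be whitened already.
    Initial prototypes are K distinct data points chosen at random.
    Returns the (K, D) prototypes and the (N,) cluster labels.
    """
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # E-step: squared distance from every point to every prototype, shape (N, K)
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # no assignment changed, so the algorithm has converged
        labels = new_labels
        # M-step: move each prototype to the mean of the points assigned to it
        for k in range(K):
            if np.any(labels == k):
                mu[k] = X[labels == k].mean(axis=0)
    return mu, labels
```

Stopping when no assignment changes is the usual convergence test; a fixed iteration cap guards against very slow cases.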
Converged
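k-means – Math
Responsibilities (assign data to clusters): $r_{nk} = 1$ if $k = \arg\min_j \lVert \mathbf{x}_n - \boldsymbol{\mu}_j \rVert^2$, and $r_{nk} = 0$ otherwise
Cost function: $J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \lVert \mathbf{x}_n - \boldsymbol{\mu}_k \rVert^2$
Example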
Minimizing the Cost Function
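With hard assignments $r_{nk} \in \{0, 1\}$ and the cost $J$ written out below, each step minimizes $J$ over one set of variables while holding the other fixed:

```latex
J = \sum_{n=1}^{N}\sum_{k=1}^{K} r_{nk}\,\lVert \mathbf{x}_n - \boldsymbol{\mu}_k \rVert^2

\text{E-step (fix } \boldsymbol{\mu}\text{):}\quad
r_{nk} = \begin{cases} 1 & \text{if } k = \arg\min_j \lVert \mathbf{x}_n - \boldsymbol{\mu}_j \rVert^2 \\ 0 & \text{otherwise} \end{cases}

\text{M-step (fix } r\text{):}\quad
\frac{\partial J}{\partial \boldsymbol{\mu}_k} = -2\sum_{n=1}^{N} r_{nk}\,(\mathbf{x}_n - \boldsymbol{\mu}_k) = 0
\;\Longrightarrow\;
\boldsymbol{\mu}_k = \frac{\sum_{n} r_{nk}\,\mathbf{x}_n}{\sum_{n} r_{nk}}
```

Neither step can increase $J$, and there are finitely many possible assignments, so the iteration converges, but only to a local minimum, which is why initialization matters.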
What can go wrong?
What can go wrong? A great deal.
• How do we choose K? (gap statistic / prediction strength)
• How do we initialize? (k-means++ seems to be the best)
• Local minima: run hundreds of times with different initializations
• Are we overfitting? Probably.
But hey, it's simple to understand and doesn't cost too many cycles.
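One common way to handle initialization and local minima together is to seed each run with k-means++ and keep the run with the lowest cost $J$. A minimal NumPy sketch, assuming squared Euclidean distance; the function names and the default of 20 restarts are hypothetical choices, not taken from the talk. Choosing K is a separate problem and is not addressed here.

```python
import numpy as np

def sq_dists(X, mu):
    """Squared Euclidean distance from each row of X to each prototype, shape (N, K)."""
    return ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)

def kmeanspp_init(X, K, rng):
    """k-means++ seeding: the first center is uniform; each later center is drawn
    with probability proportional to its squared distance to the nearest chosen center."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, K):
        d2 = sq_dists(X, np.array(centers)).min(axis=1)
        p = d2 / d2.sum() if d2.sum() > 0 else None  # None -> uniform fallback
        centers.append(X[rng.choice(len(X), p=p)])
    return np.array(centers, dtype=float)

def lloyd(X, mu, n_iter=100):
    """Standard k-means iterations from a given starting set of prototypes."""
    for _ in range(n_iter):
        labels = sq_dists(X, mu).argmin(axis=1)
        new_mu = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else mu[k]
                           for k in range(len(mu))])
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    return mu

def kmeans_restarts(X, K, n_restarts=20, seed=0):
    """Run k-means from several k-means++ seeds; keep the solution with the lowest cost J."""
    rng = np.random.default_rng(seed)
    best_mu, best_J = None, np.inf
    for _ in range(n_restarts):
        mu = lloyd(X, kmeanspp_init(X, K, rng))
        J = sq_dists(X, mu).min(axis=1).sum()  # sum of squared distances to nearest prototype
        if J < best_J:
            best_mu, best_J = mu, J
    return best_mu, best_J
```

Comparing restarts by $J$ is valid because every restart minimizes the same cost; it does not help compare different values of K, since $J$ always decreases as K grows.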
Quick word on distances (k-medoids)
• Mahalanobis: not dependent on the scale of measurement; tuning parameter
• Manhattan / city block: dampens outliers
• Euclidean: need to whiten; outliers are an issue
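The three distances can be written directly in NumPy. This sketch also shows one way to whiten, a symmetric (ZCA-style) transform, which is an assumption since the talk does not say which whitening it uses. After this whitening, plain Euclidean distance equals Mahalanobis distance under the sample covariance. All function names are hypothetical.

```python
import numpy as np

def euclidean(x, y):
    return np.sqrt(np.sum((x - y) ** 2))

def manhattan(x, y):
    # City block: sum of absolute differences. A large difference in one
    # coordinate grows the distance linearly rather than quadratically.
    return np.sum(np.abs(x - y))

def mahalanobis(x, y, cov):
    # Uses the inverse covariance, so rescaling a feature (and its covariance)
    # leaves the distance unchanged; cov must be estimated or chosen.
    diff = x - y
    return np.sqrt(diff @ np.linalg.solve(cov, diff))

def whiten(X):
    # Symmetric whitening: multiply centered data by cov^{-1/2},
    # so the whitened data have identity sample covariance.
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    W = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    return Xc @ W
```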
Quicker word on flavors
• Exclusive clustering: k-means, weighted k-means
• Overlapping clustering: fuzzy c-means
• Nonlinear clustering: kernel k-means (spectral clustering, normalized cuts)
• Hierarchical clustering: hierarchical
Probabilistic Clustering
• Represent the probability distribution of the data as a mixture model
• Captures uncertainty in cluster assignments
• Gives a model for the data distribution
• With a Bayesian mixture, it is easier to figure out K
Consider a mixture of Gaussians.
Multivariate Gaussian Distribution Review
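For reference, the $D$-dimensional Gaussian density is $\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = (2\pi)^{-D/2} \lvert\boldsymbol{\Sigma}\rvert^{-1/2} \exp\{-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\mathsf{T}}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\}$. A minimal sketch of its logarithm in NumPy, assuming a positive-definite covariance; working in log space with a Cholesky factor avoids forming $\boldsymbol{\Sigma}^{-1}$ explicitly. `gaussian_logpdf` is a hypothetical name.

```python
import numpy as np

def gaussian_logpdf(X, mu, cov):
    """log N(x | mu, cov) for each row of X, computed via a Cholesky factor of cov."""
    X = np.atleast_2d(X)
    D = X.shape[1]
    L = np.linalg.cholesky(cov)                   # cov = L @ L.T, requires cov positive definite
    z = np.linalg.solve(L, (X - mu).T)            # z = L^{-1} (x - mu), shape (D, N)
    maha = np.sum(z ** 2, axis=0)                 # (x - mu)^T cov^{-1} (x - mu)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))    # ln |cov|
    return -0.5 * (D * np.log(2.0 * np.pi) + log_det + maha)
```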
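Likelihood Function
For i.i.d. data $D = \{\mathbf{x}_1, \dots, \mathbf{x}_N\}$: $p(D \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \prod_{n=1}^{N} \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})$
Maximum likelihood: what is the best fit to my data?
An approximation of the posterior! (ML gives a point estimate; with a flat prior it coincides with the MAP estimate.)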
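Maximum Likelihood Solution for One Gaussian
Sample mean: $\boldsymbol{\mu}_{ML} = \frac{1}{N} \sum_{n=1}^{N} \mathbf{x}_n$
Sample covariance: $\boldsymbol{\Sigma}_{ML} = \frac{1}{N} \sum_{n=1}^{N} (\mathbf{x}_n - \boldsymbol{\mu}_{ML})(\mathbf{x}_n - \boldsymbol{\mu}_{ML})^{\mathsf{T}}$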
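In code, each ML estimate is one line. Note the $1/N$ normalization: the ML covariance is biased, unlike the $1/(N-1)$ covariance that `np.cov` returns by default (it matches `np.cov(..., bias=True)`). `fit_gaussian_ml` is a hypothetical name.

```python
import numpy as np

def fit_gaussian_ml(X):
    """Maximum-likelihood mean and covariance of an (N, D) data array."""
    mu = X.mean(axis=0)                  # sample mean
    Xc = X - mu
    cov = Xc.T @ Xc / X.shape[0]         # divide by N (ML, biased), not N - 1
    return mu, cov
```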
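Gaussian Mixtures
Linear superposition of Gaussians: $p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$
Normalization and positivity require $\sum_{k=1}^{K} \pi_k = 1$ and $0 \le \pi_k \le 1$.
We can interpret the mixing coefficients as prior probabilities.
[Aside] We can sample from this: given the mixing coefficients, means, and covariances, we get samples from $p(\mathbf{x})$, i.e. our dataset.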
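The sampling in the aside is ancestral sampling: first pick a component $k$ with probability $\pi_k$ (the hidden choice), then draw $\mathbf{x}$ from $\mathcal{N}(\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$. A minimal sketch using NumPy's `Generator.choice` and `Generator.multivariate_normal`; `sample_gmm` is a hypothetical name.

```python
import numpy as np

def sample_gmm(pis, mus, covs, n, seed=0):
    """Draw n points from a Gaussian mixture; also return which component made each one."""
    rng = np.random.default_rng(seed)
    # Step 1: choose a component for each sample according to the mixing coefficients
    z = rng.choice(len(pis), size=n, p=pis)
    # Step 2: draw each point from the Gaussian of its chosen component
    X = np.array([rng.multivariate_normal(mus[k], covs[k]) for k in z])
    return X, z
```

Fitting a mixture means running this process in reverse: from `X` alone, recover `pis`, `mus`, and `covs` without seeing `z`.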
Fitting the Gaussian Mixture
We wish to invert this sampling process: given the data, find the corresponding parameters (as we did for the single-Gaussian case):
• Mixing coefficients
• Means
• Covariances
If we knew which Gaussian each data point "belonged" to (i.e., which Gaussian is responsible for it), we could apply the single-Gaussian ML solution to each component.
Problem: we don't have labels, which complicates things.
Solution: introduce a latent (hidden) variable $z$ that tells us which data point goes with which Gaussian.
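Posterior of the Latent Variable
$\pi_k \equiv p(z_k = 1)$: the probability that a data point was generated by the $k$th Gaussian, before observing $\mathbf{x}$.
$\gamma_k(\mathbf{x}) \equiv p(z_k = 1 \mid \mathbf{x})$: the probability that the data point $\mathbf{x}$ was generated by the $k$th Gaussian, after observing $\mathbf{x}$.
By Bayes' theorem:
$\gamma_k(\mathbf{x}) = \dfrac{\pi_k \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}$
These are also called responsibilities.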
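The responsibility formula translates directly into code. This minimal sketch normalizes in log space (log-sum-exp), an implementation choice rather than something from the talk, because the Gaussian densities can underflow to zero for points far from every component. Function names are hypothetical.

```python
import numpy as np

def log_gauss(X, mu, cov):
    """log N(x | mu, cov) for each row of X."""
    D = X.shape[1]
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (X - mu).T)
    return -0.5 * (D * np.log(2 * np.pi) + 2 * np.sum(np.log(np.diag(L))) + np.sum(z ** 2, axis=0))

def responsibilities(X, pis, mus, covs):
    """gamma[n, k] = pi_k N(x_n|mu_k,Sigma_k) / sum_j pi_j N(x_n|mu_j,Sigma_j)."""
    # log numerators, shape (N, K)
    log_num = np.column_stack([np.log(pis[k]) + log_gauss(X, mus[k], covs[k])
                               for k in range(len(pis))])
    # log-sum-exp over k: subtract the row max before exponentiating to avoid underflow
    m = log_num.max(axis=1, keepdims=True)
    log_den = m + np.log(np.exp(log_num - m).sum(axis=1, keepdims=True))
    return np.exp(log_num - log_den)
```

Each row of the result sums to 1; a point halfway between two equally weighted, equally shaped components gets responsibility 0.5 from each.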
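Maximum Likelihood for GMM
The log-likelihood takes the form
$\ln p(D \mid \boldsymbol{\pi}, \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \sum_{n=1}^{N} \ln \left( \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) \right)$
Notice that the sum sits inside the log, so there is no closed-form solution. Solve it with the expectation-maximization (EM) algorithm.
Derivative w.r.t. $\boldsymbol{\mu}_k$, set to zero:
$0 = \sum_{n=1}^{N} \gamma_k(\mathbf{x}_n) \, \boldsymbol{\Sigma}_k^{-1} (\mathbf{x}_n - \boldsymbol{\mu}_k) \;\Rightarrow\; \boldsymbol{\mu}_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma_k(\mathbf{x}_n) \, \mathbf{x}_n, \quad N_k = \sum_{n=1}^{N} \gamma_k(\mathbf{x}_n)$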
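Evaluating this log-likelihood is also where the "sum inside the log" bites numerically: each term $\pi_k \mathcal{N}(\cdot)$ can underflow. A minimal sketch that evaluates it with log-sum-exp (an implementation choice, not from the talk); `gmm_log_likelihood` is a hypothetical name.

```python
import numpy as np

def gmm_log_likelihood(X, pis, mus, covs):
    """ln p(D | pi, mu, Sigma) = sum_n ln sum_k pi_k N(x_n | mu_k, Sigma_k)."""
    N, D = X.shape
    log_terms = np.empty((N, len(pis)))
    for k in range(len(pis)):
        L = np.linalg.cholesky(covs[k])
        z = np.linalg.solve(L, (X - mus[k]).T)
        # ln pi_k + ln N(x_n | mu_k, Sigma_k) for every n
        log_terms[:, k] = np.log(pis[k]) - 0.5 * (D * np.log(2 * np.pi)
                                                  + 2 * np.sum(np.log(np.diag(L)))
                                                  + np.sum(z ** 2, axis=0))
    # inner sum over k is inside the log: log-sum-exp keeps it finite
    m = log_terms.max(axis=1, keepdims=True)
    return float(np.sum(m[:, 0] + np.log(np.exp(log_terms - m).sum(axis=1))))
```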
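EM – notice that each of these updates depends on the responsibilities
Do the same for the covariance: $\boldsymbol{\Sigma}_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma_k(\mathbf{x}_n) \, (\mathbf{x}_n - \boldsymbol{\mu}_k)(\mathbf{x}_n - \boldsymbol{\mu}_k)^{\mathsf{T}}$, with $N_k = \sum_{n=1}^{N} \gamma_k(\mathbf{x}_n)$
Use a Lagrange multiplier for the mixing coefficients (to enforce $\sum_k \pi_k = 1$): $\pi_k = N_k / N$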
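Alternating the E-step (responsibilities) with the M-step updates for $\boldsymbol{\mu}_k$, $\boldsymbol{\Sigma}_k$, and $\pi_k$ gives the EM loop below. It is a minimal sketch, not a production implementation: the initialization (random data points as means, the overall data covariance for every component) and the small ridge `reg` added to each covariance to avoid singular matrices are assumptions, and `em_gmm` is a hypothetical name. The recorded log-likelihood should be non-decreasing (up to the tiny effect of `reg`), which makes a useful sanity check.

```python
import numpy as np

def log_gauss(X, mu, cov):
    """log N(x | mu, cov) for each row of X."""
    D = X.shape[1]
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (X - mu).T)
    return -0.5 * (D * np.log(2 * np.pi) + 2 * np.sum(np.log(np.diag(L))) + np.sum(z ** 2, axis=0))

def em_gmm(X, K, n_iter=200, tol=1e-8, seed=0, reg=1e-6):
    N, D = X.shape
    rng = np.random.default_rng(seed)
    mus = X[rng.choice(N, size=K, replace=False)].astype(float)
    covs = np.array([np.cov(X, rowvar=False).reshape(D, D) + reg * np.eye(D) for _ in range(K)])
    pis = np.full(K, 1.0 / K)
    history = []
    for _ in range(n_iter):
        # E-step: responsibilities gamma (N, K), normalized with log-sum-exp
        log_num = np.column_stack([np.log(pis[k]) + log_gauss(X, mus[k], covs[k]) for k in range(K)])
        m = log_num.max(axis=1, keepdims=True)
        log_den = m + np.log(np.exp(log_num - m).sum(axis=1, keepdims=True))
        gamma = np.exp(log_num - log_den)
        history.append(float(log_den.sum()))  # log-likelihood at the current parameters
        # M-step: responsibility-weighted updates
        Nk = gamma.sum(axis=0)
        mus = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            Xc = X - mus[k]
            covs[k] = (gamma[:, k, None] * Xc).T @ Xc / Nk[k] + reg * np.eye(D)
        pis = Nk / N
        if len(history) > 1 and abs(history[-1] - history[-2]) < tol:
            break
    return pis, mus, covs, history
```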
Relation to k-means
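The standard argument: give every component the same fixed covariance $\epsilon\mathbf{I}$. The Gaussian normalizers then cancel, so $\gamma_k(\mathbf{x}_n) \propto \pi_k \exp(-\lVert \mathbf{x}_n - \boldsymbol{\mu}_k \rVert^2 / 2\epsilon)$, and as $\epsilon \to 0$ (with all $\pi_k > 0$) the responsibilities collapse to the hard 0/1 assignments $r_{nk}$ of k-means; the EM mean update then becomes the k-means M-step. A small numerical sketch of that limit; `isotropic_responsibilities` is a hypothetical name.

```python
import numpy as np

def isotropic_responsibilities(X, pis, mus, eps):
    """Responsibilities when every component has covariance eps * I."""
    d2 = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)   # squared distances, (N, K)
    log_num = np.log(pis)[None, :] - d2 / (2 * eps)             # normalizers cancel
    log_num -= log_num.max(axis=1, keepdims=True)               # stabilize before exp
    g = np.exp(log_num)
    return g / g.sum(axis=1, keepdims=True)
```

With `eps=1.0` a point between two centers gets a soft split; with `eps=1e-3` the same point is assigned (numerically) entirely to its nearest center, exactly as k-means would do.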
Fast food example http://nutrition.mcdonalds.com/nutritionexchange/nutritionfacts.pdf
Dessert Cluster Caramel Mocha Frappe Caramel Iced Hazelnut Latte Iced Coffee Strawberry Triple Thick Shake Snack Size McFlurry Hot Caramel Sundae Baked Hot Apple Pie Cinnamon Melts Kiddie Cone Strawberry Sundae
Burger-like Cluster Hamburger Cheeseburger Filet-O-Fish Quarter Pounder with Cheese Premium Grilled Chicken Club Sandwich Ranch Snack Wrap Premium Asian Salad with Crispy Chicken Butter Garlic Croutons Sausage McMuffin Sausage McGriddles
Salad Cluster Premium Southwest Salad with Grilled Chicken Premium Caesar Salad with Grilled Chicken Side Salad Premium Asian Salad without Chicken Premium Bacon Ranch Salad without Chicken
Sauces Cluster 2 /6 Hot Mustard Sauce Spicy Buffalo Sauce Newman's Own Low Fat Balsamic Vinaigrette Ketchup Packet Barbeque Sauce Chipotle Barbeque Sauce
Creamy Sauces Creamy Ranch Sauce Newman's Own Creamy Caesar Dressing Coffee Cream Iced Coffee with Sugar Free Vanilla Syrup
Oatmeal and Apples on their own
Breakfast Artery-Clogging Cluster Sausage McMuffin with Egg Sausage Burrito Egg McMuffin Bacon, Egg & Cheese Biscuit McSkillet Burrito with Sausage Big Breakfast with Hotcakes