Machine Learning and Data Mining: 06 Clustering: Partitioning

by Prof. Pier Luca Lanzi

Course "Machine Learning and Data Mining" for the degree of Computer Engineering at the Politecnico di Milano. This lecture introduces k-means.



    Machine Learning and Data Mining: 06 Clustering: Partitioning (Presentation Transcript)

    • Clustering: Partitioning Methods (Machine Learning and Data Mining)
    • How do partitioning algorithms work? They construct a partition of a data set containing n objects into a set of k clusters, so as to minimize the sum of squared distances $E = \sum_{m=1}^{k} \sum_{t_{mi} \in K_m} (C_m - t_{mi})^2$, where $C_m$ is the center of cluster $K_m$. Given k, find a partition of k clusters that optimizes the chosen partitioning criterion. Global optimum: exhaustively enumerate all partitions. Heuristic methods: the k-means and k-medoids algorithms. k-means: each cluster is represented by the center of the cluster. k-medoids, or PAM (Partitioning Around Medoids): each cluster is represented by one of the objects in the cluster.
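
The slide contrasts exhaustive enumeration with heuristics. As an illustration (not part of the original slides), the sketch below enumerates every assignment of n points to k clusters for a tiny data set and keeps the one with the lowest sum of squared distances to the cluster centers; this is only feasible for very small n, which is why heuristics such as k-means are used in practice. Function and variable names are my own.

```python
import itertools
import numpy as np

def within_cluster_ss(X, labels, k):
    """Sum of squared distances of each point to its cluster mean."""
    total = 0.0
    for m in range(k):
        members = X[labels == m]
        if len(members) > 0:
            total += ((members - members.mean(axis=0)) ** 2).sum()
    return total

def brute_force_partition(X, k):
    """Globally optimal partition by enumerating all k^n assignments (tiny n only)."""
    n = len(X)
    best_labels, best_cost = None, np.inf
    for assignment in itertools.product(range(k), repeat=n):
        labels = np.array(assignment)
        cost = within_cluster_ss(X, labels, k)
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels, best_cost

# Toy example: 6 two-dimensional points, k = 2.
X = np.array([[0, 0], [0, 1], [1, 0], [9, 9], [9, 10], [10, 9]], dtype=float)
labels, cost = brute_force_partition(X, k=2)
print(labels, cost)
```
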
    • K-means Clustering. Works with numeric data only. (1) Pick a number K of cluster centers at random; (2) assign every item to its nearest cluster center (e.g. using Euclidean distance); (3) move each cluster center to the mean of its assigned items; (4) repeat steps 2-3 until convergence (change in cluster assignments less than a threshold).
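
A minimal NumPy sketch of the four steps listed on the slide (random centers, nearest-center assignment with Euclidean distance, mean update, repeat until assignments stop changing). The function name kmeans, its defaults, and the toy data are illustrative choices, not material from the lecture.

```python
import numpy as np

def kmeans(X, k, seed=0, max_iter=100):
    """Basic k-means (Lloyd's algorithm): returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k data points at random as the initial cluster centers.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every item to its nearest center (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 4: stop when the cluster assignments no longer change.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: move each center to the mean of its assigned items.
        for m in range(k):
            if np.any(labels == m):
                centroids[m] = X[labels == m].mean(axis=0)
    return centroids, labels

# Example usage on random 2-D data with two well-separated groups.
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
centroids, labels = kmeans(X, k=2)
```
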
    • K-means Clustering. A partitional clustering approach: each cluster is associated with a centroid (center point), and each point is assigned to the cluster with the closest centroid. The number of clusters, K, must be specified. The basic algorithm is very simple.
    • K-means example (step 1): pick 3 initial cluster centers k1, k2, k3 at random. (figure: scatter plot with the three centers)
    • K-means example (step 2): assign each point to the closest cluster center. (figure)
    • K-means example (step 3): move each cluster center to the mean of its cluster. (figure)
    • K-means example (step 4): reassign the points that are now closest to a different cluster center. (figure)
    • K-means example (step 5): re-compute the cluster means. (figure)
    • K-means example (step 6): re-compute the cluster means. (figure)
    • Two different K-means clusterings of the same data set: an optimal clustering and a sub-optimal clustering. (figure: original points and the two clusterings)
    • Importance of choosing initial centroids. (figure: clustering at iteration 1)
    • Importance of choosing initial centroids. (figure: the clustering evolving over iterations 1 through 6)
    • Evaluating K-means clusters. The most common measure is the sum of squared error (SSE): for each point, the error is its distance to the nearest cluster centroid; to get the SSE, we square these errors and sum them: $SSE = \sum_{i=1}^{K} \sum_{x \in C_i} dist^2(m_i, x)$, where x is a data point in cluster $C_i$ and $m_i$ is the representative point (centroid) of cluster $C_i$. Given two clusterings, we can choose the one with the smaller error. One easy way to reduce the SSE is to increase K, the number of clusters, but a good clustering with a smaller K can have a lower SSE than a poor clustering with a higher K.
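
To make the SSE formula concrete, here is a small helper (my own illustration, not from the slides) that computes the SSE of a clustering and can be used to compare two clusterings of the same data.

```python
import numpy as np

def sse(X, labels, centroids):
    """Sum of squared errors: squared distance of every point to its cluster centroid."""
    total = 0.0
    for i, m in enumerate(centroids):
        members = X[labels == i]
        total += ((members - m) ** 2).sum()
    return total

# Given two clusterings (labels_a, centroids_a) and (labels_b, centroids_b) of the
# same data X, prefer the one with the smaller SSE:
# best = min([(labels_a, centroids_a), (labels_b, centroids_b)],
#            key=lambda lc: sse(X, lc[0], lc[1]))
```
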
    • Importance of choosing initial centroids (continued). (figure: clustering at iteration 1 for a different initialization)
    • Importance of choosing initial centroids (continued). (figure: the clustering evolving over iterations 1 through 5)
    • Problems with selecting initial points. If there are K 'real' clusters, the chance of selecting one centroid from each cluster is small, and it stays relatively small when K is large. If the clusters are all the same size n, the probability is (number of ways to select one centroid from each cluster) / (number of ways to select K centroids) = K! n^K / (Kn)^K = K!/K^K. For example, if K = 10, the probability is 10!/10^10 ≈ 0.00036. Sometimes the initial centroids will readjust themselves in the 'right' way, and sometimes they don't. Consider an example of five pairs of clusters (next slides).
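
The 10!/10^10 figure on the slide can be checked directly; this tiny snippet (my own, for illustration) computes the probability of drawing exactly one initial centroid from each of K equal-sized clusters.

```python
from math import factorial

def prob_one_centroid_per_cluster(K):
    """P(one random initial centroid lands in each of K equal-sized clusters) = K!/K^K."""
    return factorial(K) / K ** K

print(prob_one_centroid_per_cluster(10))  # ~0.00036288, matching the slide
```
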
    • 10 clusters example: starting with two initial centroids in one cluster of each pair of clusters. (figure: iteration 1)
    • 10 clusters example: starting with two initial centroids in one cluster of each pair of clusters. (figure: iterations 1 through 4)
    • 10 clusters example: starting with some pairs of clusters having three initial centroids, while others have only one. (figure: iteration 1)
    • 10 clusters example: starting with some pairs of clusters having three initial centroids, while others have only one. (figure: iterations 1 through 4)
    • K-means. The result can vary significantly depending on the initial choice of seeds, and the algorithm can get trapped in a local minimum. To increase the chance of finding the global optimum: restart with different random seeds. (figure: instances and initial cluster centers)
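
A common way to apply the "restart with different random seeds" advice is to run k-means several times and keep the run with the lowest SSE. The sketch below uses scikit-learn purely as an illustration (the library is not mentioned in the lecture); its n_init parameter performs exactly this kind of multi-restart.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])

# n_init=20 runs k-means from 20 different random initializations and keeps
# the solution with the lowest SSE (reported by scikit-learn as inertia_).
km = KMeans(n_clusters=2, n_init=20, random_state=0).fit(X)
print(km.inertia_, km.cluster_centers_)
```
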
    • K-means Clustering. Initial centroids are often chosen randomly, so the clusters produced vary from one run to another. The centroid is (typically) the mean of the points in the cluster. 'Closeness' is measured by Euclidean distance, cosine similarity, correlation, etc. K-means will converge for the common similarity measures mentioned above; most of the convergence happens in the first few iterations, so the stopping condition is often changed to 'until relatively few points change clusters'. Complexity is O(n*K*I*d), where n = number of points, K = number of clusters, I = number of iterations, d = number of attributes.
    • Solutions to the initial centroids problem. Multiple runs: helps, but probability is not on your side. Sample the data and use hierarchical clustering to determine initial centroids. Select more than k initial centroids and then select among these the most widely separated (a sketch of this idea follows below). Postprocessing. Bisecting K-means: not as susceptible to initialization issues.
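
The "most widely separated" idea can be sketched as a farthest-first traversal: start from one random point, then repeatedly add the point farthest from all centroids chosen so far. This is my own illustrative sketch, not the specific procedure used in the lecture.

```python
import numpy as np

def farthest_first_init(X, k, seed=0):
    """Pick k widely separated initial centroids (farthest-first heuristic)."""
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]          # first centroid at random
    for _ in range(k - 1):
        # Distance from every point to its nearest already-chosen centroid.
        d = np.linalg.norm(X[:, None, :] - np.array(centroids)[None, :, :], axis=2).min(axis=1)
        centroids.append(X[d.argmax()])            # farthest point becomes the next centroid
    return np.array(centroids)
```
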
    • Handling empty clusters. The basic K-means algorithm can yield empty clusters. Several strategies exist for choosing a replacement centroid: choose the point that contributes most to the SSE, or choose a point from the cluster with the highest SSE. If there are several empty clusters, the above can be repeated several times.
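
A sketch (my own, consistent with the first strategy on the slide) of repairing an empty cluster by re-seeding it with the point that currently contributes most to the SSE.

```python
import numpy as np

def fix_empty_clusters(X, labels, centroids):
    """Re-seed any empty cluster with the point contributing most to the SSE.

    X: (n, d) array, labels: (n,) int array, centroids: (k, d) array.
    """
    k = len(centroids)
    for m in range(k):
        if not np.any(labels == m):                # cluster m is empty
            # Squared distance of every point to its current centroid.
            errors = np.sum((X - centroids[labels]) ** 2, axis=1)
            worst = errors.argmax()                # point with the largest contribution to SSE
            centroids[m] = X[worst]
            labels[worst] = m
    return labels, centroids
```
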
    • Updating centers incrementally. In the basic K-means algorithm, centroids are updated after all points have been assigned to a centroid. An alternative is to update the centroids after each assignment (incremental approach), in which case each assignment updates zero or two centroids. This is more expensive and introduces an order dependency, but it never produces an empty cluster, and 'weights' can be used to change the impact of each point.
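
The incremental variant can be sketched with running means: when a point joins a cluster of size n+1 the centroid moves by (x - centroid)/(n + 1), and symmetrically when a point leaves. This is an illustrative sketch of the idea, not the exact formulation used in the course.

```python
def move_point(x, src, dst, centroids, counts):
    """Incrementally move point x from cluster src to cluster dst (running-mean update).

    centroids: (k, d) float array, counts: list of cluster sizes. src may be None
    for a point that has not been assigned yet.
    """
    if src == dst:
        return                                     # zero centroids change
    # Remove x from its old cluster (if it had one).
    if src is not None:
        counts[src] -= 1
        if counts[src] > 0:
            centroids[src] -= (x - centroids[src]) / counts[src]
    # Add x to its new cluster.
    counts[dst] += 1
    centroids[dst] += (x - centroids[dst]) / counts[dst]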
    • Pre-processing and post-processing. Pre-processing: normalize the data; eliminate outliers. Post-processing: eliminate small clusters that may represent outliers; split 'loose' clusters, i.e., clusters with relatively high SSE; merge clusters that are 'close' and have relatively low SSE. These steps can also be used during the clustering process.
    • Bisecting K-means. A variant of K-means that can produce either a partitional or a hierarchical clustering.
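
A sketch of bisecting K-means (my own illustration of the variant named on the slide): start with a single cluster, repeatedly pick the cluster with the largest SSE and split it with 2-means, until K clusters are obtained. scikit-learn's KMeans is used here only for the 2-way splits.

```python
import numpy as np
from sklearn.cluster import KMeans

def bisecting_kmeans(X, K):
    """Repeatedly split the cluster with the largest SSE until there are K clusters."""
    clusters = [X]                                  # start with all points in one cluster
    while len(clusters) < K:
        # Pick the cluster with the largest SSE.
        sses = [((c - c.mean(axis=0)) ** 2).sum() for c in clusters]
        target = clusters.pop(int(np.argmax(sses)))
        # Bisect it with ordinary 2-means.
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(target)
        clusters.append(target[labels == 0])
        clusters.append(target[labels == 1])
    return clusters
```
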
    • Bisecting K-means example. (figure)
    • Limitations of K-means. K-means has problems when clusters are of differing sizes, differing densities, or non-globular shapes, and when the data contains outliers.
    • Limitations of K-means: differing sizes. (figure: original points vs. K-means with 3 clusters)
    • Limitations of K-means: differing density. (figure: original points vs. K-means with 3 clusters)
    • Limitations of K-means: non-globular shapes. (figure: original points vs. K-means with 2 clusters)
    • Overcoming K-means limitations. One solution is to use many clusters: this finds parts of the natural clusters, which then need to be put back together. (figure: original points vs. K-means clusters)
    • Overcoming K-means limitations. (figure: original points vs. K-means clusters)
    • Overcoming K-means limitations. (figure: original points vs. K-means clusters)
    • K-means clustering summary. Advantages: simple and understandable; items are automatically assigned to clusters. Disadvantages: the number of clusters must be picked beforehand; all items are forced into a cluster; too sensitive to outliers.
    • K-means clustering. Strengths: relatively efficient; often terminates at a local optimum (the global optimum may be found using techniques such as deterministic annealing and genetic algorithms). Weaknesses: applicable only when a mean is defined (so what about categorical data?); need to specify k, the number of clusters, in advance; unable to handle noisy data and outliers; not suitable for discovering clusters with non-convex shapes.
    • Variations of the K-means method. A few variants of k-means differ in the selection of the initial k means, in the dissimilarity calculations, and in the strategies used to calculate cluster means. Handling categorical data: k-modes, which replaces the means of clusters with modes, uses new dissimilarity measures to deal with categorical objects, and uses a frequency-based method to update the modes of clusters. For a mixture of categorical and numerical data: the k-prototype method.
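
The k-modes idea from the slide (replace means with per-attribute modes and Euclidean distance with a simple mismatch count) can be sketched as follows; this is my own illustrative implementation, not code from the course.

```python
import numpy as np

def column_mode(rows):
    """Most frequent category for each attribute (the 'mode' of a cluster)."""
    modes = []
    for col in rows.T:
        values, counts = np.unique(col, return_counts=True)
        modes.append(values[counts.argmax()])
    return np.array(modes)

def kmodes(X, k, seed=0, max_iter=20):
    """k-modes: like k-means, but with mismatch dissimilarity and mode updates."""
    rng = np.random.default_rng(seed)
    modes = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Assign each object to the mode it mismatches the least (fewest differing attributes).
        dists = np.array([[np.sum(x != m) for m in modes] for x in X])
        labels = dists.argmin(axis=1)
        new_modes = np.array([column_mode(X[labels == j]) if np.any(labels == j) else modes[j]
                              for j in range(k)])
        if np.array_equal(new_modes, modes):
            break
        modes = new_modes
    return modes, labels

# Example usage on a tiny categorical data set.
X = np.array([["red", "small"], ["red", "small"], ["blue", "large"], ["blue", "small"]])
modes, labels = kmodes(X, k=2)
```
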
    • Comments on the K-means method. Strengths: relatively efficient; complexity is O(tkn), where n is the number of objects, k the number of clusters, and t the number of iterations, and normally k, t << n (compare PAM: O(k(n-k)^2) per iteration; CLARA: O(ks^2 + k(n-k))); often terminates at a local optimum (the global optimum may be found using techniques such as deterministic annealing and genetic algorithms). Weaknesses: applicable only when a mean is defined (so what about categorical data?); need to specify k, the number of clusters, in advance; unable to handle noisy data and outliers; not suitable for discovering clusters with non-convex shapes.
    • K-medoids clustering. Instead of the mean, use the median of each cluster: the mean of 1, 3, 5, 7, 9 is 5; the mean of 1, 3, 5, 7, 1009 is 205; the median of 1, 3, 5, 7, 1009 is 5. The median is not affected by extreme values. For large databases, use sampling.
    • K-medoids clustering. Find representative objects, called medoids, in clusters. PAM (Partitioning Around Medoids, 1987) starts from an initial set of medoids and iteratively replaces one of the medoids with one of the non-medoids if this improves the total distance of the resulting clustering. PAM works effectively for small data sets but does not scale well to large data sets. Related methods: CLARA (Kaufmann & Rousseeuw, 1990); CLARANS (Ng & Han, 1994), which uses randomized sampling; focusing + spatial data structure (Ester et al., 1995).
    • A typical K-medoids algorithm (PAM), with K = 2. Arbitrarily choose k objects as initial medoids; assign each remaining object to the nearest medoid; randomly select a non-medoid object O_random; compute the total cost of swapping a medoid with O_random; if the quality is improved, perform the swap; repeat the loop until there is no change. (figures: worked example with total costs 20 and 26)
    • PAM (Partitioning Around Medoids). Uses real objects to represent the clusters. (1) Select k representative objects arbitrarily; (2) for each pair of a non-selected object h and a selected object i, calculate the total swapping cost TC_ih; (3) for each pair of i and h, if TC_ih < 0, i is replaced by h, and each non-selected object is then assigned to the most similar representative object; (4) repeat steps 2-3 until there is no change. (A sketch of this loop follows below.)
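
A sketch of the PAM loop described on the slide: compute the total cost (sum of each object's distance to its nearest medoid), evaluate every (medoid i, non-medoid h) swap, and accept a swap whenever it lowers the total cost. This is my own illustrative implementation, not the lecture's code.

```python
import numpy as np

def total_cost(D, medoids):
    """Sum of each object's distance to its nearest medoid (D is an n x n distance matrix)."""
    return D[:, medoids].min(axis=1).sum()

def pam(D, k, seed=0):
    """Partitioning Around Medoids on a precomputed distance matrix D."""
    rng = np.random.default_rng(seed)
    n = len(D)
    medoids = list(rng.choice(n, size=k, replace=False))
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                candidate = medoids.copy()
                candidate[pos] = h
                # Accept the swap when it lowers the total cost (i.e. TC_ih < 0).
                if total_cost(D, candidate) < total_cost(D, medoids):
                    medoids = candidate
                    improved = True
    labels = D[:, medoids].argmin(axis=1)           # assign each object to its nearest medoid
    return medoids, labels

# Example usage: build D from a set of points X.
# X = np.random.randn(30, 2)
# D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
# medoids, labels = pam(D, k=3)
```
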
    • PAM clustering: total swapping cost $TC_{ih} = \sum_j C_{jih}$. Depending on which representative object a point j ends up closest to after swapping medoid i with non-medoid h (and which medoid t it was closest to before), its contribution is one of four cases: $C_{jih} = 0$; $C_{jih} = d(j, h) - d(j, i)$; $C_{jih} = d(j, h) - d(j, t)$; $C_{jih} = d(j, t) - d(j, i)$. (figures: the four reassignment cases for points i, h, t, and j)
    • What is the problem with PAM? PAM is more robust than k-means in the presence of noise and outliers, because a medoid is less influenced by outliers or other extreme values than a mean. However, PAM works efficiently for small data sets but does not scale well to large data sets: each iteration costs O(k(n-k)^2), where n is the number of data points and k the number of clusters. This motivates the sampling-based method CLARA (Clustering LARge Applications).
    • CLARA (Clustering Large Applications). Developed by Kaufmann and Rousseeuw in 1990. It draws multiple samples of the data set, applies PAM to each sample, and returns the best clustering as the output. Strength: deals with larger data sets than PAM. Weaknesses: efficiency depends on the sample size, and a good clustering based on samples will not necessarily represent a good clustering of the whole data set if the sample is biased.
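
CLARA as described on the slide can be sketched on top of the pam() function from the sketch above (an assumption carried over from that earlier illustration): draw several random samples, run PAM on each sample, evaluate the resulting medoids on the full data set, and keep the best set.

```python
import numpy as np

def clara(X, k, n_samples=5, sample_size=40, seed=0):
    """CLARA sketch: run PAM on several samples, keep the medoid set that is best on all of X."""
    rng = np.random.default_rng(seed)
    best_medoids, best_cost = None, np.inf
    for _ in range(n_samples):
        idx = rng.choice(len(X), size=min(sample_size, len(X)), replace=False)
        sample = X[idx]
        D = np.linalg.norm(sample[:, None] - sample[None, :], axis=2)
        sample_medoids, _ = pam(D, k)                # pam() from the sketch above
        medoids = X[idx[sample_medoids]]             # map sample indices back to original objects
        # Cost of this medoid set over the whole data set, not just the sample.
        cost = np.linalg.norm(X[:, None] - medoids[None, :], axis=2).min(axis=1).sum()
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids, best_cost
```
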
    • CLARANS ("Randomized" CLARA). CLARANS stands for Clustering Algorithm based on RANdomized Search. CLARANS draws a sample of neighbors dynamically. The clustering process can be presented as searching a graph where every node is a potential solution, that is, a set of k medoids. If a local optimum is found, CLARANS starts from a new randomly selected node in search of a new local optimum. It is more efficient and scalable than both PAM and CLARA. Focusing techniques and spatial access structures may further improve its performance (Ester et al. '95).