# Clustering

Data clustering and clustering techniques, with a focus on k-means algorithms

Published in: Education

### Transcript

• 1. Clustering: K-means variants, clustering techniques and applications. Jagdeep Matharu (Brock University), March 18th 2013.
• 2. Clustering
  1. Grouping together data objects that are similar in some way, according to user-defined criteria.
  2. Cluster: a collection of data objects that are similar to each other.
  3. A form of unsupervised learning.
  4. Data exploration: looking for new patterns or structures in the data.
  5. An optimization problem.
• 3. Clustering Task
  1. Pattern representation.
  2. Pattern proximity measure: the most important step; it quantifies how similar or dissimilar two objects are.
  3. Grouping.
• 4. Clustering Techniques
  1. Hierarchical algorithms: create a hierarchical decomposition of the data set. Agglomerative: bottom-up approach. Divisive: top-down approach.
  2. Partition algorithms: create a partition and then evaluate it by some criterion, e.g. k-means, k-medoids.
  Figure 1: Examples of segmentation based on colour or intensity.
• 5. Hierarchical Clustering Algorithms
  1. Sequential clustering algorithm.
  2. Algorithm: assign every data point to a separate cluster; keep merging the most similar pair of data points/clusters until only one cluster remains; after each merge, compute the distances between the new cluster and the old clusters.
  3. Use the distance matrix as the clustering criterion.
  4. Construct nested partitions layer by layer into a tree-like structure.
  5. The resulting tree can be cut to obtain the desired number of clusters.
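The merge loop on slide 5 can be sketched in a few lines of Python. This is a minimal illustration, not the deck's own code: the function names `single_link` and `agglomerative` are hypothetical, and single linkage (distance between the closest pair of points) is an assumption, since the slide does not say how cluster-to-cluster distance is measured. Distances are recomputed on every pass for brevity instead of maintaining a distance matrix.

```python
from itertools import combinations
from math import dist  # Euclidean distance (Python 3.8+)


def single_link(a, b):
    """Cluster distance under single linkage: the closest pair of points."""
    return min(dist(p, q) for p in a for q in b)


def agglomerative(points, num_clusters=1):
    """Merge the two closest clusters until `num_clusters` remain."""
    # Start with every data point in its own cluster.
    clusters = [[p] for p in points]
    while len(clusters) > num_clusters:
        # Find the most similar (closest) pair of clusters; i < j always.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]),
        )
        # Merge cluster j into cluster i; the merge is never undone.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
result = agglomerative(points, num_clusters=3)
```

Stopping the loop at `num_clusters` plays the role of cutting the tree at the level that yields the desired number of clusters.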
• 6. Hierarchical Clustering Algorithms (cont'd)
  1. The result is a binary tree, or dendrogram.
  2. The height of the bars shows how close two objects are.
• 7-31. Hierarchical Clustering Algorithms: Example [figure slides: step-by-step example]
• 32. Hierarchical Clustering Algorithms: Strengths and Weaknesses
  1. Pros: no need to assume the number of clusters in advance; easy to implement.
  2. Cons: at least O(n²) time and space for computing and storing the proximity matrix (a naive agglomerative implementation takes O(n³) time overall); no objective function is directly minimized; merging decisions are final and cannot be undone.
• 33. Partition Clustering Algorithms
  1. Overview: construct a partition of a data set D of n objects into a set of k clusters. The value of k is specified by the user; different values of k result in different cluster output. Find the partition into k clusters that optimizes the chosen partition criterion (error function), e.g. the sum of squared errors (SSE).
  2. Combinatorial search can be computationally expensive.
• 34. Partition Clustering Algorithms
  1. k-medoids: uses a medoid (an actual data point) to represent the cluster.
  2. k-means: uses a centroid to represent the cluster.
  3. Variations: bisecting k-means, ISODATA.
• 35. Partition Clustering Algorithms: the k-means algorithm
  1. Choose k initial centroids (center points).
  2. Each cluster is associated with a centroid.
  3. Each data object is assigned to the closest centroid.
  4. The centroid of each cluster is then updated based on the data objects assigned to the cluster.
  5. Repeat the assignment and update steps until convergence.
  Figure 2: Algorithm
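The five steps on slide 35 map onto a short loop. The sketch below is illustrative only: the names `mean` and `kmeans`, the `max_iter` cap, the fixed `seed`, and the choice of random data points as initial centroids are all assumptions, and an empty cluster simply keeps its previous centroid.

```python
import random
from math import dist  # Euclidean distance (Python 3.8+)


def mean(points):
    """Centroid (component-wise mean) of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: choose k initial centroids (here: k random data points).
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Step 3: assign each point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 4: update each centroid; an empty cluster keeps its old one.
        new_centroids = [
            mean(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        # Step 5: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, k=2)
```

Each iteration touches every point once per centroid, which is where the linear cost in the number of points comes from.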
• 36-41. Partition Clustering Algorithms: K-means Example [figure slides: step-by-step example]
• 42. Partition Clustering Algorithms: K-means
  1. What is the value of k?
  2. How to choose the initial centroids?
  3. How to assign points to the closest centroid?
  4. How to evaluate the clusters?
  5. Other issues.
• 43. Partition Clustering Algorithms: Choosing the value of k
  1. k represents the number of clusters required in a partition.
  2. It must be specified beforehand.
  3. There is no rule of thumb for choosing k; it is a matter of trial and error.
  4. Different values of k may lead to different results.
• 44. Partition Clustering Algorithms: Choosing the initial centroids
  1. Key step of the k-means method.
  2. Different initial centroids can produce different results.
• 45. Partition Clustering Algorithms: Example - Optimal Initial Centroid [figure slide]
• 46. Partition Clustering Algorithms: Example - Sub-Optimal Initial Centroid [figure slide]
• 47. Partition Clustering Algorithms: Choosing the initial centroids
  1. Choose the initial centroids randomly. This can lead to poor clustering.
  2. Perform multiple runs, each with randomly chosen initial centroids, and select the run whose clusters give the best result (e.g. the lowest SSE).
  3. Take a sample of points and cluster them using a hierarchical clustering technique. Extract k clusters from the hierarchy and use their centroids as the initial centroids.
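The second strategy on slide 47 (several random starts, keep the best) might look like the sketch below. It assumes SSE as the selection criterion, and the names `lloyd`, `sse` and `best_of_n_runs`, the `runs` count and the `seed` are hypothetical choices made for illustration.

```python
import random
from math import dist  # Euclidean distance (Python 3.8+)


def lloyd(points, centroids, max_iter=100):
    """Basic k-means iterations starting from the given centroids."""
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            i = min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            clusters[i].append(p)
        # Recompute centroids; an empty cluster keeps its old centroid.
        updated = [
            tuple(sum(x) / len(c) for x in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


def sse(centroids, clusters):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(dist(m, x) ** 2 for m, c in zip(centroids, clusters) for x in c)


def best_of_n_runs(points, k, runs=10, seed=0):
    """Run k-means `runs` times from random starts; keep the lowest SSE."""
    rng = random.Random(seed)
    best = None
    for _ in range(runs):
        centroids, clusters = lloyd(points, rng.sample(points, k))
        score = sse(centroids, clusters)
        if best is None or score < best[0]:
            best = (score, centroids, clusters)
    return best


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
score, centroids, clusters = best_of_n_runs(points, k=2, runs=10)
```

More runs cost proportionally more time and can only keep or improve the best SSE found so far.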
• 48. Partition Clustering Algorithms: Assigning points to centroids
  1. The goal is to find the closest centroid for each data point.
  2. Assign each data point to its closest centroid.
  3. This requires a proximity measure to calculate distances, e.g. Euclidean distance or Manhattan distance.
  4. A point is assigned to the centroid at the smallest distance.
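To make the role of the proximity measure concrete, here is a small sketch of the assignment step with the two distances named on slide 48. The function names and the sample coordinates are made up for illustration; the point is chosen so that the two measures disagree.

```python
def euclidean(p, q):
    """Straight-line distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def closest_centroid(point, centroids, distance=euclidean):
    """Index of the centroid at the smallest distance from `point`."""
    return min(range(len(centroids)), key=lambda i: distance(point, centroids[i]))


centroids = [(0.0, 0.0), (8.0, 3.0)]
point = (3.0, 3.0)

by_euclidean = closest_centroid(point, centroids, euclidean)  # index 0
by_manhattan = closest_centroid(point, centroids, manhattan)  # index 1
```

The point lies diagonally from the first centroid (Euclidean about 4.24, Manhattan 6) and along one axis from the second (5 under both measures), so the choice of measure changes the assignment.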
• 49. Partition Clustering Algorithms: Cluster Evaluation
  1. The most common measure is the sum of squared errors (SSE).
  2. The goal is to reduce the error.
  3. The error of a data point is its distance to the nearest cluster representative.
  4. Mathematically: $\mathrm{SSE} = \sum_{i=1}^{K} \sum_{x \in C_i} \mathrm{dist}^2(m_i, x)$
  5. Here dist is the distance from a data point to its cluster representative, x is a data point in cluster $C_i$, and $m_i$ is the representative point of cluster $C_i$.
  6. Given two sets of clusters, we choose the one with the smallest error.
  7. Increasing k is one way to reduce the SSE.
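The SSE formula on slide 49 translates directly into code. The sketch below assumes Euclidean distance for dist and uses made-up points; `sse` is a hypothetical helper name.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def sse(centroids, clusters):
    """Sum over clusters C_i and points x in C_i of dist(m_i, x) squared."""
    return sum(
        dist(m, x) ** 2
        for m, cluster in zip(centroids, clusters)
        for x in cluster
    )


# Two clusters: every point is at distance 1 from its centroid.
two_clusters = [[(0.0, 0.0), (2.0, 0.0)], [(5.0, 5.0), (5.0, 7.0)]]
two_centroids = [(1.0, 0.0), (5.0, 6.0)]
sse_k2 = sse(two_centroids, two_clusters)

# The same four points as a single cluster around their overall mean.
one_cluster = [[(0.0, 0.0), (2.0, 0.0), (5.0, 5.0), (5.0, 7.0)]]
one_centroid = [(3.0, 3.0)]
sse_k1 = sse(one_centroid, one_cluster)
```

Here `sse_k2` is 4 (four unit distances, each squared), while `sse_k1` is about 56, which illustrates the last point on the slide: going from one cluster to two lowers the error.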
• 50. Partition Clustering Algorithms: k-means
  1. Pros: easy to implement; guaranteed to converge (to a local optimum), with most of the improvement in the first few iterations; complexity linear in the number of points, O(n), for a fixed k and number of iterations.
  2. Cons: k must be specified in advance; sensitive to outliers; may yield empty clusters.
• 51. Partition Clustering Algorithms: Bisecting k-means
  1. Variation of the basic k-means method.
  2. Can produce a partitional or a hierarchical clustering.
  3. To obtain K clusters, first split the set of all points into two clusters.
  4. Choose one of the clusters to split again: the largest cluster, the one with the highest SSE, or a choice based on both.
  5. Continue until K clusters have been produced.
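A minimal sketch of bisecting k-means as outlined on slide 51, under several assumptions: the cluster to split is always the one with the highest SSE, each split uses a single run of 2-means (implementations often try several and keep the best), and the input points are distinct. All function names are hypothetical.

```python
import random
from math import dist  # Euclidean distance (Python 3.8+)


def centroid(points):
    return tuple(sum(x) / len(points) for x in zip(*points))


def sse(points):
    """Squared error of one cluster around its own centroid."""
    m = centroid(points)
    return sum(dist(m, p) ** 2 for p in points)


def two_means(points, rng, max_iter=100):
    """Split `points` (at least two distinct points) with k-means, k = 2."""
    centers = rng.sample(points, 2)
    halves = None
    for _ in range(max_iter):
        trial = ([], [])
        for p in points:
            side = int(dist(p, centers[1]) < dist(p, centers[0]))
            trial[side].append(p)
        if not trial[0] or not trial[1]:
            break  # degenerate split: keep the previous one
        halves = trial
        updated = [centroid(h) for h in halves]
        if updated == centers:
            break
        centers = updated
    return halves


def bisecting_kmeans(points, k, seed=0):
    if k > len(set(points)):
        raise ValueError("k must not exceed the number of distinct points")
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        # Choose the cluster with the highest SSE and split it in two.
        worst = max(clusters, key=sse)
        clusters.remove(worst)
        clusters.extend(two_means(worst, rng))
    return clusters


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0), (10.0, 1.0)]
clusters = bisecting_kmeans(points, k=3)
```

Recording which cluster each split came from would give the hierarchical view mentioned on the slide; this sketch keeps only the final partition.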
• 52. Partition Clustering Algorithms: ISODATA
  1. Iterative Self-Organizing Data Analysis Technique A (ISODATA).
  2. The number of clusters does not need to be known in advance.
  3. Cluster centers are placed randomly and points are assigned to the closest centroid.
  4. The standard deviation within each cluster and the distance between cluster centers are calculated. Clusters are split if the standard deviation is greater than the user-defined threshold. Clusters are merged if the distance between them is less than the user-defined threshold.
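The split and merge tests on slide 52 can be sketched as two small checks. This covers only the decision step, not a full ISODATA implementation (which has further parameters); the names `split_candidates` and `merge_candidates`, the thresholds `max_std` and `min_dist`, and the use of the largest per-axis standard deviation are assumptions made for illustration.

```python
from itertools import combinations
from math import dist  # Euclidean distance (Python 3.8+)
from statistics import pstdev


def centroid(points):
    return tuple(sum(x) / len(points) for x in zip(*points))


def split_candidates(clusters, max_std):
    """Indices of clusters whose spread along some axis exceeds `max_std`."""
    return [
        i
        for i, c in enumerate(clusters)
        if max(pstdev(axis) for axis in zip(*c)) > max_std
    ]


def merge_candidates(clusters, min_dist):
    """Index pairs of clusters whose centers are closer than `min_dist`."""
    centers = [centroid(c) for c in clusters]
    return [
        (i, j)
        for i, j in combinations(range(len(clusters)), 2)
        if dist(centers[i], centers[j]) < min_dist
    ]


clusters = [
    [(0.0, 0.0), (10.0, 0.0)],   # wide cluster
    [(20.0, 0.0), (21.0, 0.0)],  # tight cluster
    [(22.0, 0.0), (23.0, 0.0)],  # tight cluster next to the previous one
]
to_split = split_candidates(clusters, max_std=2.0)
to_merge = merge_candidates(clusters, min_dist=3.0)
```

With these thresholds the wide cluster (index 0) is flagged for splitting and the two neighbouring tight clusters (indices 1 and 2) are flagged for merging.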
• 53. Partition Clustering Algorithms: Practical example of k-means
  1. Image segmentation using k-means clustering.
  Figure 3: Examples of segmentation based on colour or intensity.