Data clustering and clustering techniques, with a focus on k-means algorithms

Published in:
Education

- 1. Clustering, K-means variants clustering techniques and applications. Jagdeep Matharu, Brock University, March 18th 2013 (54 slides)
- 2. Clustering Algorithms: Clustering
    1. Grouping together data objects that are similar in some way, according to user-defined criteria.
    2. Cluster: a collection of data objects that are similar to each other.
    3. A form of unsupervised learning.
    4. Data exploration: looking for new patterns or structures in the data.
    5. An optimization problem.
- 3. Clustering Algorithms: Clustering Task
    1. Pattern representation.
    2. Pattern proximity measure: the most important step; it quantifies how similar or dissimilar two objects are.
    3. Grouping.
- 4. Clustering Algorithms: Clustering Techniques
    1. Hierarchical algorithms: create a hierarchical decomposition of the data set. Agglomerative: bottom-up approach. Divisive: top-down approach.
    2. Partition algorithms: create a partition and then evaluate it by some criterion, e.g. k-means, k-medoids.
    Figure 1: Examples of segmentation based on colour or intensity.
- 5. Hierarchical Clustering Algorithms
    1. Sequential clustering algorithm.
    2. Algorithm: assign every data point to a separate cluster. Keep merging the most similar pair of data points/clusters until only one cluster remains. After each merge, compute the distances between the new cluster and the old clusters.
    3. Use a distance matrix as the clustering criterion.
    4. Construct nested partitions layer by layer into a tree-like structure.
    5. The resulting tree can be cut to obtain the desired number of clusters.
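The merge loop described on slide 5 can be sketched as follows. This is a minimal illustration, not the presenter's code: it assumes Euclidean distance and single linkage (cluster distance = closest pair of points), neither of which the slide specifies, and the function names are hypothetical. It recomputes distances on every pass instead of maintaining a distance matrix, which keeps the sketch short at the cost of speed.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points given as coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def single_link(c1, c2):
    """Assumed linkage: distance between the closest pair across two clusters."""
    return min(euclidean(p, q) for p in c1 for q in c2)

def agglomerative(points, num_clusters=1):
    """Merge the closest pair of clusters until num_clusters remain."""
    num_clusters = max(1, num_clusters)
    # Start with every data point in its own cluster.
    clusters = [[p] for p in points]
    while len(clusters) > num_clusters:
        # Find the most similar (closest) pair of clusters.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_link(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge the pair; distances to the merged cluster are recomputed
        # on the next pass through the loop.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
print(agglomerative(points, num_clusters=3))
```

Stopping at `num_clusters` plays the role of cutting the tree at the desired level; running to a single cluster and recording each merge would give the full hierarchy.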
- 6. Hierarchical Clustering Algorithms (cont'd)
    1. The result is a binary tree, or dendrogram.
    2. The height of the bars shows how close two objects are.
- 7-31. Hierarchical Clustering Algorithms: Example (25 slides with no transcribed text)
- 32. Hierarchical Clustering Algorithms: Strengths and Weaknesses
    1. Pros: No need to assume the number of clusters in advance. Easy to implement.
    2. Cons: Time and space complexity of O(n^2) for computing the proximity matrix. No objective function is directly minimized. Merging decisions are final and cannot be undone.
- 33. Partition Clustering Algorithms
    1. Overview: Construct a partition of a data set D of n objects into a set of k clusters. The value of k is specified by the user; different values of k result in different cluster output. Find the partition into k clusters that optimizes the chosen partition criterion (error function), e.g. the error sum of squares (SSE).
    2. Combinatorial search can be computationally expensive.
- 34. Partition Clustering Algorithms
    1. k-medoids: uses a medoid (an actual data point) to represent the cluster.
    2. k-means: uses a centroid to represent the cluster.
    3. Variations: bisecting k-means, ISODATA.
- 35. Partition Clustering Algorithms: k-means algorithm
    1. Choose k initial centroids (center points).
    2. Each cluster is associated with a centroid.
    3. Each data object is assigned to the closest centroid.
    4. The centroid of each cluster is then updated based on the data objects assigned to the cluster.
    5. Repeat the assignment and update steps until convergence.
    Figure 2: Algorithm
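The five steps on slide 35 map onto a short loop. The sketch below is an illustration under stated assumptions rather than the presenter's implementation: initial centroids are k distinct data points drawn at random, distance is Euclidean, convergence means "no assignment changed", and an empty cluster simply keeps its previous centroid. The names `closest`, `mean` and `kmeans` are hypothetical.

```python
import math
import random

def closest(point, centroids):
    """Index of the centroid nearest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))

def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1: choose k initial centroids (assumed: random distinct data points).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 3: assign each data object to its closest centroid.
        new_labels = [closest(p, centroids) for p in points]
        # Step 5: stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its assigned points.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = mean(members)
    return centroids, labels

points = [(0.0, 0.0), (1.0, 0.0), (100.0, 0.0), (101.0, 0.0)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

Which cluster receives label 0 depends on the random initial choice, so only the grouping, not the label values, should be relied on.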
- 36-41. Partition Clustering Algorithms: K-means Example (6 slides with no transcribed text)
- 42. Partition Clustering Algorithms: K-means
    1. What is the size of k?
    2. How to choose the initial centroids?
    3. How to assign points to the closest centroid?
    4. How to evaluate clusters?
    5. Other issues.
- 43. Partition Clustering Algorithms: Choosing the value of k
    1. k represents the number of clusters required in a partition.
    2. It must be specified beforehand.
    3. There is no rule of thumb for choosing k; it is a matter of trial and error.
    4. Different values of k may lead to different results.
- 44. Partition Clustering Algorithms: Choosing initial centroids
    1. A key step of the k-means method.
    2. Different initial centroids can produce different results.
- 45. Partition Clustering Algorithms: Example - Optimal Initial Centroids (no transcribed text)
- 46. Partition Clustering Algorithms: Example - Sub-Optimal Initial Centroids (no transcribed text)
- 47. Partition Clustering Algorithms: Choosing initial centroids
    1. Choose the initial centroids randomly. This can lead to poor clustering.
    2. Perform multiple runs, each with randomly chosen initial centroids, and select the set of clusters with the best solution.
    3. Take a sample of points and cluster them using a hierarchical clustering technique. Extract k clusters from the hierarchy and use the centroids of those clusters as the initial centroids.
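Option 2 on slide 47 (multiple random runs, keep the best) might look like the sketch below. It assumes "best" means lowest sum of squared errors and uses Euclidean distance; the helper names `kmeans_once`, `sse` and `best_of_runs` are hypothetical, and the k-means loop is a compact version written only to make the example self-contained.

```python
import math
import random

def kmeans_once(points, k, rng, max_iter=100):
    """One k-means run from randomly chosen initial centroids."""
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda i: math.dist(p, centroids[i]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels

def sse(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

def best_of_runs(points, k, runs=10, seed=0):
    """Repeat k-means with fresh random centroids; keep the lowest-SSE result."""
    rng = random.Random(seed)
    best = None
    for _ in range(runs):
        centroids, labels = kmeans_once(points, k, rng)
        error = sse(points, centroids, labels)
        if best is None or error < best[0]:
            best = (error, centroids, labels)
    return best

points = [(0.0, 0.0), (1.0, 0.0), (100.0, 0.0), (101.0, 0.0)]
error, centroids, labels = best_of_runs(points, k=2)
print(error, centroids)
```

More runs raise the chance of hitting a good initialization but multiply the cost, so the run count is a trade-off the caller has to pick.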
- 48. Partition Clustering Algorithms: Assigning points to centroids
    1. The goal is to find the closest centroid for each data point.
    2. Assign each data point to its closest centroid.
    3. A proximity measure is required to calculate distances, e.g. Euclidean distance or Manhattan distance.
    4. A point is assigned to the centroid with the smallest distance.
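The choice of proximity measure on slide 48 can change which centroid a point is assigned to. The sketch below shows both measures named on the slide; the function names and the sample coordinates are illustrative assumptions.

```python
def euclidean(p, q):
    """Straight-line distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def assign(points, centroids, dist=euclidean):
    """For each point, the index of the centroid with the smallest distance."""
    return [min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            for p in points]

centroids = [(3.0, 3.0), (0.0, 5.0)]
point = (0.0, 0.0)
# Euclidean: about 4.24 to the first centroid vs 5.0 to the second.
print(assign([point], centroids, euclidean))
# Manhattan: 6.0 to the first centroid vs 5.0 to the second.
print(assign([point], centroids, manhattan))
```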
- 49. Partition Clustering Algorithms: Cluster Evaluation
    1. The most common measure is the sum of squared errors (SSE).
    2. The goal is to reduce the error.
    3. The error of a data point is its distance to the nearest cluster.
    4. Mathematically: $\mathrm{SSE} = \sum_{i=1}^{K} \sum_{x \in C_i} \mathrm{dist}^2(m_i, x)$
    5. Here dist is the distance from a data point to the cluster representative, x is a data point in cluster $C_i$, and $m_i$ is the representative point of cluster $C_i$.
    6. Given two clusterings, we choose the one with the smallest error.
    7. SSE can be reduced by increasing k.
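A small worked example may make the SSE formula on slide 49 concrete. The one-dimensional data set below is made up for illustration: four points $\{1, 2, 10, 12\}$, clustered first with $K = 2$ and then with $K = 1$, using the cluster mean as the representative point $m_i$.

```latex
% K = 2: C_1 = \{1, 2\},\ m_1 = 1.5; \quad C_2 = \{10, 12\},\ m_2 = 11
\begin{align*}
\mathrm{SSE}_{K=2} &= (1 - 1.5)^2 + (2 - 1.5)^2 + (10 - 11)^2 + (12 - 11)^2 \\
                   &= 0.25 + 0.25 + 1 + 1 = 2.5 \\[4pt]
% K = 1: C_1 = \{1, 2, 10, 12\},\ m_1 = 6.25
\mathrm{SSE}_{K=1} &= (1 - 6.25)^2 + (2 - 6.25)^2 + (10 - 6.25)^2 + (12 - 6.25)^2 \\
                   &= 27.5625 + 18.0625 + 14.0625 + 33.0625 = 92.75
\end{align*}
```

The drop from 92.75 to 2.5 illustrates point 7: adding clusters lowers SSE, which is also why SSE alone cannot be used to pick k.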
- 50. Partition Clustering Algorithms: k-means
    1. Pros: Easy to implement. Guaranteed to converge, with most of the progress made in the first few iterations. Linear complexity, O(n), in the number of data points.
    2. Cons: k must be specified in advance. Sensitive to outliers. May yield empty clusters.
- 51. Partition Clustering Algorithms: Bisecting k-means
    1. A variation of the basic k-means method.
    2. Can produce a partitional or a hierarchical clustering.
    3. To obtain K clusters, first split the set of all points into two clusters.
    4. Choose one of the clusters to split again. The choice can be the largest cluster, the cluster with the highest SSE, or a criterion based on both.
    5. Continue until K clusters have been produced.
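The bisecting procedure on slide 51 can be sketched as repeated 2-means splits. This illustration assumes the "highest SSE" selection rule (one of the options the slide lists), Euclidean distance, and points given as tuples; `two_means`, `cluster_sse` and `bisecting_kmeans` are hypothetical names.

```python
import math
import random

def two_means(points, rng, max_iter=100):
    """Split points into two clusters with basic k-means (k = 2)."""
    # Sample from distinct values so the two starting centroids differ.
    centroids = rng.sample(sorted(set(points)), 2)
    labels = None
    for _ in range(max_iter):
        new_labels = [0 if math.dist(p, centroids[0]) <= math.dist(p, centroids[1])
                      else 1 for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for i in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return [[p for p, lab in zip(points, labels) if lab == i] for i in (0, 1)]

def cluster_sse(cluster):
    """Sum of squared distances from each point to the cluster mean."""
    center = tuple(sum(c) / len(cluster) for c in zip(*cluster))
    return sum(math.dist(p, center) ** 2 for p in cluster)

def bisecting_kmeans(points, k, seed=0):
    rng = random.Random(seed)
    clusters = [list(points)]  # start with all points in one cluster
    while len(clusters) < k:
        # Assumed selection rule: split the cluster with the highest SSE.
        target = max(clusters, key=cluster_sse)
        if len(set(target)) < 2:
            raise ValueError("no cluster left that can be split")
        clusters.remove(target)
        clusters.extend(two_means(target, rng))
    return clusters

points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0), (50.0, 0.0), (51.0, 0.0)]
print(bisecting_kmeans(points, k=3))
```

Recording the sequence of splits instead of only the final clusters would give the hierarchical clustering mentioned in point 2.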
- 52. Partition Clustering Algorithms: ISODATA
    1. Iterative Self-Organizing Data Analysis Technique A.
    2. The number of clusters does not need to be known in advance.
    3. Cluster centers are placed randomly and points are assigned to the closest centroid.
    4. The standard deviation within each cluster and the distance between cluster centers are calculated. Clusters are split if the standard deviation is greater than the user-defined threshold. Clusters are merged if the distance between them is less than the user-defined threshold.
- 53. Partition Clustering Algorithms: Practical Example of k-means
    1. Image segmentation using k-means clustering.
    Figure 3: Examples of segmentation based on colour or intensity.
- 54. Bibliography
    A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering: A review," 1999.
    P. L. Lanzi. (2007) Clustering: Partitioning methods. [Online]. Available: http://www.slideshare.net/pierluca.lanzi/ machine-learning-and-data-mining-06-clustering-partitioning?from= ss embed
    Tan. (2005) Introduction to data mining. [Online]. Available: http://www-users.cs.umn.edu/∼kumar/dmbook/dmslides/ chap8 basic cluster analysis.pdf