Clustering


Data clustering and clustering techniques, with a focus on k-means algorithms

Published in: Education
Clustering

  1. Clustering, K-means variants: clustering techniques and applications
     Jagdeep Matharu, Brock University, March 18th 2013
     Jagdeep Matharu (Brock University), Clustering - k-means, March 18th 2013
  2. Clustering Algorithms: Clustering
     1. Grouping together data objects that are similar in some way, according to user-defined criteria.
     2. Cluster: a collection of data objects that are similar to each other.
     3. A form of unsupervised learning.
     4. Data exploration: looking for new patterns or structures in the data.
     5. Can be posed as an optimization problem.
  3. Clustering Algorithms: Clustering Task
     1. Pattern representation.
     2. Pattern proximity measure (the most important step): how (dis)similar two objects are.
     3. Grouping.
  4. Clustering Algorithms: Clustering Techniques
     1. Hierarchical algorithms: create a hierarchical decomposition of the data set.
        • Agglomerative: bottom-up approach.
        • Divisive: top-down approach.
     2. Partition algorithms: create a partition and then evaluate it by some criterion, e.g. k-means, k-medoids.
     Figure 1: Examples of segmentation based on colour or intensity.
  5. Clustering Algorithms: Hierarchical Clustering Algorithms
     1. Sequential clustering algorithm.
     2. Algorithm:
        • Assign every data point to a separate cluster.
        • Keep merging the most similar pair of data points/clusters until only one cluster remains.
        • After each merge, compute the distances between the new cluster and the old clusters.
     3. Use the distance matrix as the clustering criterion.
     4. Construct nested partitions layer by layer into a tree-like structure.
     5. The resulting tree can be cut to obtain the desired number of clusters.
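The merge loop on slide 5 can be sketched in a few lines of Python. This is only an illustrative sketch, not the presenter's code: the slide does not say how the distance between two clusters is defined, so single linkage (distance between the closest pair of points) is assumed here, and the names `single_link` and `agglomerative` are made up for the example.

```python
from itertools import combinations
from math import dist


def single_link(a, b):
    """Distance between two clusters as the distance of their closest pair
    of points (single linkage, an assumption of this sketch)."""
    return min(dist(p, q) for p in a for q in b)


def agglomerative(points, num_clusters=1):
    # Start with every data point in its own cluster.
    clusters = [[p] for p in points]
    merges = []  # history of (cluster_a, cluster_b, distance), one entry per merge
    while len(clusters) > num_clusters:
        # Find the most similar (closest) pair of clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j], single_link(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        # Replace the two old clusters with the merged one.
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters, merges


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
clusters, merges = agglomerative(points, num_clusters=2)
```

Stopping at `num_clusters=2` plays the role of cutting the tree; running down to one cluster and keeping `merges` gives the information a dendrogram is drawn from.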
  6. Clustering Algorithms: Hierarchical Clustering Algorithms (cont'd)
     1. The result is a binary tree, or dendrogram.
     2. The height of the bars shows how close two objects are.
  7-31. Clustering Algorithms: Hierarchical Clustering Algorithms, Example (slides 7 to 31; figures not included in this transcript)
  32. Clustering Algorithms: Hierarchical Clustering Algorithms, Strengths and Weaknesses
      1. Pros:
         • No need to assume the number of clusters in advance.
         • Easy to implement.
      2. Cons:
         • Time and space complexity of at least O(n^2), from computing and storing the proximity matrix.
         • No objective function is directly minimized.
         • Merging decisions are final and cannot be undone.
  33. Partition Clustering Algorithms
      1. Overview:
         • Construct a partition of a data set D of n objects into a set of k clusters.
         • The value of k is specified by the user; different values of k produce different clusterings.
         • Find the partition into k clusters that optimizes the chosen partition criterion (error function), e.g. the error sum of squares (SSE).
      2. A combinatorial search over all partitions can be computationally expensive.
  34. Partition Clustering Algorithms
      1. k-medoids: uses a medoid (an actual data point) to represent the cluster.
      2. k-means: uses a centroid to represent the cluster.
      3. Variations: bisecting k-means, ISODATA.
  35. Partition Clustering Algorithms: k-means algorithm
      1. Choose k initial centroids (center points).
      2. Each cluster is associated with a centroid.
      3. Each data object is assigned to the closest centroid.
      4. The centroid of each cluster is then updated based on the data objects assigned to the cluster.
      5. Repeat the assignment and update steps until convergence.
      Figure 2: Algorithm.
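The five steps on slide 35 map almost line for line onto code. The following is a minimal sketch under a few assumptions that the slide does not state: initial centroids are drawn at random from the data, distance is Euclidean, an empty cluster keeps its previous centroid, and "convergence" means the centroids stop changing. The function names are illustrative.

```python
import random
from math import dist


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: choose k initial centroids (here: k random data points).
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Steps 2-3: one cluster per centroid; assign each point to the closest one.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 4: update each centroid from the points assigned to it.
        # An empty cluster keeps its old centroid (a choice made for this sketch).
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        # Step 5: stop once the centroids no longer move.
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, clusters = kmeans(points, k=2)
```

On this small, well-separated data set every choice of two starting points leads to the same two groups; on real data the result can depend on the starting centroids.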
  36-41. Partition Clustering Algorithms: K-means Example (slides 36 to 41; figures not included in this transcript)
  42. Partition Clustering Algorithms: K-means
      1. What value of k should be used?
      2. How to choose the initial centroids?
      3. How to assign points to the closest centroid?
      4. How to evaluate the clusters?
      5. Other issues.
  43. Partition Clustering Algorithms: Choosing the value of k
      1. k represents the number of clusters required in a partition.
      2. It must be specified beforehand.
      3. There is no rule of thumb for choosing k; it is found by trial and error.
      4. Different values of k may lead to different results.
  44. Partition Clustering Algorithms: Choosing the initial centroids
      1. Key step of the k-means method.
      2. Different initial centroids can produce different results.
  45. Partition Clustering Algorithms: Example, optimal initial centroids (figure not included in this transcript)
  46. Partition Clustering Algorithms: Example, sub-optimal initial centroids (figure not included in this transcript)
  47. Partition Clustering Algorithms: Choosing the initial centroids
      1. Choose the initial centroids randomly. This can lead to poor clustering.
      2. Perform multiple runs with randomly chosen initial centroids and select the set of clusters with the best solution (for example, the lowest SSE).
      3. Take a sample of points and cluster them using a hierarchical clustering technique. Extract k clusters from the hierarchy and use the centroids of those clusters as the initial centroids.
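The second option on slide 47 (several runs, keep the best) can be sketched as follows. This is a rough illustration under assumptions: "best" is taken to mean lowest SSE, each run starts from k randomly sampled data points, and the compact `kmeans` helper is a simplified stand-in written for this example.

```python
import random
from math import dist


def kmeans(points, k, rng, max_iter=100):
    """Basic k-means started from k randomly sampled data points."""
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: dist(p, centroids[i]))].append(p)
        updated = [
            tuple(sum(c) / len(pts) for c in zip(*pts)) if pts else centroids[i]
            for i, pts in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


def sse(centroids, clusters):
    """Sum of squared Euclidean distances from each point to its centroid."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(m, x))
        for m, pts in zip(centroids, clusters)
        for x in pts
    )


def best_of_runs(points, k, runs=10, seed=0):
    rng = random.Random(seed)
    # Each call draws fresh random initial centroids from the shared generator.
    results = [kmeans(points, k, rng) for _ in range(runs)]
    scores = [sse(cents, clus) for cents, clus in results]
    best = scores.index(min(scores))
    return results[best], scores


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0),
          (0.0, 8.0), (1.0, 9.0), (0.0, 9.0)]
(best_centroids, best_clusters), scores = best_of_runs(points, k=3)
```

Comparing the entries of `scores` shows how much the outcome varies from one random start to another; the spread is the reason for running more than once.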
  48. Partition Clustering Algorithms: Assigning points to centroids
      1. The goal is to find the closest centroid for each data point.
      2. Assign each data point to its closest centroid.
      3. This requires a proximity measure to calculate distances, e.g. Euclidean distance or Manhattan distance.
      4. Each point is assigned to the centroid at the smallest distance.
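A small sketch of the assignment step on slide 48, showing that the choice of proximity measure can change which centroid a point ends up with. The function names and the sample coordinates are made up for illustration.

```python
from math import dist


def euclidean(p, q):
    """Straight-line distance."""
    return dist(p, q)


def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def assign(points, centroids, distance=euclidean):
    """For each point, return the index of the centroid at the smallest distance."""
    return [
        min(range(len(centroids)), key=lambda i: distance(p, centroids[i]))
        for p in points
    ]


centroids = [(6.0, 0.0), (3.0, 4.0)]
points = [(0.0, 0.0), (5.0, 1.0)]

by_euclidean = assign(points, centroids, euclidean)  # [1, 0]
by_manhattan = assign(points, centroids, manhattan)  # [0, 0]
```

The point (0, 0) is 6 units from the first centroid under both measures, but 5 units (Euclidean) versus 7 units (Manhattan) from the second, so the two measures disagree on its assignment.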
  49. Partition Clustering Algorithms: Cluster Evaluation
      1. The most common measure is the sum of squared errors (SSE).
      2. The goal is to reduce the error.
      3. The error of a data point is its distance to the representative of the nearest cluster.
      4. Mathematically: $\mathrm{SSE} = \sum_{i=1}^{K} \sum_{x \in C_i} \mathrm{dist}^2(m_i, x)$
      5. Here dist is the distance from a data point to a cluster representative, x is a data point in cluster C_i, and m_i is the representative point of cluster C_i.
      6. Given two clusterings (sets of clusters), we choose the one with the smaller error.
      7. One way to reduce SSE is to increase k.
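The SSE on slide 49 can be checked by hand on a tiny data set. In this sketch the representative point m_i is assumed to be the mean of cluster C_i and dist is assumed to be Euclidean; the four sample points are invented for the example.

```python
def centroid(points):
    """Mean of a non-empty cluster, used here as its representative m_i."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def sse(clusters):
    """Sum over clusters C_i and points x in C_i of dist^2(m_i, x)."""
    total = 0.0
    for cluster in clusters:
        m = centroid(cluster)
        for x in cluster:
            total += sum((a - b) ** 2 for a, b in zip(m, x))
    return total


points = [(1.0, 1.0), (2.0, 1.0), (4.0, 3.0), (5.0, 4.0)]

sse_k1 = sse([points])                   # one cluster holding everything: 16.75
sse_k2 = sse([points[:2], points[2:]])   # two clusters: 0.5 + 1.0 = 1.5
```

Going from one cluster to two drops the SSE from 16.75 to 1.5, which illustrates point 7. Putting every point in its own cluster would give an SSE of 0, so a lower SSE from a larger k does not by itself mean a better clustering.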
  50. Partition Clustering Algorithms: k-means
      1. Pros:
         • Easy to implement.
         • Guaranteed to converge (to a local optimum); most of the convergence happens in the first few iterations.
         • Linear complexity in the number of data points, O(n), for a fixed k and number of iterations.
      2. Cons:
         • Need to specify k in advance.
         • Sensitive to outliers.
         • May yield empty clusters.
  51. Partition Clustering Algorithms: Bisecting k-means
      1. Variation of the basic k-means method.
      2. Can produce a partitional or hierarchical clustering.
      3. To obtain K clusters, first split the set of all points into two clusters.
      4. Choose one of the existing clusters to split again:
         • Can choose the largest cluster.
         • Can choose the one with the highest SSE.
         • Can choose based on both.
      5. Continue until K clusters have been produced.
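A possible sketch of bisecting k-means as outlined on slide 51. It assumes the "highest SSE" rule for choosing which cluster to split, uses a plain 2-means for each split, and stops early if the chosen cluster cannot be split further. Function names and the one-dimensional sample data are illustrative only.

```python
import random
from math import dist


def centroid(pts):
    return tuple(sum(c) / len(pts) for c in zip(*pts))


def sse(pts):
    """Sum of squared distances from the points to their own centroid."""
    m = centroid(pts)
    return sum(sum((a - b) ** 2 for a, b in zip(m, x)) for x in pts)


def two_means(pts, rng, max_iter=50):
    """Split pts into two clusters with basic k-means (k = 2)."""
    # Start from two distinct data points so that neither side starts empty.
    centers = rng.sample(sorted(set(pts)), 2)
    for _ in range(max_iter):
        halves = ([], [])
        for p in pts:
            halves[min((0, 1), key=lambda i: dist(p, centers[i]))].append(p)
        updated = [centroid(h) if h else centers[i] for i, h in enumerate(halves)]
        if updated == centers:
            break
        centers = updated
    return halves


def bisecting_kmeans(points, k, seed=0):
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        # Pick the cluster with the highest SSE (one of the rules on the slide).
        target = max(clusters, key=sse)
        if len(set(target)) < 2:
            break  # nothing left that can be split
        a, b = two_means(target, rng)
        if not a or not b:
            break  # degenerate split; stop rather than loop forever
        clusters.remove(target)
        clusters += [a, b]
    return clusters


points = [(0.0,), (1.0,), (100.0,), (101.0,), (1000.0,), (1001.0,)]
clusters = bisecting_kmeans(points, k=3)
```

Keeping the sequence of splits, instead of only the final list, would give the hierarchical view mentioned in point 2.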
  52. Partition Clustering Algorithms: ISODATA
      1. Iterative Self-Organizing Data Analysis Technique A (ISODATA).
      2. The number of clusters does not need to be known in advance.
      3. Cluster centers are placed randomly and points are assigned to the closest centroid.
      4. The standard deviation within each cluster and the distance between cluster centers are calculated.
         • Clusters are split if the standard deviation is greater than the user-defined threshold.
         • Clusters are merged if the distance between them is less than the user-defined threshold.
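The split and merge tests in point 4 of slide 52 can be illustrated with a short sketch. This is not a full ISODATA implementation: it only evaluates the two threshold checks for a given set of clusters. It assumes that the "standard deviation within a cluster" is the largest per-dimension population standard deviation, and the parameter names `max_std` and `min_dist` are hypothetical stand-ins for the user-defined thresholds.

```python
from itertools import combinations
from math import dist
from statistics import pstdev


def centroid(pts):
    return tuple(sum(c) / len(pts) for c in zip(*pts))


def split_merge_decisions(clusters, max_std, min_dist):
    """Return (indices of clusters to split, index pairs of clusters to merge)."""
    centers = [centroid(c) for c in clusters]
    # Spread of a cluster: largest per-dimension standard deviation (an assumption).
    spreads = [max(pstdev(dim) for dim in zip(*c)) for c in clusters]
    to_split = [i for i, s in enumerate(spreads) if s > max_std]
    to_merge = [
        (i, j)
        for i, j in combinations(range(len(clusters)), 2)
        if dist(centers[i], centers[j]) < min_dist
    ]
    return to_split, to_merge


clusters = [
    [(0.0, 0.0), (10.0, 0.0)],   # widely spread along x
    [(20.0, 0.0), (21.0, 0.0)],
    [(22.0, 0.0), (23.0, 0.0)],  # close to the previous cluster
]
to_split, to_merge = split_merge_decisions(clusters, max_std=2.0, min_dist=3.0)
```

Here the first cluster exceeds the spread threshold and is flagged for splitting, while the centres of the second and third clusters are only 2 units apart and are flagged for merging.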
  53. Partition Clustering Algorithms: Practical Example of k-means
      1. Image segmentation using k-means clustering.
      Figure 3: Examples of segmentation based on colour or intensity.
