
Clustering

Data clustering and clustering techniques, with a focus on k-means algorithms.

Published in: Education

Transcript

  • 1. Clustering, k-means variants, clustering techniques and applications. Jagdeep Matharu, Brock University, March 18th 2013. Jagdeep Matharu (Brock University) Clustering - k-means March 18th 2013 1 / 54
  • 2. Clustering
      1. Grouping together data objects that are similar in some way, according to user-defined criteria.
      2. Cluster: a collection of data objects that are similar to each other.
      3. A form of unsupervised learning.
      4. Data exploration: looking for new patterns or structures in data.
      5. Can be formulated as an optimization problem.
  • 3. Clustering Task
      1. Pattern representation.
      2. Pattern proximity measure (the most important step): how similar or dissimilar two objects are.
      3. Grouping.
  • 4. Clustering Techniques
      1. Hierarchical algorithms: create a hierarchical decomposition of the data set.
         Agglomerative: bottom-up approach.
         Divisive: top-down approach.
      2. Partition algorithms: create a partition and then evaluate it by some criterion, e.g. k-means, k-medoids.
      Figure 1: Examples of segmentation based on colour or intensity.
  • 5. Hierarchical Clustering Algorithms
      1. Sequential clustering algorithm.
      2. Algorithm:
         Assign every data point to a separate cluster.
         Keep merging the most similar pair of data points/clusters until only one cluster remains.
         After each merge, compute the distances between the new cluster and the old clusters.
      3. Use the distance matrix as the clustering criterion.
      4. Construct nested partitions layer by layer into a tree-like structure.
      5. The resulting tree can be cut at some level to obtain the desired number of clusters.
  • 6. Hierarchical Clustering Algorithms (cont'd)
      1. The result is a binary tree, or dendrogram.
      2. The height at which two branches join shows how close the merged objects or clusters are: the lower the join, the more similar they are.
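
The agglomerative loop described on slides 5 and 6 can be sketched in plain Python as follows. This is a minimal, unoptimised sketch that assumes Euclidean distance and single linkage (the distance between two clusters is the distance between their two closest members); the slides do not prescribe a linkage rule, and `euclidean`, `agglomerative` and `linkage` are illustrative names rather than a library API. The recorded merge distances are the join heights a dendrogram would show.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerative(points, num_clusters=1):
    """Naive agglomerative (bottom-up) clustering with single linkage.

    Returns the final clusters (lists of point indices) and the merge
    history as (cluster_a, cluster_b, distance) triples; the distances are
    the heights at which branches join in the dendrogram.
    """
    # Proximity matrix: pairwise distances between all data points.
    dist = [[euclidean(p, q) for q in points] for p in points]

    def linkage(c1, c2):
        # Single linkage (an assumption): distance between the closest members.
        return min(dist[i][j] for i in c1 for j in c2)

    clusters = [[i] for i in range(len(points))]  # every point starts alone
    merges = []
    while len(clusters) > num_clusters:
        # Find the most similar (closest) pair of clusters.
        d, a, b = min((linkage(clusters[a], clusters[b]), a, b)
                      for a in range(len(clusters))
                      for b in range(a + 1, len(clusters)))
        merges.append((clusters[a], clusters[b], d))
        # Merge b into a; distances from the new cluster to the old ones
        # are recomputed from the proximity matrix on the next pass.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, merges

points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters, merges = agglomerative(points)
for left, right, height in merges:
    print(left, "+", right, "at height", round(height, 2))
```

Every pass rescans all pairs of clusters, so this naive version is slow on large data sets; it is meant to show the procedure, not to be efficient.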
  • 7-31. Hierarchical Clustering Algorithms: Example (step-by-step figures)
  • 32. Hierarchical Clustering Algorithms: Strengths and Weaknesses
      1. Pros:
         No need to assume the number of clusters in advance.
         Easy to implement.
      2. Cons:
         O(n^2) space for the proximity matrix, and O(n^3) time in a naive implementation (O(n^2 log n) with more careful data structures).
         No objective function is directly minimized.
         Merging decisions are final: a merge cannot be undone.
  • 33. Partition Clustering Algorithms
      1. Overview:
         Construct a partition of a data set D of n objects into a set of k clusters.
         The value of k is specified by the user; different values of k result in different cluster outputs.
         Find the partition into k clusters that optimizes the chosen partitioning criterion (error function), e.g. the sum of squared errors (SSE).
      2. An exhaustive combinatorial search over all partitions can be computationally expensive.
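
To make the cost of combinatorial search concrete: the number of ways to partition n objects into exactly k non-empty clusters is the Stirling number of the second kind S(n, k), which grows roughly like k^n / k! for fixed k. The sketch below counts partitions with the standard recurrence; `num_partitions` is a hypothetical helper name.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def num_partitions(n, k):
    """Ways to split n distinct objects into exactly k non-empty clusters.

    This is the Stirling number of the second kind S(n, k).
    """
    if n == k:
        return 1      # each object in its own cluster (also covers n = k = 0)
    if n == 0 or k == 0:
        return 0      # no way to form the required non-empty clusters
    # The n-th object either starts a new cluster of its own, or joins one of
    # the k clusters already formed by the other n - 1 objects.
    return num_partitions(n - 1, k - 1) + k * num_partitions(n - 1, k)

for n in (10, 20, 50):
    print(f"n = {n:2d}, k = 4: {num_partitions(n, 4):,} partitions")
```

Even for a few dozen points, enumerating every partition to find the one with the lowest SSE is out of reach, which is why iterative heuristics such as k-means are used instead.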
  • 34. Partition Clustering Algorithms
      1. k-medoids: uses a medoid (an actual data point) to represent each cluster.
      2. k-means: uses the centroid to represent each cluster.
      3. Variations: bisecting k-means, ISODATA.
  • 35. k-means Algorithm
      1. Choose k initial centroids (center points).
      2. Each cluster is associated with a centroid.
      3. Each data object is assigned to the closest centroid.
      4. The centroid of each cluster is then updated based on the data objects assigned to the cluster.
      5. Repeat the assignment and update steps until convergence (the assignments no longer change).
      Figure 2: Algorithm
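
The five steps above can be sketched as a basic k-means loop in plain Python. This sketch assumes Euclidean distance, picks the initial centroids by sampling k data points at random, and lets an empty cluster keep its previous centroid; the slides leave these choices open, and `kmeans` and `closest` are illustrative names.

```python
import random

def closest(point, centroids):
    """Index of the centroid nearest to `point` (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])))

def kmeans(points, k, max_iter=100, seed=0):
    """Basic k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: choose k initial centroids, here k data points sampled at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Steps 2-3: assign every data object to its closest centroid.
        new_labels = [closest(p, centroids) for p in points]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Step 4: move each centroid to the mean of the objects assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```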
  • 36-41. k-means Example (step-by-step figures)
  • 42. k-means
      1. What should the value of k be?
      2. How should the initial centroids be chosen?
      3. How should points be assigned to the closest centroid?
      4. How should the clusters be evaluated?
      5. Other issues.
  • 43. Choosing the Value of k
      1. k is the number of clusters required in a partition.
      2. It must be specified beforehand.
      3. There is no rule of thumb for choosing k; it is found by trial and error.
      4. Different values of k can lead to different results.
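
One way to carry out the trial-and-error search for k (an illustration, not the author's stated procedure) is to run k-means on the same data for several values of k and compare the resulting SSE, the measure defined on slide 49. The synthetic data set and the helper names `sq_dist` and `kmeans_sse` are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_sse(points, k, iters=50, seed=0):
    """Run basic k-means once for a given k and return the final SSE."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda j: sq_dist(p, centroids[j]))].append(p)
        # New centroid = mean of each group; an empty group keeps its old centroid.
        centroids = [[sum(col) / len(g) for col in zip(*g)] if g else centroids[j]
                     for j, g in enumerate(groups)]
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

# Hypothetical data: three well-separated groups of 30 two-dimensional points.
data_rng = random.Random(42)
points = [(cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
          for cx, cy in [(0, 0), (6, 0), (3, 5)] for _ in range(30)]

# Trial and error: try several values of k and compare the SSE.
for k in range(1, 7):
    print(k, round(kmeans_sse(points, k), 1))
```

SSE keeps falling as k grows, so the lowest SSE does not by itself identify the right k; a common heuristic is to look for the value of k after which the decrease levels off.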
  • 44. Choosing the Initial Centroids
      1. A key step of the k-means method.
      2. Different initial centroids can produce different results.
  • 45. Example: Optimal Initial Centroids
  • 46. Example: Sub-optimal Initial Centroids
  • 47. Choosing the Initial Centroids
      1. Choose the initial centroids randomly. This can lead to poor clustering.
      2. Perform multiple runs, each with randomly chosen initial centroids, and select the set of clusters with the best result (e.g. the lowest SSE).
      3. Take a sample of points and cluster them using a hierarchical clustering technique; extract k clusters from the hierarchy and use their centroids as the initial centroids.
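
Strategy 2 can be sketched as follows: run basic k-means several times from different random initial centroids and keep the run with the lowest SSE. The names `kmeans_once` and `kmeans_restarts`, the number of runs, and the four-point data set are assumptions for illustration. On this data, starting from (0, 0) and (0, 2), or from (10, 0) and (10, 2), gets stuck at an SSE of 100, whereas the best partition, {(0, 0), (0, 2)} and {(10, 0), (10, 2)}, has an SSE of 4.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_once(points, k, rng, iters=50):
    """One basic k-means run from randomly chosen initial centroids.

    Returns (sse, centroids).
    """
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda j: sq_dist(p, centroids[j]))].append(p)
        centroids = [[sum(col) / len(g) for col in zip(*g)] if g else centroids[j]
                     for j, g in enumerate(groups)]
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return sse, centroids

def kmeans_restarts(points, k, runs=10, seed=0):
    """Several random initialisations; keep the result with the lowest SSE."""
    rng = random.Random(seed)
    return min((kmeans_once(points, k, rng) for _ in range(runs)),
               key=lambda result: result[0])

# Two tight pairs of points; some initialisations end in a poor local optimum.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
best_sse, best_centroids = kmeans_restarts(points, k=2)
print(best_sse, best_centroids)
```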
  • 48. Assigning Points to Centroids
      1. The goal is to find the closest centroid for each data point.
      2. Each data point is assigned to the closest centroid.
      3. This requires a proximity measure to calculate distances, e.g. Euclidean distance or Manhattan distance.
      4. A point is assigned to the centroid with the smallest distance.
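
A small sketch of the assignment step with the two proximity measures named above, assuming points and centroids are given as tuples; `euclidean`, `manhattan` and `assign` are illustrative helpers. The example is chosen so that the two measures disagree about which centroid is closest.

```python
import math

def euclidean(a, b):
    """Straight-line (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """City-block (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def assign(points, centroids, dist=euclidean):
    """For each point, the index of the centroid with the smallest distance."""
    return [min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points]

centroids = [(0.0, 0.0), (3.0, 7.5)]
point = (3.0, 3.0)
print(assign([point], centroids))             # [0]: sqrt(18) is about 4.24 < 4.5
print(assign([point], centroids, manhattan))  # [1]: 6.0 > 4.5
```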
  • 49. Cluster Evaluation
      1. The most common measure is the sum of squared errors (SSE).
      2. The goal is to reduce the error.
      3. The error of a data point is its distance to the representative point of its cluster.
      4. Mathematically: SSE = \sum_{i=1}^{K} \sum_{x \in C_i} \mathrm{dist}^2(m_i, x)
      5. where dist is the distance between two points, x is a data point in cluster C_i, and m_i is the representative point (centroid) of cluster C_i.
      6. Given two clusterings, we choose the one with the smaller error.
      7. SSE tends to decrease as k increases (with k = n every point is its own centroid and SSE = 0), so a lower SSE alone does not justify a larger k.
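
The SSE formula translates directly into code. This sketch assumes Euclidean distance for dist and takes a cluster label for each point plus the list of centroids m_i; the function name `sse` and the three-point data set are hypothetical.

```python
def sse(points, labels, centroids):
    """SSE: sum over clusters C_i of dist(m_i, x)^2 for every x in C_i.

    `labels[n]` is the cluster index i of points[n]; `centroids[i]` is m_i.
    dist is taken to be Euclidean, so dist^2 is a sum of squared differences.
    """
    return sum(sum((x - m) ** 2 for x, m in zip(p, centroids[i]))
               for p, i in zip(points, labels))

points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
# Clustering A: {0, 2} and {10}, each with its centroid.
print(sse(points, [0, 0, 1], [(1.0, 0.0), (10.0, 0.0)]))   # 2.0
# Clustering B: {0} and {2, 10}, each with its centroid.
print(sse(points, [0, 1, 1], [(0.0, 0.0), (6.0, 0.0)]))    # 32.0
```

Both clusterings use k = 2, so comparing them is fair: A, with the smaller SSE, is the better of the two.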
  • 50. k-means
      1. Pros:
         Easy to implement.
         Guaranteed to converge, usually within a few iterations (to a local optimum of SSE, not necessarily the global one).
         Linear in the number of points: O(n * k * I * d) time for I iterations over d-dimensional data.
      2. Cons:
         k must be specified in advance.
         Sensitive to outliers.
         May yield empty clusters.
  • 51. Bisecting k-means
      1. A variation of the basic k-means method.
      2. Can produce a partitional or a hierarchical clustering.
      3. To obtain K clusters, start by splitting the set of all points into two clusters with basic k-means (k = 2).
      4. Choose one of the current clusters to split again:
         the largest cluster,
         the cluster with the highest SSE,
         or a choice based on both.
      5. Continue until K clusters have been produced.
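
A minimal sketch of bisecting k-means, assuming the "highest SSE" rule from the slide to pick the cluster to split and a single basic 2-means run (seeded with two distinct points) for each split. Common formulations of the algorithm try several trial bisections and keep the one with the lowest SSE; this sketch does not. All function names are illustrative.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(cluster):
    """Centroid (coordinate-wise mean) of a non-empty list of points."""
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))

def sse(cluster):
    """Sum of squared distances from each point to the cluster's centroid."""
    m = mean(cluster)
    return sum(sq_dist(p, m) for p in cluster)

def two_means(points, rng, iters=20):
    """Basic k-means with k = 2, seeded with two distinct data points."""
    centers = rng.sample(sorted(set(points)), 2)
    for _ in range(iters):
        halves = ([], [])
        for p in points:
            nearer = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            halves[nearer].append(p)
        centers = [mean(h) if h else centers[i] for i, h in enumerate(halves)]
    return halves

def bisecting_kmeans(points, k, seed=0):
    """Repeatedly bisect the highest-SSE cluster until there are k clusters."""
    rng = random.Random(seed)
    clusters = [list(points)]  # start with all points in a single cluster
    while len(clusters) < k:
        # Only clusters with at least two distinct points can be split.
        splittable = [i for i, c in enumerate(clusters) if len(set(c)) >= 2]
        if not splittable:
            raise ValueError("k is larger than the number of distinct points")
        target = clusters.pop(max(splittable, key=lambda i: sse(clusters[i])))
        left, right = two_means(target, rng)
        if not left or not right:
            raise RuntimeError("2-means produced an empty half")
        clusters += [left, right]
    return clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0), (21, 0)]
for cluster in bisecting_kmeans(points, k=3):
    print(cluster)
```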
  • 52. ISODATA
      1. Iterative Self-Organizing Data Analysis Technique A.
      2. Does not require the number of clusters to be known in advance.
      3. Cluster centers are randomly placed and points are assigned to the closest centroid.
      4. The standard deviation within each cluster and the distance between cluster centers are calculated:
         clusters are split if their standard deviation is greater than a user-defined threshold;
         clusters are merged if the distance between them is less than a user-defined threshold.
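
The split and merge rules can be sketched as a single pass over existing clusters. This is deliberately simplified and rests on assumptions not in the slides: the spread of a cluster is taken as its largest per-dimension standard deviation, a cluster is split by cutting it at its centre along that dimension, and the closest pair of centres is merged repeatedly while their distance is below the threshold. Full ISODATA implementations interleave such passes with k-means-style reassignment and use further parameters (for example a minimum cluster size). `isodata_adjust`, `split_std` and `merge_dist` are hypothetical names.

```python
import math

def centroid(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))

def isodata_adjust(clusters, split_std, merge_dist):
    """One simplified ISODATA split-and-merge pass over a list of clusters."""
    # Split: if a cluster's largest per-dimension standard deviation exceeds
    # split_std, cut it in two at its centre along that dimension.
    after_split = []
    for c in clusters:
        m = centroid(c)
        stds = [math.sqrt(sum((p[d] - m[d]) ** 2 for p in c) / len(c))
                for d in range(len(m))]
        d = max(range(len(stds)), key=lambda i: stds[i])
        low = [p for p in c if p[d] <= m[d]]
        high = [p for p in c if p[d] > m[d]]
        if stds[d] > split_std and low and high:
            after_split += [low, high]
        else:
            after_split.append(c)
    # Merge: while the two closest centres are nearer than merge_dist, merge them.
    merged = list(after_split)
    while len(merged) > 1:
        centres = [centroid(c) for c in merged]
        gap, i, j = min((math.dist(centres[i], centres[j]), i, j)
                        for i in range(len(centres))
                        for j in range(i + 1, len(centres)))
        if gap >= merge_dist:
            break
        merged[i] = merged[i] + merged[j]
        del merged[j]
    return merged

clusters = [[(0, 0), (0, 1), (10, 0), (10, 1)], [(50, 0)], [(51, 0)]]
for c in isodata_adjust(clusters, split_std=2.0, merge_dist=3.0):
    print(c)
```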
  • 53. Practical Example of k-means
      1. Image segmentation using k-means clustering.
      Figure 3: Examples of segmentation based on colour or intensity.
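
Colour-based segmentation, as in Figure 3, can be sketched by treating each pixel's RGB triple as a data point, clustering the colours with basic k-means, and repainting every pixel with the mean colour of its cluster. The tiny image, the function name `segment` and the choice of k = 2 are assumptions for illustration; a real image would first be loaded with an imaging library and flattened into a list of pixels in the same way.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two colours."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def segment(pixels, k, iters=20, seed=0):
    """Cluster RGB pixels with basic k-means.

    Returns (recoloured pixels, labels): each pixel is replaced by the mean
    colour of its cluster.
    """
    rng = random.Random(seed)
    # Initial centroids: k distinct colours taken from the image.
    centers = [list(c) for c in rng.sample(sorted(set(pixels)), k)]
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(px, centers[j])) for px in pixels]
        for j in range(k):
            members = [px for px, lab in zip(pixels, labels) if lab == j]
            if members:
                centers[j] = [sum(ch) / len(members) for ch in zip(*members)]
    recoloured = [tuple(round(v) for v in centers[lab]) for lab in labels]
    return recoloured, labels

# Hypothetical 2 x 4 image: a reddish left half and a bluish right half.
image = [[(250, 10, 10), (240, 20, 15), (20, 30, 230), (10, 25, 240)],
         [(245, 15, 5), (235, 5, 20), (15, 20, 235), (25, 35, 245)]]
width = len(image[0])
flat = [px for row in image for px in row]          # row-major list of pixels
recoloured, labels = segment(flat, k=2)
segmented = [recoloured[r * width:(r + 1) * width] for r in range(len(image))]
for row in segmented:
    print(row)
```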
