
DATA MINING: Clustering Types


  1. Presented By: Ashwin Shenoy M, 4CB13SCS02
  2. • Partitioning methods
     • Hierarchical methods
     • Density-based methods
     • Grid-based methods
     • Model-based methods (conceptual clustering, neural networks)
  3. • A partitioning method: construct a partition of a database D of n objects into a set of k clusters such that
       o each cluster contains at least one object
       o each object belongs to exactly one cluster
  4. Methods:
       o k-means: each cluster is represented by the center of the cluster (the centroid).
       o k-medoids: each cluster is represented by one of the objects in the cluster (the medoid).
  5. • Input to the algorithm: the number of clusters k, and a database of n objects
     • The algorithm consists of four steps:
       1. partition the objects into k nonempty subsets/clusters
       2. compute a seed point as the centroid (the mean of the objects in the cluster) for each cluster in the current partition
       3. assign each object to the cluster with the nearest centroid
       4. go back to Step 2; stop when there are no more new assignments
  6. An alternative algorithm also consists of four steps:
       1. arbitrarily choose k objects as the initial cluster centers (centroids)
       2. (re)assign each object to the cluster with the nearest centroid
       3. update the centroids
       4. go back to Step 2; stop when there are no more new assignments
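
A minimal sketch of this alternative procedure (Lloyd's algorithm), assuming NumPy, Euclidean distance, and numeric data; the function and parameter names are illustrative, not from the slides:

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    """Steps 1-4 of the alternative k-means algorithm above."""
    rng = np.random.default_rng(seed)
    # Step 1: arbitrarily choose k objects as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iters):
        # Step 2: (re)assign each object to the cluster with the nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 4: stop when there are no more new assignments.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: update each centroid to the mean of the objects assigned to it.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels, centroids
```

For example, `kmeans(np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]]), k=2)` separates the two point pairs.
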
  7. [Figure: snapshots of k-means iterations on a 2-D scatter plot (axes 0–10), with objects reassigned and centroids recomputed until assignments stabilize.]
  8. • Input to the algorithm: the number of clusters k, and a database of n objects
     • The algorithm consists of four steps:
       1. arbitrarily choose k objects as the initial medoids (representative objects)
       2. assign each remaining object to the cluster with the nearest medoid
       3. select a non-medoid object and replace one of the medoids with it if this improves the clustering
       4. go back to Step 2; stop when there are no more new assignments
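
A simplified k-medoids sketch following the steps above (a naive PAM-style swap search), again assuming NumPy and Euclidean distance, with illustrative names:

```python
import numpy as np

def kmedoids(X, k, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances

    def cost(meds):
        # Total distance of every object to its nearest medoid.
        return D[:, meds].min(axis=1).sum()

    # Step 1: arbitrarily choose k objects as the initial medoids.
    medoids = list(rng.choice(n, size=k, replace=False))
    for _ in range(max_iters):
        improved = False
        # Step 3: replace a medoid with a non-medoid if this improves the clustering.
        for i in range(k):
            for candidate in range(n):
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                if cost(trial) < cost(medoids):
                    medoids, improved = trial, True
        if not improved:
            # Step 4: stop when no swap yields any improvement.
            break
    # Step 2: assign each object to the cluster with the nearest medoid.
    return D[:, medoids].argmin(axis=1), medoids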
  9. • A hierarchical method: construct a hierarchy of clusterings, not just a single partition of the objects.
     • The number of clusters k is not required as an input.
     • Uses a distance matrix as the clustering criterion.
     • A termination condition can be used (e.g., a desired number of clusters).
  10. 1. Agglomerative (bottom-up):
         o place each object in its own cluster
         o in each step, merge the two most similar clusters until there is only one cluster left or the termination condition is satisfied (a code sketch follows the figure below)
      2. Divisive (top-down):
         o start with one big cluster containing all the objects
         o divide the most distinctive cluster into smaller clusters and proceed until there are n clusters or the termination condition is satisfied
  11. [Figure: agglomerative vs. divisive clustering of objects a, b, c, d, e; agglomerative merges bottom-up from Step 0 to Step 4, divisive splits top-down in the reverse direction.]
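
As referenced above, a bare-bones agglomerative (bottom-up) sketch using single linkage over a precomputed distance matrix, with a target number of clusters as the termination condition; everything here is illustrative:

```python
import numpy as np

def agglomerative(D, num_clusters):
    """Single-linkage bottom-up clustering on an n x n distance matrix D."""
    clusters = [[i] for i in range(len(D))]  # each object starts in its own cluster
    while len(clusters) > num_clusters:
        # Find the two most similar clusters (smallest single-link distance).
        best, best_d = (0, 1), np.inf
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(D[i][j] for i in clusters[a] for j in clusters[b])
                if d < best_d:
                    best, best_d = (a, b), d
        # Merge them, then repeat until the termination condition is met.
        a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters
```
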
  12. • BIRCH: Balanced Iterative Reducing and Clustering using Hierarchies
      • Incrementally constructs a CF (Clustering Feature) tree, a hierarchical data structure for multiphase clustering:
        o Phase 1: scan the DB to build an initial in-memory CF tree (a multi-level compression of the data that tries to preserve its inherent clustering structure)
        o Phase 2: use an arbitrary clustering algorithm to cluster the leaf nodes of the CF tree
  13. • Clustering feature:
        o a summary of the statistics for a given subcluster: the 0th, 1st, and 2nd moments of the subcluster, from the statistical point of view (a minimal CF sketch follows the figure below)
        o registers crucial measurements for computing clusters and utilizes storage efficiently
      • A CF tree is a height-balanced tree that stores the clustering features for a hierarchical clustering:
        o a non-leaf node in the tree has descendants or "children"
        o non-leaf nodes store the sums of the CFs of their children
      • A CF tree has two parameters:
        o branching factor: the maximum number of children per node
        o threshold: the maximum diameter of the sub-clusters stored at the leaf nodes
  14. [Figure: a CF tree with branching factor B = 7 and leaf capacity L = 6; the root and non-leaf nodes hold CF entries (CF1, CF2, ...) pointing to children, and leaf nodes hold CF entries chained by prev/next pointers.]
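
A tiny sketch of the clustering-feature summary itself, assuming the standard BIRCH formulation CF = (N, LS, SS), i.e., the 0th, 1st, and 2nd moments. The class is illustrative; a real CF tree would additionally implement node splitting under the branching factor and threshold:

```python
import numpy as np

class CF:
    """Clustering feature (N, LS, SS) for a subcluster of d-dimensional points."""
    def __init__(self, dim):
        self.n = 0                # N: number of points (0th moment)
        self.ls = np.zeros(dim)   # LS: linear sum of the points (1st moment)
        self.ss = 0.0             # SS: sum of squared norms (2nd moment)

    def add(self, x):
        # CFs absorb points (and merge with other CFs) by simple addition,
        # which is what makes incremental tree construction cheap.
        x = np.asarray(x, dtype=float)
        self.n += 1
        self.ls += x
        self.ss += float(x @ x)

    def centroid(self):
        return self.ls / self.n

    def radius(self):
        # Root-mean-square distance of member points from the centroid,
        # computed from the summary alone: R^2 = SS/N - ||LS/N||^2.
        return np.sqrt(max(self.ss / self.n - (self.ls @ self.ls) / self.n ** 2, 0.0))
```
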
  15. • ROCK: RObust Clustering using linKs
      • Major idea: use links to measure similarity/proximity.
  16. • Links: the number of common neighbors
        o C1 <a, b, c, d, e>: {a, b, c}, {a, b, d}, {a, b, e}, {a, c, d}, {a, c, e}, {a, d, e}, {b, c, d}, {b, c, e}, {b, d, e}, {c, d, e}
        o C2 <a, b, f, g>: {a, b, f}, {a, b, g}, {a, f, g}, {b, f, g}
      • Let T1 = {a, b, c}, T2 = {c, d, e}, T3 = {a, b, f}
        o link(T1, T2) = 4, since they have 4 common neighbors: {a, c, d}, {a, c, e}, {b, c, d}, {b, c, e}
        o link(T1, T3) = 3, since they have 3 common neighbors: {a, b, d}, {a, b, e}, {a, b, g}
      • Thus link is a better measure than the Jaccard coefficient
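
These link counts can be reproduced directly. A minimal sketch, assuming the usual ROCK convention that two transactions are neighbors when their Jaccard similarity is at least a threshold θ (θ = 0.5 here, so two 3-item sets are neighbors exactly when they share at least 2 items); all names are illustrative:

```python
from itertools import combinations

def jaccard(s, t):
    return len(s & t) / len(s | t)

def neighbors(p, points, theta=0.5):
    # A transaction's neighbors: the *other* transactions with Jaccard >= theta.
    return {q for q in points if q != p and jaccard(p, q) >= theta}

def link(p, q, points, theta=0.5):
    # link(p, q) = number of common neighbors of p and q.
    return len(neighbors(p, points, theta) & neighbors(q, points, theta))

# The point set from the slide: all 3-item subsets of C1 = {a,b,c,d,e}
# and C2 = {a,b,f,g}.
points = {frozenset(c) for c in combinations("abcde", 3)} | \
         {frozenset(c) for c in combinations("abfg", 3)}
T1, T2, T3 = frozenset("abc"), frozenset("cde"), frozenset("abf")
print(link(T1, T2, points))  # 4
print(link(T1, T3, points))  # 3
```
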
  17. • CHAMELEON: measures the similarity based on a dynamic model
        o two clusters are merged only if the interconnectivity and closeness (proximity) between the two clusters are high relative to the internal interconnectivity of the clusters and the closeness of items within the clusters
      • A two-phase algorithm (partially sketched after the figure below):
        1. use a graph-partitioning algorithm to cluster the objects into a large number of relatively small sub-clusters
        2. use an agglomerative hierarchical clustering algorithm to find the genuine clusters by repeatedly combining these sub-clusters
  18. [Figure: the two-phase pipeline: Data Set → Construct Sparse Graph → Partition the Graph → Merge Partitions → Final Clusters.]
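
Phase 1 starts from a sparse graph over the objects, typically a k-nearest-neighbor graph. A minimal sketch of just that construction, assuming NumPy and Euclidean distance; the actual graph partitioning and the interconnectivity/closeness-based merging of phase 2 are omitted:

```python
import numpy as np

def knn_graph(X, k):
    """Boolean adjacency matrix of the k-nearest-neighbor graph of X."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)           # no self-edges
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in np.argsort(D[i])[:k]:    # connect i to its k nearest objects
            adj[i, j] = adj[j, i] = True  # keep the graph undirected
    return adj
```
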
  19. THANK YOU
