SlideShare a Scribd company logo
Clustering
Hierarchical Methods
1
2
Hierarchical Clustering
 Use distance matrix as clustering criteria.
 This method does not require the number of clusters k as an input,
but needs a termination condition
Step 0 Step 1 Step 2 Step 3 Step 4
b
d
c
e
a
a b
d e
c d e
a b c d e
Step 4 Step 3 Step 2 Step 1 Step 0
agglomerative
(AGNES)
divisive
(DIANA)
3
AGNES (Agglomerative Nesting)
 Introduced in Kaufmann and Rousseeuw (1990)
 Implemented in statistical analysis packages
 Use the Single-Link method and the dissimilarity matrix.
 Merge nodes that have the least dissimilarity
 Go on in a non-descending fashion
 Eventually all nodes belong to the same cluster
0
1
2
3
4
5
6
7
8
9
10
0 1 2 3 4 5 6 7 8 9 10
0
1
2
3
4
5
6
7
8
9
10
0 1 2 3 4 5 6 7 8 9 10
0
1
2
3
4
5
6
7
8
9
10
0 1 2 3 4 5 6 7 8 9 10
4
Dendrogram
5
DIANA (Divisive Analysis)
 Introduced in Kaufmann and Rousseeuw (1990)
 Implemented in statistical analysis packages, e.g., Splus
 Inverse order of AGNES
 Eventually each node forms a cluster on its own
0
1
2
3
4
5
6
7
8
9
10
0 1 2 3 4 5 6 7 8 9 10
0
1
2
3
4
5
6
7
8
9
10
0 1 2 3 4 5 6 7 8 9 10
0
1
2
3
4
5
6
7
8
9
10
0 1 2 3 4 5 6 7 8 9 10
6
Distance Measures
 Minimum distance dmin(Ci, Cj) = min p ∈Ci,p’ ∈Cj |p-p’|
 Nearest Neighbor algorithm
 Single-linkage algorithm – terminates when distance > threshold
 Agglomerative Hierarchical – Minimal Spanning tree
 Maximum distance dmax(Ci, Cj) = max p ∈Ci,p’ ∈Cj |p-p’|
 Farthest-neighbor clustering algorithm
 Complete-linkage - terminates when distance > threshold
 Mean distance dmean(Ci, Cj) = |mi-mj|
 Average distance davg(Ci, Cj) = (1/ninj) ∑ p ∈Ci ∑,p’ ∈Cj |p-p’|
7
Difficulties faced in Hierarchical
Clustering
 Selection of merge/split points
 Cannot revert operation
 Scalability
8
Recent Hierarchical Clustering Methods
 Integration of hierarchical and other techniques:
 BIRCH: uses tree structures and incrementally adjusts the quality of
sub-clusters
 CURE: Represents a Cluster by a fixed number of representative
objects
 ROCK: clustering categorical data by neighbor and link analysis
 CHAMELEON : hierarchical clustering using dynamic modeling
9
BIRCH
 Balanced Iterative Reducing and Clustering using Hierarchies
 Incrementally construct a CF (Clustering Feature) tree, a hierarchical data
structure for multiphase clustering
 Phase 1: scan DB to build an initial in-memory CF tree (a multi-level
compression of the data that tries to preserve the inherent clustering structure of
the data)
 Phase 2: use an arbitrary clustering algorithm to cluster the leaf nodes of the
CF-tree
 Scales linearly: finds a good clustering with a single scan and improves the
quality with a few additional scans
 Weakness: handles only numeric data, and sensitive to the order of the
data record.
10
Clustering Feature Vector in BIRCH
Clustering Feature: CF = (N, LS, SS)
N: Number of data points
LS: ∑N
i=1=Xi
SS: ∑N
i=1=Xi
2
0
1
2
3
4
5
6
7
8
9
10
0 1 2 3 4 5 6 7 8 9 10
CF = (5, (16,30),(54,190))
(3,4)
(2,6)
(4,5)
(4,7)
(3,8)
11
CF-Tree in BIRCH
 Clustering feature:
 summary of the statistics for a given subcluster
 registers crucial measurements for computing cluster and utilizes storage
efficiently
A CF tree is a height-balanced tree that stores the clustering features
for a hierarchical clustering
 A nonleaf node in a tree has descendants or “children”
 The nonleaf nodes store sums of the CFs of their children
 A CF tree has two parameters
 Branching factor: specify the maximum number of children.
 threshold: max diameter of sub-clusters stored at the leaf nodes
12
CF Tree
CF1
child1
CF3
child3
CF2
child2
CF6
child6
CF1
child1
CF3
child3
CF2
child2
CF5
child5
CF1 CF2 CF6prev next CF1 CF2 CF4prev next
B = 7
L = 6
Root
Non-leaf node
Leaf node Leaf node
13
CURE (Clustering Using Representatives)
 Supports Non-Spherical Clusters
 Fixed number of representative points are used
 Select scattered objects and shrink by moving towards cluster
center
 Merge clusters with closest pair of representatives
 For large Databases
 Draw a random Sample S
 Partition
 Partially cluster each partition
 Eliminate outliers and slow growing clusters
 Continue Clustering
 Label Cluster by Representatives
14
Cure: Shrinking Representative Points
 Shrink the multiple representative points towards the gravity center by a
fraction of α.
 Multiple representatives capture the shape of the cluster
x
y
x
y
15
Clustering Categorical Data: ROCK
Algorithm
 ROCK: RObust Clustering using linKs
 Major ideas
 Use links to measure similarity/proximity
 Not distance-based
 Number of Common Neighbours
 Algorithm: sampling-based clustering
 Draw random sample
 Cluster with links
 Label data in disk
Sim T T
T T
T T
( , )1 2
1 2
1 2
=
∩
∪
16
CHAMELEON: Hierarchical Clustering Using
Dynamic Modeling
 Measures the similarity based on a dynamic model
 Two clusters are merged only if the interconnectivity and closeness
(proximity) between two clusters are high
 A two-phase algorithm
1. Use a graph partitioning algorithm: cluster objects into a large number of
relatively small sub-clusters
2. Use an agglomerative hierarchical clustering algorithm: find the genuine
clusters by repeatedly combining these sub-clusters
17
Overall Framework of CHAMELEON
Construct
Sparse Graph Partition the Graph
Merge Partition
Final Clusters
Data Set
Chameleon
 Computes similarity based on Relative Interconnectivity
and Relative Closeness
18

More Related Content

What's hot

What's hot (20)

Clustering
ClusteringClustering
Clustering
 
Grid based method & model based clustering method
Grid based method & model based clustering methodGrid based method & model based clustering method
Grid based method & model based clustering method
 
3.7 outlier analysis
3.7 outlier analysis3.7 outlier analysis
3.7 outlier analysis
 
K-means Clustering
K-means ClusteringK-means Clustering
K-means Clustering
 
Clusters techniques
Clusters techniquesClusters techniques
Clusters techniques
 
Presentation on K-Means Clustering
Presentation on K-Means ClusteringPresentation on K-Means Clustering
Presentation on K-Means Clustering
 
K mean-clustering algorithm
K mean-clustering algorithmK mean-clustering algorithm
K mean-clustering algorithm
 
Machine learning clustering
Machine learning clusteringMachine learning clustering
Machine learning clustering
 
DBSCAN : A Clustering Algorithm
DBSCAN : A Clustering AlgorithmDBSCAN : A Clustering Algorithm
DBSCAN : A Clustering Algorithm
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
5.3 mining sequential patterns
5.3 mining sequential patterns5.3 mining sequential patterns
5.3 mining sequential patterns
 
Optics ordering points to identify the clustering structure
Optics ordering points to identify the clustering structureOptics ordering points to identify the clustering structure
Optics ordering points to identify the clustering structure
 
Clustering: Large Databases in data mining
Clustering: Large Databases in data miningClustering: Large Databases in data mining
Clustering: Large Databases in data mining
 
Clustering in Data Mining
Clustering in Data MiningClustering in Data Mining
Clustering in Data Mining
 
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & KamberChapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
 
Types of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithmsTypes of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithms
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
 
Kdd process
Kdd processKdd process
Kdd process
 
Hierarchical Clustering
Hierarchical ClusteringHierarchical Clustering
Hierarchical Clustering
 
Hierachical clustering
Hierachical clusteringHierachical clustering
Hierachical clustering
 

Viewers also liked

K-means and Hierarchical Clustering
K-means and Hierarchical ClusteringK-means and Hierarchical Clustering
K-means and Hierarchical Clustering
guestfee8698
 

Viewers also liked (20)

Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
Machine Learning and Data Mining: 05 Advanced Association Rule Mining
Machine Learning and Data Mining: 05 Advanced Association Rule MiningMachine Learning and Data Mining: 05 Advanced Association Rule Mining
Machine Learning and Data Mining: 05 Advanced Association Rule Mining
 
28 Machine Learning Unsupervised Hierarchical Clustering
28 Machine Learning Unsupervised Hierarchical Clustering28 Machine Learning Unsupervised Hierarchical Clustering
28 Machine Learning Unsupervised Hierarchical Clustering
 
K-means and Hierarchical Clustering
K-means and Hierarchical ClusteringK-means and Hierarchical Clustering
K-means and Hierarchical Clustering
 
Hierarchical clustering
Hierarchical clusteringHierarchical clustering
Hierarchical clustering
 
Cluster Analysis for Dummies
Cluster Analysis for DummiesCluster Analysis for Dummies
Cluster Analysis for Dummies
 
Converting management controllable resources into best in-class performance 2012
Converting management controllable resources into best in-class performance 2012Converting management controllable resources into best in-class performance 2012
Converting management controllable resources into best in-class performance 2012
 
IEEE 2014 JAVA DATA MINING PROJECTS A similarity measure for text classificat...
IEEE 2014 JAVA DATA MINING PROJECTS A similarity measure for text classificat...IEEE 2014 JAVA DATA MINING PROJECTS A similarity measure for text classificat...
IEEE 2014 JAVA DATA MINING PROJECTS A similarity measure for text classificat...
 
A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...
A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...
A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...
 
Cure, Clustering Algorithm
Cure, Clustering AlgorithmCure, Clustering Algorithm
Cure, Clustering Algorithm
 
3.4 density and grid methods
3.4 density and grid methods3.4 density and grid methods
3.4 density and grid methods
 
Computer Aided Molecular Modeling
Computer Aided Molecular ModelingComputer Aided Molecular Modeling
Computer Aided Molecular Modeling
 
Molecular modelling for in silico drug discovery
Molecular modelling for in silico drug discoveryMolecular modelling for in silico drug discovery
Molecular modelling for in silico drug discovery
 
1.9 b trees eg 03
1.9 b trees eg 031.9 b trees eg 03
1.9 b trees eg 03
 
5.3 dyn algo-i
5.3 dyn algo-i5.3 dyn algo-i
5.3 dyn algo-i
 
5.4 randomized datastructures
5.4 randomized datastructures5.4 randomized datastructures
5.4 randomized datastructures
 
trabajo de cultural
trabajo de culturaltrabajo de cultural
trabajo de cultural
 
5.4 randamized algorithm
5.4 randamized algorithm5.4 randamized algorithm
5.4 randamized algorithm
 
Online Trading Concepts
Online Trading ConceptsOnline Trading Concepts
Online Trading Concepts
 
RESUME-ARITRA BHOWMIK
RESUME-ARITRA BHOWMIKRESUME-ARITRA BHOWMIK
RESUME-ARITRA BHOWMIK
 

Similar to 3.3 hierarchical methods

ensembles_emptytemplate_v2
ensembles_emptytemplate_v2ensembles_emptytemplate_v2
ensembles_emptytemplate_v2
Shrayes Ramesh
 

Similar to 3.3 hierarchical methods (20)

My8clst
My8clstMy8clst
My8clst
 
multiarmed bandit.ppt
multiarmed bandit.pptmultiarmed bandit.ppt
multiarmed bandit.ppt
 
dm_clustering2.ppt
dm_clustering2.pptdm_clustering2.ppt
dm_clustering2.ppt
 
Dataa miining
Dataa miiningDataa miining
Dataa miining
 
Chapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.pptChapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.ppt
 
Data mining concepts and techniques Chapter 10
Data mining concepts and techniques Chapter 10Data mining concepts and techniques Chapter 10
Data mining concepts and techniques Chapter 10
 
10 clusbasic
10 clusbasic10 clusbasic
10 clusbasic
 
data mining cocepts and techniques chapter
data mining cocepts and techniques chapterdata mining cocepts and techniques chapter
data mining cocepts and techniques chapter
 
data mining cocepts and techniques chapter
data mining cocepts and techniques chapterdata mining cocepts and techniques chapter
data mining cocepts and techniques chapter
 
10 clusbasic
10 clusbasic10 clusbasic
10 clusbasic
 
CLUSTERING
CLUSTERINGCLUSTERING
CLUSTERING
 
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
 
Capter10 cluster basic : Han & Kamber
Capter10 cluster basic : Han & KamberCapter10 cluster basic : Han & Kamber
Capter10 cluster basic : Han & Kamber
 
Capter10 cluster basic
Capter10 cluster basicCapter10 cluster basic
Capter10 cluster basic
 
Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)
 
Birch
BirchBirch
Birch
 
Clustering.pptx
Clustering.pptxClustering.pptx
Clustering.pptx
 
CLUSTER ANALYSIS ALGORITHMS.pptx
CLUSTER ANALYSIS ALGORITHMS.pptxCLUSTER ANALYSIS ALGORITHMS.pptx
CLUSTER ANALYSIS ALGORITHMS.pptx
 
Data Mining: Cluster Analysis
Data Mining: Cluster AnalysisData Mining: Cluster Analysis
Data Mining: Cluster Analysis
 
ensembles_emptytemplate_v2
ensembles_emptytemplate_v2ensembles_emptytemplate_v2
ensembles_emptytemplate_v2
 

More from Krish_ver2

More from Krish_ver2 (20)

5.5 back tracking
5.5 back tracking5.5 back tracking
5.5 back tracking
 
5.5 back track
5.5 back track5.5 back track
5.5 back track
 
5.5 back tracking 02
5.5 back tracking 025.5 back tracking 02
5.5 back tracking 02
 
5.4 randomized datastructures
5.4 randomized datastructures5.4 randomized datastructures
5.4 randomized datastructures
 
5.3 dynamic programming 03
5.3 dynamic programming 035.3 dynamic programming 03
5.3 dynamic programming 03
 
5.3 dynamic programming
5.3 dynamic programming5.3 dynamic programming
5.3 dynamic programming
 
5.2 divede and conquer 03
5.2 divede and conquer 035.2 divede and conquer 03
5.2 divede and conquer 03
 
5.2 divide and conquer
5.2 divide and conquer5.2 divide and conquer
5.2 divide and conquer
 
5.2 divede and conquer 03
5.2 divede and conquer 035.2 divede and conquer 03
5.2 divede and conquer 03
 
5.1 greedyyy 02
5.1 greedyyy 025.1 greedyyy 02
5.1 greedyyy 02
 
5.1 greedy
5.1 greedy5.1 greedy
5.1 greedy
 
5.1 greedy 03
5.1 greedy 035.1 greedy 03
5.1 greedy 03
 
4.4 hashing02
4.4 hashing024.4 hashing02
4.4 hashing02
 
4.4 hashing
4.4 hashing4.4 hashing
4.4 hashing
 
4.4 hashing ext
4.4 hashing  ext4.4 hashing  ext
4.4 hashing ext
 
4.4 external hashing
4.4 external hashing4.4 external hashing
4.4 external hashing
 
4.2 bst
4.2 bst4.2 bst
4.2 bst
 
4.2 bst 03
4.2 bst 034.2 bst 03
4.2 bst 03
 
4.2 bst 02
4.2 bst 024.2 bst 02
4.2 bst 02
 
4.1 sequentioal search
4.1 sequentioal search4.1 sequentioal search
4.1 sequentioal search
 

Recently uploaded

Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
joachimlavalley1
 

Recently uploaded (20)

Morse OER Some Benefits and Challenges.pptx
Morse OER Some Benefits and Challenges.pptxMorse OER Some Benefits and Challenges.pptx
Morse OER Some Benefits and Challenges.pptx
 
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
 
Matatag-Curriculum and the 21st Century Skills Presentation.pptx
Matatag-Curriculum and the 21st Century Skills Presentation.pptxMatatag-Curriculum and the 21st Century Skills Presentation.pptx
Matatag-Curriculum and the 21st Century Skills Presentation.pptx
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
Operations Management - Book1.p - Dr. Abdulfatah A. Salem
Operations Management - Book1.p  - Dr. Abdulfatah A. SalemOperations Management - Book1.p  - Dr. Abdulfatah A. Salem
Operations Management - Book1.p - Dr. Abdulfatah A. Salem
 
B.ed spl. HI pdusu exam paper-2023-24.pdf
B.ed spl. HI pdusu exam paper-2023-24.pdfB.ed spl. HI pdusu exam paper-2023-24.pdf
B.ed spl. HI pdusu exam paper-2023-24.pdf
 
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxMARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
 
How to Break the cycle of negative Thoughts
How to Break the cycle of negative ThoughtsHow to Break the cycle of negative Thoughts
How to Break the cycle of negative Thoughts
 
Open Educational Resources Primer PowerPoint
Open Educational Resources Primer PowerPointOpen Educational Resources Primer PowerPoint
Open Educational Resources Primer PowerPoint
 
Basic Civil Engineering Notes of Chapter-6, Topic- Ecosystem, Biodiversity G...
Basic Civil Engineering Notes of Chapter-6,  Topic- Ecosystem, Biodiversity G...Basic Civil Engineering Notes of Chapter-6,  Topic- Ecosystem, Biodiversity G...
Basic Civil Engineering Notes of Chapter-6, Topic- Ecosystem, Biodiversity G...
 
[GDSC YCCE] Build with AI Online Presentation
[GDSC YCCE] Build with AI Online Presentation[GDSC YCCE] Build with AI Online Presentation
[GDSC YCCE] Build with AI Online Presentation
 
The impact of social media on mental health and well-being has been a topic o...
The impact of social media on mental health and well-being has been a topic o...The impact of social media on mental health and well-being has been a topic o...
The impact of social media on mental health and well-being has been a topic o...
 
Sectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdfSectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdf
 
Salient features of Environment protection Act 1986.pptx
Salient features of Environment protection Act 1986.pptxSalient features of Environment protection Act 1986.pptx
Salient features of Environment protection Act 1986.pptx
 
50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...
50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...
50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...
 
The Last Leaf, a short story by O. Henry
The Last Leaf, a short story by O. HenryThe Last Leaf, a short story by O. Henry
The Last Leaf, a short story by O. Henry
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
 
How to the fix Attribute Error in odoo 17
How to the fix Attribute Error in odoo 17How to the fix Attribute Error in odoo 17
How to the fix Attribute Error in odoo 17
 
How to Manage Notification Preferences in the Odoo 17
How to Manage Notification Preferences in the Odoo 17How to Manage Notification Preferences in the Odoo 17
How to Manage Notification Preferences in the Odoo 17
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
 

3.3 hierarchical methods

  • 2. 2 Hierarchical Clustering  Use distance matrix as clustering criteria.  This method does not require the number of clusters k as an input, but needs a termination condition Step 0 Step 1 Step 2 Step 3 Step 4 b d c e a a b d e c d e a b c d e Step 4 Step 3 Step 2 Step 1 Step 0 agglomerative (AGNES) divisive (DIANA)
  • 3. 3 AGNES (Agglomerative Nesting)  Introduced in Kaufmann and Rousseeuw (1990)  Implemented in statistical analysis packages  Use the Single-Link method and the dissimilarity matrix.  Merge nodes that have the least dissimilarity  Go on in a non-descending fashion  Eventually all nodes belong to the same cluster 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10
  • 5. 5 DIANA (Divisive Analysis)  Introduced in Kaufmann and Rousseeuw (1990)  Implemented in statistical analysis packages, e.g., Splus  Inverse order of AGNES  Eventually each node forms a cluster on its own 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10
  • 6. 6 Distance Measures  Minimum distance dmin(Ci, Cj) = min p ∈Ci,p’ ∈Cj |p-p’|  Nearest Neighbor algorithm  Single-linkage algorithm – terminates when distance > threshold  Agglomerative Hierarchical – Minimal Spanning tree  Maximum distance dmax(Ci, Cj) = max p ∈Ci,p’ ∈Cj |p-p’|  Farthest-neighbor clustering algorithm  Complete-linkage - terminates when distance > threshold  Mean distance dmean(Ci, Cj) = |mi-mj|  Average distance davg(Ci, Cj) = (1/ninj) ∑ p ∈Ci ∑,p’ ∈Cj |p-p’|
  • 7. 7 Difficulties faced in Hierarchical Clustering  Selection of merge/split points  Cannot revert operation  Scalability
  • 8. 8 Recent Hierarchical Clustering Methods  Integration of hierarchical and other techniques:  BIRCH: uses tree structures and incrementally adjusts the quality of sub-clusters  CURE: Represents a Cluster by a fixed number of representative objects  ROCK: clustering categorical data by neighbor and link analysis  CHAMELEON : hierarchical clustering using dynamic modeling
  • 9. 9 BIRCH  Balanced Iterative Reducing and Clustering using Hierarchies  Incrementally construct a CF (Clustering Feature) tree, a hierarchical data structure for multiphase clustering  Phase 1: scan DB to build an initial in-memory CF tree (a multi-level compression of the data that tries to preserve the inherent clustering structure of the data)  Phase 2: use an arbitrary clustering algorithm to cluster the leaf nodes of the CF-tree  Scales linearly: finds a good clustering with a single scan and improves the quality with a few additional scans  Weakness: handles only numeric data, and sensitive to the order of the data record.
  • 10. 10 Clustering Feature Vector in BIRCH Clustering Feature: CF = (N, LS, SS) N: Number of data points LS: ∑N i=1=Xi SS: ∑N i=1=Xi 2 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 CF = (5, (16,30),(54,190)) (3,4) (2,6) (4,5) (4,7) (3,8)
  • 11. 11 CF-Tree in BIRCH  Clustering feature:  summary of the statistics for a given subcluster  registers crucial measurements for computing cluster and utilizes storage efficiently A CF tree is a height-balanced tree that stores the clustering features for a hierarchical clustering  A nonleaf node in a tree has descendants or “children”  The nonleaf nodes store sums of the CFs of their children  A CF tree has two parameters  Branching factor: specify the maximum number of children.  threshold: max diameter of sub-clusters stored at the leaf nodes
  • 12. 12 CF Tree CF1 child1 CF3 child3 CF2 child2 CF6 child6 CF1 child1 CF3 child3 CF2 child2 CF5 child5 CF1 CF2 CF6prev next CF1 CF2 CF4prev next B = 7 L = 6 Root Non-leaf node Leaf node Leaf node
  • 13. 13 CURE (Clustering Using Representatives)  Supports Non-Spherical Clusters  Fixed number of representative points are used  Select scattered objects and shrink by moving towards cluster center  Merge clusters with closest pair of representatives  For large Databases  Draw a random Sample S  Partition  Partially cluster each partition  Eliminate outliers and slow growing clusters  Continue Clustering  Label Cluster by Representatives
  • 14. 14 Cure: Shrinking Representative Points  Shrink the multiple representative points towards the gravity center by a fraction of α.  Multiple representatives capture the shape of the cluster x y x y
  • 15. 15 Clustering Categorical Data: ROCK Algorithm  ROCK: RObust Clustering using linKs  Major ideas  Use links to measure similarity/proximity  Not distance-based  Number of Common Neighbours  Algorithm: sampling-based clustering  Draw random sample  Cluster with links  Label data in disk Sim T T T T T T ( , )1 2 1 2 1 2 = ∩ ∪
  • 16. 16 CHAMELEON: Hierarchical Clustering Using Dynamic Modeling  Measures the similarity based on a dynamic model  Two clusters are merged only if the interconnectivity and closeness (proximity) between two clusters are high  A two-phase algorithm 1. Use a graph partitioning algorithm: cluster objects into a large number of relatively small sub-clusters 2. Use an agglomerative hierarchical clustering algorithm: find the genuine clusters by repeatedly combining these sub-clusters
  • 17. 17 Overall Framework of CHAMELEON Construct Sparse Graph Partition the Graph Merge Partition Final Clusters Data Set
  • 18. Chameleon  Computes similarity based on Relative Interconnectivity and Relative Closeness 18