Hierarchical clustering methods group data points into a hierarchy of clusters based on their distance or similarity. There are two main approaches: agglomerative, which starts with each point as a separate cluster and merges them, and divisive, which starts with all points in one cluster and splits them. AGNES and DIANA are representative agglomerative and divisive algorithms, respectively. Hierarchical clustering represents the hierarchy as a dendrogram (a tree structure) and allows exploring the data at different cluster granularities.
Hierarchical Clustering
Uses the distance matrix as the clustering criterion
Does not require the number of clusters k as an input, but needs a termination condition
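For orientation, a minimal sketch of this idea using SciPy's hierarchical-clustering utilities; the random data and the 0.3 cut threshold are illustrative assumptions, not from the slides:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

X = np.random.rand(20, 2)          # 20 illustrative points in 2-D
D = pdist(X)                       # condensed pairwise-distance matrix
Z = linkage(D, method='single')    # agglomerative merges, single-link

# No k needed up front: terminate by cutting the dendrogram where the
# merge distance exceeds a chosen threshold (0.3 here is arbitrary).
labels = fcluster(Z, t=0.3, criterion='distance')
```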
Figure: AGNES (agglomerative) merges {a}, {b}, {c}, {d}, {e} into {a, b} and {d, e}, then {c, d, e}, then {a, b, c, d, e} over Steps 0 to 4; DIANA (divisive) performs the same sequence in reverse, from Step 4 back to Step 0.
AGNES (Agglomerative Nesting)
Introduced in Kaufmann and Rousseeuw (1990)
Implemented in statistical analysis packages
Uses the single-link method and the dissimilarity matrix
Merges the clusters with the least dissimilarity
Continues in a non-descending fashion (merge distances never decrease)
Eventually all nodes belong to the same cluster
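A naive from-scratch sketch of AGNES with single-link merging, for illustration only (O(n³); the full distance matrix D is assumed precomputed):

```python
import numpy as np

def agnes_single_link(D, n_clusters=1):
    """Naive AGNES: repeatedly merge the two clusters whose closest
    members are least dissimilar (single link). D: full distance matrix."""
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > n_clusters:
        best = (0, 1, np.inf)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single link: distance between the closest pair of members
                d = min(D[i][j] for i in clusters[a] for j in clusters[b])
                if d < best[2]:
                    best = (a, b, d)
        a, b, _ = best
        clusters[a].extend(clusters.pop(b))   # merge cluster b into a
    return clusters
```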
Figure: three scatter plots (both axes 0 to 10) showing AGNES progressively merging nearby points into larger clusters.
DIANA (Divisive Analysis)
Introduced in Kaufmann and Rousseeuw (1990)
Implemented in statistical analysis packages, e.g., Splus
Works in the inverse order of AGNES: starts with one all-inclusive cluster and splits
Eventually each object forms a cluster on its own
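A sketch of one DIANA split step under the usual "splinter group" heuristic (seed the new cluster with the object of maximal average dissimilarity, then move over objects that are closer to the splinter group); D is an assumed precomputed dissimilarity matrix:

```python
import numpy as np

def diana_split(D, cluster):
    """One DIANA split: peel a 'splinter group' off `cluster`
    (a list of point indices) using the dissimilarity matrix D."""
    rest = list(cluster)
    # Seed with the object of maximal average dissimilarity to the rest.
    avg = [np.mean([D[i][j] for j in rest if j != i]) for i in rest]
    splinter = [rest.pop(int(np.argmax(avg)))]
    moved = True
    while moved and len(rest) > 1:
        moved = False
        for i in list(rest):
            if len(rest) == 1:
                break
            d_rest = np.mean([D[i][j] for j in rest if j != i])
            d_spl = np.mean([D[i][j] for j in splinter])
            if d_spl < d_rest:          # i is closer to the splinter group
                rest.remove(i)
                splinter.append(i)
                moved = True
    return splinter, rest
```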
Figure: three scatter plots (both axes 0 to 10) showing DIANA progressively splitting one cluster into smaller ones.
Distance Measures
Minimum distance: $d_{\min}(C_i, C_j) = \min_{p \in C_i,\, p' \in C_j} |p - p'|$
Nearest-neighbor algorithm
Single-linkage algorithm: terminates when distance > threshold
Agglomerative hierarchical clustering with $d_{\min}$ corresponds to building a minimal spanning tree
Maximum distance: $d_{\max}(C_i, C_j) = \max_{p \in C_i,\, p' \in C_j} |p - p'|$
Farthest-neighbor clustering algorithm
Complete-linkage: terminates when distance > threshold
Mean distance: $d_{\mathrm{mean}}(C_i, C_j) = |m_i - m_j|$, where $m_i$ is the mean of cluster $C_i$
Average distance: $d_{\mathrm{avg}}(C_i, C_j) = \frac{1}{n_i n_j} \sum_{p \in C_i} \sum_{p' \in C_j} |p - p'|$
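The four measures translate directly into code; a minimal NumPy sketch, assuming Ci and Cj are arrays of points:

```python
import numpy as np

def dmin(Ci, Cj):    # minimum (single-link) distance
    return min(np.linalg.norm(p - q) for p in Ci for q in Cj)

def dmax(Ci, Cj):    # maximum (complete-link) distance
    return max(np.linalg.norm(p - q) for p in Ci for q in Cj)

def dmean(Ci, Cj):   # distance between the cluster means m_i and m_j
    return np.linalg.norm(Ci.mean(axis=0) - Cj.mean(axis=0))

def davg(Ci, Cj):    # average over all pairwise distances
    total = sum(np.linalg.norm(p - q) for p in Ci for q in Cj)
    return total / (len(Ci) * len(Cj))
```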
Difficulties Faced in Hierarchical Clustering
Selection of merge/split points is critical
Merge/split operations cannot be undone (no backtracking to correct earlier decisions)
Poor scalability: at least O(n²) time for n objects
Recent Hierarchical Clustering Methods
Integration of hierarchical clustering with other techniques:
BIRCH: uses tree structures and incrementally adjusts the quality of sub-clusters
CURE: represents a cluster by a fixed number of representative objects
ROCK: clusters categorical data by neighbor and link analysis
CHAMELEON: hierarchical clustering using dynamic modeling
BIRCH
Balanced Iterative Reducing and Clustering using Hierarchies
Incrementally constructs a CF (Clustering Feature) tree, a hierarchical data structure for multiphase clustering
Phase 1: scan the database to build an initial in-memory CF tree (a multi-level compression of the data that tries to preserve its inherent clustering structure)
Phase 2: use an arbitrary clustering algorithm to cluster the leaf nodes of the CF tree
Scales linearly: finds a good clustering with a single scan and improves the quality with a few additional scans
Weakness: handles only numeric data, and is sensitive to the order of the data records
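A usage sketch with scikit-learn's Birch implementation (parameter values here are illustrative; note that scikit-learn's threshold bounds the sub-cluster radius rather than the diameter):

```python
import numpy as np
from sklearn.cluster import Birch

X = np.random.rand(1000, 2)          # illustrative data
# branching_factor: max children per CF-tree node
# threshold: bound on sub-cluster size at the leaves
model = Birch(threshold=0.1, branching_factor=50, n_clusters=3)
labels = model.fit_predict(X)        # a single pass builds the CF tree
```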
CF-Tree in BIRCH
Clustering Feature (CF): a summary of the statistics for a given sub-cluster
Registers the crucial measurements for computing clusters and uses storage efficiently
A CF tree is a height-balanced tree that stores the clustering features for a hierarchical clustering
A nonleaf node in the tree has descendants or "children"; it stores the sums of the CFs of its children
A CF tree has two parameters:
Branching factor: specifies the maximum number of children per node
Threshold: the maximum diameter of sub-clusters stored at the leaf nodes
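A minimal sketch of the CF triple, commonly written as CF = (N, LS, SS) (the count, linear sum, and sum of squares of the points), together with the additivity property that lets nonleaf nodes store the sums of their children's CFs; the helper names are my own:

```python
import numpy as np

def cf(points):
    """CF = (N, LS, SS) for a set of points."""
    X = np.asarray(points, dtype=float)
    return len(X), X.sum(axis=0), (X ** 2).sum()

def merge(cf1, cf2):
    # CF additivity: a parent's CF is the component-wise sum of its children's.
    return cf1[0] + cf2[0], cf1[1] + cf2[1], cf1[2] + cf2[2]

def centroid(cf_):
    n, ls, _ = cf_
    return ls / n

def radius(cf_):
    # R = sqrt(SS/N - ||LS/N||^2): RMS distance of points from the centroid,
    # computable from the summary alone, without the original points.
    n, ls, ss = cf_
    c = ls / n
    return np.sqrt(max(ss / n - np.dot(c, c), 0.0))
```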
CURE (Clustering Using Representatives)
Supports non-spherical clusters
A fixed number of representative points is used
Select scattered objects and shrink them by moving toward the cluster center
Merge the clusters with the closest pair of representatives
For large databases:
Draw a random sample S
Partition the sample
Partially cluster each partition
Eliminate outliers and slow-growing clusters
Continue clustering
Label clusters by their representatives
CURE: Shrinking Representative Points
Shrink the multiple representative points toward the gravity center by a fraction α
Multiple representatives capture the shape of the cluster
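A minimal sketch of choosing and shrinking representative points; the farthest-point selection heuristic and the values of c and α are illustrative assumptions:

```python
import numpy as np

def cure_representatives(points, c=4, alpha=0.3):
    """Pick c well-scattered points, then shrink them toward the
    cluster's gravity center by a fraction alpha."""
    X = np.asarray(points, dtype=float)
    center = X.mean(axis=0)
    # Farthest-point heuristic: start with the point farthest from the center,
    # then repeatedly add the point farthest from the chosen representatives.
    reps = [X[np.argmax(np.linalg.norm(X - center, axis=1))]]
    while len(reps) < min(c, len(X)):
        d = np.min([np.linalg.norm(X - r, axis=1) for r in reps], axis=0)
        reps.append(X[np.argmax(d)])
    reps = np.array(reps)
    return reps + alpha * (center - reps)   # shrink toward the center
```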
Clustering Categorical Data: The ROCK Algorithm
ROCK: RObust Clustering using linKs
Major ideas
Use links to measure similarity/proximity
Not distance-based
The link between two points is their number of common neighbors
Algorithm: sampling-based clustering
Draw a random sample
Cluster with links
Label the data on disk
Similarity between two transactions (sets) $T_1$ and $T_2$: $\mathrm{Sim}(T_1, T_2) = \dfrac{|T_1 \cap T_2|}{|T_1 \cup T_2|}$ (the Jaccard coefficient)
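A small sketch of the link computation, assuming transactions are Python sets and using the Jaccard similarity above with an illustrative neighbor threshold θ:

```python
import numpy as np

def jaccard(t1, t2):
    return len(t1 & t2) / len(t1 | t2)

def links(transactions, theta=0.5):
    """link(p, q) = number of common neighbors, where p and q are
    neighbors iff sim(p, q) >= theta."""
    n = len(transactions)
    A = np.zeros((n, n), dtype=int)      # neighbor (adjacency) matrix
    for i in range(n):
        for j in range(n):
            if i != j and jaccard(transactions[i], transactions[j]) >= theta:
                A[i, j] = 1
    return A @ A                         # (A^2)[p, q] counts common neighbors
```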
CHAMELEON: Hierarchical Clustering Using Dynamic Modeling
Measures the similarity between clusters based on a dynamic model
Two clusters are merged only if the interconnectivity and closeness (proximity) between them are high
A two-phase algorithm:
1. Use a graph-partitioning algorithm to cluster objects into a large number of relatively small sub-clusters
2. Use an agglomerative hierarchical clustering algorithm to find the genuine clusters by repeatedly combining these sub-clusters
Overall Framework of CHAMELEON
Data Set → Construct Sparse Graph → Partition the Graph → Merge Partitions → Final Clusters
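As a small illustration of the first step, a sparse k-nearest-neighbor graph can be built with scikit-learn (k = 10 is an illustrative choice; CHAMELEON itself then hands this graph to a graph-partitioning package such as hMETIS):

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

X = np.random.rand(500, 2)   # illustrative data set
# Sparse k-NN graph: each point is connected only to its 10 nearest
# neighbors, with edges weighted by distance.
G = kneighbors_graph(X, n_neighbors=10, mode='distance')
```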