2. Content
•Clustering
•Types of Clustering
•Hierarchical Clustering
•Key Concepts in Hierarchical Clustering
•Types of Hierarchical Clustering
•Difference between Partitional and Hierarchical clustering
•The Application of Hierarchical Clustering
•Hierarchical Clustering pros and cons
•Summary
4/10/2018 HIERARCHICAL CLUSTERING 2
3. Clustering
•Clustering is the classification of objects into different groups,
or more precisely, the partitioning of a data set into subsets
(clusters), so that the data in each subset (ideally) share some
common trait - often according to some defined distance
measure.
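As a rough illustration of what a "defined distance measure" can look like, the sketch below implements two common choices for numeric points. The function names are illustrative and not taken from the slides.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

print(euclidean((0, 0), (3, 4)))  # 5.0
print(manhattan((0, 0), (3, 4)))  # 7
```

The same pair of points can be "close" under one measure and "far" under another, which is why the choice of measure shapes the resulting clusters.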
4. Clustering (cont..)
•A form of “unsupervised learning”, widely used in “data mining”
•Organizing data into classes such that there is
• high intra-class similarity
• low inter-class similarity
•More informally, finding natural groupings among objects.
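One way to make the two similarity goals concrete is to compare average distances inside a cluster with average distances between clusters. This is only a sketch: the two toy clusters and the helper names are made up for illustration, and distance is used as the inverse of similarity.

```python
import math
from itertools import combinations, product

def mean_intra(cluster):
    """Average distance between all pairs of points inside one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

def mean_inter(c1, c2):
    """Average distance between points taken from two different clusters."""
    pairs = list(product(c1, c2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

# Hypothetical toy data: two tight, well-separated groups.
left = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
right = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

print(mean_intra(left), mean_intra(right))  # small: high intra-class similarity
print(mean_inter(left, right))              # large: low inter-class similarity
```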
5. Clustering (cont..)
What is a natural grouping among these objects?
Clustering is subjective
[Figure labels: Rahim’s family, school employees, females, males]
7. Hierarchical Clustering
•Produces a set of nested clusters organized as a hierarchical
tree.
•Can be visualized as a dendrogram.
»A tree-like diagram that records the sequence of merges or splits
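A set of nested clusters can be represented quite directly as nested tuples. The sketch below is a minimal illustration with made-up objects a to e; it prints an indented outline rather than a graphical dendrogram, and it does not record merge distances.

```python
def leaves(node):
    """Return the objects contained in a (possibly nested) cluster."""
    if isinstance(node, tuple):
        return [leaf for child in node for leaf in leaves(child)]
    return [node]

def render(node, depth=0):
    """Return an indented text outline of the hierarchy, one line per node."""
    label = "".join(leaves(node)) if isinstance(node, tuple) else node
    lines = ["  " * depth + label]
    if isinstance(node, tuple):
        for child in node:
            lines.extend(render(child, depth + 1))
    return lines

# Hypothetical hierarchy: (a, b) and (c, (d, e)) joined at the root.
tree = (("a", "b"), ("c", ("d", "e")))
print("\n".join(render(tree)))
```

Every tuple is a cluster, and every cluster is fully contained in its parent, which is exactly the nesting a dendrogram draws.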
8. Key Concepts in Hierarchical Clustering
Dendrogram tree representation
[Figure: dendrogram; axes labeled "object" and "lifetime"]
1. In the beginning we have 6
clusters: A, B, C, D, E and F
2. We merge clusters D and F into
cluster (D, F) at distance 0.50
3. We merge cluster A and cluster B
into (A, B) at distance 0.71
4. We merge clusters E and (D, F)
into ((D, F), E) at distance 1.00
5. We merge clusters ((D, F), E) and C
into (((D, F), E), C) at distance 1.41
6. We merge clusters (((D, F), E), C)
and (A, B) into ((((D, F), E), C), (A, B))
at distance 2.50
7. The last cluster contains all the objects,
which concludes the computation
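The merge sequence above can be reproduced with a small single-linkage sketch. The slides do not give the coordinates of A to F or name the linkage rule, so both are assumptions here: the points below are hypothetical values chosen so that closest-member (single-linkage) merging yields the listed distances.

```python
import math

# Hypothetical coordinates (not given in the slides), chosen so that
# single-linkage merging yields the distances listed above.
points = {
    "A": (1.0, 1.0), "B": (1.5, 1.5), "C": (5.0, 5.0),
    "D": (3.0, 4.0), "E": (4.0, 4.0), "F": (3.0, 3.5),
}

def single_link(c1, c2):
    """Cluster distance = distance between the two closest members."""
    return min(math.dist(points[a], points[b]) for a in c1 for b in c2)

def merge_history(names):
    """Merge the closest pair of clusters until one remains; log each merge."""
    clusters = [(n,) for n in names]
    history = []
    while len(clusters) > 1:
        # Find the closest pair of clusters among all pairs.
        d, i, j = min(
            (single_link(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merged = clusters[i] + clusters[j]
        history.append((sorted(merged), round(d, 2)))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return history

for members, distance in merge_history(sorted(points)):
    print(members, distance)
```

With these assumed coordinates the printed distances are 0.5, 0.71, 1.0, 1.41 and 2.5, matching steps 2 to 6.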
9. Types of Hierarchical Clustering
Two main types of hierarchical clustering
• Agglomerative:
» Start with the points as individual clusters
» At each step, merge the closest pair of clusters until only one cluster (or
K clusters) is left
• Divisive:
» Start with one, all-inclusive cluster
» At each step, split a cluster until each cluster contains a single point (or
there are K clusters)
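The two directions can be sketched side by side on one-dimensional data. This is an illustration under simplifying assumptions, not a general implementation: cluster distance is the gap between closest members, and the divisive rule (split at the widest internal gap) is just one simple choice among many.

```python
def agglomerate(values, k):
    """Bottom-up: merge the two closest clusters until only k remain."""
    clusters = [[v] for v in sorted(values)]
    while len(clusters) > k:
        # In sorted 1-D data the closest pair of clusters is always adjacent.
        gaps = [clusters[i + 1][0] - clusters[i][-1] for i in range(len(clusters) - 1)]
        i = gaps.index(min(gaps))
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters

def divide(values, k):
    """Top-down: split at the widest internal gap until there are k clusters."""
    clusters = [sorted(values)]
    while len(clusters) < k:
        best = None  # (gap, cluster index, split position)
        for ci, c in enumerate(clusters):
            for pos in range(1, len(c)):
                gap = c[pos] - c[pos - 1]
                if best is None or gap > best[0]:
                    best = (gap, ci, pos)
        if best is None:
            break  # every cluster already holds a single point
        _, ci, pos = best
        c = clusters[ci]
        clusters[ci:ci + 1] = [c[:pos], c[pos:]]
    return clusters

data = [1, 2, 9, 10, 25]
print(agglomerate(data, k=3))  # [[1, 2], [9, 10], [25]]
print(divide(data, k=3))       # [[1, 2], [9, 10], [25]]
```

On this toy data both directions arrive at the same three clusters; in general they need not agree.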
10. Difference between Partitional and
Hierarchical clustering
Partitional clustering
» Partitional clustering is faster
than hierarchical clustering.
» Partitional clustering requires
stronger assumptions such as
number of clusters and the
initial centers.
» Partitional clustering
algorithms need the number
of clusters before they can start running.
Hierarchical clustering
» Hierarchical clustering is slower
than Partitional clustering.
» Hierarchical clustering requires
only a similarity measure.
» Hierarchical clustering does not
require the number of clusters as input
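To illustrate why the number of clusters and the initial centers count as strong assumptions, here is a minimal 1-D sketch of k-means, used here as one example of a partitional algorithm (the slides do not name a specific one). The data and starting centers are made up.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain Lloyd-style k-means on 1-D data with caller-supplied centers."""
    centers = list(centers)
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            # Assign each value to its nearest current center.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups

data = [1, 2, 9, 10, 25]
print(kmeans_1d(data, centers=[0, 5]))   # [[1, 2], [9, 10, 25]]
print(kmeans_1d(data, centers=[0, 30]))  # [[1, 2, 9, 10], [25]]
```

The same data and the same number of clusters give two different partitions, depending only on where the centers start. A hierarchical method has no such starting choice to make.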
11. The applications of Hierarchical Clustering
Applications
» Wireless sensor networks
» Audio event detection
» Web clustering engines
» Bioinformatics
» And many more.
12. Hierarchical Clustering pros and cons
Pros..
» Does not require the number of
clusters to be specified.
» Easy to implement.
» Produces a dendrogram,
which helps with
understanding the data.
Cons..
» Can never undo any previous
steps throughout the
algorithm.
» Generally has long runtimes, because
all pairwise distances are needed
(at least quadratic in the number of objects).
» Sometimes difficult to identify
the number of clusters by the
dendrogram.
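One common heuristic for reading a cluster count off a dendrogram is to cut where the jump between consecutive merge distances is largest. The sketch below applies it to the merge distances from the dendrogram example earlier in the deck. It is a rule of thumb, not part of the algorithm itself, and the function name is illustrative.

```python
def clusters_by_largest_gap(merge_distances):
    """Suggest a cluster count by cutting at the widest gap between merges.

    merge_distances must be in merge order (non-decreasing) for n objects,
    so it has n - 1 entries.
    """
    n = len(merge_distances) + 1
    gaps = [b - a for a, b in zip(merge_distances, merge_distances[1:])]
    widest = gaps.index(max(gaps))  # gap sits between merge `widest` and the next
    merges_kept = widest + 1        # merges performed below the cut
    return n - merges_kept

print(clusters_by_largest_gap([0.50, 0.71, 1.00, 1.41, 2.50]))  # 2
```

Here the widest jump is from 1.41 to 2.50, so the cut leaves two clusters: (((D, F), E), C) and (A, B). When the gaps are all of similar size, the heuristic gives no clear answer, which is the difficulty noted above.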
13. Summary
•Hierarchical clustering is a sequential clustering algorithm
»Uses a distance matrix to construct a tree of clusters (dendrogram)
»Gives a hierarchical representation without needing to know the number of
clusters in advance (a termination condition can be set when the number of
clusters is known)
•Major weaknesses of agglomerative clustering methods
»Can never undo what was done previously
»Sensitive to cluster distance measures and noise/outliers
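The sensitivity to the cluster distance measure can be seen by computing three widely used linkage criteria on the same pair of clusters. This is a sketch with made-up points; the second cluster deliberately contains an outlier at (10, 0).

```python
import math
from itertools import product

def pair_distances(c1, c2):
    """All point-to-point distances between two clusters."""
    return [math.dist(p, q) for p, q in product(c1, c2)]

def single(c1, c2):
    """Single linkage: distance between the closest members."""
    return min(pair_distances(c1, c2))

def complete(c1, c2):
    """Complete linkage: distance between the farthest members."""
    return max(pair_distances(c1, c2))

def average(c1, c2):
    """Average linkage: mean of all cross-cluster distances."""
    d = pair_distances(c1, c2)
    return sum(d) / len(d)

# Hypothetical clusters; (10.0, 0.0) acts as an outlier in the second one.
c1 = [(0.0, 0.0), (0.0, 1.0)]
c2 = [(3.0, 0.0), (10.0, 0.0)]

print(single(c1, c2))    # 3.0
print(average(c1, c2))
print(complete(c1, c2))
```

Single linkage ignores the outlier entirely, while complete and average linkage are pulled upward by it, so the same data can merge in a different order depending on the criterion chosen.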