2. Content
•Clustering
•Types of Clustering
•Hierarchical Clustering
•Key Concepts in Hierarchical Clustering
•Types of Hierarchical Clustering
•Difference between Partitional and Hierarchical Clustering
•The Applications of Hierarchical Clustering
•Hierarchical Clustering pros and cons
•Summary
3. Clustering
Clustering is the classification of objects into different groups,
or more precisely, the partitioning of a data set into subsets
(clusters), so that the data in each subset (ideally) share some
common trait, often according to some defined distance measure.
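In practice, the distance measure is evaluated pairwise into a distance matrix before clustering starts. A minimal sketch of that step with SciPy, using made-up 2-D points:

```python
# A minimal sketch of building a pairwise distance matrix, the usual input
# to clustering. The six 2-D points below are invented for illustration.
import numpy as np
from scipy.spatial.distance import pdist, squareform

points = np.array([[0.0, 0.0], [0.5, 0.2], [3.0, 3.0],
                   [5.0, 5.0], [5.1, 5.3], [4.5, 5.0]])

# pdist returns the condensed (upper-triangular) distances;
# squareform expands them into the full symmetric matrix.
dist_matrix = squareform(pdist(points, metric="euclidean"))
print(np.round(dist_matrix, 2))
```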
4. Clustering (cont.)
• Also called “unsupervised learning”; a core task in data mining
•Organizing data into classes such that there is
high intra-class similarity
low inter-class similarity
•More informally, finding natural groupings among objects.
7. Hierarchical Clustering
•Produces a set of nested clusters organized as a hierarchical
tree.
•Can be visualized as a dendrogram.
»A tree-like diagram that records the sequence of merges and splits
8. Key Concepts in Hierarchical Clustering
Dendrogram tree representation
1. In the beginning we have 6
clusters: A, B, C, D, E and F
2. We merge clusters D and F into
cluster (D, F) at distance 0.50
3. We merge cluster A and cluster B
into (A, B) at distance 0.71
4. We merge clusters E and (D, F) into ((D, F), E) at distance 1.00
5. We merge clusters ((D, F), E) and C
into (((D, F), E), C) at distance 1.41
6. We merge clusters (((D, F), E), C)
and (A, B) into ((((D, F), E), C), (A, B)) at distance 2.50
7. The last cluster contains all the objects,
which concludes the computation (this sequence is replayed in the sketch below)
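A merge sequence like this can be reproduced with SciPy's single-link routine. In the sketch below the coordinates for A–F are invented stand-ins, so the printed distances will differ from the slide's values.

```python
# Hedged sketch: replaying a dendrogram's merge sequence with SciPy.
# The coordinates for A..F are made up; only the mechanics match the slide.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

labels = ["A", "B", "C", "D", "E", "F"]
points = np.array([[0.0, 0.0], [0.5, 0.5], [2.0, 2.0],
                   [4.0, 4.0], [4.7, 4.7], [4.0, 4.5]])

# 'single' = single-link: cluster distance is the minimum pairwise distance.
Z = linkage(points, method="single")

# Each row of Z records one merge: (cluster i, cluster j, distance, new size).
for i, j, dist, size in Z:
    print(f"merge {int(i)} + {int(j)} at distance {dist:.2f} (size {int(size)})")

dendrogram(Z, labels=labels)  # the tree-like diagram described above
plt.show()
```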
9. Types of Hierarchical Clustering
Two main types of hierarchical clustering
• Agglomerative:
» Start with the points as individual clusters
» At each step, merge the closest pair of clusters until only one cluster (or
K clusters) remains
Bottom-up
• Divisive:
» Start with one, all-inclusive cluster
» At each step, split a cluster until each cluster contains a single point (or
there are K clusters)
Top-down
10. AGNES (Agglomerative Nesting)
• Introduced in Kaufmann and Rousseeuw (1990)
• Implemented in statistical analysis packages
• Uses the single-link method and the dissimilarity matrix
• Merges the nodes that have the least dissimilarity, proceeding in a
non-descending fashion
• Eventually all nodes belong to the same cluster (a minimal sketch follows)
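As a concrete (if simplified) illustration, here is a minimal single-link AGNES loop written directly against a dissimilarity matrix; the function and toy matrix are my own sketch, not Kaufmann and Rousseeuw's implementation.

```python
# Minimal sketch of AGNES with single-link, driven by a dissimilarity matrix.
import numpy as np

def agnes_single_link(dist):
    """Repeatedly merge the two least-dissimilar clusters until one remains."""
    clusters = [[i] for i in range(dist.shape[0])]  # start: every point alone
    merges = []
    while len(clusters) > 1:
        best = (None, None, np.inf)
        # find the pair of clusters with the smallest single-link distance
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i, j] for i in clusters[a] for j in clusters[b])
                if d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        merges.append((clusters[a], clusters[b], d))  # record the merge
        clusters[a] = clusters[a] + clusters[b]       # merge b into a
        del clusters[b]
    return merges   # single-link merge distances come out non-descending

# toy symmetric dissimilarity matrix for four objects (made up)
D = np.array([[0.0, 0.3, 2.0, 2.5],
              [0.3, 0.0, 2.2, 2.4],
              [2.0, 2.2, 0.0, 0.4],
              [2.5, 2.4, 0.4, 0.0]])
for left, right, d in agnes_single_link(D):
    print(f"merge {left} + {right} at dissimilarity {d}")
```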
12. DIANA (Divisive Analysis)
• Introduced in Kaufmann and Rousseeuw (1990)
• Implemented in statistical analysis packages, e.g., Splus
• Works in the inverse order of AGNES
• Eventually each node forms a cluster on its own (one split step is sketched below)
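Below is a hedged sketch of one DIANA-style split: the object with the largest average dissimilarity to the rest seeds a "splinter group", and objects closer on average to the splinter than to the remainder defect to it. This simplifies the full algorithm (which repeatedly splits the cluster with the largest diameter); the data is illustrative.

```python
# Hedged sketch of a single DIANA-style split of one cluster into two.
import numpy as np

def diana_split(dist, members):
    """Split one cluster (a list of object indices) into two, DIANA-style."""
    members = list(members)
    # seed: the object with the largest average dissimilarity to the others
    avg = [np.mean([dist[i, j] for j in members if j != i]) for i in members]
    splinter = [members.pop(int(np.argmax(avg)))]
    moved = True
    while moved and len(members) > 1:
        moved = False
        for i in list(members):
            to_rest = np.mean([dist[i, j] for j in members if j != i])
            to_splinter = np.mean([dist[i, j] for j in splinter])
            if to_splinter < to_rest:   # i is closer to the splinter group
                members.remove(i)
                splinter.append(i)
                moved = True
    return splinter, members

# same toy dissimilarity matrix as in the AGNES sketch
D = np.array([[0.0, 0.3, 2.0, 2.5],
              [0.3, 0.0, 2.2, 2.4],
              [2.0, 2.2, 0.0, 0.4],
              [2.5, 2.4, 0.4, 0.0]])
print(diana_split(D, range(4)))  # splits objects {2, 3} away from {0, 1}
```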
13. Difference between Partitional and
Hierarchical clustering
Partitional clustering
» Partitional clustering is faster
than hierarchical clustering.
» Partitional clustering requires
stronger assumptions, such as the
number of clusters and the
initial centers.
» Partitional clustering algorithms
require the number of clusters
before they can start running.
Hierarchical clustering
» Hierarchical clustering is slower
than partitional clustering.
» Hierarchical clustering requires
only a similarity measure.
» Hierarchical clustering does not
require any input parameters.
(This contrast is illustrated in the sketch below.)
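Assuming scikit-learn is available, the contrast can be made concrete: KMeans must be told the number of clusters up front, while AgglomerativeClustering can run from a distance threshold alone. The toy data is made up.

```python
# Hedged illustration: a partitional method needs k; a hierarchical one
# can instead cut its tree at a chosen linkage distance.
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering

X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0],
              [5.1, 4.9], [9.0, 0.0], [9.2, 0.3]])

# Partitional: the number of clusters must be chosen before running.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("k-means labels:      ", km.labels_)

# Hierarchical: no k; stop merging once the linkage distance exceeds 1.0.
agg = AgglomerativeClustering(n_clusters=None, distance_threshold=1.0).fit(X)
print("agglomerative labels:", agg.labels_)
```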
14. The Applications of Hierarchical Clustering
Applications
» Wireless Sensor Network
» Audio Event Detection
» Web cluster engines
» Bioinformatics
» And many more.
15. Hierarchical Clustering pros and cons
Pros..
» Doesn't require the number of
clusters to be specified.
» Easy to implement.
» Produces a dendrogram,
which helps with
understanding the data.
Cons..
» Can never undo any previous
steps throughout the
algorithm.
» Generally has long runtimes.
» Sometimes it is difficult to identify
the number of clusters from the
dendrogram (see the cut sketch below).
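As a sketch of that last point: SciPy's fcluster turns a chosen cut height on the dendrogram into flat labels, and different heights yield different numbers of clusters, which is exactly the ambiguity noted above. The points are illustrative.

```python
# Hedged sketch: cutting a dendrogram at two heights gives two different
# numbers of clusters; choosing the "right" cut is the hard part.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0.0, 0.0], [0.2, 0.1], [2.0, 2.0], [2.1, 1.9], [9.0, 0.0]])
Z = linkage(X, method="single")

for height in (1.0, 6.0):
    print(f"cut at {height}:", fcluster(Z, t=height, criterion="distance"))
# the low cut yields 3 clusters here; the high cut yields 2
```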
16. Summary
•A hierarchical algorithm is a sequential clustering algorithm
»Uses a distance matrix to construct a tree of clusters (dendrogram)
»Gives a hierarchical representation without needing the number of clusters
in advance (a termination condition can be set when the number is known)
•Major weaknesses of agglomerative clustering methods
»Can never undo what was done previously
»Sensitive to the choice of cluster distance measure and to noise/outliers