2. About Me
I'm Menna, an aspiring AI engineer, currently an AI intern at NTI. I was previously an AI intern at Samsung.
Fresh graduate of the Faculty of Computer Science, Cairo University.
3. Today's Agenda
What we'll learn:
1. Definition of Hierarchical Clustering
2. Why we need Hierarchical Clustering
3. Types of Hierarchical Clustering
4. How the Algorithm Works
5. Example
6. Pros and Cons
5. What is Hierarchical Clustering?
Hierarchical clustering is one of the popular clustering techniques, used to group unlabelled datasets into clusters. It builds the clusters based on the similarity between the different objects in the set.
6. Why do we need Hierarchical Clustering?
- No prespecified number of clusters
- Visual representation
- Interpretability and insight generation
8. Agglomerative Hierarchical Clustering
- Start with each point as an individual cluster.
- At each step, merge the closest pair of clusters until only one cluster remains.
Divisive Hierarchical Clustering
- Start with one, all-inclusive cluster.
- At each step, split a cluster until each cluster contains a single point.
10. Step 1:
Let's take a sample of data and learn how agglomerative hierarchical clustering works step by step.
First, treat each data point as its own cluster, which forms N clusters.
11. Step 2:
Take the two closest data points and merge them into one cluster, so it now forms N-1 clusters.
12. Step 3:
Again, take the two closest clusters and merge them into one cluster, so it now forms N-2 clusters. Repeat until only a single cluster remains.
18. Single Linkage:
The distance between two clusters is defined as the minimum distance between an object (point) in one cluster and an object (point) in the other cluster.
27. Complete Linkage:
The distance between two clusters is defined as the maximum distance between an object (point) in one cluster and an object (point) in the other cluster.
37. Centroid Linkage:
At each stage, we merge the two clusters that have the smallest centroid distance; in simple words, this is the distance between the centroids of the two clusters.
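The three linkage criteria differ only in how they reduce the set of pairwise point distances to a single cluster-to-cluster distance. A small illustration (the two example clusters are made up for demonstration):

```python
import math

a = [(0.0, 0.0), (0.0, 1.0)]   # cluster A
b = [(3.0, 0.0), (5.0, 0.0)]   # cluster B

pairwise = [math.dist(p, q) for p in a for q in b]

single = min(pairwise)    # single linkage: closest pair of points
complete = max(pairwise)  # complete linkage: farthest pair of points

# centroid linkage: distance between the clusters' centroids
centroid_a = tuple(sum(c) / len(a) for c in zip(*a))
centroid_b = tuple(sum(c) / len(b) for c in zip(*b))
centroid = math.dist(centroid_a, centroid_b)

print(single, complete, centroid)
```

Note that single <= centroid <= complete here, which is typical: the criteria order the same pair of clusters from most optimistic to most conservative.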
40. Dendrogram
A dendrogram is used to display the distance between each pair of sequentially merged clusters.
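In practice the merge history and its dendrogram are usually produced with SciPy rather than by hand. A sketch, assuming SciPy is installed (the sample points are illustrative):

```python
from scipy.cluster.hierarchy import linkage, dendrogram

points = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]

# Each row of Z records one merge: the two cluster ids, the merge
# distance, and the size of the newly formed cluster.
Z = linkage(points, method='single')
print(Z.shape)  # (N-1, 4): one row per merge

# dendrogram(Z) draws the tree of merges; with no_plot=True it only
# computes the layout (useful in scripts without a display).
tree = dendrogram(Z, no_plot=True)
```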
41. Hierarchical Clustering vs. k-means Clustering
Hierarchical Clustering:
- Begins with N clusters and combines similar clusters until only one cluster is left.
- Does not work well on large amounts of data.
- Does not scale as well as k-means.
k-means Clustering:
- Works well when the structure of the clusters is hyper-spherical.
- Less computationally intensive and suited to large datasets.
- One can use the median or mean as a cluster centre to represent each cluster.
42. Hyperparameters of Hierarchical Clustering
- Linkage criterion
- Distance metric
- Number of clusters
43. Why You Should Use Hierarchical Clustering
1. When the data does not have a clear number of clusters.
2. When the data has a hierarchical structure (e.g. gene expression data).
3. When the goal is to explore the data or identify patterns.
45. Strengths
- Easy to decide the number of clusters by looking at the dendrogram.
- Easy to understand and implement.
- We don't have to pre-specify any particular number of clusters.
Limitations
- Does not work well on large amounts of data.
- Once a decision is made to combine two clusters, it cannot be undone.
- The time complexity of the clustering can be large compared with k-means.