SlideShare a Scribd company logo
Dendogram
Hierarchical Clustering : Its slow :: complicated :: repeatable :: not suited for big
data sets.
Lets take 6 simple Vectors.
6 Vectors
Using Euclidean Distance lets compute the Distance Matrix.
Euclidean Distance = sqrt( (x2 -x1)**2 + (y2-y1)**2 )
Using Euclidean Distance lets compute the Distance Matrix.
Euclidean Distance = sqrt( (x2 -x1)**2 + (y2-y1)**2 )
Distance Matrix
Complete Link Clustering: Considers Max of all distances. Leads to many small
clusters.
Distance Matrix: Diagonals will be 0 and values will be symmetric.
Stage 0
Step a: The shortest distance in the matrix is 1 and the vectors associated with that
are C & D
So the first cluster is C — D
Distance between other vectors and CD
A to CD = max(A->C, A->D) = max(25,24) = 25
B to CD = max(B-<C, B->D) = max(21,20) = 21
and similarly find for E -> CD & F -> CD
Stage 1
Step b : Now 2 is the shortest distance and the vectors associated with that are E & F
Second cluster is E — F
A to EF = max(A->E, A->F) = max(9,7) = 9
CD to EF = max(CD->E, CD->F) = max(15,17) = 17
Step c : Now 4 is the shortest distance and vectors associated are A & B. Third cluster
is A — B
CD to AB = max(CD -> A, CD ->B) = max(25,21) = 25
EF to AB = max(EF -> A, EF ->B) = max(9,5) = 9
Step d : Now 9 is the shortest distance and vectors associated are AB and EF. Fourth
cluster is AB — EF
CD to ABEF = max(CD->AB, CD->EF) = max(25,18) = 25
Step e : Last cluster is CD — ABEF
Let’s take a sample of 5 students:
Creating a Proximity Matrix
First, we will create a proximity matrix which will tell us the distance between each of
these points. S
ince we are calculating the distance of each point from each of the
other points, we will get a square matrix of shape n X n (where n is the number of
observations).
Let’s make the 5 x 5 proximity matrix for our example:
Step 1: First, we assign all the points to an individual cluster:
Different colors here represent different clusters. You can see that we have 5
different clusters for the 5 points in our data.
Step 2: Next, we will look at the smallest distance in the proximity matrix and merge
the points with the smallest distance. We then update the proximity matrix:
Here, the smallest distance is 3 and hence we will merge point 1 and 2:
Let’s look at the updated clusters and accordingly update the proximity matrix:
Here, we have taken the maximum of the two marks (7, 10) to replace the marks for
this cluster. Instead of the maximum, we can also take the minimum value or the
average values as well. Now, we will again calculate the proximity matrix for these
clusters:
Step 3: We will repeat step 2 until only a single cluster is left.
So, we will first look at the minimum distance in the proximity matrix and then merge
the closest pair of clusters. We will get the merged clusters as shown below after
repeating these steps:
How should we Choose the Number of Clusters in Hierarchical Clustering?
Let’s get back to our teacher-student example. Whenever we merge two clusters, a
dendrogram will record the distance between these clusters and represent it in graph
form. Let’s see how a dendrogram looks like:
We have the samples of the dataset on the x-axis and the distance on the y-
axis. Whenever two clusters are merged, we will join them in this dendrogram and
the height of the join will be the distance between these points.
Let’s build the dendrogram for our example:
Take a moment to process the above image. We started by merging sample 1 and 2
and the distance between these two samples was 3 (refer to the first proximity matrix
in the previous section).
Let’s plot this in the dendrogram:
Here, we can see that we have merged sample 1 and 2. The vertical line represents
the distance between these samples. S
imilarly, we plot all the steps where we merged
the clusters and finally, we get a dendrogram like this:
Now, we can set a threshold distance and draw a horizontal line (Generally, we try to
set the threshold in such a way that it cuts the tallest vertical line). Let’s set this threshold
as 12 and draw a horizontal line:
The number of clusters will be the number of vertical lines which are being
intersected by the line drawn using the threshold. In the above example, since the
red line intersects 2 vertical lines, we will have 2 clusters. One cluster will have a
sample (1,2,4) and the other will have a sample (3,5).

More Related Content

Similar to Clustering-dendogram.pptx

Designing a Minimum Distance classifier to Class Mean Classifier
Designing a Minimum Distance classifier to Class Mean ClassifierDesigning a Minimum Distance classifier to Class Mean Classifier
Designing a Minimum Distance classifier to Class Mean Classifier
Dipesh Shome
 
Hierarchical clustering
Hierarchical clusteringHierarchical clustering
Hierarchical clustering
ishmecse13
 
K mean-clustering
K mean-clusteringK mean-clustering
K mean-clustering
Afzaal Subhani
 
11-2-Clustering.pptx
11-2-Clustering.pptx11-2-Clustering.pptx
11-2-Clustering.pptx
paktari1
 
[PPT]
[PPT][PPT]
[PPT]
butest
 
overviewPCA
overviewPCAoverviewPCA
overviewPCA
Edwin Heredia
 
Enhance The K Means Algorithm On Spatial Dataset
Enhance The K Means Algorithm On Spatial DatasetEnhance The K Means Algorithm On Spatial Dataset
Enhance The K Means Algorithm On Spatial Dataset
AlaaZ
 
Unsupervised Learning in Machine Learning
Unsupervised Learning in Machine LearningUnsupervised Learning in Machine Learning
Unsupervised Learning in Machine Learning
Pyingkodi Maran
 
MSE.pptx
MSE.pptxMSE.pptx
MSE.pptx
JantuRahaman
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
Avijit Famous
 
2012 mdsp pr09 pca lda
2012 mdsp pr09 pca lda2012 mdsp pr09 pca lda
2012 mdsp pr09 pca lda
nozomuhamada
 
Pattern Recognition - Designing a minimum distance class mean classifier
Pattern Recognition - Designing a minimum distance class mean classifierPattern Recognition - Designing a minimum distance class mean classifier
Pattern Recognition - Designing a minimum distance class mean classifier
Nayem Nayem
 
Clustering
ClusteringClustering
Clustering
Smrutiranjan Sahu
 
Project
ProjectProject
Project
Srijan Paul
 
Traveling Salesman Problem in Distributed Environment
Traveling Salesman Problem in Distributed EnvironmentTraveling Salesman Problem in Distributed Environment
Traveling Salesman Problem in Distributed Environment
csandit
 
TRAVELING SALESMAN PROBLEM IN DISTRIBUTED ENVIRONMENT
TRAVELING SALESMAN PROBLEM IN DISTRIBUTED ENVIRONMENTTRAVELING SALESMAN PROBLEM IN DISTRIBUTED ENVIRONMENT
TRAVELING SALESMAN PROBLEM IN DISTRIBUTED ENVIRONMENT
cscpconf
 
Vectorise all the things
Vectorise all the thingsVectorise all the things
Vectorise all the things
JodieBurchell1
 
Kakuro: Solving the Constraint Satisfaction Problem
Kakuro: Solving the Constraint Satisfaction ProblemKakuro: Solving the Constraint Satisfaction Problem
Kakuro: Solving the Constraint Satisfaction Problem
Varad Meru
 
Fessant aknin oukhellou_midenet_2001:comparison_of_supervised_self_organizing...
Fessant aknin oukhellou_midenet_2001:comparison_of_supervised_self_organizing...Fessant aknin oukhellou_midenet_2001:comparison_of_supervised_self_organizing...
Fessant aknin oukhellou_midenet_2001:comparison_of_supervised_self_organizing...
ArchiLab 7
 
Lect4
Lect4Lect4
Lect4
sumit621
 

Similar to Clustering-dendogram.pptx (20)

Designing a Minimum Distance classifier to Class Mean Classifier
Designing a Minimum Distance classifier to Class Mean ClassifierDesigning a Minimum Distance classifier to Class Mean Classifier
Designing a Minimum Distance classifier to Class Mean Classifier
 
Hierarchical clustering
Hierarchical clusteringHierarchical clustering
Hierarchical clustering
 
K mean-clustering
K mean-clusteringK mean-clustering
K mean-clustering
 
11-2-Clustering.pptx
11-2-Clustering.pptx11-2-Clustering.pptx
11-2-Clustering.pptx
 
[PPT]
[PPT][PPT]
[PPT]
 
overviewPCA
overviewPCAoverviewPCA
overviewPCA
 
Enhance The K Means Algorithm On Spatial Dataset
Enhance The K Means Algorithm On Spatial DatasetEnhance The K Means Algorithm On Spatial Dataset
Enhance The K Means Algorithm On Spatial Dataset
 
Unsupervised Learning in Machine Learning
Unsupervised Learning in Machine LearningUnsupervised Learning in Machine Learning
Unsupervised Learning in Machine Learning
 
MSE.pptx
MSE.pptxMSE.pptx
MSE.pptx
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
2012 mdsp pr09 pca lda
2012 mdsp pr09 pca lda2012 mdsp pr09 pca lda
2012 mdsp pr09 pca lda
 
Pattern Recognition - Designing a minimum distance class mean classifier
Pattern Recognition - Designing a minimum distance class mean classifierPattern Recognition - Designing a minimum distance class mean classifier
Pattern Recognition - Designing a minimum distance class mean classifier
 
Clustering
ClusteringClustering
Clustering
 
Project
ProjectProject
Project
 
Traveling Salesman Problem in Distributed Environment
Traveling Salesman Problem in Distributed EnvironmentTraveling Salesman Problem in Distributed Environment
Traveling Salesman Problem in Distributed Environment
 
TRAVELING SALESMAN PROBLEM IN DISTRIBUTED ENVIRONMENT
TRAVELING SALESMAN PROBLEM IN DISTRIBUTED ENVIRONMENTTRAVELING SALESMAN PROBLEM IN DISTRIBUTED ENVIRONMENT
TRAVELING SALESMAN PROBLEM IN DISTRIBUTED ENVIRONMENT
 
Vectorise all the things
Vectorise all the thingsVectorise all the things
Vectorise all the things
 
Kakuro: Solving the Constraint Satisfaction Problem
Kakuro: Solving the Constraint Satisfaction ProblemKakuro: Solving the Constraint Satisfaction Problem
Kakuro: Solving the Constraint Satisfaction Problem
 
Fessant aknin oukhellou_midenet_2001:comparison_of_supervised_self_organizing...
Fessant aknin oukhellou_midenet_2001:comparison_of_supervised_self_organizing...Fessant aknin oukhellou_midenet_2001:comparison_of_supervised_self_organizing...
Fessant aknin oukhellou_midenet_2001:comparison_of_supervised_self_organizing...
 
Lect4
Lect4Lect4
Lect4
 

Recently uploaded

How to Use Payment Vouchers in Odoo 18.
How to Use Payment Vouchers in  Odoo 18.How to Use Payment Vouchers in  Odoo 18.
How to Use Payment Vouchers in Odoo 18.
FinShe
 
Fabular Frames and the Four Ratio Problem
Fabular Frames and the Four Ratio ProblemFabular Frames and the Four Ratio Problem
Fabular Frames and the Four Ratio Problem
Majid Iqbal
 
The Impact of Generative AI and 4th Industrial Revolution
The Impact of Generative AI and 4th Industrial RevolutionThe Impact of Generative AI and 4th Industrial Revolution
The Impact of Generative AI and 4th Industrial Revolution
Paolo Maresca
 
1:1制作加拿大麦吉尔大学毕业证硕士学历证书原版一模一样
1:1制作加拿大麦吉尔大学毕业证硕士学历证书原版一模一样1:1制作加拿大麦吉尔大学毕业证硕士学历证书原版一模一样
1:1制作加拿大麦吉尔大学毕业证硕士学历证书原版一模一样
qntjwn68
 
Optimizing Net Interest Margin (NIM) in the Financial Sector (With Examples).pdf
Optimizing Net Interest Margin (NIM) in the Financial Sector (With Examples).pdfOptimizing Net Interest Margin (NIM) in the Financial Sector (With Examples).pdf
Optimizing Net Interest Margin (NIM) in the Financial Sector (With Examples).pdf
shruti1menon2
 
Seeman_Fiintouch_LLP_Newsletter_Jun_2024.pdf
Seeman_Fiintouch_LLP_Newsletter_Jun_2024.pdfSeeman_Fiintouch_LLP_Newsletter_Jun_2024.pdf
Seeman_Fiintouch_LLP_Newsletter_Jun_2024.pdf
Ashis Kumar Dey
 
Tdasx: In-Depth Analysis of Cryptocurrency Giveaway Scams and Security Strate...
Tdasx: In-Depth Analysis of Cryptocurrency Giveaway Scams and Security Strate...Tdasx: In-Depth Analysis of Cryptocurrency Giveaway Scams and Security Strate...
Tdasx: In-Depth Analysis of Cryptocurrency Giveaway Scams and Security Strate...
bresciafarid233
 
1比1复刻(ksu毕业证书)美国堪萨斯州立大学毕业证本科文凭证书原版一模一样
1比1复刻(ksu毕业证书)美国堪萨斯州立大学毕业证本科文凭证书原版一模一样1比1复刻(ksu毕业证书)美国堪萨斯州立大学毕业证本科文凭证书原版一模一样
1比1复刻(ksu毕业证书)美国堪萨斯州立大学毕业证本科文凭证书原版一模一样
28xo7hf
 
Detailed power point presentation on compound interest and how it is calculated
Detailed power point presentation on compound interest  and how it is calculatedDetailed power point presentation on compound interest  and how it is calculated
Detailed power point presentation on compound interest and how it is calculated
KishanChaudhary23
 
在线办理(GU毕业证书)美国贡萨加大学毕业证学历证书一模一样
在线办理(GU毕业证书)美国贡萨加大学毕业证学历证书一模一样在线办理(GU毕业证书)美国贡萨加大学毕业证学历证书一模一样
在线办理(GU毕业证书)美国贡萨加大学毕业证学历证书一模一样
5spllj1l
 
Tdasx: In-Depth Analysis of Cryptocurrency Giveaway Scams and Security Strate...
Tdasx: In-Depth Analysis of Cryptocurrency Giveaway Scams and Security Strate...Tdasx: In-Depth Analysis of Cryptocurrency Giveaway Scams and Security Strate...
Tdasx: In-Depth Analysis of Cryptocurrency Giveaway Scams and Security Strate...
nimaruinazawa258
 
OAT_RI_Ep20 WeighingTheRisks_May24_Trade Wars.pptx
OAT_RI_Ep20 WeighingTheRisks_May24_Trade Wars.pptxOAT_RI_Ep20 WeighingTheRisks_May24_Trade Wars.pptx
OAT_RI_Ep20 WeighingTheRisks_May24_Trade Wars.pptx
hiddenlevers
 
Accounting Information Systems (AIS).pptx
Accounting Information Systems (AIS).pptxAccounting Information Systems (AIS).pptx
Accounting Information Systems (AIS).pptx
TIZITAWMASRESHA
 
快速办理(美国Fordham毕业证书)福德汉姆大学毕业证学历证书一模一样
快速办理(美国Fordham毕业证书)福德汉姆大学毕业证学历证书一模一样快速办理(美国Fordham毕业证书)福德汉姆大学毕业证学历证书一模一样
快速办理(美国Fordham毕业证书)福德汉姆大学毕业证学历证书一模一样
5spllj1l
 
Machine Learning in Business - A power point presentation.pptx
Machine Learning in Business - A power point presentation.pptxMachine Learning in Business - A power point presentation.pptx
Machine Learning in Business - A power point presentation.pptx
mimiroselowe
 
Ending stagnation: How to boost prosperity across Scotland
Ending stagnation: How to boost prosperity across ScotlandEnding stagnation: How to boost prosperity across Scotland
Ending stagnation: How to boost prosperity across Scotland
ResolutionFoundation
 
South Dakota State University degree offer diploma Transcript
South Dakota State University degree offer diploma TranscriptSouth Dakota State University degree offer diploma Transcript
South Dakota State University degree offer diploma Transcript
ynfqplhm
 
Discover the Future of Dogecoin with Our Comprehensive Guidance
Discover the Future of Dogecoin with Our Comprehensive GuidanceDiscover the Future of Dogecoin with Our Comprehensive Guidance
Discover the Future of Dogecoin with Our Comprehensive Guidance
36 Crypto
 
STREETONOMICS: Exploring the Uncharted Territories of Informal Markets throug...
STREETONOMICS: Exploring the Uncharted Territories of Informal Markets throug...STREETONOMICS: Exploring the Uncharted Territories of Informal Markets throug...
STREETONOMICS: Exploring the Uncharted Territories of Informal Markets throug...
sameer shah
 
真实可查(nwu毕业证书)美国西北大学毕业证学位证书范本原版一模一样
真实可查(nwu毕业证书)美国西北大学毕业证学位证书范本原版一模一样真实可查(nwu毕业证书)美国西北大学毕业证学位证书范本原版一模一样
真实可查(nwu毕业证书)美国西北大学毕业证学位证书范本原版一模一样
28xo7hf
 

Recently uploaded (20)

How to Use Payment Vouchers in Odoo 18.
How to Use Payment Vouchers in  Odoo 18.How to Use Payment Vouchers in  Odoo 18.
How to Use Payment Vouchers in Odoo 18.
 
Fabular Frames and the Four Ratio Problem
Fabular Frames and the Four Ratio ProblemFabular Frames and the Four Ratio Problem
Fabular Frames and the Four Ratio Problem
 
The Impact of Generative AI and 4th Industrial Revolution
The Impact of Generative AI and 4th Industrial RevolutionThe Impact of Generative AI and 4th Industrial Revolution
The Impact of Generative AI and 4th Industrial Revolution
 
1:1制作加拿大麦吉尔大学毕业证硕士学历证书原版一模一样
1:1制作加拿大麦吉尔大学毕业证硕士学历证书原版一模一样1:1制作加拿大麦吉尔大学毕业证硕士学历证书原版一模一样
1:1制作加拿大麦吉尔大学毕业证硕士学历证书原版一模一样
 
Optimizing Net Interest Margin (NIM) in the Financial Sector (With Examples).pdf
Optimizing Net Interest Margin (NIM) in the Financial Sector (With Examples).pdfOptimizing Net Interest Margin (NIM) in the Financial Sector (With Examples).pdf
Optimizing Net Interest Margin (NIM) in the Financial Sector (With Examples).pdf
 
Seeman_Fiintouch_LLP_Newsletter_Jun_2024.pdf
Seeman_Fiintouch_LLP_Newsletter_Jun_2024.pdfSeeman_Fiintouch_LLP_Newsletter_Jun_2024.pdf
Seeman_Fiintouch_LLP_Newsletter_Jun_2024.pdf
 
Tdasx: In-Depth Analysis of Cryptocurrency Giveaway Scams and Security Strate...
Tdasx: In-Depth Analysis of Cryptocurrency Giveaway Scams and Security Strate...Tdasx: In-Depth Analysis of Cryptocurrency Giveaway Scams and Security Strate...
Tdasx: In-Depth Analysis of Cryptocurrency Giveaway Scams and Security Strate...
 
1比1复刻(ksu毕业证书)美国堪萨斯州立大学毕业证本科文凭证书原版一模一样
1比1复刻(ksu毕业证书)美国堪萨斯州立大学毕业证本科文凭证书原版一模一样1比1复刻(ksu毕业证书)美国堪萨斯州立大学毕业证本科文凭证书原版一模一样
1比1复刻(ksu毕业证书)美国堪萨斯州立大学毕业证本科文凭证书原版一模一样
 
Detailed power point presentation on compound interest and how it is calculated
Detailed power point presentation on compound interest  and how it is calculatedDetailed power point presentation on compound interest  and how it is calculated
Detailed power point presentation on compound interest and how it is calculated
 
在线办理(GU毕业证书)美国贡萨加大学毕业证学历证书一模一样
在线办理(GU毕业证书)美国贡萨加大学毕业证学历证书一模一样在线办理(GU毕业证书)美国贡萨加大学毕业证学历证书一模一样
在线办理(GU毕业证书)美国贡萨加大学毕业证学历证书一模一样
 
Tdasx: In-Depth Analysis of Cryptocurrency Giveaway Scams and Security Strate...
Tdasx: In-Depth Analysis of Cryptocurrency Giveaway Scams and Security Strate...Tdasx: In-Depth Analysis of Cryptocurrency Giveaway Scams and Security Strate...
Tdasx: In-Depth Analysis of Cryptocurrency Giveaway Scams and Security Strate...
 
OAT_RI_Ep20 WeighingTheRisks_May24_Trade Wars.pptx
OAT_RI_Ep20 WeighingTheRisks_May24_Trade Wars.pptxOAT_RI_Ep20 WeighingTheRisks_May24_Trade Wars.pptx
OAT_RI_Ep20 WeighingTheRisks_May24_Trade Wars.pptx
 
Accounting Information Systems (AIS).pptx
Accounting Information Systems (AIS).pptxAccounting Information Systems (AIS).pptx
Accounting Information Systems (AIS).pptx
 
快速办理(美国Fordham毕业证书)福德汉姆大学毕业证学历证书一模一样
快速办理(美国Fordham毕业证书)福德汉姆大学毕业证学历证书一模一样快速办理(美国Fordham毕业证书)福德汉姆大学毕业证学历证书一模一样
快速办理(美国Fordham毕业证书)福德汉姆大学毕业证学历证书一模一样
 
Machine Learning in Business - A power point presentation.pptx
Machine Learning in Business - A power point presentation.pptxMachine Learning in Business - A power point presentation.pptx
Machine Learning in Business - A power point presentation.pptx
 
Ending stagnation: How to boost prosperity across Scotland
Ending stagnation: How to boost prosperity across ScotlandEnding stagnation: How to boost prosperity across Scotland
Ending stagnation: How to boost prosperity across Scotland
 
South Dakota State University degree offer diploma Transcript
South Dakota State University degree offer diploma TranscriptSouth Dakota State University degree offer diploma Transcript
South Dakota State University degree offer diploma Transcript
 
Discover the Future of Dogecoin with Our Comprehensive Guidance
Discover the Future of Dogecoin with Our Comprehensive GuidanceDiscover the Future of Dogecoin with Our Comprehensive Guidance
Discover the Future of Dogecoin with Our Comprehensive Guidance
 
STREETONOMICS: Exploring the Uncharted Territories of Informal Markets throug...
STREETONOMICS: Exploring the Uncharted Territories of Informal Markets throug...STREETONOMICS: Exploring the Uncharted Territories of Informal Markets throug...
STREETONOMICS: Exploring the Uncharted Territories of Informal Markets throug...
 
真实可查(nwu毕业证书)美国西北大学毕业证学位证书范本原版一模一样
真实可查(nwu毕业证书)美国西北大学毕业证学位证书范本原版一模一样真实可查(nwu毕业证书)美国西北大学毕业证学位证书范本原版一模一样
真实可查(nwu毕业证书)美国西北大学毕业证学位证书范本原版一模一样
 

Clustering-dendogram.pptx

  • 1. Dendogram Hierarchical Clustering : Its slow :: complicated :: repeatable :: not suited for big data sets. Lets take 6 simple Vectors. 6 Vectors Using Euclidean Distance lets compute the Distance Matrix. Euclidean Distance = sqrt( (x2 -x1)**2 + (y2-y1)**2 )
  • 2. Using Euclidean Distance lets compute the Distance Matrix. Euclidean Distance = sqrt( (x2 -x1)**2 + (y2-y1)**2 ) Distance Matrix Complete Link Clustering: Considers Max of all distances. Leads to many small clusters. Distance Matrix: Diagonals will be 0 and values will be symmetric. Stage 0
  • 3. Step a: The shortest distance in the matrix is 1 and the vectors associated with that are C & D So the first cluster is C — D Distance between other vectors and CD A to CD = max(A->C, A->D) = max(25,24) = 25 B to CD = max(B-<C, B->D) = max(21,20) = 21 and similarly find for E -> CD & F -> CD Stage 1
  • 4. Step b : Now 2 is the shortest distance and the vectors associated with that are E & F Second cluster is E — F A to EF = max(A->E, A->F) = max(9,7) = 9 CD to EF = max(CD->E, CD->F) = max(15,17) = 17 Step c : Now 4 is the shortest distance and vectors associated are A & B. Third cluster is A — B CD to AB = max(CD -> A, CD ->B) = max(25,21) = 25 EF to AB = max(EF -> A, EF ->B) = max(9,5) = 9
  • 5. Step d : Now 9 is the shortest distance and vectors associated are AB and EF. Fourth cluster is AB — EF CD to ABEF = max(CD->AB, CD->EF) = max(25,18) = 25 Step e : Last cluster is CD — ABEF
  • 6. Let’s take a sample of 5 students: Creating a Proximity Matrix First, we will create a proximity matrix which will tell us the distance between each of these points. S ince we are calculating the distance of each point from each of the other points, we will get a square matrix of shape n X n (where n is the number of observations). Let’s make the 5 x 5 proximity matrix for our example:
  • 7. Step 1: First, we assign all the points to an individual cluster: Different colors here represent different clusters. You can see that we have 5 different clusters for the 5 points in our data. Step 2: Next, we will look at the smallest distance in the proximity matrix and merge the points with the smallest distance. We then update the proximity matrix: Here, the smallest distance is 3 and hence we will merge point 1 and 2:
  • 8. Let’s look at the updated clusters and accordingly update the proximity matrix: Here, we have taken the maximum of the two marks (7, 10) to replace the marks for this cluster. Instead of the maximum, we can also take the minimum value or the average values as well. Now, we will again calculate the proximity matrix for these clusters:
  • 9. Step 3: We will repeat step 2 until only a single cluster is left. So, we will first look at the minimum distance in the proximity matrix and then merge the closest pair of clusters. We will get the merged clusters as shown below after repeating these steps:
  • 10. How should we Choose the Number of Clusters in Hierarchical Clustering? Let’s get back to our teacher-student example. Whenever we merge two clusters, a dendrogram will record the distance between these clusters and represent it in graph form. Let’s see how a dendrogram looks like: We have the samples of the dataset on the x-axis and the distance on the y- axis. Whenever two clusters are merged, we will join them in this dendrogram and the height of the join will be the distance between these points.
  • 11. Let’s build the dendrogram for our example: Take a moment to process the above image. We started by merging sample 1 and 2 and the distance between these two samples was 3 (refer to the first proximity matrix in the previous section).
  • 12. Let’s plot this in the dendrogram: Here, we can see that we have merged sample 1 and 2. The vertical line represents the distance between these samples. S imilarly, we plot all the steps where we merged the clusters and finally, we get a dendrogram like this:
  • 13. Now, we can set a threshold distance and draw a horizontal line (Generally, we try to set the threshold in such a way that it cuts the tallest vertical line). Let’s set this threshold as 12 and draw a horizontal line: The number of clusters will be the number of vertical lines which are being intersected by the line drawn using the threshold. In the above example, since the red line intersects 2 vertical lines, we will have 2 clusters. One cluster will have a sample (1,2,4) and the other will have a sample (3,5).