CLUSTER VALIDATION
Presented by: Rohit Paul
CLUSTERING
 Process of partitioning a set of data objects into
subsets (called clusters)
 Objects in a cluster are similar to one another and
dissimilar to objects in other clusters.
CLUSTER VALIDITY INDICES
 To evaluate the “goodness” of the resulting clusters.
 Different aspects of cluster validation
 To compare clustering algorithms
 To compare two different sets of clusters
 Comparing the results of a cluster analysis to externally
known results
 Determining the ‘correct’ number of clusters
 scikit-learn (sklearn): a library for machine learning
in Python
 from sklearn.metrics import ..
Types of Validity Indices
 Internal Quality Indices
 Used to measure the goodness of a clustering structure
without reference to external information.
 How well the clusters are separated and how compact the
clusters are.
 External Quality Indices
 Measure the extent to which cluster labels match the
externally supplied class labels.
Internal Quality Indices
 Based on the following two criteria:
 Compactness/Cohesion: how closely related the objects
in a cluster are
 Separation: how distinct or well-separated a cluster is
from other clusters
 Application
 To compare clustering algorithms
 Determining the ‘correct’ number of clusters
Disadvantages of k-means
Choosing the number of clusters k
 In most exploratory applications, the number of clusters k
is unknown
 The correct choice of k is often ambiguous
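A minimal sketch of why k is hard to pick from the k-means objective alone. The synthetic data from make_blobs, the candidate range 2 to 8, and the variable names are assumptions for illustration only and are not part of the original slides.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data (assumption): 4 blobs. In a real exploratory setting k is unknown.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=1.0, random_state=0)

candidate_ks = list(range(2, 9))
inertias = []
for k in candidate_ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ is the within-cluster sum of squared distances to the centroids
    inertias.append(model.inertia_)

for k, inertia in zip(candidate_ks, inertias):
    print(f"k={k}: inertia={inertia:.1f}")
```

The inertia generally keeps shrinking as k grows, so it cannot select k by itself; this is the gap that the validity indices are meant to fill.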
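Davies-Bouldin Index
For each cluster, take the maximum over the other clusters of the ratio of
intra-cluster scatter (summed over the two clusters) to the distance between
their centroids; the DB index is the average of these maxima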
>> from sklearn.metrics import davies_bouldin_score
………....
>> davies_bouldin_score(X, labels)
 The lower the DB index, the better the clustering
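The call above can be sketched end to end as follows. The data set, the candidate values of k, and the use of KMeans are assumptions chosen for illustration; only davies_bouldin_score(X, labels) comes from the slide.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

# Synthetic data (assumption): 3 reasonably compact blobs.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

db_scores = {}
for k in (2, 3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    db_scores[k] = davies_bouldin_score(X, labels)

# Lower is better, so pick the k with the smallest DB index.
best_k = min(db_scores, key=db_scores.get)
for k, score in db_scores.items():
    print(f"k={k}: DB index={score:.3f}")
print("k with lowest DB index:", best_k)
```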
Dunn Index
Defined as the ratio of the minimum inter-cluster separation to the
maximum cluster diameter
 The higher the Dunn index, the better the clustering.
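scikit-learn's sklearn.metrics module does not, as far as I know, ship a Dunn index, so the following is a hand-written sketch. The function name dunn_index is hypothetical. It assumes one common variant: separation is the smallest distance between points of different clusters, and diameter is the largest distance between points of the same cluster. Other variants exist.

```python
import numpy as np

def dunn_index(X, labels):
    """Minimum inter-cluster separation divided by maximum cluster diameter.

    Assumes at least two clusters and at least one cluster with two
    distinct points (otherwise the maximum diameter is zero).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = [X[labels == c] for c in np.unique(labels)]

    def pairwise(A, B):
        # Euclidean distance between every row of A and every row of B
        return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)

    max_diameter = max(pairwise(c, c).max() for c in clusters)
    min_separation = min(
        pairwise(clusters[i], clusters[j]).min()
        for i in range(len(clusters))
        for j in range(i + 1, len(clusters))
    )
    return min_separation / max_diameter

# Two tight, well-separated clusters: separation 10, diameter 1.
X = np.array([[0, 0], [0, 1], [10, 0], [10, 1]])
labels = np.array([0, 0, 1, 1])
dunn = dunn_index(X, labels)
print(dunn)  # 10.0
```

The pairwise computation is quadratic in the number of points, so this sketch is only suitable for small data sets.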
Silhouette Index
 The Silhouette Coefficient combines the ideas of cohesion
and separation, but for individual points
S(i) = ( b(i) - a(i) ) / max{ a(i), b(i) }
where
 a(i) is the average dissimilarity of the i-th object to all other
objects in the same cluster
 b(i) is the average dissimilarity of the i-th object to all objects
in the closest other cluster.
>> from sklearn.metrics import silhouette_score
………....
>> silhouette_score(X, labels)
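To connect the formula for S(i) with the library call, the sketch below computes a(i), b(i) and S(i) by hand for one point and compares the result with scikit-learn. The four-point data set and the variable names are assumptions for illustration; silhouette_samples returns the per-point values and silhouette_score their mean.

```python
import numpy as np
from sklearn.metrics import silhouette_samples, silhouette_score

# Tiny example (assumption): two clusters of two points each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]])
labels = np.array([0, 0, 1, 1])

i = 0  # index of the point whose silhouette is computed by hand
dists = np.linalg.norm(X - X[i], axis=1)

# a(i): mean distance to the other points of the same cluster
same = labels == labels[i]
same[i] = False  # exclude the point itself
a_i = dists[same].mean()

# b(i): smallest mean distance to the points of any other cluster
b_i = min(dists[labels == c].mean() for c in np.unique(labels) if c != labels[i])

s_manual = (b_i - a_i) / max(a_i, b_i)
s_sklearn = silhouette_samples(X, labels)[i]
mean_s = silhouette_score(X, labels)

print(f"manual S({i}) = {s_manual:.4f}, sklearn S({i}) = {s_sklearn:.4f}")
print(f"mean silhouette over all points = {mean_s:.4f}")
```

Values lie between -1 and 1; values near 1 indicate that a point is much closer to its own cluster than to the nearest other cluster.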
Other Internal Cluster Validity Indices
 Root-mean-square standard deviation (RMSSTD)
 R-squared
 Modified Hubert statistics
 Calinski-Harabasz index
 I index
 SD validity index
 S_Dbw validity index and so on….
External Quality Indices
Comparing the results of a cluster analysis to an
externally known result, such as externally
provided class labels
 Validate against ground truth
 Compare two clusters
Jaccard Score
Rand Index
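 Count the pairs of objects that fall into each case, where P and G are
the two partitions being compared (for example, the clustering result
and the ground truth):
 a = Same class both in P and G
 b = Same class in P but different in G
 c = Different class in P but same in G
 d = Different class both in P and G
 Agreement: a, d
 Disagreement: b, c
 Rand Index: RI = (a + d) / (a + b + c + d)
>> from sklearn.metrics import adjusted_rand_score
………....
>> adjusted_rand_score(labels_true, labels_pred)
 Note: adjusted_rand_score returns the chance-corrected Adjusted Rand
Index, not the plain Rand index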
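The pair counts can be sketched directly. The helper pair_counts and the toy label vectors are hypothetical; the plain Rand index is taken here as the fraction of agreeing pairs, (a + d) / (a + b + c + d). The sketch assumes a scikit-learn version that provides rand_score (0.24 or later) and compares it with the chance-corrected adjusted_rand_score used on the slide.

```python
from itertools import combinations
from sklearn.metrics import adjusted_rand_score, rand_score

def pair_counts(labels_g, labels_p):
    """Count object pairs by agreement between partitions G and P."""
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_g)), 2):
        same_p = labels_p[i] == labels_p[j]
        same_g = labels_g[i] == labels_g[j]
        if same_p and same_g:
            a += 1  # same in both P and G
        elif same_p:
            b += 1  # same in P, different in G
        elif same_g:
            c += 1  # different in P, same in G
        else:
            d += 1  # different in both
    return a, b, c, d

labels_true = [0, 0, 1, 1]  # G: externally supplied labels (toy example)
labels_pred = [0, 0, 1, 2]  # P: clustering result (toy example)

a, b, c, d = pair_counts(labels_true, labels_pred)
rand_manual = (a + d) / (a + b + c + d)
rand_lib = rand_score(labels_true, labels_pred)
ari = adjusted_rand_score(labels_true, labels_pred)

print(f"a={a}, b={b}, c={c}, d={d}")
print(f"Rand index (manual) = {rand_manual:.4f}, rand_score = {rand_lib:.4f}")
print(f"Adjusted Rand index = {ari:.4f}")
```

The adjusted score subtracts the agreement expected by chance, so it is usually lower than the plain Rand index for the same pair of labelings.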
F-measure
 Precision: What % of tuples that the classifier labeled
positive are actually positive
 Recall: What % of positive tuples did
the classifier label as positive
 F-measure: the harmonic mean of precision
and recall
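A small worked sketch of these three quantities. The function name f_measure and the counts (8 true positives, 2 false positives, 4 false negatives) are made up for illustration. For clustering, one common convention (an assumption here, not stated on the slide) is to count pairs of objects, so that true positives are pairs placed together in both the clustering and the ground truth.

```python
def f_measure(tp, fp, fn):
    """Return (precision, recall, F) from true/false positive and false negative counts.

    Assumes tp > 0 so that none of the denominators is zero.
    """
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that were found
    f = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f

precision, recall, f = f_measure(tp=8, fp=2, fn=4)
print(f"precision={precision:.3f}, recall={recall:.3f}, F={f:.3f}")
# precision=0.800, recall=0.667, F=0.727
```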
Other External Cluster Validity Indices
 Normalized Mutual Information(NMI)
 Purity
 Sorensen-Dice
 Braun-Blanquet
 Normalized Van Dongen
 Pair-Set Index
 Centroid Index and many more….
References
 https://medium.com/swlh/how-to-choose-the-right-number-of-clusters-in-the-k-means-algorithm-9160c57ec760
 https://present5.com/clustering-methods-part-3-cluster-validation-pasi-franti/
 https://www.datanovia.com/en/lessons/cluster-validation-statistics-must-know-methods/
 https://www.geeksforgeeks.org/dunn-index-and-db-index-cluster-validity-indices-set-1/
 Yanchi Liu, Zhongmou Li, Hui Xiong, Xuedong Gao, Junjie Wu.
Understanding of Internal Clustering Validation Measures.
 https://scikit-learn.org/stable/modules/clustering.html
Thank You !!
