Cluster validation

CLUSTER VALIDATION
Presented by: Rohit Paul

CLUSTERING
 Process of partitioning a set of data objects into
subsets (called clusters)
 Objects in a cluster are similar to one another and
dissimilar to objects in other clusters.

CLUSTER VALIDITY INDICES
 To evaluate the “goodness” of the resulting clusters.
 Different aspects of cluster validation
 To compare clustering algorithms
 To compare two different sets of clusters
 Comparing the results of a cluster analysis to externally
known results
 Determining the ‘correct’ number of clusters
 scikit-learn (sklearn): a library for machine learning
in Python
 from sklearn.metrics import ..
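
A minimal sketch of how these indices are typically used with scikit-learn: build a data matrix X, obtain cluster labels, and pass both to a metric. The synthetic data from make_blobs, the choice of KMeans, and all parameter values are illustrative assumptions, not part of the original slides.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data: 300 points around 3 centers (illustrative values only).
X, labels_true = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

# Cluster with k-means; n_init is set explicitly for reproducible behavior.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Any metric from sklearn.metrics that takes (X, labels) is called the same way.
score = silhouette_score(X, labels)
print(f"silhouette = {score:.3f}")
```

The same X and labels can then be passed to the other internal indices discussed in the following slides.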

Types of Validity Indices
 Internal Quality Indices
 Used to measure the goodness of a clustering structure
without respect to external information.
 How well the clusters are separated and how compact the
clusters are.
 External Quality Indices
 Measure the extent to which cluster labels match the
externally supplied class labels.

Internal Quality Indices
 Based on the following two criteria:
 Compactness/Cohesion: how closely related the objects
in a cluster are
 Separation: how distinct or well-separated a cluster is
from other clusters

 Application
 To compare clustering algorithms
 Determining the ‘correct’ number of clusters

Disadvantages of k-means
Choosing the number of clusters k
 In most exploratory applications, the number of clusters k
is unknown
 The correct choice of k is often ambiguous
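
One common way to deal with an unknown k is to run the clustering for several candidate values and keep the one with the best internal index. The sketch below does this with the silhouette score; the toy data, the candidate range 2 to 6, and the use of KMeans are assumptions made for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Three well-separated blobs at hand-picked centers (assumed toy data).
centers = [[0, 0], [10, 10], [-10, 10]]
X, _ = make_blobs(n_samples=150, centers=centers, cluster_std=0.5, random_state=0)

# Score every candidate k with the same internal index.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# A higher silhouette is better, so pick the k with the maximum score.
best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```

On real data the maximum is often much less clear-cut than on this toy set, which is why the choice of k stays ambiguous in practice.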

Davies Bouldin Index
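For each cluster i, take the maximum over all other clusters j of
(s_i + s_j) / d_ij, where s_i is the intra-cluster scatter of
cluster i and d_ij is the distance between the centroids of
clusters i and j. The DB index is the average of these maxima.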

>> from sklearn.metrics import davies_bouldin_score
………....
>> davies_bouldin_score(X, labels)
 The lower the DB index value, the better the clustering
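
To make the definition concrete, the following sketch recomputes the index by hand and compares it with davies_bouldin_score. The helper name davies_bouldin_manual and the nine toy points are assumptions for illustration; scatter is taken as the mean Euclidean distance of a cluster's points to its centroid.

```python
import numpy as np
from sklearn.metrics import davies_bouldin_score

def davies_bouldin_manual(X, labels):
    """Hypothetical helper: average over clusters of the worst (s_i + s_j) / d_ij."""
    ids = np.unique(labels)
    centroids = np.array([X[labels == c].mean(axis=0) for c in ids])
    # s_i: mean distance of the points of cluster i to its centroid.
    scatter = np.array([
        np.linalg.norm(X[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(ids)
    ])
    k = len(ids)
    worst = np.zeros(k)
    for i in range(k):
        worst[i] = max(
            (scatter[i] + scatter[j]) / np.linalg.norm(centroids[i] - centroids[j])
            for j in range(k) if j != i
        )
    return worst.mean()

# Toy data: three tight groups of three points each.
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10],
              [0, 10], [0, 11], [1, 10]], dtype=float)
labels = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

db_manual = davies_bouldin_manual(X, labels)
db_sklearn = davies_bouldin_score(X, labels)
print(db_manual, db_sklearn)
```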

Dunn Index
Defined as the ratio of the minimum inter-cluster
separation to the maximum cluster diameter

 The higher the Dunn index value, the better the clustering.
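
The slides give no scikit-learn call for the Dunn index, so here is a small NumPy sketch. It assumes one common variant: separation is the smallest distance between points of different clusters, and diameter is the largest distance between points of the same cluster. The function names are hypothetical.

```python
import numpy as np

def pairwise_dist(A, B):
    """Euclidean distances between every row of A and every row of B."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

def dunn_index(X, labels):
    """Hypothetical helper: minimum separation divided by maximum diameter."""
    clusters = [X[labels == c] for c in np.unique(labels)]
    max_diameter = max(pairwise_dist(c, c).max() for c in clusters)
    min_separation = min(
        pairwise_dist(clusters[i], clusters[j]).min()
        for i in range(len(clusters))
        for j in range(i + 1, len(clusters))
    )
    return min_separation / max_diameter

# Two clusters of two points: each has diameter 1, and they are 3 apart.
X = np.array([[0, 0], [0, 1], [3, 0], [3, 1]], dtype=float)
labels = np.array([0, 0, 1, 1])

dunn = dunn_index(X, labels)
print(dunn)
```

Because it uses all pairwise distances, this version is quadratic in the number of points and is meant for small data sets only.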

Silhouette Index
 The Silhouette Coefficient combines the ideas of cohesion
and separation, but for individual points
S(i) = ( b(i) - a(i) ) / max{ a(i), b(i) }
Where,
 a(i) is the average dissimilarity of the ith object to all other
objects in the same cluster
 b(i) is the average dissimilarity of the ith object to all objects
in the closest other cluster.
 The higher the silhouette value (range -1 to 1), the better
the clustering

>> from sklearn.metrics import silhouette_score
………....
>> silhouette_score(X, labels)
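
Since the coefficient is defined per point, it can help to compute S(i) for a single object and compare it with silhouette_samples, which returns one value per sample. The helper silhouette_of_point and the four 1-D toy points are assumptions; the helper expects the point's cluster to contain at least two objects.

```python
import numpy as np
from sklearn.metrics import silhouette_samples, silhouette_score

def silhouette_of_point(X, labels, i):
    """Hypothetical helper: S(i) = (b - a) / max(a, b) with Euclidean distance."""
    dist = np.linalg.norm(X - X[i], axis=1)
    own = labels == labels[i]
    own[i] = False  # a(i) excludes the point itself
    a = dist[own].mean()
    # b(i): smallest average distance to the points of any other cluster.
    b = min(dist[labels == c].mean() for c in np.unique(labels) if c != labels[i])
    return (b - a) / max(a, b)

X = np.array([[0.0], [1.0], [4.0], [5.0]])
labels = np.array([0, 0, 1, 1])

s0_manual = silhouette_of_point(X, labels, 0)
s_all = silhouette_samples(X, labels)
print(s0_manual, s_all[0])

# silhouette_score is the mean of the per-sample values.
print(s_all.mean(), silhouette_score(X, labels))
```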

Other Internal Cluster Validity Indices
 Root-mean-square standard deviation (RMSSTD)
 R-squared
 Modified Hubert statistic
 Calinski-Harabasz index
 I index
 SD validity index
 S_Dbw validity index, and so on
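
Of the indices listed, the Calinski-Harabasz index is available in scikit-learn as calinski_harabasz_score (higher is better). The sketch below compares a sensible labeling with a deliberately mixed one on assumed toy data.

```python
import numpy as np
from sklearn.metrics import calinski_harabasz_score

# Toy data: three tight groups of three points each.
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10],
              [0, 10], [0, 11], [1, 10]], dtype=float)

good = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])   # follows the groups
mixed = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])  # one point of each group per cluster

ch_good = calinski_harabasz_score(X, good)
ch_mixed = calinski_harabasz_score(X, mixed)
print(ch_good, ch_mixed)
```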

External Quality Indices
Comparing the results of a cluster analysis to an
externally known result, such as externally
provided class labels
 Validate against ground truth
 Compare two clusterings

Rand Index
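 Counts the pairs of objects, given two partitions P and G
(for example, a clustering result and the ground truth), that are in:
 a = Same cluster in both P and G
 b = Same cluster in P but different in G
 c = Different in P but same in G
 d = Different in both P and G

 Agreement: a, d
 Disagreement: b, c
 Rand Index: RI = (a + d) / (a + b + c + d)
 Note: adjusted_rand_score computes the Adjusted Rand Index,
the chance-corrected variant of the Rand Index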
>> from sklearn.metrics import adjusted_rand_score
………....
>> adjusted_rand_score(labels_true, labels_pred)
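
The pair counts can be checked by brute force on a tiny example. In this sketch the function pair_counts is a hypothetical helper, the predicted labels play the role of P and the true labels the role of G; the plain Rand Index is computed from the counts and the adjusted version comes from scikit-learn.

```python
from itertools import combinations
from sklearn.metrics import adjusted_rand_score

def pair_counts(labels_p, labels_g):
    """Hypothetical helper: count pairs by agreement between partitions P and G."""
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_p)), 2):
        same_p = labels_p[i] == labels_p[j]
        same_g = labels_g[i] == labels_g[j]
        if same_p and same_g:
            a += 1      # same in P and in G
        elif same_p:
            b += 1      # same in P, different in G
        elif same_g:
            c += 1      # different in P, same in G
        else:
            d += 1      # different in both
    return a, b, c, d

labels_true = [0, 0, 1, 1]
labels_pred = [0, 0, 1, 2]

a, b, c, d = pair_counts(labels_pred, labels_true)
rand_index = (a + d) / (a + b + c + d)
ari = adjusted_rand_score(labels_true, labels_pred)
print((a, b, c, d), rand_index, ari)
```

The adjusted score is lower than the plain Rand Index here because it subtracts the agreement expected by chance.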

F-measure
 Precision: What % of tuples that the classifier labeled
positive are actually positive
 Recall: What % of positive tuples did
the classifier label as positive
 F-Measure: The harmonic mean of precision
and recall, F = 2 · Precision · Recall / (Precision + Recall)
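
The definitions above are stated for a classifier. One common way to carry them over to clustering, assumed here, is to treat every pair of objects as a decision: a pair placed in the same cluster is a "positive" prediction, and it is correct if the pair also shares a class. The helper name pairwise_f_measure is hypothetical.

```python
from itertools import combinations

def pairwise_f_measure(labels_true, labels_pred):
    """Hypothetical helper: precision, recall and F over pairs of objects."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_pred = labels_pred[i] == labels_pred[j]
        same_true = labels_true[i] == labels_true[j]
        if same_pred and same_true:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    # Assumes at least one predicted-positive and one actual-positive pair.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

labels_true = [0, 0, 1, 1]
labels_pred = [0, 0, 1, 2]

precision, recall, f_measure = pairwise_f_measure(labels_true, labels_pred)
print(precision, recall, f_measure)
```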

Other External Cluster Validity Indices
 Normalized Mutual Information (NMI)
 Purity
 Sorensen-Dice
 Braun-Blanquet
 Normalized Van Dongen
 Pair-Set Index
 Centroid Index and many more….
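
Two of the listed indices are easy to try with scikit-learn. NMI has a direct function, normalized_mutual_info_score. Purity has none, so this sketch derives it from contingency_matrix by giving each cluster its majority class; the six toy labels are assumptions.

```python
from sklearn.metrics import normalized_mutual_info_score
from sklearn.metrics.cluster import contingency_matrix

labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 1, 1]

# Rows are true classes, columns are predicted clusters.
cm = contingency_matrix(labels_true, labels_pred)

# Purity: each cluster counts its most frequent class, summed and normalized.
purity = cm.max(axis=0).sum() / cm.sum()

nmi = normalized_mutual_info_score(labels_true, labels_pred)
print(purity, nmi)
```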

