AINL 2016: Muravyov
1. Towards Cluster Validity Index Evaluation and Selection
Andrey Filchenkov, Sergey Muravyov, Vladimir Parfenov
ITMO University St. Petersburg, Russia
{afilchenkov,smuravyov}@corp.ifmo.ru, parfenov@mail.ifmo.ru
11.11.2016
2. What is clustering?
Clustering means grouping the objects based on the
information found in the data describing the objects or their
relationships.
The goal is that the objects in a group will be similar (or
related) to one another and different from (or unrelated to)
the objects in other groups.
6. External measures
Based on known class labels and external benchmarks
Examples:
• F-measure
• Rand measure
• Jaccard index
Not applicable for real-life tasks!
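As an illustration (not part of the slides), the Rand and Jaccard indices listed above can be computed by counting how pairs of objects are grouped in the predicted partition versus the reference labels; a minimal Python sketch:

```python
from itertools import combinations

def pair_counts(labels_true, labels_pred):
    """Count object pairs by agreement: a = together in both labelings,
    b = together only in truth, c = together only in prediction,
    d = apart in both."""
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            a += 1
        elif same_true:
            b += 1
        elif same_pred:
            c += 1
        else:
            d += 1
    return a, b, c, d

def rand_index(labels_true, labels_pred):
    a, b, c, d = pair_counts(labels_true, labels_pred)
    return (a + d) / (a + b + c + d)

def jaccard_index(labels_true, labels_pred):
    a, b, c, d = pair_counts(labels_true, labels_pred)
    return a / (a + b + c)

true = [0, 0, 1, 1]
pred = [0, 0, 1, 0]
print(rand_index(true, pred))     # 0.5  (3 of 6 pairs agree)
print(jaccard_index(true, pred))  # 0.25
```

Both require the true labels, which is exactly why such measures are unavailable in real-life tasks.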
7. Internal measures
Usually assign the best score to the algorithm that produces
clusters with high similarity within a cluster and low similarity
between clusters
No need for extra information
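For illustration, the Dunn index from the list that follows is one such internal measure: it rewards partitions whose clusters are compact and well separated, using only the data itself. A minimal, naive O(n²) Python sketch (not from the slides):

```python
import numpy as np

def dunn_index(X, labels):
    """Dunn index: minimum inter-cluster distance divided by the
    maximum cluster diameter. Higher is better."""
    clusters = [X[labels == k] for k in np.unique(labels)]
    # Largest pairwise distance within any single cluster (diameter).
    diam = max(
        np.linalg.norm(c[i] - c[j])
        for c in clusters
        for i in range(len(c)) for j in range(i + 1, len(c))
    )
    # Smallest distance between points of two different clusters.
    sep = min(
        np.linalg.norm(p - q)
        for a in range(len(clusters)) for b in range(a + 1, len(clusters))
        for p in clusters[a] for q in clusters[b]
    )
    return sep / diam

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]])
labels = np.array([0, 0, 1, 1])
print(dunn_index(X, labels))  # 5.0: separation 5.0 / diameter 1.0
```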
8. List of the most popular existing internal measures (cluster validity indices)
Dunn index (D)
Davies-Bouldin index (DB)
Modified Davies-Bouldin index (DB*)
Silhouette index (Sil)
Calinski-Harabasz index (CH)
CS index (CS)
C-index (CI)
COP index (COP)
OS index (OS)
Sym-index (Symm)
S_Dbw index (SDbw)
Score function (SF)
Generalized Dunn indices (6 indices: GD*)
9. Problem of choice
There are a lot of CVIs
Which one should be chosen for a hard clustering problem?
• How to compare the existing ones?
• How to evaluate each one?
10. Approaches for CVI comparison
Visual-based comparison
Comparison with known labels
Purely theoretical comparison that is based on studying CVI
properties
Comparison based on stability, robustness of structure, or
other desired properties
12. Practical approach for partition evaluation
As stated in the classical book [1], "Understanding our world
requires conceptualizing the similarities and differences
between the entities that compose it"
M. Ackerman showed that there is no difference between
partition evaluations made by clustering experts and by
non-experts
The only way to obtain ground truth for partition quality
estimation is human assessment
[1] R. C. Tryon and D. E. Bailey, Cluster Analysis, 1970
13. Formalized ground truth for partitions
Let C denote the space of clustering tasks
Weak ordering:
• H_r(c) : C → Partition(c) × Partition(c), c ∈ C
Binary estimation scale:
• H_b : Partition(c) → {0, 1}
15. Procedure for one assessor
Evaluate all possible partitions
Set a partial order by comparing all pairs of partitions
16. Constraints on the one-assessor procedure
Choose a finite set of partitions
Mark each partition with a number
• A score function can be put into correspondence with any
weak strict order
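The correspondence between numeric marks and a weak order can be pictured as follows: partitions with equal scores tie, and the tiers are sorted by score. A small Python sketch (partition names are illustrative):

```python
def weak_order_from_scores(scores):
    """Turn a score function into a weak order: group items into
    tiers of equal score, best tier first."""
    tiers = {}
    for item, s in scores.items():
        tiers.setdefault(s, set()).add(item)
    return [tiers[s] for s in sorted(tiers, reverse=True)]

# Four hypothetical partitions marked by one assessor.
scores = {"P1": 0.9, "P2": 0.4, "P3": 0.9, "P4": 0.1}
print(weak_order_from_scores(scores))
# P1 and P3 tie in the top tier, then P2, then P4
```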
17. Procedure for multiple assessors
Four criteria for each CVI:
• Binarized adequacy
• Weighted adequacy
• Adequacy of the best
• Aggregated ranking
18. Binarized adequacy
R_BA = ρ_Kτ(b_best, b_CVI) / ρ_Kτ(b_best, b_worst)
• b_CVI — permutation of the binary marks ordered by the
CVI's estimates of the partitions; consists of + and −
• b_best — best permutation: + + ⋯ + − ⋯ −
• b_worst — worst permutation: − ⋯ − + ⋯ +
• ρ_Kτ — modified Kendall tau distance
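A toy Python sketch of this criterion, using plain inversion counting as a stand-in for the modified Kendall tau distance of the slides (whose exact modification is not specified here): order the assessors' binary marks by the CVI's scores and measure how far that sequence is from the ideal "+⋯+−⋯−" arrangement:

```python
def inversion_distance(marks):
    """Count pairs out of order relative to the ideal '+...+-...-'
    sequence: each '-' appearing before a '+' is one inversion."""
    return sum(
        1
        for i in range(len(marks)) for j in range(i + 1, len(marks))
        if marks[i] == "-" and marks[j] == "+"
    )

def binarized_adequacy(marks_by_cvi):
    """Distance of the CVI-induced mark sequence from the best
    sequence, normalised by the worst sequence's distance."""
    plus = marks_by_cvi.count("+")
    minus = len(marks_by_cvi) - plus
    worst = plus * minus  # in b_worst every '-' precedes every '+'
    if worst == 0:        # all marks equal: nothing to misorder
        return 0.0
    return inversion_distance(marks_by_cvi) / worst

# Marks of 4 partitions, sorted by a hypothetical CVI's score.
print(binarized_adequacy(["+", "-", "+", "-"]))  # 1 inversion of 4 -> 0.25
```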
19. Weighted adequacy
R_WA = ρ_Kτ(r_best, r_CVI) / ρ_Kτ(r_best, r_worst)
• Instead of + and −, we take the number of assessors who
gave the partition an adequate mark: w_i
• r_CVI — permutation of the weighted marks ordered by the
CVI's estimates of the partitions
• r_best — best permutation: w_1 ≥ ⋯ ≥ w_n
• r_worst — worst permutation: w_1 ≤ ⋯ ≤ w_n
• ρ_Kτ — modified Kendall tau distance
20. Adequacy of the best
Choose the adequacy mark that was assigned to the partition
with the highest value of the CVI being evaluated.
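This criterion reduces to a one-liner; a Python sketch with illustrative inputs:

```python
def adequacy_of_the_best(cvi_values, assessor_marks):
    """Return the assessors' mark of the partition that the CVI
    ranks highest."""
    best = max(range(len(cvi_values)), key=cvi_values.__getitem__)
    return assessor_marks[best]

cvi_values = [0.2, 0.9, 0.5]   # hypothetical CVI scores of 3 partitions
marks = ["-", "+", "-"]        # assessors' adequacy marks
print(adequacy_of_the_best(cvi_values, marks))  # '+': the CVI's pick was adequate
```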
21. Aggregated ranking
Represents how much the orderings produced by the assessors
and by each CVI differ
Uses the weak-order aggregation algorithm and distance
measure ρ of J. L. Garcia-Lapresta and D. Perez-Roman
R_AR = ρ(ar_best, ar_CVI) / ρ(ar_best, ar_worst)
• ar_* — aggregated orderings
22. Experimental evaluation
19 of the most popular CVIs were taken to find out if any of
them matches the real quality of resulting clusters
27. Summarization for binarized adequacy
CVI Adequacy CVI Adequacy
DB 0.151 Symm 0.333
D 0.515 CI 0.121
Sil 0.393 DB* 0.121
CH 0.352 GD31 0.333
SDbw 0.454 GD41 0.435
SF 0.393 GD51 0.393
CS 0.303 GD33 0.352
COP 0.575 GD43 0.272
SV 0.272 GD53 0.333
OS 0.575 threshold 0.7
28. Summarization for weighted adequacy
CVI Adequacy CVI Adequacy
DB 0.121 Symm 0.353
D 0.317 CI 0.073
Sil 0.121 DB* 0.000
CH 0.097 GD31 0.097
SDbw 0.195 GD41 0.097
SF 0.170 GD51 0.121
CS 0.146 GD33 0.073
COP 0.414 GD43 0.035
SV 0.195 GD53 0.035
OS 0.317 threshold 0.7
29. Adequacy of the cluster partitions claimed to be the best by CVIs
CVI # of the best CVI # of the best
DB 4 (9.7%) Symm 5 (12.1%)
D 4 (9.7%) CI 4 (9.7%)
Sil 4 (9.7%) DB* 4 (9.7%)
CH 4 (9.7%) GD31 5 (12.1%)
SDbw 6 (14.6%) GD41 5 (12.1%)
SF 4 (9.7%) GD51 6 (14.6%)
CS 6 (14.6%) GD33 4 (9.7%)
COP 6 (14.6%) GD43 4 (9.7%)
SV 6 (14.6%) GD53 5 (12.1%)
OS 5 (12.1%) threshold 70%
30. Summarization for aggregated ranking
CVI Rank CVI Rank
DB 0.390 Symm 0.292
D 0.353 CI 0.353
Sil 0.170 DB* 0.317
CH 0.073 GD31 0.073
SDbw 0.353 GD41 0.146
SF 0.146 GD51 0.195
CS 0.146 GD33 0.035
COP 0.414 GD43 0.035
SV 0.414 GD53 0.073
OS 0.073 threshold 0.7
31. Results
None of the metrics fits the requirements described above
There is no perfectly applicable, universal metric
CVIs should be chosen for each problem specifically
32. Main disadvantage of the proposed measure evaluation framework
Expensive: each new dataset has to be evaluated by assessors
Improvement:
• Meta-learning
33. Meta-learning approach
Selecting a CVI based on a set of meta-features
Classification task:
• Train on datasets using the framework described above
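One way to picture this (purely illustrative; the slides fix neither the learner nor concrete meta-features): each training dataset is described by meta-features and labeled with the CVI that the assessor-based framework found most adequate for it, and a classifier, here a trivial 1-nearest-neighbour, recommends a CVI for a new dataset:

```python
import math

# Hypothetical training data: per-dataset meta-features (number of
# objects, dimensionality, mean pairwise distance) paired with the CVI
# the assessor-based framework found most adequate for that dataset.
train = [
    ((150, 4, 1.2), "COP"),
    ((500, 10, 3.5), "OS"),
    ((60, 2, 0.4), "GD41"),
]

def recommend_cvi(meta):
    """1-nearest-neighbour classification over meta-feature space:
    recommend the CVI of the most similar training dataset."""
    return min(train, key=lambda t: math.dist(t[0], meta))[1]

# A new dataset closest in meta-feature space to the first entry.
print(recommend_cvi((140, 5, 1.0)))  # 'COP'
```

This avoids asking assessors about the new dataset: only its meta-features are needed at prediction time.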
34. Conclusion
Assessors' estimates should be used as the ground truth (and
as the measure of CVI quality)
No universal cluster validity index exists at the moment
Cluster validity indices should be chosen for each problem
specifically (meta-learning approach)