A lattice-based consensus clustering

A Lattice-based Consensus Clustering Algorithm
Artem Bocharov, Dmitry Gnatyshak, Dmitry Ignatov, Boris Mirkin, Andrey Shestakov
Computer Science Faculty, Dept. of Data Analysis and Artificial Intelligence, HSE, Moscow
The 13th International Conference on Concept Lattices and Their Applications
July 21, 2016
CLA 2016 (HSE) Lattice-Based Consensus clustering 21.07.2016 1 / 45

Objectives
1 Propose lattice-based consensus criteria and algorithms
2 Experimentally compare least-squares consensus clustering results with those of recent algorithms for consensus clustering

Outline
1 Introduction
Consensus Clustering Problem
Existing approaches
2 Least-squares criteria
Combined consensus clustering
Ensemble consensus clustering
3 Lattice-based approach
4 Computational Experiments
Synthetic datasets
Consensus partition evaluation
Results

Outline: 1 Introduction

Clustering results: different partitions
[Régnier, 1965], [Mirkin, 1969]
Figure 1 : Four clusterings of the same dataset

Consensus Problem
Figure 2 : Clustering ensemble (on the left) and consensus clustering result (on the right)

Approaches
Probabilistic
Bayesian Cluster Ensembles [Wang, 2009]
Mixture Model [Topchy, 2004]
Direct
Cumulative Voting [Dimitriadou, 2002], [Ayad, 2010]
Graph Partitioning [Ghosh, 2002]
Consensus matrix (Pairwise Similarity) [Guenoche, 2011]
$A = (a_{ij})$, where $a_{ij}$ is the number of partitions in which objects $y_i$ and $y_j$ are in the same cluster
Least Squares Consensus Clustering [Muchnik, Mirkin, 1981], [Mirkin, 2012]
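The consensus matrix A can be computed by counting co-memberships across the ensemble. A minimal Python sketch, assuming each partition is given as a label vector (the i-th entry is the cluster of object y_i); the function `consensus_matrix` and the toy ensemble are ours, for illustration only.

```python
from typing import List, Sequence


def consensus_matrix(ensemble: Sequence[Sequence[int]]) -> List[List[int]]:
    """Return A = (a_ij), where a_ij counts the partitions that put y_i and y_j together.

    Each partition is a label vector: labels[i] is the cluster of object y_i.
    The diagonal entries a_ii equal the number of partitions T.
    """
    n = len(ensemble[0])
    a = [[0] * n for _ in range(n)]
    for labels in ensemble:
        if len(labels) != n:
            raise ValueError("all partitions must label the same objects")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    a[i][j] += 1
    return a


# Hypothetical ensemble of T = 3 partitions of four objects
ensemble = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
A = consensus_matrix(ensemble)
print(A[0])  # [3, 3, 1, 0]
```

Labels are compared only within one partition, so cluster names need not agree across partitions: the third partition above is a relabelling of the first and contributes to A in exactly the same way.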

Outline: 2 Least-squares criteria

Basic Definitions
Def.1
A partition of a nonempty set A is a set σ = {B | B ⊆ A} of nonempty subsets of A such that
$\bigcup_{B \in \sigma} B = A$ and $B \cap C = \emptyset$ for all distinct $B, C \in \sigma$.
Every element of σ is called a block.
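Def.1 translates directly into a membership test. A small sketch, assuming blocks are given as Python frozensets; following the usual convention it requires blocks to be nonempty and checks disjointness only for distinct blocks. The helper name `is_partition` is hypothetical.

```python
from itertools import combinations
from typing import Iterable, Set


def is_partition(blocks: Iterable[frozenset], ground: Set) -> bool:
    """Check Def.1: nonempty, pairwise disjoint blocks whose union is the ground set."""
    blocks = list(blocks)
    if any(not b for b in blocks):
        return False
    # disjointness is required only for distinct blocks B != C
    if any(b & c for b, c in combinations(blocks, 2)):
        return False
    return set().union(*blocks) == set(ground)


print(is_partition([frozenset({1, 2, 3}), frozenset({4})], {1, 2, 3, 4}))     # True
print(is_partition([frozenset({1, 2}), frozenset({2, 3, 4})], {1, 2, 3, 4}))  # False: overlap
```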

Incidence matrix and projector
Incidence matrix example
σ = {S1, S2, S3}, with cluster labels (y1, y2, y3, y4, y5, y6) = (1, 2, 3, 1, 2, 2) ⇔ Z =
      S1 S2 S3
y1 :  1  0  0
y2 :  0  1  0
y3 :  0  0  1
y4 :  1  0  0
y5 :  0  1  0
y6 :  0  1  0
Projector matrix
$$P_Z = Z(Z^\top Z)^{-1} Z^\top = (p_{ij}), \qquad p_{ij} = \begin{cases} 1/|S_k|, & \text{if } \{y_i, y_j\} \subseteq S_k,\\ 0, & \text{otherwise.} \end{cases}$$
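The incidence matrix and projector of the example can be checked numerically. A minimal NumPy sketch, assuming 0-based cluster indices and that every cluster is nonempty (otherwise $Z^\top Z$ is singular); `incidence_matrix` and `projector` are illustrative names, not from the source.

```python
import numpy as np


def incidence_matrix(labels: np.ndarray, k: int) -> np.ndarray:
    """N x K binary matrix Z with z_ik = 1 iff object y_i belongs to cluster S_k."""
    z = np.zeros((len(labels), k))
    z[np.arange(len(labels)), labels] = 1.0
    return z


def projector(z: np.ndarray) -> np.ndarray:
    """P_Z = Z (Z^T Z)^{-1} Z^T; Z^T Z is diagonal with the cluster sizes |S_k|."""
    return z @ np.linalg.inv(z.T @ z) @ z.T


# The example above: y1..y6 belong to S1, S2, S3, S1, S2, S2 (0-based indices)
labels = np.array([0, 1, 2, 0, 1, 1])
Z = incidence_matrix(labels, 3)
P = projector(Z)
print(np.round(P[1], 3))  # row of y2: 1/3 at y2, y5, y6 (|S2| = 3), zeros elsewhere
```

Because $Z^\top Z$ is diagonal, the inverse simply divides by each cluster size, which is where the $1/|S_k|$ entries come from.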

Two goals for consensus clustering
[Mirkin & Muchnik, 1981], [Mirkin, 2012]
Given partitions R1, R2, ..., RT, find a consensus partition S such that:
Ensemble consensus: S is good for recovering Rt, t = 1, 2, . . . T
Combined consensus: Rt, t = 1, 2, . . . T are good for describing S

Least-squares criteria
Figure 3 : Partitions S, R1, ..., RT result in the corresponding incidence matrices Z, X1, ..., XT
Ensemble consensus
S ⇒ {R1, R2, ..., RT}
⇕
$$E^2 = \sum_{t=1}^{T} \| X_t - P_Z X_t \|^2$$
Combined consensus
{R1, R2, ..., RT} ⇒ S
⇕
$$E^2 = \sum_{u=1}^{T} \| Z - P_u Z \|^2,$$
where $P_u = X_u (X_u^\top X_u)^{-1} X_u^\top$ is the projector matrix of $X_u$.
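Both least-squares criteria can be evaluated for a candidate S directly from the incidence matrices. A NumPy sketch, assuming partitions are label vectors and $P_u$ is the projector of $X_u$; the function names and the three-partition ensemble are hypothetical.

```python
import numpy as np


def incidence(labels) -> np.ndarray:
    """Binary incidence matrix with one column per (nonempty) cluster."""
    labels = np.asarray(labels)
    return (labels[:, None] == np.unique(labels)[None, :]).astype(float)


def projector(x: np.ndarray) -> np.ndarray:
    """Orthogonal projector X (X^T X)^{-1} X^T onto the columns of X."""
    return x @ np.linalg.inv(x.T @ x) @ x.T


def ensemble_error(z: np.ndarray, xs) -> float:
    """Ensemble consensus: E^2 = sum_t ||X_t - P_Z X_t||^2."""
    pz = projector(z)
    return float(sum(np.sum((x - pz @ x) ** 2) for x in xs))


def combined_error(z: np.ndarray, xs) -> float:
    """Combined consensus: E^2 = sum_u ||Z - P_u Z||^2."""
    return float(sum(np.sum((z - projector(x) @ z) ** 2) for x in xs))


ensemble = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]  # hypothetical R_1, R_2, R_3
xs = [incidence(r) for r in ensemble]
Z = incidence([0, 0, 1, 1])                            # candidate consensus S
print(ensemble_error(Z, xs), combined_error(Z, xs))    # approximately 1.0 and 1.333 (= 4/3)
```

Only R_2 disagrees with the candidate S here, so it alone contributes to both errors; the one-cluster candidate `incidence([0, 0, 0, 0])` gives a larger ensemble error (5.5).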

Equivalent reformulations of the least-squares criteria
$$g(S) = \sum_{k=1}^{K} \sum_{i,j \in S_k} \frac{a_{ij}}{|S_k|},$$
where A = (a_ij) is the ensemble consensus matrix of R = {R1, ..., RT}.
$$f(S) = \sum_{k=1}^{K} \sum_{i,j \in S_k} \left( p_{ij} - \frac{T}{N} \right),$$
where P = (p_ij) is the summary projection matrix and N is the number of objects.
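The reformulated criteria are cheap to evaluate once A and P are available. A sketch assuming that the summary projection matrix is the sum $P = \sum_t P_t$ of the partition projectors (our reading of "summary"); the helper names and the toy ensemble are hypothetical.

```python
import numpy as np


def incidence(labels) -> np.ndarray:
    labels = np.asarray(labels)
    return (labels[:, None] == np.unique(labels)[None, :]).astype(float)


def projector(x: np.ndarray) -> np.ndarray:
    return x @ np.linalg.inv(x.T @ x) @ x.T


def g_criterion(s_labels, a: np.ndarray) -> float:
    """g(S) = sum_k sum_{i,j in S_k} a_ij / |S_k|."""
    s = np.asarray(s_labels)
    total = 0.0
    for k in np.unique(s):
        idx = np.flatnonzero(s == k)
        total += a[np.ix_(idx, idx)].sum() / len(idx)
    return total


def f_criterion(s_labels, p: np.ndarray, t: int) -> float:
    """f(S) = sum_k sum_{i,j in S_k} (p_ij - T/N)."""
    s = np.asarray(s_labels)
    n = len(s)
    total = 0.0
    for k in np.unique(s):
        idx = np.flatnonzero(s == k)
        total += (p[np.ix_(idx, idx)] - t / n).sum()
    return total


ensemble = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]  # hypothetical R_1..R_3
xs = [incidence(r) for r in ensemble]
A = sum(x @ x.T for x in xs)         # consensus matrix: co-clustering counts
P = sum(projector(x) for x in xs)    # assumed summary projector P = sum_t P_t
S = [0, 0, 1, 1]
print(g_criterion(S, A), f_criterion(S, P, len(ensemble)))  # 11.0 and approximately 4.667
```

For the ensemble criterion the equivalence is a short expansion: since $P_Z$ is symmetric and idempotent and each row of $X_t$ has a single 1, $\|X_t - P_Z X_t\|^2 = N - \mathrm{tr}(X_t^\top P_Z X_t)$, and summing over t gives $E^2 = TN - \mathrm{tr}(P_Z A) = TN - g(S)$. For the toy ensemble this is 12 - 11 = 1, so minimising $E^2$ is the same as maximising g(S).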

Outline: 3 Lattice-based approach

Basic Definitions
Def.2
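A partition lattice of a set A is an ordered set (Part(A), ∨, ∧), where Part(A) is the set of all possible partitions of A, and for all partitions σ and ρ the supremum and infimum are defined as follows:
$$\sigma \vee \rho = \Big\{ N_\rho(B) \cup \bigcup_{C \in N_\rho(B)} N_\sigma(C) \;\Big|\; B \in \sigma \Big\},$$
$$\sigma \wedge \rho = \{ B \cap C \mid B \in \sigma,\ C \in \rho,\ B \cap C \neq \emptyset \},$$
where $N_\rho(B) = \{ C \mid C \in \rho \text{ and } B \cap C \neq \emptyset \}$ and, analogously, $N_\sigma(C) = \{ B' \mid B' \in \sigma \text{ and } C \cap B' \neq \emptyset \}$.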

Supremum and infimum
Figure 4 : Supremum and infimum of two partitions: for 1235|4678 and 1234|5678, the supremum is 12345678 and the infimum is 123|4|5|678
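The two lattice operations can be sketched in Python on partitions stored as sets of frozensets. As an implementation choice, the join below does not transcribe the neighbourhood formula of Def.2 literally: it repeatedly merges overlapping blocks of σ and ρ until the groups are pairwise disjoint (the connected components of the overlap relation), which is the standard description of the join in a partition lattice. The helpers `meet`, `join` and `parse` are hypothetical names; the printed results reproduce Figure 4.

```python
from itertools import product


def meet(sigma, rho):
    """sigma ∧ rho: all nonempty intersections B ∩ C with B in sigma, C in rho."""
    return {b & c for b, c in product(sigma, rho) if b & c}


def join(sigma, rho):
    """sigma ∨ rho: merge overlapping blocks of sigma and rho, transitively."""
    merged = []                               # invariant: groups are pairwise disjoint
    for block in list(sigma) + list(rho):
        b = set(block)
        for m in [m for m in merged if m & b]:
            merged.remove(m)                  # absorb every group that overlaps b
            b |= m
        merged.append(b)
    return {frozenset(m) for m in merged}


def parse(s):
    """'1235|4678' -> {{1, 2, 3, 5}, {4, 6, 7, 8}} (single-digit elements)."""
    return {frozenset(int(ch) for ch in block) for block in s.split("|")}


sigma, rho = parse("1235|4678"), parse("1234|5678")
print(sorted(sorted(b) for b in meet(sigma, rho)))  # [[1, 2, 3], [4], [5], [6, 7, 8]]
print(sorted(sorted(b) for b in join(sigma, rho)))  # [[1, 2, 3, 4, 5, 6, 7, 8]]
```

The repeated merging matters when overlaps form a chain: for 12|34|56 and 1|23|45|6 the join is the one-block partition 123456, although 1 and 6 are linked only through the chain 12, 23, 34, 45, 56.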

Partition Lattice
Figure 5

Basic Definitions
Def.3
Let A be a set and let ρ, σ ∈ Part(A). The partition ρ is finer than the partition σ, written ρ ≤ σ, if every block B of σ is a union of blocks of ρ.
Equivalently, one can use the traditional connection between supremum, infimum and the partial order in a lattice: ρ ≤ σ iff ρ ∨ σ = σ (equivalently, ρ ∧ σ = ρ).
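Def.3 gives interchangeable tests for ρ ≤ σ. The following sketch compares the block-wise test with the meet-based one on partitions from Figure 4; partitions are sets of frozensets, and the helper names are ours.

```python
from itertools import product


def meet(sigma, rho):
    """sigma ∧ rho: all nonempty intersections of blocks."""
    return {b & c for b, c in product(sigma, rho) if b & c}


def finer(rho, sigma):
    """rho <= sigma: every block of sigma is a union of blocks of rho,
    i.e. every block of rho lies inside some block of sigma."""
    return all(any(c <= b for b in sigma) for c in rho)


def parse(s):
    """'123|4|5|678' -> set of frozensets (single-digit elements)."""
    return {frozenset(int(ch) for ch in block) for block in s.split("|")}


rho, sigma = parse("123|4|5|678"), parse("1234|5678")
print(finer(rho, sigma), meet(rho, sigma) == rho)  # True True
print(finer(sigma, rho))                           # False
```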

Isomorphism of Lattices
Theorem 1 (Ganter&Wille)
For a given partially ordered set P = (P, ≤), the concept lattice of the formal context K = (J(P), M(P), ≤) is isomorphic to the Dedekind–MacNeille completion of P, where J(P) and M(P) are the sets of join-irreducible and meet-irreducible elements of P.
Theorem 2
For a given partition lattice L = (Part(A), ∨, ∧) there exists a formal context K = (P2, A2, I), where P2 = {{a, b} | a, b ∈ A and a ≠ b}, A2 = {σ | σ ∈ Part(A) and |σ| = 2}, and {a, b} I σ iff a and b belong to the same block of σ. The concept lattice B(P2, A2, I) is isomorphic to the initial lattice (Part(A), ∨, ∧).

Isomorphism of Lattices
There is a correspondence between the elements of L = (Part(A), ∨, ∧) and the formal concepts of B(P2, A2, I).
Every (C, D) ∈ B(P2, A2, I) corresponds to σ = ⋀D, and every pair {i, j} from C lies in one of the blocks of σ, where σ ∈ Part(A).
Every (C, D) ∈ B_DM(J(L), M(L), ≤) corresponds to σ = ⋁C = ⋀D.

Concept Lattice
Figure 6 : The diagram of the concept lattice isomorphic to the partition lattice of a four-element set
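Theorem 2 can be checked by brute force for the four-element case of Figure 6. This Python sketch (all names are ours) builds the context (P2, A2, I) for A = {1, 2, 3, 4}, enumerates the extents D' of all attribute subsets D, and counts the concepts; the count should equal the number of partitions of a four-element set, the Bell number 15.

```python
from itertools import combinations

A = [1, 2, 3, 4]
# Objects P2: unordered pairs of distinct elements
P2 = [frozenset(p) for p in combinations(A, 2)]
# Attributes A2: two-block partitions; element 1 is kept in the first block to avoid duplicates
A2 = []
rest = A[1:]
for r in range(len(rest)):              # r < len(rest) keeps the second block nonempty
    for extra in combinations(rest, r):
        first = frozenset((A[0],) + extra)
        A2.append((first, frozenset(A) - first))


def incident(pair, sigma):
    """{a, b} I sigma iff a and b lie in the same block of sigma."""
    return any(pair <= block for block in sigma)


def extent(attrs):
    """D': pairs that are together in every partition of D."""
    return frozenset(p for p in P2 if all(incident(p, s) for s in attrs))


# Every concept extent is D' for some attribute set D, so enumerating all
# 2^|A2| attribute subsets yields exactly the extents of all concepts.
extents = {extent(D) for r in range(len(A2) + 1) for D in combinations(A2, r)}
print(len(P2), len(A2), len(extents))  # 6 7 15
```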

Partition context
Def.4
Let us call $K_R = (G, \bigsqcup_t M_t, I \subseteq G \times \bigsqcup_t M_t)$ a partition context, where G is a set of objects, t = 1, ..., T, and each M_t consists of the labels of all clusters in the t-th k-means partition from the ensemble.
For example, $g I m_{t1}$ means that the object g has been assigned to the first cluster by the t-th clustering algorithm in the ensemble.
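A partition context can be built from an ensemble of label vectors by naming the labels after the partitions (a1, a2, b1, ...), which keeps the label sets M_t disjoint. A sketch with hypothetical helper names; the three partitions a, b, c of objects o1, ..., o8 are made-up illustration data, not data from the source.

```python
from typing import Dict, Sequence, Set


def partition_context(objects: Sequence[str], ensemble: Sequence[Sequence[int]],
                      names: Sequence[str]) -> Dict[str, Set[str]]:
    """Row g' of K_R: label f"{name}{c + 1}" for the cluster c of g in each partition."""
    return {g: {f"{names[t]}{labels[i] + 1}" for t, labels in enumerate(ensemble)}
            for i, g in enumerate(objects)}


def intent(ctx: Dict[str, Set[str]], objs) -> Set[str]:
    """Derivation A -> A': labels shared by every object in objs."""
    all_labels = set().union(*ctx.values())
    return set.intersection(all_labels, *(ctx[g] for g in objs))


def extent(ctx: Dict[str, Set[str]], labels) -> Set[str]:
    """Derivation B -> B': objects carrying every label in labels."""
    return {g for g, row in ctx.items() if set(labels) <= row}


objects = [f"o{i}" for i in range(1, 9)]
ensemble = [[0, 0, 0, 0, 0, 0, 1, 1],   # partition a (hypothetical)
            [0, 0, 0, 1, 1, 0, 1, 1],   # partition b
            [0, 0, 1, 1, 1, 0, 1, 0]]   # partition c
ctx = partition_context(objects, ensemble, "abc")
print(sorted(ctx["o8"]))                   # ['a2', 'b2', 'c1']
print(sorted(extent(ctx, {"a1", "b1"})))   # ['o1', 'o2', 'o3', 'o6']
```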

The idea of the algorithm
Our consensus algorithm looks for S, an antichain of concepts of $K_R$, such that the condition A ∩ C = ∅ holds for every two distinct concepts (A, B), (C, D) ∈ S.
The concept extent A corresponds to one of the resulting clusters, and its intent contains all labels of the ensemble members that voted for the objects from A being in one cluster.
It is a reasonable consensus hypothesis that at least ⌈T/2⌉ ensemble members should vote for a set of objects to be in one cluster.

An example of the algorithm execution
Figure 7 : An example from A. Bocharov’s thesis
The antichain: S = {({o1, o2, o3, o6}, {a1, b1}), ({o4, o5, o7}, {b2, c2})}.
The orphan object: o8, with o8′ = {a2, b2, c1}.
The resulting partition: σ = {{o1, o2, o3, o6}, {o4, o5, o7, o8}}.
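The search for a disjoint antichain can be sketched greedily in Python. This is not the authors' implementation: the largest-extent-first selection, the `min_size=2` filter (without it every object concept, whose intent holds all T labels, would qualify as a singleton cluster) and the rule that an orphan joins the block whose intent shares the most labels with its own intent are all our assumptions. The ensemble is hypothetical, constructed so that the output matches the partition σ of Figure 7; it is not the data from the thesis.

```python
from math import ceil
from typing import Dict, FrozenSet, List, Set, Tuple

Concept = Tuple[FrozenSet[str], FrozenSet[str]]


def partition_context(objects, ensemble, names) -> Dict[str, FrozenSet[str]]:
    """Object intents g' of K_R, with labels such as 'a1' (cluster 1 of partition a)."""
    return {g: frozenset(f"{names[t]}{labels[i] + 1}" for t, labels in enumerate(ensemble))
            for i, g in enumerate(objects)}


def concepts(ctx: Dict[str, FrozenSet[str]]) -> List[Concept]:
    """All concepts with nonempty extent: their intents are intersections of object intents."""
    intents = set(ctx.values())
    while True:
        new = {a & b for a in intents for b in intents} - intents
        if not new:
            break
        intents |= new
    return [(frozenset(g for g, row in ctx.items() if d <= row), d) for d in intents]


def lattice_consensus(ctx, n_partitions, min_votes=None, min_size=2) -> List[Set[str]]:
    """Greedy sketch: pick disjoint extents whose intents carry at least min_votes labels."""
    if min_votes is None:
        min_votes = ceil(n_partitions / 2)          # majority hypothesis: ceil(T/2) votes
    candidates = [(e, d) for e, d in concepts(ctx)
                  if len(d) >= min_votes and len(e) >= min_size]
    candidates.sort(key=lambda c: (-len(c[0]), sorted(c[0])))  # largest extents first
    chosen: List[Concept] = []
    covered: Set[str] = set()
    for e, d in candidates:
        if not e & covered:                         # disjoint nonempty extents form an antichain
            chosen.append((e, d))
            covered |= e
    blocks = [set(e) for e, _ in chosen]
    for g, row in ctx.items():                      # handle orphan objects
        if g in covered:
            continue
        if chosen:                                  # join the block whose intent shares most labels
            best = max(range(len(chosen)), key=lambda k: len(row & chosen[k][1]))
            blocks[best].add(g)
        else:
            blocks.append({g})
    return blocks


objects = [f"o{i}" for i in range(1, 9)]
ensemble = [[0, 0, 0, 0, 0, 0, 1, 1],   # partition a (hypothetical)
            [0, 0, 0, 1, 1, 0, 1, 1],   # partition b
            [0, 0, 1, 1, 1, 0, 1, 0]]   # partition c
ctx = partition_context(objects, ensemble, "abc")
print([sorted(b) for b in lattice_consensus(ctx, n_partitions=3)])
# [['o1', 'o2', 'o3', 'o6'], ['o4', 'o5', 'o7', 'o8']]
```

In this toy run the selected concepts are ({o1, o2, o3, o6}, {a1, b1}) and ({o4, o5, o7}, {b2, c2}); o8 shares b2 with the second intent and nothing with the first, so it joins the second block.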

Perfect Recovery Condition
Theorem 3
In the concept lattice of a partition context $K_R = (G, \bigsqcup_t M_t, I \subseteq G \times \bigsqcup_t M_t)$, there is an antichain of concepts S whose extents A_i coincide with the blocks S_i of λ, the true partition, if and only if $S_i'' = S_i$ for i = 1, ..., K.
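When a true partition is known, the condition of Theorem 3 can be tested by computing the closure S_i'' of each true block in $K_R$. A sketch on a hypothetical ensemble of three partitions a, b, c of objects o1, ..., o8; the helper names are ours.

```python
from typing import Dict, Iterable, Sequence, Set


def partition_context(objects, ensemble, names) -> Dict[str, Set[str]]:
    """Object intents g' with labels such as 'a1' (cluster 1 of partition a)."""
    return {g: {f"{names[t]}{labels[i] + 1}" for t, labels in enumerate(ensemble)}
            for i, g in enumerate(objects)}


def closure(ctx: Dict[str, Set[str]], objs: Iterable[str]) -> Set[str]:
    """S'': objects carrying every label shared by all objects of S."""
    all_labels = set().union(*ctx.values())
    shared = set.intersection(all_labels, *(ctx[g] for g in objs))
    return {g for g, row in ctx.items() if shared <= row}


def perfect_recovery_condition(ctx, true_blocks: Sequence[Set[str]]) -> bool:
    """Theorem 3: an antichain with extents S_1..S_K exists iff S_i'' = S_i for all i."""
    return all(closure(ctx, s) == set(s) for s in true_blocks)


objects = [f"o{i}" for i in range(1, 9)]
ensemble = [[0, 0, 0, 0, 0, 0, 1, 1],   # partition a (hypothetical)
            [0, 0, 0, 1, 1, 0, 1, 1],   # partition b
            [0, 0, 1, 1, 1, 0, 1, 0]]   # partition c
ctx = partition_context(objects, ensemble, "abc")
lam = [{"o1", "o2", "o3", "o6"}, {"o4", "o5", "o7", "o8"}]
print(perfect_recovery_condition(ctx, lam))                                          # True
print(perfect_recovery_condition(ctx, [{"o1", "o2"}, set(objects) - {"o1", "o2"}]))  # False
```

In this toy ensemble the second true block has intent {b2}, a single vote out of three, so a majority threshold would not select its concept even though the recovery condition holds.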

Outline: 4 Computational Experiments

Gaussian cluster generation
Generated partition
300 five-dimensional objects forming three randomly generated spherical Gaussian clusters.
The variance of each cluster lies between 0.1 and 0.3.
The center components are independently generated from N(0, 0.7).
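The generator can be sketched in NumPy. The source does not say whether 0.7 in N(0, 0.7) is a variance or a standard deviation, nor how the 300 objects are split among clusters; the sketch treats 0.7 as a standard deviation and uses (nearly) equal cluster sizes, both assumptions. Function and parameter names are hypothetical.

```python
import numpy as np


def generate_gaussian_clusters(n=300, dim=5, k=3, var_range=(0.1, 0.3),
                               center_sd=0.7, seed=None):
    """Spherical Gaussian clusters; returns the data matrix and the true labels (λ)."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(0.0, center_sd, size=(k, dim))   # assumption: 0.7 is a std. deviation
    variances = rng.uniform(*var_range, size=k)           # one variance per cluster
    sizes = np.full(k, n // k)
    sizes[: n % k] += 1                                   # assumption: (nearly) equal sizes
    x = np.vstack([rng.normal(centers[c], np.sqrt(variances[c]), size=(sizes[c], dim))
                   for c in range(k)])
    labels = np.repeat(np.arange(k), sizes)
    return x, labels


X, truth = generate_gaussian_clusters(seed=0)
print(X.shape, np.bincount(truth))  # (300, 5) [100 100 100]
```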

Dataset generation
Example
Figure 8 : 300 objects, 5 features, 3 classes

Experiments
Let us denote the generated partition by λ, with kλ clusters. The profile of partitions R = {ρ1, ρ2, ..., ρT} for the consensus algorithms is constructed from T runs of the k-means clustering algorithm, each starting from k random centers.
We carry out the experiments in four settings (next slides).
The ensemble size is T = 100 in all our experiments.
We perform 10 runs for each of the 10 generated datasets.
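The ensemble construction can be sketched with a plain Lloyd k-means written in NumPy. Assumptions not taken from the source: "k random centers" is read as k distinct randomly chosen objects, a run stops when the centers no longer move (or after 100 iterations), an empty cluster keeps its previous center, and when a setting lists several values of k (for example k ∈ {2, 3, 4, 5}) each run draws its k uniformly from that set. All names are hypothetical.

```python
import numpy as np


def kmeans(x: np.ndarray, k: int, rng: np.random.Generator, n_iter: int = 100) -> np.ndarray:
    """Plain Lloyd's algorithm started from k randomly chosen objects."""
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # squared distances
        labels = d.argmin(axis=1)
        new = np.array([x[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(k)])                            # empty cluster: keep center
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def build_ensemble(x: np.ndarray, ks, t: int = 100, seed=None):
    """T k-means partitions; each run draws its k from ks (e.g. [5] or range(2, 6))."""
    rng = np.random.default_rng(seed)
    return [kmeans(x, int(rng.choice(list(ks))), rng) for _ in range(t)]
```

For a setting with fixed k = 5 and T = 100, the call would be `build_ensemble(X, ks=[5], t=100)`, where X is a generated data matrix.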

Experiment 1
Investigation of the influence of the number of clusters kλ ∈ {2, 3, 5, 9} under various minimal numbers of votes:
a) two clusters case kλ = 2, k ∈ {2, 3, 4, 5},
b) three clusters case kλ = 3, k ∈ {2, 3},
c) five clusters case kλ = 5, k ∈ {2, 5},
d) nine clusters case kλ = 9, k ∈ {2, 3, 4, 5, 6, 7, 8, 9}.

Experiments 2 & 3
2 Investigation of the number of clusters of the ensemble clusterers with a fixed number of true clusters kλ = 5:
a) k = 2,
b) k ∈ {2, 3, 4, 5},
c) k = 5,
d) k ∈ {5, 6, 7, 8, 9},
e) k = 9;
3 Investigation of the number of objects N ∈ {100, 300, 500, 1000}.

Experiment 4
Comparison with the state-of-the-art algorithms
a) two clusters case kλ = 2, k = 2,
b) three clusters case kλ = 3, k ∈ {2, 3},
c) five clusters case kλ = 5, k ∈ {2, 3, 4, 5},
d) nine clusters case kλ = 9, k ∈ {2, 3, 4, 5, 6, 7, 8, 9}.

Similarity between partitions
ARI measure
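Adjusted Rand Index (Hubert, Arabie 1985)
Given two partitions $\rho_a = \{R^a_1, \ldots, R^a_{k_a}\}$ and $\rho_b = \{R^b_1, \ldots, R^b_{k_b}\}$, where $N^a_h = |R^a_h|$, $N_{hm} = |R^a_h \cap R^b_m|$ and N is the number of objects, let
$$C_a = \sum_h \binom{N^a_h}{2} = \sum_h \frac{N^a_h (N^a_h - 1)}{2},$$
with $C_b$ defined analogously from the block sizes $N^b_m$ of $\rho_b$. Then
$$\varphi_{ARI}(\rho_a, \rho_b) = \frac{\sum_{h,m} \binom{N_{hm}}{2} - C_a C_b \big/ \binom{N}{2}}{\frac{1}{2}(C_a + C_b) - C_a C_b \big/ \binom{N}{2}}$$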
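The formula can be implemented by counting pairs in the contingency table of the two partitions. A minimal Python sketch (the name `ari` is ours); partitions are label vectors over the same N objects, and returning 1.0 when the denominator vanishes (for example, when both partitions put all objects into one cluster) is our convention for that degenerate case.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def ari(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Adjusted Rand Index of two partitions given as label vectors."""
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("partitions must label the same objects")
    sum_hm = sum(comb(v, 2) for v in Counter(zip(labels_a, labels_b)).values())  # sum C(N_hm, 2)
    c_a = sum(comb(v, 2) for v in Counter(labels_a).values())                    # C_a
    c_b = sum(comb(v, 2) for v in Counter(labels_b).values())                    # C_b
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0
    expected = c_a * c_b / total_pairs
    max_index = (c_a + c_b) / 2
    if max_index == expected:   # degenerate case, e.g. both partitions are trivial
        return 1.0
    return (sum_hm - expected) / (max_index - expected)


print(ari([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: same partition, different labels
print(ari([0, 0, 1, 1], [0, 0, 1, 2]))  # 4/7, about 0.571
```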

Algorithms under comparison
AddRemAdd, or ARA (Mirkin 2011; Mirkin and Shestakov, 2013)
Voting Scheme (Dimitriadou, Weingessel and Hornik, 2002)
cVote (Ayad, 2010)
Condorcet and Borda Consensus (Dominguez, Carrie and Pujol, 2008)
Meta-Clustering Algorithm, MCLA (Strehl and Ghosh, 2002)
HyperGraph Partitioning Algorithm, HGPA (Strehl and Ghosh, 2002)
Cluster-based Similarity Partitioning Algorithm, CSPA (Strehl and Ghosh, 2002)

Computational Experiments
Experiment Scheme

Figure 9 : Influence of the minimal voting threshold (x-axis, 0% to 70%) on ARI (y-axis) for two, three, five and nine true clusters

Figure 10 : ARI (y-axis) on datasets 1 to 10 (x-axis) for different numbers of clusters of the ensemble clusterers (k = 2, 2 to 5, 5, 5 to 9, 9) with kλ = 5 (each point is averaged over 10 runs)

Figure 11 : Influence of the number of objects (N = 100, 300, 500, 1000) on ARI (y-axis) across datasets 1 to 10 (x-axis)

Legend: Lattice, ARA, Borda, MCLA, CSPA, HGPA, Condorcet, CVote, Vote (x-axis: dataset no. 1 to 10; y-axis: ARI)
Figure 12 : Two clusters
Figure 13 : Three clusters

Legend: Lattice, ARA, Borda, MCLA, CSPA, HGPA, Condorcet, CVote, Vote (x-axis: dataset no. 1 to 10; y-axis: ARI)
Figure 14 : Five clusters
Figure 15 : Nine clusters

Conclusion
The optimal voting threshold, in terms of the minimal intent size for the resulting antichain of concepts, is not constant; moreover, it is not always the majority of the ensemble members' votes.
Our FCA-based consensus clustering method works better if the number of blocks for the ensemble clusterers is set equal to the size of the original (true) partition.
ARI depends on the number of objects: the higher the number, the lower the ARI.
For two true clusters (and for three in almost all cases), our method beats the compared algorithms, and in some cases the consensus clustering task was solved with 100% accuracy.
For larger numbers of clusters, our method ranks in the middle of the compared methods.

Future prospects
■ Upper and lower semi-lattices in the Pattern Structures framework
(Ganter, Kuznetsov, 2001) as a search space.
■ Experiments with real data and applications.

Thank you!

References
B. Mirkin. Clustering: A Data Recovery Approach, 2012.
E. Dimitriadou, A. Weingessel and K. Hornik. A Combination Scheme for Fuzzy Clustering. International Journal of Pattern Recognition and Artificial Intelligence, 2002.
H. Ayad, M. Kamel. On voting-based consensus of cluster ensembles. Pattern Recognition, pp. 1943-1953, 2010.
A. Guenoche. Consensus of partitions: a constructive approach. Advances in Data Analysis and Classification 5, pp. 215-229, 2011.
X. Sevillano Dominguez, J. C. Socoro Carrie and F. Alias Pujol. Fuzzy clusterers combination by positional voting for robust document clustering. Procesamiento del lenguaje natural, 43, pp. 245-253.
A. Strehl, J. Ghosh. Cluster ensembles – a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 2002.
