Consensus Clustering presentation slides by Arghadip Chakraborty
1. Consensus Clustering
Name: Arghadip Chakraborty
College: Netaji Subhash Engineering College
Stream: Computer Science & Engineering
Section: B Year: 3rd year Class Roll: 91
Univ. Roll: 10900117106
2. What is Clustering?
➢ Grouping objects into clusters
according to their similarity.
➢ Unsupervised learning method.
➢ E.g., K-means, K-prototypes, fuzzy C-means,
DBSCAN, etc.
3. So.. what is Consensus Clustering?
➢ Consensus means ‘General Agreement’.
➢ Combining multiple clusterings (partitions) of the same data into a single,
more stable clustering that is typically better than the individual inputs.
➢ The process is done by generating a consensus matrix at each level.
4. Workflow of Consensus Clustering
[Figure: Cluster 1, Cluster 2, …, Cluster N feed into Consensus Building] [2]
5. But...why Consensus Clustering?
➢ Better quality and robustness of the clusters.
➢ Helps estimate the appropriate number of clusters.
➢ Better handling of missing data.
➢ Individual partitions can be obtained independently.
6. Process of Consensus Clustering
Consensus Clustering is based on two steps:
➢ Partition Generation.
➢ Consensus Generation.
[1]
7. Partition Generation Process
Partitions can be generated by:
➢ Using different subsets of attributes.
➢ Applying different clustering algorithms with different biases.
➢ Using different parameters for clustering.
➢ Using random sub-samples of the dataset.
[2]
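A minimal sketch of two of the strategies above (different parameters and random sub-samples), assuming 2-D points stored as tuples. The helper names (`kmeans`, `nearest`, `generate_partitions`) and the default values are illustrative assumptions, not taken from the slides; the tiny k-means is a stand-in for any base clustering algorithm.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))

def mean(group):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(group)
    return tuple(sum(coord) / n for coord in zip(*group))

def kmeans(points, k, rng, iters=20):
    """Very small Lloyd-style k-means; returns the final centroids."""
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Keep the old centroid when a group ends up empty.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids

def generate_partitions(points, n_partitions=10, ks=(2, 3), sample_frac=0.8, seed=0):
    """Build several partitions by varying k and the random sub-sample."""
    rng = random.Random(seed)
    partitions = []
    for _ in range(n_partitions):
        k = rng.choice(ks)                       # different parameter per run
        size = max(k, int(sample_frac * len(points)))
        sample = rng.sample(points, size)        # random sub-sample of the data
        centroids = kmeans(sample, k, rng)
        # Label every point by its nearest centroid so that each
        # partition covers the full dataset.
        partitions.append([nearest(p, centroids) for p in points])
    return partitions
```

Calling `generate_partitions(points, n_partitions=5)` on a list of 2-D tuples returns five label lists, one per run, each as long as `points`. Varying the attribute subset or the base algorithm would follow the same pattern but is not shown here.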
8. Consensus Generation Process
Consensus is generally generated using one of two approaches:
➢ Median partitioning based approach.
➢ Co-occurrence based approach.
○ Relabeling/Voting based method.
○ Co-association matrix based method.
○ Graph based method.
9. Median Partitioning approach
Given a set of partitions P = {P1, P2, …, Pn} of all the data points and
a similarity function f(Pi, Pj), the median partition Pc is the
partition that maximizes the total similarity to the set:
Pc = argmax over P of [f(P, P1) + f(P, P2) + … + f(P, Pn)].
The similarity function depends on how often two partitions agree or disagree
on the grouping of data points, measured by e.g. the F-measure or the Rand index.
[2]
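The definition above can be sketched in a few lines, assuming the Rand index as the similarity function f. As a simplifying assumption, the search is restricted to the input partitions themselves, since searching over all possible partitions is generally too expensive; the function names are illustrative.

```python
from itertools import combinations

def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree
    (both put the pair together, or both keep it apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

def median_partition(partitions, similarity=rand_index):
    """Return the input partition with the largest total similarity
    to all input partitions (an approximation of the true median)."""
    return max(partitions, key=lambda p: sum(similarity(p, q) for q in partitions))

partitions = [
    [0, 0, 1, 1, 2, 2],
    [1, 1, 0, 0, 2, 2],  # same grouping as the first, labels permuted
    [0, 0, 0, 1, 1, 1],
]
print(median_partition(partitions))  # [0, 0, 1, 1, 2, 2]
```

The first two partitions describe the same grouping, so their Rand index is 1.0 and either one maximizes the total similarity; `max` returns the first.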
10. Co-occurrence based approach
1. Relabeling/Voting method (Algorithm):
STEP 1: Generate the clusters.
STEP 2: Determine the correspondence with the current consensus.
STEP 3: Each instance receives a vote from each cluster assignment.
STEP 4: Update the consensus and the cluster assignments accordingly.
[2]
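A minimal sketch of the four steps above, under two assumptions not stated in the slides: every partition uses the same number of clusters k with labels 0..k-1, and the first partition serves as the initial consensus. Label correspondence is found by brute force over all k! permutations, which is only practical for small k.

```python
from collections import Counter
from itertools import permutations

def relabel(reference, labels, k):
    """Permute the labels of `labels` so they match `reference` as often as possible."""
    best = max(
        permutations(range(k)),
        key=lambda perm: sum(perm[l] == r for l, r in zip(labels, reference)),
    )
    return [best[l] for l in labels]

def voting_consensus(partitions, k):
    """Iteratively align each partition with the current consensus and vote."""
    n = len(partitions[0])
    votes = [Counter() for _ in range(n)]
    consensus = list(partitions[0])              # assumed initial consensus
    for labels in partitions:                    # STEP 1: partitions are given
        aligned = relabel(consensus, labels, k)  # STEP 2: correspondence
        for i, lab in enumerate(aligned):        # STEP 3: one vote per instance
            votes[i][lab] += 1
        # STEP 4: the consensus label is the current majority vote.
        consensus = [v.most_common(1)[0][0] for v in votes]
    return consensus

partitions = [
    [0, 0, 1, 1, 2, 2],
    [1, 1, 2, 2, 0, 0],  # same grouping, different label names
    [0, 0, 1, 2, 2, 2],  # disagrees on the fourth instance
]
print(voting_consensus(partitions, k=3))  # [0, 0, 1, 1, 2, 2]
```

The fourth instance receives two votes for cluster 1 and one vote for cluster 2, so the majority keeps it in cluster 1.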
11. Co-occurrence based approach
2. Co-association matrix method (Algorithm):
STEP 1: Generate the clusters.
STEP 2: Build the co-association matrix, whose entry (i, j) records how often data points i and j fall in the same cluster.
STEP 3: Apply hierarchical clustering.
STEP 4: Update the clusters.
[2]
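The steps above can be sketched as follows. As an assumption, the hierarchical step is single linkage cut at a fixed threshold, which is equivalent to taking connected components of the pairs whose co-association exceeds that threshold; the threshold of 0.5 and the function names are illustrative choices.

```python
def co_association(partitions):
    """Entry (i, j) = fraction of partitions that put i and j in the same cluster."""
    n = len(partitions[0])
    m = len(partitions)
    return [
        [sum(p[i] == p[j] for p in partitions) / m for j in range(n)]
        for i in range(n)
    ]

def single_link_clusters(matrix, threshold=0.5):
    """Single-linkage clusters at a fixed cut: merge every pair of points
    whose co-association is above the threshold (union-find)."""
    n = len(matrix)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] > threshold:
                parent[find(i)] = find(j)

    # Rename the roots to consecutive labels 0, 1, 2, ...
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

partitions = [
    [0, 0, 1, 1, 2, 2],
    [1, 1, 2, 2, 0, 0],
    [0, 0, 1, 2, 2, 2],
]
matrix = co_association(partitions)
print(single_link_clusters(matrix))  # [0, 0, 1, 1, 2, 2]
```

Because the matrix only records whether two points share a cluster, the differing label names in the second partition do not matter, so no relabeling step is needed.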
13. Co-occurrence based approach
3. Graph based method (Algorithm):
STEP 1: Generate a weighted graph to represent the multiple clusterings.
STEP 2: Find optimal partition by minimizing the graph cut.
[4]
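A toy sketch of the two steps above. Two assumptions are made here that the slides do not specify: edge weights count how many partitions place two points together, and the cut objective is the ratio cut, cut(A, B) * (1/|A| + 1/|B|), so that the trivial split isolating a single point is penalized. The search is brute force over all 2-way splits, which is only feasible for very small graphs; practical graph partitioners rely on heuristics instead.

```python
from itertools import combinations

def co_occurrence_graph(partitions):
    """Edge weight (i, j) = number of partitions placing i and j in the same cluster."""
    n = len(partitions[0])
    return {
        (i, j): sum(p[i] == p[j] for p in partitions)
        for i, j in combinations(range(n), 2)
    }

def min_ratio_cut(weights, n):
    """Brute-force the 2-way split minimizing cut(A, B) * (1/|A| + 1/|B|)."""
    best_score, best_side = None, None
    for size in range(1, n // 2 + 1):
        for side in combinations(range(n), size):
            a = set(side)
            # Total weight of edges with exactly one endpoint in A.
            cut = sum(w for (i, j), w in weights.items() if (i in a) != (j in a))
            score = cut * (1 / len(a) + 1 / (n - len(a)))
            if best_score is None or score < best_score:
                best_score, best_side = score, a
    return [0 if i in best_side else 1 for i in range(n)]

partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],  # disagrees on the third point
]
graph = co_occurrence_graph(partitions)
print(min_ratio_cut(graph, n=6))  # [0, 0, 0, 1, 1, 1]
```

Splitting the points into {0, 1, 2} and {3, 4, 5} cuts only the three weight-1 edges created by the third partition, which gives the lowest ratio-cut score in this example.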