Hierarchical clustering techniques

CS306 Presentation
Presented By:
Md Syed Ahamad
Yanshul Sharma

Outline and Reference
▪ Outline
– Introduction
– Types and example
– Selected research papers
– Experiments on selected datasets
▪ Reference
– Introduction to the Hierarchical Clustering, Online Edition © 2009 Cambridge UP.
– Elio Masciari, Giuseppe Mazzeo and Carlo Zaniolo: A New, Fast and Accurate Algorithm for Hierarchical Clustering on Euclidean Distances. PAKDD (2) 2013: 111-122.
– Steinbach, M., Karypis, G., Kumar, V.: A Comparison of Document Clustering Techniques. University of Minnesota.

Introduction
▪ Hierarchical clustering – organizes the given data into a hierarchy of clusters.
– Structured, and more informative than flat clustering.
– Deterministic, but less efficient than flat clustering.
– Preferred when one of the potential problems of flat clustering is a concern.
▪ Flat clustering techniques are chosen mainly for their efficiency.
▪ Types
– Agglomerative clustering – bottom up
– Divisive clustering – top down
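
The bottom-up (agglomerative) idea can be illustrated with a minimal sketch. It assumes single linkage (distance between the closest members) as the merge criterion, which is only one of several possible choices; the function name `single_linkage` and the sample points are hypothetical and not taken from the presentation.

```python
from itertools import combinations
from math import dist


def single_linkage(points, num_clusters):
    """Bottom-up clustering: repeatedly merge the two closest clusters."""
    # Start with every point in its own cluster (clusters hold point indices).
    clusters = [[i] for i in range(len(points))]
    merges = []  # merge history, i.e. the hierarchy in list form

    def linkage(a, b):
        # Single linkage (an assumption): distance of the closest pair of members.
        return min(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > num_clusters:
        # Pick the pair of clusters with the smallest linkage distance.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        merges.append((clusters[a], clusters[b]))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters, merges


# Hypothetical sample data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
clusters, merges = single_linkage(points, num_clusters=3)
```

A divisive (top-down) method would run in the opposite direction: start from one cluster holding all points and split it recursively.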

Hierarchical clustering types
[ Src: http://www.saedsayad.com/images/Clustering_h1.png ]

Example
[ Src: http://tangibleauditoryinterfaces.de/wp-content/uploads/2010/04/durcheinander-cluster-chart.png ]

Selected papers
▪ The paper proposes a new algorithm called CLUBS.
▪ CLUBS – Clustering Using Binary Splitting.
– Faster than existing algorithms.
– More accurate, robust and impervious to noise.
– Works in a completely unsupervised fashion.
– Also works for density-based clustering.
– Can be used to refine the results of other algorithms.
▪ The popular k-means algorithm has problems with the repeatability of its results.
– CLUBS overcomes this problem.
Elio Masciari, Giuseppe Mazzeo and Carlo Zaniolo: A New, Fast and Accurate Algorithm for Hierarchical Clustering
on Euclidean Distances. PAKDD (2) 2013: 111-122.

Approach
▪ CLUBS has two phases
– Divisive – the original data set is split recursively into mini-clusters through binary splitting.
▪ May produce non-optimal splits.
– Agglomerative – the final mini-clusters are recursively combined into the final result.
▪ Backtracks on splits that turned out to be wrong.
▪ The algorithm exploits SSQ (sum of squares) to minimize the cost of the split operation.
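
As a rough illustration of an SSQ-driven binary split, the sketch below tries every axis-parallel cut of a small point set and keeps the one with the lowest total SSQ. It is a brute-force sketch under stated assumptions, not the CLUBS implementation from the paper: the function names `ssq` and `best_binary_split` and the sample data are hypothetical, and the paper's fast computation is not reproduced.

```python
def ssq(points):
    """Sum of squared distances from each point to the centroid."""
    if not points:
        return 0.0
    n, d = len(points), len(points[0])
    centre = [sum(p[i] for p in points) / n for i in range(d)]
    return sum((p[i] - centre[i]) ** 2 for p in points for i in range(d))


def best_binary_split(points):
    """Try every axis-parallel cut; return the one with the lowest total SSQ."""
    best = None  # (total_ssq, dimension, position, low_part, high_part)
    for i in range(len(points[0])):
        # Candidate positions: every distinct coordinate except the largest,
        # so that both halves are non-empty.
        for x in sorted({p[i] for p in points})[:-1]:
            low = [p for p in points if p[i] <= x]
            high = [p for p in points if p[i] > x]
            total = ssq(low) + ssq(high)
            if best is None or total < best[0]:
                best = (total, i, x, low, high)
    return best


# Hypothetical sample data: two groups separated along dimension 0.
data = [(1, 1), (2, 1), (1, 2), (9, 1), (10, 2), (9, 2)]
total, dim, pos, low, high = best_binary_split(data)
reduction = ssq(data) - total  # how much SSQ the split removes
```

Applying the same step recursively to `low` and `high` gives the divisive phase in outline; a stopping rule decides when a block is no longer worth splitting.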

Algorithm
▪ Phase 1:
▪ Definition 1 – binary partition (BP).
– $D$: a $d$-dimensional data distribution (a multi-dimensional array of integers).
– $N$: the number of non-zero entries of $D$.
– $\rho_i$: a range $[l..u]$ on the $i$-th dimension of $D$, with $1 \le l \le u \le n$ and $1 \le i \le d$; $size(\rho_i) = ub(\rho_i) - lb(\rho_i) + 1 = u - l + 1$.
– A block $b$ (of $D$) is a $d$-tuple $\{\rho_1, \dots, \rho_d\}$, with $vol(b) = size(\rho_1) \times \dots \times size(\rho_d)$.
– A point $x = (x_1, \dots, x_d)$ is chosen such that $lb(\rho_i) \le x_i \le ub(\rho_i)$.
– $x_i$ divides the range $\rho_i$ of $b$ into $\rho_i^{low} = [lb(\rho_i)..x_i]$ and $\rho_i^{high} = [(x_i + 1)..ub(\rho_i)]$, thus partitioning $b$ into $b^{low} = \{\rho_1, \dots, \rho_i^{low}, \dots, \rho_d\}$ and $b^{high} = \{\rho_1, \dots, \rho_i^{high}, \dots, \rho_d\}$.
– $(b^{low}, b^{high})$ is the binary split, $i$ is the splitting dimension and $x_i$ is the splitting position.

Algorithm
▪ Definition 2 – stopping condition of BP
– $C_s$: a cluster; $S = (S_1, \dots, S_d) = \sum_{p \in C_s} p$ is a vector, where $p$ is a point. The centre of $C_s$ is $C_s^0 = S/N$, and $Q_i = \sum_{p \in C_s} p_i \times p_i$.
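
The sums in Definition 2 are useful because the centre and the SSQ of a cluster can both be recovered from them without revisiting the points, using the identity $\sum_{p}(p_i - c_i)^2 = Q_i - S_i^2/N$. The sketch below shows this, assuming that $N$ here is the number of points in the cluster; the function names and the sample cluster are hypothetical.

```python
def cluster_summary(points):
    """Return (N, S, Q): point count, per-dimension sum, per-dimension sum of squares."""
    n, d = len(points), len(points[0])
    s = [sum(p[i] for p in points) for i in range(d)]
    q = [sum(p[i] * p[i] for p in points) for i in range(d)]
    return n, s, q


def centre_from_summary(n, s):
    """Centre of the cluster, S / N, computed per dimension."""
    return [si / n for si in s]


def ssq_from_summary(n, s, q):
    """SSQ of the cluster, using sum_i (Q_i - S_i^2 / N)."""
    return sum(qi - si * si / n for si, qi in zip(s, q))


# Hypothetical sample cluster.
cluster = [(1, 2), (3, 4), (5, 6)]
n, s, q = cluster_summary(cluster)
centre = centre_from_summary(n, s)
cluster_ssq = ssq_from_summary(n, s, q)
```

Because $N$, $S$ and $Q$ are plain sums, the summary of a merged or split cluster follows by adding or subtracting the summaries of its parts.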

Algorithm
– Binary splitting stops when $avgSSQ > \Delta SSQ$, which yields $n'$ mini-clusters, where $avgSSQ = SSQ_0/n$ and $\Delta SSQ$ is the overall reduction of SSQ.
▪ Phase 2:
– The $n'$ mini-clusters are merged greedily, choosing the best pair at each step.
– Merging continues until the increase in SSQ is greater than $avg\Delta SSQ$.
– The clusters that remain are the final result.
▪ Complexity – $O(n \cdot d \cdot l \cdot s)$
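
The greedy merging of Phase 2 can be sketched as follows. Each mini-cluster is represented only by its size and centroid, and the cost of a merge is the resulting increase in SSQ, which for two clusters equals $\frac{n_a n_b}{n_a + n_b}\lVert c_a - c_b \rVert^2$. This is a sketch under stated assumptions rather than the paper's implementation: the names `merge_cost` and `greedy_merge`, the `threshold` parameter (standing in for the average SSQ-reduction threshold of the stopping rule) and the sample mini-clusters are all hypothetical.

```python
from itertools import combinations


def merge_cost(a, b):
    """Increase in SSQ caused by merging clusters a and b, each a (size, centroid) pair."""
    (na, ca), (nb, cb) = a, b
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return na * nb / (na + nb) * sq_dist


def greedy_merge(clusters, threshold):
    """Merge the cheapest pair repeatedly until the cheapest merge exceeds the threshold."""
    clusters = list(clusters)
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]))
        if merge_cost(clusters[i], clusters[j]) > threshold:
            break  # every remaining merge would raise SSQ too much
        (na, ca), (nb, cb) = clusters[i], clusters[j]
        n = na + nb
        # Centroid of the merged cluster is the size-weighted mean.
        merged = (n, tuple((na * x + nb * y) / n for x, y in zip(ca, cb)))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# Hypothetical mini-clusters: (size, centroid).
mini = [(3, (1.0, 1.0)), (2, (1.5, 1.0)), (4, (9.0, 9.0)), (1, (9.0, 8.0))]
final = greedy_merge(mini, threshold=5.0)
```

With this sample input the two nearby pairs are merged and the two resulting clusters stay apart, since joining them would raise SSQ far beyond the threshold.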

Experiment
– Dataset 1 – 42 patients in 3 groups (RM, HN, PM); 98 differentially expressed genes were selected and analysed.
– Dataset 2 – samples extracted from human breast cancer cells, consisting of four cell groups, were analysed.
$E_k$ = error calculated at 10 clusters
ε = probability that two similar data points belong to the same cluster
$Q_k$ = average percentage of points in the k-neighbourhood of a generic point that belong to the same class as that point

Thank You
