Presentation about the clustering algorithm gSkeletonClu.
Huang, J., Sun, H., Song, Q., Deng, H., & Han, J. (2013). Revealing density-based clustering structure from the core-connected tree of a network. IEEE Transactions on Knowledge and Data Engineering, 25(8), 1876–1889. http://doi.org/10.1109/TKDE.2012.100
gSkeletonClu - Revealing density-based clustering structure from the core-connected tree of a network
1. gSkeletonClu [1]
Revealing density-based clustering structure from the core-connected
tree of a network
[1]Huang, J., Sun, H., Song, Q., Deng, H., & Han, J. (2013). Revealing density-based clustering structure from the core-connected tree of a
network. IEEE Transactions on Knowledge and Data Engineering, 25(8), 1876–1889. http://doi.org/10.1109/TKDE.2012.100
http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6200274&url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F69%2F4358933%2F06200274.pdf%3Farnumber%3D6200274
3. Overview
Given a weighted network…
1 - Compute its CCMST using the Core-Connectivity Similarity
2 - Find the Structure Core-Connected components
● Components that contain the cores
3 - Attach the vertices classified as borders
4 - Identify the hubs and outliers
10. Def6) Hubs and Outliers
Let h be a vertex that does not belong to any cluster.
If h bridges multiple clusters, i.e. there are vertices u and v in different clusters such that:
h ∈ Γ(u) ∧ h ∈ Γ(v)
then h is a hub.
Otherwise:
h is an outlier.
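A minimal sketch of this definition in Python, assuming the clusters are already known. The names `adj`, `cluster_of` and `classify_unclustered`, and the toy graph, are invented for illustration and do not come from the paper or the slides.

```python
def classify_unclustered(adj, cluster_of):
    """Label every vertex that has no cluster as 'hub' or 'outlier'.

    adj        : dict vertex -> set of adjacent vertices (assumed input format)
    cluster_of : dict vertex -> cluster id, only for clustered vertices
    """
    labels = {}
    for h, neighbours in adj.items():
        if h in cluster_of:
            continue  # h belongs to a cluster, so it is neither hub nor outlier
        # Clusters that h touches through at least one neighbour.
        touched = {cluster_of[n] for n in neighbours if n in cluster_of}
        # Bridging two or more different clusters makes h a hub.
        labels[h] = "hub" if len(touched) >= 2 else "outlier"
    return labels


# Hypothetical toy graph: clusters A = {a1, a2} and B = {b1, b2}.
adj = {
    "a1": {"a2", "h"}, "a2": {"a1"},
    "b1": {"b2", "h"}, "b2": {"b1", "o"},
    "h": {"a1", "b1"},   # adjacent to both clusters
    "o": {"b2"},         # adjacent to one cluster only
}
cluster_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
labels = classify_unclustered(adj, cluster_of)
print(labels)  # {'h': 'hub', 'o': 'outlier'}
```

The only design choice is counting distinct clusters among the neighbours, which is the "bridges multiple clusters" test.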
17. Def10) Structure Core-Connected
Given ε ∈ ℝ, μ ∈ ℕ;
u, v ∈ V;
u and v are directly core-connected with each other if and only if:
● Kε,μ(u) ∧ Kε,μ(v) ∧ u ⟼ε,μ v
(both u and v are cores, and v is directly structure reachable from u)
This is denoted by:
u ⟷ε,μ v
gSkeletonClu first finds the structures that satisfy the definition above, then
attaches the "borders" (vertices that are "directly structure reachable" but
do not satisfy the definition). Finally, gSkeletonClu separates the
clusters, hubs and outliers.
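A small Python sketch of the test above. It assumes the SCAN-style definitions: closed neighbourhoods, the unweighted structural similarity σ, and "core" meaning at least μ vertices in the ε-neighbourhood. The paper works on weighted networks, so treat the similarity function here as a stand-in; all function names and the toy graph are hypothetical.

```python
import math


def closed_nbrs(adj, u):
    """Neighbourhood of u including u itself (assumption: SCAN convention)."""
    return adj[u] | {u}


def sigma(adj, u, v):
    """Unweighted structural similarity (assumed stand-in for the paper's measure)."""
    gu, gv = closed_nbrs(adj, u), closed_nbrs(adj, v)
    return len(gu & gv) / math.sqrt(len(gu) * len(gv))


def eps_neighbourhood(adj, u, eps):
    return {v for v in closed_nbrs(adj, u) if sigma(adj, u, v) >= eps}


def is_core(adj, u, eps, mu):
    """K(u): u has at least mu vertices in its eps-neighbourhood."""
    return len(eps_neighbourhood(adj, u, eps)) >= mu


def directly_reachable(adj, u, v, eps, mu):
    """u -> v: u is a core and v lies in its eps-neighbourhood."""
    return is_core(adj, u, eps, mu) and v in eps_neighbourhood(adj, u, eps)


def directly_core_connected(adj, u, v, eps, mu):
    """u <-> v: both are cores and v is directly reachable from u."""
    return (is_core(adj, u, eps, mu) and is_core(adj, v, eps, mu)
            and directly_reachable(adj, u, v, eps, mu))


# Hypothetical toy graph: triangle a-b-c with a pendant vertex d on c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(directly_core_connected(adj, "a", "c", eps=0.7, mu=3))  # True
print(directly_core_connected(adj, "c", "d", eps=0.7, mu=3))  # False: d is not a core
print(directly_reachable(adj, "c", "d", eps=0.7, mu=3))       # True: d behaves like a border
```

In this toy graph d is reachable from the core c but is not a core itself, which is the "border" situation described above.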
18. CCMST - Core-Connected Maximal Spanning Tree
Instead of using the complete network, the authors proved that it is possible to
identify the Structure Core-Connected components from the CCMST alone, taking
"CCS(u,v)" as the edge weight.
Ɛ-Candidates:
● 0.51
● 0.47
● 0.43
● 0.08
● 0
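A rough Python sketch of the idea, assuming CCS(u,v) has already been computed for every edge. It builds a maximal spanning tree with Kruskal's algorithm (heaviest edge first), reads the Ɛ candidates off the tree weights, and cuts the tree at a chosen Ɛ. The edge list is a made-up toy example (weights borrowed from the list above), and the function names are hypothetical, not the authors' implementation.

```python
def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def maximal_spanning_tree(vertices, edges):
    """Kruskal on (u, v, weight) triples, taking the heaviest edges first."""
    parent = {v: v for v in vertices}
    tree = []
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:          # edge joins two different components: keep it
            parent[ru] = rv
            tree.append((u, v, w))
    return tree


def epsilon_candidates(tree):
    """Distinct tree weights, largest first."""
    return sorted({w for _, _, w in tree}, reverse=True)


def components_at(vertices, tree, eps):
    """Groups of vertices still linked after dropping tree edges with weight < eps."""
    parent = {v: v for v in vertices}
    for u, v, w in tree:
        if w >= eps:
            parent[find(parent, u)] = find(parent, v)
    groups = {}
    for v in vertices:
        groups.setdefault(find(parent, v), set()).add(v)
    # Sketch simplification: isolated vertices are left out of the result.
    return [g for g in groups.values() if len(g) > 1]


# Hypothetical edges with assumed, precomputed CCS weights.
vertices = ["a", "b", "c", "d", "e"]
edges = [("a", "b", 0.51), ("b", "c", 0.47), ("a", "c", 0.43),
         ("c", "d", 0.08), ("d", "e", 0.47)]
tree = maximal_spanning_tree(vertices, edges)
print(epsilon_candidates(tree))                                       # [0.51, 0.47, 0.08]
print(sorted(sorted(g) for g in components_at(vertices, tree, 0.47)))  # [['a', 'b', 'c'], ['d', 'e']]
```

Note that the 0.43 edge never enters the tree, because a and c are already connected through heavier edges, so it does not show up as a candidate in this toy example.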
26. Step 3B - Fire again!
Attach the borders!
Ɛ = 0.47
27. Step 3C - Kill it, before it kills you!
Detect Cluster, hubs and outliers
n0 is a hub because:
● n0 does not belong to any cluster
● n0 bridges the clusters A and B.
n7 is an outlier because:
● it is not a hub =(
28. Results - Guard the guns… You are the winner!
(or just a survivor...)
29. Clustering of Automatically Selected Ɛ
If you have the Ɛ candidates extracted from the CCMST…
AND...
If you adopt a way to measure what is the best Ɛ...
Then, you can automatically select the Ɛ parameter.
One possible choice is to use the modularity Q as a quality measure of network
clustering. Q is at most 1 (and can be negative for a poor partition); the closer the
value is to 1, the better the clustering result.
In a nutshell… Run gSkeletonClu for every Ɛ candidate and, based on a
quality index, choose the best partition!
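A hedged Python sketch of that selection loop, using the standard Newman modularity for an unweighted graph. It assumes the partition for each Ɛ candidate has already been produced and that every vertex carries some label; how hubs and outliers should be labelled for this purpose is not covered here. The graph, the partitions and the function names are hypothetical.

```python
from collections import defaultdict


def modularity(edges, label):
    """Q = sum over clusters of (intra-cluster edges / m) - (cluster degree / 2m)^2."""
    m = len(edges)
    intra = defaultdict(int)    # edges with both endpoints in the cluster
    degree = defaultdict(int)   # total degree of the cluster's vertices
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            intra[label[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def best_epsilon(edges, partitions):
    """partitions: dict eps -> {vertex: cluster label}; returns the eps with highest Q."""
    return max(partitions, key=lambda eps: modularity(edges, partitions[eps]))


# Hypothetical graph: two triangles joined by the single edge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
partitions = {
    0.47: {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1},  # two clusters
    0.08: {v: 0 for v in "abcdef"},                          # everything merged
}
print(best_epsilon(edges, partitions))  # 0.47
```

Here the two-triangle partition scores Q = 5/14 and the fully merged one scores 0, so the first candidate wins.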
30. Did you like it?
There is more!
From the CCMST it is possible to extract the clustering hierarchy… (next opportunity)
Limitations
● gSkeletonClu can only be applied to networks!
● In the gSkeletonClu paper, the authors' tests show that it is slower than
SCAN…
● It may not work on BIG networks (more than 1 million vertices).
○ The SCAN++ authors (Shiokawa, 2015) [1][2] ran tests on BIG networks and could not
run gSkeletonClu on them…
Have fun!
[1] http://www.vldb.org/pvldb/vol8/p1178-shiokawa.pdf
[2] http://pt.slideshare.net/LazyShion/scan-efficient-algorithm-for-finding-clusters-hubs-and-outliers-on-largescale-graphs-vldb-2015