2. Maximal Cliques
• Input
  • Undirected graph $G = (V, E)$
• Maximal cliques
  • Clique: vertex set of a complete subgraph
  • Maximal: adding any other vertex would break the clique
[Figure: example graph on vertices a to g]
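The two definitions above can be checked directly in code. The following is a minimal sketch, assuming the graph is stored as a dictionary from each vertex to its neighbour set; the helper names and the toy graph are hypothetical and are not the graph from the slide figure.

```python
from itertools import combinations

def is_clique(adj, vertices):
    """True if every pair of the given vertices is adjacent in adj."""
    return all(v in adj[u] for u, v in combinations(vertices, 2))

def is_maximal_clique(adj, vertices):
    """True if vertices form a clique that no outside vertex can extend."""
    vs = set(vertices)
    if not is_clique(adj, vs):
        return False
    # An outside vertex w extends the clique iff it is adjacent to every member.
    return not any(vs <= adj[w] for w in adj if w not in vs)

# Hypothetical toy graph: a triangle a-b-c plus a pendant edge c-d.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(is_maximal_clique(adj, {"a", "b"}))       # clique, but c extends it
print(is_maximal_clique(adj, {"a", "b", "c"}))  # cannot be extended
```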
3. Classic problem
• MCE (Maximal Clique Enumeration)
  • exhaustive: find the set of ALL maximal cliques
[Figure: the example graph repeated three times]
4. Classic algorithm
• Algorithm: recursive search
  • Maintain current clique $C$ and candidate set $P$
  • Recursion:
    • select a vertex in $P$, add it to $C$ (a branch)
    • update $P$
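The recursion outlined on this slide can be sketched along the lines of the basic Bron–Kerbosch scheme. This is a minimal sketch, not necessarily the exact variant the slides use: the excluded set `X` is an addition needed to report only maximal cliques, and the function name and toy graph are hypothetical.

```python
def maximal_cliques(adj):
    """Yield every maximal clique of a graph given as {vertex: neighbour_set}."""
    def expand(C, P, X):
        # C: current clique, P: candidates that extend C,
        # X: vertices already explored that would also extend C.
        if not P and not X:
            yield frozenset(C)  # nothing can extend C, so C is maximal
            return
        for v in list(P):
            # Branch: add v to C, keep only candidates adjacent to v.
            yield from expand(C | {v}, P & adj[v], X & adj[v])
            P = P - {v}  # v is done as a candidate ...
            X = X | {v}  # ... and is remembered as explored
    yield from expand(set(), set(adj), set())

# Hypothetical toy graph: a triangle a-b-c plus a pendant edge c-d.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

for clique in maximal_cliques(adj):
    print(sorted(clique))
```

Each recursive call corresponds to one branch of the search tree, and each yielded set is a leaf.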
6. Problems of MCE
• Usability
  • overwhelmingly large output
  • cliques less useful due to overlap
  • full MCE is neither useful nor necessary
    • e.g. anomaly detection, exploration, …
• Speed
  • exhaustive search of a large space
  • maximal cliques can be exponentially many
[Figure: the example graph drawn twice, with overlapping cliques marked "overlap"]
7. Problems of MCE
• Instead we desire
  • I: compact representation (each result meaningful)
  • II: preserved information (wide coverage)
  • I & II: a good summary, e.g.:
[Figure: the example graph repeated three times, illustrating a summary]
9. A new notion
• Clique visibility
  • visibility of $C$ given $S$: the maximum ratio $r$ of $C$ covered by any $C'$ in $S$
  • denoted by $\mathit{vis}(C)$
• $\tau$-visible summary
  • a summary $S$ such that $\mathit{vis}(C) \ge \tau$ for each $C$ in $M$, the set of all maximal cliques
• Problem: $\tau$-visible MCE
  • find a small $\tau$-visible summary $S$ of $M$
• Enables redundancy reduction. Possibly faster too?
[Figure: the example graph, annotated with a 4-vertex clique of visibility 3/4 and a 5-vertex clique of visibility 4/5 under a 3/4-visible summary $S$ that holds a single 5-vertex clique]
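The visibility definition translates into a few lines of code. The sketch below reads "ratio of C covered by C'" as |C ∩ C'| / |C|; the function names and the integer-labelled cliques are hypothetical, chosen only so that the values 3/4 and 4/5 from the slide reappear.

```python
def visibility(C, S):
    """Largest fraction of clique C covered by a single clique of summary S."""
    C = set(C)
    return max((len(C & set(Cp)) / len(C) for Cp in S), default=0.0)

def is_tau_visible(S, M, tau):
    """True if every clique in M has visibility at least tau with respect to S."""
    return all(visibility(C, S) >= tau for C in M)

# Hypothetical cliques (not the vertex sets from the slide figure).
M = [{1, 2, 3, 4, 5}, {2, 3, 4, 5, 6}, {1, 2, 3, 7}]
S = [{1, 2, 3, 4, 5}]

print([visibility(C, S) for C in M])
print(is_tau_visible(S, M, 0.75), is_tau_visible(S, M, 0.8))
```

Here the single clique in `S` covers 4 of 5 vertices of the second clique and 3 of 4 of the third, so `S` is 3/4-visible but not 0.8-visible.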
10. A naïve implementation
• In classic MCE
  • $S$: summary of cliques so far
  • compare each new maximal clique $C$ to every clique in $S$
  • → add $C$ to $S$: if no redundancy
  • → discard $C$: if it overlaps heavily with any $C'$ in $S$
• Overhead
  • $O(T_{MCE} + |S| \times |M|)$
  • costly computation
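The naïve filter can be sketched as follows. It assumes that "much overlap" means an overlap ratio of at least $\tau$ with some clique already in the summary; the function name and the input cliques are hypothetical.

```python
def naive_summary(cliques, tau):
    """Filter a stream of non-empty maximal cliques into a tau-visible summary."""
    S = []
    for C in cliques:
        C = frozenset(C)
        # Compare C against every clique kept so far: the |S| x |M| overhead.
        covered = any(len(C & Cp) / len(C) >= tau for Cp in S)
        if not covered:
            S.append(C)
    return S

# Hypothetical stream of cliques, in the order an enumerator might emit them.
stream = [{1, 2, 3, 4, 5}, {2, 3, 4, 5, 6}, {1, 2, 3, 7}]
print(len(naive_summary(stream, 0.75)))
print(len(naive_summary(stream, 0.9)))
```

Every discarded clique is at least $\tau$-covered at the moment it is dropped, and the summary only grows, so the result stays $\tau$-visible. The price is one pass over the whole summary per generated clique.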
11. Main idea
• Characterizing search process
  • nearby cliques $C$ and $C'$ (leaves) are correlated
  • they have common ancestors in the search tree
  • $C \sim C'$ when close in the search tree
[Figure: search tree over vertices a to g with leaf cliques C, C' and C''; annotations mark the parts shared by C & C' and by C & C'']
12. For efficiency: first step
• Glancing at last one
  • discard most redundancy in one shot
[Figure: generated sequence of cliques]
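A possible reading of "glancing at last one" is to compare each new clique only with the clique most recently added to the summary, instead of with the whole summary. The sketch below follows that assumption; the function name and input cliques are hypothetical.

```python
def glance_filter(cliques, tau):
    """Keep a clique unless the last kept clique already covers >= tau of it."""
    S = []
    for C in cliques:
        C = frozenset(C)
        if S and len(C & S[-1]) / len(C) >= tau:
            continue  # redundant with respect to the most recent summary clique
        S.append(C)
    return S

# Hypothetical stream; the last clique repeats the first one on purpose.
stream = [{1, 2, 3, 4, 5}, {2, 3, 4, 5, 6}, {1, 2, 3, 7}, {1, 2, 3, 4, 5}]
print(len(glance_filter(stream, 0.75)))
print(len(glance_filter(stream, 0.8)))
```

This costs one comparison per clique. It works best when consecutive cliques are similar, which is what the search-tree argument suggests, but it can keep redundancy that a full comparison would catch, as the second call shows.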
13. For efficiency: first step
• Summary as a sample
  • retain with probability $s(r)$: decreases with $r$
  • cliques as data points, [symbol lost] as slope
  • a perspective: analogy to importance sampling
[Figure: generated sequence of cliques, ranging from high $s(r)$ to low $s(r)$]
14. For efficiency: first step
• Choice of $s(r)$
  • To meet visibility requirements
  • Choose: $s(r) = \frac{(1-r)(2-\tau)}{2-r-\tau}$
  • Claim: $E[\mathit{vis}(C)] \ge \tau$ for all $C$
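The sampling step can be sketched as below. It assumes the slide's formula reads $s(r) = (1-r)(2-\tau)/(2-r-\tau)$, that $r$ is the fraction of the new clique covered by the clique last added to the summary, and that $\tau < 1$; the function names are hypothetical.

```python
import random

def retain_probability(r, tau):
    """s(r) = (1 - r)(2 - tau) / (2 - r - tau), for 0 <= r <= 1 and 0 <= tau < 1."""
    return (1 - r) * (2 - tau) / (2 - r - tau)

def sampled_summary(cliques, tau, rng=None):
    """Keep each non-empty clique with probability s(r)."""
    rng = rng or random.Random()
    S = []
    for C in cliques:
        C = frozenset(C)
        # r: overlap ratio with the last kept clique (0 if nothing kept yet).
        r = len(C & S[-1]) / len(C) if S else 0.0
        if rng.random() < retain_probability(r, tau):
            S.append(C)
    return S

for r in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(r, round(retain_probability(r, 0.5), 3))
```

With this form, $s(0) = 1$ and $s(1) = 0$: a clique unrelated to the last kept one is always retained, and the retention probability falls as the overlap grows.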
16. For efficiency: a further step
• Sampling search branch
  • Want: guarantee still holds
    • for expected visibility
  • Need: keep the final retaining probability $\ge s(r)$
  • How: set Pr[sample a branch] = $s(\underline{r})^{1/l}$
    • $l$: upper bound of branch depth
    • $\underline{r}$: lower bound of $r$
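[Figure: search tree with subtrees T1, T2, ..., Tl at level-1, level-2, ..., level-l, whose branches are sampled with probabilities s(r1)^(1/l1), s(r2)^(1/l2), ..., s(r)^(1/l)]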
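A short derivation may help show why this choice keeps the guarantee. It assumes, as a simplification not stated on the slide, that a clique at depth $d \le l$ is reached only if each of its $d$ branches is sampled independently with the same probability $s(\underline{r})^{1/l}$, where $\underline{r}$ is used here as notation for the lower bound of $r$.

```latex
\Pr[\text{clique reached}]
  = \left( s(\underline{r})^{1/l} \right)^{d}
  = s(\underline{r})^{d/l}
  \;\ge\; s(\underline{r})
  \;\ge\; s(r)
```

The first inequality holds because $d/l \le 1$ and $0 \le s \le 1$; the second because $s$ decreases with $r$ and $\underline{r} \le r$.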
17. Applying the summary
• Feed other computations
  • A succinct input
• Example: top-$k$ results
  • Approx. ratio using $S$: $\tau(1 - 1/e)$
[Figure: pipeline MCE → Summary → Applications; the set of all maximal cliques is condensed into a $\tau$-visible summary, which feeds top-k retrieval, exploration, visualization, …]
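The $(1 - 1/e)$ factor is the classic guarantee of greedy maximum coverage, so one plausible reading is that top-$k$ asks for $k$ cliques that together cover as many vertices as possible. The sketch below follows that assumption and runs the greedy rule on whatever clique collection it is given, full result or summary; the function name and data are hypothetical.

```python
def greedy_top_k(cliques, k):
    """Greedily pick up to k cliques that cover the most not-yet-covered vertices."""
    pool = [frozenset(C) for C in cliques]
    chosen, covered = [], set()
    for _ in range(min(k, len(pool))):
        # Marginal gain of a clique = number of its vertices not covered yet.
        best = max(pool, key=lambda C: len(C - covered))
        if not best - covered:
            break  # no clique adds anything new
        chosen.append(best)
        covered |= best
    return chosen, covered

# Hypothetical summary used as the (smaller) input.
summary = [{1, 2, 3, 4}, {3, 4, 5}, {6, 7}, {1, 2}]
chosen, covered = greedy_top_k(summary, 2)
print([sorted(C) for C in chosen], sorted(covered))
```

Feeding the summary instead of all maximal cliques shrinks `pool`, which is where the time saving would come from under this reading.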
18. Applying the summary
• Discovering clique space
  • Proposal: explore interactively
[Figure: interactive loop: all maximal cliques $M$ → summary of $M$ (top-k if too many) → interesting region $Z$ → cliques on $Z$ and its neighbors, $M'$ → summary of $M'$ → …]
19. On real world networks
• Datasets

        Blog    Skitter  Wiki    Patent
|V|     990K    1.7M     2.4M    3.7M
|E|     6.6M    11.1M    41.7M   33M
|M|     11.2M   18.3M    82.7M   6.1M

|M|: number of all maximal cliques
20. On real world networks
• Summary size
  • slimmed output
  • sharp drop from $\tau = 1$ to $\tau = 0.9$
[Figure: summary size plot, annotated "~50 times smaller"]
21. On real world networks
• Running time
  • reduced time
  • especially from $\tau = 1$ to $\tau = 0.9$
[Figure: running time plot, annotated "time halved"]
22. On real world networks
• Top-$k$ reporting
  • using full result or summary
  • setting: $k = 20$, $\tau = 0.7$
  • result: small quality loss, much faster

                        Blog   Skitter  Wiki   Patent
Quality by summary      822    1205     462    173
Quality by all cliques  826    1214     464    174
Time by summary         1.38   4.02     8.59   0.7
Time by all cliques     28.4   57.5     197    8.9
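To make "small quality loss, much faster" concrete, the ratios can be computed from the table. The sketch assumes the four rows follow the legend order (quality by summary, quality by all cliques, time by summary, time by all cliques); the slide gives no units, so only ratios are reported.

```python
datasets = ["Blog", "Skitter", "Wiki", "Patent"]
quality_summary = [822, 1205, 462, 173]
quality_all = [826, 1214, 464, 174]
time_summary = [1.38, 4.02, 8.59, 0.7]
time_all = [28.4, 57.5, 197, 8.9]

# Fraction of the full-result quality that the summary preserves.
quality_kept = [qs / qa for qs, qa in zip(quality_summary, quality_all)]
# How many times faster the summary-based run is.
speedup = [ta / ts for ta, ts in zip(time_all, time_summary)]

for name, q, s in zip(datasets, quality_kept, speedup):
    print(f"{name:8s} quality kept {q:.1%}  speedup {s:.1f}x")
```

On these numbers the summary keeps more than 99% of the quality on every dataset, with speedups between roughly 12x and 23x.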
23. Wrapping up
• Tradeoff
  • completeness vs. compactness, usability, and time
• Approaches
  • notion of $\tau$-visible summary
  • fast redundancy detection
  • early pruning
  • summary as a sample
• Applications
  • exploration, top-$k$, and more