clique-summary

Redundancy-Aware Maximal Cliques
Jia Wang, James Cheng, Ada Wai-Chee Fu
Chinese University of Hong Kong

Maximal Cliques
• Input
• Undirected graph 𝐺 = (𝑉, 𝐸)
• Maximal cliques
• Clique: vertex set of a complete subgraph
• Maximal: adding any other vertex means it is no longer a clique
[Figure: example graph on vertices a, b, c, d, e, f, g]
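The two definitions above can be checked mechanically. This is a minimal Python sketch, assuming the graph is stored as a dict from vertex to neighbour set. The helper names and the edge set are hypothetical: the edges are taken as the union of the three cliques named in the visibility example later in the slides, since the figure's actual edges are not recoverable from this text.

```python
from itertools import combinations


def graph_from_cliques(cliques):
    """Build an adjacency dict by making every listed vertex set a clique (assumed edge set)."""
    adj = {}
    for c in cliques:
        for u in c:
            adj.setdefault(u, set()).update(x for x in c if x != u)
    return adj


def is_clique(adj, vertices):
    """True if every pair of distinct vertices in `vertices` is adjacent."""
    return all(v in adj[u] for u, v in combinations(set(vertices), 2))


def is_maximal_clique(adj, vertices):
    """True if `vertices` is a clique and no outside vertex is adjacent to all of it."""
    vs = set(vertices)
    if not is_clique(adj, vs):
        return False
    return not any(vs <= adj[w] for w in adj if w not in vs)


# Hypothetical example graph on vertices a..g.
ADJ = graph_from_cliques(["abdef", "bdfg", "abcdf"])

print(is_clique(ADJ, "bdf"))           # a clique, but vertex a (among others) extends it
print(is_maximal_clique(ADJ, "bdf"))
print(is_maximal_clique(ADJ, "bdfg"))
```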

Classic problem
• MCE (Maximal Clique Enumeration)
• exhaustive: find the set of ALL maximal cliques

Classic algorithm
• Algorithm: recursive search
• Maintain current clique 𝐶 & candidate set 𝑇
• Recursion:
• select vertex in 𝑇, add to 𝐶 (a branch)
• update 𝑇
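The recursion above can be sketched as follows. The slide names only the current clique 𝐶 and the candidate set 𝑇; the exclusion set `X` below is the usual Bron-Kerbosch device for reporting only maximal cliques and is an addition here, as are the function names and the hypothetical example graph (its edges are assumed, not read off the figure).

```python
def enumerate_maximal_cliques(adj):
    """Yield every maximal clique of `adj` (dict: vertex -> set of neighbours)."""

    def expand(C, T, X):
        # C: current clique, T: candidates that can still extend C,
        # X: vertices already tried at this level (assumption: Bron-Kerbosch style).
        if not T and not X:
            yield frozenset(C)  # nothing can extend C, so C is maximal
            return
        for v in sorted(T):
            # Branch: add v to C, then keep only candidates adjacent to v.
            yield from expand(C | {v}, T & adj[v], X & adj[v])
            T = T - {v}
            X = X | {v}

    yield from expand(set(), set(adj), set())


# Hypothetical graph on a..g (assumed edge set).
ADJ = {
    "a": set("bcdef"),
    "b": set("acdefg"),
    "c": set("abdf"),
    "d": set("abcefg"),
    "e": set("abdf"),
    "f": set("abcdeg"),
    "g": set("bdf"),
}

for clique in enumerate_maximal_cliques(ADJ):
    print(sorted(clique))
```

This plain version has no pivoting, so it is meant to show the shape of the search tree rather than to be fast.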

Classic algorithm
• Example
[Figure: snapshots of the search on the example graph, marking the current clique and the candidates]

Problems of MCE
• Usability
• overwhelmingly large output
• cliques are less useful because they overlap
• full MCE is often neither useful nor necessary
• e.g. anomaly detection, exploration…
• Speed
• exhaustive search of a large space
• there can be exponentially many maximal cliques
[Figure: maximal cliques on the example graph, with their overlaps marked]

Problems of MCE
• Instead, we want
• I: a compact representation – each result is meaningful
• II: preserved information – wide coverage
• I & II: a good summary
[Figure: an example summary on the example graph]

Notations
𝑀        set of all maximal cliques
𝑆        a subset of 𝑀 (the summary)
𝐶 / 𝐶′   current / last maximal clique
𝑟        overlap ratio, 𝑟 = |𝐶′ ∩ 𝐶| / |𝐶|

A new notion
• Clique visibility
• visibility of 𝐶 given 𝑆: the maximum ratio 𝑟 of 𝐶 covered by any 𝐶′ in 𝑆
• denoted by 𝑣𝑖𝑠(𝐶)
• 𝝉-visible summary
• a summary 𝑆 such that 𝑣𝑖𝑠(𝐶) ≥ 𝜏 for each 𝐶 in 𝑀
• Problem: 𝝉-visible MCE
• find a small 𝜏-visible summary 𝑆 of 𝑀
• Example on the example graph: 𝑆 = {{𝑎, 𝑏, 𝑑, 𝑒, 𝑓}} is a 3/4-visible summary
• 𝑣𝑖𝑠({𝑏, 𝑑, 𝑓, 𝑔}) = 3/4
• 𝑣𝑖𝑠({𝑎, 𝑏, 𝑐, 𝑑, 𝑓}) = 4/5
• This enables redundancy reduction. Possibly faster too?
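The visibility values in the example can be reproduced directly from the definitions. This is a minimal Python sketch; the function names are hypothetical, the visibility under an empty summary is taken as 0 by assumption, and 𝑀 is assumed to consist of exactly the three cliques named in the example.

```python
from fractions import Fraction


def overlap_ratio(C, C_prime):
    """r = |C' ∩ C| / |C|, as an exact fraction."""
    C, C_prime = set(C), set(C_prime)
    return Fraction(len(C & C_prime), len(C))


def visibility(C, S):
    """vis(C): the largest overlap ratio of C with any clique in S (0 if S is empty)."""
    return max((overlap_ratio(C, Cp) for Cp in S), default=Fraction(0))


def is_tau_visible(S, M, tau):
    """True if every clique in M has visibility at least tau given S."""
    return all(visibility(C, S) >= tau for C in M)


# Assumed set of all maximal cliques, and the summary from the example.
M = ["abdef", "bdfg", "abcdf"]
S = ["abdef"]

print(visibility("bdfg", S))                  # 3/4
print(visibility("abcdf", S))                 # 4/5
print(is_tau_visible(S, M, Fraction(3, 4)))   # True
```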

A naïve implementation
• In classic MCE
• 𝑆: summary of the cliques so far
• compare each new maximal clique 𝐶 to each clique 𝐶′ in 𝑆
• add 𝐶 to 𝑆 if there is no redundancy
• discard 𝐶 if it overlaps too much with any 𝐶′ in 𝑆
• Overhead
• 𝑂(𝑇_MCE + |𝑀| × |𝑆|), where 𝑇_MCE is the time of classic MCE
• costly computation
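The naïve filter above can be sketched in a few lines. This is an illustration under assumptions: "much overlap" is read as an overlap ratio of at least 𝜏, the function name is hypothetical, and the input is any iterable of maximal cliques in the order the enumeration produces them.

```python
def naive_summary(cliques, tau):
    """Keep a clique only if no clique already in the summary covers a tau fraction of it."""
    S = []
    for C in cliques:
        C = frozenset(C)
        # Compare C against every clique kept so far: this is the |M| x |S| term.
        vis = max((len(C & Cp) / len(C) for Cp in S), default=0.0)
        if vis < tau:
            S.append(C)
    return S


# Assumed enumeration order of the three cliques from the visibility example.
cliques = ["abdef", "bdfg", "abcdf"]
print([sorted(c) for c in naive_summary(cliques, 0.75)])
print(len(naive_summary(cliques, 1.0)))
```

The result depends on the order in which cliques arrive, since each decision only looks at what has been kept so far.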

Main idea
• Characterizing the search process
• nearby cliques 𝐶 and 𝐶′ (leaves) are correlated
• they have common ancestors in the search tree
• 𝐶 ∼ 𝐶′ when they are close in the search tree
[Figure: search tree with leaves 𝐶, 𝐶′, 𝐶″, marking the part shared by 𝐶 & 𝐶′ and the part shared by 𝐶 & 𝐶″]

For efficiency – first step
• Glancing at the last one
• discard most redundancy in one shot
[Figure: generated sequence of cliques]

• Summary as a sample
• retain 𝐶 with probability 𝑠(𝑟), which decreases with 𝑟
• cliques as data points, 𝑟 as slope
• a perspective: an analogy to importance sampling
[Figure: generated sequence of cliques, marked with high 𝑠(𝑟) and low 𝑠(𝑟)]

• Choice of 𝑠(𝑟)
• To meet visibility requirements
• Choose: 𝑠(𝑟) = (1 − 𝑟)(2 − 𝜏) / (2 − 𝑟 − 𝜏)
• Claim: 𝐸[𝑣𝑖𝑠(𝐶)] ≥ 𝜏 for all 𝐶
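The retaining probability can be evaluated numerically to see how it behaves. This is a minimal Python sketch; the function names are hypothetical, and the guard at 𝑟 = 1 is an assumption added to avoid a zero denominator when 𝜏 = 1 (two distinct maximal cliques never reach 𝑟 = 1).

```python
import random


def retain_probability(r, tau):
    """s(r) = (1 - r)(2 - tau) / (2 - r - tau) for 0 <= r < 1, and 0 at r = 1."""
    if r >= 1:
        return 0.0  # assumption: a fully covered clique is never retained
    return (1 - r) * (2 - tau) / (2 - r - tau)


def keep_clique(C, C_prime, tau, rng):
    """Decide whether to retain C after glancing at one earlier clique C'."""
    C, C_prime = set(C), set(C_prime)
    r = len(C & C_prime) / len(C)
    return rng.random() < retain_probability(r, tau)


for r in (0.0, 0.25, 0.5, 0.75, 0.9):
    print(r, round(retain_probability(r, 0.7), 3))

print(keep_clique("bdfg", "abdef", 0.7, random.Random(0)))
```

With this form 𝑠(0) = 1 and 𝑠(𝑟) falls toward 0 as 𝑟 approaches 1, while for 𝜏 = 1 it equals 1 for every 𝑟 < 1, so nothing is dropped.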

For efficiency – a further step
• Before: redundancy is detected only once a clique is fully grown
• Now: detect it earlier, with foresight
• At an inner node of the search tree
• compute a lower bound on 𝑟
• prune the whole branch if that lower bound is large
• Figure annotation: for any 𝐶 grown from this node, how large is 𝑟 guaranteed to be?
• 𝒕 more vertices to add to 𝐶
• at most 𝒚 vertices in 𝐶′ for 𝐶 (forming a clique)
• then at least 𝒚 − 𝒕 vertices in 𝐶 ∩ 𝐶′

For efficiency – a further step
• Sampling search branches
• Want: the guarantee on expected visibility still holds
• Need: keep the final retaining probability ≥ 𝑠(𝑟)
• How: set Pr[sample a branch] = 𝑠(𝑟_lb)^(1/𝑙)
• 𝑙: upper bound on the branch depth
• 𝑟_lb: lower bound on 𝑟
[Figure: search tree with levels 1, 2, …, l and subtrees T1, T2, …, Tl, with branch sampling probabilities s(r1)^(1/l1), s(r2)^(1/l2), …, s(r)^(1/l)]
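A small numeric check shows why the l-th root keeps the guarantee. This sketch covers only the simplified case where the same bounds (a lower bound on 𝑟 and a depth bound 𝑙) are used at every level, whereas the figure uses per-level bounds; the function names are hypothetical.

```python
def retain_probability(r, tau):
    """s(r) = (1 - r)(2 - tau) / (2 - r - tau) for 0 <= r < 1, and 0 at r = 1."""
    if r >= 1:
        return 0.0
    return (1 - r) * (2 - tau) / (2 - r - tau)


def branch_sample_probability(r_lb, depth_bound, tau):
    """Per-branch sampling probability s(r_lb)^(1/l)."""
    return retain_probability(r_lb, tau) ** (1.0 / depth_bound)


tau, r_lb, depth_bound = 0.7, 0.5, 4
p = branch_sample_probability(r_lb, depth_bound, tau)

# Surviving d <= l sampled levels has probability p**d >= p**l = s(r_lb).
for d in range(1, depth_bound + 1):
    print(d, round(p ** d, 4))

# s decreases with r, so s(r_lb) >= s(r) for the true r >= r_lb.
print(retain_probability(r_lb, tau) >= retain_probability(0.8, tau))
```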

Applying the summary
• Feeding other computations
• a succinct input
• Example: top-𝑘 results
• approximation ratio using 𝑆: 𝜏(1 − 1/𝑒)
[Figure: pipeline MCE → Summary → Applications: set of all maximal cliques → 𝜏-visible summary → top-𝑘 retrieval, exploration, visualization, …]
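The slides do not spell out the top-𝑘 objective. The 1 − 1/𝑒 factor is the classic guarantee of greedy maximum coverage, so the sketch below assumes the goal is to pick 𝑘 cliques that together cover as many vertices as possible, and runs the greedy rule on a summary instead of on all maximal cliques. The function name and the input cliques are hypothetical.

```python
def greedy_top_k(cliques, k):
    """Greedily pick up to k cliques, each time taking the one that covers the most new vertices."""
    pool = [frozenset(c) for c in cliques]
    chosen, covered = [], set()
    for _ in range(min(k, len(pool))):
        best = max(pool, key=lambda c: len(c - covered))
        if not best - covered:
            break  # nothing new can be covered
        chosen.append(best)
        covered |= best
        pool.remove(best)
    return chosen, covered


# Assumed input: here the three cliques from the visibility example.
summary = ["abdef", "bdfg", "abcdf"]
chosen, covered = greedy_top_k(summary, 2)
print([sorted(c) for c in chosen])
print(sorted(covered))
```

Running the same routine on 𝑆 rather than 𝑀 is what makes the summary a succinct input: the greedy loop scans far fewer cliques per round.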

Applying the summary
• Discovering clique space
• Proposal: explore interactively
• all maximal cliques, 𝑀
• summary of 𝑀 (top-𝑘 if too many)
• interesting region 𝑍
• cliques on 𝑍 and its neighbors, 𝑀′
• summary of 𝑀′
• ……

On real world networks
• Datasets
        Blog    Skitter  Wiki    Patent
|𝑉|     990K    1.7M     2.4M    3.7M
|𝐸|     6.6M    11.1M    41.7M   33M
|𝑀|     11.2M   18.3M    82.7M   6.1M    (# of all maximal cliques)

• Summary size
• slimmer output
• sharp drop from 𝜏 = 1 to 𝜏 = 0.9
[Figure: summary size for different 𝜏; annotation: ~50 times smaller]

• Running time
• reduced time
• especially from 𝜏 = 1 to 𝜏 = 0.9
[Figure: running time for different 𝜏; annotation: time halved]

• Top-𝑘 reporting
• using the full result or the summary
• setting: 𝑘 = 20, 𝜏 = 0.7
• result: small quality loss, much faster
          Blog   Skitter  Wiki   Patent
𝑄_samp    822    1205     462    173     (quality by summary)
𝑄_all     826    1214     464    174     (quality by all cliques)
𝑇_samp    1.38   4.02     8.59   0.7     (time by summary)
𝑇_all     28.4   57.5     197    8.9     (time by all cliques)
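The "small quality loss, greatly faster" reading can be checked with simple arithmetic on the table. The sketch below only divides the reported numbers; the dictionary layout and function names are hypothetical, and the time unit is left as in the source, which does not state it.

```python
# Values copied from the table above (k = 20, tau = 0.7).
RESULTS = {
    "Blog":    {"Q_samp": 822,  "Q_all": 826,  "T_samp": 1.38, "T_all": 28.4},
    "Skitter": {"Q_samp": 1205, "Q_all": 1214, "T_samp": 4.02, "T_all": 57.5},
    "Wiki":    {"Q_samp": 462,  "Q_all": 464,  "T_samp": 8.59, "T_all": 197},
    "Patent":  {"Q_samp": 173,  "Q_all": 174,  "T_samp": 0.7,  "T_all": 8.9},
}


def quality_ratio(row):
    """Quality obtained from the summary relative to using all cliques."""
    return row["Q_samp"] / row["Q_all"]


def speedup(row):
    """How many times faster the summary-based run is."""
    return row["T_all"] / row["T_samp"]


for name, row in RESULTS.items():
    print(f"{name}: quality ratio {quality_ratio(row):.3f}, speedup {speedup(row):.1f}x")
```

On these numbers the summary keeps more than 99% of the quality on every dataset while running more than 12 times faster.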

Wrapping up
• Tradeoff
• completeness vs. compactness, usability & time
• Approaches
• notion of 𝜏-visible summary
• fast redundancy detection
• early pruning
• summary as a sample
• Applications
• exploration, top-𝑘, and more
