Redundancy-Aware
Maximal Cliques
Jia Wang James Cheng Ada Wai-Chee Fu
Chinese University of Hong Kong
Maximal Cliques
• Input
• Undirected graph G = (V, E)
• Maximal cliques
• Clique: vertex set of a complete subgraph
• Maximal: adding any other vertex makes it no longer a clique
[Figure: example graph on vertices a, b, c, d, e, f, g]
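A minimal Python sketch of the two definitions. The slide's figure is not recoverable, so the edge set below is a hypothetical graph on a to g, chosen so that its maximal cliques are the three sets that appear later under "A new notion" ({a, b, d, e, f}, {b, d, f, g}, {a, b, c, d, f}). The names `build_adjacency`, `is_clique` and `is_maximal_clique` are illustrative only.

```python
from itertools import combinations

# Hypothetical example graph on vertices a..g (an assumption, not the slide's figure):
# every pair inside each of these three vertex sets is joined by an edge.
EXAMPLE_CLIQUES = ["abdef", "bdfg", "abcdf"]

def build_adjacency(cliques):
    """Return {vertex: set of neighbors} for the union of the given complete subgraphs."""
    adj = {}
    for clique in cliques:
        for v in clique:
            adj.setdefault(v, set())
        for u, v in combinations(clique, 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj

def is_clique(vertices, adj):
    """Clique: every pair of vertices in the set is adjacent."""
    return all(v in adj[u] for u, v in combinations(vertices, 2))

def is_maximal_clique(vertices, adj):
    """Maximal: a clique that no outside vertex can extend."""
    vertices = set(vertices)
    if not is_clique(vertices, adj):
        return False
    # An outside vertex w extends the clique iff it is adjacent to every member.
    return not any(vertices <= adj[w] for w in adj if w not in vertices)

ADJ = build_adjacency(EXAMPLE_CLIQUES)
```

For instance, {a, b, d, f} is a clique in this graph but not a maximal one, since either c or e can still be added.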
Classic problem
• MCE (Maximal Clique Enumeration)
• Exhaustive: find the set of ALL maximal cliques
[Figure: the example graph on vertices a to g, shown three times]
Classic algorithm
• Algorithm: recursive search
• Maintain the current clique C and the candidate set T
• Recursion:
• select a vertex in T and add it to C (a branch)
• update T
[Figure: example graph on vertices a to g]
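The recursive search can be sketched as the classic Bron–Kerbosch procedure without pivoting. This is our choice of a concrete instance, not necessarily the exact variant the authors build on. Besides C and T it keeps a set X of vertices already branched on, which lets a leaf be reported only when C is maximal. The example graph is hypothetical: its edges are all pairs inside {a, b, d, e, f}, {b, d, f, g} and {a, b, c, d, f}.

```python
from itertools import combinations

def build_adjacency(cliques):
    """Hypothetical example graph: join every pair inside each given vertex set."""
    adj = {}
    for clique in cliques:
        for v in clique:
            adj.setdefault(v, set())
        for u, v in combinations(clique, 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj

def enumerate_maximal_cliques(adj):
    """Bron-Kerbosch without pivoting.

    C: current clique; T: candidates that can extend C;
    X: vertices already branched on that could also extend C.
    """
    results = []

    def expand(C, T, X):
        if not T and not X:
            results.append(frozenset(C))  # nothing extends C, so it is maximal
            return
        for v in sorted(T):               # each choice of v opens one branch
            expand(C | {v}, T & adj[v], X & adj[v])  # add v to C, update T
            T = T - {v}                   # v's branch is finished ...
            X = X | {v}                   # ... so later branches must exclude it

    expand(set(), set(adj), set())
    return results

ADJ = build_adjacency(["abdef", "bdfg", "abcdf"])
M = enumerate_maximal_cliques(ADJ)
```

Practical implementations usually add pivoting to skip branches that cannot produce new maximal cliques; it is left out here to keep the C/T structure of the slide visible.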
Classic algorithm
• Example
[Figure: five snapshots of the search on the example graph (vertices a to g), marking the current clique and the candidates]
Problems of MCE
• Usability
• overwhelmingly large output
• cliques are less useful because they overlap
• full MCE is neither good nor necessary
• e.g., anomaly detection, exploration…
• Speed
• exhaustive search of a large space
• the number of maximal cliques can be exponential
[Figure: two copies of the example graph (vertices a to g), with overlapping cliques marked "overlap"]
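To make "can be exponentially many" concrete: the well-known Moon–Moser construction takes m disjoint groups of 3 vertices and joins every pair of vertices from different groups. A maximal clique picks exactly one vertex per group, so there are 3^m of them on n = 3m vertices. The sketch checks this count with a plain Bron–Kerbosch enumerator (our choice, for illustration); the function names are hypothetical.

```python
from itertools import combinations

def moon_moser_graph(m):
    """m groups of 3 vertices; edges join vertices that lie in different groups."""
    vertices = [(g, i) for g in range(m) for i in range(3)]
    adj = {v: set() for v in vertices}
    for u, v in combinations(vertices, 2):
        if u[0] != v[0]:  # different groups -> adjacent
            adj[u].add(v)
            adj[v].add(u)
    return adj

def count_maximal_cliques(adj):
    """Bron-Kerbosch without pivoting; counts leaves instead of storing cliques."""
    count = 0

    def expand(T, X):
        nonlocal count
        if not T and not X:
            count += 1
            return
        for v in list(T):
            expand(T & adj[v], X & adj[v])
            T = T - {v}
            X = X | {v}

    expand(set(adj), set())
    return count
```

With m = 4 (12 vertices) the count is already 81; every extra group of 3 vertices triples it.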
Problems of MCE
• Instead we desire
• I: compact representation – each result is meaningful
• II: preserved information – wide coverage
• I & II: a good summary, e.g.:
[Figure: the example graph (vertices a to g) shown three times]
Notations
M: set of all maximal cliques
S: a subset of M (the summary)
C / C′: current / last maximal clique
r: overlap ratio, $r = \frac{|C' \cap C|}{|C|}$
A new notion
• Clique visibility
• visibility of C given S: the maximum ratio r of C covered by any C′ in S
• Denoted by vis(C)
• τ-visible summary
• A summary S such that $\mathrm{vis}(C) \ge \tau$ for each C in M
• Problem: τ-visible MCE
• find a small τ-visible summary S of M
[Figure: the example graph on vertices a to g]
$\mathrm{vis}(\{b, d, f, g\}) = 3/4$, $\mathrm{vis}(\{a, b, c, d, f\}) = 4/5$
S = {{a, b, d, e, f}} is a 3/4-visible summary.
This enables redundancy reduction. Possibly faster too?
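A minimal sketch of the overlap ratio, visibility and the τ-visible test, using exact fractions so that 3/4 and 4/5 compare exactly. It assumes that M for the example graph consists of exactly the three cliques named on the slide; `overlap_ratio`, `vis` and `is_tau_visible` are illustrative names.

```python
from fractions import Fraction

def overlap_ratio(C, C_prime):
    """r = |C' intersect C| / |C|: the fraction of C covered by C'."""
    return Fraction(len(C & C_prime), len(C))

def vis(C, S):
    """Visibility of C given S: the largest overlap ratio over all C' in S."""
    return max((overlap_ratio(C, C_prime) for C_prime in S), default=Fraction(0))

def is_tau_visible(S, M, tau):
    """S is tau-visible if every maximal clique C in M has vis(C) >= tau."""
    return all(vis(C, S) >= tau for C in M)

# Maximal cliques of the slide's example (assumed to be exactly these three sets).
M = [frozenset("abdef"), frozenset("bdfg"), frozenset("abcdf")]
S = [frozenset("abdef")]
```

A clique that is itself in S has visibility 1, so the binding constraint is the least-covered clique, here {b, d, f, g} at 3/4; that is why S is 3/4-visible but not 4/5-visible.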
A naïve implementation
• In classic MCE
• S: summary of the cliques found so far
• compare each new C with every maximal clique in S
• → add C to S if there is no redundancy
• → discard C if it overlaps much with any C′ in S
• Overhead
• $O(T_{\mathrm{MCE}} + |M| \times |S|)$, where $T_{\mathrm{MCE}}$ is the cost of plain MCE
• costly computation
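The naive scheme can be sketched as follows, under the assumption that "redundancy" means C is already covered to a fraction of at least τ by some clique in S (the threshold the τ-visible definition requires). Because S only grows, every discarded clique stays τ-visible. The input order stands in for whatever order the MCE search produces; `naive_summary` is a hypothetical name.

```python
from fractions import Fraction

def naive_summary(cliques, tau):
    """Naive filter: compare each new maximal clique with every clique in S.

    C is discarded when some C' in S already covers at least a tau fraction of it,
    so every clique ends with vis(C) >= tau; the cost is O(|M| x |S|) comparisons.
    """
    S = []
    for C in cliques:
        covered = any(Fraction(len(C & Cp), len(C)) >= tau for Cp in S)
        if not covered:
            S.append(C)  # no redundancy: keep it
    return S

M = [frozenset("abdef"), frozenset("bdfg"), frozenset("abcdf")]
```

The summary depends on the order in which cliques arrive: with M in the order above and τ = 3/4 it keeps only {a, b, d, e, f}, while the reversed order keeps only {a, b, c, d, f}.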
Main idea
• Characterizing the search process
• nearby cliques C and C′ (leaves) are correlated
• they have common ancestors in the search tree
• C ∼ C′ when they are close in the search tree
[Figure: search tree over vertices b, d, f, a, e, c, g, with leaves C, C′ and C″; marked nodes are shared by C & C′ and by C & C″]
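To see why nearby leaves are similar, the sketch reruns a plain Bron–Kerbosch search (our stand-in for the classic algorithm) on a hypothetical example graph and records, for each maximal clique, the branch vertices on its root-to-leaf path. Consecutive leaves that share a long prefix share those ancestors, and therefore those vertices. All names are illustrative.

```python
from itertools import combinations

def build_adjacency(cliques):
    """Hypothetical example graph: join every pair inside each given vertex set."""
    adj = {}
    for clique in cliques:
        for v in clique:
            adj.setdefault(v, set())
        for u, v in combinations(clique, 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj

def leaves_with_paths(adj):
    """Bron-Kerbosch search returning (clique, root-to-leaf branch vertices) in output order."""
    leaves = []

    def expand(path, T, X):
        if not T and not X:
            leaves.append((frozenset(path), tuple(path)))
            return
        for v in sorted(T):
            expand(path + [v], T & adj[v], X & adj[v])
            T = T - {v}
            X = X | {v}

    expand([], set(adj), set())
    return leaves

def shared_ancestors(p, q):
    """Length of the common prefix of two root-to-leaf paths."""
    depth = 0
    for a, b in zip(p, q):
        if a != b:
            break
        depth += 1
    return depth

leaves = leaves_with_paths(build_adjacency(["abdef", "bdfg", "abcdf"]))
```

Here the first two leaves, {a, b, c, d, f} and {a, b, d, e, f}, share the ancestors a and b and overlap in 4 of 5 vertices. The example is tiny and only meant to show the mechanics.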
For efficiency – first step
• Glancing at the last one
• discards most redundancy in one shot
[Figure: generated sequence of cliques]
For efficiency – first step
• Summary as a sample
• retain C with probability s(r), which decreases with r
• cliques as data points, r as slope
• a perspective: analogy to importance sampling
[Figure: generated sequence of cliques, with regions labeled high s(r) and low s(r)]
For efficiency – first step
• Choice of s(r)
• To meet the visibility requirement
• Choose: $s(r) = \frac{(1 - r)(2 - \tau)}{2 - r - \tau}$
• Claim: $E[\mathrm{vis}(C)] \ge \tau$ for all C
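A minimal sketch of the first step, combining "glance at the last one" with the retention probability s(r) above. Two assumptions are ours, not the slides': C′ is taken to be the most recently retained clique, and r is computed against that clique only. The guard for r = 1 avoids 0/0 when τ = 1. `s` and `sample_summary` are hypothetical names.

```python
import random

def s(r, tau):
    """Retention probability s(r) = (1 - r)(2 - tau) / (2 - r - tau), for 0 <= tau <= 1."""
    if r >= 1:
        return 0.0  # C is fully covered by C'; also avoids 0/0 when tau == 1
    return (1 - r) * (2 - tau) / (2 - r - tau)

def sample_summary(cliques, tau, rng=None):
    """Keep each generated clique C with probability s(r), with r measured against C' only."""
    rng = rng or random.Random()
    summary = []
    for C in cliques:
        if not summary:
            summary.append(C)  # nothing to compare with yet (s(0) = 1)
            continue
        last = summary[-1]     # assumption: C' = most recently retained clique
        r = len(C & last) / len(C)
        if rng.random() < s(r, tau):
            summary.append(C)
    return summary

M = [frozenset("abdef"), frozenset("bdfg"), frozenset("abcdf")]
```

Each clique costs one comparison instead of |S|. Note that s(0) = 1 and s(1) = 0 for any τ < 1, and with τ = 1 every distinct maximal clique is kept, since none can be fully covered by another.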
For efficiency – a further step
• Before: redundancy detected only when a clique is fully grown
• Now: detect it earlier, with foresight
• At an inner node
• compute a lower bound on r
• prune the whole branch if r is large
[Figure: at an inner node of the search tree, foretell how large r must be for any C grown from here. t more vertices are added to C; at most y vertices in C′ for C (forming a clique); then at least y − t vertices lie in C ∩ C′.]
For efficiency – a further step
• Sampling search branches
• Want: the guarantee on expected visibility still holds
• Need: keep Pr[final retaining] ≥ s(r)
• How: set $\Pr[\text{sample a branch}] = s(\underline{r})^{1/l}$
• l: upper bound on the branch depth
• $\underline{r}$: lower bound on r
[Figure: search branches T1, T2, …, Tl at levels 1, 2, …, l, sampled with probabilities s(r1)^(1/l1), s(r2)^(1/l2), …, s(r)^(1/l)]
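A one-line check of why the l-th root is used, under two assumptions we make explicit: a single pair (l, r̲) bounds the branch, and s is non-increasing in r (as stated earlier, it decreases with r). If a leaf C is reached through k ≤ l sampled levels and its true ratio satisfies r ≥ r̲, then:

```latex
\Pr[\text{$C$ survives all sampled levels}]
  = \bigl(s(\underline{r})^{1/l}\bigr)^{k}
  = s(\underline{r})^{k/l}
  \;\ge\; s(\underline{r})
  \;\ge\; s(r),
\qquad \text{since } 0 \le s(\underline{r}) \le 1,\; k \le l,\; \underline{r} \le r .
```

The figure's per-level factors s(r_i)^(1/l_i) refine this with a separate bound at each level; the simple case above does not cover that refinement.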
Applying the summary
• Feed other computations
• A succinct input
• Example: top-k results
• Approximation ratio using S: $\tau(1 - 1/e)$
[Figure: MCE summary applications: set of all maximal cliques → τ-visible summary → top-k retrieval, exploration, visualization, …]
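The slides do not define top-k; a common reading, and the one assumed here, is to pick k cliques that together cover as many vertices as possible (maximum coverage). The classic greedy rule sketched below reaches a 1 − 1/e fraction of the optimum for that problem; the slide's τ(1 − 1/e) is the ratio it states when the greedy step runs over S instead of all of M. `greedy_top_k` is a hypothetical name.

```python
def greedy_top_k(cliques, k):
    """Greedy maximum coverage: repeatedly take the clique adding the most new vertices."""
    chosen, covered = [], set()
    remaining = list(cliques)
    for _ in range(k):
        if not remaining:
            break
        best = max(remaining, key=lambda C: len(C - covered))
        if not best - covered:
            break  # no remaining clique covers anything new
        chosen.append(best)
        covered |= best
        remaining.remove(best)
    return chosen, covered

M = [frozenset("abdef"), frozenset("bdfg"), frozenset("abcdf")]
```

On the hypothetical example, greedy over M with k = 2 covers 6 vertices, while over the 3/4-visible summary {{a, b, d, e, f}} it covers 5, at a fraction of the input size.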
Applying the summary
• Discovering the clique space
• Proposal: explore interactively
[Figure: exploration loop: all maximal cliques M → summary of M (top-k if too many) → interesting region Z → cliques on Z and its neighbors, M′ → summary of M′ → …]
On real world networks
• Datasets

        Blog    Skitter   Wiki    Patent
|V|     990K    1.7M      2.4M    3.7M
|E|     6.6M    11.1M     41.7M   33M
|M|     11.2M   18.3M     82.7M   6.1M

|M|: number of all maximal cliques
On real world networks
• Summary size
• slimmed output
• sharp drop from τ = 1 to τ = 0.9 (~50 times smaller)
On real world networks
• Running time
• reduced time
• especially from τ = 1 to τ = 0.9 (time halved)
On real world networks
• Top-k reporting
• using the full result or the summary
• setting: k = 20, τ = 0.7
• result: small quality loss, much faster

          Blog    Skitter   Wiki    Patent
Q_samp    822     1205      462     173
Q_all     826     1214      464     174
T_samp    1.38    4.02      8.59    0.7
T_all     28.4    57.5      197     8.9

Q_samp: quality using the summary; Q_all: quality using all cliques
T_samp: time using the summary; T_all: time using all cliques
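The gap in the top-k table can be quantified directly from its numbers. The slide does not state the time unit, but the speedup ratio does not depend on it; `TOPK_RESULTS` and `summarize` are illustrative names.

```python
# (Q_samp, Q_all, T_samp, T_all) copied from the top-k table.
TOPK_RESULTS = {
    "Blog":    (822, 826, 1.38, 28.4),
    "Skitter": (1205, 1214, 4.02, 57.5),
    "Wiki":    (462, 464, 8.59, 197),
    "Patent":  (173, 174, 0.7, 8.9),
}

def summarize(results):
    """Return {dataset: (fraction of quality kept, speedup factor)}."""
    return {
        name: (q_samp / q_all, t_all / t_samp)
        for name, (q_samp, q_all, t_samp, t_all) in results.items()
    }

for name, (kept, speedup) in summarize(TOPK_RESULTS).items():
    print(f"{name}: {kept:.1%} of the quality, {speedup:.1f}x faster")
```

Across the four datasets the summary keeps over 99% of the quality while running roughly 12.7 to 22.9 times faster.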
Wrapping up
• Tradeoff
• completeness → compactness & usability & time
• Approaches
• notion of τ-visible summary
• fast redundancy detection
• early pruning
• summary as a sample
• Applications
• exploration, top-k, and more


Editor's Notes

  1. All points to a good summary