SlideShare a Scribd company logo
1 of 72
Download to read offline
Joint Work with:
Jen Neville Ryan Rossi Nick Duffield
-
-
-
-
-
Social network
Human Disease Network
[Barabasi 2007]
Food Web [2007]
Terrorist Network
[Krebs 2002]Internet (AS) [2005]
Gene Regulatory Network
[Decourty 2008]
Protein Interactions
[breast cancer]
Political blogs
Power grid
(1) Motif Counting
Theory and Algorithms for Large Graphs
(2) Machine Learning Applications
for Motif Counting
§ Network motifs (graphlets) are small induced subgraphs
§ Motifs represent patterns of inter-connections, basic
elements in complex networks
CliqueTriangle CycleEdge
A study of motif frequencies in real-world complex networks
Applied to food web, genetic, neural, web, and other network
- Found distinct motifs in each case
Red dashed lines indicate edges that participate
in a feed-forward loop
Motifs are over-expressed
in real complex networks
Motifs occur in real networks
with frequencies
significantly higher than
randomly generated networks
Real networks often have
modular structure
Cj#
Ci#
Ck#
Concentration of Motifs
in real vs random networks
Two Conclusions
Nodes/edges are not the fundamental units of real networks
Motifs are the building blocks of these networks
We should analyze/model graphs in terms of motifs
Network Motifs: Simple Building Blocks of Complex Networks – [Milo et. al – Science 2002]
The Structure and Function of Complex Networks – [Newman – Siam Review 2003]
2-node
Graphlets
3-node
Graphlets
4-node
Graphlets
Connected
Disconnected
Motifs/graphlets are Small k-vertex induced subgraphs
Ex: Given an input graph G
- How many triangles in G?
- How many cliques of size 4-nodes in G?
- How many cycles of size 4-nodes in G?
Ex: Given an input graph G
- How many triangles in G?
- How many cliques of size 4-nodes in G?
- How many cycles of size 4-nodes in G?
à In practice, we would like to count all k-vertex graphlets
§ Enumerate all possible graphlets
§ Enumerate all possible graphlets
à Exhaustive enumeration is too expensive
§ Enumerate all possible graphlets
à Exhaustive enumeration is too expensive
§ Count graphlets for each node – and combine all node counts
[Shervashidze et. al – AISTAT 2009]
§ Enumerate all possible graphlets
à Exhaustive enumeration is too expensive
§ Count graphlets for each node – and combine all node counts
à Still expensive for relatively large k [Shervashidze et. al – AISTAT 2009]
§ Enumerate all possible graphlets
à Exhaustive enumeration is too expensive
§ Count graphlets for each node – and combine all node counts
à Still expensive for relatively large k [Shervashidze et. al – AISTAT 2009]
§ Other recent work counts only connected graphlets of size k=4
[Marcus & Shavitt – Computer Networks 2012]
Not practical – scales only for small graphs with few
hundred/thousand nodes/edges
- taking hours for a graph with 30K nodes
Most work focused on graphlets of k=3 nodes
In this work, we focus on graphlets of k=3,4 nodes
Efficient Graphlet Counting for Large Networks
[Ahmed et al., ICDM 2015]
Graphlet Decomposition: Framework, Algorithms, and Applications
[Ahmed et al., KAIS Journal 2016]
Edge-centric, Parallel, Fast, Space-efficient Framework
u v
v2 v3v1 v4
v6 v7
edge
Edge Neighborhood
Jointly count motifs
Searching Edge
Neighborhoods
① For each edge do
u v
v2 v3v1 v4
v6
v7
edge
Searching Edge
Neighborhoods
① For each edge do
• Count All 3-node graphlets
② Merge counts from all edges
u v
v2 v3v1 v4
v6 v7
edge
Triangle 2-star 1-edge Independent
Searching Edge
Neighborhoods
① For each edge do
• Count All 3-node graphlets
② Merge counts from all edges
u v
v2 v3v1 V4
v6 v7
edge
Triangle 2-star 1-edge Independent
q We only need to
find/count triangles
q Use equations to get
counts of others in o(1)
Triangle
How to count all 4-node graphlets?
4-Clique 4-Cycle4-Chrodal-Cycle Tailed-triangle 4-Path 3-Star
4-node-triangle 4-node-2star 4-node-2edge 4-node-1edge Independent
± 1 edge
4-Node Motif Transition Diagram
Step 1 Step 2 Step 3
Searching Edge
Neighborhoods
For each edge
Find the triangles
Count 4-node graphlets
For each edge
Count 4-node cliques
and 4-node cycles only
Count 4-node graphlets
For each edge
Use combinatorial
relationships to compute
counts of other graphlets
in constant time
Step 4 Merge counts from all edges
± 1 edge
4-Node Motif Transition Diagram
4-Node Graphlet Transition Diagram
± 1 edge
4-Cliques
4-Cycles
± 1 edge
4-Node Motif Transition Diagram
Count a few patterns
Use relationships & transitions
to count all other graphlets in constant time
✓ Fast
✓ Space-Efficient
4-Node Motif Transition Diagram
± 1 edge
4-Cliques
4-Cycles
Maximum no. triangles
Incident to an edge
Maximum no. stars
Incident to an edge
Count a few patterns
Use relationships & transitions
to count all other graphlets in constant time
T T
Relationship between 4-cliques & 4-ChordalCycles
4-Cliques 4-ChordalCycle
e
T T
e
No. 4-ChordalCycles No. 4-Cliques
Proof in Lemma 1 - Ahmed et al., ICDM 2015
T T
Relationship between 4-cliques & 4-ChordalCycles
T T
No. 4-ChordalCycles No. 4-Cliques
4-Cliques 4-ChordalCycle
e e
Proof in Lemma 1 - Ahmed et al., ICDM 2015
0 1 2 4 8 16
0
5
10
15
Number of Processing Units
Speedup
socfb−Texas
socfb−OR
socfb−UCLA
socfb−Berkeley13
socfb−MIT
socfb−Penn94
0 1 2 4 8 16
0
5
10
15
Number of Processing Units
Speedup
0 1 2 4 8 16
0
2
4
6
8
10
12
14
Number of Processing Units
Speedup
tech−internet−as
tech−WHOIS
web−it−2004
web−spam
0 1 2 4 8 16
0
2
4
6
8
10
12
14
Number of Processing Units
Speedup
Strong scaling results
Intel Xeon 3.10 Ghz E5-2687W server v3, 16 cores
Comparison to RAGE [Marcus & Shavitt – J. Computer Networks 2011]
Facebook100 Networks from US Schools
Ours RAGE
Time in Seconds
|V| |E| Ours RAGE
Time in Seconds
Baseline (RAGE)
did not finish for
most graphs
We take ~4.5 secs for web-google (4.3M edges)
We take ~4 secs for inf-road-usa (29M edges)
Most motif counts in orders of 106 – 1015
§ Node-level and edge-level motif counting
Role discovery, Relational Learning, Multi-label Classification
0 2 4 6 8 10
x 10
4
0
0.005
0.01
0.015
0.02
0.025
0.03
0.035
0.04
0.045
Edges
Timeinseconds
0 2 4 6 8 10
x 10
4
0
0.005
0.01
0.015
0.02
0.025
0.03
0.035
0.04
0.045
Edges
Timeinseconds
Key Observations
The distribution of
graphlet runtimes for
edge neighborhoods
obey a power-law.
0 2 4 6 8 10
x 10
4
0
0.005
0.01
0.015
0.02
0.025
0.03
0.035
0.04
0.045
Edges
Timeinseconds
0 2 4 6 8 10
x 10
4
0
0.005
0.01
0.015
0.02
0.025
0.03
0.035
0.04
0.045
Edges
Timeinseconds
Most edge neighborhoods are fast
with runtimes that are approximately
equal.
Key Observations
The distribution of
graphlet runtimes for
edge neighborhoods
obey a power-law.
0 2 4 6 8 10
x 10
4
0
0.005
0.01
0.015
0.02
0.025
0.03
0.035
0.04
0.045
Edges
Timeinseconds
0 2 4 6 8 10
x 10
4
0
0.005
0.01
0.015
0.02
0.025
0.03
0.035
0.04
0.045
Edges
Timeinseconds
HOWEVER, a handful of
neighborhoods are hard and take
significantly longer.
Most edge neighborhoods are fast
with runtimes that are approximately
equal.
Key Observations
The distribution of
graphlet runtimes for
edge neighborhoods
obey a power-law.
Neighborhood runtimes
are power-lawed
0 2 4 6 8 10
x 10
4
0
0.005
0.01
0.015
0.02
0.025
0.03
0.035
0.04
0.045
Edges
Timeinseconds
0 2 4 6 8 10
x 10
4
0
0.005
0.01
0.015
0.02
0.025
0.03
0.035
0.04
0.045
Edges
Timeinseconds
HOWEVER, a handful of
neighborhoods are hard and take
significantly longer.
Most edge neighborhoods are fast
with runtimes that are approximately
equal.
Key Observations
The distribution of
graphlet runtimes for
edge neighborhoods
obey a power-law.
Neighborhood runtimes
are power-lawed
QUESTION:
How to reduce runtime?
à Sampling
§ [Horvitz-Thompson Estimation 1952]
§ Using HT estimator we scale the sampled count by probability p
Count of graphlet i
for edge j
Estimated Count
of graphlet i for edge j
More details in the paper
See Lemma 1
§ [Hoeffding’s Inequality 1963]
More details in the paper
See Lemma 2
Sampled maximum degree
in the graph
Less than the actual maximum degree
Average relative error for top 1000 (highest degree) edges
Sampling with replacement p = 0.01
GFD: Graphlet Frequency Distribution
Kolmogorov distance (KS)
L1 Normalized distance
1 2 4 8 12 16
0
2
4
6
8
10
12
14
16
Number of processing units
Speedup
socfb−MIT
bio−dmela
soc−gowalla
tech−RL−caida
web−wikipedia09
1 2 4 8 12 16
0
2
4
6
8
10
12
14
16
Number of processing units
Speedup
Strong scaling results
Using Intel Xeon E5-2687W v3 server, 16 cores
§ Unbiased Estimation of Motif Counts
10
4
10
5
0.85
0.9
0.95
1
1.05
1.1
1.15
soc−orkut−dir
Sample Size
10
4
10
5
0.85
0.9
0.95
1
1.05
1.1
1.15
soc−orkut−dir
Sample Size
x/y
10
4
10
5
0.9
0.95
1
1.05
1.1
1.15
soc−flickr
Sample Size
10
4
10
5
0.9
0.95
1
1.05
1.1
1.15
soc−flickr
Sample Size
Estimation of counts of 4-vertex clique
[Ahmed et. al, ICDM 2015, KAIS 2016, TNNLS 2018]
Counting a few motifs
+
Storing a few motif counts
+
Obtain other counts using
combinatorial relationships
(1) Fast and parallel
(2) Space-efficient
Two Conclusions
Joint Motif Counting
for Large Graphs
✓ (1) Motif Counting
Theory and Algorithms for Large Graphs
(2) Machine Learning Applications
for Motif Counting
§ Biological Networks
• network alignment, protein function prediction
[Pržulj 2007][Milenković-Pržulj 2008] [Hulovatyy-Solava-Milenković 2014]
[Shervashidze et al. 2009][Vishwanathan et al. 2010]
§ Social Networks
• Triad analysis, role discovery, community detection
[Granovetter 1983][Holland-Leinhardt 1976][Rossi-Ahmed 2015]
[Ahmed et al. 2015][Xie-Kelley-Szymanski 2013]
§ Internet AS [Feldman et al. 2008]
§ Spam Detection
[Becchetti et al. 2008][Ahmed et al. 2016]
-
-
-
-
-
Useful for various machine learning tasks
e.g., Anomaly detection, Role Discovery, Relational Learning, Clustering etc.
§ Higher-order network analysis
§ Graph classification
§ Higher-order Network Embeddings
Higher-order network structures
§ Visualization – “spotting anomalies” [Ahmed et al. ICDM 2015]
§ Finding large cliques, stars, and other larger network
structures [Ahmed et al. KAIS 2015]
§ Spectral clustering [Benson et al. Science 2016]
§ Role discovery [Ahmed et al. 2016]
...
Ranking by graphlet counts
Nodes are colored/weighted
by triangle counts
Links are colored/weighted
by stars of size 4 nodes
Leukemia
Colon
cancer
Deafness
§ Higher-order network analysis
§ Graph classification
§ Higher-order Network Embeddings
Label 1
Label 0
Enzyme
Non-Enzyme
Collection of Graphs
(e.g. Protein Graphs)
.
.
.
Graphs
Each Protein is represented by a graph
Binary label represents the function of the protein
Label 1
Label 0
Enzyme
Non-Enzyme
?
?
.
.
.
Graphs
?
?
?
Collection of Graphs
(e.g. Protein Graphs)
Assume we know the labels of a few graphs
How to predict the labels of the unlabeled graphs?
Features
Graphs
Motif/Graphlet
Feature
Extraction
Model
Learning
Predict Labels of
Unlabeled Graphs
Label 1
Label 0
?
?
?
?
.
.
.
Graphs
?
?
?
Protein Graphs
§ D&D – 1178 protein graphs. Binary labeled as Enzymes vs.
Non. Enzymes
§ MUTAG – 188 mutagenic compounds. Binary labeled
(whether or not they have a mutagenic effect on the Gram-
negative bacterium)
§ 10-fold validation, Support Vector Machine
§ Used 3,4 node motif counts as features
§ Higher-order network analysis
§ Graph classification
§ Higher-order Network Embeddings
input 0 …
1 …
0 …
0
…
2
0
…
1
Feature	
Engineering
features
1 …
1 … 0
0
1
0
0
Learning	
AlgorithmModel
Prediction	Task
Link	prediction
Classification	
Anomaly	detection
input 0 …
1 …
0 …
0
…
2
0
…
1
Feature	
Engineering
features
1 …
1 … 0
0
1
0
0
Learning	
AlgorithmModel
Prediction	Task
Automatic	
Feature	Learning
Link	prediction
Classification	
Anomaly	detection
§ Goal: Learn representation (features) for a set of graph
elements (nodes, edges, etc.)
§ Key intuition: Map the graph elements (e.g., nodes) to the
d-dimension space, while preserving structural similarity
§ Use the features for any downstream prediction task
Given a graph G=(V,E) and a set of T network motifs,
We form the weighted motif adjacency matrices
# of instances of motif Ht that contain nodes i and j
Weighted-Motif Graph
To generalize HONE for any motif-based matrix formulation, we
can define a function
Motif Transition Matrix
Laplacian Matrix
Normalized Laplacian Matrix
Motif Matrix
Formulations
The number of paths weighted by
motif counts from node i to node j in k-
steps is
The probability of transitioning from
node i to node j in k-steps is given by
We derive all k-step motif-based matrices for all T motifs and K steps
,""""for"k=1,…,K""and""t=1,…,T
K-step Motif Matrix
Local Embeddings
Concatenate all k-step node embedding for all T motifs and K steps
and find a ”global” higher-order node embeddings by solving
H
D"×"TKD&
Z
N"×"D
Each row of Z is a D-dimensional embedding of a node
Global Embeddings
Experimental setup
§ 10-fold cross-validation, repeated for 10 random trials
§ Used all 2-4 node connected orbits
§ D=128, Dl = 16 for the local motif embeddings
§ Edge embedding derived via mean function
§ Predict link existence via logistic regression (LR)
§ # steps K selected via grid search over
Main findings:
§ Mean Gain in AUC of 19.24% (& up
to 75.21%)
✓ (1) Motif Counting
Theory and Algorithms for Large Graphs
✓ (2) Machine Learning Applications
for Motif Counting
Motif
Counting
Higher-order
network analysis
Graph Classification
Higher-order
Network Embeddings
Higher-order
Clustering
Role Discovery
§ Framework & Algorithms
• One of the first parallel approaches for graphlet counting
• On average 460x faster than current methods
• Edge-centric computations (only requires access to edge
neighborhood)
• Time and space-efficient
• Sampling/estimation methods
• Local/global counting
§ Applications
• Large-scale graph comparison, classification, and anomaly detection
• Visual analytics and real-time graphlet mining
• Higher-order network embeddings
Code
https://github.com/nkahmed/PGD
Data
http://networkrepository.com
à Email me for questions
§ Efficient Graphlet Counting for Large Networks. ICDM 2015, [Ahmed et al.]
§ Graphlet Decomposition: Framework, Algorithms, and Applications. J. Know. & Info. 2016 [Ahmed et al.]
§ Estimation of Graphlet Counts in Massive Networks. IEEE TNNLS 2018 [Rossi-Zhou-Ahmed]
§ Higher-order Network Representation Learning. WWW 2018 [Rossi-Ahmed-Koh]
§ Network Motifs: Simple Building Blocks of Complex Networks. Science 2002, [Milo et al.]
§ Uncovering Biological Network Function via Graphlet Degree Signatures. Cancer Informatics 2008 [Milenković-Pržulj]
§ Graph Kernels. JMLR 2010, [Vishwanathan et al.]
§ Graph Sample and Hold: A Framework for Big Graph Analytics. KDD 2014 [Ahmed-Duffield-Neville-Kompella]
§ Role Discovery in Networks. IEEE TKDE 2015 [Rossi-Ahmed]
§ The Structure and Function of Complex Networks. SIAM Review 2003, [Newman]
§ Biological network comparison using graphlet degree distribution. Bioinformatics 2007 [Pržulj]
§ Efficient Graphlet Kernels for Large Graph Comparison. AISTAT 2009 [Shervashidze et al.]
§ Revealing Missing Parts of the Interactome via Link Prediction. PLoS ONE 2014 [Hulovatyy-Solava-Milenković]
§ Graft: An Efficient Graphlet Counting Method for Large Graph Analysis. TKDE 2014 [Rahman-Bhuiyan-Hasan]
§ Graph-Based Anomaly Detection, KDD 2003 [Noble-Cook]
§ Local structure in social networks. Sociological methodology 1976, [Holland-Leinhardt]
§ The strength of weak ties: A network theory revisited. Sociological theory 1983 [Granovetter]
§ Overlapping community detection in networks. ACM Computing Surveys 2013 [Xie-Kelley-Szymanski]
§ Automatic large scale generation of internet pop level maps. GLOBECOM 2008 [Feldman et al.]
§ Efficient semi-streaming algorithms for local triangle counting in massive graphs. KDD 2008 [Becchetti et al.]
Thank you!
Questions?
nesreen.k.ahmed@intel.com
http://nesreenahmed.com

More Related Content

What's hot

What's hot (20)

Data communication and networks by B. Forouzan
Data communication and networks by B. ForouzanData communication and networks by B. Forouzan
Data communication and networks by B. Forouzan
 
Data structure - Graph
Data structure - GraphData structure - Graph
Data structure - Graph
 
trees in data structure
trees in data structure trees in data structure
trees in data structure
 
8 queen problem
8 queen problem8 queen problem
8 queen problem
 
Graphs bfs dfs
Graphs bfs dfsGraphs bfs dfs
Graphs bfs dfs
 
B tree
B treeB tree
B tree
 
String matching algorithms
String matching algorithmsString matching algorithms
String matching algorithms
 
B and B+ tree
B and B+ treeB and B+ tree
B and B+ tree
 
Graphs in data structure
Graphs in data structureGraphs in data structure
Graphs in data structure
 
Multi ways trees
Multi ways treesMulti ways trees
Multi ways trees
 
DATA STRUCTURE AND ALGORITHMS
DATA STRUCTURE AND ALGORITHMS DATA STRUCTURE AND ALGORITHMS
DATA STRUCTURE AND ALGORITHMS
 
Data Structures (CS8391)
Data Structures (CS8391)Data Structures (CS8391)
Data Structures (CS8391)
 
Stacks, Queues, Deques
Stacks, Queues, DequesStacks, Queues, Deques
Stacks, Queues, Deques
 
Python tuples and Dictionary
Python   tuples and DictionaryPython   tuples and Dictionary
Python tuples and Dictionary
 
Chaps 1-3-ai-prolog
Chaps 1-3-ai-prologChaps 1-3-ai-prolog
Chaps 1-3-ai-prolog
 
Spanning trees
Spanning treesSpanning trees
Spanning trees
 
Splay Tree
Splay TreeSplay Tree
Splay Tree
 
Linked lists
Linked listsLinked lists
Linked lists
 
Counting Sort and Radix Sort Algorithms
Counting Sort and Radix Sort AlgorithmsCounting Sort and Radix Sort Algorithms
Counting Sort and Radix Sort Algorithms
 
Graph traversals in Data Structures
Graph traversals in Data StructuresGraph traversals in Data Structures
Graph traversals in Data Structures
 

Similar to The Power of Motif Counting Theory, Algorithms, and Applications for Large Graphs

Leveraging Multiple GPUs and CPUs for Graphlet Counting in Large Networks
Leveraging Multiple GPUs and CPUs for  Graphlet Counting in Large Networks Leveraging Multiple GPUs and CPUs for  Graphlet Counting in Large Networks
Leveraging Multiple GPUs and CPUs for Graphlet Counting in Large Networks Ryan Rossi
 
High-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and ModelingHigh-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and ModelingNesreen K. Ahmed
 
Follow the money with graphs
Follow the money with graphsFollow the money with graphs
Follow the money with graphsStanka Dalekova
 
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...PyData
 
Web-Scale Graph Analytics with Apache Spark with Tim Hunter
Web-Scale Graph Analytics with Apache Spark with Tim HunterWeb-Scale Graph Analytics with Apache Spark with Tim Hunter
Web-Scale Graph Analytics with Apache Spark with Tim HunterDatabricks
 
A Tale of Data Pattern Discovery in Parallel
A Tale of Data Pattern Discovery in ParallelA Tale of Data Pattern Discovery in Parallel
A Tale of Data Pattern Discovery in ParallelJenny Liu
 
Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)Vincenzo Gulisano
 
Assessing the Impacts of Uncertainty Propagation to System Requirements by Ev...
Assessing the Impacts of Uncertainty Propagation to System Requirements by Ev...Assessing the Impacts of Uncertainty Propagation to System Requirements by Ev...
Assessing the Impacts of Uncertainty Propagation to System Requirements by Ev...Alejandro Salado
 
IA3_presentation.pptx
IA3_presentation.pptxIA3_presentation.pptx
IA3_presentation.pptxKtonNguyn2
 
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...EUDAT
 
Optimization of Incremental Queries CloudMDE2015
Optimization of Incremental Queries CloudMDE2015Optimization of Incremental Queries CloudMDE2015
Optimization of Incremental Queries CloudMDE2015József Makai
 
Approximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming ApplicationsApproximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming ApplicationsDebasish Ghosh
 
Introduction to Parallel Programming
Introduction to Parallel ProgrammingIntroduction to Parallel Programming
Introduction to Parallel Programmingyashkumarcs20
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
 
Cross-Validation and Big Data Partitioning Via Experimental Design
Cross-Validation and Big Data Partitioning Via Experimental DesignCross-Validation and Big Data Partitioning Via Experimental Design
Cross-Validation and Big Data Partitioning Via Experimental Designdans_salford
 
magellan_mongodb_workload_analysis
magellan_mongodb_workload_analysismagellan_mongodb_workload_analysis
magellan_mongodb_workload_analysisPraveen Narayanan
 
Data structures assignmentweek4b.pdfCI583 Data Structure
Data structures assignmentweek4b.pdfCI583 Data StructureData structures assignmentweek4b.pdfCI583 Data Structure
Data structures assignmentweek4b.pdfCI583 Data StructureOllieShoresna
 
Uwe Friedrichsen - CRDT und mehr - über extreme Verfügbarkeit und selbstheile...
Uwe Friedrichsen - CRDT und mehr - über extreme Verfügbarkeit und selbstheile...Uwe Friedrichsen - CRDT und mehr - über extreme Verfügbarkeit und selbstheile...
Uwe Friedrichsen - CRDT und mehr - über extreme Verfügbarkeit und selbstheile...AboutYouGmbH
 

Similar to The Power of Motif Counting Theory, Algorithms, and Applications for Large Graphs (20)

Leveraging Multiple GPUs and CPUs for Graphlet Counting in Large Networks
Leveraging Multiple GPUs and CPUs for  Graphlet Counting in Large Networks Leveraging Multiple GPUs and CPUs for  Graphlet Counting in Large Networks
Leveraging Multiple GPUs and CPUs for Graphlet Counting in Large Networks
 
High-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and ModelingHigh-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and Modeling
 
Follow the money with graphs
Follow the money with graphsFollow the money with graphs
Follow the money with graphs
 
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...
 
Web-Scale Graph Analytics with Apache Spark with Tim Hunter
Web-Scale Graph Analytics with Apache Spark with Tim HunterWeb-Scale Graph Analytics with Apache Spark with Tim Hunter
Web-Scale Graph Analytics with Apache Spark with Tim Hunter
 
A Tale of Data Pattern Discovery in Parallel
A Tale of Data Pattern Discovery in ParallelA Tale of Data Pattern Discovery in Parallel
A Tale of Data Pattern Discovery in Parallel
 
Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)
 
Assessing the Impacts of Uncertainty Propagation to System Requirements by Ev...
Assessing the Impacts of Uncertainty Propagation to System Requirements by Ev...Assessing the Impacts of Uncertainty Propagation to System Requirements by Ev...
Assessing the Impacts of Uncertainty Propagation to System Requirements by Ev...
 
IA3_presentation.pptx
IA3_presentation.pptxIA3_presentation.pptx
IA3_presentation.pptx
 
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
 
lec6a.ppt
lec6a.pptlec6a.ppt
lec6a.ppt
 
Optimization of Incremental Queries CloudMDE2015
Optimization of Incremental Queries CloudMDE2015Optimization of Incremental Queries CloudMDE2015
Optimization of Incremental Queries CloudMDE2015
 
Approximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming ApplicationsApproximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming Applications
 
Introduction to Parallel Programming
Introduction to Parallel ProgrammingIntroduction to Parallel Programming
Introduction to Parallel Programming
 
User biglm
User biglmUser biglm
User biglm
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
Cross-Validation and Big Data Partitioning Via Experimental Design
Cross-Validation and Big Data Partitioning Via Experimental DesignCross-Validation and Big Data Partitioning Via Experimental Design
Cross-Validation and Big Data Partitioning Via Experimental Design
 
magellan_mongodb_workload_analysis
magellan_mongodb_workload_analysismagellan_mongodb_workload_analysis
magellan_mongodb_workload_analysis
 
Data structures assignmentweek4b.pdfCI583 Data Structure
Data structures assignmentweek4b.pdfCI583 Data StructureData structures assignmentweek4b.pdfCI583 Data Structure
Data structures assignmentweek4b.pdfCI583 Data Structure
 
Uwe Friedrichsen - CRDT und mehr - über extreme Verfügbarkeit und selbstheile...
Uwe Friedrichsen - CRDT und mehr - über extreme Verfügbarkeit und selbstheile...Uwe Friedrichsen - CRDT und mehr - über extreme Verfügbarkeit und selbstheile...
Uwe Friedrichsen - CRDT und mehr - über extreme Verfügbarkeit und selbstheile...
 

More from Nesreen K. Ahmed

Sampling from Massive Graph Streams: A Unifying Framework
Sampling from Massive Graph Streams: A Unifying FrameworkSampling from Massive Graph Streams: A Unifying Framework
Sampling from Massive Graph Streams: A Unifying FrameworkNesreen K. Ahmed
 
Sampling for Approximate Bipartite Network Projection
Sampling for Approximate Bipartite Network ProjectionSampling for Approximate Bipartite Network Projection
Sampling for Approximate Bipartite Network ProjectionNesreen K. Ahmed
 
Representation Learning in Large Attributed Graphs
Representation Learning in Large Attributed GraphsRepresentation Learning in Large Attributed Graphs
Representation Learning in Large Attributed GraphsNesreen K. Ahmed
 
On Sampling from Massive Graph Streams
On Sampling from Massive Graph StreamsOn Sampling from Massive Graph Streams
On Sampling from Massive Graph StreamsNesreen K. Ahmed
 
Graph Sample and Hold: A Framework for Big Graph Analytics
Graph Sample and Hold: A Framework for Big Graph AnalyticsGraph Sample and Hold: A Framework for Big Graph Analytics
Graph Sample and Hold: A Framework for Big Graph AnalyticsNesreen K. Ahmed
 
Fast Graphlet Decomposition: Theory, Algorithms, and Applications
Fast Graphlet Decomposition: Theory, Algorithms, and ApplicationsFast Graphlet Decomposition: Theory, Algorithms, and Applications
Fast Graphlet Decomposition: Theory, Algorithms, and ApplicationsNesreen K. Ahmed
 

More from Nesreen K. Ahmed (6)

Sampling from Massive Graph Streams: A Unifying Framework
Sampling from Massive Graph Streams: A Unifying FrameworkSampling from Massive Graph Streams: A Unifying Framework
Sampling from Massive Graph Streams: A Unifying Framework
 
Sampling for Approximate Bipartite Network Projection
Sampling for Approximate Bipartite Network ProjectionSampling for Approximate Bipartite Network Projection
Sampling for Approximate Bipartite Network Projection
 
Representation Learning in Large Attributed Graphs
Representation Learning in Large Attributed GraphsRepresentation Learning in Large Attributed Graphs
Representation Learning in Large Attributed Graphs
 
On Sampling from Massive Graph Streams
On Sampling from Massive Graph StreamsOn Sampling from Massive Graph Streams
On Sampling from Massive Graph Streams
 
Graph Sample and Hold: A Framework for Big Graph Analytics
Graph Sample and Hold: A Framework for Big Graph AnalyticsGraph Sample and Hold: A Framework for Big Graph Analytics
Graph Sample and Hold: A Framework for Big Graph Analytics
 
Fast Graphlet Decomposition: Theory, Algorithms, and Applications
Fast Graphlet Decomposition: Theory, Algorithms, and ApplicationsFast Graphlet Decomposition: Theory, Algorithms, and Applications
Fast Graphlet Decomposition: Theory, Algorithms, and Applications
 

Recently uploaded

Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGILLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGIThomas Poetter
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxdolaknnilon
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一F La
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 

Recently uploaded (20)

Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGILLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptx
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 

The Power of Motif Counting Theory, Algorithms, and Applications for Large Graphs

  • 1. Joint Work with: Jen Neville Ryan Rossi Nick Duffield
  • 2. - - - - - Social network Human Disease Network [Barabasi 2007] Food Web [2007] Terrorist Network [Krebs 2002]Internet (AS) [2005] Gene Regulatory Network [Decourty 2008] Protein Interactions [breast cancer] Political blogs Power grid
  • 3. (1) Motif Counting Theory and Algorithms for Large Graphs (2) Machine Learning Applications for Motif Counting
  • 4. § Network motifs (graphlets) are small induced subgraphs § Motifs represent patterns of inter-connections, basic elements in complex networks CliqueTriangle CycleEdge
  • 5. A study of motif frequencies in real-world complex networks Applied to food web, genetic, neural, web, and other network - Found distinct motifs in each case
  • 6. Red dashed lines indicate edges that participate in a feed-forward loop Motifs are over-expressed in real complex networks
  • 7. Motifs occur in real networks with frequencies significantly higher than randomly generated networks Real networks often have modular structure Cj# Ci# Ck# Concentration of Motifs in real vs random networks Two Conclusions
  • 8. Nodes/edges are not the fundamental units of real networks Motifs are the building blocks of these networks We should analyze/model graphs in terms of motifs
  • 9. Network Motifs: Simple Building Blocks of Complex Networks – [Milo et. al – Science 2002] The Structure and Function of Complex Networks – [Newman – Siam Review 2003] 2-node Graphlets 3-node Graphlets 4-node Graphlets Connected Disconnected Motifs/graphlets are Small k-vertex induced subgraphs
  • 10.
  • 11. Ex: Given an input graph G - How many triangles in G? - How many cliques of size 4-nodes in G? - How many cycles of size 4-nodes in G?
  • 12. Ex: Given an input graph G - How many triangles in G? - How many cliques of size 4-nodes in G? - How many cycles of size 4-nodes in G? à In practice, we would like to count all k-vertex graphlets
  • 13. § Enumerate all possible graphlets
  • 14. § Enumerate all possible graphlets à Exhaustive enumeration is too expensive
  • 15. § Enumerate all possible graphlets à Exhaustive enumeration is too expensive § Count graphlets for each node – and combine all node counts [Shervashidze et. al – AISTAT 2009]
  • 16. § Enumerate all possible graphlets à Exhaustive enumeration is too expensive § Count graphlets for each node – and combine all node counts à Still expensive for relatively large k [Shervashidze et. al – AISTAT 2009]
  • 17. § Enumerate all possible graphlets à Exhaustive enumeration is too expensive § Count graphlets for each node – and combine all node counts à Still expensive for relatively large k [Shervashidze et. al – AISTAT 2009] § Other recent work counts only connected graphlets of size k=4 [Marcus & Shavitt – Computer Networks 2012] Not practical – scales only for small graphs with few hundred/thousand nodes/edges - taking hours for a graph with 30K nodes
  • 18. Most work focused on graphlets of k=3 nodes In this work, we focus on graphlets of k=3,4 nodes Efficient Graphlet Counting for Large Networks [Ahmed et al., ICDM 2015] Graphlet Decomposition: Framework, Algorithms, and Applications [Ahmed et al., KAIS Journal 2016]
  • 19. Edge-centric, Parallel, Fast, Space-efficient Framework u v v2 v3v1 v4 v6 v7 edge Edge Neighborhood Jointly count motifs
  • 20. Searching Edge Neighborhoods ① For each edge do u v v2 v3v1 v4 v6 v7 edge
  • 21. Searching Edge Neighborhoods ① For each edge do • Count All 3-node graphlets ② Merge counts from all edges u v v2 v3v1 v4 v6 v7 edge Triangle 2-star 1-edge Independent
  • 22. Searching Edge Neighborhoods ① For each edge do • Count All 3-node graphlets ② Merge counts from all edges u v v2 v3v1 V4 v6 v7 edge Triangle 2-star 1-edge Independent q We only need to find/count triangles q Use equations to get counts of others in o(1) Triangle
  • 23. How to count all 4-node graphlets? 4-Clique 4-Cycle4-Chrodal-Cycle Tailed-triangle 4-Path 3-Star 4-node-triangle 4-node-2star 4-node-2edge 4-node-1edge Independent
  • 24. ± 1 edge 4-Node Motif Transition Diagram
  • 25. Step 1 Step 2 Step 3 Searching Edge Neighborhoods For each edge Find the triangles Count 4-node graphlets For each edge Count 4-node cliques and 4-node cycles only Count 4-node graphlets For each edge Use combinatorial relationships to compute counts of other graphlets in constant time Step 4 Merge counts from all edges
  • 26. ± 1 edge 4-Node Motif Transition Diagram
  • 27. 4-Node Graphlet Transition Diagram ± 1 edge 4-Cliques 4-Cycles
  • 28. ± 1 edge 4-Node Motif Transition Diagram Count a few patterns Use relationships & transitions to count all other graphlets in constant time ✓ Fast ✓ Space-Efficient
  • 29. 4-Node Motif Transition Diagram ± 1 edge 4-Cliques 4-Cycles Maximum no. triangles Incident to an edge Maximum no. stars Incident to an edge Count a few patterns Use relationships & transitions to count all other graphlets in constant time
  • 30. T T Relationship between 4-cliques & 4-ChordalCycles 4-Cliques 4-ChordalCycle e T T e No. 4-ChordalCycles No. 4-Cliques Proof in Lemma 1 - Ahmed et al., ICDM 2015
  • 31. T T Relationship between 4-cliques & 4-ChordalCycles T T No. 4-ChordalCycles No. 4-Cliques 4-Cliques 4-ChordalCycle e e Proof in Lemma 1 - Ahmed et al., ICDM 2015
  • 32. 0 1 2 4 8 16 0 5 10 15 Number of Processing Units Speedup socfb−Texas socfb−OR socfb−UCLA socfb−Berkeley13 socfb−MIT socfb−Penn94 0 1 2 4 8 16 0 5 10 15 Number of Processing Units Speedup 0 1 2 4 8 16 0 2 4 6 8 10 12 14 Number of Processing Units Speedup tech−internet−as tech−WHOIS web−it−2004 web−spam 0 1 2 4 8 16 0 2 4 6 8 10 12 14 Number of Processing Units Speedup Strong scaling results Intel Xeon 3.10 Ghz E5-2687W server v3, 16 cores
  • 33. Comparison to RAGE [Marcus & Shavitt – J. Computer Networks 2011] Facebook100 Networks from US Schools Ours RAGE Time in Seconds
  • 34. |V| |E| Ours RAGE Time in Seconds Baseline (RAGE) did not finish for most graphs We take ~4.5 secs for web-google (4.3M edges) We take ~4 secs for inf-road-usa (29M edges) Most motif counts in orders of 106 – 1015
  • 35. § Node-level and edge-level motif counting Role discovery, Relational Learning, Multi-label Classification
  • 36. 0 2 4 6 8 10 x 10 4 0 0.005 0.01 0.015 0.02 0.025 0.03 0.035 0.04 0.045 Edges Timeinseconds 0 2 4 6 8 10 x 10 4 0 0.005 0.01 0.015 0.02 0.025 0.03 0.035 0.04 0.045 Edges Timeinseconds Key Observations The distribution of graphlet runtimes for edge neighborhoods obey a power-law.
  • 37. 0 2 4 6 8 10 x 10 4 0 0.005 0.01 0.015 0.02 0.025 0.03 0.035 0.04 0.045 Edges Timeinseconds 0 2 4 6 8 10 x 10 4 0 0.005 0.01 0.015 0.02 0.025 0.03 0.035 0.04 0.045 Edges Timeinseconds Most edge neighborhoods are fast with runtimes that are approximately equal. Key Observations The distribution of graphlet runtimes for edge neighborhoods obey a power-law.
  • 38. 0 2 4 6 8 10 x 10 4 0 0.005 0.01 0.015 0.02 0.025 0.03 0.035 0.04 0.045 Edges Timeinseconds 0 2 4 6 8 10 x 10 4 0 0.005 0.01 0.015 0.02 0.025 0.03 0.035 0.04 0.045 Edges Timeinseconds HOWEVER, a handful of neighborhoods are hard and take significantly longer. Most edge neighborhoods are fast with runtimes that are approximately equal. Key Observations The distribution of graphlet runtimes for edge neighborhoods obey a power-law. Neighborhood runtimes are power-lawed
  • 39. 0 2 4 6 8 10 x 10 4 0 0.005 0.01 0.015 0.02 0.025 0.03 0.035 0.04 0.045 Edges Timeinseconds 0 2 4 6 8 10 x 10 4 0 0.005 0.01 0.015 0.02 0.025 0.03 0.035 0.04 0.045 Edges Timeinseconds HOWEVER, a handful of neighborhoods are hard and take significantly longer. Most edge neighborhoods are fast with runtimes that are approximately equal. Key Observations The distribution of graphlet runtimes for edge neighborhoods obey a power-law. Neighborhood runtimes are power-lawed QUESTION: How to reduce runtime? à Sampling
  • 40. § [Horvitz-Thompson Estimation 1952] § Using HT estimator we scale the sampled count by probability p Count of graphlet i for edge j Estimated Count of graphlet i for edge j More details in the paper See Lemma 1
  • 41. § [Hoeffding’s Inequality 1963] More details in the paper See Lemma 2
  • 42. Sampled maximum degree in the graph Less than the actual maximum degree
  • 43. Average relative error for top 1000 (highest degree) edges Sampling with replacement p = 0.01 GFD: Graphlet Frequency Distribution Kolmogorov distance (KS) L1 Normalized distance
  • 44. 1 2 4 8 12 16 0 2 4 6 8 10 12 14 16 Number of processing units Speedup socfb−MIT bio−dmela soc−gowalla tech−RL−caida web−wikipedia09 1 2 4 8 12 16 0 2 4 6 8 10 12 14 16 Number of processing units Speedup Strong scaling results Using Intel Xeon E5-2687W v3 server, 16 cores
  • 45. § Unbiased Estimation of Motif Counts 10 4 10 5 0.85 0.9 0.95 1 1.05 1.1 1.15 soc−orkut−dir Sample Size 10 4 10 5 0.85 0.9 0.95 1 1.05 1.1 1.15 soc−orkut−dir Sample Size x/y 10 4 10 5 0.9 0.95 1 1.05 1.1 1.15 soc−flickr Sample Size 10 4 10 5 0.9 0.95 1 1.05 1.1 1.15 soc−flickr Sample Size Estimation of counts of 4-vertex clique [Ahmed et. al, ICDM 2015, KAIS 2016, TNNLS 2018]
  • 46. Counting a few motifs + Storing a few motif counts + Obtain other counts using combinatorial relationships (1) Fast and parallel (2) Space-efficient Two Conclusions Joint Motif Counting for Large Graphs
  • 47. ✓ (1) Motif Counting Theory and Algorithms for Large Graphs (2) Machine Learning Applications for Motif Counting
  • 48. § Biological Networks • network alignment, protein function prediction [Pržulj 2007][Milenković-Pržulj 2008] [Hulovatyy-Solava-Milenković 2014] [Shervashidze et al. 2009][Vishwanathan et al. 2010] § Social Networks • Triad analysis, role discovery, community detection [Granovetter 1983][Holland-Leinhardt 1976][Rossi-Ahmed 2015] [Ahmed et al. 2015][Xie-Kelley-Szymanski 2013] § Internet AS [Feldman et al. 2008] § Spam Detection [Becchetti et al. 2008][Ahmed et al. 2016] - - - - - Useful for various machine learning tasks e.g., Anomaly detection, Role Discovery, Relational Learning, Clustering etc.
  • 49. § Higher-order network analysis § Graph classification § Higher-order Network Embeddings
  • 50. Higher-order network structures § Visualization – “spotting anomalies” [Ahmed et al. ICDM 2015] § Finding large cliques, stars, and other larger network structures [Ahmed et al. KAIS 2015] § Spectral clustering [Benson et al. Science 2016] § Role discovery [Ahmed et al. 2016] ...
  • 51. Ranking by graphlet counts Nodes are colored/weighted by triangle counts Links are colored/weighted by stars of size 4 nodes Leukemia Colon cancer Deafness
  • 52. § Higher-order network analysis § Graph classification § Higher-order Network Embeddings
  • 53. Label 1 Label 0 Enzyme Non-Enzyme Collection of Graphs (e.g. Protein Graphs) . . . Graphs Each Protein is represented by a graph Binary label represents the function of the protein
  • 54. Label 1 Label 0 Enzyme Non-Enzyme ? ? . . . Graphs ? ? ? Collection of Graphs (e.g. Protein Graphs) Assume we know the labels of a few graphs How to predict the labels of the unlabeled graphs?
  • 55. Features Graphs Motif/Graphlet Feature Extraction Model Learning Predict Labels of Unlabeled Graphs Label 1 Label 0 ? ? ? ? . . . Graphs ? ? ? Protein Graphs
  • 56. § D&D – 1178 protein graphs. Binary labeled as Enzymes vs. Non. Enzymes § MUTAG – 188 mutagenic compounds. Binary labeled (whether or not they have a mutagenic effect on the Gram- negative bacterium) § 10-fold validation, Support Vector Machine § Used 3,4 node motif counts as features
  • 57. § Higher-order network analysis § Graph classification § Higher-order Network Embeddings
  • 58. input 0 … 1 … 0 … 0 … 2 0 … 1 Feature Engineering features 1 … 1 … 0 0 1 0 0 Learning AlgorithmModel Prediction Task Link prediction Classification Anomaly detection
  • 59. input 0 … 1 … 0 … 0 … 2 0 … 1 Feature Engineering features 1 … 1 … 0 0 1 0 0 Learning AlgorithmModel Prediction Task Automatic Feature Learning Link prediction Classification Anomaly detection
  • 60. § Goal: Learn representation (features) for a set of graph elements (nodes, edges, etc.) § Key intuition: Map the graph elements (e.g., nodes) to the d-dimension space, while preserving structural similarity § Use the features for any downstream prediction task
  • 61. Given a graph G=(V,E) and a set of T network motifs, We form the weighted motif adjacency matrices # of instances of motif Ht that contain nodes i and j
  • 63. To generalize HONE for any motif-based matrix formulation, we can define a function Motif Transition Matrix Laplacian Matrix Normalized Laplacian Matrix Motif Matrix Formulations
  • 64. The number of paths weighted by motif counts from node i to node j in k- steps is The probability of transitioning from node i to node j in k-steps is given by We derive all k-step motif-based matrices for all T motifs and K steps ,""""for"k=1,…,K""and""t=1,…,T K-step Motif Matrix Local Embeddings
  • 65. Concatenate all k-step node embedding for all T motifs and K steps and find a ”global” higher-order node embeddings by solving H D"×"TKD& Z N"×"D Each row of Z is a D-dimensional embedding of a node Global Embeddings
  • 66. Experimental setup § 10-fold cross-validation, repeated for 10 random trials § Used all 2-4 node connected orbits § D=128, Dl = 16 for the local motif embeddings § Edge embedding derived via mean function § Predict link existence via logistic regression (LR) § # steps K selected via grid search over Main findings: § Mean Gain in AUC of 19.24% (& up to 75.21%)
  • 67. ✓ (1) Motif Counting Theory and Algorithms for Large Graphs ✓ (2) Machine Learning Applications for Motif Counting
  • 69. § Framework & Algorithms • One of the first parallel approaches for graphlet counting • On average 460x faster than current methods • Edge-centric computations (only requires access to edge neighborhood) • Time and space-efficient • Sampling/estimation methods • Local/global counting § Applications • Large-scale graph comparison, classification, and anomaly detection • Visual analytics and real-time graphlet mining • Higher-order network embeddings
  • 71. § Efficient Graphlet Counting for Large Networks. ICDM 2015, [Ahmed et al.] § Graphlet Decomposition: Framework, Algorithms, and Applications. J. Know. & Info. 2016 [Ahmed et al.] § Estimation of Graphlet Counts in Massive Networks. IEEE TNNLS 2018 [Rossi-Zhou-Ahmed] § Higher-order Network Representation Learning. WWW 2018 [Rossi-Ahmed-Koh] § Network Motifs: Simple Building Blocks of Complex Networks. Science 2002, [Milo et al.] § Uncovering Biological Network Function via Graphlet Degree Signatures. Cancer Informatics 2008 [Milenković-Pržulj] § Graph Kernels. JMLR 2010, [Vishwanathan et al.] § Graph Sample and Hold: A Framework for Big Graph Analytics. KDD 2014 [Ahmed-Duffield-Neville-Kompella] § Role Discovery in Networks. IEEE TKDE 2015 [Rossi-Ahmed] § The Structure and Function of Complex Networks. SIAM Review 2003, [Newman] § Biological network comparison using graphlet degree distribution. Bioinformatics 2007 [Pržulj] § Efficient Graphlet Kernels for Large Graph Comparison. AISTAT 2009 [Shervashidze et al.] § Revealing Missing Parts of the Interactome via Link Prediction. PLoS ONE 2014 [Hulovatyy-Solava-Milenković] § Graft: An Efficient Graphlet Counting Method for Large Graph Analysis. TKDE 2014 [Rahman-Bhuiyan-Hasan] § Graph-Based Anomaly Detection, KDD 2003 [Noble-Cook] § Local structure in social networks. Sociological methodology 1976, [Holland-Leinhardt] § The strength of weak ties: A network theory revisited. Sociological theory 1983 [Granovetter] § Overlapping community detection in networks. ACM Computing Surveys 2013 [Xie-Kelley-Szymanski] § Automatic large scale generation of internet pop level maps. GLOBECOM 2008 [Feldman et al.] § Efficient semi-streaming algorithms for local triangle counting in massive graphs. KDD 2008 [Becchetti et al.]