Graphs – rich and powerful data representation
• Social network
• Human Disease Network [Barabasi 2007]
• Food Web [2007]
• Terrorist Network [Krebs 2002]
• Internet (AS) [2005]
• Gene Regulatory Network [Decourty 2008]
• Protein Interactions [breast cancer]
• Political blogs
• Power grid
Graphlets
Small induced subgraphs
• k-graphlets = family of graphlets of size k
• motifs = frequently occurring subgraphs
• Applied to food web, genetic, neural, web, and other networks
• Found distinct graphlets in each case
[Figure: the 2-graphlets, 3-graphlets, and 4-graphlets H1 to H17, grouped into connected and disconnected; numeric labels 0, 0.17, 0.33, 0.5, 0.67, 0.83, 1]
Network Motifs: Simple Building Blocks of Complex Networks [Milo et al. – Science 2002]
The Structure and Function of Complex Networks [Newman – SIAM Review 2003]
Applications of Graphlets
• Biological Networks
⎻ Network alignment, protein function prediction
[Pržulj 2007] [Milenković-Pržulj 2008] [Hulovatyy-Solava-Milenković 2014]
[Shervashidze et al. 2009] [Vishwanathan et al. 2010]
• Social Networks
⎻ Triad analysis, role discovery, community detection
[Granovetter 1983] [Holland-Leinhardt 1976] [Rossi-Ahmed 2015]
[Ahmed et al. 2015] [Xie-Kelley-Szymanski 2013]
• Internet AS [Feldman et al. 2008]
• Spam Detection
[Becchetti et al. 2008] [Ahmed et al. 2016]
Useful for various machine learning tasks,
e.g., anomaly detection, role discovery, relational learning, clustering, etc.
Useful for a variety of ML tasks
• Graph-based anomaly detection
⎻ Unusual/malicious behavior detection
⎻ Emerging event and threat identification, …
• Graph-based semi-supervised learning, classification, …
• Link prediction and relationship strength estimation
• Graph similarity queries
⎻ Find similar nodes, edges, or graphs
• Subgraph detection and matching
Applications: Higher-order network analysis and modeling
Higher-order network structures
• Visualization – “spotting anomalies” [Ahmed et al. ICDM 2014]
• Finding large cliques, stars, and other larger network structures [Ahmed et al. KAIS 2015]
• Spectral clustering [Benson et al. Science 2016]
• Role discovery [Ahmed et al. 2016]
...
How CPUs/GPUs compare
CPU:
• Large memory
• Few fast/powerful processing units
• Handles unbalanced jobs better
• Optimized for general computations
GPU:
• Memory is very limited
• Thousands of smaller processing units
• Performs best with “balanced” workloads
• Optimized for simple repetitive calculations at a very fast rate
Combine advantages of both
Problem: global graphlet counting (macro-level)
INPUT: a large graph G = (V, E), set of graphlets 𝓗
PROBLEM: Find the number of embeddings (appearances) of each graphlet H_k ∈ 𝓗 in G
Given an input graph G:
- How many triangles are in G?
- How many 4-node cliques are in G?
- How many 4-node cycles are in G?
• Many applications require counting all k-vertex graphlets
• Recent research work
- Exact/approximation of global counts [Rahman et al. TKDE14] [Jha et al. WWW15]
- Scalable for massive graphs (billions of nodes/edges) [Ahmed et al. ICDM15, KAIS16]
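To make the three questions above concrete, here is a minimal brute-force sketch in Python. It is only an illustration of what a global count means, not the method of the talk: it inspects every 3-node and 4-node subset, so it is usable on toy graphs only. The function name `count_global` and the example graph are assumptions made for this sketch, and 4-cycles are counted as induced subgraphs, in line with the "small induced subgraphs" definition of graphlets.

```python
from itertools import combinations

def count_global(edges):
    """Brute-force global counts of triangles, 4-cliques and induced 4-cycles."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    nodes = sorted(adj)

    # A triangle is a 3-node subset in which all three pairs are adjacent.
    triangles = sum(
        1 for a, b, c in combinations(nodes, 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )

    cliques4 = cycles4 = 0
    for quad in combinations(nodes, 4):
        # Degree sequence of the subgraph induced by the four nodes.
        degs = sorted(sum(1 for y in quad if y in adj[x]) for x in quad)
        if degs == [3, 3, 3, 3]:      # every pair adjacent: 4-clique
            cliques4 += 1
        elif degs == [2, 2, 2, 2]:    # four edges, no chord: induced 4-cycle
            cycles4 += 1
    return {"triangles": triangles, "4-cliques": cliques4, "4-cycles": cycles4}

# Toy graph (hypothetical): a 4-clique on {0,1,2,3} sharing node 3 with a 4-cycle.
toy = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
       (3, 4), (4, 5), (5, 6), (6, 3)]
print(count_global(toy))  # {'triangles': 4, '4-cliques': 1, '4-cycles': 1}
```

The cost grows with the number of 4-node subsets, which is what makes exhaustive enumeration impractical for large graphs.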
Problem: local graphlet counting (micro-level)
INPUT: a large graph G = (V, E), set of graphlets ℋ
PROBLEM: For each edge i, find the number of occurrences of graphlet H_k that contain edge i, for all k = 1, …, |ℋ|
Applications: role discovery, relational learning, multi-label classification
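As a small illustration of the micro-level output, the sketch below returns one count vector per edge, restricted to two graphlets (triangle and 4-clique) to keep it short. The function name `local_counts` and the dictionary layout are assumptions for this example, not the interface of the framework.

```python
from itertools import combinations

def local_counts(edges):
    """Per-edge counts of the triangles and 4-cliques that contain the edge."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    counts = {}
    for u, v in edges:
        common = adj[u] & adj[v]  # nodes adjacent to both endpoints
        # Each common neighbor closes one triangle on (u, v).
        triangles = len(common)
        # Each adjacent pair of common neighbors closes one 4-clique on (u, v).
        cliques4 = sum(1 for w, r in combinations(sorted(common), 2) if r in adj[w])
        counts[(u, v)] = {"triangle": triangles, "4-clique": cliques4}
    return counts

toy = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
       (3, 4), (4, 5), (5, 6), (6, 3)]
per_edge = local_counts(toy)
print(per_edge[(0, 1)])  # {'triangle': 2, '4-clique': 1}
print(per_edge[(3, 4)])  # {'triangle': 0, '4-clique': 0}
```

Summing the per-edge triangle counts and dividing by 3 recovers the global triangle count, since every triangle is seen once from each of its three edges.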
Current work
Sequential
• Enumerate all possible graphlets
- Exhaustive enumeration is too expensive
• Count graphlets for each node
- Expensive for large k [Shervashidze et al. – AISTAT 2009]
Not practical – scales only for graphs with a few hundred/thousand nodes/edges
[Hočevar et al. – Bioinfo. 13]
Parallel
• Edge-centric graphlet counting (PGD)
⎻ Multi-core CPUs, large graphs
[Ahmed et al. ICDM 14, KAIS 15]
• Node-centric graphlet counting
⎻ Single GPU, handles only tiny graphs (ORCA-GPU)
[Milinković et al.]
Our approach
Hybrid parallel graphlet counting framework that leverages all available CPUs & GPUs
Algorithm classes:
• Single GPU methods
• Multi-GPU methods
• Hybrid CPU-GPU methods
Other key advantages:
• Edge-centric parallelization
⎻ Improved load balancing & lock-free
• Global and local graphlet counts
• Connected and disconnected graphlets
• Fine-grained parallelization
• Space-efficient
Overview of our approach
Edge-centric, parallel, fast, space-efficient framework
[Figure: an edge e = (v, u) with its neighborhood sets S_u, S_v, and T]
T = nodes completing a triangle with edge (v, u)
S_v (S_u) = nodes that form a 2-star with v (u)
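The three sets can be written as plain set operations on the neighbor sets of the two endpoints. The sketch below assumes an adjacency structure stored as a dict of Python sets; the helper names `build_adjacency` and `edge_neighborhood` are made up for this example.

```python
def build_adjacency(edges):
    """Undirected adjacency as a dict mapping node -> set of neighbors."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def edge_neighborhood(adj, v, u):
    """Return (T, S_v, S_u) for the edge (v, u)."""
    T = adj[v] & adj[u]          # adjacent to both endpoints: closes a triangle
    S_v = adj[v] - adj[u] - {u}  # adjacent to v only: 2-star centered at v
    S_u = adj[u] - adj[v] - {v}  # adjacent to u only: 2-star centered at u
    return T, S_v, S_u

edges = [(1, 2), (1, 3), (2, 3), (1, 4), (2, 5), (2, 6)]
adj = build_adjacency(edges)
T, S_v, S_u = edge_neighborhood(adj, 2, 1)
print(sorted(T), sorted(S_v), sorted(S_u))  # [3] [5, 6] [4]
```

Only the neighbor sets of v and u are read, which is why an edge-centric worker needs access to nothing beyond the edge neighborhood.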
Our Approach (edge-centric, parallel, space-efficient)
Step 1: Searching edge neighborhoods
⎻ For each edge, find the triangles
Step 2: Count a few k-graphlets
⎻ For each edge, count only k-cliques, k-cycles, and tailed-triangles
Step 3: Count all other graphlets
⎻ For each edge, use combinatorial relationships to derive counts of other graphlets in constant time, O(1)
Step 4: Merge all counts
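The four steps can be sketched sequentially in Python as follows. This is a simplified illustration under stated assumptions, not the framework's implementation: it covers only four connected 4-graphlets (4-clique, 4-cycle, chordal cycle, 4-path), it leaves out tailed triangles, and the function names are invented for the example. The two derived counts in Step 3 use relationships that follow directly from the set definitions: two nodes of T that are not adjacent form a chordal cycle with the edge as its chord, and a node of S_u and a node of S_v that are not adjacent form a 4-path with the edge in the middle.

```python
from math import comb

def build_adjacency(edges):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def edge_counts(adj, u, v):
    # Step 1: search the edge neighborhood.
    T = adj[u] & adj[v]
    S_u = adj[u] - adj[v] - {v}
    S_v = adj[v] - adj[u] - {u}
    # Step 2: count only the "hard" graphlets by looking at edges.
    cliques = sum(len(adj[w] & T) for w in T) // 2   # edges inside T
    cycles = sum(len(adj[w] & S_v) for w in S_u)     # edges between S_u and S_v
    # Step 3: derive the others in O(1) from set sizes.
    chordal = comb(len(T), 2) - cliques              # (u, v) is the chord
    paths = len(S_u) * len(S_v) - cycles             # (u, v) is the middle edge
    return cliques, cycles, chordal, paths

def count_4_graphlets(edges):
    """edges: each undirected edge listed exactly once."""
    adj = build_adjacency(edges)
    totals = [0, 0, 0, 0]
    for u, v in edges:
        for i, c in enumerate(edge_counts(adj, u, v)):
            totals[i] += c
    # Step 4: merge. A 4-clique is seen from 6 edges and a 4-cycle from 4;
    # a chordal cycle has one chord and a 4-path has one middle edge.
    return {"4-clique": totals[0] // 6, "4-cycle": totals[1] // 4,
            "chordal-cycle": totals[2], "4-path": totals[3]}

toy = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
       (3, 4), (4, 5), (5, 6), (6, 3)]
print(count_4_graphlets(toy))
# {'4-clique': 1, '4-cycle': 1, 'chordal-cycle': 0, '4-path': 6}
```

Because each edge is processed independently and only the final merge touches shared totals, the loop over edges is the natural unit to distribute across workers.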
[Figure: neighborhood runtimes (CPU)]
Key Observations
Neighborhood runtimes are power-lawed: the distribution of graphlet runtimes for edge neighborhoods obeys a power law.
• Most edge neighborhoods are fast, with runtimes that are approximately equal.
• HOWEVER, a handful of neighborhoods are hard and take significantly longer.
QUESTION: What is the “best” way to partition neighborhoods among CPUs and GPUs?
• “Hardness” proxy: edge degree, volume, ...
Our approach
• Order edges by “hardness” and partition into 3 sets:
[Figure: edge neighborhoods Γ(e_1), …, Γ(e_j), …, Γ(e_k), …, Γ(e_M) in hardness order, with the partitions Π_cpu and Π_gpu marked]
• Compute induced subgraphs centered at each edge
• CPU workers: use a hash table for O(1) lookups, O(N)
• GPU workers: use binary search for O(log d) lookups
• When finished, dequeue the next b edges:
⎻ CPU: get b edges from the FRONT of Π_unproc
⎻ GPU: get b edges from the BACK of Π_unproc
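The ordering, the two-ended dequeue, and the two lookup styles can be sketched in a few lines of Python. Everything named here is an assumption for illustration: the hardness proxy is taken to be the sum of the endpoint degrees, the hardest edges are placed at the front so that CPU workers receive them, and the sketch is single-threaded, so it shows the scheduling policy rather than real CPU and GPU workers.

```python
from bisect import bisect_left
from collections import deque

def hardness(edge, degree):
    # Assumed proxy: sum of endpoint degrees (the slides list degree, volume, ...).
    u, v = edge
    return degree[u] + degree[v]

def build_queue(edges, degree):
    """Unprocessed edges, hardest first."""
    return deque(sorted(edges, key=lambda e: hardness(e, degree), reverse=True))

def next_batch(queue, b, worker):
    """CPU workers take b edges from the front (hard), GPU workers from the back (easy)."""
    take = queue.popleft if worker == "cpu" else queue.pop
    return [take() for _ in range(min(b, len(queue)))]

def has_neighbor_hash(neighbor_set, w):
    # CPU-style lookup: hash-based membership test, O(1) on average.
    return w in neighbor_set

def has_neighbor_bsearch(sorted_neighbors, w):
    # GPU-style lookup: binary search in a sorted neighbor array, O(log d).
    i = bisect_left(sorted_neighbors, w)
    return i < len(sorted_neighbors) and sorted_neighbors[i] == w

degree = {"a": 5, "b": 4, "c": 3, "d": 1}
queue = build_queue([("c", "d"), ("a", "b"), ("b", "c"), ("a", "d")], degree)
print(next_batch(queue, 1, "cpu"))  # [('a', 'b')]
print(next_batch(queue, 2, "gpu"))  # [('c', 'd'), ('a', 'd')]
```

Taking from opposite ends means the two worker types never compete for the same region of the ordering until the queue is nearly empty.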
Preprocessing steps
Three simple and efficient preprocessing steps:
1) Sort vertices from smallest to largest degree f(∙) and relabel them s.t. f(v_1) ≤ ⋯ ≤ f(v_N)
2) For each Γ(v_i), ∀i = 1, …, N, order the set of neighbors Γ(v_i) = {…, w_j, …, w_k, …} from smallest to largest degree
3) Given edge (v, u) ∈ E, ensure that f(v) ≥ f(u)
⎻ hence, v is always the vertex with largest degree, d_v ≥ d_u
• None of these steps is required, but they significantly improve performance
• Each step is extremely fast and lends itself to easy parallelization
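A compact sketch of the three steps, assuming the graph arrives as a list of undirected edges with comparable node labels. The function name `preprocess` and the tie-breaking rule (equal degrees are ordered by original label) are choices made for this example.

```python
def preprocess(edges):
    """Return (new_id, adj, oriented) after the three preprocessing steps."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1

    # 1) Relabel vertices so that a smaller id means a smaller (or equal) degree.
    order = sorted(degree, key=lambda x: (degree[x], x))
    new_id = {old: i for i, old in enumerate(order)}

    adj = {i: [] for i in range(len(order))}
    oriented = []
    for a, b in edges:
        x, y = new_id[a], new_id[b]
        adj[x].append(y)
        adj[y].append(x)
        # 3) Store each edge as (v, u) with v the endpoint of larger degree.
        oriented.append((max(x, y), min(x, y)))

    # 2) After relabeling, sorting by id orders neighbors by ascending degree.
    for neighbors in adj.values():
        neighbors.sort()
    return new_id, adj, oriented

new_id, adj, oriented = preprocess([("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")])
print(new_id)    # {'d': 0, 'b': 1, 'c': 2, 'a': 3}
print(adj[3])    # [0, 1, 2]
print(oriented)  # [(3, 1), (3, 2), (3, 0), (2, 1)]
```

Sorted neighbor lists are also what makes the binary-search lookups on GPU workers possible.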
Fine Granularity & Work Stealing
For a single edge (v, u) ∈ E,
I. Compute the sets T and S_u
II. Find the total 4-cliques using T
III. Find the total 4-cycles using S_u
NOTE: (II) and (III) are independent → parallelize
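The independence of (II) and (III) can be shown by submitting them as two separate tasks. The sketch below uses Python's `concurrent.futures.ThreadPoolExecutor` purely to illustrate the task structure; Python threads are not a stand-in for the CPU/GPU workers of the talk, and the helper names are assumptions. The cycle count here reads both S_u and S_v, which is one straightforward way to realize step (III).

```python
from concurrent.futures import ThreadPoolExecutor

def build_adjacency(edges):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def count_cliques(adj, T):
    # (II) 4-cliques on the edge = edges between nodes of T.
    return sum(len(adj[w] & T) for w in T) // 2

def count_cycles(adj, S_u, S_v):
    # (III) 4-cycles on the edge = edges between S_u and S_v.
    return sum(len(adj[w] & S_v) for w in S_u)

def process_edge(adj, v, u, pool):
    # (I) compute the neighborhood sets once; both tasks only read them.
    T = adj[v] & adj[u]
    S_v = adj[v] - adj[u] - {u}
    S_u = adj[u] - adj[v] - {v}
    cliques = pool.submit(count_cliques, adj, T)
    cycles = pool.submit(count_cycles, adj, S_u, S_v)
    return cliques.result(), cycles.result()

toy = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
       (3, 4), (4, 5), (5, 6), (6, 3)]
adj = build_adjacency(toy)
with ThreadPoolExecutor(max_workers=2) as pool:
    print(process_edge(adj, 0, 1, pool))  # (1, 0)
    print(process_edge(adj, 3, 4, pool))  # (0, 1)
```

Neither task writes to shared state, so no locks are needed, which matches the lock-free claim made for the edge-centric design.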
Unrestricted counts
3-graphlets
4-graphlets
D_e = N − (|S_v| + |S_u| + |T|) − 2
Global counts
4-graphlets
3-graphlets
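For the 3-graphlets, the step from per-edge set sizes to global counts can be sketched as below. The tables on these slides did not survive extraction, so the code is a reconstruction from the set definitions and the D_e formula, not a copy of the slide content: D_e is read here as the number of nodes adjacent to neither endpoint of e, which is what N − (|S_v| + |S_u| + |T|) − 2 evaluates to. The function name and the graphlet labels are assumptions.

```python
from math import comb

def three_graphlet_counts(n_nodes, edges):
    """Global counts of all four 3-node graphlets from per-edge set sizes."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    tri_sum = star_sum = one_edge = 0
    for u, v in edges:
        T = adj[u] & adj[v]
        S_u = adj[u] - adj[v] - {v}
        S_v = adj[v] - adj[u] - {u}
        tri_sum += len(T)
        star_sum += len(S_u) + len(S_v)
        # D_e: nodes adjacent to neither endpoint of e.
        one_edge += n_nodes - (len(S_v) + len(S_u) + len(T)) - 2

    triangles = tri_sum // 3   # each triangle is seen from its 3 edges
    two_stars = star_sum // 2  # each 2-star is seen from its 2 edges
    # one_edge needs no division: such a graphlet contains exactly one edge.
    independent = comb(n_nodes, 3) - triangles - two_stars - one_edge
    return {"triangle": triangles, "2-star": two_stars,
            "1-edge": one_edge, "independent": independent}

# Triangle 0-1-2 with a tail 2-3, plus an isolated node 4 (N = 5).
print(three_graphlet_counts(5, [(0, 1), (1, 2), (0, 2), (2, 3)]))
# {'triangle': 1, '2-star': 2, '1-edge': 5, 'independent': 2}
```

The disconnected graphlets come for free: they are derived from N and the set sizes without touching any additional part of the graph.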
Time Complexity
K = number of edges
Δ = max degree
Tmax = max number of triangles incident to an edge in G
Smax = max number of 2-stars incident to an edge in G
Experiments
Connected 4-graphlet frequencies for a variety of the real-world networks investigated from different domains.
Network type:
• Facebook networks
• Social networks
• Interaction networks
• Collaboration networks
• Brain networks
• Web graphs
• Technological/IP networks
• Dense hard benchmark graphs
Validating edge partitioning
[Figure: two panels of runtime per edge neighborhood; x-axis: edge neighborhoods (×10^4, 0 to 6); y-axis: Time (ms), 0 to 0.2; series: CPU, GPU]
• Edges partitioned by “hardness”
• GPUs assigned sparser neighborhoods
• Assigns edge neighborhoods to “best” processor type
• Importance of initial ordering
GPU workers are assigned easy and balanced edge neighborhoods (approximately equal runtimes).
CPU workers are assigned difficult, unbalanced/skewed neighborhoods.
Experiments: Improvement
Runtime improvement over state-of-the-art
GPU: Uses a single multi-core GPU
Multi-GPU: Uses all available GPUs
Hybrid: Leverages all multi-core CPUs & GPUs
Hardware:
• 2 Intel Xeon CPUs (E5-2687): 8 cores (3.10 GHz)
• 8 NVIDIA Titan Black GPUs: 2880 cores (889 MHz), ~6 GB
Improvement: significant at α = 0.01
Mean improvement: GPU 8x, Multi-GPU 40x, Hybrid 126x
Comparing ORCA-GPU methods
Many problems with Orca-GPU:
• No “effective parallelization”, many parts dependent
• Requires synchronization throughout, locks
• No fine-grained parallelization
• Significant improvement over Orca-GPU (at α = 0.01)
[Figure: improvement over Orca-GPU, measured as Orca-GPU runtime / runtime of proposed method]
Varying the edge ordering
Ordering strategy significantly impacts performance
Space-efficient & communication avoidance
Average memory (MB) per GPU for three networks.
Applications
Ranking/spotting Large Cliques via Graphlets
Ranking/spotting Large Stars via Graphlets
Summary
Framework & Algorithms
• Introduced hybrid graphlet counting approach that leverages all available CPUs & GPUs
• First hybrid CPU-GPU approach for graphlet counting
• On average 126x faster than current methods
• Edge-centric computations (only requires access to edge neighborhood)
• Time and space-efficient
Applications
• Visual analytics and real-time graphlet mining
Research generously supported by:
Data: http://networkrepository.com

Leveraging Multiple GPUs and CPUs for Graphlet Counting in Large Networks

  • 1.
  • 2. Graphs: a rich and powerful data representation. Examples: social network; human disease network [Barabasi 2007]; food web [2007]; terrorist network [Krebs 2002]; Internet (AS) [2005]; gene regulatory network [Decourty 2008]; protein interactions [breast cancer]; political blogs; power grid.
  • 7. Graphlets: small induced subgraphs
      k-graphlets = family of graphlets of size k
      motifs = frequently occurring subgraphs
      Applied to food web, genetic, neural, web, and other networks; found distinct graphlets in each case
      [Figure: the 2-graphlets, 3-graphlets, and 4-graphlets H1 to H17, grouped into connected and disconnected, with a scale running from 0 to 1]
      Network Motifs: Simple Building Blocks of Complex Networks [Milo et al. – Science 2002]
      The Structure and Function of Complex Networks [Newman – SIAM Review 2003]
  • 8. Applications of Graphlets
      • Biological networks ⎻ network alignment, protein function prediction [Pržulj 2007] [Milenković-Pržulj 2008] [Hulovatyy-Solava-Milenković 2014] [Shervashidze et al. 2009] [Vishwanathan et al. 2010]
      • Social networks ⎻ triad analysis, role discovery, community detection [Granovetter 1983] [Holland-Leinhardt 1976] [Rossi-Ahmed 2015] [Ahmed et al. 2015] [Xie-Kelley-Szymanski 2013]
      • Internet AS [Feldman et al. 2008]
      • Spam detection [Becchetti et al. 2008] [Ahmed et al. 2016]
      Useful for various machine learning tasks, e.g., anomaly detection, role discovery, relational learning, clustering, etc.
  • 9. Useful for a variety of ML tasks
      • Graph-based anomaly detection
        ⎻ Unusual/malicious behavior detection
        ⎻ Emerging event and threat identification, …
      • Graph-based semi-supervised learning, classification, …
      • Link prediction and relationship strength estimation
      • Graph similarity queries
        ⎻ Find similar nodes, edges, or graphs
      • Subgraph detection and matching
  • 10. Applications: higher-order network analysis and modeling
      Higher-order network structures:
      • Visualization – “spotting anomalies” [Ahmed et al. ICDM 2014]
      • Finding large cliques, stars, and other larger network structures [Ahmed et al. KAIS 2015]
      • Spectral clustering [Jure et al. Science 2016]
      • Role discovery [Ahmed et al. 2016]
      ...
  • 12. How CPUs and GPUs compare
      CPU: Large memory | GPU: Memory is very limited
      CPU: Few fast/powerful processing units | GPU: Thousands of smaller processing units
      CPU: Handles unbalanced jobs better | GPU: Performs best with “balanced” workloads
      CPU: Optimized for general computations | GPU: Optimized for simple repetitive calculations at a very fast rate
      Combine advantages of both
  • 15. Problem: global graphlet counting (macro-level)
      INPUT: a large graph G = (V, E), set of graphlets 𝓗
      PROBLEM: Find the number of embeddings (appearances) of each graphlet H_k ∈ 𝓗 in G
      Given an input graph G:
        - How many triangles are in G?
        - How many 4-node cliques are in G?
        - How many 4-node cycles are in G?
      • Many applications require counting all k-vertex graphlets
      • Recent research work
        - Exact/approximate global counts [Rahman et al. TKDE14] [Jha et al. WWW15]
        - Scalable for massive graphs (billions of nodes/edges) [Ahmed et al. ICDM15, KAIS16]
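To make the three questions on this slide concrete, here is a minimal Python sketch that answers them by brute force over all node subsets. It is not the method presented in the talk (it only pins down what is being counted), and it assumes a simple undirected graph given as an edge list with sortable node labels. The function name `count_small_graphlets` is invented for this example, and 4-cycles are counted as induced subgraphs.

```python
from itertools import combinations


def count_small_graphlets(edges):
    """Brute-force global counts of triangles, 4-cliques and induced 4-cycles."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    nodes = sorted(adj)

    # A triangle is any 3 nodes that are pairwise adjacent.
    triangles = sum(
        1
        for a, b, c in combinations(nodes, 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )

    cliques4 = 0
    cycles4 = 0
    for quad in combinations(nodes, 4):
        # Degree of each node inside the subgraph induced by these 4 nodes.
        deg = [sum(1 for y in quad if y != x and y in adj[x]) for x in quad]
        num_edges = sum(deg) // 2
        if num_edges == 6:
            cliques4 += 1  # all 6 possible edges are present
        elif num_edges == 4 and all(d == 2 for d in deg):
            cycles4 += 1  # 4 edges and every node has degree 2: a 4-cycle
    return {"triangle": triangles, "4-clique": cliques4, "4-cycle": cycles4}
```

The cost grows with the number of 4-node subsets, so this is only usable on tiny graphs; it serves as a reference against which faster counters can be checked.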
  • 16. Problem: local graphlet counting (micro-level)
      INPUT: a large graph G = (V, E), set of graphlets ℋ
      PROBLEM: For each edge i, find the number of occurrences of H_k that contain edge i, for all k = 1, …, |ℋ|
      Applications: role discovery, relational learning, multi-label classification
  • 19. Current work
      Sequential:
      • Enumerate all possible graphlets [Shervashidze et al. – AISTATS 2009]
        - Exhaustive enumeration is too expensive
      • Count graphlets for each node [Hočevar et al. – Bioinfo. 13]
        - Expensive for large k
      Not practical – scales only for graphs with a few hundred/thousand nodes/edges
      Parallel:
      • Edge-centric graphlet counting (PGD) [Ahmed et al. ICDM 14, KAIS 15]
        ⎻ Multi-core CPUs, large graphs
      • Node-centric graphlet counting (ORCA-GPU) [Milinković et al.]
        ⎻ Single GPU, handles only tiny graphs
  • 21. Our approach: hybrid parallel graphlet counting framework that leverages all available CPUs & GPUs
      Algorithm classes: single-GPU methods, multi-GPU methods, hybrid CPU-GPU methods
      Other key advantages:
      • Edge-centric parallelization
        ⎻ Improved load balancing & lock-free
      • Global and local graphlet counts
      • Connected and disconnected graphlets
      • Fine-grained parallelization
      • Space-efficient
  • 22. Overview of our approach
  • 23. Overview: edge-centric, parallel, fast, space-efficient framework
      [Figure: an edge (v, u) and its neighborhood, with the sets Sv, Su, and T marked]
      T = nodes completing a triangle with edge (v, u)
      Sv (Su) = nodes that form a 2-star with v (u)
  • 27. Our approach (edge-centric, parallel, space-efficient): searching edge neighborhoods
      Step 1: For each edge, find the triangles
      Step 2: Count a few k-graphlets. For each edge, count only k-cliques, k-cycles, and tailed-triangles
      Step 3: Count all other graphlets. For each edge, use combinatorial relationships to derive the counts of the other graphlets in constant time, O(1)
      Step 4: Merge all counts
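The four steps can be sketched for a handful of graphlets as follows. This is a minimal, sequential Python illustration, not the authors' implementation. It assumes a simple undirected graph with each edge listed once, and it interprets Sv (Su) as the neighbors of v (u) that are not adjacent to the other endpoint. Only two of the combinatorial relationships are shown, both of which follow directly from the definitions: pairs of triangle nodes are either adjacent (a 4-clique) or not (a chordal 4-cycle), and pairs drawn from Sv × Su are either adjacent (a 4-cycle) or not (a 4-path). Tailed-triangles are omitted, and all function and key names are invented for this example.

```python
from itertools import combinations


def build_adj(edges):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def edge_counts(adj, v, u):
    # Step 1: triangle nodes T, plus the 2-star sets S_v and S_u.
    T = adj[v] & adj[u]
    S_v = adj[v] - T - {u}
    S_u = adj[u] - T - {v}

    # Step 2: count only 4-cliques and 4-cycles by searching the neighborhood.
    cliques4 = sum(1 for w, r in combinations(T, 2) if r in adj[w])
    cycles4 = sum(1 for w in S_v for r in S_u if r in adj[w])

    # Step 3: derive further counts in O(1) from set sizes and Step 2 results.
    t = len(T)
    chordal_cycle = t * (t - 1) // 2 - cliques4  # (v, u) is the chord
    path4 = len(S_v) * len(S_u) - cycles4        # (v, u) is the middle edge
    return {"triangle": t, "4-clique": cliques4, "4-cycle": cycles4,
            "chordal_cycle": chordal_cycle, "path4": path4}


def global_counts(edges):
    adj = build_adj(edges)
    per_edge = [edge_counts(adj, v, u) for v, u in edges]

    def total(key):
        return sum(c[key] for c in per_edge)

    # Step 4: merge. A triangle is seen from 3 edges, a 4-clique from 6 and a
    # 4-cycle from 4; chordal cycles and 4-paths are seen from exactly one
    # edge each (the chord and the middle edge), so no division is needed.
    return {"triangle": total("triangle") // 3,
            "4-clique": total("4-clique") // 6,
            "4-cycle": total("4-cycle") // 4,
            "chordal_cycle": total("chordal_cycle"),
            "path4": total("path4")}
```

Because `edge_counts` reads only the neighborhood of its own edge and writes nothing shared, the per-edge calls are independent, which is what makes an edge-centric scheme easy to parallelize.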
  • 31. Key observations: neighborhood runtimes are power-lawed
      [Figure: neighborhood runtimes (CPU)]
      The distribution of graphlet runtimes for edge neighborhoods obeys a power law.
      Most edge neighborhoods are fast, with runtimes that are approximately equal.
      HOWEVER, a handful of neighborhoods are hard and take significantly longer.
      QUESTION: What is the “best” way to partition neighborhoods among CPUs and GPUs?
      • “hardness” proxy → edge degree, volume, ...
  • 33. Our approach
      • Order edges by “hardness” and partition into 3 sets: Πcpu, Πunproc, Πgpu
        [Figure: the ordered edge neighborhoods Γ(e1), …, Γ(ej), …, Γ(ek), …, Γ(eM), with Πcpu at one end and Πgpu at the other]
      • Compute induced subgraphs centered at each edge
      • CPU workers: use hash table for O(1) lookups, O(N)
      • GPU workers: use binary search for O(log d) lookups
      • When finished, dequeue next b edges:
        ⎻ CPU: get b edges from FRONT of Πunproc
        ⎻ GPU: get b edges from BACK of Πunproc
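The double-ended dequeue and the two lookup styles on this slide can be sketched as follows. This is a single-threaded Python illustration under stated assumptions, not the framework itself: the hardness proxy is taken to be the sum of the endpoint degrees (the slides list edge degree and volume as candidate proxies), the hardest edges are placed at the front, and the names `order_by_hardness`, `take_batch`, `has_edge_hashed`, and `has_edge_sorted` are invented for this example. A real implementation would also need synchronization around the shared queue.

```python
from bisect import bisect_left
from collections import deque


def order_by_hardness(edges, degree):
    """Return the edges as a deque, hardest first (assumed proxy: degree sum)."""
    return deque(
        sorted(edges, key=lambda e: degree[e[0]] + degree[e[1]], reverse=True)
    )


def take_batch(queue, b, worker):
    """CPU workers take up to b edges from the front, GPU workers from the back."""
    take = queue.popleft if worker == "cpu" else queue.pop
    batch = []
    while queue and len(batch) < b:
        batch.append(take())
    return batch


def has_edge_hashed(adj_sets, a, b):
    """CPU-style lookup: hash-based set membership, O(1) on average."""
    return b in adj_sets[a]


def has_edge_sorted(adj_lists, a, b):
    """GPU-style lookup: binary search in a sorted neighbor list, O(log d)."""
    row = adj_lists[a]
    i = bisect_left(row, b)
    return i < len(row) and row[i] == b
```

With this layout the two worker types consume the same queue from opposite ends, so the split point between CPU and GPU work does not have to be fixed in advance.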
  • 37. Preprocessing steps
      Three simple and efficient preprocessing steps:
      1) Sort vertices from smallest to largest degree f(∙) and relabel them s.t. f(v1) ≤ ⋯ ≤ f(vN)
      2) For each Γ(vi), ∀i = 1, …, N, order the set of neighbors Γ(vi) = {…, wj, …, wk, …} from smallest to largest degree
      3) Given edge (v, u) ∈ E, ensure that f(v) ≥ f(u)
         ⎻ hence, v is always the vertex with the larger degree, dv ≥ du
      • None of these steps is required, but they significantly improve performance
      • Each step is extremely fast and lends itself to easy parallelization
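The three preprocessing steps can be sketched in a few lines of Python. This is a minimal illustration that assumes a simple undirected graph given as an edge list with each edge listed once; the function name `preprocess` and the tie-breaking rule (by original label) are choices made for this example. Once the vertices are relabeled in degree order, sorting a neighbor list by new label is the same as sorting it by degree, so step 2 reduces to a plain sort.

```python
def preprocess(edges):
    """Relabel by degree, sort neighbor lists, and orient each edge as (v, u)."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1

    # Step 1: new labels 0..N-1 in non-decreasing degree order
    # (ties broken by original label, an arbitrary choice).
    order = sorted(degree, key=lambda x: (degree[x], x))
    new_id = {old: i for i, old in enumerate(order)}

    adj = [[] for _ in order]
    oriented = []
    for a, b in edges:
        x, y = new_id[a], new_id[b]
        adj[x].append(y)
        adj[y].append(x)
        # Step 3: put the larger label first, so deg(v) >= deg(u).
        oriented.append((max(x, y), min(x, y)))

    # Step 2: labels follow degree order, so sorting by label sorts by degree.
    for row in adj:
        row.sort()
    return new_id, adj, oriented
```

Each loop touches every vertex or edge independently, which is consistent with the slide's remark that the steps lend themselves to easy parallelization.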
  • 38. Fine granularity & work stealing
      For a single edge (v, u) ∈ E:
      I. Compute the sets T and Su
      II. Find the total 4-cliques using T
      III. Find the total 4-cycles using Su
      NOTE: (II) and (III) are independent → parallelize
  • 39. Unrestricted counts (3-graphlets, 4-graphlets): De = N − (|Sv| + |Su| + |T|) − 2
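Read together with the definitions of T, Sv, and Su on slide 23, the expression for De appears to count the nodes that are adjacent to neither endpoint of the edge e = (v, u): the N nodes split into the two endpoints, the triangle nodes T, the 2-star nodes Sv and Su, and everything else. The Python sketch below checks that reading against a direct count. The interpretation, the function names, and the adjacency-dictionary input format are assumptions made for this example.

```python
def nonadjacent_count(adj, v, u):
    """D_e = N - (|S_v| + |S_u| + |T|) - 2 for the edge (v, u)."""
    n = len(adj)
    T = adj[v] & adj[u]        # common neighbors of v and u
    S_v = adj[v] - T - {u}     # neighbors of v only
    S_u = adj[u] - T - {v}     # neighbors of u only
    return n - (len(S_v) + len(S_u) + len(T)) - 2


def nonadjacent_count_bruteforce(adj, v, u):
    """Directly count nodes other than v, u that touch neither endpoint."""
    return sum(
        1
        for w in adj
        if w not in (v, u) and w not in adj[v] and w not in adj[u]
    )
```

Under this reading, De is also the number of disconnected 3-node graphlets (one edge plus an isolated node) that contain e, which is why only set sizes are needed to obtain it.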
  • 41. Time complexity
      K = number of edges
      Δ = max degree
      Tmax = max number of triangles incident to an edge in G
      Smax = max number of 2-stars incident to an edge in G
  • 43. Connected 4-graphlet frequencies for a variety of the real-world networks investigated from different domains.
      Network types: Facebook networks, social networks, interaction networks, collaboration networks, brain networks, web graphs, technological/IP networks, dense hard benchmark graphs
  • 44. Validating edge partitioning
      [Figure: two plots of time (ms) against edge neighborhoods, for CPU and GPU workers]
      • Edges partitioned by “hardness”
      • GPUs assigned sparser neighborhoods
      • Assigns edge neighborhoods to “best” processor type
      • Importance of initial ordering
      GPU workers are assigned easy & balanced edge neighborhoods (approx. equal runtimes); CPU workers are assigned difficult, unbalanced/skewed neighborhoods
  • 47. Experiments: runtime improvement over state-of-the-art
      Improvement: significant at α = 0.01
      Mean improvement: 8x (GPU), 40x (Multi-GPU), 126x (Hybrid)
      GPU: uses a single multi-core GPU
      Multi-GPU: uses all available GPUs
      Hybrid: leverages all multi-core CPUs & GPUs
      Hardware:
      • 2 Intel Xeon CPUs (E5-2687): 8 cores (3.10 GHz)
      • 8 Titan Black NVIDIA GPUs: 2880 cores (889 MHz), ~6 GB
  • 48. Comparing ORCA-GPU methods
      Many problems with Orca-GPU:
      • No “effective parallelization”, many parts dependent
      • Requires synchronization throughout, locks
      • No fine-grained parallelization
      • Significant improvement over Orca-GPU (at α = 0.01)
      [Figure: improvement over Orca-GPU, measured as Orca-GPU runtime / runtime of proposed method]
  • 49. Varying the edge ordering Ordering strategy significantly impacts performance
  • 50. Space-efficient & comm. avoidance Average memory (MB) per GPU for three networks.
  • 58. Summary
      Framework & Algorithms
      • Introduced hybrid graphlet counting approach that leverages all available CPUs & GPUs
      • First hybrid CPU-GPU approach for graphlet counting
      • On average 126x faster than current methods
      • Edge-centric computations (only requires access to edge neighborhood)
      • Time and space-efficient
      Applications
      • Visual analytics and real-time graphlet mining
  • 59. Research generously supported by: Data: http://networkrepository.com