Presentation for the paper "C-SAW: A Framework for Graph Sampling and Random Walk on GPUs", published at SC20.
Paper link: https://arxiv.org/pdf/2009.09103.pdf
C-SAW: A Framework for Graph Sampling and Random Walk on GPUs
1. C-SAW: A Framework for Graph Sampling and Random Walk on GPUs
Santosh Pandey, Lingda Li, Adolfy Hoisie, Xiaoye S. Li, Hang Liu
Source code: https://github.com/concept-inversion/C-SAW
2. Mining Large Graphs
• Graphs: a natural representation of data; present everywhere.
• A plethora of algorithms exists to extract information from graphs:
  • Graph embedding
  • Graph visualization
  • Graph neural networks
• Real-world graphs are large: millions/billions of vertices and edges.
• Applying these algorithms directly incurs huge storage requirements and computational expense.
• Extracting information from a large graph is challenging.
3. Graph Sampling and Random Walk (RW)
• Reduces computational complexity and memory requirements.
• Pipeline: from the graph G(V, E), (1) generate samples/RWs, (2) train a model, (3) compute embeddings.
Reference: https://towardsdatascience.com/graph-embeddings-the-summary-cc6075aba007
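The sampling step of the pipeline above can be sketched on the CPU. This is a minimal, illustrative Python version of an unbiased random walk; the toy graph, seed, and function name are assumptions for illustration and not part of C-SAW:

```python
import random

def random_walk(adj, start, length, seed=0):
    """Generate one fixed-length random walk by repeatedly
    moving to a uniformly chosen neighbor of the current vertex."""
    rng = random.Random(seed)
    walk = [start]
    for _ in range(length - 1):
        neighbors = adj[walk[-1]]
        if not neighbors:          # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk

# Toy graph: vertex -> list of neighbors
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walk = random_walk(adj, start=0, length=5)
print(walk)
```

Many independent walks like this one (one per GPU thread, in C-SAW's setting) form the reduced representation that downstream training consumes.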
5. Framework for Graph Sampling and RW
Challenge 1: no generic framework.
• KnightKing: a distributed framework for random walk.
• GraphSAINT: a sampler for generating graph embeddings.

Framework        | Sampling algorithms* | RW algorithms | GPU support
KnightKing       | ❌                   | ✅            | ❌
GraphSAINT       | ❌                   | ✅            | ❌
C-SAW (our work) | ✅                   | ✅            | ✅

* Traversal-based graph sampling algorithms.
• C-SAW addresses these limitations and allows implementation with a few lines of code.
6. Sampling Example with C-SAW: Multi-dimensional RW (Challenge 1)
• Start from a randomly generated frontier set (FrontierPool_t = {8, 0, 3}).
• Sample a frontier vertex with a bias — VERTEXBIAS() (Frontier_t = 8).
• Sample a neighbor vertex without a bias — EDGEBIAS() (neighbor 7 from the NeighborPool {5, 7, 9, 10, 11}); edge (8, 7) joins the sampled edges.
• Replace the frontier vertex with the sampled neighbor — UPDATE() (FrontierPool_t+1 = {0, 3, 7}).
Almost all graph sampling/RW algorithms can be defined with a similar flow; they differ only in 1) the bias and 2) the method used to update the frontier set.
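The shared flow above can be sketched in plain Python. This is a sequential CPU-side illustration of the loop, with callbacks named after the C-SAW API (VERTEXBIAS, EDGEBIAS, UPDATE); the signatures, toy graph, and seed are assumptions for illustration, not C-SAW's actual GPU interface:

```python
import random

def sample(adj, frontier, num_steps, vertex_bias, edge_bias, update, seed=0):
    """Generic traversal loop shared by sampling/RW algorithms:
    pick a frontier vertex (biased), pick one of its neighbors
    (biased or unbiased), then let the update callback rewrite
    the frontier pool."""
    rng = random.Random(seed)
    frontier = list(frontier)
    sampled_edges = []
    for _ in range(num_steps):
        # 1) biased pick of a frontier vertex (VERTEXBIAS)
        v = rng.choices(frontier, weights=[vertex_bias(u) for u in frontier])[0]
        # 2) pick one neighbor of v (EDGEBIAS)
        nbrs = adj[v]
        u = rng.choices(nbrs, weights=[edge_bias(v, w) for w in nbrs])[0]
        sampled_edges.append((v, u))
        # 3) user-defined frontier update (UPDATE);
        #    multi-dimensional RW replaces v with u
        frontier = update(frontier, v, u)
    return sampled_edges

# Toy graph and frontier for a multi-dimensional RW
adj = {8: [0, 3, 2], 0: [8, 3], 3: [8, 0, 2], 2: [8, 3]}
edges = sample(
    adj, frontier=[8, 0, 3], num_steps=4,
    vertex_bias=lambda v: len(adj[v]),   # frontier pick biased by degree
    edge_bias=lambda v, w: 1.0,          # unbiased neighbor pick
    update=lambda f, v, u: [u if x == v else x for x in f],
)
print(edges)
```

Swapping the three callbacks yields different sampling/RW algorithms, which is the point of the single generic loop.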
7. C-SAW Framework (Challenge 1)
• User programming interface: VERTEXBIAS(), EDGEBIAS(), UPDATE().
• Simple and expressive; supports existing and emerging algorithms.
• Hides the complex implementation from users.
• The MAIN function is optimized for GPU.
8. Sampling More Than 1 Neighbor (Challenge 2)
• Objective: sample 2 (out of 5) neighbors of the red vertex (8) with a bias.
• Each GPU thread samples independently and concurrently — fast sampling.
• But threads 1 and 2 can sample the same vertex (7): a selection collision.
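A quick CPU-side Python sketch of why collisions arise: two independent draws over 5 neighbors pick the same one roughly 20% of the time, while drawing without replacement never collides. The helper names, seed, and trial count are illustrative; C-SAW's actual GPU collision-resolution strategy is described in the paper:

```python
import random

def sample_independent(neighbors, k, rng):
    """Each 'thread' draws independently -> duplicates are possible."""
    return [rng.choice(neighbors) for _ in range(k)]

def sample_without_replacement(neighbors, k, rng):
    """Collision-free selection of k distinct neighbors."""
    return rng.sample(neighbors, k)

rng = random.Random(42)
neighbors = [5, 7, 9, 10, 11]   # the 5 neighbors of vertex 8 in the slides

# Independent draws collide with probability 1/5 for k=2 of 5 neighbors
collisions = sum(
    len(set(sample_independent(neighbors, 2, rng))) < 2 for _ in range(1000)
)
print(f"collisions in 1000 independent trials: {collisions}")

picked = sample_without_replacement(neighbors, 2, rng)
assert len(set(picked)) == 2   # distinct by construction
```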
11. Out-of-memory Sampling (Challenge 3)
The graph is split into partitions (P1, P2, P3) on the CPU; assume only 2 partitions can fit in GPU memory at a time. For frontier {0, 2, 8}:
1. Workload-aware scheduling: count the active frontier vertices per partition (frontier queues {0, 2} for P1, ɸ for P2, {8} for P3) and order partitions by queue size.
2. Transfer the scheduled partitions to the GPU.
3. Workload balancing: kernel K1 samples the resident partitions and routes each sampled vertex (e.g., 7, 5, 4) into the frontier queue of the partition that owns it; K1 exits.
4. Kernel K2 processes the remaining frontier queues (e.g., {7, 5} and {3}) in the same manner.
Repeat until every frontier queue is { ɸ }.
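The out-of-memory flow above can be sketched sequentially in Python. The partition map, depth bound, and function names are illustrative stand-ins for C-SAW's actual GPU kernels and CPU-GPU transfers:

```python
import random

def out_of_memory_sample(adj, partition_of, num_parts, frontier, depth, seed=0):
    """Sketch of partition-based sampling: only one partition is
    'resident' at a time; always process the partition whose frontier
    queue is largest (workload-aware scheduling), and route newly
    sampled vertices to the queue of the partition that owns them."""
    rng = random.Random(seed)
    queues = [[] for _ in range(num_parts)]
    for v in frontier:
        queues[partition_of[v]].append((v, 0))
    sampled = []
    while any(queues):
        # pick the partition with the most active frontier vertices
        p = max(range(num_parts), key=lambda i: len(queues[i]))
        batch, queues[p] = queues[p], []            # "load" partition p
        for v, d in batch:
            if d >= depth or not adj[v]:            # depth bound / dead end
                continue
            u = rng.choice(adj[v])                  # sample one neighbor
            sampled.append((v, u))
            queues[partition_of[u]].append((u, d + 1))  # route to owner
    return sampled

# Toy graph and a 2-way partition of its vertices
adj = {0: [5, 7], 2: [3], 8: [5, 7], 5: [], 7: [], 3: []}
partition_of = {0: 0, 2: 0, 3: 0, 5: 1, 7: 1, 8: 1}
edges = out_of_memory_sample(adj, partition_of, 2, frontier=[0, 2, 8], depth=2)
print(edges)
```

Processing the largest queue first amortizes each partition transfer over as many frontier vertices as possible, which is the intent of the workload-aware schedule.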
12. Experimental Setup
• Comparison metric: sampled edges per second (SEPS) = #SampledEdges / Time.
• Tests performed on the Summit supercomputer at ORNL: 6 NVIDIA Tesla V100 GPUs (16 GB each), dual-socket 22-core POWER9 CPUs.
• 10 datasets.
13. Comparing with the State of the Art
• Length of the RW: 2000; number of sampling instances/walks: 4000; frontier size for multi-dimensional RW: 2000.
• C-SAW vs. KnightKing (biased RW): 10x speedup with 1 GPU, 14.7x with 6 GPUs.
• C-SAW vs. GraphSAINT (multi-dimensional RW): 8.1x speedup with 1 GPU, 11.5x with 6 GPUs.
[Charts: million SEPS per dataset (AM, AS, CP, FS, LJ, OR, RE, TW, WG, YE) for KnightKing and GraphSAINT vs. C-SAW on 1 and 6 GPUs; annotated peaks at 95 and 135 million SEPS.]
14. Scalability of C-SAW with Multiple GPUs
• Neighbor sampling with 8000 instances.
• Up to 5.2x speedup with 6 GPUs.
[Chart: speedup with 1–6 GPUs on datasets AM, AS, CP, FR, LJ, OR, RE, TW, WG, YE.]
• More detailed evaluation in the paper.
15. Conclusion
• First GPU-based framework for graph sampling and RW.
• Outperforms the state-of-the-art works by 14.7x (KnightKing) and 11.5x (GraphSAINT).
• Efficient out-of-memory sampling for handling larger graphs.
• Future work: adding support for more sampling algorithms; improving the sampling techniques.
Source code: https://github.com/concept-inversion/C-SAW
16. Acknowledgement
Thank you. Please cite this work if it was useful to you:
Pandey, Santosh, et al. "C-SAW: A framework for graph sampling and random walk on GPUs." SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2020.
Editor's Notes
Graphs provide a natural representation for most data and are widely used. They are mostly used to represent social networks, web networks, knowledge graphs, etc. (N)
A plethora of algorithms exists for mining crucial information from graphs. Generating embeddings, visualizing graph data, and graph neural networks are some example algorithms. (N) But real-world graphs are large, and applying these algorithms directly over large graphs incurs huge storage requirements and computational cost. (N) Hence, extracting information from a large graph is challenging.
Graph sampling and random walk algorithms are used to overcome this challenge. Instead of directly applying algorithms over a large graph G (N), we can generate samples or random walks. As samples and random walks are reduced representations of the graph, the memory and computational requirements for processing them are also low (N). Multiple instances of samples or random walks can be used instead of the whole graph, allowing applications to train (N) and perform prediction or compute embeddings from large graphs.
Let’s take a deeper look at how we generate samples or random walks. We start from a randomly selected source vertex 8 and randomly traverse immediate neighbors for a certain number of steps (N). Here, vertices 5, 7, 9, 10 and 11 are the immediate neighbors of vertex 8. The transition probability for each neighbor can be defined as the degree of that vertex divided by the sum of the degrees of all neighbor vertices. We call this biased edge transition. For vertex 8, the sum of the degrees of all neighbor vertices is 15. For example, as the degree of vertex 7 is 6, its transition probability is 6/15. For a random transition (N), we generate a random number between 0 and 1 (N), which is denoted by a dice roll here. Based upon the random number, (N) we select a neighbor. We repeat the process from vertex 7 for a certain number of steps to generate samples or random walks.
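The biased edge transition described above can be sketched in a few lines. The talk only states that vertex 7 has degree 6 and that the neighbor degrees of vertex 8 sum to 15; the remaining degrees below are assumed for illustration so the sum comes out to 15.

```python
import random

# Immediate neighbors of vertex 8 from the talk's example graph.
neighbors = [5, 7, 9, 10, 11]
# Degree of vertex 7 is 6 (from the talk); the other degrees are
# assumed so that the degrees sum to 15, as stated in the talk.
degree = {5: 3, 7: 6, 9: 2, 10: 2, 11: 2}

def biased_transition(neighbors, degree, rng=random):
    """Pick one neighbor with probability degree(v) / sum of neighbor degrees."""
    total = sum(degree[v] for v in neighbors)
    r = rng.random() * total      # the "dice roll", scaled to the degree sum
    cumulative = 0
    for v in neighbors:
        cumulative += degree[v]
        if r < cumulative:
            return v
    return neighbors[-1]

# Transition probability of vertex 7 is 6/15, as in the talk.
print(degree[7] / sum(degree.values()))  # 0.4
```

Scaling the random number by the degree sum and scanning the cumulative degrees is equivalent to drawing from the normalized transition probabilities, and avoids a division per neighbor.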
Moving on, let's have an overview of the available frameworks for graph sampling and random walks. A framework allows us to implement sampling and random walk algorithms with just a few lines of code. Two related works propose frameworks for sampling and random walk. (N) The first one is GraphSAINT, whose focus is on defining samplers for generating graph embeddings (N). The second one is KnightKing, which is a distributed framework for random walk algorithms (N). Moving towards the limitations, (N) both GraphSAINT and KnightKing lack support for the majority of traversal-based graph sampling algorithms and also do not support GPUs. (N) This brings us to our first challenge: building a generic framework supporting all algorithms. (N) Our proposed framework C-SAW is, to our knowledge, the first framework able to support both sampling and random walk algorithms with GPU support.
Let’s see how C-SAW can be used to implement these algorithms, with multi-dimensional random walk as an example. (N) This random walk starts with a randomly generated frontier set. We have vertices 8, 0 and 3 in the frontier set. First, (N) we sample a frontier vertex with vertex degree as the bias. Here, the sampled frontier vertex is 8. Next, (N) we gather the neighbors of vertex 8 (N) and sample one neighbor with equal probability, i.e., unbiased selection. The sampled neighbor is 7, which is also added to the sampled edge list. Next, (N) we update the frontier by replacing vertex 8 in the frontier pool with vertex 7. (N) Almost all sampling and random walk algorithms can be defined with a similar flow, (N) but they differ in two things: first, how the bias is defined, and second, the method to update the frontier set. (N) Now, let's see how we can use C-SAW APIs to implement these algorithms. (N) Our first API, VertexBias, defines how a frontier vertex is sampled from the frontier pool. (N) The second API, EdgeBias, defines how a neighbor is sampled from the neighbor pool, and (N) our third API, Update, defines how the frontier is updated based upon the sampled neighbor.
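The three-API flow above can be sketched as callbacks driving one sampling step. The names follow the talk (VertexBias, EdgeBias, Update), but this Python skeleton is only illustrative; C-SAW's actual interface is a GPU implementation, and the adjacency list below is an assumed reconstruction of the example graph.

```python
import random

def sample_step(graph, frontier, vertex_bias, edge_bias, update, rng=random):
    """One sampling step driven by the three user-defined APIs."""
    # VertexBias: pick a frontier vertex according to a user-defined bias.
    weights = [vertex_bias(v, graph) for v in frontier]
    src = rng.choices(frontier, weights=weights, k=1)[0]
    # EdgeBias: pick one neighbor of the chosen vertex.
    nbrs = graph[src]
    nbr_weights = [edge_bias(src, v, graph) for v in nbrs]
    dst = rng.choices(nbrs, weights=nbr_weights, k=1)[0]
    # Update: let the user decide how the frontier evolves.
    update(frontier, src, dst)
    return (src, dst)  # the sampled edge

# Assumed adjacency list consistent with the talk's figure.
graph = {8: [5, 7, 9, 10, 11], 0: [6, 10], 3: [1, 2, 4],
         7: [3, 4, 5, 6, 8, 9], 5: [4, 7, 8], 9: [7, 8],
         10: [0, 8], 11: [8, 12], 6: [0, 7],
         1: [3], 2: [3], 4: [3, 5, 7], 12: [11]}

# Multi-dimensional random walk, as in the talk's example:
# VertexBias = degree, EdgeBias = uniform, Update = replace src with dst.
frontier = [8, 0, 3]
edge = sample_step(graph, frontier,
                   vertex_bias=lambda v, g: len(g[v]),
                   edge_bias=lambda u, v, g: 1.0,
                   update=lambda f, u, v: f.__setitem__(f.index(u), v))
```

Swapping the three lambdas is all it takes to express a different algorithm, which is the point the talk makes about the expressiveness of the API.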
Looking at the overview of the C-SAW framework, (N) C-SAW provides three simple and expressive APIs, or user programming interfaces, which can support existing algorithms and have the flexibility to support emerging ones. (N) The main function uses these APIs for the implementation and is optimized for GPU. (N) Users do not need to know about the complex implementation of C-SAW to define sampling and random walk algorithms. With this, we address our first challenge of a generic framework.
Moving on to the next challenge: while most random walk algorithms sample only a single neighbor, some graph sampling algorithms like layer sampling or neighbor sampling need to sample more than one neighbor of a vertex with a bias. Here, in this figure, our objective is to sample 2 neighbors of vertex 8 based upon a bias. (N) For faster sampling, we use two different threads to sample each neighbor. Each thread samples independently and concurrently. (N) But with biased sampling, different threads could sample the same neighbor. Here, both threads try to sample the same vertex 7, (N) which is not allowed if we do not want any duplicates. We term this duplication selection collision, which is another challenge not addressed by previous work.
One solution for selection collision could be updated sampling. In this method, we update the transition probability after sampling each neighbor. (N) Here, we first sampled neighbor 7 randomly. Then, (N) vertex 7 is removed from the neighbor list and the transition probability is updated for the remaining neighbors. (N) We perform sampling again with the updated transition probability. This time we sampled neighbor 11. As sampled vertices are removed, we avoid selection collision. (N) But updating the transition probability after each sample is very costly. (N) Another solution for selection collision is repeated sampling. In this technique, we repeat the sampling until we acquire unique neighbors. (N) First, each thread samples a neighbor independently. As there is a selection collision for a thread on vertex 7, (N) one thread repeats the sampling process. There is again a selection collision, as vertex 7 is sampled again. (N) Finally, in another repetition, the thread is able to sample a unique neighbor. This method also solves the problem of selection collision but (N) may require a high number of repetitions. (N) We propose an efficient solution inspired by both updated sampling and repeated sampling, which we term bipartite region search, or BRS. (N) If a selection collision occurs during sampling, (N) we update the random number in such a way that we jump over vertex 7. The update corresponds to a virtual removal of vertex 7 from the neighbor list. (N) Then, we repeat sampling with the updated random number in a new region. With this technique, (N) we reduce the number of repetitions for sampling and avoid the costly update. With BRS, C-SAW solves challenge 2 more efficiently. (N) More details on how BRS works can be found in the paper.
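The two baseline strategies above can be sketched as follows. These are sequential CPU sketches of updated sampling and repeated sampling only; BRS itself is not reproduced here (see the paper for the actual algorithm), and the degrees other than vertex 7's are assumed.

```python
import random

neighbors = [5, 7, 9, 10, 11]
degree = {5: 3, 7: 6, 9: 2, 10: 2, 11: 2}  # degree of 7 is 6; others assumed

def updated_sampling(neighbors, degree, k, rng=random):
    """Updated sampling: remove each sampled vertex and recompute the
    transition probabilities. Collision-free, but the recomputation after
    every sample is what the talk calls costly on a GPU."""
    pool = list(neighbors)
    sampled = set()
    for _ in range(k):
        weights = [degree[v] for v in pool]
        v = rng.choices(pool, weights=weights, k=1)[0]
        sampled.add(v)
        pool.remove(v)  # the "update" step: shrink the pool, reweight next draw
    return sampled

def repeated_sampling(neighbors, degree, k, rng=random):
    """Repeated sampling: keep re-drawing until k distinct neighbors are
    found. A duplicate draw is simply a wasted repetition, which is why the
    repetition count can grow for skewed biases."""
    sampled = set()
    weights = [degree[v] for v in neighbors]  # never updated
    while len(sampled) < k:
        sampled.add(rng.choices(neighbors, weights=weights, k=1)[0])
    return sampled

picked = repeated_sampling(neighbors, degree, k=2)
```

BRS sits between the two: like repeated sampling it never rebuilds the probability table, and like updated sampling a collided vertex is (virtually) taken out of play by adjusting the random number.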
As discussed earlier, real-world graphs can be very large. (N) The average memory of recent GPUs is 16 GB, with some V100 models having up to 32 GB. (N) The size of the graph in CSR format for graphs like Friendster and Twitter is 29 GB and 22 GB respectively, which is larger than the average memory size of recent GPUs. Even with a GPU memory size of 32 GB, graphs like ClueWeb12 and UK-2014 have more than 100 GB of space requirement. (N) This brings us to a third challenge: handling graphs larger than the memory size of the GPU. (N) For sampling and random walk, we observe that the entire graph is not required to be stored in GPU memory. We only need the active frontiers and their immediate neighbors for each step of sampling. (N) This motivates our solution: out-of-memory sampling with 1D partitioning.
Let’s see an example of out-of-memory sampling with C-SAW. (N) Assume we are sampling randomly generated source vertices 0, 2 and 8. (N) We partition the graph by equally assigning the vertices to different partitions P1, P2 and P3. Each color represents a partition in the figure. (N) At first, we have the frontiers on the CPU side. (N) Then, we determine the active frontier vertices for each partition. Here, P1, P2 and P3 have 2, 0 and 1 active vertices respectively. Our first optimization, (N) workload-aware scheduling, determines which partitions to sample based upon the workload. The partition with the higher workload is scheduled earlier, as this helps to reduce the overall partition transfers to the GPU. (N) Assuming we can only sample two partitions, we select P1 and P3. (N) Our next optimization, workload balancing, allocates computational resources based upon the workload. As the ratio of workload is 2:1 for P1 and P3, the computational resources are also allocated in the ratio of 2:1. (N) Then, the partitions are transferred to the GPU side for sampling. (N) Each partition has its own frontier queue in GPU memory. (N) After the first round of sampling, P1 has one frontier, P2 has two frontiers (7, 5), and P3 has 0 frontiers. Each selected partition continues sampling in an iteration as long as it has some frontiers. As P3 does not have any active frontier, sampling terminates for P3. (N) In the next round, only P1 is sampled, which results in the addition of one frontier vertex to P2, and sampling terminates for P1. As sampling for both P1 and P3 has terminated, one iteration is completed with 3 vertices in the frontier queue of P2. (N) For determining the next partitions and the resource allocation, the queue size of each partition is passed to the CPU side. (N) We repeat this process until the frontier is empty. With this out-of-memory sampling, we address challenge 3.
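The host-side decisions in this example, which partitions to transfer and how to split compute among them, can be sketched as below. The function names and the block-count resource model are hypothetical simplifications; the counts match the slide's example (P1, P2, P3 with 2, 0 and 1 active vertices, two partitions resident at once).

```python
def schedule_partitions(active_counts, max_resident):
    """Workload-aware scheduling: transfer the partitions with the most
    active frontier vertices first (they amortize the CPU-to-GPU transfer
    best), up to the number that fits in GPU memory. Partitions with no
    active frontiers are skipped entirely."""
    busy = [p for p, n in active_counts.items() if n > 0]
    busy.sort(key=lambda p: active_counts[p], reverse=True)
    return busy[:max_resident]

def balance_resources(selected, active_counts, total_blocks):
    """Workload balancing: split compute resources (modeled here as thread
    blocks) in proportion to each selected partition's active-frontier count."""
    total = sum(active_counts[p] for p in selected)
    return {p: total_blocks * active_counts[p] // total for p in selected}

# Slide example: P1, P2, P3 have 2, 0 and 1 active vertices, and only
# two partitions fit in GPU memory -> P1 and P3 are chosen, resources 2:1.
counts = {"P1": 2, "P2": 0, "P3": 1}
chosen = schedule_partitions(counts, max_resident=2)       # ["P1", "P3"]
blocks = balance_resources(chosen, counts, total_blocks=6)  # {"P1": 4, "P3": 2}
```

On the real system these queue sizes come back from the GPU after each round, so the same two decisions are re-made every iteration until all frontier queues are empty.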
Moving on to the evaluation, we perform all tests on the Summit supercomputer at Oak Ridge National Laboratory. (N) Each node is equipped with 6 V100 GPUs with 16 GB memory and dual-socket 22-core POWER9 CPUs. (N) We use 10 different datasets for evaluation. More details on the graph datasets and their sizes can be found in the paper. (N) For comparing our work with related works, we use sampled edges per second, or SEPS in short. It is computed as the total number of sampled edges divided by the total time for sampling.
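The SEPS metric is simple enough to compute directly; the numbers below are hypothetical, chosen only to illustrate the formula.

```python
def seps(sampled_edges, seconds):
    """Sampled edges per second: total sampled edges / total sampling time."""
    return sampled_edges / seconds

# Hypothetical run: 8 million edges sampled in 0.2 s -> 40 million SEPS.
print(seps(8_000_000, 0.2))  # 40000000.0
```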
For comparing with state-of-the-art methods, we compare our results with KnightKing and GraphSAINT. For KnightKing, we use biased random walk, and for GraphSAINT we use multi-dimensional random walk. (N) The length of the random walk is 2000 and the number of sampling instances or walks is 4000 for all comparisons. (N) The graphs show million sampled edges per second achieved for different datasets. Compared with KnightKing, we achieve 10x and 14.7x speedup with 1 GPU and 6 GPUs respectively. (N) Compared with GraphSAINT, we achieve 8.1x and 11.5x speedup with 1 GPU and 6 GPUs respectively. For multi-dimensional random walk, we use a frontier size of 2000.
Next, we compare the scalability of C-SAW with multiple GPUs using the neighbor sampling algorithm with 8000 instances of samples. (N) The graph shows the speedup achieved with different graph datasets. (N) We achieve up to 5.2x speedup with 6 GPUs. (N) A more detailed evaluation of C-SAW, along with profiling of each optimization, can be found in the paper.
In conclusion, C-SAW is the first generic GPU-based framework for both graph sampling and random walk. C-SAW outperforms the state-of-the-art implementations KnightKing and GraphSAINT by 14.7x and 11.5x respectively. C-SAW provides efficient out-of-memory sampling for handling larger graphs. For future improvements, we leave adding support for more sampling algorithms and improving the existing sampling techniques. The source code of C-SAW is open-sourced on GitHub. Please check out our paper if you find this work interesting.
This work was supported by NSF and the Department of Energy. Thank you.