C-SAW: A Framework for Graph Sampling and Random Walk on GPUs
Santosh Pandey, Lingda Li, Adolfy Hoisie, Xiaoye S. Li, Hang Liu
Source code: https://github.com/concept-inversion/C-SAW
Mining Large Graphs
• Graphs: natural representation of data; present everywhere.
• Large graphs: millions/billions of vertices and edges.
• Algorithms: plethora of algorithms to mine graphs.
• Extracting information
• Graph embedding
• Graph visualization
• Graph neural networks
Huge storage requirement? Computationally expensive?
Extracting information from a large graph is challenging.
Graph Sampling and Random Walk (RW)
Reduces computational complexity and memory requirement.
Pipeline: G (V, E) → 1. Samples/RWs → 2. Train model → 3. Compute embeddings
Reference: https://towardsdatascience.com/graph-embeddings-the-summary-cc6075aba007
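Graph Sampling and RW
Biased edge transition.
[Figure: example graph with vertices 0 to 12. The neighbors 5, 9, 10, 11, and 7 of vertex 8 are labeled with transition probabilities 3/15, 2/15, 2/15, 2/15, and 6/15; the dice roll selects neighbor 7.]
*Dice: Random number (0,1)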
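The biased edge transition above can be sketched in a few lines of Python. This is a minimal sketch, assuming the dice roll is mapped to a neighbor by inverse transform sampling over cumulative probabilities; the function name `biased_pick` and the sample random number 0.5 are illustrative and not taken from the slides.

```python
import bisect
from itertools import accumulate

def biased_pick(neighbors, biases, r):
    """Pick one neighbor using a random number r in [0, 1)."""
    total = sum(biases)
    # Cumulative transition probabilities: 3/15, 9/15, 11/15, 13/15, 15/15.
    cumulative = [c / total for c in accumulate(biases)]
    # Index of the first region whose upper bound exceeds r.
    # The clamp guards against floating-point rounding at the top end.
    index = min(bisect.bisect_right(cumulative, r), len(neighbors) - 1)
    return neighbors[index]

neighbors = [5, 7, 9, 10, 11]  # neighbors of vertex 8 on the slide
biases = [3, 6, 2, 2, 2]       # numerators of the slide's probabilities

# 0.5 lies between 3/15 and 9/15, which is the region of neighbor 7.
print(biased_pick(neighbors, biases, 0.5))
```

Neighbor 7 owns the widest region (6/15), so it is the most likely outcome of a uniform dice roll.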
Framework for Graph Sampling and RW
• Allows implementation with a few lines of code.
• KnightKing: a distributed framework for random walk.
• GraphSAINT: a sampler for generating graph embeddings.
Challenge 1: No generic framework
Framework     Sampling algorithms*   RW algorithms   GPU support
KnightKing    ❌                     ✅              ❌
GraphSAINT    ❌                     ✅              ❌
C-SAW         ✅                     ✅              ✅
Limitations: KnightKing, GraphSAINT. Our work: C-SAW.
* Traversal-based graph sampling algorithms.
Sampling Example with C-SAW
Challenge 1
• Multi-dimensional RW
• Randomly generated frontier set.
• Sample a frontier vertex (Biased).
• Sample a neighbor vertex (Unbiased).
• Replace frontier with sampled neighbor.
C-SAW API: VERTEXBIAS(), EDGEBIAS(), UPDATE()
Example: FrontierPool_t = {8, 0, 3}; Frontier_t = 8; NeighborPool = {5, 7, 9, 10, 11}; sampled neighbor 7; FrontierPool_t+1 = {0, 3, 7}. Sampled edge: (8, 7).
Almost all graph sampling/RW algorithms can be defined with a similar flow, but they differ in 1) the bias and 2) the method used to update the frontier set.
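The flow above can be sketched in Python with three user-supplied hooks. This is only an illustration of the idea, not the C-SAW API itself: the Python signatures, the degree-based vertex bias, and the toy graph are assumptions (only the neighbor list of vertex 8 comes from the slide).

```python
import random

# Hypothetical toy graph as adjacency lists; only vertex 8's neighbors are from the slide.
GRAPH = {
    0: [1, 10], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 8],
    7: [8], 8: [5, 7, 9, 10, 11], 9: [8], 10: [0, 8], 11: [8, 12], 12: [11],
}

def vertex_bias(graph, v):
    # Biased frontier selection: weight a frontier vertex by its degree (assumed choice).
    return len(graph[v])

def edge_bias(graph, v, u):
    # Unbiased neighbor selection: every edge gets the same weight.
    return 1.0

def update(frontier_pool, slot, sampled_neighbor):
    # Replace the chosen frontier vertex with the sampled neighbor.
    frontier_pool[slot] = sampled_neighbor

def multi_dimensional_rw(graph, frontier_pool, steps, rng):
    pool = list(frontier_pool)
    sampled_edges = []
    for _ in range(steps):
        # Sample one frontier vertex according to vertex_bias.
        weights = [vertex_bias(graph, v) for v in pool]
        slot = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        v = pool[slot]
        # Sample one neighbor of that vertex according to edge_bias.
        neighbors = graph[v]
        edge_weights = [edge_bias(graph, v, n) for n in neighbors]
        u = rng.choices(neighbors, weights=edge_weights, k=1)[0]
        sampled_edges.append((v, u))
        update(pool, slot, u)
    return pool, sampled_edges

pool, edges = multi_dimensional_rw(GRAPH, [8, 0, 3], steps=4, rng=random.Random(0))
print(pool, edges)
```

Swapping in different `vertex_bias`, `edge_bias`, and `update` functions changes the algorithm without touching the main loop, which is the point the slide makes about the shared flow.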
C-SAW Framework
Challenge 1
• Simple and expressive; supports existing/emerging algorithms.
• Hides complex implementation from users.
• Optimized for GPU.
(a) User programming interface: VERTEXBIAS(), EDGEBIAS(), UPDATE()
(b) MAIN function
Sampling More Than One Neighbor
Challenge 2
• Objective: Sample 2 (out of 5) neighbors of the red vertex (8) with a bias.
• Threads sample independently and concurrently: fast sampling.
• But thread 1 and thread 2 sampled the same vertex (7): selection collision.
[Figure: example graph with vertices 0 to 12; Thread1 and Thread2 each roll the dice and both select neighbor 7.]
Solutions for Selection Collision
Challenge 2
1. Updated sampling: after 7 is selected, the probabilities of the remaining neighbors change from 3/15, 2/15, 2/15, 2/15 to 3/9, 2/9, 2/9, 2/9 before the next dice roll. Costly update.
2. Repeated sampling: the probabilities stay at 3/15, 6/15, 2/15, 2/15, 2/15 and the dice is rolled again after a collision. High repetition.
Proposed: Bipartite region search (BRS). Update the random number to jump from 7, then repeat sampling in the new region. Reduces repetition; cheap update.
[Figures: example graph with vertices 0 to 12 and the transition probabilities of vertex 8's neighbors for each approach.]
*More details in the paper.
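One way to read "update the random number to jump from 7" is sketched below in Python. It is an interpretation under stated assumptions, not the authors' implementation: on a collision, a fresh random number is scaled to the space left after removing the collided region and then shifted past that region if it lands to its right. The names `build_ctps`, `pick`, and `sample_k_distinct` are hypothetical, and all biases are assumed to be positive.

```python
import bisect
import random
from itertools import accumulate

def build_ctps(biases):
    # Cumulative transition probabilities, ending at (about) 1.0.
    total = sum(biases)
    return [c / total for c in accumulate(biases)]

def pick(ctps, r):
    # Region index that contains r; the clamp guards against rounding.
    return min(bisect.bisect_right(ctps, r), len(ctps) - 1)

def sample_k_distinct(biases, k, rng):
    """Select k distinct indices; assumes positive biases and k <= len(biases)."""
    assert 0 < k <= len(biases)
    ctps = build_ctps(biases)
    selected = []
    while len(selected) < k:
        i = pick(ctps, rng.random())
        while i in selected:
            # Collision: region [lo, hi) was already selected.
            lo = ctps[i - 1] if i > 0 else 0.0
            hi = ctps[i]
            delta = hi - lo
            # Draw in the remaining space of width 1 - delta ...
            r = rng.random() * (1.0 - delta)
            # ... and jump over the collided region when landing to its right.
            if r >= lo:
                r += delta
            i = pick(ctps, r)
        selected.append(i)
    return selected

neighbors = [5, 7, 9, 10, 11]
picks = sample_k_distinct([3, 6, 2, 2, 2], 2, random.Random(7))
print([neighbors[i] for i in picks])
```

The cumulative array is built once and never recomputed, which corresponds to the "cheap update" on the slide; the jump skips the region that just collided, so fewer dice rolls are wasted than with plain repeated sampling.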
Sampling Large Graph
Challenge 3
• GPU memory: limited memory.
GPU            Memory
RTX 2080 Ti    11 GB
P100           16 GB
V100           16/32 GB
• Graph size: larger graph.
Graph          Graph size (CSR)
Friendster     29 GB
Twitter        22 GB
clueweb12      162 GB
Uk-2014        180 GB
Observation: Entire graph not required.
Solution: Out-of-memory sampling with 1D partition.
[Figure: GPU and CPU; Frontier = {0, 2, 8}.]
Out-of-memory Sampling
Challenge 3
1. Workload-aware scheduling (queue size = #active frontier vertices).
2. Transfer partition.
3. Workload balancing.
Repeat steps 1, 2, 3 until the frontier is { ɸ }.
Example: Frontier: {0, 2, 8}. The frontier queues of partitions P1, P2, P3 hold {0, 2}, ɸ, and {8}, so the queue sizes are 2, 0, 1. P1 and P3 are transferred; kernels K1 and K2 run until they exit.
[Figure: example graph with vertices 0 to 12 and the frontier queues of P1, P2, P3 before and after kernels K1 and K2 exit.]
*Assume 2 partitions can fit in GPU.
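The scheduling step can be sketched in Python as follows. This is a minimal sketch under assumptions: the partition boundaries, the function names, and the "largest queue first" rule are illustrative choices that reproduce the slide's example, not a description of the actual C-SAW scheduler.

```python
import bisect

def build_frontier_queues(frontier, boundaries):
    """Bucket frontier vertices into per-partition queues for a 1D vertex-range partition."""
    queues = [[] for _ in range(len(boundaries) - 1)]
    for v in frontier:
        # Partition p owns the vertices in [boundaries[p], boundaries[p + 1]).
        p = bisect.bisect_right(boundaries, v) - 1
        queues[p].append(v)
    return queues

def schedule(queues, gpu_slots):
    """Pick up to gpu_slots non-empty partitions, largest frontier queue first."""
    order = sorted(range(len(queues)), key=lambda p: len(queues[p]), reverse=True)
    return [p for p in order if queues[p]][:gpu_slots]

# Hypothetical vertex ranges: P1 = 0..3, P2 = 4..7, P3 = 8..12.
boundaries = [0, 4, 8, 13]
queues = build_frontier_queues([0, 2, 8], boundaries)
print(queues)               # queue sizes 2, 0, 1 as on the slide
print(schedule(queues, 2))  # indices 0 and 2, i.e. P1 and P3
```

Partitions with an empty frontier queue (P2 here) are never transferred, which reflects the observation that the entire graph is not required on the GPU at once.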
Experimental Setup
• Comparison metric: sampled edges per second (SEPS).
SEPS = #SampledEdges / Time
• Tests performed on the Summit supercomputer at ORNL.
• 6 NVIDIA Tesla V100 GPUs (16 GB), dual-socket 22-core POWER9 CPUs.
• 10 datasets.
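As a small illustration of the metric, the Python sketch below computes SEPS from an edge count and a wall-clock time. The helper names `seps` and `timed_seps` and the numbers in the usage line are made up for illustration; they are not measurements from the talk.

```python
import time

def seps(num_sampled_edges, seconds):
    """Sampled edges per second: #SampledEdges / Time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_sampled_edges / seconds

def timed_seps(sampler):
    """Run a zero-argument sampler that returns a list of edges and report its SEPS."""
    start = time.perf_counter()
    edges = sampler()
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very fast runs.
    return seps(len(edges), max(elapsed, 1e-9))

# 8 million edges in half a second is 16 million SEPS (illustrative numbers).
print(seps(8_000_000, 0.5) / 1e6)
```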
Comparing with The State-of-the-Art
• Length of the RW: 2000
• Number of sampling instances/walks: 4000
• Frontier size for multi-dimensional RW: 2000
C-SAW vs KnightKing (Biased RW): speedup of 10x (1 GPU), 14.7x (6 GPUs).
C-SAW vs GraphSAINT (Multi-dimensional RW): speedup of 8.1x (1 GPU), 11.5x (6 GPUs).
[Figure: million SEPS on datasets AM, AS, CP, FS, LJ, OR, RE, TW, WG, YE for KnightKing, C-SAW (1 GPU), and C-SAW (6 GPUs); axis from 0 to 60, with values 95 and 135 annotated.]
[Figure: million SEPS on datasets AM, AS, CP, FS, LJ, OR, RE, TW, WG, YE for GraphSAINT, C-SAW (1 GPU), and C-SAW (6 GPUs).]
Scalability of C-SAW with Multiple GPUs
• Neighbor sampling with 8000 instances
• 5.2x with 6 GPUs
[Figure: speedup with 1 to 6 GPUs on datasets AM, AS, CP, FR, LJ, OR, RE, TW, WG, YE.]
*More detailed evaluation in the paper.
Conclusion
• First GPU-based framework for graph sampling and RW.
• Outperforms the state-of-the-art works KnightKing and GraphSAINT by 14.7x and 11.5x, respectively.
• Efficient out-of-memory sampling for handling larger graphs.
• Future work:
Adding support for more sampling algorithms.
Improving the sampling techniques.
Source code: https://github.com/concept-inversion/C-SAW
Acknowledgement
Thank you. Please cite this work if it was useful for you.
Pandey, Santosh, et al. "C-SAW: A framework for graph sampling and random walk on
GPUs." SC20: International Conference for High Performance Computing, Networking,
Storage and Analysis. IEEE, 2020.

C-SAW: A Framework for Graph Sampling and Random Walk on GPUs

  • 1. C-SAW: A Framework for Graph Sampling and Random Walk on GPUs Santosh Pandey , Lingda Li , Adolfy Hoisie , Xiaoye S. Li , Hang Liu Source code: https://github.com/concept-inversion/C-SAW
  • 2. Mining Large Graphs • Graphs: natural representation of data; present everywhere. • Large graphs: millions/billions of vertices and edges. • Algorithms for extracting information: a plethora of algorithms to mine graphs (graph embedding, graph visualization, graph neural networks). • Huge storage requirement? Computationally expensive? Extracting information from a large graph is challenging.
  • 3. Graph Sampling and Random Walk (RW) Reduces computational complexity and memory requirement. Pipeline: G(V, E) → 1) Samples/RWs → 2) Train model → 3) Compute embeddings. Reference: https://towardsdatascience.com/graph-embeddings-the-summary-cc6075aba007
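  • 4. Graph Sampling and RW [Figure: example graph with vertices 0 to 12; the neighbors 5, 7, 9, 10 and 11 of vertex 8 carry transition probabilities 3/15, 2/15, 2/15, 2/15 and 6/15, and vertex 7 is selected.] Biased edge transition. *Dice: random number in (0, 1).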
  • 5. Framework for Graph Sampling and RW • Allows implementation with a few lines of code. • GraphSAINT: sampler for generating graph embedding. • KnightKing: a distributed framework for random walk. Limitations vs. our work (Challenge 1: no generic framework):
      Framework  | Sampling algorithms* | RW algorithms | GPU support
      KnightKing | ❌ | ✅ | ❌
      GraphSAINT | ❌ | ✅ | ❌
      C-SAW      | ✅ | ✅ | ✅
      * Traversal-based graph sampling algorithms.
  • 6. Sampling Example with C-SAW (Challenge 1) • Multi-dimensional RW • Randomly generated frontier set. • Sample a frontier vertex (biased). • Sample a neighbor vertex (unbiased). • Replace frontier with sampled neighbor. Example: FrontierPool(t) = {8, 0, 3}; Frontier(t) = 8; NeighborPool = {5, 7, 9, 10, 11}; sampled edge (8, 7); FrontierPool(t+1) = {0, 3, 7}. C-SAW API: VERTEXBIAS(), EDGEBIAS(), UPDATE(). Almost all graph sampling/RW algorithms can be defined with a similar flow, but they differ in 1) the bias and 2) the method used to update the frontier set.
  • 7. C-SAW Framework (Challenge 1) (a) User programming interface: VERTEXBIAS(), EDGEBIAS(), UPDATE(). Simple and expressive; supports existing/emerging algorithms. (b) MAIN function: optimized for GPU. Hides complex implementation from users.
  • 8. Sampling More Than One Neighbor (Challenge 2) • Objective: sample 2 (out of 5) neighbors of the red vertex (8) with a bias. • Thread 1 and Thread 2 sample independently and concurrently: fast sampling. • But both threads sampled the same vertex (7): selection collision. [Figure: example graph with vertices 0 to 12.]
  • 9. Solutions for Selection Collision (Challenge 2) 1. Updated sampling: after vertex 7 is sampled, the transition probabilities 3/15, 6/15, 2/15, 2/15, 2/15 are updated to 3/9, 2/9, 2/9, 2/9, and vertex 11 is sampled next. Costly update. 2. Repeated sampling: sampling is repeated after colliding on vertex 7 until a unique neighbor (10) is selected. High repetition. Proposed: bipartite region search (BRS). Update the random number to jump from 7, then repeat sampling in the new region (selecting 10). Reduces repetition; cheap update. *More details in the paper.
  • 10. Sampling Large Graph (Challenge 3) • GPU memory: limited. • Graph size: larger than GPU memory.
      GPU         | Memory
      RTX 2080 Ti | 11 GB
      P100        | 16 GB
      V100        | 16/32 GB
      Graph      | Graph size (CSR)
      Friendster | 29 GB
      Twitter    | 22 GB
      clueweb12  | 162 GB
      Uk-2014    | 180 GB
      Observation: the entire graph is not required. Solution: out-of-memory sampling with 1D partition.
  • 11. Out-of-memory Sampling (Challenge 3) Frontier: {0, 2, 8}, held on the CPU. Steps: workload-aware scheduling (#active frontier vertices for P1, P2, P3: 2, 0, 1); workload balancing; transfer of partitions P1 and P3 to the GPU. Kernels K1 and K2 run on the GPU, and the queue size of each partition is returned to the CPU. Repeat until the frontier is {ɸ}. *Assume 2 partitions can fit in GPU. [Figure: example graph with vertices 0 to 12 and the per-partition frontier queues after kernels K1 and K2 exit.]
  • 12. Experimental Setup • Comparison metric: sampled edges per second, $\text{SEPS} = \frac{\#\text{SampledEdges}}{\text{Time}}$. • Tests performed on the Summit supercomputer at ORNL. • 6 NVIDIA Tesla V100 GPUs (16 GB), dual-socket 22-core POWER9 CPUs. • 10 datasets.
  • 13. Comparing with the State-of-the-Art • Length of the RW: 2000 • Number of sampling instances/walks: 4000 • Frontier size for multi-dimensional RW: 2000. C-SAW vs KnightKing (biased RW): speedup 10x (1 GPU), 14.7x (6 GPUs). C-SAW vs GraphSAINT (multi-dimensional RW): speedup 8.1x (1 GPU), 11.5x (6 GPUs). [Figures: million SEPS per dataset (AM, AS, CP, FS, LJ, OR, RE, TW, WG, YE) for the baseline, C-SAW (1 GPU) and C-SAW (6 GPUs).]
  • 14. Scalability of C-SAW with Multiple GPUs • Neighbor sampling with 8000 instances. • 5.2x speedup with 6 GPUs. [Figure: speedup with 1 to 6 GPUs per dataset (AM, AS, CP, FR, LJ, OR, RE, TW, WG, YE).] *More detailed evaluation in the paper.
  • 15. Conclusion • First GPU-based framework for graph sampling and RW. • Outperforms the state-of-the-art works KnightKing and GraphSAINT by 14.7x and 11.5x, respectively. • Efficient out-of-memory sampling for handling larger graphs. • Future work: adding support for more sampling algorithms; improving the sampling techniques. Source code: https://github.com/concept-inversion/C-SAW
  • 16. Acknowledgement Thank you. Please cite this work if it was useful for you. Pandey, Santosh, et al. "C-SAW: A framework for graph sampling and random walk on GPUs." SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2020.

Editor's Notes

  1. Graphs provide a natural representation for most data and are widely used, for example to represent social networks, web networks and knowledge graphs. A plethora of algorithms exists for mining crucial information from graphs; generating embeddings, visualizing graph data and graph neural networks are some examples. But real-world graphs are large, and applying these algorithms directly to large graphs incurs huge storage requirements and computational cost. Hence, extracting information from a large graph is challenging.
  2. Graph sampling and random walk algorithms are used to overcome this challenge. Instead of applying algorithms directly to a large graph G, we can generate samples or random walks. As samples and random walks are reduced representations of the graph, the memory and computation needed to process them are also low. Multiple instances of samples or random walks can be used instead of the whole graph, allowing applications to train models and then perform prediction or compute embeddings for large graphs.
  3. Let's take a deeper look at how we generate samples or random walks. We start from a randomly selected source vertex, 8, and randomly traverse immediate neighbors for a certain number of steps. Here, vertices 5, 7, 9, 10 and 11 are the immediate neighbors of vertex 8. The transition probability of each neighbor can be defined as the degree of that vertex divided by the sum of the degrees of all neighbor vertices. We call this a biased edge transition. For vertex 8, the sum of the degrees of all neighbor vertices is 15. For example, as the degree of vertex 7 is 6, its transition probability is 6/15. For a random transition, we generate a random number between 0 and 1, denoted here by a dice roll. Based on the random number, we select a neighbor, here vertex 7. We repeat the process from vertex 7 for a certain number of steps to generate samples or random walks.
  4. Moving on, let's look at the available frameworks for graph sampling and random walk. A framework allows us to implement sampling and random walk algorithms with just a few lines of code. Two related works propose such a framework. The first is GraphSAINT, which focuses on defining samplers for generating graph embeddings. The second is KnightKing, a distributed framework for random walk algorithms. As for limitations, both GraphSAINT and KnightKing lack support for the majority of traversal-based graph sampling algorithms and do not support GPUs. This brings us to our first challenge: building a generic framework that supports all of these algorithms. Our proposed framework, C-SAW, is to our knowledge the first to support both sampling and random walk algorithms with GPU support.
  5. Let's see how C-SAW can be used to implement these algorithms, with multi-dimensional random walk as an example. This random walk starts with a randomly generated frontier set; here the frontier set holds vertices 8, 0 and 3. First, we sample a frontier vertex using vertex degree as the bias. Here, the sampled frontier is vertex 8. Next, we gather the neighbors of vertex 8 and sample one neighbor with equal probability, that is, an unbiased selection. The sampled neighbor is 7, and the edge (8, 7) is added to the sampled edge list. Next, we update the frontier with the sampled neighbor by replacing vertex 8 in the frontier pool with vertex 7. Almost all sampling and random walk algorithms can be defined with a similar flow, but they differ in two things: first, how the bias is defined, and second, the method used to update the frontier set. Now, let's see how the C-SAW APIs are used to implement these algorithms. The first API, VERTEXBIAS, defines how a frontier is sampled from the frontier pool. The second API, EDGEBIAS, defines how a neighbor is sampled from the neighbor pool. The third API, UPDATE, defines how a frontier is updated based on the sampled neighbor.
  6. Looking at the overview of the C-SAW framework, C-SAW provides three simple and expressive APIs, its user programming interface, which support existing algorithms and are flexible enough to support emerging ones. The main function uses these APIs and is optimized for GPUs. Users do not need to know the complex implementation of C-SAW to define sampling and random walk algorithms. With this, we address our first challenge of a generic framework.
  7. Moving on to the next challenge: while most random walk algorithms sample only a single neighbor, some graph sampling algorithms, like layer sampling or neighbor sampling, need to sample more than one neighbor of a vertex with a bias. In this figure, our objective is to sample 2 neighbors of vertex 8 based on a bias. For faster sampling, we use a separate thread to sample each neighbor. Each thread samples independently and concurrently. But with biased sampling, different threads can sample the same neighbor. Here, both threads try to sample the same vertex, 7, which is not allowed if we do not want any duplicates. We term this duplication selection collision; it is another challenge not addressed by previous work.
  8. One solution for selection collision is updated sampling. In this method, we update the transition probabilities after sampling each neighbor. Here, we first sample neighbor 7 randomly. Then, vertex 7 is removed from the neighbor list and the transition probabilities of the remaining neighbors are updated. We sample again with the updated transition probabilities, and this time we sample neighbor 11. As sampled vertices are removed, selection collision cannot happen. But updating the transition probabilities after each sample is very costly. Another solution for selection collision is repeated sampling. In this technique, we repeat the sampling until we acquire unique neighbors. First, each thread samples a neighbor independently. As the threads collide on vertex 7, one thread repeats the sampling process. There is a selection collision again, as vertex 7 is sampled once more. Finally, in another repetition, the thread samples a unique neighbor. This method also solves the problem of selection collision but may require a high number of repetitions. We propose an efficient solution inspired by both updated sampling and repeated sampling, which we term bipartite region search, or BRS. If a selection collision occurs during sampling, we update the random number in such a way that we jump from vertex 7. The update corresponds to a virtual removal of vertex 7 from the neighbor list. Then, we repeat sampling with the updated random number in a new region. With this technique, we reduce the number of repetitions and avoid the costly update. With BRS, C-SAW solves challenge 2 more efficiently. More details on how BRS works can be found in the paper.
  9. As discussed earlier, real-world graphs can be very large. Recent GPUs have about 16 GB of memory on average, with one V100 model offering up to 32 GB. In CSR format, graphs like Friendster and Twitter take 29 GB and 22 GB respectively, which is larger than the average memory of recent GPUs. Even with 32 GB of GPU memory, graphs like clueweb12 and Uk-2014 require more than 100 GB of space. This brings us to a third challenge: handling graphs larger than the GPU memory. For sampling and random walk, we observe that the entire graph does not need to be stored in GPU memory. We only need the active frontiers and their immediate neighbors at each step of sampling. This motivates our solution: out-of-memory sampling with 1D partition.
  10. Let's see an example of out-of-memory sampling with C-SAW. Assume we are sampling from the randomly generated source vertices 0, 2 and 8. We partition the graph by assigning the vertices equally to the partitions P1, P2 and P3. Each color represents a partition in the figure. At first, the frontiers are on the CPU side. Then, we determine the active frontier vertices of each partition. Here, P1, P2 and P3 have 2, 0 and 1 active vertices respectively. Our first optimization, workload-aware scheduling, determines which partitions to sample based on the workload. The partition with the higher workload is scheduled earlier, as this helps to reduce the overall partition transfer to the GPU. Assuming we can only sample two partitions at a time, we select P1 and P3. Our next optimization, workload balancing, allocates computation resources based on the workload. As the ratio of workload is 2:1 for P1 and P3, the computational resources are also allocated in the ratio 2:1. Then, the partitions are transferred to the GPU side for sampling. Each partition has its own frontier queue in GPU memory. After the first round of sampling, P1 has one frontier, P2 has two frontiers (7 and 5), and P3 has no frontiers. Each partition selected in an iteration continues sampling as long as it has frontiers. As P3 does not have any active frontier, sampling terminates for P3. In the next round, only P1 is sampled, which adds one more frontier vertex to P2, and sampling terminates for P1. As sampling has terminated for both P1 and P3, one iteration is complete, with 3 vertices in the frontier queue of P2. To determine the next partitions and the resource allocation, the queue size of each partition is passed to the CPU side. We repeat this process until the frontier is empty. With this out-of-memory sampling, we address challenge 3.
  11. Moving on to the evaluation, we perform all tests on the Summit supercomputer at Oak Ridge National Laboratory. Each node is equipped with 6 V100 GPUs with 16 GB of memory each and dual-socket 22-core POWER9 CPUs. We use 10 different datasets for evaluation. More details on the graph datasets and their sizes can be found in the paper. To compare our work with related works, we use sampled edges per second, or SEPS for short. It is computed as the total number of sampled edges divided by the total sampling time.
  12. For the state-of-the-art comparison, we compare our results with KnightKing and GraphSAINT. For KnightKing, we use biased random walk, and for GraphSAINT we use multi-dimensional random walk. The length of the random walk is 2000 and the number of sampling instances or walks is 4000 for all comparisons. The graphs show the million sampled edges per second achieved on the different datasets. Compared with KnightKing, we achieve 10 times and 14.7 times speedup with 1 GPU and 6 GPUs respectively. Compared with GraphSAINT, we achieve 8.1 times and 11.5 times speedup with 1 GPU and 6 GPUs respectively. For multi-dimensional random walk, we use a frontier size of 2000.
  13. Next, we evaluate the scalability of C-SAW with multiple GPUs using the neighbor sampling algorithm with 8000 sampling instances. The graph shows the speedup achieved on the different graph datasets. We achieve up to 5.2x speedup with 6 GPUs. A more detailed evaluation of C-SAW, along with profiling of each optimization, can be found in the paper.
  14. In conclusion, C-SAW is the first generic GPU-based framework for both graph sampling and random walk. C-SAW outperforms the state-of-the-art implementations KnightKing and GraphSAINT by 14.7 times and 11.5 times respectively. C-SAW provides efficient out-of-memory sampling for handling larger graphs. As future work, we leave adding support for more sampling algorithms and improving the existing sampling techniques. The source code of C-SAW is open-sourced on GitHub. Please check out our paper if you find this work interesting.
  15. This work was supported by the NSF and the Department of Energy. Thank you.
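
The biased edge transition described in note 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not C-SAW code: the notes give the degree of vertex 7 (6) and the degree sum (15), while the assignment of the remaining degrees to vertices 5, 9, 10 and 11 is assumed, and the function names are hypothetical.

```python
import bisect
import itertools

# Degrees of the neighbors of vertex 8. Only vertex 7's degree (6) and the
# total (15) are stated in the notes; the other assignments are assumed.
NEIGHBOR_DEGREES = {5: 3, 7: 6, 9: 2, 10: 2, 11: 2}


def transition_probabilities(degrees):
    """Degree of each neighbor divided by the sum of all neighbor degrees."""
    total = sum(degrees.values())
    return {v: d / total for v, d in degrees.items()}


def select_neighbor(degrees, r):
    """Pick a neighbor for a random number r in [0, 1) (the dice roll)."""
    vertices = list(degrees)
    total = sum(degrees.values())
    # Cumulative degree boundaries, here [3, 9, 11, 13, 15].
    bounds = list(itertools.accumulate(degrees[v] for v in vertices))
    # Scale r to the degree total and find the region it falls into.
    index = bisect.bisect_right(bounds, r * total)
    return vertices[min(index, len(vertices) - 1)]
```

With these assumed degrees, a dice roll of 0.5 falls into the region of vertex 7, which matches the transition shown on the slide.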
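
The flow in note 5 (sample a frontier, sample a neighbor, update the frontier pool) might look as follows for multi-dimensional random walk. The three user functions mirror the roles of VERTEXBIAS, EDGEBIAS and UPDATE, but their Python signatures, the `sample` driver and the toy graph are assumptions made for illustration; the real C-SAW API is a GPU implementation and may differ.

```python
import random

# Toy adjacency list. Only the neighbors of vertex 8 come from the notes;
# the remaining edges are made up so that the walk can continue.
GRAPH = {
    0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2, 7],
    5: [8], 7: [3, 8], 8: [5, 7, 9, 10, 11],
    9: [8], 10: [8], 11: [8],
}


def vertex_bias(graph, v):
    """Bias for picking a frontier vertex: its degree."""
    return len(graph[v])


def edge_bias(graph, u, v):
    """Bias for picking a neighbor: uniform (unbiased selection)."""
    return 1.0


def update(frontier_pool, slot, neighbor):
    """Replace the sampled frontier with the sampled neighbor."""
    new_pool = list(frontier_pool)
    new_pool[slot] = neighbor
    return new_pool


def sample(graph, frontier_pool, steps, rng):
    """Generic driver: the algorithm is defined only by the three functions."""
    pool, edges = list(frontier_pool), []
    for _ in range(steps):
        weights = [vertex_bias(graph, v) for v in pool]
        slot = rng.choices(range(len(pool)), weights=weights)[0]
        u = pool[slot]
        neighbors = graph[u]
        weights = [edge_bias(graph, u, n) for n in neighbors]
        v = rng.choices(neighbors, weights=weights)[0]
        edges.append((u, v))
        pool = update(pool, slot, v)
    return edges, pool
```

Calling `sample(GRAPH, [8, 0, 3], 5, random.Random(0))` returns five sampled edges and a frontier pool that still holds three vertices. Swapping the bias or update function is what would turn this driver into a different sampling algorithm.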
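
The collision handling in note 8 can be illustrated with a sequential sketch. The notes defer the details of BRS to the paper, so the jump below is only one plausible reading of "update the random number to jump from 7": a fresh number is drawn over the interval with the collided region cut out, then shifted past that region. It is not the authors' GPU kernel; the degrees other than vertex 7's and all function names are assumptions.

```python
import bisect
import itertools
import random

# Neighbor degrees of vertex 8; all but vertex 7's degree are assumed.
DEGREES = {5: 3, 7: 6, 9: 2, 10: 2, 11: 2}


def updated_probabilities(degrees, removed):
    """Updated sampling: drop a vertex and renormalize (numerator, denominator)."""
    rest = {v: d for v, d in degrees.items() if v != removed}
    total = sum(rest.values())
    return {v: (d, total) for v, d in rest.items()}


def locate(vertices, highs, x):
    """Find the vertex whose cumulative-degree region contains x."""
    return vertices[min(bisect.bisect_right(highs, x), len(vertices) - 1)]


def sample_distinct(degrees, k, rng):
    """Sample k distinct neighbors, jumping over a region on collision."""
    if k > len(degrees):
        raise ValueError("cannot sample more neighbors than exist")
    vertices = list(degrees)
    highs = list(itertools.accumulate(degrees.values()))
    total = highs[-1]
    chosen = []
    while len(chosen) < k:
        v = locate(vertices, highs, rng.random() * total)
        while v in chosen:
            # Collision: virtually remove v's region [low, low + width).
            width = degrees[v]
            low = highs[vertices.index(v)] - width
            x = rng.random() * (total - width)
            if x >= low:
                x += width  # jump past the collided region
            v = locate(vertices, highs, x)
        chosen.append(v)
    return chosen
```

When two neighbors are requested, a single jump is always enough in this sketch, whereas plain repeated sampling can hit vertex 7 again with probability 6/15 on every retry. No probability table is rebuilt, which is the cost that updated sampling pays.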
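
The two out-of-memory optimizations in note 10 can be sketched on the CPU side as follows. The vertex-to-partition map is hypothetical (the notes only imply that 0 and 2 are in P1, 8 is in P3, and 5 and 7 are in P2), as are the function names and the number of compute units being split.

```python
from collections import Counter

# Hypothetical 1D partition of vertices 0..12 into P1, P2 and P3.
PARTITION_OF = {
    0: "P1", 1: "P1", 2: "P1", 6: "P1",
    3: "P2", 4: "P2", 5: "P2", 7: "P2",
    8: "P3", 9: "P3", 10: "P3", 11: "P3", 12: "P3",
}


def active_counts(frontier):
    """Number of active frontier vertices per partition."""
    counts = Counter({p: 0 for p in sorted(set(PARTITION_OF.values()))})
    counts.update(PARTITION_OF[v] for v in frontier)
    return dict(counts)


def schedule(counts, slots):
    """Workload-aware scheduling: busiest non-empty partitions first."""
    busy = [p for p, c in counts.items() if c > 0]
    return sorted(busy, key=lambda p: counts[p], reverse=True)[:slots]


def balance(counts, selected, total_units):
    """Workload balancing: split compute units in proportion to workload."""
    work = sum(counts[p] for p in selected)
    return {p: total_units * counts[p] // work for p in selected}
```

For the frontier {0, 2, 8} and room for two partitions on the GPU, this selects P1 and P3 and splits, say, 6 compute units as 4 and 2, matching the 2:1 ratio in the notes.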