Scalable Graph Convolutional Network based Link Prediction on a Distributed Graph Database Server
Anuradha Karunarathna, Dinika Senarath, Shalika Madhushanki, Chinthaka Weerakkody, Miyuru Dayarathna, Sanath Jayasena, and Toyotaro Suzumura
21/10/2020
IEEE International Conference on Cloud Computing 2020
University of Moratuwa, Sri Lanka
WSO2, Inc. USA
IBM T.J. Watson Research Center, USA
MIT-IBM Watson AI Lab, USA
Barcelona Supercomputing Center, Spain
Introduction
▰ Graphs are rich data structures
▰ Graph data enables a wide variety of applications
▻ Online social networks
▻ Protein-interaction networks
▻ Computational biology
▻ Cyber security
▻ Transportation systems
▰ Link prediction in graph databases has become a prominent research area
[Figure: Human Protein Interaction Network (P. M. Kim et al., 2007)]
Graph Convolutional Neural Networks
● Until very recently, little attention had been paid to generalizing neural network models to graph-structured data [1]
● A Graph Convolutional Network (GCN) builds on Convolutional Neural Networks with the aim of encoding graphs
[1] https://tkipf.github.io/graph-convolutional-networks/
Why Graph Link Prediction?
Link prediction predicts whether a link will exist between two nodes, based on node attribute information and the observed existing links.
▰ Recommendation systems
▰ Interaction discovery (bioinformatics)
▰ Route planning (aircraft route planning)
▰ Helping to find hidden terrorist and criminal groups
Presentation Outline
▰ Introduction
▰ Research Problem (Link Prediction Performance)
▰ Proposed Solution (Scheduling algorithm)
▰ Related Work
▰ Methodology
▰ Evaluation
▰ Conclusion
Research Problem
▰ Graphs are used in many applications
▰ Graph datasets have become too large, so link prediction is expensive in terms of storage and computation time
★ Distribute graphs
★ Perform link prediction on distributed graphs
▰ But how can link prediction tasks on large attributed graphs be scheduled efficiently?
Proposed Solution and Contributions
▰ Distribute graphs across multi-machine clusters and conduct deep learning and link prediction on the distributed graph partitions
▰ Develop a scheduling algorithm that runs the GCN training process for the graph partitions on the worker nodes
Objectives
▰ Develop a link prediction application on top of a distributed graph database server, JasmineGraph [1]
▰ Our approach has
▻ High accuracy, by considering both graph structure and node features
▻ Computational efficiency
▻ Effective communication management
[1] M. Dayarathna (2018), miyurud/jasminegraph, GitHub. [Online]. Available: https://github.com/miyurud/jasminegraph
Related Work
1. Link prediction using heuristics [16] (e.g., Common Neighbour, Jaccard coefficient, Katz index)
▻ Relatedness: link prediction mechanism on graphs
▻ Limitations:
● Hard to find one heuristic that can be applied to any generic graph
● Ignores explicit features of the graph
● Only considers the graph structure
● Captures only a small set of structural patterns
2. SEAL [26]
▻ Relatedness: link prediction based on local subgraphs using a graph neural network
▻ Limitations:
● Uses matrix factorization for node embeddings (trains and optimizes an embedding vector for each node)
● Huge number of parameters, because the number of node parameters is linear in the graph size
[16] D. Liben-Nowell and J. Kleinberg. The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7):1019–1031, 2007.
[26] M. Zhang and Y. Chen. Link prediction based on graph neural networks. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS'18, pages 5171–5181, USA, 2018. Curran Associates Inc.
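To make the heuristic baselines concrete, here is a minimal sketch of two of the scores named above, Common Neighbour and Jaccard coefficient, on a toy graph. The graph, the function names, and the node labels are illustrative assumptions and are not taken from the presented work.

```python
def common_neighbours(adj, u, v):
    """Number of neighbours shared by u and v."""
    return len(adj[u] & adj[v])

def jaccard(adj, u, v):
    """Shared neighbours divided by the size of the combined neighbourhood."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

# Toy undirected graph as an adjacency map (illustrative data only).
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
adj = {}
for s, t in edges:
    adj.setdefault(s, set()).add(t)
    adj.setdefault(t, set()).add(s)

# Score the unlinked pair (a, d): both are connected to b and c.
cn_score = common_neighbours(adj, "a", "d")
jc_score = jaccard(adj, "a", "d")
print(cn_score, jc_score)
```

Both scores look only at neighbourhoods, which is the limitation listed above: node features play no part in the ranking.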
Related Work (Contd.)
3. GraphSAGE [10] (Graph SAmple and aggreGatE)
▻ Relatedness: inductive node embedding generation based on GCN
▻ Limitations:
● Training on local sub-graphs is possible, but requires the entire graph to be loaded into memory
4. PyTorch-BigGraph [14]
▻ Relatedness: distributed graph training mechanism
▻ Limitations:
● High number of buckets (if nodes are partitioned into p partitions, there are p^2 buckets)
● Random node partitioning
● Shared file system
5. Euler [1]
▻ Relatedness: distributed graph training mechanism
▻ Limitations:
● Distributed graph training mechanism
[1] Alibaba. Euler. URL: https://github.com/alibaba/euler, 2019.
[10] W. Hamilton, Z. Ying, and J. Leskovec. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, 2017, pp. 1024–1034.
[14] A. Lerer, L. Wu, J. Shen, T. Lacroix, L. Wehrstedt, A. Bose, and A. Peysakhovich. PyTorch-BigGraph: A large-scale graph embedding system. CoRR, abs/1903.12287, 2019.
Methodology - JasmineGraph
▰ JasmineGraph Distributed Database Server [1]
▻ Partitions and stores graph data using one of several partitioning approaches (Metis, hash, etc.)
[1] https://github.com/miyurud/jasminegraph
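As a rough illustration of the hash option, the sketch below assigns each vertex to a partition by its integer id modulo the partition count. Keeping intra-partition edges in a per-partition "local" list and cut edges in a separate "central" list is an assumption made here for illustration; the function name and data are hypothetical, and this is not JasmineGraph's actual code.

```python
from collections import defaultdict

def hash_partition(edges, num_partitions):
    """Assign each vertex to partition (vertex_id mod num_partitions)."""
    local = defaultdict(list)    # edges whose endpoints share a partition
    central = defaultdict(list)  # cut edges, filed under the source's partition
    for src, dst in edges:
        p_src, p_dst = src % num_partitions, dst % num_partitions
        if p_src == p_dst:
            local[p_src].append((src, dst))
        else:
            central[p_src].append((src, dst))
    return dict(local), dict(central)

# Toy edge list with integer vertex ids (illustrative data only).
edges = [(0, 2), (1, 3), (0, 1), (2, 3), (2, 4)]
local, central = hash_partition(edges, num_partitions=2)
print(local)
print(central)
```

Hash partitioning is cheap but ignores graph structure, so it tends to cut more edges than a structure-aware partitioner such as Metis.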
Methodology (Contd.)
▰ Node Embedding Generation
▻ Training process happens in the database server partitions
▻ Uses GraphSAGE
▻ Implemented in TensorFlow
▻ Embeddings are written to the model store
Methodology (Contd.)
▰ We use localized graph convolution modules to train each graph partition
▰ The training happens in an unsupervised manner
Methodology (Embedding Generation for a Graph Partition)
▰ The training process starts by concatenating a local store with its corresponding central store partition
▰ The concatenated graph and the graph structure are input to the neural network
▰ The hidden layers of the GNN are structured to transform the node features and aggregate the features over the graph to generate node embeddings
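The two steps above (concatenate the stores, then aggregate features over the graph) can be sketched as follows. This is a deliberately simplified, untrained mean aggregator: a real GraphSAGE layer also applies learned weights and a non-linearity, and all names and values here are illustrative assumptions.

```python
def build_adjacency(local_edges, central_edges):
    """Concatenate both edge lists into one undirected adjacency map."""
    adj = {}
    for src, dst in local_edges + central_edges:
        adj.setdefault(src, set()).add(dst)
        adj.setdefault(dst, set()).add(src)
    return adj

def mean_aggregate(adj, feats):
    """Average each node's features with those of its known neighbours."""
    out = {}
    for node, vec in feats.items():
        group = [vec] + [feats[n] for n in adj.get(node, ()) if n in feats]
        out[node] = [sum(col) / len(group) for col in zip(*group)]
    return out

# Hypothetical partition: two local edges plus one edge from the central store.
local_edges = [(0, 2), (2, 4)]
central_edges = [(0, 1)]
feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0], 4: [0.0, 0.0]}
adj = build_adjacency(local_edges, central_edges)
embeddings = mean_aggregate(adj, feats)
print(embeddings[0])
```

Without the central store edge, node 0 would never see node 1's features, which is why the concatenation step comes first.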
Methodology (Contd.)
▰ Link prediction using the generated node embeddings
▻ Inter-worker communication to collect node embeddings
▻ Avoids a linear comparison against all other nodes (time complexity O(N))
▻ Applies Locality Sensitive Hashing (LSH) to rank predictions
Link Prediction Algorithm
▰ Accepts a starting node, denoted as the query node (q), and outputs a list of predicted nodes
▰ Uses the random projection method of LSH
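A minimal sketch of random-projection LSH for this kind of query, assuming embeddings are plain lists of floats. Each random hyperplane contributes one signature bit, and only nodes in the query's bucket are ranked. Ranking the bucket by cosine similarity, and every name and value below, are assumptions for illustration rather than the presented implementation.

```python
import random

def make_hyperplanes(dim, num_bits, seed=0):
    """Draw num_bits random Gaussian hyperplanes in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(int(sum(p * x for p, x in zip(plane, vec)) >= 0.0) for plane in planes)

def cosine(a, b):
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm if norm else 0.0

def predict_links(query, embeddings, planes, top_k=2):
    """Rank only the nodes that fall in the query node's hash bucket."""
    buckets = {}
    for node, vec in embeddings.items():
        buckets.setdefault(signature(vec, planes), []).append(node)
    q_vec = embeddings[query]
    candidates = [n for n in buckets[signature(q_vec, planes)] if n != query]
    candidates.sort(key=lambda n: cosine(q_vec, embeddings[n]), reverse=True)
    return candidates[:top_k]

# Hypothetical 2-dimensional embeddings; "b" points in the same direction as "q".
embeddings = {"q": [1.0, 0.9], "a": [0.9, 1.0], "b": [2.0, 1.8], "c": [-1.0, -0.8]}
planes = make_hyperplanes(dim=2, num_bits=4)
print(predict_links("q", embeddings, planes))
```

More signature bits give smaller buckets and faster queries, at the cost of occasionally separating true neighbours; practical systems usually compensate with several hash tables.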
Scheduling Algorithm
▰ Decides which partitions can be trained in parallel within the available memory
▰ Two main objectives:
▻ Utilize the available memory optimally
▻ Finish training all partitions in the minimum number of iterations
▰ Bin Packing Problem: given n different items with weights, and bins with capacity c, assign each item to a bin so that the total number of bins is minimized
▻ Bins: training iterations
▻ Capacity: available memory
▻ Items: graph partitions
▻ Weights: memory requirement of each partition
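The slide does not say which bin packing strategy the scheduler uses, so the sketch below substitutes first-fit decreasing, a common approximation heuristic, purely as an illustration of the mapping above. The partition names, memory figures, and function name are hypothetical.

```python
def schedule_partitions(memory_needs, available_memory):
    """First-fit decreasing: place each partition in the first iteration with room."""
    iterations = []  # each entry: [remaining_memory, [partition ids]]
    by_size = sorted(memory_needs.items(), key=lambda kv: kv[1], reverse=True)
    for part, need in by_size:
        if need > available_memory:
            raise ValueError(f"partition {part} needs {need}, more than {available_memory}")
        for slot in iterations:
            if slot[0] >= need:
                slot[0] -= need
                slot[1].append(part)
                break
        else:
            # No existing iteration has room, so open a new one.
            iterations.append([available_memory - need, [part]])
    return [parts for _, parts in iterations]

# Hypothetical memory requirements (same unit as available_memory).
memory_needs = {"p0": 6, "p1": 5, "p2": 4, "p3": 3, "p4": 2}
plan = schedule_partitions(memory_needs, available_memory=10)
print(plan)
```

Each inner list is one training iteration whose partitions fit in memory together. First-fit decreasing is not guaranteed to be optimal in general, although in this toy case the two iterations it finds match the lower bound of total need divided by capacity.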
Data Sets
| Dataset | Number of vertices | Number of edges | Number of features | Edgelist file size (MB) | Feature file size (MB) |
| Twitter | 81,306 | 1,768,149 | 1007 | 16 | 157 |
| Amazon Small | 548,551 | 1,244,636 | 250 | 19.4 | 266 |
| Reddit | 232,965 | 11,606,919 | 602 | 145 | 270 |
| DBLP-V11 | 4,107,340 | 36,624,464 | 948 | 508 | 9523 |
Data Sets (Contd.)
Experiments and the Environments
▰ Experiments
▻ Vertical scalability: node embedding accuracy, graph training time experiments
▻ Horizontal scalability: node embedding accuracy, graph training time experiments
▰ Server specification
▻ CPU: 80
▻ RAM: 64 GB
▻ OS: Ubuntu 16.04.6 LTS
▻ Disk: 1.8 TB
▰ Master specification
▻ CPU: 4
▻ RAM: 16 GB
▻ OS: Ubuntu 16.04.6 LTS
▻ Disk: 100 GB
▰ Third specification (unlabelled in the slide)
▻ CPU: 8
▻ RAM: 30 GB
▻ OS: Ubuntu 16.04.6 LTS
▻ Disk: 10 GB
Graph Training Experiments - Vertical Scalability
Graph Training Experiments - Vertical Scalability (Contd.)
Graph Training Experiments - Horizontal Scalability
Training Times for Different Graph Partitions
▰ Our approach could run the training process on the partitioned DBLP-V11 dataset
Accuracy Comparison Experiments
▰ Mean Reciprocal Rank (MRR)
▰ Hit@1 score
▰ Hit@10 score
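For reference, the three metrics above can be computed as in the sketch below, using their standard definitions: MRR averages the reciprocal of the true node's rank, and Hit@k is the fraction of queries whose true node appears in the top k predictions. The ranked lists and node labels are made-up toy data, not results from the experiments.

```python
def reciprocal_rank(ranked, true_node):
    """1 / (1-based position of the true node), or 0 if it is missing."""
    return 1.0 / (ranked.index(true_node) + 1) if true_node in ranked else 0.0

def mrr(predictions, truths):
    """Mean reciprocal rank over all queries."""
    return sum(reciprocal_rank(r, t) for r, t in zip(predictions, truths)) / len(truths)

def hit_at_k(predictions, truths, k):
    """Fraction of queries whose true node appears in the top k predictions."""
    return sum(t in r[:k] for r, t in zip(predictions, truths)) / len(truths)

# One ranked prediction list per query, with the true linked node for each.
predictions = [["b", "c", "d"], ["a", "d", "c"], ["e", "a", "b"]]
truths = ["b", "c", "z"]
mrr_value = mrr(predictions, truths)
hit1 = hit_at_k(predictions, truths, 1)
hit10 = hit_at_k(predictions, truths, 10)
print(mrr_value, hit1, hit10)
```

In this toy data the true node is ranked first, third, and not at all, so Hit@1 counts one of three queries and Hit@10 counts two of three.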
MRR Value
Hit@1 Score Comparison
Hit@10 Score Comparison
Training Accuracy of Different Graph Partitions
Conclusion
▰ Current graph link prediction approaches do not scale well when the datasets are large
▰ A solution is to perform link prediction in a distributed manner
▰ Densely connected components play a critical role in determining the performance of the overall training process
Conclusion (Contd.)
▰ JasmineGraph was able to train a GCN on the largest dataset, DBLP-V11 (> 9.3 GB), in 11 hours and 40 minutes using 16 workers on a single server
▰ The original GraphSAGE implementation processed Reddit in 238 minutes, while JasmineGraph took only 100 minutes on the same hardware with 16 workers, a roughly 2.4 times improvement in performance
▰ Future work: graph stream processing, privacy-preserving machine learning
Thank you!
