This document introduces a superworkflow for running Node2Vec on graphs using the Fugue framework on Kubernetes. It describes the Node2Vec algorithm and the different steps in the superworkflow, including graph creation and indexing, random walks, Word2Vec preprocessing, and embedding training. The superworkflow provides advantages such as parallelizing independent steps and efficient resource usage through auto-persist and checkpointing. Benchmark results show the superworkflow reduces runtime significantly compared to Spark MLlib, for example cutting the Word2Vec embedding of a 100-million-node graph from 6,800 CPU hours to 100 CPU hours plus 16 GPU hours. Open-source links for the Node2Vec on Fugue project are also provided.
4. Knowledge Graphs
Knowledge bases represented as graphs
● Google Knowledge Graph (webpage graph for search engine)
● Social graphs (Facebook friends)
● Merchant graphs (transactions, buyers, sellers in e-Commerce)
Complement to traditional machine learning
● A type of ontology
● Graph topology contains critical business insights
6. Map each node in a graph into a low-dimensional space
• Nodes with similar local neighborhoods have similar embeddings
• Only graph topology matters
Node Embedding
7. Graph Embedding is hard
● Images are fixed size
● Text is linear and can be made fixed-size with a sliding window
● Graph node numbering is arbitrary, and graph structures are more complicated
Procedure
● Graph creation and indexing
● Compute random walk probabilities
● Simulate random walks of a given length starting from each vertex
● Conduct embedding via word2vec by treating random walks as sentences
Node2Vec
8. Distributed graph storage with Spark GraphFrames
● Entire graph in memory
● Use adjacency lists to represent a graph in a distributed way
Distributed Node2Vec algorithm
● Distributed Breadth-First Search for random walk
● Cache critical variables for picking the next step during BFS
Distributed Node2Vec
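To make the adjacency-list idea concrete, here is a minimal PySpark sketch of storing a graph as distributed adjacency lists so that each partition carries a node together with its neighbors, which is what a distributed BFS / random-walk step needs. The column names and toy edge list are illustrative, not the project's actual code.

# Hedged sketch: one way to hold a large graph as distributed adjacency
# lists with Spark. Names are illustrative only.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# edge list: one row per directed edge with a weight
edges = spark.createDataFrame(
    [(0, 1, 1.0), (0, 2, 0.5), (1, 2, 2.0), (2, 0, 1.0)],
    ["src", "dst", "weight"],
)

# adjacency list: group edges by source so a node's neighbors travel together
adj = (
    edges.groupBy("src")
    .agg(F.collect_list(F.struct("dst", "weight")).alias("neighbors"))
)
adj.show(truncate=False)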
10. • Gensim word2vec can only handle small graphs
• Spark MLlib word2vec module
• Not a fully functioning implementation
• Limited in the number of nodes (< 12 million)
• Running time is not impressive
Limitations of Existing Word2Vec
11. A small CPU cluster in K8S or a large GPU instance
• Relax the limit on the number of vertices
• Save computing cost
• More efficient computing
PyTorch for Embedding
12. Graph creation and storage for efficient embedding
● I/O overhead is huge if disk is involved
● Distribute graph storage to allow random access to nodes and edges
● Deep traversal on a large graph will quickly drain the stack
Indexing converts node labels to a set of sequential integers
● Significantly saves memory and storage
● Much better load balancing and data partitioning
● Required for the embedding step
Graph Creation and Indexing
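A minimal PySpark sketch of the indexing step, assuming string node labels and illustrative column names: distinct labels are collected, assigned sequential integer IDs, and both edge endpoints are rewritten with those IDs.

# Hedged sketch: map arbitrary node labels to dense sequential integers
# before random walks and embedding. The zipWithIndex approach and column
# names are illustrative, not the project's exact code.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

edges = spark.createDataFrame(
    [("user_a", "user_b"), ("user_b", "user_c"), ("user_c", "user_a")],
    ["src", "dst"],
)

# collect the distinct node labels and assign each a sequential integer id
nodes = edges.select(F.col("src").alias("name")).union(edges.select("dst")).distinct()
node_index = nodes.rdd.map(lambda r: r["name"]).zipWithIndex().toDF(["name", "id"])

# replace labels with integer ids on both edge endpoints
indexed = (
    edges.join(node_index.withColumnRenamed("name", "src"), "src")
    .withColumnRenamed("id", "src_id")
    .join(node_index.withColumnRenamed("name", "dst"), "dst")
    .withColumnRenamed("id", "dst_id")
    .select("src_id", "dst_id")
)
indexed.show()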
13. Apply a random walk strategy on the graph to generate a collection of
node sequences to be used by the embedding algorithm
Random Walk
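Below is a hedged, single-machine sketch of one Node2Vec-style biased walk, using the return parameter p and in-out parameter q listed later on the hyperparameter slide. The real workflow performs the same per-step sampling inside a distributed BFS over the adjacency lists; the helper name and toy graph here are illustrative, and weights and alias tables are omitted for brevity.

# Hedged sketch of one second-order biased random walk (unweighted).
import random

def node2vec_walk(adj, start, length, p=1.0, q=1.0):
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = adj.get(cur, [])
        if not neighbors:
            break
        if len(walk) == 1:
            walk.append(random.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:                   # returning to the previous node: weight 1/p
                weights.append(1.0 / p)
            elif nxt in adj.get(prev, []):    # staying close to the previous node: weight 1
                weights.append(1.0)
            else:                             # moving outward to a new node: weight 1/q
                weights.append(1.0 / q)
        walk.append(random.choices(neighbors, weights=weights, k=1)[0])
    return walk

# toy adjacency list keyed by the integer node ids from the indexing step
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(node2vec_walk(adj, start=0, length=10, p=0.5, q=2.0))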
14. Word2Vec Preprocessing
A set of pre-processing steps required before Word2Vec training
• Word frequency counts
• Word indexing
• Rare words removal
• Word frequency normalization
• Negative sampling
These steps can be largely accelerated by distributed computing
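A small Python sketch of a few of these preprocessing steps on a toy corpus of walks, assuming the commonly used (frequency ** 0.75) noise distribution for negative sampling. At scale these are simple counts and group-bys that distribute well; all names and numbers here are illustrative.

# Hedged sketch: frequency counts, rare-node removal, indexing, and the
# negative-sampling distribution over a corpus of walks.
from collections import Counter
import numpy as np

walks = [[0, 1, 2, 0], [2, 3, 2, 1], [1, 0, 2, 3]]   # node ids as "words"
min_count = 2

# word frequency counts
freq = Counter(node for walk in walks for node in walk)

# rare words removal + word indexing
vocab = [node for node, c in freq.items() if c >= min_count]
node_to_idx = {node: i for i, node in enumerate(vocab)}

# normalized frequencies and the negative-sampling distribution
counts = np.array([freq[node] for node in vocab], dtype=np.float64)
unigram = counts / counts.sum()
neg_sampling = counts ** 0.75
neg_sampling /= neg_sampling.sum()

print(node_to_idx, unigram.round(3), neg_sampling.round(3))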
15. Embedding Training in PyTorch
The training step of Word2Vec embedding is iterative
• Iterative optimization for a given number of rounds
• Multiple for loops inside
• GPU is critical for runtime performance
Distributed computing is not of much help
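For illustration, a minimal skip-gram-with-negative-sampling loop in PyTorch, with toy sizes and random tensors standing in for real (center, context, negative) batches drawn from the walks. This is a sketch of the general technique, not the project's exact training code; a real run streams batches and moves tensors to the GPU.

# Hedged sketch: skip-gram with negative sampling in PyTorch (toy data).
import torch
import torch.nn as nn
import torch.nn.functional as Fn

vocab_size, dim = 1000, 128
in_emb = nn.Embedding(vocab_size, dim)    # embeddings for center nodes
out_emb = nn.Embedding(vocab_size, dim)   # embeddings for context nodes
opt = torch.optim.Adam(list(in_emb.parameters()) + list(out_emb.parameters()), lr=1e-3)

center = torch.randint(0, vocab_size, (512,))          # toy batch
context = torch.randint(0, vocab_size, (512,))
negatives = torch.randint(0, vocab_size, (512, 5))     # 5 negatives per pair

for step in range(100):
    v = in_emb(center)                                           # (B, d)
    pos = (v * out_emb(context)).sum(-1)                         # positive scores
    neg = torch.bmm(out_emb(negatives), v.unsqueeze(-1)).squeeze(-1)  # (B, 5)
    loss = -(Fn.logsigmoid(pos).mean() + Fn.logsigmoid(-neg).mean())
    opt.zero_grad()
    loss.backward()
    opt.step()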
16. Different steps have different degrees of parallelism
• Graph creation and indexing: O(|V| + |E|)
• Random walk: O(n * |E| * L) (n: num of walks starting from each node)
• Word2Vec preprocessing: O(|V| + |E|)
Superworkflow
18. Fugue: A Superframework
● A pure abstraction layer
● Unify and simplify core concepts of distributed computing
● Decouple your logic from any specific solution
● Easy to learn and easy to switch
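As a rough illustration of that abstraction, here is a sketch using Fugue's transform(): the step logic is a plain pandas function, and the execution engine is chosen at call time. Column names and data are illustrative, and parameter details may differ across Fugue versions.

# Hedged sketch: engine-agnostic step logic with fugue.transform().
import pandas as pd
from fugue import transform

def add_degree(df: pd.DataFrame) -> pd.DataFrame:
    # engine-agnostic step logic: count out-degree per source node
    return df.groupby("src", as_index=False).agg(degree=("dst", "count"))

edges = pd.DataFrame({"src": [0, 0, 1, 2], "dst": [1, 2, 2, 0]})

# run locally while developing ...
local_result = transform(edges, add_degree, schema="src:long,degree:long",
                         partition={"by": "src"})

# ... and switch to Spark without changing the logic
# spark_result = transform(edges, add_degree, schema="src:long,degree:long",
#                          partition={"by": "src"}, engine="spark")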
20. Fugue: Optimizations on DAG Execution
● Automatically parallelize independent branches
● Auto persist
● More errors can be captured at “compile” time
● Determinism enables checkpointing, executions can “resume”
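A hedged FugueWorkflow sketch of those ideas: two independent branches built from the same persisted input can be scheduled in parallel, and persist() keeps the shared intermediate result from being recomputed. The step functions are placeholders, and method names follow the Fugue API as of writing; details may vary by version.

# Hedged sketch: a small DAG with a persisted input and two branches.
import pandas as pd
from fugue import FugueWorkflow

def walks_step(df: pd.DataFrame) -> pd.DataFrame:
    return df.assign(kind="walk")          # stand-in for the random-walk step

def stats_step(df: pd.DataFrame) -> pd.DataFrame:
    return df.assign(kind="stats")         # stand-in for an independent branch

dag = FugueWorkflow()
edges = dag.df([[0, 1], [1, 2], [2, 0]], "src:long,dst:long").persist()
# independent branches: the execution engine can run them in parallel
edges.transform(walks_step, schema="*,kind:str").show()
edges.transform(stats_step, schema="*,kind:str").show()
dag.run()            # e.g. dag.run("spark") to execute the same DAG on Spark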
22. ● n: number of random walks starting from each node
● L: length of each walk
● p: weight on returning probability
● q: weight on the probability of moving to a new node
● Word2Vec hyperparameters
○ window size, min word count, iterations
Hyperparameter tuning is parallelizable, even in iterative tasks
Hyperparameter Tuning
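Since each configuration is an independent trial, the sweep maps onto any parallel map. Below is a hedged sketch using multiprocessing with a placeholder scoring function; in the real workflow the same fan-out would run on the cluster via the distributed engine.

# Hedged sketch: parallel sweep over (p, q, window) with a dummy score.
from itertools import product
from multiprocessing import Pool

def run_trial(cfg):
    p, q, window = cfg
    # placeholder: train on the walks with this config and return a score
    score = 1.0 / (1.0 + abs(p - 1.0) + abs(q - 1.0) + window * 0.01)
    return {"p": p, "q": q, "window": window, "score": score}

grid = list(product([0.25, 1.0, 4.0], [0.25, 1.0, 4.0], [5, 10]))

if __name__ == "__main__":
    with Pool(4) as pool:
        results = pool.map(run_trial, grid)
    print(max(results, key=lambda r: r["score"]))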
26. ● Graph (10 million vertices, 300 million edges)
○ ~3 hours with 500 cores and 3 TB memory
● Graph (100 million vertices, 3 billion edges)
○ ~8 hours with 2,000 cores and 12 TB memory
Node2Vec Testing with Spark MLlib
27. ● Graph (10 million vertices, 300 million edges)
○ Word2Vec embedding from 1.5 hours to 30 min
○ 32 CPUs + 4 GPUs (Nvidia Tesla P100)
● Graph (100 million vertices, 3 billion edges)
○ Word2Vec embedding from 3.4 hours to 1 hour
○ 96 CPUs + 16 GPUs (Nvidia Tesla P100)
Superworkflow Runtime (PyTorch)
28. ● Graph (10 million vertices, 300 million edges)
○ Word2Vec preprocessing: 600 CPUs for 25 min
○ Word2Vec embedding: 750 CPU hours → 20 CPU hours + 2 GPU hours
● Graph (100 million vertices, 3 billion edges)
○ Word2Vec preprocessing: 1,000 CPUs for 35 min
○ Word2Vec embedding: 6,800 CPU hours → 100 CPU hours + 16 GPU hours
Superworkflow Cost (PyTorch)
29. Summary
Introduce the concept of a superworkflow using the Fugue framework on Kubernetes.
Use Node2Vec as a case study for building a superworkflow step by step, and demonstrate the advantages of the superworkflow.
The idea of a superworkflow can be easily generalized to other complex distributed computing problems.