Representation Learning on Complex Graphs

Representation Learning on Graphs with Complex Structures
Prof. Dr. Philippe Cudré-Mauroux
eXascale Infolab, U. of Fribourg–Switzerland
DL4G-SDE @ WWW2019
San Francisco, May 13, 2019

Representation Learning on Graphs
■ Projecting nodes of a graph onto a vector space while preserving key
structural properties of the graph (e.g., topological proximity of the nodes)
8/5/19 WWW2019@San Francisco
[Figure: DeepWalk¹: graph → neural embedding techniques (e.g., word2vec) → node embedding vectors]
¹ Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. "DeepWalk: Online Learning of Social Representations." In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 701–710. ACM, 2014.
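DeepWalk generates truncated random walks and treats them as sentences for word2vec's skip-gram model. The minimal sketch below is an illustration, not the reference implementation. It generates such walks over an adjacency-list graph and extracts the (center, context) node pairs that a skip-gram model would train on. The embedding training itself is omitted. The function names and the toy graph are hypothetical.

```python
import random


def deepwalk_corpus(adj, walks_per_node, walk_length, seed=None):
    """Generate uniform random walks, DeepWalk-style (illustrative sketch)."""
    rng = random.Random(seed)
    nodes = list(adj)
    corpus = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a fresh random order each pass
        for start in nodes:
            walk = [start]
            # Extend uniformly at random until long enough or at a dead end.
            while len(walk) < walk_length and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))
            corpus.append(walk)
    return corpus


def skipgram_pairs(corpus, window):
    """(center, context) pairs within `window` positions, as consumed by skip-gram."""
    pairs = []
    for walk in corpus:
        for i, center in enumerate(walk):
            for j in range(max(0, i - window), min(len(walk), i + window + 1)):
                if j != i:
                    pairs.append((center, walk[j]))
    return pairs


if __name__ == "__main__":
    graph = {1: [2], 2: [1, 3], 3: [2, 4, 5], 4: [3], 5: [3]}  # hypothetical toy graph
    walks = deepwalk_corpus(graph, walks_per_node=2, walk_length=6, seed=42)
    print(len(walks), walks[0])
    print(skipgram_pairs(walks[:1], window=2)[:5])
```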

What if the graph at hand exhibits a much more complex structure?

Outline
■ JUST: Embedding heterogeneous graphs without meta-paths [CIKM’18]
■ LBSN2Vec: Embedding heterogeneous hypergraphs from LBSNs [WWW’19]
■ NodeSketch: Highly efficient graph embeddings via recursive sketching [KDD’19]

Heterogeneous Graphs
■ Heterogeneous Graphs contain multiple node types:
● Homogeneous edges: linking nodes from the same domain
● Heterogeneous edges: linking nodes across different domains

Meta-Paths in Heterogeneous Graphs
■ A meta-path is a sequence of node types encoding key composite relations among the
involved node types.
■ Meta-paths are used to guide random walks to redefine the neighborhood of a node.
[Figure: Metapath2vec¹: meta-path-guided random walks → neural embedding techniques (e.g., word2vec) → node embedding vectors]
¹ Yuxiao Dong, Nitesh V. Chawla, and Ananthram Swami. "metapath2vec: Scalable Representation Learning for Heterogeneous Networks." In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 135–144. ACM, 2017.
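A meta-path-guided walk only moves to neighbors whose node type matches the next type in the meta-path. The sketch below shows this for a symmetric meta-path such as Author-Paper-Venue-Paper-Author, which is repeated to produce long walks. It is a simplified illustration under that assumption, not the metapath2vec code. The function name, type labels, and toy graph are hypothetical.

```python
import random


def metapath_walk(adj, node_type, start, metapath, walk_length, seed=None):
    """Random walk constrained to follow a symmetric meta-path, e.g. ["A", "P", "V", "P", "A"].

    At every step, only neighbors whose type matches the next type in the
    (repeating) meta-path are eligible; the walk stops early at a dead end.
    """
    if metapath[0] != metapath[-1] or node_type[start] != metapath[0]:
        raise ValueError("expected a symmetric meta-path starting at the start node's type")
    cycle = metapath[:-1]  # A-P-V-P-A repeats as A, P, V, P, A, P, V, P, ...
    rng = random.Random(seed)
    walk = [start]
    while len(walk) < walk_length:
        wanted = cycle[len(walk) % len(cycle)]
        candidates = [n for n in adj[walk[-1]] if node_type[n] == wanted]
        if not candidates:
            break
        walk.append(rng.choice(candidates))
    return walk


if __name__ == "__main__":
    # Hypothetical bibliographic graph: authors (A), papers (P), venue (V).
    adj = {"a1": ["p1"], "a2": ["p1", "p2"], "p1": ["a1", "a2", "v1"],
           "p2": ["a2", "v1"], "v1": ["p1", "p2"]}
    types = {"a1": "A", "a2": "A", "p1": "P", "p2": "P", "v1": "V"}
    print(metapath_walk(adj, types, "a1", ["A", "P", "V", "P", "A"], walk_length=9, seed=1))
```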

Challenges with Meta-Paths
■ The choice of meta-paths strongly affects the quality of the learnt node embeddings for a specific task.
■ How to select meta-paths?
● The choice is graph-specific and depends heavily on prior knowledge from domain experts.
● Strategies to combine a set of meta-paths can be complex and computationally expensive.

Are meta-paths necessary?

JUST: Embedding Heterogeneous Graphs without Meta-Paths
■ Random walk with JUmp and STay strategies, which probabilistically control where the walk goes next.
■ Two steps balance the random walk:
● Step I: Jump or stay?
−Objective: Balance the number of heterogeneous and homogeneous edges traversed during random walks (stay with probability α, which decays exponentially as the walk keeps staying in the same domain).
● Step II: If jumping, where to jump?
−Objective: Control the randomness in choosing the target domain (a memory window of recently visited domains favors diversity).
■ Learn node embeddings with the SkipGram model.
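The two steps can be sketched as a single walk generator. This is a hypothetical illustration, not the authors' implementation, and it rests on two assumptions. First, the exponential decay is read as a stay probability of α^l, where l counts the nodes visited consecutively in the current domain. Second, the memory window holds the last `memory` visited domains, and the walk jumps to a domain outside that window whenever one is available. A stay or a jump is forced when only one kind of edge exists. The function and parameter names are hypothetical.

```python
import random
from collections import deque


def just_walk(adj, node_type, start, walk_length, alpha=0.5, memory=2, seed=None):
    """JUST-style random walk (illustrative sketch under the stated assumptions).

    adj: node -> list of neighbors; node_type: node -> domain label.
    """
    rng = random.Random(seed)
    walk = [start]
    recent = deque([node_type[start]], maxlen=memory)  # memory window of domains
    stay_len = 1  # nodes visited consecutively in the current domain
    while len(walk) < walk_length:
        cur = walk[-1]
        same = [n for n in adj[cur] if node_type[n] == node_type[cur]]
        other = [n for n in adj[cur] if node_type[n] != node_type[cur]]
        if not same and not other:
            break  # dead end
        # Step I: jump or stay? Forced when only one kind of edge is available.
        if not other:
            stay = True
        elif not same:
            stay = False
        else:
            stay = rng.random() < alpha ** stay_len  # assumed decay: alpha ** l
        if stay:
            nxt = rng.choice(same)
            stay_len += 1
        else:
            # Step II: where to jump? Prefer domains outside the memory window.
            domains = sorted({node_type[n] for n in other})
            fresh = [d for d in domains if d not in recent]
            target = rng.choice(fresh or domains)
            nxt = rng.choice([n for n in other if node_type[n] == target])
            stay_len = 1
        recent.append(node_type[nxt])
        walk.append(nxt)
    return walk


if __name__ == "__main__":
    adj = {"a1": ["a2", "p1"], "a2": ["a1", "p1", "p2"], "p1": ["a1", "a2", "v1"],
           "p2": ["a2", "v1"], "v1": ["p1", "p2"]}
    types = {"a1": "author", "a2": "author", "p1": "paper", "p2": "paper", "v1": "venue"}
    print(just_walk(adj, types, "a1", walk_length=12, alpha=0.5, seed=7))
```

The resulting walks would then feed a SkipGram model, as with DeepWalk. Setting α close to 0 forces the walk to cross domains at almost every step.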

Results
JUST achieves state-of-the-art performance without using meta-paths.
[Figure: node classification results]

Runtime Performance
■ End-to-end node embedding learning time (in seconds) of all random-walk-based methods:

Method                               DBLP     Movie   Foursquare
DeepWalk                              236       333          484
Metapath2vec (original)               965    19,200        2,248
Metapath2vec (our implementation)     290       408          550
Hin2vec                               904     1,301        1,801
JUST                                  310       442          616

• Compared to DeepWalk and Metapath2vec, JUST incurs only a minor learning-time overhead but achieves better results in classification and clustering tasks.
• Compared to Hin2vec, JUST learns about 3x faster and achieves better results in most experiments.
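The two comparisons can be checked against the runtime numbers above. The snippet divides each baseline's learning time by JUST's learning time on each dataset. It only restates the reported numbers, and the dictionary layout is an arbitrary choice.

```python
# End-to-end learning times in seconds, as reported on this slide.
TIMES = {
    "DeepWalk": {"DBLP": 236, "Movie": 333, "Foursquare": 484},
    "Metapath2vec (ours)": {"DBLP": 290, "Movie": 408, "Foursquare": 550},
    "Hin2vec": {"DBLP": 904, "Movie": 1301, "Foursquare": 1801},
    "JUST": {"DBLP": 310, "Movie": 442, "Foursquare": 616},
}


def relative_to_just(method):
    """Per-dataset ratio method_time / JUST_time (> 1 means JUST is faster)."""
    return {d: TIMES[method][d] / TIMES["JUST"][d] for d in TIMES["JUST"]}


if __name__ == "__main__":
    for method in ("DeepWalk", "Metapath2vec (ours)", "Hin2vec"):
        print(method, {d: round(r, 2) for d, r in relative_to_just(method).items()})
```

The Hin2vec ratios are all between 2.9 and 3.0, which is the "3x" speedup. Conversely, JUST takes roughly 1.27x to 1.33x as long as DeepWalk and 1.07x to 1.12x as long as the Metapath2vec reimplementation.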


Social Relationships vs. Human Mobility

How can we quantify the impact of social relationships and mobility on each other?

Location-Based Social Networks
■ A hypergraph with
● Four data domains
−Spatial: POI
−Temporal: time slot
−Semantic: activity category
−Social: user
● Two types of links
−Friendships
−Check-ins (hyperedges)

Hypergraph Embedding
[Figure: LBSN hypergraph → graph embedding with neural embedding techniques (e.g., SkipGram) → node embedding vectors]
1. How to sample from an LBSN hypergraph?
2. How to preserve n-wise proximity from hyperedges?

1. Sample from a Hypergraph: Random Walk with Stay
■ Balance the impact of social relationships and mobility on the learnt embeddings by sampling and learning from
• a check-in hyperedge with probability α
• a user-user pair with probability (1−α)
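This sampling scheme can be sketched as a walk over the friendship graph. At each step, with probability α, the walk "stays" on the current user and emits one of that user's check-in hyperedges (user, time slot, POI, category). Otherwise, it moves to a random friend and emits the user-user pair. This is a simplified, hypothetical reading of the slide, not LBSN2Vec's actual sampler. The data layout, function name, and sample IDs are assumptions.

```python
import random


def lbsn_samples(friends, checkins, start_user, num_samples, alpha, seed=None):
    """Simplified 'random walk with stay' over an LBSN hypergraph (illustrative sketch).

    friends:  user -> list of friends (social edges)
    checkins: user -> list of check-in hyperedges (user, time_slot, poi, category)
    """
    rng = random.Random(seed)
    user = start_user
    samples = []
    while len(samples) < num_samples:
        has_checkins, has_friends = bool(checkins.get(user)), bool(friends.get(user))
        # Stay with probability alpha (or when there is nowhere to move).
        if has_checkins and (not has_friends or rng.random() < alpha):
            samples.append(("checkin", rng.choice(checkins[user])))
        elif has_friends:
            nxt = rng.choice(friends[user])
            samples.append(("friendship", (user, nxt)))
            user = nxt
        else:
            break  # isolated user without check-ins: nothing left to sample
    return samples


if __name__ == "__main__":
    friends = {"u1": ["u2"], "u2": ["u1", "u3"], "u3": ["u2"]}
    checkins = {
        "u1": [("u1", "Mon-09h", "poi7", "Cafe")],
        "u2": [("u2", "Sat-20h", "poi3", "Bar")],
        "u3": [("u3", "Tue-12h", "poi7", "Cafe")],
    }
    for sample in lbsn_samples(friends, checkins, "u1", num_samples=6, alpha=0.2, seed=0):
        print(sample)
```

A larger α shifts the training signal from the social graph toward mobility data.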

2. Learn from Hyperedges: Learning via a Best-Fit Line
■ Maximize the cosine similarity between the nodes of a hyperedge and their best-fit line:
1. Compute the best-fit line of the hyperedge's node embeddings.
2. Maximize the cosine similarity between each node and the best-fit line.
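Assume "best-fit line under cosine similarity" means the unit direction b that maximizes Σ_i cos(x_i, b). That direction then has a closed form. For a unit vector b, Σ_i cos(x_i, b) = b · Σ_i x_i/‖x_i‖, and by the Cauchy–Schwarz inequality this is maximized when b points along Σ_i x_i/‖x_i‖. The sketch below computes that direction and takes one gradient-ascent step that pulls each node embedding toward it. The function names and learning rate are hypothetical. LBSN2Vec's actual update (e.g., its use of negative sampling) may differ.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cosine(u, v):
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def best_fit_line(vectors):
    """Unit direction b maximizing sum_i cos(x_i, b): proportional to sum_i x_i/||x_i||."""
    units = [normalize(v) for v in vectors]
    return normalize([sum(col) for col in zip(*units)])


def ascent_step(vectors, lr=0.1):
    """One gradient-ascent step on cos(x_i, b) for each node, with b held fixed."""
    b = best_fit_line(vectors)
    updated = []
    for x in vectors:
        norm = math.sqrt(sum(a * a for a in x))
        c = cosine(x, b)
        # d cos(x, b) / dx = (b - cos(x, b) * x / ||x||) / ||x||   (b is a unit vector)
        grad = [(bi - c * xi / norm) / norm for bi, xi in zip(b, x)]
        updated.append([xi + lr * g for xi, g in zip(x, grad)])
    return updated


if __name__ == "__main__":
    # Hypothetical 3-d embeddings of the four nodes of one check-in hyperedge.
    hyperedge = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
    b = best_fit_line(hyperedge)
    print([round(cosine(x, b), 3) for x in hyperedge])
    print([round(cosine(x, b), 3) for x in ascent_step(hyperedge)])
```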

Task I: Friendship Prediction
■ Comparison with other graph embedding techniques
● (S) Social network only
● (S&M) Social and mobility data, combined through clique expansion
■ ↑ 32.95% on precision@10
[Figure: clique expansion]

Task II: Location Prediction
■ Comparison with other graph embedding techniques
● (M) Mobility (check-in) network only
● (S&M) Social and mobility data, combined through clique expansion
■ ↑ 25.32% on accuracy@10

Balancing the Impact of Social Relationships and Mobility Matters!
Asymmetric impact of mobility and social relationships on predicting each other:
• Friendship prediction: 80% social and 20% mobility data
• Location prediction: 60% social and 40% mobility data


Graph Embeddings
■ Graph-sampling-based techniques
● Sample node pairs from a graph and preserve node proximity for the sampled pairs
● Examples: DeepWalk, node2vec, LINE, SDNE, and VERSE
● Efficiency bottleneck: processing a large number of node pairs requires significant computation resources (CPU time)
■ Factorization-based techniques
● Factorize a (transformed, e.g., high-order) proximity/adjacency matrix of the graph
● Examples: GraRep, HOPE, and NetMF
● Efficiency bottleneck: factorizing a large matrix requires significant computation resources (both CPU time and RAM)
■ Node proximity is preserved using cosine similarity
● Efficiency bottleneck: cosine similarity is less efficient to compute than, e.g., Hamming similarity

Similarity-Preserving Hashing/Sketching
■ Efficient similarity approximation of high dimensional data
● Data-dependent hashing (learning-to-hash)
−Learning dataset-specific hashing functions
−Examples: spectral hashing, iterative quantization, etc.
−Efficient in similarity computation, but requires learning hashing functions
● Data-independent hashing/sketching (locality-sensitive hashing)
−Hashing without involving any learning process from data
−Examples: minhash, consistent weighted sampling, etc.
−Efficient in both similarity approximation and hashing
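MinHash, one of the data-independent examples above, estimates the Jaccard similarity of two sets without any learning. For each of several random permutations, it keeps the smallest permuted index in the set. The fraction of positions on which two signatures agree (their Hamming similarity) then estimates the Jaccard similarity. For clarity, this sketch uses explicit random permutations over a small integer universe; practical implementations replace them with hash functions. The function names are hypothetical.

```python
import random


def make_permutations(universe_size, num_hashes, seed=0):
    """num_hashes random permutations of range(universe_size), one per hash function."""
    rng = random.Random(seed)
    perms = []
    for _ in range(num_hashes):
        perm = list(range(universe_size))
        rng.shuffle(perm)
        perms.append(perm)
    return perms


def minhash(items, perms):
    """MinHash signature: for each permutation, the smallest permuted index in the set."""
    return [min(perm[x] for x in items) for perm in perms]


def hamming_similarity(sig_a, sig_b):
    """Fraction of positions on which two signatures agree."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


if __name__ == "__main__":
    perms = make_permutations(universe_size=200, num_hashes=1000, seed=0)
    a, b = set(range(0, 100)), set(range(50, 150))
    exact = len(a & b) / len(a | b)  # Jaccard similarity = 1/3
    print(exact, hamming_similarity(minhash(a, perms), minhash(b, perms)))
```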

Can we sketch nodes in a graph as embeddings?

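Preliminary: Consistent Weighted Sampling¹
■ Principled techniques for highly efficient similarity approximation
● The min-max similarity between the original data vectors, $\mathrm{Sim}_{MM}(V_a, V_b) = \frac{\sum_i \min(V_{a,i}, V_{b,i})}{\sum_i \max(V_{a,i}, V_{b,i})}$, can be approximated by the Hamming similarity between their sketches (the fraction of sketch positions on which they agree).
● Example: a vector with D = 5 dimensions, V = (1.32, 2.77, 1.11, 3.29, 1.31), is mapped to a sketch S = (S_1, …, S_j, …, S_L), where each element S_j is produced by a random hash function h_j, j = 1, …, L.
¹ Dingqi Yang, Bin Li, Laura Rettig, and Philippe Cudré-Mauroux. "D2HistoSketch: Discriminative and Dynamic Similarity-Preserving Sketching of Streaming Histograms." IEEE Transactions on Knowledge and Data Engineering (TKDE), 2018.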
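One standard member of the consistent weighted sampling family is Ioffe's Improved Consistent Weighted Sampling (ICWS, 2010). It draws one (index, t) pair per hash function, and the probability that two vectors draw the same pair equals their min-max similarity. The sketch below implements ICWS as an illustration. It is not necessarily the exact sketching scheme used in D2HistoSketch or NodeSketch. The function names are hypothetical, and the second vector W is made-up example data.

```python
import math
import random


def icws_params(dim, num_hashes, seed=0):
    """Per-hash, per-dimension random variables (r, c, beta), shared by all vectors."""
    rng = random.Random(seed)
    return [
        [(rng.gammavariate(2.0, 1.0), rng.gammavariate(2.0, 1.0), rng.random()) for _ in range(dim)]
        for _ in range(num_hashes)
    ]


def icws_sketch(vec, params):
    """Improved Consistent Weighted Sampling: one (index, t) sample per hash function."""
    sketch = []
    for row in params:
        best, best_a = None, math.inf
        for i, w in enumerate(vec):
            if w <= 0:
                continue  # zero weights are never sampled
            r, c, beta = row[i]
            t = math.floor(math.log(w) / r + beta)
            y = math.exp(r * (t - beta))
            a = c / (y * math.exp(r))
            if a < best_a:
                best, best_a = (i, t), a
        sketch.append(best)
    return sketch


def min_max_similarity(u, v):
    return sum(map(min, u, v)) / sum(map(max, u, v))


def hamming_similarity(s, t):
    return sum(a == b for a, b in zip(s, t)) / len(s)


if __name__ == "__main__":
    V = [1.32, 2.77, 1.11, 3.29, 1.31]  # example vector from the slide (D = 5)
    W = [1.0, 2.0, 1.5, 3.0, 0.5]       # hypothetical second vector
    params = icws_params(dim=5, num_hashes=2000, seed=0)
    print(min_max_similarity(V, W), hamming_similarity(icws_sketch(V, params), icws_sketch(W, params)))
```

In this sketch, two elements are equal only if both the sampled index and its quantized weight t match. Comparing sketches of length L therefore costs L equality checks, whatever the vectors' dimensionality.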

Sketching the Adjacency Matrix?
■ Adjacency matrix vs. self-loop-augmented (SLA) adjacency matrix
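Min-max similarity shows why self-loops help. The example uses the five-node graph from the NodeSketch slides, with edges 1–2, 2–3, 3–4, and 3–5 read off the SLA matrix shown there. In the plain adjacency matrix, directly connected nodes 1 and 2 share no nonzero entry, so their rows have a min-max similarity of 0. Their SLA rows (A + I) share two entries and score 2/3. The helper names are hypothetical.

```python
def adjacency(num_nodes, edges, self_loops=False):
    """Adjacency matrix of an undirected graph with 1-based node IDs; A + I if self_loops."""
    m = [[1 if (self_loops and i == j) else 0 for j in range(num_nodes)] for i in range(num_nodes)]
    for u, v in edges:
        m[u - 1][v - 1] = m[v - 1][u - 1] = 1
    return m


def min_max_similarity(u, v):
    return sum(map(min, u, v)) / sum(map(max, u, v))


if __name__ == "__main__":
    EDGES = [(1, 2), (2, 3), (3, 4), (3, 5)]
    A = adjacency(5, EDGES)
    SLA = adjacency(5, EDGES, self_loops=True)
    print(min_max_similarity(A[0], A[1]), min_max_similarity(SLA[0], SLA[1]))
```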

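NodeSketch: Low-Order Node Embeddings
[Figure: example graph with nodes 1 to 5; low-order embeddings are obtained by sketching each node's SLA adjacency vector]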

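NodeSketch: High-Order Node Embeddings
■ Recursive sketching on the example graph (edges 1–2, 2–3, 3–4, 3–5; D = 5 nodes, sketch length L = 3, weight α = 0.2)
● Inputs: the SLA adjacency matrix $\tilde{A}$ and the (k−1)-order node embeddings S(k−1)
   Node   SLA adjacency row   S(k−1)
   1      1 1 0 0 0           2 1 1
   2      1 1 1 0 0           2 3 1
   3      0 1 1 1 1           2 3 4
   4      0 0 1 1 0           4 3 4
   5      0 0 1 0 1           5 3 5
● For node r, compute the sketch element distribution of each neighbor $n \in \Gamma(r)$:
   $\frac{1}{L}\sum_{j=1}^{L} \mathbb{1}[S_j^{n}(k-1) = i], \quad i = 1, \dots, D$
   e.g., node 1's only neighbor is node 2, whose sketch (2, 3, 1) gives (0.33, 0.33, 0.33, 0, 0)
● Merge the weighted distributions into the SLA adjacency vector $\tilde{V}_r$ to approximate the k-order SLA adjacency vector:
   $\tilde{V}_r^{(k)}(i) = \tilde{V}_r(i) + \sum_{n \in \Gamma(r)} \frac{\alpha}{L} \sum_{j=1}^{L} \mathbb{1}[S_j^{n}(k-1) = i], \quad i = 1, \dots, D$
   e.g., node 1: (1, 1, 0, 0, 0) + 0.2 × (0.33, 0.33, 0.33, 0, 0) ≈ (1.067, 1.067, 0.067, 0, 0)
● Sketch $\tilde{V}_r^{(k)}$ using Eq. 3 to obtain the k-order embeddings S(k)
   Node   S(k)
   1      2 1 3
   2      2 3 4
   3      2 3 4
   4      2 3 4
   5      4 3 5
■ Uniformity of the generated samples: the foundation of our recursive sketching process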
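The merge step of the recursive sketching above can be reproduced numerically. This sketch computes the approximate k-order SLA adjacency vector from a node's SLA row and its neighbors' (k−1)-order sketches. It omits the subsequent re-sketching with consistent weighted sampling, and its function names are hypothetical. For node 1 of the example (SLA row (1, 1, 0, 0, 0), one neighbor with sketch (2, 3, 1), α = 0.2), it yields approximately (1.067, 1.067, 0.067, 0, 0).

```python
def sketch_distribution(sketch, dim):
    """(1/L) * sum_j 1[S_j = i] for i = 1..D; sketch elements are 1-based node IDs."""
    dist = [0.0] * dim
    for s in sketch:
        dist[s - 1] += 1.0 / len(sketch)
    return dist


def approx_k_order_vector(sla_row, neighbor_sketches, alpha):
    """SLA row plus alpha times each neighbor's (k-1)-order sketch element distribution."""
    merged = [float(x) for x in sla_row]
    for sketch in neighbor_sketches:
        for i, p in enumerate(sketch_distribution(sketch, len(sla_row))):
            merged[i] += alpha * p
    return merged


if __name__ == "__main__":
    # Node 1 of the example: SLA row (1, 1, 0, 0, 0); its neighbor, node 2, has sketch (2, 3, 1).
    print(approx_k_order_vector([1, 1, 0, 0, 0], [[2, 3, 1]], alpha=0.2))
```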

Results: Node Classification Performance using Kernel SVM
[Figure: node classification results comparing NodeSketch with classical graph embedding techniques (preserving cosine similarity), learning-to-hash techniques, and sketching techniques]
NodeSketch performs comparably to the best-performing state-of-the-art techniques.

Results: Runtime Performance
NodeSketch is highly efficient and significantly outperforms all baselines in runtime, with a 9x to 273x speedup.
Hamming similarity is also more efficient to compute than cosine similarity (1.19x to 1.68x speedup).

Take-Away Messages
■ JUST: Meta-path-free heterogeneous graph embedding can achieve state-of-the-art performance efficiently. [CIKM’18]
■ LBSN2Vec: Social relationships and mobility have an asymmetric impact on each other. [WWW’19]
■ NodeSketch: High-quality node embeddings can be generated via highly efficient sketching techniques. [KDD’19]
[CIKM’18] Rana Hussein, Dingqi Yang, and Philippe Cudré-Mauroux. "Are Meta-Paths Necessary? Revisiting Heterogeneous Graph Embeddings." CIKM’18.
[WWW’19] Dingqi Yang, Bingqing Qu, Jie Yang, and Philippe Cudré-Mauroux. "Revisiting User Mobility and Social Relationships in LBSNs: A Hypergraph Embedding Approach." WWW’19.
[KDD’19] Dingqi Yang, Paolo Rosso, Bin Li, and Philippe Cudré-Mauroux. "NodeSketch: Highly-Efficient Graph Embeddings via Recursive Sketching." KDD’19.

Future Plan for Representation Learning on Graphs
■ Attributed graph structure (e.g., property graphs)
■ Heterogeneous data structures (e.g., structured knowledge graph + unstructured text)
■ Dynamic graphs (e.g., streaming LBSN graphs)
