NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation learning for heterogeneous networks", KDD 2017
1. Ho-Beom Kim
Network Science Lab
Dept. of Mathematics
The Catholic University of Korea
E-mail: hobeom2001@catholic.ac.kr
2023 / 07 / 24
DONG, Yuxiao; CHAWLA, Nitesh V.; SWAMI, Ananthram.
ACM SIGKDD
2. 2
Introduction
Problem Statements
• Neural network-based learning models can represent latent embeddings that capture the internal relations
of rich, complex data across various modalities, such as image, audio, and language.
• Social and information networks are similarly rich and complex data that encode the dynamics and types of
human interactions, and are similarly amenable to representation learning using neural networks.
• Recent research publications have proposed word2vec-based network representation learning frameworks
• DeepWalk, LINE, node2vec
→ These works have so far focused on representation learning for homogeneous networks, which contain a
single type of nodes and relationships.
• A large number of social and information networks are heterogeneous in nature, involving a diversity of
node types and/or relationships between nodes
3. 3
Introduction
Problem Statements
• The aforementioned works commonly concentrate on users' recent behaviors
→ They are not capable of fully mining older behavior sequences to accurately estimate users' current
interests.
• There are two major challenges in sequential recommendation that have not been well addressed so far
1. User behaviors in long sequences reflect implicit and noisy preference signals
2. User preferences are always drifting over time due to their diversity
To address these two challenges, they propose a graph-based method with graph convolutional networks
to extract implicit preference signals:
• Sequence graph
• Design an attentive graph convolutional network
• Dynamic pooling technique
4. 4
Introduction
Contributions
• They approach sequential recommendation from a new perspective by taking into consideration the
implicit-signal behaviors and fast-changing preferences
• They propose to aggregate implicit signals into explicit ones from user behaviors by designing graph
neural network-based models on constructed item-item interest graphs. They then design dynamic pooling
for filtering and preserving activated core preferences for recommendation
• They conduct extensive experiments on two large-scale datasets collected from real-world applications.
The experimental results show significant performance improvements compared with state-of-the-art
sequential recommendation methods. Further studies also verify that the method can model long
behavioral sequences effectively and efficiently
6. 6
Problem Definition
Definition
A : authors
P : papers
V : venues
O : organizations
→ Nodes
A-A : coauthor
A-P,P-V : publish
O-A : affiliation
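The node and edge types above can be illustrated with a tiny toy graph. This is a minimal sketch using plain dicts with hypothetical node names (a1, p1, KDD, CMU), not the paper's code:

```python
# Toy heterogeneous academic network: authors (A), papers (P),
# venues (V), organizations (O), with typed edges as in the slide.
node_type = {
    "a1": "A", "a2": "A",   # authors
    "p1": "P",              # paper
    "KDD": "V",             # venue
    "CMU": "O",             # organization
}
edges = [
    ("a1", "a2", "coauthor"),      # A-A
    ("a1", "p1", "publish"),       # A-P
    ("p1", "KDD", "publish"),      # P-V
    ("CMU", "a1", "affiliation"),  # O-A
]

# Undirected adjacency list; neighbor types can later guide typed walks.
adj = {}
for u, v, _ in edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

print(adj["a1"])  # ['a2', 'p1', 'CMU']
```

A typed adjacency structure like this is all a meta-path-guided walker needs: at each node it filters neighbors by the next required type.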
7. 7
Problem Definition
Problem1 : Heterogeneous Network Representation Learning
• The task is to learn the d-dimensional latent representations X ∈ ℝ^(|V|×d), d ≪ |V|, that are able to
capture the structural and semantic relations among nodes
• The output: the low-dimensional matrix X, with the v-th row, a d-dimensional vector X_v, corresponding to
the representation of node v
• Although there are different types of nodes in V, their representations are mapped into the same latent space
• The learned embedding vector of each node can be used as the feature input of node classification,
clustering, and similarity search tasks
• The main challenge of this problem comes from the network heterogeneity, wherein it is difficult to directly
apply homogeneous language and network embedding methods
• The premise of network embedding models is to preserve the proximity between a node and its
neighborhood.
8. 8
Related Works
PTE
• First learns a low-dimensional embedding for words through a heterogeneous text network.
• The network encodes different levels of co-occurrence information between words and words, words and
documents, and words and labels.
• The network is embedded into a low dimensional vector space that preserves the second-order proximity
between the vertices in the network.
• Proposed to learn predictive text embeddings in a semi-supervised manner.
• Unlabeled data and labeled information are integrated into a heterogeneous text network which
incorporates different levels of cooccurrence information in text
• Propose PTE, which learns a distributed representation of text through embedding the heterogeneous text
network into a low dimensional space
• PTE uses the average value of word embeddings
• PTE considers the linear relationship between words and labels
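The averaging step mentioned above can be sketched as follows. This is a toy illustration with hypothetical 4-dimensional word vectors, not PTE's actual implementation:

```python
import numpy as np

# Hypothetical word embeddings (d = 4), as if learned from a
# heterogeneous text network.
word_vecs = {
    "graph":     np.array([0.2, 0.1, 0.0, 0.5]),
    "embedding": np.array([0.4, 0.3, 0.1, 0.1]),
}

def text_embedding(words, word_vecs):
    """PTE-style text representation: the average of its word embeddings."""
    return np.mean([word_vecs[w] for w in words], axis=0)

doc = text_embedding(["graph", "embedding"], word_vecs)
print(doc)  # [0.3  0.2  0.05 0.3]
```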
9. 9
Related Works
PTE
• Deep neural networks fully leverage labeled information that is available for a task when they learn the
representations of the data
• Most text embedding methods are not able to consider labeled information when learning the
representations
• Previous embedding models use unsupervised text embeddings
→ Generalizable across different tasks, but with weaker predictive power for a particular task
• Text embedding methods are much more efficient, are much easier to tune, and naturally accommodate
unlabeled data
13. 13
Experimental Setup
Experiments
The number of walks per node w: 1000;
The walk length l: 100;
The vector dimension d: 128 (LINE: 128 for each order);
The neighborhood size k: 7;
The size of negative samples: 5
23. 23
Conclusions
Conclusions
• To address the network heterogeneity challenge, they propose the metapath2vec and metapath2vec++
methods.
• In these methods, they formalize the heterogeneous neighborhood of a node, enabling the
skip-gram-based maximization of the network probability in the context of multiple types of nodes
• Finally, they achieve effective and efficient optimization by presenting a heterogeneous negative sampling
technique.
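The heterogeneous (type-aware) negative sampling idea can be sketched as below. Toy data and uniform sampling are assumptions for illustration; metapath2vec++ draws negatives from a frequency-based distribution restricted to the context node's type:

```python
import random

random.seed(0)

# Hypothetical typed nodes.
node_type = {"a1": "A", "a2": "A", "p1": "P", "p2": "P", "ACL": "V"}

# Group nodes by type once, so negatives can be drawn per type.
by_type = {}
for n, t in node_type.items():
    by_type.setdefault(t, []).append(n)

def draw_negatives(context, M=3):
    """Draw M negative nodes only from the node type of the context node
    (uniform here for simplicity)."""
    t = node_type[context]
    return [random.choice(by_type[t]) for _ in range(M)]

negs = draw_negatives("p1")  # all negatives are papers, like the context
```

This contrasts with plain metapath2vec, which draws negatives from all nodes regardless of type.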
Future work
1. Sampling a network into a huge pile of paths faces the challenge of large intermediate output data, so
identifying and optimizing the sampling space is an important direction
2. As is also the case with all metapath-based heterogeneous network mining methods, metapath2vec and
metapath2vec++ can be further improved by the automatic learning of meaningful meta-paths
3. Extending the models to incorporate the dynamics of evolving heterogeneous networks
4. Generalizing the models for different genres of heterogeneous networks
Editor's Notes
N_t(v) : v's neighborhood with the t-th type of nodes
p : a softmax function
p(c_t | v; θ) = exp(X_{c_t} · X_v) / Σ_{u∈V} exp(X_u · X_v)
X_v is the v-th row of X
Given a negative sample size M, the objective is updated to:
log σ(X_{c_t} · X_v) + Σ_{m=1}^{M} E_{u^m ∼ P(u)} [log σ(−X_{u^m} · X_v)]
where σ(x) = 1 / (1 + exp(−x)) and P(u) is the pre-defined distribution from which a negative node u^m is drawn M times
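The softmax and its negative-sampling approximation can be sketched numerically. Random toy embeddings are assumed, and uniform sampling stands in for the paper's pre-defined distribution P(u):

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 6, 4                      # toy sizes: 6 nodes, 4-dim embeddings
X = rng.normal(size=(V, d))      # embedding matrix; row v is X_v

def softmax_prob(ct, v, X):
    """p(c_t | v; theta) = exp(X_ct . X_v) / sum_u exp(X_u . X_v)."""
    logits = X @ X[v]
    logits = logits - logits.max()             # numerical stability
    p = np.exp(logits) / np.exp(logits).sum()
    return p[ct]

def neg_sampling_objective(ct, v, X, M=5):
    """log sigma(X_ct . X_v) plus M sampled log sigma(-X_u . X_v) terms."""
    sigma = lambda x: 1.0 / (1.0 + np.exp(-x))
    negs = rng.integers(0, V, size=M)          # stand-in for P(u)
    return np.log(sigma(X[ct] @ X[v])) + sum(
        np.log(sigma(-X[u] @ X[v])) for u in negs
    )

# The softmax is a proper distribution over all nodes:
probs = np.array([softmax_prob(c, 2, X) for c in range(V)])
print(round(probs.sum(), 6))  # 1.0
```

Negative sampling avoids the Σ over all of V, which is what makes the softmax expensive on large networks.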
Metapath2vec builds the node frequency distribution by viewing different types of nodes homogeneously and draws (negative) nodes regardless of their types.
The neighbors of an author node a4 can be structurally close not only to other authors (e.g., a2, a3, and a5), venues (e.g., ACL and KDD), and organizations (CMU and MIT), but also to papers (e.g., p2 and p3).
we can put random walkers in a heterogeneous network to generate paths of multiple types of nodes. At step i, the transition probability p(v_{i+1} | v_i) is denoted as the normalized probability distributed over the neighbors of v_i, ignoring their node types.
In light of these issues, we design meta-path-based random walks to generate paths that are able to capture both the semantic and structural correlations between different types of nodes, facilitating the transformation of heterogeneous network structures into metapath2vec's skip-gram.
a meta-path scheme P is defined as a path of the form V1 →(R1) V2 →(R2) · · · Vt →(Rt) Vt+1 · · · →(R_{l−1}) Vl,
where R = R1 ∘ R2 ∘ · · · ∘ R_{l−1} defines the composite relation between node types V1 and Vl
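A meta-path-guided walk under a scheme such as "APVPA" (author-paper-venue-paper-author) can be sketched as below. The toy graph and node names are hypothetical:

```python
import random

random.seed(0)

# Toy heterogeneous graph: node -> type, plus adjacency lists.
node_type = {"a1": "A", "a2": "A", "p1": "P", "p2": "P", "ACL": "V"}
adj = {
    "a1": ["p1"], "a2": ["p1", "p2"],
    "p1": ["a1", "a2", "ACL"], "p2": ["a2", "ACL"],
    "ACL": ["p1", "p2"],
}

def metapath_walk(start, scheme, length):
    """Meta-path-guided random walk: at each step, move only to a neighbor
    whose type matches the next type in the (cyclic) scheme, e.g. 'APVPA'.
    The scheme's first and last types coincide, so it repeats indefinitely."""
    walk = [start]
    i = 0
    while len(walk) < length:
        i += 1
        want = scheme[i % (len(scheme) - 1)]   # cyclic indexing over A P V P
        cands = [u for u in adj[walk[-1]] if node_type[u] == want]
        if not cands:
            break                              # dead end for this scheme
        walk.append(random.choice(cands))
    return walk

walk = metapath_walk("a1", "APVPA", 9)
print([node_type[n] for n in walk])  # ['A','P','V','P','A','P','V','P','A']
```

Restricting each step to the scheme's next type is what biases the walk toward semantically meaningful paths, instead of the type-blind transitions described above.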
gradients are derived
I_{c_t}[u_t^m] is an indicator function indicating whether u_t^m is the neighborhood context node c_t, and when m = 0, u_t^0 = c_t.
The model is optimized using the stochastic gradient descent algorithm.
In metapath2vec and metapath2vec++, they need to specify the meta-path scheme to guide the random walks
When predicting the venue category, the advantage of both metapath2vec and metapath2vec++ is particularly strong given a small size of training data.
The advantage of the proposed methods lies in their proper consideration and accommodation of the network heterogeneity challenge—the existence of multiple types of nodes and relations
Figure 3 shows the classification results as a function of one chosen parameter when the others are controlled for.
we leverage the k-means algorithm to cluster the data and evaluate the clustering results in terms of normalized mutual information (NMI)
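This evaluation step can be sketched with scikit-learn; the 2-D points below are synthetic stand-ins for learned node embeddings:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

rng = np.random.default_rng(0)
# Two well-separated synthetic groups of 20 "embeddings" each.
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
labels = np.array([0] * 20 + [1] * 20)  # ground-truth categories

# Cluster with k-means, then score agreement with the labels via NMI.
pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
nmi = normalized_mutual_info_score(labels, pred)
print(nmi)  # 1.0 when the clusters are perfectly recovered
```

NMI is label-permutation invariant, which is why it is the standard choice for comparing cluster assignments against categories.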
experiments are conducted 10 times and the average performance is reported.
metapath2vec and metapath2vec++ generate more appropriate embeddings for different types of nodes in the network than comparative baselines, suggesting their ability to capture and incorporate the underlying structural and semantic relationships between various types of nodes in heterogeneous networks.
Further, different from the positive effect of increasing w and l on author clustering, d and k are negatively correlated with the author clustering performance, as observed from Figures 4(c) and 4(d)
Similarly, the venue clustering performance also shows a descending trend with an increasing d, while on the other hand, we observe a first-increasing and then-decreasing NMI line when k is increased. Both figures together imply that d = 128 and k = 7 are capable of embedding heterogeneous nodes into latent space for promising clustering outcomes.
Table 5 lists the top ten similar results for querying the 16 leading conferences in corresponding computer science sub-fields
for the query “ACL”, for example, metapath2vec++ returns venues with the same focus, natural language processing, such as EMNLP (1st), NAACL (2nd), Computational Linguistics (3rd), CoNLL (4th), COLING (5th)
Similar performance can be also achieved when querying the other conferences from various fields
we find that in most cases, the top three results cover venues with similar prestige to the query one
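Similarity search over learned embeddings reduces to cosine-similarity ranking. A sketch with made-up toy venue vectors (not the paper's learned values):

```python
import numpy as np

# Hypothetical 3-dim venue embeddings; in the paper these come
# from metapath2vec++ with d = 128.
venues = ["ACL", "EMNLP", "NAACL", "KDD", "ICML"]
E = np.array([
    [0.9,  0.1,  0.0 ],  # ACL
    [0.85, 0.15, 0.0 ],  # EMNLP (close to ACL)
    [0.8,  0.2,  0.05],  # NAACL (close to ACL)
    [0.1,  0.9,  0.2 ],  # KDD
    [0.0,  0.2,  0.9 ],  # ICML
])

def top_similar(query, venues, E, k=2):
    """Rank venues by cosine similarity to the query's embedding."""
    En = E / np.linalg.norm(E, axis=1, keepdims=True)  # unit-normalize rows
    q = En[venues.index(query)]
    order = np.argsort(-(En @ q))                      # descending similarity
    return [venues[i] for i in order if venues[i] != query][:k]

print(top_similar("ACL", venues, E))  # ['EMNLP', 'NAACL']
```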
Similar results can also be observed in Table 6, which shows the similarity search results for the DBIS network.
Figure 5 visualizes the latent vectors, learned by metapath2vec++, of the 48 venues used in the similarity search of Section 4.4, three from each of the 16 sub-fields. We can see that conferences from the same domain are grouped close to each other and each group is well separated from the others, further demonstrating the embedding ability of metapath2vec++.
Figure 6 shows the speedup of metapath2vec & metapath2vec++ over the single-threaded case
Optimal speedup performance is denoted by the dashed y = x line, which represents perfect distribution and execution