Are Meta-Paths Necessary? Revisiting Heterogeneous Graph Embeddings. Full paper at CIKM 2018, by Rana Hussein, Dingqi Yang and Philippe Cudré-Mauroux.

- 1. Are Meta-Paths Necessary? Revisiting Heterogeneous Graph Embeddings Rana Hussein, Dingqi Yang and Philippe Cudré-Mauroux eXascale Infolab, University of Fribourg, Switzerland 27th ACM International Conference on Information and Knowledge Management (CIKM 2018)
- 2. Graph Embeddings • Represent nodes in a graph using a vector space. • Learn a latent space representation of the graph structure and node interactions. • Community detection • Friendship recommendation • User interest prediction • Reference: Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 701–710.
- 3. Graph Embedding Techniques • A typical approach combines random walks with a SkipGram-like model. • References: Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 701–710. Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).
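The random walk plus SkipGram recipe on this slide can be sketched in a few lines. This is a minimal illustration, not the DeepWalk implementation: the toy graph, the function names `random_walk` and `skipgram_pairs`, and the parameter values are all assumptions made for the example. It generates uniform random walks and extracts the (center, context) node pairs that a SkipGram-like model would be trained on.

```python
import random

def random_walk(adj, start, length, rng):
    """Uniform random walk of at most `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbors = adj[walk[-1]]
        if not neighbors:  # dead end: stop early
            break
        walk.append(rng.choice(neighbors))
    return walk

def skipgram_pairs(walk, window):
    """(center, context) pairs for nodes at most `window` steps apart in the walk."""
    pairs = []
    for i, center in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs

# Hypothetical toy graph as adjacency lists (not from the slides).
adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
rng = random.Random(0)
walks = [random_walk(adj, node, 5, rng) for node in adj]  # one walk per node
pairs = [p for w in walks for p in skipgram_pairs(w, window=2)]
```

The resulting pairs play the role that (word, context word) pairs play in word2vec: nodes that co-occur in walks are pushed toward similar vectors.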
- 4. Heterogeneous Graphs • Heterogeneous graphs contain multiple node types: • Homogeneous edges: linking nodes from the same domain • Heterogeneous edges: linking nodes across different domains • The proximity among nodes is based on semantics.
- 5. Heterogeneous Graph Embeddings • A meta-path is a sequence of node types encoding key composite relations among the involved node types. • Meta-paths are used to guide random walks to redefine the neighborhood of a node. • Metapath2vec (KDD 2017) • Reference: Yuxiao Dong, Nitesh V Chawla, and Ananthram Swami. 2017. metapath2vec: Scalable representation learning for heterogeneous networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 135–144.
- 6. Challenges • How to select meta-paths? • Meta-path selection is graph-specific and depends heavily on prior knowledge from domain experts. • Strategies to combine a set of meta-paths can be complex and computationally expensive. • The choice of meta-paths strongly affects the quality of the learnt node embeddings for a specific task.
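To make "meta-paths guide random walks" concrete, here is a small sketch of a walk constrained to follow a node-type pattern. It is an illustration under assumptions, not the metapath2vec code: the bibliographic toy graph, the single-letter type labels (A for author, P for paper, V for venue), and the function name `metapath_walk` are hypothetical.

```python
import random

def metapath_walk(adj, node_type, start, metapath, length, rng):
    """Walk that only steps to neighbors whose type matches the meta-path.

    `metapath` is treated as cyclic: "APA" means A -> P -> A -> P -> ...
    """
    if metapath[0] != metapath[-1] or node_type[start] != metapath[0]:
        raise ValueError("meta-path must be repeatable and match the start node")
    cycle = metapath[:-1]  # "APA" -> "AP", repeated along the walk
    walk = [start]
    while len(walk) < length:
        wanted = cycle[len(walk) % len(cycle)]
        candidates = [n for n in adj[walk[-1]] if node_type[n] == wanted]
        if not candidates:  # no neighbor of the required type
            break
        walk.append(rng.choice(candidates))
    return walk

# Hypothetical toy graph: two authors, two papers, one venue.
adj = {
    "a1": ["p1"], "a2": ["p1", "p2"],
    "p1": ["a1", "a2", "v1"], "p2": ["a2", "v1"],
    "v1": ["p1", "p2"],
}
node_type = {"a1": "A", "a2": "A", "p1": "P", "p2": "P", "v1": "V"}
walk = metapath_walk(adj, node_type, "a1", "APA", 5, random.Random(0))
```

With the "APA" pattern the walk never visits the venue node, which shows how the chosen meta-path decides which neighbors count as context.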
- 8. JUST - Heterogeneous Graph Embeddings technique • We propose a two-step graph embedding technique for heterogeneous information networks (HINs): • Step 1: Random walk with JUmp and STay strategies to probabilistically control the random walk. • Step 2: Learn node embeddings with the SkipGram model.
- 9. Random Walk with JUmp and STay strategies (JUST) 1- Jump or stay? • Objective: Balance the number of heterogeneous and homogeneous edges traversed during random walks. • α ∈ [0, 1] is an initial stay probability. • The stay decision also uses the number of nodes consecutively visited in the same domain.
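The jump-or-stay decision can be sketched as follows. The slide's formula did not survive in this transcript, so the exponential decay `alpha ** same_domain_count` used here is an assumption, as are the function names and the handling of nodes that have only one kind of edge. The intent is only to show how an initial stay probability can shrink as the walk lingers in one domain.

```python
import random

def stay_probability(alpha, same_domain_count, has_homogeneous, has_heterogeneous):
    """Probability of staying in the current domain at this step.

    Assumed form: alpha decays exponentially with the number of nodes
    visited consecutively in the current domain.
    """
    if not has_homogeneous:    # nothing to stay on: must jump
        return 0.0
    if not has_heterogeneous:  # nothing to jump to: must stay
        return 1.0
    return alpha ** same_domain_count

def should_stay(alpha, same_domain_count, has_homogeneous, has_heterogeneous, rng):
    """Draw the jump-or-stay decision for one step of the walk."""
    p = stay_probability(alpha, same_domain_count, has_homogeneous, has_heterogeneous)
    return rng.random() < p

# With alpha = 0.5, staying gets less likely the longer the walk stays put.
probs = [stay_probability(0.5, count, True, True) for count in (1, 2, 3)]
decision = should_stay(0.5, 1, True, True, random.Random(0))
```

Under this assumed form, a small α favors heterogeneous edges and a large α favors homogeneous ones, which matches the balancing role the slide assigns to α.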
- 10. Random Walk with JUmp and STay strategies (JUST) 2- Where to jump? • Objective: Control the randomness in choosing a target domain. • Define a fixed-length queue Qhist to memorize up to m previously visited domains.
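A fixed-length history queue maps naturally onto `collections.deque` with `maxlen`. The sketch below is one plausible reading of the slide, not the authors' code: the preference for domains absent from the queue, the fallback when every reachable domain is already in it, and the name `choose_target_domain` are assumptions.

```python
import random
from collections import deque

def choose_target_domain(current_domain, reachable_domains, q_hist, rng):
    """Pick a domain to jump to, preferring domains not in the recent history."""
    candidates = sorted(d for d in reachable_domains if d != current_domain)
    if not candidates:
        raise ValueError("no other domain is reachable from this node")
    fresh = [d for d in candidates if d not in q_hist]
    # Assumed fallback: if every candidate was visited recently, pick among all.
    return rng.choice(fresh or candidates)

m = 2                      # hypothetical memory size
q_hist = deque(maxlen=m)   # oldest entry is dropped automatically when full
q_hist.append("A")         # the walk visited domain A ...
q_hist.append("P")         # ... and is now in domain P

target = choose_target_domain("P", {"A", "V"}, q_hist, random.Random(0))
q_hist.append(target)      # remember the newly entered domain
```

Here domain A is still in the queue, so the walk jumps to V instead of bouncing straight back, and the queue then holds only the two most recent domains.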
- 11. Random Walk with JUmp and STay strategies (JUST) • For each node in the graph, we start a random walk and continue it until the maximum length is reached. • Using the SkipGram model, maximize the co-occurrence probability of two nodes appearing within a context window in the random walk.
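For reference, the co-occurrence objective can be written in the standard SkipGram form. The notation is assumed for this sketch and is not taken from the slides: V is the node set, C(v) is the multiset of nodes that appear within the context window of v across all walks, and Φ and Φ' are the node and context embedding maps. In practice the softmax is usually approximated, for example with negative sampling.

```latex
\max_{\Phi,\,\Phi'} \; \sum_{v \in V} \sum_{c \in C(v)} \log p(c \mid v),
\qquad
p(c \mid v) = \frac{\exp\!\left(\Phi'(c)^{\top} \Phi(v)\right)}
                   {\sum_{u \in V} \exp\!\left(\Phi'(u)^{\top} \Phi(v)\right)}
```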
- 12. Experimental evaluation - Datasets • DBLP • Movie • Foursquare
- 13. Experimental evaluation - Baselines • Homogeneous graph embedding techniques: • DeepWalk • LINE • Heterogeneous graph embedding techniques: • PTE • Metapath2vec • Hin2vec • JUST_no_memory (simplified version of our proposed method)
- 14. Node classification results • JUST achieves state-of-the-art performance and outperforms the baselines.
- 15. Node clustering results • JUST outperforms the baselines on all datasets. • Combining several meta-paths may not consistently outperform manually selecting one meta-path. (Figure: results on DBLP, Movie and Foursquare for DeepWalk, LINE, PTE, Metapath2vec, Hin2vec, JUST_no_memory and JUST.)
- 16. Impact of initial stay probability α • Balances the impact of heterogeneous and homogeneous edges on the learnt embeddings. • Tune α within [0.1, 0.9] with a step of 0.1. • Traversing too many heterogeneous edges or too many homogeneous edges gives suboptimal results. • Balancing the two kinds of edges is key to learning high-quality embeddings. • The optimal α lies in the range [0.2, 0.4] on all three datasets in both node classification and clustering tasks.
- 17. Runtime Performance • End-to-end node embedding learning time, in seconds, for all random-walk based methods:

  | Method | DBLP | Movie | Foursquare |
  |---|---|---|---|
  | DeepWalk | 236 | 333 | 484 |
  | Metapath2vec (original) | 965 | 19,200 | 2,248 |
  | Metapath2vec (ours) | 290 | 408 | 550 |
  | Hin2vec | 904 | 1,301 | 1,801 |
  | JUST | 310 | 442 | 616 |

  • Compared to DeepWalk and Metapath2vec, JUST adds a minor overhead in learning time but achieves better results in classification and clustering tasks. • Compared to Hin2vec, JUST learns about 3x faster and achieves better results in most experiments.
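The speedup and overhead claims on this slide can be checked directly from the reported runtimes. The snippet below only redoes that arithmetic; the dictionary layout and the helper name `ratio` are choices made for this example.

```python
# Learning times in seconds, copied from the runtime table on the slide.
runtimes = {
    "DeepWalk": {"DBLP": 236, "Movie": 333, "Foursquare": 484},
    "Hin2vec": {"DBLP": 904, "Movie": 1301, "Foursquare": 1801},
    "JUST": {"DBLP": 310, "Movie": 442, "Foursquare": 616},
}

def ratio(slower, faster):
    """Per-dataset runtime ratio slower / faster."""
    return {d: runtimes[slower][d] / runtimes[faster][d] for d in runtimes[faster]}

speedup_vs_hin2vec = ratio("Hin2vec", "JUST")      # how much faster JUST is
overhead_vs_deepwalk = ratio("JUST", "DeepWalk")   # how much slower JUST is

for dataset in runtimes["JUST"]:
    print(dataset,
          f"speedup over Hin2vec: {speedup_vs_hin2vec[dataset]:.2f}x,",
          f"time relative to DeepWalk: {overhead_vs_deepwalk[dataset]:.2f}x")
```

The ratios come out at roughly 2.9x over Hin2vec on every dataset, and JUST takes about 27% to 33% longer than DeepWalk.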
- 18. Conclusions • We propose JUST, a heterogeneous graph embedding technique that uses random walks with jump and stay strategies and requires no prior knowledge. • JUST achieves state-of-the-art performance on classification and clustering tasks without using meta-paths. • We plan to investigate how JUST performs on other graph structures, such as knowledge graphs.