Here are some potential ways to leverage meta-path incompatibility in graph embedding models:
1. Generate random walks constrained by meta-path incompatibility. Only allow transitions between nodes/edges that are compatible based on the meta-paths.
2. Learn separate embeddings for each meta-path/aspect. Model incompatibility by having the embeddings for incompatible meta-paths be orthogonal/dissimilar.
3. Incorporate meta-path incompatibility directly into the objective function, e.g. by maximizing agreement between compatible meta-paths and minimizing agreement between incompatible ones.
4. Sample negative examples differently based on meta-path incompatibility. Only use nodes/edges reachable by compatible meta-paths as negative examples.
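Ideas 2 and 3 above can be combined into a simple regularizer: penalize agreement between embeddings learned under incompatible meta-paths. A minimal sketch (the function name and penalty form are our illustration, not from the paper):

```python
import numpy as np

def incompatibility_penalty(emb_a, emb_b):
    """Penalty pushing the embeddings learned under two incompatible
    meta-paths toward orthogonality.

    emb_a, emb_b: (n_nodes, d) embedding matrices for the same nodes,
    each learned under a different meta-path."""
    # Normalize rows so the penalty is scale-invariant.
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    # Squared cosine similarity per node: zero iff the two views
    # are orthogonal, one iff they coincide.
    cos = np.sum(a * b, axis=1)
    return float(np.mean(cos ** 2))
```

Adding this term to the training objective (weighted by a hyperparameter) discourages incompatible meta-paths from collapsing into the same metric space.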
1. Easing Embedding Learning by Comprehensive Transcription of Heterogeneous Information Networks
Yu Shi, Qi Zhu, Fang Guo, Chao Zhang, Jiawei Han
University of Illinois at Urbana-Champaign, Urbana, IL, USA
Presenter: Zhiwei (Jim) Liu
Easing Embedding Learning by Comprehensive Transcription of Heterogeneous Information Networks, KDD’18
2. Road Map
• Background: Network Embedding + HIN
• Preliminary
• Proposed Model
• Experiment
• Conclusion and Future work
• Q&A
5. Network Embedding
[1] W. Zachary. An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33(4):452–473, 1977.
6. DeepWalk
• Algorithm: Random Walk + Skip-gram Model
[1] B. Perozzi, R. Al-Rfou, and S. Skiena. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD
international conference on Knowledge discovery and data mining, pages 701–710. ACM, 2014.
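The walk-generation half of DeepWalk can be sketched as follows; the resulting "sentences" of node IDs are then fed to a skip-gram model such as gensim's Word2Vec (function and parameter names below are our illustration):

```python
import random

def deepwalk_walks(adj, num_walks=10, walk_length=40, seed=0):
    """Generate uniform random walks in the style of DeepWalk.

    adj: dict mapping each node to a list of its neighbors.
    Returns a list of walks (lists of node IDs)."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        nodes = list(adj)
        rng.shuffle(nodes)          # one pass over all nodes per iteration
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                nbrs = adj[walk[-1]]
                if not nbrs:        # dead end: stop this walk early
                    break
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks
```

Each walk is treated as a sentence and each node as a word, so the skip-gram objective makes co-visited nodes close in the embedding space.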
7. LINE
• Algorithm: First-order + Second-order Proximity
[1] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei. LINE: Large-scale Information Network Embedding. In WWW,
2015.
• First-order Proximity: local pairwise similarity
• Second-order Proximity: neighborhood-structure similarity
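LINE's first-order proximity models the probability of an observed edge as p₁(u, v) = sigmoid(zᵤ · zᵥ) and minimizes the negative log-likelihood over edges. A minimal sketch of that loss (negative sampling omitted for brevity; names are ours):

```python
import numpy as np

def line_first_order_loss(Z, edges):
    """Mean negative log-likelihood of observed edges under LINE's
    first-order proximity: p1(u, v) = sigmoid(z_u . z_v).

    Z: (n_nodes, d) embedding matrix; edges: list of (u, v) pairs."""
    loss = 0.0
    for u, v in edges:
        score = Z[u] @ Z[v]
        # -log sigmoid(score), written in a numerically stable form.
        loss += np.log1p(np.exp(-score))
    return loss / len(edges)
```

Embeddings with large dot products on observed edges drive this loss toward zero, which is exactly the "local pairwise similarity" being preserved.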
8. node2vec
• Algorithm: Random Walk with two balance parameters
[1] Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable Feature Learning for Networks. In ACM SIGKDD.
• Return parameter: p
• In-out parameter: q
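The two balance parameters bias each transition by the graph distance between the candidate next node and the previous node. A sketch of one biased step (function name is ours; the weighting follows the p/q scheme described above):

```python
import random

def node2vec_step(adj, prev, cur, p=1.0, q=1.0, rng=random):
    """One biased node2vec transition from `cur`, having arrived from `prev`.

    Return parameter p: large p discourages walking back to `prev`.
    In-out parameter q: large q keeps the walk local (BFS-like),
    small q pushes it outward (DFS-like).
    adj: dict mapping each node to a set of neighbors."""
    nbrs = list(adj[cur])
    weights = []
    for x in nbrs:
        if x == prev:               # distance 0 from prev: return step
            weights.append(1.0 / p)
        elif x in adj[prev]:        # distance 1: shared neighbor of prev
            weights.append(1.0)
        else:                       # distance 2: move outward
            weights.append(1.0 / q)
    return rng.choices(nbrs, weights=weights, k=1)[0]
```

With p = q = 1 this reduces to a uniform (DeepWalk-style) random walk.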
10. Homogeneous Network Embedding
• No type structure
• No side information
• Types are always compatible?
• …
Heterogeneous Information Network
11. DeepWalk
• Algorithm: Random Walk + Skip-gram Model
• Random walk over the connections
• Only one type of connection
• Only one type of node
13. Meta-path on HIN
[1] Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu. PathSim: Meta path-based top-k similarity search in heterogeneous information networks. Proceedings of the VLDB Endowment, 4(11):992–1003, 2011.
17. Incompatibility
• Closeness under different metrics
• The user-director and user-genre types are incompatible
• Incompatible connections cannot be close at the same time in one metric space
18. HEER model
• Comprehensive transcription of HINs in embedding learning
• Dealing with the semantic incompatibility of connections in HINs
• Leveraging edge representations and heterogeneous metrics
• A neural network model for learning both node and edge representations
22. Notations
• Only one node type can be associated with a certain end of an edge type
• Edge type 𝑟: e.g., director Fatih Akin living in Germany; movie In the Fade being produced in Germany
23. HIN Embedding Definition
• Given an HIN 𝐺 = (𝒱, ℰ; 𝜑, 𝜓), with 𝑣 ∈ 𝒱 and (𝑢, 𝑣) ∈ ℰ
• Learn a node embedding mapping 𝑓: 𝒱 → ℝ^𝑑𝒱
• Learn an edge embedding mapping 𝑔: 𝒱 × 𝒱 → ℝ^𝑑ℰ
• A node pair can be of multiple types; 𝑔(𝑢, 𝑣) encapsulates such information
25. Typed closeness
• For a node pair (𝑢, 𝑣) with edge embedding 𝑔(𝑢, 𝑣)
• 𝜇𝑟 is an edge-type-specific vector to be inferred, representing the metric coupled with edge type 𝑟
• Compatible edge types share similar 𝜇𝑟
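The typed-closeness score can be sketched as s_r(u, v) = 𝜇_r · g(u, v). As a minimal illustration we take the edge embedding g to be the Hadamard product of the node embeddings, one common choice; the normalization over sampled negatives used in training is omitted, and the function names are ours:

```python
import numpy as np

def typed_closeness(f_u, f_v, mu_r):
    """Typed closeness of node pair (u, v) under edge type r:
    s_r(u, v) = mu_r . g(u, v), with g(u, v) taken here as the
    Hadamard (element-wise) product of the node embeddings."""
    g_uv = f_u * f_v              # edge embedding g(u, v)
    return float(mu_r @ g_uv)     # metric vector mu_r scores the edge
```

Because each edge type r carries its own metric vector 𝜇_r, incompatible edge types can score the same node pair very differently, which is what resolves the single-metric-space conflict from the incompatibility slide.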
33. Dataset
• DBLP[1]: Bibliographical network
• Five types of nodes: author, paper, key term, venue, and year
• Edge types: author—paper, term—paper, year—paper, venue—paper, paper→paper (directed)
• YAGO[2]: Large scale knowledge graph
• Seven types of nodes: person, location, organization, piece of work, prize, position, and event
• 24 edge types
[1] Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. 2008. Arnetminer: extraction and mining of academic social networks. In KDD.
[2] Fabian M Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. Yago: a core of semantic knowledge. In WWW
35. Baselines
• LINE
• AspEm[1]: an earlier model; embeddings learned independently for each aspect (metric)
• Metapath2vec++
• Pretrained + logit: a logistic regression model for each edge type
[1] Yu Shi, Huan Gui, Qi Zhu, Lance Kaplan, and Jiawei Han. 2018. AspEm: Embedding Learning by Aspects in Heterogeneous Information Networks. In SDM.
42. Experiment analysis
• Modeling incompatibility benefits embedding quality
• YAGO has many more (and more sophisticated) incompatible types
• Heterogeneous metrics help improve embedding quality
• HEER is more prone to over-fitting at knock-out rate = 0.8
46. HEER model
• Comprehensive transcription of HINs in embedding learning
• Dealing with the semantic incompatibility of connections in HINs
• Leveraging edge representations and heterogeneous metrics
• A neural network model for learning both node and edge representations
• Incompatibility need designing manually
51. Future Work
• Different metrics, but not exactly represented
• Heat map: reference-with-term and term-year relationships
• Incompatibility learned from the network? Not just "drop-out"
• The edge embedding function is too weak to maintain edge information
• More experiments to verify the embeddings
• Meta-path incompatibility? (YAGO)
• …
53. Open discussion
• How to build a graph embedding model leveraging meta-path incompatibility?
• Random walk over meta-paths?
• Probability distribution? E.g., a skip-gram model