Easing Embedding Learning by Comprehensive Transcription of Heterogeneous Information Networks
Yu Shi, Qi Zhu, Fang Guo, Chao Zhang, Jiawei Han
University of Illinois at Urbana-Champaign, Urbana, IL, USA
Presenter: Zhiwei (Jim) Liu
Easing Embedding Learning by Comprehensive Transcription of Heterogeneous Information Networks, KDD’18
Road Map
• Background: Network Embedding + HIN
• Preliminary
• Proposed Model
• Experiment
• Conclusion and Future work
• Q&A
Background: Network Embedding + HIN
$G = (V, E)$
Type mapping functions:
$\phi(v): V \to T_V$
$\psi(e): E \to T_E$
$|T_V| = 1$ and $|T_E| = 1$: homogeneous
$|T_V| > 1$ or $|T_E| > 1$: heterogeneous
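The homogeneous versus heterogeneous distinction can be checked directly from the two type mappings. The sketch below is only illustrative: the dictionaries `phi` and `psi`, the node names, and the type labels are all hypothetical and stand in for the mappings defined on the slide.

```python
from typing import Dict, Hashable, Tuple

Node = Hashable
Edge = Tuple[Node, Node]


def is_heterogeneous(phi: Dict[Node, str], psi: Dict[Edge, str]) -> bool:
    """Return True when |T_V| > 1 or |T_E| > 1."""
    node_types = set(phi.values())  # T_V: the node types actually used
    edge_types = set(psi.values())  # T_E: the edge types actually used
    return len(node_types) > 1 or len(edge_types) > 1


# Hypothetical toy bibliographic network (names are illustrative only).
phi = {"a1": "author", "p1": "paper", "v1": "venue"}
psi = {("a1", "p1"): "writes", ("p1", "v1"): "published_in"}
```

Calling `is_heterogeneous(phi, psi)` on this toy network reports a heterogeneous network, since it has three node types and two edge types.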
Network Embedding
[1] W. Zachary. An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33(4):452–473, 1977.
DeepWalk
• Algorithm: Random Walk + Skip-gram Model
[1] B. Perozzi, R. Al-Rfou, and S. Skiena. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD
international conference on Knowledge discovery and data mining, pages 701–710. ACM, 2014.
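As a rough illustration of the two ingredients named on this slide, the sketch below generates a uniform random walk and then turns it into (center, context) pairs of the kind a skip-gram model is trained on. The function names, the adjacency-list format, and the window size are assumptions made for this example; the actual skip-gram training step is not shown.

```python
import random
from typing import Dict, List, Tuple


def random_walk(adj: Dict[str, List[str]], start: str, length: int,
                rng: random.Random) -> List[str]:
    """Uniform random walk that visits at most `length` nodes."""
    walk = [start]
    while len(walk) < length:
        neighbors = adj[walk[-1]]
        if not neighbors:  # dead end: stop early
            break
        walk.append(rng.choice(neighbors))
    return walk


def skipgram_pairs(walk: List[str], window: int) -> List[Tuple[str, str]]:
    """(center, context) pairs for nodes within `window` steps of each other."""
    pairs = []
    for i, center in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs
```

Nodes that co-occur within the window on many walks end up with similar embeddings, which is how the walk statistics are carried into the learned representation.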
LINE
• Algorithm: First-order + Second-order Proximity
[1] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei. LINE: Large-scale Information Network Embedding. In WWW,
2015.
• First-order proximity: local pairwise similarity
• Second-order proximity: neighborhood structure similarity
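To make the two notions concrete, here is a small sketch on a hypothetical weighted adjacency matrix `W`. First-order proximity is read off as the edge weight between two nodes; second-order proximity is approximated by the cosine similarity of their neighbor-weight rows. The cosine is an assumption chosen for illustration; LINE itself learns embeddings that preserve these proximities rather than computing them this way.

```python
import numpy as np

# Hypothetical weighted adjacency matrix for 4 nodes (symmetric, no self-loops).
W = np.array([
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])


def first_order(W: np.ndarray, u: int, v: int) -> float:
    """Local pairwise similarity: the weight of the edge (u, v), 0 if absent."""
    return float(W[u, v])


def second_order(W: np.ndarray, u: int, v: int) -> float:
    """Neighborhood similarity: cosine of the two neighbor-weight rows."""
    pu, pv = W[u], W[v]
    denom = np.linalg.norm(pu) * np.linalg.norm(pv)
    return float(pu @ pv / denom) if denom > 0 else 0.0
```

Nodes 0 and 3 are not directly linked, so their first-order proximity is zero, yet they share neighbor 2 and therefore have a non-zero second-order proximity.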
node2vec
• Algorithm: Random Walk with two balance parameters
[1] Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable Feature Learning for Networks. In ACM SIGKDD.
• Return parameter: p
• In-out parameter: q
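The role of the two parameters can be sketched as follows. Given the previous node and the current node, each candidate next node gets an unnormalized weight of 1/p if it returns to the previous node, 1 if it is also a neighbor of the previous node, and 1/q otherwise. The function names and the adjacency-set format are assumptions for this example, and edge weights are taken to be 1.

```python
import random
from typing import Dict, Set


def node2vec_weights(adj: Dict[str, Set[str]], prev: str, cur: str,
                     p: float, q: float) -> Dict[str, float]:
    """Unnormalized transition weights out of `cur`, having arrived from `prev`."""
    weights = {}
    for nxt in adj[cur]:
        if nxt == prev:            # step back to where we came from
            weights[nxt] = 1.0 / p
        elif nxt in adj[prev]:     # stay close to the previous node
            weights[nxt] = 1.0
        else:                      # move outward, away from the previous node
            weights[nxt] = 1.0 / q
    return weights


def node2vec_step(adj: Dict[str, Set[str]], prev: str, cur: str,
                  p: float, q: float, rng: random.Random) -> str:
    """Sample the next node in proportion to the weights above."""
    weights = node2vec_weights(adj, prev, cur, p, q)
    candidates = sorted(weights)
    return rng.choices(candidates, weights=[weights[c] for c in candidates], k=1)[0]
```

A large p discourages immediately revisiting the previous node, while a small q pushes the walk outward.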
Heterogeneous Information Network (HIN)
[1] Y. Dong, N. V. Chawla, and A. Swami. metapath2vec: Scalable Representation Learning for Heterogeneous Networks. In KDD, 2017.
Homogeneous Network Embedding
• No type structure
• No side information
• Types are always compatible?
• …
Heterogeneous Information Network
DeepWalk
• Algorithm: Random Walk + Skip-gram Model
• The random walk runs over the connections
• Assumes only one type of connection
• Assumes only one type of node
Meta-path on HIN
[1] Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu, “PathSim: Meta path-based top-k similarity search in heterogeneous information networks,” Proceedings of the VLDB Endowment, vol. 4, no. 11, pp. 992–1003, 2011.
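As a small worked example of a meta-path-based measure, the sketch below computes PathSim for the symmetric meta-path Author-Paper-Author. The incidence matrix `A_P` is hypothetical. The commuting matrix `M` counts path instances between authors, and PathSim normalizes the count between x and y by the counts from each node back to itself.

```python
import numpy as np

# Hypothetical author-paper incidence matrix: rows = authors, columns = papers.
A_P = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
])

# Commuting matrix of the symmetric meta-path Author-Paper-Author:
# M[x, y] counts the path instances from author x to author y.
M = A_P @ A_P.T


def pathsim(M: np.ndarray, x: int, y: int) -> float:
    """PathSim: 2 * paths(x, y) / (paths(x, x) + paths(y, y))."""
    denom = M[x, x] + M[y, y]
    return float(2.0 * M[x, y] / denom) if denom > 0 else 0.0
```

Authors 0 and 1 wrote exactly the same papers, so their PathSim is 1, while authors 0 and 2 share no paper and score 0.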
Metapath2vec
• Homogeneous Skip-Gram
• Heterogeneous Skip-Gram
[1] metapath2vec: Scalable Representation Learning for Heterogeneous Networks
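The walks that feed the skip-gram model here are guided by a meta-path: at every step the walker may only move to a neighbor whose type is the next one in the scheme. The sketch below assumes a symmetric meta-path (first and last types equal) so that it can be repeated; the function name, the adjacency-list format, and the type labels are hypothetical.

```python
import random
from typing import Dict, List


def metapath_walk(adj: Dict[str, List[str]], node_type: Dict[str, str],
                  start: str, metapath: List[str], length: int,
                  rng: random.Random) -> List[str]:
    """Random walk whose node types follow `metapath`, repeated cyclically."""
    if node_type[start] != metapath[0] or metapath[0] != metapath[-1]:
        raise ValueError("start type must match a symmetric meta-path")
    cycle = len(metapath) - 1  # number of steps in one pass over the meta-path
    walk = [start]
    step = 0
    while len(walk) < length:
        next_type = metapath[(step % cycle) + 1]
        candidates = [n for n in adj[walk[-1]] if node_type[n] == next_type]
        if not candidates:  # no neighbor of the required type: stop early
            break
        walk.append(rng.choice(candidates))
        step += 1
    return walk
```

With the meta-path author, paper, author, a walk started at an author alternates between author and paper nodes.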
Metapath2vec(++)
[1] metapath2vec: Scalable Representation Learning for Heterogeneous Networks
Incompatibility
• Similar nodes via different meta-paths (connections)
• Jaccard Coefficient
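One way to read this slide: take the set of nodes judged similar to a query node under each of two meta-paths and compare the two sets with the Jaccard coefficient. A low value suggests the two kinds of connection carry incompatible semantics. The sketch below is a minimal version of that comparison; how the similar-node sets are obtained is not specified on the slide, so the inputs here are hypothetical.

```python
from typing import Iterable


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| (0.0 when both sets are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical sets of nodes similar to one query node under two meta-paths.
similar_via_path_1 = {"m1", "m2", "m3"}
similar_via_path_2 = {"m2", "m3", "m4"}
```

Here `jaccard(similar_via_path_1, similar_via_path_2)` is 2/4, since two of the four distinct nodes appear in both sets.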
Incompatibility
• Closeness is measured under different metrics
• The user-director and user-genre edge types are incompatible
• Incompatible connections cannot be close at the same time in one metric space
HEER model
• Comprehensive transcription of HINs in embedding learning
• Deals with the semantic incompatibility of connections in HINs
• Leverages edge representations and heterogeneous metrics
• A neural network model for learning both node and edge representations
Preliminary: Notation and Definition
• Network: $G = (\mathcal{V}, \mathcal{E}; \varphi, \psi)$
• Network schema: $\tilde{G} = (\mathcal{T}, \mathcal{R})$
Notations
• Only one node type can be associated with a given end of an edge type
Edge type $r$
E.g., director Fatih Akin living in Germany; movie In the Fade being produced in Germany
HIN Embedding Definition
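• Given an HIN $G = (\mathcal{V}, \mathcal{E}; \varphi, \psi)$, with $v \in \mathcal{V}$ and $(u, v) \in \mathcal{E}$
• Learn a node embedding mapping $f: \mathcal{V} \to \mathbb{R}^{d_{\mathcal{V}}}$
• Learn an edge embedding mapping $g: \mathcal{V} \times \mathcal{V} \to \mathbb{R}^{d_{\mathcal{E}}}$
• A node pair can be linked by edges of multiple types; $g(u, v)$ encapsulates such information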
Proposed Model: HEER
Typed closeness
• Node pair $(u, v)$ with edge embedding $\mathbf{g}_{uv}$
• $\boldsymbol{\mu}_r$ is an edge-type-specific vector to be inferred; it represents the metric coupled with edge type $r$
• Compatible edge types share similar $\boldsymbol{\mu}_r$
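The bullets above can be sketched in code under two assumptions that the slide text does not spell out: the edge embedding is taken to be the Hadamard (element-wise) product of the two node embeddings, and the typed closeness is taken to be a softmax over candidate nodes of the inner product between the type-specific vector and the edge embedding. All names and the random vectors below are hypothetical placeholders for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # embedding dimension (illustrative)

# Hypothetical node embeddings f(v); in practice these are learned.
f = {name: rng.normal(size=d) for name in ["user", "director_a", "director_b"]}
# Hypothetical edge-type-specific metric vector mu_r for r = "user-director".
mu = {"user-director": rng.normal(size=d)}


def edge_embedding(u: str, v: str) -> np.ndarray:
    """g_uv; the Hadamard product is an assumption made for this sketch."""
    return f[u] * f[v]


def typed_closeness(u: str, v: str, r: str, candidates: list) -> float:
    """Softmax over `candidates` of mu_r . g_uc, evaluated at c = v (assumed form)."""
    scores = np.array([mu[r] @ edge_embedding(u, c) for c in candidates])
    scores -= scores.max()  # shift for numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return float(probs[candidates.index(v)])
```

Because each edge type has its own vector, the same pair of node embeddings can score as close under one type and distant under another, which is the point of coupling a metric with each type.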
Objective Function
• KL-divergence between the original weights and embedding similarity
• Overall objective function
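The formulas on this slide did not survive extraction, so the following is only a sketch of the idea named in the bullets: for each edge type, compare the distribution given by the original edge weights with the distribution given by the embedding-based similarity using KL divergence, then sum over edge types. The function names and the plain unweighted sum are assumptions.

```python
import numpy as np


def kl_divergence(p, q, eps: float = 1e-12) -> float:
    """KL(p || q) after normalizing both inputs to probability distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = np.clip(q / q.sum(), eps, None)  # avoid division by zero
    mask = p > 0  # terms with p = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def overall_objective(weights_by_type: dict, closeness_by_type: dict) -> float:
    """Sum of per-edge-type KL terms (unweighted sum is an assumption)."""
    return sum(kl_divergence(weights_by_type[r], closeness_by_type[r])
               for r in weights_by_type)
```

The objective is zero when the embedding similarity reproduces the observed weights for every edge type and grows as they diverge.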
Architecture
Details in the HEER Model
• Edge embedding
• Node embedding
• Type filter can distinguish the compatibility between edge types
• Negative sampling
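Negative sampling replaces a full softmax normalization with a handful of sampled negatives. The sketch below shows the standard skip-gram-style form of that loss for a single observed pair; the exact sampling distribution and the number of negatives used in HEER are not given on the slide, so the function names and inputs are illustrative.

```python
import numpy as np


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x)) = -log(1 + exp(-x))."""
    return -np.logaddexp(0.0, -np.asarray(x, dtype=float))


def negative_sampling_loss(pos_score: float, neg_scores) -> float:
    """-[log sigmoid(pos) + sum over negatives of log sigmoid(-neg)]."""
    neg_scores = np.asarray(neg_scores, dtype=float)
    return float(-(log_sigmoid(pos_score) + log_sigmoid(-neg_scores).sum()))
```

The loss falls as the observed pair scores higher and the sampled negatives score lower, so each update only touches the embeddings involved in one positive and a few negatives.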
Experiment: Reconstruction + Case study
Dataset
• DBLP [1]: bibliographical network
• Five types of nodes: author, paper, key term, venue, and year
• Edge types: author-paper, term-paper, year-paper, venue-paper, paper→paper (directed)
• YAGO [2]: large-scale knowledge graph
• Seven types of nodes: person, location, organization, piece of work, prize, position, and event
• 24 edge types
[1] Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. 2008. ArnetMiner: Extraction and Mining of Academic Social Networks. In KDD.
[2] Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. YAGO: A Core of Semantic Knowledge. In WWW.
YAGO
Baselines
• LINE
• AspEm [1]: a precursor of HEER; embeddings are learned independently for each aspect (metric)
• Metapath2vec++
• Pretrained + logit: a logistic regression model for each edge type
[1] Yu Shi, Huan Gui, Qi Zhu, Lance Kaplan, and Jiawei Han. 2018. AspEm: Embedding Learning by Aspects in Heterogeneous Information Networks. In SDM.
Edge Reconstruction
• Evaluation method: Mean Reciprocal Rank (MRR)
• Task goal: knock out a portion of the edges, then recover the nodes associated by a type-$r$ edge
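Mean Reciprocal Rank can be computed as below. Each query supplies the model scores for a list of candidates and the index of the true candidate (for example, the node at the other end of a knocked-out edge). The query format is an assumption for this example, and ties are resolved in favor of the true candidate.

```python
from typing import List, Sequence, Tuple


def reciprocal_rank(scores: Sequence[float], true_index: int) -> float:
    """1 / rank of the true candidate, where rank 1 is the highest score."""
    true_score = scores[true_index]
    rank = 1 + sum(1 for s in scores if s > true_score)
    return 1.0 / rank


def mean_reciprocal_rank(queries: List[Tuple[Sequence[float], int]]) -> float:
    """Average reciprocal rank over all (scores, true_index) queries."""
    return sum(reciprocal_rank(s, t) for s, t in queries) / len(queries)


# Hypothetical scores: the true candidate ranks 1st in the first query, 2nd in the second.
queries = [([0.9, 0.1, 0.3], 0), ([0.2, 0.8, 0.5], 2)]
```

For these two queries the reciprocal ranks are 1 and 1/2, so `mean_reciprocal_rank(queries)` is 0.75.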
Edge Reconstruction (DBLP)
Edge Reconstruction (YAGO)
Knock-out rate (DBLP)
Knock-out rate (YAGO)
Experiment analysis
• Modeling incompatibility benefits embedding quality
• YAGO has many more (and more sophisticated) incompatible types
• Heterogeneous metrics help improve embedding quality
• HEER is more prone to over-fitting at a knock-out rate of 0.8
Learned Heterogeneous Metrics (DBLP)
Learned Heterogeneous Metrics (YAGO)
Conclusion and Future work
HEER model
• Comprehensive transcription of HINs in embedding learning
• Deals with the semantic incompatibility of connections in HINs
• Leverages edge representations and heterogeneous metrics
• A neural network model for learning both node and edge representations
Future Work
• Different metrics are learned, but they are not represented exactly
• Heat map: the reference and term relationship versus the term and year relationship
• Incompatibility currently needs to be designed manually
• Could incompatibility be learned from the network, not just through “drop-out”?
• The edge embedding function is too weak to retain the edge information
• More experiments are needed to verify the embeddings
• Meta-path incompatibility? (YAGO)
• …
Q&A
Open discussion
• How can we build a graph embedding model that leverages meta-path incompatibility?
• Random Walk over meta-paths?
• Probability distribution? E.g. Skip-gram model

More Related Content

What's hot

Concurrent Inference of Topic Models and Distributed Vector Representations
Concurrent Inference of Topic Models and Distributed Vector RepresentationsConcurrent Inference of Topic Models and Distributed Vector Representations
Concurrent Inference of Topic Models and Distributed Vector RepresentationsParang Saraf
 
Admixture of Poisson MRFs: A New Topic Model with Word Dependencies
Admixture of Poisson MRFs: A New Topic Model with Word DependenciesAdmixture of Poisson MRFs: A New Topic Model with Word Dependencies
Admixture of Poisson MRFs: A New Topic Model with Word DependenciesDavid Inouye
 
Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)Bhaskar Mitra
 
Neural Models for Document Ranking
Neural Models for Document RankingNeural Models for Document Ranking
Neural Models for Document RankingBhaskar Mitra
 
Ontology Mapping
Ontology MappingOntology Mapping
Ontology Mappingbutest
 
What to read next? Challenges and Preliminary Results in Selecting Represen...
What to read next? Challenges and  Preliminary Results in Selecting  Represen...What to read next? Challenges and  Preliminary Results in Selecting  Represen...
What to read next? Challenges and Preliminary Results in Selecting Represen...MOVING Project
 
Topic modeling using big data analytics
Topic modeling using big data analyticsTopic modeling using big data analytics
Topic modeling using big data analyticsFarheen Nilofer
 
Slides: Concurrent Inference of Topic Models and Distributed Vector Represent...
Slides: Concurrent Inference of Topic Models and Distributed Vector Represent...Slides: Concurrent Inference of Topic Models and Distributed Vector Represent...
Slides: Concurrent Inference of Topic Models and Distributed Vector Represent...Parang Saraf
 
Ontology and Ontology Libraries: a Critical Study
Ontology and Ontology Libraries: a Critical StudyOntology and Ontology Libraries: a Critical Study
Ontology and Ontology Libraries: a Critical StudyDebashisnaskar
 
Mining Users Rare Sequential Topic Patterns from Tweets based on Topic Extrac...
Mining Users Rare Sequential Topic Patterns from Tweets based on Topic Extrac...Mining Users Rare Sequential Topic Patterns from Tweets based on Topic Extrac...
Mining Users Rare Sequential Topic Patterns from Tweets based on Topic Extrac...IRJET Journal
 
Translating Ontologies in Real-World Settings
Translating Ontologies in Real-World SettingsTranslating Ontologies in Real-World Settings
Translating Ontologies in Real-World SettingsMauro Dragoni
 
Auto Mapping Texts for Human-Machine Analysis and Sensemaking
Auto Mapping Texts for Human-Machine Analysis and SensemakingAuto Mapping Texts for Human-Machine Analysis and Sensemaking
Auto Mapping Texts for Human-Machine Analysis and SensemakingShalin Hai-Jew
 
Study of Social Network Sites in crises
Study of Social Network Sites in crisesStudy of Social Network Sites in crises
Study of Social Network Sites in crisesPablo Acuña
 
Learning Relations from Social Tagging Data
Learning Relations from Social Tagging DataLearning Relations from Social Tagging Data
Learning Relations from Social Tagging DataHang Dong
 
Data Tactics Analytics Brown Bag (Aug 22, 2013)
Data Tactics Analytics Brown Bag (Aug 22, 2013)Data Tactics Analytics Brown Bag (Aug 22, 2013)
Data Tactics Analytics Brown Bag (Aug 22, 2013)Rich Heimann
 
Dynamic Topic Modeling via Non-negative Matrix Factorization (Dr. Derek Greene)
Dynamic Topic Modeling via Non-negative Matrix Factorization (Dr. Derek Greene)Dynamic Topic Modeling via Non-negative Matrix Factorization (Dr. Derek Greene)
Dynamic Topic Modeling via Non-negative Matrix Factorization (Dr. Derek Greene)Sebastian Ruder
 
Linked science presentation 25
Linked science presentation 25Linked science presentation 25
Linked science presentation 25Francesco Osborne
 
Digital Humanities and Internet Research: shared methods and perspectives
Digital Humanities and Internet Research: shared methods and perspectivesDigital Humanities and Internet Research: shared methods and perspectives
Digital Humanities and Internet Research: shared methods and perspectivesCornelius Puschmann
 
LDA Beginner's Tutorial
LDA Beginner's TutorialLDA Beginner's Tutorial
LDA Beginner's TutorialWayne Lee
 

What's hot (20)

Concurrent Inference of Topic Models and Distributed Vector Representations
Concurrent Inference of Topic Models and Distributed Vector RepresentationsConcurrent Inference of Topic Models and Distributed Vector Representations
Concurrent Inference of Topic Models and Distributed Vector Representations
 
Admixture of Poisson MRFs: A New Topic Model with Word Dependencies
Admixture of Poisson MRFs: A New Topic Model with Word DependenciesAdmixture of Poisson MRFs: A New Topic Model with Word Dependencies
Admixture of Poisson MRFs: A New Topic Model with Word Dependencies
 
Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)
 
Neural Models for Document Ranking
Neural Models for Document RankingNeural Models for Document Ranking
Neural Models for Document Ranking
 
Ontology Mapping
Ontology MappingOntology Mapping
Ontology Mapping
 
What to read next? Challenges and Preliminary Results in Selecting Represen...
What to read next? Challenges and  Preliminary Results in Selecting  Represen...What to read next? Challenges and  Preliminary Results in Selecting  Represen...
What to read next? Challenges and Preliminary Results in Selecting Represen...
 
Topic modeling using big data analytics
Topic modeling using big data analyticsTopic modeling using big data analytics
Topic modeling using big data analytics
 
Slides: Concurrent Inference of Topic Models and Distributed Vector Represent...
Slides: Concurrent Inference of Topic Models and Distributed Vector Represent...Slides: Concurrent Inference of Topic Models and Distributed Vector Represent...
Slides: Concurrent Inference of Topic Models and Distributed Vector Represent...
 
Ontology and Ontology Libraries: a Critical Study
Ontology and Ontology Libraries: a Critical StudyOntology and Ontology Libraries: a Critical Study
Ontology and Ontology Libraries: a Critical Study
 
Mining Users Rare Sequential Topic Patterns from Tweets based on Topic Extrac...
Mining Users Rare Sequential Topic Patterns from Tweets based on Topic Extrac...Mining Users Rare Sequential Topic Patterns from Tweets based on Topic Extrac...
Mining Users Rare Sequential Topic Patterns from Tweets based on Topic Extrac...
 
Translating Ontologies in Real-World Settings
Translating Ontologies in Real-World SettingsTranslating Ontologies in Real-World Settings
Translating Ontologies in Real-World Settings
 
Linked Data Selectors
Linked Data SelectorsLinked Data Selectors
Linked Data Selectors
 
Auto Mapping Texts for Human-Machine Analysis and Sensemaking
Auto Mapping Texts for Human-Machine Analysis and SensemakingAuto Mapping Texts for Human-Machine Analysis and Sensemaking
Auto Mapping Texts for Human-Machine Analysis and Sensemaking
 
Study of Social Network Sites in crises
Study of Social Network Sites in crisesStudy of Social Network Sites in crises
Study of Social Network Sites in crises
 
Learning Relations from Social Tagging Data
Learning Relations from Social Tagging DataLearning Relations from Social Tagging Data
Learning Relations from Social Tagging Data
 
Data Tactics Analytics Brown Bag (Aug 22, 2013)
Data Tactics Analytics Brown Bag (Aug 22, 2013)Data Tactics Analytics Brown Bag (Aug 22, 2013)
Data Tactics Analytics Brown Bag (Aug 22, 2013)
 
Dynamic Topic Modeling via Non-negative Matrix Factorization (Dr. Derek Greene)
Dynamic Topic Modeling via Non-negative Matrix Factorization (Dr. Derek Greene)Dynamic Topic Modeling via Non-negative Matrix Factorization (Dr. Derek Greene)
Dynamic Topic Modeling via Non-negative Matrix Factorization (Dr. Derek Greene)
 
Linked science presentation 25
Linked science presentation 25Linked science presentation 25
Linked science presentation 25
 
Digital Humanities and Internet Research: shared methods and perspectives
Digital Humanities and Internet Research: shared methods and perspectivesDigital Humanities and Internet Research: shared methods and perspectives
Digital Humanities and Internet Research: shared methods and perspectives
 
LDA Beginner's Tutorial
LDA Beginner's TutorialLDA Beginner's Tutorial
LDA Beginner's Tutorial
 

Similar to Easing embedding learning by comprehensive transcription of heterogeneous information networks

NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation le...
NS-CUK Seminar: H.B.Kim,  Review on "metapath2vec: Scalable representation le...NS-CUK Seminar: H.B.Kim,  Review on "metapath2vec: Scalable representation le...
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation le...ssuser4b1f48
 
WIDS 2021--An Introduction to Network Science
WIDS 2021--An Introduction to Network ScienceWIDS 2021--An Introduction to Network Science
WIDS 2021--An Introduction to Network ScienceColleen Farrelly
 
network mining and representation learning
network mining and representation learningnetwork mining and representation learning
network mining and representation learningsun peiyuan
 
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation l...
NS-CUK Seminar:  H.B.Kim,  Review on "metapath2vec: Scalable representation l...NS-CUK Seminar:  H.B.Kim,  Review on "metapath2vec: Scalable representation l...
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation l...ssuser4b1f48
 
ESWC 2011 BLOOMS+
ESWC 2011 BLOOMS+ ESWC 2011 BLOOMS+
ESWC 2011 BLOOMS+ Prateek Jain
 
Building AI Applications using Knowledge Graphs
Building AI Applications using Knowledge GraphsBuilding AI Applications using Knowledge Graphs
Building AI Applications using Knowledge GraphsAndre Freitas
 
Insights from Knowledge Graphs
Insights from Knowledge GraphsInsights from Knowledge Graphs
Insights from Knowledge GraphsAnirudh Prabhu
 
00 Automatic Mental Health Classification in Online Settings and Language Emb...
00 Automatic Mental Health Classification in Online Settings and Language Emb...00 Automatic Mental Health Classification in Online Settings and Language Emb...
00 Automatic Mental Health Classification in Online Settings and Language Emb...Duke Network Analysis Center
 
Introduction to Topological Data Analysis
Introduction to Topological Data AnalysisIntroduction to Topological Data Analysis
Introduction to Topological Data AnalysisMason Porter
 
Big Data Palooza Talk: Aspects of Semantic Processing
Big Data Palooza Talk: Aspects of Semantic ProcessingBig Data Palooza Talk: Aspects of Semantic Processing
Big Data Palooza Talk: Aspects of Semantic ProcessingNa'im Tyson
 
Deep Neural Methods for Retrieval
Deep Neural Methods for RetrievalDeep Neural Methods for Retrieval
Deep Neural Methods for RetrievalBhaskar Mitra
 
HIEDS: A Generic and Efficient Approach to Hierarchical Dataset Summarization
HIEDS: A Generic and Efficient Approach to Hierarchical Dataset SummarizationHIEDS: A Generic and Efficient Approach to Hierarchical Dataset Summarization
HIEDS: A Generic and Efficient Approach to Hierarchical Dataset SummarizationGong Cheng
 
An Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4jAn Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4jDebanjan Mahata
 
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...Armin Haller
 
On the many graphs of the Web and the interest of adding their missing links.
On the many graphs of the Web and the interest of adding their missing links. On the many graphs of the Web and the interest of adding their missing links.
On the many graphs of the Web and the interest of adding their missing links. Fabien Gandon
 
Open science, open-source, and open data: Collaboration as an emergent property?
Open science, open-source, and open data: Collaboration as an emergent property?Open science, open-source, and open data: Collaboration as an emergent property?
Open science, open-source, and open data: Collaboration as an emergent property?Hilmar Lapp
 
Social Networks and Computer Science
Social Networks and Computer ScienceSocial Networks and Computer Science
Social Networks and Computer Sciencedragonmeteor
 

Similar to Easing embedding learning by comprehensive transcription of heterogeneous information networks (20)

NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation le...
NS-CUK Seminar: H.B.Kim,  Review on "metapath2vec: Scalable representation le...NS-CUK Seminar: H.B.Kim,  Review on "metapath2vec: Scalable representation le...
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation le...
 
WIDS 2021--An Introduction to Network Science
WIDS 2021--An Introduction to Network ScienceWIDS 2021--An Introduction to Network Science
WIDS 2021--An Introduction to Network Science
 
Sybrandt Thesis Proposal Presentation
Sybrandt Thesis Proposal PresentationSybrandt Thesis Proposal Presentation
Sybrandt Thesis Proposal Presentation
 
network mining and representation learning
network mining and representation learningnetwork mining and representation learning
network mining and representation learning
 
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation l...
NS-CUK Seminar:  H.B.Kim,  Review on "metapath2vec: Scalable representation l...NS-CUK Seminar:  H.B.Kim,  Review on "metapath2vec: Scalable representation l...
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation l...
 
ESWC 2011 BLOOMS+
ESWC 2011 BLOOMS+ ESWC 2011 BLOOMS+
ESWC 2011 BLOOMS+
 
Building AI Applications using Knowledge Graphs
Building AI Applications using Knowledge GraphsBuilding AI Applications using Knowledge Graphs
Building AI Applications using Knowledge Graphs
 
Insights from Knowledge Graphs
Insights from Knowledge GraphsInsights from Knowledge Graphs
Insights from Knowledge Graphs
 
00 Automatic Mental Health Classification in Online Settings and Language Emb...
00 Automatic Mental Health Classification in Online Settings and Language Emb...00 Automatic Mental Health Classification in Online Settings and Language Emb...
00 Automatic Mental Health Classification in Online Settings and Language Emb...
 
Introduction to Topological Data Analysis
Introduction to Topological Data AnalysisIntroduction to Topological Data Analysis
Introduction to Topological Data Analysis
 
Big Data Palooza Talk: Aspects of Semantic Processing
Big Data Palooza Talk: Aspects of Semantic ProcessingBig Data Palooza Talk: Aspects of Semantic Processing
Big Data Palooza Talk: Aspects of Semantic Processing
 
Deep Neural Methods for Retrieval
Deep Neural Methods for RetrievalDeep Neural Methods for Retrieval
Deep Neural Methods for Retrieval
 
HIEDS: A Generic and Efficient Approach to Hierarchical Dataset Summarization
HIEDS: A Generic and Efficient Approach to Hierarchical Dataset SummarizationHIEDS: A Generic and Efficient Approach to Hierarchical Dataset Summarization
HIEDS: A Generic and Efficient Approach to Hierarchical Dataset Summarization
 
An Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4jAn Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4j
 
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
 
On the many graphs of the Web and the interest of adding their missing links.
On the many graphs of the Web and the interest of adding their missing links. On the many graphs of the Web and the interest of adding their missing links.
On the many graphs of the Web and the interest of adding their missing links.
 
Open science, open-source, and open data: Collaboration as an emergent property?
Open science, open-source, and open data: Collaboration as an emergent property?Open science, open-source, and open data: Collaboration as an emergent property?
Open science, open-source, and open data: Collaboration as an emergent property?
 
Social Networks and Computer Science
Social Networks and Computer ScienceSocial Networks and Computer Science
Social Networks and Computer Science
 
Keynote at AImWD
Keynote at AImWDKeynote at AImWD
Keynote at AImWD
 
Tianpei research summary
Tianpei research summaryTianpei research summary
Tianpei research summary
 

Recently uploaded

VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...ThinkInnovation
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxFurkanTasci3
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 

Recently uploaded (20)

VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 

Easing embedding learning by comprehensive transcription of heterogeneous information networks

  • 1. Easing Embedding Learning by Comprehensive Transcription of Heterogeneous Information Networks • Yu Shi, Qi Zhu, Fang Guo, Chao Zhang, Jiawei Han, University of Illinois at Urbana-Champaign, Urbana, IL, USA • Presenter: Zhiwei (Jim) Liu • Paper: Easing Embedding Learning by Comprehensive Transcription of Heterogeneous Information Networks, KDD’18
  • 2. Road Map • Background: Network Embedding + HIN • Preliminary • Proposed Model • Experiment • Conclusion and Future work • Q&A
  • 4. Network: $G = (V, E)$ • Type mapping functions: $\phi(v): V \to T_V$ and $\psi(e): E \to T_E$ • Homogeneous: $|T_V| = 1$ and $|T_E| = 1$ • Heterogeneous: $|T_V| > 1$ or $|T_E| > 1$
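
The homogeneous/heterogeneous distinction on this slide can be checked mechanically from the two type mappings. The following is a minimal sketch; the dictionaries `phi` and `psi` and the toy node and edge names are illustrative assumptions, not data from the talk.

```python
def is_heterogeneous(node_type, edge_type):
    """Return True if the typed network has more than one node type or edge type."""
    node_types = set(node_type.values())  # T_V: image of the node type mapping
    edge_types = set(edge_type.values())  # T_E: image of the edge type mapping
    return len(node_types) > 1 or len(edge_types) > 1


# Hypothetical toy bibliographic network: one author, two papers.
phi = {"a1": "author", "p1": "paper", "p2": "paper"}
psi = {("a1", "p1"): "writes", ("p1", "p2"): "cites"}

print(is_heterogeneous(phi, psi))
```

A network whose nodes and edges each map to a single type makes the function return `False`, which is the homogeneous case.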
  • 5. Network Embedding [1] W. Zachary. An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33(4):452–473, 1977.
  • 6. DeepWalk • Algorithm: Random Walk + Skip-gram Model [1] B. Perozzi, R. Al-Rfou, and S. Skiena. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 701–710. ACM, 2014.
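
The "Random Walk + Skip-gram" recipe can be sketched in two steps: sample uniform random walks, then turn each walk into (center, context) pairs for a skip-gram model. This sketch only covers the sampling side; the toy graph, the window size, and the function names are assumptions for illustration, and the skip-gram training itself is omitted.

```python
import random


def random_walk(adj, start, length, rng):
    """Uniform random walk of at most `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbors = adj[walk[-1]]
        if not neighbors:  # dead end: stop early
            break
        walk.append(rng.choice(neighbors))
    return walk


def skipgram_pairs(walk, window):
    """(center, context) pairs for every position within `window` of the center."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


# Hypothetical toy graph as adjacency lists.
adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
rng = random.Random(0)
walk = random_walk(adj, "a", 6, rng)
pairs = skipgram_pairs(walk, window=2)
print(walk, len(pairs))
```

The walks treat every node and edge identically, which is why this procedure assumes a single node type and a single connection type.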
  • 7. LINE • Algorithm: First-order + Second-order Proximity [1] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei. LINE: Large-scale Information Network Embedding. In WWW, 2015. • First-order Proximity: Local Pairwise Similarity • Second-order Proximity: Neighborhood structure similarity
  • 8. node2vec • Algorithm: Random Walk with two balance parameters [1] Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable Feature Learning for Networks. In ACM SIGKDD. • Return parameter: p • In-out parameter: q
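
The two balance parameters bias the next step of a second-order walk: a step back to the previous node is weighted by 1/p, a step to a common neighbor of the previous node by 1, and a step further away by 1/q. The sketch below computes these transition probabilities for one step; the toy graph and the function name are illustrative assumptions.

```python
def node2vec_probs(adj, prev, cur, p, q):
    """Normalized probabilities of moving from `cur` to each neighbor, given the walk came from `prev`."""
    weights = {}
    for nxt in adj[cur]:
        if nxt == prev:
            weights[nxt] = 1.0 / p  # return to the previous node
        elif nxt in adj[prev]:
            weights[nxt] = 1.0      # stay at distance 1 from the previous node
        else:
            weights[nxt] = 1.0 / q  # move outward, to distance 2
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}


# Hypothetical toy graph as adjacency sets.
adj = {
    "t": {"v", "x"},
    "v": {"t", "x", "y"},
    "x": {"t", "v"},
    "y": {"v"},
}
probs = node2vec_probs(adj, prev="t", cur="v", p=2.0, q=0.5)
print(probs)
```

With p = 2 and q = 0.5 the walk is discouraged from returning to `t` and encouraged to move outward to `y`.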
  • 9. Heterogeneous Information Network (HIN) [1] metapath2vec: Scalable Representation Learning for Heterogeneous Networks
  • 10. Homogeneous Network Embedding • No type structure • No side information • Types are always compatible? • … Heterogeneous Information Network
  • 11. DeepWalk • Algorithm: Random Walk + Skip-gram Model [1] B. Perozzi, R. Al-Rfou, and S. Skiena. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 701–710. ACM, 2014. • Random Walk over the connection • Only one type of connection • Only one type of node
  • 12. Heterogeneous Information Network (HIN) [1] metapath2vec: Scalable Representation Learning for Heterogeneous Networks
  • 13. Meta-path on HIN [1] Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu, “PathSim: Meta path-based top-k similarity search in heterogeneous information networks,” Proceedings of the VLDB Endowment, vol. 4, no. 11, pp. 992–1003, 2011.
  • 14. Metapath2vec • Homogeneous Skip-Gram • Heterogeneous Skip-Gram [1] metapath2vec: Scalable Representation Learning for Heterogeneous Networks
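
metapath2vec generates its training sequences with random walks that are constrained to follow a meta-path. A minimal sketch of such a walk is given below, assuming a symmetric meta-path (first and last types equal) so the pattern can repeat; the toy network, the type labels, and the function name are hypothetical.

```python
import random


def metapath_walk(adj, node_type, start, metapath, length, rng):
    """Random walk whose node types follow `metapath`, repeated cyclically."""
    if node_type[start] != metapath[0] or metapath[0] != metapath[-1]:
        raise ValueError("start must match a symmetric meta-path")
    cycle = metapath[:-1]  # drop the repeated end type so the pattern can loop
    walk = [start]
    while len(walk) < length:
        wanted = cycle[len(walk) % len(cycle)]
        candidates = [n for n in adj[walk[-1]] if node_type[n] == wanted]
        if not candidates:  # no neighbor of the required type: stop early
            break
        walk.append(rng.choice(candidates))
    return walk


# Hypothetical toy network: two authors, two papers, one venue.
node_type = {"a1": "author", "a2": "author", "p1": "paper", "p2": "paper", "v1": "venue"}
adj = {
    "a1": ["p1"],
    "a2": ["p2"],
    "p1": ["a1", "v1"],
    "p2": ["a2", "v1"],
    "v1": ["p1", "p2"],
}
apvpa = ["author", "paper", "venue", "paper", "author"]
walk = metapath_walk(adj, node_type, "a1", apvpa, 9, random.Random(0))
print(walk)
```

Every walk produced this way reads author, paper, venue, paper, author, and so on, so the context of a node only contains the types the meta-path allows.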
  • 15. Metapath2vec(++) [1] metapath2vec: Scalable Representation Learning for Heterogeneous Networks
  • 16. Incompatibility • Similar nodes via different meta-paths (connections) • Jaccard Coefficient
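
One way to read this slide is as a comparison of the node sets that two different meta-paths consider similar to the same node, scored with the Jaccard coefficient. The sketch below assumes that reading; the user sets are made-up illustrative data, not results from the talk.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical users found similar to one target user via two different meta-paths.
similar_via_director = {"u1", "u2", "u3"}
similar_via_genre = {"u2", "u3", "u4"}
overlap = jaccard(similar_via_director, similar_via_genre)
print(overlap)
```

A low coefficient indicates that the two meta-paths induce different notions of similarity, which is the kind of incompatibility the slide points at.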
  • 17. Incompatibility • Closeness under different metrics • The user-director and user-genre edge types are incompatible • Incompatible connections cannot all be close at the same time in one metric space
  • 18. HEER model • Comprehensive transcription of HINs in embedding learning • Deals with the semantic incompatibility of connections in HINs • Leverages edge representations and heterogeneous metrics • A neural network model learns both node and edge representations
  • 20. Preliminary • Network: $G = (V, E)$ • Type mapping functions: $\phi(v): V \to T_V$ and $\psi(e): E \to T_E$ • Homogeneous: $|T_V| = 1$ and $|T_E| = 1$ • Heterogeneous: $|T_V| > 1$ or $|T_E| > 1$
  • 21. Preliminary • Network: $G = (\mathcal{V}, \mathcal{E}; \varphi, \psi)$ • Network schema: $\tilde{G} = (\mathcal{T}, \mathcal{R})$
  • 22. Notations • Edge type $r$: only one node type can be associated with a given end of an edge type • E.g., director Fatih Akin living in Germany and the movie In the Fade being produced in Germany
  • 23. HIN Embedding Definition • Given an HIN $G = (\mathcal{V}, \mathcal{E}; \varphi, \psi)$ with $v \in \mathcal{V}$ and $(u, v) \in \mathcal{E}$ • Learn a node embedding mapping $f(v): \mathcal{V} \to \mathbb{R}^{d_{\mathcal{V}}}$ • Learn an edge embedding mapping $g(u, v): \mathcal{V} \times \mathcal{V} \to \mathbb{R}^{d_{\mathcal{E}}}$ • A node pair can be linked by edges of multiple types; $g(u, v)$ encapsulates this information
  • 25. Typed closeness • Node pair $(u, v)$ with edge embedding $g_{uv}$ • $\mu_r$ is an edge-type-specific vector to be inferred, representing the metric coupled with type $r$ • Compatible edge types share similar $\mu_r$
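
The slide's formula did not survive in the transcript, so the following is only a sketch of how a type-specific vector can act as a metric on edge embeddings. It assumes two things that are not stated here: the edge embedding is the element-wise (Hadamard) product of the two node embeddings, and closeness is a softmax of the inner product between the type vector and the edge embedding over a set of candidate pairs. All names and numbers are hypothetical.

```python
import math


def hadamard(fu, fv):
    """Assumed edge embedding: element-wise product of two node embeddings."""
    return [a * b for a, b in zip(fu, fv)]


def typed_closeness(mu_r, f, u, v, candidates):
    """Softmax of mu_r . g_xy over `candidates`, evaluated at the pair (u, v)."""
    def score(x, y):
        g = hadamard(f[x], f[y])
        return sum(m * gi for m, gi in zip(mu_r, g))

    scores = {pair: score(*pair) for pair in candidates}
    top = max(scores.values())  # subtract the max for numerical stability
    z = sum(math.exp(s - top) for s in scores.values())
    return math.exp(scores[(u, v)] - top) / z


# Hypothetical 2-d node embeddings and two type-specific metric vectors.
f = {"user": [1.0, 1.0], "director": [1.0, 0.0], "genre": [0.0, 1.0]}
mu_director = [2.0, 0.0]  # emphasizes the first dimension
mu_genre = [0.0, 2.0]     # emphasizes the second dimension
candidates = [("user", "director"), ("user", "genre")]

c_dir = typed_closeness(mu_director, f, "user", "director", candidates)
c_gen = typed_closeness(mu_genre, f, "user", "genre", candidates)
print(c_dir, c_gen)
```

Because each type vector weights different dimensions, the same user can be close to a director under one metric and close to a genre under another without the two requirements competing in a single space.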
  • 26. Objective Function • KL-divergence between the original weights and embedding similarity • Overall objective function
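
As a rough illustration of this objective, the sketch below normalizes the edge weights of one type into a distribution, measures its KL-divergence from a model closeness distribution, and sums that quantity over edge types. The normalization, the plain sum over types, and the toy numbers are assumptions; the exact form on the slide is not in the transcript.

```python
import math


def kl_divergence(weights, closeness):
    """KL(P || Q) where P is the normalized edge weights and Q the model closeness."""
    total = sum(weights.values())
    loss = 0.0
    for pair, w in weights.items():
        if w > 0:
            p = w / total
            loss += p * math.log(p / closeness[pair])
    return loss


def overall_objective(weights_by_type, closeness_by_type):
    """Assumed overall objective: sum of per-type KL terms."""
    return sum(
        kl_divergence(weights_by_type[r], closeness_by_type[r])
        for r in weights_by_type
    )


# Hypothetical weights for one edge type and two candidate closeness distributions.
weights = {("a", "b"): 3.0, ("a", "c"): 1.0}
perfect = {("a", "b"): 0.75, ("a", "c"): 0.25}
uniform = {("a", "b"): 0.5, ("a", "c"): 0.5}

print(kl_divergence(weights, perfect), kl_divergence(weights, uniform))
```

The divergence is zero when the closeness reproduces the normalized weights and grows as the two distributions drift apart.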
  • 28. Details in the HEER Model • Edge embedding • Node embedding
  • 29. Details in the HEER Model • Type filter can distinguish the compatibility between edge types
  • 31. Details in the HEER Model • Type filter can distinguish the compatibility between edge types • Negative sampling
  • 33. Dataset • DBLP [1]: Bibliographical network • Five types of nodes: author, paper, key term, venue, and year • Edge types: author-paper, term-paper, year-paper, venue-paper, paper→paper (directed) • YAGO [2]: Large-scale knowledge graph • Seven types of nodes: person, location, organization, piece of work, prize, position, and event • 24 edge types [1] Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. 2008. ArnetMiner: extraction and mining of academic social networks. In KDD. [2] Fabian M Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. Yago: a core of semantic knowledge. In WWW.
  • 34. YAGO
  • 35. Baselines • LINE • AspEm: Earlier version of HEER; embeddings are learned independently for each aspect (metric) • Metapath2vec++ • Pretrained + logit: logistic regression model for each edge type [1] Yu Shi, Huan Gui, Qi Zhu, Lance Kaplan, and Jiawei Han. 2018. AspEm: Embedding Learning by Aspects in Heterogeneous Information Networks. In SDM.
  • 36. Edge Reconstruction • Evaluation method: Mean Reciprocal Rank (MRR) • Task Goal: Knock-out + Associated by type-𝑟 edge
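
Mean Reciprocal Rank averages 1/rank of the correct answer over all queries. In an edge reconstruction setting, a query would be a node whose knocked-out neighbor has to be ranked among candidates. A minimal sketch, with hypothetical rankings and a function name chosen for illustration:

```python
def mean_reciprocal_rank(ranked_lists, truths):
    """Average of 1/rank of the true item per query; a missing item contributes 0."""
    total = 0.0
    for ranked, truth in zip(ranked_lists, truths):
        if truth in ranked:
            total += 1.0 / (ranked.index(truth) + 1)
    return total / len(truths)


# Hypothetical candidate rankings for two queries whose true answer is "a".
rankings = [["a", "b", "c"], ["b", "a", "c"]]
truths = ["a", "a"]
mrr = mean_reciprocal_rank(rankings, truths)
print(mrr)
```

Here the true item is ranked first in one query and second in the other, so the score is the mean of 1 and 1/2.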
  • 42. Experiment analysis • Modeling incompatibility benefits embedding quality • YAGO has many more (and more sophisticated) incompatible types • Heterogeneous metrics help improve embedding quality • HEER is more prone to over-fitting at knock-out rate = 0.8
  • 46. HEER model • Comprehensive transcription of HINs in embedding learning • Deals with the semantic incompatibility of connections in HINs • Leverages edge representations and heterogeneous metrics • A neural network model learns both node and edge representations
  • 47. Future Work • Different metrics, but not exactly represented • Heat map: reference with term, and the term-year relationship
  • 49. Future Work • Different metrics, but not exactly represented • Heat map: reference with term, and the term-year relationship • Incompatibility needs to be designed manually
  • 51. Future Work • Different metrics, but not exactly represented • Heat map: reference with term, and the term-year relationship • Incompatibility learned from the network? Not just “drop-out” • Edge embedding function is too weak to maintain the edge information • More experiments to verify the embedding • Meta-path incompatibility? (YAGO) • …
  • 52. Q&A
  • 53. Open discussion • How to build a graph embedding model that leverages meta-path incompatibility? • Random walk over meta-paths? • Probability distribution? E.g., the skip-gram model