CIKM 2018

Are Meta-Paths Necessary?
Revisiting Heterogeneous Graph Embeddings
Rana Hussein, Dingqi Yang and Philippe Cudré-Mauroux
eXascale Infolab, University of Fribourg, Switzerland
27th ACM International Conference on Information and Knowledge Management (CIKM 2018)
Graph Embeddings
• Represent nodes in a graph using a vector space.
• Learn a latent space representation of the graph structure and node interactions.
• Community detection
• Friendship recommendation
• User interest prediction
Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining. ACM, 701–710.
Graph Embeddings Techniques
• A typical approach combines random walks with a SkipGram-like model.
Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international
conference on Knowledge discovery and data mining. ACM, 701–710.
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).
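The random-walk half of this approach can be sketched in Python as follows. This is a minimal illustration of uniform random walks in the DeepWalk style, not the authors' implementation; the toy graph, the function names, and the parameters `walks_per_node` and `walk_length` are assumptions made for the example.

```python
import random


def random_walk(adj, start, walk_length, rng):
    """Uniform random walk containing at most walk_length nodes."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = adj[walk[-1]]
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk


def generate_walks(adj, walks_per_node, walk_length, seed=0):
    """Start walks_per_node walks from every node of the graph."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for node in adj:
            walks.append(random_walk(adj, node, walk_length, rng))
    return walks


# Hypothetical toy graph stored as adjacency lists.
toy_adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
walks = generate_walks(toy_adj, walks_per_node=2, walk_length=5)
```

The resulting walks play the role of sentences: they are the input on which a SkipGram-like model is trained.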
Heterogeneous Graphs
• Heterogeneous Graphs contain multiple node types:
• Homogeneous edges: linking nodes from the same domain
• Heterogeneous edges: linking nodes across different domains
• The proximity among nodes is based on semantics.
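The distinction between the two kinds of edges can be made concrete with a small Python sketch. The node names, the domains (author, paper, venue), and the helper `edge_kind` are hypothetical and only serve to illustrate the definitions above.

```python
from collections import Counter

# Hypothetical toy graph: every node belongs to exactly one domain (node type).
node_domain = {
    "a1": "author",
    "a2": "author",
    "p1": "paper",
    "p2": "paper",
    "v1": "venue",
}
edges = [("a1", "a2"), ("a1", "p1"), ("a2", "p1"), ("p1", "p2"), ("p2", "v1")]


def edge_kind(u, v):
    """Classify an edge by comparing the domains of its two endpoints."""
    if node_domain[u] == node_domain[v]:
        return "homogeneous"
    return "heterogeneous"


# Count how many edges of each kind the toy graph contains.
kind_counts = Counter(edge_kind(u, v) for u, v in edges)
```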
Heterogeneous Graph embeddings
• A meta-path is a sequence of node types encoding key composite relations among the involved
node types.
• Meta-paths are used to guide random walks to redefine the neighborhood of a node.
• Metapath2vec (KDD 2017)
Yuxiao Dong, Nitesh V Chawla, and Ananthram Swami. 2017. metapath2vec: Scalable representation learning for heterogeneous networks. In Proceedings of the 23rd ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 135–144.
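A meta-path-guided walk can be sketched in Python as below. This is a simplified illustration, assuming a symmetric meta-path such as author-paper-author that is repeated cyclically, with the next node drawn uniformly among neighbors of the required type. The toy graph and all names are hypothetical; this is not the metapath2vec reference code.

```python
import random


def metapath_walk(adj, node_type, start, metapath, walk_length, rng):
    """Random walk that only follows the node types listed in metapath."""
    if metapath[0] != metapath[-1]:
        raise ValueError("meta-path must be symmetric to be repeated")
    if node_type[start] != metapath[0]:
        raise ValueError("start node must match the first meta-path type")
    cycle = metapath[:-1]  # e.g. ["author", "paper"], repeated cyclically
    walk = [start]
    while len(walk) < walk_length:
        wanted = cycle[len(walk) % len(cycle)]
        candidates = [n for n in adj[walk[-1]] if node_type[n] == wanted]
        if not candidates:  # no neighbor of the required type: stop early
            break
        walk.append(rng.choice(candidates))
    return walk


# Hypothetical author-paper toy graph.
node_type = {"a1": "author", "a2": "author", "p1": "paper", "p2": "paper"}
adj = {"a1": ["p1", "p2"], "a2": ["p1"], "p1": ["a1", "a2"], "p2": ["a1"]}
walk = metapath_walk(adj, node_type, "a1", ["author", "paper", "author"],
                     walk_length=5, rng=random.Random(0))
```

The walk can only reach neighborhoods that the chosen meta-path allows, which is why the choice of meta-path matters.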
Challenges
• How to select meta-paths?
• Meta-path selection is graph-specific and depends heavily on prior knowledge from domain experts.
• Strategies to combine a set of meta-paths can be complex and computationally expensive.
• The choice of meta-paths strongly affects the quality of the learnt node embeddings for a specific
task.
Are meta-paths necessary?
JUST - Heterogeneous Graph Embedding Technique
• We propose a two-step graph embedding technique for heterogeneous information networks (HINs):
• Step 1: Random walks with JUmp and STay strategies, which probabilistically
control the random walk.
• Step 2: Learn node embeddings with the SkipGram model.
Random Walk with JUmp and STay strategies (JUST)
1- Jump or stay?
• Objective: Balance the number of heterogeneous and
homogeneous edges traversed during random walks.
• α ∈ [0, 1] is an initial stay probability.
• l refers to the number of nodes consecutively visited in the same domain.
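A minimal Python sketch of the jump-or-stay decision follows. It assumes that the stay probability decays exponentially, as α raised to the number of consecutively visited same-domain nodes, and it adds two boundary cases for nodes that lack homogeneous or heterogeneous neighbors. The slide does not spell out either point, so both are assumptions, and the function and parameter names are illustrative.

```python
import random


def stay_probability(alpha, run_length, has_homogeneous, has_heterogeneous):
    """Probability of staying in the current domain (assumed form)."""
    if not has_homogeneous:
        return 0.0  # nothing to stay on: the walk has to jump
    if not has_heterogeneous:
        return 1.0  # nothing to jump to: the walk has to stay
    # Assumed exponential decay: longer same-domain runs make staying less likely.
    return alpha ** run_length


def decide_stay(alpha, run_length, has_homogeneous, has_heterogeneous, rng):
    """Draw the stay (True) or jump (False) decision."""
    p_stay = stay_probability(alpha, run_length, has_homogeneous, has_heterogeneous)
    return rng.random() < p_stay


rng = random.Random(0)
decisions = [decide_stay(0.5, run_length, True, True, rng) for run_length in (1, 2, 3)]
```

Under this assumed form, a long run inside one domain pushes the walk towards a heterogeneous edge, which is one way to balance the two kinds of edges.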
Random Walk with JUmp and STay strategies (JUST)
2- Where to Jump?
• Objective: Control the randomness in choosing a target domain.
• Define a fixed-length queue Q_hist that memorizes up to m previously
visited domains.
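The role of the history queue can be sketched in Python with a bounded `collections.deque`. The selection rule used here (prefer reachable domains that are not in the queue, and fall back to any other reachable domain when all of them were visited recently) is an assumption for illustration, as are the function name and the toy domains.

```python
import random
from collections import deque


def choose_target_domain(current_domain, reachable_domains, q_hist, rng):
    """Pick the domain to jump to, avoiding recently visited domains if possible.

    reachable_domains: domains reachable through heterogeneous edges of the
    current node; it is assumed to contain at least one other domain.
    """
    others = sorted(d for d in reachable_domains if d != current_domain)
    fresh = [d for d in others if d not in q_hist]
    # Assumed fallback: if every reachable domain is in the history, use them all.
    return rng.choice(fresh or others)


rng = random.Random(0)
q_hist = deque(maxlen=2)  # m = 2: remembers the two most recent domains
q_hist.append("author")
target = choose_target_domain("paper", {"author", "venue"}, q_hist, rng)
q_hist.append(target)  # the oldest entry is dropped once the queue is full
```

With a larger m the walk is steered more firmly towards domains it has not visited recently; with an empty history the target domain is chosen purely at random.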
Random Walk with JUmp and STay strategies (JUST)
• For each node in the graph, we start a random walk and continue until the maximum
walk length is reached.
• We then maximize the co-occurrence probability of two nodes appearing within a context
window in the random walks, using the SkipGram model.
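The context-window idea can be illustrated with a short Python sketch that lists the (center, context) node pairs a SkipGram model would be trained on. The function name, the `window` parameter, and the example walk are assumptions made for the illustration; the training of the embeddings themselves is not shown.

```python
def context_pairs(walk, window):
    """Return (center, context) pairs for nodes at most `window` steps apart."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a node is not its own context
                pairs.append((center, walk[j]))
    return pairs


# Hypothetical walk over author (a*), paper (p*) and venue (v*) nodes.
example_walk = ["a1", "p1", "a2", "p2", "v1"]
pairs = context_pairs(example_walk, window=2)
```

Nodes that often co-occur in such pairs are pushed towards similar embeddings, so the walk strategy directly shapes which nodes end up close together.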
Experimental evaluation - Datasets
• DBLP
• Movie
• Foursquare
Experimental evaluation - Baselines
• Homogeneous graph embedding techniques:
• DeepWalk
• LINE
• Heterogeneous graph embedding techniques:
• PTE
• Metapath2vec
• Hin2vec
• JUST_no_memory (simplified version of our proposed method)
Node classification results
JUST achieves state of the art performance, and outperforms the baselines.
Node clustering results
JUST outperforms the baselines on all datasets.
Combining several meta-paths may not consistently outperform manually selecting one meta-path.
[Figure: node clustering results of DeepWalk, LINE, PTE, Metapath2vec, Hin2vec, JUST_no_memory, and JUST on DBLP, Movie, and Foursquare]
Impact of initial stay probability α
• Balances the impact of heterogeneous and homogeneous edges on the learnt embeddings.
• Tune α within [0.1,0.9] with a step of 0.1
Suboptimal results for too many heterogeneous or homogeneous edges.
Balancing the number of heterogeneous and homogeneous edges is key to learning high-quality embeddings.
The optimal α lies in the range [0.2,0.4] on all three datasets in both node classification and clustering tasks.
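To build some intuition for how α shifts this balance, the Python sketch below computes the expected number of consecutive nodes a walk visits in one domain before it jumps. It assumes that the stay probability after l consecutive same-domain nodes is α to the power l and that both kinds of edges are always available; neither assumption is stated on the slide, and the function name is illustrative.

```python
def expected_run_length(alpha, max_terms=200):
    """Expected length of a same-domain run under the assumed stay rule.

    Uses E[run] = sum over l >= 1 of P(run >= l), where P(run >= 1) = 1 and
    each further step multiplies in the assumed stay probability alpha ** l.
    """
    total = 0.0
    survive = 1.0  # P(run >= l), starting at l = 1
    for l in range(1, max_terms + 1):
        total += survive
        survive *= alpha ** l
    return total


# The grid used for tuning: 0.1, 0.2, ..., 0.9.
alphas = [round(0.1 * k, 1) for k in range(1, 10)]
run_lengths = {alpha: expected_run_length(alpha) for alpha in alphas}
```

Under these assumptions a small α yields runs of barely more than one node, so heterogeneous edges dominate the walk, while a large α yields longer runs along homogeneous edges.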
Runtime Performance
• End-to-end node embedding learning time (in seconds) for all random-walk-based
methods:

Method                    DBLP    Movie   Foursquare
DeepWalk                   236      333          484
Metapath2vec (original)    965   19,200        2,248
Metapath2vec (ours)        290      408          550
Hin2vec                    904    1,301        1,801
JUST                       310      442          616
• Compared to DeepWalk and Metapath2vec (ours), JUST adds a minor overhead in learning time,
but achieves better results in classification and clustering tasks.
• Compared to Hin2vec, JUST learns about 3x faster and achieves better
results in most experiments.
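The speedup and overhead figures can be checked directly from the runtime table with a few lines of Python. The dictionary simply restates the reported learning times in seconds; the helper name `ratio` is an illustrative choice.

```python
# Learning times in seconds, as reported in the runtime table.
runtimes = {
    "DeepWalk": {"DBLP": 236, "Movie": 333, "Foursquare": 484},
    "Metapath2vec (original)": {"DBLP": 965, "Movie": 19200, "Foursquare": 2248},
    "Metapath2vec (ours)": {"DBLP": 290, "Movie": 408, "Foursquare": 550},
    "Hin2vec": {"DBLP": 904, "Movie": 1301, "Foursquare": 1801},
    "JUST": {"DBLP": 310, "Movie": 442, "Foursquare": 616},
}
datasets = ["DBLP", "Movie", "Foursquare"]


def ratio(method_a, method_b, dataset):
    """How many times longer method_a takes than method_b on a dataset."""
    return runtimes[method_a][dataset] / runtimes[method_b][dataset]


# Speedup of JUST over Hin2vec, and overhead of JUST relative to DeepWalk.
speedup_vs_hin2vec = {d: round(ratio("Hin2vec", "JUST", d), 2) for d in datasets}
overhead_vs_deepwalk = {d: round(ratio("JUST", "DeepWalk", d), 2) for d in datasets}
```

On all three datasets the Hin2vec-to-JUST ratio is a little over 2.9, in line with the roughly 3x figure, while JUST takes about 1.3 times as long as DeepWalk.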
Conclusions
• We propose JUST, a heterogeneous graph embedding technique that uses random
walks with jump and stay strategies and requires no prior knowledge.
• JUST achieves state-of-the-art performance on classification and clustering
tasks without using meta-paths.
• We plan to investigate how JUST performs on other graph structures, such as
knowledge graphs.