
How Much and When Do We Need Higher-order Information in Hypergraphs? A Case Study on Hyperedge Prediction


Preprint: https://arxiv.org/pdf/2001.11181.pdf

Source code & Supplementary document: https://github.com/granelle/www20-higher-order

Datasets: https://www.cs.cornell.edu/~arb/data/


  1. How Much and When Do We Need Higher-order Information in Hypergraphs? A Case Study on Hyperedge Prediction. The Web Conference 2020. Se-eun Yoon, Hyungseok Song, Kijung Shin, and Yung Yi (LANADA, KAIST). Contact: seeuny@kaist.ac.kr
  2. Table of contents: 1. Introduction and related work; 2. Problem formulation; 3. Methods; 4. Experiments; 5. Conclusion
  3. 1. Introduction and related work
  4. Hypergraphs. What are hypergraphs? In a graph, an edge (link) captures an interaction of two entities. In a hypergraph, a hyperedge captures an interaction of an arbitrary number of entities. What about interactions of more than two entities? Examples: coauthorship, protein interactions, web hashtags.
  5. Simplifying a hypergraph. Hypergraphs are not straightforward to use, so common practice is to simplify them, e.g., into projected graphs: the whole hypergraph (an incidence matrix over hyperedges such as {1, 2, 3, 4}, {1, 3, 5}, {1, 6}, {2, 6}, {1, 7, 8}, {3, 9}, {5, 8}, {1, 2, 6}) is collapsed into a projected graph (an adjacency matrix whose entries count pairwise co-occurrences). [PNAS 2018] Simplicial closure and higher-order link prediction; [AAAI 2018] Beyond link prediction: Predicting hyperlinks in adjacency space.
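The projection step can be sketched in a few lines of Python (an illustrative sketch, not the paper's code): each pair's weight counts the hyperedges in which the two nodes co-occur, which is exactly the weighted adjacency matrix the slide describes.

```python
from collections import Counter
from itertools import combinations

def projected_graph(hyperedges):
    """Weighted clique expansion: the weight of a pair (u, v) counts
    the hyperedges in which u and v co-occur."""
    w = Counter()
    for e in hyperedges:
        for u, v in combinations(sorted(e), 2):
            w[(u, v)] += 1
    return w

# The slide's example hyperedges.
H = [{1, 2, 3, 4}, {1, 3, 5}, {1, 6}, {2, 6}, {1, 7, 8}, {3, 9}, {5, 8}, {1, 2, 6}]
W = projected_graph(H)
```

For instance, nodes 1 and 2 co-occur in {1, 2, 3, 4} and {1, 2, 6}, so their projected-graph weight is 2.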
  6. Using a hypergraph as it is. However, simplification comes with information loss: projected graphs express hypergraphs with only 2-way information. Many studies propose methods that use the whole hypergraph, e.g., timestamped hyperedges ([KDD 2018] Sequences of sets) or the hypergraph as neural-network input ([AAAI 2019] Hypergraph neural networks).
  7. Our question. The projected graph keeps only 2-way information, at low complexity but lower accuracy; the original hypergraph keeps all information, at high accuracy but high complexity.
  8. Our question. Between these two extremes lies a spectrum: 2-way info; 2-way + 3-way info; 2-way + 3-way + 4-way info; ...; all info. Accuracy and complexity both grow along this spectrum. Where on it do we need to be?
  9. Proposed method to answer our question. [Our question] How much higher-order information is sufficient for accurately solving a hypergraph task? That is, how much n-way information do we need? [Proposed method to capture n-way information] The n-projected graph: the 2-projected graph captures 2-way information, the 3-projected graph captures 3-way information, ..., the n-projected graph captures n-way information. [Our task] Hyperedge prediction: we measure prediction accuracy as n grows.
  10. Example. a) Suppose we want to predict whether {1, 2, 3, 4} would collaborate in the future. b) Knowing about pairs such as {1, 2}, {3, 4}, ... could be useful (e.g., how often pairs have collaborated). c) Knowing also about triples such as {1, 2, 3}, {2, 3, 4}, ... could be even more useful (e.g., how often 3 people have collaborated). How much n-way information do we need for accurate enough prediction?
  11. Related work. Prior work varies along two axes: pairwise vs. whole-hypergraph representation, and general hypergraph learning vs. hyperedge prediction. Representative works: [NeurIPS 2007] Learning with hypergraphs: Clustering, classification, and embedding; [CVPR 2005] Beyond pairwise clustering; [VLSI Design 2000] Multilevel k-way hypergraph partitioning; [ICML 2005] Higher order learning with graphs; [AAAI 2018] Beyond link prediction: Predicting hyperlinks in adjacency space; [PNAS 2018] Simplicial closure and higher-order link prediction; [WWW 2013] Link prediction in social networks based on hypergraph; [DS 2013] Hyperlink prediction in hypernetworks using latent social features; [AAAI 2019] Hypergraph neural networks; [arXiv 2018] HyperGCN: Hypergraph convolutional networks for semi-supervised classification; [KDD 2018] Sequences of sets; [Multimedia 2018] Exploiting relational information in social networks using geometric deep learning on hypergraphs; [arXiv 2014] Predicting multi-actor collaborations using hypergraphs.
  12. 2. Problem formulation
  13. Concept: Hypergraphs. A hypergraph G = (V, E, w), where V is the set of nodes, E is the set of hyperedges, and w(e) is the weight of hyperedge e, defined as its number of occurrences. Example: if the interactions that took place are {1, 2, 3, 4}, {1, 2, 4}, {4, 5}, {1, 2, 3, 4}, then w({1, 2, 3, 4}) = 2 and w({1, 2, 4}) = w({4, 5}) = 1.
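A minimal way to hold G = (V, E, w) in code is a counter over frozensets, so that w(e) is just the multiplicity of e (an illustrative sketch using the slide's example):

```python
from collections import Counter

# The slide's multiset of interactions; w(e) = number of occurrences.
interactions = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 4}),
    frozenset({4, 5}),
    frozenset({1, 2, 3, 4}),
]
w = Counter(interactions)          # hyperedge weights
V = set().union(*interactions)     # node set
```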
  14. Problem: Hyperedge prediction. A binary classification problem: remove some hyperedges from the hypergraph, then learn f ≅ f⋆ over a candidate set C = C_p ∪ C_n, where f⋆(c) = 1 if c ∈ C_p (positive hyperedges, i.e., the removed ones) and f⋆(c) = 0 if c ∈ C_n (negative hyperedges).
  15. Constructing the hyperedge candidate set C. We can create C in many different ways, and depending on how we create it, the task difficulty changes: target hyperedge size (4, 5, or 10); quality of negative hyperedges (stars vs. cliques, with cliques more difficult); and quantity of negative hyperedges (positive-to-negative ratios of 1:1, 1:2, 1:5, or 1:10, with more negatives more difficult).
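The two kinds of negatives can be sketched as follows (a sketch of the idea only; the paper's exact samplers may differ). A star negative is a hub plus some of its neighbors in the projected graph; a clique negative requires all node pairs to be connected, making it harder to tell apart from a true hyperedge.

```python
import random

def star_negative(adj, size, rng):
    """Star: a hub node plus (size - 1) of its projected-graph neighbors."""
    hubs = [v for v, nb in adj.items() if len(nb) >= size - 1]
    hub = rng.choice(hubs)
    return frozenset([hub] + rng.sample(sorted(adj[hub]), size - 1))

def clique_negative(adj, size, rng, tries=1000):
    """Clique: a node set that is pairwise connected in the projected graph."""
    nodes = sorted(adj)
    for _ in range(tries):
        cand = rng.sample(nodes, size)
        if all(v in adj[u] for u in cand for v in cand if v != u):
            return frozenset(cand)
    return None  # no clique found within the sampling budget
```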
  16. 3. Methods
  17. The n-projected graph. How do we capture n-way information? Idea: extend the (pairwise) projected graph to more than just 2 nodes. The n-projected graph (n-pg) captures n-way information: each weight counts the number of times a group of n nodes has interacted. For example, the 3-pg has pairs (12, 13, ...) as vertices and the 4-pg has triples (123, 124, ...) as vertices.
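The n-pg weights can be computed directly from this definition (an illustrative sketch, not the paper's code): enumerate every group of n nodes inside each hyperedge and count.

```python
from collections import Counter
from itertools import combinations

def group_weights(hyperedges, n):
    """w_n: for each group of n nodes, how many hyperedges contain it.
    These are the edge weights of the n-projected graph, whose
    vertices are the (n-1)-node groups."""
    w = Counter()
    for e in hyperedges:
        for g in combinations(sorted(e), n):
            w[frozenset(g)] += 1
    return w

# Interactions from the "Concept: Hypergraphs" slide.
H = [{1, 2, 3, 4}, {1, 2, 4}, {4, 5}, {1, 2, 3, 4}]
```

For example, the group {1, 2, 4} appears in both copies of {1, 2, 3, 4} and in {1, 2, 4}, so its 3-pg weight is 3.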
  18. The n-order expansion. The n-projected graph captures only n-way information, but we want to represent a hypergraph with up to n-way information. The n-order expansion combines the 2-, 3-, ..., n-projected graphs. As n increases, the n-order expansion becomes a more accurate representation of the original hypergraph.
  19. Prediction model: Features. Given a candidate hyperedge, we extract its features from the n-order expansion. Example feature: common neighbors (CN) for the candidate {1, 2, 3, 5}. In the 2-pg, the common neighbor of nodes 1, 2, 3, 5 is node 4, so #CN = 1. In the 3-pg, the vertices 12, 13, ..., 35 have no common neighbor, so #CN = 0; likewise in the 4-pg for 123, ..., 235, #CN = 0. The 4-order expansion feature vector is thus (1, 0, 0).
  20. Prediction model: Features. The features we used, where $E_n(c)$ is the set of n-pg edges inside candidate $c$, $\omega_n$ the n-pg weights, and $N_n(v_n)$ the n-pg neighborhood of vertex $v_n$. Mean variations:
  • Geometric mean (GM): $x_n(c) = \big( \prod_{e_n \in E_n(c)} \omega_n(e_n) \big)^{1/|E_n(c)|}$
  • Harmonic mean (HM): $x_n(c) = |E_n(c)| \big( \sum_{e_n \in E_n(c)} \omega_n(e_n)^{-1} \big)^{-1}$
  • Arithmetic mean (AM): $x_n(c) = \frac{1}{|E_n(c)|} \sum_{e_n \in E_n(c)} \omega_n(e_n)$
  Features widely used in link prediction:
  • Common neighbors (CN): $x_n(c) = \big| \bigcap_{v_n \subseteq c} N_n(v_n) \big|$
  • Jaccard coefficient (JC): $x_n(c) = \big| \bigcap_{v_n \subseteq c} N_n(v_n) \big| \,/\, \big| \bigcup_{v_n \subseteq c} N_n(v_n) \big|$
  • Adamic-Adar index (AA): $x_n(c) = \sum_{u_n \in \bigcap_{v_n \subseteq c} N_n(v_n)} 1 / \log |N_n(u_n)|$
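As a concrete sketch of the CN feature (helper names are ours; the paper's implementation may differ): the n-pg adjacency links (n-1)-node groups whose union has interacted, and the feature counts the vertices adjacent to every (n-1)-group inside the candidate.

```python
from collections import defaultdict
from itertools import combinations

def n_pg_neighbors(hyperedges, n):
    """n-pg adjacency: vertices are (n-1)-node groups; two are neighbors
    when their union is an n-node group appearing in some hyperedge."""
    nbrs = defaultdict(set)
    for e in hyperedges:
        for g in combinations(sorted(e), n):
            subs = [frozenset(s) for s in combinations(g, n - 1)]
            for a in subs:
                nbrs[a].update(b for b in subs if b != a)
    return nbrs

def cn_feature(candidate, nbrs, n):
    """x_n(c) = |intersection of N_n(v_n) over all (n-1)-groups v_n in c|."""
    vs = [frozenset(s) for s in combinations(sorted(candidate), n - 1)]
    return len(set.intersection(*(set(nbrs[v]) for v in vs)))

# Toy hypergraph: node 4 interacts with every pair from {1, 2, 3}.
H = [{1, 2, 4}, {2, 3, 4}, {1, 3, 4}]
```

Computing this for each order from 2 up to n yields the candidate's feature vector from the n-order expansion; for the toy hypergraph above and candidate {1, 2, 3}, the levels 2 and 3 give (1, 0).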
  21. Prediction model: Classifier. The classifier is logistic regression with L2 regularization; its input is the feature vector from the n-order expansion, and its output is 1/0. How does prediction performance change as we increase n?
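The final step is standard. A self-contained stand-in (plain batch gradient descent in place of an off-the-shelf solver; the feature values below are made up for illustration) looks like:

```python
import math

def train_logreg(X, y, l2=0.01, lr=0.5, epochs=500):
    """Logistic regression with L2 regularization, fit by batch
    gradient descent."""
    d, m = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [l2 * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-(sum(wj * xj for wj, xj in zip(w, xi)) + b)))
            for j in range(d):
                gw[j] += (p - yi) * xi[j] / m
            gb += (p - yi) / m
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def predict(w, b, x):
    """1 if the candidate is classified as a true hyperedge, else 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0

# Made-up (CN_2, CN_3) feature vectors: positives co-occur more often.
X = [[3.0, 1.0], [2.0, 2.0], [4.0, 1.0], [0.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logreg(X, y)
```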
  22. 4. Experiments
  23. Setup: Datasets. 15 datasets from 8 domains, with sizes ranging from about 1,000 to 2,500,000 hyperedges. 1) Email: recipient addresses of an email. 2) Contact: persons who appeared in face-to-face proximity. 3) Drug components: classes or substances within a single drug, listed in the National Drug Code Directory. 4) Drug use: drugs used by a patient, reported to the Drug Abuse Warning Network before an emergency visit. 5) US Congress: congress members cosponsoring a bill. 6) Online tags: tags on a question in Stack Exchange forums. 7) Online threads: users answering a question in Stack Exchange forums. 8) Coauthorship: coauthors of a publication.
  24. Setup: Training and evaluation. First generate the candidate set C, then split it into training (50%) and test (50%) sets. Performance metric: area under the precision-recall curve (AUC-PR). Recall = (# true hyperedges I found) / (# true hyperedges): "How many true hyperedges can you find?" Precision = (# true hyperedges I found) / (# hyperedges that I claim "true"): "How precisely can you find true hyperedges?"
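AUC-PR can be computed as average precision: sort candidates by score and average the precision measured at each true hyperedge (an illustrative sketch; tie-breaking and interpolation conventions vary across libraries).

```python
def auc_pr(scores, labels):
    """Average precision: precision at the rank of each true
    hyperedge, averaged over all true hyperedges."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp, fp, ap = 0, 0, 0.0
    for i in order:
        if labels[i]:
            tp += 1
            ap += tp / (tp + fp)  # precision at this recall level
        else:
            fp += 1
    return ap / sum(labels)
```

A perfect ranking (all positives scored above all negatives) gives 1.0; random scores give roughly the positive fraction of C.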
  25. Results and messages (1). (M1) More higher-order information leads to better prediction quality, but with diminishing returns.
  26. Results and messages (1), continued. (M1) The gain from adding 3-way information is large; the gains from each further order are small.
  27. Results and messages (2). (M2) The harder the task, the more valuable higher-order information becomes. Task hardness increases from star to clique negatives, and with the negative ratio (1:1 < 1:2 < 1:5 < 1:10).
  28. Results and messages (3). (M3) Why is higher-order information more important in some datasets than in others? Such datasets have two properties: (i) higher-order information is more abundant, and (ii) higher-order information shares less information with pairwise information. How do we measure the abundance of 3-way information? Edge density = (# edges in the 3-pg) / (# all possible 3-way combinations) × 100%.
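The abundance measure is a one-liner once the 3-pg is built (a sketch, assuming hyperedges are given as node sets):

```python
from itertools import combinations
from math import comb

def edge_density(hyperedges, n=3):
    """(# edges in the n-pg) / (# all possible n-way combinations) x 100%."""
    nodes = set().union(*hyperedges)
    groups = {g for e in hyperedges for g in combinations(sorted(e), n)}
    return 100.0 * len(groups) / comb(len(nodes), n)
```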
  29. Results and messages (3), continued. How do we measure the information shared between 2-way and 3-way information? 1. Sample three nodes v1, v2, v3 from the hypergraph. 2. Obtain from the 2-pg W2 := (w2(v1, v2), w2(v2, v3), w2(v1, v3)). 3. Obtain from the 3-pg W3 := w3(v1, v2, v3). Then the mutual information I(W3; W2) measures the information shared between the 2-pg and 3-pg, and the conditional entropy H(W3 | W2) measures the information exclusive to the 3-pg.
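The three-step recipe above can be turned into a plug-in estimator over sampled triples (a sketch only; the paper may bin, normalize, or sample the weights differently):

```python
import random
from collections import Counter
from itertools import combinations
from math import log2

def mi_and_cond_entropy(hyperedges, samples=2000, seed=0):
    """Estimate I(W3; W2) and H(W3 | W2) from sampled node triples,
    where W2 = (w2 of the three pairs) and W3 = w3 of the triple."""
    w2, w3 = Counter(), Counter()
    for e in hyperedges:
        for p in combinations(sorted(e), 2):
            w2[p] += 1
        for t in combinations(sorted(e), 3):
            w3[t] += 1
    nodes = sorted(set().union(*hyperedges))
    rng = random.Random(seed)
    joint = Counter()
    for _ in range(samples):
        t = tuple(sorted(rng.sample(nodes, 3)))
        joint[(w3[t], tuple(w2[p] for p in combinations(t, 2)))] += 1
    m3, m2 = Counter(), Counter()  # empirical marginals of W3 and W2
    for (a, b), c in joint.items():
        m3[a] += c
        m2[b] += c
    n = samples
    mi = sum(c / n * log2(c * n / (m3[a] * m2[b])) for (a, b), c in joint.items())
    h3 = -sum(c / n * log2(c / n) for c in m3.values())
    return mi, h3 - mi  # I(W3; W2), H(W3 | W2) = H(W3) - I(W3; W2)
```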
  30. 5. Conclusion
  31. Conclusion. We ask and answer the following questions: 1) How much higher-order information is needed to accurately represent a hypergraph? 2) When is such higher-order information particularly useful? 3) Why is higher-order information more important in some datasets than in others? Our results could offer insights for future work on hypergraphs, e.g., achieving higher performance on hypergraph tasks (such as node classification and node embedding) with less computational complexity.
  32. Links. Preprint: https://arxiv.org/pdf/2001.11181.pdf. Source code & supplementary document: https://github.com/granelle/www20-higher-order. Datasets: https://www.cs.cornell.edu/~arb/data/
  33. Any questions? Thank you!
