How Much and When Do We Need Higher-order Information in Hypergraphs? A Case Study on Hyperedge Prediction

Preprint: https://arxiv.org/pdf/2001.11181.pdf

Source code & Supplementary document: https://github.com/granelle/www20-higher-order

Datasets: https://www.cs.cornell.edu/~arb/data/

  1. How Much and When Do We Need Higher-order Information in Hypergraphs? A Case Study on Hyperedge Prediction. The Web Conference 2020. Se-eun Yoon, Hyungseok Song, Kijung Shin, and Yung Yi. Contact: seeuny@kaist.ac.kr
  2. Table of contents: 1. Introduction and related work; 2. Problem formulation; 3. Methods; 4. Experiments; 5. Conclusion.
  3. 1. Introduction and related work
  4. Hypergraphs. What are hypergraphs? In a graph, an edge (link) captures an interaction of exactly two entities. But what about interactions of more than two entities, such as coauthorship, protein interactions, or web hashtags? In a hypergraph, a hyperedge captures an interaction of an arbitrary number of entities. [Figure: a graph and a hypergraph on nodes 1-5, contrasting edges with hyperedges.]
  5. Simplifying a hypergraph. Hypergraphs are not straightforward to use, so common practice is to simplify them, e.g., into projected graphs. [Figure: an original hypergraph with hyperedges {1,2,3,4}, {1,3,5}, {1,6}, {2,6}, {1,7,8}, {3,9}, {5,8}, {1,2,6}; the whole hypergraph is represented by a node-by-hyperedge incidence matrix, while the projected graph is represented by a node-by-node adjacency matrix.] See [PNAS 2018] Simplicial closure and higher-order link prediction, and [AAAI 2018] Beyond link prediction: Predicting hyperlinks in adjacency space.
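To make the projection concrete, here is a small sketch (in Python; illustrative, not the authors' code) of how the projected graph's weighted adjacency can be built from the hyperedges in the figure above:

```python
from itertools import combinations
from collections import Counter

# Hyperedges from the slide's example hypergraph.
hyperedges = [{1, 2, 3, 4}, {1, 3, 5}, {1, 6}, {2, 6},
              {1, 7, 8}, {3, 9}, {5, 8}, {1, 2, 6}]

# Projected-graph weight of a pair = number of hyperedges containing it.
adjacency = Counter()
for e in hyperedges:
    for pair in combinations(sorted(e), 2):
        adjacency[pair] += 1

print(adjacency[(1, 2)])  # 2: {1, 2} occurs inside {1,2,3,4} and {1,2,6}
```

Note how any hyperedge of size k collapses into its C(k, 2) pairs, which is exactly the information loss the next slide discusses.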
  6. Using a hypergraph as it is. However, simplification comes with information loss: projected graphs express hypergraphs with only 2-way information. Many studies therefore propose methods that use the whole hypergraph, e.g., timestamped hyperedges ([KDD 2018] Sequences of sets) or hypergraphs as neural-network input ([AAAI 2019] Hypergraph neural networks).
  7. Our question. [Figure: a spectrum from the projected graph (2-way information only) to the original hypergraph (all information), with both accuracy and complexity increasing toward the hypergraph end.]
  8. Our question. [Figure: the same spectrum refined into steps: 2-way info; 2-way + 3-way info; 2-way + 3-way + 4-way info; ...; all info. The question mark asks where along this accuracy-complexity trade-off we should stop.]
  9. Proposed method to answer our question. [Our question] How much higher-order information is sufficient for accurately solving a hypergraph task? That is, how much n-way information do we need? [Proposed method to capture n-way information] The n-projected graph: the 2-projected graph captures 2-way information, the 3-projected graph captures 3-way information, ..., the n-projected graph captures n-way information. [Our task] Hyperedge prediction: we measure prediction accuracy as n grows.
  10. Example. a) Suppose we want to predict whether {1, 2, 3, 4} would collaborate in the future. b) Knowing about pairs such as {1, 2} and {3, 4} could be useful (e.g., how often pairs have collaborated). c) Knowing also about triples such as {1, 2, 3} and {2, 3, 4} could be even more useful (e.g., how often three people have collaborated). How much n-way information do we need for accurate enough prediction? [Figure: (a) hyperedge prediction on a hypergraph of nodes 1-5, (b) pairwise interactions only, (c) pairwise + 3-way interactions.]
  11. Related work. We organize related work along two axes: pairwise vs. whole-hypergraph representation, and general hypergraph representation vs. use in hyperedge prediction. Pairwise representation: [NeurIPS 2007] Learning with hypergraphs: Clustering, classification, and embedding; [CVPR 2005] Beyond pairwise clustering; [VLSI Design 2000] Multilevel k-way hypergraph partitioning; [ICML 2005] Higher order learning with graphs; and, in hyperedge prediction, [AAAI 2018] Beyond link prediction: Predicting hyperlinks in adjacency space; [PNAS 2018] Simplicial closure and higher-order link prediction; [WWW 2013] Link prediction in social networks based on hypergraph; [DS 2013] Hyperlink prediction in hypernetworks using latent social features. Whole representation: [AAAI 2019] Hypergraph neural networks; [arXiv 2018] HyperGCN: Hypergraph convolutional networks for semi-supervised classification; [KDD 2018] Sequences of sets; [Multimedia 2018] Exploiting relational information in social networks using geometric deep learning on hypergraphs; and, in hyperedge prediction, [arXiv 2014] Predicting multi-actor collaborations using hypergraphs.
  12. 2. Problem formulation
  13. Concept: Hypergraphs. A hypergraph is G = (V, E, ω), where V is the set of nodes, E is the set of hyperedges, and ω(e) is the weight of hyperedge e, defined as its number of occurrences. [Figure: the interactions {1,2,3,4}, {1,2,4}, {4,5}, {1,2,3,4} yield ω({1,2,3,4}) = 2, ω({1,2,4}) = 1, ω({4,5}) = 1.]
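A minimal sketch of this definition, assuming hyperedges are stored as frozensets so that repeated interactions can be counted:

```python
from collections import Counter

# The slide's example: four interactions, one of which repeats.
interactions = [frozenset({1, 2, 3, 4}), frozenset({1, 2, 4}),
                frozenset({4, 5}), frozenset({1, 2, 3, 4})]

w = Counter(interactions)        # omega(e): number of occurrences of e
E = set(w)                       # set of (distinct) hyperedges
V = set().union(*E)              # set of nodes

print(w[frozenset({1, 2, 3, 4})])  # 2, as in the slide
```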
  14. Problem: Hyperedge prediction. Hyperedge prediction is a binary classification problem: remove some hyperedges from the hypergraph, build a candidate set C consisting of positive hyperedges C_p (the removed true ones) and negative hyperedges C_n, and learn a classifier f ≅ f*, where f*(c) = 1 if c ∈ C_p and f*(c) = 0 if c ∈ C_n.
  15. Constructing the hyperedge candidate set C. We can create C in many different ways, and depending on how we create it, task difficulty may change: target hyperedge size (size 4, 5, or 10); quality of negative hyperedges (stars vs. cliques, with cliques more difficult); and quantity of negative hyperedges (positive-to-negative ratios of 1:1, 1:2, 1:5, or 1:10, with more negatives more difficult).
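As one hedged illustration of the star/clique distinction (a plausible reading of the setup, not necessarily the authors' exact sampler): a size-k "star" negative is a node together with k-1 of its projected-graph neighbors, while a "clique" negative additionally requires every pair to be connected, which makes it look more like a true hyperedge and therefore harder to reject.

```python
import random
from itertools import combinations

def sample_star_negative(neighbors, k, rng=random.Random(0)):
    """Hypothetical star sampler: a center node plus k-1 of its neighbors.
    `neighbors` maps each node to its set of projected-graph neighbors."""
    candidates = [v for v, ns in neighbors.items() if len(ns) >= k - 1]
    center = rng.choice(candidates)
    return {center, *rng.sample(sorted(neighbors[center]), k - 1)}

def is_clique(node_set, neighbors):
    """A clique negative: every pair in the set is connected in the 2-pg."""
    return all(v in neighbors[u] for u, v in combinations(node_set, 2))
```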
  16. 3. Methods
  17. The n-projected graph. How do we capture n-way information? Idea: extend the (pairwise) projected graph to more than just 2 nodes, yielding the n-projected graph (n-pg). The n-pg captures n-way information: its edge weights count the number of times each group of n nodes has interacted. [Figure: (a) a hypergraph on nodes 1-5; (b) its 2-pg on nodes 1-5; (c) its 3-pg on pair-nodes 12, 13, 14, 23, 24, 34; (d) its 4-pg on triple-nodes 123, 124, 134, 234.]
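Here is a compact sketch of the construction, under the convenient encoding in which each n-pg edge is stored by the size-n node group it corresponds to (the union of its two endpoints, which are (n-1)-subsets):

```python
from itertools import combinations
from collections import Counter

def n_projected_graph(hyperedges, n):
    """Weight of each size-n node group = number of hyperedges containing it.
    Each group with positive weight corresponds to an n-pg edge between
    any two of its (n-1)-subsets."""
    w_n = Counter()
    for e in hyperedges:
        for group in combinations(sorted(e), n):
            w_n[group] += 1
    return w_n

hyperedges = [{1, 2, 3, 4}, {1, 2, 3, 4}, {1, 2, 4}, {4, 5}]
print(n_projected_graph(hyperedges, 3)[(1, 2, 4)])  # 3
```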
  18. The n-order expansion. The n-projected graph captures only n-way information; however, we want to represent a hypergraph with up to n-way information. The n-order expansion collects the 2-pg through the n-pg together (2-order expansion, 3-order expansion, 4-order expansion, ...). As n increases, the n-order expansion becomes a more accurate representation of the original hypergraph.
  19. Prediction model: Features. Given a candidate hyperedge, we extract its features from the n-order expansion. Example feature: common neighbors (CN) for the candidate hyperedge {1, 2, 3, 5}. In the 2-pg, the common neighbor of nodes 1, 2, 3, 5 is node 4, so #CN = 1. In the 3-pg, the pair-nodes 12, 13, ..., 35 have no common neighbor, so #CN = 0. In the 4-pg, the triple-nodes 123, ..., 235 have no common neighbor, so #CN = 0. The 4-order expansion feature vector is thus (1, 0, 0).
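A sketch of the CN computation at level n, reusing the group-weight encoding of `n_projected_graph` above: n-pg neighborhoods are derived from the size-n groups with positive weight, and the feature intersects the neighborhoods of all (n-1)-subsets of the candidate.

```python
from itertools import combinations
from collections import defaultdict

def n_pg_neighborhoods(w_n, n):
    """Map each (n-1)-subset to its n-pg neighbors: two (n-1)-subsets are
    adjacent iff their union is a size-n group with positive weight."""
    nbrs = defaultdict(set)
    for group in w_n:                       # group: sorted size-n tuple
        for u, v in combinations(combinations(group, n - 1), 2):
            nbrs[u].add(v)
            nbrs[v].add(u)
    return nbrs

def cn_feature(candidate, nbrs, n):
    """Number of common n-pg neighbors over all (n-1)-subsets of candidate."""
    subsets = combinations(sorted(candidate), n - 1)
    return len(set.intersection(*(nbrs.get(s, set()) for s in subsets)))
```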
  20. Prediction model: Features. The features we used (mean variations of the n-pg weights, plus features widely used in link prediction), where $E_n(c)$ denotes the n-pg edges within candidate $c$ and $N_n(\cdot)$ denotes n-pg neighborhoods:
      Geometric mean (GM): $x_n(c) = \big( \prod_{e_n \in E_n(c)} \omega_n(e_n) \big)^{1/|E_n(c)|}$
      Harmonic mean (HM): $x_n(c) = |E_n(c)| \big( \sum_{e_n \in E_n(c)} \omega_n(e_n)^{-1} \big)^{-1}$
      Arithmetic mean (AM): $x_n(c) = \frac{1}{|E_n(c)|} \sum_{e_n \in E_n(c)} \omega_n(e_n)$
      Common neighbors (CN): $x_n(c) = \big| \bigcap_{v_n \subseteq c} N_n(v_n) \big|$
      Jaccard coefficient (JC): $x_n(c) = \big| \bigcap_{v_n \subseteq c} N_n(v_n) \big| \,/\, \big| \bigcup_{v_n \subseteq c} N_n(v_n) \big|$
      Adamic-Adar index (AA): $x_n(c) = \sum_{u_n \in \bigcap_{v_n \subseteq c} N_n(v_n)} 1 / \log |N_n(u_n)|$
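For the three mean features, a short sketch under the same group-weight encoding; here E_n(c) is realized as the size-n subsets of the candidate with positive weight, and returning zeros when E_n(c) is empty is an assumption for illustration. Python's statistics module (3.8+) provides the three means directly.

```python
from itertools import combinations
from statistics import geometric_mean, harmonic_mean, mean

def mean_features(candidate, w_n, n):
    """GM, HM, AM of the n-pg weights of size-n groups inside the candidate.
    Fallback to zeros when the candidate touches no n-pg edge (assumed)."""
    weights = [w_n[g] for g in combinations(sorted(candidate), n) if g in w_n]
    if not weights:
        return 0.0, 0.0, 0.0
    return geometric_mean(weights), harmonic_mean(weights), mean(weights)
```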
  21. Prediction model: Classifier. Classifier: logistic regression with L2 regularization. Input: the feature vector extracted from the n-order expansion. Output: 1/0. How does prediction performance change as we increase n?
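A minimal sketch of the classifier stage with scikit-learn, whose LogisticRegression applies L2 regularization by default; the feature values below are illustrative, not real data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# One row per candidate hyperedge; columns stack features from each level
# of the n-order expansion (e.g., CN at levels 2, 3, 4).
X = np.array([[1.0, 0.0, 0.0],
              [3.0, 1.0, 0.0],
              [0.0, 0.0, 0.0],
              [2.0, 2.0, 1.0]])
y = np.array([0, 1, 0, 1])        # 1: positive candidate, 0: negative

clf = LogisticRegression(penalty="l2").fit(X, y)
scores = clf.predict_proba(X)[:, 1]   # probability of being a true hyperedge
```

With the per-level features concatenated like this, increasing n simply appends columns to X, which is what lets one measure accuracy as a function of n.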
  22. 4. Experiments
  23. Setup: Datasets. We use 15 datasets from 8 domains, ranging from about 1,000 to 2,500,000 hyperedges: 1) Email: recipient addresses of an email; 2) Contact: persons that appeared in face-to-face proximity; 3) Drug components: classes or substances within a single drug, listed in the National Drug Code Directory; 4) Drug use: drugs used by a patient, reported to the Drug Abuse Warning Network before an emergency visit; 5) US Congress: Congress members cosponsoring a bill; 6) Online tags: tags on a question in Stack Exchange forums; 7) Online threads: users answering a question in Stack Exchange forums; 8) Coauthorship: coauthors of a publication.
  24. Setup: Training and evaluation. Training and test sets: first generate the candidate set C, then split C into training (50%) and test (50%) sets. Performance metric: Area Under the Precision-Recall Curve (AUC-PR). Recall ("How many true hyperedges can you find?") = (# true hyperedges I found) / (# true hyperedges). Precision ("How precisely can you find true hyperedges?") = (# true hyperedges I found) / (# hyperedges that I claim "true").
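For the metric, a minimal sketch: scikit-learn's average_precision_score summarizes the precision-recall curve and is a standard estimate of AUC-PR (the labels and scores below are illustrative).

```python
from sklearn.metrics import average_precision_score

y_true = [1, 0, 1, 1, 0]              # ground-truth candidate labels
y_score = [0.9, 0.4, 0.6, 0.7, 0.3]   # classifier scores
print(average_precision_score(y_true, y_score))
# 1.0 here, since every positive is scored above every negative
```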
  25. Results and messages (1). (M1) More higher-order information leads to better prediction quality, but with diminishing returns.
  26. Results and messages (1), continued. (M1) More higher-order information leads to better prediction quality, but with diminishing returns. [Figure: prediction quality as n grows, annotated with a large gain at the first increase and small gains thereafter.]
  27. Results and messages (2). (M2) The harder the task, the more valuable higher-order information becomes. Task hardness increases from star to clique negatives, and as the negative ratio grows (1:1 < 1:2 < 1:5 < 1:10).
  28. Results and messages (3). (M3) Why is higher-order information more important in some datasets than in others? Such datasets have the following properties: (i) higher-order information is more abundant; (ii) higher-order information shares less information with pairwise information. How do we measure the abundance of 3-way information? Edge density = (# edges in 3-pg) / (# all possible 3-way combinations) × 100%.
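A short sketch of the density measure, using the group-weight encoding of the 3-pg from the earlier sketches and assuming, for simplicity, that each realized 3-way group is counted once (so the density is the fraction of possible triples that ever co-occur):

```python
from math import comb

def edge_density_3(w_3, num_nodes):
    """(# edges in 3-pg) / (# all possible 3-way combinations), in percent.
    w_3: weights of size-3 groups, one entry per realized 3-way group."""
    return len(w_3) / comb(num_nodes, 3) * 100
```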
  29. Results and messages (3), continued. How do we measure the information shared between 2-way and 3-way information? 1. Sample three nodes v1, v2, v3 from the hypergraph. 2. Obtain W2 := (w2(v1, v2), w2(v2, v3), w2(v1, v3)) from the 2-pg. 3. Obtain W3 := w3(v1, v2, v3) from the 3-pg. Then the mutual information I(W3; W2) measures the information shared between the 2-pg and the 3-pg, and the conditional entropy H(W3 | W2) measures the information exclusive to the 3-pg.
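A hedged sketch of a plug-in estimate of I(W3; W2) by sampling (not necessarily the paper's exact estimator); as a simplifying choice, the three pairwise weights are sorted so the W2 outcome does not depend on the sampled node order. H(W3 | W2) could then be estimated as H(W3) minus this quantity.

```python
import random
from math import log2
from collections import Counter

def estimate_mi(nodes, w_2, w_3, num_samples=100_000, rng=random.Random(0)):
    """Empirical I(W3; W2) over sampled node triples.
    w_2 / w_3: weights keyed by sorted pair / triple tuples (0 if absent)."""
    joint, marg2, marg3 = Counter(), Counter(), Counter()
    nodes = sorted(nodes)
    for _ in range(num_samples):
        a, b, c = rng.sample(nodes, 3)
        pairs = [(a, b), (b, c), (a, c)]
        W2 = tuple(sorted(w_2.get(tuple(sorted(p)), 0) for p in pairs))
        W3 = w_3.get(tuple(sorted((a, b, c))), 0)
        joint[(W2, W3)] += 1
        marg2[W2] += 1
        marg3[W3] += 1
    return sum(n / num_samples *
               log2(n * num_samples / (marg2[W2] * marg3[W3]))
               for (W2, W3), n in joint.items())
```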
  30. 5. Conclusion
  31. Conclusion. We ask and answer the following questions: 1) How much higher-order information is needed to accurately represent a hypergraph? 2) When is such higher-order information particularly useful? 3) Why is higher-order information more important in some datasets than in others? Our results could offer insights for future work on hypergraphs, e.g., achieving higher performance on hypergraph tasks with less computational complexity. Other examples of hypergraph tasks include node classification and node embedding.
  32. Links. Preprint: https://arxiv.org/pdf/2001.11181.pdf. Source code & supplementary document: https://github.com/granelle/www20-higher-order. Datasets: https://www.cs.cornell.edu/~arb/data/
  33. Any questions? Thank you!
