How Much and When Do We Need Higher-order Information in Hypergraphs? A Case Study on Hyperedge Prediction
LANADA
The Web Conference 2020
Se-eun Yoon, Hyungseok Song, Kijung Shin, and Yung Yi
Contact: seeuny@kaist.ac.kr
Table of contents
1. Introduction and related work
2. Problem formulation
3. Methods
4. Experiments
5. Conclusion
Hypergraphs
• What are hypergraphs?
  • Graph: an edge (link) represents an interaction of two entities.
  • Hypergraph: a hyperedge represents an interaction of an arbitrary number of entities.
• What about interactions of more than two entities? For example:
• Coauthorship
• Protein interactions
• Web hashtags
Simplifying a hypergraph
• Hypergraphs are not straightforward to use.
• Common practice is to simplify them, e.g., into projected graphs.
  • Example: the original hypergraph with hyperedges {1, 2, 3, 4}, {1, 3, 5}, {1, 6}, {2, 6}, {1, 7, 8}, {3, 9}, {5, 8}, {1, 2, 6} and its projected graph. [PNAS 2018] Simplicial closure and higher-order link prediction
  • The whole hypergraph is described by its incidence matrix (node × hyperedge); the projected graph by its adjacency matrix (node × node), whose entries count shared hyperedges. [AAAI 2018] Beyond link prediction: Predicting hyperlinks in adjacency space
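The projection described above can be sketched in code. This is an illustrative Python sketch (not the authors' implementation) that builds the incidence matrix of the example hypergraph and projects it into a weighted adjacency matrix:

```python
# Hyperedges of the example hypergraph on this slide.
hyperedges = [{1, 2, 3, 4}, {1, 3, 5}, {1, 6}, {2, 6},
              {1, 7, 8}, {3, 9}, {5, 8}, {1, 2, 6}]
nodes = sorted(set().union(*hyperedges))

# Incidence matrix H: H[i][j] = 1 iff node i belongs to hyperedge j.
H = [[1 if v in e else 0 for e in hyperedges] for v in nodes]

# Projected graph's weighted adjacency A = H * H^T with a zeroed diagonal:
# A[i][k] counts the hyperedges containing both node i and node k.
n = len(nodes)
A = [[sum(H[i][j] * H[k][j] for j in range(len(hyperedges))) if i != k else 0
      for k in range(n)] for i in range(n)]
```

Note how the projection keeps only 2-way co-occurrence counts: the matrix cannot tell whether nodes 1, 2, and 6 appeared in one hyperedge or in three separate pairs.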
Using a hypergraph as it is
• However, simplification comes with information loss.
• Projected graphs express hypergraphs with only 2-way information.
• Many studies propose methods to use the whole hypergraph.
• [KDD 2018] Sequences of sets: timestamped hyperedges (t1: {1, 2, 3, 4}, t2: {1, 3, 5}, t3: {1, 6}, t4: {2, 6}, t5: {1, 7, 8}, t6: {3, 9}, t7: {5, 8}, t8: {1, 2, 6})
• [AAAI 2019] Hypergraph neural networks: hypergraph as neural net input
Our question
• The projected graph carries only 2-way information; the original hypergraph carries all information.
[Figure: a spectrum from the projected graph to the original hypergraph, along which both accuracy and complexity increase.]
Our question
• In between lies a spectrum of representations: 2-way info; 2-way + 3-way info; 2-way + 3-way + 4-way info; …; all info.
• Where on this accuracy vs. complexity spectrum do we get sufficiently accurate predictions?
Proposed method to answer our question
• [Our question] How much higher-order information is sufficient for accurately
solving a hypergraph task?
• That is, how much n-way information do we need?
• [Proposed method to capture n-way information] n-projected graph
• 2-projected graph: captures 2-way information
• 3-projected graph: captures 3-way information
• ⋯
• 𝑛-projected graph: captures n-way information
• [Our task] Hyperedge prediction
• Measure prediction accuracy as n grows
Example
• In the figure below:
a) Suppose we want to predict whether {1, 2, 3, 4} will collaborate in the future.
b) Knowing about {1, 2}, {3, 4}, … could be useful (e.g., how often pairs have collaborated).
c) Knowing also about {1, 2, 3}, {2, 3, 4}, … could be even more useful (e.g., how often triples of people have collaborated).
• How much n-way information do we need for accurate enough prediction?
[Figure: (a) hyperedge prediction: will {1, 2, 3, 4} form a hyperedge? (b) pairwise interactions among nodes 1-5; (c) pairwise + 3-way interactions]
Related work
• Hypergraph representation
  • Pairwise representation: [NeurIPS 2007] Learning with hypergraphs: Clustering, classification, and embedding; [CVPR 2005] Beyond pairwise clustering; [ICML 2005] Higher order learning with graphs; [VLSI Design 2000] Multilevel k-way hypergraph partitioning
  • Whole representation: [AAAI 2019] Hypergraph neural networks; [arXiv 2018] HyperGCN: Hypergraph convolutional networks for semi-supervised classification; [KDD 2018] Sequences of sets; [Multimedia 2018] Exploiting relational information in social networks using geometric deep learning on hypergraphs
• In hyperedge prediction
  • Pairwise representation: [AAAI 2018] Beyond link prediction: Predicting hyperlinks in adjacency space; [WWW 2013] Link prediction in social networks based on hypergraph; [DS 2013] Hyperlink prediction in hypernetworks using latent social features
  • Whole representation: [PNAS 2018] Simplicial closure and higher-order link prediction; [arXiv 2014] Predicting multi-actor collaborations using hypergraphs
Concept: Hypergraphs
• Hypergraph G = (V, E, ω)
  • V: set of nodes
  • E: set of hyperedges
  • ω(e): weight of hyperedge e = number of times e occurred
• Example: the interactions {1, 2, 3, 4}, {1, 2, 4}, {4, 5}, {1, 2, 3, 4} give ω({1, 2, 3, 4}) = 2, ω({1, 2, 4}) = 1, ω({4, 5}) = 1.
[Figure: (a) the resulting hypergraph over nodes 1-5; (b) its 2-projected graph]
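The weighted hypergraph above can be built directly from the interaction list. A minimal sketch (the variable names are mine, not the paper's):

```python
from collections import Counter

# Interactions that took place (the example on this slide).
interactions = [{1, 2, 3, 4}, {1, 2, 4}, {4, 5}, {1, 2, 3, 4}]

# The weight w(e) of a hyperedge e is the number of times e occurred,
# so the weighted hypergraph is just a multiset of node sets.
weights = Counter(frozenset(e) for e in interactions)
V = set().union(*interactions)   # node set
E = set(weights)                 # hyperedge set (distinct interactions)
```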
Problem: Hyperedge prediction
• Remove some hyperedges from the hypergraph; the removed ones become positive candidates.
  • C: candidate hyperedges, C = C_p ∪ C_n
  • C_p: positive hyperedges; C_n: negative hyperedges
• Binary classification problem: find f ≅ f⋆, where f⋆(c) = 1 if c ∈ C_p and f⋆(c) = 0 if c ∈ C_n.
[Figure: a hypergraph over nodes 1-8 before and after removing some hyperedges, with example candidate hyperedges]
Constructing the hyperedge candidate set C
• We can create C in many different ways.
• Depending on how we create it, task difficulty may change:
  • Target hyperedge size: 4, 5, or 10
  • Quality of negative hyperedges: stars vs. cliques (cliques are more difficult)
  • Quantity of negative hyperedges (positive-to-negative ratio): 1:1, 1:2, 1:5, 1:10 (more negatives are more difficult)
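The two negative-sampling schemes could be sketched as follows. This is a hedged illustration based on the slide's star/clique distinction, with hypothetical helpers of my own: a star is taken as a center node plus some of its 2-pg neighbors, and a clique as a set of pairwise-connected nodes (a real implementation would also exclude sets that already form hyperedges):

```python
import random
from itertools import combinations

def neighbors_2pg(hyperedges):
    """Adjacency of the pairwise projected graph."""
    nbrs = {}
    for e in hyperedges:
        for u, v in combinations(sorted(e), 2):
            nbrs.setdefault(u, set()).add(v)
            nbrs.setdefault(v, set()).add(u)
    return nbrs

def sample_star(nbrs, size, rng):
    """A center node plus (size - 1) of its 2-pg neighbors (easier negatives)."""
    centers = [v for v in nbrs if len(nbrs[v]) >= size - 1]
    c = rng.choice(sorted(centers))
    return {c, *rng.sample(sorted(nbrs[c]), size - 1)}

def sample_clique(nbrs, size, rng):
    """Greedily grow a set of pairwise-connected nodes (harder negatives)."""
    while True:
        clique = {rng.choice(sorted(nbrs))}
        cands = set(nbrs[next(iter(clique))])
        while cands and len(clique) < size:
            v = rng.choice(sorted(cands))
            clique.add(v)
            cands &= nbrs[v]          # keep only nodes adjacent to everything so far
        if len(clique) == size:
            return clique
```

Clique negatives are harder because every pair inside them looks plausible to a purely pairwise model, which is exactly the regime the paper probes.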
The n-projected graph
• How to capture n-way information?
• Idea: extend the (pairwise) projected graph to more than just 2 nodes.
• n-projected graph (n-pg): captures n-way information; each weight = # times a group of n nodes has interacted.
[Figure: (a) a hypergraph over nodes 1-5; (b) its 2-pg (12, 13, 14, 23, 24, 34); (c) its 3-pg (123, 124, 134, 234); (d) its 4-pg]
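The n-way weights that the n-pg encodes can be computed by enumerating the n-subsets of each hyperedge. A minimal sketch (function name is mine):

```python
from collections import Counter
from itertools import combinations

def n_way_weights(interactions, n):
    """w_n(s) for each n-subset s: how many times those n nodes interacted.
    These weights are exactly what the n-projected graph encodes."""
    w = Counter()
    for e in interactions:
        if len(e) >= n:
            for s in combinations(sorted(e), n):
                w[s] += 1
    return w

# Example hypergraph from the earlier slide (nodes 1-5).
interactions = [{1, 2, 3, 4}, {1, 2, 4}, {4, 5}, {1, 2, 3, 4}]
w2 = n_way_weights(interactions, 2)   # 2-pg weights (pairwise)
w3 = n_way_weights(interactions, 3)   # 3-pg weights (3-way)
```

Note the cost implication: a hyperedge of size k contributes C(k, n) subsets, which is why higher-order expansions trade accuracy against complexity.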
The n-order expansion
• The n-projected graph captures only n-way information.
• However, we want to represent a hypergraph with up to n-way information.
• n-order expansion: the 2-, 3-, …, n-projected graphs taken together.
  • 2-order expansion, 3-order expansion, 4-order expansion, …
• As n increases, the n-order expansion becomes a more accurate representation of the original hypergraph.
Prediction model: Features
• Given a candidate hyperedge, we can extract its features from the n-order expansion.
• Example feature: common neighbors (CN), for the candidate hyperedge {1, 2, 3, 5}:
  • In the 2-pg, the CN of nodes 1, 2, 3, 5 is node 4, so #CN = 1.
  • In the 3-pg, the CN of nodes 12, 13, …, 35 is none, so #CN = 0.
  • In the 4-pg, the CN of nodes 123, …, 235 is none, so #CN = 0.
  • 4-order expansion feature vector: (1, 0, 0)
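The CN computation above can be sketched as follows, assuming (as in the paper) that the nodes of the n-pg are (n-1)-subsets and two such subsets are adjacent iff their union is an n-subset that has interacted; the helper names are mine:

```python
from itertools import combinations

def npg_neighbors(interactions, n):
    """Neighbor sets in the n-pg: nodes are (n-1)-subsets; two nodes are
    adjacent iff their union is an n-subset that has interacted."""
    nbrs = {}
    for e in interactions:
        for s in combinations(sorted(e), n):      # an interacting n-subset
            for u in combinations(s, n - 1):      # its (n-1)-subset nodes
                for v in combinations(s, n - 1):
                    if u != v:
                        nbrs.setdefault(u, set()).add(v)
    return nbrs

def cn_feature(nbrs, candidate, n):
    """# common neighbors over all (n-1)-subset nodes inside the candidate."""
    common = None
    for s in combinations(sorted(candidate), n - 1):
        ns = nbrs.get(s, set())
        common = set(ns) if common is None else common & ns
    return len(common) if common else 0

# The example hypergraph and candidate {1, 2, 3, 5} from this slide.
interactions = [{1, 2, 3, 4}, {1, 2, 3, 4}, {1, 2, 4}, {4, 5}]
features = [cn_feature(npg_neighbors(interactions, n), {1, 2, 3, 5}, n)
            for n in (2, 3, 4)]
```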
Prediction model: Features
• This is the list of features we used. For a candidate $c$, $E_n(c)$ denotes the n-pg edges within $c$, $v_n \subseteq c$ the n-pg nodes contained in $c$, and $N_n(v_n)$ the neighbors of $v_n$ in the n-pg.
• Mean variations:
  • Geometric mean (GM): $x_n(c) = \big( \prod_{e_n \in E_n(c)} \omega_n(e_n) \big)^{1/|E_n(c)|}$
  • Harmonic mean (HM): $x_n(c) = |E_n(c)| \big( \sum_{e_n \in E_n(c)} \omega_n(e_n)^{-1} \big)^{-1}$
  • Arithmetic mean (AM): $x_n(c) = \frac{1}{|E_n(c)|} \sum_{e_n \in E_n(c)} \omega_n(e_n)$
• Features widely used in link prediction:
  • Common neighbors (CN): $x_n(c) = \big| \bigcap_{v_n \subseteq c} N_n(v_n) \big|$
  • Jaccard coefficient (JC): $x_n(c) = \big| \bigcap_{v_n \subseteq c} N_n(v_n) \big| \, / \, \big| \bigcup_{v_n \subseteq c} N_n(v_n) \big|$
  • Adamic-Adar index (AA): $x_n(c) = \sum_{u_n \in \bigcap_{v_n \subseteq c} N_n(v_n)} \frac{1}{\log |N_n(u_n)|}$
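The three mean variations reduce the multiset of n-pg weights inside a candidate to a single number; a minimal sketch:

```python
import math

def mean_features(weights):
    """GM, HM, AM of the n-pg weights w_n(e_n) for e_n in E_n(c),
    i.e. the weights of the n-pg edges inside the candidate c."""
    k = len(weights)
    gm = math.prod(weights) ** (1 / k)            # geometric mean
    hm = k / sum(1 / w for w in weights)          # harmonic mean
    am = sum(weights) / k                         # arithmetic mean
    return gm, hm, am
```

GM and HM are dragged down by a single weak weight, so they behave differently from AM when only part of a candidate has interacted before.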
Prediction model: Classifier
• Classifier: logistic regression with L2 regularization
• Classifier input: feature vector from the n-order expansion
• Classifier output: 1/0
• How does prediction performance change as we increase n?
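To make the classifier concrete, here is a small self-contained stand-in (plain gradient descent on the L2-regularized logistic loss), not the authors' implementation; in practice one would use an off-the-shelf solver such as scikit-learn's `LogisticRegression`:

```python
import math

def train_logreg(X, y, lam=0.01, lr=0.1, epochs=500):
    """Logistic regression with L2 regularization, fit by gradient descent."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0       # L2 penalty gradient
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1 / (1 + math.exp(-z))             # predicted probability
            for j in range(d):
                gw[j] += (p - yi) * xi[j] / len(X)
            gb += (p - yi) / len(X)
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def predict(w, b, x):
    """Hard 1/0 output, as on the slide."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0
```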
Setup: Datasets
• 15 datasets from 8 domains
• Sizes range from about 1,000 to 2,500,000 hyperedges
1) Email: recipient addresses of an email
2) Contact: persons that appeared in face-to-face proximity
3) Drug components: classes or substances within a single drug, listed in the National Drug Code Directory
4) Drug use: drugs used by a patient, reported to the Drug Abuse Warning Network, before an emergency visit
5) US Congress: congress members cosponsoring a bill
6) Online tags: tags in a question in Stack Exchange forums
7) Online threads: users answering a question in Stack Exchange forums
8) Coauthorship: coauthors of a publication
Setup: Training and evaluation
• Training and test sets
  • First, generate the candidate set C.
  • Then split C into training (50%) and test (50%) sets.
• Performance metric: area under the precision-recall curve (AUC-PR)
  • Recall = (# true hyperedges I found) / (# true hyperedges): "How many true hyperedges can you find?"
  • Precision = (# true hyperedges I found) / (# hyperedges that I claim "true"): "How precisely can you find true hyperedges?"
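AUC-PR can be computed as the average precision over the ranking induced by the classifier's scores; a minimal sketch:

```python
def auc_pr(scores, labels):
    """Average precision: area under the precision-recall curve,
    accumulated as (R_k - R_{k-1}) * P_k at each rank k of a true candidate."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_true = sum(labels)
    tp, ap, prev_recall = 0, 0.0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            tp += 1
            precision = tp / rank        # # true found / # claimed "true" so far
            recall = tp / n_true         # # true found / # true overall
            ap += (recall - prev_recall) * precision
            prev_recall = recall
    return ap
```

Unlike accuracy, this metric stays informative at the skewed 1:5 and 1:10 candidate ratios used in the experiments.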
Results and messages (1)
(M1) More higher-order information leads to better prediction quality, but with diminishing returns.
[Plots: AUC-PR vs. expansion order; the gain from 2-order to 3-order is large, while further orders bring only small gains.]
Results and messages (2)
(M2) The harder the task, the more valuable higher-order information becomes.
• Hardness of the task: stars < cliques; 1:1 < 1:2 < 1:5 < 1:10
[Plots: the gain from higher-order information grows as the task gets harder.]
Results and messages (3)
(M3) Why is higher-order information more important in some datasets than in others?
Such datasets have the following properties:
(i) Higher-order information is more abundant.
(ii) Higher-order information shares less information with pairwise information.
• How to measure the abundance of 3-way information?
Edge density = (# edges in 3-pg) / (# all possible 3-way combinations) × 100%
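A sketch of the edge-density measure, under the simplifying assumption that each node triple that has interacted is counted once in the 3-pg (the function name is mine):

```python
from itertools import combinations
from math import comb

def edge_density_3pg(interactions):
    """Percentage of all possible node triples that have interacted together:
    # distinct interacting triples over C(|V|, 3), times 100."""
    nodes = set().union(*interactions)
    triples = {t for e in interactions if len(e) >= 3
               for t in combinations(sorted(e), 3)}
    return 100 * len(triples) / comb(len(nodes), 3)
```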
Results and messages (3)
• How to measure the information shared between 2-way and 3-way information?
1. Sample three nodes v1, v2, v3 from the hypergraph.
2. From the 2-pg, obtain W2 := (w2(v1, v2), w2(v2, v3), w2(v1, v3)).
3. From the 3-pg, obtain W3 := w3(v1, v2, v3).
• Mutual information I(W3; W2): the information shared between the 2-pg and the 3-pg.
• Conditional entropy H(W3 | W2): the information exclusive to the 3-pg.
Conclusion
• We asked and answered the following questions:
1) How much higher-order information is needed to accurately represent a hypergraph?
2) When is such higher-order information particularly useful?
3) Why is higher-order information more important in some datasets than in others?
• Our results could offer insights for future work on hypergraphs, e.g., achieving higher performance on hypergraph tasks with less computational complexity.
• Other examples of hypergraph tasks: node classification, node embedding.
Links
• Preprint: https://arxiv.org/pdf/2001.11181.pdf
• Source code & Supplementary document: https://github.com/granelle/www20-higher-order
• Datasets: https://www.cs.cornell.edu/~arb/data/