2. Hypergraphs are Everywhere
Icon made by Freepik from www.flaticon.com
Collaborations of Researchers Co-purchases of Items Joint Interactions of Proteins
• Hypergraphs consist of nodes and hyperedges.
• Each hyperedge is a subset of any number of nodes.
3. Our Questions
Q1 What are structural design principles of real-world hypergraphs?
Real-world Hypergraph Randomized Hypergraph
VS
4. Our Questions (cont.)
Q2 How can we compare local structures of hypergraphs of different sizes?
VS
Small Hypergraph Large Hypergraph
10. Hypergraph Motifs: Properties
Exhaustive
H-motifs capture connectivity patterns of all possible three connected
hyperedges.
Unique
Connectivity pattern of any three connected hyperedges is captured
by at most one h-motif.
Size Independent
H-motifs capture connectivity patterns independently of the sizes of
hyperedges.
11. Hypergraph Motifs: Properties
Question:
Why are non-pairwise relations considered?
Answer:
Non-pairwise relations play a key role in capturing the local
structural patterns of real-world hypergraphs.
𝒆 𝟐
𝒆 𝟏 𝒆 𝟒 𝒆 𝟑
𝒆 𝟏 𝒆 𝟒
𝑒), 𝑒*, 𝑒+ and 𝑒), 𝑒,, 𝑒+ have same pairwise
relations, while their connectivity patterns are
distinguished by h-motifs.
12. Characteristic Profiles
Question:
How can we summarize structural properties of hypergraphs?
Answer:
We compute a compact vector of normalized significance of
every h-motif.
13. Characteristic Profiles (cont.)
• Significance of h-motifs
∆0≔
𝑀 𝑡 − 𝑀5678 𝑡
𝑀 𝑡 + 𝑀5678 𝑡 + 𝜖
• Characteristic Profiles (CPs)
𝐶𝑃0 ≔
∆0
∑0>?
@A
∆0
@
# of instances of h-motif 𝑡
in the given hypergraph
# of instances of h-motif 𝑡
in randomized hypergraphs
15. MoCHy: Motif Counting in Hypergraphs
• Given a hypergraph, how can we count the instances of each h-motif?
• We present MoCHy (Motif Counting in Hypergraphs), a family of parallel
algorithms for counting the instances of h-motifs.
Hypergraph Projection
(Preprocessing Step)
MoCHy-A
MoCHy-E
MoCHy-A+
Exact Counting
Approximate Counting
16. MoCHy: Hypergraph Projection
• Every version of MoCHy builds the projected graph ̅𝐺 = 𝐸,∧, 𝜔 of
the input hypergraph 𝐺 = 𝑉, 𝐸 .
• To find the neighbors of each hyperedge 𝑒", find hyperedge 𝑒# that
contains any node 𝑣 ∈ 𝑒".
FH
L
K
B G
S
R
𝒆 𝟏
𝒆 𝟐
𝒆 𝟑
𝒆 𝟒
𝒆 𝟐 𝒆 𝟑
𝒆 𝟏 𝒆 𝟒
hyperwedge ∧*,
17. MoCHy-E: Exact Counting
𝒆 𝟐 𝒆 𝟒
𝒆 𝟑 𝒆 𝟓
𝒆 𝟏
• Enumerate all three connected hyperedges in the hypergraph.
• Increment the count of h-motif corresponding to each instance.
𝑒?, 𝑒@, 𝑒L
𝑒?, 𝑒@, 𝑒M
𝑒@, 𝑒L, 𝑒M
𝑒@, 𝑒L, 𝑒N
𝑒L, 𝑒M, 𝑒N
Definition
𝒉 𝒆𝒊, 𝒆𝒋, 𝒆 𝒌 : h-motif corresponding to an instance 𝑒", 𝑒#, 𝑒$
𝑴 𝒕 : count of h-motif 𝑡’s instances
𝑀 ℎ 𝑒?, 𝑒@, 𝑒L ↑
𝑀 ℎ 𝑒?, 𝑒@, 𝑒M ↑
𝑀 ℎ 𝑒@, 𝑒L, 𝑒M ↑
𝑀 ℎ 𝑒@, 𝑒L, 𝑒N ↑
𝑀 ℎ 𝑒L, 𝑒M, 𝑒N ↑
19. MoCHy-A: Hyperedge Sampling
𝒆 𝒔 𝒆 𝟑
𝒆 𝟐 𝒆 𝟒
𝒆 𝟏
Sampled hyperedge
𝑒U, 𝑒?, 𝑒@
𝑒U, 𝑒?, 𝑒L
𝑒U, 𝑒@, 𝑒L
𝑒U, 𝑒@, 𝑒M
• Sample 𝑠 hyperedges from the hyperedge set 𝐸 uniformly at
random.
• For each sampled hyperedge 𝑒U, count the number of instances
of each h-motif 𝑡 that contains 𝑒U.
• Rescale the total approximate counts based on the sample size.
𝒔
Rescale
20. MoCHy-A+: Hyperwedge Sampling
𝒆𝒊 𝒆 𝟐
𝒆𝒋 𝒆 𝟑
𝒆 𝟏
Sampled hyperwedge
𝑒", 𝑒#, 𝑒?
𝑒", 𝑒#, 𝑒@
𝑒", 𝑒#, 𝑒L
• Sample 𝑟 hyperwedges from the hyperwedge set ∧ uniformly at
random.
• For each sampled hyperwedge ∧"#= 𝑒", 𝑒# , count the number
of instances of each h-motif 𝑡 that contains ∧"#.
• Rescale the total approximate counts based on the sample size.
𝒓
Rescale
22. Experimental Settings
• 11 real-world hypergraphs from 5 different domains
• All versions of MoCHy implemented using C++ and OpenMP.
Co-authorship E-mail Contact Tags Threads
&
for parallelization
28. EXP3. Observations and Applications (cont.)
• Hyperedge prediction: classify real hyperedges and fake ones.
• H-motifs give informative features.
HM26 HM7 HC
Logistic Regression
ACC 0.754 0.656 0.636
AUC 0.813 0.693 0.691
Random Forest
ACC 0.768 0.741 0.639
AUC 0.852 0.779 0.692
Decision Tree
ACC 0.731 0.684 0.613
AUC 0.732 0.685 0.616
K-Nearest Neighbors
ACC 0.694 0.689 0.640
AUC 0.750 0.743 0.684
MLP Classifier
ACC 0.795 0.762 0.646
AUC 0.875 0.841 0.701
ACC: accuracy, AUC: area under the ROC curve
The number of each h-motif’s instances that
contain each hyperedge.
HM26 (∈ ℝ&'
)
The seven features with the largest variance
among those in HM26.
HM7 (∈ ℝ(
)
The mean, maximum, and minimum degree
and the mean, maximum, and minimum
number of neighbors of the nodes in each
hyperedge and its size.
HC (∈ ℝ()
29. EXP4. Performance of Counting Algorithms
• MoCHy-A+ is up to 25X more accurate than MoCHy-A.
• MoCHy-A+ is up to 32.5X faster than MoCHy-E.
26.3X
5.9X
0.00
0.01
0.02
0.03
0.04
0 2 4 6
Elapsed Time (sec)
RelativeError
28.2X
10.2X
0.00
0.01
0.02
0.03
0 20 40 60
Elapsed Time (sec)
RelativeError
11.1X
8.4X
0.00
0.02
0.04
0.06
0.08
0 2 4 6 8 10
Elapsed Time (sec)
RelativeError
32.5X
24.6X
0.00
0.01
0.02
0.03
0.04
0 1000 2000
Elapsed Time (sec)
RelativeError
24.2X
25X
0.00
0.02
0.04
0.06
0 200 400
Elapsed Time (sec)
RelativeError
7.6X
7.6X
0.00
0.05
0.10
0.15
0.0 0.1 0.2
Elapsed Time (sec)
RelativeError
Definition
Relative Error:
∑'()
*+
* + , -* +
∑'()
*+ * +
threads-ubuntu
contact-high email-Enron coauth-history
contact-primary email-Eu
30. EXP4. Performance of Counting Algorithms (cont.)
• CPs obtained by MoCHy-A+ are estimated near perfectly even with a
smaller number of samples.
-0.2
0.0
0.2
1 6 11 16 21 26
Hypergraph Motif Index
Normalized
Significance
-0.2
0.0
0.2
1 6 11 16 21 26
Hypergraph Motif Index
Normalized
Significance
-0.2
0.0
0.2
1 6 11 16 21 26
Hypergraph Motif Index
Normalized
Significance
MoCHy-A+ (! = ∧ ⋅ 0.01)
MoCHy-A+ (! = ∧ ⋅ 0.05)
MoCHy-A+ (! = ∧ ⋅ 0.001)
MoCHy-A+ (! = ∧ ⋅ 0.005)
MoCHy-E
coauth-history
contact-primary
email-Eu
31. EXP4. Performance of Counting Algorithms (cont.)
• Both MoCHy-E and MoCHy-A+ achieve significant speedups with
multiple threads.
0
1000
2000
1 2 3 4 5 6 7 8
Number of Threads
ElapsedTime(sec)
1
2
3
4
5
6
7
8
1 2 3 4 5 6 7 8
Number of Threads
Speedup
MoCHy-A+ (! = 1M)
MoCHy-A+ (! = 2M)
MoCHy-A+ (! = 4M)
MoCHy-A+ (! = 8M)
MoCHy-E
32. MoCHy-A+ (! = 1M)
MoCHy-A+ (! = 2M)
MoCHy-A+ (! = 4M)
MoCHy-A+ (! = 8M)
0
500
1000
1500
2000
0 0.1 1 10 100
Memory (% of Edges)
ElapsedTime(sec)
1
2
3
4
5
0 0.1 1 10 100
Memory (% of Edges)
Speedup
EXP4. Performance of Counting Algorithms (cont.)
• Memoizing a small fraction of projected graphs leads to significant
speedups of MoCHy-A+.