The document discusses higher-order link prediction, which aims to predict the formation of new groups or "simplices" containing more than two nodes, based on structural properties in timestamped simplex data from various domains. It finds that predicting the closure of open triangles (where a pair of nodes have interacted but not with the third) performs well, and that simply averaging the edge weights in a triangle is often a good predictor. Predicting new structures in communication, collaboration and proximity networks can provide insights beyond classical link prediction.
Presentation given at the International Conference on Scalable Uncertainty Management, 2-5 Oct 2018, Milan, Italy.
Paper: https://research.utwente.nl/en/publications/rule-based-conditioning-of-probabilistic-data
Data interoperability is a major issue in data management for data science and big data analytics. Probabilistic data integration (PDI) is a specific kind of data integration where extraction and integration problems such as inconsistency and uncertainty are handled by means of a probabilistic data representation. This allows a data integration process with two phases: (1) a quick partial integration where data quality problems are represented as uncertainty in the resulting integrated data, and (2) using the uncertain data and continuously improving its quality as more evidence is gathered. The main contribution of this paper is an iterative approach for incorporating evidence of users in the probabilistically integrated data. Evidence can be specified as hard or soft rules (i.e., rules that are uncertain themselves).
We present a linear regression method for predictions on a small data set, making use of a second, possibly biased data set that may be much larger. Our method fits linear regressions to the two data sets while penalizing the difference between predictions made by those two models.
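The penalized joint fit described in this abstract can be sketched as follows; this is an illustrative reconstruction, not the paper's exact formulation: the quadratic prediction-gap penalty `lam` and the stacked least-squares solution are my assumptions.

```python
import numpy as np

def joint_fit(X1, y1, X2, y2, lam=1.0):
    """Fit two linear models, one per data set, while penalizing the
    difference of their predictions on the small data set X1 (sketch)."""
    d = X1.shape[1]
    Z1 = np.hstack([X1, np.zeros_like(X1)])   # rows fitting the small set
    Z2 = np.hstack([np.zeros_like(X2), X2])   # rows fitting the large set
    P = np.sqrt(lam) * np.hstack([X1, -X1])   # prediction-gap penalty rows
    A = np.vstack([Z1, Z2, P])
    b = np.concatenate([y1, y2, np.zeros(len(X1))])
    w = np.linalg.lstsq(A, b, rcond=None)[0]
    return w[:d], w[d:]                       # (w1, w2)

rng = np.random.default_rng(0)
true_w = np.array([1.0, 2.0, 3.0])
X2 = rng.normal(size=(200, 3)); y2 = X2 @ true_w + 0.5   # large, biased set
X1 = rng.normal(size=(10, 3));  y1 = X1 @ true_w          # small set
w1, w2 = joint_fit(X1, y1, X2, y2, lam=5.0)
```

Larger `lam` pulls the two models' predictions together, letting the small-set model borrow strength from the larger set.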
Visual integrity needs to be verified when sending a picture: various received images are no longer original, and a small change to the pixels is not detectable by eye, so integrity validation is very important. A picture captured by a camera has two dimensions, described in pixels as width and length. This study validates all of the pixel data, i.e., the color intensity, in both dimensions: if a pixel is modified, the method gives a different hash value. The validator analyzes the pixels in every layer (red, green and blue) to ensure the transmitted data is correct; even a slight change in the pixels changes the computed value. This is very useful for comparing an image before and after transmission.
In a distributed medical system, building cross-site records while maintaining appropriate patient anonymity is essential. The distributed databases contain information about the same individuals, often described using the same variables, which frequently fail to match due to accidental distortions. In such cases, record linkage methods are used to find records that correspond to the same individual in order to create a consistent database.
Our goal was to find a solution to this problem. In this paper, we propose an anonymous identifier, based on combinations of the first two letters of the surname and name, the date of birth, and gender, which allows a de-identified merged dataset to be built from the multiple databases of a distributed medical system.
TWO PARTY HIERARCHICAL CLUSTERING OVER HORIZONTALLY PARTITIONED DATA SET (IJDKP)
Data mining is a task in which data is extracted from a large database and put into an understandable form or structure so that it can be used further. In this paper we present an approach by which the concept of hierarchical clustering is applied over a horizontally partitioned data set. We also explain the required algorithms, such as hierarchical clustering and algorithms for finding the minimum closest cluster, as well as two-party computation. Since privacy of data is of utmost importance these days, we present an approach for applying privacy preservation between two parties that distribute their data horizontally, and we describe the hierarchical clustering that we apply in the present method.
International Journal of Engineering Research and Development (IJERD), IJERD Editor
An enhanced fuzzy rough set based clustering algorithm for categorical data (eSAT Journals)
Abstract: In today's world everything is done digitally, so we have lots of raw data. These data are useful for predicting future events if we use them properly. Clustering is a technique in which we put closely related data together. Furthermore, there are several types of data: sequential, interval, categorical, etc. In this paper we show the problems with clustering categorical data using rough sets, and how we can overcome them with an improvement.
A statistical data fusion technique in virtual data integration environment (IJDKP)
Data fusion in the virtual data integration environment starts after detecting and clustering duplicated records from the different integrated data sources. It refers to the process of selecting or fusing attribute values from the clustered duplicates into a single record representing the real-world object. In this paper, a statistical technique for data fusion is introduced based on probabilistic scores from both the data sources and the clustered duplicates.
A Heterogeneous Population-Based Genetic Algorithm for Data Clustering (ijeei-iaes)
As a primary data mining method for knowledge discovery, clustering is a technique of classifying a dataset into groups of similar objects. The most popular method for data clustering, K-means, suffers from the drawback of requiring the number of clusters and their initial centers, which must be provided by the user. In the literature, several methods have been proposed, in the form of k-means variants, genetic algorithms, or combinations of the two, for calculating the number of clusters and finding proper cluster centers. However, none of these solutions has provided satisfactory results, and determining the number of clusters and the initial centers is still the main challenge in clustering. In this paper we present an approach to automatically generate these parameters to achieve optimal clusters, using a modified genetic algorithm operating on varied individual structures and using a new crossover operator. Experimental results show that our modified genetic algorithm is a more efficient alternative to the existing approaches.
Improved Slicing Algorithm For Greater Utility In Privacy Preserving Data Pub... (Waqas Tariq)
Several algorithms and techniques have been proposed in recent years for the publication of sensitive microdata. However, there is a trade-off to be considered between the level of privacy offered and the usefulness of the published data. Recently, slicing was proposed as a novel technique for increasing the utility of an anonymized published dataset by partitioning the dataset vertically and horizontally. This work proposes a novel technique to increase the utility of a sliced dataset even further by allowing overlapped clustering while maintaining the prevention of membership disclosure. It is further shown that using an alternative algorithm to Mondrian increases the efficiency of slicing. This paper shows through workload experiments that these improvements help preserve data utility better than traditional slicing.
Presentation for Network Biology SIG 2013 by Gang Su, University of Michigan, USA. “CoolMap Cytoscape App: Flexible Multi-scale Heatmap-Driven Molecular Network Exploration”
When spatial data are distributed across multiple servers, there is an obvious difficulty with computing the likelihood function without combining all the data onto one server. Therefore, it would be of interest to compute estimates of the spatial parameters based on decompositions of the spatial field into blocks, each block corresponding to one server. Two methods suggest themselves: a "between blocks" approach, in which each block is reduced to a single observation (or a low-dimensional summary) to facilitate calculation of a likelihood across blocks, or a "within blocks" approach, in which the likelihood is calculated for each block and then combined into an overall likelihood for the full process. In fact, I argue that a hybrid approach that combines both ideas is best. Theoretical calculations are provided for the statistical efficiency of each approach. In conclusion, I will present some thoughts on optimal sampling designs with distributed data.
Research on Haberman dataset also business required document (ManjuYadav65)
The aim was to analyze the Haberman dataset and apply visualization and exploratory data analysis (EDA) in R to understand the data and the outcomes for people suffering from breast cancer: what patterns there were, how time and experience helped patient survival, and how a wrong detection could cost a patient's life.
Gaining Confidence in Signalling and Regulatory Networks (Michael Stumpf)
Mathematical models of signalling and gene regulatory systems are abstractions of much more complicated processes. Even as more and larger data sets become available, we are not able to dispense entirely with mechanistic models of real-world processes; nor should we. However, trying to develop informative and realistic models of such systems typically involves suitable statistical inference methods, domain expertise and a modicum of luck. Except for cases where physical principles provide sufficient guidance, it will generally be possible to come up with a large number of potential models that are compatible with a given biological system and any finite amount of data generated from experiments on that system.
Here I will discuss how we can systematically evaluate potentially vast sets of mechanistic candidate models in light of experimental and prior knowledge about biological systems. This enables us to evaluate quantitatively the dependence of model inferences and predictions on the assumed model structures. Failure to consider the impact of structural uncertainty introduces biases into the analysis and potentially gives rise to misleading conclusions.
Modeling of neural image compression using gradient descent technology (theijes)
Similar to Simplicial closure and higher-order link prediction --- SIAMNS18 (20)
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will present on related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Opendatabay - Open Data Marketplace.pptx (Opendatabay)
Opendatabay.com unlocks the power of data for everyone. The Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
It is the first open hub for data enthusiasts to collaborate and innovate: a platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, Opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence, and leverages cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex: Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools, so you can effortlessly explore, discover, and access the data you need and focus on extracting valuable insights. Opendatabay also breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay: the marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components and processes them in topological order, one level at a time. This enables ranks to be calculated in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
Simplicial closure and higher-order link prediction --- SIAMNS18
1. 1
Joint work with
Rediet Abebe & Jon Kleinberg (Cornell)
Michael Schaub & Ali Jadbabaie (MIT)
Simplicial closure and higher-order link prediction
Austin R. Benson · Cornell
SIAM NS · July 13, 2018
Paper. arXiv 1802.06916
Data. bit.ly/sc-holp-data
Code. bit.ly/sc-holp-code
Slides. bit.ly/arb-SIAMAN18
2. Canonical networks are everywhere, but with a hidden purpose!
2
Collaboration: nodes are people/groups; edges link entities working together.
Communications: nodes are people/accounts; edges show info. exchange.
Physical proximity: nodes are people/animals; edges link those that interact in close proximity.
Drug compounds: nodes are substances; an edge connects substances that appear in the same drug.
3. Real-world systems are composed of “higher-order” interactions that we often reduce to pairwise ones.
3
Collaboration: nodes are people/groups; teams are made up of small groups.
Communications: nodes are people/accounts; emails often have several recipients, not just one.
Physical proximity: nodes are people/animals; people often gather in small groups.
Drug compounds: nodes are substances; drugs are made up of several substances.
4. There are many ways to mathematically represent the higher-order structure present in relational data.
4
• Hypergraphs [Berge 89]
• Set systems [Frankl 95]
• Affiliation networks [Feld 81, Newman-Watts-Strogatz 02]
• Multipartite networks [Lambiotte-Ausloos 05, Lind-Herrmann 07]
• Abstract simplicial complexes [Lim 15, Osting-Palande-Wang 17]
• Multilayer networks [Kivelä+ 14, Boccaletti+ 14, many others…]
• Meta-paths [Sun-Han 12]
• Motif-based representations [Benson-Gleich-Leskovec 15, 17]
• …
⟶ All lead to their own algorithms and methods for understanding dynamics.
Data representation is not the problem.
The problem is how do we evaluate and compare models and algorithms?
5. We propose “higher-order link prediction” as a similar framework for evaluation of higher-order models.
5
t1 : {1, 2, 3, 4}
t2 : {1, 3, 5}
t3 : {1, 6}
t4 : {2, 6}
t5 : {1, 7, 8}
t6 : {3, 9}
t7 : {5, 8}
t8 : {1, 2, 6}
Data.
Observe simplices up to some time t. Using this data, we want to predict what groups of > 2 nodes will appear in a simplex in the future.
[Figure: timeline of simplices on nodes 1-9 up to time t]
We predict structure that classical link
prediction would not even consider!
Possible applications
• Novel combinations of drugs for treatments.
• Group chat recommendation in social networks.
• Team formation.
6. We collected many datasets of timestamped simplices, where each simplex is a subset of nodes.
6
1. Coauthorship in different domains.
2. Emails with multiple recipients.
3. Tags on Q&A forums.
4. Threads on Q&A forums.
5. Contact/proximity measurements.
6. Musical artist collaboration.
7. Substance makeup and classification codes applied to drugs the FDA examines.
8. U.S. Congress committee memberships and bill sponsorship.
9. Combinations of drugs seen in patients in ER visits.
https://math.stackexchange.com/q/80181
bit.ly/sc-holp-data
7. Thinking of higher-order data as a weighted projected graph with “filled-in” structures is a convenient viewpoint.
7
[Figure: the example simplices on nodes 1-9 and the corresponding projected graph with edge weights]
t1 : {1, 2, 3, 4}
t2 : {1, 3, 5}
t3 : {1, 6}
t4 : {2, 6}
t5 : {1, 7, 8}
t6 : {3, 9}
t7 : {5, 8}
t8 : {1, 2, 6}
Data. Pictures to have in mind.
Projected graph W.
Wij = # of simplices containing nodes i and j.
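The projected-graph construction above is straightforward to code. A minimal sketch on the running example (the pair-count dictionary is my representation choice, not from the slides):

```python
from itertools import combinations
from collections import Counter

# Timestamped simplices t1..t8 from the running example.
simplices = [{1, 2, 3, 4}, {1, 3, 5}, {1, 6}, {2, 6},
             {1, 7, 8}, {3, 9}, {5, 8}, {1, 2, 6}]

# W[(i, j)] = number of simplices containing both i and j.
W = Counter()
for s in simplices:
    for i, j in combinations(sorted(s), 2):
        W[(i, j)] += 1

print(W[(1, 2)])  # nodes 1 and 2 co-appear in t1 and t8 -> 2
print(W[(5, 8)])  # nodes 5 and 8 co-appear only in t7 -> 1
```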
9. 9
[Figure: an open triangle vs. a closed triangle on nodes i, j, k]
Warm-up. What’s more common in data?
“Open triangle”: each pair has been in a simplex together, but all 3 nodes have never been in the same simplex.
“Closed triangle”: there is some simplex that contains all 3 nodes.
11. We still have variety with only 3-node simplices.
11
[Scatter plot: fraction of triangles open (y) vs. edge density in the projected graph (x), one marker per dataset (coauthorship, email, tags, threads, contact, congress, NDC, DAWN, music-rap-genius), restricted to exactly 3 nodes per simplex]
12. A simple model can account for open triangle variation.
12
• n nodes; only 3-node simplices; each {i, j, k} included i.i.d. with prob. p = 1/n^b.
• ⟶ always get Θ(pn^3) = Θ(n^(3-b)) closed triangles in expectation.
Proposition (rough).
• b < 1 ⟶ Θ(n^3) open triangles in expectation for large n.
• b > 1 ⟶ Θ(n^(3(2-b))) open triangles in expectation for large n.
• The number of open triangles grows faster than the number of closed triangles for b < 3/2.
[Plot: simulated fraction of triangles open vs. edge density in the projected graph for n = 25, 50, 100, 200 and b = 0.8, 0.82, 0.84, …, 1.8; larger b is a darker marker]
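The random-simplex model above is easy to simulate. A minimal sketch (my own; small n for speed, and `simulate` is a hypothetical helper name):

```python
import random
from itertools import combinations

def simulate(n=30, b=1.0, seed=0):
    """Include each 3-node set i.i.d. with p = 1/n**b, then count
    open vs. closed triangles in the projected graph."""
    rng = random.Random(seed)
    p = 1.0 / n ** b
    closed = {frozenset(t) for t in combinations(range(n), 3)
              if rng.random() < p}
    # Projected graph: an edge for every pair inside an included triple.
    edges = {frozenset(e) for t in closed for e in combinations(t, 2)}
    # Triangles in the projected graph that were never a simplex are open.
    open_count = sum(
        1 for t in combinations(range(n), 3)
        if frozenset(t) not in closed
        and all(frozenset(e) in edges for e in combinations(t, 2)))
    return open_count, len(closed)

open_tri, closed_tri = simulate(n=30, b=1.0)
```

Sweeping b and n in this way reproduces the qualitative picture on the slide: smaller b gives denser projected graphs and relatively more open triangles.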
13. Dataset domain separation also occurs at the local level.
13
• Randomly sample 100 egonets per dataset and measure
log of average degree and fraction of open triangles.
• Logistic regression model to predict domain
(coauthorship, tags, threads, email, contact).
• 75% model accuracy vs. 21% with random guessing.
15. Groups of nodes go through trajectories until finally reaching a “simplicial closure event.”
15
[Figure: the example simplices on nodes 1-9]
t1 : {1, 2, 3, 4}
t2 : {1, 3, 5}
t3 : {1, 6}
t4 : {2, 6}
t5 : {1, 7, 8}
t6 : {3, 9}
t7 : {5, 8}
t8 : {1, 2, 6}
[Figure: trajectory of the triple {1, 2, 6}: edge {1, 2} from {1, 2, 3, 4} at t1, then edges {1, 6} at t3 and {2, 6} at t4, and closure {1, 2, 6} at t8]
For this talk, we will focus on simplicial closure on 3 nodes.
16. Groups of nodes go through trajectories until finally reaching a “simplicial closure event.”
16
Substances in marketed drugs recorded in the National Drug Code directory.
[Figure: closure trajectory of three substances (HIV protease inhibitors, UGT1A1 inhibitors, breast cancer resistance protein inhibitors) through marketed drugs: Reyataz (RedPharm, 2003), Reyataz (Squibb & Sons, 2003), Kaletra (Physicians Total Care, 2006), Promacta (GSK 25mg, 2008), Promacta (GSK 50mg, 2008), Kaletra (DOH Central Pharmacy, 2009), Evotaz (Squibb & Sons, 2015), with tie strengths labeled 1 or 2+]
We bin weighted edges into “weak” and “strong” ties in the projected graph W.
Wij = # of simplices containing nodes i and j.
• Weak ties. Wij = 1 (exactly one simplex contains i and j).
• Strong ties. Wij ≥ 2 (at least two simplices contain i and j).
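A sketch of the weak/strong binning on the running example (the set-based representation is my choice, not from the slides):

```python
from itertools import combinations
from collections import Counter

simplices = [{1, 2, 3, 4}, {1, 3, 5}, {1, 6}, {2, 6},
             {1, 7, 8}, {3, 9}, {5, 8}, {1, 2, 6}]

# W maps each node pair to the number of simplices containing it.
W = Counter(frozenset(e) for s in simplices
            for e in combinations(s, 2))

weak = {e for e, w in W.items() if w == 1}    # exactly one simplex
strong = {e for e, w in W.items() if w >= 2}  # at least two simplices

print(frozenset({1, 2}) in strong)  # True: {1,2,3,4} and {1,2,6}
```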
18. Simplicial closure depends on structure in projected graph.
18
• First 80% of the data (in time) ⟶ record configurations of triplets not in closed triangle.
• Remainder of data ⟶ find fraction that are now closed triangles.
Increased edge density
increases closure probability.
Increased tie strength
increases closure probability.
Tension between edge
density and tie strength.
Left and middle observations are consistent with theory and empirical studies of social networks.
[Granovetter 73; Leskovec+ 08; Backstrom+ 06; Kossinets-Watts 06]
20. We propose “higher-order link prediction” as a similar framework for evaluating higher-order models.
t1 : {1, 2, 3, 4}
t2 : {1, 3, 5}
t3 : {1, 6}
t4 : {2, 6}
t5 : {1, 7, 8}
t6 : {3, 9}
t7 : {5, 8}
t8 : {1, 2, 6}
Data.
Observe simplices up to some time t. Using this data, we want to predict which groups of > 2 nodes will appear in a simplex in the future.
We predict structure that classical link
prediction would not even consider!
21. Our structural analysis tells us what we should be looking at for prediction.
1. Edge density is a positive indicator.
⟶ focus our attention on predicting which open
triangles become closed triangles.
2. Tie strength is a positive indicator.
⟶ various ways of incorporating this information
[Diagram: open triangle on nodes i, j, k with edge weights Wij, Wjk, Wik.]
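A minimal sketch of enumerating open triangles (all three pairwise edges present in the projected graph, but no simplex ever contained all three nodes); the code and names are my own:

```python
from itertools import combinations

def open_triangles(simplices):
    """Triples (i, j, k) whose three edges all appear in the projected
    graph but that never appear together in one simplex."""
    edges, closed = set(), set()
    for s in simplices:
        nodes = sorted(set(s))
        edges.update(frozenset(p) for p in combinations(nodes, 2))
        closed.update(frozenset(t) for t in combinations(nodes, 3))
    nbrs = {}
    for e in edges:
        u, v = tuple(e)
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    out = set()
    for i in nbrs:
        for j, k in combinations(sorted(nbrs[i]), 2):
            # i < j < k so each triangle is reported once.
            if i < j and k in nbrs[j] and frozenset((i, j, k)) not in closed:
                out.add((i, j, k))
    return out
```

On the slide-15 example, {1, 5, 8} is the unique open triangle: edges {1,5}, {1,8}, {5,8} all exist, but no simplex contains all three nodes.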
22. For every open triangle, we assign a score function on the first 80% of data based on structural properties.
Four broad classes of score functions for an open triangle.
Score s(i, j, k)…
1. is a function of Wij, Wjk, Wik
arithmetic mean, geometric mean, etc.
2. looks at common neighbors of the three nodes.
generalized Jaccard, Adamic-Adar, etc.
3. uses “whole-network” similarity scores on projected graph
sum of PageRank or Katz scores amongst edges
4. is learned from data
train a logistic regression model with features
After computing scores, predict that open triangles with highest
scores will be closed triangles in final 20% of data.
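The rank-and-evaluate step can be sketched in a few lines (illustrative code with made-up scores and outcomes, not the paper's evaluation harness):

```python
def precision_at_k(open_tris, score, closed_later, k):
    """Rank open triangles by score; return the fraction of the top k
    that actually close in the held-out final 20% of the data."""
    top = sorted(open_tris, key=score, reverse=True)[:k]
    return sum(t in closed_later for t in top) / k

# Toy illustration: three open triangles, hypothetical scores,
# and one that closes in the held-out period.
tris = [(1, 2, 3), (1, 2, 4), (2, 3, 5)]
s = {(1, 2, 3): 3.0, (1, 2, 4): 1.0, (2, 3, 5): 2.0}
p = precision_at_k(tris, s.get, {(1, 2, 3)}, k=2)
```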
24. A few lessons learned from applying all of these ideas.
1. We can predict pretty well on all datasets using some method.
→ 4x to 107x better than random with respect to mean average precision, depending on the dataset/method.
2. Thread co-participation and co-tagging on stack exchange are
consistently easy to predict.
3. Simply averaging Wij, Wjk, and Wik consistently performs well.
25. Generalized means of edge weights are often good predictors of new 3-node simplices appearing.
[Figure: relative performance of the generalized-mean predictor as a function of p (from -4 to 4) on each dataset: coauthorship (DBLP, MAG geology, MAG history), tags and threads (stack-overflow, math-sx, ask-ubuntu), email (Enron, Eu), contact (high school, primary school), congress (bills, committees), NDC (substances, classes), DAWN, and music-rap-genius; the harmonic, geometric, and arithmetic means are marked.]
Good performance from this local information is a deviation from classical link prediction, where methods that use long paths (e.g., PageRank) perform well [Liben-Nowell & Kleinberg 07].
For structures on k nodes, the subsets of size k-1 contain rich information only when k > 2.
score_p(i, j, k) = (Wij^p + Wjk^p + Wik^p)^(1/p)
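A small sketch of the generalized-mean score (written here as a proper mean with a 1/3 factor; for any fixed p this only rescales the slide's formula by a constant, so the ranking of open triangles is unchanged):

```python
from math import prod

def score_p(wij, wjk, wik, p):
    """Generalized p-mean of the three edge weights.
    p = -1: harmonic mean, p -> 0: geometric mean, p = 1: arithmetic mean."""
    ws = (wij, wjk, wik)
    if p == 0:
        return prod(ws) ** (1 / 3)  # limit as p -> 0
    return (sum(w ** p for w in ws) / 3) ** (1 / p)
```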
26. There are lots of opportunities in studying this data.
1. Higher-order data is pervasive!
We have ways to represent the data, and higher-order link prediction is a general framework for comparing models and methods.
2. There is rich static and temporal structure in the datasets we collected.
3. Do you have a better model, algorithm, or data representation?
Great! Show us by out-performing these baselines.
Data. bit.ly/sc-holp-data
Code. bit.ly/sc-holp-code
27. Simplicial closure and higher-order link prediction.
Paper. arXiv 1802.06916
Data. bit.ly/sc-holp-data
Code. bit.ly/sc-holp-code
Slides. bit.ly/arb-SIAMAN18
Austin R. Benson
http://cs.cornell.edu/~arb
@austinbenson
arb@cs.cornell.edu
THANKS!
28. Most open triangles do not come from asynchronous
temporal behavior.
In 61.1% to 97.4% of open triangles, all three pairs of edges have an overlapping period of activity
⟶ by Helly’s theorem, there is then a single period of activity shared by all 3 edges.
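The 1-D case of Helly's theorem used here — if the three edges' activity intervals pairwise overlap, they share a common window — can be checked directly (a self-contained sketch with made-up intervals):

```python
from itertools import combinations

def pairwise_overlap(a, b):
    """Closed intervals (start, end) overlap iff neither ends before the other starts."""
    return a[0] <= b[1] and b[0] <= a[1]

def common_overlap(intervals):
    """A common intersection exists iff the latest start <= the earliest end."""
    return max(s for s, _ in intervals) <= min(e for _, e in intervals)

# Hypothetical activity periods of the three edges of an open triangle.
ivals = [(0, 5), (3, 9), (4, 7)]
all_pairs = all(pairwise_overlap(a, b) for a, b in combinations(ivals, 2))
```

Here all three pairs overlap, and indeed the intervals share the common window [4, 5].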
Fraction of open triangles by # of pairwise activity overlaps (columns: 0, 1, 2, 3):
Dataset   # open triangles   0   1   2   3
coauth-DBLP 1,295,214 0.012 0.143 0.123 0.722
coauth-MAG-history 96,420 0.002 0.055 0.059 0.884
coauth-MAG-geology 2,494,960 0.010 0.128 0.109 0.753
tags-stack-overflow 300,646,440 0.002 0.067 0.071 0.860
tags-math-sx 2,666,353 0.001 0.040 0.049 0.910
tags-ask-ubuntu 3,288,058 0.002 0.088 0.085 0.825
threads-stack-overflow 99,027,304 0.001 0.034 0.037 0.929
threads-math-sx 11,294,665 0.001 0.038 0.039 0.922
threads-ask-ubuntu 136,374 0.000 0.020 0.023 0.957
NDC-substances 1,136,357 0.020 0.196 0.151 0.633
NDC-classes 9,064 0.022 0.191 0.136 0.652
DAWN 5,682,552 0.027 0.216 0.155 0.602
congress-committees 190,054 0.001 0.046 0.058 0.895
congress-bills 44,857,465 0.003 0.063 0.113 0.821
email-Enron 3,317 0.008 0.130 0.151 0.711
email-Eu 234,600 0.010 0.131 0.132 0.727
contact-high-school 31,850 0.000 0.015 0.019 0.966
contact-primary-school 98,621 0.000 0.012 0.014 0.974
music-rap-genius 70,057 0.028 0.221 0.141 0.611
29. Simplicial closure probability on 4 nodes has similar behavior to that on 3 nodes, just “up one dimension.”
LA/OPT Seminar
Take the first 80% of the data (in time), record the configuration of every 4 nodes, and compute the fraction that simplicially close in the final 20% of the data.
• Increased edge density increases closure probability.
• Increased simplicial tie strength increases closure probability.
• Tension between simplicial density and simplicial tie strength.