SlideShare a Scribd company logo
1
Joint work with
Rediet Abebe & Jon Kleinberg (Cornell)
Michael Schaub & Ali Jadbabaie (MIT)
Simplicial closure and higher-order link prediction
Austin R. Benson · Cornell
SIAM NS · July 13, 2018
Paper. arXiv 1802.06916
Data. bit.ly/sc-holp-data
Code. bit.ly/sc-holp-code
Slides. bit.ly/arb-SIAMAN18
Canonical networks are everywhere slide,
but with a hidden purpose!
2
Collaboration
nodes are people/groups
edges link entities
working together
Communications
nodes are people/accounts
edges show info.exchange
Physical proximity
nodes are people/animals
edges link those that interact
in close proximity
Drug compounds
nodes are substances
edge between substances that
appear in the same drug
Real-world systems are composed of“higher-order”
interactions that we often reduce to pairwise ones.
3
Collaboration
nodes are people/groups
teams are made up of
small groups
Communications
nodes are people/accounts
emails often have several
recipients,not just one
Physical proximity
nodes are people/animals
people often gather in
small groups
Drug compounds
nodes are substances
drugs are made up of
several substances
There are many ways to mathematically represent the
higher-order structure present in relational data.
4
• Hypergraphs [Berge 89]
• Set systems [Frankl 95]
• Affiliation networks [Feld 81,Newman-Watts-Strogatz 02]
• Multipartite networks [Lambiotte-Ausloos 05,Lind-Herrmann 07]
• Abstract simplicial complexes [Lim 15,Osting-Palande-Wang 17]
• Multilayer networks [Kivelä+ 14,Boccaletti+ 14,many others…]
• Meta-paths [Sun-Han 12]
• Motif-based representations [Benson-Gleich-Leskovec 15,17]
• …
⟶ All lead to their own algorithms and methods for understanding dynamics.
Data representation is not the problem.
The problem is how do we evaluate and compare models and algorithms?
We propose“higher-order link prediction”as a similar
framework for evaluation of higher-order models.
5
t1 : {1, 2, 3, 4}
t2 : {1, 3, 5}
t3 : {1, 6}
t4 : {2, 6}
t5 : {1, 7, 8}
t6 : {3, 9}
t7 : {5, 8}
t8 : {1, 2, 6}
Data.
Observe simplices up to some
time t. Using this data, we want
to predict what groups of > 2
nodes will appear in a simplex
in the future.
t
1
2
3
4
5
6
7
8
9
We predict structure that classical link
prediction would not even consider!
Possible applications
• Novel combinations of drugs for treatments.
• Group chat recommendation in social networks.
• Team formation.
We collected many datasets of timestamped simplices,
where each simplex is a subset of nodes.
6
1. Coauthorship in different domains.
2. Emails with multiple recipients.
3. Tags on Q&A forums.
4. Threads on Q&A forums.
5. Contact/proximity measurements.
6. Musical artist collaboration.
7. Substance makeup and
classification codes applied to
drugs the FDA examines.
8. U.S. Congress committee
memberships and bill sponsorship.
9. Combinations of drugs seen in
patients in ER visits. https://math.stackexchange.com/q/80181
bit.ly/sc-holp-data
Thinking of higher-order data as a weighted projected
graph with“filled-in”structures is a convenient viewpoint.
7
1
2
3
4
5
6
7
8
9
1
2
3
4
5
6
7
8
9
2
2
1
1
1
1
1
1
1
1
1
1
1
1
1
t1 : {1, 2, 3, 4}
t2 : {1, 3, 5}
t3 : {1, 6}
t4 : {2, 6}
t5 : {1, 7, 8}
t6 : {3, 9}
t7 : {5, 8}
t8 : {1, 2, 6}
Data. Pictures to have in mind.
Projected graph W.
Wij = # of simplices containing nodes i and j.
8
5
23
16
20
9
i
j k
i
j k
Warm-up. What’s more common in data?
or
“Open triangle”
each pair has been in a simplex
together but all 3 nodes have
never been in the same simplex
“Closed triangle”
there is some simplex that
contains all 3 nodes
music-rap-genius
NDC-substances
NDC-classes
DAWN
coauth-DBLP
coauth-MAG-geology
coauth-MAG-history
congress-bills
congress-committees
tags-stack-overflow
tags-math-sx
tags-ask-ubuntu
email-Eu
email-Enron
threads-stack-overflow
threads-math-sx
threads-ask-ubuntu
contact-high-school
contact-primary-school
10 5
10 4
10 3
10 2
10 1
Edge density in projected graph
0.00
0.25
0.50
0.75
1.00
Fractionoftrianglesopen
There is lots of variation in the fraction of triangles that
are open,but datasets from the same domain are similar.
10
See also [Patania-Petri-Vaccarino 17] for fraction of open
triangles on several arxiv collaboration networks.
We still have variety with only 3-node simplices.
11
music-rap-genius
NDC-substances
NDC-classes
DAWN
coauth-DBLP
coauth-MAG-geology
coauth-MAG-history
congress-bills
congress-committees
tags-stack-overflow
tags-math-sx
tags-ask-ubuntu
email-Eu
email-Enron
threads-stack-overflow
threads-math-sx
threads-ask-ubuntu
contact-high-school
contact-primary-school
10 5
10 4
10 3
10 2
10 1
Edge density in projected graph
0.00
0.25
0.50
0.75
1.00
Fractionoftrianglesopen
Exactly 3 nodes per simplex
A simple model can account for open triangle variation.
12
• n nodes; only 3-node simplices; {i, j, k} included with prob. p = 1 / nb i.i.d.
• ⟶ always get ϴ(pn3) = ϴ(n3 - b) closed triangles in expectation.
b = 0.8, 0.82, 0.84, ..., 1.8
Larger b is darker marker.
Proposition (rough).
• b < 1 ⟶ ϴ(n3) open triangles in
expectation for large n.
• b > 1 ⟶ ϴ(n3(2-b)) open triangles
in expectation for large n.
• The number of open triangles
grows faster for b < 3/2.
10 1
100
Edge density in projected graph
0.00
0.25
0.50
0.75
1.00
Fractionoftrianglesopen
Exactly 3 nodes per simplex (simulated)
n = 200
n = 100
n = 50
n = 25
Dataset domain separation also occurs at the local level.
13
• Randomly sample 100 egonets per dataset and measure
log of average degree and fraction of open triangles.
• Logistic regression model to predict domain
(coauthorship, tags, threads, email, contact).
• 75% model accuracy vs. 21% with random guessing.
14
Simplicial closure.
How do new simplices appear?
How do new closed triangles appear?
Groups of nodes go through trajectories until finally
reaching a“simplicial closure event.”
15
1
2
3
4
5
6
7
8
9
t1 : {1, 2, 3, 4}
t2 : {1, 3, 5}
t3 : {1, 6}
t4 : {2, 6}
t5 : {1, 7, 8}
t6 : {3, 9}
t7 : {5, 8}
t8 : {1, 2, 6}
1
2 6
1
2 6
1
2 6
1
2 6
1
2 6
{1, 2, 3, 4}
t1
{1, 6}
t3
{2, 6}
t4
{1, 2, 6}
t8
For this talk, we will focus on simplicial closure on 3 nodes.
Groups of nodes go through trajectories until finally
reaching a“simplicial closure event.”
16
Substances in marketed drugs recorded in the National Drug Code directory.
HIV protease
inhibitors
UGT1A1
inhibitors
Breast cancer
resistance protein inhibitors
1
2+
2+
1
2+
1
1
2+
2+
1
2+
2+
2+
Reyataz
RedPharm
2003
Reyataz
Squibb & Sons
2003
Kaletra
Physicians
Total Care
2006
Promacta
GSK (25mg)
2008
Promacta
GSK (50mg)
2008
Kaletra
DOH Central
Pharmacy
2009
Evotaz
Squibb & Sons
2015
We bin weighted edges into “weak” and “strong ties” in the projected graph W.
Wij = # of simplices containing nodes i and j.
• Weak ties. Wij = 1 (one simplex contains i and j)
• Strong ties. Wij > 2 (at least two simplices contain i and j)
1
2+1
1
2+
1
1
1
1
2+
2+
2+
1
1
2+
2+
1
2+
2+
2+
245,996
74,219
14,541
7,575
5,781
773
3,560
952
445
389
618
285
2,732,839
157,236
66,644
7,987
8,844
328
3,171
779
722
285
We can also study temporal dynamics in aggregate.
17
Coauthorship data of scholars publishing in history.
Wij = # of simplices containing nodes i and j.
Most groups formed have no previous interaction.
Open triangle of all weak
ties more likely to form a
strong tie before closing.
Simplicial closure depends on structure in projected graph.
18
• First 80% of the data (in time) ⟶ record configurations of triplets not in closed triangle.
• Remainder of data ⟶ find fraction that are now closed triangles.
Increased edge density
increases closure probability.
Increased tie strength
increases closure probability.
Tension between edge
density and tie strength.
Left and middle observations are consistent with theory and empirical studies of social networks.
[Granovetter 73; Leskovec+ 08; Backstrom+ 06; Kossinets-Watts 06]
Closure probability Closure probability Closure probability
19
Higher-order link prediction.
How can we predict which new simplices appear?
How can we predict which triangles become closed?
We propose“higher-order link prediction”as a similar
framework for evaluation of higher-order models.
20
t1 : {1, 2, 3, 4}
t2 : {1, 3, 5}
t3 : {1, 6}
t4 : {2, 6}
t5 : {1, 7, 8}
t6 : {3, 9}
t7 : {5, 8}
t8 : {1, 2, 6}
Data.
Observe simplices up to some
time t. Using this data, want to
predict what groups of > 2
nodes will appear in a simplex
in the future.
t
1
2
3
4
5
6
7
8
9
We predict structure that classical link
prediction would not even consider!
21
Our structural analysis tells us what we should be
looking at for prediction.
1. Edge density is a positive indicator.
⟶ focus our attention on predicting which open
triangles become closed triangles.
2. Tie strength is a positive indicator.
⟶ various ways of incorporating this information
i
j k
Wij
Wjk
Wjk
22
For every open triangle,we assign a score function on
first 80% of data based on structural properties.
Four broad classes of score functions for an open triangle.
Score s(i, j, k)…
1. is a function of Wij, Wjk, Wjk
arithmetic mean, geometric mean, etc.
2. looks at common neighbors of the three nodes.
generalized Jaccard, Adamic-Adar, etc.
3. uses “whole-network” similarity scores on projected graph
sum of PageRank or Katz scores amongst edges
4. is learned from data
train a logistic regression model with features
i
j k
Wij
Wjk
Wjk
After computing scores, predict that open triangles with highest
scores will be closed triangles in final 20% of data.
23
24
A few lessons learned from applying all of these ideas.
1. We can predict pretty well on all datasets using some method.
→ 4x to 107x better than random w/r/t mean average precision
depending on the dataset/method
2. Thread co-participation and co-tagging on stack exchange are
consistently easy to predict.
3. Simply averaging Wij, Wjk, and Wik consistently performs well.
i
j k
Wij
Wjk
Wjk
Generalized means of edges weights are often good
predictors of new 3-node simplices appearing.
25
music-rap-genius
NDC-substances
NDC-classes
DAWN
coauth-DBLP
coauth-MAG-geology
coauth-MAG-history
congress-bills
congress-committees
tags-stack-overflow
tags-math-sx
tags-ask-ubuntu
email-Eu
email-Enron
threads-stack-overflow
threads-math-sx
threads-ask-ubuntu
contact-high-school
contact-primary-school
harmonic geometric arithmetic
p
4 3 2 1 0 1 2 3 4
0
20
40
60
80
Relativeperformance
4 3 2 1 0 1 2 3 4
p
2.5
5.0
7.5
10.0
12.5
Relativeperformance
4 3 2 1 0 1 2 3 4
p
1.0
1.5
2.0
2.5
3.0
3.5
Relativeperformance
Good performance from this local information is a deviation from classical link prediction,where
methods that use long paths (e.g.,PageRank) perform well [Liben-Nowell & Kleinberg 07].
For structures on k nodes,the subsets of size k-1 contain rich information only when k > 2.
i
j k
Wij
Wjk
Wjk
i
j k
?
scorep(i, j, k)
= (Wp
ij + Wp
jk + Wp
ik)1/p
<latexit sha1_base64="wECyDT1irjpegMdv/Iox6i4U4iU=">AAAHdXicfVVtb9s2EFa7Lem0t7T7OAxglzlIO/ktXZZkQAADK4oVa7FsdpoCoZtR0sliTEoqSTV2Cf2o/Zph37Zfsa872k5jOdkI2DqR99zDu3tIhYXg2nQ6f966/d77H6yt3/nQ/+jjTz79bOPuvRc6L1UEx1EucvUyZBoEz+DYcCPgZaGAyVDASTj+wa2fvAGleZ4NzLSAoWSjjCc8YganzjZ+2qIGJsbqKFdQnRXbPCDnARk/IJT6W/R1yWJySLZPziw/r14V5BuC5vn4ncmd+eCV7baL6mxjs9PqzAa5bnQXxqa3GEdnd9fu0TiPSgmZiQTT+rTbKczQMmV4JKDyaamhYNGYjeAUzYxJ0EM7y7oiDZyJSZIr/GWGzGb9ZQjGUWxai2INC0vB1KQ+G+b5GFd05dcpTbI/tDwrSgNZNGdMSkFMTlwtScwVREZMSZ3W8PHbIOMRJIpFAZNaMpMGBXfbDMz4bXOkWJEGko0hAiGupuabcnDBQ8XU1GWQX+ggxMgjlZdZrIOCGQMq04g3ik8CnbICdJBwE0RMRO49dphC5EYyNdb/FbUlwTBcnBVOgLGDMjHwK8SVVRDf3+/cDwXyLnuYFEYKIKvs7OF8LlJuYMUnFCVU1v0vefgNkhpT6O/bbVRcSxuMDZMoZdkIWlEu269L0E6Uut39bvdg56CtQXLUbohSlc0LbtKmS6LJs2aICgc183u0tzl/+NQVlOEJcPXx6UjkIRMUX6mD9SDTpYJenAvsfw/1H+UxHFIFgk0usTluvq6h00F3aF3jnABqXT4a9FnmiqsggwtMQLIstjRhkotpDAkrhaks1cmlXReJTpwqKr+xTKaxgxAfdloHQSQ5kqIsBCoeCcxEJy5EPUmMTTMzcaF6c7DVD0/xqO0Oq9WkHgOeMQX9qQxz8QRTsvMourI/P39W2cxRSF5ZWVmO26V9MDc540S8CgkXkAWHA/TLENtpStfSmwlWGfpPnruSXBIMurXy2XBSWS2uSJzzHG2foqerARNFyqqrrf72dKXq8UgAj9LmvPY3rWCjNd4u9etBujDLXZZ9PpLIROeqcuEsDaWl8/nqmizkM7yT45sQi4WqTvGQTkKmTlF8NA3ziaVv3H/Dp6kqBZAU+Cg1eLnu7RaGNMggBcIiUzJBEObTMd4QndbOLkwa5HI0yGP8nrAsAhKCucDz63wJkhE9K6M/p2r4hMwCNDutLsjGJbqf5gqrw7MRyTOCoiICEkM0j8EhlvLa7FbvguD9/+h/g6hZJrMolasCfkW6q9+M68aLnVYXt/fLt5u9/cX35I73hfeVt+11vT2v5/3oHXnHXuT97v3h/eX9vfbP+pfrX69vzV1v31pgPvdqY739L4qSnVk=</latexit><latexit sha1_base64="wECyDT1irjpegMdv/Iox6i4U4iU=">AAAHdXicfVVtb9s2EFa7Lem0t7T7OAxglzlIO/ktXZZkQAADK4oVa7FsdpoCoZtR0sliTEoqSTV2Cf2o/Zph37Zfsa872k5jOdkI2DqR99zDu3tIhYXg2nQ6f966/d77H6yt3/nQ/+jjTz79bOPuvRc6L1UEx1EucvUyZBoEz+DYcCPgZaGAyVDASTj+wa2fvAGleZ4NzLSAoWSjjCc8YganzjZ+2qIGJsbqKFdQnRXbPCDnARk/IJT6W/R1yWJySLZPziw/r14V5BuC5vn4ncmd+eCV7baL6mxjs9PqzAa5bnQXxqa3GEdnd9fu0TiPSgmZiQTT+rTbKczQMmV4JKDyaamhYNGYjeAUzYxJ0EM7y7oiDZyJSZIr/GWGzGb9ZQjGUWxai2INC0vB1KQ+G+b5GFd05dcpTbI/tDwrSgNZNGdMSkFMTlwtScwVREZMSZ3W8PHbIOMRJIpFAZNaMpMGBXfbDMz4bXOkWJEGko0hAiGupuabcnDBQ8XU1GWQX+ggxMgjlZdZrIOCGQMq04g3ik8CnbICdJBwE0RMRO49dphC5EYyNdb/FbUlwTBcnBVOgLGDMjHwK8SVVRDf3+/cDwXyLnuYFEYKIKvs7OF8LlJuYMUnFCVU1v0vefgNkhpT6O/bbVRcSxuMDZMoZdkIWlEu269L0E6Uut39bvdg56CtQXLUbohSlc0LbtKmS6LJs2aICgc183u0tzl/+NQVlOEJcPXx6UjkIRMUX6mD9SDTpYJenAvsfw/1H+UxHFIFgk0usTluvq6h00F3aF3jnABqXT4a9FnmiqsggwtMQLIstjRhkotpDAkrhaks1cmlXReJTpwqKr+xTKaxgxAfdloHQSQ5kqIsBCoeCcxEJy5EPUmMTTMzcaF6c7DVD0/xqO0Oq9WkHgOeMQX9qQxz8QRTsvMourI/P39W2cxRSF5ZWVmO26V9MDc540S8CgkXkAWHA/TLENtpStfSmwlWGfpPnruSXBIMurXy2XBSWS2uSJzzHG2foqerARNFyqqrrf72dKXq8UgAj9LmvPY3rWCjNd4u9etBujDLXZZ9PpLIROeqcuEsDaWl8/nqmizkM7yT45sQi4WqTvGQTkKmTlF8NA3ziaVv3H/Dp6kqBZAU+Cg1eLnu7RaGNMggBcIiUzJBEObTMd4QndbOLkwa5HI0yGP8nrAsAhKCucDz63wJkhE9K6M/p2r4hMwCNDutLsjGJbqf5gqrw7MRyTOCoiICEkM0j8EhlvLa7FbvguD9/+h/g6hZJrMolasCfkW6q9+M68aLnVYXt/fLt5u9/cX35I73hfeVt+11vT2v5/3oHXnHXuT97v3h/eX9vfbP+pfrX69vzV1v31pgPvdqY739L4qSnVk=</latexit><latexit sha1_base64="wECyDT1irjpegMdv/Iox6i4U4iU=">AAAHdXicfVVtb9s2EFa7Lem0t7T7OAxglzlIO/ktXZZkQAADK4oVa7FsdpoCoZtR0sliTEoqSTV2Cf2o/Zph37Zfsa872k5jOdkI2DqR99zDu3tIhYXg2nQ6f966/d77H6yt3/nQ/+jjTz79bOPuvRc6L1UEx1EucvUyZBoEz+DYcCPgZaGAyVDASTj+wa2fvAGleZ4NzLSAoWSjjCc8YganzjZ+2qIGJsbqKFdQnRXbPCDnARk/IJT6W/R1yWJySLZPziw/r14V5BuC5vn4ncmd+eCV7baL6mxjs9PqzAa5bnQXxqa3GEdnd9fu0TiPSgmZiQTT+rTbKczQMmV4JKDyaamhYNGYjeAUzYxJ0EM7y7oiDZyJSZIr/GWGzGb9ZQjGUWxai2INC0vB1KQ+G+b5GFd05dcpTbI/tDwrSgNZNGdMSkFMTlwtScwVREZMSZ3W8PHbIOMRJIpFAZNaMpMGBXfbDMz4bXOkWJEGko0hAiGupuabcnDBQ8XU1GWQX+ggxMgjlZdZrIOCGQMq04g3ik8CnbICdJBwE0RMRO49dphC5EYyNdb/FbUlwTBcnBVOgLGDMjHwK8SVVRDf3+/cDwXyLnuYFEYKIKvs7OF8LlJuYMUnFCVU1v0vefgNkhpT6O/bbVRcSxuMDZMoZdkIWlEu269L0E6Uut39bvdg56CtQXLUbohSlc0LbtKmS6LJs2aICgc183u0tzl/+NQVlOEJcPXx6UjkIRMUX6mD9SDTpYJenAvsfw/1H+UxHFIFgk0usTluvq6h00F3aF3jnABqXT4a9FnmiqsggwtMQLIstjRhkotpDAkrhaks1cmlXReJTpwqKr+xTKaxgxAfdloHQSQ5kqIsBCoeCcxEJy5EPUmMTTMzcaF6c7DVD0/xqO0Oq9WkHgOeMQX9qQxz8QRTsvMourI/P39W2cxRSF5ZWVmO26V9MDc540S8CgkXkAWHA/TLENtpStfSmwlWGfpPnruSXBIMurXy2XBSWS2uSJzzHG2foqerARNFyqqrrf72dKXq8UgAj9LmvPY3rWCjNd4u9etBujDLXZZ9PpLIROeqcuEsDaWl8/nqmizkM7yT45sQi4WqTvGQTkKmTlF8NA3ziaVv3H/Dp6kqBZAU+Cg1eLnu7RaGNMggBcIiUzJBEObTMd4QndbOLkwa5HI0yGP8nrAsAhKCucDz63wJkhE9K6M/p2r4hMwCNDutLsjGJbqf5gqrw7MRyTOCoiICEkM0j8EhlvLa7FbvguD9/+h/g6hZJrMolasCfkW6q9+M68aLnVYXt/fLt5u9/cX35I73hfeVt+11vT2v5/3oHXnHXuT97v3h/eX9vfbP+pfrX69vzV1v31pgPvdqY739L4qSnVk=</latexit><latexit sha1_base64="wECyDT1irjpegMdv/Iox6i4U4iU=">AAAHdXicfVVtb9s2EFa7Lem0t7T7OAxglzlIO/ktXZZkQAADK4oVa7FsdpoCoZtR0sliTEoqSTV2Cf2o/Zph37Zfsa872k5jOdkI2DqR99zDu3tIhYXg2nQ6f966/d77H6yt3/nQ/+jjTz79bOPuvRc6L1UEx1EucvUyZBoEz+DYcCPgZaGAyVDASTj+wa2fvAGleZ4NzLSAoWSjjCc8YganzjZ+2qIGJsbqKFdQnRXbPCDnARk/IJT6W/R1yWJySLZPziw/r14V5BuC5vn4ncmd+eCV7baL6mxjs9PqzAa5bnQXxqa3GEdnd9fu0TiPSgmZiQTT+rTbKczQMmV4JKDyaamhYNGYjeAUzYxJ0EM7y7oiDZyJSZIr/GWGzGb9ZQjGUWxai2INC0vB1KQ+G+b5GFd05dcpTbI/tDwrSgNZNGdMSkFMTlwtScwVREZMSZ3W8PHbIOMRJIpFAZNaMpMGBXfbDMz4bXOkWJEGko0hAiGupuabcnDBQ8XU1GWQX+ggxMgjlZdZrIOCGQMq04g3ik8CnbICdJBwE0RMRO49dphC5EYyNdb/FbUlwTBcnBVOgLGDMjHwK8SVVRDf3+/cDwXyLnuYFEYKIKvs7OF8LlJuYMUnFCVU1v0vefgNkhpT6O/bbVRcSxuMDZMoZdkIWlEu269L0E6Uut39bvdg56CtQXLUbohSlc0LbtKmS6LJs2aICgc183u0tzl/+NQVlOEJcPXx6UjkIRMUX6mD9SDTpYJenAvsfw/1H+UxHFIFgk0usTluvq6h00F3aF3jnABqXT4a9FnmiqsggwtMQLIstjRhkotpDAkrhaks1cmlXReJTpwqKr+xTKaxgxAfdloHQSQ5kqIsBCoeCcxEJy5EPUmMTTMzcaF6c7DVD0/xqO0Oq9WkHgOeMQX9qQxz8QRTsvMourI/P39W2cxRSF5ZWVmO26V9MDc540S8CgkXkAWHA/TLENtpStfSmwlWGfpPnruSXBIMurXy2XBSWS2uSJzzHG2foqerARNFyqqrrf72dKXq8UgAj9LmvPY3rWCjNd4u9etBujDLXZZ9PpLIROeqcuEsDaWl8/nqmizkM7yT45sQi4WqTvGQTkKmTlF8NA3ziaVv3H/Dp6kqBZAU+Cg1eLnu7RaGNMggBcIiUzJBEObTMd4QndbOLkwa5HI0yGP8nrAsAhKCucDz63wJkhE9K6M/p2r4hMwCNDutLsjGJbqf5gqrw7MRyTOCoiICEkM0j8EhlvLa7FbvguD9/+h/g6hZJrMolasCfkW6q9+M68aLnVYXt/fLt5u9/cX35I73hfeVt+11vT2v5/3oHXnHXuT97v3h/eX9vfbP+pfrX69vzV1v31pgPvdqY739L4qSnVk=</latexit>
There are lots of opportunities in studying this data.
26
1. Higher-order data is pervasive!
We have ways to represent data, and higher-order link prediction is a
general framework for comparing comparing models and methods.
2. There is rich static and temporal structure in the datasets we collected.
3. Do you have a better model, algorithm, or data representation?
Great! Show us by out-performing these baselines.
Data. bit.ly/sc-holp-data
Code. bit.ly/sc-holp-code
Simplicial closure and higher-order link prediction.
27
Paper. arXiv 1802.06916
Data. bit.ly/sc-holp-data
Code. bit.ly/sc-holp-code
Slides. bit.ly/arb-SIAMAN18
Austin R. Benson
http://cs.cornell.edu/~arb
@austinbenson
arb@cs.cornell.edu
THANKS!
i
j k
Most open triangles do not come from asynchronous
temporal behavior.
28
i
j k
In 61.1% to 97.4% of open triangles,
all three pairs of edges have an
overlapping period of activity.
⟶ there is an overlapping period of
activity between all 3 edges
(Helly’s theorem).
# overlaps
Dataset # open triangles 0 1 2 3
coauth-DBLP 1,295,214 0.012 0.143 0.123 0.722
coauth-MAG-history 96,420 0.002 0.055 0.059 0.884
coauth-MAG-geology 2,494,960 0.010 0.128 0.109 0.753
tags-stack-overflow 300,646,440 0.002 0.067 0.071 0.860
tags-math-sx 2,666,353 0.001 0.040 0.049 0.910
tags-ask-ubuntu 3,288,058 0.002 0.088 0.085 0.825
threads-stack-overflow 99,027,304 0.001 0.034 0.037 0.929
threads-math-sx 11,294,665 0.001 0.038 0.039 0.922
threads-ask-ubuntu 136,374 0.000 0.020 0.023 0.957
NDC-substances 1,136,357 0.020 0.196 0.151 0.633
NDC-classes 9,064 0.022 0.191 0.136 0.652
DAWN 5,682,552 0.027 0.216 0.155 0.602
congress-committees 190,054 0.001 0.046 0.058 0.895
congress-bills 44,857,465 0.003 0.063 0.113 0.821
email-Enron 3,317 0.008 0.130 0.151 0.711
email-Eu 234,600 0.010 0.131 0.132 0.727
contact-high-school 31,850 0.000 0.015 0.019 0.966
contact-primary-school 98,621 0.000 0.012 0.014 0.974
music-rap-genius 70,057 0.028 0.221 0.141 0.611
Simplicial closure probability on 4 nodes has similar
behavior to those with 3 nodes,just“up one dimension”.
LA/OPT Seminar 29
Take first 80% of the data (in time),record the configuration of every 4 nodes,
and compute the fraction that simplicially close in the final 20% of the data.
Increased edge density
increases closure probability.
Increased simplicial tie strength
increases closure probability.
Tension between
simplicial density and
simplicial tie strength.

More Related Content

What's hot

Garge, Nikhil et. al. 2005. Reproducible Clusters from Microarray Research: ...
 Garge, Nikhil et. al. 2005. Reproducible Clusters from Microarray Research: ... Garge, Nikhil et. al. 2005. Reproducible Clusters from Microarray Research: ...
Garge, Nikhil et. al. 2005. Reproducible Clusters from Microarray Research: ...
Gota Morota
 
Data enriched linear regression
Data enriched linear regressionData enriched linear regression
Data enriched linear regression
Sunny Kr
 
A Three-Layer Visual Hash Function Using Adler-32
A Three-Layer Visual Hash Function Using Adler-32A Three-Layer Visual Hash Function Using Adler-32
A Three-Layer Visual Hash Function Using Adler-32
Universitas Pembangunan Panca Budi
 
Semi-supervised learning of edge flows
Semi-supervised learning of edge flowsSemi-supervised learning of edge flows
Semi-supervised learning of edge flows
Austin Benson
 
String Identifier in Multiple Medical Databases
String Identifier in Multiple Medical DatabasesString Identifier in Multiple Medical Databases
String Identifier in Multiple Medical Databases
Eswar Publications
 
Sampling methods for counting temporal motifs
Sampling methods for counting temporal motifsSampling methods for counting temporal motifs
Sampling methods for counting temporal motifs
Austin Benson
 
K-NN Classifier Performs Better Than K-Means Clustering in Missing Value Imp...
K-NN Classifier Performs Better Than K-Means Clustering in  Missing Value Imp...K-NN Classifier Performs Better Than K-Means Clustering in  Missing Value Imp...
K-NN Classifier Performs Better Than K-Means Clustering in Missing Value Imp...
IOSR Journals
 
TWO PARTY HIERARICHAL CLUSTERING OVER HORIZONTALLY PARTITIONED DATA SET
TWO PARTY HIERARICHAL CLUSTERING OVER HORIZONTALLY PARTITIONED DATA SETTWO PARTY HIERARICHAL CLUSTERING OVER HORIZONTALLY PARTITIONED DATA SET
TWO PARTY HIERARICHAL CLUSTERING OVER HORIZONTALLY PARTITIONED DATA SET
IJDKP
 
Choosing to grow a graph
Choosing to grow a graphChoosing to grow a graph
Choosing to grow a graph
Austin Benson
 
Kaggle digits analysis_final_fc
Kaggle digits analysis_final_fcKaggle digits analysis_final_fc
Kaggle digits analysis_final_fc
Zachary Combs
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
IJERD Editor
 
An enhanced fuzzy rough set based clustering algorithm for categorical data
An enhanced fuzzy rough set based clustering algorithm for categorical dataAn enhanced fuzzy rough set based clustering algorithm for categorical data
An enhanced fuzzy rough set based clustering algorithm for categorical data
eSAT Journals
 
An enhanced fuzzy rough set based clustering algorithm for categorical data
An enhanced fuzzy rough set based clustering algorithm for categorical dataAn enhanced fuzzy rough set based clustering algorithm for categorical data
An enhanced fuzzy rough set based clustering algorithm for categorical data
eSAT Publishing House
 
A statistical data fusion technique in virtual data integration environment
A statistical data fusion technique in virtual data integration environmentA statistical data fusion technique in virtual data integration environment
A statistical data fusion technique in virtual data integration environment
IJDKP
 
Selective inference and single-cell differential analysis
Selective inference and single-cell differential analysisSelective inference and single-cell differential analysis
Selective inference and single-cell differential analysis
tuxette
 
An Heterogeneous Population-Based Genetic Algorithm for Data Clustering
An Heterogeneous Population-Based Genetic Algorithm for Data ClusteringAn Heterogeneous Population-Based Genetic Algorithm for Data Clustering
An Heterogeneous Population-Based Genetic Algorithm for Data Clustering
ijeei-iaes
 
Improved Slicing Algorithm For Greater Utility In Privacy Preserving Data Pub...
Improved Slicing Algorithm For Greater Utility In Privacy Preserving Data Pub...Improved Slicing Algorithm For Greater Utility In Privacy Preserving Data Pub...
Improved Slicing Algorithm For Greater Utility In Privacy Preserving Data Pub...
Waqas Tariq
 

What's hot (18)

Garge, Nikhil et. al. 2005. Reproducible Clusters from Microarray Research: ...
 Garge, Nikhil et. al. 2005. Reproducible Clusters from Microarray Research: ... Garge, Nikhil et. al. 2005. Reproducible Clusters from Microarray Research: ...
Garge, Nikhil et. al. 2005. Reproducible Clusters from Microarray Research: ...
 
Data enriched linear regression
Data enriched linear regressionData enriched linear regression
Data enriched linear regression
 
A Three-Layer Visual Hash Function Using Adler-32
A Three-Layer Visual Hash Function Using Adler-32A Three-Layer Visual Hash Function Using Adler-32
A Three-Layer Visual Hash Function Using Adler-32
 
Semi-supervised learning of edge flows
Semi-supervised learning of edge flowsSemi-supervised learning of edge flows
Semi-supervised learning of edge flows
 
String Identifier in Multiple Medical Databases
String Identifier in Multiple Medical DatabasesString Identifier in Multiple Medical Databases
String Identifier in Multiple Medical Databases
 
Sampling methods for counting temporal motifs
Sampling methods for counting temporal motifsSampling methods for counting temporal motifs
Sampling methods for counting temporal motifs
 
K-NN Classifier Performs Better Than K-Means Clustering in Missing Value Imp...
K-NN Classifier Performs Better Than K-Means Clustering in  Missing Value Imp...K-NN Classifier Performs Better Than K-Means Clustering in  Missing Value Imp...
K-NN Classifier Performs Better Than K-Means Clustering in Missing Value Imp...
 
TWO PARTY HIERARICHAL CLUSTERING OVER HORIZONTALLY PARTITIONED DATA SET
TWO PARTY HIERARICHAL CLUSTERING OVER HORIZONTALLY PARTITIONED DATA SETTWO PARTY HIERARICHAL CLUSTERING OVER HORIZONTALLY PARTITIONED DATA SET
TWO PARTY HIERARICHAL CLUSTERING OVER HORIZONTALLY PARTITIONED DATA SET
 
Choosing to grow a graph
Choosing to grow a graphChoosing to grow a graph
Choosing to grow a graph
 
Kaggle digits analysis_final_fc
Kaggle digits analysis_final_fcKaggle digits analysis_final_fc
Kaggle digits analysis_final_fc
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
An enhanced fuzzy rough set based clustering algorithm for categorical data
An enhanced fuzzy rough set based clustering algorithm for categorical dataAn enhanced fuzzy rough set based clustering algorithm for categorical data
An enhanced fuzzy rough set based clustering algorithm for categorical data
 
An enhanced fuzzy rough set based clustering algorithm for categorical data
An enhanced fuzzy rough set based clustering algorithm for categorical dataAn enhanced fuzzy rough set based clustering algorithm for categorical data
An enhanced fuzzy rough set based clustering algorithm for categorical data
 
A statistical data fusion technique in virtual data integration environment
A statistical data fusion technique in virtual data integration environmentA statistical data fusion technique in virtual data integration environment
A statistical data fusion technique in virtual data integration environment
 
Saif_CCECE2007_full_paper_submitted
Saif_CCECE2007_full_paper_submittedSaif_CCECE2007_full_paper_submitted
Saif_CCECE2007_full_paper_submitted
 
Selective inference and single-cell differential analysis
Selective inference and single-cell differential analysisSelective inference and single-cell differential analysis
Selective inference and single-cell differential analysis
 
An Heterogeneous Population-Based Genetic Algorithm for Data Clustering
An Heterogeneous Population-Based Genetic Algorithm for Data ClusteringAn Heterogeneous Population-Based Genetic Algorithm for Data Clustering
An Heterogeneous Population-Based Genetic Algorithm for Data Clustering
 
Improved Slicing Algorithm For Greater Utility In Privacy Preserving Data Pub...
Improved Slicing Algorithm For Greater Utility In Privacy Preserving Data Pub...Improved Slicing Algorithm For Greater Utility In Privacy Preserving Data Pub...
Improved Slicing Algorithm For Greater Utility In Privacy Preserving Data Pub...
 

Similar to Simplicial closure and higher-order link prediction --- SIAMNS18

Simplicial closure & higher-order link prediction
Simplicial closure & higher-order link predictionSimplicial closure & higher-order link prediction
Simplicial closure & higher-order link prediction
Austin Benson
 
Simplicial closure and higher-order link prediction (SIAMNS18)
Simplicial closure and higher-order link prediction (SIAMNS18)Simplicial closure and higher-order link prediction (SIAMNS18)
Simplicial closure and higher-order link prediction (SIAMNS18)
Austin Benson
 
Simplicial closure and simplicial diffusions
Simplicial closure and simplicial diffusionsSimplicial closure and simplicial diffusions
Simplicial closure and simplicial diffusions
Austin Benson
 
Computational Frameworks for Higher-order Network Data Analysis
Computational Frameworks for Higher-order Network Data AnalysisComputational Frameworks for Higher-order Network Data Analysis
Computational Frameworks for Higher-order Network Data Analysis
Austin Benson
 
Simplicial closure & higher-order link prediction
Simplicial closure & higher-order link predictionSimplicial closure & higher-order link prediction
Simplicial closure & higher-order link prediction
Austin Benson
 
Simplicial closure and higher-order link prediction LA/OPT
Simplicial closure and higher-order link prediction LA/OPTSimplicial closure and higher-order link prediction LA/OPT
Simplicial closure and higher-order link prediction LA/OPT
Austin Benson
 
Thesis (presentation)
Thesis (presentation)Thesis (presentation)
Thesis (presentation)nlt2390
 
Community detection
Community detectionCommunity detection
Community detection
Scott Pauls
 
NetBioSIG2013-Talk Gang Su
NetBioSIG2013-Talk Gang SuNetBioSIG2013-Talk Gang Su
NetBioSIG2013-Talk Gang Su
Alexander Pico
 
CLIM Program: Remote Sensing Workshop, Blocking Methods for Spatial Statistic...
CLIM Program: Remote Sensing Workshop, Blocking Methods for Spatial Statistic...CLIM Program: Remote Sensing Workshop, Blocking Methods for Spatial Statistic...
CLIM Program: Remote Sensing Workshop, Blocking Methods for Spatial Statistic...
The Statistical and Applied Mathematical Sciences Institute
 
Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9
Ganesan Narayanasamy
 
Research on Haberman dataset also business required document
Research on Haberman dataset also business required documentResearch on Haberman dataset also business required document
Research on Haberman dataset also business required document
ManjuYadav65
 
Socialnetworkanalysis (Tin180 Com)
Socialnetworkanalysis (Tin180 Com)Socialnetworkanalysis (Tin180 Com)
Socialnetworkanalysis (Tin180 Com)
Tin180 VietNam
 
Protecting Attribute Disclosure for High Dimensionality and Preserving Publis...
Protecting Attribute Disclosure for High Dimensionality and Preserving Publis...Protecting Attribute Disclosure for High Dimensionality and Preserving Publis...
Protecting Attribute Disclosure for High Dimensionality and Preserving Publis...
IOSR Journals
 
Interpretation of the biological knowledge using networks approach
Interpretation of the biological knowledge using networks approachInterpretation of the biological knowledge using networks approach
Interpretation of the biological knowledge using networks approach
Elena Sügis
 
Higher-order spectral graph clustering with motifs
Higher-order spectral graph clustering with motifsHigher-order spectral graph clustering with motifs
Higher-order spectral graph clustering with motifs
Austin Benson
 
Gaining Confidence in Signalling and Regulatory Networks
Gaining Confidence in Signalling and Regulatory NetworksGaining Confidence in Signalling and Regulatory Networks
Gaining Confidence in Signalling and Regulatory Networks
Michael Stumpf
 
ICPSR - Complex Systems Models in the Social Sciences - Lecture 3 - Professor...
ICPSR - Complex Systems Models in the Social Sciences - Lecture 3 - Professor...ICPSR - Complex Systems Models in the Social Sciences - Lecture 3 - Professor...
ICPSR - Complex Systems Models in the Social Sciences - Lecture 3 - Professor...Daniel Katz
 
Modeling of neural image compression using gradient decent technology
Modeling of neural image compression using gradient decent technologyModeling of neural image compression using gradient decent technology
Modeling of neural image compression using gradient decent technology
theijes
 

Similar to Simplicial closure and higher-order link prediction --- SIAMNS18 (20)

Simplicial closure & higher-order link prediction
Simplicial closure & higher-order link predictionSimplicial closure & higher-order link prediction
Simplicial closure & higher-order link prediction
 
Simplicial closure and higher-order link prediction (SIAMNS18)
Simplicial closure and higher-order link prediction (SIAMNS18)Simplicial closure and higher-order link prediction (SIAMNS18)
Simplicial closure and higher-order link prediction (SIAMNS18)
 
Simplicial closure and simplicial diffusions
Simplicial closure and simplicial diffusionsSimplicial closure and simplicial diffusions
Simplicial closure and simplicial diffusions
 
Computational Frameworks for Higher-order Network Data Analysis
Computational Frameworks for Higher-order Network Data AnalysisComputational Frameworks for Higher-order Network Data Analysis
Computational Frameworks for Higher-order Network Data Analysis
 
Simplicial closure & higher-order link prediction
Simplicial closure & higher-order link predictionSimplicial closure & higher-order link prediction
Simplicial closure & higher-order link prediction
 
Simplicial closure and higher-order link prediction LA/OPT
Simplicial closure and higher-order link prediction LA/OPTSimplicial closure and higher-order link prediction LA/OPT
Simplicial closure and higher-order link prediction LA/OPT
 
Thesis (presentation)
Thesis (presentation)Thesis (presentation)
Thesis (presentation)
 
Community detection
Community detectionCommunity detection
Community detection
 
NetBioSIG2013-Talk Gang Su
NetBioSIG2013-Talk Gang SuNetBioSIG2013-Talk Gang Su
NetBioSIG2013-Talk Gang Su
 
ppt
pptppt
ppt
 
CLIM Program: Remote Sensing Workshop, Blocking Methods for Spatial Statistic...
CLIM Program: Remote Sensing Workshop, Blocking Methods for Spatial Statistic...CLIM Program: Remote Sensing Workshop, Blocking Methods for Spatial Statistic...
CLIM Program: Remote Sensing Workshop, Blocking Methods for Spatial Statistic...
 
Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9
 
Research on Haberman dataset also business required document
Research on Haberman dataset also business required documentResearch on Haberman dataset also business required document
Research on Haberman dataset also business required document
 
Socialnetworkanalysis (Tin180 Com)
Socialnetworkanalysis (Tin180 Com)Socialnetworkanalysis (Tin180 Com)
Socialnetworkanalysis (Tin180 Com)
 
Protecting Attribute Disclosure for High Dimensionality and Preserving Publis...
Protecting Attribute Disclosure for High Dimensionality and Preserving Publis...Protecting Attribute Disclosure for High Dimensionality and Preserving Publis...
Protecting Attribute Disclosure for High Dimensionality and Preserving Publis...
 
Interpretation of the biological knowledge using networks approach
Interpretation of the biological knowledge using networks approachInterpretation of the biological knowledge using networks approach
Interpretation of the biological knowledge using networks approach
 
Higher-order spectral graph clustering with motifs
Higher-order spectral graph clustering with motifsHigher-order spectral graph clustering with motifs
Higher-order spectral graph clustering with motifs
 
Gaining Confidence in Signalling and Regulatory Networks
Gaining Confidence in Signalling and Regulatory NetworksGaining Confidence in Signalling and Regulatory Networks
Gaining Confidence in Signalling and Regulatory Networks
 
ICPSR - Complex Systems Models in the Social Sciences - Lecture 3 - Professor...
ICPSR - Complex Systems Models in the Social Sciences - Lecture 3 - Professor...ICPSR - Complex Systems Models in the Social Sciences - Lecture 3 - Professor...
ICPSR - Complex Systems Models in the Social Sciences - Lecture 3 - Professor...
 
Modeling of neural image compression using gradient decent technology
Modeling of neural image compression using gradient decent technologyModeling of neural image compression using gradient decent technology
Modeling of neural image compression using gradient decent technology
 

More from Austin Benson

Hypergraph Cuts with General Splitting Functions (JMM)
Hypergraph Cuts with General Splitting Functions (JMM)Hypergraph Cuts with General Splitting Functions (JMM)
Hypergraph Cuts with General Splitting Functions (JMM)
Austin Benson
 
Spectral embeddings and evolving networks
Spectral embeddings and evolving networksSpectral embeddings and evolving networks
Spectral embeddings and evolving networks
Austin Benson
 
Higher-order link prediction and other hypergraph modeling
Higher-order link prediction and other hypergraph modelingHigher-order link prediction and other hypergraph modeling
Higher-order link prediction and other hypergraph modeling
Austin Benson
 
Hypergraph Cuts with General Splitting Functions
Hypergraph Cuts with General Splitting FunctionsHypergraph Cuts with General Splitting Functions
Hypergraph Cuts with General Splitting Functions
Austin Benson
 
Hypergraph Cuts with General Splitting Functions
Hypergraph Cuts with General Splitting FunctionsHypergraph Cuts with General Splitting Functions
Hypergraph Cuts with General Splitting Functions
Austin Benson
 
Three hypergraph eigenvector centralities
Three hypergraph eigenvector centralitiesThree hypergraph eigenvector centralities
Three hypergraph eigenvector centralities
Austin Benson
 
Random spatial network models for core-periphery structure
Random spatial network models for core-periphery structureRandom spatial network models for core-periphery structure
Random spatial network models for core-periphery structure
Austin Benson
 
Random spatial network models for core-periphery structure.
Random spatial network models for core-periphery structure.Random spatial network models for core-periphery structure.
Random spatial network models for core-periphery structure.
Austin Benson
 
Higher-order clustering in networks
Higher-order clustering in networksHigher-order clustering in networks
Higher-order clustering in networks
Austin Benson
 
New perspectives on measuring network clustering
New perspectives on measuring network clusteringNew perspectives on measuring network clustering
New perspectives on measuring network clustering
Austin Benson
 
Higher-order spectral graph clustering with motifs
Higher-order spectral graph clustering with motifsHigher-order spectral graph clustering with motifs
Higher-order spectral graph clustering with motifs
Austin Benson
 
Tensor Eigenvectors and Stochastic Processes
Tensor Eigenvectors and Stochastic ProcessesTensor Eigenvectors and Stochastic Processes
Tensor Eigenvectors and Stochastic Processes
Austin Benson
 
Spacey random walks CMStatistics 2017
Spacey random walks CMStatistics 2017Spacey random walks CMStatistics 2017
Spacey random walks CMStatistics 2017
Austin Benson
 
Spacey random walks CAM Colloquium
Spacey random walks CAM ColloquiumSpacey random walks CAM Colloquium
Spacey random walks CAM Colloquium
Austin Benson
 

More from Austin Benson (14)

Hypergraph Cuts with General Splitting Functions (JMM)
Hypergraph Cuts with General Splitting Functions (JMM)Hypergraph Cuts with General Splitting Functions (JMM)
Hypergraph Cuts with General Splitting Functions (JMM)
 
Spectral embeddings and evolving networks
Spectral embeddings and evolving networksSpectral embeddings and evolving networks
Spectral embeddings and evolving networks
 
Higher-order link prediction and other hypergraph modeling
Higher-order link prediction and other hypergraph modelingHigher-order link prediction and other hypergraph modeling
Higher-order link prediction and other hypergraph modeling
 
Hypergraph Cuts with General Splitting Functions
Hypergraph Cuts with General Splitting FunctionsHypergraph Cuts with General Splitting Functions
Hypergraph Cuts with General Splitting Functions
 
Hypergraph Cuts with General Splitting Functions
Hypergraph Cuts with General Splitting FunctionsHypergraph Cuts with General Splitting Functions
Hypergraph Cuts with General Splitting Functions
 
Three hypergraph eigenvector centralities
Three hypergraph eigenvector centralitiesThree hypergraph eigenvector centralities
Three hypergraph eigenvector centralities
 
Random spatial network models for core-periphery structure
Random spatial network models for core-periphery structureRandom spatial network models for core-periphery structure
Random spatial network models for core-periphery structure
 
Random spatial network models for core-periphery structure.
Random spatial network models for core-periphery structure.Random spatial network models for core-periphery structure.
Random spatial network models for core-periphery structure.
 
Higher-order clustering in networks
Higher-order clustering in networksHigher-order clustering in networks
Higher-order clustering in networks
 
New perspectives on measuring network clustering
New perspectives on measuring network clusteringNew perspectives on measuring network clustering
New perspectives on measuring network clustering
 
Higher-order spectral graph clustering with motifs
Higher-order spectral graph clustering with motifsHigher-order spectral graph clustering with motifs
Higher-order spectral graph clustering with motifs
 
Tensor Eigenvectors and Stochastic Processes
Tensor Eigenvectors and Stochastic ProcessesTensor Eigenvectors and Stochastic Processes
Tensor Eigenvectors and Stochastic Processes
 
Spacey random walks CMStatistics 2017
Spacey random walks CMStatistics 2017Spacey random walks CMStatistics 2017
Spacey random walks CMStatistics 2017
 
Spacey random walks CAM Colloquium
Spacey random walks CAM ColloquiumSpacey random walks CAM Colloquium
Spacey random walks CAM Colloquium
 

Recently uploaded

一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 

Recently uploaded (20)

一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 

Simplicial closure and higher-order link prediction --- SIAMNS18

  • 1. 1 Joint work with Rediet Abebe & Jon Kleinberg (Cornell) Michael Schaub & Ali Jadbabaie (MIT) Simplicial closure and higher-order link prediction Austin R. Benson · Cornell SIAM NS · July 13, 2018 Paper. arXiv 1802.06916 Data. bit.ly/sc-holp-data Code. bit.ly/sc-holp-code Slides. bit.ly/arb-SIAMAN18
  • 2. Canonical networks are everywhere slide, but with a hidden purpose! 2 Collaboration nodes are people/groups edges link entities working together Communications nodes are people/accounts edges show info.exchange Physical proximity nodes are people/animals edges link those that interact in close proximity Drug compounds nodes are substances edge between substances that appear in the same drug
  • 3. Real-world systems are composed of“higher-order” interactions that we often reduce to pairwise ones. 3 Collaboration nodes are people/groups teams are made up of small groups Communications nodes are people/accounts emails often have several recipients,not just one Physical proximity nodes are people/animals people often gather in small groups Drug compounds nodes are substances drugs are made up of several substances
  • 4. There are many ways to mathematically represent the higher-order structure present in relational data. 4 • Hypergraphs [Berge 89] • Set systems [Frankl 95] • Affiliation networks [Feld 81,Newman-Watts-Strogatz 02] • Multipartite networks [Lambiotte-Ausloos 05,Lind-Herrmann 07] • Abstract simplicial complexes [Lim 15,Osting-Palande-Wang 17] • Multilayer networks [Kivelä+ 14,Boccaletti+ 14,many others…] • Meta-paths [Sun-Han 12] • Motif-based representations [Benson-Gleich-Leskovec 15,17] • … ⟶ All lead to their own algorithms and methods for understanding dynamics. Data representation is not the problem. The problem is how do we evaluate and compare models and algorithms?
  • 5. We propose“higher-order link prediction”as a similar framework for evaluation of higher-order models. 5 t1 : {1, 2, 3, 4} t2 : {1, 3, 5} t3 : {1, 6} t4 : {2, 6} t5 : {1, 7, 8} t6 : {3, 9} t7 : {5, 8} t8 : {1, 2, 6} Data. Observe simplices up to some time t. Using this data, we want to predict what groups of > 2 nodes will appear in a simplex in the future. t 1 2 3 4 5 6 7 8 9 We predict structure that classical link prediction would not even consider! Possible applications • Novel combinations of drugs for treatments. • Group chat recommendation in social networks. • Team formation.
  • 6. We collected many datasets of timestamped simplices, where each simplex is a subset of nodes. 6 1. Coauthorship in different domains. 2. Emails with multiple recipients. 3. Tags on Q&A forums. 4. Threads on Q&A forums. 5. Contact/proximity measurements. 6. Musical artist collaboration. 7. Substance makeup and classification codes applied to drugs the FDA examines. 8. U.S. Congress committee memberships and bill sponsorship. 9. Combinations of drugs seen in patients in ER visits. https://math.stackexchange.com/q/80181 bit.ly/sc-holp-data
  • 7. Thinking of higher-order data as a weighted projected graph with“filled-in”structures is a convenient viewpoint. 7 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 t1 : {1, 2, 3, 4} t2 : {1, 3, 5} t3 : {1, 6} t4 : {2, 6} t5 : {1, 7, 8} t6 : {3, 9} t7 : {5, 8} t8 : {1, 2, 6} Data. Pictures to have in mind. Projected graph W. Wij = # of simplices containing nodes i and j.
  • 9. 9 i j k i j k Warm-up. What’s more common in data? or “Open triangle” each pair has been in a simplex together but all 3 nodes have never been in the same simplex “Closed triangle” there is some simplex that contains all 3 nodes
  • 10. music-rap-genius NDC-substances NDC-classes DAWN coauth-DBLP coauth-MAG-geology coauth-MAG-history congress-bills congress-committees tags-stack-overflow tags-math-sx tags-ask-ubuntu email-Eu email-Enron threads-stack-overflow threads-math-sx threads-ask-ubuntu contact-high-school contact-primary-school 10 5 10 4 10 3 10 2 10 1 Edge density in projected graph 0.00 0.25 0.50 0.75 1.00 Fractionoftrianglesopen There is lots of variation in the fraction of triangles that are open,but datasets from the same domain are similar. 10 See also [Patania-Petri-Vaccarino 17] for fraction of open triangles on several arxiv collaboration networks.
  • 11. We still have variety with only 3-node simplices. 11 music-rap-genius NDC-substances NDC-classes DAWN coauth-DBLP coauth-MAG-geology coauth-MAG-history congress-bills congress-committees tags-stack-overflow tags-math-sx tags-ask-ubuntu email-Eu email-Enron threads-stack-overflow threads-math-sx threads-ask-ubuntu contact-high-school contact-primary-school 10 5 10 4 10 3 10 2 10 1 Edge density in projected graph 0.00 0.25 0.50 0.75 1.00 Fractionoftrianglesopen Exactly 3 nodes per simplex
  • 12. A simple model can account for open triangle variation. 12 • n nodes; only 3-node simplices; {i, j, k} included with prob. p = 1 / nb i.i.d. • ⟶ always get ϴ(pn3) = ϴ(n3 - b) closed triangles in expectation. b = 0.8, 0.82, 0.84, ..., 1.8 Larger b is darker marker. Proposition (rough). • b < 1 ⟶ ϴ(n3) open triangles in expectation for large n. • b > 1 ⟶ ϴ(n3(2-b)) open triangles in expectation for large n. • The number of open triangles grows faster for b < 3/2. 10 1 100 Edge density in projected graph 0.00 0.25 0.50 0.75 1.00 Fractionoftrianglesopen Exactly 3 nodes per simplex (simulated) n = 200 n = 100 n = 50 n = 25
  • 13. Dataset domain separation also occurs at the local level. 13 • Randomly sample 100 egonets per dataset and measure log of average degree and fraction of open triangles. • Logistic regression model to predict domain (coauthorship, tags, threads, email, contact). • 75% model accuracy vs. 21% with random guessing.
  • 14. 14 Simplicial closure. How do new simplices appear? How do new closed triangles appear?
  • 15. Groups of nodes go through trajectories until finally reaching a“simplicial closure event.” 15 1 2 3 4 5 6 7 8 9 t1 : {1, 2, 3, 4} t2 : {1, 3, 5} t3 : {1, 6} t4 : {2, 6} t5 : {1, 7, 8} t6 : {3, 9} t7 : {5, 8} t8 : {1, 2, 6} 1 2 6 1 2 6 1 2 6 1 2 6 1 2 6 {1, 2, 3, 4} t1 {1, 6} t3 {2, 6} t4 {1, 2, 6} t8 For this talk, we will focus on simplicial closure on 3 nodes.
  • 16. Groups of nodes go through trajectories until finally reaching a“simplicial closure event.” 16 Substances in marketed drugs recorded in the National Drug Code directory. HIV protease inhibitors UGT1A1 inhibitors Breast cancer resistance protein inhibitors 1 2+ 2+ 1 2+ 1 1 2+ 2+ 1 2+ 2+ 2+ Reyataz RedPharm 2003 Reyataz Squibb & Sons 2003 Kaletra Physicians Total Care 2006 Promacta GSK (25mg) 2008 Promacta GSK (50mg) 2008 Kaletra DOH Central Pharmacy 2009 Evotaz Squibb & Sons 2015 We bin weighted edges into “weak” and “strong ties” in the projected graph W. Wij = # of simplices containing nodes i and j. • Weak ties. Wij = 1 (one simplex contains i and j) • Strong ties. Wij > 2 (at least two simplices contain i and j)
  • 17. 1 2+1 1 2+ 1 1 1 1 2+ 2+ 2+ 1 1 2+ 2+ 1 2+ 2+ 2+ 245,996 74,219 14,541 7,575 5,781 773 3,560 952 445 389 618 285 2,732,839 157,236 66,644 7,987 8,844 328 3,171 779 722 285 We can also study temporal dynamics in aggregate. 17 Coauthorship data of scholars publishing in history. Wij = # of simplices containing nodes i and j. Most groups formed have no previous interaction. Open triangle of all weak ties more likely to form a strong tie before closing.
  • 18. Simplicial closure depends on structure in projected graph. 18 • First 80% of the data (in time) ⟶ record configurations of triplets not in closed triangle. • Remainder of data ⟶ find fraction that are now closed triangles. Increased edge density increases closure probability. Increased tie strength increases closure probability. Tension between edge density and tie strength. Left and middle observations are consistent with theory and empirical studies of social networks. [Granovetter 73; Leskovec+ 08; Backstrom+ 06; Kossinets-Watts 06] Closure probability Closure probability Closure probability
  • 19. 19 Higher-order link prediction. How can we predict which new simplices appear? How can we predict which triangles become closed?
  • 20. We propose“higher-order link prediction”as a similar framework for evaluation of higher-order models. 20 t1 : {1, 2, 3, 4} t2 : {1, 3, 5} t3 : {1, 6} t4 : {2, 6} t5 : {1, 7, 8} t6 : {3, 9} t7 : {5, 8} t8 : {1, 2, 6} Data. Observe simplices up to some time t. Using this data, want to predict what groups of > 2 nodes will appear in a simplex in the future. t 1 2 3 4 5 6 7 8 9 We predict structure that classical link prediction would not even consider!
  • 21. 21 Our structural analysis tells us what we should be looking at for prediction. 1. Edge density is a positive indicator. ⟶ focus our attention on predicting which open triangles become closed triangles. 2. Tie strength is a positive indicator. ⟶ various ways of incorporating this information i j k Wij Wjk Wjk
  • 22. 22 For every open triangle,we assign a score function on first 80% of data based on structural properties. Four broad classes of score functions for an open triangle. Score s(i, j, k)… 1. is a function of Wij, Wjk, Wjk arithmetic mean, geometric mean, etc. 2. looks at common neighbors of the three nodes. generalized Jaccard, Adamic-Adar, etc. 3. uses “whole-network” similarity scores on projected graph sum of PageRank or Katz scores amongst edges 4. is learned from data train a logistic regression model with features i j k Wij Wjk Wjk After computing scores, predict that open triangles with highest scores will be closed triangles in final 20% of data.
  • 23. 23
  • 24. 24 A few lessons learned from applying all of these ideas. 1. We can predict pretty well on all datasets using some method. → 4x to 107x better than random w/r/t mean average precision depending on the dataset/method 2. Thread co-participation and co-tagging on stack exchange are consistently easy to predict. 3. Simply averaging Wij, Wjk, and Wik consistently performs well. i j k Wij Wjk Wjk
  • 25. Generalized means of edges weights are often good predictors of new 3-node simplices appearing. 25 music-rap-genius NDC-substances NDC-classes DAWN coauth-DBLP coauth-MAG-geology coauth-MAG-history congress-bills congress-committees tags-stack-overflow tags-math-sx tags-ask-ubuntu email-Eu email-Enron threads-stack-overflow threads-math-sx threads-ask-ubuntu contact-high-school contact-primary-school harmonic geometric arithmetic p 4 3 2 1 0 1 2 3 4 0 20 40 60 80 Relativeperformance 4 3 2 1 0 1 2 3 4 p 2.5 5.0 7.5 10.0 12.5 Relativeperformance 4 3 2 1 0 1 2 3 4 p 1.0 1.5 2.0 2.5 3.0 3.5 Relativeperformance Good performance from this local information is a deviation from classical link prediction,where methods that use long paths (e.g.,PageRank) perform well [Liben-Nowell & Kleinberg 07]. For structures on k nodes,the subsets of size k-1 contain rich information only when k > 2. i j k Wij Wjk Wjk i j k ? scorep(i, j, k) = (Wp ij + Wp jk + Wp ik)1/p <latexit sha1_base64="wECyDT1irjpegMdv/Iox6i4U4iU=">AAAHdXicfVVtb9s2EFa7Lem0t7T7OAxglzlIO/ktXZZkQAADK4oVa7FsdpoCoZtR0sliTEoqSTV2Cf2o/Zph37Zfsa872k5jOdkI2DqR99zDu3tIhYXg2nQ6f966/d77H6yt3/nQ/+jjTz79bOPuvRc6L1UEx1EucvUyZBoEz+DYcCPgZaGAyVDASTj+wa2fvAGleZ4NzLSAoWSjjCc8YganzjZ+2qIGJsbqKFdQnRXbPCDnARk/IJT6W/R1yWJySLZPziw/r14V5BuC5vn4ncmd+eCV7baL6mxjs9PqzAa5bnQXxqa3GEdnd9fu0TiPSgmZiQTT+rTbKczQMmV4JKDyaamhYNGYjeAUzYxJ0EM7y7oiDZyJSZIr/GWGzGb9ZQjGUWxai2INC0vB1KQ+G+b5GFd05dcpTbI/tDwrSgNZNGdMSkFMTlwtScwVREZMSZ3W8PHbIOMRJIpFAZNaMpMGBXfbDMz4bXOkWJEGko0hAiGupuabcnDBQ8XU1GWQX+ggxMgjlZdZrIOCGQMq04g3ik8CnbICdJBwE0RMRO49dphC5EYyNdb/FbUlwTBcnBVOgLGDMjHwK8SVVRDf3+/cDwXyLnuYFEYKIKvs7OF8LlJuYMUnFCVU1v0vefgNkhpT6O/bbVRcSxuMDZMoZdkIWlEu269L0E6Uut39bvdg56CtQXLUbohSlc0LbtKmS6LJs2aICgc183u0tzl/+NQVlOEJcPXx6UjkIRMUX6mD9SDTpYJenAvsfw/1H+UxHFIFgk0usTluvq6h00F3aF3jnABqXT4a9FnmiqsggwtMQLIstjRhkotpDAkrhaks1cmlXReJTpwqKr+xTKaxgxAfdloHQSQ5kqIsBCoeCcxEJy5EPUmMTTMzcaF6c7DVD0/xqO0Oq9WkHgOeMQX9qQxz8QRTsvMourI/P39W2cxRSF5ZWVmO26V9MDc540S8CgkXkAWHA/TLENtpStfSmwlWGfpPnruSXBIMurXy2XBSWS2uSJzzHG2foqerARNFyqqrrf72dKXq8UgAj9LmvPY3rWCjNd4u9etBujDLXZZ9PpLIROeqcuEsDaWl8/nqmizkM7yT45sQi4WqTvGQTkKmTlF8NA3ziaVv3H/Dp6kqBZAU+Cg1eLnu7RaGNMggBcIiUzJBEObTMd4QndbOLkwa5HI0yGP8nrAsAhKCucDz63wJkhE9K6M/p2r4hMwCNDutLsjGJbqf5gqrw7MRyTOCoiICEkM0j8EhlvLa7FbvguD9/+h/g6hZJrMolasCfkW6q9+M68aLnVYXt/fLt5u9/cX35I73hfeVt+11vT2v5/3oHXnHXuT97v3h/eX9vfbP+pfrX69vzV1v31pgPvdqY739L4qSnVk=</latexit><latexit sha1_base64="wECyDT1irjpegMdv/Iox6i4U4iU=">AAAHdXicfVVtb9s2EFa7Lem0t7T7OAxglzlIO/ktXZZkQAADK4oVa7FsdpoCoZtR0sliTEoqSTV2Cf2o/Zph37Zfsa872k5jOdkI2DqR99zDu3tIhYXg2nQ6f966/d77H6yt3/nQ/+jjTz79bOPuvRc6L1UEx1EucvUyZBoEz+DYcCPgZaGAyVDASTj+wa2fvAGleZ4NzLSAoWSjjCc8YganzjZ+2qIGJsbqKFdQnRXbPCDnARk/IJT6W/R1yWJySLZPziw/r14V5BuC5vn4ncmd+eCV7baL6mxjs9PqzAa5bnQXxqa3GEdnd9fu0TiPSgmZiQTT+rTbKczQMmV4JKDyaamhYNGYjeAUzYxJ0EM7y7oiDZyJSZIr/GWGzGb9ZQjGUWxai2INC0vB1KQ+G+b5GFd05dcpTbI/tDwrSgNZNGdMSkFMTlwtScwVREZMSZ3W8PHbIOMRJIpFAZNaMpMGBXfbDMz4bXOkWJEGko0hAiGupuabcnDBQ8XU1GWQX+ggxMgjlZdZrIOCGQMq04g3ik8CnbICdJBwE0RMRO49dphC5EYyNdb/FbUlwTBcnBVOgLGDMjHwK8SVVRDf3+/cDwXyLnuYFEYKIKvs7OF8LlJuYMUnFCVU1v0vefgNkhpT6O/bbVRcSxuMDZMoZdkIWlEu269L0E6Uut39bvdg56CtQXLUbohSlc0LbtKmS6LJs2aICgc183u0tzl/+NQVlOEJcPXx6UjkIRMUX6mD9SDTpYJenAvsfw/1H+UxHFIFgk0usTluvq6h00F3aF3jnABqXT4a9FnmiqsggwtMQLIstjRhkotpDAkrhaks1cmlXReJTpwqKr+xTKaxgxAfdloHQSQ5kqIsBCoeCcxEJy5EPUmMTTMzcaF6c7DVD0/xqO0Oq9WkHgOeMQX9qQxz8QRTsvMourI/P39W2cxRSF5ZWVmO26V9MDc540S8CgkXkAWHA/TLENtpStfSmwlWGfpPnruSXBIMurXy2XBSWS2uSJzzHG2foqerARNFyqqrrf72dKXq8UgAj9LmvPY3rWCjNd4u9etBujDLXZZ9PpLIROeqcuEsDaWl8/nqmizkM7yT45sQi4WqTvGQTkKmTlF8NA3ziaVv3H/Dp6kqBZAU+Cg1eLnu7RaGNMggBcIiUzJBEObTMd4QndbOLkwa5HI0yGP8nrAsAhKCucDz63wJkhE9K6M/p2r4hMwCNDutLsjGJbqf5gqrw7MRyTOCoiICEkM0j8EhlvLa7FbvguD9/+h/g6hZJrMolasCfkW6q9+M68aLnVYXt/fLt5u9/cX35I73hfeVt+11vT2v5/3oHXnHXuT97v3h/eX9vfbP+pfrX69vzV1v31pgPvdqY739L4qSnVk=</latexit><latexit sha1_base64="wECyDT1irjpegMdv/Iox6i4U4iU=">AAAHdXicfVVtb9s2EFa7Lem0t7T7OAxglzlIO/ktXZZkQAADK4oVa7FsdpoCoZtR0sliTEoqSTV2Cf2o/Zph37Zfsa872k5jOdkI2DqR99zDu3tIhYXg2nQ6f966/d77H6yt3/nQ/+jjTz79bOPuvRc6L1UEx1EucvUyZBoEz+DYcCPgZaGAyVDASTj+wa2fvAGleZ4NzLSAoWSjjCc8YganzjZ+2qIGJsbqKFdQnRXbPCDnARk/IJT6W/R1yWJySLZPziw/r14V5BuC5vn4ncmd+eCV7baL6mxjs9PqzAa5bnQXxqa3GEdnd9fu0TiPSgmZiQTT+rTbKczQMmV4JKDyaamhYNGYjeAUzYxJ0EM7y7oiDZyJSZIr/GWGzGb9ZQjGUWxai2INC0vB1KQ+G+b5GFd05dcpTbI/tDwrSgNZNGdMSkFMTlwtScwVREZMSZ3W8PHbIOMRJIpFAZNaMpMGBXfbDMz4bXOkWJEGko0hAiGupuabcnDBQ8XU1GWQX+ggxMgjlZdZrIOCGQMq04g3ik8CnbICdJBwE0RMRO49dphC5EYyNdb/FbUlwTBcnBVOgLGDMjHwK8SVVRDf3+/cDwXyLnuYFEYKIKvs7OF8LlJuYMUnFCVU1v0vefgNkhpT6O/bbVRcSxuMDZMoZdkIWlEu269L0E6Uut39bvdg56CtQXLUbohSlc0LbtKmS6LJs2aICgc183u0tzl/+NQVlOEJcPXx6UjkIRMUX6mD9SDTpYJenAvsfw/1H+UxHFIFgk0usTluvq6h00F3aF3jnABqXT4a9FnmiqsggwtMQLIstjRhkotpDAkrhaks1cmlXReJTpwqKr+xTKaxgxAfdloHQSQ5kqIsBCoeCcxEJy5EPUmMTTMzcaF6c7DVD0/xqO0Oq9WkHgOeMQX9qQxz8QRTsvMourI/P39W2cxRSF5ZWVmO26V9MDc540S8CgkXkAWHA/TLENtpStfSmwlWGfpPnruSXBIMurXy2XBSWS2uSJzzHG2foqerARNFyqqrrf72dKXq8UgAj9LmvPY3rWCjNd4u9etBujDLXZZ9PpLIROeqcuEsDaWl8/nqmizkM7yT45sQi4WqTvGQTkKmTlF8NA3ziaVv3H/Dp6kqBZAU+Cg1eLnu7RaGNMggBcIiUzJBEObTMd4QndbOLkwa5HI0yGP8nrAsAhKCucDz63wJkhE9K6M/p2r4hMwCNDutLsjGJbqf5gqrw7MRyTOCoiICEkM0j8EhlvLa7FbvguD9/+h/g6hZJrMolasCfkW6q9+M68aLnVYXt/fLt5u9/cX35I73hfeVt+11vT2v5/3oHXnHXuT97v3h/eX9vfbP+pfrX69vzV1v31pgPvdqY739L4qSnVk=</latexit><latexit sha1_base64="wECyDT1irjpegMdv/Iox6i4U4iU=">AAAHdXicfVVtb9s2EFa7Lem0t7T7OAxglzlIO/ktXZZkQAADK4oVa7FsdpoCoZtR0sliTEoqSTV2Cf2o/Zph37Zfsa872k5jOdkI2DqR99zDu3tIhYXg2nQ6f966/d77H6yt3/nQ/+jjTz79bOPuvRc6L1UEx1EucvUyZBoEz+DYcCPgZaGAyVDASTj+wa2fvAGleZ4NzLSAoWSjjCc8YganzjZ+2qIGJsbqKFdQnRXbPCDnARk/IJT6W/R1yWJySLZPziw/r14V5BuC5vn4ncmd+eCV7baL6mxjs9PqzAa5bnQXxqa3GEdnd9fu0TiPSgmZiQTT+rTbKczQMmV4JKDyaamhYNGYjeAUzYxJ0EM7y7oiDZyJSZIr/GWGzGb9ZQjGUWxai2INC0vB1KQ+G+b5GFd05dcpTbI/tDwrSgNZNGdMSkFMTlwtScwVREZMSZ3W8PHbIOMRJIpFAZNaMpMGBXfbDMz4bXOkWJEGko0hAiGupuabcnDBQ8XU1GWQX+ggxMgjlZdZrIOCGQMq04g3ik8CnbICdJBwE0RMRO49dphC5EYyNdb/FbUlwTBcnBVOgLGDMjHwK8SVVRDf3+/cDwXyLnuYFEYKIKvs7OF8LlJuYMUnFCVU1v0vefgNkhpT6O/bbVRcSxuMDZMoZdkIWlEu269L0E6Uut39bvdg56CtQXLUbohSlc0LbtKmS6LJs2aICgc183u0tzl/+NQVlOEJcPXx6UjkIRMUX6mD9SDTpYJenAvsfw/1H+UxHFIFgk0usTluvq6h00F3aF3jnABqXT4a9FnmiqsggwtMQLIstjRhkotpDAkrhaks1cmlXReJTpwqKr+xTKaxgxAfdloHQSQ5kqIsBCoeCcxEJy5EPUmMTTMzcaF6c7DVD0/xqO0Oq9WkHgOeMQX9qQxz8QRTsvMourI/P39W2cxRSF5ZWVmO26V9MDc540S8CgkXkAWHA/TLENtpStfSmwlWGfpPnruSXBIMurXy2XBSWS2uSJzzHG2foqerARNFyqqrrf72dKXq8UgAj9LmvPY3rWCjNd4u9etBujDLXZZ9PpLIROeqcuEsDaWl8/nqmizkM7yT45sQi4WqTvGQTkKmTlF8NA3ziaVv3H/Dp6kqBZAU+Cg1eLnu7RaGNMggBcIiUzJBEObTMd4QndbOLkwa5HI0yGP8nrAsAhKCucDz63wJkhE9K6M/p2r4hMwCNDutLsjGJbqf5gqrw7MRyTOCoiICEkM0j8EhlvLa7FbvguD9/+h/g6hZJrMolasCfkW6q9+M68aLnVYXt/fLt5u9/cX35I73hfeVt+11vT2v5/3oHXnHXuT97v3h/eX9vfbP+pfrX69vzV1v31pgPvdqY739L4qSnVk=</latexit>
  • 26. There are lots of opportunities in studying this data. 26 1. Higher-order data is pervasive! We have ways to represent data, and higher-order link prediction is a general framework for comparing comparing models and methods. 2. There is rich static and temporal structure in the datasets we collected. 3. Do you have a better model, algorithm, or data representation? Great! Show us by out-performing these baselines. Data. bit.ly/sc-holp-data Code. bit.ly/sc-holp-code
  • 27. Simplicial closure and higher-order link prediction. 27 Paper. arXiv 1802.06916 Data. bit.ly/sc-holp-data Code. bit.ly/sc-holp-code Slides. bit.ly/arb-SIAMAN18 Austin R. Benson http://cs.cornell.edu/~arb @austinbenson arb@cs.cornell.edu THANKS! i j k
  • 28. Most open triangles do not come from asynchronous temporal behavior. 28 i j k In 61.1% to 97.4% of open triangles, all three pairs of edges have an overlapping period of activity. ⟶ there is an overlapping period of activity between all 3 edges (Helly’s theorem). # overlaps Dataset # open triangles 0 1 2 3 coauth-DBLP 1,295,214 0.012 0.143 0.123 0.722 coauth-MAG-history 96,420 0.002 0.055 0.059 0.884 coauth-MAG-geology 2,494,960 0.010 0.128 0.109 0.753 tags-stack-overflow 300,646,440 0.002 0.067 0.071 0.860 tags-math-sx 2,666,353 0.001 0.040 0.049 0.910 tags-ask-ubuntu 3,288,058 0.002 0.088 0.085 0.825 threads-stack-overflow 99,027,304 0.001 0.034 0.037 0.929 threads-math-sx 11,294,665 0.001 0.038 0.039 0.922 threads-ask-ubuntu 136,374 0.000 0.020 0.023 0.957 NDC-substances 1,136,357 0.020 0.196 0.151 0.633 NDC-classes 9,064 0.022 0.191 0.136 0.652 DAWN 5,682,552 0.027 0.216 0.155 0.602 congress-committees 190,054 0.001 0.046 0.058 0.895 congress-bills 44,857,465 0.003 0.063 0.113 0.821 email-Enron 3,317 0.008 0.130 0.151 0.711 email-Eu 234,600 0.010 0.131 0.132 0.727 contact-high-school 31,850 0.000 0.015 0.019 0.966 contact-primary-school 98,621 0.000 0.012 0.014 0.974 music-rap-genius 70,057 0.028 0.221 0.141 0.611
  • 29. Simplicial closure probability on 4 nodes has similar behavior to those with 3 nodes,just“up one dimension”. LA/OPT Seminar 29 Take first 80% of the data (in time),record the configuration of every 4 nodes, and compute the fraction that simplicially close in the final 20% of the data. Increased edge density increases closure probability. Increased simplicial tie strength increases closure probability. Tension between simplicial density and simplicial tie strength.