SlideShare a Scribd company logo
1 of 39
Simplicial closure and
higher-order link
prediction
Austin R. Benson · Cornell
LA/OPT Seminar · April 26, 2018
LA/OPT Seminar 1
bit.ly/sc-holp-code bit.ly/arb-laopt-2018
bit.ly/sc-holp-data arXiv 1802.06916
Joint work with
Rediet Abebe & Jon Kleinberg
(Cornell)
Michael Schaub & Ali Jadbabaie (MIT)
Networks are sets of nodes and edges (graphs)
that model real-world systems.
LA/OPT Seminar 2
Collaboration
nodes are
people/groups
edges link entities
working together
Communications
nodes are
people/accounts
edges show info.
exchange
Physical proximity
nodes are people/animals
edges link those that
interact in close proximity
Drug compounds
nodes are substances
edge between substances
that appear in the same drug
Real-world systems are often composed from
“higher-order” interactions that we reduce to
pairwise ones.
LA/OPT Seminar 3
Collaboration
nodes are
people/groups
teams are made up of
small groups
Communications
nodes are
people/accounts
emails often have several
recipients, not just one
Physical proximity
nodes are people/animals
people often gather in
small groups
Drug compounds
nodes are substances
drugs are made up of
several substances
There are many ways to mathematically represent
the higher-order structure present in relational
data.
LA/OPT Seminar 4
• Hypergraphs [Berge 89]
• Set systems [Frankl 95]
• Affiliation networks [Feld 81, Newman-Watts-Strogatz 02]
• Abstract simplicial complexes [Lim 15, Osting-Palande-Wang 17]
• Multilayer networks [Kivelä+ 14, Boccaletti+ 14, many others…]
• Meta-paths [Sun-Han 12]
• Motif-based representations [Benson-Gleich-Leskovec 15, 17]
• …
LA/OPT Seminar 5
However, there are problems with higher-order data
representation…
1. Researchers and practitioners often don’t use it.
Graphs are easier and we already have lots of graph analysis tools.
2. We don’t have good frameworks for evaluating them
(especially for what they can do beyond graphs).
A goal with this research is to provide a framework through
which higher-order models can be evaluated.
Link prediction is a classical machine learning
problem in network science, which is used to
evaluate models.
LA/OPT Seminar 6
We observe data which is a list of edges in a graph up to some point t.
We want to predict which new edges will form in the future.
Shows up in a variety of applications
• Predicting new social relationships and friend recommendation.
[Backstrom-Leskovec 11; Wang+ 15]
• Inferring new links between genes and diseases.
[Wang-Gulbahce-Yu 11; Moreau-Tranchevent 12]
• Suggesting novel connections in the scientific community.
[Liben-Nowell-Kleinberg 07; Tang-Wu-Sun-Su 12]
Also useful as a framework to evaluate new methods!
[Liben-Nowell-Kleinberg 07; Lü-Zhau 11]
We propose “higher-order link prediction” as a
similar framework for evaluation of higher-order
models.
LA/OPT Seminar 7
Data.
Observe simplices up to some
time t. Using this data, want to
predict what groups of > 2
nodes will appear in a simplex
in the future.
t
1
2
3
4
5
6
7
8
9
We predict structure that classical link
prediction would not even consider!
Possible applications
• Novel combinations of drugs for treatments.
• Group chat recommendation in social networks.
• Team formation.
Thinking of higher-order data as a weighted
projected graph with “filled in” structures is a
convenient viewpoint.
LA/OPT Seminar 8
1
2
3
4
5
6
7
8
9
Data. Pictures to have in mind.
Generalized means of edges weights are often
good predictors of new 3-node simplices
appearing.
LA/OPT Seminar 9
Good performance from this local information is a deviation from classical link prediction,
where methods that use long paths (e.g., PageRank) perform well [Liben-Nowell & Kleinberg
07].
For structures on k nodes, the subsets of size k-1 contain rich information only when k > 2.
i
j k
Wij
Wjk
Wjk
i
j k
?
LA/OPT Seminar 10
Three parts of this talk.
1. Basic study of datasets.
How diverse are the types of datasets that we encounter?
2. Simplicial closure.
How do we characterize the ways in which new simplices appear?
3. Higher-order link prediction.
How do we use insights to predict which new simplices appear?
Our datasets are timestamped simplices, where
each simplex is a subset of nodes.
LA/OPT Seminar 11
1. Co-authorship in different domains.
2. Emails with multiple recipients.
3. Tags on Q&A forums.
4. Threads on Q&A forums.
5. Contact/proximity measurements.
6. Musical artist collaboration.
7. Substance makeup and classification
codes applied to drugs the FDA
examines.
8. U.S. Congress committee
memberships and bill sponsorship.
9. Combinations of drugs seen in
patients in ER visits.
https://math.stackexchange.com/q/80181
Thinking of higher-order data as a weighted
projected graph with “filled in” structures is a
convenient viewpoint.
LA/OPT Seminar 12
1
2
3
4
5
6
7
8
9
Data. Pictures to have in mind.
LA/OPT Seminar 13
i
j k
i
j k
Warm-up. What’s more common?
or
“Open triangle”
each pair has been in a
simplex together but all 3
nodes have never been in the
same simplex
“Closed triangle”
there is some simplex that
contains all 3 nodes
There is lots of variation in the fraction of triangles
that are open, but datasets from the same domain
are similar.
LA/OPT Seminar 14
Most open triangles do not come from
asynchronous temporal behavior.
LA/OPT Seminar 15
i
j k
In 61.1% to 97.4% of open
triangles, all three pairs of edges
have an overlapping period of
activity.
⟶ there is an overlapping period
of
activity between all 3 edges
(Helly’s theorem).
A simple model can account for open triangle
variation.
LA/OPT Seminar 16
Consider a dataset with n nodes.
Form a 3-node simplex between nodes i, j, k with prob. p = 1 / nb i.i.d.
Thus, we always get ϴ(pn3) = ϴ(n3 - b) closed triangles in expectation.
Proposition (rough).
If b < 1, then we get ϴ(n3) open triangles in expectation for large n.
If b > 1, then we get ϴ(n3(2-b)) open triangles in expectation for large n.
⟶ The number of open triangles grows faster for b < 3/2.
A simple model can account for open triangle
variation.
LA/OPT Seminar 17
Consider a dataset with n nodes.
Form a 3-node simplex between nodes i, j, k with prob. p = 1 / nb i.i.d.
b = 0.8, 0.82, 0.84, ..., 1.8
Larger b is darker marker.
Summary of structure of datasets.
LA/OPT Seminar 18
1. Datasets with higher-order structure are common.
2. It’s useful to think of them as “projected graphs” with additional filled
in structure (like a simplicial complex).
3. Perhaps surprisingly, several datasets have many open triangles.
Not due to asynchronous temporal behavior.
A simple model can give a range of open triangles.
LA/OPT Seminar 19
Three parts of this talk.
1. Basic study of datasets.
How diverse are the types of datasets that we encounter?
2. Simplicial closure.
How do we characterize the ways in which new simplices appear?
3. Higher-order link prediction.
How do we use insights to predict which new simplices appear?
1
2+1
1
2+
1
1
1
1
2+
2+
2+
1
1
2+
2+
1
2+
2+
2+
58,749
245,996
13,034
74,219
45,71514,541
13,034
7,575
13,098
5,781
32,617
773
5,029
3,56021,103952
11,579
445
5,029
389
9,674
618
2,977
285
0
2,732,839
0
157,236
0
66,644
0
7,987
0
8,844
0
328
0
3,171
0
779
0
722
0
285
Simplicial closure is the appearance of a group of
nodes appearing together in a simplex for the first
time.
LA/OPT Seminar 20
Co-authorship data of authors publishing in history.
Simplicial closure is the appearance of a group of
nodes appearing together in a simplex for the first
time.
LA/OPT Seminar 21
H IV prot ease
inhibit ors
UGT 1A 1
inhibit ors
Breast cancer resist ance
prot ein inhibit ors
1
2+
2+
1
2+
1
1
2+
2+
1
2+
2+
2+
Reyat az
RedPharm
2003
Reyat az
Squibb & Sons
2003
K alet ra
Physicians To-
t al Care
2006
Promact a
GSK (25mg)
2008
Promact a
GSK (50mg)
2008
K alet ra
D OH Cent ral
Pharmacy
2009
Evot az
Squibb & Sons
2015
Substances in marketed drugs recorded in the National Drug Code
directory.
icons
colors 16.04
1
2+
2+
1
2+
2+
H ow can I change
t he icon colors, ap-
pearance, et c. at
t he t op panel?
2011
H ow do I change
t he icon and t ext
color?
2012
U bunt u 15.10
/ 16.04 t heme
doesn’t change
2016
U bunt u 16.04
Eclipse launcher
icon problems
2016
Set deskt op icons background
color K ubunt u 16.04
2016
Tags on askubuntu.com forum questions.
Simplicial closure probability on 3 nodes largely
depends on the weights in the projected network.
LA/OPT Seminar 22
Take first 80% of the data (in time), record the configuration of every triple of
nodes, and compute the fraction that simplicially close in the final 20% of the data.
Increased edge density
increases closure
probability.
Increased tie strength
increases closure
probability.
Tension between
edge density and tie
strength.
Left and middle observations are consistent with theory and empirical studies of social
networks. [Granovetter 73; Leskovec+ 08; Backstrom+ 06; Kossinets-Watts 06]
Simplicial closure probability on 4 nodes has
similar behavior to those with 3 nodes, just “up
one dimension”.
LA/OPT Seminar 23
Take first 80% of the data (in time), record the configuration of every 4
nodes, and compute the fraction that simplicially close in the final 20% of
the data.
Increased edge density
increases closure
probability.
Increased simplicial tie
strength increases
closure probability.
Tension between
simplicial density
and simplicial tie
strength.
Induced count = c - 2 * # - 2 * # - #
- 3 * # - 3 * # - 4 * #
Computing simplicial closure probabilities
requires some clever counting algorithms.
LA/OPT Seminar 24
How many sets of 4 nodes induce the structure?
1
2+
Assume we can enumerate 3- and 4-cliques and quickly
determine their “simplicial structure”.
1
2+1
1
2+ 1
2+
2+
1
1
1
2+
1
1
2+
2+
1
2+
2+
2+
Summary of simplicial closure.
LA/OPT Seminar 25
1. Useful to think of “trajectories” of sets of nodes in the projected graph.
2. More edges between group of nodes ⟶ more likely to close in future.
True for both 3-node and 4-node groups!
3. Stronger ties between groups of nodes ⟶ also more likely to close.
True for both 3-node and 4-node groups!
LA/OPT Seminar 26
Three parts of this talk.
1. Basic study of datasets.
How diverse are the types of datasets that we encounter?
2. Simplicial closure.
How do we characterize the ways in which new simplices appear?
3. Higher-order link prediction.
How do we use insights to predict which new simplices appear?
We propose “higher-order link prediction” as a
similar framework for evaluation of higher-order
models.
LA/OPT Seminar 27
Data.
Observe simplices up to some
time t. Using this data, want to
predict what groups of > 2
nodes will appear in a simplex
in the future.
t
1
2
3
4
5
6
7
8
9
We predict structure that classical link
prediction would not even consider!
LA/OPT Seminar 28
Our structural analysis tells us what we should be
looking at for prediction.
1. Edge density is a positive indicator.
⟶ focus our attention on predicting “open triangles”
2. Tie strength is a positive indicator.
⟶ various ways of incorporating this information
i
j k
Wij
Wjk
Wjk
LA/OPT Seminar 29
For every open triangle, we assign a score
function on first 80% of data based on structural
properties.
Four broad classes of score functions for
an open triangle. Score is s(i, j, k).
s(i, j, k)…
1. is a function of Wij, Wjk, Wjk
2. is a function of A[:, [i, j, k]] and the
simplices containing these nodes
3. is built from “whole-network”
similarity scores on edges
4. is “learned” from data
i
j k
Wij
Wjk
Wjk
After computing scores, predict that open triangles with highest
scores will simplicially close in final 20% of data.
LA/OPT Seminar 30
1. s(i, j, k) is a function of Wij , Wjk , and Wjk
i
j k
Wij
Wjk
Wjk
1. Arithmetic mean
2. Geometric mean
3. Harmonic mean
4. Generalized mean
LA/OPT Seminar 31
2. s(i, j, k) is a function of is a function of A[:, [i, j,
k]] and the related simplicial information
i
j k
Wij
Wjk
Wjk
1. Common 4-th neighbors
2. Generalized Jaccard coefficient
3. Preferential attachment (graph)
4. Preferential attachment (simplices)
LA/OPT Seminar 32
3. s(i, j, k) is is built from “whole-network”
similarity scores on edges: s(i, j, k) = Sij + Sji + Sjk +
Skj + Sik + Ski
i
j k
Wij
Wjk
Wjk
1. PageRank (unweighted or
weighted)
2. Katz (unweighted or weighted)
3. Fancier topological methods based
on lifted random walks
(work in progress…)
LA/OPT Seminar 33
Hopefully Michael chimed in on matrix inverses…
Want to reduce
computational cost
(only need scores on
open triangles).
For each node i that participates in an open triangle
1. Solve
2. Store ith column
using GMRES with low tolerance
LA/OPT Seminar 34
4. s(i, j, k) is learned from data
1. Split data into training and validation sets.
2. Compute features of (i, j, k) from previous ideas using training data.
3. Throw features + validation labels into machine learning blender
→ learn model.
4. Re-compute features on combined training + validation
→ apply model on the data.
LA/OPT Seminar 35
LA/OPT Seminar 36
A few lessons learned from applying all of these
ideas.
1. We can predict pretty well on all datasets using some method.
→ 4x to 107x better than random w/r/t mean average
precision
depending on the dataset/method
2. On some classes of datasets, we can consistently predict well.
→ thread co-participation and co-tagging on stack exchange
3. Simply averaging Wij, Wjk, and Wik consistently performs well.
→ something between harmonic and geometric mean is
consistently best
i
j k
Wij
Wjk
Wjk
Generalized means of edges weights are often
good predictors of new 3-node simplices
appearing.
LA/OPT Seminar 37
Good performance from this local information is a deviation from classical link prediction,
where methods that use long paths (e.g., PageRank) perform well [Liben-Nowell & Kleinberg
07].
For structures on k nodes, the subsets of size k-1 contain rich information only when k > 2.
i
j k
Wij
Wjk
Wjk
i
j k
?
LA/OPT Seminar 38
Shameless plug for SIAM ALA 2018!
David Gleich and I are organizing a minitutorial.
Tensor Eigenvectors and Stochastic Processes
May 6, 10:45am – 12:45pm
Learn about how stochastics help us understand
problems in numerical multilinear algebra!
It’s a minitutorial—little background is assumed.
Lots of open problems for ICME students to solve.
Simplicial closure and higher-order link prediction.
LA/OPT Seminar 39
Paper. arXiv 1802.06916 Code. bit.ly/sc-holp-code
Data. bit.ly/sc-holp-data Slides. bit.ly/arb-laopt-
2018
THANKS!
http://cs.cornell.edu/~arb
@austinbenson
arb@cs.cornell.edu
1. We want higher-order models to
be pervasive.
2. Higher-order link prediction is a
way to compare models.
3. Have a better model or
algorithm? Great! Show us on
the data.
4. Lots of interesting structural
information in these datasets.

More Related Content

What's hot

Higher-order Link Prediction GraphEx
Higher-order Link Prediction GraphExHigher-order Link Prediction GraphEx
Higher-order Link Prediction GraphExAustin Benson
 
Link prediction in networks with core-fringe structure
Link prediction in networks with core-fringe structureLink prediction in networks with core-fringe structure
Link prediction in networks with core-fringe structureAustin Benson
 
Semi-supervised learning of edge flows
Semi-supervised learning of edge flowsSemi-supervised learning of edge flows
Semi-supervised learning of edge flowsAustin Benson
 
Higher-order Link Prediction Syracuse
Higher-order Link Prediction SyracuseHigher-order Link Prediction Syracuse
Higher-order Link Prediction SyracuseAustin Benson
 
Sampling methods for counting temporal motifs
Sampling methods for counting temporal motifsSampling methods for counting temporal motifs
Sampling methods for counting temporal motifsAustin Benson
 
Unsupervised Learning of a Social Network from a Multiple-Source News Corpus
Unsupervised Learning of a Social Network from a Multiple-Source News CorpusUnsupervised Learning of a Social Network from a Multiple-Source News Corpus
Unsupervised Learning of a Social Network from a Multiple-Source News Corpushtanev
 
Sequences of Sets KDD '18
Sequences of Sets KDD '18Sequences of Sets KDD '18
Sequences of Sets KDD '18Austin Benson
 
An Efficient Modified Common Neighbor Approach for Link Prediction in Social ...
An Efficient Modified Common Neighbor Approach for Link Prediction in Social ...An Efficient Modified Common Neighbor Approach for Link Prediction in Social ...
An Efficient Modified Common Neighbor Approach for Link Prediction in Social ...IOSR Journals
 
Choosing to grow a graph
Choosing to grow a graphChoosing to grow a graph
Choosing to grow a graphAustin Benson
 
TRANSFORMATION RULES FOR BUILDING OWL ONTOLOGIES FROM RELATIONAL DATABASES
TRANSFORMATION RULES FOR BUILDING OWL ONTOLOGIES FROM RELATIONAL DATABASESTRANSFORMATION RULES FOR BUILDING OWL ONTOLOGIES FROM RELATIONAL DATABASES
TRANSFORMATION RULES FOR BUILDING OWL ONTOLOGIES FROM RELATIONAL DATABASEScsandit
 
Supporting scientific discovery through linkages of literature and data
Supporting scientific discovery through linkages of literature and dataSupporting scientific discovery through linkages of literature and data
Supporting scientific discovery through linkages of literature and dataDon Pellegrino
 
An information-theoretic, all-scales approach to comparing networks
An information-theoretic, all-scales approach to comparing networksAn information-theoretic, all-scales approach to comparing networks
An information-theoretic, all-scales approach to comparing networksJim Bagrow
 
Equation 2.doc
Equation 2.docEquation 2.doc
Equation 2.docbutest
 
BookyScholia: A Methodology for the Investigation of Expert Systems
BookyScholia: A Methodology for the  Investigation of Expert SystemsBookyScholia: A Methodology for the  Investigation of Expert Systems
BookyScholia: A Methodology for the Investigation of Expert Systemsijcnac
 
Adaptive named entity recognition for social network analysis and domain onto...
Adaptive named entity recognition for social network analysis and domain onto...Adaptive named entity recognition for social network analysis and domain onto...
Adaptive named entity recognition for social network analysis and domain onto...Cuong Tran Van
 
DESIGN METHODOLOGY FOR RELATIONAL DATABASES: ISSUES RELATED TO TERNARY RELATI...
DESIGN METHODOLOGY FOR RELATIONAL DATABASES: ISSUES RELATED TO TERNARY RELATI...DESIGN METHODOLOGY FOR RELATIONAL DATABASES: ISSUES RELATED TO TERNARY RELATI...
DESIGN METHODOLOGY FOR RELATIONAL DATABASES: ISSUES RELATED TO TERNARY RELATI...ijdms
 
Heterogeneous fuzzy xml data integration based on structrual and semantic sim...
Heterogeneous fuzzy xml data integration based on structrual and semantic sim...Heterogeneous fuzzy xml data integration based on structrual and semantic sim...
Heterogeneous fuzzy xml data integration based on structrual and semantic sim...Amir Shokri
 
Predicting_new_friendships_in_social_networks
Predicting_new_friendships_in_social_networksPredicting_new_friendships_in_social_networks
Predicting_new_friendships_in_social_networksAnvardh Nanduri
 

What's hot (20)

Higher-order Link Prediction GraphEx
Higher-order Link Prediction GraphExHigher-order Link Prediction GraphEx
Higher-order Link Prediction GraphEx
 
Link prediction in networks with core-fringe structure
Link prediction in networks with core-fringe structureLink prediction in networks with core-fringe structure
Link prediction in networks with core-fringe structure
 
Semi-supervised learning of edge flows
Semi-supervised learning of edge flowsSemi-supervised learning of edge flows
Semi-supervised learning of edge flows
 
Higher-order Link Prediction Syracuse
Higher-order Link Prediction SyracuseHigher-order Link Prediction Syracuse
Higher-order Link Prediction Syracuse
 
Sampling methods for counting temporal motifs
Sampling methods for counting temporal motifsSampling methods for counting temporal motifs
Sampling methods for counting temporal motifs
 
Unsupervised Learning of a Social Network from a Multiple-Source News Corpus
Unsupervised Learning of a Social Network from a Multiple-Source News CorpusUnsupervised Learning of a Social Network from a Multiple-Source News Corpus
Unsupervised Learning of a Social Network from a Multiple-Source News Corpus
 
Sequences of Sets KDD '18
Sequences of Sets KDD '18Sequences of Sets KDD '18
Sequences of Sets KDD '18
 
An Efficient Modified Common Neighbor Approach for Link Prediction in Social ...
An Efficient Modified Common Neighbor Approach for Link Prediction in Social ...An Efficient Modified Common Neighbor Approach for Link Prediction in Social ...
An Efficient Modified Common Neighbor Approach for Link Prediction in Social ...
 
Choosing to grow a graph
Choosing to grow a graphChoosing to grow a graph
Choosing to grow a graph
 
TRANSFORMATION RULES FOR BUILDING OWL ONTOLOGIES FROM RELATIONAL DATABASES
TRANSFORMATION RULES FOR BUILDING OWL ONTOLOGIES FROM RELATIONAL DATABASESTRANSFORMATION RULES FOR BUILDING OWL ONTOLOGIES FROM RELATIONAL DATABASES
TRANSFORMATION RULES FOR BUILDING OWL ONTOLOGIES FROM RELATIONAL DATABASES
 
Classement Leiden Ranking
Classement Leiden RankingClassement Leiden Ranking
Classement Leiden Ranking
 
Supporting scientific discovery through linkages of literature and data
Supporting scientific discovery through linkages of literature and dataSupporting scientific discovery through linkages of literature and data
Supporting scientific discovery through linkages of literature and data
 
An information-theoretic, all-scales approach to comparing networks
An information-theoretic, all-scales approach to comparing networksAn information-theoretic, all-scales approach to comparing networks
An information-theoretic, all-scales approach to comparing networks
 
Equation 2.doc
Equation 2.docEquation 2.doc
Equation 2.doc
 
BookyScholia: A Methodology for the Investigation of Expert Systems
BookyScholia: A Methodology for the  Investigation of Expert SystemsBookyScholia: A Methodology for the  Investigation of Expert Systems
BookyScholia: A Methodology for the Investigation of Expert Systems
 
Adaptive named entity recognition for social network analysis and domain onto...
Adaptive named entity recognition for social network analysis and domain onto...Adaptive named entity recognition for social network analysis and domain onto...
Adaptive named entity recognition for social network analysis and domain onto...
 
DESIGN METHODOLOGY FOR RELATIONAL DATABASES: ISSUES RELATED TO TERNARY RELATI...
DESIGN METHODOLOGY FOR RELATIONAL DATABASES: ISSUES RELATED TO TERNARY RELATI...DESIGN METHODOLOGY FOR RELATIONAL DATABASES: ISSUES RELATED TO TERNARY RELATI...
DESIGN METHODOLOGY FOR RELATIONAL DATABASES: ISSUES RELATED TO TERNARY RELATI...
 
Heterogeneous fuzzy xml data integration based on structrual and semantic sim...
Heterogeneous fuzzy xml data integration based on structrual and semantic sim...Heterogeneous fuzzy xml data integration based on structrual and semantic sim...
Heterogeneous fuzzy xml data integration based on structrual and semantic sim...
 
Predicting_new_friendships_in_social_networks
Predicting_new_friendships_in_social_networksPredicting_new_friendships_in_social_networks
Predicting_new_friendships_in_social_networks
 
OntoFrac-S
OntoFrac-SOntoFrac-S
OntoFrac-S
 

Similar to Simplicial closure and higher-order link prediction LA/OPT

Simplicial closure and simplicial diffusions
Simplicial closure and simplicial diffusionsSimplicial closure and simplicial diffusions
Simplicial closure and simplicial diffusionsAustin Benson
 
Simplicial closure and higher-order link prediction --- SIAMNS18
Simplicial closure and higher-order link prediction --- SIAMNS18Simplicial closure and higher-order link prediction --- SIAMNS18
Simplicial closure and higher-order link prediction --- SIAMNS18Austin Benson
 
Protein-protein interactions-graph-theoretic-modeling
Protein-protein interactions-graph-theoretic-modelingProtein-protein interactions-graph-theoretic-modeling
Protein-protein interactions-graph-theoretic-modelingRangarajan Chari
 
Simplicial closure & higher-order link prediction
Simplicial closure & higher-order link predictionSimplicial closure & higher-order link prediction
Simplicial closure & higher-order link predictionAustin Benson
 
Summary Of Thesis
Summary Of ThesisSummary Of Thesis
Summary Of Thesisguestb452d6
 
Socialnetworkanalysis (Tin180 Com)
Socialnetworkanalysis (Tin180 Com)Socialnetworkanalysis (Tin180 Com)
Socialnetworkanalysis (Tin180 Com)Tin180 VietNam
 
Engineering Data Science Objectives for Social Network Analysis
Engineering Data Science Objectives for Social Network AnalysisEngineering Data Science Objectives for Social Network Analysis
Engineering Data Science Objectives for Social Network AnalysisDavid Gleich
 
Master defence 2020 - Serhii Brodiuk - Concept Embedding and Network Analysis...
Master defence 2020 - Serhii Brodiuk - Concept Embedding and Network Analysis...Master defence 2020 - Serhii Brodiuk - Concept Embedding and Network Analysis...
Master defence 2020 - Serhii Brodiuk - Concept Embedding and Network Analysis...Lviv Data Science Summer School
 
Community detection
Community detectionCommunity detection
Community detectionScott Pauls
 
02 Network Data Collection
02 Network Data Collection02 Network Data Collection
02 Network Data Collectiondnac
 
Carmine gelormini network analysis
Carmine gelormini network analysisCarmine gelormini network analysis
Carmine gelormini network analysisCarmineGelormini
 
Chemistry Reserach as a Social Machine
 Chemistry Reserach as a Social Machine Chemistry Reserach as a Social Machine
Chemistry Reserach as a Social MachineJeremy Frey
 
Maps of sparse memory networks reveal overlapping communities in network flows
Maps of sparse memory networks reveal overlapping communities in network flowsMaps of sparse memory networks reveal overlapping communities in network flows
Maps of sparse memory networks reveal overlapping communities in network flowsUmeå University
 
Community structure in social and biological structures
Community structure in social and biological structuresCommunity structure in social and biological structures
Community structure in social and biological structuresMaxim Boiko Savenko
 
Distribution of maximal clique size of the
Distribution of maximal clique size of theDistribution of maximal clique size of the
Distribution of maximal clique size of theIJCNCJournal
 
On Machine Learning and Data Mining
On Machine Learning and Data MiningOn Machine Learning and Data Mining
On Machine Learning and Data Miningbutest
 
“The Epistemic Cultures of Single Molecule Biophysics: Participation, Observa...
“The Epistemic Cultures of Single Molecule Biophysics: Participation, Observa...“The Epistemic Cultures of Single Molecule Biophysics: Participation, Observa...
“The Epistemic Cultures of Single Molecule Biophysics: Participation, Observa...Christine Luk
 
A Survey On Link Prediction In Social Networks
A Survey On Link Prediction In Social NetworksA Survey On Link Prediction In Social Networks
A Survey On Link Prediction In Social NetworksApril Smith
 

Similar to Simplicial closure and higher-order link prediction LA/OPT (20)

Simplicial closure and simplicial diffusions
Simplicial closure and simplicial diffusionsSimplicial closure and simplicial diffusions
Simplicial closure and simplicial diffusions
 
Simplicial closure and higher-order link prediction --- SIAMNS18
Simplicial closure and higher-order link prediction --- SIAMNS18Simplicial closure and higher-order link prediction --- SIAMNS18
Simplicial closure and higher-order link prediction --- SIAMNS18
 
Protein-protein interactions-graph-theoretic-modeling
Protein-protein interactions-graph-theoretic-modelingProtein-protein interactions-graph-theoretic-modeling
Protein-protein interactions-graph-theoretic-modeling
 
Simplicial closure & higher-order link prediction
Simplicial closure & higher-order link predictionSimplicial closure & higher-order link prediction
Simplicial closure & higher-order link prediction
 
Summary Of Thesis
Summary Of ThesisSummary Of Thesis
Summary Of Thesis
 
Socialnetworkanalysis (Tin180 Com)
Socialnetworkanalysis (Tin180 Com)Socialnetworkanalysis (Tin180 Com)
Socialnetworkanalysis (Tin180 Com)
 
Engineering Data Science Objectives for Social Network Analysis
Engineering Data Science Objectives for Social Network AnalysisEngineering Data Science Objectives for Social Network Analysis
Engineering Data Science Objectives for Social Network Analysis
 
Master defence 2020 - Serhii Brodiuk - Concept Embedding and Network Analysis...
Master defence 2020 - Serhii Brodiuk - Concept Embedding and Network Analysis...Master defence 2020 - Serhii Brodiuk - Concept Embedding and Network Analysis...
Master defence 2020 - Serhii Brodiuk - Concept Embedding and Network Analysis...
 
Community detection
Community detectionCommunity detection
Community detection
 
02 Network Data Collection
02 Network Data Collection02 Network Data Collection
02 Network Data Collection
 
02 Network Data Collection (2016)
02 Network Data Collection (2016)02 Network Data Collection (2016)
02 Network Data Collection (2016)
 
Carmine gelormini network analysis
Carmine gelormini network analysisCarmine gelormini network analysis
Carmine gelormini network analysis
 
Chemistry Reserach as a Social Machine
 Chemistry Reserach as a Social Machine Chemistry Reserach as a Social Machine
Chemistry Reserach as a Social Machine
 
Maps of sparse memory networks reveal overlapping communities in network flows
Maps of sparse memory networks reveal overlapping communities in network flowsMaps of sparse memory networks reveal overlapping communities in network flows
Maps of sparse memory networks reveal overlapping communities in network flows
 
Community structure in social and biological structures
Community structure in social and biological structuresCommunity structure in social and biological structures
Community structure in social and biological structures
 
Phylogenetics
PhylogeneticsPhylogenetics
Phylogenetics
 
Distribution of maximal clique size of the
Distribution of maximal clique size of theDistribution of maximal clique size of the
Distribution of maximal clique size of the
 
On Machine Learning and Data Mining
On Machine Learning and Data MiningOn Machine Learning and Data Mining
On Machine Learning and Data Mining
 
“The Epistemic Cultures of Single Molecule Biophysics: Participation, Observa...
“The Epistemic Cultures of Single Molecule Biophysics: Participation, Observa...“The Epistemic Cultures of Single Molecule Biophysics: Participation, Observa...
“The Epistemic Cultures of Single Molecule Biophysics: Participation, Observa...
 
A Survey On Link Prediction In Social Networks
A Survey On Link Prediction In Social NetworksA Survey On Link Prediction In Social Networks
A Survey On Link Prediction In Social Networks
 

More from Austin Benson

Hypergraph Cuts with General Splitting Functions (JMM)
Hypergraph Cuts with General Splitting Functions (JMM)Hypergraph Cuts with General Splitting Functions (JMM)
Hypergraph Cuts with General Splitting Functions (JMM)Austin Benson
 
Spectral embeddings and evolving networks
Spectral embeddings and evolving networksSpectral embeddings and evolving networks
Spectral embeddings and evolving networksAustin Benson
 
Hypergraph Cuts with General Splitting Functions
Hypergraph Cuts with General Splitting FunctionsHypergraph Cuts with General Splitting Functions
Hypergraph Cuts with General Splitting FunctionsAustin Benson
 
Hypergraph Cuts with General Splitting Functions
Hypergraph Cuts with General Splitting FunctionsHypergraph Cuts with General Splitting Functions
Hypergraph Cuts with General Splitting FunctionsAustin Benson
 
Random spatial network models for core-periphery structure
Random spatial network models for core-periphery structureRandom spatial network models for core-periphery structure
Random spatial network models for core-periphery structureAustin Benson
 
Random spatial network models for core-periphery structure.
Random spatial network models for core-periphery structure.Random spatial network models for core-periphery structure.
Random spatial network models for core-periphery structure.Austin Benson
 
Higher-order clustering in networks
Higher-order clustering in networksHigher-order clustering in networks
Higher-order clustering in networksAustin Benson
 
New perspectives on measuring network clustering
New perspectives on measuring network clusteringNew perspectives on measuring network clustering
New perspectives on measuring network clusteringAustin Benson
 
Higher-order spectral graph clustering with motifs
Higher-order spectral graph clustering with motifsHigher-order spectral graph clustering with motifs
Higher-order spectral graph clustering with motifsAustin Benson
 
Tensor Eigenvectors and Stochastic Processes
Tensor Eigenvectors and Stochastic ProcessesTensor Eigenvectors and Stochastic Processes
Tensor Eigenvectors and Stochastic ProcessesAustin Benson
 
Spacey random walks CMStatistics 2017
Spacey random walks CMStatistics 2017Spacey random walks CMStatistics 2017
Spacey random walks CMStatistics 2017Austin Benson
 
Spacey random walks CAM Colloquium
Spacey random walks CAM ColloquiumSpacey random walks CAM Colloquium
Spacey random walks CAM ColloquiumAustin Benson
 

More from Austin Benson (12)

Hypergraph Cuts with General Splitting Functions (JMM)
Hypergraph Cuts with General Splitting Functions (JMM)Hypergraph Cuts with General Splitting Functions (JMM)
Hypergraph Cuts with General Splitting Functions (JMM)
 
Spectral embeddings and evolving networks
Spectral embeddings and evolving networksSpectral embeddings and evolving networks
Spectral embeddings and evolving networks
 
Hypergraph Cuts with General Splitting Functions
Hypergraph Cuts with General Splitting FunctionsHypergraph Cuts with General Splitting Functions
Hypergraph Cuts with General Splitting Functions
 
Hypergraph Cuts with General Splitting Functions
Hypergraph Cuts with General Splitting FunctionsHypergraph Cuts with General Splitting Functions
Hypergraph Cuts with General Splitting Functions
 
Random spatial network models for core-periphery structure
Random spatial network models for core-periphery structureRandom spatial network models for core-periphery structure
Random spatial network models for core-periphery structure
 
Random spatial network models for core-periphery structure.
Random spatial network models for core-periphery structure.Random spatial network models for core-periphery structure.
Random spatial network models for core-periphery structure.
 
Higher-order clustering in networks
Higher-order clustering in networksHigher-order clustering in networks
Higher-order clustering in networks
 
New perspectives on measuring network clustering
New perspectives on measuring network clusteringNew perspectives on measuring network clustering
New perspectives on measuring network clustering
 
Higher-order spectral graph clustering with motifs
Higher-order spectral graph clustering with motifsHigher-order spectral graph clustering with motifs
Higher-order spectral graph clustering with motifs
 
Tensor Eigenvectors and Stochastic Processes
Tensor Eigenvectors and Stochastic ProcessesTensor Eigenvectors and Stochastic Processes
Tensor Eigenvectors and Stochastic Processes
 
Spacey random walks CMStatistics 2017
Spacey random walks CMStatistics 2017Spacey random walks CMStatistics 2017
Spacey random walks CMStatistics 2017
 
Spacey random walks CAM Colloquium
Spacey random walks CAM ColloquiumSpacey random walks CAM Colloquium
Spacey random walks CAM Colloquium
 

Recently uploaded

BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxolyaivanovalion
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 

Recently uploaded (20)

BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 

Simplicial closure and higher-order link prediction LA/OPT

  • 1. Simplicial closure and higher-order link prediction Austin R. Benson · Cornell LA/OPT Seminar · April 26, 2018 LA/OPT Seminar 1 bit.ly/sc-holp-code bit.ly/arb-laopt-2018 bit.ly/sc-holp-data arXiv 1802.06916 Joint work with Rediet Abebe & Jon Kleinberg (Cornell) Michael Schaub & Ali Jadbabaie (MIT)
  • 2. Networks are sets of nodes and edges (graphs) that model real-world systems. LA/OPT Seminar 2 Collaboration nodes are people/groups edges link entities working together Communications nodes are people/accounts edges show info. exchange Physical proximity nodes are people/animals edges link those that interact in close proximity Drug compounds nodes are substances edge between substances that appear in the same drug
  • 3. Real-world systems are often composed from “higher-order” interactions that we reduce to pairwise ones. LA/OPT Seminar 3 Collaboration nodes are people/groups teams are made up of small groups Communications nodes are people/accounts emails often have several recipients, not just one Physical proximity nodes are people/animals people often gather in small groups Drug compounds nodes are substances drugs are made up of several substances
  • 4. There are many ways to mathematically represent the higher-order structure present in relational data. LA/OPT Seminar 4 • Hypergraphs [Berge 89] • Set systems [Frankl 95] • Affiliation networks [Feld 81, Newman-Watts-Strogatz 02] • Abstract simplicial complexes [Lim 15, Osting-Palande-Wang 17] • Multilayer networks [Kivelä+ 14, Boccaletti+ 14, many others…] • Meta-paths [Sun-Han 12] • Motif-based representations [Benson-Gleich-Leskovec 15, 17] • …
  • 5. LA/OPT Seminar 5 However, there are problems with higher-order data representation… 1. Researchers and practitioners often don’t use it. Graphs are easier and we already have lots of graph analysis tools. 2. We don’t have good frameworks for evaluating them (especially for what they can do beyond graphs). A goal with this research is to provide a framework through which higher-order models can be evaluated.
  • 6. Link prediction is a classical machine learning problem in network science, which is used to evaluate models. LA/OPT Seminar 6 We observe data which is a list of edges in a graph up to some point t. We want to predict which new edges will form in the future. Shows up in a variety of applications • Predicting new social relationships and friend recommendation. [Backstrom-Leskovec 11; Wang+ 15] • Inferring new links between genes and diseases. [Wang-Gulbahce-Yu 11; Moreau-Tranchevent 12] • Suggesting novel connections in the scientific community. [Liben-Nowell-Kleinberg 07; Tang-Wu-Sun-Su 12] Also useful as a framework to evaluate new methods! [Liben-Nowell-Kleinberg 07; Lü-Zhau 11]
  • 7. We propose “higher-order link prediction” as a similar framework for evaluation of higher-order models. LA/OPT Seminar 7 Data. Observe simplices up to some time t. Using this data, want to predict what groups of > 2 nodes will appear in a simplex in the future. t 1 2 3 4 5 6 7 8 9 We predict structure that classical link prediction would not even consider! Possible applications • Novel combinations of drugs for treatments. • Group chat recommendation in social networks. • Team formation.
  • 8. Thinking of higher-order data as a weighted projected graph with “filled in” structures is a convenient viewpoint. LA/OPT Seminar 8 1 2 3 4 5 6 7 8 9 Data. Pictures to have in mind.
  • 9. Generalized means of edges weights are often good predictors of new 3-node simplices appearing. LA/OPT Seminar 9 Good performance from this local information is a deviation from classical link prediction, where methods that use long paths (e.g., PageRank) perform well [Liben-Nowell & Kleinberg 07]. For structures on k nodes, the subsets of size k-1 contain rich information only when k > 2. i j k Wij Wjk Wjk i j k ?
  • 10. LA/OPT Seminar 10 Three parts of this talk. 1. Basic study of datasets. How diverse are the types of datasets that we encounter? 2. Simplicial closure. How do we characterize the ways in which new simplices appear? 3. Higher-order link prediction. How do we use insights to predict which new simplices appear?
  • 11. Our datasets are timestamped simplices, where each simplex is a subset of nodes. LA/OPT Seminar 11 1. Co-authorship in different domains. 2. Emails with multiple recipients. 3. Tags on Q&A forums. 4. Threads on Q&A forums. 5. Contact/proximity measurements. 6. Musical artist collaboration. 7. Substance makeup and classification codes applied to drugs the FDA examines. 8. U.S. Congress committee memberships and bill sponsorship. 9. Combinations of drugs seen in patients in ER visits. https://math.stackexchange.com/q/80181
  • 12. Thinking of higher-order data as a weighted projected graph with “filled in” structures is a convenient viewpoint. LA/OPT Seminar 12 1 2 3 4 5 6 7 8 9 Data. Pictures to have in mind.
  • 13. LA/OPT Seminar 13 i j k i j k Warm-up. What’s more common? or “Open triangle” each pair has been in a simplex together but all 3 nodes have never been in the same simplex “Closed triangle” there is some simplex that contains all 3 nodes
  • 14. There is lots of variation in the fraction of triangles that are open, but datasets from the same domain are similar. LA/OPT Seminar 14
  • 15. Most open triangles do not come from asynchronous temporal behavior. LA/OPT Seminar 15 i j k In 61.1% to 97.4% of open triangles, all three pairs of edges have an overlapping period of activity. ⟶ there is an overlapping period of activity between all 3 edges (Helly’s theorem).
  • 16. A simple model can account for open triangle variation. LA/OPT Seminar 16 Consider a dataset with n nodes. Form a 3-node simplex between nodes i, j, k with prob. p = 1 / nb i.i.d. Thus, we always get ϴ(pn3) = ϴ(n3 - b) closed triangles in expectation. Proposition (rough). If b < 1, then we get ϴ(n3) open triangles in expectation for large n. If b > 1, then we get ϴ(n3(2-b)) open triangles in expectation for large n. ⟶ The number of open triangles grows faster for b < 3/2.
  • 17. A simple model can account for open triangle variation. LA/OPT Seminar 17 Consider a dataset with n nodes. Form a 3-node simplex between nodes i, j, k with prob. p = 1 / nb i.i.d. b = 0.8, 0.82, 0.84, ..., 1.8 Larger b is darker marker.
  • 18. Summary of structure of datasets. LA/OPT Seminar 18 1. Datasets with higher-order structure are common. 2. It’s useful to think of them as “projected graphs” with additional filled in structure (like a simplicial complex). 3. Perhaps surprisingly, several datasets have many open triangles. Not due to asynchronous temporal behavior. A simple model can give a range of open triangles.
  • 19. LA/OPT Seminar 19 Three parts of this talk. 1. Basic study of datasets. How diverse are the types of datasets that we encounter? 2. Simplicial closure. How do we characterize the ways in which new simplices appear? 3. Higher-order link prediction. How do we use insights to predict which new simplices appear?
  • 21. Simplicial closure is the appearance of a group of nodes appearing together in a simplex for the first time. LA/OPT Seminar 21 H IV prot ease inhibit ors UGT 1A 1 inhibit ors Breast cancer resist ance prot ein inhibit ors 1 2+ 2+ 1 2+ 1 1 2+ 2+ 1 2+ 2+ 2+ Reyat az RedPharm 2003 Reyat az Squibb & Sons 2003 K alet ra Physicians To- t al Care 2006 Promact a GSK (25mg) 2008 Promact a GSK (50mg) 2008 K alet ra D OH Cent ral Pharmacy 2009 Evot az Squibb & Sons 2015 Substances in marketed drugs recorded in the National Drug Code directory. icons colors 16.04 1 2+ 2+ 1 2+ 2+ H ow can I change t he icon colors, ap- pearance, et c. at t he t op panel? 2011 H ow do I change t he icon and t ext color? 2012 U bunt u 15.10 / 16.04 t heme doesn’t change 2016 U bunt u 16.04 Eclipse launcher icon problems 2016 Set deskt op icons background color K ubunt u 16.04 2016 Tags on askubuntu.com forum questions.
  • 22. Simplicial closure probability on 3 nodes largely depends on the weights in the projected network. LA/OPT Seminar 22 Take first 80% of the data (in time), record the configuration of every triple of nodes, and compute the fraction that simplicially close in the final 20% of the data. Increased edge density increases closure probability. Increased tie strength increases closure probability. Tension between edge density and tie strength. Left and middle observations are consistent with theory and empirical studies of social networks. [Granovetter 73; Leskovec+ 08; Backstrom+ 06; Kossinets-Watts 06]
  • 23. Simplicial closure probability on 4 nodes has similar behavior to those with 3 nodes, just “up one dimension”. LA/OPT Seminar 23 Take first 80% of the data (in time), record the configuration of every 4 nodes, and compute the fraction that simplicially close in the final 20% of the data. Increased edge density increases closure probability. Increased simplicial tie strength increases closure probability. Tension between simplicial density and simplicial tie strength.
  • 24. Induced count = c - 2 * # - 2 * # - # - 3 * # - 3 * # - 4 * # Computing simplicial closure probabilities requires some clever counting algorithms. LA/OPT Seminar 24 How many sets of 4 nodes induce the structure? 1 2+ Assume we can enumerate 3- and 4-cliques and quickly determine their “simplicial structure”. 1 2+1 1 2+ 1 2+ 2+ 1 1 1 2+ 1 1 2+ 2+ 1 2+ 2+ 2+
  • 25. Summary of simplicial closure. LA/OPT Seminar 25 1. Useful to think of “trajectories” of sets of nodes in the projected graph. 2. More edges between group of nodes ⟶ more likely to close in future. True for both 3-node and 4-node groups! 3. Stronger ties between groups of nodes ⟶ also more likely to close. True for both 3-node and 4-node groups!
  • 26. LA/OPT Seminar 26 Three parts of this talk. 1. Basic study of datasets. How diverse are the types of datasets that we encounter? 2. Simplicial closure. How do we characterize the ways in which new simplices appear? 3. Higher-order link prediction. How do we use insights to predict which new simplices appear?
  • 27. We propose “higher-order link prediction” as a similar framework for evaluation of higher-order models. LA/OPT Seminar 27 Data. Observe simplices up to some time t. Using this data, want to predict what groups of > 2 nodes will appear in a simplex in the future. t 1 2 3 4 5 6 7 8 9 We predict structure that classical link prediction would not even consider!
  • 28. LA/OPT Seminar 28 Our structural analysis tells us what we should be looking at for prediction. 1. Edge density is a positive indicator. ⟶ focus our attention on predicting “open triangles” 2. Tie strength is a positive indicator. ⟶ various ways of incorporating this information i j k Wij Wjk Wjk
  • 29. LA/OPT Seminar 29 For every open triangle, we assign a score function on first 80% of data based on structural properties. Four broad classes of score functions for an open triangle. Score is s(i, j, k). s(i, j, k)… 1. is a function of Wij, Wjk, Wjk 2. is a function of A[:, [i, j, k]] and the simplices containing these nodes 3. is built from “whole-network” similarity scores on edges 4. is “learned” from data i j k Wij Wjk Wjk After computing scores, predict that open triangles with highest scores will simplicially close in final 20% of data.
  • 30. LA/OPT Seminar 30 1. s(i, j, k) is a function of Wij , Wjk , and Wjk i j k Wij Wjk Wjk 1. Arithmetic mean 2. Geometric mean 3. Harmonic mean 4. Generalized mean
  • 31. LA/OPT Seminar 31 2. s(i, j, k) is a function of is a function of A[:, [i, j, k]] and the related simplicial information i j k Wij Wjk Wjk 1. Common 4-th neighbors 2. Generalized Jaccard coefficient 3. Preferential attachment (graph) 4. Preferential attachment (simplices)
  • 32. LA/OPT Seminar 32 3. s(i, j, k) is is built from “whole-network” similarity scores on edges: s(i, j, k) = Sij + Sji + Sjk + Skj + Sik + Ski i j k Wij Wjk Wjk 1. PageRank (unweighted or weighted) 2. Katz (unweighted or weighted) 3. Fancier topological methods based on lifted random walks (work in progress…)
  • 33. LA/OPT Seminar 33 Hopefully Michael chimed in on matrix inverses… Want to reduce computational cost (only need scores on open triangles). For each node i that participates in an open triangle 1. Solve 2. Store ith column using GMRES with low tolerance
  • 34. LA/OPT Seminar 34 4. s(i, j, k) is learned from data 1. Split data into training and validation sets. 2. Compute features of (i, j, k) from previous ideas using training data. 3. Throw features + validation labels into machine learning blender → learn model. 4. Re-compute features on combined training + validation → apply model on the data.
  • 36. LA/OPT Seminar 36 A few lessons learned from applying all of these ideas. 1. We can predict pretty well on all datasets using some method. → 4x to 107x better than random w/r/t mean average precision depending on the dataset/method 2. On some classes of datasets, we can consistently predict well. → thread co-participation and co-tagging on stack exchange 3. Simply averaging Wij, Wjk, and Wik consistently performs well. → something between harmonic and geometric mean is consistently best i j k Wij Wjk Wjk
  • 37. Generalized means of edges weights are often good predictors of new 3-node simplices appearing. LA/OPT Seminar 37 Good performance from this local information is a deviation from classical link prediction, where methods that use long paths (e.g., PageRank) perform well [Liben-Nowell & Kleinberg 07]. For structures on k nodes, the subsets of size k-1 contain rich information only when k > 2. i j k Wij Wjk Wjk i j k ?
  • 38. LA/OPT Seminar 38 Shameless plug for SIAM ALA 2018! David Gleich and I are organizing a minitutorial. Tensor Eigenvectors and Stochastic Processes May 6, 10:45am – 12:45pm Learn about how stochastics help us understand problems in numerical multilinear algebra! It’s a minitutorial—little background is assumed. Lots of open problems for ICME students to solve.
  • 39. Simplicial closure and higher-order link prediction. LA/OPT Seminar 39 Paper. arXiv 1802.06916 Code. bit.ly/sc-holp-code Data. bit.ly/sc-holp-data Slides. bit.ly/arb-laopt- 2018 THANKS! http://cs.cornell.edu/~arb @austinbenson arb@cs.cornell.edu 1. We want higher-order models to be pervasive. 2. Higher-order link prediction is a way to compare models. 3. Have a better model or algorithm? Great! Show us on the data. 4. Lots of interesting structural information in these datasets.