Modeling and mining complex networks with feature-rich nodes.
Mar. 9, 2017
Slideshow for my PhD dissertation. The core of my work was to analyze the problems of link prediction, label prediction and graph modeling within a single framework of graphs with binary attributes on their nodes.
1. Modeling and Mining Complex Networks
with Feature-Rich Nodes
PhD Candidate: Corrado Monti
Advisor: Paolo Boldi
Laboratory for Web Algorithmics
2. Complex networks with feature-rich nodes
A common scenario:
• Networks describe (directed or undirected) links between objects
• Objects have properties and attributes
3. A word on wording
• Networks = graphs
• Some scientific communities use the former, some the latter
• Complex networks
• Small diameter, scale-free, high clustering coefficient...
• Closer to many real-world networks
• We focus on features of nodes, also known as attributes
4. Complex networks with feature-rich nodes: examples
• A citation network, with each author's interests
[Diagram: a small citation network whose nodes are labeled with sets of interests, e.g. {Neural Network}, {Statistics}, {Statistics, Neural Network}, {Neural Network, Bayesian models}, {Statistics, Bayesian models}, {Neuropsychology, Statistics}.]
6. We need a model
Base idea: links can be explained by the features of the nodes being connected.
• E.g., “Actors” → “Directors”
7. Complex networks with feature-rich nodes
• There is a plethora of models:
• Stochastic Block Model (2001), Mixed-Membership SBM (2008), Infinite Relational Model (2006), Multiplicative Attribute Graphs (2010), ...
• Some of these models...
• ...consider only one feature per node
• ...or can only work with homophily
• That is: feature h and feature k interact only if h = k
• ...or require a great number of parameters and are unable to work with large networks.
10. We need a model
• Able to go beyond homophily
• Able to model overlapping features
• The fewer hidden parameters, the better
• Therefore, we built on: Miller-Griffiths-Jordan, “Nonparametric latent feature models for link prediction”, NIPS 2009.
Challenges:
1. Explore how it can help in mining useful information from feature-rich networks
2. Adapt it to very large graphs
• Millions of nodes, and beyond
11. Our framework
• We have a set of nodes N of size n
• Links L ⊆ N × N
• We have a set of features F containing m features
• A binary node-feature matrix Z of size n × m
• Note: both L and Z can be represented either as (binary) matrices or as graphs.
12. Our framework
P[(i, j) ∈ L] = φ( Σ_h Σ_k Z_{i,h} W_{h,k} Z_{j,k} )
where:
• L is the set of links in the network
• Z is the node-feature matrix
• φ is a monotonically increasing function R → [0, 1]
• φ is a parameter of the model (e.g., a sigmoid function)
• W is an m × m matrix: the latent feature-feature matrix
14. Our framework
P[(i, j) ∈ L] = φ( Σ_h Σ_k Z_{i,h} W_{h,k} Z_{j,k} )
The entries of W indicate how the co-presence of features on the two nodes will influence the presence/absence of a link:
• W_{h,k} > 0 will foster the creation of links from nodes with feature h to nodes with feature k
• W_{h,k} < 0 will discourage the creation of links from nodes with feature h to nodes with feature k
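To fix ideas, here is a minimal Python sketch of this link model: scoring one pair and sampling a whole graph. The function names and the sigmoid choice for φ are illustrative assumptions, not the dissertation's code.

import numpy as np

def sigmoid(x):
    # One admissible choice for the monotone map phi: R -> [0, 1].
    return 1.0 / (1.0 + np.exp(-x))

def link_probability(Z, W, i, j, phi=sigmoid):
    # P[(i, j) in L] = phi( sum_{h,k} Z[i,h] * W[h,k] * Z[j,k] )
    return phi(Z[i] @ W @ Z[j])

def sample_graph(Z, W, phi=sigmoid, rng=None):
    # Each arc (i, j) is an independent Bernoulli draw with parameter
    # phi(z_i^T W z_j); self-loops are included for simplicity.
    rng = np.random.default_rng() if rng is None else rng
    probs = phi(Z @ W @ Z.T)          # n x n matrix of arc probabilities
    return rng.random(probs.shape) < probs

# Toy example: feature 0 "sends" links to feature 1.
Z = np.array([[1., 0.], [0., 1.], [1., 1.]])
W = np.array([[-2., 3.], [-2., -2.]])
print(link_probability(Z, W, 0, 1))   # ~0.95: strong 0 -> 1 attraction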
18. Generative model: node-feature association
We have developed a model for realistic node-feature generation.
A real dataset
• A real node-feature association
• Rows are nodes, columns are features
• Matrix is considered left-ordered
[Plot “Data”: the real node-feature matrix, 27 770 nodes × 21 933 features, left-ordered.]
A synthetic dataset
• Estimate its parameters
• Generate a new synthetic dataset
[Plot “Simulation”: the synthetic node-feature matrix, 27 770 nodes × 22 179 features, left-ordered.]
23. Generative model
• Model based on Miller-Griffiths-Jordan
• Intuition: features are generated in a rich-get-richer fashion
• Specifically, through a statistical process known as the Indian Buffet Process
• Novel aspects:¹
1. Each node has a fitness value representing how much it can spread its features
• We developed algorithms to rebuild these values from data
2. A few, easy-to-interpret, estimable parameters
3. We investigate properties of the generated graphs
¹ P. Boldi, I. Crimaldi, C. Monti. “A network model characterized by a latent attribute structure with competition”. Information Sciences (Elsevier), 2016.
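For intuition, a minimal Python sketch of an Indian Buffet Process sampler follows. The fitness argument is only a hypothetical rendering of the competition mechanism (adoption weighted by the fitness of a feature's current owners), not the published model.

import numpy as np

def indian_buffet(n_nodes, alpha, fitness=None, rng=None):
    # Standard IBP: node i adopts an existing feature k with probability
    # m_k / i (m_k = number of owners of k so far), then creates
    # Poisson(alpha / i) brand-new features.  With uniform fitness this
    # code reduces exactly to that process.
    rng = np.random.default_rng() if rng is None else rng
    fitness = np.ones(n_nodes) if fitness is None else np.asarray(fitness)
    owners = []                        # owners[k] = nodes having feature k
    rows = []                          # features of each node, in order
    for i in range(n_nodes):
        mine = []
        total = fitness[: i + 1].sum()
        for k, who in enumerate(owners):
            # Hypothetical fitness-weighted adoption probability.
            if rng.random() < fitness[who].sum() / total:
                mine.append(k)
                who.append(i)
        for _ in range(rng.poisson(alpha / (i + 1))):
            owners.append([i])         # a brand-new feature, owned by i
            mine.append(len(owners) - 1)
        rows.append(mine)
    Z = np.zeros((n_nodes, len(owners)), dtype=int)
    for i, mine in enumerate(rows):
        Z[i, mine] = 1
    return Z

Z = indian_buffet(100, alpha=3.0)
print(Z.shape, Z.sum(axis=0)[:5])      # rich-get-richer column sums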
28. Generative model: generating the network
We can generate graphs with realistic degree and distance distributions:
[Plots: the degree distribution (log-log) and the distance distribution of a generated graph.]
Why is this useful?
• We can generate synthetic feature-rich graphs, from scratch
• They have realistic global properties
• We can use them for tests
• If an algorithm works on them, we expect it to be useful
whenever our model is valid
31. Inferring feature-feature interaction: Naïve Bayes
A Naïve Bayes approach. Assumptions:
• Fix φ(x) = min(1, exp(x))
• Naïve independence assumptions!
• If N_k is the set of nodes with feature k, then:
W_{h,k} = log( |(N_h × N_k) ∩ L| / (|N_h| · |N_k|) )
• The independence assumptions are untenable in practice
• We will see this in the data later
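In code, this estimator is one pass over the links. A minimal sketch (names are illustrative, not the dissertation's code):

import numpy as np

def naive_bayes_W(Z, links):
    # Closed-form Naive Bayes estimate:
    # W[h,k] = log( |(N_h x N_k) ∩ L| / (|N_h| * |N_k|) ).
    # Z: n x m binary node-feature matrix; links: array-like of arcs (i, j).
    links = np.asarray(links)
    counts = Z[links[:, 0]].T @ Z[links[:, 1]]   # links from h-nodes to k-nodes
    sizes = Z.sum(axis=0)                        # |N_k| for every feature k
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.log(counts / np.outer(sizes, sizes))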
32. Inferring feature-feature interaction: perceptron with kernel
• Node i → node-feature vector z_i, with a 1 for each of its features
• We feed the outer product z_i ⊗ z_j as input to a perceptron
• We ask it to predict 1 ⟺ (i, j) ∈ L and −1 ⟺ (i, j) ∉ L
• Then, the internal state of the perceptron corresponds to the matrix W
[Diagram: the outer product z_i ⊗ z_j, an m × m binary matrix built from the features of nodes i and j, scored against the latent feature-feature matrix W.]
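A minimal sketch of this learner (training loop and names are illustrative; negative examples, i.e. sampled non-links, must be supplied by the caller). Note that the perceptron never needs to materialize the outer product: its score over z_i ⊗ z_j equals z_iᵀ W z_j.

import numpy as np

def train_perceptron_W(Z, examples, epochs=5, lr=1.0):
    # Each example is a triple (i, j, y) with y = +1 for a link and
    # y = -1 for a non-link.  The weight matrix over the outer-product
    # input *is* the estimate of W, since <W, z_i (x) z_j> = z_i^T W z_j.
    m = Z.shape[1]
    W = np.zeros((m, m))
    for _ in range(epochs):
        for i, j, y in examples:
            score = Z[i] @ W @ Z[j]
            if y * score <= 0:                     # mistake-driven update
                W += lr * y * np.outer(Z[i], Z[j])
    return W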
33. Inferring feature-feature interaction: synthetic data
• We can test whether it works on graphs simulated with our model
• Each entry W_{h,k} is randomly generated with a Bernoulli distribution
• φ is a sigmoid
[Plots: precision-recall curves of the Naïve Bayes and perceptron estimators on the synthetic graphs.]
34. Inferring feature-feature interaction: experiments
• Test on a real graph: a citation network
• 18 939 155 papers, 189 465 540 citations
• 47 269 fields of research each paper can be tagged with
• On average, 3.88 fields per paper
[Plots: precision-recall curves of the Naïve Bayes and perceptron estimators on the citation network.]
35. Inferring feature-feature interaction: experiments
Within our framework, we can answer questions like:
“Are the links in a citation graph explained by the fields of research of the authors?”
We can use the area under the precision-recall curve (AUPR) as a measure of this explainability.
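As a sketch of how such a measurement could be carried out (the pairing scheme and names are assumptions, not the dissertation's code): score candidate pairs under the learned W and compute average precision, the standard estimator of AUPR.

import numpy as np
from sklearn.metrics import average_precision_score

def explainability(Z, W, pos_pairs, neg_pairs):
    # Score each pair by z_i^T W z_j and measure how well the scores
    # separate true links from sampled non-links.
    pos, neg = list(pos_pairs), list(neg_pairs)
    pairs = np.array(pos + neg)
    y_true = np.r_[np.ones(len(pos)), np.zeros(len(neg))]
    scores = np.einsum("ph,hk,pk->p", Z[pairs[:, 0]], W, Z[pairs[:, 1]])
    return average_precision_score(y_true, scores)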
36. Inferring feature-feature interaction: experiments
• It depends on the features we choose.
• For the same citation network, we can use the affiliations of the authors:
[Plots: precision-recall curves of the Naïve Bayes and learning-based estimators using affiliations as features.]
(AUPR)            Affiliations    Fields of research   Both
Perceptron-like   .5925 ± .0050   .9175 ± .0002        .9210 ± .0012
Naive Bayes       .5517 ± .0003   .6318 ± .0001        .6345 ± .0002
39. Inferring feature-feature interaction: experiments
It's fundamentally a perceptron =⇒ we can re-adapt...
• ...other perceptron-like techniques
• E.g., passive-aggressive algorithms
• Efficient in practice: 4 µs/link
• ...error bounds from the theoretical analysis of the perceptron.
• E.g., we can prove² that the bound on the number of errors grows with max_{i,j} |F_i| · |F_j| (where |F_i| is the number of features of node i).
² Corrado Monti, Paolo Boldi. “Estimating latent feature-feature interactions in large feature-rich graphs”. Under review, 2017 (arXiv preprint arXiv:1612.00984).
43. Inferring feature-feature interaction: experiments on en.wiki
• We also applied this model to the Wikipedia link network
• A typical, openly available example of a semantic network
• Problem: Wikipedia categories are a mess
• We developed a novel approach to cleanse this categorization³
• Based on harmonic centrality, independent from our model
• Tunable: we can choose the k “best” categories
• Tested against an expert-curated bibliographic classification
• AUC ROC: 94%
³ Paolo Boldi, Corrado Monti. “Cleansing Wikipedia categories using centrality”. Proceedings of the 25th International Conference Companion on World Wide Web (WWW Companion), 2016.
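For reference, harmonic centrality is easy to compute with off-the-shelf tools. A hedged sketch of the idea (function name and graph choice are assumptions, not the paper's exact pipeline):

import networkx as nx

def top_k_categories(category_graph, k):
    # Harmonic centrality: H(v) = sum over u != v of 1 / d(u, v),
    # with 1/inf = 0 for unreachable pairs.  Keep the k most central
    # categories of the (directed) category graph.
    h = nx.harmonic_centrality(category_graph)
    return sorted(h, key=h.get, reverse=True)[:k]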
44. Inferring feature-feature interaction: experiments on en.wiki
F1 Score versus number of considered categories
[Plot: F1 score versus the number of features used, from 10⁰ to 10⁶ categories; the upper axis reports the corresponding feature-matrix size, from 400 bytes to 40 GB.]
46. Inferring feature-feature interaction: experiments on en.wiki
• We settled for 20 000 categories
• It can process the 1.1 · 10⁸ links of Wikipedia in 9 minutes
• The resulting W:
• explains 84% of links
• on random pairs, has a precision of 90%
47. Inferring feature-feature interaction: experiments on en.wiki
• W describes how Wikipedia categories interact with each other
• We can view this matrix as a network itself
• A “latent network” between categories, which explains the links we observe between pages
48. Inferring feature-feature interaction: experiments on en.wiki
[Visualization: an excerpt of the latent category network, centered on science fiction: Science fiction by nationality, Science fiction book series, Science fiction by franchise, Robots, Science fiction novels, Spaceflight, Speculative fiction novels, Planets of the Solar System, Materials, Evolutionary biology, Predation, Technology systems, German culture, Science fiction films, Theory of relativity, Celestial mechanics, Production and manufacturing, Saurischians, Prehistoric reptiles.]
49. Inferring feature-feature interaction: experiments on en.wiki
[Visualization: another excerpt of the latent category network, around music and British culture: Music-related lists, Catholic pilgrimage sites, Place names, London boroughs, Artists, Multinational companies in the U.S., Keyboardists, Buildings and structures by American architects, Power metal albums, Human–machine interaction, Animation, British songs, Universities by country, Progressive rock albums by British artists, British awards, English writers, Music by nationality, Short films, Electronic albums by American artists.]
52. Finding unexpected relations
Finally, we found out that the links unexpected by this model are unexpected for a reason!⁴
The most unexpected link in the page of Kim Jong-il...
...is Elvis Presley.
“Kim Jong-il was obsessed with Elvis Presley. His mansion was crammed with his idol's records and his collection of 20,000 Hollywood movies.”
The model described by W did not expect a link from a “North Korean communist” to a “Pioneer of music genres”.
⁴ Paolo Boldi, Corrado Monti. “LlamaFur: Learning latent category matrix to find unexpected relations in Wikipedia”. Proceedings of the 8th ACM Conference on Web Science, 2016.
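A hedged sketch of how such links can be ranked within this framework (the scoring rule and names are our illustrative assumptions): among the links that do exist, the ones the model scores lowest are the most unexpected.

import numpy as np

def most_unexpected_links(Z, W, links, top=10):
    # Score every *existing* link by z_i^T W z_j; the lower the score,
    # the less the latent category matrix expected that link.
    links = np.asarray(links)
    scores = np.einsum("ph,hk,pk->p", Z[links[:, 0]], W, Z[links[:, 1]])
    order = np.argsort(scores)            # ascending: most unexpected first
    return links[order[:top]], scores[order[:top]]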
55. Predicting features
• So, predicting links from features and estimating W are intrinsically connected problems
• What about the dual problem?
• That is: we completely know the links and some of the features: can we guess the others?
56. Predicting features
• This problem is known as label prediction
• But, here as well:
• We want to go beyond homophily
• We want to handle large-scale graphs
57. Predicting features: a neural network
Supervised learning, every node is an instance:
• Input: a raw description of its in-neighborhood and its out-neighborhood:
r⁺(i)_k := |N_k ∩ N⁺(i)| / (|N_k| · |N⁺(i)|),  r⁻(i)_k := |N_k ∩ N⁻(i)| / (|N_k| · |N⁻(i)|)
• Output: likelihood of the features for the considered node
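These descriptors can be computed for all nodes at once; a compact sketch, assuming the normalization reconstructed above and a dense adjacency matrix (names are illustrative):

import numpy as np

def neighborhood_descriptors(Z, adj):
    # Z: n x m node-feature matrix; adj: n x n binary adjacency matrix
    # with adj[i, j] = 1 iff (i, j) is a link.
    out_deg = adj.sum(axis=1, keepdims=True)      # |N+(i)|, column vector
    in_deg = adj.sum(axis=0, keepdims=True).T     # |N-(i)|, column vector
    sizes = Z.sum(axis=0, keepdims=True)          # |N_k|, row vector
    eps = 1e-12                                   # avoid 0/0 on isolated nodes
    r_out = (adj @ Z) / (out_deg * sizes + eps)   # (adj @ Z)[i,k] = |N_k ∩ N+(i)|
    r_in = (adj.T @ Z) / (in_deg * sizes + eps)
    return r_out, r_in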
58. Predicting features: a neural network
[Diagram: the architecture. The input layer takes the descriptors f(N⁻(i), ·) and f(N⁺(i), ·) of the in- and out-neighborhoods; a hidden layer with weights W and Wᵀ feeds a max-pooling stage computing max z(i, k); the output layer produces one score per feature.]
59. Predicting features: a neural network
Formally, if x is our output vector, the cost function is:
ℓ(x, F_i) = λ‖W‖ − Σ_{k∈F_i} log φ(x_k) − Σ_{k∈F̄_i} log(1 − φ(x_k))
where:
• λ is a regularization parameter
• φ(x) = (e⁻ˣ + 1)⁻¹, the standard sigmoid
• F_i is the set of features of our node i, and F̄_i its complement
We can then express x as a function of W and optimize W with gradient descent (e.g., AdaGrad).
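A minimal sketch of this objective and of one AdaGrad update (names and the exact regularizer form are illustrative assumptions):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def label_loss(x, feat_set, lam, W):
    # lam * ||W|| - sum_{k in F_i} log phi(x_k)
    #             - sum_{k not in F_i} log(1 - phi(x_k))
    mask = np.zeros(x.shape[0], dtype=bool)
    mask[list(feat_set)] = True
    p = sigmoid(x)
    return (lam * np.linalg.norm(W)
            - np.log(p[mask]).sum()
            - np.log(1.0 - p[~mask]).sum())

def adagrad_step(W, grad, cache, eta=0.1, eps=1e-8):
    # AdaGrad: per-parameter step sizes shrink with the accumulated
    # squared gradients, which suits sparse feature data.
    cache += grad ** 2
    return W - eta * grad / (np.sqrt(cache) + eps), cache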
66. Conclusions
Take-home message: feature-rich complex network models can
be used to design, test and analyze graph data mining algorithms.
Specifically:
• Link prediction
• Label prediction
• Anomaly detection and serendipity recommendations
On the way, we also found out that:
• Feature-rich models lead to realistic complex networks
• We can measure the explainability of different sets of features
• A (clean) typing system can be a powerful tool in semantic networks
68. Applications and future work
We have seen applications to:
• Semantic networks: link prediction and serendipity finding
• Scientific networks: citation analysis
However, the model is general and scalable. Possible applications include:
• Biological networks: gene interactomes, gene ontologies
• Content networks: web pages and topics
• Social networks: link prediction and recommendations