1. Ranking nodes in growing networks: When PageRank fails
Manuel Sebastian Mariani
Department of Physics, University of Fribourg
Under review in Scientific Reports
Paper Alert, October 5, 2015
Manuel Sebastian Mariani When PageRank fails Paper Alert, October 5, 2015 1 / 13
2. Overview
1 Findings
Numerical simulations show that realistic temporal effects in growing networks make PageRank fail to identify the most valuable nodes for a broad range of model parameters.
2 Future work
Diagnose whether a given growing directed network is suitable for the application of PageRank.
Build a well-grounded ranking algorithm based on the temporal patterns of the system.
3. PageRank
Ideas
1 A node is more important if it has more in-links
2 Links from important nodes count more
3 A node with many in-links and many out-links may not be important
In a directed monopartite network of N nodes (none with zero outdegree), the vector of PageRank scores $\{p_i\}$ is the stationary solution of:
$$p_i^{(t+1)} = c \sum_j A_{ij} \frac{p_j^{(t)}}{k_j^{out}} + \frac{1 - c}{N} \qquad (1)$$
$A_{ij}$: one if node j points to node i and zero otherwise
$k_j^{out}$: outdegree of node j
c: teleportation parameter
t: iteration number
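The iteration in Eq. (1) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name, the edge-list input format, the default c = 0.85 and the convergence tolerance are all assumptions made for the example.

```python
def pagerank(edges, n, c=0.85, tol=1e-12, max_iter=1000):
    """Iterate Eq. (1) until the scores stop changing.

    edges: list of (j, i) pairs meaning "node j points to node i".
    c = 0.85 is a commonly used value, assumed here (not given on the slide).
    """
    out_degree = [0] * n
    for j, _ in edges:
        out_degree[j] += 1
    if min(out_degree) == 0:
        raise ValueError("Eq. (1) assumes no node has zero outdegree")

    p = [1.0 / n] * n  # uniform starting vector
    for _ in range(max_iter):
        new_p = [(1.0 - c) / n] * n  # teleportation term (1 - c) / N
        for j, i in edges:
            # node j splits its current score equally among its targets
            new_p[i] += c * p[j] / out_degree[j]
        if sum(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


# Toy graph: 0 -> 1, 1 -> 2, 2 -> 0, 2 -> 1
scores = pagerank([(0, 1), (1, 2), (2, 0), (2, 1)], n=3)
```

In the toy graph, node 1 receives links from both other nodes and ends up with a higher score than node 0, which only receives half of node 2's score.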
4. Relevance Model (RM)
Create new link: When a node j creates a new link at time t, the probability of choosing node i as the target is
$$\Pi_i^{in}(t) \propto (k_i^{in}(t) + 1)\, \eta_i\, f_R(t - \tau_i) \qquad (2)$$
$k_i^{in}(t)$: current indegree of node i
$\eta_i$: fitness of node i
$f_R$: function of the node's age ($\tau_i$ is the time at which node i enters the system)
Choose active nodes: In the general situation, nodes remain active and keep creating outgoing links. The m nodes that are active at time t are chosen with probability
$$\Pi_i^{out}(t) \propto A_i\, f_A(t - \tau_i) \qquad (3)$$
$A_i$: activity parameter of node i
$f_A(t)$: monotonically decaying function of time
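To make Eq. (2) concrete, the sketch below turns the unnormalised weights into a probability distribution over the existing nodes. The function name, the argument layout and the numerical values are hypothetical; the exponential decay with a time scale of 10 is chosen only for illustration.

```python
import math


def target_probabilities(indegree, fitness, entry_time, t, f_R):
    """Normalise the weights of Eq. (2) into probabilities over existing nodes."""
    weights = [
        (k + 1) * eta * f_R(t - tau)
        for k, eta, tau in zip(indegree, fitness, entry_time)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Three nodes with made-up values, evaluated at time t = 10
probs = target_probabilities(
    indegree=[5, 1, 0],
    fitness=[1.0, 2.0, 1.0],
    entry_time=[0, 5, 9],
    t=10,
    f_R=lambda age: math.exp(-age / 10.0),  # illustrative decay, time scale 10
)
```

With these values the second node is the most likely target: its higher fitness and younger age outweigh the first node's larger indegree. The same normalisation applies to the activity weights of Eq. (3).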
5. Numerical simulation with RM
A good ranking algorithm is expected to produce an unbiased ranking
where both recent and old nodes have the same chance to appear at
the top.
Exponential decay functions: $f_R(t) = \exp(-t/\theta_R)$, $f_A(t) = \exp(-t/\theta_A)$
Power decay functions: $f_R(t) = t^{-\alpha_R}$, $f_A(t) = t^{-\alpha_A}$
Goal: study the dependence of PageRank performance on the model parameters $\theta_R$, $\theta_A$ and $\alpha_R$, $\alpha_A$
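A growth simulation of the RM with exponential decay might look like the sketch below. Several details are assumptions made for the example and are not stated on the slides: one new node enters per time step and creates one link, the m active nodes are drawn with replacement, fitness and activity values are drawn from exponential distributions, and self-loops are excluded.

```python
import math
import random


def simulate_rm(n_nodes, m, theta_R, theta_A, seed=0):
    """Grow a network under the RM with exponential decay (illustrative sketch)."""
    rng = random.Random(seed)
    fitness, activity, indegree, edges = [], [], [], []

    def pick_target(source, t):
        # Weights of Eq. (2); node i entered at time i, so its age is t - i.
        # The source gets weight 0 to exclude self-loops (assumption).
        weights = [
            0.0 if i == source
            else (indegree[i] + 1) * fitness[i] * math.exp(-(t - i) / theta_R)
            for i in range(len(fitness))
        ]
        return rng.choices(range(len(fitness)), weights=weights)[0]

    for t in range(n_nodes):
        # A new node enters at every step (assumption)
        fitness.append(rng.expovariate(1.0))   # assumed fitness distribution
        activity.append(rng.expovariate(1.0))  # assumed activity distribution
        indegree.append(0)
        if t == 0:
            continue  # the first node has nobody to link to
        # Weights of Eq. (3) for choosing the m active nodes
        act_w = [activity[i] * math.exp(-(t - i) / theta_A) for i in range(t + 1)]
        sources = [t] + rng.choices(range(t + 1), weights=act_w, k=m)
        for j in sources:
            i = pick_target(j, t)
            indegree[i] += 1
            edges.append((j, i))  # (source, target)
    return edges, fitness, indegree


edges, fitness, indegree = simulate_rm(n_nodes=100, m=2, theta_R=20.0, theta_A=20.0)
```

The returned edge list, fitness values and indegrees are the raw material for comparing a ranking against the fitness of the nodes. Each step recomputes all weights, so the sketch is only practical for small networks.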
6. PageRank Bias
PageRank score is biased by decay parameters
The ranking by indegree is essentially unbiased
7. Comparison of performance of PageRank and indegree
$r(p, \eta)$: Pearson’s correlation between the PageRank scores p and the fitness values η
$r(k^{in}, \eta)$: Pearson’s correlation between the node indegree and the fitness values η
In RM, PageRank yields no improvement with respect to indegree in ranking nodes by fitness.
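The two performance measures are ordinary Pearson correlation coefficients. A minimal pure-Python sketch is given below; the function name and the sample values are made up for illustration and are not results from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Hypothetical fitness values and indegrees for four nodes
fitness = [0.2, 0.5, 0.9, 1.4]
indegree = [1, 3, 4, 9]
r_kin = pearson(indegree, fitness)  # plays the role of r(k_in, eta)
```

Calling the same function on a vector of PageRank scores and the fitness values gives the counterpart of r(p, η).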
8. Extended model based on fitness (EFM)
The PageRank algorithm assumes that important nodes point to
other important nodes; this feature is absent in RM.
Extended Fitness Model: high and low fitness nodes differ not only in
their ability to attract new incoming links, but also in their sensitivity
to the fitness of the other nodes when choosing their outgoing
connections.
The probability $\Pi_{i;j}^{in}(t)$ that a link created by node j at time t ends in node i:
$$\Pi_{i;j}^{in}(t) \propto (k_i^{in}(t) + 1)^{1 - \eta_j}\, \eta_i^{\eta_j}\, f_R(t - \tau_i) \qquad (4)$$
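The role of the source fitness in Eq. (4) is easiest to see in the two limiting cases, sketched below. The function name and the numbers are hypothetical, and the example assumes fitness values between 0 and 1 and switches the age dependence off (f_R = 1) to isolate the effect.

```python
def efm_weights(source_fitness, indegree, fitness, ages, f_R):
    """Unnormalised weights of Eq. (4) for one source node j."""
    return [
        (k + 1) ** (1 - source_fitness) * eta ** source_fitness * f_R(age)
        for k, eta, age in zip(indegree, fitness, ages)
    ]


def no_decay(age):
    return 1.0  # age dependence switched off for the illustration


indegree = [3, 0]
fitness = [0.2, 0.9]
ages = [1, 1]

# Source with fitness 0: the weights reduce to k_in + 1 (popularity only)
low = efm_weights(0.0, indegree, fitness, ages, no_decay)
# Source with fitness 1: the weights reduce to the target fitness
high = efm_weights(1.0, indegree, fitness, ages, no_decay)
```

A low-fitness source therefore prefers the popular first node, while a high-fitness source prefers the fitter second node, which is the sensitivity described for the EFM.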
9. Linking pattern in EFM
10. Comparison of performance of PageRank and indegree
11. Decay of empirical relevance and activity in real data
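Empirical relevance: The empirical relevance $r_i(t)$ of node i at time t is
$$r_i(t) = \frac{n_i(t)}{n_i^{PA}(t)} \qquad (5)$$
$n_i(t) = \Delta k_i^{in}(t, \Delta t) / L(t, \Delta t)$: ratio between the number $\Delta k_i^{in}(t, \Delta t)$ of incoming links received by node i in the time window $[t, t + \Delta t]$ and the total number $L(t, \Delta t)$ of links created within the same window
$n_i^{PA}(t) = k_i^{in}(t) / \sum_j k_j^{in}(t)$: share of links node i would be expected to receive under pure preferential attachment
Total relevance: $T_i = \sum_t r_i(t)$
Empirical activity: The activity $a_i(t)$ of node i at time t is
$$a_i(t) = \frac{\Delta k_i^{out}(t, \Delta t)}{L(t, \Delta t)} \qquad (6)$$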
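For a single time window, Eqs. (5) and (6) reduce to a few ratios of link counts. The sketch below uses hypothetical function names and made-up counts, assumes the lists cover every node in the network, and returns None for nodes with zero indegree, for which Eq. (5) is undefined (the slides do not say how such nodes are handled).

```python
def empirical_relevance(new_in_links, indegree):
    """r_i(t) of Eq. (5) for every node in one time window.

    new_in_links[i]: links received by node i in [t, t + dt]
    indegree[i]:     indegree of node i at time t
    """
    window_links = sum(new_in_links)  # L(t, dt): every link lands on some node
    total_indegree = sum(indegree)
    relevance = []
    for dk, k in zip(new_in_links, indegree):
        if k == 0:
            relevance.append(None)  # n_PA is zero, ratio undefined (assumption)
            continue
        n_i = dk / window_links
        n_pa = k / total_indegree
        relevance.append(n_i / n_pa)
    return relevance


def empirical_activity(new_out_links):
    """a_i(t) of Eq. (6): share of the window's links created by each node."""
    window_links = sum(new_out_links)
    return [dk / window_links for dk in new_out_links]


r = empirical_relevance(new_in_links=[3, 1, 0], indegree=[2, 2, 4])
a = empirical_activity(new_out_links=[1, 3, 0])
```

A relevance above 1 means the node attracted more links than its indegree alone would predict; summing r_i(t) over successive windows gives the total relevance T_i.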
12. Comparison of PageRank and indegree correlation with total relevance