Asymptotic behaviour of ranking
algorithms in directed random
networks
Nelly Litvak
University of Twente, The Netherlands
There is a vast body of empirical research on the behaviour of ranking algorithms, e.g. Google PageRank, in scale-free networks. In this talk, we address this problem by analytical probabilistic methods. In particular, it is well known that PageRank in scale-free networks follows a power law with the same exponent as the in-degree. Recent probabilistic analysis has provided an explanation for this phenomenon by obtaining a natural approximation for PageRank based on stochastic fixed-point equations. For these equations, explicit solutions can be constructed on weighted branching trees, and their tail behaviour can be described in great detail.

In this talk we present a model for generating directed random graphs with prescribed degree distributions where we can prove that the PageRank of a randomly chosen node does indeed converge to the solution of the corresponding fixed-point equation as the number of nodes in the graph grows to infinity. The proof of this result is based on classical random graph coupling techniques combined with the now extensive literature on the behavior of branching recursions on trees.


  1. Asymptotic behaviour of ranking algorithms in directed random networks
     Nelly Litvak, University of Twente, The Netherlands
     Joint work with Mariana Olvera-Cravioto and Ningyuan Chen
     Workshop on Extremal Graph Theory, Moscow, 06-06-2014
  2. Power law of PageRank
     Pandurangan, Raghavan, Upfal, 2002.
     [ Nelly Litvak, SOR group ]
  3. Power laws in complex networks
     Power laws appear in the Internet, the WWW, social networks, biological networks, etc.
     Degree of a node = number of (in-/out-) links.
     $p_k$ = fraction of nodes with degree at least $k$.
     Power law: $p_k \approx \text{const} \cdot k^{-\alpha}$, $\alpha > 0$.
     A power law is a model for high variability: some nodes (hubs) have extremely many connections.
     Taking logarithms, $\log p_k = \log(\text{const}) - \alpha \log k$: a straight line on the log-log scale.
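The straight-line relation on the log-log scale suggests a simple way to read off the exponent from data. The following is a minimal sketch, not a recommended estimator: it draws a synthetic Pareto sample (an assumption standing in for real degree data) and fits a least-squares line to the empirical $\log p_k$ against $\log k$. The function names and the choice to fit only the upper half of the sample are illustrative.

```python
import math
import random

def sample_pareto(alpha, size, rng):
    # Inverse-transform sampling: P(X > x) = x^(-alpha) for x >= 1.
    return [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(size)]

def loglog_slope(samples, tail_fraction=0.5):
    """Least-squares slope of log p_k versus log k over the largest samples."""
    xs = sorted(samples)
    n = len(xs)
    start = int(n * (1.0 - tail_fraction))
    # The empirical fraction of samples >= xs[i] is (n - i) / n.
    pts = [(math.log(xs[i]), math.log((n - i) / n)) for i in range(start, n)]
    mx = sum(p[0] for p in pts) / len(pts)
    my = sum(p[1] for p in pts) / len(pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    return sxy / sxx

rng = random.Random(0)
alpha_true = 2.0  # assumed exponent of the synthetic sample
slope = loglog_slope(sample_pareto(alpha_true, 50_000, rng))
alpha_hat = -slope  # the slope of the fitted line is -alpha
print(f"estimated alpha: {alpha_hat:.3f}")
```

On real network data the fit is sensitive to where the tail is assumed to start, so the slope should be read as a rough diagnostic only.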
  7. Regular variation
     $X$ is a regularly varying random variable with index $\alpha$ if
     $$P(X > x) = L(x)\, x^{-\alpha}, \quad x > 0,$$
     where $L(x)$ is slowly varying: for every $t > 0$, $L(tx)/L(x) \to 1$ as $x \to \infty$.
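To make the definition of slow variation concrete, here is a small numerical check. The two functions are illustrative choices, not taken from the slides: $L(x) = \log x$ is slowly varying, while $x^{0.1}$ is not, since its ratio stays at $t^{0.1}$ for every $x$.

```python
import math

def ratio(L, t, x):
    # The quantity L(tx) / L(x) from the definition of slow variation.
    return L(t * x) / L(x)

slow = math.log                    # L(x) = log x: ratio tends to 1
not_slow = lambda x: x ** 0.1      # ratio stays at t^0.1, so not slowly varying

for x in (1e2, 1e4, 1e8, 1e16):
    print(x, ratio(slow, 10.0, x), ratio(not_slow, 10.0, x))
```

The first ratio equals $1 + \log t / \log x$ and creeps towards 1, which also shows how slow the convergence can be.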
  8. Google PageRank
     S. Brin, L. Page, The anatomy of a large-scale hypertextual Web search engine (1998).
     The PageRank $R_i$ of page $i = 1, \dots, n$ is defined as the stationary distribution of a random walk with jumps:
     $$R_i = \sum_{j \to i} \frac{c}{d_j} R_j + (1 - c)\, b_i, \quad i = 1, \dots, n.$$
     $d_j$ = number of out-links of page $j$.
     $c \in (0, 1)$, originally 0.85, is the damping factor: the walk follows a link with probability $c$ and makes a random jump with probability $1 - c$.
     $b_i$ = probability to jump to page $i$; originally $b_i = 1/n$.
     Personalized PageRank: $b_i \neq 1/n$.
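The PageRank equation can be solved by repeatedly applying its right-hand side. The sketch below does this on a tiny hypothetical graph; the function name `pagerank`, the adjacency-list input, and the assumption that every page has at least one out-link are illustrative and not part of the slides.

```python
def pagerank(out_links, c=0.85, b=None, tol=1e-12, max_iter=1000):
    """Iterate R_i = sum_{j -> i} (c / d_j) R_j + (1 - c) b_i.

    Assumption: every page has at least one out-link (no dangling nodes).
    """
    n = len(out_links)
    b = b if b is not None else [1.0 / n] * n
    R = list(b)
    for _ in range(max_iter):
        new = [(1.0 - c) * b[i] for i in range(n)]
        for j, targets in enumerate(out_links):
            share = c * R[j] / len(targets)  # page j passes c * R_j / d_j along each link
            for i in targets:
                new[i] += share
        if sum(abs(x - y) for x, y in zip(new, R)) < tol:
            return new
        R = new
    return R

# Hypothetical 4-page web: page j links to the pages listed in out_links[j].
out_links = [[1, 2], [2], [0], [0, 2]]
R = pagerank(out_links)
print(R)
```

Page 3 has no in-links, so its score is exactly the jump term $(1 - c)\, b_3$, which is a quick sanity check on any implementation.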
  11. Examples of applications
      $$R_i = \sum_{j \to i} \frac{c}{d_j} R_j + (1 - c)\, b_i, \quad i = 1, \dots, n$$
      Topic-sensitive search (Haveliwala, 2002);
      Spam detection (Gyöngyi et al., 2004);
      Finding related entities (Chakrabarti, 2007);
      Link prediction (Liben-Nowell and Kleinberg, 2003; Voevodski, Teng, Xia, 2009);
      Finding local cuts (Andersen, Chung, Lang, 2006);
      Graph clustering (Tsiatas, Chung, 2010);
      Person name disambiguation (Smirnova, Avrachenkov, Trousse, 2010);
      Finding most influential people in Wikipedia (Shepelyansky et al., 2010, 2013).
  12. Stochastic model for PageRank
      Rescale: $R_i \to n R_i$, $b_i \to n b_i$, so that
      $$R_i = \sum_{j \to i} \frac{c}{d_j} R_j + (1 - c)\, b_i, \quad i = 1, \dots, n.$$
      Stochastic equation:
      $$R \stackrel{d}{=} c \sum_{j=1}^{N} \frac{1}{D_j} R_j + c\, p_0 + (1 - c) B.$$
      $N$: in-degree of the randomly chosen page.
      $D$: out-degree of a page that links to the randomly chosen page.
      $p_0$: fraction of pages with out-degree zero.
      $R_j$ is distributed as $R$; $N$, $D$, $R_j$ are independent; $N$ and $B$ can be dependent.
      We can denote $Q = c\, p_0 + (1 - c) B$ and $C_j = c / D_j$.
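One way to get a feel for the stochastic equation is to unroll it a fixed number of levels on a random tree and sample the result. The sketch below does that under loudly stated assumptions that are not in the slides: $N$ is uniform on $\{0, \dots, 4\}$, $D$ is uniform on $\{1, \dots, 4\}$, $B = 1$ and $p_0 = 0$, so $Q = 1 - c$. It then compares the sample mean with the value obtained by taking expectations in the recursion.

```python
import random

C_DAMP = 0.85  # damping factor c

def sample_N(rng):
    # Assumed in-degree law: uniform on {0, ..., 4}, so E[N] = 2.
    return rng.randint(0, 4)

def sample_D(rng):
    # Assumed out-degree law of a linking page: uniform on {1, ..., 4}.
    return rng.randint(1, 4)

def sample_R(depth, rng):
    """One draw of R^(depth), where R^(0) = Q and
    R^(k) = sum_{j=1}^{N} (c / D_j) R_j^(k-1) + Q, with Q = 1 - c."""
    q = 1.0 - C_DAMP
    if depth == 0:
        return q
    total = q
    for _ in range(sample_N(rng)):
        total += C_DAMP / sample_D(rng) * sample_R(depth - 1, rng)
    return total

rng = random.Random(1)
depth = 6
draws = [sample_R(depth, rng) for _ in range(4000)]
mean_R = sum(draws) / len(draws)

# Taking expectations gives E[R^(k)] = Q * (1 + rho + ... + rho^k),
# where rho = E[N] * E[c / D] (using the independence of N, D and R_j).
rho = 2.0 * C_DAMP * (1 + 1 / 2 + 1 / 3 + 1 / 4) / 4
expected = (1.0 - C_DAMP) * sum(rho ** i for i in range(depth + 1))
print(mean_R, expected)
```

With bounded $N$ the sampled values are light-tailed; replacing `sample_N` by a heavy-tailed law is the natural experiment for observing a power-law tail in $R$.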
  14. Results for stochastic recursion
      $$R \stackrel{d}{=} \sum_{j=1}^{N} C_j R_j + Q$$
      Theorem (Volkovich & L, 2010). If $P(B > x) = o(P(N > x))$, then the following are equivalent:
      $P(N > x) \sim x^{-\alpha_N} L_N(x)$ as $x \to \infty$,
      $P(R > x) \sim c_N\, x^{-\alpha_N} L_N(x)$ as $x \to \infty$,
      where $c_N = \left(E(c/D)\right)^{\alpha_N} \left[1 - E(N)\, E\!\left(C^{\alpha_N}\right)\right]^{-1}$.
  15. Power law behaviour of PageRank
      Data for the Web, Wikipedia and a Preferential Attachment graph.
  16. Results for stochastic recursion
      $$R \stackrel{d}{=} \sum_{j=1}^{N} C_j R_j + Q$$
      A series of papers (Olvera-Cravioto & Jelenkovic 2010, 2012; Olvera-Cravioto 2012) analyzed the recursion in detail using sample-path large deviations and implicit renewal theory.
      The tail behaviour of $R$ is obtained under the most general assumptions on the $C_j$'s.
      $R$ can be heavy-tailed even when $N$ is light-tailed.
  17. Recursion on a graph
      So far we have, in fact, considered the recursion on a tree.
      Will similar results hold on a particular graph structure?
      Some graphs are tree-like (Thorny Branching Process, TBP).
  18. Directed configuration model
      Directed graph on $n$ nodes $V = \{v_1, \dots, v_n\}$.
      In-degree and out-degree:
      $m_i$ = in-degree of node $v_i$ = number of edges pointing to $v_i$.
      $d_i$ = out-degree of node $v_i$ = number of edges pointing from $v_i$.
      $(\mathbf{m}, \mathbf{d}) = (\{m_i\}, \{d_i\})$ is called a bi-degree-sequence.
      Target distributions: in-degree $F = (f_k : k = 0, 1, 2, \dots)$ and out-degree $G = (g_k : k = 0, 1, 2, \dots)$.
  19. Assumptions on the target distributions
      Suppose further that for some $\alpha, \beta \ge 2$,
      $$\overline{F}(x) = \sum_{k > x} f_k \le x^{-\alpha} L_F(x) \quad \text{and} \quad \overline{G}(x) = \sum_{k > x} g_k \le x^{-\beta} L_G(x)$$
      for all $x \ge 0$, where $L_F(\cdot)$ and $L_G(\cdot)$ are slowly varying.
      Assume both $F$ and $G$ have finite variance.
  20. The bi-degree sequence (Chen & Olvera-Cravioto, 2012)
      Step 1. Fix $0 < \delta_0 < 1 - \theta$, where $\theta = \max\{\alpha^{-1}, \beta^{-1}, 1/2\}$.
      Step 2. Sample $\{\gamma_1, \dots, \gamma_n\}$ i.i.d. from $F$; let $\Gamma_n = \sum_{i=1}^{n} \gamma_i$.
      Step 3. Sample $\{\xi_1, \dots, \xi_n\}$ i.i.d. from $G$; let $\Xi_n = \sum_{i=1}^{n} \xi_i$.
      Step 4. Let $\Delta_n = \Gamma_n - \Xi_n$. If $|\Delta_n| \le n^{\theta + \delta_0}$ go to step 5; otherwise go to step 2.
      Step 5. Choose randomly $|\Delta_n|$ nodes $S = \{i_1, i_2, \dots, i_{|\Delta_n|}\}$ without replacement and let
      $$N_i = \gamma_i + \tau_i, \quad D_i = \xi_i + \chi_i, \quad i = 1, 2, \dots, n,$$
      where $\chi_i = 1$ if $\Delta_n \ge 0$ and $i \in S$, and $0$ otherwise; $\tau_i = 1$ if $\Delta_n < 0$ and $i \in S$, and $0$ otherwise.
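The five steps of the bi-degree sequence construction translate almost line by line into code. The sketch below is an illustration under assumptions: the degree sampler (integer part of a Pareto variable) and the parameter values are hypothetical choices, and both target distributions are given the same mean, since otherwise the acceptance test in step 4 would rarely pass.

```python
import random

def sample_degree(alpha, rng):
    # Integer part of a Pareto(alpha) variable:
    # P(X > k) = (k + 1)^(-alpha) for integer k >= 0.
    return int((1.0 - rng.random()) ** (-1.0 / alpha))

def bi_degree_sequence(n, alpha, beta, delta0, rng):
    # Step 1: the tolerance exponent theta + delta0 must stay below 1.
    theta = max(1.0 / alpha, 1.0 / beta, 0.5)
    assert 0.0 < delta0 < 1.0 - theta
    while True:
        gamma = [sample_degree(alpha, rng) for _ in range(n)]  # step 2
        xi = [sample_degree(beta, rng) for _ in range(n)]      # step 3
        delta = sum(gamma) - sum(xi)                           # step 4
        if abs(delta) <= n ** (theta + delta0):
            break
    # Step 5: add one stub to |delta| randomly chosen nodes on the deficient side.
    chosen = set(rng.sample(range(n), abs(delta)))
    N = [g + (1 if delta < 0 and i in chosen else 0) for i, g in enumerate(gamma)]
    D = [x + (1 if delta >= 0 and i in chosen else 0) for i, x in enumerate(xi)]
    return N, D

rng = random.Random(2)
N, D = bi_degree_sequence(n=1000, alpha=3.0, beta=3.0, delta0=0.2, rng=rng)
print(sum(N), sum(D), max(N), max(D))
```

After step 5 the in-degree and out-degree sums agree, which is exactly what the stub-pairing construction of the graph requires.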
  21. Constructing the graph
      Using the bi-degree-sequence $(\mathbf{N}, \mathbf{D})$ for the in- and out-degrees:
      assign to each node $v_i$ a number $m_i$ of inbound stubs and a number $d_i$ of outbound stubs;
      pair outbound stubs to inbound stubs to form directed edges by matching to each inbound stub an outbound stub chosen uniformly at random from the set of unpaired outbound stubs;
      proceed in the same way for all remaining unpaired inbound stubs, i.e., choose uniformly from the set of unpaired outbound stubs and draw the corresponding directed edge.
      The result is a multigraph (e.g., with self-loops and multiple edges in the same direction) on nodes $\{v_1, \dots, v_n\}$.
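The stub-pairing procedure can be sketched in a few lines. Drawing a uniformly random unpaired outbound stub for each inbound stub in turn is equivalent to shuffling the list of outbound stubs once, which is what the code below does. The function name and the small degree sequence are hypothetical.

```python
import random
from collections import Counter

def configuration_multigraph(in_deg, out_deg, rng):
    """Return a Counter mapping (source, target) to the number of parallel edges."""
    assert sum(in_deg) == sum(out_deg), "in- and out-stub counts must match"
    inbound = [v for v, m in enumerate(in_deg) for _ in range(m)]
    outbound = [v for v, d in enumerate(out_deg) for _ in range(d)]
    # A uniform random permutation of the outbound stubs pairs every inbound
    # stub with an outbound stub chosen uniformly among the unpaired ones.
    rng.shuffle(outbound)
    return Counter(zip(outbound, inbound))

rng = random.Random(3)
in_deg = [2, 1, 0, 1]    # hypothetical in-degrees m_i
out_deg = [1, 1, 1, 1]   # hypothetical out-degrees d_i
edges = configuration_multigraph(in_deg, out_deg, rng)
print(dict(edges))
```

Keeping edge multiplicities in a `Counter` preserves self-loops and parallel edges, so the result is a multigraph as described in the slide.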
  22. PageRank in directed configuration model
      $C_i = \zeta_i / D_i$, where $\{\zeta_i\}$ is a sequence of i.i.d. random variables independent of $(\mathbf{N}, \mathbf{D})$ ($\zeta_i = c$ in the classical case).
      $M = M(n) \in \mathbb{R}^{n \times n}$ is related to the adjacency matrix of the graph:
      $$M_{i,j} = \begin{cases} s_{ij} C_i, & \text{if there are } s_{ij} \text{ edges from } i \text{ to } j, \\ 0, & \text{otherwise.} \end{cases}$$
      $Q \in \mathbb{R}^n$ is a personalization vector.
      We are interested in one coordinate, $R_1$, of the vector $R \in \mathbb{R}^n$ defined by
      $$R = RM + Q.$$
  23. Matrix iterations
      $$R^{(n,0)} = B,$$
      $$R^{(n,1)} = R^{(n,0)} M + Q = BM + Q,$$
      $$R^{(n,2)} = R^{(n,1)} M + Q = BM^2 + QM + Q,$$
      $$R^{(n,3)} = R^{(n,2)} M + Q = BM^3 + QM^2 + QM + Q,$$
      $$\dots$$
      $$R^{(n,k)} = \sum_{i=0}^{k-1} Q M^i + B M^k, \quad k \ge 1.$$
      We are interested in analyzing $P(R_1^{(n,\infty)} > x)$ as $x \to \infty$.
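The matrix iterations can be checked numerically on a small example. The sketch below, written with plain Python lists, compares the recursion $R^{(n,k)} = R^{(n,k-1)} M + Q$ with the closed-form sum and verifies that a long iteration approximately satisfies $R = RM + Q$. The 3-node graph and the starting vector $B$ are hypothetical.

```python
def vec_mat(v, M):
    # Row vector times matrix: (vM)_j = sum_i v_i * M[i][j].
    n = len(v)
    return [sum(v[i] * M[i][j] for i in range(n)) for j in range(n)]

def iterate(B, M, Q, k):
    """R^(n,k) from the recursion R^(n,k) = R^(n,k-1) M + Q with R^(n,0) = B."""
    R = list(B)
    for _ in range(k):
        R = [x + q for x, q in zip(vec_mat(R, M), Q)]
    return R

def closed_form(B, M, Q, k):
    """R^(n,k) = sum_{i=0}^{k-1} Q M^i + B M^k."""
    total = [0.0] * len(Q)
    QMi = list(Q)   # holds Q M^i
    BMk = list(B)   # holds B M^i, ends as B M^k
    for _ in range(k):
        total = [t + x for t, x in zip(total, QMi)]
        QMi = vec_mat(QMi, M)
        BMk = vec_mat(BMk, M)
    return [t + x for t, x in zip(total, BMk)]

c = 0.85
# Hypothetical graph with edges 0->1, 0->2, 1->2, 2->0; M[i][j] = s_ij * c / D_i.
M = [[0.0, c / 2, c / 2],
     [0.0, 0.0, c],
     [c, 0.0, 0.0]]
B = [1.0, 1.0, 1.0]
Q = [1 - c] * 3
R10 = iterate(B, M, Q, 10)
R200 = iterate(B, M, Q, 200)
print(R10, R200)
```

Because every row of this $M$ sums to $c < 1$, the term $B M^k$ decays geometrically, which is the mechanism behind the truncation bound used later in the talk.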
  24. Idea of the analysis
      $\hat{R}_1^{(n,k)}$: PageRank on a perfect branching tree.
      $R$: solution of the equation
      $$R \stackrel{d}{=} \sum_{j=1}^{\gamma} C_j R_j + Q.$$
      We will try to prove the following: for any fixed $t \in \mathbb{R}$ and a randomly chosen node $v$,
      $$P(R_1^{(n,\infty)} \le t) \approx P(R_1^{(n,k)} \le t) \approx P(\hat{R}_1^{(n,k)} \le t) \approx P(R \le t)$$
      for large enough $n$, $k$.
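  25. Idea of the analysis
      If we prove that for some $k = k(n) \to \infty$ and any $\varepsilon > 0$,
      $$\text{(Matrix iterations)} \quad P\left(\left|R_1^{(n,\infty)} - R_1^{(n,k)}\right| > \varepsilon\right) \to 0, \qquad (1)$$
      $$\text{(Coupling with branching tree)} \quad P\left(\left|R_1^{(n,k)} - \hat{R}_1^{(n,k)}\right| > \varepsilon\right) \to 0, \qquad (2)$$
      $$\text{(Limiting solution)} \quad P\left(\left|\hat{R}_1^{(n,k)} - R\right| > \varepsilon\right) \to 0, \qquad (3)$$
      as $n \to \infty$, then it will follow, by Slutsky's lemma, that $R_1^{(n,\infty)} \Rightarrow R$ as $n \to \infty$, where $\Rightarrow$ denotes convergence in distribution.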
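The step from (1)-(3) to the conclusion can be spelled out with the triangle inequality. The following is a sketch of that reasoning, assuming (as the coupling construction suggests) that all four quantities are defined on a common probability space:

```latex
\left|R_1^{(n,\infty)} - R\right|
  \le \left|R_1^{(n,\infty)} - R_1^{(n,k)}\right|
    + \left|R_1^{(n,k)} - \hat{R}_1^{(n,k)}\right|
    + \left|\hat{R}_1^{(n,k)} - R\right|,
\qquad\text{so}\qquad
P\left(\left|R_1^{(n,\infty)} - R\right| > 3\varepsilon\right)
  \le P\left(\left|R_1^{(n,\infty)} - R_1^{(n,k)}\right| > \varepsilon\right)
    + P\left(\left|R_1^{(n,k)} - \hat{R}_1^{(n,k)}\right| > \varepsilon\right)
    + P\left(\left|\hat{R}_1^{(n,k)} - R\right| > \varepsilon\right)
  \to 0 .
```

Each term on the right is one of the three statements, so the left-hand side vanishes for every $\varepsilon > 0$, and convergence in probability implies convergence in distribution.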
  26. Coupling with branching tree
      We start with a random node (node 1) and explore its neighbours, labeling the stubs that we have already seen.
      $\tau$: the number of generations of the WBP completed before the coupling breaks.
  27. Coupling with branching tree
      Lemma. Let $\tau$ be the number of generations of the TBP that we are able to complete before we draw the first stub that has already been observed before. Then, for any $0 < \varepsilon < 1/2$ and $a = (1/2 - \varepsilon) / \log m$, where $m = E[N]$,
      $$P(\tau \le a \log n) = O\left(n^{-\varepsilon/2}\right) \quad \text{as } n \to \infty.$$
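To get a sense of scale for the lemma, the sketch below evaluates the number of generations $a \log n$ and the rate $n^{-\varepsilon/2}$ for given inputs. The helper name and the example values ($m = 2$, $\varepsilon = 0.1$) are hypothetical, and the code assumes $m > 1$ so that $\log m > 0$.

```python
import math

def coupling_horizon(n, mean_in_degree, eps):
    """Return (a * log n, n^(-eps/2)) with a = (1/2 - eps) / log m, m = E[N].

    Assumption: m > 1, so that log m is positive.
    """
    assert 0.0 < eps < 0.5 and mean_in_degree > 1.0
    a = (0.5 - eps) / math.log(mean_in_degree)
    return a * math.log(n), n ** (-eps / 2.0)

for n in (10**4, 10**6, 10**9):
    generations, rate = coupling_horizon(n, mean_in_degree=2.0, eps=0.1)
    print(n, round(generations, 2), round(rate, 4))
```

The horizon grows only logarithmically in $n$, which is why the matrix iterations must converge fast enough within that many steps.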
  28. Combining with matrix iteration
      $$P\left(\left|R_1^{(n,\infty)} - R_1^{(n,k)}\right| > \varepsilon\right) \le c^k K n = o(1)$$
      We need $c^k n = o(1)$ for some $k < \tau$.
      Combining this with Lemma 2, we get the main result.
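A short calculation suggests where a restriction on $c$ comes from. It is a sketch under the assumptions that $k$ is taken equal to $a \log n$ from the coupling lemma and that $m = E[N] > 1$:

```latex
c^{k} n = c^{\,a \log n}\, n = n^{\,1 + a \log c},
\qquad
1 + a \log c < 0
\iff \log\frac{1}{c} > \frac{\log m}{1/2 - \varepsilon}
\iff c < m^{-2/(1 - 2\varepsilon)} .
```

Since $m^{-2/(1 - 2\varepsilon)}$ increases to $m^{-2}$ as $\varepsilon \downarrow 0$, every $c < 1/(E[N])^2$ satisfies the requirement for a small enough $\varepsilon$.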
  29. Main result
      Let $n$ be the number of nodes in the random graph, and let $N$ and $D$ be random variables having the in-degree and effective out-degree distributions, respectively. Let $R^{(n)}$ be the rank vector computed on the graph with $n$ nodes.
      Theorem (Chen, L, Olvera-Cravioto, 2014). Suppose $0 < c < 1/(E[N])^2$. Then
      $$R_1^{(n)} \Rightarrow R, \quad n \to \infty,$$
      where $R$ is the solution to the fixed-point equation
      $$R \stackrel{d}{=} q + c \sum_{i=1}^{N} \frac{R_i}{D_i}.$$
  30. Work in progress
      Relaxing the conditions on $c$: better bounds for $\tau$ and the matrix iterations.
      So far, the result relies on the finite variance assumption.
      The result probably will not hold for all $c \in (0, 1)$.
      The PageRank must converge for all $c < 1$. Will we obtain the same power law but with a different factor?
  31. Thank you!
