Yandex wg-talk
Scientific and technical seminar at Yandex, 28 October (Colin Cooper)

Yandex wg-talk: Presentation Transcript

  • Random graph process models of large networks. Colin Cooper, Department of Informatics, King’s College London. 28th October 2013, Yandex.
  • Random graph process. Graph process: at each step the existing graph is modified by making a small number of structural changes, e.g. add a new vertex with edges incident to the existing graph; add edges within the existing graph; delete some edges or vertices; exchange some existing edges for others. If these changes are random, then some asymptotic structural properties may emerge as the process evolves. For example, the degree sequence follows a power law with parameter $\gamma$.
  • Outline: Introduction; Various web graph models; Degree distribution: Undirected model; Hub-Authority model: Directed; Web-graphs of increasing degree.
  • Experimental studies. Large-scale dynamic networks such as the Internet and the World Wide Web: Barabási and Albert, Emergence of scaling in random networks (1999); Broder, Kumar, Maghoul, Raghavan, Rajagopalan, Stata, Tomkins and Wiener, Graph Structure in the Web (2000); M. Faloutsos, P. Faloutsos and C. Faloutsos, On Power-law Relationships of the Internet Topology (1999).
  • Power law degree sequence. The proportion of vertices of a given degree $k$ follows an approximate inverse power law $n_k \sim C k^{-\gamma}$ for some constants $C, \gamma$. Various explanatory models, e.g.: Bollobás, Riordan, Spencer and Tusnády, The degree sequence of a scale-free random graph process (2001); Aiello, Chung and Lu, A random graph model for massive graphs (2000); Kumar, Raghavan, Rajagopalan, Sivakumar, Tomkins and Upfal, Stochastic models for the web graph (2000); Dorogovtsev, Mendes and Samukhin, Structure of growing networks with preferential linking (2000).
  • Preferential attachment. One approach: generate graphs via preferential attachment (PA), i.e. attach to an existing vertex with probability proportional to its degree. PA gives a power law degree distribution with parameter $\gamma = 3$. The preferential attachment model dates back to Yule: G. Yule, A mathematical theory of evolution based on the conclusions of Dr. J.C. Willis, Philosophical Transactions of the Royal Society of London (Series B) (1924). Yule model: a random tree in which each point independently generates children at rate 1 in a time interval $\Delta t$; early points have the most children. PA was proposed as a random graph model for the web by Barabási and Albert, Emergence of scaling in random networks (1999).
  • Publications relevant to this talk. Cooper and Frieze, A general model of web graphs, RSA (2003): an analysis of the recurrence for the expected number of vertices of degree $k$, combined with concentration results and bounds for the maximum degree; uses Laplace’s method to solve recurrences with rational coefficients. Cooper, The age specific degree distribution of web-graphs, CPC (2006): derives the degree distribution directly, and uses this to obtain the expected number of vertices of degree $k$. Cooper and Prałat, Scale-free graphs of increasing degree, RSA (2011): adapts the degree distribution method to obtain results for the growth model.
  • Web-graph models. Simple undirected or directed process models where a mixture of vertices and edges is added at each step, either preferentially or uniformly at random. For undirected web-graph processes, as the degree $k$ tends to infinity, the expected proportion of vertices of degree $k$ tends to $N_k \propto k^{-\gamma}$. The power law parameter is given by $\gamma = 1 + 1/\eta$. Here $\eta$ is the limiting ratio of the expected number of edge endpoints inserted in the process by preferential attachment to the expected total degree. The maximum degree $\Delta$ in this model is a.s. $\Delta = O(n^{\eta})$, where $n$ is the number of vertices. Surprisingly, these results seem to hold for other types of process model and can be useful as a general heuristic.
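The heuristic on this slide reduces to two one-line formulas. The following minimal sketch just evaluates them; the function name `power_law_heuristic` is hypothetical and not from the talk, and the restriction to 0 < η ≤ 1 follows the range stated later in the slides.

```python
def power_law_heuristic(eta):
    """Heuristic from the slide: gamma = 1 + 1/eta, max degree = O(n**eta).

    eta is the limiting fraction of edge endpoints inserted by
    preferential attachment (assumed here to satisfy 0 < eta <= 1).
    Returns (gamma, exponent of n in the maximum-degree bound).
    """
    if not 0 < eta <= 1:
        raise ValueError("eta must lie in (0, 1]")
    return 1 + 1 / eta, eta


print(power_law_heuristic(0.5))   # eta = 1/2, as for standard preferential attachment
print(power_law_heuristic(0.25))  # eta = 1/4, as for the triangle closing model
```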
  • Some examples of the power law heuristic. Standard preferential attachment: make $G(t)$ from $G(t-1)$ by adding a new vertex $v_t$ with (an average of) $m$ neighbours chosen preferentially from $G(t-1)$. Then $\eta = \frac{m}{2m} = \frac{1}{2}$. Power law: $\gamma = 1 + \frac{1}{\eta} = 1 + 2 = 3$. Maximum degree: $\Delta = O(n^{1/2})$.
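The standard preferential attachment step described above can be simulated in a few lines. This is a minimal sketch under stated assumptions, not the code used for the talk's experiments: the starting graph is a single edge, each new vertex makes exactly `m` choices with replacement (so multi-edges can occur), and the function name and `seed` parameter are illustrative.

```python
import random
from collections import Counter


def preferential_attachment(n, m, seed=0):
    """Grow a multigraph on vertices 0..n-1 by preferential attachment.

    Assumptions (not from the talk): start from the single edge 0-1,
    each new vertex picks m targets with replacement.
    """
    rng = random.Random(seed)
    # Every vertex appears in `endpoints` once per unit of degree, so a
    # uniform choice from this list is a degree-proportional choice.
    endpoints = [0, 1]
    edges = [(0, 1)]
    for v in range(2, n):
        # Choose all targets before adding v, so v never picks itself.
        targets = [rng.choice(endpoints) for _ in range(m)]
        for w in targets:
            edges.append((v, w))
            endpoints.extend((v, w))
    degree = Counter(endpoints)
    return edges, degree


edges, degree = preferential_attachment(20000, 3, seed=1)
histogram = Counter(degree.values())  # degree -> number of vertices
print(max(degree.values()), sorted(histogram.items())[:5])
```

Plotting `histogram` on log-log axes is one way to compare the simulated degree sequence with the heuristic exponent of 3.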
  • Experimental evidence: PA model. Rapid convergence to $\gamma = 3$ for PA graphs: 20,000 vertices is enough (see the light blue plot data). Thanks to Yiannis Siantos for the figure.
  • Non-standard triangle closing model. Make $G(t)$ from $G(t-1)$ by adding a new vertex $v_t$ with one neighbour $u$ chosen u.a.r. from $G(t-1)$, and one edge from $v_t$ to a random neighbour $w$ of $u$, so that $\Pr(w \text{ chosen}) \propto d(w)$. One edge endpoint in 4 is chosen preferentially.
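The triangle closing step can be sketched as follows. This is an illustrative simulation, assuming a single starting edge and a uniformly random neighbour of u; the function name is hypothetical and this is not the code behind the experiments reported in the talk.

```python
import random


def triangle_closing(n, seed=0):
    """Grow a graph on vertices 0..n-1 by the triangle closing rule.

    Assumption (not from the talk): the process starts from the edge 0-1.
    Returns adjacency lists; adj[v][0] and adj[v][1] are the two
    neighbours (u, w) chosen when v was added.
    """
    rng = random.Random(seed)
    adj = {0: [1], 1: [0]}
    for v in range(2, n):
        u = rng.randrange(v)    # u chosen uniformly at random from G(t-1)
        w = rng.choice(adj[u])  # w is a uniformly random neighbour of u
        adj[v] = [u, w]         # edges v-u and v-w close the triangle u, w, v
        adj[u].append(v)
        adj[w].append(v)
    return adj


adj = triangle_closing(100000, seed=2)
print(max(len(neighbours) for neighbours in adj.values()))
```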
  • The proportion of edges added preferentially is $\eta = \frac{1}{4}$. So heuristically: power law $\gamma = 1 + \frac{1}{\eta} = 1 + 4 = 5$; maximum degree $\Delta = O(n^{1/4})$. Experimentally this seems to be true in the limit (see next slide). The model seems difficult to analyze formally.
  • The heuristic gives no information on the convergence rate. Slow convergence: large experiments with up to $4 \times 10^{8}$ vertices have still not quite arrived at $\gamma = 5$, $\Delta = O(n^{1/4})$. Thanks to Yiannis Siantos for the figure.
  • Web-graph model generative choices
  • Web-graph model: power law degree sequence. For the undirected web-graph process, as the vertex degree $k$ tends to infinity, the expected proportion of vertices of degree $k$ tends to $N_k \propto k^{-\gamma}$. The power law parameter is given by $\gamma = 1 + 1/\eta$, where $\eta$ is the limiting ratio of the expected number of edge endpoints inserted by preferential attachment to the expected total degree. Any $\gamma > 2$ can be obtained by suitable choices of parameters.
  • Undirected web-graph model parameters. At each step either a NEW vertex (plus edges) is added, with probability $\alpha$, or extra edges are added between OLD vertices, with probability $\beta = 1 - \alpha$. For convenience edges are regarded as "directed out" from the new vertex. The number of edges is sampled from a distribution depending on the choice made (NEW, OLD). Each edge endpoint makes independent UAR or PA choices: A. New vertex $v$, choice for edges directed OUT from $v$. B. Old vertex $v$, choice for extra edge directed OUT from $v$. C. Old vertex $v$, choice for extra edge directed IN to $v$.
  • Undirected model continued. NEW procedure: all edges are "directed out" from the new vertex $v$. Each edge $e_i$ of $v$ chooses its other endpoint independently using a probability mixture (parameter A): $\Pr(w \text{ is selected}) = A_1 \frac{d(w,t)}{2|E(t)|} + A_2 \frac{1}{|V(t)|}$, where $\sum_w \Pr(w \text{ is selected by } e_i) = A_1 + A_2 = 1$. In all OLD cases $Z = A, B, C$ we have $p_Z(v,t) = Z_1 \frac{d(v,t-1)}{2|E(t-1)|} + Z_2 \frac{1}{|V(t-1)|}$.
  • Result of these choices. At each step, with probability $\alpha$ a NEW vertex (plus edges) is added, and with probability $\beta = 1 - \alpha$ extra edges are added between OLD vertices. The number of edges (NEW, OLD) is sampled from a probability distribution; the expected numbers of edges are $m$, $M$ respectively. A. New vertex $v$, edges directed OUT from $v$. B. Old vertex $v$, edges directed OUT from $v$. C. Old vertex $v$, edges directed IN to $v$. The degree distribution depends on two parameters $\eta, \nu$: (PA) $\eta = \frac{\alpha m A_1 + \beta M (B_1 + C_1)}{2(\alpha m + \beta M)}$, (UAR) $\nu = \frac{\alpha m A_2 + \beta M (B_2 + C_2)}{\alpha}$.
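As a quick check of the two formulas, the sketch below evaluates η and ν from the model parameters. The function name is hypothetical, and it assumes that each mixture satisfies Z1 + Z2 = 1 for Z = A, B, C (the slides state this explicitly only for A).

```python
def eta_nu(alpha, m, M, A1, B1, C1):
    """Evaluate eta (PA) and nu (UAR) from the slide's formulas.

    Assumes Z1 + Z2 = 1 for each mixture Z in {A, B, C}, so only the
    preferential weights A1, B1, C1 are passed in; requires alpha > 0.
    """
    beta = 1 - alpha
    A2, B2, C2 = 1 - A1, 1 - B1, 1 - C1
    eta = (alpha * m * A1 + beta * M * (B1 + C1)) / (2 * (alpha * m + beta * M))
    nu = (alpha * m * A2 + beta * M * (B2 + C2)) / alpha
    return eta, nu


# Pure preferential attachment: only NEW steps, all choices preferential.
print(eta_nu(alpha=1.0, m=3, M=0, A1=1, B1=1, C1=1))
# Only NEW steps, all choices uniform at random.
print(eta_nu(alpha=1.0, m=3, M=0, A1=0, B1=0, C1=0))
```

The first call recovers η = 1/2 and ν = 0, the standard preferential attachment case from the earlier slide.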
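  • Degree distribution: undirected model. (PA) $\eta = \frac{\alpha m A_1 + \beta M (B_1 + C_1)}{2(\alpha m + \beta M)}$, (UAR) $\nu = \frac{\alpha m A_2 + \beta M (B_2 + C_2)}{\alpha}$. Vertex $v$ of initial degree $m$ is added at step $v$. The distribution of the degree $d(v,t)$ of $v$ at step $t$ is $P(d(v,t) = m + \ell \mid m) \sim \binom{\ell + m + \frac{\nu}{\eta} - 1}{\ell} \left(\frac{v}{t}\right)^{\eta m + \nu} \left(1 - \left(\frac{v}{t}\right)^{\eta}\right)^{\ell}$. This assumes $t \to \infty$, that $v$ is added after time $v_0 \to \infty$, and that $\ell = o(t^{1/4})$.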
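The asymptotic formula above has the shape of a negative binomial distribution with "success probability" (v/t)^η and shape parameter m + ν/η. The sketch below evaluates it with log-gamma functions to allow a non-integer shape; the function name is hypothetical, and the asymptotic expression is treated as exact purely for illustration.

```python
from math import exp, lgamma, log


def degree_pmf(l, m, eta, nu, v, t):
    """Asymptotic P(d(v, t) = m + l | m) from the slide, for 0 < v < t.

    Generalised binomial coefficient C(l + r - 1, l) with r = m + nu/eta
    is computed via lgamma, since r need not be an integer.
    """
    r = m + nu / eta
    p = (v / t) ** eta
    log_binom = lgamma(l + r) - lgamma(l + 1) - lgamma(r)
    return exp(log_binom + r * log(p) + l * log(1 - p))


# Vertex added at step 100, observed at step 10000, standard PA with m = 3.
probs = [degree_pmf(l, m=3, eta=0.5, nu=0.0, v=100, t=10000) for l in range(5000)]
print(sum(probs))  # close to 1, as expected for a probability distribution
```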
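  • Illustration: Pr(degree increases by 2). Probability of a change $p$, and of no change $q$, at step $t$ when the degree is $m + j$: $p(j,t) \sim \frac{\eta(m+j)}{t} + \frac{\nu}{t}$, $q(j,t) = 1 - p(j,t)$. Change points $\tau_1, \tau_2$ with $v < \tau_1 < \tau_2 \le t$. The probability of exactly 2 changes, at $\tau_1$ and $\tau_2$, is $q(0,v+1) \cdots q(0,\tau_1-1)\, p(0,\tau_1)$ (first change at $\tau_1$) $\times\; q(1,\tau_1+1) \cdots q(1,\tau_2-1)\, p(1,\tau_2)$ (second change at $\tau_2$) $\times\; q(2,\tau_2+1) \cdots q(2,t)$ (no further changes). This evaluates to $F(\tau_1,\tau_2) \sim \frac{(\eta m+\nu)(\eta(m+1)+\nu)}{\eta^{2}} \left(\frac{v}{t}\right)^{\eta m+\nu} \frac{\eta \tau_1^{\eta-1}}{t^{\eta}} \cdot \frac{\eta \tau_2^{\eta-1}}{t^{\eta}}$.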
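  • This evaluates to $F(\tau_1,\tau_2) \sim \frac{(\eta m+\nu)(\eta(m+1)+\nu)}{\eta^{2}} \left(\frac{v}{t}\right)^{\eta m+\nu} \frac{\eta \tau_1^{\eta-1}}{t^{\eta}} \cdot \frac{\eta \tau_2^{\eta-1}}{t^{\eta}}$. Add over all possible $\tau_1, \tau_2$: $\sum_{v < \tau_1 < \tau_2 \le t} F(\tau_1,\tau_2) \sim \frac{(\eta m+\nu)(\eta(m+1)+\nu)}{2!\,\eta^{2}} \left(\frac{v}{t}\right)^{\eta m+\nu} \left(\int_v^t \frac{\eta \tau^{\eta-1}}{t^{\eta}}\, d\tau\right)^{2} = \frac{(\eta m+\nu)(\eta(m+1)+\nu)}{2!\,\eta^{2}} \left(\frac{v}{t}\right)^{\eta m+\nu} \left(1 - \left(\frac{v}{t}\right)^{\eta}\right)^{2}$.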
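  • From the degree distribution we can obtain $n(\ell \mid m)$, the expected proportion of vertices of degree $m + \ell$: $n(\ell \mid m) = \frac{((\ell+m-1)\eta+\nu) \cdots (m\eta+\nu)}{((\ell+m)\eta+\nu+1) \cdots (m\eta+\nu+1)}$. The proportion $N_t(\ell \mid m)$ of vertices of degree $m + \ell$ is concentrated around $n(\ell \mid m)$ provided $t \to \infty$ and $\ell$ is not too large. As $\ell \to \infty$, $n(\ell \mid m) \sim K \ell^{-(1+1/\eta)}$. The range of $\eta$ is $0 < \eta < 1$, so the power law coefficient satisfies $\gamma \ge 2$, where $\eta = \frac{\alpha m A_1 + \beta M (B_1 + C_1)}{2(\alpha m + \beta M)}$. As $\eta \to 0$ we get a random graph with a geometric degree sequence: $\lim_{\eta \to 0} n_{\eta}(\ell \mid m) = \frac{1}{\nu+1} \left(\frac{\nu}{\nu+1}\right)^{\ell}$.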
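The product formula for n(ℓ | m) is easy to evaluate numerically, which makes the stated limits checkable. The sketch below is illustrative (the function name is hypothetical): the numerator runs over ℓ factors and the denominator over ℓ + 1 factors, as in the formula above.

```python
def expected_proportion(l, m, eta, nu):
    """n(l | m): expected proportion of vertices of degree m + l.

    Numerator:   product over j = 0..l-1 of ((m + j) * eta + nu)
    Denominator: product over j = 0..l   of ((m + j) * eta + nu + 1)
    """
    value = 1.0 / ((l + m) * eta + nu + 1)  # the extra denominator factor j = l
    for j in range(l):
        value *= ((m + j) * eta + nu) / ((m + j) * eta + nu + 1)
    return value


# Standard PA with m = 1 (eta = 1/2, nu = 0).
print(expected_proportion(0, 1, 0.5, 0.0), expected_proportion(4, 1, 0.5, 0.0))
# Doubling a large l should scale n by roughly 2 ** -(1 + 1/eta) = 1/8.
print(expected_proportion(2000, 1, 0.5, 0.0) / expected_proportion(1000, 1, 0.5, 0.0))
```

For m = 1, η = 1/2, ν = 0 the product simplifies to 4/(k(k+1)(k+2)) with k = ℓ + 1, which is the familiar closed form for the preferential attachment tree.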
  • Hub-Authority model: directed. Hub: a vertex with a lot of edges directed out (opinionated page). Authority: a vertex with a lot of edges directed in (popular page). The initial in- and out-degree is given by a distribution $(P^{-}, P^{+})$. How does a new vertex $v$ added at step $t + 1$ choose its IN-neighbours? $\Pr(w \text{ points to } v) = D_1 \frac{d^{+}(w,t)}{|E(t)|} + D_2 \frac{1}{|V(t)|}$: it is most likely that a hub vertex will point an edge to $v$. How does a new vertex $v$ added at step $t + 1$ choose its OUT-neighbours? $\Pr(v \text{ points to } w) = A_1 \frac{d^{-}(w,t)}{|E(t)|} + A_2 \frac{1}{|V(t)|}$: it is most likely that $v$ will point to an authority vertex.
  • Results summary. Undirected model: (✓) age dependent degree distribution; (✓) number of vertices with given degree; (✓) asymptotic degree sequence $n(k) \sim k^{-x}$. Hub-Authority model: (✓) age dependent in- and out-degree distribution; (✓, ×) number of vertices with given in- and out-degree (as an integral); (✓) asymptotic degree sequence $n(k,\ell) \sim k^{-x^{-}} \ell^{-x^{+}}$, $x = x(k,\ell)$. General directed model: (×) the in- and out-degree distribution is not obtainable explicitly; it is a sum of path dependent integrals (the order of events matters).
  • Directed model (definition only). In general, the choice type can be made on a mixture of IN and OUT degree. E.g. how does a new vertex $v$ added at step $t$ choose its OUT-neighbours? $\Pr(v \text{ points to } w) = A_{(1,+)} \frac{d^{+}(w,t-1)}{|E(t-1)|} + A_{(1,-)} \frac{d^{-}(w,t-1)}{|E(t-1)|} + A_2 \frac{1}{|V(t-1)|}$, where $A_{(1,+)} + A_{(1,-)} + A_2 = 1$. An in-degree of 2 at $w$ could be made up of various choices $(++), (+-), (-+), (--)$ at $w$ by subsequent vertices $t > w$.
  • Results: Hub-Authority model. Degree distribution: explicit distribution (similar to undirected). Power law: the number of vertices $n(r,s)$ of in-degree $r$ and out-degree $s$ is of the form $n(r,s \mid m^{-}, m^{+}) = C_{r,s}\, r^{-x^{-}} s^{-x^{+}}$. The parameters $x^{-}, x^{+}$ depend on the relative sizes of $r, s$. They change as $s$ increases from 1 to $s = \Theta(r^{\eta^{+}/\eta^{-}})$. Functional form (a quotient): $x = f(\eta^{+}, \eta^{-}, \nu, m^{+}, m^{-})$, where $\eta^{+}, \eta^{-}$ are the preferential attachment parameters. The parameter $\eta^{-}$ is the limiting ratio of the expected number of edges whose terminal vertex was chosen by preferential attachment to the expected number of edges of the process: $\eta^{-} = \frac{\alpha m^{+} A_1 + \beta M C_1}{\alpha m^{+} + \gamma m^{-} + \beta M}$.
  • How does the degree sequence differ from undirected? $\Pr(d^{-}(v,t) = r,\, d^{+}(v,t) = s) \sim \Pr(d^{-}(v,t) = r) \Pr(d^{+}(v,t) = s)$. The expected proportion of vertices of degree $(r,s)$ is $n(r,s) = C\, r^{-(1-\xi^{-})} s^{-(1-\xi^{+})} J(r,s)$, where $\xi^{+} = m^{+} + \nu^{+}/\eta^{+}$ and $J(r,s) = \int_0^1 x^{a} (1-x)^{r} (1-x^{b})^{s}\, dx$, with $b = \eta^{+}/\eta^{-}$ and $a = (\eta^{+}/\eta^{-})\, \xi^{+} + 1/\eta^{-} + \xi^{-} - 1$. The asymptotics for $J(r,s)$ depend on the relative sizes of $r, s$.
  • Increasing degree model: preferential attachment. Can we escape from the power law $\gamma = 3$ by increasing the number of edges added at each step? At each step $t$ add a NEW vertex with $f(t)$ edges, $f(t) = [t^{c}]$, $0 < c < 1$. For degrees $k$ above $t^{c}$ the power law we get is $n_k = C\, \frac{t^{\frac{1+c}{1-c}}}{k^{\frac{3-c}{1-c}}}$. We need a constant $c > 0$ to escape the power law $\gamma = 3$ given by PA models. When $c = 1$ all vertices have degree $\sim t$, so there is no power law anymore. For $0 < c < 1$ the power law is $\gamma(c) = 1 + 2/(1-c) > 3$.
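A rough mean-field calculation shows where the exponent γ(c) = 1 + 2/(1 − c) comes from. This is only a heuristic sketch that replaces random degrees by their expected growth and ignores floors and constants; it is not the argument used in the talk or in the cited paper.

```latex
% Total degree at step t, with s^c edges added at each step s:
\sum_{s \le t} 2 s^{c} \approx \frac{2\, t^{1+c}}{1+c}.
% Each of the t^c new endpoints hits v with probability d(v,t) / (total degree):
\frac{\mathrm{d}}{\mathrm{d}t}\, d(v,t) \approx t^{c} \cdot \frac{(1+c)\, d(v,t)}{2\, t^{1+c}}
  = \frac{(1+c)\, d(v,t)}{2t},
\qquad d(v,v) = v^{c}
\;\Longrightarrow\;
d(v,t) \approx v^{c} \left(\frac{t}{v}\right)^{\frac{1+c}{2}}.
% Vertices of degree at least k are those added early enough:
d(v,t) \ge k \iff v \le t^{\frac{1+c}{1-c}}\, k^{-\frac{2}{1-c}},
% and differentiating in k gives the number of vertices of degree k:
n_k \propto t^{\frac{1+c}{1-c}}\, k^{-\frac{2}{1-c} - 1}
  = t^{\frac{1+c}{1-c}}\, k^{-\frac{3-c}{1-c}},
\qquad \gamma(c) = \frac{3-c}{1-c} = 1 + \frac{2}{1-c}.
```

Setting c = 0 gives back γ = 3, and the bound on v is at most t exactly when k ≥ t^c, which matches the range of degrees for which the slide states the power law.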
  • Concluding remarks. Good points of the web-graph model: the method works well for undirected models; it provides a heuristic for predicting the degree sequence power law and maximum degree in unrelated models; it generalizes to hypergraph models (not covered in this talk); if $1 \le m(t) = t^{o(1)}$ edges are added at step $t$, the power law is 3. Not so good points of the web-graph model: directed models are less pleasing, as the power law varies as a function of the relative sizes of in-degree and out-degree; for the general directed model, is there no closed form for the degree distribution?; the model does not explain or predict power laws with parameter $\gamma < 2$ (as $\eta \le 1$ it must be that $\gamma = 1 + 1/\eta \ge 2$).
  • Thank you. Questions?