Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Random walks and diffusion on networks

11 views

Published on

Presentation slides for Winter School on Spectral Methods for Complex Systems, a webinar, January 21-30 (2019)

Published in: Science
  • Be the first to comment

  • Be the first to like this

Random walks and diffusion on networks

  1. 1. Random walks and diffusion on networks Naoki Masuda Department of Engineering Mathematics University of Bristol, UK naoki.masuda@bristol.ac.uk http://www.naokimasuda.net Online Winter School on Spectral Methods for Complex Systems Webinar, January 21–30 (2019) Ref: Masuda, Porter & Lambiotte, Physics Reports, 716–717, 1–58 (2017). [open access] Naoki Masuda (University of Bristol) Random walks and diffusion on networks 1 / 61
  2. 2. Talk plan • Lecture 1: Introduction, dynamics and stationary density • Random walks and diffusion • 1-dim • Networks • Lecture 2: Applications • PageRank • Community detection • Respondenet-driven sampling • Consensus dynamics Naoki Masuda (University of Bristol) Random walks and diffusion on networks 2 / 61
  3. 3. Lecture 1: Introduction, dynamics and stationary density Naoki Masuda (University of Bristol) Random walks and diffusion on networks 3 / 61
  4. 4. Diffusion is a broad term • Diffusion of innovation • Biological diffusion (public domain; US Library of Congress’s Prints and Photographs division under the digital ID cph.3b46036) Naoki Masuda (University of Bristol) Random walks and diffusion on networks 4 / 61
  5. 5. Background • “Gambler’s ruin”: Pascal, Fermat, Huygens, Bernoulli, … • Term “random walk” was coined by Karl Pearson (1905). • Brownian motion of particles colliding with atoms and molecules (Einstein, 1905) • Applications: locomotion and foraging of animals, neuronal firing, decision-making in the brain, population genetics, polymer chains, financial markets, sports statistics … • Relevant theoretical research areas: probability theory (in maths), CS, statistical physics, OR, … Naoki Masuda (University of Bristol) Random walks and diffusion on networks 5 / 61
  6. 6. Why do we study RWs on networks? • Theoretical interest • Applications. Especially, RWs lie at the core of many algorithms • PageRank • community detection • core-periphery structure • diffusion map • respondent-driven sampling • consensus algorithms Naoki Masuda (University of Bristol) Random walks and diffusion on networks 6 / 61
  7. 7. 1-dim • Assume the continuous space (R1) • Discrete-space case is similar • Assume discrete-time random walks (DTRWs) • In each time step, a walker at x moves to the interval [x + r, x + r + ∆r] with probability f (r)∆r. Then, the master equation is given by p(x; n) = ∫ ∞ −∞ f (x − x′ )p(x′ ; n − 1)dx′ . • As time n tends to ∞, p(x; n) = 1 (2πDn)1/2 e− (x−vn)2 4Dn , where v ≡ ⟨r⟩ and D ≡ ⟨(r−⟨r⟩)2⟩ 2 . Naoki Masuda (University of Bristol) Random walks and diffusion on networks 7 / 61
  8. 8. Continuous-time RW for rwreview t2 t3 t4 timet5t1 3 t6 x (a) (b) 2 1 4 5 t7 t8 t9 t10 0 3 2 3 4 5 4 3 2 3 2 1 1 2 3 4 5 6 7 8 9 10steps (n) x 6 (This fig assumes a discrete space) Naoki Masuda (University of Bristol) Random walks and diffusion on networks 8 / 61
  9. 9. for rwreview t2 t3 t4 timet5t1 3 t6 x (a) (b) 2 1 4 5 t7 t8 t9 t10 0 3 2 3 4 5 4 3 2 3 2 1 1 2 3 4 5 6 7 8 9 10steps (n) x 6 p(x; t) = ∞∑ n=0 p(x; n)p(n, t) where x: position, t: time, n: number of moves made ⟨n⟩ = t ⟨τ⟩ , where τ is inter-event time E.g., ψ(τ) = βe−βτ → p(n, t) = (βt)n n! e−βt ✓Full sol’n available in terms of the Laplace and Fourier transforms Naoki Masuda (University of Bristol) Random walks and diffusion on networks 9 / 61
  10. 10. RWs on networks Naoki Masuda (University of Bristol) Random walks and diffusion on networks 10 / 61
  11. 11. Adjacency matrix v1 2 3 4 5 v v v v ⇐⇒ A =       0 1 1 1 1 1 0 0 1 0 1 0 0 0 0 1 1 0 0 1 1 0 0 1 0       • We allow directed and weighted adjacency matrices. Naoki Masuda (University of Bristol) Random walks and diffusion on networks 11 / 61
  12. 12. Transition-probability matrix (of DTRW) • i → j with probability Tij = Aij sout i where sout i = ∑N j=1 Aij is the out-strength of node i. 1 3 v v v4 5 2 v v v6 7 8v v A =             0 0 1 1 0 0 0 1 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 1 1 1 1 0 0 1 1 1 0 0 0 1 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0             • Directed networks allowed • Weighted networks allowed Q: Which complicates things more than the other? Naoki Masuda (University of Bristol) Random walks and diffusion on networks 12 / 61
  13. 13. Transition-probability matrix (of DTRW) • i → j with probability Tij = Aij sout i where sout i = ∑N j=1 Aij is the out-strength of node i. 1 3 v v v4 5 2 v v v6 7 8v v T =             0 0 1 3 1 3 0 0 0 1 3 0 0 0 1 2 0 0 0 1 2 1 0 0 0 0 0 0 0 1 3 0 0 0 0 1 3 0 1 3 1 6 1 6 1 6 1 6 0 0 1 6 1 6 1 3 0 0 0 1 3 0 0 1 3 1 2 0 0 0 0 0 0 1 2 0 0 0 0 0 1 0 0             • Directed networks allowed • Weighted networks allowed Q: Which complicates things more than the other? Naoki Masuda (University of Bristol) Random walks and diffusion on networks 12 / 61
  14. 14. i → j with probability Tij = Aij sout i , where sout i = ∑N j=1 Aij • Conservation: ∑N j=1 Tij = 1 • Dynamics: pj (n + 1) = ∑N i=1 pi (n)Tij (j ∈ {1, . . . , N}) • Show that ∑N i=1 pi (n) = 1 • With vectors, p(n + 1) = p(n)T where p(n) = (p1(n) , . . . , pN(n)) • p(n) = p(0)Tn. That’s it. • Q: linear or nonlinear? Naoki Masuda (University of Bristol) Random walks and diffusion on networks 13 / 61
  15. 15. Two main Qs for RWs • Both in theory and applications, we are mostly asking some of the following questions. • Stationary density (infinite-time behaviour) • Relaxation time (asymptotic behaviour) • First-passage time (finite-time behaviour) • Other important questions: • Flow • Other finite-time behaviour (esp. in community detection) • (Loss of) detailed balance Naoki Masuda (University of Bristol) Random walks and diffusion on networks 14 / 61
  16. 16. Stationary density of DTRWs p(n + 1) = p(n)T p(n) = p(0)Tn Let p∗ i = lim n→∞ pi (n) p∗ = p∗ T • The stationary density is the left eigenvector of T with eigenvalue 1. • The corresponding right eigenvector is (1 , . . . , 1)⊤. Why? Naoki Masuda (University of Bristol) Random walks and diffusion on networks 15 / 61
  17. 17. A central result on RWs on networks For undirected networks, the stationariy density of DTRWs at node i is given by p∗ i = si ∑N ℓ=1 sℓ (i ∈ {1, . . . , N}) where si = ∑N j=1 Aij = ∑N j=1 Aji is the strength (i.e., weighted degree) of node i Proof: Use p∗ = p∗T, or p∗ i = ∑N j=1 p∗ j Tji (for each i). Naoki Masuda (University of Bristol) Random walks and diffusion on networks 16 / 61
  18. 18. What does p∗ i = si ∑N ℓ=1 sℓ mean? Naoki Masuda (University of Bristol) Random walks and diffusion on networks 17 / 61
  19. 19. Stationary density for directed networks • No such handy results • First-order approximation: p∗ i = N∑ j=1 p∗ j Aji sout j ≈ (const) × N∑ j=1 Aji ∝ sin i where sin i = ∑N j=1 Aji is the in-strength of node i. • Accurate in many cases and inaccurate in many other cases Naoki Masuda (University of Bristol) Random walks and diffusion on networks 18 / 61
  20. 20. Example 1 3 v v v4 5 2 v v v6 7 8v v Node p∗ i kin i From kin i kout i From kout i v1 0.2275 5 0.2381 3 0.1429 v2 0.0140 1 0.0476 2 0.0952 v3 0.0899 2 0.0952 1 0.0476 v4 0.0969 3 0.1429 3 0.1429 v5 0.0843 1 0.0476 6 0.2857 v6 0.2528 2 0.0952 3 0.1429 v7 0.0140 1 0.0476 2 0.0952 v8 0.2205 6 0.2857 1 0.0476 Naoki Masuda (University of Bristol) Random walks and diffusion on networks 19 / 61
  21. 21. Relaxation time • Strategy: Project p(n) onto an appropriate basis of vectors and to represent it in terms of its modes. • Sometimes called a “graph Fourier transform” • Assume undirected networks, just for simplicity. The transition probability matrix T is generally asymmetric even for undirected networks. But it has a special form. Consider the symmetric matrix ˜Aij = Aij √ si sj Decomposition: ˜A = N∑ ℓ=1 λℓuℓu⊤ ℓ • λℓ: ℓth eigenvalue of ˜A, which is real-valued. Why? • uℓ: corresponding normalized eigenvector (so that ⟨uℓ, uℓ′ ⟩ = δℓℓ′ ) Naoki Masuda (University of Bristol) Random walks and diffusion on networks 20 / 61
  22. 22. ˜Aij = Aij √ si sj Because Tij = √ sj ˜Aij √ si , T and A are similar: T = D−1/2 ˜AD1/2 , where D = diag(k1, . . . , kN). Similarity implies that T and ˜A have the same eigenvalues. So, all eigenvalues of T are real-valued. The left and right eigenvectors of T for eigenvalue λℓ are given by uL ℓ = u⊤ ℓ D1/2 = ((uℓ)1 √ s1, . . . , (uℓ)N √ sN) and uR ℓ = D−1/2 uℓ = ( (uℓ)1 √ s1 , . . . , (uℓ)N √ sN )⊤ . Naoki Masuda (University of Bristol) Random walks and diffusion on networks 21 / 61
  23. 23. Using Tn = D−1/2 ˜An D1/2 = D−1/2 N∑ ℓ=1 λn ℓ uℓu⊤ ℓ D1/2 = N∑ ℓ=1 λn ℓ uR ℓ uL ℓ , we obtain the mode expansion of the solution of the RW: p(n) = p(0)Tn = N∑ ℓ=1 λn ℓ uL ℓ ⟨p(0), uR ℓ ⟩ . which is equivalent to pi (n) = N∑ ℓ=1 aℓ(n)(uL ℓ )i , where aℓ(n) = λn ℓ ⟨p(0), uR ℓ ⟩ is the projection onto the ℓth eigenmode. “Graph Fourier transform”: p(n) → (a1(n), . . . , aN(n)). The eigenvectors of T play the role of the Fourier modes eikx . Naoki Masuda (University of Bristol) Random walks and diffusion on networks 22 / 61
  24. 24. Eigenmodes, graph spectra, and relaxation • −1 ≤ λℓ ≤ 1 for each ℓ. • λℓ > −1 for all ℓ except for the bipartite graph. Then, the mode with λℓ = 1 corresponds to the stationary density. • λmax = 1 is unique if the network is connected. p(n) = p(0)Tn = N∑ ℓ=1 λn ℓ uL ℓ ⟨p(0), uR ℓ ⟩ Relaxation: p(n) ≈ 1n uL max⟨p(0), uR max⟩ + λn 2nduL 2nd⟨p(0), uR 2nd⟩ Naoki Masuda (University of Bristol) Random walks and diffusion on networks 23 / 61
  25. 25. p(n) ≈ 1n uL max⟨p(0), uR max⟩ + λn 2nduL 2nd⟨p(0), uR 2nd⟩ Because uR max = 1√∑N i=1 si    1 ... 1    and uL max = (s1, ... ,sN ) √∑N i=1 si , p(n) ≈ p∗ + λn 2nduL 2nd⟨p(0), uR 2nd⟩ Can verify this? • λ2nd governs the relaxation time. • 1 − λ2 is often called the “spectral gap”. • The Perron–Frobenius theorem guarantees all elements of uL max and uR max are positive. Naoki Masuda (University of Bristol) Random walks and diffusion on networks 24 / 61
  26. 26. Cheeger inequality Cheeger constant (also called conductance) h = min S { (number of edges that connect S and S) min{vol(S), vol(S)} } , • vol(S) ≡ ∑N i=1;vi ∈S (weighted degree of node i). • Seeking a minimum cut Cheeger inequality h2 2 < 1 − |λ2| ≤ 2h , • Relationship between the relaxation speed and h? • Intuition? Community structure? Naoki Masuda (University of Bristol) Random walks and diffusion on networks 25 / 61
  27. 27. CTRWs on networks t2 t3 t4 timet5t1 3 t6 x (a) (b) 2 1 4 5 t7 t8 t9 t10 0 3 2 3 4 5 4 3 2 3 2 1 1 2 3 4 5 6 7 8 9 10steps (n) x 6 Naoki Masuda (University of Bristol) Random walks and diffusion on networks 26 / 61
  28. 28. • Node-centric CTRW 1 The walker at node i waits an inter-event time τ distributed according to ψ(τ) 2 Then, it moves to an (out-) neighbour j with the probability ∝ Aij • Edge-centric CTRW: 1 Each edge (incident to i) is activated with an inter-event time τ distributed according to ψ(τ) 2 The walker moves the (first) activated edge to a neighbour j • They are the same only for regular networks (or sout i = (const) for directed networks) • When ψ(τ) is a non-Poissonian distribution (cf. temporal networks), things are more complicated. Here we assume a Poissonian ψ(τ). 1 1 1/4 1/4 (a) Node-centric CTRW (b) Edge-centric CTRW 1/3 1/4 1 1 1 1 11/3 1/3 1/4 Naoki Masuda (University of Bristol) Random walks and diffusion on networks 27 / 61
  29. 29. (Poissonian) node-centric CTRW Master equation: dp(t) dt = p(t)(−I + T) ≡ −p(t)L′ = −p(t)D−1 L where L′ =I − T = ( δij − Aij sout i ) (random-walk normalized Laplacian) D =diag(sout 1 , . . . , sout N ) (diag(s1, . . . , sN) if undirected networks) L ≡D − A ((combinatorial) Laplacian matrix) Stationary density: dp∗(t) dt = 0 → p∗ = p∗T. Same as DTRW. If the network is undirected, p∗ i = si / ∑N ℓ=1 sℓ. Naoki Masuda (University of Bristol) Random walks and diffusion on networks 28 / 61
  30. 30. (Poissonian) edge-centric CTRW Master equation: dp(t) dt = p(t)(−D + A) = −p(t)L Stationary density: p∗ L = 0 i.e., zero eigenvector of L • For undirected networks, p∗ = 1 N (1 , . . . , 1) • For directed networks, no simple solution but the first-order approximation: p∗ i ≈ (const) × sin i sout i Naoki Masuda (University of Bristol) Random walks and diffusion on networks 29 / 61
  31. 31. Summary • DTRW • Driven by transition-probability matrix T • p∗ = p∗ T • p∗ i ∝ s∗ i for undirected networks • Mode decomposition • Relaxation governed by λ2nd • CTRW • Node-centric • Driven by random-walk normalized Laplacian L′ = I − T • Same p∗ as DTRW • Edge-centric • Driven by combinatorial Laplacian L = D − A • p∗ L = 0 • p∗ i = 1/N for undirected networks • Undirected vs directed Naoki Masuda (University of Bristol) Random walks and diffusion on networks 30 / 61
  32. 32. Lecture 2: Applications Naoki Masuda (University of Bristol) Random walks and diffusion on networks 31 / 61
  33. 33. Applications • PageRank • Community detection • Respondent-driven sampling • Consensus dynamics (if time allows) Naoki Masuda (University of Bristol) Random walks and diffusion on networks 32 / 61
  34. 34. PageRank • Sergey Brin & Larry Page (1998) (but see Pinski & Narin 1976) Rank websites in Google search! They were born on 1973. 12th and 13rd richest persons in the world (Nov 2016) • Other applications: ranking of academic journals and papers, professional sports, disease-gene identification, recommendation systems in online marketplaces, prediction of traffic flow . . . • It is the stationary density of a DTRW on a network. Why should it be? Pictures from Wikipedia; CC BY 2.0 (for Sergey Brin) and CC BY-SA 3.0 (for Larry Page) Naoki Masuda (University of Bristol) Random walks and diffusion on networks 33 / 61
  35. 35. Web graph Naoki Masuda (University of Bristol) Random walks and diffusion on networks 34 / 61
  36. 36. Design principle of the PageRank The rank of node i should be large when • i receives many edges (with large weights) • i received edges from important nodes j • i receives “exclusive” edges from j (A) (B) (C) v1 2 v1 v1v 2v 2v p∗ = p∗T, i.e., p∗ i = ∑N j=1 p∗ j Aji ∑N ℓ=1 Ajℓ fulfills these three criteria. Naoki Masuda (University of Bristol) Random walks and diffusion on networks 35 / 61
  37. 37. PageRank math • Empirical networks (e.g., web graph) are usually not “strongly connected” • Sinks (dangling nodes) i.e., sout i = 0 • Sources, i.e., sin i = 0 • Multiple strongly connected components • transient nodes and components Bow-tie strutcure Naoki Masuda (University of Bristol) Random walks and diffusion on networks 36 / 61
  38. 38. PageRank math • So, we allow walkers to teleport to other nodes with probability 1 − α to make an effective network that is strongly connected. • Master equation: pi (t + 1) = α N∑ j=1 pj (t)Tji + (1 − α)ui where (u1, . . . , uN) is the “preference vector” ( ∑N i=1 ui = 1) • A popular choice: ui = 1/N • At dangling nodes, teleport with probability 1 • A popular choice: α = 0.85 • Effective network: T′ ij = αTij + (1 − α)uj (and correction for dangling nodes) • PageRank: p∗ = p∗T′ Naoki Masuda (University of Bristol) Random walks and diffusion on networks 37 / 61
  39. 39. Back to the example 1 3 v v v4 5 2 v v v6 7 8v v Node p∗ i kin i kout i v1 0.2275 5 3 v2 0.0140 1 2 v3 0.0899 2 1 v4 0.0969 3 3 v5 0.0843 1 6 v6 0.2528 2 3 v7 0.0140 1 2 v8 0.2205 6 1 • The three criteria are respected? • i receives many edges (with large weights) • i receives edges from important nodes j • i receives “exclusive” edges from j Naoki Masuda (University of Bristol) Random walks and diffusion on networks 38 / 61
  40. 40. Tennis players on PageRank Who Is the Best (Radicchi, PLOS ONE, 6, e17249, 2011; paper published under CC BY license.) Naoki Masuda (University of Bristol) Random walks and diffusion on networks 39 / 61
  41. 41. Tennis players on PageRank3Þ y 4Þ r 3) It is worth to notice that for q~1, Eqs. (6) correctly give Pr~2{‘ for any r, meaning that, in absence of diffusion, prestige is homogeneously distributed among all nodes. Conversely, for q~0 the solution is Table 1. Top 30 players in the history of tennis. Rank Player Country Hand Start End 1 Jimmy Connors United States L 1970 1996 2 Ivan Lendl United States R 1978 1994 3 John McEnroe United States L 1976 1994 4 Guillermo Vilas Argentina L 1969 1992 5 Andre Agassi United States R 1986 2006 6 Stefan Edberg Sweden R 1982 1996 7 Roger Federer Switzerland R 1998 2010 8 Pete Sampras United States R 1988 2002 9 Ilie Na ^ stase Romania R 1968 1985 10 Bjo¨rn Borg Sweden R 1971 1993 11 Boris Becker Germany R 1983 1999 12 Arthur Ashe United States R 1968 1979 13 Brian Gottfried United States R 1970 1984 14 Stan Smith United States R 1968 1985 15 Manuel Orantes Spain L 1968 1984 16 Michael Chang United States R 1987 2003 17 Roscoe Tanner United States L 1969 1985 18 Eddie Dibbs United States R 1971 1984 19 Harold Solomon United States R 1971 1991 20 Tom Okker Netherlands R 1968 1981 21 Mats Wilander Sweden R 1980 1996 22 Goran Ivanisˇevic’ Croatia L 1988 2004 23 Vitas Gerulaitis United States R 1971 1986 2‘{1 ‘{1 r~0 2{q 2 z 2{qð Þ‘ 2‘{1 1{ 2{qð Þ=2½ Š‘ 2{qð Þ=2½ Š z 2{qð Þ‘ : 2‘ { 2{qð Þ‘ q z 2{qð Þ‘ Figure 3. Prestige score in a single tournament. Prestige score Pr as a function of the number of victories r in a tournament with ‘~7 rounds (Grand Slam). Black circles are obtained from Eqs. (7) and valid for q~0. All other values of qw0 have been calculated from Eqs. (6): red squares stand for q~0:15, blue diamonds for q~0:5, violet up-triangles for q~0:85 and green down-triangles for q~1. doi:10.1371/journal.pone.0017249.g003 8 Pete Sampras United States R 1988 2002 9 Ilie Na ^ stase Romania R 1968 1985 10 Bjo¨rn Borg Sweden R 1971 1993 11 Boris Becker Germany R 1983 1999 12 Arthur Ashe United States R 1968 1979 13 Brian Gottfried United States R 1970 1984 14 Stan Smith United States R 1968 1985 15 Manuel Orantes Spain L 1968 1984 16 Michael Chang United States R 1987 2003 17 Roscoe Tanner United States L 1969 1985 18 Eddie Dibbs United States R 1971 1984 19 Harold Solomon United States R 1971 1991 20 Tom Okker Netherlands R 1968 1981 21 Mats Wilander Sweden R 1980 1996 22 Goran Ivanisˇevic’ Croatia L 1988 2004 23 Vitas Gerulaitis United States R 1971 1986 24 Rafael Nadal Spain L 2002 2010 25 Rau´l Ramirez Mexico R 1970 1983 26 John Newcombe Australia R 1968 1981 27 Ken Rosewall Australia R 1968 1980 28 Yevgeny Kafelnikov Russian FederationR 1992 2003 29 Andy Roddick United States R 2000 2010 30 Thomas Mu¨ster Austria L 1984 1999 Players having been at the top of ATP ranking are highlighted in gray. From left to right we indicate for each player: rank position according to prestige score, full name, country of origin, the hand used to play, and the years of the first and last ATP tournament played. doi:10.1371/journal.pone.0017249.t001 PLoS ONE | www.plosone.org 4 February 2011 | Volume 6 | Issue 2 | e17249 (Radicchi, PLOS ONE, 6, e17249, 2011) Naoki Masuda (University of Bristol) Random walks and diffusion on networks 40 / 61
  42. 42. Community detection • A very very popular application of network analysis • Various algorithms to detect communities in unlabeled networks Naoki Masuda (University of Bristol) Random walks and diffusion on networks 41 / 61
  43. 43. (from Wikipedia Commons, licensed under the Creative Commons, CC BY-SA 4.0) Naoki Masuda (University of Bristol) Random walks and diffusion on networks 42 / 61
  44. 44. Why RWs? • Idea: A random walker would wander within a community for a long time before transiting to another community. • Various algorithms based on this idea: Markov stability, Walktrap, InfoMap, Nibble, . . . Naoki Masuda (University of Bristol) Random walks and diffusion on networks 43 / 61
  45. 45. Walktrap (Pons & Latapy, J. Graph Algo., 2006) • Consider an undirected and unweighted network • A (DT)RW-based distance between nodes: rij = N∑ ℓ=1 (Tn iℓ − Tn jℓ)2 kℓ Physical meaning of rij ? • Nodes i and j with small rij would belong to the same community • Then run a hierarchical clustering algorithm • Discounted by kℓ because p∗ ℓ ∝ kℓ • n should not be too large because limn→∞ Tn iℓ = limn→∞ Tn jℓ = p∗ ℓ , implying rij ≈ 0 (Wikipedia; copyright: CC BY-SA 4.0) Naoki Masuda (University of Bristol) Random walks and diffusion on networks 44 / 61
  46. 46. InfoMap (Rosvall & Bergstrom, PNAS 2008) • Use DTRWs on (generally) directed and weighted networks • Encode each node into a binary code word • Ex: If v1 → 000, v2 → 001, v3 → 010, v4 → 011, v5 → 100, . . ., Trajectory v3, v6, v3, v1, v8, . . . is encoded into 010101010000111 · · · . • Unique decoding requires a “prefix-free” coding scheme. • : Ex: If v1 → 000 and v2 → 0001? • Huffman code is a famous prefix-free code yielding short binary sequences. frequently visited node → short code word • The Shannon entropy gives the lower bound of the mean code word length H = − N∑ i=1 p∗ i log p∗ i Not about communities so far Naoki Masuda (University of Bristol) Random walks and diffusion on networks 45 / 61
  47. 47. • Use coding schemes local to individual communities. • Extra workaround to get around entry to and exit from communities • But shorter code words if walkers wander within a community • Doesn’t matter if the same local code word is used in different communities • Find such a two-layer Huffman code that minimises the mean code word length 110 111 01 10 110 111 01 10 111 10 111 10 01 110 01 110 in: 11 out: 00 in: 00 out: 00 in: 01 out: 00 in: 10 out: 00 11 111 10 01 00 00 10 01 110 . . . Naoki Masuda (University of Bristol) Random walks and diffusion on networks 46 / 61
  48. 48. Temporal communities (mapping change) (Rosvall & Bergstrom, PLoS ONE, 5, e8694 (2016); published under the terms of the Creative Commons) Apps, Code and more at their website: http://www.mapequation.org/ Naoki Masuda (University of Bristol) Random walks and diffusion on networks 47 / 61
  49. 49. Respondent-driven sampling • Estimate the fraction of infected individuals in a large or difficult-to-reach population. What would you do? • Respondent-driven sampling uses edge-tracing in a social network. 1 Start from a seed individual 2 He/she recruits neighbours to a survey by passing a coupon to each of them. 3 The successfully recruited individuals then participate in the survey and also pass coupons to their neighbors. 4 Calculate a weighted mean of the samples to derive an estimate • Don’t forget to reward participants Naoki Masuda (University of Bristol) Random walks and diffusion on networks 48 / 61
  50. 50. • Useful for difficult-to-reach populations • RDS = “snowball sampling” + mathematics (Ross et al., BMJ Open, 6, e012072 (2016); published in accordance with CC BY 4.0) Naoki Masuda (University of Bristol) Random walks and diffusion on networks 49 / 61
  51. 51. Q: Why a “weighted” mean? Naoki Masuda (University of Bristol) Random walks and diffusion on networks 50 / 61
  52. 52. • yi : quantity assigned to node i (e.g., infected or not, age, opinion) • NS : number of samples in S • p∗ i = ki N⟨k⟩ • Estimator: ⟨ˆy⟩ = 1 NS ∑ vi ∈S yi N ˆp∗ i because N ˆp∗ i is normalised (⟨N ˆp∗ i ⟩ = 1) • Use ˆp∗ i = ki N⟨ˆk⟩ to rewrite ⟨ˆy⟩ = 1 NS ∑ vi ∈S yi ⟨ˆk⟩ ki • What’s all these calculations? → We do not know the value of ⟨k⟩ or N. So we have to estimate them (or evade to estimate them). This is why we are estimating p∗ i . Naoki Masuda (University of Bristol) Random walks and diffusion on networks 51 / 61
  53. 53. Because p∗ i = ki N⟨k⟩ , we get ⟨ˆk⟩ = ∑ vi ∈S ki Np∗ i ∑ vi ∈S 1 Np∗ i = NS ∑ vi ∈S (ki )−1 . This leads to ⟨ˆy⟩ = 1 NS ∑ vi ∈S yi ⟨ˆk⟩ ki = ∑ vi ∈S (ki )−1 yi ∑ vi ∈S (ki )−1 (RDS II estimator. Volz & Heckathorn, J Official Stat, 24, 79–97, 2008) E.g., proportion of nodes that have a discrete type A (e.g., infected): ˆPA = ∑ vi ∈A∩S (ki )−1 ∑ vi ∈S (ki )−1 Naoki Masuda (University of Bristol) Random walks and diffusion on networks 52 / 61
  54. 54. Voter model • Stochastic dynamics on networks • In discrete time, pick an edge uniformly randomly and achieve local consensus if the two nodes connected to the selected edge are discordant. consensus consensus Naoki Masuda (University of Bristol) Random walks and diffusion on networks 53 / 61
  55. 55. Consensus probability 1−Fi Fi i for random walk slides • Also called fixation probability • May depend on the initial seed node F1 F2 F3 F4 F5 F6 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 Naoki Masuda (University of Bristol) Random walks and diffusion on networks 54 / 61
  56. 56. How to get consensus probability, Fi? • Select an edge uniformly randomly for update in each step. • Convention: i → j means node i influences node j Fi = N∑ j=1 Aij ∑N i′,j′=1 Ai′j′ F{i,j} + ∑N j=1 Aji ∑N i′,j′=1 Ai′j′ × F∅ + ∑N i′,j′=1;i′̸=i,j′̸=i Ai′j′ ∑N i′,j′=1 Ai′j′ Fi • Note that F∅ = 0 Naoki Masuda (University of Bristol) Random walks and diffusion on networks 55 / 61
  57. 57. • F{i,j} = Fi + Fj because . . . 1 2 3 4 5 6 F4 1 2 3 4 5 6 F5 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 F1 1 2 3 4 5 6 F2 F3 F6 Naoki Masuda (University of Bristol) Random walks and diffusion on networks 56 / 61
  58. 58. Consensus probability in directed networks Fi N∑ j=1 Aji = N∑ j=1 Aij Fj Proof: Fi = N∑ j=1 Aij ∑N i′,j′=1 Ai′j′ F{i,j} + ∑N j=1 Aji ∑N i′,j′=1 Ai′j′ × F∅ + ∑N i′,j′=1;i′̸=i,j′̸=i Ai′j′ ∑N i′,j′=1 Ai′j′ Fi Naoki Masuda (University of Bristol) Random walks and diffusion on networks 57 / 61
  59. 59. Consensus probability in directed networks Fi N∑ j=1 Aji = N∑ j=1 Aij Fj • For undirected networks, Fi = 1/N for all nodes. For the edge-centric CTRW, p∗ L = p∗ (D − A) = 0 i.e. pi N∑ j=1 Aij − N∑ j=1 pj Aji = 0, ∀i So, Fi is the stationary density of the edge-centric CTRW on the edge-reversed network. This is not a coincidence. The dual process of this opinion dynamics is a coalescing RW on the edge-reversed network in continuous time. Naoki Masuda (University of Bristol) Random walks and diffusion on networks 58 / 61
  60. 60. Traditional node-based voter model 1 Select a node i uniformly at random (with probability 1/N). 2 Select an inneighbor of i (call it node j) with probability Aji /sin i . 3 Node i copies opinion of node j. FVM i = N∑ j=1 1 N Aij sin j FVM {i,j} + 1 N FVM ∅ +  1 − N∑ j=1 1 N Aij sin j − 1 N   FVM i • Note that FVM ∅ = 0 Consensus probability of the voter model in directed networks FVD i = N∑ j=1 Aij FVD j sin j Naoki Masuda (University of Bristol) Random walks and diffusion on networks 59 / 61
  61. 61. Consensus probability of the voter model in directed networks FVD i = N∑ j=1 Aij FVD j sin j For the DTRW and node-centric CTRW, p∗ = p∗ T i.e. p∗ i = N∑ j=1 p∗ j Aji sout j So, FVD i is the stationary density of the DTRW on the edge-reversed network. Therefore, FVD i = si ∑N ℓ=1 sℓ in undirected networks. Note that FVM i = sin i Fi Naoki Masuda (University of Bristol) Random walks and diffusion on networks 60 / 61
  62. 62. Resources Masuda, Porter & Lambiotte, Physics Reports, 716–717, 1–58 (2017). [open access] • Other topics covered: • Fourier and Laplace transforms • First-passage times • Fractal networks • Multilayer networks • Temporal networks • Discrete-choice models • More on community detection • Core-periphery structure • Diffusion maps • DeGroot model of opinion formation dynamics Naoki Masuda (University of Bristol) Random walks and diffusion on networks 61 / 61

×