Slides from our SIAM ALA 2018 mini-tutorial. May 6, 2018. Hong Kong.

- 1. Tensor eigenvectors and stochastic processes. Austin R. Benson · Cornell; David F. Gleich · Purdue. Act 1. 10:45-10:55am Overview. Act 2. 10:55-11:05am Motivating applications. Act 3. 11:05-11:30am Stochastic processes & Markov chains. Act 4. 11:40-12:10pm Spacey random walk stochastic process. Act 5. 12:10-12:30pm Theory of spacey random walks. Papers, slides, & code ⟶ bit.ly/tesp-web, bit.ly/tesp-code. SIAM ALA'18, Benson & Gleich.
- 2. Stochastic processes offer a new and exciting set of opportunities and challenges in tensor algorithms. After this is done, you should know a little bit about • Tensor eigenvectors • Z-eigenvectors • Irreducible tensors • Higher-order Markov chains • Spacey random walks • Vertex reinforced random walks • Dynamical systems for trajectories • Fitting spacey random walks to data • Multilinear PageRank models • Clustering tensors • And lots of open problems in this area! And where to look for more info: bit.ly/tesp-web
- 3. A quick overview of where we are going to go in this tutorial and some rough timing. Act 1. This overview • Basic notation and operations • The fundamental problems Act 2. Motivating applications • Compression • Diffusion imaging • Hardy-Weinberg genetics Act 3. Review of stochastic processes, Markov chains & higher-order chains • Limiting and stationary distributions • Irreducibility Act 4. Spacey RWs as stochastic processes • Pause for interpretations and thought • FAQ Act 5. Theory of spacey random walks • Limiting dists are tensor evecs • Dynamical systems & vertex reinforced RWs • (Non-) existence, uniqueness, convergence • Computation Act 6. Applications of spacey random walks • Pólya urns, sequence data, tensor clustering • New algorithm for computing tensor evecs
- 4. A note. We tried to be a friendly tutorial instead of trying to be comprehensive! See the extensive work by HK groups!
- 5. Fundamental notations and some helpful pictures. (Figures: tensor-vector product; tensor-collapse product.)
- 6. Summary of fundamental notations. We assume the tensor is symmetric or permuted so the last operations are all that’s needed.
- 7. The tensor Z-eigenvector problem has different properties than matrix eigenvectors.
- 8. There are many generalizations of eigen-problems to tensors. Their properties are very different. All eigenvectors have unit 2-norm, $\|x\|_2 = 1$. The H-eigenvalue spectrum is scale invariant.
- 9. There are many generalizations of eigen-problems to tensors. Their properties are very different. All eigenvectors have unit 2-norm, $\|x\|_2 = 1$. Z-eigenvectors are not scale invariant. H-eigenvectors are. There are even more types of eigen-probs! • D-eigenvalues • E-eigenvalues (complex Z-eigenvalues) • Generalized versions too… • Other normalizations! [Lim 05] For more information about these tensor eigenvectors and some of their fundamental properties, we recommend the following resources • Tensor Analysis: Spectral Theory and Special Tensors. Qi & Luo, 2017. • A survey on the spectral theory of nonnegative tensors. Chang, Qi, & Zhang, 2013.
- 10. Stochastic processes offer a new and exciting set of opportunities and challenges in tensor algorithms. Usually, the properties of these objects are explored algebraically or through polynomial interpretations. Our tutorial focuses on interpreting the tensor objects stochastically!
- 12. The best rank-1 approximation to a symmetric tensor is given by the principal eigenvector. [De Lathauwer 97; De Lathauwer-De Moor-Vandewalle 00; Kofidis-Regalia 01, 02] A is symmetric if the entries are the same under any permutation of the indices. In data mining and signal processing applications, we are often interested in the “best” rank-1 approximation. Notes. The first k tensor eigenvectors do not necessarily give the best rank-k approximation. In general, this problem is not even well-posed [de Silva-Lim 08]. Furthermore, the first eigenvector is not necessarily in the best rank-k “orthogonal approximation” from orthogonal vectors [Kolda 01, 03].
- 13. Quantum entanglement. A(i,j,k,…,l) are the normalized amplitudes of an m-partite pure state |ψ⟩; A is a nonnegative symmetric tensor. [equation omitted] is the geometric measure of entanglement [Wei-Goldbart 03; Hu-Qi-Zhang 16]. (Image: Michael S. Helfenbein, Yale University, https://www.eurekalert.org/pub_releases/2016-05/yu-ddo052616.php) Diffusion imaging. W is a symmetric, fourth-order kurtosis diffusion tensor. D is a symmetric, 3 x 3 matrix ⟶ both are measured from MRI data [Qi-Wang-Wu 08]. (Image: Paydar et al., Am. J. of Neuroradiology, 2014.)
- 14. Hardy-Weinberg equilibria of random mating models are tensor eigenvectors. The distribution of alleles (forms of a gene) in a population at time t is x. Start with an infinite population. 1. Every individual gets a random mate. 2. Mates of type j and k produce offspring of type i with probability P(i, j, k) and then die. Under Hardy-Weinberg equilibria (steady-state), x satisfies $x = Px^2$. Markovian binary trees. Entry-wise minimal solutions to $x = Bx^2 + a$ are extinction probabilities. [Bean-Kontoleon-Taylor 08; Bini-Meini-Poloni 11; Meini-Poloni 11, 17]
- 16. Markov chains, matrices, and eigenvectors have a long-standing relationship. “In the land of Oz they never have two nice days in a row. If they have a nice day, they are just as likely to have snow as rain the next day. If they have snow or rain, they have an even chance of having the same the next day. If there is a change from snow or rain, only half of the time is this change to a nice day.” [Kemeny-Snell 76] The transition matrix P is column-stochastic in this tutorial (since we are linear algebra people). Equations for the stationary distribution x: the vector x is an eigenvector of P, $Px = x$.
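A minimal sketch of the Oz example in Python with NumPy; the slides do not prescribe this code. The state ordering (rain, nice, snow) and all variable names are our assumptions. The matrix is column-stochastic, so `P[i, j]` is the probability of moving to state `i` from state `j`.

```python
import numpy as np

# Column-stochastic transition matrix for the Land of Oz chain.
# Assumed state order: 0 = rain, 1 = nice, 2 = snow.
# P[i, j] = Pr(tomorrow = i | today = j); every column sums to 1.
P = np.array([
    [0.50, 0.5, 0.25],  # to rain
    [0.25, 0.0, 0.25],  # to nice
    [0.25, 0.5, 0.50],  # to snow
])

# The stationary distribution is the eigenvector of P for eigenvalue 1,
# rescaled so that its entries sum to 1.
eigenvalues, eigenvectors = np.linalg.eig(P)
k = np.argmin(np.abs(eigenvalues - 1.0))
x = np.real(eigenvectors[:, k])
x = x / x.sum()

print(x)  # approximately [0.4, 0.2, 0.4]
```

Solving $Px = x$ by hand with the normalization that the entries sum to 1 gives the same vector, (2/5, 1/5, 2/5).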
- 17. Markov chains are a special case of a stochastic process. A stochastic process is a (possibly infinite) sequence of random variables (RVs) $Z_1, Z_2, \dots, Z_t, Z_{t+1}, \dots$ • $Z_t$ is a random variable. • This is a discrete-time stochastic process. Stochastic processes are models throughout applied math and life • The weather • The stock market • Natural language • Random walks on graphs • Pólya’s urn • Brownian motion
- 18. Stochastic processes are just sets of random variables (RV). Often they are infinite and coupled. Brownian motion. • My value at the next time goes up or down by a normal random variable. • $Z_0 = 0$, $Z_{t+1} = Z_t + N(0,1)$, where $N(0,1)$ is a normal random variable. • Z = cumsum(randn(100,1)) is a realization of a Brownian motion. • Often used to model stock prices.
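For readers following along in Python, a rough NumPy counterpart of the one-liner `cumsum(randn(100,1))` might look like the sketch below. The seed and the explicit `Z[0] = 0` entry are our additions.

```python
import numpy as np

rng = np.random.default_rng(0)        # seeded only for reproducibility
steps = rng.standard_normal(100)      # 100 independent N(0, 1) increments

# Z[0] = 0 and Z[t + 1] = Z[t] + steps[t], matching the recursion on the slide.
Z = np.concatenate(([0.0], np.cumsum(steps)))
```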
- 19. Stochastic processes are just sets of random variables (RV). Often they are infinite and coupled. Pólya urn. • Consider an urn with 1 purple and 1 green ball. Draw a ball at random, then put it back together with another ball of the same color. • $Z_0 = 1$, $Z_{t+1} = Z_t + B(1, Z_t / (t+2))$, where $B(1, Z_t / (t+2))$ is 1 with probability $Z_t / (t+2)$ and 0 otherwise.
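The urn recursion is easy to simulate. The sketch below uses only the Python standard library; the function name `polya_urn` and its arguments are hypothetical, and $Z_t$ is taken to be the number of purple balls.

```python
import random

def polya_urn(num_draws, seed=None):
    """Simulate the urn; return the fraction of purple balls after each draw."""
    rng = random.Random(seed)
    z = 1                                # Z_0 = 1 purple ball (plus 1 green ball)
    fractions = []
    for t in range(num_draws):
        # Before draw t the urn holds t + 2 balls, z of them purple.
        if rng.random() < z / (t + 2):
            z += 1                       # drew purple: add another purple ball
        fractions.append(z / (t + 3))    # urn now holds t + 3 balls
    return fractions

fractions = polya_urn(5000, seed=1)
```

Running this with different seeds shows the behavior described in the slides: each trajectory settles down, but different trajectories settle at different values.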
- 20. Stochastic processes are just sets of random variables (RV). Usually they are infinite and coupled somehow. Finite Markov chain & random walk. $Z_0$ = “state”, $\Pr(Z_{t+1} = i \mid Z_t = j) = P_{ij}$ • States are indexed by 1, …, n • The random walk on a graph is a special Markov chain where the next state is a uniformly random neighbor of the current node • Random walks on weighted graphs and finite Markov chains are isomorphic
- 21. The PageRank Markov chain and random walk is another well-known instance. Originally, the random surfer model • States are web pages, and links between pages make a directed graph. • The random surfer is a Markov chain: with prob α follow a random outlink, and with prob (1-α) go to a random page. “PageRank can be used for everything from analyzing the world's most important books to predicting traffic flow to ending sports arguments.” (Jessica Leber) See David F. Gleich, PageRank Beyond the Web, SIAM Review, Vol. 57, No. 3, pp. 321–363, 2015.
- 22. Higher-order Markov chains & random walks are useful models for many data problems. A second-order chain uses the last two states: $Z_{-1}$ = “state”, $Z_0$ = “another state”, $\Pr(Z_{t+1} = i \mid Z_t = j, Z_{t-1} = k) = P_{i,j,k}$. A tensor! Simple to understand, and they turn out to be better models than standard (first-order) chains in several application domains [Ching-Ng-Fung 08] • Traffic flow in airport networks [Rosvall+ 14] • Web browsing behavior [Pirolli-Pitkow 99; Chierichetti+ 12] • DNA sequences [Borodovsky-McIninch 93; Ching-Fung-Ng 04] • Non-backtracking walks in networks [Krzakala+ 13; Arrigo-Grindrod-Higham-Noferini 18] (Figure: Rosvall et al., Nature Comm., 2014.)
- 23. Higher-order Markov chains are actually first-order Markov chains in disguise. Start with a second-order Markov chain. Consider a new stochastic process on pairs of variables $(Z_t, Z_{t-1})$. Higher-order Markov chains are Markov chains on the product space.
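One way to make the product-space construction concrete is to build the transition matrix on pairs explicitly. This is a sketch under our own conventions: 0-based states, pair (current, previous) = (j, k) stored at index `j * n + k`, and a hypothetical helper name `pair_chain`.

```python
import numpy as np

def pair_chain(P):
    """First-order chain on pairs built from a second-order chain.

    P[i, j, k] = Pr(Z_{t+1} = i | Z_t = j, Z_{t-1} = k).
    The pair (j, k) can only move to a pair of the form (i, j).
    """
    n = P.shape[0]
    M = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            for k in range(n):
                M[i * n + j, j * n + k] = P[i, j, k]
    return M

# A random transition probability tensor, used only for illustration.
rng = np.random.default_rng(0)
P = rng.random((3, 3, 3))
P /= P.sum(axis=0, keepdims=True)   # each column P[:, j, k] sums to 1
M = pair_chain(P)                   # 9 x 9 column-stochastic matrix
```

The matrix `M` has $n^2$ rows and columns, which is why the stationary distribution on pairs is a matrix eigenvector problem of size $n^2$.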
- 24. Tensors are a natural representation of transition probabilities of higher-order Markov chains. Often called transition probability tensors. [Li-Ng-Ye 11, Li-Ng 14, Chu-Wu 14, Culp-Pearson-Zhang 17]
- 25. A note. Often we use the “second-order” case as a stand-in for the “general” higher-order case. Second-order Markov chain: $Z_{-1}$ = “state”, $Z_0$ = “another state”, $\Pr(Z_{t+1} = i \mid Z_t = j, Z_{t-1} = k) = P_{ijk}$. General higher-order Markov chain: $\Pr(Z_{t+1} = i \mid Z_t = j, Z_{t-1} = k, \dots, Z_{t-m+1} = l) = P(i, j, k, \dots, l)$, an (m+1)-mode tensor. Terminology • Second-order = 2 states of history ⟶ 3-mode tensor • mth-order = m states of history ⟶ (m+1)-mode tensor
- 26. We love stochastic processes because they give you an intuition and “physics” about what is happening.
- 27. A fundamental quantity for stochastic processes is the fraction of time spent at each state (limiting distribution). Consider a stochastic process that goes on infinitely, where each $Z_j$ takes a discrete value from a finite set. We want to know: how often are we in a particular state in the long run? (Cesàro limit) Other fundamental quantities include • Return times • Hitting times
- 28. Example limiting distribution with a random walk.
- 29. In the Pólya urn, the limiting distribution of ball draws always exists. It can converge to any value. This is the uniform distribution. We have 1000 samples of the trajectories.
- 30. For each realization, the sequence of random variables $Z_1, Z_2, \dots, Z_t, Z_{t+1}, \dots$ converges. It does not converge to a unique value, but rather can converge to any value.
- 31. Limiting distributions and stationary distributions for Markov chains have different properties. This point is often misunderstood. We want to make sure you get it right! Theorem. A finite Markov chain always has a limiting distribution. Theorem. The limiting distribution is unique if and only if the chain has only a single recurrent class. Theorem. A stationary distribution is $\lim_{t \to \infty} \text{Prob}[Z_t = i]$, i.e., $P^k e_{\text{start}}$ converges to $p^*$. This is unique if and only if a Markov chain has a single aperiodic, recurrent class.
- 32. States in a finite Markov chain are either recurrent or transient. Proof by picture. Recurrent: Prob[another visit] = 1. Transient: Prob[another visit] < 1. Markov chains ⟺ directed graphs. Directed graphs + Tarjan’s algorithm give the flow among strongly connected components (block triangular form). (Figure: block triangular form from Tarjan’s algorithm, with strongly connected components and recurrent states marked.)
- 33. The fundamental theorem of Markov chains is that any stochastic matrix is Cesàro summable: the Cesàro limit $P^* = \lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} P^k$ always exists! The limiting distribution given a start node is P*[:, start], because $P^k$ gives the k-step transition probabilities. Result. There is only one recurrent class iff P* is rank 1. Proof sketch. A recurrent class is a fully-stochastic sub-matrix. If there are >1 recurrent classes, then P* would be rank >1 because we could look at the sub-chain on each recurrent class; if P* is rank 1, then the distribution is the same regardless of where you start and so “no choice”.
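A small numerical illustration of Cesàro summability, assuming the standard definition of the Cesàro average as the mean of the first N powers of P. The helper name `cesaro_average` and the two toy chains are ours, not from the slides.

```python
import numpy as np

def cesaro_average(P, num_terms):
    """Average of I, P, P^2, ..., P^(num_terms - 1)."""
    total = np.zeros(P.shape, dtype=float)
    power = np.eye(P.shape[0])
    for _ in range(num_terms):
        total += power
        power = P @ power
    return total / num_terms

# Periodic two-state chain: P^k alternates between I and P and never converges,
# but the Cesàro average does.
flip = np.array([[0.0, 1.0], [1.0, 0.0]])
P_star_flip = cesaro_average(flip, 1000)

# Two disconnected states, hence two recurrent classes: the limit has rank 2.
P_star_split = cesaro_average(np.eye(2), 1000)
```

Here `P_star_flip` has every entry equal to 1/2 (rank 1, one recurrent class), while `P_star_split` is the identity (rank 2, two recurrent classes), consistent with the rank result above.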
- 34. Stationary distributions are much stronger than limiting distributions. A stationary distribution requires $P^k$ to converge to $P^*$. This requires a single aperiodic recurrent class or an irreducible & aperiodic matrix. (There are some funky cases if your chain is really two disconnected, independent chains.) We can always make a limiting distribution a stationary distribution. Turn P into a lazy Markov chain. This is automatically aperiodic and doesn’t change the recurrence.
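The slides do not spell out the lazy chain. A common convention, assumed here, is to stay put with probability 1/2 and otherwise follow P, which gives the matrix (I + P) / 2. The sketch shows that this removes the periodicity of a two-state flip chain; `lazy_chain` is a hypothetical helper name.

```python
import numpy as np

def lazy_chain(P):
    """Lazy version of P: stay put with probability 1/2, else follow P."""
    return 0.5 * (np.eye(P.shape[0]) + P)

flip = np.array([[0.0, 1.0], [1.0, 0.0]])      # periodic with period 2
lazy_power = np.linalg.matrix_power(lazy_chain(flip), 50)
even_power = np.linalg.matrix_power(flip, 50)  # equals the identity
odd_power = np.linalg.matrix_power(flip, 51)   # equals flip
```

The powers of `flip` keep alternating, while the powers of the lazy chain converge to the matrix with every entry 1/2.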
- 36. Remember! Tensors are a natural representation of transition probabilities of higher-order Markov chains. But the stationary distribution on pairs of states is still a matrix eigenvector... [Li-Ng 14] Making the “rank-1 approximation” $X_{j,k} = x_j x_k$ gives a formulation for tensor eigenvectors.
- 37. The vector x satisfying $Px^2 = x$ is nonnegative and sums to 1. Thus, x often gets called a limiting distribution. But all we have done is algebra! What is a natural stochastic process that has this limiting distribution?
- 38. Spacey random walks are stochastic processes whose limiting distribution(s) lead to such tensor eigenvectors. Stochastic process $Z_1, Z_2, \dots, Z_t, Z_{t+1}, \dots$ with states in {1, …, n}. 1. We are at state $Z_t = j$ and want to transition according to P. 2. However, upon arriving at state $Z_t = j$, we space out and forget about $Z_{t-1} = k$. 3. We still want to do our best, so we choose state $Y_t = r$ uniformly from our history $Z_1, Z_2, \dots, Z_t$ (technically, we initialize having visited each state once). 4. We then follow P pretending that $Z_{t-1} = r$. Spacey or space out? 走神 (“zoning out”) or 心不在焉 (“absent-minded”), according to David’s students.
- 39. Spacey random walks are stochastic processes whose limiting distributions are such tensor eigenvectors. $\text{Prob}(Z_{t+1} = i \mid Z_t = j, Y_t = r) = P(i, j, r)$. Key insight [Benson-Gleich-Lim 17]: limiting distributions of this process are tensor eigenvectors of P.
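The spacey random walk can be simulated directly from its definition. The sketch below handles the second-order (3-mode) case and assumes no zero columns in P; the function name, the random starting state, and the test tensor are our choices and are not taken from the tutorial's notebooks.

```python
import numpy as np

def spacey_random_walk(P, num_steps, seed=0):
    """Simulate a second-order spacey random walk; return the occupancy vector.

    P[i, j, r] = Prob(Z_{t+1} = i | Z_t = j, Y_t = r); every column P[:, j, r]
    is assumed to sum to 1 (no zero columns).
    """
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    counts = np.ones(n)              # initialize as if each state was visited once
    state = int(rng.integers(n))     # arbitrary starting state Z_1
    counts[state] += 1
    for _ in range(num_steps):
        r = rng.choice(n, p=counts / counts.sum())  # Y_t drawn from the history
        state = rng.choice(n, p=P[:, state, r])     # follow P pretending Z_{t-1} = r
        counts[state] += 1
    return counts / counts.sum()

# Sanity check on a tensor whose limit is known: every column equals v, so the
# walk draws independent samples from v, and v satisfies P v^2 = v.
v = np.array([0.2, 0.3, 0.5])
P = np.tile(v[:, None, None], (1, 3, 3))
w = spacey_random_walk(P, num_steps=20000)
```

For this deliberately simple tensor, the occupancy vector `w` ends up close to `v`, which is also a tensor eigenvector of `P` with eigenvalue 1.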
- 40. The main point. Limiting distributions of the spacey random walk stochastic process are tensor eigenvectors of P (we’ll prove this later).
- 41. We have to be careful with undefined transitions, which correspond to zero columns in the tensor. What is $\text{Prob}(Z_{t+1} = i \mid Z_t = j, Y_t = r)$ when P(:, j, r) = 0? A couple of options. 1. Pre-specify a distribution for when P(:, j, r) = 0. 2. Choose a random state from history ⟶ super SRW [Wu-Benson-Gleich 16]
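Option 1 can be sketched as a normalization step that substitutes a pre-specified distribution wherever a column is zero. The helper name `fill_zero_columns` and the uniform default are assumptions. The example tensor encodes a non-backtracking walk on a 3-node cycle, where the columns (j, j) are undefined.

```python
import numpy as np

def fill_zero_columns(T, fill=None):
    """Normalize columns T[:, j, r] to sum to 1, filling all-zero columns."""
    n = T.shape[0]
    if fill is None:
        fill = np.full(n, 1.0 / n)               # assumed default: uniform fill-in
    column_sums = T.sum(axis=0)                  # shape (n, n), one sum per (j, r)
    undefined = column_sums == 0
    P = T / np.where(undefined, 1.0, column_sums)  # guard against 0 / 0
    P[:, undefined] = fill[:, None]              # write the fill-in distribution
    return P

# Non-backtracking walk on a 3-cycle: from j, having come from k != j,
# the only allowed next node is the third one.
T = np.zeros((3, 3, 3))
for i in range(3):
    for j in range(3):
        for k in range(3):
            if len({i, j, k}) == 3:
                T[i, j, k] = 1.0
P = fill_zero_columns(T)
```

Passing a different `fill` vector changes the resulting tensor and, as the later slides note, can change the limiting distribution of the spacey random walk.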
- 42. (Figure: 3-node cycle graph on nodes 1, 2, 3; every transition has probability 1/2.) Limiting distribution of RW is [1/3, 1/3, 1/3]. What about the non-backtracking RW? NBRW disallows going back to where you came from and re-normalizes the probabilities. Lim. dist. is still [1/3, 1/3, 1/3], but for far different reasons. NBRW is a second-order Markov chain! What happens with the spacey random walk using the NBRW transition probabilities? Zero-column fill-in affects the limiting distribution and tensor evec. Follow along with Jupyter notebook! 3-node-cycle-walks.ipynb
- 43. FAQ. Please ask your own questions, too! 1. What’s a spacey random walk, again? A stochastic process defined by a transition probability tensor. 2. Is the spacey random walk a Markov chain? No, not in general; the transitions depend on the entire history. 3. Is the limiting distribution of a higher-order MC a tensor e-vec? No, not in general. 4. Why not just compute the stat. dist. of the higher-order MC? We are motivating tensor eigenvectors from a stochastic processes view. 5. What is an e-vec with e-val 1 of a transition probability tensor? It could be the limiting distribution of a spacey random walk.
- 44. (Figure: 7-node graph on nodes 1 to 7 with transition probabilities 1/2, 1/3, and 1/6.) What will happen with a… RW? NBRW? SRW (uniform fill-in)? SRW (RW stat. dist. fill-in)? SRW (NBRW stat. dist. fill-in)? SSRW? Follow along with Jupyter notebook! 7-node-line-walks.ipynb
- 45. (Not well defined) conjecture. When the transition probability tensor entries come from a non-backtracking random walk, the spacey random walk “interpolates” between the standard random walk and the non-backtracking one.
- 46. Pólya urns are spacey random walks. Draw a random ball. Put the ball back with another of the same color. This is a second-order spacey random walk with two states. Consequently, we know this one must converge because it’s a Pólya urn!
- 47. But didn’t Pólya urns have any limiting distribution? Does this mean that the tensor is interesting? Yes! Any stochastic vector is a tensor eigenvector and these are also limiting distributions.
- 49. Spacey random walks have a number of interesting properties as well as a number of open challenges! Properties 1. Limiting distributions of SRWs are tensor evecs with eval 1 (proof shortly!) 2. Asymptotically, SRWs are first-order Markov chains. 3. If there are just 2 states, then the SRW converges but possibly to one of several distributions. 4. If P is sufficiently “regularized”, then the SRW converges to a unique limiting distribution. Open problems • Existence? • Uniqueness? • Computation? Note. Spacey random walks are defined by a stochastic transition tensor, so these are all tensor questions!
- 50. An informal and intuitive proof that spacey random walks converge to tensor eigenvectors. Idea. Let $w_T$ be the fraction of time spent in each state after $T \gg 1$ steps. Consider an additional L steps, $T \gg L \gg 1$. Then $w_T \approx w_{T+L}$ if we converge. Suppose $M(w_T) = P[w_T]^{m-2}$ has a unique stationary distribution, $x_T$. If the SRW converges, then $x_T = w_{T+L}$, otherwise $w_{T+L}$ would be different. Thus, $x_T = P[w_T]^{m-2} x_T \approx P[w_{T+L}]^{m-2} x_T = P[x_T]^{m-2} x_T = P x_T^{m-1}$.
- 51. To formalize convergence, we need the theory of generalized vertex reinforced random walks (GVRRW). A stochastic process $X_1, \dots, X_t, \dots$ is a GVRRW if $\text{Prob}(X_{T+1} = i \mid F_T) = [M(w_T)]_{i, X_T}$, where $w_T$ is the fraction of time in each state, $F_T$ is the sigma algebra generated by $X_1, \dots, X_T$, and $M(w_T)$ is a column stochastic matrix that depends on $w_T$. [Diaconis 88; Pemantle 92, 07; Benaïm 97] The classic VRRW is the following • Given a graph, randomly move to a neighbor with probability proportional to how often we’ve visited the neighbor!
- 52. To formalize convergence, we need the theory of generalized vertex reinforced random walks (GVRRW). A stochastic process $X_1, \dots, X_t, \dots$ is a GVRRW if $\text{Prob}(X_{T+1} = i \mid F_T) = [M(w_T)]_{i, X_T}$, where $w_T$ is the fraction of time in each state, $F_T$ is the sigma algebra generated by $X_1, \dots, X_T$, and $M(w_T)$ is a column stochastic matrix that depends on $w_T$. Spacey random walks are GVRRWs with the map M: $M(w_T) = P[w_T]^{m-2}$. [Diaconis 88; Pemantle 92, 07; Benaïm 97]
- 53. To formalize convergence, we need the theory of generalized vertex reinforced random walks (GVRRW). Theorem [Benaïm 97], heavily paraphrased. In a discrete GVRRW, the long-term behavior of the occupancy distribution $w_T$ follows the long-term behavior of the dynamical system $\frac{dx}{dt} = \pi(M(x)) - x$, where $\pi$ maps a column stochastic matrix to its Perron vector. To study convergence properties of the SRW, we just need to study the dynamical system for our map M: $M(w_T) = P[w_T]^{m-2}$.
- 54. More on how stationary distributions of GVRRWs correspond to ODEs. Theorem [Benaïm 97], less paraphrased. The sequence of empirical observation probabilities $c_t$ is an asymptotic pseudo-trajectory for the dynamical system. Thus, convergence of the ODE to a fixed point is equivalent to stationary distributions of the VRRW. • M must always have a unique stationary distribution! • The map to M must be very continuous • Asymptotic pseudo-trajectories satisfy [equation omitted]
- 55. Spacey random walks converge to tensor eigenvectors (a more formal proof). Suppose that the SRW converges. Then we converge to a stationary point. Remember the informal proof. All we’ve done is just formalize this by using the dynamical system to map behavior!
- 56. Corollary. Asymptotically, GVRRWs (including spacey random walks) act as first-order Markov chains. Suppose that the SRW converges to x. Then the transitions asymptotically follow the first-order Markov chain with transition matrix $P[x]^{m-2}$.
- 57. Relationship between spacey random walk convergence and existence of tensor eigenvectors. SRW converges ⇒ existence of tensor e-vec of P with e-val 1. SRW converges ⇍ existence of tensor e-vec of P with e-val 1. The map $f(x) = Px^{m-1}$ satisfies the conditions of Brouwer’s fixed point theorem, so there always exists an x such that $Px^{m-1} = x$. Furthermore, λ = 1 is the largest eigenvalue. [Li-Ng 14] There exists a P for which the SRW does not converge [Peterson 18].
- 58. General Open Question. Under what conditions does the spacey random walk converge? Peterson’s Conjecture. If P is a 3-mode tensor, then the spacey random walk converges. Broader conjecture. There is always a (generalized) SRW that converges to a tensor evec. What we have been able to show so far. 1. If there are just 2 states, then the SRW converges. 2. If P is sufficiently “regularized”, then the SRW converges.
- 59. Almost every 2-state spacey random walk converges. [Benson-Gleich-Lim 17] Special case of 2 x 2 x 2 system...
- 60. Almost every 2-state spacey random walk converges. Theorem [Benson-Gleich-Lim 17]. The dynamics of almost every 2 x 2 x … x 2 spacey random walk (of any order) converge to a stable equilibrium point. (Figure: example dynamics with two stable equilibria and one unstable equilibrium.) Things to note… 1. Multiple stable points in the above example; the SRW could converge to any. 2. Randomness of the SRW is “baked in” to the initial condition of the system.
- 61. A sufficiently regularized spacey random walk converges. Consider a modified “spacey random surfer” model. At each step, 1. with probability α, follow the SRW model P. 2. with probability 1 - α, teleport to a random node. Equivalent to an SRW on S = αP + (1 – α)J, where J is the normalized ones tensor. Theorem. If α < 1 / (m – 1), 1. the SRW on S converges [Benson-Gleich-Lim 17] 2. there is a unique tensor e-vec x satisfying $Sx^{m-1} = x$ [Gleich-Lim-Yu 15]
- 62. A sufficiently regularized spacey random walk converges. The higher-order power method is an algorithm to compute the dominant tensor eigenvector of a tensor T: $y_{k+1} = T x_k^{m-1}$, $x_{k+1} = y_{k+1} / \|y_{k+1}\|$. Theorem [Gleich-Lim-Yu 15]. If α < 1 / (m – 1), the power method on S = αP + (1 – α)J converges to the unique vector satisfying $Sx^{m-1} = x$. Conjecture. If the higher-order power method on P always converges, then the spacey random walk on P always converges.
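A sketch of the higher-order power method on a regularized tensor for the 3-mode case (m = 3), where the theorem's condition reads α < 1/2. The random tensor, α = 0.4, the helper names, and the use of the 1-norm for normalization (natural for stochastic vectors) are our choices for illustration.

```python
import numpy as np

def apply_tensor(T, x):
    """Compute T x^2 for a 3-mode tensor: sum_{j,k} T[i, j, k] x[j] x[k]."""
    return np.einsum("ijk,j,k->i", T, x, x)

def power_method(T, num_iters=200):
    """Higher-order power method started from the uniform distribution."""
    n = T.shape[0]
    x = np.full(n, 1.0 / n)
    for _ in range(num_iters):
        y = apply_tensor(T, x)
        x = y / y.sum()              # 1-norm normalization keeps x stochastic
    return x

rng = np.random.default_rng(0)
n, alpha = 4, 0.4                    # alpha < 1 / (m - 1) = 1/2
P = rng.random((n, n, n))
P /= P.sum(axis=0, keepdims=True)    # random transition probability tensor
J = np.full((n, n, n), 1.0 / n)      # normalized ones tensor
S = alpha * P + (1 - alpha) * J

x = power_method(S)
residual = np.abs(apply_tensor(S, x) - x).sum()
```

With α below the threshold the iteration contracts, so `residual` drops to roundoff level well within 200 iterations.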
- 63. Conjecture. Determining if an SRW converges is PPAD-complete. Computing a limiting distribution of an SRW is PPAD-complete. Why? In general, it is NP-hard to determine whether a tensor has an evec for a given eval λ [Hillar-Lim 13]. We know an evec exists for a transition probability tensor P with eval λ = 1 [Li-Ng 14]. However, there is no obvious way to compute it. Similar to other PPAD-complete problems (e.g., Nash equilibria).
- 64. General Open Question. What is the best way to compute tensor eigenvectors? • Higher-order power method [Kofidis-Regalia 00, 01; De Lathauwer-De Moor-Vandewalle 00] • Shifted higher-order power method [Kolda-Mayo 11] • SDP hierarchies [Cui-Dai-Nie 14; Nie-Wang 14; Nie-Zhang 18] • Perron iteration [Meini-Poloni 11, 17] For SRWs, the dynamical system offers another way. Numerically integrate the dynamical system! [Benson-Gleich-Lim 17; Benson-Gleich 18] Equivalent to Perron iteration with forward Euler & unit time-step.
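A sketch of the "integrate the dynamical system" idea for the 3-mode case, using forward Euler on $dx/dt = \pi(M(x)) - x$ with $M(x) = P[x]$ and $\pi$ the Perron vector. The step size 0.5, the helper names, and the regularized test tensor are our assumptions; this is not the authors' implementation. Setting `step=1.0` gives the Perron iteration mentioned above.

```python
import numpy as np

def collapse(P, x):
    """Matrix M(x) = P[x] with M[i, j] = sum_k P[i, j, k] * x[k]."""
    return np.einsum("ijk,k->ij", P, x)

def perron_vector(M):
    """Stationary distribution of a column-stochastic matrix M."""
    eigenvalues, eigenvectors = np.linalg.eig(M)
    k = np.argmin(np.abs(eigenvalues - 1.0))
    v = np.real(eigenvectors[:, k])
    return v / v.sum()

def integrate_srw(P, step=0.5, num_steps=300):
    """Forward Euler for dx/dt = perron_vector(P[x]) - x from the uniform vector."""
    n = P.shape[0]
    x = np.full(n, 1.0 / n)
    for _ in range(num_steps):
        x = x + step * (perron_vector(collapse(P, x)) - x)
    return x

# Regularized test tensor S = alpha * P + (1 - alpha) * J with alpha < 1/2.
rng = np.random.default_rng(0)
n, alpha = 4, 0.4
P = rng.random((n, n, n))
P /= P.sum(axis=0, keepdims=True)
S = alpha * P + (1 - alpha) * np.full((n, n, n), 1.0 / n)

x = integrate_srw(S)
residual = np.abs(np.einsum("ijk,j,k->i", S, x, x) - x).sum()
```

A fixed point of the iteration satisfies $M(x)x = x$, which is exactly $Sx^2 = x$, so a small `residual` indicates a tensor eigenvector with eigenvalue 1.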
- 66. Applications of spacey random walks. 1. Pólya urns are SRWs. 2. SRWs model taxi sequence data. 3. Asymptotics of SRWs for data clustering. 4. Insight for new algorithms to compute tensor eigenvectors. Stochastic processes offer a new and exciting set of opportunities and challenges in tensor algorithms. (Us, Slide 10)
- 67. (Review) Pólya urns are spacey random walks. Draw a random ball. Put the ball back with another of the same color. This is a second-order spacey random walk with two states. We know it converges by our theory (every two-state process converges).
- 68. Generalized Pólya urns are spacey random walks. Draw m random balls $b_1, b_2, \dots, b_m$ with replacement. Put in a new green ball with probability $q(b_1, b_2, \dots, b_m)$. This is an (m-1)-order spacey random walk with two states. We know it converges by our theory (every two-state process converges).
- 70. Spacey random walks model sequence data. Maximum likelihood estimation problem (most likely P for the SRW model and the observed data): convex objective, linear constraints. [Benson-Gleich-Lim 17] (Image: nyc.gov)
- 71. Spacey random walks model sequence data. What is the SRW model saying for this data? Model people by locations. • A passenger with location k is drawn at random. • The taxi picks up the passenger at location j. • The taxi drives the passenger to location i with probability $P_{i,j,k}$. Approximate the location dist. by history ⟶ spacey random walk. (Image: nyc.gov)
- 72. Spacey random walks model sequence data. • One year of 1000 taxi trajectories in NYC. • States are neighborhoods in Manhattan. • Compute MLE P for SRW model with 800 taxis. • Evaluate RMSE on test data of 200 taxis. RMSE = 1 – Prob[sequence generated by process]
- 73. Spacey random walks are identifiable via this procedure. Two difficult test tensors from [Gleich-Lim-Yu 15]. 1. Generate 80 sequences with 200 transitions each from the SRW model. Learn P for 2nd-order SRW, R for 2nd-order MC, P for 1st-order MC. 2. Generate 20 sequences with 200 transitions each and evaluate RMSE = 1 – Prob[sequence generated by process].
- 75. Co-clustering nonnegative tensor data. Joint work with Tao Wu, Purdue. Spacey random walks that converge are asymptotically Markov chains. • The occupancy vector $w_T$ converges to w ⟶ dynamics converge to $P[w]^{m-2}$. This connects to spectral clustering on graphs. • Eigenvectors of the normalized Laplacian of a graph are eigenvectors of the random walk matrix. • Instead, we compute a stationary distribution w and use eigenvectors of the matrix $P[w]^{m-2}$. [Wu-Benson-Gleich 16]
- 76. We possibly symmetrize and normalize nonnegative data to get a transition probability tensor. If the data is a symmetric cube, indexed by [1, 2, …, n] x [1, 2, …, n] x [1, 2, …, n], we can normalize it to get a transition tensor P. If the data is a brick, indexed by $[i_1, i_2, \dots, i_{n_1}] \times [j_1, j_2, \dots, j_{n_2}] \times [k_1, k_2, \dots, k_{n_3}]$, we symmetrize before normalization [Ragnarsson-Van Loan 13]. Generalization of [equation omitted].
- 77. 77 Input. Nonnegative brick of data. 1. Symmetrize the brick (if necessary). 2. Normalize to a stochastic tensor. 3. Estimate the stationary distribution of the spacey random walk (or super-spacey random walk for sparse data). 4. Form the asymptotic Markov model. 5. Bisect indices using eigenvector of the asymptotic Markov model. 6. Recurse. Output. Partition of indices. The clustering methodology. 1 3 2 T SIAM ALA'18Benson & Gleich
- 78. Clustering airline-airport-airport networks. T_{i,j,k} = #(flights between airport i and airport j on airline k). Unclustered: no apparent structure. Clustered: diagonal structure evident.
- 79. Clustering n-grams in natural language. T_{i,j,k} = #(consecutive co-occurrences of words i, j, k in corpus). T_{i,j,k,l} = #(consecutive co-occurrences of words i, j, k, l in corpus). Data from the Corpus of Contemporary American English (COCA), www.ngrams.info. “Best” clusters • pronouns & articles (the, we, he, …) • prepositions & link verbs (in, of, as, to, …) Fun 3-gram clusters • {cheese, cream, sour, low-fat, frosting, nonfat, fat-free} • {bag, plastic, garbage, grocery, trash, freezer} Fun 4-gram cluster • {german, chancellor, angela, merkel, gerhard, schroeder, helmut, kohl}
- 80. Applications of spacey random walks. 1. Pólya urns are SRWs. 2. SRWs model taxi sequence data. 3. Asymptotics of SRWs for data clustering. 4. Insight for new algorithms to compute tensor eigenvectors. Stochastic processes offer a new and exciting set of opportunities and challenges in tensor algorithms. (Us, Slide 10)
- 81. New framework for computing tensor evecs. [Benson-Gleich 18] Many tensor eigenvector computation algorithms are algebraic and look like generalizations of the matrix power method, shifted iteration, or Newton iteration, for example the higher-order power method. [De Lathauwer-De Moor-Vandewalle 00, Regalia-Kofidis 00, Li-Ng 14; Chu-Wu 14; Kolda-Mayo 11, 14] These have many known convergence issues! Our stochastic viewpoint gives a new approach: we numerically integrate the dynamical system. 1. The dynamical system is empirically more robust for the principal evec of transition probability tensors. 2. Can generalize for symmetric tensors & any evec.
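For comparison, a minimal sketch of the unshifted higher-order power method for a cubic 3-mode tensor. The function names `apply3` and `hopm3` and the 2-norm normalization are assumptions for illustration; the slide's own formula did not survive as text.

```julia
using LinearAlgebra

# Contract a 3-mode tensor with x in modes 2 and 3:
# y[i] = sum_{j,k} A[i, j, k] * x[j] * x[k].
function apply3(A, x)
    n = size(A, 1)
    y = zeros(n)
    for k in 1:n, j in 1:n, i in 1:n
        y[i] += A[i, j, k] * x[j] * x[k]
    end
    return y
end

# Higher-order power iteration: x <- A x^2 / ||A x^2||_2.
function hopm3(A, x0; maxit=100)
    x = normalize(x0)
    for _ in 1:maxit
        x = normalize(apply3(A, x))
    end
    # Return the eigenvector estimate and the value x' * (A x^2).
    return x, dot(x, apply3(A, x))
end
```

On a rank-one symmetric tensor the iteration lands on the generating vector after a single step; the convergence issues mentioned on the slide show up on harder tensors.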
- 82. New framework for computing tensor evecs. [Benson-Gleich 18] Let Λ be a prescribed map from a matrix to one of its eigenvectors, e.g., Λ(M) = eigenvector of M for the kth smallest algebraic eigenvalue, or Λ(M) = eigenvector of M for the largest magnitude eigenvalue. Suppose the dynamical system converges. Then the limit point is a tensor eigenvector. New computational framework. 1. Choose a map Λ. 2. Numerically integrate the dynamical system.
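The equations on this slide were images. A hedged reconstruction, inferred from the Julia code on the following slide (where the ODE right-hand side is an eigenvector of the contracted tensor minus x), for a tensor A with m modes. The symbol A and the bracket notation A[x]^{m-2} are assumptions borrowed from the P[w]^{m-2} notation used earlier in the deck:

```latex
\frac{dx}{dt} = \Lambda\bigl(A[x]^{m-2}\bigr) - x,
\qquad
\frac{dx}{dt} = 0
\;\Longrightarrow\; \Lambda\bigl(A[x]^{m-2}\bigr) = x
\;\Longrightarrow\; A[x]^{m-2}\, x = \lambda x
\;\Longleftrightarrow\; A x^{m-1} = \lambda x .
```

In words: at a stationary point, x is an eigenvector of its own matrix A[x]^{m-2}, which is the tensor eigenvector equation.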
- 83. The algorithm is evolving this system! The algorithm has simple Julia code.

  ```julia
  using LinearAlgebra

  # Contract a 3-mode tensor A with x along mode 3: M = sum_i A[:, :, i] * x[i].
  function mult3(A, x)
      dims = size(A)
      M = zeros(dims[1], dims[2])
      for i = 1:dims[3]
          M += A[:, :, i] * x[i]
      end
      return M
  end

  function dynsys_tensor_eigenvector(A; maxit=100, k=1, h=0.5)
      x = randn(size(A, 1)); normalize!(x)
      # This is the ODE function
      F = function(x)
          M = mult3(A, x)
          d, V = eigen(M)  # we use Julia's ordering (*); `eigen` is `eig` in Julia 0.6
          v = V[:, k]      # pick out the kth eigenvector
          if real(v[1]) >= 0; v *= -1.0; end  # canonicalize
          return real(v) - x
      end
      # evolve the ODE via Forward Euler
      for iter = 1:maxit; x = x + h * F(x); end
      return x, x' * mult3(A, x) * x
  end
  ```
- 84. New framework for computing tensor evecs. Empirically, we can compute all the tensor eigenpairs with this approach (including unstable ones that the higher-order power method cannot compute). The tensor is Example 3.6 from [Kolda-Mayo 11].
- 85. Why does this work? (Hand-wavy version) The eigenvector map shifts the spectrum around unstable eigenvectors. Figure: trajectory of the dynamical system for Example 3.6 from Kolda and Mayo [2011]. Color is the projection onto the first eigenvector of the Jacobian, which is +1 at stationary points. Numerical integration with forward Euler.
- 86. There are tons of open questions with this approach that we could use help with! • Can the dynamical system cycle? Yes, but what problems produce this behavior? • Which eigenvector (k) to use? It really matters. • How to numerically integrate? Seems like ODE45 does the trick! • SSHOPM -> Dyn Sys? If SSHOPM converges, can you show the dyn. sys. will converge for some k? • Can you show there are inaccessible vecs? No clue right now!
- 87. New framework for computing tensor evecs. • SDP methods can compute all eigenpairs but have scalability issues [Cui-Dai-Nie 14, Nie-Wang 14, Nie-Zhang 17]. • Empirically, we can compute the same eigenvectors while maintaining scalability. The tensor is Example 4.11 from [Cui-Dai-Nie 14].
- 88. Act 1. This overview • Basic notation and operations • The fundamental problems Act 2. Motivating applications • Compression • Diffusion imaging • Hardy-Weinberg genetics Act 3. Review of stochastic processes, Markov chains & higher-order chains • Limiting and stationary distributions • Irreducibility Act 4. Spacey RWs as stochastic processes • Pause for interpretations and thought • FAQ Act 5. Theory of spacey random walks • Limiting dists are tensor evecs • Dynamical systems & vertex RWs • (Non-) existence, uniqueness, • Computation Act 6. Applications of spacey random walks • Pólya urns, sequence data, tensor clustering • New algorithm for computing tensor
- 89. Stochastic processes offer a new and exciting set of opportunities and challenges in tensor algorithms. Usually, the properties of these objects are explored algebraically or through polynomial interpretations. Our tutorial focused on interpreting the tensor objects stochastically!
- 90. Stochastic processes offer a new and exciting set of opportunities and challenges in tensor algorithms. Hopefully, you should know a little bit about… • Tensor eigenvectors • Z-eigenvectors • Irreducible tensors • Higher-order Markov chains • Spacey random walks • Vertex reinforced random walks • Dynamical systems for trajectories • Fitting spacey random walks to data • Multilinear PageRank models • Clustering tensors • And lots of open problems in this area! And where to look for more info! www.cs.cornell.edu/~arb/tesp
- 91. Open problems abound! General Open Questions. 1. What is the relationship between RWs, non-backtracking RWs, and SRWs? 2. Under what conditions does the spacey random walk converge? 3. What is the computational complexity surrounding SRWs? 4. How well does the dynamical system work for computing tensor evecs? 5. How can we use stochastic or dynamical systems views for H-eigenpairs? 6. More data mining applications? Conjectures. 1. If P is a 3-mode tensor, then the spacey random walk converges. 2. If the HOPM on P always converges, the SRW on P always converges. 3. Determining if an SRW converges is PPAD-complete. 4. Computing a limiting distribution of an SRW is PPAD-complete.
- 92. Tensor Eigenvectors and Stochastic Processes. Thanks for your attention! Today’s information & more: www.cs.cornell.edu/~arb/tesp. Austin R. Benson, http://cs.cornell.edu/~arb, @austinbenson, arb@cs.cornell.edu. David F. Gleich, https://www.cs.purdue.edu/homes/dgleich/, @dgleich, dgleich@purdue.edu.