
Analytic tools for higher-order data


Slides from my talk at the University of Chicago Scientific and Statistical Computing Seminar on January 12, 2017.


  1. Analytic tools for higher-order data. Austin Benson, Stanford University. Univ. of Chicago, Jan. 12, 2017. [Title slide collage of figures from the talk: three-node directed motifs, the motif conductance framework of Benson, Gleich, & Leskovec (Science, 2016), fractions of blocking vs. non-blocking temporal motif counts across communication datasets, and the Enron e-mail time series.]
  2. My research: computational tools for bigger and better data analysis. 1. New analytical techniques for data: higher-order clustering in networks (Science 16, SDM 15); higher-order structures in temporal graphs (WSDM 17); models for inherently higher-order data (SIAM Review 17, NIPS 16, more in prep.); novel null models for graph data (KDD 14). 2. Faster algorithms for larger problems: distributed matrix algorithms for tall-and-skinny data (NIPS 14, BigData 13); practical fast matrix multiplication (SIAM J. Matrix Analysis & App. 16, PPoPP 15); structured matrix computations (SIAM J. Scientific Computing 14). 3. Modeling user behavior on the Web: discrete choice models (WWW 16, more in prep.); repeat behavior (WWW 16). … all for downstream applications: ecology, biology, and neuroscience (Science 16, NIPS 14); transportation systems (SIAM Review 17, Science 16, NIPS 16); social networks (WSDM 17, WWW 16, KDD 14).
  3. Complex systems are modeled by networks; my goal is to analyze the model to gain insights into the system. Brains: nodes are neurons, edges are synapses. Social interactions: nodes are people, edges are friendships. Electrical grid: nodes are power plants, edges are transmission lines. Currency: nodes are accounts, edges are exchanges. [Image credit: Tim Meko, Washington Post]
  4. Some common high-level analyses. 1. Clustering/partitioning/community detection: finding groups of nodes that form functional modules. 2. Ranking: finding what things are important (e.g., PageRank and its variants). 3. Spreading and traversing: how viruses, social influences, etc. spread; random walks; routing. 4. Evolution: what new links will form? Graphs are defined by nodes and edges, so we usually think about our analysis, models, and algorithms in terms of nodes and edges. [Image credit: https://www1.udel.edu]
  5. A fundamental network analysis is finding clusters, modules, communities, groups. Real-world networks have modular organization, and we want to automatically find the modules in the system. Main idea: find groups of nodes with high internal edge density and low external edge density. [Examples shown: a co-author network; a brain network, de Reus et al., RSTB, 2014.]
  6. Background on community detection and conductance. Conductance is one of the most important cluster quality scores [Schaeffer07], used in Markov chain theory, spectral clustering, bioinformatics, vision, etc. The conductance of a set of vertices S is the ratio of edges leaving the set to the total edge end points in the set: φ(S) = cut(S) / min(vol(S), vol(S̄)). Small conductance ⇔ good cluster. Example: cut(S) = 7, |S| = 15, vol(S) = 85; cut(S̄) = 7, |S̄| = 20, vol(S̄) = 151; so φ(S) = 7/85.
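     To make the definition concrete, here is a minimal sketch (my illustration, not from the talk) of computing φ(S) for an undirected graph given as a list of edge pairs; the helper name conductance is hypothetical:

         def conductance(edges, S):
             """phi(S) = cut(S) / min(vol(S), vol(S-bar)) for an undirected graph."""
             cut = sum(1 for u, v in edges if (u in S) != (v in S))   # edges leaving S
             vol_S = sum((u in S) + (v in S) for u, v in edges)       # edge end points in S
             vol_rest = 2 * len(edges) - vol_S                        # each edge has 2 end points
             return cut / min(vol_S, vol_rest)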
  7. Thinking beyond nodes and edges. Nodes and edges are a bit simplistic; there is abundant evidence that small subgraph patterns, or network motifs, drive complex systems. "Contact neighborhoods" drive social influence (Ugander et al., PNAS, 2012). Signed "feed-forward loops" are delay units in genetic transcription (Mangan et al., J. of Mol. Bio., 2003; Alon, Nature Reviews Genetics, 2007). Also see Milo et al., 2002; Sporns & Kötter, 2004; Milenković & Pržulj, 2008; Yaveroğlu et al., 2014.
  8. Nodes and edges are not the fundamental units of complex networks. So why should we look for structure in terms of them? We shouldn't. And we gain new insights into data with higher-order analysis techniques.
  9. Higher-order clustering: finding groups of nodes based on motifs. Different motifs give different clusters, and we can formalize this by generalizing conductance for motifs! Benson, Gleich, & Leskovec, Science, 2016.
  10. In practice, motifs organize real-world networks amazingly well, recovering, for example, aquatic layers in food webs (micronutrient sources; pelagic fishes and benthic prey; benthic macroinvertebrates; benthic fishes). We don't know how to find this structure with edge partitioning. [Image credit: http://marinebio.org/oceans/marine-zones/]
  11. Motif-based conductance: we need new notions of cut and volume. Edges: φ(S) = cut(S) / min(vol(S), vol(S̄)), where cut(S) = #(edges cut) and vol(S) = #(edge end points in S). Motifs: φ_M(S) = cut_M(S) / min(vol_M(S), vol_M(S̄)), where cut_M(S) = #(motifs cut) and vol_M(S) = #(motif end points in S). Benson, Gleich, & Leskovec, Science, 2016.
  12. Motif-based conductance, worked example on a 10-node graph: for the motif and set S shown, φ_M(S) = (motifs cut) / (motif volume) = 1/8. Benson, Gleich, & Leskovec, Science, 2016.
  13. Higher-order clustering. Given a motif M and a graph G, we want to find a set of nodes S that minimizes motif conductance. Unfortunately, this is NP-hard… Solution: motif-based spectral clustering. 1. Input graph G and motif M. 2. Form a new weighted graph W(M). 3. Apply spectral clustering on W(M) → output clusters. Theorem: the resulting clusters obtain near-optimal motif conductance (motif Cheeger inequality). A code sketch of all three steps follows slide 18 below. Benson, Gleich, & Leskovec, Science, 2016.
  14. Motif-based spectral clustering. W(M)_ij counts co-occurrences of the motif pattern between nodes i and j. [Slide shows the 10-node example graph and its weighted matrix W(M).] Benson, Gleich, & Leskovec, Science, 2016.
  15. Motif-based spectral clustering. W(M)_ij counts co-occurrences of the motif pattern between nodes i and j. Key insight: spectral clustering on the weighted graph W(M) finds clusters of low motif conductance; in the example on this slide, φ_M(S) = (motifs cut) / (motif volume) = 1/10. Benson, Gleich, & Leskovec, Science, 2016.
  16. Motif-based spectral clustering, step 1: form the weighted graph W(M). [Slide shows the 10-node example graph.]
  17. Motif-based spectral clustering, step 2: compute the Fiedler vector f(M) associated with λ₂ of the normalized Laplacian of W(M), in roughly O(# edges) time (using ideas from [Mihail89, Chung92]). With diagonal degree matrix D = diag(W(M) e): L(M) = D^(−1/2) (D − W(M)) D^(−1/2); solve L(M) z = λ₂ z; set f(M) = D^(−1/2) z. [Slide plots the entries of f(M) for the 10-node example.]
  18. Motif-based spectral clustering, step 3: sort the nodes by their values in f(M): f₁, f₂, …, fₙ. Let Sᵣ = {f₁, …, fᵣ} and compute the motif conductance of each Sᵣ, the "motif conductance sweep cut" (using tools from [Mihail89, Chung92]). [Slide shows the sweep over the 10-node example.]
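     Putting steps 1–3 together, here is a hedged end-to-end sketch for the undirected triangle motif; this is my own code under simplifying assumptions (undirected connected graph, dense matrices, and the weighted conductance of W(M) used as the sweep score), not the released implementation at http://snap.stanford.edu/higher-order, which handles directed motifs at scale:

         import numpy as np
         import networkx as nx

         def motif_spectral_sweep(G):
             """Steps 1-3 for the triangle motif on an undirected nx.Graph (sketch)."""
             nodes = list(G.nodes())
             idx = {u: i for i, u in enumerate(nodes)}
             n = len(nodes)

             # Step 1: W(M)_ij = number of triangles containing the edge (i, j).
             W = np.zeros((n, n))
             for u, v in G.edges():
                 t = len(set(G[u]) & set(G[v]))        # common neighbors close triangles
                 W[idx[u], idx[v]] = W[idx[v], idx[u]] = t

             # Step 2: Fiedler vector of the normalized Laplacian of W(M).
             d = W.sum(axis=1)
             dinv = 1.0 / np.sqrt(np.maximum(d, 1e-12))
             L = np.eye(n) - (dinv[:, None] * W) * dinv[None, :]
             _, vecs = np.linalg.eigh(L)
             f = dinv * vecs[:, 1]                     # f(M) = D^(-1/2) z, 2nd eigenvector

             # Step 3: sweep cut, scanning prefixes of nodes sorted by f(M) and
             # tracking the weighted conductance of W(M) incrementally.
             order = np.argsort(f)
             in_S = np.zeros(n, dtype=bool)
             cut = vol_S = 0.0
             vol_total = d.sum()
             best_phi, best_r = np.inf, 1
             for r, i in enumerate(order[:-1], start=1):
                 in_S[i] = True
                 vol_S += d[i]
                 cut += d[i] - 2.0 * W[i, in_S].sum()  # boundary edges become internal
                 phi = cut / min(vol_S, vol_total - vol_S)
                 if phi < best_phi:
                     best_phi, best_r = phi, r
             return {nodes[j] for j in order[:best_r]}, best_phi

     For instance, motif_spectral_sweep(nx.karate_club_graph()) returns the best sweep set and its score for that small graph.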
  19. Motif Cheeger inequality. Theorem: if the motif has three nodes, then the sweep procedure on the weighted graph finds a set S of nodes for which φ_M(S) ≤ 4 √(φ*_M), where φ*_M is the optimal value over all sets of nodes. (Theorem: for motifs on 4+ nodes, we use a slightly altered definition of conductance.) Key proof step: with M(G) = {instances of M in G}, cut_M(S, G) = Σ_{ {i,j,k} ∈ M(G) } Indicator[x_i, x_j, x_k not all the same] = Σ_{ {i,j,k} ∈ M(G) } (1/4)(x_i² + x_j² + x_k² − x_i x_j − x_j x_k − x_i x_k), a quadratic in x. Benson, Gleich, & Leskovec, Science, 2016.
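     A quick numeric check of that proof step (my own verification script; it encodes set membership as x_i ∈ {−1, +1}, which is one standard convention and an assumption on my part):

         from itertools import product

         for xi, xj, xk in product((-1, 1), repeat=3):
             lhs = int(not (xi == xj == xk))          # indicator: not all the same
             rhs = (xi**2 + xj**2 + xk**2 - xi*xj - xj*xk - xi*xk) / 4
             assert lhs == rhs, (xi, xj, xk)
         print("quadratic identity holds for all 8 sign patterns")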
  20. Applications. 1. We do not know the motif of interest: food webs and new applications. 2. We know the motif of interest: transcription regulation networks, the connectome, social networks. 3. We seek richer information from our data: transportation networks and new applications.
  21. Application 1: food web structure. Nodes are species; edges are energy flows (who eats whom): i → j means j eats i. We do not know which motif matters, so we try several. By the Cheeger inequality, motif conductance is better than edge conductance. Benson, Gleich, & Leskovec, Science, 2016.
  22. Application 1: food web structure. Motif M6 reveals aquatic layers (micronutrient sources; pelagic fishes and benthic prey; benthic macroinvertebrates; benthic fishes): 61% accuracy vs. 48% with edge-based methods. Benson, Gleich, & Leskovec, Science, 2016.
  23. Application 2: yeast regulatory functions. Nodes are groups of genes; i → j if i is regulated by a transcription factor encoded by j. Partitioning based on signed motifs re-identifies known functions: 97% accuracy vs. 68–82% for other methods. Benson, Gleich, & Leskovec, Science, 2016.
  24. Application 3: transportation. North American air transport network: nodes are cities; edges reflect reachability and are directed and unweighted. Data from [Frey&Dueck07].
  25. Application 3: transportation. The motif counts length-two paths: W(M)_ij = #{bi-directional length-2 paths from i to j}. The weighted adjacency matrix already reveals hub-like structure. Benson, Gleich, & Leskovec, Science, 2016.
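     This weight matrix is a short computation from the directed 0/1 adjacency matrix; a sketch in my notation, assuming a dense numpy array A with A[i, j] = 1 iff the edge i → j exists:

         import numpy as np

         def bidirectional_two_path_weights(A):
             B = A * A.T              # B_ij = 1 iff both i -> j and j -> i exist
             W = B @ B                # (B @ B)_ij = # of k with i <-> k and k <-> j
             np.fill_diagonal(W, 0)   # discard trivial i <-> k <-> i paths
             return W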
  26. Application 3: transportation. In the motif spectral embedding, the primary spectral coordinate separates the top 10 U.S. hubs from the East coast non-hubs and West coast non-hubs. In the edge spectral embedding, Atlanta, the top hub, is next to Salina, a non-hub. Benson, Gleich, & Leskovec, Science, 2016.
  27. Other higher-order analyses for data. 1. Adapting tools for richer network models: defining motifs in temporal graphs and then finding them. 2. New methods for inherently higher-order data: understanding the structure of tensor data.
  28. Timestamped connections between things are everywhere. Technical infrastructure: packets over the internet, messages over a supercomputer. Private communication: e-mail, phone calls, text messages, instant messages. Public communication: Facebook wall posts, Q&A forums, Wikipedia edits. Payments: credit card transactions, Bitcoin, Venmo. Biology: cell signaling.
  29. What are temporal networks, formally? A temporal network is a sequence of (directed) edges with timestamps: (u, v, 05:31:14), (u, x, 05:32:33), (x, y, 06:07:53), (u, v, 06:45:12), (y, z, 06:45:12). Note that there can be many timestamps between the same pair of nodes! This is richer than other dynamic graph models: 1. the growth of networks (citation networks, Facebook friends); 2. sequences of coarse-grained snapshots. What is a motif in this network representation?
  30. Temporal network motifs. Formal definition: 1. a directed multigraph, 2. an edge ordering, 3. a time window δ. A motif instance is 1. an isomorphic subgraph that 2. respects the ordering and 3. has all edges happen within δ time units. Example with δ = 10s on a temporal network over nodes a, b, c, d (edge timestamps between 14s and 35s): valid instances include (a, b, c) at 25s, 28s, 32s; (a, b, c) at 25s, 30s, 32s; and (c, d, a) at 31s, 32s, 35s. Rejected candidates: (a, d, c) at 14s, 17s, 15s has the wrong order ((a, c) after (c, a)), and (a, d, c) at 14s, 28s, 32s is too long (32 − 14 = 18 > 10). Paranjape, Benson, & Leskovec, WSDM, 2017.
  31. The temporal motif problem. Given a network and a motif with time duration δ, how many instances of the motif are in the network? (For the example above with δ = 10s, the count is 3.) How can we do this efficiently?
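     To pin the problem down before the fast algorithms on the next slide, here is a naive O(m³) counter for 3-edge motifs; this is my own sketch, not the WSDM '17 algorithm. It assumes edges is pre-sorted by timestamp and pattern lists the motif's three ordered edges on canonical node labels:

         from itertools import combinations

         def count_temporal_motif(edges, pattern, delta):
             """Count instances of a 3-edge temporal motif (naive sketch).

             edges: list of (src, dst, t), sorted by t.
             pattern: 3 ordered edges on canonical labels, e.g. [(0, 1), (1, 0), (0, 1)].
             delta: time window; all 3 edges must fall within it.
             """
             count = 0
             for triple in combinations(edges, 3):        # respects the time ordering
                 if triple[2][2] - triple[0][2] > delta:
                     continue                             # spans more than delta
                 mapping, ok = {}, True                   # canonical label -> real node
                 for (pu, pv), (ru, rv, _) in zip(pattern, triple):
                     for p, r in ((pu, ru), (pv, rv)):
                         if mapping.setdefault(p, r) != r:
                             ok = False                   # inconsistent node assignment
                 if ok and len(set(mapping.values())) == len(mapping):
                     count += 1                           # distinct labels, distinct nodes
             return count

     Roughly speaking, the fast algorithms below avoid this cubic scan by sliding a δ-window over the time-sorted edge stream and maintaining counts incrementally.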
  32. Summary of new algorithms. In a graph with m temporal edges and T static triangles: 1. a general algorithm for any motif, faster than naive methods but possibly not optimal; 2. 2 nodes, k temporal edges: O(k² m) (uses the general algorithm), linear in the size of the temporal graph for constant k; 3. 3 nodes, 3 temporal edges, stars: O(m), linear in the size of the temporal graph; 4. 3 nodes, 3 temporal edges, triangles: O(T^(1/2) m), faster than the Ω(T m) naive method. Paranjape, Benson, & Leskovec, WSDM, 2017.
  33. New algorithms let us analyze large datasets. § Processing a temporal graph with 2 billion temporal edges takes just a few hours (single threaded). § The triangle counting algorithm provides significant speedups over naive methods:

      Dataset                   # temp. edges   Speedup
      Wikipedia edits           10M             1.92x
      Stack Overflow comments   63M             1.29x
      Bitcoin transactions      123M            56.5x
      Text messages             800M            2.28x
      Phone calls               2.04B           1.42x

      Paranjape, Benson, & Leskovec, WSDM, 2017.
  34. Overview of some empirical results. Comparing the fractions of blocking vs. non-blocking motif counts across Phonecall-Eu, CollegeMsg, Email-Eu, SMS-A, and FBWall exposes one-to-one vs. one-to-many behavior in communication systems like text messaging and e-mail. Motif counts in CollegeMsg plotted against the time to complete a motif show changes in communication patterns at different time scales. Paranjape, Benson, & Leskovec, WSDM, 2017.
  35. Other higher-order analyses for data. 1. Adapting tools for richer network models: defining motifs in temporal graphs and then finding them. 2. New methods for inherently higher-order data: understanding the structure of tensor* data. (*Technically, these are hypermatrices.)
  36. Higher-order data. We use matrices to describe pairwise data. Ecology: A_ij = 1 if j eats i, 0 otherwise. Correlations: A_ij = correlation(i, j). Often our data is inherently higher-order, and we can represent it with tensors. Natural language: A_ijk is the frequency of the 3-gram of words i, j, k. Multi-modal: A_ijk is the number of flights between city i and city j on airline k.
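     Such tensors are usually very sparse, so a dictionary keyed on index tuples is a natural representation. A toy sketch for the 3-gram example (my own, with made-up text):

         from collections import Counter

         words = "the quick brown fox jumps over the quick brown dog".split()
         A = Counter(zip(words, words[1:], words[2:]))   # A[i, j, k] = 3-gram frequency
         assert A[("the", "quick", "brown")] == 2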
  37. Tensor eigenvectors and higher-order Markov chains [Lim05, Qi05]. For an n × n × n tensor A, a tensor eigenvector x satisfies Σ_{j,k} A(i, j, k) x_j x_k = λ x_i, written compactly as A x² = λ x. These are used in machine learning for optimal recovery of latent variable models (eigenvectors of higher-order moment tensors) [Anandkumar+14]. Higher-order Markov chains are often described using tensors: P[X_{t+1} = i | X_t = j, X_{t−1} = k] = P(i, j, k). They also give eigenvector equations on a product state-space, but that requires O(n²) memory.
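     One basic way to approximate such an eigenvector is a higher-order power iteration on the map x ↦ A x² (a sketch of mine; this simple iteration is not guaranteed to converge for every tensor, which is part of why the ODE-based method mentioned on slide 41 is useful):

         import numpy as np

         def tensor_apply(A, x):
             return np.einsum("ijk,j,k->i", A, x, x)   # (A x^2)_i = sum_jk A_ijk x_j x_k

         def tensor_eigenvector(A, iters=500, tol=1e-10):
             x = np.ones(A.shape[0]) / np.sqrt(A.shape[0])
             for _ in range(iters):
                 y = tensor_apply(A, x)
                 y /= np.linalg.norm(y)                # keep the iterate on the unit sphere
                 if np.linalg.norm(y - x) < tol:
                     x = y
                     break
                 x = y
             lam = x @ tensor_apply(A, x)              # Rayleigh-style estimate of lambda
             return lam, x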
  38. The context for spacey random walks.

      Data type:            First-order                               Higher-order
      Stochastic process:   Random walk on a graph                    Spacey random walk on a graph
      Property:             Relative time spent at each node          Relative time spent at each node
                            (stationary distribution)                 (stationary distribution)
      Data representation:  Matrix representing a first-order         Tensor representing a higher-order
                            Markov chain, P_ij = Prob(j → i)          Markov chain, P_ijk = Prob((k, j) → (j, i))
      Computation:          Matrix eigenvector, P x = x               Tensor eigenvector [Li&Ng13], P x² = x

      Benson, Gleich, & Lim. SIAM Review, to appear 2017.
  39. The spacey random walk stochastic process. Consider a higher-order Markov chain: P[X_{t+1} | history] = P[X_{t+1} | X_t = j, X_{t−1} = k]. If we were perfect, we'd figure out the stationary distribution of that. But we are spacey! § On arriving at state j, we promptly "space out" and forget we came from k. § But we still believe we are "higher-order". § So we invent a state k by drawing a random state from our history. Benson, Gleich, & Lim. SIAM Review, to appear 2017.
  40. The spacey random walk. Higher-order Markov chain: P[X_{t+1} = i | X_t = j, X_{t−1} = k] = P(i, j, k). Spacey random walk: P[X_{t+1} = i | X_t = j, Y_t = g] = P(i, j, g), where Y_t is the invented past state drawn from the history. Theorem: the limiting distributions of this process are tensor eigenvectors. Benson, Gleich, & Lim. SIAM Review, to appear 2017.
  41. Spacey random walks: theory and applications. § Stationary distributions can be computed in O(n) memory, unlike higher-order random walks. § Spacey RWs are asymptotically first-order Markovian. § They are a generalization of Pólya urn processes. § All 2×2×2×…×2 problems have a stationary distribution (not necessarily unique, though). § New method to compute tensor eigenvectors based on ODE theory (better than the power method). § Spacey RWs are a good model for NYC taxi cab movement. Benson, Gleich, & Lim. SIAM Review, to appear 2017.
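     A simulation sketch of the process itself (my code; it assumes P is an n × n × n transition tensor with each P[:, j, k] summing to one, following the convention on slide 40, and the returned occupation vector approximates a tensor eigenvector by the limiting-distribution theorem):

         import numpy as np

         def spacey_random_walk(P, steps=100_000, seed=0):
             """Simulate a spacey random walk; return the empirical occupation vector."""
             rng = np.random.default_rng(seed)
             n = P.shape[0]
             history = np.ones(n)                # uniform pseudo-history to start
             x = rng.integers(n)                 # current state X_t
             for _ in range(steps):
                 y = rng.choice(n, p=history / history.sum())   # spacey guess of X_{t-1}
                 x = rng.choice(n, p=P[:, x, y])                # draw X_{t+1} from the tensor
                 history[x] += 1
             return history / history.sum()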
  42. Spacey random walks for co-clustering. The data: a nonnegative brick of numbers (tensor) indexed [i₁, …, i_{n₁}] × [j₁, …, j_{n₂}] × [k₁, …, k_{n₃}]. Co-clustering output: subsets of rows, columns, and slices. Main idea: use the relationship between conductance and random walks to generalize conductance for this data. Enron e-mail data: A_ijkt = # e-mails between i and j in week k with topic t. [Slide plots the number of e-mails per week, Dec. '00 to Dec. '01, annotated with events: CEO Skilling appointed; CEO Skilling started; Calif. bankruptcy; Calif. legislature; India (general).] Natural language corpus: A_ijk = #(consecutive co-occurrences of words i, j, k); A_ijkl = #(consecutive co-occurrences of words i, j, k, l). Fun 3-gram cluster: {bag, plastic, garbage, grocery, trash, freezer}. Fun 4-gram cluster: {german, chancellor, angela, merkel, gerhard, schroeder, helmut, kohl}. Wu, Benson, & Gleich, NIPS, 2016.
  43. Analytic tools for higher-order data. § Finding clusters based on motifs: Higher-order organization of complex networks, Science, 2016. Code + data: http://snap.stanford.edu/higher-order. § Higher-order structures in temporal networks: Motifs in Temporal Networks, WSDM, 2017. Code + data: http://snap.stanford.edu/temporal-motifs. § New stochastic processes for higher-order data: The spacey random walk, SIAM Review, to appear; General tensor spectral co-clustering, NIPS, 2016. Code + data: https://github.com/arbenson/spacey-random-walks and https://github.com/wutao27/GtensorSC.
  44. Ongoing work: localized motif-based clusters. § Find clusters containing motifs near a given seed node, and do it fast by generalizing diffusion techniques. § Triangle-based clusters are more robust to error than edge-based clusters in common null models: from the same seed, edge-based diffusion yields clusters with average precision 0.20 (100 samples), while triangle-based diffusion yields average precision 0.98 (100 samples) with better localization.
  45. Ongoing work: generalizing clustering coefficients. The clustering coefficient is a fundamental network measurement: the fraction of length-2 paths that are closed. Main idea of the work: look for higher-order closure patterns, such as 4-clique closure. [Slide contrasts networks with near-random 4-clique closure against networks with higher 4-clique closure than 3-clique closure.]
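     For reference, the classical quantity being generalized, computed exactly as this slide defines it (a sketch using networkx; the higher-order closure measures themselves are the ongoing work and are not reproduced here):

         import networkx as nx
         from itertools import combinations

         def global_clustering(G):
             """Fraction of length-2 paths (wedges) that are closed into triangles."""
             wedges = closed = 0
             for u in G:
                 for v, w in combinations(list(G[u]), 2):  # wedges centered at u
                     wedges += 1
                     closed += G.has_edge(v, w)
             return closed / wedges if wedges else 0.0

         # Sanity check: global_clustering(G) agrees with nx.transitivity(G).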
  46. There are many directions in the early stages. Algorithms: § sampling for temporal motifs; § data-dependent algorithms; § persistent homology for natural time scales in temporal data. Models: § a good null model for temporal graphs to test the statistical significance of temporal motifs. Theory: § lower bounds for temporal motif counting and enumeration. Applications: § higher-order temporal analysis in economic trade data.
  47. Thanks! Austin Benson, Stanford University. arbenson@stanford.edu, http://stanford.edu/~arbenson, @austinbenson. Collaborators: Ashwin Paranjape, Hao Yin, Jure Leskovec (Stanford); Tao Wu, David Gleich (Purdue); Lek-Heng Lim (Chicago).
