Published on

April 22, 2017

Published in:
Data & Analytics

- 1. Higher-order graph clustering. Austin Benson, Stanford University. AMS Spring Western Sectional, Pullman, WA, April 22, 2017. Joint work with David Gleich (Purdue) and Jure Leskovec (Stanford).
- 2. Unifying two themes of network analysis. 1. Real-world networks have modular organization [Newman04, Newman06]. Edge-based clustering and community detection sometimes expose this structure (example: a co-author network). 2. Higher-order connectivity patterns, or network motifs, drive complex systems [Milo+02, Yaveroğlu+14]. Examples: signed feed-forward loops in genetic transcription [Mangan+03, Alon07]; triangles in social networks [Rapoport53, Granovetter73]; bi-directed length-2 paths in brain networks [Sporns-Kötter04, Sporns+07].
- 3. Motifs are the fundamental units of complex networks. We should look for structure (clusters) in terms of them.
- 4. Different motifs give different clusters. Higher-order graph clustering is our technique for finding clusters based on motifs. [Figure: a network and a motif.]
- 5. Higher-order graph clustering: main points and overview. § We will generalize spectral clustering, a classical technique for finding clusters or communities in a graph, to use motifs as the fundamental unit of the partition. § The method is based on a higher-order (motif-based) conductance metric that generalizes traditional conductance. § It comes with theoretical guarantees. § We'll first briefly review how spectral clustering works. § Then we'll see how to adapt it to work with network motifs. § Then we'll see the impact of this approach on various real-world data.
- 6. Background: spectral clustering is a classic technique to partition graphs by looking at eigenvectors [Fiedler73]. [Figure: graph → Laplacian → eigenvector → cluster.]
- 7. Background: spectral clustering works based on conductance. Conductance is one of the most important cluster quality scores [Schaeffer07], used in Markov chain theory, spectral clustering, bioinformatics, vision, etc. The conductance of a set of vertices $S$ is the ratio of edges leaving $S$ to edge end points in $S$: $\phi(S) = \mathrm{cut}(S) / \min(\mathrm{vol}(S), \mathrm{vol}(\bar{S}))$, where $\mathrm{cut}(S)$ = #(edges leaving $S$) and $\mathrm{vol}(S)$ = #(edge end points in $S$). Small conductance ⇔ good cluster. Example from the figure: $\mathrm{cut}(S) = 7$, $\mathrm{vol}(S) = 85$, $\mathrm{vol}(\bar{S}) = 151$, so $\phi(S) = 7/85$.
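
The definition on this slide can be checked with a few lines of Python. This is a minimal sketch, not code from the talk: the function name `conductance` and the toy graph (two triangles joined by one edge) are assumptions made for illustration.

```python
def conductance(edges, S):
    """Conductance of vertex set S in an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    S: collection of vertices forming the candidate cluster.
    """
    S = set(S)
    cut = 0       # edges with exactly one end point in S
    vol_S = 0     # edge end points inside S
    vol_rest = 0  # edge end points outside S
    for u, v in edges:
        in_u, in_v = u in S, v in S
        cut += in_u != in_v
        vol_S += in_u + in_v
        vol_rest += (not in_u) + (not in_v)
    denom = min(vol_S, vol_rest)
    if denom == 0:
        raise ValueError("S or its complement has no edge end points")
    return cut / denom


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge (2,3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
phi = conductance(edges, {0, 1, 2})  # cut = 1, vol(S) = 7, vol(rest) = 7

# The slide's own numbers: cut = 7, vol(S) = 85, vol(complement) = 151.
slide_phi = 7 / min(85, 151)
```

Taking one triangle as $S$ cuts only the joining edge, which is why that set scores well under this measure.
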
- 8. Background: spectral clustering has theoretical guarantees [Cheeger70, Alon-Milman85]. Finding the set of best conductance is NP-hard. § Cheeger realized that the eigenvalues of the Laplacian provide surface-area-to-volume bounds in manifolds. § Alon and Milman independently realized the same thing for a graph (conductance)! Laplacian: $D = \mathrm{diag}(Ae)$, $L = D^{-1/2}(D - A)D^{-1/2}$. Eigenvalues of the Laplacian $L$: $0 = \lambda_1 \le \lambda_2 \le \dots \le \lambda_n \le 2$. Cheeger inequality: $\phi_*^2/2 \le \lambda_2 \le 2\phi_*$, where $\phi_*$ is the conductance of the set of smallest conductance.
- 9. Background: the sweep cut algorithm realizes the guarantee [Mihail89, Chung92]. We can find a set $S$ that achieves the Cheeger bound. 1. Compute the eigenvector $z$ associated with $\lambda_2$ and scale to $f = D^{-1/2} z$. 2. Sort the vertices by their values in $f$: $\sigma_1, \sigma_2, \dots, \sigma_n$. 3. Let $S_r = \{\sigma_1, \dots, \sigma_r\}$ and compute the conductance $\phi(S_r)$ of each $S_r$. 4. Pick the set $S_m$ with minimum conductance. Guarantee: $\phi(S_m) \le 2\sqrt{\phi_*}$. [Figure: sweep profile, conductance $\phi(S_i)$ versus $i$.]
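
Steps 2 to 4 of the sweep cut can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the name `sweep_cut` is an assumption, the score vector `f` is made up rather than computed from an eigenvector, and the graph is assumed simple and undirected.

```python
def sweep_cut(edges, f):
    """Sweep over vertices sorted by score and keep the best prefix set.

    edges: undirected edges (u, v), each listed once (simple graph assumed).
    f: dict mapping each vertex to its score (e.g. an entry of D^{-1/2} z).
    Returns (best_set, best_conductance, profile).
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    order = sorted(adj, key=lambda v: f[v])
    total_vol = sum(len(nbrs) for nbrs in adj.values())

    in_S = set()
    cut = vol = 0
    best_phi, best_r = float("inf"), 0
    profile = []
    # Stop before the last vertex so the complement is never empty.
    for r, v in enumerate(order[:-1], start=1):
        inside = sum(1 for w in adj[v] if w in in_S)
        # Edges from v into S stop being cut; edges from v out of S start.
        cut += len(adj[v]) - 2 * inside
        vol += len(adj[v])
        in_S.add(v)
        phi = cut / min(vol, total_vol - vol)
        profile.append(phi)
        if phi < best_phi:
            best_phi, best_r = phi, r
    return set(order[:best_r]), best_phi, profile


# Hypothetical toy graph and made-up scores, for illustration only.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
f = {0: -0.5, 1: -0.5, 2: -0.3, 3: 0.3, 4: 0.5, 5: 0.5}
best_set, best_phi, profile = sweep_cut(edges, f)
```

Updating the cut and volume incrementally keeps the whole sweep linear in the number of edges after the sort, instead of recomputing each conductance from scratch.
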
- 10. Background: the sweep cut visualized [Mihail89, Chung92]. [Figure: sweep profile plotting $\phi(S_i)$ against $i$, with the set $S_r$ marked.] $\phi(S) = \mathrm{cut}(S) / \min(\mathrm{vol}(S), \mathrm{vol}(\bar{S}))$.
- 11. We want to cluster with richer data. Spectral clustering is theoretically justified for finding edge-based clusters in undirected, simple graphs. We want motifs that may be directed, signed, colored, feature-valued, etc. Example: signed feed-forward loops in genetic transcription [Mangan+03]. Gene X activates transcription in gene Y. Gene X suppresses transcription in gene Z. Gene Y suppresses transcription in gene Z.
- 12. Our contributions (Benson-Gleich-Leskovec, Science, 2016). § A generalized conductance metric for motifs. § A new spectral clustering algorithm to minimize the generalized conductance. § An associated motif Cheeger inequality guarantee. § Naturally handles directed, signed, colored, weighted, and combinations of motifs. § Scales to networks with billions of edges. § Applications in ecology, biology, and transportation.
- 13. How do we find clusters based on motifs?
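- 14. Motif-based conductance. We need new notions of cut and volume. Edge-based: $\phi(S) = \mathrm{cut}(S) / \min(\mathrm{vol}(S), \mathrm{vol}(\bar{S}))$, with $\mathrm{cut}(S)$ = #(edges cut) and $\mathrm{vol}(S)$ = #(edge end points in $S$). Motif-based: $\phi_M(S) = \mathrm{cut}_M(S) / \min(\mathrm{vol}_M(S), \mathrm{vol}_M(\bar{S}))$, with $\mathrm{cut}_M(S)$ = #(motifs cut) and $\mathrm{vol}_M(S)$ = #(motif end points in $S$). Example: $M$ = triangle motif.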
- 15. Motif-based conductance. Example with motif $M$ on a 10-node graph: $\phi_M(S)$ = (motifs cut) / (motif volume) = 1/8. [Figure: the graph with nodes 1 to 10 and the sets $S$ and $\bar{S}$.]
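
To make the motif-based cut and volume concrete, here is a minimal sketch for the undirected triangle motif. The helper names `triangles` and `motif_conductance` and the toy graph are assumptions for illustration; the directed motif and the graph in the slide's figure are not reproduced here.

```python
def triangles(edges):
    """Return all triangles of an undirected simple graph as frozensets."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    found = set()
    for u, v in edges:
        for w in adj[u] & adj[v]:  # common neighbours close a triangle
            found.add(frozenset((u, v, w)))
    return found


def motif_conductance(instances, S):
    """Motif conductance of S, given motif instances as sets of nodes."""
    S = set(S)
    cut = vol_S = vol_rest = 0
    for inst in instances:
        k = len(inst & S)             # motif end points inside S
        cut += 0 < k < len(inst)      # instance straddles the boundary
        vol_S += k
        vol_rest += len(inst) - k
    return cut / min(vol_S, vol_rest)


# Hypothetical toy graph with triangles {0,1,2}, {1,2,3}, {3,4,5}, {4,5,6}.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3),
         (3, 4), (3, 5), (4, 5), (4, 6), (5, 6)]
tris = triangles(edges)
phi_M = motif_conductance(tris, {0, 1, 2})  # 1 triangle cut, volume 5
```

For $S = \{0, 1, 2\}$ only the triangle $\{1, 2, 3\}$ is cut, and $S$ holds five motif end points against seven outside, which gives 1/5.
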
- 16. Higher-order clustering. Problem: given a motif $M$ and a graph $G$, we want to find a set of nodes $S$ that minimizes motif conductance. This is NP-hard [Wagner-Wagner93]. Our solution: generalize spectral clustering for motifs. 1. Form a new weighted, undirected graph $W^{(M)}$ based on $M$ and $G$. 2. Compute the Fiedler vector of the Laplacian matrix of $W^{(M)}$ [Fiedler73, Alon-Milman85]. 3. Use the "sweep cut" procedure to output clusters [Mihail89, Chung92]. Theorem (motif Cheeger inequality): the resulting clusters obtain near-optimal motif conductance.
- 17. Motif-based spectral clustering. Step 1. Given a directed graph $G$ and a motif $M$, form a weighted graph $W^{(M)}$, where $W^{(M)}_{ij}$ = #{instances of motif $M$ that contain nodes $i$ and $j$}. [Figure: graph $G$, motif $M$, and the weighted graph $W^{(M)}$ with its edge weights.]
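
Step 1 can be sketched for the simplest case, the undirected triangle motif, where the number of triangles containing both $i$ and $j$ equals the number of common neighbours of an adjacent pair. The function name `triangle_motif_adjacency` and the toy graph are assumptions; directed motifs such as the one in the figure need motif-specific counting that is not shown here.

```python
def triangle_motif_adjacency(edges):
    """W[(i, j)] = number of triangles containing both i and j.

    edges: undirected edges (u, v), each listed once (simple graph assumed).
    Returns a symmetric dict; pairs that share no triangle are omitted.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    W = {}
    for u, v in edges:
        if u == v:
            continue
        common = len(adj[u] & adj[v])  # each common neighbour is a triangle
        if common:
            W[(u, v)] = W[(v, u)] = common
    return W


# Hypothetical toy graph: four triangles plus a pendant edge (6, 7).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3),
         (3, 4), (3, 5), (4, 5), (4, 6), (5, 6), (6, 7)]
W = triangle_motif_adjacency(edges)
```

The pendant edge (6, 7) lies in no triangle, so it disappears from the weighted graph, while the edge (1, 2), shared by two triangles, gets weight 2.
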
- 18. Motif-based spectral clustering. Step 1. Given a directed graph $G$ and a motif $M$, form a weighted graph $W^{(M)}$, where $W^{(M)}_{ij}$ = #{instances of motif $M$ that contain nodes $i$ and $j$}. Key insight: classical spectral clustering on the weighted graph $W^{(M)}$ finds clusters of low motif conductance. Example: $\phi_M(S)$ = (motifs cut) / (motif volume) = 1/10. [Figure: motif $M$ and the weighted graph $W^{(M)}$.]
- 19. Motif-based spectral clustering. Step 2. Compute the eigenvector $f^{(M)}$ associated with $\lambda_2$ of the normalized Laplacian matrix of $W^{(M)}$: $D = \mathrm{diag}(W^{(M)} e)$, $L^{(M)} = D^{-1/2}(D - W^{(M)})D^{-1/2}$, $L^{(M)} z = \lambda_2 z$, $f^{(M)} = D^{-1/2} z$. Takes roughly O(# edges) time. [Figure: entries of $f^{(M)}$ for nodes 1 to 10.]
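
One way to approximate the vector in Step 2 without a linear algebra library is power iteration on $2I - L$ with the trivial eigenvector $D^{1/2} e$ projected out. This is a swapped-in textbook technique, not necessarily what the authors use; the name `fiedler_vector`, the iteration count, the deterministic start vector, and the toy weights are all assumptions. It needs at least two nodes, all with positive degree.

```python
import math


def fiedler_vector(W, n, iters=2000):
    """Approximate f = D^{-1/2} z, with z the lambda_2 eigenvector of L.

    W: symmetric dict {(i, j): weight} over nodes 0..n-1 (both directions
       present), every node with positive weighted degree.
    """
    deg = [0.0] * n
    for (i, _j), w in W.items():
        deg[i] += w
    if min(deg) <= 0:
        raise ValueError("every node needs positive degree")
    s = [math.sqrt(d) for d in deg]
    norm_s = math.sqrt(sum(deg))
    top = [x / norm_s for x in s]  # D^{1/2} e, eigenvalue 0 of L

    def project_and_normalize(z):
        dot = sum(a * b for a, b in zip(z, top))
        z = [a - dot * b for a, b in zip(z, top)]
        nrm = math.sqrt(sum(a * a for a in z))
        return [a / nrm for a in z]

    # Deterministic start; assumed not orthogonal to the target vector.
    z = project_and_normalize([float(i) for i in range(n)])
    for _ in range(iters):
        # y = (2I - L) z = z + D^{-1/2} W D^{-1/2} z
        y = list(z)
        for (i, j), w in W.items():
            y[i] += w * z[j] / (s[i] * s[j])
        z = project_and_normalize(y)
    return [zi / si for zi, si in zip(z, s)]


# Hypothetical weighted graph: two triangles joined by one edge, unit weights.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
W = {}
for u, v in edges:
    W[(u, v)] = W[(v, u)] = 1.0
f = fiedler_vector(W, n=6)
```

Each iteration touches every stored weight once, which matches the slide's remark that the cost is roughly proportional to the number of edges. The entries of `f` for one triangle come out with one sign and those for the other triangle with the opposite sign.
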
- 20. Motif-based spectral clustering. Step 3 (motif sweep cut) [Mihail89, Chung92]. § Sort nodes by their values in $f^{(M)}$: $\sigma_1, \sigma_2, \dots, \sigma_n$. § Pick the set $S_r = \{\sigma_1, \dots, \sigma_r\}$ with smallest motif conductance. [Figure: the example graph with nodes 1 to 10.]
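- 21. Motif Cheeger inequality. Theorem: if the motif has three nodes, then the sweep procedure on the weighted graph finds a set $S$ of nodes for which $\phi_M(S) \le 2\sqrt{\phi_M^*}$, where $\phi_M^*$ is the smallest motif conductance of any set. For motifs with 4+ nodes, a slightly different notion of conductance is needed. Key proof step: with $M(G)$ = {instances of $M$ in $G$} and $x_i \in \{-1, 1\}$ marking which side of the cut node $i$ is on, $\mathrm{cut}_M(S, G) = \sum_{\{i,j,k\} \in M(G)} \mathrm{Indicator}[x_i, x_j, x_k \text{ not the same}]$, and each indicator equals $\frac{1}{4}(x_i^2 + x_j^2 + x_k^2 - x_i x_j - x_j x_k - x_i x_k)$, which is quadratic in $x$.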
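
The key proof step can be unpacked a little further. The following is a sketch of the standard argument, assuming $x \in \{-1, 1\}^n$ encodes membership in $S$ and that $W^{(M)}_{ij}$ counts the three-node motif instances containing both $i$ and $j$; the constants are worked out here and are not copied from the slide.

```latex
% Each indicator is a sum of squared differences:
\mathrm{Indicator}[x_i, x_j, x_k \text{ not the same}]
  = \tfrac{1}{4}\bigl(x_i^2 + x_j^2 + x_k^2 - x_i x_j - x_j x_k - x_i x_k\bigr)
  = \tfrac{1}{8}\bigl[(x_i - x_j)^2 + (x_j - x_k)^2 + (x_i - x_k)^2\bigr].

% Check: if all three agree, every difference is 0 and the value is 0;
% if exactly one differs, two differences are \pm 2 and the value is (4 + 4)/8 = 1.

% Summing over instances groups the terms by node pair:
\mathrm{cut}_M(S, G)
  = \tfrac{1}{8} \sum_{i < j} W^{(M)}_{ij} (x_i - x_j)^2
  = \tfrac{1}{8}\, x^{\mathsf T} \bigl(D - W^{(M)}\bigr) x .

% The ordinary weighted cut in W^{(M)} is
% \tfrac{1}{4} \sum_{i<j} W^{(M)}_{ij} (x_i - x_j)^2,
% so \mathrm{cut}_M(S, G) = \tfrac{1}{2}\,\mathrm{cut}_{W^{(M)}}(S).
```

This is why minimizing a Laplacian quadratic form on the weighted graph controls the number of motifs cut.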
- 22. Applications. 1. We do not know the motif of interest: food webs and new applications. 2. We know the motif of interest from domain knowledge: yeast transcription regulation networks, connectome, social networks. 3. We seek richer information from our data: transportation networks and new applications.
- 23. Application 1: We do not know the motif of interest.
- 24. Application 1: Food webs. Florida Bay food web. § Nodes are species. § Edges represent carbon exchange: i → j if j eats i. § Motifs represent energy flow patterns. [Figure: motifs M5 and M6; image source http://marinebio.org/oceans/marine-zones/]
- 25. Application 1: Food webs. Which motif clusters the food web? Our approach: § Run motif spectral clustering for all 3-node motifs as well as for just edges. § Examine the sweep profile to see which motif gives the best clusters.
- 26. Application 1: Food webs. Our finding: motif M6 organizes the food web into good clusters. [Figure: sweep profiles for M5, M6, and edges.]
- 27. Application 1: Food webs. Motif M6 reveals aquatic layers. Clusters found: micronutrient sources; pelagic fishes and benthic prey; benthic macroinvertebrates; benthic fishes. 61% accuracy vs. 48% with edge-based methods.
- 28. Application 2: We know the motif of interest from domain knowledge.
- 29. Application 2: Yeast transcription regulation networks. § Nodes are groups of genes. § Edge i → j means i regulates transcription of j. § Sign + / - denotes activation / suppression. § Coherent feedforward loops encode biological function [Mangan+03, Alon07].
- 30. Application 2: Yeast transcription regulation networks. Clustering based on coherent feedforward loops identifies functions studied individually by biologists [Mangan+03]. 97% accuracy vs. 68–82% with edge-based methods.
- 31. Application 2: Yeast transcription regulation networks. Structure of the found modules (all edge signs are positive).
- 32. Application 3: We seek richer information from our data.
- 33. Application 3: Transportation networks. § North American air transport network. § Nodes are cities. § i → j if you can travel from i to j in < 8 hours [Frey-Dueck07].
- 34. Application 3: Transportation networks. Important motifs from the literature [Rosvall+2014]. $W^{(M)}_{ij}$ = #{bi-directional length-2 paths from $i$ to $j$}. The weighted adjacency matrix already reveals hub-like structure.
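
The weight defined on this slide can be sketched by taking it literally: count the intermediate nodes $k$ such that $i$ and $k$ are linked in both directions and so are $k$ and $j$. The function name `bidirectional_wedge_weights` and the tiny route list are hypothetical, and whether the talk's motif also constrains the link between $i$ and $j$ is not stated on the slide, so no such constraint is applied here.

```python
def bidirectional_wedge_weights(arcs):
    """W[(i, j)] = number of k with i <-> k and k <-> j (i != j).

    arcs: iterable of directed links (u, v).
    Returns a symmetric dict; pairs with no such path are omitted.
    """
    arcs = set(arcs)
    # Keep only reciprocated links: u <-> v needs both (u, v) and (v, u).
    recip = {}
    for u, v in arcs:
        if u != v and (v, u) in arcs:
            recip.setdefault(u, set()).add(v)
    W = {}
    for _k, around in recip.items():
        # Every ordered pair of reciprocated neighbours of k is one path.
        for i in around:
            for j in around:
                if i != j:
                    W[(i, j)] = W.get((i, j), 0) + 1
    return W


# Hypothetical routes: two hubs, each linked both ways to cities a and b,
# plus a one-way link a -> c that cannot take part in any such path.
arcs = [("hub1", "a"), ("a", "hub1"), ("hub1", "b"), ("b", "hub1"),
        ("hub2", "a"), ("a", "hub2"), ("hub2", "b"), ("b", "hub2"),
        ("a", "c")]
W = bidirectional_wedge_weights(arcs)
```

In this toy input the two cities are tied together with weight 2 because they share two hubs, and the two hubs are tied together with weight 2 because they serve the same two cities, even though neither pair is directly linked.
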
- 35. Application 3: Transportation networks. [Figure: motif spectral embedding vs. edge spectral embedding, plotted along the primary spectral coordinate; legend: top 8 U.S. hubs, East coast non-hubs, West coast non-hubs.] Figure annotation: Atlanta, the top hub, is next to Salina, a non-hub.
- 36. Recap: Higher-order graph clustering. § Generalization of graph clustering to higher-order structures (motifs) through a new objective (motif conductance). § Generalizing old ideas from spectral graph theory admits a new algorithm and a motif Cheeger inequality. § Applications in ecology, biology, transportation.
- 37. Higher-order graph clustering. Thanks! For fun, reproduce some results from this talk: Jupyter notebooks at bit.ly/higher-order-julia. Code shown on the slide: `# form W0 … W4`; `sc0 = spectral_cut(W0)`; `sc1 = spectral_cut(W1)`; `sc2 = spectral_cut(W2)`; `sc3 = spectral_cut(W3)`; `sc4 = spectral_cut(W4)`; `plt = x -> semilogx(x.sweepcut_profile.conductance)`; `plt(sc0)`; `plt(sc1)`; `plt(sc2)`; `plt(sc3)`; `plt(sc4)`. Paper: Benson-Gleich-Leskovec, Higher-order organization of complex networks, Science, 2016. Code + data: http://snap.stanford.edu/higher-order. Find me at http://stanford.edu/~arbenson, @austinbenson, arbenson@stanford.edu. Austin Benson, Stanford University.
