Published in:
Data & Analytics


- 1. Higher-order clustering coefficients. Austin R. Benson, Cornell University. Purdue CSoI Seminar, October 4, 2017. Joint work with Hao Yin & Jure Leskovec (Stanford).
- 2. Background. Networks are sets of nodes and edges (graphs) that model real-world systems.
  • Brains: nodes are neurons, edges are synapses.
  • Social networks: nodes are people, edges are friendships.
  • Electrical grid: nodes are power plants, edges are transmission lines. (Image: Tim Meko, Washington Post)
  • Currency: nodes are accounts, edges are transactions.
- 3. Background. Networks are globally sparse but locally dense. Networks for real-world systems have modules, clusters, communities. [Watts-Strogatz 1998; Flake 2000; Newman 2004, 2006; many others…] Figures: a co-author network and a brain network (Sporns and Bullmore, Nature Rev. Neuro., 2012).
- 4. How do we measure how much a network clusters?
- 5. Background. The clustering coefficient is the fundamental measurement of network science.
  C(u) = fraction of length-2 paths centered at node u that form a triangle. The average clustering coefficient C is the average of C(u) over all nodes u.
  • In real-world networks, C is larger than we would expect (there is clustering). [Watts-Strogatz 1998] (> 34k citations!)
  • Attributed to triadic closure in sociology: a common friend provides an opportunity for more friendships. [Rapoport 1953; Granovetter 1973]
  • Key property for generative models. [Newman 2009; Seshadhri-Kolda-Pinar 2012; Robles-Moreno-Neville 2016]
  • Common feature in role discovery, anomaly detection, etc. [Henderson+ 2012; La Fond-Neville-Gallagher 2014, 2016]
  • Predictor of mental health. [Bearman-Moody 2004]
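The definition of C(u) above can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the adjacency-dict representation, the function names, the toy graph, and the convention that C(u) = 0 for nodes with fewer than two neighbors are all assumptions made here.

```python
from itertools import combinations

def local_clustering(adj, u):
    """Fraction of length-2 paths centered at u whose endpoints are adjacent."""
    nbrs = adj[u]
    d = len(nbrs)
    if d < 2:
        return 0.0  # assumed convention: no length-2 paths means C(u) = 0
    # Each pair of neighbors (v, w) is one length-2 path v-u-w; it closes
    # into a triangle exactly when v and w are themselves adjacent.
    closed = sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
    return closed / (d * (d - 1) / 2)

def average_clustering(adj):
    """Average of C(u) over all nodes u."""
    return sum(local_clustering(adj, u) for u in adj) / len(adj)

# Hypothetical toy graph: triangle a-b-c plus a pendant node d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

for node in sorted(adj):
    print(node, local_clustering(adj, node))
print("average:", average_clustering(adj))
```

In the toy graph, node c has three length-2 paths through it and only one of them (a-c-b) closes, so C(c) = 1/3, while a and b each have a single, closed path.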
- 6. The clustering coefficient is inherently limited. It measures the closure probability of just one simple structure: the triangle. But there is a lot of evidence that dense “higher-order structures” among more than 3 nodes are also important for clustering.
  • 4-cliques reveal community structure in word association and PPI networks. [Palla+ 2005]
  • 4- and 5-cliques (+ other motifs/graphlets) are used to identify network type and dimension. [Yaveroğlu+ 2014; Bonato+ 2014]
  • 4-node motifs identify community structure in neural systems. [Benson-Gleich-Leskovec 2016]
- 7. Triangles tell just one part of the story. How can we measure higher-order (clique) closure patterns?
- 8. Our higher-order view through clique expansion. Increase clique size by 1 to get a higher-order clustering coefficient.
  • C2: 1. Find a 2-clique. 2. Attach an adjacent edge. 3. Check for a (2+1)-clique. C2 = avg. fraction of (2-clique, adjacent edge) pairs that induce a (2+1)-clique.
  • C3: 1. Find a 3-clique. 2. Attach an adjacent edge. 3. Check for a (3+1)-clique. C3 = avg. fraction of (3-clique, adjacent edge) pairs that induce a (3+1)-clique.
  • C4: 1. Find a 4-clique. 2. Attach an adjacent edge. 3. Check for a (4+1)-clique. C4 = avg. fraction of (4-clique, adjacent edge) pairs that induce a (4+1)-clique.
- 9. Intuition for higher-order closure in social networks. 1. Start with a group of 3 friends (Alice, Bob, Charlie). 2. One person in the group befriends someone new (Dave). 3. The group might increase in size. (Image credits: rollingstone.com, oprah.com)
- 10. Higher-order clustering coefficients. We generalize clustering coefficients to account for clique closure. This particular generalization has several advantages:
  1. Theory. Analyze relationships between clustering at different orders.
     • small-world and Gn,p random graph models
     • combinatorics for general graphs
  2. Data Insights. How do real-world networks cluster?
     • old idea: pretty much all real-world networks exhibit clustering
     • new idea: real-world networks may only cluster up to a certain order
  3. Applications. Finding “higher-order” communities.
     • Large higher-order clustering coefficient → can find good “higher-order community”
- 11. Background. Local, average, and global clustering coefficients. Definitions shown (formulas not preserved in this transcript): the second-order (classical) local clustering coefficient at node u, the second-order (classical) global clustering coefficient, and the second-order (classical) average clustering coefficient.
- 12. Local, average, and global higher-order clustering coefficients. Definitions shown (formulas not preserved in this transcript): the third-order local clustering coefficient at node u, the third-order global clustering coefficient, and the third-order average clustering coefficient.
- 13. Small-world network analysis.
  • Small-world model [Watts-Strogatz 1998]: start with n nodes on a ring, each with edges to its 2k nearest neighbors, and then rewire each edge with probability p.
  • Example shown: n = 16, k = 3, p = 0.
  • Plots labeled [Watts-Strogatz 1998] and [Yin-Benson-Leskovec 2017] (not preserved in this transcript).
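The construction on this slide can be sketched as follows. This is a minimal, unofficial illustration of the Watts-Strogatz procedure: the function names, the particular rewiring rule (keep one endpoint, pick a uniformly random new endpoint that avoids self-loops and duplicate edges), and the fixed seed are assumptions made here, not details taken from the talk.

```python
import random
from itertools import combinations

def ring_lattice(n, k):
    """n nodes on a ring, each joined to its k nearest neighbors on each side (2k total)."""
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for j in range(1, k + 1):
            v = (u + j) % n
            adj[u].add(v)
            adj[v].add(u)
    return adj

def rewire(adj, k, p, seed=0):
    """Return a copy in which each lattice edge (u, u+j) is rewired with probability p."""
    rng = random.Random(seed)
    n = len(adj)
    new = {u: set(vs) for u, vs in adj.items()}
    for j in range(1, k + 1):
        for u in range(n):
            v = (u + j) % n
            if v in new[u] and rng.random() < p:
                # Candidate endpoints avoid self-loops and duplicate edges.
                candidates = [w for w in range(n) if w != u and w not in new[u]]
                if candidates:
                    w = rng.choice(candidates)
                    new[u].discard(v)
                    new[v].discard(u)
                    new[u].add(w)
                    new[w].add(u)
    return new

def local_clustering(adj, u):
    """Classical (second-order) local clustering coefficient at u."""
    d = len(adj[u])
    if d < 2:
        return 0.0
    closed = sum(1 for v, w in combinations(adj[u], 2) if w in adj[v])
    return closed / (d * (d - 1) / 2)

lattice = ring_lattice(16, 3)  # the n = 16, k = 3, p = 0 example
avg_c2 = sum(local_clustering(lattice, u) for u in lattice) / len(lattice)
print("average C2 at p = 0:", avg_c2)

rewired = rewire(lattice, k=3, p=0.5)
print("edges after rewiring:", sum(len(vs) for vs in rewired.values()) // 2)
```

At p = 0 every node has 6 neighbors, and 9 of the 15 neighbor pairs are adjacent, so each local coefficient is 0.6. Rewiring preserves the edge count (n·k = 48) while changing which wedges close.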
- 14. Gn,p random graph network analysis. Proposition [Yin-Benson-Leskovec 2017] (formula not preserved in this transcript). Everything scales exponentially in the order of the clustering coefficient. Even if a node’s neighborhood is dense, i.e., C2(u) is large, higher-order clustering still decays exponentially in Gn,p.
- 15. General network combinatorial analysis. Extremal relationships between HOCCs of different orders. Proposition [Yin-Benson-Leskovec 2017]: for any node u in the network, tight upper and lower bounds hold (formula not preserved in this transcript).
- 16. General network combinatorial analysis. Clique density interpretation. Proposition [Yin-Benson-Leskovec 2017]: the product of the first r - 1 local higher-order clustering coefficients is the r-clique density between the neighbors of node u.
- 17. General network combinatorial analysis. Clique participation and computation. Observation: we can compute the rth-order HOCCs by enumerating r- and (r + 1)-cliques, using the number of a-cliques containing u (formula and symbol not preserved in this transcript).
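The observation and the clique density interpretation can be illustrated with a small brute-force sketch. The formula used below, C_r(u) = r·K_{r+1}(u) / ((d_u − r + 1)·K_r(u)), where K_a(u) is the number of a-cliques containing u and d_u is the degree of u, is a reconstruction from the clique-expansion definition (each r-clique at u has d_u − r + 1 adjacent edges at u, and each (r+1)-clique at u closes r such pairs); it is not copied from the slide. The function names and the toy graph are hypothetical.

```python
from itertools import combinations
from math import comb

def cliques_containing(adj, u, size):
    """K_size(u): number of cliques on `size` nodes that contain u (brute force)."""
    if size == 1:
        return 1
    count = 0
    # A clique containing u is u plus (size - 1) mutually adjacent neighbors of u.
    for group in combinations(sorted(adj[u]), size - 1):
        if all(b in adj[a] for a, b in combinations(group, 2)):
            count += 1
    return count

def local_hocc(adj, u, r):
    """r-th order local clustering coefficient at u (reconstructed formula, see lead-in)."""
    d = len(adj[u])
    wedges = (d - r + 1) * cliques_containing(adj, u, r)  # (r-clique, adjacent edge) pairs at u
    if wedges <= 0:
        return 0.0  # assumed convention when there is nothing to close
    return r * cliques_containing(adj, u, r + 1) / wedges

# Hypothetical toy graph: a 4-clique {a, b, c, d} plus a pendant node e attached to a.
adj = {
    "a": {"b", "c", "d", "e"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c"},
    "e": {"a"},
}

c2 = local_hocc(adj, "a", 2)
c3 = local_hocc(adj, "a", 3)
# 3-clique density among the neighbors of a: triangles among neighbors / possible triples.
density = cliques_containing(adj, "a", 4) / comb(len(adj["a"]), 3)
print(c2, c3, c2 * c3, density)
```

For node a, the product C2(a)·C3(a) matches the 3-clique density among its four neighbors, which is the r = 3 case of the proposition on slide 16. Brute-force enumeration is only practical for small neighborhoods; a real analysis would use a dedicated clique-listing algorithm.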
- 18. Higher-order clustering coefficients. We generalize clustering coefficients to account for clique closure. This particular generalization has several advantages:
  1. Theory. Analyze relationships between clustering at different orders.
     • small-world and Gn,p random graph models
     • combinatorics for general graphs
  2. Data Insights. How do real-world networks cluster?
     • old idea: pretty much all real-world networks exhibit clustering
     • new idea: real-world networks may only cluster up to a certain order
  3. Applications. Finding “higher-order” communities.
     • Large higher-order clustering coefficient → can find good “higher-order community”
- 19. Datasets.
  • Neural connections (C. elegans): 297 nodes, 2.15k edges. Image: http://www.wormatlas.org/hermaphrodite/neuronalsupport/mainframe.htm
  • Facebook friendships (Stanford3): 11.6k nodes, 568k edges.
  • Co-authorships (arXiv ca-AstroPh): 18.8k nodes, 198k edges.
- 20. Global higher-order clustering coefficients.
  • Neural connections: C2 = 0.18, C3 = 0.08, C4 = 0.06 (decreases with order).
  • Facebook friendships: C2 = 0.16, C3 = 0.11, C4 = 0.12 (decreases and increases).
  • Co-authorships: C2 = 0.32, C3 = 0.33, C4 = 0.36 (increases with order).
  Is this just due to cliques in co-authorships? No. High-degree nodes in co-authorships exhibit clique + star structure where C3(u) > C2(u).
- 21. Average higher-order clustering coefficients (values listed as C2, C3, C4).
  • Neural connections: 0.31, 0.14, 0.06. Random configurations: 0.15, 0.04, 0.01. Random configurations (C2 fixed): 0.31, 0.17, 0.09.
  • Facebook friendships: 0.25, 0.18, 0.16. Random configurations: 0.03, 0.00, 0.00. Random configurations (C2 fixed): 0.25, 0.14, 0.09.
  • Co-authorships: 0.68, 0.61, 0.56. Random configurations: 0.01, 0.00, 0.00. Random configurations (C2 fixed): 0.68, 0.60, 0.52.
- 22. Concentration in random samples for neural connections data. Plot legend: random configurations [Bollobás 1980; Milo 2003]; random configurations with C2 fixed [Park-Newman 2004; Colomer de Simón+ 2013]; real network (C. elegans).
- 23. Neural connections findings are not just due to cliques.
  • Number of 4-cliques: 2,010 in the original network, 440 ± 68 in the null model.
  • C3: 0.14 in the original network, 0.17 ± 0.004 in the null model.
  The 4-clique count decreases in the null model, but the higher-order clustering coefficient increases. Key reason: clustering coefficients are normalized by opportunities to cluster.
- 24. Local HOCCs. Panels: neural connections, Facebook friendships, co-authorships. Reference curves: Gn,p baseline and upper bound. Annotations: dense but nearly random regions; dense and structured regions. Legend: real network; random configuration with C2 fixed.
- 25. Higher-order clustering coefficients. We generalize clustering coefficients to account for clique closure. This particular generalization has several advantages:
  1. Theory. Analyze relationships between clustering at different orders.
     • small-world and Gn,p random graph models
     • combinatorics for general graphs
  2. Data Insights. How do real-world networks cluster?
     • old idea: pretty much all real-world networks exhibit clustering
     • new idea: real-world networks may only cluster up to a certain order
  3. Applications. Finding “higher-order” communities.
     • Large higher-order clustering coefficient → can find good “higher-order community”
- 26. If a network has a large higher-order clustering coefficient, then it has communities. More precisely, there exists at least one community by one particular measure of “higher-order community structure”, but we can find the community efficiently.
- 27. Background. Graph clustering and conductance. Conductance is one of the most important cluster quality scores [Schaeffer 2007], used in Markov chain theory, spectral clustering, bioinformatics, vision, etc. The conductance of a set of vertices S is the ratio of edges leaving S to edge end points in S: conductance(S) = (edges leaving S) / (edge end points in S). Small conductance means a good cluster.
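The ratio above can be sketched directly in Python. This is a minimal illustration under stated assumptions: the adjacency-dict representation, the function name, and the toy graph are made up here, and the denominator takes the smaller of the volumes of S and its complement, which is the common convention rather than something stated on the slide (the two agree whenever S holds at most half of the edge end points).

```python
def conductance(adj, S):
    """(edges leaving S) / (edge end points in S), using the smaller side's volume."""
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)   # edges leaving S
    vol_S = sum(len(adj[u]) for u in S)                      # edge end points in S
    vol_rest = sum(len(adj[u]) for u in adj if u not in S)   # edge end points outside S
    denom = min(vol_S, vol_rest)
    if denom == 0:
        return float("inf")  # assumed convention for empty or all-inclusive sets
    return cut / denom

# Hypothetical toy graph: two triangles {a, b, c} and {d, e, f} joined by the edge c-d.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e", "f"},
    "e": {"d", "f"},
    "f": {"d", "e"},
}

print(conductance(adj, {"a", "b", "c"}))  # one edge leaves, 7 end points inside
print(conductance(adj, {"a", "b"}))       # two edges leave, 4 end points inside
```

The triangle {a, b, c} scores lower (better) than the pair {a, b}, matching the intuition that small conductance indicates a good cluster.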
- 28. Background. Motif conductance generalizes conductance to higher-order structures like cliques [Benson-Gleich-Leskovec 2016]. It uses higher-order notions of cut and volume. Example shown: M = triangle motif.
- 29. Higher-order clustering → higher-order communities. It is easy to see that if Cr = 1, then the network is a union of disjoint cliques, and any of these cliques has optimal motif conductance = 0. Theorem [Yin-Benson-Leskovec, in preparation]: there is some node u whose 1-hop neighborhood N1(u) satisfies a motif conductance bound (formula not preserved in this transcript), where M is the r-clique motif. This generalizes and improves a similar r = 2 (edge) result [Gleich-Seshadhri 2012].
- 30. Higher-order clustering → higher-order communities. Panels: neural connections, Facebook friendships, co-authorships. Legend: neighborhood; neighborhood with smallest conductance; Fiedler cut with motif normalized Laplacian [Benson-Gleich-Leskovec 16]. Annotation: large C3 and several neighborhoods with small triangle conductance.
- 31. Higher-order clustering → higher-order communities. Theory (pessimistic in practice). Practice: if the higher-order clustering coefficient is non-trivial, then there should be good local clusters.
- 32. Local higher-order graph clustering. Yin, Benson, Leskovec, & Gleich, KDD, 2017.
  • Studies the general problem of finding local clusters based on motifs (cliques).
  • Our method is a generalization of the Andersen-Chung-Lang personalized PageRank algorithm that expands clusters around a seed node.
  • Theoretical guarantees on cluster quality and performance (in practice, < 2 sec / seed on a 2B-edge graph).
  Figure labels: seed node, local cluster.
- 33. Local higher-order graph clustering. Yin, Benson, Leskovec, & Gleich, KDD, 2017.
  • Clusters based on triangles yield better recovery results on common synthetic graph models.
  • Clusters based on triangles can better recover a person’s departmental affiliation in an academic email network.
  Average F1 values shown: 0.40, 0.50.
- 34. Related work.
  • Gleich and Seshadhri, “Vertex neighborhoods, low conductance cuts, and good seeds for local community methods,” KDD, 2012. Motivation for relating higher-order clustering coefficients to 1-hop neighborhood communities. Intellectually indebted for their proof techniques!
  • Benson, Gleich, and Leskovec, “Higher-order organization of complex networks,” Science, 2016. Introduced higher-order conductance and a spectral method for optimizing it.
  • Fronczak et al., “Higher order clustering coefficients in Barabási–Albert networks,” Physica A, 2002. Higher-order clustering by looking at shortest path lengths.
  • Jiang and Claramunt, “Topological analysis of urban street networks,” Environ. and Planning B, 2004. Higher-order clustering by looking for triangles in k-hop neighborhoods.
  • Lambiotte et al., “Structural Transitions in Densifying Networks,” PRL, 2016, and Bhat et al., “Densification and structural transitions in networks that grow by node copying,” PRE, 2016. Generative models with similar clique closure ideas.
- 35. Thanks! Austin Benson. http://cs.cornell.edu/~arb @austinbenson arb@cs.cornell.edu
  Papers
  • “Higher-order clustering in networks.” Yin, Benson, and Leskovec. arXiv, 2017.
  • “Local higher-order graph clustering.” Yin, Benson, Leskovec, and Gleich. KDD, 2017.
  • “Higher-order organization of complex networks.” Benson, Gleich, and Leskovec. Science, 2016.
  Summary
  1. A generalization of the fundamental measurement of network science through a “clique expansion” interpretation.
  2. Able to analyze generally and in common random graph models (small-world and Gn,p).
  3. Old idea: all real-world graphs cluster. New idea: they only cluster up to a certain order.
  4. In data, helps distinguish between dense and random (neural connections) and dense and structured (FB friendships, co-authorship).
  5. Higher-order clustering implies local (1-hop neighborhood) higher-order communities.
  Open questions / future work
  • Is there a generative model that reproduces the observed higher-order clustering coefficients (e.g., forest fire)?
  • Tighter analysis for 1-hop neighborhood conductance?
  • Higher-order clustering coefficients for other motifs (i.e., not just cliques).
- 36. Changes in higher-order clustering coefficients tend to be independent of degree. Panels: neural connections, Facebook friendships, co-authorships. Decrease in average clustering with order is independent of degree. For large degrees, (formula not preserved in this transcript).
