Higher-order clustering coefficients

Slides from talk at the SF Data Institute Conference (DSCO17)

Published in: Data & Analytics

  1. Higher-order clustering coefficients. Austin R. Benson, Cornell University. Data Institute SF Conference, October 16, 2017. Joint work with Hao Yin & Jure Leskovec (Stanford). Slides: bit.ly/arb-dsco17. arXiv: 1704.03913.
  2. Background. Networks are globally sparse but locally dense. Networks for real-world systems have modules, clusters, communities. [Watts-Strogatz 1998; Flake 2000; Newman 2004, 2006; many others…] [Figures: co-authorship network; brain network (Sporns and Bullmore, Nature Rev. Neuro., 2012)]
  3. How do we measure how much a network clusters?
  4. Background. The clustering coefficient is the fundamental measurement of network science.
     C(u) = fraction of length-2 paths centered at node u that form a triangle.
     Average clustering coefficient C = average of C(u) over all nodes u.
     • In real-world networks, C is larger than we would expect (there is clustering). [Watts-Strogatz 1998], > 34k citations!
     • Attributed to triadic closure in sociology: a common friend provides an opportunity for more friendships. [Rapoport 1953; Granovetter 1973]
     • Key property for generative models. [Newman 2009; Seshadhri-Kolda-Pinar 2012; Robles-Moreno-Neville 2016]
     • Common feature in role discovery, anomaly detection, etc. [Henderson+ 2012; La Fond-Neville-Gallagher 2014, 2016]
     • Predictor of mental health. [Bearman-Moody 2004]
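
The definition of C(u) on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the graph is a dict of neighbor sets, the function names are hypothetical, and nodes of degree below 2 are skipped when averaging (an assumption; some libraries count them as 0 instead).

```python
from itertools import combinations

def local_clustering(adj, u):
    """Fraction of length-2 paths centered at u whose endpoints are adjacent."""
    nbrs = adj[u]
    d = len(nbrs)
    if d < 2:
        return None  # no length-2 path is centered at u
    closed = sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
    return closed / (d * (d - 1) / 2)

def average_clustering(adj):
    """Mean of C(u) over nodes where it is defined (assumed convention)."""
    values = [local_clustering(adj, u) for u in adj]
    values = [c for c in values if c is not None]
    return sum(values) / len(values)

# Toy graph: triangle a-b-c plus a pendant node d attached to a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(local_clustering(adj, "a"))  # one of the three neighbor pairs is connected
print(average_clustering(adj))
```

Node a has three length-2 paths through it and only b-c closes one, so its coefficient is 1/3, while b and c each score 1.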
  5. The clustering coefficient is inherently limited. It measures the closure probability of just one simple structure, the triangle, but there is lots of evidence that dense “higher-order structures” among more than 3 nodes are also important for clustering.
     • 4-cliques reveal community structure in word association and PPI networks. [Palla+ 2005]
     • 4- and 5-cliques (+ other motifs/graphlets) are used to identify network type and dimension. [Yaveroğlu+ 2014; Bonato+ 2014]
     • 4-node motifs identify community structure in neural systems. [Benson-Gleich-Leskovec 2016]
  6. Triangles tell just one part of the story. How can we measure higher-order (clique) closure patterns?
  7. Our higher-order view through clique expansion. Increase clique size by 1 to get a higher-order clustering coefficient.
     • Second order: 1. Find a 2-clique. 2. Attach an adjacent edge. 3. Check for a (2+1)-clique. C2 = avg. fraction of (2-clique, adjacent edge) pairs that induce a (2+1)-clique.
     • Third order: 1. Find a 3-clique. 2. Attach an adjacent edge. 3. Check for a (3+1)-clique. C3 = avg. fraction of (3-clique, adjacent edge) pairs that induce a (3+1)-clique.
     • Fourth order: 1. Find a 4-clique. 2. Attach an adjacent edge. 3. Check for a (4+1)-clique. C4 = avg. fraction of (4-clique, adjacent edge) pairs that induce a (4+1)-clique.
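
The three steps above (find an r-clique, attach an adjacent edge, check for an (r+1)-clique) can be followed literally at a single node. The sketch below is a brute-force illustration under stated assumptions: the adjacent edge is attached at the node u being measured, the graph is a dict of neighbor sets without self-loops, and `local_hocc_bruteforce` is a hypothetical name, not the authors' code.

```python
from itertools import combinations

def is_clique(adj, nodes):
    """True if every pair of the given nodes is adjacent."""
    return all(w in adj[v] for v, w in combinations(nodes, 2))

def local_hocc_bruteforce(adj, u, r):
    """r-th order local clustering at u by explicit clique expansion."""
    wedges = closed = 0
    # Step 1: find every r-clique that contains u (u plus r-1 mutually adjacent neighbors).
    for others in combinations(sorted(adj[u]), r - 1):
        if not is_clique(adj, others):
            continue
        # Step 2: attach an edge (u, v) with v outside the clique.
        for v in adj[u] - set(others):
            wedges += 1
            # Step 3: the pair closes if v is adjacent to every other clique node.
            if all(v in adj[w] for w in others):
                closed += 1
    return closed / wedges if wedges else None

# Toy graph: 4-clique {a, b, c, d} plus a pendant node e attached to a.
adj = {
    "a": {"b", "c", "d", "e"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c"},
    "e": {"a"},
}
for r in (2, 3, 4):
    print(r, local_hocc_bruteforce(adj, "a", r))
```

With r = 2 this reduces to the classical coefficient. The subset enumeration is exponential in r and only meant for small examples.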
  8. Intuition for higher-order closure in social networks. 1. Start with a group of 3 friends. 2. One person in the group befriends someone new. 3. The group might increase in size. [Illustration: Alice, Bob, Charlie, and Dave; image credits: rollingstone.com, oprah.com]
  9. Higher-order clustering coefficients. We generalize clustering coefficients to account for clique closure. This particular generalization has several advantages…
     1. Theory. Analyze relationships between clustering at different orders.
        • small-world and Gn,p random graph models
        • combinatorics for general graphs
     2. Data insights. How do real-world networks cluster?
        • Old idea: pretty much all real-world networks exhibit clustering.
        • New idea: real-world networks may only cluster up to a certain order.
     3. Applications. Finding “higher-order” communities.
        • Large higher-order clustering coefficient → can find a good “higher-order community”.
  10. Local, average, and global higher-order clustering coefficients. [Formulas on slide: third-order local clustering coefficient at node u; third-order global clustering coefficient; third-order average clustering coefficient.]
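
The formulas themselves did not survive in this transcript. The block below is a reconstruction from the clique-expansion definition, not a copy of the slide, and all notation in it is assumed: K_r(u) is the number of r-cliques containing u, d_u is the degree of u, K_{r+1} is the total number of (r+1)-cliques, and the average is taken over the nodes that center at least one (r-clique, adjacent edge) pair.

```latex
% Reconstruction under assumed notation; not taken verbatim from the slide.
\begin{align*}
|W_r(u)| &= K_r(u)\,(d_u - r + 1)
  && \text{($r$-clique, adjacent edge) pairs centered at } u \\
C_r(u) &= \frac{r\,K_{r+1}(u)}{(d_u - r + 1)\,K_r(u)}
  && \text{local} \\
\bar{C}_r &= \frac{1}{|\tilde{V}|} \sum_{u \in \tilde{V}} C_r(u)
  && \text{average} \\
C_r &= \frac{r(r+1)\,K_{r+1}}{\sum_{u} |W_r(u)|}
  && \text{global}
\end{align*}
```

The factor r appears because each (r+1)-clique containing u closes exactly r pairs centered at u, one for each choice of the node left out of the r-clique. For the third order this gives C_3(u) = 3 K_4(u) / ((d_u - 2) K_3(u)), and for r = 2 it reduces to the classical 2 K_3(u) / (d_u (d_u - 1)).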
  11. Small-world network analysis. Small-world model [Watts-Strogatz 1998]: start with n nodes on a ring, each with edges to its 2k nearest neighbors, and then rewire each edge with probability p. [Figure: n = 16, k = 3, p = 0; results from Watts-Strogatz 1998 and Yin-Benson-Leskovec 2017]
  12. Gn,p random graph network analysis. Proposition [Yin-Benson-Leskovec 2017]. Everything scales exponentially in the order of the clustering coefficient... Even if a node’s neighborhood is dense, i.e., C2(u) is large, higher-order clustering still decays exponentially in Gn,p.
  13. General network combinatorial analysis. Extremal relationships between HOCCs of different orders. Proposition [Yin-Benson-Leskovec 2017]. For any node u in the network, (tight upper and lower bounds)
  14. General network combinatorial analysis. Clique participation and computation. Observation. We can compute the rth-order HOCCs by enumerating r- and (r + 1)-cliques. [Notation on slide: the number of a-cliques containing u]
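
The observation can be illustrated by computing all three coefficients from clique counts alone. This sketch rests on a counting identity that is my own derivation rather than a formula copied from the slide: a node u of degree d_u in K_r(u) r-cliques centers K_r(u) · (d_u − r + 1) (r-clique, adjacent edge) pairs, of which r · K_{r+1}(u) are closed. The function names are hypothetical, and cliques are counted by brute force over neighbor subsets, so it only suits small graphs.

```python
from itertools import combinations

def clique_count(adj, u, size):
    """Number of `size`-cliques containing u (brute force over neighbor subsets)."""
    return sum(
        1
        for others in combinations(sorted(adj[u]), size - 1)
        if all(w in adj[v] for v, w in combinations(others, 2))
    )

def hocc(adj, r):
    """Return (local, average, global) r-th order clustering coefficients."""
    local, total_wedges, total_closed = {}, 0, 0
    for u in adj:
        k_r = clique_count(adj, u, r)        # r-cliques containing u
        k_r1 = clique_count(adj, u, r + 1)   # (r+1)-cliques containing u
        wedges = k_r * (len(adj[u]) - r + 1)  # pairs centered at u
        closed = r * k_r1                     # pairs that induce an (r+1)-clique
        if wedges > 0:
            local[u] = closed / wedges
        total_wedges += wedges
        total_closed += closed
    # Assumed convention: average only over nodes that center at least one pair.
    average = sum(local.values()) / len(local) if local else None
    global_ = total_closed / total_wedges if total_wedges else None
    return local, average, global_

# Toy graph: 4-clique {a, b, c, d} plus a pendant node e attached to a.
adj = {
    "a": {"b", "c", "d", "e"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c"},
    "e": {"a"},
}
local, average, global_ = hocc(adj, 3)
print(local, average, global_)
```

On large graphs a dedicated clique-enumeration routine would replace `clique_count`; the arithmetic around it stays the same.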
  15. Higher-order clustering coefficients. We generalize clustering coefficients to account for clique closure. This particular generalization has several advantages…
      1. Theory. Analyze relationships between clustering at different orders.
         • small-world and Gn,p random graph models
         • combinatorics for general graphs
      2. Data insights. How do real-world networks cluster?
         • Old idea: pretty much all real-world networks exhibit clustering.
         • New idea: real-world networks may only cluster up to a certain order.
      3. Applications. Finding “higher-order” communities.
         • Large higher-order clustering coefficient → can find a good “higher-order community”.
  16. Datasets.
      • Neural connections (C. elegans): 297 nodes, 2.15k edges. [Image: http://www.wormatlas.org/hermaphrodite/neuronalsupport/mainframe.htm]
      • Facebook friendships (Stanford3): 11.6k nodes, 568k edges.
      • Co-authorships (arXiv ca-AstroPh): 18.8k nodes, 198k edges.
  17. Average higher-order clustering coefficients.

      Network / null model                  C2     C3     C4
      Neural connections                    0.31   0.14   0.06
        Random configurations               0.15   0.04   0.01
        Random configurations (C2 fixed)    0.31   0.17   0.09
      Facebook friendships                  0.25   0.18   0.16
        Random configurations               0.03   0.00   0.00
        Random configurations (C2 fixed)    0.25   0.14   0.09
      Co-authorships                        0.68   0.61   0.56
        Random configurations               0.01   0.00   0.00
        Random configurations (C2 fixed)    0.68   0.60   0.52
  18. Concentration in random samples for neural connections data. [Figure: real network (C. elegans) compared with random configurations [Bollobás 1980; Milo 2003] and random configurations with C2 fixed [Park-Newman 2004; Colomer de Simón+ 2013]]
  19. Neural connections findings not just due to cliques.

                      Original network   Null model
      # 4-cliques     2,010              440 ± 68
      C3              0.14               0.17 ± 0.004

      The 4-clique count decreases in the null model, but the higher-order clustering coefficient increases. Key reason: clustering coefficients are normalized by opportunities to cluster.
  20. Local HOCCs. [Figure panels: neural connections; Facebook friendships; co-authorships. Legend: real network; random configuration with C2 fixed. Reference curves: Gn,p baseline; upper bound. Annotations: dense but nearly random regions; dense and structured regions.]
  21. Global higher-order clustering coefficients.

      Network                 C2     C3     C4     Trend
      Neural connections      0.18   0.08   0.06   decreases with order
      Facebook friendships    0.16   0.11   0.12   decreases and increases
      Co-authorships          0.32   0.33   0.36   increases with order

      Is this just due to cliques in co-authorships? No. High-degree nodes in co-authorships exhibit clique + star structure where C3(u) > C2(u). Global HOCCs tell us something about existence of
  22. Higher-order clustering coefficients. We generalize clustering coefficients to account for clique closure. This particular generalization has several advantages…
      1. Theory. Analyze relationships between clustering at different orders.
         • small-world and Gn,p random graph models
         • combinatorics for general graphs
      2. Data insights. How do real-world networks cluster?
         • Old idea: pretty much all real-world networks exhibit clustering.
         • New idea: real-world networks may only cluster up to a certain order.
      3. Applications. Finding “higher-order” communities.
         • Large higher-order clustering coefficient → can find a good “higher-order community”.
  23. If a network has a large higher-order clustering coefficient, then it has communities. More precisely, there exists at least one community by one particular measure of “higher-order community structure”, but we can find that community efficiently.
  24. Background. Graph clustering and conductance. Conductance is one of the most important cluster quality scores [Schaeffer 2007], used in Markov chain theory, spectral clustering, bioinformatics, vision, etc. The conductance of a set of vertices S is the ratio of edges leaving S to edge end points in S: conductance(S) = (# edges leaving S) / (# edge end points in S). Small conductance → good cluster.
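
The ratio on this slide is short enough to write out directly. A minimal sketch, assuming an undirected graph stored as a dict of neighbor sets; the `min` with the complement's volume is the common convention and an addition beyond what the slide states, and `conductance` is a hypothetical helper name.

```python
def conductance(adj, S):
    """Edges leaving S divided by edge end points in S (or in its complement, if smaller)."""
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)  # edges leaving S
    vol_S = sum(len(adj[u]) for u in S)                    # edge end points in S
    vol_rest = sum(len(adj[u]) for u in adj) - vol_S       # edge end points outside S
    denom = min(vol_S, vol_rest)
    return cut / denom if denom else None

# Toy graph: triangles {a, b, c} and {d, e, f} joined by the single edge c-d.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e", "f"},
    "e": {"d", "f"},
    "f": {"d", "e"},
}
print(conductance(adj, {"a", "b", "c"}))  # 1 leaving edge over 7 end points
print(conductance(adj, {"a", "b"}))       # a worse cluster: 2 leaving edges over 4 end points
```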
  25. Background. Motif conductance generalizes conductance to higher-order structures like cliques [Benson-Gleich-Leskovec 2016]. It uses higher-order notions of cut and volume. [Example on slide: M = triangle motif]
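
For the triangle motif, the higher-order cut and volume can be sketched as follows. The definitions used here are my reading of motif conductance and should be treated as assumptions: the cut counts triangles with nodes on both sides of S, and the volume counts triangle end points inside S. The function names are hypothetical.

```python
from itertools import combinations

def triangles(adj):
    """All triangles in the graph, each as a sorted 3-tuple listed once."""
    found = set()
    for u in adj:
        for v, w in combinations(sorted(adj[u]), 2):
            if w in adj[v]:
                found.add(tuple(sorted((u, v, w))))
    return found

def triangle_conductance(adj, S):
    """Triangles split by S divided by triangle end points on the smaller side."""
    S = set(S)
    cut = vol_S = vol_rest = 0
    for tri in triangles(adj):
        inside = sum(1 for x in tri if x in S)
        vol_S += inside          # triangle end points in S
        vol_rest += 3 - inside   # triangle end points outside S
        if 0 < inside < 3:
            cut += 1             # triangle has nodes on both sides
    denom = min(vol_S, vol_rest)
    return cut / denom if denom else None

# Toy graph: triangles {a, b, c}, {c, d, e}, and {d, e, f}.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d", "e"},
    "d": {"c", "e", "f"},
    "e": {"c", "d", "f"},
    "f": {"d", "e"},
}
print(triangle_conductance(adj, {"a", "b", "c"}))
```

Only the triangle {c, d, e} is split by S = {a, b, c}, so the cut is 1 against 4 triangle end points inside S.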
  26. Higher-order clustering → higher-order communities. Easy to see that if Cr = 1, then the network is a union of disjoint cliques… any of these cliques has optimal motif conductance = 0. Theorem [Yin-Benson-Leskovec, in preparation]. There is some node u whose 1-hop neighborhood N1(u) satisfies [bound on motif conductance shown on slide], where M is the r-clique motif. This generalizes and improves a similar r = 2 (edge) result [Gleich-Seshadhri 2012].
  27. Higher-order clustering → higher-order communities. Large C3 and several neighborhoods with small triangle conductance. [Figure panels: neural connections; Facebook friendships; co-authorships. Legend: neighborhood; neighborhood with smallest conductance; Fiedler cut with motif normalized Laplacian [Benson-Gleich-Leskovec 16].]
  28. Thanks! Austin Benson. http://cs.cornell.edu/~arb, @austinbenson, arb@cs.cornell.edu
      1. A generalization of the fundamental measurement of network science through a “clique expansion” interpretation.
      2. Able to analyze generally and in common random graph models (small-world and Gn,p).
      3. Old idea: all real-world graphs cluster. New idea: they only cluster up to a certain order.
      4. In data, helps distinguish between dense and random (neural connections) and dense and structured (FB friendships, co-authorship).
      5. Higher-order clustering implies local (1-hop neighborhood) higher-order communities.
      Papers
      • “Higher-order clustering in networks.” Yin, Benson, and Leskovec. arXiv, 2017.
      • “Local higher-order graph clustering.” Yin, Benson, Leskovec, and Gleich. KDD, 2017.
      • “Higher-order organization of complex networks.” Benson, Gleich, and Leskovec. Science, 2016.
      Open questions / future work
      • Is there a generative model that reproduces the observed higher-order clustering coefficients (e.g., forest fire)?
      • Tighter analysis for 1-hop neighborhood conductance?
      • Higher-order clustering coefficients for other motifs (i.e., not just cliques).
  29. Higher-order clustering → higher-order communities. Theory. (pessimistic in practice) Practice. If the higher-order clustering coefficient is non-trivial, then there should be good local clusters.
  30. Related work
      • Gleich and Seshadhri, “Vertex neighborhoods, low conductance cuts, and good seeds for local community methods,” KDD, 2012. Motivation for relating higher-order clustering coefficients to 1-hop neighborhood communities. Intellectually indebted for their proof techniques!
      • Benson, Gleich, and Leskovec, “Higher-order organization of complex networks,” Science, 2016. Introduced higher-order conductance and a spectral method for optimizing it.
      • Fronczak et al., “Higher order clustering coefficients in Barabási–Albert networks,” Physica A, 2002. Higher-order clustering by looking at shortest path lengths.
      • Jiang and Claramunt, “Topological analysis of urban street networks,” Environ. and Planning B, 2004. Higher-order clustering by looking for triangles in k-hop neighborhoods.
      • Lambiotte et al., “Structural Transitions in Densifying Networks,” PRL, 2016, and Bhat et al., “Densification and structural transitions in networks that grow by node copying,” PRE, 2016. Generative models with similar clique closure ideas.
  31. General network combinatorial analysis. Clique density interpretation. Proposition [Yin-Benson-Leskovec 2017]. The product of the first r - 1 local higher-order clustering coefficients is the r-clique density between the neighbors of node u.
  32. Changes in higher-order clustering coefficients tend to be independent of degree. Decrease in average clustering with order is independent of degree. For large degrees, [Figure panels: neural connections; Facebook friendships; co-authorships.]
  33. Local higher-order graph clustering. Yin, Benson, Leskovec, & Gleich, KDD, 2017.
      • Studies the general problem of finding local clusters based on motifs (cliques).
      • Our method is a generalization of the Andersen-Chung-Lang personalized PageRank algorithm that expands clusters around a seed node.
      • Theoretical guarantees on cluster quality and performance (in practice, < 2 sec / seed on a 2B-edge graph).
      [Figure: seed node and its local cluster]
  34. Local higher-order graph clustering. Yin, Benson, Leskovec, & Gleich, KDD, 2017.
      • Clusters based on triangles yield better recovery results on common synthetic graph models.
      • Clusters based on triangles can better recover a person’s departmental affiliation in an academic email network. [Figure values: average F1 0.40 and 0.50]
