Higher-order clustering coefficients

Slides from my talk at the SIAM Workshop on Network Science on July 13, 2017.

Published in: Data & Analytics

  1. Higher-order clustering coefficients. Austin R. Benson, Cornell University. SIAM Network Science, July 13, 2017. Joint work with Hao Yin (Stanford), Jure Leskovec (Stanford), and David Gleich (Purdue). Slides: bit.ly/austin-ns17
  2. The clustering coefficient is the fundamental measurement of network science. The local clustering coefficient C(u) is the fraction of length-2 paths centered at node u that form a triangle; the average clustering coefficient $\bar{C}$ is the average of C(u) over all nodes u. § In real-world networks, $\bar{C}$ is larger than we would expect at random (there is clustering). [Watts-Strogatz 1998] (> 33k citations!) § Attributed to triadic closure in sociology: a common friend provides an opportunity for more friendships. [Simmel 1908, Rapoport 1953, Granovetter 1973] § Key property for generative models [Newman 2009, Seshadhri-Kolda-Pinar 2012] § Used as a feature for node role discovery [Henderson+ 2012] § Predictor of mental health [Bearman-Moody 2004]
  3. The clustering coefficient is inherently limited, as it measures the closure probability of just one simple structure: the triangle. There is substantial evidence that dense "higher-order structure" among more than 3 nodes is also important for clustering. § 4-cliques reveal community structure in word association and PPI networks [Palla+ 2005] § 4- and 5-cliques (+ other motifs/graphlets) are used to identify network type and dimension [Yaveroğlu+ 2014, Bonato+ 2014] § 4-node motifs identify community structure in neural systems [Benson-Gleich-Leskovec 2016]
  4. Triangles tell just one part of the story. How can we measure higher-order (clique) closure patterns?
  5. Our higher-order view of clustering coefficients. Second order: (1) find a 2-clique (an edge); (2) attach an adjacent edge; (3) check for a (2+1)-clique. Third order: (1) find a 3-clique; (2) attach an adjacent edge; (3) check for a (3+1)-clique. Fourth order: (1) find a 4-clique; (2) attach an adjacent edge; (3) check for a (4+1)-clique. $\bar{C}_2$ = avg. fraction of (2-clique, adjacent edge) pairs that induce a (2+1)-clique. Increase the clique size by 1 to get a higher-order clustering coefficient: $\bar{C}_3$ = avg. fraction of (3-clique, adjacent edge) pairs that induce a (3+1)-clique, and $\bar{C}_4$ = avg. fraction of (4-clique, adjacent edge) pairs that induce a (4+1)-clique.
  6. A plausible example of higher-order closure: 1. Start with a group of 3 friends (Alice, Bob, Charlie). 2. One person in the group befriends someone new (Dave). 3. The group might increase in size. (Images: rollingstone.com, oprah.com)
  7. Overview: higher-order clustering coefficients. We generalize clustering coefficients to account for clique closure. This particular generalization has several advantages: 1. It is easy to analyze relationships between clustering at different orders: we have results for small-world and Gn,p models, as well as a general analysis. 2. It gives new insights into data. Old idea: pretty much all real-world networks exhibit clustering. New idea: real-world networks may only cluster up to a certain order. 3. It relates clustering coefficients to the existence of communities: a large higher-order clustering coefficient → we can find a good "higher-order community".
  8. What can we say theoretically?
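  9. Local, average, and global HOCCs. Count the (3-clique containing u, adjacent edge (u, v) with v outside that clique) pairs centered at u. Call a pair closed if its four nodes form a 4-clique. Third-order local clustering coefficient at node u: $C_3(u) = \frac{\#\{\text{closed pairs centered at } u\}}{\#\{\text{pairs centered at } u\}}$. Third-order average clustering coefficient: $\bar{C}_3 = \frac{1}{n}\sum_u C_3(u)$. Third-order global clustering coefficient: $C_3 = \frac{\sum_u \#\{\text{closed pairs centered at } u\}}{\sum_u \#\{\text{pairs centered at } u\}}$.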
  10. Analysis for small-world networks. Ring network [Watts-Strogatz 1998]: n nodes, each with edges to its 2k nearest neighbors, and edge rewiring probability p (pictured: n = 16, k = 3, p = 0). [Figure: average clustering coefficients $\bar{C}_2$, $\bar{C}_3$, $\bar{C}_4$ vs. rewiring probability p, log scale from $10^{-3}$ to $10^{0}$.] Proposition [Yin-Benson-Leskovec 2017]: with p = 0 and k < cn, as k, n → ∞, $\bar{C}_2 \to \frac{3}{4}$ and, more generally, $\bar{C}_r \to \frac{1}{2} + \frac{1}{2r}$.
  11. Analysis for Gn,p networks. Proposition [Yin-Benson-Leskovec 2017]: $E[C_r(u) \mid C_2(u)] = (C_2(u))^{r-1} + O(1/d_u^2)$ and $E[C_r] = E[\bar{C}_r] = E[C_r(u)] = p^{r-1}$. Everything scales exponentially in the order of the clustering coefficient. Even if a node's neighborhood is dense (i.e., C2(u) is large), higher-order clustering still decays exponentially.
  12. Analysis for general networks. Proposition: for any node u in any network, $0 \le C_3(u) \le \sqrt{C_2(u)}$ (tight upper and lower bounds). Pictured examples: C2(u) = 1, C3(u) = 1; C2(u) ≈ 1/4, C3(u) = 1/2; C2(u) ≈ 1/2, C3(u) = 0. Observation: $C_r(u) = \frac{r\,K_{r+1}(u)}{(d_u - r + 1)\,K_r(u)}$, where $K_a(u)$ is the number of a-cliques containing u. This says how to compute HOCCs by enumerating r- and (r+1)-cliques.
  13. What happens in the data?
  14. Average HOCCs:
      Network / null model                   avg. C2   avg. C3   avg. C4
      Neural connections (C. elegans)         0.31      0.14      0.06
        Random configurations                 0.15      0.04      0.01
        Random configurations (C2 fixed)      0.31      0.17      0.09
      Facebook friendships (Stanford3)        0.25      0.18      0.16
        Random configurations                 0.03      0.00      0.00
        Random configurations (C2 fixed)      0.25      0.14      0.09
      Co-authorship (ca-AstroPh)              0.68      0.61      0.56
        Random configurations                 0.01      0.00      0.00
        Random configurations (C2 fixed)      0.68      0.60      0.52
  15. Concentration of $\bar{C}_3$ in random samples for neural connections: the real network (C. elegans) compared with random configurations [Bollobás 1980, Milo 2003] and random configurations with C2 fixed [Park-Newman 2004, Colomer de Simón+ 2013].
  16. Local HOCCs: C3(u) vs. C2(u) for neural connections, Facebook friendships, and co-authorships, each plotted against the Gn,p baseline and the upper bound (markers: real network; random configuration with C2 fixed). Two regimes stand out: dense but nearly random regions, and dense and structured regions.
  17. Global HOCCs ($C_2$, $C_3$, $C_4$): neural connections 0.18, 0.08, 0.06 (decreases with order); Facebook friendships 0.16, 0.11, 0.12 (decreases, then increases); co-authorship 0.32, 0.33, 0.36 (increases with order). The global HOCCs tell us something about the existence of communities… High-degree nodes in co-authorship exhibit clique + star structure where C3(u) > C2(u).
  18. Does higher-order clustering lead to "communities"?
  19. If a network has a large higher-order clustering coefficient, then it has communities. More precisely, there exists at least one community by one particular measure of "higher-order community structure", but we can find that community efficiently.
  20. Background: graph communities and conductance. The conductance of a set of vertices S is the ratio of the number of edges leaving S to the number of edge end points in S: $\phi(S) = \mathrm{cut}(S) / \mathrm{vol}(S)$. Small conductance ⇔ good "community". Pictured example: cut(S) = 7, vol(S) = 85, so φ(S) = 7/85.
  21. Background: motif conductance generalizes conductance to higher-order structures like cliques [Benson-Gleich-Leskovec 2016]. It uses higher-order notions of cut and volume. Edges: vol(S) = #(edge end points in S), cut(S) = #(edges cut), $\phi(S) = \mathrm{cut}(S) / \mathrm{vol}(S)$. Motifs: $\mathrm{vol}_M(S)$ = #(motif end points in S), $\mathrm{cut}_M(S)$ = #(motifs cut), $\phi_M(S) = \mathrm{cut}_M(S) / \mathrm{vol}_M(S)$.
  22. Higher-order clustering → a good higher-order community. It is easy to see that if $C_r = 1$, then the network is a union of disjoint cliques, and any of these cliques has optimal motif conductance 0. Theorem [Yin-Benson-Leskovec-Gleich 2017]: there exists some node u whose 1-hop neighborhood N1(u) satisfies $\phi_M(N_1(u)) \le f(C_r)$. Here M is the r-clique motif and f is monotonically decreasing, with f(C_r) → 0 as C_r → 1. This generalizes and improves a similar r = 2 result [Gleich-Seshadhri 12].
  23. 1-hop neighborhoods and higher-order communities. [Figure panels: neural connections, Facebook friendships, co-authorships. Legend: neighborhood; neighborhood with smallest conductance; Fiedler cut with the motif normalized Laplacian [Benson-Gleich-Leskovec 16].] Large C3 and several neighborhoods with small triangle conductance.
  24. Related work § Gleich and Seshadhri, "Vertex neighborhoods, low conductance cuts, and good seeds for local community methods," KDD, 2012. Motivation for relating higher-order clustering coefficients to 1-hop neighborhood communities. Intellectually indebted for their proof techniques! § Benson, Gleich, and Leskovec, "Higher-order organization of complex networks," Science, 2016. Introduced higher-order conductance and a spectral method for optimizing it. § Fronczak et al., "Higher order clustering coefficients in Barabási–Albert networks," Physica A, 2002. Higher-order clustering by looking at shortest path lengths. § Jiang and Claramunt, "Topological analysis of urban street networks," Environ. and Planning B, 2004. Higher-order clustering by looking for triangles in k-hop neighborhoods. § Lambiotte et al., "Structural Transitions in Densifying Networks," Physical Review Letters, 2016. § Bhat et al., "Densification and structural transitions in networks that grow by node copying," Physical Review E, 2016. Generative models with similar clique closure ideas.
  25. Papers: § Higher-order clustering in networks. Yin, Benson, and Leskovec. arXiv:1704.03913, 2017. § Local higher-order graph clustering. Yin, Benson, Leskovec, and Gleich. To appear at KDD, 2017. Summary: 1. A generalization of the fundamental measurement of network science through a "clique expansion" interpretation. 2. We can analyze it in general and in common random graph models (small-world and Gn,p). 3. Old idea: all real-world graphs cluster. → New idea: they only cluster up to a certain order. 4. In data, it helps distinguish between dense and random (neural connections) and dense and structured (FB friendships, co-authorship). 5. Global higher-order clustering leads to higher-order 1-hop neighborhood communities. Thanks! Austin R. Benson, http://cs.cornell.edu/~arb, @austinbenson, arb@cs.cornell.edu
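Slide 5 defines C_r through "clique expansion". You take an r-clique, attach an adjacent edge, and check whether the r + 1 nodes form a clique. The Python sketch below computes the local coefficient C_r(u) directly from that recipe by brute force. The function names, the toy graph, and the choice to return 0.0 for nodes with no (clique, edge) pairs are our own assumptions, not code from the talk. The graph is a plain dict mapping each node to its set of neighbors.

```python
from itertools import combinations


def cliques_containing(adj, u, r):
    """Return every r-clique (as a frozenset) that contains node u.

    Brute force: try each (r - 1)-subset of u's neighbors and keep the
    subsets whose members are pairwise adjacent. Only for small graphs.
    """
    found = []
    for rest in combinations(adj[u], r - 1):
        if all(b in adj[a] for a, b in combinations(rest, 2)):
            found.append(frozenset(rest) | {u})
    return found


def local_hocc(adj, u, r):
    """C_r(u): fraction of (r-clique containing u, edge (u, v) with v outside
    the clique) pairs whose r + 1 nodes form an (r + 1)-clique.

    Returns 0.0 when u has no such pairs (a convention assumed here).
    """
    pairs = closed = 0
    for clique in cliques_containing(adj, u, r):
        for v in adj[u]:
            if v in clique:
                continue
            pairs += 1
            # Closed if v is adjacent to every other member of the clique.
            if all(v in adj[w] for w in clique if w != u):
                closed += 1
    return closed / pairs if pairs else 0.0


# Hypothetical toy graph: a 4-clique {0, 1, 2, 3} plus node 4 hanging off node 0.
adj = {0: {1, 2, 3, 4}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}, 4: {0}}
for r in (2, 3):
    print(r, [local_hocc(adj, u, r) for u in sorted(adj)])
# 2 [0.5, 1.0, 1.0, 1.0, 0.0]
# 3 [0.5, 1.0, 1.0, 1.0, 0.0]
```

For r = 2 this reduces to the classical local clustering coefficient. Each wedge is simply counted once per ordering of its two edges, which leaves the ratio unchanged.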
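Slide 9 distinguishes the local, average, and global coefficients. The average is the mean of the local values. The global value pools the closed and total (clique, adjacent edge) pair counts over all nodes before dividing. The sketch below, under the same assumptions as a brute-force toy (hypothetical function names; nodes with no pairs get local value 0.0), returns all three for a given order r.

```python
from itertools import combinations


def wedge_counts(adj, u, r):
    """Return (closed, total) counts of (r-clique containing u, adjacent edge at u)
    pairs centered at u; a pair is closed if its r + 1 nodes form a clique."""
    closed = total = 0
    for rest in combinations(adj[u], r - 1):
        if not all(b in adj[a] for a, b in combinations(rest, 2)):
            continue  # u plus rest is not an r-clique
        for v in adj[u]:
            if v in rest:
                continue
            total += 1
            if all(v in adj[w] for w in rest):
                closed += 1
    return closed, total


def hoccs(adj, r):
    """Return (local dict, average, global) order-r clustering coefficients."""
    counts = {u: wedge_counts(adj, u, r) for u in adj}
    local = {u: (c / t if t else 0.0) for u, (c, t) in counts.items()}
    average = sum(local.values()) / len(local)
    total = sum(t for _, t in counts.values())
    global_ = sum(c for c, _ in counts.values()) / total if total else 0.0
    return local, average, global_


# Hypothetical toy graph: a 4-clique {0, 1, 2, 3} plus node 4 hanging off node 0.
adj = {0: {1, 2, 3, 4}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}, 4: {0}}
local3, avg3, glob3 = hoccs(adj, 3)
print(local3)       # {0: 0.5, 1: 1.0, 2: 1.0, 3: 1.0, 4: 0.0}
print(avg3, glob3)  # 0.7 0.8
```

The two summaries differ because the average weights every node equally, including the degree-1 node, while the global value weights nodes by how many pairs they center.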
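The small-world proposition on slide 10 concerns the ring with p = 0. Every node there links to its k nearest neighbors on each side, for degree 2k. The sketch below builds that ring and evaluates C_2 and C_3 by brute force. For C_2 it compares against the classical ring-lattice value 3(k - 1)/(2(2k - 1)), which tends to 3/4 as the proposition states. It only prints C_3 and makes no claim about how fast it approaches the stated limit 1/2 + 1/6. Function names and the choice n = 10k are illustrative assumptions.

```python
from itertools import combinations


def ring_lattice(n, k):
    """Watts-Strogatz ring with p = 0: node i is linked to the k nearest nodes
    on each side, so every node has degree 2k (assumes n > 2k)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(1, k + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)
    return adj


def local_hocc(adj, u, r):
    """C_r(u) by brute-force enumeration of (r-clique, adjacent edge) pairs at u."""
    closed = total = 0
    for rest in combinations(adj[u], r - 1):
        if not all(b in adj[a] for a, b in combinations(rest, 2)):
            continue  # u plus rest is not an r-clique
        for v in adj[u]:
            if v not in rest:
                total += 1
                closed += all(v in adj[w] for w in rest)
    return closed / total if total else 0.0


for k in (3, 10, 25):
    adj = ring_lattice(10 * k, k)
    # The ring is vertex-transitive, so C_r(u) is the same at every node and
    # the average coefficient equals C_r(0).
    c2, c3 = local_hocc(adj, 0, 2), local_hocc(adj, 0, 3)
    exact_c2 = 3 * (k - 1) / (2 * (2 * k - 1))
    print(f"k={k}: C2={c2:.4f} (closed form {exact_c2:.4f}), C3={c3:.4f}")
print("limits from the proposition: C2 -> 0.75, C3 ->", round(0.5 + 1 / 6, 4))
```

Only one node is evaluated per ring. This is a shortcut that relies on the symmetry of the p = 0 lattice and would not hold once edges are rewired.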
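Slide 11 states that in Gn,p the expected coefficient of order r is p^(r-1). A quick empirical check is to sample one Gn,p graph and compare its global C_r with p^(r-1). The sketch below does this with Python's seeded `random.Random`. The graph size, p, the seed, and the function names are arbitrary assumptions, and a single sample will only be close to the expectation, not equal to it.

```python
import random
from itertools import combinations


def gnp(n, p, seed=0):
    """Sample an Erdos-Renyi G(n, p) graph as a dict of neighbor sets."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for a, b in combinations(range(n), 2):
        if rng.random() < p:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def global_hocc(adj, r):
    """Global C_r: closed (r-clique, adjacent edge) pairs over all pairs, pooled over nodes."""
    closed = total = 0
    for u in adj:
        for rest in combinations(adj[u], r - 1):
            if not all(b in adj[a] for a, b in combinations(rest, 2)):
                continue  # u plus rest is not an r-clique
            for v in adj[u]:
                if v not in rest:
                    total += 1
                    closed += all(v in adj[w] for w in rest)
    return closed / total if total else 0.0


n, p = 100, 0.2
adj = gnp(n, p, seed=1)
for r in (2, 3, 4):
    print(f"r={r}: sampled C_r = {global_hocc(adj, r):.3f}, p^(r-1) = {p ** (r - 1):.3f}")
```

The exponential decay shows up even at this small size: each additional order multiplies the expected value by another factor of p.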
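The observation on slide 12 gives a closed form, $C_r(u) = r K_{r+1}(u) / ((d_u - r + 1) K_r(u))$, so HOCCs can be computed from clique counts alone. The sketch below implements that formula with brute-force clique counting. It then checks the proposition $C_3(u) \le \sqrt{C_2(u)}$ on one random graph. The helper names, the random test graph, and the 0.0 convention when the denominator vanishes are assumptions made for illustration.

```python
import math
import random
from itertools import combinations


def clique_count(adj, u, a):
    """K_a(u): number of a-cliques containing u (brute force over neighbor subsets)."""
    return sum(
        1
        for rest in combinations(adj[u], a - 1)
        if all(y in adj[x] for x, y in combinations(rest, 2))
    )


def hocc_from_counts(adj, u, r):
    """C_r(u) = r K_{r+1}(u) / ((d_u - r + 1) K_r(u)); 0.0 if the denominator is 0."""
    d = len(adj[u])
    denom = (d - r + 1) * clique_count(adj, u, r)
    return r * clique_count(adj, u, r + 1) / denom if denom > 0 else 0.0


# Hypothetical toy graph: a 4-clique {0, 1, 2, 3} plus node 4 hanging off node 0.
adj = {0: {1, 2, 3, 4}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}, 4: {0}}
print([hocc_from_counts(adj, u, 3) for u in sorted(adj)])  # [0.5, 1.0, 1.0, 1.0, 0.0]

# Check the bound 0 <= C_3(u) <= sqrt(C_2(u)) on a random graph.
rng = random.Random(7)
g = {i: set() for i in range(40)}
for a, b in combinations(range(40), 2):
    if rng.random() < 0.3:
        g[a].add(b)
        g[b].add(a)
ok = all(
    0.0 <= hocc_from_counts(g, u, 3) <= math.sqrt(hocc_from_counts(g, u, 2)) + 1e-12
    for u in g
)
print("bound holds on sample:", ok)
```

For node 0 the formula reads 3 · 1 / ((4 - 2) · 3) = 0.5. That matches what direct enumeration of (triangle, adjacent edge) pairs gives on the same graph.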
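Slides 14 and 15 compare each real network with "random configurations". One common way to sample a degree-preserving random graph is repeated double-edge swaps. A swap replaces edges {a, b} and {c, d} with {a, d} and {c, b}, and is rejected if it would create a self-loop or duplicate edge. The sketch below implements that swap procedure plus a plain average clustering coefficient. It is an assumption that this resembles the talk's baseline: the slides only cite [Bollobás 1980, Milo 2003] and do not say how samples were drawn. The "C2 fixed" variant is not attempted here. Function names and the swap budget are illustrative.

```python
import random
from collections import Counter
from itertools import combinations


def degree_preserving_shuffle(edges, nswaps, seed=0):
    """Randomize an undirected simple graph with double-edge swaps.

    Each accepted swap replaces {a, b}, {c, d} with {a, d}, {c, b}, so every
    node keeps its degree. Swaps creating self-loops or duplicates are rejected.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < nswaps and attempts < 100 * nswaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # try both ways of rewiring the pair
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if len({a, b, c, d}) < 4 or new1 in present or new2 in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


def degrees(edges):
    """Degree of every node that appears in the edge list."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def avg_clustering(edges):
    """Average classical C(u) over nodes in the edge list (degree < 2 counts as 0)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    total = 0.0
    for u, nbrs in adj.items():
        d = len(nbrs)
        if d >= 2:
            links = sum(1 for x, y in combinations(nbrs, 2) if y in adj[x])
            total += links / (d * (d - 1) / 2)
    return total / len(adj)


# Hypothetical input: a ring lattice with 40 nodes and 3 neighbors per side.
ring = [(i, (i + j) % 40) for i in range(40) for j in (1, 2, 3)]
shuffled = degree_preserving_shuffle(ring, 400, seed=3)
print("real:", round(avg_clustering(ring), 3), "randomized:", round(avg_clustering(shuffled), 3))
```

Comparing a real network's HOCCs against many such samples, rather than a single one, is what makes a concentration plot like slide 15 possible.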
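Slide 20 defines conductance as φ(S) = cut(S)/vol(S). Here cut(S) counts edges leaving S and vol(S) counts edge end points in S. The sketch below computes exactly that. It also offers an optional `symmetric` flag that divides by min(vol(S), vol(complement)) instead, the normalization commonly used in the graph-partitioning literature. The flag, the function name, and the `inf` return for an empty volume are our own choices.

```python
def conductance(adj, S, symmetric=False):
    """phi(S) = cut(S) / vol(S), as defined on the slide.

    cut(S): number of edges with exactly one end point in S.
    vol(S): number of edge end points in S (sum of degrees over S).
    With symmetric=True, divide by min(vol(S), vol(complement)) instead.
    """
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    vol = sum(len(adj[u]) for u in S)
    if symmetric:
        vol = min(vol, sum(len(nbrs) for nbrs in adj.values()) - vol)
    return cut / vol if vol else float("inf")


# Hypothetical example: the path 0 - 1 - 2 - 3.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(conductance(path, {0, 1}))                     # 1 edge leaves, 3 end points inside
print(conductance(path, {0, 1, 2}))                  # 1 / 5
print(conductance(path, {0, 1, 2}, symmetric=True))  # 1 / min(5, 1)
```

The two forms agree whenever S is the side with the smaller volume. They diverge when S covers most of the graph, which the slide's one-sided ratio would otherwise reward.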
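Slide 21 replaces edges with motifs: $\mathrm{cut}_M(S)$ counts motif instances that are split by S, and $\mathrm{vol}_M(S)$ counts motif end points inside S. The sketch below specializes this to the triangle motif and uses the slide's one-sided ratio. Triangle enumeration assumes comparable (e.g. integer) node labels. The function names and the bow-tie example are hypothetical.

```python
def triangles(adj):
    """All triangles as sorted tuples (assumes comparable node labels)."""
    return [
        (u, v, w)
        for u in adj
        for v in adj[u] if v > u
        for w in adj[u] & adj[v] if w > v
    ]


def motif_conductance(adj, S):
    """phi_M(S) = cut_M(S) / vol_M(S) with M = triangle.

    cut_M(S): triangles with at least one node in S and one node outside S.
    vol_M(S): total number of triangle end points that lie in S.
    """
    S = set(S)
    cut = vol = 0
    for tri in triangles(adj):
        inside = sum(1 for x in tri if x in S)
        vol += inside
        if 0 < inside < 3:
            cut += 1
    return cut / vol if vol else float("inf")


# Hypothetical bow tie: triangles {0, 1, 2} and {2, 3, 4} sharing node 2.
bowtie = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3}}
print(motif_conductance(bowtie, {0, 1, 2}))  # 1 cut triangle / 4 end points = 0.25
print(motif_conductance(bowtie, {0, 1}))     # 1 cut triangle / 2 end points = 0.5
```

Edges that belong to no triangle do not affect this score at all. That is the point of the motif-based definition.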
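The theorem on slide 22 guarantees that some 1-hop neighborhood N1(u) has small motif conductance when C_r is large. Slide 23 looks for "the neighborhood with smallest conductance". A direct way to find that neighborhood is to score every N1(u) = {u} ∪ neighbors(u) and keep the best. The sketch below does this for the triangle motif (r = 3). To avoid rewarding a neighborhood that swallows the whole graph, it divides by min(vol_M(S), vol_M(complement)). That normalization is our choice here, and it is a sweep over neighborhoods, not the authors' algorithm. Names and the example graph are hypothetical.

```python
from itertools import combinations


def triangles(adj):
    """All triangles as sorted tuples (assumes comparable node labels)."""
    return [
        (u, v, w)
        for u in adj
        for v in adj[u] if v > u
        for w in adj[u] & adj[v] if w > v
    ]


def motif_conductance(S, tris):
    """Triangle conductance cut_M(S) / min(vol_M(S), vol_M(complement))."""
    cut = vol_in = vol_out = 0
    for tri in tris:
        inside = sum(1 for x in tri if x in S)
        vol_in += inside
        vol_out += 3 - inside
        cut += 0 < inside < 3
    denom = min(vol_in, vol_out)
    return cut / denom if denom else float("inf")


def best_neighborhood(adj):
    """Return (u, score) for the 1-hop neighborhood {u} | adj[u] of lowest score."""
    tris = triangles(adj)
    scores = {u: motif_conductance({u} | adj[u], tris) for u in adj}
    best = min(scores, key=scores.get)
    return best, scores[best]


# Hypothetical graph: two 4-cliques {0..3} and {4..7} joined by the edge 0-4.
edges = [(a, b) for grp in ((0, 1, 2, 3), (4, 5, 6, 7)) for a, b in combinations(grp, 2)]
edges.append((0, 4))
adj = {u: set() for u in range(8)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)
print(best_neighborhood(adj))  # a node whose neighborhood is one clique, score 0.0
```

The bridge edge 0-4 lies in no triangle, so the neighborhood of node 1 (one whole clique) cuts no triangles and scores 0. The neighborhood of node 0 also pulls in node 4 and splits three triangles of the other clique.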
