
Higher-order clustering in networks


Slides for my talk at the HONS Satellite at NetSci, Paris, France, June 12, 2018.


Higher-order clustering in networks

  1. Higher-order clustering in networks. Austin R. Benson · Cornell. HONS 2018, June 8, 2018 · Paris, France. Joint work with Hao Yin (Stanford) and Jure Leskovec (Stanford). Slides ⟶ bit.ly/arb-HONS-18 · Code ⟶ github.com/arbenson/HigherOrderClustering.jl
  2. Many networks are globally sparse but locally dense. [Figures: a coauthorship network; a brain network, from Sporns and Bullmore, Nature Rev. Neuro., 2012.] Networks for real-world systems have modules, clusters, communities. [Watts-Strogatz 98; Flake 00; Newman 04, 06; many others…]
  3. How do we measure how much a network clusters?
  4. The clustering coefficient is a fundamental measure in network science of how much a network clusters. C(u) = fraction of length-2 paths centered at node u that form a triangle. Average clustering coefficient C = mean of C(u).
     • Data insights. The average clustering coefficient is larger than we would expect. [Watts-Strogatz 98] (> 36k citations!)
     • Domain phenomenon. Triadic closure in sociology. [Simmel 1908; Rapoport 53; Granovetter 73]
     • Statistical feature. Role discovery, anomaly detection, mental health studies. [Henderson+ 12; La Fond+ 14, 16; Bearman-Moody 2004]
     • Modeling tool. Key property for generative models. [Newman 09; Seshadhri-Kolda-Pinar 12; Roble+ 16]
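     For concreteness, a minimal Python sketch of these classical quantities (an illustration only, not the talk's Julia package; assumes networkx, with a built-in toy graph standing in for real data):

        import networkx as nx

        G = nx.karate_club_graph()        # any undirected graph; this toy graph is just for illustration
        C = nx.clustering(G)              # local C(u): fraction of length-2 paths at u that close into a triangle
        C_avg = nx.average_clustering(G)  # average clustering coefficient = mean of C(u)
        print(C[0], C_avg)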
  5. The classical clustering coefficient is limited. It measures the closure probability of just one simple structure, the triangle, but there is lots of evidence that dense “higher-order structure” among more than 3 nodes is also important for clustering.
     • 4-cliques reveal community structure in word association and PPI networks. [Palla+ 05]
     • 4-/5-cliques (+ other structure) identify network type & dimension. [Yaveroğlu+ 14; Bonato+ 14]
     • 4-node motifs identify community structure in neural systems. [Benson-Gleich-Leskovec 16]
  6. We will show that triangles are insufficient to explain clustering. We need larger cliques.
     • Old idea ⟶ pretty much all real-world networks exhibit clustering.
     • New idea ⟶ networks may only cluster up to a certain “order”.
  7. Triangles tell just one part of the story. How do we measure clustering with respect to higher-order (clique) closure?
  8. We view clustering as a clique expansion process: (1) find an r-clique, (2) attach an adjacent edge, (3) check for an (r+1)-clique. Increasing the clique size by 1 gives a higher-order clustering coefficient.
     • C2 = avg. fraction of (2-clique, adjacent edge) pairs that induce a (2+1)-clique.
     • C3 = avg. fraction of (3-clique, adjacent edge) pairs that induce a (3+1)-clique.
     • C4 = avg. fraction of (4-clique, adjacent edge) pairs that induce a (4+1)-clique.
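     A minimal Python sketch of this definition (illustrative only, not the talk's Julia package; assumes networkx). It computes the local r-th order coefficient at a node by enumerating (r-clique, adjacent edge) pairs centered there:

        from itertools import combinations
        import networkx as nx

        def local_hocc(G, u, r=3):
            # C_r(u): fraction of (r-clique containing u, edge adjacent at u) pairs
            # that expand into an (r+1)-clique.
            nbrs = set(G[u])
            closed = wedges = 0
            # an r-clique containing u corresponds to an (r-1)-clique inside u's neighborhood
            for S in combinations(nbrs, r - 1):
                if all(G.has_edge(a, b) for a, b in combinations(S, 2)):
                    for v in nbrs - set(S):                         # attach an adjacent edge (u, v)
                        wedges += 1
                        if all(G.has_edge(v, w) for w in S):        # does it close into an (r+1)-clique?
                            closed += 1
            return closed / wedges if wedges else float("nan")

        G = nx.karate_club_graph()                                  # toy graph, for illustration only
        print(local_hocc(G, 0, r=2), local_hocc(G, 0, r=3))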
  9. We can think of higher-order closure processes in everyday life: (1) start with a group of 3 friends (Alice, Bob, Charlie); (2) one person in the group befriends someone new (Dave); (3) the group might increase in size.
  10. Higher-order clustering coefficients offer several advantages.
     Theory & analysis.
     • Better understanding of small-world and Gn,p random graph models.
     • Extremal combinatorics for general graphs.
     Data insights.
     • Old idea ⟶ pretty much all real-world networks exhibit clustering.
     • New idea ⟶ real-world networks may only cluster up to a certain order.
  11. Background. Local, average, and global clustering coefficients.
     • Second-order (classical) local clustering coefficient at node u: C2(u) = #(triangles containing u) / #(length-2 paths centered at u).
     • Second-order (classical) global clustering coefficient: C2 = #(closed length-2 paths) / #(length-2 paths), summed over the whole graph.
     • Second-order (classical) average clustering coefficient: mean of C2(u) over nodes centered in at least one length-2 path.
  12. Higher-order (third-order) local, average, and global clustering coefficients.
     • Third-order local clustering coefficient at node u: C3(u) = #((3-clique, adjacent edge) pairs centered at u that close into a 4-clique) / #((3-clique, adjacent edge) pairs centered at u).
     • Third-order global clustering coefficient: C3 = #(closed (3-clique, adjacent edge) pairs) / #((3-clique, adjacent edge) pairs), summed over the whole graph.
     • Third-order average clustering coefficient: mean of C3(u) over nodes centered in at least one (3-clique, adjacent edge) pair.
  13. We can analyze higher-order clustering with small-world models. Start with n nodes on a ring, connect each node to its 2k nearest neighbors, and then rewire each edge with probability p (illustrated with n = 16, k = 3, p = 0). Theorem [Watts-Strogatz 98] and Theorem [Yin-Benson-Leskovec 18] characterize the classical and the higher-order clustering coefficients of this model, respectively.
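     A quick empirical check of the un-rewired case (an illustration with networkx, not part of the talk's analysis; note networkx's second parameter is the total number of ring neighbors, i.e., 2k here):

        import networkx as nx

        n, k, p = 16, 3, 0.0
        G = nx.watts_strogatz_graph(n, 2 * k, p)   # ring lattice: each node linked to its 2k nearest neighbors
        print(nx.average_clustering(G))            # prints 0.6 for this un-rewired ring (p = 0)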
  14. We can also analyze higher-order clustering in Gn,p. Theorem [Yin-Benson-Leskovec 18]: everything scales exponentially in the order of the clustering coefficient. Even if a node’s neighborhood is dense, i.e., C2(u) is large, higher-order clustering still decays exponentially in Gn,p.
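     A back-of-the-envelope version of that scaling (a sketch under the definitions above, not the theorem's exact statement): in Gn,p, an (r-clique, adjacent edge) pair closes exactly when the attached node also links to the remaining r − 1 clique nodes, each independently with probability p, so

        \mathbb{E}[C_r(u)] \;\approx\; p^{\,r-1},

     which recovers C_2 ≈ p for r = 2 and decays exponentially in the order r.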
  15. Extremal combinatorics shows relationships between clustering coefficients of different orders. Theorem [Yin-Benson-Leskovec 18].
  16. Local higher-order clustering coefficients hierarchically capture clique density in a node’s neighborhood. Theorem [Yin-Benson-Leskovec 18]: the product of the first r − 1 local higher-order clustering coefficients is the r-clique density among the neighbors of node u.
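     In symbols (notation chosen here for illustration, not necessarily the paper's: K_r(N(u)) is the number of r-cliques in the subgraph induced on u's neighbors, and d_u is u's degree):

        \prod_{j=2}^{r} C_j(u) \;=\; \frac{K_r(N(u))}{\binom{d_u}{r}}.

     For r = 2 this reduces to the classical fact that C_2(u) is the edge density of u's neighborhood.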
  17. Computation only requires clique participation counts: we can compute the rth-order HOCCs by enumerating r- and (r + 1)-cliques, where K_a(u) is the number of a-cliques containing u.
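     A minimal Python sketch of that computation (illustrative only, not the talk's Julia package; assumes networkx, and the formula in the comment is the clique-count form of the earlier definition, derived here rather than quoted from the slide):

        from collections import defaultdict
        import networkx as nx

        def clique_participation(G, a):
            # K_a(u): number of a-cliques containing u (brute-force enumeration; fine for small graphs)
            K = defaultdict(int)
            for clique in nx.enumerate_all_cliques(G):
                if len(clique) == a:
                    for u in clique:
                        K[u] += 1
            return K

        def hocc_all(G, r=3):
            # C_r(u) = r * K_{r+1}(u) / (K_r(u) * (deg(u) - r + 1))
            Kr, Kr1 = clique_participation(G, r), clique_participation(G, r + 1)
            C = {}
            for u in G:
                wedges = Kr[u] * (G.degree(u) - r + 1)
                C[u] = r * Kr1[u] / wedges if wedges > 0 else float("nan")
            return C

        G = nx.karate_club_graph()      # toy graph, for illustration only
        C3 = hocc_all(G, r=3)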
  18. Higher-order clustering coefficients offer several advantages.
     Theory & analysis.
     • Better understanding of small-world and Gn,p random graph models.
     • Extremal combinatorics for general graphs.
     Data insights.
     • Old idea ⟶ pretty much all real-world networks exhibit clustering.
     • New idea ⟶ real-world networks may only cluster up to a certain order.
  19. Datasets:
     • Neural connections (C. elegans): 297 nodes, 2.15k edges. [Image: http://www.wormatlas.org/hermaphrodite/neuronalsupport/mainframe.htm]
     • Facebook friendships (Stanford3): 11.6k nodes, 568k edges.
     • Coauthorships (arXiv ca-AstroPh): 18.8k nodes, 198k edges.
  20. Global clustering patterns vary widely across datasets (global C2, C3, C4):
     • Neural connections: 0.18, 0.08, 0.06 (decreases with order).
     • Facebook friendships: 0.16, 0.11, 0.12 (decreases and then increases).
     • Coauthorships: 0.32, 0.33, 0.36 (increases with order). Not obviously due to cliques in coauthorship! High-degree nodes in coauthorships exhibit clique + star structure where C3(u) > C2(u).
  21. Average higher-order clustering (average C2, average C3) also varies widely:
     • Neural connections: 0.31, 0.14. Random configurations: 0.15, 0.04. Random configurations with C2 fixed: 0.31, 0.17.
     • Facebook friendships: 0.25, 0.18. Random configurations: 0.03, 0.00. Random configurations with C2 fixed: 0.25, 0.14.
     • Coauthorships: 0.68, 0.61. Random configurations: 0.01, 0.00. Random configurations with C2 fixed: 0.68, 0.60.
     Compared with these null models, differences are marked as statistically significantly less clustering, statistically significantly more clustering, or not significantly different clustering (using sampling tools from [Bollobás 1980; Milo+ 03; Park-Newman 04; Colomer de Simón+ 13]).
  22. Random samples concentrate in the neural connections data. [Plot: clustering distributions for random configurations [Bollobás 1980; Milo 2003], random configurations with C2 fixed [Park-Newman 2004; Colomer de Simón+ 2013], and the real network (C. elegans).]
  23. Clustering in neural connections is not just due to cliques. Original network vs. null model: # 4-cliques 2,010 vs. 440 ± 68; C3 0.14 vs. 0.17 ± 0.004. The 4-clique count decreases in the null model, but the higher-order clustering coefficient increases. Key reason: clustering coefficients are normalized by opportunities to cluster.
  24. Changes in higher-order clustering tend to be independent of the degree. [Plots for neural connections, Facebook friendships, and coauthorships.]
  25. Local higher-order clustering gives a more nuanced view. [Plots for neural connections, Facebook friendships, and coauthorships: actual network data vs. random configurations with C2 fixed, with a Gn,p baseline and an upper bound. Annotations mark dense but nearly random regions, dense and structured regions, and nodes hitting the upper bound.]
  26. Average third-order clustering in two more datasets: email and autonomous systems. [Plots; annotations mark not significantly different clustering and statistically significantly more clustering.]
  27. We should keep higher-order clustering in mind when mining and modeling network data.
     1. Only using triangles gives a misleading notion of clustering. Some networks do not even exhibit clustering with respect to larger cliques! → Are there models that capture higher-order clustering statistics?
     2. Higher-order clustering coefficients and closure coefficients offer additional measures of network clustering. → We should plug these features into ML pipelines for network data.
     3. We examined higher-order structure from dyadic data. → What happens if we use hypergraph data?
  28. Higher-order clustering in networks. Thanks for your attention! Austin R. Benson · http://cs.cornell.edu/~arb · @austinbenson · arb@cs.cornell.edu. Yin, Benson, and Leskovec. Higher-order clustering in networks. Physical Review E, 2018. Code: github.com/arbenson/HigherOrderClustering.jl. Slides: bit.ly/arb-HONS-18.
