On the Separability of Structural Classes of Communities

Three major factors govern the intricacies of community extraction in networks: (1) the application domain includes a wide variety of networks of fundamentally different natures, (2) the literature offers a multitude of disparate community detection algorithms, and (3) there is no consensus characterizing how to discriminate communities from non-communities. In this paper, we present a comprehensive analysis of community properties through a class separability framework. Our approach enables the assessment of the structural dissimilarity among the output of multiple community detection algorithms and between the output of algorithms and communities that arise in practice. To demonstrate this concept, we furnish our method with a large set of structural properties and multiple community detection algorithms. Applied to a diverse collection of large scale network datasets, the analysis reveals that (1) the different detection algorithms extract fundamentally different structures; (2) the structure of communities that arise in practice is closest to that of communities that random-walk-based algorithms extract, although still significantly different from that of the output of all the algorithms; and (3) a small subset of the properties are nearly as discriminative as the full set, while making explicit the ways in which the algorithms produce biases. Our framework enables an informed choice of the most suitable community detection method for a given purpose and network and allows for a comparison of existing community detection algorithms while guiding the design of new ones.

  1. 1. On the Separability of Structural Classes of Communities. Bruno Abrahao, Sucheta Soundarajan, John Hopcroft, Robert Kleinberg (Cornell University)
  2. 2. The idea of community structure as distinctive relationships [Newman-Girvan, 2004]
  3. 3. Which community is real? [Figure: candidate communities labeled Metis, “Real” Community, Random Walk, Infomap, Newman-Modularity, Louvain]
  6. 6. How do their structures differ? [Figure: candidate communities labeled Metis, “Real” Community, Random Walk, Infomap, Newman-Modularity, Louvain]
  7. 7. The definition of community structure
      • Community structure is not well defined
          – different people have different notions of community structure
      • Traditional strategy
          1. start with an expectation of what a community should look like
              • e.g., a set of nodes that interact more within the set than with the outside
          2. define an optimization problem
          3. design a heuristic
          4. the solution gives the desired communities
  8. 8. Two research questions that we address here
      • A multitude of different algorithms
          – different objective functions
          – different heuristics
        How dissimilar are their outputs?
      • Communities may differ from the proposed mathematical constructs
          – e.g., preponderance of links to the outside, as in this figure, contrary to widely accepted notions
        Which algorithms extract communities that most closely resemble the structure of real communities?
  10. 10. Obstacles to answering the preceding questions
      • We don't know what properties communities possess
      • We can't characterize communities in the absence of negative examples
          – Look at real communities and determine their structure
          – Do other sets that are not communities have these properties?
          – Every other connected set could be a negative example, which is intractable
          – Sets that are not annotated could also be communities
      • We don't know what metrics we should use
          – modularity, conductance, clustering coefficient...
  11. 11. Our plan to address the preceding questions
      • Propose a methodology to analyze community structure by using different notions of communities as references
          – key idea: analyze community structure without requiring negative examples of communities
      • Scalable and comprehensive, simultaneously considering
          – multiple notions of communities
          – diverse domains of application
          – a broad spectrum of community metrics
      • Assess the structural dissimilarities between
          – the output of different community detection algorithms
          – the output of algorithms and real communities
  12. 12. Building structural classes by extracting examples: apply an algorithm to a network and extract community examples
  15. 15. Building structural classes by extracting examples: the examples extracted by each of Algorithm 1, Algorithm 2, ..., Algorithm k form one class (Class 1, Class 2, ..., Class k)
  17. 17. Building a feature space by characterizing examples: each labeled example is described by a feature vector, and the feature vectors together form the feature space
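
As a minimal sketch of what "characterizing an example" can look like, the function below maps one community to a handful of structural features. The slides do not list the 36 properties used, so the five features here (size, internal and boundary edge counts, density, conductance), the function name `community_features`, and the toy graph are all assumptions chosen for illustration.

```python
def community_features(adj, community):
    """Map one community (a set of nodes) to a small dict of structural features.

    adj: dict mapping each node to the set of its neighbours (undirected graph).
    The feature choice is illustrative, not the property set used in the talk.
    """
    members = set(community)
    n = len(members)
    internal_twice = 0  # each internal edge is seen once from each endpoint
    boundary = 0        # edges with exactly one endpoint inside the community
    for u in members:
        for v in adj[u]:
            if v in members:
                internal_twice += 1
            else:
                boundary += 1
    internal = internal_twice // 2
    possible = n * (n - 1) / 2                      # pairs of members
    volume = internal_twice + boundary              # sum of member degrees
    total_volume = sum(len(nbrs) for nbrs in adj.values())
    smaller_side = min(volume, total_volume - volume)
    return {
        "size": n,
        "internal_edges": internal,
        "boundary_edges": boundary,
        "density": internal / possible if possible else 0.0,
        "conductance": boundary / smaller_side if smaller_side else 0.0,
    }


# Toy graph (hypothetical): two triangles joined by the single edge 2-3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
features = community_features(adj, {0, 1, 2})
print(features)
```

Each community produced by an algorithm, or taken from the annotations, would yield one such vector, labeled with the class it came from.
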
  21. 21. Measure inter-class separability from feature space: apply a class separability measure to the feature space. Are the classes separable? Separability = distinct structures
  24. 24. We test our methods on large-scale network datasets
      • Social + Rice University
      • Commercial
      • Biological
      Facebook + Rice with permission of Mislove et al. Other datasets publicly available.
  25. 25. We furnish our framework with 10 community detection algorithms
      • BFS (random connected subgraphs)
      • Random-Walk-based (with and without restart)
      • (α,β)-communities
      • InfoMap
      • Markov Clustering
      • Metis
      • Louvain
      • Newman-Clauset-Moore
      • Link Communities
  26. 26. Annotated communities identify exemplar communities: metadata included in the datasets + Rice University
  32. 32. To what extent are the classes separable?
  33. 33. First separability measure: Scatter Matrices
      • Traditional methods for measuring class separability give a single score, e.g., scatter matrices
        [Figure: scatter-matrix score per network; reference: the same data with shuffled labels]
      • This is a global measure. We need more fine-grained separability information for each class!
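
The slide does not say which scatter-matrix criterion is used, so the following is only a sketch under an assumption: it computes the common trace ratio tr(S_b) / tr(S_w) of between-class to within-class scatter and compares it against the same data with shuffled labels, as the slide's reference baseline suggests. The function name and the toy points are hypothetical.

```python
import random


def scatter_separability(points, labels):
    """Return tr(S_b) / tr(S_w), one scatter-based separability score.

    Assumed criterion; other scatter-matrix criteria exist.
    """
    dim = len(points[0])
    n = len(points)
    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    groups = {}
    for p, y in zip(points, labels):
        groups.setdefault(y, []).append(p)
    within = 0.0   # trace of the within-class scatter matrix
    between = 0.0  # trace of the between-class scatter matrix
    for members in groups.values():
        mean = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        within += sum((p[d] - mean[d]) ** 2 for p in members for d in range(dim))
        between += len(members) * sum((mean[d] - overall[d]) ** 2 for d in range(dim))
    return between / within if within else float("inf")


# Toy feature vectors (hypothetical): two well-separated classes.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
labels = ["A"] * 4 + ["B"] * 4
score = scatter_separability(points, labels)

# Reference: the same data with shuffled labels.
shuffled = labels[:]
random.Random(0).shuffle(shuffled)
reference = scatter_separability(points, shuffled)
print(score, reference)
```

A score well above the shuffled-label reference indicates that the classes occupy distinct regions of the feature space, but it is a single global number and says nothing about which classes are confused with which.
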
  34. 34. Idea: use the performance of probabilistic multi-class classifiers. Train a probabilistic k-way classifier (SVM, k-NN) on examples from Algorithm 1, Algorithm 2, ..., and the annotated communities
  36. 36. Idea: use the performance of probabilistic multi-class classifiers. Classify (cross-validation) with the probabilistic k-way classifier (SVM, k-NN), which outputs class probabilities for each example, e.g., Pr(Algorithm 1) = 0.05, Pr(Algorithm 2) = 0.08, ..., Pr(Annotated) = 0.48
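
The train-then-classify procedure on these slides can be sketched with a small probabilistic k-NN classifier. This is an illustration only: the talk's figures use a probabilistic SVM, whereas this sketch uses k-NN vote shares as probabilities and leave-one-out as the cross-validation scheme. The names `knn_proba` and `loo_probability_matrix` and the toy examples are hypothetical.

```python
import math
from collections import Counter, defaultdict


def knn_proba(train, query, k=3):
    """Class probabilities for `query`: vote shares among its k nearest neighbours."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return {label: count / len(nearest) for label, count in votes.items()}


def loo_probability_matrix(examples, k=3):
    """Leave-one-out cross-validation.

    Returns {true_class: {predicted_class: mean probability}}; a row that puts
    most of its mass on its own class indicates a well-separated class.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter()
    for i, (x, y) in enumerate(examples):
        rest = examples[:i] + examples[i + 1:]  # hold out example i
        for label, p in knn_proba(rest, x, k).items():
            sums[y][label] += p
        counts[y] += 1
    return {y: {lab: s / counts[y] for lab, s in row.items()} for y, row in sums.items()}


# Toy feature vectors (hypothetical): three tight, well-separated classes.
offsets = [(0, 0), (0, 1), (1, 0), (1, 1)]
examples = (
    [((dx, dy), "algorithm_1") for dx, dy in offsets]
    + [((10 + dx, 10 + dy), "algorithm_2") for dx, dy in offsets]
    + [((dx, 10 + dy), "annotated") for dx, dy in offsets]
)
matrix = loo_probability_matrix(examples)
print(matrix)
```

On real feature vectors the rows would rarely be this clean; mass that leaks from one row into another class is what signals structural similarity between the two.
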
  39. 39. Cross-validation performance indicates class separability. [Figure: probabilistic-SVM cross-validation outcome with 11 structural classes (BFS, RW0, RW15, AB, IM, LC, Louv., Newm., MCL, Metis, Ann.); scale 0.0 to 1.0. Data: DBLP network.]
  40. 40. Matching annotated communities: which algorithms extract communities that most closely resemble the structure of annotated communities?
  41. 41. Repeat the preceding experiment, leaving out the class of annotated communities: learn a probabilistic k-way classifier from Algorithm 1, Algorithm 2, ..., Algorithm N
  43. 43. Introduce the class of annotated communities in the test set: classify each annotated community with the probabilistic k-way classifier, which outputs, e.g., Pr(Algorithm 1) = 0.02, Pr(Algorithm 2) = 0.19, ..., Pr(Algorithm k) = 0.12
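
The matching experiment can be sketched as follows: fit on the algorithm classes only, then ask which of those classes the held-out annotated examples are assigned to. As before, this is a hedged illustration that substitutes k-NN vote shares for the probabilistic SVM used in the talk; `mean_assignment` and the toy data are hypothetical and do not reproduce any reported result.

```python
import math
from collections import Counter


def knn_proba(train, query, k=3):
    """Class probabilities for `query`: vote shares among its k nearest neighbours."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return {label: count / len(nearest) for label, count in votes.items()}


def mean_assignment(train, held_out, k=3):
    """Average probability with which held-out examples fall into each training class."""
    totals = Counter()
    for x in held_out:
        for label, p in knn_proba(train, x, k).items():
            totals[label] += p
    return {label: total / len(held_out) for label, total in totals.items()}


# Training set: algorithm classes only (toy, hypothetical feature vectors).
offsets = [(0, 0), (0, 1), (1, 0), (1, 1)]
train = (
    [((dx, dy), "algorithm_1") for dx, dy in offsets]
    + [((10 + dx, 10 + dy), "algorithm_2") for dx, dy in offsets]
)
# Test set: annotated communities, a class the classifier has never seen.
annotated = [(1, 2), (2, 1), (2, 2)]
assignment = mean_assignment(train, annotated)
print(assignment)
```

The class that receives the largest average probability is the one whose output is structurally closest to the annotated communities in this feature space.
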
  46. 46. Classification reveals that annotated communities resemble unstructured methods. [Figure: probabilistic-SVM classification of annotated communities into 11 structural classes for 9 different networks; networks: grad, Ugrad, SC, HS, Fly, Amazon, DBLP, LJ1, LJ2; class labels shown: BFS, RW0, RW15, AB, IM, LC, Louv., Newm., MCL, Metis; scale 0.0 to 1.0.]
  47. 47. Improving the quality of the space: what classes should we consider?
  48. 48. The classifier confuses the two types of RW communities. [Figure: probabilistic-SVM cross-validation outcome with 11 structural classes (BFS, RW0, RW15, AB, IM, LC, Louv., Newm., MCL, Metis, Ann.); scale 0.0 to 1.0. Data: DBLP network.]
  49. 49. Fisher’s discriminant ratio. Reference: A Separability Framework for Analyzing Community Structure, Bruno Abrahao, Sucheta Soundarajan, John Hopcroft, Robert Kleinberg. To appear in ACM Transactions on Knowledge Discovery from Data (TKDD), 2013
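
The transcript preserves only the slide title, not the formula. For orientation, a common two-class, single-feature form of Fisher's discriminant ratio is (μ1 − μ2)² / (σ1² + σ2²); the sketch below computes that form. Whether the talk uses exactly this variant, and whether it uses population or sample variance, is an assumption here, as are the function name and the toy values.

```python
from statistics import mean, pvariance


def fisher_ratio(a, b):
    """Fisher's discriminant ratio for one feature measured on two classes.

    Squared difference of class means over the sum of class variances
    (population variance is an assumed choice).
    """
    spread = pvariance(a) + pvariance(b)
    gap = (mean(a) - mean(b)) ** 2
    return gap / spread if spread else float("inf")


# Toy values of a single feature (hypothetical) for three classes.
class_a = [1, 2, 3]
class_b = [7, 8, 9]   # far from class_a
class_c = [2, 3, 4]   # overlaps class_a

far = fisher_ratio(class_a, class_b)
near = fisher_ratio(class_a, class_c)
print(far, near)
```

A large ratio means the feature separates the two classes well on its own; a small ratio, as for the overlapping pair, is one way to notice that two classes are hard to tell apart along that feature.
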
  52. 52. Cross-validation performance indicates class separability. [Figure; axis: Structural Class]
  54. 54. Classification reveals that annotated communities resemble unstructured methods. [Figure; axis: Network]
  55. 55. Can we reveal latent similarities among community detection algorithms? Our framework enables one to cluster algorithms that behave similarly
  56. 56. Step 1: identifying the most important features. 7 features out of 36 retain the discriminative power of the full set
  57. 57. Grouping algorithms by their tendencies with respect to the most discriminative features. [Figure: tendencies rated High, Medium, Low]
  59. 59. Conclusion of methodology
      • We present a methodology to address the complexity of analyzing community structure, which simultaneously considers
          – a large number of algorithms
          – multiple domains of application
          – a broad spectrum of metrics to characterize community structure
      • A scalable framework that enables
          – researchers to compare and understand biases of new and existing community detection algorithms
          – practitioners to decide on the most suitable algorithm for a particular purpose and network
  60. 60. Conclusion of experimental analysis
      • Our experimental analysis, which includes 10 community detection algorithms and 9 different networks analyzed with 36 properties, reveals
          – High variability among the output of community detection methods
          – Annotated communities have a distinct structure from what we expect
              • their structure is closer to the output of baseline procedures than to that of popular algorithms
          – A small set of features explains the biases produced by different algorithms
          – We can organize the tapestry of available community detection algorithms by grouping them with respect to similarities in behavior
  61. 61. Final remarks on future directions
      • Traditional methods are unsupervised
          – they find a particular type of community
          – little sensitivity to different purposes, structures of interest and domains of application
      • Our approach suggests a supervised approach to community detection
          – the user specifies what they intend to find through examples (real or synthetic)
          – the algorithm learns from those examples and retrieves similar structures in the network
  62. 62. Thank you! On the Separability of Structural Classes of Communities. Bruno Abrahao, Sucheta Soundarajan, John Hopcroft, Robert Kleinberg (Cornell University)
