
Entropy based algorithm for community detection in augmented networks

This is the presentation I gave at CASoN 2011 (Computational Aspects of Social Networks) in Salamanca, Spain.

  1. Entropy Based Community Detection in Augmented Social Networks
     J. Cruz (1), C. Bothorel (1), F. Poulet (2)
     (1) LUSSI Department, Telecom – Bretagne, France
     (2) IRISA, Rennes 1 University, France
  2. Outline
     1 Introduction
         Motivation
         Related Work
         Augmented Networks
     2 Clustering Algorithm
     3 Experiments and Results
     4 Conclusions
     page 2 CRUZ, BOTHOREL, POULET Entropy Based Community Detection
  3. Motivation
     A social network is composed of actors (persons or organizations) and the links between them.
     Social networks have been “simplified” to fit into graph structures, leaving behind any additional information.
     That information corresponds to the semantic, and yet social, aspects of the network.
     The question is: how can we use both the graph and the social information to detect communities?
     [Figure: an augmented network, with node attributes attached to the graph]
  4. Related Work
     Data Clustering
     Unsupervised clustering algorithms using some (dis)similarity measure between points in some n-dimensional space.
       • Hierarchical clustering [1].
       • k-means, fuzzy c-means [1].
       • Self-organizing maps [2].
     Communities Detection
     Algorithms designed to find community structures in graphs using information from edges.
       • Modularity optimization: Newman [3], Blondel [4] ...
       • Overlapping communities using GAs: Pizzuti [5] ...
       • Community detection using attributes and structural information [6].
  5. Quality Measures / Data Types
     Type     Objective                                        Examples
     Data     Reduce the distance between the members of the   Manhattan L1, Euclidean L2,
              same group while the distance between groups     Chebyshev L∞, Entropy H
              is increased.
     Graphs   Increase the number of edges within each         Coverage γ, Conductance ϕ,
              community while the number of edges between      Performance perf, Modularity Q
              communities is reduced.
     The selected measures:
       • Entropy measures the disorder of each group: the more similar the objects, the more ordered the group (that is, the lower its entropy).
       • Modularity measures the fraction of edges falling within the groups minus the fraction expected if the edges were placed at random [7].
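In symbols, one common way to write the modularity described on this slide, following Newman and Girvan [7], is sketched below. The symbols $m$, $l_c$ and $d_c$ do not appear in the slides and are introduced here only for illustration: $m$ is the number of edges in the graph, $l_c$ the number of edges with both endpoints in community $c$, and $d_c$ the sum of the degrees of the nodes in $c$.

```latex
Q = \sum_{c} \left[ \frac{l_c}{m} - \left( \frac{d_c}{2m} \right)^{2} \right]
```

The first term is the fraction of edges inside community $c$; the second is the fraction that would be expected there if the same node degrees were wired at random.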
  6. Semantic Information
     Given an augmented network $G(V, E, F_V)$:
       • Given a subset of features of the nodes, $F_V^* \in P(F_V)$, each node is associated with a vector $\xi$ of $f$ attributes, $\xi \in \mathbb{R}^f$.
       • The union of all the vectors $\xi_{F_V^*}$ is the vectorial representation of the node set: $AS_{F_V^*}$.
     $$
     \begin{array}{c|cccc}
     \text{Node} & \text{Attr } 1 & \text{Attr } 2 & \cdots & \text{Attr } f \\ \hline
     1 & \xi_{11} & \xi_{12} & \cdots & \xi_{1f} \\
     2 & \xi_{21} & \xi_{22} & \cdots & \xi_{2f} \\
     \vdots & \vdots & \vdots & \ddots & \vdots \\
     n & \xi_{n1} & \xi_{n2} & \cdots & \xi_{nf}
     \end{array}
     $$
     The attributes set $AS$ is the matricial representation of the augmented information from the network.
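As a small illustration of the attribute set, the sketch below builds the node-by-attribute matrix for a chosen subset of features. The node records, the feature names and the helper `attribute_set` are hypothetical and are not taken from the slides.

```python
# Hypothetical node records; feature names and values are illustrative only.
nodes = {
    1: {"gender": 0, "major": 12, "year": 2008},
    2: {"gender": 1, "major": 12, "year": 2009},
    3: {"gender": 1, "major": 40, "year": 2008},
}

def attribute_set(nodes, point_of_view):
    """Return node ids and one attribute vector per node.

    point_of_view is the selected subset of features; each row of the
    result is the vector xi of one node restricted to that subset.
    """
    ids = sorted(nodes)
    rows = [[float(nodes[i][feature]) for feature in point_of_view] for i in ids]
    return ids, rows

ids, AS = attribute_set(nodes, ["gender", "major"])
for node_id, xi in zip(ids, AS):
    print(node_id, xi)
```

Choosing a different list of features gives a different matrix for the same graph, which is what the later slides call a point of view.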
  7. Data Entropy
     Given a group $C$ of $N = |C|$ elements, the entropy $H(C)$ of the group is given by:
     $$
     H(C) = -\sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \left[ s_{ij} \ln s_{ij} + (1 - s_{ij}) \ln (1 - s_{ij}) \right]
     $$
     where $s_{ij}$ is a similarity measure of nodes $i$ and $j$.
     Similarity measures?
     Entropy measures the (dis)order of a partition; however, it requires the distance between the nodes to be calculated first. This is done using metrics such as the cosine distance and the Jaccard distance, among others.
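The entropy formula on this slide can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: it assumes non-negative attribute vectors (so that the cosine similarity lies in [0, 1]), uses the usual convention 0 ln 0 = 0, and applies no normalization. The function names are chosen here for illustration.

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between x and y (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def pair_term(s):
    """-(s ln s + (1 - s) ln(1 - s)), with the convention 0 ln 0 = 0."""
    s = min(max(s, 0.0), 1.0)  # guard against rounding just outside [0, 1]
    total = 0.0
    for p in (s, 1.0 - s):
        if p > 0.0:
            total -= p * math.log(p)
    return total

def group_entropy(vectors, similarity=cosine_similarity):
    """H(C): sum of the pair terms over all unordered pairs i < j."""
    n = len(vectors)
    return sum(pair_term(similarity(vectors[i], vectors[j]))
               for i in range(n - 1) for j in range(i + 1, n))

identical = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
mixed = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
print(group_entropy(identical))  # a group of identical vectors
print(group_entropy(mixed))      # a group with partially similar vectors
```

Note that each pair term vanishes both at similarity 1 and at similarity 0 and peaks at similarity 0.5, so the measure penalizes ambiguous pairs most.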
  8. General Architecture
     This is the general architecture of the algorithm, which finds communities using structural and semantic criteria at the same time, extracted from $G(V, E, F_V)$.
     [Diagram box: Augmented Network]
  9. General Architecture
     Using the social graph $G(V, E)$, the algorithm finds a first partition $C_0$ with optimal modularity.
     [Diagram boxes: Augmented Network; G; Modularity Optimization (First Step); First Partition $C_0$]
  10. General Architecture
      [Figure: three panels illustrating the modularity optimization step]
        • The algorithm starts from the structure of the social network, in which each node is a community.
        • The algorithm takes a random node and puts it into a random community. If the movement increases the modularity...
        • ...the node is assigned to that community; otherwise the node is returned. The result is the partition $C_0$.
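The first step as described on this slide can be sketched as follows. This is only an illustration under stated assumptions: the graph is undirected and unweighted, modularity is recomputed from scratch after every trial for clarity rather than speed, and the number of trials, the seed and the function names are choices made here, not details from the slides.

```python
import random
from collections import defaultdict

def modularity(edges, community):
    """Newman-Girvan modularity of an undirected, unweighted graph."""
    m = len(edges)
    internal, degree_sum = defaultdict(int), defaultdict(int)
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

def first_partition(edges, trials=2000, seed=0):
    """Random node moves that are kept only when modularity increases."""
    rng = random.Random(seed)
    nodes = sorted({n for e in edges for n in e})
    community = {n: n for n in nodes}          # each node is its own community
    best = modularity(edges, community)
    for _ in range(trials):
        node = rng.choice(nodes)
        target = community[rng.choice(nodes)]  # a random existing community
        if target == community[node]:
            continue
        previous = community[node]
        community[node] = target
        q = modularity(edges, community)
        if q > best:
            best = q                           # keep the move
        else:
            community[node] = previous         # return the node
    return community, best

# Toy graph (not from the slides): two triangles joined by one edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
q0 = modularity(edges, {n: n for n in range(6)})
C0, q_best = first_partition(edges)
print(q0, q_best, C0)
```

The method of Blondel et al. [4] restricts candidate communities to those of a node's neighbours and updates modularity incrementally; the fully random choice here simply mirrors the wording of the slide.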
  11. General Architecture
      The entropy optimization algorithm uses the partition $C_0$ as initial configuration and the point of view (PoV) $F_V^*$ from the augmented network to move nodes across the groups.
      [Diagram boxes: Augmented Network; G; Modularity Optimization (First Step); First Partition $C_0$; $AS_{F_V^*}$; Entropy Optimization; Entropy Partition $C_H$]
  12. General Architecture
      [Figure: three panels illustrating the entropy optimization step]
        • Given an initial partition $C_0$ from the first step of the modularity optimization...
        • Take a random point and insert it into a random group. If the entropy is reduced, leave the point in its new group...
        • ...otherwise take the point back to its original group. The result is the partition $C_H$.
  13. General Architecture
      The partition $C_H$ has the same number of groups as $C_0$ but with a different configuration. The modularity optimization algorithm will continue with $C_H$.
      [Diagram boxes: Augmented Network; G; Modularity Optimization (First Step); First Partition $C_0$; $AS_{F_V^*}$; Entropy Optimization; Entropy Partition $C_H$; Community Aggregation; Final Partition $C_k$]
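The slides do not detail the community aggregation step. The sketch below assumes it works as in the method of Blondel et al. [4], where each community is collapsed into a single super-node and the edges between communities are merged into weighted edges. The function name `aggregate` and the toy data are illustrative only.

```python
from collections import Counter

def aggregate(edges, community):
    """Collapse each community into one super-node.

    Returns a Counter mapping a pair of community labels (c1, c2), with
    c1 <= c2, to the number of original edges between them. Entries of the
    form (c, c) are self-loops that count the edges inside community c.
    """
    weights = Counter()
    for u, v in edges:
        a, b = sorted((community[u], community[v]))
        weights[(a, b)] += 1
    return weights

# Toy graph (not from the slides): two triangles joined by one edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
partition = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
for pair, weight in sorted(aggregate(edges, partition).items()):
    print(pair, weight)
```

The aggregated graph can then be fed back into the modularity step, which is how the loop in the architecture diagram would continue.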
  14. General Architecture
      [Diagram stages: Entropy Optimization; Community Detection; Community Aggregation]
      Adapted from [4]
  15. Experimental Setup
      Data used:
      The graph contains 6386 nodes and 435324 edges. It has an initial modularity of $-2.8629 \times 10^{-4}$.
      In each case, the initial entropy has been calculated using different criteria:
      AS   Feature   H0       Classes
      1    Gender    0.2286   3
      2    Major     0.2318   77
      30 executions of the experiments were performed for each point of view.
      Each graph in this data set contains a set of semantic information for each node:
        • Student faculty
        • Gender
        • Major
        • Second major/minor
        • House
        • Year
        • High school
  16. Results
      There is a compromise between the entropy and the modularity.
      There are 7 communities for each attribute set AS:
        • From 3 classes in AS1
        • From 77 classes in AS2
      Results: Measures
      AS    Exp.      Average Q            Average Entropy
      AS1   CFU       0.4180 (±0)          0.2286 (±0)
      AS1   CFU+Ent   0.2565 (±0.006065)   0.1381 (±0.0025741)
      AS2   CFU       0.4180 (±0)          0.2318 (±0)
      AS2   CFU+Ent   0.2440 (±0.004242)   0.1356 (±0.001493)
  17. Results
      [Figure: partition plots with axis labels Gender, House, Faculty, Major, Minor, Year and H.S.; a second panel labeled Gender and Major]
      Results: Rand Index
      Pair                                      Rand Index
      $AS_{\emptyset}$ − $AS_{Gender}$          0.4232
      $AS_{\emptyset}$ − $AS_{Major}$           0.3070
      $AS_{Gender}$ − $AS_{Major}$              0.3919
      Each partition configuration is different for each attribute set.
      Non-topological information changes the result of the clustering process.
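For readers unfamiliar with the measure, the sketch below computes the plain (unadjusted) Rand index between two partitions of the same nodes. The slides do not say which variant was used, so treating it as the plain index is an assumption, and the label lists are toy data.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of node pairs on which two partitions agree.

    A pair agrees when both partitions put the two nodes together,
    or both put them apart.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)

same_up_to_renaming = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same_up_to_renaming, crossed)
```

A value of 1 means the two partitions are identical up to a renaming of the groups; lower values mean that more node pairs are grouped differently.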
  18. Algorithm Complexity Considerations
      [Figure: Algorithm Execution Time. x-axis: Number of Features (0 to 100); y-axis: Execution Time (ms) (0 to 60000); series: Simple Matching Coefficient, Cosine Distance]
        • The complexity of the entropy calculation is, in general, $O(n^2 \times f)$ ($n$ points and $f$ features).
        • Using only the contribution of a point to the group entropy, the complexity is reduced to a near-linear behavior.
        • Using a fixed number of nodes (6386) and varying only the number of features, this linear behavior is observed.
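One way to read "the contribution of a point to the group entropy" is sketched below: when a point moves, only the pair terms that involve that point change, so the change in total entropy can be computed without revisiting every pair. This is an interpretation rather than the authors' code; the simple matching coefficient is used as the similarity, and all function names are illustrative.

```python
import math

def pair_term(s):
    """-(s ln s + (1 - s) ln(1 - s)), with the convention 0 ln 0 = 0."""
    total = 0.0
    for p in (s, 1.0 - s):
        if p > 0.0:
            total -= p * math.log(p)
    return total

def smc(x, y):
    """Simple matching coefficient: share of positions with equal values."""
    return sum(a == b for a, b in zip(x, y)) / len(x)

def contribution(point, group):
    """Entropy that `point` adds to `group`: one pair term per member."""
    return sum(pair_term(smc(point, other)) for other in group)

def move_delta(point, source_rest, target):
    """Change in total entropy when `point` leaves its group for `target`.

    source_rest is the source group without the point.
    """
    return contribution(point, target) - contribution(point, source_rest)

def group_entropy(group):
    """Brute-force reference over all pairs, used only to check the shortcut."""
    return sum(pair_term(smc(group[i], group[j]))
               for i in range(len(group) - 1) for j in range(i + 1, len(group)))

point = (1, 0, 1)
source_rest = [(1, 0, 0), (1, 1, 1)]
target = [(0, 0, 1), (1, 0, 1)]
before = group_entropy(source_rest + [point]) + group_entropy(target)
after = group_entropy(source_rest) + group_entropy(target + [point])
print(after - before, move_delta(point, source_rest, target))
```

Evaluating one candidate move this way costs a number of similarity computations proportional to the sizes of the two groups involved, each of cost proportional to the number of features.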
  19. Algorithm Complexity Considerations
      [Figure: Algorithm Memory Usage, two panels. x-axis: Number of Features (0 to 100); y-axis: Memory Used (Mb) (300 to 900); top panel series: SMC, Memory Baseline; bottom panel series: Cosine Distance, Memory Baseline]
        • In general, the memory usage is linear; however, the SMC graph is steeper than that of the cosine distance.
        • For the SMC, near 40 features the memory used grows, coinciding with the increase in execution time.
        • The behavior of the graphs is due to Java's memory management system.
        • In any case, the usage never explodes.
  20. Conclusions
        • Each type of information in the augmented network has different representations and different measures of similarity: those measures behave oppositely.
        • An entropy-based algorithm has been proposed to cluster an augmented network.
        • Using different points of view, it is possible to obtain different partition configurations from the same social graph.
        • The overall complexity of the algorithm is linear in the number of features used to calculate the entropy.
        • The memory used increases, although it does not explode, when the number of attributes is increased.
  21. Thank you. Do you have any questions?
  22. Bibliography I
      [1] M. Kantardzic, Data Mining: Concepts, Models, Methods, and Algorithms. Wiley-IEEE Press, 1 ed., Oct. 2002.
      [2] T. Kohonen, Self-Organizing Maps. Springer, 1997.
      [3] M. E. Newman, “Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality,” Physical Review E, Statistical Nonlinear and Soft Matter Physics, vol. 64, p. 7, July 2001.
  23. Bibliography II
      [4] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 10, p. P10008 (12pp), 2008.
      [5] C. Pizzuti, “Overlapped community detection in complex networks,” in GECCO ’09: Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, (New York, NY, USA), pp. 859–866, ACM, 2009.
      [6] Y. Zhou, H. Cheng, and J. X. Yu, “Graph clustering based on structural/attribute similarities,” Proc. VLDB Endow., vol. 2, pp. 718–729, August 2009.
  24. Bibliography III
      [7] M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Physical Review E, Statistical Nonlinear and Soft Matter Physics, vol. 69, p. 026113, Feb. 2004.
      [8] T. Li, S. Ma, and M. Ogihara, “Entropy-based criterion in categorical clustering,” in Proceedings of the Twenty-First International Conference on Machine Learning, ICML ’04, (New York, NY, USA), pp. 68–, ACM, 2004.
  25. Entropy Minimization Algorithm [8]
      Given a partition C:
      1. Calculate the set's initial entropy.
      2. Take a random point from a random group and insert it into another random cluster.
      3. Has the entropy improved?
         3.1 Yes: leave the point in its new cluster.
         3.2 No: return the point to its original cluster.
      4. Go to 2 until no further changes can be made.
      [Figure: three groups labeled A, B and C]
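The steps on this slide can be sketched as a short loop. Several details are assumptions made here, not statements from the slides: "no further changes can be made" is approximated by stopping after `patience` consecutive rejected moves, a group is never emptied so that the number of groups stays fixed, the similarity is the simple matching coefficient, and the total entropy is recomputed in full at every trial for readability.

```python
import math
import random

def pair_term(s):
    """-(s ln s + (1 - s) ln(1 - s)), with the convention 0 ln 0 = 0."""
    total = 0.0
    for p in (s, 1.0 - s):
        if p > 0.0:
            total -= p * math.log(p)
    return total

def smc(x, y):
    """Simple matching coefficient: share of positions with equal values."""
    return sum(a == b for a, b in zip(x, y)) / len(x)

def group_entropy(group):
    return sum(pair_term(smc(group[i], group[j]))
               for i in range(len(group) - 1) for j in range(i + 1, len(group)))

def minimize_entropy(partition, patience=200, seed=0):
    rng = random.Random(seed)
    groups = [list(g) for g in partition]
    current = sum(group_entropy(g) for g in groups)        # step 1
    rejected = 0
    while rejected < patience and len(groups) > 1:
        src, dst = rng.sample(range(len(groups)), 2)       # step 2
        if len(groups[src]) <= 1:                          # never empty a group
            rejected += 1
            continue
        point = groups[src].pop(rng.randrange(len(groups[src])))
        groups[dst].append(point)
        candidate = sum(group_entropy(g) for g in groups)  # step 3
        if candidate < current:                            # 3.1: keep the move
            current, rejected = candidate, 0
        else:                                              # 3.2: undo the move
            groups[dst].pop()
            groups[src].append(point)
            rejected += 1
    return groups, current                                 # step 4 ends the loop

# Toy categorical data (not from the slides).
start = [[(0, 0, 0), (1, 1, 0), (0, 0, 1)], [(1, 1, 1), (0, 1, 0), (1, 0, 1)]]
initial = sum(group_entropy(g) for g in start)
groups, final = minimize_entropy(start)
print(initial, final)
```

Because a move is kept only when the total entropy strictly decreases, the loop cannot cycle and the final entropy is never higher than the initial one.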
