Formulation of modularity factor for community detection applying



International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367 (Print), ISSN 0976 – 6375 (Online), Volume 4, Issue 2, March – April (2013), pp. 135-141, © IAEME. Impact Factor (2013): 6.1302 (Calculated by GISI)

FORMULATION OF MODULARITY FACTOR FOR COMMUNITY DETECTION APPLYING MULTIRESOLUTION ON MOI ALGORITHM

K. Senthil Kumar1, K. S. Suganthi2, C. Suchitra3, S. Sharmili4
1 (Assistant Professor, Department of Information Technology, SMVEC, Pondicherry, India)
2 (Department of Information Technology, SMVEC, Pondicherry, India)
3 (Department of Information Technology, SMVEC, Pondicherry, India)
4 (Department of Information Technology, SMVEC, Pondicherry, India)

ABSTRACT

Community structure is one of the most important properties of social networks and has received an enormous amount of attention in recent years. Community detection, a form of clustering, is a technique used to discover the naturally occurring associations between vertices in a given network. Initially, algorithms were developed to detect communities in static networks. This gradually evolved into detecting communities in dynamic environments, since networks are in general dynamic. To detect communities in dynamic networks with better performance and accuracy, the authors propose combining two techniques: the local community measurement of multiresolution is applied within the multi-objective immune algorithm, replacing its current local search strategy. This proposal is phase one of the algorithm; the second phase, without which the solution is incomplete, is still in progress.

Keywords: Community Detection, Graph Partition, Modularity, Networking
I. INTRODUCTION

In recent years, research on social networks has been gaining importance. Social networks are usually represented by graphs in which nodes represent individuals and edges represent relationships and interactions among individuals. Community detection is sometimes also referred to as clustering, but we largely avoid this term to prevent confusion. Both graph partitioning and community detection refer to the division of the vertices of a network into groups, clusters or communities. Such groups are tightly knit, with many edges inside groups and only a few edges between groups. The ability to discover groups or clusters within a network is a useful tool for revealing the network's structure and organization at a larger scale than that of a single vertex. A cluster can also be defined as a group of vertices associated with one another on the basis of some identified similarity.

The initial representation of communities was hierarchical clustering. Its defect was that it produced overlapping communities. This is a disadvantage because community detection was developed to give a better understanding of the network structure and organization when a network administrator reviews its activities, and this is possible only if the communities are non-overlapping. Non-overlapping communities are those in which the vertices of one community are present solely in that community. In hierarchical clustering the vertices are clustered together, but they do not form a coherent network upon completion. Because it yields overlapping communities, hierarchical clustering is less efficient than its later developments, graph partitioning and community detection.
Hence the methods of graph partitioning and community detection came into existence. Graph partitioning is the classic problem of dividing the vertices of a network into non-overlapping groups of given sizes so that the number of edges between groups is minimized. Its main disadvantage is that the size of each community and the number of communities in the network must be specified in advance. Community detection searches for naturally occurring groups in a network without regard to their size or number, and is used for discovering and understanding the large-scale structure of networks. Its advantage is that neither the number nor the size of the communities is fixed or needs to be specified.

One of the most widely used optimization methods is simulated annealing, but it has the drawback of being slow. Another general optimization method is the genetic algorithm. It provides high-quality results but is also very slow, and is applicable only to networks of a few hundred vertices. The third method is the greedy algorithm, which is simple. It gives a reasonable division of networks, but the modularity values achieved are lower than those obtained by the previous two methods. Its runtime, however, is the best of any current algorithm.

The multi-objective evolutionary algorithm aims to integrate dynamic and multi-objective algorithms in dynamic environments. It uses a genetic mechanism, the locus-based adjacency encoding scheme, but does not consider the dynamically changing nature of the network. A multi-objective immune algorithm can solve the community detection problem in dynamic social networks. Its objective is to maximize the community quality and minimize the temporal cost. It can achieve better accuracy in community extraction and capture community evolution more faithfully. The online multiresolution algorithm can identify both overlapping and non-overlapping communities. It is fast and scalable in large-scale networks, and it can be used freely to acquire communities at any resolution.

The above gives an insight into the work done so far in the domain of community detection. The purpose of this paper is to resolve the disadvantages met in the previous methodologies followed for community detection.
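To make the notion of "many edges inside groups and only a few edges between the groups" concrete, the following minimal Python sketch counts both kinds of edge for a given division of a network. The function name `count_edges`, the six-vertex network and the group labels are hypothetical and chosen only for illustration; they do not come from the paper.

```python
def count_edges(edges, group_of):
    """Return (edges inside groups, edges between groups) for an undirected graph."""
    inside = 0
    between = 0
    for u, v in edges:
        if group_of[u] == group_of[v]:
            inside += 1
        else:
            between += 1
    return inside, between


# Hypothetical network: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

# A division that follows the two triangles.
natural = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

# An arbitrary division that ignores the structure.
arbitrary = {0: "A", 1: "B", 2: "A", 3: "B", 4: "A", 5: "B"}

print(count_edges(edges, natural))
print(count_edges(edges, arbitrary))
```

A division that follows the natural groups leaves few edges between groups, which is the quantity graph partitioning tries to minimize.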
II. ARCHITECTURE DIAGRAM

The proposed model for the implementation of multiresolution parameters in the multi-objective immune algorithm consists of three modules. The initial input network is passed to the first module, where the modularity and betweenness values are calculated. The modularity is calculated by the Newman – Girvan formula,

$$Q_m = \sum_{i=1}^{k} \left[ \frac{l_i}{L} - \left( \frac{d_i}{2L} \right)^2 \right]$$

where, in the standard form of this formula, $k$ is the number of communities, $l_i$ is the number of edges inside community $i$, $d_i$ is the total degree of the vertices in community $i$, and $L$ is the total number of edges in the network.

The basic calculation of betweenness centrality is done by the formula,

$$\frac{n(n-1)}{2}$$

Here $n$ presumably denotes the number of vertices, in which case the expression is the number of vertex pairs in the network.

The second module identifies high-similarity vertex pairs, so as to segregate them from the other vertices in the chosen network. The process does not stop there, as the remaining vertices have to be segregated using the same methodology. It is a cyclic process that continues until all similar vertices have been segregated.

Fig. 1: Architecture Design of the Proposed Model

The process starts under three conditions. Taking Qm to be the modularity factor calculated using the Newman – Girvan formula for modularity, we have,
  • Qm <= -1, indicating that the community so formed will be a weak structure that can easily be subjected to frequent changes, resulting in instability.
  • Qm = 0, indicating that no more community structures can be extracted from the given network.
  • Qm >= 1, indicating that the structures so formed are strong and are not subjected to frequent changes.

In the third module, the isolated vertices are regrouped based on the high similarity of their respective modularity values. This finally gives the required output network.

III. ALGORITHM FOR MODULARITY CALCULATION

The proposed concept is best illustrated in the context of social networking. The parameters taken into consideration are the tags in the photos posted, the comments received on the photos posted, and the related posts that have been shared or agreed upon. These parameters are used to calculate the edge cost between two vertices, which helps in analyzing the strength of the links in the considered network. The algorithm given below has been developed with the Newman – Girvan formula as its base.

Input: Text file containing the cluster
Output: Detection of whether the cluster is strong or weak

 1: Initialize i = 0, cluster_link = 0, net_link = 0
 2: while ( ! EOF)
 3:     Initialize j = 0
 4:     while ( ! null)
 5:         Store the values in multi-dimensional array
 6:         Increment j by 1
 7:     Increment i by 1
 8: Calculate total_cost for each vertex using the multi-dimensional array
 9: if (total_cost [vertex] >= limit)
10:     edge [vertex] = 1
11: else
12:     edge [vertex] = 0
13: Initialize count to 0
14: for m = 1 to n do
15:     if edge [m] = 1
16:         Increment count by 1
17: if (count) not equal to 0
18:     cluster_link = count
19: else
20:     cluster does not exist
21: Obtain max (neighbour_values) and store as degree of the respective cluster
22: Repeat steps 1 to 21 for all existing clusters in the considered network
23: From step 22, obtain net_link value
24: Use cluster_link, net_link and degree values to calculate Qm
25: Check the Qm value to see if the cluster is weak or strong
26: Stop
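The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the sample activity counts and the limit of 10 are hypothetical, the total cost is assumed to be the plain sum of the three activity counts, and the degree of each cluster is passed in directly rather than derived from neighbour_values, since the paper does not spell out that step.

```python
def edge_flags(activity, limit):
    """Steps 8-12: mark a vertex's edge as strong (1) or weak (0).

    activity maps each vertex to its (tags, comments, shares) counts.
    Assumption: total_cost is the sum of the three counts.
    """
    return {v: 1 if sum(counts) >= limit else 0 for v, counts in activity.items()}


def cluster_link(flags):
    """Steps 13-18: count the strong edges found in the cluster."""
    return sum(flags.values())


def modularity(clusters, net_link):
    """Step 24: Newman-Girvan sum of l/L - (d/(2L))^2 over all clusters.

    clusters is a list of (cluster_link, degree) pairs and net_link is
    the number of links in the entire network.
    """
    return sum(l / net_link - (d / (2 * net_link)) ** 2 for l, d in clusters)


# Hypothetical activity counts and limit, for illustration only.
activity = {"u1": (4, 6, 5), "u2": (1, 2, 3)}
flags = edge_flags(activity, limit=10)
print(flags, cluster_link(flags))

# Hypothetical network of 7 links split into two clusters,
# each with 3 internal links and a total degree of 7.
q_m = modularity([(3, 7), (3, 7)], net_link=7)
print(round(q_m, 4))
```

Keeping the thresholding, the link count and the modularity sum in separate functions mirrors the grouping of steps in the pseudocode and makes each part easy to check on its own.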
In the algorithm, the multi-dimensional array stores the values of tags, comments and shares of the selected node, in keeping with the social networking context. These values are used to calculate the edge cost between two vertices in a network. Whether an edge is weak or strong is determined by the limit value that is set. When a strong edge exists between two vertices, the edge count is incremented by one. When the edge count for a cluster is zero, no community structure can be acquired from it. Otherwise, the edge count is the number of links existing within the chosen cluster, and its value is stored in cluster_link. After acquiring cluster_link, we also require the degree of the cluster, which is obtained by choosing the maximum of all the neighbour values (neighbour_values). The net_link refers to the number of links present in the entire network. With these values, the modularity factor can be calculated.

IV. CASE STUDY: SOCIAL NETWORKING (FACEBOOK)

The social networking site considered for this case study is Facebook. The authors have considered each member as a vertex of the network. As shown in Fig. 2, Facebook represents an undeterminable dynamic network, as users tend to log in and log off at their own interest. This is what makes the network dynamic. Under such circumstances the network size is unknown.

Fig. 2: Randomized Network Structure

From that, a random network is chosen, say the students of a particular university. Once this has been obtained, the next step is the random selection of a cluster. In this cluster, each student is a separate vertex.
The aim is to characterize them based on their activities, that is to say, to detect the communities under which they fall. Once the communities have been detected for a particular cluster, the same procedure is followed with other clusters randomly chosen from the network. When all such communities have been detected, the network as a whole can be viewed in terms of its detected communities. Given below is a table showing a set of sample data for the considered scenario.

Table 1: Vertices associated with Random Selected Vertex from the Social Networking site

VERTEX            | NUMBER OF PICTURES TAGGED | NUMBER OF COMMENTS GIVEN | NUMBER OF POSTS SHARED | TOTAL EDGE COST | STRONG / WEAK EDGE
Kethar Raghubalan | 10                        | 25                       | 13                     | 48              | STRONG
Sasikala Palani   | 2                         | 8                        | 9                      | 19              | WEAK
Ankit Shah        | 13                        | 5                        | 8                      | 26              | STRONG
Hismath Begum     | 21                        | 18                       | 2                      | 41              | STRONG
Dinesh Santhanam  | 5                         | 1                        | 6                      | 12              | WEAK

In the above table, the sample data shows how the proposed algorithm determines the strength of the considered cluster of the network. Three edge connections are determined to be strong, while two are determined to be weak. The weak ones are not considered for community formation, as they exist as sole communities with a single vertex.

V. CONCLUSION

The algorithm proposed above, as mentioned in the abstract, is only the first phase towards resolving the disadvantage of the multi-objective immune algorithm. The second phase of this algorithm is in progress and will be submitted shortly. It includes the proposal of an algorithm for the removal or segregation of vertices into their corresponding communities. The tool currently being developed by the authors is for static purposes, and work has commenced on enhancing it for dynamic purposes, wherein the communities change constantly depending on the node activities.