SlideShare a Scribd company logo
1 of 154
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
1
A Project Thesis on
Local Cluster-Based Community Detection Algorithm
Project work submitted in partial fulfillment of the requirements for the award of
the degree of
MASTER OF COMPUTER APPLICATIONS
in
COMPUTER SCIENCE AND ENGINEERING
Submittedby
Regalla Sairam Reddy
[Roll No.17021F0005]
Under the supervision of
DR.M.H.M KRISHNA PRASAD
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
UNIVERSITY COLLEGE OF ENGINEERING KAKINADA (A)
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY KAKINADA
KAKINADA-533003, A.P, INDIA
2017-2020
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
2
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
UNIVERSITY COLLEGE OF ENGINEERING KAKINADA (A)
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY KAKINADA
KAKINADA-533003, A.P, INDIA 2017-2020
DECLARATION BY THE STUDENT
I hereby declare that the work described in this thesis, entitled β€œA
LiteratureStudy and Local Cluster-Based Community Detection Algorithm”,
which is being submitted by me in partial fulfillment of the requirements for the
award of degree of Master Of Computer Applications in the department of
Computer Science & Engineering to the University College of Engineering
(Autonomous),Jawaharlal Nehru Technological University, Kakinada – 533003,
A.P., is the result of investigations carried out by me under the guidance of
Dr.M.H.M Krishna Prasad, Assistant professor of Computer Science and
Engineering, University College of Engineering, Jawaharlal Nehru Technological
University, Kakinada-533003.
The work is original and has not been submitted to any other University or
Institute for the award of any degree or diploma.
Place: Kakinada Signature:
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
3
Date: Name: Regalla Sairam Reddy
Roll No:17021F0005
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
UNIVERSITY COLLEGE OF ENGINEERING KAKINADA (A)
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY KAKINADA
KAKINADA-533003, A.P, INDIA 2017-2020
CERTIFICATE FROM THE SUPERVISOR
This is to certify that the thesis entitled β€œA Literature Study and Local Cluster-
Based Community Detection Algorithm”, that is being submitted by Regalla
SairamReddy, Roll No.17021F0005, in partial fulfillment of the requirements for
the award of degree of Master of Computer Applications in the Department of
Computer Science &Engineering to the University College of Engineering
(Autonomous), Jawaharlal Nehru Technological University Kakinada is a record of
bona fide project work carried out by him under my guidance and supervision.
Signature of the Supervisor
Dr.M.H.M Krishna Prasad
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
4
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
UNIVERSITY COLLEGE OF ENGINEERING KAKINADA (A)
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY KAKINADA
KAKINADA-533003, A.P, INDIA2017-2020
CERTIFICATE FROM THE HEAD OF THE DEPARTMENT
This is to certify that the thesis entitled β€œA Literature Study and Local Cluster-
Based Community Detection Algorithm”, that is being submitted by Regalla
Sairam Reddy, Roll No.17021F0005, in partial fulfillment of the requirements for
the award of degree of Master of Computer Applications in the Department of
Computer Science &Engineering to the University College of Engineering
(Autonomous), Jawaharlal Nehru Technological University Kakinada is a record of
bona fide project work carried out by him at our department.
Signature of Head of the Department
Dr. D. Haritha
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
5
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
6
ACKNOWLEDGEMENTS
The successful completion of any task is not possible without proper
suggestion, guidance and environment. Combination of these three factors acts
like backbone to my β€œA Literature Study and Local Cluster-Based Community
Detection Algorithm” project.
I wish to express my sincere gratitude to Dr. M.H.M. Krishna Prasad,
Professor, Department of CSE, University College of Engineering, Jawaharlal
Nehru Technological University Kakinada for giving me his guidance, immense
support and valuable suggestions during my project.
I would like to express my grateful thanks to Dr. D. Haritha, Professor and
Head of the Department of CSE, University College of Engineering, Jawaharlal
Nehru Technological University Kakinada whose immense support encouraged
completing the project successfully.
My Sincere thanks to all the teaching and non-teaching staff of Department
of Computer Science and Engineering for their support throughout my project
work.
Finally, I would like to thank my family and friends for their support and
motivation, which helped me to complete the project successfully.
Regalla Sairam Reddy
Roll No: 17021F0005,
Master of Computer Applications,
University College of Engineering Kakinada,
Jawaharlal Nehru Technological University,
Kakinada – 533003,
East Godavari District,A.P.
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
7
CONTENT
Acknowledgement 5
Abstract 6
Contents 7
List of Tables 8
List of Figures 9
CHAPTER -1 INTRODUCTION 1-23
1.1 What Is Software 1
1.2What Is Software Development Life Cycle
Bug Prediction 2
CHAPTER-2
LITERATURE REVIEW
2.1 Related Work5
CHAPTER 3
PROBLEM IDENTIFICATION AND OBJECTIVE
3.1 Problem Statement 8
3.2 Project Objective 8
CHAPTER 4
METHODOLOGY
4.1 Methodology 9
4.1.1using Python Tool On Standalone Machine Learning
Environment 9
4.2 Data Description 10
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
8
4.3 Evaluation Criteria Used For Classification 11
4.3.1 Confusion Matrix
12
4.3.2 Accuracy And Precision
12
4.3.3 Recall And F-Square
13
4.3.4 Sensitivity, Specificity And Roc 13
4.3.5 Significance And Analysis Of Ensemble Method In
MachineLearning 14
4.4 Uml Diagrams 15
4.4.1 Use Case Diagram 16
4.4.2 State Diagram 17
CHAPTER 5
OVERVIEW OF TECHNOLOGIES
5.1AlgorithmsUsed 17
5.1.1 Decision Tree Induction 17
5.1.2 NaΓ―ve Bayes 21
5.1.3 Artificial Neuralnetwork 22
5.1.4 Support Vector Machine Model 25
5.1.5 Kernal Functions 28
5.2 Tensorflow 28
CHAPTER 6
IMPLEMENTATION AND RESULTS
6.1 Framework Design 31
6.2 Coding And Testing 34
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
9
CHAPTER 7
CONCLUSION40
Reference 42
LIST OF TABLES:
Table No Table Name Page No
1 Cluster based community
detection
27
2 Generic algorithms based
community detection
30
3 Label propagation based
community detection
31
4 Clique based methods for
overlapping community detection
35
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
10
LIST OF FIGURES:
Figure No Figure Name Page No
5.1.1 Decision Tree induction 17
5.1.2 NaΓ―ve Bayes Algorithm 21
5.1.3 Artificial Neural Networks
Algorithm
23
5.1.4(i) Support Vector Machine
Hyperplane
25
5.1.4(ii) Support Vector Machine
Hyperplane
26
5.2 TensorFlow in UML 29
4.3.1 Use Case Diagram
4.3.2 State Diagram
4.1.1 Block Diagram Of Proposed
Work
4.2.5 Classification Model
6.1 Framework Design
6.2 Bar graph of
knn,svm,naviebayes
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
11
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
12
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
13
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
14
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
15
ABSTRACT
In order to discover the structure of local community more effectively,
this paper puts forward a new local community detection algorithm based on
minimal cluster. Most of the local community detection algorithms begin
from one node. The agglomeration ability of a single node must be less than
multiple nodes, so the beginning of the community extension of the
algorithm in this paper is no longer from the initial node only but from a
node cluster containing this initial node and nodes in the cluster are
relatively densely connected with each other. The algorithm mainly includes
two phases. First it detects the minimal cluster and then finds the local
community extended from the minimal cluster. Experimental results show
that the quality of the local community detected by our algorithm is much
better than other algorithms no matter in real networks or in simulated
networks.
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
16
CHAPTER 1
INTRODUCTION
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
17
Introduction to Community Detection:
Recently, many researchers notice that the complex network is a
proper tools to describe variety of complex system in the real world , and
thus the complex network has attracted the great attention in many fields
such as physics, biology and social network et al. In complex network field,
one of the important topology property is community structure which
comprise of densely connected nodes, and some researchers have found that
detecting community structure can reveal some valuable insights of the
functional feature in the complex system . For example, communities in
multimedia social network may imply people with the same hobby and trust
relationship. Zhiyong Zhang et al. proposed an approach to analyse and
detect credible potential path based on community in multimedia social
networks , the approach can effectively and accurately mine potential paths
of copyrighted digital content sharing. Zhiyong Zhang et al. also proposed a
trust model based on small world theory which shows the widely application
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
18
of community struction The community structure in biology field may
cluster proteins with the same function. So that many methods have been
proposed to reveal this topological property in complex networks.
Community detection on complex networks has been a hot research field.
Recently, a large number of algorithms for studying the global structure of
the network are proposed, such as the modularity optimization algorithms ,
the spectral clustering algorithms , the hierarchical clustering algorithms ,
and the label propagation algorithms . However, with the continuous
expansion of complex networks, it is easy to collect large network dataset
with millions of nodes. How to store such a large-scale dataset in computer
memory to analyze is a huge challenge for scholars. The calculation for
studying the overall structure of this kind of large-scale networks is
unimaginable. So local community detection becomes an appealing problem
and has drawn more and more attention The main task of local community
detection is to find a community using the local information of the network.
Local community detection has good extensibility. If the local community
detection algorithm is iteratively executed, more local communities can be
found and the whole community structure of the network can be obtained.
The time complexity of this kind of global community detection algorithm is
dependent on the efficiency and accuracy of local community detection
algorithms, so the research of local community detection algorithm still has
a long way to go. There are several problems that need to be solved in the
research of local community detection. First, we should determine the initial
state and find the initial node for local community detection, so as to
determine the needed local information; then, we need to select an objective
function, and through continuous iterative optimization of the objective
function we find the community structure with high quality; after that we
need to find a suitable node expansion method, so that the algorithm can
extract the local community from the initial state step by step; finally, in
order to terminate the algorithm, a suitable termination condition is needed
to determine the boundary of the community.
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
19
Most of local community detection algorithms are based on the above-
mentioned process. The definition of local community detection is to find the
local community structure from one or more nodes, but most of the existing
local community detection algorithms, including Clauset , LWP , and LS , are
starting from only one initial node. They greedily select the optimal nodes
from the candidate nodes and add them into the local community. LMD
algorithmextends not from the initial node but from its closest and next
closest local degree central nodes. It discovers a local community from each
of these nodes, respectively. It still starts from single node and discovers
many local communities for the initial node. In general, the aggregation
ability of a single node is lower than that of multiple nodes. So we do not
just rely on the initial node as the beginning of local community expansion.
Our primary goal is to find a minimal cluster closely connected to the initial
node and then detect local community based on the minimal cluster. This
can avoid instability because of the excessive dependence on the initial node.
In this paper, we introduce a local community detection algorithm based on
the minimal clusterβ€”NewLCD. In this new algorithm, the beginning of
community expansion is no longer from the initial node only, but a cluster of
nodes relatively closely connected to the initial node. The algorithm mainly
consists of two parts: one is the detection of the minimal cluster, and the
other is the detection of the local community based on the minimal cluster.
At the same time, the algorithm can be applied to the global community
detection. After finding one local community using this algorithm, we can
repeat the process to obtain the global community structure of the whole
network.
Community Detection
The concept of community detection has emerged in network science
as a method for finding groups within complex systems through represented
on a graph. In contrast to more traditional decomposition methods which
seek a strict block diagonal or block triangular structure, community
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
20
detection methods find subnetworks with statistically significantly more
links between nodes in the same group than nodes in different groups
(Girvan and Newman, 2002). Central to community detection is the notion
of modularity, a metric that captures this difference:
(1)Q=12mβˆ‘i,jAijβˆ’kikj2mΞ΄gigj
Here, Q is the modularity, Aij is the edge weight between nodes i and j, ki is
the total weight of all edges connecting node i with all other nodes, and m is
the total weight of all edges in the graph. The Kronecker
delta function Ξ΄(gi,gj) will evaluate to one if nodes i and j belong to the same
group, and zero otherwise. Modularity is a property of how one decides to
partition a network: networks that are not partitioned and those that place
every node in its own community will both have modularity equal to zero.
The goal of community detection, then, is to find communities that maximize
modularity. Although modularity maximization is an NP-hard integer
program, many efficient algorithms exist to solve it approximately,
including spectral clustering (Newman, 2006) and fast unfolding (Blondel et
al., 2008). Figure 1 shows a few networks with increasing maximum
modularity; note how the community structure becomes increasingly
apparent as this value increases. Previous efforts in our group have applied
community detection to chemical plant networks by creating an equation
graph of the corresponding dynamic model (Moharir et al., 2017). By doing
so, communities of state variables, inputs, and outputs can be obtained
which are tightly interacting amongst themselves but weakly interacting with
other communities. As such, these communities can form the basis of
distributed control architectures (Jogwar and Daoutidis, 2017) which
typically perform better than other distributed control architectures that one
may obtain from β€œintuition” (Pourkargar et al., 2017). For a more
comprehensive review of the use of community detection in distributed
control, we refer the reader to Daoutidis et al., 2017. An alternative method
for finding communities for distributed model predictive control is to apply a
decomposition on the optimization problem as a whole (Tang et al., 2017).
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
21
Figure 1. Networks of different maximum modularity.
However, community structure can exist in all optimization problems, thus it
makes sense to extend this method to any generic optimization problem, and
this work proposes to do so. The advantage of using community detection to
find decompositions is that subproblems generated will have statistically
minimal interactions, through complicating variables or constraints, and
thus require minimal coordination through the decomposition solution
method. The proposed method is generic, applicable to any optimization
problem or decomposition solution approach, and scalable, using
computationally efficient graph theory algorithms.
Automatically identifying groups based on network
clustering algorithms:
NodeXL can automatically identify groups within a network based
solely on network structure. In contrast to the approach of using existing
data about the attributes as used in Section 7.2.3, this approach is based
solely on who is connected to whom. A number of different network
β€œclustering” (also known as β€œcommunity detection”) algorithms exist, which
help find subgroups of highly inter-connected vertices within a network.
NodeXL includes three such algorithms: Clauset-Newman-Moore, Wakita-
Tsurumi [3], and Girvan-Newman (which can take a long time to run on
large graphs). In all of these algorithms, the number of clusters is not
predetermined; instead the algorithm dynamically determines the number it
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
22
thinks is best. Each vertex is assigned to exactly one cluster, meaning that
clusters do not overlap. The number of vertices in each cluster can vary
significantly. In some cases, a single cluster can encompass all vertices,
whereas in other cases, a cluster can consist of a single vertex. See
Newman [4] for background on some of these and other community
identifying algorithms.
There is no β€œright” or β€œwrong” algorithm to use; instead, it is often useful to
try out different ones and see which ones you believe provide the best results
given your network. For example, in this network, the Clauset-Newman-
Moore algorithm results in fewer, larger groups than the other algorithms,
which provide more groups of a smaller size. Try applying the Wakita-
Tsurumi clustering algorithm by clicking on the Groups dropdown menu in
the NodeXL ribbon and choosing Group by Cluster and the checking the
appropriate selector as shown in Figure 7.14. Notice that the data on the
Groups worksheet is now updated to reflect the new groups.
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
23
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
24
A social network for an individual is created with his/her interactions and
personal relationships with other members in the society. Social networks
represent and model the social ties among individuals. With the rapid
expansion of the web, there is a tremendous growth in online interaction of
the users. Many social networking sites, e.g., Facebook, Twitter etc. have
also come up to facilitate user interaction. As the number of interactions
have increased manifold, it is becoming difficult to keep track of these
communications. Human beings tend to get associated with people of similar
likings and tastes. The easy-to-use social media allows people to extend their
social life in unprecedented ways since it is difficult to meet friends in the
physical world, but much easier to find friends online with similar interests.
These real-world social networks have interesting patterns and properties
which may be analysed for numerous useful purposes. Social networks have
a characteristic property to exhibit a community structure. If the vertices of
the network can be partitioned into either disjoint or overlapping sets of
vertices such that the number of edges within a set exceeds the number of
edges between any two sets by some reasonable amount, we say that the
network displays a community structure. Networks disp+laying a community
structure may often exhibit a hierarchical community structure as well1 .
The process of discovering the cohesive groups or clusters in the network is
known as community detection. It forms one of the key tasks of Social
network analysis2 . The detection of communities in social networks can be
useful in many applications where group decisions are taken, e.g.,
multicasting a message of interest to a community instead of sending it to
each one in the group or recommending a set of products to a community.
The applications of community detection have been highlighted towards the
end of the article. State of the art in community detection research for social
networks is presented in this work. The paper begins with the basic concepts
of social networks and communities. Various methods for community
detection are categorised and discussed in the next section followed by list of
standard datasets used for analysis in community detection research along
with the links for download if available online. Some potential applications of
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
25
community detection in social networks are briefly described in the next
section. Discussion section argues the advantages of using a method with
respect to another, the kind of community structure they obtain, etc. and
the conclusion section concludes the paper. BASIC CONCEPTS SOCIAL
NETWORK A social network is depicted by social network graph 𝐺 consisting
of 𝑛 number of nodes denoting 𝑛 individuals or the participants in the
network. The connection between node 𝑖 and node 𝑗 is represented by the
edge 𝑒𝑖𝑗 of the graph. A directed or an undirected graph may illustrate these
connections between the participants of the network. The graph can be
represented by an adjacency matrix 𝐴 in which 𝐴𝑖𝑗 = 1 in case there is an
edge between 𝑖 and 𝑗 else 𝐴𝑖𝑗 = 0. Social networks follow the properties of
complex networks3,4 . Some real life examples1 of social networks include
friends based, telephone, email and collaboration networks. These networks
can be represented as graphs and it is feasible to study and analyse them to
find interesting patterns amongst the entities. These appealing prototypes
can be utilized in various useful applications. Community A community can
be defined as a group of entities closer to each other in comparison to other
entities of the dataset. Community is formed by individuals such that those
within a group interact with each other more frequently than with those
outside the group. The closeness between entities of a group can be
measured via similarity or distance measures between entities. McPherson et
al5 stated that β€œsimilarity breeds connection”. They discussed various social
factors which lead to similar behaviour or homophily in networks. The
communities in social networks are analogous to clusters in networks. An
individual represented by a node in graphs may not be part of just a
community or a group, it may be an element of many closely associated or
different groups existing in the network. For example a person may
concurrently belong to college, school, friends and family groups. All such
communities which have common nodes are called overlapping
communities. Identification and analysis of the community structure has
been done by many researchers applying methodologies from numerous
form of sciences. The quality of clustering in networks is normally judged by
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
26
clustering coefficient which is a measure of how much the vertices of a
network tend to cluster together. The global clustering coefficient6 and the
local clustering coefficient7 are two types of clustering coefficients discussed
in literature. Methods for grouping similar items Communities are those
parts of the graph which have denser connections inside and few
connections with the rest of the graph8 . The aim of unsupervised learning is
to group together similar objects without any prior knowledge about them. In
case of networks, the clustering problem refers to grouping of nodes
according to their similarity computed based on topological features and/or
other characteristics of the graph. Network partitioning and clustering are
two commonly used methods in literature to find the groups in the social
network graph. These methods are briefly described in the next subsections.
Graph partitioning Graph partitioning is the process of partitioning a graph
into a predefined number of smaller components with specific properties. A
common property to be minimized is called cut size. A cut is a partition of
the vertex set of a graph into two disjoint subsets and the size of the cut is
the number of edges between the components. A multicut is a set of edges
whose removal divides the graph into two or more components. It is
necessary to specify the number of components one wishes to get in case of
graph partitioning. The size of the components must also be specified, as
otherwise a likely but not meaningful solution would be to put the minimum
degree vertex into one component and the rest of the vertices into another.
Since the number of communities is usually not known in advance, graph
partitioning methods are not suitable to detect communities in such cases.
Clustering Clustering is the process of grouping a set of similar items
together in structures known as clusters. Clustering the social network
graph may give a lot of information about the underlying hidden attributes,
relationships and properties of the participants as well as the interactions
among them. Hierarchical clustering and partitioning method of clustering
are the commonly used clustering techniques used in literature. In
hierarchical clustering, a hierarchy of clusters is formed. The process of
hierarchy creation or levelling can be agglomerative or divisive. In
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
27
agglomerative clustering methods, a bottom-up approach to clustering is
followed. A particular node is clubbed or agglomerated with similar nodes to
form a cluster or a community. This aggregation is based on similarity. In
divisive clustering approaches, a large cluster is repeatedly divided into
smaller clusters. Partitioning methods begin with an initial partition amidst
the number of clusters pre-set and relocation of instances by moving them
across clusters, e.g., K-means clustering. An exhaustive evaluation of all
possible partitions is required to achieve global optimality in partitioned-
based clustering. This is time consuming and sometimes infeasible, hence
researchers use greedy heuristics for iterative optimization in partitioning
methods of clustering. The next section categorizes and discusses major
algorithms for community detection. ALGORITHMS FOR COMMUNITY
DETECTION A number of community detection algorithms and methods
have been proposed and deployed for the identification of communities in
literature. There have also been modifications and revisions to many
methods and algorithms already proposed. A comprehensive survey of
community detection in graphs has been done by Fortunato8 in the year
2010. Other reviews available in literature are by Coscia et al9 in 2011,
Fortunato et al10 in 2012, Porter et al11 in 2009, Danon et al12 in 2005,
and PlantiΓ© et al13 in 2013. The presented work reviews the algorithms
available till 2015 to the best of our knowledge including the algorithms
given in the earlier surveys. Papers based on new approaches and
techniques like big data, not discussed by previous authors have been
incorporated in our article.The algorithms for community detection are
categorized into approaches based on graph partitioning, clustering, genetic
algorithms, label propagation along with methods for overlapping community
detection (clique based and non-clique based methods), and community
detection for dynamic networks. Algorithms under each of these categories
are described below. Graph partitioning based community detection Graph
partitioning based methods have been used in literature to divide the graph
into components such that there are few connections between components.
The Kernighan-Line14 algorithm for graph partitioning was amongst the
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
28
earliest techniques to divide a graph. It partitions the nodes of the graph
with cost on edges into subsets of given sizes so as to minimize the sum of
costs on all edges cut. A major disadvantage of this algorithm however is
that the number of groups have to be predefined. The algorithm however is
quite fast with a worst case running time of 𝑂(𝑛 2 ). Newman15 reduces the
widely-studied maximum likelihood method for community detection to a
search through a group of candidate solutions, each of which is itself a
solution to a minimum cut graph partitioning problem. The paper shows
that the two most essential community inference methods based on the
stochastic block model or its degree-corrected variant16 can be mapped onto
versions of the familiar minimum-cut graph partitioning problem. This has
been illustrated by adapting Laplacian spectral partitioning method17, 18 to
perform community inference. Clustering based community detection The
main concern of community detection is to detect clusters, groups or
cohesive subgroups. The basis of a large number of community detection
algorithms is clustering. Amongst the innovators of community detection
methods, Girvan and Newman19 had a main role. They proposed a divisive
algorithm based on edge-betweenness for a graph with undirected and
unweighted edges. The algorithm focused on edges that are most β€œbetween”
the communities and communities are constructed progressively by
removing these edges from the original graph. Three different measures for
calculation of edge-betweenness in vertices of a graph were proposed in
Newman and Girvan20 . The worst-case time complexity of the edge
betweenness algorithm is 𝑂(π‘š2𝑛) and is 𝑂(𝑛 3 ) for sparse graphs, where m
denotes the number of edges and n is the number of vertices. The Girvan
Newman (GN) algorithm has been enhanced by many authors and applied to
various networks21-28. Chen et al22 extended GN algorithm to partition
weighted graphs and used it to identify functional modules in the yeast
proteome network. Rattigan et al21 proposed the indexing methods to
reduce the computational complexity of the GN algorithm significantly.
Pinney et al24 also build an algorithm which uses GN algorithm for the
decomposition of networks based on graph theoretical concept of
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
29
betweenness centrality. Their paper inspected utility of betweenness
centrality to decompose such networks in diverse ways. Radicchi29 et al also
proposed an algorithm based on GN algorithm introducing a new definition
of community. They defined β€˜strong’ and β€˜weak’ communities. The algorithm
uses an edge clustering coefficient to perform the divisive edge removal step
of GN and has a running time of 𝑂( π‘š4 𝑛2 ) and 𝑂(𝑛 2 ) for sparse graphs.
Moon et al30 have proposed and implemented the parallel version of the GN
algorithm to handle large scale data. They have used MapReduce model
(Apache Hadoop) and GraphChi. Newman and Girvan first defined a
measure known as β€˜modularity’ to judge the quality of partitions or
communities formed20 . The modularity measure proposed by them has
been widely accepted and used by researchers to gauge the goodness of the
modules obtained from the community detection algorithms with high
modularity corresponding to a better community structure. Modularity was
defined as βˆ‘π’Šπ’†π’Šπ’Š βˆ’ π’‚π’ŠπŸ , where π’†π’Šπ’Š denotes fraction of the edges that connect
vertices in community π’Š, π’†π’Šπ’‹ denotes fraction of the edges connecting vertices
in two different communities π’Š and 𝒋 while π’‚π’Š = βˆ‘π’‹π’†π’Šπ’‹ is the fraction of edges
that connect to vertices in community π’Š. The value 𝑄 = 1 indicates a network
with strong community structure. The optimization of modularity function
has received great attention in literature. The table 1 lists clustering based
community detection methods, including algorithms which use modularity
and modularity optimization. Newman31 has worked to maximize
modularity so that the process of aggregating nodes to form communities
leads to maximum modularity gain. This change in modularity upon joining
two communities defined as βˆ†π‘„ = 𝑒𝑖𝑗 + 𝑒𝑗𝑖 βˆ’ 2π‘Žπ‘–π‘Žπ‘— = 2(π‘’π‘–π‘—βˆ’π‘Žπ‘–π‘Žπ‘—) can be
calculated in constant time and hence is faster to execute in comparison to
the GN algorithm. The run time of the algorithm is 𝑂(𝑛 2 ) for sparse graphs
and 𝑂((π‘š + 𝑛)𝑛) for others. In a recent work, a scalable version of this
algorithm has been implemented using MapReduce by Chen et al32.
Newman33 generalized the betweenness algorithm for weighted networks.
The modularity was now represented as 𝑄 = 1 2π‘š βˆ‘π‘–π‘—[𝐴𝑖𝑗 βˆ’ π‘˜π‘–π‘˜π‘— 2π‘š ]Ξ΄(𝑐𝑖 , 𝑐𝑗)
where π‘š = 1 2 βˆ‘π‘–π‘—π΄π‘–π‘— represents the number of edges between communities 𝑐𝑖
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
30
and 𝑐𝑗 in the graph, while π‘˜π‘– , π‘˜π‘— are degrees of vertices 𝑖 and 𝑗 while 𝛿(𝑒, 𝑣) is
1 if 𝑒 = 𝑣 and 0 otherwise. Newman34 in yet another approach characterised
the modularity matrix in terms of eigenvectors. The equation for modularity
was changed to 𝑄 = 1 4π‘šπ‘ π‘‡π΅π‘  , where the modularity matrix was given as 𝐡𝑖𝑗
= 𝐴𝑖𝑗 βˆ’ π‘˜π‘–π‘˜π‘— 2π‘š and modularity was defined using eigenvectors of the
modularity matrix. The algorithm runs in 𝑂(𝑛 2 π‘™π‘œπ‘”π‘›) time, where π‘™π‘œπ‘”π‘›
represents the average depth of the dendrogram .
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
31
Clauset et al35 used greedy optimization of modularity to detect
communities for large networks. For a network structure with m edges and n
vertices, the algorithm has a running time of (π‘šπ‘‘π‘™π‘œπ‘”π‘›) , where β€˜d’ denotes the
depth of the dendrogram. For sparse real world networks the running time
is(π‘›π‘™π‘œπ‘”2n) . Blondel et al36 designed an iterative two phase algorithm known
as Louvain method. In first phase, all nodes are placed into different
communities and then the modularity gain of moving a node 𝑖 from one
community to another is found. In case this modularity gain is positive, the
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
32
node is shifted to a new community. In second phase all the communities
found in earlier phase are treated as nodes and the weight of links is found.
The algorithm improves the time complexity of the GN algorithm. It has a
linear run time of 𝑂(π‘š). Guimera et al37 used simulated annealing for
modularity optimization and showed that computing the modularity of a
network is similar to determining the ground-state energy of a spin system.
Additionally, the authors showed that the stochastic network models give
rise to modular networks due to fluctuations. Zhou et al38 attempted to
improve modularity using simulated annealing introducing the idea of β€˜inter
edges’ and β€˜intra edges’. The authors modified the modularity equation to
include inter and intra edges as 𝑄 = 1 2π‘š βˆ‘ [(𝐴𝑖𝑗𝑛𝑖𝑗 βˆ’ π‘˜π‘–π‘˜π‘— 2π‘š )𝛿(𝐢𝑖 , 𝐢𝑗) βˆ’ 𝛽
(𝐴𝑖𝑗 βˆ’ π‘˜π‘–π‘˜π‘— 2π‘š ) 𝛼 (1 βˆ’ 𝛿(𝐢𝑖 , 𝐢𝑗))] Intra factor Inter factor Here Ξ± and Ξ² are
undetermined parameters and affect the value of the inter-factor. The value
of Ξ² is increased and Ξ± is reduced when large communities are expected.
Duch et al39 proposed a heuristic search based approach for the
optimization of modularity function using extremal optimization technique,
which has a complexity of 𝑂(𝑛 2 π‘™π‘œπ‘”2𝑛). AdClust method40 can extract
modules from complex networks with significant precision and strength.
Each node in the network is assumed to act as a self-directed agent
representing flocking behaviour. The vertices of the network travel towards
the desirable adjoining groups. Wahl and Sheppard41 proposed hierarchical
fuzzy spectral clustering based approach. They argued that determining the
sub-communities and their hierarchies are as important as determining
communities within a network. DENGRAPH42 algorithm uses the idea of
density-based incremental clustering of spatial data and is intended to work
for large dynamic datasets with noise. The Markov Clustering
Algorithm(MCL)43 is a graph flow simulation algorithm which can be used to
detect clusters in a graph and is analogous to detection of communities in
the networks. This algorithm consists of two alternate processes of
β€˜expansion’ and β€˜inflation’. Markov chains are employed to perform random
walk through a graph. The method has a worst case run time of 𝑂(π‘›π‘˜ 2 )
where n represents the number of nodes and k is the number of resources.
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
33
Nikolaev et al44 used β€˜entropy centrality measure’ based on Markovian
process to iteratively detect communities. A random walk through the nodes
is performed to find the communities existing in the network structure. For a
graph, the transition probability matrix for a Markov chain is created. A
locality t is selected and those edges for which the average entropy centrality
for the nodes over the graph is reduced are selected and removed. The
algorithm proposed by Steinhaeuser et al45 performs many short random
walks and interprets visited nodes during the same walk as similar nodes
which gives an indication that they belong to the same community. The
similar nodes are aggregated and community structure is created using
consensus clustering. It has a runtime of 𝑂(𝑛 2 π‘™π‘œπ‘”π‘›).
Genetic algorithms (GA) based community detection :
Genetic algorithms (GA) are adaptive heuristic search algorithms whose aim
is to find the best solution under the given circumstances. A genetic
algorithm starts with a set of solutions known as chromosomes and fitness
function is calculated for these chromosomes. If a solution with a maximum
fitness is obtained, one stops else with some probability crossover and
mutation operators are applied to the current set of solutions to obtain the
new set of solutions. Community detection can be viewed as an optimization
problem in which an objective function that captures the intuition of a
community with better internal connectivity than external connectivity is
chosen to be optimized. GA have been applied to the process of community
discovery and analysis in a few recent research works. These are described
briefly in this section. Table 2 enlists the algorithms available in literature
for community detection based on GA.
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
34
Pizzuti46 proposed the GA-Net algorithm which uses a locus based graph
representation of the network. The nodes of the social network are depicted
by genes and alleles. The algorithm introduces and optimizes the community
score to measure the quality of partitioning. All the dense communities
present in the network structure are obtained at the end of the algorithm by
selectively exploring the search space, without the need to know in advance
the exact number of groups. Another GA based approach MOGA-Net47
proposed by the same author optimizes two objective functions i.e. the
community score and community fitness . The higher the community score,
the denser the clustering obtained. The community fitness is sum of fitness
of nodes belonging to a module. When this sum reaches its maximum, the
number of external links is minimized. MOGA-Net generates a set of
communities at different hierarchical levels in which solutions at deeper
levels, consisting of a higher number of modules, are contained in solutions
having a lower number of communities. Hafez et al48 have performed both
Single-Objective and Multi-Objective optimization for community detection
problem. The former optimization was done using roulette selection based
GA while NSGA-II algorithm was used for the latter process. Mazur et al49
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
35
have used modularity as the fitness function in addition to the community
score. The authors worked on undirected graphs and their algorithm can
also discover single node communities. Liu et al50 used GA in addition to
clustering to find the community structures in a network. The authors have
used a strategy of repeated divisions. The graph is initially divided into two
parts, then the subgraphs are further divided and a nested GA is applied to
them. Tasgin et al51 have also optimized the network modularity using GA.
A multi-cultural algorithm52 for community detection employs the fitness
function defined by Pizzuti46 in GA-Net. The belief space which is a state
space for the network and contains a set of individuals that have a better
fitness value has been used in this work to guide the search direction by
determining a range of possible states for individuals. A genetic algorithm for
the optimization of modularity, proposed by Nicosia et al53 and has been
explained in the overlapping communities section later.
Label propagation based community detection:
Label propagation in a network is the propagation of a label to various nodes
existing in the network. Each node attains the label possessed by a
maximum number of the neighbouring nodes. This section discusses some
label propagation based algorithms for discovering communities. Table 3
contains a listing of these algorithms, discussed in detail later in the section.
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
36
Label Propagation Algorithm(LPA) was proposed by Raghavan et al54 in
which initially each node tries to achieve a label from the maximum number
of labels possessed by its neighbours. The stopping criteria for the process
was also the same, i.e., when each node achieves a label, which a maximum
number of its neighbouring nodes have. Each iteration of the algorithm
takes 𝑂(π‘š) time where m is the number of edges. SLPA (speaker listener label
propagation algorithm)55 is an extension to LPA which could analyse
different kinds of communities such as disjoint communities, overlapping
communities and hierarchical communities in both unipartite and bipartite
networks. The algorithm has a linear run time of 𝑂(π‘‡π‘š), where T is the user
defined maximum number of iterations and m is the number of edges. Based
on the SLPA algorithm, Hu56 proposed a Weighted Label Propagation
Algorithm (WLPA). It uses the similarity between any two of the vertices in a
network based on the labels of the vertices achieved in label propagation.
The similarity of these vertices is then used as a weight of the edge in label
propagation. LPA was further improved by Gregory57 in his algorithm
COPRA (Community Overlap Propagation Algorithm). It was the first label
propagation based procedure which could also detect overlapping
communities. The run time per iteration is 𝑂(π‘£π‘šπ‘™π‘œπ‘”(π‘£π‘šβ„π‘›), here n is the
number of nodes, m is the edges and v is the maximum number of
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
37
communities per vertex. LabelRank Algorithm58 uses the LPA and MCL
(Markov Clustering Algorithm). The node identifiers are used as labels. Each
node receives a number of labels from the neighbouring nodes. A community
is formed for nodes having the same highest probability label. Four operators
are applied namely propagation which propagates the label to neighbours,
inflation i.e. the inflation operator of the MCL algorithm, cut-off operator that
removes the labels below a threshold and an explicit conditional update
operator responsible for a conditional update. The algorithm runs in 𝑂(π‘š)
time where m is the number of edges. The LabelRank algorithm was modified
to LabelRankT algorithm by Xie et al60. This algorithm included both the
edge weights and the edge directions in the detection of communities. This
algorithm works for dynamic networks as well and is able to detect evolving
communities also. Wu et al59 proposed a Balanced Multi Label Propagation
Algorithm (BMPLA) for detection of overlapping communities. Using this
algorithm, vertices can belong to any number of communities without having
a global maximum limit on largest number of communities membership
required by COPRA57 . Each iteration of the algorithm takes 𝑂(π‘›π‘™π‘œπ‘”π‘›) time to
execute, where n is the number of nodes. Semantics based community
detection Semantic content and edge relationships in a semantic network
may be additionally used to partition the nodes into communities. The
context, as well as the relationship of the nodes, both are taken into
consideration in the process of semantic community detection. LDA(Latent
Dirichlet Allocation)61 is used in several semantic community based
community detection approaches. A clustering algorithm based on the link-
field-topic (LFT) model is put forward by Xin et al62 to overcome the
limitation of defining the number of communities beforehand. The study
forms the semantic link weight (SLW) based on the investigation of LFT, to
evaluate the semantic weight of links for each sampling field. The proposed
clustering algorithm is based on the SLW which could separate the semantic
social network into clustering units. In another work63 the authors have
used ARTs model and divided the process into two phases namely LDA
sampling and community detection. In the former process multiple sampling
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
38
ARTs have been designed. A community clustering algorithm has also been
proposed. The procedure could detect the overlapping communities. Xia et
al64 constructed a semantic network using information from the comment
content extracted from the initial HTML source files. An average score is
obtained for two users for each link assuming comments to be implicit links
between people. An analytic method for taking out comment content is
proposed to build the semantic network for example, the terms and phrases
in data are counted in comments as supportive or opposing. Each phrase is
given an associated numerical trust value. On this semantic network, the
classical community detection algorithm is applied henceforth. Ding65 has
considered the impact of topological as well as topical elements in
community detection. Topology based approaches are based on the idea that
the real world networks can be modelled as graphs where the nodes depict
the entities whereas the interactions between them are shown by the edges
of the graph. On the other hand topic based community detection have a
basis that the more words two objects share, the more similar they are. The
author performs systematic analysis with topology-based and topic-based
community detection methodologies on the co-authorship networks. The
paper puts forward the argument that, to detect communities, one should
take into account together the topical and topological features of networks. A
community detection algorithm, SemTagP (Semantic Tag propagation) has
been proposed by Ereteo et al66 that takes yield of the semantic data
captured while organizing the RDF graphs of social networks. It basically is
an extension of the LPA54 algorithm to perform the semantic propagation of
tags. The algorithm detects and moreover labels communities using the tags
used by group during the social labelling process and the semantic
associations derived between tags. In a study by Zhao et al67, a topic
oriented approach consisting of an amalgam of social objects clustering and
link analysis has been used. Firstly a modified form of k means clustering
named as β€˜Entropy Weighting K-Means (EWKM) algorithm’ has been used to
cluster the social objects. A subspace clustering algorithm is applied to
cluster all the social objects into topics. On the clusters obtained in this
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
39
process, topical community detection or link analysis is performed using a
modularity optimization algorithm. The members of the objects are
separated into topical clusters having unique topic. A link analysis is
performed on each topical cluster to discover the topical communities. The
end result of the entire method is topical communities. A community
extraction approach is given by Abdelbary et al68, which integrates the
content published within the social network with its semantic features.
Community discovery is performed using two layer generative Restricted
Boltzmann Machines model. The model presumes that members of a
community communicate over matters of common concern. The model
permits associate members to belong to multiple communities. Latent
semantic analysis (LSA)69 and Latent Dirichlet Allocation (LDA)61 are the
two techniques extensively employed in the process to detect topical
communities. Nyugen et al70 have used LDA to find hyper groups in the blog
content and then sentiment analysis is done to further find the meta-groups
in these units. A Link-Content model is proposed by Natarajan et al71 for
discovering topic based communities in social networks. Community has
been modelled as a distribution employing Gibbs sampling. This paper uses
links and content to extract communities in a content sharing network
Twitter. Methods to detect overlapping communities A recent survey by
Amelio et al gives a comprehensive review of major overlapping community
detection algorithms and includes the methods on dynamic networks. There
exists another review of methods for discover overlapping communities done
by Xie et al72. The following section discusses some of the methods to detect
overlapping communities. Tables 4 and 5 enlist the methods discussed in
this section. Clique based methods for overlapping community detection A
community can be interpreted as a union of smaller complete (fully
connected) subgraphs that share nodes. A k-clique is a fully connected
subgraph consisting of k nodes. A k-clique community can be defined as
union of all k-cliques that can be reached from each other through a series
of adjacent k-cliques. Many researchers have used cliques to detect
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
40
overlapping communities. Important contributions using cliques for
overlapping community detection are summarized in table 4.
The Clique Percolation Method (CPM) was proposed by Palla et al73 to detect
overlapping communities. The method first finds all cliques of the network
and uses the algorithm of Everett et al83 to identify communities by
component analysis of clique-clique overlap matrix. CPM has a runtime of
𝑂(exp(𝑛)). The Clique Percolation Method (CPM) method proposed by Palla et
al73 could not discover the hierarchical structure along with the overlapping
attribute. This limitation was overcome through method proposed by
Lancichinetti et al74 . It performs a local exploration in order to find the
community for each of the node. In this process the nodes may be revisited
any number of times. The main objective was to find local maxima based on
a fitness function. CFinder84 software was developed using CPM for
overlapping community detection. Du et al75 proposed ComTector
(Community DeTector) for detection of overlapping communities using
maximal cliques. Initially all maximal cliques in the network are found which
form the kernels of potential community. Then agglomerative technique is
iteratively used to add the vertices left to their closest kernels. The obtained
clusters are adjusted by merging pair of fractional communities in order to
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
41
optimize the modularity of the network. The running time of the algorithm is
(πΆβˆ—π‘‡2 ), where the communities detected are denoted by C and T is the
number of triangles in the network. EAGLE, an agglomerative hierarchical
clustering based algorithm has been proposed by Shen et al 76. In the first
step maximal cliques are discovered and those smaller than a threshold are
discarded. Subordinate maximal cliques are neglected, remaining give the
initial communities (also the subordinate vertices). The similarity is found
between these communities, and communities are repeatedly merged
together on the basis of this similarity. This is repeated till one community
remains at the end. Evans et al77 proposed that by partitioning the links of
a network, the overlapping communities may be discovered. In an extension
to this work, Evans et al78 used weighted line graphs. In another work
Evans79 used clique graphs to detect the overlapping communities in real
world social networks. GCE (Greedy Clique Expansion)80 first identifies
cliques in a network. These cliques act as seeds for expansion along with the
greedy optimization of a fitness function. A community is created by
expanding the selected seed and performing its greedy optimization via the
fitness function proposed by Lancichinetti et al74 . CONGA (Cluster-Overlap
Newman Girvan Algorithm) was proposed by Gregory25. This method was
based on split- betweenness algorithm of Girvan-Newman. The runtime of
the method is 𝑂(π‘š3 ). In another work CONGO81 (CONGA Optimized)
algorithm was proposed which used local betweenness measure, leading to
an improved complexity 𝑂(π‘›π‘™π‘œπ‘”π‘›). A two phase Peacock algorithm for
detection of overlapping communities is proposed in Gregory82 using
disjoint community detection approaches. In the first phase, the network
transformation was performed using the split betweenness concept proposed
earlier by the author. In the second phase, the transformed network is
processed by a disjoint community detection algorithm and the detected
communities were converted back to overlapping communities of the original
network. Non clique methods for overlapping community detection Some
other non-clique methods to discover overlapping communities are given in
the table 5. These methods have been briefly explained in this section. An
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
42
extension of Newman’s modularity for directed graphs and overlapping
communities was done by Nicosia et al53 and modularity was given by π‘„π‘œπ‘£ =
1 π‘š βˆ‘π‘βˆŠπΆ βˆ‘π‘–,π‘—βˆŠπ‘‰[𝛽𝑙(𝑖,𝑗),𝑐𝐴𝑖𝑗 βˆ’ 𝛽𝑙(𝑖,𝑗),π‘π‘œπ‘’π‘‘π‘˜π‘–π‘œπ‘’π‘‘π›½π‘™(𝑖,𝑗),π‘π‘–π‘›π‘˜π‘—π‘–π‘›π‘š . The authors
defined a belongingness coefficient 𝛽𝑙,𝑐 of an edge 𝑙 connecting nodes 𝑖 and 𝑗
for a particular community 𝑐 and is given by 𝛽𝑙,𝑐 = β„±(𝛼𝑖,𝑐 , 𝛼𝑗,𝑐) where
definition for β„±(𝛼𝑖,𝑐 , 𝛼𝑗,𝑐) is taken as arbitrary, e.g., it can be taken as a
product of the belonging coefficients of the nodes involved, or as max(𝛼𝑖,𝑐 ,
𝛼𝑗,𝑐). 𝛽𝑙(𝑖,𝑗),π‘π‘œπ‘’π‘‘ = βˆ‘π‘—βˆŠπ‘‰β„±(𝛼𝑖,𝑐,𝛼𝑗,𝑐) |𝑉| , 𝛽𝑙(𝑖,𝑗),𝑐𝑖𝑛 = βˆ‘π‘–βˆŠπ‘‰β„±(𝛼𝑖,𝑐,𝛼𝑗,𝑐) |𝑉| . A
genetic approach has been used in this work for the optimization of
modularity function. Another work which uses genetic approach to
overlapping community detection is GA-Net+, by Pizzuti85 GA-NET+ could
detect overlapping communities using edge clustering. Order Statistics Local
Optimization Method(OSLOM)86 detects clusters in networks, and can
handle various kind of graph properties like edge direction, edge weights,
overlapping communities, hierarchy and network dynamics. It is based on
local optimization of a fitness function expressing the statistical significance
of clusters with respect to random fluctuations, which is estimated with
tools of Extreme and Order Statistics. Baumes et al87 considered a
community as a subset of nodes which induces a locally optimal subgraph
with respect to a density function. Two different subsets with significant
overlap can be locally optimal which forms the basis to find overlapping
communities. Chen et al 88 used game-theoretic approach to address the
issue of overlapping communities. Each node is assumed to be an agent
trying to improve the utility by joining or leaving the community. The
community of the nodes in Nash equilibrium are assumed to form the
output of the algorithm. Utility of an agent is formulated as combination of a
gain and a loss function. To capture the idea of overlapping communities,
each agent is permitted to select multiple communities. In another game-
theoretic approach, Alvari et al89 proposed an algorithm consisting of two
methods PSGAME based on Pearson correlation, and NGGAME centred on
neighbourhood similarity measure. Alvari et al90 proposed the Dynamic
Game Theory method (D-GT) which treated nodes as rational agents. These
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
43
agents perform actions in iterative and game theoretic manner so as to
maximize the total utility.
Community detection for Dynamic networks:
Dynamic networks are the networks in which the membership of the
nodes of communities evolve or change over time. The task of community
identification for dynamic networks has received relatively less attention
than the static networks The methods have been categorized into two classes
by Bansal et al94, one designed for data which is evolving in real time known
as incremental or online community detection; and the other for data where
all the changes of the network evolution are known a priori, known as offline
community detection. Wolf et al95 proposed mathematical and
computational formulations for the analysis of dynamic communities on the
basis of social interactions occurring in the network. Tantipathananandh et
al96 made assumptions about the individual behaviour and group
membership. Henceforth they framed the objective as an optimization
problem by formulating three cost functions, namely i-cost, g-cost and c-
cost. Graph colouring and heuristics based approach were deployed.
FacetNet, proposed by Lin et al97 is a unified framework to study the
dynamic evolutions of communities. The community structure at any time
includes the network data as well as the previous history of the evolution.
They have used a cost function and proposed an iterative algorithm which
converges to an optimal solution. Palla et al98 conducted experiments on
two diverse datasets of phone call network and collaboration network to find
time dependence. After building joint graphs for two time steps, the CPM
algorithm73 was applied. They have used an auto-correlation function to
find overlap among two states of a community, and a stationarity parameter
which denotes the average correlation of various states. Greene et al99
proposed a heuristic technique for identification of dynamic communities in
the network data. They represented the dynamic network graph as an
aggregation of time step graphs. Step communities represent the dynamic
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
44
communities at a particular time. The algorithm begins with the application
of a static community detection algorithm on the graph. In the subsequent
steps, dynamic communities are created for each step and Jaccard similarity
is calculated. They have also generated benchmark dataset for experimental
work. The algorithm by Bansal et al94 involves the addition or deletion of
edges in the network. The algorithm is built on the greedy agglomerative
technique of the modularity based method earlier proposed in the work of
Clauset et al35 . He et al100 improvised Louvain method36 to include
concept of dynamicity in the formation of communities. A key point in their
algorithm is to make use of previously detected communities at time 𝑑 βˆ’ 1 to
identify the communities at time 𝑑. Dinh et al101 proposed A3CS, an
adaptive framework which uses the power-law distribution and achieves
approximation guarantees for the NP-hard modularity maximization
problem, particularly on dynamic networks. Nguyen et al102 have attempted
to identify disjoint community structure in dynamic social networks. An
adaptive modularity-based framework Quick Community Adaptation (QCA)
is proposed. The method finds and traces the progress of network
communities in dynamic online social networks. Takaffoli et al103 have
proposed a two-step approach to community detection. In the first step the
communities extracted at different time instances are compared using
weighted bipartite matching. Next, a β€˜meta’ community is constructed which
is defined as a series of similar communities at various time instances. Five
events to capture the changes to community are split, survive, dissolve,
merge, and form. A similarity function is used to calculate the similarity
between two communities and a community matching algorithm has been
employed thereafter. The authors, Kim et al104 proposed a particle-and-
density based evolutionary clustering method for discovery of communities
in dynamic networks. Their approach is grounded on the assumption that a
network is built of a number of particles termed as nano-communities,
where each community further is made up of particles termed as quasi-
clique-by-clique (l-KK). The density based clustering method uses cost
embedding technique and optimal modularity method to ensure temporal
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
45
smoothness even when the number of cluster varies. They have used an
information theory based mapping technique to recognize the stages of the
community i.e. evolving, forming or dissolving. Their method improves
accuracy and is time efficient as compared to the FacetNet method proposed
earlier. In another approach proposed by Chi et al105, two frameworks for
evolutionary spectral clustering have been proposed namely PCQ (Preserving
cluster quality) and PCM (Preserving cluster membership). In this work the
temporal smoothness is ensured by some terms in the clustering cost
functions. These two frameworks combine the processes of community
extraction and the community evolution process. They use a cost function
which consists of the snapshot and temporal cost. The clustering quality of
any partition determines the snapshot cost while the temporal cost definition
varies for each of the frameworks. For PCQ framework, the temporal cost is
decided by the cluster quality when the current partition is applied to the
historic data. In PCM, the difference between the current and the historic
partition gives the temporal cost. Both the frameworks proposed, can tackle
the change in number of clusters. In their work DYNMOGA (Dynamic
MultiObjective Genetic Algorithm), the authors Folino et al106 have used a
genetic algorithm based approach to dynamic community detection. They
attempt to achieve temporal smoothness by multiobjectiveoptimisation,
i.e.maximisation of snapshot quality (community score is used) and
minimization of temporal cost (here NMI is used). Kim et al107 in their
method CHRONICLE have performed two stage clustering and the method
can detect clusters of path group type also in addition to the single path type
clusters. In first stage of the algorithm, called as CHRONICLE1st the cosine
similarity measure is used. In second stage of the algorithm the measure
proposed and used is general similarity (GS). It is a combination of the two
measures structural affinity and weight affinity.
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
46
SOME POTENTIAL APPLICATIONS OF COMMUNITY
DETECTION:
With the enormous growth of the social networking site users, the graphs
representing these sites are becoming very complex, hence difficult to
visualize and understand. Communities can be considered as a summary of
the whole network thus making the network easy to comprehend. The
discovery of these communities in social networks can be useful in various
applications. Some of the applications where community detection is useful
are briefly described below.
Improving recommender systems with community detection
Recommender Systems use data of similar users or similar items to
generate recommendations. This is analogous to the identification of groups,
or similar nodes in a graph. Hence community detection holds an immense
potential for recommendation algorithms. Cao et al114 have used a
community detection based approach to improve the traditional collaborative
filtering process of Recommender Systems. The process starts with the
mapping of user-item matrix to user similarity structure. On this matrix, a
discrete PSO (particle swarm optimization) algorithm is applied to detect
communities. The items are then recommended to the user based on the
discovered communities.
Evolution of communities in social media
With the increase in the number of social networking sites, the focus and
scope of sites are getting expanded. The sites are getting diversified in terms
of focus. In addition to common sites like Facebook, Twitter, MySpace and
Bebo, other sites like Flickr for photo-sharing have also come up. The
analysis of the tweetretweet and the follower-followee network in twitter
provides an insight into the community structure existing in the Twitter
network. Sentiment analysis of the tweets may be performed as an
intermediary step to find the general nature of the tweets and then
community detection algorithms may be applied to help deduce the
structure of communities. Zalmout et al115, applied the community
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
47
detection algorithm to UK political tweets dataset. CQA(Community question
answering) has been used by Zhang et al116 to discover overlapping
communities in dynamic networks based on user interactions.
Related Work:
Social Network:
A social network is depicted by social network graph consisting of number
of nodes denoting individuals or the participants in the network. The
connection between node and node is represented by the edge of the graph.
A directed or an undirected graph may illustrate these connections between
the participants of the network. The graph can be represented by an
adjacency matrix in which in case there is an edge between and else.
Social networks follow the properties of complex networks3,4. Some real life
examples1 of social networks include friends based, telephone, email and
collaboration networks. These networks can be represented as graphs and it
is feasible to study and analyse them to find interesting patterns amongst
the entities. These appealing prototypes can be utilized in various useful
applications.
Community:
A community can be defined as a group of entities closer to each other in
comparison to other entities of the dataset. Community is formed by
individuals such that those within a group interact with each other more
frequently than with those outside the group. The closeness between entities
of a group can be measured via similarity or distance measures between
entities. McPherson et al5 stated that β€œsimilarity breeds connection”. They
discussed various social factors which lead to similar behaviour or
homophily in networks. The communities in social networks are analogous
to clusters in networks. An individual represented by a node in graphs may
not be part of just a community or a group, it may be an element of many
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
48
closely associated or different groups existing in the network. For example a
person may concurrently belong to college, school, friends and family
groups. All such communities which have common nodes are called
overlapping communities. Identification and analysis of the community
structure has been done by many researchers applying methodologies from
numerous form of sciences. The quality of clustering in networks is
normally judged by clustering coefficient which is a measure of how much
the vertices of a network tend to cluster together. The global clustering
coefficient6 and the local clustering coefficient7 are two types of clustering
coefficients discussed in literature.
Methods for grouping similar items:
Communities are those parts of the graph which have denser connections
inside and few connections with the rest of the graph8. The aim of
unsupervised learning is to group together similar objects without any prior
knowledge about them. In case of networks, the clustering problem refers to
grouping of nodes according to their similarity computed based on
topological features and/or other characteristics of the graph. Network
partitioning and clustering are two commonly used methods in literature to
find the groups in the social network graph. These methods are briefly
described in the next subsections.
Graph partitioning Graph partitioning is the process of partitioning a graph
into a predefined number of smaller components with specific properties. A
common property to be minimized is called cut size. A cut is a partition of
the vertex set of a graph into two disjoint subsets and the size of the cut is
the number of edges between the
components. A multicut is a set of edges whose removal divides the graph
into two or more components. It is necessary to specify the number of
components one wishes to get in case of graph partitioning. The size of the
components must also be specified, as otherwise a likely but not meaningful
solution would be to put the minimum degree vertex into one component
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
49
and the rest of the vertices into another. Since the number of communities is
usually not known in advance, graph partitioning methods are not suitable
to detect communities in such cases.
Clustering is the process of grouping a set of similar items together in
structures known as clusters. Clustering the social network graph may give
a lot of information about the underlying hidden attributes, relationships
and properties of the participants as well as the interactions among them.
Hierarchical clustering and partitioning method of clustering are the
commonly used clustering techniques used in literature.
In hierarchical clustering, a hierarchy of clusters is formed. The process of
hierarchy creation or levelling can be agglomerative or divisive. In
agglomerative clustering methods, a bottom-up approach to clustering is
followed. A particular node is clubbed or agglomerated with similar nodes to
form a cluster or a community. This aggregation is based on similarity. In
divisive clustering approaches, a large cluster is repeatedly divided into
smaller clusters.
Partitioning methods begin with an initial partition amidst the number of
clusters pre-set and relocation of instances by moving them across clusters,
e.g., K-means clustering. An exhaustive evaluation of all possible partitions
is required to achieve global optimality in partitioned-based clustering. This
is time consuming and sometimes infeasible, hence researchers use greedy
heuristics for iterative optimization in partitioning methods of clustering.
The next section categorizes and discusses major algorithms for community
detection.
ALGORITHMS FOR COMMUNITY DETECTION:
A number of community detection algorithms and methods have been
proposed and deployed for the identification of communities in literature.
There have also been modifications and revisions to many methods and
algorithms already proposed. A comprehensive survey of community
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
50
detection in graphs has been done by Fortunato8 in the year 2010. Other
reviews available in literature are by Coscia et al9 in 2011, Fortunato et al10
in 2012, Porter et al11 in 2009, Danon et al12 in 2005, and PlantiΓ© et al13
in 2013. The presented work reviews the algorithms available till 2015 to the
best of our knowledge including the algorithms given in the earlier surveys.
Papers based on new approaches and techniques like big data, not
discussed by previous authors have been incorporated in our article.The
algorithms for community detection are categorized into approaches based
on graph partitioning, clustering, genetic algorithms, label propagation along
with methods for overlapping community detection (clique based and non-
clique based methods), and community detection for dynamic networks.
Algorithms under each of these categories are described below.
Graph partitioning based community detection:
Graph partitioning based methods have been used in literature to divide the
graph into components such that there are few connections between
components. The Kernighan-Line14 algorithm for graph partitioning was
amongst the earliest techniques to divide a graph. It partitions the nodes of
the graph with cost on edges into subsets of given sizes so as to minimize
the sum of costs on all edges cut. A major disadvantage of this algorithm
however is that the number of groups have to be predefined. The algorithm
however is quite fast with a worst case running time of 𝑂(𝑛2). Newman15
reduces the widely-studied maximum likelihood method for community
detection to a search through a group of candidate solutions, each of which
is itself a solution to a minimum cut graph partitioning problem. The paper
shows that the two most essential community inference methods based on
the stochastic block model or its degree-corrected variant16 can be mapped
onto versions of the familiar minimum-cut graph partitioning problem. This
has been illustrated by adapting Laplacian spectral partitioning method17,
18 to perform community inference.
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
51
Definition of Local Community:
The problem of local community detection is proposed by Clauset [15].
Usually we define the local community problem in the following way: there is
a nondirected graph , represents the set of nodes, and represents the edges
in the graph. The connecting information of partial nodes in the graph is
known or can be obtained. The local community is defined as . The set of
nodes connected with is defined as and the set of nodes in connected with
nodes in is defined as the boundary node set . That is to say, any node in is
connected to one node in , and the rest of is the core node set .
Clustering based community detection:
The main concern of community detection is to detect clusters, groups or
cohesive subgroups. The basis of a large number of community detection
algorithms is clustering. Amongst the innovators of community detection
methods, Girvan and Newman19 had a main role. They proposed a divisive
algorithm based on edge-betweenness for a graph with undirected and
unweighted edges. The algorithm focused on edges that are most β€œbetween”
the communities and communities are constructed progressively by
removing these edges from the original graph. Three different measures for
calculation of edge-betweenness in vertices of a graph were proposed in
Newman and Girvan20. The worst-case time complexity of the edge
betweenness algorithm is 𝑂(π‘š2𝑛) and is 𝑂(𝑛3) for sparse graphs, where m
denotes the number of edges and n is the number of vertices.
The Girvan Newman (GN) algorithm has been enhanced by many authors
and applied to various networks21-28. Chen et al22 extended GN algorithm
to partition weighted graphs and used it to identify functional modules in
the yeast proteome network. Rattigan et al21 proposed the indexing methods
to reduce the computational complexity of the GN algorithm significantly.
Pinney et al24 also build an algorithm which uses GN algorithm for the
decomposition of networks based on graph theoretical concept of
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
52
betweenness centrality. Their paper inspected utility of betweenness
centrality to decompose such networks in diverse ways. Radicchi29 et al also
proposed an algorithm based on GN algorithm introducing a new definition
of community. They defined β€˜strong’ and β€˜weak’ communities. The algorithm
uses an edge clustering coefficient to perform the divisive edge removal step
of GN and has a running time of 𝑂(π‘š4 𝑛2 ) and 𝑂(𝑛2) for sparse graphs.
Moon et al30 have proposed and implemented the parallel version of the GN
algorithm to handle large scale data. They have used MapReduce model
(Apache Hadoop) and GraphChi. Newman and Girvan first defined a
measure known as β€˜modularity’ to judge the quality of partitions or
communities formed20. The modularity measure proposed by them has been
widely accepted and used by researchers to gauge the goodness of the
modules obtained from the community detection algorithms with high
modularity corresponding to a better community structure. Modularity was
defined as βˆ‘π’†π’Šπ’Šπ’Š βˆ’ π’‚π’ŠπŸ , where π’†π’Šπ’Š denotes fraction of the edges that connect
vertices in community π’Š, π’†π’Šπ’‹ denotes fraction of the edges connecting vertices
in two different communities π’Š and 𝒋 while π’‚π’Š = βˆ‘ π’†π’Šπ’‹π’‹ is the fraction of edges
that connect to vertices in community π’Š. The value 𝑄 = 1 indicates a network
with strong community structure. The optimization of modularity function
has received great attention in literature. The table 1 lists clustering based
community detection methods, including algorithms which use modularity
and modularity optimization. Newman31 has worked to maximize
modularity so that the process of aggregating nodes to form communities
leads to maximum modularity gain. This change in modularity upon joining
two communities defined as βˆ†π‘„ = 𝑒𝑖𝑗 + 𝑒𝑗𝑖 βˆ’ 2π‘Žπ‘–π‘Žπ‘— = 2(π‘’π‘–π‘—βˆ’π‘Žπ‘–π‘Žπ‘—) can be
calculated in constant time and hence is faster to execute in comparison to
the GN algorithm. The run time of the algorithm is 𝑂(𝑛2) for sparse graphs
and 𝑂((π‘š + 𝑛)𝑛) for others. In a recent work, a scalable version of this
algorithm has been implemented using MapReduce by Chen et al32.
Newman33 generalized the betweenness algorithm for weighted networks.
The modularity was now represented as 𝑄 = 1 2π‘š βˆ‘ [𝐴𝑖𝑗𝑖𝑗 βˆ’ π‘˜π‘–π‘˜π‘— 2π‘š ]Ξ΄(𝑐𝑖,𝑐𝑗)
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
53
where π‘š = 1 2 βˆ‘ 𝐴𝑖𝑗𝑖𝑗 represents the number of edges between communities
𝑐𝑖 and 𝑐𝑗 in the graph, while π‘˜π‘–,π‘˜π‘— are degrees of vertices 𝑖 and 𝑗 while 𝛿(𝑒,𝑣)
is 1 if 𝑒 = 𝑣 and 0 otherwise. Newman34 in yet another approach
characterised the modularity matrix in terms of eigenvectors. The equation
for modularity was changed to
𝑄 =1 4π‘šπ‘ π‘‡π΅π‘  , where the modularity matrix was given as 𝐡𝑖𝑗 = 𝐴𝑖𝑗 βˆ’ π‘˜π‘–π‘˜π‘— 2π‘š
and modularity was defined using eigenvectors of the modularity
matrix.
Figure 1
Definition of local community:
Local community detection problem is to start from a preselected source
node. It adds the node meeting the conditions in into and removes the node
which does not meet the conditions from D gradually.
2.2. Related Algorithms:
At present, many local community detection algorithms have been proposed.
We introduce two representative local community detection algorithms.
(1)Clauset Algorithm: In order to solve the problem of local community
detection, Clauset [15] put forward the local community modularity R and
gave a fast convergence greedy algorithm to find the local community with
the greatest modularity.
Thedefinition of local community modularity is asfollows
where and represent two nodes in the graph. If nodes and are connected,
the value of is 1; otherwise, it is 0; if nodes and are both in , the value
of is 1; otherwise, it is 0.
The local community detection process of Clauset algorithm is similar to that
of web crawler algorithm. First, Clauset algorithm starts from an initial
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
54
node .Node is added to the subgraph , and all its neighbor nodes are added
to . Then the algorithm adds the node in which can bring the maximum
increment of into the local community iteratively, until the scale of the local
community reaches the preset size. That is to say, the algorithm needs to set
up a parameter to decide the size of the community, and the result is greatly
influenced by the initial node.
(2) LWP Algorithm: LWP [16] algorithm is an improved algorithm and it has
a clear end condition compared with Clauset algorithm. The algorithm
defines another local community modularity , which is expressed
aswhere and represent two nodes in the graph. If nodes and are
connected to each other, the value of is 1; otherwise, it is 0; if
nodes and are both in , the value of is 1; otherwise, it is 0; if only one of
the nodes and is in , the value of is 1; otherwise, it is 0.
Given an undirected and unweighted graph , LWP algorithm starts from an
initial node to find a subgraph with maximum value of . If the subgraph is a
community (i.e., ), then it returns the subgraph as a community. Otherwise,
it is considered that there is no community that can be found starting from
this initial node. For an initial node, LWP algorithm finds a subgraph with
the maximum value of local modularity by two steps. First, the algorithm is
initialized by constructing a subgraph with only an initial node and all the
neighbor nodes of node are added to the set . Then the algorithm performs
incremental step and pruning step.
In the incremental step, the node selected from which can make the local
modularity of increase with the highest value is added to iteratively. The
greedy algorithm will iteratively add nodes in to , until no node in can be
added. In the pruning step, if the local modularity of becomes larger when
removing a node from , then really remove it from . In the process of
pruning, the algorithm must ensure that the connectivity of is not destroyed
until no node can be removed. Then update the set and repeat the two steps
until there is no change in the process. The algorithm has a high Recall, but
its accuracy is low.
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
55
The complexity of these two algorithms is , where is the number of nodes to
be explored in the local community and is the average degree of the nodes to
be explored in the local community.
CHAPTER-2
LITERATURE
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
56
Literature Review:
A wide research study in the recent years focused on community detection in
complex systems [4], most of them focus on undirected networks to enhance
the efficiency of identifying communities in understanding complex
networks. For instances, Fortunato et al [3] based his proposed approach on
statistical inference perspectives, Schaeffer et al [5], proposed their approach
for clustering problem as an unsupervised learning task based on similarity
measure over the data of the network, Girvan and Newman based their
community detection proposal on betweenness calculation to find out
community boundaries where modularity measure is the overall quality of
the graph partitioning [6, 7]. The weight used by Newman and Girvan [7]
aims to be the betweenness measure of the edge, representing the number of
shortest paths connecting any pair of nodes passing through. However,
community detection problem has been studied mainly in case of undirected
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
57
networks, various solutions was proposed in this context, motivating many
disciplines to deal with the issue. Interestingly, Fortunato et al [3] mentioned
the few possibilities for extending techniques from undirected to directed
case, where the edge directedness is not the only complication that could
face the clustering problem. Nevertheless, diverse graph data in many real-
world applications are by nature directed, thus interesting to save available
information behinds the edge directionality. Malliaros et al [8], revealed in
their survey that the most common way for researcher community to deal
with the problem of clustering is to ignore the directionality of the graph,
then proceed to clustering with a wide range of proposed tools. Therefore,
most of community detection proposals can not be used directly on weighted
directed graphs, where the number of communities not always known in
advance and the communities present different granularity scale. Since the
problem of community detection in complex network analysis acquires more
attention, many researchers have been interested into structural information
and topological networks metrics [1, 3, 4, 6, 7, 8, 9]. In [10], S. Ahajjam
based the proposed community detection algorithms on a new scalable
approach using leader nodes characteristics through two steps: (i)
Identification of potential leaders in the network, and (ii) exploration of nodes
similarities around leaders to build communities. Therefore, recent works
start focusing on both topological and topical aspects [9, 11, 12] to overpass
limited performances of topology-based community detection approaches.
Topic-based community detection started gaining attention through different
works for community detection in complex network [9, 13, 14]. The essence
behind the approach is to similarly detect nodes with same properties, which
are not necessarily real connections between nodes of the network, in which
actors communicate on topics of mutual interest [14] to determine the
communities which are topically similar.
The main contributions of this thesis are to exploring and utilizing local
community detection related approaches to improve the services in Online
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
58
SocialNetworking Sites in terms of their recommendation efficiency and
accuracy.
Contributions are listed as follows:
M.E.J.NewmananDM.Girvan[1],”Finding and evaluating community structure
in network”. One of the most relevant features of graphs representing real
systems is community structure, or clustering. Detecting communities is of great
importance in sociology, biology and computer science, disciplines where systems
are often represented as graphs. Community detection is important for
other reasons, too. Identifying modules and their boundaries allows for a
classification of vertices, according to their structural position in the modules. So,
vertices lying at the boundaries between modules play an important role of
mediation and lead the relationships and exchanges between different
communities . Such a classification seems to be meaningful in social and
metabolic networks.
The procedure can be better illustrated by means of dendrograms. Sometimes,
stopping conditions are imposed to select a partition or a group of partitions that
satisfy a special criterion, like a given number of clusters or the optimization of a
quality function.
A simple way to identify communities in a graph is to detect the edges that connect
vertices of different communities and remove them, so that the clusters get
disconnected from each other. This is the philosophy of Divisivealgorithms.
H.-W.Shen and X.-Q.Cheng[3],”Spectral methods for the detection of neteork
community structure”.In this paper Spectral analysis has been successfully
applied at the detection of community structure of networks, respectively being
based on the adjacency matrix, the standard Laplacian matrix, the normalized
Laplacian matrix, the modularity matrix, the correlation matrix and several other
variants of these matrices. However, the comparison between these spectral
methods is less reported. More importantly, it is still unclear which matrix is more
appropriate for the detection of community structure. This paper answers the
question through evaluating the effectiveness of these five matrices against the
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
59
benchmark networks with heterogeneous distributions of node degree and
community size. In this paper, we conduct a comparative analysis of the
aforementioned five matrices on the benchmark networks which have
heterogeneous distributions of node degree and community size. The comparison is
carried out from two perspectives. The former one focuses on whether the number
of intrinsic communities can be exactly identified according to the spectrum of
these five matrices. The latter evaluates the effectiveness of these matrices at
identifying the intrinsic community structure using their eigenvectors
This paper carried out a comparative analysis on the spectral methods for the
detection of network community structure through evaluating the performance of
five widely used matrices on the benchmark networks with heterogeneous
distribution of node degree and community size. These five matrices are
respectively the adjacency matrix, the standard Laplacian matrix, the normalized
Laplacian matrix, the modularity matrix and the correlation matrix. Test results
demonstrate that the normalized Laplacian matrix and the correlation matrix
significantly outperform the other three matrices at identifying the community
structure of networks. This indicates that the heterogeneity of node degree is a
crucial ingredient for the detection of community structure using spectral
methods.
V.D.Blondel,J.Guillaume,R.Lambiotte et al[7],”Fast unfolding of communities in
large networks”. This Paper propose a simple method to extract the community
structure of large networks. Our method is a heuristic method that is based on
modularity optimization. It is shown to outperform all other known community
detection method in terms of computation time. Moreover, the quality of the
communities detected is very good, as measured by the so-called modularity. This
is shown first by identifying language communities in a Belgian mobile phone
network of 2.6 million customers and by analyzing a web graph of 118 million
nodes and more than one billion links. The accuracy of our algorithm is also
verified on ad-hoc modular networks. The problem of community detection
requires the partition of a network into communities of densely connected nodes,
with the nodes belonging to different communities being only sparsely connected.
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
60
This paper introduced an algorithm for optimizing modularity that allows to
study networks of unprecedented size. The limitation of the method for the
experiments that we performed was the storage of the network in main memory
rather than the computation time. The accuracy of our method has also been
tested on ad-hoc modular networks and is shown to be excellent in comparison
with other (much slower) community detection methods. By construction, our
algorithm unfolds a complete hierarchical community structure for the network,
each level of the hierarchy being given by the intermediate partitions found at each
pass.
K.M.Tan,D.Written and A.Shojaie[8],”The cluster graphical lasso for improved
estimation of Gaussian graphical models”. In this paper the task of estimating
a Gaussian graphical model in the high-dimensional setting is considered. The
graphical lasso, which involves maximizing the Gaussian log likelihood subject to a
lasso penalty, is a well-studied approach for this task. A surprising connection
between the graphical lasso and hierarchical clustering is introduced: the
graphical lasso in effect performs a two-step procedure, in which single linkage
hierarchical clustering is performed on the variables in order to identify connected
components, and then a penalized log likelihood is maximized on the subset of
variables within each connected component. Thus, the graphical lasso determines
the connected components of the estimated network via single linkage clustering.
The single linkage clustering is known to perform poorly in certain finite-sample
settings. Therefore, the cluster graphical lasso, which involves clustering the
features using an alternative to single linkage clustering, and then performing the
graphical lasso on the subset of variables within each cluster, is proposed.
We have shown that identifying the connected components of the graphical lasso
solution is equivalent to performing SLC based on S Μƒ , the absolute value of the
empirical covariance matrix. Based on this connection, we have proposed the
cluster graphical lasso, an improved version of the graphical lasso for sparse
inverse covariance estimation. In this paper, we have considered the use of
hierarchical clustering in the CGL procedure. We have shown that performing
hierarchical clustering on S Μƒ leads to consistent cluster recovery. As a byproduct,
Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
61
we suggest a choice of Ξ»1, …, Ξ»K in CGL that yields consistent identification of the
connected components. In addition, we establish the model selection consistency of
CGL.
Y.J.Wu,H.Huang,Z.F.Hao and F.Chen[17],”Local Community Detection using
link similarity”.In this paper Exploring local community structure is an
appealing problem that has drawn much recent attention in the area of social
network analysis. The existing approaches do well in measuring the community
quality, but they are largely dependent on source vertex and putting too strict
policy in agglomerating new vertices. Moreover, they have parameters which are
difficult to obtain. This paper proposes a method toΒ―find local community structure
by analyzing link similarity between the community and the vertex. Inspired by the
fact that elements in the same community are more likely to share common links,
we explore community structure heuristically by giving priority to vertices which
have a high link similarity with the community. A three-phase process is also used
for the sake of improving quality of community structure. Experimental results
prove that our method performs efectively not only in computer-generated graphs
but alsoin real-world graphs.
In this paper we present a method that mainly depends on link similarity
between the vertex and community to explore local community. This method
searches the potential vertices in a specific sequence so as to help improve the
accuracy. In this paper, we pro-posed an improved method to detect local
community structure. Our algorithm mainly takes advantage of link similarity
between the vertex and the community.A greedy agglomeration phase, an
optimization phase and a trimming phase are included in our algorithm.Compared
with other multi-phase algorithms, our al-gorithm implements comparatively
easier and stricter stopping criteria for the purpose of simplifying and op-timizing
our algorithm. Experimental results show that our algorithm can discover local
community better than other existing methods both in computer-generated
benchmark graphs and in real-world networks.
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster

More Related Content

What's hot

Intelligent Handwritten Digit Recognition using Artificial Neural Network
Intelligent Handwritten Digit Recognition using Artificial Neural NetworkIntelligent Handwritten Digit Recognition using Artificial Neural Network
Intelligent Handwritten Digit Recognition using Artificial Neural NetworkIJERA Editor
Β 
A Review on Text Mining in Data Mining
A Review on Text Mining in Data MiningA Review on Text Mining in Data Mining
A Review on Text Mining in Data Miningijsc
Β 
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANSCONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANSijseajournal
Β 
Using Cisco Network Components to Improve NIDPS Performance
Using Cisco Network Components to Improve NIDPS Performance Using Cisco Network Components to Improve NIDPS Performance
Using Cisco Network Components to Improve NIDPS Performance csandit
Β 
Introduction to Few shot learning
Introduction to Few shot learningIntroduction to Few shot learning
Introduction to Few shot learningRidge-i, Inc.
Β 
Proposing a new method of image classification based on the AdaBoost deep bel...
Proposing a new method of image classification based on the AdaBoost deep bel...Proposing a new method of image classification based on the AdaBoost deep bel...
Proposing a new method of image classification based on the AdaBoost deep bel...TELKOMNIKA JOURNAL
Β 
A new clutering approach for anomaly intrusion detection
A new clutering approach for anomaly intrusion detectionA new clutering approach for anomaly intrusion detection
A new clutering approach for anomaly intrusion detectionIJDKP
Β 
A Stacked Generalization Ensemble Approach for Improved Intrusion Detection
A Stacked Generalization Ensemble Approach for Improved Intrusion DetectionA Stacked Generalization Ensemble Approach for Improved Intrusion Detection
A Stacked Generalization Ensemble Approach for Improved Intrusion DetectionIJCSIS Research Publications
Β 
Handwriting identification using deep convolutional neural network method
Handwriting identification using deep convolutional neural network methodHandwriting identification using deep convolutional neural network method
Handwriting identification using deep convolutional neural network methodTELKOMNIKA JOURNAL
Β 
Predicting Fault-Prone Files using Machine Learning
Predicting Fault-Prone Files using Machine LearningPredicting Fault-Prone Files using Machine Learning
Predicting Fault-Prone Files using Machine LearningGuido A. Ciollaro
Β 
Advances of neural networks in 2020
Advances of neural networks in 2020Advances of neural networks in 2020
Advances of neural networks in 2020kevig
Β 
Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...
Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...
Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...Sarvesh Kumar
Β 
GUI based handwritten digit recognition using CNN
GUI based handwritten digit recognition using CNNGUI based handwritten digit recognition using CNN
GUI based handwritten digit recognition using CNNAbhishek Tiwari
Β 
IRJET - Explicit Content Detection using Faster R-CNN and SSD Mobilenet V2
 IRJET - Explicit Content Detection using Faster R-CNN and SSD Mobilenet V2 IRJET - Explicit Content Detection using Faster R-CNN and SSD Mobilenet V2
IRJET - Explicit Content Detection using Faster R-CNN and SSD Mobilenet V2IRJET Journal
Β 
APPLICATION OF ARTIFICIAL NEURAL NETWORKS IN ESTIMATING PARTICIPATION IN ELEC...
APPLICATION OF ARTIFICIAL NEURAL NETWORKS IN ESTIMATING PARTICIPATION IN ELEC...APPLICATION OF ARTIFICIAL NEURAL NETWORKS IN ESTIMATING PARTICIPATION IN ELEC...
APPLICATION OF ARTIFICIAL NEURAL NETWORKS IN ESTIMATING PARTICIPATION IN ELEC...Zac Darcy
Β 

What's hot (19)

Intelligent Handwritten Digit Recognition using Artificial Neural Network
Intelligent Handwritten Digit Recognition using Artificial Neural NetworkIntelligent Handwritten Digit Recognition using Artificial Neural Network
Intelligent Handwritten Digit Recognition using Artificial Neural Network
Β 
A Review on Text Mining in Data Mining
A Review on Text Mining in Data MiningA Review on Text Mining in Data Mining
A Review on Text Mining in Data Mining
Β 
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANSCONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
Β 
J017446568
J017446568J017446568
J017446568
Β 
Seminar5
Seminar5Seminar5
Seminar5
Β 
Using Cisco Network Components to Improve NIDPS Performance
Using Cisco Network Components to Improve NIDPS Performance Using Cisco Network Components to Improve NIDPS Performance
Using Cisco Network Components to Improve NIDPS Performance
Β 
Introduction to Few shot learning
Introduction to Few shot learningIntroduction to Few shot learning
Introduction to Few shot learning
Β 
Proposing a new method of image classification based on the AdaBoost deep bel...
Proposing a new method of image classification based on the AdaBoost deep bel...Proposing a new method of image classification based on the AdaBoost deep bel...
Proposing a new method of image classification based on the AdaBoost deep bel...
Β 
A new clutering approach for anomaly intrusion detection
A new clutering approach for anomaly intrusion detectionA new clutering approach for anomaly intrusion detection
A new clutering approach for anomaly intrusion detection
Β 
A Stacked Generalization Ensemble Approach for Improved Intrusion Detection
A Stacked Generalization Ensemble Approach for Improved Intrusion DetectionA Stacked Generalization Ensemble Approach for Improved Intrusion Detection
A Stacked Generalization Ensemble Approach for Improved Intrusion Detection
Β 
Handwriting identification using deep convolutional neural network method
Handwriting identification using deep convolutional neural network methodHandwriting identification using deep convolutional neural network method
Handwriting identification using deep convolutional neural network method
Β 
Predicting Fault-Prone Files using Machine Learning
Predicting Fault-Prone Files using Machine LearningPredicting Fault-Prone Files using Machine Learning
Predicting Fault-Prone Files using Machine Learning
Β 
Advances of neural networks in 2020
Advances of neural networks in 2020Advances of neural networks in 2020
Advances of neural networks in 2020
Β 
JOSA TechTalks - Machine Learning in Practice
JOSA TechTalks - Machine Learning in PracticeJOSA TechTalks - Machine Learning in Practice
JOSA TechTalks - Machine Learning in Practice
Β 
Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...
Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...
Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...
Β 
GUI based handwritten digit recognition using CNN
GUI based handwritten digit recognition using CNNGUI based handwritten digit recognition using CNN
GUI based handwritten digit recognition using CNN
Β 
IRJET - Explicit Content Detection using Faster R-CNN and SSD Mobilenet V2
 IRJET - Explicit Content Detection using Faster R-CNN and SSD Mobilenet V2 IRJET - Explicit Content Detection using Faster R-CNN and SSD Mobilenet V2
IRJET - Explicit Content Detection using Faster R-CNN and SSD Mobilenet V2
Β 
5th sem
5th sem5th sem
5th sem
Β 
APPLICATION OF ARTIFICIAL NEURAL NETWORKS IN ESTIMATING PARTICIPATION IN ELEC...
APPLICATION OF ARTIFICIAL NEURAL NETWORKS IN ESTIMATING PARTICIPATION IN ELEC...APPLICATION OF ARTIFICIAL NEURAL NETWORKS IN ESTIMATING PARTICIPATION IN ELEC...
APPLICATION OF ARTIFICIAL NEURAL NETWORKS IN ESTIMATING PARTICIPATION IN ELEC...
Β 

Similar to 5.local community detection algorithm based on minimal cluster

LOAD BALANCED CLUSTERING WITH MIMO UPLOADING TECHNIQUE FOR MOBILE DATA GATHER...
LOAD BALANCED CLUSTERING WITH MIMO UPLOADING TECHNIQUE FOR MOBILE DATA GATHER...LOAD BALANCED CLUSTERING WITH MIMO UPLOADING TECHNIQUE FOR MOBILE DATA GATHER...
LOAD BALANCED CLUSTERING WITH MIMO UPLOADING TECHNIQUE FOR MOBILE DATA GATHER...Munisekhar Gunapati
Β 
Shreyan cv4apr2020 (1)
Shreyan cv4apr2020 (1)Shreyan cv4apr2020 (1)
Shreyan cv4apr2020 (1)ShreyanDatta
Β 
A1 Resume-DRUdaykumar Naik
A1 Resume-DRUdaykumar NaikA1 Resume-DRUdaykumar Naik
A1 Resume-DRUdaykumar NaikUdayakumar Naik
Β 
Introduction of Wireless Sensor Network
Introduction of Wireless Sensor NetworkIntroduction of Wireless Sensor Network
Introduction of Wireless Sensor NetworkMuhammad Kaife Uddin
Β 
Nishant_poras_13-08-1991
Nishant_poras_13-08-1991Nishant_poras_13-08-1991
Nishant_poras_13-08-1991Nishant Poras
Β 
A PROJECT REPORT ON FACE RECOGNITION SYSTEM WITH FACE DETECTION
A PROJECT REPORT ON FACE RECOGNITION SYSTEM WITH FACE DETECTIONA PROJECT REPORT ON FACE RECOGNITION SYSTEM WITH FACE DETECTION
A PROJECT REPORT ON FACE RECOGNITION SYSTEM WITH FACE DETECTIONScott Donald
Β 
Social Networking Site in JAVA
Social Networking Site in JAVASocial Networking Site in JAVA
Social Networking Site in JAVAPAS Softech Pvt. Ltd.
Β 
Handwritten_Equation_Project_PDF.pdf handwritten equation
Handwritten_Equation_Project_PDF.pdf handwritten equationHandwritten_Equation_Project_PDF.pdf handwritten equation
Handwritten_Equation_Project_PDF.pdf handwritten equationNagavelliMadhavi
Β 
VTU FINAL YEAR PROJECT REPORT Front pages
VTU FINAL YEAR PROJECT REPORT Front pagesVTU FINAL YEAR PROJECT REPORT Front pages
VTU FINAL YEAR PROJECT REPORT Front pagesathiathi3
Β 
Mitul Nagar Resume for CR Report2014-2015
Mitul Nagar Resume for CR Report2014-2015Mitul Nagar Resume for CR Report2014-2015
Mitul Nagar Resume for CR Report2014-2015Mitul Nagar
Β 
Manjushree_EC_fresher_2016
Manjushree_EC_fresher_2016Manjushree_EC_fresher_2016
Manjushree_EC_fresher_2016Manjushree Mashal
Β 
FINAL PROJECT REPORT
FINAL PROJECT REPORTFINAL PROJECT REPORT
FINAL PROJECT REPORTSoham Wadekar
Β 
Praveen Resume 31oct
Praveen Resume 31octPraveen Resume 31oct
Praveen Resume 31octPraveen Kumar
Β 
Sample projectdocumentation
Sample projectdocumentationSample projectdocumentation
Sample projectdocumentationhlksd
Β 
NBA power point presentation final copy y
NBA power point presentation final copy yNBA power point presentation final copy y
NBA power point presentation final copy ysrajece
Β 
RAJALEKSHMI SANAL_RESUME
RAJALEKSHMI SANAL_RESUMERAJALEKSHMI SANAL_RESUME
RAJALEKSHMI SANAL_RESUMERajalekshmi Sanal
Β 
Megha Dogra1
Megha Dogra1Megha Dogra1
Megha Dogra1MEGHA DOGRA
Β 

Similar to 5.local community detection algorithm based on minimal cluster (20)

LOAD BALANCED CLUSTERING WITH MIMO UPLOADING TECHNIQUE FOR MOBILE DATA GATHER...
LOAD BALANCED CLUSTERING WITH MIMO UPLOADING TECHNIQUE FOR MOBILE DATA GATHER...LOAD BALANCED CLUSTERING WITH MIMO UPLOADING TECHNIQUE FOR MOBILE DATA GATHER...
LOAD BALANCED CLUSTERING WITH MIMO UPLOADING TECHNIQUE FOR MOBILE DATA GATHER...
Β 
Shreyan cv4apr2020 (1)
Shreyan cv4apr2020 (1)Shreyan cv4apr2020 (1)
Shreyan cv4apr2020 (1)
Β 
A1 Resume-DRUdaykumar Naik
A1 Resume-DRUdaykumar NaikA1 Resume-DRUdaykumar Naik
A1 Resume-DRUdaykumar Naik
Β 
Introduction of Wireless Sensor Network
Introduction of Wireless Sensor NetworkIntroduction of Wireless Sensor Network
Introduction of Wireless Sensor Network
Β 
Nishant_poras_13-08-1991
Nishant_poras_13-08-1991Nishant_poras_13-08-1991
Nishant_poras_13-08-1991
Β 
A PROJECT REPORT ON FACE RECOGNITION SYSTEM WITH FACE DETECTION
A PROJECT REPORT ON FACE RECOGNITION SYSTEM WITH FACE DETECTIONA PROJECT REPORT ON FACE RECOGNITION SYSTEM WITH FACE DETECTION
A PROJECT REPORT ON FACE RECOGNITION SYSTEM WITH FACE DETECTION
Β 
Thesis
ThesisThesis
Thesis
Β 
Social Networking Site in JAVA
Social Networking Site in JAVASocial Networking Site in JAVA
Social Networking Site in JAVA
Β 
Handwritten_Equation_Project_PDF.pdf handwritten equation
Handwritten_Equation_Project_PDF.pdf handwritten equationHandwritten_Equation_Project_PDF.pdf handwritten equation
Handwritten_Equation_Project_PDF.pdf handwritten equation
Β 
VTU FINAL YEAR PROJECT REPORT Front pages
VTU FINAL YEAR PROJECT REPORT Front pagesVTU FINAL YEAR PROJECT REPORT Front pages
VTU FINAL YEAR PROJECT REPORT Front pages
Β 
resume
resumeresume
resume
Β 
Mitul Nagar Resume for CR Report2014-2015
Mitul Nagar Resume for CR Report2014-2015Mitul Nagar Resume for CR Report2014-2015
Mitul Nagar Resume for CR Report2014-2015
Β 
Manjushree_EC_fresher_2016
Manjushree_EC_fresher_2016Manjushree_EC_fresher_2016
Manjushree_EC_fresher_2016
Β 
FINAL PROJECT REPORT
FINAL PROJECT REPORTFINAL PROJECT REPORT
FINAL PROJECT REPORT
Β 
Praveen Resume 31oct
Praveen Resume 31octPraveen Resume 31oct
Praveen Resume 31oct
Β 
Sample projectdocumentation
Sample projectdocumentationSample projectdocumentation
Sample projectdocumentation
Β 
NBA power point presentation final copy y
NBA power point presentation final copy yNBA power point presentation final copy y
NBA power point presentation final copy y
Β 
drc 3-2.pptx
drc 3-2.pptxdrc 3-2.pptx
drc 3-2.pptx
Β 
RAJALEKSHMI SANAL_RESUME
RAJALEKSHMI SANAL_RESUMERAJALEKSHMI SANAL_RESUME
RAJALEKSHMI SANAL_RESUME
Β 
Megha Dogra1
Megha Dogra1Megha Dogra1
Megha Dogra1
Β 

More from Venkat Projects

1.AUTOMATIC DETECTION OF DIABETIC RETINOPATHY USING CNN.docx
1.AUTOMATIC DETECTION OF DIABETIC RETINOPATHY USING CNN.docx1.AUTOMATIC DETECTION OF DIABETIC RETINOPATHY USING CNN.docx
1.AUTOMATIC DETECTION OF DIABETIC RETINOPATHY USING CNN.docxVenkat Projects
Β 
12.BLOCKCHAIN BASED MILK DELIVERY PLATFORM FOR STALLHOLDER DAIRY FARMERS IN K...
12.BLOCKCHAIN BASED MILK DELIVERY PLATFORM FOR STALLHOLDER DAIRY FARMERS IN K...12.BLOCKCHAIN BASED MILK DELIVERY PLATFORM FOR STALLHOLDER DAIRY FARMERS IN K...
12.BLOCKCHAIN BASED MILK DELIVERY PLATFORM FOR STALLHOLDER DAIRY FARMERS IN K...Venkat Projects
Β 
10.ATTENDANCE CAPTURE SYSTEM USING FACE RECOGNITION.docx
10.ATTENDANCE CAPTURE SYSTEM USING FACE RECOGNITION.docx10.ATTENDANCE CAPTURE SYSTEM USING FACE RECOGNITION.docx
10.ATTENDANCE CAPTURE SYSTEM USING FACE RECOGNITION.docxVenkat Projects
Β 
9.IMPLEMENTATION OF BLOCKCHAIN IN FINANCIAL SECTOR TO IMPROVE SCALABILITY.docx
9.IMPLEMENTATION OF BLOCKCHAIN IN FINANCIAL SECTOR TO IMPROVE SCALABILITY.docx9.IMPLEMENTATION OF BLOCKCHAIN IN FINANCIAL SECTOR TO IMPROVE SCALABILITY.docx
9.IMPLEMENTATION OF BLOCKCHAIN IN FINANCIAL SECTOR TO IMPROVE SCALABILITY.docxVenkat Projects
Β 
8.Geo Tracking Of Waste And Triggering Alerts And Mapping Areas With High Was...
8.Geo Tracking Of Waste And Triggering Alerts And Mapping Areas With High Was...8.Geo Tracking Of Waste And Triggering Alerts And Mapping Areas With High Was...
8.Geo Tracking Of Waste And Triggering Alerts And Mapping Areas With High Was...Venkat Projects
Β 
Image Forgery Detection Based on Fusion of Lightweight Deep Learning Models.docx
Image Forgery Detection Based on Fusion of Lightweight Deep Learning Models.docxImage Forgery Detection Based on Fusion of Lightweight Deep Learning Models.docx
Image Forgery Detection Based on Fusion of Lightweight Deep Learning Models.docxVenkat Projects
Β 
6.A FOREST FIRE IDENTIFICATION METHOD FOR UNMANNED AERIAL VEHICLE MONITORING ...
6.A FOREST FIRE IDENTIFICATION METHOD FOR UNMANNED AERIAL VEHICLE MONITORING ...6.A FOREST FIRE IDENTIFICATION METHOD FOR UNMANNED AERIAL VEHICLE MONITORING ...
6.A FOREST FIRE IDENTIFICATION METHOD FOR UNMANNED AERIAL VEHICLE MONITORING ...Venkat Projects
Β 
WATERMARKING IMAGES
WATERMARKING IMAGESWATERMARKING IMAGES
WATERMARKING IMAGESVenkat Projects
Β 
4.LOCAL DYNAMIC NEIGHBORHOOD BASED OUTLIER DETECTION APPROACH AND ITS FRAMEWO...
4.LOCAL DYNAMIC NEIGHBORHOOD BASED OUTLIER DETECTION APPROACH AND ITS FRAMEWO...4.LOCAL DYNAMIC NEIGHBORHOOD BASED OUTLIER DETECTION APPROACH AND ITS FRAMEWO...
4.LOCAL DYNAMIC NEIGHBORHOOD BASED OUTLIER DETECTION APPROACH AND ITS FRAMEWO...Venkat Projects
Β 
Application and evaluation of a K-Medoidsbased shape clustering method for an...
Application and evaluation of a K-Medoidsbased shape clustering method for an...Application and evaluation of a K-Medoidsbased shape clustering method for an...
Application and evaluation of a K-Medoidsbased shape clustering method for an...Venkat Projects
Β 
OPTIMISED STACKED ENSEMBLE TECHNIQUES IN THE PREDICTION OF CERVICAL CANCER US...
OPTIMISED STACKED ENSEMBLE TECHNIQUES IN THE PREDICTION OF CERVICAL CANCER US...OPTIMISED STACKED ENSEMBLE TECHNIQUES IN THE PREDICTION OF CERVICAL CANCER US...
OPTIMISED STACKED ENSEMBLE TECHNIQUES IN THE PREDICTION OF CERVICAL CANCER US...Venkat Projects
Β 
1.AUTOMATIC DETECTION OF DIABETIC RETINOPATHY USING CNN.docx
1.AUTOMATIC DETECTION OF DIABETIC RETINOPATHY USING CNN.docx1.AUTOMATIC DETECTION OF DIABETIC RETINOPATHY USING CNN.docx
1.AUTOMATIC DETECTION OF DIABETIC RETINOPATHY USING CNN.docxVenkat Projects
Β 
2022 PYTHON MAJOR PROJECTS LIST.docx
2022 PYTHON MAJOR  PROJECTS LIST.docx2022 PYTHON MAJOR  PROJECTS LIST.docx
2022 PYTHON MAJOR PROJECTS LIST.docxVenkat Projects
Β 
2022 PYTHON PROJECTS LIST.docx
2022 PYTHON PROJECTS LIST.docx2022 PYTHON PROJECTS LIST.docx
2022 PYTHON PROJECTS LIST.docxVenkat Projects
Β 
2021 PYTHON PROJECTS LIST.docx
2021 PYTHON PROJECTS LIST.docx2021 PYTHON PROJECTS LIST.docx
2021 PYTHON PROJECTS LIST.docxVenkat Projects
Β 
2021 python projects list
2021 python projects list2021 python projects list
2021 python projects listVenkat Projects
Β 
10.sentiment analysis of customer product reviews using machine learni
10.sentiment analysis of customer product reviews using machine learni10.sentiment analysis of customer product reviews using machine learni
10.sentiment analysis of customer product reviews using machine learniVenkat Projects
Β 
9.data analysis for understanding the impact of covid–19 vaccinations on the ...
9.data analysis for understanding the impact of covid–19 vaccinations on the ...9.data analysis for understanding the impact of covid–19 vaccinations on the ...
9.data analysis for understanding the impact of covid–19 vaccinations on the ...Venkat Projects
Β 
6.iris recognition using machine learning technique
6.iris recognition using machine learning technique6.iris recognition using machine learning technique
6.iris recognition using machine learning techniqueVenkat Projects
Β 
4.detection of fake news through implementation of data science application
4.detection of fake news through implementation of data science application4.detection of fake news through implementation of data science application
4.detection of fake news through implementation of data science applicationVenkat Projects
Β 

More from Venkat Projects (20)

1.AUTOMATIC DETECTION OF DIABETIC RETINOPATHY USING CNN.docx
1.AUTOMATIC DETECTION OF DIABETIC RETINOPATHY USING CNN.docx1.AUTOMATIC DETECTION OF DIABETIC RETINOPATHY USING CNN.docx
1.AUTOMATIC DETECTION OF DIABETIC RETINOPATHY USING CNN.docx
Β 
12.BLOCKCHAIN BASED MILK DELIVERY PLATFORM FOR STALLHOLDER DAIRY FARMERS IN K...
12.BLOCKCHAIN BASED MILK DELIVERY PLATFORM FOR STALLHOLDER DAIRY FARMERS IN K...12.BLOCKCHAIN BASED MILK DELIVERY PLATFORM FOR STALLHOLDER DAIRY FARMERS IN K...
12.BLOCKCHAIN BASED MILK DELIVERY PLATFORM FOR STALLHOLDER DAIRY FARMERS IN K...
Β 
10.ATTENDANCE CAPTURE SYSTEM USING FACE RECOGNITION.docx
10.ATTENDANCE CAPTURE SYSTEM USING FACE RECOGNITION.docx10.ATTENDANCE CAPTURE SYSTEM USING FACE RECOGNITION.docx
10.ATTENDANCE CAPTURE SYSTEM USING FACE RECOGNITION.docx
Β 
9.IMPLEMENTATION OF BLOCKCHAIN IN FINANCIAL SECTOR TO IMPROVE SCALABILITY.docx
9.IMPLEMENTATION OF BLOCKCHAIN IN FINANCIAL SECTOR TO IMPROVE SCALABILITY.docx9.IMPLEMENTATION OF BLOCKCHAIN IN FINANCIAL SECTOR TO IMPROVE SCALABILITY.docx
9.IMPLEMENTATION OF BLOCKCHAIN IN FINANCIAL SECTOR TO IMPROVE SCALABILITY.docx
Β 
8.Geo Tracking Of Waste And Triggering Alerts And Mapping Areas With High Was...
8.Geo Tracking Of Waste And Triggering Alerts And Mapping Areas With High Was...8.Geo Tracking Of Waste And Triggering Alerts And Mapping Areas With High Was...
8.Geo Tracking Of Waste And Triggering Alerts And Mapping Areas With High Was...
Β 
Image Forgery Detection Based on Fusion of Lightweight Deep Learning Models.docx
Image Forgery Detection Based on Fusion of Lightweight Deep Learning Models.docxImage Forgery Detection Based on Fusion of Lightweight Deep Learning Models.docx
Image Forgery Detection Based on Fusion of Lightweight Deep Learning Models.docx
Β 
6.A FOREST FIRE IDENTIFICATION METHOD FOR UNMANNED AERIAL VEHICLE MONITORING ...
6.A FOREST FIRE IDENTIFICATION METHOD FOR UNMANNED AERIAL VEHICLE MONITORING ...6.A FOREST FIRE IDENTIFICATION METHOD FOR UNMANNED AERIAL VEHICLE MONITORING ...
6.A FOREST FIRE IDENTIFICATION METHOD FOR UNMANNED AERIAL VEHICLE MONITORING ...
Β 
WATERMARKING IMAGES
WATERMARKING IMAGESWATERMARKING IMAGES
WATERMARKING IMAGES
Β 
4.LOCAL DYNAMIC NEIGHBORHOOD BASED OUTLIER DETECTION APPROACH AND ITS FRAMEWO...
4.LOCAL DYNAMIC NEIGHBORHOOD BASED OUTLIER DETECTION APPROACH AND ITS FRAMEWO...4.LOCAL DYNAMIC NEIGHBORHOOD BASED OUTLIER DETECTION APPROACH AND ITS FRAMEWO...
4.LOCAL DYNAMIC NEIGHBORHOOD BASED OUTLIER DETECTION APPROACH AND ITS FRAMEWO...
Β 
Application and evaluation of a K-Medoidsbased shape clustering method for an...
Application and evaluation of a K-Medoidsbased shape clustering method for an...Application and evaluation of a K-Medoidsbased shape clustering method for an...
Application and evaluation of a K-Medoidsbased shape clustering method for an...
Β 
OPTIMISED STACKED ENSEMBLE TECHNIQUES IN THE PREDICTION OF CERVICAL CANCER US...
OPTIMISED STACKED ENSEMBLE TECHNIQUES IN THE PREDICTION OF CERVICAL CANCER US...OPTIMISED STACKED ENSEMBLE TECHNIQUES IN THE PREDICTION OF CERVICAL CANCER US...
OPTIMISED STACKED ENSEMBLE TECHNIQUES IN THE PREDICTION OF CERVICAL CANCER US...
Β 
1.AUTOMATIC DETECTION OF DIABETIC RETINOPATHY USING CNN.docx
1.AUTOMATIC DETECTION OF DIABETIC RETINOPATHY USING CNN.docx1.AUTOMATIC DETECTION OF DIABETIC RETINOPATHY USING CNN.docx
1.AUTOMATIC DETECTION OF DIABETIC RETINOPATHY USING CNN.docx
Β 
2022 PYTHON MAJOR PROJECTS LIST.docx
2022 PYTHON MAJOR  PROJECTS LIST.docx2022 PYTHON MAJOR  PROJECTS LIST.docx
2022 PYTHON MAJOR PROJECTS LIST.docx
Β 
2022 PYTHON PROJECTS LIST.docx
2022 PYTHON PROJECTS LIST.docx2022 PYTHON PROJECTS LIST.docx
2022 PYTHON PROJECTS LIST.docx
Β 
2021 PYTHON PROJECTS LIST.docx
2021 PYTHON PROJECTS LIST.docx2021 PYTHON PROJECTS LIST.docx
2021 PYTHON PROJECTS LIST.docx
Β 
2021 python projects list
2021 python projects list2021 python projects list
2021 python projects list
Β 
10.sentiment analysis of customer product reviews using machine learni
10.sentiment analysis of customer product reviews using machine learni10.sentiment analysis of customer product reviews using machine learni
10.sentiment analysis of customer product reviews using machine learni
Β 
9.data analysis for understanding the impact of covid–19 vaccinations on the ...
9.data analysis for understanding the impact of covid–19 vaccinations on the ...9.data analysis for understanding the impact of covid–19 vaccinations on the ...
9.data analysis for understanding the impact of covid–19 vaccinations on the ...
Β 
6.iris recognition using machine learning technique
6.iris recognition using machine learning technique6.iris recognition using machine learning technique
6.iris recognition using machine learning technique
Β 
4.detection of fake news through implementation of data science application
4.detection of fake news through implementation of data science application4.detection of fake news through implementation of data science application
4.detection of fake news through implementation of data science application
Β 

Recently uploaded

2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projects2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projectssmsksolar
Β 
Top Rated Call Girls In chittoor πŸ“± {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor πŸ“± {7001035870} VIP Escorts chittoorTop Rated Call Girls In chittoor πŸ“± {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor πŸ“± {7001035870} VIP Escorts chittoordharasingh5698
Β 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfRagavanV2
Β 
Call Girls in Netaji Nagar, Delhi πŸ’― Call Us πŸ”9953056974 πŸ” Escort Service
Call Girls in Netaji Nagar, Delhi πŸ’― Call Us πŸ”9953056974 πŸ” Escort ServiceCall Girls in Netaji Nagar, Delhi πŸ’― Call Us πŸ”9953056974 πŸ” Escort Service
Call Girls in Netaji Nagar, Delhi πŸ’― Call Us πŸ”9953056974 πŸ” Escort Service9953056974 Low Rate Call Girls In Saket, Delhi NCR
Β 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTbhaskargani46
Β 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . pptDineshKumar4165
Β 
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...tanu pandey
Β 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756dollysharma2066
Β 
Call Girls in Ramesh Nagar Delhi πŸ’― Call Us πŸ”9953056974 πŸ” Escort Service
Call Girls in Ramesh Nagar Delhi πŸ’― Call Us πŸ”9953056974 πŸ” Escort ServiceCall Girls in Ramesh Nagar Delhi πŸ’― Call Us πŸ”9953056974 πŸ” Escort Service
Call Girls in Ramesh Nagar Delhi πŸ’― Call Us πŸ”9953056974 πŸ” Escort Service9953056974 Low Rate Call Girls In Saket, Delhi NCR
Β 
Call Girls In Bangalore ☎ 7737669865 πŸ₯΅ Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 πŸ₯΅ Book Your One night StandCall Girls In Bangalore ☎ 7737669865 πŸ₯΅ Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 πŸ₯΅ Book Your One night Standamitlee9823
Β 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdfKamal Acharya
Β 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptMsecMca
Β 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsArindam Chakraborty, Ph.D., P.E. (CA, TX)
Β 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptDineshKumar4165
Β 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptxJIT KUMAR GUPTA
Β 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdfKamal Acharya
Β 
22-prompt engineering noted slide shown.pdf
22-prompt engineering noted slide shown.pdf22-prompt engineering noted slide shown.pdf
22-prompt engineering noted slide shown.pdf203318pmpc
Β 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationBhangaleSonal
Β 

Recently uploaded (20)

Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
Β 
2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projects2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projects
Β 
Top Rated Call Girls In chittoor πŸ“± {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor πŸ“± {7001035870} VIP Escorts chittoorTop Rated Call Girls In chittoor πŸ“± {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor πŸ“± {7001035870} VIP Escorts chittoor
Β 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
Β 
Call Girls in Netaji Nagar, Delhi πŸ’― Call Us πŸ”9953056974 πŸ” Escort Service
Call Girls in Netaji Nagar, Delhi πŸ’― Call Us πŸ”9953056974 πŸ” Escort ServiceCall Girls in Netaji Nagar, Delhi πŸ’― Call Us πŸ”9953056974 πŸ” Escort Service
Call Girls in Netaji Nagar, Delhi πŸ’― Call Us πŸ”9953056974 πŸ” Escort Service
Β 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
Β 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
Β 
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
Β 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
Β 
Call Girls in Ramesh Nagar Delhi πŸ’― Call Us πŸ”9953056974 πŸ” Escort Service
Call Girls in Ramesh Nagar Delhi πŸ’― Call Us πŸ”9953056974 πŸ” Escort ServiceCall Girls in Ramesh Nagar Delhi πŸ’― Call Us πŸ”9953056974 πŸ” Escort Service
Call Girls in Ramesh Nagar Delhi πŸ’― Call Us πŸ”9953056974 πŸ” Escort Service
Β 
Call Girls In Bangalore ☎ 7737669865 πŸ₯΅ Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 πŸ₯΅ Book Your One night StandCall Girls In Bangalore ☎ 7737669865 πŸ₯΅ Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 πŸ₯΅ Book Your One night Stand
Β 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdf
Β 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
Β 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
Β 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
Β 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
Β 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
Β 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
Β 
22-prompt engineering noted slide shown.pdf
22-prompt engineering noted slide shown.pdf22-prompt engineering noted slide shown.pdf
22-prompt engineering noted slide shown.pdf
Β 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
Β 

5.local community detection algorithm based on minimal cluster

  • 1. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 1 A Project Thesis on Local Cluster-Based Community Detection Algorithm Project work submitted in partial fulfillment of the requirements for the award of the degree of MASTER OF COMPUTER APPLICATIONS in COMPUTER SCIENCE AND ENGINEERING Submittedby Regalla Sairam Reddy [Roll No.17021F0005] Under the supervision of DR.M.H.M KRISHNA PRASAD DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING UNIVERSITY COLLEGE OF ENGINEERING KAKINADA (A) JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY KAKINADA KAKINADA-533003, A.P, INDIA 2017-2020
  • 2. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 2 DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING UNIVERSITY COLLEGE OF ENGINEERING KAKINADA (A) JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY KAKINADA KAKINADA-533003, A.P, INDIA 2017-2020 DECLARATION BY THE STUDENT I hereby declare that the work described in this thesis, entitled β€œA LiteratureStudy and Local Cluster-Based Community Detection Algorithm”, which is being submitted by me in partial fulfillment of the requirements for the award of degree of Master Of Computer Applications in the department of Computer Science & Engineering to the University College of Engineering (Autonomous),Jawaharlal Nehru Technological University, Kakinada – 533003, A.P., is the result of investigations carried out by me under the guidance of Dr.M.H.M Krishna Prasad, Assistant professor of Computer Science and Engineering, University College of Engineering, Jawaharlal Nehru Technological University, Kakinada-533003. The work is original and has not been submitted to any other University or Institute for the award of any degree or diploma. Place: Kakinada Signature:
  • 3. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 3 Date: Name: Regalla Sairam Reddy Roll No:17021F0005 DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING UNIVERSITY COLLEGE OF ENGINEERING KAKINADA (A) JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY KAKINADA KAKINADA-533003, A.P, INDIA 2017-2020 CERTIFICATE FROM THE SUPERVISOR This is to certify that the thesis entitled β€œA Literature Study and Local Cluster- Based Community Detection Algorithm”, that is being submitted by Regalla SairamReddy, Roll No.17021F0005, in partial fulfillment of the requirements for the award of degree of Master of Computer Applications in the Department of Computer Science &Engineering to the University College of Engineering (Autonomous), Jawaharlal Nehru Technological University Kakinada is a record of bona fide project work carried out by him under my guidance and supervision. Signature of the Supervisor Dr.M.H.M Krishna Prasad
  • 4. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 4 DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING UNIVERSITY COLLEGE OF ENGINEERING KAKINADA (A) JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY KAKINADA KAKINADA-533003, A.P, INDIA2017-2020 CERTIFICATE FROM THE HEAD OF THE DEPARTMENT This is to certify that the thesis entitled β€œA Literature Study and Local Cluster- Based Community Detection Algorithm”, that is being submitted by Regalla Sairam Reddy, Roll No.17021F0005, in partial fulfillment of the requirements for the award of degree of Master of Computer Applications in the Department of Computer Science &Engineering to the University College of Engineering (Autonomous), Jawaharlal Nehru Technological University Kakinada is a record of bona fide project work carried out by him at our department. Signature of Head of the Department Dr. D. Haritha
  • 5. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 5
  • 6. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 6 ACKNOWLEDGEMENTS The successful completion of any task is not possible without proper suggestion, guidance and environment. Combination of these three factors acts like backbone to my β€œA Literature Study and Local Cluster-Based Community Detection Algorithm” project. I wish to express my sincere gratitude to Dr. M.H.M. Krishna Prasad, Professor, Department of CSE, University College of Engineering, Jawaharlal Nehru Technological University Kakinada for giving me his guidance, immense support and valuable suggestions during my project. I would like to express my grateful thanks to Dr. D. Haritha, Professor and Head of the Department of CSE, University College of Engineering, Jawaharlal Nehru Technological University Kakinada whose immense support encouraged completing the project successfully. My Sincere thanks to all the teaching and non-teaching staff of Department of Computer Science and Engineering for their support throughout my project work. Finally, I would like to thank my family and friends for their support and motivation, which helped me to complete the project successfully. Regalla Sairam Reddy Roll No: 17021F0005, Master of Computer Applications, University College of Engineering Kakinada, Jawaharlal Nehru Technological University, Kakinada – 533003, East Godavari District,A.P.
  • 7. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 7 CONTENT Acknowledgement 5 Abstract 6 Contents 7 List of Tables 8 List of Figures 9 CHAPTER -1 INTRODUCTION 1-23 1.1 What Is Software 1 1.2What Is Software Development Life Cycle Bug Prediction 2 CHAPTER-2 LITERATURE REVIEW 2.1 Related Work5 CHAPTER 3 PROBLEM IDENTIFICATION AND OBJECTIVE 3.1 Problem Statement 8 3.2 Project Objective 8 CHAPTER 4 METHODOLOGY 4.1 Methodology 9 4.1.1using Python Tool On Standalone Machine Learning Environment 9 4.2 Data Description 10
  • 8. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 8 4.3 Evaluation Criteria Used For Classification 11 4.3.1 Confusion Matrix 12 4.3.2 Accuracy And Precision 12 4.3.3 Recall And F-Square 13 4.3.4 Sensitivity, Specificity And Roc 13 4.3.5 Significance And Analysis Of Ensemble Method In MachineLearning 14 4.4 Uml Diagrams 15 4.4.1 Use Case Diagram 16 4.4.2 State Diagram 17 CHAPTER 5 OVERVIEW OF TECHNOLOGIES 5.1AlgorithmsUsed 17 5.1.1 Decision Tree Induction 17 5.1.2 NaΓ―ve Bayes 21 5.1.3 Artificial Neuralnetwork 22 5.1.4 Support Vector Machine Model 25 5.1.5 Kernal Functions 28 5.2 Tensorflow 28 CHAPTER 6 IMPLEMENTATION AND RESULTS 6.1 Framework Design 31 6.2 Coding And Testing 34
  • 9. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 9 CHAPTER 7 CONCLUSION40 Reference 42 LIST OF TABLES: Table No Table Name Page No 1 Cluster based community detection 27 2 Generic algorithms based community detection 30 3 Label propagation based community detection 31 4 Clique based methods for overlapping community detection 35
  • 10. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 10 LIST OF FIGURES: Figure No Figure Name Page No 5.1.1 Decision Tree induction 17 5.1.2 NaΓ―ve Bayes Algorithm 21 5.1.3 Artificial Neural Networks Algorithm 23 5.1.4(i) Support Vector Machine Hyperplane 25 5.1.4(ii) Support Vector Machine Hyperplane 26 5.2 TensorFlow in UML 29 4.3.1 Use Case Diagram 4.3.2 State Diagram 4.1.1 Block Diagram Of Proposed Work 4.2.5 Classification Model 6.1 Framework Design 6.2 Bar graph of knn,svm,naviebayes
  • 11. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 11
  • 12. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 12
  • 13. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 13
  • 14. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 14
  • 15. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 15 ABSTRACT In order to discover the structure of local community more effectively, this paper puts forward a new local community detection algorithm based on minimal cluster. Most of the local community detection algorithms begin from one node. The agglomeration ability of a single node must be less than multiple nodes, so the beginning of the community extension of the algorithm in this paper is no longer from the initial node only but from a node cluster containing this initial node and nodes in the cluster are relatively densely connected with each other. The algorithm mainly includes two phases. First it detects the minimal cluster and then finds the local community extended from the minimal cluster. Experimental results show that the quality of the local community detected by our algorithm is much better than other algorithms no matter in real networks or in simulated networks.
  • 16. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 16 CHAPTER 1 INTRODUCTION
  • 17. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 17 Introduction to Community Detection: Recently, many researchers notice that the complex network is a proper tools to describe variety of complex system in the real world , and thus the complex network has attracted the great attention in many fields such as physics, biology and social network et al. In complex network field, one of the important topology property is community structure which comprise of densely connected nodes, and some researchers have found that detecting community structure can reveal some valuable insights of the functional feature in the complex system . For example, communities in multimedia social network may imply people with the same hobby and trust relationship. Zhiyong Zhang et al. proposed an approach to analyse and detect credible potential path based on community in multimedia social networks , the approach can effectively and accurately mine potential paths of copyrighted digital content sharing. Zhiyong Zhang et al. also proposed a trust model based on small world theory which shows the widely application
  • 18. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 18 of community struction The community structure in biology field may cluster proteins with the same function. So that many methods have been proposed to reveal this topological property in complex networks. Community detection on complex networks has been a hot research field. Recently, a large number of algorithms for studying the global structure of the network are proposed, such as the modularity optimization algorithms , the spectral clustering algorithms , the hierarchical clustering algorithms , and the label propagation algorithms . However, with the continuous expansion of complex networks, it is easy to collect large network dataset with millions of nodes. How to store such a large-scale dataset in computer memory to analyze is a huge challenge for scholars. The calculation for studying the overall structure of this kind of large-scale networks is unimaginable. So local community detection becomes an appealing problem and has drawn more and more attention The main task of local community detection is to find a community using the local information of the network. Local community detection has good extensibility. If the local community detection algorithm is iteratively executed, more local communities can be found and the whole community structure of the network can be obtained. The time complexity of this kind of global community detection algorithm is dependent on the efficiency and accuracy of local community detection algorithms, so the research of local community detection algorithm still has a long way to go. There are several problems that need to be solved in the research of local community detection. First, we should determine the initial state and find the initial node for local community detection, so as to determine the needed local information; then, we need to select an objective function, and through continuous iterative optimization of the objective function we find the community structure with high quality; after that we need to find a suitable node expansion method, so that the algorithm can extract the local community from the initial state step by step; finally, in order to terminate the algorithm, a suitable termination condition is needed to determine the boundary of the community.
  • 19. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 19 Most of local community detection algorithms are based on the above- mentioned process. The definition of local community detection is to find the local community structure from one or more nodes, but most of the existing local community detection algorithms, including Clauset , LWP , and LS , are starting from only one initial node. They greedily select the optimal nodes from the candidate nodes and add them into the local community. LMD algorithmextends not from the initial node but from its closest and next closest local degree central nodes. It discovers a local community from each of these nodes, respectively. It still starts from single node and discovers many local communities for the initial node. In general, the aggregation ability of a single node is lower than that of multiple nodes. So we do not just rely on the initial node as the beginning of local community expansion. Our primary goal is to find a minimal cluster closely connected to the initial node and then detect local community based on the minimal cluster. This can avoid instability because of the excessive dependence on the initial node. In this paper, we introduce a local community detection algorithm based on the minimal clusterβ€”NewLCD. In this new algorithm, the beginning of community expansion is no longer from the initial node only, but a cluster of nodes relatively closely connected to the initial node. The algorithm mainly consists of two parts: one is the detection of the minimal cluster, and the other is the detection of the local community based on the minimal cluster. At the same time, the algorithm can be applied to the global community detection. After finding one local community using this algorithm, we can repeat the process to obtain the global community structure of the whole network. Community Detection The concept of community detection has emerged in network science as a method for finding groups within complex systems through represented on a graph. In contrast to more traditional decomposition methods which seek a strict block diagonal or block triangular structure, community
  • 20. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 20 detection methods find subnetworks with statistically significantly more links between nodes in the same group than nodes in different groups (Girvan and Newman, 2002). Central to community detection is the notion of modularity, a metric that captures this difference: (1)Q=12mβˆ‘i,jAijβˆ’kikj2mΞ΄gigj Here, Q is the modularity, Aij is the edge weight between nodes i and j, ki is the total weight of all edges connecting node i with all other nodes, and m is the total weight of all edges in the graph. The Kronecker delta function Ξ΄(gi,gj) will evaluate to one if nodes i and j belong to the same group, and zero otherwise. Modularity is a property of how one decides to partition a network: networks that are not partitioned and those that place every node in its own community will both have modularity equal to zero. The goal of community detection, then, is to find communities that maximize modularity. Although modularity maximization is an NP-hard integer program, many efficient algorithms exist to solve it approximately, including spectral clustering (Newman, 2006) and fast unfolding (Blondel et al., 2008). Figure 1 shows a few networks with increasing maximum modularity; note how the community structure becomes increasingly apparent as this value increases. Previous efforts in our group have applied community detection to chemical plant networks by creating an equation graph of the corresponding dynamic model (Moharir et al., 2017). By doing so, communities of state variables, inputs, and outputs can be obtained which are tightly interacting amongst themselves but weakly interacting with other communities. As such, these communities can form the basis of distributed control architectures (Jogwar and Daoutidis, 2017) which typically perform better than other distributed control architectures that one may obtain from β€œintuition” (Pourkargar et al., 2017). For a more comprehensive review of the use of community detection in distributed control, we refer the reader to Daoutidis et al., 2017. An alternative method for finding communities for distributed model predictive control is to apply a decomposition on the optimization problem as a whole (Tang et al., 2017).
  • 21. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 21 Figure 1. Networks of different maximum modularity. However, community structure can exist in all optimization problems, thus it makes sense to extend this method to any generic optimization problem, and this work proposes to do so. The advantage of using community detection to find decompositions is that subproblems generated will have statistically minimal interactions, through complicating variables or constraints, and thus require minimal coordination through the decomposition solution method. The proposed method is generic, applicable to any optimization problem or decomposition solution approach, and scalable, using computationally efficient graph theory algorithms. Automatically identifying groups based on network clustering algorithms: NodeXL can automatically identify groups within a network based solely on network structure. In contrast to the approach of using existing data about the attributes as used in Section 7.2.3, this approach is based solely on who is connected to whom. A number of different network β€œclustering” (also known as β€œcommunity detection”) algorithms exist, which help find subgroups of highly inter-connected vertices within a network. NodeXL includes three such algorithms: Clauset-Newman-Moore, Wakita- Tsurumi [3], and Girvan-Newman (which can take a long time to run on large graphs). In all of these algorithms, the number of clusters is not predetermined; instead the algorithm dynamically determines the number it
  • 22. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 22 thinks is best. Each vertex is assigned to exactly one cluster, meaning that clusters do not overlap. The number of vertices in each cluster can vary significantly. In some cases, a single cluster can encompass all vertices, whereas in other cases, a cluster can consist of a single vertex. See Newman [4] for background on some of these and other community identifying algorithms. There is no β€œright” or β€œwrong” algorithm to use; instead, it is often useful to try out different ones and see which ones you believe provide the best results given your network. For example, in this network, the Clauset-Newman- Moore algorithm results in fewer, larger groups than the other algorithms, which provide more groups of a smaller size. Try applying the Wakita- Tsurumi clustering algorithm by clicking on the Groups dropdown menu in the NodeXL ribbon and choosing Group by Cluster and the checking the appropriate selector as shown in Figure 7.14. Notice that the data on the Groups worksheet is now updated to reflect the new groups.
  • 23. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 23
  • 24. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 24 A social network for an individual is created with his/her interactions and personal relationships with other members in the society. Social networks represent and model the social ties among individuals. With the rapid expansion of the web, there is a tremendous growth in online interaction of the users. Many social networking sites, e.g., Facebook, Twitter etc. have also come up to facilitate user interaction. As the number of interactions have increased manifold, it is becoming difficult to keep track of these communications. Human beings tend to get associated with people of similar likings and tastes. The easy-to-use social media allows people to extend their social life in unprecedented ways since it is difficult to meet friends in the physical world, but much easier to find friends online with similar interests. These real-world social networks have interesting patterns and properties which may be analysed for numerous useful purposes. Social networks have a characteristic property to exhibit a community structure. If the vertices of the network can be partitioned into either disjoint or overlapping sets of vertices such that the number of edges within a set exceeds the number of edges between any two sets by some reasonable amount, we say that the network displays a community structure. Networks disp+laying a community structure may often exhibit a hierarchical community structure as well1 . The process of discovering the cohesive groups or clusters in the network is known as community detection. It forms one of the key tasks of Social network analysis2 . The detection of communities in social networks can be useful in many applications where group decisions are taken, e.g., multicasting a message of interest to a community instead of sending it to each one in the group or recommending a set of products to a community. The applications of community detection have been highlighted towards the end of the article. State of the art in community detection research for social networks is presented in this work. The paper begins with the basic concepts of social networks and communities. Various methods for community detection are categorised and discussed in the next section followed by list of standard datasets used for analysis in community detection research along with the links for download if available online. Some potential applications of
  • 25. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 25 community detection in social networks are briefly described in the next section. Discussion section argues the advantages of using a method with respect to another, the kind of community structure they obtain, etc. and the conclusion section concludes the paper. BASIC CONCEPTS SOCIAL NETWORK A social network is depicted by social network graph 𝐺 consisting of 𝑛 number of nodes denoting 𝑛 individuals or the participants in the network. The connection between node 𝑖 and node 𝑗 is represented by the edge 𝑒𝑖𝑗 of the graph. A directed or an undirected graph may illustrate these connections between the participants of the network. The graph can be represented by an adjacency matrix 𝐴 in which 𝐴𝑖𝑗 = 1 in case there is an edge between 𝑖 and 𝑗 else 𝐴𝑖𝑗 = 0. Social networks follow the properties of complex networks3,4 . Some real life examples1 of social networks include friends based, telephone, email and collaboration networks. These networks can be represented as graphs and it is feasible to study and analyse them to find interesting patterns amongst the entities. These appealing prototypes can be utilized in various useful applications. Community A community can be defined as a group of entities closer to each other in comparison to other entities of the dataset. Community is formed by individuals such that those within a group interact with each other more frequently than with those outside the group. The closeness between entities of a group can be measured via similarity or distance measures between entities. McPherson et al5 stated that β€œsimilarity breeds connection”. They discussed various social factors which lead to similar behaviour or homophily in networks. The communities in social networks are analogous to clusters in networks. An individual represented by a node in graphs may not be part of just a community or a group, it may be an element of many closely associated or different groups existing in the network. For example a person may concurrently belong to college, school, friends and family groups. All such communities which have common nodes are called overlapping communities. Identification and analysis of the community structure has been done by many researchers applying methodologies from numerous form of sciences. The quality of clustering in networks is normally judged by
  • 26. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 26 clustering coefficient which is a measure of how much the vertices of a network tend to cluster together. The global clustering coefficient6 and the local clustering coefficient7 are two types of clustering coefficients discussed in literature. Methods for grouping similar items Communities are those parts of the graph which have denser connections inside and few connections with the rest of the graph8 . The aim of unsupervised learning is to group together similar objects without any prior knowledge about them. In case of networks, the clustering problem refers to grouping of nodes according to their similarity computed based on topological features and/or other characteristics of the graph. Network partitioning and clustering are two commonly used methods in literature to find the groups in the social network graph. These methods are briefly described in the next subsections. Graph partitioning Graph partitioning is the process of partitioning a graph into a predefined number of smaller components with specific properties. A common property to be minimized is called cut size. A cut is a partition of the vertex set of a graph into two disjoint subsets and the size of the cut is the number of edges between the components. A multicut is a set of edges whose removal divides the graph into two or more components. It is necessary to specify the number of components one wishes to get in case of graph partitioning. The size of the components must also be specified, as otherwise a likely but not meaningful solution would be to put the minimum degree vertex into one component and the rest of the vertices into another. Since the number of communities is usually not known in advance, graph partitioning methods are not suitable to detect communities in such cases. Clustering Clustering is the process of grouping a set of similar items together in structures known as clusters. Clustering the social network graph may give a lot of information about the underlying hidden attributes, relationships and properties of the participants as well as the interactions among them. Hierarchical clustering and partitioning method of clustering are the commonly used clustering techniques used in literature. In hierarchical clustering, a hierarchy of clusters is formed. The process of hierarchy creation or levelling can be agglomerative or divisive. In
  • 27. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 27 agglomerative clustering methods, a bottom-up approach to clustering is followed. A particular node is clubbed or agglomerated with similar nodes to form a cluster or a community. This aggregation is based on similarity. In divisive clustering approaches, a large cluster is repeatedly divided into smaller clusters. Partitioning methods begin with an initial partition amidst the number of clusters pre-set and relocation of instances by moving them across clusters, e.g., K-means clustering. An exhaustive evaluation of all possible partitions is required to achieve global optimality in partitioned- based clustering. This is time consuming and sometimes infeasible, hence researchers use greedy heuristics for iterative optimization in partitioning methods of clustering. The next section categorizes and discusses major algorithms for community detection. ALGORITHMS FOR COMMUNITY DETECTION A number of community detection algorithms and methods have been proposed and deployed for the identification of communities in literature. There have also been modifications and revisions to many methods and algorithms already proposed. A comprehensive survey of community detection in graphs has been done by Fortunato8 in the year 2010. Other reviews available in literature are by Coscia et al9 in 2011, Fortunato et al10 in 2012, Porter et al11 in 2009, Danon et al12 in 2005, and PlantiΓ© et al13 in 2013. The presented work reviews the algorithms available till 2015 to the best of our knowledge including the algorithms given in the earlier surveys. Papers based on new approaches and techniques like big data, not discussed by previous authors have been incorporated in our article.The algorithms for community detection are categorized into approaches based on graph partitioning, clustering, genetic algorithms, label propagation along with methods for overlapping community detection (clique based and non-clique based methods), and community detection for dynamic networks. Algorithms under each of these categories are described below. Graph partitioning based community detection Graph partitioning based methods have been used in literature to divide the graph into components such that there are few connections between components. The Kernighan-Line14 algorithm for graph partitioning was amongst the
  • 28. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 28 earliest techniques to divide a graph. It partitions the nodes of the graph with cost on edges into subsets of given sizes so as to minimize the sum of costs on all edges cut. A major disadvantage of this algorithm however is that the number of groups have to be predefined. The algorithm however is quite fast with a worst case running time of 𝑂(𝑛 2 ). Newman15 reduces the widely-studied maximum likelihood method for community detection to a search through a group of candidate solutions, each of which is itself a solution to a minimum cut graph partitioning problem. The paper shows that the two most essential community inference methods based on the stochastic block model or its degree-corrected variant16 can be mapped onto versions of the familiar minimum-cut graph partitioning problem. This has been illustrated by adapting Laplacian spectral partitioning method17, 18 to perform community inference. Clustering based community detection The main concern of community detection is to detect clusters, groups or cohesive subgroups. The basis of a large number of community detection algorithms is clustering. Amongst the innovators of community detection methods, Girvan and Newman19 had a main role. They proposed a divisive algorithm based on edge-betweenness for a graph with undirected and unweighted edges. The algorithm focused on edges that are most β€œbetween” the communities and communities are constructed progressively by removing these edges from the original graph. Three different measures for calculation of edge-betweenness in vertices of a graph were proposed in Newman and Girvan20 . The worst-case time complexity of the edge betweenness algorithm is 𝑂(π‘š2𝑛) and is 𝑂(𝑛 3 ) for sparse graphs, where m denotes the number of edges and n is the number of vertices. The Girvan Newman (GN) algorithm has been enhanced by many authors and applied to various networks21-28. Chen et al22 extended GN algorithm to partition weighted graphs and used it to identify functional modules in the yeast proteome network. Rattigan et al21 proposed the indexing methods to reduce the computational complexity of the GN algorithm significantly. Pinney et al24 also build an algorithm which uses GN algorithm for the decomposition of networks based on graph theoretical concept of
  • 29. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 29 betweenness centrality. Their paper inspected utility of betweenness centrality to decompose such networks in diverse ways. Radicchi29 et al also proposed an algorithm based on GN algorithm introducing a new definition of community. They defined β€˜strong’ and β€˜weak’ communities. The algorithm uses an edge clustering coefficient to perform the divisive edge removal step of GN and has a running time of 𝑂( π‘š4 𝑛2 ) and 𝑂(𝑛 2 ) for sparse graphs. Moon et al30 have proposed and implemented the parallel version of the GN algorithm to handle large scale data. They have used MapReduce model (Apache Hadoop) and GraphChi. Newman and Girvan first defined a measure known as β€˜modularity’ to judge the quality of partitions or communities formed20 . The modularity measure proposed by them has been widely accepted and used by researchers to gauge the goodness of the modules obtained from the community detection algorithms with high modularity corresponding to a better community structure. Modularity was defined as βˆ‘π’Šπ’†π’Šπ’Š βˆ’ π’‚π’ŠπŸ , where π’†π’Šπ’Š denotes fraction of the edges that connect vertices in community π’Š, π’†π’Šπ’‹ denotes fraction of the edges connecting vertices in two different communities π’Š and 𝒋 while π’‚π’Š = βˆ‘π’‹π’†π’Šπ’‹ is the fraction of edges that connect to vertices in community π’Š. The value 𝑄 = 1 indicates a network with strong community structure. The optimization of modularity function has received great attention in literature. The table 1 lists clustering based community detection methods, including algorithms which use modularity and modularity optimization. Newman31 has worked to maximize modularity so that the process of aggregating nodes to form communities leads to maximum modularity gain. This change in modularity upon joining two communities defined as βˆ†π‘„ = 𝑒𝑖𝑗 + 𝑒𝑗𝑖 βˆ’ 2π‘Žπ‘–π‘Žπ‘— = 2(π‘’π‘–π‘—βˆ’π‘Žπ‘–π‘Žπ‘—) can be calculated in constant time and hence is faster to execute in comparison to the GN algorithm. The run time of the algorithm is 𝑂(𝑛 2 ) for sparse graphs and 𝑂((π‘š + 𝑛)𝑛) for others. In a recent work, a scalable version of this algorithm has been implemented using MapReduce by Chen et al32. Newman33 generalized the betweenness algorithm for weighted networks. The modularity was now represented as 𝑄 = 1 2π‘š βˆ‘π‘–π‘—[𝐴𝑖𝑗 βˆ’ π‘˜π‘–π‘˜π‘— 2π‘š ]Ξ΄(𝑐𝑖 , 𝑐𝑗) where π‘š = 1 2 βˆ‘π‘–π‘—π΄π‘–π‘— represents the number of edges between communities 𝑐𝑖
  • 30. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 30 and 𝑐𝑗 in the graph, while π‘˜π‘– , π‘˜π‘— are degrees of vertices 𝑖 and 𝑗 while 𝛿(𝑒, 𝑣) is 1 if 𝑒 = 𝑣 and 0 otherwise. Newman34 in yet another approach characterised the modularity matrix in terms of eigenvectors. The equation for modularity was changed to 𝑄 = 1 4π‘šπ‘ π‘‡π΅π‘  , where the modularity matrix was given as 𝐡𝑖𝑗 = 𝐴𝑖𝑗 βˆ’ π‘˜π‘–π‘˜π‘— 2π‘š and modularity was defined using eigenvectors of the modularity matrix. The algorithm runs in 𝑂(𝑛 2 π‘™π‘œπ‘”π‘›) time, where π‘™π‘œπ‘”π‘› represents the average depth of the dendrogram .
  • 31. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 31 Clauset et al35 used greedy optimization of modularity to detect communities for large networks. For a network structure with m edges and n vertices, the algorithm has a running time of (π‘šπ‘‘π‘™π‘œπ‘”π‘›) , where β€˜d’ denotes the depth of the dendrogram. For sparse real world networks the running time is(π‘›π‘™π‘œπ‘”2n) . Blondel et al36 designed an iterative two phase algorithm known as Louvain method. In first phase, all nodes are placed into different communities and then the modularity gain of moving a node 𝑖 from one community to another is found. In case this modularity gain is positive, the
  • 32. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 32 node is shifted to a new community. In second phase all the communities found in earlier phase are treated as nodes and the weight of links is found. The algorithm improves the time complexity of the GN algorithm. It has a linear run time of 𝑂(π‘š). Guimera et al37 used simulated annealing for modularity optimization and showed that computing the modularity of a network is similar to determining the ground-state energy of a spin system. Additionally, the authors showed that the stochastic network models give rise to modular networks due to fluctuations. Zhou et al38 attempted to improve modularity using simulated annealing introducing the idea of β€˜inter edges’ and β€˜intra edges’. The authors modified the modularity equation to include inter and intra edges as 𝑄 = 1 2π‘š βˆ‘ [(𝐴𝑖𝑗𝑛𝑖𝑗 βˆ’ π‘˜π‘–π‘˜π‘— 2π‘š )𝛿(𝐢𝑖 , 𝐢𝑗) βˆ’ 𝛽 (𝐴𝑖𝑗 βˆ’ π‘˜π‘–π‘˜π‘— 2π‘š ) 𝛼 (1 βˆ’ 𝛿(𝐢𝑖 , 𝐢𝑗))] Intra factor Inter factor Here Ξ± and Ξ² are undetermined parameters and affect the value of the inter-factor. The value of Ξ² is increased and Ξ± is reduced when large communities are expected. Duch et al39 proposed a heuristic search based approach for the optimization of modularity function using extremal optimization technique, which has a complexity of 𝑂(𝑛 2 π‘™π‘œπ‘”2𝑛). AdClust method40 can extract modules from complex networks with significant precision and strength. Each node in the network is assumed to act as a self-directed agent representing flocking behaviour. The vertices of the network travel towards the desirable adjoining groups. Wahl and Sheppard41 proposed hierarchical fuzzy spectral clustering based approach. They argued that determining the sub-communities and their hierarchies are as important as determining communities within a network. DENGRAPH42 algorithm uses the idea of density-based incremental clustering of spatial data and is intended to work for large dynamic datasets with noise. The Markov Clustering Algorithm(MCL)43 is a graph flow simulation algorithm which can be used to detect clusters in a graph and is analogous to detection of communities in the networks. This algorithm consists of two alternate processes of β€˜expansion’ and β€˜inflation’. Markov chains are employed to perform random walk through a graph. The method has a worst case run time of 𝑂(π‘›π‘˜ 2 ) where n represents the number of nodes and k is the number of resources.
  • 33. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 33 Nikolaev et al44 used β€˜entropy centrality measure’ based on Markovian process to iteratively detect communities. A random walk through the nodes is performed to find the communities existing in the network structure. For a graph, the transition probability matrix for a Markov chain is created. A locality t is selected and those edges for which the average entropy centrality for the nodes over the graph is reduced are selected and removed. The algorithm proposed by Steinhaeuser et al45 performs many short random walks and interprets visited nodes during the same walk as similar nodes which gives an indication that they belong to the same community. The similar nodes are aggregated and community structure is created using consensus clustering. It has a runtime of 𝑂(𝑛 2 π‘™π‘œπ‘”π‘›). Genetic algorithms (GA) based community detection : Genetic algorithms (GA) are adaptive heuristic search algorithms whose aim is to find the best solution under the given circumstances. A genetic algorithm starts with a set of solutions known as chromosomes and fitness function is calculated for these chromosomes. If a solution with a maximum fitness is obtained, one stops else with some probability crossover and mutation operators are applied to the current set of solutions to obtain the new set of solutions. Community detection can be viewed as an optimization problem in which an objective function that captures the intuition of a community with better internal connectivity than external connectivity is chosen to be optimized. GA have been applied to the process of community discovery and analysis in a few recent research works. These are described briefly in this section. Table 2 enlists the algorithms available in literature for community detection based on GA.
  • 34. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 34 Pizzuti46 proposed the GA-Net algorithm which uses a locus based graph representation of the network. The nodes of the social network are depicted by genes and alleles. The algorithm introduces and optimizes the community score to measure the quality of partitioning. All the dense communities present in the network structure are obtained at the end of the algorithm by selectively exploring the search space, without the need to know in advance the exact number of groups. Another GA based approach MOGA-Net47 proposed by the same author optimizes two objective functions i.e. the community score and community fitness . The higher the community score, the denser the clustering obtained. The community fitness is sum of fitness of nodes belonging to a module. When this sum reaches its maximum, the number of external links is minimized. MOGA-Net generates a set of communities at different hierarchical levels in which solutions at deeper levels, consisting of a higher number of modules, are contained in solutions having a lower number of communities. Hafez et al48 have performed both Single-Objective and Multi-Objective optimization for community detection problem. The former optimization was done using roulette selection based GA while NSGA-II algorithm was used for the latter process. Mazur et al49
  • 35. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 35 have used modularity as the fitness function in addition to the community score. The authors worked on undirected graphs and their algorithm can also discover single node communities. Liu et al50 used GA in addition to clustering to find the community structures in a network. The authors have used a strategy of repeated divisions. The graph is initially divided into two parts, then the subgraphs are further divided and a nested GA is applied to them. Tasgin et al51 have also optimized the network modularity using GA. A multi-cultural algorithm52 for community detection employs the fitness function defined by Pizzuti46 in GA-Net. The belief space which is a state space for the network and contains a set of individuals that have a better fitness value has been used in this work to guide the search direction by determining a range of possible states for individuals. A genetic algorithm for the optimization of modularity, proposed by Nicosia et al53 and has been explained in the overlapping communities section later. Label propagation based community detection: Label propagation in a network is the propagation of a label to various nodes existing in the network. Each node attains the label possessed by a maximum number of the neighbouring nodes. This section discusses some label propagation based algorithms for discovering communities. Table 3 contains a listing of these algorithms, discussed in detail later in the section.
  • 36. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 36 Label Propagation Algorithm(LPA) was proposed by Raghavan et al54 in which initially each node tries to achieve a label from the maximum number of labels possessed by its neighbours. The stopping criteria for the process was also the same, i.e., when each node achieves a label, which a maximum number of its neighbouring nodes have. Each iteration of the algorithm takes 𝑂(π‘š) time where m is the number of edges. SLPA (speaker listener label propagation algorithm)55 is an extension to LPA which could analyse different kinds of communities such as disjoint communities, overlapping communities and hierarchical communities in both unipartite and bipartite networks. The algorithm has a linear run time of 𝑂(π‘‡π‘š), where T is the user defined maximum number of iterations and m is the number of edges. Based on the SLPA algorithm, Hu56 proposed a Weighted Label Propagation Algorithm (WLPA). It uses the similarity between any two of the vertices in a network based on the labels of the vertices achieved in label propagation. The similarity of these vertices is then used as a weight of the edge in label propagation. LPA was further improved by Gregory57 in his algorithm COPRA (Community Overlap Propagation Algorithm). It was the first label propagation based procedure which could also detect overlapping communities. The run time per iteration is 𝑂(π‘£π‘šπ‘™π‘œπ‘”(π‘£π‘šβ„π‘›), here n is the number of nodes, m is the edges and v is the maximum number of
  • 37. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 37 communities per vertex. LabelRank Algorithm58 uses the LPA and MCL (Markov Clustering Algorithm). The node identifiers are used as labels. Each node receives a number of labels from the neighbouring nodes. A community is formed for nodes having the same highest probability label. Four operators are applied namely propagation which propagates the label to neighbours, inflation i.e. the inflation operator of the MCL algorithm, cut-off operator that removes the labels below a threshold and an explicit conditional update operator responsible for a conditional update. The algorithm runs in 𝑂(π‘š) time where m is the number of edges. The LabelRank algorithm was modified to LabelRankT algorithm by Xie et al60. This algorithm included both the edge weights and the edge directions in the detection of communities. This algorithm works for dynamic networks as well and is able to detect evolving communities also. Wu et al59 proposed a Balanced Multi Label Propagation Algorithm (BMPLA) for detection of overlapping communities. Using this algorithm, vertices can belong to any number of communities without having a global maximum limit on largest number of communities membership required by COPRA57 . Each iteration of the algorithm takes 𝑂(π‘›π‘™π‘œπ‘”π‘›) time to execute, where n is the number of nodes. Semantics based community detection Semantic content and edge relationships in a semantic network may be additionally used to partition the nodes into communities. The context, as well as the relationship of the nodes, both are taken into consideration in the process of semantic community detection. LDA(Latent Dirichlet Allocation)61 is used in several semantic community based community detection approaches. A clustering algorithm based on the link- field-topic (LFT) model is put forward by Xin et al62 to overcome the limitation of defining the number of communities beforehand. The study forms the semantic link weight (SLW) based on the investigation of LFT, to evaluate the semantic weight of links for each sampling field. The proposed clustering algorithm is based on the SLW which could separate the semantic social network into clustering units. In another work63 the authors have used ARTs model and divided the process into two phases namely LDA sampling and community detection. In the former process multiple sampling
  • 38. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 38 ARTs have been designed. A community clustering algorithm has also been proposed. The procedure could detect the overlapping communities. Xia et al64 constructed a semantic network using information from the comment content extracted from the initial HTML source files. An average score is obtained for two users for each link assuming comments to be implicit links between people. An analytic method for taking out comment content is proposed to build the semantic network for example, the terms and phrases in data are counted in comments as supportive or opposing. Each phrase is given an associated numerical trust value. On this semantic network, the classical community detection algorithm is applied henceforth. Ding65 has considered the impact of topological as well as topical elements in community detection. Topology based approaches are based on the idea that the real world networks can be modelled as graphs where the nodes depict the entities whereas the interactions between them are shown by the edges of the graph. On the other hand topic based community detection have a basis that the more words two objects share, the more similar they are. The author performs systematic analysis with topology-based and topic-based community detection methodologies on the co-authorship networks. The paper puts forward the argument that, to detect communities, one should take into account together the topical and topological features of networks. A community detection algorithm, SemTagP (Semantic Tag propagation) has been proposed by Ereteo et al66 that takes yield of the semantic data captured while organizing the RDF graphs of social networks. It basically is an extension of the LPA54 algorithm to perform the semantic propagation of tags. The algorithm detects and moreover labels communities using the tags used by group during the social labelling process and the semantic associations derived between tags. In a study by Zhao et al67, a topic oriented approach consisting of an amalgam of social objects clustering and link analysis has been used. Firstly a modified form of k means clustering named as β€˜Entropy Weighting K-Means (EWKM) algorithm’ has been used to cluster the social objects. A subspace clustering algorithm is applied to cluster all the social objects into topics. On the clusters obtained in this
  • 39. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 39 process, topical community detection or link analysis is performed using a modularity optimization algorithm. The members of the objects are separated into topical clusters having unique topic. A link analysis is performed on each topical cluster to discover the topical communities. The end result of the entire method is topical communities. A community extraction approach is given by Abdelbary et al68, which integrates the content published within the social network with its semantic features. Community discovery is performed using two layer generative Restricted Boltzmann Machines model. The model presumes that members of a community communicate over matters of common concern. The model permits associate members to belong to multiple communities. Latent semantic analysis (LSA)69 and Latent Dirichlet Allocation (LDA)61 are the two techniques extensively employed in the process to detect topical communities. Nyugen et al70 have used LDA to find hyper groups in the blog content and then sentiment analysis is done to further find the meta-groups in these units. A Link-Content model is proposed by Natarajan et al71 for discovering topic based communities in social networks. Community has been modelled as a distribution employing Gibbs sampling. This paper uses links and content to extract communities in a content sharing network Twitter. Methods to detect overlapping communities A recent survey by Amelio et al gives a comprehensive review of major overlapping community detection algorithms and includes the methods on dynamic networks. There exists another review of methods for discover overlapping communities done by Xie et al72. The following section discusses some of the methods to detect overlapping communities. Tables 4 and 5 enlist the methods discussed in this section. Clique based methods for overlapping community detection A community can be interpreted as a union of smaller complete (fully connected) subgraphs that share nodes. A k-clique is a fully connected subgraph consisting of k nodes. A k-clique community can be defined as union of all k-cliques that can be reached from each other through a series of adjacent k-cliques. Many researchers have used cliques to detect
  • 40. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 40 overlapping communities. Important contributions using cliques for overlapping community detection are summarized in table 4. The Clique Percolation Method (CPM) was proposed by Palla et al73 to detect overlapping communities. The method first finds all cliques of the network and uses the algorithm of Everett et al83 to identify communities by component analysis of clique-clique overlap matrix. CPM has a runtime of 𝑂(exp(𝑛)). The Clique Percolation Method (CPM) method proposed by Palla et al73 could not discover the hierarchical structure along with the overlapping attribute. This limitation was overcome through method proposed by Lancichinetti et al74 . It performs a local exploration in order to find the community for each of the node. In this process the nodes may be revisited any number of times. The main objective was to find local maxima based on a fitness function. CFinder84 software was developed using CPM for overlapping community detection. Du et al75 proposed ComTector (Community DeTector) for detection of overlapping communities using maximal cliques. Initially all maximal cliques in the network are found which form the kernels of potential community. Then agglomerative technique is iteratively used to add the vertices left to their closest kernels. The obtained clusters are adjusted by merging pair of fractional communities in order to
  • 41. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 41 optimize the modularity of the network. The running time of the algorithm is (πΆβˆ—π‘‡2 ), where the communities detected are denoted by C and T is the number of triangles in the network. EAGLE, an agglomerative hierarchical clustering based algorithm has been proposed by Shen et al 76. In the first step maximal cliques are discovered and those smaller than a threshold are discarded. Subordinate maximal cliques are neglected, remaining give the initial communities (also the subordinate vertices). The similarity is found between these communities, and communities are repeatedly merged together on the basis of this similarity. This is repeated till one community remains at the end. Evans et al77 proposed that by partitioning the links of a network, the overlapping communities may be discovered. In an extension to this work, Evans et al78 used weighted line graphs. In another work Evans79 used clique graphs to detect the overlapping communities in real world social networks. GCE (Greedy Clique Expansion)80 first identifies cliques in a network. These cliques act as seeds for expansion along with the greedy optimization of a fitness function. A community is created by expanding the selected seed and performing its greedy optimization via the fitness function proposed by Lancichinetti et al74 . CONGA (Cluster-Overlap Newman Girvan Algorithm) was proposed by Gregory25. This method was based on split- betweenness algorithm of Girvan-Newman. The runtime of the method is 𝑂(π‘š3 ). In another work CONGO81 (CONGA Optimized) algorithm was proposed which used local betweenness measure, leading to an improved complexity 𝑂(π‘›π‘™π‘œπ‘”π‘›). A two phase Peacock algorithm for detection of overlapping communities is proposed in Gregory82 using disjoint community detection approaches. In the first phase, the network transformation was performed using the split betweenness concept proposed earlier by the author. In the second phase, the transformed network is processed by a disjoint community detection algorithm and the detected communities were converted back to overlapping communities of the original network. Non clique methods for overlapping community detection Some other non-clique methods to discover overlapping communities are given in the table 5. These methods have been briefly explained in this section. An
  • 42. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 42 extension of Newman’s modularity for directed graphs and overlapping communities was done by Nicosia et al53 and modularity was given by π‘„π‘œπ‘£ = 1 π‘š βˆ‘π‘βˆŠπΆ βˆ‘π‘–,π‘—βˆŠπ‘‰[𝛽𝑙(𝑖,𝑗),𝑐𝐴𝑖𝑗 βˆ’ 𝛽𝑙(𝑖,𝑗),π‘π‘œπ‘’π‘‘π‘˜π‘–π‘œπ‘’π‘‘π›½π‘™(𝑖,𝑗),π‘π‘–π‘›π‘˜π‘—π‘–π‘›π‘š . The authors defined a belongingness coefficient 𝛽𝑙,𝑐 of an edge 𝑙 connecting nodes 𝑖 and 𝑗 for a particular community 𝑐 and is given by 𝛽𝑙,𝑐 = β„±(𝛼𝑖,𝑐 , 𝛼𝑗,𝑐) where definition for β„±(𝛼𝑖,𝑐 , 𝛼𝑗,𝑐) is taken as arbitrary, e.g., it can be taken as a product of the belonging coefficients of the nodes involved, or as max(𝛼𝑖,𝑐 , 𝛼𝑗,𝑐). 𝛽𝑙(𝑖,𝑗),π‘π‘œπ‘’π‘‘ = βˆ‘π‘—βˆŠπ‘‰β„±(𝛼𝑖,𝑐,𝛼𝑗,𝑐) |𝑉| , 𝛽𝑙(𝑖,𝑗),𝑐𝑖𝑛 = βˆ‘π‘–βˆŠπ‘‰β„±(𝛼𝑖,𝑐,𝛼𝑗,𝑐) |𝑉| . A genetic approach has been used in this work for the optimization of modularity function. Another work which uses genetic approach to overlapping community detection is GA-Net+, by Pizzuti85 GA-NET+ could detect overlapping communities using edge clustering. Order Statistics Local Optimization Method(OSLOM)86 detects clusters in networks, and can handle various kind of graph properties like edge direction, edge weights, overlapping communities, hierarchy and network dynamics. It is based on local optimization of a fitness function expressing the statistical significance of clusters with respect to random fluctuations, which is estimated with tools of Extreme and Order Statistics. Baumes et al87 considered a community as a subset of nodes which induces a locally optimal subgraph with respect to a density function. Two different subsets with significant overlap can be locally optimal which forms the basis to find overlapping communities. Chen et al 88 used game-theoretic approach to address the issue of overlapping communities. Each node is assumed to be an agent trying to improve the utility by joining or leaving the community. The community of the nodes in Nash equilibrium are assumed to form the output of the algorithm. Utility of an agent is formulated as combination of a gain and a loss function. To capture the idea of overlapping communities, each agent is permitted to select multiple communities. In another game- theoretic approach, Alvari et al89 proposed an algorithm consisting of two methods PSGAME based on Pearson correlation, and NGGAME centred on neighbourhood similarity measure. Alvari et al90 proposed the Dynamic Game Theory method (D-GT) which treated nodes as rational agents. These
  • 43. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 43 agents perform actions in iterative and game theoretic manner so as to maximize the total utility. Community detection for Dynamic networks: Dynamic networks are the networks in which the membership of the nodes of communities evolve or change over time. The task of community identification for dynamic networks has received relatively less attention than the static networks The methods have been categorized into two classes by Bansal et al94, one designed for data which is evolving in real time known as incremental or online community detection; and the other for data where all the changes of the network evolution are known a priori, known as offline community detection. Wolf et al95 proposed mathematical and computational formulations for the analysis of dynamic communities on the basis of social interactions occurring in the network. Tantipathananandh et al96 made assumptions about the individual behaviour and group membership. Henceforth they framed the objective as an optimization problem by formulating three cost functions, namely i-cost, g-cost and c- cost. Graph colouring and heuristics based approach were deployed. FacetNet, proposed by Lin et al97 is a unified framework to study the dynamic evolutions of communities. The community structure at any time includes the network data as well as the previous history of the evolution. They have used a cost function and proposed an iterative algorithm which converges to an optimal solution. Palla et al98 conducted experiments on two diverse datasets of phone call network and collaboration network to find time dependence. After building joint graphs for two time steps, the CPM algorithm73 was applied. They have used an auto-correlation function to find overlap among two states of a community, and a stationarity parameter which denotes the average correlation of various states. Greene et al99 proposed a heuristic technique for identification of dynamic communities in the network data. They represented the dynamic network graph as an aggregation of time step graphs. Step communities represent the dynamic
  • 44. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 44 communities at a particular time. The algorithm begins with the application of a static community detection algorithm on the graph. In the subsequent steps, dynamic communities are created for each step and Jaccard similarity is calculated. They have also generated benchmark dataset for experimental work. The algorithm by Bansal et al94 involves the addition or deletion of edges in the network. The algorithm is built on the greedy agglomerative technique of the modularity based method earlier proposed in the work of Clauset et al35 . He et al100 improvised Louvain method36 to include concept of dynamicity in the formation of communities. A key point in their algorithm is to make use of previously detected communities at time 𝑑 βˆ’ 1 to identify the communities at time 𝑑. Dinh et al101 proposed A3CS, an adaptive framework which uses the power-law distribution and achieves approximation guarantees for the NP-hard modularity maximization problem, particularly on dynamic networks. Nguyen et al102 have attempted to identify disjoint community structure in dynamic social networks. An adaptive modularity-based framework Quick Community Adaptation (QCA) is proposed. The method finds and traces the progress of network communities in dynamic online social networks. Takaffoli et al103 have proposed a two-step approach to community detection. In the first step the communities extracted at different time instances are compared using weighted bipartite matching. Next, a β€˜meta’ community is constructed which is defined as a series of similar communities at various time instances. Five events to capture the changes to community are split, survive, dissolve, merge, and form. A similarity function is used to calculate the similarity between two communities and a community matching algorithm has been employed thereafter. The authors, Kim et al104 proposed a particle-and- density based evolutionary clustering method for discovery of communities in dynamic networks. Their approach is grounded on the assumption that a network is built of a number of particles termed as nano-communities, where each community further is made up of particles termed as quasi- clique-by-clique (l-KK). The density based clustering method uses cost embedding technique and optimal modularity method to ensure temporal
  • 45. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 45 smoothness even when the number of cluster varies. They have used an information theory based mapping technique to recognize the stages of the community i.e. evolving, forming or dissolving. Their method improves accuracy and is time efficient as compared to the FacetNet method proposed earlier. In another approach proposed by Chi et al105, two frameworks for evolutionary spectral clustering have been proposed namely PCQ (Preserving cluster quality) and PCM (Preserving cluster membership). In this work the temporal smoothness is ensured by some terms in the clustering cost functions. These two frameworks combine the processes of community extraction and the community evolution process. They use a cost function which consists of the snapshot and temporal cost. The clustering quality of any partition determines the snapshot cost while the temporal cost definition varies for each of the frameworks. For PCQ framework, the temporal cost is decided by the cluster quality when the current partition is applied to the historic data. In PCM, the difference between the current and the historic partition gives the temporal cost. Both the frameworks proposed, can tackle the change in number of clusters. In their work DYNMOGA (Dynamic MultiObjective Genetic Algorithm), the authors Folino et al106 have used a genetic algorithm based approach to dynamic community detection. They attempt to achieve temporal smoothness by multiobjectiveoptimisation, i.e.maximisation of snapshot quality (community score is used) and minimization of temporal cost (here NMI is used). Kim et al107 in their method CHRONICLE have performed two stage clustering and the method can detect clusters of path group type also in addition to the single path type clusters. In first stage of the algorithm, called as CHRONICLE1st the cosine similarity measure is used. In second stage of the algorithm the measure proposed and used is general similarity (GS). It is a combination of the two measures structural affinity and weight affinity.
  • 46. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 46 SOME POTENTIAL APPLICATIONS OF COMMUNITY DETECTION: With the enormous growth of the social networking site users, the graphs representing these sites are becoming very complex, hence difficult to visualize and understand. Communities can be considered as a summary of the whole network thus making the network easy to comprehend. The discovery of these communities in social networks can be useful in various applications. Some of the applications where community detection is useful are briefly described below. Improving recommender systems with community detection Recommender Systems use data of similar users or similar items to generate recommendations. This is analogous to the identification of groups, or similar nodes in a graph. Hence community detection holds an immense potential for recommendation algorithms. Cao et al114 have used a community detection based approach to improve the traditional collaborative filtering process of Recommender Systems. The process starts with the mapping of user-item matrix to user similarity structure. On this matrix, a discrete PSO (particle swarm optimization) algorithm is applied to detect communities. The items are then recommended to the user based on the discovered communities. Evolution of communities in social media With the increase in the number of social networking sites, the focus and scope of sites are getting expanded. The sites are getting diversified in terms of focus. In addition to common sites like Facebook, Twitter, MySpace and Bebo, other sites like Flickr for photo-sharing have also come up. The analysis of the tweetretweet and the follower-followee network in twitter provides an insight into the community structure existing in the Twitter network. Sentiment analysis of the tweets may be performed as an intermediary step to find the general nature of the tweets and then community detection algorithms may be applied to help deduce the structure of communities. Zalmout et al115, applied the community
  • 47. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 47 detection algorithm to UK political tweets dataset. CQA(Community question answering) has been used by Zhang et al116 to discover overlapping communities in dynamic networks based on user interactions. Related Work: Social Network: A social network is depicted by social network graph consisting of number of nodes denoting individuals or the participants in the network. The connection between node and node is represented by the edge of the graph. A directed or an undirected graph may illustrate these connections between the participants of the network. The graph can be represented by an adjacency matrix in which in case there is an edge between and else. Social networks follow the properties of complex networks3,4. Some real life examples1 of social networks include friends based, telephone, email and collaboration networks. These networks can be represented as graphs and it is feasible to study and analyse them to find interesting patterns amongst the entities. These appealing prototypes can be utilized in various useful applications. Community: A community can be defined as a group of entities closer to each other in comparison to other entities of the dataset. Community is formed by individuals such that those within a group interact with each other more frequently than with those outside the group. The closeness between entities of a group can be measured via similarity or distance measures between entities. McPherson et al5 stated that β€œsimilarity breeds connection”. They discussed various social factors which lead to similar behaviour or homophily in networks. The communities in social networks are analogous to clusters in networks. An individual represented by a node in graphs may not be part of just a community or a group, it may be an element of many
  • 48. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 48 closely associated or different groups existing in the network. For example a person may concurrently belong to college, school, friends and family groups. All such communities which have common nodes are called overlapping communities. Identification and analysis of the community structure has been done by many researchers applying methodologies from numerous form of sciences. The quality of clustering in networks is normally judged by clustering coefficient which is a measure of how much the vertices of a network tend to cluster together. The global clustering coefficient6 and the local clustering coefficient7 are two types of clustering coefficients discussed in literature. Methods for grouping similar items: Communities are those parts of the graph which have denser connections inside and few connections with the rest of the graph8. The aim of unsupervised learning is to group together similar objects without any prior knowledge about them. In case of networks, the clustering problem refers to grouping of nodes according to their similarity computed based on topological features and/or other characteristics of the graph. Network partitioning and clustering are two commonly used methods in literature to find the groups in the social network graph. These methods are briefly described in the next subsections. Graph partitioning Graph partitioning is the process of partitioning a graph into a predefined number of smaller components with specific properties. A common property to be minimized is called cut size. A cut is a partition of the vertex set of a graph into two disjoint subsets and the size of the cut is the number of edges between the components. A multicut is a set of edges whose removal divides the graph into two or more components. It is necessary to specify the number of components one wishes to get in case of graph partitioning. The size of the components must also be specified, as otherwise a likely but not meaningful solution would be to put the minimum degree vertex into one component
  • 49. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 49 and the rest of the vertices into another. Since the number of communities is usually not known in advance, graph partitioning methods are not suitable to detect communities in such cases. Clustering is the process of grouping a set of similar items together in structures known as clusters. Clustering the social network graph may give a lot of information about the underlying hidden attributes, relationships and properties of the participants as well as the interactions among them. Hierarchical clustering and partitioning method of clustering are the commonly used clustering techniques used in literature. In hierarchical clustering, a hierarchy of clusters is formed. The process of hierarchy creation or levelling can be agglomerative or divisive. In agglomerative clustering methods, a bottom-up approach to clustering is followed. A particular node is clubbed or agglomerated with similar nodes to form a cluster or a community. This aggregation is based on similarity. In divisive clustering approaches, a large cluster is repeatedly divided into smaller clusters. Partitioning methods begin with an initial partition amidst the number of clusters pre-set and relocation of instances by moving them across clusters, e.g., K-means clustering. An exhaustive evaluation of all possible partitions is required to achieve global optimality in partitioned-based clustering. This is time consuming and sometimes infeasible, hence researchers use greedy heuristics for iterative optimization in partitioning methods of clustering. The next section categorizes and discusses major algorithms for community detection. ALGORITHMS FOR COMMUNITY DETECTION: A number of community detection algorithms and methods have been proposed and deployed for the identification of communities in literature. There have also been modifications and revisions to many methods and algorithms already proposed. A comprehensive survey of community
  • 50. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 50 detection in graphs has been done by Fortunato8 in the year 2010. Other reviews available in literature are by Coscia et al9 in 2011, Fortunato et al10 in 2012, Porter et al11 in 2009, Danon et al12 in 2005, and PlantiΓ© et al13 in 2013. The presented work reviews the algorithms available till 2015 to the best of our knowledge including the algorithms given in the earlier surveys. Papers based on new approaches and techniques like big data, not discussed by previous authors have been incorporated in our article.The algorithms for community detection are categorized into approaches based on graph partitioning, clustering, genetic algorithms, label propagation along with methods for overlapping community detection (clique based and non- clique based methods), and community detection for dynamic networks. Algorithms under each of these categories are described below. Graph partitioning based community detection: Graph partitioning based methods have been used in literature to divide the graph into components such that there are few connections between components. The Kernighan-Line14 algorithm for graph partitioning was amongst the earliest techniques to divide a graph. It partitions the nodes of the graph with cost on edges into subsets of given sizes so as to minimize the sum of costs on all edges cut. A major disadvantage of this algorithm however is that the number of groups have to be predefined. The algorithm however is quite fast with a worst case running time of 𝑂(𝑛2). Newman15 reduces the widely-studied maximum likelihood method for community detection to a search through a group of candidate solutions, each of which is itself a solution to a minimum cut graph partitioning problem. The paper shows that the two most essential community inference methods based on the stochastic block model or its degree-corrected variant16 can be mapped onto versions of the familiar minimum-cut graph partitioning problem. This has been illustrated by adapting Laplacian spectral partitioning method17, 18 to perform community inference.
  • 51. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 51 Definition of Local Community: The problem of local community detection is proposed by Clauset [15]. Usually we define the local community problem in the following way: there is a nondirected graph , represents the set of nodes, and represents the edges in the graph. The connecting information of partial nodes in the graph is known or can be obtained. The local community is defined as . The set of nodes connected with is defined as and the set of nodes in connected with nodes in is defined as the boundary node set . That is to say, any node in is connected to one node in , and the rest of is the core node set . Clustering based community detection: The main concern of community detection is to detect clusters, groups or cohesive subgroups. The basis of a large number of community detection algorithms is clustering. Amongst the innovators of community detection methods, Girvan and Newman19 had a main role. They proposed a divisive algorithm based on edge-betweenness for a graph with undirected and unweighted edges. The algorithm focused on edges that are most β€œbetween” the communities and communities are constructed progressively by removing these edges from the original graph. Three different measures for calculation of edge-betweenness in vertices of a graph were proposed in Newman and Girvan20. The worst-case time complexity of the edge betweenness algorithm is 𝑂(π‘š2𝑛) and is 𝑂(𝑛3) for sparse graphs, where m denotes the number of edges and n is the number of vertices. The Girvan Newman (GN) algorithm has been enhanced by many authors and applied to various networks21-28. Chen et al22 extended GN algorithm to partition weighted graphs and used it to identify functional modules in the yeast proteome network. Rattigan et al21 proposed the indexing methods to reduce the computational complexity of the GN algorithm significantly. Pinney et al24 also build an algorithm which uses GN algorithm for the decomposition of networks based on graph theoretical concept of
  • 52. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 52 betweenness centrality. Their paper inspected utility of betweenness centrality to decompose such networks in diverse ways. Radicchi29 et al also proposed an algorithm based on GN algorithm introducing a new definition of community. They defined β€˜strong’ and β€˜weak’ communities. The algorithm uses an edge clustering coefficient to perform the divisive edge removal step of GN and has a running time of 𝑂(π‘š4 𝑛2 ) and 𝑂(𝑛2) for sparse graphs. Moon et al30 have proposed and implemented the parallel version of the GN algorithm to handle large scale data. They have used MapReduce model (Apache Hadoop) and GraphChi. Newman and Girvan first defined a measure known as β€˜modularity’ to judge the quality of partitions or communities formed20. The modularity measure proposed by them has been widely accepted and used by researchers to gauge the goodness of the modules obtained from the community detection algorithms with high modularity corresponding to a better community structure. Modularity was defined as βˆ‘π’†π’Šπ’Šπ’Š βˆ’ π’‚π’ŠπŸ , where π’†π’Šπ’Š denotes fraction of the edges that connect vertices in community π’Š, π’†π’Šπ’‹ denotes fraction of the edges connecting vertices in two different communities π’Š and 𝒋 while π’‚π’Š = βˆ‘ π’†π’Šπ’‹π’‹ is the fraction of edges that connect to vertices in community π’Š. The value 𝑄 = 1 indicates a network with strong community structure. The optimization of modularity function has received great attention in literature. The table 1 lists clustering based community detection methods, including algorithms which use modularity and modularity optimization. Newman31 has worked to maximize modularity so that the process of aggregating nodes to form communities leads to maximum modularity gain. This change in modularity upon joining two communities defined as βˆ†π‘„ = 𝑒𝑖𝑗 + 𝑒𝑗𝑖 βˆ’ 2π‘Žπ‘–π‘Žπ‘— = 2(π‘’π‘–π‘—βˆ’π‘Žπ‘–π‘Žπ‘—) can be calculated in constant time and hence is faster to execute in comparison to the GN algorithm. The run time of the algorithm is 𝑂(𝑛2) for sparse graphs and 𝑂((π‘š + 𝑛)𝑛) for others. In a recent work, a scalable version of this algorithm has been implemented using MapReduce by Chen et al32. Newman33 generalized the betweenness algorithm for weighted networks. The modularity was now represented as 𝑄 = 1 2π‘š βˆ‘ [𝐴𝑖𝑗𝑖𝑗 βˆ’ π‘˜π‘–π‘˜π‘— 2π‘š ]Ξ΄(𝑐𝑖,𝑐𝑗)
  • 53. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 53 where π‘š = 1 2 βˆ‘ 𝐴𝑖𝑗𝑖𝑗 represents the number of edges between communities 𝑐𝑖 and 𝑐𝑗 in the graph, while π‘˜π‘–,π‘˜π‘— are degrees of vertices 𝑖 and 𝑗 while 𝛿(𝑒,𝑣) is 1 if 𝑒 = 𝑣 and 0 otherwise. Newman34 in yet another approach characterised the modularity matrix in terms of eigenvectors. The equation for modularity was changed to 𝑄 =1 4π‘šπ‘ π‘‡π΅π‘  , where the modularity matrix was given as 𝐡𝑖𝑗 = 𝐴𝑖𝑗 βˆ’ π‘˜π‘–π‘˜π‘— 2π‘š and modularity was defined using eigenvectors of the modularity matrix. Figure 1 Definition of local community: Local community detection problem is to start from a preselected source node. It adds the node meeting the conditions in into and removes the node which does not meet the conditions from D gradually. 2.2. Related Algorithms: At present, many local community detection algorithms have been proposed. We introduce two representative local community detection algorithms. (1)Clauset Algorithm: In order to solve the problem of local community detection, Clauset [15] put forward the local community modularity R and gave a fast convergence greedy algorithm to find the local community with the greatest modularity. Thedefinition of local community modularity is asfollows where and represent two nodes in the graph. If nodes and are connected, the value of is 1; otherwise, it is 0; if nodes and are both in , the value of is 1; otherwise, it is 0. The local community detection process of Clauset algorithm is similar to that of web crawler algorithm. First, Clauset algorithm starts from an initial
  • 54. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 54 node .Node is added to the subgraph , and all its neighbor nodes are added to . Then the algorithm adds the node in which can bring the maximum increment of into the local community iteratively, until the scale of the local community reaches the preset size. That is to say, the algorithm needs to set up a parameter to decide the size of the community, and the result is greatly influenced by the initial node. (2) LWP Algorithm: LWP [16] algorithm is an improved algorithm and it has a clear end condition compared with Clauset algorithm. The algorithm defines another local community modularity , which is expressed aswhere and represent two nodes in the graph. If nodes and are connected to each other, the value of is 1; otherwise, it is 0; if nodes and are both in , the value of is 1; otherwise, it is 0; if only one of the nodes and is in , the value of is 1; otherwise, it is 0. Given an undirected and unweighted graph , LWP algorithm starts from an initial node to find a subgraph with maximum value of . If the subgraph is a community (i.e., ), then it returns the subgraph as a community. Otherwise, it is considered that there is no community that can be found starting from this initial node. For an initial node, LWP algorithm finds a subgraph with the maximum value of local modularity by two steps. First, the algorithm is initialized by constructing a subgraph with only an initial node and all the neighbor nodes of node are added to the set . Then the algorithm performs incremental step and pruning step. In the incremental step, the node selected from which can make the local modularity of increase with the highest value is added to iteratively. The greedy algorithm will iteratively add nodes in to , until no node in can be added. In the pruning step, if the local modularity of becomes larger when removing a node from , then really remove it from . In the process of pruning, the algorithm must ensure that the connectivity of is not destroyed until no node can be removed. Then update the set and repeat the two steps until there is no change in the process. The algorithm has a high Recall, but its accuracy is low.
  • 55. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 55 The complexity of these two algorithms is , where is the number of nodes to be explored in the local community and is the average degree of the nodes to be explored in the local community. CHAPTER-2 LITERATURE
  • 56. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 56 Literature Review: A wide research study in the recent years focused on community detection in complex systems [4], most of them focus on undirected networks to enhance the efficiency of identifying communities in understanding complex networks. For instances, Fortunato et al [3] based his proposed approach on statistical inference perspectives, Schaeffer et al [5], proposed their approach for clustering problem as an unsupervised learning task based on similarity measure over the data of the network, Girvan and Newman based their community detection proposal on betweenness calculation to find out community boundaries where modularity measure is the overall quality of the graph partitioning [6, 7]. The weight used by Newman and Girvan [7] aims to be the betweenness measure of the edge, representing the number of shortest paths connecting any pair of nodes passing through. However, community detection problem has been studied mainly in case of undirected
  • 57. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 57 networks, various solutions was proposed in this context, motivating many disciplines to deal with the issue. Interestingly, Fortunato et al [3] mentioned the few possibilities for extending techniques from undirected to directed case, where the edge directedness is not the only complication that could face the clustering problem. Nevertheless, diverse graph data in many real- world applications are by nature directed, thus interesting to save available information behinds the edge directionality. Malliaros et al [8], revealed in their survey that the most common way for researcher community to deal with the problem of clustering is to ignore the directionality of the graph, then proceed to clustering with a wide range of proposed tools. Therefore, most of community detection proposals can not be used directly on weighted directed graphs, where the number of communities not always known in advance and the communities present different granularity scale. Since the problem of community detection in complex network analysis acquires more attention, many researchers have been interested into structural information and topological networks metrics [1, 3, 4, 6, 7, 8, 9]. In [10], S. Ahajjam based the proposed community detection algorithms on a new scalable approach using leader nodes characteristics through two steps: (i) Identification of potential leaders in the network, and (ii) exploration of nodes similarities around leaders to build communities. Therefore, recent works start focusing on both topological and topical aspects [9, 11, 12] to overpass limited performances of topology-based community detection approaches. Topic-based community detection started gaining attention through different works for community detection in complex network [9, 13, 14]. The essence behind the approach is to similarly detect nodes with same properties, which are not necessarily real connections between nodes of the network, in which actors communicate on topics of mutual interest [14] to determine the communities which are topically similar. The main contributions of this thesis are to exploring and utilizing local community detection related approaches to improve the services in Online
  • 58. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 58 SocialNetworking Sites in terms of their recommendation efficiency and accuracy. Contributions are listed as follows: M.E.J.NewmananDM.Girvan[1],”Finding and evaluating community structure in network”. One of the most relevant features of graphs representing real systems is community structure, or clustering. Detecting communities is of great importance in sociology, biology and computer science, disciplines where systems are often represented as graphs. Community detection is important for other reasons, too. Identifying modules and their boundaries allows for a classification of vertices, according to their structural position in the modules. So, vertices lying at the boundaries between modules play an important role of mediation and lead the relationships and exchanges between different communities . Such a classification seems to be meaningful in social and metabolic networks. The procedure can be better illustrated by means of dendrograms. Sometimes, stopping conditions are imposed to select a partition or a group of partitions that satisfy a special criterion, like a given number of clusters or the optimization of a quality function. A simple way to identify communities in a graph is to detect the edges that connect vertices of different communities and remove them, so that the clusters get disconnected from each other. This is the philosophy of Divisivealgorithms. H.-W.Shen and X.-Q.Cheng[3],”Spectral methods for the detection of neteork community structure”.In this paper Spectral analysis has been successfully applied at the detection of community structure of networks, respectively being based on the adjacency matrix, the standard Laplacian matrix, the normalized Laplacian matrix, the modularity matrix, the correlation matrix and several other variants of these matrices. However, the comparison between these spectral methods is less reported. More importantly, it is still unclear which matrix is more appropriate for the detection of community structure. This paper answers the question through evaluating the effectiveness of these five matrices against the
  • 59. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 59 benchmark networks with heterogeneous distributions of node degree and community size. In this paper, we conduct a comparative analysis of the aforementioned five matrices on the benchmark networks which have heterogeneous distributions of node degree and community size. The comparison is carried out from two perspectives. The former one focuses on whether the number of intrinsic communities can be exactly identified according to the spectrum of these five matrices. The latter evaluates the effectiveness of these matrices at identifying the intrinsic community structure using their eigenvectors This paper carried out a comparative analysis on the spectral methods for the detection of network community structure through evaluating the performance of five widely used matrices on the benchmark networks with heterogeneous distribution of node degree and community size. These five matrices are respectively the adjacency matrix, the standard Laplacian matrix, the normalized Laplacian matrix, the modularity matrix and the correlation matrix. Test results demonstrate that the normalized Laplacian matrix and the correlation matrix significantly outperform the other three matrices at identifying the community structure of networks. This indicates that the heterogeneity of node degree is a crucial ingredient for the detection of community structure using spectral methods. V.D.Blondel,J.Guillaume,R.Lambiotte et al[7],”Fast unfolding of communities in large networks”. This Paper propose a simple method to extract the community structure of large networks. Our method is a heuristic method that is based on modularity optimization. It is shown to outperform all other known community detection method in terms of computation time. Moreover, the quality of the communities detected is very good, as measured by the so-called modularity. This is shown first by identifying language communities in a Belgian mobile phone network of 2.6 million customers and by analyzing a web graph of 118 million nodes and more than one billion links. The accuracy of our algorithm is also verified on ad-hoc modular networks. The problem of community detection requires the partition of a network into communities of densely connected nodes, with the nodes belonging to different communities being only sparsely connected.
  • 60. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 60 This paper introduced an algorithm for optimizing modularity that allows to study networks of unprecedented size. The limitation of the method for the experiments that we performed was the storage of the network in main memory rather than the computation time. The accuracy of our method has also been tested on ad-hoc modular networks and is shown to be excellent in comparison with other (much slower) community detection methods. By construction, our algorithm unfolds a complete hierarchical community structure for the network, each level of the hierarchy being given by the intermediate partitions found at each pass. K.M.Tan,D.Written and A.Shojaie[8],”The cluster graphical lasso for improved estimation of Gaussian graphical models”. In this paper the task of estimating a Gaussian graphical model in the high-dimensional setting is considered. The graphical lasso, which involves maximizing the Gaussian log likelihood subject to a lasso penalty, is a well-studied approach for this task. A surprising connection between the graphical lasso and hierarchical clustering is introduced: the graphical lasso in effect performs a two-step procedure, in which single linkage hierarchical clustering is performed on the variables in order to identify connected components, and then a penalized log likelihood is maximized on the subset of variables within each connected component. Thus, the graphical lasso determines the connected components of the estimated network via single linkage clustering. The single linkage clustering is known to perform poorly in certain finite-sample settings. Therefore, the cluster graphical lasso, which involves clustering the features using an alternative to single linkage clustering, and then performing the graphical lasso on the subset of variables within each cluster, is proposed. We have shown that identifying the connected components of the graphical lasso solution is equivalent to performing SLC based on S Μƒ , the absolute value of the empirical covariance matrix. Based on this connection, we have proposed the cluster graphical lasso, an improved version of the graphical lasso for sparse inverse covariance estimation. In this paper, we have considered the use of hierarchical clustering in the CGL procedure. We have shown that performing hierarchical clustering on S Μƒ leads to consistent cluster recovery. As a byproduct,
  • 61. Venkat Java Projects Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com 61 we suggest a choice of Ξ»1, …, Ξ»K in CGL that yields consistent identification of the connected components. In addition, we establish the model selection consistency of CGL. Y.J.Wu,H.Huang,Z.F.Hao and F.Chen[17],”Local Community Detection using link similarity”.In this paper Exploring local community structure is an appealing problem that has drawn much recent attention in the area of social network analysis. The existing approaches do well in measuring the community quality, but they are largely dependent on source vertex and putting too strict policy in agglomerating new vertices. Moreover, they have parameters which are difficult to obtain. This paper proposes a method toΒ―find local community structure by analyzing link similarity between the community and the vertex. Inspired by the fact that elements in the same community are more likely to share common links, we explore community structure heuristically by giving priority to vertices which have a high link similarity with the community. A three-phase process is also used for the sake of improving quality of community structure. Experimental results prove that our method performs efectively not only in computer-generated graphs but alsoin real-world graphs. In this paper we present a method that mainly depends on link similarity between the vertex and community to explore local community. This method searches the potential vertices in a specific sequence so as to help improve the accuracy. In this paper, we pro-posed an improved method to detect local community structure. Our algorithm mainly takes advantage of link similarity between the vertex and the community.A greedy agglomeration phase, an optimization phase and a trimming phase are included in our algorithm.Compared with other multi-phase algorithms, our al-gorithm implements comparatively easier and stricter stopping criteria for the purpose of simplifying and op-timizing our algorithm. Experimental results show that our algorithm can discover local community better than other existing methods both in computer-generated benchmark graphs and in real-world networks.