Community Detection
in Graphs
Nicola Barbieri
nicolabarbieri1@gmail.com
References:
Community Detection in Graphs, Santo Fortunato
Community Detection and Mining in Social Media, Lei Tang and Huan Liu
Zachary's Karate Club
• Members of a university karate
club (unknown location)
• Zachary (1977) used these data to
explain the split-up of this group
following disputes among the
members.
• Conflict between the club
president, John A., and Mr. Hi over
the price of karate lessons
• The conflict results in "pulling"
apart the network of friendship
ties
Structural properties of Real
Networks
• Graphs representing real systems are neither regular
(like lattices) nor random
• The distribution of number of links per node of many
real networks is different from what is expected in
random networks
• The degree distribution is broad, with a tail that often
follows a power law
• Protein interaction nets: some proteins act as hubs,
they are highly connected, while most of the others
interact with only a few other proteins
• Biological nets: high-degree nodes systematically link to
nodes with low degree
• Social nets: nodes with similar degree tend to link
to each other
• The scale of organization of complex networks shows a
hierarchical structure
• Hierarchical random graph model
Communities and Applications
• Community structure:
• Vertices in networks are often found to
cluster into tightly-knit groups with a
high density of within-group edges and a
lower density of between-group edges.
• Applications:
• Identify web clients with similar interests
that are geographically near each other
• Identify customers with similar interests
(purchasing history)
• Graph Compression
• Classification of vertices
Communities in real-world
networks
Relationships in real-world
networks
• Link direction
• Relationships between nodes may not be reciprocal
• On the Web few hyperlinks are reciprocal (<10%)
• Community detection on directed graphs is a hard task
• Overlapping Communities: some vertices may belong to more than one
group
• Heterogeneous Networks: different classes of vertices, playing different
roles
• Multipartite Networks
• Weighted relationships
Finding Communities
• Given a graph G=<V,E>
• Community detection problem: find modules and their
hierarchical organization
• What do we miss?
• Define what is “a community”
• Design algorithms that will find sets of nodes which lead to “good
communities”
• Why just “good”??
• Evaluate different results
Community: Definition and
Properties
• Informally, a community C is a subset of nodes of V such that there are
more edges inside the community than edges linking vertices of C
with the rest of the graph
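• Intra-cluster density: δint(C) = (# internal edges of C) / (nc(nc-1)/2)
• Inter-cluster density: δext(C) = (# edges between C and the rest of the graph) / (nc(n-nc))
• δext(C) << 2m / (n(n-1)) << δint(C), where 2m / (n(n-1)) is the average edge density of the whole graph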
• There is not a universally accepted definition of community
• Connectedness is a required property: for each pair of vertices in C
there must exist a path
• Community detection makes sense only on sparse graphs
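As a rough illustration of the density inequality above, the following sketch computes both densities for one cluster of a toy graph. The graph (two triangles joined by a single edge) and the helper name `cluster_densities` are assumptions made for this example and do not come from the slides.

```python
def cluster_densities(edges, cluster, n):
    """Return (intra, inter) cluster densities of `cluster` in an undirected graph."""
    cluster = set(cluster)
    nc = len(cluster)
    # Edges with both endpoints inside the cluster.
    internal = sum(1 for u, v in edges if u in cluster and v in cluster)
    # Edges with exactly one endpoint inside the cluster.
    boundary = sum(1 for u, v in edges if (u in cluster) != (v in cluster))
    intra = internal / (nc * (nc - 1) / 2)  # internal edges / possible internal edges
    inter = boundary / (nc * (n - nc))      # boundary edges / possible boundary edges
    return intra, inter


# Hypothetical toy graph: two triangles joined by one edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n, m = 6, len(edges)
intra, inter = cluster_densities(edges, {0, 1, 2}, n)
average = 2 * m / (n * (n - 1))  # average edge density of the whole graph
```

For this toy graph the cluster {0, 1, 2} satisfies the chain inter < average < intra, which is the informal signature of a community.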
Notations
V set of vertices
E set of edges
n |V|
m |E|
C a subset of V
nc |C|
Local Definitions
• Clique: subset of V such that all the vertices are adjacent to each
other
• Triangles are really frequent in real networks
• Finding the largest clique in a graph is NP-hard (the decision version is NP-complete)
• Too strict a definition
• k-clique: a maximal subgraph in which the largest geodesic
distance between any two nodes is no greater than k (distances are measured in the whole graph)
• k-club: restricts the geodesic distance within the group to be no
greater than k (distances are measured inside the subgraph)
Global Definitions
• Communities can be also defined with respect to the whole graph
• A graph has a community structure if it is different from a random
graph
• A random graph is not expected to have any community structure:
• any two vertices have the same probability to be adjacent
• We can define a null model and use it to investigate whether the
graph under consideration exhibits a community structure
Similarity
• A community can be defined as a subset of vertices that are similar
to each other
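• Vertices can be similar whether or not they are connected by an edge
• Structural equivalence: v1 and v2 are structurally equivalent if they
share the same neighbors, even if they are not adjacent
• Overlap between the neighborhoods Γ(i) and Γ(j) of vertices i and j:
ω(i,j) = |Γ(i) ∩ Γ(j)| / |Γ(i) ∪ Γ(j)|
• Pearson correlation between rows (or columns) of the adjacency matrix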
• Commute-time: average number of steps needed for a random
walker, starting at either vertex, to reach the other vertex for the first
time and to come back to the starting vertex
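A minimal sketch of the neighborhood overlap measure, assuming it is the ratio between the intersection and the union of the two neighbor sets. The adjacency dictionary and the function name `neighborhood_overlap` are hypothetical and only serve the example.

```python
def neighborhood_overlap(adj, i, j):
    """Overlap of the neighbor sets of i and j (intersection over union)."""
    union = adj[i] | adj[j]
    if not union:
        return 0.0  # two isolated vertices: define the overlap as zero
    return len(adj[i] & adj[j]) / len(union)


# Hypothetical toy graph with edges a-c, a-d, b-c, b-d, d-e.
adj = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"d"},
}

same_neighbors = neighborhood_overlap(adj, "a", "b")  # a and b are not adjacent
partial = neighborhood_overlap(adj, "a", "e")
```

Vertices `a` and `b` share exactly the same neighbors without being adjacent, so they are structurally equivalent and their overlap is maximal.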
Partitions
• A partition is a division of a graph into clusters, such that each vertex belongs
to exactly one cluster
• The Stirling numbers of the second kind count the number of ways to
partition a set of n labelled objects into k nonempty unlabelled subsets
• Hierarchical organization
• Communities embedded within other communities
• Nodes can be shared between different communities
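To give a feeling for how quickly the number of possible partitions grows, here is a small sketch based on the standard recurrence S(n, k) = k·S(n-1, k) + S(n-1, k-1) for the Stirling numbers of the second kind. The function names are illustrative choices.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n, k):
    """Number of ways to split n labelled objects into k nonempty unlabelled subsets."""
    if n == k:
        return 1  # includes S(0, 0) = 1
    if k == 0 or k > n:
        return 0
    # Object n either joins one of the k existing subsets or forms a new one.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def count_partitions(n):
    """Total number of partitions of n objects (sum over all k)."""
    return sum(stirling2(n, k) for k in range(n + 1))
```

Even for 10 vertices there are already 115975 partitions, which is why exhaustive search over partitions is not practical.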
Comparing Different partitions
• What is a good clustering?
• A quality function is a function that assigns a number to each
partition of a graph
• We can rank partitions based on their score given by the quality
function.
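• A quality function Q is additive if there is an elementary function q
such that, for any partition P of a graph, Q(P) = Σ_{C in P} q(C)
• Performance P: the fraction of pairs of vertices that are classified correctly, i.e.
• # pairs of vertices belonging to the same community
and connected by an edge
• # pairs of vertices belonging to different communities
and not connected by an edge
• P = (sum of the two counts above) / (n(n-1)/2)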
Modularity
• The most popular quality function is the modularity: it compares the
density of edges in a subgraph with the density expected
in a null model graph
Q = (1/2m) Σ_ij (Aij - Pij) δ(Ci, Cj)
• Aij is the adjacency matrix; the δ-function yields one if vertices i and j are in the same
community, zero otherwise
• Pij represents the expected number of edges between vertices i and j
in the null model (which is arbitrary)
• Bernoulli random graph: Pij = p = 2m / (n(n-1))
• Configuration model: Pij = ki kj / 2m, where
ki = degree of the vertex i
• With the configuration model: Q = Σ_{c=1..nc} [ lc/m - (dc/2m)^2 ]
• nc is the number of clusters,
• lc the total number of edges joining vertices of module c
• dc the sum of the degrees of the vertices of c.
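The per-cluster form of modularity, with lc, dc and m as defined above and the configuration model as null model, can be sketched as follows. The toy graph and the function name `modularity` are assumptions for illustration only.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Q = sum over clusters c of lc/m - (dc/2m)^2 for an undirected, unweighted graph."""
    m = len(edges)
    internal = defaultdict(int)    # lc: edges with both endpoints in c
    degree_sum = defaultdict(int)  # dc: sum of the degrees of the vertices of c
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Hypothetical toy graph: two triangles joined by one edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
two_groups = {0: "L", 1: "L", 2: "L", 3: "R", 4: "R", 5: "R"}
one_group = {v: "all" for v in range(6)}

q_split = modularity(edges, two_groups)
q_trivial = modularity(edges, one_group)
```

Placing all vertices in one cluster gives Q = 0, while the split into the two triangles gives a positive value (5/14 for this graph).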
[Figure: example graphs on vertices a, b, c, d, e]
Graph Partitioning
• Divide the vertices into k groups of predefined size, such that the
number of edges lying between the groups is minimal
• The number of edges running between clusters is called cut size
• Specifying the number of clusters of the partition is necessary
Kernighan-Lin algorithm
• Heuristic procedure for the problem of partitioning electronic circuits onto
boards
• Iterative Improvement:
• Generate an initial solution
• Update the current solution iteratively, until no further improvement is found (a local optimum, not necessarily the global one)
• KL-Algorithm O(n^2 log n):
• An initial partition is generated at random
• A solution is acceptable if both communities contain the same
number of vertices
• The cut size measures the goodness of a solution (the smaller, the better)
• Update: select a subset of pairs of vertices (u,v), where u belongs to C1
and v to C2, and swap them
Kernighan-Lin algorithm
• Assume (v,w) is in E
• they belong to different communities --> (v,w) is said to be cut
• they belong to the same community --> (v,w) is said to be uncut
• For each vertex v we compute
• Cv = # cut edges incident to v
• Uv = # uncut edges incident to v
• Improvement Iv = Cv-Uv
• Record the cut size corresponding to the current configuration
• Compute the improvement for each pair of vertices
• I(v,w)= Iv +Iw -2 if (v,w) is in E
• I(v,w)= Iv +Iw otherwise
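The two improvement formulas above can be sketched in a few lines. The four-vertex graph, the `side` assignment and the function names are hypothetical and only illustrate the definitions.

```python
def vertex_improvement(adj, side, v):
    """Iv = Cv - Uv: cut edges minus uncut edges incident to v."""
    cut = sum(1 for w in adj[v] if side[w] != side[v])
    uncut = len(adj[v]) - cut
    return cut - uncut


def pair_improvement(adj, side, v, w):
    """Reduction of the cut size obtained by swapping v and w (on opposite sides)."""
    gain = vertex_improvement(adj, side, v) + vertex_improvement(adj, side, w)
    # An edge (v, w) stays cut after the swap, so its contribution is removed from both terms.
    return gain - 2 if w in adj[v] else gain


# Hypothetical toy graph with edges p-r, p-s, q-r; p and q on side 0, r and s on side 1.
adj = {"p": {"r", "s"}, "q": {"r"}, "r": {"p", "q"}, "s": {"p"}}
side = {"p": 0, "q": 0, "r": 1, "s": 1}

gain_adjacent = pair_improvement(adj, side, "p", "r")      # p and r are adjacent
gain_non_adjacent = pair_improvement(adj, side, "q", "s")  # q and s are not adjacent
```

In this toy graph all three edges are cut initially; either swap leaves a single cut edge, so both improvements equal 2.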
Kernighan-Lin algorithm
1. Compute improvements for all pairs of gates (vertices)
[Figure: example graph on vertices A to H; A, B, G, H are in one group and C, D, E, F in the other]
Vertex Cv Uv Iv
A 2 0 2
B 2 0 2
C 0 1 -1
D 2 0 2
E 3 0 3
F 2 1 1
G 2 0 2
H 1 0 1
Pair I(v,w)
A,C 1
A,D 2
A,E 3
A,F 3
B,C 1
B,D 4
B,E 3
B,F 1
G,C 1
G,D 2
G,E 3
G,F 3
H,C 0
H,D 3
H,E 4
H,F 0
Kernighan-Lin algorithm
2. Sort the list by decreasing improvement
Pair I(v,w)
B,D 4
H,E 4
A,E 3
A,F 3
B,E 3
G,E 3
G,F 3
H,D 3
A,D 2
G,D 2
A,C 1
B,C 1
B,F 1
G,C 1
H,C 0
H,F 0
Pair Cut Count
Zero swap 7
BD 3
Kernighan-Lin algorithm
3. Perform a tentative swap B,D
Pair I(v,w)
A,C -1
A,E -1
A,F -1
G,C -1
G,E -1
G,F -1
H,C 0
H,E 2
H,F -2
Vertex Cv Uv Iv
A 1 1 0
C 0 1 -1
E 2 1 1
F 1 2 -1
G 1 1 0
H 1 0 1
[Figure: the example graph after the tentative swap of B and D]
Pair Cut Count
Zero swap 7
BD 3
HE 1
Kernighan-Lin algorithm
4. Perform a tentative swap H,E
Pair I(v,w)
A,C -3
A,F -5
G,C -3
G,F -5
Vertex Cv Uv Iv
A 0 2 -2
C 0 1 -1
F 0 3 -3
G 0 2 -2
[Figure: the example graph after the tentative swap of H and E]
Pair Cut Count
Zero swap 7
BD 3
HE 1
AC 4
Kernighan-Lin algorithm
5. Perform a tentative swap A,C
Pair I(v,w)
G,F -3
Vertex Cv Uv Iv
F 1 2 -1
G 0 2 -2
[Figure: the example graph after the tentative swap of A and C]
Pair Cut Count
Zero swap 7
BD 3
HE 1
AC 4
GF 7
Kernighan-Lin algorithm
6. Scan the list searching for the minimum cut
7. Perform the swaps up to that point (if the minimum cut is smaller than the zero-swap cut)
Pair Cut Count
Zero swap 7
BD 3
HE 1
AC 4
GF 7
[Figure: the example graph before and after the accepted swaps]
8. Second iteration
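One pass of the procedure walked through above can be sketched as follows. The edge list is an assumption: the figure is not available in the text, so the edges were reconstructed from the cut and uncut counts of the worked example. The function names are illustrative, and ties between equally good pairs are broken by alphabetical order.

```python
from itertools import product

# Assumed edge list, reconstructed from the tables of the worked example.
EDGES = [("A", "D"), ("A", "E"), ("B", "E"), ("B", "F"),
         ("G", "D"), ("G", "E"), ("H", "F"), ("C", "F")]


def cut_size(edges, side):
    """Number of edges whose endpoints lie on different sides."""
    return sum(1 for u, v in edges if side[u] != side[v])


def kl_pass(edges, left, right):
    """One Kernighan-Lin pass: tentative swaps, then keep the prefix with the minimum cut."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    side = {v: 0 for v in left}
    side.update({v: 1 for v in right})

    def improvement(v):
        cut = sum(1 for w in adj[v] if side[w] != side[v])
        return cut - (len(adj[v]) - cut)  # Iv = Cv - Uv

    def gain(pair):
        v, w = pair
        total = improvement(v) + improvement(w)
        return total - 2 if w in adj[v] else total

    free_left, free_right = sorted(left), sorted(right)
    history = [(None, cut_size(edges, side))]  # the "zero swap" entry
    while free_left and free_right:
        v, w = max(product(free_left, free_right), key=gain)
        side[v], side[w] = side[w], side[v]    # tentative swap
        free_left.remove(v)                    # swapped vertices are locked
        free_right.remove(w)
        history.append(((v, w), cut_size(edges, side)))
    best = min(range(len(history)), key=lambda i: history[i][1])
    accepted = [pair for pair, _ in history[1:best + 1]]
    return history, accepted


history, accepted = kl_pass(EDGES, ["A", "B", "G", "H"], ["C", "D", "E", "F"])
cut_counts = [cut for _, cut in history]
```

Under the assumed edge list, `cut_counts` reproduces the cut-count column of the example (7, 3, 1, 4, 7) and the accepted swaps are B,D and H,E. A full implementation would repeat the pass from the new partition until no pass reduces the cut.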
Hierarchical Clustering
• Widely used in social network analysis
• No need to specify the number of clusters
• Graph may have a hierarchical structure
• Hierarchical clustering aims at identifying groups of vertices with high
similarity (not focusing on connectedness)
• Define a similarity measure between vertices
• Compute the n x n similarity matrix
• Agglomerative algorithms: (bottom-up) clusters are merged if their
similarity is sufficiently high
• Divisive algorithms: (top-down) clusters are iteratively split by
removing edges connecting vertices with low similarity
Hierarchical Clustering
• Merging Clusters
• Single linkage
• Complete linkage
• Average linkage
• Drawback of the hierarchical procedure: it does not provide a way to
discriminate which level better represents the community structure
of the graph
Girvan-Newman
• Divisive method: detect edges that connect different communities and
remove them until clusters are disconnected
• 4 Steps:
1. Compute the centrality of every edge
2. Remove the edge with the highest centrality
3. Recompute the centralities on the remaining graph (this step is essential)
4. If |E|>0, go to 2
Girvan-Newman
• Instead of trying to construct a measure which tells us which edges are
most central to communities, we focus instead on those edges which are
least central
• If a network contains communities or groups that are only loosely
connected by a few inter-group edges, then all shortest paths between
different communities must go along one of these few edges
• Edge Betweenness O(mn)
• number of shortest paths between all vertex
pairs that run along the considered edge
• The edges connecting communities will have
high edge betweenness
• Which partition is the best???
• Compute modularity
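Edge betweenness on an unweighted graph can be computed with one breadth-first search per vertex, which is where the O(mn) cost comes from. The sketch below uses a Brandes-style accumulation; that choice, the function name and the toy graph are assumptions, since the slides do not say which procedure is used.

```python
from collections import defaultdict, deque


def edge_betweenness(adj):
    """Shortest-path betweenness of every edge of an undirected, unweighted graph."""
    score = defaultdict(float)
    for s in adj:
        dist = {s: 0}
        sigma = defaultdict(float)  # number of shortest paths from s
        sigma[s] = 1.0
        preds = defaultdict(list)   # predecessors on shortest paths from s
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = defaultdict(float)
        for w in reversed(order):   # accumulate from the farthest vertices back to s
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    # Every unordered pair was counted once from each endpoint.
    return {edge: value / 2 for edge, value in score.items()}


# Hypothetical toy graph: two triangles joined by the bridge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
betweenness = edge_betweenness(adj)
first_to_remove = max(betweenness, key=betweenness.get)
```

The bridge carries all 9 shortest paths between the two triangles, so it is the first edge removed; after removing it, the scores have to be recomputed before choosing the next edge.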
Girvan-Newman
Optimal community structure for Zachary's karate club.
Modularity without recalculation
Modularity optimization
• If high modularity indicates good partitions, why not simply optimize Q
over all partitions to find the best one?
• The search space is exponential in |V|
• Greedy algorithm (Newman):
• Agglomerative clustering: we repeatedly join communities together in
pairs, choosing at each step the join that results in the greatest increase
(or smallest decrease) in Q
• Note that joining a pair of communities between which there are
no edges at all can never result in an increase in Q. This limits the number
of tentative joins to at most m
Q = Σ_i (eii - ai^2)
eii is the fraction of edges in the network
that connect vertices in the group i
ai is the fraction of edge ends that are attached to
vertices in the group i
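A sketch of how the change in Q for each candidate join could be evaluated, assuming Newman's convention that the change for joining groups i and j is 2(eij - ai·aj), where eij is half the fraction of edges between i and j. The function name and the toy graph are hypothetical.

```python
from collections import defaultdict


def join_gains(edges, community_of):
    """Change in Q for joining each pair of communities connected by at least one edge."""
    m = len(edges)
    e = defaultdict(float)  # e[(i, j)]: symmetric, all entries sum to 1
    a = defaultdict(float)  # a[i]: fraction of edge ends attached to community i
    for u, v in edges:
        i, j = community_of[u], community_of[v]
        e[(i, j)] += 0.5 / m
        e[(j, i)] += 0.5 / m
        a[i] += 0.5 / m
        a[j] += 0.5 / m
    # Only pairs joined by an edge appear in e, so there are at most m candidates.
    return {(i, j): 2 * (e[(i, j)] - a[i] * a[j]) for (i, j) in e if i < j}


# Hypothetical toy graph: two triangles joined by one edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
singletons = {v: v for v in range(6)}
triangles = {0: "L", 1: "L", 2: "L", 3: "R", 4: "R", 5: "R"}

first_step = join_gains(edges, singletons)
last_step = join_gains(edges, triangles)
best_first_join = max(first_step, key=first_step.get)
```

Starting from singletons every candidate join here has a positive gain, while joining the two triangles at the end would decrease Q, which is where the greedy procedure reaches its peak modularity.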
Modularity optimization
[Figure: example graph with communities C1, C2, C3, C4; m = 24]
Modularity optimization
• The peak modularity is Q = 0.381
• The GN algorithm performs similarly on this task, but not better: it
also finds the split but classifies one vertex wrongly (although a
different one, vertex 3).
Overlapping Communities
• The most popular technique to discover
overlapping communities is the Clique
Percolation Method (CPM)
• The internal edges of a community are likely
to form cliques due to their high density
• It is unlikely that inter-community edges form
cliques
• If it were possible for a clique to move on a
graph, in some way, it would probably get
trapped inside its original community, as it
could not cross the bottleneck formed by the
inter-community edges
CPM
• Given a parameter k:
1. Find all the cliques of size k
2. Construct a clique graph: two cliques are adjacent if they share k-1
vertices
3. Each connected component of the clique graph is a community
Cliques of size 3: {1,2,3}, {1,3,4}, {4,5,6}, {5,6,7},
{5,6,8}, {5,7,8}, {6,7,8}
[Figure: clique graph whose nodes are the seven cliques above]
Communities: {1,2,3,4} and {4,5,6,7,8} (vertex 4 belongs to both)
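Steps 2 and 3 can be sketched with a small union-find over the cliques listed on the slide. The cliques are taken as given (step 1 is not implemented here), and the function name `clique_percolation` is an illustrative choice.

```python
from itertools import combinations


def clique_percolation(cliques, k):
    """Merge k-cliques that share k-1 vertices; each merged group is a community."""
    cliques = [frozenset(c) for c in cliques]
    parent = list(range(len(cliques)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Two cliques are adjacent in the clique graph if they share k-1 vertices.
    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:
            parent[find(i)] = find(j)

    # A community is the union of the vertices of one connected component.
    communities = {}
    for i, clique in enumerate(cliques):
        communities.setdefault(find(i), set()).update(clique)
    return sorted(communities.values(), key=min)


cliques = [{1, 2, 3}, {1, 3, 4}, {4, 5, 6}, {5, 6, 7}, {5, 6, 8}, {5, 7, 8}, {6, 7, 8}]
communities = clique_percolation(cliques, k=3)
overlapping = set.intersection(*communities)
```

The cliques {1,3,4} and {4,5,6} share only vertex 4, so they are not adjacent in the clique graph; vertex 4 therefore ends up in both communities.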
CPM
The communities of the Karate network (by k-clique percolation for k = 3):
gray nodes are overlapping and white nodes
do not belong to any community
Overlapping Communities: Edge
centric approach
• A vertex can belong to several communities
• A link is usually related to one community
• Cluster edges instead of nodes!!
• After obtaining edge clusters, communities can be recovered by
replacing each edge with its two vertices
• A node is involved in a community as long as any of its connections is
in the community
• Assume we are given a set of description labels for each vertex
• Given the similarity matrix, apply a clustering algorithm to discover
edge clusters
Testing algorithms
• Compare the partition provided by the algorithm with the ground
truth
• We assume that the community membership of each vertex is
known
• The mapping is clear when dealing with 2 communities
• When there are many communities the mapping may not be intuitive
• We need to average all the possible mappings
Normalized Mutual Information
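• Consider two partitions πa and πb with k(a) and k(b) communities
• nh,l is the number of vertices belonging to the h-th community for (a)
and to the l-th community for (b)
• n^a_h is the number of vertices belonging to the h-th community for the
partition (a)
• n^b_l is the number of vertices belonging to the l-th community for the
partition (b)
NMI(πa, πb) = Σ_h Σ_l nh,l log( n nh,l / (n^a_h n^b_l) ) / sqrt( (Σ_h n^a_h log(n^a_h / n)) (Σ_l n^b_l log(n^b_l / n)) )
• The numerator is the mutual information (information gain) between the two partitions; the denominator normalizes it by their entropies
0<=NMI<=1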
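A minimal sketch of NMI computed from two label lists, assuming the mutual information is normalized by the geometric mean of the two entropies. That normalization is one of several in use and is an assumption here, as is the function name `nmi`.

```python
from collections import Counter
from math import log, sqrt


def nmi(labels_a, labels_b):
    """Normalized mutual information between two partitions given as label lists."""
    n = len(labels_a)
    count_a = Counter(labels_a)              # community sizes in partition (a)
    count_b = Counter(labels_b)              # community sizes in partition (b)
    joint = Counter(zip(labels_a, labels_b)) # sizes of the pairwise intersections
    mutual = sum(c * log(n * c / (count_a[h] * count_b[l]))
                 for (h, l), c in joint.items())
    h_a = sum(c * log(c / n) for c in count_a.values())  # -n times the entropy of (a)
    h_b = sum(c * log(c / n) for c in count_b.values())  # -n times the entropy of (b)
    if h_a == 0 or h_b == 0:
        # A partition with a single community has zero entropy.
        return 1.0 if h_a == h_b else 0.0
    return mutual / sqrt(h_a * h_b)


same_up_to_renaming = nmi([0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0])
unrelated = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

Because only the co-occurrence counts matter, no explicit mapping between the community labels of the two partitions is needed.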
Testing algorithms without
ground truth
• Use a common (and simple) objective function
• Conductance: the ratio between the number of edges leaving the
cluster and the total degree (volume) of the cluster or of its complement, whichever is smaller
• Network Community Profile: the score of the best cluster of size k, as a function of k
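A short sketch of conductance for one cluster, assuming the common definition in which the number of boundary edges is divided by the smaller of the two volumes (sums of degrees) on either side of the cut. The toy graph and the function name are hypothetical.

```python
def conductance(edges, cluster):
    """Boundary edges of `cluster` divided by min(volume inside, volume outside)."""
    cluster = set(cluster)
    cut = sum(1 for u, v in edges if (u in cluster) != (v in cluster))
    # Volume: number of edge ends attached to vertices of the cluster.
    vol_in = sum((u in cluster) + (v in cluster) for u, v in edges)
    vol_out = 2 * len(edges) - vol_in
    smaller = min(vol_in, vol_out)
    return cut / smaller if smaller else 0.0


# Hypothetical toy graph: two triangles joined by one edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
good_cluster = conductance(edges, {0, 1, 2})
poor_cluster = conductance(edges, {0, 3})
```

Lower values indicate better separated clusters: the triangle {0, 1, 2} scores 1/7, while the arbitrary pair {0, 3} has every incident edge leaving it and scores 1.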

Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Luciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxLuciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptx
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.ppt
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 science
 

Community detection in graphs

  • 1. Community Detection in Graphs Nicola Barbieri nicolabarbieri1@gmail.com References: Community Detection in Graphs, Santo Fortunato Community Detection and Mining in Social Media, Lei Tang and Huan Liu
  • 2. Zachary’s Karate Club • Members of a university karate club (unknown location) • Zachary (1977) used these data to explain the split-up of this group following disputes among the members. • Conflict between the club president, John A., and the instructor, Mr. Hi, over the price of karate lessons • The conflict resulted in "pulling" apart the network of friendship ties
  • 3. Structural Properties of Real Networks • Graphs representing real systems are neither regular (like lattices) nor random • The distribution of the number of links per node in many real networks differs from what is expected in random networks • The degree distribution is broad, with a tail that often follows a power law • Protein interaction nets: some proteins act as hubs and are highly connected, while most of the others interact with only a few others • Biological nets: high-degree nodes systematically link to nodes with low degree • Social nets: nodes with similar degree tend to link to each other • The scales of organization of complex networks show a hierarchical structure • Hierarchical random graph model
  • 4. Communities and Applications • Community structure: • Vertices in networks are often found to cluster into tightly-knit groups with a high density of within-group edges and a lower density of between-group edges. • Applications: • Identify web clients with similar interests that are geographically near each other • Identify customers with similar interests (purchasing history) • Graph compression • Classification of vertices
  • 6. Relationships in real-world networks • Link direction • Relationships between nodes may not be reciprocal • On the Web few hyperlinks are reciprocal (<10%) • Community detection on directed graphs is a hard task • Overlapping communities: some vertices may belong to more than one group • Heterogeneous networks: different classes of vertices, playing different roles • Multipartite networks • Weighted relationships
  • 7. Finding Communities • Given a graph G=<V,E> • Community detection problem: find modules and their hierarchical organization • What do we miss? • Define what “a community” is • Design algorithms that find sets of nodes which lead to “good communities” • Why just “good”? • Evaluate different results
  • 8. Community: Definition and Properties • Informally, a community C is a subset of nodes of V such that there are more edges inside the community than edges linking vertices of C with the rest of the graph • Intra-cluster density δ_int(C) • Inter-cluster density δ_ext(C) • δ_ext(C) ≪ 2m / (n(n−1)) ≪ δ_int(C) • There is no universally accepted definition of community • Connectedness is a required property: for each pair of vertices in C there must exist a path between them • Community detection makes sense only on sparse graphs • Notation: V = set of vertices; E = set of edges; n = |V|; m = |E|; C = a subset of V; n_c = |C|
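The two densities on this slide can be made concrete with a short sketch. It assumes the definitions used in Fortunato's review (internal edges divided by n_c(n_c−1)/2, boundary edges divided by n_c(n−n_c)); the function name `cluster_densities` and the toy graph are hypothetical.

```python
def cluster_densities(edges, n, community):
    """Return (delta_int, delta_ext) for the vertex set `community`.

    Assumes 2 <= |community| < n and an undirected simple graph.
    """
    c = set(community)
    nc = len(c)
    internal = sum(1 for u, v in edges if u in c and v in c)
    boundary = sum(1 for u, v in edges if (u in c) != (v in c))
    delta_int = internal / (nc * (nc - 1) / 2)   # fraction of possible internal edges
    delta_ext = boundary / (nc * (n - nc))       # fraction of possible outgoing edges
    return delta_int, delta_ext

# Hypothetical toy graph: two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
average_density = 2 * len(edges) / (n * (n - 1))
d_int, d_ext = cluster_densities(edges, n, {0, 1, 2})
print(d_ext, average_density, d_int)
```

For the triangle {0, 1, 2} the external density falls below the average link density, which in turn falls below the internal density, matching the ordering stated on the slide.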
  • 9. Local Definitions • Clique: subset of V such that all the vertices are adjacent to each other • Triangles are very frequent in real networks • Finding the largest clique in a graph is NP-hard (deciding whether a clique of a given size exists is NP-complete) • Too strict a definition • k-clique: a maximal subgraph in which the largest geodesic distance between any two nodes is no greater than k (distances measured in the whole graph) • k-club: restricts the geodesic distance within the group itself to be no greater than k
  • 10. Global Definitions • Communities can also be defined with respect to the whole graph • A graph has a community structure if it is different from a random graph • A random graph is not expected to have any community structure: • any two vertices have the same probability of being adjacent • We can define a null model and use it to investigate whether the graph under consideration exhibits a community structure
  • 11. Similarity • A community can be defined as a subset of vertices that are similar to each other • Direct connections are not considered • Structural equivalence: v1 and v2 are structurally equivalent if they share the same neighbors, even if they are not adjacent themselves • Overlap between the neighborhoods Γ(i) and Γ(j) of vertices i and j • Pearson correlation • Commute-time: average number of steps needed for a random walker, starting at either vertex, to reach the other vertex for the first time and to come back to the starting vertex
  • 12. Partitions • A partition is a division of a graph into clusters, such that each vertex belongs to exactly one cluster • The Stirling numbers of the second kind count the number of ways to partition a set of n labelled objects into k nonempty unlabelled subsets • Hierarchical organization • Communities embedded within other communities • Nodes can be shared between different communities
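The overlap and Pearson measures can be sketched as follows. The formulas on the slide did not survive in the transcript, so this assumes the forms used in Fortunato's review: the overlap is the ratio |Γ(i) ∩ Γ(j)| / |Γ(i) ∪ Γ(j)|, and the Pearson correlation is taken between rows i and j of the adjacency matrix. All function names are hypothetical.

```python
from math import sqrt

def neighborhoods(edges):
    """Map each vertex to the set of its neighbors (undirected graph)."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs

def overlap(nbrs, i, j):
    """Shared neighbors divided by the union of the two neighborhoods."""
    union = nbrs[i] | nbrs[j]
    return len(nbrs[i] & nbrs[j]) / len(union)

def pearson(nbrs, i, j):
    """Pearson correlation between adjacency-matrix rows i and j."""
    nodes = list(nbrs)
    row_i = [1.0 if k in nbrs[i] else 0.0 for k in nodes]
    row_j = [1.0 if k in nbrs[j] else 0.0 for k in nodes]
    mu_i = sum(row_i) / len(nodes)
    mu_j = sum(row_j) / len(nodes)
    cov = sum((a - mu_i) * (b - mu_j) for a, b in zip(row_i, row_j))
    var_i = sum((a - mu_i) ** 2 for a in row_i)
    var_j = sum((b - mu_j) ** 2 for b in row_j)
    return cov / sqrt(var_i * var_j)

# Hypothetical example: a and b share the neighbors {c, d} but are not adjacent.
nbrs = neighborhoods([("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")])
print(overlap(nbrs, "a", "b"), pearson(nbrs, "a", "b"))
```

Vertices a and b are structurally equivalent here, so both measures reach their maximum even though no edge joins them.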
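To get a feel for how fast the number of partitions grows, here is a small sketch that uses the standard recurrence S(n, k) = k·S(n−1, k) + S(n−1, k−1). The helper names `stirling2` and `count_partitions` are hypothetical.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Ways to split n labelled objects into k nonempty unlabelled subsets."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    # Either object n joins one of the k existing subsets, or it forms a new one.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def count_partitions(n):
    """Total number of partitions of n objects (sum over all k)."""
    return sum(stirling2(n, k) for k in range(n + 1))

print(stirling2(4, 2), count_partitions(5))
print(count_partitions(34))  # 34 vertices, the size of the karate club network
```

Even for a graph as small as the karate club, exhaustive enumeration of partitions is out of reach, which motivates the heuristics in the following slides.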
  • 13. Comparing Different Partitions • What is a good clustering? • A quality function is a function that assigns a number to each partition of a graph • We can rank partitions based on the score given by the quality function. • A quality function Q is additive if there is an elementary function q such that, for any partition P of a graph, Q(P) = Σ_{C ∈ P} q(C) • Performance P: the fraction of vertex pairs that are correctly interpreted, counting • # pairs of vertices belonging to the same community and connected by an edge • # pairs of vertices belonging to different communities and not connected by an edge.
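A minimal sketch of the performance score, assuming the normalization by the total number of vertex pairs n(n−1)/2 used in Fortunato's review. The function name and the toy graph are hypothetical.

```python
from itertools import combinations

def performance(nodes, edges, membership):
    """Fraction of vertex pairs that the partition interprets correctly."""
    edge_set = {frozenset(e) for e in edges}
    good = 0
    pairs = 0
    for u, v in combinations(nodes, 2):
        pairs += 1
        same = membership[u] == membership[v]
        linked = frozenset((u, v)) in edge_set
        # Correct pairs: same community and linked, or different and not linked.
        if same == linked:
            good += 1
    return good / pairs

# Hypothetical toy graph: two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
membership = {0: "x", 1: "x", 2: "x", 3: "y", 4: "y", 5: "y"}
print(performance(range(6), edges, membership))
```

Only the bridging pair (2, 3) is counted as wrong here, so 14 of the 15 pairs contribute to the score.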
  • 14. Modularity • The most popular quality function is the modularity: the density of edges in a subgraph vs. the density expected in a null-model graph • Q = (1/2m) Σ_ij (A_ij − P_ij) δ(C_i, C_j) • The δ-function yields one if vertices i and j are in the same community, zero otherwise • P_ij represents the expected number of edges between vertices i and j in the null model (which is arbitrary) • Bernoulli random graph: P_ij = p = 2m / (n(n−1)) • Configuration model: P_ij = k_i k_j / 2m, where k_i is the degree of vertex i • With the configuration model, Q = Σ_c [ l_c/m − (d_c/2m)² ], where • n_c is the number of clusters, • l_c the total number of edges joining vertices of module c • d_c the sum of the degrees of the vertices of c.
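As a sketch, modularity under the configuration-model null can be computed per community as Q = Σ_c [ l_c/m − (d_c/2m)² ], using the l_c and d_c defined on this slide. The function name `modularity` and the toy graph are hypothetical, and the graph is assumed undirected without self-loops.

```python
def modularity(edges, membership):
    """Q = sum over communities c of l_c/m - (d_c / 2m)^2."""
    m = len(edges)
    inside = {}      # l_c: edges with both endpoints in community c
    degree_sum = {}  # d_c: sum of the degrees of the vertices of c
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())

# Hypothetical toy graph: two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
split = {0: "x", 1: "x", 2: "x", 3: "y", 4: "y", 5: "y"}
print(modularity(edges, split))               # the natural two-community split
print(modularity(edges, dict.fromkeys(range(6), "all")))  # one big community
```

Putting every vertex in a single community always gives Q = 0 under this null model, so a positive score signals more internal edges than expected by chance.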
  • 15. Graph Partitioning • Divide the vertices into k groups of predefined size, such that the number of edges lying between the groups is minimal • The number of edges running between clusters is called the cut size • Specifying the number of clusters of the partition is necessary
  • 16. Kernighan-Lin algorithm • Heuristic procedure for the problem of partitioning electronic circuits onto boards • Iterative improvement: • Generate an initial solution • Update the current solution iteratively, until no further improvement is found (a local optimum) • KL algorithm, O(n² log n): • An initial partition is generated at random • A solution is acceptable if both communities contain the same number of vertices • The cut size measures the goodness of a solution • Update: select a subset of pairs of vertices (u,v), where u belongs to C1 and v to C2, and swap them
  • 17. Kernighan-Lin algorithm • Assume (v,w) is in E • if v and w belong to different communities, (v,w) is said to be cut • if they belong to the same community, (v,w) is said to be uncut • For each vertex v we compute • C_v = # cut edges incident to v • U_v = # uncut edges incident to v • Improvement I_v = C_v − U_v • Record the cut size corresponding to the current configuration • Compute the improvement for each pair of vertices • I(v,w) = I_v + I_w − 2 if (v,w) is in E • I(v,w) = I_v + I_w otherwise
  • 18. Kernighan-Lin algorithm 1. Compute improvements for all pairs of gates • [Figure: graph on vertices A to H] • Vertex (C, U, I_v): A (2, 0, 2); B (2, 0, 2); C (0, 1, -1); D (2, 0, 2); E (3, 0, 3); F (2, 1, 1); G (2, 0, 0); H (1, 0, 1) • Pair I(v,w): A,C 1; A,D 2; A,E 3; A,F 3; B,C 1; B,D 4; B,E 3; B,F 2; G,C 1; G,D 2; G,E 3; G,F 3; H,C 0; H,D 3; H,E 4; H,F 0
  • 19. Kernighan-Lin algorithm 2. Sort the list • Unsorted pairs I(v,w): A,C 1; A,D 2; A,E 3; A,F 3; B,C 1; B,D 4; B,E 3; B,F 2; G,C 1; G,D 2; G,E 3; G,F 3; H,C 0; H,D 3; H,E 4; H,F 0 • Sorted pairs I(v,w): B,D 4; H,E 4; A,E 3; A,F 3; B,E 3; G,E 3; G,F 3; H,D 3; A,D 2; B,F 2; G,D 2; A,C 1; B,C 1; G,C 1; H,C 0; H,F 0 • Cut count per swap: zero swap 7; BD 3
  • 20. Kernighan-Lin algorithm 3. Perform a tentative swap B,D • Pair I(v,w): A,C 1; A,E -1; A,F -1; G,C -1; G,E -3; G,F -1; H,C 0; H,E 2; H,F -2 • Vertex (C, U, I_v): A (1, 1, 0); C (0, 1, -1); E (2, 1, 1); F (2, 1, 1); G (2, 0, 0); H (1, 0, 1) • [Figure: graph on vertices A to H] • Cut count per swap: zero swap 7; BD 3; HE 1
  • 21. Kernighan-Lin algorithm 4. Perform a tentative swap H,E • Pair I(v,w): A,C -3; A,F -5; G,C -3; G,F -5 • Vertex (C, U, I_v): A (0, 2, -2); C (0, 1, -1); F (0, 3, -3); G (0, 2, -2) • [Figure: graph on vertices A to H] • Cut count per swap: zero swap 7; BD 3; HE 1; AC 3
  • 22. Kernighan-Lin algorithm 5. Perform a tentative swap A,C • Pair I(v,w): G,F -3 • Vertex (C, U, I_v): F (1, 2, -1); G (0, 2, -2) • [Figure: graph on vertices A to H] • Cut count per swap: zero swap 7; BD 3; HE 1; AC 4; GF 7
  • 23. Kernighan-Lin algorithm 6. Scan the list searching for the min cut 7. Perform the swaps (if the min cut is smaller than the zero-swap cut count) • Cut count per swap: zero swap 7; BD 3; HE 1; AC 4; GF 7 • [Figures: graph on vertices A to H before and after the swaps] 8. Second iteration
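The walkthrough above can be condensed into one pass of the procedure. This is a minimal sketch rather than the original circuit-partitioning code: it recomputes every I_v from scratch at each step (simpler but slower than the O(n² log n) bookkeeping), and the names `cut_size` and `kl_pass` as well as the toy graph are hypothetical.

```python
def cut_size(adj, side):
    """Number of edges whose endpoints lie on different sides."""
    return sum(1 for u in adj for v in adj[u] if side[u] != side[v]) // 2

def kl_pass(adj, side):
    """One Kernighan-Lin pass; `side` maps each vertex to 0 or 1."""
    work = dict(side)
    locked = set()
    cuts = [cut_size(adj, work)]  # cuts[k] = cut count after k tentative swaps
    swaps = []
    while True:
        free0 = [v for v in work if work[v] == 0 and v not in locked]
        free1 = [v for v in work if work[v] == 1 and v not in locked]
        if not free0 or not free1:
            break
        imp = {}  # I_v = C_v - U_v for every unlocked vertex
        for v in free0 + free1:
            c = sum(1 for w in adj[v] if work[w] != work[v])
            imp[v] = c - (len(adj[v]) - c)
        # I(v,w) = I_v + I_w, minus 2 when v and w are adjacent.
        gain, v, w = max(
            ((imp[v] + imp[w] - (2 if w in adj[v] else 0), v, w)
             for v in free0 for w in free1),
            key=lambda t: t[0],
        )
        work[v], work[w] = work[w], work[v]  # tentative swap
        locked.update((v, w))
        swaps.append((v, w))
        cuts.append(cuts[-1] - gain)
    # Keep only the prefix of swaps that reaches the minimum cut count.
    best_k = min(range(len(cuts)), key=cuts.__getitem__)
    result = dict(side)
    for v, w in swaps[:best_k]:
        result[v], result[w] = result[w], result[v]
    return result, cuts[best_k]

# Hypothetical toy graph: triangles a-b-c and d-e-f joined by the edge c-d.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
       "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"}}
start = {"a": 0, "b": 0, "d": 0, "c": 1, "e": 1, "f": 1}
print(kl_pass(adj, start))
```

Starting from a cut of 5, the first tentative swap (d, c) already yields a cut of 1; later swaps only make things worse, so the pass keeps just that one swap.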
  • 24. Hierarchical Clustering • Widely used in social network analysis • No need to specify the number of clusters • Graphs may have a hierarchical structure • Hierarchical clustering aims at identifying groups of vertices with high similarity (not focusing on connectedness) • Define a similarity measure between vertices • Compute the n x n similarity matrix • Agglomerative algorithms (bottom-up): clusters are merged if their similarity is sufficiently high • Divisive algorithms (top-down): clusters are iteratively split by removing edges connecting vertices with low similarity
  • 25. Hierarchical Clustering • Merging Clusters • Single linkage • Complete linkage • Average linkage • Drawback of the hierarchical procedure: it does not provide a way to discriminate which level better represents the community structure of the graph
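The three merging rules differ only in how the similarity between two clusters is derived from vertex similarities. A minimal agglomerative sketch, assuming a symmetric similarity matrix and a hypothetical stopping `threshold` (the slide itself does not say where to stop):

```python
def agglomerative(sim, linkage="single", threshold=0.5):
    """Merge clusters bottom-up while the best pair is similar enough."""
    combine = {
        "single": max,                             # most similar pair of members
        "complete": min,                           # least similar pair of members
        "average": lambda xs: sum(xs) / len(xs),   # mean over all member pairs
    }[linkage]
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > 1:
        best, pair = None, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = combine([sim[i][j] for i in clusters[a] for j in clusters[b]])
                if best is None or s > best:
                    best, pair = s, (a, b)
        if best < threshold:
            break
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)

# Hypothetical similarity matrix for four vertices.
sim = [[1.0, 0.9, 0.1, 0.1],
       [0.9, 1.0, 0.6, 0.1],
       [0.1, 0.6, 1.0, 0.8],
       [0.1, 0.1, 0.8, 1.0]]
print(agglomerative(sim, "single"), agglomerative(sim, "complete"))
```

On this matrix single linkage chains everything into one cluster through the 0.6 link, while complete linkage stops at two clusters, which illustrates why the choice of level and linkage matters.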
  • 26. Girvan-Newman • Divisive method: detect edges that connect different communities and remove them until clusters are disconnected • 4 Steps: 1. Compute edge centrality 2. Remove the edge with the highest centrality 3. Recompute the centralities (essential) 4. If |E| > 0, go to 2
  • 27. Girvan-Newman • Instead of trying to construct a measure which tells us which edges are most central to communities, we focus on those edges which are least central, the ones most "between" communities • If a network contains communities or groups that are only loosely connected by a few inter-group edges, then all shortest paths between different communities must go along one of these few edges • Edge betweenness, O(mn) to compute • number of shortest paths between all vertex pairs that run along the considered edge • The edges connecting communities will have high edge betweenness • Which partition is the best? • Compute modularity
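The loop of computing betweenness, removing the top edge, and recomputing can be sketched with a breadth-first accumulation in the style of Brandes' algorithm. This is an illustrative version for unweighted, undirected graphs; the function names are hypothetical, and the loop stops at the first split instead of running until no edges remain.

```python
from collections import deque

def edge_betweenness(adj):
    """Shortest-path betweenness of every edge (unweighted, undirected)."""
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist = {s: 0}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):      # accumulate from the farthest vertices
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1 + delta[w])
                bet[frozenset((v, w))] += share
                delta[v] += share
    return {e: b / 2 for e, b in bet.items()}  # each pair was counted twice

def components(adj):
    """Connected components as a list of vertex sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def girvan_newman_split(adj):
    """Remove top-betweenness edges, recomputing each time, until a split."""
    adj = {v: set(ns) for v, ns in adj.items()}
    start = len(components(adj))
    while len(components(adj)) == start and any(adj.values()):
        bet = edge_betweenness(adj)    # recomputed after every removal
        u, v = max(bet, key=bet.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)

# Hypothetical toy graph: two triangles joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(girvan_newman_split(adj))
```

The bridge 2-3 carries all nine shortest paths between the two triangles, so it is removed first and the graph separates into the two expected groups.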
  • 28. Girvan-Newman Optimal community structure for Zachary's karate club. Modularity without recalculation
  • 29. Modularity optimization • If high modularity indicates good partitions, why not simply optimize Q over all partitions to find the best one? • The search space is exponential in |V| • Greedy algorithm (Newman): • Agglomerative clustering: we repeatedly join communities together in pairs, choosing at each step the join that results in the greatest increase (or smallest decrease) in Q • Note that joining a pair of communities between which there are no edges at all can never result in an increase in Q. This limits the number of tentative joins to at most m • e_ii is the fraction of edges in the network that connect vertices in group i • a_i = Σ_j e_ij is the fraction of edge ends attached to vertices in group i
  • 31. Modularity optimization • The peak modularity is Q = 0.381 • The GN algorithm performs similarly on this task, but not better: it also finds the split but classifies one vertex wrongly (although a different one, vertex 3).
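A compact sketch of the greedy agglomeration. It assumes the usual change in modularity for joining groups i and j, ΔQ = 2(e_ij − a_i a_j), rewritten with raw counts as L_ij/m − D_i D_j/(2m²), where L_ij is the number of edges between the groups and D_i the degree sum of group i. It is a plain quadratic implementation without the data structures that make the method fast; the names and the toy graph are hypothetical, and the graph is assumed simple (no self-loops).

```python
def greedy_modularity(edges):
    """Join communities pairwise, always taking the join with the largest dQ."""
    m = len(edges)
    members, deg, links = {}, {}, {}
    for u, v in edges:
        for x in (u, v):
            members.setdefault(x, {x})
            deg[x] = deg.get(x, 0) + 1
        key = frozenset((u, v))
        links[key] = links.get(key, 0) + 1  # L_ij for singleton communities

    def delta(key):
        i, j = key
        return links[key] / m - deg[i] * deg[j] / (2 * m * m)

    q = -sum((d / (2 * m)) ** 2 for d in deg.values())  # Q of all singletons
    best_q, best = q, [set(c) for c in members.values()]
    while links:
        # Only pairs joined by at least one edge are candidates.
        key = max(links, key=delta)
        q += delta(key)
        i, j = key
        members[i] |= members.pop(j)
        deg[i] += deg.pop(j)
        del links[key]
        # Redirect the remaining links of j to the merged community i.
        for old in [k for k in links if j in k]:
            (other,) = old - {j}
            count = links.pop(old)
            new = frozenset((i, other))
            links[new] = links.get(new, 0) + count
        if q > best_q:
            best_q, best = q, [set(c) for c in members.values()]
    return best_q, best

# Hypothetical toy graph: two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
print(greedy_modularity(edges))
```

The function returns the partition at the modularity peak along the merge sequence, which is how the level of the dendrogram is selected in this approach.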
  • 32. Overlapping Communities • The most popular technique to discover overlapping communities is the Clique Percolation Method (CPM) • The internal edges of a community are likely to form cliques due to their high density • It is unlikely that inter-community edges form cliques • If it were possible for a clique to move on a graph, in some way, it would probably get trapped inside its original community, as it could not cross the bottleneck formed by the inter-community edges
  • 33. CPM • Given a parameter k: 1. Find all the cliques of size k 2. Construct a clique graph: two cliques are adjacent if they share k-1 vertices 3. Each connected component of the clique graph is a community • Example, cliques of size 3: {1,2,3}, {1,3,4}, {4,5,6}, {5,6,7}, {5,6,8}, {5,7,8}, {6,7,8}
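Steps 2 and 3 can be sketched on the slide's own list of 3-cliques. Step 1 (enumerating the cliques) is taken as given here; the function name `clique_percolation` and the union-find helper are hypothetical.

```python
from itertools import combinations

def clique_percolation(cliques, k):
    """Merge k-cliques that share k-1 vertices; return the communities."""
    cliques = [frozenset(c) for c in cliques]
    parent = list(range(len(cliques)))  # union-find over clique indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Clique graph: two cliques are adjacent if they share k-1 vertices.
    for a, b in combinations(range(len(cliques)), 2):
        if len(cliques[a] & cliques[b]) >= k - 1:
            parent[find(a)] = find(b)

    # Each connected component of the clique graph is one community.
    groups = {}
    for idx, clique in enumerate(cliques):
        groups.setdefault(find(idx), set()).update(clique)
    return sorted(sorted(g) for g in groups.values())

cliques = [{1, 2, 3}, {1, 3, 4}, {4, 5, 6}, {5, 6, 7},
           {5, 6, 8}, {5, 7, 8}, {6, 7, 8}]
print(clique_percolation(cliques, 3))
```

Cliques {1,3,4} and {4,5,6} share only vertex 4, so they are not adjacent in the clique graph; vertex 4 therefore ends up in both communities, which is the overlap CPM is designed to capture.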
  • 34. CPM The communities of the Karate network (by k-clique percolation for k = 3): gray nodes are overlapping and white nodes do not belong to any community
  • 35. Overlapping Communities: Edge-centric approach • A vertex can belong to several communities • A link is usually related to one community • Cluster edges instead of nodes • After obtaining edge clusters, communities can be recovered by replacing each edge with its two vertices • A node is involved in a community as long as any of its connections is in the community • Assume we are given a set of descriptive labels for each vertex • Given the similarity matrix, apply a clustering algorithm to discover edge clusters
  • 36. Testing algorithms • Compare the partition provided by the algorithm with the ground truth • We assume that the community membership of each vertex is known • The mapping is clear when dealing with 2 communities • When there are many communities the mapping may not be intuitive • We need to average over all the possible mappings
  • 37. Normalized Mutual Information • Consider two partitions πa and πb with sizes k(a) and k(b) • n_{h,l} is the number of vertices belonging to the h-th community of (a) and to the l-th community of (b) • n^a_h is the number of vertices belonging to the h-th community of partition (a) • n^b_l is the number of vertices belonging to the l-th community of partition (b) • NMI relates the information gain (mutual information) between the two partitions to their entropies • 0 ≤ NMI ≤ 1
  • 38. Testing algorithms without ground truth • Use a common (and simple) objective function • Conductance: the ratio between the number of edges leaving the cluster and the number of edges inside the cluster • Network Community Profile: the score of the best cluster of size k
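The formula itself is missing from the transcript, so the following sketch assumes one common normalization: the mutual information divided by the geometric mean of the two entropies, written directly in terms of the counts n_{h,l}, n^a_h and n^b_l. Other normalizations (for example the arithmetic mean of the entropies) exist; the function name `nmi` is hypothetical.

```python
from collections import Counter
from math import log, sqrt

def nmi(labels_a, labels_b):
    """NMI of two partitions given as per-vertex community labels."""
    n = len(labels_a)
    n_a = Counter(labels_a)                  # n^a_h
    n_b = Counter(labels_b)                  # n^b_l
    n_hl = Counter(zip(labels_a, labels_b))  # n_{h,l}
    info = sum(c * log(n * c / (n_a[h] * n_b[l])) for (h, l), c in n_hl.items())
    ent_a = sum(c * log(c / n) for c in n_a.values())
    ent_b = sum(c * log(c / n) for c in n_b.values())
    if ent_a == 0 or ent_b == 0:
        # A single-cluster partition has zero entropy; the ratio is undefined.
        # Returning 0.0 is a convention chosen for this sketch.
        return 0.0
    return info / sqrt(ent_a * ent_b)

print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))  # same split, labels swapped
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))  # independent splits
```

Because the score depends only on co-occurrence counts, it needs no explicit mapping between community labels, which addresses the mapping problem raised on the previous slide.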
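A small sketch of both scores. Note the assumption: instead of the informal ratio on the slide, it uses the widely used volume-based form of conductance, cut(S) / min(vol(S), vol(V∖S)), where vol is the sum of degrees. The profile is computed by brute force over all vertex subsets of size k, which is only feasible for tiny graphs; the function names are hypothetical.

```python
from itertools import combinations

def conductance(adj, cluster):
    """Edges leaving the cluster divided by the smaller of the two volumes."""
    s = set(cluster)
    cut = sum(1 for u in s for v in adj[u] if v not in s)
    vol_s = sum(len(adj[u]) for u in s)
    vol_rest = sum(len(ns) for ns in adj.values()) - vol_s
    denom = min(vol_s, vol_rest)
    # Undefined when one side has no edge ends; treated as worst score here.
    return cut / denom if denom else float("inf")

def community_profile(adj, k):
    """Best (lowest) conductance over all clusters of exactly k vertices."""
    return min(conductance(adj, c) for c in combinations(adj, k))

# Hypothetical toy graph: two triangles joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(conductance(adj, {0, 1, 2}))
print([community_profile(adj, k) for k in (1, 2, 3)])
```

Lower conductance means a better-separated cluster, so the profile dips at the sizes where the graph has natural communities.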