Community Detection
community structure
A network is said to have community structure if its nodes can be
grouped into (potentially overlapping) sets such that each set of
nodes is densely connected internally.
Searching for communities in a network
 There are numerous algorithms with different "target functions":
 "Homogeneity": dense connectivity within clusters
 "Separation": graph partitioning, min-cut approach
 Clustering is important for understanding the
structure of the network
 Provides an overview of the network
3
Overlapping and Non-Overlapping Communities
4
Non-Overlapping Overlapping
5
Nested
 Analyzed using adjacency matrix
6
7
Visualization after Grouping
8
(Nodes colored by
Community Membership)
4 Groups:
{1,2,3,5}
{4,8,10,12}
{6,7,11}
{9,13}
Classification
 User preference or behavior can be represented as class labels
• Whether or not a user clicks on an ad
• Whether or not a user is interested in certain topics
• Whether a user subscribes to certain political views
• Whether a user likes or dislikes a product
 Given
 A social network
 Labels of some actors in the network
 Output
 Labels of remaining actors in the network
9
Visualization after Prediction
10
: Smoking
: Non-Smoking
: ? Unknown
Visualization after Prediction
11
Predictions
6: Non-Smoking
7: Non-Smoking
8: Smoking
9: Non-Smoking
10: Smoking
Link Prediction
 Given a social network, predict which nodes are likely
to get connected
 Output a list of (ranked) pairs of nodes
 Example: Friend recommendation in Facebook
12
(2, 3)
(4, 12)
(5, 7)
(7, 13)
Link Prediction
13
(2, 3)
(4, 12)
(5, 7)
(7, 13)
Principles of
Community Detection
Community: “subsets of actors
among whom there are relatively
strong, direct, intense, frequent or
positive ties.”
Community is a set of actors
interacting with each other
frequently
e.g. people attending this
conference
 A set of people without interaction is NOT a
community
 e.g., people waiting at a bus station who don't talk to each other
 Communities are formed in Facebook
Why Communities in Social Media?
 Human beings are social
 Interactions in social media offer a glimpse of
the physical world
 People are connected to friends, relatives, and
colleagues in the real world as well as online
 Easy-to-use social media allows people to extend
their social life in unprecedented ways
 Difficult to meet friends in the physical world, but much
easier to find friends online with similar interests
Community Detection
 Community Detection: “formalize the strong social groups
based on the social network properties”
 Some social media sites allow people to join groups; is it still
necessary to extract groups based on network topology?
 Not all sites provide a community platform
 Not all people join groups
 Network interaction provides rich information about the
relationship between users
 Groups are implicitly formed
 Can complement other kinds of information
 Help network visualization and navigation
 Provide basic information for other tasks
Taxonomy of Community Criteria
 Criteria vary depending on the tasks
 Roughly, community detection methods can be divided
into 4 categories (not exclusive):
 Node-Centric Community
 Each node in a group satisfies certain properties
 Group-Centric Community
 Consider the connections within a group as a whole. The
group has to satisfy certain properties without zooming into
node-level
 Network-Centric Community
 Partition the whole network into several disjoint sets
 Hierarchy-Centric Community
 Construct a hierarchical structure of communities
Node-Centric Community Detection
 Nodes satisfy different properties
 Complete Mutuality
 cliques
 Reachability of members
 k-clique, k-clan, k-club
 Nodal degrees
 k-plex, k-core
 Relative frequency of Within-Outside Ties
 LS sets, Lambda sets
 Commonly used in traditional social network analysis
Complete Mutuality: Clique
 A maximal complete subgraph of three or more nodes
all of which are adjacent to each other
 NP-hard to find the maximum clique
 Recursive pruning: to find a clique of size k, remove
those nodes with degree less than k-1
 Normally use cliques as a core or seed to explore
larger communities
Finding the Maximum Clique
 In a clique of size k, each node maintains degree >= k-1
 Nodes with degree < k-1 will not be included in the maximum
clique
 Recursively apply the following pruning procedure
 Sample a sub-network from the given network, and find a clique in
the sub-network, say, by a greedy approach
 Suppose the clique found is of size k; to find a larger
clique, all nodes with degree <= k-1 can be removed.
 Repeat until the network is small enough
26
Maximum Clique Example
 Suppose we sample a sub-network with nodes {1-9} and
find a clique {1, 2, 3} of size 3
 In order to find a clique of size greater than 3, remove all nodes
with degree <= 3-1 = 2
 Remove nodes 2 and 9
 Remove nodes 1 and 3
 Remove node 4
27
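The pruning procedure above can be sketched in a few lines of Python. This is a minimal illustration, assuming networkx is available; greedy_clique and the toy edge list are invented for the example and are not the slide's figure.

```python
import networkx as nx

def greedy_clique(G):
    """Grow a clique greedily: scan nodes by decreasing degree and keep
    adding a node if it is adjacent to every member found so far."""
    clique = []
    for v in sorted(G, key=G.degree, reverse=True):
        if all(G.has_edge(v, u) for u in clique):
            clique.append(v)
    return clique

def prune_for_larger_clique(G):
    """Recursive pruning: once a clique of size k is known, nodes with
    degree <= k - 1 cannot be in a clique of size k + 1 and are dropped."""
    G = G.copy()
    best = []
    while len(G) > 0:
        clique = greedy_clique(G)
        if len(clique) > len(best):
            best = clique
        to_drop = [v for v, d in G.degree() if d <= len(best) - 1]
        if not to_drop:           # nothing left to prune
            break
        G.remove_nodes_from(to_drop)
    return best

# Toy graph (not the slide's figure): a 4-clique plus a small tail.
G = nx.Graph([(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6)])
print(prune_for_larger_clique(G))   # the 4-clique {1, 2, 3, 4}
```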
Geodesic
 Reachability is calibrated by the
Geodesic distance
 Geodesic: a shortest path between
two nodes (e.g., between nodes 12 and 6)
 Two paths: 12-4-1-2-5-6, 12-10-6
 12-10-6 is a geodesic
 Geodesic distance: #hops in geodesic
between two nodes
 e.g., d(12, 6) = 2, d(3, 11)=5
 Diameter: the maximal geodesic
distance for any 2 nodes in a network
 #hops of the longest shortest path
28
Diameter = 5
Reachability: k-clique, k-club
 Any node in a group should be reachable
in k hops
 k-clique: a maximal subgraph in which
the geodesic distance between any two
nodes is <= k (distances are measured in
the original network)
 A k-clique can therefore have a diameter larger than
k within the subgraph
 e.g., 2-clique {12, 4, 10, 1, 6}
 Within the subgraph d(1, 6) = 3
 k-club: a substructure of diameter <= k
 e.g., {1,2,5,6,8,9}, {12, 4, 10, 1} are 2-clubs
29
Nodal Degrees: k-core, k-plex
 Each node should have a certain number of connections to
nodes within the group
 k-core: a substructure in which each node connects to at least k
members within the group
 Based on the number of within-group connections a node must
have to be included in the subgraph: in a 3-core, every node is
connected to at least 3 other nodes in the group
 k-plex: for a group with ns nodes, each node should be
adjacent to no fewer than ns-k other nodes in the group
 E.g., in a 2-plex of 5 nodes, every node must be connected to at
least 3 other nodes in the group
30
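A quick way to experiment with these definitions, assuming Python with networkx: nx.k_core strips low-degree nodes exactly as described, and a small hand-written check covers the k-plex condition. The edge list is a made-up toy graph, not the slide's figure.

```python
import networkx as nx

# Toy graph: a triangle {1, 2, 3} with a tail 3-4-5.
G = nx.Graph([(1, 2), (2, 3), (1, 3), (3, 4), (4, 5)])

# 2-core: repeatedly strip nodes with fewer than 2 remaining neighbors.
print(sorted(nx.k_core(G, k=2).nodes()))        # [1, 2, 3]

def is_k_plex(G, nodes, k):
    """k-plex: every member is adjacent to at least len(nodes) - k others in the group."""
    S = G.subgraph(nodes)
    return all(d >= len(nodes) - k for _, d in S.degree())

print(is_k_plex(G, {1, 2, 3}, k=1))             # True: a clique is a 1-plex
```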
Within-Outside Ties: LS sets
 LS sets: Any of its proper subsets has more ties to other
nodes in the group than outside the group
 Too strict to be practical for network analysis
31
Clique Percolation Method
 Clique is a very strict definition, unstable
 Normally use cliques as a core or a seed to find larger
communities
 CPM is such a method to find overlapping communities
 Input
 A parameter k, and a network
 Procedure
 Find out all cliques of size k in a given network
 Construct a clique graph: two cliques are adjacent if they share k-1
nodes
 Each connected component of the clique graph forms a
community (the union of the nodes of its cliques)
32
CPM Example
Cliques of size 3:
{1, 2, 3}, {1, 3, 4}, {4, 5, 6},
{5, 6, 7}, {5, 6, 8}, {5, 7, 8},
{6, 7, 8}
Communities:
{1, 2, 3, 4}
{4, 5, 6, 7, 8}
33
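A minimal sketch of the CPM procedure above, assuming Python with networkx. The edge list is read off the listed 3-cliques, so it should reproduce the example; networkx also ships this algorithm as k_clique_communities in networkx.algorithms.community.

```python
import networkx as nx
from itertools import combinations

def clique_percolation(G, k):
    """CPM: k-cliques are adjacent if they share k-1 nodes; each connected
    component of the clique graph yields one (possibly overlapping) community."""
    # All k-cliques, enumerated from the maximal cliques.
    k_cliques = set()
    for maximal in nx.find_cliques(G):
        if len(maximal) >= k:
            for combo in combinations(maximal, k):
                k_cliques.add(frozenset(combo))
    k_cliques = list(k_cliques)

    # Clique graph: one node per k-clique, edges between cliques sharing k-1 nodes.
    clique_graph = nx.Graph()
    clique_graph.add_nodes_from(range(len(k_cliques)))
    for i, j in combinations(range(len(k_cliques)), 2):
        if len(k_cliques[i] & k_cliques[j]) == k - 1:
            clique_graph.add_edge(i, j)

    # Each connected component of the clique graph forms a community.
    return [frozenset().union(*(k_cliques[i] for i in comp))
            for comp in nx.connected_components(clique_graph)]

# The example network, with edges read off the listed 3-cliques.
G = nx.Graph([(1, 2), (1, 3), (2, 3), (1, 4), (3, 4), (4, 5), (4, 6),
              (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8)])
print(clique_percolation(G, k=3))   # [{1, 2, 3, 4}, {4, 5, 6, 7, 8}]
```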
Group-Centric Community Detection
Group-Centric Community Detection: Density-Based Groups
 The group-centric criterion requires the whole group to
satisfy a certain condition
 E.g., the group density >= a given threshold
 A subgraph Gs = (Vs, Es) is a γ-quasi-clique if
2|Es| / (|Vs|(|Vs| - 1)) >= γ
where the denominator is the maximum possible number of edges
(a density check of this kind is sketched after this slide)
 A similar strategy to that of cliques can be used
 Sample a subgraph, and find a maximal quasi-clique
 Remove nodes whose degree is less than the average degree
35
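A small sketch of the group-centric density test, assuming Python with networkx; is_quasi_clique is a hypothetical helper name and the toy graph is made up for illustration.

```python
import networkx as nx

def is_quasi_clique(G, nodes, gamma):
    """Group-centric test: subgraph density 2|Es| / (|Vs|(|Vs| - 1)) >= gamma."""
    S = G.subgraph(nodes)
    n, m = S.number_of_nodes(), S.number_of_edges()
    return n < 2 or 2 * m / (n * (n - 1)) >= gamma

# A 4-node group with 5 of the 6 possible edges has density 5/6.
G = nx.Graph([(1, 2), (1, 3), (1, 4), (2, 3), (3, 4)])
print(is_quasi_clique(G, {1, 2, 3, 4}, gamma=0.8))   # True (5/6 >= 0.8)
```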
Network-Centric Community Detection
Network-Centric Community Detection
 Network-centric criterion needs to consider the
connections within a network globally
 Goal: partition nodes of a network into disjoint sets
 Groups based on
 (1) Clustering based on vertex similarity
 (2) Latent space models (multi-dimensional scaling)
 (3) Block model approximation
 (4) Spectral clustering
 (5) Modularity maximization
37
Clustering based on Vertex Similarity
 Distance based partition of the nodes
 Two nodes are structurally equivalent if they connect
to the same set of actors
 e.g., nodes 8 and 9 are structurally equivalent
 Groups are defined over equivalent nodes
 Too strict
 Rarely occurs in large-scale networks
 Relaxed equivalence classes are difficult to compute
 In practice, use vector similarity
 e.g., cosine similarity, Jaccard similarity
38
39
Vertex Similarity
Adjacency-matrix rows (columns 1-13), written as neighbor sets:
node 5: {2, 6}
node 8: {1, 6, 13}
node 9: {1, 6, 13}
(nodes 8 and 9 have identical rows: structurally equivalent)

Treating each row as a vector:
Cosine Similarity: sim(5, 8) = |{6}| / (sqrt(2) * sqrt(3)) = 1/sqrt(6) ≈ 0.41
Jaccard Similarity: J(5, 8) = |{6}| / |{1, 2, 6, 13}| = 1/4
40
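Both measures can be checked directly from the neighbor sets above; a brief Python sketch (numpy is used only for the square root):

```python
import numpy as np

# Neighbor sets as reconstructed above (node 5 ~ {2, 6}; nodes 8 and 9 ~ {1, 6, 13}).
neighbors = {5: {2, 6}, 8: {1, 6, 13}, 9: {1, 6, 13}}

def cosine(a, b):
    return len(a & b) / np.sqrt(len(a) * len(b))

def jaccard(a, b):
    return len(a & b) / len(a | b)

print(cosine(neighbors[5], neighbors[8]))    # 1/sqrt(6) ~ 0.408
print(jaccard(neighbors[5], neighbors[8]))   # 1/4 = 0.25
print(jaccard(neighbors[8], neighbors[9]))   # 1.0 (structurally equivalent)
```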
Similarity using clusters
 We may have items that essentially represent the same thing; we
place such represent-the-same-thing items into clusters, e.g.
 C1 = {0, 1, 2}, C2 = {3, 4}, C3 = {5, 6}, C4 = {7, 8, 9}
 For instance, C1 might represent action movies, C2
comedies, C3 documentaries, and C4 horror movies.
 Two item sets A and B can then be compared at the cluster level:
Aclu = {C1, C3} and Bclu = {C1, C2, C3, C4}, since A only contains elements
from C1 and C3, while B contains elements from all clusters. The Jaccard
similarity of the clustered sets is now
JSclu(A, B) = JS(Aclu, Bclu) = |{C1, C3}| / |{C1, C2, C3, C4}| = 2/4 = 0.5
41
Example
 a rose is a rose is a rose
 Word-level k-shingles where k = 4:
 1) a rose is a
 2) rose is a rose
 3) is a rose is
 Word-level k-shingles where k = 2: [a rose], [rose is], [is a]
44
Jaccard with Shingles
Statements
D1 : I am Sam.
D2 : Sam I am.
D3 : I do not like green eggs and ham.
D4 : I do not like them, Sam I am.
K-Shingles where k=2
 D1 : [I am], [am Sam]
 D2 : [Sam I], [I am]
 D3 : [I do], [do not], [not like], [like green], [green eggs],
[eggs and], [and ham]
 D4 : [I do], [do not], [not like], [like them], [them Sam],
[Sam I], [I am]
45
Jaccard with Shingles
 Now the Jaccard similarities are as follows:
JS(D1, D2) = 1/3 ≈ 0.333
JS(D1, D3) = 0
JS(D1, D4) = 1/8 = 0.125
JS(D2, D3) = 0
JS(D2, D4) = 2/7 ≈ 0.286
JS(D3, D4) = 3/11 ≈ 0.273
46
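A short sketch that reproduces these numbers, assuming Python; shingles and jaccard are hypothetical helper names, and punctuation/case are normalized so that [I am] matches across documents.

```python
def shingles(text, k=2):
    """Word-level k-shingles of a sentence (punctuation stripped, lowercased)."""
    words = text.lower().replace(",", "").replace(".", "").split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

docs = {
    "D1": "I am Sam.",
    "D2": "Sam I am.",
    "D3": "I do not like green eggs and ham.",
    "D4": "I do not like them, Sam I am.",
}
sh = {name: shingles(text, k=2) for name, text in docs.items()}
print(jaccard(sh["D1"], sh["D2"]))   # 1/3
print(jaccard(sh["D1"], sh["D4"]))   # 1/8
print(jaccard(sh["D3"], sh["D4"]))   # 3/11
```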
Latent Space Models
 Map nodes into a low-dimensional space such that the proximity
between nodes based on network connectivity is preserved in the
new space, then apply k-means clustering
 Multi-dimensional scaling (MDS)
 Given a network, construct a proximity matrix P representing the pairwise
distance between nodes (e.g., geodesic distance)
 Let S ∈ R^(n×l) denote the coordinates of the nodes in the l-dimensional space
 Objective function (classical MDS): min_S || P~ - S S^T ||_F^2, where
P~ = -1/2 (I - (1/n) 1 1^T) (P ∘ P) (I - (1/n) 1 1^T) is the centered,
squared distance matrix
 Solution: S = V Λ^(1/2), where V holds the top l eigenvectors of P~ and
Λ is a diagonal matrix of the top l eigenvalues
47
(2) Latent space models
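A minimal sketch of this pipeline, assuming Python with networkx, numpy and scikit-learn: geodesic distances, classical-MDS double centering, top eigenvectors as coordinates, then k-means. The example graph is assumed for illustration and is not the slide's figure.

```python
import numpy as np
import networkx as nx
from sklearn.cluster import KMeans

# Assumed example graph with two visible communities.
G = nx.Graph([(1, 2), (1, 3), (2, 3), (3, 4), (1, 4),
              (4, 5), (5, 6), (5, 7), (6, 7), (6, 8), (7, 8), (8, 9), (7, 9)])
nodes = sorted(G)
P = np.asarray(nx.floyd_warshall_numpy(G, nodelist=nodes))   # geodesic distances

# Classical MDS: double-center the squared distances, take the top eigenvectors.
n = len(nodes)
J = np.eye(n) - np.ones((n, n)) / n                 # centering matrix
B = -0.5 * J @ (P ** 2) @ J
eigvals, eigvecs = np.linalg.eigh(B)
top = np.argsort(eigvals)[::-1][:2]                 # 2-dimensional embedding
S = eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0))

labels = KMeans(n_clusters=2, n_init=10).fit_predict(S)
print(dict(zip(nodes, labels)))   # expected: nodes 1-4 and 5-9 in different clusters
```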
MDS Example
Two communities:
{1, 2, 3, 4} and {5, 6, 7, 8, 9}
geodesic
distance
48
(2) Latent space models
Block Models
 S is the community indicator matrix (group memberships)
 Relax S to be numerical values, then the optimal solution
corresponds to the top eigenvectors of A
Two communities:
{1, 2, 3, 4} and {5, 6, 7, 8, 9}
49
(3) Block model approximation
Cut-Minimization
 Between-group interactions should be infrequent
 Cut: number of edges between two sets of nodes
 Objective: minimize the cut
 Limitation: often finds communities of only one node
 Need to consider the group size
 Two commonly used variants: ratio cut and normalized cut (next slides)
50
Cut = 1
Cut = 2
(4) Spectral clustering
 Partitioning quality is very important
 Degenerate case: the minimum cut may simply split off a single node
51
52
53
Ratio Cut & Normalized Cut
 Minimum cut often returns an imbalanced partition, with
one set being a singleton, e.g. node 9
 Change the objective function to take community size into account:
Ratio Cut = (1/k) Σ_i cut(Ci) / |Ci|
Normalized Cut = (1/k) Σ_i cut(Ci) / vol(Ci)
Ci: a community; cut(Ci): number of edges between Ci and the rest of the network
|Ci|: number of nodes in Ci
vol(Ci): sum of degrees in Ci
54
(4) Spectral clustering
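The three quantities above can be computed directly; a sketch assuming Python with networkx (cut_size, ratio_cut and normalized_cut are hypothetical helper names, and the barbell graph is just a convenient test case).

```python
import networkx as nx

def cut_size(G, community):
    """Edges with exactly one endpoint inside the community."""
    community = set(community)
    return sum(1 for u, v in G.edges() if (u in community) != (v in community))

def ratio_cut(G, communities):
    return sum(cut_size(G, C) / len(C) for C in communities) / len(communities)

def normalized_cut(G, communities):
    return sum(cut_size(G, C) / sum(d for _, d in G.degree(C))
               for C in communities) / len(communities)

G = nx.barbell_graph(4, 0)                 # two 4-cliques joined by one edge
balanced  = [set(range(4)), set(range(4, 8))]
singleton = [{0}, set(range(1, 8))]
print(ratio_cut(G, singleton), ratio_cut(G, balanced))   # singleton split is penalized
```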
Ratio Cut & Normalized Cut Example
For the two example partitions in the figure (red and green),
both ratio cut and normalized cut prefer the balanced partition
55
(4) Spectral clustering
Spectral Clustering
 Both ratio cut and normalized cut can be reformulated as
min_S Tr(S^T L S) subject to S^T S = I
 Where L is
the graph Laplacian D - A for ratio cut, or
the normalized graph Laplacian I - D^(-1/2) A D^(-1/2) for normalized cut
(D: a diagonal matrix of degrees)
 Spectral relaxation: allow S to take continuous values
 Optimal solution: the top eigenvectors with the smallest
eigenvalues (the trivial constant eigenvector is discarded)
56
(4) Spectral clustering
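A minimal ratio-cut-style sketch of the relaxation above, assuming Python with networkx, numpy and scikit-learn; spectral_communities is a hypothetical helper, and the normalized-cut variant would swap in the normalized Laplacian.

```python
import numpy as np
import networkx as nx
from sklearn.cluster import KMeans

def spectral_communities(G, k):
    """Embed nodes with the eigenvectors of L = D - A that have the smallest
    eigenvalues (skipping the trivial constant one), then run k-means."""
    nodes = sorted(G)
    L = nx.laplacian_matrix(G, nodelist=nodes).toarray().astype(float)
    eigvals, eigvecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    S = eigvecs[:, 1:k]                           # drop the first (constant) eigenvector
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(S)
    return {node: int(label) for node, label in zip(nodes, labels)}

G = nx.barbell_graph(4, 0)                        # two 4-cliques joined by one edge
print(spectral_communities(G, k=2))               # the two cliques get different labels
```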
Spectral Clustering Example
Two communities:
{1, 2, 3, 4} and {5, 6, 7, 8, 9}
The 1st eigenvector is constant, which
would put all nodes in the same cluster,
so it is discarded; k-means is run on the
remaining eigenvectors
57
(4) Spectral clustering
Modularity Maximization
 Modularity measures the group interactions compared
with the expected random connections in the group
 In a network with m edges, for two nodes with degrees di
and dj, the expected number of random connections between
them is di*dj / (2m)
 The interaction utility in a group C: Σ_{i,j in C} (Aij - di*dj/(2m))
 To partition the network into
multiple groups, we maximize
Q = (1/2m) Σ_groups Σ_{i,j in group} (Aij - di*dj/(2m))
58
Expected number of
edges between 6 and 9 is
5*3/(2*17) = 15/34
(5) Modularity maximization
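A direct transcription of Q, assuming Python with networkx; modularity here is a hand-rolled helper (networkx provides an equivalent function in networkx.algorithms.community that can be used to cross-check) and the barbell graph is just a convenient test case.

```python
import networkx as nx
from itertools import combinations

def modularity(G, communities):
    """Q = (1/2m) * sum over groups of sum_{i,j in group} (A_ij - d_i d_j / 2m)."""
    m = G.number_of_edges()
    deg = dict(G.degree())
    Q = 0.0
    for C in communities:
        for i, j in combinations(C, 2):
            a_ij = 1.0 if G.has_edge(i, j) else 0.0
            Q += 2 * (a_ij - deg[i] * deg[j] / (2 * m))   # counts both (i,j) and (j,i)
        for i in C:                                       # diagonal terms, A_ii = 0
            Q -= deg[i] * deg[i] / (2 * m)
    return Q / (2 * m)

G = nx.barbell_graph(4, 0)
print(modularity(G, [set(range(4)), set(range(4, 8))]))   # ~0.42 for the balanced split
```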
Properties of Modularity
 Properties of modularity:
 Between (-1, 1)
 Modularity = 0 If all nodes are clustered into one group
 Can automatically determine optimal number of
clusters
 Resolution limit of modularity
 Modularity maximization might return a community
consisting of multiple small modules
60
Applications
Community detection makes complicated topologies easier to navigate and
analyze, and the detected structure can be used for prediction
 Online social networks
 Ecological environments
 Sociological communities
 Metabolic networks
 Neural networks
 Food webs
 Biological networks, e.g., protein-protein interaction networks
 The Internet
62
63
Divisive Hierarchical Clustering
 Divisive clustering
 Partition nodes into several sets
 Each set is further divided into smaller ones
 Network-centric partition can be applied for the partition
 One particular example: recursively remove the “weakest”
tie
 Find the edge with the least strength
 Remove the edge and update the corresponding strength of each
edge
 Recursively apply the above two steps until the network is
decomposed into the desired number of connected
components.
 Each component forms a community
64
Edge Betweenness
 The strength of a tie can be measured by edge betweenness
 Edge betweenness: the number of shortest paths that pass
through the edge
 An edge with higher betweenness tends to be a bridge
between two communities.
The edge betweenness of e(1, 2) is
4 (= 6/2 + 1): the shortest paths from 2
to {4, 5, 6, 7, 8, 9} split between e(1, 2)
and e(2, 3), contributing 6/2, and e(1, 2)
is itself the only shortest path between
1 and 2, contributing 1
65
Divisive clustering based on edge
betweenness
After removing e(4,5), the
betweenness of e(4, 6) becomes 20,
which is the highest;
after removing e(4,6), the edge e(7,9)
has the highest betweenness value, 4,
and is removed next.
Initial betweenness value
66
Idea: progressively removing edges with the highest betweenness
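A sketch of the idea, assuming Python with networkx: inspect edge betweenness, then let girvan_newman remove the highest-betweenness edge repeatedly. The example graph is assumed (not the slide's figure) and is built so that a single bridge separates two dense groups.

```python
import networkx as nx
from networkx.algorithms.community import girvan_newman

# Assumed example graph with a clear two-community structure.
G = nx.Graph([(1, 2), (1, 3), (2, 3), (3, 4), (1, 4),
              (4, 5), (5, 6), (5, 7), (6, 7), (6, 8), (7, 8), (8, 9), (7, 9)])

# Highest-betweenness edge: the bridge between the two dense regions.
eb = nx.edge_betweenness_centrality(G, normalized=False)
print(max(eb, key=eb.get))                # expected: the bridge edge (4, 5)

# Girvan-Newman: repeatedly remove the highest-betweenness edge.
first_split = next(girvan_newman(G))
print([sorted(c) for c in first_split])   # e.g. [[1, 2, 3, 4], [5, 6, 7, 8, 9]]
```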
Agglomerative Hierarchical Clustering
 Initialize each node as a community
 Merge communities successively into larger
communities following a certain criterion
 E.g., based on modularity increase
67
Dendrogram according to Agglomerative Clustering based on Modularity
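One widely used agglomerative scheme of this kind is greedy modularity merging (Clauset-Newman-Moore), which networkx exposes directly; a brief illustration on the classic karate-club benchmark graph, assuming Python with networkx:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.karate_club_graph()                      # classic benchmark network
communities = greedy_modularity_communities(G)  # merge pairs that most increase modularity
print([sorted(c) for c in communities])
```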
Summary of Community Detection
 Node-Centric Community Detection
 cliques, k-cliques, k-clubs
 Group-Centric Community Detection
 quasi-cliques
 Network-Centric Community Detection
 Clustering based on vertex similarity
 Latent space models, block models, spectral clustering, modularity
maximization
 Hierarchy-Centric Community Detection
 Divisive clustering
 Agglomerative clustering
68
COMMUNITY EVALUATION
69
Evaluating Community Detection (1)
 For groups with clear definitions
 E.g., Cliques, k-cliques, k-clubs, quasi-cliques
 Verify whether extracted communities satisfy the
definition
 For networks with ground truth information
 Normalized mutual information
 Accuracy of pairwise community memberships
70
Measuring a Clustering Result
 The number of communities after grouping can be
different from the ground truth
 No clear community correspondence between clustering
result and the ground truth
 Normalized Mutual Information can be used
Ground truth communities: {1, 2, 3} and {4, 5, 6}
Clustering result: {1, 3}, {2}, and {4, 5, 6}
How to measure the
clustering quality?
71
Normalized Mutual Information
 Entropy: the information contained in a distribution
 Mutual Information: the shared information between two
distributions
 Normalized Mutual Information (between 0 and 1), e.g.
NMI(X; Y) = I(X; Y) / sqrt(H(X) * H(Y))
or a similar normalization (Strehl & Ghosh, JMLR 2003; Dhillon et al., KDD 2004)
 Considering a partition as a distribution (the probability of one
node falling into one community), we can compute the
matching between the clustering result and the ground
truth
72
73
NMI-Example
 Partition a: [1, 1, 1, 2, 2, 2] (ground truth {1, 2, 3} and {4, 5, 6})
 Partition b: [1, 2, 1, 3, 3, 3] (clustering result {1, 3}, {2}, {4, 5, 6})
Community sizes n(a)_h: h=1: 3, h=2: 3
Community sizes n(b)_l: l=1: 2, l=2: 1, l=3: 3
Contingency table n_{h,l}:
         l=1   l=2   l=3
  h=1     2     1     0
  h=2     0     0     3
NMI = 0.8278
74
Reference: http://www.cse.ust.hk/~weikep/notes/NormalizedMI.m
contingency table or confusion matrix
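A small Python sketch that computes NMI with the sqrt normalization shown earlier (entropy and nmi are hypothetical helper names); it reproduces the 0.8278 value for this example.

```python
import numpy as np

def entropy(labels):
    labels = np.asarray(labels)
    probs = np.array([np.mean(labels == v) for v in np.unique(labels)])
    return -np.sum(probs * np.log(probs))

def nmi(a, b):
    """NMI(A, B) = I(A; B) / sqrt(H(A) * H(B)), computed from two label vectors."""
    a, b = np.asarray(a), np.asarray(b)
    mi = 0.0
    for h in np.unique(a):
        for l in np.unique(b):
            p_hl = np.mean((a == h) & (b == l))
            if p_hl > 0:
                mi += p_hl * np.log(p_hl / (np.mean(a == h) * np.mean(b == l)))
    return mi / np.sqrt(entropy(a) * entropy(b))

print(round(nmi([1, 1, 1, 2, 2, 2], [1, 2, 1, 3, 3, 3]), 4))   # 0.8278
```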
Accuracy of Pairwise Community Memberships
 Consider all the possible pairs of nodes and check whether they reside
in the same community
 An error occurs if
 Two nodes belonging to the same community are assigned to
different communities after clustering
 Two nodes belonging to different communities are assigned to the
same community
 Construct a contingency table or confusion matrix
75
Accuracy Example
Ground truth communities: {1, 2, 3} and {4, 5, 6}
Clustering result: {1, 3}, {2}, and {4, 5, 6}
Confusion matrix over node pairs:
                                 Ground Truth
                         C(vi) = C(vj)   C(vi) != C(vj)
Clustering  C(vi) = C(vj)       4               0
Result      C(vi) != C(vj)      2               9
Accuracy = (4 + 9) / (4 + 0 + 2 + 9) = 13/15
76
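A short sketch of the pairwise accuracy computation, assuming Python; pairwise_accuracy is a hypothetical helper, and the label dictionaries encode the ground truth and clustering result above.

```python
from itertools import combinations

def pairwise_accuracy(truth, pred):
    """Fraction of node pairs on which the two partitions agree
    (same-community vs. different-community)."""
    nodes = list(truth)
    agree = sum((truth[u] == truth[v]) == (pred[u] == pred[v])
                for u, v in combinations(nodes, 2))
    return agree / (len(nodes) * (len(nodes) - 1) / 2)

truth = {1: 'A', 2: 'A', 3: 'A', 4: 'B', 5: 'B', 6: 'B'}
pred  = {1: 1, 2: 2, 3: 1, 4: 3, 5: 3, 6: 3}
print(pairwise_accuracy(truth, pred))   # 13/15 ~ 0.867
```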
Evaluation using Semantics
 For networks with semantics
 Networks come with semantic or attribute information
of nodes or connections
 Human subjects can verify whether the extracted
communities are coherent
 Evaluation is qualitative
 It is also intuitive and helps understand a community
An animal
community
A health
community
77
Evaluation without Ground Truth
 For networks without ground truth or semantic
information
 This is the most common situation
 An option is to resort to cross-validation
 Extract communities from a (training) network
 Evaluate the quality of the community structure on a network
constructed from a different date or based on a related type of
interaction
 Quantitative evaluation functions
 Modularity (M.Newman. Modularity and community structure in
networks. PNAS 06.)
 Link prediction (the predicted network is compared with the true
network)
78
Book Available at
 Morgan & Claypool Publishers
 Amazon
If you have any comments,
please feel free to contact:
• Lei Tang, Yahoo! Labs,
ltang@yahoo-inc.com
• Huan Liu, ASU
huanliu@asu.edu
79