With the exponential growth in the availability of data, the techniques for analyzing it are also
maturing. An emerging trend is to model entities and their interactions as a social network.
Extracting communities from these networks provides useful insight into the latent structure of
the underlying network as well as the dynamics of how the network evolves.
The modern science of networks has brought significant advances to our understanding of
complex systems. One of the most relevant features of graphs representing real systems is
community structure, or clustering, i.e. the organization of vertices in clusters, with many edges
joining vertices of the same cluster and few edges joining vertices of different clusters. Such
clusters can be considered fairly independent groups within a graph, whose members play a similar role.
Action and content based Community Detection in Social Networks
1. Action and Content Based Community Detection in Social Networks
Prabhsimran Singh Baweja
Prakhar Sharma
Ritesh Modi
Vaishali Pal
Mentored By: Prateek Mehta
2. Graph
A graph is a collection of objects in which some pairs of objects are
connected by links.
Mathematically –
G = (V, E)
V – Vertices
E - Edges
3. Motivation
Social network analysis focuses on mining hidden semantics in a setting
involving interacting agents. Collaboration between agents indicates similar
behavior.
Our motivation is to use edge attribute weights along with the links to find
communities.
Entities are modeled as vertices, and edges capture the relationships between
them.
4. Community
A community is a collection of objects or people sharing the same
interests or having the same characteristics.
E.g., people who like jazz music might belong to one community
and people who like folk might belong to another.
5. Community Detection
Community detection is the task of extracting densely knit groups within the network.
It is an unsupervised learning problem, addressed using analysis of linkage, node attributes,
etc.
Given: G = (V, E)
Output: C_1, C_2, \ldots, C_k such that
$$C_i \cap C_j = \emptyset \quad \forall i \neq j, \qquad \bigcup_{1 \le i \le k} C_i = V$$
Communities represent a coarse-grained view of the network and can be mapped to the
functional units of the network.
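The output condition above (pairwise disjoint communities that together cover V) can be checked mechanically. The following is a minimal sketch; the function name `is_valid_partition` and the toy vertex set are illustrative assumptions, not part of the original work.

```python
def is_valid_partition(vertices, communities):
    """Return True if the communities are pairwise disjoint and their union is V."""
    seen = set()
    for community in communities:
        members = set(community)
        if seen & members:
            # Some vertex already appeared in an earlier community: Ci ∩ Cj is not empty.
            return False
        seen |= members
    # The union of all communities must equal the vertex set.
    return seen == set(vertices)


# Hypothetical toy example.
V = {"a", "b", "c", "d", "e"}
valid = is_valid_partition(V, [{"a", "b"}, {"c", "d", "e"}])
overlapping = is_valid_partition(V, [{"a", "b"}, {"b", "c", "d", "e"}])
incomplete = is_valid_partition(V, [{"a", "b"}, {"c", "d"}])
```

Only the first call satisfies both conditions; the second has an overlapping vertex and the third leaves a vertex uncovered.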
6. Community Structures in Real World
[Figure panels: Internet; U.S. Football Network; Power Grid Network; Books Network]
7. Modeling Communities as Graphs
The notion of communities is often defined on a graph structure, G
= (V, E), representing a set of objects V and their linkages E.
Given a graph, a community is defined as a collection of nodes
that are more densely connected to each other than to the
other nodes in the network.
9. State-of-the-art
Modularity
A measure of how dense the connections are between nodes of the same
module and how sparse they are between different modules.
Higher modularity indicates well-defined, compact communities.
Modularity Maximization
A(i,j): observed number of edges between i and j (counted for pairs in the same community)
KiKj / 2m: expected number of edges between i and j if edges are placed randomly,
where Ki is the degree of node i and m is the total number of edges
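The two terms above are the ingredients of the standard Newman modularity, Q = (1/2m) Σ_ij [A(i,j) − KiKj/2m] δ(ci, cj), where δ is 1 when i and j are in the same community and 0 otherwise. The slide's own formula did not survive in this transcript, so the sketch below assumes that standard definition. The adjacency representation (a dict of dicts) and the toy graph are illustrative assumptions.

```python
def modularity(adj, community_of):
    """Newman modularity for an undirected, optionally weighted graph.

    adj: dict mapping node -> dict of neighbour -> edge weight (symmetric).
    community_of: dict mapping node -> community label.
    """
    degree = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    two_m = sum(degree.values())  # each edge is counted from both endpoints
    q = 0.0
    for i in adj:
        for j in adj:
            if community_of[i] == community_of[j]:
                observed = adj[i].get(j, 0.0)             # A(i,j)
                expected = degree[i] * degree[j] / two_m  # KiKj / 2m
                q += observed - expected
    return q / two_m


# Hypothetical toy graph: two triangles joined by a single edge (2-3).
adj = {
    0: {1: 1.0, 2: 1.0},
    1: {0: 1.0, 2: 1.0},
    2: {0: 1.0, 1: 1.0, 3: 1.0},
    3: {2: 1.0, 4: 1.0, 5: 1.0},
    4: {3: 1.0, 5: 1.0},
    5: {3: 1.0, 4: 1.0},
}
two_triangles = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
single_block = {u: "A" for u in adj}

q_split = modularity(adj, two_triangles)   # 5/14, about 0.357
q_single = modularity(adj, single_block)   # 0.0
```

Splitting the graph into its two triangles scores higher than putting every node in one community, which matches the intuition that higher modularity means better-separated groups.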
10. Modularity Maximization
Efficient Solution – Louvain Algorithm
Initially, each node belongs to its own community.
We go through each node and move it to a neighbour's
community whenever the move increases modularity.
This is repeated until modularity cannot be increased further.
Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte and Etienne Lefebvre: Fast
unfolding of communities in large networks
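The node-moving steps described above can be sketched as follows. This is a deliberately naive illustration: it recomputes modularity from scratch for every candidate move, whereas the published algorithm uses an incremental gain formula. The function names and the toy graph are assumptions made for illustration.

```python
def modularity(adj, community_of):
    """Newman modularity; adj is a symmetric dict of dicts of edge weights."""
    degree = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    two_m = sum(degree.values())
    q = 0.0
    for i in adj:
        for j in adj:
            if community_of[i] == community_of[j]:
                q += adj[i].get(j, 0.0) - degree[i] * degree[j] / two_m
    return q / two_m


def local_moving(adj):
    """First phase of a Louvain-style pass: greedy node moves until no gain."""
    community_of = {u: u for u in adj}  # each node starts in its own community
    improved = True
    while improved:
        improved = False
        for node in adj:
            original = community_of[node]
            best_label = original
            best_q = modularity(adj, community_of)
            for nbr in adj[node]:
                candidate = community_of[nbr]
                if candidate == original:
                    continue
                community_of[node] = candidate  # tentatively move the node
                q = modularity(adj, community_of)
                if q > best_q + 1e-12:          # accept only strict improvements
                    best_q, best_label = q, candidate
            community_of[node] = best_label     # keep the best move (or stay put)
            if best_label != original:
                improved = True
    return community_of


# Hypothetical toy graph: two triangles joined by a single edge (2-3).
adj = {
    0: {1: 1.0, 2: 1.0},
    1: {0: 1.0, 2: 1.0},
    2: {0: 1.0, 1: 1.0, 3: 1.0},
    3: {2: 1.0, 4: 1.0, 5: 1.0},
    4: {3: 1.0, 5: 1.0},
    5: {3: 1.0, 4: 1.0},
}
labels = local_moving(adj)
```

On this toy graph the pass groups each triangle into its own community. Because every accepted move strictly increases modularity and the number of partitions is finite, the loop terminates.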
11. • Communities are then folded into single nodes. Edge weights between community
nodes are defined by the number of inter-community edges.
• Folding ensures a rapid decrease in the number of nodes that need to be
examined and thus enables large-scale application of the method.
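The folding step can be sketched as below, assuming the same dict-of-dicts adjacency used for illustration elsewhere. The helper names `build_adjacency` and `fold` are hypothetical. Edges inside a community are kept as a self-loop on the folded node, which is the usual convention so that the total edge weight is preserved.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build a symmetric dict-of-dicts adjacency from (u, v, weight) triples."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w
    return dict(adj)


def fold(adj, community_of):
    """Collapse each community into one node, summing edge weights between communities."""
    folded = defaultdict(lambda: defaultdict(float))
    for u, nbrs in adj.items():
        cu = community_of[u]
        for v, w in nbrs.items():
            cv = community_of[v]
            # Intra-community edges (cu == cv) accumulate on a self-loop and are
            # seen from both endpoints, so they contribute twice their weight.
            folded[cu][cv] += w
    return {c: dict(nbrs) for c, nbrs in folded.items()}


# Hypothetical toy graph: two triangles joined by a single edge (2-3).
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
         (2, 3, 1.0)]
adj = build_adjacency(edges)
community_of = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
folded = fold(adj, community_of)
```

Here six nodes fold into two, connected by an edge of weight 1 (the single inter-community edge), which illustrates why the number of nodes to examine shrinks quickly from one pass to the next.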
12. Dataset
Flickr image metadata
Number of Images = 268649
Number of Authors (who posted at least 1 image) = 58522
Total number of Tags = 4932402
Number of Unique Groups = 203466
Number of Unique Galleries = 67859
13. Dataset Description
Graph 1: edges formed between authors who used the same tag(s)
|V| = 58522, |E| = 1491950
Characteristics (each pair of authors):
• Cosine Similarity between tags used
• Group Contribution and Popularity Distribution
• Gallery Contribution and Popularity Distribution
Graph 2: edges formed between authors who share comments
|V| = 589461, |E| = 6012634
Characteristics (each pair of authors):
• Jaccard Similarity for comments shared
• Group Contribution and Popularity Distribution
• Gallery Contribution and Popularity Distribution
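The two similarity measures named above can be sketched as follows. The slides do not say how tags or comments were encoded, so this sketch assumes tag-frequency vectors for cosine similarity and plain sets of items for the Jaccard index. The function names and sample data are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(tags_a, tags_b):
    """Cosine similarity between two authors' tag-frequency vectors."""
    counts_a, counts_b = Counter(tags_a), Counter(tags_b)
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an author with no tags is similar to nobody
    return dot / (norm_a * norm_b)


def jaccard_similarity(items_a, items_b):
    """Jaccard index: size of the intersection over size of the union."""
    set_a, set_b = set(items_a), set(items_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical tag lists for two authors.
author_1 = ["jazz", "jazz", "live"]
author_2 = ["jazz", "folk"]
cos = cosine_similarity(author_1, author_2)   # 2 / sqrt(10), about 0.632
jac = jaccard_similarity(author_1, author_2)  # 1 / 3
```

Cosine similarity takes into account how often each tag is used, while the Jaccard index only looks at whether an item is present, which is one plausible reason the two weightings could produce different community structures.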
14. Dataset Analysis
• Tag Distribution
• Average Number of Tags used by an Author per photo
15. • Photo Distribution
• Number of Authors versus number of photos posted by them
• Logarithmic scale for Y-Axis values
16. • Group Distribution
• Number of Authors versus number of groups they posted photos in.
• Logarithmic scale for Y-Axis values
17. • Gallery Distribution
• Number of Authors versus number of galleries they posted photos in.
• Logarithmic scale for Y-Axis values
19. Group Contribution
P (A1, G1) = No. of Photos by A1 in G1 / Total photos in G1
P (A2, G1) = No. of Photos by A2 in G1 / Total photos in G1
Group Popularity
P (A1, G1) = No. of Photos by A1 in G1 / Total photos of A1
Gallery Contribution
P (A1, G1) = No. of Photos by A1 in G1 / Total photos in G1
Gallery Popularity
P (A1, G1) = No. of Photos by A1 in G1 / Total photos of A1
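The contribution and popularity ratios above can be computed from a list of (author, group) postings, as in the sketch below. The input layout and the function name are assumptions made for illustration. In particular, the sketch approximates "total photos of A1" by the number of postings by A1, which may differ from how the original counts were obtained. The same function applies to galleries.

```python
from collections import Counter


def contribution_and_popularity(postings):
    """Compute both ratios for every (author, group) pair.

    postings: iterable of (author, group) pairs, one per photo posted to a group.
    Returns two dicts keyed by (author, group).
    """
    postings = list(postings)
    per_pair = Counter(postings)
    per_group = Counter(group for _, group in postings)
    per_author = Counter(author for author, _ in postings)
    # Contribution: share of the group's photos that came from this author.
    contribution = {
        (a, g): n / per_group[g] for (a, g), n in per_pair.items()
    }
    # Popularity: share of the author's photos that went to this group.
    popularity = {
        (a, g): n / per_author[a] for (a, g), n in per_pair.items()
    }
    return contribution, popularity


# Hypothetical postings: A1 posts twice in G1 and once in G2; A2 posts twice in G1.
postings = [("A1", "G1"), ("A1", "G1"), ("A2", "G1"), ("A2", "G1"), ("A1", "G2")]
contribution, popularity = contribution_and_popularity(postings)
```

In this example A1 supplies 2 of the 4 photos in G1 (contribution 0.5), while G1 receives 2 of A1's 3 photos (popularity 2/3).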
20. Results
Graph: Modularity, Number of Communities
• Edges formed between authors who used same tag(s) (Weighted) (Cosine Similarity): 0.6432, 80
• Edges formed between authors who used same tag(s) (Weighted) (Jaccard Index): 0.5723, 1904
• Edges formed between authors who used same tag(s) (Unweighted): 0.6092, 7
• Edges formed between authors who share comments (Weighted): 0.4306, 1092
• Edges formed between authors who share comments (Unweighted): 0.3372, 1087
21. Conclusion
When we incorporate textual content, i.e. hash tags and
their context, using Cosine Similarity, modularity improves
(0.6432 versus 0.6092 for the unweighted tag graph). It also
results in more compact communities.
In the second graph, the edge weight is based only on the
count of comments shared. The greater the number
of comments shared between two authors, the less the
22. Future Work
After successfully identifying communities, we can also find the
most influential author in each community. With this
information, someone posting new images can use
the tags used by that author or even mention the author. Since
the author is the most influential person in the community, the image is likely to get
more hits.