1. Community Detection in Social Networks
Using Random Walk on Graphs
Manish Seal Manojit Chakraborty Sayan Hazra
Heritage Institute of Technology, Kolkata
Dept. of Computer Science and Engineering
May 11, 2018
c Project Group B4 (HIT-K) Community Detection May 11, 2018 1 / 20
2. Social Networks
Complex networks are everywhere. They crop up wherever there are interactions between actors.
Phenomena          Agent                 Network
Venereal disease   Pathogens             Sexual network
Research paper     Scientists            Citation network
Rumor spreading    Information, memes    Communication network
Computer viruses   Digital viruses       Internet network
Bedbugs            Parasitic insects     Hotel-traveler network
Malaria            Plasmodium            Mosquito-human network
Table: Different agents and corresponding networks
3. Citation and Email Network
4. Network Representation
Networks portray the interactions between different actors. Graphs give us a valuable tool to
process and handle networks.
Actors or individuals are nodes in the graph.
If there is interaction between two nodes, there is an edge between them.
The links can have weights or intensities signifying connection strength.
The links can be directed, as in the web graph: there is a directed link from node (page) A to node (page) B if there is a hyperlink to B from A.
Figure: Networks
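As a minimal sketch of the representation described above, a graph can be held as adjacency collections in Python. The helper names `build_undirected` and `build_directed_weighted` are illustrative assumptions, not part of the project code.

```python
def build_undirected(edges):
    """Adjacency sets for an unweighted, undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def build_directed_weighted(edges):
    """Adjacency dicts for a directed graph: out[u][v] is the weight of u -> v."""
    out = {}
    for u, v, w in edges:
        out.setdefault(u, {})[v] = w
        out.setdefault(v, {})  # make sure the target node exists as a key
    return out


# Two actors that interact share an (undirected) edge.
friends = build_undirected([("A", "B"), ("B", "C")])

# A hyperlink from page A to page B gives a directed link A -> B only.
web = build_directed_weighted([("A", "B", 1.0)])
```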
5. Community Structure
Community
High concentrations of edges within special groups of vertices, and low concentrations
between these groups. This feature of real networks is called community structure.
Applications
Clustering Web clients that have similar interests and are geographically near each other improves the performance of services.
Identifying clusters of customers with similar interests in purchase networks of online retailers enables efficient recommendation systems.
Figure: Communities in a Graph
6. Community Detection Algorithms
There are several approaches to community detection in a network. A comprehensive overview
of the methods can be found in [1]. The two main models are described here.
Null Model
Compares some measure of connectivity within groups of nodes with the expected value in
a proper null model [2-4].
Communities are identified as the sets of nodes for which the connectivity deviates the
most from the null model. This is the approach of modularity [2], which the commonly
used Louvain method [3] implements.
Flow Model
Operates on the dynamics on the network.
Communities consist of nodes among which flow persists for a long time once entered.
The Map Equation [5-6] is a flow based method.
7. Our Project Inspiration
The following paper, published in 2008 by M. Rosvall and C. Bergstrom, laid the groundwork for
our project [5].
Maps of information flow reveal community structure in complex networks: it
uses random walks and Huffman coding to build a map equation for detecting
communities within a graph. The partition of the network with the shortest description length gives
the best community structure. The map equation is given by:
$$L(M) = q\,H(q) + \sum_{i=1}^{m} P_i\,H(P_i)$$
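where L(M) represents the description length for module partition M,
q represents the rate at which the index codebook is used,
H(q) represents the frequency-weighted average length of codewords in the index codebook,
P_i represents the rate at which the codebook of module i is used, and
H(P_i) represents the frequency-weighted average length of codewords in the codebook of module i.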
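The description length can be evaluated with a few lines of Python once the random-walk statistics are known. The sketch below follows the formulation in [5] and assumes the per-module exit rates and the node visit rates have already been computed. The names `entropy`, `description_length`, `exit_rate` and `visit_rate` are illustrative, not taken from the project.

```python
from math import log2


def entropy(probs):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)


def description_length(exit_rate, visit_rate):
    """Map equation L(M) for a partition into m modules.

    exit_rate[i]  : per-step probability that the walker leaves module i
    visit_rate[i] : list of per-step visit probabilities of the nodes in module i
    """
    # Index codebook: used at total rate q, one codeword per module.
    q = sum(exit_rate)
    index_term = q * entropy([qi / q for qi in exit_rate]) if q > 0 else 0.0

    # Module codebooks: one codeword per node plus one exit codeword.
    module_term = 0.0
    for qi, visits in zip(exit_rate, visit_rate):
        p_i = qi + sum(visits)  # rate at which module codebook i is used
        if p_i > 0:
            module_term += p_i * entropy([qi / p_i] + [p / p_i for p in visits])
    return index_term + module_term


# One module holding four equally visited nodes and no exits:
# each node needs a 2-bit codeword, so L(M) = 2 bits per step.
print(description_length([0.0], [[0.25, 0.25, 0.25, 0.25]]))
```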
8. Jaccard Weight
Jaccard Similarity
The Jaccard similarity (J) of two nodes a, b ∈ G.V is given by
$$J_{a,b} = \frac{|G.Adj(a) \cap G.Adj(b)|}{|G.Adj(a) \cup G.Adj(b)|}$$
where G.Adj(a) denotes the set of neighbours of node a in graph G.
We take the unweighted, undirected graph G(V, E) and compute $J_{a,b}$ for every edge (a, b) ∈ G.E. Then, for a node
u ∈ G.V, we calculate the normalised weight $N_{u,v}$ of its neighbour v as
$$N_{u,v} = \frac{J_{u,v}}{\sum_{x \in G.Adj(u)} J_{u,x}}$$
which makes the graph directed because, in general, $N_{u,v} \neq N_{v,u}$.
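The two formulas above can be sketched in Python as follows. The graph is assumed to be stored as a dict of neighbour sets. The function name `jaccard_weights` and the choice to assign weight 0 when a node's Jaccard values sum to zero (as happens for pendant nodes) are assumptions made for this illustration.

```python
def jaccard_weights(adj):
    """Return (J, N) for an undirected graph given as {node: set(neighbours)}.

    J[(u, v)] is the Jaccard similarity of the neighbourhoods of u and v.
    N[(u, v)] is J[(u, v)] normalised over all neighbours of u.
    """
    J = {}
    for u in adj:
        for v in adj[u]:
            union = adj[u] | adj[v]  # never empty: u and v are neighbours
            J[(u, v)] = len(adj[u] & adj[v]) / len(union)

    N = {}
    for u in adj:
        total = sum(J[(u, x)] for x in adj[u])
        for v in adj[u]:
            # Assumption: if no neighbour of u shares a neighbour with u,
            # the weight is set to 0 instead of dividing by zero.
            N[(u, v)] = J[(u, v)] / total if total > 0 else 0.0
    return J, N


# Four nodes, every pair connected except 0 and 3.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}}
J, N = jaccard_weights(adj)
print(J[(0, 1)], N[(0, 1)], N[(1, 0)])  # J is symmetric, N is not
```

In this small graph $J_{0,1} = 0.25$, while $N_{0,1} = 0.5$ and $N_{1,0} = 0.25$, which illustrates why the normalised graph is directed.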
9. Random Walk and Community Creation
Random Walk
We start the random walk on the graph G from a node a chosen uniformly at random from G.V.
While at node u, the next node is decided by a biased coin toss: with probability τ it is one of the
neighbours v ∈ G.Adj(u), and with probability 1 − τ it is a node v chosen uniformly at
random from G.V.
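A minimal sketch of such a walk is given here. It assumes that a neighbour is drawn in proportion to the directed edge weights and that only edges actually traversed (not teleport jumps) are counted. The function name, the `weight[(u, v)]` layout and the default τ = 0.85 are assumptions for illustration.

```python
import random
from collections import Counter


def random_walk_edge_freq(adj, weight, steps, tau=0.85, seed=0):
    """Count how often each undirected edge is traversed by the walk.

    adj    : {node: set(neighbours)}
    weight : {(u, v): directed weight of moving from u to v}
    tau    : probability of following an edge; 1 - tau is the teleport probability
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    freq = Counter()
    u = rng.choice(nodes)  # start node chosen uniformly at random
    for _ in range(steps):
        nbrs = sorted(adj[u])
        w = [weight[(u, v)] for v in nbrs]
        if rng.random() < tau and sum(w) > 0:
            v = rng.choices(nbrs, weights=w)[0]
            freq[frozenset((u, v))] += 1
            u = v
        else:
            # Teleport (also used when u has no usable outgoing weight).
            u = rng.choice(nodes)
    return freq


# Two triangles joined by the bridge 2-3, with uniform weights.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
weight = {(u, v): 1 / len(adj[u]) for u in adj for v in adj[u]}
freq = random_walk_edge_freq(adj, weight, steps=1000)
```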
Creating Cover
The edges are sorted by traversal frequency into the list FreqList and taken one at a time, joining their endpoints using UNION-FIND, until
we reach the end of the list or the frequency drops to zero.
Each connected component of the forest created this way is one community of the cover.
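The cover construction can be sketched with a small union-find structure. The function name `build_cover` and the `{(u, v): count}` layout of the frequency table are assumptions for this illustration.

```python
def build_cover(nodes, freq):
    """Join edges in non-increasing frequency order; return the components.

    nodes : iterable of all nodes
    freq  : {(u, v): number of times the edge was traversed}
    """
    parent = {u: u for u in nodes}

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    ordered = sorted(freq.items(), key=lambda item: item[1], reverse=True)
    for (u, v), count in ordered:
        if count <= 0:  # stop once the frequency drops to zero
            break
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    groups = {}
    for u in parent:
        groups.setdefault(find(u), set()).add(u)
    return list(groups.values())


freq = {(0, 1): 5, (1, 2): 3, (3, 4): 4, (4, 5): 2, (2, 3): 0}
print(build_cover(range(6), freq))  # the never-traversed bridge 2-3 is not joined
```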
10. Basic Structure of work
Input: Unweighted undirected graph G(V, E)
Compute Jaccard values for every edge e ∈ G.E.
Normalize the values for every node A ∈ V as follows:
$$\text{DirectedEdgeWeight}(A, B) = \frac{\text{JaccardValueOfEdge}(A, B)}{\sum_{x \in \text{Neighbor}(A)} \text{JaccardValueOfEdge}(A, x)}$$
Do a random walk on the graph with these weights. Set the teleportation probability to 0.15, as in the
PageRank algorithm.
Sort the edges by their traversal frequency in non-ascending order.
Keep joining the edges, in list order, into sets using the Union-Find algorithm.
Pendant nodes are added later on.
The sets are returned as the final output.
Output: A set of disjoint sets of vertices that together cover every v ∈ G.V
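The slides do not spell out how pendant nodes are added. One plausible reading, sketched here purely as an assumption, is that each degree-1 node joins the community of its only neighbour after the cover has been built. The function name `attach_pendants` is hypothetical.

```python
def attach_pendants(adj, cover):
    """Assumed post-processing step: put each pendant node into the
    community of its single neighbour.

    adj   : {node: set(neighbours)}
    cover : list of disjoint node sets produced by the union-find step
    """
    label = {}
    for index, group in enumerate(cover):
        for u in group:
            label[u] = index

    for u, nbrs in adj.items():
        if len(nbrs) == 1:  # pendant node
            (v,) = nbrs
            label[u] = label[v]

    merged = {}
    for u, index in label.items():
        merged.setdefault(index, set()).add(u)
    return list(merged.values())


# Node 3 hangs off the triangle 0-1-2 and starts out as its own set.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(attach_pendants(adj, [{0, 1, 2}, {3}]))
```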
11. Modifications
Converging the edge weights
After computing $N_{a,b}$ and $N_{b,a}$ for every edge (a, b) in G, we calculate another component S as
$$S_{a,b} = N_{a,b} + N_{b,a}, \quad \forall\, (a, b) \in G.E$$
and then we renormalize the summed values for every node a ∈ G.V to get N' as
$$N'_{a,b} = \frac{S_{a,b}}{\sum_{i \in G.Adj(a)} S_{a,i}}$$
and update the weight of edge (a, b) in G.
We keep repeating this update until
$$|N'_{a,b} - N_{a,b}| < T$$
for every edge, where T is the tolerance limit.
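A sketch of this iteration is given below. It assumes the sum in the denominator runs over the neighbours of a, so that the outgoing weights of every node sum to 1, and it adds an iteration cap as a safeguard. The names `converge_weights`, `tol` and `max_iter` are assumptions.

```python
def converge_weights(adj, N, tol=1e-6, max_iter=1000):
    """Repeat the symmetrise-and-renormalise update until the weights settle.

    adj : {node: set(neighbours)}
    N   : {(a, b): directed normalised weight}, defined for both directions
    """
    N = dict(N)
    for _ in range(max_iter):
        # S is symmetric: S[(a, b)] == S[(b, a)].
        S = {(a, b): N[(a, b)] + N[(b, a)] for a in adj for b in adj[a]}
        updated = {}
        for a in adj:
            total = sum(S[(a, i)] for i in adj[a])
            for b in adj[a]:
                updated[(a, b)] = S[(a, b)] / total if total > 0 else 0.0
        change = max((abs(updated[e] - N[e]) for e in updated), default=0.0)
        N = updated
        if change < tol:  # every edge moved by less than the tolerance
            break
    return N


adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}}
N0 = {(0, 1): 0.5, (0, 2): 0.5,
      (1, 0): 0.25, (1, 2): 0.5, (1, 3): 0.25,
      (2, 0): 0.25, (2, 1): 0.5, (2, 3): 0.25,
      (3, 1): 0.5, (3, 2): 0.5}
N = converge_weights(adj, N0)
```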
12. The Multi-Armed Bandit Problem
The multi-armed bandit problem is a classic reinforcement learning example where we
are given a slot machine with n arms (bandits) with each arm having its own probability
distribution of success. Pulling any one of the arms gives you a stochastic REWARD of
either R=+1 for success, or R=0 for failure.
Our objective is to pull the arms one-by-one in sequence such that we maximize our total
reward collected.
This problem is a popular kind of exploration-exploitation dilemma as agents do not
know which arm gives what reward.
If the reward for pulling arm $a_i$ at the t-th step is $r_{a_i,t}$ and we have T arm pulls, then our
job is to maximise the total reward, i.e. $\sum_{t=1}^{T} r_{a_i,t}$.
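To make the exploration-exploitation trade-off concrete, the sketch below simulates Bernoulli arms and plays them with an epsilon-greedy rule. Epsilon-greedy is only one common heuristic, chosen here for illustration. The slides do not state which bandit strategy the project uses, and the success probabilities are made up.

```python
import random


def epsilon_greedy(success_prob, pulls, epsilon=0.1, seed=0):
    """Play `pulls` rounds on Bernoulli arms; return (total reward, pull counts)."""
    rng = random.Random(seed)
    n = len(success_prob)
    counts = [0] * n    # how often each arm was pulled
    means = [0.0] * n   # running average reward of each arm
    total = 0
    for _ in range(pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(n)                        # explore
        else:
            arm = max(range(n), key=lambda i: means[i])   # exploit
        reward = 1 if rng.random() < success_prob[arm] else 0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
        total += reward
    return total, counts


total, counts = epsilon_greedy([0.2, 0.5, 0.8], pulls=1000)
print(total, counts)
```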
13. Modified Algorithm
Input
Unweighted undirected graph G(V, E)
Implementation
Compute Jaccard values for every edge e ∈ G.E.
Normalize the values for every node A ∈ V as follows. For B ∈ Γ(A):
$$\text{Directed\_Edge\_Weight}(A, B) = \frac{\text{JaccardValueOfEdge}(A, B)}{\sum_{x \in \Gamma(A)} \text{JaccardValueOfEdge}(A, x)}$$
Initialize the reward for each edge as 0.
Maintain a list of edges, of size no greater than the threshold value, ordered by
frequency in non-increasing order.
Set the walk length to M log(M).
14. Algorithm Continued
Implementation
Choose an initial random node u from the graph.
For i = 0 to walk_length
    Teleport with the teleportation probability
    Otherwise (with 1 − that probability):
        reward = −1
        while reward < 0
            Go to a vertex v, v ∈ Γ(u)
            reward = edge_reward(u, v)
        Increase edge_frequency of edge (u, v) by 1.
        Form the community structure with edges from the edge frequency list (in non-increasing order).
        Assign ∆Q as a reward or regret for the edge (u, v).
Output
A set of disjoint sets of vertices covering every v ∈ G.V
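The loop above might be sketched as follows. This is an interpretation under several assumptions, not the project's implementation: Q is taken to be Newman modularity, neighbours are drawn in proportion to the directed weights, `top_k` stands in for the list-size threshold, and `max_tries` caps the inner retry loop so it cannot run forever. All function and parameter names are hypothetical.

```python
import random


def modularity(adj, label):
    """Newman modularity Q of the partition `label` on undirected graph `adj`."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    inside, degree = {}, {}
    for u, nbrs in adj.items():
        c = label[u]
        degree[c] = degree.get(c, 0) + len(nbrs)
        inside[c] = inside.get(c, 0) + sum(1 for v in nbrs if label[v] == c)
    return sum(inside[c] / two_m - (degree[c] / two_m) ** 2 for c in degree)


def communities(adj, edges):
    """Union-find over `edges`; returns {node: community root}."""
    parent = {u: u for u in adj}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return {u: find(u) for u in adj}


def bandit_walk(adj, weight, steps, teleport=0.15, top_k=None, max_tries=20, seed=0):
    """Random walk whose edge choices are filtered by per-edge rewards (delta Q)."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    top_k = top_k or len(nodes)            # assumed list-size threshold
    freq, reward = {}, {}                  # missing reward counts as 0
    label = {u: u for u in nodes}          # start from singleton communities
    q_prev = modularity(adj, label)
    u = rng.choice(nodes)
    for _ in range(steps):
        nbrs = sorted(adj[u])
        w = [weight[u][x] for x in nbrs]
        if rng.random() < teleport or sum(w) <= 0:
            u = rng.choice(nodes)
            continue
        for _ in range(max_tries):         # retry while the edge reward is negative
            v = rng.choices(nbrs, weights=w)[0]
            if reward.get(frozenset((u, v)), 0.0) >= 0:
                break
        edge = frozenset((u, v))
        freq[edge] = freq.get(edge, 0) + 1
        top = sorted(freq, key=freq.get, reverse=True)[:top_k]
        label = communities(adj, [tuple(e) for e in top])
        q = modularity(adj, label)
        reward[edge] = q - q_prev          # delta Q acts as reward or regret
        q_prev = q
        u = v
    return label, q_prev


adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
weight = {u: {v: 1 / len(adj[u]) for v in adj[u]} for u in adj}
label, q = bandit_walk(adj, weight, steps=200)
```

Recomputing the communities and Q after every step keeps the sketch short but is slow; an incremental update would be needed for graphs of realistic size.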
15. Results and Visualization-Football Network
Figure: Using Simple Jaccard
Figure: Using Current Algorithm
16. Results and Visualization-Karate Network
Figure: Using Simple Jaccard
Figure: Using Current Algorithm
17. Performance of our algorithm
A comparative view of our algorithm against Louvain and CNM in terms of modularity and time
is given here:
Network       Louvain                   CNM                       Our Algorithm
              Modularity(C)   T(Sec)    Modularity(C)   T(Sec)    Modularity(C)   Avg T(Sec)
Karate Club   0.415           0         0.38            0         0.419789        1.02
Dolphin       0.518           0         0.492           0         0.525869        1.12
Football      0.604           0         0.57            0         0.6045695       1.38
Enron         0.596           0.38      0.49            362       0.619371        53.91
GrQc          0.847           0         0.79            4         0.858353        107.0318
18. Future Plans
Ensemble methods like bagging and boosting can be used to train a large number of weak classifiers from different random walks to generate one good community cover.
Multi-armed bandit heuristics can be applied in a better way to choose edges at the time of agglomeration of clusters or communities.
Unsupervised learning techniques like association rule mining can be used to generate intermediate covers from a given social network.
19. Bibliography
[1] Fortunato, S. Community detection in graphs. Physics Reports 486, 75-174 (2010).
[2] Newman, M. E. & Girvan, M. Finding and evaluating community structure in networks. Physical Review E 69, 026113 (2004).
[3] Blondel, V. D., Guillaume, J. L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities in large networks. Journal of Statistical Mechanics (2008).
[4] Lancichinetti, A., Radicchi, F., Ramasco, J. & Fortunato, S. Finding statistically significant communities in networks. PLoS ONE 6, e18961 (2011).
[5] Rosvall, M. & Bergstrom, C. T. Maps of information flow reveal community structure in complex networks. PNAS 105 (4), 1118-1123 (2008).
[6] Rosvall, M., Axelsson, D. & Bergstrom, C. T. The map equation. Eur. Phys. J. Special Topics 178, 13-23 (2009).
20. Thank you