SlideShare a Scribd company logo
Scalable Community Detection with the Louvain Algorithm
Navid Sedighpour
Dr. Alireza Bagheri
Advanced Algorithm Presentation
July 2016
Main Reference
2/ 36
What we will see...
• Introduction
• Background
• Parallel Louvain Algorithm
• Experimental Results
• Conclusions
3/ 36
Introduction
4/ 36
Introduction
 Community Detection is an important problem today.
 Community detection spans many research areas:
 Health Care
 Social Networks
 Systems Biology
 Power Grid
 Community Detection algorithms attempt to identify:
 Modules
 Their hierarchical organization
 There is no rigorous mathematical definition of community structure.
 Modularity is the most used best known function to quantify community structure
in graph.
5/ 36
Introduction
 Techniques used for modularity maximization:
 Greedy: applies different approaches to merge vertices into communities for
higher modularity.
 Simulated Annealing: adopts a probabilistic procedure for global optimization
on modularity.
 External Optimization: heuristic search procedure
 Spectral Optimization: uses the eigenvalues and eigenvectors of a special
matrix for modularity optimization.
 All these methods lead to poor result when they applied to large graphs with many
communities.
 Achieving strong scalability together with high-quality community detection is still
an open problem.
 Unfortunately, most of community detection algorithms are sequential.
6/ 36
The Louvain Algorithm
 Is a greedy algorithm
 Weighted graphs
 This algorithm has been widely utilized in many application domains because of:
 Its rapid convergence properties
 High modularity
 Hierarchal partitioning
7/ 36
First Step: First Level Partitions
1. puts all vertices of a graph in distinct communities, one per vertex.
2. For each vertex i, the algorithm performs two calculations:
1. Compute the modularity gain (∆Q) when putting vertex i in the community of
any neighbor j.
2. Pick the neighbor j that yields the largest gain in ∆Q .
3. This loop continues until no movement yields a gain.
8/ 36
Other Steps: hierarchal structure
 Partitions of the first step become super-vertices and the algorithm reconstructs
the graph.
 Two super-vertices are connected if there is at least one edge between vertices
of the corresponding partitions.
 The weight of the edge between the two super-vertices is the sum of the weights
from all edges between their corresponding partitions at the lower level.
 These two steps repeats until the communities become stable.
9/ 36
Contributions
 Parallelization
 Novel implementation strategy to store and process dynamic graphs
 Extensive experimental evolution
10/ 36
Background
11/ 36
Problem Definition
• A weighted directed graph G = (V; E):
• V — set of vertices
• E — set of edges
• when u, v ∈ V , an edge e(u, v) ∈ E has weight 𝑤 𝑢,𝑣
• The goal of community detection (in this paper) is to partition graph G into a set C
of disjoint communities 𝑐𝑖:
• ∪ 𝑐𝑖 = V ; ∀𝑐𝑖 ∈ C
• 𝑐𝑖 ∩ 𝑐𝑗 = ∅ ; ∀𝑐𝑖; 𝑐𝑗 ∈ C
12/ 36
Problem Definition
13/ 36
Modularity Gain
• Modularity optimization is an NP-Complete problem,
therefore the Louvain algorithm maximizes the modularity
gain, which is :
• W(u): the sum of the weights of the edges incident to
vertex u
• 𝑤 𝑢→𝑐 : the sum of the weights of the edges from vertex u
to vertices in community c :
14/ 36
Sequential Louvain Algorithm
15/ 36
Parallel Louvain Algorithm
16/ 36
Hash-Based Data Organization
• We use two distinct hash tables:
• In_Table: to store the incoming edges of each vertex, that is not changed during the
inner loop and keeps track of the original graph structure.
• Out_Table: for the outgoing edges, which is changed during each iteration
17/ 36
Hash-Based Data Organization
18/ 36
Hash-Based Data Organization
• Both hash-tables are hashed on edges
• The input key (x) is a function of a tuple (t1,t2):
•
(<< is bitwise left shift operator and | is bitwise OR operator)
• For In_Table, the tuple contains the source vertex u and the destination vertex i (i is
the vertex 0 or 2) of the edge being hashed ((u,i), ∀e(u, i) ∈ E).
• the bucket is a triple ((u,i), 𝑤 𝑢,𝑖), which includes the weight of the corresponding edges (𝑤 𝑢,𝑖)
• For Out Table, the tuple contains the source vertex i and the destination vertex v’s
community c for the corresponding edge ((i,c), ∀v∈c,e(i,v)∈E).
• The bucket in the Out Table is a triple ((i,c), 𝑤𝑖→𝑐), where 𝑤𝑖→𝑐 refers to the sum of weights
of all the edges from the vertex i to a specific community c because all these edges are
hashed to the same bucket in the Out Table
19/ 36
Hash-Based Data Organization
• In this paper, we have utilized Fibonacci hash:
• Hash Function:
• M is the size of the hash table
• W is 264 - 1
• φ is the golden ratio
20/ 36
A Novel Convergence Heuristic
 LFR is designed to generate graphs to test community detection algorithms and
mimic the properties of real-world, large-scale complex graphs, through tunable
parameters.
 There is an inverse exponential relationship between the movement of the vertices
and the number of iterations in the inner loop.
 Dynamical threshold : identifies the fraction of the vertices that need to be
updated during each iteration of the inner loop.
 The threshold decreases exponentially with the number of iterations.
𝜺
21/ 36
Regression analysis for the LFR benchmark
 Y: Power-low distribution
 β: Community size
 μ : the fraction of edges between vertices belonging to
different communities
 K: average degree of the graph
 X-axis: inner loop
 Y-axis: ε
22/ 36
Modularity gain threshold
• We translate ε to the modularity gain threshold Δǭ by sorting the Δ𝑄 𝑢 (u ∈ V) for
all the vertices and selecting the top ε vertices to update the community
membership.
• Once Δǭ is known, the vertices can update the community information in parallel.
23/ 36
Parallel Louvain Algorithm
24/ 36
Parallel Louvain Algorithm
 The algorithm starts by initializing all the data structures.
 At the very beginning, the In-Table contains all the in-edges information of the
vertices owned by each node/process and the Out-Table is empty.
 The algorithm starts with STATE-PROPAGATION(), which globally propagates the
community state information.
 Once all the messages have been delivered and hashed in place, Out Table is
initialized.
 REFINE() corresponds to the inner loop of the sequential algorithm and orchestrates
the vertex migration to different communities, based on the modularity gain.
 GRAPH-RECONSTRUCTION()): reconstruct the graph and prepare for the next
round.
 The algorithm halts when there is no further modularity improvement.
25/ 36
Community State Propagation
• All processes sequentially scan their In-Table and send messages to the destination
processes that own the corresponding vertices (v in the algorithm)
• The processes also receive messages from others and insert/update hashed edges
to the Out-Table
26/ 36
Refine
27/ 36
Refine
• It starts with initializing ĉ 𝑢and m 𝑢
• The Out-Table contains all the neighbors’ communities and the accumulated weights
to the communities from the vertices owned by that process.
• Each process starts scanning the hashed edges (u,c’) and then calculates the
modularity gain of putting vertex u to community c’ and updates the corresponding ˆ
ĉ 𝑢and m 𝑢if needed. After all the Out-Tables have been examined, each vertex has
the information on the best community to join.
• Then, updates the community information and the corresponding Σtot based on Δǭ ,
and computes the new modularity.
• The procedure terminates when no further vertex movement can increase the
modularity.
28/ 36
Graph Reconstruction
29/ 36
Graph Reconstruction
• The weights of the edges between the new vertices are given by the sum of the
weight of the edges between vertices in the corresponding two communities.
• Edges between vertices of the same community lead to self-loops in the new graph.
• Line 4,5 scan the Out-Table and send the aggregated edges information to the
owner process of the vertex (community)
• Upon receiving the messages, lines 7–11 finally prepare the In-Table as the input
graph for the next round of execution.
30/ 36
Experimental Results
31/ 36
Modularity
• X-axis: outer loop iteration
• Y-axis: modularity
• The basic parallel version
converges very slowly with a very
low modularity score.
• The parallel version with the
proposed heuristic is on par with
the original sequential algorithm
and, surprisingly, in one case, UK-
2005, obtains slightly higher
modularity.
32/ 36
Speedup
• 49.8x speedup obtained with UK-2005 on 64 nodes, is 5.6 times faster than the
result reported in previous paper, which examines a random sampling generated sub-
network with 31% edges of the original UK-2005 graph.
33/ 36
Execution Breakdown
34/ 36
Scalability
• The parallel algorithm achieves good scalability. The processing rate is proportional
to the number of nodes.
35/ 36
Conclusions
• We have introduced a parallel version of the Louvain algorithm
• we have introduced a novel hash-based strategy to efficiently store and process
dynamic graphs
• Our parallel algorithm preserves the same convergence, modularity and community
properties as the original sequential Louvain algorithm
• We have performed an extensive performance evaluation on a wide variety of real-
world social graphs and synthetic graphs
36/ 36
Scalable Community Detection with the Louvain Algorithm
Navid Sedighpour
Dr. Alireza Bagheri
Advanced Algorithm Presentation
July 2016
Thank You

More Related Content

What's hot

Introduction to Complex Networks
Introduction to Complex NetworksIntroduction to Complex Networks
Introduction to Complex Networks
Hossein A. (Saeed) Rahmani
 
Community Detection with Networkx
Community Detection with NetworkxCommunity Detection with Networkx
Community Detection with Networkx
Erika Fille Legara
 
Group and Community Detection in Social Networks
Group and Community Detection in Social NetworksGroup and Community Detection in Social Networks
Group and Community Detection in Social Networks
Kent State University
 
Introduction to Social Network Analysis
Introduction to Social Network AnalysisIntroduction to Social Network Analysis
Introduction to Social Network Analysis
Premsankar Chakkingal
 
Homophily and influence in social networks
Homophily and influence in social networksHomophily and influence in social networks
Homophily and influence in social networks
Nicola Barbieri
 
Clustering
ClusteringClustering
Clustering
LipikaSaha2
 
Vanishing & Exploding Gradients
Vanishing & Exploding GradientsVanishing & Exploding Gradients
Vanishing & Exploding Gradients
Siddharth Vij
 
Presentation on K-Means Clustering
Presentation on K-Means ClusteringPresentation on K-Means Clustering
Presentation on K-Means Clustering
Pabna University of Science & Technology
 
Hierarchical Clustering
Hierarchical ClusteringHierarchical Clustering
Hierarchical Clustering
Carlos Castillo (ChaTo)
 
Generative adversarial text to image synthesis
Generative adversarial text to image synthesisGenerative adversarial text to image synthesis
Generative adversarial text to image synthesis
Universitat Politècnica de Catalunya
 
Major issues in data mining
Major issues in data miningMajor issues in data mining
Major issues in data mining
Yashwant Rautela
 
B trees in Data Structure
B trees in Data StructureB trees in Data Structure
B trees in Data Structure
Anuj Modi
 
Graph Based Clustering
Graph Based ClusteringGraph Based Clustering
Graph Based Clustering
SSA KPI
 
High Dimensional Data Visualization using t-SNE
High Dimensional Data Visualization using t-SNEHigh Dimensional Data Visualization using t-SNE
High Dimensional Data Visualization using t-SNE
Kai-Wen Zhao
 
Divide and Conquer - Part 1
Divide and Conquer - Part 1Divide and Conquer - Part 1
Divide and Conquer - Part 1
Amrinder Arora
 
Capter10 cluster basic : Han & Kamber
Capter10 cluster basic : Han & KamberCapter10 cluster basic : Han & Kamber
Capter10 cluster basic : Han & Kamber
Houw Liong The
 
Social network analysis part ii
Social network analysis part iiSocial network analysis part ii
Social network analysis part ii
THomas Plotkowiak
 
Statistics and Data Mining
Statistics and  Data MiningStatistics and  Data Mining
Statistics and Data Mining
R A Akerkar
 
Data Structures - Lecture 10 [Graphs]
Data Structures - Lecture 10 [Graphs]Data Structures - Lecture 10 [Graphs]
Data Structures - Lecture 10 [Graphs]
Muhammad Hammad Waseem
 
Clique
Clique Clique
Clique
sk_klms
 

What's hot (20)

Introduction to Complex Networks
Introduction to Complex NetworksIntroduction to Complex Networks
Introduction to Complex Networks
 
Community Detection with Networkx
Community Detection with NetworkxCommunity Detection with Networkx
Community Detection with Networkx
 
Group and Community Detection in Social Networks
Group and Community Detection in Social NetworksGroup and Community Detection in Social Networks
Group and Community Detection in Social Networks
 
Introduction to Social Network Analysis
Introduction to Social Network AnalysisIntroduction to Social Network Analysis
Introduction to Social Network Analysis
 
Homophily and influence in social networks
Homophily and influence in social networksHomophily and influence in social networks
Homophily and influence in social networks
 
Clustering
ClusteringClustering
Clustering
 
Vanishing & Exploding Gradients
Vanishing & Exploding GradientsVanishing & Exploding Gradients
Vanishing & Exploding Gradients
 
Presentation on K-Means Clustering
Presentation on K-Means ClusteringPresentation on K-Means Clustering
Presentation on K-Means Clustering
 
Hierarchical Clustering
Hierarchical ClusteringHierarchical Clustering
Hierarchical Clustering
 
Generative adversarial text to image synthesis
Generative adversarial text to image synthesisGenerative adversarial text to image synthesis
Generative adversarial text to image synthesis
 
Major issues in data mining
Major issues in data miningMajor issues in data mining
Major issues in data mining
 
B trees in Data Structure
B trees in Data StructureB trees in Data Structure
B trees in Data Structure
 
Graph Based Clustering
Graph Based ClusteringGraph Based Clustering
Graph Based Clustering
 
High Dimensional Data Visualization using t-SNE
High Dimensional Data Visualization using t-SNEHigh Dimensional Data Visualization using t-SNE
High Dimensional Data Visualization using t-SNE
 
Divide and Conquer - Part 1
Divide and Conquer - Part 1Divide and Conquer - Part 1
Divide and Conquer - Part 1
 
Capter10 cluster basic : Han & Kamber
Capter10 cluster basic : Han & KamberCapter10 cluster basic : Han & Kamber
Capter10 cluster basic : Han & Kamber
 
Social network analysis part ii
Social network analysis part iiSocial network analysis part ii
Social network analysis part ii
 
Statistics and Data Mining
Statistics and  Data MiningStatistics and  Data Mining
Statistics and Data Mining
 
Data Structures - Lecture 10 [Graphs]
Data Structures - Lecture 10 [Graphs]Data Structures - Lecture 10 [Graphs]
Data Structures - Lecture 10 [Graphs]
 
Clique
Clique Clique
Clique
 

Viewers also liked

email transaction network analysis
email transaction network analysisemail transaction network analysis
email transaction network analysis
keithpjolley
 
Graph Community Detection Algorithm for Distributed Memory Parallel Computing...
Graph Community Detection Algorithm for Distributed Memory Parallel Computing...Graph Community Detection Algorithm for Distributed Memory Parallel Computing...
Graph Community Detection Algorithm for Distributed Memory Parallel Computing...
Alexander Pozdneev
 
DIMACS10: Parallel Community Detection for Massive Graphs
DIMACS10: Parallel Community Detection for Massive GraphsDIMACS10: Parallel Community Detection for Massive Graphs
DIMACS10: Parallel Community Detection for Massive Graphs
Jason Riedy
 
Community Detection in Brain Networks
Community Detection in Brain NetworksCommunity Detection in Brain Networks
Community Detection in Brain Networks
Manas Gaur
 
Learning and comparing multi-subject models of brain functional connecitivity
Learning and comparing multi-subject models of brain functional connecitivityLearning and comparing multi-subject models of brain functional connecitivity
Learning and comparing multi-subject models of brain functional connecitivity
Gael Varoquaux
 
Distributed computing environment
Distributed computing environmentDistributed computing environment
Distributed computing environment
Ravi Bhushan
 
Webinar: RDBMS to Graphs
Webinar: RDBMS to GraphsWebinar: RDBMS to Graphs
Webinar: RDBMS to Graphs
Neo4j
 

Viewers also liked (7)

email transaction network analysis
email transaction network analysisemail transaction network analysis
email transaction network analysis
 
Graph Community Detection Algorithm for Distributed Memory Parallel Computing...
Graph Community Detection Algorithm for Distributed Memory Parallel Computing...Graph Community Detection Algorithm for Distributed Memory Parallel Computing...
Graph Community Detection Algorithm for Distributed Memory Parallel Computing...
 
DIMACS10: Parallel Community Detection for Massive Graphs
DIMACS10: Parallel Community Detection for Massive GraphsDIMACS10: Parallel Community Detection for Massive Graphs
DIMACS10: Parallel Community Detection for Massive Graphs
 
Community Detection in Brain Networks
Community Detection in Brain NetworksCommunity Detection in Brain Networks
Community Detection in Brain Networks
 
Learning and comparing multi-subject models of brain functional connecitivity
Learning and comparing multi-subject models of brain functional connecitivityLearning and comparing multi-subject models of brain functional connecitivity
Learning and comparing multi-subject models of brain functional connecitivity
 
Distributed computing environment
Distributed computing environmentDistributed computing environment
Distributed computing environment
 
Webinar: RDBMS to Graphs
Webinar: RDBMS to GraphsWebinar: RDBMS to Graphs
Webinar: RDBMS to Graphs
 

Similar to Scalable community detection with the louvain algorithm

Composition of Clans for Solving Linear Systems on Parallel Architectures
Composition of Clans for Solving Linear Systems on Parallel ArchitecturesComposition of Clans for Solving Linear Systems on Parallel Architectures
Composition of Clans for Solving Linear Systems on Parallel Architectures
DmitryZaitsev5
 
Graph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraGraph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear Algebra
Jason Riedy
 
Scalable Static and Dynamic Community Detection Using Grappolo : NOTES
Scalable Static and Dynamic Community Detection Using Grappolo : NOTESScalable Static and Dynamic Community Detection Using Grappolo : NOTES
Scalable Static and Dynamic Community Detection Using Grappolo : NOTES
Subhajit Sahu
 
201907 AutoML and Neural Architecture Search
201907 AutoML and Neural Architecture Search201907 AutoML and Neural Architecture Search
201907 AutoML and Neural Architecture Search
DaeJin Kim
 
A Dynamic Algorithm for Local Community Detection in Graphs : NOTES
A Dynamic Algorithm for Local Community Detection in Graphs : NOTESA Dynamic Algorithm for Local Community Detection in Graphs : NOTES
A Dynamic Algorithm for Local Community Detection in Graphs : NOTES
Subhajit Sahu
 
09 placement
09 placement09 placement
09 placement
yogiramesh89
 
Community Detection on the GPU : NOTES
Community Detection on the GPU : NOTESCommunity Detection on the GPU : NOTES
Community Detection on the GPU : NOTES
Subhajit Sahu
 
Paper study: Learning to solve circuit sat
Paper study: Learning to solve circuit satPaper study: Learning to solve circuit sat
Paper study: Learning to solve circuit sat
ChenYiHuang5
 
Multiplex Networks: structure and dynamics
Multiplex Networks: structure and dynamicsMultiplex Networks: structure and dynamics
Multiplex Networks: structure and dynamics
Emanuele Cozzo
 
240115_Thanh_LabSeminar[Don't walk, skip! online learning of multi-scale netw...
240115_Thanh_LabSeminar[Don't walk, skip! online learning of multi-scale netw...240115_Thanh_LabSeminar[Don't walk, skip! online learning of multi-scale netw...
240115_Thanh_LabSeminar[Don't walk, skip! online learning of multi-scale netw...
thanhdowork
 
Random graph models
Random graph modelsRandom graph models
Random graph models
networksuw
 
Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018
Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018
Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018
Universitat Politècnica de Catalunya
 
NS-CUK Seminar: H.B.Kim, Review on "Sequential Recommendation with Graph Neu...
NS-CUK Seminar: H.B.Kim,  Review on "Sequential Recommendation with Graph Neu...NS-CUK Seminar: H.B.Kim,  Review on "Sequential Recommendation with Graph Neu...
NS-CUK Seminar: H.B.Kim, Review on "Sequential Recommendation with Graph Neu...
ssuser4b1f48
 
Fast Incremental Community Detection on Dynamic Graphs : NOTES
Fast Incremental Community Detection on Dynamic Graphs : NOTESFast Incremental Community Detection on Dynamic Graphs : NOTES
Fast Incremental Community Detection on Dynamic Graphs : NOTES
Subhajit Sahu
 
generalized_nbody_acs_2015_challacombe
generalized_nbody_acs_2015_challacombegeneralized_nbody_acs_2015_challacombe
generalized_nbody_acs_2015_challacombe
Matt Challacombe
 
N41049093
N41049093N41049093
N41049093
IJERA Editor
 
Chap8 slides
Chap8 slidesChap8 slides
Chap8 slides
BaliThorat1
 
Convolutional Neural Networks (DLAI D5L1 2017 UPC Deep Learning for Artificia...
Convolutional Neural Networks (DLAI D5L1 2017 UPC Deep Learning for Artificia...Convolutional Neural Networks (DLAI D5L1 2017 UPC Deep Learning for Artificia...
Convolutional Neural Networks (DLAI D5L1 2017 UPC Deep Learning for Artificia...
Universitat Politècnica de Catalunya
 
Quantum persistent k cores for community detection
Quantum persistent k cores for community detectionQuantum persistent k cores for community detection
Quantum persistent k cores for community detection
Colleen Farrelly
 
230727_HB_JointJournalClub.pptx
230727_HB_JointJournalClub.pptx230727_HB_JointJournalClub.pptx

Similar to Scalable community detection with the louvain algorithm (20)

Composition of Clans for Solving Linear Systems on Parallel Architectures
Composition of Clans for Solving Linear Systems on Parallel ArchitecturesComposition of Clans for Solving Linear Systems on Parallel Architectures
Composition of Clans for Solving Linear Systems on Parallel Architectures
 
Graph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraGraph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear Algebra
 
Scalable Static and Dynamic Community Detection Using Grappolo : NOTES
Scalable Static and Dynamic Community Detection Using Grappolo : NOTESScalable Static and Dynamic Community Detection Using Grappolo : NOTES
Scalable Static and Dynamic Community Detection Using Grappolo : NOTES
 
201907 AutoML and Neural Architecture Search
201907 AutoML and Neural Architecture Search201907 AutoML and Neural Architecture Search
201907 AutoML and Neural Architecture Search
 
A Dynamic Algorithm for Local Community Detection in Graphs : NOTES
A Dynamic Algorithm for Local Community Detection in Graphs : NOTESA Dynamic Algorithm for Local Community Detection in Graphs : NOTES
A Dynamic Algorithm for Local Community Detection in Graphs : NOTES
 
09 placement
09 placement09 placement
09 placement
 
Community Detection on the GPU : NOTES
Community Detection on the GPU : NOTESCommunity Detection on the GPU : NOTES
Community Detection on the GPU : NOTES
 
Paper study: Learning to solve circuit sat
Paper study: Learning to solve circuit satPaper study: Learning to solve circuit sat
Paper study: Learning to solve circuit sat
 
Multiplex Networks: structure and dynamics
Multiplex Networks: structure and dynamicsMultiplex Networks: structure and dynamics
Multiplex Networks: structure and dynamics
 
240115_Thanh_LabSeminar[Don't walk, skip! online learning of multi-scale netw...
240115_Thanh_LabSeminar[Don't walk, skip! online learning of multi-scale netw...240115_Thanh_LabSeminar[Don't walk, skip! online learning of multi-scale netw...
240115_Thanh_LabSeminar[Don't walk, skip! online learning of multi-scale netw...
 
Random graph models
Random graph modelsRandom graph models
Random graph models
 
Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018
Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018
Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018
 
NS-CUK Seminar: H.B.Kim, Review on "Sequential Recommendation with Graph Neu...
NS-CUK Seminar: H.B.Kim,  Review on "Sequential Recommendation with Graph Neu...NS-CUK Seminar: H.B.Kim,  Review on "Sequential Recommendation with Graph Neu...
NS-CUK Seminar: H.B.Kim, Review on "Sequential Recommendation with Graph Neu...
 
Fast Incremental Community Detection on Dynamic Graphs : NOTES
Fast Incremental Community Detection on Dynamic Graphs : NOTESFast Incremental Community Detection on Dynamic Graphs : NOTES
Fast Incremental Community Detection on Dynamic Graphs : NOTES
 
generalized_nbody_acs_2015_challacombe
generalized_nbody_acs_2015_challacombegeneralized_nbody_acs_2015_challacombe
generalized_nbody_acs_2015_challacombe
 
N41049093
N41049093N41049093
N41049093
 
Chap8 slides
Chap8 slidesChap8 slides
Chap8 slides
 
Convolutional Neural Networks (DLAI D5L1 2017 UPC Deep Learning for Artificia...
Convolutional Neural Networks (DLAI D5L1 2017 UPC Deep Learning for Artificia...Convolutional Neural Networks (DLAI D5L1 2017 UPC Deep Learning for Artificia...
Convolutional Neural Networks (DLAI D5L1 2017 UPC Deep Learning for Artificia...
 
Quantum persistent k cores for community detection
Quantum persistent k cores for community detectionQuantum persistent k cores for community detection
Quantum persistent k cores for community detection
 
230727_HB_JointJournalClub.pptx
230727_HB_JointJournalClub.pptx230727_HB_JointJournalClub.pptx
230727_HB_JointJournalClub.pptx
 

Recently uploaded

Main Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docxMain Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docx
adhitya5119
 
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
imrankhan141184
 
Chapter wise All Notes of First year Basic Civil Engineering.pptx
Chapter wise All Notes of First year Basic Civil Engineering.pptxChapter wise All Notes of First year Basic Civil Engineering.pptx
Chapter wise All Notes of First year Basic Civil Engineering.pptx
Denish Jangid
 
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
National Information Standards Organization (NISO)
 
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
PECB
 
How to Create a More Engaging and Human Online Learning Experience
How to Create a More Engaging and Human Online Learning Experience How to Create a More Engaging and Human Online Learning Experience
How to Create a More Engaging and Human Online Learning Experience
Wahiba Chair Training & Consulting
 
Digital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental DesignDigital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental Design
amberjdewit93
 
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptxPrésentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
siemaillard
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
PIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf IslamabadPIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf Islamabad
AyyanKhan40
 
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem studentsRHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
Himanshu Rai
 
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdfবাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
eBook.com.bd (প্রয়োজনীয় বাংলা বই)
 
The Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collectionThe Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collection
Israel Genealogy Research Association
 
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective UpskillingYour Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Excellence Foundation for South Sudan
 
The basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptxThe basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptx
heathfieldcps1
 
UGC NET Exam Paper 1- Unit 1:Teaching Aptitude
UGC NET Exam Paper 1- Unit 1:Teaching AptitudeUGC NET Exam Paper 1- Unit 1:Teaching Aptitude
UGC NET Exam Paper 1- Unit 1:Teaching Aptitude
S. Raj Kumar
 
BBR 2024 Summer Sessions Interview Training
BBR  2024 Summer Sessions Interview TrainingBBR  2024 Summer Sessions Interview Training
BBR 2024 Summer Sessions Interview Training
Katrina Pritchard
 
PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
Dr. Shivangi Singh Parihar
 
Film vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movieFilm vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movie
Nicholas Montgomery
 
Advanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docxAdvanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docx
adhitya5119
 

Recently uploaded (20)

Main Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docxMain Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docx
 
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
 
Chapter wise All Notes of First year Basic Civil Engineering.pptx
Chapter wise All Notes of First year Basic Civil Engineering.pptxChapter wise All Notes of First year Basic Civil Engineering.pptx
Chapter wise All Notes of First year Basic Civil Engineering.pptx
 
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
 
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
 
How to Create a More Engaging and Human Online Learning Experience
How to Create a More Engaging and Human Online Learning Experience How to Create a More Engaging and Human Online Learning Experience
How to Create a More Engaging and Human Online Learning Experience
 
Digital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental DesignDigital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental Design
 
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptxPrésentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
 
PIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf IslamabadPIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf Islamabad
 
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem studentsRHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
 
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdfবাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
 
The Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collectionThe Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collection
 
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective UpskillingYour Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective Upskilling
 
The basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptxThe basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptx
 
UGC NET Exam Paper 1- Unit 1:Teaching Aptitude
UGC NET Exam Paper 1- Unit 1:Teaching AptitudeUGC NET Exam Paper 1- Unit 1:Teaching Aptitude
UGC NET Exam Paper 1- Unit 1:Teaching Aptitude
 
BBR 2024 Summer Sessions Interview Training
BBR  2024 Summer Sessions Interview TrainingBBR  2024 Summer Sessions Interview Training
BBR 2024 Summer Sessions Interview Training
 
PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
 
Film vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movieFilm vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movie
 
Advanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docxAdvanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docx
 

Scalable community detection with the louvain algorithm

  • 1. Scalable Community Detection with the Louvain Algorithm Navid Sedighpour Dr. Alireza Bagheri Advanced Algorithm Presentation July 2016
  • 3. What we will see... • Introduction • Background • Parallel Louvain Algorithm • Experimental Results • Conclusions 3/ 36
  • 5. Introduction  Community Detection is an important problem today.  Community detection spans many research areas:  Health Care  Social Networks  Systems Biology  Power Grid  Community Detection algorithms attempt to identify:  Modules  Their hierarchical organization  There is no rigorous mathematical definition of community structure.  Modularity is the most used best known function to quantify community structure in graph. 5/ 36
  • 6. Introduction  Techniques used for modularity maximization:  Greedy: applies different approaches to merge vertices into communities for higher modularity.  Simulated Annealing: adopts a probabilistic procedure for global optimization on modularity.  External Optimization: heuristic search procedure  Spectral Optimization: uses the eigenvalues and eigenvectors of a special matrix for modularity optimization.  All these methods lead to poor result when they applied to large graphs with many communities.  Achieving strong scalability together with high-quality community detection is still an open problem.  Unfortunately, most of community detection algorithms are sequential. 6/ 36
  • 7. The Louvain Algorithm  Is a greedy algorithm  Weighted graphs  This algorithm has been widely utilized in many application domains because of:  Its rapid convergence properties  High modularity  Hierarchal partitioning 7/ 36
  • 8. First Step: First Level Partitions 1. puts all vertices of a graph in distinct communities, one per vertex. 2. For each vertex i, the algorithm performs two calculations: 1. Compute the modularity gain (∆Q) when putting vertex i in the community of any neighbor j. 2. Pick the neighbor j that yields the largest gain in ∆Q . 3. This loop continues until no movement yields a gain. 8/ 36
  • 9. Other Steps: hierarchal structure  Partitions of the first step become super-vertices and the algorithm reconstructs the graph.  Two super-vertices are connected if there is at least one edge between vertices of the corresponding partitions.  The weight of the edge between the two super-vertices is the sum of the weights from all edges between their corresponding partitions at the lower level.  These two steps repeats until the communities become stable. 9/ 36
  • 10. Contributions  Parallelization  Novel implementation strategy to store and process dynamic graphs  Extensive experimental evolution 10/ 36
  • 12. Problem Definition • A weighted directed graph G = (V; E): • V — set of vertices • E — set of edges • when u, v ∈ V , an edge e(u, v) ∈ E has weight 𝑤 𝑢,𝑣 • The goal of community detection (in this paper) is to partition graph G into a set C of disjoint communities 𝑐𝑖: • ∪ 𝑐𝑖 = V ; ∀𝑐𝑖 ∈ C • 𝑐𝑖 ∩ 𝑐𝑗 = ∅ ; ∀𝑐𝑖; 𝑐𝑗 ∈ C 12/ 36
  • 14. Modularity Gain • Modularity optimization is an NP-Complete problem, therefore the Louvain algorithm maximizes the modularity gain, which is : • W(u): the sum of the weights of the edges incident to vertex u • 𝑤 𝑢→𝑐 : the sum of the weights of the edges from vertex u to vertices in community c : 14/ 36
  • 17. Hash-Based Data Organization • We use two distinct hash tables: • In_Table: to store the incoming edges of each vertex, that is not changed during the inner loop and keeps track of the original graph structure. • Out_Table: for the outgoing edges, which is changed during each iteration 17/ 36
  • 19. Hash-Based Data Organization • Both hash-tables are hashed on edges • The input key (x) is a function of a tuple (t1,t2): • (<< is bitwise left shift operator and | is bitwise OR operator) • For In_Table, the tuple contains the source vertex u and the destination vertex i (i is the vertex 0 or 2) of the edge being hashed ((u,i), ∀e(u, i) ∈ E). • the bucket is a triple ((u,i), 𝑤 𝑢,𝑖), which includes the weight of the corresponding edges (𝑤 𝑢,𝑖) • For Out Table, the tuple contains the source vertex i and the destination vertex v’s community c for the corresponding edge ((i,c), ∀v∈c,e(i,v)∈E). • The bucket in the Out Table is a triple ((i,c), 𝑤𝑖→𝑐), where 𝑤𝑖→𝑐 refers to the sum of weights of all the edges from the vertex i to a specific community c because all these edges are hashed to the same bucket in the Out Table 19/ 36
  • 20. Hash-Based Data Organization • In this paper, we have utilized Fibonacci hash: • Hash Function: • M is the size of the hash table • W is 264 - 1 • φ is the golden ratio 20/ 36
  • 21. A Novel Convergence Heuristic  LFR is designed to generate graphs to test community detection algorithms and mimic the properties of real-world, large-scale complex graphs, through tunable parameters.  There is an inverse exponential relationship between the movement of the vertices and the number of iterations in the inner loop.  Dynamical threshold : identifies the fraction of the vertices that need to be updated during each iteration of the inner loop.  The threshold decreases exponentially with the number of iterations. 𝜺 21/ 36
  • 22. Regression analysis for the LFR benchmark  Y: Power-low distribution  β: Community size  μ : the fraction of edges between vertices belonging to different communities  K: average degree of the graph  X-axis: inner loop  Y-axis: ε 22/ 36
  • 23. Modularity gain threshold • We translate ε to the modularity gain threshold Δǭ by sorting the Δ𝑄 𝑢 (u ∈ V) for all the vertices and selecting the top ε vertices to update the community membership. • Once Δǭ is known, the vertices can update the community information in parallel. 23/ 36
  • 25. Parallel Louvain Algorithm  The algorithm starts by initializing all the data structures.  At the very beginning, the In-Table contains all the in-edges information of the vertices owned by each node/process and the Out-Table is empty.  The algorithm starts with STATE-PROPAGATION(), which globally propagates the community state information.  Once all the messages have been delivered and hashed in place, Out Table is initialized.  REFINE() corresponds to the inner loop of the sequential algorithm and orchestrates the vertex migration to different communities, based on the modularity gain.  GRAPH-RECONSTRUCTION()): reconstruct the graph and prepare for the next round.  The algorithm halts when there is no further modularity improvement. 25/ 36
  • 26. Community State Propagation • All processes sequentially scan their In-Table and send messages to the destination processes that own the corresponding vertices (v in the algorithm) • The processes also receive messages from others and insert/update hashed edges to the Out-Table 26/ 36
  • 28. Refine • It starts with initializing ĉ 𝑢and m 𝑢 • The Out-Table contains all the neighbors’ communities and the accumulated weights to the communities from the vertices owned by that process. • Each process starts scanning the hashed edges (u,c’) and then calculates the modularity gain of putting vertex u to community c’ and updates the corresponding ˆ ĉ 𝑢and m 𝑢if needed. After all the Out-Tables have been examined, each vertex has the information on the best community to join. • Then, updates the community information and the corresponding Σtot based on Δǭ , and computes the new modularity. • The procedure terminates when no further vertex movement can increase the modularity. 28/ 36
  • 30. Graph Reconstruction • The weights of the edges between the new vertices are given by the sum of the weight of the edges between vertices in the corresponding two communities. • Edges between vertices of the same community lead to self-loops in the new graph. • Line 4,5 scan the Out-Table and send the aggregated edges information to the owner process of the vertex (community) • Upon receiving the messages, lines 7–11 finally prepare the In-Table as the input graph for the next round of execution. 30/ 36
  • 32. Modularity • X-axis: outer loop iteration • Y-axis: modularity • The basic parallel version converges very slowly with a very low modularity score. • The parallel version with the proposed heuristic is on par with the original sequential algorithm and, surprisingly, in one case, UK- 2005, obtains slightly higher modularity. 32/ 36
  • 33. Speedup • 49.8x speedup obtained with UK-2005 on 64 nodes, is 5.6 times faster than the result reported in previous paper, which examines a random sampling generated sub- network with 31% edges of the original UK-2005 graph. 33/ 36
  • 35. Scalability • The parallel algorithm achieves good scalability. The processing rate is proportional to the number of nodes. 35/ 36
  • 36. Conclusions • We have introduced a parallel version of the Louvain algorithm • we have introduced a novel hash-based strategy to efficiently store and process dynamic graphs • Our parallel algorithm preserves the same convergence, modularity and community properties as the original sequential Louvain algorithm • We have performed an extensive performance evaluation on a wide variety of real- world social graphs and synthetic graphs 36/ 36
  • 37. Scalable Community Detection with the Louvain Algorithm Navid Sedighpour Dr. Alireza Bagheri Advanced Algorithm Presentation July 2016 Thank You