Vertex Centric Asynchronous Belief Propagation Algorithm for Large-Scale Graphs

Gabriel Gimenes
Hugo Gualdron
Jose F. Rodrigues-Jr
Instituto de Ciencias Matematicas e de Computacao
University of Sao Paulo - Sao Carlos
DamNet - 2016 ICDM Workshop, Barcelona, Spain
This work was financially supported by FAPESP 2014/25337-0

Outline
1 Introduction
2 Belief Propagation Algorithm
3 Methodology and Experiments
4 Conclusions

Context
Ubiquitous data generation
Information availability: pros and cons
Web 2.0 – users produce data, not only consume it
Relationships between elements
Facebook, Twitter, Amazon, GooglePlay, Email
Intuitive modelling: Graphs (Networks)

Problem
Analyzing large-scale networks – must be efficient and powerful
Some graphs (e.g., YahooWeb and Twitter) may not fit in memory
Naive processing: prohibitive
Alternative: distributed processing
complexity, infrastructure, cost
How to process in a single computational node?

Rationale
New approaches: taking advantage of multi-core architectures
Centralized → Decentralized
Vertex-centric processing techniques
Block-based processing
Asynchronous processing
Proposals: TurboGraph, GraphChi, X-Stream, MMap,
M-Flash, FlashGraph; Pregel, GraphLab, Giraph.

Vertex-centric paradigm
Vertex-centric model
procedure Graph_scan(Graph G)
    for i = 1 to |V| do
        set_e ← set of edges adjacent to V[i]
        V[i].value ← f(set_e)
        for each edge e in set_e do
            e.value ← g(V[i].value, e.value)
Outer loop
procedure Graph_processing
    while convergence criterion is not satisfied do
        Graph_scan(G)
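The scan and outer loop above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the functions `f` and `g`, the tolerance `tol`, the cap `max_scans`, and the toy path graph are all assumptions chosen so that the example is easy to follow (here `f` takes the maximum of the adjacent edge values and `g` copies the vertex value back to the edge).

```python
def graph_scan(vertex_values, edge_values, adjacency, f, g):
    """One in-place pass over all vertices (later vertices see fresh edge values)."""
    for v in range(len(vertex_values)):
        edge_ids = adjacency[v]
        vertex_values[v] = f([edge_values[e] for e in edge_ids])
        for e in edge_ids:
            edge_values[e] = g(vertex_values[v], edge_values[e])


def graph_processing(vertex_values, edge_values, adjacency, f, g,
                     tol=1e-9, max_scans=100):
    """Repeat scans until no vertex value changes by more than tol."""
    for scan in range(1, max_scans + 1):
        before = list(vertex_values)
        graph_scan(vertex_values, edge_values, adjacency, f, g)
        change = max(abs(a - b) for a, b in zip(before, vertex_values))
        if change < tol:
            return scan
    return max_scans


# Hypothetical toy input: path graph 0 - 1 - 2 with edge ids 0 and 1.
adjacency = {0: [0], 1: [0, 1], 2: [1]}
vertex_values = [0.0, 0.0, 0.0]
edge_values = [1.0, 5.0]

scans = graph_processing(
    vertex_values, edge_values, adjacency,
    f=lambda values: max(values, default=0.0),  # assumed vertex function
    g=lambda vertex_value, edge_value: vertex_value,  # assumed edge function
)
print(vertex_values, scans)
```

Because values are updated in place, vertex 2 already sees the value written by vertex 1 within the same scan, which is the asynchronous behaviour the later slides rely on.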

Algorithm
Belief propagation: a Bayesian inference method
Estimating the marginal probability distribution for
non-annotated nodes
Message passing: information travels from annotated to
unannotated nodes
Guilt-by-association, or "birds of a feather flock together"
Heterophily vs Homophily

Problem
Original algorithm proposed for trees (no loops)
Loopy BP (Murphy et al.): the algorithm generalized to graphs with loops
Problems with convergence and performance
Early applications in stereo-imaging and facial reconstruction

Evolution
Performance and scalability: distributed processing
Gonzalez et al. – inefficiencies of distributed BP
Kang et al. – algorithm relevance for anti-malware and fraud
detection applications
Gatterbauer et al. – linear approximation, convergence
guarantees and better performance

BP vs LinBP
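Belief Propagation
$b_s(i) = e_s(i) \prod_{u \in N(s)} m_{us}(i)$
$m_{st}(i) = \sum_{j=0}^{c-1} H_{st}(j, i)\, e_s(j) \prod_{u \in N(s) \setminus t} m_{us}(j)$
Linearized Belief Propagation
$\hat{b}_s(i) = \hat{e}_s(i) + \frac{1}{k} \sum_{u \in N(s)} \hat{m}_{us}(i)$
$\hat{m}_{st}(i) = k \sum_{j} \hat{H}_{st}(j, i)\, \hat{b}_s(j) - \sum_{j} \hat{H}_{st}(j, i)\, \hat{m}_{ts}(j)$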
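To make the linearized update concrete, the two LinBP equations can be iterated on a tiny graph. The sketch below is only an illustration under stated assumptions: the path graph 0 - 1 - 2, the two classes, the centered coupling matrix `H`, the explicit belief of node 0, and the synchronous update schedule are all hypothetical choices, not values from the slides.

```python
def beliefs_from_messages(explicit, neighbors, messages, k):
    """b_s(i) = e_s(i) + (1/k) * sum over neighbors u of m_us(i)."""
    return {
        s: [explicit[s][i] + sum(messages[(u, s)][i] for u in neighbors[s]) / k
            for i in range(k)]
        for s in neighbors
    }


def messages_from_beliefs(beliefs, neighbors, messages, H, k):
    """m_st(i) = k * sum_j H(j,i) b_s(j) - sum_j H(j,i) m_ts(j)."""
    new_messages = {}
    for s in neighbors:
        for t in neighbors[s]:
            new_messages[(s, t)] = [
                k * sum(H[j][i] * beliefs[s][j] for j in range(k))
                - sum(H[j][i] * messages[(t, s)][j] for j in range(k))
                for i in range(k)
            ]
    return new_messages


# Assumed toy setup: path graph 0 - 1 - 2, two classes, homophily coupling.
k = 2
H = [[0.1, -0.1], [-0.1, 0.1]]
neighbors = {0: [1], 1: [0, 2], 2: [1]}
explicit = {0: [0.1, -0.1], 1: [0.0, 0.0], 2: [0.0, 0.0]}
messages = {(s, t): [0.0] * k for s in neighbors for t in neighbors[s]}

for _ in range(5):  # synchronous iterations: beliefs first, then messages
    beliefs = beliefs_from_messages(explicit, neighbors, messages, k)
    messages = messages_from_beliefs(beliefs, neighbors, messages, H, k)
beliefs = beliefs_from_messages(explicit, neighbors, messages, k)

for node, belief in beliefs.items():
    print(node, belief)
```

On this tree-shaped example the second term of the message equation cancels the echo exactly: the message sent back from node 1 to node 0 stays at zero, so node 0 keeps its explicit belief while nodes 1 and 2 inherit a weaker preference for the same class.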

Proposal and contributions
Algorithm: change of paradigm, asynchronous parallel
vertex-centric processing
Convergence: better convergence speed (number of iterations)
Scalability: commodity computer

Our algorithm
VC-LinBP
procedure VC-LinBP(G(V, E), VExplicit, H, h, t)
    set H = h * H
    set H2 = H^2
    repeat
        for each vertex in V do
            Update(vertex)
    until t iterations or convergence achieved

Our algorithm
Update
procedure Update(vertex)
    set degree = 0
    for each class c in vertex do    // initializing vertex values for each class
        vertex.value(c) = 0
    for each incoming edge e to vertex do    // processing incoming messages
        degree += e.weight^2
        for each class cfrom do
            for each class cto do
                vertex.value(cto) += e.weight * e.value(cfrom) * H(cfrom, cto)
    if vertex is not explicit then    // echo cancellation of messages
        for each class cfrom do
            for each class cto do
                vertex.value(cto) -= degree * vertex.value(cfrom) * H2(cfrom, cto)
    else    // adding explicit value of the vertex
        vertex.value(c) += VExplicit(vertex)(c)
    for each outgoing edge e from vertex do    // sending messages to neighbors
        for each class c do
            e.value(c) = vertex.value(c)
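The VC-LinBP and Update procedures above can be sketched in plain Python as follows. This is a reading of the pseudocode, not the authors' GraphChi code. Several points are assumptions: `H2` is taken to be the matrix product of the scaled `H` with itself, the echo-cancellation step reads from a snapshot of the accumulated values so that the result does not depend on class order, convergence is measured as the largest change of any vertex value (threshold `tol`), and the toy graph and parameters are hypothetical.

```python
def mat_square(H):
    """Matrix product H * H for a small square matrix given as nested lists."""
    k = len(H)
    return [[sum(H[i][m] * H[m][j] for m in range(k)) for j in range(k)]
            for i in range(k)]


def vc_linbp(vertices, edges, explicit, H, h, t, tol=1e-9):
    """edges maps a directed pair (src, dst) to its weight."""
    k = len(H)
    H = [[h * x for x in row] for row in H]  # set H = h * H
    H2 = mat_square(H)                       # set H2 = H^2 (assumed matrix square)
    value = {v: [0.0] * k for v in vertices}
    message = {e: [0.0] * k for e in edges}
    incoming = {v: [e for e in edges if e[1] == v] for v in vertices}
    outgoing = {v: [e for e in edges if e[0] == v] for v in vertices}

    def update(v):
        degree = 0.0
        new = [0.0] * k
        for e in incoming[v]:  # processing incoming messages
            w = edges[e]
            degree += w ** 2
            for c_from in range(k):
                for c_to in range(k):
                    new[c_to] += w * message[e][c_from] * H[c_from][c_to]
        if v not in explicit:  # echo cancellation, read from a snapshot
            snapshot = list(new)
            for c_from in range(k):
                for c_to in range(k):
                    new[c_to] -= degree * snapshot[c_from] * H2[c_from][c_to]
        else:  # adding explicit value of the vertex
            for c in range(k):
                new[c] += explicit[v][c]
        change = max(abs(a - b) for a, b in zip(new, value[v]))
        value[v] = new
        for e in outgoing[v]:  # sending messages to neighbors, in place
            message[e] = list(new)
        return change

    for scan in range(1, t + 1):
        if max(update(v) for v in vertices) < tol:
            return value, scan
    return value, t


# Hypothetical toy input: undirected path 0 - 1 - 2 stored as directed edges.
vertices = [0, 1, 2]
edges = {(0, 1): 1.0, (1, 0): 1.0, (1, 2): 1.0, (2, 1): 1.0}
explicit = {0: [0.1, -0.1]}
H = [[1.0, -1.0], [-1.0, 1.0]]

beliefs, scans = vc_linbp(vertices, edges, explicit, H, h=0.1, t=50)
print(scans, beliefs)
```

Messages are written to the outgoing edges as soon as a vertex is updated, so a neighbor processed later in the same scan already consumes the fresh value. That in-place propagation is what distinguishes this schedule from a synchronous iteration.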

Experiments
Efficiency and efficacy
i7 CPU 8 cores, 16GB RAM, 240GB SSD
Comparison with LinBP
2 versions: single- and multi-threaded
Utilizing the GraphChi framework

Datasets
Generated with the Kronecker product method – SNAP
4 different networks
Graph    # Nodes      # Edges
1           59,049     1,048,576
2          177,147     4,194,304
3          531,441    16,777,216
4        1,594,323    67,108,864
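A quick arithmetic check shows that the node and edge counts in the table are exact powers of 3 and 4. That pattern is consistent with the k-th Kronecker power of a 3-node initiator with four nonzero entries, but the slides do not state the initiator, so this reading is an assumption rather than the authors' generator configuration.

```python
# Node and edge counts copied from the datasets table.
datasets = {
    1: (59_049, 1_048_576),
    2: (177_147, 4_194_304),
    3: (531_441, 16_777_216),
    4: (1_594_323, 67_108_864),
}


def exact_power(value, base):
    """Return k if value == base**k, otherwise None."""
    k = 0
    while value > 1 and value % base == 0:
        value //= base
        k += 1
    return k if value == 1 else None


powers = {
    graph: (exact_power(nodes, 3), exact_power(edges, 4))
    for graph, (nodes, edges) in datasets.items()
}
for graph, (node_power, edge_power) in powers.items():
    print(f"graph {graph}: nodes = 3^{node_power}, edges = 4^{edge_power}")
```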

Experiments
Coupling Matrix
         1          2          3
1    0.266667  -0.033333   0.366667
2    0.033333  -0.333333   0.366667
3   -0.233333   0.366667  -0.133333
3 classes, 5% randomly initialized (annotated)
Coupling matrix and initialization procedure based on LinBP’s
experimentation

Experiments - Validation
Validation
Graph    Top-belief agreement (%)
1        100
2        100
3         99
4         99
Disagreements are due to tie-break scenarios
Efficacy: as expected
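One plausible way to compute a top-belief agreement figure like the ones above is to compare, node by node, the class with the highest belief under each method. The sketch below assumes that definition. The belief vectors are made up for illustration, and the tie-break rule (lowest class index wins) is also an assumption.

```python
def top_class(belief):
    """Index of the largest belief; ties resolve to the lowest index."""
    return max(range(len(belief)), key=belief.__getitem__)


def top_belief_agreement(beliefs_a, beliefs_b):
    """Percentage of nodes whose top class is the same in both results."""
    matches = sum(
        top_class(beliefs_a[node]) == top_class(beliefs_b[node])
        for node in beliefs_a
    )
    return 100.0 * matches / len(beliefs_a)


# Hypothetical outputs of two methods for four nodes and three classes.
method_a = {
    0: [0.3, -0.1, -0.2],
    1: [0.1, 0.1, -0.2],        # exact tie between classes 0 and 1
    2: [-0.2, 0.3, -0.1],
    3: [-0.1, -0.2, 0.3],
}
method_b = {
    0: [0.3, -0.1, -0.2],
    1: [0.1, 0.1000001, -0.2],  # tiny numeric difference flips the top class
    2: [-0.2, 0.3, -0.1],
    3: [-0.1, -0.2, 0.3],
}

agreement = top_belief_agreement(method_a, method_b)
print(agreement)
```

Node 1 illustrates a tie-break scenario: the two results differ by a negligible amount, yet the reported top class changes, which lowers the agreement without indicating a real difference in the inferred beliefs.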

Experiments - Scalability
Runtime (sec)
Graph    LinBP-SQL    VC-LinBP-1thread    VC-LinBP-8threads
1            39.04                0.31                 0.23
2           179                   1.27                 0.75
3           826                   5.90                 3.15
4          5000                  34.62                18.69
Fixed number of iterations (5 iterations)
Only runtime is considered – excluding pre-processing time
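The speedups implied by the runtime table can be derived directly from its numbers. The short script below only divides the reported runtimes. It shows that the single-threaded version is roughly 126 to 144 times faster than LinBP-SQL, and that eight threads add a further factor of about 1.3 to 1.9 over one thread.

```python
# Runtimes in seconds, copied from the table:
# (LinBP-SQL, VC-LinBP-1thread, VC-LinBP-8threads)
runtimes = {
    1: (39.04, 0.31, 0.23),
    2: (179.0, 1.27, 0.75),
    3: (826.0, 5.90, 3.15),
    4: (5000.0, 34.62, 18.69),
}

speedups = {}
for graph, (sql, one_thread, eight_threads) in runtimes.items():
    speedups[graph] = (
        sql / one_thread,            # VC-LinBP (1 thread) vs LinBP-SQL
        sql / eight_threads,         # VC-LinBP (8 threads) vs LinBP-SQL
        one_thread / eight_threads,  # gain from multi-threading alone
    )

for graph, (vs_sql_1, vs_sql_8, threads_gain) in speedups.items():
    print(f"graph {graph}: {vs_sql_1:.0f}x, {vs_sql_8:.0f}x, {threads_gain:.2f}x")
```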

Experiments
[Figure (a) Scalability: runtime (sec) vs. number of edges, logarithmic axes; series VC_LinBP_1thread, VC_LinBP_8threads, LinBP]
[Figure (b) Convergence: number of iterations per dataset (1 to 4); series LinBP, VC_LinBP]
Elidan et al. – asynchronous version is at worst the same as synchronous

Future Work
In-memory implementation – performance comparison
Experiments with bigger datasets
Detailed tiebreak scenarios
Real-world dataset experiments – DBLP, Malware detection,
Image segmentation

Thank you!
Questions?
