The shortest path is not always a straight line

Leveraging semi-metricity in large-scale graph analysis
Vasiliki Kalavri (kalavri@kth.se), KTH Royal Institute of Technology
Tiago Simas (tiago.simas@telefonica.com), Telefonica Research
Dionysios Logothetis (dionysios@fb.com), Facebook

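[Figure: Alice, Bob, and Max in a social graph whose edges are weighted by likes, e.g. 42 likes and 3 likes]
Weighted graphs capture relationship strength:
• distance
• similarity
• social proximity
• rating
• preference
and drive analyses such as:
• influential nodes
• optimal propagation paths
• communities
• recommendations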

Sparsification techniques reduce the graph size and still give exact or good approximate results: for a graph G and its sparsified version G', f(G) ≈ f(G').

THE METRIC BACKBONE
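Reduces the graph size while maintaining relevant structure
The minimum subgraph of a weighted graph that preserves the shortest paths of the original graph
[Figure: an example graph on vertices A, B, C, D, E with edge weights 2, 3, 10, 4, 2, 1 (left) and its metric backbone, which keeps only the edges with weights 2, 3, 2, 1 (right)]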
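The brute-force way to get the backbone is the one the later slides try to avoid: solve APSP and keep only the edges whose weight equals the shortest-path distance between their endpoints. Below is a minimal Python sketch using Floyd-Warshall. The weights of AB, BC, CD and DE are an assumption, since the figure's layout did not survive extraction. They were picked to agree with the semi-metricity annotations later in the deck, where CE (weight 4) and AD (weight 10) are the semi-metric edges.

```python
INF = float("inf")

# Example graph. The weights of AB, BC, CD and DE are ASSUMED (figure lost in
# extraction); CE=4 and AD=10 are the semi-metric edges described in the deck.
EDGES = {("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 1,
         ("D", "E"): 2, ("C", "E"): 4, ("A", "D"): 10}


def all_pairs_shortest_paths(edges):
    """Floyd-Warshall on an undirected weighted graph; returns dist[u][v]."""
    nodes = sorted({n for edge in edges for n in edge})
    dist = {u: {v: (0 if u == v else INF) for v in nodes} for u in nodes}
    for (u, v), w in edges.items():
        dist[u][v] = dist[v][u] = min(dist[u][v], w)
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def metric_backbone(edges):
    """Keep exactly the edges whose weight equals the shortest-path distance."""
    dist = all_pairs_shortest_paths(edges)
    return {(u, v): w for (u, v), w in edges.items() if w == dist[u][v]}


print(sorted(metric_backbone(EDGES)))  # [('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'E')]
```

Floyd-Warshall needs O(N²) memory and O(N³) time, which is why this approach does not scale to the large graphs discussed later.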

WHAT CAN WE USE IT FOR?
• Exact computations
  • any algorithm that depends on the shortest paths
  • reachability, connectivity
  • betweenness centrality, closeness centrality
• Approximation
  • PageRank, random walks
  • eigenvector centrality
  • community detection, clustering
Improves community detection modularity and recommender systems accuracy
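Because the backbone keeps every shortest-path distance, shortest-path measures computed on it match those on the full graph exactly. The sketch below checks this for closeness centrality, taken here as (n - 1) divided by the sum of distances, on the example graph with assumed weights for AB, BC, CD and DE; its backbone drops the semi-metric edges CE and AD.

```python
import heapq
from collections import defaultdict

# Example graph; the weights of AB, BC, CD and DE are ASSUMED.
GRAPH_EDGES = {("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 1,
               ("D", "E"): 2, ("C", "E"): 4, ("A", "D"): 10}
# Its metric backbone under these weights: the semi-metric CE and AD are dropped.
BACKBONE_EDGES = {e: w for e, w in GRAPH_EDGES.items()
                  if e not in {("C", "E"), ("A", "D")}}


def adjacency(edges):
    adj = defaultdict(list)
    for (u, v), w in edges.items():
        adj[u].append((v, w))
        adj[v].append((u, w))
    return adj


def dijkstra(adj, source):
    """Single-source shortest-path distances for non-negative weights."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def closeness(edges):
    """Closeness centrality (n - 1) / sum of distances, for a connected graph."""
    adj = adjacency(edges)
    n = len(adj)
    return {v: (n - 1) / sum(dijkstra(adj, v).values()) for v in list(adj)}


print(closeness(GRAPH_EDGES) == closeness(BACKBONE_EDGES))  # True
```

The approximate uses (PageRank, random walks, community detection) carry no such guarantee, since they also depend on edges that lie on no shortest path.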

IMPACT ON LARGE-SCALE SYSTEMS
• Graph Databases
  • fewer edges => smaller path search space
• Batch Graph Processing
  • CPU and memory requirements depend on #messages
  • #messages proportional to #edges
  • fewer edges => improved analysis performance
• Graph Compression
  • fewer edges => storage reduction
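The link between #messages and #edges shows up in a toy simulation of a synchronous vertex-centric SSSP. It is written in plain Python, not against Giraph's API: every vertex whose distance improves sends one message along each incident edge. The example weights for AB, BC, CD and DE are assumed, and the backbone drops CE and AD.

```python
from collections import defaultdict

INF = float("inf")

# Example graph; the weights of AB, BC, CD and DE are ASSUMED.
GRAPH_EDGES = {("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 1,
               ("D", "E"): 2, ("C", "E"): 4, ("A", "D"): 10}
BACKBONE_EDGES = {e: w for e, w in GRAPH_EDGES.items()
                  if e not in {("C", "E"), ("A", "D")}}


def adjacency(edges):
    adj = defaultdict(list)
    for (u, v), w in edges.items():
        adj[u].append((v, w))
        adj[v].append((u, w))
    return adj


def pregel_sssp(edges, source):
    """Simulate a synchronous vertex-centric SSSP; return (distances, #messages)."""
    adj = adjacency(edges)
    dist = {v: INF for v in adj}
    inbox = {source: [0]}  # the source starts with distance 0
    total_messages = 0
    while inbox:  # one iteration per superstep
        outbox = defaultdict(list)
        for v, msgs in inbox.items():
            candidate = min(msgs)
            if candidate < dist[v]:  # value improved: propagate it
                dist[v] = candidate
                for u, w in adj[v]:  # one message per incident edge
                    outbox[u].append(candidate + w)
                    total_messages += 1
        inbox = outbox
    return dist, total_messages


full_dist, full_msgs = pregel_sssp(GRAPH_EDGES, "A")
bb_dist, bb_msgs = pregel_sssp(BACKBONE_EDGES, "A")
print(full_dist == bb_dist, full_msgs, bb_msgs)  # True 19 8
```

Both runs compute the same distances from A, but the run on the backbone sends 8 messages instead of 19.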

SEMI-METRICITY
In a weighted graph, an edge is semi-metric if there exists an indirect path between its endpoints that is shorter than the edge itself.
[Figure: the example graph on vertices A, B, C, D, E with edge weights 2, 3, 10, 4, 2, 1]
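The definition translates directly into a test: drop the edge, compute the shortest remaining path between its endpoints, and compare it with the edge weight. Below is a minimal Dijkstra-based sketch on the example graph. The weights of AB, BC, CD and DE are assumed, and the function names are illustrative.

```python
import heapq
from collections import defaultdict

# Example graph; the weights of AB, BC, CD and DE are ASSUMED.
EDGES = {("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 1,
         ("D", "E"): 2, ("C", "E"): 4, ("A", "D"): 10}


def indirect_distance(edges, u, v):
    """Dijkstra from u to v on the graph with the direct edge u-v removed."""
    adj = defaultdict(list)
    for (a, b), w in edges.items():
        if {a, b} != {u, v}:
            adj[a].append((b, w))
            adj[b].append((a, w))
    dist = {u: 0}
    heap = [(0, u)]
    while heap:
        d, x = heapq.heappop(heap)
        if x == v:
            return d  # the first pop of v carries its shortest distance
        if d > dist[x]:
            continue  # stale heap entry
        for y, w in adj[x]:
            if d + w < dist.get(y, float("inf")):
                dist[y] = d + w
                heapq.heappush(heap, (d + w, y))
    return float("inf")


def is_semi_metric(edges, u, v):
    """An edge is semi-metric if a strictly shorter indirect path exists."""
    weight = edges[(u, v)] if (u, v) in edges else edges[(v, u)]
    return indirect_distance(edges, u, v) < weight


print([e for e in EDGES if is_semi_metric(EDGES, *e)])  # [('C', 'E'), ('A', 'D')]
```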

SEMI-METRICITY
In the example graph:
• CE is 1st-order semi-metric: C-D-E is a shorter 2-hop path
• AD is 2nd-order semi-metric: A-B-C-D is a shorter 3-hop path
• AB, BC, CD, DE are metric
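The order of semi-metricity can be computed with a hop-bounded Bellman-Ford relaxation. After h rounds, best[x] is the shortest u-x path with at most h hops. An edge is k-th order semi-metric when the first round that beats its weight is h = k + 1. This is an illustrative sketch, not the deck's algorithm. It uses a hypothetical function name and the example graph with assumed weights for AB, BC, CD and DE.

```python
INF = float("inf")

# Example graph; the weights of AB, BC, CD and DE are ASSUMED.
EDGES = {("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 1,
         ("D", "E"): 2, ("C", "E"): 4, ("A", "D"): 10}


def semi_metric_order(edges, u, v):
    """Return k if edge u-v is k-th order semi-metric, or None if it is metric.

    Round h gives the shortest u-x path with at most h hops (direct edge
    excluded); the first h that beats w(u, v) means k = h - 1.
    """
    weight = edges[(u, v)] if (u, v) in edges else edges[(v, u)]
    others = [(a, b, w) for (a, b), w in edges.items() if {a, b} != {u, v}]
    nodes = {n for edge in edges for n in edge}
    best = {x: INF for x in nodes}
    best[u] = 0
    for hops in range(1, len(nodes)):  # simple paths have fewer than |V| hops
        new = dict(best)
        for a, b, w in others:  # relax from the previous round only
            new[b] = min(new[b], best[a] + w)
            new[a] = min(new[a], best[b] + w)
        best = new
        if best[v] < weight:
            return hops - 1
    return None


print({e: semi_metric_order(EDGES, *e) for e in EDGES})
```

For these weights, CE comes out as order 1, AD as order 2, and the four remaining edges as None (metric), matching the annotations above.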

BACKBONE CALCULATION
• Calculating the backbone:
  • find all semi-metric edges: 1 BFS per edge?
  • compute APSP and store O(N²) paths
Can we calculate or approximate the backbone without solving APSP?

ORDER OF SEMI-METRICITY
Most semi-metric edges are 1st-order semi-metric

A 3-PHASE BACKBONE ALGORITHM
1. Find 1st-order semi-metric edges: only look at triangles! Scalable & practical for large graphs.

EXAMPLE
Phase 1
[Figure: the example graph before and after Phase 1; edge CE (weight 4), which closes the triangle C-D-E, is removed as 1st-order semi-metric, leaving the edges with weights 2, 3, 10, 2, 1]
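Phase 1 only needs triangles. For every vertex x and every pair of its neighbors u, v that are themselves adjacent, the edge u-v is 1st-order semi-metric if w(u, x) + w(x, v) < w(u, v). The sequential Python sketch below applies this rule to the example graph with assumed weights for AB, BC, CD and DE. It is not the distributed Okapi code.

```python
from collections import defaultdict
from itertools import combinations

# Example graph; the weights of AB, BC, CD and DE are ASSUMED.
EDGES = {("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 1,
         ("D", "E"): 2, ("C", "E"): 4, ("A", "D"): 10}


def first_order_semi_metric(edges):
    """Edges (u, v) closed by a triangle u-x-v with w(u,x) + w(x,v) < w(u,v)."""
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        adj[u][v] = w
        adj[v][u] = w
    flagged = set()
    for x in list(adj):  # x is the middle vertex of the 2-hop path
        for u, v in combinations(sorted(adj[x]), 2):
            if v in adj[u] and adj[u][x] + adj[x][v] < adj[u][v]:
                flagged.add((u, v))  # u < v because the neighbors are sorted
    return flagged


semi_metric = first_order_semi_metric(EDGES)
after_phase1 = {e: w for e, w in EDGES.items() if tuple(sorted(e)) not in semi_metric}
print(semi_metric)  # {('C', 'E')}
```

Each vertex only inspects pairs of its own neighbors, so the work is local. That locality is what lets this phase scale.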

2. Identify metric edges in 2-hop paths. After Phase 1, most semi-metric edges have been removed.

EXAMPLE
Phase 2
[Figure: four edges are labeled M (metric); edge AD (weight 10) remains unlabeled (?)]
The lowest-weight edge of every vertex is metric: if (u, v) is the lowest-weight edge of u, any indirect path from u to v must begin with another edge of u, which weighs at least as much as (u, v), and continue with at least one more positive-weight edge, so it would have larger weight.
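The lowest-weight rule needs only a per-vertex minimum. The sketch below labels as metric every edge that is the lightest edge, ties included, of either endpoint. It assumes strictly positive weights and covers only this rule, not any other 2-hop checks Phase 2 may perform. It runs on the example graph after Phase 1, with assumed weights for AB, BC, CD and DE.

```python
# Example graph after Phase 1 (CE removed); the weights are ASSUMED.
AFTER_PHASE1 = {("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 1,
                ("D", "E"): 2, ("A", "D"): 10}


def lowest_weight_metric_edges(edges):
    """Label as metric every edge that is the lightest edge of one of its endpoints.

    Assumes strictly positive weights: any indirect u-v path starts with an edge
    of u weighing at least w(u, v) and adds at least one more positive edge.
    """
    lightest = {}
    for (u, v), w in edges.items():
        for x in (u, v):
            lightest[x] = min(w, lightest.get(x, float("inf")))
    return {(u, v) for (u, v), w in edges.items()
            if w == lightest[u] or w == lightest[v]}


metric = lowest_weight_metric_edges(AFTER_PHASE1)
unlabeled = set(AFTER_PHASE1) - metric
print(sorted(metric))  # [('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'E')]
print(unlabeled)       # {('A', 'D')}
```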

3. Run a BFS for the remaining unlabeled edges (1%-9% of all edges).

EXAMPLE
Phase 3
[Figure: a BFS for the unlabeled edge AD; the four metric edges are labeled M]
The BFS explores only paths whose distance is still shorter than the weight of the edge being tested. If the BFS arrives at the target, the edge is semi-metric.
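Phase 3 can be sketched as a level-synchronous, BFS-style search from one endpoint that abandons any path once its length reaches the weight of the edge being tested. The sketch is a sequential stand-in, not the distributed implementation. Its function name is hypothetical, and it uses the example graph after Phase 1 with assumed weights for AB, BC, CD and DE.

```python
from collections import defaultdict

INF = float("inf")

# Example graph after Phase 1 (CE removed); the weights are ASSUMED.
AFTER_PHASE1 = {("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 1,
                ("D", "E"): 2, ("A", "D"): 10}


def phase3_is_semi_metric(edges, u, v):
    """BFS-style search from u that only extends paths shorter than w(u, v)."""
    weight = edges[(u, v)] if (u, v) in edges else edges[(v, u)]
    adj = defaultdict(list)
    for (a, b), w in edges.items():
        if {a, b} != {u, v}:  # ignore the direct edge itself
            adj[a].append((b, w))
            adj[b].append((a, w))
    best = {u: 0}
    frontier = {u: 0}
    while frontier:  # one iteration per BFS level
        next_frontier = {}
        for x, d in frontier.items():
            for y, w in adj[x]:
                nd = d + w
                # Prune: keep a path only if it is still shorter than the edge
                # and improves the best known distance to y.
                if nd < weight and nd < best.get(y, INF):
                    if y == v:
                        return True  # reached the target: semi-metric
                    best[y] = nd
                    next_frontier[y] = nd
        frontier = next_frontier
    return False


print(phase3_is_semi_metric(AFTER_PHASE1, "A", "D"))  # True
```

For a metric edge such as BC (weight 2), the only way out of B is AB (weight 3). That path is pruned immediately, so the search ends after one level.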

EXAMPLE
Metric Backbone
[Figure: the metric backbone of the example graph: edges AB, BC, CD, DE remain; CE and AD have been removed]

DISTRIBUTED IMPLEMENTATION
Implementation in the vertex-centric model
Code available: http://grafos.ml/okapi.html#analytics
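One way to organize the Phase 1 triangle check in the vertex-centric model uses two supersteps. First, each vertex sends its weighted adjacency list to every neighbor. Then each receiver checks whether a 2-hop path through the sender beats one of its direct edges. The Python below only simulates that message flow under this assumed design. The function name is hypothetical, and the code is neither Okapi's implementation nor Giraph's API. The example weights for AB, BC, CD and DE are assumed.

```python
from collections import defaultdict

# Example graph; the weights of AB, BC, CD and DE are ASSUMED.
EDGES = {("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 1,
         ("D", "E"): 2, ("C", "E"): 4, ("A", "D"): 10}


def vertex_centric_phase1(edges):
    """Simulate two supersteps of a hypothetical vertex-centric triangle check.

    Superstep 1: every vertex sends its weighted adjacency list to each neighbor.
    Superstep 2: vertex x, receiving the list of neighbor y, flags edge x-z when
    z is also a neighbor of x and w(x, y) + w(y, z) < w(x, z).
    Returns the flagged edges and the number of messages sent in superstep 1.
    """
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        adj[u][v] = w
        adj[v][u] = w

    inbox = defaultdict(list)  # superstep 1: send adjacency lists
    for sender, neighbors in adj.items():
        for receiver in neighbors:
            inbox[receiver].append((sender, dict(neighbors)))
    messages = sum(len(msgs) for msgs in inbox.values())

    flagged = set()  # superstep 2: local triangle checks
    for x, msgs in inbox.items():
        for y, y_neighbors in msgs:
            for z, w_yz in y_neighbors.items():
                if z != x and z in adj[x] and adj[x][y] + w_yz < adj[x][z]:
                    flagged.add(tuple(sorted((x, z))))
    return flagged, messages


print(vertex_centric_phase1(EDGES))  # ({('C', 'E')}, 12)
```

In this design one message goes out per direction of every edge (12 for the six example edges), and each message grows with the sender's degree.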

EVALUATION GOALS
• How does our algorithm compare to APSP?
• Are large, real-world graphs semi-metric?
• Can we improve graph analysis performance?

COMPARISON TO APSP
Computing APSP in Giraph:
• multiple SSSPs: in the order of months for million-edge graphs
• multiple MSSPs, i.e. SSSPs from several sources in parallel: in the order of days for million-edge graphs
Our algorithm is 120-180x faster than SSSP and 11-14x faster than MSSP: in the order of hours for million-edge graphs

ALGORITHM PHASES
• Phase 1: very fast and scalable; removes up to 90% of semi-metric edges
• Phase 2: moderately fast; labels up to 60% of the unlabeled edges
• Phase 3: slow; labels up to 1%-9% of the total edges
Phase 1 is the fastest and most useful phase

PHASE 1 SCALABILITY
• almost linear scalability
• <200s on a billion-edge graph

SEMI-METRICITY IN REAL GRAPHS
Graph        |V|    |E|    Weighting                Semi-metricity
Facebook     190M   49.9B  custom                   26.5%
Twitter      40M    1.5B   Jaccard                  39%
Tuenti       12M    685M   Jaccard                  59%
Livejournal  4.8M   34M    Jaccard                  40%
NotreDame    0.3M   1.5M   Jaccard, Adamic-Adar     45%-29%
DBLP         318K   1M     Jaccard, Adamic-Adar     23%-9%
Twitter-ego  81K    1.7M   Jaccard, Adamic-Adar     57%-39%
Movielens    1.6K   1.9M   Jaccard                  88%
Facebook     1K     143K   #messages, message size  78%-77%
US-Airports  0.5K   6K     #passengers              72%
C-Elegans    0.3K   2.3K   #connections             17%
Semi-metricity: % of 1st-order semi-metric edges => reduction in memory and communication

QUERY SPEEDUP ON NEO4J
6.7x speedup

APACHE GIRAPH SPEEDUP
Including the time to calculate the backbone
4x speedup

APACHE GIRAPH SPEEDUP
6x speedup

COMMUNICATION REDUCTION
Up to 70% for highly semi-metric graphs

BEST PRACTICES
When to use the backbone?
• semi-metric weighting schemes, e.g. neighborhood similarity
• we can amortize the overhead, e.g. many algorithms on the same graph or multiple distance queries
• lossy compression is acceptable
When not to use the backbone?
• metric weighting schemes
• we need to run a one-off analysis
• we need lossless compression
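Whether a weighting scheme is semi-metric depends on how similarity is turned into distance. The sketch below builds a small hypothetical graph and weights its edges by the Jaccard similarity s of closed neighborhoods, then compares two conversions. The first, d = 1/s - 1, is an assumed proximity-to-distance map, since the deck does not say which conversion it used. The second, the Jaccard distance d = 1 - s, satisfies the triangle inequality and therefore yields no semi-metric edges. Fractions keep the comparisons exact.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical graph: triangle a-b-c; a and b share neighbors x1..x3,
# b and c share neighbors y1..y3.
EDGE_LIST = [("a", "b"), ("b", "c"), ("a", "c")]
EDGE_LIST += [(n, f"x{i}") for n in "ab" for i in (1, 2, 3)]
EDGE_LIST += [(n, f"y{i}") for n in "bc" for i in (1, 2, 3)]


def jaccard_distances(edge_list, to_distance):
    """Weight each edge by to_distance(Jaccard similarity of closed neighborhoods)."""
    nbrs = {}
    for u, v in edge_list:
        nbrs.setdefault(u, {u}).add(v)
        nbrs.setdefault(v, {v}).add(u)
    weights = {}
    for u, v in edge_list:
        s = Fraction(len(nbrs[u] & nbrs[v]), len(nbrs[u] | nbrs[v]))
        weights[tuple(sorted((u, v)))] = to_distance(s)
    return weights


def triangle_semi_metric(weights):
    """Edges (u, v) with a 2-hop path u-x-v strictly shorter than w(u, v)."""
    adj = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    flagged = set()
    for x in adj:
        for u, v in combinations(sorted(adj[x]), 2):
            if v in adj[u] and adj[u][x] + adj[x][v] < adj[u][v]:
                flagged.add((u, v))
    return flagged


inverse = triangle_semi_metric(jaccard_distances(EDGE_LIST, lambda s: 1 / s - 1))
jaccard = triangle_semi_metric(jaccard_distances(EDGE_LIST, lambda s: 1 - s))
print(("a", "c") in inverse, jaccard)  # True set()
```

Under d = 1/s - 1, edge a-c gets distance 2 while a-b-c costs 1/2 + 1/2 = 1, so a-c is semi-metric. Under 1 - s, the same triangle is only tight (2/3 versus 1/3 + 1/3), so no edge is removed.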

RECAP: MAIN CONTRIBUTIONS
• An algorithm for computing the metric
backbone without solving APSP
• An open-source distributed implementation
• Graph query and graph analytics speedup on
Neo4j and Apache Giraph
