THE SHORTEST PATH IS NOT
ALWAYS A STRAIGHT LINE
leveraging semi-metricity in large-scale graph analysis
Vasiliki Kalavri (kalavri@kth.se) KTH Royal Institute of Technology
Tiago Simas (tiago.simas@telefonica.com) Telefonica Research
Dionysios Logothetis (dionysios@fb.com) Facebook
[Figure: example weighted graph with users Alice, Bob, and Max; edge labels "42 likes" and "3 likes"]
Weighted graphs capture relationship strength
• distance
• similarity
• social proximity
• rating
• preference

• influential nodes
• optimal propagation paths
• communities
• recommendations
Sparsification techniques reduce the
graph size and still give exact or good
approximate results
G → G’
f(G) ~ f(G’)
THE METRIC BACKBONE
Reduces the graph size while
maintaining relevant structure
The minimum subgraph of a weighted graph that
preserves the shortest paths of the original graph
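[Figure: an example weighted graph on vertices A, B, C, D, E with six edges (weights 2, 3, 10, 4, 2, 1), next to its metric backbone, which keeps four edges (weights 2, 3, 2, 1)]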
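A minimal Python sketch of the definition above, not the authors' algorithm: it computes all-pairs shortest paths with Floyd-Warshall and keeps an edge only if its weight equals the shortest distance between its endpoints. The edge weights are an assumption chosen to be consistent with the slides (the figure does not survive in this transcript), and `metric_backbone` is a hypothetical helper name.

```python
import math
from itertools import product

# Assumed weights, consistent with the slides; the original figure is lost.
NODES = ["A", "B", "C", "D", "E"]
EDGES = {("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 1,
         ("D", "E"): 2, ("C", "E"): 4, ("A", "D"): 10}

def metric_backbone(nodes, edges):
    """Keep the edges whose weight equals the shortest-path distance."""
    dist = {(a, b): (0 if a == b else math.inf)
            for a, b in product(nodes, repeat=2)}
    for (a, b), w in edges.items():
        dist[a, b] = dist[b, a] = min(dist[a, b], w)
    # Floyd-Warshall: product() varies k slowest, as the algorithm requires.
    for k, i, j in product(nodes, repeat=3):
        if dist[i, k] + dist[k, j] < dist[i, j]:
            dist[i, j] = dist[i, k] + dist[k, j]
    return {e: w for e, w in edges.items() if w == dist[e]}

backbone = metric_backbone(NODES, EDGES)
print(sorted(backbone))
```

This brute-force version is only practical for small graphs, since it materializes a distance for every vertex pair.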
WHAT CAN WE USE IT FOR?
• Exact computations
  • any algorithm that depends on the shortest paths
  • reachability, connectivity
  • betweenness centrality, closeness centrality
• Approximation
  • PageRank, random walks
  • eigenvector centrality
  • community detection, clustering
Improves community detection modularity and recommender systems accuracy
IMPACT ON LARGE-SCALE SYSTEMS
• Graph Databases
  • fewer edges => smaller path search space
• Batch Graph Processing
  • CPU and memory requirements depend on #messages
  • #messages proportional to #edges
  • fewer edges => improved analysis performance
• Graph Compression
  • fewer edges => storage reduction
BACKGROUND
SEMI-METRICITY
In a weighted graph, an edge is semi-metric if there
exists a shorter indirect path between its endpoints
[Figure: the example graph on vertices A, B, C, D, E with edge weights 2, 3, 10, 4, 2, 1]
• CE is 1st-order semi-metric: C-D-E is a shorter 2-hop path
• AD is 2nd-order semi-metric: A-B-C-D is a shorter 3-hop path
• AB, BC, CD, DE are metric
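The classification above can be sketched in Python. This is an illustration under stated assumptions, not the authors' code: the edge weights are chosen to be consistent with the slides, `semi_metric_order` is a hypothetical helper, and the order is read here as the hop count of the fewest-hop shorter indirect path, minus one.

```python
import math

# Assumed weights, consistent with the slides; the original figure is lost.
EDGES = {frozenset(e): w for e, w in {
    ("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 1,
    ("D", "E"): 2, ("C", "E"): 4, ("A", "D"): 10}.items()}

def semi_metric_order(edges, u, v):
    """Return 0 for a metric edge, else k for a k-th order semi-metric edge."""
    direct = frozenset((u, v))
    nodes = {x for e in edges for x in e}
    # best[x]: shortest distance from u to x using at most `hops` edges,
    # never traversing the direct edge (u, v).
    best = {x: math.inf for x in nodes}
    best[u] = 0
    for hops in range(1, len(nodes)):
        nxt = dict(best)
        for e, w in edges.items():
            if e == direct:
                continue
            a, b = tuple(e)
            nxt[b] = min(nxt[b], best[a] + w)
            nxt[a] = min(nxt[a], best[b] + w)
        best = nxt
        if best[v] < edges[direct]:
            return hops - 1  # a shorter path with `hops` edges exists
    return 0

orders = {tuple(sorted(e)): semi_metric_order(EDGES, *sorted(e)) for e in EDGES}
print(orders)
```

With these assumed weights, CE gets order 1, AD gets order 2, and the remaining four edges get 0 (metric).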
BACKBONE ALGORITHM
BACKBONE CALCULATION
• Calculating the backbone:
  • find all semi-metric edges: 1 BFS per edge?
  • compute APSP and store O(N^2) paths
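To give a feel for the O(N^2) storage cost, here is a back-of-the-envelope calculation in Python. It assumes one 4-byte distance per ordered vertex pair (an assumption, not a figure from the talk) and uses the vertex count of the largest graph in the evaluation, 190M.

```python
# Assumptions: 4 bytes per stored distance, one entry per ordered vertex pair.
n_vertices = 190_000_000          # largest graph in the evaluation
bytes_per_distance = 4            # hypothetical encoding

pairs = n_vertices ** 2
total_bytes = pairs * bytes_per_distance
print(f"{pairs:.2e} distances, about {total_bytes / 1e15:.1f} PB")
```

Under these assumptions the table of distances alone is on the order of 10^16 entries, i.e. more than a hundred petabytes, before storing any actual paths.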
Can we calculate or approximate the backbone without solving APSP?
ORDER OF SEMI-METRICITY
Most semi-metric edges are 1st-order semi-metric
A 3-PHASE BACKBONE ALGORITHM
1. Find 1st-order semi-metric edges: only look at triangles
   Scalable & practical for large graphs
EXAMPLE
Phase 1
[Figure: the example graph before Phase 1 (edge weights 2, 3, 10, 4, 2, 1) and after it (edge weights 2, 3, 10, 2, 1): the weight-4 edge has been removed]
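Phase 1 can be sketched as a triangle check: an edge (u, v) is 1st-order semi-metric if some common neighbor z gives a shorter 2-hop path u-z-v. The following Python sketch is a single-machine illustration, not the distributed implementation; the edge weights are assumed values consistent with the slides, and the function names are hypothetical.

```python
# Assumed weights, consistent with the slides; the original figure is lost.
EDGES = {("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 1,
         ("D", "E"): 2, ("C", "E"): 4, ("A", "D"): 10}

def build_adjacency(edges):
    """Undirected adjacency map: vertex -> {neighbor: weight}."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj

def first_order_semi_metric(adj):
    """Edges (u, v) that close a triangle u-z-v with a shorter 2-hop path."""
    found = set()
    for u in adj:
        for v, w_uv in adj[u].items():
            if u < v:  # visit each undirected edge once
                common = adj[u].keys() & adj[v].keys()
                if any(adj[u][z] + adj[z][v] < w_uv for z in common):
                    found.add((u, v))
    return found

removed = first_order_semi_metric(build_adjacency(EDGES))
print(removed)
```

Only edges that participate in a triangle are candidates, which is why this phase needs nothing beyond each vertex's 2-hop neighborhood.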
A 3-PHASE BACKBONE ALGORITHM
1. Find 1st-order semi-metric edges: only look at triangles
   Scalable & practical for large graphs
2. Identify metric edges in 2-hop paths
   Most semi-metric edges have been removed
EXAMPLE
Phase 2
[Figure: the example graph after Phase 1 (edge weights 2, 3, 10, 2, 1); four edges are labeled M (metric) and one edge is still unlabeled (?)]
The lowest-weight edge of every vertex is metric: if (u, v) is the lowest-weight edge of u, any indirect path from u to v would have larger weight
[Inset figure: vertices u and v, with edge weights 2, 4, 2, 1]
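The lowest-weight rule can be sketched in a few lines of Python. This covers only the rule stated on the slide and may not be everything Phase 2 does; it assumes strictly positive weights, treats ties as metric too, and uses assumed edge weights for the graph left after Phase 1 (the function name is hypothetical).

```python
# Assumed graph after Phase 1 (CE already removed); weights are illustrative.
EDGES = {("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 1,
         ("D", "E"): 2, ("A", "D"): 10}

def build_adjacency(edges):
    """Undirected adjacency map: vertex -> {neighbor: weight}."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj

def lowest_weight_edges(adj):
    """Label as metric every edge that is a lowest-weight edge of a vertex."""
    metric = set()
    for u, nbrs in adj.items():
        lowest = min(nbrs.values())
        for v, w in nbrs.items():
            if w == lowest:  # ties are metric as well when weights are positive
                metric.add(frozenset((u, v)))
    return metric

metric = lowest_weight_edges(build_adjacency(EDGES))
unlabeled = {frozenset(e) for e in EDGES} - metric
print(sorted(map(sorted, metric)), sorted(map(sorted, unlabeled)))
```

With these assumed weights the rule labels AB, BC, CD, and DE as metric and leaves only AD for the next phase.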
A 3-PHASE BACKBONE ALGORITHM
1. Find 1st-order semi-metric edges: only look at triangles
   Scalable & practical for large graphs
2. Identify metric edges in 2-hop paths
   Most semi-metric edges have been removed
3. Run a BFS for remaining unlabeled edges
   1%-9% of edges
EXAMPLE
Phase 3
[Figure: the example graph after Phase 2 (edge weights 2, 3, 10, 2, 1); four edges are labeled M (metric) and a BFS is started for the remaining edge]
BFS: explore paths with shorter distances only
If the BFS arrives at the target, the edge is semi-metric
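A hedged Python sketch of this bounded search: starting from one endpoint of an unlabeled edge, it expands only paths whose accumulated distance stays below the weight of the direct edge, and reports the edge as semi-metric if it reaches the other endpoint. It assumes strictly positive weights, uses assumed edge weights for the example graph, and `is_semi_metric` is a hypothetical name rather than the authors' implementation.

```python
from collections import deque

# Assumed graph after Phase 1 (CE already removed); weights are illustrative.
EDGES = {("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 1,
         ("D", "E"): 2, ("A", "D"): 10}

def build_adjacency(edges):
    """Undirected adjacency map: vertex -> {neighbor: weight}."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj

def is_semi_metric(adj, u, v):
    """True if an indirect u-v path is strictly shorter than edge (u, v)."""
    limit = adj[u][v]
    best = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y, w in adj[x].items():
            if x == u and y == v:
                continue  # never take the direct edge itself
            d = best[x] + w
            if d >= limit or d >= best.get(y, limit):
                continue  # prune: not shorter than the edge or a known path
            if y == v:
                return True
            best[y] = d
            queue.append(y)
    return False

adj = build_adjacency(EDGES)
print(is_semi_metric(adj, "A", "D"), is_semi_metric(adj, "A", "B"))
```

Because every partial path is cut off at the direct edge's weight, the search usually touches only a small neighborhood of the edge rather than the whole graph.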
EXAMPLE
Metric Backbone
[Figure: the resulting metric backbone on vertices A, B, C, D, E, with edge weights 2, 3, 2, 1]
DISTRIBUTED IMPLEMENTATION
code available: http://grafos.ml/okapi.html#analytics
Implementation in the vertex-centric model
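As a rough illustration of what a vertex-centric formulation of Phase 1 might look like, the Python sketch below simulates two supersteps on one machine. It is not the code linked above and does not use any Giraph API; the message layout, the function name, and the edge weights are all assumptions made for the example.

```python
# Assumed weights, consistent with the slides; the original figure is lost.
EDGES = {("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 1,
         ("D", "E"): 2, ("C", "E"): 4, ("A", "D"): 10}

def build_adjacency(edges):
    """Undirected adjacency map: vertex -> {neighbor: weight}."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj

def phase1_vertex_centric(adj):
    """Simulate two supersteps; return semi-metric edges and message count."""
    # Superstep 1: every vertex sends its weighted neighbor list to each neighbor.
    inbox = {v: [] for v in adj}
    messages = 0
    for z, nbrs in adj.items():
        for u in nbrs:
            inbox[u].append((z, nbrs))
            messages += 1
    # Superstep 2: vertex u compares each own edge (u, v) with the path u-z-v.
    semi_metric = set()
    for u, received in inbox.items():
        for z, z_nbrs in received:
            for v, w_zv in z_nbrs.items():
                if v in adj[u] and adj[u][z] + w_zv < adj[u][v]:
                    semi_metric.add(frozenset((u, v)))
    return semi_metric, messages

semi_metric, messages = phase1_vertex_centric(build_adjacency(EDGES))
print(semi_metric, messages)
```

In this simulation each undirected edge carries one message in each direction, so the message count is twice the number of edges.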
EVALUATION
EVALUATION GOALS
• How does our algorithm compare to APSP?
• Are large, real-world graphs semi-metric?
• Can we improve graph analysis performance?
COMPARISON TO APSP
Computing APSP in Giraph
• multiple SSSPs: in the order of months for million-edge graphs
• multiple MSSPs, i.e. SSSPs from several sources in parallel: in the order of days for million-edge graphs
Our algorithm is 120-180x faster than SSSP
and 11-14x faster than MSSP:
order of hours for million-edge graphs
ALGORITHM PHASES
• Phase 1: very fast and scalable; removes up to 90% of semi-metric edges
• Phase 2: moderately fast; labels up to 60% of the unlabeled edges
• Phase 3: slow; labels 1-9% of the total edges
Phase 1 is the fastest and most useful phase
PHASE 1 SCALABILITY
• almost linear scalability
• <200s on a billion-edge graph
SEMI-METRICITY IN REAL GRAPHS
Graph         |V|    |E|     weight metric              semi-metricity
Facebook      190M   49.9B   custom                     26.5%
Twitter       40M    1.5B    jaccard                    39%
Tuenti        12M    685M    jaccard                    59%
Livejournal   4.8M   34M     jaccard                    40%
NotreDame     0.3M   1.5M    jaccard, adamic            45%-29%
DBLP          318K   1M      jaccard, adamic            23%-9%
Twitter-ego   81K    1.7M    jaccard, adamic            57%-39%
Movielens     1.6K   1.9M    jaccard                    88%
Facebook      1K     143K    #messages, message size    78%-77%
US-Airports   0.5K   6K      #passengers                72%
C-Elegans     0.3K   2.3K    #connections               17%
% 1st-order semi-metric edges => reduction in memory and communication
QUERY SPEEDUP ON NEO4J
6.7x speedup
APACHE GIRAPH SPEEDUP
Including the time to calculate the backbone
4x speedup
APACHE GIRAPH SPEEDUP
6x speedup
COMMUNICATION REDUCTION
Up to 70% for highly semi-metric graphs
BEST PRACTICES
When to use the backbone?
• semi-metric weighting schemes, e.g. neighborhood similarity
• we can amortize the overhead, e.g. many algorithms on the same graph, multiple distance queries
• lossy compression is ok
When not to use the backbone?
• metric weighting schemes
• we need to run a one-off analysis
• we need lossless compression
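As an illustration of a neighborhood-similarity weighting scheme, the Python sketch below weights each edge by the Jaccard similarity of its endpoints' neighbor sets and converts it to a distance. The conversion d = 1/s - 1 is one common choice and is an assumption here; the slides do not say which conversion was used, and the toy graph and function name are hypothetical.

```python
import math

# Hypothetical toy graph: vertex -> set of neighbors.
NEIGHBORS = {"a": {"b", "c"}, "b": {"a", "c", "d"},
             "c": {"a", "b"}, "d": {"b"}}

def jaccard_distance_weights(neighbors):
    """Edge weights from Jaccard similarity s, using the assumed d = 1/s - 1."""
    weights = {}
    for u in neighbors:
        for v in neighbors[u]:
            if u < v:  # visit each undirected edge once
                inter = len(neighbors[u] & neighbors[v])
                union = len(neighbors[u] | neighbors[v])
                # 1/s - 1 with s = inter/union, written without rounding error
                weights[u, v] = (union - inter) / inter if inter else math.inf
    return weights

weights = jaccard_distance_weights(NEIGHBORS)
print(weights)
```

Distances derived this way need not satisfy the triangle inequality, which is why such weighting schemes can produce semi-metric edges.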
RECAP: MAIN CONTRIBUTIONS
• An algorithm for computing the metric backbone without solving APSP
• An open-source distributed implementation
• Graph query and graph analytics speedup on Neo4j and Apache Giraph

More Related Content

Similar to The shortest path is not always a straight line

Large Graph Mining – Patterns, tools and cascade analysis by Christos Faloutsos
Large Graph Mining – Patterns, tools and cascade analysis by Christos FaloutsosLarge Graph Mining – Patterns, tools and cascade analysis by Christos Faloutsos
Large Graph Mining – Patterns, tools and cascade analysis by Christos Faloutsos
BigMine
 
Design of Filter Circuits using MATLAB, Multisim, and Excel
Design of Filter Circuits using MATLAB, Multisim, and ExcelDesign of Filter Circuits using MATLAB, Multisim, and Excel
Design of Filter Circuits using MATLAB, Multisim, and Excel
David Sandy
 
Circuit Theory 2: Filters Project Report
Circuit Theory 2: Filters Project ReportCircuit Theory 2: Filters Project Report
Circuit Theory 2: Filters Project Report
Michael Sandy
 
Emm3104 chapter 1 part4
Emm3104 chapter 1 part4Emm3104 chapter 1 part4
Emm3104 chapter 1 part4
Khairiyah Sulaiman
 
Day 5 application of graph ,biconnectivity fdp on ds
Day 5 application of graph ,biconnectivity fdp on dsDay 5 application of graph ,biconnectivity fdp on ds
Day 5 application of graph ,biconnectivity fdp on ds
GUNASUNDARISAPIIICSE
 
CS 354 More Graphics Pipeline
CS 354 More Graphics PipelineCS 354 More Graphics Pipeline
CS 354 More Graphics Pipeline
Mark Kilgard
 
Traversing Notes |surveying II | Sudip khadka
Traversing Notes |surveying II | Sudip khadka Traversing Notes |surveying II | Sudip khadka
Traversing Notes |surveying II | Sudip khadka
Sudip khadka
 
CaoTupinThursday20110722.ppt
CaoTupinThursday20110722.pptCaoTupinThursday20110722.ppt
CaoTupinThursday20110722.pptgrssieee
 
Concept of Adaptive Transmission
Concept of Adaptive TransmissionConcept of Adaptive Transmission
Concept of Adaptive Transmission
Pavel Loskot
 
Kyle McKinnon LHarbour Flow Sim NOVIDEO
Kyle McKinnon LHarbour Flow Sim NOVIDEOKyle McKinnon LHarbour Flow Sim NOVIDEO
Kyle McKinnon LHarbour Flow Sim NOVIDEO
COGS Presentations
 
JGrass: the Horton machine (FOSS4G2008)
JGrass: the Horton machine (FOSS4G2008)JGrass: the Horton machine (FOSS4G2008)
JGrass: the Horton machine (FOSS4G2008)
Andrea Antonello
 
Mit15 082 jf10_lec01
Mit15 082 jf10_lec01Mit15 082 jf10_lec01
Mit15 082 jf10_lec01Saad Liaqat
 
Approximation Algorithms TSP
Approximation Algorithms   TSPApproximation Algorithms   TSP
FR3.L09 - MULTIBASELINE GRADIENT AMBIGUITY RESOLUTION TO SUPPORT MINIMUM COST...
FR3.L09 - MULTIBASELINE GRADIENT AMBIGUITY RESOLUTION TO SUPPORT MINIMUM COST...FR3.L09 - MULTIBASELINE GRADIENT AMBIGUITY RESOLUTION TO SUPPORT MINIMUM COST...
FR3.L09 - MULTIBASELINE GRADIENT AMBIGUITY RESOLUTION TO SUPPORT MINIMUM COST...grssieee
 
141205 graphulo ingraphblas
141205 graphulo ingraphblas141205 graphulo ingraphblas
141205 graphulo ingraphblas
graphulo
 
141222 graphulo ingraphblas
141222 graphulo ingraphblas141222 graphulo ingraphblas
141222 graphulo ingraphblas
MIT
 
1516 contouring
1516 contouring1516 contouring
1516 contouring
Dr Fereidoun Dejahang
 
Graph500 and Green Graph500 benchmarks on SGI UV2000 @ SGI UG SC14
Graph500 and Green Graph500 benchmarks on SGI UV2000 @ SGI UG SC14Graph500 and Green Graph500 benchmarks on SGI UV2000 @ SGI UG SC14
Graph500 and Green Graph500 benchmarks on SGI UV2000 @ SGI UG SC14
Yuichiro Yasui
 
APznzaZLM_MVouyxM4cxHPJR5BC-TAxTWqhQJ2EywQQuXStxJTDoGkHdsKEQGd4Vo7BS3Q1npCOMV...
APznzaZLM_MVouyxM4cxHPJR5BC-TAxTWqhQJ2EywQQuXStxJTDoGkHdsKEQGd4Vo7BS3Q1npCOMV...APznzaZLM_MVouyxM4cxHPJR5BC-TAxTWqhQJ2EywQQuXStxJTDoGkHdsKEQGd4Vo7BS3Q1npCOMV...
APznzaZLM_MVouyxM4cxHPJR5BC-TAxTWqhQJ2EywQQuXStxJTDoGkHdsKEQGd4Vo7BS3Q1npCOMV...
KUSHDHIRRA2111026030
 

Similar to The shortest path is not always a straight line (20)

Large Graph Mining – Patterns, tools and cascade analysis by Christos Faloutsos
Large Graph Mining – Patterns, tools and cascade analysis by Christos FaloutsosLarge Graph Mining – Patterns, tools and cascade analysis by Christos Faloutsos
Large Graph Mining – Patterns, tools and cascade analysis by Christos Faloutsos
 
Design of Filter Circuits using MATLAB, Multisim, and Excel
Design of Filter Circuits using MATLAB, Multisim, and ExcelDesign of Filter Circuits using MATLAB, Multisim, and Excel
Design of Filter Circuits using MATLAB, Multisim, and Excel
 
Circuit Theory 2: Filters Project Report
Circuit Theory 2: Filters Project ReportCircuit Theory 2: Filters Project Report
Circuit Theory 2: Filters Project Report
 
Emm3104 chapter 1 part4
Emm3104 chapter 1 part4Emm3104 chapter 1 part4
Emm3104 chapter 1 part4
 
Day 5 application of graph ,biconnectivity fdp on ds
Day 5 application of graph ,biconnectivity fdp on dsDay 5 application of graph ,biconnectivity fdp on ds
Day 5 application of graph ,biconnectivity fdp on ds
 
CS 354 More Graphics Pipeline
CS 354 More Graphics PipelineCS 354 More Graphics Pipeline
CS 354 More Graphics Pipeline
 
Traversing Notes |surveying II | Sudip khadka
Traversing Notes |surveying II | Sudip khadka Traversing Notes |surveying II | Sudip khadka
Traversing Notes |surveying II | Sudip khadka
 
CaoTupinThursday20110722.ppt
CaoTupinThursday20110722.pptCaoTupinThursday20110722.ppt
CaoTupinThursday20110722.ppt
 
Concept of Adaptive Transmission
Concept of Adaptive TransmissionConcept of Adaptive Transmission
Concept of Adaptive Transmission
 
Kyle McKinnon LHarbour Flow Sim NOVIDEO
Kyle McKinnon LHarbour Flow Sim NOVIDEOKyle McKinnon LHarbour Flow Sim NOVIDEO
Kyle McKinnon LHarbour Flow Sim NOVIDEO
 
JGrass: the Horton machine (FOSS4G2008)
JGrass: the Horton machine (FOSS4G2008)JGrass: the Horton machine (FOSS4G2008)
JGrass: the Horton machine (FOSS4G2008)
 
Lgm saarbrucken
Lgm saarbruckenLgm saarbrucken
Lgm saarbrucken
 
Mit15 082 jf10_lec01
Mit15 082 jf10_lec01Mit15 082 jf10_lec01
Mit15 082 jf10_lec01
 
Approximation Algorithms TSP
Approximation Algorithms   TSPApproximation Algorithms   TSP
Approximation Algorithms TSP
 
FR3.L09 - MULTIBASELINE GRADIENT AMBIGUITY RESOLUTION TO SUPPORT MINIMUM COST...
FR3.L09 - MULTIBASELINE GRADIENT AMBIGUITY RESOLUTION TO SUPPORT MINIMUM COST...FR3.L09 - MULTIBASELINE GRADIENT AMBIGUITY RESOLUTION TO SUPPORT MINIMUM COST...
FR3.L09 - MULTIBASELINE GRADIENT AMBIGUITY RESOLUTION TO SUPPORT MINIMUM COST...
 
141205 graphulo ingraphblas
141205 graphulo ingraphblas141205 graphulo ingraphblas
141205 graphulo ingraphblas
 
141222 graphulo ingraphblas
141222 graphulo ingraphblas141222 graphulo ingraphblas
141222 graphulo ingraphblas
 
1516 contouring
1516 contouring1516 contouring
1516 contouring
 
Graph500 and Green Graph500 benchmarks on SGI UV2000 @ SGI UG SC14
Graph500 and Green Graph500 benchmarks on SGI UV2000 @ SGI UG SC14Graph500 and Green Graph500 benchmarks on SGI UV2000 @ SGI UG SC14
Graph500 and Green Graph500 benchmarks on SGI UV2000 @ SGI UG SC14
 
APznzaZLM_MVouyxM4cxHPJR5BC-TAxTWqhQJ2EywQQuXStxJTDoGkHdsKEQGd4Vo7BS3Q1npCOMV...
APznzaZLM_MVouyxM4cxHPJR5BC-TAxTWqhQJ2EywQQuXStxJTDoGkHdsKEQGd4Vo7BS3Q1npCOMV...APznzaZLM_MVouyxM4cxHPJR5BC-TAxTWqhQJ2EywQQuXStxJTDoGkHdsKEQGd4Vo7BS3Q1npCOMV...
APznzaZLM_MVouyxM4cxHPJR5BC-TAxTWqhQJ2EywQQuXStxJTDoGkHdsKEQGd4Vo7BS3Q1npCOMV...
 

More from Vasia Kalavri

From data stream management to distributed dataflows and beyond
From data stream management to distributed dataflows and beyondFrom data stream management to distributed dataflows and beyond
From data stream management to distributed dataflows and beyond
Vasia Kalavri
 
Predictive Datacenter Analytics with Strymon
Predictive Datacenter Analytics with StrymonPredictive Datacenter Analytics with Strymon
Predictive Datacenter Analytics with Strymon
Vasia Kalavri
 
Apache Flink & Graph Processing
Apache Flink & Graph ProcessingApache Flink & Graph Processing
Apache Flink & Graph Processing
Vasia Kalavri
 
Graphs as Streams: Rethinking Graph Processing in the Streaming Era
Graphs as Streams: Rethinking Graph Processing in the Streaming EraGraphs as Streams: Rethinking Graph Processing in the Streaming Era
Graphs as Streams: Rethinking Graph Processing in the Streaming Era
Vasia Kalavri
 
Demystifying Distributed Graph Processing
Demystifying Distributed Graph ProcessingDemystifying Distributed Graph Processing
Demystifying Distributed Graph Processing
Vasia Kalavri
 
Like a Pack of Wolves: Community Structure of Web Trackers
Like a Pack of Wolves: Community Structure of Web TrackersLike a Pack of Wolves: Community Structure of Web Trackers
Like a Pack of Wolves: Community Structure of Web Trackers
Vasia Kalavri
 
Batch and Stream Graph Processing with Apache Flink
Batch and Stream Graph Processing with Apache FlinkBatch and Stream Graph Processing with Apache Flink
Batch and Stream Graph Processing with Apache Flink
Vasia Kalavri
 
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache FlinkGelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Vasia Kalavri
 
Big data processing systems research
Big data processing systems researchBig data processing systems research
Big data processing systems research
Vasia Kalavri
 
Asymmetry in Large-Scale Graph Analysis, Explained
Asymmetry in Large-Scale Graph Analysis, ExplainedAsymmetry in Large-Scale Graph Analysis, Explained
Asymmetry in Large-Scale Graph Analysis, Explained
Vasia Kalavri
 
Block Sampling: Efficient Accurate Online Aggregation in MapReduce
Block Sampling: Efficient Accurate Online Aggregation in MapReduceBlock Sampling: Efficient Accurate Online Aggregation in MapReduce
Block Sampling: Efficient Accurate Online Aggregation in MapReduce
Vasia Kalavri
 
m2r2: A Framework for Results Materialization and Reuse
m2r2: A Framework for Results Materialization and Reusem2r2: A Framework for Results Materialization and Reuse
m2r2: A Framework for Results Materialization and Reuse
Vasia Kalavri
 
MapReduce: Optimizations, Limitations, and Open Issues
MapReduce: Optimizations, Limitations, and Open IssuesMapReduce: Optimizations, Limitations, and Open Issues
MapReduce: Optimizations, Limitations, and Open Issues
Vasia Kalavri
 
A Skype case study (2011)
A Skype case study (2011)A Skype case study (2011)
A Skype case study (2011)
Vasia Kalavri
 
Gelly in Apache Flink Bay Area Meetup
Gelly in Apache Flink Bay Area MeetupGelly in Apache Flink Bay Area Meetup
Gelly in Apache Flink Bay Area Meetup
Vasia Kalavri
 
Apache Flink Deep Dive
Apache Flink Deep DiveApache Flink Deep Dive
Apache Flink Deep Dive
Vasia Kalavri
 
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
Vasia Kalavri
 

More from Vasia Kalavri (17)

From data stream management to distributed dataflows and beyond
From data stream management to distributed dataflows and beyondFrom data stream management to distributed dataflows and beyond
From data stream management to distributed dataflows and beyond
 
Predictive Datacenter Analytics with Strymon
Predictive Datacenter Analytics with StrymonPredictive Datacenter Analytics with Strymon
Predictive Datacenter Analytics with Strymon
 
Apache Flink & Graph Processing
Apache Flink & Graph ProcessingApache Flink & Graph Processing
Apache Flink & Graph Processing
 
Graphs as Streams: Rethinking Graph Processing in the Streaming Era
Graphs as Streams: Rethinking Graph Processing in the Streaming EraGraphs as Streams: Rethinking Graph Processing in the Streaming Era
Graphs as Streams: Rethinking Graph Processing in the Streaming Era
 
Demystifying Distributed Graph Processing
Demystifying Distributed Graph ProcessingDemystifying Distributed Graph Processing
Demystifying Distributed Graph Processing
 
Like a Pack of Wolves: Community Structure of Web Trackers
Like a Pack of Wolves: Community Structure of Web TrackersLike a Pack of Wolves: Community Structure of Web Trackers
Like a Pack of Wolves: Community Structure of Web Trackers
 
Batch and Stream Graph Processing with Apache Flink
Batch and Stream Graph Processing with Apache FlinkBatch and Stream Graph Processing with Apache Flink
Batch and Stream Graph Processing with Apache Flink
 
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache FlinkGelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
 
Big data processing systems research
Big data processing systems researchBig data processing systems research
Big data processing systems research
 
Asymmetry in Large-Scale Graph Analysis, Explained
Asymmetry in Large-Scale Graph Analysis, ExplainedAsymmetry in Large-Scale Graph Analysis, Explained
Asymmetry in Large-Scale Graph Analysis, Explained
 
Block Sampling: Efficient Accurate Online Aggregation in MapReduce
Block Sampling: Efficient Accurate Online Aggregation in MapReduceBlock Sampling: Efficient Accurate Online Aggregation in MapReduce
Block Sampling: Efficient Accurate Online Aggregation in MapReduce
 
m2r2: A Framework for Results Materialization and Reuse
m2r2: A Framework for Results Materialization and Reusem2r2: A Framework for Results Materialization and Reuse
m2r2: A Framework for Results Materialization and Reuse
 
MapReduce: Optimizations, Limitations, and Open Issues
MapReduce: Optimizations, Limitations, and Open IssuesMapReduce: Optimizations, Limitations, and Open Issues
MapReduce: Optimizations, Limitations, and Open Issues
 
A Skype case study (2011)
A Skype case study (2011)A Skype case study (2011)
A Skype case study (2011)
 
Gelly in Apache Flink Bay Area Meetup
Gelly in Apache Flink Bay Area MeetupGelly in Apache Flink Bay Area Meetup
Gelly in Apache Flink Bay Area Meetup
 
Apache Flink Deep Dive
Apache Flink Deep DiveApache Flink Deep Dive
Apache Flink Deep Dive
 
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
 

Recently uploaded

Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
GetInData
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Enterprise Wired
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
mzpolocfi
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 

Recently uploaded (20)

Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
 

The shortest path is not always a straight line

  • 1. THE SHORTEST PATH IS NOT ALWAYS A STRAIGHT LINE: leveraging semi-metricity in large-scale graph analysis. Vasiliki Kalavri (kalavri@kth.se), KTH Royal Institute of Technology; Tiago Simas (tiago.simas@telefonica.com), Telefonica Research; Dionysios Logothetis (dionysios@fb.com), Facebook.
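  • 2. Weighted graphs capture relationship strength, distance, similarity, social proximity, rating, and preference. The slide also lists: influential nodes, optimal propagation paths, communities, recommendations. [Figure: example graph with users Alice, Bob, and Max, and edges labeled "42 likes" and "3 likes".]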
  • 3. Sparsification techniques reduce the graph size and still give exact or good approximate results: for a graph G and its sparsified version G’, f(G) ~ f(G’).
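  • 4. THE METRIC BACKBONE: Reduces the graph size while maintaining relevant structure. It is the minimum subgraph of a weighted graph that preserves the shortest paths of the original graph. [Figure: a five-vertex graph (A, B, C, D, E) with edge weights 2, 3, 10, 4, 2, 1, next to its backbone, which keeps only the edges with weights 2, 3, 2, 1.]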
  • 6. WHAT CAN WE USE IT FOR? Exact computations: any algorithm that depends on the shortest paths; reachability, connectivity; betweenness centrality, closeness centrality. Approximation: PageRank, random walks; eigenvector centrality; community detection, clustering. Improves community detection modularity and recommender systems accuracy.
  • 7. IMPACT ON LARGE-SCALE SYSTEMS: Graph Databases: fewer edges => smaller path search space. Batch Graph Processing: CPU and memory requirements depend on #messages; #messages is proportional to #edges; fewer edges => improved analysis performance. Graph Compression: fewer edges => storage reduction.
  • 12. SEMI-METRICITY: In a weighted graph, an edge is semi-metric if there exists a shorter indirect path between its endpoints. In the example graph (vertices A, B, C, D, E; edge weights 2, 3, 10, 4, 2, 1): CE is 1st-order semi-metric, since C-D-E is a shorter 2-hop path; AD is 2nd-order semi-metric, since A-B-C-D is a shorter 3-hop path; AB, BC, CD, DE are metric.
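The definition above can be checked directly by comparing each edge weight with the shortest-path distance between its endpoints. The following is a minimal sketch, not the authors' implementation. The vertex names and weight values come from the slide, but the assignment of weights to specific edges is an assumption (the figure layout is lost in the transcript), and the helper names `build_adjacency`, `shortest_distance`, and `classify_edges` are hypothetical.

```python
import heapq

# Toy graph from the slide. ASSUMPTION: which weight belongs to which edge;
# the transcript keeps the weights (2, 3, 10, 4, 2, 1) but not their positions.
EDGES = {
    ("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 1,
    ("D", "E"): 2, ("A", "D"): 10, ("C", "E"): 4,
}

def build_adjacency(edges):
    """Undirected adjacency map: vertex -> {neighbour: weight}."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj

def shortest_distance(adj, source, target):
    """Plain Dijkstra; returns the shortest-path distance from source to target."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

def classify_edges(edges):
    """An edge is semi-metric when some path is strictly shorter than the edge itself."""
    adj = build_adjacency(edges)
    return {
        (u, v): "semi-metric" if shortest_distance(adj, u, v) < w else "metric"
        for (u, v), w in edges.items()
    }

labels = classify_edges(EDGES)
```

This brute-force check runs one shortest-path search per edge, which is the cost the later slides try to avoid on large graphs.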
  • 15. BACKBONE CALCULATION: Calculating the backbone means finding all semi-metric edges: run 1 BFS per edge, or compute all-pairs shortest paths (APSP) and store O(N^2) paths? Can we calculate or approximate the backbone without solving APSP?
  • 17. ORDER OF SEMI-METRICITY: Most semi-metric edges are 1st-order semi-metric.
  • 25. A 3-PHASE BACKBONE ALGORITHM: (1) Find 1st-order semi-metric edges: only look at triangles. Scalable & practical for large graphs. (2) Identify metric edges in 2-hop paths. Most semi-metric edges have been removed.
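Phase 1 only needs the triangles around each edge, so it can be sketched without any shortest-path search. This is a single-machine illustration, not the authors' vertex-centric code. The edge-weight assignment on the toy graph is assumed, and `first_order_semi_metric` is a hypothetical name.

```python
# Toy graph from the slides. ASSUMPTION: which weight belongs to which edge.
EDGES = {
    ("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 1,
    ("D", "E"): 2, ("A", "D"): 10, ("C", "E"): 4,
}

def first_order_semi_metric(edges):
    """Return the edges that are beaten by a 2-hop path through a common neighbour."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w

    found = set()
    for (u, v), w in edges.items():
        # Every common neighbour x of u and v closes a triangle over the edge (u, v).
        for x in adj[u].keys() & adj[v].keys():
            if adj[u][x] + adj[x][v] < w:
                found.add((u, v))
                break
    return found

phase1 = first_order_semi_metric(EDGES)
```

On this toy graph only CE is found: AD has no common neighbour between A and D, so it is left for the later phases.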
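  • 29. EXAMPLE, Phase 2: The lowest-weight edge of every vertex is metric, because any indirect path from u to v would have larger weight. [Figure: the example graph after Phase 1 (remaining edge weights 2, 3, 10, 2, 1), with four edges labeled M (metric) and one edge still unlabeled (?); inset showing vertices u and v with edge weights 2, 4, 2, 1.]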
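The Phase 2 rule follows from a short argument: any indirect path out of a vertex must start with one of its other edges, which weighs at least as much as its lowest-weight edge, and then adds at least one more positive weight. A minimal sketch of that rule is below, assuming strictly positive weights. The edge-weight assignment is an assumption and `lowest_weight_metric_edges` is a hypothetical helper name.

```python
# Example graph after Phase 1 has removed CE.
# ASSUMPTION: which weight belongs to which edge; weights must be strictly positive.
REMAINING = {
    ("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 1,
    ("D", "E"): 2, ("A", "D"): 10,
}

def lowest_weight_metric_edges(edges):
    """Label as metric every edge that is a lowest-weight edge of one of its endpoints."""
    best = {}
    for (u, v), w in edges.items():
        for node in (u, v):
            best[node] = min(w, best.get(node, w))
    return {e for e, w in edges.items() if w == best[e[0]] or w == best[e[1]]}

metric = lowest_weight_metric_edges(REMAINING)
unlabeled = set(REMAINING) - metric
```

With this weight assignment the rule labels AB, BC, CD, and DE as metric and leaves only AD unlabeled, which matches the four M labels and the single "?" on the slide.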
  • 32. A 3-PHASE BACKBONE ALGORITHM: (1) Find 1st-order semi-metric edges: only look at triangles. Scalable & practical for large graphs. (2) Identify metric edges in 2-hop paths. Most semi-metric edges have been removed. (3) Run a BFS for remaining unlabeled edges: 1%-9% of the edges.
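  • 35. EXAMPLE, Phase 3: Run a BFS from one endpoint of the unlabeled edge, exploring paths with shorter distances only. If the BFS arrives at the target, the edge is semi-metric. [Figure: the example graph (remaining edge weights 2, 3, 10, 2, 1) with four edges labeled M and a BFS started for the unlabeled edge.]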
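The pruned search of Phase 3 can be sketched as follows. Note the swap: this sketch uses a Dijkstra-style priority queue rather than the BFS named on the slide, which keeps the single-machine version short while applying the same pruning rule (drop any partial path whose length is not strictly below the edge weight). The edge-weight assignment is an assumption and `is_semi_metric` is a hypothetical name.

```python
import heapq

# Example graph after Phase 1 has removed CE.
# ASSUMPTION: which weight belongs to which edge.
REMAINING = {
    ("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 1,
    ("D", "E"): 2, ("A", "D"): 10,
}

ADJ = {}
for (a, b), weight in REMAINING.items():
    ADJ.setdefault(a, {})[b] = weight
    ADJ.setdefault(b, {})[a] = weight

def is_semi_metric(adj, u, v, w):
    """Search from u, keeping only partial paths strictly shorter than w.

    Reaching v proves that a shorter indirect path exists. The direct edge
    (u, v) is pruned automatically because its length equals w.
    """
    dist = {u: 0}
    heap = [(0, u)]
    while heap:
        d, x = heapq.heappop(heap)
        if x == v:
            return True
        if d > dist[x]:
            continue  # stale heap entry
        for y, wxy in adj[x].items():
            nd = d + wxy
            if nd < w and nd < dist.get(y, float("inf")):
                dist[y] = nd
                heapq.heappush(heap, (nd, y))
    return False

ad_is_semi_metric = is_semi_metric(ADJ, "A", "D", REMAINING[("A", "D")])
ab_is_semi_metric = is_semi_metric(ADJ, "A", "B", REMAINING[("A", "B")])
```

Because the search never expands beyond distance w, it tends to stay local for low-weight edges, which is presumably why the slides reserve it for the few edges left unlabeled.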
  • 37. DISTRIBUTED IMPLEMENTATION: Implementation in the vertex-centric model. Code available: http://grafos.ml/okapi.html#analytics
  • 39. EVALUATION GOALS: How does our algorithm compare to APSP? Are large, real-world graphs semi-metric? Can we improve graph analysis performance?
  • 43. COMPARISON TO APSP: Computing APSP in Giraph: multiple SSSPs (in the order of months for million-edge graphs), or multiple MSSPs, i.e. SSSPs from several sources in parallel (in the order of days for million-edge graphs). Our algorithm is 120-180x faster than SSSP and 11-14x faster than MSSP: in the order of hours for million-edge graphs.
  • 51. ALGORITHM PHASES: Phase 1: very fast and scalable; removes up to 90% of semi-metric edges. Phase 2: moderately fast; labels up to 60% of the unlabeled edges. Phase 3: slow; labels 1-9% of the total edges. Phase 1 is the fastest and most useful phase.
  • 54. PHASE 1 SCALABILITY: Almost linear scalability; <200s on a billion-edge graph.
  • 56. SEMI-METRICITY IN REAL GRAPHS. The semi-metricity column gives the % of 1st-order semi-metric edges => reduction in memory and communication.

      Graph         |V|     |E|     weight metric              semi-metricity
      Facebook      190M    49.9B   custom                     26.5%
      Twitter       40M     1.5B    jaccard                    39%
      Tuenti        12M     685M    jaccard                    59%
      Livejournal   4.8M    34M     jaccard                    40%
      NotreDame     0.3M    1.5M    jaccard, adamic            45%-29%
      DBLP          318K    1M      jaccard, adamic            23%-9%
      Twitter-ego   81K     1.7M    jaccard, adamic            57%-39%
      Movielens     1.6K    1.9M    jaccard                    88%
      Facebook      1K      143K    #messages, message size    78%-77%
      US-Airports   0.5K    6K      #passengers                72%
      C-Elegans     0.3K    2.3K    #connections               17%
  • 57. QUERY SPEEDUP ON NEO4J: 6.7x speedup.
  • 58. APACHE GIRAPH SPEEDUP: 4x speedup, including the time to calculate the backbone.
  • 60. COMMUNICATION REDUCTION: Up to 70% for highly semi-metric graphs.
  • 61. BEST PRACTICES: When to use the backbone? With semi-metric weighting schemes, e.g. neighborhood similarity; when we can amortize the overhead, e.g. many algorithms on the same graph or multiple distance queries; when lossy compression is ok. When not to use the backbone? With metric weighting schemes; when we need to run one-off analysis; when we need lossless compression.
  • 62. RECAP: MAIN CONTRIBUTIONS: An algorithm for computing the metric backbone without solving APSP; an open-source distributed implementation; graph query and graph analytics speedup on Neo4j and Apache Giraph.