A review of fast techniques for computing the distance between any two points in a graph. Applications span many fields such as telecom, internet routing, social network analysis, etc.
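As a concrete illustration (our own sketch, not taken from the review itself), breadth-first search is the textbook way to compute the hop-count distance between two nodes of an unweighted graph:

```python
from collections import deque

def bfs_distance(graph, source, target):
    """Shortest hop-count between two nodes of an unweighted graph via BFS."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr == target:
                return dist + 1
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None  # target unreachable from source

# toy network with edges A-B, B-C, C-D, D-A
g = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["A", "C"]}
print(bfs_distance(g, "A", "C"))  # → 2
```

Weighted graphs would need Dijkstra or, for repeated all-pairs queries at scale, the precomputed distance oracles such reviews typically cover.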
MLconf - Distributed Deep Learning for Classification and Regression Problems... - Sri Ambati
Video recording (no audio?): http://new.livestream.com/accounts/7874891/events/3565981/videos/68114143 from 32:00 to 54:30
Deep Learning has been dominating recent machine learning competitions with better predictions. Unlike the neural networks of the past, modern Deep Learning methods have cracked the code for training stability and generalization. Deep Learning is not only the leader in image and speech recognition tasks, but is also emerging as the algorithm of choice for highest predictive performance in traditional business analytics. This talk introduces Deep Learning and implementation concepts in the open-source H2O in-memory prediction engine. Designed for the solution of business-critical problems on distributed compute clusters, it offers advanced features such as adaptive learning rate, dropout regularization, parameter tuning and a fully-featured R interface. World record performance on the classic MNIST dataset, best-in-class accuracy for a high-dimensional eBay text classification problem and other relevant datasets showcase the power of this game-changing technology. A whole new ecosystem of Intelligent Applications is emerging with Deep Learning at its core.
Bio:
Prior to joining 0xdata as Physicist & Hacker, Arno was a founding Senior MTS at Skytree where he designed and implemented high-performance machine learning algorithms. He has over a decade of experience in HPC with C++/MPI and had access to the world’s largest supercomputers as a Staff Scientist at SLAC National Accelerator Laboratory where he participated in US DOE scientific computing initiatives. While at SLAC, he authored the first curvilinear finite-element simulation code for space-charge dominated relativistic free electrons and scaled it to thousands of compute nodes. He also led a collaboration with CERN to model the electromagnetic performance of CLIC, a ginormous e+e- collider and potential successor of LHC. Arno has authored dozens of scientific papers and was a sought-after academic conference speaker. He holds a PhD and Masters summa cum laude in Physics from ETH Zurich. Arno was named 2014 Big Data All-Star by Fortune Magazine.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Deep Learning in the Wild with Arno Candel - Sri Ambati
"Deep Learning in the Wild" Meetup at H2O, Mountain View
Livestream: http://t.co/o7p2hYcWgy (includes part 2 with Alex Tellez)
Suggestions:
1) For best quality, download the PDF before viewing.
2) Open at least two windows: one for the YouTube video, one for the screencast (link below), and optionally one for the slides themselves.
3) The YouTube video is shown on the first page of the slide deck; for the slides, skip to page 2.
Screencast: http://youtu.be/VoL7JKJmr2I
Video recording: http://youtu.be/CJRvb8zxRdE (Thanks to Al Friedrich!)
In this talk, we put Deep Learning to the test on real-world data puzzles.
Data:
- Higgs binary classification dataset (10M rows, 29 cols)
- MNIST 10-class dataset
- Weather categorical dataset
- eBay text classification dataset (8500 cols, 500k rows, 467 classes)
- ECG heartbeat anomaly detection
Scalable Data Science and Deep Learning with H2O
In this session, we introduce the H2O data science platform. We will explain its scalable in-memory architecture and design principles and focus on the implementation of distributed deep learning in H2O. Advanced features such as adaptive learning rates, various forms of regularization, automatic data transformations, checkpointing, grid-search, cross-validation and auto-tuning turn multi-layer neural networks of the past into powerful, easy-to-use predictive analytics tools accessible to everyone. We will present a broad range of use cases and live demos that include world-record deep learning models, anomaly detection tools and approaches for Kaggle data science competitions. We also demonstrate the applicability of H2O in enterprise environments for real-world customer production use cases.
By the end of the hands-on session, attendees will have learned to perform end-to-end data science workflows with H2O using both the easy-to-use web interface and the flexible R interface. We will cover data ingest, basic feature engineering, feature selection, hyperparameter optimization with N-fold cross-validation, multi-model scoring and taking models into production. We will train supervised and unsupervised methods on realistic datasets. With best-of-breed machine learning algorithms such as elastic net, random forest, gradient boosting and deep learning, you will be able to create your own smart applications.
A local installation of RStudio is recommended for this session.
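The N-fold cross-validation mentioned above partitions the rows so that every row serves as a validation row exactly once. A minimal sketch of the splitting idea in pure Python (illustrative only, not H2O's actual implementation):

```python
def kfold_indices(n_rows, k):
    """Split row indices 0..n_rows-1 into k roughly equal folds and
    yield (train_idx, valid_idx) pairs, as used in N-fold cross-validation."""
    folds = [list(range(i, n_rows, k)) for i in range(k)]
    for i, valid in enumerate(folds):
        train = [idx for j, f in enumerate(folds) if j != i for idx in f]
        yield train, valid

# 10 rows, 5 folds: each row is validated exactly once across the folds
splits = list(kfold_indices(10, 5))
```

A model is then trained k times, each time scored on the held-out fold, and the k scores are averaged to estimate out-of-sample performance.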
Alex Tellez's slides on Deep Learning Applications, including using auto-encoders, finding better Bordeaux wine, and fighting crime in Chicago, from the 3/11/15 Meetup at H2O.ai HQ and the 3/12/15 Meetup at Mills College.
H2O Distributed Deep Learning by Arno Candel 071614 - Sri Ambati
Deep Learning R Vignette Documentation: https://github.com/0xdata/h2o/tree/master/docs/deeplearning/
Deep Learning has been dominating recent machine learning competitions with better predictions. Unlike the neural networks of the past, modern Deep Learning methods have cracked the code for training stability and generalization. Deep Learning is not only the leader in image and speech recognition tasks, but is also emerging as the algorithm of choice in traditional business analytics.
This talk introduces Deep Learning and implementation concepts in the open-source H2O in-memory prediction engine. Designed for the solution of enterprise-scale problems on distributed compute clusters, it offers advanced features such as adaptive learning rate, dropout regularization and optimization for class imbalance. World record performance on the classic MNIST dataset, best-in-class accuracy for eBay text classification and others showcase the power of this game-changing technology. A whole new ecosystem of Intelligent Applications is emerging with Deep Learning at its core.
About the Speaker: Arno Candel
Prior to joining 0xdata as Physicist & Hacker, Arno was a founding Senior MTS at Skytree where he designed and implemented high-performance machine learning algorithms. He has over a decade of experience in HPC with C++/MPI and had access to the world's largest supercomputers as a Staff Scientist at SLAC National Accelerator Laboratory where he participated in US DOE scientific computing initiatives. While at SLAC, he authored the first curvilinear finite-element simulation code for space-charge dominated relativistic free electrons and scaled it to thousands of compute nodes.
He also led a collaboration with CERN to model the electromagnetic performance of CLIC, a ginormous e+e- collider and potential successor of LHC. Arno has authored dozens of scientific papers and was a sought-after academic conference speaker. He holds a PhD and Masters summa cum laude in Physics from ETH Zurich.
- The document describes a presentation on deep learning given by Arno Candel of H2O.ai.
- The presentation covered deep learning methods and implementations, results from case studies in Higgs boson classification, handwritten digit recognition, and text classification.
- It also demonstrated H2O's scalability and the ability of its deep learning algorithm to achieve state-of-the-art results on benchmark datasets.
H2O Open Source Deep Learning, Arno Candel 03-20-14 - Sri Ambati
More information in our Deep Learning webinar: http://www.slideshare.net/0xdata/h2-o-deeplearningarnocandel052114
Latest slide deck: http://www.slideshare.net/0xdata/h2o-distributed-deep-learning-by-arno-candel-071614
Ted Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SF - MLconf
Abstract: How graphs became just another big data primitive
Graph-shaped data is used in product recommendation systems, social network analysis, network threat detection, image de-noising, and many other important applications. And, a growing number of these applications will benefit from parallel distributed processing for graph feature engineering, model training, and model serving. But today’s graph tools are riddled with limitations and shortcomings, such as a lack of language bindings, streaming support, and seamless integration with other popular data services. In this talk, we’ll argue that the key to doing more with graphs is doing less with specialized systems and more with systems already good at handling data of other shapes. We’ll examine some practical data science workflows to further motivate this argument and we’ll talk about some of the things that Intel is doing with the open source community and industry to make graphs just another big data primitive.
How to win data science competitions with Deep Learning - Sri Ambati
This document summarizes a presentation about how to win data science competitions using deep learning with H2O. It discusses H2O's architecture and capabilities for deep learning. It then demonstrates live modeling on Kaggle competitions, providing step-by-step explanations of building and evaluating deep learning models on three different datasets - an African soil properties prediction challenge, a display advertising challenge, and a Higgs boson machine learning challenge. It concludes with tips and tricks for deep learning with H2O and an invitation to the H2O World conference.
Deep Learning with Python: Getting started and getting from ideas to insights in minutes.
PyData Seattle 2015
Alex Korbonits (@korbonits)
This presentation was given July 25, 2015 at the PyData Seattle conference hosted by PyData and NumFocus.
Mining Frequent Closed Graphs on Evolving Data Streams - Albert Bifet
Graph mining is a challenging task by itself, and even more so when processing data streams which evolve in real-time. Data stream mining faces hard constraints regarding time and space for processing, and also needs to provide for concept drift detection. In this talk we present a framework for studying graph pattern mining on time-varying streams and large datasets.
Note: Make sure to download the slides to get the high-resolution version!
Also, you can find the webinar recording here (please also download for better quality): https://www.dropbox.com/s/72qi6wjzi61gs3q/H2ODeepLearningArnoCandel052114.mov
Come hear how Deep Learning in H2O is unlocking never before seen performance for prediction!
H2O is a Google-scale, open-source machine learning engine for R and Big Data. Enterprises can now use all of their data without sampling and build intelligent applications. This live webinar introduces Distributed Deep Learning concepts, implementation and results from recent developments. Real-world classification and regression use cases from the eBay text dataset, MNIST handwritten digits and cancer datasets showcase the power of this game-changing technology.
The document provides an overview of deep learning and reinforcement learning. It discusses the current state of artificial intelligence and machine learning, including how deep learning algorithms have achieved human-level performance in various tasks such as image recognition and generation. Reinforcement learning is introduced as learning through trial-and-error interactions with an environment to maximize rewards. Examples are given of reinforcement learning algorithms solving tasks like playing Atari games.
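The trial-and-error idea behind reinforcement learning can be boiled down to a multi-armed bandit, far simpler than the Atari-scale examples above. A hedged sketch of epsilon-greedy action selection (our illustration, with made-up arm rewards):

```python
import random

def eps_greedy_bandit(means, steps=5000, eps=0.1, seed=0):
    """Epsilon-greedy bandit: explore a random arm with probability eps,
    otherwise exploit the arm with the highest running mean reward."""
    rng = random.Random(seed)
    counts = [0] * len(means)
    values = [0.0] * len(means)
    for _ in range(steps):
        if rng.random() < eps:
            a = rng.randrange(len(means))          # explore
        else:
            a = max(range(len(means)), key=lambda i: values[i])  # exploit
        reward = rng.gauss(means[a], 1.0)          # noisy reward from arm a
        counts[a] += 1
        values[a] += (reward - values[a]) / counts[a]  # incremental mean
    return values, counts

values, counts = eps_greedy_bandit([0.1, 0.5, 0.9])
# the best arm (index 2) should end up both best-valued and most-pulled
```

Full RL adds states and delayed rewards on top of this exploration/exploitation trade-off.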
San Francisco Hadoop User Group Meetup Deep Learning - Sri Ambati
Hadoop User Group, San Francisco, Dec 10 2014.
Video: http://new.livestream.com/accounts/10932136/events/3649553 (starting at 48 minutes)
Deep Learning has been dominating recent machine learning competitions with better predictions. Unlike the neural networks of the past, modern Deep Learning methods have cracked the code for training stability and generalization. Deep Learning is not only the leader in image and speech recognition tasks, but is also emerging as the algorithm of choice for highest predictive performance in traditional business analytics. This talk introduces Deep Learning and implementation concepts in the open-source H2O in-memory prediction engine. Designed for the solution of business-critical problems on distributed compute clusters, it offers advanced features such as adaptive learning rate, dropout regularization, parameter tuning and a fully-featured R interface. World record performance on the classic MNIST dataset, best-in-class accuracy for a high-dimensional eBay text classification problem and other relevant datasets showcase the power of this game-changing technology. A whole new ecosystem of Intelligent Applications is emerging with Deep Learning at its core.
Bio:
Prior to joining 0xdata as Physicist & Hacker, Arno was a founding Senior MTS at Skytree where he designed and implemented high-performance machine learning algorithms. He has over a decade of experience in HPC with C++/MPI and had access to the world’s largest supercomputers as a Staff Scientist at SLAC National Accelerator Laboratory where he participated in US DOE scientific computing initiatives. While at SLAC, he authored the first curvilinear finite-element simulation code for space-charge dominated relativistic free electrons and scaled it to thousands of compute nodes. He also led a collaboration with CERN to model the electromagnetic performance of CLIC, a ginormous e+e- collider and potential successor of LHC. Arno has authored dozens of scientific papers and was a sought-after academic conference speaker. He holds a PhD and Masters summa cum laude in Physics from ETH Zurich. Arno was named 2014 Big Data All-Star by Fortune Magazine.
Deep Learning Cases: Text and Image Processing - Grigory Sapunov
Deep learning has achieved superhuman performance on tasks like image classification, object detection, and traffic sign recognition. Several examples are provided, including algorithms that outperform humans on German traffic sign recognition by 2-6 times. Deep learning has also been applied to tasks involving text, video, speech recognition and generation, question answering, and reinforcement learning. Libraries and frameworks like TensorFlow and Caffe have helped spread deep learning techniques.
Semi-Supervised Classification with Graph Convolutional Networks @ ICLR2017 reading group - Eiji Sekiya
This document describes research on semi-supervised learning on graph-structured data using graph convolutional networks. It proposes a layer-wise propagation model for graph convolutions that is more efficient than previous methods. The model is tested on several datasets, achieving state-of-the-art results for semi-supervised node classification while training faster than alternative methods. Future work to address limitations regarding memory requirements, directed graphs, and locality assumptions is also discussed.
Deep Learning with TensorFlow: Understanding Tensors, Computation Graphs, Im... - Altoros
1. The elements of Neural Networks: Weights, Biases, and Gating functions
2. MNIST (handwriting recognition) using a simple NN in TensorFlow (introduces Tensors, Computation Graphs)
3. MNIST using Convolution NN in TensorFlow
4. Understanding words and sentences as Vectors
5. word2vec in TensorFlow
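Point 4 above, words and sentences as vectors, usually comes down to comparing embeddings by cosine similarity. A toy sketch with hand-made vectors (real word2vec embeddings are learned, not hard-coded):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors: dot product over norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# hypothetical 3-d "embeddings" chosen so related words point the same way
vec = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.85, 0.82, 0.15],
    "apple": [0.1, 0.2, 0.95],
}
print(cosine(vec["king"], vec["queen"]) > cosine(vec["king"], vec["apple"]))  # → True
```

word2vec's contribution is learning such vectors from raw text so that this geometric closeness tracks semantic closeness.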
This document provides an overview of machine learning and artificial intelligence presented by Arno Candel, Chief Architect at H2O.ai. It discusses the history and evolution of AI from early concepts in the 1950s to recent advances in deep learning. It also describes H2O.ai's platform for scalable machine learning and how it works, allowing users to easily build and deploy models on big data using APIs for R, Python, and other languages.
STRIP: Stream Learning of Influence Probabilities - Albert Bifet
This document presents a method called STRIP (Streaming Learning of Influence Probabilities) for learning influence probabilities between users in a social network from a streaming log of propagations. It describes three solutions: (1) storing the whole social graph in memory, (2) using min-wise independent hashing to estimate probabilities while using sublinear space, and (3) estimating probabilities only for the most active users to be more space efficient. Experimental results on a Twitter dataset showed these solutions provided good approximations while using reasonable memory and processing time.
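The min-wise hashing idea in solution (2) rests on one fact: the probability that two sets share the same minimum hash value equals their Jaccard similarity. A toy sketch of the estimator (our illustration, not the STRIP implementation):

```python
import random

def minhash_signature(items, seeds):
    """One minimum hash per seed; the vector of minima is a compact
    fingerprint of the set (min-wise independent hashing)."""
    return [min(hash((seed, x)) for x in items) for seed in seeds]

def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching minima ≈ Jaccard similarity of the sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

random.seed(1)
seeds = [random.random() for _ in range(256)]
a = set(range(0, 80))    # overlap with b is 60 of 100 → true Jaccard 0.6
b = set(range(20, 100))
est = estimate_jaccard(minhash_signature(a, seeds), minhash_signature(b, seeds))
```

This is why the streaming setting works: signatures use space proportional to the number of seeds, not to the size of the sets.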
1. Real-time analytics of social networks can help companies detect new business opportunities by understanding customer needs and reactions in real-time.
2. MOA and SAMOA are frameworks for analyzing massive online and distributed data streams. MOA deals with evolving data streams using online learning algorithms. SAMOA provides a programming model for distributed, real-time machine learning on data streams.
3. Both tools allow companies to gain insights from social network and other real-time data to understand customers and react to opportunities.
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-... - Greg Makowski
This talk covers 4 configurations of deep learning to solve different types of application needs. Also, strategies for speed up and real-time scoring are discussed.
TensorFrames: Google TensorFlow on Apache Spark - Databricks
Presentation at Bay Area Spark Meetup by Databricks Software Engineer and Spark committer Tim Hunter.
This presentation covers how you can use TensorFrames with TensorFlow for distributed computing on GPUs.
The document discusses data stream classification and algorithms for handling data streams. It begins with an introduction to data stream characteristics and challenges. It then discusses approximation algorithms for data streams, including maintaining statistics over sliding windows. Classification algorithms for data streams discussed include Naive Bayes classifiers, perceptrons, and Hoeffding trees, which are decision trees adapted for data streams using the Hoeffding bound inequality to determine the optimal split attribute.
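The Hoeffding bound that gives Hoeffding trees their name is compact: with probability 1 - δ, the true mean of a variable with range R lies within ε = sqrt(R² ln(1/δ) / (2n)) of the mean of n observations. A quick sketch:

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound epsilon: with probability 1 - delta, the true mean
    is within epsilon of the mean observed over n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

# a Hoeffding tree splits once the observed gain gap between the two best
# attributes exceeds epsilon; more samples shrink epsilon
eps = hoeffding_bound(value_range=1.0, delta=1e-7, n=1000)
```

Because ε shrinks as 1/√n, the tree can commit to a split after a bounded number of stream examples instead of storing the whole stream.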
Applying your Convolutional Neural Networks - Databricks
Part 3 of the Deep Learning Fundamentals Series, this session starts with a quick primer on activation functions, learning rates, optimizers, and backpropagation. Then it dives deeper into convolutional neural networks discussing convolutions (including kernels, local connectivity, strides, padding, and activation functions), pooling (or subsampling to reduce the image size), and fully connected layer. The session also provides a high-level overview of some CNN architectures. The demos included in these slides are running on Keras with TensorFlow backend on Databricks.
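The convolution and pooling steps described above can be shown bare-bones in pure Python (stride 1, no padding; real frameworks like Keras operate on batched tensors, but the arithmetic is the same):

```python
def conv2d_valid(image, kernel):
    """'Valid' 2D convolution (cross-correlation, as in most DL frameworks):
    slide the kernel over the image with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)]
            for i in range(oh)]

def max_pool2(fmap):
    """2x2 max pooling with stride 2: keep the strongest activation per patch."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

# a vertical-edge kernel fires where pixel intensity rises left to right
image = [[0, 0, 1, 1]] * 4
kernel = [[-1, 1], [-1, 1]]
fmap = conv2d_valid(image, kernel)   # responds only at the 0→1 boundary
pooled = max_pool2(fmap)             # subsample, keeping the peak response
```

Local connectivity and weight sharing are both visible here: every output cell reuses the same 2x2 kernel over a small patch of the input.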
Neural networks are composed of interconnected neurons arranged in layers that can learn patterns from data. They consist of an input layer, hidden layers, and an output layer. Each neuron receives weighted inputs, passes them through an activation function, and outputs the result. Backpropagation allows neural networks to learn by calculating error derivatives to update weights between layers. Deeper networks can model more complex patterns using techniques like convolutional neural networks for images and recurrent neural networks for sequential data. While powerful, neural networks require large datasets and computational resources to train effectively.
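The backpropagation idea in the summary above, reduced to its smallest case: one sigmoid neuron trained by gradient descent on squared loss (a toy sketch, not a full multi-layer network):

```python
import math

def train_neuron(data, epochs=3000, lr=1.0):
    """One sigmoid neuron trained by gradient descent. For squared loss the
    error derivative at the output is (y_hat - y) * y_hat * (1 - y_hat)."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            z = w[0] * x[0] + w[1] * x[1] + b
            y_hat = 1.0 / (1.0 + math.exp(-z))      # sigmoid activation
            delta = (y_hat - y) * y_hat * (1.0 - y_hat)
            w = [wi - lr * delta * xi for wi, xi in zip(w, x)]
            b -= lr * delta
    return w, b

# learn OR, a linearly separable function one neuron can represent
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, b = train_neuron(data)
```

In a deep network the same delta is propagated backwards through each layer via the chain rule, which is all "backpropagation" means.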
ACT Talk, Giuseppe Totaro: High Performance Computing for Distributed Indexin... - Advanced-Concepts-Team
Searching for information within large sets of unstructured, heterogeneous scientific data can be very challenging unless an inverted index has been created in advance. Several solutions, mainly based on the Hadoop ecosystem, have been proposed to accelerate the process of index construction. These solutions perform well when data are already distributed across the cluster nodes involved in the elaboration. On the other hand, the cost of distributing data can introduce noticeable overhead. We propose ISODAC, a new approach aimed at improving efficiency without sacrificing reliability. Our solution reduces to the bare minimum the number of I/O operations by using a stream of in-memory operations to extract and index heterogeneous data. We further improve the performance by using GPUs and POSIX Threads programming for the most computationally intensive tasks of the indexing procedure. ISODAC indexes heterogeneous documents up to 10.6x faster than other widely adopted solutions, such as Apache Spark.
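For context, the inverted index such systems build maps each term to the documents containing it, so queries become posting-list intersections rather than full scans. A minimal sketch (our illustration, not ISODAC's code):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Minimal inverted index: term -> sorted list of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search_all(index, terms):
    """Conjunctive query: intersect the posting lists of all terms."""
    postings = [set(index.get(t, ())) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []

docs = ["spark indexes data", "hadoop indexes data streams", "gpu threads"]
index = build_inverted_index(docs)
```

The heavy lifting in production indexers is exactly the tokenize-and-insert loop above, which is why streaming it through memory (and offloading to GPUs) pays off.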
Cluster analysis is an unsupervised learning technique used to group similar data objects into clusters. It aims to partition data into groups called clusters such that objects within a cluster are as similar as possible while objects in different clusters are as dissimilar as possible. The k-means algorithm is commonly used for partitioning-based clustering. It works by randomly selecting k initial cluster centroids and then iteratively assigning data points to their nearest centroid and recalculating the centroids until cluster membership stabilizes. However, k-means is sensitive to outliers and noise since outliers can distort cluster centroids.
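The assign-then-recompute loop described above can be sketched in a few lines of pure Python (a minimal illustration; library versions add smarter initialization and convergence checks):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: alternately assign each point to its nearest
    centroid, then move each centroid to its cluster's mean."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        centroids = [tuple(sum(d) / len(c) for d in zip(*c)) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# two well-separated 2D blobs: centroids should land near (0,0) and (10,10)
pts = [(0.1, 0.2), (0.0, -0.1), (0.2, 0.0),
       (10.1, 9.9), (9.8, 10.2), (10.0, 10.0)]
centroids, clusters = kmeans(pts, 2)
```

The outlier sensitivity noted above is visible in the mean step: a single extreme point shifts its cluster's centroid, which is why k-medoids is preferred for noisy data.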
H2O Open Source Deep Learning, Arno Candel 03-20-14Sri Ambati
More information in our Deep Learning webinar: http://www.slideshare.net/0xdata/h2-o-deeplearningarnocandel052114
Latest slide deck: http://www.slideshare.net/0xdata/h2o-distributed-deep-learning-by-arno-candel-071614
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Ted Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SF (MLconf)
Abstract: How graphs became just another big data primitive
Graph-shaped data is used in product recommendation systems, social network analysis, network threat detection, image de-noising, and many other important applications. A growing number of these applications will benefit from parallel distributed processing for graph feature engineering, model training, and model serving. But today's graph tools are riddled with limitations and shortcomings, such as a lack of language bindings, streaming support, and seamless integration with other popular data services. In this talk, we'll argue that the key to doing more with graphs is doing less with specialized systems and more with systems already good at handling data of other shapes. We'll examine some practical data science workflows to further motivate this argument, and we'll talk about some of the things that Intel is doing with the open source community and industry to make graphs just another big data primitive.
How to win data science competitions with Deep Learning (Sri Ambati)
This document summarizes a presentation about how to win data science competitions using deep learning with H2O. It discusses H2O's architecture and capabilities for deep learning. It then demonstrates live modeling on Kaggle competitions, providing step-by-step explanations of building and evaluating deep learning models on three different datasets - an African soil properties prediction challenge, a display advertising challenge, and a Higgs boson machine learning challenge. It concludes with tips and tricks for deep learning with H2O and an invitation to the H2O World conference.
Deep Learning with Python: Getting started and getting from ideas to insights in minutes.
PyData Seattle 2015
Alex Korbonits (@korbonits)
This presentation was given July 25, 2015 at the PyData Seattle conference hosted by PyData and NumFocus.
Mining Frequent Closed Graphs on Evolving Data Streams (Albert Bifet)
Graph mining is a challenging task by itself, and even more so when processing data streams which evolve in real-time. Data stream mining faces hard constraints regarding time and space for processing, and also needs to provide for concept drift detection. In this talk we present a framework for studying graph pattern mining on time-varying streams and large datasets.
Note: Make sure to download the slides to get the high-resolution version!
Also, you can find the webinar recording here (please also download for better quality): https://www.dropbox.com/s/72qi6wjzi61gs3q/H2ODeepLearningArnoCandel052114.mov
Come hear how Deep Learning in H2O is unlocking never before seen performance for prediction!
H2O is a google-scale, open-source machine learning engine for R and Big Data. Enterprises can now use all of their data without sampling and build intelligent applications. This live webinar introduces Distributed Deep Learning concepts, implementation and results from recent developments. Real-world classification and regression use cases from an eBay text dataset, MNIST handwritten digits and cancer datasets will present the power of this game-changing technology.
The document provides an overview of deep learning and reinforcement learning. It discusses the current state of artificial intelligence and machine learning, including how deep learning algorithms have achieved human-level performance in various tasks such as image recognition and generation. Reinforcement learning is introduced as learning through trial-and-error interactions with an environment to maximize rewards. Examples are given of reinforcement learning algorithms solving tasks like playing Atari games.
San Francisco Hadoop User Group Meetup Deep Learning (Sri Ambati)
Hadoop User Group, San Francisco, Dec 10 2014.
Video: http://new.livestream.com/accounts/10932136/events/3649553 (starting at 48 minutes)
Deep Learning Cases: Text and Image Processing (Grigory Sapunov)
Deep learning has achieved superhuman performance on tasks like image classification, object detection, and traffic sign recognition. Several examples are provided, including algorithms that outperform humans on German traffic sign recognition by 2-6 times. Deep learning has also been applied to tasks involving text, video, speech recognition and generation, question answering, and reinforcement learning. Libraries and frameworks like TensorFlow and Caffe have helped spread deep learning techniques.
Semi-Supervised Classification with Graph Convolutional Networks @ICLR2017 reading group (Eiji Sekiya)
This document describes research on semi-supervised learning on graph-structured data using graph convolutional networks. It proposes a layer-wise propagation model for graph convolutions that is more efficient than previous methods. The model is tested on several datasets, achieving state-of-the-art results for semi-supervised node classification while training faster than alternative methods. Future work to address limitations regarding memory requirements, directed graphs, and locality assumptions is also discussed.
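The layer-wise propagation rule in that paper is H' = sigma(D^{-1/2} (A + I) D^{-1/2} H W). A toy sketch on a 3-node path graph (all numbers illustrative, single feature channel, identity-like weight) shows how a node's feature spreads to its neighbors:

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Path graph 0-1-2 with self-loops added: A_hat = A + I.
A_hat = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
deg = [sum(row) for row in A_hat]
# Symmetric normalization D^{-1/2} A_hat D^{-1/2}.
A_norm = [[A_hat[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(3)]
          for i in range(3)]

H = [[1.0], [0.0], [0.0]]   # only node 0 carries a feature
W = [[1.0]]                 # trivial 1x1 weight matrix

def relu(M):
    return [[max(0.0, x) for x in row] for row in M]

# One graph-convolution layer: node 1 picks up signal from node 0,
# node 2 (two hops away) stays at zero after a single layer.
H_next = relu(matmul(matmul(A_norm, H), W))
```

Stacking layers widens the receptive field one hop at a time, which is why depth matters for this model.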
Deep Learning with TensorFlow: Understanding Tensors, Computations Graphs, Im... (Altoros)
1. The elements of Neural Networks: Weights, Biases, and Gating functions
2. MNIST (Hand writing recognition) using simple NN in TensorFlow (Introduce Tensors, Computation Graphs)
3. MNIST using Convolution NN in TensorFlow
4. Understanding words and sentences as Vectors
5. word2vec in TensorFlow
This document provides an overview of machine learning and artificial intelligence presented by Arno Candel, Chief Architect at H2O.ai. It discusses the history and evolution of AI from early concepts in the 1950s to recent advances in deep learning. It also describes H2O.ai's platform for scalable machine learning and how it works, allowing users to easily build and deploy models on big data using APIs for R, Python, and other languages.
STRIP: stream learning of influence probabilities (Albert Bifet)
This document presents a method called STRIP (Streaming Learning of Influence Probabilities) for learning influence probabilities between users in a social network from a streaming log of propagations. It describes three solutions: (1) storing the whole social graph in memory, (2) using min-wise independent hashing to estimate probabilities while using sublinear space, and (3) estimating probabilities only for the most active users to be more space efficient. Experimental results on a Twitter dataset showed these solutions provided good approximations while using reasonable memory and processing time.
1. Real-time analytics of social networks can help companies detect new business opportunities by understanding customer needs and reactions in real-time.
2. MOA and SAMOA are frameworks for analyzing massive online and distributed data streams. MOA deals with evolving data streams using online learning algorithms. SAMOA provides a programming model for distributed, real-time machine learning on data streams.
3. Both tools allow companies to gain insights from social network and other real-time data to understand customers and react to opportunities.
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-... (Greg Makowski)
This talk covers 4 configurations of deep learning to solve different types of application needs. Also, strategies for speed up and real-time scoring are discussed.
TensorFrames: Google Tensorflow on Apache Spark (Databricks)
Presentation at Bay Area Spark Meetup by Databricks Software Engineer and Spark committer Tim Hunter.
This presentation covers how you can use TensorFrames with TensorFlow to do distributed computing on GPUs.
The document discusses data stream classification and algorithms for handling data streams. It begins with an introduction to data stream characteristics and challenges. It then discusses approximation algorithms for data streams, including maintaining statistics over sliding windows. Classification algorithms for data streams discussed include Naive Bayes classifiers, perceptrons, and Hoeffding trees, which are decision trees adapted for data streams using the Hoeffding bound inequality to determine the optimal split attribute.
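The Hoeffding bound that underlies Hoeffding trees can be computed directly. The confidence half-width is epsilon = sqrt(R^2 ln(1/delta) / (2n)) for a statistic with range R observed over n stream samples; this sketch (names are my own) shows it shrinking as more data arrives, which is what lets the tree commit to a split attribute:

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Half-width epsilon such that the true mean lies within epsilon of the
    observed mean of n samples with probability at least 1 - delta."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

# For an information-gain statistic in [0, 1] at 95% confidence, the
# uncertainty narrows as the stream delivers more examples:
widths = [hoeffding_bound(1.0, 0.05, n) for n in (100, 1000, 10000)]
```

Once the observed gain difference between the two best attributes exceeds epsilon, the split decision is statistically safe without storing the stream.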
Applying your Convolutional Neural Networks (Databricks)
Part 3 of the Deep Learning Fundamentals Series, this session starts with a quick primer on activation functions, learning rates, optimizers, and backpropagation. Then it dives deeper into convolutional neural networks discussing convolutions (including kernels, local connectivity, strides, padding, and activation functions), pooling (or subsampling to reduce the image size), and fully connected layer. The session also provides a high-level overview of some CNN architectures. The demos included in these slides are running on Keras with TensorFlow backend on Databricks.
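The convolution and pooling operations that session covers can be sketched in plain Python (a toy valid-mode example with an assumed edge-detecting kernel; frameworks like Keras do the same thing vectorized):

```python
def conv2d(image, kernel):
    """Valid-mode 2-D convolution (strictly, cross-correlation, as in most
    deep learning frameworks): slide the kernel and take dot products."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            row.append(sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw)
            ))
        out.append(row)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: subsample, keeping the strongest activation."""
    return [
        [max(fmap[i + a][j + b] for a in range(size) for b in range(size))
         for j in range(0, len(fmap[0]) - size + 1, size)]
        for i in range(0, len(fmap) - size + 1, size)
    ]

image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
edge_kernel = [[-1, 1]]          # responds to left-to-right intensity steps
fmap = conv2d(image, edge_kernel)
pooled = max_pool(fmap)
```

The feature map lights up only along the vertical edge, and pooling halves the spatial size while keeping that response, which is the locality-plus-subsampling idea behind CNNs.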
This document discusses kriging interpolation theory and spatial data visualization. It begins with an introduction to kriging theory, describing how kriging estimates unknown points as a weighted average of nearby sample points based on their variances and covariances. Several case studies applying kriging to problems in environmental science, hydrogeology and mining are presented. Methods for visualizing spatial data using APIs like Google Maps and Baidu Maps or non-API tools are then explored. Finally, the document compares Leaflet, Baidu Maps and Google Maps for interactive spatial data visualization.
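The weighted-average idea behind kriging can be made concrete with a toy 1-D simple-kriging estimate. This sketch assumes an exponential covariance model and zero-nugget, unit-sill parameters of my own choosing (it is not code from the document):

```python
import math

def cov(h, sill=1.0, length=10.0):
    # Exponential covariance model: similarity decays with separation h.
    return sill * math.exp(-h / length)

# Two sampled locations on a line with known values; estimate at x = 4.
xs = [0.0, 10.0]
zs = [1.0, 3.0]
x0 = 4.0

# Simple-kriging weights solve C w = c0, where C holds covariances between
# samples and c0 the covariances between each sample and the target point.
C = [[cov(abs(a - b)) for b in xs] for a in xs]
c0 = [cov(abs(a - x0)) for a in xs]

# Solve the 2x2 system by Cramer's rule.
det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
w0 = (c0[0] * C[1][1] - C[0][1] * c0[1]) / det
w1 = (C[0][0] * c0[1] - c0[0] * C[1][0]) / det

mean = sum(zs) / len(zs)
estimate = mean + w0 * (zs[0] - mean) + w1 * (zs[1] - mean)
```

The nearer sample receives the larger weight, so the estimate at x = 4 lands closer to the value at x = 0, which is exactly the variance-based weighting the document describes.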
[ICLR/ICML2019 reading group] A Wrapped Normal Distribution on Hyperbolic Space for Grad... (Yoshihiro Nagano)
1. The document presents a novel hyperbolic distribution called the pseudo-hyperbolic Gaussian, which is a Gaussian distribution on hyperbolic space that can be evaluated analytically and differentiated with respect to parameters.
2. This distribution enables gradient-based learning of probabilistic models on hyperbolic space. It also allows sampling from the hyperbolic probability distribution without auxiliary means like rejection sampling.
3. As applications of the distribution, the authors develop a hyperbolic variational autoencoder and a method for probabilistic word embedding on hyperbolic space. They demonstrate the efficacy of the distribution on datasets including MNIST, Atari 2600 Breakout, and WordNet.
An Effective PSO-inspired Algorithm for Workflow Scheduling (IJECEIAES)
The Cloud is a computing platform that provides on-demand access to a shared pool of configurable resources such as networks, servers and storage that can be rapidly provisioned and released with minimal management effort from clients. At its core, Cloud computing focuses on maximizing the effectiveness of the shared resources. Therefore, workflow scheduling is one of the challenges that the Cloud must tackle, especially if a large number of tasks are executed on geographically distributed servers. This entails the need to adopt an effective scheduling algorithm in order to minimize task completion time (makespan). Although workflow scheduling has been the focus of many researchers, only a handful of efficient solutions have been proposed for Cloud computing. In this paper, we propose LPSO, a novel algorithm for the workflow scheduling problem that is based on the Particle Swarm Optimization method. Our proposed algorithm not only ensures fast convergence but also prevents getting trapped in local extrema. We ran realistic scenarios using CloudSim and found that LPSO is superior to previously proposed algorithms, and we noticed that the deviation between the solution found by LPSO and the optimal solution is negligible.
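The personal-best/global-best dynamics that LPSO builds on can be sketched with a bare-bones 1-D PSO (this is generic PSO on a toy objective, not the LPSO algorithm itself; all parameter values are my own):

```python
import random

def pso(f, n_particles=10, iters=50, lo=-10.0, hi=10.0, seed=1):
    """Bare-bones PSO: each particle is pulled toward its personal best
    and the swarm's global best, with inertia damping the velocity."""
    rng = random.Random(seed)
    xs = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vs = [0.0] * n_particles
    pbest = xs[:]
    gbest = min(xs, key=f)
    for _ in range(iters):
        for i in range(n_particles):
            vs[i] = (0.7 * vs[i]
                     + 1.5 * rng.random() * (pbest[i] - xs[i])
                     + 1.5 * rng.random() * (gbest - xs[i]))
            xs[i] += vs[i]
            if f(xs[i]) < f(pbest[i]):
                pbest[i] = xs[i]
            if f(xs[i]) < f(gbest):
                gbest = xs[i]
    return gbest

# Toy "makespan" surrogate with its optimum at x = 3.
best = pso(lambda x: (x - 3.0) ** 2)
```

A scheduling variant replaces the 1-D position with a task-to-server assignment vector and the quadratic with the simulated makespan; getting stuck in local extrema of that discrete landscape is exactly what LPSO's modifications target.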
This document summarizes a student's research project on approximate matching on graph databases using the GeX approach. It introduces graph databases and the need for approximate matching. It describes testing the GeX Top-K query algorithm on biological interaction data from multiple organisms. While accurate, the algorithm's performance decreases with larger datasets. Future work could approximate edge labels as well to improve scalability.
This document provides an overview of graph edit distance, including its definition, history, and algorithms. It begins by defining an edit path as a sequence of node/edge insertions, deletions, and substitutions that transforms one graph into another. The graph edit distance is the cost of the lowest cost edit path. It describes tree search algorithms used to explore the space of possible edit paths efficiently. It also explains how edit paths can be modeled as assignment problems that are solved using techniques like the Hungarian algorithm to find approximations of the graph edit distance.
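The assignment-problem view of graph edit distance can be illustrated with a brute-force stand-in for the Hungarian algorithm (fine for tiny matrices; the cost values below are invented for illustration):

```python
from itertools import permutations

# Toy node-substitution cost matrix between graphs G1 (rows) and G2 (columns):
# cost[i][j] is the price of mapping node i of G1 onto node j of G2.
cost = [
    [4, 2, 3],
    [2, 4, 1],
    [3, 1, 4],
]

def cheapest_assignment(cost):
    """Exhaustively solve the assignment problem (what the Hungarian
    algorithm does in O(n^3)); the minimum total cost approximates
    the graph edit distance."""
    n = len(cost)
    return min(
        (sum(cost[i][p[i]] for i in range(n)), p)
        for p in permutations(range(n))
    )

total, mapping = cheapest_assignment(cost)
```

In practice the cost matrix also carries rows/columns for node insertions and deletions, and `scipy.optimize.linear_sum_assignment` replaces the factorial search.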
Feature Extraction Based Estimation of Rain Fall By Cross Correlating Cloud R... (IOSR Journals)
In this paper we present feature-extraction-based estimation of rainfall by cross-correlating cloud RADAR data. The idea is to select a square box of around 200x200 pixels around the point of interest and take the cross-correlation between the latest picture and one that is 5 or 10 minutes older. We then determine the wind direction and speed by finding the highest point in the correlation. The last step is to interpolate the data, acquired in a tagged format, to the latest data in the up-wind direction to get a prediction for the near future. The basic principle works, but it is hard to get a good estimate of the wind direction.
Keywords – Feature Extraction, Cross correlation, Rain Fall, RADAR, Image Processing.
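The correlation-peak step in the abstract above can be sketched in 1-D (a toy version with invented pixel values; the paper works on 200x200 2-D boxes, but the idea of reading the displacement off the correlation maximum is the same):

```python
def best_shift(older, newer, max_shift):
    """Find the displacement that best aligns two 1-D 'radar frames'
    by maximizing their cross-correlation over candidate shifts."""
    best = None
    for s in range(-max_shift, max_shift + 1):
        score = sum(
            older[i] * newer[i + s]
            for i in range(len(older))
            if 0 <= i + s < len(newer)
        )
        if best is None or score > best[0]:
            best = (score, s)
    return best[1]

older = [0, 0, 5, 9, 5, 0, 0, 0, 0, 0]   # rain cell centered at index 3
newer = [0, 0, 0, 0, 0, 5, 9, 5, 0, 0]   # same cell three pixels down-wind
shift = best_shift(older, newer, max_shift=4)
```

Dividing the recovered shift by the 5- or 10-minute frame interval gives the wind speed, and in 2-D the direction of the peak offset gives the wind direction.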
ArrayUDF: User-Defined Scientific Data Analysis on Arrays (Goon83)
User-Defined Functions (UDF) allow application programmers to specify analysis operations on data, while leaving the data management and other non-trivial tasks to the system. This general approach is at the heart of modern Big Data systems, such as MapReduce/Spark and SciDB. However, a wide variety of common scientific data operations -- such as computing the moving average of a time series, the vorticity of a fluid flow, etc. -- are hard to express and slow to execute with these Big Data systems. In this talk, we will introduce a brand new Big Data system, namely ArrayUDF (https://bitbucket.org/arrayudf/arrayudf), for scientific data sets, especially multi-dimensional arrays. ArrayUDF allows flexible expression of UDFs for scientific data analysis by exploiting their common character: structural locality. ArrayUDF executes the UDF directly on arrays stored in files, such as HDF5, without any data-loading overhead. ArrayUDF's design and implementation considerations for parallel data processing on large-scale HPC will also be introduced. Performance tests on Edison at NERSC show that ArrayUDF is around 2000X faster than Spark on processing large scientific datasets.
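The moving-average example mentioned above is a textbook structural-locality UDF. A plain-Python sketch of the access pattern (ArrayUDF itself operates on HDF5 arrays in C++; this only shows the neighbor-window shape of the computation):

```python
def moving_average_udf(arr, radius=1):
    """A structural-locality UDF: each output cell is a function of the
    cell and its nearby neighbors, the pattern ArrayUDF optimizes for."""
    out = []
    for i in range(len(arr)):
        window = arr[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out

result = moving_average_udf([1.0, 2.0, 3.0, 4.0, 5.0])
```

Because each output depends only on a fixed neighborhood, the array can be partitioned across nodes with small halo regions instead of the full shuffle a generic MapReduce formulation would require.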
Slides from our PacificVis 2015 presentation.
The paper tackles the problem of “giant hairballs”, the dense and tangled structures often resulting from visualization of large social graphs. Proposed is a high-dimensional rotation technique called AGI3D, combined with an ability to filter elements based on social centrality values. AGI3D is targeted at a high-dimensional embedding of a social graph and its projection onto 3D space. It allows the user to rotate the social graph layout in the high-dimensional space by mouse-dragging a vertex. Its high-dimensional rotation effects give the user the illusion of destructively reshaping the social graph layout, but in reality it assists the user in finding a preferred position and direction in the high-dimensional space from which to view the internal structure of the social graph layout, keeping it unmodified. A prototype implementation of the proposal, called Social Viewpoint Finder, is tested with about 70 social graphs, and this paper reports four of the analysis results.
The computational infrastructure is becoming a vast interconnected fabric of formal methods, including a major shift from 2D grids to 3D graphs in machine learning architectures.
The implication is systems-level digital science at unprecedented scale for discovery in a diverse range of scientific disciplines
OPTICS: Ordering Points To Identify the Clustering Structure (Rajesh Piryani)
The presentation summarizes the OPTICS (Ordering Points To Identify the Clustering Structure) algorithm, a density-based clustering algorithm that addresses some limitations of DBSCAN. OPTICS does not produce an explicit clustering but instead outputs an ordering of all objects based on their reachability distances, representing the intrinsic clustering structure. It works by iteratively expanding clusters and updating an ordering-seeds list to generate the output ordering, without requiring the single global density parameter that DBSCAN needs. The ordering can then be used to extract clusters for a range of density parameter values. An example applying OPTICS to a 2D dataset is provided to illustrate the algorithm.
The document discusses hashing techniques for embedding objects into binary codes to enable efficient similarity search of large datasets. It provides an overview of locality sensitive hashing and learning-based hashing methods, including data-oblivious techniques like SimHash and data-aware approaches like spectral hashing. Examples of hashing research from ICML and other conferences in 2013 are also summarized, focusing on improving accuracy, utilizing multiple data views, and updating hash functions for new data.
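A minimal data-oblivious SimHash sketch illustrates the idea of embedding objects into binary codes (token sets and the 32-bit code size are my own illustrative choices):

```python
import hashlib

def simhash(tokens, bits=32):
    """Data-oblivious SimHash: each token votes +1/-1 per bit position;
    similar token sets end up with codes at small Hamming distance."""
    v = [0] * bits
    for tok in tokens:
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        for i in range(bits):
            v[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if v[i] > 0)

def hamming(a, b):
    return bin(a ^ b).count("1")

doc1 = "the quick brown fox jumps over the lazy dog".split()
doc2 = "the quick brown fox leaps over the lazy dog".split()
doc3 = "completely different words appear in this sentence".split()

d_similar = hamming(simhash(doc1), simhash(doc2))
d_different = hamming(simhash(doc1), simhash(doc3))
```

Near-duplicate documents differ in only a few bits, so candidate pairs can be found by comparing short codes instead of full documents; the data-aware methods surveyed (e.g. spectral hashing) learn the bit functions from data instead of hashing blindly.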
My invited talk at the 2018 Annual Meeting of SIAM (Society of Industrial and... (Anirbit Mukherjee)
This is a slightly expanded version of the talk I gave at the 2018 ISMP (International Symposium on Mathematical Programming). This SIAM talk has some more introductory material than the ISMP talk.
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural N... (Scientific Review SR)
This document summarizes a study that evaluated the performance of a kernel radial basis probabilistic neural network (Kernel RBPNN) model for classifying iris data, compared to backpropagation, radial basis function, and radial basis probabilistic neural network models. The Kernel RBPNN model achieved the highest classification accuracy of 89.12% on test data from the iris dataset, performing better than the other models. It also had the fastest training time, being over 80 times faster than the radial basis function model. Analysis of the receiver operating characteristic curves showed that the Kernel RBPNN model had the largest area under the curve, indicating it had the best classification prediction capability out of the four models evaluated.
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne... (Scientific Review)
The Radial Basis Probabilistic Neural Network (RBPNN) has broad generalization capability and has been successfully applied in multiple fields. In this paper, the Euclidean distance of each data point in the RBPNN is replaced by its kernel-induced distance instead of the conventional sum-of-squares distance. The kernel function is a generalization of the distance metric that measures the distance between two data points as if they were mapped into a high-dimensional space. Comparing the four constructed classification models (Kernel RBPNN, Radial Basis Function networks, RBPNN, and Back-Propagation networks), results showed that classification of the Iris data with Kernel RBPNN displays outstanding performance.
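The kernel-induced distance described above expands, via the feature map phi, as ||phi(x) - phi(y)||^2 = K(x,x) - 2 K(x,y) + K(y,y). A small sketch with an assumed RBF kernel (parameter values are illustrative, not from the paper):

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def kernel_distance(x, y, k=rbf_kernel):
    """Distance in the feature space induced by kernel k:
    ||phi(x) - phi(y)||^2 = k(x,x) - 2*k(x,y) + k(y,y)."""
    return math.sqrt(k(x, x) - 2.0 * k(x, y) + k(y, y))

d_near = kernel_distance((1.0, 1.0), (1.1, 0.9))
d_far = kernel_distance((1.0, 1.0), (4.0, 5.0))
```

Note that for the RBF kernel the distance saturates at sqrt(2) for very distant points, so substituting it for the sum-of-squares distance changes how the RBPNN weighs far-away training points.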
Improving search time for content-based image retrieval via LSH, MTRee, ... (IOSR Journals)
This document proposes a new index structure called LSH-LUBMTree to improve search time for content-based image retrieval using the Earth Mover's Distance metric. LSH-LUBMTree combines Locality Sensitive Hashing (LSH) and the LUBMTree index. Images hashed to the same bucket via LSH are then stored in the LUBMTree to reduce false positives and accelerate search time. Experimental results show LSH-LUBMTree performs better than standard LSH in terms of search time by leveraging advantages of both LSH and LUBMTree indexing.
Similar to Distance oracle - fast querying of the distance between any two points on a graph (20)
Feast Feature Store - An In-depth Overview Experimentation and Application in... (Hong Ong)
In this event, we will dive into the world of Feast and explore its numerous benefits and applications. 🌐 During the session, we'll showcase:
✅ How Feast optimizes team collaboration and enhances data versioning, storage, and serving
✅ How we can store and serve Feast features through some scenarios
✅ Quick experimentation based on pre-calculated features and quick serving online API
✅ The important role of FeatureService in data versioning and feature selection
Don't miss out on this opportunity to expand your knowledge and leverage the power of Feast for data operations and collaboration.
Dagster - DataOps and MLOps for Machine Learning Engineers.pdf (Hong Ong)
In this session, we will introduce Dagster, a cutting-edge framework that simplifies DataOps and MLOps for machine learning engineers. We will explore the benefits of this powerful tool, learn how to implement it in your machine learning workflows, and discuss practical use cases to help you enhance productivity, collaboration, and deployment of ML models.
Data Products for Mobile Commerce in Real-time and Real-life.pdf (Hong Ong)
🌀 The strong growth of Mobile has helped M-Commerce (Mobile Commerce) rise to become an inevitable trend in the near future. Mobile Commerce not only attracts attention with great utilities for users, but is also a great opportunity for business owners to develop their brands and promote online business in the Vietnamese market.
🌀 Keeping pace with the times, overcoming customers' "pain points" when shopping online is one of the key problems of concern. Building data products is one of the solutions to these problems. So how do we do that?
Algorithmic foundations of AI, Machine Learning, and Big Data (Hong Ong)
Organizer: TopDev.
Topic: Algorithmic foundations of AI, Machine Learning, and Big Data.
Speaker: Ông Xuân Hồng - Research engineer @ Trusting Social.
Date: 15/10/2017.
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati... (AbdullaAlAsif1)
The pygmy halfbeak, Dermogenys colletei, is known for its viviparous nature and presents an intriguing case of relatively low fecundity, raising questions about potential compensatory reproductive strategies employed by this species. Our study delves into the examination of fecundity and the Gonadosomatic Index (GSI) in the pygmy halfbeak, D. colletei (Meisner, 2001), an intriguing viviparous fish indigenous to Sarawak, Borneo. We hypothesize that D. colletei may exhibit unique reproductive adaptations to offset its low fecundity, thus enhancing its survival and fitness. To address this, we conducted a comprehensive study utilizing 28 mature female specimens of D. colletei, carefully measuring fecundity and GSI to shed light on the reproductive adaptations of this species. Our findings reveal that D. colletei indeed exhibits low fecundity, with a mean of 16.76 ± 2.01, and a mean GSI of 12.83 ± 1.27, providing crucial insights into the reproductive mechanisms at play in this species. These results underscore the existence of unique reproductive strategies in D. colletei, enabling its adaptation and persistence in Borneo's diverse aquatic ecosystems, and call for further ecological research to elucidate these mechanisms. This study leads to a better understanding of viviparous fish in Borneo and contributes to the broader field of aquatic ecology, enhancing our knowledge of species adaptations to unique ecological challenges.
Describing and Interpreting an Immersive Learning Case with the Immersion Cub... (Leonel Morgado)
Current descriptions of immersive learning cases are often difficult or impossible to compare. This is due to a myriad of different options on what details to include, which aspects are relevant, and on the descriptive approaches employed. Also, these aspects often combine very specific details with more general guidelines, or indicate intents and rationales without clarifying their implementation. In this paper we provide a method to describe immersive learning cases that is structured to enable comparisons, yet flexible enough to allow researchers and practitioners to decide which aspects to include. This method leverages a taxonomy that classifies educational aspects at three levels (uses, practices, and strategies) and then utilizes two frameworks, the Immersive Learning Brain and the Immersion Cube, to enable a structured description and interpretation of immersive learning cases. The method is then demonstrated on a published immersive learning case on training for wind turbine maintenance using virtual reality. Applying the method results in a structured artifact, the Immersive Learning Case Sheet, that tags the case with its proximal uses, practices, and strategies, and refines the free-text case description to ensure that matching details are included. This contribution is thus a case description method in support of future comparative research of immersive learning cases. We then discuss how the resulting description and interpretation can be leveraged to change immersive learning cases, by enriching them (considering low-effort changes or additions) or innovating (exploring more challenging avenues of transformation). The method holds significant promise to support better-grounded research in immersive learning.
The technology uses reclaimed CO₂ as the dyeing medium in a closed loop process. When pressurized, CO₂ becomes supercritical (SC-CO₂). In this state CO₂ has a very high solvent power, allowing the dye to dissolve easily.
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx (MAGOTI ERNEST)
Although Artemia has been known to man for centuries, its use as a food for the culture of larval organisms apparently began only in the 1930s, when several investigators found that it made an excellent food for newly hatched fish larvae (Litvinenko et al., 2023). As aquaculture developed in the 1960s and '70s, the use of Artemia also became more widespread, due both to its convenience and to its nutritional value for larval organisms (Arenas-Pardo et al., 2024). The fact that Artemia dormant cysts can be stored for long periods in cans, and then used as an off-the-shelf food requiring only 24 h of incubation, makes them the most convenient, least labor-intensive live food available for aquaculture (Sorgeloos & Roubach, 2021). The nutritional value of Artemia, especially for marine organisms, is not constant, but varies both geographically and temporally. During the last decade, however, both the causes of Artemia's nutritional variability and methods to improve poor-quality Artemia have been identified (Loufi et al., 2024).
Brine shrimp (Artemia spp.) are used in marine aquaculture worldwide. Annually, more than 2,000 metric tons of dry cysts are used for the cultivation of fish, crustacean, and shellfish larvae. Brine shrimp are important to aquaculture because newly hatched brine shrimp nauplii (larvae) provide a food source for many fish fry (Mozanzadeh et al., 2021). Culture and harvesting of brine shrimp eggs represent another aspect of the aquaculture industry. Nauplii and metanauplii of Artemia, commonly known as brine shrimp, play a crucial role in aquaculture due to their nutritional value and suitability as live feed for many aquatic species, particularly in larval stages (Sorgeloos & Roubach, 2021).
Phenomics assisted breeding in crop improvementIshaGoswami9
As the population is increasing and will reach about 9 billion upto 2050. Also due to climate change, it is difficult to meet the food requirement of such a large population. Facing the challenges presented by resource shortages, climate
change, and increasing global population, crop yield and quality need to be improved in a sustainable way over the coming decades. Genetic improvement by breeding is the best way to increase crop productivity. With the rapid progression of functional
genomics, an increasing number of crop genomes have been sequenced and dozens of genes influencing key agronomic traits have been identified. However, current genome sequence information has not been adequately exploited for understanding
the complex characteristics of multiple gene, owing to a lack of crop phenotypic data. Efficient, automatic, and accurate technologies and platforms that can capture phenotypic data that can
be linked to genomics information for crop improvement at all growth stages have become as important as genotyping. Thus,
high-throughput phenotyping has become the major bottleneck restricting crop breeding. Plant phenomics has been defined as the high-throughput, accurate acquisition and analysis of multi-dimensional phenotypes
during crop growing stages at the organism level, including the cell, tissue, organ, individual plant, plot, and field levels. With the rapid development of novel sensors, imaging technology,
and analysis methods, numerous infrastructure platforms have been developed for phenotyping.
Immersive Learning That Works: Research Grounding and Paths ForwardLeonel Morgado
We will metaverse into the essence of immersive learning, into its three dimensions and conceptual models. This approach encompasses elements from teaching methodologies to social involvement, through organizational concerns and technologies. Challenging the perception of learning as knowledge transfer, we introduce a 'Uses, Practices & Strategies' model operationalized by the 'Immersive Learning Brain' and ‘Immersion Cube’ frameworks. This approach offers a comprehensive guide through the intricacies of immersive educational experiences and spotlighting research frontiers, along the immersion dimensions of system, narrative, and agency. Our discourse extends to stakeholders beyond the academic sphere, addressing the interests of technologists, instructional designers, and policymakers. We span various contexts, from formal education to organizational transformation to the new horizon of an AI-pervasive society. This keynote aims to unite the iLRN community in a collaborative journey towards a future where immersive learning research and practice coalesce, paving the way for innovative educational research and practice landscapes.
ESR spectroscopy in liquid food and beverages.pptxPRIYANKA PATEL
With increasing population, people need to rely on packaged food stuffs. Packaging of food materials requires the preservation of food. There are various methods for the treatment of food to preserve them and irradiation treatment of food is one of them. It is the most common and the most harmless method for the food preservation as it does not alter the necessary micronutrients of food materials. Although irradiated food doesn’t cause any harm to the human health but still the quality assessment of food is required to provide consumers with necessary information about the food. ESR spectroscopy is the most sophisticated way to investigate the quality of the food and the free radicals induced during the processing of the food. ESR spin trapping technique is useful for the detection of highly unstable radicals in the food. The antioxidant capability of liquid food and beverages in mainly performed by spin trapping technique.
When I was asked to give a companion lecture in support of ‘The Philosophy of Science’ (https://shorturl.at/4pUXz) I decided not to walk through the detail of the many methodologies in order of use. Instead, I chose to employ a long standing, and ongoing, scientific development as an exemplar. And so, I chose the ever evolving story of Thermodynamics as a scientific investigation at its best.
Conducted over a period of >200 years, Thermodynamics R&D, and application, benefitted from the highest levels of professionalism, collaboration, and technical thoroughness. New layers of application, methodology, and practice were made possible by the progressive advance of technology. In turn, this has seen measurement and modelling accuracy continually improved at a micro and macro level.
Perhaps most importantly, Thermodynamics rapidly became a primary tool in the advance of applied science/engineering/technology, spanning micro-tech, to aerospace and cosmology. I can think of no better a story to illustrate the breadth of scientific methodologies and applications at their best.
Distance oracle - Fast querying of the distance between any two points on a graph
1. A Geometric Distance Oracle for Large Real-World
Graphs
Hong, Ong Xuan
Data Science School
November 16, 2017
Hong, Ong Xuan (Data Science School), A Geometric Distance Oracle for Large Real-World Graphs, November 16, 2017
2. Contents
1 Introduction
2 Background
3 Related works
4 Proposed method
5 Evaluation
6 Results
7 Discussion
3. Introduction
Explosion of available information → mining information about interactions
between subscribers, groups, people, objects, etc.
A fundamental graph computation is the shortest-path distance between
arbitrary nodes, but:
Calculating and querying distances is slow.
Memory for storing the graph is limited.
How can we do this analysis effectively?
5. Background
Graph theory.
Distance oracle.
Approximate distance.
Metric space: Euclidean, Hyperbolic.
δ - hyperbolic metric space.
6. Graph theory
Let G(V , E) be an undirected, weighted graph with n = |V | nodes and
m = |E| edges. What is the distance between the nodes s and t?
Dijkstra's algorithm: O(m + n log n) with a Fibonacci heap, requires no
extra space.
Adjacency matrix: query time O(1), requires O(n²) extra space.
Floyd-Warshall algorithm: returns all-pairs shortest paths, initialized
in time O(n³).
How can we use less than O(n²) space and answer queries in less than
O(m + n log n)?
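As a point of reference for the Dijkstra baseline above, here is a minimal sketch in Python. It uses the standard-library binary heap rather than the Fibonacci heap the O(m + n log n) bound assumes, so its complexity is O((m + n) log n); the adjacency-list format is an illustrative choice.

```python
import heapq

def dijkstra(adj, s):
    # Single-source shortest paths from s.
    # adj maps each node to a list of (neighbor, edge_weight) pairs.
    dist = {s: 0}
    pq = [(0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; u was already settled with a shorter path
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

adj = {0: [(1, 1), (2, 4)], 1: [(0, 1), (2, 2)], 2: [(0, 4), (1, 2)]}
print(dijkstra(adj, 0))  # → {0: 0, 1: 1, 2: 3}
```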
7. Distance oracle
A distance oracle (ideally with constant query time) is a data structure
that is cheap to compute, fast to query, and satisfies four properties:
Preprocessing time: O(n) or O(n log n).
Storage: less than O(n²).
Query time: less than O(m + n log n).
Fidelity: approximate distances as close as possible to the actual
distances.
8. Approximate distance oracles
Thorup and Zwick use spanning trees and distance labeling to approximate
distances:
Preprocessing time: O(k·m·n^(1/k)).
Storage: O(k·n^(1+1/k)).
Query time: O(k).
Fidelity: estimated vs. actual distance within the stretch range [1, 2k − 1].
Note: k takes values 1, 2, ..., log n; values of k above log n do not
improve the space or preprocessing time.
9. Metric space
An ordered pair (M, d), where M is a set and d is a metric
d : M × M → R
such that ∀x, y, z ∈ M the following hold:
d(x, y) ≥ 0 (non-negativity)
d(x, y) = 0 ⇐⇒ x = y (identity of indiscernibles)
d(x, y) = d(y, x) (symmetry)
d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality)
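These axioms are easy to machine-check on a finite sample of points. The sketch below is illustrative (the function name and tolerance are choices of this text, not from the slides):

```python
import itertools
import math

def is_metric(points, d, tol=1e-9):
    # Check the four metric axioms on every pair/triple from a finite sample.
    for x, y in itertools.product(points, repeat=2):
        if d(x, y) < -tol:
            return False                      # non-negativity violated
        if (d(x, y) < tol) != (x == y):
            return False                      # identity of indiscernibles violated
        if abs(d(x, y) - d(y, x)) > tol:
            return False                      # symmetry violated
    for x, y, z in itertools.product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:
            return False                      # triangle inequality violated
    return True

pts = [(0, 0), (3, 4), (1, 1)]
print(is_metric(pts, math.dist))  # → True
```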
10. Euclidean distance
d(p, q) = d(q, p) = √((q1 − p1)² + (q2 − p2)² + ... + (qn − pn)²)
         = √(Σ_{i=1}^{n} (qi − pi)²)
11. Hyperbolic distance
d(⟨x1, y1⟩, ⟨x2, y2⟩) = arcosh(cosh(y1)·cosh(x2 − x1)·cosh(y2) − sinh(y1)·sinh(y2))
Where:
sinh(x) = (e^x − e^(−x)) / 2 (hyperbolic sine).
cosh(x) = (e^x + e^(−x)) / 2 (hyperbolic cosine).
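A direct transcription of this formula into Python (the function name is illustrative; the arcosh argument is clamped to guard against floating-point values slightly below its domain):

```python
import math

def hyperbolic_distance(p, q):
    # d(<x1, y1>, <x2, y2>) = arcosh(cosh y1 * cosh(x2 - x1) * cosh y2
    #                                - sinh y1 * sinh y2)
    x1, y1 = p
    x2, y2 = q
    arg = (math.cosh(y1) * math.cosh(x2 - x1) * math.cosh(y2)
           - math.sinh(y1) * math.sinh(y2))
    return math.acosh(max(arg, 1.0))  # clamp: arg can dip below 1 numerically

print(round(hyperbolic_distance((0.0, 0.0), (1.0, 0.0)), 9))  # → 1.0
```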
12. δ - hyperbolic metric space
A metric space (V , d) embeds into a tree metric iff the four-point
condition holds. ∀w, x, y, z ∈ V , consider the three pairwise sums:
S := S(w, x, y, z) = d(w, x) + d(y, z)
M := M(w, x, y, z) = d(x, y) + d(w, z)
L := L(w, x, y, z) = d(x, z) + d(w, y)
labeled so that S ≤ M ≤ L.
The space is δ-hyperbolic, for δ ≥ 0, if (L − M)/2 ≤ δ for every quadruple.
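The condition can be evaluated per quadruple with a few lines of Python; a tree metric (here, points on a line, an assumption chosen for illustration) yields (L − M)/2 = 0:

```python
def four_point_delta(d, w, x, y, z):
    # Sort the three pairwise sums so S <= M <= L, then return (L - M) / 2.
    S, M, L = sorted([d(w, x) + d(y, z),
                      d(x, y) + d(w, z),
                      d(x, z) + d(w, y)])
    return (L - M) / 2

line = lambda a, b: abs(a - b)  # distances on a line: a tree metric
print(four_point_delta(line, 0, 1, 2, 3))  # → 0.0
```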
14. Related works
Theoretical results provide guaranteed approximation bounds for
specific graph classes:
Distance labeling in hyperbolic graphs
A Note on Distance Approximating Trees in Graphs
Additive spanners and distance and routing labeling schemes for
hyperbolic graphs
A compact routing scheme and approximate distance oracle for
power-law graphs
Reconstructing approximate tree metrics
Essays in Group Theory
Diameters, centers, and approximating trees of δ-hyperbolic geodesic
spaces and graphs
However, these methods have not been empirically evaluated on real-world graphs.
15. Related works
Spanning trees:
Quick queries: O(n log n).
Reduced storage space.
16. Related works
Approximate distance oracles have been developed on empirical graphs:
small-world graphs, hypergrid graphs, Facebook, telecom, the Google news
graph, the web graph, etc.
Efficient Shortest Paths on Massive Social Graphs
Fast fully dynamic landmark-based estimation of shortest path
distances in very large graphs
Querying Shortest Path Distance with Bounded Errors in Large
Graphs
Orion: shortest path estimation for large social graphs
Approximating Shortest Paths in Social Graphs
Fast exact shortest-path distance queries on large networks by pruned
landmark labeling
Toward a distance oracle for billion-node graphs
These heuristics lack a theoretical foundation.
17. Related works
19. Proposed method
Hyperbolicity-based Breadth-First Search (HyperBFS). It exploits the
hyperbolicity of real-world networks to develop spanning trees with:
Height ≤ O(log n)
Distance queries: O(log n)
Storage: O(n) words of space for an n-node graph.
20. Algorithm
Hyperbolicity-based Tree Oracle: constructing the geometric oracle
Choose a highly central vertex (centrality measured via shortest paths) as
the root. In practice the out-degree is used instead, since degree and
centrality are correlated in power-law networks.
Build 1-10 trees (via BFS) with distinct roots, ordered by degree, for the
approximation → distance labeling can be computed in parallel.
The distance between x and y is the minimum of the distances in the
different trees constructed.
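A minimal sketch of this construction in Python (illustrative names; the LCA is found by naive pointer-walking rather than the distance labeling used at scale, and degree-ordered roots stand in for centrality):

```python
from collections import deque

def bfs_tree(adj, root):
    # BFS spanning tree: parent pointer and depth for every reachable node.
    parent, depth = {root: None}, {root: 0}
    q = deque([root])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v], depth[v] = u, depth[u] + 1
                q.append(v)
    return parent, depth

def tree_distance(parent, depth, x, y):
    # d_T(x, y) = depth[x] + depth[y] - 2 * depth[lca(x, y)]
    a, b = x, y
    while depth[a] > depth[b]:
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:  # walk both up until they meet at the LCA
        a, b = parent[a], parent[b]
    return depth[x] + depth[y] - 2 * depth[a]

def build_oracle(adj, num_trees=3):
    # Roots: highest-degree vertices, the degree-based proxy for centrality.
    roots = sorted(adj, key=lambda u: len(adj[u]), reverse=True)[:num_trees]
    return [bfs_tree(adj, r) for r in roots]

def estimate_distance(trees, x, y):
    # The oracle's answer: minimum tree distance over all trees.
    return min(tree_distance(p, d, x, y) for p, d in trees)

adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
print(estimate_distance(build_oracle(adj, 2), 0, 3))  # → 2
```

Each tree distance upper-bounds the true graph distance, so taking the minimum over several trees can only tighten the estimate.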
21. Algorithm
Step 1: Embedding the graph into a multi-dimensional geometric space
Map the nodes of the graph to points in hyperbolic space.
The distance between two d-dimensional points x = (x1, x2, ..., xd) and
y = (y1, y2, ..., yd) is defined as:
d(x, y) = arcosh(√((1 + Σ_{i=1}^{d} xi²)(1 + Σ_{i=1}^{d} yi²)) − Σ_{i=1}^{d} xi·yi) · |c|
Note: no guarantees on the distance estimation error.
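A direct transcription of this embedding distance (the function name and default curvature c = −1 are illustrative choices, and the arcosh argument is clamped for floating-point safety):

```python
import math

def embedding_distance(x, y, c=-1.0):
    # arcosh( sqrt((1 + sum xi^2)(1 + sum yi^2)) - sum xi*yi ) * |c|
    sq_x = sum(v * v for v in x)
    sq_y = sum(v * v for v in y)
    dot = sum(a * b for a, b in zip(x, y))
    arg = math.sqrt((1 + sq_x) * (1 + sq_y)) - dot
    return math.acosh(max(arg, 1.0)) * abs(c)

print(embedding_distance((0.0, 0.0), (0.0, 0.0)))  # → 0.0
```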
22. Algorithm
Step 2: Gromov-type tree contraction, which improves the accuracy of
distance estimates:
Partition the tree into i-level connected components (coalescing multiple
edges into a single edge).
The additive error is guaranteed not to exceed 2δ log n, where δ is the
hyperbolicity constant of the graph.
24. Evaluation
Four benchmarked methods:
Gromov-type contraction-based tree.
Steiner trees with proven multiplicative bound.
Rigel: landmark-based approach.
HyperBFS: centrality-based spanning tree oracle.
25. Setup
2.4 GHz Intel(R) Xeon(R) processor with 190GB of RAM.
Distortion measures: let x, y be vertices of a graph G, let dG be their
true graph distance, and let dA be the distance approximated by a distance
oracle:
Additive distortion: dG − dA.
Absolute distortion: |dG − dA|.
Multiplicative distortion: |dG − dA| / dG.
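The three measures reduce to one-liners; a small helper (an illustrative name, not from the slides) makes the definitions concrete:

```python
def distortions(d_graph, d_oracle):
    # Additive, absolute, and multiplicative distortion of one estimate.
    additive = d_graph - d_oracle
    absolute = abs(d_graph - d_oracle)
    multiplicative = abs(d_graph - d_oracle) / d_graph
    return additive, absolute, multiplicative

# e.g. true distance 4, oracle answers 5:
print(distortions(4, 5))  # → (-1, 1, 0.25)
```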
Figure: Computational Time of Hyper BFS on Call Graph II.
27. Average absolute error
Figure: Average absolute error on various real-world graphs.
28. Average additive and multiplicative error
Figure: Average additive and multiplicative error on the Santa Barbara Facebook graph.
30. Discussion
Exact and approximate algorithms for computing the hyperbolicity of
large-scale graphs (N. Cohen, D. Coudert, A. Lancin):
Indexing and space: O(nm) vs. O(n).
Query: O(n) vs. O(log n).
Exact distances vs. an error bound of 2δ log n.
Extending metrics:
Local clustering coefficient: Ci = 2·|{ejk : vj, vk ∈ Ni, ejk ∈ E}| / (ki·(ki − 1))
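The local clustering coefficient counts the edges among a node's neighbors; a small Python sketch (illustrative adjacency-list form):

```python
def local_clustering(adj, i):
    # C_i = 2 * |{edges between neighbors of i}| / (k_i * (k_i - 1))
    nbrs = set(adj[i])
    k = len(nbrs)
    if k < 2:
        return 0.0  # undefined for degree < 2; report 0 by convention
    # each neighbor-neighbor edge is seen from both endpoints, so halve it
    links = sum(1 for u in nbrs for v in adj[u] if v in nbrs) // 2
    return 2 * links / (k * (k - 1))

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(local_clustering(triangle, 0))  # → 1.0
```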