Link Discovery Tutorial
Part I: Efficiency
Axel-Cyrille Ngonga Ngomo (1), Irini Fundulaki (2), Mohamed Sherif (1)
(1) Institute for Applied Informatics, Germany
(2) FORTH, Greece
October 18th, 2016
Kobe, Japan
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Efficiency October 17, 2016 1 / 106
Table of Contents
1 Introduction
2 LIMES
3 MultiBlock
4 Reduction-Ratio-Optimal Link Discovery
5 Link Discovery for Temporal Data
6 GNOME
7 GPUs and Hadoop
8 Summary and Conclusion
Introduction
Link Discovery
Definition (Declarative Link Discovery)
Given sets S and T of resources and relation R
Find M = {(s, t) ∈ S × T : R(s, t)}
Here, find M = {(s, t) ∈ S × T : δ(s, t) ≥ τ}
Problem
Naïve complexity ∈ O(|S| × |T|), i.e., O(n^2)
Example: ≈ 70 days for cities in DBpedia and LinkedGeoData
Solutions
1 Reduce the number of comparisons C(A) ≥ |M| [NA11; IJB11; Ngo12; Ngo13]
2 Use planning [Ngo14]
3 Reduce the complexity of linking to < n^2 [GSN16]
4 Use features of hardware environment [Ngo+13; NH16]
LIMES
Intuition
Definition (Declarative Link Discovery)
Given sets S and T of resources and relation R
Find M = {(s, t) ∈ S × T : R(s, t)}
Here, find M = {(s, t) ∈ S × T : δ(s, t) ≥ τ}
Some δ are distances (in the mathematical sense) [NA11]
Examples
Levenshtein distance
Minkowski distance
Intuition
Distances abide by triangle inequality
|δ(x, z) − δ(z, y)| ≤ δ(x, y) ≤ δ(x, z) + δ(z, y)
Use exemplars for pessimistic approximation
Reduce number of computations
LIMES
Approach
1 Start with random e1 ∈ T
2 e_{n+1} = arg max_{x ∈ T} Σ_{i=1}^{n} δ(x, e_i)
3 Assign each t to e(t) with e(t) = arg min_{e_i} δ(t, e_i)
4 Approximate δ(s, t) by using δ(s, t) ≥ δ(s, e(t)) − δ(e(t), t)
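The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the LIMES implementation: a pair counts as a match when δ(s, t) ≤ θ, δ is the plain Levenshtein distance, and the names pick_exemplars, assign_to_exemplars and limes_like_match are hypothetical.

```python
import random


def levenshtein(a, b):
    """Edit distance via dynamic programming (a metric, so the triangle inequality holds)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def pick_exemplars(targets, k, dist, seed=0):
    """Steps 1 and 2: random first exemplar, then repeatedly the point farthest from all chosen ones."""
    rng = random.Random(seed)
    exemplars = [rng.choice(targets)]
    while len(exemplars) < min(k, len(set(targets))):
        rest = [x for x in targets if x not in exemplars]
        exemplars.append(max(rest, key=lambda x: sum(dist(x, e) for e in exemplars)))
    return exemplars


def assign_to_exemplars(targets, exemplars, dist):
    """Step 3: attach every t to its closest exemplar and remember the distance."""
    clusters = {e: [] for e in exemplars}
    for t in targets:
        e = min(exemplars, key=lambda ex: dist(t, ex))
        clusters[e].append((t, dist(t, e)))
    return clusters


def limes_like_match(sources, targets, theta, k, dist=levenshtein):
    """Step 4: skip (s, t) whenever the lower bound dist(s, e) - dist(e, t) already exceeds theta."""
    clusters = assign_to_exemplars(targets, pick_exemplars(targets, k, dist), dist)
    matches, comparisons = set(), 0
    for s in sources:
        for e, members in clusters.items():
            d_se = dist(s, e)
            for t, d_et in members:
                if d_se - d_et > theta:
                    continue  # pessimistic bound proves dist(s, t) > theta
                comparisons += 1  # counts only full dist(s, t) computations
                if dist(s, t) <= theta:
                    matches.add((s, t))
    return matches, comparisons
```

Because the bound only discards pairs that cannot match, the result equals that of a brute-force comparison; how many comparisons are saved depends on the data and on the number of exemplars k.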
LIMES
Evaluation
Measure = Levenshtein
Threshold = 0.9, KB base size = 10^3
x-axis is number of exemplars, y-axis is number of comparisons
[Figure: number of comparisons vs. number of exemplars (0 to 300) for thresholds 0.75, 0.8, 0.85, 0.9 and 0.95, compared with brute force]
LIMES
Evaluation
Measure = Levenshtein
Threshold = 0.9, KB base size = 10^4
x-axis is number of exemplars, y-axis is number of comparisons
Optimal number of exemplars ≈ √|T|
[Figure: number of comparisons vs. number of exemplars (0 to 300) for thresholds 0.75, 0.8, 0.85, 0.9 and 0.95, compared with brute force]
LIMES
Conclusion
Time-efficient approach
Reduces number of comparisons through approximation
No guarantee pertaining to reduction ratio
MultiBlock
Intuition
Idea
Create multidimensional index of data
Partition the space so that σ(x, y) < θ ⇒ index(x) = index(y)
Decrease number of computations by only comparing pairs (x, y) with
index(x) = index(y)
MultiBlock
Approach
Similarity functions are aggregations of atomic measures sim, e.g.,
σ = MIN(levenshtein, jaccard)
Each atomic measure must define a blocking function
block : (S ∪ T) × [0, 1] → P(ℕ^n)
Each aggregation must define a similarity, a block and a threshold aggregation
MultiBlock
Example
sim = levenshtein(s, t) / max(|s|, |t|) (normalized levenshtein)
block : Σ^q → ℕ
Maps each q-gram to exactly one block (e.g., Goedelisation)
Only c = max(|s|, |t|)(1 − θ) · q + 1 q-grams to be indexed per string
index(s, θ) = {block(qgrams(s[0...c]))}
String mapped to several blocks
Comparison only within blocks
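A simplified sketch of this q-gram blocking is given below. It is not the MultiBlock implementation and makes no claim to preserve its losslessness: it uses |s| in place of max(|s|, |t|) when computing c, encodes each q-gram as an integer instead of a Gödel number, and the function names are hypothetical.

```python
from collections import defaultdict


def qgrams(s, q=2):
    """All q-grams of s in order (the whole string if it is shorter than q)."""
    return [s[i:i + q] for i in range(max(len(s) - q + 1, 1))]


def block(gram):
    """Map a q-gram to exactly one natural number (stand-in for the Goedel numbering)."""
    return int.from_bytes(gram.encode("utf-8"), "big")


def index(s, theta, q=2):
    """Blocks of the first c q-grams; c follows the slide with |s| instead of max(|s|, |t|)."""
    c = int(len(s) * (1 - theta)) * q + 1
    return {block(g) for g in qgrams(s, q)[:c]}


def candidate_pairs(S, T, theta, q=2):
    """Only pairs that share at least one block are kept for comparison."""
    blocks = defaultdict(list)
    for t in T:
        for b in index(t, theta, q):
            blocks[b].append(t)
    return {(s, t) for s in S for b in index(s, theta, q) for t in blocks.get(b, [])}
```

Each string lands in several blocks, and the expensive similarity computation is then run only on the candidate pairs.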
Comparison of drugs in DBpedia and Drugbank
Setting                   Comparisons   Runtime   Links
Full comparison           22,242,292    430s      1,403
Blocking (100 blocks)     906,314       44s       1,349
Blocking (1,000 blocks)   322,573       14s       1,287
MultiBlock                122,630       6s        1,403
HR3
Link Discovery
Definition (Declarative Link Discovery)
Given sets S and T of resources and relation R
Find M = {(s, t) ∈ S × T : R(s, t)}
Here, find M = {(s, t) ∈ S × T : δ(s, t) ≥ τ}
Intuition: Reduce the number of comparisons C(A) ≥ |M|
Maximize reduction ratio: RR(A) = 1 − C(A) / (|S||T|)
Question
Can we devise lossless approaches with guaranteed RR?
Advantages
Space management
Runtime prediction
Resource scheduling
HR3
RR Guarantee
Best achievable reduction ratio: RRmax = 1 − |M| / (|S||T|)
Approach H(α) fulfills RR guarantee criterion, iff:
∀r < RRmax, ∃α : RR(H(α)) ≥ r
Here, we use relative reduction ratio (RRR): RRR(A) = RRmax / RR(A)
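As a worked example, the three ratios can be evaluated on the MultiBlock row of the drug-linking table in the MultiBlock section. The sketch assumes that the full comparison count of 22,242,292 equals |S||T| and that the 1,403 links found by the full comparison equal |M|; the function names are hypothetical.

```python
def reduction_ratio(comparisons, size_s_times_t):
    """RR(A) = 1 - C(A) / (|S||T|)."""
    return 1 - comparisons / size_s_times_t


def rr_max(matches, size_s_times_t):
    """RRmax = 1 - |M| / (|S||T|)."""
    return 1 - matches / size_s_times_t


def relative_reduction_ratio(comparisons, matches, size_s_times_t):
    """RRR(A) = RRmax / RR(A); values close to 1 are close to optimal."""
    return rr_max(matches, size_s_times_t) / reduction_ratio(comparisons, size_s_times_t)


FULL = 22_242_292        # assumed to be |S||T|
LINKS = 1_403            # assumed to be |M|
MULTIBLOCK = 122_630     # C(A) reported for MultiBlock

rr = reduction_ratio(MULTIBLOCK, FULL)
rrr = relative_reduction_ratio(MULTIBLOCK, LINKS, FULL)
print(f"RR = {rr:.4f}, RRmax = {rr_max(LINKS, FULL):.6f}, RRR = {rrr:.4f}")
```

Under these assumptions RR lies between 0.994 and 0.995 and RRR between 1.005 and 1.006, i.e., the number of comparisons is already close to the lower bound |M|.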
Goal
Formal Goal
Devise H(α) : ∀r > 1, ∃α : RRR(H(α)) ≤ r
HR3
Restrictions
Minkowski Distance
δ(s, t) = (Σ_{i=1}^{n} |s_i − t_i|^p)^{1/p}, p ≥ 2
HR3
Space Tiling
HYPPO
δ(s, t) ≤ θ describes a hypersphere
Approximate hypersphere by using a hypercube
Easy to compute
No loss of recall (blocking)
Set width of single hypercube to ∆ = θ/α
Tile Ω = S ∪ T into the adjacent cubes C
Coordinates: (c_1, …, c_n) ∈ ℕ^n
Contains points ω ∈ Ω : ∀i ∈ {1, …, n}, c_i∆ ≤ ω_i < (c_i + 1)∆
HYPPO
Idea
Combine (2α + 1)^n hypercubes around C(ω) to approximate hypersphere
RRR(HYPPO(α)) = (2α + 1)^n / (α^n S(n))
lim_{α→∞} RRR(HYPPO(α)) = 2^n / S(n)
HYPPO
Reduction Ratio
RRR(HYPPO) for p = 2, n = 2, 3, 4 and 2 ≤ α ≤ 50
lim_{α→∞} RRR(HYPPO(α)) = 4/π ≈ 1.27 (n = 2)
lim_{α→∞} RRR(HYPPO(α)) = 6/π ≈ 1.91 (n = 3)
lim_{α→∞} RRR(HYPPO(α)) = 32/π^2 ≈ 3.24 (n = 4)
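The three limits can be checked numerically. The sketch below assumes that S(n) is the volume of the n-dimensional unit ball for p = 2; this reading is inferred from the limits above and is not spelled out on the slides. The function names are hypothetical.

```python
import math


def unit_ball_volume(n):
    """Volume of the n-dimensional Euclidean unit ball (assumed meaning of S(n))."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def rrr_hyppo(alpha, n):
    """RRR(HYPPO(alpha)) = (2*alpha + 1)^n / (alpha^n * S(n))."""
    return (2 * alpha + 1) ** n / (alpha ** n * unit_ball_volume(n))


def rrr_hyppo_limit(n):
    """Limit for alpha -> infinity: 2^n / S(n)."""
    return 2 ** n / unit_ball_volume(n)


for n in (2, 3, 4):
    print(n, round(rrr_hyppo(50, n), 3), round(rrr_hyppo_limit(n), 3))
```

For every finite α the value stays above the limit, so HYPPO cannot reach RRR = 1, and the gap grows with the number of dimensions.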
HR3
Idea
index(C, ω) = 0 if ∃i ∈ {1, …, n} : |c_i − c(ω)_i| ≤ 1,
index(C, ω) = Σ_{i=1}^{n} (|c_i − c(ω)_i| − 1)^p otherwise
Compare C(ω) with C iff index(C, ω) ≤ α^p
α = 4, p = 2
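The index function and the comparison rule can be sketched as follows. The sketch assumes that candidate cubes are taken from the (2α + 1)^n block around C(ω) used by HYPPO and are then filtered with the index; hr3_index and cubes_to_compare are hypothetical names, not part of any library.

```python
from itertools import product


def hr3_index(c, c_omega, p):
    """index(C, omega): 0 if some coordinate differs by at most 1, else sum of (|diff| - 1)^p."""
    diffs = [abs(a - b) for a, b in zip(c, c_omega)]
    if any(d <= 1 for d in diffs):
        return 0
    return sum((d - 1) ** p for d in diffs)


def cubes_to_compare(c_omega, alpha, p):
    """Cubes within alpha steps per dimension whose index does not exceed alpha^p."""
    offsets = product(range(-alpha, alpha + 1), repeat=len(c_omega))
    cubes = (tuple(x + o for x, o in zip(c_omega, off)) for off in offsets)
    return [c for c in cubes if hr3_index(c, c_omega, p) <= alpha ** p]


kept = cubes_to_compare((0, 0), alpha=4, p=2)
print(len(kept), "of", (2 * 4 + 1) ** 2, "cubes are compared")
```

For α = 4, p = 2 and n = 2, only the four corner cubes of the 9 × 9 block have an index above α^p = 16 and are dropped; larger α discards a growing share of the block.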
HR3
Idea
Claims
No loss of recall
lim_{α→∞} RRR(HR3(α)) = 1
HR3
Lemma 1
Lemma
index(C, s) = x ⇒ ∀t ∈ C : δ^p(s, t) > x∆^p
p = 2, n = 2, index(C, s) = 5
Proof.
index(C, s) = x ⇒ Σ_{i=1}^{n} (|c_i − c_i(s)| − 1)^p = x
From the definition of the cube index it follows that |s_i − t_i| > (|c_i − c_i(s)| − 1)∆
Thus, Σ_{i=1}^{n} |s_i − t_i|^p > Σ_{i=1}^{n} (|c_i − c_i(s)| − 1)^p ∆^p
Therefore, δ^p(s, t) > x∆^p
HR3
Lemma 2
Lemma
∀s ∈ S : index(C, s) > α^p implies that all t ∈ C are non-matches
Proof.
Follows directly from Lemma 1:
index(C, s) > α^p ⇒ ∀t ∈ C, δ^p(s, t) > ∆^p α^p = θ^p
HR3
Lemma 3
Lemma
∀α > 1 : RRR(HR3(2α)) < RRR(HR3(α))
[Figures: illustrations of the lemma for p = 2 and α = 4, 8, 25, 50]
HR3
Theorem
Theorem
lim_{α→∞} RRR(HR3(α)) = 1
Proof.
α → ∞ ⇒ ∆ → 0
Thus, C(s) = {s}, C = {t}
Index function: Σ_{i=1}^{n} ∆^p (|c_i(s) − c_i| − 1)^p ≤ ∆^p α^p
∆ → 0 ⇒ Σ_{i=1}^{n} |s_i − t_i|^p ≤ θ^p
HR3
Evaluation
Compare HR3 with LIMES 0.5’s HYPPO and SILK 2.5.1
Experimental Setup:
Deduplicating DBpedia places by minimum elevation, elevation and maximum elevation (θ = 49m, 99m).
Geonames and LinkedGeoData by longitude and latitude (θ = 1°, 9°)
Windows 7 Enterprise 64-bit machine with a 2.8GHz i7 processor and 8GB RAM.
HR3
Evaluation (Comparisons)
Experiment 2: Deduplicating DBpedia places, θ = 99m
0.64 × 10^6 fewer comparisons
HR3
Experiments (Comparisons)
Experiment 4: Linking Geonames and LinkedGeoData, θ = 9°
4.3 × 10^6 fewer comparisons
HR3
Experiments (RRR)
Experiment 3, 4: Geonames and LGD, θ = 1°, 9°
HR3
Experiments (Runtime)
Experiment 3, 4: Geonames and LGD, θ = 1°, 9°
[Figure: runtime (s) against granularity (2, 4, 8, 16, 32) for HYPPO and HR3 in Exp. 3 and Exp. 4]
HR3
Experiments (Runtime)
Experiment 1, 2: DBpedia, θ = 49, 99m
Experiment 3, 4: Geonames and LGD, θ = 1°, 9°
[Figure: runtime (s, logarithmic scale from 10^0 to 10^4) of HR3, HYPPO and SILK for Exp. 1 to Exp. 4]
HR3
Conclusion
Main result: new category of algorithms for link discovery
Outperforms the state of the art (runtime, comparisons)
Future Work
Combine HR3 with multi-indexing approach
Devise resource management approach
Develop other algorithms (esp. for strings) with the same/similar theoretical guarantees
AEGLE
Why?
:E1 rdfs:label "Engine failure"@en .
:E1 rdf:type :Error .
:E1 :beginDate "2015-04-22T11:39:35" .
:E1 :endDate "2015-04-22T11:39:37" .
:E2 rdfs:label "Car accident"@en .
:E2 rdf:type :Accident .
:E2 :beginDate "2015-06-28T11:45:22" .
:E2 :endDate "2015-06-28T11:45:24" .
Need to create links between events, e.g., :startsEvent
Need to deal with volume and velocity
AEGLE
Event Definition
Definition (Event)
Events can be modeled as time intervals: v = (b(v), e(v))
b(v) is the beginning time (:beginDate)
e(v) is the end time (:endDate)
b(v) < e(v)
AEGLE
Allen’s Interval Algebra
Relation            Notation    Inverse
X before Y          bf(X, Y)    bfi(X, Y)
X meets Y           m(X, Y)     mi(X, Y)
X finishes Y        f(X, Y)     fi(X, Y)
X starts Y          st(X, Y)    sti(X, Y)
X during Y          d(X, Y)     di(X, Y)
X equal Y           eq(X, Y)    eq(X, Y)
X overlaps with Y   ov(X, Y)    ovi(X, Y)
AEGLE
Solution
Aegle: Allen’s intErval alGebra for Link discovEry
Efficient computation of temporal relations between events
Allen’s Interval Algebra: distinct, exhaustive, and qualitative relations between time intervals
Intuition
Expressing 13 Allen relations using 8 atomic relations
Time is ordered: Find matching entities using two sorted lists
AEGLE
Express st(s, t) using atomic relations
b(s) = b(t)
b(s) < e(t)
e(s) > b(t)
e(s) < e(t)
AEGLE
Atomic relations
Compute 8 atomic Boolean relations between begin and end points
BeginBegin (BB) for b(s), b(t):
BB1(s, t) ⇔ (b(s) < b(t))
BB0(s, t) ⇔ (b(s) = b(t))
BB−1(s, t) ⇔ (b(s) > b(t)) ⇔ ¬(BB1(s, t) ∨ BB0(s, t))
BeginEnd(BE) for b(s), e(t)
EndBegin(EB) for e(s), b(t)
EndEnd(EE) for e(s), e(t)
AEGLE
Combination of relations
st(s, t) ⇔ BB0(s, t) ∧ BE1(s, t) ∧ EB−1(s, t) ∧ EE1(s, t) ⇔ {BB0(s, t) ∧ EE1(s, t)}
sti(s, t) ⇔ BB0(s, t) ∧ BE1(s, t) ∧ EB−1(s, t) ∧ EE−1(s, t) ⇔ {BB0(s, t) ∧ EE−1(s, t)} = {BB0(s, t) ∧ ¬(EE0(s, t) ∨ EE1(s, t))}
AEGLE
Algorithm for st, sti
Source: s1, s2
Target: t1, t2
For st:
Compute BB0: {(s1, t1), (s2, t1)}
Compute EE1: {(s1, t1), (s2, t1), (s1, t2)}
Intersection between BB0 and EE1: {(s1, t1), (s2, t1)}
For sti:
Retrieve BB0 and EE1
Compute EE0: {(s2, t2)}
Union between EE0 and EE1: {(s1, t1), (s2, t1), (s2, t1), (s2, t2)}
Difference between BB0 and EE0, EE1: {(s1, t2), (s2, t2)}
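The set operations for st and sti can be sketched in Python. This is a minimal illustration, not the Aegle implementation: events are given as name → (begin, end) with integer timestamps, the intervals below are made up and do not reproduce the figure from the slides, and all function names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def equal_points(S, T, point):
    """Pairs whose begin (point=0) or end (point=1) times are equal, found by grouping."""
    groups = defaultdict(list)
    for t, interval in T.items():
        groups[interval[point]].append(t)
    return {(s, t) for s, interval in S.items() for t in groups.get(interval[point], [])}


def ee1(S, T):
    """Pairs with e(s) < e(t), found with a target list sorted by end time."""
    targets = sorted(T.items(), key=lambda kv: kv[1][1])
    ends = [interval[1] for _, interval in targets]
    pairs = set()
    for s, (_, end) in S.items():
        first = bisect_right(ends, end)  # first target ending strictly after e(s)
        pairs.update((s, t) for t, _ in targets[first:])
    return pairs


def starts(S, T):
    """st = BB0 ∩ EE1."""
    return equal_points(S, T, 0) & ee1(S, T)


def started_by(S, T):
    """sti = BB0 \\ (EE0 ∪ EE1)."""
    return equal_points(S, T, 0) - (equal_points(S, T, 1) | ee1(S, T))


# Hypothetical intervals (begin, end)
S = {"s1": (0, 2), "s2": (0, 5)}
T = {"t1": (0, 4), "t2": (3, 5)}
print(starts(S, T), started_by(S, T))
```

Since BB0 and EE1 are computed once and reused for both relations, sti only needs the additional EE0 set.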
AEGLE
Experimental Set-Up
Datasets: S = T
Log Type    Dataset name    Size      Unique b(s)   Unique e(s)
Machinery   3KMachines      3,154     960           960
Machinery   30KMachines     28,869    960           960
Machinery   300KMachines    288,690   960           960
Query       3KQueries       3,888     3,636         3,638
Query       30KQueries      30,635    3,070         3,070
Query       300KQueries     303,991   184           184
State-of-the-art:
Silk extended to deal with spatio-temporal data
Baseline for eq using brute-force
Evaluation measures:
atomic runtime of each of the atomic relations
relation runtime required to compute each Allen’s relation
total runtime required to compute all 13 relations
AEGLE
Results
Q1: Does the reduction of Allen relations to 8 atomic relations influence the
overall runtime of the approach?
AEGLE
Results
Q2: How does Aegle perform when compared with the state of the art in
terms of time efficiency?
Total runtime
Log Type   Dataset Name    Aegle        Aegle *     Silk
Machine    3KMachines      11.26        5.51        294.00
Machine    30KMachines     1,016.21     437.79      29,846.00
Machine    300KMachines    189,442.16   78,416.61   NA
Query      3KQueries       26.94        17.91       541.00
Query      30KQueries      988.78       463.27      33,502.00
Query      300KQueries     211,996.88   86,884.98   NA
AEGLE
Results
                      Machine                                  Query
Relation  Approach    3KMachines  30KMachines  300KMachines    3KQueries  30KQueries  300KQueries
m         Aegle       0.02        0.19         3.42            0.02       0.21        3.89
m         Silk        23.00       2,219.00     NA              41.00      2,466.00    NA
eq        Aegle       0.05        0.79         49.84           0.05       0.45        348.51
eq        Silk        23.00       2,250.00     NA              41.00      2,473.00    NA
eq        baseline    2.05        171.10       23,436.30       3.15       196.09      31,452.54
ovi       Aegle       3.16        222.27       38,226.32       11.97      257.59      42,121.68
ovi       Silk        22.00       2,189.00     NA              42.00      2,503.00    NA
Conclusion
Aegle = efficient temporal linking
Reduction of 13 Allen Interval relations to 8 atomic relations
Efficiency: simple sorting with complexity O(n log n)
Scalable LD
Gnome
Problem
What if ...
Data does not fit memory C, i.e., |S| + |T| > |C|
1 Common problem
2 Growing size and number of datasets
≈ 150 × 10^9 triples
≈ 10 × 10^3 datasets
Largest dataset with > 20 × 10^9 triples
3 Mostly in-memory solutions
Gnome
Idea
Insight
Most approaches rely on divide-and-merge paradigm
Example: HR3
σ(s, t) ≥ θ ⇔ δ(s, t) ≤ ∆
Gnome
Formal Model
1 Define 𝒮 = {S_1, …, S_n} with S_i ⊆ S ∧ ⋃_i S_i = S
2 Define 𝒯 = {T_1, …, T_m} with T_j ⊆ T ∧ ⋃_j T_j = T
3 Find mapping function µ : 𝒮 → 2^𝒯 with
elements of S_i only compared with elements of sets in µ(S_i)
union of results over all S_i ∈ 𝒮 is exactly M.
GNOME
Task Graph
Definition
A task Eij stands for comparing Si with Tj ∈ µ(Si )
Task Graph G = (V, E, w_v, w_e), with
V = 𝒮 ∪ 𝒯, i.e., one node per subset S_i or T_j
w_v(v) = |v|
w_e(e_ij) = |S_i||T_j|
[Figure: example task graph with node weights S1 = 3, T1 = 2, S2 = 2, T2 = 1, S3 = 4, T3 = 3 and edge weights 6, 3, 4, 2, 4, 12, 6]
GNOME
Problem Reformulation
Locality maximization
Two-step approach:
1 Clustering: Find groups of nodes that fit in memory and
2 Scheduling: Compute sequence of groups that minimizes hard drive access
GNOME
Step1: Clustering
Naïve Approach
Cluster by Si
Example: |C| = 7
[Figure: the example task graph, clustered step by step with one cluster per S_i]
GNOME
Step1: Clustering
Greedy Approach
Start by largest task
Add connected largest tasks until none fits in C
Example: |C| = 7
[Figure: the example task graph, clustered step by step with the greedy approach]
GNOME
Step2: Scheduling
Insights
Output of clustering: Sequence G1, . . . , GN of clusters
Intuition: Consecutive clusters should share data
Goal: Maximize overlap of generated sequence
Overlap o(G_i, G_j) = Σ_{v ∈ V(G_i) ∩ V(G_j)} |v|
o(G4, G1) = 4
Overlap o(G1, . . . , GN) =
N−1
i=1
o(Gi , Gi+1)
o(G4, G1, G2, G1) = 9
S1
3
T1
2
S2
2
T2
1
S3 4
T3
36
3
4
2
4
12
6
12
6
46
3 2
4
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Efficiency October 17, 2016 72 / 106
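The two overlap definitions translate directly into code. The sketch below is illustrative only: the block sizes and the contents of the clusters G1 to G4 are assumptions read off the example figure, and the function names are hypothetical.

```python
# Assumed block sizes and cluster contents, read off the example figure.
SIZES = {"S1": 3, "S2": 2, "S3": 4, "T1": 2, "T2": 1, "T3": 3}
G1 = {"S3", "T3"}
G2 = {"S2", "T3", "T1"}
G3 = {"S1", "T1", "T2"}
G4 = {"S3", "T2", "S2"}


def overlap(a, b):
    """o(Gi, Gj): total size of the blocks that both clusters need."""
    return sum(SIZES[v] for v in a & b)


def sequence_overlap(sequence):
    """o(G1, ..., GN): sum of the overlaps of consecutive clusters."""
    return sum(overlap(a, b) for a, b in zip(sequence, sequence[1:]))


pair_value = overlap(G4, G1)
sequence_value = sequence_overlap([G4, G1, G2, G3])
```

Under these assumptions `pair_value` is 4 and `sequence_value` is 9, matching the two examples on the slide.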
GNOME
Step2: Scheduling
Best-Effort
Select a random pair of clusters
If swapping them improves the overlap, swap them
Relies on local knowledge for scalability
Trick: evaluate only the change in overlap caused by swapping $G_i$ and $G_j$:
$$\Delta(G_i, G_j) = \big(o(G_{i-1}, G_j) + o(G_j, G_{i+1}) + o(G_{j-1}, G_i) + o(G_i, G_{j+1})\big) - \big(o(G_{i-1}, G_i) + o(G_i, G_{i+1}) + o(G_{j-1}, G_j) + o(G_j, G_{j+1})\big) \quad (1)$$
Example: G3, G1, G2, G4 (overlaps 0, 3, 2) becomes G4, G1, G2, G3 (overlaps 4, 3, 2) after swapping G3 and G4
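A best-effort scheduler along these lines might look as follows. This is a sketch under assumptions, not the GNOME code: the block sizes and clusters are read off the example figure, and the helper names are hypothetical. Instead of spelling out Equation (1) term by term, `swap_gain` sums the overlaps of the adjacent pairs that touch the two swapped positions before and after the swap, which gives the same value for interior, non-adjacent positions and also covers the sequence boundaries.

```python
import random

# Assumed block sizes and cluster contents, read off the example figure.
SIZES = {"S1": 3, "S2": 2, "S3": 4, "T1": 2, "T2": 1, "T3": 3}
G1, G2, G3, G4 = ({"S3", "T3"}, {"S2", "T3", "T1"},
                  {"S1", "T1", "T2"}, {"S3", "T2", "S2"})


def overlap(a, b):
    return sum(SIZES[v] for v in a & b)


def local_overlap(seq, i, j):
    """Overlap of all adjacent pairs that touch position i or j."""
    pairs = set()
    for pos in (i, j):
        if pos > 0:
            pairs.add((pos - 1, pos))
        if pos < len(seq) - 1:
            pairs.add((pos, pos + 1))
    return sum(overlap(seq[a], seq[b]) for a, b in pairs)


def swap_gain(seq, i, j):
    """Change in sequence overlap if positions i and j were swapped."""
    before = local_overlap(seq, i, j)
    seq[i], seq[j] = seq[j], seq[i]
    after = local_overlap(seq, i, j)
    seq[i], seq[j] = seq[j], seq[i]  # undo, the caller decides
    return after - before


def best_effort(seq, rounds, rng):
    """Try random swaps and keep only those that improve the overlap."""
    seq = list(seq)
    for _ in range(rounds):
        i, j = rng.sample(range(len(seq)), 2)
        if swap_gain(seq, i, j) > 0:
            seq[i], seq[j] = seq[j], seq[i]
    return seq


start = [G3, G1, G2, G4]
gain = swap_gain(start, 0, 3)
schedule = best_effort(start, rounds=50, rng=random.Random(0))
```

For the slide's example, `gain` is 4: swapping the first and last cluster raises the sequence overlap from 5 to 9. Each round only inspects the neighbours of the two chosen positions, which is what keeps the approach cheap.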
GNOME
Step2: Scheduling
Greedy
Start with random cluster
Choose next cluster with largest overlap
Global knowledge needed
Example: starting from G3, the sequence G3, G1, G2, G4 (overlaps 0, 3, 2) is replaced by G3, G2, G1, G4 (overlaps 2, 3, 4)
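The greedy scheduler can be sketched in a few lines. As before, the block sizes and cluster contents are assumptions read off the example figure, the function names are hypothetical, and the start cluster is passed in explicitly rather than drawn at random so that the result is reproducible.

```python
# Assumed block sizes and cluster contents, read off the example figure.
SIZES = {"S1": 3, "S2": 2, "S3": 4, "T1": 2, "T2": 1, "T3": 3}
CLUSTERS = {
    "G1": {"S3", "T3"},
    "G2": {"S2", "T3", "T1"},
    "G3": {"S1", "T1", "T2"},
    "G4": {"S3", "T2", "S2"},
}


def overlap(a, b):
    return sum(SIZES[v] for v in a & b)


def greedy_schedule(clusters, start):
    """Repeatedly append the open cluster that overlaps most with the last one."""
    order = [start]
    open_names = [name for name in clusters if name != start]
    while open_names:
        last = clusters[order[-1]]
        best = max(open_names, key=lambda name: overlap(last, clusters[name]))
        open_names.remove(best)
        order.append(best)
    return order


order = greedy_schedule(CLUSTERS, start="G3")
```

Under these assumptions `order` is G3, G2, G1, G4, as in the slide's example. Every step scans all open clusters, which is the global knowledge the slide refers to and the reason the approach costs a quadratic number of overlap computations.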
GNOME
Experimental Setup
Datasets
1 DBP: 1 million labels from DBpedia version 04-2015
2 LGD: 0.8 million places from LinkedGeoData
Hardware
1 Intel Xeon E5-2650 v3 processors (2.30GHz)
2 Ubuntu 14.04.3 LTS
3 10GB RAM
Measures
1 Total runtime
2 Hit ratio
GNOME
Evaluation of Clustering
Only show results of LGD
Results on DBP lead to similar insights
          Runtimes (ms)            Hit ratio
|C|       Naive      Greedy        Naive   Greedy
100       568.0      646.3         0.57    0.77
200       518.3      594.0         0.66    0.80
400       532.0      593.3         0.67    0.80
1,000     5,974.0    118,454.7     0.51    0.64
2,000     6,168.0    115,450.0     0.51    0.63
4,000     7,118.3    121,901.7     0.50    0.63
Conclusion
1 Naïve approach is more time-efficient (lower runtimes)
2 Greedy approach is more effective (higher hit ratios)
3 Select naïve approach for clustering
GNOME
Evaluation of Scheduling
Only show results of LGD
Results on DBP lead to similar insights
          Runtimes (ms)              Hit ratio
|C|       Best-Effort   Greedy       Best-Effort   Greedy
100       571.3         1,599.3      0.56          0.68
200       565.7         1,448.3      0.66          0.85
400       581.0         1,379.3      0.67          0.88
1,000     5,666.0       814,271.7    0.51          0.86
2,000     6,268.0       810,855.0    0.51          0.86
4,000     6,675.7       814,041.7    0.50          0.86
Conclusion
1 Best-effort approach is more time-efficient
2 Greedy approach is more effective
3 Select best-effort approach for scheduling
GNOME
Comparison with Caching Approaches
Runtimes (ms)
|C|       Gnome      FIFO        F2          LFU         LRU         SLRU
1,000     5,974.0    37,161.0    42,090.3    45,906.7    54,194.3    56,904.3
2,000     6,168.0    31,977.0    39,071.3    39,872.0    45,473.0    46,795.0
4,000     7,118.3    21,337.0    40,860.0    28,028.3    26,816.7    27,200.0
Hit ratio
|C|       Gnome      FIFO        F2          LFU         LRU         SLRU
1,000     0.51       0.17        0.16        0.19        0.17        0.17
2,000     0.51       0.29        0.30        0.32        0.30        0.30
4,000     0.51       0.54        0.55        0.59        0.55        0.56
Conclusion
1 Gnome is more time-efficient for all cache sizes
2 Gnome leads to higher hit ratios in most cases (all but |C| = 4,000)
GNOME
Scalability
Dataset   100k         200k           400k           800k
LGD       362,141.3    1,452,922.0    5,934,038.7    20,001,965.7
DBP       434,630.7    1,790,350.7    6,677,923.0    12,653,403.3
Conclusion
Sub-quadratic growth of runtime
Runtime grows linearly with number of mappings
For LGD, 360 – 370 mappings/s
Reach for the Cloud?
Dealing with Time Complexity
1 Devise better algorithms
Blocking
Algorithms for given metrics
PPJoin+, EDJoin
HR3
2 Use parallel hardware
Thread pools
MapReduce
Massively Parallel Hardware
Reach for the Cloud?
Parallel Implementations
Different architectures
Memory (shared, hybrid, distributed)
Execution paths (different, same)
Location (remote, local)
Question
When should we use which hardware?
Reach for the Cloud?
Premises and Goals
Premises
Given an algorithm that runs on all three architectures ...
Note to self: Implement one
Picked HR3
Reduction-ratio-optimal
Goals
Compare runtimes on all three parallel architectures
Find break-even points
Reach for the Cloud?
HR3
1 Assume spaces with Minkowski metric and p ≥ 2: $\delta(s, t) = \sqrt[p]{\sum_{i=1}^{n} |s_i - t_i|^p}$
2 Create grid of width ∆ = θ/α
3 Link discovery condition describes a hypersphere
4 Approximate hypersphere with hypercube
5 Use index to discard portions of hypercube
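Steps 1 to 4 can be illustrated with a short grid-blocking sketch. It is deliberately simplified and should not be read as HR3 itself: the index-based pruning of step 5 is left out, every candidate from the surrounding hypercube is checked with the exact distance, and the function names and sample points are made up for illustration. The sketch relies on the fact that two points within distance θ differ by at most θ = α∆ in every dimension, so their grid cells differ by at most α per dimension.

```python
import math
from collections import defaultdict
from itertools import product


def minkowski(s, t, p=2):
    """Minkowski distance of order p between two points."""
    return sum(abs(a - b) ** p for a, b in zip(s, t)) ** (1.0 / p)


def grid_link(points, theta, alpha=2, p=2):
    """Return (links, comparisons) for all pairs i < j with distance <= theta."""
    width = theta / alpha  # grid width Delta = theta / alpha
    cells = defaultdict(list)
    for idx, point in enumerate(points):
        cells[tuple(math.floor(x / width) for x in point)].append(idx)

    dims = len(points[0])
    offsets = list(product(range(-alpha, alpha + 1), repeat=dims))
    links, comparisons = set(), 0
    for cell, members in cells.items():
        for offset in offsets:  # hypercube of cells around `cell`
            neighbour = tuple(c + o for c, o in zip(cell, offset))
            for i in members:
                for j in cells.get(neighbour, ()):
                    if i < j:  # compare every unordered pair once
                        comparisons += 1
                        if minkowski(points[i], points[j], p) <= theta:
                            links.add((i, j))
    return links, comparisons


points = [(0.0, 0.0), (0.5, 0.0), (3.0, 3.0), (0.9, 0.9), (10.0, 10.0)]
links, comparisons = grid_link(points, theta=1.0)
```

On the five sample points the sketch finds the links (0, 1) and (1, 3) with 3 distance computations instead of the 10 a brute-force comparison would need.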
Reach for the Cloud?
HR3 in GPUs
Large number of simple compute cores
Same instruction, multiple data
Bottleneck: PCI Express Bus
1 Run discretization on CPU
2 Run indexing on GPU
3 Run comparisons on CPU
[Figure: GPU memory model with work items grouped into work groups, local memory per work group, a global/constant cache, and global/constant memory]
Reach for the Cloud?
HR3 on the Cloud
Naive Approach
1 Rely on MapReduce paradigm
2 Run discretization and assignment to cubes in map step
3 Run distance computation in reduce step
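The naive approach can be mimicked locally with plain Python functions standing in for the map and reduce steps. Everything below is an assumption made for illustration: the slides do not say how points reach the cubes they must be compared with, so this sketch replicates each point to all cubes of the surrounding hypercube, and the shuffle phase is simulated with a dictionary. It is not Hadoop code and not the authors' implementation.

```python
import math
from collections import defaultdict
from itertools import product


def distance(s, t, p=2):
    return sum(abs(a - b) ** p for a, b in zip(s, t)) ** (1.0 / p)


def map_step(point_id, point, theta, alpha):
    """Map: discretize a point and assign it to cubes."""
    width = theta / alpha
    home = tuple(math.floor(x / width) for x in point)
    yield home, ("home", point_id, point)
    # Assumption: also send the point to every other cube of the hypercube.
    for offset in product(range(-alpha, alpha + 1), repeat=len(point)):
        if any(offset):
            guest = tuple(h + o for h, o in zip(home, offset))
            yield guest, ("guest", point_id, point)


def reduce_step(values, theta, p=2):
    """Reduce: compare the points living in one cube with everything sent to it."""
    for role, id_a, a in values:
        if role != "home":
            continue
        for _, id_b, b in values:
            if id_a < id_b and distance(a, b, p) <= theta:
                yield id_a, id_b


def run_job(points, theta, alpha=2):
    groups = defaultdict(list)  # stands in for the shuffle phase
    for point_id, point in points.items():
        for cube, value in map_step(point_id, point, theta, alpha):
            groups[cube].append(value)
    return {pair for values in groups.values()
            for pair in reduce_step(values, theta)}


points = {0: (0.0, 0.0), 1: (0.5, 0.0), 2: (3.0, 3.0), 3: (0.9, 0.9)}
result = run_job(points, theta=1.0)
```

Because a pair is only emitted by the reducer of the cube in which the point with the smaller id lives, each link appears once. The replication makes the work per reducer depend on how crowded a cube and its neighbourhood are, which is one way to see why densely populated cubes can unbalance the job.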
Reach for the Cloud?
HR3 on the Cloud
Load Balancing
1 Run two jobs
2 Job1: Compute cube population matrix
3 Job2: Distribute balanced linking tasks across mappers and reducers
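The two-job idea can be sketched as follows. The slides do not specify how the linking tasks are balanced, so the heuristic used here (hand the most expensive open task to the currently least-loaded worker) is a stand-in chosen for illustration, as are the cost model (product of cube populations), the cube pairs, and all function names.

```python
import heapq
from collections import Counter


def cube_population(cube_of_point):
    """Job 1 (sketch): count how many points fall into each cube."""
    return Counter(cube_of_point.values())


def balance(task_costs, workers):
    """Job 2 (sketch): give each task to the currently least-loaded worker."""
    heap = [(0, w, []) for w in range(workers)]  # (load, worker id, tasks)
    ordered = sorted(task_costs.items(), key=lambda kv: kv[1], reverse=True)
    for task, cost in ordered:
        load, w, tasks = heapq.heappop(heap)
        tasks.append(task)
        heapq.heappush(heap, (load + cost, w, tasks))
    return {w: (load, tasks) for load, w, tasks in heap}


# Hypothetical cube assignment of six points.
population = cube_population({
    "a": (0, 0), "b": (0, 0), "c": (0, 1),
    "d": (1, 1), "e": (1, 1), "f": (1, 1),
})
# Hypothetical linking tasks: pairs of cubes, costed by the product of populations.
pairs = [((0, 0), (0, 1)), ((0, 0), (1, 1)), ((0, 1), (1, 1))]
costs = {pair: population[pair[0]] * population[pair[1]] for pair in pairs}
assignment = balance(costs, workers=2)
```

With these made-up inputs the task costs are 2, 6 and 3, and the two workers end up with loads 6 and 5. The population matrix from the first job is what makes such a cost estimate possible before any distance is computed.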
Reach for the Cloud?
Evaluation Hardware
1 CPU (Java)
32-core server running Linux 10.0.4
AMD Opteron 6128 clocked at 2.0GHz
2 GPU (C++)
AMD Radeon 7870 with 20 compute units, 64 parallel threads
Host program ran on Intel Core i7 3770 CPU with 8GB RAM and Linux 12.10
Ran the Java code on the same machine for scaling
3 Cloud (Java)
10 c1.medium nodes (2 cores, 1.7GB) for small experiments
30 c1.large nodes (8 cores, 7GB) for large experiments
Reach for the Cloud?
Experimental Setup
Run deduplication task
Evaluate behavior on different number of dimensions
Important: Scale results
Different hardware (2-7 times faster C++ workstation)
Programming language
Evaluate scalability (DS4)
Reach for the Cloud?
Results – DS1
DBpedia, 3 dimensions, 26K
GPU scales better
Reach for the Cloud?
Results – DS2
DBpedia, 2 dimensions, 475K
GPUs scale better across different θ
Break-even point ≈ $10^8$ results
Reach for the Cloud?
Results – DS3
LinkedGeoData, 2 dimensions, 500K
Similar picture
Reach for the Cloud?
Results – Scalability
LinkedGeoData, 2 dimensions, 6M
Cloud (with load balancing) better for ≈ $10^{10}$+ results
Reach for the Cloud?
Summary
Question: When should we use which hardware for link discovery?
Results
Implemented HR3 on different hardware
Provided the first implementation of link discovery on GPUs
Devised a load balancing approach for linking on the cloud
Discovered $10^8$ and $10^{10}$ results as break-even points
Insights
Load balancing important for using the cloud
GPUs: Need faster buses than PCIe (e.g., Firewire speed)
Accurate use of local resources sufficient for most of the current applications
Summary and Conclusion
Four approaches to scalability
1 Improve reduction ratio (LIMES, MultiBlock, HYPPO, HR3, . . . )
2 Reduce runtime complexity (AEGLE)
3 Better use of hardware (GNOME)
4 Improve execution of specifications
Challenges include
1 Increase number of reduction-ratio-optimal approaches (HR3, ORCHID)
2 Adaptive resource scheduling
3 Self-regulating approaches
4 Distribution in modern in-memory architecture (SPARK)
5 . . .
Acknowledgment
This work was supported by grants from the EU H2020 Framework Programme
provided for the project HOBBIT (GA no. 688227).
References I
Kleanthi Georgala, Mohamed Ahmed Sherif, and
Axel-Cyrille Ngonga Ngomo. “An Efficient Approach for the
Generation of Allen Relations”. In: ECAI 2016 - 22nd European
Conference on Artificial Intelligence, 29 August-2 September 2016,
The Hague, The Netherlands - Including Prestigious Applications of
Artificial Intelligence (PAIS 2016). Ed. by Gal A. Kaminka et al.
Vol. 285. Frontiers in Artificial Intelligence and Applications. IOS
Press, 2016, pp. 948–956. isbn: 978-1-61499-671-2. doi:
10.3233/978-1-61499-672-9-948. url:
http://dx.doi.org/10.3233/978-1-61499-672-9-948.
Robert Isele, Anja Jentzsch, and Christian Bizer. “Efficient
Multidimensional Blocking for Link Discovery without losing Recall”.
In: Proceedings of the 14th International Workshop on the Web and
Databases 2011, WebDB 2011, Athens, Greece, June 12, 2011. Ed. by
Amélie Marian and Vasilis Vassalos. 2011. url:
http://webdb2011.rutgers.edu/papers/Paper%2039/silk.pdf.
Axel-Cyrille Ngonga Ngomo and Sören Auer. “LIMES - A
Time-Efficient Approach for Large-Scale Link Discovery on the Web
of Data”. In: IJCAI 2011, Proceedings of the 22nd International Joint
Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, July
16-22, 2011. Ed. by Toby Walsh. IJCAI/AAAI, 2011, pp. 2312–2317.
isbn: 978-1-57735-516-8. doi:
10.5591/978-1-57735-516-8/IJCAI11-385. url:
http://dx.doi.org/10.5591/978-1-57735-516-8/IJCAI11-385.
Axel-Cyrille Ngonga Ngomo et al. “When to Reach for the Cloud:
Using Parallel Hardware for Link Discovery”. In: The Semantic Web:
Semantics and Big Data, 10th International Conference, ESWC 2013,
Montpellier, France, May 26-30, 2013. Proceedings. Ed. by
Philipp Cimiano et al. Vol. 7882. Lecture Notes in Computer Science.
Springer, 2013, pp. 275–289. isbn: 978-3-642-38287-1. doi:
10.1007/978-3-642-38288-8_19. url:
http://dx.doi.org/10.1007/978-3-642-38288-8_19.
Axel-Cyrille Ngonga Ngomo. “Link Discovery with Guaranteed
Reduction Ratio in Affine Spaces with Minkowski Measures”. In: The
Semantic Web - ISWC 2012 - 11th International Semantic Web
Conference, Boston, MA, USA, November 11-15, 2012, Proceedings,
Part I. Ed. by Philippe Cudré-Mauroux et al. Vol. 7649. Lecture Notes
in Computer Science. Springer, 2012, pp. 378–393. isbn:
978-3-642-35175-4. doi: 10.1007/978-3-642-35176-1_24. url:
http://dx.doi.org/10.1007/978-3-642-35176-1_24.
Axel-Cyrille Ngonga Ngomo. “ORCHID - Reduction-Ratio-Optimal
Computation of Geo-spatial Distances for Link Discovery”. In: The
Semantic Web - ISWC 2013 - 12th International Semantic Web
Conference, Sydney, NSW, Australia, October 21-25, 2013,
Proceedings, Part I. Ed. by Harith Alani et al. Vol. 8218. Lecture
Notes in Computer Science. Springer, 2013, pp. 395–410. isbn:
978-3-642-41334-6. doi: 10.1007/978-3-642-41335-3_25. url:
http://dx.doi.org/10.1007/978-3-642-41335-3_25.
Axel-Cyrille Ngonga Ngomo. “HELIOS - Execution Optimization for
Link Discovery”. In: The Semantic Web - ISWC 2014 - 13th
International Semantic Web Conference, Riva del Garda, Italy,
October 19-23, 2014. Proceedings, Part I. Ed. by Peter Mika et al.
Vol. 8796. Lecture Notes in Computer Science. Springer, 2014,
pp. 17–32. isbn: 978-3-319-11963-2. doi:
10.1007/978-3-319-11964-9_2. url:
http://dx.doi.org/10.1007/978-3-319-11964-9_2.
Axel-Cyrille Ngonga Ngomo and Mofeed M. Hassan. “The Lazy
Traveling Salesman - Memory Management for Large-Scale Link
Discovery”. In: The Semantic Web. Latest Advances and New
Domains - 13th International Conference, ESWC 2016, Heraklion,
Crete, Greece, May 29 - June 2, 2016, Proceedings. Ed. by
Harald Sack et al. Vol. 9678. Lecture Notes in Computer Science.
Springer, 2016, pp. 423–438. isbn: 978-3-319-34128-6. doi:
10.1007/978-3-319-34129-3_26. url:
http://dx.doi.org/10.1007/978-3-319-34129-3_26.
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Efficiency October 17, 2016 106 / 106

More Related Content

What's hot

Can functional programming be liberated from static typing?
Can functional programming be liberated from static typing?Can functional programming be liberated from static typing?
Can functional programming be liberated from static typing?Vsevolod Dyomkin
 
Can Deep Learning Techniques Improve Entity Linking?
Can Deep Learning Techniques Improve Entity Linking?Can Deep Learning Techniques Improve Entity Linking?
Can Deep Learning Techniques Improve Entity Linking?Julien PLU
 
Knowledge extraction in Web media: at the frontier of NLP, Machine Learning a...
Knowledge extraction in Web media: at the frontier of NLP, Machine Learning a...Knowledge extraction in Web media: at the frontier of NLP, Machine Learning a...
Knowledge extraction in Web media: at the frontier of NLP, Machine Learning a...Julien PLU
 
Enhancing Entity Linking by Combining NER Models
Enhancing Entity Linking by Combining NER ModelsEnhancing Entity Linking by Combining NER Models
Enhancing Entity Linking by Combining NER ModelsJulien PLU
 
Detecting paraphrases using recursive autoencoders
Detecting paraphrases using recursive autoencodersDetecting paraphrases using recursive autoencoders
Detecting paraphrases using recursive autoencodersFeynman Liang
 
Recursive Autoencoders for Paraphrase Detection (Socher et al)
Recursive Autoencoders for Paraphrase Detection (Socher et al)Recursive Autoencoders for Paraphrase Detection (Socher et al)
Recursive Autoencoders for Paraphrase Detection (Socher et al)Feynman Liang
 
Applications of Word Vectors in Text Retrieval and Classification
Applications of Word Vectors in Text Retrieval and ClassificationApplications of Word Vectors in Text Retrieval and Classification
Applications of Word Vectors in Text Retrieval and Classificationshakimov
 
Learning Multilingual Semantic Parsers for Question Answering over Linked Dat...
Learning Multilingual Semantic Parsers for Question Answering over Linked Dat...Learning Multilingual Semantic Parsers for Question Answering over Linked Dat...
Learning Multilingual Semantic Parsers for Question Answering over Linked Dat...shakimov
 
Learning Commonalities in RDF
Learning Commonalities in RDFLearning Commonalities in RDF
Learning Commonalities in RDFSara EL HASSAD
 
DCU Search Runs at MediaEval 2014 Search and Hyperlinking
DCU Search Runs at MediaEval 2014 Search and HyperlinkingDCU Search Runs at MediaEval 2014 Search and Hyperlinking
DCU Search Runs at MediaEval 2014 Search and Hyperlinkingmultimediaeval
 
DS2014: Feature selection in hierarchical feature spaces
DS2014: Feature selection in hierarchical feature spacesDS2014: Feature selection in hierarchical feature spaces
DS2014: Feature selection in hierarchical feature spacesPetar Ristoski
 
RDF2Vec: RDF Graph Embeddings for Data Mining
RDF2Vec: RDF Graph Embeddings for Data MiningRDF2Vec: RDF Graph Embeddings for Data Mining
RDF2Vec: RDF Graph Embeddings for Data MiningPetar Ristoski
 
Data translation with SPARQL 1.1
Data translation with SPARQL 1.1Data translation with SPARQL 1.1
Data translation with SPARQL 1.1andreas_schultz
 
Personalised Search for the Social Semantic Web
Personalised Search for the Social Semantic WebPersonalised Search for the Social Semantic Web
Personalised Search for the Social Semantic WebOana Tifrea-Marciuska
 
Federation and Navigation in SPARQL 1.1
Federation and Navigation in SPARQL 1.1Federation and Navigation in SPARQL 1.1
Federation and Navigation in SPARQL 1.1net2-project
 
Learn a language : LISP
Learn a language : LISPLearn a language : LISP
Learn a language : LISPDevnology
 
Introduction to R for Data Science :: Session 2
Introduction to R for Data Science :: Session 2Introduction to R for Data Science :: Session 2
Introduction to R for Data Science :: Session 2Goran S. Milovanovic
 
Introduction to R for Data Science :: Session 1
Introduction to R for Data Science :: Session 1Introduction to R for Data Science :: Session 1
Introduction to R for Data Science :: Session 1Goran S. Milovanovic
 

What's hot (20)

Can functional programming be liberated from static typing?
Can functional programming be liberated from static typing?Can functional programming be liberated from static typing?
Can functional programming be liberated from static typing?
 
Can Deep Learning Techniques Improve Entity Linking?
Can Deep Learning Techniques Improve Entity Linking?Can Deep Learning Techniques Improve Entity Linking?
Can Deep Learning Techniques Improve Entity Linking?
 
Knowledge extraction in Web media: at the frontier of NLP, Machine Learning a...
Knowledge extraction in Web media: at the frontier of NLP, Machine Learning a...Knowledge extraction in Web media: at the frontier of NLP, Machine Learning a...
Knowledge extraction in Web media: at the frontier of NLP, Machine Learning a...
 
Enhancing Entity Linking by Combining NER Models
Enhancing Entity Linking by Combining NER ModelsEnhancing Entity Linking by Combining NER Models
Enhancing Entity Linking by Combining NER Models
 
Detecting paraphrases using recursive autoencoders
Detecting paraphrases using recursive autoencodersDetecting paraphrases using recursive autoencoders
Detecting paraphrases using recursive autoencoders
 
Recursive Autoencoders for Paraphrase Detection (Socher et al)
Recursive Autoencoders for Paraphrase Detection (Socher et al)Recursive Autoencoders for Paraphrase Detection (Socher et al)
Recursive Autoencoders for Paraphrase Detection (Socher et al)
 
Applications of Word Vectors in Text Retrieval and Classification
Applications of Word Vectors in Text Retrieval and ClassificationApplications of Word Vectors in Text Retrieval and Classification
Applications of Word Vectors in Text Retrieval and Classification
 
Learning Multilingual Semantic Parsers for Question Answering over Linked Dat...
Learning Multilingual Semantic Parsers for Question Answering over Linked Dat...Learning Multilingual Semantic Parsers for Question Answering over Linked Dat...
Learning Multilingual Semantic Parsers for Question Answering over Linked Dat...
 
Learning Commonalities in RDF
Learning Commonalities in RDFLearning Commonalities in RDF
Learning Commonalities in RDF
 
DCU Search Runs at MediaEval 2014 Search and Hyperlinking
DCU Search Runs at MediaEval 2014 Search and HyperlinkingDCU Search Runs at MediaEval 2014 Search and Hyperlinking
DCU Search Runs at MediaEval 2014 Search and Hyperlinking
 
DS2014: Feature selection in hierarchical feature spaces
DS2014: Feature selection in hierarchical feature spacesDS2014: Feature selection in hierarchical feature spaces
DS2014: Feature selection in hierarchical feature spaces
 
Multiple Dispatch
Multiple DispatchMultiple Dispatch
Multiple Dispatch
 
F# and the DLR
F# and the DLRF# and the DLR
F# and the DLR
 
RDF2Vec: RDF Graph Embeddings for Data Mining
RDF2Vec: RDF Graph Embeddings for Data MiningRDF2Vec: RDF Graph Embeddings for Data Mining
RDF2Vec: RDF Graph Embeddings for Data Mining
 
Data translation with SPARQL 1.1
Data translation with SPARQL 1.1Data translation with SPARQL 1.1
Data translation with SPARQL 1.1
 
Personalised Search for the Social Semantic Web
Personalised Search for the Social Semantic WebPersonalised Search for the Social Semantic Web
Personalised Search for the Social Semantic Web
 
Federation and Navigation in SPARQL 1.1
Federation and Navigation in SPARQL 1.1Federation and Navigation in SPARQL 1.1
Federation and Navigation in SPARQL 1.1
 
Learn a language : LISP
Learn a language : LISPLearn a language : LISP
Learn a language : LISP
 
Introduction to R for Data Science :: Session 2
Introduction to R for Data Science :: Session 2Introduction to R for Data Science :: Session 2
Introduction to R for Data Science :: Session 2
 
Introduction to R for Data Science :: Session 1
Introduction to R for Data Science :: Session 1Introduction to R for Data Science :: Session 1
Introduction to R for Data Science :: Session 1
 

Viewers also liked

An RDF Dataset Generator for the Social Network Benchmark with Real-World Coh...
An RDF Dataset Generator for the Social Network Benchmark with Real-World Coh...An RDF Dataset Generator for the Social Network Benchmark with Real-World Coh...
An RDF Dataset Generator for the Social Network Benchmark with Real-World Coh...Holistic Benchmarking of Big Linked Data
 
SPARQL Querying Benchmarks ISWC2016
SPARQL Querying Benchmarks ISWC2016SPARQL Querying Benchmarks ISWC2016
SPARQL Querying Benchmarks ISWC2016Muhammad Saleem
 
Towards digitizing scholarly communication
Towards digitizing scholarly communicationTowards digitizing scholarly communication
Towards digitizing scholarly communicationSören Auer
 
DBtrends Semantics 2016
DBtrends Semantics 2016DBtrends Semantics 2016
DBtrends Semantics 2016Edgard Marx
 
openQA Hoverboard - Open-source Question Answering Framework
openQA Hoverboard - Open-source Question Answering FrameworkopenQA Hoverboard - Open-source Question Answering Framework
openQA Hoverboard - Open-source Question Answering FrameworkEdgard Marx
 
Linked Data Tutorial
Linked Data TutorialLinked Data Tutorial
Linked Data TutorialSören Auer
 
Do MORe with your data
Do MORe with your dataDo MORe with your data
Do MORe with your datalocloud
 
SPARQL and RDF query optimization
SPARQL and RDF query optimizationSPARQL and RDF query optimization
SPARQL and RDF query optimizationKisung Kim
 
Enterprise knowledge graphs
Enterprise knowledge graphsEnterprise knowledge graphs
Enterprise knowledge graphsSören Auer
 

Viewers also liked (15)

Hobbit presentation at Apache Big Data Europe 2016
Hobbit presentation at Apache Big Data Europe 2016Hobbit presentation at Apache Big Data Europe 2016
Hobbit presentation at Apache Big Data Europe 2016
 
An RDF Dataset Generator for the Social Network Benchmark with Real-World Coh...
An RDF Dataset Generator for the Social Network Benchmark with Real-World Coh...An RDF Dataset Generator for the Social Network Benchmark with Real-World Coh...
An RDF Dataset Generator for the Social Network Benchmark with Real-World Coh...
 
SPARQL Querying Benchmarks ISWC2016
SPARQL Querying Benchmarks ISWC2016SPARQL Querying Benchmarks ISWC2016
SPARQL Querying Benchmarks ISWC2016
 
Versioning for Linked Data: Archiving Systems and Benchmarks
Versioning for Linked Data: Archiving Systems and BenchmarksVersioning for Linked Data: Archiving Systems and Benchmarks
Versioning for Linked Data: Archiving Systems and Benchmarks
 
Towards digitizing scholarly communication
Towards digitizing scholarly communicationTowards digitizing scholarly communication
Towards digitizing scholarly communication
 
Benchmarking Linked Data Introductory Remarks
Benchmarking Linked Data Introductory RemarksBenchmarking Linked Data Introductory Remarks
Benchmarking Linked Data Introductory Remarks
 
HOBBIT Project Overview @ ESWC HOBBIT Workshop
HOBBIT Project Overview @ ESWC HOBBIT WorkshopHOBBIT Project Overview @ ESWC HOBBIT Workshop
HOBBIT Project Overview @ ESWC HOBBIT Workshop
 
HOBBIT Survey results
HOBBIT Survey resultsHOBBIT Survey results
HOBBIT Survey results
 
DBtrends Semantics 2016
DBtrends Semantics 2016DBtrends Semantics 2016
DBtrends Semantics 2016
 
openQA Hoverboard - Open-source Question Answering Framework
openQA Hoverboard - Open-source Question Answering FrameworkopenQA Hoverboard - Open-source Question Answering Framework
openQA Hoverboard - Open-source Question Answering Framework
 
Linked Data Tutorial
Linked Data TutorialLinked Data Tutorial
Linked Data Tutorial
 
Do MORe with your data
Do MORe with your dataDo MORe with your data
Do MORe with your data
 
SPARQL and RDF query optimization
SPARQL and RDF query optimizationSPARQL and RDF query optimization
SPARQL and RDF query optimization
 
Enterprise knowledge graphs
Enterprise knowledge graphsEnterprise knowledge graphs
Enterprise knowledge graphs
 
Workshop Report Benchmarking Linked Data
Workshop Report Benchmarking Linked DataWorkshop Report Benchmarking Linked Data
Workshop Report Benchmarking Linked Data
 

Similar to Link Discovery Tutorial Part I: Efficiency

The Lazy Traveling Salesman Memory Management for Large-Scale Link Discovery
The Lazy Traveling Salesman Memory Management for Large-Scale Link DiscoveryThe Lazy Traveling Salesman Memory Management for Large-Scale Link Discovery
The Lazy Traveling Salesman Memory Management for Large-Scale Link DiscoveryHolistic Benchmarking of Big Linked Data
 
Asynchronous Stochastic Optimization, New Analysis and Algorithms
Asynchronous Stochastic Optimization, New Analysis and AlgorithmsAsynchronous Stochastic Optimization, New Analysis and Algorithms
Asynchronous Stochastic Optimization, New Analysis and AlgorithmsFabian Pedregosa
 
Max Entropy
Max EntropyMax Entropy
Max Entropyjianingy
 
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Link Discovery Tutorial Part I: Efficiency

  • 1. Link Discovery Tutorial, Part I: Efficiency. Axel-Cyrille Ngonga Ngomo (1), Irini Fundulaki (2), Mohamed Sherif (1). (1) Institute for Applied Informatics, Germany; (2) FORTH, Greece. October 18th, 2016, Kobe, Japan. Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Efficiency October 17, 2016 1 / 106
  • 2. Table of Contents: 1 Introduction; 2 LIMES; 3 MultiBlock; 4 Reduction-Ratio-Optimal Link Discovery; 5 Link Discovery for Temporal Data; 6 GNOME; 7 GPUs and Hadoop; 8 Summary and Conclusion
  • 6. Introduction: Link Discovery
      Definition (Declarative Link Discovery): Given sets S and T of resources and a relation R, find M = {(s, t) ∈ S × T : R(s, t)}. Here, find M = {(s, t) ∈ S × T : δ(s, t) ≥ τ}.
      Problem: Naïve complexity ∈ $O(|S| \times |T|)$, i.e., $O(n^2)$. Example: ≈ 70 days for cities in DBpedia and LinkedGeoData.
      Solutions:
      1 Reduce the number of comparisons C(A) ≥ |M| [NA11; IJB11; Ngo12; Ngo13]
      2 Use planning [Ngo14]
      3 Reduce the complexity of linking to $< n^2$ [GSN16]
      4 Use features of hardware environment [Ngo+13; NH16]
  • 9. LIMES: Intuition
      Definition (Declarative Link Discovery): Given sets S and T of resources and a relation R, find M = {(s, t) ∈ S × T : R(s, t)}. Here, find M = {(s, t) ∈ S × T : δ(s, t) ≥ τ}.
      Some δ are distances (in the mathematical sense) [NA11]. Examples: Levenshtein distance, Minkowski distance.
      Intuition: Distances abide by the triangle inequality, $\delta(x, z) - \delta(z, y) \leq \delta(x, y) \leq \delta(x, z) + \delta(z, y)$. Use exemplars for a pessimistic approximation of δ. This reduces the number of computations.
  • 10. LIMES: Approach
      1 Start with a random $e_1 \in T$
      2 $e_{n+1} = \arg\max_{x \in T} \sum_{i=1}^{n} \delta(x, e_i)$
      3 Assign each t to e(t) with $e(t) = \arg\min_{e_i} \delta(t, e_i)$
      4 Approximate δ(s, t) by using $\delta(s, t) \geq \delta(s, e(t)) - \delta(e(t), t)$
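The four steps of the LIMES approach can be sketched as follows. This is a minimal illustration and not the LIMES implementation: it assumes 2-D points with Euclidean distance, a match criterion of the form δ(s, t) ≤ `theta`, and the hypothetical function names `select_exemplars`, `assign` and `limes_match`. Excluding already chosen exemplars in step 2 is a practical assumption that the slides do not state.

```python
import math
import random


def euclidean(a, b):
    # A metric, so the triangle inequality holds.
    return math.dist(a, b)


def select_exemplars(targets, k, dist, rng):
    # Step 1: random first exemplar. Step 2: repeatedly add the point with
    # the largest summed distance to the exemplars chosen so far.
    exemplars = [rng.choice(targets)]
    while len(exemplars) < k:
        candidates = [t for t in targets if t not in exemplars]
        if not candidates:
            break
        exemplars.append(
            max(candidates, key=lambda x: sum(dist(x, e) for e in exemplars))
        )
    return exemplars


def assign(targets, exemplars, dist):
    # Step 3: map each target to its nearest exemplar and keep the distance.
    clusters = {i: [] for i in range(len(exemplars))}
    for t in targets:
        d, i = min((dist(t, e), i) for i, e in enumerate(exemplars))
        clusters[i].append((t, d))
    return clusters


def limes_match(sources, targets, theta, k, dist=euclidean, seed=0):
    rng = random.Random(seed)
    exemplars = select_exemplars(targets, k, dist, rng)
    clusters = assign(targets, exemplars, dist)
    matches, comparisons = set(), 0
    for s in sources:
        for i, e in enumerate(exemplars):
            d_se = dist(s, e)
            comparisons += 1
            for t, d_et in clusters[i]:
                # Step 4: d(s, t) >= d(s, e(t)) - d(e(t), t). If this lower
                # bound already exceeds theta, skip the exact computation.
                # The tiny slack guards against floating-point rounding.
                if d_se - d_et > theta + 1e-12:
                    continue
                comparisons += 1
                if dist(s, t) <= theta:
                    matches.add((s, t))
    return matches, comparisons
```

Because the bound in step 4 is a true lower bound, skipped pairs can never be matches, so the result equals that of a brute-force comparison. How many computations are saved depends on the data and on the number of exemplars `k`.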
  • 16. LIMES: Evaluation. Measure = Levenshtein, threshold = 0.9, KB base size = $10^3$. [Plot: x-axis is number of exemplars, y-axis is number of comparisons; legend: 0.75, 0.8, 0.85, 0.9, 0.95, brute force.]
  • 17. LIMES: Evaluation. Measure = Levenshtein, threshold = 0.9, KB base size = $10^4$. Optimal number of exemplars ≈ $\sqrt{|T|}$. [Plot: x-axis is number of exemplars, y-axis is number of comparisons; legend: 0.75, 0.8, 0.85, 0.9, 0.95, brute force.]
  • 18. LIMES: Conclusion. Time-efficient approach. Reduces the number of comparisons through approximation. No guarantee pertaining to the reduction ratio.
  • 20. MultiBlock: Intuition. Idea: Create a multidimensional index of the data. Partition the space so that σ(x, y) < θ ⇒ index(x) = index(y). Decrease the number of computations by only comparing pairs (x, y) with index(x) = index(y).
  • 22. MultiBlock: Approach. Similarity functions are aggregations of atomic measures sim, e.g., σ = MIN(levenshtein, jaccard). Each atomic measure must define a blocking function $block : (S \cup T) \times [0, 1] \to \mathcal{P}(\mathbb{N}^n)$. Each aggregation must define a similarity, a block and a threshold aggregation.
  • 24. MultiBlock: Example
      $sim = \frac{levenshtein(s, t)}{\max(|s|, |t|)}$ (normalized Levenshtein)
      $block : \Sigma^q \to \mathbb{N}$ maps each q-gram to exactly one block (e.g., Gödelisation)
      Only $c = \max(|s|, |t|)(1 - \theta) \cdot q + 1$ q-grams need to be indexed per string
      index(s, θ) = {block(qgrams(s[0...c]))}
      Each string is mapped to several blocks
      Comparison only within blocks

      Comparison of drugs in DBpedia and Drugbank:

      | Setting                 | Comparisons | Runtime | Links |
      |-------------------------|-------------|---------|-------|
      | Full comparison         | 22,242,292  | 430 s   | 1,403 |
      | Blocking (100 blocks)   | 906,314     | 44 s    | 1,349 |
      | Blocking (1,000 blocks) | 322,573     | 14 s    | 1,287 |
      | MultiBlock              | 122,630     | 6 s     | 1,403 |
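The idea that only about k · q + 1 q-grams per string need to be indexed can be illustrated with a generic q-gram prefix filter. This sketch is not the MultiBlock implementation: it swaps in standard prefix filtering, assumes an absolute edit-distance bound `k` instead of the normalized threshold θ, and uses the hypothetical names `qgram_set` and `blocked_match`. Very short strings, for which the q-gram count bound gives no guarantee, are compared exhaustively.

```python
from collections import defaultdict


def levenshtein(s, t):
    # Classic dynamic-programming edit distance, one row at a time.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]


def qgram_set(s, q):
    # q-grams of s; repeated grams are numbered so that the bag becomes a set.
    # Sorting gives every string the same global order, as prefix filtering needs.
    seen = defaultdict(int)
    grams = []
    for i in range(len(s) - q + 1):
        g = s[i:i + q]
        grams.append((g, seen[g]))
        seen[g] += 1
    return sorted(grams)


def blocked_match(S, T, k, q=2):
    # Strings at or below this length may share no q-gram despite distance <= k.
    short_len = (k + 1) * q - 1
    index = defaultdict(set)
    short_targets = set()
    for j, t in enumerate(T):
        for g in qgram_set(t, q)[:k * q + 1]:  # index only the prefix
            index[g].add(j)
        if len(t) <= short_len:
            short_targets.add(j)
    matches, comparisons = set(), 0
    for s in S:
        candidates = set()
        for g in qgram_set(s, q)[:k * q + 1]:
            candidates |= index.get(g, set())
        if len(s) <= short_len:
            candidates |= short_targets
        for j in candidates:  # compare only within shared blocks
            comparisons += 1
            if levenshtein(s, T[j]) <= k:
                matches.add((s, T[j]))
    return matches, comparisons
```

Two strings within edit distance k share at least max(|s|, |t|) − q + 1 − k · q q-grams, so their sorted prefixes of length k · q + 1 must overlap whenever that bound is positive. This is why the filter loses no matches.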
  • 28. HR³: Link Discovery
      Definition (Declarative Link Discovery): Given sets S and T of resources and a relation R, find M = {(s, t) ∈ S × T : R(s, t)}. Here, find M = {(s, t) ∈ S × T : δ(s, t) ≥ τ}.
      Intuition: Reduce the number of comparisons C(A) ≥ |M|. Maximize the reduction ratio $RR(A) = 1 - \frac{C(A)}{|S||T|}$.
      Question: Can we devise lossless approaches with guaranteed RR?
      Advantages: Space management, runtime prediction, resource scheduling.
  • 31. HR3: RR Guarantee
    Best achievable reduction ratio: $RR_{max} = 1 - \frac{|M|}{|S||T|}$
    Approach H(α) fulfills the RR guarantee criterion iff: $\forall r < RR_{max}, \exists \alpha : RR(H(\alpha)) \ge r$
    Here, we use the relative reduction ratio (RRR): $RRR(A) = \frac{RR_{max}}{RR(A)}$
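The three ratios defined above can be computed directly. The following is a minimal sketch; the function names and the example numbers (1,000 resources per side, 50,000 comparisons, 10,000 matches) are illustrative assumptions, not values from the tutorial.

```python
def reduction_ratio(comparisons, size_s, size_t):
    """RR(A) = 1 - C(A) / (|S||T|)."""
    return 1 - comparisons / (size_s * size_t)


def max_reduction_ratio(matches, size_s, size_t):
    """RR_max = 1 - |M| / (|S||T|): a lossless approach needs at least |M| comparisons."""
    return 1 - matches / (size_s * size_t)


def relative_reduction_ratio(comparisons, matches, size_s, size_t):
    """RRR(A) = RR_max / RR(A); values close to 1 are better."""
    return max_reduction_ratio(matches, size_s, size_t) / reduction_ratio(
        comparisons, size_s, size_t
    )


# Hypothetical example values (assumption, not from the slides).
rr = reduction_ratio(50_000, 1_000, 1_000)
rrr = relative_reduction_ratio(50_000, 10_000, 1_000, 1_000)
```

Because C(A) ≥ |M| for any lossless approach, RR(A) ≤ RR_max and therefore RRR(A) ≥ 1.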
  • 32. Goal
    Formal goal: devise H(α) such that $\forall r > 1, \exists \alpha : RRR(H(\alpha)) \le r$
  • 33. HR3: Restrictions
    Minkowski distance: $\delta(s,t) = \sqrt[p]{\sum_{i=1}^{n} |s_i - t_i|^p}$, $p \ge 2$
  • 34. HR3: Space Tiling (HYPPO)
    δ(s, t) ≤ θ describes a hypersphere
    Approximate the hypersphere with a hypercube
    Easy to compute
    No loss of recall (blocking)
  • 37. HR3: Space Tiling
    Set the width of a single hypercube to Δ = θ/α
    Tile Ω = S ∪ T into adjacent cubes C
    Cube coordinates: $(c_1, \ldots, c_n) \in \mathbb{N}^n$
    A cube contains the points $\omega \in \Omega$ with $\forall i \in \{1, \ldots, n\}: c_i\Delta \le \omega_i < (c_i + 1)\Delta$
  • 38. HYPPO: Idea
    Combine $(2\alpha + 1)^n$ hypercubes around C(ω) to approximate the hypersphere
    $RRR(HYPPO(\alpha)) = \frac{(2\alpha+1)^n}{\alpha^n S(n)}$
    $\lim_{\alpha \to \infty} RRR(HYPPO(\alpha)) = \frac{2^n}{S(n)}$
  • 40. HYPPO: Reduction Ratio
    [Figure: RRR(HYPPO) for p = 2, n = 2, 3, 4 and 2 ≤ α ≤ 50]
    $\lim_{\alpha \to \infty} RRR(HYPPO(\alpha)) = \frac{4}{\pi} \approx 1.27$ (n = 2)
    $\lim_{\alpha \to \infty} RRR(HYPPO(\alpha)) = \frac{6}{\pi} \approx 1.91$ (n = 3)
    $\lim_{\alpha \to \infty} RRR(HYPPO(\alpha)) = \frac{32}{\pi^2} \approx 3.24$ (n = 4)
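The limits above can be checked numerically. This sketch assumes that S(n) denotes the volume of the n-dimensional unit hypersphere for p = 2, which is consistent with the three values on the slide; the function names are illustrative.

```python
import math


def unit_ball_volume(n):
    """Volume of the Euclidean (p = 2) unit ball in n dimensions (assumed meaning of S(n))."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def rrr_hyppo(alpha, n):
    """RRR(HYPPO(alpha)) = (2*alpha + 1)^n / (alpha^n * S(n))."""
    return (2 * alpha + 1) ** n / (alpha ** n * unit_ball_volume(n))


def rrr_hyppo_limit(n):
    """Limit of RRR(HYPPO(alpha)) for alpha -> infinity: 2^n / S(n)."""
    return 2 ** n / unit_ball_volume(n)


limits = {n: round(rrr_hyppo_limit(n), 2) for n in (2, 3, 4)}
```

The limit grows with n, so HYPPO cannot get arbitrarily close to the optimal reduction ratio no matter how large α is chosen.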
  • 41. HR3: Idea
    $$\mathrm{index}(C, \omega) = \begin{cases} 0 & \text{if } \exists i \in \{1, \ldots, n\} : |c_i - c(\omega)_i| \le 1, \\ \sum_{i=1}^{n} (|c_i - c(\omega)_i| - 1)^p & \text{else} \end{cases}$$
  • 42. HR3: Idea
    Compare C(ω) with C iff $\mathrm{index}(C, \omega) \le \alpha^p$
    [Figure: cubes selected for α = 4, p = 2]
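The tiling, the index function and the comparison rule can be sketched as follows. This is a minimal illustration, not the LIMES implementation: the function names are assumptions, and restricting the candidates to the (2α + 1)^n cubes of the HYPPO hypercube follows the "use index to discard portions of hypercube" step mentioned later in the deck.

```python
import math
from itertools import product


def cube_coordinates(point, theta, alpha):
    """Map a point to the coordinates of its cube of width delta = theta / alpha."""
    delta = theta / alpha
    return tuple(math.floor(x / delta) for x in point)


def hr3_index(cube, ref_cube, p=2):
    """index(C, omega): 0 if some coordinate differs by at most 1, else sum of (|diff| - 1)^p."""
    diffs = [abs(c - r) for c, r in zip(cube, ref_cube)]
    if any(d <= 1 for d in diffs):
        return 0
    return sum((d - 1) ** p for d in diffs)


def cubes_to_compare(ref_cube, alpha, p=2):
    """Cubes of the surrounding hypercube whose index is at most alpha^p."""
    selected = []
    for offset in product(range(-alpha, alpha + 1), repeat=len(ref_cube)):
        cube = tuple(r + o for r, o in zip(ref_cube, offset))
        if hr3_index(cube, ref_cube, p) <= alpha ** p:
            selected.append(cube)
    return selected


candidates = cubes_to_compare((10, 10), alpha=4, p=2)
```

For n = 2, α = 4 and p = 2, only the four corner cubes of the 9 × 9 hypercube have an index above α^p = 16, so 77 of the 81 cubes remain.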
  • 43. HR3: Idea
    Claims:
    No loss of recall
    $\lim_{\alpha \to \infty} RRR(HR3(\alpha)) = 1$
  • 46. HR3: Lemma 1
    Lemma: $\mathrm{index}(C, s) = x \Rightarrow \forall t \in C : \delta^p(s, t) > x\Delta^p$
    [Figure: example with p = 2, n = 2, index(C, s) = 5]
    Proof.
    $\mathrm{index}(C, s) = x \Rightarrow \sum_{i=1}^{n} (|c_i - c_i(s)| - 1)^p = x$
    From the definition of the cube index it follows that $|s_i - t_i| > (|c_i - c_i(s)| - 1)\Delta$
    Thus, $\sum_{i=1}^{n} |s_i - t_i|^p > \sum_{i=1}^{n} (|c_i - c_i(s)| - 1)^p \Delta^p$
    Therefore, $\delta^p(s, t) > x\Delta^p$
  • 47. HR3: Lemma 2
    Lemma: $\forall s \in S$: $\mathrm{index}(C, s) > \alpha^p$ implies that all t ∈ C are non-matches
    Proof. Follows directly from Lemma 1: $\mathrm{index}(C, s) > \alpha^p \Rightarrow \forall t \in C : \delta^p(s, t) > \Delta^p\alpha^p = \theta^p$
  • 52. HR3: Lemma 3
    Lemma: $\forall \alpha > 1 : RRR(HR3(2\alpha)) < RRR(HR3(\alpha))$
    [Figures: proof idea illustrated for p = 2 with α = 4, 8, 25 and 50]
  • 53. HR3: Theorem
    Theorem: $\lim_{\alpha \to \infty} RRR(HR3(\alpha)) = 1$
    Proof.
    α → ∞ ⇒ Δ → 0
    Thus, C(s) = {s}, C = {t}
    Index function: $\sum_{i=1}^{n} \Delta^p (|c_i(s) - c_i| - 1)^p \le \Delta^p\alpha^p$
    $\Delta \to 0 \Rightarrow \sum_{i=1}^{n} |s_i - t_i|^p \le \theta^p$
  • 55. HR3: Evaluation
    Compare HR3 with LIMES 0.5's HYPPO and SILK 2.5.1
    Experimental setup:
    Deduplicating DBpedia places by minimum elevation, elevation and maximum elevation (θ = 49 m, 99 m)
    Linking Geonames and LinkedGeoData by longitude and latitude (θ = 1°, 9°)
    Hardware: 64-bit Windows 7 Enterprise machine with a 2.8 GHz i7 processor and 8 GB RAM
  • 56. HR3: Evaluation (Comparisons)
    Experiment 2: Deduplicating DBpedia places, θ = 99 m
    0.64 × 10^6 fewer comparisons
  • 57. HR3: Experiments (Comparisons)
    Experiment 4: Linking Geonames and LinkedGeoData, θ = 9°
    4.3 × 10^6 fewer comparisons
  • 58. HR3: Experiments (RRR)
    [Figure: RRR in Experiments 3 and 4 (Geonames and LGD, θ = 1°, 9°)]
  • 59. HR3: Experiments (Runtime)
    [Figure: runtime (s) against granularity (2 to 32) for HYPPO and HR3 in Experiments 3 and 4 (Geonames and LGD, θ = 1°, 9°)]
  • 60. HR3: Experiments (Runtime)
    [Figure: runtime (s, logarithmic scale) of HR3, HYPPO and SILK in Experiments 1 and 2 (DBpedia, θ = 49 m, 99 m) and Experiments 3 and 4 (Geonames and LGD, θ = 1°, 9°)]
  • 61. HR3: Conclusion
    Main result: a new category of algorithms for link discovery
    Outperforms the state of the art (runtime, comparisons)
    Future work:
    Combine HR3 with a multi-indexing approach
    Devise a resource management approach
    Develop other algorithms (esp. for strings) with the same or similar theoretical guarantees
  • 63. AEGLE: Why?
    :E1 rdfs:label "Engine failure"@en
    :E1 rdf:type :Error
    :E1 :beginDate "2015-04-22T11:39:35"
    :E1 :endDate "2015-04-22T11:39:37"
    :E2 rdfs:label "Car accident"@en
    :E2 rdf:type :Accident
    :E2 :beginDate "2015-06-28T11:45:22"
    :E2 :endDate "2015-06-28T11:45:24"
    Need to create links between events, e.g., :startsEvent
    Need to deal with volume and velocity
  • 64. AEGLE: Event Definition
    Definition (Event): Events can be modeled as time intervals v = (b(v), e(v))
    b(v) is the beginning time (:beginDate)
    e(v) is the end time (:endDate)
    b(v) < e(v)
  • 65. AEGLE: Allen's Interval Algebra
    | Relation | Notation | Inverse |
    |---|---|---|
    | X before Y | bf(X, Y) | bfi(X, Y) |
    | X meets Y | m(X, Y) | mi(X, Y) |
    | X finishes Y | f(X, Y) | fi(X, Y) |
    | X starts Y | st(X, Y) | sti(X, Y) |
    | X during Y | d(X, Y) | di(X, Y) |
    | X equal Y | eq(X, Y) | eq(X, Y) |
    | X overlaps with Y | ov(X, Y) | ovi(X, Y) |
  • 66. AEGLE: Solution
    Aegle: Allen's intErval alGebra for Link discovEry
    Efficient computation of temporal relations between events
    Allen's Interval Algebra: distinct, exhaustive, and qualitative relations between time intervals
    Intuition:
    Express the 13 Allen relations using 8 atomic relations
    Time is ordered: find matching entities using two sorted lists
  • 70. AEGLE: Express st(s, t) using atomic relations
    b(s) = b(t)
    b(s) < e(t)
    e(s) > b(t)
    e(s) < e(t)
  • 71. AEGLE: Atomic relations
    Compute 8 atomic Boolean relations between begin and end points
    BeginBegin (BB) for b(s), b(t):
    $BB^{1}(s, t) \Leftrightarrow (b(s) < b(t))$
    $BB^{0}(s, t) \Leftrightarrow (b(s) = b(t))$
    $BB^{-1}(s, t) \Leftrightarrow (b(s) > b(t)) \Leftrightarrow \neg(BB^{1}(s, t) \lor BB^{0}(s, t))$
    BeginEnd (BE) for b(s), e(t)
    EndBegin (EB) for e(s), b(t)
    EndEnd (EE) for e(s), e(t)
  • 74. AEGLE: Combination of relations
    $st(s, t) \Leftrightarrow BB^{0}(s, t) \land BE^{1}(s, t) \land EB^{-1}(s, t) \land EE^{1}(s, t) \Leftrightarrow \{BB^{0}(s, t) \land EE^{1}(s, t)\}$
    $sti(s, t) \Leftrightarrow BB^{0}(s, t) \land BE^{1}(s, t) \land EB^{-1}(s, t) \land EE^{-1}(s, t) \Leftrightarrow \{BB^{0}(s, t) \land EE^{-1}(s, t)\} = \{BB^{0}(s, t) \land \neg(EE^{0}(s, t) \lor EE^{1}(s, t))\}$
  • 80. AEGLE: Algorithm for st, sti
    [Figure: source intervals s1, s2 and target intervals t1, t2]
    For st:
    Compute BB0: {(s1, t1), (s2, t1)}
    Compute EE1: {(s1, t1), (s2, t1), (s1, t2)}
    Intersection of BB0 and EE1: {(s1, t1), (s2, t1)}
  • 85. AEGLE: Algorithm for st, sti
    [Figure: source intervals s1, s2 and target intervals t1, t2]
    For sti:
    Retrieve BB0 and EE1
    Compute EE0: {(s2, t2)}
    Union of EE0 and EE1: {(s1, t1), (s2, t1), (s1, t2), (s2, t2)}
    Difference between BB0 and the union of EE0 and EE1: {(s1, t2), (s2, t2)}
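The set operations used for st and sti can be sketched in a few lines. This is not the Aegle implementation: the helper names and the example intervals are hypothetical, and the atomic relations are computed here with a hash map (for BB0) and a sorted list with binary search (for EE0 and EE1) as one possible way to exploit the ordering of time.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def bb0(source, target):
    """BB0: pairs (s, t) with b(s) = b(t)."""
    by_begin = defaultdict(list)
    for t, (begin, _) in target.items():
        by_begin[begin].append(t)
    return {(s, t) for s, (begin, _) in source.items() for t in by_begin.get(begin, [])}


def ee0_ee1(source, target):
    """EE0: pairs with e(s) = e(t); EE1: pairs with e(s) < e(t)."""
    ends = sorted((end, t) for t, (_, end) in target.items())
    keys = [end for end, _ in ends]
    ee0, ee1 = set(), set()
    for s, (_, end) in source.items():
        lo, hi = bisect_left(keys, end), bisect_right(keys, end)
        ee0 |= {(s, t) for _, t in ends[lo:hi]}
        ee1 |= {(s, t) for _, t in ends[hi:]}
    return ee0, ee1


# Hypothetical intervals (begin, end); not the example drawn on the slides.
S = {"s1": (0, 2), "s2": (0, 5)}
T = {"t1": (0, 4), "t2": (1, 5)}

BB0 = bb0(S, T)
EE0, EE1 = ee0_ee1(S, T)
st = BB0 & EE1            # st  = BB0 and EE1
sti = BB0 - (EE0 | EE1)   # sti = BB0 and not (EE0 or EE1)
```

Here s1 starts t1 (same begin, earlier end), while s2 is started by t1 (same begin, later end). Sorting costs O(n log n); materializing the pair sets can still be quadratic in the worst case.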
  • 88. AEGLE: Experimental Set-Up
    Datasets (S = T):
    | Log type | Dataset name | Size | Unique b(s) | Unique e(s) |
    |---|---|---|---|---|
    | Machinery | 3KMachines | 3,154 | 960 | 960 |
    | Machinery | 30KMachines | 28,869 | 960 | 960 |
    | Machinery | 300KMachines | 288,690 | 960 | 960 |
    | Query | 3KQueries | 3,888 | 3,636 | 3,638 |
    | Query | 30KQueries | 30,635 | 3,070 | 3,070 |
    | Query | 300KQueries | 303,991 | 184 | 184 |
    State of the art: Silk extended to deal with spatio-temporal data
    Baseline for eq using brute force
    Evaluation measures:
    Atomic runtime of each of the atomic relations
    Relation runtime required to compute each Allen relation
    Total runtime required to compute all 13 relations
  • 90. AEGLE: Results
    Q1: Does the reduction of Allen relations to 8 atomic relations influence the overall runtime of the approach?
  • 91. AEGLE: Results
    Q2: How does Aegle perform when compared with the state of the art in terms of time efficiency?
    Total runtime:
    | Log type | Dataset name | Aegle | Aegle * | Silk |
    |---|---|---|---|---|
    | Machine | 3KMachines | 11.26 | 5.51 | 294.00 |
    | Machine | 30KMachines | 1,016.21 | 437.79 | 29,846.00 |
    | Machine | 300KMachines | 189,442.16 | 78,416.61 | NA |
    | Query | 3KQueries | 26.94 | 17.91 | 541.00 |
    | Query | 30KQueries | 988.78 | 463.27 | 33,502.00 |
    | Query | 300KQueries | 211,996.88 | 86,884.98 | NA |
  • 92. AEGLE: Results
    | Relation | Approach | 3KMachines | 30KMachines | 300KMachines | 3KQueries | 30KQueries | 300KQueries |
    |---|---|---|---|---|---|---|---|
    | m | Aegle | 0.02 | 0.19 | 3.42 | 0.02 | 0.21 | 3.89 |
    | m | Silk | 23.00 | 2,219.00 | NA | 41.00 | 2,466.00 | NA |
    | eq | Aegle | 0.05 | 0.79 | 49.84 | 0.05 | 0.45 | 348.51 |
    | eq | Silk | 23.00 | 2,250.00 | NA | 41.00 | 2,473.00 | NA |
    | eq | baseline | 2.05 | 171.10 | 23,436.30 | 3.15 | 196.09 | 31,452.54 |
    | ovi | Aegle | 3.16 | 222.27 | 38,226.32 | 11.97 | 257.59 | 42,121.68 |
    | ovi | Silk | 22.00 | 2,189.00 | NA | 42.00 | 2,503.00 | NA |
    Conclusion:
    Aegle = efficient temporal linking
    Reduction of the 13 Allen interval relations to 8 atomic relations
    Efficiency: simple sorting with complexity O(n log n)
    Scalable LD
  • 94. Gnome: Problem
    What if the data does not fit in memory C, i.e., |S| + |T| > |C|?
    1 Common problem
    2 Growing size and number of datasets
      ≈ 150 × 10^9 triples
      ≈ 10 × 10^3 datasets
      Largest dataset with > 20 × 10^9 triples
    3 Mostly in-memory solutions
  • 96. Gnome: Idea
    Insight: most approaches rely on the divide-and-merge paradigm
    Example: HR3, with σ(s, t) ≥ θ ⇔ δ(s, t) ≤ Δ
  • 100. Gnome: Formal Model
    1 Define $\mathcal{S} = \{S_1, \ldots, S_n\}$ with $S_i \subseteq S \land \bigcup_i S_i = S$
    2 Define $\mathcal{T} = \{T_1, \ldots, T_m\}$ with $T_j \subseteq T \land \bigcup_j T_j = T$
    3 Find a mapping function $\mu : \mathcal{S} \to 2^{\mathcal{T}}$ such that
      elements of $S_i$ are only compared with elements of sets in $\mu(S_i)$
      the union of the results over all $S_i \in \mathcal{S}$ is exactly M
  • 104. GNOME: Task Graph
    Definition: A task $E_{ij}$ stands for comparing $S_i$ with $T_j \in \mu(S_i)$
    Task graph $G = (V, E, w_v, w_e)$, with
    $V = \mathcal{S} \cup \mathcal{T}$
    $w_v(v) = |v|$
    $w_e(e_{ij}) = |S_i||T_j|$
    [Figure: example task graph with node weights S1 = 3, S2 = 2, S3 = 4, T1 = 2, T2 = 1, T3 = 3 and edge weights 6, 3, 4, 2, 4, 12, 6]
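The example task graph can be rebuilt from the node weights alone. In the sketch below the mapping `mu` is an assumption: it is the only assignment of edges for which the products |Si||Tj| reproduce the seven edge weights shown in the figure. The function and variable names are illustrative.

```python
# Node weights w_v(v) = |v| from the example figure.
sizes = {"S1": 3, "S2": 2, "S3": 4, "T1": 2, "T2": 1, "T3": 3}

# Assumed mapping mu(S_i), chosen so that the edge weights match the figure.
mu = {
    "S1": ["T1", "T2"],
    "S2": ["T1", "T2", "T3"],
    "S3": ["T2", "T3"],
}


def task_graph(sizes, mu):
    """Return the tasks E_ij with their weights w_e(e_ij) = |S_i| * |T_j|."""
    return {(s, t): sizes[s] * sizes[t] for s, targets in mu.items() for t in targets}


tasks = task_graph(sizes, mu)
total_comparisons = sum(tasks.values())
```

The edge weight of a task is the number of comparisons it requires, so the sum over all tasks is the total number of comparisons the mapping µ leaves to be carried out.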
  • 105. GNOME: Problem Reformulation
    Locality maximization as a two-step approach:
    1 Clustering: find groups of nodes that fit in memory
    2 Scheduling: compute a sequence of groups that minimizes hard drive access
  • 111. GNOME: Step 1, Clustering
    Naïve approach: cluster by $S_i$
    [Figure: naïve clustering of the example task graph, |C| = 7]
  • 116. GNOME: Step 1, Clustering
    Greedy approach:
    Start with the largest task
    Add the largest connected tasks until none fits in C
    [Figure: greedy clustering of the example task graph, |C| = 7]
  • 119. GNOME: Step 2, Scheduling
    Insights:
    Output of clustering: sequence $G_1, \ldots, G_N$ of clusters
    Intuition: consecutive clusters should share data
    Goal: maximize the overlap of the generated sequence
    Overlap of two clusters: $o(G_i, G_j) = \sum_{v \in V(G_i) \cap V(G_j)} |v|$, e.g., $o(G_4, G_1) = 4$
    Overlap of a sequence: $o(G_1, \ldots, G_N) = \sum_{i=1}^{N-1} o(G_i, G_{i+1})$, e.g., $o(G_4, G_1, G_2, G_3) = 9$
  • 121. GNOME: Step 2, Scheduling
    Best-effort:
    Select a random pair of clusters
    If permuting them improves the overlap, then permute
    Relies on local knowledge for scalability
    Trick:
    $$\Delta(G_i, G_j) = \big(o(G_{i-1}, G_j) + o(G_j, G_{i+1}) + o(G_{j-1}, G_i) + o(G_i, G_{j+1})\big) - \big(o(G_{i-1}, G_i) + o(G_i, G_{i+1}) + o(G_{j-1}, G_j) + o(G_j, G_{j+1})\big) \quad (1)$$
    [Figure: sequence G3, G1, G2, G4 with overlaps 0, 3, 2 is permuted into G4, G1, G2, G3 with overlaps 4, 3, 2]
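The overlap measure and the local gain Δ of equation (1) can be sketched as follows. The cluster contents are an assumption, reconstructed so that they reproduce the overlap values shown in the figures (for example o(G4, G1) = 4); the function names and the random search loop are illustrative, not the Gnome implementation.

```python
import random

SIZES = {"S1": 3, "S2": 2, "S3": 4, "T1": 2, "T2": 1, "T3": 3}

# Assumed clusters (each fits into |C| = 7), consistent with the slide overlaps.
CLUSTERS = {
    "G1": {"S3", "T3"},
    "G2": {"S2", "T1", "T3"},
    "G3": {"S1", "T1", "T2"},
    "G4": {"S2", "S3", "T2"},
}


def overlap(a, b):
    """o(Gi, Gj): total size of the nodes shared by two clusters."""
    return sum(SIZES[v] for v in CLUSTERS[a] & CLUSTERS[b])


def sequence_overlap(seq):
    """o(G1, ..., GN): sum of the overlaps of consecutive clusters."""
    return sum(overlap(x, y) for x, y in zip(seq, seq[1:]))


def swap_gain(seq, i, j):
    """Delta of equation (1) for non-adjacent positions i < j; missing neighbours count as 0."""
    def o(x, y):
        if min(x, y) < 0 or max(x, y) >= len(seq):
            return 0
        return overlap(seq[x], seq[y])

    new = o(i - 1, j) + o(j, i + 1) + o(j - 1, i) + o(i, j + 1)
    old = o(i - 1, i) + o(i, i + 1) + o(j - 1, j) + o(j, j + 1)
    return new - old


def best_effort(seq, rounds=100, seed=0):
    """Swap random non-adjacent pairs whenever the local gain is positive."""
    rng = random.Random(seed)
    seq = list(seq)
    for _ in range(rounds):
        i, j = sorted(rng.sample(range(len(seq)), 2))
        if j - i >= 2 and swap_gain(seq, i, j) > 0:
            seq[i], seq[j] = seq[j], seq[i]
    return seq


start = ["G3", "G1", "G2", "G4"]
gain = swap_gain(start, 0, 3)
```

Only the four neighbouring overlaps of the swapped positions are inspected, which is why the approach needs local knowledge only. The sketch skips adjacent pairs because equation (1) as written assumes that positions i and j do not share a neighbour edge.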
  • 123. GNOME: Step 2, Scheduling
    Greedy:
    Start with a random cluster
    Choose the next cluster with the largest overlap
    Global knowledge needed
    [Figure: sequence G3, G1, G2, G4 with overlaps 0, 3, 2 is reordered into G3, G2, G1, G4 with overlaps 2, 3, 4]
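Greedy scheduling can be sketched as a nearest-neighbour walk over the clusters. As before, the cluster contents are an assumption chosen to match the overlaps in the figures, the start cluster is passed in explicitly instead of being drawn at random, and ties are broken alphabetically for reproducibility.

```python
SIZES = {"S1": 3, "S2": 2, "S3": 4, "T1": 2, "T2": 1, "T3": 3}

# Assumed clusters, consistent with the overlap values shown on the slides.
CLUSTERS = {
    "G1": {"S3", "T3"},
    "G2": {"S2", "T1", "T3"},
    "G3": {"S1", "T1", "T2"},
    "G4": {"S2", "S3", "T2"},
}


def overlap(a, b):
    """Total size of the nodes shared by two clusters."""
    return sum(SIZES[v] for v in CLUSTERS[a] & CLUSTERS[b])


def greedy_schedule(start):
    """Repeatedly append the unscheduled cluster with the largest overlap to the last one."""
    sequence = [start]
    remaining = set(CLUSTERS) - {start}
    while remaining:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=lambda g: overlap(sequence[-1], g))
        sequence.append(best)
        remaining.remove(best)
    return sequence


schedule = greedy_schedule("G3")
```

Every step compares the last scheduled cluster with all remaining clusters, which is the global knowledge the slide refers to and the reason this variant scales worse than the best-effort approach.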
  • 124. GNOME: Experimental Setup
    Datasets:
    1 DBP: 1 million labels from DBpedia version 04-2015
    2 LGD: 0.8 million places from LinkedGeoData
    Hardware:
    1 Intel Xeon E5-2650 v3 processors (2.30 GHz)
    2 Ubuntu 14.04.3 LTS
    3 10 GB RAM
    Measures:
    1 Total runtime
    2 Hit ratio
  • 126. GNOME: Evaluation of Clustering
    Only results on LGD are shown; results on DBP lead to similar insights
    | \|C\| | Runtime (ms), Naive | Runtime (ms), Greedy | Hit ratio, Naive | Hit ratio, Greedy |
    |---|---|---|---|---|
    | 100 | 568.0 | 646.3 | 0.57 | 0.77 |
    | 200 | 518.3 | 594.0 | 0.66 | 0.80 |
    | 400 | 532.0 | 593.3 | 0.67 | 0.80 |
    | 1,000 | 5,974.0 | 118,454.7 | 0.51 | 0.64 |
    | 2,000 | 6,168.0 | 115,450.0 | 0.51 | 0.63 |
    | 4,000 | 7,118.3 | 121,901.7 | 0.50 | 0.63 |
    Conclusion:
    1 The naïve approach is more time-efficient
    2 The greedy approach is more effective (higher hit ratio)
    3 Select the naïve approach for clustering
  • 128. GNOME: Evaluation of Scheduling
    Only results on LGD are shown; results on DBP lead to similar insights
    | \|C\| | Runtime (ms), Best-Effort | Runtime (ms), Greedy | Hit ratio, Best-Effort | Hit ratio, Greedy |
    |---|---|---|---|---|
    | 100 | 571.3 | 1,599.3 | 0.56 | 0.68 |
    | 200 | 565.7 | 1,448.3 | 0.66 | 0.85 |
    | 400 | 581.0 | 1,379.3 | 0.67 | 0.88 |
    | 1,000 | 5,666.0 | 814,271.7 | 0.51 | 0.86 |
    | 2,000 | 6,268.0 | 810,855.0 | 0.51 | 0.86 |
    | 4,000 | 6,675.7 | 814,041.7 | 0.50 | 0.86 |
    Conclusion:
    1 The best-effort approach is more time-efficient
    2 The greedy approach is more effective
    3 The best-effort approach is to be used for scheduling
  • 130. GNOME: Comparison with Caching Approaches
    Runtimes (ms):
    | \|C\| | Gnome | FIFO | F2 | LFU | LRU | SLRU |
    |---|---|---|---|---|---|---|
    | 1,000 | 5,974.0 | 37,161.0 | 42,090.3 | 45,906.7 | 54,194.3 | 56,904.3 |
    | 2,000 | 6,168.0 | 31,977.0 | 39,071.3 | 39,872.0 | 45,473.0 | 46,795.0 |
    | 4,000 | 7,118.3 | 21,337.0 | 40,860.0 | 28,028.3 | 26,816.7 | 27,200.0 |
    Hit ratio:
    | \|C\| | Gnome | FIFO | F2 | LFU | LRU | SLRU |
    |---|---|---|---|---|---|---|
    | 1,000 | 0.51 | 0.17 | 0.16 | 0.19 | 0.17 | 0.17 |
    | 2,000 | 0.51 | 0.29 | 0.30 | 0.32 | 0.30 | 0.30 |
    | 4,000 | 0.51 | 0.54 | 0.55 | 0.59 | 0.55 | 0.56 |
    Conclusion:
    1 Gnome is more time-efficient
    2 Gnome leads to higher hit ratios in most cases
  • 131. GNOME: Scalability
    | Dataset | 100k | 200k | 400k | 800k |
    |---|---|---|---|---|
    | LGD | 362,141.3 | 1,452,922.0 | 5,934,038.7 | 20,001,965.7 |
    | DBP | 434,630.7 | 1,790,350.7 | 6,677,923.0 | 12,653,403.3 |
    Conclusion:
    Sub-quadratic growth of runtime
    Runtime grows linearly with the number of mappings
    For LGD, 360 to 370 mappings/s
  • 134. Reach for the Cloud? Dealing with Time Complexity
    1 Devise better algorithms
      Blocking
      Algorithms for given metrics: PPJoin+, EDJoin, HR3
    2 Use parallel hardware
      Thread pools
      MapReduce
      Massively parallel hardware
  • 136. Reach for the Cloud? Parallel Implementations
    Different architectures:
    Memory (shared, hybrid, distributed)
    Execution paths (different, same)
    Location (remote, local)
    Question: When should we use which hardware?
  • 139. Reach for the Cloud? Premises and Goals
    Premises:
    Given an algorithm that runs on all three architectures ...
    Note to self: implement one
    Picked HR3, which is reduction-ratio-optimal
    Goals:
    Compare runtimes on all three parallel architectures
    Find break-even points
  • 140. Reach for the Cloud? HR3
    1 Assume spaces with a Minkowski metric and p ≥ 2: $\delta(s,t) = \sqrt[p]{\sum_{i=1}^{n} |s_i - t_i|^p}$
  • 141. Reach for the Cloud? HR3
    2 Create a grid of width Δ = θ/α
  • 142. Reach for the Cloud? HR3
    3 The link discovery condition describes a hypersphere
  • 143. Reach for the Cloud? HR3
    4 Approximate the hypersphere with a hypercube
  • 144. Reach for the Cloud? HR3
    5 Use the index to discard portions of the hypercube
  • 145. Reach for the Cloud? HR3 on GPUs
    Large number of simple compute cores
    Same instruction, multiple data
    Bottleneck: PCI Express bus
    1 Run discretization on CPU
    2 Run indexing on GPU
    3 Run comparisons on CPU
    [Figure: GPU memory hierarchy with work items, work groups, local memory, global/constant cache and global/constant memory]
  • 147. Reach for the Cloud? HR3 on the Cloud, Naive Approach: (1) rely on the MapReduce paradigm; (2) run discretization and assignment to cubes in the map step; (3) run distance computation in the reduce step.
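The naive approach can be simulated in memory to see how the two steps divide the work. The sketch below is an assumption-laden illustration for a deduplication task, not the tutorial's Hadoop code: `map_point`, `reduce_cube`, and `run_job` are hypothetical names, the grid width is set to θ (α = 1), and each point is replicated to its adjacent cubes so that a reducer sees every candidate for its cube.

```python
from collections import defaultdict
from itertools import product
from math import floor


def map_point(point, delta):
    """Map step: discretize a point and emit it for its own cube and all adjacent cubes."""
    home = tuple(floor(x / delta) for x in point)
    for shift in product((-1, 0, 1), repeat=len(home)):
        key = tuple(h + d for h, d in zip(home, shift))
        # The flag records whether the point actually lives in the cube `key`.
        yield key, (key == home, point)


def reduce_cube(values, theta, p=2):
    """Reduce step: compare the residents of one cube with every point sent to it."""
    residents = [pt for is_home, pt in values if is_home]
    everyone = [pt for _, pt in values]
    for a in residents:
        for b in everyone:
            # a < b reports each unordered pair once and skips identical points.
            if a < b and sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p) <= theta:
                yield a, b


def run_job(points, theta):
    """Simulate shuffle-and-sort: group mapper output by cube key, then reduce."""
    groups = defaultdict(list)
    for point in points:
        for key, value in map_point(point, theta):
            groups[key].append(value)
    links = set()
    for values in groups.values():
        links.update(reduce_cube(values, theta))
    return links
```

Because the reducer workload depends on how many points land in each cube, densely populated cubes dominate the runtime in this scheme.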
  • 148. Reach for the Cloud? HR3 on the Cloud, Load Balancing: (1) run two jobs; (2) Job 1: compute the cube population matrix; (3) Job 2: distribute balanced linking tasks across mappers and reducers.
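One way to picture the two-job scheme is sketched below. The slides do not specify how tasks are balanced, so everything here is an assumption made for illustration: the hypothetical `cube_population` stands in for Job 1, the cost of a task is taken to be the number of comparisons it triggers, and `balance` uses a simple greedy heuristic (most expensive task to the least loaded worker) in place of whatever strategy the actual implementation uses.

```python
import heapq
from collections import Counter
from math import floor


def cube_population(points, delta):
    """Job 1: count how many points fall into each grid cube."""
    return Counter(tuple(floor(x / delta) for x in p) for p in points)


def linking_tasks(population):
    """List (cost, cube_a, cube_b) for each cube and each pair of adjacent cubes."""
    tasks = []
    cubes = sorted(population)
    for i, a in enumerate(cubes):
        n = population[a]
        if n > 1:
            # Comparisons among the points inside one cube.
            tasks.append((n * (n - 1) // 2, a, a))
        for b in cubes[i + 1:]:
            if all(abs(x - y) <= 1 for x, y in zip(a, b)):
                # Comparisons between two adjacent cubes.
                tasks.append((n * population[b], a, b))
    return tasks


def balance(tasks, workers):
    """Planning for Job 2: assign the most expensive task to the least loaded worker."""
    heap = [(0, w) for w in range(workers)]
    heapq.heapify(heap)
    plan = [[] for _ in range(workers)]
    for task in sorted(tasks, reverse=True):
        load, w = heapq.heappop(heap)
        plan[w].append(task)
        heapq.heappush(heap, (load + task[0], w))
    return plan
```

With the population known up front, the expected number of comparisons per task can be computed before any distance is evaluated, which is what allows the work to be spread evenly.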
  • 149. Reach for the Cloud? Evaluation Hardware. (1) CPU (Java): 32-core server running Linux 10.0.4, AMD Opteron 6128 clocked at 2.0GHz. (2) GPU (C++): AMD Radeon 7870 with 20 compute units, 64 parallel threads; host program ran on an Intel Core i7 3770 CPU with 8GB RAM and Linux 12.10; ran the Java code on the same machine for scaling. (3) Cloud (Java): 10 c1.medium nodes (2 cores, 1.7GB) for small experiments; 30 c1.large nodes (8 cores, 7GB) for large experiments.
  • 150. Reach for the Cloud? Experimental Setup. Run a deduplication task. Evaluate behavior on different numbers of dimensions. Important: scale the results to account for different hardware (the C++ workstation is 2-7 times faster) and for the programming language. Evaluate scalability (DS4).
  • 151. Reach for the Cloud? Results, DS1: DBpedia, 3 dimensions, 26K. GPU scales better.
  • 152. Reach for the Cloud? Results, DS2: DBpedia, 2 dimensions, 475K. GPUs scale better across different θ. Break-even point ≈ $10^{8}$ results.
  • 153. Reach for the Cloud? Results, DS3: LinkedGeoData, 2 dimensions, 500K. Similar picture.
  • 154. Reach for the Cloud? Results, Scalability: LinkedGeoData, 2 dimensions, 6M. Cloud (with load balancing) is better for ≈ $10^{10}$+ results.
  • 155. Reach for the Cloud? Summary. Question: When should we use which hardware for link discovery? Results: Implemented HR3 on different hardware. Provided the first implementation of link discovery on GPUs. Devised a load balancing approach for linking on the cloud. Discovered $10^{8}$ and $10^{10}$ results as break-even points.
  • 156. Reach for the Cloud? Summary. Question: When should we use which hardware for link discovery? Insights: Load balancing is important for using the cloud. GPUs need faster buses than PCIe (e.g., Firewire speed). Accurate use of local resources is sufficient for most current applications.
  • 159. Summary and Conclusion. Four approaches to scalability: (1) improve reduction ratio (LIMES, MultiBlock, HYPPO, HR3, ...); (2) reduce runtime complexity (AEGLE); (3) better use of hardware (GNOME); (4) improve execution of specifications. Challenges include: (1) increase the number of reduction-ratio-optimal approaches (HR3, ORCHID); (2) adaptive resource scheduling; (3) self-regulating approaches; (4) distribution in modern in-memory architectures (SPARK); (5) ...
  • 160. Acknowledgment This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
  • 161. References I Kleanthi Georgala, Mohamed Ahmed Sherif, and Axel-Cyrille Ngonga Ngomo. “An Efficient Approach for the Generation of Allen Relations”. In: ECAI 2016 - 22nd European Conference on Artificial Intelligence, 29 August-2 September 2016, The Hague, The Netherlands - Including Prestigious Applications of Artificial Intelligence (PAIS 2016). Ed. by Gal A. Kaminka et al. Vol. 285. Frontiers in Artificial Intelligence and Applications. IOS Press, 2016, pp. 948–956. isbn: 978-1-61499-671-2. doi: 10.3233/978-1-61499-672-9-948. url: http://dx.doi.org/10.3233/978-1-61499-672-9-948. Robert Isele, Anja Jentzsch, and Christian Bizer. “Efficient Multidimensional Blocking for Link Discovery without losing Recall”. In: Proceedings of the 14th International Workshop on the Web and Databases 2011, WebDB 2011, Athens, Greece, June 12, 2011. Ed. by Amélie Marian and Vasilis Vassalos. 2011. url: http://webdb2011.rutgers.edu/papers/Paper%2039/silk.pdf.
  • 162. References II Axel-Cyrille Ngonga Ngomo and Sören Auer. “LIMES - A Time-Efficient Approach for Large-Scale Link Discovery on the Web of Data”. In: IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, July 16-22, 2011. Ed. by Toby Walsh. IJCAI/AAAI, 2011, pp. 2312–2317. isbn: 978-1-57735-516-8. doi: 10.5591/978-1-57735-516-8/IJCAI11-385. url: http://dx.doi.org/10.5591/978-1-57735-516-8/IJCAI11-385. Axel-Cyrille Ngonga Ngomo et al. “When to Reach for the Cloud: Using Parallel Hardware for Link Discovery”. In: The Semantic Web: Semantics and Big Data, 10th International Conference, ESWC 2013, Montpellier, France, May 26-30, 2013. Proceedings. Ed. by Philipp Cimiano et al. Vol. 7882. Lecture Notes in Computer Science. Springer, 2013, pp. 275–289. isbn: 978-3-642-38287-1. doi: 10.1007/978-3-642-38288-8_19. url: http://dx.doi.org/10.1007/978-3-642-38288-8_19.
  • 163. References III Axel-Cyrille Ngonga Ngomo. “Link Discovery with Guaranteed Reduction Ratio in Affine Spaces with Minkowski Measures”. In: The Semantic Web - ISWC 2012 - 11th International Semantic Web Conference, Boston, MA, USA, November 11-15, 2012, Proceedings, Part I. Ed. by Philippe Cudré-Mauroux et al. Vol. 7649. Lecture Notes in Computer Science. Springer, 2012, pp. 378–393. isbn: 978-3-642-35175-4. doi: 10.1007/978-3-642-35176-1_24. url: http://dx.doi.org/10.1007/978-3-642-35176-1_24. Axel-Cyrille Ngonga Ngomo. “ORCHID - Reduction-Ratio-Optimal Computation of Geo-spatial Distances for Link Discovery”. In: The Semantic Web - ISWC 2013 - 12th International Semantic Web Conference, Sydney, NSW, Australia, October 21-25, 2013, Proceedings, Part I. Ed. by Harith Alani et al. Vol. 8218. Lecture Notes in Computer Science. Springer, 2013, pp. 395–410. isbn: 978-3-642-41334-6. doi: 10.1007/978-3-642-41335-3_25. url: http://dx.doi.org/10.1007/978-3-642-41335-3_25.
  • 164. References IV Axel-Cyrille Ngonga Ngomo. “HELIOS - Execution Optimization for Link Discovery”. In: The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Riva del Garda, Italy, October 19-23, 2014. Proceedings, Part I. Ed. by Peter Mika et al. Vol. 8796. Lecture Notes in Computer Science. Springer, 2014, pp. 17–32. isbn: 978-3-319-11963-2. doi: 10.1007/978-3-319-11964-9_2. url: http://dx.doi.org/10.1007/978-3-319-11964-9_2. Axel-Cyrille Ngonga Ngomo and Mofeed M. Hassan. “The Lazy Traveling Salesman - Memory Management for Large-Scale Link Discovery”. In: The Semantic Web. Latest Advances and New Domains - 13th International Conference, ESWC 2016, Heraklion, Crete, Greece, May 29 - June 2, 2016, Proceedings. Ed. by Harald Sack et al. Vol. 9678. Lecture Notes in Computer Science. Springer, 2016, pp. 423–438. isbn: 978-3-319-34128-6. doi: 10.1007/978-3-319-34129-3_26. url: http://dx.doi.org/10.1007/978-3-319-34129-3_26.