Interlinking: Performance Assessment of User Evaluation vs. Supervised Learning Approaches
Mofeed Hassan, Jens Lehmann and Axel-Cyrille Ngonga Ngomo
AKSW
Department of Computer Science
University of Leipzig
Augustusplatz 10, 04109 Leipzig
{mounir,lehmann,ngonga}@informatik.uni-leipzig.de
WWW home page: http://limes.sf.net
June 14, 2015
LDOW 2015
Why Link Discovery?
1 Fourth Linked Data principle
2 Links are central for
Cross-ontology question answering (QA)
Data Integration
Reasoning
Federated Queries
...
3 Valuable asset for enterprises
4 Linked Data on the Web:
10+ thousand datasets
89+ billion triples
≈ 500+ million links?
Why is it difficult?
Definition (Link Discovery)
Given sets S and T of resources and relation R
Task: Find M = {(s, t) ∈ S × T : R(s, t)}
Common approaches:
Find M = {(s, t) ∈ S × T : σ(s, t) ≥ θ} for a similarity function σ and threshold θ
Find M = {(s, t) ∈ S × T : δ(s, t) ≤ θ} for a distance function δ and threshold θ
1 Time complexity
Large number of triples
Quadratic a-priori runtime
69 days for mapping cities from DBpedia to Geonames (1 ms per comparison)
Decades for linking DBpedia and LGD (LinkedGeoData) . . .
A naive sketch of this quadratic matching follows below.
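The quadratic runtime follows directly from the brute-force formulation of the common approaches above. Below is a minimal, hypothetical Python sketch (not part of the original slides; the similarity function and example labels are illustrative) that makes the |S| × |T| comparison loop explicit:

```python
from difflib import SequenceMatcher


def sigma(s: str, t: str) -> float:
    """Toy string similarity in [0, 1]; stands in for any atomic measure."""
    return SequenceMatcher(None, s.lower(), t.lower()).ratio()


def naive_link_discovery(source_labels, target_labels, theta=0.9):
    """Brute-force M = {(s, t) : sigma(s, t) >= theta} over S x T."""
    mapping = []
    for s in source_labels:        # |S| iterations
        for t in target_labels:    # |T| iterations, i.e. |S| * |T| comparisons
            if sigma(s, t) >= theta:
                mapping.append((s, t))
    return mapping


# At 1 ms per comparison, the 18,114 x 979 instance pairs of the
# DBpedia-LinkedGeoData task already amount to roughly 5 hours;
# the full DBpedia-to-Geonames city mapping takes about 69 days.
print(naive_link_discovery(["Leipzig", "Berlin"], ["Leipzig", "Bern"]))
# -> [('Leipzig', 'Leipzig')]
```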
Why is it difficult?
2 Complexity of specifications
Combination of several attributes required for high precision
Adequate atomic similarity functions difficult to detect
Tedious discovery of most adequate mapping
Motivation
Several frameworks, e.g., LIMES, SILK, SLINT+
Differences
1 Domain dependency
2 Automation & user involvement
3 Matching and learning techniques (unsupervised, active, and batch learning)
Questions
Q1 How much does a link cost?
Q2 How much does it cost to train a framework?
Q3 Where is the break-even point?
Empirical Study: Methodology
Define m interlinking tasks
Request links from n annotators
Measure the cost per link (Q1)
Measure tool performance for increasing amounts of training data (Q2)
Find intersection of both lines (Q3)
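As an illustration of Q3, here is a small hypothetical sketch (the helper name, the human F-measure, and the learning curve below are invented for illustration, not taken from the study) of how the break-even point can be read off: take the smallest annotation budget at which the machine's F-measure reaches the human F-measure.

```python
def break_even(human_f, machine_curve):
    """machine_curve: (cost_in_minutes, f_measure) pairs sorted by cost.

    Returns the first cost at which the learned link specification matches
    or exceeds the human F-measure, or None if it never does.
    """
    for cost, f_measure in machine_curve:
        if f_measure >= human_f:
            return cost
    return None


human_f = 0.90                                            # illustrative value
curve = [(5, 0.55), (10, 0.78), (20, 0.92), (40, 0.95)]   # made-up learning curve
print(break_even(human_f, curve))                         # -> 20
```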
Empirical Study
Designed a simple interface to visualize resources
Users are given links and can choose between correct, incorrect, and unsure
https://github.com/AKSW/Evalink
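A hypothetical sketch of the data such an interface needs to record (this is an assumed layout, not the actual EvalLink schema): each judgement stores the link, the verdict, and the time spent, from which the cost per link asked for in Q1 can be aggregated.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    CORRECT = "correct"
    INCORRECT = "incorrect"
    UNSURE = "unsure"


@dataclass
class Judgement:
    source_uri: str
    target_uri: str
    verdict: Verdict
    seconds: float            # time the annotator spent on this link


def cost_per_link(judgements):
    """Average annotation time in seconds per judged link."""
    return sum(j.seconds for j in judgements) / len(judgements)


sample = [
    Judgement("dbpedia:Leipzig", "lgd:Leipzig", Verdict.CORRECT, 12.0),
    Judgement("dbpedia:Berlin", "lgd:Bern", Verdict.INCORRECT, 31.5),
]
print(cost_per_link(sample))  # -> 21.75
```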
Empirical Study
Source                      Target                          Restrictions   Properties
DBpedia (18114 instances)   LinkedGeoData (979 instances)   Cities         Label, Latitude, Longitude
DBpedia (13429 instances)   LinkedMDB (628 instances)       -              Label, Film Director, ReleaseDate
DBpedia (5352 instances)    Drugbank (1652 instances)       Drug           Label, GenericName
Defined m = 3 tasks
Tasks submitted to n = 5 human annotators
Specifications for the sets of links to annotate were created manually
Evaluation: Costs/link
Cost per link in seconds:

          Task 1   Task 2   Task 3
User 1     36.8     23.0     10.2
User 2     21.5     18.8     20.4
User 3     12.3     39.4      9.8
User 4     10.9     11.3     34.6
User 5     38.9     43.8     44.7

High variance across users
Cost per link varies between roughly 10 and 40 seconds
Highly dependent on familiarity with the domain and the complexity of the data
Evaluation: Human Accuracy
Task 1
          Precision   Recall   F-Measure
User 1      0.81       0.98      0.89
User 2      0.83       1.00      0.91
User 3      0.74       0.90      0.81
User 4      0.81       0.98      0.88
User 5      0.82       0.99      0.90
Bias towards recall
F-measure varies between roughly 0.8 and 1
Variance due to familiarity with domain
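A minimal sketch of how the precision, recall, and F-measure figures can be computed from the links an annotator accepted versus a gold standard (the link sets below are illustrative, not the study's reference data):

```python
def precision_recall_f1(accepted_links, gold_links):
    """Evaluate accepted links against a gold standard of reference links."""
    accepted, gold = set(accepted_links), set(gold_links)
    true_positives = len(accepted & gold)
    precision = true_positives / len(accepted) if accepted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1


gold = {("dbpedia:Leipzig", "lgd:Leipzig"), ("dbpedia:Berlin", "lgd:Berlin")}
accepted = {("dbpedia:Leipzig", "lgd:Leipzig"), ("dbpedia:Bern", "lgd:Berlin")}
print(precision_recall_f1(accepted, gold))  # -> (0.5, 0.5, 0.5)
```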
Evaluation: Human Accuracy
Task 3
          Precision   Recall   F-Measure
User 1      0.97       0.99      0.98
User 2      0.96       0.98      0.97
User 3      0.94       0.96      0.95
User 4      0.93       0.95      0.94
User 5      0.91       0.93      0.92
Bias towards recall
F-measure varies between roughly 0.8 and 1
Variance due to familiarity with domain
Evaluation: Machine Accuracy
[Figure: F-measure (0 to 1) plotted against annotation cost in minutes (0 to 50) for GAL, GCAL, GBL, and the human baseline]
Three learning algorithms (batch learning, active learning, clustering-based active learning)
Evaluation: Machine Accuracy
[Figure: F-measure (0 to 1) plotted against annotation cost in minutes (0 to 60) for GAL, GCAL, GBL, and the human baseline]
Machines reach above-human performance
Conclusions
Preliminary results
High variance of annotation costs
Costs depend mostly on the evaluator and their familiarity with the domain (roughly 10 to 40 s per link)
Machines outperform humans even on small tasks
Future work
Extend experiments with larger crowd
Try other machine-learning approaches
Evaluate different interfaces
That’s all Folks!
Thank you!
Questions?
Axel Ngonga
University of Leipzig
AKSW Research Group
ngonga@informatik.uni-leipzig.de
http://limes.sf.net