Focused Exploration of Geospatial Context on Linked Open Data

Thomas Gottron, Johannes Schmitz, Stuart E. Middleton
20 October 2014
IESD workshop, Riva del Garda
Thomas Gottron, Institute for Web Science and Technologies · University of Koblenz-Landau, Germany

Challenge: Focused Exploration of LOD
• Linked Data entities
• (Semantic) link structure
• "Relevant" entities
• Seed entity
• Classification: Which links lead to relevant entities?
• Ranking: How probable is a link leading to a relevant entity?
• Use Cases:
– Guided exploration
– Focused LOD crawler

Focused exploration of Geospatial Context
• Relevant entities: Locations semantically related to seed entities
[Figure labels: Rovereto; Bensheim (Germany)]

Focused Exploration: Formalisation
• E: set of entities (URIs)
• R: set of RDF triples (s,p,o)
– Restricted to s,o ∈ E
• L⊆E: relevant entities
– For us: Locations with coordinates
• Task: for given s′ and all (s′,p,o) ∈ R
– Classification: Predict which o are in L
– Ranking: Sort object entities o starting from the one presumed most probable to be relevant
[Figure: example entity s ∈ L with wgs84:lat 50.897 and wgs84:long -1.404]
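The formalisation can be made concrete with a small sketch. It is only an illustration: the `ex:` URIs, the toy triples and the helper names `candidates` and `gold_labels` are assumptions and do not come from the slides.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    s: str  # subject URI
    p: str  # predicate URI
    o: str  # object URI


# Hypothetical toy data (not from the slides): R restricted to entity objects.
R = [
    Triple("ex:Southampton", "ex:twinCity", "ex:LeHavre"),
    Triple("ex:Southampton", "ex:mayor", "ex:SomePerson"),
    Triple("ex:Southampton", "ex:club", "ex:FootballClub"),
    Triple("ex:LeHavre", "ex:mayor", "ex:OtherPerson"),
]

# L: relevant entities, here the locations that carry coordinates.
L = {"ex:Southampton", "ex:LeHavre"}


def candidates(seed, triples):
    """Return (p, o) for every triple (seed, p, o) in R."""
    return [(t.p, t.o) for t in triples if t.s == seed]


def gold_labels(seed, triples, relevant):
    """Ground truth for the classification task: is each object o in L?"""
    return {o: (o in relevant) for _, o in candidates(seed, triples)}


labels = gold_labels("ex:Southampton", R, L)
```

The approaches on the following slides try to predict these labels, or a ranking consistent with them, without looking up the coordinates of o.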

5 Approaches
• Based on 3 paradigms:
– Schema semantics (1 approach)
– Supervised machine learning (2 approaches)
– Information Retrieval inspired (2 approaches)

Exploration based on Schema Semantics
• Exploit rdfs:range definitions of link predicates
– Example: dbponto:twinCity rdfs:range dbpedia:City; dbpedia:City rdfs:subClassOf dbpedia:Place
• Follow links which lead to locations

Exploration based on Schema Semantics
• Classification:
– Range of any pi is a location? → Label = relevant
• Ranking:
– Re-use classification: Relevant before irrelevant
[Figure: seed s linked to object o via predicates p1, p2, ..., pm; is o a location?]
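A minimal sketch of the schema-based rule, assuming the schema is available as two plain dictionaries. The twinCity/City/Place entries mirror the example on the previous slide; `ex:mayor`, `ex:Person` and all function names are hypothetical, and the single-parent subclass map is a simplification.

```python
# Schema fragment: predicate -> declared rdfs:range, class -> rdfs:subClassOf parent.
RANGE = {"dbponto:twinCity": "dbpedia:City", "ex:mayor": "ex:Person"}
SUBCLASS_OF = {"dbpedia:City": "dbpedia:Place"}
LOCATION_CLASS = "dbpedia:Place"  # assumed root class for locations


def is_location_class(cls):
    """Walk up the subclass chain and check whether it reaches the location class."""
    seen = set()
    while cls is not None and cls not in seen:
        if cls == LOCATION_CLASS:
            return True
        seen.add(cls)
        cls = SUBCLASS_OF.get(cls)
    return False


def classify(predicates):
    """Relevant if the range of any linking predicate is a location class."""
    return any(is_location_class(RANGE.get(p)) for p in predicates)


def rank(links):
    """links: object -> set of predicates. Relevant objects first, order otherwise kept."""
    return sorted(links, key=lambda o: not classify(links[o]))


links = {
    "ex:a": {"ex:mayor"},
    "ex:b": {"dbponto:twinCity"},
    "ex:c": {"ex:undeclared"},  # predicate without a range definition
}
ranking = rank(links)
```

Predicates without a range definition can never yield the label "relevant" under this rule.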

Supervised Machine Learning
• Use incoming link predicates as features
– Learn predicates which typically lead to locations
• Train a classifier (e.g. Naive Bayes)
• 2 Variations: use all predicates or only the observed ones
[Figure: entities o and o′ with incoming link predicates (p2, p3, p4, p6); o carries wgs84:lat and wgs84:long values]

Supervised Machine Learning
• Classification:
– P(o ∈ L) > P(o ∉ L)? → Label = relevant
• Ranking:
– Rank by odds: $O(o \in L) = \frac{P(o \in L)}{P(o \notin L)}$
[Figure: seed s linked to object o via predicates p1, p2, ..., pm; is o a location?]
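One way to realise this is a Bernoulli Naive Bayes over predicate presence, sketched below. The add-one smoothing, the toy training data and all names are assumptions rather than the authors' implementation. The `use_all_predicates` flag illustrates one possible reading of the two variations: score every known predicate (present or absent), or only those observed on the link.

```python
import math
from collections import Counter


def train(examples):
    """examples: list of (set of incoming predicates, is_relevant)."""
    n = {True: 0, False: 0}
    counts = {True: Counter(), False: Counter()}
    vocab = set()
    for preds, label in examples:
        n[label] += 1
        counts[label].update(preds)
        vocab |= preds
    return n, counts, vocab


def log_odds(preds, model, use_all_predicates=True):
    """log O(o in L); monotone in the odds, so it yields the same ranking."""
    n, counts, vocab = model
    score = math.log((n[True] + 1) / (n[False] + 1))  # smoothed prior odds
    features = vocab if use_all_predicates else (preds & vocab)
    for p in features:
        # Smoothed estimates of P(p present | class)
        q_rel = (counts[True][p] + 1) / (n[True] + 2)
        q_irr = (counts[False][p] + 1) / (n[False] + 2)
        if p in preds:
            score += math.log(q_rel / q_irr)
        else:
            score += math.log((1 - q_rel) / (1 - q_irr))
    return score


def classify(preds, model, **kw):
    """Relevant iff P(o in L) > P(o not in L), i.e. log-odds > 0."""
    return log_odds(preds, model, **kw) > 0


def rank(objects, model, **kw):
    """objects: object -> set of predicates; highest odds first."""
    return sorted(objects, key=lambda o: log_odds(objects[o], model, **kw), reverse=True)


# Hypothetical training data
model = train([
    ({"twinCity"}, True),
    ({"twinCity", "birthPlace"}, True),
    ({"author"}, False),
    ({"author", "genre"}, False),
])
```

For example, `classify({"twinCity"}, model)` labels the object as relevant under both variations, while `classify({"author"}, model)` does not.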

IR Inspired Approaches
• Discriminativeness of predicates (inspired by tf-idf)
• Property relevance frequency: $\mathrm{prf} = c(p, L)$
• Inverse property frequency: $\mathrm{ipf} = \log\left(\frac{c(*,*)}{c(p,*)}\right)$
• 2nd Variation: prr, a normalised prf
• Combine into prf-ipf and prr-ipf
• Total score ρ: aggregate over all predicates
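The scores can be sketched as follows. Several details are assumptions, because the slide does not spell them out: c(p, L) is read as the number of triples with predicate p whose object is in L, c(p, ∗) as the number of triples with predicate p, c(∗, ∗) as the total number of triples, prf and ipf are combined by multiplication (by analogy with tf-idf), the aggregate is a sum, and prr is normalised as c(p, L) / c(p, ∗).

```python
import math
from collections import Counter


def predicate_stats(triples, relevant):
    """Count c(p, L), c(p, *) and c(*, *) from (s, p, o) training triples."""
    c_p_L = Counter()
    c_p_any = Counter()
    for _, p, o in triples:
        c_p_any[p] += 1
        if o in relevant:
            c_p_L[p] += 1
    return c_p_L, c_p_any, len(triples)


def score(preds, stats, normalised=False):
    """Total score rho of an object, given the predicates linking to it."""
    c_p_L, c_p_any, total = stats
    rho = 0.0
    for p in preds:
        if c_p_any[p] == 0:
            continue  # predicate unseen in training contributes nothing
        prf = c_p_L[p]
        if normalised:
            prf = prf / c_p_any[p]  # prr under the ASSUMED normalisation
        ipf = math.log(total / c_p_any[p])
        rho += prf * ipf
    return rho


# Hypothetical toy triples
triples = [
    ("s1", "twinCity", "a"), ("s2", "twinCity", "b"),
    ("s1", "author", "x"), ("s2", "author", "y"), ("s3", "author", "z"),
    ("s3", "genre", "g"),
]
stats = predicate_stats(triples, relevant={"a", "b"})
```

Here `score({"twinCity"}, stats)` is positive while `score({"author"}, stats)` is zero, since no `author` link in the toy data leads to a relevant entity.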

IR Inspired Approaches
• Classification:
– Determine threshold
– Nearest centroid
• Ranking:
– Rank by score, e.g. ρ_prr-ipf(o)
[Figure: seed s linked to object o via predicates p1, p2, ..., pm; is o a location?]
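With a single score per object, a nearest-centroid rule reduces to a threshold halfway between the mean score of the relevant and the mean score of the irrelevant training objects. The sketch below assumes exactly that reading, and assumes the relevant centroid is the higher one; the function names and numbers are illustrative.

```python
from statistics import mean


def centroid_threshold(scores, labels):
    """Midpoint between the two class centroids of one-dimensional scores."""
    rel = [s for s, y in zip(scores, labels) if y]
    irr = [s for s, y in zip(scores, labels) if not y]
    return (mean(rel) + mean(irr)) / 2


def classify(score, threshold):
    """Relevant if the score lies closer to the (higher) relevant centroid."""
    return score > threshold


# Hypothetical training scores and labels
threshold = centroid_threshold([0.0, 1.0, 4.0, 5.0], [False, False, True, True])
```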

Evaluation
• Metrics:
– Ranking:
• ROC curves
• AUC
– Classification:
• Precision
• Recall
• F1
• Accuracy
• Cross validation:
– 10-times / 10-fold
– Averages
• Data set:
– Seed: 99,951 entities
– 1,728,633 links
– Exploration: 425,338 entities, 128,171 relevant
[Figure label: owl:sameAs]
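For reference, the listed metrics can be computed as in the following sketch. It uses the standard definitions, with AUC taken as the probability that a relevant object is scored above an irrelevant one (ties count half); the function names and example values are illustrative.

```python
def classification_metrics(y_true, y_pred):
    """Precision, recall, F1 and accuracy from boolean labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison (needs both classes)."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example values
metrics = classification_metrics([True, True, False, False], [True, False, True, False])
area = auc([True, True, False, False], [0.9, 0.4, 0.5, 0.1])
```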

Performance (Ranking)
[Figure: ROC curves, with a zoomed inset, for random, Schema Semantics, NB (all predicates), NB (present predicates), prf-ipf and prr-ipf]

Performance (Classification & Ranking)
Table 2. Average performance of approaches († marks a significant improvement over the second-best method at confidence level 0.01)
Method                    Recall    Precision  F1        Accuracy  AUC
Schema Semantics          0.1188    0.8119     0.2073    0.7262    0.5552
NB (all predicates)       0.9906    0.9491     †0.9694   †0.9812   0.9970
NB (observed predicates)  0.9943    0.9436     0.9683    0.9804    0.9968
prf-ipf                   0.8512    †0.9754    0.9091    0.9487    0.9958
prr-ipf                   †0.9973   0.9240     0.9592    0.9745    0.9769
The aggregated … basically confirm the observations made above.

Summary
• Focused exploration feasible
• ML approach performing best
• Future work:
– Other data sets
– Generalise scenario (more than locations)
– Better approaches using more features

Questions?
Thomas Gottron
Institute for Web Science and Technologies
Universität Koblenz-Landau
gottron@uni-koblenz.de