Talk at the IESD 2014 workshop in Riva del Garda (co-located with ISWC 2014).
Abstract: The Linked Open Data cloud provides a wide range of different types of information which are interlinked and connected. When a user or application is interested in specific types of information under time constraints, it is best to explore this vast knowledge network in a focused and directed way. In this paper we address the novel task of focused exploration of Linked Open Data for geospatial resources, helping journalists in real time during breaking news stories to find contextual geospatial information related to geoparsed content. After formalising the task of focused exploration, we present and evaluate five approaches based on three different paradigms. Our results on a dataset with 425,338 entities show that focused exploration on the Linked Data cloud is feasible and can be implemented with very high accuracy of more than 98%.
Focused Exploration of Geospatial Context on Linked Open Data
1. Focused Exploration of Geospatial Context on Linked Open Data
Thomas Gottron, Johannes Schmitz, Stuart E. Middleton
20 October 2014
IESD workshop, Riva del Garda
Thomas Gottron · Institute for Web Science and Technologies · University of Koblenz-Landau, Germany
3. Challenge: Focused Exploration of LOD
• Linked Data entities
• (Semantic) link structure
4. Challenge: Focused Exploration of LOD
• Linked Data entities
• (Semantic) link structure
• “Relevant” entities
5. Challenge: Focused Exploration of LOD
• Linked Data entities
• (Semantic) link structure
• “Relevant” entities
• Seed entity
6. Challenge: Focused Exploration of LOD
• Linked Data entities
• (Semantic) link structure
• “Relevant” entities
• Seed entity
• Classification: Which links lead to relevant entities?
• Ranking: How probable is it that a link leads to a relevant entity?
• Use cases:
– Guided exploration
– Focused LOD crawler
7. Focused exploration of Geospatial Context
• Relevant entities: locations semantically related to seed entities
[Figure: example locations Rovereto and Bensheim (Germany)]
8. Focused Exploration: Formalisation
• E: set of entities (URIs)
• R: set of RDF triples (s,p,o)
– Restricted to s,o ∈ E
• L ⊆ E: relevant entities
– For us: locations with coordinates (wgs84:lat, wgs84:long)
• Task: for a given s′ and all (s′,p,o) ∈ R
– Classification: predict which o are in L
– Ranking: sort the object entities o, starting from the one presumed most likely to be relevant
[Figure: example entity s ∈ L with wgs84:lat 50.897 and wgs84:long -1.404]
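A minimal Python sketch of the setting above, assuming triples are plain (s, p, o) string tuples and that an entity belongs to L exactly when it has both a wgs84:lat and a wgs84:long value. The ex: identifiers and the helper names relevant_entities and candidate_objects are hypothetical; the gold labels built at the end are what the classification and ranking tasks try to predict.

```python
# Sketch of the task setting; all ex: identifiers are hypothetical.
Triple = tuple[str, str, str]  # (s, p, o)

WGS84_LAT = "wgs84:lat"
WGS84_LONG = "wgs84:long"


def relevant_entities(triples: list[Triple]) -> set[str]:
    """L: entities that have both a wgs84:lat and a wgs84:long value (assumed criterion)."""
    has_lat = {s for s, p, _ in triples if p == WGS84_LAT}
    has_long = {s for s, p, _ in triples if p == WGS84_LONG}
    return has_lat & has_long


def candidate_objects(triples: list[Triple], seed: str, entities: set[str]) -> set[str]:
    """All o with (seed, p, o) in R, keeping only links between entities in E."""
    return {o for s, _, o in triples if s == seed and o in entities}


triples = [
    ("ex:Seed", "ex:twinCity", "ex:TownA"),
    ("ex:Seed", "ex:knownFor", "ex:PersonB"),
    ("ex:TownA", WGS84_LAT, "50.897"),
    ("ex:TownA", WGS84_LONG, "-1.404"),
]
E = {"ex:Seed", "ex:TownA", "ex:PersonB"}
L = relevant_entities(triples)
# Gold labels for the candidates reachable from the seed s'.
gold = {o: o in L for o in candidate_objects(triples, "ex:Seed", E)}
```

Literal coordinate values are not in E, so the coordinate triples only serve to define L and never appear as candidates.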
9. 5 Approaches
• Based on 3 paradigms:
– Schema semantics (1 approach)
– Supervised machine learning (2 approaches)
– Information Retrieval inspired (2 approaches)
10. Exploration based on Schema Semantics
• Exploit rdfs:range definitions of link predicates
[Figure: dbponto:twinCity –rdfs:range→ dbpedia:City –rdfs:subClassOf→ dbpedia:Place]
• Follow links which lead to locations
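The range check can be sketched as follows, assuming the schema is available as two plain dictionaries (predicate → declared rdfs:range classes, class → direct rdfs:subClassOf parents) and that dbpedia:Place is the class standing for "location". The schema fragment reproduces the example on this slide; the function names are hypothetical. In this sketch a predicate without a declared rdfs:range is never followed.

```python
def superclasses(cls: str, subclass_of: dict[str, set[str]]) -> set[str]:
    """cls plus every class reachable via rdfs:subClassOf (reflexive-transitive closure)."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def leads_to_location(predicate: str,
                      ranges: dict[str, set[str]],
                      subclass_of: dict[str, set[str]],
                      location_class: str = "dbpedia:Place") -> bool:
    """True if a declared rdfs:range of the predicate is location_class or one of its subclasses."""
    return any(location_class in superclasses(r, subclass_of)
               for r in ranges.get(predicate, set()))


# Schema fragment mirroring the slide's example.
ranges = {"dbponto:twinCity": {"dbpedia:City"}}
subclass_of = {"dbpedia:City": {"dbpedia:Place"}}
```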
11. Exploration based on Schema Semantics
Classification
• Range of any p_i is a location? → Label = relevant
Ranking
• Re-use classification:
– Relevant before irrelevant
[Figure: object o reached from s via link predicates p_1, p_2, …, p_m; is o a location?]
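Classification and ranking with schema semantics, sketched under the assumption that the set of location-ranged predicates has already been determined (for instance by an rdfs:range check) and that each candidate o comes with the set of predicates linking to it. All names and ex: identifiers are hypothetical. Because the ranking only re-uses the binary label, all relevant objects tie with each other, and so do all irrelevant ones.

```python
def is_relevant(incoming: set[str], location_predicates: set[str]) -> bool:
    """Classification: relevant if the range of any incoming predicate p_i is a location."""
    return not incoming.isdisjoint(location_predicates)


def rank(objects: dict[str, set[str]], location_predicates: set[str]) -> list[str]:
    """Ranking: relevant objects before irrelevant ones.

    sorted() is stable, so objects keep their input order within each group.
    """
    return sorted(objects, key=lambda o: not is_relevant(objects[o], location_predicates))


# Hypothetical input: location-ranged predicates and incoming predicates per candidate.
location_predicates = {"dbponto:twinCity"}
objects = {
    "ex:PersonB": {"ex:knownFor"},
    "ex:TownA": {"dbponto:twinCity", "ex:knownFor"},
}
```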
12. Supervised Machine Learning
• Use incoming link predicates as features
– Learn which predicates typically lead to locations
• Train a classifier (e.g. Naive Bayes)
• 2 variations: use all or only observed predicates
[Figure: entity o with incoming predicates p2, p3, p4, p6 as features; its wgs84:lat and wgs84:long values mark it as a location]
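A hedged sketch of a Naive Bayes classifier over incoming predicates. How the two variations differ is our assumption: "all predicates" is modelled as a Bernoulli model in which every known predicate contributes (present or absent), while "observed predicates" lets only the predicates actually linking to o contribute. Laplace smoothing, the class name PredicateNB and the ex: predicates are illustrative choices, not taken from the talk.

```python
import math
from collections import Counter


class PredicateNB:
    """Naive Bayes over incoming predicates (Bernoulli-style, Laplace smoothing)."""

    def __init__(self, use_all_predicates: bool = True):
        # True: every known predicate contributes, present or absent.
        # False: only the predicates observed on the object contribute.
        self.use_all = use_all_predicates

    def fit(self, examples: list[tuple[set[str], bool]]) -> "PredicateNB":
        self.vocab = set().union(*(preds for preds, _ in examples))
        self.n = Counter(label for _, label in examples)    # class sizes
        self.df = {True: Counter(), False: Counter()}       # predicate counts per class
        for preds, label in examples:
            self.df[label].update(preds)
        return self

    def log_joint(self, preds: set[str], label: bool) -> float:
        """log P(label) + sum of log P(feature | label)."""
        n = self.n[label]
        lp = math.log((n + 1) / (sum(self.n.values()) + 2))  # smoothed prior
        features = self.vocab if self.use_all else preds & self.vocab
        for p in features:
            theta = (self.df[label][p] + 1) / (n + 2)      # P(p present | label)
            lp += math.log(theta if p in preds else 1.0 - theta)
        return lp

    def prob_relevant(self, preds: set[str]) -> float:
        """P(o in L | incoming predicates), normalised over the two classes."""
        a, b = self.log_joint(preds, True), self.log_joint(preds, False)
        m = max(a, b)  # subtract the max for numerical stability
        return math.exp(a - m) / (math.exp(a - m) + math.exp(b - m))


# Hypothetical training data: incoming predicates of each object, and whether it is in L.
train = [
    ({"ex:birthPlace", "ex:locatedIn"}, True),
    ({"ex:locatedIn"}, True),
    ({"ex:author"}, False),
    ({"ex:author", "ex:influencedBy"}, False),
]
nb_all = PredicateNB(use_all_predicates=True).fit(train)
nb_observed = PredicateNB(use_all_predicates=False).fit(train)
```

On this toy data, an object reached only via ex:locatedIn gets P(o ∈ L) = 0.9 with all predicates and 0.75 with observed predicates only, since the absence of the "non-location" predicates adds evidence in the first variant.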
13. Supervised Machine Learning
Classification
• P(o ∈ L) > P(o ∉ L)? → Label = relevant
Ranking
• Rank by odds: $O(o \in L) = \frac{P(o \in L)}{P(o \notin L)}$
[Figure: object o reached from s via link predicates p_1, p_2, …, p_m; is o a location?]
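The classification rule and the odds ranking can be sketched as below, assuming a classifier already supplies P(o ∈ L) for each candidate and that P(o ∉ L) = 1 − P(o ∈ L). The posteriors and names are hypothetical. Odds > 1 is the same test as P(o ∈ L) > P(o ∉ L), and since the odds grow monotonically with P(o ∈ L), ranking by odds orders candidates exactly as ranking by the posterior would.

```python
import math


def odds(p_relevant: float) -> float:
    """O(o in L) = P(o in L) / P(o not in L), with P(o not in L) = 1 - P(o in L)."""
    return math.inf if p_relevant >= 1.0 else p_relevant / (1.0 - p_relevant)


def classify_and_rank(probs: dict[str, float]) -> tuple[set[str], list[str]]:
    """Label o relevant if its odds exceed 1; rank all candidates by decreasing odds."""
    relevant = {o for o, p in probs.items() if odds(p) > 1.0}
    ranking = sorted(probs, key=lambda o: odds(probs[o]), reverse=True)
    return relevant, ranking


# Hypothetical posteriors, e.g. from a trained classifier.
probs = {"ex:PersonB": 0.1, "ex:TownA": 0.9, "ex:RiverC": 0.6}
relevant, ranking = classify_and_rank(probs)
```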
14. IR Inspired Approaches
• Discriminativeness of predicates (inspired by tf-idf)
• Property relevance frequency: $\mathrm{prf}(p) = c(p, L)$
• Inverse property frequency: $\mathrm{ipf}(p) = \log \frac{c(*,*)}{c(p,*)}$
• Combine into prf-ipf and prr-ipf
– 2nd variation: prr, a normalised prf
• Total score ρ: aggregate over all predicates
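A sketch of the tf-idf-style predicate weights, resting on three assumptions not spelled out on the slide: c(p,*) counts links with predicate p, c(p,L) counts those whose object is in L, and c(*,*) counts all links; prr is taken to be prf normalised by c(p,*); and the total score ρ of an object sums the weights of its incoming predicates. Function names and ex: data are hypothetical.

```python
import math
from collections import Counter


def predicate_weights(triples: list[tuple[str, str, str]], L: set[str],
                      normalise: bool = False) -> dict[str, float]:
    """prf-ipf weight per predicate, or prr-ipf if normalise=True (assumed definitions)."""
    c_p = Counter(p for _, p, _ in triples)               # c(p, *)
    c_pL = Counter(p for _, p, o in triples if o in L)    # c(p, L)
    total = sum(c_p.values())                             # c(*, *)
    weights = {}
    for p, n in c_p.items():
        rel = c_pL[p] / n if normalise else c_pL[p]       # prr (assumed) or prf
        weights[p] = rel * math.log(total / n)            # multiplied by ipf
    return weights


def rho(incoming: list[str], weights: dict[str, float]) -> float:
    """Total score of an object: sum over its incoming predicates (assumed aggregation)."""
    return sum(weights.get(p, 0.0) for p in incoming)


train = [
    ("ex:a", "ex:locatedIn", "ex:TownA"),
    ("ex:b", "ex:locatedIn", "ex:TownB"),
    ("ex:c", "ex:author", "ex:PersonB"),
    ("ex:d", "ex:author", "ex:PersonC"),
    ("ex:e", "ex:mentions", "ex:TownA"),
    ("ex:f", "ex:mentions", "ex:PersonB"),
]
L = {"ex:TownA", "ex:TownB"}
w_prf_ipf = predicate_weights(train, L)
w_prr_ipf = predicate_weights(train, L, normalise=True)
```

A predicate that never points to a location (ex:author here) gets weight 0, and one that always does (ex:locatedIn) gets the highest weight.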
15. IR Inspired Approaches
Classification
• Determine threshold
– Nearest centroid
Ranking
• Rank by score $\rho_{\text{prr-ipf}}(o)$
[Figure: object o reached from s via link predicates p_1, p_2, …, p_m; is o a location?]
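With a single feature (the score ρ), a nearest-centroid classifier reduces to a threshold at the midpoint between the mean score of relevant and of irrelevant training objects. The sketch below assumes relevant objects score higher on average; the training scores and function names are hypothetical.

```python
from statistics import mean


def nearest_centroid_threshold(relevant_scores: list[float],
                               irrelevant_scores: list[float]) -> float:
    """1-D nearest centroid: the decision boundary is the midpoint of the two class means."""
    return (mean(relevant_scores) + mean(irrelevant_scores)) / 2


def classify(score: float, threshold: float) -> bool:
    # Above the midpoint means closer to the (assumed higher) relevant centroid.
    return score > threshold


def rank_by_score(scores: dict[str, float]) -> list[str]:
    """Ranking: highest score first."""
    return sorted(scores, key=scores.get, reverse=True)


threshold = nearest_centroid_threshold([2.0, 3.0, 1.6], [0.0, 0.2, 0.4])
```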
17. Performance (Ranking)
[Figure: ROC curves, with a zoomed inset of the top-left corner, for random, Schema Semantics, NB (all predicates), NB (present predicates), prf-ipf and prr-ipf]
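The area under a ROC curve can be computed without drawing the curve, using its pairwise (Mann-Whitney) interpretation: the fraction of (relevant, irrelevant) pairs in which the relevant object receives the higher score, with ties counting one half. A random ranking scores about 0.5, the diagonal in the plot. This quadratic version is a sketch for small inputs; the function name is hypothetical.

```python
def auc(relevant_scores: list[float], irrelevant_scores: list[float]) -> float:
    """ROC AUC as the fraction of correctly ordered (relevant, irrelevant) pairs; ties count 1/2."""
    correct = 0.0
    for r in relevant_scores:
        for i in irrelevant_scores:
            if r > i:
                correct += 1.0
            elif r == i:
                correct += 0.5
    return correct / (len(relevant_scores) * len(irrelevant_scores))
```

For example, relevant scores [0.9, 0.8] against irrelevant scores [0.1, 0.85] give 3 of 4 correctly ordered pairs, i.e. an AUC of 0.75.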
18. Performance (Classification & Ranking)
Table 2. Average performance of the approaches († marks a significant improvement over the second-best method at significance level 0.01)
Method                     Recall   Precision  F1       Accuracy  AUC
Schema Semantics           0.1188   0.8119     0.2073   0.7262    0.5552
NB (all predicates)        0.9906   0.9491     0.9694†  0.9812†   0.9970
NB (observed predicates)   0.9943   0.9436     0.9683   0.9804    0.9968
prf-ipf                    0.8512   0.9754†    0.9091   0.9487    0.9958
prr-ipf                    0.9973†  0.9240     0.9592   0.9745    0.9769
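For reference, the classification columns follow the standard definitions, sketched below from confusion-matrix counts with "relevant" as the positive class (function names are hypothetical). F1 is the harmonic mean of precision and recall, which is why the Schema Semantics row, with high precision but very low recall, ends up with a low F1: its precision of 0.8119 and recall of 0.1188 are consistent with the reported 0.2073.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Precision, recall, F1 and accuracy from a confusion matrix (relevant = positive)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1(precision, recall),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }
```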
19. Summary
• Focused exploration feasible
• Supervised ML (Naive Bayes) performs best overall
• Future work:
– Other data sets
– Generalise scenario (more than locations)
– Better approaches using more features
20. Questions?
Thomas Gottron
Institute for Web Science and Technologies
Universität Koblenz-Landau
gottron@uni-koblenz.de