Focused Exploration of Geospatial Context on Linked Open Data 
Thomas Gottron, Johannes Schmitz, Stuart E. Middleton 
20 October 2014 
IESD workshop, Riva del Garda 
Thomas Gottron · Institute for Web Science and Technologies · University of Koblenz-Landau, Germany 
Challenge: Focused Exploration of LOD 
• Linked Data entities 
• (Semantic) link structure 
• "Relevant" entities 
• Seed entity 
Classification: Which links lead to relevant entities? 
Ranking: How probable is it that a link leads to a relevant entity? 
Use Cases: 
• Guided exploration 
• Focused LOD crawler 
Focused Exploration of Geospatial Context 
• Relevant entities: locations semantically related to seed entities 
[Figure: map with the example locations Rovereto and Bensheim (Germany)] 
Focused Exploration: Formalisation 
• E: set of entities (URIs) 
• R: set of RDF triples (s,p,o) 
– Restricted to s,o ∈ E 
• L ⊆ E: relevant entities 
– For us: locations with coordinates 
• Task: for a given s' and all (s',p,o) ∈ R 
– Classification: predict which o are in L 
– Ranking: sort the object entities o, starting from the one presumed most probable to be relevant 
[Figure: an entity s ∈ L with wgs84:lat 50.897 and wgs84:long -1.404] 
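The formalisation above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy triples, the `ex:` URIs and the helper names `entity_links`, `relevant_entities` and `candidates` are hypothetical. Only the `wgs84:lat` / `wgs84:long` predicates and the two coordinate values are taken from the slide.

```python
WGS84_LAT = "wgs84:lat"
WGS84_LONG = "wgs84:long"

# Toy data set (hypothetical URIs). Entities are strings, literals are floats.
triples = [
    ("ex:PlaceA", WGS84_LAT, 50.897),
    ("ex:PlaceA", WGS84_LONG, -1.404),
    ("ex:Seed", "ex:locatedIn", "ex:PlaceA"),
    ("ex:Seed", "ex:author", "ex:SomePerson"),
]


def entity_links(all_triples):
    """R: keep only triples whose object is an entity (URI), not a literal."""
    return [(s, p, o) for s, p, o in all_triples if isinstance(o, str)]


def relevant_entities(all_triples):
    """L: entities that carry both a wgs84:lat and a wgs84:long value."""
    with_lat = {s for s, p, _ in all_triples if p == WGS84_LAT}
    with_long = {s for s, p, _ in all_triples if p == WGS84_LONG}
    return with_lat & with_long


def candidates(links, seed):
    """All (p, o) pairs reachable from the seed entity s' in one hop."""
    return [(p, o) for s, p, o in links if s == seed]


R = entity_links(triples)
L = relevant_entities(triples)
# Ground truth for the classification task on the seed's outgoing links.
labels = {o: (o in L) for _, o in candidates(R, "ex:Seed")}
```

In a focused exploration setting the set `L` is of course not known in advance for the objects of a seed; the approaches on the following slides try to predict the values in `labels` from the link predicates alone.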
5 Approaches 
• Based on 3 paradigms: 
– Schema semantics (1 approach) 
– Supervised machine learning (2 approaches) 
– Information Retrieval inspired (2 approaches) 
Exploration based on Schema Semantics 
• Exploit rdfs:range definitions of link predicates 
• Follow links which lead to locations 
[Figure: dbponto:twinCity has rdfs:range dbpedia:City, which is an rdfs:subClassOf dbpedia:Place] 
Exploration based on Schema Semantics 
Classification 
• Is the range of any predicate pi a location? 
– If yes: label = relevant 
Ranking 
• Re-use classification: 
– Relevant before irrelevant 
[Figure: seed s linked to an object o via predicates p1, p2, ..., pm; is o a location?] 
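A minimal sketch of the schema-based approach might look as follows. The dictionaries `RANGE` and `SUBCLASS_OF` are assumptions standing in for the rdfs:range and rdfs:subClassOf statements of a real schema; only `dbponto:twinCity`, `dbpedia:City` and `dbpedia:Place` appear on the slides, all other names are made up. The sketch also assumes at most one superclass per class.

```python
# Hypothetical schema fragment (rdfs:range and rdfs:subClassOf statements).
RANGE = {
    "dbponto:twinCity": "dbpedia:City",
    "ex:author": "ex:Person",
}
SUBCLASS_OF = {
    "dbpedia:City": "dbpedia:Place",
}
LOCATION_CLASS = "dbpedia:Place"


def is_location_class(cls):
    """Walk rdfs:subClassOf upwards until the location class or the top."""
    seen = set()
    while cls is not None and cls not in seen:
        if cls == LOCATION_CLASS:
            return True
        seen.add(cls)
        cls = SUBCLASS_OF.get(cls)
    return False


def classify(predicates):
    """Relevant if the rdfs:range of any linking predicate is a location."""
    return any(is_location_class(RANGE.get(p)) for p in predicates)


def rank(links_by_object):
    """Re-use the classification: relevant objects first, irrelevant after."""
    return sorted(links_by_object, key=lambda o: not classify(links_by_object[o]))


# Objects of a seed entity, each with the predicates that link to it.
links = {
    "ex:SomePerson": ["ex:author"],
    "ex:Bensheim": ["dbponto:twinCity", "ex:unknownPredicate"],
}
ranking = rank(links)
```

Predicates without a declared range (such as `ex:unknownPredicate`) never trigger the label "relevant" in this sketch, so the approach can only find locations behind predicates whose range is declared in the schema.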
Supervised Machine Learning 
• Use incoming link predicates as features 
– Learn which predicates typically lead to locations 
• Train a classifier (e.g. Naive Bayes) 
• 2 variations: use all predicates or only the observed predicates 
[Figure: entities o and o' with incoming link predicates p2, p3, p4, p6; wgs84:lat and wgs84:long values mark a location] 
Supervised Machine Learning 
Classification 
• P(o ∈ L) > P(o ∉ L)? 
– If yes: label = relevant 
Ranking 
• Rank by odds: O(o ∈ L) = P(o ∈ L) / P(o ∉ L) 
[Figure: seed s linked to an object o via predicates p1, p2, ..., pm; is o a location?] 
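The Naive Bayes approach can be sketched as below. This is an illustration under assumptions, not the authors' code: it uses a Bernoulli model with Laplace smoothing, and it interprets the two variations as "absent predicates count as evidence" (`use_all=True`) versus "only predicates observed on the link count" (`use_all=False`). The training examples and `ex:` names are hypothetical.

```python
import math
from collections import Counter


def train(examples, alpha=1.0):
    """examples: list of (set_of_predicates, is_relevant) pairs."""
    vocab = set().union(*(preds for preds, _ in examples))
    n = {True: 0, False: 0}
    counts = {True: Counter(), False: Counter()}
    for preds, label in examples:
        n[label] += 1
        counts[label].update(preds)
    # P(predicate present | class), smoothed so it is never exactly 0 or 1.
    cond = {
        c: {p: (counts[c][p] + alpha) / (n[c] + 2 * alpha) for p in vocab}
        for c in (True, False)
    }
    prior = {c: (n[c] + alpha) / (len(examples) + 2 * alpha) for c in (True, False)}
    return vocab, prior, cond


def log_odds(model, preds, use_all=True):
    """log O(o in L) = log P(o in L) - log P(o not in L)."""
    vocab, prior, cond = model
    score = math.log(prior[True]) - math.log(prior[False])
    for p in vocab:  # predicates never seen in training are ignored
        if p in preds:
            score += math.log(cond[True][p]) - math.log(cond[False][p])
        elif use_all:
            # "all predicates" variant: absence of p is evidence as well
            score += math.log(1 - cond[True][p]) - math.log(1 - cond[False][p])
    return score


training = [
    ({"dbponto:twinCity"}, True),
    ({"dbponto:twinCity", "ex:birthPlace"}, True),
    ({"ex:author"}, False),
    ({"ex:author", "ex:editor"}, False),
]
model = train(training)
candidates = {"ex:A": {"dbponto:twinCity"}, "ex:B": {"ex:author"}}

# Classification: odds > 1, i.e. log-odds > 0. Ranking: sort by odds.
relevant = [o for o in candidates if log_odds(model, candidates[o]) > 0]
ranking = sorted(candidates, key=lambda o: log_odds(model, candidates[o]), reverse=True)
```

Working with log-odds instead of the odds themselves leaves the ranking unchanged, because the logarithm is monotonic, and avoids numerical underflow when many predicates are multiplied.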
IR Inspired Approaches 
• Discriminativeness of predicates (inspired by tf-idf) 
• Property relevance frequency: prf = c(p, L) 
• Inverse property frequency: ipf = log( c(∗,∗) / c(p,∗) ) 
• Combine into prf-ipf and prr-ipf 
– 2nd variation: prr is the normalised prf 
• Total score ρ: aggregate over all predicates 
[Figure: object o with an incoming predicate p3] 
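The following sketch shows one way to compute these scores. Several details are assumptions, since the slide does not spell them out: c(p, L) is read as the number of training links with predicate p whose object is in L, c(p,∗) as the number of links with predicate p, c(∗,∗) as the number of all links, prr as prf divided by c(p,∗), and the aggregation as a sum over the predicates that link to an object. Function names and `ex:` URIs are hypothetical.

```python
import math
from collections import Counter


def fit(links, L):
    """links: iterable of (s, p, o) training triples; L: relevant entities."""
    c_p = Counter()    # c(p, *): links using predicate p
    c_p_L = Counter()  # c(p, L): links using p whose object is relevant
    for _, p, o in links:
        c_p[p] += 1
        if o in L:
            c_p_L[p] += 1
    total = sum(c_p.values())  # c(*, *): all links
    weights = {}
    for p in c_p:
        ipf = math.log(total / c_p[p])
        prf = c_p_L[p]
        prr = prf / c_p[p]  # assumed normalisation of prf
        weights[p] = {"prf-ipf": prf * ipf, "prr-ipf": prr * ipf}
    return weights


def score(weights, predicates, variant="prr-ipf"):
    """rho(o): sum of the weights of all predicates linking to o."""
    return sum(weights[p][variant] for p in predicates if p in weights)


links = [
    ("ex:s1", "dbponto:twinCity", "ex:A"),
    ("ex:s2", "dbponto:twinCity", "ex:B"),
    ("ex:s1", "ex:author", "ex:C"),
    ("ex:s2", "ex:author", "ex:D"),
    ("ex:s3", "ex:author", "ex:E"),
    ("ex:s3", "ex:author", "ex:A"),
]
weights = fit(links, L={"ex:A", "ex:B"})
rho_twin = score(weights, ["dbponto:twinCity"])
rho_author = score(weights, ["ex:author"])
```

In this toy data `dbponto:twinCity` is both rare and always leads to a relevant object, so it receives a much higher weight than the frequent, mostly irrelevant `ex:author`.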
IR Inspired Approaches 
Classification 
• Determine threshold 
– Nearest centroid 
Ranking 
• Rank by score, e.g. ρ prr-ipf (o) 
[Figure: seed s linked to an object o via predicates p1, p2, ..., pm; is o a location?] 
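A nearest-centroid threshold on one-dimensional scores could be sketched like this. The assumption here is that each centroid is the mean score of the relevant or irrelevant training objects; with a single score dimension, "nearest centroid" then amounts to thresholding at the midpoint of the two means. The function names and the example scores are hypothetical.

```python
from statistics import mean


def fit_centroids(scores, labels):
    """Mean score of relevant and of irrelevant training objects."""
    pos = mean(s for s, y in zip(scores, labels) if y)
    neg = mean(s for s, y in zip(scores, labels) if not y)
    return pos, neg


def classify(score, pos, neg):
    """Assign the label of the nearer centroid (ties go to 'relevant')."""
    return abs(score - pos) <= abs(score - neg)


train_scores = [2.0, 4.0, 0.0, 1.0]
train_labels = [True, True, False, False]
pos, neg = fit_centroids(train_scores, train_labels)
threshold = (pos + neg) / 2  # equivalent decision boundary when pos > neg

# Ranking needs no threshold: sort the objects by descending score.
new_scores = {"ex:A": 2.0, "ex:B": 1.0}
ranking = sorted(new_scores, key=new_scores.get, reverse=True)
predicted = {o: classify(s, pos, neg) for o, s in new_scores.items()}
```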
Evaluation 
• Metrics: 
– Ranking: ROC curves, AUC 
– Classification: Precision, Recall, F1, Accuracy 
• Cross validation: 
– 10 times 10-fold 
– Averages 
[Figure: data set. Labels: Seed, Exploration, owl:sameAs. Numbers: 99,951 entities; 1,728,633 links; 425,338 entities; 128,171 relevant] 
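For reference, the metrics listed above can be computed as in the following sketch. It uses the standard definitions; the AUC is computed as the fraction of (relevant, irrelevant) pairs that are ordered correctly, which is simple but quadratic in the number of objects. The cross-validation loop is not shown, and the example predictions are made up.

```python
def classification_metrics(predicted, actual):
    """Precision, recall, F1 and accuracy from boolean label lists."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def auc(scores, actual):
    """Probability that a relevant object is scored above an irrelevant one."""
    pos = [s for s, a in zip(scores, actual) if a]
    neg = [s for s, a in zip(scores, actual) if not a]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


actual = [True, False, True, False]
metrics = classification_metrics([True, True, False, False], actual)
area = auc([0.9, 0.8, 0.3, 0.1], actual)
```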
Performance (Ranking) 
[Figure: ROC curves, with a zoomed inset of the top-left corner, for: random, Schema Semantics, NB (all predicates), NB (present predicates), prf-ipf, prr-ipf] 
Performance (Classification & Ranking) 
Table 2. Average performance of approaches († marks a significant improvement over the second-best method at confidence level ρ = 0.01) 
Method                    Recall    Precision  F1        Accuracy  AUC 
Schema Semantics          0.1188    0.8119     0.2073    0.7262    0.5552 
NB (all predicates)       0.9906    0.9491     † 0.9694  † 0.9812  0.9970 
NB (observed predicates)  0.9943    0.9436     0.9683    0.9804    0.9968 
prf-ipf                   0.8512    † 0.9754   0.9091    0.9487    0.9958 
prr-ipf                   † 0.9973  0.9240     0.9592    0.9745    0.9769 
Summary 
• Focused exploration feasible 
• ML approach performing best 
• Future work: 
– Other data sets 
– Generalise scenario (more than locations) 
– Better approaches using more features 
Questions? 
Thomas Gottron 
Institute for Web Science and Technologies 
Universität Koblenz-Landau 
gottron@uni-koblenz.de 
