The intensive growth of the Linked Open Data (LOD) cloud has spawned a web of data in which a multitude of data sources provides huge amounts of valuable information across different domains. When accessing and using Linked Data today, the challenging question is increasingly not whether relevant data is available, but where it can be found and how it is structured. Index structures therefore play an important role in making use of the information in the LOD cloud. In this talk I will address three aspects of Linked Data index structures: (1) a high-level view and categorization of index structures and how they can be queried and explored, (2) approaches for building index structures and the need to maintain them, and (3) example applications that benefit greatly from indices over Linked Data.
Making Use of the Linked Data Cloud: The Role of Index Structures
Institute for Web Science & Technologies – WeST
Thomas Gottron
March 20th, 2014
FGDB Frühjahrstreffen
Thomas Gottron, FGDB Frühjahrstreffen 20.3.2014: Role of Index Structures on LOD
Making Use of the Linked Data Cloud ...
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
LOD: a rich, huge, diverse, public and distributed knowledge base on the Web.
[Figure: the attributes rich knowledge base, diverse, public, huge, on the Web, and distributed, sorted into Pros and Cons]
Shall I?
Challenges Underlying the “Cons”
• Volume
• Semi-structured
• No schema
• No central access point
• Multitude of data sources
• Quality
• Dynamics
• Availability
20 years ago ...
Making Use of the World Wide Web... Shall I?
Source: Chris 73 / Wikimedia Commons
[Figure: the attributes rich document collection, diverse, public, huge, on the Internet, and distributed, sorted into Pros and Cons]
Technical solutions to the problems
Making Use of the Linked Data Cloud ... Shall I?
[Figure: the attributes rich knowledge base, diverse, public, huge, on the Web, and distributed, sorted into Pros and Cons]
Index structures provide solutions for the storage, management, and organization of, and access to, a rich, huge, diverse, distributed knowledge base on the Web.
Outline
(1) Types of Indices: search data structures for efficient storage and retrieval (keys k1, ..., kn mapped to data items)
(2) Building Indices: e.g., aggregating quads (s, p, o, c) into property sets and inverting them
(3) Using Indices: e.g., locating entities by rdf:type (foaf:Document, swrc:InProceedings), dc:creator, and dc:title
Data Format
• Linked Data as N-Quads (s, p, o, c):
  – triple (s, p, o): what is the information?
  – context URI c: where does it come from?
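As a minimal sketch of the quad structure, each N-Quads line can be split into its triple and its context URI. This assumes a deliberately simplified term syntax (IRIs, blank nodes, and plain double-quoted literals without escapes, language tags, or datatypes), hypothetical example URIs, and a mandatory context term, whereas the N-Quads specification treats the graph label as optional.

```python
import re
from typing import NamedTuple


class Quad(NamedTuple):
    """One N-Quads statement: a triple (s, p, o) plus its context URI c."""
    s: str  # subject
    p: str  # predicate
    o: str  # object
    c: str  # context: where does the triple come from?


# Simplified RDF term: an IRI <...>, a blank node _:label, or a plain
# "..." literal (no escapes, language tags, or datatypes).
TERM = r'(<[^>]*>|_:\S+|"[^"]*")'
QUAD_LINE = re.compile(rf'^\s*{TERM}\s+{TERM}\s+{TERM}\s+{TERM}\s*\.\s*$')


def parse_nquad(line: str) -> Quad:
    """Split one N-Quads line into its triple and context URI."""
    match = QUAD_LINE.match(line)
    if match is None:
        raise ValueError(f"not a quad in the simplified syntax: {line!r}")
    return Quad(*match.groups())


line = ('<http://example.org/Gottron> <http://xmlns.com/foaf/0.1/name> '
        '"Thomas Gottron" <http://example.org/source1> .')
print(parse_nquad(line))
```

Keeping the context as a fourth field is what later allows an index to answer "where does it come from?" alongside "what is the information?".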
Index Models
(Abstract) Index Models
• D: data elements to be retrieved (payload)
• K: key elements to access the data (index elements)
• σ: selection function, i.e., how to get the data for a key: σ: K → ℘(D)
[Figure: keys k1, ..., kn, each mapped by σ to a set of data items (payload) di,1, di,2, ...; a search data structure over the keys provides efficient storage and retrieval]
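The abstract model (D, K, σ) can be sketched directly as a mapping from keys to sets of data items. The helper names build_index, sigma, and keys_of below are hypothetical, chosen for illustration; keys_of states under which keys a data item is filed.

```python
from collections import defaultdict
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, TypeVar

D = TypeVar("D", bound=Hashable)  # payload type (data elements)
K = TypeVar("K", bound=Hashable)  # key type (index elements)


def build_index(data: Iterable[D],
                keys_of: Callable[[D], Iterable[K]]) -> Dict[K, FrozenSet[D]]:
    """Materialise sigma: K -> P(D) by listing, for each key, its data items."""
    postings = defaultdict(set)
    for d in data:
        for k in keys_of(d):
            postings[k].add(d)
    return {k: frozenset(ds) for k, ds in postings.items()}


def sigma(index: Dict[K, FrozenSet[D]], key: K) -> FrozenSet[D]:
    """Selection function: the (possibly empty) set of data items for a key."""
    return index.get(key, frozenset())


# Usage: file vocabulary terms under their namespace prefix.
index = build_index(["rdf:type", "foaf:name", "foaf:knows"],
                    keys_of=lambda term: [term.split(":")[0]])
print(sorted(sigma(index, "foaf")))
```

A dictionary stands in for the search data structure here; a real index would choose a structure suited to its key type and size.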
Concrete Example: Subject Based Index Model
Keys (subjects) → payload (triples):
ukob:Gottron → (ukob:Gottron, rdf:type, foaf:Person), (ukob:Gottron, foaf:knows, ukob:Staab), ...
ukob:Staab → (ukob:Staab, swrc:institution, ukob:WeST), (ukob:Staab, foaf:name, "Steffen Staab"), ...
ukob:Schegi → (ukob:Schegi, rdf:type, foaf:Person), (ukob:Schegi, foaf:name, "Stefan Scheglmann")
...
tud:CGottron → (tud:CGottron, swrc:institution, tud:KOM), (tud:CGottron, foaf:knows, ukob:Gottron), ...
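Applied to the example triples (leaving out the elided ones), a subject-based index model keys each triple by its subject, so that σ(ukob:Staab) returns exactly the two Staab triples. A minimal Python sketch:

```python
from collections import defaultdict

# Example triples from the slide (the "..." entries are left out).
triples = [
    ("ukob:Gottron", "rdf:type", "foaf:Person"),
    ("ukob:Gottron", "foaf:knows", "ukob:Staab"),
    ("ukob:Staab", "swrc:institution", "ukob:WeST"),
    ("ukob:Staab", "foaf:name", "Steffen Staab"),
    ("ukob:Schegi", "rdf:type", "foaf:Person"),
    ("ukob:Schegi", "foaf:name", "Stefan Scheglmann"),
    ("tud:CGottron", "swrc:institution", "tud:KOM"),
    ("tud:CGottron", "foaf:knows", "ukob:Gottron"),
]

# Subject-based index model: key = subject URI, payload = triples with that subject.
subject_index = defaultdict(list)
for s, p, o in triples:
    subject_index[s].append((s, p, o))

for triple in subject_index["ukob:Staab"]:
    print(triple)
```

Note that ukob:Gottron appears as an object in two triples but only its own two statements are filed under its key; an object-based model would file them differently.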
Schema-level Indices
Schema Information on the LOD Cloud
[Figure: from "(No) Schema?" to "Emerging Schema!", driven by guidelines / best practices, automatic tools, and social effects; the emerging schema can be induced from data observations]
Examples for Schema Information
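• Property Set: entity x with the properties p1, p2, p3 is indexed under the key {p1, p2, p3} (→ x, ...)
• Type Set: entity y with rdf:type cA and rdf:type cB is indexed under the key {cA, cB} (→ y, ...)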
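Both schema-level index models can be sketched in a few lines: collect each entity's property set or type set, then invert the mapping so that the set itself becomes the key. The toy triples below are hypothetical stand-ins for the slide's x and y, and excluding rdf:type from the property set is an assumption of this sketch.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"

# Hypothetical toy data mirroring the slide: x has the properties p1, p2, p3;
# y is typed with both cA and cB.
triples = [
    ("x", "p1", "o1"), ("x", "p2", "o2"), ("x", "p3", "o3"),
    ("y", RDF_TYPE, "cA"), ("y", RDF_TYPE, "cB"),
]

props = defaultdict(set)  # entity -> properties used (rdf:type excluded)
types = defaultdict(set)  # entity -> classes from rdf:type statements
for s, p, o in triples:
    if p == RDF_TYPE:
        types[s].add(o)
    else:
        props[s].add(p)


def invert(entity_to_set):
    """Make each (frozen) set a key that points to all entities sharing it."""
    index = defaultdict(set)
    for entity, members in entity_to_set.items():
        index[frozenset(members)].add(entity)
    return dict(index)


property_set_index = invert(props)
type_set_index = invert(types)
print(property_set_index)
print(type_set_index)
```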
Indexing “Styles” for the Payload
• Full Caching: (s, p, o, c)
• Triples: (s, p, o)
• Entities: s
• Data Sources: c
[Figure: for each style, the split between the payload held locally and the data left on the Web]
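One possible reading of the four styles, sketched in Python, is a projection that decides which part of a quad (s, p, o, c) the index stores as payload; anything not stored would have to be looked up on the Web. The Style enum and the payload function are hypothetical names for illustration.

```python
from enum import Enum
from typing import Tuple, Union

Quad = Tuple[str, str, str, str]  # (s, p, o, c)


class Style(Enum):
    FULL_CACHING = "full caching"
    TRIPLES = "triples"
    ENTITIES = "entities"
    DATA_SOURCES = "data sources"


def payload(quad: Quad, style: Style) -> Union[Quad, Tuple[str, str, str], str]:
    """Return the part of a quad that an index of the given style keeps."""
    s, p, o, c = quad
    if style is Style.FULL_CACHING:
        return (s, p, o, c)  # complete local copy, including provenance
    if style is Style.TRIPLES:
        return (s, p, o)     # statements only, without the context URI
    if style is Style.ENTITIES:
        return s             # entity URI; its description stays on the Web
    return c                 # data source URI; its content stays on the Web


quad = ("s1", "p1", "o1", "c1")
for style in Style:
    print(style.value, "->", payload(quad, style))
```

The further down the list, the smaller the local index, and the more work is shifted to fetching data from the Web at query time.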
Schema-based Access to the LOD cloud
[Figure: query graph: ?x with rdf:type foaf:Document and rdf:type swrc:InProceedings, linked via dc:creator to an entity of rdf:type fb:Computer_Scientist]
SELECT ?x
WHERE {
?x rdf:type foaf:Document .
?x rdf:type swrc:InProceedings .
?x dc:creator ?y .
?y rdf:type fb:Computer_Scientist
}
Schema-based Access to the LOD cloud
Schema-level index: Where?
• ACM
• DBLP
SELECT ?x
WHERE {
?x rdf:type foaf:Document .
?x rdf:type swrc:InProceedings .
?x dc:creator ?y .
?y rdf:type fb:Computer_Scientist
}
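A schema-level index can answer the “Where?” question for the type constraints on ?x by mapping each type set to the data sources (context URIs) containing entities with exactly that type set. The sketch below uses a hypothetical, hand-filled index with example.org sources, and it ignores the dc:creator part of the query, which would need further schema information about linked entities.

```python
# Hypothetical type-set index: type set -> data sources (context URIs)
# that contain at least one entity with exactly this type set.
type_set_index = {
    frozenset({"foaf:Document", "swrc:InProceedings"}): {"http://example.org/sourceA"},
    frozenset({"foaf:Document", "swrc:InProceedings", "foaf:Image"}): {"http://example.org/sourceB"},
    frozenset({"foaf:Document"}): {"http://example.org/sourceC"},
}


def where(query_types):
    """Return the data sources whose entities carry all query types."""
    wanted = frozenset(query_types)
    sources = set()
    for type_set, contexts in type_set_index.items():
        if wanted <= type_set:  # the entities have (at least) all requested types
            sources |= contexts
    return sources


print(sorted(where({"foaf:Document", "swrc:InProceedings"})))
```

Only the returned sources need to be contacted to evaluate the full query, instead of the whole LOD cloud.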
[Figure: outline diagram (Types of Indices, Building Indices, Using Indices), now focusing on Building Indices]
Index Construction
Building Indices: Operators
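• Combination of a few simple operations
  – Aggregate, Join, Invert
• Example: Property Set index
  Input quads (s, p, o, c):
  (s1, p1, o1, c1)
  (s1, p2, o1, c1)
  (s2, p2, o2, c1)
  (s3, p1, o3, c1)
  (s3, p2, o4, c1)
  (s4, p3, o1, c1)
  Aggregate (subject → property set):
  s1 → {p1, p2}
  s2 → {p2}
  s3 → {p1, p2}
  s4 → {p3}
  Invert (property set → subjects):
  {p1, p2} → s1, s3
  {p2} → s2
  {p3} → s4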
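The two steps of the Property Set example can be sketched as two small functions over the six quads listed above; Join is not needed for this particular index model. The function names are illustrative and not taken from the lod-index-models repository.

```python
from collections import defaultdict

# The six quads (s, p, o, c) from the example.
quads = [
    ("s1", "p1", "o1", "c1"),
    ("s1", "p2", "o1", "c1"),
    ("s2", "p2", "o2", "c1"),
    ("s3", "p1", "o3", "c1"),
    ("s3", "p2", "o4", "c1"),
    ("s4", "p3", "o1", "c1"),
]


def aggregate(quads):
    """Aggregate: group by subject and collect its property set."""
    props = defaultdict(set)
    for s, p, _o, _c in quads:
        props[s].add(p)
    return {s: frozenset(ps) for s, ps in props.items()}


def invert(subject_to_props):
    """Invert: turn subject -> property set into property set -> subjects."""
    index = defaultdict(set)
    for s, ps in subject_to_props.items():
        index[ps].add(s)
    return dict(index)


ps_index = invert(aggregate(quads))
for ps, subjects in sorted(ps_index.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(ps), "->", sorted(subjects))
```

Because both steps only need to see all quads of a subject together, they lend themselves to a single pass over data sorted or grouped by subject.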
12 Implemented Index Models
• Triple based
  – Subject → Triple
  – Predicate → Triple
  – Object → Triple
• Meta data
  – Keywords → Triple
  – Context → Triple
  – PLD (pay-level domain) → Triple
• Schema-level
  – RDF Type → Entity
  – Type set (TS) → Entity
  – Property set (PS) → Entity
  – Incoming property set (IPS) → Entity
  – Type and properties (ECS) → Entity
  – SchemEX → Entity
https://github.com/gottron/lod-index-models
Indices over Evolving Data
Index Maintenance
[Figure: the LOD cloud in 2007, 2008, 2009, 2010, and 2011]
Not just growth, but also deletion and modification of data
How to Measure Accuracy?
• Queries (SPARQL)?
  – No established query log for the data set
  – Different key elements require different queries
  – Would have to cover all of the index
• Distributions!
  – Relevant to several applications
  – Established metrics for comparison
Quantifying Divergence of Index Accuracy over Time
[Figure: index construction and estimation of distributions on data snapshots T0 (base), T1, T2, T3, ..., Tn-1, Tn; each later snapshot is compared against T0 to quantify the "deviation"]
Evolving Data: Normalised Perplexity
[Figure: normalised perplexity (0.0 to 1.0) plotted against the week of the data snapshot (0 to 70), in four panels: Subject, Predicate, Object; Context, Keywords, PLD; RDF Type, TS, PS, IPS; ECS, SchemEX]
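The slides do not spell out how the (normalised) perplexity is computed. As a hedged sketch of the underlying idea only: estimate a key distribution on the base snapshot T0, then compute the perplexity of a later snapshot's key distribution under it; the higher the value, the worse the T0 index fits the later data. The additive smoothing, the alpha parameter, and the absence of any normalisation are assumptions of this sketch, not the talk's definition.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def perplexity(base_keys: Iterable[Hashable], later_keys: Iterable[Hashable],
               alpha: float = 1.0) -> float:
    """Perplexity of the key distribution estimated on the base snapshot,
    evaluated on the key distribution of a later snapshot.

    Additive smoothing (alpha) over the union of keys avoids zero
    probabilities for keys that appear only in the later snapshot.
    """
    base, later = Counter(base_keys), Counter(later_keys)
    vocabulary = set(base) | set(later)
    base_total = sum(base.values()) + alpha * len(vocabulary)
    later_total = sum(later.values())
    cross_entropy = 0.0
    for key, count in later.items():
        p = (base[key] + alpha) / base_total  # smoothed base probability
        q = count / later_total               # observed later probability
        cross_entropy -= q * math.log2(p)
    return 2.0 ** cross_entropy


# Each character stands for one index key occurrence (e.g., a type set).
print(perplexity("abcd", "abcd"))  # unchanged uniform data
print(perplexity("aaaa", "bbbb"))  # the data moved to an unseen key
```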
Evolving Data: Normalised Perplexity (Zoom in)
[Figure: the same four panels with the normalised perplexity axis zoomed in to 0.00 to 0.10]
[Figure: outline diagram (Types of Indices, Building Indices, Using Indices), now focusing on Using Indices]
Programming Support
LITEQ and NPQL
• Support programming with Linked Data sources
• NPQL (Node Path Query Language)
  – Intensional queries → class descriptions, properties
  – Extensional queries → instance data
• LITEQ
  – Implementation of NPQL (F# type provider)
  – Autocompletion
LITEQ and NPQL
• RDF type and property navigation (intension)
dC.``http://example.org/ns#creature``
  .SubTypeNavigation.
[Autocompletion: ``http://example.org/ns#dog``, ``http://example.org/ns#cat``, ``http://example.org/ns#person``, ...]
LITEQ and NPQL
• RDF type and property navigation (intension)
dC.``http://example.org/ns#creature``
  .SubTypeNavigation.``http://example.org/ns#dog``
LITEQ and NPQL
• RDF type and property navigation (intension)
dC.``http://example.org/ns#creature``
  .SubTypeNavigation.``http://example.org/ns#dog``
  .PropNavigation.
[Autocompletion: ``http://example.org/ns#hasOwner``, ``http://example.org/ns#hasName``, ``http://example.org/ns#taxNumber``, ...]
LITEQ and NPQL
• RDF type and property navigation (intension)
dC.``http://example.org/ns#creature``
  .SubTypeNavigation.``http://example.org/ns#dog``
  .PropNavigation.``http://example.org/ns#hasOwner``
LITEQ and NPQL
• Accessing instances (extension)
let allDogs = dC.``http://example.org/ns#creature``
                .SubTypeNavigation.``http://example.org/ns#dog``
                .Extension
• Accessing individuals
let bello = dC.``http://example.org/ns#creature``
              .SubTypeNavigation.``http://example.org/ns#dog``
              .Individuals.``http://example.org/ns#bello``
              .getRdfObject
bello.get_hasName()
bello.get_taxNumber()
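LITEQ itself is an F# type provider, and the Python sketch below does not reproduce its API. It only illustrates the split NPQL makes between intensional navigation (subtypes and properties of a class) and extensional access (instances), over a hypothetical toy schema mirroring the creature/dog example, where ex: abbreviates http://example.org/ns#. Using direct superclass links and property domains as the schema source is an assumption of this sketch.

```python
# Hypothetical toy schema and data (ex: = http://example.org/ns#).
SUBCLASS_OF = {  # class -> direct superclass
    "ex:dog": "ex:creature",
    "ex:cat": "ex:creature",
    "ex:person": "ex:creature",
}
DOMAIN = {  # property -> class it describes
    "ex:hasOwner": "ex:dog",
    "ex:hasName": "ex:creature",
    "ex:taxNumber": "ex:dog",
}
INSTANCES = {"ex:bello": "ex:dog", "ex:tom": "ex:cat"}  # individual -> rdf:type


def ancestors(cls):
    """Yield cls and all of its superclasses."""
    while cls is not None:
        yield cls
        cls = SUBCLASS_OF.get(cls)


def subtypes(cls):
    """Intension: direct subclasses, i.e., what type navigation would offer."""
    return sorted(c for c, sup in SUBCLASS_OF.items() if sup == cls)


def properties(cls):
    """Intension: properties whose domain is cls or one of its superclasses."""
    scope = set(ancestors(cls))
    return sorted(p for p, d in DOMAIN.items() if d in scope)


def extension(cls):
    """Extension: individuals typed with cls or one of its subclasses."""
    return sorted(i for i, t in INSTANCES.items() if cls in ancestors(t))


print(subtypes("ex:creature"))
print(properties("ex:dog"))
print(extension("ex:dog"))
```

Intensional queries only touch schema-level information, which is why a schema-level index suffices to drive autocompletion; extensional queries need the instance data.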
Exploring Entity Descriptions
Schema-level Search of Relevant Data Sources
Searching for a Suitable Description
SELECT ?x
WHERE {
?x rdf:type foaf:Document
}
SELECT ?x
WHERE {
?x rdf:type foaf:Document .
?x rdf:type foaf:PersonalProfileDocument
}
SELECT ?x
WHERE {
?x rdf:type foaf:Document .
?x rdf:type sioc:Post .
}
Did you mean ...
Related Queries ...
So far: gentle, iterative modification
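The "Did you mean ..." and "Related Queries ..." suggestions modify the query gently, one step at a time. One way to generate such steps from a type-set index, sketched under stated assumptions, is to propose every indexed type set that differs from the query by adding or removing exactly one type, ranked by the number of entities with exactly that type set. The counts and the ranking rule are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

TypeSet = FrozenSet[str]

# Hypothetical type-set index: type set -> number of entities with exactly it.
TYPE_SET_COUNTS: Dict[TypeSet, int] = {
    frozenset({"foaf:Document"}): 120,
    frozenset({"foaf:Document", "foaf:PersonalProfileDocument"}): 80,
    frozenset({"foaf:Document", "sioc:Post"}): 45,
    frozenset({"sioc:Post"}): 10,
}


def related_queries(query: TypeSet) -> List[Tuple[TypeSet, int]]:
    """Suggest type sets that add or remove exactly one type of the query."""
    suggestions = []
    for type_set, count in TYPE_SET_COUNTS.items():
        if len(query ^ type_set) == 1:  # symmetric difference: one type changed
            suggestions.append((type_set, count))
    return sorted(suggestions, key=lambda item: -item[1])


for type_set, count in related_queries(frozenset({"foaf:Document"})):
    print(sorted(type_set), count)
```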
Parallel Indices Over the Data
[Figure: two parallel indices over the same data: a type set index (keys ts1, ..., tsn, each mapped to data items d1,1, d1,2, ...) and a property set index (keys psA, ..., psM, each mapped to data items dA,1, dA,2, ...)]
General Idea for Mapping
[Figure: starting from a description by types (c1, c2), derive its entity set via the type set index; approximate this entity set with entries of the property set index; derive an alternative description by properties (p3, p4, p5) from the approximate entity set]
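The mapping idea can be sketched with two hypothetical parallel indices over the same entities: look up the entity set of a type-based description, then pick the property set whose entity set approximates it best. The slides do not say how the approximation is scored, so the Jaccard similarity used here is an assumption, and this is not necessarily the method used in the talk.

```python
# Hypothetical parallel indices over the same entities.
TYPE_SET_INDEX = {
    frozenset({"c1", "c2"}): {"e1", "e2", "e3"},
    frozenset({"c1"}): {"e4"},
}
PROPERTY_SET_INDEX = {
    frozenset({"p3", "p4", "p5"}): {"e1", "e2", "e3", "e5"},
    frozenset({"p3"}): {"e4"},
    frozenset({"p4", "p6"}): {"e2"},
}


def entity_set(types):
    """Derive: all entities whose type set contains every requested type."""
    wanted = frozenset(types)
    return set().union(*(es for ts, es in TYPE_SET_INDEX.items() if wanted <= ts))


def jaccard(a, b):
    """Overlap of two entity sets (assumed similarity measure)."""
    return len(a & b) / len(a | b) if a | b else 1.0


def alternative_description(types):
    """Return (property set, its entity set) that best approximates the typed entities."""
    target = entity_set(types)
    return max(PROPERTY_SET_INDEX.items(), key=lambda kv: jaccard(kv[1], target))


description, approx = alternative_description({"c1", "c2"})
print(sorted(description), sorted(approx))
```

The approximate entity set may contain extra entities (here e5), which is exactly the trade-off an alternative description makes.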
Summary
[Figure: the Pros and Cons of the LOD cloud (rich knowledge base, diverse, public, huge, on the Web, distributed) next to an abstract index mapping keys k1, ..., kn to data items]
Technical solutions to some of the problems
Summary
[Figure: the Pros and Cons of the LOD cloud (rich knowledge base, diverse, public, huge, on the Web, distributed)]
Thank you!
Contact:
Thomas Gottron
WeST – Institute for Web Science and Technologies
Universität Koblenz-Landau
gottron@uni-koblenz.de
Sources
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/. This work is available under a CC-BY-SA license.
• WorldWideWeb Around Wikipedia – Wikipedia as part of the world wide web. This Wikipedia and Wikimedia Commons image is from the user Chris 73 and is freely available at //commons.wikimedia.org/wiki/File:WorldWideWebAroundWikipedia.png under the Creative Commons CC-BY-SA 3.0 license.