The document discusses generating high-quality Linked Open Data using the RDF Mapping Language (RML). RML allows for the uniform and declarative generation of RDF from heterogeneous data sources through mapping rules. It supports assessing mapping quality to identify issues before data is generated. Metadata can also be automatically generated from the mappings. The document emphasizes that non-technical data specialists should be able to easily edit the mappings over time.
1. RML.io
Generating High Quality
Linked Open Data
from Open or Not Data
Anastasia Dimou
Data Science Lab, Ghent University - iMinds
anastasia.dimou@ugent.be
@natadimou
4. Are you the owner of your data?
OR
is the application that hosts your data?
5. The Semantic Web
is the extension of the World Wide Web
enables sharing content beyond
the boundaries of applications & websites
6. The Web for humans, thanks to HTML,
is understandable & constant
BUT
is the Web for machines too?
7. The Semantic Web
is the extension of the World Wide Web
enables sharing content beyond
the boundaries of applications & websites
allows machines to understand the
meaning of hyperlinked information
10. Linked (Open) Data
a standardized way of
expressing the relationships between data
semantically annotates the data
with different vocabularies or ontologies
describe domain-level knowledge
understandable by humans & machines
13. Resource Description Framework (RDF)
is the prevalent data model
for describing Linked (Open) Data
driven by unique identifiers (URIs)
allows establishing a shared meaning
subject → predicate → object
15. How is Linked Data derived
from (semi-)structured data?
id firstname lastname lab city
1 Anastasia Dimou DSLab Ghent
2 Ruben Verborgh DSLab Ghent
3 Erik Mannens DSLab Ghent
16. Person 1 → works → Data Science Lab
Person 1 → “Anastasia Dimou”
Person 2 → works → Data Science Lab
Person 2 → “Ruben Verborgh”
Person 3 → works → Data Science Lab
Person 3 → “Erik Mannens”
Data Science Lab → located → Ghent
21. Sets of triples of a dataset have repetitive patterns
ex:{id} → ex:{lab}
ex:{id} → “{firstname} {lastname}”
ex:{lab} → ex:located → ex:{city}
RDF dataset generation tools
base their implementation on repeatedly
applying those patterns to the input data
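The repetitive patterns above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular tool works; the predicate names ex:works and ex:name are assumptions (the slides only name ex:located), and the rows are copied from the example table.

```python
# Rows from the example table (id, firstname, lastname, lab, city).
rows = [
    {"id": "1", "firstname": "Anastasia", "lastname": "Dimou", "lab": "DSLab", "city": "Ghent"},
    {"id": "2", "firstname": "Ruben", "lastname": "Verborgh", "lab": "DSLab", "city": "Ghent"},
    {"id": "3", "firstname": "Erik", "lastname": "Mannens", "lab": "DSLab", "city": "Ghent"},
]

# Triple patterns with {column} placeholders.
# ex:works and ex:name are assumed predicate names, used for illustration only.
PATTERNS = [
    ("ex:{id}", "ex:works", "ex:{lab}"),
    ("ex:{id}", "ex:name", '"{firstname} {lastname}"'),
    ("ex:{lab}", "ex:located", "ex:{city}"),
]


def generate_triples(rows, patterns):
    """Fill every pattern with every row; a set removes repeated triples."""
    triples = set()
    for row in rows:
        for pattern in patterns:
            triples.add(tuple(term.format(**row) for term in pattern))
    return triples


triples = generate_triples(rows, PATTERNS)
for s, p, o in sorted(triples):
    print(f"{s} {p} {o} .")
```

Because all three persons share the same lab and city, the ex:located pattern yields a single triple after deduplication.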
22. What are the different
Linked Data Generation approaches?
24. R2RML mappings R2RML processor
Data OWNER / PUBLISHER
defines
RDF
DB CSV JSON XML
RDF RDF RDF
25. RDF Terms (focusing on IRIs) are…
generated independently
disregarding their possible prior definitions
manually replicated
by reconstructing the same URIs (if possible)
manually aligned afterwards
links with other datasets are defined after
the RDF terms are published
27. Uniform and declarative RDF generation
from heterogeneous data sources
mappings processor
Data OWNER / PUBLISHER
defines
RDF
DB CSV JSON XML RDF
28. RDF Mapping Language (RML)
generic scalable mapping language
for generating and interlinking
RDF data from heterogeneous resources
in an integrable and interoperable fashion
superset of the W3C standardized
R2RML mapping language
http://rml.io
29. Uniform and declarative RDF generation
from heterogeneous data sources
RML mappings processor
Data OWNER / PUBLISHER
defines
RDF
DB CSV JSON XML RDF
30. Defining Mappings to generate Linked Data
Retrieving Input Data
Assessing Quality
Generating Metadata
Editing Mappings
32. RML describes
how to generate RDF
from structured data
subject → predicate → object
Subject Map → Predicate Map → Object Map
<#TriplesMap>
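The relationship between a Triples Map and its Subject, Predicate and Object Maps can be modelled roughly as follows. This is a simplified Python sketch, not RML syntax and not an RML processor; the class and field names are illustrative, and ex:works is an assumed predicate.

```python
from dataclasses import dataclass, field


@dataclass
class TermMap:
    """Produces one RDF term per record, from a template or a constant."""
    template: str = ""
    constant: str = ""

    def generate(self, record):
        return self.constant if self.constant else self.template.format(**record)


@dataclass
class PredicateObjectMap:
    """Pairs a Predicate Map with an Object Map."""
    predicate_map: TermMap
    object_map: TermMap


@dataclass
class TriplesMap:
    """One Subject Map plus any number of predicate-object pairs."""
    subject_map: TermMap
    predicate_object_maps: list = field(default_factory=list)

    def generate(self, record):
        subject = self.subject_map.generate(record)
        return [
            (subject, pom.predicate_map.generate(record), pom.object_map.generate(record))
            for pom in self.predicate_object_maps
        ]


# Hypothetical mapping for the person records of the example table.
person_map = TriplesMap(
    subject_map=TermMap(template="ex:{id}"),
    predicate_object_maps=[
        PredicateObjectMap(TermMap(constant="ex:works"), TermMap(template="ex:{lab}")),
    ],
)
result = person_map.generate({"id": "1", "lab": "DSLab"})
print(result)
```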
37. RML describes
rules to map any structured data to RDF
RML supports any data independently of
which structure and format they have
where they originally reside
how they are accessed & retrieved
38. Specifying data
which data form a data input
how to reference data input extracts
Accessing & Retrieving data
data input from original source(s)
41. Support data in Heterogeneous Structures and Formats
tabular-structured
tables in DBs or CSV files …
hierarchical-structured
JSON or XML …
(semi-)structured
HTML …
… … …
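As a rough illustration of handling heterogeneous structures uniformly, the Python sketch below reads the same kind of records from a tabular (CSV) and a hierarchical (JSON) input and applies one rule to both. The sample data, the "people" key and the ex:name predicate are assumptions made for the example; a real RML processor is considerably more general.

```python
import csv
import io
import json

# Two hypothetical inputs holding the same kind of records in different formats.
CSV_DATA = "id,firstname,lastname\n1,Anastasia,Dimou\n2,Ruben,Verborgh\n"
JSON_DATA = '{"people": [{"id": "3", "firstname": "Erik", "lastname": "Mannens"}]}'


def iterate_csv(text):
    """Tabular input: each row is one record, referenced by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def iterate_json(text, key):
    """Hierarchical input: records are the items under a top-level key."""
    return json.loads(text)[key]


def to_triples(records):
    """The same rule is applied regardless of the original format."""
    return [
        (f"ex:{r['id']}", "ex:name", f'"{r["firstname"]} {r["lastname"]}"')
        for r in records
    ]


all_triples = to_triples(iterate_csv(CSV_DATA)) + to_triples(iterate_json(JSON_DATA, "people"))
for triple in all_triples:
    print(triple)
```

Only the iteration step differs per format; the rule that produces the triples stays the same.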
57. Violations
Most frequent violations are
related to how
vocabularies or ontologies
are applied to the data
dbo:birthDate range xsd:date
dbo:birthDate domain dbo:Person
<http://example.com/Chuck_Bednarik> a dbo:Event
<http://example.com/Chuck_Bednarik> dbo:birthDate "1925-05-01"^^xsd:gYear
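The two violations in this example (wrong datatype, wrong subject class) can be detected with a simple check. The Python sketch below is a plain stand-in for illustration, not how any particular assessment tool implements it; the ex: prefix and the tuple representation of literals are assumptions, and subclass reasoning is deliberately ignored.

```python
# Schema constraints taken from the slide.
RANGE = {"dbo:birthDate": "xsd:date"}
DOMAIN = {"dbo:birthDate": "dbo:Person"}

# Triples as (subject, predicate, object); a typed literal is a (value, datatype) pair.
triples = [
    ("ex:Chuck_Bednarik", "rdf:type", "dbo:Event"),
    ("ex:Chuck_Bednarik", "dbo:birthDate", ("1925-05-01", "xsd:gYear")),
]


def find_violations(triples):
    """Report domain and range violations (no subclass inference)."""
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)

    violations = []
    for s, p, o in triples:
        if p in DOMAIN and DOMAIN[p] not in types.get(s, set()):
            violations.append(("domain", s, p, DOMAIN[p]))
        if p in RANGE and isinstance(o, tuple) and o[1] != RANGE[p]:
            violations.append(("range", s, p, RANGE[p]))
    return violations


violations = find_violations(triples)
for kind, subject, predicate, expected in violations:
    print(f"{kind} violation: {subject} {predicate} expects {expected}")
```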
58. RDF DQA with RDFUnit
test-driven data-debugging framework
based on SPARQL-patterns
<http://example.com/Chuck_Bednarik> a dbo:Event
<http://example.com/Chuck_Bednarik> dbo:birthDate "1925-05-01"^^xsd:gYear
http://rdfunit.aksw.org
59. DQA: Dataset Quality Assessment
Adjustments to the dataset
are applied manually, and only rarely
are not applied at the root (hard to identify)
are overwritten if a new version of
the original data is mapped & published
DQA → violations
60. Instead of applying Quality Assessment
to the already published RDF dataset
as part of data consumption
Apply Quality Assessment to the Mappings
that generate the RDF dataset
61. MQA: Mapping Quality Assessment
discover violations before
they are even generated
specify the origin of the violation
easily apply structural adjustments
to the mappings
62. Sets of triples of a dataset have repetitive patterns
<http://example.com/{Name}_{Surname}> a dbo:Event
<http://example.com/{Name}_{Surname}> dbo:birthDate "Birth"^^xsd:gYear
Mapping languages
formalize patterns into rules
to generate the RDF dataset
from the original data
63. MQA with RDFUnit over RML
<http://example.com/Chuck_Bednarik> a dbo:Person
<http://example.com/Chuck_Bednarik> dbo:birthDate "1925-05-01"^^xsd:date
DEL: <#ObjectMap> rr:datatype xsd:gYear
ADD: <#ObjectMap> rr:datatype xsd:date
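Assessing the mapping rather than the generated data can be sketched as follows. This Python example is a simplification under stated assumptions: the mapping rule is a plain dict with illustrative keys (not RML syntax), and the check is ordinary Python rather than RDFUnit's SPARQL-based tests. It only shows how one rule-level check yields the DEL/ADD adjustment above before any triple is generated.

```python
# Expected datatype per predicate, as declared by the vocabulary.
RANGE = {"dbo:birthDate": "xsd:date"}

# A mapping rule as a plain dict; the keys are illustrative, not RML syntax.
mapping = {
    "id": "<#ObjectMap>",
    "predicate": "dbo:birthDate",
    "datatype": "xsd:gYear",
}


def assess_mapping(rule, ranges):
    """Return suggested adjustments if the rule's datatype contradicts the range."""
    expected = ranges.get(rule["predicate"])
    if expected is None or rule["datatype"] == expected:
        return []
    return [
        f"DEL: {rule['id']} rr:datatype {rule['datatype']}",
        f"ADD: {rule['id']} rr:datatype {expected}",
    ]


adjustments = assess_mapping(mapping, RANGE)
for line in adjustments:
    print(line)
```

Fixing the rule once corrects every triple it would generate, which is why the adjustment survives a re-run on new input data.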
68. Consider mapping rules to
automatically generate
self-descriptive
provenance and other metadata
69. W3C standardized Metadata
PROV
provenance information
VoID
expressing RDF dataset metadata
general metadata
structural metadata
links between datasets
DCAT
describe datasets in data catalogs
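A minimal sketch of metadata that could be produced alongside the data, using terms from PROV and VoID. The Python function below is hypothetical; the dataset, mapping and source identifiers, the triple count and the timestamp are made-up values for illustration, and the selection of properties is one possible choice rather than what RML tooling actually emits.

```python
from datetime import datetime, timezone


def describe_generation(dataset, mapping, source, triple_count, when):
    """Build provenance (PROV) and dataset (VoID) statements as plain tuples."""
    activity = f"{dataset}#generation"
    return [
        (dataset, "rdf:type", "void:Dataset"),
        (dataset, "void:triples", str(triple_count)),
        (dataset, "prov:wasGeneratedBy", activity),
        (dataset, "prov:wasDerivedFrom", source),
        (activity, "rdf:type", "prov:Activity"),
        (activity, "prov:used", mapping),
        (activity, "prov:endedAtTime", when.isoformat()),
    ]


# All identifiers and values below are hypothetical.
metadata = describe_generation(
    dataset="ex:dataset",
    mapping="ex:mapping.rml.ttl",
    source="ex:people.csv",
    triple_count=7,
    when=datetime(2016, 1, 1, tzinfo=timezone.utc),
)
for s, p, o in metadata:
    print(s, p, o)
```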
71. Semantic Web experts Vs. Data specialists
Modeling Domain Knowledge
as Linked (Open) Data
is not straightforward for
Data Specialists
Data context
is not straightforward for
Semantic Web experts
72. Semantic Web experts Vs. Data specialists
Data Specialists
should be able to specify the mappings,
modify and extend them at any time