Do it on your own - From 3 to 5 Star Linked Open Data with RMLio
RML.io
Generating High Quality
Linked Open Data
from Open or Not Data
Anastasia Dimou
Data Science Lab, Ghent University - iMinds
anastasia.dimou@ugent.be
@natadimou
Are you the owner of your data?
OR
is the application that hosts your data?
The Semantic Web
is the extension of the World Wide Web
enables sharing content beyond
the boundaries of applications & websites
The Web for humans, thanks to HTML,
is understandable & constant
BUT
is the Web for machines too?
The Semantic Web
is the extension of the World Wide Web
enables sharing content beyond
the boundaries of applications & websites
allows machines to understand the
meaning of hyperlinked information
Linked (Open) Data
a standardized way of
expressing the relationships between data
semantically annotates the data
with different vocabularies or ontologies
describe domain-level knowledge
understandable by humans & machines
Resource Description Framework (RDF)
is the prevalent data model
for describing Linked (Open) Data
driven by unique identifiers (URIs)
allows establishing a shared meaning
subject → predicate → object
How is Linked Data derived
from (semi-)structured data?
id  firstname  lastname  lab    city
1   Anastasia  Dimou     DSLab  Ghent
2   Ruben      Verborgh  DSLab  Ghent
3   Erik       Mannens   DSLab  Ghent
Person 1 → works → Data Science Lab
Person 1 → “Anastasia Dimou”
Person 2 → works → Data Science Lab
Person 2 → “Ruben Verborgh”
Person 3 → works → Data Science Lab
Person 3 → “Erik Mannens”
Data Science Lab → located → Ghent
sets of triples of a dataset have repetitive patterns
ex:{id} → ex:{lab}
ex:{id} → “{firstname} {lastname}”
ex:{lab} → ex:located → ex:{city}
RDF dataset generation tools
base their implementation on repetitively
applying those patterns to input data
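Applying such patterns to every row of the example table can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `expand` and `generate` helpers and the predicate names `ex:works` and `ex:name` are hypothetical and do not come from any RML tool.

```python
import re

# Rows of the example table (column names as in the table header).
ROWS = [
    {"id": "1", "firstname": "Anastasia", "lastname": "Dimou", "lab": "DSLab", "city": "Ghent"},
    {"id": "2", "firstname": "Ruben", "lastname": "Verborgh", "lab": "DSLab", "city": "Ghent"},
    {"id": "3", "firstname": "Erik", "lastname": "Mannens", "lab": "DSLab", "city": "Ghent"},
]

# One (subject, predicate, object) pattern per kind of triple.
# ex:works and ex:name are hypothetical predicate names.
PATTERNS = [
    ("ex:{id}", "ex:works", "ex:{lab}"),
    ("ex:{id}", "ex:name", '"{firstname} {lastname}"'),
    ("ex:{lab}", "ex:located", "ex:{city}"),
]


def expand(template, row):
    """Replace every {column} placeholder with the row's value."""
    return re.sub(r"\{(\w+)\}", lambda m: row[m.group(1)], template)


def generate(rows, patterns):
    """Apply every pattern to every row; the set drops repeated triples."""
    return {
        tuple(expand(part, row) for part in pattern)
        for row in rows
        for pattern in patterns
    }


triples = generate(ROWS, PATTERNS)
for s, p, o in sorted(triples):
    print(s, p, o, ".")
```

Because all three rows share the same lab and city, the `ex:located` pattern yields a single triple, while the other two patterns yield one triple per person.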
What are the different
Linked Data Generation approaches?
R2RML mappings R2RML processor
Data OWNER / PUBLISHER
defines
RDF
DB CSV JSON XML
RDF RDF RDF
RDF Terms (focusing on IRIs) are…
generated independently
disregarding their possible prior definitions
manually replicated
by reconstructing the same URIs (if possible)
manually aligned afterwards
links with other datasets are defined after
the RDF terms are published
Uniform and declarative RDF generation
from heterogeneous data sources
mappings processor
Data OWNER / PUBLISHER
defines
RDF
DB CSV JSON XML RDF
RDF Mapping Language (RML)
generic scalable mapping language
for generating and interlinking
RDF data from heterogeneous resources
in an integrable and interoperable fashion
superset of the W3C standardized
R2RML mapping language
http://rml.io
Uniform and declarative RDF generation
from heterogeneous data sources
RML mappings processor
Data OWNER / PUBLISHER
defines
RDF
DB CSV JSON XML RDF
Defining Mappings to generate Linked Data
Retrieving Input Data
Assessing Quality
Generating Metadata
Editing Mappings
Defining Mappings to generate RDF data
Retrieving Input Data
Assessing Quality
Generating Metadata
Editing Mappings
RML describes
how to generate RDF
from structured data
subject → predicate → object
<#TriplesMap>: Subject Map → Predicate Map → Object Map
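The way a Triples Map groups one Subject Map with Predicate and Object Maps can be modelled as a small data structure. A minimal sketch, assuming that each term map is constant-, reference-, or template-valued as in R2RML and RML; the Python classes `TermMap` and `TriplesMap` and the `ex:` terms are hypothetical names chosen for illustration, not an RML processor API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TermMap:
    """Produces one RDF term per record from a constant, a reference, or a template."""
    constant: Optional[str] = None
    reference: Optional[str] = None
    template: Optional[str] = None

    def term(self, record):
        if self.constant is not None:
            return self.constant                      # same term for every record
        if self.reference is not None:
            return f'"{record[self.reference]}"'      # literal taken from one field
        return self.template.format(**record)         # IRI built from a template


@dataclass
class TriplesMap:
    """One Subject Map plus any number of (Predicate Map, Object Map) pairs."""
    subject_map: TermMap
    predicate_object_maps: list = field(default_factory=list)

    def triples(self, records):
        for record in records:
            subject = self.subject_map.term(record)
            for predicate_map, object_map in self.predicate_object_maps:
                yield (subject, predicate_map.term(record), object_map.term(record))


people = TriplesMap(
    subject_map=TermMap(template="ex:{id}"),
    predicate_object_maps=[
        (TermMap(constant="ex:works"), TermMap(template="ex:{lab}")),
        (TermMap(constant="ex:firstname"), TermMap(reference="firstname")),
    ],
)

records = [{"id": "1", "firstname": "Anastasia", "lab": "DSLab"}]
result = list(people.triples(records))
print(result)
```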
RML describes
rules to map any structured data to RDF
RML supports any data independently of
which structure and format they have
where they originally reside
how they are accessed & retrieved
Specifying data
which data form a data input
how to reference data input extracts
Accessing & Retrieving data
data input from original source(s)
Support data in Heterogeneous Structures and Formats
tabular-structured
tables in DBs or CSV files …
hierarchical-structured
JSON or XML …
(semi-)structured
HTML …
… … …
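Uniform treatment of heterogeneous formats can be sketched as one small iterator per format that feeds the same rule. This is a sketch only: the sample data, the helper names and the `ex:firstname` predicate are assumptions made for illustration, and only the Python standard library is used rather than an RML processor.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample inputs holding the same kind of data in three formats.
CSV_DATA = "id,firstname\n1,Anastasia\n"
JSON_DATA = '{"people": [{"id": "2", "firstname": "Ruben"}]}'
XML_DATA = "<people><person><id>3</id><firstname>Erik</firstname></person></people>"


def csv_records(text):
    """Tabular input: each row is a record, fields are referenced by column name."""
    yield from csv.DictReader(io.StringIO(text))


def json_records(text, key):
    """Hierarchical input: iterate over the array stored under `key`."""
    yield from json.loads(text)[key]


def xml_records(text, tag):
    """Hierarchical input: iterate over every <tag> element; children become fields."""
    for element in ET.fromstring(text).iter(tag):
        yield {child.tag: child.text for child in element}


def to_triples(records):
    """The same rule for every source: ex:{id} ex:firstname "{firstname}"."""
    return [(f"ex:{r['id']}", "ex:firstname", f'"{r["firstname"]}"') for r in records]


triples = (
    to_triples(csv_records(CSV_DATA))
    + to_triples(json_records(JSON_DATA, "people"))
    + to_triples(xml_records(XML_DATA, "person"))
)
for triple in triples:
    print(triple)
```

Only the iteration and the way fields are referenced differ per format; the rule that produces the triples stays the same.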
Support different Locations and Access Interfaces
Local File(s)
Database connectivity
D2RQ
Web source(s) (Web API/service)
DCAT, CSVW, Hydra, VOiD (Dataset)
RDF source(s)
VOiD (Endpoint), SPARQL-SD
Violations
Most frequent violations are
related to how
vocabularies or ontologies
are applied to the data
dbo:birthDate range xsd:date
dbo:birthDate domain dbo:Person
http://example.com/Chuck_Bednarik a dbo:Event
http://example.com/Chuck_Bednarik dbo:birthDate "1925-05-01"^^xsd:gYear
RDF DQA with RDFUnit
test-driven data-debugging framework
based on SPARQL-patterns
http://example.com/Chuck_Bednarik a dbo:Event
http://example.com/Chuck_Bednarik dbo:birthDate "1925-05-01"^^xsd:gYear
http://rdfunit.aksw.org
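The kind of domain and range check behind these violations can be illustrated in plain Python. Note the assumptions: this is not RDFUnit's API and uses no SPARQL; the `find_violations` helper and the tuple encoding of triples are hypothetical, and no subclass reasoning is performed.

```python
# Schema axioms from the slide: property -> expected domain class / range datatype.
DOMAIN = {"dbo:birthDate": "dbo:Person"}
RANGE = {"dbo:birthDate": "xsd:date"}

SUBJECT = "http://example.com/Chuck_Bednarik"

# Dataset as (subject, predicate, object); literals are (lexical value, datatype) pairs.
TRIPLES = [
    (SUBJECT, "rdf:type", "dbo:Event"),
    (SUBJECT, "dbo:birthDate", ("1925-05-01", "xsd:gYear")),
]


def find_violations(triples):
    """Return (kind, subject, predicate, expected) for every violated axiom."""
    # Collect the classes asserted for each subject.
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)

    violations = []
    for s, p, o in triples:
        # Domain: the subject must be typed with the property's domain class.
        if p in DOMAIN and DOMAIN[p] not in types.get(s, set()):
            violations.append(("domain", s, p, DOMAIN[p]))
        # Range: a literal's datatype must match the property's range.
        if p in RANGE and isinstance(o, tuple) and o[1] != RANGE[p]:
            violations.append(("range", s, p, RANGE[p]))
    return violations


violations = find_violations(TRIPLES)
for violation in violations:
    print(violation)
```

Both violations are reported for the same triple: the subject is typed as `dbo:Event` rather than `dbo:Person`, and the literal carries `xsd:gYear` rather than `xsd:date`.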
DQA: Dataset Quality Assessment
Adjustments to the dataset
are applied manually, but rarely
are not applied at the root (hard to identify)
are overwritten if a new version of
the original data is mapped & published
violations
DQA
Instead of applying Quality Assessment
to the already published RDF dataset
as part of data consumption
Apply Quality Assessment to the Mappings
that generate the RDF dataset
MQA: Mapping Quality Assessment
discover violations before
they are even generated
specify the origin of the violation
easily apply structural adjustments
to the mappings
sets of triples of a dataset have repetitive patterns
http://example.com/{Name}_{Surname} a dbo:Event
http://example.com/{Name}_{Surname} dbo:birthDate “Birth”^^xsd:gYear
Mapping languages
formalize patterns into rules
to generate the RDF dataset
from the original data
MQA with RDFUnit over RML
http://example.com/Chuck_Bednarik a dbo:Person
http://example.com/Chuck_Bednarik dbo:birthDate "1925-05-01"^^xsd:date
DEL: <#ObjectMap> rr:datatype xsd:gYear
ADD: <#ObjectMap> rr:datatype xsd:date
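Assessing the mapping instead of the generated dataset can be sketched as follows. This is an illustration under assumptions, not how RDFUnit processes RML: the dictionary form of the mapping rule and the `assess` and `repair` helpers are hypothetical.

```python
import copy

# Schema axioms for the property used in the mapping.
SCHEMA = {"dbo:birthDate": {"domain": "dbo:Person", "range": "xsd:date"}}

# Hypothetical in-memory form of one mapping rule.
mapping = {
    "subject_template": "http://example.com/{Name}_{Surname}",
    "class": "dbo:Event",
    "predicate_object_maps": [
        {"predicate": "dbo:birthDate", "datatype": "xsd:gYear"},
    ],
}


def assess(mapping, schema):
    """Return (what, found, expected) for every rule that contradicts the schema."""
    issues = []
    for pom in mapping["predicate_object_maps"]:
        axioms = schema.get(pom["predicate"], {})
        if "domain" in axioms and mapping["class"] != axioms["domain"]:
            issues.append(("class", mapping["class"], axioms["domain"]))
        if "range" in axioms and pom.get("datatype") != axioms["range"]:
            issues.append(("datatype", pom.get("datatype"), axioms["range"]))
    return issues


def repair(mapping, issues):
    """Fix the mapping itself: delete the found value, add the expected one."""
    fixed = copy.deepcopy(mapping)
    for what, found, expected in issues:
        if what == "class":
            fixed["class"] = expected
        else:
            for pom in fixed["predicate_object_maps"]:
                if pom.get("datatype") == found:
                    pom["datatype"] = expected
    return fixed


issues = assess(mapping, SCHEMA)
fixed = repair(mapping, issues)
print(issues)
print(assess(fixed, SCHEMA))
```

The check runs once over the rule, before any triple is generated, so every record mapped with the repaired rule is free of these two violations.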
Consider mapping rules to
automatically generate
self-descriptive
provenance and other metadata
W3C standardized Metadata
PROV
provenance information
VoID
expressing RDF dataset metadata
general metadata
structural metadata,
links between datasets
DCAT
describe datasets in data catalogs
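Generating such metadata alongside the dataset can be sketched with a few VoID and PROV terms. A minimal sketch under assumptions: the `describe` helper, the `#generation` activity IRI and all `ex:` IRIs are hypothetical; only the vocabulary terms themselves (`void:triples`, `void:distinctSubjects`, `prov:wasDerivedFrom`, `prov:wasGeneratedBy`, `prov:used`, `prov:endedAtTime`) are taken from the VoID and PROV vocabularies.

```python
from datetime import datetime, timezone


def describe(dataset_iri, triples, source_iri, mapping_iri, when):
    """Return VoID and PROV statements about a generated dataset."""
    subjects = {s for s, _, _ in triples}
    activity = dataset_iri + "#generation"  # hypothetical IRI for the generation run
    return [
        # VoID: structural metadata about the dataset.
        (dataset_iri, "rdf:type", "void:Dataset"),
        (dataset_iri, "void:triples", str(len(triples))),
        (dataset_iri, "void:distinctSubjects", str(len(subjects))),
        # PROV: where the dataset came from and how it was produced.
        (dataset_iri, "rdf:type", "prov:Entity"),
        (dataset_iri, "prov:wasDerivedFrom", source_iri),
        (dataset_iri, "prov:wasGeneratedBy", activity),
        (activity, "rdf:type", "prov:Activity"),
        (activity, "prov:used", mapping_iri),
        (activity, "prov:endedAtTime", when.isoformat()),
    ]


generated = [
    ("ex:1", "ex:works", "ex:DSLab"),
    ("ex:2", "ex:works", "ex:DSLab"),
    ("ex:DSLab", "ex:located", "ex:Ghent"),
]
metadata = describe(
    "ex:dataset",
    generated,
    source_iri="ex:people.csv",
    mapping_iri="ex:mapping",
    when=datetime(2016, 1, 1, tzinfo=timezone.utc),
)
for statement in metadata:
    print(statement)
```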
Defining Mappings to generate Linked Data
Retrieving Input Data
Assessing Quality
Generating Metadata
Editing Mappings
Semantic Web experts Vs. Data specialists
Modeling Domain Knowledge
as Linked (Open) Data
is not straightforward for
Data Specialists
Data context
is not straightforward for
Semantic Web experts
Semantic Web experts Vs. Data specialists
Data Specialists
should be able to specify the mappings,
modify and extend them at any time