Incorporating structured data in the Linked Data cloud is still complicated, despite the numerous existing tools. In particular, hierarchical structured data (e.g., JSON) are underrepresented, due to their processing complexity. A uniform mapping formalisation for data in different formats, which would enable reuse and exchange between tools and applied data, is missing. This paper describes a novel approach to mapping heterogeneous and hierarchical data sources into RDF using the RML mapping language, an extension of R2RML (the W3C standard for mapping relational databases into RDF). To facilitate those mappings, we present a toolset for producing RML mapping files using the Karma data modelling tool, and for consuming them using a prototype RML processor. A use case shows how RML facilitates the definition and execution of mapping rules for several heterogeneous sources.
http://rml.io
https://github.com/mmlab/RMLProcessor
Mapping Hierarchical Sources into RDF using the RML Mapping Language
1. Mapping Hierarchical Sources into
RDF using the RML Mapping Language
Anastasia Dimou (1), Miel Vander Sande (1),
Jason Slepicka (2), Pedro Szekely (2),
Erik Mannens (1), Craig Knoblock (2), Rik Van de Walle (1)
(1) Ghent University – iMinds – Multimedia Lab
(2) University of Southern California – Information Sciences Institute –
Department of Computer Science
http://rml.io
IEEE-ICSC14
Newport Beach, California, 18th June 2014
2. Most of the data that we would like to
be able to query as Linked Open Data
exists in formats other than RDF
3. There are…
over 11,000 APIs according to
ProgrammableWeb.org
only 74 of which return results in RDF
But more than 5,000
return results in JSON or XML
5. Relational Database to RDF (R2RML W3C)
[Diagram: the data owner/publisher defines R2RML mappings; an R2RML processor applies them to the DB to produce RDF]
7. [Diagram: the data owner/publisher defines R2RML mappings for an R2RML processor; the sources shown are DB, CSV, JSON and XML, each with an RDF output]
8. lack of uniform definitions
to describe mapping rules for heterogeneous sources
lack of interoperable definitions
that would allow the re-use of mapping rules
across different implementations
lack of reusable definitions
that would allow the re-use of mapping rules
for representing data in the same or different formats
9. mapping data
on a per-source and per-format basis
or on a case-specific basis
Uniform way of defining mappings
for heterogeneous sources
that can be re-used across data
in the same or different formats
and be interoperable
across different implementations
10. [Diagram: the data owner/publisher defines R2RML mappings for an R2RML processor; the sources shown are DB, CSV, JSON and XML, each with an RDF output]
12. RDF Mapping Language (RML)
generic scalable mapping language
for mapping heterogeneous resources into RDF
in an integrable and interoperable fashion
superset of the W3C standardized
R2RML mapping language
http://semweb.mmlab.be/ns/rml
19. RDF Mapping Language (RML)
mapping hierarchical sources to RDF
deal with hierarchy and heterogeneity
20. R2RML: each row is self-contained
and can be processed independently
R2RML: the columns in each row
can be referred to unambiguously
R2RML: for each reference to a column in a single row
a unique value is returned
21. explicit reference to the iteration pattern
R2RML: each row is self-contained
and can be processed independently
abstract reference to the input data
R2RML: the columns in each row
can be referred to unambiguously
more than one triple per Predicate-Object Map
R2RML: for each reference to a column in a single row
a unique value is returned
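The three differences on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the RML processor: the input document, the `select` helper and its dotted-path syntax with `[*]` are all hypothetical stand-ins for a real JSONPath engine, and the `example.org` IRIs are placeholders. The sketch shows an explicit iteration pattern over a hierarchical source and a reference that may return zero, one or several values, so one predicate-object pair can yield more than one triple.

```python
import json

# Hypothetical input document; names and values are illustrative only.
DOCUMENT = json.loads("""
{
  "venues": [
    {"id": "v1", "name": "Hall A", "tags": ["music", "theatre"]},
    {"id": "v2", "name": "Hall B", "tags": []}
  ]
}
""")


def select(node, path):
    """Evaluate a dotted path with optional '[*]' steps; always returns a list.

    A simplified stand-in for a JSONPath engine, not an implementation of it.
    """
    nodes = [node]
    for step in path.split("."):
        expand = step.endswith("[*]")
        key = step[:-3] if expand else step
        next_nodes = []
        for current in nodes:
            if not isinstance(current, dict) or key not in current:
                continue
            value = current[key]
            if expand and isinstance(value, list):
                next_nodes.extend(value)  # one node per list element
            else:
                next_nodes.append(value)
        nodes = next_nodes
    return nodes


def generate_triples(document, iterator, subject_template, predicate, reference):
    """Apply one subject template and one predicate-object pair per iteration."""
    triples = []
    for item in select(document, iterator):      # explicit iteration pattern
        subject = subject_template.format(**item)
        for value in select(item, reference):    # zero, one or many values
            triples.append((subject, predicate, value))
    return triples


triples = generate_triples(
    DOCUMENT,
    iterator="venues[*]",
    subject_template="http://example.org/venue/{id}",
    predicate="http://example.org/ns#tag",
    reference="tags[*]",
)
for triple in triples:
    print(triple)
```

In a relational row the inner loop would always run exactly once; here the first venue contributes two triples and the second contributes none.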
26. Referring to the input data
R2RML: databases
RML: XML or JSON or CSV or …
R2RML: (SQL)
RML: XPath/XQuery or JSONPath or RFC 4180 or …
R2RML: (rr:sqlQuery)
RML: rml:referenceFormulation
31. Referring to the extracts of the input data
explicitly and implicitly
R2RML: column name
RML: XML element or JSON object or …
R2RML: rr:column
RML: rml:reference
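One way to picture the role of the reference formulation is as a lookup that decides how an iterator and a reference are evaluated for a given source format. The Python sketch below assumes that reading and is not the prototype processor: the labels "CSV", "XPath" and "JSONPath", the function names and the sample data are hypothetical, the JSON evaluator only handles a single top-level key, and `xml.etree.ElementTree` supports only a subset of XPath.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def iterate_csv(text, iterator):
    # Each CSV row is one iteration, as with rows in R2RML; iterator is unused.
    return list(csv.DictReader(io.StringIO(text)))


def iterate_xml(text, iterator):
    # ElementTree evaluates a limited XPath subset relative to the root element.
    return ET.fromstring(text).findall(iterator)


def iterate_json(text, iterator):
    # Stand-in for JSONPath: the iterator is one top-level key holding a list.
    return json.loads(text)[iterator]


def refer_csv(row, reference):
    return [row[reference]]  # a column always yields exactly one value


def refer_xml(element, reference):
    return [child.text for child in element.findall(reference)]


def refer_json(obj, reference):
    value = obj.get(reference, [])
    return value if isinstance(value, list) else [value]


# Hypothetical registry: formulation label -> (iterate, refer) evaluators.
FORMULATIONS = {
    "CSV": (iterate_csv, refer_csv),
    "XPath": (iterate_xml, refer_xml),
    "JSONPath": (iterate_json, refer_json),
}


def extract(text, formulation, iterator, reference):
    """Return every value the reference yields across all iterations."""
    iterate, refer = FORMULATIONS[formulation]
    return [v for item in iterate(text, iterator) for v in refer(item, reference)]


CSV_TEXT = "id,name\n1,Alice\n2,Bob\n"
XML_TEXT = ("<people><person><name>Alice</name></person>"
            "<person><name>Bob</name></person></people>")
JSON_TEXT = '{"people": [{"name": "Alice"}, {"name": "Bob"}]}'

print(extract(CSV_TEXT, "CSV", None, "name"))
print(extract(XML_TEXT, "XPath", "person", "name"))
print(extract(JSON_TEXT, "JSONPath", "people", "name"))
```

The calling code stays the same for all three sources; only the formulation label and the expressions passed as iterator and reference change.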