Mapping Hierarchical Sources into
RDF using the RML Mapping Language
Anastasia Dimou¹, Miel Vander Sande¹,
Jason Slepicka², Pedro Szekely²,
Erik Mannens¹, Craig Knoblock², Rik Van de Walle¹
¹ Ghent University – iMinds – Multimedia Lab
² University of Southern California – Information Sciences Institute –
Department of Computer Science
http://rml.io
IEEE-ICSC14
Newport Beach, California, 18th June 2014
Most of the data that we would like to
be able to query as Linked Open Data
exists in formats other than RDF
There are…
over 11,000 APIs according to
ProgrammableWeb.org
only 74 of which return results in RDF
but more than 5,000
return results in JSON or XML
Many
languages, tools and approaches
have been proposed
to convert data
from relational databases to RDF
Relational Database to RDF (R2RML W3C)
[Diagram: the data owner/publisher defines R2RML mappings; an R2RML processor applies them to the DB to generate RDF]
[Diagram: R2RML mappings and an R2RML processor cover only the DB; CSV, JSON and XML sources are each converted to RDF on their own]
lack of uniform definitions
to describe mapping rules for heterogeneous sources
lack of interoperable definitions
that would allow the re-use of mapping rules
across different implementations
lack of reusable definitions
that would allow the re-use of mapping rules
for representing data in the same or different formats
mapping data
on a per-source and per-format basis
or on a case-specific basis
Uniform way of defining mappings
for heterogeneous sources
that can be re-used across data
in the same or different formats
and be interoperable
across different implementations
[Diagram: R2RML mappings and an R2RML processor cover only the DB; CSV, JSON and XML sources are each converted to RDF on their own]
[Diagram: the data owner/publisher defines mapping definitions; one processor applies them to DB, CSV, JSON and XML sources to generate RDF]
any format to RDF
RDF Mapping Language (RML)
generic scalable mapping language
for mapping heterogeneous resources into RDF
in an integrable and interoperable fashion
superset of the W3C standardized
R2RML mapping language
http://semweb.mmlab.be/ns/rml
Relational Database to RDF
Mapping Language
(R2RML)
R2RML mapping document
NAME                    BIRTH_DATE  DEATH_DATE
Robert Theodore McCall  1919-12-23  2010-02-26
Ronald Anderson         1929-12-06
Triples Map
Logical Table
Table Name
<#ArtistMapping>
  rr:logicalTable [
    rr:tableName "ARTISTS"
  ] .
R2RML mapping definition
Table Name
Triples
Map
Logical Table
Subject Map
Predicate-Object Map
Predicate-Object Map
Predicate-Object Map
Predicate Map
Object Map
R2RML mapping document
Triples Map
Subject Map
NAME                    BIRTH_DATE  DEATH_DATE
Robert Theodore McCall  1919-12-23  2010-02-26
Ronald Anderson         1929-12-06
<#ArtistMapping>
  rr:subjectMap [
    rr:template "http://ex.com/{NAME}" ;
    rr:class ex:Person ] ;
<http://ex.com/Robert+Theodore+McCall> a ex:Person .
R2RML mapping document
Predicate Map
NAME                    BIRTH_DATE  DEATH_DATE
Robert Theodore McCall  1919-12-23  2010-02-26
Ronald Anderson         1929-12-06
<#ArtistMapping>
  rr:predicateObjectMap [
    rr:predicate ex:birth_date ;
    rr:objectMap [
      rr:column "BIRTH_DATE" ] ] ;
<http://ex.com/Robert+Theodore+McCall> ex:birth_date "1919-12-23" .
Predicate-Object Map
Object Map
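The Subject Map and Predicate-Object Map of the R2RML example can be read as a per-row procedure. What follows is a minimal Python sketch of that reading, not an R2RML processor. The rows are the sample ARTISTS table from the slides; `expand_template` and `map_artists` are hypothetical helper names; `quote_plus` is used only to reproduce the `+` encoding shown on the slides (R2RML itself specifies percent-encoding, which would give `%20`).

```python
import re
from urllib.parse import quote_plus

# Sample rows of the ARTISTS table; None stands for an SQL NULL.
ARTISTS = [
    {"NAME": "Robert Theodore McCall", "BIRTH_DATE": "1919-12-23", "DEATH_DATE": "2010-02-26"},
    {"NAME": "Ronald Anderson", "BIRTH_DATE": "1929-12-06", "DEATH_DATE": None},
]


def expand_template(template, row):
    """Replace every {COLUMN} placeholder with the row's URL-encoded value."""
    return re.sub(r"\{([^}]+)\}", lambda m: quote_plus(row[m.group(1)]), template)


def map_artists(rows):
    """Apply the subject map and one predicate-object map to each row."""
    triples = []
    for row in rows:  # R2RML iterates per row
        subject = expand_template("http://ex.com/{NAME}", row)
        triples.append((subject, "rdf:type", "ex:Person"))  # rr:class
        if row["BIRTH_DATE"] is not None:  # a NULL column generates no triple
            triples.append((subject, "ex:birth_date", row["BIRTH_DATE"]))
    return triples


for triple in map_artists(ARTISTS):
    print(triple)
```

Each row is handled on its own and every column reference returns at most one value, which is the assumption RML relaxes for hierarchical sources.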
RDF Mapping Language
(RML)
RDF Mapping Language (RML)
mapping hierarchical sources to RDF
deal with hierarchy and heterogeneity
R2RML: each row is self-contained
and can be processed independently
R2RML: the columns in each row
can be referred to unambiguously
R2RML: for each reference to a column in a single row
a unique value is returned
explicit reference to the iteration pattern
R2RML: each row is self-contained
and can be processed independently
abstract reference to the input data
R2RML: the columns in each row
can be referred to unambiguously
more than one triple per Predicate-Object Map
R2RML: for each reference to a column in a single row
a unique value is returned
RDF Mapping Language
(RML)
For hierarchical sources
artworks.json:
[ ...
{ "Title": "Apollo 11 Crew",
"Artist": "Ronald Anderson",
"Ref": "NPG_70_36",
"Sitter": [
{ "Name": "Neil Armstrong",
"Birth Date": "1930-08-05" },
{ "Name": "Buzz Aldrin",
"Birth Date": "1930-01-20" },
{ "Name": "Michael Collins" } ],
"DateOfWork": "1969" },
{ "Title": "Neil Armstrong",
"Artist": "Robert Theodore McCall",
"Ref": "S_NPG_2010_51",
"Sitter": [
{ "Name": "Neil Armstrong" } ],
"DateOfWork": "2009" },
... ]
artists.xml:
<Artists> ...
<Artist>
<Name>Robert Theodore McCall</Name>
<Birth_Date>1919-12-23</Birth_Date>
<Death_Date>2010-02-26</Death_Date>
</Artist>
<Artist>
<Name>Ronald Anderson</Name>
<Birth_Date>1929-12-06</Birth_Date>
<Death_Date/>
</Artist> ...
</Artists>
Specifying the input data
R2RML: database
RML: file, API, …
R2RML: Logical Table (rr:logicalTable)
RML: Logical Source (rml:logicalSource)
R2RML: Table Name (rr:tableName)
RML: source (rml:source)
Triples Map
Logical Source
source
<#ArtworkMapping>
  rml:logicalSource [
    rml:source "http://ex.com/artworks.json" ] .
Triples Map
Logical Source
source
<#ArtistMapping>
  rml:logicalSource [
    rml:source "artists.xml" ] .
Referring to the input data
R2RML: databases
RML: XML or JSON or CSV or …
R2RML: (SQL)
RML: XPath/XQuery or JSONPath or RFC 4180 (CSV) or …
R2RML: (rr:sqlQuery)
RML: rml:referenceFormulation
<#ArtworkMapping>
  rml:logicalSource [
    rml:source "http://ex.com/artworks.json" ;
    rml:referenceFormulation ql:JSONPath ] .
Triples Map
Logical Source
source
<#ArtistMapping>
  rml:logicalSource [
    rml:source "artists.xml" ;
    rml:referenceFormulation ql:XPath ] .
Reference Formulation
Triples Map
Logical Source
source
Reference Formulation
Iterating over the input data
R2RML: per row
RML: ?
R2RML: (no counterpart, iteration is always per row)
RML: rml:iterator
<#ArtistMapping>
  rml:logicalSource [
    rml:source "artists.xml" ;
    rml:referenceFormulation ql:XPath ;
    rml:iterator "/Artists/Artist" ] .
<Artists> ... ...
<Artist>
<Name>Robert Theodore McCall</Name>
<Birth_Date>1919-12-23</Birth_Date>
<Death_Date>2010-02-26</Death_Date>
</Artist>
<Artist>
<Name>Ronald Anderson</Name>
<Birth_Date>1929-12-06</Birth_Date>
<Death_Date/>
</Artist> ... ...
</Artists>
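To make the role of `rml:iterator` concrete for XML, here is a small Python sketch using only the standard library. It is an illustration under assumptions rather than how the RML Processor works: `iterate` is a hypothetical helper, and because `xml.etree.ElementTree` does not accept absolute paths on an element, the first step of the iterator is matched against the root tag by hand.

```python
import xml.etree.ElementTree as ET

# The sample artists.xml from the slides, without the elided parts.
ARTISTS_XML = """<Artists>
  <Artist>
    <Name>Robert Theodore McCall</Name>
    <Birth_Date>1919-12-23</Birth_Date>
    <Death_Date>2010-02-26</Death_Date>
  </Artist>
  <Artist>
    <Name>Ronald Anderson</Name>
    <Birth_Date>1929-12-06</Birth_Date>
    <Death_Date/>
  </Artist>
</Artists>"""


def iterate(xml_text, iterator):
    """Return the elements selected by a simple absolute path such as /Artists/Artist."""
    root = ET.fromstring(xml_text)
    steps = iterator.strip("/").split("/")
    if steps[0] != root.tag:  # the first step must name the root element
        return []
    if len(steps) == 1:
        return [root]
    # The remaining steps are evaluated relative to the root element.
    return root.findall("/".join(steps[1:]))


# Each selected <Artist> element plays the role a row plays in R2RML.
names = [artist.findtext("Name") for artist in iterate(ARTISTS_XML, "/Artists/Artist")]
print(names)
```

The iterator makes explicit what R2RML leaves implicit: which pieces of the source are handled one at a time.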
[ ... …
{ "Title": "Apollo 11 Crew",
"Artist": "Ronald Anderson",
"Ref": "NPG_70_36",
"Sitter": [
{ "Name": "Neil Armstrong",
"Birth Date": "1930-08-05" },
{ "Name": "Buzz Aldrin",
"Birth Date": "1930-01-20" },
{ "Name": "Michael Collins" } ],
"DateOfWork": "1969" },
{ "Title": "Neil Armstrong",
"Artist": "Robert Theodore McCall",
"Ref": "S_NPG_2010_51",
"Sitter": [
{ "Name": "Neil Armstrong" } ],
"DateOfWork": "2009" },
... … ]
<#ArtworkMapping>
  rml:logicalSource [
    rml:source "http://ex.com/artworks.json" ;
    rml:referenceFormulation ql:JSONPath ;
    rml:iterator "$.[*]" ] .
<#SitterMapping>
  rml:logicalSource [
    rml:source "http://ex.com/artworks.json" ;
    rml:referenceFormulation ql:JSONPath ;
    rml:iterator "$.[*].Sitter" ] .
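The two JSON iterators select different levels of the same file. The Python sketch below mimics them with plain list traversal, since the standard library has no JSONPath evaluator. The helper names are hypothetical, and the sitter iterator is assumed to yield one iteration per sitter object (in standard JSONPath that would be written `$[*].Sitter[*]`).

```python
import json

# The sample artworks.json from the slides, without the elided parts.
ARTWORKS_JSON = """[
  {"Title": "Apollo 11 Crew", "Artist": "Ronald Anderson", "Ref": "NPG_70_36",
   "Sitter": [{"Name": "Neil Armstrong", "Birth Date": "1930-08-05"},
              {"Name": "Buzz Aldrin", "Birth Date": "1930-01-20"},
              {"Name": "Michael Collins"}],
   "DateOfWork": "1969"},
  {"Title": "Neil Armstrong", "Artist": "Robert Theodore McCall", "Ref": "S_NPG_2010_51",
   "Sitter": [{"Name": "Neil Armstrong"}],
   "DateOfWork": "2009"}
]"""


def iterate_artworks(data):
    """Stand-in for the artwork iterator: one iteration per top-level array element."""
    return list(data)


def iterate_sitters(data):
    """Stand-in for the sitter iterator, flattened to one iteration per sitter object."""
    return [sitter for artwork in data for sitter in artwork.get("Sitter", [])]


data = json.loads(ARTWORKS_JSON)
artwork_refs = [artwork["Ref"] for artwork in iterate_artworks(data)]
sitter_names = [sitter["Name"] for sitter in iterate_sitters(data)]
print(artwork_refs)
print(sitter_names)
```

The same source file therefore backs two Triples Maps, each with its own notion of "one record".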
Referring to the extracts of the input data
explicitly and implicitly
R2RML: column name
RML: XML element or JSON object or …
R2RML: rr:column
RML: rml:reference
<#ArtistMapping>
  rml:logicalSource [
    rml:source "http://ex.com/artists.xml" ;
    rml:referenceFormulation ql:XPath ;
    rml:iterator "/Artists/Artist" ] ;
  rr:subjectMap [
    rr:template "http://ex.com/{Name}" ] ;
  rr:predicateObjectMap [
    rr:predicate ex:death_date ;
    rr:objectMap [
      rml:reference "/Artists/Artist/Death_Date" ] ] .
<Artists> ... ...
<Artist>
<Name>Robert Theodore McCall</Name>
<Birth_Date>1919-12-23</Birth_Date>
<Death_Date>2010-02-26</Death_Date>
</Artist>
<Artist>
<Name>Ronald Anderson</Name>
<Birth_Date>1929-12-06</Birth_Date>
<Death_Date/>
</Artist> ... ...
</Artists>
<http://ex.com/Robert+Theodore+McCall>
  ex:death_date "2010-02-26" .
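A rough Python sketch of how this artist mapping could be evaluated follows. It rests on assumptions: the explicit reference `/Artists/Artist/Death_Date` is resolved against the current iteration by stripping the iterator prefix, an empty element produces no triple, and `quote_plus` merely reproduces the `+` encoding used on the slides. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote_plus

ARTISTS_XML = """<Artists>
  <Artist>
    <Name>Robert Theodore McCall</Name>
    <Birth_Date>1919-12-23</Birth_Date>
    <Death_Date>2010-02-26</Death_Date>
  </Artist>
  <Artist>
    <Name>Ronald Anderson</Name>
    <Birth_Date>1929-12-06</Birth_Date>
    <Death_Date/>
  </Artist>
</Artists>"""

ITERATOR = "/Artists/Artist"
REFERENCE = "/Artists/Artist/Death_Date"


def relative_to_iterator(reference, iterator):
    """Turn an explicit reference into a path relative to one iteration."""
    prefix = iterator.rstrip("/") + "/"
    return reference[len(prefix):] if reference.startswith(prefix) else reference


def death_date_triples(xml_text):
    root = ET.fromstring(xml_text)
    path = relative_to_iterator(REFERENCE, ITERATOR)  # "Death_Date"
    triples = []
    # With <Artists> as root, findall("Artist") matches the iterator above.
    for artist in root.findall("Artist"):
        value = artist.findtext(path)
        if not value:  # a missing or empty element generates no triple
            continue
        subject = "http://ex.com/" + quote_plus(artist.findtext("Name"))
        triples.append((subject, "ex:death_date", value))
    return triples


print(death_date_triples(ARTISTS_XML))
```

For the sample data only Robert Theodore McCall gets an `ex:death_date` triple, because Ronald Anderson's `Death_Date` element is empty.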
[ ... …
{ "Title": "Apollo 11 Crew",
"Artist": "Ronald Anderson",
"Ref": "NPG_70_36",
"Sitter": [
{ "Name": "Neil Armstrong",
"Birth Date": "1930-08-05" },
{ "Name": "Buzz Aldrin",
"Birth Date": "1930-01-20" },
{ "Name": "Michael Collins" } ],
"DateOfWork": "1969" },
{ "Title": "Neil Armstrong",
"Artist": "Robert Theodore McCall",
"Ref": "S_NPG_2010_51",
"Sitter": [
{ "Name": "Neil Armstrong" } ],
"DateOfWork": "2009" },
... … ]
<#ArtworkMapping>
  rml:logicalSource [
    rml:source "http://ex.com/artworks.json" ;
    rml:referenceFormulation ql:JSONPath ;
    rml:iterator "$.[*]" ] ;
  rr:subjectMap [
    rr:template "http://ex.com/{Ref}" ] ;
  rr:predicateObjectMap [
    rr:predicate rdfs:label ;
    rr:objectMap [
      rml:reference "$.[*].Title" ] ] .
<http://ex.com/NPG_70_36>
  rdfs:label "Apollo 11 Crew" .
[ ... …
{ "Title": "Apollo 11 Crew",
"Artist": "Ronald Anderson",
"Ref": "NPG_70_36",
"Sitter": [
{ "Name": "Neil Armstrong",
"Birth Date": "1930-08-05" },
{ "Name": "Buzz Aldrin",
"Birth Date": "1930-01-20" },
{ "Name": "Michael Collins" } ],
"DateOfWork": "1969" },
{ "Title": "Neil Armstrong",
"Artist": "Robert Theodore McCall",
"Ref": "S_NPG_2010_51",
"Sitter": [
{ "Name": "Neil Armstrong" } ],
"DateOfWork": "2009" },
... … ]
<#SitterMapping>
  rml:logicalSource [
    rml:source "http://ex.com/artworks.json" ;
    rml:referenceFormulation ql:JSONPath ;
    rml:iterator "$.[*].Sitter" ] ;
  rr:subjectMap [
    rr:template "http://ex.com/{Name}" ] ;
  rr:predicateObjectMap [
    rr:predicate ex:birth_date ;
    rr:objectMap [
      rml:reference "$.[*].Sitter.Birth Date" ] ] .
<http://ex.com/Neil+Armstrong>
  ex:birth_date "1930-08-05" .
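The sitter mapping shows why a single Predicate-Object Map can yield several triples for one artwork, or none. A minimal Python sketch under assumptions: the iterator is taken to visit each sitter object, a sitter without a "Birth Date" key produces no triple, and `sitter_birth_dates` is a hypothetical name.

```python
from urllib.parse import quote_plus

# The sample artworks from the slides, reduced to the fields used here.
ARTWORKS = [
    {"Ref": "NPG_70_36",
     "Sitter": [{"Name": "Neil Armstrong", "Birth Date": "1930-08-05"},
                {"Name": "Buzz Aldrin", "Birth Date": "1930-01-20"},
                {"Name": "Michael Collins"}]},
    {"Ref": "S_NPG_2010_51",
     "Sitter": [{"Name": "Neil Armstrong"}]},
]


def sitter_birth_dates(artworks):
    """Emit one ex:birth_date triple per sitter that carries a birth date."""
    triples = []
    for artwork in artworks:
        for sitter in artwork["Sitter"]:  # several sitters per artwork
            birth = sitter.get("Birth Date")
            if birth is None:  # missing key: nothing to map
                continue
            subject = "http://ex.com/" + quote_plus(sitter["Name"])
            triples.append((subject, "ex:birth_date", birth))
    return triples


for triple in sitter_birth_dates(ARTWORKS):
    print(triple)
```

The first artwork contributes two triples and the second none, which a per-row model with exactly one value per column cannot express directly.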
RDF Mapping Language (RML)
Triples Map
  Logical Source: Source, Iterator, Reference Formulation
  Subject Map
  Predicate-Object Map
    Predicate Map
    Object Map
    Referencing Object Map: (parent) Triples Map, Join Condition (Parent column, Child column)
Term Map (Subject, Predicate and Object Maps): template, constant, reference
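The Referencing Object Map with its Join Condition links records from two Triples Maps. The Python sketch below is one way to picture it with the sample data: the child value is the artwork's "Artist" field, the parent value is the artist's "Name". The predicate `ex:artist` and the function name are illustrative assumptions that do not appear in the slides.

```python
from urllib.parse import quote_plus

ARTWORKS = [
    {"Ref": "NPG_70_36", "Artist": "Ronald Anderson"},
    {"Ref": "S_NPG_2010_51", "Artist": "Robert Theodore McCall"},
]
ARTISTS = [
    {"Name": "Robert Theodore McCall"},
    {"Name": "Ronald Anderson"},
]


def join_artworks_to_artists(artworks, artists, child="Artist", parent="Name"):
    """Link each artwork to the artist subjects whose parent value equals its child value."""
    # Index the parent Triples Map's subjects by their join value.
    subjects_by_value = {}
    for artist in artists:
        subject = "http://ex.com/" + quote_plus(artist["Name"])
        subjects_by_value.setdefault(artist[parent], []).append(subject)
    triples = []
    for artwork in artworks:
        subject = "http://ex.com/" + artwork["Ref"]
        for obj in subjects_by_value.get(artwork[child], []):
            triples.append((subject, "ex:artist", obj))  # ex:artist is hypothetical
    return triples


for triple in join_artworks_to_artists(ARTWORKS, ARTISTS):
    print(triple)
```

Because the two sides come from different Logical Sources, the join can span formats, here a JSON file and an XML file.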
RDF Mapping Language
(RML)
Editing mappings with Karma
http://www.isi.edu/integration/karma/
RDF Mapping Language
(RML)
Processing
mapping-driven processing:
processing driven by the mapping module
data-driven processing:
processing driven by the extraction module
[Diagram: the RML Processor, composed of an Extraction Module and a Mapping Module]
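One possible reading of the two processing strategies is a difference in loop order. The Python sketch below illustrates that reading only; the slides do not spell out the processor's internals. `RULES`, `ex:date` and the function names are hypothetical.

```python
# Sample records, reduced to the fields used by the two illustrative rules.
ARTWORKS = [
    {"Ref": "NPG_70_36", "Title": "Apollo 11 Crew", "DateOfWork": "1969"},
    {"Ref": "S_NPG_2010_51", "Title": "Neil Armstrong", "DateOfWork": "2009"},
]

# Each rule pairs a predicate with the field it reads (illustrative only).
RULES = [("rdfs:label", "Title"), ("ex:date", "DateOfWork")]


def subject_of(record):
    return "http://ex.com/" + record["Ref"]


def mapping_driven(records, rules):
    """The mapping side leads: every rule requests its own pass over the data."""
    triples = []
    for predicate, field in rules:
        for record in records:
            triples.append((subject_of(record), predicate, record[field]))
    return triples


def data_driven(records, rules):
    """The extraction side leads: the data is walked once, each record meets every rule."""
    triples = []
    for record in records:
        for predicate, field in rules:
            triples.append((subject_of(record), predicate, record[field]))
    return triples


print(mapping_driven(ARTWORKS, RULES))
print(data_driven(ARTWORKS, RULES))
```

Both orders produce the same set of triples; they differ in how often the source is traversed and in the order the triples are emitted.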
Mapping Hierarchical Sources into RDF
using the RML mapping language
RML: http://rml.io
RML Namespace: http://semweb.mmlab.be/ns/rml
RML Processor: https://github.com/mmlab/RMLProcessor
Contact us
Anastasia Dimou anastasia.dimou@ugent.be @natadimou
Miel Vander Sande miel.vandersande@ugent.be @Miel_vds
