xR2RML is a mapping language that extends R2RML and RML to enable the translation of heterogeneous data sources, including relational databases, NoSQL databases, XML documents, JSON documents and more, to RDF. xR2RML provides a unified approach for describing mappings from various data models and query languages to RDF through the use of logical sources, references to data elements, and support for nested collections and cross-references between data sources. This allows for standardized translation of diverse data to the semantic web.
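For illustration, the snippet below sketches what a minimal xR2RML triples map over a MongoDB collection might look like. It is a sketch, not a normative example: the prefix IRIs, the collection name `staff`, and the JSONPath references are assumptions for this example; the xR2RML specification defines the exact vocabulary.

```turtle
@prefix rr:  <http://www.w3.org/ns/r2rml#> .
@prefix xrr: <http://www.i3s.unice.fr/ns/xr2rml#> .
@prefix ex:  <http://example.org/> .

# Hypothetical mapping: translate each document of a MongoDB
# collection "staff" into RDF triples about a staff member.
<#StaffMapping>
    xrr:logicalSource [
        # Query run against the database (a MongoDB find query here)
        xrr:query "db.staff.find({})"
    ];
    rr:subjectMap [
        # Reference to a field of each retrieved JSON document
        rr:template "http://example.org/staff/{$.name}"
    ];
    rr:predicateObjectMap [
        rr:predicate ex:manages;
        # "$.manages.*" iterates over a nested JSON array
        rr:objectMap [ xrr:reference "$.manages.*" ]
    ].
```

A triples map of this kind combines a logical source (the database query), a subject map, and predicate-object maps whose data-element references are expressed in a language suited to the underlying store (here, JSONPath over MongoDB documents).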
Over the last years, the Semantic Web has been growing steadily. Today, we count more than 10,000 datasets made available online following Semantic Web standards. Nevertheless, many applications, such as data integration, search, and interlinking, may not take full advantage of the data without a priori statistical information about its internal structure and coverage. There are already a number of tools that offer such statistics, providing basic information about RDF datasets and vocabularies. However, they usually show severe performance deficiencies once the dataset size grows beyond the capabilities of a single machine. In this paper, we introduce a software component for statistical calculations of large RDF datasets, which scales out to clusters of machines. More specifically, we describe the first distributed in-memory approach for computing 32 different statistical criteria for RDF datasets using Apache Spark. The preliminary results show that our distributed approach improves upon a previous centralized approach we compare against and provides approximately linear horizontal scale-up. The component is extensible beyond the 32 default criteria, is integrated into the larger SANSA framework, and is employed in at least four major usage scenarios beyond the SANSA community.
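To give a flavour of the kind of criteria involved, the sketch below computes two simple statistics (class usage and property usage counts) over an in-memory list of triples. The real system expresses each criterion as a filter/map/reduce pipeline over a Spark RDD of triples; this plain-Python sketch only imitates that shape, and all names and data are ours, not SANSA's.

```python
from collections import Counter

RDF_TYPE = "rdf:type"

# Toy triple store: (subject, predicate, object) tuples.
triples = [
    ("ex:alice", RDF_TYPE, "foaf:Person"),
    ("ex:bob",   RDF_TYPE, "foaf:Person"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob",   "foaf:name",  "Bob"),
]

def class_usage(triples):
    """Criterion 'class usage count': how often each class appears
    as the object of an rdf:type triple (filter -> map -> count)."""
    return Counter(o for s, p, o in triples if p == RDF_TYPE)

def property_usage(triples):
    """Criterion 'property usage count': how often each predicate occurs."""
    return Counter(p for s, p, o in triples)

print(class_usage(triples))     # foaf:Person is used twice
print(property_usage(triples))
```

In a distributed setting, each such counting criterion maps naturally onto a `map`/`reduceByKey` pipeline, which is why the per-criterion formulation scales out.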
Integrating Heterogeneous Data Sources in the Web of Data (Franck Michel)
These are the slides of a 40-minute presentation I gave at the CNRS Software Development Days (JDEV 2017) in Marseille, France, on July 5th, 2017.
Here is the Webcast, in French: https://webcast.in2p3.fr/videos-integrer_des_sources_de_donnees_heterogenes_dans_le_web_de_donnees
The International Federation of Library Associations and Institutions (IFLA) is responsible for the development and maintenance of International Standard Bibliographic Description (ISBD), UNIMARC, and the "Functional Requirements" family for bibliographic records (FRBR), authority data (FRAD), and subject authority data (FRSAD). ISBD underpins the MARC family of formats used by libraries world-wide for many millions of catalog records, while FRBR is a relatively new model optimized for users and the digital environment. These metadata models, schemas, and content rules are now being expressed in the Resource Description Framework language for use in the Semantic Web.
This webinar provides a general update on the work being undertaken. It describes the development of an Application Profile for ISBD to specify the sequence, repeatability, and mandatory status of its elements. It discusses issues involved in deriving linked data from legacy catalogue records based on monolithic and multi-part schemas following ISBD and FRBR, such as the duplication which arises from copy cataloging and FRBRization. The webinar provides practical examples of deriving high-quality linked data from the vast numbers of records created by libraries, and demonstrates how a shift of focus from records to linked-data triples can provide more efficient and effective user-centered resource discovery services.
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ... (Stuart Chalk)
An electronic laboratory notebook (ELN) can be characterized as a system that allows scientists to capture the data and resources used in performing scientific experiments. This allows users to easily organize and find their data; however, little information about the scientific process is recorded.
In this paper we highlight the current status of progress toward semantic representation of science in ELNs.
Eureka Research Workbench: A Semantic Approach to an Open Source Electroni... (Stuart Chalk)
Scientists are looking for ways to leverage web 2.0 technologies in the research laboratory and as a consequence a number of approaches to web-based electronic notebooks are being evaluated. In this presentation I discuss the Eureka Research Workbench, an electronic laboratory notebook built on semantic technology and XML. Using this approach the context of the information recorded in the laboratory can be captured and searched along with the data itself. A discussion of the current system is presented along with the next planned development of the framework and long-term plans relative to linked open data. Presented at the 246th American Chemical Society Meeting in Indianapolis, IN, USA on September 12th, 2013.
Presentation given* at the 13th International Semantic Web Conference (ISWC), in which we propose a compressed format to represent RDF Data Streams. See the original article at: http://dataweb.infor.uva.es/wp-content/uploads/2014/07/iswc14.pdf
* Presented by Alejandro Llaves (http://www.slideshare.net/allaves)
... or how to query an RDF graph with 28 billion triples on a standard laptop
These slides correspond to my talk at the Stanford Center for Biomedical Informatics, on 25th April 2018
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using... (Stuart Chalk)
Recently, the US government has mandated that publicly funded scientific research data be freely made available in a usable form, allowing integration of the data in other systems. While this mandate has been articulated, existing publications and new papers (PDF) still do not provide accessible data, meaning that their usefulness is limited without human intervention.
This presentation outlines our efforts to extract scientific data from PDF files, using the PDFToText software and regular expressions (regex), and process it into a form that structures the data and its context (metadata). Extracted data is processed (cleaned, normalized), organized, and inserted into a contextually developed MySQL database. The data and metadata can then be output using a generic JSON-LD based scientific data model (SDM) under development in our laboratory.
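As a simplified illustration of the rule-based extraction step, the sketch below pulls a value and its context out of one line of text with a regular expression using named groups. The regex, the field names, and the data line are invented for this example; the actual system develops rules per data type and targets a JSON-LD scientific data model rather than plain JSON.

```python
import re
import json

# Hypothetical line of text extracted from a PDF with pdftotext-style tools.
line = "Solubility of NaCl in water at 25 C: 360 g/L"

# A rule: a regex with named groups capturing the value and its context.
rule = re.compile(
    r"Solubility of (?P<compound>\w+) in (?P<solvent>\w+) "
    r"at (?P<temperature>\d+) C: (?P<value>[\d.]+) (?P<unit>\S+)"
)

m = rule.match(line)
record = m.groupdict()

# Serialize data + metadata as JSON (a stand-in for the JSON-LD data model).
print(json.dumps(record, indent=2))
```

The named groups keep the contextual metadata (compound, solvent, temperature, unit) attached to the extracted value, which is what makes the output loadable into a structured database afterwards.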
Video: https://www.youtube.com/watch?v=Rt2oHibJT4k
Technologies such as Hadoop have addressed the "Volume" problem of Big Data, and technologies such as Spark have recently addressed the "Velocity" problem, but the "Variety" problem is largely unaddressed: a lot of manual "data wrangling" is still needed to manage data models.
These manual processes do not scale well. Not only is the variety of data increasing; the rate of change in data definitions is increasing as well. We can’t keep up. NoSQL data repositories can handle storage, but we need effective models of the data to fully utilize it.
This talk presents tools and a methodology to manage Big Data Models in a rapidly changing world. It covers:
Creating Semantic Metadata Models of Big Data Resources
Graphical UI Tools for Big Data Models
Tools to synchronize Big Data Models and Application Code
Using NoSQL Databases, such as Amazon DynamoDB, with Big Data Models
Using Big Data Models with Hadoop, Storm, Spark, Giraph, and Inference
Using Big Data Models with Machine Learning to generate Predictive Models
Developer Collaborative/Coordination processes using Big Data Models and Git
Managing change – Big Data Models with rapidly changing Data Resources
Morning-session talk at the second Keystone Training School, "Keyword search in Big Linked Data", held in Santiago de Compostela.
https://eventos.citius.usc.es/keystone.school/
Talk about Exploring the Semantic Web, and particularly Linked Data, and the Rhizomer approach. Presented August 14th 2012 at the SRI AIC Seminar Series, Menlo Park, CA
Data integration with a façade. The case of knowledge graph construction. (Enrico Daga)
"Data integration with a façade. The case of knowledge graph construction." is an overview of recent research in façade-based data access. The slides introduce core notions of façade-based data access and the design principles of SPARQL Anything, a system that allows querying many formats (CSV, JSON, XML, HTML, Markdown, Excel, ...) in plain SPARQL.
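To give a flavour of the façade approach, the query below sketches how SPARQL Anything lets plain SPARQL reach into a non-RDF file through a SERVICE clause. The file name `./people.json` and the `xyz:name` property are illustrative assumptions: SPARQL Anything mints Facade-X properties from the keys it finds in the source, so the exact property names depend on the file's contents.

```sparql
PREFIX xyz: <http://sparql.xyz/facade-x/data/>
PREFIX fx:  <http://sparql.xyz/facade-x/ns/>

# Query a JSON file in plain SPARQL via the SPARQL Anything façade.
SELECT ?name WHERE {
  SERVICE <x-sparql-anything:location=./people.json> {
    ?person xyz:name ?name .
  }
}
```

The point of the façade is that the same query shape works regardless of whether the location points at JSON, CSV, XML, or another supported format.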
What Are Links in Linked Open Data? A Characterization and Evaluation of Link... (Armin Haller)
Linked Open Data promises to provide guiding principles to publish interlinked knowledge graphs on the Web in the form of findable, accessible, interoperable, and reusable datasets. In this talk I argue that, while Linked Data may as such be viewed as a basis for instantiating the FAIR principles, there are still a number of open issues that cause significant data quality problems even when knowledge graphs are published as Linked Data. I will first define the boundaries of what constitutes a single coherent knowledge graph within Linked Data, i.e., present a principled notion of what a dataset is and what links within and between datasets are. I will also define different link types for data in Linked datasets and present the results of our empirical analysis of linkage among the datasets of the Linked Open Data cloud. Recent results from our analysis of Wikidata, which has not been part of the Linked Open Data Cloud, will also be presented.
Transient and persistent RDF views over relational databases in the context o... (Nikolaos Konstantinou)
As far as digital repositories are concerned, numerous benefits emerge from exposing their contents as Linked Open Data (LOD). This leads more and more repositories in this direction. However, several factors need to be taken into account in doing so, among which is whether the transition needs to be materialized in real time or in asynchronous time intervals. In this paper we provide the problem framework in the context of digital repositories, we discuss the benefits and drawbacks of both approaches, and draw our conclusions after evaluating a set of performance measurements. Overall, we argue that in contexts with infrequent data updates, as is the case with digital repositories, persistent RDF views are more efficient than real-time SPARQL-to-SQL rewriting systems in terms of query response times, especially when expensive SQL queries are involved.
Unleash the Potential of your Website! 180,000 webpages from the French NHM m... (Franck Michel)
Slides of a presentation I gave at the TDWG 2020 conference.
Paper: https://doi.org/10.3897/biss.4.59046
Video: https://www.youtube.com/watch?v=KiAgTWpEkHE
A Model to Represent Nomenclatural and Taxonomic Information as Linked Data ... (Franck Michel)
Presentation of an article published at the 2nd International Workshop on Semantics for Biodiversity (S4Biodiv 2017), co-located with ISWC2017.
Article: https://hal.archives-ouvertes.fr/hal-01617708
Taxonomic registers are key tools to help us comprehend the diversity of nature. Publishing such registers in the Web of Data, following the standards and best practices of Linked Open Data (LOD), is a way of integrating multiple data sources into a world-scale, biological knowledge base. In this paper, we present an ongoing work aimed at the publication of TAXREF, the French national taxonomic register, on the Web of Data. Far beyond the mere translation of the TAXREF database into LOD standards, we show that the key point of this endeavor is the design of a model capable of capturing the two coexisting yet distinct realities underlying taxonomic registers, namely the nomenclature (the rules for naming biological entities) and the taxonomy (the description and characterization of these biological entities). We first analyze different modelling choices made to represent some international taxonomic registers as LOD, and we underline the issues that arise from these differences. Then, we propose a model aimed to tackle these issues. This model separates nomenclature from taxonomy, it is flexible enough to accommodate the ever-changing scientific consensus on taxonomy, and it adheres to the philosophy underpinning the Semantic Web standards. Finally, using the example of TAXREF, we show that the model enables interlinking with third-party LOD data sets, whether they represent nomenclatural or taxonomic information.
SPARQL Micro-Services: Lightweight Integration of Web APIs and Linked Data (Franck Michel)
Presentation of an article published at the 11th workshop on Linked Data on the Web (LDOW2018), co-located with the Web Conference 2018.
Article: https://hal.archives-ouvertes.fr/hal-01722792
We hypothesize that harnessing the Semantic Web standards to enable automatic combination of Linked Data and non-RDF Web APIs data could trigger novel cross-fertilization scenarios.
To achieve this goal, we define the SPARQL Micro-Service architecture. A SPARQL micro-service is a lightweight, task-specific SPARQL endpoint that provides access to a small, resource-centric, virtual graph, while dynamically assigning dereferenceable URIs to Web API resources that do not have URIs beforehand. The graph is delineated by the Web API service being wrapped, the arguments passed to this service, and the restricted types of RDF triples that this SPARQL micro-service is designed to spawn. In this context, we argue that full SPARQL expressiveness can be supported efficiently without jeopardizing server availability. Ultimately, we believe that an ecosystem of SPARQL micro-services could emerge from independent service providers, enabling Linked Data-based applications to glean pieces of data from a wealth of distributed, scalable and reliable services. We describe an experimentation where we dynamically augment biodiversity-related Linked Data with data from Flickr, MusicBrainz and the Macaulay scientific media library.
Construction d’un référentiel taxonomique commun pour des études sur l’histoi... (Franck Michel)
SemWeb.Pro conference, November 21st, 2016.
One of the missions of the Muséum National d’Histoire Naturelle (MNHN) is to establish a synthesis of French biodiversity and natural heritage. In this context, it is in charge of developing TAXREF, a taxonomic register for fauna, flora and fungi. This single register lists and organizes the scientific names of all living beings recorded in the French territories, mainland and overseas, and is the cornerstone of the Système d’Information sur la Nature et les Paysages (SINP). It is used by many public, private and civil-society actors (local authorities, curators, architecture firms, teachers, citizens, etc.). TAXREF is moreover aligned with other international taxonomic and nomenclatural registers.
The Zoomathia research project aims to study the history of zoological knowledge through Antiquity and the Middle Ages. To this end, it plans to use Semantic Web technologies to integrate heterogeneous data sources, ranging from medieval encyclopedias to modern biology data, by way of archaeological excavation reports and iconographic resources. This work necessarily involves selecting and/or defining vocabularies that can serve as taxonomic, cultural, geographic, chronological, etc. references. To make the integrated data interoperable on the Web, these vocabularies must be the object of a consensus and be linked to other related, authoritative vocabularies. Since TAXREF is the result of a broad scientific consensus, and is already used for the integration of modern biology data and archaeological data, it was selected to build a thesaurus supporting the integration of the data considered within the Zoomathia project.
In this presentation, I will revisit the motivations outlined above, then describe the modelling of a thesaurus expressed in SKOS (Simple Knowledge Organization System) to produce a version of TAXREF usable with Semantic Web technologies. I will notably address the question of the links between this "TAXREF-SKOS" and other existing thesauri and ontologies. Finally, I will describe the method used to produce the result in RDF and to publish it on the Web of Data as persistent, dereferenceable URIs, and I will give a short demonstration by browsing the URIs as Linked Data and by running SPARQL queries. In conclusion, I will come back to the fact that building a SKOS thesaurus is only a step, an enabler, meant to encourage data producers already using TAXREF, as well as application designers, to adopt these technologies and rely on TAXREF-SKOS.
A Mapping-based Method to Query MongoDB Documents with SPARQL (Franck Michel)
Accessing legacy data as virtual RDF stores is a key issue in the building of the Web of Data. In recent years, the MongoDB database has become a popular actor in the NoSQL market, making it a significant potential contributor to the Web of Linked Data. Therefore, in this talk we present an article published at the DEXA 2016 conference. It addresses the question of how to access arbitrary MongoDB documents with SPARQL.
We propose a two-step method to (i) translate a SPARQL query into a pivot abstract query under MongoDB-to-RDF mappings represented in the xR2RML language, then (ii) translate the pivot query into a concrete MongoDB query.
We elaborate on the discrepancy between the expressiveness of SPARQL and the MongoDB query language, and we show that we can always come up with a rewriting that produces all correct answers.
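To make the idea concrete, here is a toy sketch (not the paper's actual algorithm) of rewriting one SPARQL triple pattern into a MongoDB find() filter, assuming a hypothetical xR2RML-style mapping that binds each SPARQL predicate to a JSON field of the MongoDB documents:

```python
# Hypothetical mapping: SPARQL predicate -> JSON field in MongoDB documents.
PREDICATE_TO_FIELD = {"foaf:name": "name", "foaf:age": "age"}

def rewrite_triple_pattern(subject, predicate, obj):
    """Rewrite a single SPARQL triple pattern into a MongoDB find()
    filter, using the mapping above. SPARQL variables (starting
    with '?') in object position become existence constraints."""
    field = PREDICATE_TO_FIELD[predicate]
    if obj.startswith("?"):
        # Unbound object: only require the field to exist.
        return {field: {"$exists": True}}
    # Bound object: match the literal value exactly.
    return {field: obj}

print(rewrite_triple_pattern("?s", "foaf:name", "Alice"))
print(rewrite_triple_pattern("?s", "foaf:age", "?a"))
```

The real method goes through an intermediate, database-independent abstract query precisely because full SPARQL (joins, filters, optionals) does not map one-to-one onto the MongoDB query language, as the abstract above notes.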
Make our Scientific Datasets Accessible and Interoperable on the Web (Franck Michel)
The presentation investigates the challenges that we must face to share scientific datasets on the Web following the Linked Open Data principles. We present the standards of the Semantic Web and investigate how they can help address those challenges. We give tips as to how to choose vocabularies to describe data and metadata, link datasets to other related datasets by making appropriate alignments, translate existing data sources to RDF and publish it on the Web as linked data.
Towards a Shared Reference Thesaurus for Studies on History of Zoology, Archa... (Franck Michel)
Presentation of a collective article we submitted at the First Semantic Web for Scientific History workshop (SW4SH), co-located with ESWC 2015.
Link to the article: https://hal.archives-ouvertes.fr/hal-01146638v1
Translation of Relational and Non-Relational Databases into RDF with xR2RML
1. 1
Translation of Relational and
Non-Relational Databases
into RDF with xR2RML
F. Michel, L. Djimenou, C. Faron-Zucker, J. Montagnat
I3S lab, CNRS, Univ. Nice Sophia
2. 2
Web of data: publication/interlinking of open datasets
• Goal: publish heterogeneous data in a common format (RDF)
Driven by data integration initiatives, e.g.:
• Linking Open Data, 1,015 datasets
• W3C Data Activity
• BIO2RDF, 35 datasets
• Neuroscience Information
Framework
(12598 registry entries)
Web-scale data integration
Linked Datasets as of Aug. 30th 2014.
(c) R. Cyganiak & A. Jentzsch
(Data: Apr. 2015)
3. 3
Web-scale data integration
Need to access data from the Deep Web [1]
• Structured/unstructured data
hardly indexed by search engines,
hardly linked with other data sources
Exponential data growth goes on
• Various types of DBs:
RDB, NoSQL, NewSQL, Native XML,
LDAP directory, OODB...
• Heterogeneous data models and
query capabilities
[1] B. He, M. Patel, Z. Zhang, and K. C.-C. Chang. Accessing the deep web. Communications of the ACM, 50(5):94–101, 2007
4. 4
Web-scale data integration
To enrich the web of data with
existing and new data being created
ever faster...
... we need standardized approaches
to enable the translation of
heterogeneous data sources to RDF
5. 5
Previous works
Background: R2RML and RML
Description of xR2RML
Evaluation and perspectives
Agenda
6. 6
Previous works
Background: R2RML and RML
Description of xR2RML
Evaluation and perspectives
Agenda
7. 7
Much work achieved on RDBs
D2RQ, Virtuoso, R2RML (W3C)…
Goals: generic RDB-to-RDF, OBDA, ontology learning, schema mapping…
Methods: direct mapping vs. domain-specific,
materialization vs. SQL-to-SPARQL query rewriting
XML: using either XPath (RML), XQuery (XSPARQL,
SPARQL2XQuery) or XSLT (Scissor-Lift), XSD-to-OWL
(SPARQL2XQuery)
CSV/TSV/Spreadsheets: CSV on the web (W3C WG)
JSON: using JSONPath (RML)
Integration frameworks: DataLift, RML, Asio Tool Suite…
Previous works
8. 8
Existing approaches to map specific types of databases or
map specific data formats to RDF
Each comes with its own mapping language or UI
Supporting a new system (data model and QL) not
straightforward
Previous works
No unified mapping language that applies equally to the most common
databases (RDB, NoSQL, XML, LDAP, OO…)
Supporting a new data model and/or QL → develop a DB connector,
but no change to the mapping language
9. 9
Previous works
Background: R2RML and RML
Description of xR2RML
Evaluation and perspectives
Agenda
10. 10
R2RML – RDB To RDF Mapping Language
W3C recommendation, 2012
Goals:
• Describe mappings of relational entities to RDF
• Reuse of existing ontologies
• Operationalization not addressed
How: TriplesMaps (TM) define how to generate RDF triples
• 1 logical table → rows to process
• 1 subject map → subject IRIs
• N (predicate map, object map) pairs
• 1 optional graph map → graph IRIs
An R2RML mapping is an RDF graph
11. 11
R2RML – RDB To RDF Mapping Language
Table "Study" ("Centre_Id" is a FK to "Centre"."Id"):
Id   Acronym   Centre_Id
10   CAC2010   4

Table "Centre":
Id   Name      address
4    Pasteur   ...
R2RML mapping graph:
Produced RDF:
<#Centre> a rr:TriplesMap;
rr:logicalTable [ rr:tableName "Centre" ];
rr:subjectMap [ rr:class ex:Centre;
rr:template "http://example.org/centre#{Name}"; ].
<#Study> a rr:TriplesMap;
rr:logicalTable [ rr:tableName "Study" ];
rr:subjectMap [ rr:class ex:Study;
rr:template "http://example.org/study#{Id}"; ];
rr:predicateObjectMap [
rr:predicate ex:hasName;
rr:objectMap [ rr:column "Acronym" ]; ];
rr:predicateObjectMap [
rr:predicate ex:locatedIn;
rr:objectMap [
rr:parentTriplesMap <#Centre>;
rr:joinCondition [
rr:child "Centre_id";
rr:parent "Id";
]; ]; ].
<http://example.org/centre#Pasteur> a ex:Centre.
<http://example.org/study#10> a ex:Study;
ex:hasName "CAC2010";
ex:locatedIn <http://example.org/centre#Pasteur>.
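To make the mechanics concrete, here is a minimal Python sketch of what an R2RML processor does with the mapping above: expand rr:template placeholders with row values, and follow the rr:joinCondition to link a Study to its Centre. This is a toy illustration, not an actual R2RML engine; the in-memory tables simply replay the example rows from the slide.

```python
import re

# In-memory stand-ins for the example "Centre" and "Study" tables.
centres = [{"Id": 4, "Name": "Pasteur", "address": "..."}]
studies = [{"Id": 10, "Acronym": "CAC2010", "Centre_Id": 4}]

def expand(template, row):
    """Replace each {column} placeholder with the row's value, as rr:template does."""
    return re.sub(r"\{(\w+)\}", lambda m: str(row[m.group(1)]), template)

triples = []
for c in centres:
    subj = expand("http://example.org/centre#{Name}", c)
    triples.append((subj, "rdf:type", "ex:Centre"))
for st in studies:
    subj = expand("http://example.org/study#{Id}", st)
    triples.append((subj, "rdf:type", "ex:Study"))
    triples.append((subj, "ex:hasName", st["Acronym"]))
    # rr:joinCondition: child column "Centre_Id" must equal parent column "Id";
    # the object IRI is built with the parent TriplesMap's subject template.
    for c in centres:
        if st["Centre_Id"] == c["Id"]:
            triples.append((subj, "ex:locatedIn",
                            expand("http://example.org/centre#{Name}", c)))
```

Running this yields exactly the four triples of the "Produced RDF" box above.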
12. 12
RML extensions to R2RML

XML document:

<centres>
  <centre Id="4">
    <name>Pasteur</name>
  </centre>
  <centre Id="6">
    <name>Pontchaillou</name>
  </centre>
</centres>

RML mapping graph:

<#Centre>
  rml:logicalSource [
    rml:source "http://example.org/Centres.xml";
    rml:referenceFormulation ql:XPath;
    rml:iterator "/centres/centre";
  ];
  rr:subjectMap [
    rr:class ex:Centre;
    rr:template "http://example.org/centre#{//centre/@Id}";
  ];
  rr:predicateObjectMap [
    rr:predicate ex:hasName;
    rr:objectMap [ rml:reference "//centre/name" ];
  ];

Advantages:
• Extends to CSV, JSON, XML sources
• Map several sources simultaneously

Limitations:
• Fixed list of reference formulations
• No distinction between reference formulation and query language
• No RDF collections
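The rml:iterator / rml:reference pattern above can be mimicked in a few lines of Python. This toy sketch (not a real RML processor) uses the standard-library ElementTree, whose XPath support is limited, so the iterator is expressed relative to the document root; data and URIs are the example's own.

```python
import xml.etree.ElementTree as ET

xml_doc = """<centres>
  <centre Id="4"><name>Pasteur</name></centre>
  <centre Id="6"><name>Pontchaillou</name></centre>
</centres>"""

root = ET.fromstring(xml_doc)
triples = []
# rml:iterator "/centres/centre": one iteration per <centre> element.
for centre in root.findall("./centre"):
    # Subject IRI built from the @Id attribute, as in rr:template.
    subject = "http://example.org/centre#" + centre.attrib["Id"]
    triples.append((subject, "rdf:type", "ex:Centre"))
    # rml:reference "//centre/name", evaluated against the current element.
    triples.append((subject, "ex:hasName", centre.find("name").text))
```

Each iteration of the loop plays the role of one logical-source iteration in RML.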
13. 13
Previous works
Background: R2RML and RML
Description of xR2RML
Evaluation and perspectives
Agenda
14. 14
xR2RML - Overall picture
Flexible language to describe mappings from most common types of DB to RDF.
Extends R2RML and leverages RML extensions.

(Architecture diagram: an xR2RML mapping description, which refers to domain
ontologies, drives the xR2RML translation engine, which queries the source
database through its native QL.)
24. 24
xR2RML: content with mixed formats

Data with mixed content: relational table "STAFF", column "Name"
contains JSON data:

... Name ...
... { "FirstName": "Bob", "LastName": "Smith" } ...

xR2RML mapping graph:

<#Centre>
  xrr:logicalSource [
    xrr:sourceName "STAFF";
  ];
  ...
  rr:predicateObjectMap [
    rr:predicate ex:first-name;
    rr:objectMap [
      xrr:reference "Column(Name)/JSONPath($.FirstName)" ];
  ];

Data format   Syntax path constructor
Row           Column(), CSV(), TSV()
XML           XPath()
JSON          JSONPath()
25. 25
Previous works
Background: R2RML and RML
Description of xR2RML main features
Evaluation and perspectives
Agenda
26. 26
Use case: study the history and transmission of zoological
knowledge across historical periods
TAXREF taxonomical reference
• Designed to support studies in Conservation Biology, enriched
with bioarchaeological taxa
• Maintained by the French National Museum of Natural History
• ~450,000 terms, CSV/JSON/XML
Use case in Digital Humanities
27. 27
Ongoing work [2]: Construction of a SKOS1 thesaurus based
on TAXREF
• Import of TAXREF/JSON into MongoDB
• Use of the Morph-xR2RML prototype implementation of
xR2RML, to convert the MongoDB data to RDF
• Make alignments with existing well-adopted ontologies
(e.g. NCBI Taxonomic Classification, GeoNames...)
• Static alignments at mapping design time
• Using automatic alignment methods
Use case in Digital Humanities
1 SKOS: Simple Knowledge Organization System, a W3C RDF-based standard to represent controlled
vocabularies, taxonomies and thesauri. It bridges the gap between existing KOSs and the Semantic
Web and Linked Data.
28. 28
Ongoing discussion about the use of
xR2RML to support ecology and
agronomic studies
• Large phenotype databases
Consider the query rewriting approach to support large
datasets
How to write xR2RML mappings
• Automatic xR2RML mapping generation from data schema
(XSD/DTD, JSON schema, JSON-LD...)
• Schema mapping
• Schema discovery
Perspectives
29. 29
Conclusions
The data deluge keeps growing ever faster
Data stored in many kinds of DBs
xR2RML:
• Flexible language to map most common types of database to
RDF
• Supports various data models and query languages
• Rich features: RDF collections/containers, joins, content with
mixed formats
Applied to the construction of a SKOS thesaurus of
TAXREF, a taxonomical reference
30. 30
Contacts:
Franck Michel
Johan Montagnat
Catherine Faron-Zucker
[2] C. Callou, F. Michel, C. Faron-Zucker, C. Martin, J. Montagnat. Towards a Shared Reference Thesaurus for
Studies on History of Zoology, Archaeozoology and Conservation Biology. In SW4SH workshop, ESWC’15.
[3] F. Michel, L. Djimenou, C. Faron-Zucker, and J. Montagnat. xR2RML: Non-Relational Databases to RDF
Mapping Language. Research report. ISRN I3S/RR 2014-04-FR. http://hal.archives-ouvertes.fr/hal-01066663
https://github.com/frmichel/morph-xr2rml/