xR2RML is a mapping language that extends R2RML and RML to enable the translation of heterogeneous data sources, including relational databases, NoSQL databases, XML documents, JSON documents and more, to RDF. xR2RML provides a unified approach for describing mappings from various data models and query languages to RDF through the use of logical sources, references to data elements, and support for nested collections and cross-references between data sources. This allows for standardized translation of diverse data to the semantic web.
Translation of Relational and Non-Relational Databases into RDF with xR2RML
1. 1
F. Michel, L. Djimenou, C. Faron-Zucker, J. Montagnat
I3S lab, CNRS, Univ. Nice Sophia
2. 2
Web-scale data integration
Web of data: publication and interlinking of open datasets
• Goal: publish heterogeneous data in a common format (RDF)
Driven by data integration initiatives, e.g.:
• Linking Open Data, 1015 datasets
• W3C Data Activity
• BIO2RDF, 35 datasets
• Neuroscience Information Framework (12598 registry entries)
(Data: Apr. 2015)
[Figure: Linked Datasets as of Aug. 30th 2014, (c) R. Cyganiak and A. Jentzsch]
3. 3
Web-scale data integration
Need to access data from the Deep Web [1]
• Structured and unstructured data, hardly indexed by search engines and hardly linked with other data sources
Exponential data growth goes on
• Various types of DBs: RDB, NoSQL, NewSQL, native XML, LDAP directory, OODB...
• Heterogeneous data models and query capabilities
[1] B. He, M. Patel, Z. Zhang, and K. C.-C. Chang. Accessing the deep web. Communications of the ACM, 50(5):94–101, 2007
4. 4
Web-scale data integration
To enrich the web of data with existing data and new data being created ever faster...
... we need standardized approaches to enable the translation of heterogeneous data sources to RDF
5. 5
Agenda
Previous works
Background: R2RML and RML
Description of xR2RML
Evaluation and perspectives
7. 7
Previous works
Much work has been achieved on RDBs: D2RQ, Virtuoso, R2RML (W3C)…
Goals: generic RDB-to-RDF, OBDA, ontology learning, schema mapping…
Methods: direct mapping vs. domain-specific mapping, materialization vs. SPARQL-to-SQL query rewriting
XML: using either XPath (RML), XQuery (XSPARQL, SPARQL2XQuery) or XSLT (Scissor-Lift); XSD-to-OWL (SPARQL2XQuery)
CSV/TSV/Spreadsheets: CSV on the Web (W3C WG)
JSON: using JSONPath (RML)
Integration frameworks: DataLift, RML, Asio Tool Suite…
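The "direct mapping" method mentioned above derives RDF purely from the table structure, with no hand-written mapping. A minimal sketch, loosely following the W3C Direct Mapping conventions (row IRI `<Table/PK=value>`, class `<Table>`, property `<Table#Column>`), assuming a single-column primary key; it deliberately omits IRI percent-encoding, typed literals and foreign-key reference triples, and the base IRI is illustrative:

```python
# Simplified sketch of a direct mapping: the RDF shape follows the table
# structure only (no domain ontology involved).
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def direct_map(base: str, table: str, pk: str,
               rows: List[Dict[str, object]]) -> List[Triple]:
    """Row IRIs come from the primary key, one property IRI per column,
    and every non-NULL cell becomes a plain literal."""
    triples: List[Triple] = []
    for row in rows:
        subject = f"<{base}{table}/{pk}={row[pk]}>"
        triples.append((subject, f"<{RDF_TYPE}>", f"<{base}{table}>"))
        for column, value in row.items():
            if value is None:  # NULL cells produce no triple
                continue
            triples.append((subject, f"<{base}{table}#{column}>", f'"{value}"'))
    return triples


if __name__ == "__main__":
    study = [{"Id": 10, "Acronym": "CAC2010", "Centre_Id": 4}]
    for t in direct_map("http://example.org/", "Study", "Id", study):
        print(" ".join(t), ".")
```

By contrast, domain-specific mappings such as R2RML let the author choose classes, properties and IRI templates from existing ontologies.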
8. 8
Previous works
Existing approaches map specific types of databases or specific data formats to RDF:
• Each comes with its own mapping language or UI
• Supporting a new system (data model and QL) is not straightforward
No unified mapping language applies equally to the most common databases (RDB, NoSQL, XML, LDAP, OO…)
Supporting a new data model and/or QL should only require developing a DB connector, with no change in the mapping language
9. 9
Agenda
Previous works
Background: R2RML and RML
Description of xR2RML
Evaluation and perspectives
10. 10
R2RML – RDB To RDF Mapping Language
W3C recommendation, 2012
Goals:
• Describe mappings of relational entities to RDF
• Reuse of existing ontologies
• Operationalization not addressed
How: TriplesMaps (TM) define how to generate RDF triples:
• 1 logical table: the rows to process
• 1 subject map: generates subject IRIs
• N (predicate map, object map) couples
• 1 optional graph map: generates graph IRIs
An R2RML mapping is itself an RDF graph (a set of triples)
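The TriplesMap structure described above can be pictured as a small data structure. The sketch below is a hypothetical in-memory model (the class, field and function names are illustrative, not part of any R2RML library); to keep it short, every object map is assumed to be a plain column reference producing a literal:

```python
# Hypothetical model of an R2RML TriplesMap; names are illustrative only.
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]
Quad = Tuple[str, str, str, Optional[str]]  # (subject, predicate, object, graph)


def expand(template: str, row: Row) -> str:
    """Replace each {Column} placeholder of an rr:template with the row value."""
    return re.sub(r"\{([^}]+)\}", lambda m: str(row[m.group(1)]), template)


@dataclass
class TriplesMap:
    logical_table: Callable[[], List[Row]]   # 1 logical table: rows to process
    subject_template: str                    # 1 subject map: builds subject IRIs
    subject_class: Optional[str] = None      # optional rr:class
    # N (predicate, object) couples; here each object is a column value
    predicate_objects: List[Tuple[str, str]] = field(default_factory=list)
    graph: Optional[str] = None              # 1 optional graph map

    def generate(self) -> List[Quad]:
        quads: List[Quad] = []
        for row in self.logical_table():
            subject = "<" + expand(self.subject_template, row) + ">"
            if self.subject_class:
                quads.append((subject, "a", self.subject_class, self.graph))
            for predicate, column in self.predicate_objects:
                quads.append((subject, predicate, f'"{row[column]}"', self.graph))
        return quads


if __name__ == "__main__":
    study_map = TriplesMap(
        logical_table=lambda: [{"Id": 10, "Acronym": "CAC2010"}],
        subject_template="http://example.org/study#{Id}",
        subject_class="ex:Study",
        predicate_objects=[("ex:hasName", "Acronym")],
    )
    for quad in study_map.generate():
        print(quad)
```

A graph value of None stands for the default graph.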
11. 11
R2RML – RDB To RDF Mapping Language
Table "Study":
Id | Acronym | Centre_Id
10 | CAC2010 | 4
Table "Centre":
Id | Name    | address
4  | Pasteur | ...
FK: Study.Centre_Id references Centre.Id
R2RML mapping graph:
<#Centre> a rr:TriplesMap;
  rr:logicalTable [ rr:tableName "Centre" ];
  rr:subjectMap [ rr:class ex:Centre;
    rr:template "http://example.org/centre#{Name}"; ].

<#Study> a rr:TriplesMap;
  rr:logicalTable [ rr:tableName "Study" ];
  rr:subjectMap [ rr:class ex:Study;
    rr:template "http://example.org/study#{Id}"; ];
  rr:predicateObjectMap [
    rr:predicate ex:hasName;
    rr:objectMap [ rr:column "Acronym" ]; ];
  rr:predicateObjectMap [
    rr:predicate ex:locatedIn;
    rr:objectMap [
      rr:parentTriplesMap <#Centre>;
      rr:joinCondition [
        rr:child "Centre_Id";
        rr:parent "Id";
      ]; ]; ].
Produced RDF:
<http://example.org/centre#Pasteur> a ex:Centre.
<http://example.org/study#10> a ex:Study;
  ex:hasName "CAC2010";
  ex:locatedIn <http://example.org/centre#Pasteur>.
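To make the join explicit, here is a minimal in-memory sketch of the materialization in this example, assuming the join links Study.Centre_Id to Centre.Id as the FK indicates. The join is done as a naive nested loop for readability; a real R2RML engine would typically push it into the SQL query instead. Helper names are hypothetical:

```python
# Sketch of the example's materialization; rr:joinCondition is evaluated
# as a nested-loop join over in-memory rows.
from typing import Dict, List, Tuple

Row = Dict[str, object]
Triple = Tuple[str, str, str]

STUDY: List[Row] = [{"Id": 10, "Acronym": "CAC2010", "Centre_Id": 4}]
CENTRE: List[Row] = [{"Id": 4, "Name": "Pasteur", "address": "..."}]


def centre_iri(row: Row) -> str:
    # subject map of <#Centre>: template "http://example.org/centre#{Name}"
    return f"<http://example.org/centre#{row['Name']}>"


def study_iri(row: Row) -> str:
    # subject map of <#Study>: template "http://example.org/study#{Id}"
    return f"<http://example.org/study#{row['Id']}>"


def materialize(studies: List[Row], centres: List[Row]) -> List[Triple]:
    triples: List[Triple] = [(centre_iri(c), "a", "ex:Centre") for c in centres]
    for s in studies:
        triples.append((study_iri(s), "a", "ex:Study"))
        triples.append((study_iri(s), "ex:hasName", f'"{s["Acronym"]}"'))
        # parent TriplesMap <#Centre>, child column Centre_Id, parent column Id:
        # the object is the subject IRI of each parent row satisfying the join.
        for c in centres:
            if s["Centre_Id"] == c["Id"]:
                triples.append((study_iri(s), "ex:locatedIn", centre_iri(c)))
    return triples


if __name__ == "__main__":
    for t in materialize(STUDY, CENTRE):
        print(" ".join(t), ".")
```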
12. 12
RML extensions to R2RML
RML mapping graph:
<#Centre>
  rml:logicalSource [
    rml:source "http://example.org/Centres.xml";
    rml:referenceFormulation ql:XPath;
    rml:iterator "/centres/centre";
  ];
  rr:subjectMap [
    rr:class ex:Centre;
    rr:template "http://example.org/centre#{@Id}";
  ];
  rr:predicateObjectMap [
    rr:predicate ex:hasName;
    rr:objectMap [ rml:reference "name" ];
  ].
XML document:
<centres>
  <centre Id="4">
    <name>Pasteur</name>
  </centre>
  <centre Id="6">
    <name>Pontchaillou</name>
  </centre>
</centres>
Advantages:
• Extends to CSV, JSON, XML sources
• Maps several sources simultaneously
Limitations:
• Fixed list of reference formulations
• No distinction between reference formulation and query language
• No RDF collections
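The iterator/reference mechanism of an XPath logical source can be sketched with Python's standard ElementTree: iterate over each centre element, then resolve the subject template and object reference relative to the current element. This is not an RML processor (ElementTree supports only a subset of XPath), and building the IRI from the Id attribute is an assumption for illustration:

```python
# Minimal sketch: iterate like rml:iterator "/centres/centre" and resolve
# references relative to each iterated <centre> element.
import xml.etree.ElementTree as ET
from typing import List, Tuple

Triple = Tuple[str, str, str]

CENTRES_XML = """<centres>
  <centre Id="4"><name>Pasteur</name></centre>
  <centre Id="6"><name>Pontchaillou</name></centre>
</centres>"""


def map_centres(xml_text: str) -> List[Triple]:
    root = ET.fromstring(xml_text)  # the <centres> element
    triples: List[Triple] = []
    # ElementTree paths are relative to the root element, so "centre"
    # selects the same nodes as the absolute path /centres/centre.
    for centre in root.findall("centre"):
        subject = f"<http://example.org/centre#{centre.get('Id')}>"
        triples.append((subject, "a", "ex:Centre"))
        name = centre.findtext("name")  # reference relative to the iteration
        if name is not None:
            triples.append((subject, "ex:hasName", f'"{name}"'))
    return triples


if __name__ == "__main__":
    for t in map_centres(CENTRES_XML):
        print(" ".join(t), ".")
```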
13. 13
Agenda
Previous works
Background: R2RML and RML
Description of xR2RML
Evaluation and perspectives
14. 14
xR2RML - Overall picture
[Figure: the xR2RML translation engine takes an xR2RML mapping description and queries the source database in its native QL; labels "refers to" and "uses" link the mapping and its output to domain ontologies]
Flexible language to describe mappings from the most common types of DB to RDF.
Extends R2RML and leverages RML extensions.
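One way to picture this architecture is an engine that stays unchanged while each database type is plugged in through a connector running queries in its native QL. The sketch below is hypothetical: Connector, Engine, register and translate are illustrative names, not the Morph-xR2RML API, and the "document store" is a toy in-memory stand-in:

```python
# Hypothetical connector-based engine: supporting a new source type means
# adding a Connector subclass, while the mapping side stays the same.
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, object]
Triple = Tuple[str, str, str]


class Connector(ABC):
    @abstractmethod
    def run(self, native_query: str) -> Iterable[Record]:
        """Execute a query written in the source's native QL."""


class InMemoryDocumentConnector(Connector):
    """Toy stand-in for a document store: the 'query' is a collection name."""

    def __init__(self, collections: Dict[str, List[Record]]):
        self.collections = collections

    def run(self, native_query: str) -> Iterable[Record]:
        return iter(self.collections.get(native_query, []))


class Engine:
    def __init__(self) -> None:
        self.connectors: Dict[str, Connector] = {}

    def register(self, source_type: str, connector: Connector) -> None:
        self.connectors[source_type] = connector

    def translate(self, source_type: str, native_query: str,
                  subject_template: str, predicate: str, key: str) -> List[Triple]:
        connector = self.connectors[source_type]  # KeyError: unsupported source
        return [(subject_template.format(**rec), predicate, f'"{rec[key]}"')
                for rec in connector.run(native_query)]


if __name__ == "__main__":
    engine = Engine()
    engine.register("docstore", InMemoryDocumentConnector(
        {"staff": [{"id": 1, "name": "Bob"}]}))
    print(engine.translate("docstore", "staff",
                           "<http://example.org/staff#{id}>", "ex:name", "name"))
```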
24. 24
xR2RML: content with mixed formats
Data with mixed content: in relational table "STAFF", column "Name" contains JSON data:
... Name ...
... {
  "FirstName": "Bob",
  "LastName": "Smith"
}
...
xR2RML mapping graph:
<#Centre>
  xrr:logicalSource [
    xrr:sourceName "STAFF";
  ];
  ...
  rr:predicateObjectMap [
    rr:predicate ex:first-name;
    rr:objectMap [
      xrr:reference "Column(Name)/JSONPath($.FirstName)" ];
  ].
Syntax path constructors:
Data format | Syntax path constructor
Row         | Column(), CSV(), TSV()
XML         | XPath()
JSON        | JSONPath()
...         | ...
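A mixed-syntax path such as "Column(Name)/JSONPath($.FirstName)" is evaluated step by step: the first constructor extracts the cell value, the next one parses that value in its own syntax. A minimal sketch of that idea, handling only Column() and a tiny JSONPath subset (dotted field access such as "$.a.b", no arrays or filters); the function names are hypothetical and arguments are assumed not to contain parentheses:

```python
# Minimal sketch: evaluate a mixed-syntax path one constructor at a time.
import json
import re
from typing import Dict

STEP = re.compile(r"(\w+)\(([^()]*)\)")  # constructor name + its argument


def eval_simple_jsonpath(doc: object, expr: str) -> object:
    if not expr.startswith("$"):
        raise ValueError(f"not a JSONPath expression: {expr}")
    value = doc
    for key in filter(None, expr[1:].split(".")):
        value = value[key]
    return value


def evaluate(path: str, row: Dict[str, object]) -> object:
    value: object = row
    for constructor, arg in STEP.findall(path):
        if constructor == "Column":
            value = row[arg]
        elif constructor == "JSONPath":
            # the previous step yields a JSON string: parse it first
            doc = json.loads(value) if isinstance(value, str) else value
            value = eval_simple_jsonpath(doc, arg)
        else:
            raise NotImplementedError(constructor)
    return value


if __name__ == "__main__":
    staff_row = {"Name": '{"FirstName": "Bob", "LastName": "Smith"}'}
    print(evaluate("Column(Name)/JSONPath($.FirstName)", staff_row))  # Bob
```

Adding a format (e.g. XPath()) would mean adding one more branch, which mirrors how the table above extends with new constructors.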
25. 25
Agenda
Previous works
Background: R2RML and RML
Description of xR2RML main features
Evaluation and perspectives
26. 26
Use case in Digital Humanities
Use case: study the history and transmission of zoological knowledge along historical periods
TAXREF taxonomical reference
• Designed to support studies in Conservation Biology, enriched with bioarchaeological taxa
• Maintained by the French National Museum of Natural History
• ~450,000 terms, CSV/JSON/XML
27. 27
Use case in Digital Humanities
Ongoing work [2]: construction of a SKOS (1) thesaurus based on TAXREF
• Import of TAXREF/JSON into MongoDB
• Use of Morph-xR2RML, the prototype implementation of xR2RML, to convert the MongoDB data to RDF
• Alignments with existing, widely adopted ontologies (e.g. NCBI Taxonomic Classification, GeoNames...)
  • Static alignments at mapping design time
  • Using automatic alignment methods
(1) SKOS: Simple Knowledge Organization System, a W3C RDF-based standard to represent controlled vocabularies, taxonomies and thesauri. It bridges the gap between existing KOS and the Semantic Web and Linked Data.
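To illustrate what the target SKOS structure looks like, here is a hypothetical sketch turning one TAXREF-like JSON record into SKOS triples: each taxon becomes a skos:Concept with a skos:prefLabel, and the parent taxon becomes skos:broader. The field names (id, scientificName, parentId) and the base IRI are assumptions, not the actual TAXREF schema, and this is not the xR2RML mapping used in [2]:

```python
# Hypothetical sketch: one taxon record -> SKOS concept triples.
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def taxon_to_skos(doc: Dict[str, object], base: str) -> List[Triple]:
    concept = f"<{base}{doc['id']}>"
    triples: List[Triple] = [
        (concept, f"<{RDF_TYPE}>", f"<{SKOS}Concept>"),
        (concept, f"<{SKOS}prefLabel>", f'"{doc["scientificName"]}"'),
    ]
    parent = doc.get("parentId")
    if parent is not None:  # the taxonomic hierarchy becomes skos:broader links
        triples.append((concept, f"<{SKOS}broader>", f"<{base}{parent}>"))
    return triples


if __name__ == "__main__":
    record = {"id": 2, "scientificName": "Canis lupus", "parentId": 1}
    for t in taxon_to_skos(record, "http://example.org/taxon/"):
        print(" ".join(t), ".")
```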
28. 28
Perspectives
Ongoing discussion about the use of xR2RML to support ecology and agronomic studies
• Large phenotype databases
Consider the query rewriting approach to support large datasets
How to write xR2RML mappings?
• Automatic xR2RML mapping generation from data schemas (XSD/DTD, JSON schema, JSON-LD...)
• Schema mapping
• Schema discovery
29. 29
Conclusions
The data deluge keeps growing ever faster
Data are stored in many kinds of DBs
xR2RML:
• Flexible language to map the most common types of database to RDF
• Supports various data models and query languages
• Rich features: RDF collections/containers, joins, content with mixed formats
Applied to the construction of a SKOS thesaurus of TAXREF, a taxonomical reference
30. 30
Contacts:
Franck Michel
Johan Montagnat
Catherine Faron-Zucker
[2] C. Callou, F. Michel, C. Faron-Zucker, C. Martin, and J. Montagnat. Towards a Shared Reference Thesaurus for Studies on History of Zoology, Archaeozoology and Conservation Biology. In SW4SH workshop, ESWC’15.
[3] F. Michel, L. Djimenou, C. Faron-Zucker, and J. Montagnat. xR2RML: Non-Relational Databases to RDF Mapping Language. Research report ISRN I3S/RR 2014-04-FR. http://hal.archives-ouvertes.fr/hal-01066663
https://github.com/frmichel/morph-xr2rml/