This document discusses mapping data from relational databases to RDF. It provides an overview of the direct mapping approach and the R2RML standard for customizable mapping. Direct mapping generates URIs and RDF triples automatically based on the relational schema. R2RML allows customizing the mapping through a mapping language. The document also covers ETL systems for extracting relational data and loading it into triplestores as RDF, as well as use cases involving mapping biological and music databases to Linked Data.
What is RDB2RDF?
2
IDNAME AGE CID
1 Alice 25 100
2 Bob NULL 100
Person
CID NAME
100 Austin
200 Madrid
City
<Person/1>
<City/100>
Alice 25
Austin
<Person/2>
Alice
<City/200> Madrid
foaf:namefoaf:name foaf:age
foaf:name
foaf:name
foaf:based_near
Context
RDF Data Management:
• Relational Database to RDF (RDB2RDF)
– Wrapper Systems
– Extract-Transform-Load (ETL)
• Triplestores
– RDBMS-backed Triplestores
– Native Triplestores
– NoSQL Triplestores
Outline
• Scenarios
• W3C RDB2RDF Standards
– Direct Mapping
– R2RML
• ETL and Wrapper Systems
• Use Cases
– RNA Databases
– MusicBrainz
W3C RDB2RDF Standards
• Standards to map relational data to RDF
• A Direct Mapping of Relational Data to RDF
– Default automatic mapping of relational data to RDF
• R2RML: RDB to RDF Mapping Language
– Customizable language to map relational data to RDF
RDB2RDF
Table Triple
Person
ID (pk)  NAME   AGE
1        Alice  25
2        Bob    NULL
<http://www.ex.com/Person/ID=1> rdf:type <http://www.ex.com/Person> .
IRI pattern: Base IRI + “Table Name” / “PK attr” = “PK value”
Note: If there is no PK, then a fresh blank node is generated for every row.
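The IRI pattern and the blank-node fallback above can be sketched in a few lines of Python. This is a minimal illustration, not an implementation of the standard: the function names, the blank-node label scheme, and the use of `;` to join composite keys are assumptions made for the example.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def row_node(base, table, pk_columns, row, row_index):
    """Return the subject for one row: an IRI if the table has a PK, else a blank node."""
    if not pk_columns:
        # No primary key: one fresh blank node per row (label scheme is an assumption).
        return f"_:{table}_row{row_index}"
    # "PK attr"="PK value" pairs, percent-encoded; composite keys joined with ';'.
    key = ";".join(
        f"{quote(col, safe='')}={quote(str(row[col]), safe='')}" for col in pk_columns
    )
    return f"<{base}{quote(table, safe='')}/{key}>"


def table_triple(base, table, subject):
    """Return the rdf:type triple that ties a row to its table."""
    return f"{subject} <{RDF_TYPE}> <{base}{quote(table, safe='')}> ."


alice = {"ID": 1, "NAME": "Alice", "AGE": 25}
subject = row_node("http://www.ex.com/", "Person", ["ID"], alice, 0)
print(table_triple("http://www.ex.com/", "Person", subject))
```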
Reference Triples
Person
ID (pk)  NAME   AGE   CID (fk)
1        Alice  25    100
2        Bob    NULL  200
City
CID (pk)  TITLE
100       Austin
200       Madrid
<http://www.ex.com/Person/ID=1> <http://www.ex.com/Person#ref-CID> <http://www.ex.com/City/CID=100> .
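A reference triple links the row node of the referencing row to the row node of the referenced row. The sketch below reproduces the triple shown for Alice; it assumes the foreign key points at the primary key of the referenced table, and the helper names are made up for the example.

```python
from urllib.parse import quote


def row_iri(base, table, key):
    """Build <base Table/col=value;...> from a dict of key columns to values."""
    pairs = ";".join(
        f"{quote(col, safe='')}={quote(str(val), safe='')}" for col, val in key.items()
    )
    return f"<{base}{quote(table, safe='')}/{pairs}>"


def reference_triple(base, table, pk, fk, ref_table, row):
    """Return the reference triple for one foreign key, or None if an FK column is NULL.

    pk: primary-key column names of `table`
    fk: dict mapping FK column in `table` -> key column in `ref_table`
    """
    if any(row[col] is None for col in fk):
        return None  # a NULL foreign key yields no triple
    subject = row_iri(base, table, {col: row[col] for col in pk})
    fk_columns = ";".join(quote(col, safe="") for col in fk)
    predicate = f"<{base}{quote(table, safe='')}#ref-{fk_columns}>"
    obj = row_iri(base, ref_table, {ref: row[col] for col, ref in fk.items()})
    return f"{subject} {predicate} {obj} ."


alice = {"ID": 1, "NAME": "Alice", "AGE": 25, "CID": 100}
print(reference_triple("http://www.ex.com/", "Person", ["ID"], {"CID": "CID"}, "City", alice))
```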
Direct Mapping Result
Person
ID  NAME   AGE   CID
1   Alice  25    100
2   Bob    NULL  100
City
CID  NAME
100  Austin
200  Madrid
Resulting triples:
<Person/ID=1> <Person#NAME> "Alice" .
<Person/ID=1> <Person#AGE> 25 .
<Person/ID=1> <Person#ref-CID> <City/CID=100> .
<Person/ID=2> <Person#NAME> "Bob" .
<Person/ID=2> <Person#ref-CID> <City/CID=100> .
<City/CID=100> <City#NAME> "Austin" .
<City/CID=200> <City#NAME> "Madrid" .
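The literal triples in this result follow one rule: every non-NULL column value becomes a triple with predicate `<Table#COLUMN>`, and a NULL produces nothing. A minimal sketch, with a made-up function name and triples kept as Python tuples rather than serialized RDF:

```python
from urllib.parse import quote


def literal_triples(base, table, pk, row):
    """Return (subject, predicate, value) tuples for every non-NULL column of a row."""
    key = ";".join(f"{quote(c, safe='')}={quote(str(row[c]), safe='')}" for c in pk)
    subject = f"{base}{quote(table, safe='')}/{key}"
    triples = []
    for column, value in row.items():
        if value is None:
            continue  # NULL: no triple is generated for this column
        predicate = f"{base}{quote(table, safe='')}#{quote(column, safe='')}"
        triples.append((subject, predicate, value))
    return triples


bob = {"ID": 2, "NAME": "Bob", "AGE": None, "CID": 100}
for triple in literal_triples("http://www.ex.com/", "Person", ["ID"], bob):
    print(triple)
```

Bob's row yields triples for ID, NAME and CID but none for AGE, which is why the graph has no `<Person#AGE>` edge for `<Person/ID=2>`.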
Summary: Direct Mapping
• Default and Automatic Mapping
• URIs are automatically generated
– <table>
– <table#attribute>
– <table#ref-attribute>
– <table/pkAttr=pkValue>
• The resulting RDF mirrors the relational schema
• RDF can be transformed by SPARQL CONSTRUCT
– The transformed RDF follows the structure and ontology of the mapping author’s choice
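To make the last point concrete, the sketch below imitates in plain Python what a CONSTRUCT query would do to the direct-mapping output: each schema-derived predicate is rewritten to a vocabulary term. It is a stand-in for the effect, not SPARQL; the rule table is an assumption chosen to match the foaf terms used in the opening example.

```python
# Hypothetical rule table: direct-mapping predicate -> target vocabulary term.
PREDICATE_MAP = {
    "<Person#NAME>": "foaf:name",
    "<Person#AGE>": "foaf:age",
    "<Person#ref-CID>": "foaf:based_near",
    "<City#NAME>": "foaf:name",
}


def construct(triples, predicate_map=PREDICATE_MAP):
    """Emit one rewritten triple per input triple whose predicate has a rule."""
    return [(s, predicate_map[p], o) for s, p, o in triples if p in predicate_map]


direct = [
    ("<Person/ID=1>", "<Person#NAME>", "Alice"),
    ("<Person/ID=1>", "<Person#AGE>", 25),
    ("<Person/ID=1>", "<Person#ref-CID>", "<City/CID=100>"),
    ("<Person/ID=1>", "<Person#CID>", 100),  # no rule: dropped from the output
    ("<City/CID=100>", "<City#NAME>", "Austin"),
]
for triple in construct(direct):
    print(triple)
```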
What else is missing?
• Relational Schema to OWL is *not* in the W3C standard
• Many-to-Many relationships (binary tables)
• “Ugly” IRIs
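The many-to-many point can be illustrated as follows. Direct Mapping turns each row of a binary link table into its own node with two reference triples; a mapping author usually wants a single triple between the two linked entities instead. The sketch assumes a hypothetical link table with columns PID and CID and a predicate chosen by the mapping author; percent-encoding is omitted for brevity.

```python
def is_binary_link_table(columns, foreign_keys):
    """True if the table has exactly two columns and each is a single-column foreign key."""
    return len(columns) == 2 and set(columns) == set(foreign_keys)


def link_triple(base, predicate, foreign_keys, row):
    """Collapse one row of a binary link table into a single triple.

    foreign_keys: dict, column -> (referenced table, referenced key column);
    the first entry supplies the subject, the second the object.
    """
    (c1, (t1, k1)), (c2, (t2, k2)) = foreign_keys.items()
    subject = f"<{base}{t1}/{k1}={row[c1]}>"
    obj = f"<{base}{t2}/{k2}={row[c2]}>"
    return f"{subject} <{predicate}> {obj} ."


# Hypothetical link table with columns (PID, CID).
fks = {"PID": ("Person", "ID"), "CID": ("City", "CID")}
if is_binary_link_table(["PID", "CID"], fks):
    print(link_triple("http://www.ex.com/", "http://www.ex.com/visited", fks, {"PID": 1, "CID": 200}))
```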
Create R2RML
• Input
– Knowledge of the database (schema and data)
– Knowledge of the domain ontologies
– Knowledge of mappings
• Output
– R2RML file
• Direct Mapping helps to “bootstrap” the mapping
W3C RDB2RDF
• Task: Integrate data from relational DBMS with Linked Data
• Approach: map from relational schema to semantic vocabulary with R2RML
• Publishing: two alternatives
– Translate SPARQL into SQL on the fly
– Batch transform data into RDF, index and provide SPARQL access in a triplestore
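The second publishing alternative (batch transform, then load) can be sketched end to end with the Person and City tables from the opening example. This is a toy under stated assumptions: SQLite stands in for the relational DBMS, a Python set stands in for the triplestore, and the function names and IRI layout are made up for the example.

```python
import sqlite3


def example_db():
    """Create an in-memory database holding the Person and City example tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE City (CID INTEGER PRIMARY KEY, NAME TEXT);
        CREATE TABLE Person (ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER,
                             CID INTEGER REFERENCES City(CID));
        INSERT INTO City VALUES (100, 'Austin'), (200, 'Madrid');
        INSERT INTO Person VALUES (1, 'Alice', 25, 100), (2, 'Bob', NULL, 100);
        """
    )
    return conn


def extract(conn):
    """Extract: read the rows to be published."""
    return conn.execute("SELECT ID, NAME, CID FROM Person ORDER BY ID").fetchall()


def transform(rows, base="http://www.ex.com/"):
    """Transform: turn each row into triples using a fixed vocabulary mapping."""
    triples = set()
    for pid, name, cid in rows:
        subject = f"{base}Person/{pid}"
        triples.add((subject, "foaf:name", name))
        if cid is not None:
            triples.add((subject, "foaf:based_near", f"{base}City/{cid}"))
    return triples


def load(store, triples):
    """Load: add the triples to the (toy) triplestore."""
    store.update(triples)


store = set()
load(store, transform(extract(example_db())))
print(sorted(store))
```

In the first alternative nothing is materialized: a wrapper rewrites each incoming SPARQL query into SQL against the live database, so results always reflect the current data.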
[Figure: Linked Data publishing pipeline. Labels: Relational DBMS, R2RML Engine, Data acquisition, Vocabulary Mapping, Interlinking, Cleansing, Integrated Data in Triplestore, SPARQL Endpoint, Publishing, LD Dataset Access.]
MusicBrainz Next Gen Schema
• artist: as pre-NGS, but further attributes
• artist_credit: allows joint credit
• release_group: cf. ‘album’, versus:
• release
• medium
• track
• tracklist
• work
• recording
https://wiki.musicbrainz.org/Next_Generation_Schema
Music Ontology
• MusicArtist
– ArtistEvent, member_of
• SignalGroup
– ‘Album’ as per Release_Group
• Release
– ReleaseEvent
• Record
• Track
• Work
• Composition
http://musicontology.com/
Scale
• MusicBrainz RDF derived via R2RML:
    lb:artist_member a rr:TriplesMap ;
        rr:logicalTable [ rr:sqlQuery """
            SELECT a1.gid, a2.gid AS band
            FROM artist a1
            INNER JOIN l_artist_artist ON a1.id = l_artist_artist.entity0
            INNER JOIN link ON l_artist_artist.link = link.id
            INNER JOIN link_type ON link.link_type = link_type.id
            INNER JOIN artist a2 ON l_artist_artist.entity1 = a2.id
            WHERE link_type.gid = '5be4c609-9afa-4ea0-910b-12ffb71e3821'
        """ ] ;
        rr:subjectMap [ rr:template "http://musicbrainz.org/artist/{gid}#_" ] ;
        rr:predicateObjectMap [
            rr:predicate mo:member_of ;
            rr:objectMap [ rr:template "http://musicbrainz.org/artist/{band}#_" ;
                           rr:termType rr:IRI ]
        ] .
• 300M triples
R2RML Property Mapping
• Mapping columns to properties can be easy:
    lb:artist_name a rr:TriplesMap ;
        rr:logicalTable [ rr:sqlQuery """
            SELECT artist.gid, artist_name.name
            FROM artist
            INNER JOIN artist_name ON artist.name = artist_name.id
        """ ] ;
        rr:subjectMap [ rr:template "http://musicbrainz.org/artist/{gid}#_" ] ;
        rr:predicateObjectMap [
            rr:predicate foaf:name ;
            rr:objectMap [ rr:column "name" ]
        ] .
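To show what an R2RML engine does with this mapping for a single result row, here is a rough Python sketch of `rr:template` expansion and `rr:column` lookup. It is a simplification under stated assumptions: the function names are made up, percent-encoding via `urllib.parse.quote` only approximates the IRI-safe encoding R2RML prescribes, escaped braces are not handled, and the row is invented sample data.

```python
import re
from urllib.parse import quote

PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def expand_template(template, row):
    """Replace each {column} with the row's percent-encoded value; None if any value is NULL."""
    columns = PLACEHOLDER.findall(template)
    if any(row[col] is None for col in columns):
        return None  # a NULL in a referenced column means no term is generated
    return PLACEHOLDER.sub(lambda m: quote(str(row[m.group(1)]), safe=""), template)


def artist_name_triple(row):
    """Apply the lb:artist_name mapping to one row of its logical table."""
    subject = expand_template("http://musicbrainz.org/artist/{gid}#_", row)
    name = row["name"]  # rr:column "name" produces a literal from the column value
    if subject is None or name is None:
        return None
    return (subject, "foaf:name", name)


# Invented sample row, not real MusicBrainz data.
row = {"gid": "00000000-0000-0000-0000-000000000001", "name": "Example Artist"}
print(artist_name_triple(row))
```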
NGS Advanced Relations
• Major entities (Artist, Release Group, Track, etc.) plus URL are paired (e.g. l_artist_artist)
• Each pairing of instances refers to a Link
• Links have types (cf. RDF properties) and attributes
http://wiki.musicbrainz.org/Advanced_Relationship
Advanced Relations Mapping
• Mapping advanced relationships (SQL joins):
    lb:artist_member a rr:TriplesMap ;
        rr:logicalTable [ rr:sqlQuery """
            SELECT a1.gid, a2.gid AS band
            FROM artist a1
            INNER JOIN l_artist_artist ON a1.id = l_artist_artist.entity0
            INNER JOIN link ON l_artist_artist.link = link.id
            INNER JOIN link_type ON link.link_type = link_type.id
            INNER JOIN artist a2 ON l_artist_artist.entity1 = a2.id
            WHERE link_type.gid = '5be4c609-9afa-4ea0-910b-12ffb71e3821'
        """ ] ;
        rr:subjectMap [ rr:template "http://musicbrainz.org/artist/{gid}#_" ] ;
        rr:predicateObjectMap [
            rr:predicate mo:member_of ;
            rr:objectMap [ rr:template "http://musicbrainz.org/artist/{band}#_" ;
                           rr:termType rr:IRI ]
        ] .
Advanced Relations Mapping
• Mapping advanced relationships (SQL joins):
    lb:artist_dbpedia a rr:TriplesMap ;
        rr:logicalTable [ rr:sqlQuery """
            SELECT artist.gid,
                   REPLACE(REPLACE(url.url, 'wikipedia.org/wiki', 'dbpedia.org/resource'),
                           'http://en.', 'http://') AS url
            FROM artist
            INNER JOIN l_artist_url ON artist.id = l_artist_url.entity0
            INNER JOIN link ON l_artist_url.link = link.id
            INNER JOIN link_type ON link.link_type = link_type.id
            INNER JOIN url ON l_artist_url.entity1 = url.id
            WHERE link_type.gid = '29651736-fa6d-48e4-aadc-a557c6add1cb'
              AND url.url SIMILAR TO 'http://(de|el|en|es|ko|pl|pt).wikipedia.org/wiki/%'
        """ ] ;
        rr:subjectMap lb:sm_artist ;
        rr:predicateObjectMap [
            rr:predicate owl:sameAs ;
            rr:objectMap [ rr:column "url" ; rr:termType rr:IRI ]
        ] .
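The URL rewriting inside this query is easier to follow in isolation. The Python sketch below mirrors the SIMILAR TO filter and the two nested REPLACE calls; the function name is made up, and the regular expression is a hand translation of the SQL pattern (in SIMILAR TO the dot is a literal character, so it is escaped here).

```python
import re

# Same language whitelist as the SIMILAR TO pattern; '%' becomes '.*'.
WIKIPEDIA_URL = re.compile(
    r"http://(de|el|en|es|ko|pl|pt)\.wikipedia\.org/wiki/.*", re.DOTALL
)


def dbpedia_iri(url):
    """Map a Wikipedia article URL to the matching DBpedia resource IRI, else None."""
    if not WIKIPEDIA_URL.fullmatch(url):
        return None  # filtered out by the WHERE clause
    # Inner REPLACE: swap the Wikipedia path for the DBpedia one.
    url = url.replace("wikipedia.org/wiki", "dbpedia.org/resource")
    # Outer REPLACE: English DBpedia has no language subdomain.
    return url.replace("http://en.", "http://")


print(dbpedia_iri("http://en.wikipedia.org/wiki/The_Beatles"))
print(dbpedia_iri("http://de.wikipedia.org/wiki/The_Beatles"))
```

Only the English prefix is stripped, so the other whitelisted languages map to their localized DBpedia hosts.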
SPARQL Example
• SPARQL versus SQL
ASK { dbp:Paul_McCartney mo:member_of dbp:The_Beatles }
SELECT …
INNER JOIN
INNER JOIN
INNER JOIN
INNER JOIN
INNER JOIN
INNER JOIN
INNER JOIN
INNER JOIN
INNER JOIN
INNER JOIN
INNER JOIN
INNER JOIN
WHERE AND … AND … AND … AND …
Upcoming Tutorials
• ESWC – Montpellier, France
– May 27, 2013
• SemTechBiz – San Francisco, USA
– June 2, 2013
• More info: www.rdb2rdf.org
For exercises, quiz and further material visit our website:
http://www.euclid-project.eu (eBook, Course)
Other channels: @euclid_project, EUCLID project, EUCLIDproject