Relational Database to RDF
(RDB2RDF)
Juan Sequeda
Barry Norton
What is RDB2RDF?
Person
ID  NAME   AGE   CID
1   Alice  25    100
2   Bob    NULL  100

City
CID  NAME
100  Austin
200  Madrid

[Figure: the Person and City tables as an RDF graph with nodes <Person/1>, <Person/2>, <City/100>, <City/200> and edges foaf:name, foaf:age, foaf:based_near]
Context
• RDF Data Management
– Relational Database to RDF (RDB2RDF): Wrapper Systems, Extract-Transform-Load (ETL)
– Triplestores: RDBMS-backed Triplestores, Native Triplestores, NoSQL Triplestores
Outline
• Scenarios
• W3C RDB2RDF Standards
– Direct Mapping
– R2RML
• ETL and Wrapper Systems
• Use Cases
– RNA Databases
– Musicbrainz
Ideal Scenario: Automatic Mapping
[Figure: components Relational Database, Direct Mapping as Ontology, Source Putative Ontology, Automatic Mapping, Domain Ontologies, Refined R2RML, and an RDB2RDF Wrapper answering SPARQL with RDF]
Semi-automatic Mapping
[Figure: components Relational Database, Direct Mapping as Ontology, Source Putative Ontology, Semi-Automatic Mapping, Domain Ontologies, Refined R2RML, and an RDB2RDF Wrapper answering SPARQL with RDF]
R2RML
[Figure: Extract Transform Load with components Relational Database, R2RML File, Domain Ontologies (e.g. FOAF), R2RML Mapping Engine, and a Triplestore queried with SPARQL]
Direct Mapping
[Figure: Extract Transform Load with components Relational Database, Direct Mapping Engine, and a Triplestore queried with SPARQL]
W3C RDB2RDF Standards
• Standards to map relational data to RDF
• A Direct Mapping of Relational Data to RDF
– Default automatic mapping of relational data to RDF
• R2RML: RDB to RDF Mapping Language
– Customizable language to map relational data to RDF
Direct Mapping
[Figure: Relational Database → Direct Mapping Engine → RDF]
W3C Direct Mapping
• Input:
– Database (Schema and Data)
– Primary Keys
– Foreign Keys
• Output
– RDF graph
Table Triple
Person
ID (pk)  NAME   AGE
1        Alice  25
2        Bob    NULL

<http://www.ex.com/Person/ID=1> rdf:type <http://www.ex.com/Person> .
Row IRI pattern: Base IRI + “Table Name”/“PK attr”=“PK value”
Note: If there is no PK, then a fresh blank node is generated for every row.
Literal Triples
Person
ID (pk)  NAME   AGE
1        Alice  25
2        Bob    NULL

<http://www.ex.com/Person/ID=1> <http://www.ex.com/Person#NAME> “Alice” .
Predicate IRI pattern: Base IRI + “Table Name”#“Attribute”
Reference Triples
Person
ID (pk)  NAME   AGE   CID (fk)
1        Alice  25    100
2        Bob    NULL  200

City
CID (pk)  TITLE
100       Austin
200       Madrid

<http://www.ex.com/Person/ID=1> <http://www.ex.com/Person#ref-CID> <http://www.ex.com/City/CID=100> .
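The three kinds of triples above (table, literal, reference) can be sketched in a few lines of Python. This is a minimal illustration rather than a conforming Direct Mapping implementation: it assumes a single-column primary key, foreign keys that point at the referenced table's key column, and it ignores datatypes. The function names `row_node` and `direct_map` and the blank node label format are made up for this sketch.

```python
from urllib.parse import quote

BASE = "http://www.ex.com/"  # base IRI used on the slides


def row_node(table, pk, row, row_number):
    """Subject for one row: an IRI built from the primary key, or a
    blank node label when the table has no primary key (pk is None)."""
    if pk is None:
        # Hypothetical label format; any fresh label per row would do.
        return f"_:{table}{row_number}"
    return f"<{BASE}{table}/{pk}={quote(str(row[pk]), safe='')}>"


def direct_map(table, pk, fks, rows):
    """Return (subject, predicate, object) tuples for every row.

    fks maps a foreign key column to (referenced table, referenced key column).
    """
    triples = []
    for number, row in enumerate(rows, start=1):
        subject = row_node(table, pk, row, number)
        # Table triple: the row is an instance of its table.
        triples.append((subject, "rdf:type", f"<{BASE}{table}>"))
        # Literal triples: one per non-NULL column value.
        for column, value in row.items():
            if value is not None:
                triples.append((subject, f"<{BASE}{table}#{column}>", value))
        # Reference triples: one per non-NULL foreign key value.
        for column, (ref_table, ref_key) in fks.items():
            if row[column] is not None:
                target = row_node(ref_table, ref_key, {ref_key: row[column]}, number)
                triples.append((subject, f"<{BASE}{table}#ref-{column}>", target))
    return triples


person = [
    {"ID": 1, "NAME": "Alice", "AGE": 25, "CID": 100},
    {"ID": 2, "NAME": "Bob", "AGE": None, "CID": 200},
]
triples = direct_map("Person", "ID", {"CID": ("City", "CID")}, person)
for triple in triples:
    print(triple)
```

Bob's NULL age produces no triple at all, which is why the Direct Mapping output has no AGE edge for him.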
Direct Mapping Result
Person
ID  NAME   AGE   CID
1   Alice  25    100
2   Bob    NULL  100

City
CID  NAME
100  Austin
200  Madrid

[Figure: the resulting RDF graph with nodes <Person/ID=1>, <Person/ID=2>, <City/CID=100>, <City/CID=200>, literal values such as Alice, 25, Austin, Madrid, and edges <Person#NAME>, <Person#AGE>, <Person#ref-CID>]
Summary: Direct Mapping
• Default and Automatic Mapping
• URIs are automatically generated
– <table>
– <table#attribute>
– <table#ref-attribute>
– <table/pkAttr=pkValue>
• RDF represents the same relational schema
• RDF can be transformed by SPARQL CONSTRUCT
– RDF represents the structure and ontology of mapping author’s choice
What else is missing?
• Relational Schema to OWL is *not* in the W3C standard
• Many-to-Many relationships (binary tables)
• “Ugly” IRIs
R2RML
[Figure: components Relational Database, R2RML File, OWL Ontologies (e.g. FOAF), and R2RML Mapping Engine; output RDF]
Create R2RML
• Input
– Knowledge of the database (schema and data)
– Knowledge of the domain ontologies
– Knowledge of mappings
• Output
– R2RML file
• Direct Mapping helps to “bootstrap”
Direct Mapping as R2RML
@prefix rr: <http://www.w3.org/ns/r2rml#> .
<TriplesMap1>
    a rr:TriplesMap ;
    rr:logicalTable [ rr:tableName "Person" ] ;
    rr:subjectMap [
        rr:template "http://www.ex.com/Person/ID={ID}" ;
        rr:class <http://www.ex.com/Person>
    ] ;
    rr:predicateObjectMap [
        rr:predicate <http://www.ex.com/Person#NAME> ;
        rr:objectMap [ rr:column "NAME" ]
    ] .
The same mapping, part by part:
Subject URI Template
• rr:template "http://www.ex.com/Person/ID={ID}" generates the Subject URI
• rr:class <http://www.ex.com/Person> generates <Subject URI> rdf:type <Class URI>
Predicate URI Constant
• rr:predicate <http://www.ex.com/Person#NAME> supplies the Predicate URI as a constant
Object Column Value
• rr:objectMap [ rr:column "NAME" ] takes the Object Literal from the NAME column
“Cool” URIs
<http://www.ex.com/Person/ID=1>  →  <http://www.ex.com/Person/1>
<http://www.ex.com/Person#NAME>  →  foaf:name
<http://www.ex.com/Person>       →  foaf:Person
Customized R2RML
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<TriplesMap1>
    a rr:TriplesMap ;
    rr:logicalTable [ rr:tableName "Person" ] ;
    rr:subjectMap [
        rr:template "http://www.ex.com/Person/{ID}" ;
        rr:class foaf:Person
    ] ;
    rr:predicateObjectMap [
        rr:predicate foaf:name ;
        rr:objectMap [ rr:column "NAME" ]
    ] .
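To make the role of rr:template concrete, here is a small Python sketch of how a mapping engine might expand a template against one row. It is a simplification under stated assumptions: values are percent-encoded with `urllib.parse.quote`, which is stricter than R2RML's IRI-safe encoding for non-ASCII characters, and backslash-escaped braces in templates are not handled. `expand_template` is a name invented for this sketch.

```python
import re
from urllib.parse import quote

PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def expand_template(template, row):
    """Fill each {COLUMN} placeholder with the row's percent-encoded value.

    Returns None when a referenced column is NULL, since no term is
    generated in that case.
    """
    columns = PLACEHOLDER.findall(template)
    if any(row.get(column) is None for column in columns):
        return None
    return PLACEHOLDER.sub(
        lambda match: quote(str(row[match.group(1)]), safe=""), template
    )


row = {"ID": 1, "NAME": "Alice"}
subject = expand_template("http://www.ex.com/Person/{ID}", row)
print(f"<{subject}> rdf:type foaf:Person .")
print(f'<{subject}> foaf:name "{row["NAME"]}" .')
```

The two printed lines correspond to the subject map (template plus rr:class) and the single predicate-object map of the customized mapping.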
<TriplesMap1>
    a rr:TriplesMap ;
    rr:logicalTable [ rr:tableName "Person" ] ;
    rr:subjectMap [ rr:template "http://www.ex.com/Person/{ID}" ;
                    rr:class foaf:Person ] ;
    rr:predicateObjectMap [
        rr:predicate foaf:based_near ;
        # Referencing object map: the object is the subject that
        # <TriplesMap2> generates for the City row with a matching CID.
        rr:objectMap [
            rr:parentTriplesMap <TriplesMap2> ;
            rr:joinCondition [
                rr:child "CID" ;
                rr:parent "CID"
            ]
        ]
    ] .
<TriplesMap2>
    a rr:TriplesMap ;
    rr:logicalTable [ rr:tableName "City" ] ;
    rr:subjectMap [ rr:template "http://ex.com/City/{CID}" ;
                    rr:class ex:City ] ;
    rr:predicateObjectMap [
        rr:predicate foaf:name ;
        rr:objectMap [ rr:column "TITLE" ]
    ] .
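The effect of rr:parentTriplesMap with rr:joinCondition can be sketched as an equi-join between the child and parent rows. The Python below is an illustration only: `referencing_triples` is a hypothetical helper, templates are expanded with plain `str.format_map` (no percent-encoding), and the sample rows follow the Person and City tables from the Reference Triples slide.

```python
from collections import defaultdict


def referencing_triples(child_rows, parent_rows, child_col, parent_col,
                        subject_template, parent_subject_template, predicate):
    """Emit one triple per (child, parent) pair whose join columns are equal."""
    # Index parent rows by their join column; NULL never joins.
    parents = defaultdict(list)
    for parent in parent_rows:
        if parent[parent_col] is not None:
            parents[parent[parent_col]].append(parent)

    triples = []
    for child in child_rows:
        for parent in parents.get(child[child_col], []):
            triples.append((
                subject_template.format_map(child),
                predicate,
                parent_subject_template.format_map(parent),
            ))
    return triples


person = [
    {"ID": 1, "NAME": "Alice", "CID": 100},
    {"ID": 2, "NAME": "Bob", "CID": 200},
    {"ID": 3, "NAME": "Carol", "CID": None},  # made-up row: no city, so no triple
]
city = [{"CID": 100, "TITLE": "Austin"}, {"CID": 200, "TITLE": "Madrid"}]

based_near = referencing_triples(
    person, city, "CID", "CID",
    "http://www.ex.com/Person/{ID}", "http://ex.com/City/{CID}",
    "foaf:based_near",
)
for triple in based_near:
    print(triple)
```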
R2RML Views
SELECT ID, NAME FROM Person WHERE GENDER = 'F'
ex:Person1 rdf:type ex:Woman .
ex:Person1 foaf:name "Alice" .
R2RML View
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<TriplesMap1>
    a rr:TriplesMap ;
    rr:logicalTable [ rr:sqlQuery
        """SELECT ID, NAME FROM Person WHERE gender = 'F'""" ] ;
    rr:subjectMap [
        rr:template "http://www.ex.com/Person/{ID}" ;
        rr:class <http://www.ex.com/Woman>
    ] ;
    rr:predicateObjectMap [
        rr:predicate foaf:name ;
        rr:objectMap [ rr:column "NAME" ]
    ] .
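What an engine does with an R2RML view can be sketched with Python's built-in sqlite3 module: run the SQL query, then apply the subject map and predicate-object map to each result row. The GENDER column and its values are assumptions made for this example (the slides only show the query), and the triples are kept as plain tuples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (ID INTEGER PRIMARY KEY, NAME TEXT, GENDER TEXT)")
conn.executemany(
    "INSERT INTO Person VALUES (?, ?, ?)",
    [(1, "Alice", "F"), (2, "Bob", "M")],  # assumed sample data
)

# The logical table is the result of the view's SQL query.
view_sql = "SELECT ID, NAME FROM Person WHERE GENDER = 'F'"

view_triples = []
for person_id, name in conn.execute(view_sql):
    subject = f"http://www.ex.com/Person/{person_id}"            # rr:template
    view_triples.append((subject, "rdf:type", "http://www.ex.com/Woman"))  # rr:class
    view_triples.append((subject, "foaf:name", name))            # rr:column "NAME"

for triple in view_triples:
    print(triple)
```

Only Alice's row passes the filter, so only she is typed as a Woman.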
Summary: R2RML
• Manual and Customizable Language
• Learning Curve
• Direct Mapping bootstraps R2RML
• RDF represents the structure and ontology of mapping author’s choice
What else is missing?
• 100 tables x 10 attributes each
• >1000 R2RML mappings
• Lack of R2RML editing tools
Extract – Transform – Load (ETL)
[Figure: Relational Database → RDB2RDF → Dump → Triplestore, queried with SPARQL]
Wrapper Systems
[Figure: a SPARQL query is translated through the RDB2RDF Mapping into SQL and run on the Relational Database; the SQL Results are returned as SPARQL/RDF Results]
Two Important Optimizations
• Translate SPARQL to semantically equivalent SQL
1. Detection of Unsatisfiable Conditions
2. Self-Join Elimination
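The slide only names the two optimizations. As a rough illustration of what they buy, the sqlite3 sketch below compares a naive translation of a two-pattern SPARQL query with the single-table SQL that remains once union branches with contradictory predicate constants are dropped and the self-join on the primary key is removed. The `Triples` view, its column names, and the predicate constants are hypothetical; the code checks that the two queries return the same rows and does not implement an optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER);
INSERT INTO Person VALUES (1, 'Alice', 25), (2, 'Bob', NULL);

-- Hypothetical triple view: one SELECT per column, tagged with a constant predicate.
CREATE VIEW Triples AS
  SELECT ID AS s, 'name' AS p, NAME AS o FROM Person WHERE NAME IS NOT NULL
  UNION ALL
  SELECT ID AS s, 'age' AS p, CAST(AGE AS TEXT) AS o FROM Person WHERE AGE IS NOT NULL;
""")

# SPARQL: SELECT ?s ?name ?age WHERE { ?s :name ?name . ?s :age ?age }
# Naive translation: one scan of the triple view per triple pattern.
naive = """
SELECT t1.s, t1.o AS name, t2.o AS age
FROM Triples t1 JOIN Triples t2 ON t1.s = t2.s
WHERE t1.p = 'name' AND t2.p = 'age'
"""

# After pruning branches such as 'age' = 'name' (unsatisfiable) and
# collapsing Person JOIN Person ON ID = ID (self-join elimination):
optimized = """
SELECT ID AS s, NAME AS name, CAST(AGE AS TEXT) AS age
FROM Person
WHERE NAME IS NOT NULL AND AGE IS NOT NULL
"""

naive_rows = sorted(conn.execute(naive).fetchall())
optimized_rows = sorted(conn.execute(optimized).fetchall())
print(naive_rows == optimized_rows)
```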
SPARQL as Fast as SQL
Berlin Benchmark on 100 Million Triples on Oracle 11g using Ultrawrap
RNA Database
• Use Case: Exploratory Search
• Two Relational Databases
– rCAD
– Rfam
• Three Domain Ontologies
– Gene Ontology
– RNA Ontology
– NCBI Taxonomy
RNA Database
• Direct Mapping as Ontology
– Direct Mapping + Schema as Ontology
• Leverage Ontology Matching systems
• Ultrawrap
Semantic Enrichment
[Figure: components Database, Ultrawrap, Direct Mapping as Ontology, Source Putative Ontology, Alignment Mappings, Domain Ontology, R2RML]
RNA Database Architecture
[Figure: rCAD and Rfam, each behind Ultrawrap with its own Putative Ontology; QODI (Query-driven Ontology-based Data Integration) takes SPARQL over the Gene Ontology, RNA Ontology, and NCBI Ontology and produces Reformulated SPARQL]
EUCLID Scenario
[Figure: EUCLID architecture. Data acquisition: Metadata, Streaming providers, Downloads, Musical Content, Other content, with Physical Wrapper, LD Wrapper, R2R Transf., RDF/XML, and RDFa. Integrated Dataset: Vocabulary Mapping, Interlinking, Cleansing. LD Dataset Access: SPARQL Endpoint, Publishing. Application: Visualization Module, Analysis & Mining Module]
W3C RDB2RDF
• Task: Integrate data from relational DBMS with Linked Data
• Approach: map from relational schema to semantic vocabulary with R2RML
• Publishing: two alternatives
– Translate SPARQL into SQL on the fly
– Batch transform data into RDF, index and provide SPARQL access in a triplestore
[Figure: EUCLID architecture for this scenario. Data acquisition: Relational DBMS, R2RML Engine. Integrated Data in Triplestore: Vocabulary Mapping, Interlinking, Cleansing. LD Dataset Access: SPARQL Endpoint, Publishing]
MusicBrainz Next Gen Schema
• artist: as pre-NGS, but further attributes
• artist_credit: allows joint credit
• release_group: cf. ‘album’, versus:
• release
• medium
• track
• tracklist
• work
• recording
https://wiki.musicbrainz.org/Next_Generation_Schema
Music Ontology
• MusicArtist
– ArtistEvent, member_of
• SignalGroup: ‘Album’ as per Release_Group
• Release
– ReleaseEvent
• Record
• Track
• Work
• Composition
http://musicontology.com/
Scale
• MusicBrainz RDF derived via R2RML: 300M Triples
lb:artist_member a rr:TriplesMap ;
    rr:logicalTable [ rr:sqlQuery
        """SELECT a1.gid, a2.gid AS band
           FROM artist a1
           INNER JOIN l_artist_artist ON a1.id = l_artist_artist.entity0
           INNER JOIN link ON l_artist_artist.link = link.id
           INNER JOIN link_type ON link.link_type = link_type.id
           INNER JOIN artist a2 ON l_artist_artist.entity1 = a2.id
           WHERE link_type.gid = '5be4c609-9afa-4ea0-910b-12ffb71e3821'""" ] ;
    rr:subjectMap [ rr:template "http://musicbrainz.org/artist/{gid}#_" ] ;
    rr:predicateObjectMap [
        rr:predicate mo:member_of ;
        rr:objectMap [ rr:template "http://musicbrainz.org/artist/{band}#_" ;
                       rr:termType rr:IRI ]
    ] .
R2RML Class Mapping
• Mapping tables to classes is ‘easy’:
lb:Artist a rr:TriplesMap ;
    rr:logicalTable [ rr:tableName "artist" ] ;
    rr:subjectMap [
        rr:class mo:MusicArtist ;
        rr:template "http://musicbrainz.org/artist/{gid}#_"
    ] ;
    rr:predicateObjectMap [
        rr:predicate mo:musicbrainz_guid ;
        rr:objectMap [ rr:column "gid" ; rr:datatype xsd:string ]
    ] .
R2RML Property Mapping
• Mapping columns to properties can be easy:
lb:artist_name a rr:TriplesMap ;
    rr:logicalTable [ rr:sqlQuery
        """SELECT artist.gid, artist_name.name
           FROM artist
           INNER JOIN artist_name ON artist.name = artist_name.id""" ] ;
    rr:subjectMap [ rr:template "http://musicbrainz.org/artist/{gid}#_" ] ;
    rr:predicateObjectMap [
        rr:predicate foaf:name ;
        rr:objectMap [ rr:column "name" ]
    ] .
NGS Advanced Relations
• Major entities (Artist, Release Group, Track, etc.) plus URL are paired (l_artist_artist)
• Each pairing of instances refers to a Link
• Links have types (cf. RDF properties) and attributes
http://wiki.musicbrainz.org/Advanced_Relationship
Advanced Relations Mapping
• Mapping advanced relationships (SQL joins):
lb:artist_member a rr:TriplesMap ;
    rr:logicalTable [ rr:sqlQuery
        """SELECT a1.gid, a2.gid AS band
           FROM artist a1
           INNER JOIN l_artist_artist ON a1.id = l_artist_artist.entity0
           INNER JOIN link ON l_artist_artist.link = link.id
           INNER JOIN link_type ON link.link_type = link_type.id
           INNER JOIN artist a2 ON l_artist_artist.entity1 = a2.id
           WHERE link_type.gid = '5be4c609-9afa-4ea0-910b-12ffb71e3821'""" ] ;
    rr:subjectMap [ rr:template "http://musicbrainz.org/artist/{gid}#_" ] ;
    rr:predicateObjectMap [
        rr:predicate mo:member_of ;
        rr:objectMap [ rr:template "http://musicbrainz.org/artist/{band}#_" ;
                       rr:termType rr:IRI ]
    ] .
Advanced Relations Mapping
• Mapping advanced relationships (SQL joins):
lb:artist_dbpedia a rr:TriplesMap ;
    rr:logicalTable [ rr:sqlQuery
        """SELECT artist.gid,
                  REPLACE(REPLACE(url, 'wikipedia.org/wiki', 'dbpedia.org/resource'),
                          'http://en.', 'http://') AS url
           FROM artist
           INNER JOIN l_artist_url ON artist.id = l_artist_url.entity0
           INNER JOIN link ON l_artist_url.link = link.id
           INNER JOIN link_type ON link.link_type = link_type.id
           INNER JOIN url ON l_artist_url.entity1 = url.id
           WHERE link_type.gid = '29651736-fa6d-48e4-aadc-a557c6add1cb'
             AND url SIMILAR TO 'http://(de|el|en|es|ko|pl|pt).wikipedia.org/wiki/%'""" ] ;
    rr:subjectMap lb:sm_artist ;
    rr:predicateObjectMap [
        rr:predicate owl:sameAs ;
        rr:objectMap [ rr:column "url" ; rr:termType rr:IRI ]
    ] .
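The nested REPLACE calls in the query above are easier to follow on a concrete value. The Python sketch below mirrors the SQL string logic: it keeps only Wikipedia URLs in the listed languages, swaps the Wikipedia path for the DBpedia one, and drops the "en." prefix so English pages map to the main DBpedia host. The function name and the sample URLs are illustrative and not taken from the MusicBrainz data.

```python
import re

# Same language list as the SIMILAR TO pattern; the dots are literal.
WIKIPEDIA = re.compile(r"http://(de|el|en|es|ko|pl|pt)\.wikipedia\.org/wiki/.*")


def wikipedia_to_dbpedia(url):
    """Return the DBpedia resource IRI for a Wikipedia URL, or None if
    the URL does not match the accepted pattern."""
    if not WIKIPEDIA.fullmatch(url):
        return None
    # Mirrors REPLACE(REPLACE(url, ...), 'http://en.', 'http://').
    rewritten = url.replace("wikipedia.org/wiki", "dbpedia.org/resource")
    return rewritten.replace("http://en.", "http://")


for sample in (
    "http://en.wikipedia.org/wiki/The_Beatles",
    "http://de.wikipedia.org/wiki/The_Beatles",
    "http://fr.wikipedia.org/wiki/The_Beatles",
):
    print(sample, "->", wikipedia_to_dbpedia(sample))
```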
SPARQL Example
• SPARQL versus SQL
ASK {dbp:Paul_McCartney mo:member_of dbp:The_Beatles}
SELECT …
INNER JOIN
INNER JOIN
INNER JOIN
INNER JOIN
INNER JOIN
INNER JOIN
INNER JOIN
INNER JOIN
INNER JOIN
INNER JOIN
INNER JOIN
INNER JOIN
WHERE AND … AND … AND … AND …
Upcoming Tutorials
• ESWC – Montpellier, France
– May 27, 2013
• SemTechBiz – San Francisco, USA
– June 2, 2013
• More info: www.rdb2rdf.org
For exercises, quiz and further material visit our website:
http://www.euclid-project.eu (eBook, Course)
Other channels: @euclid_project, EUCLID project, EUCLIDproject
