Relational Database to RDF (RDB2RDF)


Webinar version of the RDB2RDF tutorial, extending EUCLID Module 3

Published in: Technology, Education


  1. 1. Relational Database to RDF (RDB2RDF)
      Juan Sequeda, Barry Norton
  2. 2. What is RDB2RDF?
      Person table:
        ID  NAME   AGE   CID
        1   Alice  25    100
        2   Bob    NULL  100
      City table:
        CID  NAME
        100  Austin
        200  Madrid
      Resulting RDF graph:
        <Person/1> foaf:name "Alice" ; foaf:age 25 ; foaf:based_near <City/100> .
        <Person/2> foaf:name "Bob" .
        <City/100> foaf:name "Austin" .
        <City/200> foaf:name "Madrid" .
  3. 3. Context: RDF Data Management
      • Relational Database to RDF (RDB2RDF)
        – Wrapper Systems
        – Extract-Transform-Load (ETL)
      • Triplestores
        – RDBMS-backed Triplestores
        – Native Triplestores
        – NoSQL Triplestores
  4. 4. Outline
      • Scenarios
      • W3C RDB2RDF Standards
        – Direct Mapping
        – R2RML
      • ETL and Wrapper Systems
      • Use Cases
        – RNA Databases
        – MusicBrainz
  5. 5. Ideal Scenario: Automatic Mapping
      [Diagram; labels: Relational Database, Direct Mapping as Ontology, Source Putative Ontology, Automatic Mapping, Domain Ontologies, Refined R2RML, RDB2RDF Wrapper, RDF, SPARQL]
  6. 6. Semi-automatic Mapping
      [Diagram; labels: Relational Database, Direct Mapping as Ontology, Source Putative Ontology, Semi-Automatic Mapping, Domain Ontologies, Refined R2RML, RDB2RDF Wrapper, RDF, SPARQL]
  7. 7. R2RML
      [Diagram; labels: Relational Database, R2RML Mapping Engine, R2RML File, Domain Ontologies (e.g. FOAF), Extract-Transform-Load, Triplestore, SPARQL]
  8. 8. Direct Mapping
      [Diagram; labels: Relational Database, Direct Mapping Engine, Extract-Transform-Load, Triplestore, SPARQL]
  9. 9. Outline
      • Scenarios
      • W3C RDB2RDF Standards
        – Direct Mapping
        – R2RML
      • ETL and Wrapper Systems
      • Use Cases
        – RNA Databases
        – MusicBrainz
  10. 10. W3C RDB2RDF Standards
      • Standards to map relational data to RDF
      • A Direct Mapping of Relational Data to RDF
        – Default automatic mapping of relational data to RDF
      • R2RML: RDB to RDF Mapping Language
        – Customizable language to map relational data to RDF
  11. 11. Direct Mapping
      [Diagram; labels: Relational Database, Direct Mapping Engine, RDF]
  12. 12. W3C Direct Mapping
      • Input:
        – Database (Schema and Data)
        – Primary Keys
        – Foreign Keys
      • Output:
        – RDF graph
  13. 13. Table Triple
      Person table:
        ID (pk)  NAME   AGE
        1        Alice  25
        2        Bob    NULL
      Triple:
        <http://www.ex.com/Person/ID=1> rdf:type <http://www.ex.com/Person> .
      Subject IRI pattern: Base IRI + "Table Name" / "PK attr" = "PK value"
      Note: If there is no PK, a fresh blank node is generated for every row.
  14. 14. Literal Triples
      Person table:
        ID (pk)  NAME   AGE
        1        Alice  25
        2        Bob    NULL
      Triple:
        <http://www.ex.com/Person/ID=1> <http://www.ex.com/Person#NAME> "Alice" .
      Predicate IRI pattern: Base IRI + "Table Name" # "Attribute"
  15. 15. Reference Triples
      Person table:
        ID (pk)  NAME   AGE   CID (fk)
        1        Alice  25    100
        2        Bob    NULL  200
      City table:
        CID (pk)  TITLE
        100       Austin
        200       Madrid
      Triple:
        <http://www.ex.com/Person/ID=1> <http://www.ex.com/Person#ref-CID> <http://www.ex.com/City/CID=100> .
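The three rules on slides 13 to 15 (table triple, literal triples, reference triples) can be sketched in a few lines of Python. This is a minimal illustration, not the W3C algorithm itself: the function names `row_iri` and `direct_map` are hypothetical, every literal is written as a plain string (no datatypes, no escaping), and tables without a primary key are not handled.

```python
from urllib.parse import quote

BASE = "http://www.ex.com/"  # base IRI used on the slides
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def row_iri(table, pk, row):
    """Row IRI: base IRI + table name + "/" + PK attribute + "=" + PK value."""
    return f"<{BASE}{table}/{pk}={quote(str(row[pk]), safe='')}>"


def direct_map(table, pk, rows, fks=None):
    """Return N-Triples-like strings for one table (hypothetical helper).

    fks maps a foreign-key column to (referenced table, referenced PK column).
    """
    fks = fks or {}
    triples = []
    for row in rows:
        subject = row_iri(table, pk, row)
        # Table triple: the row is an instance of the table's class.
        triples.append(f"{subject} <{RDF_TYPE}> <{BASE}{table}> .")
        for column, value in row.items():
            if value is None:
                continue  # a NULL value produces no triple
            # Literal triple: predicate is base IRI + table + "#" + attribute.
            triples.append(f'{subject} <{BASE}{table}#{column}> "{value}" .')
            if column in fks:
                # Reference triple: object is the IRI of the referenced row.
                ref_table, ref_pk = fks[column]
                target = row_iri(ref_table, ref_pk, {ref_pk: value})
                triples.append(f"{subject} <{BASE}{table}#ref-{column}> {target} .")
    return triples


# Person rows from slide 15.
person_rows = [
    {"ID": 1, "NAME": "Alice", "AGE": 25, "CID": 100},
    {"ID": 2, "NAME": "Bob", "AGE": None, "CID": 200},
]
person_graph = direct_map("Person", "ID", person_rows, {"CID": ("City", "CID")})
```

Bob's NULL age simply yields no `Person#AGE` triple, which matches the graph drawn on the result slide.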
  16. 16. Direct Mapping Result
      Person table:
        ID  NAME   AGE   CID
        1   Alice  25    100
        2   Bob    NULL  100
      City table:
        CID  NAME
        100  Austin
        200  Madrid
      Resulting RDF graph:
        <Person/ID=1> <Person#NAME> "Alice" ; <Person#AGE> 25 ; <Person#ref-CID> <City/CID=100> .
        <Person/ID=2> <Person#NAME> "Bob" ; <Person#ref-CID> <City/CID=100> .
        <City/CID=100> <City#NAME> "Austin" .
        <City/CID=200> <City#NAME> "Madrid" .
  17. 17. Summary: Direct Mapping
      • Default and Automatic Mapping
      • URIs are automatically generated
        – <table>
        – <table#attribute>
        – <table#ref-attribute>
        – <table/pkAttr=pkValue>
      • RDF represents the same relational schema
      • RDF can be transformed by SPARQL CONSTRUCT
        – RDF then represents the structure and ontology of the mapping author's choice
  18. 18. What else is missing?
      • Relational Schema to OWL is *not* in the W3C standard
      • Many-to-Many relationships (binary tables)
      • "Ugly" IRIs
  19. 19. R2RML
      [Diagram; labels: Relational Database, R2RML Mapping Engine, R2RML File, OWL Ontologies (e.g. FOAF), RDF]
  20. 20. Create R2RML
      • Input
        – Knowledge of the database (schema and data)
        – Knowledge of the domain ontologies
        – Knowledge of mappings
      • Output
        – R2RML file
      • Direct Mapping helps to "bootstrap"
  21. 21. Direct Mapping as R2RML
      @prefix rr: <http://www.w3.org/ns/r2rml#> .

      <TriplesMap1>
          a rr:TriplesMap ;
          rr:logicalTable [ rr:tableName "Person" ] ;
          rr:subjectMap [
              rr:template "http://www.ex.com/Person/ID={ID}" ;
              rr:class <http://www.ex.com/Person>
          ] ;
          rr:predicateObjectMap [
              rr:predicate <http://www.ex.com/Person#NAME> ;
              rr:objectMap [ rr:column "NAME" ]
          ] .
  22. 22. Subject URI Template (same mapping as slide 21, subject map highlighted)
      rr:subjectMap [
          rr:template "http://www.ex.com/Person/ID={ID}" ;    ← Subject URI
          rr:class <http://www.ex.com/Person>                 ← <Subject URI> rdf:type <Class URI>
      ] ;
  23. 23. Predicate URI Constant (same mapping as slide 21, predicate highlighted)
      rr:predicateObjectMap [
          rr:predicate <http://www.ex.com/Person#NAME> ;      ← Predicate URI
          rr:objectMap [ rr:column "NAME" ]
      ] .
  24. 24. Object Column Value (same mapping as slide 21, object map highlighted)
      rr:predicateObjectMap [
          rr:predicate <http://www.ex.com/Person#NAME> ;
          rr:objectMap [ rr:column "NAME" ]                   ← Object Literal
      ] .
  25. 25. "Cool" URIs
      <http://www.ex.com/Person/ID=1>  →  <http://www.ex.com/Person/1>
      <http://www.ex.com/Person#NAME>  →  foaf:name
      <http://www.ex.com/Person>       →  foaf:Person
  26. 26. Customized R2RML
      @prefix rr: <http://www.w3.org/ns/r2rml#> .
      @prefix foaf: <http://xmlns.com/foaf/0.1/> .

      <TriplesMap1>
          a rr:TriplesMap ;
          rr:logicalTable [ rr:tableName "Person" ] ;
          rr:subjectMap [
              rr:template "http://www.ex.com/Person/{ID}" ;
              rr:class foaf:Person
          ] ;
          rr:predicateObjectMap [
              rr:predicate foaf:name ;
              rr:objectMap [ rr:column "NAME" ]
          ] .
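To make the roles of `rr:template`, `rr:class`, `rr:predicate` and `rr:column` concrete, the following sketch applies a dictionary version of the customized TriplesMap1 to a few rows. It is an illustration only, not an R2RML processor: the dictionary layout and the names `expand_template` and `apply_triples_map` are assumptions made for this example.

```python
import re
from urllib.parse import quote

FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def expand_template(template, row):
    """Fill each {COLUMN} placeholder from the row; None if a value is NULL."""
    if any(row.get(col) is None for col in PLACEHOLDER.findall(template)):
        return None
    return PLACEHOLDER.sub(
        lambda m: quote(str(row[m.group(1)]), safe=""), template
    )


# Hypothetical in-memory stand-in for the customized TriplesMap1.
TRIPLES_MAP_1 = {
    "template": "http://www.ex.com/Person/{ID}",    # rr:template
    "class": FOAF + "Person",                       # rr:class
    "predicate_object": [(FOAF + "name", "NAME")],  # (rr:predicate, rr:column)
}


def apply_triples_map(tmap, rows):
    """Generate (subject, predicate, object) tuples for every row."""
    triples = []
    for row in rows:
        subject = expand_template(tmap["template"], row)
        if subject is None:
            continue  # no subject can be built, so the row yields nothing
        triples.append((subject, RDF_TYPE, tmap["class"]))
        for predicate, column in tmap["predicate_object"]:
            if row.get(column) is not None:
                triples.append((subject, predicate, row[column]))
    return triples


people = [{"ID": 1, "NAME": "Alice"}, {"ID": 2, "NAME": "Bob"}]
person_triples = apply_triples_map(TRIPLES_MAP_1, people)
```

Swapping the template or the predicate in the dictionary is all it takes to move from the "ugly" Direct Mapping IRIs to the FOAF vocabulary, which is the point the slide makes about customization.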
  27. 27. Referencing another TriplesMap (join)
      <TriplesMap1>
          a rr:TriplesMap ;
          rr:logicalTable [ rr:tableName "Person" ] ;
          rr:subjectMap [
              rr:template "http://www.ex.com/Person/{ID}" ;
              rr:class foaf:Person
          ] ;
          rr:predicateObjectMap [
              rr:predicate foaf:based_near ;
              rr:objectMap [
                  rr:parentTriplesMap <TriplesMap2> ;
                  rr:joinCondition [
                      rr:child "CID" ;
                      rr:parent "CID"
                  ]
              ]
          ] .

      <TriplesMap2>
          a rr:TriplesMap ;
          rr:logicalTable [ rr:tableName "City" ] ;
          rr:subjectMap [
              rr:template "http://ex.com/City/{CID}" ;
              rr:class ex:City
          ] ;
          rr:predicateObjectMap [
              rr:predicate foaf:name ;
              rr:objectMap [ rr:column "TITLE" ]
          ] .
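The effect of `rr:parentTriplesMap` with an `rr:joinCondition` can be pictured as a join between the child rows (Person) and the parent rows (City), where the object of each generated triple is the parent's subject IRI. The sketch below uses a nested loop for clarity; `join_reference` and its parameters are hypothetical names, and a real engine would normally push the join down to SQL.

```python
FOAF_BASED_NEAR = "http://xmlns.com/foaf/0.1/based_near"


def join_reference(child_rows, parent_rows, child_col, parent_col,
                   child_subject, parent_subject, predicate):
    """Emit (child subject, predicate, parent subject) for each joined pair."""
    triples = []
    for child in child_rows:
        if child[child_col] is None:
            continue  # NULL never satisfies a join condition
        for parent in parent_rows:
            if child[child_col] == parent[parent_col]:
                triples.append(
                    (child_subject(child), predicate, parent_subject(parent))
                )
    return triples


# Rows from the Person and City tables on slide 15.
persons = [{"ID": 1, "CID": 100}, {"ID": 2, "CID": 200}]
cities = [{"CID": 100, "TITLE": "Austin"}, {"CID": 200, "TITLE": "Madrid"}]

based_near = join_reference(
    persons, cities, "CID", "CID",
    child_subject=lambda r: f"http://www.ex.com/Person/{r['ID']}",
    parent_subject=lambda r: f"http://ex.com/City/{r['CID']}",
    predicate=FOAF_BASED_NEAR,
)
```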
  28. 28. R2RML Views
      SELECT ID, NAME FROM Person WHERE GENDER = 'F'

      ex:Person1 rdf:type ex:Woman .
      ex:Person1 foaf:name "Alice" .
  29. 29. R2RML View
      @prefix rr: <http://www.w3.org/ns/r2rml#> .
      @prefix foaf: <http://xmlns.com/foaf/0.1/> .

      <TriplesMap1>
          a rr:TriplesMap ;
          rr:logicalTable [
              rr:sqlQuery """SELECT ID, NAME
                             FROM Person WHERE GENDER = 'F'"""
          ] ;
          rr:subjectMap [
              rr:template "http://www.ex.com/Person/{ID}" ;
              rr:class <http://www.ex.com/Woman>
          ] ;
          rr:predicateObjectMap [
              rr:predicate foaf:name ;
              rr:objectMap [ rr:column "NAME" ]
          ] .
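An R2RML view replaces the base table with the result of a SQL query, so the mapping only ever sees the selected rows. The sketch below runs the view's query against an in-memory SQLite database and maps each result row. The table contents (including Bob's `'M'` value) and the function name `view_triples` are assumptions for illustration.

```python
import sqlite3

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
WOMAN = "http://www.ex.com/Woman"

# The query that plays the role of rr:sqlQuery.
VIEW_SQL = "SELECT ID, NAME FROM Person WHERE GENDER = 'F'"

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows access to columns by name
conn.executescript("""
CREATE TABLE Person (ID INTEGER PRIMARY KEY, NAME TEXT, GENDER TEXT);
INSERT INTO Person VALUES (1, 'Alice', 'F');
INSERT INTO Person VALUES (2, 'Bob', 'M');
""")


def view_triples(connection):
    """Map every row returned by the view to a type triple and a name triple."""
    triples = []
    for row in connection.execute(VIEW_SQL):
        subject = f"http://www.ex.com/Person/{row['ID']}"
        triples.append((subject, RDF_TYPE, WOMAN))
        triples.append((subject, FOAF_NAME, row["NAME"]))
    return triples


woman_triples = view_triples(conn)
```

Only Alice appears in the output, because the filtering happens in SQL before any triple is generated.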
  30. 30. Summary: R2RML
      • Manual and Customizable Language
      • Learning Curve
      • Direct Mapping bootstraps R2RML
      • RDF represents the structure and ontology of the mapping author's choice
  31. 31. What else is missing?
      • 100 tables x 10 attributes each
      • >1000 R2RML mappings
      • Lack of R2RML editing tools
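The arithmetic on this slide is easy to reproduce: if a Direct-Mapping-style R2RML file is generated from the schema, each table contributes one TriplesMap and each attribute one predicate-object map. The generator below is a rough sketch of such bootstrapping, assuming a hypothetical `schema` dictionary and a made-up `<#TableMap>` naming scheme. It does not handle foreign keys or composite keys.

```python
def bootstrap_r2rml(schema, base="http://www.ex.com/"):
    """Generate Direct-Mapping-style R2RML (Turtle text) from a schema.

    schema maps a table name to (primary key column, list of columns).
    """
    parts = ["@prefix rr: <http://www.w3.org/ns/r2rml#> .", ""]
    for table, (pk, columns) in schema.items():
        header = "\n".join([
            f"<#{table}Map> a rr:TriplesMap ;",
            f'    rr:logicalTable [ rr:tableName "{table}" ] ;',
            f'    rr:subjectMap [ rr:template "{base}{table}/{pk}={{{pk}}}" ;',
            f"                    rr:class <{base}{table}> ] ;",
        ])
        # One predicate-object map per attribute.
        poms = [
            f"    rr:predicateObjectMap [ rr:predicate <{base}{table}#{col}> ;\n"
            f'                            rr:objectMap [ rr:column "{col}" ] ]'
            for col in columns
        ]
        parts.append(header + "\n" + " ;\n".join(poms) + " .\n")
    return "\n".join(parts)


# 100 tables with 10 attributes each, as on the slide (names are made up).
big_schema = {
    f"T{i}": ("ID", [f"C{j}" for j in range(10)]) for i in range(100)
}
big_turtle = bootstrap_r2rml(big_schema)
triples_map_count = big_turtle.count("a rr:TriplesMap")
pom_count = big_turtle.count("rr:predicateObjectMap")
```

With this schema the generated file contains 100 TriplesMaps and 1000 predicate-object maps, each of which would have to be reviewed and refined by hand in the absence of editing tools.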
  32. 32. Outline
      • Scenarios
      • W3C RDB2RDF Standards
        – Direct Mapping
        – R2RML
      • ETL and Wrapper Systems
      • Use Cases
        – RNA Databases
        – MusicBrainz
  33. 33. Extract – Transform – Load (ETL)
      [Diagram; labels: Relational Database, RDB2RDF, Dump, Triplestore, SPARQL]
  34. 34. Wrapper Systems
      [Diagram; labels: SPARQL, RDF, RDB2RDF Mapping, SQL, Relational Database, SQL Results, SPARQL/RDF Results]
  35. 35. Two Important Optimizations
      • Translate SPARQL to semantically equivalent SQL
        1. Detection of Unsatisfiable Conditions
        2. Self-Join Elimination
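One way to picture these two optimizations is to expose a table as a three-column triple view and compare a naive translation of a two-pattern SPARQL query with the SQL a good optimizer could reduce it to. The sketch below uses SQLite only to show that both queries return the same answer; the view definition, the shortened IRIs and the query strings are assumptions for illustration and are not taken from any particular wrapper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER);
INSERT INTO Person VALUES (1, 'Alice', 25);
INSERT INTO Person VALUES (2, 'Bob', NULL);

-- Hypothetical triple view: one UNION ALL branch per mapped column.
CREATE VIEW triples AS
  SELECT 'Person/' || ID AS s, 'name' AS p, NAME AS o
    FROM Person WHERE NAME IS NOT NULL
  UNION ALL
  SELECT 'Person/' || ID AS s, 'age' AS p, CAST(AGE AS TEXT) AS o
    FROM Person WHERE AGE IS NOT NULL;
""")

# SPARQL: SELECT ?s ?n ?a WHERE { ?s :name ?n . ?s :age ?a }
# Naive translation: one copy of the triple view per triple pattern.
NAIVE_SQL = """
SELECT t1.s, t1.o, t2.o
FROM triples t1 JOIN triples t2 ON t1.s = t2.s
WHERE t1.p = 'name' AND t2.p = 'age'
ORDER BY t1.s
"""

# After the two optimizations:
# 1. Branches whose constant p contradicts the filter (e.g. p = 'age' when
#    'name' is required) are unsatisfiable and can be dropped.
# 2. The remaining self-join of Person with itself on its key collapses
#    into a single scan of the table.
OPTIMIZED_SQL = """
SELECT 'Person/' || ID, NAME, CAST(AGE AS TEXT)
FROM Person
WHERE NAME IS NOT NULL AND AGE IS NOT NULL
ORDER BY ID
"""

naive_rows = conn.execute(NAIVE_SQL).fetchall()
optimized_rows = conn.execute(OPTIMIZED_SQL).fetchall()
```

Both queries return only Alice's row, since Bob has no age. The optimized form never touches the union view, which is what would let SPARQL execution approach the speed of hand-written SQL.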
  36. 36. SPARQL as Fast as SQL
      [Chart: Berlin Benchmark on 100 Million Triples on Oracle 11g using Ultrawrap]
  37. 37. Outline
      • Scenarios
      • W3C RDB2RDF Standards
        – Direct Mapping
        – R2RML
      • ETL and Wrapper Systems
      • Use Cases
        – RNA Databases
        – MusicBrainz
  38. 38. RNA Database
      • Use Case: Exploratory Search
      • Two Relational Databases
        – rCAD
        – Rfam
      • Three Domain Ontologies
        – Gene Ontology
        – RNA Ontology
        – NCBI Taxonomy
  39. 39. RNA Database
      • Direct Mapping as Ontology
        – Direct Mapping + Schema as Ontology
      • Leverage Ontology Matching systems
      • Ultrawrap
  40. 40. Semantic Enrichment
      [Diagram; labels: Database, Ultrawrap, Direct Mapping as Ontology, Source Putative Ontology, Alignment Mappings, Domain Ontology, R2RML]
  41. 41. RNA Database Architecture
      [Diagram; labels: rCAD, Ultrawrap, Putative Ontology, Rfam, Ultrawrap, Putative Ontology, Gene Ontology, RNA Ontology, NCBI Ontology, QODI: Query-driven Ontology-based Data Integration, SPARQL, Reformulated SPARQL]
  42. 42. EUCLID Scenario
      [Diagram; labels: Metadata, Streaming providers, Downloads, Physical Wrapper, Data acquisition, R2R Transf., LD Wrapper, Musical Content, Other content, Application, Visualization Module, Analysis & Mining Module, LD Dataset, Access, LD Wrapper, RDF/XML, RDFa, Integrated Dataset, Interlinking, Cleansing, Vocabulary Mapping, SPARQL Endpoint, Publishing]
  43. 43. W3C RDB2RDF
      • Task: Integrate data from relational DBMS with Linked Data
      • Approach: map from relational schema to semantic vocabulary with R2RML
      • Publishing: two alternatives
        – Translate SPARQL into SQL on the fly
        – Batch transform data into RDF, index and provide SPARQL access in a triplestore
      [Diagram; labels: Relational DBMS, R2RML Engine, Data acquisition, Integrated Data in Triplestore, Interlinking, Cleansing, Vocabulary Mapping, LD Dataset, Access, SPARQL Endpoint, Publishing]
  44. 44. MusicBrainz Next Gen Schema
      • artist: As pre-NGS, but further attributes
      • artist_credit: Allows joint credit
      • release_group: Cf. 'album', versus:
        – release
        – medium
        – track
        – tracklist
      • work
      • recording
      https://wiki.musicbrainz.org/Next_Generation_Schema
  45. 45. Music Ontology
      • MusicArtist
        – ArtistEvent, member_of
      • SignalGroup: 'Album' as per Release_Group
      • Release
        – ReleaseEvent
      • Record
      • Track
      • Work
      • Composition
      http://musicontology.com/
  46. 46. Scale
      • MusicBrainz RDF derived via R2RML: 300M Triples

      lb:artist_member a rr:TriplesMap ;
          rr:logicalTable [ rr:sqlQuery
              """SELECT a1.gid, a2.gid AS band
                 FROM artist a1
                 INNER JOIN l_artist_artist ON a1.id = l_artist_artist.entity0
                 INNER JOIN link ON l_artist_artist.link = link.id
                 INNER JOIN link_type ON link_type = link_type.id
                 INNER JOIN artist a2 ON l_artist_artist.entity1 = a2.id
                 WHERE link_type.gid = '5be4c609-9afa-4ea0-910b-12ffb71e3821'"""
          ] ;
          rr:subjectMap [ rr:template "http://musicbrainz.org/artist/{gid}#_" ] ;
          rr:predicateObjectMap [
              rr:predicate mo:member_of ;
              rr:objectMap [
                  rr:template "http://musicbrainz.org/artist/{band}#_" ;
                  rr:termType rr:IRI
              ]
          ] .
  47. 47. R2RML Class Mapping
      • Mapping tables to classes is 'easy':

      lb:Artist a rr:TriplesMap ;
          rr:logicalTable [ rr:tableName "artist" ] ;
          rr:subjectMap [
              rr:class mo:MusicArtist ;
              rr:template "http://musicbrainz.org/artist/{gid}#_"
          ] ;
          rr:predicateObjectMap [
              rr:predicate mo:musicbrainz_guid ;
              rr:objectMap [ rr:column "gid" ; rr:datatype xsd:string ]
          ] .
  48. 48. R2RML Property Mapping
      • Mapping columns to properties can be easy:

      lb:artist_name a rr:TriplesMap ;
          rr:logicalTable [ rr:sqlQuery
              """SELECT artist.gid, artist_name.name
                 FROM artist
                 INNER JOIN artist_name ON artist.name = artist_name.id"""
          ] ;
          rr:subjectMap [ rr:template "http://musicbrainz.org/artist/{gid}#_" ] ;
          rr:predicateObjectMap [
              rr:predicate foaf:name ;
              rr:objectMap [ rr:column "name" ]
          ] .
  49. 49. NGS Advanced Relations
      • Major entities (Artist, Release Group, Track, etc.) plus URL are paired (e.g. l_artist_artist)
      • Each pairing of instances refers to a Link
      • Links have types (cf. RDF properties) and attributes
      http://wiki.musicbrainz.org/Advanced_Relationship
  50. 50. Advanced Relations Mapping
      • Mapping advanced relationships (SQL joins):

      lb:artist_member a rr:TriplesMap ;
          rr:logicalTable [ rr:sqlQuery
              """SELECT a1.gid, a2.gid AS band
                 FROM artist a1
                 INNER JOIN l_artist_artist ON a1.id = l_artist_artist.entity0
                 INNER JOIN link ON l_artist_artist.link = link.id
                 INNER JOIN link_type ON link_type = link_type.id
                 INNER JOIN artist a2 ON l_artist_artist.entity1 = a2.id
                 WHERE link_type.gid = '5be4c609-9afa-4ea0-910b-12ffb71e3821'"""
          ] ;
          rr:subjectMap [ rr:template "http://musicbrainz.org/artist/{gid}#_" ] ;
          rr:predicateObjectMap [
              rr:predicate mo:member_of ;
              rr:objectMap [
                  rr:template "http://musicbrainz.org/artist/{band}#_" ;
                  rr:termType rr:IRI
              ]
          ] .
  51. 51. Advanced Relations Mapping
      • Mapping advanced relationships (SQL joins):

      lb:artist_dbpedia a rr:TriplesMap ;
          rr:logicalTable [ rr:sqlQuery
              """SELECT artist.gid,
                        REPLACE(REPLACE(url, 'wikipedia.org/wiki', 'dbpedia.org/resource'),
                                'http://en.', 'http://') AS url
                 FROM artist
                 INNER JOIN l_artist_url ON artist.id = l_artist_url.entity0
                 INNER JOIN link ON l_artist_url.link = link.id
                 INNER JOIN link_type ON link_type = link_type.id
                 INNER JOIN url ON l_artist_url.entity1 = url.id
                 WHERE link_type.gid = '29651736-fa6d-48e4-aadc-a557c6add1cb'
                   AND url SIMILAR TO 'http://(de|el|en|es|ko|pl|pt).wikipedia.org/wiki/%'"""
          ] ;
          rr:subjectMap lb:sm_artist ;
          rr:predicateObjectMap [
              rr:predicate owl:sameAs ;
              rr:objectMap [ rr:column "url" ; rr:termType rr:IRI ]
          ] .
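The nested `REPLACE` calls and the `SIMILAR TO` filter in this mapping turn a Wikipedia article URL into the corresponding DBpedia resource IRI. The same string logic is sketched below in Python to make it easy to try on sample URLs. The function name `dbpedia_iri` is hypothetical, and the regular expression is an approximation of the SQL pattern rather than an exact equivalent.

```python
import re

# Approximation of: url SIMILAR TO
#   'http://(de|el|en|es|ko|pl|pt).wikipedia.org/wiki/%'
WIKIPEDIA_URL = re.compile(
    r"http://(de|el|en|es|ko|pl|pt)\.wikipedia\.org/wiki/.*"
)


def dbpedia_iri(url):
    """Return the DBpedia IRI for a supported Wikipedia URL, else None."""
    if not WIKIPEDIA_URL.fullmatch(url):
        return None  # the row would be filtered out by the WHERE clause
    # Inner REPLACE: switch host and path from Wikipedia to DBpedia.
    rewritten = url.replace("wikipedia.org/wiki", "dbpedia.org/resource")
    # Outer REPLACE: English DBpedia has no language subdomain.
    return rewritten.replace("http://en.", "http://")


english = dbpedia_iri("http://en.wikipedia.org/wiki/The_Beatles")
german = dbpedia_iri("http://de.wikipedia.org/wiki/The_Beatles")
unsupported = dbpedia_iri("http://fr.wikipedia.org/wiki/The_Beatles")
```

English articles map to `http://dbpedia.org/resource/...`, other listed languages keep their subdomain, and URLs outside the listed languages produce no `owl:sameAs` link.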
  52. 52. SPARQL Example
      • SPARQL versus SQL

      SPARQL:
        ASK { dbp:Paul_McCartney mo:member dbp:The_Beatles }

      SQL:
        SELECT …
        INNER JOIN … INNER JOIN … INNER JOIN … INNER JOIN …
        INNER JOIN … INNER JOIN … INNER JOIN … INNER JOIN …
        INNER JOIN … INNER JOIN … INNER JOIN … INNER JOIN …
        WHERE … AND … AND … AND … AND …
  53. 53. Upcoming Tutorials
      • ESWC – Montpellier, France
        – May 27, 2013
      • SemTechBiz – San Francisco, USA
        – June 2, 2013
      • More info: www.rdb2rdf.org
  54. 54. For exercises, quiz and further material visit our website: http://www.euclid-project.eu
      Other channels: @euclid_project, EUCLID project, EUCLIDproject
      eBook, Course
