Releasing Relational Data to the Semantic Web

Enterprises are drowning in data that they can't find, access, or use.

For many years, enterprises have wrestled with the best way to combine all that data into actionable information without building systems that break as schemas evolve. Approaches like warehousing and ETL can be brittle in the face of changing data sources or expensive to create. Data integration at the application level is common, but it adds significant complexity to the code. Data-oriented web services attempt to provide reusable sources of integrated data; however, these have just added another layer of data access that constrains query and access patterns.

This talk will look at how semantic web technologies can be used to make existing data visible and actionable using standards like RDF (data), R2RML (data translation), OWL (schema definition and integration), SPARQL (federated query), and RIF (rules). The semantic web approach takes the data you already have and makes that data available for query and use across your existing data sources. This base capability is an excellent platform for building federated analytics.

Published in: Technology

Releasing Relational Data to the Semantic Web

  1. Releasing Relational Data to the Semantic Web. Alex Miller, amiller@revelytix.com
  2. Semantic web, Relational data, Federation, HREIW, Analytics
  3. There are things we wish to describe.
  4. We need some way to identify each thing.
  5. On the web, we identify things with a URI. A URI is about "identifying" things, not "locating" things (a URL).
  6. dbp:Chicago_(band), dbp:Wrigley_Field, dbp:The_Blues_Brothers_(film), dbp:Chicago, dbp:Chicago_Cubs, dbp:Barack_Obama, dbp:Pizza (dbp: http://dbpedia.org/resource/)
  7. Things are more interesting if we relate them. Relationships are also described by a URI.
  8. [Diagram: dbp:Chicago_(band), dbp:The_Blues_Brothers_(film), dbp:Wrigley_Field, dbp:Chicago, dbp:Chicago_Cubs, dbp:Barack_Obama, and dbp:Pizza drawn as a graph, with edges labeled dbpo:location, movie:film_location, dbpo:owner, and dbpo:residence.] (dbp: http://dbpedia.org/resource/, dbpo: http://dbpedia.org/ontology/)
  9. Triple: <subject> <predicate> <object>
  10. [Diagram: the same graph, with one triple annotated as Subject, Predicate, and Object.]
  11. <subject> <predicate> <object>, for example dbp:Wrigley_Field dbpo:location dbp:Chicago. The subject and the predicate are resources; the object is a resource or a value.
  12. [Diagram: the same graph, repeated.]
  13. Congratulations! You now know RDF.
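
To make the triple idea concrete, here is a minimal Python sketch that stores the two triples the slides spell out explicitly as plain tuples. The helper names `expand` and `objects` are illustrative assumptions, not part of any RDF library.

```python
# Prefix table taken from the slides; a name such as "dbp:Chicago" is shorthand for a full URI.
PREFIXES = {
    "dbp": "http://dbpedia.org/resource/",
    "dbpo": "http://dbpedia.org/ontology/",
}


def expand(curie: str) -> str:
    """Expand a prefixed name like 'dbp:Chicago' into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


# A graph is just a set of (subject, predicate, object) triples.
graph = {
    ("dbp:Wrigley_Field", "dbpo:location", "dbp:Chicago"),
    ("dbp:Chicago_Cubs", "dbpo:owner", "dbp:Wrigley_Field"),
}


def objects(graph, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return {o for s, p, o in graph if s == subject and p == predicate}


print(expand("dbp:Wrigley_Field"))
print(objects(graph, "dbp:Wrigley_Field", "dbpo:location"))
```

A real application would more likely use an RDF library and a triple store; the tuples here only illustrate the data model.
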
  14. If things and relationships can be defined by any URI, how do we know what we're talking about?
  15. We need metadata.
  16. Specifically, we need a vocabulary of common terms that describe our data.
  17. A class describes a group of things that share common properties.
  18. Each of dbp:San_Francisco, dbp:Chicago, and dbp:Saint_Louis "is a" ex:City. (dbp: http://dbpedia.org/resource/, ex: http://example.org/ontology/, rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#, rdfs: http://www.w3.org/2000/01/rdf-schema#)
  19. dbp:San_Francisco rdf:type ex:City . dbp:Chicago rdf:type ex:City . dbp:Saint_Louis rdf:type ex:City .
  20. ex:City rdf:type rdfs:Class . dbp:San_Francisco rdf:type ex:City . dbp:Chicago rdf:type ex:City . dbp:Saint_Louis rdf:type ex:City .
  21. ex:Location rdf:type rdfs:Class . ex:City rdf:type rdfs:Class . ex:City rdfs:subClassOf ex:Location .
  22. Classes let us talk about kinds of things. Now we need some way to describe attributes.
  23. dbp:Chicago rdf:type ex:City . dbp:Chicago ex:country dbp:United_States . dbp:Chicago ex:founded 1837 .
  24. ex:founded rdf:type rdf:Property . ex:founded rdfs:domain ex:City . ex:founded rdfs:range xsd:gYear . dbp:Chicago rdf:type ex:City . dbp:Chicago ex:founded 1837 .
  25. Congratulations! You now know RDF Schema.
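
One thing the schema slides make possible is simple inference: if ex:City is a subclass of ex:Location, anything typed ex:City is also an ex:Location. The sketch below illustrates that single idea over tuples; the function name `types_of` is an assumption for illustration, and the code is not a full RDFS reasoner.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# Schema and instance triples taken from the class slides.
graph = {
    ("ex:City", RDF_TYPE, "rdfs:Class"),
    ("ex:Location", RDF_TYPE, "rdfs:Class"),
    ("ex:City", SUBCLASS, "ex:Location"),
    ("dbp:Chicago", RDF_TYPE, "ex:City"),
}


def types_of(graph, resource):
    """Return the asserted types of a resource plus every superclass
    reachable by following rdfs:subClassOf."""
    found = set()
    pending = [o for s, p, o in graph if s == resource and p == RDF_TYPE]
    while pending:
        cls = pending.pop()
        if cls in found:
            continue  # already visited; also protects against subclass cycles
        found.add(cls)
        pending.extend(o for s, p, o in graph if s == cls and p == SUBCLASS)
    return found


print(types_of(graph, "dbp:Chicago"))
```
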
  26. How do we find stuff in this data? SPARQL
  27. dbp:Chicago_Cubs rdf:type ex:Baseball_Team . dbp:Wrigley_Field rdf:type ex:Stadium . dbp:Chicago rdf:type ex:City . dbp:Chicago_Cubs dbpo:owner dbp:Wrigley_Field . dbp:Wrigley_Field dbpo:location dbp:Chicago .
  28. [Diagram: the same graph with the resources replaced by the variables ?owner, ?stadium, and ?city; ?stadium has rdf:type ex:Stadium and ?city has rdf:type ex:City.]
  29. The graph as triple patterns:
      ?stadium rdf:type ex:Stadium .
      ?city rdf:type ex:City .
      ?owner dbpo:owner ?stadium .
      ?stadium dbpo:location ?city .
  30. The triple patterns as a query:
      SELECT ?owner ?stadium ?city
      WHERE {
          ?owner dbpo:owner ?stadium .
          ?stadium dbpo:location ?city .
          ?stadium rdf:type ex:Stadium .
          ?city rdf:type ex:City .
      }
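
As a rough illustration of what a SPARQL processor does with the query above, this Python sketch matches the same four triple patterns against the stadium triples by trying every triple for each pattern and carrying variable bindings forward. The `match` function is a hypothetical toy, not how a production engine is implemented.

```python
def match(graph, patterns, binding=None):
    """Yield variable bindings that satisfy every triple pattern.
    Terms starting with '?' are variables; anything else must match exactly."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(graph, rest, candidate)


graph = {
    ("dbp:Chicago_Cubs", "rdf:type", "ex:Baseball_Team"),
    ("dbp:Wrigley_Field", "rdf:type", "ex:Stadium"),
    ("dbp:Chicago", "rdf:type", "ex:City"),
    ("dbp:Chicago_Cubs", "dbpo:owner", "dbp:Wrigley_Field"),
    ("dbp:Wrigley_Field", "dbpo:location", "dbp:Chicago"),
}

query = [
    ("?owner", "dbpo:owner", "?stadium"),
    ("?stadium", "dbpo:location", "?city"),
    ("?stadium", "rdf:type", "ex:Stadium"),
    ("?city", "rdf:type", "ex:City"),
]

solutions = list(match(graph, query))
print(solutions)
```
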
  31. SPARQL: unions, joins, outer joins, filter with criteria, project expressions, sort, duplicate removal, slice (limit / offset), aggregates (grouping, etc.), subqueries
  32. Semantic web, Relational data, Federation, HREIW, Analytics
  33. Sounds interesting. But my data is in a relational database!
  34. Music Database
      Musicians:
          MID | First | Last      | Inst_ID
          1   | Eddie | Van Halen | 10
          2   | Yo Yo | Ma        | 20
          3   | Kenny | G         | 30
      Instruments:
          IID | Instrument | Type
          10  | Guitar     | String
          20  | Cello      | String
          30  | Saxophone  | Woodwind
  35. Musician Schema. [Diagram: music:Musician and music:Instrument are classes (rdf:type rdfs:Class). The properties (rdf:type rdf:Property) music:firstName, music:lastName, and music:plays have rdfs:domain music:Musician; music:plays has rdfs:range music:Instrument; music:instName and music:instType have rdfs:domain music:Instrument.]
  36. Triples From Tables. Turn each key into a resource and specify the proper type of each resource:
      artist:1 rdf:type music:Musician
      artist:2 rdf:type music:Musician
      artist:3 rdf:type music:Musician
      instrument:10 rdf:type music:Instrument
      instrument:20 rdf:type music:Instrument
      instrument:30 rdf:type music:Instrument
  37. Triples From Tables. Turn each cell into a triple based on the key, property (mapped per column), and value:
      artist:1 music:firstName "Eddie"
      artist:1 music:lastName "Van Halen"
      artist:2 music:firstName "Yo Yo"
      artist:2 music:lastName "Ma"
      artist:3 music:firstName "Kenny"
      artist:3 music:lastName "G"
      instrument:10 music:instName "Guitar"
      instrument:10 music:instType "String"
      instrument:20 music:instName "Cello"
      instrument:20 music:instType "String"
      instrument:30 music:instName "Saxophone"
      instrument:30 music:instType "Woodwind"
  38. Triples From Tables. Turn each foreign key reference into a relationship between the foreign and primary resources:
      artist:1 music:plays instrument:10
      artist:2 music:plays instrument:20
      artist:3 music:plays instrument:30
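
The three steps above (keys become typed resources, cells become triples, foreign keys become relationships) can be sketched in Python with the standard `sqlite3` module. The in-memory database and the column types are assumptions made for the example; the table contents and property names come from the slides.

```python
import sqlite3

# Build the Music Database from the slides in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Instruments (IID INTEGER PRIMARY KEY, Instrument TEXT, Type TEXT);
    CREATE TABLE Musicians (MID INTEGER PRIMARY KEY, First TEXT, Last TEXT,
                            Inst_ID INTEGER REFERENCES Instruments(IID));
    INSERT INTO Instruments VALUES (10, 'Guitar', 'String');
    INSERT INTO Instruments VALUES (20, 'Cello', 'String');
    INSERT INTO Instruments VALUES (30, 'Saxophone', 'Woodwind');
    INSERT INTO Musicians VALUES (1, 'Eddie', 'Van Halen', 10);
    INSERT INTO Musicians VALUES (2, 'Yo Yo', 'Ma', 20);
    INSERT INTO Musicians VALUES (3, 'Kenny', 'G', 30);
""")

triples = []

for mid, first, last, inst_id in conn.execute(
        "SELECT MID, First, Last, Inst_ID FROM Musicians"):
    subject = f"artist:{mid}"
    triples.append((subject, "rdf:type", "music:Musician"))   # key -> typed resource
    triples.append((subject, "music:firstName", first))       # cell -> triple
    triples.append((subject, "music:lastName", last))         # cell -> triple
    # foreign key -> relationship between two resources
    triples.append((subject, "music:plays", f"instrument:{inst_id}"))

for iid, name, kind in conn.execute(
        "SELECT IID, Instrument, Type FROM Instruments"):
    subject = f"instrument:{iid}"
    triples.append((subject, "rdf:type", "music:Instrument"))  # key -> typed resource
    triples.append((subject, "music:instName", name))          # cell -> triple
    triples.append((subject, "music:instType", kind))          # cell -> triple

for triple in triples:
    print(triple)
```

Hard-coding the column-to-property mapping like this is the part that a mapping language such as R2RML is meant to make declarative.
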
  39. R2RML
      • "RDB to RDF Mapping Language"
      • RDB2RDF Working Group at W3C
      • ETL "data transformation" use case
      • Dynamic "query translation" use case
        • SPARQL to SQL
  40. R2RML Triple Mapping. [Diagram: the schema fragment (music:instName and music:instType, each with rdfs:domain music:Instrument) shown above the Instruments table (IID, Instrument, Type; sample row 10, Guitar, String).]
  41. R2RML Triple Mapping. [Diagram build: a Triples Map is attached to the Instruments table through rr:tableName.]
  42. R2RML Triple Mapping. [Diagram build: the Triples Map gains a Subject Map with the template "http://example.com/music/Inst-{iid}", linked to music:Instrument through rr:class.]
  43. R2RML Triple Mapping. [Diagram build: the Triples Map gains two Predicate Object Maps. Each has a Predicate Map linked to a property (music:instName or music:instType) through rr:predicate and an Object Map linked to a table column through rr:column.]
  44. @prefix rr: <http://www.w3.org/ns/r2rml#> .
      @prefix music: <http://example.com/music/> .
      @prefix mapping: <http://example.com/ont/> .
      mapping:InstrumentMapping a rr:TriplesMapClass;
          rr:tableName "Instruments";
          rr:subjectMap [
              rr:template "http://example.com/music/Inst-{iid}";
              rr:class music:Instrument
          ];
          rr:predicateObjectMap [
              rr:predicateMap [ rr:predicate music:instName ];
              rr:objectMap [ rr:column "instrument" ];
          ];
          rr:predicateObjectMap [
              rr:predicateMap [ rr:predicate music:instType ];
              rr:objectMap [ rr:column "type" ];
          ];
      .
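
To show how such a mapping might be applied in the ETL "data transformation" use case, the sketch below mirrors the Turtle mapping as a plain Python dictionary and applies it to one row. The dictionary layout and the `apply_mapping` helper are assumptions for illustration; a real R2RML processor parses the Turtle itself and also percent-encodes template values, which is omitted here.

```python
import re

# Hand-written Python mirror of the Instruments mapping (assumed layout).
mapping = {
    "table": "Instruments",
    "subject_template": "http://example.com/music/Inst-{iid}",
    "subject_class": "music:Instrument",
    # (predicate, column) pairs, one per predicate-object map
    "predicate_object_maps": [
        ("music:instName", "instrument"),
        ("music:instType", "type"),
    ],
}


def apply_mapping(mapping, row):
    """Turn one table row (a dict keyed by column name) into triples."""
    # Fill each {column} placeholder in the subject template from the row.
    subject = re.sub(r"\{(\w+)\}",
                     lambda m: str(row[m.group(1)]),
                     mapping["subject_template"])
    triples = [(subject, "rdf:type", mapping["subject_class"])]
    for predicate, column in mapping["predicate_object_maps"]:
        triples.append((subject, predicate, row[column]))
    return triples


row = {"iid": 10, "instrument": "Guitar", "type": "String"}
for triple in apply_mapping(mapping, row):
    print(triple)
```
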
  45. SPARQL translation. [Diagram: a SPARQL query is translated to SQL using the R2RML mapping and run against the database; the SQL results come back as SPARQL solutions.]
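
The "query translation" use case can be illustrated for the simplest possible case: one triple pattern over the Instruments table. In this sketch the `MAPPING` dictionary is a hand-written stand-in for an R2RML mapping, and `translate` and `run` are hypothetical helpers; a real translator handles joins, filters, and many more cases.

```python
import sqlite3

# Hypothetical stand-in for a mapping: predicate -> (table, key column, value column).
MAPPING = {
    "music:instName": ("Instruments", "IID", "Instrument"),
    "music:instType": ("Instruments", "IID", "Type"),
}
SUBJECT_TEMPLATE = "http://example.com/music/Inst-{}"


def translate(pattern):
    """Translate one triple pattern (?subject, predicate, object) into SQL.
    The object may be a variable ('?x') or a constant string."""
    _subject_var, predicate, obj = pattern
    table, key, column = MAPPING[predicate]
    sql = f"SELECT {key}, {column} FROM {table}"
    params = []
    if not obj.startswith("?"):
        # A constant object becomes a WHERE criterion evaluated by the database.
        sql += f" WHERE {column} = ?"
        params.append(obj)
    return sql, params


def run(conn, pattern):
    """Execute the translated SQL and turn each result row into a solution."""
    subject_var, _predicate, obj = pattern
    sql, params = translate(pattern)
    solutions = []
    for key, value in conn.execute(sql, params):
        solution = {subject_var: SUBJECT_TEMPLATE.format(key)}
        if obj.startswith("?"):
            solution[obj] = value
        solutions.append(solution)
    return solutions


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Instruments (IID INTEGER PRIMARY KEY, Instrument TEXT, Type TEXT);
    INSERT INTO Instruments VALUES (10, 'Guitar', 'String');
    INSERT INTO Instruments VALUES (20, 'Cello', 'String');
    INSERT INTO Instruments VALUES (30, 'Saxophone', 'Woodwind');
""")

print(translate(("?inst", "music:instType", "String")))
print(run(conn, ("?inst", "music:instType", "String")))
```

The point of the sketch is that the data stays in the relational database; only the rows that answer the pattern are turned into RDF terms.
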
  46. Semantic web, Relational data, Federation, HREIW, Analytics
  47. SPARQL Protocol
      • Standard HTTP API for calling a SPARQL processor
      • Supported by all major triple stores and query processors
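
As a small illustration of the protocol, a query can be sent as an HTTP GET with the query text in the `query` parameter. The sketch below only builds the request with Python's standard library; the endpoint URL is a placeholder assumption and nothing is sent over the network.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build an HTTP GET request that carries a SPARQL query in the
    'query' parameter and asks for JSON-formatted results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# "http://example.org/sparql" is a placeholder endpoint, not a real service.
request = build_sparql_request(
    "http://example.org/sparql",
    "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1",
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the call against a live endpoint.
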
  48. SPARQL Federation
      SELECT ?artist ?song ?buyLink           # Return federated data
      WHERE {
          SERVICE <http://listening> {        # Call SPARQL endpoint that tracks your listening (like last.fm)
              ?listened rdf:type listen:event .
              ?listened listen:artist ?artist .
              ?listened listen:song ?song
          }
          OPTIONAL {
              SERVICE <http://amazon> {       # Call Amazon endpoint to get info on where to download the song.
                  ?isbn rdf:type amaz:mp3 .
                  ?isbn amaz:artist ?artist .
                  ?isbn amaz:song ?song .
                  ?isbn amaz:link ?buyLink
              }
          }
      }
  49. Service Descriptions
  50. Federator. [Architecture diagram: a Federator SPARQL endpoint, with an ontology and service registry, sits above three endpoints: an R2RML endpoint over a database, a web endpoint over DBpedia, and a SPARQL endpoint over a triple store.]
  51. Named graph mapping
      • Services can provide named graphs, described in their service description
      • Federator lets you create federated named graphs that map to service named graphs
  52. Data integration
      • Performance - data volume from sources is key
      • Source capabilities
      • Source statistics
  53. Performance concerns: data volume. [Diagram: the domain query SELECT ... FILTER (?age >= 24) ... reaches the source as WHERE Person.age >= 24, and results flow back.]
      Reduction factors:
      • criteria
      • minimal projection
      • aggregation
      • joins (sometimes)
      • dup removal
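
The effect of pushing criteria down to the source can be seen in a small experiment. This sketch uses an in-memory SQLite table as a stand-in for a remote source; the `Person` rows and the `name` column are invented for the example, while the `age >= 24` criterion comes from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (name TEXT, age INTEGER)")
# Twelve made-up people aged 18 through 29.
conn.executemany(
    "INSERT INTO Person VALUES (?, ?)",
    [(f"person{i}", age) for i, age in enumerate(range(18, 30))],
)

# Strategy 1: pull every row from the source, then filter at the federator.
all_rows = conn.execute("SELECT name, age FROM Person").fetchall()
filtered_late = [row for row in all_rows if row[1] >= 24]

# Strategy 2: push the criterion down so the source returns only matching rows.
pushed_rows = conn.execute(
    "SELECT name, age FROM Person WHERE age >= 24").fetchall()

print("rows transferred without pushdown:", len(all_rows))
print("rows transferred with pushdown:   ", len(pushed_rows))
```

Both strategies produce the same answer; the difference is how many rows leave the source, which is the data volume concern the slide raises.
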
  54. Performance concerns: federated joins
  55. Data source capabilities
      • SQL support
      • Function support
      • Function translation
      • Inverse functions
      • Data type mappings and translations
  56. Data source statistics
      • Table cardinality
      • Column selectivity
      • Column null density
      • Join selectivity
  57. Semantic web, Relational data, Federation, HREIW, Analytics
  58. HREIW - HR Analysis
  59. “Which Marines that speak French and/or French Creole have had at least six months since their last deployment?”
  60. “How many discharges were the result of the Don’t Ask Don’t Tell policy per year?”
  61. “What is the average length of service for soldiers deployed in Afghanistan vs Iraq?”
  62. Where’s the data?
  63. [Layer diagram: Ontologies (HR Standards, HR Domain), Mapping, Sources.]
  64. Technologies. [Layer diagram: the layers HR Standards, HR Domain, Mapping, and Sources, annotated with Analytics, Ontology development, SPARQL Federation, Rules, and SPARQL to database.]
  65. Collaborative Ontologies. [Diagram: an ontologist and subject matter experts work on a domain ontology; labels: model, wiki, diagram, discuss.]
  66. Ontology Visualization
  67. Semantic web, Relational data, Federation, HREIW, Analytics
  68. RIF
      • Rule Interchange Format, W3C recommendation
      • Rule = IF - THEN statement
      • Used to derive new triples from existing triples
      • Dialects
        • Core
        • Framework for Logic Dialects (FLD)
        • Basic Logic Dialect (BLD)
        • Production Rules Dialect (PRD)
      • Rex - Revelytix RIF Core implementation
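
The idea of deriving new triples from existing ones with IF-THEN rules can be sketched without any RIF syntax. In the Python below, the rule, the class `ex:StringPlayer`, and the function names are all invented for illustration; this is neither RIF nor how Rex works.

```python
def string_player_rule(graph):
    """IF ?m music:plays ?i AND ?i music:instType "String"
    THEN ?m rdf:type ex:StringPlayer (a made-up class)."""
    string_instruments = {s for s, p, o in graph
                          if p == "music:instType" and o == "String"}
    return {(s, "rdf:type", "ex:StringPlayer")
            for s, p, o in graph
            if p == "music:plays" and o in string_instruments}


def forward_chain(graph, rules):
    """Apply every rule repeatedly until no new triples are derived."""
    graph = set(graph)
    while True:
        derived = set().union(*(rule(graph) for rule in rules)) - graph
        if not derived:
            return graph
        graph |= derived


graph = {
    ("artist:1", "music:plays", "instrument:10"),
    ("artist:3", "music:plays", "instrument:30"),
    ("instrument:10", "music:instType", "String"),
    ("instrument:30", "music:instType", "Woodwind"),
}

closed = forward_chain(graph, [string_player_rule])
print(sorted(closed - graph))
```
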
  69. Dashboards
  70. Dashboards
  71. Dashboards
  72. Enterprise Semantic Web
      • Knoodl - collaborative ontology creation
      • OntVis - ontology visualization (OWL)
      • Spyder - SPARQL to SQL (RDF, R2RML)
      • Federator - SPARQL federation (SPARQL 1.1, SPARQL Federation extensions)
      • Rex - entailment with rules (RIF)
      • Dashboards - analytics, visualization
  73. More information
      • Revelytix - http://revelytix.com
      • Knoodl - http://knoodl.com
      • OntVis - http://bit.ly/hLm3sd
      • Spyder - http://revelytix.com/content/spyder
      • Federator - beta coming soon...
      • Rex - beta coming soon...
