Virtuoso Relational To RDF Mapping

Published in: Technology

  1. Mapping Relational Databases to RDF with OpenLink Virtuoso
     Orri Erling - Lead Developer, Virtuoso Team
     © 2008 OpenLink Software, All rights reserved.
  2. Who Wants to Map?
     • Semantic Web Scalers
         • Expose whatever there is as RDF; the next guy will unify terms and make search and apps
     • Data Warehouse Keepers
         • Data is spread out and has implicit semantics, complex schemas, heterogeneous sources, and ambiguous terms, but we must make it join and aggregate cleanly
  3. Present State
     • SPARQL-to-SQL mapping exists, but complex integrations are still built as data warehouses
     • We'd really like to map, but...
     • Can it be otherwise?
  4. Why RDF Data Warehouse?
     • Pros
         • Even query performance across all data
         • Possibility of forward-chaining inference
         • Some SPARQL features may be better supported, e.g. unspecified predicates
     • Cons
         • Keeping data up-to-date
         • Complex setup, needs dedicated servers: you don't build them on a whim
  5. Why Map?
     • No copying, no timeliness issues
     • RDBMS outperforms RDF for analytics workloads
     • Agile reconfiguration without reloading data
  6. Virtuoso
     • Mapping of SPARQL to SQL against any existing schema, whether stored in Virtuoso or elsewhere
     • Physical quad store
     • Federated/local RDBMS
  7. For Mapping to Deliver...
     • Tackle any SQL analytics workload in SPARQL without extra cost
     • Deal with arbitrary SQL schemas
     • Produce single SQL statements, optimizable by the target RDBMS
     • Have intelligence for cases where one RDF entity can come from many relational sources
  8. The Cases of Integration
     • Bring similar but heterogeneous schemas into a unified ontology - Union View
     • Translate FKs of one schema to PKs in another - Distributed Join
     • Hide differences in normalization - Views for hiding joins
     • Unit/Terminology conversions
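The first case above, a union view over similar but heterogeneous schemas, can be sketched in plain SQL. This is only a minimal illustration run through Python's built-in sqlite3 module, not Virtuoso; the tables blog_post and wiki_page, their columns, and the view name unified_post are hypothetical and do not come from the slides.

```python
import sqlite3

# Two hypothetical source tables with similar content but different column names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blog_post (bp_id INTEGER PRIMARY KEY, bp_title TEXT, bp_author TEXT);
    CREATE TABLE wiki_page (wp_id INTEGER PRIMARY KEY, wp_name TEXT, wp_editor TEXT);
    INSERT INTO blog_post VALUES (1, 'Hello RDF', 'alice');
    INSERT INTO wiki_page VALUES (1, 'MappingNotes', 'bob');

    -- The union view renames columns into one shared vocabulary and
    -- records the source so rows from different tables stay distinguishable.
    CREATE VIEW unified_post AS
        SELECT 'blog' AS source, bp_id AS id, bp_title AS title, bp_author AS creator
          FROM blog_post
        UNION ALL
        SELECT 'wiki' AS source, wp_id AS id, wp_name AS title, wp_editor AS creator
          FROM wiki_page;
""")

rows = conn.execute(
    "SELECT source, id, title, creator FROM unified_post ORDER BY source"
).fetchall()
for row in rows:
    print(row)
```

Keeping the source column in the view is a design choice for this sketch: it is what lets a later step tell which branch of the union a row came from.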
  9. Defining a Mapping
     Use SPARQL/SQL to:
     • Define URI formats and their subclass relations
     • Define which key-column-value combinations make a triple
     • Arbitrary SQL is allowed for mapping values and filtering
     • A single RDF node can be a composite of many columns, e.g. a multipart key
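The idea of a URI format plus key-column-value combinations can be sketched as follows. This is not Virtuoso's mapping syntax, only a small Python illustration of the concept; the example.org URIs, the row_to_triples helper, and the sample row are assumptions made for this sketch.

```python
# Hypothetical URI format for a row with a two-part key: one RDF node is
# composed from both key columns.
LINEITEM_IRI = "http://example.org/tpch/lineitem/{l_orderkey}/{l_linenumber}#this"


def row_to_triples(row, subject_format, property_map):
    """Turn one relational row into (subject, predicate, object) triples.

    subject_format builds the subject URI from the key columns;
    property_map says which column becomes which predicate.
    """
    subject = subject_format.format(**row)
    return [
        (subject, predicate, row[column])
        for column, predicate in property_map.items()
    ]


lineitem_row = {"l_orderkey": 7, "l_linenumber": 2, "l_quantity": 30}
triples = row_to_triples(
    lineitem_row,
    LINEITEM_IRI,
    {"l_quantity": "http://example.org/schema#quantity"},
)
print(triples)
```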
  10. The TPC-H Case
      • The 22 queries as extended SPARQL
      • Each generates a single SQL statement, executable by Virtuoso, Oracle, and others
      • Next, make several TPC-H databases on different servers and run the queries against the union
      http://demo.openlinksw.com/tpc-h/
  11. Where Problems Begin
      • In OpenLink Data Spaces, 6 collaborative apps all mapped to SIOC:
          select * from <ods> where {?s ?p ?o . ?s has_comment ?c . ?c has_author <xxx> }
      • Trivially becomes a union of everything, 1000+ lines of SQL
      • Intelligently (once per app) becomes a union of:
          select post.* from post, comment, user where c_post = p_id and c_author = u_id and u_name = f ('xxx')
  12. What One Must Know
      • Mapping for integration is not trivial
      • Be careful when mapping multiple tables/columns to one class/property
      • Make URI schemes that encode type and source, so that senseless joins are not attempted when the query does not specify types
      • Understand what the mapping logic can and cannot optimize
      • Understand what SQL can and cannot optimize
      • View the resulting SQL as a sanity check
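One way to read the advice about URI schemes that encode type and source: if each source's URIs follow a distinct pattern, a constant URI in a query rules out every source whose pattern cannot match, so those branches never enter the generated union. The sketch below illustrates that idea only; the URI patterns, the BRANCHES table, and candidate_branches are hypothetical and are not Virtuoso's actual mechanism.

```python
import re

# Hypothetical per-application URI patterns that encode source and type.
BRANCHES = {
    "weblog": re.compile(r"^http://example\.org/ods/weblog/post/(\d+)$"),
    "wiki": re.compile(r"^http://example\.org/ods/wiki/page/(\d+)$"),
    "bookmark": re.compile(r"^http://example\.org/ods/bookmark/item/(\d+)$"),
}


def candidate_branches(subject_iri=None):
    """Return the sources that could possibly produce the given subject.

    With no constant subject, every source must be queried (the
    "union of everything" case). With a constant, only sources whose
    URI pattern matches remain.
    """
    if subject_iri is None:
        return sorted(BRANCHES)
    return sorted(
        name for name, pattern in BRANCHES.items() if pattern.match(subject_iri)
    )


print(candidate_branches())
print(candidate_branches("http://example.org/ods/wiki/page/42"))
```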
  13. SQL Extensions
      • Mapping must work against any RDBMS/schema, as is
      • But there is Virtuoso SQL between the mapping and the target RDBMS(s)
      • Location- and latency-conscious distributed cost model
      • Breakup for making a wide result set into a row per property
      • Inverse functions
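The last two items can be illustrated conceptually. The sketch assumes that an inverse function recovers the key from a generated URI, so a condition on the URI can be rewritten as a condition on the underlying column, and that breakup turns one wide row into one row per property. The function names, the example.org prefix, and the sample row are hypothetical; this is plain Python, not Virtuoso SQL.

```python
PREFIX = "http://example.org/user/"


def user_iri(u_id):
    """Forward function: build a URI from the key column."""
    return f"{PREFIX}{u_id}"


def user_iri_inverse(iri):
    """Inverse function: recover the key, or None if the URI cannot match.

    With an inverse available, a filter such as user_iri(u_id) = <const>
    can be rewritten as u_id = user_iri_inverse(<const>), which a
    database can answer with an index lookup on u_id.
    """
    if not iri.startswith(PREFIX):
        return None
    tail = iri[len(PREFIX):]
    return int(tail) if tail.isascii() and tail.isdigit() else None


def breakup(row, key_column):
    """Turn one wide row into one (key, property, value) row per column."""
    key = row[key_column]
    return [
        (key, column, value)
        for column, value in row.items()
        if column != key_column
    ]


wide_row = {"u_id": 17, "u_name": "alice", "u_email": "a@example.org"}
print(user_iri_inverse(user_iri(17)))
print(breakup(wide_row, "u_id"))
```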
  14. Use Cases
      • OpenLink Data Spaces - Blog, Wiki, News, Social Network, Feed Aggregation, Tag Clouds, Bookmarks, etc.
      • OpenLink's own MIS - “total information awareness”: URI for any CRM Object, Account, Product, Support Case, Email, etc.
      • MusicBrainz
      • phpBB, Drupal, MediaWiki, WordPress, Bugzilla, and others
  15. OpenLink Software
      Thank You!
      http://virtuoso.openlinksw.com
