ESWC2008 Relational2RDF - Mapping Relational Databases to RDF with OpenLink Virtuoso
Presentation Transcript

  • Mapping Relational Databases to RDF with OpenLink Virtuoso
    • Orri Erling - Lead Developer, Virtuoso Team
    • © 2008 OpenLink Software, All rights reserved.
  • Who Wants to Map?
    • Semantic Web Scalers
    • Expose whatever there is as RDF; the next guy will unify terms and build search and apps
    • Data Warehouse Keepers
    • Data is spread out, with implicit semantics, complex schemas, heterogeneous sources and ambiguous terms, but we must make it join and aggregate cleanly
  • Present State
    • SPARQL-to-SQL mapping exists, but complex integrations are still built as data warehouses
    • We'd really like to map, but...
    • Can it be otherwise?
  • Why RDF Data Warehouse?
    • Pros
      • Even query performance across all data
      • Possibility of forward-chaining inference
      • Some SPARQL features may be better supported, e.g. unspecified predicates
    • Cons
      • Keeping data up-to-date
      • Complex setup, needs dedicated servers: you don't build one on a whim
  • Why Map?
    • No copying, no timeliness issues
    • RDBMS outperforms RDF for analytics workloads
    • Agile reconfiguration without reloading data
  • Virtuoso
    • Mapping of SPARQL to SQL against any existing schema - whether stored in Virtuoso or elsewhere
    • Physical Quad-store
    • Federated/local RDBMS
  • For Mapping to Deliver...
    • Tackle any SQL analytics workload in SPARQL without extra cost
    • Deal with arbitrary SQL schema
    • Produce single SQL statements, for target RDBMS to optimize
    • Have intelligence for cases where one RDF entity can come from many relational sources
  • The Cases of Integration
    • Bring similar but heterogeneous schemas into a unified ontology - Union View
    • Translate FKs of one schema to PKs in another - Distributed Join
    • Hide differences in normalization - Views for hiding joins
    • Unit/Terminology conversions
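As an illustration of the Union View and unit-conversion cases, here is a minimal sketch using Python's built-in `sqlite3` module rather than Virtuoso. The two source tables, their columns, and the `product` view are hypothetical names made up for this example; the slides do not specify any schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two hypothetical source schemas describing the same kind of entity.
    CREATE TABLE shop_a_product (id INTEGER PRIMARY KEY, title TEXT, weight_kg REAL);
    CREATE TABLE shop_b_item (sku TEXT PRIMARY KEY, label TEXT, weight_g REAL);

    INSERT INTO shop_a_product VALUES (1, 'Anvil', 20.0);
    INSERT INTO shop_b_item VALUES ('X9', 'Hammer', 750.0);

    -- Union view: one unified shape, a source tag, and grams converted to kg.
    CREATE VIEW product AS
        SELECT 'a' AS source, CAST(id AS TEXT) AS local_key,
               title AS name, weight_kg AS weight_kg
        FROM shop_a_product
        UNION ALL
        SELECT 'b', sku, label, weight_g / 1000.0
        FROM shop_b_item;
""")

rows = conn.execute(
    "SELECT source, local_key, name, weight_kg FROM product ORDER BY source"
).fetchall()
for row in rows:
    print(row)
```

Keeping the `source` column in the view is a design choice for this sketch: it lets later steps tell which schema a row came from.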
  • Defining a Mapping
    • Use SPARQL/SQL to:
      • Define URI formats and their subclass relations
      • Define which key-column-value combinations make a triple
      • Arbitrary SQL is allowed for mapping values and filtering
      • A single RDF node can be a composite of many columns, e.g. multipart key
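The idea of turning key-column-value combinations into triples can be sketched in a few lines of Python. This is not Virtuoso's mapping language: the `lineitem` table, the URI format, and the predicate URIs are all assumptions invented for illustration. The subject URI is built from a two-part key, and a plain SQL `WHERE` clause does the filtering.

```python
import sqlite3

# Hypothetical URI format for a node built from a two-part key.
LINEITEM_URI = "http://example.com/lineitem/{order_id}/{line_no}"
SCHEMA = "http://example.com/schema#"  # hypothetical predicate namespace


def row_to_triples(row):
    """Emit one triple per mapped column: (key-derived subject, predicate, value)."""
    subject = LINEITEM_URI.format(order_id=row["order_id"], line_no=row["line_no"])
    yield (subject, SCHEMA + "quantity", row["quantity"])
    yield (subject, SCHEMA + "price", row["price"])


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows access to columns by name
conn.executescript("""
    CREATE TABLE lineitem (
        order_id INTEGER, line_no INTEGER, quantity INTEGER, price REAL,
        PRIMARY KEY (order_id, line_no)
    );
    INSERT INTO lineitem VALUES (7, 1, 3, 9.5), (7, 2, 1, 20.0), (8, 1, 0, 5.0);
""")

# Ordinary SQL filters which rows are exposed as RDF.
query = "SELECT * FROM lineitem WHERE quantity > 0 ORDER BY order_id, line_no"
triples = [t for row in conn.execute(query) for t in row_to_triples(row)]
for t in triples:
    print(t)
```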
  • The TPC-H Case
    • The 22 queries as extended SPARQL
    • Each generates a single SQL statement, executable by Virtuoso, Oracle, and others
    • Next make several TPC-H databases on different servers and run the queries against the union
    • Demo: http://demo.openlinksw.com/tpc-h/
  • Where Problems Begin
    • In OpenLink Data Spaces, 6 collaborative apps all mapped to SIOC:
      • select * from <ods> where {?s ?p ?o . ?s has_comment ?c . ?c has_author <xxx> }
    • Trivially becomes a union of everything, 1000+ lines of SQL
    • Intelligently becomes a union, once per app, of:
      • select post.* from post, comment, user where c_post = p_id and c_author = u_id and u_name = f ('xxx')
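The "once per app" shape can be sketched as follows, again with `sqlite3` standing in for the real back end. The app names, table layouts, and user URI format are hypothetical, and `user_name_from_uri` plays the role of `f` in the slide's SQL: it turns the URI from the query back into a column value, so each app contributes one three-way join and nothing more.

```python
import sqlite3

APPS = ["blog", "wiki"]  # hypothetical app names; the slide mentions six
USER_URI_PREFIX = "http://example.com/user/"  # hypothetical URI format


def user_name_from_uri(uri):
    """Inverse of the user URI format: recover u_name from the URI."""
    if not uri.startswith(USER_URI_PREFIX):
        raise ValueError(f"not a user URI: {uri}")
    return uri[len(USER_URI_PREFIX):]


def per_app_sql(app):
    """One three-way join per app, shaped like the SQL on the slide."""
    return (
        f"SELECT '{app}' AS app, p.p_id, p.p_title "
        f"FROM {app}_post p, {app}_comment c, app_user u "
        "WHERE c.c_post = p.p_id AND c.c_author = u.u_id AND u.u_name = ?"
    )


def posts_commented_by(conn, author_uri):
    """Run a single statement: the union of the per-app joins."""
    sql = " UNION ALL ".join(per_app_sql(app) for app in APPS)
    params = [user_name_from_uri(author_uri)] * len(APPS)
    return sorted(conn.execute(sql, params).fetchall())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE app_user (u_id INTEGER PRIMARY KEY, u_name TEXT)")
conn.executemany("INSERT INTO app_user VALUES (?, ?)", [(100, "alice"), (101, "bob")])
for app in APPS:
    conn.execute(f"CREATE TABLE {app}_post (p_id INTEGER PRIMARY KEY, p_title TEXT)")
    conn.execute(
        f"CREATE TABLE {app}_comment "
        "(c_id INTEGER PRIMARY KEY, c_post INTEGER, c_author INTEGER)"
    )
conn.executemany("INSERT INTO blog_post VALUES (?, ?)", [(1, "Hello"), (2, "Other")])
conn.executemany("INSERT INTO blog_comment VALUES (?, ?, ?)", [(10, 1, 100), (11, 2, 101)])
conn.executemany("INSERT INTO wiki_post VALUES (?, ?)", [(1, "Page")])
conn.executemany("INSERT INTO wiki_comment VALUES (?, ?, ?)", [(20, 1, 100)])

result = posts_commented_by(conn, "http://example.com/user/alice")
print(result)
```

The generated statement grows linearly with the number of apps, which is the contrast the slide draws with a union over every table and column.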
  • What One Must Know
    • Mapping for integration is not trivial
    • Be careful when mapping multiple tables/columns to one class/property
    • Make URI schemes which encode type and source, so that senseless joins are not attempted if types are not specified in the query
    • Understand what the mapping logic can and cannot optimize
    • Understand what SQL can and cannot optimize
    • View resulting SQL for sanity check
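To see why a URI scheme that encodes type and source helps, consider this small Python sketch. The prefixes and table names are hypothetical, and the lookup is a simplification of whatever a real mapping engine does: a URI whose prefix names the app and the entity type can only match one table, so the other tables never need to appear in the generated SQL.

```python
# Hypothetical mappings: URI prefix -> (table, key column).
MAPPINGS = {
    "http://example.com/blog/post/": ("blog_post", "p_id"),
    "http://example.com/blog/comment/": ("blog_comment", "c_id"),
    "http://example.com/wiki/post/": ("wiki_post", "p_id"),
}


def candidate_tables(uri):
    """Return (table, key column, key value) for every mapping the URI can match."""
    return [
        (table, key_col, uri[len(prefix):])
        for prefix, (table, key_col) in MAPPINGS.items()
        if uri.startswith(prefix)
    ]


print(candidate_tables("http://example.com/wiki/post/42"))
print(candidate_tables("http://example.com/unknown/42"))
```

With an opaque scheme such as `http://example.com/id/42`, no prefix could rule anything out, and every table would have to be considered.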
  • SQL Extensions
    • Mapping must work against any RDBMS/Schema, as is
    • But there is Virtuoso SQL between the mapping and target RDBMS(s)
    • Location- and latency-conscious distributed cost model
    • Breakup for making a wide result set into a row per property
    • Inverse functions
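The result shape that a breakup produces, one row per property instead of one wide row, can be emulated in plain SQL. The sketch below uses `sqlite3` and a `UNION ALL`; it is not Virtuoso's breakup syntax, only an approximation of its output, and the `customer` table and URI prefix are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (c_id INTEGER PRIMARY KEY, c_name TEXT, c_phone TEXT);
    INSERT INTO customer VALUES (1, 'Acme', '555-0100');
""")

# One wide row becomes one (subject, property, value) row per mapped column.
BREAKUP_SQL = """
    SELECT 'http://example.com/customer/' || c_id AS s, 'name' AS p, c_name AS o
    FROM customer
    UNION ALL
    SELECT 'http://example.com/customer/' || c_id, 'phone', c_phone
    FROM customer
"""

rows = sorted(conn.execute(BREAKUP_SQL).fetchall())
for row in rows:
    print(row)
```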
  • Use Cases
    • OpenLink Data Spaces - Blog, Wiki, News, Social Network, CRM, Threaded Discussions etc.
    • OpenLink's own MIS - “total information awareness”: URI for any CRM Object, Account, Product, Support Case, Email, etc.
    • MusicBrainz
    • phpBB, Drupal, MediaWiki, Bugzilla etc.
  • Thank You!
    • OpenLink Software
    • http://virtuoso.openlinksw.com
    • http://demo.openlinksw.com/tpc-h