Open Conceptual Data Models


Published in: Technology, Education

  1. 1. Open Conceptual Data Models <ul><ul><li>Making the Conceptual Layer Real </li></ul></ul><ul><ul><li>via </li></ul></ul><ul><ul><li>RDF Linked Data </li></ul></ul>
  2. 2. Conceptual Data Models in the Linked Data Web <ul><li>Linked Data Vision: </li></ul><ul><li>The transition of the Web </li></ul><ul><ul><li>from a Web of linked documents </li></ul></ul><ul><ul><li>to a Web of interlinked structured data items (aka: entities, data objects, resources) </li></ul></ul><ul><li>Concurrent trend in the IT industry: </li></ul><ul><li>A recognition of the benefits of conceptual data models vs logical data models </li></ul><ul><li>The Big Question: </li></ul><ul><li>To what extent does Linked Data support conceptual-level data models? </li></ul>
  3. 3. Open Conceptual Data Models <ul><li>Topics: </li></ul><ul><li>Conceptual & Logical Data Models </li></ul><ul><li>Conceptual Models for the Semantic Web </li></ul><ul><li>Realizing Conceptual Models through Ontologies & Linked Data </li></ul><ul><li>Virtuoso RDF Views </li></ul><ul><li>ADO.NET Data Services & the Entity Data Model </li></ul>
  4. 4. Conceptual & Logical Data Models <ul><li>Describe a software system’s target problem space </li></ul><ul><li>Typically, in today’s database-driven applications </li></ul><ul><li>Three levels of data model </li></ul><ul><li>Physical </li></ul><ul><ul><li>How data is physically represented on disk </li></ul></ul><ul><li>Logical (aka logical schema) </li></ul><ul><ul><li>Expresses problem domain in terms of data management technology (tables / columns) </li></ul></ul><ul><ul><li>e.g. relational schema </li></ul></ul><ul><li>Conceptual (aka conceptual schema) </li></ul><ul><ul><li>Purely semantic description of problem space </li></ul></ul><ul><ul><li>Describes things (entities), their characteristics (attributes) & associations between things (relationships) </li></ul></ul>
  5. 5. Logical Data Model <ul><li>Most prominent of the three data model types </li></ul><ul><li>Main focus of database applications </li></ul><ul><ul><li>Due to pervasiveness of SQL in application code </li></ul></ul><ul><li>Weaknesses </li></ul><ul><li>Impedance mismatch </li></ul><ul><li>Loss of semantics during development process </li></ul><ul><li>Heterogeneous databases & interoperability </li></ul>
  6. 6. Logical Data Model Weaknesses <ul><li>Impedance Mismatch </li></ul><ul><li>SQL expresses queries in terms of tables / views </li></ul><ul><ul><li>=> targets logical schema </li></ul></ul><ul><li>Normalization fragments the data model </li></ul><ul><ul><li>Entities & their attributes may be split across several tables </li></ul></ul><ul><ul><li>Navigation between objects requires relational joins over two or more tables </li></ul></ul><ul><ul><li>Table rows must be reconstituted into higher level conceptual entities </li></ul></ul><ul><li>Conceptual level data model is desirable to: </li></ul><ul><li>Remove impedance mismatch </li></ul><ul><li>Isolate application from changes to logical data model </li></ul><ul><li>Provide framework for human level interaction </li></ul>
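The impedance mismatch described on this slide can be sketched with Python's built-in `sqlite3` module. The three-table schema and the sample rows below are assumptions made for illustration; they are not taken from the slides. The point is that one conceptual "album" entity has to be reassembled from rows spread over several normalized tables.

```python
import sqlite3

# Hypothetical normalized schema: one conceptual "album" is split across
# three tables and must be rebuilt with joins.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artist (artist_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE album  (album_id INTEGER PRIMARY KEY, title TEXT,
                         artist_id INTEGER REFERENCES artist(artist_id));
    CREATE TABLE track  (track_id INTEGER PRIMARY KEY, title TEXT,
                         album_id INTEGER REFERENCES album(album_id));
    INSERT INTO artist VALUES (1, 'John Lennon');
    INSERT INTO album  VALUES (10, 'Imagine', 1);
    INSERT INTO track  VALUES (100, 'Imagine', 10);
    INSERT INTO track  VALUES (101, 'Jealous Guy', 10);
""")


def load_album(conn, album_id):
    """Rebuild one conceptual album entity from rows in three tables."""
    rows = conn.execute(
        """
        SELECT artist.name, album.title, track.title
        FROM album
        JOIN artist ON artist.artist_id = album.artist_id
        JOIN track  ON track.album_id   = album.album_id
        WHERE album.album_id = ?
        ORDER BY track.track_id
        """,
        (album_id,),
    ).fetchall()
    if not rows:
        return None
    # Every joined row repeats the artist and album columns; only the
    # track title differs, so the rows are folded back into one entity.
    return {
        "artist": rows[0][0],
        "title": rows[0][1],
        "tracks": [row[2] for row in rows],
    }


album = load_album(conn, 10)
```

The application code has to know the table and column names and the join paths, which is the coupling to the logical schema that a conceptual model is meant to remove.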
  7. 7. Logical Data Model Weaknesses <ul><li>Loss of Semantics During Development </li></ul><ul><li>Process: </li></ul><ul><li>Develop conceptual model (E-R modelling) </li></ul><ul><li>Transform to logical model for implementation </li></ul><ul><li>Derive physical model from logical model </li></ul><ul><li>Problems: </li></ul><ul><li>Each move to a lower level model discards meaning </li></ul><ul><li>Higher level model typically not retained </li></ul><ul><li>Model semantics fragmented across schema / business rules / application code </li></ul><ul><li>Application must know logical data model </li></ul><ul><ul><li>Must be hardcoded or inferred (imperfectly) from system tables </li></ul></ul>
  8. 8. Logical Data Model Weaknesses <ul><li>Heterogeneous Databases & Interoperability </li></ul><ul><li>Logical data model </li></ul><ul><li>Describes problem domain in terms of tables/columns </li></ul><ul><li>Requires SQL to navigate model </li></ul><ul><li>Application </li></ul><ul><li>Exposed to specifics of a particular vendor’s RDBMS </li></ul><ul><li>In a heterogeneous database environment, must handle </li></ul><ul><li>Different SQL dialects </li></ul><ul><li>Different schemas </li></ul><ul><ul><li>No explicit data model. No explicit semantics. </li></ul></ul><ul><li>Interoperability/integration = perpetual problem for IT depts </li></ul>
  9. 9. Conceptual Models for the Semantic Web <ul><li>Growing recognition in the industry of the benefits of a conceptual, rather than logical, model for data-centric applications </li></ul><ul><ul><li>e.g. Microsoft’s Entity Data Model / Entity Framework </li></ul></ul><ul><li>Semantic Web technologies provide powerful tools for this paradigm shift </li></ul>
  10. 10. Benefits of Conceptual Models <ul><li>How the Semantic Web benefits </li></ul><ul><li>More faithfully represents human view of domain of interest </li></ul><ul><li>Conceptual model & semantics </li></ul><ul><ul><li>Explicit & available globally </li></ul></ul><ul><ul><li>Not implicit & fragmented across business logic / UI etc </li></ul></ul><ul><li>Better / explicit semantics promises better search engines </li></ul><ul><li>Much easier heterogeneous data integration </li></ul><ul><ul><li>Data on the Web is inherently heterogeneous </li></ul></ul>
  11. 11. Application Areas – Present & Future <ul><li>Social networking, e-commerce, collaborative working </li></ul><ul><ul><li>Require shareable, standards-based, cross-platform conceptual views of data </li></ul></ul><ul><li>Data portability </li></ul><ul><ul><li>Needed as Web users maintain multiple points of presence – blogs, social network accounts etc. </li></ul></ul><ul><li>Open business models </li></ul><ul><ul><li>Require exchange & integration of large amounts of data </li></ul></ul><ul><li>Scientific research – sharing of knowledge & findings </li></ul><ul><ul><li>Requires transparent access to distributed heterogeneous data </li></ul></ul><ul><ul><li>Requires database integration using global schema </li></ul></ul><ul><li>Autonomous intelligent agents </li></ul><ul><ul><li>Free humans from large-volume information processing </li></ul></ul>
  12. 12. Semantic Web Technology Benefits <ul><li>What Semantic Web technologies bring: </li></ul><ul><li>Ontologies </li></ul><ul><li>Can represent common semantics </li></ul><ul><ul><li>Spanning databases, applications, enterprises, on-line communities </li></ul></ul><ul><li>Act as a shared conceptual model </li></ul><ul><li>Provide common models (FOAF, SIOC etc) </li></ul><ul><li>Common Semantics (Ontologies) & Common Data Representation (RDF) </li></ul><ul><li>Enable cross data source querying using SPARQL </li></ul><ul><ul><li>Content from several sites can be combined / explored </li></ul></ul><ul><ul><li>Querying using proprietary APIs unnecessary </li></ul></ul><ul><ul><li>Brute force data merging unnecessary </li></ul></ul><ul><li>Open Data Formats, Platform Independence, Common Models </li></ul><ul><li>Allow data portability and data integration </li></ul>
  13. 13. Realizing Conceptual Models <ul><li>Ontologies </li></ul><ul><li>Provide the building blocks of Semantic Web conceptual models </li></ul><ul><li>Define the concepts and their relationships in a domain of interest </li></ul><ul><li>Describing Classes & Properties – Ontology Languages </li></ul><ul><li>RDFS </li></ul><ul><ul><li>Introduces the notions of concepts (classes) & instances </li></ul></ul><ul><li>OWL </li></ul><ul><ul><li>Adds more vocabulary for describing: </li></ul></ul><ul><ul><ul><li>relations between classes </li></ul></ul></ul><ul><ul><ul><li>cardinality </li></ul></ul></ul><ul><ul><ul><li>richer typing of properties, etc. </li></ul></ul></ul>
  14. 14. Goodness of Fit <ul><li>RDF was designed from the ground up as a metadata data model </li></ul><ul><li>RDF / RDFS / OWL work directly at the level of conceptual models </li></ul><ul><li>Conceptual model terminology matches RDF/OWL terminology </li></ul><ul><ul><li>Concepts, entities, attributes, relationships </li></ul></ul><ul><li>A natural fit! </li></ul><ul><li>RDF lends itself naturally to describing conceptual models </li></ul>
  15. 15. Semantic Expressivity <ul><li>DDL-based Relational Model </li></ul><ul><li>Relationship between two entities isn’t explicit </li></ul><ul><li>Foreign key relating two rows in separate tables doesn’t express the nature of the relationship </li></ul><ul><li>Semantics must often be inferred from table definitions </li></ul><ul><li>RDF-based Conceptual Model </li></ul><ul><li>Relationship between two entities is stated explicitly by predicate in subject-predicate-object triple </li></ul><ul><li>Semantic expressivity of RDF/RDFS/OWL is much better than DDL </li></ul><ul><li>Has richer semantic content than equivalent DDL-based logical/relational model </li></ul>
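A minimal sketch of the contrast drawn on this slide, in plain Python. The `http://example.org/music/` namespace, the `recorded` predicate and the row contents are hypothetical names chosen for illustration.

```python
# Hypothetical namespace and identifiers, used only for illustration.
EX = "http://example.org/music/"

# Relational view: artist_id is a foreign key. It says that this album row
# points at an artist row, but not what the relationship means.
album_row = {"album_id": 7, "title": "Imagine", "artist_id": 1}

# RDF view: each statement is a (subject, predicate, object) triple, and
# the predicate names the relationship explicitly.
triples = [
    (EX + "artist/1", EX + "recorded", EX + "album/7"),
    (EX + "album/7", EX + "title", "Imagine"),
]


def relationships(graph, subject):
    """Return the (predicate, object) pairs stated about a subject."""
    return [(p, o) for s, p, o in graph if s == subject]


artist_links = relationships(triples, EX + "artist/1")
```

Reading `artist_links` gives both the target and the nature of the link, whereas the `artist_id` column only gives the target.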
  16. 16. RDF Conceptual Model – Artist / Records / Tracks
  17. 17. Global Granular Information Sharing <ul><li>Traditional Logical/Relational Data Model </li></ul><ul><li>Schema described by DDL is internal to DBMS </li></ul><ul><li>Primary keys identifying an individual table row (i.e. entity instance) not globally unique, not easily usable outside host DBMS </li></ul><ul><li>Gives rise to ‘data silos’ </li></ul><ul><li>RDF’s use of HTTP-based URLs </li></ul><ul><li>Externalises the data and schema </li></ul><ul><li>Makes both globally accessible & scalable </li></ul><ul><li>Provides globally unique IDs for entities/relations/classes </li></ul><ul><li>A vehicle for granular, global information sharing down to the equivalent of the record level </li></ul>
  18. 18. Linked Data – What is It? <ul><li>A method for exposing, sharing & connecting data on the Web </li></ul><ul><li>A term coined by Tim Berners-Lee that describes HTTP-based Data Access by Reference for the Web </li></ul><ul><li>Open Data Access & Connectivity mechanism for the Web </li></ul><ul><li>A richer linking mechanism for the Web that takes us from Hypertext Links (Document to Document) to Hyperdata Links (across things that documents are about) </li></ul>
  19. 19. Linked Data – Why Is It Important? <ul><li>It exposes the compound nature of Web Resources </li></ul><ul><ul><li>Information resources (Containers) are uniquely identified & referenceable </li></ul></ul><ul><ul><li>Entities within Containers are uniquely identified & referenceable </li></ul></ul><ul><li>It provides an Open Data Access & Connectivity mechanism for the Web </li></ul><ul><li>It delivers a powerful mechanism for meshing disparate and heterogeneous data sources </li></ul>
  20. 20. Linked Data Model: Changes the focus from linked documents to linked entities. The document as a data container becomes less relevant.
  21. 21. Hyperdata Links Between Data Objects
  22. 22. Linked Data Benefits – Natural Navigation <ul><li>Natural Navigation Through Typed Links </li></ul><ul><li>RDF entities are identified by dereferenceable URIs (URLs) </li></ul><ul><li>Navigating from one data item to another is easy </li></ul><ul><ul><li>One click to dereference in Semantic Web Browser </li></ul></ul><ul><ul><li>e.g. OpenLink Data Explorer </li></ul></ul><ul><li>URI of object in an RDF statement is a typed link </li></ul><ul><ul><li>Link’s “type” is defined by the statement predicate </li></ul></ul><ul><li>Relational/Logical Model </li></ul><ul><li>Cumbersome </li></ul><ul><li>Requires SQL joins + typically Object-Relational mapping </li></ul><ul><li>e.g. in C# : track = lennonAlbum.Tracks["Imagine"] </li></ul>
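Navigation over typed links can be sketched as a walk over triples, where each step follows one named predicate. This is an in-memory illustration only: the URIs and the `made` and `hasTrack` predicates are hypothetical, and a real client would dereference each URI over HTTP rather than look it up in a list.

```python
# Hypothetical URIs and predicates, used only for illustration.
EX = "http://example.org/music/"
MADE = EX + "made"
HAS_TRACK = EX + "hasTrack"

graph = [
    (EX + "artist/lennon", MADE, EX + "album/imagine"),
    (EX + "album/imagine", HAS_TRACK, EX + "track/imagine"),
    (EX + "album/imagine", HAS_TRACK, EX + "track/jealous-guy"),
]


def follow(graph, start, *predicates):
    """Follow a chain of typed links from start; return the nodes reached."""
    nodes = {start}
    for predicate in predicates:
        # Keep only the objects of statements whose subject is a current
        # node and whose predicate (the link type) matches this step.
        nodes = {o for s, p, o in graph if s in nodes and p == predicate}
    return nodes


tracks = follow(graph, EX + "artist/lennon", MADE, HAS_TRACK)
```

No join conditions or object-relational mapping are involved; the predicate alone selects which links to traverse.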
  23. 23. Linked Data Benefits - Aggregatable Data <ul><li>Often desirable to have an integrated view of all the data available about an item or topic </li></ul><ul><li>Database Realm </li></ul><ul><li>Integration problematic, difficult to combine logical schemas </li></ul><ul><li>Semantic Web </li></ul><ul><li>Data aggregation is easy: every resource has a unique URI </li></ul><ul><ul><li>Individual items can be linked </li></ul></ul><ul><ul><li>Conceptual models can be linked </li></ul></ul><ul><li>Cross-domain links enrich domain knowledge </li></ul><ul><li>Different facets of the same entity may be described by different URIs minted by different authors </li></ul><ul><ul><li>Can be linked. e.g. owl:sameAs, rdf:type predicates </li></ul></ul><ul><ul><li>May expose facts not directly represented in any one source </li></ul></ul>
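The aggregation idea on this slide can be sketched as follows: two sources describe the same entity under different URIs, and an `owl:sameAs` statement lets their descriptions be merged. The source URIs and properties are hypothetical, and the merge strategy (grouping equivalent subjects with a small union-find) is one possible approach chosen for this sketch, not something the slides prescribe.

```python
from collections import defaultdict

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Two hypothetical sources describing the same person under different URIs.
source_a = [
    ("http://a.example/artist/lennon", "http://a.example/name", "John Lennon"),
]
source_b = [
    ("http://b.example/people/42", "http://b.example/bornIn", "Liverpool"),
]
links = [
    ("http://a.example/artist/lennon", OWL_SAME_AS, "http://b.example/people/42"),
]


def aggregate(triples):
    """Merge descriptions of subjects connected by owl:sameAs statements."""
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            parent[find(o)] = find(s)

    merged = defaultdict(set)
    for s, p, o in triples:
        if p != OWL_SAME_AS:
            merged[find(s)].add((p, o))
    return dict(merged)


combined = aggregate(source_a + source_b + links)
```

In this sketch only subjects are unified; object URIs are left as they appear in the sources.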
  24. 24. Linked Data – Data Aggregation
  25. 25. Linked Data Benefits - Self Describing Data <ul><li>RDF </li></ul><ul><li>A technology for creating self-describing Web resources </li></ul><ul><li>Entity’s type definition ‘accompanies’ it using rdf:type </li></ul><ul><li>An RDF dataset can be queried using SPARQL without knowing anything beforehand about the data </li></ul><ul><li>Provides the basis for powerful data exploration tools </li></ul><ul><li>Logical / Relational Schema </li></ul><ul><li>Users / applications need a detailed understanding of the schema to use and navigate the data </li></ul><ul><li>Application’s knowledge of the schema typically hardcoded </li></ul><ul><li>Ad-hoc end-user data exploration potentially error prone </li></ul>
  26. 26. Linked Data Benefits - SPARQL <ul><li>If a user agent has no built-in knowledge of a particular RDF subject, predicate or object, it can use the URI to retrieve the information </li></ul><ul><li>The Power of SPARQL </li></ul><ul><li>Discover what sorts of things a data source contains </li></ul><ul><li>select distinct ?URI ?ObjectType where { ?URI a ?ObjectType } </li></ul><ul><li>Determine all the properties of an entity class </li></ul><ul><li>select * where { <> ?property ?hasValue } </li></ul><ul><li>Determine all the properties and values of an entity instance </li></ul><ul><li>DESCRIBE <> </li></ul><ul><li>No prior knowledge of the RDF data source is needed </li></ul>
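To make the discovery queries above concrete without a SPARQL endpoint, here is a rough plain-Python emulation over an in-memory list of triples. It is not a SPARQL engine; it only mirrors what the first two query shapes return. The `http://example.org/` namespace and the sample data are assumptions for illustration.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_ORGANIZATION = "http://xmlns.com/foaf/0.1/Organization"
EX = "http://example.org/"  # hypothetical namespace

graph = [
    (EX + "Customer/ALFKI#this", RDF_TYPE, FOAF_ORGANIZATION),
    (EX + "Customer/ALFKI#this", EX + "companyName", "Alfreds Futterkiste"),
    (EX + "Order/10643#this", RDF_TYPE, EX + "Order"),
]


def types_in(graph):
    """Mirror: select distinct ?URI ?ObjectType where { ?URI a ?ObjectType }"""
    return sorted({(s, o) for s, p, o in graph if p == RDF_TYPE})


def properties_of(graph, uri):
    """Mirror: select * where { <uri> ?property ?hasValue }"""
    return [(p, o) for s, p, o in graph if s == uri]


discovered = types_in(graph)
customer = properties_of(graph, EX + "Customer/ALFKI#this")
```

Neither function is told in advance which classes or properties exist; both read them out of the data, which is the sense in which no prior knowledge of the source is needed.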
  27. 27. Virtuoso - Linked Data Generation Options <ul><li>Conceptual layer insulates Linked Data consumers from RDFization infrastructure & data source heterogeneity </li></ul>
  28. 28. Virtuoso RDF Views <ul><li>Expose relational data as RDF </li></ul><ul><li>Provide the means to move from a logical model view to a conceptual model view </li></ul><ul><li>Available for querying through SPARQL or SPASQL (SPARQL embedded in SQL) </li></ul><ul><li>No physical regeneration of relational data </li></ul><ul><li>RDF Views = </li></ul><ul><ul><li>Virtuoso RDF Meta-Schema + </li></ul></ul><ul><ul><li>Meta-Schema Language </li></ul></ul><ul><li>MSL = </li></ul><ul><ul><li>A domain specific, declarative language for mapping a logical SQL data model to a conceptual RDF data model </li></ul></ul>
  29. 29. Northwind Demo Database: RDF View Definition Extract
      Source table Demo.demo.Customers (columns): Customer ID, Company Name, Contact Name, Contact Title, Address, City, Postal Code, Country, Phone, Fax
      Northwind RDF View Definition:
        prefix northwind: <>
        …
        create iri class northwind:Customer <http://^{URIQADefaultHost}^/Northwind/Customer/%U#this> (in customer_id varchar not null)
        …
        alter quad storage virtrdf:DefaultQuadStorage
        …
        from Demo.demo.Customers as customers
        from Demo.demo.Orders as orders
        …
        {
          create virtrdf:NorthwindDemo as graph iri ("http://^{URIQADefaultHost}^/Northwind")
          {
            …
            northwind:Customer(customers.CustomerID) a foaf:Organization as virtrdf:Customer-CustomerID ;
              northwind:companyName customers.CompanyName as … ;
              …
              northwind:fax customers.Fax as virtrdf:Customer-fax .
            …
          }
        }
      Relationship mapping from the same definition:
        northwind:Customer(orders.CustomerID) northwind:has_order northwind:Order(orders.OrderID) as virtrdf:Order-has_order .
  30. 30. Northwind Demo Database: Customer Table to RDF Entity Mapping
      Customers table row:
        Customer ID: ALFKI
        Company Name: Alfreds Futterkiste
        Contact Name: Maria Anders
        Contact Title: Sales Representative
        Address: Obere Str. 57
        City: Berlin
        Postal Code: 12209
        Country: Germany
        Phone: 030-0074321
        Fax: 030-0076545
      RDF entity Customer/ALFKI#this (prefix <>):
        companyName: Alfreds Futterkiste
        contactName: Maria Anders
        contactTitle: Sales Representative
        address: Obere Str. 57
        city: Berlin
        PostalCode: 12209
        country: Germany
        phone: 030-0074321
        fax: 030-0076545
        …
        has_order: Order/10643#this
        has_order: Order/10692#this
        …
      Orders table (Customer ID, Order ID):
        … ALFKI, 10643
        … ALFKI, 10692
        …
      Order/10643#this and Order/10692#this each link back to Customer/ALFKI#this via has_customer
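The row-to-entity mapping shown on this slide can be approximated in a few lines of Python. This is a sketch of the idea only, not of how Virtuoso evaluates an RDF View (which maps at query time without materializing triples). The base IRI `http://example.org/Northwind/` is a placeholder, since the prefix on the slide is elided, and the column-to-property table covers only a few of the columns.

```python
from urllib.parse import quote

# Placeholder base IRI; the actual prefix is elided on the slide.
BASE = "http://example.org/Northwind/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_ORGANIZATION = "http://xmlns.com/foaf/0.1/Organization"

# Partial, illustrative column-to-property mapping.
COLUMN_TO_PROPERTY = {
    "CompanyName": "companyName",
    "ContactName": "contactName",
    "City": "city",
}


def customer_iri(customer_id):
    """Build an entity IRI following the Customer/<id>#this pattern."""
    return f"{BASE}Customer/{quote(str(customer_id), safe='')}#this"


def order_iri(order_id):
    """Build an entity IRI following the Order/<id>#this pattern."""
    return f"{BASE}Order/{quote(str(order_id), safe='')}#this"


def customer_triples(row):
    """Map one Customers row to triples about a single RDF entity."""
    subject = customer_iri(row["CustomerID"])
    triples = [(subject, RDF_TYPE, FOAF_ORGANIZATION)]
    for column, prop in COLUMN_TO_PROPERTY.items():
        if row.get(column) is not None:
            triples.append((subject, BASE + prop, row[column]))
    return triples


def order_triples(row):
    """Map one Orders row to has_order / has_customer links."""
    customer = customer_iri(row["CustomerID"])
    order = order_iri(row["OrderID"])
    return [
        (customer, BASE + "has_order", order),
        (order, BASE + "has_customer", customer),
    ]


alfki = customer_triples(
    {"CustomerID": "ALFKI", "CompanyName": "Alfreds Futterkiste",
     "ContactName": "Maria Anders", "City": "Berlin"}
)
alfki_orders = order_triples({"CustomerID": "ALFKI", "OrderID": 10643})
```

The foreign key in the Orders row becomes two named links between globally identified entities, which is what the slide's has_order and has_customer arrows depict.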
  31. 31. LinqToRdf + Virtuoso
  32. 32. LinqToRdf to MusicBrainz - Conceptual Model Veneer
  33. 33. ADO.NET Data Services & Entity Data Model <ul><li>A framework (codename Astoria) for exposing ‘pure data’ services over HTTP </li></ul><ul><li>No support for RDF </li></ul><ul><li>Gains none of RDF’s inherent benefits </li></ul><ul><li>Lack of platform independence & standards compliance </li></ul><ul><li>Supports REST-style interfaces </li></ul><ul><li>Supports Atom, JSON and XML payloads </li></ul><ul><li>But </li></ul><ul><li>Server-side: Windows only </li></ul><ul><li>Consuming Astoria services at a higher level requires a Windows .NET client or a Silverlight-supported browser </li></ul>
  34. 34. ADO.NET Data Services & Entity Data Model <ul><li>Server-side only conceptual model </li></ul><ul><li>Powerful URL addressing to query/navigate/sort/filter etc </li></ul><ul><ul><li>Customers collection: http://myserver/data.svc/Customers </li></ul></ul><ul><ul><li>Customer ALFKI: http://myserver/data.svc/Customers('ALFKI') </li></ul></ul><ul><ul><li>Customer ALFKI's orders: http://myserver/data.svc/Customers('ALFKI')/Orders </li></ul></ul><ul><li>But </li></ul><ul><li>Client must know conceptual schema </li></ul><ul><ul><li>e.g. to construct above URIs </li></ul></ul><ul><li>Lack of Dereferenceable Entity IDs </li></ul><ul><li>Ability to discover entities and dereference their descriptions (attributes/relations) is confined to the facilities offered by .NET </li></ul><ul><li>cf. SPARQL’s ability to handle unknown data sources </li></ul>
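The URL addressing scheme on this slide can be illustrated with a small helper that assembles the three example URLs. The function name and parameters are hypothetical, it is not part of any Microsoft client library, and it handles string keys only. It also shows the slide's criticism in practice: the caller must already know the entity set name (`Customers`) and the navigation property (`Orders`).

```python
from urllib.parse import quote


def entity_url(service_root, entity_set, key=None, navigation=None):
    """Assemble a data-service URL of the shape shown on the slide.

    Hypothetical helper for illustration; string keys only.
    """
    url = f"{service_root.rstrip('/')}/{entity_set}"
    if key is not None:
        # String keys appear in single quotes inside parentheses.
        url += f"('{quote(str(key), safe='')}')"
    if navigation:
        url += f"/{navigation}"
    return url


root = "http://myserver/data.svc"
all_customers = entity_url(root, "Customers")
one_customer = entity_url(root, "Customers", key="ALFKI")
customer_orders = entity_url(root, "Customers", key="ALFKI", navigation="Orders")
```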
  35. 35. ADO.NET Data Services & Entity Data Model <ul><li>No Support for Non-SQL Data Sources </li></ul><ul><li>Astoria is aimed exclusively at making relational data Web accessible </li></ul><ul><li>cf. Semantic Web & Linked Data </li></ul><ul><li>Recognize that vast amounts of data reside in unstructured and semi-structured data sources </li></ul><ul><li>Support for embedding RDF into existing (X)HTML </li></ul><ul><ul><li>RDFa, GRDDL, eRDF </li></ul></ul><ul><li>Emerging tools for converting non-RDF data to RDF </li></ul><ul><li>Emerging tools for exposing SQL data as RDF </li></ul><ul><li>Astoria lacks scalability & scope of Semantic Web technologies </li></ul>