Open Conceptual Data Models: Making the Conceptual Layer Real via RDF Linked Data
Conceptual Data Models in the Linked Data Web Linked Data Vision: The transition of the Web from a Web of linked documents to a Web of interlinked structured data items (aka: entities, data objects, resources) Concurrent trend in the IT industry: A recognition of the benefits of conceptual data models vs logical data models The Big Question: To what extent does Linked Data support conceptual-level data models?
Open Conceptual Data Models Topics: Conceptual & Logical Data Models Conceptual Models for the Semantic Web Realizing Conceptual Models  through Ontologies & Linked Data Virtuoso RDF Views ADO.NET Data Services & the Entity Data Model
Conceptual & Logical Data Models Describe a software system’s target problem space Typically, in today’s database-driven applications Three levels of data model Physical How data is physically represented on disk Logical  (aka logical schema) Expresses problem domain in terms of data management technology (tables / columns) e.g. relational schema Conceptual  (aka conceptual schema) Purely  semantic  description of problem space Describes  things  (entities), their  characteristics  (attributes) &  associations  between things (relationships)
Logical Data Model Most prominent of the three data model types  Main focus of database applications Due to pervasiveness of SQL in application code Weaknesses Impedance mismatch Loss of semantics during development process Heterogeneous databases & interoperability
Logical Data Model Weaknesses Impedance Mismatch SQL expresses queries in terms of tables / views => targets logical schema Normalization fragments the data model Entities & their attributes may be split across several tables Navigation between objects requires relational joins over two or more tables Table rows must be reconstituted into higher level conceptual entities Conceptual level data model is desirable to: Remove impedance mismatch Isolate application from changes to logical data model Provide framework for  human level  interaction
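To make the impedance mismatch concrete, here is a minimal sketch using Python's built-in sqlite3 module. The three-table schema and the sample rows are assumptions made up for illustration; they are not taken from any database discussed in these slides. One conceptual "album" entity has to be reassembled from rows in three tables via two joins.

```python
import sqlite3

# Hypothetical normalized schema: one conceptual "album" is split over three tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artist (artist_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE album  (album_id INTEGER PRIMARY KEY, title TEXT,
                         artist_id INTEGER REFERENCES artist);
    CREATE TABLE track  (track_id INTEGER PRIMARY KEY, title TEXT,
                         album_id INTEGER REFERENCES album);
    INSERT INTO artist VALUES (1, 'John Lennon');
    INSERT INTO album  VALUES (10, 'Imagine', 1);
    INSERT INTO track  VALUES (100, 'Imagine', 10), (101, 'Jealous Guy', 10);
""")


def load_album(conn, album_id):
    """Reconstitute one conceptual album entity from rows in three tables."""
    rows = conn.execute(
        """
        SELECT artist.name, album.title, track.title
        FROM album
        JOIN artist ON artist.artist_id = album.artist_id
        JOIN track  ON track.album_id = album.album_id
        WHERE album.album_id = ?
        ORDER BY track.track_id
        """,
        (album_id,),
    ).fetchall()
    if not rows:
        return None
    # The application, not the database, knows how rows map back to an entity.
    return {
        "artist": rows[0][0],
        "title": rows[0][1],
        "tracks": [row[2] for row in rows],
    }


album = load_album(conn, 10)
```

The table and column names are hardcoded in the application, so any change to the logical schema breaks `load_album`.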
Logical Data Model Weaknesses Loss of Semantics During Development Process: Develop conceptual model (E-R modelling) Transform to logical model for implementation Derive physical model from logical model Problems: Each move to a lower level model discards meaning Higher level model typically not retained Model semantics fragmented across  schema / business rules / application code Application must know logical data model Must be hardcoded or inferred (imperfectly) from system tables
Logical Data Model Weaknesses Heterogeneous Databases & Interoperability Logical data model Describes problem domain in terms of tables/columns Requires SQL to navigate model Application Exposed to specifics of a particular vendor’s RDBMS In heterogeneous database environment , must handle Different SQL dialects Different schemas No explicit data model. No explicit semantics. Interoperability/integration = perpetual problem for IT depts
Conceptual Models for the Semantic Web Growing recognition in the industry of the benefits of a conceptual, rather than logical, model for data-centric applications e.g. Microsoft’s Entity Data Model / Entity Framework Semantic Web technologies provide powerful tools for this paradigm shift
Benefits of Conceptual Models How the Semantic Web benefits More faithfully represents human view of domain of interest Conceptual model & semantics  Explicit & available globally Not implicit & fragmented across business logic / UI etc Better / explicit semantics promises better search engines Much easier heterogeneous data integration Data on the Web is inherently heterogeneous
Application Areas – Present & Future Social networking, e-commerce, collaborative working Require shareable, standards-based, cross-platform conceptual views of data Data portability Needed as Web users maintain multiple points of presence – blogs, social network accounts etc. Open business models Require exchange & integration of large amounts of data Scientific research  – sharing of knowledge & findings Requires transparent access to distributed heterogeneous data Requires database integration using global schema Autonomous intelligent agents Free humans from large-volume information processing
Semantic Web Technology Benefits What Semantic Web technologies bring: Ontologies Can represent common semantics Spanning databases, applications, enterprises, on-line communities Act as a shared conceptual model Provide common models (FOAF, SIOC etc) Common Semantics (Ontologies) & Common Data Representation (RDF) Enable cross data source querying using SPARQL Content from several sites can be combined / explored Querying using proprietary APIs unnecessary Brute force data merging unnecessary Open Data Formats, Platform Independence, Common Models Allow data portability and data integration
Realizing Conceptual Models Ontologies Provide the building blocks of Semantic Web conceptual models Define the concepts and their relationships in a domain of interest Describing Classes & Properties – Ontology Languages RDFS Introduces the notions of concepts (classes) & instances OWL Adds more vocabulary for describing: relations between classes cardinality richer typing of properties, etc.
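As a rough illustration of what a class hierarchy in an ontology buys you, the sketch below infers an instance's types by following rdfs:subClassOf statements. The example.org class names are hypothetical, and this is a plain-Python approximation of one RDFS entailment rule, not a full reasoner.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/schema#"  # hypothetical ontology namespace

TRIPLES = [
    ("http://example.org/artist/1", RDF_TYPE, EX + "SoloArtist"),
    (EX + "SoloArtist", RDFS_SUBCLASS_OF, EX + "MusicArtist"),
    (EX + "MusicArtist", RDFS_SUBCLASS_OF, EX + "Agent"),
]


def inferred_types(triples, instance):
    """Return the asserted types of an instance plus all their superclasses."""
    types = {o for s, p, o in triples if s == instance and p == RDF_TYPE}
    frontier = list(types)
    while frontier:
        cls = frontier.pop()
        for s, p, o in triples:
            if s == cls and p == RDFS_SUBCLASS_OF and o not in types:
                types.add(o)
                frontier.append(o)
    return types


artist_types = inferred_types(TRIPLES, "http://example.org/artist/1")
```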
Goodness of Fit RDF was designed from the ground up as a metadata data model RDF / RDFS / OWL work directly at the level of conceptual models Conceptual model terminology matches RDF/OWL terminology Concepts, entities, attributes, relationships A natural fit! RDF lends itself naturally to describing conceptual models
Semantic Expressivity DDL-based Relational Model Relationship between two entities isn’t explicit Foreign key relating two rows in separate tables doesn’t express the nature of the relationship Semantics must often be inferred from table definitions RDF-based Conceptual Model Relationship between two entities is stated explicitly by predicate in subject-predicate-object triple Semantic expressivity of RDF/RDFS/OWL is much better than DDL Has richer semantic content than equivalent DDL-based logical/relational model
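The contrast can be sketched in a few lines of Python. The namespace and the predicate names (appearsOn, recordedBy) are assumptions invented for this example. The foreign-key row only records that two rows are related; each triple names the relationship.

```python
EX = "http://example.org/"  # hypothetical namespace

# Relational view: the row says track 100 points at album 10, but not why.
fk_row = {"track_id": 100, "album_id": 10}

# RDF view: the predicate states the nature of each relationship explicitly.
TRIPLES = {
    (EX + "track/100", EX + "schema#appearsOn", EX + "album/10"),
    (EX + "album/10", EX + "schema#recordedBy", EX + "artist/1"),
}


def relationships(triples, subject):
    """List (predicate, object) pairs for a subject, i.e. its named relationships."""
    return sorted((p, o) for s, p, o in triples if s == subject)


track_links = relationships(TRIPLES, EX + "track/100")
```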
RDF Conceptual Model – Artist / Records / Tracks
Global Granular Information Sharing Traditional Logical/Relational Data Model Schema described by DDL is internal to DBMS Primary keys identifying an individual table row  (i.e. entity instance) not globally unique, not easily usable outside host DBMS Gives rise to  ‘data silos’ RDF’s use of HTTP-based URLs Externalises the data and schema Makes both globally accessible & scalable Provides globally unique IDs for entities/relations/classes A vehicle for  granular, global information sharing down to the equivalent of the record level
Linked Data – What is It? A method for exposing, sharing & connecting data on the Web A term coined by Tim Berners-Lee that describes HTTP-based  Data Access by Reference  for the Web Open Data Access  & Connectivity mechanism for the Web A richer linking mechanism for the Web that takes us from Hypertext Links (Document to Document) to  Hyperdata   Links  (across things that documents are about)
Linked Data – Why Is It Important? It exposes the compound nature of Web Resources Information resources (Containers) are uniquely identified & referenceable Entities within Containers are uniquely identified & referenceable It provides an Open Data Access & Connectivity mechanism for the Web It delivers a powerful mechanism for meshing disparate and heterogeneous data sources
Linked Data Model Changes the focus from linked documents to linked entities The document as a data container becomes less relevant
Hyperdata Links Between Data Objects
Linked Data Benefits – Natural Navigation Natural Navigation Through Typed Links RDF entities are identified by dereferenceable URIs (URLs) Navigating from one data item to another is easy One click to dereference in Semantic Web Browser e.g. OpenLink Data Explorer URI of object in an RDF statement is a typed link Link’s “type” is defined by the statement predicate Relational/Logical Model Cumbersome Requires SQL joins + typically Object-Relational mapping e.g. in C#: track = lennonAlbum.Tracks["Imagine"]
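Navigation through typed links can be sketched as a walk over an in-memory set of triples. The example.org URIs and the predicate names (made, hasTrack, title) are hypothetical. A real client would dereference each URI over HTTP rather than look it up in a local list.

```python
from collections import defaultdict

EX = "http://example.org/music/"  # hypothetical namespace

TRIPLES = [
    (EX + "artist/john-lennon", EX + "made", EX + "album/imagine"),
    (EX + "album/imagine", EX + "hasTrack", EX + "track/imagine"),
    (EX + "album/imagine", EX + "hasTrack", EX + "track/jealous-guy"),
    (EX + "track/imagine", EX + "title", "Imagine"),
    (EX + "track/jealous-guy", EX + "title", "Jealous Guy"),
]


class Graph:
    """Tiny triple index keyed by (subject, predicate)."""

    def __init__(self, triples):
        self._objects = defaultdict(list)
        for s, p, o in triples:
            self._objects[(s, p)].append(o)

    def follow(self, start, *predicates):
        """Follow a chain of typed links from a start node; no joins required."""
        nodes = [start]
        for predicate in predicates:
            nodes = [
                o
                for node in nodes
                for o in self._objects.get((node, predicate), [])
            ]
        return nodes


graph = Graph(TRIPLES)
titles = graph.follow(
    EX + "artist/john-lennon", EX + "made", EX + "hasTrack", EX + "title"
)
```

Each hop is selected by the predicate, which is the "type" of the link.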
Linked Data Benefits - Aggregatable Data Often desirable to have an  integrated view  of all the data available about an item or topic Database Realm Integration problematic, difficult to combine logical schemas Semantic Web Data aggregation is easy: every resource has a unique URI Individual items can be linked Conceptual models can be linked Cross-domain links enrich domain knowledge Different facets of the same entity may be described by different URIs minted by different authors Can be linked. e.g. owl:sameAs, rdf:type predicates May expose facts not directly represented in any one source
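A minimal sketch of aggregation via owl:sameAs follows. The two sources, their URIs and their property names are assumptions made up for illustration. The function treats owl:sameAs as symmetric and transitive, then pools the facts stated about any alias of the entity.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Two hypothetical sources describing the same person under different URIs.
SOURCE_A = [
    ("http://a.example/artist/42", "http://a.example/schema#name", "John Lennon"),
]
SOURCE_B = [
    ("http://b.example/people/lennon", "http://b.example/schema#birthPlace", "Liverpool"),
    ("http://b.example/people/lennon", OWL_SAME_AS, "http://a.example/artist/42"),
]


def merged_description(triples, uri):
    """Return (aliases, facts) for uri, pooling facts across owl:sameAs links."""
    aliases = {uri}
    changed = True
    while changed:  # repeat until no new alias is found (transitive closure)
        changed = False
        for s, p, o in triples:
            if p != OWL_SAME_AS:
                continue
            if s in aliases and o not in aliases:
                aliases.add(o)
                changed = True
            elif o in aliases and s not in aliases:
                aliases.add(s)
                changed = True
    facts = {(p, o) for s, p, o in triples if s in aliases and p != OWL_SAME_AS}
    return aliases, facts


aliases, facts = merged_description(SOURCE_A + SOURCE_B, "http://a.example/artist/42")
```

Neither source alone holds both the name and the birthplace; the merged view does.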
Linked Data – Data Aggregation
Linked Data Benefits - Self Describing Data RDF A technology for creating self-describing Web resources Entity’s type definition ‘accompanies’ it using rdf:type An RDF dataset can be queried using SPARQL without knowing anything beforehand about the data Provides the basis for powerful data exploration tools Logical / Relational Schema Users / applications need a detailed understanding of the schema to use and navigate the data Application’s knowledge of the schema typically hardcoded Ad-hoc end-user data exploration potentially error prone
Linked Data Benefits - SPARQL
If a user agent has no built-in knowledge of a particular RDF subject, predicate or object, it can use the URI to retrieve the information
The Power of SPARQL
Discover what sorts of things a data source contains:
    select distinct ?URI ?ObjectType where { ?URI a ?ObjectType }
Determine all the properties of an entity class:
    select * where { <http://my.org/resourceTypes/Department> ?property ?hasValue }
Determine all the properties and values of an entity instance:
    DESCRIBE <http://my.org/resource/Accounts>
No prior knowledge of the RDF data source is needed
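The same discovery steps can be mimicked over a handful of in-memory triples, as a rough sketch of why no prior schema knowledge is needed. The two my.org URIs come from the queries on this slide; the my.org/schema# properties and their values are assumptions added for illustration. The `describe` helper returns only the outgoing statements of a resource, whereas the result of a real SPARQL DESCRIBE is left to the endpoint.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DEPARTMENT = "http://my.org/resourceTypes/Department"
ACCOUNTS = "http://my.org/resource/Accounts"

# Hypothetical data; the schema# properties are invented for this sketch.
TRIPLES = [
    (ACCOUNTS, RDF_TYPE, DEPARTMENT),
    (ACCOUNTS, "http://my.org/schema#name", "Accounts"),
    (ACCOUNTS, "http://my.org/schema#headCount", 12),
]


def entity_types(triples):
    """Like: select distinct ?URI ?ObjectType where { ?URI a ?ObjectType }"""
    return sorted({(s, o) for s, p, o in triples if p == RDF_TYPE})


def properties_of(triples, subject):
    """Distinct predicates used on a subject, found without any schema."""
    return sorted({p for s, p, o in triples if s == subject})


def describe(triples, subject):
    """All (property, value) pairs stated about a subject."""
    return [(p, o) for s, p, o in triples if s == subject]


found_types = entity_types(TRIPLES)
accounts_properties = properties_of(TRIPLES, ACCOUNTS)
accounts_description = describe(TRIPLES, ACCOUNTS)
```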
Virtuoso - Linked Data Generation Options Conceptual layer insulates Linked Data consumers from RDFization infrastructure & data source heterogeneity
Virtuoso RDF Views Expose relational data as RDF Provide the means to move from a logical model view to a conceptual model view Available for querying through SPARQL or SPASQL (SPARQL embedded in SQL) No physical regeneration of relational data RDF Views =  Virtuoso RDF Meta-Schema +  Meta-Schema Language MSL = A domain specific, declarative language for mapping a logical SQL data model to a conceptual RDF data model
Northwind Demo Database: RDF View Definition Extract
    prefix northwind: <http://www.openlinksw.com/schemas/northwind#>
    …
    create iri class northwind:Customer <http://^{URIQADefaultHost}^/Northwind/Customer/%U#this> (in customer_id varchar not null) …
    alter quad storage virtrdf:DefaultQuadStorage
    …
    from Demo.demo.Customers as customers
    from Demo.demo.Orders as orders
    …
    {
      create virtrdf:NorthwindDemo as graph iri ("http://^{URIQADefaultHost}^/Northwind")
      {
        …
        northwind:Customer(customers.CustomerID) a foaf:Organization as virtrdf:Customer-CustomerID ;
          northwind:companyName customers.CompanyName as … ;
          …
          northwind:fax customers.Fax as virtrdf:Customer-fax .
        …
      }
    }

    northwind:Customer(orders.CustomerID) northwind:has_order northwind:Order(orders.OrderID) as virtrdf:Order-has_order .
[Diagram: the Demo.demo.Customers table (columns: Customer ID, Company Name, Contact Name, Contact Title, Address, City, Postal Code, Country, Phone, Fax) feeding the Northwind RDF View Definition]
Northwind Demo Database: Customer Table to RDF Entity Mapping
Customers table row:
    Customer ID: ALFKI
    Company Name: Alfreds Futterkiste
    Contact Name: Maria Anders
    Contact Title: Sales Representative
    Address: Obere Str. 57
    City: Berlin
    Postal Code: 12209
    Country: Germany
    Phone: 030-0074321
    Fax: 030-0076545
Orders table rows (Customer ID, Order ID): … ALFKI 10643 … ALFKI 10692 …
RDF entity Customer/ALFKI#this (prefix <http://demo.openlinksw.com/Northwind/>):
    companyName: Alfreds Futterkiste
    contactName: Maria Anders
    contactTitle: Sales Representative
    address: Obere Str. 57
    city: Berlin
    PostalCode: 12209
    country: Germany
    phone: 030-0074321
    fax: 030-0076545
    has_order: Order/10643#this, Order/10692#this, …
Order/10643#this and Order/10692#this each link to Customer/ALFKI#this via has_customer
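The row-to-entity mapping can be approximated in plain Python. This is only a sketch of the idea, not how Virtuoso implements RDF Views: the function names and the reduced column list are assumptions, while the IRI pattern, the schema namespace, foaf:Organization and the sample values come from the slides. Triples are produced on demand from a row, echoing the point that the relational data is not physically regenerated.

```python
from urllib.parse import quote

NORTHWIND = "http://www.openlinksw.com/schemas/northwind#"
BASE = "http://demo.openlinksw.com/Northwind/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_ORGANIZATION = "http://xmlns.com/foaf/0.1/Organization"

# Column-to-property mapping; only a subset of the columns, chosen for brevity.
CUSTOMER_COLUMNS = {"CompanyName": "companyName", "City": "city", "Fax": "fax"}


def customer_iri(customer_id):
    """Mimic the .../Northwind/Customer/%U#this IRI pattern (URL-encoded key)."""
    return f"{BASE}Customer/{quote(customer_id, safe='')}#this"


def order_iri(order_id):
    """Build the Order/<id>#this IRI shown on the mapping slide."""
    return f"{BASE}Order/{order_id}#this"


def customer_triples(row):
    """Yield triples for one Customers row without storing them anywhere."""
    subject = customer_iri(row["CustomerID"])
    yield (subject, RDF_TYPE, FOAF_ORGANIZATION)
    for column, prop in CUSTOMER_COLUMNS.items():
        if row.get(column) is not None:
            yield (subject, NORTHWIND + prop, row[column])


def order_triples(row):
    """Yield the typed link from a customer to one of its orders."""
    yield (
        customer_iri(row["CustomerID"]),
        NORTHWIND + "has_order",
        order_iri(row["OrderID"]),
    )


customer_row = {
    "CustomerID": "ALFKI",
    "CompanyName": "Alfreds Futterkiste",
    "City": "Berlin",
    "Fax": "030-0076545",
}
alfki = list(customer_triples(customer_row))
alfki_order = list(order_triples({"CustomerID": "ALFKI", "OrderID": 10643}))
```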
LinqToRdf + Virtuoso
LinqToRdf to MusicBrainz - Conceptual Model Veneer
ADO.NET Data Services & Entity Data Model A framework for exposing ‘pure data’ services over HTTP No support for RDF Gains none of RDF’s inherent benefits Lack of platform independence & standards compliance Supports REST-style interfaces Supports Atom, JSON and XML payloads But Server-side: Windows only Consuming ADO.NET Data Services (codename Astoria) at a higher level requires Windows .NET client or Silverlight-supported browser
ADO.NET Data Services & Entity Data Model Server-side only conceptual model Powerful URL addressing to query/navigate/sort/filter etc Customers collection: http://myserver/data.svc/Customers Customer ALFKI: http://myserver/data.svc/Customers('ALFKI') Customer ALFKI's orders: http://myserver/data.svc/Customers('ALFKI')/Orders But Client must know conceptual schema e.g. to construct above URIs Lack of Dereferenceable Entity IDs Ability to discover entities and dereference their descriptions (attributes/relations) is confined to the facilities offered by .NET cf. SPARQL’s ability to handle unknown data sources
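The point that the client must already know the conceptual schema shows up as soon as one tries to build these URLs. The helper below is a hypothetical sketch that handles only simple string keys without escaping. It is not part of any Microsoft API. Every argument it takes (entity set name, key, navigation property) is schema knowledge the caller has to supply.

```python
def entity_url(service_root, entity_set, key=None, navigation=None):
    """Build a data-service URL from schema knowledge supplied by the caller."""
    url = f"{service_root.rstrip('/')}/{entity_set}"
    if key is not None:
        url += f"('{key}')"  # string key in single quotes, as on the slide
    if navigation is not None:
        url += f"/{navigation}"
    return url


customers_url = entity_url("http://myserver/data.svc", "Customers")
alfki_url = entity_url("http://myserver/data.svc", "Customers", key="ALFKI")
alfki_orders_url = entity_url(
    "http://myserver/data.svc", "Customers", key="ALFKI", navigation="Orders"
)
```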
ADO.NET Data Services & Entity Data Model No Support for Non-SQL Data Sources Astoria is aimed exclusively at making relational data Web accessible cf. Semantic Web & Linked Data Recognize that vast amounts of data reside in unstructured and semi-structured data sources Support for embedding RDF into existing (X)HTML RDFa, GRDDL, eRDF Emerging tools for converting non-RDF data to RDF Emerging tools for exposing SQL data as RDF Astoria lacks scalability & scope of Semantic Web technologies
