Federated Data Stores using
Semantic Web Technology
Steve Ray
Distinguished Research Fellow
Carnegie Mellon University
Interoperability is all about DATA
Three Technology Trends
that could help*
1. Semantic Web technologies
2. Cloud
3. Natural Language Processing
I will focus on semantic web technologies
*Inspired by “Top Three Technologies to Tame the Big Data Beast,” Huffington Post, 11/22/2011
Steve Ray, Carnegie Mellon University
Representation Trends
[Figure: chart of representation technologies grouped under Legacy, Current Practice, and Exploratory. Labels shown: IBM Card Format, EDI, XML, XML Schema, Metadata, Metamodels, Meta-meta-models, RDF/OWL, BPML/BPEL, CBA, Semantic Mediation, Web Services Protocols, SOA, Info Modeling, FOL. Axis values are not recoverable.]
(Slide adapted from Donald Hall, Logistics Enterprise Services Office, DLA)
Why Consider RDF & OWL
Semantic Web Technology?
RDF = Resource Description Framework
OWL = Web Ontology Language
1. Simple representation
– Everything is a triple: <subject – predicate – object>
2. Self-describing models
– Schemas and data coexist in data stores
3. Easy to interrogate
– SPARQL queries (over schema and data)
4. Easy to validate
– Supports automated reasoning
5. Easy to interoperate
– Natively supports distributed data stores
Simple Representation
Everything is stored as triples:
<subject predicate object>
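As a minimal sketch of this idea in Python, a triple can be modeled as a plain three-field tuple and a store as a set of them. The `Triple` type and the two facts are illustrative assumptions (the facts are borrowed from the later slides); this is not the API of any RDF library.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    """One statement: subject, predicate, object (illustrative type, not a library class)."""
    subject: str
    predicate: str
    object: str

# A store is just a set of triples; these facts are assumed examples.
store = {
    Triple("george", "is-a", "Human"),
    Triple("george", "marriedTo", "lisa"),
}

# Every fact has the same shape, so one expression can read all of them.
subjects = sorted({t.subject for t in store})
print(subjects)
```

Because every statement has the same three-part shape, adding a new kind of fact needs no schema change, only another tuple.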
Self-Describing Models
• The schema (model) and the data are stored in
the same place
• Schema:
– Mammal subClassOf Animal
– Human subClassOf Mammal
• Data:
– george is-a Human
– george marriedTo lisa
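The following Python sketch, under the assumption that a store is a set of (subject, predicate, object) tuples, shows why keeping schema and data together is convenient: one function can walk from an individual through its class and up the subClassOf chain. The helper name `classes_of` is hypothetical.

```python
# Schema and data live in one set of (subject, predicate, object) tuples.
store = {
    ("Mammal", "subClassOf", "Animal"),   # schema
    ("Human", "subClassOf", "Mammal"),    # schema
    ("george", "is-a", "Human"),          # data
    ("george", "marriedTo", "lisa"),      # data
}

def classes_of(individual, triples):
    """Return every class the individual belongs to, following subClassOf upward."""
    found = set()
    frontier = [o for s, p, o in triples if s == individual and p == "is-a"]
    while frontier:
        cls = frontier.pop()
        if cls in found:
            continue  # already visited; also guards against cycles
        found.add(cls)
        frontier.extend(o for s, p, o in triples if s == cls and p == "subClassOf")
    return found

george_classes = classes_of("george", store)
print(sorted(george_classes))
```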
Easy to Interrogate
SPARQL†: a language to query an RDF database
(Just matches against patterns of triples)
SELECT ?x
WHERE {
george marriedTo ?x .
}
Returns a table: x
lisa
SELECT ?y
WHERE {
?y subClassOf Animal .
}
Returns a table:
y
Mammal
† SPARQL = SPARQL Protocol and RDF Query Language
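To make "just matches against patterns of triples" concrete, here is a small Python sketch of single-pattern matching. It assumes triples are tuples of strings and that terms starting with "?" are variables; the `select` function is a hypothetical stand-in for a SPARQL engine, not a real one, and it handles only one triple pattern at a time.

```python
def select(pattern, triples):
    """Match one (s, p, o) pattern against the triples.

    Terms starting with "?" are variables; returns one dict of bindings per match.
    """
    rows = []
    for triple in triples:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break  # a constant term did not match
        else:
            rows.append(bindings)
    return rows

store = {
    ("Mammal", "subClassOf", "Animal"),
    ("Human", "subClassOf", "Mammal"),
    ("george", "is-a", "Human"),
    ("george", "marriedTo", "lisa"),
}

spouses = select(("george", "marriedTo", "?x"), store)
subclasses = select(("?y", "subClassOf", "Animal"), store)
print(spouses, subclasses)
```

Note that the second query returns only Mammal: plain pattern matching finds direct subclasses, and Human would appear only if the store also applied subclass reasoning.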
Easy to Validate
SPARQL can be used
for reasoning,
not just interrogating
If
George sonOf Fred
and
Fred siblingOf Mary
Then
George nephewOf Mary
In SPARQL:
CONSTRUCT
{ ?a nephewOf ?c . }
WHERE
{
?a sonOf ?b .
?b siblingOf ?c .
}
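The same rule can be sketched in Python as a join over two triple patterns. This assumes a store is a set of (subject, predicate, object) tuples; `construct_nephews` is a hypothetical helper that mimics what the CONSTRUCT query produces, not a reasoning engine.

```python
def construct_nephews(triples):
    """Derive (a, nephewOf, c) whenever (a, sonOf, b) and (b, siblingOf, c) both hold."""
    return {
        (a, "nephewOf", c)
        for a, p1, b in triples if p1 == "sonOf"
        for b2, p2, c in triples if p2 == "siblingOf" and b2 == b
    }

store = {
    ("George", "sonOf", "Fred"),
    ("Fred", "siblingOf", "Mary"),
}

inferred = construct_nephews(store)
print(inferred)

# New triples can be merged back into the store, like any other fact.
store |= inferred
```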
Easy to Interoperate
• A single query can interact with more than one
RDF database
– Linked Movie Database contains movies, actors
– DBpedia contains people and birthdates
• Find the birthdates of all Star Trek actors
– Answer does not exist in one source
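A rough Python sketch of the idea follows, using two in-memory sets as stand-ins for the two databases. All identifiers and dates are made-up placeholders, not actual contents of the Linked Movie Database or DBpedia, and the sketch assumes both sources use the same identifier for the same person. In SPARQL 1.1 the equivalent is typically written with the SERVICE keyword from the Federated Query specification.

```python
# Stand-in for the movie database: who acted in which title (placeholder data).
movie_store = {
    ("actor:A", "actedIn", "Star Trek"),
    ("actor:B", "actedIn", "Star Trek"),
    ("actor:C", "actedIn", "Other Film"),
}

# Stand-in for the people database: birthdates (placeholder data).
people_store = {
    ("actor:A", "birthDate", "1900-01-01"),
    ("actor:B", "birthDate", "1900-02-02"),
    ("actor:C", "birthDate", "1900-03-03"),
}

def birthdates_of_cast(title, movies, people):
    """Join the two stores on the shared actor identifier."""
    cast = {s for s, p, o in movies if p == "actedIn" and o == title}
    return {s: o for s, p, o in people if p == "birthDate" and s in cast}

result = birthdates_of_cast("Star Trek", movie_store, people_store)
print(result)
```

Neither store can answer the question alone; the answer appears only when the cast from one source is joined with the birthdates from the other.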
DBpedia is just one
of many RDF data stores
on the Web
We are not alone
Implications
• OWL/RDF provides a representation that can
natively support transformations from other
modeling languages and native formats for
product and process models
• The API is SPARQL
• Storage can be local or web-based
Take-away
• Poor interoperability is expensive
• Interoperability solutions can be expensive
• Semantic technology can make interoperability
solutions easier and cheaper to implement
