Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

Seminar on Ontology Based Data Access for RDBMSs through query rewriting at Stanford's BMIR lab. 2012.

Presentation Transcript

  • ONTOLOGY BASED DATA ACCESS Architecture, Techniques and Systems Mariano Rodríguez-Muro KRDB Research Group Free University of Bozen-Bolzano BMIR, Stanford February, 2012
  • ONTOLOGIES Reasoning and Data OBDA: Architecture, Techniques and Systems
  • Ontologies
    • A formal conceptualization of a domain of interest
    • They come in many different languages: RDFS, OBO, OWL 2, SWRL, etc.
    • Uses
      • Documentation
      • Knowledge exchange
      • Discovering new knowledge
      • Ontologies + Data…
  • Instance reasoning
    • Instance reasoning
      • Infer new information about the data
      • Detect inconsistent data
      • Use inferred information for complex queries (e.g., SPARQL)
    • Queries
      • Is :person/mariano an instance of :Mammal?
      • Retrieve all instances of :Mammal
      • SELECT ?x ?y WHERE { ?x a :Mammal; :hasAncestor ?y. ?y a :Mammal }
    • Requirements
      • Fast execution
      • Efficient resource management
      • Big data, big ontologies
  • The usual workflow
    [Diagram labels: Reasoner, Source, Application, Communication, Ontology, Inputs, Triples, Application Code]
  • Problem with approach
    • Software complexity
    • Duplication
    • Data refreshing
    • Data structure is lost (primary keys, foreign keys, information about the import procedure)
    [Diagram labels: Reasoner, Source, Application, Communication, Ontology, Inputs, Triples, Application Code]
  • OBDA Models and Architecture
  • OBDA as an Architecture
    [Diagram labels: Reasoner, Source, Application, Direct Communication, Ontology, OBDA Model, Inputs]
  • OBDA Models: Sources and Mappings
    “A formal specification of the relationship between data in a data source and the vocabulary of the ontology”
    [Diagram labels: OBDA Model, Source, Source Declaration, A set of mappings]
  • Mapping
    “A tuple of two queries, one over the source and one over the ontology, with the same signature. Intuitively, a mapping associates the data specified by qs with the answers for qo.”
    qs ⊆ qo
    SELECT id FROM condition WHERE c_id = 3333 ⊆ CardiacArrestPatient(?id) → q(?id)
    Example: id = (23) gives <23> rdf:type CardiacArrestPatient
  • Example OBDA model
    Mappings:
      SELECT id FROM condition WHERE c_id = 3333 ⊆ CardiacArrestPatient(?id) → q(?id)
      SELECT id,name,age,ssn FROM patient ⊆ Patient(?id) ^ name(?id,?name) ^ age(?id,?age) ^ ssn(?id,?ssn) → q(?id,?name,?age,?ssn)
    Table: patient
      id [PKEY] | name | age | ssn
      12345     | John | 37  | xxx-999
      …         | …    | …   | …
    Table: condition
      patient_id [FKEY] | c_id [FKEY]
      12345             | 3333
      …                 | …
  • Example OBDA model
    Table: patient
      id [PKEY] | name | age | ssn
      12345     | John | 37  | xxx-999
      …         | …    | …   | …
    Table: condition
      patient_id [FKEY] | c_id [FKEY]
      12345             | 3333
      …                 | …
    Resulting triples:
      <12345> rdf:type :Patient.
      <12345> :name “John”.
      <12345> :age “37”.
      <12345> :ssn “xxx-999”.
      <12345> rdf:type :CardiacArrestPatient.
      …
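The materialization shown on the two slides above can be sketched in a few lines of Python. This is only an illustration under assumptions, not how OBDALib or Quest implement it: the in-memory SQLite database, the `MAPPINGS` structure and the `materialize` helper are all hypothetical. The slide's first mapping selects `id` from `condition`, while the table as drawn has a `patient_id` column, so the sketch aliases `patient_id AS id`.

```python
import sqlite3

# Source database holding the single sample row shown on the slides.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be read by column name
conn.executescript("""
    CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, ssn TEXT);
    CREATE TABLE condition (patient_id INTEGER REFERENCES patient(id), c_id INTEGER);
    INSERT INTO patient VALUES (12345, 'John', 37, 'xxx-999');
    INSERT INTO condition VALUES (12345, 3333);
""")

# Hypothetical encoding of a mapping: a source query plus triple templates
# whose placeholders are the column names returned by that query.
MAPPINGS = [
    # Assumption: patient_id is aliased to id to match the table on the slide.
    ("SELECT patient_id AS id FROM condition WHERE c_id = 3333",
     [("<{id}>", "rdf:type", ":CardiacArrestPatient")]),
    ("SELECT id, name, age, ssn FROM patient",
     [("<{id}>", "rdf:type", ":Patient"),
      ("<{id}>", ":name", "{name}"),
      ("<{id}>", ":age", "{age}"),
      ("<{id}>", ":ssn", "{ssn}")]),
]


def materialize(connection, mappings):
    """Run every source query and instantiate its triple templates per row."""
    triples = []
    for sql, templates in mappings:
        for row in connection.execute(sql):
            values = dict(row)
            for subject, predicate, obj in templates:
                triples.append(
                    (subject.format(**values), predicate, obj.format(**values))
                )
    return triples


triples = materialize(conn, MAPPINGS)
for triple in triples:
    print(*triple)
```

In virtual mode these triples are never actually produced; the sketch only makes explicit what mat(D,M) would contain for the sample data.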
  • The Pay-off
    • At least
      • The source is documented
      • Data handling can be done automatically (by the reasoner)
      • Reduced cost of application development and maintenance
      • The reasoner can analyze source and mappings to minimize the cost of inference
    • The sweet spot
      • On-the-fly data access
      • Reasoning by query rewriting
      • Exploitation of efficient engines
  • QUERY REWRITING
  • Query Rewriting in a Nutshell
    • Given a query Q, a TBox T and an OBDA model <D, M>, compute a query Q’ such that:
      answer(Q, T, mat(D,M)) = answer(Q’, D)
      where mat(D,M) is the collection of assertions resulting from “materializing” the mappings into ABox assertions (assertional triples)
  • Example OBDA model
    Mappings:
      SELECT id FROM condition WHERE c_id = 3333 ⤳ CardiacArrestPatient(?id) → q(?id)
      SELECT id,name,age,ssn FROM patient ⤳ Patient(?id) ^ name(?id,?name) ^ age(?id,?age) ^ ssn(?id,?ssn) → q(?id,?name,?age,?ssn)
    Table: patient
      id [PKEY] | name | age | ssn
      12345     | John | 37  | xxx-999
      …         | …    | …   | …
    Table: condition
      patient_id [FKEY] | c_id [FKEY]
      12345             | 3333
      …                 | …
  • Query Rewriting: An example
    Ontology (TBox)
      SubClassOf(:CardiacArrest :HeartCondition)
      SubClassOf(:CardiacArrestPatient :Patient)
      SubClassOf(:CardiacArrestPatient ObjectSomeValuesFrom(:affectedBy :CardiacArrest))
    Query (SPARQL)
      SELECT ?p ?name ?ssn WHERE {
        ?p a :Patient; :name ?name; :ssn ?ssn; :age ?age;
           :affectedBy [ a :HeartCondition ].
        FILTER (?age >= 21 && ?age <= 50) }
  • Query Rewriting: An example
    Rewritten query
      SELECT ?p ?name ?ssn WHERE {
        { ?p a :Patient; :name ?name; :ssn ?ssn; :age ?age;
             :affectedBy [ a :HeartCondition ].
          FILTER (?age >= 21 && ?age <= 50) }
        UNION
        { ?p a :Patient; :name ?name; :ssn ?ssn; :age ?age;
             :affectedBy [ a :CardiacArrest ].
          FILTER (?age >= 21 && ?age <= 50) }
        UNION
        { ?p a :Patient; :name ?name; :ssn ?ssn; :age ?age;
             a :CardiacArrestPatient.
          FILTER (?age >= 21 && ?age <= 50) }
        UNION … }
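How the UNION branches arise can be illustrated with a deliberately simplified sketch. It only expands class atoms along SubClassOf axioms; the existential axiom on :affectedBy, which is what produces the :CardiacArrestPatient branch on the slide, is not handled. The names `SUBCLASS_OF`, `subclasses` and `rewrite` are hypothetical and do not come from any of the systems presented.

```python
from itertools import product

# The two atomic subclass axioms from the example TBox, as (subclass, superclass).
SUBCLASS_OF = [
    (":CardiacArrest", ":HeartCondition"),
    (":CardiacArrestPatient", ":Patient"),
]


def subclasses(cls, axioms):
    """Return cls together with every class that is transitively below it."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in axioms:
            if sup == current and sub not in found:
                found.add(sub)
                frontier.append(sub)
    return found


def rewrite(class_atoms, axioms):
    """Expand a conjunction of (variable, class) atoms into a union of conjunctions."""
    choices = [
        [(var, c) for c in sorted(subclasses(cls, axioms))]
        for var, cls in class_atoms
    ]
    return [list(branch) for branch in product(*choices)]


# Class atoms of the example query: ?p is a :Patient, ?c is a :HeartCondition.
query = [("?p", ":Patient"), ("?c", ":HeartCondition")]
union = rewrite(query, SUBCLASS_OF)
for branch in union:
    print(" AND ".join(f"{var} a {cls}" for var, cls in branch))
```

Each atom contributes one alternative per subclass, and the branches are the cross product of those alternatives, which is why the number of branches multiplies as atoms and hierarchy depth grow.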
  • Query Rewriting: An example
    SQL query
      SELECT tp.id AS p, tp.name AS name, tp.ssn AS ssn
      FROM patient tp JOIN condition tc ON tp.id = tc.patient_id
      WHERE tc.c_id = 3333 AND tp.age >= 21 AND tp.age <= 50
    Answer
      ?p    | ?name | ?ssn
      12345 | John  | xxx-999
    “Fast execution even in the presence of millions of assertions”
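The final SQL can be checked against the sample data with Python's built-in `sqlite3` module. This is a sketch under assumptions: the query uses the alias `tc` consistently and projects `ssn` so that the columns match the answer table on the slide, and the second patient row is made up purely to show that the age filter and the join exclude non-matching rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, ssn TEXT);
    CREATE TABLE condition (patient_id INTEGER REFERENCES patient(id), c_id INTEGER);
    INSERT INTO patient VALUES (12345, 'John', 37, 'xxx-999');
    -- Made-up row, not from the slides: same condition but outside the age range.
    INSERT INTO patient VALUES (67890, 'Jane', 62, 'yyy-000');
    INSERT INTO condition VALUES (12345, 3333);
    INSERT INTO condition VALUES (67890, 3333);
""")

# SQL produced by unfolding the rewritten query through the mappings.
REWRITTEN_SQL = """
    SELECT tp.id AS p, tp.name AS name, tp.ssn AS ssn
    FROM patient tp JOIN condition tc ON tp.id = tc.patient_id
    WHERE tc.c_id = 3333 AND tp.age >= 21 AND tp.age <= 50
"""

rows = conn.execute(REWRITTEN_SQL).fetchall()
print(rows)
```

No triples are created at any point: the ontology-level query is answered entirely by the relational engine.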
  • That Simple?
    • Warning: query rewritings can easily grow exponentially.
    • Effective query rewriting requires:
      • A highly efficient rewriting algorithm that is able to detect redundancy
      • Highly efficient SQL generation:
        • Detect redundant SQL (w.r.t. constraints and mappings)
        • Optimize individual SQL queries (w.r.t. constraints and mappings)
        • Generate optimal SQL (w.r.t. the database engine)
      • The ability to deal with impedance mismatch (URIs and literals vs. data values)
      • Database engine tuning (indexing, buffers, disk, etc.)
    • Effective query rewriting gives you:
      • Fast system initialization
      • Small footprint
      • Fast query execution
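One simple form of redundancy detection can be sketched as follows. It relies on a sufficient condition only: if the atoms of one UNION branch are a proper subset of the atoms of another, the larger branch is more constrained, so it can never return an answer the smaller one misses and can be dropped. Real rewriting algorithms use more general containment checks; `prune_redundant` and the sample branches are hypothetical.

```python
def prune_redundant(union):
    """Drop duplicate branches and branches whose atoms strictly contain another's."""
    branches = [frozenset(branch) for branch in union]
    unique = list(dict.fromkeys(branches))  # remove exact duplicates, keep order
    return [
        branch
        for branch in unique
        if not any(other < branch for other in unique)  # '<' is proper subset
    ]


# Three branches over the same variable; the first repeats what the others ask.
union = [
    [("?p", ":Patient"), ("?p", ":CardiacArrestPatient")],
    [("?p", ":CardiacArrestPatient")],
    [("?p", ":Patient")],
    [("?p", ":Patient")],  # exact duplicate
]

kept = prune_redundant(union)
for branch in kept:
    print(sorted(branch))
```

The check is purely syntactic and quadratic in the number of branches, so it is only meant to show the idea of pruning before SQL generation.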
  • Efficient Languages (for pure query rewriting)
    • RDFS, DL-Lite, OWL 2 QL
    • Datalog±
    • DL-Lite/OWL 2 QL/Datalog± fragments of SWRL
    Promising Languages (for combined approaches)
    • EL++ and OWL 2 EL
    • OWL-Horst and OWL 2 RL
    • SWRL with limited recursion
  • SYSTEMS OBDALib, OBDA Plugin for Protégé 4
  • OBDA as an Architecture
    [Diagram labels: Ontology, Reasoner, OBDA Model, Source, Application, Communication, Inputs]
  • OBDALib
    A Java library for:
    • OBDA Model creation and manipulation
    • OBDA Model persistence
    • Interfaces for OBDA-capable reasoners
    • SQL parsing and Datalog translation
    • RDBMS metadata extraction libraries
    • OBDA model materialization
    In the near future:
    • Automatic OBDA model generation (compatible with W3C’s RDB2RDF direct mapping)
    • Support for W3C’s R2RML syntax
  • OBDA Plugin for Protégé 4 “A plugin to write and test OBDA models and interact with OBDA-capable reasoners”
  • OBDA Model tab and tools
  • OBDA Model tab and tools
  • OBDA Model synch
    An EditorKitHook plugin to:
    • Associate an OBDA model with the editor environment
    • Synchronize OBDA models with OBDA-capable reasoners
  • DataQuery Tab
  • SYSTEMS Quest
  • Quest
    An OBDA-capable reasoner with a focus on fast and efficient query answering over very large ontologies and volumes of data.
    Features:
    • Support for RDFS, OWL 2 QL and DL-Lite
    • SPARQL
    • On-the-fly reasoning based on query rewriting
    • Read-only “Virtual OBDA”
    • Read/Write “Triple-store” mode
    • Generation of highly optimized SQL
    • OWLAPI 3 and Protégé support
  • Quest in virtual mode
    [Diagram labels: Ontology, Quest, OBDA Model, Source (MySQL, PostgreSQL, DB2 and Oracle), Application, JDBC, Inputs]
  • Data integration with Quest in virtual mode
    [Diagram labels: Ontology, Quest, OBDA Model, Database Federator (e.g., Teiid), Application, JDBC, Inputs]
  • Read/Write triple-store mode
    [Diagram labels: Ontology, Quest, Triples, JDBC, Storage, Application, JDBC]
    Storage is based on the Semantic Index technique (ISWC11, KR12), a technique based on “smart index” computation that allows hierarchy inferences to be retrieved by means of interval queries (FAST SQL!)
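The idea of answering hierarchy inferences with interval queries can be illustrated with a small sketch. It is not the published Semantic Index algorithm: it assumes the class hierarchy is a forest, numbers classes in depth-first order so that each class and its subclasses occupy one contiguous interval, and stores only asserted memberships. The table `class_assertion`, the helper names and the individual `<777>` are hypothetical.

```python
import sqlite3

# Class hierarchy from the example TBox, as parent -> direct subclasses.
CHILDREN = {
    ":Patient": [":CardiacArrestPatient"],
    ":HeartCondition": [":CardiacArrest"],
}
ROOTS = [":Patient", ":HeartCondition"]


def assign_intervals(roots, children):
    """Number classes depth-first; a class's interval covers all its subclasses."""
    intervals = {}
    counter = 0

    def visit(cls):
        nonlocal counter
        counter += 1
        start = counter
        for child in children.get(cls, []):
            visit(child)
        intervals[cls] = (start, counter)

    for root in roots:
        visit(root)
    return intervals


intervals = assign_intervals(ROOTS, CHILDREN)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE class_assertion (individual TEXT, idx INTEGER)")
# Only asserted types are stored, using the start index of the asserted class.
conn.execute("INSERT INTO class_assertion VALUES (?, ?)",
             ("<12345>", intervals[":CardiacArrestPatient"][0]))
conn.execute("INSERT INTO class_assertion VALUES (?, ?)",
             ("<777>", intervals[":CardiacArrest"][0]))  # made-up individual


def instances_of(connection, cls, class_intervals):
    """Retrieve asserted and inferred instances of cls with one range query."""
    low, high = class_intervals[cls]
    rows = connection.execute(
        "SELECT individual FROM class_assertion "
        "WHERE idx BETWEEN ? AND ? ORDER BY individual",
        (low, high),
    )
    return [row[0] for row in rows]


print(instances_of(conn, ":Patient", intervals))
```

The individual asserted only as a :CardiacArrestPatient is returned for :Patient without storing any inferred triple, because the subclass index falls inside the superclass interval.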
  • Performance in triple-store mode: Resource Index Experiments
    • Input:
      • Ontology: the asserted is-a relations in obs_relation (for all RI ontologies)
      • Data: the annotations for ClinicalTrials.gov
      • Queries, e.g., SELECT ?x WHERE { ?x a :DNA_Repair_Gene; a :Antigen_Gene; a :Cancer_Gene. }
  • Performance in triple-store mode: Resource Index Experiments
    • System setup costs:
      • Resource Index workflow:
        • Ontology Closure: X ?
        • CT annotation closure: 7 days (naïve), 40 mins optimized
        • Space requirements for CT: 16 GB + isa-closure: 70 GB
      • Using a naïve implementation of Quest’s reasoning technique for the RI:
        • Ontology Closure: 5 mins
        • CT annotation closure: none
        • Space requirements for CT: 16 GB
    • Execution speed: roughly the same
    • Potential to eliminate all _isa_annotation_tables and the closure of relation_isa.
  • DEMO
  • CONCLUSIONS
  • Summary
    • OBDA as an architecture
    • Benefits: Software Complexity, Optimization and On-the-fly query answering
    • Basis of query rewriting in OBDA
    • Introduced
      • OBDALib
      • OBDA Plugin for Protégé
      • Quest
    • Briefly mentioned the performance advantages of Quest’s reasoning technique
  • Where to go now?
    • Resource index overhauling?
    • Demos?
    • More detail on the techniques?
    • More details on the systems?
    • Development and plugins for Protégé
    • Projects?!
    • You call it :)
  • THANK YOU