
Query Translation for Ontology-extended Data Sources

Published in: Education, Technology

  1. 1. Query Translation for Ontology-extended Data Sources. Jie Bao (1), Doina Caragea (2), Vasant Honavar (1). (1) Artificial Intelligence Research Laboratory, Department of Computer Science, Iowa State University, Ames, IA 50011-1040, USA, {baojie, honavar}@cs.iastate.edu. (2) Department of Computing and Information Sciences, Kansas State University, Manhattan, KS 66506, USA, {dcaragea}@ksu.edu
  2. 2. INDUS Group: Vasant Honavar, Jie Bao, Doina Caragea, Jyotishman Pathak, Neeraj Koul
  3. 3. Outline <ul><li>Ontology-Extended Data Source </li></ul><ul><ul><li>Schema, Data, and Ontology </li></ul></ul><ul><li>Query Translation for OEDS </li></ul><ul><ul><li>Ontology mapping, query translation / soundness / completeness </li></ul></ul><ul><li>Implementation and Optimization </li></ul><ul><ul><li>The INDUS system </li></ul></ul><ul><li>Conclusion </li></ul>
  4. 4. Background <ul><li>Data revolution </li></ul><ul><li>Bioinformatics </li></ul><ul><ul><li>Over 200 data repositories of interest to molecular biologists alone </li></ul></ul><ul><li>Environmental Informatics </li></ul><ul><li>Enterprise Informatics </li></ul><ul><li>Medical Informatics </li></ul><ul><li>Social Informatics ... </li></ul><ul><li>Connectivity revolution (Internet and the web) </li></ul><ul><li>Integration revolution </li></ul><ul><li>Need to understand the elephant as opposed to examining the trunk, the tail, etc. </li></ul><ul><li>Needed – infrastructure to support collaborative, integrative analysis of data </li></ul>
  5. 5. Solution: INDUS for Learning from Semantically Heterogeneous Distributed Autonomous Data Sources
  6. 6. (Relational) Data Source
     D (Data Set): extensional definition (facts)
       Student(name, status): (Alice, First-year), (Bob, MSc)
       Classes(code, name): (CS103, data structure), (CS511, algorithm)
       Registers(instructor, class): (Alice, CS103), (Bob, CS511)
     S (Schema): intensional definition
       Classes(name:String, code:String); Faculty(rank:String, name:String); Student(name:String, status:String); relationships Teaches and Registers
  7. 7. Semantic Extensions of Data Sources. Two queries that the data set D and schema S alone leave open: "Return classes that graduate students are registered in" and "Return all people in the database". (Figure: the Student, Classes and Registers tables from the previous slide.)
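  8. 8. Ontology-Extended Data Source
     Schema: Classes(name:String, code:String); Instructor(rank:String, name:String); Student(name:String, status:String); relationships Teaches and registers
     Schema ontology: People, with Student and Instructor below it
     Data: Student(name, status): (Alice, First-year), (Bob, MSc)
     Data content ontology (hierarchy of status values, by level): student; Undergrad, Graduate; First-year, MSc, Fourth-year, …, PhD, MA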
  9. 9. Ontology-Extended Data Source. D: Data Set; S: Schema; O_S: Schema Ontology; O_D: Data Content Ontology. The figure also shows O'_S and O'_D.
  10. 10. Ontology-Extended Data Source <ul><li>Relational Model (Reiter, 1982) </li></ul><ul><ul><li>Schema S: a first-order language with predicate symbols R_S, one for each relational table (e.g. Classes, Faculty) </li></ul></ul><ul><ul><li>Data Set D: a first-order interpretation of S with domain Δ </li></ul></ul><ul><li>Ontology-Extended (Relational) Data Source (Caragea et al. 2004) </li></ul><ul><ul><li>Extends the relational model with </li></ul></ul><ul><ul><li>Schema Ontology: a first-order language L_OS with predicate symbols R_OS, where R_S ⊆ R_OS </li></ul></ul><ul><ul><li>Data Content Ontology: O_OD = (L_OD, D_OD) </li></ul></ul><ul><ul><ul><li>L_OD: a first-order language with predicate symbols R_OD, where R_OD ∩ R_S = ∅ </li></ul></ul></ul><ul><ul><ul><li>D_OD: a first-order interpretation of L_OD with domain Δ', where Δ ⊆ Δ' </li></ul></ul></ul>
  11. 11. OEDS: Example
     S: Instructor(x,y); Classes(x,y); Student(x,y); …
     D: Student(name, status): (Alice, First-year), (Bob, MSc)
     Schema ontology (L_OS): ∀x,y, Student(x,y) ∨ Instructor(x,y) → People(x)
     Data content ontology O_D (L_OD, D_OD): isa(x,y) ∧ isa(y,z) → isa(x,z); isa(First-year,Undergraduate), isa(Undergraduate,Student), isa(MSc,Graduate), …
     (Figure: the schema diagram with Classes, Instructor, Teaches, Student and registers. See survey [Shvaiko & Euzenat 2005].)
  12. 12. Outline <ul><li>Ontology-Extended Data Source </li></ul><ul><ul><li>Schema, Data, and Ontology </li></ul></ul><ul><li>Query Translation for OEDS </li></ul><ul><ul><li>Ontology mapping, query translation / soundness / completeness </li></ul></ul><ul><li>Implementation and Optimization </li></ul><ul><ul><li>The INDUS system </li></ul></ul><ul><li>Conclusion </li></ul>
  13. 13. Query <ul><li>Tuple Relational Calculus (TRC) </li></ul><ul><ul><li>Tuple: a multiset of attributes </li></ul></ul><ul><ul><li>TRC ≡ Relational Algebra (equivalent in expressive power) </li></ul></ul><ul><ul><li>q(t) := Student(t) ∧ (t.status="Graduate") </li></ul></ul><ul><li>Ontology-Extended Tuple Relational Calculus </li></ul><ul><ul><li>q(t) := Student(t) ∧ isa(t.status, Graduate) </li></ul></ul><ul><li>We focus on data content ontologies in this talk </li></ul>
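The difference between the two queries on this slide can be sketched in Python. This is an illustration only, not INDUS code: it reuses the Student table and the isa facts from the OEDS example slide, the helper names (`isa`, `plain_query`, `ontology_query`) are hypothetical, and treating isa as reflexive is an assumption of the sketch.

```python
# Student table from the OEDS example slide: (name, status) tuples.
STUDENT = [("Alice", "First-year"), ("Bob", "MSc")]

# Direct isa facts from the data content ontology on the same slide.
ISA_FACTS = {
    ("First-year", "Undergraduate"),
    ("Undergraduate", "Student"),
    ("MSc", "Graduate"),
}


def isa(x, y, facts=ISA_FACTS):
    """Return True if isa(x, y) follows from the facts by transitivity.

    Assumption of this sketch: isa is also reflexive, so isa(x, x) holds.
    """
    if x == y:
        return True
    seen, frontier = {x}, [x]
    while frontier:
        term = frontier.pop()
        for child, parent in facts:
            if child == term and parent not in seen:
                if parent == y:
                    return True
                seen.add(parent)
                frontier.append(parent)
    return False


def plain_query(status):
    """q(t) := Student(t) AND t.status = status (string equality only)."""
    return [t for t in STUDENT if t[1] == status]


def ontology_query(concept):
    """q(t) := Student(t) AND isa(t.status, concept)."""
    return [t for t in STUDENT if isa(t[1], concept)]


print(plain_query("Graduate"))     # no tuple stores the literal "Graduate"
print(ontology_query("Graduate"))  # Bob qualifies because isa(MSc, Graduate)
```

The plain TRC query compares strings, so it misses Bob; the ontology-extended query finds him through the isa hierarchy.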
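  14. 14. Query Translation. A user query q, posed against the data source (D, S) in terms of the user ontology O_1, is translated through the ontology mapping M into a query q' against (D, S) in terms of the data source ontology O_2.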
  15. 15. Ontology Mapping <ul><li>isa_1(x, c_1) ∧ into(c_1, c_2) → isa_2(x, c_2) </li></ul><ul><li>isa_1(c_1, x) ∧ onto(c_1, c_2) → isa_2(c_2, x) </li></ul><ul><li>… </li></ul> (Figure: two status hierarchies, one with the terms Student, Undergrad, Graduate, First-year, MSc, Fourth-year, …, PhD, MA and the other with the terms Student, Undergrad, Postgraduate, Freshman, …, Doctoral, Master, connected by into, onto and equ mappings and labeled isa_1 and isa_2.)
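The two mapping rules above can be read as set comprehensions over isa facts. The sketch below applies them literally; it is not taken from INDUS. The concrete mapping pairs, the isa_1 facts (including the reflexive fact for Master) and the function names are assumptions chosen to match the Master / MSc / Graduate example on the later slides.

```python
# Assumed mapping constraints, each a pair (c1, c2) with c1 from the user
# ontology O1 and c2 from the data source ontology O2.
INTO = {("Master", "Graduate")}  # assumed: Master maps into Graduate
ONTO = {("Master", "MSc")}       # assumed: Master maps onto MSc

# Assumed isa_1 facts of the user ontology, as (sub, super) pairs.
# The reflexive fact (Master, Master) is an assumption of this sketch.
ISA1 = {("Master", "Master"), ("Master", "Postgraduate")}


def apply_into(isa1, into):
    """Rule: isa_1(x, c1) ^ into(c1, c2) -> isa_2(x, c2)."""
    return {(x, c2) for (x, c1) in isa1 for (m1, c2) in into if c1 == m1}


def apply_onto(isa1, onto):
    """Rule: isa_1(c1, x) ^ onto(c1, c2) -> isa_2(c2, x)."""
    return {(c2, x) for (c1, x) in isa1 for (m1, c2) in onto if c1 == m1}


print(sorted(apply_into(ISA1, INTO)))
print(sorted(apply_onto(ISA1, ONTO)))
```

Read this way, into places the user term below a data source term, and onto places a data source term below the user term.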
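  16. 16. Query Translation. The user query q := Student(t) ∧ isa_1(t.status, Master), posed over (D, S) with ontology O_1, is translated through the mapping M into a query q' over (D, S) with ontology O_2. Two candidate translations: Student(t) ∧ isa_2(t.status, Graduate) and Student(t) ∧ isa_2(t.status, MSc).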
  17. 17. Soundness, Completeness and Exactness, for q := Student(t) ∧ isa_1(t.status, Master), where {q} is the answer set of q
     Sound translation, {q'} ⊆ {q}: q' := Student(t) ∧ isa_2(t.status, MSc)
     Complete translation, {q} ⊆ {q'}: q' := Student(t) ∧ isa_2(t.status, Graduate)
     Exact translation, {q} = {q'}: non-existent in this example
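The three notions on this slide reduce to set containment between answer sets, which a few lines of Python can make concrete. This is a sketch under stated assumptions: the student names and the answer sets are hypothetical, and comparing answers on a single data set only illustrates the definitions; it does not prove that a translation is sound or complete in general.

```python
def classify_translation(q_answers, q_prime_answers):
    """Compare the answer sets of q and q' on one data set."""
    q, q_prime = set(q_answers), set(q_prime_answers)
    sound = q_prime <= q      # {q'} is a subset of {q}
    complete = q <= q_prime   # {q} is a subset of {q'}
    if sound and complete:
        return "exact"
    if sound:
        return "sound"
    if complete:
        return "complete"
    return "neither"


# Hypothetical answer sets: Bob holds an MSc, Carol an MA and Dave a PhD,
# and the user's term "Master" is assumed to cover both MSc and MA.
master = {"Bob", "Carol"}            # answers to q  (isa_1 ... Master)
msc = {"Bob"}                        # answers to q' (isa_2 ... MSc)
graduate = {"Bob", "Carol", "Dave"}  # answers to q' (isa_2 ... Graduate)

print(classify_translation(master, msc))
print(classify_translation(master, graduate))
```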
  18. 18. Most Informative Translation. Ontology O_1 contains c_1; ontology O_2 contains d_1 and d_2, and c_1 is mapped onto d_1 and onto d_2. For q := isa_1(x, c_1), find its sound translation(s): isa_2(x, d_1) and isa_2(x, d_2). The most informative sound translation is their LUB (least upper bound): isa_2(x, d_1) ∨ isa_2(x, d_2).
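For an atomic query, the construction on this slide amounts to collecting every O_2 term that c_1 is mapped onto and taking the disjunction. The sketch below assumes the mapping is given as a set of onto pairs and that O_2's isa relation is available as a callable; the function name and the pruning of disjuncts already covered by another disjunct are additions of this sketch, not something stated on the slide.

```python
def most_informative_sound(c1, onto, isa2):
    """Return the O2 terms whose disjunction translates isa_1(x, c1).

    onto: set of (c, d) pairs meaning c (in O1) is mapped onto d (in O2).
    isa2: callable, isa2(a, b) is True when a is below b in O2.
    """
    targets = {d for (c, d) in onto if c == c1}
    # Drop a term when another target already covers it in O2, since the
    # disjunction would return the same answers without it.
    maximal = {
        d for d in targets
        if not any(d != e and isa2(d, e) for e in targets)
    }
    return sorted(maximal)


# The situation drawn on the slide: c1 is mapped onto d1 and onto d2,
# and d1 and d2 are unrelated in O2.
onto = {("c1", "d1"), ("c1", "d2")}
unrelated = lambda a, b: a == b

disjuncts = most_informative_sound("c1", onto, unrelated)
print(" v ".join(f"isa_2(x, {d})" for d in disjuncts))
```

Each disjunct alone is a sound translation; their disjunction returns at least as many answers while staying inside the answer set of q.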
  19. 19. Query Translation Rules <ul><li>For hierarchical ontologies </li></ul> (Figure: translation rules for atomic conditions and for complex conditions, with the note "similarly for complete translation of complex queries". GLB = greatest lower bound, LUB = least upper bound.)
  20. 20. Outline <ul><li>Ontology-Extended Data Source </li></ul><ul><ul><li>Schema, Data, and Ontology </li></ul></ul><ul><li>Query Translation for OEDS </li></ul><ul><ul><li>Ontology mapping, query translation / soundness / completeness </li></ul></ul><ul><li>Implementation and Optimization </li></ul><ul><ul><li>The INDUS system </li></ul></ul><ul><li>Conclusion </li></ul>
  21. 21. INDUS Tools <ul><li>Ontology Editor </li></ul><ul><li>Schema Editor </li></ul><ul><li>Mapping Editor </li></ul><ul><li>Data Editor </li></ul><ul><li>Query Engine and Interface </li></ul><ul><li>… </li></ul>
  22. 22. INDUS – Mapping Editor http://sourceforge.net/projects/indus-project/
  23. 23. INDUS – Data Editor http://sourceforge.net/projects/indus-project/
  24. 24. INDUS – Query Editor http://sourceforge.net/projects/indus-project/
  25. 25. Optimization for Scalability <ul><li>Database storage for ontologies </li></ul><ul><li>Using transitive closure for fast inference with hierarchies </li></ul><ul><li>Server-side caching </li></ul><ul><ul><li>Using temporary tables on the data source server </li></ul></ul><ul><li>Client-side caching </li></ul><ul><ul><li>Of remote ontologies and ontology mappings </li></ul></ul>
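The first two optimizations listed above (database storage for ontologies, and transitive closure for fast inference) can be sketched with SQLite from Python's standard library. This is an assumed layout, not the INDUS schema: the table names, column names and the `students_under` helper are hypothetical, and the data is the small Student example from the earlier slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE isa_direct (child TEXT, parent TEXT);
    CREATE TABLE isa_closure (descendant TEXT, ancestor TEXT,
                              PRIMARY KEY (descendant, ancestor));
    CREATE TABLE student (name TEXT, status TEXT);
""")
conn.executemany(
    "INSERT INTO isa_direct VALUES (?, ?)",
    [("First-year", "Undergraduate"),
     ("Undergraduate", "Student"),
     ("MSc", "Graduate")],
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [("Alice", "First-year"), ("Bob", "MSc")],
)

# Materialize the transitive closure once, so that later queries need a
# single join instead of a recursive walk over the hierarchy.
conn.execute("""
    WITH RECURSIVE reach(descendant, ancestor) AS (
        SELECT child, parent FROM isa_direct
        UNION
        SELECT r.descendant, d.parent
        FROM reach AS r JOIN isa_direct AS d ON d.child = r.ancestor
    )
    INSERT INTO isa_closure SELECT descendant, ancestor FROM reach
""")


def students_under(concept):
    """Names of students whose status lies strictly below `concept`."""
    rows = conn.execute(
        """
        SELECT s.name FROM student AS s
        JOIN isa_closure AS c ON c.descendant = s.status
        WHERE c.ancestor = ?
        ORDER BY s.name
        """,
        (concept,),
    )
    return [name for (name,) in rows]


print(students_under("Graduate"))
print(students_under("Student"))
```

The trade-off is the usual one for closure tables: extra storage and a recomputation step when the hierarchy changes, in exchange for cheap lookups at query time.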
  26. 26. Performance <ul><li>O_1: Enzyme Classification (EC) hierarchy (4,564 terms) </li></ul><ul><li>M: SCOP to EC mapping [Richard George et al.] with 15,765 rules </li></ul><ul><li>O_2: SCOP (Structural Classification of Proteins) hierarchy (86,766 terms) </li></ul> (Figure: server, client, data source D and the Internet.)
  27. 27. Performance
  28. 28. Outline <ul><li>Ontology-Extended Data Source </li></ul><ul><ul><li>Schema, Data, and Ontology </li></ul></ul><ul><li>Query Translation for OEDS </li></ul><ul><ul><li>Ontology mapping, query translation / soundness / completeness </li></ul></ul><ul><li>Implementation and Optimization </li></ul><ul><ul><li>The INDUS system </li></ul></ul><ul><li>Conclusion </li></ul>
  29. 29. Conclusion <ul><li>We have studied the query translation process for relational data sources extended with context-specific data content ontologies: </li></ul><ul><ul><li>How to exploit ontologies and mappings for flexibly querying semantically rich data sources </li></ul></ul><ul><ul><li>A query translation strategy that works for hierarchical ontologies </li></ul></ul><ul><ul><li>The conditions under which the soundness and completeness of such a procedure can be guaranteed </li></ul></ul><ul><li>Ongoing Work </li></ul><ul><ul><li>More expressive ontologies, e.g., Description Logics </li></ul></ul><ul><ul><li>Schema ontology + data content ontology </li></ul></ul><ul><ul><li>Statistical learning from OEDS </li></ul></ul>
  30. 30. <ul><li>Thank You! </li></ul>
  31. 31. Semantics Preserving Translation <ul><li>Conservative Extension </li></ul>
