Query Translation for Ontology-extended Data Sources


  1. 1. Query Translation for Ontology-extended Data Sources
       • Jie Bao (1), Doina Caragea (2), Vasant Honavar (1)
       • (1) Artificial Intelligence Research Laboratory, Department of Computer Science, Iowa State University, Ames, IA 50011-1040, USA, {baojie, honavar}@cs.iastate.edu
       • (2) Department of Computing and Information Sciences, Kansas State University, Manhattan, KS 66506, USA, {dcaragea}@ksu.edu
  2. 2. INDUS Group: Vasant Honavar, Jie Bao, Doina Caragea, Jyotishman Pathak, Neeraj Koul
  3. 3. Outline
       • Ontology-Extended Data Source
           • Schema, Data, and Ontology
       • Query Translation for OEDS
           • Ontology mapping, query translation / soundness / completeness
       • Implementation and Optimization
           • The INDUS system
       • Conclusion
  4. 4. Background
       • Data revolution
       • Bioinformatics
           • Over 200 data repositories of interest to molecular biologists alone
       • Environmental Informatics
       • Enterprise Informatics
       • Medical Informatics
       • Social Informatics ...
       • Connectivity revolution (Internet and the web)
       • Integration revolution
       • Need to understand the elephant as opposed to examining the trunk, the tail, etc.
       • Needed: infrastructure to support collaborative, integrative analysis of data
  5. 5. Solution: INDUS for Learning from Semantically Heterogeneous Distributed Autonomous Data Sources
  6. 6. (Relational) Data Source
       • Data Set D: extensional definition (facts)
           • Student(name, status): (Alice, First-year), (Bob, MSc)
           • Classes(code, name): (CS103, data structure), (CS511, algorithm)
           • Registers(instructor, class): (Alice, CS103), (Bob, CS511)
       • Schema S: intensional definition
           • Classes(name:String, code:String)
           • Faculty(rank:String, name:String)
           • Student(name:String, status:String)
           • Relationships: Teaches, Registers
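  7. 7. Semantic Extensions of Data Sources
       • Queries that the data set D and schema S alone leave open (marked "?" on the slide):
           • Return classes that graduate students are registered in
           • Return all people in the database
       • Data set D shown on the slide:
           • Student(name, status): (Alice, First-year), (Bob, MSc)
           • Classes(code, name): (CS103, data structure), (CS511, algorithm)
           • Registers(instructor, class): (Alice, CS103), (Bob, CS511)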
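  8. 8. Ontology-Extended Data Source
       • Schema: Classes(name:String, code:String), Instructor(rank:String, name:String), Student(name:String, status:String); relationships Teaches and registers
       • Schema ontology: People, with Student and Instructor beneath it
       • Data: Student(name, status): (Alice, First-year), (Bob, MSc)
       • Data content ontology: a hierarchy rooted at student with children Undergrad and Graduate; lower-level terms include First-year, …, Fourth-year, MSc, MA, PhD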
  9. 9. Ontology-Extended Data Source
       • $D$: Data Set
       • $S$: Schema
       • $O_S$: Schema Ontology
       • $O_D$: Data Content Ontology
       • Alternative ontologies for the same source: $O'_S$, $O'_D$
  10. 10. Ontology-Extended Data Source
       • Relational Model (Reiter, 1982)
           • Schema $S$: a first-order language with predicate symbols $R_S$, one for each relational table (e.g. Classes, Faculty)
           • Data Set $D$: a first-order interpretation of $S$ with domain $\Delta$
       • Ontology-Extended (Relational) Data Source (Caragea et al. 2004)
           • Extends the relational model with:
           • Schema Ontology: a first-order language $L_{OS}$ with predicate symbols $R_{OS}$, where $R_S \subseteq R_{OS}$
           • Data Content Ontology: $O_{OD} = (L_{OD}, D_{OD})$
               • $L_{OD}$: a first-order language with predicate symbols $R_{OD}$, where $R_{OD} \cap R_S = \emptyset$
               • $D_{OD}$: a first-order interpretation of $L_{OD}$ with domain $\Delta'$, where $\Delta \subseteq \Delta'$
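  11. 11. OEDS: Example
       • $S$: Instructor(x,y), Classes(x,y), Student(x,y), … (see survey [Shvaiko & Euzenat 2005])
       • $D$: Student(name, status): (Alice, First-year), (Bob, MSc)
       • Schema diagram: Classes(name:String, code:String), Instructor(rank:String, name:String), Student(name:String, status:String); relationships Teaches and registers
       • $L_{OS}$: $\forall x, y,\ \mathrm{Student}(x,y) \lor \mathrm{Instructor}(x,y) \rightarrow \mathrm{People}(x)$
       • $L_{OD}$: $\mathrm{isa}(x,y) \land \mathrm{isa}(y,z) \rightarrow \mathrm{isa}(x,z)$
       • $D_{OD}$: isa(First-year, Undergraduate), isa(Undergraduate, Student), isa(MSc, Graduate), …
       • $L_{OD}$ and $D_{OD}$ together form the data content ontology $O_D$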
  12. 12. Outline
       • Ontology-Extended Data Source
           • Schema, Data, and Ontology
       • Query Translation for OEDS
           • Ontology mapping, query translation / soundness / completeness
       • Implementation and Optimization
           • The INDUS system
       • Conclusion
  13. 13. Query
       • Tuple Relational Calculus (TRC)
           • Tuple: a multiset of attributes
           • TRC $\equiv$ Relational Algebra
           • $q(t) := \mathrm{Student}(t) \land (t.\mathrm{status} = \text{"Graduate"})$
       • Ontology-Extended Tuple Relational Calculus
           • $q(t) := \mathrm{Student}(t) \land \mathrm{isa}(t.\mathrm{status}, \mathrm{Graduate})$
       • We focus on data content ontologies in this talk
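The difference between the two queries on this slide can be illustrated with a minimal Python sketch. It is not the INDUS implementation: the table contents come from the earlier example slides, while the edge `Graduate -> Student`, the reflexive reading of `isa`, and the function names `plain_query` and `ontology_query` are assumptions made for illustration.

```python
# Toy Student table taken from the example slides.
STUDENTS = [
    {"name": "Alice", "status": "First-year"},
    {"name": "Bob", "status": "MSc"},
]

# Direct isa edges (child -> parent) of the data content ontology.
# "Graduate" -> "Student" is an assumption; the slide elides it with "...".
ISA = {
    "First-year": "Undergraduate",
    "Undergraduate": "Student",
    "MSc": "Graduate",
    "Graduate": "Student",
}


def isa(term, ancestor):
    """True if term equals ancestor or reaches it by following isa edges.

    Treating isa as reflexive is an assumption of this sketch.
    """
    while term is not None:
        if term == ancestor:
            return True
        term = ISA.get(term)
    return False


def plain_query(status):
    """q(t) := Student(t) AND t.status = status (plain TRC condition)."""
    return [t["name"] for t in STUDENTS if t["status"] == status]


def ontology_query(concept):
    """q(t) := Student(t) AND isa(t.status, concept) (ontology-extended)."""
    return [t["name"] for t in STUDENTS if isa(t["status"], concept)]
```

On this toy data the plain equality condition finds no tuple with status "Graduate", whereas the ontology-extended condition returns Bob, because MSc is below Graduate in the hierarchy.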
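  14. 14. Query Translation
       • [Diagram] The query $q$ is expressed in terms of the user ontology $O_1$; the ontology mapping $M$ is used to translate it into a query $q'$ in terms of the data source ontology $O_2$, which is evaluated over the data source $(D, S)$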
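  15. 15. Ontology Mapping
       • $\mathrm{isa}_1(x, c_1) \land \mathrm{into}(c_1, c_2) \rightarrow \mathrm{isa}_2(x, c_2)$
       • $\mathrm{isa}_1(c_1, x) \land \mathrm{onto}(c_1, c_2) \rightarrow \mathrm{isa}_2(c_2, x)$
       • ……
       • [Diagram] Two hierarchies linked by into, onto, and equ mappings, with $\mathrm{isa}_1$ and $\mathrm{isa}_2$ as their respective isa relations: one with terms Student, Undergrad, Graduate, First-year, …, Fourth-year, MSc, MA, PhD; the other with terms Student, Undergrad, Postgraduate, Freshman, …, Master, Doctoral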
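The two mapping rules on this slide can be read as simple joins over sets of facts. The following Python sketch applies each rule once; the function names and the specific mapping pairs used in the usage note are hypothetical, since the transcript does not preserve which concepts the into and onto links connect.

```python
def apply_into_rule(isa1_facts, into_facts):
    """isa1(x, c1) AND into(c1, c2)  =>  isa2(x, c2).

    Both arguments are sets of pairs; the result is the set of derived
    isa2 pairs.
    """
    return {
        (x, c2)
        for (x, c1) in isa1_facts
        for (m1, c2) in into_facts
        if c1 == m1
    }


def apply_onto_rule(isa1_facts, onto_facts):
    """isa1(c1, x) AND onto(c1, c2)  =>  isa2(c2, x)."""
    return {
        (c2, x)
        for (c1, x) in isa1_facts
        for (m1, c2) in onto_facts
        if c1 == m1
    }
```

For example, with the assumed fact `isa1("Master", "Postgraduate")` and an assumed link `into("Postgraduate", "Graduate")`, the first rule derives `isa2("Master", "Graduate")`; with an assumed link `onto("Master", "MSc")`, the second rule derives `isa2("MSc", "Postgraduate")`.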
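  16. 16. Query Translation
       • $q := \mathrm{Student}(t) \land \mathrm{isa}_1(t.\mathrm{status}, \mathrm{Master})$
       • Candidate translations:
           • $q' := \mathrm{Student}(t) \land \mathrm{isa}_2(t.\mathrm{status}, \mathrm{Graduate})$
           • $q' := \mathrm{Student}(t) \land \mathrm{isa}_2(t.\mathrm{status}, \mathrm{MSc})$
       • [Diagram] $q$ over $O_1$ is translated via $M$ into $q'$ over $O_2$ and the data source $(D, S)$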
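  17. 17. Soundness, Completeness and Exactness
       • Here $\{q\}$ denotes the answer set of $q$
       • Sound Translation: $\{q'\} \subseteq \{q\}$
       • Complete Translation: $\{q\} \subseteq \{q'\}$
       • Exact Translation: $\{q\} = \{q'\}$
       • Example: $q := \mathrm{Student}(t) \land \mathrm{isa}_1(t.\mathrm{status}, \mathrm{Master})$
           • Sound: $q' := \mathrm{Student}(t) \land \mathrm{isa}_2(t.\mathrm{status}, \mathrm{MSc})$
           • Complete: $q' := \mathrm{Student}(t) \land \mathrm{isa}_2(t.\mathrm{status}, \mathrm{Graduate})$
           • Exact: non-existent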
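The three notions compare the answer set of the original query with that of its translation. A small Python sketch of that comparison follows; it checks a single pair of answer sets, whereas soundness and completeness as properties of a translation must hold for every data set, so this is only a sanity check on one instance. The function name and the sample answer sets are hypothetical.

```python
def classify_translation(answers_q, answers_q_prime):
    """Compare the answers of q and q' on one data set.

    Returns "exact" if the sets are equal, "sound" if every answer of q'
    is an answer of q, "complete" if every answer of q is an answer of q',
    and "neither" otherwise.
    """
    q, q_prime = set(answers_q), set(answers_q_prime)
    if q == q_prime:
        return "exact"
    if q_prime <= q:
        return "sound"
    if q <= q_prime:
        return "complete"
    return "neither"
```

With hypothetical answers `{"Bob", "Carol"}` for the Master query, a translation returning `{"Bob"}` is classified as sound and one returning `{"Bob", "Carol", "Dave"}` as complete.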
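  18. 18. Most Informative Translation
       • [Diagram] Concept $c_1$ in $O_1$ is mapped onto concepts $d_1$ and $d_2$ in $O_2$
       • Given $q := \mathrm{isa}_1(x, c_1)$, find its sound translation(s):
           • $\mathrm{isa}_2(x, d_1)$
           • $\mathrm{isa}_2(x, d_2)$
           • $\mathrm{isa}_2(x, d_1) \lor \mathrm{isa}_2(x, d_2)$: the most informative sound translation, i.e. the LUB (least upper bound) of the sound translations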
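One way to read this slide is that each concept reached from $c_1$ by an onto link yields a sound translation, and their disjunction is the most informative one. The Python sketch below follows that reading for atomic conditions only; the function names and the textual output format are assumptions made for illustration.

```python
def sound_targets(c1, onto_facts):
    """All concepts d with onto(c1, d), in sorted order for stable output."""
    return sorted(d for (c, d) in onto_facts if c == c1)


def most_informative_sound(c1, onto_facts):
    """Disjunction of isa2(x, d) over every d that c1 is mapped onto.

    Each disjunct alone is a sound translation of isa1(x, c1); their
    disjunction covers the answers of all of them.
    """
    targets = sound_targets(c1, onto_facts)
    return " or ".join(f"isa2(x, {d})" for d in targets)
```

For the mapping `{("c1", "d1"), ("c1", "d2")}` this produces the condition `isa2(x, d1) or isa2(x, d2)`.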
  19. 19. Query Translation Rules
       • For hierarchical ontologies
       • [Figure: translation rules for atomic conditions and for complex conditions (similarly for complete translation of complex queries)]
       • GLB = greatest lower bound, LUB = least upper bound
  20. 20. Outline
       • Ontology-Extended Data Source
           • Schema, Data, and Ontology
       • Query Translation for OEDS
           • Ontology mapping, query translation / soundness / completeness
       • Implementation and Optimization
           • The INDUS system
       • Conclusion
  21. 21. INDUS Tools
       • Ontology Editor
       • Schema Editor
       • Mapping Editor
       • Data Editor
       • Query Engine and Interface
       • …
  22. 22. INDUS – Mapping Editor http://sourceforge.net/projects/indus-project/
  23. 23. INDUS – Data Editor http://sourceforge.net/projects/indus-project/
  24. 24. INDUS – Query Editor http://sourceforge.net/projects/indus-project/
  25. 25. Optimization for Scalability
       • Database storage for ontologies
       • Using transitive closure for fast inference with hierarchies
       • Server-side caching
           • Using temporary tables on the data source server
       • Client-side caching
           • Of remote ontologies and ontology mappings
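The transitive-closure optimization can be sketched in a few lines of Python: all (descendant, ancestor) pairs are computed once, after which an isa test is a single set lookup. This is an in-memory illustration only and makes no claim about how INDUS stores the closure in its database; the function name is hypothetical.

```python
def transitive_closure(edges):
    """Return every (descendant, ancestor) pair implied by direct isa edges.

    edges is a set of (child, parent) pairs. A depth-first walk is run from
    each child; the seen set guards against repeated nodes and cycles.
    """
    parents = {}
    for child, parent in edges:
        parents.setdefault(child, set()).add(parent)

    closure = set()
    for start in parents:
        stack = list(parents[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(parents.get(node, ()))
    return closure
```

With the isa facts from the example slide, the closure contains the derived pair `("First-year", "Student")` in addition to the three direct edges, so the query engine does not need to chase edges at query time.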
  26. 26. Performance
       • $O_1$: Enzyme Classification (EC) hierarchy (4,564 terms)
       • $M$: SCOP to EC mapping [Richard George et al.] with 15,765 rules
       • $O_2$: SCOP (Structural Classification of Proteins) hierarchy (86,766 terms)
       • [Diagram] Client and server, with the data set $D$, connected over the Internet
  27. 27. Performance
  28. 28. Outline
       • Ontology-Extended Data Source
           • Schema, Data, and Ontology
       • Query Translation for OEDS
           • Ontology mapping, query translation / soundness / completeness
       • Implementation and Optimization
           • The INDUS system
       • Conclusion
  29. 29. Conclusion
       • We have studied the query translation process for relational data sources extended with context-specific data content ontologies:
           • How to exploit ontologies and mappings for flexibly querying semantically rich data sources
           • A query translation strategy that works for hierarchical ontologies
           • The conditions under which the soundness and completeness of such a procedure can be guaranteed
       • Ongoing Work
           • More expressive ontologies, e.g., Description Logics
           • Schema ontology + data content ontology
           • Statistical learning from OEDS
  30. 30. Thank You!
  31. 31. Semantics Preserving Translation
       • Conservative Extension
