Query Translation for Ontology-extended Data Sources
Jie Bao (1), Doina Caragea (2), Vasant Honavar (1)
(1) Artificial Intelligence Research Laboratory, Department of Computer Science, Iowa State University, Ames, IA 50011-1040, USA {baojie, honavar}@cs.iastate.edu
(2) Department of Computing and Information Sciences, Kansas State University, Manhattan, KS 66506, USA {dcaragea}@ksu.edu
INDUS Group Vasant Honavar Jie Bao Doina Caragea Jyotishman Pathak Neeraj Koul
Outline Ontology-Extended Data Source Schema, Data, and Ontology Query Translation for OEDS Ontology mapping, query translation / soundness / completeness Implementation and Optimization The INDUS system Conclusion
Background Data revolution Bioinformatics Over 200 data repositories of interest to molecular biologists alone Environmental Informatics Enterprise Informatics  Medical Informatics Social Informatics ... Connectivity revolution  (Internet and the web) Integration revolution   Need to understand the elephant as opposed to examining the trunk, the tail, etc. Needed –  infrastructure to support collaborative, integrative analysis of data
Solution: INDUS for Learning from Semantically Heterogeneous Distributed Autonomous Data Sources
(Relational) Data Source
D: Data Set, Extensional Definition (Facts)
  Student(name, status): (Alice, First-year), (Bob, MSc)
  Classes(code, name): (CS103, data structure), (CS511, algorithm)
  Registers(instructor, class): (Alice, CS103), (Bob, CS511)
S: Schema, Intensional Definition
  Classes (name:String, code:String), Faculty (rank:String, name:String), Student (name:String, status:String); relationships: Teaches, Registers
Semantic Extensions of Data Sources
Return classes that graduate students are registered in?
Return all people in the database?
D, S:
  Student(name, status): (Alice, First-year), (Bob, MSc)
  Classes(code, name): (CS103, data structure), (CS511, algorithm)
  Registers(instructor, class): (Alice, CS103), (Bob, CS511)
Ontology-Extended Data Source
Schema: Classes (name:String, code:String), Instructor (rank:String, name:String), Student (name:String, status:String); relationships: Teaches, registers
Hierarchy over the schema: People > Student, Instructor
Data: Student(name, status): (Alice, First-year), (Bob, MSc)
Hierarchy over the data values: student > Undergrad, Graduate; leaf terms: First-year, MSc, Fourth-year, …, PhD, MA
Ontology-Extended Data Source
D: Data Set; S: Schema; O_S: Schema Ontology; O_D: Data Content Ontology; O'_S; O'_D
Ontology-Extended Data Source
Relational Model (Reiter, 1982)
  Schema S: a first-order language with predicate symbols R_S, one for each relational table (e.g. Classes, Faculty)
  Data Set D: a first-order interpretation of S with domain Δ
Ontology-Extended (Relational) Data Source (Caragea et al. 2004): extending the relational model with
  Schema Ontology: a first-order language L_OS with predicate symbols R_OS, and R_S ⊆ R_OS
  Data Content Ontology: O_D = (L_OD, D_OD)
    L_OD: a first-order language with predicate symbols R_OD, R_OD ∩ R_S = ∅
    D_OD: a first-order interpretation of L_OD with domain Δ', Δ ⊆ Δ'
OEDS: Example
S: Instructor(x,y); Classes(x,y); Student(x,y); … see survey [Shvaiko & Euzenat 2005]
D: Student(name, status): (Alice, First-year), (Bob, MSc)
Schema: Classes (name:String, code:String), Instructor (rank:String, name:String), Student (name:String, status:String); relationships: Teaches, registers
L_OS: ∀x,y, Student(x,y) ∨ Instructor(x,y) → People(x)
L_OD: isa(x,y) ∧ isa(y,z) → isa(x,z)
D_OD: isa(First-year, Undergraduate), isa(Undergraduate, Student), isa(MSc, Graduate), …
O_D = (L_OD, D_OD)
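The transitivity rule and the isa facts on this slide can be tried out with a short script. This is only a sketch: the names `ISA_FACTS` and `transitive_closure` are hypothetical, and only the three facts actually listed on the slide are included (the elided "…" facts are left out).

```python
# Hypothetical sketch: the isa facts listed on the slide (D_OD), closed under
# the transitivity rule isa(x,y) ∧ isa(y,z) → isa(x,z).
ISA_FACTS = {
    ("First-year", "Undergraduate"),
    ("Undergraduate", "Student"),
    ("MSc", "Graduate"),
}


def transitive_closure(facts):
    """Repeatedly add isa(x, z) whenever isa(x, y) and isa(y, z) are present."""
    closure = set(facts)
    changed = True
    while changed:
        changed = False
        for (x, y) in list(closure):
            for (y2, z) in list(closure):
                if y == y2 and (x, z) not in closure:
                    closure.add((x, z))
                    changed = True
    return closure


ISA = transitive_closure(ISA_FACTS)
print(sorted(ISA))
```

With only the listed facts, the closure adds isa(First-year, Student); nothing links MSc to Student because isa(Graduate, Student) is not among the facts shown.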
Outline Ontology-Extended Data Source Schema, Data, and Ontology Query Translation for OEDS Ontology mapping, query translation / soundness / completeness Implementation and Optimization The INDUS system Conclusion
Query
Tuple Relational Calculus (TRC)
  Tuple: a multiset of attributes
  TRC ≡ Relational Algebra
  q(t) := Student(t) ∧ (t.status = "Graduate")
Ontology-Extended Tuple Relational Calculus
  q(t) := Student(t) ∧ isa(t.status, Graduate)
We focus on data content ontologies in this talk
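The difference between the plain and the ontology-extended query can be illustrated on the two Student tuples from the earlier slides. The sketch below assumes that isa is reflexive and transitive, and the helper name `isa` and the in-memory lists are hypothetical, not part of INDUS.

```python
# Student tuples from the example data source.
STUDENT = [
    {"name": "Alice", "status": "First-year"},
    {"name": "Bob", "status": "MSc"},
]

# Assumed direct isa edges of the data content ontology (child, parent).
ISA = {("First-year", "Undergraduate"), ("MSc", "Graduate")}


def isa(term, ancestor, facts=ISA):
    """True if term equals ancestor or reaches it through isa edges.

    Assumes the hierarchy is acyclic, so the recursion terminates.
    """
    if term == ancestor:
        return True
    return any(isa(parent, ancestor, facts)
               for (child, parent) in facts if child == term)


# Plain TRC: q(t) := Student(t) ∧ (t.status = "Graduate")
plain = [t for t in STUDENT if t["status"] == "Graduate"]

# Ontology-extended TRC: q(t) := Student(t) ∧ isa(t.status, Graduate)
extended = [t for t in STUDENT if isa(t["status"], "Graduate")]

print(plain)
print(extended)
```

The plain query finds no tuple because no status value is literally "Graduate", while the extended query picks up Bob through isa(MSc, Graduate).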
Query Translation D S O 2 q’ D S q O 1 User Ontology Data Source Ontology M Ontology Mapping
Ontology Mapping
isa_1(x, c_1) ∧ into(c_1, c_2) → isa_2(x, c_2)
isa_1(c_1, x) ∧ onto(c_1, c_2) → isa_2(c_2, x)
……
Hierarchy: Student > Undergrad, Graduate; leaf terms: First-year, MSc, Fourth-year, …, PhD, MA
Hierarchy: Student > Undergrad, Postgraduate; leaf terms: Freshman, …, Doctoral, Master
Mapping relations between the two hierarchies: into, onto, equ; hierarchy relations: isa_1, isa_2
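The two mapping rules can be read as simple joins between isa facts and mapping facts. The following is a sketch only: the function names and the concrete `into` and `onto` pairs are hypothetical, since the slide does not list which terms are linked by which relation.

```python
def apply_into(isa1, into):
    """isa1(x, c1) ∧ into(c1, c2) → isa2(x, c2)."""
    return {(x, c2) for (x, c1) in isa1 for (m1, c2) in into if m1 == c1}


def apply_onto(isa1, onto):
    """isa1(c1, x) ∧ onto(c1, c2) → isa2(c2, x)."""
    return {(c2, x) for (c1, x) in isa1 for (m1, c2) in onto if m1 == c1}


# Assumed example facts (not taken from the slide diagram).
isa1 = {("Master", "Postgraduate")}
into = {("Postgraduate", "Graduate")}   # hypothetical into mapping
onto = {("Master", "MSc")}              # hypothetical onto mapping

print(apply_into(isa1, into))
print(apply_onto(isa1, onto))
```

Under these assumed mappings, the into rule lifts Master under Graduate, and the onto rule places MSc under Postgraduate.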
Query Translation
q := Student(t) ∧ isa_1(t.status, Master)
q' := Student(t) ∧ isa_2(t.status, Graduate)
q' := Student(t) ∧ isa_2(t.status, MSc)
Diagram labels: D, S, O_1, q; D, S, O_2, q'; M
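A minimal sketch of how the two candidate translations of the Master query could be produced by table lookup. The dictionaries `INTO` and `ONTO` and the function `translate` are hypothetical; they assume that Master is mapped into Graduate and onto MSc, which is consistent with the example but not stated explicitly on the slide.

```python
# Hypothetical mappings from the user ontology O1 to the data source ontology O2.
INTO = {"Master": "Graduate"}  # assumed: Master is contained in Graduate
ONTO = {"Master": "MSc"}       # assumed: Master contains MSc


def translate(term, mode):
    """Rewrite Student(t) ^ isa1(t.status, term) into the vocabulary of O2.

    mode="sound" picks a narrower O2 term (via onto),
    mode="complete" picks a broader O2 term (via into).
    Returns None when no mapping is known for the term.
    """
    table = ONTO if mode == "sound" else INTO
    target = table.get(term)
    if target is None:
        return None
    return f"Student(t) ^ isa2(t.status, {target})"


print(translate("Master", "sound"))
print(translate("Master", "complete"))
```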
Soundness, Completeness and Exactness
q := Student(t) ∧ isa_1(t.status, Master)
Sound Translation: {q} ⊇ {q'}, e.g. q' := Student(t) ∧ isa_2(t.status, MSc)
Complete Translation: {q'} ⊇ {q}, e.g. q' := Student(t) ∧ isa_2(t.status, Graduate)
Exact Translation: {q} = {q'}; non-existent for this q
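The three notions compare the answer set of the original query with that of its translation, so they can be checked with plain set inclusion. The function `classify` and the student names below are hypothetical illustrations, not data from the talk.

```python
def classify(q_result, q_prime_result):
    """Classify a translation by comparing answer sets {q} and {q'}."""
    q, qp = set(q_result), set(q_prime_result)
    if q == qp:
        return "exact"
    if qp <= q:        # every translated answer is a true answer
        return "sound"
    if q <= qp:        # every true answer is returned
        return "complete"
    return "neither"


# Assumed answer sets: Master students, and what the MSc and Graduate
# translations would return on some data source.
masters = {"Bob", "Carol"}
msc_answers = {"Bob"}
graduate_answers = {"Bob", "Carol", "Dave"}

print(classify(masters, msc_answers))
print(classify(masters, graduate_answers))
```

A sound translation may miss answers but never returns a wrong one; a complete translation may return extra answers but never misses one.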
Most Informative Translation
O_1: c_1; O_2: d_1, d_2; onto mappings link c_1 with d_1 and c_1 with d_2
q := isa_1(x, c_1); find its sound translation(s):
  isa_2(x, d_1)
  isa_2(x, d_2)
  isa_2(x, d_1) ∨ isa_2(x, d_2): most informative sound translation! LUB (least upper bound)
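One way to sketch the idea: collect every O_2 term linked to c_1 by an onto mapping, drop the ones already covered by a more general candidate, and take the disjunction of the rest. The function names and the optional `isa2` argument are assumptions made for this illustration.

```python
def most_informative_sound(term, onto, isa2=frozenset()):
    """Return the O2 terms whose disjunction soundly translates isa1(x, term).

    onto: set of (c, d) pairs, read as "c in O1 is mapped onto d in O2".
    isa2: optional set of (child, ancestor) pairs in O2, used to drop
          candidates that are already covered by a more general candidate.
    """
    candidates = {d for (c, d) in onto if c == term}
    maximal = {d for d in candidates
               if not any((d, e) in isa2 for e in candidates if e != d)}
    return sorted(maximal)


def render(terms):
    """Format the disjunction as a query condition string."""
    return " OR ".join(f"isa2(x,{d})" for d in terms)


onto = {("c1", "d1"), ("c1", "d2")}
print(render(most_informative_sound("c1", onto)))
```

Each single disjunct is a sound translation on its own; the disjunction returns at least as many correct answers as either one, which is why it is the most informative of the sound candidates.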
Query Translation Rules For hierarchical ontologies (similarly for  complete  translation of complex queries) Atomic conditions Complex conditions  GLB=greatest lower bound, LUB=least upper bound
Outline Ontology-Extended Data Source Schema, Data, and Ontology Query Translation for OEDS Ontology mapping, query translation / soundness / completeness Implementation and Optimization The INDUS system Conclusion
INDUS Tools Ontology Editor Schema Editor Mapping Editor Data Editor Query Engine and Interface …
INDUS – Mapping Editor http://sourceforge.net/projects/indus-project/
INDUS – Data Editor http://sourceforge.net/projects/indus-project/
INDUS – Query Editor http://sourceforge.net/projects/indus-project/
Optimization for Scalability
Database storage for ontologies
  Using transitive closure for fast inference with hierarchies
Server-side caching
  Using temporary tables on the data source server
Client-side caching
  Of remote ontologies and ontology mappings
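The first optimization can be sketched as follows: store the hierarchy in a database table, materialize its transitive closure once, and answer each isa test with a single indexed lookup. This sketch uses Python's built-in `sqlite3` as a stand-in for whatever database INDUS uses; the table names, the helper functions, and the edge isa(Graduate, Student) are assumptions.

```python
import sqlite3

# Assumed toy hierarchy stored as (child, parent) edges.
EDGES = [
    ("First-year", "Undergrad"),
    ("Undergrad", "Student"),
    ("MSc", "Graduate"),
    ("Graduate", "Student"),
]


def build_closure(edges):
    """Store the edges and materialize their transitive closure in a table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE isa (child TEXT, parent TEXT)")
    conn.executemany("INSERT INTO isa VALUES (?, ?)", edges)
    conn.execute(
        """
        CREATE TABLE isa_closure AS
        WITH RECURSIVE reach(child, ancestor) AS (
            SELECT child, parent FROM isa
            UNION
            SELECT r.child, i.parent
            FROM reach AS r JOIN isa AS i ON i.child = r.ancestor
        )
        SELECT child, ancestor FROM reach
        """
    )
    conn.execute("CREATE INDEX idx_closure ON isa_closure (child, ancestor)")
    return conn


def is_a(conn, term, ancestor):
    """Answer an isa test with one lookup in the precomputed closure."""
    row = conn.execute(
        "SELECT 1 FROM isa_closure WHERE child = ? AND ancestor = ?",
        (term, ancestor),
    ).fetchone()
    return row is not None


conn = build_closure(EDGES)
print(is_a(conn, "First-year", "Student"))
```

The closure costs extra storage but removes recursive traversal from query time, which matters for hierarchies with tens of thousands of terms.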
Performance
O_1: Enzyme Classification (EC) hierarchy (4,564 terms)
M: SCOP to EC mapping [Richard George et al.] with 15,765 rules
O_2: SCOP (Structural Classification of Proteins) hierarchy (86,766 terms)
Diagram labels: Server, Client, D, Internet
Performance
Outline Ontology-Extended Data Source Schema, Data, and Ontology Query Translation for OEDS Ontology mapping, query translation / soundness / completeness Implementation and Optimization The INDUS system Conclusion
Conclusion
We have studied the query translation process for relational data sources extended with context-specific data content ontologies:
  how to exploit ontologies and mappings for flexibly querying semantically rich data sources;
  a query translation strategy that works for hierarchical ontologies;
  the conditions under which the soundness and completeness of such a procedure can be guaranteed.
Ongoing Work
  More expressive ontologies, e.g., Description Logics
  Schema ontology + data content ontology
  Statistical learning from OEDS
Thank You!
Semantics Preserving Translation Conservative Extension
