Translating Natural Language Competency Questions into SPARQL Queries (WEB 2013)

Our presentation at WEB 2013, the First International Conference on Building and Exploring Web Based Environments, held in Seville, Spain (January 27 - February 1, 2013).


  1. GlobeNet 2013. WEB 2013, The First International Conference on Building and Exploring Web Based Environments, January 27 - February 1, 2013, Seville, Spain.
     Translating Natural Language Competency Questions into SPARQL Queries: A Case Study
     Authors: Leila ZEMMOUCHI-GHOMARI (l_zemmouchi@esi.dz), Abdessamed Réda GHOMARI (a_ghomari@esi.dz)
     LMCS Laboratory, National Superior School of Computer Science, Algiers, Algeria, www.esi.dz
  2. OUTLINE
     1. MOTIVATION
     2. RELATED WORK
     3. PROPOSED TRANSLATION APPROACH
     4. CASE STUDY
     5. CONCLUSIONS AND FUTURE WORK
  3. 1. MOTIVATION
     The context of the current research work is a PhD thesis focused on an ontology engineering process.
  4. 1. MOTIVATION
     The context of the current research work is a PhD thesis focused on an ontology engineering process.
     Competency questions are a well-known technique for determining the requirements or needs the ontology should fulfill.
     Translation: the competency questions are expressed in a formal language in order to allow automatic evaluation.
  5. 2. RELATED WORK
     To the best of our knowledge, automatic translation of competency questions into SPARQL queries, with the aim of validating an ontology, has not been tackled by researchers.
     From a more general perspective, however, several approaches exist in the area of web Question Answering (QA): CNL, OWLPATH, PANTO, DEANNA, and the Ben Abacha & Zweigenbaum approach (2012).
  6. 2. RELATED WORK
     - CNL: Ontology-based Controlled Natural Language Editor
     - OWLPATH: OWL Ontology-guided Query Editor
     - PANTO: Portable Natural Language Interface to Ontologies
     - DEANNA: Deep Answers for Naturally Asked Questions
     - Ben Abacha & Zweigenbaum approach, 2012: Translating Medical Questions into SPARQL Queries
     Limitations:
     - Scalability: their test ontologies are relatively small.
     - Preliminary work is necessary to apply these approaches, such as a mapping set between the concepts in the questions and the queried knowledge bases, which is difficult to carry out and to maintain.
     - Some of them focus on certain types of questions and certain knowledge domains.
     - There is no consensus in the web QA community on a single approach.
  7. 3. PROPOSED TRANSLATION APPROACH (1/3)
     A variation of the [Ben Abacha & Zweigenbaum, 2012] approach.
     WHY? Their approach is:
     - Specific to the medical field
     - Limited to a particular set of questions: WH questions, except complex ones (why and when)
     HOW?
     Their approach:
       1) Identifying question type
       2) Determining the expected answer type(s) for WH questions
       3) Constructing the question's affirmative and simplified form
       4) Medical entity recognition (treatment, disease…)
       5) Relation extraction
       6) SPARQL query construction
     Our approach:
       1) Identifying question type
       2) Determining the expected answer
       3) Entity extraction
       4) Identifying answer entity type and entity location in the ontology
       5) SPARQL query construction
  8. 3. PROPOSED TRANSLATION APPROACH (2/3)
     Phase I: Identifying the competency questions' categories according to the expected answers' types:
       a) Definition questions: begin with "What is/are" or "What does ... mean"
       b) Boolean or Yes/No questions
       c) Factual questions: the answer is a fact or a precise piece of information
       d) List questions: the answer is a list of entities
       e) Complex questions: begin with "How" or "Why"
  9. 3. PROPOSED TRANSLATION APPROACH (2/3)
     Phase I categories as on slide 8. Added note: the query result clause specifies the result form.
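As a rough illustration of Phase I, the five categories can be approximated with a few surface rules. This is only a sketch: the slides do not say how the category is detected, and the rules, the `AUXILIARIES` set, the `classify_question` name and the three-word limit for definition questions are all assumptions made here. The word limit exists because both definition and list questions can begin with "What are".

```python
import re

# Hypothetical set of verbs that open a Yes/No question (an assumption).
AUXILIARIES = {"is", "are", "was", "were", "do", "does", "did", "can", "could",
               "must", "should", "may", "might", "will", "would", "has", "have"}

def classify_question(question):
    """Return 'definition', 'boolean', 'factual', 'list' or 'complex'."""
    words = re.findall(r"[a-z']+", question.lower())
    if not words:
        raise ValueError("empty question")
    if words[0] in ("how", "why"):
        return "complex"
    if words[0] in AUXILIARIES:
        return "boolean"
    if words[:2] == ["what", "does"] and words[-1] == "mean":
        return "definition"
    if words[0] == "what" and len(words) > 1 and words[1] in ("is", "are"):
        # Assumed heuristic: a short noun phrase after "What is/are"
        # (at most three words) asks for a definition.
        if len(words) - 2 <= 3:
            return "definition"
        return "list" if words[1] == "are" else "factual"
    return "factual"

print(classify_question("Must a university teacher be a researcher?"))
```

The heuristic is deliberately crude (for instance, it treats every "How" question as complex, as the slide does), so a manual check of each category remains advisable.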
  10. 3. PROPOSED TRANSLATION APPROACH (3/3)
     Phase II: Determining the expected (perfect or ideal) answer.
     Phase III: Extracting the entity or entities from the questions and from their corresponding expected answers identified in Phase II.
     Phase IV: Identifying the answer entity type (class, data property, object property, annotation, axiom, instance) and the entity location in the ontology.
     Phase V: Constructing the SPARQL query based on the question type identified in Phase I, the question/answer entity extracted in Phase III, and its corresponding entity type and entity location in the ontology from Phase IV.
  11. 3. PROPOSED TRANSLATION APPROACH (3/3)
     Phases II-V as on slide 10. Added note: mapping between question/answer entity and ontology entity.
  12. 3. PROPOSED TRANSLATION APPROACH (3/3)
     Phases II-V as on slide 10. Added example query: SELECT * WHERE { ?Teacher rdf:type HERO:Teacher . }
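Phase V can be pictured as filling a query template chosen by question type. The sketch below is an assumption about how that might look, not the authors' tooling: the `build_query` function and its three templates are hypothetical, modelled on the queries shown in the slides. Prefix declarations are left out because the slides do not give the HERO namespace IRI.

```python
import re

# Assumed restriction: entity names are simple local names such as "Teacher".
_LOCAL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def _check(name):
    """Reject anything that is not a plain local name."""
    if not _LOCAL_NAME.fullmatch(name):
        raise ValueError(f"not a valid local name: {name!r}")
    return name

def build_query(question_type, *entities):
    """Build a SPARQL string from a question type and ontology entity names."""
    names = [_check(e) for e in entities]
    if question_type == "definition" and len(names) == 1:
        # The definition is read from the entity's rdfs:comment annotation.
        return f"SELECT ?comment WHERE {{ HERO:{names[0]} rdfs:comment ?comment }}"
    if question_type == "boolean" and len(names) == 2:
        # Yes/No questions map to ASK, which returns true or false.
        return f"ASK {{ HERO:{names[0]} rdfs:subClassOf HERO:{names[1]} . }}"
    if question_type == "list" and len(names) == 1:
        # List questions map to SELECT over the instances of a class.
        return f"SELECT * WHERE {{ ?{names[0]} rdf:type HERO:{names[0]} . }}"
    raise ValueError(f"no template for {question_type!r} with {len(names)} entities")

print(build_query("boolean", "Teacher", "Researcher"))
```

The choice of `ASK` versus `SELECT` is where the result clause fixes the form of the answer; factual and complex questions are left without a template here because their shape depends on where the entity sits in the ontology.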
  13. 4. CASE STUDY: HERO
     Translation of the competency questions of the HERO ontology (Higher Education Reference Ontology) into SPARQL queries.
     HERO describes several aspects of the university domain, such as organizational structure, administration, staff, roles, incomes, etc.
     HERO aims to be a valuable tool for researchers and institutional employees interested in analyzing the system of higher education as a whole.
     - The HERO ontology is available at: http://sourceforge.net/projects/heronto/?source=directory
     - The competency questions (81) and their corresponding queries are available at: http://herontology.esi.dz/content/downloads
  14. 4. CASE STUDY
     Phase I: Identifying the competency questions' categories according to the expected answers' types.
     CQ categories, with examples from the 81 CQs:
     - Definition questions: CQ59. What is a Credit?
     - Yes/No questions: CQ3. Must a university teacher be a researcher?
     - Factual questions: CQ44. What average size and duration have governing board?
     - List questions: CQ1. What are the possible academic ranks of a teacher?
     - Complex questions: CQ41. Why universities are organized into departments?
  15. 4. CASE STUDY
     Phase II: Determining the expected answer.
     CQ examples and their corresponding answers:
     - CQ59. What is a Credit?
       Each course bears a specified number of credits. In general, the number of credits a course carries is determined by the number of class hours the course meets each week.
     - CQ3. Must a university teacher be a researcher?
       Nearly all faculty members are expected to engage in research.
     - CQ44. What average size and duration have governing board?
       The average size of public boards is approximately 10 people and the average size among independent (private) institutions is 30. The length of board members' terms varies from three years to as long as 12 years.
     - CQ1. What are the possible academic ranks of a teacher?
       Assistant Professor, Associate Professor, Full Professor, Professor Emeritus.
     - CQ41. Why universities are organized into departments?
       The basic unit of academic organization in most institutions is the department (e.g., chemistry, political science). Every department belongs to an academic field.
  16. 4. CASE STUDY
     Phase II: Determining the expected answer (question/answer pairs as on slide 15). Added note: the answer sources are academic reports, governmental websites, experts' interviews, ...
  17. 4. CASE STUDY
     Phase III: Extracting the entity or entities from the competency questions and from their corresponding expected answers identified in Phase II. This extraction is based on a mapping between relevant terms in the question/answer pairs and their equivalent terms in the ontology.
     Terms extracted from the CQs and from the answers:
     - CQ59. What is a Credit?
       Each course bears a specified number of credits. In general, the number of credits a course carries is determined by the number of class hours the course meets each week.
     - CQ3. Must a university teacher be a researcher?
       Nearly all faculty members are expected to engage in research.
     - CQ44. What average size and duration has governing board?
       The average size of public boards is approximately 10 people and the average size among independent (private) institutions is 30. The length of board members' terms varies from three years to as long as 12 years.
     - CQ41. Why universities are organized into departments?
       The basic unit of academic organization in most institutions is the department (e.g., chemistry, political science). Every department belongs to an academic field.
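The term-to-entity mapping behind Phase III can be imagined as a lookup table. The slides describe this phase as manual, so the following is only a hypothetical sketch: the `TERM_TO_ENTITY` table and the `extract_entities` function are invented for illustration, and the entity names are taken from the class names that appear elsewhere in the slides.

```python
import re

# Hypothetical mapping from surface terms to HERO entity names (an assumption;
# in the slides this mapping is established by hand).
TERM_TO_ENTITY = {
    "governing board": "GoverningBoard",
    "teacher": "Teacher",
    "researcher": "Researcher",
    "course": "Course",
    "department": "Department",
}

def extract_entities(text):
    """Return the sorted ontology entity names whose terms occur in the text."""
    lowered = text.lower()
    found = set()
    for term, entity in TERM_TO_ENTITY.items():
        # Whole-word match, tolerating a plural "s".
        if re.search(rf"\b{re.escape(term)}s?\b", lowered):
            found.add(entity)
    return sorted(found)

print(extract_entities("Must a university teacher be a researcher?"))
```

A table like this only covers literal matches. Pairs such as "faculty members" and `Teacher` still need human judgment or a matching tool.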
  18. 4. CASE STUDY
     Phase IV: Identifying the answer entity type (class, data property, object property, annotation, axiom, instance) and the entity location in the ontology.
     Entity types, each followed by the entity locations in the ontology:
     - Types: Class: Course; Data Property: CourseCreditsNumber
       Locations: CourseCreditsNumber Domain Course
     - Types: Classes: Teacher, Researcher
       Locations: Teacher SubClassOf Researcher
     - Types: Class: Governing Board; Data Properties: Size, Duration
       Locations: GoverningBoardSize Domain GoverningBoard; GoverningBoardDuration Domain GoverningBoard
     - Types: Class: Teacher; Data Property: Rank, Assistant Professor, Associate Professor, Full Professor, Professor Emeritus
       Locations: TeacherRank Domain Teacher; AssistantProfessor SubPropertyOf TeacherRank; AssociateProfessor SubPropertyOf TeacherRank; FullProfessor SubPropertyOf TeacherRank; ProfessorEmeritus SubPropertyOf TeacherRank
     - Types: Classes: Higher Education Organization, Department
       Locations: Department SubClassOf Faculty; Faculty SubClassOf Role; Role SubClassOf HigherEducationOrganization; Department Definition
  19. 4. CASE STUDY
     Phase V: Construction of SPARQL queries.
     Competency questions and their SPARQL queries:
     - CQ59. What is a Credit?
       SELECT ?comment WHERE { HERO:CourseCreditsNumber rdfs:comment ?comment }
     - CQ3. Must a university teacher be a researcher?
       ASK { HERO:Teacher rdfs:subClassOf HERO:Researcher . }
     - CQ44. What average size and duration have governing board?
       SELECT ?university ?size WHERE { ?university rdf:type HERO:HigherEducationOrganization . ?y rdfs:subClassOf ?university . ?y HERO:GoverningBoardSize ?size }
       SELECT ?university ?duration WHERE { ?university rdf:type HERO:HigherEducationOrganization . ?y rdfs:subClassOf ?university . ?y HERO:GoverningBoardDuration ?duration }
     - CQ1. What are the possible academic ranks of a teacher?
       SELECT ?a ?b ?c ?d WHERE { ?a rdfs:subPropertyOf HERO:TeacherRank . ?b rdfs:subPropertyOf ?a . ?c rdfs:subPropertyOf ?b . ?d rdfs:subPropertyOf ?c . }
  20. 4. CASE STUDY
     Phase V: Construction of SPARQL queries (same queries as on slide 19). Added note: these queries can be checked using available online SPARQL endpoints or off-line tools such as TWINKLE.
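To make concrete what checking a query means for validation, here is a dependency-free toy that evaluates a single triple pattern against a handful of triples. It is not a SPARQL engine and is no substitute for an endpoint or TWINKLE. The `TRIPLES` set, the `match` and `ask` functions, and the reading of "Domain" as `rdfs:domain` are assumptions made for illustration, with the triples themselves drawn from the entity locations listed in the slides.

```python
# Toy ontology fragment (assumed encoding of the entity locations in the slides).
TRIPLES = {
    ("HERO:Teacher", "rdfs:subClassOf", "HERO:Researcher"),
    ("HERO:CourseCreditsNumber", "rdfs:domain", "HERO:Course"),
    ("HERO:GoverningBoardSize", "rdfs:domain", "HERO:GoverningBoard"),
    ("HERO:GoverningBoardDuration", "rdfs:domain", "HERO:GoverningBoard"),
}

def match(pattern, triples=TRIPLES):
    """Return variable bindings for one triple pattern ('?x' marks a variable)."""
    results = []
    for triple in triples:
        binding = {}
        for wanted, actual in zip(pattern, triple):
            if wanted.startswith("?"):
                # A repeated variable must bind to the same term each time.
                if binding.get(wanted, actual) != actual:
                    break
                binding[wanted] = actual
            elif wanted != actual:
                break
        else:
            results.append(binding)
    return results

def ask(pattern, triples=TRIPLES):
    """Mimic ASK for a single pattern: True if at least one triple matches."""
    return bool(match(pattern, triples))

# CQ3 is satisfied by the ontology fragment if this prints True.
print(ask(("HERO:Teacher", "rdfs:subClassOf", "HERO:Researcher")))
```

In this reading, a competency question counts as covered when its query returns a non-empty (or true) result over the ontology.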
  21. 5. CONCLUSION AND FUTURE WORK
     Summary
     - Intended users: ontology developers, i.e., users who are familiar with the ontology language, the ontology structure and the query language.
     - Intended use: ontology validation. Since competency questions are the starting point for extracting the relevant terms that later become ontology entities, CQs translated into SPARQL queries directly target ontology entities.
  22. 5. CONCLUSION AND FUTURE WORK
     Summary as on slide 21, with two added notes:
     - Intended users' familiarity with the ontology language, ontology structure and query language helps in entity location (Phase 4) and query construction (Phase 5).
     - Competency questions being the starting point for term extraction helps in entity extraction (Phase 3).
  23. 5. CONCLUSION AND FUTURE WORK
     Limitations
     - Two phases of the proposed approach are manual and depend on the user's background knowledge: entity extraction from question/answer pairs, and mapping between the relevant question/answer terms and ontology entities.
     - Weak treatment of complex questions.
     Future Work
     - The best way to tackle the issue of the manual phases is to integrate natural language processing tools such as GATE into the term extraction phase, and automatic matching systems such as COMA 3.0, whose efficiency has already been proven.
  24. SOME REFERENCES
     1. CQs: M. Gruninger and M. S. Fox, "Methodology for the design and evaluation of ontologies", IJCAI95, Workshop on Basic Ontological Issues in Knowledge Sharing, Montreal, 1995, pp. 6.1-6.10.
     2. Web QA approach: A. Ben Abacha and P. Zweigenbaum, "Medical Question Answering: Translating Medical Questions into SPARQL Queries", Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium, Miami, Florida, USA, 2012, pp. 41-50.
     3. SPARQL: E. Della Valle and S. Ceri, "Querying the Semantic Web: SPARQL", in Handbook of Semantic Web Technologies, Springer, 2011, pp. 299-363.
     THANK YOU FOR YOUR ATTENTION
