Intelligent natural language system



  1. Intelligent Natural Language System. MANISH JOSHI, RAJENDRA AKERKAR
  2. Open Domain Question Answering: What is Question Answering? How is QA related to IR and IE? Some issues related to QA; question taxonomies; general approach to QA. ENLIGHT sys, 8 July 2007
  3. Question Answering Systems: These systems try to provide exact information as an answer to a natural language query raised by the user. Motivation: given a question, the system should provide the answer directly instead of requiring the user to search for it in a set of documents. Example: Q: What year was Mozart born? A: Mozart was born in 1756.
  4. Information Retrieval: The document is the unit of information; IR answers questions only indirectly, since one has to search within the documents. Results: a (ranked) list based on estimated relevance. Effective approaches are predominantly statistical ("bag of words"). QA = (very short) passage retrieval with natural language questions (not queries).
  5. Information Extraction. Task: identify messages that fall under a number of specific topics; extract information according to pre-defined templates; place the information into frame-like database records. Limitations: templates are hand-crafted by human experts; templates are domain dependent and not easily portable.
  6. Issues. Applications: the source of the answers may be structured data (natural language queries on databases), a fixed collection or book (an encyclopedia), or Web data; domain-independent vs. domain-specific. Users: casual users vs. regular users; a profile, history, etc. may be maintained for regular users.
  7. Question Taxonomy. Factual questions: the answer is often found in a text snippet from one or more documents. Questions may have yes/no answers; wh-questions (who, where, when, etc.); what/which questions are hard; questions may be phrased as requests or commands. Questions requiring simple reasoning: some world knowledge and elementary reasoning may be required to relate the question with the answer; why/how questions, e.g. How did Socrates die? (By) drinking poisoned wine.
  8. Question Taxonomy (cont.). Context questions: questions have to be answered in the context of previous interactions with the user, e.g. Who assassinated Indira Gandhi? When did this happen? List questions: fusion of partial answers scattered over several documents is necessary, e.g. List 3 major rice producing nations. How do I assemble a bicycle?
  9. QA System Architecture (diagram)
  10. General Approach. Question analysis: find the type of object that answers the question: "when" maps to time/date, "who" to person/organization, etc. Document collection preprocessing: prepare documents for real-time query processing. Document retrieval (IR): using the (augmented) question, retrieve a set of possibly relevant documents/passages using IR.
  11. General Approach (cont.). Document processing (IE): search documents for entities of the desired type in appropriate relations, using NLP. Answer extraction and ranking: extract and rank candidate answers from the documents. Answer construction: provide (links to) context, evidence, etc.
  12. Question Analysis: identify the semantic type of the entity sought by the question: when, where, who are easy to handle; which, what are ambiguous, e.g. What was the Beatles' first hit single? Determine additional constraints on the answer entity: keywords that will be used to locate candidate answer-bearing sentences; relations (syntactic/semantic) that should hold between a candidate answer entity and other entities mentioned in the question.
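The wh-word-to-answer-type mapping described on this slide can be sketched as a few prefix rules. This is a minimal, illustrative classifier; the type labels and rules are assumptions for illustration, not taken from the ENLIGHT system.

```python
# Minimal sketch of rule-based question classification as described above.
# Type labels and rules are illustrative, not ENLIGHT's actual taxonomy.
def classify_question(question):
    """Map a wh-word to an expected answer type; what/which stay ambiguous."""
    q = question.lower().strip()
    rules = [
        ("who", "PERSON_OR_ORGANIZATION"),
        ("when", "TIME_OR_DATE"),
        ("where", "LOCATION"),
        ("how many", "QUANTITY"),
        ("why", "REASON"),
    ]
    for prefix, answer_type in rules:
        if q.startswith(prefix):
            return answer_type
    if q.startswith(("what", "which")):
        return "AMBIGUOUS"  # needs further analysis of the head noun
    return "UNKNOWN"

print(classify_question("When did this happen?"))                    # TIME_OR_DATE
print(classify_question("What was the Beatles' first hit single?"))  # AMBIGUOUS
```

Real systems refine the ambiguous cases by inspecting the head noun after "what"/"which" (e.g. "what year" implies a date), which is exactly why the slide calls these questions hard.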
  13. Document Processing. Preprocessing: a detailed analysis of all texts in the corpus may be done a priori; one group annotates terms with one of 50 semantic tags, which are indexed along with the terms. Retrieval: an initial set of candidate answer-bearing documents is selected from a large collection; Boolean retrieval methods may be used profitably; passage retrieval may be more appropriate.
  14. Document Processing (cont.). Analysis: part-of-speech tagging. Named entity identification: recognizes multi-word strings as names of companies/persons, locations/addresses, quantities, etc. Shallow/deep syntactic analysis: obtains information about syntactic relations and semantic roles.
  15. History. MURAX (Kupiec, 1993) was designed to answer questions from the Trivial Pursuit general-knowledge board game, drawing answers from Grolier's on-line encyclopaedia (1990). Text Retrieval Conference (TREC): TREC was started in 1992 with the aim of supporting information retrieval research by providing the infrastructure necessary for large-scale evaluation of text retrieval methodologies. The QA track was first included as part of TREC in 1999, with seventeen research groups entering one or more systems.
  16. Techniques for performing open-domain question answering: manual and automatically constructed question analysers; document retrieval specifically for question answering; semantic type answer extraction; answer extraction via automatically acquired surface matching text patterns; principled target processing combined with document retrieval for definition questions; and various approaches to sentence simplification which aid in the generation of concise definitions.
  17. Answer Extraction. Look for strings whose semantic type matches that of the expected answer; matching may include subsumption (incorporating something under a more general category). Check additional constraints: select a window around the matching candidate and calculate the word overlap between the window and the query; or check how many distinct question keywords are found in a matching sentence, their order of occurrence, etc. Check the syntactic/semantic role of the matching candidate: semantic symmetry; ambiguous modification.
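The word-overlap check mentioned on this slide can be sketched as follows. This is an illustrative toy, not ENLIGHT's implementation; the function names and the small stopword list are assumptions.

```python
# Illustrative sketch of the word-overlap constraint described above: score a
# candidate sentence by how many distinct question keywords it contains.
# Names and the stopword list are assumptions, not taken from the ENLIGHT paper.
STOPWORDS = {"the", "a", "an", "of", "in", "was", "is", "did", "who", "what",
             "when", "where", "why", "how"}

def keywords(text):
    return {w for w in text.lower().replace("?", "").split()
            if w not in STOPWORDS}

def overlap_score(question, candidate):
    """Fraction of distinct question keywords found in the candidate sentence."""
    q = keywords(question)
    return len(q & keywords(candidate)) / len(q) if q else 0.0

q = "What year was Mozart born?"
print(overlap_score(q, "Mozart was born in 1756."))
```

As the semantic symmetry slides below point out, a purely lexical score like this cannot distinguish sentences that share the same words in different roles, which is what the rescoring algorithms address.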
  18. Semantic Symmetry. Question: Who killed militants? (a) Militants killed five innocents in Doda district. (b) After a 6-hour-long encounter, army soldiers killed 3 militants. For this question 'militants' should act as the object of 'killed', but keyword matching also returns a sentence in which 'militants' acts as the subject (the first sentence). Semantic symmetry is a linguistic phenomenon which occurs when an entity acts as subject in some sentences and as object in other sentences.
  19. Example. The following example illustrates the phenomenon of semantic symmetry and the problems caused thereby. Question: Who visited the President of India? Candidate Answer 1: George Bush visited the President of India. Candidate Answer 2: The President of India visited flood-affected areas of Mumbai. The two sentences are similar at the word level, but they have very different meanings.
  20. Some more examples showing semantic symmetry: (1) The birds ate the snake. / The snake ate the bird. (What does the snake eat?) (2) Communists in India are supporting the UPA government. / Small parties are supporting Communists in Kerala. (Whom are the Communists supporting?)
  21. Ambiguous Modification: a linguistic phenomenon which occurs when an adjective in a sentence may modify more than one noun. Question: What is the largest volcano in the Solar System? Candidate Answer 1: In the Solar System, the largest planet Jupiter has several volcanoes. (Wrong) Candidate Answer 2: Olympus Mons, the largest volcano in the Solar System. (Correct) In the first sentence 'largest' modifies 'planet', whereas in the second sentence 'largest' modifies 'volcano'.
  22. Approaches to tackle the problem. Boris Katz and Jimmy Lin of MIT developed the system Sapere, which handles problems occurring due to semantic symmetry and ambiguous modification. These problems occur at the semantic level; to deal with them, all of these approaches gather detailed information at the syntactic level. The system developed by Katz and Lin produces results by utilizing syntactic relations; these typical S-V-O ternary relations are obtained by processing information gathered by Minipar, a functional dependency parser.
  23. Our Approach. To deal with problems at the semantic level, most available approaches need to obtain and work on information gathered at the syntactic level. We have proposed a new approach to deal with the problems caused by the linguistic phenomena of semantic symmetry and ambiguous modification. The algorithms based on our approach remove wrong sentences from the answer with the help of information obtained at the lexical level (lexical analysis).
  24. Algorithm for Handling Semantic Symmetry. Rule 1: if the sequence of keywords in the question and the candidate answer matches, and the POS of the verb keyword is the same, then the candidate answer is correct. Rule 2: if the sequence of keywords in the question and the candidate answer does not match, and the POS of the verb keyword is not the same, then the candidate answer is correct. Otherwise, the candidate answer is wrong.
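The two rules above can be sketched in a few lines, assuming the keyword sequences and a POS tag for the verb keyword have already been produced by preprocessing. The helper names and input representation are hypothetical; the ENLIGHT system derives these inputs from its own pipeline.

```python
# Sketch of the semantic-symmetry rules above. Inputs are assumed to be
# pre-computed: ordered keyword lists for question and answer, plus the POS
# tag of the verb keyword in each. Names are illustrative, not ENLIGHT's.
def is_correct_answer(q_keywords, a_keywords, q_verb_pos, a_verb_pos):
    """Apply Rule 1 and Rule 2; any other combination is rejected."""
    # Compare only the keywords shared by question and answer, in order.
    shared_q = [k for k in q_keywords if k in a_keywords]
    shared_a = [k for k in a_keywords if k in q_keywords]
    same_sequence = shared_q == shared_a
    same_verb_pos = q_verb_pos == a_verb_pos
    if same_sequence and same_verb_pos:          # Rule 1
        return True
    if not same_sequence and not same_verb_pos:  # Rule 2
        return True
    return False                                 # Otherwise: wrong

# Question "Who killed militants?" -> keywords [killed, militants].
# "Army soldiers killed 3 militants." keeps the same keyword order -> kept.
print(is_correct_answer(["killed", "militants"],
                        ["soldiers", "killed", "militants"], "VBD", "VBD"))
# "Militants killed five innocents." reverses the order, same POS -> rejected.
print(is_correct_answer(["killed", "militants"],
                        ["militants", "killed", "innocents"], "VBD", "VBD"))
```

The point of the sketch is that both tests operate at the lexical level (keyword order plus POS tags), with no parse tree required, which is the contrast the "Our Approach" slide draws with Sapere.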
  25. Algorithm for Handling Ambiguous Modification. We identify the adjective as Adj, the scope-defining noun as SN, and the identifier noun as IN. Rules: if the sentence contains keywords in the order Adj α SN, where α indicates a string of zero or more keywords, then: Rule 1-a: if α is IN, the answer is correct; Rule 1-b: if α is blank, the answer is correct; Rule 2: otherwise, the answer is wrong.
  26. Algorithm for Handling Ambiguous Modification (cont.). If the sentence contains keywords in the order SN α Adj β IN, where α and β indicate strings of zero or more keywords, then: Rule 3: if β is blank, the answer is correct (the value of α does not matter); Rule 4: otherwise, the answer is wrong.
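The rules on these two slides can be sketched over a pre-tokenized keyword list. Tokenization is simplified (the multi-word term "solar system" is collapsed into one token), and all names are illustrative assumptions rather than ENLIGHT's code.

```python
# Sketch of the ambiguous-modification rules on the two slides above.
# Adj = adjective, SN = scope-defining noun, IN = identifier noun.
# Example from the slides: Adj="largest", SN="solar system", IN="volcano".
def check_modification(tokens, adj, sn, in_noun):
    """Return True if the adjective plausibly modifies the identifier noun."""
    def idx(word):
        return tokens.index(word) if word in tokens else -1
    i_adj, i_sn, i_in = idx(adj), idx(sn), idx(in_noun)
    if i_adj == -1 or i_sn == -1:
        return False
    if i_adj < i_sn:                       # pattern: Adj alpha SN
        between = tokens[i_adj + 1:i_sn]
        # Rule 1-a / 1-b: alpha is IN, or alpha is blank -> correct
        return between == [in_noun] or between == []
    else:                                  # pattern: SN alpha Adj beta IN
        if i_in > i_adj:
            between = tokens[i_adj + 1:i_in]
            return between == []           # Rule 3: beta blank -> correct
        return False                       # Rule 4: otherwise -> wrong

# "Olympus Mons, the largest volcano in the solar system" -> correct (Rule 1-a)
print(check_modification(
    ["olympus", "mons", "largest", "volcano", "solarsystem"],
    "largest", "solarsystem", "volcano"))
# "In the solar system, the largest planet Jupiter has ... volcanoes" -> wrong (Rule 4)
print(check_modification(
    ["solarsystem", "largest", "planet", "jupiter", "volcano"],
    "largest", "solarsystem", "volcano"))
```

In the wrong candidate, β = "planet jupiter" sits between the adjective and the identifier noun, so Rule 4 rejects it, matching the slide's analysis that "largest" modifies "planet" there.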
  27. Working System: ENLIGHT. We have developed a system that answers questions using the 'keyword-based matching' paradigm. We have incorporated the newly formulated algorithms into the system and obtained good results.
  28. ENLIGHT System Architecture (diagram)
  29. 29. PreprocessingThi module prepares platform f th I t lliThis d l l tf for the Intelligent and t dEffective interface.This module transfer raw format data into well organizedcorpus with the help of following activities. Keyword Extraction Sentence Segmentation Handling of Abbreviations and Punctuation Marks Tokenization Stemming Identifying Group of Words with Specific Meaning Shallow Parsing Reference ResolutionENLIGHT sys 29 8 July, 2007
  30. Question Analysis: question tokenization; question classification. Corpus Management: various database tables are created to manage the vast data: InfoKeywords, QuestionKeyword, QuestionAnswer, CorpusSentences, Abbreviations, Apostrophes, StopWords. Answer Retrieval: answer searching; answer generation.
  31. Answer Rescoring. Handling problems caused by linguistic phenomena using shallow-parsing-based algorithms: semantic symmetry; ambiguous modification. Intelligence incorporation: learning (rote learning); feedback, which can improve satisfactory and wrong answers (loose criterion); automated classification.
  32. Results: preciseness; response time; adaptability.
  33. Preciseness (ENLIGHT vs. basic keyword matching): average number of sentences returned as answer: 3 vs. 34.6; average number of correct sentences: 2.63 vs. 6; average precision: 84% vs. 32%.
  34. Response Time (ENLIGHT vs. Sapere), i.e. time required by QTAG (used in ENLIGHT) vs. Minipar (used in Sapere): news extract, Times of India (202 words): 1.71 s vs. 2.88 s; reply, START QA System (251 words): 1.89 s vs. 3.11 s; Google search engine result: 1.55 s vs. 2.86 s; Yahoo search engine results: 1.67 s vs. 3.13 s; average: 1.705 s vs. 2.995 s.
  35. Adaptability. Handling additional keywords: a question like 'Who killed the Prime Minister?' can also be handled by the ENLIGHT system. Use of synonyms: if the question and answer contain synonyms, the ENLIGHT system can associate the two words using the learning phase.
  36. References. L. Hirschman, R. Gaizauskas, Natural language question answering: the view from here, Natural Language Engineering, 7(4), December 2001. Manish Joshi, Rajendra Akerkar, The ENLIGHT System: Intelligent Natural Language System, Journal of Digital Information Management, June 2007.