Instant Question Answering System


Instant Question Answering System using machine learning and natural language processing

Published in: Technology

  1. 1. Instant Question Answering Dhwaj Raj
  2. 2. What is Instant Question Answering?
      ● The user asks a question in text form, and the InstantQA system automatically retrieves or formulates an answer and presents it back to the user instantly.
  3. 3. Why Instant Question Answering?
      ● In spite of the continuous progress of search engines, many user needs still remain unanswered.
      ● Community Question Answering (e.g. the AnA platform) can feature factoid questions, but its primary goal is to satisfy needs such as opinion seeking, recommendation, open-ended questions and problem solving.
      ● In community question answering, users have to wait for the answers they seek, even if the question is very simple and asks for a mere fact.
      ● Better user experience: why browse through search result listings or related questions when the information can be served upfront?
  4. 4. Why Instant Question Answering?
      ● CASE: SHIKSHA.COM
      ● Top domains being searched, based on both query logs and data availability in listings: fees, duration, seats, application date, application URL, affiliation, approval, entrance exams, placement companies and job salaries.
      ● There is a high number of fact-type questions, which can be targeted; we are not targeting opinion-based or open-ended questions.
      ● 23% of the questions in a random sample of 1.15L (1.15 lakh, i.e. 115,000) belong to these 10 domains.
  5. 5. Is it something similar to the AnA platform?
      ● Our organization has a discussion forum called the AnA (Ask and Answer) platform.
      ● InstantQA has no relation whatsoever to the current AnA forum contents, and no direct use case with them, as of now.
  6. 6. What kind of questions do we target?
      ● What is the price of X?
      ● When is the last date of Y?
      ● How much is the fee for W?
      ● What is the fee for W?
      ● Which companies hire from campus Q?
      ● How is the placement at Z?
      ● Is Z college in Delhi? (transform to where)
      ● What is the meaning of life, the universe and everything?
      ● I do not feel like studying, what to do?
      ● Will I get admission in Z?
      ● How to improve my career?
      ● Should I invest in Noida?
      ● I have purchased X project, should I sell it now or hold?
      ● Is it beneficial to buy a 2bhk in 30 lacs?
  7. 7. What kind of questions do we target?
      FACTOIDS:
      ● What is the price of X?
      ● When is the last date of Y?
      ● How much is the fee for W?
      ● What is the fee for W?
      ● Which companies hire from campus Q?
      ● How is the placement at Z?
      ● Is Z college in Delhi? (transform to where)
      Open ended / Not definite:
      ● What is the meaning of life, the universe and everything?
      ● I do not feel like studying, what to do?
      ● Will I get admission in Z?
      ● How to improve my career?
      ● Should I invest in Noida?
      ● I have purchased X project, should I sell it now or hold?
      ● Is it beneficial to buy a 2bhk in 30 lacs?
  8. 8. What is the very basic approach to instant question answering?
      ● General architecture:
          – Question, e.g. "What is Calvados?"
          – Question Classification and Analysis: pattern "/Q is /A", where /Q = "Calvados"
          – Information Retrieval: query = "Calvados is"; retrieved text = "…Calvados is often used in cooking… Calvados is a dry apple brandy made in…"
          – Answer Extraction: /A = "a dry apple brandy"
          – Answer: /Q is /A: "Calvados" is "a dry apple brandy"
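The pipeline on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-sentence document list, the function names and the rule that a definition starts with "a", "an" or "the" are all hypothetical and not taken from the slides.

```python
import re

# Toy document collection standing in for the text collection (hypothetical data).
DOCUMENTS = [
    "Calvados is often used in cooking.",
    "Calvados is a dry apple brandy.",
]


def analyse_question(question):
    """Question analysis: map 'What is X?' onto the pattern '/Q is /A' and return /Q."""
    match = re.fullmatch(r"what is (.+?)\?", question.strip(), flags=re.IGNORECASE)
    return match.group(1) if match else None


def retrieve(focus, documents):
    """Information retrieval: keep sentences containing the query '<focus> is'."""
    query = f"{focus} is".lower()
    return [doc for doc in documents if query in doc.lower()]


def extract_answer(focus, sentences):
    """Answer extraction: take /A from '<focus> is /A' when /A starts with a determiner."""
    pattern = re.compile(rf"{re.escape(focus)} is ((?:a|an|the) [^.]+)", re.IGNORECASE)
    for sentence in sentences:
        match = pattern.search(sentence)
        if match:
            return match.group(1)
    return None


def answer(question, documents=DOCUMENTS):
    """Run the three stages and format the result as '/Q is /A'."""
    focus = analyse_question(question)
    if focus is None:
        return None  # not a question form this sketch understands
    candidate = extract_answer(focus, retrieve(focus, documents))
    return f"{focus} is {candidate}" if candidate else None


print(answer("What is Calvados?"))
```

Questions that do not fit the single supported pattern simply return `None`, which mirrors the decision to target factoid questions only.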
  9. 9. If it is so simple, why haven't you done it already?
  10. 10. There are challenges in QA!
      ● Quality of text data.
      ● Language variability (paraphrase).
      ● Knowledge base domain: the answer has to be supported by the collection, not by the current state of the world.
      ● How to locate the information given the question keywords.
      ● It is unlikely that a system will have all necessary resources pre-computed.
      ● The task requires some deduction or extra linguistic knowledge.
      ● How does a reasoning system find relevant pieces of information?
  11. 11. Do we have any prior research to tackle these challenges?
  12. 12. QA research
      ● Well established over two decades
      ● TREC (Text REtrieval Conference)
          – funded by NIST/DARPA since 1992
          – QA track 1999 – 2007, directed at ‘Factoids’
      ● CLEF (Cross Language Evaluation Forum)
          – 2001 – current
          – Information Retrieval, language resources
      ● NTCIR (NII Test Collection for IR Systems)
          – 1997 – current
          – IR, question answering, summarization, extraction
      ● Our literature survey can be accessed at:
  13. 13. OK, the investigation is done. But how do we actually do it?
  14. 14. Knowledge base generation
  15. 15. PHASE 1: Knowledge base generation
  16. 16. Knowledge base generation: Example (PHASE 1)
      ● Fact sentence: The fees for Btech course in IIT D is 24000 INR.
      ● Tagged sentence: The <<fees>> for <<Btech>> course in <<IIT D>> is <<24000 INR>>.
      ● Extracted terms: Fees, Btech, IIT D, 24000
      ● Generated questions:
          – What is the fees of Btech course at IIT Delhi?
          – How much is the fees for Btech course from IIT Delhi?
          – How many INR is the fees of Btech from IIT Delhi?
          – What ….........
      ● Index: Btech, iit d, fees, 24000, INR
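A minimal sketch of the indexing step in this example, assuming the tagger has already marked the relevant spans with `<<...>>` as on the slide. The function names and the in-memory inverted index are illustrative assumptions; the slides do not say which index is used.

```python
import re
from collections import defaultdict

# Spans marked by the (assumed) upstream tagger look like <<fees>>.
TAG = re.compile(r"<<(.+?)>>")


def extract_terms(tagged_sentence):
    """Return the tagged spans, lower-cased, in order of appearance."""
    return [term.strip().lower() for term in TAG.findall(tagged_sentence)]


def strip_tags(tagged_sentence):
    """Recover the plain fact sentence from its tagged form."""
    return TAG.sub(r"\1", tagged_sentence)


def build_index(tagged_sentences):
    """Build a tiny inverted index: term -> ids of the facts that mention it."""
    facts = []
    index = defaultdict(set)
    for tagged in tagged_sentences:
        fact_id = len(facts)
        facts.append(strip_tags(tagged))
        for term in extract_terms(tagged):
            index[term].add(fact_id)
    return facts, index


facts, index = build_index(
    ["The <<fees>> for <<Btech>> course in <<IIT D>> is <<24000 INR>>."]
)
print(facts[0])
print(sorted(index))
```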
  17. 17. Answer Retrieval
  18. 18. Answer Retrieval: Example
      ● Already indexed knowledge base. Trained once at startup.
      ● Question: How much will I pay for btech from IIT D?
      ● Tagged question: How much will I <<pay for>> <<btech>> from <<IIT D>>?
      ● Focus: How much. Object: pay. Class: quantity to pay, fees.
      ● Rank and prune the best answer based on collective match.
      ● Consistency checks.
      ● Possible answers:
          – You should pay 24000 INR for Btech from IIT D.
          – The fees for Btech from IIT D is 24000 INR.
          – 24000 INR should be paid for Btech from IIT D.
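The "rank and prune based on collective match" step could look roughly like the sketch below. Everything here is an assumption made for illustration: the phrase-to-class table, the toy knowledge base (only the first fact comes from the slide, the other two values are placeholders) and the choice of a tie as the consistency check.

```python
# Hypothetical mapping from question phrases to knowledge base classes.
CLASS_OF = {"pay for": "fees", "pay": "fees", "cost": "fees", "fee": "fees"}

# Toy indexed facts. Only the first one comes from the slide; the other two
# are placeholders used to show ranking.
KNOWLEDGE_BASE = [
    {"terms": {"fees", "btech", "iit d"}, "value": "24000 INR"},
    {"terms": {"duration", "btech", "iit d"}, "value": "<duration placeholder>"},
    {"terms": {"fees", "mba", "iit d"}, "value": "<fee placeholder>"},
]


def normalise(terms):
    """Lower-case the tagged question terms and map phrases such as 'pay for' to a class."""
    return {CLASS_OF.get(term.lower(), term.lower()) for term in terms}


def rank(question_terms, knowledge_base):
    """Score every fact by how many question terms it matches; drop zero scores."""
    wanted = normalise(question_terms)
    scored = [(len(wanted & fact["terms"]), fact) for fact in knowledge_base]
    scored = [pair for pair in scored if pair[0] > 0]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def best_answer(question_terms, knowledge_base=KNOWLEDGE_BASE):
    """Return the top-ranked value, or None when nothing matches or the top is tied."""
    ranked = rank(question_terms, knowledge_base)
    if not ranked:
        return None
    # Consistency check (assumed): an ambiguous top score means no instant answer.
    if len(ranked) > 1 and ranked[0][0] == ranked[1][0]:
        return None
    return ranked[0][1]["value"]


print(best_answer(["pay for", "btech", "IIT D"]))
```

Returning `None` on a tie is one conservative design choice: an instant answer that is wrong is likely worse for the user than falling back to ordinary search results.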
  19. 19. So many boxes!! Let us check out the major components in brief.
  20. 20. A.1. Fact phrase generator from structured listings
      ● Structured listing to factoid text.
      ● No need to rely only on user-generated sentences.
      ● Use basic language model techniques to create sentences from templates.
      ● Example listing: <doc> ….. <college_name>iit</college_name> <college_id>13213</college_id> <fee>54000 inr annual</fee> <location>delhi</location> ….... </doc>
      ● After the language model: Fee of iit delhi is 54000 inr annual.
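As a rough illustration of turning a structured listing into a factoid sentence, the sketch below fills a fixed sentence template from the listing fields. Note that this is plain template filling, not a language model, and that the `TEMPLATES` table and function name are hypothetical. The listing is the one from the slide with the elided parts left out.

```python
import xml.etree.ElementTree as ET

LISTING = """<doc>
  <college_name>iit</college_name>
  <college_id>13213</college_id>
  <fee>54000 inr annual</fee>
  <location>delhi</location>
</doc>"""

# Hypothetical sentence templates, keyed by the listing field they verbalise.
TEMPLATES = {
    "fee": "Fee of {college_name} {location} is {fee}.",
}


def listing_to_facts(xml_text, templates=TEMPLATES):
    """Turn one structured listing into a list of factoid sentences."""
    root = ET.fromstring(xml_text)
    fields = {child.tag: (child.text or "").strip() for child in root}
    facts = []
    for field, template in templates.items():
        if not fields.get(field):
            continue  # the listing has no value for this field
        try:
            facts.append(template.format(**fields))
        except KeyError:
            continue  # the template needs a field this listing lacks
    return facts


print(listing_to_facts(LISTING))
```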
  21. 21. A.2. Template Generator
      ● Start with identifying:
          – Answer type
          – Entities in focus
          – Part of speech tags
      ● With these tags and language grammar rules, a factoid sentence can be converted into all possible question forms (the Question Generation, QG, task).
      ● Example: Fee of iit delhi is 54000 inr annually.
          – Answer type: quantity
          – Focus: fee
          – Entity: iit + delhi
          – POS tags etc.
      ● Templates:
          – Fee of <II> <LL> is <$$>.
          – Fees of <II> <LL> is <$$>.
          – Cost of <II> <LL> is <$$>.
      ● Generated questions:
          – What is the fee of iit delhi annually?
          – What is the fee of iit delhi?
          – How much is the fee of iit delhi?
          – Is fee of iit delhi 54000 inr?
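A small sketch of question generation from one fact template, assuming the slots `<II>`, `<LL>` and `<$$>` stand for institute, location and value and that each of the first two is a single token. The regular expression and the three question templates are illustrative assumptions; a real generator would be driven by POS tags and grammar rules as the slide describes.

```python
import re

# Matches the three fact templates on the slide: Fee/Fees/Cost of <II> <LL> is <$$>.
FACT_TEMPLATE = re.compile(
    r"(?P<focus>Fees?|Cost) of (?P<II>\S+) (?P<LL>\S+) is (?P<value>.+?)\.?$",
    re.IGNORECASE,
)

# Hypothetical question forms for a quantity-type answer.
QUESTION_TEMPLATES = [
    "What is the {focus} of {II} {LL}?",
    "How much is the {focus} of {II} {LL}?",
    "Is {focus} of {II} {LL} {value}?",
]


def generate_questions(fact):
    """Convert one factoid sentence into the question forms it can answer."""
    match = FACT_TEMPLATE.match(fact.strip())
    if not match:
        return []  # the sentence does not fit any known fact template
    slots = match.groupdict()
    slots["focus"] = slots["focus"].lower()
    return [template.format(**slots) for template in QUESTION_TEMPLATES]


for question in generate_questions("Fee of iit delhi is 54000 inr annually."):
    print(question)
```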
  22. 22. B.1. Text Preprocessing
      ● Short forms
          – i’m, im, i m → i am
          – can’t, cant, can t → can not
      ● Spelling correction
      ● Repeated punctuation (!!!, ???, …)
      ● Smileys
      ● Salutations (Hi all, Hiya, etc.)
      ● Names, signature, course codes
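Several of these clean-up steps can be approximated with regular expressions, as in the sketch below. The pattern lists are deliberately tiny and hypothetical; spelling correction and the removal of names, signatures and course codes are not covered.

```python
import re

# Short-form expansions from the slide (straight and curly apostrophes).
SHORT_FORMS = [
    (re.compile(r"\b(?:i['’]m|im|i m)\b", re.IGNORECASE), "i am"),
    (re.compile(r"\b(?:can['’]t|cant|can t)\b", re.IGNORECASE), "can not"),
]
REPEATED_PUNCT = re.compile(r"([!?.])\1+")  # "???" -> "?", "..." -> "."
SALUTATION = re.compile(r"^\s*(?:hi all|hiya|hi|hello)\b[\s,!.]*", re.IGNORECASE)
SMILEY = re.compile(r"[:;]-?[)(DP]")  # a few common ASCII smileys only


def preprocess(text):
    """Apply salutation, smiley, short-form and punctuation clean-up in order."""
    text = SALUTATION.sub("", text)
    text = SMILEY.sub("", text)
    for pattern, expansion in SHORT_FORMS:
        text = pattern.sub(expansion, text)
    text = REPEATED_PUNCT.sub(r"\1", text)
    return re.sub(r"\s+", " ", text).strip()


print(preprocess("Hi all, im confused :) what is the fee of iit delhi???"))
```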
  23. 23. B.2. Entity and POS Tagger
      ● QER (query entity recognition)
          – Names, locations etc.
      ● Part of speech tagger using word sequence patterns
          – Sequence (nouns, verbs, auxiliaries, modifiers)
      ● Phrase chunker
      ● Dependency parsing: validate tag relationships
  24. 24. B.3. Question Analysis
      ● Create features to be used during answer extraction.
      ● Identify keywords to be matched in document sentences.
      ● Identify the answer type to match answer candidates. We can create an inventory of questions and expected answer types, and so we can train a classifier:
          – Quantity?
          – Dates?
          – Definition?
      ● Select a list of useful patterns from a pattern repository.
      ● Identify question relations which may be used for sentence analysis, etc.
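To make the answer-type and keyword steps concrete, here is a rule-based stand-in for the classifier mentioned on the slide. The cue patterns and the stopword list are hypothetical; a trained classifier over an inventory of questions and expected answer types would replace `ANSWER_TYPE_CUES`.

```python
import re

# Ordered cues: the first pattern that matches decides the answer type.
ANSWER_TYPE_CUES = [
    ("quantity", re.compile(r"\b(?:how much|how many|price|fees?|cost)\b")),
    ("date", re.compile(r"\b(?:when|last date|deadline)\b")),
    ("definition", re.compile(r"^what is\b")),
]

# Small illustrative stopword list.
STOPWORDS = {
    "what", "is", "the", "of", "for", "how", "much", "many",
    "when", "a", "an", "in", "at", "from",
}


def analyse(question):
    """Return the expected answer type and the keywords to match in documents."""
    text = question.strip().lower()
    answer_type = "unknown"
    for label, cue in ANSWER_TYPE_CUES:
        if cue.search(text):
            answer_type = label
            break
    keywords = [tok for tok in re.findall(r"[a-z0-9]+", text) if tok not in STOPWORDS]
    return {"answer_type": answer_type, "keywords": keywords}


print(analyse("How much is the fee for W?"))
```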
  25. 25. B.4. Query Formulation
      ● The question needs to be transformed into a query for the document retrieval system.
      ● Each IR system has its own query language, so we need to perform this mapping.
      ● Identify useful keywords; use the type of answer sought, entities to boost, etc.
      ● Query creation: ordered terms, combined terms, weighted terms.
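One possible mapping, assuming the retrieval system accepts Lucene-style query strings in which `^` boosts a term and double quotes keep a phrase together. The `answer_type` field name and the boost values are hypothetical; another IR system would need a different mapping.

```python
def formulate_query(keywords, entities, answer_type, entity_boost=3.0, type_boost=1.5):
    """Build a weighted query string from the output of question analysis."""
    parts = []
    # Entities are kept as ordered phrases and weighted most heavily.
    for entity in entities:
        parts.append(f'"{entity}"^{entity_boost}')
    # Remaining keywords are plain, unweighted terms.
    for keyword in keywords:
        if keyword not in entities:
            parts.append(keyword)
    # Prefer documents indexed with the expected answer type.
    parts.append(f"answer_type:{answer_type}^{type_boost}")
    return " ".join(parts)


print(formulate_query(["fee"], ["iit delhi"], "quantity"))
```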
  26. 26. B.5. Answer Candidate Searcher
      ● Index the <question, qtypes, entities, answer template> tuples in a training corpus.
      ● Retrieve a set of n <question, qtypes, entities, answer template> tuples given a new question.
      ● Decide on the best answer to the new question based on the scores of the answers returned.
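The three steps can be sketched with an in-memory corpus and a simple similarity score. Jaccard overlap of question tokens, the 0.5 threshold, the `Entry` class and the two corpus entries are all assumptions made for illustration; the slides do not specify the scoring function.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One indexed <question, qtype, entities, answer template> tuple."""
    question: str
    qtype: str
    entities: tuple
    answer_template: str


# Hypothetical training corpus.
CORPUS = [
    Entry("What is the fee of iit delhi?", "quantity", ("iit", "delhi"),
          "Fee of {college} is {fee}."),
    Entry("When is the last date of application for iit delhi?", "date",
          ("iit", "delhi"), "Last date for {college} is {date}."),
]


def tokens(text):
    """Lower-cased alphanumeric tokens of a question."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Token overlap between two sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def search(question, corpus, n=3):
    """Retrieve the n entries whose indexed question is most similar."""
    query = tokens(question)
    scored = [(jaccard(query, tokens(entry.question)), entry) for entry in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]


def best_entry(question, corpus=CORPUS, threshold=0.5):
    """Pick the top entry if its score clears the threshold, else None."""
    top = search(question, corpus, n=1)
    if top and top[0][0] >= threshold:
        return top[0][1]
    return None


match = best_entry("How much is the fee of iit delhi?")
print(match.qtype if match else None)
```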
  27. 27. Pheww.... !
  28. 28. Where do we need Natural Language Processing?
      ● Tokenisation (words, numbers, punctuation, whitespace)
      ● Sentence detection
      ● Part of speech tagging (verbs, nouns, pronouns, etc.)
      ● Query entity recognition
      ● Chunking/Parsing (noun/verb phrases and relationships)
      ● Statistical modelling tools
      ● Dictionaries, word lists, WordNet, VerbNet
      ● Template generation using grammar rules
  29. 29. So you are telling me there are ready-made NLP tools?
  30. 30. NLP tools problems
      ● Training data issues
          – Training domains are completely different.
          – Local English language: slang, spelling, localisation
      ● Sentence detection failures:
          – Bad style (capitalisation, punctuation)
          – Ellipsis (i tried... it failed... error message...)
      ● Tokenisation failures:
          – Multiple punctuation ???, !!! (student emphasis)
          – Abbreviations (im, m.b.a, cant, doesnt, etc.)
      ● POS errors
          – Spelling, grammar
      ● We need to experiment, modify code and train on our domain data!
  31. 31. What are the use cases of instant QA? How does it fit into our system?
  32. 32. Interaction
      ● If users do not write good English, then try to minimize how much they have to write. We can focus on capturing user intent with the least amount of typed text.
          ✔ Auto complete
          ✔ Guidance
          ✔ Spell check
          ✔ Auto correct
          ✔ Manual feedback on conflicts
          ✔ Make them write good queries
      ● This not only helps the user experience but also increases the accuracy of language-based statistical systems.
  33. 33. Shiksha : main search & cafe search
  34. 34. Shiksha : Integration with main search auto-suggestor. We will already be generating good-quality questions; they could be integrated here.
  35. 35. 99acres
      ● Similar use cases to Shiksha.
      ● The real estate domain has more open-ended opinion questions and very few factoid questions.
      ● If a single text box search is introduced in future:
          – The SRP can serve not only listings but also question answers.
          – Instant QA would be really helpful for the user experience.
  36. 36. And many other use cases …... Plus, some components of this system will be used separately to improve other existing systems.
  37. 37. Thank you.