Presentation of Domain Specific Question Answering System Using N-gram Approach.

Designed an application for a domain-specific question answering system. Built a solution for finding answers to factoid questions using an N-gram mining approach. Calculated the percentage of relevant answers returned for the questions asked. The application is built on the Java platform.


  1. Domain Specific Question Answering System Using N-gram Approach
     Presented by: Tasnim Ara Islam (Roll: 1007010), Farh Naz Chowdhuy (Roll: 1007038)
     Supervisor: Dr. K.M. Azharul Hasan, Professor, Dept. of CSE, KUET
     Project/Thesis CSE 4000
  2. Outline
     • Introduction
     • Objective
     • Problem Statement
     • Scope of Thesis
     • Theoretical Consideration
     • POS Tagger
     • N-Gram
     • Q/A System Using N-gram Approach
     • Experimental Analysis
  3. Introduction
  4. Objective
     • Users want specific answers rather than full-text documents or best-matching passages.
     • The goal is to find answers to factoid questions (about people, places, or quantities) using domain-specific documents.
  5. Problem Statement
     • Our system is a Q/A system, which is a specific type of information retrieval.
     • Given a text document, the system attempts to find the best-matching answer to the question.
     • The output is a full sentence, not a snippet or a short answer.
  6. Scope of the Thesis
     • WH- words: Who, What, When, Where, Which, Whom.
     • Domain-specific document.
     • N-gram mining approach.
     • Environment:
       • Eclipse Java EE IDE (Version: Luna Service Release 2 (4.4.2))
       • JRE 1.8.0_45
       • Stanford POS Tagger (Version 3.0.1)
     • Lists: regular and irregular verb list, synonym list.
  7. Theoretical Consideration
  8. Q/A Systems
     • Pattern-based question answering systems, e.g. "<NAME> was born on <ANSWER>".
     • Key reference: AskMSR, a web-based Q/A system.
     • AskMSR uses N-gram mining, filtering, and tiling to obtain the answer.
     • N-grams are applied to both the question and the text sentences.
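
The pattern "<NAME> was born on <ANSWER>" from the slide above can be illustrated with a regular expression. This is only a minimal sketch of the pattern-based idea, not the method used in this project or in AskMSR; the class name `BirthDatePattern`, the method `findAnswer`, and the sample sentence in the usage note are all hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BirthDatePattern {
    // Fills the <ANSWER> slot of "<NAME> was born on <ANSWER>" with the text
    // that follows the phrase, up to the next full stop.
    static Optional<String> findAnswer(String name, String text) {
        // Pattern.quote keeps characters in the name from being read as regex syntax.
        Pattern pattern = Pattern.compile(Pattern.quote(name) + " was born on ([^.]+)");
        Matcher matcher = pattern.matcher(text);
        return matcher.find() ? Optional.of(matcher.group(1).trim()) : Optional.empty();
    }
}
```

For a made-up sentence such as "Jane Doe was born on 1 January 1990.", `findAnswer("Jane Doe", ...)` would yield the text after "was born on". A fixed pattern like this only works when the passage uses exactly that wording.
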
  9. N-Gram
     • N-grams are sequences of characters or words extracted from a text.
     • Types: (1) character based, (2) word based.
     • An n-gram of size 1 is referred to as a unigram, size 2 as a bigram, size 3 as a trigram, and so on.
     Example sentence: Taj mahal is a world heritage site.
     Bigrams: Taj mahal, mahal is, is a, a world, world heritage, heritage site
     Trigrams: Taj mahal is, mahal is a, is a world, a world heritage, world heritage site
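
The bigram and trigram listing on the N-Gram slide can be reproduced with a sliding window over the words of the sentence. The following is a minimal sketch of word-based n-gram extraction; the class name `WordNGrams` and the choice to drop trailing sentence punctuation are assumptions for illustration, not details taken from the project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WordNGrams {
    // Returns all word-based n-grams of size n, in order of appearance.
    static List<String> of(String text, int n) {
        List<String> grams = new ArrayList<>();
        // Assumption: trailing sentence punctuation is not part of the last word.
        String cleaned = text.trim().replaceAll("[.?!]+$", "");
        if (cleaned.isEmpty() || n < 1) {
            return grams;
        }
        String[] words = cleaned.split("\\s+");
        // Slide a window of n words across the sentence.
        for (int i = 0; i + n <= words.length; i++) {
            grams.add(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
        }
        return grams;
    }

    public static void main(String[] args) {
        String sentence = "Taj mahal is a world heritage site.";
        System.out.println(of(sentence, 2)); // six bigrams
        System.out.println(of(sentence, 3)); // five trigrams
    }
}
```

A sentence of k words yields k - n + 1 word n-grams, which is why the seven-word example gives six bigrams and five trigrams.
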
  10. POS Tagger
     • A POS tagger is software that reads text in some language and assigns a part of speech (noun, verb, adjective, etc.) to each word.
     • The Stanford POS Tagger is an NLP library that handles part-of-speech detection for English.
     Input: I like watching movies.
     Output: I_PRP like_VBP watching_VBG movies_NNS
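
The tagger output shown above uses a word_TAG format, so picking out verbs and nouns comes down to splitting each token on its last underscore and checking the tag. The sketch below assumes Penn Treebank tags, where verb tags start with VB and noun tags start with NN. It does not call the Stanford library itself, and the class name `TaggedTokens` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class TaggedTokens {
    // Returns the words whose tag starts with the given prefix, e.g. "VB" or "NN".
    static List<String> wordsWithTagPrefix(String tagged, String prefix) {
        List<String> words = new ArrayList<>();
        for (String token : tagged.trim().split("\\s+")) {
            // The tag follows the last underscore, so words containing '_' still parse.
            int sep = token.lastIndexOf('_');
            if (sep <= 0) {
                continue; // not in word_TAG form
            }
            String word = token.substring(0, sep);
            String tag = token.substring(sep + 1);
            if (tag.startsWith(prefix)) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String tagged = "I_PRP like_VBP watching_VBG movies_NNS";
        System.out.println(wordsWithTagPrefix(tagged, "VB")); // verbs
        System.out.println(wordsWithTagPrefix(tagged, "NN")); // nouns
    }
}
```
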
  11. Q/A System Using N-gram
  12. Steps of Implementation
     1. Enter a domain-specific question in the GUI.
     2. Split the text files.
     3. Query reformulation.
        I. Change the corresponding verb.
           a. Do, Does, Did.
           b. Regular or irregular.
           c. Synonym word.
        II. Find the parts of speech of the words in the question using the POS tagger.
        III. Select the verb, main verb, and noun.
     4. Match the verb, main verb, and noun with passage sentences by N-gram mining.
     5. The sentence with the maximum match based on verb and main verb is the answer.
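
Step 3.I above can be pictured as rewriting the question's verb into the form expected in the answer sentence, for example "did ... begin" becoming "began". The slides do not give the actual rules or verb lists, so the following is only a rough sketch under stated assumptions: the class `VerbReformulation`, the three-entry irregular table, and the naive suffix rules are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class VerbReformulation {
    // Tiny illustrative irregular-verb table (assumption); the project's list is not shown.
    static final Map<String, String> IRREGULAR_PAST = new HashMap<>();
    static {
        IRREGULAR_PAST.put("begin", "began");
        IRREGULAR_PAST.put("build", "built");
        IRREGULAR_PAST.put("become", "became");
    }

    // Rewrites a base-form verb according to the auxiliary used in the question.
    static String reformulate(String auxiliary, String baseVerb) {
        String verb = baseVerb.toLowerCase();
        switch (auxiliary.toLowerCase()) {
            case "did": {
                // Past tense: irregular table first, then a naive regular rule.
                String irregular = IRREGULAR_PAST.get(verb);
                if (irregular != null) {
                    return irregular;
                }
                return verb.endsWith("e") ? verb + "d" : verb + "ed";
            }
            case "does":
                // Naive third-person form; real English needs more cases (e.g. "goes").
                return verb + "s";
            default:
                // "do" and anything else: keep the base form.
                return verb;
        }
    }
}
```

With this sketch, `reformulate("did", "begin")` gives "began", matching the wording of the passage sentence "The construction began around 1632."
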
  13. System in Brief
      web-query-solution(filename, passageName, question)
      begin
          sSentence{} := get sentence from file
          qVerb{} := verb from Question
          qMainVerb{} := mainVerb from Question
          qNoun{} := noun from Question
          if (NGram(sSentence) = NGram(qVerb OR qMainVerb OR qNoun)) then
          begin
              count verb, mainverb, noun and return
          end
          max := no. of verb and no. of mainverb
          if (max is MAXIMUM) then
              return answer String
      end
      Fig: System Algorithm.
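
The algorithm figure above can be approximated in Java as follows. This is a minimal sketch rather than the project's implementation. It assumes matching on lowercase unigrams, merges verb and main verb into a single list, and ranks sentences by verb matches first with noun matches as a tie-breaker. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class NGramAnswerSelector {
    // Lowercase unigrams of a sentence, with punctuation removed.
    static Set<String> unigrams(String sentence) {
        String cleaned = sentence.toLowerCase().replaceAll("[^a-z0-9\\s'-]", " ").trim();
        return new HashSet<>(Arrays.asList(cleaned.split("\\s+")));
    }

    // Counts how many of the question terms occur among the sentence's unigrams.
    static int countMatches(Set<String> grams, List<String> terms) {
        int count = 0;
        for (String term : terms) {
            if (grams.contains(term.toLowerCase())) {
                count++;
            }
        }
        return count;
    }

    // Returns the sentence with the most verb matches (noun matches break ties),
    // or null when no sentence matches any term.
    static String bestSentence(String passage, List<String> verbs, List<String> nouns) {
        String best = null;
        int bestVerbs = 0;
        int bestNouns = 0;
        // Split after sentence-ending punctuation followed by whitespace.
        for (String sentence : passage.split("(?<=[.?!])\\s+")) {
            Set<String> grams = unigrams(sentence);
            int v = countMatches(grams, verbs);
            int n = countMatches(grams, nouns);
            if (v + n == 0) {
                continue;
            }
            if (best == null || v > bestVerbs || (v == bestVerbs && n > bestNouns)) {
                best = sentence.trim();
                bestVerbs = v;
                bestNouns = n;
            }
        }
        return best;
    }
}
```

For the question "When did the construction begin?", passing the reformulated verb "began" and the noun "construction" would select "The construction began around 1632." over "The construction was completed around 1653.", because only the first sentence matches the verb.
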
  14. User Input
  15. Experimental Analysis
  16. Case Study 1
      The Taj Mahal is a white marble mausoleum. It is located in Agra, Uttar Pradesh, India. Mughal emperor Shah Jahan built Taj mahal in memory of his third wife, Mumtaz Mahal. The Taj Mahal is widely recognized as "the jewel of Muslim art in India". In 1983, the Taj Mahal became a UNESCO World Heritage Site. The construction began around 1632. The construction was completed around 1653. The architects of Taj mahal are Abd ul-Karim Ma'mur Khan, Makramat Khan, and Ustad Ahmad Lahauri. Lahauri is generally considered to be the principal designer. In 1631, Shah Jahan was grief-stricken for the death of his wife. Mumtaz Mahal was Shah Jahan's third wife and a Persian princess. Mumtaz died during the birth of their 14th child, Gauhara Begum.
  17. Case Study 2 and 3
     • Child Labour
     • Bangladesh Cricket Team
  18. Output Ranking
     • Excellent
     • Satisfactory
     • Bad
  19. Experimental Analysis
      No. | Question | Answer | Rank
      1 | Where is Taj mahal located in? | taj mahal is located in agra uttar pradesh india | Excellent
      2 | What is Taj mahal? | the taj mahal is widely recognized as the jewel of muslim art in india. | Satisfactory
      3 | Who was mumtaj mahal? | in 1631 shah jahan was grief-stricken for the death of his | Bad
      4 | When did the construction begin? | the construction began around 1632. | Excellent
      5 | Who is the principal designer? | the taj mahal is widely recognized as the jewel of muslim art in india. | Bad
  20. Experimental Results
      Case study | Questions | Excellent | Satisfactory | Bad
      1. Taj Mahal | 32 | 15 (46.87%) | 14 (43.75%) | 3 (9.38%)
      2. Child Labour | 14 | 6 (42.86%) | 1 (7.14%) | 7 (50%)
      3. Bangladesh Cricket Team | 24 | 14 (58.33%) | 0 (0%) | 10 (41.67%)
  21. Accuracy Measure
      Total questions asked = 32 + 14 + 24 = 70
      Excellent answers = 15 + 6 + 14 = 35 (50%)
      Satisfactory answers = 14 + 1 + 0 = 15 (21.43%)
      Bad answers = 3 + 7 + 10 = 20 (28.57%)
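
The overall percentages above follow directly from the per-rank totals. The short sketch below recomputes them. The class name `AccuracyMeasure` and the choice to round half up to two decimal places are assumptions for illustration.

```java
class AccuracyMeasure {
    // Percentage of count out of total, rounded to two decimal places.
    static double percentage(int count, int total) {
        if (total <= 0) {
            throw new IllegalArgumentException("total must be positive");
        }
        return Math.round(count * 10000.0 / total) / 100.0;
    }

    public static void main(String[] args) {
        int total = 32 + 14 + 24;       // questions across the three case studies
        int excellent = 15 + 6 + 14;
        int satisfactory = 14 + 1 + 0;
        int bad = 3 + 7 + 10;
        System.out.println("Excellent: " + percentage(excellent, total) + "%");
        System.out.println("Satisfactory: " + percentage(satisfactory, total) + "%");
        System.out.println("Bad: " + percentage(bad, total) + "%");
    }
}
```

One small difference from the slides: 15 out of 32 is exactly 46.875%, which this helper rounds up to 46.88, whereas the Taj Mahal case study reports 46.87.
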
  22. Limitations
     • Deals with simple sentences only.
     • Does not handle antonyms or spell checking.
     • Not domain independent.
     • Complex questions cannot be handled.
  23. Conclusion
     • We faced difficulties while implementing the system.
     • A lot can still be done to make the system domain independent.
     • More linguistic features can be implemented.
     • These additions will make the system more robust.
  24. Thank You.
