Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Talking Geckos (Question and Answering)

NLP Question and Answering system with 0.5+ F measure,ranked Top 1st in class project

  • Login to see the comments

  • Be the first to like this

Talking Geckos (Question and Answering)

  1. 1. Talking Geckos QA System Jie Cao Boya Song
  2. 2. Overview • Rule based system [Based on Ellen’s Quarc Paper] • Stanford core nlp lib: POS, sentence split, NER, dependency relations • Classify questions according to keyword: what, where, when, why, how, etc. • Word Match function • Rules that award points to each sentence • Narrow down answer according to key word type
  3. 3. Work Process Input wordMatch(Common Component) (Question,StorySentences)->ScoredSentences reverseWordMatch(Make Assumption) correctAnswer->correctSentence narrowDownAnswer(Why) correctSentence->correctAnswer Google: Fast is better than slow narrowDownAnswer(What) correctSentence->correctAnswer narrowDownAnswer(…) correctSentence->correctAnswer narrowDownAnswer(How) correctSentence->correctAnswer BestSentence(Why) ScoredSentence->correctSentence BestSentence(What) ScoredSentence->correctSentence BestSentence(…) ScoredSentence->correctSentence BestSentence(How) ScoredSentence->correctSentence Rule Isolation Rule Isolation Developing Task Parallelism
  4. 4. Reports and Testing Fast Testing Classify Question Type Rules WordMatch Rules Object Persistence, Reload Test all data in less than 1 minute Pre-annotated Story and Question Annotated Test Report Test By Question Type List Annotated and Scored Report For Every Question Type 151 Stories,1200 Questions
  5. 5. Precision Improvement • Used reports and searched for general patterns to narrow down answer within sentence • Find continuous named entity tags in answer • Location. time, and preposition keywords • Why: substring after “so” or “because”—simple but very effective • What: substring after core sub, core verb, root verb, or strike of word contained in question • Who, Where type of questions are more complicated
  6. 6. Sample Report QuestionID: 1999-W38-1-9
 Question: How many peacekeeping troops does Canada now have in 22 nations around the world?
 Answer: about 3,900 | 3,900
 SentenceSize: 1
 CorrectSentence: [8.5]Right now [+word_0.5] , Canada [+word_0.5] has [+word_0.5+auxverb_0.5] about 3,900 peacekeepers on 22 [+word_0.5] missions around [+word_0.5] the world [+strike_1+word_0.5] . [HOW_NUMBER_NER:4.0 ]
 MyAnswer: 3,900 22
 MyAnswerSentence: [8.5]Right now [+word_0.5] , Canada [+word_0.5] has [+word_0.5+auxverb_0.5] about 3,900 peacekeepers on 22 [+word_0.5] missions around [+word_0.5] the world [+strike_1+word_0.5] . [HOW_NUMBER_NER:4.0 ]
 MyAnswerScore: AnswerScore{recall=1.0, precise=0.5, fmeasure=0.6666666666666666, myCorrect=1, correctTotal=1, myTotal=2, matchKey=' 3,900'}
 Difficulty: easy
 Included: Yes
 QuestionID: 1999-W37-5-3
 Question: How much longer did Meiorin take to run 2.5 kilometres than she was supposed to?
 Answer: 49 seconds
 SentenceSize: 1
 CorrectSentence: [15.5]She [+word_0.5+disword_2.0] took [+word_0.5+rootverb_6] 49 seconds longer [+word_0.5+disword_2.0] . [HOW_NUMBER_NER:4.0 ]
 MyAnswer: 49 seconds
 MyAnswerSentence: [15.5]She [+word_0.5+disword_2.0] took [+word_0.5+rootverb_6] 49 seconds longer [+word_0.5+disword_2.0] . [HOW_NUMBER_NER:4.0 ]
 MyAnswerScore: AnswerScore{recall=1.0, precise=1.0, fmeasure=1.0, myCorrect=2, correctTotal=2, myTotal=2, matchKey='49 seconds'}
 Difficulty: moderate
 Included: Yes rightSentence=94, length = 175, avgRecall = 0.47672550966159993, avgPrecision = 0.3014595835896379, avgFmeasure = 0.32152025726198186
  7. 7. WordMatch: TokenScore&RuleScore Question: Why is the Sheldon Kennedy Foundation abandoning its dream of building a ranch for sexually abused children? MyAnswerSentence: [47.5]Troubled by poor business decisions , the Sheldon [+strike_1+word_0.5+disword_1.0] Kennedy [+strike_1+word_0.5+disword_1.0] Foundation [+strike_1+word_0.5+disword_2.0+subj_1+secverb_3] has abandoned [+word_0.5+rootverb_6] its [+strike_1+word_0.5+disword_1.0] dream [+strike_1+word_0.5+disword_2.0+dobj_1+secverb_3] of [+strike_1] building [+strike_1+word_0.5+coreverb_4] a [+strike_1] ranch [+strike_1+word_0.5+disword_0.5] for [+strike_1] sexually [+strike_1+word_0.5+disword_0.125] abused [+strike_1+word_0.5+disword_0.125] children [+strike_1+word_0.5+disword_0.25] and will hand its [+word_0.5+disword_0.5] donations over to the Canadian Red Cross . [WHY_CURRENT_GOOD:2.0 ] Stopword Total Score =Token Score+Rule Score Strike_1: Continuous Match Word Match:0.5 closer dis_word_1.0 Math.pow(2, 2 - dobj_1 root verb_6 coreverb_4: verb but not root further dis_word_0.125 Math.pow(2, 2 - distance) Incomplete Lucene Stopword Why RuleScore
  8. 8. WordMatch: TokenScore 1. Generate 1. VerbsInQuestion,OthersInQuestion,rootVerb,subjInQuestion,dObjInQuestion 2. General Word Match Score 1. WORD: “word_0.5” 2. DIS_WORD: “disword_{pow(2,2-distance)}” //unimportant “Mod”, longer distance from the root 3. Continuous Word Match Bonus Score 1. STRIKE: “strike_1” // for every continuous word(not include stopword) 4. Verb Match 1. Verb stopwords no score. //lucene stopwords 2. VerbsInQuestion.contains(“verb”) • AUX_VERB: 0.5 • ROOT_VERB: 6 • COM_VERB: 0.5 • CORE_VERB: 4 5. Noun Match 1. ROOT_VERB: 6 // Root Verb in noun form 2. subjInQuestion != null && matched • SUBJ: 1 // match subj in question. • CORE_SUBJ: 3 // match core_subj in question • SEC_VERB: 3 // verb of this matched noun is also in Question 3. dObjInQuestion != null && matched • DOBJ: 1 // match dobj in question • SEC_VERB: 3 // verb of this matched noun is also in Question Time Limited Just Extract Features, Manual tuning weights Future Work: 1. Learning weights for features 2. dcoref 3. Thesaurus
  9. 9. BestSentence: Rule Score WEAK_CLUE=1 CLUE=2 GOOD_CLUE =4 CONFIDENT=6 SLAM_DUNK=20 Only TokenScore in all dataset Average: CorrectSentence/All TokenScore+RuleScore in all dataset Average: CorrectSentence/All Where GOOD_CLUE: WHERE_LOCATION_PREP CLUE: WHERE_LOCATION_NER 86/155=0.5548 93/155=0.6000 When GOOD_CLUE: WHEN_TIME_NER SLAM_DUNK: WHEN_ORDER_TOKEN SLAM_DUNK: WHEN_BEGIN_TOKEN 117/173=0.6763 119/173=0.6879 Why GOOD_CLUE: WHY_REASON_TOKEN CLUE: WHY_CURRENT_GOOD CLUE: WHY_POST_GOOD CLUE: WHY_PRE_GOOD WEAK_CLUE: WHY_LEAD_TOKEN WEAK_CLUE: WHY_THINK_TOKEN WEAK_CLUE: WHY_WANT_TOKEN 70/115=0.6087 71/115=0.6174 Who/ Whose GOOD_CLUE:WHO_PERSON_NER CLUE:WHO_ORGANIZATION_MISC_NER CLUE:WHOSE_PRP_POS 116/187=0.6203 118/187=0.6310 What GOOD_CLUE: WHAT_KIND_TOKEN CLUE: WHAT_DATE_NER SLAM_DUNK: WHAT_NAME_TOKEN 202/311=0.6325 202/311=0.6325 How HOW_SECKEY_K: distance from k to root GOOD_CLUE: HOW_DISTANCE_TEXT 141/232=0.6078 143/232=0.6164 testset1 52/84 What, 35/57 How, 27/44 Where, 25/41 Who, 23/32 When, 16/28 Why … 212/313=0.6773 212/313=0.6773 testset2 57/88 What, 41/50 How, 28/44 Who, 27/40 When, 24/38 Where, 21/31 Why … 215/315=0.6825 221/315=0.7016
  10. 10. Summary • Reporting system worked very well, helped to improve precision • System is fairly simple and straight forward • Unfortunately due to Stanford core nlp lib bug, we could not incorporate coreference resolution into our system • Short on time: Tuning weights manually(ML?) Recall and precision still have room for improvement
  11. 11. Thanks Q&A