Successfully reported this slideshow.

Talking Geckos (Question and Answering)



Upcoming SlideShare
Being Google
Being Google
Loading in …3
1 of 11
1 of 11

More Related Content

Talking Geckos (Question and Answering)

  1. 1. Talking Geckos QA System Jie Cao Boya Song
  2. 2. Overview • Rule based system [Based on Ellen’s Quarc Paper] • Stanford core nlp lib: POS, sentence split, NER, dependency relations • Classify questions according to keyword: what, where, when, why, how, etc. • Word Match function • Rules that award points to each sentence • Narrow down answer according to key word type
  3. 3. Work Process Input wordMatch(Common Component) (Question,StorySentences)->ScoredSentences reverseWordMatch(Make Assumption) correctAnswer->correctSentence narrowDownAnswer(Why) correctSentence->correctAnswer Google: Fast is better than slow narrowDownAnswer(What) correctSentence->correctAnswer narrowDownAnswer(…) correctSentence->correctAnswer narrowDownAnswer(How) correctSentence->correctAnswer BestSentence(Why) ScoredSentence->correctSentence BestSentence(What) ScoredSentence->correctSentence BestSentence(…) ScoredSentence->correctSentence BestSentence(How) ScoredSentence->correctSentence Rule Isolation Rule Isolation Developing Task Parallelism
  4. 4. Reports and Testing Fast Testing Classify Question Type Rules WordMatch Rules Object Persistence, Reload Test all data in less than 1 minute Pre-annotated Story and Question Annotated Test Report Test By Question Type List Annotated and Scored Report For Every Question Type 151 Stories,1200 Questions
  5. 5. Precision Improvement • Used reports and searched for general patterns to narrow down answer within sentence • Find continuous named entity tags in answer • Location. time, and preposition keywords • Why: substring after “so” or “because”—simple but very effective • What: substring after core sub, core verb, root verb, or strike of word contained in question • Who, Where type of questions are more complicated
  6. 6. Sample Report QuestionID: 1999-W38-1-9
 Question: How many peacekeeping troops does Canada now have in 22 nations around the world?
 Answer: about 3,900 | 3,900
 SentenceSize: 1
 CorrectSentence: [8.5]Right now [+word_0.5] , Canada [+word_0.5] has [+word_0.5+auxverb_0.5] about 3,900 peacekeepers on 22 [+word_0.5] missions around [+word_0.5] the world [+strike_1+word_0.5] . [HOW_NUMBER_NER:4.0 ]
 MyAnswer: 3,900 22
 MyAnswerSentence: [8.5]Right now [+word_0.5] , Canada [+word_0.5] has [+word_0.5+auxverb_0.5] about 3,900 peacekeepers on 22 [+word_0.5] missions around [+word_0.5] the world [+strike_1+word_0.5] . [HOW_NUMBER_NER:4.0 ]
 MyAnswerScore: AnswerScore{recall=1.0, precise=0.5, fmeasure=0.6666666666666666, myCorrect=1, correctTotal=1, myTotal=2, matchKey=' 3,900'}
 Difficulty: easy
 Included: Yes
 QuestionID: 1999-W37-5-3
 Question: How much longer did Meiorin take to run 2.5 kilometres than she was supposed to?
 Answer: 49 seconds
 SentenceSize: 1
 CorrectSentence: [15.5]She [+word_0.5+disword_2.0] took [+word_0.5+rootverb_6] 49 seconds longer [+word_0.5+disword_2.0] . [HOW_NUMBER_NER:4.0 ]
 MyAnswer: 49 seconds
 MyAnswerSentence: [15.5]She [+word_0.5+disword_2.0] took [+word_0.5+rootverb_6] 49 seconds longer [+word_0.5+disword_2.0] . [HOW_NUMBER_NER:4.0 ]
 MyAnswerScore: AnswerScore{recall=1.0, precise=1.0, fmeasure=1.0, myCorrect=2, correctTotal=2, myTotal=2, matchKey='49 seconds'}
 Difficulty: moderate
 Included: Yes rightSentence=94, length = 175, avgRecall = 0.47672550966159993, avgPrecision = 0.3014595835896379, avgFmeasure = 0.32152025726198186
  7. 7. WordMatch: TokenScore&RuleScore Question: Why is the Sheldon Kennedy Foundation abandoning its dream of building a ranch for sexually abused children? MyAnswerSentence: [47.5]Troubled by poor business decisions , the Sheldon [+strike_1+word_0.5+disword_1.0] Kennedy [+strike_1+word_0.5+disword_1.0] Foundation [+strike_1+word_0.5+disword_2.0+subj_1+secverb_3] has abandoned [+word_0.5+rootverb_6] its [+strike_1+word_0.5+disword_1.0] dream [+strike_1+word_0.5+disword_2.0+dobj_1+secverb_3] of [+strike_1] building [+strike_1+word_0.5+coreverb_4] a [+strike_1] ranch [+strike_1+word_0.5+disword_0.5] for [+strike_1] sexually [+strike_1+word_0.5+disword_0.125] abused [+strike_1+word_0.5+disword_0.125] children [+strike_1+word_0.5+disword_0.25] and will hand its [+word_0.5+disword_0.5] donations over to the Canadian Red Cross . [WHY_CURRENT_GOOD:2.0 ] Stopword Total Score =Token Score+Rule Score Strike_1: Continuous Match Word Match:0.5 closer dis_word_1.0 Math.pow(2, 2 - dobj_1 root verb_6 coreverb_4: verb but not root further dis_word_0.125 Math.pow(2, 2 - distance) Incomplete Lucene Stopword Why RuleScore
  8. 8. WordMatch: TokenScore 1. Generate 1. VerbsInQuestion,OthersInQuestion,rootVerb,subjInQuestion,dObjInQuestion 2. General Word Match Score 1. WORD: “word_0.5” 2. DIS_WORD: “disword_{pow(2,2-distance)}” //unimportant “Mod”, longer distance from the root 3. Continuous Word Match Bonus Score 1. STRIKE: “strike_1” // for every continuous word(not include stopword) 4. Verb Match 1. Verb stopwords no score. //lucene stopwords 2. VerbsInQuestion.contains(“verb”) • AUX_VERB: 0.5 • ROOT_VERB: 6 • COM_VERB: 0.5 • CORE_VERB: 4 5. Noun Match 1. ROOT_VERB: 6 // Root Verb in noun form 2. subjInQuestion != null && matched • SUBJ: 1 // match subj in question. • CORE_SUBJ: 3 // match core_subj in question • SEC_VERB: 3 // verb of this matched noun is also in Question 3. dObjInQuestion != null && matched • DOBJ: 1 // match dobj in question • SEC_VERB: 3 // verb of this matched noun is also in Question Time Limited Just Extract Features, Manual tuning weights Future Work: 1. Learning weights for features 2. dcoref 3. Thesaurus
  9. 9. BestSentence: Rule Score WEAK_CLUE=1 CLUE=2 GOOD_CLUE =4 CONFIDENT=6 SLAM_DUNK=20 Only TokenScore in all dataset Average: CorrectSentence/All TokenScore+RuleScore in all dataset Average: CorrectSentence/All Where GOOD_CLUE: WHERE_LOCATION_PREP CLUE: WHERE_LOCATION_NER 86/155=0.5548 93/155=0.6000 When GOOD_CLUE: WHEN_TIME_NER SLAM_DUNK: WHEN_ORDER_TOKEN SLAM_DUNK: WHEN_BEGIN_TOKEN 117/173=0.6763 119/173=0.6879 Why GOOD_CLUE: WHY_REASON_TOKEN CLUE: WHY_CURRENT_GOOD CLUE: WHY_POST_GOOD CLUE: WHY_PRE_GOOD WEAK_CLUE: WHY_LEAD_TOKEN WEAK_CLUE: WHY_THINK_TOKEN WEAK_CLUE: WHY_WANT_TOKEN 70/115=0.6087 71/115=0.6174 Who/ Whose GOOD_CLUE:WHO_PERSON_NER CLUE:WHO_ORGANIZATION_MISC_NER CLUE:WHOSE_PRP_POS 116/187=0.6203 118/187=0.6310 What GOOD_CLUE: WHAT_KIND_TOKEN CLUE: WHAT_DATE_NER SLAM_DUNK: WHAT_NAME_TOKEN 202/311=0.6325 202/311=0.6325 How HOW_SECKEY_K: distance from k to root GOOD_CLUE: HOW_DISTANCE_TEXT 141/232=0.6078 143/232=0.6164 testset1 52/84 What, 35/57 How, 27/44 Where, 25/41 Who, 23/32 When, 16/28 Why … 212/313=0.6773 212/313=0.6773 testset2 57/88 What, 41/50 How, 28/44 Who, 27/40 When, 24/38 Where, 21/31 Why … 215/315=0.6825 221/315=0.7016
  10. 10. Summary • Reporting system worked very well, helped to improve precision • System is fairly simple and straight forward • Unfortunately due to Stanford core nlp lib bug, we could not incorporate coreference resolution into our system • Short on time: Tuning weights manually(ML?) Recall and precision still have room for improvement
  11. 11. Thanks Q&A