Talking Geckos QA system overview and performance

Talking Geckos QA System
Jie Cao
Boya Song

Overview
• Rule based system [Based on Ellen’s Quarc Paper]
• Stanford core nlp lib: POS, sentence split, NER,
dependency relations
• Classify questions according to keyword: what, where,
when, why, how, etc.
• Word Match function
• Rules that award points to each sentence
• Narrow down answer according to key word type

Work Process
Input
wordMatch(Common Component)
(Question,StorySentences)->ScoredSentences
reverseWordMatch(Make Assumption)
correctAnswer->correctSentence
narrowDownAnswer(Why)
correctSentence->correctAnswer
Google: Fast is better than slow
narrowDownAnswer(What)
narrowDownAnswer(…)
narrowDownAnswer(How)
BestSentence(Why)
ScoredSentence->correctSentence
BestSentence(What)
BestSentence(…)
BestSentence(How)
Rule Isolation
Rule Isolation
Developing Task Parallelism

Reports and Testing
Fast Testing
Classify Question Type Rules
WordMatch Rules Object Persistence, Reload
Test all data in less than 1 minute
Pre-annotated Story and Question
Annotated Test Report
Test By Question Type List
Annotated and Scored Report For
Every Question Type
151 Stories,1200 Questions

Precision Improvement
• Used reports and searched for general patterns to narrow down
answer within sentence
• Find continuous named entity tags in answer
• Location. time, and preposition keywords
• Why: substring after “so” or “because”—simple but very effective
• What: substring after core sub, core verb, root verb, or strike of
word contained in question
• Who, Where type of questions are more complicated

Sample Report
QuestionID: 1999-W38-1-9 
Question: How many peacekeeping troops does Canada now have in 22 nations around the world? 
Answer: about 3,900 | 3,900 
SentenceSize: 1 
CorrectSentence: [8.5]Right now [+word_0.5] , Canada [+word_0.5] has [+word_0.5+auxverb_0.5] about 3,900 peacekeepers
on 22 [+word_0.5] missions around [+word_0.5] the world [+strike_1+word_0.5] . [HOW_NUMBER_NER:4.0 ] 
MyAnswer: 3,900 22 
MyAnswerSentence: [8.5]Right now [+word_0.5] , Canada [+word_0.5] has [+word_0.5+auxverb_0.5] about 3,900 peacekeepers
on 22 [+word_0.5] missions around [+word_0.5] the world [+strike_1+word_0.5] . [HOW_NUMBER_NER:4.0 ] 
MyAnswerScore: AnswerScore{recall=1.0, precise=0.5, fmeasure=0.6666666666666666, myCorrect=1, correctTotal=1,
myTotal=2, matchKey=' 3,900'} 
Difficulty: easy 
Included: Yes 
 
 
QuestionID: 1999-W37-5-3 
Question: How much longer did Meiorin take to run 2.5 kilometres than she was supposed to? 
Answer: 49 seconds 
SentenceSize: 1 
CorrectSentence: [15.5]She [+word_0.5+disword_2.0] took [+word_0.5+rootverb_6] 49 seconds longer
[+word_0.5+disword_2.0] . [HOW_NUMBER_NER:4.0 ] 
MyAnswer: 49 seconds 
MyAnswerSentence: [15.5]She [+word_0.5+disword_2.0] took [+word_0.5+rootverb_6] 49 seconds longer
[+word_0.5+disword_2.0] . [HOW_NUMBER_NER:4.0 ] 
MyAnswerScore: AnswerScore{recall=1.0, precise=1.0, fmeasure=1.0, myCorrect=2, correctTotal=2, myTotal=2, matchKey='49
seconds'} 
Difficulty: moderate 
Included: Yes
rightSentence=94, length = 175, avgRecall = 0.47672550966159993, avgPrecision = 0.3014595835896379, avgFmeasure =
0.32152025726198186

WordMatch: TokenScore&RuleScore
Question: Why is the Sheldon Kennedy Foundation abandoning its dream of building a
ranch for sexually abused children?
MyAnswerSentence: [47.5]Troubled by poor business decisions , the Sheldon
[+strike_1+word_0.5+disword_1.0] Kennedy [+strike_1+word_0.5+disword_1.0]
Foundation [+strike_1+word_0.5+disword_2.0+subj_1+secverb_3] has abandoned
[+word_0.5+rootverb_6] its [+strike_1+word_0.5+disword_1.0] dream
[+strike_1+word_0.5+disword_2.0+dobj_1+secverb_3] of [+strike_1] building
[+strike_1+word_0.5+coreverb_4] a [+strike_1] ranch [+strike_1+word_0.5+disword_0.5]
for [+strike_1] sexually [+strike_1+word_0.5+disword_0.125] abused
[+strike_1+word_0.5+disword_0.125] children [+strike_1+word_0.5+disword_0.25] and will
hand its [+word_0.5+disword_0.5] donations over to the Canadian Red Cross .
[WHY_CURRENT_GOOD:2.0 ]
Stopword
Total Score
=Token Score+Rule Score
Strike_1:
Continuous Match
Word Match:0.5
closer
dis_word_1.0
Math.pow(2, 2 -
dobj_1
root verb_6
coreverb_4:
verb but not root
further dis_word_0.125
Math.pow(2, 2 - distance)
Incomplete
Lucene Stopword
Why RuleScore

WordMatch: TokenScore
1. Generate
1. VerbsInQuestion,OthersInQuestion,rootVerb,subjInQuestion,dObjInQuestion
2. General Word Match Score
1. WORD: “word_0.5”
2. DIS_WORD: “disword_{pow(2,2-distance)}” //unimportant “Mod”, longer distance from the root
3. Continuous Word Match Bonus Score
1. STRIKE: “strike_1” // for every continuous word(not include stopword)
4. Verb Match
1. Verb stopwords no score. //lucene stopwords
2. VerbsInQuestion.contains(“verb”)
• AUX_VERB: 0.5
• ROOT_VERB: 6
• COM_VERB: 0.5
• CORE_VERB: 4
5. Noun Match
1. ROOT_VERB: 6 // Root Verb in noun form
2. subjInQuestion != null && matched
• SUBJ: 1 // match subj in question.
• CORE_SUBJ: 3 // match core_subj in question
• SEC_VERB: 3 // verb of this matched noun is also in Question
3. dObjInQuestion != null && matched
• DOBJ: 1 // match dobj in question
• SEC_VERB: 3 // verb of this matched noun is also in Question
Time Limited
Just Extract Features, Manual tuning weights
Future Work:
1. Learning weights for features
2. dcoref
3. Thesaurus

BestSentence: Rule Score
WEAK_CLUE=1
CLUE=2
GOOD_CLUE =4
CONFIDENT=6
SLAM_DUNK=20
Only TokenScore
in all dataset
Average:
CorrectSentence/All
TokenScore+RuleScore
in all dataset
Average:
CorrectSentence/All
Where GOOD_CLUE: WHERE_LOCATION_PREP
CLUE: WHERE_LOCATION_NER
86/155=0.5548 93/155=0.6000
When
GOOD_CLUE: WHEN_TIME_NER
SLAM_DUNK: WHEN_ORDER_TOKEN
SLAM_DUNK: WHEN_BEGIN_TOKEN
117/173=0.6763 119/173=0.6879
Why
GOOD_CLUE: WHY_REASON_TOKEN
CLUE: WHY_CURRENT_GOOD
CLUE: WHY_POST_GOOD
CLUE: WHY_PRE_GOOD
WEAK_CLUE: WHY_LEAD_TOKEN
WEAK_CLUE: WHY_THINK_TOKEN
WEAK_CLUE: WHY_WANT_TOKEN
70/115=0.6087 71/115=0.6174
Who/
Whose
GOOD_CLUE:WHO_PERSON_NER
CLUE:WHO_ORGANIZATION_MISC_NER
CLUE:WHOSE_PRP_POS
116/187=0.6203 118/187=0.6310
What
GOOD_CLUE: WHAT_KIND_TOKEN
CLUE: WHAT_DATE_NER
SLAM_DUNK: WHAT_NAME_TOKEN
202/311=0.6325 202/311=0.6325
How HOW_SECKEY_K: distance from k to root
GOOD_CLUE: HOW_DISTANCE_TEXT
141/232=0.6078 143/232=0.6164
testset1 52/84 What, 35/57 How, 27/44 Where, 25/41 Who, 23/32 When, 16/28 Why
…
212/313=0.6773 212/313=0.6773
testset2 57/88 What, 41/50 How, 28/44 Who, 27/40 When, 24/38 Where, 21/31 Why … 215/315=0.6825 221/315=0.7016

Summary
• Reporting system worked very well, helped to
improve precision
• System is fairly simple and straight forward
• Unfortunately due to Stanford core nlp lib bug, we
could not incorporate coreference resolution into
our system
• Short on time: Tuning weights manually(ML?) Recall
and precision still have room for improvement

Talking Geckos QA system overview and performance

Recommended

Recommended

More Related Content

Similar to Talking Geckos QA system overview and performance

Similar to Talking Geckos QA system overview and performance (13)

More from jie cao

More from jie cao (7)

Recently uploaded

Recently uploaded (20)

Talking Geckos QA system overview and performance