2. Overview
• Rule-based system [based on Ellen’s Quarc paper]
• Stanford CoreNLP library: POS tagging, sentence splitting, NER, dependency relations
• Classify questions according to keyword: what, where, when, why, how, etc.
• Word match function
• Rules that award points to each sentence
• Narrow down the answer according to keyword type
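A minimal sketch of the keyword classification step, assuming the first question keyword in the sentence decides the type. The names QuestionClassifier and QuestionType are hypothetical and not taken from the project code.

```java
import java.util.Locale;

// Hypothetical names: QuestionClassifier and QuestionType are illustrative only.
final class QuestionClassifier {
    enum QuestionType { WHAT, WHERE, WHEN, WHY, WHO, HOW, OTHER }

    // Scans the tokens left to right and returns the type of the first question keyword.
    static QuestionType classify(String question) {
        for (String token : question.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            switch (token) {
                case "what": return QuestionType.WHAT;
                case "where": return QuestionType.WHERE;
                case "when": return QuestionType.WHEN;
                case "why": return QuestionType.WHY;
                case "who":
                case "whose": return QuestionType.WHO; // Who and Whose are grouped (assumption)
                case "how": return QuestionType.HOW;
                default: break; // not a question keyword, keep scanning
            }
        }
        return QuestionType.OTHER;
    }

    public static void main(String[] args) {
        System.out.println(classify("How many peacekeeping troops does Canada now have?"));
    }
}
```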
3. Work Process
Input
wordMatch (common component)
    (Question, StorySentences) -> ScoredSentences
BestSentence(Why), BestSentence(What), BestSentence(…), BestSentence(How)
    ScoredSentence -> correctSentence
narrowDownAnswer(Why), narrowDownAnswer(What), narrowDownAnswer(…), narrowDownAnswer(How)
    correctSentence -> correctAnswer
reverseWordMatch (make assumption)
    correctAnswer -> correctSentence
Rule isolation: BestSentence and narrowDownAnswer each have separate rules per question type
Developing task parallelism
Google: Fast is better than slow
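The flow above can be sketched as one shared wordMatch step plus an isolated rule set per question type. This is only an illustration under assumptions: the TypeRules interface, the simplified wordMatch (0.5 per token shared with the question) and the method names are hypothetical stand-ins, not the project's actual classes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the pipeline; all names are illustrative.
final class PipelineSketch {
    // One isolated rule set per question type (Why, What, ..., How).
    interface TypeRules {
        double ruleScore(String question, String sentence);         // used by BestSentence
        String narrowDownAnswer(String question, String sentence);  // correctSentence -> correctAnswer
    }

    // Simplified stand-in for wordMatch: 0.5 for every sentence token that occurs in the question.
    static double wordMatch(String question, String sentence) {
        Set<String> questionTokens =
                new HashSet<>(Arrays.asList(question.toLowerCase(Locale.ROOT).split("\\W+")));
        double score = 0;
        for (String token : sentence.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty() && questionTokens.contains(token)) {
                score += 0.5;
            }
        }
        return score;
    }

    // (Question, StorySentences) -> ScoredSentences -> correctSentence -> correctAnswer
    static String answer(String question, List<String> storySentences, TypeRules rules) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String sentence : storySentences) {
            double score = wordMatch(question, sentence) + rules.ruleScore(question, sentence);
            if (score > bestScore) {
                bestScore = score;
                best = sentence;
            }
        }
        return best == null ? "" : rules.narrowDownAnswer(question, best);
    }
}
```

Because each question type only implements TypeRules, the rules for Why, What and How can be written and tested separately.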
4. Reports and Testing
Fast Testing
Classify Question Type Rules
WordMatch Rules Object Persistence, Reload
Test all data in less than 1 minute
Pre-annotated Story and Question
Annotated Test Report
Test By Question Type List
Annotated and Scored Report for Every Question Type
151 Stories, 1200 Questions
5. Precision Improvement
• Used the reports to search for general patterns that narrow down the answer within the sentence
• Find continuous named entity tags in the answer
• Location, time, and preposition keywords
• Why: substring after “so” or “because”; simple but very effective
• What: substring after the core subject, core verb, root verb, or a strike of words contained in the question
• Who and Where questions are more complicated
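A minimal sketch of the Why narrowing rule, assuming the first standalone “so” or “because” marks the start of the answer. WhyNarrower is a hypothetical name, and falling back to the whole sentence when no cue word is found is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of narrowDownAnswer(Why).
final class WhyNarrower {
    // First standalone "because" or "so", case-insensitive.
    private static final Pattern CUE =
            Pattern.compile("\\b(?:because|so)\\b", Pattern.CASE_INSENSITIVE);

    static String narrowDown(String sentence) {
        Matcher m = CUE.matcher(sentence);
        if (!m.find()) {
            return sentence.trim(); // no cue word: keep the whole sentence (assumption)
        }
        return sentence.substring(m.end()).trim();
    }

    public static void main(String[] args) {
        System.out.println(narrowDown("She left early because the road was icy."));
    }
}
```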
6. Sample Report
QuestionID: 1999-W38-1-9
Question: How many peacekeeping troops does Canada now have in 22 nations around the world?
Answer: about 3,900 | 3,900
SentenceSize: 1
CorrectSentence: [8.5]Right now [+word_0.5] , Canada [+word_0.5] has [+word_0.5+auxverb_0.5] about 3,900 peacekeepers
on 22 [+word_0.5] missions around [+word_0.5] the world [+strike_1+word_0.5] . [HOW_NUMBER_NER:4.0 ]
MyAnswer: 3,900 22
MyAnswerSentence: [8.5]Right now [+word_0.5] , Canada [+word_0.5] has [+word_0.5+auxverb_0.5] about 3,900 peacekeepers
on 22 [+word_0.5] missions around [+word_0.5] the world [+strike_1+word_0.5] . [HOW_NUMBER_NER:4.0 ]
MyAnswerScore: AnswerScore{recall=1.0, precise=0.5, fmeasure=0.6666666666666666, myCorrect=1, correctTotal=1,
myTotal=2, matchKey=' 3,900'}
Difficulty: easy
Included: Yes
QuestionID: 1999-W37-5-3
Question: How much longer did Meiorin take to run 2.5 kilometres than she was supposed to?
Answer: 49 seconds
SentenceSize: 1
CorrectSentence: [15.5]She [+word_0.5+disword_2.0] took [+word_0.5+rootverb_6] 49 seconds longer
[+word_0.5+disword_2.0] . [HOW_NUMBER_NER:4.0 ]
MyAnswer: 49 seconds
MyAnswerSentence: [15.5]She [+word_0.5+disword_2.0] took [+word_0.5+rootverb_6] 49 seconds longer
[+word_0.5+disword_2.0] . [HOW_NUMBER_NER:4.0 ]
MyAnswerScore: AnswerScore{recall=1.0, precise=1.0, fmeasure=1.0, myCorrect=2, correctTotal=2, myTotal=2, matchKey='49
seconds'}
Difficulty: moderate
Included: Yes
rightSentence=94, length = 175, avgRecall = 0.47672550966159993, avgPrecision = 0.3014595835896379, avgFmeasure =
0.32152025726198186
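The AnswerScore values in the report are consistent with token-level recall, precision and F-measure against the best-matching answer alternative. The sketch below reproduces them under that assumption; the class name, whitespace tokenization and the choice of the alternative with the highest F-measure are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the answer scoring used in the report.
final class AnswerScoreSketch {
    final double recall;
    final double precision;
    final double fmeasure;

    AnswerScoreSketch(double recall, double precision) {
        this.recall = recall;
        this.precision = precision;
        double sum = recall + precision;
        this.fmeasure = sum == 0 ? 0 : 2 * recall * precision / sum;
    }

    private static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        if (!trimmed.isEmpty()) {
            result.addAll(Arrays.asList(trimmed.split("\\s+")));
        }
        return result;
    }

    // recall = myCorrect / correctTotal, precision = myCorrect / myTotal
    static AnswerScoreSketch score(String myAnswer, String key) {
        List<String> mine = tokens(myAnswer);
        List<String> gold = tokens(key);
        int myTotal = mine.size();
        int correctTotal = gold.size();
        int myCorrect = 0;
        for (String token : mine) {
            if (gold.remove(token)) { // each key token can be matched only once
                myCorrect++;
            }
        }
        double recall = correctTotal == 0 ? 0 : (double) myCorrect / correctTotal;
        double precision = myTotal == 0 ? 0 : (double) myCorrect / myTotal;
        return new AnswerScoreSketch(recall, precision);
    }

    // Answer alternatives are separated by "|"; keep the one with the highest F-measure.
    static AnswerScoreSketch best(String myAnswer, String answerLine) {
        AnswerScoreSketch top = new AnswerScoreSketch(0, 0);
        for (String key : answerLine.split("\\|")) {
            AnswerScoreSketch candidate = score(myAnswer, key);
            if (candidate.fmeasure > top.fmeasure) {
                top = candidate;
            }
        }
        return top;
    }
}
```

For the first report entry, best("3,900 22", "about 3,900 | 3,900") matches the key “3,900”: one of two answer tokens is correct, so recall = 1/1, precision = 1/2 and F-measure = 2 × 1 × 0.5 / 1.5 ≈ 0.667.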
7. WordMatch: TokenScore & RuleScore
Question: Why is the Sheldon Kennedy Foundation abandoning its dream of building a
ranch for sexually abused children?
MyAnswerSentence: [47.5]Troubled by poor business decisions , the Sheldon
[+strike_1+word_0.5+disword_1.0] Kennedy [+strike_1+word_0.5+disword_1.0]
Foundation [+strike_1+word_0.5+disword_2.0+subj_1+secverb_3] has abandoned
[+word_0.5+rootverb_6] its [+strike_1+word_0.5+disword_1.0] dream
[+strike_1+word_0.5+disword_2.0+dobj_1+secverb_3] of [+strike_1] building
[+strike_1+word_0.5+coreverb_4] a [+strike_1] ranch [+strike_1+word_0.5+disword_0.5]
for [+strike_1] sexually [+strike_1+word_0.5+disword_0.125] abused
[+strike_1+word_0.5+disword_0.125] children [+strike_1+word_0.5+disword_0.25] and will
hand its [+word_0.5+disword_0.5] donations over to the Canadian Red Cross .
[WHY_CURRENT_GOOD:2.0 ]
Total Score = Token Score + Rule Score
word_0.5: word match
strike_1: continuous match
disword: Math.pow(2, 2 - distance); closer words score higher (disword_1.0), further words lower (disword_0.125)
dobj_1
rootverb_6
coreverb_4: verb but not root
Stopword: incomplete Lucene stopword list
Why RuleScore: [WHY_CURRENT_GOOD:2.0 ]
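The relation Total Score = Token Score + Rule Score can be checked by summing the tags in an annotated report sentence. The sketch below parses the annotation format shown in the reports; the class name and regular expressions are hypothetical and assume every token tag looks like “+name_value” and every rule tag like “NAME:value”.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that re-adds the scores printed in an annotated sentence.
final class AnnotationSum {
    private static final String NUM = "([0-9]+(?:\\.[0-9]+)?)";
    // Token tags such as +word_0.5 or +rootverb_6
    private static final Pattern TOKEN_TAG = Pattern.compile("\\+[a-z]+_" + NUM);
    // Rule tags such as HOW_NUMBER_NER:4.0
    private static final Pattern RULE_TAG = Pattern.compile("[A-Z][A-Z_]*:" + NUM);

    private static double sum(Pattern pattern, String text) {
        double total = 0;
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            total += Double.parseDouble(m.group(1));
        }
        return total;
    }

    static double tokenScore(String annotated) { return sum(TOKEN_TAG, annotated); }

    static double ruleScore(String annotated) { return sum(RULE_TAG, annotated); }

    static double totalScore(String annotated) { return tokenScore(annotated) + ruleScore(annotated); }

    public static void main(String[] args) {
        String s = "[15.5]She [+word_0.5+disword_2.0] took [+word_0.5+rootverb_6] 49 seconds longer "
                + "[+word_0.5+disword_2.0] . [HOW_NUMBER_NER:4.0 ]";
        // prints 11.5 + 4.0 = 15.5
        System.out.println(tokenScore(s) + " + " + ruleScore(s) + " = " + totalScore(s));
    }
}
```

Applied to the Why sentence above, the token tags add up to 45.5 and the rule tag adds 2.0, which gives the reported 47.5.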
8. WordMatch: TokenScore
1. Generate
   1. VerbsInQuestion, OthersInQuestion, rootVerb, subjInQuestion, dObjInQuestion
2. General Word Match Score
   1. WORD: “word_0.5”
   2. DIS_WORD: “disword_{pow(2,2-distance)}” // “Mod” words at a longer distance from the root are less important
3. Continuous Word Match Bonus Score
   1. STRIKE: “strike_1” // for every continuous word (not including stopwords)
4. Verb Match
   1. Verb stopwords get no score. // Lucene stopwords
   2. VerbsInQuestion.contains(“verb”)
      • AUX_VERB: 0.5
      • ROOT_VERB: 6
      • COM_VERB: 0.5
      • CORE_VERB: 4
5. Noun Match
   1. ROOT_VERB: 6 // root verb in noun form
   2. subjInQuestion != null && matched
      • SUBJ: 1 // matches subj in question
      • CORE_SUBJ: 3 // matches core_subj in question
      • SEC_VERB: 3 // verb of this matched noun is also in the question
   3. dObjInQuestion != null && matched
      • DOBJ: 1 // matches dobj in question
      • SEC_VERB: 3 // verb of this matched noun is also in the question
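The weights above can be sketched as a few small functions. This is a simplified illustration, not the project code: the class and method names are hypothetical, words are assumed to be compared as lemmas, and only the general word match and the verb match are covered.

```java
import java.util.Set;

// Hypothetical sketch of parts of the TokenScore; weights are taken from the list above.
final class TokenScoreSketch {
    static final double WORD = 0.5;
    static final double STRIKE = 1.0;

    enum VerbRole {
        AUX_VERB(0.5), ROOT_VERB(6), COM_VERB(0.5), CORE_VERB(4);

        final double weight;

        VerbRole(double weight) {
            this.weight = weight;
        }
    }

    // DIS_WORD: pow(2, 2 - distance), so the score halves with every extra step from the root.
    static double disWord(int distanceFromRoot) {
        return Math.pow(2, 2 - distanceFromRoot);
    }

    // General word match for a non-verb token, plus the strike bonus if it continues a match.
    static double wordScore(int distanceFromRoot, boolean continuesStrike) {
        return WORD + disWord(distanceFromRoot) + (continuesStrike ? STRIKE : 0);
    }

    // Verb match: stopword verbs and verbs that are not in the question get no score.
    static double verbScore(String verbLemma, VerbRole role,
                            Set<String> verbsInQuestion, Set<String> stopwords) {
        if (stopwords.contains(verbLemma) || !verbsInQuestion.contains(verbLemma)) {
            return 0;
        }
        return WORD + role.weight;
    }
}
```

With these weights, wordScore(2, true) gives 1 + 0.5 + 1.0 = 2.5, the same total as “Sheldon [+strike_1+word_0.5+disword_1.0]” in the example sentence, and a matched root verb gives 0.5 + 6 = 6.5, as for “abandoned”.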
Time limited: only extracted features and tuned the weights manually
Future work:
1. Learning weights for features
2. dcoref
3. Thesaurus
9. BestSentence: Rule Score
WEAK_CLUE=1
CLUE=2
GOOD_CLUE=4
CONFIDENT=6
SLAM_DUNK=20
Both score columns: average CorrectSentence/All over the whole dataset

Type       Rules                                   Only TokenScore    TokenScore+RuleScore
Where      GOOD_CLUE: WHERE_LOCATION_PREP          86/155=0.5548      93/155=0.6000
           CLUE: WHERE_LOCATION_NER
When       GOOD_CLUE: WHEN_TIME_NER                117/173=0.6763     119/173=0.6879
           SLAM_DUNK: WHEN_ORDER_TOKEN
           SLAM_DUNK: WHEN_BEGIN_TOKEN
Why        GOOD_CLUE: WHY_REASON_TOKEN             70/115=0.6087      71/115=0.6174
           CLUE: WHY_CURRENT_GOOD
           CLUE: WHY_POST_GOOD
           CLUE: WHY_PRE_GOOD
           WEAK_CLUE: WHY_LEAD_TOKEN
           WEAK_CLUE: WHY_THINK_TOKEN
           WEAK_CLUE: WHY_WANT_TOKEN
Who/Whose  GOOD_CLUE: WHO_PERSON_NER               116/187=0.6203     118/187=0.6310
           CLUE: WHO_ORGANIZATION_MISC_NER
           CLUE: WHOSE_PRP_POS
What       GOOD_CLUE: WHAT_KIND_TOKEN              202/311=0.6325     202/311=0.6325
           CLUE: WHAT_DATE_NER
           SLAM_DUNK: WHAT_NAME_TOKEN
How        HOW_SECKEY_K: distance from k to root   141/232=0.6078     143/232=0.6164
           GOOD_CLUE: HOW_DISTANCE_TEXT
testset1   52/84 What, 35/57 How, 27/44 Where, 25/41 Who, 23/32 When, 16/28 Why …
                                                   212/313=0.6773     212/313=0.6773
testset2   57/88 What, 41/50 How, 28/44 Who, 27/40 When, 24/38 Where, 21/31 Why …
                                                   215/315=0.6825     221/315=0.7016
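A minimal sketch of how the clue strengths could turn into a RuleScore, using the Why rules as the example. The class name is hypothetical, and it is an assumption that the points of all rules that fire on a sentence are simply added up; what makes each rule fire is not modelled here.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the RuleScore for Why questions.
final class RuleScoreSketch {
    enum Clue {
        WEAK_CLUE(1), CLUE(2), GOOD_CLUE(4), CONFIDENT(6), SLAM_DUNK(20);

        final int points;

        Clue(int points) {
            this.points = points;
        }
    }

    // Rule name -> clue strength, as listed for the Why question type.
    static final Map<String, Clue> WHY_RULES = new LinkedHashMap<>();
    static {
        WHY_RULES.put("WHY_REASON_TOKEN", Clue.GOOD_CLUE);
        WHY_RULES.put("WHY_CURRENT_GOOD", Clue.CLUE);
        WHY_RULES.put("WHY_POST_GOOD", Clue.CLUE);
        WHY_RULES.put("WHY_PRE_GOOD", Clue.CLUE);
        WHY_RULES.put("WHY_LEAD_TOKEN", Clue.WEAK_CLUE);
        WHY_RULES.put("WHY_THINK_TOKEN", Clue.WEAK_CLUE);
        WHY_RULES.put("WHY_WANT_TOKEN", Clue.WEAK_CLUE);
    }

    // Adds the points of every rule that fired on a sentence (assumption: scores are additive).
    static double ruleScore(Collection<String> firedRules) {
        double sum = 0;
        for (String name : firedRules) {
            Clue clue = WHY_RULES.get(name);
            if (clue != null) {
                sum += clue.points;
            }
        }
        return sum;
    }

    // Total Score = Token Score + Rule Score
    static double totalScore(double tokenScore, Collection<String> firedRules) {
        return tokenScore + ruleScore(firedRules);
    }
}
```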
10. Summary
• The reporting system worked very well and helped to improve precision
• The system is fairly simple and straightforward
• Unfortunately, due to a bug in the Stanford CoreNLP library, we could not incorporate coreference resolution into our system
• Short on time: weights were tuned manually (ML?); recall and precision still have room for improvement