5. Question Detection
● Rule based methods
○ 5W1H words (who, what, when, where, why, how)
○ Ends with '?'
■ 30% of questions do not end with a question mark.
● I am wondering where I can visit in Bangkok.
● I am having doubt about changing tyre.
■ 9% of sentences ending with a question mark are not questions.
● Like to enjoy a long walk while enjoying great sights and tastes?
● Only have three days to explore this city? Not good!
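A minimal sketch of the two rules above, assuming a sentence counts as a question if it ends with '?' or contains a 5W1H word. The function name `is_question_rule_based` and the tokenization are illustrative choices, not part of the original slides.

```python
import re

# 5W1H cue words: who, what, when, where, why, how.
FIVE_W_ONE_H = {"who", "what", "when", "where", "why", "how"}


def is_question_rule_based(sentence: str) -> bool:
    """Flag a sentence as a question if it ends with '?' or contains a 5W1H word."""
    text = sentence.strip()
    if text.endswith("?"):
        return True
    # Lowercase word tokens; punctuation is dropped.
    words = re.findall(r"[a-z']+", text.lower())
    return any(word in FIVE_W_ONE_H for word in words)


if __name__ == "__main__":
    examples = [
        "I am wondering where I can visit in Bangkok.",
        "I am having doubt about changing tyre.",
        "Only have three days to explore this city?",
    ]
    for sentence in examples:
        print(is_question_rule_based(sentence), "-", sentence)
```

On the slide's own examples, the rules catch the "where" sentence, miss the "having doubt" question, and wrongly accept the rhetorical "three days" sentence, which is the weakness the percentages describe.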
6. Labeled Sequential Pattern
1. Pre-process each sentence: replace words with POS tags (cue words such as “where” and “can” are kept)
“where can you find a job”
→ “where can PRP VB DT NN”
2. Build the sequence database.
<a, d, e, f> → Q
<a, f, e, f> → Q
<d, a, f> → NQ
3. Calculate the support and confidence of each pattern
- <a, e, f> → Q has support 66.7% and confidence 100%
- <a, f> → Q has support 66.7% and confidence 66.7%
4. Set a minimum support threshold and a minimum confidence threshold
F1 score = 97.4%
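The support and confidence figures above can be reproduced with a short sketch. It assumes that support is the fraction of all sequences that contain the pattern and carry the label, and that confidence is that count divided by the number of sequences containing the pattern. The helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

LabeledSequence = Tuple[Sequence[str], str]

# The toy sequence database from the slide.
DATABASE: List[LabeledSequence] = [
    (("a", "d", "e", "f"), "Q"),
    (("a", "f", "e", "f"), "Q"),
    (("d", "a", "f"), "NQ"),
]


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """True if the pattern items occur in the sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator, so order is enforced.
    return all(item in remaining for item in pattern)


def support_and_confidence(
    pattern: Sequence[str], label: str, database: List[LabeledSequence]
) -> Tuple[float, float]:
    """Return (support, confidence) of the labeled pattern `pattern -> label`."""
    labels_containing = [lab for seq, lab in database if is_subsequence(pattern, seq)]
    matching = sum(1 for lab in labels_containing if lab == label)
    support = matching / len(database)
    confidence = matching / len(labels_containing) if labels_containing else 0.0
    return support, confidence


if __name__ == "__main__":
    for pattern in (("a", "e", "f"), ("a", "f")):
        sup, conf = support_and_confidence(pattern, "Q", DATABASE)
        print(pattern, f"support={sup:.1%}", f"confidence={conf:.1%}")
```

Patterns whose support and confidence both clear the chosen thresholds would then be kept as features for the question classifier.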
7. Answer Detection
● Observation: Many-to-many
○ Multiple questions and answers within the same thread.
■ One question may have multiple replies.
■ One post may contain answers to multiple questions.
8. Answer Detection
● Treat as a traditional document retrieval problem
○ Cosine similarity
○ Query likelihood language model
○ KL-divergence language model
● Classification method
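As an illustration of the retrieval view, here is a minimal sketch that ranks candidate answers by cosine similarity to the question. It assumes raw term-frequency vectors with no TF-IDF weighting or stop-word removal; `rank_candidates` and the sample question are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between the two term-frequency vectors."""
    vec_a, vec_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(question: str, candidates: List[str]) -> List[str]:
    """Candidate answers sorted from most to least similar to the question."""
    return sorted(
        candidates, key=lambda c: cosine_similarity(question, c), reverse=True
    )


if __name__ == "__main__":
    question = "where can I find a good hotel in Bangkok"
    posts = ["I like long walks", "world hotel is good"]
    print(rank_candidates(question, posts))
```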
9. KL-divergence
Think of a “distance” between the question language model and the answer language model
p(w|Ma): probability that keyword w appears in the candidate answer.
p(w|Mq): probability that keyword w appears in the question.
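A minimal sketch of this "distance", assuming unigram language models with add-one smoothing over the shared vocabulary and the direction KL(Ma || Mq). The slide states neither the smoothing nor the direction, so both are assumptions, and the function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_model(text: str, vocabulary: Set[str], alpha: float = 1.0) -> Dict[str, float]:
    """Smoothed unigram probabilities p(w|M) for every word in the vocabulary."""
    counts = Counter(tokenize(text))
    total = sum(counts.values()) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}


def kl_divergence(answer: str, question: str) -> float:
    """KL(Ma || Mq) = sum over w of p(w|Ma) * log(p(w|Ma) / p(w|Mq))."""
    vocabulary = set(tokenize(answer)) | set(tokenize(question))
    if not vocabulary:
        return 0.0
    p_a = unigram_model(answer, vocabulary)
    p_q = unigram_model(question, vocabulary)
    return sum(p_a[w] * math.log(p_a[w] / p_q[w]) for w in vocabulary)


if __name__ == "__main__":
    question = "where is a good hotel"
    for answer in ("world hotel is a good hotel", "I enjoy long walks"):
        print(f"{kl_divergence(answer, question):.3f}", "-", answer)
```

A smaller value means the candidate answer's word distribution is closer to the question's, so candidates would be ranked by ascending divergence.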
10. Answer Detection
● Treat as a traditional document retrieval problem
○ Cosine similarity
○ Query likelihood language model
○ KL-divergence language model
● Classification method
Cons: These methods do not consider the relationships among candidate answers or forum-specific features.
a1: world hotel is good but I prefer century hotel
a2: world hotel has a very good restaurant
a2 (generator) → a1 (offspring)
13. Graph-Based Propagation
● Calculate edge weights based on
○ The probability, assigned by a language model, of generating one candidate answer from the other
○ The distance of the candidate answer from the question
○ The authority of the candidate answer's author
author(ag ; #reply2, #start)
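One way to picture the propagation is the rough sketch below. It is not the exact formula from the referenced paper: the linear combination in `edge_weight`, the coefficients `lam1` and `lam2`, the edge direction (offspring to generator), and the PageRank-style update with a damping factor are all assumptions made for illustration.

```python
from typing import List


def edge_weight(
    lm_score: float, distance: int, authority: float,
    lam1: float = 0.5, lam2: float = 0.5,
) -> float:
    """Hypothetical weight of an edge from an offspring to a generator answer.

    lm_score:  language-model score of generating one answer from the other
    distance:  posts between the question and the generator (assumed >= 1)
    authority: authority score of the generator's author
    """
    return lm_score + lam1 / distance + lam2 * authority


def propagate(
    weights: List[List[float]], damping: float = 0.85, iterations: int = 50
) -> List[float]:
    """PageRank-style scores; weights[o][g] is the edge weight from o to g."""
    n = len(weights)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = [(1.0 - damping) / n] * n
        for o in range(n):
            out_total = sum(weights[o])
            for g in range(n):
                # A node with no outgoing weight spreads its score uniformly.
                share = weights[o][g] / out_total if out_total > 0 else 1.0 / n
                updated[g] += damping * scores[o] * share
        scores = updated
    return scores


if __name__ == "__main__":
    # Two candidates: a1 (index 0, offspring) links to a2 (index 1, generator).
    w = edge_weight(lm_score=0.2, distance=1, authority=0.6)
    print(propagate([[0.0, w], [0.0, 0.0]]))
```

Under these assumptions the generator a2 ends up with the higher score, because the offspring a1 passes its score along the weighted edge.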
22. References
1. Finding question-answer pairs from online forums
http://research.microsoft.com/en-us/people/cyl/sigir2008-gao-msra.pdf
2. PageRank without hyperlinks: Structural re-ranking using links induced by language models
https://www.cs.cornell.edu/home/llee/papers/lmpagerank.home.html