SlideShare a Scribd company logo
1 of 23
G. Cong, L. Wang, C. Lin, Y. Song, and Y. Sun. (2008)
Presenter: Tan Kent Loong r04944005
Finding Question-Answer Pairs from
Online Forums
Motivation
● Forums contain a huge amount of user-
generated content on a variety of topics.
○ Knowledge of human
○ Largely unstructured
Flow Chart
Question Detection Answer DetectionThreads
Forum
Question Detection
● Rule based methods
○ 5W1H
○ End with ' ? '
■ 30% questions do not end with question marks.
● I am wondering where I can visit in Bangkok.
● I am having doubt about changing tyre.
■ 9% are not questions
● Like to enjoy a long walk while enjoying great
sights and tastes?
● Only have three days to explore this city?
Not good!
Labeled Sequential Pattern
1. Pre-process each sentence into POS tags
“where can you find a job”
→ “where can PRP VB DT NN”
1. Build sequence database.
<a, d, e, f> → Q
<a, f, e, f> → Q
<d, a, f> → NQ
1. Calculate the support and confidence
- <a, e, f> with support 66.7% and 100% confidence
- <a, f> with support 66.7% and 66.7% confidence
1. Set minimum support threshold and minimum
confidence threshold
F1 score = 97.4%
Answer Detection
● Observation: Many-to-many
○ Multiple questions and answers within same
thread.
■ 1 question may have multiple replies.
■ 1 post may contain answers to multiple
questions.
● Treat as traditional document retrieval problem
○ Cosine Similarity
○ Query likelihood language model
○ KL-divergence language model
● Classification method
Answer Detection
Think of a “distance” between question language model
and answer language model
p(w|Ma) :
p(w|Mq) :
Probability of keyword appeared in
candidate answer.
Probability of keyword appeared in
question.
KL-divergence
● Treat as traditional document retrieval problem
○ Cosine Similarity
○ Query likelihood language model
○ KL-divergence language model
● Classification method
Cons: Do not consider the relationship of candidate
answers and forum-specific features.
a1: world hotel is good but I prefer century hotel
a2: world hotel has a very good restaurant
a2(generator) → a1(offspring)
Answer Detection
PageRank (without hyperlinks)
Graph-Based Propagation
● Calculate weight based on
○ Probability assigned by language model of
generating one candidate answer from the other
candidate answer
○ The distance of candidate answer from question
○ The authority of authors of candidate answer.
author(ag ; #reply2, #start)
Graph-Based Propagation
Graph-Based Propagation
1. Propagation without initial score:
Graph-Based Propagation
1. Propagation without initial score:
2. Propagation with initial score:
Integration with other methods
1. Graph based propagation → classification
2. Lexical mapping
e.g. “why → because”
Evaluation
Evaluation
Evaluation
Evaluation
Summary
Question Detection
(Labeled Sequence
Pattern)
Answer Detection
(Enhance with
Graph-based
Propagation)
Threads
Forum
Reference
1. Finding question-answer pairs from online forum
http://research.microsoft.com/en-us/people/cyl/sigir2008-gao-
msra.pdf
1. PageRank without hyperlinks: Structural re-ranking
using links induced by language models
https://www.cs.cornell.edu/home/llee/papers/lmpagerank.hom
e.html
Thank you

More Related Content

Viewers also liked

Viewers also liked (9)

Creando contactos en gmail
Creando contactos en gmailCreando contactos en gmail
Creando contactos en gmail
 
fLYER-2
fLYER-2fLYER-2
fLYER-2
 
the
thethe
the
 
Twitter: From Newbie To Ninja
Twitter:  From Newbie To NinjaTwitter:  From Newbie To Ninja
Twitter: From Newbie To Ninja
 
Lake Turnover PowerPoint, Summer Stagnation, Epilimnion, Thermocline, Hypolim...
Lake Turnover PowerPoint, Summer Stagnation, Epilimnion, Thermocline, Hypolim...Lake Turnover PowerPoint, Summer Stagnation, Epilimnion, Thermocline, Hypolim...
Lake Turnover PowerPoint, Summer Stagnation, Epilimnion, Thermocline, Hypolim...
 
Scientific Method and Variables
Scientific Method and VariablesScientific Method and Variables
Scientific Method and Variables
 
Brand positioning part 4
Brand positioning   part 4Brand positioning   part 4
Brand positioning part 4
 
Metallic Bond
Metallic BondMetallic Bond
Metallic Bond
 
The Trees are Down
The Trees are DownThe Trees are Down
The Trees are Down
 

Similar to Finding question answer pairs from online forum

[KDD 2018 tutorial] End to-end goal-oriented question answering systems
[KDD 2018 tutorial] End to-end goal-oriented question answering systems[KDD 2018 tutorial] End to-end goal-oriented question answering systems
[KDD 2018 tutorial] End to-end goal-oriented question answering systemsQi He
 
Generation of Assessment Questions from Textbooks Enriched with Knowledge Models
Generation of Assessment Questions from Textbooks Enriched with Knowledge ModelsGeneration of Assessment Questions from Textbooks Enriched with Knowledge Models
Generation of Assessment Questions from Textbooks Enriched with Knowledge ModelsSergey Sosnovsky
 
[AAAI 2019 tutorial] End-to-end goal-oriented question answering systems
[AAAI 2019 tutorial] End-to-end goal-oriented question answering systems[AAAI 2019 tutorial] End-to-end goal-oriented question answering systems
[AAAI 2019 tutorial] End-to-end goal-oriented question answering systemsQi He
 
Enriching Word Vectors with Subword Information
Enriching Word Vectors with Subword InformationEnriching Word Vectors with Subword Information
Enriching Word Vectors with Subword InformationSeonghyun Kim
 
Weakly Supervised Machine Reading
Weakly Supervised Machine ReadingWeakly Supervised Machine Reading
Weakly Supervised Machine ReadingIsabelle Augenstein
 
Maintaining high quality user generated content through machine learning
Maintaining high quality user generated content through machine learningMaintaining high quality user generated content through machine learning
Maintaining high quality user generated content through machine learningNikhil Dandekar
 
Knowledge graphs + Chatbots with Neo4j
Knowledge graphs + Chatbots with Neo4jKnowledge graphs + Chatbots with Neo4j
Knowledge graphs + Chatbots with Neo4jChristophe Willemsen
 
Teaching Machines to Read and Answer Questions - Sandya - Conduent Labs
 Teaching Machines to Read and Answer Questions - Sandya - Conduent Labs Teaching Machines to Read and Answer Questions - Sandya - Conduent Labs
Teaching Machines to Read and Answer Questions - Sandya - Conduent LabsCodeOps Technologies LLP
 
A Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingA Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingTed Xiao
 
Quora ML Workshop: Maintaining High Quality User-Generated Content through Ma...
Quora ML Workshop: Maintaining High Quality User-Generated Content through Ma...Quora ML Workshop: Maintaining High Quality User-Generated Content through Ma...
Quora ML Workshop: Maintaining High Quality User-Generated Content through Ma...Quora
 
[系列活動] 人工智慧與機器學習在推薦系統上的應用
[系列活動] 人工智慧與機器學習在推薦系統上的應用[系列活動] 人工智慧與機器學習在推薦系統上的應用
[系列活動] 人工智慧與機器學習在推薦系統上的應用台灣資料科學年會
 
Ph d sem_1@iitm
Ph d sem_1@iitmPh d sem_1@iitm
Ph d sem_1@iitmVinu Ev
 
Knowledge Graphs and Chatbots with Neo4j and IBM Watson - Christophe Willemsen
Knowledge Graphs and Chatbots with Neo4j and IBM Watson - Christophe WillemsenKnowledge Graphs and Chatbots with Neo4j and IBM Watson - Christophe Willemsen
Knowledge Graphs and Chatbots with Neo4j and IBM Watson - Christophe WillemsenGraphAware
 
NeurIPS_2018_ConvAI2_ParticipantSlides.pptx
NeurIPS_2018_ConvAI2_ParticipantSlides.pptxNeurIPS_2018_ConvAI2_ParticipantSlides.pptx
NeurIPS_2018_ConvAI2_ParticipantSlides.pptxKaiduTester
 
Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...
Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...
Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...Lviv Data Science Summer School
 
Scaling Quality on Quora Using Machine Learning
Scaling Quality on Quora Using Machine LearningScaling Quality on Quora Using Machine Learning
Scaling Quality on Quora Using Machine LearningVo Viet Anh
 
Cepstrum Placement Talk 2022.pptx
Cepstrum Placement Talk 2022.pptxCepstrum Placement Talk 2022.pptx
Cepstrum Placement Talk 2022.pptxgyan98
 
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
Strata 2016 -  Lessons Learned from building real-life Machine Learning SystemsStrata 2016 -  Lessons Learned from building real-life Machine Learning Systems
Strata 2016 - Lessons Learned from building real-life Machine Learning SystemsXavier Amatriain
 
Instant Question Answering System
Instant Question Answering SystemInstant Question Answering System
Instant Question Answering SystemDhwaj Raj
 
Machine Learning for Q&A Sites: The Quora Example
Machine Learning for Q&A Sites: The Quora ExampleMachine Learning for Q&A Sites: The Quora Example
Machine Learning for Q&A Sites: The Quora ExampleXavier Amatriain
 

Similar to Finding question answer pairs from online forum (20)

[KDD 2018 tutorial] End to-end goal-oriented question answering systems
[KDD 2018 tutorial] End to-end goal-oriented question answering systems[KDD 2018 tutorial] End to-end goal-oriented question answering systems
[KDD 2018 tutorial] End to-end goal-oriented question answering systems
 
Generation of Assessment Questions from Textbooks Enriched with Knowledge Models
Generation of Assessment Questions from Textbooks Enriched with Knowledge ModelsGeneration of Assessment Questions from Textbooks Enriched with Knowledge Models
Generation of Assessment Questions from Textbooks Enriched with Knowledge Models
 
[AAAI 2019 tutorial] End-to-end goal-oriented question answering systems
[AAAI 2019 tutorial] End-to-end goal-oriented question answering systems[AAAI 2019 tutorial] End-to-end goal-oriented question answering systems
[AAAI 2019 tutorial] End-to-end goal-oriented question answering systems
 
Enriching Word Vectors with Subword Information
Enriching Word Vectors with Subword InformationEnriching Word Vectors with Subword Information
Enriching Word Vectors with Subword Information
 
Weakly Supervised Machine Reading
Weakly Supervised Machine ReadingWeakly Supervised Machine Reading
Weakly Supervised Machine Reading
 
Maintaining high quality user generated content through machine learning
Maintaining high quality user generated content through machine learningMaintaining high quality user generated content through machine learning
Maintaining high quality user generated content through machine learning
 
Knowledge graphs + Chatbots with Neo4j
Knowledge graphs + Chatbots with Neo4jKnowledge graphs + Chatbots with Neo4j
Knowledge graphs + Chatbots with Neo4j
 
Teaching Machines to Read and Answer Questions - Sandya - Conduent Labs
 Teaching Machines to Read and Answer Questions - Sandya - Conduent Labs Teaching Machines to Read and Answer Questions - Sandya - Conduent Labs
Teaching Machines to Read and Answer Questions - Sandya - Conduent Labs
 
A Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingA Panorama of Natural Language Processing
A Panorama of Natural Language Processing
 
Quora ML Workshop: Maintaining High Quality User-Generated Content through Ma...
Quora ML Workshop: Maintaining High Quality User-Generated Content through Ma...Quora ML Workshop: Maintaining High Quality User-Generated Content through Ma...
Quora ML Workshop: Maintaining High Quality User-Generated Content through Ma...
 
[系列活動] 人工智慧與機器學習在推薦系統上的應用
[系列活動] 人工智慧與機器學習在推薦系統上的應用[系列活動] 人工智慧與機器學習在推薦系統上的應用
[系列活動] 人工智慧與機器學習在推薦系統上的應用
 
Ph d sem_1@iitm
Ph d sem_1@iitmPh d sem_1@iitm
Ph d sem_1@iitm
 
Knowledge Graphs and Chatbots with Neo4j and IBM Watson - Christophe Willemsen
Knowledge Graphs and Chatbots with Neo4j and IBM Watson - Christophe WillemsenKnowledge Graphs and Chatbots with Neo4j and IBM Watson - Christophe Willemsen
Knowledge Graphs and Chatbots with Neo4j and IBM Watson - Christophe Willemsen
 
NeurIPS_2018_ConvAI2_ParticipantSlides.pptx
NeurIPS_2018_ConvAI2_ParticipantSlides.pptxNeurIPS_2018_ConvAI2_ParticipantSlides.pptx
NeurIPS_2018_ConvAI2_ParticipantSlides.pptx
 
Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...
Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...
Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...
 
Scaling Quality on Quora Using Machine Learning
Scaling Quality on Quora Using Machine LearningScaling Quality on Quora Using Machine Learning
Scaling Quality on Quora Using Machine Learning
 
Cepstrum Placement Talk 2022.pptx
Cepstrum Placement Talk 2022.pptxCepstrum Placement Talk 2022.pptx
Cepstrum Placement Talk 2022.pptx
 
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
Strata 2016 -  Lessons Learned from building real-life Machine Learning SystemsStrata 2016 -  Lessons Learned from building real-life Machine Learning Systems
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
 
Instant Question Answering System
Instant Question Answering SystemInstant Question Answering System
Instant Question Answering System
 
Machine Learning for Q&A Sites: The Quora Example
Machine Learning for Q&A Sites: The Quora ExampleMachine Learning for Q&A Sites: The Quora Example
Machine Learning for Q&A Sites: The Quora Example
 

Recently uploaded

Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 

Recently uploaded (20)

CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 

Finding question answer pairs from online forum

  • 1. G. Cong, L. Wang, C. Lin, Y. Song, and Y. Sun. (2008) Presenter: Tan Kent Loong r04944005 Finding Question-Answer Pairs from Online Forums
  • 2.
  • 3. Motivation ● Forums contain a huge amount of user- generated content on a variety of topics. ○ Knowledge of human ○ Largely unstructured
  • 4. Flow Chart Question Detection Answer DetectionThreads Forum
  • 5. Question Detection ● Rule based methods ○ 5W1H ○ End with ' ? ' ■ 30% questions do not end with question marks. ● I am wondering where I can visit in Bangkok. ● I am having doubt about changing tyre. ■ 9% are not questions ● Like to enjoy a long walk while enjoying great sights and tastes? ● Only have three days to explore this city? Not good!
  • 6. Labeled Sequential Pattern 1. Pre-process each sentence into POS tags “where can you find a job” → “where can PRP VB DT NN” 1. Build sequence database. <a, d, e, f> → Q <a, f, e, f> → Q <d, a, f> → NQ 1. Calculate the support and confidence - <a, e, f> with support 66.7% and 100% confidence - <a, f> with support 66.7% and 66.7% confidence 1. Set minimum support threshold and minimum confidence threshold F1 score = 97.4%
  • 7. Answer Detection ● Observation: Many-to-many ○ Multiple questions and answers within same thread. ■ 1 question may have multiple replies. ■ 1 post may contain answers to multiple questions.
  • 8. ● Treat as traditional document retrieval problem ○ Cosine Similarity ○ Query likelihood language model ○ KL-divergence language model ● Classification method Answer Detection
  • 9. Think of a “distance” between question language model and answer language model p(w|Ma) : p(w|Mq) : Probability of keyword appeared in candidate answer. Probability of keyword appeared in question. KL-divergence
  • 10. ● Treat as traditional document retrieval problem ○ Cosine Similarity ○ Query likelihood language model ○ KL-divergence language model ● Classification method Cons: Do not consider the relationship of candidate answers and forum-specific features. a1: world hotel is good but I prefer century hotel a2: world hotel has a very good restaurant a2(generator) → a1(offspring) Answer Detection
  • 13. ● Calculate weight based on ○ Probability assigned by language model of generating one candidate answer from the other candidate answer ○ The distance of candidate answer from question ○ The authority of authors of candidate answer. author(ag ; #reply2, #start) Graph-Based Propagation
  • 14. Graph-Based Propagation 1. Propagation without initial score:
  • 15. Graph-Based Propagation 1. Propagation without initial score: 2. Propagation with initial score:
  • 16. Integration with other methods 1. Graph based propagation → classification 2. Lexical mapping e.g. “why → because”
  • 21. Summary Question Detection (Labeled Sequence Pattern) Answer Detection (Enhance with Graph-based Propagation) Threads Forum
  • 22. Reference 1. Finding question-answer pairs from online forum http://research.microsoft.com/en-us/people/cyl/sigir2008-gao- msra.pdf 1. PageRank without hyperlinks: Structural re-ranking using links induced by language models https://www.cs.cornell.edu/home/llee/papers/lmpagerank.hom e.html