1. Anirban Sen, IIT Delhi
Manjira Sinha, Conduent Labs India
Sandya Mannarswamy, Conduent Labs India
2. ‘Billions of voices’ participate in community question answering (CQA) forums regularly
Each has its own style of expression, vocabulary and knowledge
This results in a wide lexico-syntactic gap
The gap makes it hard to identify matching question-answer pairs
Matching is required to ease the load of retrieving and answering similar questions repeatedly
3. I write: Does drinking coffee increase blood pressure?
My father writes: Has there been any known link between caffeine intake and hypertension?
One way to address this challenge is to incorporate vocabulary diversity into a consolidated representation
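The gap between the two example questions can be made concrete with a small sketch. It is an illustration only, not the method presented in these slides: it measures plain word overlap (Jaccard similarity) between the two questions, using a hypothetical stopword list and helper names (content_words, jaccard) invented for this example.

```python
import re

# Hypothetical, tiny stopword list chosen for this illustration only.
STOPWORDS = {"a", "an", "the", "does", "has", "there", "been",
             "any", "between", "and", "is", "of"}


def content_words(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


q1 = "Does drinking coffee increase blood pressure?"
q2 = "Has there been any known link between caffeine intake and hypertension?"

# The two questions ask the same thing but share no content words.
overlap = jaccard(content_words(q1), content_words(q2))
print(f"Jaccard overlap: {overlap:.2f}")
```

With no shared content words, a matcher based purely on surface overlap has nothing to go on for this pair, which is why a representation that accounts for vocabulary diversity is needed.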
4.
5.
6. Stack Exchange data set released by Doris Hoogeveen, Karin M. Verspoor, and Timothy Baldwin. 2015. CQADupStack: A Benchmark Data Set for Community Question-Answering Research. In Proceedings of the 20th Australasian Document Computing Symposium (ADCS’15)
Training 2K, Test 8K
Balanced/Unbalanced; 0-overlap, 2-overlap
7.
8. Incorporate the essence of ‘multi-sentence’ input
Experiment with attention-based models and recurrent networks
Look into different discussion domains separately
Use background domain knowledge