The document describes a system for identifying answers and implicit dialogues in community question answering sites. It examines multiple feature types, including string similarity, word embeddings, topic modeling and keyword features. An SVM classifier is used to rank answers for three subtasks: comment classification, question-comment similarity and question-external comment similarity. Evaluation on the SemEval 2017 Task 3 dataset shows the full feature set performs best, with string and embedding features also providing significant contributions to performance.
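The ranking pipeline described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the toy embedding, the vocabulary and the training triples are invented for the example, and only two of the paper's feature families (string similarity and embedding cosine) are shown.

```python
# Sketch: rank candidate answers to a question with an SVM over
# string-similarity and embedding-cosine features (illustrative only).
from difflib import SequenceMatcher
import numpy as np
from sklearn.svm import SVC

VOCAB = {"password": 0, "reset": 1, "login": 2, "forgot": 3,
         "hotel": 4, "doha": 5, "corniche": 6, "pizza": 7, "driver": 8}

def toy_embed(text):
    # Stand-in for word embeddings: bag-of-words over a tiny fixed vocabulary.
    v = np.zeros(len(VOCAB))
    for tok in text.lower().split():
        if tok in VOCAB:
            v[VOCAB[tok]] += 1.0
    return v

def features(question, answer):
    """Surface string overlap plus cosine similarity of averaged embeddings."""
    string_sim = SequenceMatcher(None, question, answer).ratio()
    q, a = toy_embed(question), toy_embed(answer)
    cos = float(np.dot(q, a) / (np.linalg.norm(q) * np.linalg.norm(a)))
    return [string_sim, cos]

# Tiny training set: (question, answer, is_good) triples.
train = [
    ("how to reset my password", "click forgot password on the login page", 1),
    ("how to reset my password", "i like pizza", 0),
    ("best hotel in doha", "the hotel near the corniche", 1),
    ("best hotel in doha", "reinstall the driver", 0),
]
X = [features(q, a) for q, a, _ in train]
y = [label for _, _, label in train]
clf = SVC(kernel="linear").fit(X, y)

# Rank candidate answers by the SVM decision value.
cands = ["use the forgot password link", "i like pizza"]
scores = [clf.decision_function([features("how to reset my password", c)])[0]
          for c in cands]
ranked = [c for _, c in sorted(zip(scores, cands), reverse=True)]
```

In the real system the decision values would be used to order all comments of a thread, and the feature vector would also include topic-model and keyword features.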
LAK19 - Towards Value-Sensitive Learning Analytics Design (Bodong Chen)
LAK19 Full Paper. Abstract: To support ethical considerations and system integrity in learning analytics, this paper introduces two cases of applying the Value Sensitive Design methodology to learning analytics design. The first study applied two methods of Value Sensitive Design, namely stakeholder analysis and value analysis, to a conceptual investigation of an existing learning analytics tool. This investigation uncovered a number of values and value tensions, leading to design trade-offs to be considered in future tool refinements. The second study holistically applied Value Sensitive Design to the design of a recommendation system for the Wikipedia WikiProjects. To proactively consider values among stakeholders, we derived a multi-stage design process that included literature analysis, empirical investigations, prototype development, community engagement, iterative testing and refinement, and continuous evaluation. By reporting on these two cases, this paper responds to the need for practical means of supporting ethical considerations and human values in learning analytics systems. These two cases demonstrate that Value Sensitive Design could be a viable approach for balancing a wide range of human values, which tend to encompass and surpass ethical issues, in learning analytics design.
24/7 Instant Feedback on Writing: Integrating AcaWriter into your Teaching (Simon Buckingham Shum)
https://cic.uts.edu.au/events/24-7-instant-feedback-on-writing-integrating-acawriter-into-your-teaching-2-dec/
What difference could instant feedback on draft writing make to your students? Over the last 5 years the Connected Intelligence Centre has been developing and piloting an automated feedback tool for academic writing (AcaWriter), working closely with academics across several faculties. The research portal documents how educators and students engage with this kind of AI, and what we’ve learnt about integrating it into teaching and assessment.
In May, AcaWriter was launched to all students along with an information portal. Now we want to start upskilling academics, tutors and learning technologists, in a monthly session to give you the chance to learn about AcaWriter, and specifically, good practices for integrating it into your subject. CIC can support you, and we hope you may be interested in co-designing publishable research.
AcaWriter handles several different ‘genres’ of writing, including reflective writing (e.g. a Reflective Essay; Reflective Blogs/Journals on internships/work-placements) and analytical writing (e.g. Argumentative Essays; Research Abstracts & Introductions).
This briefing will demo AcaWriter and show how it can be embedded in student activities. We hope this sparks ideas for your own teaching, which we can discuss in more detail.
Graph's not dead: from unsupervised induction of linguistic structures from t... (Alexander Panchenko)
In this invited talk, presented at the Dialogue'2018 conference, I argue for the usefulness of graph representations for NLP in the deep learning era. The lecture describes how to extract symbolic linguistic structures, such as word senses and semantic frames, from text corpora in an unsupervised way using graph-based algorithms and distributional semantics.
Building a Web-Scale Dependency-Parsed Corpus from Common Crawl (Alexander Panchenko)
We present DepCC, the largest-to-date linguistically analyzed corpus of English, comprising 365 million documents with 252 billion tokens, 7.5 billion named-entity occurrences and 14.3 billion sentences from a web-scale crawl of the Common Crawl project. The sentences are processed with a dependency parser and a named entity tagger and carry provenance information, enabling applications ranging from training syntax-based word embeddings to open information extraction and question answering. We built an index of all sentences and their linguistic metadata, enabling quick search across the corpus. We demonstrate the utility of the corpus on the verb similarity task: a distributional model trained on DepCC outperforms state-of-the-art models trained on smaller corpora, such as Wikipedia, on the SimVerb-3500 dataset.
http://www.lrec-conf.org/proceedings/lrec2018/summaries/215.html
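The verb-similarity evaluation mentioned above boils down to correlating model cosine similarities with human judgments via Spearman's rank correlation. A minimal sketch, where the vectors and the SimVerb-style (verb, verb, gold score) pairs are toy stand-ins rather than DepCC data:

```python
# Sketch: evaluate a distributional model on a verb-similarity benchmark
# by Spearman correlation of cosine similarities with human scores.
import numpy as np

vectors = {  # toy word vectors standing in for a trained model
    "buy":      np.array([0.90, 0.10, 0.00]),
    "purchase": np.array([0.85, 0.15, 0.05]),
    "run":      np.array([0.10, 0.90, 0.20]),
    "sprint":   np.array([0.15, 0.85, 0.25]),
    "sleep":    np.array([0.00, 0.20, 0.90]),
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# (verb1, verb2, human similarity score) -- invented, SimVerb-style.
pairs = [("buy", "purchase", 9.2), ("run", "sprint", 8.7), ("buy", "sleep", 1.1)]
model_scores = [cosine(vectors[u], vectors[v]) for u, v, _ in pairs]
gold_scores = [g for _, _, g in pairs]

def spearman(x, y):
    """Spearman rho = Pearson correlation of the rank sequences."""
    rank = lambda s: np.argsort(np.argsort(np.array(s))).astype(float)
    return float(np.corrcoef(rank(x), rank(y))[0, 1])

rho = spearman(model_scores, gold_scores)
```

A higher rho means the model orders verb pairs more like the human annotators do; on these toy values the orders agree perfectly.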
Improving Hypernymy Extraction with Distributional Semantic Classes (Alexander Panchenko)
http://www.lrec-conf.org/proceedings/lrec2018/pdf/234.pdf
In this paper, we show how distributionally induced semantic classes can be helpful for extracting hypernyms. We present methods for inducing sense-aware semantic classes using distributional semantics, and for using these induced classes to filter noisy hypernymy relations. Denoising is performed by labeling each semantic class with its hypernyms: on the one hand, this allows us to filter out wrong extractions using the global structure of distributionally similar senses; on the other hand, we infer missing hypernyms by propagating class labels to cluster terms. A large-scale crowdsourcing study shows that post-processing automatically extracted hypernyms with our approach improves both the precision and the recall of hypernymy extraction. Furthermore, we show the utility of our method for domain taxonomy induction, achieving state-of-the-art results on the SemEval'16 taxonomy induction task.
The paper was presented at the LREC'2018 conference in Miyazaki, Japan.
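The denoising idea, labeling each induced semantic class with its majority hypernym, dropping extractions that disagree with the class label, and propagating the label to class members with no extracted hypernym, can be sketched as follows. The clusters and noisy extractions below are invented for illustration:

```python
# Sketch: denoise hypernymy extractions using semantic classes.
from collections import Counter

clusters = [{"apple", "mango", "durian"}, {"bmw", "audi", "toyota"}]
noisy = {  # term -> extracted hypernyms (some wrong, some terms missing)
    "apple": ["fruit", "company"],
    "mango": ["fruit"],
    "bmw": ["car"],
    "audi": ["car"],
}

def denoise(clusters, noisy):
    clean = {}
    for cluster in clusters:
        # The majority hypernym over the whole class becomes the class label.
        votes = Counter(h for t in cluster for h in noisy.get(t, []))
        if not votes:
            continue
        label, _ = votes.most_common(1)[0]
        # Keep only the class label; propagate it to uncovered members.
        for term in cluster:
            clean[term] = [label]
    return clean

clean = denoise(clusters, noisy)
```

Here "company" is filtered out for "apple" because the class votes for "fruit", and "durian" and "toyota" gain hypernyms they were never extracted with, which is how the approach improves both precision and recall.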
Inducing Interpretable Word Senses for WSD and Enrichment of Lexical Resources (Alexander Panchenko)
In this talk, we discuss the induction of sparse and dense word sense representations using graph-based approaches and distributional models. Induced senses are represented not only by a vector but also by a set of hypernyms, images, and usage examples, all derived in an unsupervised and knowledge-free manner, which ensures that the discovered senses are interpretable by humans. We showcase the use of the induced representations for word sense disambiguation and for the enrichment of lexical resources such as WordNet.
Fighting with Sparsity of the Synonymy Dictionaries for Automatic Synset Indu... (Alexander Panchenko)
Presentation at the AIST'17 conference by Dmitry Ustalov. Authors of the original paper: Dmitry Ustalov, Mikhail Chernoskutov, Chris Biemann, Alexander Panchenko.
Using Linked Disambiguated Distributional Networks for Word Sense Disambiguation (Alexander Panchenko)
We introduce a new method for unsupervised knowledge-based word sense disambiguation (WSD) based on a resource that links two types of sense-aware lexical networks: one induced from a corpus using distributional semantics, the other manually constructed. The combination of the two networks reduces the sparsity of the sense representations used for WSD. We evaluate these enriched representations on two lexical-sample sense disambiguation benchmarks. Our results indicate that (1) features extracted from the corpus-based resource help to significantly outperform a model based solely on the lexical resource, and (2) our method achieves results comparable to or better than those of four state-of-the-art unsupervised knowledge-based WSD systems, including three hybrid systems that also rely on text corpora. In contrast to these hybrid methods, our approach does not require access to web search engines, texts mapped to a sense inventory, or machine translation systems.
See the full paper at: http://www.aclweb.org/anthology/W/W17/W17-1909.pdf
Panchenko A., Faralli S., Ponzetto S. P., and Biemann C. (2017): Using Linked Disambiguated Distributional Networks for Word Sense Disambiguation. In Proceedings of the Workshop on Sense, Concept and Entity Representations and their Applications (SENSE) co-located with the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL'2017). Valencia, Spain. Association for Computational Linguistics
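The underlying disambiguation step, choosing the sense whose combined network neighbourhood best matches the context, can be sketched roughly as follows. The sense inventories here are toy examples, not the paper's linked networks:

```python
# Sketch: knowledge-based WSD by overlap between the context and each
# sense's neighbourhood, built as the union of related terms from a
# corpus-induced network and a manually constructed one (toy data).
senses = {
    "bank#finance": {"money", "account", "loan"} | {"institution", "deposit"},
    "bank#river":   {"water", "shore", "stream"} | {"slope", "land"},
}

def disambiguate(context_words, senses):
    """Return the sense with the largest neighbourhood/context overlap."""
    context = set(context_words)
    return max(senses, key=lambda s: len(senses[s] & context))

sense = disambiguate(["the", "loan", "and", "deposit", "money"], senses)
```

Linking the two networks enlarges each sense's neighbourhood, which is exactly how the combination reduces the sparsity of the sense representations.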
Unsupervised Does Not Mean Uninterpretable: The Case for Word Sense Induction... (Alexander Panchenko)
The current trend in NLP is the use of highly opaque models, e.g. neural networks and word embeddings. While these models yield state-of-the-art results on a range of tasks, their drawback is poor interpretability. Using word sense induction and disambiguation (WSID) as an example, we show that it is possible to develop an interpretable model that matches state-of-the-art models in accuracy. Namely, we present an unsupervised, knowledge-free WSID approach, which is interpretable at three levels: word sense inventory, sense feature representations, and disambiguation procedure. Experiments show that our model performs on par with state-of-the-art word sense embeddings and other unsupervised systems while offering the possibility to justify its decisions in human-readable form.
Noun Sense Induction and Disambiguation using Graph-Based Distributional Sema... (Alexander Panchenko)
We introduce an approach to word sense induction and disambiguation. The method is unsupervised and knowledge-free: sense representations are learned from distributional evidence and subsequently used to disambiguate word instances in context. These sense representations are obtained by clustering dependency-based second-order similarity networks. We then add features for disambiguation from heterogeneous sources such as window-based and sentence-wide co-occurrences, and explore various schemes to combine these context clues. Our method reaches a performance comparable to the state-of-the-art unsupervised word sense disambiguation systems, including top participants of the SemEval 2013 word sense induction task and two more recent state-of-the-art neural word sense induction systems.
Full paper:
https://www.lt.informatik.tu-darmstadt.de/fileadmin/user_upload/Group_LangTech/publications/konvens2016panchenko.pdf
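Clustering such similarity networks into senses is commonly done with label-propagation algorithms such as Chinese Whispers. A minimal sketch on a toy ego network for an ambiguous word (the graph, weights and iteration count are illustrative, not the paper's setup):

```python
# Sketch: Chinese-Whispers-style clustering of a similarity ego network
# for "jaguar" into senses (animal vs. car brand). Toy graph.
import random

edges = {  # node -> {neighbour: edge weight}
    "leopard": {"tiger": 1.0, "lion": 1.0},
    "tiger":   {"leopard": 1.0, "lion": 1.0},
    "lion":    {"leopard": 1.0, "tiger": 1.0},
    "bmw":     {"audi": 1.0, "porsche": 1.0},
    "audi":    {"bmw": 1.0, "porsche": 1.0},
    "porsche": {"bmw": 1.0, "audi": 1.0},
}

def chinese_whispers(edges, iterations=20, seed=0):
    rng = random.Random(seed)
    labels = {node: node for node in edges}  # each node starts in its own class
    nodes = list(edges)
    for _ in range(iterations):
        rng.shuffle(nodes)
        for node in nodes:
            # Adopt the neighbour label with the highest total edge weight.
            score = {}
            for nb, w in edges[node].items():
                score[labels[nb]] = score.get(labels[nb], 0.0) + w
            labels[node] = max(score, key=score.get)
    return labels

labels = chinese_whispers(edges)
senses = {frozenset(n for n in labels if labels[n] == l) for l in labels.values()}
```

Each resulting cluster of similar words is then treated as one induced sense of the target word.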
Ayush Kumar, Sarah Kohail, Amit Kumar, Asif Ekbal, Chris Biemann (IIT Patna, India; TU Darmstadt, Germany). Presented by: Alexander Panchenko, TU Darmstadt, Germany.
A sentiment index measures the average emotional level in a corpus. We introduce four such indexes and use them to gauge the average "positiveness" of a population over a given period, based on posts in a social network. This article presents, for the first time, a text-based rather than word-based sentiment index. Furthermore, it presents the first large-scale study of the sentiment index of the Russian-speaking Facebook. Our results are consistent with prior experiments for English.
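The core computation, averaging per-post polarity over all posts in a period, can be sketched as follows. The polarity lexicon and the posts are toy stand-ins for the article's indexes:

```python
# Sketch: a text-based sentiment index as the mean polarity of posts
# in a sample period (lexicon and posts are illustrative).
POLARITY = {"great": 1, "happy": 1, "love": 1, "sad": -1, "awful": -1}

def post_polarity(text):
    """Mean polarity of the lexicon words found in one post (0 if none)."""
    scores = [POLARITY[w] for w in text.lower().split() if w in POLARITY]
    return sum(scores) / len(scores) if scores else 0.0

def sentiment_index(posts):
    """Average emotional level over all posts in the sample."""
    return sum(post_polarity(p) for p in posts) / len(posts)

index = sentiment_index(["what a great day", "feeling sad", "love this happy song"])
```

Scoring whole posts before averaging is what makes this a text-based rather than word-based index: negationless posts with no lexicon hits contribute a neutral 0 instead of being skipped.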
Semantic relations, such as synonymy, hypernymy and co-hyponymy, have proved useful for text processing applications, including text similarity, query expansion, question answering and word sense disambiguation. Such relations are valuable because of the gap between the lexical surface of a text and its meaning: the same concept is often represented by different terms. However, existing resources often do not cover the vocabulary required by a given system, and manual resource construction is prohibitively expensive for many projects.
On the other hand, the precision of existing extractors still does not match the quality of handcrafted resources. All these factors motivate the development of novel extraction methods. In this work we develop several similarity measures for semantic relation extraction. The main research question we address is how to improve the precision and coverage of such measures. First, we perform a large-scale study of the baseline techniques. Second, we propose four novel measures: one significantly outperforms the baselines, and the others perform comparably to state-of-the-art techniques. Finally, we successfully apply one of the novel measures in two text processing systems.
Detecting Gender by Full Name: Experiments with the Russian Language (Alexander Panchenko)
This paper describes a method that detects the gender of a person from his/her full name. While several approaches have been proposed for English, little has been done so far for Russian. We fill this gap and present a large-scale experiment on a dataset of 100,000 Russian full names from Facebook. Our method is based on three types of features (word endings, character n-grams and a dictionary of names) combined within a linear supervised model. Experiments show that this simple and computationally efficient approach yields excellent results, achieving accuracy of up to 96%.
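A rough sketch of the character n-gram feature idea in a linear classifier, assuming scikit-learn. The training names are a toy sample and the pipeline is not the paper's exact model (which also uses word endings and a name dictionary):

```python
# Sketch: gender detection from full names via character n-grams
# in a linear model (toy transliterated training data).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

names = ["Ivan Petrov", "Sergey Ivanov", "Dmitry Sidorov", "Nikolai Smirnov",
         "Anna Petrova", "Olga Ivanova", "Elena Sidorova", "Maria Smirnova"]
labels = ["m", "m", "m", "m", "f", "f", "f", "f"]

# Character 2-3-grams within word boundaries capture informative
# endings such as "-ov" (male) vs. "-ova" (female).
clf = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(2, 3)),
    LogisticRegression(max_iter=1000),
).fit(names, labels)

pred = clf.predict(["Svetlana Orlova", "Alexei Orlov"])
```

Russian surname morphology makes the ending n-grams highly discriminative, which is why such a simple linear model reaches high accuracy in the paper's experiments.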
Computational lexical semantics: semantic similarity measures and their appli... (Alexander Panchenko)
Computational lexical semantics: semantic similarity measures and their applications
A lecture series at HSE University, Faculty of Business Informatics and Applied Mathematics (Nizhny Novgorod)
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a... (Ana Luísa Pinho)
Functional Magnetic Resonance Imaging (fMRI) provides a means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt to individual characteristics. In this talk, I will give an account of deep behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects of interest related to mental processes. Key to this approach is the use of fast multi-functional paradigms rich in features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying higher-order cognitive mechanisms, due to their ecological nature and their capacity to enable complex behavior composed of discrete entities. I will also discuss how deep behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization.
To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
Toxic effects of heavy metals: Lead and Arsenic (sanjana502982)
Heavy metals are naturally occurring metallic elements that have a relatively high density and are toxic even at low concentrations. All such toxic metals are termed heavy metals irrespective of their atomic mass and density, e.g. arsenic, lead, mercury, cadmium, thallium and chromium.
Nutraceutical market, scope and growth: Herbal drug technology (Lokesh Patil)
As consumer awareness of health and wellness rises, the nutraceutical market, which includes products such as functional foods, beverages and dietary supplements that provide health benefits beyond basic nutrition, is growing significantly. Rising healthcare costs, an ageing population, and growing demand for natural and preventative health solutions are driving this rapid expansion. Innovations in product formulation and the use of cutting-edge technology for personalized nutrition further fuel market growth. With its worldwide reach, the nutraceutical industry is expected to keep growing, offering significant opportunities for research and investment across categories including vitamins, minerals, probiotics, and herbal supplements.
Seminar on U.V. Spectroscopy (Samir Panda)
Spectroscopy is the branch of science dealing with the study of the interaction of electromagnetic radiation with matter.
Ultraviolet-visible (UV-Vis) spectroscopy refers to absorption spectroscopy or reflectance spectroscopy in the UV-Vis spectral region.
It is an analytical method that measures the amount of light absorbed by the analyte.
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...University of Maribor
Slides from talk:
Aleš Zamuda: Remote Sensing and Computational, Evolutionary, Supercomputing, and Intelligent Systems.
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Inter-Society Networking Panel GRSS/MTT-S/CIS Panel Session: Promoting Connection and Cooperation
https://www.etran.rs/2024/en/home-english/
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...University of Maribor
Slides from:
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Track: Artificial Intelligence
https://www.etran.rs/2024/en/home-english/
IIT-UHH at SemEval-2017 Task 3: Exploring Multiple Features for Community Question Answering and Implicit Dialogue Identification
1. 1/34
IIT-UHH at SemEval-2017 Task 3: Exploring Multiple
Features for Community Question Answering and
Implicit Dialogue Identification
Titas Nandi1, Chris Biemann2, Seid Muhie Yimam2, Deepak Gupta1,
Sarah Kohail2, Asif Ekbal1 and Pushpak Bhattacharyya1
1Indian Institute of Technology Patna, India
2Universität Hamburg, Germany
{titas.ee13,deepak.pcs16,asif,pb}@iitp.ac.in
{biemann,yimam,kohail}@informatik.uni-hamburg.de
Presented by Alexander Panchenko2
August 3, 2017
2. 2/34
Outline
1 Task Description
Structure of the Task
Related Work
2 System Description
Basic Features
Implicit Dialogue Identification
Statistical Model
3 Results
Results on Different Feature Sets
Comparison with Other Teams at SemEval 2017
4 Conclusions
4. 4/34
SemEval 2017 Task 3: the Three Sub-Tasks
6. 6/34
Related Work
Useful ideas from the best systems of the 2015 and 2016 editions of the task:
Belinkov (2015): word vectors and meta-data features
Nicosia (2015): features derived from a comment in the context of the entire thread
Filice (2016): stacking classifiers across subtasks
7. 7/34
Outline of the Method
9. 9/34
String Similarity Features
String similarity measures computed for question-comment and question-question pairs:
Jaro-Winkler
Levenshtein
Jaccard
Sorensen-Dice
n-gram
LCS (longest common subsequence)
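A few of the set-based measures and LCS can be sketched in plain Python (a minimal illustration; function names and whitespace tokenization are assumptions, not the system's actual implementation):

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over whitespace-token sets."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def sorensen_dice(a: str, b: str) -> float:
    """Sorensen-Dice coefficient over whitespace-token sets."""
    sa, sb = set(a.split()), set(b.split())
    denom = len(sa) + len(sb)
    return 2 * len(sa & sb) / denom if denom else 0.0

def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of characters."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b):
            cur.append(prev[j] + 1 if ca == cb else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]
```

Each score can be used directly as one numeric feature per question-comment pair.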
10. 10/34
Domain (Task) Specific Features
Whether a comment by the asker of the question is an acknowledgement
Position of the comment in the thread
Coverage of the question by the comment and of the comment by the question (ratio of shared tokens)
Presence of URLs, emails or HTML tags
11. 11/34
Word Embedding Features
Trained a word embedding model using Word2Vec on unannotated in-domain data
Sentence vectors: computed by averaging word vectors
Vector difference feature: w_score = w_question − w_comment
Distance scores based on the computed sentence vectors:
Cosine distance (1 − cos)
Manhattan distance
Euclidean distance
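The sentence-vector averaging and the three distance scores can be sketched as follows (an illustrative pure-Python version; the toy dictionary-based embedding format and function names are assumptions):

```python
import math

def sentence_vector(words, emb):
    """Average the embedding vectors of the words found in the model."""
    vecs = [emb[w] for w in words if w in emb]
    if not vecs:
        return [0.0] * len(next(iter(emb.values())))
    return [sum(component) / len(vecs) for component in zip(*vecs)]

def cosine_distance(u, v):
    """1 - cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0

def manhattan_distance(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

In the actual system the vectors come from a Word2Vec model; here any word-to-vector mapping works.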
12. 12/34
Topic Modeling Features
Trained an LDA topic model using the Mallet tool on the training data
Extracted the 20 most relevant topics for the data
Topic vector of a question/comment:
w_score = w_question − w_comment
Topic vocabulary of a question/comment:
Vocabulary(T) = ∪_{i=1}^{10} topic_words(t_i)
where t_i is one of the top topics for the comment/question T.
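The topic-vocabulary union can be sketched in a few lines (illustrative names; the topic ids and word lists would come from the trained LDA model):

```python
def topic_vocabulary(top_topics, topic_words, k=10):
    """Union of the word lists of a text's top-k topics.

    top_topics: topic ids ordered by relevance for the text.
    topic_words: mapping from topic id to that topic's word list.
    """
    vocab = set()
    for t in top_topics[:k]:
        vocab |= set(topic_words[t])
    return vocab
```

The overlap between the question's and the comment's topic vocabularies can then serve as a feature.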
13. 13/34
Keyword and Named Entity Features
Extracted keywords (focus words) from the question and comment using the RAKE algorithm (Rose et al., 2010)
Computed keyword match between question and comment
Extracted named entities from the question and comment
Entity tags: LOCATION, PERSON, ORGANIZATION, DATE, MONEY, PERCENT and TIME
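The slide does not specify the exact matching metric; a plausible keyword-overlap feature, taking the keyword lists produced by RAKE as input, might look like this (a sketch under that assumption):

```python
def keyword_match(question_keywords, comment_keywords):
    """Fraction of question keywords that also appear among comment keywords."""
    q, c = set(question_keywords), set(comment_keywords)
    return len(q & c) / len(q) if q else 0.0
```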
15. 15/34
Implicit Dialogue Identification
Identified implicit dialogues among users
User interaction graph:
Each user is assumed to be in dialogue with some other user who commented earlier in the thread
Dialogue with the asker: desirable; dialogue with other users: not desirable
Vertices: users in a comment thread
Edges: directed edges showing interaction
Edge weight: the level of interaction
16. 16/34
Implicit Dialogue Identification: an Example
(figure series, slides 16-22: an example user interaction graph)
23. 23/34
Computing Edge Weights
The edge weight is computed (or revised) on the basis of:
Explicit dialogue score: if one user refers to the other explicitly, add 1.0 to the edge score.
Embedding score: for each word in a comment, find the word in the other comment with the maximum cosine similarity to it, then average those maximum cosine scores.
Topic score: the cosine similarity of the topic vectors of the two comments.
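The embedding score (average best-match cosine similarity between the two comments) can be sketched as follows (illustrative names; `emb` is any word-to-vector mapping, and words missing from the embeddings are skipped):

```python
import math

def cos_sim(u, v):
    """Cosine similarity of two vectors (0.0 for zero-norm inputs)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def embedding_score(words_a, words_b, emb):
    """Average, over the words of one comment, of the best cosine
    similarity to any word of the other comment."""
    wa = [w for w in words_a if w in emb]
    wb = [w for w in words_b if w in emb]
    if not wa or not wb:
        return 0.0
    best = [max(cos_sim(emb[a], emb[b]) for b in wb) for a in wa]
    return sum(best) / len(best)
```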
26. 26/34
Classification Model
Normalized all feature values with z-scores
Feature selection using wrapper methods to maximize accuracy on the development set
Used SVM confidence probabilities for ranking (RBF kernel)
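The z-score normalization step can be sketched for a single feature column (a minimal stdlib version; the system would apply this per feature across the training set):

```python
from statistics import mean, pstdev

def z_normalize(values):
    """Z-score normalization of one feature column: (v - mean) / std.
    Returns zeros when the column is constant (std = 0)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s else 0.0 for v in values]
```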
27. 27/34
Subtask C: Similarity of Questions and External Comments
Oversampled the data using the SMOTE technique (Chawla et al., 2002) and ran the classifier on the original question-external-comment pairs
Stacking across tasks: the SVM scores of all three subtasks are combined:
Score_C = log(SVM_Score) + log(Score_A) + log(Score_B)
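The stacking combination is a sum of log scores, i.e. the log of the product of the three subtask scores (sketch with an illustrative name; inputs are assumed to be strictly positive, e.g. SVM confidence probabilities):

```python
import math

def stacked_score_c(svm_score_c, score_a, score_b):
    """Combine the subtask-C SVM score with the subtask-A and -B scores
    in log space; equivalent to log(svm_score_c * score_a * score_b)."""
    return math.log(svm_score_c) + math.log(score_a) + math.log(score_b)
```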
29. 29/34
Feature Ablation Results: Impact of Different Feature Sets

Subtask A, Development Set 2017:
Features          MAP    P      R      F1     Acc
All Features      65.50  58.43  62.71  60.50  72.54
All − string      65.53  57.84  62.71  60.18  72.17
All − embedding   62.11  53.03  53.42  53.23  68.52
All − domain      61.85  54.46  54.52  54.49  69.47
All − topic       65.15  59.02  61.98  60.47  72.83
All − keyword     65.73  57.98  62.59  60.20  72.25
IR Baseline       53.84  -      -      -      -

Subtask A, Test Set 2017:
Runs              MAP    P      R      F1     Acc
Primary           86.88  73.37  74.52  73.94  72.70
Contrastive 1     86.35  79.42  51.94  62.80  68.02
Contrastive 2     85.24  81.22  57.65  67.43  71.06
31. 31/34
Comparison of Results on Subtask A at SemEval 2017
32. 32/34
Comparison of Results on Subtask C at SemEval 2017
33. 33/34
Observations and Conclusions
Embeddings trained on small in-domain texts work better than large out-of-domain pre-trained GoogleNews embeddings
The most instrumental features are based on:
User dialogues
Word embeddings
34. 34/34
Thank you!
Any questions from the community?