1. HSE-School of linguistics at Russian Paraphrase Detection Shared Task
Anastasia Romanova, Mikhail Nefedov
Saint Petersburg, 2016
Anastasia Romanova, Mikhail Nefedov HSE-School of linguistics at Russian Paraphrase Detection S
2. Overview
1 Introduction
2 Task
3 Standard Features
4 Word Embedding Features
5 Results
6 Next steps
Anastasia Romanova, Mikhail Nefedov HSE-School of linguistics at Russian Paraphrase Detection S
3. Introduction
Higher School of Economics, School of Linguistics
Anastasia Romanova, Mikhail Nefedov HSE-School of linguistics at Russian Paraphrase Detection S
4. Task
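Compare two sentences: are they paraphrases?
Two types of classification: three-class (precise paraphrase, near paraphrase, non-paraphrase) and binary (paraphrase vs. non-paraphrase)
Standard and non-standard runs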
Anastasia Romanova, Mikhail Nefedov HSE-School of linguistics at Russian Paraphrase Detection S
6. Standard Features
SyntaxNet
Released by Google in May 2016
Models for 40 languages
Dependency parse tree
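SyntaxNet's demo pipeline prints its parses in a tab-separated CoNLL layout, where one column holds each token's head index and 0 marks the root. A minimal sketch, assuming the CoNLL-X column order (ID in column 0, FORM in column 1, LEMMA in column 2, HEAD in column 6), of turning one parsed sentence into a nested (label, children) tuple that a tree edit distance can work on. The function name `conll_to_tree` and the tuple representation are our own, and the sample rows are hand-written, not real SyntaxNet output.

```python
def conll_to_tree(conll_text, label_col=1):
    """Build a (label, children) tree from one CoNLL-formatted sentence.

    Assumes tab-separated CoNLL-X columns: column 0 is the token ID and
    column 6 is HEAD (0 = root). label_col picks the node label
    (1 = word form, 2 = lemma).
    """
    labels, heads = {}, {}
    for line in conll_text.strip().splitlines():
        cols = line.split("\t")
        token_id = int(cols[0])
        labels[token_id] = cols[label_col]
        heads[token_id] = int(cols[6])

    # Collect the dependents of each head, keeping sentence order.
    children = {token_id: [] for token_id in labels}
    children[0] = []
    for token_id in sorted(labels):
        children[heads[token_id]].append(token_id)

    if len(children[0]) != 1:
        raise ValueError("expected exactly one root token")

    def build(token_id):
        return (labels[token_id], tuple(build(c) for c in children[token_id]))

    return build(children[0][0])


# Hand-written parse of "The boy smiles" in CoNLL-X layout.
sample = "\n".join([
    "1\tThe\t_\tDET\tDT\t_\t2\tdet\t_\t_",
    "2\tboy\t_\tNOUN\tNN\t_\t3\tnsubj\t_\t_",
    "3\tsmiles\t_\tVERB\tVBZ\t_\t0\tROOT\t_\t_",
])
print(conll_to_tree(sample))
# ('smiles', (('boy', (('The', ()),)),))
```

Passing `label_col=2` would label nodes with lemmas instead, which suits Russian, where inflection would otherwise make identical words look different.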
Anastasia Romanova, Mikhail Nefedov HSE-School of linguistics at Russian Paraphrase Detection S
7. Standard Features
Tree Edit Distance (Zhang and Shasha, 1989)
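Zhang and Shasha (1989) compute the ordered tree edit distance with an efficient dynamic program over keyroots. The sketch below does not reproduce that algorithm. It is the plain memoized recursion over forests that defines the same unit-cost distance (deleting, inserting, or relabelling a node each costs 1), which is easy to check on sentence-sized trees but slower on large ones. Trees are (label, children) tuples built with a hypothetical `node` helper, and the two example trees are hand-made, not parser output.

```python
from functools import lru_cache


def node(label, *children):
    """Hypothetical helper: a tree is a (label, tuple_of_children) pair."""
    return (label, tuple(children))


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests (tuples of trees)."""
    if not f and not g:
        return 0
    if not f:
        # Only insertions remain: insert the root of g's rightmost tree.
        _, kids = g[-1]
        return forest_distance(f, g[:-1] + kids) + 1
    if not g:
        # Only deletions remain: delete the root of f's rightmost tree.
        _, kids = f[-1]
        return forest_distance(f[:-1] + kids, g) + 1
    (label_v, kids_v), (label_w, kids_w) = f[-1], g[-1]
    return min(
        # Delete root v; its children take its place in the forest.
        forest_distance(f[:-1] + kids_v, g) + 1,
        # Insert root w.
        forest_distance(f, g[:-1] + kids_w) + 1,
        # Map v to w (relabel if labels differ), then align the remainders.
        forest_distance(kids_v, kids_w)
        + forest_distance(f[:-1], g[:-1])
        + (0 if label_v == label_w else 1),
    )


def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))


t1 = node("smiles", node("boy", node("The")))
t2 = node("laughs", node("girl", node("The")))
print(tree_edit_distance(t1, t2))  # 2: relabel smiles->laughs and boy->girl
```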
Anastasia Romanova, Mikhail Nefedov HSE-School of linguistics at Russian Paraphrase Detection S
9. Word Embedding Features
Words as vectors
Anastasia Romanova, Mikhail Nefedov HSE-School of linguistics at Russian Paraphrase Detection S
10. Word Embedding Features
Drawbacks of the averaging approach (Kenter and de Rijke, 2015)
[Figure: vectors for individual words vs. mean vectors]
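A minimal sketch of the averaging approach: a sentence is represented by the mean of its word vectors, and two sentences are compared by the cosine of their mean vectors. The three-dimensional vectors are invented for illustration. The example shows one well-known weakness of averaging (not necessarily the one this slide illustrates): sentences made of the same words in a different order get identical mean vectors, so their cosine is 1 even when the meaning differs.

```python
import math


def mean_vector(words, vectors):
    """Average the vectors of the given words, dimension by dimension."""
    columns = zip(*(vectors[w] for w in words))
    return [sum(col) / len(words) for col in columns]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy 3-dimensional vectors, invented for illustration.
vectors = {
    "dog": [0.9, 0.1, 0.0],
    "bites": [0.1, 0.8, 0.3],
    "man": [0.2, 0.1, 0.9],
}
a = mean_vector(["dog", "bites", "man"], vectors)
b = mean_vector(["man", "bites", "dog"], vectors)
print(cosine(a, b))  # ≈ 1.0: the two sentences look identical once averaged
```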
Anastasia Romanova, Mikhail Nefedov HSE-School of linguistics at Russian Paraphrase Detection S
11. Word Embedding Features
Before preprocessing
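Клинтон выступила с первой речью после поражения на выборах
(Clinton gave her first speech after her election defeat)
After preprocessing (lemmas tagged with part of speech; prepositions removed)
клинтон_S выступать_V первый_A речь_S поражение_S выбор_S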
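A minimal sketch of the last preprocessing step, assuming a morphological analyzer (such as Mystem) has already produced a lemma and a part-of-speech tag for every word. The analysis below is typed in by hand, the tag names follow Mystem's conventions as we understand them (S noun, V verb, A adjective, PR preposition), and the set of dropped tags is our assumption; the slide only shows that the prepositions disappear.

```python
# Hand-typed (lemma, POS tag) analysis of the example sentence.
analysis = [
    ("Клинтон", "S"), ("выступать", "V"), ("с", "PR"), ("первый", "A"),
    ("речь", "S"), ("после", "PR"), ("поражение", "S"), ("на", "PR"),
    ("выбор", "S"),
]

# Tags treated as function words and dropped (an assumption, not the authors' list).
STOP_TAGS = {"PR", "CONJ", "PART", "INTJ"}


def to_tokens(pairs, stop_tags=STOP_TAGS):
    """Lowercase each lemma, append its tag as lemma_TAG, skip function words."""
    return [f"{lemma.lower()}_{tag}" for lemma, tag in pairs if tag not in stop_tags]


print(" ".join(to_tokens(analysis)))
# клинтон_S выступать_V первый_A речь_S поражение_S выбор_S
```

Attaching the tag to the lemma keeps homographs with different parts of speech apart, so the embedding model must have been trained on tokens in the same lemma_TAG form.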
Anastasia Romanova, Mikhail Nefedov HSE-School of linguistics at Russian Paraphrase Detection S
12. Word Embedding Features
BM25 + Word2Vec
sl - the longer sentence of the pair
ss - the shorter sentence of the pair
avgsl - average sentence length
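The formula itself did not survive extraction. As we understand the BM25-style score of Kenter and de Rijke (2015), each word w of the longer sentence sl is matched to its most similar word in the shorter sentence ss, and that best cosine, sem(w, ss), takes the place of term frequency in the BM25 weight: score(sl, ss) = Σ over w in sl of IDF(w) · sem(w, ss) · (k1 + 1) / (sem(w, ss) + k1 · (1 − b + b · |ss| / avgsl)). A minimal sketch under that reading; k1 and b are the usual BM25 defaults rather than tuned values, clipping negative cosines to 0 is our own safeguard, and the vectors and IDF values are invented.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sem(word, sentence, vectors):
    """Best cosine match of `word` among the words of `sentence`."""
    return max(cosine(vectors[word], vectors[other]) for other in sentence)


def bm25_w2v(s1, s2, vectors, idf, avgsl, k1=1.2, b=0.75):
    """BM25-style score with embedding matches in place of term frequencies.

    k1 and b are common BM25 defaults, not tuned values. Negative cosines are
    clipped to 0 (our assumption) so the denominator stays positive.
    """
    sl, ss = (s1, s2) if len(s1) >= len(s2) else (s2, s1)  # longer, shorter
    length_norm = 1 - b + b * len(ss) / avgsl
    score = 0.0
    for w in sl:
        match = max(0.0, sem(w, ss, vectors))
        score += idf.get(w, 0.0) * match * (k1 + 1) / (match + k1 * length_norm)
    return score


# Toy vectors and IDF values, invented for illustration.
vectors = {
    "boy": [0.9, 0.1, 0.0],
    "girl": [0.8, 0.3, 0.1],
    "smiles": [0.1, 0.9, 0.2],
    "laughs": [0.2, 0.8, 0.4],
}
idf = {"boy": 2.0, "girl": 2.0, "smiles": 1.5, "laughs": 1.5}

print(bm25_w2v(["boy", "smiles"], ["boy", "smiles"], vectors, idf, avgsl=2))  # ≈ 3.5
print(bm25_w2v(["boy", "smiles"], ["girl", "laughs"], vectors, idf, avgsl=2))
```

When both sentences have average length and every word finds an exact match, each term reduces to IDF(w), so the first call returns the sum of the IDF values; weaker matches pull each term below its IDF.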
Anastasia Romanova, Mikhail Nefedov HSE-School of linguistics at Russian Paraphrase Detection S
13. Word Embedding Features
All to all similarities
The boy smiles - The girl laughs
Similarity matrix
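A minimal sketch of the all-to-all comparison: every word of one sentence is compared with every word of the other, giving a matrix with one row per word of the first sentence and one column per word of the second. The toy vectors are invented, and keeping the stop word "the" simply mirrors the slide's example; whether it was kept in the actual runs is not stated.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(s1, s2, vectors):
    """Rows follow the words of s1, columns the words of s2."""
    return [[cosine(vectors[a], vectors[b]) for b in s2] for a in s1]


# Toy vectors, invented for illustration.
vectors = {
    "the": [0.1, 0.1, 0.9],
    "boy": [0.9, 0.1, 0.0],
    "girl": [0.8, 0.3, 0.1],
    "smiles": [0.1, 0.9, 0.2],
    "laughs": [0.2, 0.8, 0.4],
}
matrix = similarity_matrix(["the", "boy", "smiles"], ["the", "girl", "laughs"], vectors)
for row in matrix:
    print(" ".join(f"{x:.2f}" for x in row))
```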
Anastasia Romanova, Mikhail Nefedov HSE-School of linguistics at Russian Paraphrase Detection S
14. Word Embedding Features
All to all similarities
The boy smiles - The girl laughs
Bins for all values
Bins for maximum values
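A minimal sketch of turning a similarity matrix of any size into fixed-length features by binning, once over all cell values and once over the row maxima (the best match in the second sentence for each word of the first). The bin edges, the half-open bin convention, and taking maxima over rows rather than columns are assumptions for illustration, not the intervals or choices used in the submitted runs.

```python
from bisect import bisect_right


def histogram(values, edges):
    """Count values per bin [edges[i], edges[i+1]).

    Values below edges[0] go into the first bin, values at or above
    edges[-1] into the last one.
    """
    counts = [0] * (len(edges) - 1)
    for x in values:
        i = bisect_right(edges, x) - 1
        counts[min(max(i, 0), len(counts) - 1)] += 1
    return counts


def bin_features(matrix, edges):
    """Histogram of all similarities followed by a histogram of row maxima."""
    all_values = [x for row in matrix for x in row]
    row_maxima = [max(row) for row in matrix]  # best match for each word of s1
    return histogram(all_values, edges) + histogram(row_maxima, edges)


# A small made-up similarity matrix and hypothetical bin edges.
matrix = [[1.0, 0.3], [0.5, 0.1]]
edges = [0.0, 0.25, 0.5, 0.75, 1.0]
print(bin_features(matrix, edges))  # [1, 1, 1, 1, 0, 0, 1, 1]
```

Because the output length depends only on the number of bins, sentence pairs of different lengths yield feature vectors that a standard classifier can consume directly.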
Anastasia Romanova, Mikhail Nefedov HSE-School of linguistics at Russian Paraphrase Detection S
15. Word Embedding Features
Per-dimension similarities
Cosine similarity
Similarity bins
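The slide only names the ingredients, so the following is one plausible reading and entirely our assumption: compute the mean vector of each sentence, use the cosine of the two mean vectors as one feature, and bin the absolute difference between the mean vectors in each dimension. The function name `per_dimension_features`, the bin edges, and the toy vectors are hypothetical.

```python
import math
from bisect import bisect_right


def mean_vector(words, vectors):
    """Average the vectors of the given words, dimension by dimension."""
    columns = zip(*(vectors[w] for w in words))
    return [sum(col) / len(words) for col in columns]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def per_dimension_features(s1, s2, vectors, edges):
    """Cosine of the mean vectors, then counts of per-dimension |differences| per bin."""
    m1, m2 = mean_vector(s1, vectors), mean_vector(s2, vectors)
    counts = [0] * (len(edges) - 1)
    for a, b in zip(m1, m2):
        i = bisect_right(edges, abs(a - b)) - 1
        counts[min(max(i, 0), len(counts) - 1)] += 1  # clamp into the outer bins
    return [cosine(m1, m2)] + counts


# Toy vectors and bin edges, invented for illustration.
vectors = {
    "boy": [0.9, 0.1, 0.0],
    "girl": [0.8, 0.3, 0.1],
    "smiles": [0.1, 0.9, 0.2],
    "laughs": [0.2, 0.8, 0.4],
}
edges = [0.0, 0.1, 0.2, 0.5, 1.0]
print(per_dimension_features(["boy", "smiles"], ["girl", "laughs"], vectors, edges))
```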
Anastasia Romanova, Mikhail Nefedov HSE-School of linguistics at Russian Paraphrase Detection S
17. Next steps
Find optimal intervals for bins
Train a new Word2Vec model
Test AdaGram
Compute idf on a larger corpus
Incorporate dependency weighting into BM25
Anastasia Romanova, Mikhail Nefedov HSE-School of linguistics at Russian Paraphrase Detection S