Presentation at the AIST'17 conference by Dmitry Ustalov. Authors of the original paper: Dmitry Ustalov, Mikhail Chernoskutov, Chris Biemann, Alexander Panchenko.
Fighting with Sparsity of the Synonymy Dictionaries for Automatic Synset Induction
1. Fighting with Sparsity of the Synonymy Dictionaries for Automatic Synset Induction
Dmitry Ustalov, Mikhail Chernoskutov
Ural Federal University
Chris Biemann, Alexander Panchenko
Universität Hamburg
2. Outline
• Introduction
• The Problem
• The Approaches
• Evaluation
• Discussion
• Conclusion
3. Introduction
• Synset induction is the unsupervised task of discovering synsets in a synonymy graph.
• Notable methods:
  • MaxMax (Hope & Keller, 2013),
  • ECO (Gonçalo-Oliveira & Gomes, 2014),
  • WATSET (Ustalov et al., 2017) ← SOTA.
• See the survey in our paper.
4. The Problem
• A synonymy graph contains densely connected subgraphs.
• These subgraphs correspond to the synsets.
• The synonymy dictionaries are not perfect.
• Sometimes they have missing edges.
6. The Approaches
• We propose two approaches for reducing graph sparseness by adding potentially pertinent edges:
  • Synonymy Relation Transitivity (A1)
  • Similar Synset Merging (A2)
• We also evaluate them on two lexical semantic resources for Russian: RuWordNet and YARN.
7. A1: Synonymy Transitivity
• Synonymy is an equivalence relation: reflexivity, symmetry, transitivity.
• We assume that if an edge is missing, the graph still contains several relatively short paths between the synonymous words.
• This approach is designed to be executed before the synset induction.
8. A1: Synonymy Transitivity
• For each vertex, extract its 2nd-order ego network.
  • Compute the set of candidate edges by connecting the disconnected nodes.
  • Compute the number of paths between the nodes in candidate edges.
  • Add an edge iff there exist at least k paths of lengths in [i; j].
• Then, the augmented graph is passed to synset induction.
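The steps above can be sketched in plain Python. The graph representation (an adjacency dict of sets) and the default thresholds k, i, j are illustrative assumptions, not the paper's actual implementation:

```python
from itertools import combinations

def ego_2nd_order(graph, vertex):
    """Nodes reachable from `vertex` in at most two hops."""
    first = graph[vertex]
    second = set().union(*(graph[n] for n in first)) if first else set()
    return {vertex} | first | second

def path_lengths(graph, nodes, u, v, max_len):
    """Lengths (in edges) of simple paths from u to v that stay
    inside `nodes` and use at most `max_len` edges."""
    lengths, stack = [], [(u, {u})]
    while stack:
        node, visited = stack.pop()
        for nxt in graph[node] & nodes:
            if nxt == v:
                lengths.append(len(visited))
            elif nxt not in visited and len(visited) < max_len:
                stack.append((nxt, visited | {nxt}))
    return lengths

def add_transitive_edges(graph, k=2, i=2, j=3):
    """Add an edge between disconnected nodes of each 2nd-order ego
    network iff at least k paths of length in [i, j] connect them."""
    augmented = {n: set(adj) for n, adj in graph.items()}
    for vertex in graph:
        nodes = ego_2nd_order(graph, vertex)
        for u, v in combinations(sorted(nodes), 2):
            if v in augmented[u]:
                continue  # already connected (or added earlier)
            lengths = path_lengths(graph, nodes, u, v, max_len=j)
            if sum(i <= length <= j for length in lengths) >= k:
                augmented[u].add(v)
                augmented[v].add(u)
    return augmented
```

On a 4-cycle a–b–c–d, for instance, the two missing diagonals each have two paths of length 2, so with k=2 both get added.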
9. A2: Synset Merging
• A similarity measure can be computed between two vectors.
  • Think of synset embeddings.
• We assume that if two synsets are really similar, then they can be merged.
• This approach is designed to be executed after the synset induction.
10. A2: Synset Merging
• Obtain synset embeddings using SenseGram (Pelevina et al., 2016).
  • Just average the word vectors in synsets.
• Identify the closely related synsets using the m-kNN algorithm (Panchenko et al., 2012).
• Merge the t closely related synsets.
  • The smallest are merged first.
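A minimal sketch of these steps, under stated assumptions: the word vectors are toy inputs, and the mutual-nearest-neighbour test below is a simplified stand-in for the full m-kNN algorithm of Panchenko et al. (2012):

```python
import numpy as np

def synset_embedding(synset, word_vectors):
    """Average the word vectors of a synset (SenseGram-style)."""
    return np.mean([word_vectors[w] for w in synset], axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def mutual_nn_pairs(synsets, word_vectors):
    """Index pairs of synsets that are each other's nearest neighbour."""
    emb = [synset_embedding(s, word_vectors) for s in synsets]
    nn = [max((j for j in range(len(synsets)) if j != i),
              key=lambda j: cosine(emb[i], emb[j]))
          for i in range(len(synsets))]
    return [(i, j) for i, j in enumerate(nn) if j > i and nn[j] == i]

def merge_mutual_synsets(synsets, word_vectors):
    """Merge mutual-nearest-neighbour pairs, smallest synsets first."""
    pairs = sorted(mutual_nn_pairs(synsets, word_vectors),
                   key=lambda p: len(synsets[p[0]]) + len(synsets[p[1]]))
    merged, used = [], set()
    for i, j in pairs:
        merged.append(synsets[i] | synsets[j])
        used |= {i, j}
    merged += [s for idx, s in enumerate(synsets) if idx not in used]
    return merged
```

Given vectors where "car"/"auto" and "automobile" point the same way, the two synsets collapse into one, while unrelated synsets stay apart.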
11. Evaluation
• We use WATSET, a soft clustering algorithm for undirected graphs.
• WATSET shows SOTA results on synset induction.

Ustalov D., Panchenko A., Biemann C. Watset: Automatic Induction of Synsets from a Graph of Synonyms. In: Proc. ACL 2017.
12. Evaluation: Measure & Data
• Measure: paired precision and recall.
• Gold standard: RuWordNet and YARN.
• The input graph: Wiktionary + Abramov + UNLDC.
• Word vectors are from RDT.
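Paired (pairwise) precision and recall reduce both clusterings to sets of within-synset word pairs and compare those sets. A small sketch of this common formulation (the paper's exact variant may differ):

```python
from itertools import combinations

def synset_pairs(synsets):
    """All unordered word pairs that co-occur in some synset."""
    return {frozenset(p) for s in synsets for p in combinations(sorted(s), 2)}

def paired_prf(gold, induced):
    """Paired precision, recall, and F1 of induced synsets vs. gold."""
    g, p = synset_pairs(gold), synset_pairs(induced)
    precision = len(g & p) / len(p) if p else 0.0
    recall = len(g & p) / len(g) if g else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Splitting a gold synset thus hurts recall (its pairs are lost) while spurious merges hurt precision, which is exactly the trade-off the results below exhibit.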
13. Evaluation: Results
[Charts of results on RuWordNet and YARN, comparing the input graph, synonymy transitivity, and synset merging.]
14. Evaluation: Results
• The transitivity approach showed virtually no improvement.
• The merging approach substantially increased the recall.
• Both methods trade gains in recall for drops in precision.
15. Discussion
• Transitivity. No word is a perfect synonym of another. The communities with the new edges become bigger.
• Merging. Distributional semantic models tend to connect co-hyponyms instead of synonyms.
• Alternatives. Structural heuristics? Hearst patterns? Anaphora resolution? Crowdsourcing?
16. Conclusion
• We fought the sparsity of the synonymy dictionaries using two approaches.
  • Only synset merging won.
• Synset embeddings are easy to obtain. They also show better results on such a challenging task.
  • Just average the word vectors and compute similarity.
17. Thank You!
• Dmitry Ustalov, dmitry.ustalov@gmail.com
• nlpub.ru/Watset
• nlpub.ru/RDT

Join SIGSLAV, an ACL SIG on Slavic languages! sigslav.cs.helsinki.fi

We acknowledge the support of the Deutsche Forschungsgemeinschaft (DFG) under the “JOIN-T” project, the DAAD, the RFBR under projects no. 16-37-00203 мол_а and no. 16-37-00354 мол_а, and the RFH under project no. 16-04-12019. The calculations were carried out using the supercomputer “Uran” at the Krasovskii Institute of Mathematics and Mechanics. We also thank four anonymous reviewers for their helpful comments.