Fighting with Sparsity Ustalov D.A. et al.
Fighting with Sparsity of the Synonymy Dictionaries for Automatic Synset Induction
Dmitry Ustalov, Mikhail Chernoskutov
Ural Federal University
Chris Biemann, Alexander Panchenko
Universität Hamburg
Outline
•Introduction
•The Problem
•The Approaches
•Evaluation
•Discussion
•Conclusion
Introduction
•Synset Induction is an unsupervised
task of discovering synsets in a
synonymy graph.
•Notable Methods:
• MaxMax (Hope & Keller, 2013),
• ECO (Gonçalo-Oliveira & Gomes, 2014),
• WATSET (Ustalov et al., 2017) ← SOTA.
•See the survey in our paper.
The Problem
•A synonymy graph contains densely
connected subgraphs.
•These subgraphs correspond to the
synsets.
•The synonymy dictionaries are not
perfect.
•Sometimes they have missing edges.
The Intuition
[Figure: the synonymy graph “As Is” vs. “To Be”]
The Approaches
•We propose two approaches for
reducing graph sparseness by adding
potentially pertinent edges.
• Synonymy Relation Transitivity (A1)
• Similar Synset Merging (A2)
•We also evaluate them on two lexical
semantic resources for Russian:
RuWordNet and YARN.
A1: Synonymy Transitivity
•Synonymy is an equivalence relation:
• reflexivity, symmetry, transitivity.
•We assume that if an edge is missing,
the graph still contains several
relatively short paths between the
synonymous words.
•This approach is designed to be
executed before the synset induction.
A1: Synonymy Transitivity
•For each vertex, extract its 2nd order
ego network.
• Compute the set of candidate edges by
connecting the disconnected nodes.
• Compute the number of paths between
the nodes in candidate edges.
• Add an edge iff there exist at least k
paths whose lengths lie in the range [i; j].
•Then, the augmented graph is passed
to synset induction.
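The A1 steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the graph is a plain dict mapping each word to the set of its synonyms, only simple paths are counted, paths are counted in the whole original graph rather than inside the ego network, and the names count_paths and augment_by_transitivity as well as the default values of k, min_len (i) and max_len (j) are hypothetical.

```python
from itertools import combinations


def count_paths(graph, source, target, min_len, max_len):
    """Count simple paths from source to target with min_len..max_len edges."""
    def walk(node, visited, length):
        if node == target:
            return 1 if length >= min_len else 0
        if length == max_len:
            return 0
        return sum(walk(nxt, visited | {nxt}, length + 1)
                   for nxt in graph[node] if nxt not in visited)
    return walk(source, {source}, 0)


def augment_by_transitivity(graph, k=2, min_len=2, max_len=3):
    """Return a copy of graph with edges added between pairs of
    disconnected nodes that are joined by at least k short paths."""
    candidates = set()
    for vertex, neighbours in graph.items():
        # Second-order ego network: the vertex, its neighbours,
        # and the neighbours of those neighbours.
        ego = {vertex} | set(neighbours)
        for n in neighbours:
            ego |= graph[n]
        # Candidate edges: pairs in the ego network not yet connected.
        for u, v in combinations(sorted(ego), 2):
            if v not in graph[u]:
                candidates.add((u, v))
    augmented = {node: set(adj) for node, adj in graph.items()}
    for u, v in candidates:
        # Assumption: paths are counted in the original graph,
        # not in the partially augmented one.
        if count_paths(graph, u, v, min_len, max_len) >= k:
            augmented[u].add(v)
            augmented[v].add(u)
    return augmented


# Toy graph: "a" and "d" are not linked but share two length-2 paths.
toy = {"a": {"b", "c"}, "b": {"a", "d"}, "c": {"a", "d"}, "d": {"b", "c"}}
augmented = augment_by_transitivity(toy, k=2, min_len=2, max_len=3)
```

On this toy graph the call adds the edges a-d and b-c, since each pair is joined by two paths of length 2; raising k to 3 leaves the graph unchanged.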
A2: Synset Merging
•A similarity measure can be computed
between two vectors.
• Think of synset embeddings.
•We assume that if two synsets are
really similar, then they can be
merged.
•This approach is designed to be
executed after the synset induction.
A2: Synset Merging
•Obtain synset embeddings using
SenseGram (Pelevina et al., 2016).
• Just average the word vectors in synsets.
•Identify the closely related synsets
using the m-kNN algorithm (Panchenko et
al., 2012).
•Merge the t closely related synsets.
• The smallest are merged first.
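The A2 steps above can be sketched as follows. This is a minimal illustration, not the authors' code: it reads m-kNN as mutual k-nearest neighbours (an assumption), uses cosine similarity (the slides do not name the measure), takes plain Python lists as word vectors instead of SenseGram output, and interprets "the smallest are merged first" as ordering candidate pairs by their combined size. All function names and the parameters k and t as used here are hypothetical.

```python
import math


def average(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mutual_knn_pairs(embeddings, k):
    """Pairs (i, j), i < j, where each item is among the other's k nearest."""
    nearest = []
    for i, e in enumerate(embeddings):
        ranked = sorted((j for j in range(len(embeddings)) if j != i),
                        key=lambda j: cosine(e, embeddings[j]), reverse=True)
        nearest.append(set(ranked[:k]))
    return [(i, j) for i in range(len(embeddings)) for j in nearest[i]
            if i < j and i in nearest[j]]


def merge_similar_synsets(synsets, word_vectors, k=1, t=1):
    """Merge up to t mutually nearest synset pairs, smallest pairs first."""
    # Synset embedding: the average of the vectors of its words.
    embeddings = [average([word_vectors[w] for w in s]) for s in synsets]
    pairs = mutual_knn_pairs(embeddings, k)
    pairs.sort(key=lambda p: len(synsets[p[0]]) + len(synsets[p[1]]))
    merged, used = [], set()
    for i, j in pairs[:t]:
        if i in used or j in used:
            continue  # each synset takes part in at most one merge
        merged.append(synsets[i] | synsets[j])
        used |= {i, j}
    merged.extend(s for idx, s in enumerate(synsets) if idx not in used)
    return merged


# Toy vectors, made up for illustration only.
vectors = {"car": [1.0, 0.0], "auto": [0.9, 0.1],
           "automobile": [0.95, 0.05], "bank": [0.0, 1.0]}
result = merge_similar_synsets([{"car", "auto"}, {"automobile"}, {"bank"}],
                               vectors, k=1, t=1)
```

Here the first two synsets are each other's nearest neighbour and are merged, while the "bank" synset is left alone.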
Evaluation
•We use WATSET, a soft clustering algorithm for undirected graphs.
•WATSET shows SOTA results on synset induction.
Ustalov D., Panchenko A., Biemann C. Watset: Automatic Induction of Synsets
from a Graph of Synonyms. In: Proc. ACL 2017.
Evaluation: Measure & Data
•Measure: paired precision and recall.
•Gold standard: RuWordNet and YARN.
•The input graph: Wiktionary + Abramov
+ UNLDC.
•Word vectors are from RDT.
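For readers unfamiliar with the measure, paired precision and recall can be sketched as below. The slides only name the measure, so the definition used here is an assumption: every synset is expanded into the set of unordered word pairs it contains, and precision and recall are computed over the predicted and gold pair sets. The function names are hypothetical.

```python
from itertools import combinations


def synset_pairs(synsets):
    """All unordered word pairs that share at least one synset."""
    pairs = set()
    for synset in synsets:
        pairs.update(frozenset(p) for p in combinations(sorted(synset), 2))
    return pairs


def paired_precision_recall(predicted, gold):
    """Precision and recall over word pairs of predicted vs. gold synsets."""
    pred_pairs, gold_pairs = synset_pairs(predicted), synset_pairs(gold)
    common = pred_pairs & gold_pairs
    precision = len(common) / len(pred_pairs) if pred_pairs else 0.0
    recall = len(common) / len(gold_pairs) if gold_pairs else 0.0
    return precision, recall


# Toy example: one of three predicted pairs is also a gold pair.
p, r = paired_precision_recall([{"car", "auto", "bank"}],
                               [{"car", "auto", "automobile"}])
```

Under this definition, merging synsets can only add predicted pairs, which is consistent with recall rising while precision drops.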
Evaluation: Results
[Figure: results on RuWordNet and YARN for the input graph, synonymy transitivity, and synset merging]
•The transitivity approach showed
virtually no improvement.
•The merging approach substantially
increased the recall.
•Both methods trade drops in precision
for gains in recall.
Discussion
•Transitivity. No word is a perfect
synonym of another. The communities
with the new edges become bigger.
•Merging. Distributional semantic
models tend to connect co-hyponyms
instead of synonyms.
•Alternatives. Structural Heuristics?
Hearst Patterns? Anaphora
Resolution? Crowdsourcing?
Conclusion
•We fought with sparsity of the
synonymy dictionaries using two
approaches.
• Only synset merging won.
•Synset embeddings are easy to obtain.
They also show better results on such
a challenging task.
• Just average the word vectors and
compute similarity.
Thank You!
• Dmitry Ustalov
dmitry.ustalov@gmail.com
• nlpub.ru/Watset
• nlpub.ru/RDT
Join SIGSLAV,
an ACL SIG on
Slavic languages!
sigslav.cs.helsinki.fi
We acknowledge the support of the Deutsche Forschungsgemeinschaft (DFG) foundation under
the “JOIN-T” project, the DAAD, the RFBR under the projects no. 16-37-00203 мол_а and no.
16-37-00354 мол_а, and the RFH under the project no. 16-04-12019. The calculations were carried
out using the supercomputer “Uran” at the Krasovskii Institute of Mathematics and Mechanics.
We also thank four anonymous reviewers for their helpful comments.

Fighting with Sparsity of the Synonymy Dictionaries for Automatic Synset Induction

  • 1. Fighting with Sparsity Ustalov D.A. et al. Fighting with Sparsity of the Synonymy Dictionaries for Automatic Synset Induction. Dmitry Ustalov, Mikhail Chernoskutov (Ural Federal University); Chris Biemann, Alexander Panchenko (Universität Hamburg).
  • 2. Outline
      • Introduction
      • The Problem
      • The Approaches
      • Evaluation
      • Discussion
      • Conclusion
  • 3. Introduction
      • Synset Induction is an unsupervised task of discovering synsets in a synonymy graph.
      • Notable Methods:
          • MaxMax (Hope & Keller, 2013),
          • ECO (Gonçalo-Oliveira & Gomes, 2014),
          • WATSET (Ustalov et al., 2017) ← SOTA.
      • See the survey in our paper.
  • 4. The Problem
      • A synonymy graph contains densely connected subgraphs.
      • These subgraphs correspond to the synsets.
      • The synonymy dictionaries are not perfect.
      • Sometimes they have missing edges.
  • 5. The Intuition. [Figure with two panels: “As Is” and “To Be”.]
  • 6. The Approaches
      • We propose two approaches for reducing graph sparseness by adding potentially pertinent edges.
          • Synonymy Relation Transitivity (A1)
          • Similar Synset Merging (A2)
      • We also evaluate them on two lexical semantic resources for Russian: RuWordNet and YARN.
  • 7. A1: Synonymy Transitivity
      • Synonymy is an equivalence relation:
          • reflexivity, symmetry, transitivity.
      • We assume that if an edge is missing, the graph still contains several relatively short paths between the synonymous words.
      • This approach is designed to be executed before the synset induction.
  • 8. A1: Synonymy Transitivity
      • For each vertex, extract its second-order ego network.
          • Compute the set of candidate edges by connecting the disconnected nodes.
          • Compute the number of paths between the nodes in candidate edges.
          • Add an edge iff there exist at least k paths whose lengths fall in [i; j].
      • Then, the augmented graph is passed to synset induction.
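The A1 procedure on slide 8 can be sketched in a few lines of Python. This is an illustrative reading of the slide, not the authors' implementation: the function names, the default values of k, min_len and max_len (standing in for i and j), the choice to consider every non-adjacent pair inside the ego network, and the restriction to simple paths inside that ego network are all assumptions.

```python
from itertools import combinations


def ego_network(graph, center, radius=2):
    """Return the set of vertices within `radius` hops of `center`."""
    seen = {center}
    frontier = {center}
    for _ in range(radius):
        frontier = {v for u in frontier for v in graph[u]} - seen
        seen |= frontier
    return seen


def count_paths(graph, allowed, source, target, min_len, max_len):
    """Count simple paths source -> target with min_len..max_len edges,
    visiting only vertices from `allowed` (assumption: stay inside the ego network)."""
    def walk(node, length, visited):
        if node == target:
            return 1 if length >= min_len else 0
        if length == max_len:
            return 0
        return sum(
            walk(nxt, length + 1, visited | {nxt})
            for nxt in graph[node]
            if nxt in allowed and nxt not in visited
        )
    return walk(source, 0, {source})


def add_transitive_edges(graph, k=2, min_len=2, max_len=3):
    """Return a copy of `graph` (dict: vertex -> set of neighbours) with an edge
    added for every candidate pair joined by at least k short paths."""
    new_edges = set()
    for center in graph:
        ego = ego_network(graph, center)
        # Candidate edges: pairs in the ego network that are not yet connected.
        for u, v in combinations(sorted(ego), 2):
            if v in graph[u] or (u, v) in new_edges:
                continue
            # Paths are counted in the original graph, not the augmented one.
            if count_paths(graph, ego, u, v, min_len, max_len) >= k:
                new_edges.add((u, v))
    augmented = {u: set(nbrs) for u, nbrs in graph.items()}
    for u, v in new_edges:
        augmented[u].add(v)
        augmented[v].add(u)
    return augmented


# Toy 4-cycle: a-b, a-c, b-d, c-d. The pairs (a, d) and (b, c) are each
# joined by two paths of length 2, so k=2 adds both missing edges.
toy = {"a": {"b", "c"}, "b": {"a", "d"}, "c": {"a", "d"}, "d": {"b", "c"}}
augmented = add_transitive_edges(toy, k=2, min_len=2, max_len=2)
```

Raising k makes the test stricter: with k=3 the toy graph is returned unchanged, because no pair is joined by three short paths.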
  • 9. A2: Synset Merging
      • A similarity measure can be computed between two vectors.
          • Think of synset embeddings.
      • We assume that if two synsets are really similar, then they can be merged.
      • This approach is designed to be executed after the synset induction.
  • 10. A2: Synset Merging
      • Obtain synset embeddings using SenseGram (Pelevina et al., 2016).
          • Just average the word vectors in synsets.
      • Identify the closely related synsets using the m-kNN algorithm (Panchenko et al., 2012).
      • Merge the t closely related synsets.
          • The smallest are merged first.
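The A2 steps on slide 10 can be illustrated with a small, dependency-free sketch. It is only one possible reading of the slide: m-kNN is interpreted here as mutual k-nearest neighbours, t is treated as the maximum number of merges, "smallest first" is taken to mean the pair with the fewest words in total, and cosine similarity is assumed as the similarity measure. The function names and the toy vectors are hypothetical, and plain averaging stands in for SenseGram.

```python
import math


def mean_vector(words, vectors):
    """Synset embedding: average the vectors of the words that have one."""
    rows = [vectors[w] for w in words if w in vectors]
    return [sum(col) / len(rows) for col in zip(*rows)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is empty or all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mutual_knn_pairs(embeddings, k):
    """Pairs (i, j), i < j, where each synset is among the other's k nearest."""
    nearest = []
    for i, emb in enumerate(embeddings):
        others = [j for j in range(len(embeddings)) if j != i]
        others.sort(key=lambda j: cosine(emb, embeddings[j]), reverse=True)
        nearest.append(set(others[:k]))
    return [
        (i, j)
        for i in range(len(embeddings))
        for j in sorted(nearest[i])
        if i < j and i in nearest[j]
    ]


def merge_synsets(synsets, vectors, k=1, t=1):
    """Merge up to t mutually nearest synset pairs, smallest pairs first."""
    embeddings = [mean_vector(s, vectors) for s in synsets]
    pairs = mutual_knn_pairs(embeddings, k)
    pairs.sort(key=lambda p: len(synsets[p[0]]) + len(synsets[p[1]]))
    merged = [set(s) for s in synsets]
    used = set()
    merges_done = 0
    for i, j in pairs:
        if merges_done == t:
            break
        if i in used or j in used:
            continue  # each synset takes part in at most one merge
        merged[i] |= merged[j]
        merged[j] = None
        used |= {i, j}
        merges_done += 1
    return [s for s in merged if s is not None]


# Hypothetical 2-D word vectors, chosen only to make the example readable.
toy_vectors = {
    "car": [1.0, 0.0], "automobile": [0.9, 0.1], "auto": [1.0, 0.05],
    "banana": [0.0, 1.0], "fruit": [0.1, 0.9],
}
toy_synsets = [{"car", "automobile"}, {"auto"}, {"banana", "fruit"}]
result = merge_synsets(toy_synsets, toy_vectors, k=1, t=1)
```

In the toy data the first two synsets are each other's nearest neighbour, so they are merged, while the fruit synset has no mutual neighbour and is left alone.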
  • 11. Evaluation
      • We use WATSET, a soft clustering algorithm for undirected graphs.
      • WATSET shows SOTA results on synset induction.
      • Reference: Ustalov D., Panchenko A., Biemann C. Watset: Automatic Induction of Synsets from a Graph of Synonyms. In: Proc. ACL 2017.
  • 12. Evaluation: Measure & Data
      • Measure: paired precision and recall.
      • Gold standard: RuWordNet and YARN.
      • The input graph: Wiktionary + Abramov + UNLDC.
      • Word vectors are from RDT.
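Paired precision and recall, the measure named on slide 12, compare clusterings through the word pairs they imply. The sketch below assumes the usual pair-based definition (every synset of n words contributes its n(n-1)/2 unordered pairs); the function names and the toy synsets are illustrative and not taken from the slides.

```python
from itertools import combinations


def synset_pairs(synsets):
    """All unordered word pairs that share at least one synset."""
    return {tuple(sorted(pair)) for s in synsets for pair in combinations(s, 2)}


def paired_precision_recall(predicted, gold):
    """Precision and recall over synonym pairs rather than whole synsets."""
    pred_pairs = synset_pairs(predicted)
    gold_pairs = synset_pairs(gold)
    common = pred_pairs & gold_pairs
    precision = len(common) / len(pred_pairs) if pred_pairs else 0.0
    recall = len(common) / len(gold_pairs) if gold_pairs else 0.0
    return precision, recall


# Toy data: one over-merged predicted synset against two gold synsets.
predicted = [{"car", "auto", "automobile"}]
gold = [{"car", "auto"}, {"automobile", "machine"}]
precision, recall = paired_precision_recall(predicted, gold)
```

Here the predicted synset yields three pairs, of which only (auto, car) is in the gold standard, so precision is 1/3 while recall is 1/2. Larger synsets generate many more pairs, which is why adding edges or merging synsets can raise recall at the cost of precision.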
  • 13. Evaluation: Results. [Figure: results on RuWordNet and YARN for the input graph, synonymy transitivity, and synset merging.]
  • 14. Evaluation: Results
      • The transitivity approach showed virtually no improvement.
      • The merging approach substantially increased the recall.
      • Both methods trade gains in recall for drops in precision.
  • 15. Discussion
      • Transitivity. No word is a perfect synonym of another. The communities with the new edges become bigger.
      • Merging. Distributional semantic models tend to connect co-hyponyms instead of synonyms.
      • Alternatives. Structural Heuristics? Hearst Patterns? Anaphora Resolution? Crowdsourcing?
  • 16. Conclusion
      • We fought with sparsity of the synonymy dictionaries using two approaches.
          • Only synset merging won.
      • Synset embeddings are easy to obtain. They also show better results on such a challenging task.
          • Just average the word vectors and compute similarity.
  • 17. Thank You!
      • Dmitry Ustalov, dmitry.ustalov@gmail.com
      • nlpub.ru/Watset
      • nlpub.ru/RDT
      • Join SIGSLAV, an ACL SIG on Slavic languages! sigslav.cs.helsinki.fi
      • We acknowledge the support of the Deutsche Forschungsgemeinschaft (DFG) foundation under the “JOIN-T” project, the DAAD, the RFBR under the projects no. 16-37-00203 мол_а and no. 16-37-00354 мол_а, and the RFH under the project no. 16-04-12019. The calculations were carried out using the supercomputer “Uran” at the Krasovskii Institute of Mathematics and Mechanics. We also thank four anonymous reviewers for their helpful comments.