Human and Machine Judgements
about Russian Semantic Relatedness
• A. Panchenko, D. Ustalov, D. Paperno, C. Meyer, N.
Konstantinova, N. Loukachevitch, Ch. Biemann
Motivation
• A semantic similarity measure is a specific kind of
similarity measure for nouns or multiword expressions.
• … high values for synonyms, hyponyms, free associations, etc.
• … low values for unrelated pairs
• Applications:
• information retrieval, document clustering, topic detection, question
answering, word sense disambiguation, text summarization…
• Most datasets and approaches were proposed for English
• RUSSE 2015
• The First International Workshop on Russian Semantic Similarity
Evaluation (RUSSE)
• 19 participants, 105 runs, special session at the Dialog-2015 conference.
Russian Datasets for Measuring Word
Semantic Similarity
• Human Judgement dataset (HJ dataset)
– Word pairs with human judgements
• Russian Thesaurus dataset (RT dataset)
– synonyms and hypernyms from RuThes thesaurus
• Associative Thesaurus dataset (AE dataset)
– cognitive associations between words
• Machine Judgements
– combination of submissions from a shared task on Russian
semantic similarity
• Russian Distributional Thesaurus
Human judgements about semantic
similarity (HJ)
• This is the standard way to assess a semantic similarity
measure.
• The HJ dataset contains word pairs translated from the
widely used benchmarks for English:
• Miller-Charles set – 30 word pairs
• Rubenstein-Goodenough set – 65 word pairs
• WordSim-353 – 353 word pairs:
• Additionally subdivided into similarity set and relatedness set
• Evaluation: Correlations with human judgments in terms of Spearman’s
rank correlation
• Agreement in ordering
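The evaluation described above can be sketched in a few lines. This is a minimal illustration, not the evaluation script used for the shared task: it computes Spearman's rank correlation as the Pearson correlation of the two rank vectors, giving tied values their average rank. The function names and the four toy scores are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(human, system):
    """Spearman's rho: agreement between the two orderings of the pairs."""
    return pearson(average_ranks(human), average_ranks(system))


# Hypothetical scores for four word pairs (not taken from the HJ dataset).
human = [3.9, 3.1, 0.4, 1.2]
system = [0.81, 0.45, 0.10, 0.52]
print(round(spearman(human, system), 3))
```

Only the ordering of the pairs matters here, so a measure on a different scale from the human scores is not penalised for it.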
Human judgements: Crowdsourcing
Example of human judgements about
semantic similarity (HJ)
RuThes Linguistic Ontology
http://www.labinform.ru/pub/ruthes/index.htm
• 96 thousand unique words and expressions
– Synonyms
– Conceptual relations: class-subclass, part-whole, conceptual
dependence
• The dataset contains 114 066 relations for 6 832 nouns.
• Half of these relations are synonyms and hypernyms from the
RuThes-lite thesaurus.
• The other half are pairs of unrelated words.
Thesaurus Sociation.org
• Non-commercial Internet project
• Contains 325,863 associations for 37,463 words
Structure of the semantic relation
classification (RT, AE) benchmarks
Russe: Best models according to the
HJ benchmark
MJ: Machine Judgements of Word Pairs
from the RUSSE Shared Task
• This dataset contains 12 886 word pairs coming
from HJ, RT, and AE datasets
• The pairs have continuous relatedness scores
• To estimate these scores we averaged 105
submissions of the shared task on Russian
semantic similarity, RUSSE.
• Each run consisted of 12 886 word pairs along
with their similarity scores.
Gathering Machine Judgements
• Select one best submission for each of 19
participating teams for HJ, RT and AE datasets
• Rank the 19 best submissions. The best one
has rank r1 = 19; the worst has rank r19 = 1
• Combine scores of these 19 best submissions
– The score of a pair is the sum of the run scores, each
multiplied by its run weight
– Run weight: rank, exponent of rank, or square root of
rank
• The combined approach is better than a single
submission
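The combination step above might look roughly like the sketch below. It assumes the runs are already sorted from best to worst, so the best of n runs gets rank n and the worst gets rank 1. The slide describes a weighted sum; dividing by the total weight to obtain a weighted mean is an assumption made here so that the result stays on the scale of the input scores. The names `WEIGHTINGS` and `combine_runs` and the toy runs are hypothetical.

```python
from math import exp, sqrt

# The three weighting schemes named on the slide, as functions of the rank.
WEIGHTINGS = {
    "rank": lambda r: float(r),
    "exp": lambda r: exp(r),
    "sqrt": lambda r: sqrt(r),
}


def combine_runs(runs_best_first, weighting="rank"):
    """Combine runs into one score per word pair.

    runs_best_first: list of dicts {(word1, word2): score}, best run first.
    All runs are assumed to score the same set of pairs.
    """
    n = len(runs_best_first)
    weight_fn = WEIGHTINGS[weighting]
    # Best run has rank n, worst run has rank 1.
    weights = [weight_fn(n - i) for i in range(n)]
    total = sum(weights)  # normalisation is an assumption, see lead-in
    return {
        pair: sum(w * run[pair] for w, run in zip(weights, runs_best_first)) / total
        for pair in runs_best_first[0]
    }


# Three hypothetical runs over two hypothetical pairs.
runs = [
    {("a", "b"): 0.9, ("a", "c"): 0.1},  # best run, rank 3
    {("a", "b"): 0.6, ("a", "c"): 0.3},  # rank 2
    {("a", "b"): 0.3, ("a", "c"): 0.6},  # worst run, rank 1
]
combined = combine_runs(runs, weighting="rank")
print(combined)
```

With the exponential weighting the best runs dominate the result, while the square root flattens the differences between runs.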
Machine Judgements: Example
word1,word2,sim,wmean
препарат,вещество,1.0,0.484418
препарат,лекарство,1.0,0.634770
препарат,перестройка,0.0,0.157699
препарат,барселона,0.0,0.105411
инспекция,проверка,1.0,0.532748
инспекция,гол,0.0,0.107823
латы,меч,1.0,0.428076
латы,щит,1.0,0.441120
латы,рыцарь,1.0,0.453718
латы,броня,1.0,0.414047
латы,доспехи,1.0,0.543852
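A short sketch of how these rows could be read and sanity-checked. It interprets `sim` as the binary label carried over from the source dataset and `wmean` as the combined machine score; that reading of the column names is an assumption, and `load_pairs` is a hypothetical helper.

```python
import csv
import io

# The example rows from the slide, as CSV text.
ROWS = """word1,word2,sim,wmean
препарат,вещество,1.0,0.484418
препарат,лекарство,1.0,0.634770
препарат,перестройка,0.0,0.157699
препарат,барселона,0.0,0.105411
инспекция,проверка,1.0,0.532748
инспекция,гол,0.0,0.107823
латы,меч,1.0,0.428076
латы,щит,1.0,0.441120
латы,рыцарь,1.0,0.453718
латы,броня,1.0,0.414047
латы,доспехи,1.0,0.543852
"""


def load_pairs(text):
    """Parse CSV text into (word1, word2, sim, wmean) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        (row["word1"], row["word2"], float(row["sim"]), float(row["wmean"]))
        for row in reader
    ]


pairs = load_pairs(ROWS)
related = [wmean for _, _, sim, wmean in pairs if sim == 1.0]
unrelated = [wmean for _, _, sim, wmean in pairs if sim == 0.0]

# Lowest score among related pairs and highest score among unrelated pairs.
print(min(related), max(unrelated))
```

In this small sample every related pair scores higher than every unrelated pair, so the continuous scores preserve the binary distinction while also ordering the pairs within each class.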
DT: Open Russian Distributional Thesaurus
• skip-gram model (Mikolov et al., 2013)
• trained on a 12.9 billion word collection of books
in Russian
– Minimal word frequency: 5
– Number of dimensions in a word vector: 500
– Context window size: 10 words
– For each of the 932,000 most frequent words, the 250 nearest
neighbours by cosine similarity between word vectors
are calculated.
– These related words were lemmatized using
PyMorphy2.
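The neighbour lookup can be illustrated with a brute-force sketch. It is not the code used to build the thesaurus: the vectors below are hypothetical 3-dimensional toys rather than 500-dimensional skip-gram vectors, and a vocabulary of 932,000 words would call for vectorised or approximate search. Training and lemmatization are left out.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equally long vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbours(word, vectors, k=250):
    """Return up to k (neighbour, similarity) pairs, most similar first."""
    target = vectors[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors; real entries would come from the trained model.
vectors = {
    "латы": [0.9, 0.1, 0.0],
    "доспехи": [0.8, 0.2, 0.1],
    "меч": [0.6, 0.5, 0.0],
    "гол": [0.0, 0.1, 0.9],
}

for neighbour, score in nearest_neighbours("латы", vectors, k=2):
    print(neighbour, round(score, 3))
```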
Conclusion
• We presented new Russian resources for evaluating
semantic relatedness measures
• Russian HJ datasets: Miller-Charles, Rubenstein-Goodenough,
WordSim-353
• RuThes dataset and Human associations dataset
• Machine Judgements Dataset and Distributional Thesaurus
• The resources can be obtained from
• http://panchenko.me/rsr/
• Semantic similarity and relatedness measures are useful in
many NLP and information retrieval applications

More Related Content

Viewers also liked

Climate Smart Agriculture-Brochure
Climate Smart Agriculture-BrochureClimate Smart Agriculture-Brochure
Climate Smart Agriculture-Brochure
surendra gautam
 
'UK Radio Industry Consolidation: How Relevant Is The US Experience?' by Gran...
'UK Radio Industry Consolidation: How Relevant Is The US Experience?' by Gran...'UK Radio Industry Consolidation: How Relevant Is The US Experience?' by Gran...
'UK Radio Industry Consolidation: How Relevant Is The US Experience?' by Gran...
Grant Goddard
 
Curriculum Vitae 12'29'16
Curriculum Vitae 12'29'16Curriculum Vitae 12'29'16
Curriculum Vitae 12'29'16
Bernard Moore
 
Research Paper (1)
Research Paper (1)Research Paper (1)
Research Paper (1)
guest309917
 
MKOJ_Corp_PPT_CHRP_2016_T (1)
MKOJ_Corp_PPT_CHRP_2016_T (1)MKOJ_Corp_PPT_CHRP_2016_T (1)
MKOJ_Corp_PPT_CHRP_2016_T (1)
Namrata Bhowmik
 
GR8Conf 2011: Grails Webflow
GR8Conf 2011: Grails WebflowGR8Conf 2011: Grails Webflow
GR8Conf 2011: Grails Webflow
GR8Conf
 
Cervell, creativitat i oci.
Cervell, creativitat i oci.Cervell, creativitat i oci.
Cervell, creativitat i oci.
Manel Villar (Institut Poeta Maragall)
 
Portrait of professional developer 2.0
Portrait of professional developer 2.0Portrait of professional developer 2.0
Portrait of professional developer 2.0
Mikalai Alimenkou
 
Posa a prova la teva creativitat (jornada filosòfica 2016)
Posa a prova la teva creativitat (jornada filosòfica 2016)Posa a prova la teva creativitat (jornada filosòfica 2016)
Posa a prova la teva creativitat (jornada filosòfica 2016)
Manel Villar (Institut Poeta Maragall)
 
HUMOUR AND SATIRE IN PROSE AND POETRY OF COL MUHAMMAD KHALID KHAN
HUMOUR AND SATIRE IN PROSE AND POETRY OF COL MUHAMMAD KHALID KHAN HUMOUR AND SATIRE IN PROSE AND POETRY OF COL MUHAMMAD KHALID KHAN
HUMOUR AND SATIRE IN PROSE AND POETRY OF COL MUHAMMAD KHALID KHAN
Tauqeer Khalid Khan
 
IIT RTC 2016 TADHack Global Summary
IIT RTC 2016 TADHack Global SummaryIIT RTC 2016 TADHack Global Summary
IIT RTC 2016 TADHack Global Summary
Alan Quayle
 
tecnica de Respiracion
tecnica de Respiraciontecnica de Respiracion
tecnica de Respiracion
Lina Sapuy
 
Interpolation
InterpolationInterpolation
Interpolation
Dmytro Mitin
 
Spark at Twitter - Seattle Spark Meetup, April 2014
Spark at Twitter - Seattle Spark Meetup, April 2014Spark at Twitter - Seattle Spark Meetup, April 2014
Spark at Twitter - Seattle Spark Meetup, April 2014
Sriram Krishnan
 
OOP paradigm, principles of good design and architecture of Java applications
OOP paradigm, principles of good design and architecture of Java applicationsOOP paradigm, principles of good design and architecture of Java applications
OOP paradigm, principles of good design and architecture of Java applications
Mikalai Alimenkou
 
L'inconscient
L'inconscientL'inconscient
Datamemoryusa Presentacion virtual desktop 5 19 10
Datamemoryusa Presentacion virtual desktop 5 19 10Datamemoryusa Presentacion virtual desktop 5 19 10
Datamemoryusa Presentacion virtual desktop 5 19 10
datamemoryusa
 

Viewers also liked (17)

Climate Smart Agriculture-Brochure
Climate Smart Agriculture-BrochureClimate Smart Agriculture-Brochure
Climate Smart Agriculture-Brochure
 
'UK Radio Industry Consolidation: How Relevant Is The US Experience?' by Gran...
'UK Radio Industry Consolidation: How Relevant Is The US Experience?' by Gran...'UK Radio Industry Consolidation: How Relevant Is The US Experience?' by Gran...
'UK Radio Industry Consolidation: How Relevant Is The US Experience?' by Gran...
 
Curriculum Vitae 12'29'16
Curriculum Vitae 12'29'16Curriculum Vitae 12'29'16
Curriculum Vitae 12'29'16
 
Research Paper (1)
Research Paper (1)Research Paper (1)
Research Paper (1)
 
MKOJ_Corp_PPT_CHRP_2016_T (1)
MKOJ_Corp_PPT_CHRP_2016_T (1)MKOJ_Corp_PPT_CHRP_2016_T (1)
MKOJ_Corp_PPT_CHRP_2016_T (1)
 
GR8Conf 2011: Grails Webflow
GR8Conf 2011: Grails WebflowGR8Conf 2011: Grails Webflow
GR8Conf 2011: Grails Webflow
 
Cervell, creativitat i oci.
Cervell, creativitat i oci.Cervell, creativitat i oci.
Cervell, creativitat i oci.
 
Portrait of professional developer 2.0
Portrait of professional developer 2.0Portrait of professional developer 2.0
Portrait of professional developer 2.0
 
Posa a prova la teva creativitat (jornada filosòfica 2016)
Posa a prova la teva creativitat (jornada filosòfica 2016)Posa a prova la teva creativitat (jornada filosòfica 2016)
Posa a prova la teva creativitat (jornada filosòfica 2016)
 
HUMOUR AND SATIRE IN PROSE AND POETRY OF COL MUHAMMAD KHALID KHAN
HUMOUR AND SATIRE IN PROSE AND POETRY OF COL MUHAMMAD KHALID KHAN HUMOUR AND SATIRE IN PROSE AND POETRY OF COL MUHAMMAD KHALID KHAN
HUMOUR AND SATIRE IN PROSE AND POETRY OF COL MUHAMMAD KHALID KHAN
 
IIT RTC 2016 TADHack Global Summary
IIT RTC 2016 TADHack Global SummaryIIT RTC 2016 TADHack Global Summary
IIT RTC 2016 TADHack Global Summary
 
tecnica de Respiracion
tecnica de Respiraciontecnica de Respiracion
tecnica de Respiracion
 
Interpolation
InterpolationInterpolation
Interpolation
 
Spark at Twitter - Seattle Spark Meetup, April 2014
Spark at Twitter - Seattle Spark Meetup, April 2014Spark at Twitter - Seattle Spark Meetup, April 2014
Spark at Twitter - Seattle Spark Meetup, April 2014
 
OOP paradigm, principles of good design and architecture of Java applications
OOP paradigm, principles of good design and architecture of Java applicationsOOP paradigm, principles of good design and architecture of Java applications
OOP paradigm, principles of good design and architecture of Java applications
 
L'inconscient
L'inconscientL'inconscient
L'inconscient
 
Datamemoryusa Presentacion virtual desktop 5 19 10
Datamemoryusa Presentacion virtual desktop 5 19 10Datamemoryusa Presentacion virtual desktop 5 19 10
Datamemoryusa Presentacion virtual desktop 5 19 10
 

Similar to Alexander Panchenko - Human and Machine Judgements about Russian Semantic Relatedness

Advances in Methods and Evaluations for Distributional Semantic Models using ...
Advances in Methods and Evaluations for Distributional Semantic Models using ...Advances in Methods and Evaluations for Distributional Semantic Models using ...
Advances in Methods and Evaluations for Distributional Semantic Models using ...
Jinho Choi
 
Disambiguating Polysemous Queries For Document Retrieval
Disambiguating Polysemous Queries For Document RetrievalDisambiguating Polysemous Queries For Document Retrieval
Disambiguating Polysemous Queries For Document Retrieval
Madhusudan Daad
 
Why Watson Won: A cognitive perspective
Why Watson Won: A cognitive perspectiveWhy Watson Won: A cognitive perspective
Why Watson Won: A cognitive perspective
James Hendler
 
Analyzing Arguments during a Debate using Natural Language Processing in Python
Analyzing Arguments during a Debate using Natural Language Processing in PythonAnalyzing Arguments during a Debate using Natural Language Processing in Python
Analyzing Arguments during a Debate using Natural Language Processing in Python
Abhinav Gupta
 
Target-Based Sentiment Anaysis as a Sequence-Tagging Task
Target-Based Sentiment Anaysis as a Sequence-Tagging TaskTarget-Based Sentiment Anaysis as a Sequence-Tagging Task
Target-Based Sentiment Anaysis as a Sequence-Tagging Task
jcscholtes
 
Semantic Application for Healthcare
Semantic Application for HealthcareSemantic Application for Healthcare
Semantic Application for Healthcare
scholten
 
Language Models for Information Retrieval
Language Models for Information RetrievalLanguage Models for Information Retrieval
Language Models for Information Retrieval
Nik Spirin
 
Gaining, retaining and losing influence in online communities
Gaining, retaining and losing influence in online communitiesGaining, retaining and losing influence in online communities
Gaining, retaining and losing influence in online communities
joinson
 
Knowledge Patterns SSSW2016
Knowledge Patterns SSSW2016Knowledge Patterns SSSW2016
Knowledge Patterns SSSW2016
Aldo Gangemi
 
Emulating Human Essay Scoring With Machine Learning Methods
Emulating Human Essay Scoring With Machine Learning MethodsEmulating Human Essay Scoring With Machine Learning Methods
Emulating Human Essay Scoring With Machine Learning Methods
butest
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
Pranav Gupta
 
A review on Analyzing Multiple Medical Corpora Using Word Embedding
A review on Analyzing Multiple Medical Corpora Using Word EmbeddingA review on Analyzing Multiple Medical Corpora Using Word Embedding
A review on Analyzing Multiple Medical Corpora Using Word Embedding
Reza Sadeghi
 
Intra- and interdisciplinary cross-concordances for information retrieval
Intra- and interdisciplinary cross-concordances for information retrieval Intra- and interdisciplinary cross-concordances for information retrieval
Intra- and interdisciplinary cross-concordances for information retrieval
GESIS
 
Ontology Search: An Empirical Evaluation
Ontology Search: An Empirical EvaluationOntology Search: An Empirical Evaluation
Ontology Search: An Empirical Evaluation
Armin Haller
 
SIGIR 2018 - From the Probability Ranking Principle to the Low Prior Discover...
SIGIR 2018 - From the Probability Ranking Principle to the Low Prior Discover...SIGIR 2018 - From the Probability Ranking Principle to the Low Prior Discover...
SIGIR 2018 - From the Probability Ranking Principle to the Low Prior Discover...
Rocío Cañamares
 
Big Data Palooza Talk: Aspects of Semantic Processing
Big Data Palooza Talk: Aspects of Semantic ProcessingBig Data Palooza Talk: Aspects of Semantic Processing
Big Data Palooza Talk: Aspects of Semantic Processing
Na'im Tyson
 
Generating Adequate Distractors for Multiple-Choice Questions
Generating Adequate Distractors for Multiple-Choice QuestionsGenerating Adequate Distractors for Multiple-Choice Questions
Generating Adequate Distractors for Multiple-Choice Questions
Cheng Zhang
 
OpenEssayist: Extractive Summarisation and Formative Assessment (DCLA13)
OpenEssayist: Extractive Summarisation and Formative Assessment (DCLA13)OpenEssayist: Extractive Summarisation and Formative Assessment (DCLA13)
OpenEssayist: Extractive Summarisation and Formative Assessment (DCLA13)
Nicolas Van Labeke
 
Relationship-Based Top-K Concept Retrieval for Ontology Search
Relationship-Based Top-K Concept Retrieval for Ontology SearchRelationship-Based Top-K Concept Retrieval for Ontology Search
Relationship-Based Top-K Concept Retrieval for Ontology Search
NUST School of Electrical Engineering and Computer Science
 
NLP & DBpedia
 NLP & DBpedia NLP & DBpedia
NLP & DBpedia
kelbedweihy
 

Similar to Alexander Panchenko - Human and Machine Judgements about Russian Semantic Relatedness (20)

Advances in Methods and Evaluations for Distributional Semantic Models using ...
Advances in Methods and Evaluations for Distributional Semantic Models using ...Advances in Methods and Evaluations for Distributional Semantic Models using ...
Advances in Methods and Evaluations for Distributional Semantic Models using ...
 
Disambiguating Polysemous Queries For Document Retrieval
Disambiguating Polysemous Queries For Document RetrievalDisambiguating Polysemous Queries For Document Retrieval
Disambiguating Polysemous Queries For Document Retrieval
 
Why Watson Won: A cognitive perspective
Why Watson Won: A cognitive perspectiveWhy Watson Won: A cognitive perspective
Why Watson Won: A cognitive perspective
 
Analyzing Arguments during a Debate using Natural Language Processing in Python
Analyzing Arguments during a Debate using Natural Language Processing in PythonAnalyzing Arguments during a Debate using Natural Language Processing in Python
Analyzing Arguments during a Debate using Natural Language Processing in Python
 
Target-Based Sentiment Anaysis as a Sequence-Tagging Task
Target-Based Sentiment Anaysis as a Sequence-Tagging TaskTarget-Based Sentiment Anaysis as a Sequence-Tagging Task
Target-Based Sentiment Anaysis as a Sequence-Tagging Task
 
Semantic Application for Healthcare
Semantic Application for HealthcareSemantic Application for Healthcare
Semantic Application for Healthcare
 
Language Models for Information Retrieval
Language Models for Information RetrievalLanguage Models for Information Retrieval
Language Models for Information Retrieval
 
Gaining, retaining and losing influence in online communities
Gaining, retaining and losing influence in online communitiesGaining, retaining and losing influence in online communities
Gaining, retaining and losing influence in online communities
 
Knowledge Patterns SSSW2016
Knowledge Patterns SSSW2016Knowledge Patterns SSSW2016
Knowledge Patterns SSSW2016
 
Emulating Human Essay Scoring With Machine Learning Methods
Emulating Human Essay Scoring With Machine Learning MethodsEmulating Human Essay Scoring With Machine Learning Methods
Emulating Human Essay Scoring With Machine Learning Methods
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
A review on Analyzing Multiple Medical Corpora Using Word Embedding
A review on Analyzing Multiple Medical Corpora Using Word EmbeddingA review on Analyzing Multiple Medical Corpora Using Word Embedding
A review on Analyzing Multiple Medical Corpora Using Word Embedding
 
Intra- and interdisciplinary cross-concordances for information retrieval
Intra- and interdisciplinary cross-concordances for information retrieval Intra- and interdisciplinary cross-concordances for information retrieval
Intra- and interdisciplinary cross-concordances for information retrieval
 
Ontology Search: An Empirical Evaluation
Ontology Search: An Empirical EvaluationOntology Search: An Empirical Evaluation
Ontology Search: An Empirical Evaluation
 
SIGIR 2018 - From the Probability Ranking Principle to the Low Prior Discover...
SIGIR 2018 - From the Probability Ranking Principle to the Low Prior Discover...SIGIR 2018 - From the Probability Ranking Principle to the Low Prior Discover...
SIGIR 2018 - From the Probability Ranking Principle to the Low Prior Discover...
 
Big Data Palooza Talk: Aspects of Semantic Processing
Big Data Palooza Talk: Aspects of Semantic ProcessingBig Data Palooza Talk: Aspects of Semantic Processing
Big Data Palooza Talk: Aspects of Semantic Processing
 
Generating Adequate Distractors for Multiple-Choice Questions
Generating Adequate Distractors for Multiple-Choice QuestionsGenerating Adequate Distractors for Multiple-Choice Questions
Generating Adequate Distractors for Multiple-Choice Questions
 
OpenEssayist: Extractive Summarisation and Formative Assessment (DCLA13)
OpenEssayist: Extractive Summarisation and Formative Assessment (DCLA13)OpenEssayist: Extractive Summarisation and Formative Assessment (DCLA13)
OpenEssayist: Extractive Summarisation and Formative Assessment (DCLA13)
 
Relationship-Based Top-K Concept Retrieval for Ontology Search
Relationship-Based Top-K Concept Retrieval for Ontology SearchRelationship-Based Top-K Concept Retrieval for Ontology Search
Relationship-Based Top-K Concept Retrieval for Ontology Search
 
NLP & DBpedia
 NLP & DBpedia NLP & DBpedia
NLP & DBpedia
 

More from AIST

Alexey Mikhaylichenko - Automatic Detection of Bone Contours in X-Ray Images
Alexey Mikhaylichenko - Automatic Detection of Bone Contours in X-Ray  ImagesAlexey Mikhaylichenko - Automatic Detection of Bone Contours in X-Ray  Images
Alexey Mikhaylichenko - Automatic Detection of Bone Contours in X-Ray Images
AIST
 
Алена Ильина и Иван Бибилов, GoTo - GoTo школы, конкурсы и хакатоны
Алена Ильина и Иван Бибилов, GoTo - GoTo школы, конкурсы и хакатоныАлена Ильина и Иван Бибилов, GoTo - GoTo школы, конкурсы и хакатоны
Алена Ильина и Иван Бибилов, GoTo - GoTo школы, конкурсы и хакатоны
AIST
 
Станислав Кралин, Сайтсофт - Связанные открытые данные федеральных органов ис...
Станислав Кралин, Сайтсофт - Связанные открытые данные федеральных органов ис...Станислав Кралин, Сайтсофт - Связанные открытые данные федеральных органов ис...
Станислав Кралин, Сайтсофт - Связанные открытые данные федеральных органов ис...
AIST
 
Павел Браславский,Velpas - Velpas: мобильный визуальный поиск
Павел Браславский,Velpas - Velpas: мобильный визуальный поискПавел Браславский,Velpas - Velpas: мобильный визуальный поиск
Павел Браславский,Velpas - Velpas: мобильный визуальный поиск
AIST
 
Евгений Цымбалов, Webgames - Методы машинного обучения для задач игровой анал...
Евгений Цымбалов, Webgames - Методы машинного обучения для задач игровой анал...Евгений Цымбалов, Webgames - Методы машинного обучения для задач игровой анал...
Евгений Цымбалов, Webgames - Методы машинного обучения для задач игровой анал...
AIST
 
Александр Москвичев, EveResearch - Алгоритмы анализа данных в маркетинговых и...
Александр Москвичев, EveResearch - Алгоритмы анализа данных в маркетинговых и...Александр Москвичев, EveResearch - Алгоритмы анализа данных в маркетинговых и...
Александр Москвичев, EveResearch - Алгоритмы анализа данных в маркетинговых и...
AIST
 
Петр Ермаков, HeadHunter - Модерация резюме: от людей к роботам. Машинное обу...
Петр Ермаков, HeadHunter - Модерация резюме: от людей к роботам. Машинное обу...Петр Ермаков, HeadHunter - Модерация резюме: от людей к роботам. Машинное обу...
Петр Ермаков, HeadHunter - Модерация резюме: от людей к роботам. Машинное обу...
AIST
 
Иосиф Иткин, Exactpro - TBA
Иосиф Иткин, Exactpro - TBAИосиф Иткин, Exactpro - TBA
Иосиф Иткин, Exactpro - TBA
AIST
 
Nikolay Karpov - Evolvable Semantic Platform for Facilitating Knowledge Exchange
Nikolay Karpov - Evolvable Semantic Platform for Facilitating Knowledge ExchangeNikolay Karpov - Evolvable Semantic Platform for Facilitating Knowledge Exchange
Nikolay Karpov - Evolvable Semantic Platform for Facilitating Knowledge Exchange
AIST
 
George Moiseev - Classification of E-commerce Websites by Product Categories
George Moiseev - Classification of E-commerce Websites by Product CategoriesGeorge Moiseev - Classification of E-commerce Websites by Product Categories
George Moiseev - Classification of E-commerce Websites by Product Categories
AIST
 
Elena Bruches - The Hybrid Approach to Part-of-Speech Disambiguation
Elena Bruches - The Hybrid Approach to Part-of-Speech DisambiguationElena Bruches - The Hybrid Approach to Part-of-Speech Disambiguation
Elena Bruches - The Hybrid Approach to Part-of-Speech Disambiguation
AIST
 
Marina Danshina - The methodology of automated decryption of znamenny chants
Marina Danshina - The methodology of automated decryption of znamenny chantsMarina Danshina - The methodology of automated decryption of znamenny chants
Marina Danshina - The methodology of automated decryption of znamenny chants
AIST
 
Edward Klyshinsky - The Corpus of Syntactic Co-occurences: the First Glance
Edward Klyshinsky - The Corpus of Syntactic Co-occurences: the First GlanceEdward Klyshinsky - The Corpus of Syntactic Co-occurences: the First Glance
Edward Klyshinsky - The Corpus of Syntactic Co-occurences: the First Glance
AIST
 
Galina Lavrentyeva - Anti-spoofing Methods for Automatic Speaker Verification...
Galina Lavrentyeva - Anti-spoofing Methods for Automatic Speaker Verification...Galina Lavrentyeva - Anti-spoofing Methods for Automatic Speaker Verification...
Galina Lavrentyeva - Anti-spoofing Methods for Automatic Speaker Verification...
AIST
 
Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...
Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...
Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...
AIST
 
Kaytoue Mehdi - Finding duplicate labels in behavioral data: an application f...
Kaytoue Mehdi - Finding duplicate labels in behavioral data: an application f...Kaytoue Mehdi - Finding duplicate labels in behavioral data: an application f...
Kaytoue Mehdi - Finding duplicate labels in behavioral data: an application f...
AIST
 
Valeri Labunets - The bichromatic excitable Schrodinger metamedium
Valeri Labunets - The bichromatic excitable Schrodinger metamediumValeri Labunets - The bichromatic excitable Schrodinger metamedium
Valeri Labunets - The bichromatic excitable Schrodinger metamedium
AIST
 
Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...
Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...
Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...
AIST
 
Alexander Karkishchenko - Threefold Symmetry Detection in Hexagonal Images Ba...
Alexander Karkishchenko - Threefold Symmetry Detection in Hexagonal Images Ba...Alexander Karkishchenko - Threefold Symmetry Detection in Hexagonal Images Ba...
Alexander Karkishchenko - Threefold Symmetry Detection in Hexagonal Images Ba...
AIST
 
Artyom Makovetskii - An Efficient Algorithm for Total Variation Denoising
Artyom Makovetskii - An Efficient Algorithm for Total Variation DenoisingArtyom Makovetskii - An Efficient Algorithm for Total Variation Denoising
Artyom Makovetskii - An Efficient Algorithm for Total Variation Denoising
AIST
 

More from AIST (20)

Alexey Mikhaylichenko - Automatic Detection of Bone Contours in X-Ray Images
Alexey Mikhaylichenko - Automatic Detection of Bone Contours in X-Ray  ImagesAlexey Mikhaylichenko - Automatic Detection of Bone Contours in X-Ray  Images
Alexey Mikhaylichenko - Automatic Detection of Bone Contours in X-Ray Images
 
Алена Ильина и Иван Бибилов, GoTo - GoTo школы, конкурсы и хакатоны
Алена Ильина и Иван Бибилов, GoTo - GoTo школы, конкурсы и хакатоныАлена Ильина и Иван Бибилов, GoTo - GoTo школы, конкурсы и хакатоны
Алена Ильина и Иван Бибилов, GoTo - GoTo школы, конкурсы и хакатоны
 
Станислав Кралин, Сайтсофт - Связанные открытые данные федеральных органов ис...
Станислав Кралин, Сайтсофт - Связанные открытые данные федеральных органов ис...Станислав Кралин, Сайтсофт - Связанные открытые данные федеральных органов ис...
Станислав Кралин, Сайтсофт - Связанные открытые данные федеральных органов ис...
 
Павел Браславский,Velpas - Velpas: мобильный визуальный поиск
Павел Браславский,Velpas - Velpas: мобильный визуальный поискПавел Браславский,Velpas - Velpas: мобильный визуальный поиск
Павел Браславский,Velpas - Velpas: мобильный визуальный поиск
 
Евгений Цымбалов, Webgames - Методы машинного обучения для задач игровой анал...
Евгений Цымбалов, Webgames - Методы машинного обучения для задач игровой анал...Евгений Цымбалов, Webgames - Методы машинного обучения для задач игровой анал...
Евгений Цымбалов, Webgames - Методы машинного обучения для задач игровой анал...
 
Александр Москвичев, EveResearch - Алгоритмы анализа данных в маркетинговых и...
Александр Москвичев, EveResearch - Алгоритмы анализа данных в маркетинговых и...Александр Москвичев, EveResearch - Алгоритмы анализа данных в маркетинговых и...
Александр Москвичев, EveResearch - Алгоритмы анализа данных в маркетинговых и...
 
Петр Ермаков, HeadHunter - Модерация резюме: от людей к роботам. Машинное обу...
Петр Ермаков, HeadHunter - Модерация резюме: от людей к роботам. Машинное обу...Петр Ермаков, HeadHunter - Модерация резюме: от людей к роботам. Машинное обу...
Петр Ермаков, HeadHunter - Модерация резюме: от людей к роботам. Машинное обу...
 
Иосиф Иткин, Exactpro - TBA
Иосиф Иткин, Exactpro - TBAИосиф Иткин, Exactpro - TBA
Иосиф Иткин, Exactpro - TBA
 
Nikolay Karpov - Evolvable Semantic Platform for Facilitating Knowledge Exchange
Nikolay Karpov - Evolvable Semantic Platform for Facilitating Knowledge ExchangeNikolay Karpov - Evolvable Semantic Platform for Facilitating Knowledge Exchange
Nikolay Karpov - Evolvable Semantic Platform for Facilitating Knowledge Exchange
 
George Moiseev - Classification of E-commerce Websites by Product Categories
George Moiseev - Classification of E-commerce Websites by Product CategoriesGeorge Moiseev - Classification of E-commerce Websites by Product Categories
George Moiseev - Classification of E-commerce Websites by Product Categories
 
Elena Bruches - The Hybrid Approach to Part-of-Speech Disambiguation
Elena Bruches - The Hybrid Approach to Part-of-Speech DisambiguationElena Bruches - The Hybrid Approach to Part-of-Speech Disambiguation
Elena Bruches - The Hybrid Approach to Part-of-Speech Disambiguation
 
Marina Danshina - The methodology of automated decryption of znamenny chants
Marina Danshina - The methodology of automated decryption of znamenny chantsMarina Danshina - The methodology of automated decryption of znamenny chants
Marina Danshina - The methodology of automated decryption of znamenny chants
 
Edward Klyshinsky - The Corpus of Syntactic Co-occurences: the First Glance
Edward Klyshinsky - The Corpus of Syntactic Co-occurences: the First GlanceEdward Klyshinsky - The Corpus of Syntactic Co-occurences: the First Glance
Edward Klyshinsky - The Corpus of Syntactic Co-occurences: the First Glance
 
Galina Lavrentyeva - Anti-spoofing Methods for Automatic Speaker Verification...
Galina Lavrentyeva - Anti-spoofing Methods for Automatic Speaker Verification...Galina Lavrentyeva - Anti-spoofing Methods for Automatic Speaker Verification...
Galina Lavrentyeva - Anti-spoofing Methods for Automatic Speaker Verification...
 
Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...
Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...
Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...
 
Kaytoue Mehdi - Finding duplicate labels in behavioral data: an application f...
Kaytoue Mehdi - Finding duplicate labels in behavioral data: an application f...Kaytoue Mehdi - Finding duplicate labels in behavioral data: an application f...
Kaytoue Mehdi - Finding duplicate labels in behavioral data: an application f...
 
Valeri Labunets - The bichromatic excitable Schrodinger metamedium
Valeri Labunets - The bichromatic excitable Schrodinger metamediumValeri Labunets - The bichromatic excitable Schrodinger metamedium
Valeri Labunets - The bichromatic excitable Schrodinger metamedium
 
Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...
Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...
Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...
 
Alexander Karkishchenko - Threefold Symmetry Detection in Hexagonal Images Ba...
Alexander Karkishchenko - Threefold Symmetry Detection in Hexagonal Images Ba...Alexander Karkishchenko - Threefold Symmetry Detection in Hexagonal Images Ba...
Alexander Karkishchenko - Threefold Symmetry Detection in Hexagonal Images Ba...
 
Artyom Makovetskii - An Efficient Algorithm for Total Variation Denoising
Artyom Makovetskii - An Efficient Algorithm for Total Variation DenoisingArtyom Makovetskii - An Efficient Algorithm for Total Variation Denoising
Artyom Makovetskii - An Efficient Algorithm for Total Variation Denoising
 

Alexander Panchenko - Human and Machine Judgements about Russian Semantic Relatedness

  • 1. Human and Machine Judgements about Russian Semantic Relatedness
      • A. Panchenko, D. Ustalov, D. Paperno, C. Meyer, N. Konstantinova, N. Loukachevitch, Ch. Biemann
  • 2. Motivation
      • A semantic similarity measure is a specific kind of similarity measure for nouns or multiword expressions. It yields:
          – high values for synonyms, hyponyms, free associations, etc.
          – low values for unrelated pairs
      • Applications: information retrieval, document clustering, topic detection, question answering, word sense disambiguation, text summarization…
      • Most datasets and approaches were proposed for English
      • RUSSE 2015: the First International Workshop on Russian Semantic Similarity Evaluation
          – 19 participants, 105 runs, special session at the Dialog-2015 conference
  • 3. Russian Datasets for Measuring Word Semantic Similarity
      • Human Judgement dataset (HJ dataset)
          – word pairs with human judgements
      • Russian Thesaurus dataset (RT dataset)
          – synonyms and hypernyms from the RuThes thesaurus
      • Associative Thesaurus dataset (AE dataset)
          – cognitive associations between words
      • Machine Judgements
          – combination of submissions from a shared task on Russian semantic similarity
      • Russian Distributional Thesaurus
  • 4. Human judgements about semantic similarity (HJ)
      • This is the standard way to assess a semantic similarity measure.
      • The HJ dataset contains word pairs translated from widely used English benchmarks:
          – Miller-Charles set: 30 word pairs
          – Rubenstein-Goodenough set: 65 word pairs
          – WordSim: 353 word pairs, additionally subdivided into a similarity set and a relatedness set
      • Evaluation: Spearman's rank correlation between the measure's scores and the human judgements, i.e. agreement in the ordering of word pairs
  • 6. Example of human judgements about semantic similarity (HJ)
  • 7. RuThes Linguistic Ontology
      • http://www.labinform.ru/pub/ruthes/index.htm
      • 96 thousand unique words and expressions
          – Synonyms
          – Conceptual relations: class-subclass, part-whole, conceptual dependence
      • The dataset contains 114 066 relations for 6 832 nouns.
      • Half of these relations are synonyms and hypernyms from the RuThes-lite thesaurus; the other half are pairs of unrelated words.
  • 9. Structure of the semantic relation classification (RT, AE) benchmarks
  • 10. RUSSE: Best models according to the HJ benchmark
  • 11. MJ: Machine Judgements of Word Pairs from the RUSSE Shared Task
      • This dataset contains 12 886 word pairs taken from the HJ, RT, and AE datasets.
      • Each pair has a continuous relatedness score.
      • To estimate these scores, we averaged the 105 submissions to RUSSE, the shared task on Russian semantic similarity.
      • Each run consisted of the 12 886 word pairs along with their similarity scores.
  • 12. Gathering Machine Judgements
      • Select the single best submission of each of the 19 participating teams for the HJ, RT and AE datasets.
      • Rank the 19 best submissions: the best one has rank r1 = 19, the worst has rank r19 = 1.
      • Combine the scores of these 19 best submissions:
          – the score of a pair is the sum of the run scores, each multiplied by its run weight
          – run weight: rank, exponent of rank, or square root of rank
      • The combined approach is better than any single submission.
  • 13. Machine Judgements: Example
      word1,word2,sim,wmean
      препарат,вещество,1.0,0.484418
      препарат,лекарство,1.0,0.634770
      препарат,перестройка,0.0,0.157699
      препарат,барселона,0.0,0.105411
      инспекция,проверка,1.0,0.532748
      инспекция,гол,0.0,0.107823
      латы,меч,1.0,0.428076
      латы,щит,1.0,0.441120
      латы,рыцарь,1.0,0.453718
      латы,броня,1.0,0.414047
      латы,доспехи,1.0,0.543852
  • 14. DT: Open Russian Distributional Thesaurus
      • Skip-gram model (Mikolov et al., 2013)
      • Trained on a 12.9 billion word collection of books in Russian
          – Minimum word frequency: 5
          – Number of dimensions in a word vector: 500
          – Context window size: 10 words
      • For each of the 932,000 most frequent words, the 250 nearest neighbours by cosine similarity between word vectors are calculated.
      • These related words were lemmatized using PyMorphy2.
  • 15.
  • 16. Conclusion
      • We presented new Russian resources for evaluating semantic relatedness measures:
          – Russian HJ datasets: Miller-Charles, Rubenstein-Goodenough, WordSim-353
          – RuThes dataset and human associations dataset
          – Machine Judgements dataset and Distributional Thesaurus
      • The resources can be obtained from http://panchenko.me/rsr/
      • Semantic similarity and relatedness are useful in many NLP and information retrieval applications.
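
The two computational steps described in the slides, combining ranked runs into machine judgements and scoring a measure with Spearman's rank correlation, can be sketched as follows. This is a minimal illustration and not the authors' code. The function names, the toy word pairs and the toy scores are hypothetical. Two points are assumptions: the weighted sum is divided by the total weight (the column name `wmean` in the example suggests a weighted mean, but the slides do not say so), and "exponent of rank" is read as e raised to the rank.

```python
import math

# Hypothetical weighting schemes, named after the three options on the slide.
# Reading "exponent of rank" as math.exp(rank) is an assumption.
WEIGHTS = {
    "rank": lambda r: float(r),
    "sqrt": lambda r: math.sqrt(r),
    "exp": lambda r: math.exp(r),
}


def combine_runs(runs, scheme="rank"):
    """Combine runs into one score per word pair.

    runs: list of dicts mapping (word1, word2) -> score, ordered best first.
    The best run gets rank len(runs), the worst gets rank 1.
    Normalising by the total weight is an assumption, not stated in the slides.
    """
    n = len(runs)
    weights = [WEIGHTS[scheme](n - i) for i in range(n)]
    total = sum(weights)
    return {
        pair: sum(w * run[pair] for w, run in zip(weights, runs)) / total
        for pair in runs[0]
    }


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, made 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy data only: two made-up runs over three placeholder pairs.
    pairs = [("w1", "w2"), ("w1", "w3"), ("w1", "w4")]
    best = dict(zip(pairs, [0.9, 0.4, 0.1]))
    worst = dict(zip(pairs, [0.5, 0.6, 0.2]))
    combined = combine_runs([best, worst], scheme="rank")
    human = [1.0, 0.5, 0.0]  # made-up gold judgements
    print(combined)
    print(spearman([combined[p] for p in pairs], human))
```

Averaging the ranks of tied values matters for a dataset such as RT or AE, where the gold `sim` column takes only the values 0.0 and 1.0, so many pairs share the same gold score.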