Elizaveta Kuzmenko - Morphological Analysis for Russian: Integration and Comparison of Taggers


AIST


The two problems
○ Russian is a highly inflective, morphologically rich language.
○ There are many morphological analyzers for Russian.
○ Which one is better?
○ Taggers make errors in different cases: when one analyzer fails, another may guess the correct tag.
○ Do the errors of different taggers overlap?
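The overlap question can be made concrete by collecting the token positions where each tagger disagrees with the gold standard and comparing those sets. This is a sketch on hypothetical, already token-aligned outputs that share one tag scheme; Jaccard similarity of the two error sets is just one possible overlap measure, since the slides do not say which one was used.

```python
def error_positions(predicted: list[str], gold: list[str]) -> set[int]:
    """Return the indices of tokens where the tagger disagrees with the gold tag."""
    if len(predicted) != len(gold):
        raise ValueError("outputs must be aligned token by token")
    return {i for i, (p, g) in enumerate(zip(predicted, gold)) if p != g}


def error_overlap(errors_a: set[int], errors_b: set[int]) -> float:
    """Jaccard overlap of two error sets: 1.0 = identical errors, 0.0 = disjoint."""
    union = errors_a | errors_b
    return len(errors_a & errors_b) / len(union) if union else 0.0


# Hypothetical POS tags for a five-token sentence (RNC-style labels).
gold = ["S", "V", "A", "S", "PR"]
tagger_a = ["S", "V", "S", "S", "CONJ"]  # wrong at tokens 2 and 4
tagger_b = ["S", "A", "A", "S", "CONJ"]  # wrong at tokens 1 and 4

errs_a = error_positions(tagger_a, gold)
errs_b = error_positions(tagger_b, gold)
print(sorted(errs_a & errs_b))         # tokens that both taggers get wrong
print(error_overlap(errs_a, errs_b))   # share of all errors made by both
```

Tokens outside the intersection are the ones where at least one tagger is right, which is the room a combined tagger could exploit.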

Features of morphological processing of Russian
○ No common standard for part-of-speech annotation.
○ A variety of solutions:
● positional tags following the MULTEXT-East guidelines, e.g.
скалолазание ('rock climbing') NCNSAI0000
● combinations of grammemes, as used in the Russian National Corpus (RNC), e.g.
<ana lex="год" gr="S,m,inan=sg,gen"/> (год 'year': noun, masculine, inanimate; singular, genitive)
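In the RNC notation the gr attribute can be read as a comma-separated list of grammemes, with word-level (lexical) features before the `=` and word-form (inflectional) features after it. A minimal sketch of splitting such an element, assuming exactly one `=` and no ambiguity markers in the attribute; the function name `parse_rnc_ana` is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_rnc_ana(ana_xml: str) -> dict:
    """Split an RNC-style <ana> element into lemma, POS and grammeme sets.

    Assumption: gr looks like "POS,lexical,...=inflectional,...".
    """
    ana = ET.fromstring(ana_xml)
    lexical, _, inflectional = ana.get("gr", "").partition("=")
    lex_tags = [t for t in lexical.split(",") if t]
    infl_tags = [t for t in inflectional.split(",") if t]
    return {
        "lemma": ana.get("lex"),
        "pos": lex_tags[0] if lex_tags else None,  # first lexical grammeme is the POS
        "lexical": set(lex_tags[1:]),              # e.g. gender, animacy
        "inflectional": set(infl_tags),            # e.g. number, case
    }


print(parse_rnc_ana('<ana lex="год" gr="S,m,inan=sg,gen"/>'))
```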

Taggers
○ Pymorphy2: based on OpenCorpora dictionaries; predicts analyses for unknown words.
○ Freeling: tokenization, sentence splitting, morphological analysis with disambiguation, syntactic parsing, named entity recognition, etc.
○ TreeTagger: based on decision trees; must be trained on a lexicon and a manually tagged training corpus.
○ MyStem: disambiguation; hypotheses for both known and unknown words.

Experimental design
We evaluated the taggers against the disambiguated subcorpus of the RNC in three modes:
○ Identifying the lemma
○ Identifying the POS
○ Predicting the full tag
The main problem: establishing correspondences between annotation schemes.
○ Absent categories: animacy is present in MyStem but absent in TreeTagger.
○ POS standards: participles are treated either as a separate part of speech or as verb forms.
○ Lemmatization: participles are lemmatized either as adjectives or as verbs.
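The three modes differ only in which part of an analysis is compared with the gold standard. A sketch, assuming analyses have already been mapped to a common scheme and that a token the tagger could not analyze is represented by `None`. The precision and recall definitions in the docstring are assumptions, since the slides do not define them; the `Analysis` class and the example tags are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Analysis:
    lemma: str
    pos: str
    grammemes: frozenset  # full tag minus the POS, e.g. frozenset({"sg", "gen"})


MODES = ("lemma", "pos", "full")


def evaluate(predicted: list[Optional[Analysis]], gold: list[Analysis], mode: str) -> tuple[float, float]:
    """Return (precision, recall) of one tagger against the gold analyses.

    Assumed definitions (not stated on the slides):
      precision = correct analyses / tokens the tagger analyzed
      recall    = tokens the tagger analyzed / all gold tokens
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must be aligned token by token")

    def key(a: Analysis):
        if mode == "lemma":
            return a.lemma
        if mode == "pos":
            return a.pos
        return (a.pos, a.grammemes)  # "full": POS plus every grammeme

    analyzed = [(p, g) for p, g in zip(predicted, gold) if p is not None]
    correct = sum(key(p) == key(g) for p, g in analyzed)
    precision = correct / len(analyzed) if analyzed else 0.0
    recall = len(analyzed) / len(gold) if gold else 0.0
    return precision, recall


gold = [
    Analysis("год", "S", frozenset({"sg", "gen"})),
    Analysis("быть", "V", frozenset({"praet", "sg", "m"})),
    Analysis("новый", "A", frozenset({"sg", "nom", "m"})),
    Analysis("скалолазание", "S", frozenset({"sg", "nom"})),
]
pred = [
    Analysis("год", "S", frozenset({"sg", "gen"})),
    Analysis("быть", "V", frozenset({"praet", "sg", "f"})),  # wrong gender
    Analysis("новый", "A", frozenset({"sg", "nom", "m"})),
    None,  # unknown word left unanalyzed
]
for mode in MODES:
    print(mode, evaluate(pred, gold, mode))
```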

Conventions
Gold standard tag               Accepted tag
transitivity                    not important
animacy                         not important
dat2, gen2, loc2, acc2          dat, gen, loc, acc
adnum (count forms), A-NUM      NUM
ADV-PRO, A-PRO                  PRO
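These conventions amount to a normalization step applied before tags are compared. The sketch below encodes them, assuming RNC grammeme names (tran/intr for transitivity, anim/inan for animacy); `normalize` and `tags_match` are hypothetical names. The adnum row is left out because the slide does not say how that grammeme is mapped onto the NUM part of speech.

```python
POS_MAP = {"A-NUM": "NUM", "ADV-PRO": "PRO", "A-PRO": "PRO"}
CASE_MAP = {"dat2": "dat", "gen2": "gen", "loc2": "loc", "acc2": "acc"}
IGNORED = {"tran", "intr", "anim", "inan"}  # transitivity and animacy are not scored


def normalize(pos: str, grammemes: set[str]) -> tuple[str, frozenset[str]]:
    """Map a (POS, grammemes) tag onto the accepted, coarser scheme."""
    pos = POS_MAP.get(pos, pos)
    kept = frozenset(CASE_MAP.get(g, g) for g in grammemes if g not in IGNORED)
    return pos, kept


def tags_match(gold: tuple[str, set[str]], predicted: tuple[str, set[str]]) -> bool:
    """Compare two tags after normalizing both sides."""
    return normalize(*gold) == normalize(*predicted)


print(normalize("S", {"m", "inan", "sg", "loc2"}))
print(tags_match(("A-PRO", {"m", "sg", "nom"}), ("PRO", {"m", "sg", "nom"})))
```

Normalizing both sides keeps the comparison symmetric: a tagger that already outputs PRO or plain dat is unaffected by the mapping.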

Evaluation
Tagger       Mode       Precision (%)   Recall (%)   F1
Freeling     lemma      81.98           100          0.902
             POS        91.19           98.76        0.948
             full tag   81.31           100          0.897
Pymorphy     lemma      87.8            100          0.950
             POS        90.5            99.3         0.947
             full tag   59.2            100          0.743
TreeTagger   lemma      97              93           0.950
             POS        95              97           0.960
             full tag   91              98           0.944
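The F1 column matches the usual harmonic mean F1 = 2PR / (P + R) of the precision and recall columns. Recomputing it from the table agrees to within about 0.001 for eight of the nine rows; for the Pymorphy lemma row, 87.8% precision and 100% recall give about 0.935 rather than 0.950, so one of those three figures may be a typo. A small sketch of that check (the `TABLE` dict simply copies the numbers above):

```python
# (precision %, recall %, reported F1), copied from the Evaluation table
TABLE = {
    ("Freeling", "lemma"): (81.98, 100, 0.902),
    ("Freeling", "POS"): (91.19, 98.76, 0.948),
    ("Freeling", "full tag"): (81.31, 100, 0.897),
    ("Pymorphy", "lemma"): (87.8, 100, 0.950),
    ("Pymorphy", "POS"): (90.5, 99.3, 0.947),
    ("Pymorphy", "full tag"): (59.2, 100, 0.743),
    ("TreeTagger", "lemma"): (97, 93, 0.950),
    ("TreeTagger", "POS"): (95, 97, 0.960),
    ("TreeTagger", "full tag"): (91, 98, 0.944),
}


def f1(precision_pct: float, recall_pct: float) -> float:
    """Harmonic mean of precision and recall given in percent, on a 0-1 scale."""
    p, r = precision_pct / 100, recall_pct / 100
    return 2 * p * r / (p + r) if p + r else 0.0


def mismatches(table: dict, tol: float = 0.0015) -> list:
    """Rows whose reported F1 differs from the recomputed one by more than tol."""
    return [row for row, (p, r, reported) in table.items() if abs(f1(p, r) - reported) > tol]


for (tagger, mode), (p, r, reported) in TABLE.items():
    print(f"{tagger:<10} {mode:<8} reported {reported:.3f}  recomputed {f1(p, r):.3f}")
print(mismatches(TABLE))
```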

Conclusion
○ The taggers perform differently depending on the testing mode.
○ TreeTagger has the highest precision in every mode, with decent recall.
○ The big goal: build an improved tagger for Russian that combines the strengths of the other taggers.
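The slides do not say how the taggers would be combined. One simple option, shown purely as an illustration and not as the authors' method, is per-token majority voting over tags already mapped to a shared scheme, with ties broken in favour of a preferred tagger. The function name `combine` and the example outputs are hypothetical; putting TreeTagger first follows its precision in the table, while the order of the other two is arbitrary.

```python
from collections import Counter


def combine(outputs: dict[str, list[str]], priority: list[str]) -> list[str]:
    """Per-token majority vote; ties go to the tagger listed first in `priority`."""
    if set(priority) != set(outputs):
        raise ValueError("priority must name every tagger")
    lengths = {len(tags) for tags in outputs.values()}
    if len(lengths) > 1:
        raise ValueError("all taggers must be aligned token by token")
    n_tokens = lengths.pop() if lengths else 0

    combined = []
    for i in range(n_tokens):
        votes = Counter(outputs[name][i] for name in outputs)
        top = max(votes.values())
        tied = {tag for tag, count in votes.items() if count == top}
        # Among the most frequent tags, take the one proposed by the highest-priority tagger.
        winner = next(outputs[name][i] for name in priority if outputs[name][i] in tied)
        combined.append(winner)
    return combined


outputs = {
    "TreeTagger": ["S", "V", "A"],
    "Freeling": ["S", "A", "S"],
    "Pymorphy": ["S", "A", "A"],
}
print(combine(outputs, ["TreeTagger", "Freeling", "Pymorphy"]))
```

Voting only pays off when the taggers' errors do not overlap much, which is exactly the second question raised at the start.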

Authors: E. Kuzmenko, E. Mustakimova, A. Blazhievskaya, T. Arkhangelskiy, S. Toldova (National Research University “Higher School of Economics”)
