Elizaveta Kuzmenko - Morphological Analysis for Russian: Integration and Comparison of Taggers
1. Morphological analysis
for Russian:
integration and comparison of taggers
E. Kuzmenko, E. Mustakimova, A. Blazhievskaya
T. Arkhangelskiy, S. Toldova
National Research University “Higher School of Economics”
2. The two problems
○ Russian is a highly inflected, morphologically rich
language.
○ There are many morphological analyzers for Russian.
○ Which one is better?
○ Taggers make errors in different places: when one analyzer
fails, another may guess the correct tag.
○ Do the cases where taggers make errors overlap or not?
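One way to make the overlap question concrete is to compare, token by token, where each tagger disagrees with a gold standard. The sketch below is a minimal illustration only: the function names and the toy tag sequences are hypothetical and are not taken from the study.

```python
def error_positions(predicted, gold):
    """Return the set of token indices where a tagger disagrees with the gold tags."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must be aligned token by token")
    return {i for i, (p, g) in enumerate(zip(predicted, gold)) if p != g}


def error_overlap(errors_a, errors_b):
    """Jaccard overlap of two error sets: 0.0 = disjoint errors, 1.0 = identical errors."""
    union = errors_a | errors_b
    if not union:
        return 0.0
    return len(errors_a & errors_b) / len(union)


# Toy data, invented for illustration only.
gold     = ["S", "V", "A", "S", "PR", "S"]
tagger_a = ["S", "V", "S", "S", "PR", "A"]   # wrong at positions 2 and 5
tagger_b = ["S", "A", "A", "S", "PR", "A"]   # wrong at positions 1 and 5

errors_a = error_positions(tagger_a, gold)
errors_b = error_positions(tagger_b, gold)
shared = errors_a & errors_b                 # tokens both taggers get wrong
rescuable = (errors_a | errors_b) - shared   # tokens at least one tagger gets right
```

A low overlap with a large `rescuable` set would suggest that combining taggers can help; a high overlap would suggest the taggers fail on the same hard cases.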
3. Features of morphological
processing of Russian
○ No standard for part-of-speech annotation;
○ Variety of solutions:
● positional tags following the MULTEXT-East
guidelines
скалолазание NCNSAI0000
● combinations of tags employed in the Russian
National Corpus
<ana lex="год" gr="S,m,inan=sg,gen"/>
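To show what a combined tag of this kind contains, here is a minimal sketch that splits the `gr` attribute of the `<ana>` element above into its parts. It assumes that the first item is the part of speech and that `=` separates lexeme-level features from form-level ones; `parse_ana` and the dictionary keys are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET


def parse_ana(element_text):
    """Split an RNC-style <ana> element into lemma, POS and feature lists."""
    attrs = ET.fromstring(element_text).attrib
    # Assumption: features before '=' belong to the lexeme, features after it to the word form.
    lexical, _, inflectional = attrs["gr"].partition("=")
    lexical_tags = [t for t in lexical.split(",") if t]
    inflectional_tags = [t for t in inflectional.split(",") if t]
    return {
        "lemma": attrs["lex"],
        "pos": lexical_tags[0] if lexical_tags else None,
        "lexical": lexical_tags[1:],
        "inflectional": inflectional_tags,
    }


analysis = parse_ana('<ana lex="год" gr="S,m,inan=sg,gen"/>')
```

Under these assumptions `analysis` holds the lemma `год`, the part of speech `S`, the lexeme-level features `m` and `inan`, and the form-level features `sg` and `gen`.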
4. Taggers
○ Pymorphy2: based on OpenCorpora dictionaries, predictions
for unknown words.
○ Freeling: tokenization, sentence splitting, morphological
analysis with disambiguation, syntactic parsing, named entity
recognition, etc.
○ TreeTagger: based on decision trees; must be trained
on a lexicon and a manually tagged training corpus.
○ MyStem: disambiguation, hypotheses for both known and
unknown words.
5. Experimental design
We evaluated the taggers against the disambiguated part of the
Russian National Corpus (RNC).
Three modes:
○ Identifying lemma
○ Identifying POS
○ Predicting full tag
The main problem: establishing correspondences between
annotation schemes.
○ Absence of a category: animacy is present in MyStem, but
absent in TreeTagger.
○ POS standards: participles as a separate part of speech or
as verb forms.
○ Lemmatization: participles as adjectives or verbs.
6. Conventions
Gold standard tag             Accepted tag
transitivity                  not important
animacy                       not important
dat2, gen2, loc2, acc2        dat, gen, loc, acc
adnum (count forms), A-NUM    NUM
ADV-PRO, A-PRO                PRO
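The conventions above can be sketched as a normalization step applied before two tags are compared. This is an illustrative sketch, not the authors' implementation: the grammeme names `tran`, `intr`, `anim` and `inan` are assumed to be how transitivity and animacy appear in the tags, `adnum` is mapped to `NUM` literally as the table states, and `normalize` and `tags_match` are hypothetical helper names.

```python
import re

# Assumption: transitivity and animacy are encoded with these grammemes.
IGNORED = {"tran", "intr", "anim", "inan"}

# Gold standard tag -> accepted tag, following the conventions table.
REPLACEMENTS = {
    "dat2": "dat", "gen2": "gen", "loc2": "loc", "acc2": "acc",
    "adnum": "NUM", "A-NUM": "NUM",
    "ADV-PRO": "PRO", "A-PRO": "PRO",
}


def normalize(gr):
    """Turn a tag string such as 'S,m,inan=sg,gen2' into a comparable set of grammemes."""
    tags = re.split(r"[,=]", gr)
    return frozenset(REPLACEMENTS.get(t, t) for t in tags if t and t not in IGNORED)


def tags_match(gold_gr, predicted_gr):
    """Compare two tag strings after applying the conventions to both sides."""
    return normalize(gold_gr) == normalize(predicted_gr)


same = tags_match("S,m,inan=sg,gen2", "S,m=sg,gen")    # differ only in ignored or merged tags
different = tags_match("S,m,inan=sg,gen", "S,m=pl,gen")  # number differs
```

Normalizing both the gold and the predicted tag keeps the comparison symmetric, so a tagger is neither rewarded nor penalized for producing a category the conventions declare unimportant.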
7. Evaluation
Tagger        Mode        Precision (%)    Recall (%)    F1
Freeling      lemma       81.98            100           0.902
Freeling      POS         91.19            98.76         0.948
Freeling      Full tag    81.31            100           0.897
Pymorphy      lemma       87.8             100           0.950
Pymorphy      POS         90.5             99.3          0.947
Pymorphy      Full tag    59.2             100           0.743
TreeTagger    lemma       97               93            0.950
TreeTagger    POS         95               97            0.960
TreeTagger    Full tag    91               98            0.944
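The F1 column is the harmonic mean of precision and recall. A minimal sketch of that calculation, assuming the percentages in the table are converted to fractions first (`f1_score` is a hypothetical helper name):

```python
def f1_score(precision_pct, recall_pct):
    """Harmonic mean of precision and recall, both given as percentages."""
    p = precision_pct / 100.0
    r = recall_pct / 100.0
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)


treetagger_pos = round(f1_score(95, 97), 3)       # TreeTagger, POS mode
freeling_pos = round(f1_score(91.19, 98.76), 3)   # Freeling, POS mode
```

For these two rows the calculation gives 0.960 and 0.948, matching the values reported in the table.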
8. Conclusion
○ The taggers perform differently depending on the testing
mode.
○ TreeTagger shows the best precision in all three modes and
decent recall.
○ The big goal: build an improved tagger for Russian that
combines the strengths of the existing taggers.