SlideShare a Scribd company logo
1 of 14
ANALYSIS OF IMAGES, SOCIAL NETWORKS,
AND TEXTS
Single-sentence Readability Prediction in Russian
Nikolay Karpov, Julia Baranova, Fedor Vitugin
National Research University Higher School of Economics
Ekaterinburg, Russia
Structure

Motivation

Text readability prediction

Single-sentence readability prediction

Single-sentence readability prediction with syntactic features

Conclusion
Motivation
We present a part of a project which aim is to develop a system with
a simplification functionality.

It should be a system for a text adaptation to a target level
readability in Russian language as a foreign language (RFL).

We were solving the identification problem of a source level of
difficulty (readability) of the sentences or texts.

Further step will be their lexical and syntactic simplification.
In this study we give the results of application which identify the
level of difficulty of a single-sentence and whole text using
different statistical and syntactic features
Text readability prediction
First task was to perform the prototyping of Russian text retrieval
with needed readability. The main goal of this process was to find
which kind of variables and classification algorithm would allow us to
obtain the highest indicators of precision and recall of readability
prediction.
There was conducted a series of experiments on the training of
different classification algorithms.

naive Bayes;

k-nearest neighbors;

classification tree;

random forests;

SVM.
We extract 25 variables from texts proposed in the
previous works
Average number of words in the sentence of the text.
Average length of one word in a sentence.
Text length in letters.
Text length in words.
Average sentence length in syllables.
Average word length in syllables.
Percentage of words with number of syllables more or equal to N. We define N as
each value from 3 to 6.
Average sentence length in letters.
Average length of words in letters.
Percentage of words with number of letters more or equal to N. We define N as
each value from 5 to 13.
The percentage of words in a sentence, not included in the active vocabulary of
A1 level.
The percentage of words in a sentence, not included in the active vocabulary of
A2 level.
The percentage of words in a sentence, not included in the active vocabulary of
B1 level.
The occurrence in the sentence of concrete parts of speech.
Text readability prediction
For evaluation we used collection consists of 219 texts divided into
four groups. Levels distribution is following: A1 (elementary – 52),
A2 (basic) – 57, B1 (first) – 60, C2 (difficult) – 50 according to levels
described in Common European Framework of Reference for
Languages (CEFR). A1, A2, B1 texts was created specially for
students by language teachers on the basis of news. С2 texts was an
original news in Russian language.
First experiment was a binary classification of readability:

A1 versus C2,

A2 versus C2,

B1 versus C2.
With the help of Classification Tree, SVM and Logistic Regression
algorithms the accuracy we got was really high, it was almost equal to
1.
Text classification into four levels of readability
Method Classification
accuracy
F-measure Precision Recall
SVM 0.8092 0.7965 0.8491 0.75
Classification
Tree
0.9905 0.9916 1 0.9833
kNN 0.8131 0.7333 0.7333 0.7333
Random
Forest
0.9818 0.9667 0.9667 0.9667
Naive Bayes 0.8726 0.7890 0.8776 0.7167
An example of precision and recall for text retrieval with B1 level of
readability
Classification variables ranked by information gain
ratio
Variable name Information gain ratio
The percentage of words in a sentence, are not
included in the active vocabulary of A1 level
0.105141
The percentage of words in a sentence, are not
included in the active vocabulary of A2 level
0.105141
The percentage of words in a sentence, are not
included in the active vocabulary of B1 level
0.084211
Percentage of words with 8 letters or more 0.040098
Percentage of words with 9 letters or more 0.038431
Percentage of words with 7 letters or more 0.036923
Average sentence length in syllables 0.034359
The average length of one word in a text 0.034359
Percentage of words with 10 letters or more 0.033689
Single-sentence redability prediction

Prototyping sentence classification with respect to its readability.
For result evaluation we use corpus SunTagRus.

Level B1 suits to the majority of our students. So, we created a
binary sentence markup, which is:
1) B1 or lower than B1;
2) Higher than B1.

Manually tagged 3500 sentences in this corpus to mark their
structural level of perception complexity.

Lexical readability for each sentence we obtain on the basis of
lexical vocabulary of B1 level. Defined sentences having more
than 33% words not in active vocabulary as lexically difficult
ones.
Thus, we have two kinds of markup: structural complexity and
lexical difficulty. As an intersection we obtained a total level of
Results of total readability prediction using all kinds
of variables and syntactic links
Method Classification
accuracy
F-measure
(difficult
/simple)
Precision
(difficult
/simple)
Recall (difficult
/simple)
Naive Bayes 0.8191 0.8906/
0.4767
0.8354/
0.6975
0.9537/
0.3621
kNN 0.8224 0.8893/
0.5501
0.8571/
0.6493
0.9241/
0.4772
Random
Forest
0.9443 0.9640/
0.8768
0.9620/
0.8832
0.9661/
0.8705
Classification
Tree
0.9364 0.9584/
0.8648
0.9679/
0.8380
0.9491/
0.8933
SVM 0.8633 0.9125/
0.6875
0.9679/
0.7165
0.9491/
0.6607
Recall value of complex sentences retrieval using
different set of features
Dale-Chall
Flesch-Kincaid
Syntactic links for structural complexity
Total set
0.75
0.8
0.85
0.9
0.95
1
kNN Logistic regression Random Forest Classification Tree
Classification variables of sentenses ranked by
information gain ratio
Variable name Information gain ratio
The percentage of words in a sentence, are not included in
the active vocabulary of B1 level
0.318
Sentence length in letters 0.122
Percentage of words with 3 syllable and more 0.119
Sentence length in syllables 0.118
Sentence length in words 0.098
Syntactic predicative link 0.095
Average words length in syllables 0.092
The average length of one word in a text 0.092
Percentage of words with 7 letters or more 0.069
Percentage of words with 5 letters or more 0.069
Top 10 of classification variables
Conclusion

For text readability prediction obtained results reached 99-98%, so
we can say that they met our needs.

We adapted features from traditional readability prediction
techniques to identify lexical and structural complexity of single-
sentences in Russian.

We tested the readability prediction of Russian sentences using
syntactic links.

Single-sentence readability prediction algorithm was tested on the
set of sentences from SynTagRus, where readability was manually
marked (a binary classification).

Total set of features with statistical, lexical and syntactial ones can
predict sentence readability with 0.9661 amount of recall using
Random Forest algorithm.

Most important features for this classification are lexical ones.
Thank you for your attention
nkarpov@hse.ru

More Related Content

What's hot

Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...Deren Lei
 
Cross lingual similarity discrimination with translation characteristics
Cross lingual similarity discrimination with translation characteristicsCross lingual similarity discrimination with translation characteristics
Cross lingual similarity discrimination with translation characteristicsijaia
 
GDG Tbilisi 2017. Word Embedding Libraries Overview: Word2Vec and fastText
GDG Tbilisi 2017. Word Embedding Libraries Overview: Word2Vec and fastTextGDG Tbilisi 2017. Word Embedding Libraries Overview: Word2Vec and fastText
GDG Tbilisi 2017. Word Embedding Libraries Overview: Word2Vec and fastTextrudolf eremyan
 
Adversarial examples reading comprehension system
Adversarial examples reading comprehension systemAdversarial examples reading comprehension system
Adversarial examples reading comprehension systemMasa Kato
 
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATIONAN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATIONijnlc
 
A neural probabilistic language model
A neural probabilistic language modelA neural probabilistic language model
A neural probabilistic language modelc sharada
 
Usage of regular expressions in nlp
Usage of regular expressions in nlpUsage of regular expressions in nlp
Usage of regular expressions in nlpeSAT Journals
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language ProcessingJenny Midwinter
 
Intent Classifier with Facebook fastText
Intent Classifier with Facebook fastTextIntent Classifier with Facebook fastText
Intent Classifier with Facebook fastTextBayu Aldi Yansyah
 
graduate_thesis (1)
graduate_thesis (1)graduate_thesis (1)
graduate_thesis (1)Sihan Chen
 
Effective Approach for Disambiguating Chinese Polyphonic Ambiguity
Effective Approach for Disambiguating Chinese Polyphonic AmbiguityEffective Approach for Disambiguating Chinese Polyphonic Ambiguity
Effective Approach for Disambiguating Chinese Polyphonic AmbiguityIDES Editor
 

What's hot (13)

Ceis 3
Ceis 3Ceis 3
Ceis 3
 
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
 
Cross lingual similarity discrimination with translation characteristics
Cross lingual similarity discrimination with translation characteristicsCross lingual similarity discrimination with translation characteristics
Cross lingual similarity discrimination with translation characteristics
 
GDG Tbilisi 2017. Word Embedding Libraries Overview: Word2Vec and fastText
GDG Tbilisi 2017. Word Embedding Libraries Overview: Word2Vec and fastTextGDG Tbilisi 2017. Word Embedding Libraries Overview: Word2Vec and fastText
GDG Tbilisi 2017. Word Embedding Libraries Overview: Word2Vec and fastText
 
Adversarial examples reading comprehension system
Adversarial examples reading comprehension systemAdversarial examples reading comprehension system
Adversarial examples reading comprehension system
 
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATIONAN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
 
A neural probabilistic language model
A neural probabilistic language modelA neural probabilistic language model
A neural probabilistic language model
 
Usage of regular expressions in nlp
Usage of regular expressions in nlpUsage of regular expressions in nlp
Usage of regular expressions in nlp
 
Usage of regular expressions in nlp
Usage of regular expressions in nlpUsage of regular expressions in nlp
Usage of regular expressions in nlp
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
Intent Classifier with Facebook fastText
Intent Classifier with Facebook fastTextIntent Classifier with Facebook fastText
Intent Classifier with Facebook fastText
 
graduate_thesis (1)
graduate_thesis (1)graduate_thesis (1)
graduate_thesis (1)
 
Effective Approach for Disambiguating Chinese Polyphonic Ambiguity
Effective Approach for Disambiguating Chinese Polyphonic AmbiguityEffective Approach for Disambiguating Chinese Polyphonic Ambiguity
Effective Approach for Disambiguating Chinese Polyphonic Ambiguity
 

Viewers also liked

Konstantion Vorontsov - Additive regularization of matrix decompositons and p...
Konstantion Vorontsov - Additive regularization of matrix decompositons and p...Konstantion Vorontsov - Additive regularization of matrix decompositons and p...
Konstantion Vorontsov - Additive regularization of matrix decompositons and p...AIST
 
Борис Парфененков - Сравнение методов оценки качества изображений
Борис Парфененков - Сравнение методов оценки качества изображенийБорис Парфененков - Сравнение методов оценки качества изображений
Борис Парфененков - Сравнение методов оценки качества изображенийAIST
 
Bulat Fatkulin - The Afghanistan chapter of the chinese online encyclopedia b...
Bulat Fatkulin - The Afghanistan chapter of the chinese online encyclopedia b...Bulat Fatkulin - The Afghanistan chapter of the chinese online encyclopedia b...
Bulat Fatkulin - The Afghanistan chapter of the chinese online encyclopedia b...AIST
 
Nicolay Lyfenko - Conceptual scheme for text classification system
Nicolay Lyfenko - Conceptual scheme for text classification systemNicolay Lyfenko - Conceptual scheme for text classification system
Nicolay Lyfenko - Conceptual scheme for text classification systemAIST
 
Trends and challanges for IT in Knowledge Management
Trends and challanges for IT in Knowledge ManagementTrends and challanges for IT in Knowledge Management
Trends and challanges for IT in Knowledge ManagementYury Kupriyanov
 
Нургуль Маматова - Применение модели векторной авторегрессии для анализа потр...
Нургуль Маматова - Применение модели векторной авторегрессии для анализа потр...Нургуль Маматова - Применение модели векторной авторегрессии для анализа потр...
Нургуль Маматова - Применение модели векторной авторегрессии для анализа потр...AIST
 
Marina Danshina - Semiotic system of musical texts
Marina Danshina - Semiotic system of musical textsMarina Danshina - Semiotic system of musical texts
Marina Danshina - Semiotic system of musical textsAIST
 
Rita Gaibadullina - Automatic defect recognition in corrosion logging using m...
Rita Gaibadullina - Automatic defect recognition in corrosion logging using m...Rita Gaibadullina - Automatic defect recognition in corrosion logging using m...
Rita Gaibadullina - Automatic defect recognition in corrosion logging using m...AIST
 
Iosif Itkin - Network models for exchange trade analysis
Iosif Itkin - Network models for exchange trade analysisIosif Itkin - Network models for exchange trade analysis
Iosif Itkin - Network models for exchange trade analysisAIST
 
Nikita Trifonov - Zipf ’s law for live journal
Nikita Trifonov - Zipf ’s law for live journalNikita Trifonov - Zipf ’s law for live journal
Nikita Trifonov - Zipf ’s law for live journalAIST
 
Dmitriy Ignatov - AIST'2014 Opening
Dmitriy Ignatov - AIST'2014 OpeningDmitriy Ignatov - AIST'2014 Opening
Dmitriy Ignatov - AIST'2014 OpeningAIST
 
Rostislav Yavorskiy - AIST'2014 Closing Presentation
Rostislav Yavorskiy - AIST'2014 Closing PresentationRostislav Yavorskiy - AIST'2014 Closing Presentation
Rostislav Yavorskiy - AIST'2014 Closing PresentationAIST
 
Елена Малютина - Оценка параметров хаотического процесса с помощью Ukf-фильтр...
Елена Малютина - Оценка параметров хаотического процесса с помощью Ukf-фильтр...Елена Малютина - Оценка параметров хаотического процесса с помощью Ukf-фильтр...
Елена Малютина - Оценка параметров хаотического процесса с помощью Ukf-фильтр...AIST
 
Dialogue systems and personal assistants
Dialogue systems and personal assistantsDialogue systems and personal assistants
Dialogue systems and personal assistantsNatalia Konstantinova
 
Alexander Semenov - Recent Advances in Social Network Analysis
Alexander Semenov - Recent Advances in Social Network AnalysisAlexander Semenov - Recent Advances in Social Network Analysis
Alexander Semenov - Recent Advances in Social Network AnalysisAIST
 
Open Data and Data Journalism
Open Data and Data JournalismOpen Data and Data Journalism
Open Data and Data JournalismIrina Radchenko
 
Dmitriy Kolesov - GIS as an environment for integration and analysis of spati...
Dmitriy Kolesov - GIS as an environment for integration and analysis of spati...Dmitriy Kolesov - GIS as an environment for integration and analysis of spati...
Dmitriy Kolesov - GIS as an environment for integration and analysis of spati...AIST
 
Daniel Khachay - GPS navigation algorithm based on osm data
Daniel Khachay - GPS navigation algorithm based on osm dataDaniel Khachay - GPS navigation algorithm based on osm data
Daniel Khachay - GPS navigation algorithm based on osm dataAIST
 

Viewers also liked (18)

Konstantion Vorontsov - Additive regularization of matrix decompositons and p...
Konstantion Vorontsov - Additive regularization of matrix decompositons and p...Konstantion Vorontsov - Additive regularization of matrix decompositons and p...
Konstantion Vorontsov - Additive regularization of matrix decompositons and p...
 
Борис Парфененков - Сравнение методов оценки качества изображений
Борис Парфененков - Сравнение методов оценки качества изображенийБорис Парфененков - Сравнение методов оценки качества изображений
Борис Парфененков - Сравнение методов оценки качества изображений
 
Bulat Fatkulin - The Afghanistan chapter of the chinese online encyclopedia b...
Bulat Fatkulin - The Afghanistan chapter of the chinese online encyclopedia b...Bulat Fatkulin - The Afghanistan chapter of the chinese online encyclopedia b...
Bulat Fatkulin - The Afghanistan chapter of the chinese online encyclopedia b...
 
Nicolay Lyfenko - Conceptual scheme for text classification system
Nicolay Lyfenko - Conceptual scheme for text classification systemNicolay Lyfenko - Conceptual scheme for text classification system
Nicolay Lyfenko - Conceptual scheme for text classification system
 
Trends and challanges for IT in Knowledge Management
Trends and challanges for IT in Knowledge ManagementTrends and challanges for IT in Knowledge Management
Trends and challanges for IT in Knowledge Management
 
Нургуль Маматова - Применение модели векторной авторегрессии для анализа потр...
Нургуль Маматова - Применение модели векторной авторегрессии для анализа потр...Нургуль Маматова - Применение модели векторной авторегрессии для анализа потр...
Нургуль Маматова - Применение модели векторной авторегрессии для анализа потр...
 
Marina Danshina - Semiotic system of musical texts
Marina Danshina - Semiotic system of musical textsMarina Danshina - Semiotic system of musical texts
Marina Danshina - Semiotic system of musical texts
 
Rita Gaibadullina - Automatic defect recognition in corrosion logging using m...
Rita Gaibadullina - Automatic defect recognition in corrosion logging using m...Rita Gaibadullina - Automatic defect recognition in corrosion logging using m...
Rita Gaibadullina - Automatic defect recognition in corrosion logging using m...
 
Iosif Itkin - Network models for exchange trade analysis
Iosif Itkin - Network models for exchange trade analysisIosif Itkin - Network models for exchange trade analysis
Iosif Itkin - Network models for exchange trade analysis
 
Nikita Trifonov - Zipf ’s law for live journal
Nikita Trifonov - Zipf ’s law for live journalNikita Trifonov - Zipf ’s law for live journal
Nikita Trifonov - Zipf ’s law for live journal
 
Dmitriy Ignatov - AIST'2014 Opening
Dmitriy Ignatov - AIST'2014 OpeningDmitriy Ignatov - AIST'2014 Opening
Dmitriy Ignatov - AIST'2014 Opening
 
Rostislav Yavorskiy - AIST'2014 Closing Presentation
Rostislav Yavorskiy - AIST'2014 Closing PresentationRostislav Yavorskiy - AIST'2014 Closing Presentation
Rostislav Yavorskiy - AIST'2014 Closing Presentation
 
Елена Малютина - Оценка параметров хаотического процесса с помощью Ukf-фильтр...
Елена Малютина - Оценка параметров хаотического процесса с помощью Ukf-фильтр...Елена Малютина - Оценка параметров хаотического процесса с помощью Ukf-фильтр...
Елена Малютина - Оценка параметров хаотического процесса с помощью Ukf-фильтр...
 
Dialogue systems and personal assistants
Dialogue systems and personal assistantsDialogue systems and personal assistants
Dialogue systems and personal assistants
 
Alexander Semenov - Recent Advances in Social Network Analysis
Alexander Semenov - Recent Advances in Social Network AnalysisAlexander Semenov - Recent Advances in Social Network Analysis
Alexander Semenov - Recent Advances in Social Network Analysis
 
Open Data and Data Journalism
Open Data and Data JournalismOpen Data and Data Journalism
Open Data and Data Journalism
 
Dmitriy Kolesov - GIS as an environment for integration and analysis of spati...
Dmitriy Kolesov - GIS as an environment for integration and analysis of spati...Dmitriy Kolesov - GIS as an environment for integration and analysis of spati...
Dmitriy Kolesov - GIS as an environment for integration and analysis of spati...
 
Daniel Khachay - GPS navigation algorithm based on osm data
Daniel Khachay - GPS navigation algorithm based on osm dataDaniel Khachay - GPS navigation algorithm based on osm data
Daniel Khachay - GPS navigation algorithm based on osm data
 

Similar to Nikolay Karpov - Single-sentence readability prediction in russian

Neural machine translation of rare words with subword units
Neural machine translation of rare words with subword unitsNeural machine translation of rare words with subword units
Neural machine translation of rare words with subword unitsTae Hwan Jung
 
Statistically-Enhanced New Word Identification
Statistically-Enhanced New Word IdentificationStatistically-Enhanced New Word Identification
Statistically-Enhanced New Word IdentificationAndi Wu
 
Indexing of Arabic documents automatically based on lexical analysis
Indexing of Arabic documents automatically based on lexical analysisIndexing of Arabic documents automatically based on lexical analysis
Indexing of Arabic documents automatically based on lexical analysiskevig
 
Indexing of Arabic documents automatically based on lexical analysis
Indexing of Arabic documents automatically based on lexical analysis Indexing of Arabic documents automatically based on lexical analysis
Indexing of Arabic documents automatically based on lexical analysis kevig
 
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali TextChunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali Textkevig
 
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali TextChunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali Textkevig
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESkevig
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESkevig
 
Fast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural NetworksFast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural NetworksSDL
 
Word2vec on the italian language: first experiments
Word2vec on the italian language: first experimentsWord2vec on the italian language: first experiments
Word2vec on the italian language: first experimentsVincenzo Lomonaco
 
Challenges in transfer learning in nlp
Challenges in transfer learning in nlpChallenges in transfer learning in nlp
Challenges in transfer learning in nlpLaraOlmosCamarena
 
Phonetic Recognition In Words For Persian Text To Speech Systems
Phonetic Recognition In Words For Persian Text To Speech SystemsPhonetic Recognition In Words For Persian Text To Speech Systems
Phonetic Recognition In Words For Persian Text To Speech Systemspaperpublications3
 
Fasttext 20170720 yjy
Fasttext 20170720 yjyFasttext 20170720 yjy
Fasttext 20170720 yjy재연 윤
 
IRJET- Automatic Language Identification using Hybrid Approach and Classifica...
IRJET- Automatic Language Identification using Hybrid Approach and Classifica...IRJET- Automatic Language Identification using Hybrid Approach and Classifica...
IRJET- Automatic Language Identification using Hybrid Approach and Classifica...IRJET Journal
 
NLP Project: Paragraph Topic Classification
NLP Project: Paragraph Topic ClassificationNLP Project: Paragraph Topic Classification
NLP Project: Paragraph Topic ClassificationEugene Nho
 
Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...
Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...
Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...cscpconf
 

Similar to Nikolay Karpov - Single-sentence readability prediction in russian (20)

Neural machine translation of rare words with subword units
Neural machine translation of rare words with subword unitsNeural machine translation of rare words with subword units
Neural machine translation of rare words with subword units
 
AINL 2016: Eyecioglu
AINL 2016: EyeciogluAINL 2016: Eyecioglu
AINL 2016: Eyecioglu
 
Statistically-Enhanced New Word Identification
Statistically-Enhanced New Word IdentificationStatistically-Enhanced New Word Identification
Statistically-Enhanced New Word Identification
 
Indexing of Arabic documents automatically based on lexical analysis
Indexing of Arabic documents automatically based on lexical analysisIndexing of Arabic documents automatically based on lexical analysis
Indexing of Arabic documents automatically based on lexical analysis
 
Indexing of Arabic documents automatically based on lexical analysis
Indexing of Arabic documents automatically based on lexical analysis Indexing of Arabic documents automatically based on lexical analysis
Indexing of Arabic documents automatically based on lexical analysis
 
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali TextChunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
 
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali TextChunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
 
Fast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural NetworksFast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural Networks
 
Word2vec on the italian language: first experiments
Word2vec on the italian language: first experimentsWord2vec on the italian language: first experiments
Word2vec on the italian language: first experiments
 
Challenges in transfer learning in nlp
Challenges in transfer learning in nlpChallenges in transfer learning in nlp
Challenges in transfer learning in nlp
 
Phonetic Recognition In Words For Persian Text To Speech Systems
Phonetic Recognition In Words For Persian Text To Speech SystemsPhonetic Recognition In Words For Persian Text To Speech Systems
Phonetic Recognition In Words For Persian Text To Speech Systems
 
Parafraseo-Chenggang.pdf
Parafraseo-Chenggang.pdfParafraseo-Chenggang.pdf
Parafraseo-Chenggang.pdf
 
Fasttext 20170720 yjy
Fasttext 20170720 yjyFasttext 20170720 yjy
Fasttext 20170720 yjy
 
CICLing_2016_paper_52
CICLing_2016_paper_52CICLing_2016_paper_52
CICLing_2016_paper_52
 
IRJET- Automatic Language Identification using Hybrid Approach and Classifica...
IRJET- Automatic Language Identification using Hybrid Approach and Classifica...IRJET- Automatic Language Identification using Hybrid Approach and Classifica...
IRJET- Automatic Language Identification using Hybrid Approach and Classifica...
 
NLP Project: Paragraph Topic Classification
NLP Project: Paragraph Topic ClassificationNLP Project: Paragraph Topic Classification
NLP Project: Paragraph Topic Classification
 
Presentation
PresentationPresentation
Presentation
 
Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...
Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...
Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...
 

More from AIST

Alexey Mikhaylichenko - Automatic Detection of Bone Contours in X-Ray Images
Alexey Mikhaylichenko - Automatic Detection of Bone Contours in X-Ray  ImagesAlexey Mikhaylichenko - Automatic Detection of Bone Contours in X-Ray  Images
Alexey Mikhaylichenko - Automatic Detection of Bone Contours in X-Ray ImagesAIST
 
Алена Ильина и Иван Бибилов, GoTo - GoTo школы, конкурсы и хакатоны
Алена Ильина и Иван Бибилов, GoTo - GoTo школы, конкурсы и хакатоныАлена Ильина и Иван Бибилов, GoTo - GoTo школы, конкурсы и хакатоны
Алена Ильина и Иван Бибилов, GoTo - GoTo школы, конкурсы и хакатоныAIST
 
Станислав Кралин, Сайтсофт - Связанные открытые данные федеральных органов ис...
Станислав Кралин, Сайтсофт - Связанные открытые данные федеральных органов ис...Станислав Кралин, Сайтсофт - Связанные открытые данные федеральных органов ис...
Станислав Кралин, Сайтсофт - Связанные открытые данные федеральных органов ис...AIST
 
Павел Браславский,Velpas - Velpas: мобильный визуальный поиск
Павел Браславский,Velpas - Velpas: мобильный визуальный поискПавел Браславский,Velpas - Velpas: мобильный визуальный поиск
Павел Браславский,Velpas - Velpas: мобильный визуальный поискAIST
 
Евгений Цымбалов, Webgames - Методы машинного обучения для задач игровой анал...
Евгений Цымбалов, Webgames - Методы машинного обучения для задач игровой анал...Евгений Цымбалов, Webgames - Методы машинного обучения для задач игровой анал...
Евгений Цымбалов, Webgames - Методы машинного обучения для задач игровой анал...AIST
 
Александр Москвичев, EveResearch - Алгоритмы анализа данных в маркетинговых и...
Александр Москвичев, EveResearch - Алгоритмы анализа данных в маркетинговых и...Александр Москвичев, EveResearch - Алгоритмы анализа данных в маркетинговых и...
Александр Москвичев, EveResearch - Алгоритмы анализа данных в маркетинговых и...AIST
 
Петр Ермаков, HeadHunter - Модерация резюме: от людей к роботам. Машинное обу...
Петр Ермаков, HeadHunter - Модерация резюме: от людей к роботам. Машинное обу...Петр Ермаков, HeadHunter - Модерация резюме: от людей к роботам. Машинное обу...
Петр Ермаков, HeadHunter - Модерация резюме: от людей к роботам. Машинное обу...AIST
 
Иосиф Иткин, Exactpro - TBA
Иосиф Иткин, Exactpro - TBAИосиф Иткин, Exactpro - TBA
Иосиф Иткин, Exactpro - TBAAIST
 
Nikolay Karpov - Evolvable Semantic Platform for Facilitating Knowledge Exchange
Nikolay Karpov - Evolvable Semantic Platform for Facilitating Knowledge ExchangeNikolay Karpov - Evolvable Semantic Platform for Facilitating Knowledge Exchange
Nikolay Karpov - Evolvable Semantic Platform for Facilitating Knowledge ExchangeAIST
 
George Moiseev - Classification of E-commerce Websites by Product Categories
George Moiseev - Classification of E-commerce Websites by Product CategoriesGeorge Moiseev - Classification of E-commerce Websites by Product Categories
George Moiseev - Classification of E-commerce Websites by Product CategoriesAIST
 
Elena Bruches - The Hybrid Approach to Part-of-Speech Disambiguation
Elena Bruches - The Hybrid Approach to Part-of-Speech DisambiguationElena Bruches - The Hybrid Approach to Part-of-Speech Disambiguation
Elena Bruches - The Hybrid Approach to Part-of-Speech DisambiguationAIST
 
Marina Danshina - The methodology of automated decryption of znamenny chants
Marina Danshina - The methodology of automated decryption of znamenny chantsMarina Danshina - The methodology of automated decryption of znamenny chants
Marina Danshina - The methodology of automated decryption of znamenny chantsAIST
 
Edward Klyshinsky - The Corpus of Syntactic Co-occurences: the First Glance
Edward Klyshinsky - The Corpus of Syntactic Co-occurences: the First GlanceEdward Klyshinsky - The Corpus of Syntactic Co-occurences: the First Glance
Edward Klyshinsky - The Corpus of Syntactic Co-occurences: the First GlanceAIST
 
Galina Lavrentyeva - Anti-spoofing Methods for Automatic Speaker Verification...
Galina Lavrentyeva - Anti-spoofing Methods for Automatic Speaker Verification...Galina Lavrentyeva - Anti-spoofing Methods for Automatic Speaker Verification...
Galina Lavrentyeva - Anti-spoofing Methods for Automatic Speaker Verification...AIST
 
Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...
Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...
Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...AIST
 
Kaytoue Mehdi - Finding duplicate labels in behavioral data: an application f...
Kaytoue Mehdi - Finding duplicate labels in behavioral data: an application f...Kaytoue Mehdi - Finding duplicate labels in behavioral data: an application f...
Kaytoue Mehdi - Finding duplicate labels in behavioral data: an application f...AIST
 
Valeri Labunets - The bichromatic excitable Schrodinger metamedium
Valeri Labunets - The bichromatic excitable Schrodinger metamediumValeri Labunets - The bichromatic excitable Schrodinger metamedium
Valeri Labunets - The bichromatic excitable Schrodinger metamediumAIST
 
Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...
Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...
Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...AIST
 
Alexander Karkishchenko - Threefold Symmetry Detection in Hexagonal Images Ba...
Alexander Karkishchenko - Threefold Symmetry Detection in Hexagonal Images Ba...Alexander Karkishchenko - Threefold Symmetry Detection in Hexagonal Images Ba...
Alexander Karkishchenko - Threefold Symmetry Detection in Hexagonal Images Ba...AIST
 
Artyom Makovetskii - An Efficient Algorithm for Total Variation Denoising
Artyom Makovetskii - An Efficient Algorithm for Total Variation DenoisingArtyom Makovetskii - An Efficient Algorithm for Total Variation Denoising
Artyom Makovetskii - An Efficient Algorithm for Total Variation DenoisingAIST
 

More from AIST (20)

Alexey Mikhaylichenko - Automatic Detection of Bone Contours in X-Ray Images
Alexey Mikhaylichenko - Automatic Detection of Bone Contours in X-Ray  ImagesAlexey Mikhaylichenko - Automatic Detection of Bone Contours in X-Ray  Images
Alexey Mikhaylichenko - Automatic Detection of Bone Contours in X-Ray Images
 
Алена Ильина и Иван Бибилов, GoTo - GoTo школы, конкурсы и хакатоны
Алена Ильина и Иван Бибилов, GoTo - GoTo школы, конкурсы и хакатоныАлена Ильина и Иван Бибилов, GoTo - GoTo школы, конкурсы и хакатоны
Алена Ильина и Иван Бибилов, GoTo - GoTo школы, конкурсы и хакатоны
 
Станислав Кралин, Сайтсофт - Связанные открытые данные федеральных органов ис...
Станислав Кралин, Сайтсофт - Связанные открытые данные федеральных органов ис...Станислав Кралин, Сайтсофт - Связанные открытые данные федеральных органов ис...
Станислав Кралин, Сайтсофт - Связанные открытые данные федеральных органов ис...
 
Павел Браславский,Velpas - Velpas: мобильный визуальный поиск
Павел Браславский,Velpas - Velpas: мобильный визуальный поискПавел Браславский,Velpas - Velpas: мобильный визуальный поиск
Павел Браславский,Velpas - Velpas: мобильный визуальный поиск
 
Евгений Цымбалов, Webgames - Методы машинного обучения для задач игровой анал...
Евгений Цымбалов, Webgames - Методы машинного обучения для задач игровой анал...Евгений Цымбалов, Webgames - Методы машинного обучения для задач игровой анал...
Евгений Цымбалов, Webgames - Методы машинного обучения для задач игровой анал...
 
Александр Москвичев, EveResearch - Алгоритмы анализа данных в маркетинговых и...
Александр Москвичев, EveResearch - Алгоритмы анализа данных в маркетинговых и...Александр Москвичев, EveResearch - Алгоритмы анализа данных в маркетинговых и...
Александр Москвичев, EveResearch - Алгоритмы анализа данных в маркетинговых и...
 
Петр Ермаков, HeadHunter - Модерация резюме: от людей к роботам. Машинное обу...
Петр Ермаков, HeadHunter - Модерация резюме: от людей к роботам. Машинное обу...Петр Ермаков, HeadHunter - Модерация резюме: от людей к роботам. Машинное обу...
Петр Ермаков, HeadHunter - Модерация резюме: от людей к роботам. Машинное обу...
 
Иосиф Иткин, Exactpro - TBA
Иосиф Иткин, Exactpro - TBAИосиф Иткин, Exactpro - TBA
Иосиф Иткин, Exactpro - TBA
 
Nikolay Karpov - Evolvable Semantic Platform for Facilitating Knowledge Exchange
Nikolay Karpov - Evolvable Semantic Platform for Facilitating Knowledge ExchangeNikolay Karpov - Evolvable Semantic Platform for Facilitating Knowledge Exchange
Nikolay Karpov - Evolvable Semantic Platform for Facilitating Knowledge Exchange
 
George Moiseev - Classification of E-commerce Websites by Product Categories
George Moiseev - Classification of E-commerce Websites by Product CategoriesGeorge Moiseev - Classification of E-commerce Websites by Product Categories
George Moiseev - Classification of E-commerce Websites by Product Categories
 
Elena Bruches - The Hybrid Approach to Part-of-Speech Disambiguation
Elena Bruches - The Hybrid Approach to Part-of-Speech DisambiguationElena Bruches - The Hybrid Approach to Part-of-Speech Disambiguation
Elena Bruches - The Hybrid Approach to Part-of-Speech Disambiguation
 
Marina Danshina - The methodology of automated decryption of znamenny chants
Marina Danshina - The methodology of automated decryption of znamenny chantsMarina Danshina - The methodology of automated decryption of znamenny chants
Marina Danshina - The methodology of automated decryption of znamenny chants
 
Edward Klyshinsky - The Corpus of Syntactic Co-occurences: the First Glance
Edward Klyshinsky - The Corpus of Syntactic Co-occurences: the First GlanceEdward Klyshinsky - The Corpus of Syntactic Co-occurences: the First Glance
Edward Klyshinsky - The Corpus of Syntactic Co-occurences: the First Glance
 
Galina Lavrentyeva - Anti-spoofing Methods for Automatic Speaker Verification...
Galina Lavrentyeva - Anti-spoofing Methods for Automatic Speaker Verification...Galina Lavrentyeva - Anti-spoofing Methods for Automatic Speaker Verification...
Galina Lavrentyeva - Anti-spoofing Methods for Automatic Speaker Verification...
 
Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...
Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...
Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...
 
Kaytoue Mehdi - Finding duplicate labels in behavioral data: an application f...
Kaytoue Mehdi - Finding duplicate labels in behavioral data: an application f...Kaytoue Mehdi - Finding duplicate labels in behavioral data: an application f...
Kaytoue Mehdi - Finding duplicate labels in behavioral data: an application f...
 
Valeri Labunets - The bichromatic excitable Schrodinger metamedium
Valeri Labunets - The bichromatic excitable Schrodinger metamediumValeri Labunets - The bichromatic excitable Schrodinger metamedium
Valeri Labunets - The bichromatic excitable Schrodinger metamedium
 
Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...
Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...
Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...
 
Alexander Karkishchenko - Threefold Symmetry Detection in Hexagonal Images Ba...
Alexander Karkishchenko - Threefold Symmetry Detection in Hexagonal Images Ba...Alexander Karkishchenko - Threefold Symmetry Detection in Hexagonal Images Ba...
Alexander Karkishchenko - Threefold Symmetry Detection in Hexagonal Images Ba...
 
Artyom Makovetskii - An Efficient Algorithm for Total Variation Denoising
Artyom Makovetskii - An Efficient Algorithm for Total Variation DenoisingArtyom Makovetskii - An Efficient Algorithm for Total Variation Denoising
Artyom Makovetskii - An Efficient Algorithm for Total Variation Denoising
 

Recently uploaded

ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...
ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...
ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...Chayanika Das
 
Advances in AI-driven Image Recognition for Early Detection of Cancer
Advances in AI-driven Image Recognition for Early Detection of CancerAdvances in AI-driven Image Recognition for Early Detection of Cancer
Advances in AI-driven Image Recognition for Early Detection of CancerLuis Miguel Chong Chong
 
CHROMATOGRAPHY PALLAVI RAWAT.pptx
CHROMATOGRAPHY  PALLAVI RAWAT.pptxCHROMATOGRAPHY  PALLAVI RAWAT.pptx
CHROMATOGRAPHY PALLAVI RAWAT.pptxpallavirawat456
 
DNA isolation molecular biology practical.pptx
DNA isolation molecular biology practical.pptxDNA isolation molecular biology practical.pptx
DNA isolation molecular biology practical.pptxGiDMOh
 
linear Regression, multiple Regression and Annova
linear Regression, multiple Regression and Annovalinear Regression, multiple Regression and Annova
linear Regression, multiple Regression and AnnovaMansi Rastogi
 
Gas-ExchangeS-in-Plants-and-Animals.pptx
Gas-ExchangeS-in-Plants-and-Animals.pptxGas-ExchangeS-in-Plants-and-Animals.pptx
Gas-ExchangeS-in-Plants-and-Animals.pptxGiovaniTrinidad
 
Environmental acoustics- noise criteria.pptx
Environmental acoustics- noise criteria.pptxEnvironmental acoustics- noise criteria.pptx
Environmental acoustics- noise criteria.pptxpriyankatabhane
 
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxGENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxRitchAndruAgustin
 
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPRPirithiRaju
 
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests GlycosidesGLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests GlycosidesNandakishor Bhaurao Deshmukh
 
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep LearningCombining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learningvschiavoni
 
Oxo-Acids of Halogens and their Salts.pptx
Oxo-Acids of Halogens and their Salts.pptxOxo-Acids of Halogens and their Salts.pptx
Oxo-Acids of Halogens and their Salts.pptxfarhanvvdk
 
complex analysis best book for solving questions.pdf
complex analysis best book for solving questions.pdfcomplex analysis best book for solving questions.pdf
complex analysis best book for solving questions.pdfSubhamKumar3239
 
Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...
Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...
Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...Christina Parmionova
 
whole genome sequencing new and its types including shortgun and clone by clone
whole genome sequencing new  and its types including shortgun and clone by clonewhole genome sequencing new  and its types including shortgun and clone by clone
whole genome sequencing new and its types including shortgun and clone by clonechaudhary charan shingh university
 
Q4-Mod-1c-Quiz-Projectile-333344444.pptx
Q4-Mod-1c-Quiz-Projectile-333344444.pptxQ4-Mod-1c-Quiz-Projectile-333344444.pptx
Q4-Mod-1c-Quiz-Projectile-333344444.pptxtuking87
 
Environmental Acoustics- Speech interference level, acoustics calibrator.pptx
Environmental Acoustics- Speech interference level, acoustics calibrator.pptxEnvironmental Acoustics- Speech interference level, acoustics calibrator.pptx
Environmental Acoustics- Speech interference level, acoustics calibrator.pptxpriyankatabhane
 
Measures of Central Tendency.pptx for UG
Measures of Central Tendency.pptx for UGMeasures of Central Tendency.pptx for UG
Measures of Central Tendency.pptx for UGSoniaBajaj10
 
BACTERIAL DEFENSE SYSTEM by Dr. Chayanika Das
BACTERIAL DEFENSE SYSTEM by Dr. Chayanika DasBACTERIAL DEFENSE SYSTEM by Dr. Chayanika Das
BACTERIAL DEFENSE SYSTEM by Dr. Chayanika DasChayanika Das
 
BACTERIAL SECRETION SYSTEM by Dr. Chayanika Das
BACTERIAL SECRETION SYSTEM by Dr. Chayanika DasBACTERIAL SECRETION SYSTEM by Dr. Chayanika Das
BACTERIAL SECRETION SYSTEM by Dr. Chayanika DasChayanika Das
 

Recently uploaded (20)

ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...
ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...
ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...
 
Advances in AI-driven Image Recognition for Early Detection of Cancer
Advances in AI-driven Image Recognition for Early Detection of CancerAdvances in AI-driven Image Recognition for Early Detection of Cancer
Advances in AI-driven Image Recognition for Early Detection of Cancer
 
CHROMATOGRAPHY PALLAVI RAWAT.pptx
CHROMATOGRAPHY  PALLAVI RAWAT.pptxCHROMATOGRAPHY  PALLAVI RAWAT.pptx
CHROMATOGRAPHY PALLAVI RAWAT.pptx
 
DNA isolation molecular biology practical.pptx
DNA isolation molecular biology practical.pptxDNA isolation molecular biology practical.pptx
DNA isolation molecular biology practical.pptx
 
linear Regression, multiple Regression and Annova
linear Regression, multiple Regression and Annovalinear Regression, multiple Regression and Annova
linear Regression, multiple Regression and Annova
 
Gas-ExchangeS-in-Plants-and-Animals.pptx
Gas-ExchangeS-in-Plants-and-Animals.pptxGas-ExchangeS-in-Plants-and-Animals.pptx
Gas-ExchangeS-in-Plants-and-Animals.pptx
 
Environmental acoustics- noise criteria.pptx
Environmental acoustics- noise criteria.pptxEnvironmental acoustics- noise criteria.pptx
Environmental acoustics- noise criteria.pptx
 
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxGENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
 
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
 
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests GlycosidesGLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
 
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep LearningCombining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
 
Oxo-Acids of Halogens and their Salts.pptx
Oxo-Acids of Halogens and their Salts.pptxOxo-Acids of Halogens and their Salts.pptx
Oxo-Acids of Halogens and their Salts.pptx
 
complex analysis best book for solving questions.pdf
complex analysis best book for solving questions.pdfcomplex analysis best book for solving questions.pdf
complex analysis best book for solving questions.pdf
 
Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...
Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...
Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...
 
whole genome sequencing new and its types including shortgun and clone by clone
whole genome sequencing new  and its types including shortgun and clone by clonewhole genome sequencing new  and its types including shortgun and clone by clone
whole genome sequencing new and its types including shortgun and clone by clone
 
Q4-Mod-1c-Quiz-Projectile-333344444.pptx
Q4-Mod-1c-Quiz-Projectile-333344444.pptxQ4-Mod-1c-Quiz-Projectile-333344444.pptx
Q4-Mod-1c-Quiz-Projectile-333344444.pptx
 
Environmental Acoustics- Speech interference level, acoustics calibrator.pptx
Environmental Acoustics- Speech interference level, acoustics calibrator.pptxEnvironmental Acoustics- Speech interference level, acoustics calibrator.pptx
Environmental Acoustics- Speech interference level, acoustics calibrator.pptx
 
Measures of Central Tendency.pptx for UG
Measures of Central Tendency.pptx for UGMeasures of Central Tendency.pptx for UG
Measures of Central Tendency.pptx for UG
 
BACTERIAL DEFENSE SYSTEM by Dr. Chayanika Das
BACTERIAL DEFENSE SYSTEM by Dr. Chayanika DasBACTERIAL DEFENSE SYSTEM by Dr. Chayanika Das
BACTERIAL DEFENSE SYSTEM by Dr. Chayanika Das
 
BACTERIAL SECRETION SYSTEM by Dr. Chayanika Das
BACTERIAL SECRETION SYSTEM by Dr. Chayanika DasBACTERIAL SECRETION SYSTEM by Dr. Chayanika Das
BACTERIAL SECRETION SYSTEM by Dr. Chayanika Das
 

Nikolay Karpov - Single-sentence readability prediction in russian

  • 1. ANALYSIS OF IMAGES, SOCIAL NETWORKS, AND TEXTS Single-sentence Readability Prediction in Russian Nikolay Karpov, Julia Baranova, Fedor Vitugin National Research University Higher School of Economics Ekaterinburg, Russia
  • 2. Structure  Motivation  Text readability prediction  Single-sentence readability prediction  Single-sentence readability prediction with syntactic features  Conclusion
  • 3. Motivation We present a part of a project which aim is to develop a system with a simplification functionality.  It should be a system for a text adaptation to a target level readability in Russian language as a foreign language (RFL).  We were solving the identification problem of a source level of difficulty (readability) of the sentences or texts.  Further step will be their lexical and syntactic simplification. In this study we give the results of application which identify the level of difficulty of a single-sentence and whole text using different statistical and syntactic features
  • 4. Text readability prediction First task was to perform the prototyping of Russian text retrieval with needed readability. The main goal of this process was to find which kind of variables and classification algorithm would allow us to obtain the highest indicators of precision and recall of readability prediction. There was conducted a series of experiments on the training of different classification algorithms.  naive Bayes;  k-nearest neighbors;  classification tree;  random forests;  SVM.
  • 5. We extract 25 variables from texts proposed in the previous works Average number of words in the sentence of the text. Average length of one word in a sentence. Text length in letters. Text length in words. Average sentence length in syllables. Average word length in syllables. Percentage of words with number of syllables more or equal to N. We define N as each value from 3 to 6. Average sentence length in letters. Average length of words in letters. Percentage of words with number of letters more or equal to N. We define N as each value from 5 to 13. The percentage of words in a sentence, not included in the active vocabulary of A1 level. The percentage of words in a sentence, not included in the active vocabulary of A2 level. The percentage of words in a sentence, not included in the active vocabulary of B1 level. The occurrence in the sentence of concrete parts of speech.
  • 6. Text readability prediction For evaluation we used collection consists of 219 texts divided into four groups. Levels distribution is following: A1 (elementary – 52), A2 (basic) – 57, B1 (first) – 60, C2 (difficult) – 50 according to levels described in Common European Framework of Reference for Languages (CEFR). A1, A2, B1 texts was created specially for students by language teachers on the basis of news. С2 texts was an original news in Russian language. First experiment was a binary classification of readability:  A1 versus C2,  A2 versus C2,  B1 versus C2. With the help of Classification Tree, SVM and Logistic Regression algorithms the accuracy we got was really high, it was almost equal to 1.
  • 7. Text classification into four levels of readability Method Classification accuracy F-measure Precision Recall SVM 0.8092 0.7965 0.8491 0.75 Classification Tree 0.9905 0.9916 1 0.9833 kNN 0.8131 0.7333 0.7333 0.7333 Random Forest 0.9818 0.9667 0.9667 0.9667 Naive Bayes 0.8726 0.7890 0.8776 0.7167 An example of precision and recall for text retrieval with B1 level of readability
  • 8. Classification variables ranked by information gain ratio Variable name Information gain ratio The percentage of words in a sentence, are not included in the active vocabulary of A1 level 0.105141 The percentage of words in a sentence, are not included in the active vocabulary of A2 level 0.105141 The percentage of words in a sentence, are not included in the active vocabulary of B1 level 0.084211 Percentage of words with 8 letters or more 0.040098 Percentage of words with 9 letters or more 0.038431 Percentage of words with 7 letters or more 0.036923 Average sentence length in syllables 0.034359 The average length of one word in a text 0.034359 Percentage of words with 10 letters or more 0.033689
  • 9. Single-sentence redability prediction  Prototyping sentence classification with respect to its readability. For result evaluation we use corpus SunTagRus.  Level B1 suits to the majority of our students. So, we created a binary sentence markup, which is: 1) B1 or lower than B1; 2) Higher than B1.  Manually tagged 3500 sentences in this corpus to mark their structural level of perception complexity.  Lexical readability for each sentence we obtain on the basis of lexical vocabulary of B1 level. Defined sentences having more than 33% words not in active vocabulary as lexically difficult ones. Thus, we have two kinds of markup: structural complexity and lexical difficulty. As an intersection we obtained a total level of
  • 10. Results of total readability prediction using all kinds of variables and syntactic links Method Classification accuracy F-measure (difficult /simple) Precision (difficult /simple) Recall (difficult /simple) Naive Bayes 0.8191 0.8906/ 0.4767 0.8354/ 0.6975 0.9537/ 0.3621 kNN 0.8224 0.8893/ 0.5501 0.8571/ 0.6493 0.9241/ 0.4772 Random Forest 0.9443 0.9640/ 0.8768 0.9620/ 0.8832 0.9661/ 0.8705 Classification Tree 0.9364 0.9584/ 0.8648 0.9679/ 0.8380 0.9491/ 0.8933 SVM 0.8633 0.9125/ 0.6875 0.9679/ 0.7165 0.9491/ 0.6607
  • 11. Recall value of complex sentences retrieval using different set of features Dale-Chall Flesch-Kincaid Syntactic links for structural complexity Total set 0.75 0.8 0.85 0.9 0.95 1 kNN Logistic regression Random Forest Classification Tree
  • 12. Classification variables of sentenses ranked by information gain ratio Variable name Information gain ratio The percentage of words in a sentence, are not included in the active vocabulary of B1 level 0.318 Sentence length in letters 0.122 Percentage of words with 3 syllable and more 0.119 Sentence length in syllables 0.118 Sentence length in words 0.098 Syntactic predicative link 0.095 Average words length in syllables 0.092 The average length of one word in a text 0.092 Percentage of words with 7 letters or more 0.069 Percentage of words with 5 letters or more 0.069 Top 10 of classification variables
  • 13. Conclusion  For text readability prediction obtained results reached 99-98%, so we can say that they met our needs.  We adapted features from traditional readability prediction techniques to identify lexical and structural complexity of single- sentences in Russian.  We tested the readability prediction of Russian sentences using syntactic links.  Single-sentence readability prediction algorithm was tested on the set of sentences from SynTagRus, where readability was manually marked (a binary classification).  Total set of features with statistical, lexical and syntactial ones can predict sentence readability with 0.9661 amount of recall using Random Forest algorithm.  Most important features for this classification are lexical ones.
  • 14. Thank you for your attention nkarpov@hse.ru