SlideShare a Scribd company logo
1 of 7
Download to read offline
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.3, June 2014
10.5121/ijnlc.2014.3303 25
INTER-RATER AGREEMENT STUDY ON
READABILITY ASSESSMENT IN BENGALI
Shanta Phani1
, Shibamouli Lahiri2
and Arindam Biswas3
1
Information Technology, IIEST, Shibpur,Howrah 711103, West Bengal, India
2
Computer Science and Engineering, University of North Texas, Denton, TX 76207,
USA
3
Information Technology, IIEST, Shibpur,Howrah 711103, West Bengal, India
ABSTRACT
An inter-rater agreement study is performed for readability assessment in Bengali. A 1-7 rating scale was
used to indicate different levels of readability. We obtained moderate to fair agreement among seven
independent annotators on 30 text passages written by four eminent Bengali authors. As a by product of
our study, we obtained a readability-annotated ground truth dataset in Bengali.
KEYWORDS
Readability, Annotator, Inter-annotator agreement, Correlation Coefficient
1. INTRODUCTION
Readability refers to the ease with which a given piece of natural language text can be read and
understood. Intuitively, readability emerges from an interaction between the reader and the text,
and depends on the prior knowledge of the reader, his/her reading skills, interest, and motivation
[23]. Although it may seem that automatic assessment of readability would be a very complicated
process, as it turns out, fairly effective readability scoring can be achieved by means of several
lowlevel features.
Readability has many important applications, such as assessing the quality of student essays (one
of the original applications of readability scoring), designing educational materials for
schoolchildren and second-language learners, moderating newspaper content to convey
information more clearly and effectively, and standardizing the language-learning experience
of different age groups. Readability (“reading ease”) and its converse – reading difficulty – are
associated with different grade levels in school. It is generally observed that students from higher
grade levels can write and comprehend texts with greater reading difficulty than students from
lower grade levels. A lot of studies in readability therefore focused on correlating readability
scores with grade levels, or even predicting grade levels from readability-oriented features.
Existing methods of readability assessment look into a handful of low-level signals such as
average sentence length (ASL), average word length in syllables (AWL), percentage of difficult
words, and number of polysyllabic words. Early studies used word frequency lists to identify
difficult words. Recently, readability evaluation has been tackled as a supervised machine
learning problem [12], [17], [22].
There have been many different studies on readability assessment in English (cf. Section 2).
Bengali has received much less attention owing to inadequate resources and a lack of robust
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.3, June 2014
26
natural language processing tools. It is only very recently that some groups of researchers looked
into readability assessment in Bengali. They observed that English readability formulas did not
work well on Bengali texts [11], [21]. This observation is not surprising, because Bengali is very
different than English. Bengali is a highly inflected language, follows subject-object-verb
ordering in sentences, and has a rich morphology. Further, Bengali shows word compounding and
diglossia, i.e. formal and informal language variants (sadhu bhasha and cholit bhasha). All these
factors complicate readability scoring in Bengali. Since the concept of readability is highly
subjective and reader-dependent, it is necessary to find out how much two native Bengali
speakers agree on the readability level of a piece of text. Generalizing from there, we performed
an inter-rater agreement study on readability assessment in Bengali. This study not only enables
us to see how much human annotators agree on readability assessment, but also shows how
difficult it is for humans to assign consistent readability scores. Since Bengali is very different
than English, we want to see if (and how) readability is affected by the peculiarities of the
language. As a by-product of this study, we obtained a human-annotated gold standard dataset for
readability evaluation in Bengali. The rest of this paper is organized as follows. We briefly
discuss related studies in Section 2, followed by a discussion of our dataset and annotation
scheme in Section 3. Experimental results are described in Section 4, along with their explanation
and observations. Section 5 concludes the paper with contributions, limitations, and further
research directions.
Table 2. Mean Readability Rating and Standard Deviation of 30 Text Passages
Mean Rating Standard
Deviation
2.0 0.52
2 0.75
2 1.26
2 1.17
3 0.98
4 0.75
2 1.10
4 1.47
4 1.26
4 1.05
5 1.05
3 0.98
3 1.33
4 1.52
3 0.75
4 1.60
4 1.50
4 1.72
3 1.03
5 1.76
4 1.17
3 1.03
5 1.75
4 1.67
3 1.55
3 0.84
6 1.50
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.3, June 2014
27
5 1.50
5 1.17
4 1.03
2. RELATED WORK
Readability scoring in English has a long and rich history, starting with the work of L. A.
Sherman in the late nineteenth century [20]. Among the early readability formulas were Flesch
Reading Ease [7], Dale-Chall Formula [5], Automated Readability Index [19], Gunning Fog
Index [9], SMOG score [16], and Coleman-Liau Index [2]. These early indices were based on
simple features like average number of characters, words, syllables and sentences, number of
difficult and polysyllabic words, etc. Albeit simple, these readability indices were surprisingly
good predictors of a reader’s grade level. Two different lines of work focused on children and
adult readability formulas. Recently Lahiri et al. showed moderate correlation between readability
indices and formality score ([10]) in four different domains [14].
Sinha et al. classified English readability formulas into three broad categories – traditional
methods, cognitively motivated methods, and machine learning methods [21]. Traditional
methods assess readability using surface features and shallow linguistic features such as the ones
mentioned in the preceding paragraph. Cognitively motivated methods take into account the
cohesion and coherence of text, its latent topic structure, Kintsch’s propositions, etc [1], [8], [13].
Finally, machine learning methods utilize sophisticated structures such as language models [3],
[4], [18], query logs [15], and several other features to predict the readability of open- domain
text data.
There are very few studies on readability assessment in Bengali texts. We found only three lines
of work that specifically looked into Bengali readability [6], [11], [21]. Das and Roychoudhury
worked with a miniature model of two parameters in their pioneering study [6]. They found that
the two-parameter model was a better predictor of readability than the one-parameter model.
Note, however, that Das and Roychoudhury’s corpus was small (only seven documents), thereby
calling into question the validity of their results. Sinha et al. alleviated these problems by
considering six parameters instead of just two [21]. They further showed that English readability
indices were inadequate for Bengali, and built their own readability model on 16 texts. Around
the same time, Islam et al. independently reached the same conclusion [11]. They designed a
Bengali readability classifier on lexical and information-theoretic features, resulting in an F-score
50% higher than that from traditional scoring approaches.
While all the above studies are very important and insightful, none of them explicitly performed
an inter-rater agreement study. For reasons mentioned in Section 1, an inter-rater agreement study
is very important when we talk about readability assessment. Further, none of these studies made
available their readability-annotated gold standard datasets, thereby stymieing further research.
We attempt to bridge these gaps in our work.
3. METHODOLOGY
We collected a corpus of 30 Bengali text passages. The passages were randomly selected from the
writings of four eminent Bengali authors – Rabindranath Tagore (1861-1941), Sarat Chandra
Chattopadhyay (1876-1938), Bankim Chandra Chattopadhyay (1838-1894), and Bibhutibhushan
Bandyopadhyay (1894-1950). We ensured that samples from both sadhu bhasha as well as cholit
bhasha were incorporated in our corpus. We also ensured that we had both adult text as well as
children’s text in the mix. The number of passages from different authors is shown in Table 1.
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.3, June 2014
28
Table 1 also shows the number of passages in sadhu bhasha and cholit bhasha. Note that there are
almost twice as many passages in sadhu bhasha as in cholit bhasha.
Table 1: Number of Bengali text passages from different Authors, and from two different bengali language
forms: Sadhu bhasha and Cholit bhasha
Author Sadhu
Bhasha
Chalit
Bhasha
Total
Rabindranath Tagore 8 4 12
Sarat Chandra Chattopadhyay 6 3 9
Bankim Chandra Chattopadhyay 6 0 6
Bibhutibhusan Bandopadhyay 1 2 3
Total 21 9 30
We assigned the 30 text passages to seven independent annotators. The annotators were 30 to 35
years of age; they were from a similar educational background and socio-economic milieu; there
were four female and three male annotators; and they all were native speakers of Bengali.
Annotators were asked to assign a readability rating to each of the 30 passages.
The rating scale was as follows:
1) Very easy to read
2) Easy to read
3) Somewhat easy to read
4) In-between
5) Somewhat difficult to read
6) Difficult to read
7) Very difficult to read
This rating scale reflects the fact that readability is not a binary/ternary variable; it is an ordinal
variable. We further collected the data on whether the annotators were avid readers of Bengali or
not. Each annotator rated every passage. Note that readability annotation in Bengali is challenging
because passages written in sadhu bhasha tend to be harder to read than those written in cholit
bhasha. Since our dataset contains both sadhu bhasha and cholit bhasha, maintaining consistency
in readability rating becomes a big issue.
4. RESULTS
Table 2 gives the mean readability rating of the 30 text passages, along with their standard
deviations. These ratings are averages over seven independent annotations. Note from Table 2
that none of the mean ratings is 1 or 7. In other words, mean ratings never reach the extreme
readability values. This phenomenon is known as the central tendency bias. Note also that the
standard deviations are not very high, which should be intuitive because the rating scale varies
between 1 and 7.Agreement among the annotators was measured by Cohen’s kappa (κ) and
Spearman’s rank correlation coefficient (ρ). Table 3 shows the pairwise κ values among different
annotators, and Table 4 gives the pairwise ρ values. Both tables are symmetric around the main
diagonal. Note from Table 3 that 11 out of 21 κ values fall within the range [0:5; 0:8]. Table 4
shows that 13 out of 21 ρ values are within the range [0:5; 0:8], and one ρ value is greater than
0.8. This indicates moderate to fair agreement among different annotators. This observation in
turn indicates that human annotators agree pretty well on Bengali readability scoring.
Table 3. Cohen’s Kappa Between Different Annotators
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.3, June 2014
29
Annotato
r1
Annotato
r2
Annotato
r3
Annotato
r4
Annotato
r5
Annotato
r6
Annotato
r7
Annotat
or1
1.00 0.51 0.26 0.60 0.12 0.51 0.50
Annotat
or2
0.51 1.00 0.45 0.64 0.12 0.56 0.53
Annotat
or3
0.26 0.45 1.00 0.43 -0.03 0.49 0.54
Annotat
or4
0.60 0.64 0.43 1.00 0.09 0.58 0.74
Annotat
or5
0.12 0.12 -0.03 0.09 1.00 0.13 0.02
Annotat
or6
0.51 0.56 0.49 0.58 0.13 1.00 0.57
Annotat
or7
0.50 0.53 0.54 0.74 0.02 0.57 1.00
Table 4. Spearman Rank Correlation between Different Annotators
Annotato
r1
Annotato
r2
Annotato
r3
Annotato
r4
Annotato
r5
Annotato
r6
Annotato
r7
Annotato
r1
1.00 0.62 0.34 0.70 0.16 0.60 0.60
Annotato
r2
0.62 1.00 0.52 0.75 0.14 0.65 0.63
Annotato
r3
0.34 0.52 1.00 0.53 -0.04 0.59 0.64
Annotato
r4
0.70 0.75 0.53 1.00 0.12 0.68 0.83
Annotato
r5
0.16 0.14 -0.04 0.12 1.00 0.15 0.02
Annotato
r6
0.60 0.65 0.59 0.68 0.15 1.00 0.66
Annotato
r7
0.60 0.63 0.64 0.83 0.02 0.66 1.00
5. CONCLUSION
We performed an inter-rater agreement study for readability assessment in Bengali. This is the
first time such an agreement study has been performed. We obtained moderate to fair agreement
among seven independent annotators on 30 text passages written by four eminent Bengali authors.
As a byproduct of this study, we obtained a gold standard human annotated readability dataset for
Bengali. We plan to release this dataset for future research. We are working on readability
modeling in Bengali, and this dataset will be very helpful. An important limitation of our study is
the small corpus size. We only have 30 annotated passages at our disposal, whereas Islam et al.
[11] had around 300. But Islam et al.’s dataset is not annotated in as fine-grained a fashion as
ours. Note also that our dataset is larger than both Sinha et al.’s 16document dataset [21], and Das
and Roychoudhury’s seven document dataset [6]. We plan to increase the size of our dataset in
future.
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.3, June 2014
30
REFERENCES
[1] B. K. Britton and S. Gulg oz, “Using Kintsch’s computational model to improve instructional text:
Effects of repairing inference calls on recall and cognitive structures,” Journal of Educational
Psychology, vol. 83, no. 3, p. 329, 1991.
[2] M. Coleman and T. L. Liau, “A computer readability formula designed for machine scoring,” Journal
of Applied Psychology, vol. 60, no. 2, p. 283, 1975.
[3] K. Collins-Thompson and J. P. Callan, “A Language Modeling Approach to Predicting Reading
Difficulty,” in HLT-NAACL, 2004, pp. 193–200.
[4] K. Collins-Thompson and J. Callan, “Predicting reading difficulty with statistical language models,”
Journal of the American Society for Information Science and Technology, vol. 56, no. 13, pp. 1448–
1462, 2005.
[5] E. Dale and J. S. Chall, “A formula for predicting readability,” Educational research bulletin, pp. 11–
28, 1948.
[6] S. Das and R. Roychoudhury, “Readability Modelling and Comparison of One and Two Parametric
Fit: A Case Study in Bangla,” Journal of Quantitative Linguistics, vol. 13, no. 1, pp. 17– 34, 2006.
[Online]. Available: http://www.tandfonline.com/doi/abs/10.1080/09296170500500843
[7] R. Flesch, “A new readability yardstick,” Journal of applied psychology, vol. 32, no. 3, p. 221, 1948.
[8] P. W. Foltz, W. Kintsch, and T. K. Landauer, “The measurement of textual coherence with latent
semantic analysis,” Discourse processes, vol. 25, no. 2-3, pp. 285–307, 1998.
[9] R. Gunning, The technique of clear writing. McGraw-Hill New York, 1968.
[10] F. Heylighen and J.-M. Dewaele, “Formality of Language: definition, measurement and behavioral
determinants,” Interner Bericht, Center “Leo Apostel”, Vrije Universiteit Brussel, 1999.
[11] Z. Islam, A. Mehler, and R. Rahman, “Text Readability Classification of Textbooks of a Low-
Resource Language,” in Proceedings of the 26th Pacific Asia Conference on Language, Information,
and Computation. Bali,Indonesia: Faculty of Computer Science, Universitas Indonesia, November
2012, pp. 545–553. [Online]. Available: http://www.aclweb.org/anthology/Y12-1059
[12] R. J. Kate, X. Luo, S. Patwardhan, M. Franz, R. Florian, R. J. Mooney, S. Roukos, and C. Welty,
“Learning to predict readability using diverse linguistic features,” in Proceedings of the 23rd
International Conference on Computational Linguistics. Association for Computational Linguistics,
2010, pp. 546–554.
[13] W. Kintsch and T. A. Van Dijk, “Toward a model of text comprehension and production,”
Psychological review, vol. 85, no. 5, p. 363, 1978.
[14] S. Lahiri, P. Mitra, and X. Lu, “Informality Judgment at Sentence Level and Experiments with
Formality Score,” in Proceedings of the 12th International Conference on Computational Linguistics
and Intelligent Text Processing - Volume Part II, ser. CICLing’11. Berlin, Heidelberg: Springer-
Verlag, 2011, pp. 446–457. [Online]. Available: http://dl.acm.org/citation.cfm?id=1964750.1964792
[15] X. Liu, W. B. Croft, P. Oh, and D. Hart, “Automatic recognition of reading levels from user queries,”
in Proceedings of the 27th annual international ACM SIGIR conference on Research and
development in information retrieval. ACM, 2004, pp. 548–549.
[16] G. H. McLaughlin, “SMOG grading: A new readability formula,” Journal of reading, vol. 12, no. 8,
pp. 639–646, 1969.
[18] S. E. Schwarm and M. Ostendorf, “Reading level assessment using support vector machines and
statistical language models,” in Proceedings of the 43rd Annual Meeting on Association for
Computational Linguistics. Association for Computational Linguistics, 2005, pp. 523–530.
[19] R. J. Senter and E. A. Smith, “Automated readability index,” DTIC Document, Tech. Rep., 1967.
[20] L. A. Sherman, Analytics of literature: A manual for the objective study of English prose and poetry.
Ginn, 1893.
[21] M. Sinha, S. Sharma, T. Dasgupta, and A. Basu, “New Readability Measures for Bangla and Hindi
Texts,” in COLING (Posters), 2012, pp. 1141–1150.
[22] T. vor der Bruck, S. Hartrumpf, and H. Helbig, “A Readability Checker with Supervised Learning
Using Deep Indicators,” Informatica (Slovenia), vol. 32, no. 4, pp. 429–435, 2008.
[23] Wikipedia, “Readability — Wikipedia, The Free Encyclopedia,” 2004, [Online; accessed 1-January-
2014]. [Online]. Available: http: //en.wikipedia.org/wiki/Readability
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.3, June 2014
31
Authors
Shanta Phani: She is a PhD student in IIEST, Kolkata. She has done her M.Tech in
Computer Science and Engineering, from West Bengal University of Technology, in 2009.
Her research interests are in image processing, Natural Language Processing, especially
Authorship Attribution, style identification, etc for Bengali language.
Shibamouli Lahiri: He is currently a PhD student at University of North Texas. He
completed his Master of Engineering from Pennsylvania State University in 2012. His
primary research interests are in Natural Language Processing, Data Mining and Machine
Learning. He has worked in many different problems in NLP, including but not limited to
keyword extraction, multi-document summarization, formality analysis, authorship
attribution, and native language identification. His research involves the application of
machine learning and complex network analysis techniques to different problems in Natural Language
Processing.
Dr. Arindam Biswas: He has done is doctors from ISI Kolkata. He has over 50 published
papers. He works in the area of Digital Geometry, Medical Image Analysis, and Natural
Language Processing.

More Related Content

What's hot

Attitude od teachers_and_students_towards_classroom_code_switching-libre
Attitude od teachers_and_students_towards_classroom_code_switching-libreAttitude od teachers_and_students_towards_classroom_code_switching-libre
Attitude od teachers_and_students_towards_classroom_code_switching-libreStoic Mills
 
SCORE-BASED SENTIMENT ANALYSIS OF BOOK REVIEWS IN HINDI LANGUAGE
SCORE-BASED SENTIMENT ANALYSIS OF BOOK REVIEWS IN HINDI LANGUAGESCORE-BASED SENTIMENT ANALYSIS OF BOOK REVIEWS IN HINDI LANGUAGE
SCORE-BASED SENTIMENT ANALYSIS OF BOOK REVIEWS IN HINDI LANGUAGEijnlc
 
Cmc assignment, adilla's group, sec2
Cmc assignment, adilla's group, sec2Cmc assignment, adilla's group, sec2
Cmc assignment, adilla's group, sec2Wan Aliaa
 
SENTIMENT ANALYSIS OF MIXED CODE FOR THE TRANSLITERATED HINDI AND MARATHI TEXTS
SENTIMENT ANALYSIS OF MIXED CODE FOR THE TRANSLITERATED HINDI AND MARATHI TEXTSSENTIMENT ANALYSIS OF MIXED CODE FOR THE TRANSLITERATED HINDI AND MARATHI TEXTS
SENTIMENT ANALYSIS OF MIXED CODE FOR THE TRANSLITERATED HINDI AND MARATHI TEXTSijnlc
 
Method effects on reading comprehension test performance
Method effects on reading comprehension test performanceMethod effects on reading comprehension test performance
Method effects on reading comprehension test performanceMasoomehVedadtaghavi
 
The Effect of Grammar vs. Vocabulary Pre-teaching on EFL Learners’ Reading Co...
The Effect of Grammar vs. Vocabulary Pre-teaching on EFL Learners’ Reading Co...The Effect of Grammar vs. Vocabulary Pre-teaching on EFL Learners’ Reading Co...
The Effect of Grammar vs. Vocabulary Pre-teaching on EFL Learners’ Reading Co...هارئ Kya-Habib Nadia-Dayyan
 
The use of personal pronouns: A comparison between Iranian and Malaysian dyads
The use of personal pronouns: A comparison between Iranian and Malaysian dyadsThe use of personal pronouns: A comparison between Iranian and Malaysian dyads
The use of personal pronouns: A comparison between Iranian and Malaysian dyadsJames Cook University
 
Sub-skills in reading comprehension tests
Sub-skills in reading comprehension testsSub-skills in reading comprehension tests
Sub-skills in reading comprehension testsCindy Shen
 
Proposal semantics hyponim
Proposal semantics hyponimProposal semantics hyponim
Proposal semantics hyponimAni Istiana
 
A Study on the Perception of Jordanian EFL Learners’ Pragmatic Transfer of Re...
A Study on the Perception of Jordanian EFL Learners’ Pragmatic Transfer of Re...A Study on the Perception of Jordanian EFL Learners’ Pragmatic Transfer of Re...
A Study on the Perception of Jordanian EFL Learners’ Pragmatic Transfer of Re...Yasser Al-Shboul
 
The Influence of Content Schema on L2 Learners’ Reading Comprehension: Evide...
 The Influence of Content Schema on L2 Learners’ Reading Comprehension: Evide... The Influence of Content Schema on L2 Learners’ Reading Comprehension: Evide...
The Influence of Content Schema on L2 Learners’ Reading Comprehension: Evide...English Literature and Language Review ELLR
 
The Production of Mandarin Voiceless Sibilant Fricatives by Late Cantonese-M...
 The Production of Mandarin Voiceless Sibilant Fricatives by Late Cantonese-M... The Production of Mandarin Voiceless Sibilant Fricatives by Late Cantonese-M...
The Production of Mandarin Voiceless Sibilant Fricatives by Late Cantonese-M...English Literature and Language Review ELLR
 
Automatic measures of cohesion and lexical proficiency
Automatic measures of cohesion and lexical proficiencyAutomatic measures of cohesion and lexical proficiency
Automatic measures of cohesion and lexical proficiencyAnik Utomo
 

What's hot (19)

Attitude od teachers_and_students_towards_classroom_code_switching-libre
Attitude od teachers_and_students_towards_classroom_code_switching-libreAttitude od teachers_and_students_towards_classroom_code_switching-libre
Attitude od teachers_and_students_towards_classroom_code_switching-libre
 
Relationship between Creativity and Tolerance of Ambiguity to Understand Meta...
Relationship between Creativity and Tolerance of Ambiguity to Understand Meta...Relationship between Creativity and Tolerance of Ambiguity to Understand Meta...
Relationship between Creativity and Tolerance of Ambiguity to Understand Meta...
 
SCORE-BASED SENTIMENT ANALYSIS OF BOOK REVIEWS IN HINDI LANGUAGE
SCORE-BASED SENTIMENT ANALYSIS OF BOOK REVIEWS IN HINDI LANGUAGESCORE-BASED SENTIMENT ANALYSIS OF BOOK REVIEWS IN HINDI LANGUAGE
SCORE-BASED SENTIMENT ANALYSIS OF BOOK REVIEWS IN HINDI LANGUAGE
 
Cmc assignment, adilla's group, sec2
Cmc assignment, adilla's group, sec2Cmc assignment, adilla's group, sec2
Cmc assignment, adilla's group, sec2
 
SENTIMENT ANALYSIS OF MIXED CODE FOR THE TRANSLITERATED HINDI AND MARATHI TEXTS
SENTIMENT ANALYSIS OF MIXED CODE FOR THE TRANSLITERATED HINDI AND MARATHI TEXTSSENTIMENT ANALYSIS OF MIXED CODE FOR THE TRANSLITERATED HINDI AND MARATHI TEXTS
SENTIMENT ANALYSIS OF MIXED CODE FOR THE TRANSLITERATED HINDI AND MARATHI TEXTS
 
Method effects on reading comprehension test performance
Method effects on reading comprehension test performanceMethod effects on reading comprehension test performance
Method effects on reading comprehension test performance
 
The Effect of Grammar vs. Vocabulary Pre-teaching on EFL Learners’ Reading Co...
The Effect of Grammar vs. Vocabulary Pre-teaching on EFL Learners’ Reading Co...The Effect of Grammar vs. Vocabulary Pre-teaching on EFL Learners’ Reading Co...
The Effect of Grammar vs. Vocabulary Pre-teaching on EFL Learners’ Reading Co...
 
The use of personal pronouns: A comparison between Iranian and Malaysian dyads
The use of personal pronouns: A comparison between Iranian and Malaysian dyadsThe use of personal pronouns: A comparison between Iranian and Malaysian dyads
The use of personal pronouns: A comparison between Iranian and Malaysian dyads
 
E434145.pdf
E434145.pdfE434145.pdf
E434145.pdf
 
Wistner
WistnerWistner
Wistner
 
Sub-skills in reading comprehension tests
Sub-skills in reading comprehension testsSub-skills in reading comprehension tests
Sub-skills in reading comprehension tests
 
Proposal semantics hyponim
Proposal semantics hyponimProposal semantics hyponim
Proposal semantics hyponim
 
A Study on the Perception of Jordanian EFL Learners’ Pragmatic Transfer of Re...
A Study on the Perception of Jordanian EFL Learners’ Pragmatic Transfer of Re...A Study on the Perception of Jordanian EFL Learners’ Pragmatic Transfer of Re...
A Study on the Perception of Jordanian EFL Learners’ Pragmatic Transfer of Re...
 
D3123741.pdf
D3123741.pdfD3123741.pdf
D3123741.pdf
 
E3124253.pdf
E3124253.pdfE3124253.pdf
E3124253.pdf
 
المجلد: 2 ، العدد: 3 ، مجلة الأهواز لدراسات علم اللغة
المجلد: 2 ، العدد: 3 ، مجلة الأهواز لدراسات علم اللغةالمجلد: 2 ، العدد: 3 ، مجلة الأهواز لدراسات علم اللغة
المجلد: 2 ، العدد: 3 ، مجلة الأهواز لدراسات علم اللغة
 
The Influence of Content Schema on L2 Learners’ Reading Comprehension: Evide...
 The Influence of Content Schema on L2 Learners’ Reading Comprehension: Evide... The Influence of Content Schema on L2 Learners’ Reading Comprehension: Evide...
The Influence of Content Schema on L2 Learners’ Reading Comprehension: Evide...
 
The Production of Mandarin Voiceless Sibilant Fricatives by Late Cantonese-M...
 The Production of Mandarin Voiceless Sibilant Fricatives by Late Cantonese-M... The Production of Mandarin Voiceless Sibilant Fricatives by Late Cantonese-M...
The Production of Mandarin Voiceless Sibilant Fricatives by Late Cantonese-M...
 
Automatic measures of cohesion and lexical proficiency
Automatic measures of cohesion and lexical proficiencyAutomatic measures of cohesion and lexical proficiency
Automatic measures of cohesion and lexical proficiency
 

Similar to Inter rater agreement study on readability assessment in bengali

An Evaluation of the New Interchange Series
An Evaluation of the New Interchange SeriesAn Evaluation of the New Interchange Series
An Evaluation of the New Interchange SeriesAJHSSR Journal
 
A Comparative Study of American English File and New Headway English Course
A Comparative Study of American English File and New Headway English CourseA Comparative Study of American English File and New Headway English Course
A Comparative Study of American English File and New Headway English CourseAJHSSR Journal
 
An Analysis Of The Cohesion And Coherence Of The Students Narrative Writings...
An Analysis Of The Cohesion And Coherence Of The Students  Narrative Writings...An Analysis Of The Cohesion And Coherence Of The Students  Narrative Writings...
An Analysis Of The Cohesion And Coherence Of The Students Narrative Writings...Anita Miller
 
Reading fluency as an indicator of reading comprehension
Reading fluency as an indicator of reading comprehensionReading fluency as an indicator of reading comprehension
Reading fluency as an indicator of reading comprehensionmizzyatie14
 
Psycholinguistic Analysis of Topic Familiarity and Translation Task Effects o...
Psycholinguistic Analysis of Topic Familiarity and Translation Task Effects o...Psycholinguistic Analysis of Topic Familiarity and Translation Task Effects o...
Psycholinguistic Analysis of Topic Familiarity and Translation Task Effects o...English Literature and Language Review ELLR
 
English reading strategic instructions effectiveness on reading comprehension
English reading strategic instructions effectiveness on reading comprehensionEnglish reading strategic instructions effectiveness on reading comprehension
English reading strategic instructions effectiveness on reading comprehensionAhmad Mashhood
 
JOURANA ARTICLE REVIEW PRESENTATION - Copy.pptx
JOURANA ARTICLE REVIEW PRESENTATION - Copy.pptxJOURANA ARTICLE REVIEW PRESENTATION - Copy.pptx
JOURANA ARTICLE REVIEW PRESENTATION - Copy.pptxDESTAWWAGNEW
 
A longitudinal multidimensional analysis of EAP writing Determining EAP cour...
A longitudinal multidimensional analysis of EAP writing  Determining EAP cour...A longitudinal multidimensional analysis of EAP writing  Determining EAP cour...
A longitudinal multidimensional analysis of EAP writing Determining EAP cour...Sarah Marie
 
USING OBJECTIVE WORDS IN THE REVIEWS TO IMPROVE THE COLLOQUIAL ARABIC SENTIME...
USING OBJECTIVE WORDS IN THE REVIEWS TO IMPROVE THE COLLOQUIAL ARABIC SENTIME...USING OBJECTIVE WORDS IN THE REVIEWS TO IMPROVE THE COLLOQUIAL ARABIC SENTIME...
USING OBJECTIVE WORDS IN THE REVIEWS TO IMPROVE THE COLLOQUIAL ARABIC SENTIME...ijnlc
 
The impact of personality traits on the affective category of english languag...
The impact of personality traits on the affective category of english languag...The impact of personality traits on the affective category of english languag...
The impact of personality traits on the affective category of english languag...Dr. Seyed Hossein Fazeli
 
Factors Responsible for Poor English Reading Comprehension at Secondary Level
Factors Responsible for Poor English Reading Comprehension at Secondary LevelFactors Responsible for Poor English Reading Comprehension at Secondary Level
Factors Responsible for Poor English Reading Comprehension at Secondary LevelBahram Kazemian
 
Corpus based research on the development of theme choices in chinese learners...
Corpus based research on the development of theme choices in chinese learners...Corpus based research on the development of theme choices in chinese learners...
Corpus based research on the development of theme choices in chinese learners...Alexander Decker
 
Writing Proficiency of Junior Bachelor of Secondary Education (BSED) and Bach...
Writing Proficiency of Junior Bachelor of Secondary Education (BSED) and Bach...Writing Proficiency of Junior Bachelor of Secondary Education (BSED) and Bach...
Writing Proficiency of Junior Bachelor of Secondary Education (BSED) and Bach...Christine May Petajen-Brillantes
 
A Comparison Of Freshman And Sophomore EFL Students Written Performance Thro...
A Comparison Of Freshman And Sophomore EFL Students  Written Performance Thro...A Comparison Of Freshman And Sophomore EFL Students  Written Performance Thro...
A Comparison Of Freshman And Sophomore EFL Students Written Performance Thro...Bryce Nelson
 
The correlation between vocabulary mastery and reading comprehension
The correlation between vocabulary mastery and reading comprehensionThe correlation between vocabulary mastery and reading comprehension
The correlation between vocabulary mastery and reading comprehensionfitri_international_office
 
The Effect of Schema Theory on Reading Comprehension
The Effect of Schema Theory on Reading ComprehensionThe Effect of Schema Theory on Reading Comprehension
The Effect of Schema Theory on Reading ComprehensionDhe Dhe Sulistio
 

Similar to Inter rater agreement study on readability assessment in bengali (20)

An Evaluation of the New Interchange Series
An Evaluation of the New Interchange SeriesAn Evaluation of the New Interchange Series
An Evaluation of the New Interchange Series
 
A Comparative Study of American English File and New Headway English Course
A Comparative Study of American English File and New Headway English CourseA Comparative Study of American English File and New Headway English Course
A Comparative Study of American English File and New Headway English Course
 
An Analysis Of The Cohesion And Coherence Of The Students Narrative Writings...
An Analysis Of The Cohesion And Coherence Of The Students  Narrative Writings...An Analysis Of The Cohesion And Coherence Of The Students  Narrative Writings...
An Analysis Of The Cohesion And Coherence Of The Students Narrative Writings...
 
Reading fluency as an indicator of reading comprehension
Reading fluency as an indicator of reading comprehensionReading fluency as an indicator of reading comprehension
Reading fluency as an indicator of reading comprehension
 
B240715
B240715B240715
B240715
 
Measuring English Language Self-Efficacy: Psychometric Properties and Use
Measuring English Language Self-Efficacy: Psychometric Properties and UseMeasuring English Language Self-Efficacy: Psychometric Properties and Use
Measuring English Language Self-Efficacy: Psychometric Properties and Use
 
Psycholinguistic Analysis of Topic Familiarity and Translation Task Effects o...
Psycholinguistic Analysis of Topic Familiarity and Translation Task Effects o...Psycholinguistic Analysis of Topic Familiarity and Translation Task Effects o...
Psycholinguistic Analysis of Topic Familiarity and Translation Task Effects o...
 
English reading strategic instructions effectiveness on reading comprehension
English reading strategic instructions effectiveness on reading comprehensionEnglish reading strategic instructions effectiveness on reading comprehension
English reading strategic instructions effectiveness on reading comprehension
 
JOURANA ARTICLE REVIEW PRESENTATION - Copy.pptx
JOURANA ARTICLE REVIEW PRESENTATION - Copy.pptxJOURANA ARTICLE REVIEW PRESENTATION - Copy.pptx
JOURANA ARTICLE REVIEW PRESENTATION - Copy.pptx
 
A longitudinal multidimensional analysis of EAP writing Determining EAP cour...
A longitudinal multidimensional analysis of EAP writing  Determining EAP cour...A longitudinal multidimensional analysis of EAP writing  Determining EAP cour...
A longitudinal multidimensional analysis of EAP writing Determining EAP cour...
 
USING OBJECTIVE WORDS IN THE REVIEWS TO IMPROVE THE COLLOQUIAL ARABIC SENTIME...
USING OBJECTIVE WORDS IN THE REVIEWS TO IMPROVE THE COLLOQUIAL ARABIC SENTIME...USING OBJECTIVE WORDS IN THE REVIEWS TO IMPROVE THE COLLOQUIAL ARABIC SENTIME...
USING OBJECTIVE WORDS IN THE REVIEWS TO IMPROVE THE COLLOQUIAL ARABIC SENTIME...
 
The impact of personality traits on the affective category of english languag...
The impact of personality traits on the affective category of english languag...The impact of personality traits on the affective category of english languag...
The impact of personality traits on the affective category of english languag...
 
Factors Responsible for Poor English Reading Comprehension at Secondary Level
Factors Responsible for Poor English Reading Comprehension at Secondary LevelFactors Responsible for Poor English Reading Comprehension at Secondary Level
Factors Responsible for Poor English Reading Comprehension at Secondary Level
 
S.nazari bibliography
S.nazari bibliographyS.nazari bibliography
S.nazari bibliography
 
Thesis malyn
Thesis malynThesis malyn
Thesis malyn
 
Corpus based research on the development of theme choices in chinese learners...
Corpus based research on the development of theme choices in chinese learners...Corpus based research on the development of theme choices in chinese learners...
Corpus based research on the development of theme choices in chinese learners...
 
Writing Proficiency of Junior Bachelor of Secondary Education (BSED) and Bach...
Writing Proficiency of Junior Bachelor of Secondary Education (BSED) and Bach...Writing Proficiency of Junior Bachelor of Secondary Education (BSED) and Bach...
Writing Proficiency of Junior Bachelor of Secondary Education (BSED) and Bach...
 
A Comparison Of Freshman And Sophomore EFL Students Written Performance Thro...
A Comparison Of Freshman And Sophomore EFL Students  Written Performance Thro...A Comparison Of Freshman And Sophomore EFL Students  Written Performance Thro...
A Comparison Of Freshman And Sophomore EFL Students Written Performance Thro...
 
The correlation between vocabulary mastery and reading comprehension
The correlation between vocabulary mastery and reading comprehensionThe correlation between vocabulary mastery and reading comprehension
The correlation between vocabulary mastery and reading comprehension
 
The Effect of Schema Theory on Reading Comprehension
The Effect of Schema Theory on Reading ComprehensionThe Effect of Schema Theory on Reading Comprehension
The Effect of Schema Theory on Reading Comprehension
 

Recently uploaded

Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 

Recently uploaded (20)

Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 

Inter rater agreement study on readability assessment in bengali

  • 1. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.3, June 2014 10.5121/ijnlc.2014.3303 25 INTER-RATER AGREEMENT STUDY ON READABILITY ASSESSMENT IN BENGALI Shanta Phani1 , Shibamouli Lahiri2 and Arindam Biswas3 1 Information Technology, IIEST, Shibpur,Howrah 711103, West Bengal, India 2 Computer Science and Engineering, University of North Texas, Denton, TX 76207, USA 3 Information Technology, IIEST, Shibpur,Howrah 711103, West Bengal, India ABSTRACT An inter-rater agreement study is performed for readability assessment in Bengali. A 1-7 rating scale was used to indicate different levels of readability. We obtained moderate to fair agreement among seven independent annotators on 30 text passages written by four eminent Bengali authors. As a by product of our study, we obtained a readability-annotated ground truth dataset in Bengali. KEYWORDS Readability, Annotator, Inter-annotator agreement, Correlation Coefficient 1. INTRODUCTION Readability refers to the ease with which a given piece of natural language text can be read and understood. Intuitively, readability emerges from an interaction between the reader and the text, and depends on the prior knowledge of the reader, his/her reading skills, interest, and motivation [23]. Although it may seem that automatic assessment of readability would be a very complicated process, as it turns out, fairly effective readability scoring can be achieved by means of several lowlevel features. Readability has many important applications, such as assessing the quality of student essays (one of the original applications of readability scoring), designing educational materials for schoolchildren and second-language learners, moderating newspaper content to convey information more clearly and effectively, and standardizing the language-learning experience of different age groups. Readability (“reading ease”) and its converse – reading difficulty – are associated with different grade levels in school. It is generally observed that students from higher grade levels can write and comprehend texts with greater reading difficulty than students from lower grade levels. A lot of studies in readability therefore focused on correlating readability scores with grade levels, or even predicting grade levels from readability-oriented features. Existing methods of readability assessment look into a handful of low-level signals such as average sentence length (ASL), average word length in syllables (AWL), percentage of difficult words, and number of polysyllabic words. Early studies used word frequency lists to identify difficult words. Recently, readability evaluation has been tackled as a supervised machine learning problem [12], [17], [22]. There have been many different studies on readability assessment in English (cf. Section 2). Bengali has received much less attention owing to inadequate resources and a lack of robust
  • 2. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.3, June 2014 26 natural language processing tools. It is only very recently that some groups of researchers looked into readability assessment in Bengali. They observed that English readability formulas did not work well on Bengali texts [11], [21]. This observation is not surprising, because Bengali is very different than English. Bengali is a highly inflected language, follows subject-object-verb ordering in sentences, and has a rich morphology. Further, Bengali shows word compounding and diglossia, i.e. formal and informal language variants (sadhu bhasha and cholit bhasha). All these factors complicate readability scoring in Bengali. Since the concept of readability is highly subjective and reader-dependent, it is necessary to find out how much two native Bengali speakers agree on the readability level of a piece of text. Generalizing from there, we performed an inter-rater agreement study on readability assessment in Bengali. This study not only enables us to see how much human annotators agree on readability assessment, but also shows how difficult it is for humans to assign consistent readability scores. Since Bengali is very different than English, we want to see if (and how) readability is affected by the peculiarities of the language. As a by-product of this study, we obtained a human-annotated gold standard dataset for readability evaluation in Bengali. The rest of this paper is organized as follows. We briefly discuss related studies in Section 2, followed by a discussion of our dataset and annotation scheme in Section 3. Experimental results are described in Section 4, along with their explanation and observations. Section 5 concludes the paper with contributions, limitations, and further research directions. Table 2. Mean Readability Rating and Standard Deviation of 30 Text Passages Mean Rating Standard Deviation 2.0 0.52 2 0.75 2 1.26 2 1.17 3 0.98 4 0.75 2 1.10 4 1.47 4 1.26 4 1.05 5 1.05 3 0.98 3 1.33 4 1.52 3 0.75 4 1.60 4 1.50 4 1.72 3 1.03 5 1.76 4 1.17 3 1.03 5 1.75 4 1.67 3 1.55 3 0.84 6 1.50
  • 3. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.3, June 2014 27 5 1.50 5 1.17 4 1.03 2. RELATED WORK Readability scoring in English has a long and rich history, starting with the work of L. A. Sherman in the late nineteenth century [20]. Among the early readability formulas were Flesch Reading Ease [7], Dale-Chall Formula [5], Automated Readability Index [19], Gunning Fog Index [9], SMOG score [16], and Coleman-Liau Index [2]. These early indices were based on simple features like average number of characters, words, syllables and sentences, number of difficult and polysyllabic words, etc. Albeit simple, these readability indices were surprisingly good predictors of a reader’s grade level. Two different lines of work focused on children and adult readability formulas. Recently Lahiri et al. showed moderate correlation between readability indices and formality score ([10]) in four different domains [14]. Sinha et al. classified English readability formulas into three broad categories – traditional methods, cognitively motivated methods, and machine learning methods [21]. Traditional methods assess readability using surface features and shallow linguistic features such as the ones mentioned in the preceding paragraph. Cognitively motivated methods take into account the cohesion and coherence of text, its latent topic structure, Kintsch’s propositions, etc [1], [8], [13]. Finally, machine learning methods utilize sophisticated structures such as language models [3], [4], [18], query logs [15], and several other features to predict the readability of open- domain text data. There are very few studies on readability assessment in Bengali texts. We found only three lines of work that specifically looked into Bengali readability [6], [11], [21]. Das and Roychoudhury worked with a miniature model of two parameters in their pioneering study [6]. They found that the two-parameter model was a better predictor of readability than the one-parameter model. Note, however, that Das and Roychoudhury’s corpus was small (only seven documents), thereby calling into question the validity of their results. Sinha et al. alleviated these problems by considering six parameters instead of just two [21]. They further showed that English readability indices were inadequate for Bengali, and built their own readability model on 16 texts. Around the same time, Islam et al. independently reached the same conclusion [11]. They designed a Bengali readability classifier on lexical and information-theoretic features, resulting in an F-score 50% higher than that from traditional scoring approaches. While all the above studies are very important and insightful, none of them explicitly performed an inter-rater agreement study. For reasons mentioned in Section 1, an inter-rater agreement study is very important when we talk about readability assessment. Further, none of these studies made available their readability-annotated gold standard datasets, thereby stymieing further research. We attempt to bridge these gaps in our work. 3. METHODOLOGY We collected a corpus of 30 Bengali text passages. The passages were randomly selected from the writings of four eminent Bengali authors – Rabindranath Tagore (1861-1941), Sarat Chandra Chattopadhyay (1876-1938), Bankim Chandra Chattopadhyay (1838-1894), and Bibhutibhushan Bandyopadhyay (1894-1950). We ensured that samples from both sadhu bhasha as well as cholit bhasha were incorporated in our corpus. We also ensured that we had both adult text as well as children’s text in the mix. The number of passages from different authors is shown in Table 1.
  • 4. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.3, June 2014 28 Table 1 also shows the number of passages in sadhu bhasha and cholit bhasha. Note that there are almost twice as many passages in sadhu bhasha as in cholit bhasha. Table 1: Number of Bengali text passages from different Authors, and from two different bengali language forms: Sadhu bhasha and Cholit bhasha Author Sadhu Bhasha Chalit Bhasha Total Rabindranath Tagore 8 4 12 Sarat Chandra Chattopadhyay 6 3 9 Bankim Chandra Chattopadhyay 6 0 6 Bibhutibhusan Bandopadhyay 1 2 3 Total 21 9 30 We assigned the 30 text passages to seven independent annotators. The annotators were 30 to 35 years of age; they were from a similar educational background and socio-economic milieu; there were four female and three male annotators; and they all were native speakers of Bengali. Annotators were asked to assign a readability rating to each of the 30 passages. The rating scale was as follows: 1) Very easy to read 2) Easy to read 3) Somewhat easy to read 4) In-between 5) Somewhat difficult to read 6) Difficult to read 7) Very difficult to read This rating scale reflects the fact that readability is not a binary/ternary variable; it is an ordinal variable. We further collected the data on whether the annotators were avid readers of Bengali or not. Each annotator rated every passage. Note that readability annotation in Bengali is challenging because passages written in sadhu bhasha tend to be harder to read than those written in cholit bhasha. Since our dataset contains both sadhu bhasha and cholit bhasha, maintaining consistency in readability rating becomes a big issue. 4. RESULTS Table 2 gives the mean readability rating of the 30 text passages, along with their standard deviations. These ratings are averages over seven independent annotations. Note from Table 2 that none of the mean ratings is 1 or 7. In other words, mean ratings never reach the extreme readability values. This phenomenon is known as the central tendency bias. Note also that the standard deviations are not very high, which should be intuitive because the rating scale varies between 1 and 7.Agreement among the annotators was measured by Cohen’s kappa (κ) and Spearman’s rank correlation coefficient (ρ). Table 3 shows the pairwise κ values among different annotators, and Table 4 gives the pairwise ρ values. Both tables are symmetric around the main diagonal. Note from Table 3 that 11 out of 21 κ values fall within the range [0:5; 0:8]. Table 4 shows that 13 out of 21 ρ values are within the range [0:5; 0:8], and one ρ value is greater than 0.8. This indicates moderate to fair agreement among different annotators. This observation in turn indicates that human annotators agree pretty well on Bengali readability scoring. Table 3. Cohen’s Kappa Between Different Annotators
  • 5. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.3, June 2014 29 Annotato r1 Annotato r2 Annotato r3 Annotato r4 Annotato r5 Annotato r6 Annotato r7 Annotat or1 1.00 0.51 0.26 0.60 0.12 0.51 0.50 Annotat or2 0.51 1.00 0.45 0.64 0.12 0.56 0.53 Annotat or3 0.26 0.45 1.00 0.43 -0.03 0.49 0.54 Annotat or4 0.60 0.64 0.43 1.00 0.09 0.58 0.74 Annotat or5 0.12 0.12 -0.03 0.09 1.00 0.13 0.02 Annotat or6 0.51 0.56 0.49 0.58 0.13 1.00 0.57 Annotat or7 0.50 0.53 0.54 0.74 0.02 0.57 1.00 Table 4. Spearman Rank Correlation between Different Annotators Annotato r1 Annotato r2 Annotato r3 Annotato r4 Annotato r5 Annotato r6 Annotato r7 Annotato r1 1.00 0.62 0.34 0.70 0.16 0.60 0.60 Annotato r2 0.62 1.00 0.52 0.75 0.14 0.65 0.63 Annotato r3 0.34 0.52 1.00 0.53 -0.04 0.59 0.64 Annotato r4 0.70 0.75 0.53 1.00 0.12 0.68 0.83 Annotato r5 0.16 0.14 -0.04 0.12 1.00 0.15 0.02 Annotato r6 0.60 0.65 0.59 0.68 0.15 1.00 0.66 Annotato r7 0.60 0.63 0.64 0.83 0.02 0.66 1.00 5. CONCLUSION We performed an inter-rater agreement study for readability assessment in Bengali. This is the first time such an agreement study has been performed. We obtained moderate to fair agreement among seven independent annotators on 30 text passages written by four eminent Bengali authors. As a byproduct of this study, we obtained a gold standard human annotated readability dataset for Bengali. We plan to release this dataset for future research. We are working on readability modeling in Bengali, and this dataset will be very helpful. An important limitation of our study is the small corpus size. We only have 30 annotated passages at our disposal, whereas Islam et al. [11] had around 300. But Islam et al.’s dataset is not annotated in as fine-grained a fashion as ours. Note also that our dataset is larger than both Sinha et al.’s 16document dataset [21], and Das and Roychoudhury’s seven document dataset [6]. We plan to increase the size of our dataset in future.
  • 6. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.3, June 2014 30 REFERENCES [1] B. K. Britton and S. Gulg oz, “Using Kintsch’s computational model to improve instructional text: Effects of repairing inference calls on recall and cognitive structures,” Journal of Educational Psychology, vol. 83, no. 3, p. 329, 1991. [2] M. Coleman and T. L. Liau, “A computer readability formula designed for machine scoring,” Journal of Applied Psychology, vol. 60, no. 2, p. 283, 1975. [3] K. Collins-Thompson and J. P. Callan, “A Language Modeling Approach to Predicting Reading Difficulty,” in HLT-NAACL, 2004, pp. 193–200. [4] K. Collins-Thompson and J. Callan, “Predicting reading difficulty with statistical language models,” Journal of the American Society for Information Science and Technology, vol. 56, no. 13, pp. 1448– 1462, 2005. [5] E. Dale and J. S. Chall, “A formula for predicting readability,” Educational research bulletin, pp. 11– 28, 1948. [6] S. Das and R. Roychoudhury, “Readability Modelling and Comparison of One and Two Parametric Fit: A Case Study in Bangla,” Journal of Quantitative Linguistics, vol. 13, no. 1, pp. 17– 34, 2006. [Online]. Available: http://www.tandfonline.com/doi/abs/10.1080/09296170500500843 [7] R. Flesch, “A new readability yardstick,” Journal of applied psychology, vol. 32, no. 3, p. 221, 1948. [8] P. W. Foltz, W. Kintsch, and T. K. Landauer, “The measurement of textual coherence with latent semantic analysis,” Discourse processes, vol. 25, no. 2-3, pp. 285–307, 1998. [9] R. Gunning, The technique of clear writing. McGraw-Hill New York, 1968. [10] F. Heylighen and J.-M. Dewaele, “Formality of Language: definition, measurement and behavioral determinants,” Interner Bericht, Center “Leo Apostel”, Vrije Universiteit Brussel, 1999. [11] Z. Islam, A. Mehler, and R. Rahman, “Text Readability Classification of Textbooks of a Low- Resource Language,” in Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation. Bali,Indonesia: Faculty of Computer Science, Universitas Indonesia, November 2012, pp. 545–553. [Online]. Available: http://www.aclweb.org/anthology/Y12-1059 [12] R. J. Kate, X. Luo, S. Patwardhan, M. Franz, R. Florian, R. J. Mooney, S. Roukos, and C. Welty, “Learning to predict readability using diverse linguistic features,” in Proceedings of the 23rd International Conference on Computational Linguistics. Association for Computational Linguistics, 2010, pp. 546–554. [13] W. Kintsch and T. A. Van Dijk, “Toward a model of text comprehension and production,” Psychological review, vol. 85, no. 5, p. 363, 1978. [14] S. Lahiri, P. Mitra, and X. Lu, “Informality Judgment at Sentence Level and Experiments with Formality Score,” in Proceedings of the 12th International Conference on Computational Linguistics and Intelligent Text Processing - Volume Part II, ser. CICLing’11. Berlin, Heidelberg: Springer- Verlag, 2011, pp. 446–457. [Online]. Available: http://dl.acm.org/citation.cfm?id=1964750.1964792 [15] X. Liu, W. B. Croft, P. Oh, and D. Hart, “Automatic recognition of reading levels from user queries,” in Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 2004, pp. 548–549. [16] G. H. McLaughlin, “SMOG grading: A new readability formula,” Journal of reading, vol. 12, no. 8, pp. 639–646, 1969. [18] S. E. Schwarm and M. Ostendorf, “Reading level assessment using support vector machines and statistical language models,” in Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, 2005, pp. 523–530. [19] R. J. Senter and E. A. Smith, “Automated readability index,” DTIC Document, Tech. Rep., 1967. [20] L. A. Sherman, Analytics of literature: A manual for the objective study of English prose and poetry. Ginn, 1893. [21] M. Sinha, S. Sharma, T. Dasgupta, and A. Basu, “New Readability Measures for Bangla and Hindi Texts,” in COLING (Posters), 2012, pp. 1141–1150. [22] T. vor der Bruck, S. Hartrumpf, and H. Helbig, “A Readability Checker with Supervised Learning Using Deep Indicators,” Informatica (Slovenia), vol. 32, no. 4, pp. 429–435, 2008. [23] Wikipedia, “Readability — Wikipedia, The Free Encyclopedia,” 2004, [Online; accessed 1-January- 2014]. [Online]. Available: http: //en.wikipedia.org/wiki/Readability
  • 7. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.3, June 2014 31 Authors Shanta Phani: She is a PhD student in IIEST, Kolkata. She has done her M.Tech in Computer Science and Engineering, from West Bengal University of Technology, in 2009. Her research interests are in image processing, Natural Language Processing, especially Authorship Attribution, style identification, etc for Bengali language. Shibamouli Lahiri: He is currently a PhD student at University of North Texas. He completed his Master of Engineering from Pennsylvania State University in 2012. His primary research interests are in Natural Language Processing, Data Mining and Machine Learning. He has worked in many different problems in NLP, including but not limited to keyword extraction, multi-document summarization, formality analysis, authorship attribution, and native language identification. His research involves the application of machine learning and complex network analysis techniques to different problems in Natural Language Processing. Dr. Arindam Biswas: He has done is doctors from ISI Kolkata. He has over 50 published papers. He works in the area of Digital Geometry, Medical Image Analysis, and Natural Language Processing.