MADAD: A Readability Annotation Tool for
Arabic Text
Road Map
• What is text readability?
– The need.
• MADAD
– Base functionality.
– Illustration of Text Annotation.
– What do we have for annotating Arabic text?
– MADAD Architecture.
– MADAD Annotation Function.
– Live demo: anlp.ksu.edu.sa/madad
• Conclusion
What is readability?
Text Readability
• Degree to which a text can be understood (Klare, 2000).
• Readability is a way of measuring how hard a text is to read.
• The sum of all elements in textual material that affect a reader's
understanding (graphical aspects or linguistic variables [semantic or syntactic]).
The need
• Establishing a well-defined standard for readability measurements is
difficult.
– The diversity of the reading audience.
– The reading material differs across studies.
• To train and test a readability prediction model, a gold-standard
training corpus is used.
– Each text is assigned a readability level by expert human annotators.
• There is a lack of readability training datasets (especially for Arabic).
Solution:
• Providing an online environment to collect readability assessments
on various kinds of corpora.
Who uses readability measurements
and what for?
• Teachers in selecting reading material.
• Calibrating public health information (medical instructions,
online resources).
• Producing effective product guides.
• Creating informative websites and forms for critical government
services.
How to measure readability of text?
Automated Readability Assessment
• Step 1: construct a gold-standard training corpus.
• Step 2: define a set of features to be computed from the text.
• Step 3: a machine learning model learns to predict the gold-standard
label from the extracted features.
• Step 4: the optimized model is applied to an unseen subset of the
corpus (the test set). A minimal sketch of these steps follows.
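To make the pipeline concrete, here is a hedged sketch of steps 2 to 4 in Python with scikit-learn; the two surface features and the toy corpus are illustrative assumptions, not MADAD's actual feature set or data.

```python
# Minimal sketch of the automated readability pipeline (steps 2-4).
# Features and labels are toy examples, not MADAD's actual design.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def extract_features(text):
    """Step 2: compute simple surface features from a text."""
    sentences = [s for s in text.split(".") if s.strip()]
    words = text.split()
    avg_sentence_len = len(words) / max(len(sentences), 1)
    avg_word_len = sum(len(w) for w in words) / max(len(words), 1)
    return [avg_sentence_len, avg_word_len]

# Step 1: gold-standard corpus of (text, readability label) pairs.
corpus = [("A short simple sentence. Another one.", "easy"),
          ("An unusually convoluted formulation exhibiting considerable "
           "lexical sophistication and length.", "difficult")] * 10

X = [extract_features(text) for text, _ in corpus]
y = [label for _, label in corpus]

# Step 4: hold out an unseen subset of the corpus as the test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

# Step 3: learn to predict the gold-standard label from the features.
model = LogisticRegression().fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```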
Traditional readability formulas, by contrast, rely on shallow features:
• Semantic units (words or phrases).
• Complexity of syntax.
• They do not have enough features to provide maximal accuracy.
• Example: the Flesch Reading Ease formula (or a word frequency list),
which maps a text to a single text readability score.
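Since the speaker notes give the Flesch formula explicitly, it can be sketched directly. Note that the formula was designed for English, and the naive vowel-group syllable counter below is a rough stand-in for a real one.

```python
# Flesch Reading Ease, as given in the speaker notes:
#   score = 206.835 - (1.015 * ASL) - (84.6 * ASW)
# ASL = words / sentences; ASW = syllables / words.
# The vowel-group syllable counter is a rough approximation.
def count_syllables(word):
    vowels = "aeiouy"
    groups = 0
    prev_is_vowel = False
    for ch in word.lower():
        is_vowel = ch in vowels
        if is_vowel and not prev_is_vowel:
            groups += 1
        prev_is_vowel = is_vowel
    return max(groups, 1)

def flesch_reading_ease(text):
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    words = [w.strip(".,;:!?") for w in text.split()]
    asl = len(words) / max(len(sentences), 1)
    asw = sum(count_syllables(w) for w in words) / max(len(words), 1)
    return 206.835 - 1.015 * asl - 84.6 * asw

print(flesch_reading_ease("The cat sat on the mat. It was happy."))
```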
Madad (مَدَد)
• In Arabic, madad is that by which something is supplied, increased, and made abundant.
• "Say: If the sea were ink for the words of my Lord, the sea would surely be exhausted before the words of my Lord were exhausted, even if We brought the like of it as replenishment (madadan)." (Al-Kahf 109)
• The tool, in this sense, supplies the text with meanings and linguistic additions.
MADAD base functionality
Corpus annotation
• The practice of adding interpretative linguistic information to an electronic corpus
(Garside et al., 1997).
• Types of annotation: structural, syntactic, sentiment, and readability annotation.
Illustration of Text Annotation
What do we have for annotating
Arabic text?
• Existing Arabic annotation tools target tasks such as: semantic annotation
(Saleh & Al-Khalifa, 2009); dialect annotation (Benajiba & Diab, 2010);
morphological, POS-tag, phonetic, and semantic annotation (Attia et al., 2009;
El-ghobashy et al., 2014; Al-Shargi & Rambow, 2015); and Arabic error
correction (Zaghouani et al., 2014).
• Most of these tools are designed for a specific NLP task.
• MADAD, by contrast, supports a broad range of annotation tasks for various
linguistic and semantic phenomena by allowing users to create their own
customized annotation schemes.
MADAD Architecture
MADAD
List of Functions
1. Creating a corpus (Manager).
2. Creating a task (Manager): schema-oriented, defined in the XML Schema language.
3. Annotating text (Annotator): readability annotation in two modes, pairwise
comparisons and direct evaluation.
4. Ending an annotation task (Manager).
5. Evaluating a task (Manager): computing agreement (Cohen's kappa) and
adjudicating disagreements.
6. Exporting the annotated corpus (Manager).
MADAD
http://anlp.ksu.edu.sa/madad/
1-Schema-oriented annotation
• Users define their own schemas; annotation tasks are not hard-coded
in the tool.
• MADAD uses the XML Schema language standardized by the W3C
for its schema definitions.
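As an illustration only (the slides do not show MADAD's actual schema format), a task manager might define a readability annotation type with a minimal XML Schema and validate submitted annotations against it. The element and attribute names below are hypothetical, and the lxml dependency is an assumption.

```python
# Hypothetical sketch: validating a readability annotation against a
# user-defined XML Schema. Element/attribute names are invented for
# illustration; MADAD's real schemas may differ. Requires lxml.
from lxml import etree

SCHEMA = b"""<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="readability">
    <xs:complexType>
      <xs:attribute name="level" use="required">
        <xs:simpleType>
          <xs:restriction base="xs:integer">
            <xs:minInclusive value="0"/>
            <xs:maxInclusive value="100"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = etree.XMLSchema(etree.fromstring(SCHEMA))
annotation = etree.fromstring(b'<readability level="40"/>')
print(schema.validate(annotation))  # True: level is within 0-100
```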
2-MADAD readability annotation
1) Comparison method (pairwise comparisons
between the texts).
2) Direct evaluation method.
MADAD readability annotation
(1) Pairwise comparisons
• Two texts are displayed side by side, and the annotator selects a comparison
statement defined by the task manager (e.g., "Left text much easier").
• The text pair and its assessment are stored, and two new randomly selected
texts are shown.
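The slides do not say how MADAD aggregates these judgments; as a hedged illustration under that assumption, pairwise outcomes can be reduced to a difficulty ranking by simple win counting:

```python
# Illustrative aggregation of pairwise readability judgments into a
# ranking by win counts. MADAD's actual aggregation is not specified
# in the slides; this is one simple possibility.
from collections import Counter

# Each judgment: (easier_text_id, harder_text_id).
judgments = [("t1", "t2"), ("t1", "t3"), ("t2", "t3")]

wins = Counter(harder for _, harder in judgments)  # "harder than" votes
ranking = sorted({t for pair in judgments for t in pair},
                 key=lambda t: wins[t])  # fewest "harder" votes = easiest
print(ranking)  # ['t1', 't2', 't3']: easiest to hardest
```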
MADAD readability annotation
(2) Direct evaluation method
• The Annotation Manager defines the scale range for text difficulty.
• The default range is 0 (easy) to 100 (difficult).
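A hedged sketch of this mode: the manager-defined scale (default 0 to 100, per the slide) bounds each submitted score. The function and record layout are illustrative, not MADAD's API.

```python
# Hypothetical sketch of the direct evaluation mode: the annotation
# manager defines the difficulty scale (default 0 = easy to
# 100 = difficult, per the slide); scores are checked on submission.
def record_score(score, scale=(0, 100)):
    lo, hi = scale
    if not lo <= score <= hi:
        raise ValueError(f"score {score} outside scale {lo}-{hi}")
    return {"difficulty": score, "scale": scale}

print(record_score(35))               # default 0-100 scale
print(record_score(4, scale=(1, 5)))  # manager-defined 1-5 scale
```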
The validity of the manual annotation
• Validity: whether the annotated categories are correct.
• But there is no "ground truth":
– Linguistic categories are determined by human judgment.
– Consequence: we cannot measure correctness directly; instead we measure
the reliability of the annotation.
• i.e., whether human annotators consistently make the same
decisions.
• Assumption: high reliability implies validity.
• How can reliability be determined?
Annotation management interface
• Calculates inter-annotator agreement (IAA) using the kappa coefficient
to identify the level of agreement between annotators.
• View and resolve differences in annotated texts.
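Cohen's kappa corrects raw agreement for agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the chance agreement. A self-contained sketch follows; the two annotators' label lists are made up for illustration, not real MADAD data.

```python
# Cohen's kappa for two annotators: kappa = (p_o - p_e) / (1 - p_e),
# where p_o is observed agreement and p_e is chance agreement.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

a = ["easy", "easy", "hard", "hard", "easy", "hard"]
b = ["easy", "hard", "hard", "hard", "easy", "easy"]
print(round(cohens_kappa(a, b), 3))  # 0.333: fair agreement
```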
Export annotated corpus
• The task manager can export the annotated corpus as an XML file.
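The slides do not specify the export format, so the document structure below is hypothetical; the sketch uses only the Python standard library.

```python
# Hypothetical sketch of exporting an annotated corpus to XML with the
# standard library; MADAD's actual export format is not specified.
import xml.etree.ElementTree as ET

corpus = [{"id": "doc1", "text": "Sample text.", "readability": 35},
          {"id": "doc2", "text": "Harder sample.", "readability": 80}]

root = ET.Element("corpus")
for doc in corpus:
    el = ET.SubElement(root, "document", id=doc["id"],
                       readability=str(doc["readability"]))
    el.text = doc["text"]

ET.ElementTree(root).write("annotated_corpus.xml",
                           encoding="utf-8", xml_declaration=True)
```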
Conclusion
• Gold-standard annotations are a prerequisite for evaluating
state-of-the-art tools for most NLP tasks.
• MADAD is a generic, web-based framework for collaborative
Arabic text annotation.
• To gauge the effectiveness of the annotation process:
– we will compare MADAD with the available general-purpose annotation
tools according to an evaluation framework derived from the annotation
tool evaluation criteria of Dipper et al. (2004).
• Quality Control in Crowdsourcing:
– Gold standard to be used as test questions.
– Rating system.
MADAD Team
Dr. Hend Al-Khalifa, Dr. Maha Al-Yahya, Dr. Sinaa Alageel, Dr. Nora Abanmy,
Nora AlTwairesh, and Abeer Al-Dayel
Questions and Answers
Presented by
Abeer Al-Dayel
Editor's Notes

  • #5 Assessing text readability is a long-established problem which aims to grade the difficulty or ease of a text. Determining the readability level is an important measurement for specifying the possible audience of text materials and for evaluating the impact on readers. Measurement has usually been approached through the use of formulas, which take into account the length of words as well as the length of sentences. Example: Flesch Reading Ease score = 206.835 − (1.015 × ASL) − (84.6 × ASW), where ASL = average sentence length (number of words divided by number of sentences) and ASW = average word length in syllables (number of syllables divided by number of words).
  • #6 Collecting readability assessments requires recruiting people from different backgrounds to evaluate text readability. Establishing a well-defined standard for readability measurements is difficult (the reading audience and the reading material differ across studies), and there is a lack of Arabic readability training data. To train and test the readability prediction model, a gold-standard training corpus is used; in this corpus each text is assigned a readability level by expert human annotators. The solution is to provide an online environment to collect readability assessments on various kinds of corpora.
  • #7 Assessing text readability is a long-established problem which aims to grade the difficulty or ease of a text. Second language learners need automated readability programs to determine which documents they should tackle for the most effective learning experience. Finding compatibility and harmony between the reader and the text has given the topic of readability growing importance at present; interest is no longer confined to education but extends to books in general, so readability is viewed as a useful means and criterion for developing textbooks and identifying their difficulty level. Teachers use it in selecting reading material appropriate for their students' reading level, as do second language learners.
  • #8 Given the importance of text readability in meeting people's information needs, along with modern access to ever-larger volumes of information, the implications of achieving effective text readability assessment are as diverse as the uses for text itself. The ability to quantify the readability of a text is achieved through readability measures that take a text as input and estimate a numerical score or other form of prediction that indicates the level or degree of readability for a given population. With the advent of increasingly sophisticated computational methods, along with new sources of data and applications to the Web and social media, the field of automated text readability assessment has evolved significantly in the last decade, and its utility and scope across applications have increased dramatically. On the one hand, widely used traditional readability measures like Flesch-Kincaid, which estimate text readability based on simple functions of two or three linguistic variables such as syllable and word counts, have been used for decades on traditional texts. However, there is now a shift away from these simple but shallow traditional measures, in favor of data-driven, user-centric, knowledge-based computational readability assessment algorithms that use rich text representations derived from computational linguistics, combined with sophisticated prediction models from machine learning, for deeper, more accurate, and more robust analysis of text difficulty. These new approaches are dynamic and oriented towards both traditional and non-traditional texts: they can evolve automatically as vocabulary evolves, adapt to individual users or groups, and exploit the growing volume of deep knowledge and semantic resources now becoming available online. In addition, non-traditional domains like the Web and social media offer novel challenges and opportunities for new forms of content, serving broad categories of tasks and user populations. One of the easiest ways to measure readability is to use the Flesch Reading Ease formula. A word frequency list functioned as another feature for predicting readability for English: Thorndike's list contains the 10,000 most frequent words from various English documents, though Thorndike did not specify the base corpus from which he generated it. Lively and Pressey (1923) used Thorndike's list to rate the vocabulary difficulty of various documents used in elementary, middle school, and college-level U.S. education. Looking at readability as a classification problem is useful for purposes of machine learning, which is a method for classifying things automatically. In working with corpora, researchers often use machine learning to gather and classify the data into separate categories (for an in-depth understanding of machine learning see Witten et al.). Machine learning methods are gaining interest in current research on automated readability. Previous methods, i.e. traditional readability formulas, only accounted for a few features because they required human counts. In contrast, a machine learning approach is scalable to a large number of features and can accept features generated automatically from electronic documents. Machine learning performs advanced computations that account for interactions between features, and this approach is well suited for development because one can easily adjust the number of features and retest efficiently. Repeated cycles of adjustments and retests can effectively determine the optimal feature set by comparing the results of different permutations of feature sets. Additionally, traditional formulas do not have enough features to provide maximal accuracy, and were not intended to do so; rather, they were designed to provide human raters a simple approximation of the difficulty of a given document.
  • #10 http://www.almaany.com/ar/dict/ar-ar/%D9%85%D8%AF%D8%AF/
  • #11 Using MADAD, different users will be able to rank texts according to relative difficulty. Definition: annotation adds value to the raw corpus and is a crucial contribution to it. Research efforts use corpora both for building readability models and as subjects of readability predictions. Corpora are collections of documents that may be annotated with additional information, such as the syntactic structure of sentences or the part of speech of words. Researchers use the terms "type" and "token" to refer to the items being counted in a corpus, which may include words and punctuation. A token is an occurrence of any counted item in a corpus; often, the count of all the words in a given corpus is considered the token count.
  • #12 Annotation is usually done by linguistic specialists and/or simple users according to the required annotation task. That is why it is essential for an annotation tool to be intuitive with a user-friendly interface.
  • #14 MADAD provides a user-friendly interface to serve different types of users from linguistic experts to novice users. Project managers are typically in charge of defining new corpus annotation projects and their workflows, monitoring annotation progress, dealing with annotator performance issues, and carrying out annotator training. They also define the annotation guidelines, the associated schemas (or set of tags), and prepare and upload the corpus to be annotated. Managers also make methodological choices: whether to have multiple annotators per document; how many; which automatic NLP services need to be used to pre-process the data; and what is the overall workflow of annotation, quality assurance, adjudication, and corpus delivery
  • #17 Demo accounts: manager "Keerthi", password 123; annotator "Girid", password 123.
  • #20  In this method two texts will be displayed and the user is asked to compare between them based on previously defined statements like for example "Left text much easier" . After selecting the comparison statement, a text pair and its corresponding assessment statement are added to the database and two new randomly selected texts appear to the annotator.
  • #21 For the direct evaluation method, the Annotation Manager should define the scale range for the text difficulty. The default range is 0 (easy) to 100 (difficult)
  • #24 Schemas are resources just like other GATE components. Below we give some examples of such schemas; Section 3.4.6 describes how to create new schemas. Note that each schema file defines a single annotation type; however, it is possible to use include definitions in a schema to refer to other schemas in order to load a whole set of schemas as a group. The default schemas for ANNIE annotation types (defined in resources/schema in the ANNIE plugin) give an example of this technique. Annotation schemas provide a means to define types of annotations. By default GATE Developer will allow you to create any annotations in a document, whether or not there is a schema to describe them. An alternative annotation editor component is available which constrains the available annotation types and features much more tightly, based on the annotation schemas that are currently loaded. This is particularly useful when annotating large quantities of data or for use by less skilled users.
  • #25 This tool will advance the research in the Arabic text readability field, by providing a method to construct a readability assessment corpus that serves as gold standard against which new readability scoring methods can be tested. Also, the tool will provide schema-oriented annotation to be used in existing NLP tasks and new emerging tasks. This is done by giving the user the flexibility to define his/her own schema and not hard-coding the annotation tasks in the tool. MADAD also provides a user-friendly interface to serve different types of users from linguistic experts to novice users. Wray, Samantha, Hamdy Mubarak, and Ahmed Ali. "Best Practices for Crowdsourcing Dialectal Arabic Speech Transcription." ANLP Workshop 2015. 2015. http://www.aclweb.org/anthology/W15-3211