MADAD: A Readability Annotation Tool for
Arabic Text
Road Map
• What is text readability?
– The need.
• MADAD
– Base functionality.
– Illustration of Text Annotation.
– What do we have for annotating Arabic text?
– MADAD Architecture.
– MADAD Annotation Function.
– Live demo: anlp.ksu.edu.sa/madad
• Conclusion
What is readability?
Text Readability
• Degree to which a text can be understood (Klare, 2000).
• Readability is a way of measuring how hard a text is to read.
• The sum of all elements in textual material that affect a reader's
understanding (graphical aspects or linguistic variables [semantic or syntactic]).
The need
• Establishing a well-defined standard for readability measurements is
difficult.
– The diversity of the reading audience.
– The reading material differs across studies.
• To train and test a readability prediction model, a gold-standard
training corpus is used.
– Each text is assigned a readability level by expert human annotators.
• There is a lack of readability training datasets (especially for Arabic).
Solution:
• Providing an online environment to collect readability assessments
on various kinds of corpora.
Who uses readability measurements
and what for?
• Teachers in selecting reading material.
• Calibrating public health information (medical instructions,
online resources).
• Producing effective product guides.
• Creating informative websites and forms for critical government
services.
How to measure readability of text?
Automated Readability Assessment
• Step 1: construct a gold-standard training corpus.
• Step 2: define a set of features to be computed from the text.
• Step 3: a machine learning model learns to predict the gold-standard
label from the extracted features.
• Step 4: the optimized model is applied to an unseen subset of the
corpus (the test set). A minimal sketch of these steps follows.
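To make the pipeline concrete, here is a hedged sketch of steps 2 to 4 in Python with scikit-learn; the two surface features and the toy corpus are illustrative assumptions, not MADAD's actual feature set or data.

```python
# Minimal sketch of the automated readability pipeline (steps 2-4).
# Features and labels are toy examples, not MADAD's actual design.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def extract_features(text):
    """Step 2: compute simple surface features from a text."""
    sentences = [s for s in text.split(".") if s.strip()]
    words = text.split()
    avg_sentence_len = len(words) / max(len(sentences), 1)
    avg_word_len = sum(len(w) for w in words) / max(len(words), 1)
    return [avg_sentence_len, avg_word_len]

# Step 1: gold-standard corpus of (text, readability label) pairs.
corpus = [("A short simple sentence. Another one.", "easy"),
          ("An unusually convoluted formulation exhibiting considerable "
           "lexical sophistication and length.", "difficult")] * 10

X = [extract_features(text) for text, _ in corpus]
y = [label for _, label in corpus]

# Step 4: hold out an unseen subset of the corpus as the test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

# Step 3: learn to predict the gold-standard label from the features.
model = LogisticRegression().fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```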
Traditional readability formulas, by contrast, rely on shallow features:
• Semantic units (words or phrases).
• Complexity of syntax.
• They do not have enough features to provide maximal accuracy.
• Example: the Flesch Reading Ease formula (or a word frequency list),
which maps a text to a single text readability score.
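Since the speaker notes give the Flesch formula explicitly, it can be sketched directly. Note that the formula was designed for English, and the naive vowel-group syllable counter below is a rough stand-in for a real one.

```python
# Flesch Reading Ease, as given in the speaker notes:
#   score = 206.835 - (1.015 * ASL) - (84.6 * ASW)
# ASL = words / sentences; ASW = syllables / words.
# The vowel-group syllable counter is a rough approximation.
def count_syllables(word):
    vowels = "aeiouy"
    groups = 0
    prev_is_vowel = False
    for ch in word.lower():
        is_vowel = ch in vowels
        if is_vowel and not prev_is_vowel:
            groups += 1
        prev_is_vowel = is_vowel
    return max(groups, 1)

def flesch_reading_ease(text):
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    words = [w.strip(".,;:!?") for w in text.split()]
    asl = len(words) / max(len(sentences), 1)
    asw = sum(count_syllables(w) for w in words) / max(len(words), 1)
    return 206.835 - 1.015 * asl - 84.6 * asw

print(flesch_reading_ease("The cat sat on the mat. It was happy."))
```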
Madad (مَدَد)
• In Arabic, madad is that by which something is supplied, increased, and made abundant.
• "Say: If the sea were ink for the words of my Lord, the sea would surely be exhausted before the words of my Lord were exhausted, even if We brought the like of it as replenishment (madadan)." (Al-Kahf 109)
• The tool, in this sense, supplies the text with meanings and linguistic additions.
MADAD base functionality
Corpus annotation
• The practice of adding interpretative linguistic information to an electronic corpus
(Garside et al., 1997).
• Types of annotation: structural, syntactic, sentiment, and readability annotation.
Illustration of Text Annotation
What do we have for annotating
Arabic text?
• Existing Arabic annotation tools target tasks such as: semantic annotation
(Saleh & Al-Khalifa, 2009); dialect annotation (Benajiba & Diab, 2010);
morphological, POS-tag, phonetic, and semantic annotation (Attia et al., 2009;
El-ghobashy et al., 2014; Al-Shargi & Rambow, 2015); and Arabic error
correction (Zaghouani et al., 2014).
• Most of these tools are designed for a specific NLP task.
• MADAD, by contrast, supports a broad range of annotation tasks for various
linguistic and semantic phenomena by allowing users to create their own
customized annotation schemes.
MADAD Architecture
MADAD
List of Functions
1. Creating a corpus (Manager).
2. Creating a task (Manager): schema-oriented, defined in the XML Schema language.
3. Annotating text (Annotator): readability annotation in two modes, pairwise
comparisons and direct evaluation.
4. Ending an annotation task (Manager).
5. Evaluating a task (Manager): computing agreement (Cohen's kappa) and
adjudicating disagreements.
6. Exporting the annotated corpus (Manager).
MADAD
http://anlp.ksu.edu.sa/madad/
1-Schema-oriented annotation
• Users define their own schemas; annotation tasks are not hard-coded
in the tool.
• MADAD uses the XML Schema language standardized by the W3C
for its schema definitions.
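As an illustration only (the slides do not show MADAD's actual schema format), a task manager might define a readability annotation type with a minimal XML Schema and validate submitted annotations against it. The element and attribute names below are hypothetical, and the lxml dependency is an assumption.

```python
# Hypothetical sketch: validating a readability annotation against a
# user-defined XML Schema. Element/attribute names are invented for
# illustration; MADAD's real schemas may differ. Requires lxml.
from lxml import etree

SCHEMA = b"""<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="readability">
    <xs:complexType>
      <xs:attribute name="level" use="required">
        <xs:simpleType>
          <xs:restriction base="xs:integer">
            <xs:minInclusive value="0"/>
            <xs:maxInclusive value="100"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = etree.XMLSchema(etree.fromstring(SCHEMA))
annotation = etree.fromstring(b'<readability level="40"/>')
print(schema.validate(annotation))  # True: level is within 0-100
```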
2-MADAD readability annotation
1) Comparison method (pairwise comparisons
between the texts).
2) Direct evaluation method.
MADAD readability annotation
(1) Pairwise comparisons
• Two texts are displayed side by side, and the annotator selects a comparison
statement defined by the task manager (e.g., "Left text much easier").
• The text pair and its assessment are stored, and two new randomly selected
texts are shown.
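The slides do not say how MADAD aggregates these judgments; as a hedged illustration under that assumption, pairwise outcomes can be reduced to a difficulty ranking by simple win counting:

```python
# Illustrative aggregation of pairwise readability judgments into a
# ranking by win counts. MADAD's actual aggregation is not specified
# in the slides; this is one simple possibility.
from collections import Counter

# Each judgment: (easier_text_id, harder_text_id).
judgments = [("t1", "t2"), ("t1", "t3"), ("t2", "t3")]

wins = Counter(harder for _, harder in judgments)  # "harder than" votes
ranking = sorted({t for pair in judgments for t in pair},
                 key=lambda t: wins[t])  # fewest "harder" votes = easiest
print(ranking)  # ['t1', 't2', 't3']: easiest to hardest
```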
MADAD readability annotation
(2) Direct evaluation method
• The Annotation Manager defines the scale range for text difficulty.
• The default range is 0 (easy) to 100 (difficult).
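A hedged sketch of this mode: the manager-defined scale (default 0 to 100, per the slide) bounds each submitted score. The function and record layout are illustrative, not MADAD's API.

```python
# Hypothetical sketch of the direct evaluation mode: the annotation
# manager defines the difficulty scale (default 0 = easy to
# 100 = difficult, per the slide); scores are checked on submission.
def record_score(score, scale=(0, 100)):
    lo, hi = scale
    if not lo <= score <= hi:
        raise ValueError(f"score {score} outside scale {lo}-{hi}")
    return {"difficulty": score, "scale": scale}

print(record_score(35))               # default 0-100 scale
print(record_score(4, scale=(1, 5)))  # manager-defined 1-5 scale
```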
The validity of the manual annotation
• Validity: whether the annotated categories are correct.
• But there is no "ground truth":
– Linguistic categories are determined by human judgment.
– Consequence: we cannot measure correctness directly; instead we measure
the reliability of the annotation.
• i.e., whether human annotators consistently make the same
decisions.
• Assumption: high reliability implies validity.
• How can reliability be determined?
Annotation management interface
• Calculates inter-annotator agreement (IAA) using the kappa coefficient
to identify the level of agreement between annotators.
• View and resolve differences in annotated texts.
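Cohen's kappa corrects raw agreement for agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the chance agreement. A self-contained sketch follows; the two annotators' label lists are made up for illustration, not real MADAD data.

```python
# Cohen's kappa for two annotators: kappa = (p_o - p_e) / (1 - p_e),
# where p_o is observed agreement and p_e is chance agreement.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

a = ["easy", "easy", "hard", "hard", "easy", "hard"]
b = ["easy", "hard", "hard", "hard", "easy", "easy"]
print(round(cohens_kappa(a, b), 3))  # 0.333: fair agreement
```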
Export annotated corpus
• The task manager can export the annotated corpus as an XML file.
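The slides do not specify the export format, so the document structure below is hypothetical; the sketch uses only the Python standard library.

```python
# Hypothetical sketch of exporting an annotated corpus to XML with the
# standard library; MADAD's actual export format is not specified.
import xml.etree.ElementTree as ET

corpus = [{"id": "doc1", "text": "Sample text.", "readability": 35},
          {"id": "doc2", "text": "Harder sample.", "readability": 80}]

root = ET.Element("corpus")
for doc in corpus:
    el = ET.SubElement(root, "document", id=doc["id"],
                       readability=str(doc["readability"]))
    el.text = doc["text"]

ET.ElementTree(root).write("annotated_corpus.xml",
                           encoding="utf-8", xml_declaration=True)
```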
Conclusion
• Gold-standard annotations are a prerequisite for evaluating
state-of-the-art tools for most NLP tasks.
• MADAD is a generic, web-based framework for collaborative
Arabic text annotation.
• To gauge the effectiveness of the annotation process:
– we will compare MADAD with the available general-purpose annotation
tools according to an evaluation framework derived from the annotation
tool evaluation criteria of Dipper et al. (2004).
• Quality Control in Crowdsourcing:
– Gold standard to be used as test questions.
– Rating system.
MADAD Team
Dr. Hend Al-Khalifa, Dr. Maha Al-Yahya, Dr. Sinaa Alageel, Dr. Nora Abanmy,
Nora AlTwairesh, and Abeer Al-Dayel
Questions and Answers
Presented by
Abeer Al-Dayel
Editor's Notes

  • #5 Assessing text readability is a long-established problem which aims to grade the difficulty or ease of a text. Determining the readability level is an important measurement for specifying the possible audience of text materials and for evaluating the impact on readers. Measurement has usually been approached through the use of formulas, which take into account the length of words as well as the length of sentences. Example: Flesch Reading Ease score = 206.835 − (1.015 × ASL) − (84.6 × ASW), where ASL = average sentence length (number of words divided by number of sentences) and ASW = average word length in syllables (number of syllables divided by number of words).
  • #6 Collecting readability assessments requires recruiting people from different backgrounds to evaluate text readability. Establishing a well-defined standard for readability measurements is difficult (the reading audience and the reading material differ across studies), and there is a lack of Arabic readability training data. To train and test the readability prediction model, a gold-standard training corpus is used; in this corpus each text is assigned a readability level by expert human annotators. The solution is to provide an online environment to collect readability assessments on various kinds of corpora.
  • #7 Assessing text readability is a long-established problem which aims to grade the difficulty or ease of a text. Second language learners need automated readability programs to determine which documents they should tackle for the most effective learning experience. Finding compatibility and harmony between the reader and the text has given the topic of readability growing importance at present; interest is no longer confined to education but extends to books in general, so readability is viewed as a useful means and criterion for developing textbooks and identifying their difficulty level. Teachers use it in selecting reading material appropriate for their students' reading level, as do second language learners.
  • #8 Given the importance of text readability in meeting people's information needs, along with modern access to ever-larger volumes of information, the implications of achieving effective text readability assessment are as diverse as the uses for text itself. The ability to quantify the readability of a text is achieved through readability measures that take a text as input and estimate a numerical score or other form of prediction that indicates the level or degree of readability for a given population. With the advent of increasingly sophisticated computational methods, along with new sources of data and applications to the Web and social media, the field of automated text readability assessment has evolved significantly in the last decade, and its utility and scope across applications have increased dramatically. On the one hand, widely used traditional readability measures like Flesch-Kincaid, which estimate text readability based on simple functions of two or three linguistic variables such as syllable and word counts, have been used for decades on traditional texts. However, there is now a shift away from these simple but shallow traditional measures, in favor of data-driven, user-centric, knowledge-based computational readability assessment algorithms that use rich text representations derived from computational linguistics, combined with sophisticated prediction models from machine learning, for deeper, more accurate, and more robust analysis of text difficulty. These new approaches are dynamic and oriented towards both traditional and non-traditional texts: they can evolve automatically as vocabulary evolves, adapt to individual users or groups, and exploit the growing volume of deep knowledge and semantic resources now becoming available online. In addition, non-traditional domains like the Web and social media offer novel challenges and opportunities for new forms of content, serving broad categories of tasks and user populations. One of the easiest ways to measure readability is to use the Flesch Reading Ease formula. A word frequency list functioned as another feature for predicting readability for English: Thorndike's list contains the 10,000 most frequent words from various English documents, though Thorndike did not specify the base corpus from which he generated it. Lively and Pressey (1923) used Thorndike's list to rate the vocabulary difficulty of various documents used in elementary, middle school, and college-level U.S. education. Looking at readability as a classification problem is useful for purposes of machine learning, which is a method for classifying things automatically. In working with corpora, researchers often use machine learning to gather and classify the data into separate categories (for an in-depth understanding of machine learning see Witten et al.). Machine learning methods are gaining interest in current research on automated readability. Previous methods, i.e. traditional readability formulas, only accounted for a few features because they required human counts. In contrast, a machine learning approach is scalable to a large number of features and can accept features generated automatically from electronic documents. Machine learning performs advanced computations that account for interactions between features, and this approach is well suited for development because one can easily adjust the number of features and retest efficiently. Repeated cycles of adjustments and retests can effectively determine the optimal feature set by comparing the results of different permutations of feature sets. Additionally, traditional formulas do not have enough features to provide maximal accuracy, and were not intended to do so; rather, they were designed to provide human raters a simple approximation of the difficulty of a given document.
  • #10 http://www.almaany.com/ar/dict/ar-ar/%D9%85%D8%AF%D8%AF/
  • #11 Using MADAD, different users will be able to rank texts according to relative difficulty. Definition: annotation adds value to the raw corpus and is a crucial contribution to it. Research efforts use corpora both for building readability models and as subjects of readability predictions. Corpora are collections of documents that may be annotated with additional information, such as the syntactic structure of sentences or the part of speech of words. Researchers use the terms "type" and "token" to refer to the items being counted in a corpus, which may include words and punctuation. A token is an occurrence of any counted item in a corpus; often, the count of all the words in a given corpus is considered the token count.
  • #12 Annotation is usually done by linguistic specialists and/or simple users according to the required annotation task. That is why it is essential for an annotation tool to be intuitive with a user-friendly interface.
  • #14 MADAD provides a user-friendly interface to serve different types of users from linguistic experts to novice users. Project managers are typically in charge of defining new corpus annotation projects and their workflows, monitoring annotation progress, dealing with annotator performance issues, and carrying out annotator training. They also define the annotation guidelines, the associated schemas (or set of tags), and prepare and upload the corpus to be annotated. Managers also make methodological choices: whether to have multiple annotators per document; how many; which automatic NLP services need to be used to pre-process the data; and what is the overall workflow of annotation, quality assurance, adjudication, and corpus delivery
  • #17 Demo accounts: manager "Keerthi", password 123; annotator "Girid", password 123.
  • #20  In this method two texts will be displayed and the user is asked to compare between them based on previously defined statements like for example "Left text much easier" . After selecting the comparison statement, a text pair and its corresponding assessment statement are added to the database and two new randomly selected texts appear to the annotator.
  • #21 For the direct evaluation method, the Annotation Manager should define the scale range for the text difficulty. The default range is 0 (easy) to 100 (difficult)
  • #24 Schemas are resources just like other GATE components. Below we give some examples of such schemas; Section 3.4.6 describes how to create new schemas. Note that each schema file defines a single annotation type; however, it is possible to use include definitions in a schema to refer to other schemas in order to load a whole set of schemas as a group. The default schemas for ANNIE annotation types (defined in resources/schema in the ANNIE plugin) give an example of this technique. Annotation schemas provide a means to define types of annotations. By default GATE Developer will allow you to create any annotations in a document, whether or not there is a schema to describe them. An alternative annotation editor component is available which constrains the available annotation types and features much more tightly, based on the annotation schemas that are currently loaded. This is particularly useful when annotating large quantities of data or for use by less skilled users.
  • #25 This tool will advance the research in the Arabic text readability field, by providing a method to construct a readability assessment corpus that serves as gold standard against which new readability scoring methods can be tested. Also, the tool will provide schema-oriented annotation to be used in existing NLP tasks and new emerging tasks. This is done by giving the user the flexibility to define his/her own schema and not hard-coding the annotation tasks in the tool. MADAD also provides a user-friendly interface to serve different types of users from linguistic experts to novice users. Wray, Samantha, Hamdy Mubarak, and Ahmed Ali. "Best Practices for Crowdsourcing Dialectal Arabic Speech Transcription." ANLP Workshop 2015. 2015. http://www.aclweb.org/anthology/W15-3211