1. Discourse Annotation for Arabic
Arwa Al-Zammam, Ruba Al-Homaid, Eman Al-Badr
Supervisor: Amal Al-Saif
Natural Language Processing - CS465
11-6-1434 H
2. Outline
• Leeds Arabic Discourse Treebank
• Discourse Annotation
• Arabic Language Characteristics
• Discourse relations
• Characteristics of Modern Standard Arabic
• Arabic Discourse Connectives
• Agreement Studies
• Discourse Connective Recognition
• Results of Discourse Connective Recognition
• Discourse Relation Recognition
• Results of Discourse Relation Recognition
• Conclusion
3. Leeds Arabic Discourse Treebank
• The Leeds Arabic Discourse Treebank (LADTB v1) is the first
discourse treebank for Modern Standard Arabic (MSA).
• LADTB follows annotation principles similar to those of the Penn
Discourse Treebank (PDTB) for English and of the discourse
treebanks for Turkish, Hindi and Chinese.
• LADTB was built to serve as a gold standard for automatic
discourse processing studies.
4. Discourse Annotation
• Discourse relations between textual units, such as CAUSAL or
CONTRAST relations, play an important role in producing a
coherent discourse.
• Discourse connectives are defined as lexical expressions that
relate two text segments (arguments) expressing abstract
entities such as events, beliefs, facts or propositions (e.g. /lkn/ 'but',
/Aw/ 'or').
[Figure: examples of CONTRAST and CAUSAL relations]
6. Arabic Language Characteristics
• Arabic discourse connectives are ambiguous.
• Explicit discourse connectives.
• The variety of Arabic discourse connectives.
• The annotation principles designed to annotate English discourse
connectives in PDTB2 can be applied to reliably annotate
discourse connectives in Arabic newswire.
• Machine learning models can be used to identify discourse
connectives and relations in Arabic newswire.
• Supervised machine learning models can identify Arabic
discourse connectives and their relations with high reliability.
7. Discourse Relations
• Explicit discourse relations (signalled by a connective in the text):
[He took my photo,]Arg1 [while]DC [I was having dinner]Arg2
• Implicit discourse relations (no connective; the relation is inferred):
[He has to stay in bed.]Arg1 [He has the flu.]Arg2
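A minimal sketch of how such an annotation could be represented in code, assuming a PDTB-style record with two arguments, a sense label and an optional connective. The class name `DiscourseRelation` and the sense labels chosen for the two examples (TEMPORAL, CAUSAL) are illustrative assumptions, not the LADTB format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiscourseRelation:
    """Hypothetical PDTB-style record: two arguments plus an optional connective."""
    arg1: str
    arg2: str
    sense: str                        # assumed label, e.g. "CONTRAST" or "CAUSAL"
    connective: Optional[str] = None  # None when no connective appears in the text

    @property
    def is_explicit(self) -> bool:
        # Explicit relations are signalled by a connective in the text;
        # implicit ones are inferred by the reader between adjacent sentences.
        return self.connective is not None


explicit = DiscourseRelation(
    arg1="He took my photo,",
    arg2="I was having dinner",
    sense="TEMPORAL",
    connective="while",
)
implicit = DiscourseRelation(
    arg1="He has to stay in bed.",
    arg2="He has the flu.",
    sense="CAUSAL",
)
print(explicit.is_explicit, implicit.is_explicit)  # True False
```

Keeping the connective optional lets explicit and implicit relations share one structure, so the same downstream code can handle both.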
10. Characteristics of Modern Standard Arabic
Al-maSdar noun (verbal noun derived from a root via a morphological pattern):
English | Root
swimming | Sbh
reflection | Eks
experiment | Jrb
war | Hrb
defence | Dfe
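The root-and-pattern idea behind al-maSdar nouns can be sketched as slot filling: a pattern written with the traditional template letters f, E, l (for fa'ala) has those slots replaced by the three root consonants. The function name `apply_pattern`, the Buckwalter-style spellings and the specific patterns below are illustrative assumptions, not taken from the table above.

```python
def apply_pattern(root: str, pattern: str) -> str:
    """Fill the f, E, l slots of a morphological pattern with a triliteral root.

    Assumes a Buckwalter-style transliteration in which the pattern uses
    f, E and l only as slot markers; every other character is copied as-is.
    """
    if len(root) != 3:
        raise ValueError("expected a triliteral root")
    slots = {"f": root[0], "E": root[1], "l": root[2]}
    # Single pass, so substituted root letters are never re-substituted.
    return "".join(slots.get(ch, ch) for ch in pattern)


print(apply_pattern("Hrb", "faEl"))      # Harb     (war)
print(apply_pattern("Jrb", "tafEilap"))  # taJribap (experiment)
print(apply_pattern("Eks", "infiEAl"))   # inEikAs  (reflection)
```

Because the same root yields different nouns under different patterns, a feature that marks a word as al-maSdar needs morphological analysis rather than a plain word list.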
11. Characteristics of Modern Standard Arabic
• Word order in Arabic (typically verb-subject-object).
• Punctuation in Arabic.
12. Arabic Discourse Connectives
• Conjunctions (e.g. /lkn/ 'but', /Aw/ 'or', /w/ 'and').
• Adverbials (e.g. /TAlmA.. f../ 'as long as').
• Prepositional phrases: prepositions can also link discourse segments
when one or both arguments are al-maSdar nouns.
• Nouns: some nouns, such as /ntyjp/ 'result', /ks.yp/ 'fear' and
/bqyp/ 'desire', are used as discourse connectives in Arabic.
The discourse connectives in Arabic might occur:
• Individually, e.g. /lkn/ 'however'.
• In conjunction with other connectives using the coordinating conjunction
/w/ 'and', e.g. /lkn w qbl/ 'however and before'.
• As multiple connectives without a conjunction, e.g. /AlA bEd/
'except after'.
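Because connectives can span several tokens, a first step is often to find candidate connectives by longest-match lookup against a lexicon. This is a sketch under stated assumptions: `LEXICON` is a hypothetical toy list in the slide's transliteration, and treating the conjoined form /lkn w qbl/ as a single unit is a simplification. Every match is still only a candidate that must later be classified as discourse or non-discourse usage.

```python
from typing import List, Tuple

# Hypothetical toy lexicon; a real system would take its connective list
# from an annotated corpus such as LADTB.
LEXICON = {
    ("lkn",),
    ("Aw",),
    ("w",),
    ("qbl",),
    ("bEd",),
    ("AlA", "bEd"),
    ("lkn", "w", "qbl"),
}
MAX_LEN = max(len(entry) for entry in LEXICON)


def find_candidates(tokens: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end, connective) spans, preferring the longest match."""
    spans = []
    i = 0
    while i < len(tokens):
        # Try the longest possible window first, then shrink it.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in LEXICON:
                spans.append((i, i + n, " ".join(tokens[i:i + n])))
                i += n
                break
        else:
            i += 1  # no connective starts here
    return spans


print(find_candidates(["x", "AlA", "bEd", "y"]))  # [(1, 3, 'AlA bEd')]
```

Preferring the longest match keeps /AlA bEd/ from being split into two separate candidates.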
13. Agreement Studies
• Task 1:
measures whether annotators agree on the binary decision of
whether an item constitutes a discourse connective in context.
• Task 2:
measures whether annotators agree on which discourse relation an
identified connective expresses.
Agreement was measured for the distinction between discourse and
non-discourse usage, for relation assignment and for argument
assignment:
$$\mathrm{agr}(\mathit{ann}_1 \,\|\, \mathit{ann}_2) = \frac{|\mathit{ann}_1 \text{ matching } \mathit{ann}_2|}{|\mathit{ann}_1|}$$
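The agreement measure agr(ann1 || ann2) is directional: it divides the matched items by the size of ann1 only, so swapping the annotators can change the score. A minimal sketch, assuming items match by exact equality (a lenient matching rule, e.g. overlapping argument spans, would need a different comparison) and using hypothetical (document, token offset) pairs as items:

```python
from typing import Hashable, Set


def agr(ann1: Set[Hashable], ann2: Set[Hashable]) -> float:
    """agr(ann1 || ann2) = |ann1 matching ann2| / |ann1|, with exact matching."""
    if not ann1:
        raise ValueError("ann1 must contain at least one item")
    return len(ann1 & ann2) / len(ann1)


# Hypothetical connective annotations as (doc_id, token_offset) pairs.
a = {("d1", 3), ("d1", 17), ("d2", 5), ("d2", 40)}
b = {("d1", 3), ("d2", 5), ("d2", 40), ("d3", 8), ("d3", 9)}
print(agr(a, b), agr(b, a))  # 0.75 0.6
```

Here both annotators share three items, but the first marked four and the second five, which is why the two directions differ.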
14. Discourse Connective Recognition
• Surface features (SConn).
• Lexical features of surrounding words (Lex).
[Figure: Arg1 | DC | Arg2]
• Part-of-speech features (POS).
• Syntactic category of related phrases (Syn), e.g. to detect the
non-discourse usage of /w/ 'and' in /ālmdrsh kbyrh w ǧmylh/
'the school is large and beautiful', where w joins two adjectives.
• Al-Masdar feature.
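The surface, lexical and POS features listed above can be sketched as a feature dictionary for a candidate connective at position i. The function name, the feature names and the tag set (NOUN, ADJ, CONJ) are hypothetical; the Syn and Al-Masdar features are omitted because they need a parse tree and a morphological analyser.

```python
from typing import Dict, List


def connective_features(tokens: List[str], pos: List[str], i: int) -> Dict[str, str]:
    """Hypothetical SConn, Lex and POS features for a candidate connective at i."""
    prev_tok = tokens[i - 1] if i > 0 else "<S>"
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else "</S>"
    prev_pos = pos[i - 1] if i > 0 else "<S>"
    next_pos = pos[i + 1] if i + 1 < len(pos) else "</S>"
    return {
        "conn": tokens[i],                    # surface string (SConn)
        "prev": prev_tok, "next": next_tok,   # surrounding words (Lex)
        "conn_pos": pos[i],                   # POS of the candidate (POS)
        "prev_pos": prev_pos, "next_pos": next_pos,
        # Conjoined feature: w followed by an adjective hints at non-discourse use.
        "conn+next_pos": tokens[i] + "|" + next_pos,
    }


tokens = ["Almdrsh", "kbyrh", "w", "gmylh"]  # 'the school is large and beautiful'
pos = ["NOUN", "ADJ", "CONJ", "ADJ"]         # assumed tags
print(connective_features(tokens, pos, 2)["conn+next_pos"])  # w|ADJ
```

Dictionaries like this can be vectorised with any one-hot encoder before training a supervised classifier.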
16. Discourse Relation Recognition
• Words and POS of arguments.
• Masdar.
• Tense and Negation.
• Length, Distance and Order Features.
• Production Rules.
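Production-rule features are usually read off constituency parses of the two arguments. The sketch below covers only that feature type (not tense, negation, length, distance or order), and it assumes trees are nested tuples `(label, child, ...)` with words as string leaves; the toy parses and the `arg1:`/`arg2:`/`both:` prefixes are illustrative assumptions.

```python
from typing import Set, Tuple, Union

Tree = Union[str, Tuple]  # a leaf word, or (label, child1, child2, ...)


def production_rules(tree: Tree) -> Set[str]:
    """Collect 'PARENT -> CHILD1 CHILD2 ...' rules from a constituency tree."""
    rules: Set[str] = set()
    if isinstance(tree, str):
        return rules  # leaves contribute no rule of their own
    label, *children = tree
    child_labels = [c if isinstance(c, str) else c[0] for c in children]
    rules.add(f"{label} -> {' '.join(child_labels)}")
    for child in children:
        rules |= production_rules(child)
    return rules


def production_features(arg1: Tree, arg2: Tree) -> Set[str]:
    """Mark each rule as occurring in Arg1, in Arg2, or in both arguments."""
    r1, r2 = production_rules(arg1), production_rules(arg2)
    feats = {f"arg1:{r}" for r in r1} | {f"arg2:{r}" for r in r2}
    feats |= {f"both:{r}" for r in r1 & r2}
    return feats


# Toy parses (assumed, English for readability).
arg1 = ("S", ("NP", ("PRP", "He")), ("VP", ("VBZ", "rests")))
arg2 = ("S", ("NP", ("PRP", "He")), ("VP", ("VBZ", "has"), ("NP", ("DT", "the"), ("NN", "flu"))))
print(sorted(production_rules(arg2)))
```

Since every distinct rule becomes a binary feature, this feature type grows quickly with the size of the training data.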
17. Result
Ref | Features (no. of features) | Accuracy (%) | Kappa
All connectives (6039)
- | Baseline (CONJUNCTION) | 52.5 | 0
M1 | Conn only (1) | 77.2 | 0.60
M2 | Conn + Conn_f + Arg_f (37) | 78.8 | 0.66
M3 | Conn + Conn_f + Arg_f + Production rules (1237) | 78.3 | 0.65
Excluding wa at BOP (3813)
- | Baseline (CONJUNCTION) | 35 | 0
M1 | Conn only (1) | 74.3 | 0.65
M2 | Conn + Conn_f + Arg_f (37) | 77 | 0.69
M3 | Conn + Conn_f + Arg_f + Production rules (1237) | 76.7 | 0.69
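The tables report accuracy and kappa, and the majority-class baselines score a kappa of 0. The sketch below computes both metrics using Cohen's kappa (assumed here to be the kappa reported) and shows why a majority baseline gets exactly 0: its observed agreement equals the chance agreement implied by the label distributions. The label counts are hypothetical.

```python
from collections import Counter
from typing import Sequence


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of instances whose predicted label equals the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def cohen_kappa(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Cohen's kappa = (p_o - p_e) / (1 - p_e), with p_e from the label marginals."""
    n = len(gold)
    p_o = accuracy(gold, pred)
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    # Chance agreement: probability both sequences pick the same label independently.
    p_e = sum(gold_counts[c] * pred_counts[c] for c in gold_counts) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both sequences are one constant class")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical majority-class baseline: always predict the most frequent label.
gold = ["CONJUNCTION"] * 5 + ["CONTRAST"] * 3 + ["CAUSAL"] * 2
majority = Counter(gold).most_common(1)[0][0]
pred = [majority] * len(gold)
print(accuracy(gold, pred), cohen_kappa(gold, pred))  # 0.5 0.0
```

Kappa is the more informative of the two numbers when one relation dominates, since accuracy alone rewards always guessing the frequent class.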
18. Result
Ref | Features (no. of features) | Accuracy (%) | Kappa
All connectives (6039)
- | Baseline (EXPANSION) | 62.4 | 0
M1 | Conn only (1) | 88.7 | 0.78
M2 | Conn + Conn_f + Arg_f (37) | 88.7 | 0.78
Excluding wa at BOP (3813)
- | Baseline (EXPANSION) | 41.8 | 0
M1 | Conn only (1) | 82.7 | 0.74
M2 | Conn + Conn_f + Arg_f (37) | 83.5 | 0.75
19. Conclusion:
We presented discourse annotation for Arabic, covering discourse
connectives and discourse relations. We also showed the
characteristics of the Arabic language that relate to this
subject, and the results of connective and relation recognition.