Discourse Annotation for Arabic
Arwa Al-Zammam, Ruba Al-Homaid, Eman Al-Badr
Supervisor: Amal Al-Saif
Natural Language Processing - CS465
11-6-1434 H

Outline
• Leeds Arabic Discourse Treebank
• Discourse Annotation
• Arabic language characteristics
• Discourse relations
• Characteristics of Modern Standard Arabic
• Arabic Discourse Connectives
• Agreement Studies
• Discourse Connective Recognition
• Result of Discourse Connective Recognition
• Discourse Relation Recognition
• Result of Discourse Relation Recognition
• Conclusion

Leeds Arabic Discourse Treebank
• The Leeds Arabic Discourse Treebank (LADTB v1) is the first discourse treebank for Modern Standard Arabic (MSA).
• LADTB follows annotation principles similar to those of the PDTB project for English and of the Turkish, Hindi and Chinese discourse treebanks.
• LADTB was built to be a gold standard for automatic discourse processing studies.

Discourse Annotation
• Discourse relations such as CAUSAL or CONTRAST relations between textual units play an important role in producing a coherent discourse.
• Discourse connectives are defined as lexical expressions that relate two text segments (arguments) expressing abstract entities such as events, beliefs, facts or propositions (e.g. /lkn/ 'but', /Aw/ 'or').
[Figure labels: Contrast, Causal]

Arabic Language Characteristics
• Arabic discourse connectives are ambiguous.
• Explicit discourse connectives.
• The variety of Arabic discourse connectives.
• The annotation principles designed to annotate discourse connectives in English in the PDTB2 can be applied to reliably annotate discourse connectives in Arabic newswire.
• Machine learning models can be used to identify discourse connectives and relations in Arabic newswire.
• Supervised machine learning models can identify Arabic discourse connectives and their relations with high reliability.

Discourse Relations
• Explicit discourse relations:
  [He took my photo,]Arg1 [while]DC [I was having dinner]Arg2
• Implicit discourse relations:
  [He has to stay in bed.]Arg1 [He has the flu.]Arg2

Characteristics of Modern Standard Arabic
Al-maSdar noun:

English       Al-maSdar noun   Morph. pattern   Root
swimming                                        Sbh
reflection                                      Eks
experiment                                      Jrb
war                                             Hrb
defence                                         Dfe

Characteristics of Modern Standard Arabic
• Word order in Arabic (verb - subject - object).
• Punctuation in Arabic.

Arabic Discourse Connectives
• Conjunctions (/lkn/ 'but', /Aw/ 'or', /w/ 'and').
• Adverbials (/TAlmA.. f../ 'as-long-as').
• Prepositional phrases; prepositions can also link discourse segments when one or both arguments are al-maSdar nouns.
• Some nouns, such as /ntyjp/ 'result', /ks.yp/ 'fear' and /bqyp/ 'desire', are used as discourse connectives in Arabic.
The discourse connectives in Arabic might occur:
• Individually, such as /lkn/ 'however'.
• In conjunction with other connectives using the coordinating conjunction /w/ 'and', such as /lkn w qbl/ 'however and before'.
• As multiple connectives without a conjunction, such as /AlA bEd/ 'except after'.
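
The three occurrence patterns can be told apart mechanically once a connective is given as a transliterated string. The sketch below is only an illustration: the function name connective_pattern and the label strings are hypothetical, and it assumes the transliteration separates tokens with spaces and writes the conjunction /w/ as its own token, as in the examples on this slide.

```python
def connective_pattern(connective):
    """Label how a transliterated connective occurs (hypothetical helper).

    Assumes space-separated tokens, with the coordinating conjunction
    written as the separate token "w".
    """
    tokens = connective.split()
    if len(tokens) == 1:
        return "individual"
    # "w" between two other tokens joins two connectives.
    if "w" in tokens[1:-1]:
        return "conjoined"
    return "multiple"


for example in ["lkn", "lkn w qbl", "AlA bEd"]:
    print(example, "->", connective_pattern(example))
```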

Agreement Studies
• Task 1: measures whether annotators agree on the binary decision on whether an item constitutes a discourse connective in context.
• Task 2: measures whether annotators agree on which discourse relation an identified connective expresses.
The agreement was measured for the distinction of discourse vs. non-discourse usage, relation assignment and argument assignment:
agr(ann1 || ann2) = |ann1 matching ann2| / |ann1|
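
The agreement measure is directional: it divides by the size of the first annotator's set, so swapping the two annotators can change the value. A minimal sketch follows, assuming each annotation is a hashable item and that "matching" means exact equality. The function name directed_agreement and the sample annotations are hypothetical, and the slides do not say how partial argument overlap is scored.

```python
def directed_agreement(ann1, ann2):
    """agr(ann1 || ann2) = |ann1 matching ann2| / |ann1|.

    "Matching" is taken here to mean exact equality (an assumption).
    """
    if not ann1:
        raise ValueError("ann1 must contain at least one annotation")
    matching = sum(1 for item in ann1 if item in ann2)
    return matching / len(ann1)


# Hypothetical data: (token position, connective) pairs that each
# annotator marked as discourse usage.
ann1 = {(3, "lkn"), (10, "w"), (17, "Aw"), (25, "w")}
ann2 = {(3, "lkn"), (10, "w"), (17, "Aw")}

print(directed_agreement(ann1, ann2))  # 0.75
print(directed_agreement(ann2, ann1))  # 1.0
```

Reporting both directions shows which annotator marked items that the other did not.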

Discourse Connective Recognition
• Surface features (SConn).
• Lexical features of surrounding words (Lex).
  [Figure labels: Arg1, DC, Arg2]
• Part-of-speech features (POS).
• Syntactic category of related phrases (Syn).
  Example of non-discourse usage of /w/ 'and': /ālmdrsh kbyrh w ǧmylh/ 'the school is very large and beautiful'.
• Al-Masdar feature.
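
A rough sketch of how the surface, lexical and POS feature groups could be collected for one candidate connective is given below. It is not the feature extractor used in the study: the function name connective_features, the feature names, the window size and the POS tags are all hypothetical, and the syntactic and al-maSdar features are left out because they would need a parser and a morphological analyser.

```python
def connective_features(tokens, pos_tags, index, window=1):
    """Build a feature dict for the candidate connective at tokens[index].

    Covers only surface (SConn), lexical (Lex) and POS features;
    all feature names are illustrative.
    """
    def at(seq, i):
        # Pad positions that fall outside the sentence.
        return seq[i] if 0 <= i < len(seq) else "<PAD>"

    if index == 0:
        position = "initial"
    elif index == len(tokens) - 1:
        position = "final"
    else:
        position = "medial"

    features = {
        "conn": tokens[index],        # surface form of the candidate
        "conn_position": position,    # position in the sentence
        "conn_pos": pos_tags[index],  # POS tag of the candidate
    }
    for offset in range(1, window + 1):
        features[f"prev_word_{offset}"] = at(tokens, index - offset)
        features[f"next_word_{offset}"] = at(tokens, index + offset)
        features[f"prev_pos_{offset}"] = at(pos_tags, index - offset)
        features[f"next_pos_{offset}"] = at(pos_tags, index + offset)
    return features


# The non-discourse example from the slide, in simplified ASCII
# transliteration, with hypothetical POS tags.
tokens = ["almdrsh", "kbyrh", "w", "gmylh"]
pos_tags = ["NOUN", "ADJ", "CONJ", "ADJ"]
print(connective_features(tokens, pos_tags, 2))
```

In this example /w/ stands between two adjectives, so the neighbouring POS tags are the kind of cue a classifier could use to label it as non-discourse usage.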