Discourse Annotation for Arabic


Published in: Technology, Education

  1. Discourse Annotation for Arabic. Arwa Al-Zammam, Ruba Al-Homaid, Eman Al-Badr. Supervisor: Amal Al-Saif. Natural Language Processing - CS465. 11-6-1434 H
  2. Outline
     • Leeds Arabic Discourse Treebank
     • Discourse Annotation
     • Arabic language characteristics
     • Discourse relations
     • Characteristics of Modern Standard Arabic
     • Arabic Discourse Connectives
     • Agreement Studies
     • Discourse Connective Recognition
     • Result of Discourse Connective Recognition
     • Discourse Relation Recognition
     • Result of Discourse Relation Recognition
     • Conclusion
  3. Leeds Arabic Discourse Treebank
     • The Leeds Arabic Discourse Treebank (LADTB v1) is the first discourse treebank for Modern Standard Arabic (MSA).
     • LADTB follows annotation principles similar to those of the PDTB project for English and of the Turkish, Hindi and Chinese discourse treebanks.
     • LADTB was built to be a gold standard for automatic discourse processing studies.
  4. Discourse Annotation
     • Discourse relations, such as CAUSAL or CONTRAST relations between textual units, play an important role in producing a coherent discourse.
     • Discourse connectives are defined as lexical expressions that relate two text segments (arguments) expressing abstract entities such as events, beliefs, facts or propositions (for example /lkn/ but, /Aw/ or).
  5. Discourse Annotation
     Applications using discourse annotation:
     • Automatic summarization
     • Question answering
     • Sentiment analysis
     • Readability assessment
  6. Arabic Language Characteristics
     • Arabic discourse connectives are ambiguous.
     • Explicit discourse connectives.
     • The variety of Arabic discourse connectives.
     • The annotation principles designed for discourse connectives in English in the PDTB2 can be applied to reliably annotate discourse connectives in Arabic newswire.
     • Machine learning models can be used to identify discourse connectives and relations in Arabic newswire.
     • Supervised machine learning models can identify Arabic discourse connectives and their relations with high reliability.
  7. Discourse Relations
     • Explicit discourse relations: [He took my photo,]Arg1 [while]DC [I was having dinner]Arg2
     • Implicit discourse relations: [He has to stay in bed.]Arg1 [He has the flu.]Arg2
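The two examples on this slide can be held in one small record type. The sketch below is illustrative only: the class name `DiscourseRelation` and its fields are assumptions, not part of the LADTB or PDTB tooling. A relation is treated as implicit when no connective is recorded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiscourseRelation:
    """Hypothetical container for one annotated relation (not an LADTB format)."""
    arg1: str
    arg2: str
    connective: Optional[str] = None  # None means no overt connective (implicit)
    sense: Optional[str] = None       # left unset: the slide gives no sense labels

    @property
    def is_explicit(self) -> bool:
        # A relation is explicit when a discourse connective (DC) is present.
        return self.connective is not None


# The two examples from the slide.
explicit = DiscourseRelation("He took my photo,", "I was having dinner",
                             connective="while")
implicit = DiscourseRelation("He has to stay in bed.", "He has the flu.")

print(explicit.is_explicit, implicit.is_explicit)
```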
  8. Characteristics of Modern Standard Arabic
  9. Characteristics of Modern Standard Arabic: al-maSdar noun
  10. Characteristics of Modern Standard Arabic: al-maSdar noun
      English | Al-maSdar noun | Morph. pattern | Root
      swimming | | | Sbh
      reflection | | | Eks
      experiment | | | Jrb
      war | | | Hrb
      defence | | | Dfe
  11. Characteristics of Modern Standard Arabic
      • Word order in Arabic (verb-subject-object).
      • Punctuation in Arabic.
  12. Arabic Discourse Connectives
      • Conjunctions (/lkn/ but, /Aw/ or, /w/ and).
      • Adverbials (/TAlmA.. f../ as-long-as).
      • Prepositional phrases. Prepositions can also link discourse segments when one or both arguments are al-maSdar nouns.
      • Some nouns, such as /ntyjp/ result, /ks.yp/ fear and /bqyp/ desire, are used as discourse connectives in Arabic.
      Discourse connectives in Arabic may occur:
      • Individually, such as /lkn/ however.
      • In conjunction with other connectives using the coordinating conjunction /w/ and, such as /lkn w qbl/ however and before.
      • As multiple connectives without a conjunction, such as /AlA bEd/ except after.
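The three ways a connective may occur can be told apart with a simple rule over its transliterated tokens. This is a minimal sketch under stated assumptions: the function name `connective_form`, the label strings, and the rule itself (a /w/ between two other tokens signals conjunction) are illustrative and are not taken from the LADTB guidelines. Discontinuous connectives such as /TAlmA.. f../ are not handled.

```python
def connective_form(tokens):
    """Classify a connective given as a list of transliterated tokens.

    Hypothetical labels:
      "individual" - a single connective, e.g. ["lkn"]
      "conjoined"  - connectives joined by the conjunction /w/, e.g. ["lkn", "w", "qbl"]
      "multiple"   - several connectives with no conjunction, e.g. ["AlA", "bEd"]
    """
    if not tokens:
        raise ValueError("expected at least one token")
    if len(tokens) == 1:
        return "individual"
    # /w/ counts as a coordinator only when it sits between two other tokens.
    if "w" in tokens[1:-1]:
        return "conjoined"
    return "multiple"


for example in (["lkn"], ["lkn", "w", "qbl"], ["AlA", "bEd"]):
    print(example, connective_form(example))
```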
  13. Agreement Studies
      • Task 1: measures whether annotators agree on the binary decision of whether an item constitutes a discourse connective in context.
      • Task 2: measures whether annotators agree on which discourse relation an identified connective expresses.
      Agreement was measured for the distinction between discourse and non-discourse usage, for relation assignment and for argument assignment:
      $\mathrm{agr}(ann_1 \,\|\, ann_2) = \frac{|ann_1 \text{ matching } ann_2|}{|ann_1|}$
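The directed agreement measure on this slide is easy to compute once each annotator's decisions are stored as a set. In the sketch below, "matching" is assumed to mean exact equality of items, and the items are hypothetical token positions marked as connectives; the slide does not define either detail.

```python
def agr(ann1, ann2):
    """Directed agreement: share of ann1's items that ann2 also has.

    Assumes "matching" means exact equality of items (an assumption,
    not something the slide specifies).
    """
    ann1, ann2 = set(ann1), set(ann2)
    if not ann1:
        raise ValueError("ann1 must contain at least one annotation")
    return len(ann1 & ann2) / len(ann1)


# Hypothetical token positions each annotator marked as a discourse connective.
annotator_1 = {3, 10, 17, 25}
annotator_2 = {3, 10, 30}

print(agr(annotator_1, annotator_2))  # 2 of annotator 1's 4 items are matched
print(agr(annotator_2, annotator_1))  # 2 of annotator 2's 3 items are matched
```

The measure is not symmetric, because the denominator is the first annotator's set, so both directions are usually reported.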
  14. Discourse Connective Recognition
      • Surface features (SConn).
      • Lexical features of surrounding words (Lex).
        [Diagram: Arg1 DC Arg2]
      • Part-of-speech features (POS).
      • Syntactic category of related phrases (Syn).
        Example of a non-discourse usage of /w/ and: /ālmdrsh kbyrh w ǧmylh/ (the school is very large and beautiful).
      • Al-maSdar feature.
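To make the feature groups above concrete, here is a minimal sketch of a feature extractor for one candidate connective. The function name, the feature keys and the POS tag set are assumptions made for illustration; the slide does not list the exact features in each group, and the Syn and al-maSdar features are omitted because they need a parser and morphological analysis. The example sentence is the non-discourse /w/ case from the slide, with the transliteration simplified to ASCII.

```python
def connective_features(tokens, pos_tags, i):
    """Build a feature dict for the candidate connective at index i.

    Feature names are hypothetical stand-ins for the Conn, SConn,
    Lex and POS groups.
    """
    if len(tokens) != len(pos_tags):
        raise ValueError("tokens and pos_tags must have the same length")

    def at(seq, j):
        # Pad positions that fall outside the sentence.
        return seq[j] if 0 <= j < len(seq) else "<PAD>"

    return {
        "conn": tokens[i],                    # the candidate string itself
        "sconn_sentence_initial": i == 0,     # a simple surface feature
        "lex_prev": at(tokens, i - 1),        # surrounding words
        "lex_next": at(tokens, i + 1),
        "pos_prev": at(pos_tags, i - 1),      # POS of surrounding words
        "pos_next": at(pos_tags, i + 1),
    }


tokens = ["almdrsh", "kbyrh", "w", "gmylh"]   # the school is large and beautiful
pos_tags = ["NOUN", "ADJ", "CONJ", "ADJ"]     # assumed tags, not from a real tagger
feats = connective_features(tokens, pos_tags, 2)
print(feats)
```

Here /w/ sits between two adjectives, which is the kind of context a classifier could use to label it as a non-discourse usage.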
  15. Result of Discourse Connective Recognition
      Model | Features | Accuracy (%) | Kappa (K)
      | Baseline (not conn) | 68.9 | 0
      M1 | Conn only | 75.7 | 0.48
      Tokenization by white space + auto tagger
      M2 | Conn + SConn + Lex | 85.6 | 0.62
      M3 | Conn + SConn + Lex + POS | 87.6 | 0.69
      M4 | Conn + SConn + Lex + POS + Masdar | 88.5 | 0.70
      ATB-based features
      M5 | Conn + SConn + Lex | 86.2 | 0.65
      M6 | Conn + SConn + Lex + Syn/POS | 91.2 | 0.79
      M7 | Conn + SConn + Lex + Syn/POS + Masdar | 92.4 | 0.82
      M8 | Conn + SConn + Syn | 91.2 | 0.79
      M9 | SConn + Lex + Syn + Masdar | 91.2 | 0.79
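The baseline row pairs a fairly high accuracy with K = 0. Assuming K is Cohen's kappa, that is expected for a classifier that always predicts the majority class, as the sketch below shows. The label names and the 1000-item sample are made up for illustration; only the 68.9% proportion comes from the table.

```python
from collections import Counter


def accuracy_and_kappa(gold, pred):
    """Return (accuracy, Cohen's kappa) for two equal-length label lists."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    n = len(gold)
    observed = sum(g == p for g, p in zip(gold, pred)) / n
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    # Chance agreement: product of the two marginal distributions per label.
    expected = sum(gold_counts[c] * pred_counts[c] for c in gold_counts) / (n * n)
    if expected == 1.0:
        return observed, float("nan")  # kappa is undefined in this case
    return observed, (observed - expected) / (1 - expected)


# Hypothetical sample: 68.9% of candidates are not discourse connectives,
# and the baseline labels every candidate "not_conn".
gold = ["not_conn"] * 689 + ["conn"] * 311
pred = ["not_conn"] * 1000

acc, kappa = accuracy_and_kappa(gold, pred)
print(acc, kappa)
```

Kappa corrects accuracy for chance agreement, which is why the table reports both columns.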
  16. Discourse Relation Recognition
      • Words and POS of arguments.
      • Masdar.
      • Tense and negation.
      • Length, distance and order features.
      • Production rules.
  17. Result of Discourse Relation Recognition
      Ref | Features | Accuracy (%) | Kappa (K)
      All connectives (6039)
      | Baseline (CONJUNCTION) | 52.5 | 0
      M1 | Conn only (1) | 77.2 | 0.60
      M2 | Conn + Conn_f + Arg_f (37) | 78.8 | 0.66
      M3 | Conn + Conn_f + Arg_f + Production rules (1237) | 78.3 | 0.65
      Excluding wa at BOP (3813)
      | Baseline (CONJUNCTION) | 35 | 0
      M1 | Conn only (1) | 74.3 | 0.65
      M2 | Conn + Conn_f + Arg_f (37) | 77 | 0.69
      M3 | Conn + Conn_f + Arg_f + Production rules (1237) | 76.7 | 0.69
  18. Result of Discourse Relation Recognition
      Ref | Features | Accuracy (%) | Kappa (K)
      All connectives (6039)
      | Baseline (EXPANSION) | 62.4 | 0
      M1 | Conn only (1) | 88.7 | 0.78
      M2 | Conn + Conn_f + Arg_f (37) | 88.7 | 0.78
      Excluding wa at BOP (3813)
      | Baseline (EXPANSION) | 41.8 | 0
      M1 | Conn only (1) | 82.7 | 0.74
      M2 | Conn + Conn_f + Arg_f (37) | 83.5 | 0.75
  19. Conclusion: We discussed Arabic discourse annotation, covering discourse connectives and relations. We also presented the characteristics of the Arabic language that relate to this subject, and the results.