Discourse Annotation for Arabic 2



  1. Survey on Discourse Annotation for Arabic
     A. Algarni, H. Alharbi and N. Almutairy
     Supervisor: Dr. A. Alsaif
     April 23, 2013
     Kingdom of Saudi Arabia, Ministry of Higher Education
     Imam Mohammed Ibn Saud Islamic University
     College of Computer and Information Sciences
     CS465 - Natural Language Processing
  2. Outline
     • Introduction
     • The Leeds Arabic Discourse Treebank
     • Discourse Connective Recognition
     • Discourse Relation Recognition
     • Semantic-Based Segmentation
     • Discourse Segmentation Based on Rhetorical Methods
     • A Comprehensive Taxonomy of Arabic Discourse Coherence Relations
  3. 3. Introduction Linguistic annotation covers any descriptiveor analytic notations applied to raw languagedata. Annotated Discourse Corpora can be veryuseful to facilitate theoretical studies alongwith contributing in the development of NLPapplications.3
  4. 4. Applications Information extraction Question-answering Summarization Machine translation, generation.4
  5. Discourse Relations and Discourse Connectives
     • Discourse relation: the way in which two arguments (text segments) are logically connected, e.g. Temporal, Comparison, Causal, Expansion.
     • Discourse connective (DC): a lexical marker used to link two abstract objects in a text.
     • Abstract object (AO): abstract objects in discourse are things like propositions, events, facts and opinions.
     • Argument (Arg): a text span that expresses an abstract object and is linked by a DC.
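
The definitions above can be pictured as a small record type. The following is only a minimal sketch: the class name `DiscourseRelation`, its field names, and the choice of the label "Expansion" for the English connective "and" are illustrative assumptions, not part of any annotation scheme described in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscourseRelation:
    """One annotated relation: two arguments linked by a discourse connective.

    Hypothetical structure for illustration only.
    """
    arg1: str        # text span expressing the first abstract object
    connective: str  # lexical marker (DC) linking the two arguments
    arg2: str        # text span expressing the second abstract object
    relation: str    # e.g. "Temporal", "Comparison", "Causal", "Expansion"


# Illustrative English example; the relation label is an assumption.
example = DiscourseRelation(
    arg1="Children might be tired",
    connective="and",
    arg2="feel sleepy",
    relation="Expansion",
)
print(example)
```
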
  6. The Leeds Arabic Discourse Treebank
     • The first effort towards producing an Arabic discourse treebank was introduced in 2011 by A. Alsaif and K. Markert.
     • They collected a large set of Arabic discourse connectives using text analysis and corpus-based techniques.
     • The final list contains 107 discourse connectives.
  7. Types of Discourse Connectives
  8. Types of Relations
  9. Types of Relations (cont.)
     • COMPARISON.Similarity
  10. Arabic Discourse Annotation Tool (ADA) and Annotation Process
  11. Annotation Methodology
     1. Measuring whether annotators agree on the binary decision of whether an item constitutes a discourse connective in context.
     2. Measuring whether annotators agree on which discourse relation an identified connective expresses, given that annotators can assign a set of relations to a connective.
  12. Results
     • Agreement on task 1 (connective identification) is highly reliable (N = 23331): percentage agreement of 0.95, kappa of 0.88.
     • Agreement on task 2 (relation assignment) is relatively low (N = 5586): percentage agreement of 0.66, kappa of 0.57, and alpha of 0.58.
  13. 13. Discourse Connective Recognition To distinguish between discourse and non-discourse usage of a connective. Example: once, while. A. Alsaif and K.Markert (2011) introduceda Connective identifier for Arabic based onsyntactic features.13
  14. 14. Discourse Connective Recognitionby A. Alsaif and K.Markert (2011)Features: Surface Features (SConn) Lexical features of surrounding words(Lex) ExampleArg1DCArg2.[Children might be tired]Arg1 [and]DC [feel sleepy]Arg2 during school time if they didnot sleep well14
  15. 15. Features: Part of Speech features (POS) Syntactic category of related phrases(Syn) (E.g.: / the school isvery large and beautiful) Al-Masdar feature.Discourse Connective Recognitionby A. Alsaif and K.Markert (2011) Cont…15
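  16. Discourse Connective Recognition by A. Alsaif and K. Markert (2011) (cont.): Results

     | Setting | Model | Features | Accuracy | Kappa |
     |---|---|---|---|---|
     | | Baseline | not Conn | 68.9 | 0 |
     | | M1 | Conn only | 75.7 | 0.48 |
     | Tokenization by white space + auto tagger | M2 | Conn+SConn+Lex | 85.6 | 0.62 |
     | Tokenization by white space + auto tagger | M3 | Conn+SConn+Lex+POS | 87.6 | 0.69 |
     | Tokenization by white space + auto tagger | M4 | Conn+SConn+Lex+POS+Masdar | 88.5 | 0.70 |
     | ATB-based features | M5 | Conn+SConn+Lex | 86.2 | 0.65 |
     | ATB-based features | M6 | Conn+SConn+Lex+Syn/POS | 91.2 | 0.79 |
     | ATB-based features | M7 | Conn+SConn+Lex+Syn/POS+Masdar | 92.4 | 0.82 |
     | | M8 | Conn+SConn+Syn | 91.2 | 0.79 |
     | | M9 | SConn+Lex+Syn+Masdar | 91.2 | 0.79 |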
  17. 17. Discourse Relation Recognition To identify the type of the relation A. Alsaif and K.Markert (2011) introducedthe first algorithms to automaticallyidentify relations for Arabic17
  18. 18. Features: Connective features Words and POS of arguments Masdar Tense and Negation Length, Distance and Order Features Argument Parent Production RulesDiscourse Relation Recognitionby A. Alsaif and K.Markert (2011)18
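  19. Results

     | Data | Model | Features | Accuracy | Kappa |
     |---|---|---|---|---|
     | All connectives (6039) | Baseline | CONJUNCTION | 52.5 | 0 |
     | All connectives (6039) | M1 | Conn only (1) | 77.2 | 0.60 |
     | All connectives (6039) | M2 | Conn + Conn f + Arg f (37) | 78.7 | 0.66 |
     | All connectives (6039) | M3 | Conn + Conn f + Arg f + Production rules (1237) | 78.3 | 0.65 |
     | Excluding wa at BOP (3813) | Baseline | CONJUNCTION | 35 | 0 |
     | Excluding wa at BOP (3813) | M1 | Conn only (1) | 74.3 | 0.65 |
     | Excluding wa at BOP (3813) | M2 | Conn + Conn f + Arg f (37) | 77.0 | 0.69 |
     | Excluding wa at BOP (3813) | M3 | Conn + Conn f + Arg f + Production rules (1237) | 76.7 | 0.69 |
  20. Results

     | Data | Model | Features | Accuracy | Kappa |
     |---|---|---|---|---|
     | All connectives (6039) | Baseline | EXPANSION | 62.4 | 0 |
     | All connectives (6039) | M1 | Conn only (1) | 88.7 | 0.78 |
     | All connectives (6039) | M2 | Conn + Conn f + Arg f (37) | 88.7 | 0.78 |
     | Excluding wa at BOP (3813) | Baseline | EXPANSION | 41.8 | 0 |
     | Excluding wa at BOP (3813) | M1 | Conn only (1) | 82.7 | 0.74 |
     | Excluding wa at BOP (3813) | M2 | Conn + Conn f + Arg f (37) | 83.5 | 0.75 |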
  21. Semantic-Based Segmentation of Arabic Texts
     • Corpus analysis
     • Definition: Let L be a list of candidate segment connectors. Each element c in L is classified, based on its effect on text segmentation, as either active or passive.
     • Examples: [Arabic examples not preserved]
  22. Segmentation Process
     1. Identifying the connectors that indicate complete segments.
     2. Locating the active connectors.
     3. Resolving the case where adjacent active connectors exist.
     4. Setting the segment boundaries.
     5. Creating the final list of segments.
  23. 23. Discussion evaluate the segmentation process, theycollected ten essays. Each essay ranges between 500 and 700words. After implementing the segmentationprocess. Gave the output to judges to evaluatethem in terms of two factors: correcthit and incorrect hit.23
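  24. Discussion (cont.)

     | Essay | Correct hit | Incorrect hit |
     |---|---|---|
     | 1 | 33 | 0 |
     | 2 | 15 | 1 |
     | 3 | 25 | 0 |
     | 4 | 23 | 1 |
     | 5 | 20 | 0 |
     | 6 | 29 | 1 |
     | 7 | 26 | 1 |
     | 8 | 33 | 2 |
     | 9 | 26 | 0 |
     | 10 | 22 | 0 |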
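
As a rough aid for reading the per-essay counts, the sketch below totals them and computes an overall correct-hit rate. The aggregate is not reported on the slides; it is derived here from the per-essay figures, and the variable names are illustrative.

```python
# (essay, correct_hits, incorrect_hits) as read from the per-essay table.
results = [
    (1, 33, 0), (2, 15, 1), (3, 25, 0), (4, 23, 1), (5, 20, 0),
    (6, 29, 1), (7, 26, 1), (8, 33, 2), (9, 26, 0), (10, 22, 0),
]

total_correct = sum(correct for _, correct, _ in results)
total_incorrect = sum(incorrect for _, _, incorrect in results)
# Share of judged boundaries that were correct (derived, not from the slides).
correct_rate = total_correct / (total_correct + total_incorrect)

print(total_correct, total_incorrect, f"{correct_rate:.3f}")
```
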
  25. Arabic Discourse Segmentation Based on Rhetorical Methods
     • This method depends on the meaning of the connector " " in Arabic.
     • There are six types of " ", classified into two classes, "Fasl" and "Wasl":
       • "Fasl": a segmenting place.
       • "Wasl": a non-segmenting place that connects the text.
  26. Types of Connector " "
     Table columns: Type, Example, Class. Classes of the six types: Fasl, Fasl, Fasl, Wasl, Wasl, Wasl.
  27. The Arabic Sentence Segmentation System
  28. Feature Extraction
     • The following are the features of " ": X3 = noun and X7 = accusative mark.
  29. 29. Experiment and Results They used 1200 instances for training. They used 293 instances for testing aftertesting there are 290 correct and 3incorrect instances. The result with:94.68%Recall96.82%Precision98.98 %Accuracy29
  30. A Comprehensive Taxonomy of Arabic Discourse Coherence Relations
     • Coherence relations are classified into two types: explicit relations and implicit relations.

     | Coherence relations | Example |
     |---|---|
     | Explicit relations | I am very happy because I got excellent marks in exams. |
     | Implicit relations | I am very happy. I got excellent marks in exams. |
  31. The Procedure of Creating an Arabic Taxonomy of Coherence Relations
  32. Examples of Implicit Arabic Relations
     • Impossible condition
     • Cascaded questioning
  33. 33. Results They got a set of 47 Arabic coherencerelations.coherence relations.ResultFrom English coherencerelations.31additional Arabic explicitcoherence relations.12Arabic implicit relations.433
  34. Conclusion
     Discourse annotation is a very fertile field with many NLP applications. For Arabic, challenges remain due to the lack of annotated corpora and studies.
  35. Thank You