Discourse annotation


Published in: Technology, Education

  1. A SURVEY OF ARABIC DISCOURSE ANNOTATION
     By: Abeer Al-Qahtani, Afnan Al-Moadi, Nujoud Al-Ghamdi
  2. INTRODUCTION
     Discourse annotation and segmentation of Arabic have become a popular area of research. This presentation surveys several techniques used in Arabic discourse annotation and segmentation and summarizes their methods and results.
  3. CLAUSE-BASED DISCOURSE SEGMENTATION OF ARABIC TEXTS
     Discourse parsing consists of two steps:
     1- Discourse segmentation, which aims at identifying Elementary Discourse Units (EDUs).
     2- Building the discourse structure by linking EDUs using a set of rhetorical or discursive relations.
     Arabic language characteristics:
     - It is an agglutinative language.
     - It does not have capital letters.
     - Diacritics are usually absent from written text.
  4. METHODOLOGY
     - The analysis was carried out on two corpus genres: news articles and elementary school textbooks.
     - The authors proposed a three-step segmentation algorithm:
       Step 1: punctuation marks.
       Step 2: lexical cues.
       Step 3: a mix of punctuation marks and lexical cues.
  5. METHODOLOGY CONT.
     Step 1: punctuation marks.
       Example:
       [Dr. Tarak Swiden has treated various diseases.]
     Step 2: lexical cues.
       Example:
       [They will know when we start][but they don't know when we finish]
  6. METHODOLOGY CONT.
     Step 3: a mix of punctuation marks and lexical cues.
     - If a comma is followed by the conjunction "و" (waw) or "ف" (fā) and then by a preposition of localization, it indicates the end of a segment.
       Example:
       [Like Tunisian families, her family left Marsa city,][then, they found themselves at the wonderful Marsa’s beach.]
  7. METHODOLOGY CONT.
     - If a comma is followed by the conjunction "و" (waw) or "ف" (fā) and then by a possessive noun, it indicates the end of a segment.
       Example:
       [I saw my sister outside,] [with a talking doll]
     - If a comma is followed by a demonstrative pronoun and then by a word that is not a verb, there is no segment boundary.
       Example:
       [Mr. Hamed, our teacher, was standing up, looking at us.]
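
The three steps on the preceding slides can be sketched as a small rule-based segmenter. This is only an illustration under stated assumptions: the cue list, the transliterated conjunctions `wa`/`fa`, and the tag names `PREP_LOC` and `POSS_NOUN` are hypothetical stand-ins, since the slides do not give the actual Arabic cue lists or tag set. The demonstrative-pronoun rule needs no code here because a comma creates no boundary by default.

```python
# Hypothetical resources, for illustration only (not taken from the slides).
STRONG_PUNCT = {".", "!", "?"}            # Step 1: assumed segment-final marks
LEXICAL_CUES = {"but"}                    # Step 2: stand-in for Arabic lexical cues
CONJUNCTIONS = {"wa", "fa"}               # transliterated waw / fa
BOUNDARY_TAGS = {"PREP_LOC", "POSS_NOUN"} # assumed tags for the Step 3 rules


def segment(tagged):
    """Split a list of (word, tag) pairs into segments (lists joined as strings)."""
    segments, current = [], []
    n = len(tagged)
    for i, (word, _tag) in enumerate(tagged):
        # Step 2: a lexical cue opens a new segment.
        if word in LEXICAL_CUES and current:
            segments.append(current)
            current = []
        current.append(word)
        # Step 1: strong punctuation closes the current segment.
        if word in STRONG_PUNCT:
            segments.append(current)
            current = []
        # Step 3: comma + conjunction + (localization preposition | possessive noun).
        elif word == "," and i + 2 < n:
            next_word = tagged[i + 1][0]
            tag_after = tagged[i + 2][1]
            if next_word in CONJUNCTIONS and tag_after in BOUNDARY_TAGS:
                segments.append(current)
                current = []
    if current:
        segments.append(current)
    return [" ".join(s) for s in segments]
```

A real implementation would work on Arabic tokens with a proper tagger; the sketch only shows how the punctuation and lexical rules combine in a single pass.
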
  8. THE RESULT
  9. SEMANTIC-BASED SEGMENTATION FOR ARABIC TEXT
     This approach aims to divide the text into complete, meaningful parts that can stand on their own, without the text that precedes or follows them.
     Connectors classification:
     - Active: words that indicate the beginning of a new segment, the end of a segment, or a complete segment.
     - Passive: words that do not indicate a new segment, the end of a segment, or a complete segment by themselves, but that help determine where a segment starts or ends when they occur with active elements.
  10. METHODOLOGY
      1- Identifying the connectors that indicate complete segments (with S instances in the SegBoundary property).
      2- Locating the active connectors.
      3- Resolving the case where adjacent active connectors exist.
      4- Setting the segment boundaries.
      5- Creating the final list of segments.
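
The five steps above can be sketched as follows. This is a hedged illustration, not the authors' implementation: the connector table is hypothetical, the boundary codes `B` (begins a segment) and `E` (ends a segment) are assumed by analogy with the `S` (complete segment) value mentioned on the slide, and adjacent active connectors are resolved by letting only the first one open a segment. Passive connectors are left out for brevity.

```python
# Hypothetical connector table: word -> assumed SegBoundary value.
# "S" comes from the slide; "B" and "E" are assumptions for this sketch.
CONNECTORS = {"and": "B", "then": "B", "finally": "S"}


def segment(tokens):
    """Split a token list into segments using active connectors only."""
    # Steps 1-2: locate the active connectors and their boundary type.
    marks = [(i, CONNECTORS[t]) for i, t in enumerate(tokens) if t in CONNECTORS]

    # Step 3: in a run of adjacent connectors, a later "B" opens no new segment.
    resolved, prev_index = [], None
    for i, kind in marks:
        adjacent = prev_index is not None and i == prev_index + 1
        prev_index = i
        if adjacent and kind == "B":
            continue
        resolved.append((i, kind))

    # Step 4: collect the indices at which a segment starts.
    starts = {0}
    for i, kind in resolved:
        if kind == "B":
            starts.add(i)
        elif kind == "E":
            starts.add(i + 1)
        else:  # "S": the connector is a segment on its own
            starts.update({i, i + 1})
    starts = sorted(s for s in starts if s < len(tokens))

    # Step 5: build the final list of segments.
    ends = starts[1:] + [len(tokens)]
    return [" ".join(tokens[a:b]) for a, b in zip(starts, ends)]
```
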
  11. THE RESULT
  12. ARABIC DISCOURSE SEGMENTATION BASED ON RHETORICAL METHOD
      - This technique is derived from Arabic rhetoric.
      - It focuses on the connector waw “و”.
      - It groups the six known rhetorical types of “و” into two classes, “Fasl” and “Wasl”:
        “Fasl”: types 1, 2 and 3
        “Wasl”: types 4, 5 and 6
      - Classification uses SVM machine learning.
  13. EXAMPLES
      Waw 1: [Professors teach students sciences and virtue, I swear to God, they have done a great mission for their nation]
      Waw 2: [Young people are not the only ones who suffer, but their crises are part of the crises of the whole society and someone may ask: Why focus only on youth and not on the divisions of the whole society?]
      Waw 3: [Adolescents suffer from some psychological problems and there are, in general, numerous other problems in the society.]
      Waw 4: [The teacher came smiling into the classroom.]
      Waw 5: [The couple sat together with the light of the moon.]
      Waw 6: [The study started and students and teachers enrolled in schools.]
  14. METHODOLOGY
      - Preprocessing
        - Diacritization
        - Discriminating the connector “و” from the letter “و”
      - Feature extraction
        - 22 features are extracted to distinguish each type of “و”.
      - Classification
  15. FEATURE EXTRACTION
      Waw 1:
        - X1 = “ ” and X7 = genitive mark.
        - X3 = noun, X7 = genitive mark and X16 = no.
      Waw 2:
        - X1 = “ ” and X7 = accusative mark.
        - X3 = noun, X5 = indefinite, X6 ≠ genitive mark and X7 = genitive mark.
      Waw 3:
        - X12 ≠ X13.
        - X14 ≠ X15.
        - X19 ≠ X20.
        - X21 = no and X22 = no.
      Waw 4:
        - X16 = yes.
        - X1 = “ ”, X10 = verb and X11 = past tense.
      Waw 5:
        - X3 = noun and X7 = accusative mark.
      Waw 6:
        - X2 = X3, X6 = X7, and (X4 = X5 OR X8 = X9 OR X17 = X18).
        - X12 = X13, X14 = X15, X19 = X20 and (X21 = yes OR X22 = yes).
  16. THE RESULT
      - The Corpus of Arabic Discourse Segmentation was used in this experiment.
      - 1200 instances were used for training and 293 for testing.
      - Class Waw 5 did not appear in the training or testing data.
      - Classes Waw 3 and Waw 6 were the most frequent.
      - Segmentation accuracy = 98.98%
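
As a rough illustration of how the reported figure relates to the test set, the sketch below maps each waw type to its class and computes plain accuracy. The function names are hypothetical, and the label lists are toy data invented for the arithmetic only. If the 98.98% was computed over exactly the 293 test instances, 290 correct predictions is the only whole-number count that rounds to it; the slides do not state that count.

```python
FASL_TYPES = {1, 2, 3}   # grouping as given on the slides
WASL_TYPES = {4, 5, 6}


def waw_class(waw_type):
    """Map a rhetorical waw type (1-6) to its class name."""
    if waw_type in FASL_TYPES:
        return "Fasl"
    if waw_type in WASL_TYPES:
        return "Wasl"
    raise ValueError(f"unknown waw type: {waw_type}")


def accuracy(gold, predicted):
    """Fraction of positions where the predicted label equals the gold label."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Toy labels (not real data): 290 of 293 instances classified correctly.
gold = [3] * 293
predicted = [3] * 290 + [6] * 3
print(round(100 * accuracy(gold, predicted), 2))  # 98.98
```
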
  17. THE LEEDS ARABIC DISCOURSE TREEBANK: ANNOTATING DISCOURSE CONNECTIVES FOR ARABIC
      - First effort toward producing an Arabic discourse treebank.
      - Discourse connectives are defined as lexical expressions that relate two text segments. The segments are called arguments.
      - Discourse relations play an important role in producing a coherent discourse.
      - Collecting Arabic connectives:
        - Text analysis and corpus-based techniques were used.
        - Connectives were manually extracted from 50 randomly selected texts from the PATB (Penn Arabic Treebank) and from 10 different websites.
        - The resulting list was manually tested by two native speakers.
        - Result: 107 discourse connectives.
  18. CONT.
      Types of relations:
  19. CONT.
      Agreement studies:
      - The corpus: PATB
      - ADA tool and annotation process
      [Figure: after annotating]
  20. METHODOLOGY
      - Annotation was done by two independent native speakers of Arabic.
      - Agreement is measured on two tasks:
        Task 1: whether the annotators agree on the binary decision of whether an item constitutes a discourse connective in context.
        Task 2: whether the annotators agree on which discourse relation an identified connective expresses.
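
Agreement between two annotators on a labelling task such as Task 1 can be quantified with a chance-corrected statistic. The slides do not say which statistic was used; Cohen's kappa is a common choice for two annotators and is used here as an assumption. The function name and the sample labels are illustrative only.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label distribution.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Illustrative labels: is the candidate a discourse connective in context?
annotator_1 = ["conn", "conn", "not", "not"]
annotator_2 = ["conn", "not", "not", "not"]
print(cohen_kappa(annotator_1, annotator_2))  # 0.5
```

The same function works for Task 2 if the labels are relation names instead of binary decisions.
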
  21. THE RESULT
      - Agreement on Task 1 is highly reliable.
      - Agreement on Task 2 (relation assignment) is relatively low.
  22. MODELLING DISCOURSE RELATIONS FOR ARABIC
      Discourse connective recognition:
      - Discourse connective recognition distinguishes between the discourse usage and non-discourse usage of potential connectives.
      - Conjunctions such as /w/ (and) and /āw/ (or) can have a discourse usage or just conjoin two non-abstract entities, as in /ʿmr w sārh/ (Omar and Sarah).
  23. CONT.
      Features:
      1. Surface features (SConn).
      2. Part-of-speech features (POS).
      3. Lexical features of surrounding words (Lex).
      4. Syntactic category of related phrases (Syn).
      5. Al-Masdar feature.
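
A minimal sketch of how the first three feature groups might be collected for one candidate connective is shown below. The feature names, the `<PAD>` placeholder, and the tag labels are assumptions made for illustration; the slides do not define the exact contents of SConn, POS, or Lex. The Syn and Al-Masdar features are omitted because they require a syntactic parse and morphological analysis.

```python
def connective_features(tokens, tags, i):
    """Build a feature dict for the candidate connective at position i.

    tokens and tags are parallel lists; the keys are hypothetical names.
    """
    if len(tokens) != len(tags) or not 0 <= i < len(tokens):
        raise ValueError("tokens/tags must align and i must be a valid index")

    def at(seq, j):
        # Return the item at j, or a padding symbol beyond the sentence edges.
        return seq[j] if 0 <= j < len(seq) else "<PAD>"

    return {
        # Surface features (assumed): the string itself and its position.
        "conn": tokens[i],
        "is_sentence_initial": i == 0,
        # Part-of-speech features of the candidate and its neighbours.
        "pos": tags[i],
        "prev_pos": at(tags, i - 1),
        "next_pos": at(tags, i + 1),
        # Lexical features of the surrounding words.
        "prev_word": at(tokens, i - 1),
        "next_word": at(tokens, i + 1),
    }


# The non-discourse example from the slides, in English gloss with /w/ kept.
features = connective_features(["Omar", "w", "Sarah"], ["NOUN", "CONJ", "NOUN"], 1)
```

Here both neighbours of /w/ are nouns, the kind of evidence a classifier could use to decide that the conjunction only joins two entities.
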
  25. Discourse relation recognition:
      1. Connective features.
      2. Words and POS of arguments. E.g., when the first word of Arg2 is /qd/ (might/may) or /kān/ (had), the relation is likely to be EXPANSION.BACKGROUND or EXPANSION.CONJUNCTION.
      3. Tense and negation.
      4. Masdar.
      5. Argument parent.
      6. Production rules.
  26. Performance of different models for identifying fine-grained discourse relations on two datasets.
      Performance of different models for identifying class-level discourse relations on two datasets.
  27. CONCLUSION
      In this survey we presented approaches to annotating discourse connectives and several segmentation techniques for Arabic. They rely on different corpora and methods and, accordingly, report different results.
  28. THANKS!