Survey on Discourse
Annotation for Arabic
A. Algarni, H. Alharbi and N. Almutairy
Supervisor: Dr. A. Alsaif
April 23, 2013
Kingdom of Saudi Arabia
Ministry of Higher Education
Imam Mohammed Ibn Saud Islamic University
College of Computer and Information Sciences
CS465 - Natural Language Processing
Outline
• Introduction
• The Leeds Arabic Discourse Treebank
• Discourse Connective Recognition
• Discourse Relation Recognition
• Semantic-Based Segmentation
• Discourse Segmentation Based on Rhetorical Methods
• A Comprehensive Taxonomy of Arabic Discourse Coherence Relations
Introduction
• Linguistic annotation covers any descriptive or analytic notation applied to raw language data.
• Annotated discourse corpora facilitate theoretical studies and contribute to the development of NLP applications.
Applications
• Information extraction
• Question answering
• Summarization
• Machine translation and generation
Discourse Relations and Discourse Connectives
• Discourse relation: the way in which two arguments (text segments) are logically connected, e.g. Temporal, Comparison, Causal, Expansion.
• Discourse connective (DC): a lexical marker used to link two abstract objects in a text.
• Abstract object (AO): abstract objects in discourse are things such as propositions, events, facts, and opinions.
• Argument (Arg): a text span that expresses an abstract object and is linked by a DC.
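The four terms above fit naturally into one small record. The following is a minimal sketch, not part of the original deck: the class name `DiscourseRelation` and its field names are hypothetical, the sentence is the bracketed English example used later in the deck, and the label "Expansion" for "and" is an assumption made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscourseRelation:
    """One annotated relation: two arguments linked by a connective."""
    arg1: str        # text span expressing the first abstract object
    connective: str  # the discourse connective (DC)
    arg2: str        # text span expressing the second abstract object
    relation: str    # relation label, e.g. Temporal, Comparison, Causal, Expansion

    def bracketed(self) -> str:
        """Render the relation in the Arg1/DC/Arg2 bracket notation."""
        return f"[{self.arg1}]Arg1 [{self.connective}]DC [{self.arg2}]Arg2"


# The relation label here is an assumed value, not taken from the deck.
example = DiscourseRelation(
    arg1="Children might be tired",
    connective="and",
    arg2="feel sleepy",
    relation="Expansion",
)
print(example.bracketed())
```

Keeping the connective separate from both arguments mirrors the definitions: the DC is the marker, and the arguments are the spans it links.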
The Leeds Arabic Discourse Treebank
• The first effort towards producing an Arabic discourse treebank was introduced in 2011 by A. Alsaif and K. Markert.
• They collected a large set of Arabic discourse connectives using text analysis and corpus-based techniques.
• The final list contains 107 discourse connectives.
Types of Discourse Connectives
(figure not preserved)
Types of Relations
(figure not preserved; one surviving label: COMPARISON.Similarity)
Arabic Discourse Annotation Tool (ADA) and Annotation Process
(figure not preserved)
Annotation Methodology
1. Measuring whether annotators agree on the binary decision of whether an item constitutes a discourse connective in context.
2. Measuring whether annotators agree on which discourse relation an identified connective expresses, given that annotators can assign a set of relations to a connective.
Results
• Agreement on task 1 (connective identification) is highly reliable (N=23331): percentage agreement of 0.95 and kappa of 0.88.
• Agreement on task 2 (relation assignment) is relatively low (N=5586): percentage agreement of 0.66, kappa of 0.57, and alpha of 0.58.
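Kappa is lower than percentage agreement because it corrects for agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below assumes the reported kappa is Cohen's kappa for two annotators (the slides do not name the variant). The function name `cohen_kappa` and the toy labels are invented for illustration and are not data from the treebank.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    # Observed agreement: share of items given identical labels.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, from each annotator's own label distribution.
    freq_a, freq_b = Counter(a), Counter(b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Toy data (invented): 1 = discourse connective, 0 = not a connective.
ann1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
ann2 = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0]
kappa = cohen_kappa(ann1, ann2)
print(round(kappa, 3))
```

In the toy data the annotators agree on 8 of 10 items, yet kappa is clearly lower than 0.8 because about half of that agreement would be expected by chance.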
Discourse Connective Recognition
• The task is to distinguish between discourse and non-discourse usage of a connective.
• Examples: once, while.
• A. Alsaif and K. Markert (2011) introduced a connective identifier for Arabic based on syntactic features.
Discourse Connective Recognition by A. Alsaif and K. Markert (2011)
Features:
• Surface features (SConn)
• Lexical features of surrounding words (Lex)
• Example: [Children might be tired]Arg1 [and]DC [feel sleepy]Arg2 during school time if they did not sleep well.
Discourse Connective Recognition by A. Alsaif and K. Markert (2011), cont.
Features:
• Part-of-speech features (POS)
• Syntactic category of related phrases (Syn), e.g. "the school is very large and beautiful"
• Al-Masdar feature
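To make the feature groups concrete, here is a minimal sketch of how a candidate connective could be turned into a feature dictionary. It is an illustration only, not the authors' feature set: the function name, the feature keys, the definition of the surface-position feature, and the English POS tags are all assumptions. The Syn and Al-Masdar features are left out because they need a parse tree and a morphological analysis.

```python
from typing import Dict, List


def connective_features(tokens: List[str], pos: List[str], i: int) -> Dict[str, str]:
    """Features for the candidate connective at index i of a tagged sentence."""

    def at(seq: List[str], j: int) -> str:
        # Positions outside the sentence get a padding value.
        return seq[j] if 0 <= j < len(seq) else "<PAD>"

    return {
        # Conn: the connective string itself.
        "conn": tokens[i].lower(),
        # SConn (assumed definition): where the connective sits in the sentence.
        "sconn_position": "initial" if i == 0 else "medial",
        # Lex: the words immediately before and after the connective.
        "lex_prev": at(tokens, i - 1).lower(),
        "lex_next": at(tokens, i + 1).lower(),
        # POS: the tags of the same neighbouring words.
        "pos_prev": at(pos, i - 1),
        "pos_next": at(pos, i + 1),
    }


tokens = ["Children", "might", "be", "tired", "and", "feel", "sleepy"]
pos = ["NNS", "MD", "VB", "JJ", "CC", "VB", "JJ"]  # hand-assigned tags
feats = connective_features(tokens, pos, tokens.index("and"))
print(feats)
```

A dictionary of named features like this can be fed to most standard classifiers after vectorisation, which is how feature groups can be added or removed to compare models.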
Discourse Connective Recognition by A. Alsaif and K. Markert (2011), cont.
Results

Model     Features                         Accuracy (%)  Kappa
Baseline  not Conn                         68.9          0
M1        Conn only                        75.7          0.48
Tokenization by white space + auto tagger
M2        Conn+SConn+Lex                   85.6          0.62
M3        Conn+SConn+Lex+POS               87.6          0.69
M4        Conn+SConn+Lex+POS+Masdar        88.5          0.70
ATB-based features
M5        Conn+SConn+Lex                   86.2          0.65
M6        Conn+SConn+Lex+Syn/POS           91.2          0.79
M7        Conn+SConn+Lex+Syn/POS+Masdar    92.4          0.82
M8        Conn+SConn+Syn                   91.2          0.79
M9        SConn+Lex+Syn+Masdar             91.2          0.79
Discourse Relation Recognition
• The task is to identify the type of the relation.
• A. Alsaif and K. Markert (2011) introduced the first algorithms to automatically identify relations for Arabic.
Discourse Relation Recognition by A. Alsaif and K. Markert (2011)
Features:
• Connective features
• Words and POS of arguments
• Masdar
• Tense and negation
• Length, distance, and order features
• Argument parent
• Production rules
Results

Model     Features                                     Accuracy (%)  Kappa
All connectives (6039)
Baseline  CONJUNCTION                                  52.5          0
M1        Conn only (1)                                77.2          0.60
M2        Conn+Conn f+Arg f (37)                       78.7          0.66
M3        Conn+Conn f+Arg f+Production rules (1237)    78.3          0.65
Excluding wa at BOP (3813)
Baseline  CONJUNCTION                                  35            0
M1        Conn only (1)                                74.3          0.65
M2        Conn+Conn f+Arg f (37)                       77.0          0.69
M3        Conn+Conn f+Arg f+Production rules (1237)    76.7          0.69
Results

Model     Features                  Accuracy (%)  Kappa
All connectives (6039)
Baseline  EXPANSION                 62.4          0
M1        Conn only (1)             88.7          0.78
M2        Conn+Conn f+Arg f (37)    88.7          0.78
Excluding wa at BOP (3813)
Baseline  EXPANSION                 41.8          0
M1        Conn only (1)             82.7          0.74
M2        Conn+Conn f+Arg f (37)    83.5          0.75
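The baseline rows pair a non-trivial accuracy with a kappa of 0. A short sketch shows why, assuming the baseline always predicts the most frequent relation and that kappa is Cohen's kappa: a constant predictor's chance agreement equals its accuracy, so the numerator p_o - p_e vanishes. The function name and the toy gold labels are invented for illustration.

```python
from collections import Counter
from typing import List, Tuple


def majority_baseline(gold: List[str]) -> Tuple[str, float, float]:
    """Return (label, accuracy, kappa) for always predicting the top gold label."""
    counts = Counter(gold)
    label, freq = counts.most_common(1)[0]
    accuracy = freq / len(gold)
    # The predictor outputs `label` with probability 1, so chance agreement
    # is 1 * P_gold(label), which is exactly the accuracy.
    p_e = accuracy
    kappa = 0.0 if p_e == 1.0 else (accuracy - p_e) / (1 - p_e)
    return label, accuracy, kappa


# Toy gold labels (invented), with EXPANSION as the most frequent class.
gold = ["EXPANSION"] * 5 + ["CAUSAL"] * 2 + ["TEMPORAL"] * 2 + ["COMPARISON"]
label, acc, kappa = majority_baseline(gold)
print(label, acc, kappa)
```

This is why kappa is the more informative column when one class dominates: accuracy rewards the baseline for the class imbalance, while kappa does not.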
Semantic-Based Segmentation of Arabic Texts
• Corpus analysis
• Definition: let L be a list of candidate segment connectors. Each element c in L is classified, based on its effect on text segmentation, as either active or passive.
• Examples: (Arabic examples not preserved)
Segmentation Process
• Identifying the connectors that indicate complete segments.
• Locating the active connectors.
• Resolving the case where adjacent active connectors exist.
• Setting the segment boundaries.
• Creating the final list of segments.
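The steps above can be sketched in a few lines. This is a rough illustration under stated assumptions, not the published algorithm: it uses English tokens, a hypothetical set of active connectors, and an assumed rule that a run of adjacent active connectors opens a single segment at the first of them. The first step (connectors that indicate complete segments) is not modelled.

```python
from typing import List, Set


def segment(tokens: List[str], active: Set[str]) -> List[List[str]]:
    """Split tokens into segments that each start at an active connector."""
    # Locate the active connectors.
    hits = [i for i, tok in enumerate(tokens) if tok in active]
    # Assumed rule for adjacent active connectors: keep only the first one
    # of each run, so the whole run opens one segment.
    boundaries = [i for k, i in enumerate(hits) if k == 0 or i != hits[k - 1] + 1]
    # Set the segment boundaries; position 0 always opens the first segment.
    starts = sorted({0, *boundaries})
    ends = starts[1:] + [len(tokens)]
    # Create the final list of segments.
    return [tokens[s:e] for s, e in zip(starts, ends)]


tokens = "he slept late so he missed the bus and then he walked".split()
active = {"so", "and", "then"}  # hypothetical active connectors
segments = segment(tokens, active)
for seg in segments:
    print(" ".join(seg))
```

Passive connectors need no special handling in this sketch: anything outside the active set simply stays inside its segment.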
Discussion
• To evaluate the segmentation process, the authors collected ten essays.
• Each essay ranges between 500 and 700 words.
• After running the segmentation process, they gave the output to judges, who evaluated it in terms of two factors: correct hits and incorrect hits.
Discussion Cont.

Essay  Correct hit  Incorrect hit
1      33           0
2      15           1
3      25           0
4      23           1
5      20           0
6      29           1
7      26           1
8      33           2
9      26           0
10     22           0
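The slide gives only per-essay counts. One way to summarise them is the share of correct hits among all hits, overall and per essay. The sketch below reads each table row as (essay, correct hits, incorrect hits); the summary measure itself is an assumption and is not reported in the source.

```python
# (essay, correct hits, incorrect hits), as read from the evaluation table.
results = [
    (1, 33, 0), (2, 15, 1), (3, 25, 0), (4, 23, 1), (5, 20, 0),
    (6, 29, 1), (7, 26, 1), (8, 33, 2), (9, 26, 0), (10, 22, 0),
]

total_correct = sum(correct for _, correct, _ in results)
total_incorrect = sum(incorrect for _, _, incorrect in results)

# Share of hits judged correct, over all ten essays.
hit_rate = total_correct / (total_correct + total_incorrect)

# The same share for each essay separately.
per_essay = {essay: c / (c + i) for essay, c, i in results}

print(total_correct, total_incorrect, round(hit_rate, 4))
```

Note that this measure says nothing about boundaries the system missed, since the table counts only the hits that were produced.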
Arabic Discourse Segmentation Based on Rhetorical Methods
• This method depends on the meaning of the connector "و" (wa) in Arabic.
• There are six types of "و", classified into two classes, "Fasl" and "Wasl":
  • "Fasl": a segmentation point.
  • "Wasl": not a segmentation point; it connects the text.
Types of Connector "و"
Table columns: Type, Example, Class. Only the Class column is preserved: three types are classed as Fasl and three as Wasl.
The Arabic Sentence Segmentation System
(figure not preserved)
Feature Extraction
• The following are the features of "و": X3 = noun and X7 = accusative mark.
Experiment and Results
• The authors used 1200 instances for training.
• They used 293 instances for testing, of which 290 were classified correctly and 3 incorrectly.
• Results:
  Recall     94.68%
  Precision  96.82%
  Accuracy   98.98%
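The reported accuracy can be checked directly from the test counts, since accuracy is the number of correct instances divided by the total. The sketch below does that and also writes out the standard definitions of precision and recall. The slide does not give true-positive, false-positive, or false-negative counts, so the counts passed to those two functions are hypothetical and do not reproduce the reported figures.

```python
def accuracy(correct: int, total: int) -> float:
    """Share of test instances classified correctly."""
    return correct / total


def precision(tp: int, fp: int) -> float:
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Share of true positives that were found."""
    return tp / (tp + fn)


# Counts from the slide: 290 correct out of 293 test instances.
test_accuracy = accuracy(290, 293)
print(f"{test_accuracy:.2%}")

# Hypothetical counts, for illustration of the definitions only.
example_precision = precision(tp=8, fp=2)
example_recall = recall(tp=8, fn=1)
```

The computed accuracy rounds to the 98.98% shown on the slide.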
A Comprehensive Taxonomy of Arabic Discourse Coherence Relations
• Coherence relations are classified into two types: explicit relations and implicit relations.

Coherence relation  Example
Explicit            I am very happy because I got excellent marks in exams.
Implicit            I am very happy. I got excellent marks in exams.
The procedure of creating an Arabic Taxonomy of Coherence Relations
(figure not preserved)
Examples of Implicit Arabic Relations
• "Impossible condition" (Arabic name and example not preserved)
• "Cascaded questioning" (Arabic name and example not preserved)
Results
• The resulting taxonomy contains 47 Arabic coherence relations:

Source                                          Coherence relations
From English coherence relations                31
Additional Arabic explicit coherence relations  12
Arabic implicit relations                       4
Conclusion
Discourse annotation is a very fertile field with many NLP applications. For Arabic, there are still challenges due to the lack of annotated corpora and studies.
Thank You
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
SitimaJohn
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 

Recently uploaded (20)

GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
Project Management Semester Long Project - Acuity
Project Management Semester Long Project - AcuityProject Management Semester Long Project - Acuity
Project Management Semester Long Project - Acuity
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 

Discourse annotation for arabic 2

  • 1. Survey on Discourse Annotation for Arabic. A. Algarni, H. Alharbi and N. Almutairy. Supervisor: Dr. A. Alsaif. April 23, 2013. Kingdom of Saudi Arabia, Ministry of Higher Education, Imam Mohammed Ibn Saud Islamic University, College of Computer and Information Sciences. CS465 - Natural Language Processing.
  • 2. Outline: Introduction; The Leeds Arabic Discourse Treebank; Discourse Connective Recognition; Discourse Relation Recognition; Semantic-Based Segmentation; Discourse Segmentation Based on Rhetorical Methods; A Comprehensive Taxonomy of Arabic Discourse Coherence Relations.
  • 3. Introduction: Linguistic annotation covers any descriptive or analytic notation applied to raw language data. Annotated discourse corpora can facilitate theoretical studies and contribute to the development of NLP applications.
  • 4. Applications: information extraction; question answering; summarization; machine translation and generation.
  • 5. Discourse Relations and Discourse Connectives. A discourse relation is the way in which two arguments (text segments) are logically connected, e.g. Temporal, Comparison, Causal, Expansion. Discourse connective (DC): a lexical marker used to link two abstract objects in a text. Abstract object (AO): things such as propositions, events, facts and opinions. Argument (Arg): a text span expressing an abstract object and linked by a DC.
  • 6. The Leeds Arabic Discourse Treebank. The first effort towards producing an Arabic discourse treebank was introduced in 2011 by A. Alsaif and K. Markert. They collected a large set of Arabic discourse connectives using text analysis and corpus-based techniques. The final list contains 107 discourse connectives.
  • 7. Types of Discourse Connectives
  • 9. Types of Relations (cont.): COMPARISON.Similarity
  • 10. Arabic Discourse Annotation Tool (ADA) and Annotation Process
  • 11. Annotation Methodology: (1) measuring whether annotators agree on the binary decision of whether an item constitutes a discourse connective in context; (2) measuring whether annotators agree on which discourse relation an identified connective expresses, given that annotators can assign a set of relations to a connective.
  • 12. Results: Agreement on task 1 is highly reliable (N=23331): percentage agreement of 0.95, kappa of 0.88. Agreement on task 2 (relation assignment) is relatively low (N=5586): percentage agreement of 0.66, kappa of 0.57, and alpha of 0.58.
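To make the two agreement figures concrete, the following is a minimal sketch of how percentage agreement and kappa can be computed for two annotators. It assumes Cohen's kappa (the slides do not say which kappa variant was used), and the labels `DC` / `non` and the toy annotations are hypothetical, not taken from the treebank.

```python
from collections import Counter


def percentage_agreement(labels_a, labels_b):
    """Fraction of items on which the two annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("annotations must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    p_observed = percentage_agreement(labels_a, labels_b)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both annotators pick the same label
    # if each labels independently according to their own label frequencies.
    p_expected = sum(
        count_a[label] * count_b[label] for label in set(count_a) | set(count_b)
    ) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy annotations for task 1 (connective vs. non-connective).
annotator_1 = ["DC", "DC", "non", "non", "DC", "non", "non", "non", "DC", "non"]
annotator_2 = ["DC", "DC", "non", "non", "non", "non", "non", "non", "DC", "non"]

agreement = percentage_agreement(annotator_1, annotator_2)
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"agreement={agreement:.2f} kappa={kappa:.2f}")
```

Kappa is lower than raw agreement because part of the raw agreement is expected by chance, which is why the slide reports both numbers.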
  • 13. Discourse Connective Recognition: the task is to distinguish between discourse and non-discourse usage of a connective (e.g. "once", "while"). A. Alsaif and K. Markert (2011) introduced a connective identifier for Arabic based on syntactic features.
  • 14. Discourse Connective Recognition by A. Alsaif and K. Markert (2011). Features: surface features (SConn); lexical features of surrounding words (Lex). Example of the pattern Arg1 DC Arg2: [Children might be tired]Arg1 [and]DC [feel sleepy]Arg2 during school time if they did not sleep well.
  • 15. Discourse Connective Recognition by A. Alsaif and K. Markert (2011), cont. Features: part-of-speech features (POS); syntactic category of related phrases (Syn), e.g. "the school is very large and beautiful"; the Al-Masdar feature.
  • 16. Results: Discourse Connective Recognition by A. Alsaif and K. Markert (2011), cont. Accuracy (%) and kappa per feature set:
      Baseline (not Conn): accuracy 68.9, kappa 0
      M1 (Conn only): accuracy 75.7, kappa 0.48
      Tokenization by white space + auto tagger:
      M2 (Conn+SConn+Lex): accuracy 85.6, kappa 0.62
      M3 (Conn+SConn+Lex+POS): accuracy 87.6, kappa 0.69
      M4 (Conn+SConn+Lex+POS+Masdar): accuracy 88.5, kappa 0.70
      ATB-based features:
      M5 (Conn+SConn+Lex): accuracy 86.2, kappa 0.65
      M6 (Conn+SConn+Lex+Syn/POS): accuracy 91.2, kappa 0.79
      M7 (Conn+SConn+Lex+Syn/POS+Masdar): accuracy 92.4, kappa 0.82
      M8 (Conn+SConn+Syn): accuracy 91.2, kappa 0.79
      M9 (SConn+Lex+Syn+Masdar): accuracy 91.2, kappa 0.79
  • 17. Discourse Relation Recognition: the task is to identify the type of the relation. A. Alsaif and K. Markert (2011) introduced the first algorithms to automatically identify discourse relations for Arabic.
  • 18. Discourse Relation Recognition by A. Alsaif and K. Markert (2011). Features: connective features; words and POS of arguments; Masdar; tense and negation; length, distance and order features; argument parent; production rules.
  • 19. Results. Accuracy (%) and kappa per feature set:
      All connectives (6039):
      Baseline (CONJUNCTION): accuracy 52.5, kappa 0
      M1 (Conn only (1)): accuracy 77.2, kappa 0.60
      M2 (Conn+Conn f+Arg f (37)): accuracy 78.7, kappa 0.66
      M3 (Conn+Conn f+Arg f+Production rules (1237)): accuracy 78.3, kappa 0.65
      Excluding wa at BOP (3813):
      Baseline (CONJUNCTION): accuracy 35, kappa 0
      M1 (Conn only (1)): accuracy 74.3, kappa 0.65
      M2 (Conn+Conn f+Arg f (37)): accuracy 77.0, kappa 0.69
      M3 (Conn+Conn f+Arg f+Production rules (1237)): accuracy 76.7, kappa 0.69
  • 20. Results. Accuracy (%) and kappa per feature set:
      All connectives (6039):
      Baseline (EXPANSION): accuracy 62.4, kappa 0
      M1 (Conn only (1)): accuracy 88.7, kappa 0.78
      M2 (Conn+Conn f+Arg f (37)): accuracy 88.7, kappa 0.78
      Excluding wa at BOP (3813):
      Baseline (EXPANSION): accuracy 41.8, kappa 0
      M1 (Conn only (1)): accuracy 82.7, kappa 0.74
      M2 (Conn+Conn f+Arg f (37)): accuracy 83.5, kappa 0.75
  • 21. Semantic-Based Segmentation of Arabic Texts. Corpus analysis. Definition: let L be a list of candidate segment connectors; each element c in L is classified, based on its effect on text segmentation, as either active or passive.
  • 22. Segmentation Process: (1) identifying the connectors that indicate complete segments; (2) locating the active connectors; (3) resolving the case where adjacent active connectors exist; (4) setting the segment boundaries; (5) creating the final list of segments.
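The segmentation steps can be illustrated with a rough sketch. This is not the authors' implementation: the slide does not say how adjacent active connectors are resolved, so the sketch assumes that a run of adjacent active connectors yields a single boundary before the first one. The function name `segment`, the English tokens and the connector set are all hypothetical stand-ins for the Arabic data.

```python
def segment(tokens, active_connectors):
    """Split a token list into segments that start at active connectors.

    Assumption (not from the slides): when several active connectors are
    adjacent, only the first one opens a new segment.
    """
    if not tokens:
        return []

    # Locate active connectors and set boundaries, skipping adjacent repeats.
    boundaries = []
    previous_was_active = False
    for index, token in enumerate(tokens):
        is_active = token in active_connectors
        if is_active and not previous_was_active and index > 0:
            boundaries.append(index)
        previous_was_active = is_active

    # Create the final list of segments from the boundary positions.
    segments = []
    start = 0
    for boundary in boundaries:
        segments.append(tokens[start:boundary])
        start = boundary
    segments.append(tokens[start:])
    return segments


# Hypothetical English stand-in for an Arabic sentence.
example_tokens = "he studied hard but however he failed because he was ill".split()
example_active = {"but", "however", "because"}
example_segments = segment(example_tokens, example_active)
for seg in example_segments:
    print(" ".join(seg))
```

Passive connectors are simply left inside the running segment here, which matches the active/passive distinction only in the loosest sense; a real system would need the corpus-derived connector classification.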
  • 23. Discussion: To evaluate the segmentation process, they collected ten essays, each between 500 and 700 words. After running the segmentation process, they gave the output to judges, who evaluated it in terms of two factors: correct hits and incorrect hits.
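  • 24. Discussion (cont.). Correct and incorrect hits per essay:
      Essay 1: correct hits 33, incorrect hits 0
      Essay 2: correct hits 15, incorrect hits 1
      Essay 3: correct hits 25, incorrect hits 0
      Essay 4: correct hits 23, incorrect hits 1
      Essay 5: correct hits 20, incorrect hits 0
      Essay 6: correct hits 29, incorrect hits 1
      Essay 7: correct hits 26, incorrect hits 1
      Essay 8: correct hits 33, incorrect hits 2
      Essay 9: correct hits 26, incorrect hits 0
      Essay 10: correct hits 22, incorrect hits 0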
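As a small worked example, the per-essay counts can be summarised as a share of correct hits. The numbers below assume the flattened slide table reads, per row, as incorrect hits, correct hits, essay number; the ratio itself is an illustrative summary and is not a metric reported on the slides.

```python
# (essay, correct_hits, incorrect_hits), as read from the slide table
# under the column-order assumption stated above.
HITS = [
    (1, 33, 0), (2, 15, 1), (3, 25, 0), (4, 23, 1), (5, 20, 0),
    (6, 29, 1), (7, 26, 1), (8, 33, 2), (9, 26, 0), (10, 22, 0),
]


def correct_ratio(correct, incorrect):
    """Share of hits that the judges marked as correct."""
    total = correct + incorrect
    return correct / total if total else 0.0


per_essay = {essay: correct_ratio(c, i) for essay, c, i in HITS}
total_correct = sum(c for _, c, _ in HITS)
total_incorrect = sum(i for _, _, i in HITS)
overall = correct_ratio(total_correct, total_incorrect)

print(f"correct={total_correct} incorrect={total_incorrect} overall={overall:.3f}")
```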
  • 25. Arabic Discourse Segmentation Based on Rhetorical Methods: This method depends on the meaning of the connector " " in Arabic. There are six types of " ", classified into two classes, "Fasl" and "Wasl": "Fasl" marks a segmentation point; "Wasl" does not segment but connects the text.
  • 26. Types of Connector " " (columns: Type, Example, Class): three types belong to the class Fasl and three to the class Wasl.
  • 28. Feature Extraction: the following are the features of " ": X3 = noun and X7 = accusative mark.
  • 29. Experiment and Results: They used 1200 instances for training and 293 instances for testing; of the test instances, 290 were classified correctly and 3 incorrectly (290/293 ≈ 98.98%). Results: recall 94.68%, precision 96.82%, accuracy 98.98%.
  • 30. A Comprehensive Taxonomy of Arabic Discourse Coherence Relations: Coherence relations are classified into two types, explicit and implicit. Explicit relation example: "I am very happy because I got excellent marks in exams." Implicit relation example: "I am very happy. I got excellent marks in exams."
  • 31. The Procedure of Creating an Arabic Taxonomy of Coherence Relations
  • 32. Examples of Implicit Arabic Relations: "Impossible condition"; "Cascaded questioning".
  • 33. Results: They obtained a set of 47 Arabic coherence relations: 31 from English coherence relations, 12 additional Arabic explicit coherence relations, and 4 Arabic implicit relations.
  • 34. Conclusion: Discourse annotation is a very fertile field with many NLP applications. For Arabic, challenges remain due to the lack of annotated corpora and studies.