Introduction
The state of MWT extraction
Proposed Method
Evaluation and Results
Conclusion and perspectives
Bibliography

A Study of Association Measures and their Combination for Arabic MWT Extraction

10th International Conference on Terminology and Artificial Intelligence (TIA’2013)
Abdelkader El Mahdaouy, Said El Alaoui Ouatik and Eric Gaussier

October 28th, 2013

A. El Mahdaouy, S.O El Alaoui and E. Gaussier

Arabic MWT Extraction

1 / 20
Table of contents

1 Introduction
    Terminology Extraction
    Motivation
2 The state of MWT extraction
    Standard Approaches
    Statistical Measures
3 Proposed Method
    Linguistic Filter
    Statistical Filter
4 Evaluation and Results
    Corpus
    Evaluation Method
    Obtained results
5 Conclusion and perspectives
6 Bibliography

Terminology Extraction

Terminology
The set of terms representing the system of concepts of a particular subject field.

Term
A lexical unit that has an unambiguous meaning when used in a text of a specific domain.
Refers to a defined concept ... (ISO 704).

Terminology Extraction
A subtask of information extraction.
Automatically extracts relevant terms from a given corpus.

Motivation

The bag-of-words model (based on single-word terms) is a simplifying representation used in natural language processing and information retrieval (IR).
Multi-word terms (MWTs) are less ambiguous and less polysemous than single-word terms.
Using MWTs instead of single-word terms yields a better representation of document content.

Standard Approaches

Linguistic Approaches
Based on linguistic pre-processing and POS tagging.
Extract candidate terms using syntactic patterns.

Statistical Approaches
Rank candidate terms with a measure that gives higher scores to “good” candidate terms.
Frequent expressions are assumed to represent important concepts.

Hybrid Approaches
Combine linguistic and statistical techniques to extract MWTs, in order to avoid the weaknesses of each approach used alone.

Characteristics of MWTs
Defined by Kageura and Umino (1996):

Unithood
The degree of strength or stability of syntagmatic combinations or collocations.
Measures: Log-Likelihood Ratio (LLR), T-Score, Mutual Information (MI), etc.

Termhood
The degree to which a linguistic unit is related to a specific domain concept.
Measures: C-value/NC-value.
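
As an illustration of a unithood measure, the sketch below computes the log-likelihood ratio for a two-word candidate from a 2x2 contingency table, in the style of Dunning's statistic. The slides do not show the formula, so the function name, argument names and counting scheme are assumptions made for this example.

```python
import math


def _cell_term(observed, expected):
    # Contribution of one contingency cell; an empty cell contributes 0.
    return observed * math.log(observed / expected) if observed > 0 else 0.0


def log_likelihood_ratio(f_xy, f_x, f_y, n):
    """LLR (G^2) for the word pair (x, y).

    f_xy: occurrences of x immediately followed by y
    f_x:  occurrences of x as first word of a pair
    f_y:  occurrences of y as second word of a pair
    n:    total number of word pairs in the corpus
    (Hypothetical argument names; adapt to your own counting scheme.)
    """
    observed = [
        f_xy,                  # x followed by y
        f_x - f_xy,            # x followed by another word
        f_y - f_xy,            # another word followed by y
        n - f_x - f_y + f_xy,  # neither x first nor y second
    ]
    # Expected counts under the hypothesis that x and y are independent.
    expected = [
        f_x * f_y / n,
        f_x * (n - f_y) / n,
        (n - f_x) * f_y / n,
        (n - f_x) * (n - f_y) / n,
    ]
    return 2.0 * sum(_cell_term(o, e) for o, e in zip(observed, expected))
```

A pair whose observed count equals its expected count under independence scores 0, and the score grows as the two words co-occur more often than chance would predict.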

Proposed Method
The hybrid method consists of two filters:

Linguistic Filter
Uses AMIRA 2.0 (a POS tagging toolkit).
Extracts MWT candidates based on syntactic patterns.
Handles the problem of MWT variation.

Statistical Filter
Proposes a novel statistical measure (NLC-value) that combines context information with termhood and unithood.
Evaluates state-of-the-art statistical measures.
Linguistic Filter
The proposed linguistic filter extracts candidate MWTs using two core components: the POS tagger and the sequence identifier.

Syntactic patterns
(Noun + (Noun|Adj) + |(Noun|Adj) + |(Noun|Adj)).
Noun Prep Noun.

Figure 1: The global schema of the linguistic filter
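
The sequence-identifier step can be sketched as a regular expression over POS tags. This is a minimal sketch, not the authors' implementation: the tag names (`NOUN`, `ADJ`, `PREP`), the one-letter encoding and the simplified reading of the patterns (a noun followed by one or more nouns or adjectives, or Noun Prep Noun) are all assumptions. English glosses stand in for Arabic tokens.

```python
import re

# Hypothetical coarse tag set mapped to one-letter codes; any other tag becomes "O".
TAG_CODES = {"NOUN": "N", "ADJ": "A", "PREP": "P"}

# Simplified reading of the slide's patterns (an assumption):
#   a noun followed by one or more nouns/adjectives, or Noun Prep Noun.
PATTERN = re.compile(r"N[NA]+|NPN")


def extract_candidates(tagged_tokens):
    """Return maximal, non-overlapping tag-pattern matches as tuples of words.

    tagged_tokens: list of (word, tag) pairs, as a POS tagger might produce.
    """
    codes = "".join(TAG_CODES.get(tag, "O") for _, tag in tagged_tokens)
    return [
        tuple(word for word, _ in tagged_tokens[m.start():m.end()])
        for m in PATTERN.finditer(codes)
    ]


sentence = [
    ("pollution", "NOUN"), ("of", "PREP"), ("oceans", "NOUN"),
    ("threatens", "VERB"),
    ("stocks", "NOUN"), ("water", "NOUN"), ("underground", "ADJ"),
]
candidates = extract_candidates(sentence)
```

Encoding each tag as a single character makes match offsets coincide with token indices. Only maximal matches are returned, so a real filter would also need to enumerate nested sub-sequences.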
Term variation
Four types of variation are handled: graphical variants, inflectional variants, morpho-syntactic variants and syntactic variants.

Graphical Variants
Concern orthographic errors that occur when writing certain letters ([Arabic letters missing]).

Example
[Arabic form missing], which leads to [Arabic form missing], meaning “Biodiversity”.
.
Inflectional Variants
These variants are due to the use of different forms of the words constituting an MWT:
Gender and number.
The presence or absence of a definite article.

Examples
1. [Arabic form missing] (ocean pollution), which leads to [Arabic form missing] (pollution of the oceans).
2. [Arabic form missing] (water pollution), which leads to [Arabic form missing] (the water pollution).

Morpho-syntactic Variants
These variants affect the internal structure of the term, as the words it contains are related through derivational morphology:
Noun1 Noun2 ⇔ Noun1 Adj.
Noun1 Adj ⇔ Noun1 Prep Noun.

Examples
1. [Arabic form missing] and [Arabic form missing] (air pollution).
2. [Arabic form missing], which leads to [Arabic form missing] (barrel of oil).

Syntactic Variants
These variants modify the structure of the MWT candidate by adding one or more words (such as adjectives) but do not affect the grammatical categories:
Noun1 Noun2 ⇔ Noun1 Noun2 Adj.
Noun1 Adj1 ⇔ Noun1 Adj1 Adj2.

Examples
1. [Arabic form missing] (Water stocks) and [Arabic form missing] (Groundwater stocks).
2. [Arabic form missing] (Health Organization) and [Arabic form missing] (World Health Organization).

Statistical Filter: the NLC-value

$$\text{NLC-value}(a) = 0.8 \cdot \text{LC-value}(a) + 0.2 \cdot \text{N-value}(a) \qquad (1)$$

with

$$\text{LC-value}(a) = \begin{cases} \log_2(|a|) \cdot FL(a) & \text{if } a \text{ is not nested,} \\ \log_2(|a|) \cdot \left( FL(a) - \dfrac{1}{|T(a)|} \sum_{b \in T(a)} FL(b) \right) & \text{otherwise,} \end{cases}$$

$$FL(a) = f(a) \cdot \ln\bigl(2 + \min(\mathrm{LLR}(a))\bigr),$$

$$\text{N-value}(a) = \sum_{b \in C_a} f_a(b) \cdot \frac{|T(b)|}{n},$$

where:
1. |a| denotes the length in words of candidate term a.
2. f(a) is the number of occurrences of a.
3. T(a) denotes the set of longer candidate terms in which a appears.
4. |T(a)| is the cardinality of the set T(a).
5. C_a denotes the set of distinct context words of a.
6. f_a(b) is the number of times b occurs in the context of a.
7. n is the total number of terms considered.
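
The definitions above translate into a few small functions. This is a sketch under stated assumptions rather than the authors' code: the function and parameter names are hypothetical, and `min_llr` is taken to be a precomputed value of min(LLR(a)), since the slide does not spell out which LLR values the minimum ranges over.

```python
import math


def fl(freq, min_llr):
    # FL(a) = f(a) * ln(2 + min(LLR(a))); min_llr is assumed to be precomputed.
    return freq * math.log(2.0 + min_llr)


def lc_value(length, fl_a, fl_of_longer_terms):
    """LC-value(a).

    length:             |a|, the number of words in a
    fl_a:               FL(a)
    fl_of_longer_terms: [FL(b) for b in T(a)]; empty when a is not nested
    """
    if not fl_of_longer_terms:
        return math.log2(length) * fl_a
    mean_longer = sum(fl_of_longer_terms) / len(fl_of_longer_terms)
    return math.log2(length) * (fl_a - mean_longer)


def n_value(context_counts, terms_per_context_word, n_terms):
    """N-value(a) = sum over b in C_a of f_a(b) * |T(b)| / n.

    context_counts:         {b: f_a(b)} for each context word b of a
    terms_per_context_word: {b: |T(b)|}
    n_terms:                n, the total number of terms considered
    """
    return sum(
        count * terms_per_context_word[b] / n_terms
        for b, count in context_counts.items()
    )


def nlc_value(lc, n, w_lc=0.8, w_n=0.2):
    # Equation (1): weighted combination of LC-value and N-value.
    return w_lc * lc + w_n * n
```

For instance, a two-word nested candidate with FL(a) = 6 whose two longer terms have FL values 2 and 4 gets LC-value 1 · (6 − 3) = 3. With an N-value of 0.5, its NLC-value is 0.8 · 3 + 0.2 · 0.5 = 2.5.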

The Corpus

Specialized-domain corpora for Arabic are lacking.
The corpus we built contains 1,666 files comprising 53,569 distinct tokens (excluding stop words), extracted from the website “Al-Khat Alakhdar”.
The corpus covers various environmental topics such as pollution, water purification, soil degradation, forest preservation, climate change and natural disasters.

The Evaluation

1. We computed the association scores (LLR, C-value, NC-value, NTC-value, LLR+C-value, NLC-value) for the MWT candidates.
2. From the ranking produced by each statistical measure, we retained the k best candidates, with k ranging from 100 to 300 in steps of 100.
3. We automatically built a reference list of all Arabic MWTs available in the latest version of the AGROVOC thesaurus.
4. We also used translations of the MWTs and the European terminological database IATE.
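
The protocol above amounts to computing precision over the k best candidates of each ranking. A minimal sketch follows; the helper names and the toy scores and reference set are invented for illustration and are not data from the study.

```python
def rank_candidates(scores):
    """Return candidates sorted by decreasing score.

    scores: {candidate: association score}
    """
    return sorted(scores, key=scores.get, reverse=True)


def precision_at_k(ranked_candidates, reference_terms, k):
    # Fraction of the k best candidates that appear in the reference list.
    top_k = ranked_candidates[:k]
    hits = sum(1 for term in top_k if term in reference_terms)
    return hits / k


# Toy data (hypothetical): four candidates scored by some measure.
toy_scores = {"air pollution": 9.1, "water stocks": 7.4, "last year": 6.0, "barrel of oil": 2.5}
toy_reference = {"air pollution", "water stocks", "barrel of oil"}

ranking = rank_candidates(toy_scores)
p_at_2 = precision_at_k(ranking, toy_reference, 2)
p_at_4 = precision_at_k(ranking, toy_reference, 4)
```

In the study the same computation would be repeated for each of the six measures at k = 100, 200 and 300.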

Obtained results

Table 1: Results obtained for different statistical measures

                        Top MWTs considered
Statistical measure     100       200       300
LLR                     75.0%     70.5%     64.3%
C-value                 71.0%     69.0%     67.3%
NC-value                74.0%     70.0%     68.3%
NTC-value               80.0%     71.5%     69.7%
LLR+C-value             73.0%     72.0%     68.3%
NLC-value               82.0%     75.5%     73.0%

Table 2: Number of terms found in AGROVOC for each measure

                        Top MWTs considered
Statistical measure     100       200       300
LLR                     35        60        80
C-value                 27        59        82
NC-value                32        62        82
NTC-value               35        60        83
LLR+C-value             34        60        84
NLC-value               41        65        86

Table 3: Number of terms found in IATE for each measure

                        Top MWTs considered
Statistical measure     100       200       300
LLR                     40        81        113
C-value                 44        79        120
NC-value                42        78        123
NTC-value               45        83        126
LLR+C-value             39        84        121
NLC-value               41        86        133
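
The three tables appear to be linked: for every measure and every k, the Table 1 percentage equals the AGROVOC count plus the IATE count, divided by k. The slides do not state this relationship, so it is an observation from the numbers. The short check below recomputes Table 1 from Tables 2 and 3; the variable and function names are chosen for this example.

```python
K_VALUES = (100, 200, 300)

# Counts copied from Table 2 (AGROVOC) and Table 3 (IATE).
AGROVOC_HITS = {
    "LLR": (35, 60, 80),
    "C-value": (27, 59, 82),
    "NC-value": (32, 62, 82),
    "NTC-value": (35, 60, 83),
    "LLR+C-value": (34, 60, 84),
    "NLC-value": (41, 65, 86),
}
IATE_HITS = {
    "LLR": (40, 81, 113),
    "C-value": (44, 79, 120),
    "NC-value": (42, 78, 123),
    "NTC-value": (45, 83, 126),
    "LLR+C-value": (39, 84, 121),
    "NLC-value": (41, 86, 133),
}


def combined_precision(measure):
    # Precision in percent, assuming no term is counted in both resources.
    return tuple(
        round(100 * (agrovoc + iate) / k, 1)
        for agrovoc, iate, k in zip(AGROVOC_HITS[measure], IATE_HITS[measure], K_VALUES)
    )


recomputed_table_1 = {measure: combined_precision(measure) for measure in AGROVOC_HITS}
```

For example, NLC-value at k = 300 gives (86 + 133) / 300 = 73.0%, matching Table 1.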

Figure 2: Precision obtained for different statistical measures that combine termhood and unithood
Figure 3: Precision obtained for the C/NC-value and the NTC-value
Figure 4: Precision obtained for the LLR and the C/NC-value

Conclusion and perspectives

Conclusion
1. A hybrid method for Arabic MWT acquisition that takes advantage of existing linguistic and statistical approaches.
2. A novel statistical measure, NLC-value, for ranking MWT candidates.
3. Experiments were performed for bi-grams and tri-grams on an Arabic environmental corpus.

Perspectives
1. Validate the proposed statistical measure on other languages.
2. Use the extracted MWTs for document indexing and retrieval in IR systems.

We thank the reviewers for their useful comments (the results presented here are based on their remarks).
.
Bibliography

Boulaknadel S., Daille B., and Aboutajdine D. 2008a. Multi-word term indexing for Arabic document retrieval. In Proceedings of the IEEE Symposium on Computers and Communications, pp. 869-873.
Dunning T. 1994. Accurate Methods for the Statistics of Surprise and Coincidence. Computational Linguistics, volume 19, pp. 61-74.
Frantzi K. T., Ananiadou S., and Tsujii T. 1998. The C-Value/NC-Value Method of Automatic Recognition for Multi-word Terms. Journal on Research and Advanced Technology for Digital Libraries, pp. 115-130.
Kageura K. and Umino B. 1996. Methods of Automatic Term Recognition: A Review. Terminology, volume 3.
Vu T., Aw A. Ti, and Zhang M. 2008. Term Extraction Through Unithood and Termhood Unification. In Proceedings of IJCNLP.

CAKE: Sharing Slices of Confidential Data on Blockchain
Claudio Di Ciccio
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 

A Study of Association Measures and their Combination for Arabic MWT Extraction

  • 1. Introduction The state of MWT extraction Proposed Method Evaluation and Results Conclusion and perspectives Bibliography . A Study of Association Measures and their Combination for Arabic MWT Extraction . 10th International Conference on Terminology and Artificial Intelligence (TIA’2013) Abdelkader El Mahdaouy, Said El Alaoui Ouatik and Eric Gaussier October 28th, 2013 A. El Mahdaouy, S.O El Alaoui and E. Gaussier Arabic MWT Extraction 1 / 20
  • 2. Table of contents: 1 Introduction (Terminology Extraction; Motivation). 2 The state of MWT extraction (Standard Approaches; Statistical Measures). 3 Proposed Method (Linguistic Filter; Statistical Filter). 4 Evaluation and Results (Corpus; Evaluation Method; Obtained results). 5 Conclusion and perspectives. 6 Bibliography.
  • 3. Terminology Extraction. Terminology: the set of terms representing the system of concepts of a particular subject field. Term: a lexical unit that has an unambiguous meaning when used in a text of a specific domain; it refers to a defined concept ... (ISO 704). Terminology extraction: a subtask of information extraction that automatically extracts relevant terms from a given corpus.
  • 5. Motivation. The bag-of-words model (based on single-word terms) is a simplifying representation used in natural language processing and information retrieval (IR). Multi-word terms (MWTs) are less ambiguous and less polysemous than single-word terms. Using MWTs instead of single-word terms yields a better representation of document content.
  • 8. Standard Approaches. Linguistic approaches: based on linguistic pre-processing and POS tagging; candidate terms are extracted using syntactic patterns. Statistical approaches: rank candidate terms with a measure that gives higher scores to "good" candidate terms; frequent expressions are assumed to represent important concepts. Hybrid approaches: combine linguistic and statistical techniques to extract MWTs, in order to avoid the weaknesses of each approach taken alone.
  • 9. Characteristics of MWTs, as defined by Kageura and Umino (1996). Unithood: the degree of strength or stability of syntagmatic combinations or collocations; measured by the Log-Likelihood Ratio (LLR), T-Score, Mutual Information (MI), etc. Termhood: the degree to which a linguistic unit is related to a specific domain concept; measured by the C/NC-value.
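As an illustration of a unithood measure, the following is a minimal sketch of the log-likelihood ratio for a two-word candidate, computed from a 2x2 contingency table of co-occurrence counts. The function name `llr` and the argument layout are assumptions made for this example; the slides do not show how the authors implemented the measure.

```python
import math


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table.

    k11: occurrences of the bigram (w1 followed by w2)
    k12: w1 followed by a word other than w2
    k21: a word other than w1 followed by w2
    k22: bigrams containing neither w1 first nor w2 second
    """
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    score = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = rows[i] * cols[j] / total
                score += o * math.log(o / expected)
    return 2.0 * score
```

A score of zero means the two words co-occur exactly as often as independence would predict; larger values indicate a stronger association.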
  • 11. Proposed Method. The hybrid method consists of two filters. Linguistic filter: uses AMIRA 2.0 (a POS tagging toolkit), extracts MWT candidates based on syntactic patterns, and handles the problem of MWT variation. Statistical filter: proposes a novel statistical measure (NLC-value) that combines context information with termhood and unithood, and evaluates state-of-the-art statistical measures.
  • 13. Linguistic Filter. The proposed linguistic filter extracts candidate MWTs using two core components: the POS tagger and the sequence identifier. Syntactic patterns: (Noun + (Noun|Adj) + |(Noun|Adj) + |(Noun|Adj)); Noun Prep Noun. Figure 1: The global schema of the linguistic filter.
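The sequence identifier can be pictured as a pattern matcher running over the POS tag sequence. The sketch below is only an illustration under stated assumptions: it uses a simplified reading of the patterns (a noun followed by one or more nouns or adjectives, or Noun Prep Noun), generic tag names rather than AMIRA's actual tagset, and English glosses in place of Arabic tokens. The names `TAG_CODES`, `PATTERN` and `extract_candidates` are hypothetical.

```python
import re

# Map each (assumed) POS tag to one character so that a regular
# expression can run over the whole tag sequence.
TAG_CODES = {"Noun": "N", "Adj": "A", "Prep": "P"}

# Simplified reading of the patterns: Noun Prep Noun, or a noun
# followed by one or more nouns/adjectives.
PATTERN = re.compile(r"N(?:PN|[NA]+)")


def extract_candidates(tagged):
    """Return candidate MWTs from a list of (word, tag) pairs."""
    codes = "".join(TAG_CODES.get(tag, "x") for _, tag in tagged)
    return [
        tuple(word for word, _ in tagged[m.start():m.end()])
        for m in PATTERN.finditer(codes)
    ]


sample = [
    ("pollution", "Noun"), ("of", "Prep"), ("water", "Noun"),
    ("is", "Verb"),
    ("air", "Noun"), ("pollution", "Noun"), ("global", "Adj"),
]
candidates = extract_candidates(sample)
```

Because `finditer` returns non-overlapping, greedy matches, this sketch yields only maximal sequences; enumerating nested candidates, which the statistical filter relies on, would need an extra pass.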
  • 14. Term variation. Four types of variation are handled: graphical variants, inflectional variants, morpho-syntactic variants and syntactic variants. Graphical variants: concern orthographic errors that occur when writing certain letters ("[...]", "[...]" and "[...]"). Example: [...] which leads to [...], meaning "Biodiversity".
  • 16. Term variation. Inflectional variants: these variants are due to the use of different forms of the words constituting an MWT: gender and number, and the presence or absence of a definite article. Examples: 1. [...] (ocean pollution) which leads to [...] (pollution of the oceans). 2. [...] (water pollution) which leads to [...] (the water pollution).
  • 18. Term variation. Morpho-syntactic variants: these variants affect the internal structure of the term, as the words it contains are related through derivational morphology: Noun1 Noun2 ⇔ Noun1 Adj; Noun1 Adj ⇔ Noun1 Prep Noun. Examples: 1. [...] and [...] (air pollution). 2. [...] which leads to [...] (barrel of oil).
  • 20. Term variation. Syntactic variants: these variants modify the structure of the MWT candidate by adding one or more words (such as adjectives) but do not affect the grammatical categories: Noun1 Noun2 ⇔ Noun1 Noun2 Adj; Noun1 Adj1 ⇔ Noun1 Adj1 Adj2. Examples: 1. [...] (Water stocks) and [...] (Groundwater stocks). 2. [...] (Health Organization) and [...] (World Health Organization).
  • 22. Statistical Filter: the NLC-value.
    $\text{NLC-value}(a) = 0.8 \cdot \text{LC-value}(a) + 0.2 \cdot \text{N-value}(a)$  (1)
    with
    $\text{LC-value}(a) = \begin{cases} \log_2(|a|) \cdot FL(a) & \text{if } a \text{ is not nested,} \\ \log_2(|a|) \cdot \left( FL(a) - \frac{1}{|T(a)|} \sum_{b \in T(a)} FL(b) \right) & \text{otherwise,} \end{cases}$
    $FL(a) = f(a) \cdot \ln(2 + \min(LLR(a)))$,
    $\text{N-value}(a) = \sum_{b \in C_a} f_a(b) \cdot \frac{|T(b)|}{n}$,
    where:
    1. |a| denotes the length in words of candidate term a.
    2. f(a) is the number of occurrences of a.
    3. T(a) denotes the set of longer candidate terms in which a appears.
    4. |T(a)| is the cardinality of the set T(a).
    5. C_a denotes the set of distinct context words of a.
    6. f_a(b) is the number of times b occurs in the context of a.
    7. n is the total number of terms considered.
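The formulas above can be sketched in a few lines of Python. This is an illustrative reading of equation (1), not the authors' code: the function names and the way inputs are passed (pre-computed frequencies, minimum LLR, and context counts) are assumptions made for this example.

```python
import math


def fl(freq, min_llr):
    """FL(a) = f(a) * ln(2 + min(LLR(a)))."""
    return freq * math.log(2 + min_llr)


def lc_value(length, fl_a, fl_longer):
    """LC-value of a candidate of `length` words.

    fl_longer holds the FL values of the longer candidates that
    contain the term; it is empty when the term is not nested.
    """
    base = math.log2(length)
    if not fl_longer:
        return base * fl_a
    return base * (fl_a - sum(fl_longer) / len(fl_longer))


def n_value(context_counts, terms_per_word, n_terms):
    """N-value(a) = sum over context words b of f_a(b) * |T(b)| / n.

    context_counts: {context word b: f_a(b)}
    terms_per_word: {context word b: |T(b)|}
    n_terms: total number of terms considered (n)
    """
    return sum(
        count * terms_per_word[word] / n_terms
        for word, count in context_counts.items()
    )


def nlc_value(lc, n):
    """NLC-value(a) = 0.8 * LC-value(a) + 0.2 * N-value(a)."""
    return 0.8 * lc + 0.2 * n
```

Note that a two-word candidate gets a length factor of log2(2) = 1, so for bigrams the LC-value reduces to FL(a) minus the average FL of the longer terms containing it.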
  • 23. The Corpus. Specialized-domain corpora are lacking for Arabic. The corpus we built contains 1666 files comprising 53569 distinct tokens (stop words excluded), extracted from the Web site "Al-Khat Alakhdar". It covers various environmental topics such as pollution, water purification, soil degradation, forest preservation, climate change and natural disasters.
  • 24. The Evaluation.
    1. We computed the association scores (LLR, C-value, NC-value, NTC-value, LLR+C-value, NLC-value) for the MWT candidates.
    2. From the ranking produced by each statistical measure, we retained the k best candidates, with k ranging from 100 to 300 at intervals of 100.
    3. We automatically built a reference list of all Arabic MWTs available in the latest version of the AGROVOC thesaurus.
    4. We also used translation of the MWTs and the European terminological database IATE.
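The k-best evaluation in the steps above can be sketched as follows. This is a minimal illustration that assumes a candidate counts as correct when it appears in a reference set; the slides do not state exactly how correctness was judged for the precision figures, and the names `precision_at_k` and `found_in_reference` are hypothetical.

```python
def found_in_reference(ranked, reference, k):
    """Count how many of the k best-ranked candidates are in the reference set."""
    return sum(1 for term in ranked[:k] if term in reference)


def precision_at_k(ranked, reference, k):
    """Fraction of the k best-ranked candidates that are in the reference set."""
    top = ranked[:k]
    if not top:
        return 0.0
    return found_in_reference(ranked, reference, k) / len(top)


# Toy data with English glosses; a real run would use k = 100, 200, 300.
ranked = ["water pollution", "last year", "air pollution", "climate change"]
reference = {"water pollution", "air pollution", "climate change"}
scores = {k: precision_at_k(ranked, reference, k) for k in (1, 2, 4)}
```

Running the same function over the ranking produced by each measure gives one row of a results table per measure.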
  • 26. Obtained results.
    Table 1: Results obtained for different statistical measures
    Statistical measure    Top MWTs considered
                           100      200      300
    LLR                    75.0%    70.5%    64.3%
    C-value                71.0%    69.0%    67.3%
    NC-value               74.0%    70.0%    68.3%
    NTC-value              80.0%    71.5%    69.7%
    LLR+C-value            73.0%    72.0%    68.3%
    NLC-value              82.0%    75.5%    73.0%

    Table 2: Number of terms found in AGROVOC for each measure
    Statistical measure    Top MWTs considered
                           100      200      300
    LLR                    35       60       80
    C-value                27       59       82
    NC-value               32       62       82
    NTC-value              35       60       83
    LLR+C-value            34       60       84
    NLC-value              41       65       86

    Table 3: Number of terms found in IATE for each measure
    Statistical measure    Top MWTs considered
                           100      200      300
    LLR                    40       81       113
    C-value                44       79       120
    NC-value               42       78       123
    NTC-value              45       83       126
    LLR+C-value            39       84       121
    NLC-value              41       86       133
  • 28. Obtained results. Figure 2: Precision obtained for different statistical measures that combine termhood and unithood. Figure 3: Precision obtained for the C/NC-value and the NTC-value. Figure 4: Precision obtained for the LLR and the C/NC-value.
  • 29. Conclusion and perspectives. Conclusion: 1. A hybrid method for Arabic MWT acquisition that takes advantage of existing linguistic and statistical approaches. 2. A novel statistical measure, NLC-value, for ranking MWT candidates. 3. Experiments were performed on bi-grams and tri-grams from an Arabic environmental corpus. Perspectives: 1. Validate the proposed statistical measure in other languages. 2. Use the extracted MWTs for document indexing and retrieval in IR systems. We thank the reviewers for their useful comments (the results presented here are based on their remarks).
  • 32. Bibliography.
    Boulaknadel S., Daille B., and Aboutajdine D. 2008a. Multi-word term indexing for Arabic document retrieval. In Proceedings of the IEEE Symposium on Computers and Communications, pp. 869-873.
    Dunning T. 1994. Accurate Methods for the Statistics of Surprise and Coincidence. Computational Linguistics, volume 19, pp. 61-74.
    Frantzi K. T., Ananiadou S., and Tsujii T. 1998. The C-Value/NC-Value Method of Automatic Recognition for Multi-word Terms. Journal on Research and Advanced Technology for Digital Libraries, pp. 115-130.
    Kageura K. and Umino B. 1996. Methods of Automatic Term Recognition: A Review. Terminology, volume 3.
    Vu T., Aw A. Ti, and Zhang M. 2008. Term Extraction Through Unithood and Termhood Unification. In Proceedings of IJCNLP.