This document describes a study on extracting multi-word terms (MWTs) in Arabic texts. It presents a hybrid method using linguistic and statistical filters. The linguistic filter uses part-of-speech tagging to extract candidate MWTs based on syntactic patterns. The statistical filter ranks candidates using the proposed NLC-value measure, which combines context information with termhood and unithood. The method is evaluated on an Arabic environmental corpus and compared to other statistical measures. Results show the NLC-value outperforms other measures in extracting relevant MWTs.
This document provides a table of contents and overview of various Spanish grammar topics including: nationalities, stem changers, indirect object pronouns, gustar, affirmative and negative words, superlatives, affirmative tu commands, irregular verbs, negative tu commands, reflexives, and sequencing events. It discusses rules, examples, and exceptions for each topic in 1-3 sentences per section.
This document provides instructions for creating a Prezi account and basic lesson on using Prezi presentation software. It outlines how to:
1) Access the Prezi website and create a free public account by entering registration information.
2) Open an existing Prezi account by logging in with email and password credentials.
3) Start a new presentation by selecting a free template and customize it by changing templates, writing and formatting text, and editing the presentation path.
4) Add new frames, insert pictures, shapes, videos and control animations through the various tools in the Prezi interface.
5) Save work by downloading as a PDF or exiting and closing the presentation.
This document outlines branding and promotional activities for Samsung in both rural and urban areas of India. It discusses van-based activities and road shows in rural areas as well as urban promotions using sign media. The activities are being carried out by Crystal Sign Media Pvt. Ltd. based in Noida, India on behalf of Samsung.
Marketing and how ATL and BTL activity relate to marketing MC presentation Syed Salman
This document discusses Above the Line (ATL) and Below the Line (BTL) marketing activities and how they relate to marketing. ATL activities include television, radio, print media and billboards which allow mass communication. BTL involves more targeted and personal approaches like direct mail, events, promotions, and product demos to directly reach consumers. Both ATL and BTL techniques are used to effectively target customers depending on the product.
This document outlines a brand promotion project for retailers and active retail outlet owners in metro and mini-metro areas. It proposes various in-store activation ideas like puzzles, spinning wheels, and quizzes to build brand awareness, interest, and demand among retailers. It also suggests outdoor activities like processions, installations, and wall paintings to generate brand recognition among customers. The goal is to engage retailers through interactive learning activities and establish brand loyalty among customers through demonstrations and visibility campaigns.
Green Flag Branding Solutions focuses on brand objectives in the experiential marketing domain, where customer engagement is the top priority. Through this medium, a strong and influential BRAND CONNECTION can be established, societal messages can be delivered, and direct communication with the target audience becomes effective.
We are committed to providing every solution related to Marketing, BRAND Building and Event Services for better BRAND Interaction and Experience.
- Promotion
- Merchandise
- Rural Marketing
- Direct marketing
- Awareness Campaign
- In-shop Activation
- Lead Generation Program
- Registration Management
- Road Shows
- Activations in Malls, Residential Societies
For any queries, write to us at: contactus@greenflagdeals.com or events@greenflagdeals.com.
To connect directly, call us at: 9986366664, 9008266664. Visit us at: www.greenflag.in
The document outlines the key elements of an effective marketing plan, including an executive summary, situation analysis, objectives, strategies, tactics, and budget. It provides examples of each element. The executive summary should briefly summarize the circumstances and recommendations. The situation analysis describes the company's current position. The objectives state where the company wants to be. The strategies are how the objectives will be achieved and tactics are specific actions that implement the strategies. The budget covers the costs.
MULTI-WORD TERM EXTRACTION BASED ON NEW HYBRID APPROACH FOR ARABIC LANGUAGE csandit
Arabic Multiword Terms (AMWTs) are relevant strings of words in text documents. Once they are
automatically extracted, they can be used to improve the performance of text mining
applications such as Categorisation, Clustering, Information Retrieval, Machine
Translation, and Summarization. This paper introduces our proposed multiword term
extraction system based on contextual information. We propose a new method based on
a hybrid approach for Arabic multiword term extraction. Like other hybrid methods,
ours is composed of two main steps: a Linguistic approach and a Statistical one.
In the first step, the Linguistic approach uses a Part Of Speech (POS) Tagger
(Taani’s Tagger) and the Sequence Identifier as patterns in order to extract the candidate
AMWTs. In the second step, which includes our main contribution, the Statistical approach
incorporates the contextual information by using a newly proposed association measure based on
Termhood and Unithood. To evaluate the efficiency of our proposed method for AMWT
extraction, it has been tested and compared using three different association measures:
the proposed one, named NTC-Value, as well as NC-Value and C-Value. The
experimental results, using Arabic texts taken from the environment domain, show that our
hybrid method outperforms the others in terms of precision and that it can deal correctly
with tri-gram Arabic multiword terms.
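The abstract above names C-Value as one of the baseline measures but does not define it. As a rough illustration only, the sketch below follows the commonly cited C-value formulation: a candidate's frequency is weighted by the log of its length, and the average frequency of the longer candidates that contain it is subtracted first. The function names and the toy English frequencies are assumptions made for the example; this is not the paper's NTC-Value, which the abstract does not specify.

```python
import math
from collections import defaultdict


def is_nested(shorter, longer):
    """Return True if `shorter` occurs as a contiguous sub-sequence of `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(freq):
    """Score candidates given a dict mapping term (tuple of words) -> frequency.

    Only multi-word candidates are expected, so log2(len(term)) is positive.
    """
    # For every candidate, collect the longer candidates that contain it.
    containers = defaultdict(list)
    for longer in freq:
        for shorter in freq:
            if len(shorter) < len(longer) and is_nested(shorter, longer):
                containers[shorter].append(longer)

    scores = {}
    for term, f in freq.items():
        nest = containers.get(term)
        if nest:
            # Discount occurrences that are explained by longer terms.
            f = f - sum(freq[b] for b in nest) / len(nest)
        scores[term] = math.log2(len(term)) * f
    return scores


# Toy frequencies (illustrative, not taken from any corpus).
toy_freq = {
    ("soil", "pollution"): 10,
    ("soil", "pollution", "control"): 4,
    ("water", "treatment"): 6,
}
toy_scores = c_value(toy_freq)
```

In this toy input, the bi-gram that also appears inside a tri-gram is discounted by the tri-gram's frequency, while the unnested bi-gram keeps its raw frequency.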
The document describes a methodology for obtaining statistical machine translation (SMT) dictionaries for related languages without parallel data. The methodology uses cognate detection over comparable corpora to generate n-best lists of possible cognate translations, which are then ranked using a machine learning model trained on parallel data. Evaluation on Romance and Slavic language pairs shows the approach reduces out-of-vocabulary words for SMT by around 10% and yields small but not statistically significant BLEU score improvements over the baseline. Future work to improve the methodology is also discussed.
A novel method for Arabic multi-word term extraction ijdms
Arabic Multiword Terms (AMWTs) are relevant strings of words in text documents. Once they are
automatically extracted, they can be used to improve the performance of Arabic text mining
applications such as Categorization, Clustering, Information Retrieval, Machine Translation, and
Summarization. The methods proposed for AMWT extraction fall into three
approaches: linguistic-based, statistic-based, and hybrid. These methods have
drawbacks that limit their use: they can only deal with bi-gram terms and they do not yield good
accuracy. In this paper, to overcome these drawbacks, we propose a new and efficient method for
AMWT extraction based on a hybrid approach. It is composed of two main filtering steps: the
Linguistic filter and the Statistical one. The Linguistic filter uses our proposed Part Of Speech (POS)
Tagger and the Sequence Identifier as patterns in order to extract candidate AMWTs. The Statistical
filter incorporates the contextual information and a newly proposed association measure based on Termhood
and Unithood estimation, named NTC-Value.
To evaluate and illustrate the efficiency of our proposed method for AMWT extraction, a comparative
study has been conducted on the Kalimat Corpus using nine experiment schemes. In the linguistic
filter, we used three POS taggers: Taani’s rule-based method, an HMM-based
statistical method, and our recently proposed hybrid tagger. In the statistical
filter, we used three statistical measures: C-Value, NC-Value, and our proposed NTC-Value. The
results obtained demonstrate the efficiency of our proposed method for AMWT extraction: it outperforms
the others and can deal correctly with tri-gram terms.
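The linguistic filter described above can be pictured as matching POS-tag sequences over already-tagged text. The sketch below is a minimal illustration under stated assumptions: the tag names, the three patterns, and the sample sentence are hypothetical, and the tagger itself is not modelled (the input is assumed to be tagged already).

```python
# Hypothetical syntactic patterns; the papers' actual pattern set is not given here.
PATTERNS = {
    ("NOUN", "NOUN"),
    ("NOUN", "ADJ"),
    ("NOUN", "PREP", "NOUN"),
}


def extract_candidates(tagged, patterns=PATTERNS):
    """Return word n-grams whose tag sequence matches one of `patterns`.

    `tagged` is a list of (word, tag) pairs produced by some POS tagger.
    Shorter patterns are scanned first, then longer ones.
    """
    candidates = []
    for n in sorted({len(p) for p in patterns}):
        for i in range(len(tagged) - n + 1):
            window = tagged[i:i + n]
            if tuple(tag for _, tag in window) in patterns:
                candidates.append(tuple(word for word, _ in window))
    return candidates


# Illustrative pre-tagged sentence: "air pollution in the big cities".
sample = [
    ("تلوث", "NOUN"),
    ("الهواء", "NOUN"),
    ("في", "PREP"),
    ("المدن", "NOUN"),
    ("الكبيرة", "ADJ"),
]
found = extract_candidates(sample)
```

Candidates found this way would then be passed to the statistical filter for ranking; including a three-tag pattern is what lets a filter of this kind propose tri-gram candidates as well as bi-grams.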
COMPARATIVE ANALYSIS OF ARABIC STEMMING ALGORITHMS IJMIT JOURNAL
Arabic stemming algorithms have become a major research area within
Information Retrieval. Many researchers have developed algorithms to solve the problem of stemming.
Each researcher proposed his own methodology and measurements to test the performance and compute
the accuracy of his algorithm, so accurate comparisons between these algorithms are difficult to make.
Many generic conflation techniques and stemming algorithms are theoretically analyzed in this paper.
Then, the main characteristics of the Arabic language that need to be mentioned before discussing
Arabic stemmers are summarized. The evaluation of the algorithms in this paper shows that Arabic
stemming is still one of the major challenges in information retrieval. This paper aims to compare
most of the commonly used light stemmers in terms of affix lists, algorithms, main ideas, and
information retrieval performance. The results show that the light10 stemmer outperformed the other
stemmers. Finally, recommendations are presented for future research on the development of a standard Arabic
stemmer.
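To make "light stemming" concrete: a light stemmer strips a small set of frequent prefixes and suffixes without attempting to find the root. The sketch below is a simplified illustration only. Its affix lists and minimum-length thresholds are assumptions chosen for the example, and it should not be read as a faithful implementation of light10 or of any stemmer compared in the paper.

```python
# Illustrative affix lists (assumed for this sketch, not an exact published list).
ARTICLES = ["ال", "بال", "كال", "فال", "لل"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]


def light_stem(word):
    """Strip a leading conjunction, a definite article, and common suffixes."""
    # Remove the conjunction "و" only if at least 3 characters remain.
    if word.startswith("و") and len(word) - 1 >= 3:
        word = word[1:]
    # Remove at most one definite-article form if at least 2 characters remain.
    for article in ARTICLES:
        if word.startswith(article) and len(word) - len(article) >= 2:
            word = word[len(article):]
            break
    # Go through the suffix list once, removing any suffix that still leaves
    # at least 2 characters.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            word = word[:-len(suffix)]
    return word


examples = ["والكتاب", "المعلمون", "ولد"]
stems = [light_stem(w) for w in examples]
```

The length checks are what keep short words such as "ولد" intact even though they begin with a letter that can also be a prefix.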
In my latest study I examined one of the modern marketing strategies, namely:
Blue Ocean Strategy: a modern type of strategy discussed in contemporary strategic management books, based on an idea from two scholars (Professor W. Chan Kim and his colleague Professor Renee Mauborgne). The strategy rests on the idea that an organization seeking success over its working life does not need to hold a strong competitive position; it can instead succeed without competition. It does so by adopting new markets in which to offer its new products, or by offering substitute goods that do not attract competitors. In this way the organization can earn abundant profits, and through its strategic intelligence and entrepreneurship it can attract new customers and consumers and make customers more loyal to its products and services.
#استراتيجية_المحيط_الأزرق
#Blue_ocean_strategy
Customer Opinions Evaluation: A Case Study on Arabic Tweets gerogepatton
This paper presents an automatic method for extracting, processing, and analyzing customer opinions
on Arabic social media. We present a four-step approach for mining Arabic tweets. First, Natural
Language Processing (NLP) with different types of analysis is performed. Second, we present an
automatic and expandable lexicon of Arabic adjectives. The initial lexicon is built using 1350 adjectives
as seeds, obtained by processing different Arabic-language datasets. The lexicon is automatically expanded
by collecting synonyms and morphemes of each word through Arabic resources and Google Translate.
Third, emotional analysis is carried out by two different methods: Machine Learning (ML) and a
rule-based method. Finally, Feature Selection (FS) is applied to enhance the mining results. The
experimental results reveal that the proposed method outperforms its counterparts with an improvement
margin of up to 4% in F-Measure.
Qualitative research uses words rather than numbers to understand phenomena through interviews, observations and documents. It is useful when little is known about a condition or environment. Some key characteristics of qualitative research include studying things in their natural settings, using the researcher as the instrument of data collection, collecting multiple sources of data, and analyzing data inductively to identify themes. Mixed-methods research combines qualitative and quantitative approaches by collecting and analyzing both types of data sequentially or concurrently.
SENTIMENT ANALYSIS FOR MODERN STANDARD ARABIC AND COLLOQUIAL ijnlc
The rise of social media such as blogs and social networks has fueled interest in sentiment analysis. With the proliferation of reviews, ratings, recommendations and other forms of online expression, online opinion has turned into a kind of virtual currency for businesses looking to market their products, identify new opportunities and manage their reputations; many are therefore now looking to the field of sentiment analysis. In this paper, we present a feature-based, sentence-level approach for Arabic sentiment analysis. Our approach uses a lexicon of Arabic idioms and sayings as a key resource for improving the detection of sentiment polarity in Arabic sentences, together with a novel and rich set of linguistically motivated features (contextual intensifiers, contextual shifters and negation handling) and syntactic features for conflicting phrases, which enhance the sentiment classification accuracy. Furthermore, we introduce an automatically expandable, wide-coverage polarity lexicon of Arabic sentiment words. The lexicon is built from gold-standard sentiment words used as seeds, which are manually collected and annotated; it expands and detects the sentiment orientation of new sentiment words automatically using a synset aggregation technique and free online Arabic lexicons and thesauruses. Our data focus on Modern Standard Arabic (MSA) and Egyptian dialectal Arabic tweets and microblogs (hotel reservations, product reviews, etc.). The experimental results using our resources and techniques with an SVM classifier indicate high performance levels, with accuracies of over 95%.
“Towards Multi-Step Expert Advice for Cognitive Computing” - Dr. Achim Rettin... diannepatricia
Dr. Achim Rettinger from Karlsruhe Institute of Technology presented this today as part of the Cognitive Systems Institute Speaker Series on October 13, 2016
Analysis of Feature Models using Alloy - A survey Anjali Sreekumar
This is a presentation of the work on a survey related to Feature model analysis using Alloy presented during the FMSPLE workshop at Eindhoven University of Technology on 3rd April 2016.
This document summarizes the process of quality assessment for individual studies in systematic reviews. It describes defining quality as protection against bias, the reasons for quality assessment including interpreting results and grading evidence strength. The key steps are classifying study design, applying predefined criteria considering biases, arriving at a quality rating of good, fair or poor, and transparently reporting the quality assessment process and ratings.
A New Concept Extraction Method for Ontology Construction From Arabic Text CSCJournals
Ontology is one of the most popular models used for knowledge representation, sharing and reuse. The Arabic language has complex morphological, grammatical, and semantic aspects, and this complexity makes automatic Arabic terminology extraction difficult. In addition, concept extraction from Arabic documents has been a challenging research area because, as opposed to term extraction, concept extraction is more domain related and more selective. In this paper, we present a new concept extraction method for Arabic ontology construction, which is part of our ontology construction framework. A new method to extract domain-relevant single and multi-word concepts has been proposed, implemented and evaluated. Our method combines linguistic information, statistical information and domain knowledge. It first uses linguistic patterns based on POS tags to extract concept candidates, and then applies a stop-word filter to remove unwanted strings. To determine the relevance of these candidates within the domain, different statistical measures and a new domain relevance measure are implemented for the first time for the Arabic language. To enhance the performance of concept extraction, domain knowledge is integrated into the module. The concept scores are calculated according to their statistical values and domain knowledge values. To evaluate the performance of the method, precision scores were calculated. The results show the high effectiveness of the proposed approach in extracting concepts for Arabic ontology construction.
This document summarizes a presentation on research assessment in the UK. It outlines the Research Assessment Exercise (RAE) process and its impact on researcher behavior. It then discusses the transition to the new Research Excellence Framework (REF), which will place greater emphasis on citations, impact, and environment. The presentation notes that the RAE and REF influence what researchers study and how they disseminate their work, and that behaviors will continue adjusting in response to assessment changes.
Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov... Lifeng (Aaron) Han
Since the 1950s, Machine Translation (MT) has been approached through different scientific solutions, from rule-based methods, example-based and statistical models (SMT), to hybrid models and, in very recent years, neural models (NMT).
While NMT has achieved a huge quality improvement over conventional methodologies, by taking advantage of the huge amount of parallel corpora available on the internet and of recently developed computational power at an acceptable cost, it still struggles to achieve real human parity in many domains and most language pairs, if not all of them.
Along the long road of MT research and development, quality evaluation metrics have played very important roles in MT advancement and evolution.
In this tutorial, we overview the traditional human judgement criteria, automatic evaluation metrics, unsupervised quality estimation models, as well as the meta-evaluation of the evaluation methods. We also cover very recent work in MT evaluation (MTE) that takes advantage of large pre-trained language models to customise automatic metrics towards the exact language pairs and domains deployed. In addition, we introduce statistical confidence estimation of the sample size needed for human evaluation in practice.
Gür, hamurcu, eren 2016 - selection of academic conferences based on analyt... Quang Jimmy
This document discusses the selection of academic conferences using the Analytic Network Process (ANP). It begins with an abstract that outlines the importance of academic conferences and factors considered in selecting them, such as registration fees, subject matter, and deadlines. It then reviews literature on criteria used by academics to select conferences. These include location, costs, and opportunities.
The document goes on to describe the ANP method and its use in various decision-making problems. It then presents a case study using ANP to select among six conferences based on criteria like costs, time, location, and the conferences themselves. Data from academics was used to build an ANP network model relating the criteria. Pairwise comparisons and supermatrix calculations were
Quality Assurance of NAO Value for Money Studies.doc NeerajOjha17
The NAO asks external academics to review its VFM reports to benefit from independent scrutiny and ensure technical rigor. Initially all published reports were reviewed, but since 2006 reviews are conducted on draft reports. Two universities evaluate reports on criteria like scope, analysis, conclusions, and provide scores from 1-5. The review process was changed to pre-publication to enhance report quality using external feedback. Reviewers evaluate drafts against criteria like scope, context, analysis, value for money, and evidence to provide moderated, consistent scoring and comments to the NAO.
UiPath Test Automation using UiPath Test Suite series, part 6 DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
تناولت في دراستي الأخيرة احدى استراتيجيات التسويق الحديثة وهي :
استراتيجية المحيط الأزرق: نوع من الاستراتيجيات الحديثة التي تناولتها كتب الإدارة الاستراتيجية الحديثة والمعاصرة والتي استندت من فكرة العالمان (البروفسور دبليو شان كي ، وزميلته البروفسور رينية موبورن (Renee Mauborgne))، تقوم هذه الاستراتيجية على فكرة ، انه ليس من الضروري على المنظمة التي تريد تحقيق النجاح في مسيرة حياتها العملية ان تحتل مركزا تنافسيا قويا، بل يمكن ان تحرز نجاحا بدون منافسة، وذلك بان تتبنى هذه المنظمات اسواقاً جديدة تعرض فيها منتجاتها الجديدة، او تقوم بطرح بضائع وسلع بديلة لاتجذب المنافس اليها، وبهذا تستطيع المنظمة تحقيق ارباحا وفيرة، وبذكائها وريادتها الاستراتيجية تتسطيع ان تجذب زبائن ومستهلكين جدد، وان تجعل الزبون أكثر ولاء لمنتجاتها وخدماته
#استراتيجية_المحيط_الأزرق
#Blue_ocean_strategy
دراستي الأخيرة احدى استراتيجيات التسويق الحديثة وهي :
استراتيجية المحيط الأزرق: نوع من الاستراتيجيات الحديثة التي تناولتها كتب الإدارة الاستراتيجية الحديثة والمعاصرة والتي استندت من فكرة العالمان (البروفسور دبليو شان كي ، وزميلته البروفسور رينية موبورن (Renee Mauborgne))، تقوم هذه الاستراتيجية على فكرة ، انه ليس من الضروري على المنظمة التي تريد تحقيق النجاح في مسيرة حياتها العملية ان تحتل مركزا تنافسيا قويا، بل يمكن ان تحرز نجاحا بدون منافسة، وذلك بان تتبنى هذه المنظمات اسواقاً جديدة تعرض فيها منتجاتها الجديدة، او تقوم بطرح بضائع وسلع بديلة لاتجذب المنافس اليها، وبهذا تستطيع المنظمة تحقيق ارباحا وفيرة، وبذكائها وريادتها الاستراتيجية تتسطيع ان تجذب زبائن ومستهلكين جدد، وان تجعل الزبون أكثر ولاء لمنتجاتها وخدماته
#Blue_ocean_strategy
#استراتيجية_المحيط_الأزرق
Customer Opinions Evaluation: A Case Study on Arabic Tweets gerogepatton
This paper presents an automatic method for extracting, processing, and analysis of customer opinions
on Arabic social media. We present a four-step approach for mining of Arabic tweets. First, Natural
Language Processing (NLP) with different types of analyses had performed. Second, we present an
automatic and expandable lexicon for Arabic adjectives. The initial lexicon is built using 1350 adjectives
as seeds from processing of different datasets in Arabic language. The lexicon is automatically expanded
by collecting synonyms and morphemes of each word through Arabic resources and google translate.
Third, emotional analysis was considered by two different methods; Machine Learning (ML) and rulebased method. Finally, Feature Selection (FS) is also considered to enhance the mining results. The
experimental results reveal that the proposed method outperforms counterpart ones with an improvement
margin of up to 4% using F-Measure.
CUSTOMER OPINIONS EVALUATION: A CASESTUDY ON ARABIC TWEETSgerogepatton
This paper presents an automatic method for extracting, processing, and analysis of customer opinions on Arabic social media. We present a four-step approach for mining of Arabic tweets. First, Natural Language Processing (NLP) with different types of analyses had performed. Second, we present an automatic and expandable lexicon for Arabic adjectives. The initial lexicon is built using 1350 adjectives as seeds from processing of different datasets in Arabic language. The lexicon is automatically expanded by collecting synonyms and morphemes of each word through Arabic resources and google translate. Third, emotional analysis was considered by two different methods; Machine Learning (ML) and rulebased method. Finally, Feature Selection (FS) is also considered to enhance the mining results. The experimental results reveal that the proposed method outperforms counterpart ones with an improvement margin of up to 4% using F-Measure.
CUSTOMER OPINIONS EVALUATION: A CASESTUDY ON ARABIC TWEETSijaia
This paper presents an automatic method for extracting, processing, and analysis of customer opinions on Arabic social media. We present a four-step approach for mining of Arabic tweets. First, Natural Language Processing (NLP) with different types of analyses had performed. Second, we present an automatic and expandable lexicon for Arabic adjectives. The initial lexicon is built using 1350 adjectives as seeds from processing of different datasets in Arabic language. The lexicon is automatically expanded by collecting synonyms and morphemes of each word through Arabic resources and google translate. Third, emotional analysis was considered by two different methods; Machine Learning (ML) and rulebased method. Finally, Feature Selection (FS) is also considered to enhance the mining results. The experimental results reveal that the proposed method outperforms counterpart ones with an improvement margin of up to 4% using F-Measure.
Qualitative research uses words rather than numbers to understand phenomena through interviews, observations and documents. It is useful when little is known about a condition or environment. Some key characteristics of qualitative research include studying things in their natural settings, using the researcher as the instrument of data collection, collecting multiple sources of data, and analyzing data inductively to identify themes. Mixed-methods research combines qualitative and quantitative approaches by collecting and analyzing both types of data sequentially or concurrently.
S ENTIMENT A NALYSIS F OR M ODERN S TANDARD A RABIC A ND C OLLOQUIAlijnlc
The rise of social media such as blogs and social n
etworks has fueled interest in sentiment analysis.
With
the proliferation of reviews, ratings, recommendati
ons and other forms of online expression, online op
inion
has turned into a kind of virtual currency for busi
nesses looking to market their products, identify n
ew
opportunities and manage their reputations, therefo
re many are now looking to the field of sentiment
analysis. In this paper, we present a feature-based
sentence level approach for Arabic sentiment analy
sis.
Our approach is using Arabic idioms/saying phrases
lexicon as a key importance for improving the
detection of the sentiment polarity in Arabic sente
nces as well as a number of novels and rich set of
linguistically motivated features (contextual Inten
sifiers, contextual Shifter and negation handling),
syntactic features for conflicting phrases which en
hance the sentiment classification accuracy.
Furthermore, we introduce an automatic expandable w
ide coverage polarity lexicon of Arabic sentiment
words. The lexicon is built with gold-standard sent
iment words as a seed which is manually collected a
nd
annotated and it expands and detects the sentiment
orientation automatically of new sentiment words us
ing
synset aggregation technique and free online Arabic
lexicons and thesauruses. Our data focus on modern
standard Arabic (MSA) and Egyptian dialectal Arabic
tweets and microblogs (hotel reservation, product
reviews, etc.). The experimental results using our
resources and techniques with SVM classifier indica
te
high performance levels, with accuracies of over 95
%.
“Towards Multi-Step Expert Advice for Cognitive Computing” - Dr. Achim Rettin...diannepatricia
Dr. Achim Rettinger from Karlsruhe Institute of Technology presented this today as part of the Cognitive Systems Institute Speaker Series on October 13, 2016
Analysis of Feature Models using Alloy - A surveyAnjali Sreekumar
This is a presentation of the work on a survey related to Feature model analysis using Alloy presented during the FMSPLE workshop at Eindhoven University of Technology on 3rd April 2016.
This document summarizes the process of quality assessment for individual studies in systematic reviews. It describes defining quality as protection against bias, the reasons for quality assessment including interpreting results and grading evidence strength. The key steps are classifying study design, applying predefined criteria considering biases, arriving at a quality rating of good, fair or poor, and transparently reporting the quality assessment process and ratings.
A New Concept Extraction Method for Ontology Construction From Arabic TextCSCJournals
Ontology is one of the most popular representation model used for knowledge representation, sharing and reusing. The Arabic language has complex morphological, grammatical, and semantic aspects. Due to complexity of Arabic language, automatic Arabic terminology extraction is difficult. In addition, concept extraction from Arabic documents has been challenging research area, because, as opposed to term extraction, concept extraction are more domain related and more selective. In this paper, we present a new concept extraction method for Arabic ontology construction, which is the part of our ontology construction framework. A new method to extract domain relevant single and multi-word concepts in the domain has been proposed, implemented and evaluated. Our method combines linguistic, statistical information and domain knowledge. It first uses linguistic patterns based on POS tags to extract concept candidates, and then stop words filter is implemented to filter unwanted strings. To determine relevance of these candidates within the domain, different statistical measures and new domain relevance measure are implemented for first time for Arabic language. To enhance the performance of concept extraction, a domain knowledge will be integrated into the module. The concepts scores are calculated according to their statistical values and domain knowledge values. In order to evaluate the performance of the method, precision scores were calculated. The results show the high effectiveness of the proposed approach to extract concepts for Arabic ontology construction.
This document summarizes a presentation on research assessment in the UK. It outlines the Research Assessment Exercise (RAE) process and its impact on researcher behavior. It then discusses the transition to the new Research Excellence Framework (REF), which will place greater emphasis on citations, impact, and environment. The presentation notes that the RAE and REF influence what researchers study and how they disseminate their work, and that behaviors will continue adjusting in response to assessment changes.
Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...Lifeng (Aaron) Han
Starting from 1950s, Machine Translation (MT) was challenged from different scientific solutions which included rule-based methods, example-based and statistical models (SMT), to hybrid models, and very recent years the neural models (NMT).
While NMT has achieved a huge quality improvement in comparison to conventional methodologies, by taking advantages of huge amount of parallel corpora available from internet and the recently developed super computational power support with an acceptable cost, it struggles to achieve real human parity in many domains and most language pairs, if not all of them.
Alongside the long road of MT research and development, quality evaluation metrics played very important roles in MT advancement and evolution.
In this tutorial, we overview the traditional human judgement criteria, automatic evaluation metrics, unsupervised quality estimation models, as well as the meta-evaluation of the evaluation methods. Among these, we will also cover the very recent work in the MT evaluation (MTE) fields taking advantages of large size of pre-trained language models for automatic metric customisation towards exactly deployed language pairs and domains. In addition, we also introduce the statistical confidence estimation regarding sample size needed for human evaluation in real practice simulation.
Gür, hamurcu, eren 2016 - selection of academic conferences based on analyt...Quang Jimmy
This document discusses the selection of academic conferences using the Analytic Network Process (ANP). It begins with an abstract that outlines the importance of academic conferences and factors considered in selecting them, such as registration fees, subject matter, and deadlines. It then reviews literature on criteria used by academics to select conferences. These include location, costs, and opportunities.
The document goes on to describe the ANP method and its use in various decision-making problems. It then presents a case study using ANP to select among six conferences based on criteria like costs, time, location, and the conferences themselves. Data from academics was used to build an ANP network model relating the criteria. Pairwise comparisons and supermatrix calculations were
Quality Assurance of NAO Value for Money Studies.docNeerajOjha17
The NAO asks external academics to review its VFM reports to benefit from independent scrutiny and ensure technical rigor. Initially all published reports were reviewed, but since 2006 reviews are conducted on draft reports. Two universities evaluate reports on criteria like scope, analysis, conclusions, and provide scores from 1-5. The review process was changed to pre-publication to enhance report quality using external feedback. Reviewers evaluate drafts against criteria like scope, context, analysis, value for money, and evidence to provide moderated, consistent scoring and comments to the NAO.
Similar to A Study of Association Measures and their Combination for Arabic MWT Extraction (20)
A Study of Association Measures and their Combination for Arabic MWT Extraction
Introduction
The state of MWT extraction
Proposed Method
Evaluation and Results
Conclusion and perspectives
Bibliography
10th International Conference on Terminology and Artificial Intelligence (TIA'2013)
Abdelkader El Mahdaouy, Said El Alaoui Ouatik and Eric Gaussier
October 28th, 2013
A. El Mahdaouy, S.O El Alaoui and E. Gaussier
Arabic MWT Extraction
1 / 20

Table of contents

1 Introduction
    Terminology Extraction
    Motivation
2 The state of MWT extraction
    Standard Approaches
    Statistical Measures
3 Proposed Method
    Linguistic Filter
    Statistical Filter
4 Evaluation and Results
    Corpus
    Evaluation Method
    Obtained results
5 Conclusion and perspectives
6 Bibliography

Terminology Extraction

Terminology
Set of terms representing the system of concepts of a particular subject field.

Term
Lexical unit that has an unambiguous meaning when used in a text of a specific domain.
Refers to a defined concept ... (ISO 704).

Terminology Extraction
Subtask of information extraction.
Automatically extract relevant terms from a given corpus.

Motivation

The bag-of-words model (based on single-word terms) is a simplifying representation used in natural language processing and information retrieval (IR).
Multi-word terms (MWTs) are less ambiguous and less polysemous than single-word terms.
Using MWTs instead of single-word terms yields a better representation of document content.

Standard Approaches

Linguistic Approaches
Based on linguistic pre-processing and POS tagging.
Extract candidate terms using syntactic patterns.

Statistical Approaches
Rank candidate terms with a measure that gives higher scores to "good" candidate terms.
Frequent expressions are assumed to represent important concepts.

Hybrid Approaches
Combine linguistic and statistical techniques to extract MWTs in order to avoid the weaknesses of the two approaches.
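
The linguistic-then-statistical pipeline described under Standard Approaches can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the coarse tag set (NOUN, ADJ, PREP), the three patterns, and the use of raw frequency as the ranking measure. None of it is taken from the slides, which do not list the patterns actually used.

```python
from collections import Counter

# Hypothetical coarse tag set and syntactic patterns (illustration only).
PATTERNS = [
    ("NOUN", "ADJ"),
    ("NOUN", "NOUN"),
    ("NOUN", "PREP", "NOUN"),
]


def extract_candidates(tagged_sentence, patterns=PATTERNS):
    """Linguistic step: return word sequences whose tags match a pattern.

    `tagged_sentence` is a list of (word, tag) pairs.
    """
    candidates = []
    for start in range(len(tagged_sentence)):
        for pattern in patterns:
            window = tagged_sentence[start:start + len(pattern)]
            if len(window) != len(pattern):
                continue  # pattern would run past the end of the sentence
            if all(tag == want for (_, tag), want in zip(window, pattern)):
                candidates.append(tuple(word for word, _ in window))
    return candidates


def rank_by_frequency(tagged_corpus):
    """Statistical step: rank candidates by how often they occur."""
    counts = Counter()
    for sentence in tagged_corpus:
        counts.update(extract_candidates(sentence))
    return counts.most_common()


if __name__ == "__main__":
    corpus = [
        [("التلوث", "NOUN"), ("البيئي", "ADJ"), ("خطير", "ADJ")],
        [("يزداد", "VERB"), ("التلوث", "NOUN"), ("البيئي", "ADJ")],
    ]
    for term, count in rank_by_frequency(corpus):
        print(" ".join(term), count)
```

Raw frequency stands in for the ranking measure only to keep the sketch short; any of the association measures discussed in the deck could replace it without changing the extraction step.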
9. Introduction
The state of MWT extraction
Proposed Method
Evaluation and Results
Conclusion and perspectives
Bibliography
Standard Approaches
Statistical Measures

Characteristics of MWTs
Defined by Kageura and Umino, 1996:

Unithood
The degree of strength or stability of syntagmatic combinations or collocations.
Typical measures: Log-Likelihood Ratio, T-Score, MI, etc.

Termhood
The degree to which a linguistic unit is related to a specific domain concept.
Typical measures: C/NC-value.
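
As an illustration of a unithood measure, the log-likelihood ratio of a two-word candidate can be computed from a 2x2 contingency table of bigram counts. The sketch below is a minimal Python version of Dunning's statistic; the function name `llr` and its argument names are assumptions made for this example and are not taken from the slides.

```python
import math


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table of bigram counts.

    k11: occurrences of the bigram (w1, w2)
    k12: occurrences of w1 followed by a word other than w2
    k21: occurrences of w2 preceded by a word other than w1
    k22: occurrences of all remaining bigrams
    """
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = rows[i] * cols[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2
```

A table in which the two words co-occur exactly as often as chance predicts gives a score of zero; stronger association gives larger scores.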

Proposed Method
The hybrid method consists of two filters:

Linguistic Filter
Uses AMIRA 2.0 (a POS tagging toolkit).
Extracts MWT candidates based on syntactic patterns.
Handles the problem of MWT variation.

Statistical Filter
Proposes a novel statistical measure (NLC-value) that combines context information with termhood and unithood.
Evaluates state-of-the-art statistical measures.

Linguistic Filter
The proposed linguistic filter extracts candidate MWTs using two core components: the POS tagger and the sequence identifier.

Syntactic patterns
(Noun + (Noun|Adj) + | (Noun|Adj) + | (Noun|Adj))
Noun Prep Noun

Figure 1: The global schema of the linguistic filter
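
The sequence identifier can be pictured as a pattern matcher over POS tags. The following sketch is only an illustration under stated assumptions: the tag names (`NOUN`, `ADJ`, `PREP`) are a hypothetical simplified tagset rather than AMIRA's output format, and the regular expression encodes a simplified reading of the patterns above (a noun followed by one or more nouns or adjectives, or Noun Prep Noun).

```python
import re

# One-letter codes for a hypothetical, simplified tagset (an assumption).
TAG_CODES = {"NOUN": "N", "ADJ": "A", "PREP": "P"}

# Simplified reading of the slide's patterns (an assumption):
#   Noun Prep Noun   or   Noun (Noun|Adj)+
PATTERN = re.compile(r"NPN|N[NA]+")


def extract_candidates(tagged):
    """Return candidate MWTs from one sentence.

    tagged: list of (token, tag) pairs. Matches are the leftmost,
    non-overlapping tag sequences accepted by PATTERN.
    """
    codes = "".join(TAG_CODES.get(tag, "X") for _, tag in tagged)
    candidates = []
    for match in PATTERN.finditer(codes):
        tokens = [tok for tok, _ in tagged[match.start():match.end()]]
        candidates.append(" ".join(tokens))
    return candidates
```

Mapping each tag to a single character keeps string offsets aligned with token positions, so a match span can be used directly to slice the token list.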

Term variation
Four types of variation are handled: graphical, inflectional, morpho-syntactic and syntactic variants.

Graphical Variants
These concern orthographic errors that occur when writing certain letters (" ", " " and " ").

Example
[...] which leads to [...], meaning "Biodiversity".

Term variation

Inflectional Variants
These variants are due to the use of different forms of the words constituting an MWT:
Gender and number.
The presence or absence of a definite article.

Examples
1. [...] (ocean pollution) which leads to [...] (pollution of the oceans).
2. [...] (water pollution) which leads to [...] (the water pollution).
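
One simple way to conflate variants that differ only by a definite article is to strip the article before counting. The slides do not say how the variants are actually handled, so the sketch below is an assumption for illustration only: it removes a leading "al-" (alef + lam) from each word, which can over-strip words whose stem happens to begin with those letters, and it does nothing for gender or number.

```python
DEFINITE_ARTICLE = "\u0627\u0644"  # Arabic "al-" (alef followed by lam)


def strip_article(word):
    """Drop a leading definite article, keeping very short words intact."""
    if word.startswith(DEFINITE_ARTICLE) and len(word) > len(DEFINITE_ARTICLE) + 1:
        return word[len(DEFINITE_ARTICLE):]
    return word


def normalize_term(term):
    """Normalize a whitespace-separated candidate term word by word."""
    return " ".join(strip_article(w) for w in term.split())
```

Candidates that normalize to the same string can then be counted as occurrences of one term.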

Term variation

Morpho-syntactic Variants
These variants affect the internal structure of the term, as the words it contains are related through derivational morphology:
Noun1 Noun2 ⇔ Noun1 Adj.
Noun1 Adj ⇔ Noun1 Prep Noun.

Examples
1. [...] and [...] (air pollution).
2. [...] which leads to [...] (barrel of oil).

Term variation

Syntactic Variants
These variants modify the structure of the MWT candidate by adding one or more words (such as adjectives) but do not affect the grammatical categories:
Noun1 Noun2 ⇔ Noun1 Noun2 Adj.
Noun1 Adj1 ⇔ Noun1 Adj1 Adj2.

Examples
1. [...] (water stocks) and [...] (groundwater stocks).
2. [...] (Health Organization) and [...] (World Health Organization).
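
Syntactic variants of this kind can be spotted by checking whether a shorter candidate occurs as a contiguous word sequence inside a longer one. The sketch below is a hedged illustration, not the procedure used in the presented work; the function names are hypothetical and the English glosses stand in for the Arabic terms.

```python
def contains(longer, shorter):
    """True if the word list `shorter` occurs contiguously in `longer`
    and `longer` has strictly more words."""
    n, m = len(longer), len(shorter)
    if m >= n:
        return False
    return any(longer[i:i + m] == shorter for i in range(n - m + 1))


def expansions(term, candidates):
    """Return the candidates that extend `term` with extra words."""
    words = term.split()
    return [c for c in candidates if contains(c.split(), words)]
```

For example, `expansions("health organization", ["world health organization", "water stocks"])` keeps only the first candidate.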

Statistical Filter: The NLC-value

\[ \text{NLC-value}(a) = 0.8 \cdot \text{LC-value}(a) + 0.2 \cdot \text{N-value}(a) \tag{1} \]

with

\[ \text{LC-value}(a) = \begin{cases} \log_2(|a|) \cdot FL(a) & \text{if } a \text{ is not nested,} \\ \log_2(|a|) \cdot \Big( FL(a) - \frac{1}{|T(a)|} \sum_{b \in T(a)} FL(b) \Big) & \text{otherwise,} \end{cases} \]

\[ FL(a) = f(a) \cdot \ln\big(2 + \min(LLR(a))\big), \]

and

\[ \text{N-value}(a) = \sum_{b \in C_a} f_a(b) \cdot \frac{|T(b)|}{n}. \]

1. |a| denotes the length in words of candidate term a.
2. T(a) denotes the set of longer candidate terms in which a appears.
3. C_a denotes the set of distinct context words of a.
4. n is the total number of terms considered.
5. f(a) is the number of occurrences of a.
6. |T(a)| is the cardinality of the set T(a).
7. f_a(b) is the number of times b occurs in the context of a.
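
The formulas above translate fairly directly into code. The sketch below is a minimal Python rendering under stated assumptions: the function names and the `alpha` parameter are hypothetical, the caller supplies min(LLR(a)) as a number (the slide does not spell out how it is obtained), and the counts are passed in as plain dictionaries and lists.

```python
import math


def fl(freq, min_llr):
    """FL(a) = f(a) * ln(2 + min(LLR(a)))."""
    return freq * math.log(2.0 + min_llr)


def lc_value(length, fl_a, fl_longer):
    """LC-value(a).

    length: |a|, the number of words in a
    fl_a: FL(a)
    fl_longer: list of FL(b) for every b in T(a); empty if a is not nested
    """
    if not fl_longer:
        return math.log2(length) * fl_a
    return math.log2(length) * (fl_a - sum(fl_longer) / len(fl_longer))


def n_value(context_counts, term_counts, n_terms):
    """N-value(a) = sum over context words b of f_a(b) * |T(b)| / n.

    context_counts: {b: f_a(b)}
    term_counts: {b: |T(b)|}
    n_terms: n, the total number of terms considered
    """
    return sum(f_ab * term_counts.get(b, 0) / n_terms
               for b, f_ab in context_counts.items())


def nlc_value(lc, nv, alpha=0.8):
    """NLC-value(a) = alpha * LC-value(a) + (1 - alpha) * N-value(a)."""
    return alpha * lc + (1.0 - alpha) * nv
```

With the default `alpha` of 0.8 the last function reproduces the 0.8 / 0.2 weighting of equation (1).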

The Corpus
Arabic specialized-domain corpora are scarce.
The corpus we built contains 1666 files comprising 53569 distinct tokens (stop words excluded), extracted from the website "Al-Khat Alakhdar".
The corpus covers various environmental topics such as pollution, water purification, soil degradation, forest preservation, climate change and natural disasters.

The Evaluation
1. We computed the association scores (LLR, C-value, NC-value, NTC-value, LLR+C-value, NLC-value) for the MWT candidates.
2. From the ranking produced by each statistical measure, we retained the k best candidates, with k ranging from 100 to 300 in steps of 100.
3. We automatically built a reference list of all Arabic MWTs available in the latest version of the AGROVOC thesaurus.
4. We also used translations of the MWTs together with the European terminological database IATE.
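
Checking the k best candidates against a reference list amounts to a precision-at-k computation. The sketch below is a generic illustration; the function name and the toy data are hypothetical, and it does not reproduce the exact validation procedure behind the reported figures.

```python
def precision_at_k(ranked, reference, k):
    """Share of the k best-ranked candidates that appear in the reference.

    ranked: candidates sorted by decreasing score
    reference: set of validated terms
    k: cut-off; if fewer than k candidates exist, all of them are used
    """
    top = ranked[:k]
    if not top:
        return 0.0
    hits = sum(1 for term in top if term in reference)
    return hits / len(top)
```

Calling it with k = 100, 200 and 300 on each measure's ranking gives one row of a precision table per measure.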

Obtained results

Table 1: Results obtained for different statistical measures

                        Top MWTs considered
Statistical measure     100       200       300
LLR                     75.0%     70.5%     64.3%
C-value                 71.0%     69.0%     67.3%
NC-value                74.0%     70.0%     68.3%
NTC-value               80.0%     71.5%     69.7%
LLR+C-value             73.0%     72.0%     68.3%
NLC-value               82.0%     75.5%     73.0%

Table 2: Number of terms found in AGROVOC for each measure

                        Top MWTs considered
Statistical measure     100       200       300
LLR                     35        60        80
C-value                 27        59        82
NC-value                32        62        82
NTC-value               35        60        83
LLR+C-value             34        60        84
NLC-value               41        65        86

Table 3: Number of terms found in IATE for each measure

                        Top MWTs considered
Statistical measure     100       200       300
LLR                     40        81        113
C-value                 44        79        120
NC-value                42        78        123
NTC-value               45        83        126
LLR+C-value             39        84        121
NLC-value               41        86        133

Figure 2: Precision obtained for different statistical measures that combine termhood and unithood
Figure 3: Precision obtained for the C/NC-value and the NTC-value
Figure 4: Precision obtained for the LLR and the C/NC-value

Conclusion and perspectives

Conclusion
1. A hybrid method for Arabic MWT acquisition that takes advantage of existing linguistic and statistical approaches.
2. A novel statistical measure, NLC-value, for ranking MWT candidates.
3. Experiments were performed for bi-grams and tri-grams on an Arabic environmental corpus.

Perspectives
1. Validate the proposed statistical measure on other languages.
2. Use the extracted MWTs for document indexing and retrieval in IR systems.

We thank the reviewers for their useful comments (the results presented here are based on their remarks).

Bibliography
Boulaknadel S, Daille B, and Aboutajdine D. 2008a. Multi-word term indexing for Arabic document retrieval. In Proceedings of the IEEE Symposium on Computers and Communications, pp. 869-873.
Dunning T. 1994. Accurate Methods for the Statistics of Surprise and Coincidence. Computational Linguistics, volume 19, pp. 61-74.
Frantzi K. T, Ananiadou S, and Tsujii T. 1998. The C-Value/NC-Value Method of Automatic Recognition for Multi-word Terms. Journal on Research and Advanced Technology for Digital Libraries, pp. 115-130.
Kageura K, and Umino B. 1996. Methods of Automatic Term Recognition: A Review. Terminology, volume 3.
Vu T, Aw A. Ti, and Zhang M. 2008. Term Extraction Through Unithood and Termhood Unification. In Proceedings of IJCNLP.