Arabic is the most widely spoken language in the Semitic group and one of the most common languages in the world, spoken by more than 422 million people. It is also of paramount importance to Muslims: it is the sacred language of the Islamic Holy Book (the Quran), and prayer (and other acts of worship) in Islam is performed only by mastering some Arabic words. Arabic is also a major ritual language of a number of Christian churches in the Arab world, and several intellectual and religious Jewish books of the Middle Ages were written in it. Despite this, there is no semantic Arabic lexicon that researchers can depend on. In this paper we introduce Azhary, a lexical ontology for the Arabic language. It groups Arabic words into sets of synonyms called synsets and records a number of relationships between words, such as synonym, antonym, hypernym, hyponym, meronym, holonym and association relations. The ontology contains 26,195 words organized in 13,328 synsets. It has been developed and contrasted against AWN, the most common available Arabic lexical ontology.
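As a minimal sketch of the structure such a lexical ontology describes (words grouped into synsets, with typed relations linking synsets), the following illustrates the data model; the class name, relation labels and example words are illustrative placeholders, not entries from Azhary itself.

```python
# Hypothetical sketch of a synset-based lexical ontology, in the style
# described above: synonymy is implicit within a synset, while relations
# such as antonym or hypernym link one synset to another.

class Synset:
    def __init__(self, words):
        self.words = set(words)      # synonymous words share one synset
        self.relations = {}          # relation name -> set of related Synsets

    def add_relation(self, name, other):
        self.relations.setdefault(name, set()).add(other)

# usage: two placeholder synsets linked by a hypernym relation
animal = Synset(["<animal-word>"])
cat = Synset(["<cat-word-1>", "<cat-word-2>"])
cat.add_relation("hypernym", animal)
```

A real lexicon would of course load its synsets and relations from the ontology's storage format rather than construct them by hand.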
ANALYTICAL STUDY TO REVIEW OF ARABIC LANGUAGE LEARNING USING INTERNET WEBSITES (ijcsit)
Arabic is one of the most commonly used languages in the world and plays a very important role in educational activities around the world. It contains various genres such as poetry, poems, novels and stories, as well as linguistic and grammatical rules and the vowel marks on letters, which change a word according to the marks accompanying each letter; for example, there are the marks of raf' (lifting), kasr (breaking), damm (annexation) and sukun (silence). In this paper, we review the research papers that studied Arabic language learning websites, using content analysis to determine strengths, weaknesses, advantages, disadvantages and limitations. This paper concludes that there is still a shortage and scarcity of articles and websites on the internet that teach the Arabic language. The suggestion is to assign the task of development to Arab-world institutes and others, increasing the number of websites on the internet and enriching their scientific content to improve it and increase its spread among learners.
CONSTRUCTION OF ENGLISH-BODO PARALLEL TEXT CORPUS FOR STATISTICAL MACHINE TRANSLATION (kevig)
A corpus is a large collection of homogeneous and authentic written texts (or speech) of a particular natural language that exists in machine-readable form. The scope of corpora is endless in Computational Linguistics and Natural Language Processing (NLP). A parallel corpus is a very useful resource for most NLP applications, especially for Statistical Machine Translation (SMT). SMT is currently the most popular approach to Machine Translation (MT), and it can produce high-quality translation results based on a huge amount of aligned parallel text in both the source and target languages. Although Bodo is a recognized natural language of India and a co-official language of Assam, very little machine-readable information is available for the Bodo language. Therefore, to expand the computerized information of the language, an English-to-Bodo SMT system has been developed. This paper mainly focuses on building English-Bodo parallel text corpora to implement the English-to-Bodo SMT system using the phrase-based SMT approach. We have designed an E-BPTC (English-Bodo Parallel Text Corpus) creator tool and have constructed English-Bodo parallel text corpora for the General and Newspaper domains. Finally, the quality of the constructed parallel text corpora has been tested in the SMT system using two evaluation techniques.
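As a hedged sketch of what parallel-corpus construction involves (this is not the authors' E-BPTC tool), the following pairs two pre-aligned sentence lists and writes them in the tab-separated one-pair-per-line format that phrase-based SMT toolkits commonly consume; the Bodo strings below are placeholders, not real Bodo text.

```python
# Illustrative sketch: build a sentence-aligned parallel corpus from two
# pre-aligned lists and serialize it as tab-separated source/target pairs.

def build_parallel_corpus(english_sentences, bodo_sentences):
    """Zip two pre-aligned sentence lists into (source, target) pairs,
    skipping any pair in which either side is empty."""
    pairs = []
    for en, bd in zip(english_sentences, bodo_sentences):
        en, bd = en.strip(), bd.strip()
        if en and bd:  # drop empty alignments
            pairs.append((en, bd))
    return pairs

def write_tsv(pairs, path):
    """Write one 'source<TAB>target' pair per line, UTF-8 encoded."""
    with open(path, "w", encoding="utf-8") as f:
        for en, bd in pairs:
            f.write(f"{en}\t{bd}\n")
```

A real pipeline would also normalize punctuation, tokenize both sides, and filter pairs with implausible length ratios before training.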
Exploring Twitter as a Source of an Arabic Dialect Corpus (CSCJournals)
Given the lack of Arabic dialect text corpora in comparison with what is available for dialects of English and other languages, there is a need to create dialect text corpora for use in Arabic natural language processing. What is more, there is increasing use of Arabic dialects in social media, so this text is now considered quite appropriate as the source of a corpus. We collected 210,915K tweets from five groups of Arabic dialects: Gulf, Iraqi, Egyptian, Levantine, and North African. This paper explores Twitter as a source and describes the methods we used to extract tweets and classify them according to the geographic location of the sender. We classified Arabic dialects using the Waikato Environment for Knowledge Analysis (WEKA) data-analytics tool, which contains many alternative filters and classifiers for machine learning. Our approach to classifying tweets achieved an accuracy of 79%.
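The geographic-labelling step described above can be illustrated with a minimal sketch: match the sender's free-text location field against per-dialect place-name lists. The keyword lists below are hypothetical stand-ins, not the paper's actual gazetteer, and the paper's classification itself was done in WEKA rather than in code like this.

```python
# Hypothetical sketch: assign a dialect-group label from the sender's
# free-text location string via simple substring matching.

DIALECT_KEYWORDS = {  # placeholder gazetteer, not the paper's data
    "Gulf": ["riyadh", "jeddah", "kuwait", "dubai", "doha"],
    "Iraqi": ["baghdad", "basra", "mosul"],
    "Egyptian": ["cairo", "alexandria", "giza"],
    "Levantine": ["beirut", "damascus", "amman"],
    "North African": ["tunis", "algiers", "casablanca", "tripoli"],
}

def classify_by_location(location):
    """Return the dialect group whose place name appears in the location
    string, or None if no keyword matches."""
    loc = location.lower()
    for dialect, places in DIALECT_KEYWORDS.items():
        if any(p in loc for p in places):
            return dialect
    return None
```

Real location fields are noisy (nicknames, Arabic script, empty values), so a production labeller would need normalization and a much larger gazetteer.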
Top 10 Natural Language Processing Trends in 2020 - International Journal on ... (kevig)
Natural Language Processing is a programmed approach to analyzing text that is based on both a set of theories and a set of technologies. This forum aims to bring together researchers who have designed and built software that will analyze, understand, and generate the languages that humans use naturally to address computers.
Call for proposals, Translations of scholarly literature into Persian, Center... (Encyclopaedia Iranica)
The grant opportunity is designed for Iranian nationals with current permanent residency in Iran.
The selection committee is chaired by Kevan Harris and Cyrus Schayegh (both of Princeton University).
Hidden Markov Model Based Part of Speech Tagger for Sinhala Language (ijnlc)
In this paper we present fundamental lexical semantics of the Sinhala language and a Hidden Markov Model (HMM) based Part of Speech (POS) tagger for Sinhala. In any natural language processing task, part of speech is a very vital topic, which involves analysing the construction, behaviour and dynamics of the language; this knowledge can be utilized in computational linguistic analysis and automation applications. As Sinhala is a morphologically rich and agglutinative language, in which words are inflected with various grammatical features, tagging is essential for further analysis of the language. Our research is based on a statistical approach, in which the tagging process is done by computing the tag sequence probability and the word-likelihood probability from the given corpus, with the linguistic knowledge automatically extracted from the annotated corpus. The current tagger reaches more than 90% accuracy for known words.
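The two probabilities named above map directly onto an HMM's transition model P(tag_i | tag_{i-1}) and emission model P(word | tag), and the best tag sequence is typically recovered with Viterbi decoding. A minimal sketch, with toy hand-set probabilities rather than corpus-estimated ones:

```python
# Minimal Viterbi decoder for an HMM POS tagger. trans[(t1, t2)] = P(t2|t1)
# is the tag-sequence probability; emit[(t, w)] = P(w|t) is the
# word-likelihood probability; start[t] = P(t) for the first position.

def viterbi(words, tags, trans, emit, start):
    V = [{t: start.get(t, 0.0) * emit.get((t, words[0]), 0.0) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        V.append({})
        back.append({})
        for t in tags:
            best_prev, best_p = None, 0.0
            for p in tags:  # best predecessor tag for t at position i
                score = V[i - 1][p] * trans.get((p, t), 0.0)
                if score > best_p:
                    best_prev, best_p = p, score
            back[i][t] = best_prev
            V[i][t] = best_p * emit.get((t, words[i]), 0.0)
    last = max(tags, key=lambda t: V[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):  # follow backpointers
        path.append(back[i][path[-1]])
    return list(reversed(path))
```

A real tagger would estimate these probabilities from the annotated corpus (with smoothing for unseen words) and work in log space to avoid underflow.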
A Descriptive Study of Standard Dialect and Western Dialect of Odia Language ... (ijtsrd)
Language is a unique blessing to human beings, who have been endowed with the faculty of language from a very primitive age. Language makes human beings social, and in a society human beings communicate with the help of language. Odia is one of the constitutionally approved languages of India. Odisha is situated in the eastern part of India and presently has thirty districts. Odisha is bounded to the north by the state of Jharkhand, to the northeast by West Bengal, to the east by the Bay of Bengal, to the south by Andhra Pradesh, and to the west by Chhattisgarh. The languages used by the neighboring states have a great influence on the Odia language. In this study a modest attempt has been made to highlight the differences between the Standard Odia and Western Odia dialects. Various linguistic items used by Western Odia dialect speakers show marked differences from Standard Odia. The study delves into the phonological, morphological, semantic and syntactic features of both Standard Odia and Western Odia. Secondly, for ease of understanding, some discussion is devoted to the existing literature on language and its varieties in general. As the spoken form is the primary form of any language, data have been collected from informants' conversations for analysis. Debiprasad Pany, "A Descriptive Study of Standard Dialect and Western Dialect of Odia Language in Terms of Linguistic Items", International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN 2456-6470, Volume 4, Issue 1, December 2019. Paper URL: https://www.ijtsrd.com/papers/ijtsrd29632.pdf
People across the globe have access to materials such as journals, articles and adverts via the internet. However, many of these resources come in a diverse range of languages. Although English seems the most suitable to most people, some readers believe that working on materials in one's native language is more enjoyable than in other languages. Research has shown that the Arabic language has not been prominent in terms of online materials, and the few existing resources are often ignored due to the peculiar nature of its various characters and constructs. Hence, a proper study of its relationship with English, with a view to bringing people closer to its understanding, becomes necessary. The system scenarios were modeled and implemented using the Unified Modeling Language and Microsoft C# respectively, in such a way that the expected set of characters of the language of interest was automatically formed with respect to a given input. The procedural steps were properly followed in the development and running of the code using a context-free rule-based technique, with the required hardware clearly described in the design. The system's workability was tested with different source texts as inputs, and in each case the resulting outputs were very effective with respect to the translation process. The design is expected to serve as a tool for assisting beginners in these two languages and so showcases a one-to-one form of correspondence; hence, more rules and functions to ensure a more robust system are expected in future works.
Dictionary Entries for Bangla Consonant Ended Roots in Universal Networking Language (Waqas Tariq)
The Universal Networking Language (UNL) deals with communication across nations of different languages and involves many related disciplines such as linguistics, epistemology and computer science. It helps to overcome the language barrier among people of different nations, addressing problems emerging from current globalization trends and geopolitical interdependence. We are working to include the Bangla language in the UNL system so that Bangla can be converted to UNL expressions. As part of this process we are currently working on Bangla consonant-ended verb roots and developing lexical (dictionary) entries for them. In this paper, we present our work by describing the Bangla verb, verb roots and verbal inflections, and finally show the dictionary entries for the consonant-ended roots.
Book of Abstracts - The Sixth International Conference on Current Issues of Languages, Linguistics, Translation and Literature
9-10 October 2021, Ahwaz
For more information, please visit our website: WWW.LLLD.IR
Do not hesitate to contact us with any enquiries.
The Conference Organizing Committee,
Ahwaz / P.O. Box: 61335-4619
Phone: (+98) 61-32931199
Fax: (+98) 61-32931198
Mobile and WhatsApp contact number: (+98) 9165088772
WWW.LLLD.IR, Email: info@pahi.ir
Role of Speech Therapy in Overcoming Lexical Deficit in Adult Broca’s Aphasia
Tanzeela Abid & Dr. Habibullah Pathan,
English Language Development Centre, Faculty of Science, Technology and Humanities, Mehran University of Engineering and Technology, Pakistan
This is an exploratory, qualitative study. The unit of exploration is adult Broca's aphasic patients. This paper aims to explore the function and integrity of speech therapy for adult Broca's aphasia. Aphasia is the after-effect of brain damage, commonly in the left hemisphere, which disrupts the language faculty. The present study focuses on the lexical aspect of language, in which an individual faces trouble in the processing of words. In Broca's aphasia the affected individual suffers from a diminished capability for speaking and communication. To recover such diminished capabilities, speech therapy is utilized. This study investigates the effectiveness of speech therapy: how does speech therapy help adults with Broca's aphasia recover their speaking or conversing skills? The participants of the study are speech therapists. Purposeful sampling, particularly snowball sampling, was undertaken. Semi-structured interviews were conducted with five speech therapists and analyzed through thematic analysis in the light of the Sketch Model of De Ruiter and De Beer (2013). The findings suggest that speech therapy may prove helpful for Broca's aphasia patients in recovering their communication capabilities, but it requires considerable time (a minimum of 6 months). Moreover, recovery depends on certain factors such as age, level of disorder and willingness.
Keywords: Broca’s Aphasia, Lexical Deficit, Speech Therapy, Communication, Speaking Skills
The Sixth International Conference on Languages, Linguistics, Translation and Literature
9-10 October 2021 , Ahwaz
For more information, please visit the conference website:
WWW.LLLD.IR
Development of Bi-Directional English To Yoruba Translator for Real-Time Mobile Chatting (CSCJournals)
Machine translation (MT) is a subfield of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another. Translating between English and Yoruba comes with computational complexities such as the syntactic and grammatical differences in the language pair. This paper explores a multi-layer hybridized language translation approach, which combines the corpus-based and rule-based approaches to machine translation to generate its outputs. A parallel corpus was built with texts from English and Yoruba and stored in a My Structured Query Language (MySQL) database. One hundred and forty-seven computational rules were manually formulated and also stored in the MySQL database for generating sentences in both languages. Two bilingual dictionaries were developed: one stored English words with their corresponding Yoruba counterparts and equivalent parts of speech, while the other stored Yoruba words with their corresponding English counterparts and equivalent parts of speech. A real-time mobile chatting interface was developed for users' interactions with one another and with the system. The research model was implemented using PHP for server-side scripting, JSON for data interchange and the Java programming language for user interfaces accessible on users' mobile phones. The Java code was written in the Android Studio 3.0 Integrated Development Environment. Two hundred and eleven sentences from Contemporary English Grammar were used for system testing, and the results show 95% accuracy compared with Google Translate.
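The bilingual-dictionary component described above can be sketched as a word-for-word lookup; the dictionary entries below are placeholder tokens, not the paper's actual English-Yoruba data, and a real system would also apply the paper's 147 reordering and grammar rules, which are omitted here.

```python
# Hypothetical sketch of the dictionary-lookup layer of a hybrid translator:
# replace each known word with its target-language counterpart, leaving
# unknown words unchanged for later rule-based handling.

EN_TO_YO = {  # placeholder entries, not real dictionary data
    "welcome": "<yo-welcome>",
    "you": "<yo-you>",
}

def translate_word_by_word(sentence, dictionary):
    """Lowercase, split on whitespace, and substitute dictionary matches."""
    return " ".join(dictionary.get(w, w) for w in sentence.lower().split())
```

This is only the first layer of such a pipeline; without the rule layer, word order and agreement between the language pair remain untranslated.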
International Journal on Natural Language Computing (IJNLC), Vol. 4, No. 2, April (ijnlc)
Building dialogue systems for interaction has recently gained considerable attention, but most of the resources and systems built so far are tailored to English and other Indo-European languages. The need for designing systems for other languages, such as Arabic, is increasing. For this reason, there is growing interest in the Arabic dialogue act classification task, because it is a key player in Arabic language understanding for building these systems. This paper surveys different techniques for dialogue act classification for Arabic. We describe the main existing techniques for utterance segmentation and classification, annotation schemas, and test corpora for Arabic dialogue understanding that have been introduced in the literature.
Design Analysis Rules to Identify Proper Noun from Bengali Sentence for Universal Networking Language (Syeful Islam)
Abstract—Nowadays hundreds of millions of people of almost all levels of education and attitudes from different countries communicate with each other for different purposes and perform their jobs on the internet or other communication media using various languages. Not all people know all languages; therefore it is very difficult to communicate or work across various languages. In this situation computer scientists have introduced various interlanguage translation programs (machine translation). UNL is one such interlanguage translation program. One of the major problems in UNL is identifying a name in a sentence, which is relatively simple in English because such entities start with a capital letter. In Bangla we have no concept of small or capital letters, so we find it difficult to determine whether a word is a proper noun or not. Here we propose analysis rules to identify proper nouns in a sentence and establish a post-converter which translates the name entity from Bangla to UNL. The goal is to make possible Bangla sentence conversion to UNL and vice versa. The UNL system proves that the theoretical analysis of our proposed system is able to identify proper nouns from Bangla sentences and produce the relative Universal Word for UNL.
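One common dictionary-based idea behind such rules can be sketched as follows: since Bangla offers no capitalization cue, a word that is found in the common-word dictionary neither as-is nor after stripping a known inflectional suffix can be flagged as a proper-noun candidate. The word list and suffixes below are illustrative placeholders, not the paper's actual analysis rules.

```python
# Hypothetical sketch of dictionary-based proper-noun candidate detection
# for a language without capitalization.

COMMON_WORDS = {"ami", "kora", "jabe"}   # placeholder common-word dictionary
SUFFIXES = ["ke", "er", "te"]            # placeholder inflection markers

def is_proper_noun_candidate(word):
    """Flag a word as a proper-noun candidate if neither the word nor any
    suffix-stripped stem of it appears in the common-word dictionary."""
    if word in COMMON_WORDS:
        return False
    for s in SUFFIXES:
        if word.endswith(s) and word[: -len(s)] in COMMON_WORDS:
            return False
    return True
```

A real rule set would need far richer morphology and context rules (e.g. surrounding case markers and verb forms), which is precisely what the paper's analysis rules supply.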
Code-Switching in Urdu Books of Punjab Text Book Board, Lahore, Pakistan (Bahram Kazemian)
The study highlights English code-switching in Punjab Urdu textbooks. The research aims at finding and categorizing Urdu-English code-switches. Another rationale behind the study is to present Urdu equivalents of the switches from an Urdu-English dictionary; for instance, adakar for actor and sayyah for tourist. Textbooks of 5th, 6th, 9th and 10th class are selected for data collection and analysis. A number of instances are observed at morpheme, word, phrase and clause levels. Data is analyzed qualitatively. The data analysis shows switches at all the mentioned levels. The researchers propose a revision of the existing textbooks in the light of the given equivalents and a careful scrutiny of the compilation of future textbooks to preserve the purity of Urdu language.
The Ekegusii Determiner Phrase Analysis in the Minimalist Program (Basweti Nobert)
Among some of the recent syntactic developments, the noun phrase has been reanalyzed
as a determiner phrase (DP). This study analyses the Ekegusii determiner
phrase (DP) with an inquiry into the relationship between agreement of the INFL
(sentence) and concord in the noun phrase (determiner phrase). It hypothesizes that
the Ekegusii sentential Agreement has a symmetrical relationship with the Ekegusii
Determiner Phrase internal concord and feature checking theory and full
interpretation (FI) in the Minimalist Program is adequate in the analysis of the
internal structure of the Ekegusii DP. In employing the Minimalist Program (MP),
the study shall first seek to establish the domain of the NP in the Ekegusii DP and
go ahead to do an investigation into the adequacy of the Minimalist Program in
analyzing the Ekegusii DP. This study is also geared towards establishing the order
of determiners in the DP between the D-head and the NP complement. The study
concludes that the principles of feature checking and full interpretation in the
minimalist program are mutually crucial in ensuring that Ekegusii constructions (DP
and even the sentence) are grammatical (converge). This emphasizes the fact that
the MP is adequate in Ekegusii DP analysis.
A New Approach: Automatically Identify Naming Word from Bengali Sentence for UNL (Syeful Islam)
More than hundreds of millions of people of almost all levels of education and attitudes from different countries communicate with each other for different purposes using various languages. Machine translation is in high demand due to the increasing use of web-based communication. One of the major problems of Bengali translation is identifying a naming word in a sentence, which is relatively simple in English because such entities start with a capital letter. In Bangla we have no concept of small or capital letters, and there is a huge number of different naming entities in Bangla. Thus we find it difficult to determine whether a word is a naming word (proper noun) or not. Here we introduce a new approach to identify naming words from a Bengali sentence for UNL without storing a huge number of naming entities in the word dictionary. The goal is to make possible Bangla sentence conversion to UNL and vice versa with minimal words stored in the dictionary.
This paper purports to be a starting point to revisit existing approaches dealing with the origin and spread of languages in the light of the changed circumstances of the Twenty-first century without in any way undermining their applicability across space and time. The origin of spoken languages is intricately and inseparably interwoven and intertwined with the origin of human species as well, and in this paper, we propose a ‘Wholly-independent Multi-Regional hypothesis of the origin of Homo sapiens’ in response to both the highly-controversial and arguably antiquated ‘Out-of-Africa theory’ which we have stridently and vehemently opposed, along with all its protuberances and the contending Multi-Regional Hypothesis as well. The key tenets of this paper are therefore articulated based on this fundamental premise which is likely to upend existing presumptions and paradigms to a significant degree. Having said that, we must hasten to add that the evolutionary biology of language encompassing physical anthropology or genetics and other related areas of study, are wholly outside the purview of this paper. Structural linguistics and semantics are also outside the scope of this paper. In this paper, we examine the origins of spoken and written languages in pre-historic, proto-historic, historic, pre-globalized and post-globalized contexts and propose an ‘Epochal Polygenesis’ approach. As a part of this paper, we also provide a broad overview of early and current theories of the origin and spread of languages so that readers can compare our approaches with already existing ones and analyse the similarities and differences between the two. 
We propose and define several new concepts under the categories of contact-based scenarios and non-contact based scenarios such as the autochthonous origin of languages, the spread of properties of languages from key nodes, the ‘Theory of linguistic osmosis’ and the need to take historical and political factors into account while analysing the spread of languages. In this paper, we also propose among others, the ‘Theory of win-win paradigms’ and the ‘Net benefits approach’. We also emphasize the need to carry out a diachronic and synchronic assessment of the dynamics of languages spread and propose that this be made a continuous process so that the lessons learnt can be used to tweak and hone theories and models to perfection. This paper is likely to significantly up the ante in favour of a dynamics-driven approach by undermining the relative torpor now observed in this arguably vital sub-discipline and contribute greatly to the rapidly emerging field of language dynamics. We also hope that synchronic linguistics will finally get its due place under the sun in the post-globalised world, and will become a major driving force in linguistics in the Twenty-First Century.
Euphemism in the Qur'an: A Corpus-based Linguistic Approach (CSCJournals)
Euphemism is an important metaphoric resource in language, which has a relatively high functional load in religious texts, such as the Qur'an. This study creates an electronic HTML database of euphemisms in the Qur'an through adopting a more systematic corpus-based approach. The database of Qur'anic euphemisms is released into the public domain and is free for research and educational use (http://corpus.leeds.ac.uk/euphemismolimat/). The mechanism of annotating Qur'anic euphemisms relies on certain procedures including developing a set of linguistic guidelines, analysis of the content of the Qur'an using two renowned exegeses of the Qur'an and a comprehensive dictionary, evaluating scholarly efforts on the phenomenon of euphemism in the Qur'an, and consulting academics and religious scholars. The study proposes a broad classification of euphemistic topics on the basis of the data in the Qur'an and former categorisations produced by others. It suggests an effective strategy to check and verify inter-annotator agreement in the annotation of Qur'anic euphemisms. It presents statistical analysis and visualisation of the euphemistic data in the corpus. It has been found that the thirty parts of the Qur'an vary in the number and distribution of euphemisms across verses. Although the Meccan surahs comprise about three quarters of the Qur'an, they have only 518 euphemisms in 440 verses. By contrast, the Medinan surahs, which make up the remainder of the Qur'an, have 400 euphemisms in 263 verses. Sex and death are the most common euphemistic topics in the Qur'an, while feelings, divorce and pregnancy are the least frequent euphemistic topics. The study recommends that the designed corpus of Qur'anic euphemisms should be used to update existing web pages on the Qur'an with extended linguistic information about euphemism encoded with HTML/XML annotation.
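The HTML/XML encoding of euphemisms that the study recommends can be illustrated with a minimal span-annotation sketch; the tag name, attribute and example strings below are hypothetical, not the markup of the released database.

```python
# Hypothetical sketch of inline euphemism annotation: wrap each annotated
# phrase in a tag carrying its euphemistic topic, producing HTML/XML-style
# markup of the kind described above.

def annotate_euphemisms(text, spans):
    """spans: list of (phrase, topic) pairs; wraps each occurrence of the
    phrase in <euph topic="..."> ... </euph>."""
    for phrase, topic in spans:
        text = text.replace(phrase, f'<euph topic="{topic}">{phrase}</euph>')
    return text
```

A production annotator would work on token offsets rather than raw string replacement, so that repeated or overlapping phrases are handled and inter-annotator agreement can be computed span by span.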
Arabic language is one of the most commonly used language in the world. It plays a very important role in educational operations around the world. It contains various parts such as poetry, poem, novel, and stories, as well as linguistic and grammatical rules and movements of letters, which change the word according to the movements accompanying each letter, for example, there are movements of lifting and breaking and annexation and silence. In this paper, we will review the research papers that studied the Arabic language learning websites, using the content analysis to determine strengths, weaknesses, advantages, disadvantages, and limitations. This research paper concluded that there is still a shortage and scarcity in the number of articles and websites on the internet that teach Arabic language. The suggestion is to assign task of development to Arab world instituties and others by increasing their number of websites on the internet and enriching their scientific content to improve it and increase its spread between the learners.
Call for proposals, Translations of scholarly literature into Persian, Center...
The grant opportunity is designed for Iranian nationals with current permanent residency in Iran.
The selection committee is chaired by Kevan Harris and Cyrus Schayegh (both of Princeton University).
Hidden Markov Model Based Part of Speech Tagger for Sinhala Language
In this paper we present fundamental lexical semantics of the Sinhala language and a Hidden Markov Model (HMM) based Part of Speech (POS) tagger for Sinhala. In any natural language processing task, part of speech is a vital topic, involving analysis of the construction, behaviour and dynamics of the language, knowledge that can be utilized in computational linguistics analysis and automation applications. Although Sinhala is a morphologically rich and agglutinative language, in which words are inflected with various grammatical features, tagging is essential for further analysis of the language. Our research is based on a statistical approach, in which the tagging process is done by computing the tag sequence probability and the word-likelihood probability from the given corpus, with the linguistic knowledge automatically extracted from the annotated corpus. The current tagger reaches more than 90% accuracy for known words.
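The tag-sequence and word-likelihood probabilities described above are typically combined by Viterbi decoding. A minimal sketch follows; the two-tag tagset and all probabilities are invented toy values (romanized Sinhala words used only for flavour), whereas a real tagger estimates them from the annotated corpus.

```python
# Minimal Viterbi decoder for an HMM POS tagger. All probabilities
# below are invented toy values; a real tagger estimates transition
# and emission probabilities from an annotated corpus.
tags = ["NOUN", "VERB"]
start_p = {"NOUN": 0.6, "VERB": 0.4}
trans_p = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},
           "VERB": {"NOUN": 0.8, "VERB": 0.2}}
emit_p = {"NOUN": {"mama": 0.5, "gedara": 0.4, "yanawa": 0.1},
          "VERB": {"mama": 0.1, "gedara": 0.1, "yanawa": 0.8}}

def viterbi(words):
    # V[i][t]: probability of the best tag path for words[:i+1] ending in t
    V = [{t: start_p[t] * emit_p[t].get(words[0], 1e-6) for t in tags}]
    back = [{}]
    for w in words[1:]:
        col, ptr = {}, {}
        for t in tags:
            prev, p = max(((s, V[-1][s] * trans_p[s][t]) for s in tags),
                          key=lambda x: x[1])
            col[t] = p * emit_p[t].get(w, 1e-6)  # unseen words get a floor
            ptr[t] = prev
        V.append(col)
        back.append(ptr)
    best = max(tags, key=lambda t: V[-1][t])
    path = [best]
    for ptr in reversed(back[1:]):  # follow backpointers to recover the path
        path.append(ptr[path[-1]])
    return path[::-1]

print(viterbi(["mama", "gedara", "yanawa"]))  # ['NOUN', 'NOUN', 'VERB']
```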
A Descriptive Study of Standard Dialect and Western Dialect of Odia Language ...
Language is a unique blessing to human beings, who have been bestowed with the faculty of language from a very primitive age. Language makes human beings social, and in a society human beings communicate with the help of language. Odia is one among the constitutionally approved languages of India. Odisha is situated in the eastern part of India; presently, this state has thirty districts. Odisha is bounded to the north by the state of Jharkhand, to the northeast by West Bengal, to the east by the Bay of Bengal, to the south by Andhra Pradesh, and to the west by Chhattisgarh. The languages used by the neighboring states have a lot of influence on the Odia language. In the present study a modest attempt has been made to highlight the differences between the Standard Odia and Western Odia dialects. Various linguistic items used by Western Odia dialect users show marked differences compared to Standard Odia. The study delves into the phonological, morphological, semantic and syntactic features of both Standard Odia and Western Odia. Secondly, for ease of understanding, some discussion has been made of the existing literature on language and its varieties in general. As the spoken form is the primary form of any language, data have been collected from the informants' conversation for analysis. Debiprasad Pany, "A Descriptive Study of Standard Dialect and Western Dialect of Odia Language in Terms of Linguistic Items", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4, Issue-1, December 2019. URL: https://www.ijtsrd.com/papers/ijtsrd29632.pdf Paper URL: https://www.ijtsrd.com/humanities-and-the-arts/odia/29632/a-descriptive-study-of-standard-dialect-and-western-dialect-of-odia-language-in-terms-of-linguistic-items/debiprasad-pany
People across the globe have access to materials such as journals, articles and adverts via the internet. However, many of these resources come in a diverse range of languages. Although English seems most suitable to most people, some readers believe that working on materials in one's native language is more enjoyable than in other languages. Research has shown that the Arabic language has not been prominent in terms of online materials, and the few existing resources are often ignored due to the peculiar nature of its various characters and constructs. Hence, a proper study of its relationship with the English language, with a view to bringing people closer to its understanding, becomes necessary. The system scenarios were modeled and implemented using the Unified Modeling Language and Microsoft C# respectively, in such a way that the expected set of characters of the language of interest was automatically formed with respect to a given input. The procedural steps were properly followed in the development and running of the code using a context-free rule-based technique, with the required hardware available as clearly described in the design. The system's workability was tested with different source texts as inputs, and in each case the resulting outputs were very effective with respect to the translation process. The design is expected to serve as a tool for assisting beginners in these two languages, and it showcases a one-to-one form of correspondence; more rules and functions for ensuring a more robust system are expected in future works.
Dictionary Entries for Bangla Consonant Ended Roots in Universal Networking L...
The Universal Networking Language (UNL) deals with communication across nations of different languages and involves many different related disciplines such as linguistics, epistemology and computer science. It helps to overcome the language barrier among people of different nations, solving problems emerging from current globalization trends and geopolitical interdependence. We are working to include the Bangla language in the UNL system so that Bangla can be converted to UNL expressions. As part of this process we are currently working on Bangla consonant-ended verb roots and developing lexical (dictionary) entries for them. In this paper, we present our work by describing the Bangla verb, verb roots and verbal inflections, and then show the dictionary entries for consonant-ended roots.
Book of Abstracts - The Sixth International Conference on Current Issues of Languages, Linguistics, Translation and Literature
9-10 October 2021, Ahwaz
For more information, please visit our website: WWW.LLLD.IR
Please do not hesitate to contact us with any enquiries.
The Conference Organizing Committee,
Ahwaz / P.O. Box: 61335-4619
Tel: +98 61 32931199
Fax: +98 61 32931198
Mobile and WhatsApp contact number: +98 916 508 8772
WWW.LLLD.IR, Email: info@pahi.ir
Role of Speech Therapy in Overcoming Lexical Deficit in Adult Broca’s Aphasia
Tanzeela Abid & Dr. Habibullah Pathan,
English Language Development Centre, Faculty of Science, Technology and Humanities, Mehran University of Engineering and Technology, Pakistan
This is an exploratory, qualitative study. The unit of exploration is adult Broca's aphasic patients. The paper aims to explore the function and integrity of speech therapy for adult Broca's aphasia. Aphasia is the after-effect of brain damage, commonly in the left hemisphere, which disrupts the language faculty. The present study focuses on the lexical aspect of language, in which an individual faces trouble in processing words. In Broca's aphasia the affected individual suffers from a diminished capability of speaking and communication, and speech therapy is utilized to recover such diminished capabilities. This study investigates the effectiveness of speech therapy: how does it help adults with Broca's aphasia recover their speaking and conversing skills? The participants of the study are speech therapists, selected through purposeful sampling, particularly snowball sampling. Semi-structured interviews were conducted with five speech therapists and analyzed through thematic analysis in the light of the 'Sketch Model' of De Ruiter and De Beer (2013). The findings suggest that speech therapy may prove helpful for Broca's aphasia patients in recovering their communicative capabilities, but it requires considerable time (a minimum of 6 months). Moreover, recovery depends upon certain factors such as age, level of disorder and willingness.
Keywords: Broca’s Aphasia, Lexical Deficit, Speech Therapy, Communication, Speaking Skills
Development of Bi-Directional English To Yoruba Translator for Real-Time Mobi...
Machine translation (MT) is a subfield of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another. Translating between the English and Yoruba languages comes with computational complexities such as the syntactic and grammatical differences in the language pair. This paper explores a multi-layer hybridized language translation approach, which combines the corpus-based and rule-based approaches of machine translation to generate its outputs. A parallel corpus was built with texts from the English and Yoruba languages and stored in a My Structured Query Language (MySQL) database. One hundred and forty-seven computational rules were manually formulated and also stored in the MySQL database for generating sentences in both languages. Two bilingual dictionaries were developed: one stored English words with their corresponding Yoruba counterparts and their parts of speech, while the other stored Yoruba words with their corresponding English counterparts and their parts of speech. A real-time mobile chatting interface was developed for users' interactions with each other and the system. The research model was implemented using PHP for server-side scripting, JSON for data interchange and the Java programming language for user interfaces accessible on users' mobile phones. The Java code was written in the Android Studio 3.0 Integrated Development Environment. Two hundred and eleven sentences from Contemporary English Grammar were used for system testing, and the results show 95% accuracy compared with Google Translate.
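The paired bilingual dictionaries described above can be sketched as a word-for-word lookup. The three entries below are hypothetical and omit Yoruba tone diacritics; the actual system also applies its 147 computational rules on top of dictionary lookup to handle syntactic differences.

```python
# Toy word-for-word lookup illustrating paired bilingual dictionaries.
# Entries are hypothetical and omit Yoruba tone diacritics; the real
# system layers rule-based sentence generation on top of lookup.
en_to_yo = {"i": ("mo", "PRON"), "eat": ("je", "VERB"), "rice": ("iresi", "NOUN")}
yo_to_en = {yo: (en, pos) for en, (yo, pos) in en_to_yo.items()}

def translate(sentence, table):
    out = []
    for word in sentence.lower().split():
        entry = table.get(word)
        out.append(entry[0] if entry else f"<{word}>")  # flag unknown words
    return " ".join(out)

print(translate("I eat rice", en_to_yo))   # mo je iresi
print(translate("mo je iresi", yo_to_en))  # i eat rice
```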
International Journal on Natural Language Computing (IJNLC) Vol. 4, No. 2, Apri...
Building dialogue systems for interaction has recently gained considerable attention, but most of the resources and systems built so far are tailored to English and other Indo-European languages. The need for designing systems for other languages, such as Arabic, is increasing. For this reason, there is growing interest in the Arabic dialogue act classification task, because it is a key player in Arabic language understanding for building such systems. This paper surveys different techniques for dialogue act classification for Arabic. We describe the main existing techniques for utterance segmentation and classification, annotation schemas, and test corpora for Arabic dialogue understanding that have been introduced in the literature.
Design Analysis Rules to Identify Proper Noun from Bengali Sentence for Univ...
Abstract—Nowadays hundreds of millions of people, of almost all levels of education and attitudes and from different countries, communicate with each other for different purposes and perform their jobs on the internet or other communication media using various languages. Not all people know all languages; therefore it is very difficult to communicate or work across various languages. In this situation computer scientists have introduced various interlanguage translation programs (machine translation). UNL is one such interlanguage translation program. One of the major problems in UNL is identifying a name in a sentence, which is relatively simple in the English language because such entities start with a capital letter. In Bangla we do not have the concept of small or capital letters, so we find it difficult to determine whether a word is a proper noun or not. Here we propose analysis rules to identify proper nouns in a sentence and establish a post-converter which translates the name entity from Bangla to UNL. The goal is to make possible Bangla sentence conversion to UNL and vice versa. Theoretical analysis shows that our proposed system is able to identify proper nouns in Bangla sentences and produce the relative Universal Word for UNL.
Code-Switching in Urdu Books of Punjab Text Book Board, Lahore, Pakistan
The study highlights English code-switching in Punjab Urdu textbooks. The research aims at finding and categorizing Urdu-English code-switches. Another rationale behind the study is to present Urdu equivalents of the switches from an Urdu-English dictionary; for instance, adakar for actor and sayyah for tourist. Textbooks of 5th, 6th, 9th and 10th class are selected for data collection and analysis. A number of instances are observed at morpheme, word, phrase and clause levels. Data is analyzed qualitatively. The data analysis shows switches at all the mentioned levels. The researchers propose a revision of the existing textbooks in the light of the given equivalents and a careful scrutiny of the compilation of future textbooks to preserve the purity of Urdu language.
The Ekegusii Determiner Phrase Analysis in the Minimalist Program
Among some of the recent syntactic developments, the noun phrase has been reanalyzed as a determiner phrase (DP). This study analyses the Ekegusii determiner phrase (DP) with an inquiry into the relationship between agreement of the INFL (sentence) and concord in the noun phrase (determiner phrase). It hypothesizes that Ekegusii sentential agreement has a symmetrical relationship with Ekegusii determiner phrase internal concord, and that feature checking theory and full interpretation (FI) in the Minimalist Program are adequate for the analysis of the internal structure of the Ekegusii DP. In employing the Minimalist Program (MP), the study first seeks to establish the domain of the NP in the Ekegusii DP and then investigates the adequacy of the Minimalist Program in analyzing the Ekegusii DP. The study is also geared towards establishing the order of determiners in the DP between the D-head and the NP complement. The study concludes that the principles of feature checking and full interpretation in the Minimalist Program are mutually crucial in ensuring that Ekegusii constructions (the DP and even the sentence) are grammatical (converge). This emphasizes that the MP is adequate for Ekegusii DP analysis.
A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...
More than hundreds of millions of people, of almost all levels of education and attitudes and from different countries, communicate with each other using various languages. Machine translation is in high demand due to the increasing use of web-based communication. One of the major problems in Bengali translation is identifying a naming word in a sentence, which is relatively simple in the English language because such entities start with a capital letter. In Bangla we do not have the concept of small or capital letters, and there is a huge number of different naming entities in Bangla, so we find it difficult to determine whether a word is a naming word (proper noun) or not. Here we introduce a new approach to identify naming words in a Bengali sentence for UNL without storing a huge number of naming entities in the word dictionary. The goal is to make possible Bangla sentence conversion to UNL and vice versa with minimal words stored in the dictionary.
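The general idea can be illustrated with a sketch: with no capitalization cue in Bangla, a word absent from a small dictionary of known common words can be flagged as a candidate naming word, avoiding storage of huge named-entity lists. The romanized sentence and the tiny dictionary are invented; the paper's actual approach uses analysis rules, not this plain lookup.

```python
# Illustrative sketch only: flag words absent from a common-word
# dictionary as candidate naming words (proper nouns). The romanized
# Bangla sentence and dictionary entries are invented for this example.
common_words = {"ami", "boi", "porchi"}  # "I", "book", "am-reading"

def candidate_names(sentence):
    return [w for w in sentence.lower().split() if w not in common_words]

print(candidate_names("ami rahim boi porchi"))  # ['rahim']
```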
This paper purports to be a starting point to revisit existing approaches dealing with the origin and spread of languages in the light of the changed circumstances of the Twenty-first century without in any way undermining their applicability across space and time. The origin of spoken languages is intricately and inseparably interwoven and intertwined with the origin of human species as well, and in this paper, we propose a ‘Wholly-independent Multi-Regional hypothesis of the origin of Homo sapiens’ in response to both the highly-controversial and arguably antiquated ‘Out-of-Africa theory’ which we have stridently and vehemently opposed, along with all its protuberances and the contending Multi-Regional Hypothesis as well. The key tenets of this paper are therefore articulated based on this fundamental premise which is likely to upend existing presumptions and paradigms to a significant degree. Having said that, we must hasten to add that the evolutionary biology of language encompassing physical anthropology or genetics and other related areas of study, are wholly outside the purview of this paper. Structural linguistics and semantics are also outside the scope of this paper. In this paper, we examine the origins of spoken and written languages in pre-historic, proto-historic, historic, pre-globalized and post-globalized contexts and propose an ‘Epochal Polygenesis’ approach. As a part of this paper, we also provide a broad overview of early and current theories of the origin and spread of languages so that readers can compare our approaches with already existing ones and analyse the similarities and differences between the two. 
We propose and define several new concepts under the categories of contact-based scenarios and non-contact based scenarios such as the autochthonous origin of languages, the spread of properties of languages from key nodes, the ‘Theory of linguistic osmosis’ and the need to take historical and political factors into account while analysing the spread of languages. In this paper, we also propose among others, the ‘Theory of win-win paradigms’ and the ‘Net benefits approach’. We also emphasize the need to carry out a diachronic and synchronic assessment of the dynamics of languages spread and propose that this be made a continuous process so that the lessons learnt can be used to tweak and hone theories and models to perfection. This paper is likely to significantly up the ante in favour of a dynamics-driven approach by undermining the relative torpor now observed in this arguably vital sub-discipline and contribute greatly to the rapidly emerging field of language dynamics. We also hope that synchronic linguistics will finally get its due place under the sun in the post-globalised world, and will become a major driving force in linguistics in the Twenty-First Century.
Euphemism in the Qur'an: A Corpus-based Linguistic Approach
Euphemism is an important metaphoric resource in language, which has a relatively high functional load in religious texts, such as the Qur'an. This study creates an electronic HTML database of euphemisms in the Qur'an through adopting a more systematic corpus-based approach. The database of Qur'anic euphemisms is released into the public domain and is free for research and educational use (http://corpus.leeds.ac.uk/euphemismolimat/). The mechanism of annotating Qur'anic euphemisms relies on certain procedures including developing a set of linguistic guidelines, analysis of the content of the Qur'an using two renowned exegeses of the Qur'an and a comprehensive dictionary, evaluating scholarly efforts on the phenomenon of euphemism in the Qur'an, and consulting academics and religious scholars. The study proposes a broad classification of euphemistic topics on the basis of the data in the Qur'an and former categorisations produced by others. It suggests an effective strategy to check and verify inter-annotator agreement in the annotation of Qur'anic euphemisms. It presents statistical analysis and visualisation of the euphemistic data in the corpus. It has been found that the thirty parts of the Qur'an vary in the number and distribution of euphemisms across verses. Although the Meccan surahs comprise about three quarters of the Qur'an, they have only 518 euphemisms in 440 verses. By contrast, the Medinan surahs, which make up the remainder of the Qur'an, have 400 euphemisms in 263 verses. Sex and death are the most common euphemistic topics in the Qur'an, while feelings, divorce and pregnancy are the least frequent euphemistic topics. The study recommends that the designed corpus of Qur'anic euphemisms should be used to update existing web pages on the Qur'an with extended linguistic information about euphemism encoded with HTML/XML annotation.
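The recommended HTML/XML annotation of euphemisms might look like the following sketch. The element and attribute names (verse, ref, euphemism, topic) and the sample sentence are hypothetical, not the corpus's actual schema or text.

```python
# Sketch of encoding a euphemism occurrence inline with XML annotation.
# Element/attribute names and the sample sentence are hypothetical.
import xml.etree.ElementTree as ET

verse = ET.Element("verse", {"ref": "surah:ayah"})
verse.text = "and in that year he "
eu = ET.SubElement(verse, "euphemism", {"topic": "death"})
eu.text = "passed away"
eu.tail = ", leaving his estate to his heirs"
print(ET.tostring(verse, encoding="unicode"))
```

This produces a single annotated span that existing web pages could embed without altering the surrounding text.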
The Arabic language is one of the most commonly used languages in the world. It plays a very important role in educational operations around the world. It contains various parts such as poetry, poems, novels and stories, as well as linguistic and grammatical rules and the movements of letters, which change a word according to the movement accompanying each letter; for example, there are the movements of lifting, breaking, annexation and silence. In this paper, we review the research papers that studied Arabic language learning websites, using content analysis to determine strengths, weaknesses, advantages, disadvantages and limitations. The paper concludes that there is still a shortage and scarcity of articles and websites on the internet that teach the Arabic language. The suggestion is to assign the task of development to Arab-world institutes and others, increasing their number of websites on the internet and enriching their scientific content to improve it and increase its spread among learners.
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
Many automatic translation works have addressed major European language pairs by taking advantage of large-scale parallel corpora, but very few research works have been conducted on the Amharic-Arabic language pair due to its parallel data scarcity, and there is no benchmark parallel Amharic-Arabic text corpus available for the machine translation task. Therefore, a small parallel Quranic text corpus is constructed by modifying the existing monolingual Arabic text and its equivalent Amharic translation available on Tanzil. Experiments are carried out on two Neural Machine Translation (NMT) models, based on Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), using an attention-based encoder-decoder architecture adapted from the open-source OpenNMT system. The LSTM-based and GRU-based NMT models and the Google Translate system are compared, and the LSTM-based OpenNMT model is found to outperform the GRU-based OpenNMT model and Google Translate, with BLEU scores of 12%, 11% and 6% respectively.
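The BLEU scores reported above rest on modified n-gram precision. A simplified pure-Python sketch (single reference, unsmoothed, n-grams up to 2 instead of the standard 4):

```python
# Simplified BLEU: modified n-gram precision with a brevity penalty,
# for a single reference and n up to 2 (real BLEU uses n up to 4,
# multiple references, and smoothing).
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=2):
    cand_toks, ref_toks = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(cand_toks, n), ngrams(ref_toks, n)
        overlap = sum(min(count, ref[g]) for g, count in cand.items())
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:  # unsmoothed: any empty n-gram overlap zeroes BLEU
        return 0.0
    # brevity penalty discourages candidates shorter than the reference
    bp = 1.0 if len(cand_toks) > len(ref_toks) else math.exp(
        1 - len(ref_toks) / max(len(cand_toks), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
```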
The impact of artificial intelligence (AI) in Islam encompasses ethical, educational, financial, and legal considerations. AI can enhance education, improve healthcare, and streamline Islamic finance. Ethical questions about AI's alignment with Islamic values arise, as does the need for scholars to assess AI-generated legal decisions and contracts. The job market, social justice, and economic inequality may be affected. Accessibility and inclusivity align with Islamic principles, and AI-generated art may challenge notions of creativity. Theological discussions about consciousness and free will may also emerge. The extent of AI's impact will depend on its development, regulation, and integration in Islamic societies.
https://islamicmualim.com/
USING AUTOMATED LEXICAL RESOURCES IN ARABIC SENTENCE SUBJECTIVITY
A common point in almost any work on sentiment analysis is the need to identify which elements of language (words) contribute to expressing subjectivity in text. A collection of these elements (sentiment words), regardless of context, together with their polarities (positive/negative), is called a sentiment lexical resource or subjective lexicon. In this paper, we investigate a method for generating a Sentiment Arabic Lexical Semantic Database using a lexicon-based approach. We also study the prior-polarity effects of each word in our Sentiment Arabic Lexical Semantic Database on sentence-level subjectivity using multiple machine learning algorithms. The experiments were conducted on an MPQA corpus containing subjective and objective Arabic sentences, and we were able to achieve 76.1% classification accuracy.
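A minimal lexicon-based subjectivity classifier in the spirit of the prior-polarity features above can be sketched as follows. The four English entries and their weights are invented for illustration; the paper's database covers Arabic sentiment words.

```python
# Minimal lexicon-based subjectivity scorer. The lexicon entries and
# weights are invented for illustration only.
lexicon = {"excellent": 1.0, "good": 0.5, "bad": -0.5, "terrible": -1.0}

def classify(sentence):
    hits = [lexicon[w] for w in sentence.lower().split() if w in lexicon]
    if not hits:                 # no sentiment words: call it objective
        return ("objective", 0.0)
    return ("subjective", sum(hits) / len(hits))  # mean prior polarity

print(classify("the service was excellent but the food was bad"))
```

A sentence with no lexicon hits is labeled objective, mirroring the lexicon-based cue the paper feeds into its machine learning classifiers.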
ARABIC LANGUAGE CHALLENGES IN TEXT BASED CONVERSATIONAL AGENTS COMPARED TO TH...
This paper does not compare the Arabic and English languages as natural languages. Instead, it focuses on comparing them in terms of the challenges they pose for building text-based Conversational Agents (CAs). A CA is an intelligent computer program used to handle conversations between the user and the machine. Nowadays, CAs can play an important role in many areas, as this work shows. In this paper, the different approaches that can be used to build a CA are differentiated, and for each approach the points of comparison between the Arabic and English languages are discussed with respect to the Arabic language.
FURTHER INVESTIGATIONS ON DEVELOPING AN ARABIC SENTIMENT LEXICON
A wealth of lexical resources is available to accelerate and simplify sentiment analysis in English; in Arabic, there are few resources and they are not comprehensive. Most current research efforts for constructing an Arabic Sentiment Lexicon (ASL) depend on a large number of lexical entities. However, coverage of Arabic sentiment expressions can instead be achieved using refined regular expressions rather than a large number of lexical entities. This paper presents an ASL that is more comprehensive than the existing lexicons, covering many expressions in different dialects including Franco-Arabic, while at the same time being more compact. The paper also shows how to integrate different lexicons and refine them. To enrich lexical entries with robust morphological and syntactic information, regular expressions, the weight of sentiment polarity and n-gram terms have been augmented to each lexical entry.
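The regular-expression idea can be sketched briefly: one refined pattern covers many surface variants of a sentiment word instead of enumerating each one as a separate lexical entity. The Franco-Arabic example word and its polarity weight are invented for illustration.

```python
# One pattern matches many elongated variants of a Franco-Arabic
# sentiment word; the word and its weight are invented examples.
import re

# "gamed" (Franco-Arabic, roughly "great") with arbitrary letter elongation
pattern = re.compile(r"\bga+m+e+d+\b", re.IGNORECASE)
weight = 0.8  # hypothetical positive prior polarity

for text in ["el match kan gamed", "da gaaameeed awy", "nothing here"]:
    print(text, "->", weight if pattern.search(text) else 0.0)
```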
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
In recent decades speech-interactive systems have gained increasing importance. The performance of an ASR system mainly depends on the availability of a large corpus of speech. The conventional method of building a large-vocabulary speech recognizer for any language uses a top-down approach to speech. This approach requires a large speech corpus with sentence- or phoneme-level transcription of the speech utterances. The transcriptions must also cover varied speech so that the recognizer can build models for all the sounds present. But for the Telugu language, because of its complex nature, a very large, well-annotated speech database is very difficult to build. It is very difficult, if not impossible, to cover all the words of any Indian language, where each word may have thousands or millions of word forms. A significant part of the grammar that is handled by syntax in English (and other similar languages) is handled within morphology in Telugu; phrases comprising several words (tokens) in English may be mapped onto a single word in Telugu. The Telugu language is phonetic in nature in addition to being morphologically rich, which is why speech technology developed for English cannot be applied directly to Telugu. This paper highlights the work carried out in an attempt to build a voice-enabled text editor with automatic term suggestion. The main claim of the paper is the recognition enhancement process we developed to suit highly inflecting, morphologically rich languages. This method results in increased speech recognition accuracy with a substantial reduction in corpus size. It also adds Telugu words to the database dynamically, resulting in growth of the corpus.
Interpretation of Sadhu into Cholit Bhasha by Cataloguing and Translation Systemijtsrd
Sadhu and Cholit bhasha are two significant Bangladeshi languages. Sadhu was functional in ancient era and had Sanskrit components but in present era cholit took its place. There are many formal and legal paper works present in Sadhu language which direly need to be translated in Cholit because its more favorable and speaker friendly. Therefore, this paper dealt with this issue by familiarizing the current era with Sadhu by creating a software. Different sentences were chosen and final data set was obtained by Principal Component Analysis PCA . MATLAB and Python are used for different machine learning algorithms. Most work is being done using Scikit Learn and MATLAB machine learning toolbox. It was found that Linear Discriminant Analysis LDA functions best. Speed prediction was also done and values were determined through graphs. It was inferred that this categorizer efficiently translated all Sadhu words to Cholit precisely and in well structured way. Therefore, Sadhu will not remain a complex language in this decade. Nakib Aman Turzo | Pritom Sarker | Biplob Kumar "Interpretation of Sadhu into Cholit Bhasha by Cataloguing and Translation System" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4 | Issue-3 , April 2020, URL: https://www.ijtsrd.com/papers/ijtsrd30792.pdf Paper Url :https://www.ijtsrd.com/engineering/computer-engineering/30792/interpretation-of-sadhu-into-cholit-bhasha-by-cataloguing-and-translation-system/nakib-aman-turzo
DICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVALcsandit
The demand for multilingual information is becoming erceptive as the users of the internet throughout the world are escalating and it creates a problem of retrieving documents in one language by specifying query in another language. This increasing demand can be addressed by designing automatic tools, which accepts the query in one language and retrieves the relevant documents in other languages. We have developed prototype Amharic-Arabic Cross Language
Information Retrieval System by applying dictionary-based approach that enables the users to retrieve relevant documents from Amharic-Arabic corpus by entering the query in Amharic and retrieving the relevant documents both Amharic and Arabic.
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...ssuser7dcef0
Power plants release a large amount of water vapor into the
atmosphere through the stack. The flue gas can be a potential
source for obtaining much needed cooling water for a power
plant. If a power plant could recover and reuse a portion of this
moisture, it could reduce its total cooling water intake
requirement. One of the most practical way to recover water
from flue gas is to use a condensing heat exchanger. The power
plant could also recover latent heat due to condensation as well
as sensible heat due to lowering the flue gas exit temperature.
Additionally, harmful acids released from the stack can be
reduced in a condensing heat exchanger by acid condensation. reduced in a condensing heat exchanger by acid condensation.
Condensation of vapors in flue gas is a complicated
phenomenon since heat and mass transfer of water vapor and
various acids simultaneously occur in the presence of noncondensable
gases such as nitrogen and oxygen. Design of a
condenser depends on the knowledge and understanding of the
heat and mass transfer processes. A computer program for
numerical simulations of water (H2O) and sulfuric acid (H2SO4)
condensation in a flue gas condensing heat exchanger was
developed using MATLAB. Governing equations based on
mass and energy balances for the system were derived to
predict variables such as flue gas exit temperature, cooling
water outlet temperature, mole fraction and condensation rates
of water and sulfuric acid vapors. The equations were solved
using an iterative solution technique with calculations of heat
and mass transfer coefficients and physical properties.
Hierarchical Digital Twin of a Naval Power SystemKerry Sado
A hierarchical digital twin of a Naval DC power system has been developed and experimentally verified. Similar to other state-of-the-art digital twins, this technology creates a digital replica of the physical system executed in real-time or faster, which can modify hardware controls. However, its advantage stems from distributing computational efforts by utilizing a hierarchical structure composed of lower-level digital twin blocks and a higher-level system digital twin. Each digital twin block is associated with a physical subsystem of the hardware and communicates with a singular system digital twin, which creates a system-level response. By extracting information from each level of the hierarchy, power system controls of the hardware were reconfigured autonomously. This hierarchical digital twin development offers several advantages over other digital twins, particularly in the field of naval power systems. The hierarchical structure allows for greater computational efficiency and scalability while the ability to autonomously reconfigure hardware controls offers increased flexibility and responsiveness. The hierarchical decomposition and models utilized were well aligned with the physical twin, as indicated by the maximum deviations between the developed digital twin hierarchy and the hardware.
Final project report on grocery store management system..pdfKamal Acharya
In today’s fast-changing business environment, it’s extremely important to be able to respond to client needs in the most effective and timely manner. If your customers wish to see your business online and have instant access to your products or services.
Online Grocery Store is an e-commerce website, which retails various grocery products. This project allows viewing various products available enables registered users to purchase desired products instantly using Paytm, UPI payment processor (Instant Pay) and also can place order by using Cash on Delivery (Pay Later) option. This project provides an easy access to Administrators and Managers to view orders placed using Pay Later and Instant Pay options.
In order to develop an e-commerce website, a number of Technologies must be studied and understood. These include multi-tiered architecture, server and client-side scripting techniques, implementation technologies, programming language (such as PHP, HTML, CSS, JavaScript) and MySQL relational databases. This is a project with the objective to develop a basic website where a consumer is provided with a shopping cart website and also to know about the technologies used to develop such a website.
This document will discuss each of the underlying technologies to create and implement an e- commerce website.
We have compiled the most important slides from each speaker's presentation. This year’s compilation, available for free, captures the key insights and contributions shared during the DfMAy 2024 conference.
CW RADAR, FMCW RADAR, FMCW ALTIMETER, AND THEIR PARAMETERSveerababupersonal22
It consists of cw radar and fmcw radar ,range measurement,if amplifier and fmcw altimeterThe CW radar operates using continuous wave transmission, while the FMCW radar employs frequency-modulated continuous wave technology. Range measurement is a crucial aspect of radar systems, providing information about the distance to a target. The IF amplifier plays a key role in signal processing, amplifying intermediate frequency signals for further analysis. The FMCW altimeter utilizes frequency-modulated continuous wave technology to accurately measure altitude above a reference point.
HEAP SORT ILLUSTRATED WITH HEAPIFY, BUILD HEAP FOR DYNAMIC ARRAYS.
Heap sort is a comparison-based sorting technique based on Binary Heap data structure. It is similar to the selection sort where we first find the minimum element and place the minimum element at the beginning. Repeat the same process for the remaining elements.
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdffxintegritypublishin
Advancements in technology unveil a myriad of electrical and electronic breakthroughs geared towards efficiently harnessing limited resources to meet human energy demands. The optimization of hybrid solar PV panels and pumped hydro energy supply systems plays a pivotal role in utilizing natural resources effectively. This initiative not only benefits humanity but also fosters environmental sustainability. The study investigated the design optimization of these hybrid systems, focusing on understanding solar radiation patterns, identifying geographical influences on solar radiation, formulating a mathematical model for system optimization, and determining the optimal configuration of PV panels and pumped hydro storage. Through a comparative analysis approach and eight weeks of data collection, the study addressed key research questions related to solar radiation patterns and optimal system design. The findings highlighted regions with heightened solar radiation levels, showcasing substantial potential for power generation and emphasizing the system's efficiency. Optimizing system design significantly boosted power generation, promoted renewable energy utilization, and enhanced energy storage capacity. The study underscored the benefits of optimizing hybrid solar PV panels and pumped hydro energy supply systems for sustainable energy usage. Optimizing the design of solar PV panels and pumped hydro energy supply systems as examined across diverse climatic conditions in a developing country, not only enhances power generation but also improves the integration of renewable energy sources and boosts energy storage capacities, particularly beneficial for less economically prosperous regions. Additionally, the study provides valuable insights for advancing energy research in economically viable areas. 
Recommendations included conducting site-specific assessments, utilizing advanced modeling tools, implementing regular maintenance protocols, and enhancing communication among system components.
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...Amil Baba Dawood bangali
Contact with Dawood Bhai Just call on +92322-6382012 and we'll help you. We'll solve all your problems within 12 to 24 hours and with 101% guarantee and with astrology systematic. If you want to take any personal or professional advice then also you can call us on +92322-6382012 , ONLINE LOVE PROBLEM & Other all types of Daily Life Problem's.Then CALL or WHATSAPP us on +92322-6382012 and Get all these problems solutions here by Amil Baba DAWOOD BANGALI
#vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore#blackmagicformarriage #aamilbaba #kalajadu #kalailam #taweez #wazifaexpert #jadumantar #vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore #blackmagicforlove #blackmagicformarriage #aamilbaba #kalajadu #kalailam #taweez #wazifaexpert #jadumantar #vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore #Amilbabainuk #amilbabainspain #amilbabaindubai #Amilbabainnorway #amilbabainkrachi #amilbabainlahore #amilbabaingujranwalan #amilbabainislamabad
Forklift Classes Overview by Intella PartsIntella Parts
Discover the different forklift classes and their specific applications. Learn how to choose the right forklift for your needs to ensure safety, efficiency, and compliance in your operations.
For more technical information, visit our website https://intellaparts.com
Gen AI Study Jams _ For the GDSC Leads in India.pdf
Azhary: An Arabic Lexical Ontology
International Journal of Web & Semantic Technology (IJWesT) Vol.5, No.4, October 2014
DOI : 10.5121/ijwest.2014.5405
Hossam Ishkewy, Hany Harb and Hassan Farahat
Azhar University, Faculty of Engineering, Computers and Systems Engineering Department
Abstract
The Arabic language is the most widely spoken member of the Semitic language group and one of the most common languages in the world, spoken by more than 422 million people. It is also of paramount importance to Muslims: it is the sacred language of the Islamic Holy Book (the Quran), and prayer (and other acts of worship) in Islam is performed only by mastering some Arabic words. Arabic is also a major ritual language of a number of Christian churches in the Arab world, and it was used in writing several intellectual and religious Jewish books in the Middle Ages. Despite this, there is no semantic Arabic lexicon on which researchers can depend. In this paper we introduce Azhary, a lexical ontology for the Arabic language. It groups Arabic words into sets of synonyms called synsets, and records a number of relationships between words, such as synonym, antonym, hypernym, hyponym, meronym, holonym and association relations. The ontology contains 26,195 words organized in 13,328 synsets. It has been developed and contrasted against AWN, the most common available Arabic lexical ontology.
Keywords: Arabic Language, Ontology, WordNet, Semantic Web, Arabic Lexicon.
1. Introduction

1.1 The Arabic Language
Language is a fundamental value in the life of every nation: it carries ideas and concepts and establishes communication links among people. Language is the template in which ideas, images and words are formulated and in which feelings and emotions are expressed; its intellectual and emotional content are never separable [1].
The Arabic language is considered one of the most prolific languages in terms of lexical material. For example, the Lisan El-Arab (The Arabic Tongue) dictionary, developed in the 13th century, contains more than 80,000 items, while in English the dictionary of Samuel Johnson, considered one of the first English dictionaries and developed in the 18th century, contains 42,000 words.
The Arabic language contains 28 written characters, or 29 if the Hamza is considered a separate character. Arabic is written from right to left, like Persian and Hebrew and unlike many international languages, and from top to bottom. It should be noted that Arabic includes Standard Arabic (Fusha) and the Dialects. Standard Arabic is the language of the Quran, books, news bulletins and solemn occasions, and it is the same in all Arab countries. The Arabic Dialects are the language of daily communication and vary from one country to another; for example, the Egyptian Dialect is different from the Iraqi Dialect. The most important differences between Standard Arabic and the Dialects are that the first is both written and spoken while the second is only spoken, and that the norms and rules of Standard Arabic are fixed. This phenomenon raises a debate among intellectuals about the usage of Standard Arabic and Dialect: some support the use of Standard Arabic in all areas and never use Dialect, while others prefer the use of Dialect instead of Standard Arabic.

Semantics plays an important role in Arabic language processing; we cannot imagine accomplishing a deep Arabic text processor without sufficient information on the semantic relationships between words. The synonym, antonym, hypernym, hyponym, meronym, holonym and association relations in the Arabic language are considered important linguistic features by Arabic linguists. This research shows the importance of these relationships as a key to solving many linguistic analysis issues, such as developing the characteristics and semantic fields of words, automatic text analysis, text understanding, resolving linguistic ambiguity, and machine translation from Arabic to another language or vice versa.
1.2 The Semantic Web
The current web contains billions of documents and has many administrative problems and limitations. Its content is still readable only by humans. To solve these problems, Tim Berners-Lee introduced the Semantic Web as a conceptual model of the web in which content can be read and used both by humans and, intelligently, by machines [2]. He mentioned that "The Semantic Web (SW) is not a separate Web but an extension of the current one, in which information is given well defined meaning. Adding semantics to the Web involves allowing documents which have information in machine readable forms, and allowing links to be created with relationship values" [3].
In 2006, Tim Berners-Lee introduced the fourth version of the Semantic Web architecture, containing eight layers [4]. RDF and Ontology are considered the most important layers of this structure. The Resource Description Framework (RDF) is considered the key infrastructure for constructing the Semantic Web, because RDF enables semantic interoperability. It is a standardized basis for encoding, exchanging and reusing structured metadata. It is predicted that when RDF is used on a large scale on the web, content and the relationships between different resources will be described better, which will help search engines to easily find resources on the web and enable content to be rated [5]. Ontology is the backbone of the Semantic Web. Ontology became an interesting topic for researchers and an important key in the development of the Semantic Web, as it provides a sharable domain that facilitates data understanding between people and different applications. Ontology has many definitions, one of them given by Gruber as "an explicit specification of a conceptualization" [6].
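The RDF triple model described above can be sketched in a few lines; the resource names below are invented for illustration only and are not taken from any real vocabulary:

```python
# A graph in the RDF model is simply a set of
# (subject, predicate, object) triples.
graph = {
    ("ex:hand", "ex:partOf", "ex:arm"),
    ("ex:arm", "ex:partOf", "ex:body"),
    ("ex:hand", "ex:synonymOf", "ex:palm"),
}

def objects(graph, subject, predicate):
    """All objects reachable from `subject` via `predicate`."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)

print(objects(graph, "ex:hand", "ex:partOf"))  # ['ex:arm']
```

Because every statement has this uniform shape, graphs from different sources can be merged by set union, which is the interoperability property the text refers to.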
Since Tim Berners-Lee introduced the Semantic Web in 2001, much research on linguistic ontologies has appeared; however, linguistic ontologies for the Arabic language are still few.
1.3 The Lexical Ontologies

WordNet is a lexical database for the English language. It collects English words into sets of synonyms called synsets, gives short definitions and usage examples, and records a number of relations among these synonym sets, including hypernym, hyponym, coordinate term, meronym and holonym relations [7]. WordNet can be interpreted and used as a lexical ontology. It has been used for a number of different purposes in information systems, including word sense disambiguation, information retrieval, automatic text classification, automatic text summarization, machine translation and even automatic crossword puzzle generation [8]. The success of WordNet for the English language stimulated similar projects aiming to develop WordNets for other languages, so many WordNets have been built as important resources for a wide range of natural language processing applications [9]. The Global WordNet Organization, a non-profit public organization, was established to support and encourage the development of dictionaries and WordNets for other languages based on the English WordNet, providing connectivity between the WordNet dictionaries of different languages such as Arabic, Hebrew, Persian, African, Albanian, Indian, etc. [10].
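The synset grouping described above can be sketched minimally; the synset ids and members below are toy examples, not actual WordNet data:

```python
# Each synset groups words that share one meaning; a polysemous word
# may belong to several synsets (here "light" as noun and adjective).
synsets = {
    "light.n.1": {"light", "illumination"},
    "light.a.1": {"light", "lightweight"},
}

def synsets_of(word):
    """All synset ids containing `word`."""
    return sorted(sid for sid, members in synsets.items() if word in members)

print(synsets_of("light"))  # ['light.a.1', 'light.n.1']
```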
The Arabic WordNet (AWN) project presented an Arabic WordNet based on the design and contents of the universally accepted WordNet, mappable straightforwardly onto PWN 2.0 and EuroWordNet (EWN), enabling translation on the lexical level to English and dozens of other languages. AWN was developed and linked with the Suggested Upper Merged Ontology (SUMO), where concepts are defined with machine interpretable semantics in first order logic [11].
This paper is arranged as follows. Section 1 introduces the Arabic language, the Semantic Web and the Lexical Ontologies. Section 2 discusses some related works. Section 3 discusses our proposed Azhary system architecture and its implementation issues. Section 4 evaluates the system. The paper is concluded in Section 5.
2. Related works
Much research concerned with the construction of Arabic ontologies has appeared in the last few years. These ontologies belong to various domains and are constructed with different methods.
2.1. Islamic Domain Ontology

Al-Yahya and others [12] proposed a computational model for representing Arabic lexicons using ontologies. The ontology development was based on the UPON (Unified Process for ONtology) ontological engineering approach [13]. The ontology was limited to the time nouns which appear in the Holy Quran; it consisted of 18 classes and contained a total of 59 words. Hikmat Ullah Khan and others [14] proposed that the ontology concept can be applied to develop semantic search in the Holy Quran. They presented a domain ontology based on the living creatures, including animals and birds, mentioned in the Holy Quran. The authors in [15] explored the representation and classification of Holy Quran knowledge using ontology. The ontology model for Al-Quran was developed according to the Al-Quran knowledge themes described in Syammil Al-Quran Miracle the Reference. The Iman (faith) and Akhlaq (deeds) main classes were chosen as the research scope for constructing the ontology.
2.2. Linguistic Ontology
The authors in [16] described the Semantic Quran dataset, a multilingual RDF representation of translations of the Quran. The dataset was created using two different semi-structured sources (the Tanzil Project and the Quranic Arabic Corpus Project) and was aligned to an ontology designed to represent multilingual data from sources with a hierarchical structure. The resulting RDF data encompassed 43 different languages, among the most under-represented languages in Linked Data, including Arabic, Amharic and Amazigh. The authors presented the ontology devised for structuring the data; it included four basic classes: Chapter, Verse, Word and LexicalItem. Belkredim and El Sebai [17] developed an Arabic ontology using verbs and roots, classifying derived verbs in the Arabic language from their roots, based on the fact that 85% of words are derived from tri-literal roots. Using roots as a base for building an ontology is inaccurate, because derived words, though they share the same basic meaning, cannot always be grouped under the same categories. Also, the authors did not provide any implementation. The author in [18] aimed to develop an Arabic linguistic or upper ontology. The top levels of the Arabic ontology were built manually based on the DOLCE and SUMO upper level ontologies. Only 420 concepts of the Arabic ontology have been evaluated, and the remaining concepts are not finished. The authors in [19] presented the Al-Khalil project for building an Arabic infrastructure ontology. The core of this infrastructure is a linguistic ontology founded on Arabic traditional grammar. The development of Al-Khalil contained two steps: the first is bootstrapping the ontology manually by choosing linguistic concepts from Arabic linguistics and relating them to the concepts in GOLD; the second is using an automatic extraction algorithm to extract new concepts from linguistic texts to enrich the ontology.
After the great success of WordNet, many WordNets appeared in several languages, until Arabic WordNet (AWN) was developed following a methodology created for EuroWordNet. Arabic WordNet consists of 11,270 synsets and contains 23,496 Arabic expressions (words and multiwords) [20]. Many researchers have depended on AWN in their work, such as [21] and [22] in Q/A systems. Some researchers have tried to improve and extend AWN. The authors in [23] extended AWN using lexical and morphological rules and applying Bayesian inference. Alkhalifa and Rodríguez [20] used Wikipedia to automatically extend the named entities of Arabic WordNet, while the authors in [24] used the Yago ontology as a resource for the enrichment of named entities in Arabic WordNet.
2.3. Automatic Ontology Generation
[25] presented an approach to the automatic generation of ontology instances from a collection of unstructured known documents, such as Al-Quran. The presented approach was based on a combination of Natural Language Processing, Information Extraction and Text Mining techniques. [26] presented an approach to automatic ontology construction from a corpus of the "Arabic linguistics" domain; they reused information extraction techniques to extract new terms denoting elements of the ontology (concepts, relations). [27] illustrated ontology extraction based on a fuzzy-swarm algorithm for Islamic knowledge texts. The authors supposed that by combining lexicon, syntactic and statistical learning methods, the accuracy and computational efficiency of the ontology discovery process are improved. [28] proposed a system that automates the process of constructing a taxonomic agricultural domain ontology using semi-structured domain-specific web documents.
2.4. Miscellaneous Ontologies
[29] presented a multilingual tool providing online Arabic information retrieval based on a legal domain ontology, aiming to improve search recall and precision. They use the built ontology as a base for subsequent user-query expansion in the search system. In [30] the authors provided an Arabic ontology representing computer technology knowledge based on Arabic Internet blogs. The research conducted a pilot study on a number of randomly selected Arabic blogs in computer technology. The ontology comprised 110 classes, 78 class instances and 48 object properties. Others [31] also developed an ontology in the computer technology domain; this ontology was built based on classical Arabic rather than the modern language of blogs. The presented ontology is much simpler than the one discussed in [30].
3. The Azhary System Architecture
Azhary is a lexical ontology for the Arabic language; its primary use is in automatic text analysis and artificial intelligence applications. It groups Arabic words into sets of synonyms called synsets, and records a number of relations between words. The ontology contains 26,195 words, organized in 13,328 synsets. The Azhary system architecture, shown in Figure 1, is composed of different modules: the Words Extraction, the Relations Building, the Ontology Building, the Searching, and the Graphical User Interface.
3.1 The Words Extraction
Words extraction is the process of creating the seed words with which to start the ontology. This seed is built from the Holy Quran, which has 77,439 words. The primary function of this process is to cut the Quran into words and store them, with no repetition, in an Excel file which holds the words and the relations in tabular form. The Quran text is extracted from Tanzil [32] (an international Quranic project), which provides a highly verified, precise Quran text.
Figure 1: The Azhary System Architecture
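The deduplication step of this module can be sketched as follows. This is an illustrative sketch only: it assumes plain whitespace-separated text as input, and uses CSV in place of the Excel file that Azhary actually produces.

```python
import csv
import io

def unique_words(text):
    """Tokenize on whitespace and keep each distinct word once,
    in first-occurrence order."""
    seen = []
    for token in text.split():
        if token not in seen:
            seen.append(token)
    return seen

# Toy text standing in for the 77,439-word Quran corpus.
sample = "kindness hand arm hand kindness limb"
words = unique_words(sample)

# One word per row: the tabular seed for the relations-building step.
buf = io.StringIO()
csv.writer(buf).writerows([w] for w in words)
print(words)  # ['kindness', 'hand', 'arm', 'limb']
```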
3.2 The Relations Building
The relations building module is a fully manual process, achieved by using Arabic/Arabic dictionaries to find the words and build the relations between them. Common dictionaries, such as the meanings dictionary (قاموس المعاني) [33], the rich lexicon (معجم الغني) [34] and the mediator lexicon (المعجم الوسيط) [35], are used in the construction of the ontology. A word can be connected to another word by means of the following semantic relations:
synonym: B is a synonym of A, if A and B have the same meaning.
Ex: (مَعْرُوف) is a synonym of (جَمِيل). Both mean kindness.
hypernym: B is a hypernym of A, if A is a (kind of) B.
Ex: limb (عُضو) is a hypernym of hand (يَد).
hyponym: B is a hyponym of A, if B is a (kind of) A.
Ex: hand (يَد) is a hyponym of limb (عُضو).
meronym: B is a meronym of A, if B is a (part of) A.
Ex: hand (يَد) is a meronym of arm (ذِراع).
holonym: B is a holonym of A, if A is a (part of) B.
Ex: arm (ذِراع) is a holonym of hand (يَد).
antonym: B is an antonym of A, if A is an (inverse) of B.
Ex: kindness (مَعْرُوف) is an antonym of harm (أَذِيَّة).
association: A and B are associated if A always exists with B.
Ex: doctor (طبيب) and patient (مريض) are associated; husband (زوج) and wife (زوجة) are also associated.

Azhary includes the parts of speech (POS) of the Arabic language defined by [36]. Figure 2 represents these POS, illustrated with the free open source ontology editor Protégé [37]. Figure 3 translates the POS into English.
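Since the relations above come in inverse pairs (hypernym/hyponym and meronym/holonym), a manually built relation table can be checked mechanically for consistency. A minimal sketch, with invented English stand-ins for the Arabic words:

```python
# Each relation maps to a set of (A, B) pairs read as "B is a <rel> of A"
# in one direction; the toy data below is internally consistent.
relations = {
    "hypernym": {("hand", "limb")},   # limb is a hypernym of hand
    "hyponym":  {("limb", "hand")},   # hand is a hyponym of limb
    "meronym":  {("hand", "arm")},    # hand is a part of arm
    "holonym":  {("arm", "hand")},    # arm is the whole containing hand
}

INVERSES = {"hypernym": "hyponym", "meronym": "holonym"}

def missing_inverses(relations):
    """Return edges whose inverse edge is absent from the table."""
    gaps = []
    for rel, inv in INVERSES.items():
        for a, b in relations[rel]:
            if (b, a) not in relations[inv]:
                gaps.append((rel, a, b))
    return gaps

print(missing_inverses(relations))  # [] -- the toy data is consistent
```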
3.3 The Ontology Building

The ontology building module reads the words and their relations from the Excel file using JExcelApi [39]. JExcelApi is an open source Java API enabling developers to read, write, and modify Excel spreadsheets dynamically. This process converts the tabular form of the words and relations into an ontology. The Azhary ontology has been developed using Apache Jena [39], a free and open source Java framework for building Semantic Web applications. A simple example of the ontology building process is listed below to explain how the ontology has been built.
<rdf:Description rdf:about="http://www.azhary.org#يَذ ">
<rdf:type rdf:resource="http://www.azhary.org#اسن "/>
<a:anti rdf:resource="http://www.azhary.org# ضَرَ ر "/>
<a:anti rdf:resource="http://www.azhary.org# شَ ر "/>
Figure 2: The Arabic POS
Figure 3: Arabic POS translated into English
<a:means rdf:resource="http://www.azhary.org# ُ ىاطلُّس"/>
<a:has_a rdf:resource="http://www.azhary.org#تُ صٌُر "/>
<a:part_of rdf:resource="http://www.azhary.org#رِراع "/>
<a:part_of rdf:resource="http://www.azhary.org#جِسن "/>
<a:has_a rdf:resource="http://www.azhary.org#خُ صٌُر "/>
<a:means rdf:resource="http://www.azhary.org#إِحْساى "/>
<a:has_parent rdf:resource="http://www.azhary.org#عُضى "/>
<a:means rdf:resource="http://www.azhary.org#صَذَقَح "/>
<a:means rdf:resource="http://www.azhary.org# هِلْكُ "/>
<a:has_a rdf:resource="http://www.azhary.org#سَثَاتَح "/>
<a:anti rdf:resource="http://www.azhary.org# سُى ء "/>
<a:has_child rdf:resource="http://www.azhary.org#يَد_يُسْرَى"/>
<a:has_a rdf:resource="http://www.azhary.org#سُلامَيات"/>
<a:has_a rdf:resource="http://www.azhary.org#إِبْهام"/>
<a:means rdf:resource="http://www.azhary.org#قُوَّة"/>
<a:has_parent rdf:resource="http://www.azhary.org#أَداة"/>
<a:has_a rdf:resource="http://www.azhary.org#رُسْغ"/>
<a:anti rdf:resource="http://www.azhary.org#أَذِيَّة"/>
<a:means rdf:resource="http://www.azhary.org#مَعْرُوف"/>
<a:means rdf:resource="http://www.azhary.org#جَمِيل"/>
<a:has_a rdf:resource="http://www.azhary.org#كَفّ"/>
<a:has_a rdf:resource="http://www.azhary.org#وُسْطَى"/>
<a:has_child rdf:resource="http://www.azhary.org#يَد_يُمْنَى"/>
<a:has_parent rdf:resource="http://www.azhary.org#طَرَف"/>
<a:means rdf:resource="http://www.azhary.org#مَقْبَض"/>
<a:means rdf:resource="http://www.azhary.org#قُدْرَة"/>
<a:means rdf:resource="http://www.azhary.org#نِعْمَة"/>
</rdf:Description>
The previous ontology fragment describes the resource hand (يَد), whose part of speech is noun (اسم). The synonym relation is represented by the means property, the hypernym relation by the has_parent property, the hyponym relation by the has_child property, the meronym relation by the part_of property, the holonym relation by the has_a property, and the antonym relation by the anti property. Figure 4 introduces the Azhary GUI for this example. The GUI consists of sixteen components: two text fields, six text areas, and eight labels describing the purpose of the text areas and fields. Semantic experts can query the ontology directly, but normal users can only use the GUI.
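The relation-to-property mapping described above can be sketched in plain Java. This is a simplified, framework-free illustration with hypothetical names (the real module uses Apache Jena to emit the RDF model); it only shows how each lexical relation in the Excel sheet maps onto the ontology property used in the listing above.

```java
import java.util.Map;

// A framework-free sketch (hypothetical class and method names) of the
// mapping between Azhary's lexical relations and its RDF properties.
public class RelationMapping {
    // Lexical relation -> ontology property, as described in the text.
    static final Map<String, String> PROPERTY_OF = Map.of(
        "synonym",  "means",
        "hypernym", "has_parent",
        "hyponym",  "has_child",
        "meronym",  "part_of",
        "holonym",  "has_a",
        "antonym",  "anti"
    );

    // Render one (relation, target) cell of the table as an RDF/XML
    // property element, mirroring the ontology listing above.
    static String toRdf(String relation, String target) {
        String property = PROPERTY_OF.get(relation);
        return "<a:" + property
             + " rdf:resource=\"http://www.azhary.org#" + target + "\"/>";
    }

    public static void main(String[] args) {
        System.out.println(toRdf("antonym", "شر"));
        System.out.println(toRdf("synonym", "إحسان"));
    }
}
```

In the actual builder, each such pair would instead become a Jena statement added to the in-memory model before serialization.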
3.4 The Searching Process. The goal of the searching process is to receive the user query from the GUI, query the ontology to find the matching words and relations, and return the results to the GUI. We also used Apache Jena [39] for this querying.
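The lookup performed by the searching process can be sketched as follows. This is a minimal, framework-free illustration with hypothetical names; the real system queries the Jena model, but the logic is the same: given a word and a relation property, return every related word.

```java
import java.util.*;

// A framework-free sketch (hypothetical names) of the search step:
// (subject, property) -> objects, standing in for the RDF model.
public class AzharySearch {
    static final Map<String, Map<String, List<String>>> MODEL = new HashMap<>();

    // Record one triple (word, property, target) in the in-memory model.
    static void add(String word, String property, String target) {
        MODEL.computeIfAbsent(word, w -> new HashMap<>())
             .computeIfAbsent(property, p -> new ArrayList<>())
             .add(target);
    }

    // Return every target of `property` recorded for `word` (empty if none).
    static List<String> query(String word, String property) {
        return MODEL.getOrDefault(word, Map.of())
                    .getOrDefault(property, List.of());
    }

    public static void main(String[] args) {
        add("يد", "anti", "شر");
        add("يد", "means", "إحسان");
        add("يد", "means", "نعمة");
        System.out.println(query("يد", "means")); // prints [إحسان, نعمة]
    }
}
```

With Jena, the same query would be expressed by listing the statements of the word's resource for the given property and passing the results back to the GUI.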
3.5 The Graphical User Interface. The GUI is used only to receive the user query and display the results. Figure 4 shows the Azhary GUI, and Table 1 translates the GUI labels from Arabic to English.
Figure 4: The Azhary GUI
Table 1: The Azhary GUI labels translated from Arabic to English

الكلمة: word
نوع الكلمة: POS
المعنى: synonym
المضاد: antonym
الأصل: hypernym
الفرع: hyponym
تحتوي على: holonym
توجد في: meronym
4. Criticism and Comparison
AWN has the largest number of words and synsets among existing Arabic lexicons, but it suffers from several disadvantages:
1- It contains errors in meanings. For example, if you use AWN to find the meaning of the verb fail (فَشِل), you will find (أصاب بِعَرَج) in the results, which is not a correct Arabic sentence. The correct sentence is (أُصِيب بِعَرَج), which means (became lame) and does not mean fail.
2- It contains many repeated words. For example, if you use AWN to find the meaning of the verb speak (تَكَلَّم) or the noun traveling (سَفَر), you will find many repetitions in the results.
3- A word may hold contradictory properties with itself. For example, if you use AWN to find the meaning of the noun food (طَعَام), you will find (طَعَام بِخِلاف طَعَام) in the results, which means (food is different from food).
4- AWN has errors in the Arabic word-type tree. For example, the verb is considered a subclass of the noun.
5- AWN ignores particles that carry important meanings in the Arabic language. For example, the particle (لَيْت) expresses wishing (hopefulness).
6- Many sentences contain stuck-together words. For example, (منتجاتألبان) is two stuck words, while the correct form is (منتجات ألبان), which means milk products.
7- AWN has some problems with Arabic diacritics.
8- AWN is limited in terms of semantic meanings and the relations between words [22].
Table 2 shows a comparison between AWN and Azhary.

Criteria                 AWN          Azhary
Average Response Time    1.3 seconds  11 milliseconds
Number of Words          23,496       26,195
Number of Synsets        11,270       13,328
Synonym Relation         Yes          Yes
Hypernym Relation        Yes          Yes
Hyponym Relation         Yes          Yes
Meronym Relation         No           Yes
Holonym Relation         No           Yes
Antonym Relation         No           Yes
Association Relation     No           Yes
5. The Conclusions
Arabic semantic analysis has been held back by the absence of an Arabic lexical ontology that linguistic researchers can depend on. In this paper we have therefore introduced Azhary, a lexical ontology for the Arabic language whose primary use is in automatic text analysis and artificial intelligence applications. It groups Arabic words into sets of synonyms called synsets, and records relations between words, namely the synonym, antonym, hypernym, hyponym, meronym, holonym, and association relations. It has been developed and contrasted against AWN, the most common available Arabic lexical ontology. Our system shows better response time, contains more words, and records more word relationships.
6. References
[1] Farhan El-Salim, "اللغة العربية ومكانتها بين اللغات" (The Arabic Language and its Status among Languages). Available: http://www.saaid.net/Minute/33.htm [Accessed: Aug. 5, 2014].
[2] Haytham T. Al-Feel, Magdy Koutb and Hoda Suoror, "Semantic Web on Scope: A New Architectural Model for the Semantic Web",Journal of Computer Science 4(7): 613-624, 2008.
[3] Tim Berners-Lee, James Hendler and Ora Lassila,"The Semantic Web", Scientific American, May 17, 2001
[4] T. Berners-Lee, "Artificial Intelligence & the Semantic Web" World Wide Web Consortium, 2006. [Online]. Available: http://www.w3.org/2006/Talks/0718-aaai-tbl/Overview.html#(14) [Accessed: Sep. 5, 2014].
[5] Liyang Yu, "Introduction to the Semantic Web and Semantic Web Services",2007.
[6] T. R. Gruber, "A Translation Approach to a Portable Ontology Specification", Knowledge Acquisition, vol. 6, pp. 199-221, 1993.
[7] G. A. Miller, R. Beckwith, C. Fellbaum, D. Gross, and K. J. Miller, "Introduction to WordNet: An On-line Lexical Database", International Journal of Lexicography 3(4):235-244, 1990.
[8] http://en.wikipedia.org/wiki/WordNet [Accessed: Sep. 9, 2014].
[9] Aya M. Al‐Zoghbya , Ahmed Sharaf Eldin Ahmed, Taher T. Hamza, "Arabic Semantic Web Applications – A Survey", Semantic Web Journal, 2012.
[10] http://globalwordnet.org/wordnets-in-the-world[Accessed: Sep. 7, 2014].
[11] Christiane Fellbaum, Musa Alkhalifa, William J.Black, Sabri Elkateb, Adam Pease, HoracioRodríguez, and Piek Vossen, "Building a WordNet for Arabic", May, 2006.
[12] Maha Al-Yahya, Hend Al-Khalifa,Alia Bahanshal, Iman Al-Odah, Nawal Al Helwah, "AN ONTOLOGICAL MODEL FOR REPRESENTING SEMANTIC LEXICONS: AN APPLICATION ON TIME NOUNS IN THE HOLY QURAN", The Arabian Journal for Science and Engineering, Volume 35, December 2010.
[13] A. De Nicola, M. Missikoff, and R. Navigli, “A Software Engineering Approach to Ontology Building”,Information Systems, 34(2009), pp. 258–275.
[14] Hikmat Ullah Khan, Syed Muhammad Saqlain, Muhammad Shoaib,Muhammad Sher, "Ontology Based Semantic Search in Holy Quran", International Journal of Future Computer and Communication, Vol. 2, No. 6, December, 2013.
[15] Azman Ta’a, Syuhada Zainal Abidin, Mohd Syazwan Abdullah, Abdul Bashah B Mat Ali, and Muhammad Ahmad," AL-QURAN THEMES CLASSIFICATION USING ONTOLOGY", Proceedings of the 4the International Conference on Computing and Informatics, ICOCI, 2013.
[16] Mohamed Ahmed Sherif ,Axel-Cyrille Ngonga Ngomo, "Semantic Quran: A Multilingual Resource for Natural-Language Processing", Semantic Web Journal,2009
[17] Fatma Zohra Belkredim, Ali El Sebai "An Ontology Based Formalism for the Arabic Language Using Verbs and their Derivatives", IBIMA,Volume 11, 2009.
[18] Mustafa Jarrar, "Building a Formal Arabic Ontology" , In proceedings of the Experts Meeting on Arabic Ontologies and Semantic Networks April 26-28, 2011.
[19] Hassina Aliane, Zaia Alimazighi, Mazari Ahmed Cherif, "Al-Khalil: The Arabic Linguistic Ontology Project", Seventh International Conference on Language Resources and Evaluation, 2010.
[20] Musa Alkhalifa, Horacio Rodríguez, "Automatically Extending Named Entities Coverage of Arabic WordNet using Wikipedia", International Journal on Information and Communication Technologies, Vol. 3, No. 3, June 2010.
[21] Lahsen Abouenour," On the Improvement of Passage Retrieval in Arabic Question/Answering (Q/A) Systems", Natural Language Processing and Information Systems, Volume 6716, 2011, pp 336-341.
[22] Abdurrahman A. Nasr., M.Sc. thesis, Al-Azhar university, Faculty Of Engineering, Computers and Systems Dept., 2012.
[23] Rodríguez, H. Farwell, D. Farreres, J. Bertran, M. Alkhalifa, M. Martí M.A " Arabic WordNet: Semi-automatic Extensions using Bayesian Inference". In Proceedings of the the 6th Conference on Language Resources and Evaluation LREC,May, 2008.
[24] Lahsen Abouenour, Karim Bouzoubaa, Paolo Rosso," Using the Yago ontology as a resource for the enrichment of Named Entities in Arabic WordNet ", Workshop on LR & HLT for Semitic Languages, 7th Int. Conf. on Language Resources and Evaluation, LREC, May, 2010.
[25] Saidah Saad, Naomie Salim and Hakim Zainal,"Islamic Knowledge Ontology Creation", Internet Technology and Secured Transactions, 2009.
[26] Ahmed Cherif Mazari , Hassina Aliane, and Zaia Alimazighi ,"Automatic construction of ontology from Arabic texts" ,Proceedings ICWIT, 2012.
[27] Saidah Saad, Naomi Salim, "Methodology of Ontology Extraction for Islamic Knowledge Text", In Postgraduate Annual Research Seminar, UTM,2008.
[28] Samhaa R. El-Beltagy, Maryam Hazman, Ahmed Rafea, "Ontology learning from domain specific web documents," vol. 4, No. 1/2, May 2009.
[29] S. Zaidi, M.T. Laskri, K. Bechkoum, "A Cross-language Information Retrieval Based on an Arabic Ontology in the Legal Domain, IEEE SITIS" ,2005.
[30] Lilac Al-Safadi, Mai Al-Badrani, Meshael Al-Junidey, "Developing Ontology for Arabic Blogs Retrieval", International Journal of Computer Applications (0975– 8887),Volume 19– No.4, April, 2011.
[31] Ibrahim Fathy Moawad, Mohammad Abdeen, Mostafa Mahmoud Aref, "Ontology-based Architecture for an Arabic Semantic Search Engine", December 15-16, 2010.
[32] Tanzil, http://www.tanzil.net
[33] The meanings dictionary (قاموس المعاني), http://www.almaany.com/
[34] Abd El-Ghany Abo El-Azm, Rich Lexicon (معجم الغني), 2013.
[35] Mediator Lexicon (المعجم الوسيط), http://kamoos.reefnet.gov.sy/
[36] Fadel Mostafa El-Saki, "Arab Parts of Speech in Terms of Form and Function" (أقسام الكلام العربي من حيث الشكل والوظيفة), 1977.
[37] Protege http://protege.stanford.edu/
[38] JExcelApi, http://jexcelapi.sourceforge.net/
[39] Apache Jena, https://jena.apache.org/index.html