This document discusses the development of a machine translation system that translates English LaTeX mathematical documents into Arabic LaTeX. The system uses a Transformer model for natural-language translation and integrates RyDArab with rule-based translation for Arabic mathematical expressions, aiming to address the growing need for multilingual accessibility in scientific literature. The system shows satisfactory results but has limitations, such as low BLEU scores and issues with the rule-based expression translation, that can be addressed in future work.
ATAR: Attention-based LSTM for Arabizi transliteration (IJECE, IAES)
A non-standard romanization of Arabic script, known as Arabizi, is widely used in Arabic online and SMS/chat communities. However, since state-of-the-art tools and applications for Arabic NLP expect Arabic to be written in Arabic script, handling content written in Arabizi requires special attention, either by building customized tools or by transliterating it into Arabic script. The latter approach is the more common one, and this work presents two significant contributions in this direction. The first is collecting and publicly releasing the first large-scale "Arabizi to Arabic script" parallel corpus, focusing on the Jordanian dialect and consisting of more than 25k pairs carefully created and inspected by native speakers to ensure the highest quality. Second, we present ATAR, an ATtention-based LSTM model for ARabizi transliteration. Training and testing this model on our dataset yields impressive accuracy (79%) and BLEU score (88.49).
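The BLEU figure quoted above can be sanity-checked with a short script. The sketch below is an illustrative, stdlib-only sentence-level BLEU (uniform 4-gram weights, add-one smoothing, brevity penalty); it is a generic textbook formulation, not the exact scorer the ATAR authors used.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(reference, hypothesis, max_n=4):
    """Sentence-level BLEU with uniform n-gram weights and brevity penalty.
    Illustrative only -- real evaluations use corpus-level BLEU with proper smoothing."""
    ref, hyp = reference.split(), hypothesis.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams = ngrams(hyp, n)
        ref_ngrams = ngrams(ref, n)
        overlap = sum((hyp_ngrams & ref_ngrams).values())
        total = max(sum(hyp_ngrams.values()), 1)
        # add-one smoothing so a single missing n-gram order does not zero the score
        precisions.append((overlap + 1) / (total + 1))
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * geo_mean

print(round(sentence_bleu("the cat sat on the mat", "the cat sat on the mat"), 2))  # 1.0
```

A perfect transliteration scores 1.0; any divergence lowers at least one n-gram precision and pulls the score down.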
Construction of Amharic-Arabic Parallel Text Corpus for Neural Machine Translation (gerogepatton)
Much automatic translation work has addressed major European language pairs by taking advantage of large-scale parallel corpora, but very little research has been conducted on the Amharic-Arabic language pair due to its parallel-data scarcity. Moreover, there is no benchmark parallel Amharic-Arabic text corpus available for the machine translation task. Therefore, a small parallel Quranic text corpus is constructed by modifying the existing monolingual Arabic text and its equivalent Amharic translation available from Tanzil. Experiments are carried out on two Neural Machine Translation (NMT) models, based on Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), using an attention-based encoder-decoder architecture adapted from the open-source OpenNMT system. The LSTM- and GRU-based NMT models and the Google Translation system are compared; the LSTM-based OpenNMT outperforms the GRU-based OpenNMT and the Google Translation system, with BLEU scores of 12%, 11%, and 6%, respectively.
Automatic Extraction of Spatio-Temporal Information from Arabic Text Documents (ijcsit)
Unstructured Arabic text documents are an important source of geographical and temporal information. The possibility of automatically tracking spatio-temporal information, capturing changes relating to events from text documents, is a new challenge in the fields of geographic information retrieval (GIR), temporal information retrieval (TIR), and natural language processing (NLP). There has been a lot of work on information extraction in other languages that use the Latin alphabet, such as English, French, or Spanish; by contrast, the Arabic language is still not well supported in GIR and TIR and needs more research. In this paper, we present an approach that supports automated exploration and extraction of spatio-temporal information from Arabic text documents, in order to capture and model such information before it is utilized in search and exploration tasks. The system has been successfully tested on 50 documents that include a mixture of types of spatial/temporal information. The result achieved 91.01% recall and 80% precision, illustrating that our approach is effective and its performance is satisfactory.
Contextual Analysis for Middle Eastern Languages with Hidden Markov Models (ijnlc)
Displaying a document in Middle Eastern languages requires contextual analysis due to the different presentational forms of each character of the alphabet. The words of the document are formed by joining the correct positional glyphs representing the corresponding presentational forms of the characters. A set of rules defines the joining of the glyphs; as usual, these rules vary from language to language and are subject to interpretation by software developers.
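The glyph-joining rules described above can be sketched as a small table-driven routine. The snippet below is a toy model assuming a simplified two-class letter inventory (dual-joining vs. right-joining letters); real shaping engines follow the full Unicode Arabic joining algorithm.

```python
# Toy contextual-form selector for Arabic letters.
# DUAL letters join on both sides; RIGHT letters join only to the preceding letter.
DUAL = set("بتثجحخسشصضطظعغفقكلمنهي")
RIGHT = set("اأإآدذرزو")

def joins_forward(ch):
    # True if ch connects to the letter that follows it
    return ch in DUAL

def joins_backward(ch):
    # True if ch connects to the letter that precedes it
    return ch in DUAL or ch in RIGHT

def contextual_forms(word):
    """Return the presentational form (isolated/initial/medial/final)
    chosen for each letter of `word`."""
    forms = []
    for i, ch in enumerate(word):
        prev_joins = i > 0 and joins_forward(word[i - 1]) and joins_backward(ch)
        next_joins = i < len(word) - 1 and joins_forward(ch) and joins_backward(word[i + 1])
        if prev_joins and next_joins:
            forms.append("medial")
        elif prev_joins:
            forms.append("final")
        elif next_joins:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms

print(contextual_forms("كتاب"))  # ['initial', 'medial', 'final', 'isolated']
```

The word كتاب ("book") illustrates the rules: the alef joins backward but not forward, which forces the final baa into its isolated form.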
Interpretation of Sadhu into Cholit Bhasha by Cataloguing and Translation System (ijtsrd)
Sadhu and Cholit bhasha are two significant Bangladeshi language varieties. Sadhu was used in an earlier era and had Sanskrit components, but in the present era Cholit has taken its place. Many formal and legal documents exist in Sadhu that direly need to be translated into Cholit, because it is more favorable and speaker-friendly. Therefore, this paper addresses this issue by familiarizing the current era with Sadhu through software. Different sentences were chosen, and the final data set was obtained by Principal Component Analysis (PCA). MATLAB and Python were used for different machine learning algorithms, with most work done using scikit-learn and the MATLAB machine learning toolbox. It was found that Linear Discriminant Analysis (LDA) functions best. Speed prediction was also performed, and values were determined through graphs. It was inferred that this categorizer efficiently translated Sadhu words to Cholit precisely and in a well-structured way; therefore, Sadhu will not remain a complex language in this decade. Nakib Aman Turzo, Pritom Sarker, and Biplob Kumar, "Interpretation of Sadhu into Cholit Bhasha by Cataloguing and Translation System", published in International Journal of Trend in Scientific Research and Development (IJTSRD), ISSN 2456-6470, Volume 4, Issue 3, April 2020. URL: https://www.ijtsrd.com/papers/ijtsrd30792.pdf ; paper URL: https://www.ijtsrd.com/engineering/computer-engineering/30792/interpretation-of-sadhu-into-cholit-bhasha-by-cataloguing-and-translation-system/nakib-aman-turzo
Ara-CANINE: Character-Based Pre-Trained Language Model for Arabic Language Understanding (IJCI Journal)
Recent advancements in the field of natural language processing have markedly enhanced the capability of machines to comprehend human language. However, as language models progress, they require continuous architectural enhancements and different approaches to text processing. One significant challenge stems from the rich diversity of languages, each characterized by its distinctive grammar, resulting in decreased accuracy of language models for specific languages, especially low-resource ones. This limitation is exacerbated by the reliance of existing NLP models on rigid tokenization methods, rendering them susceptible to issues with previously unseen or infrequent words. Additionally, models based on word and subword tokenization are vulnerable to minor typographical errors, whether they occur naturally or result from adversarial misspellings. To address these challenges, this paper utilizes a recently proposed tokenization-free method, CANINE, to enhance the comprehension of natural language. Specifically, we employ this method to develop a tokenization-free Arabic language model. In this research, we evaluate our model's performance across a range of eight tasks using the Arabic Language Understanding Evaluation (ALUE) benchmark. Furthermore, we conduct a comparative analysis, pitting our tokenization-free model against existing Arabic language models that rely on sub-word tokenization. By making our pre-training and fine-tuning models accessible to the Arabic NLP community, we aim to facilitate the replication of our experiments and contribute to the advancement of Arabic language processing capabilities. To further support reproducibility and open-source collaboration, the complete source code and model checkpoints will be made publicly available on Hugging Face.
In conclusion, the results of our study demonstrate that the tokenization-free approach exhibits performance comparable to established Arabic language models that utilize sub-word tokenization techniques. Notably, in certain tasks, our model surpasses some of these existing models. This evidence underscores the efficacy of tokenization-free processing of the Arabic language, particularly in specific linguistic contexts.
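The tokenization-free idea can be illustrated with CANINE's codepoint-hashing trick: every character maps to embedding indices through fixed hash functions, so there is no vocabulary and no out-of-vocabulary failure. The sketch below is a simplified, stdlib-only illustration with toy constants, not the model's real hash parameters.

```python
# Toy illustration of vocabulary-free character hashing (CANINE-style).
# Each Unicode codepoint is mapped to K bucket indices via K independent
# hash functions; an embedding would be built from K small lookup tables.
K = 4            # number of hash functions (toy value)
BUCKETS = 1024   # buckets per hash table (toy value)
PRIMES = [31, 53, 97, 131]  # arbitrary odd multipliers, one per hash

def char_to_ids(ch):
    cp = ord(ch)
    return [(cp * p) % BUCKETS for p in PRIMES]

def encode(text):
    """Map any string -- including unseen or misspelled words -- to hash ids.
    There is no vocabulary, hence no <UNK> token is ever needed."""
    return [char_to_ids(ch) for ch in text]

ids = encode("سلام")
print(len(ids), len(ids[0]))  # 4 4
```

Because the mapping is a pure function of the codepoint, a typo changes only the affected character's indices rather than shattering a whole subword, which is the robustness property the abstract highlights.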
People across the globe have access to materials such as journals, articles, and adverts via the internet. However, many of these resources come in a diverse range of languages. Although the English language seems most suitable to most people, some readers believe that working on materials in one's native language is more enjoyable than in other languages. Research has shown that the Arabic language has not been prominent in terms of online materials, and the few existing ones are often ignored due to the peculiar nature of its various characters and constructs. Hence, a proper study of its relationship with the English language, with a view to bringing people closer to its understanding, becomes necessary. The system scenarios were modeled and implemented using the Unified Modeling Language and Microsoft C#, respectively, such that the expected set of characters of the language of interest was automatically formed with respect to a given input. The procedural steps were properly followed in the development and running of the code using a context-free rule-based technique, with the required hardware as clearly described in the design. The system's workability was tested with different source texts as inputs, and in each case the resulting outputs were very effective with respect to the translation process. The design is expected to serve as a tool for assisting beginners in these two languages and so showcases a one-to-one form of correspondence; more rules and functions for ensuring a more robust system are expected in future works.
Seminar report on a statistical approach to machine translation (Hrishikesh Nair)
This document is a seminar report on statistical machine translation presented by B Hrishikesh at Rajagiri School of Engineering and Technology. It provides an overview of machine translation techniques, focusing in detail on the basic statistical model. The report discusses the history of machine translation approaches, describes the noisy channel model for statistical machine translation, and covers key components like language modeling using n-grams, alignments, translation modeling, and parameter estimation methods. It also presents results from two pilot experiments on statistical machine translation.
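The noisy-channel model the report describes chooses the translation e maximizing P(e) * P(f | e), combining a language model with a translation model. A toy word-level illustration, with entirely made-up probability tables:

```python
# Toy noisy-channel decoder: choose the target word e maximizing
# P(e) * P(f | e). All probabilities below are invented for illustration.
LM = {"house": 0.4, "home": 0.3, "casa": 0.01}          # language model P(e)
TM = {("casa", "house"): 0.5, ("casa", "home"): 0.4}    # translation model P(f | e)

def decode_word(f):
    """Score every candidate target word and return the argmax."""
    candidates = {e: LM.get(e, 0.0) * TM.get((f, e), 0.0) for e in LM}
    return max(candidates, key=candidates.get)

print(decode_word("casa"))  # house
```

Real systems extend this from single words to full sentences, which is where n-gram language modeling, alignments, and parameter estimation (all covered in the report) come in.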
This paper discusses a new metric for verifying the quality of translation between sentence pairs in parallel Arabic-English corpora. The metric combines two techniques, one based on sentence length and the other on compression code length. Experiments on sample test parallel Arabic-English corpora indicate that the combination of these two techniques improves the accuracy of identifying satisfactory and unsatisfactory sentence pairs compared to sentence length or compression code length alone. The new method proposed in this research is effective at filtering noise and reducing mis-translations, resulting in greatly improved quality.
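A pair-quality score combining the two signals described above can be sketched with the standard library: the length component compares token counts, and the compression component uses zlib code lengths as a proxy for information content. The equal weighting and the specific ratios here are assumptions for illustration, not the paper's formulation.

```python
import zlib

def code_len(s):
    # Compression code length in bytes as a crude information-content proxy
    return len(zlib.compress(s.encode("utf-8")))

def pair_score(src, tgt):
    """Score a sentence pair by agreement of length and compression code length.
    Both ratios lie in (0, 1]; 1.0 means perfect agreement on both signals."""
    len_ratio = min(len(src.split()), len(tgt.split())) / max(len(src.split()), len(tgt.split()))
    comp_ratio = min(code_len(src), code_len(tgt)) / max(code_len(src), code_len(tgt))
    return 0.5 * len_ratio + 0.5 * comp_ratio  # equal weights: an assumption

good = pair_score("the book is on the table", "الكتاب على الطاولة")
bad = pair_score("the book is on the table", "نعم")
print(good > bad)  # True
```

A mismatched pair (a full sentence aligned to a one-word reply) scores low on both signals, which is exactly the kind of noisy alignment such a filter is meant to catch.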
The document discusses machine translation (MT) between Arabic and English. It covers several key topics:
1. It outlines the challenges of Arabic natural language processing and MT, including the differences between Modern Standard Arabic and dialects and a lack of annotated resources.
2. It describes different types of MT systems like direct translation engines and those using linguistic knowledge architectures. It also discusses the importance of dictionaries.
3. It discusses common MT problems such as ambiguity and differences between languages.
4. It proposes a small prototype Arabic to English MT model to demonstrate basic techniques like normalization, tokenization, stemming and using a parser and transformation rules.
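The preprocessing steps named in point 4 (normalization and tokenization) can be sketched as below. The specific character mappings are a common Arabic NLP convention (unifying alef variants, mapping taa marbuta and alef maqsura, stripping diacritics), assumed here rather than taken from the document.

```python
import re

# Common Arabic orthographic normalizations (an assumed, typical set):
# strip short-vowel diacritics (tashkeel), unify alef variants,
# map taa marbuta to haa and alef maqsura to yaa.
DIACRITICS = re.compile(r"[\u064B-\u0652]")
ALEF = re.compile("[أإآ]")

def normalize(text):
    text = DIACRITICS.sub("", text)
    text = ALEF.sub("ا", text)
    text = text.replace("ة", "ه").replace("ى", "ي")
    return text

def tokenize(text):
    # whitespace/punctuation split; real systems use clitic-aware tokenizers
    return re.findall(r"\w+", normalize(text))

print(tokenize("إلى المكتبةِ"))  # ['الي', 'المكتبه']
```

Normalization of this kind collapses spelling variants so the downstream dictionary lookup and transformation rules see one canonical surface form per word.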
Dictionary-Based Amharic-Arabic Cross-Language Information Retrieval (csandit)
The demand for multilingual information is becoming perceptible as the number of internet users throughout the world escalates, creating the problem of retrieving documents in one language by specifying a query in another. This increasing demand can be addressed by designing automatic tools that accept a query in one language and retrieve the relevant documents in other languages. We have developed a prototype Amharic-Arabic Cross-Language Information Retrieval system using a dictionary-based approach that enables users to retrieve relevant documents from an Amharic-Arabic corpus by entering the query in Amharic and retrieving relevant documents in both Amharic and Arabic.
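The dictionary-based approach can be sketched in a few lines: translate each query term through a bilingual lexicon, then match the expanded query against both collections. The tiny lexicon and the overlap-count ranking below are hypothetical, for illustration only.

```python
# Toy Amharic-to-Arabic bilingual lexicon (hypothetical entries).
AMH_TO_ARA = {"መጽሐፍ": "كتاب", "ውሃ": "ماء"}

def translate_query(amharic_terms):
    # Keep the source term too, so Amharic documents still match
    expanded = []
    for t in amharic_terms:
        expanded.append(t)
        if t in AMH_TO_ARA:
            expanded.append(AMH_TO_ARA[t])
    return expanded

def retrieve(query_terms, documents):
    """Rank documents by count of overlapping query terms (bag-of-words)."""
    expanded = set(translate_query(query_terms))
    scored = [(sum(w in expanded for w in doc.split()), doc) for doc in documents]
    return [doc for score, doc in sorted(scored, reverse=True) if score > 0]

docs = ["كتاب جديد", "ماء بارد", "خبر اليوم"]
print(retrieve(["መጽሐፍ"], docs))  # ['كتاب جديد']
```

Keeping the untranslated source term in the expanded query is what lets a single query hit documents in both languages of the corpus.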
Extracting Numerical Data from Unstructured Arabic Texts (ENAT) (nooriasukmaningtyas)
This document discusses a system called ENAT that aims to extract numerical data from Arabic unstructured text. ENAT includes an Arabic numerical dictionary containing stems that refer to numeric values. It also includes rules based on Arabic linguistic and morphological rules related to numerical terms. ENAT receives Arabic text as input, analyzes it using the dictionary and rules to extract numerical phrases, and then converts the phrases to integer values. The system is composed of four main components - the dictionary, rules, a data extraction unit, and a calculation unit. The author evaluates ENAT and finds it can extract numerical data from Arabic unstructured text with an accuracy of 100%.
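The dictionary-plus-rules conversion can be sketched as below. The entries and the additive/multiplicative combination rule are a simplified assumption for illustration; ENAT's actual lexicon is far larger and its rules handle inflected and agglutinated forms morphologically.

```python
# Toy Arabic numeric dictionary (a small assumed subset).
UNITS = {"واحد": 1, "اثنان": 2, "ثلاثة": 3, "خمسة": 5}
TENS = {"عشرون": 20, "ثلاثون": 30, "خمسون": 50}
MULT = {"مئة": 100, "ألف": 1000}

def words_to_int(words):
    """Convert a list of Arabic number words to an integer:
    units and tens add; 'hundred'/'thousand' multiply the accumulated value."""
    total, current = 0, 0
    for w in words:
        if w in UNITS:
            current += UNITS[w]
        elif w in TENS:
            current += TENS[w]
        elif w in MULT:
            current = max(current, 1) * MULT[w]
            total += current
            current = 0
        # connector words like "و" (and) are simply skipped
    return total + current

print(words_to_int(["ثلاثة", "مئة", "و", "خمسون"]))  # 350
```

This mirrors the pipeline the document describes: dictionary lookup identifies numeric stems, rules combine them, and a calculation unit emits the integer value.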
SmartGrammar: a dynamic spoken language understanding grammar for inflective languages (ijnlc)
1. The document proposes SmartGrammar, a new method for developing spoken language understanding grammars for inflectional languages like Italian.
2. SmartGrammar uses a morphological analyzer to convert user utterances into their canonical forms before parsing, allowing the grammar to contain only canonical word forms rather than all possible inflections.
3. This significantly reduces the complexity and size of grammars for inflectional languages by representing many possible inflected forms with a single canonical form entry, making grammar development and management easier.
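The canonicalization step in point 2 can be sketched as a pre-parse lemma lookup. Everything below (the Italian lemma table and the grammar actions) is a hypothetical toy, not SmartGrammar's actual resources.

```python
# Toy lemma table (hypothetical): many inflected forms -> one canonical form.
LEMMAS = {"vado": "andare", "vai": "andare", "andiamo": "andare",
          "mangio": "mangiare", "mangi": "mangiare"}

# The grammar itself now needs only canonical entries, not every inflection:
GRAMMAR_ACTIONS = {"andare": "GO", "mangiare": "EAT"}

def understand(utterance):
    """Lemmatize each word, then match against the canonical-form grammar."""
    for word in utterance.lower().split():
        lemma = LEMMAS.get(word, word)
        if lemma in GRAMMAR_ACTIONS:
            return GRAMMAR_ACTIONS[lemma]
    return "UNKNOWN"

print(understand("andiamo a casa"))  # GO
```

The size saving is visible even in the toy: five inflected forms collapse into two grammar entries, which is the reduction the abstract describes at scale.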
Different valuable tools for Arabic sentiment analysis: a comparative evaluation (IJECE, IAES)
Arabic Natural Language Processing (ANLP) is a subfield of artificial intelligence (AI) that builds various applications for the Arabic language, such as Arabic sentiment analysis (ASA), the task of classifying the feelings and emotions expressed in text to determine the writer's attitude (neutral, negative, or positive). When working on ASA, researchers may use various tools in their projects without explaining the reason behind the choice, or they select a set of libraries according to their knowledge of a specific programming language. Because of the abundance of their libraries in the ANLP field, especially in ASA, we rely on the Java and Python programming languages in our research work. This paper makes an in-depth comparative evaluation of different valuable Python and Java libraries to deduce the most useful ones for Arabic sentiment analysis. Based on a large variety of influential works in the domain of ASA, we deduce that the NLTK, Gensim, and TextBlob libraries are the most useful for the Python ASA task. For Java ASA libraries, we conclude that the Weka and CoreNLP tools are the most used and achieve great results in this research domain.
MOLTO poster for ACL 2010, Uppsala SwedenOlga Caprotti
MOLTO is funded by the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement FP7-ICT-247914. MOLTO's goal is to develop a set of tools for translating texts between multiple languages in real time with high quality. Languages are separate modules in the tool and can be varied; prototypes covering a majority of the EU's 23 official languages will be built.
http://molto-project.eu
The document summarizes a project to incorporate subject areas into the Apertium machine translation system used by the Universitat Oberta de Catalunya (UOC) virtual university. The project aimed to improve translation quality by adding subject area filters to disambiguate terms. It involved analyzing UOC subject areas, creating dictionaries of terms tagged with subject areas, and implementing subject filters in the interface. The filters allow users to select a subject area to generate translations tailored to that field's terminology. The project benefits UOC administrative staff, linguists, teachers, students and other users by producing more accurate translations for different domains.
Building of Database for English-Azerbaijani Machine Translation Expert System (Waqas Tariq)
In this article, the results of the development of a machine translation expert system are presented. An approach to defining translation correspondences is suggested as the basis for creating the system's database and knowledge base. The methods for compiling transformation rules, applied to the linguistic knowledge base of the expert system, are based on defining translation correspondences between the Azerbaijani and English languages.
A Survey of Arabic Text Classification Models (IJECE, IAES)
There is a huge amount of Arabic text available online, which requires organization. As a result, there are many applications of natural language processing (NLP) concerned with text organization; one of them is text classification (TC). TC makes dealing with unorganized text easier by classifying it into suitable classes or labels. This paper is a survey of Arabic text classification. It also presents a comparison among different methods for classifying Arabic texts, where Arabic is a complex language due to its vocabulary. Arabic is one of the richest languages in the world, with many linguistic bases, yet research on Arabic language processing is scarce compared to English. These problems represent challenges in the classification and organization of Arabic text. Text classification helps users access documents or information that has already been assigned to one or more specific classes or categories. In addition, classifying documents helps a search engine reduce the number of documents to consider, making it easier to search and match against queries.
Turn Segmentation into Utterances for Arabic Spontaneous Dialogues (ijnlc)
Text segmentation is an essential processing task for many Natural Language Processing (NLP) applications such as text summarization, text translation, and dialogue language understanding, among others. Turn segmentation is considered the key player in the dialogue understanding task for building automatic human-computer systems. In this paper, we introduce a novel approach to segmenting turns into utterances for Egyptian spontaneous dialogues and instant messages (IM) using a Machine Learning (ML) approach, as part of the task of automatically understanding Egyptian spontaneous dialogues and IM. Due to the lack of an Egyptian-dialect dialogue corpus, the system is evaluated on our own corpus of 3001 turns, which were collected, segmented, and annotated manually from Egyptian call centers. The system achieves an F1 score of 90.74% and an accuracy of 95.98%.
This document provides information about LexiGraf, a software base developed to automate the creation of multilingual dictionaries. LexiGraf incorporates database functionality and handles tasks like layout, indexing, and formatting to output dictionary pages ready for printing. It is being used to create a dictionary with over 50,000 terms each in English, French, German and Greek covering various scientific fields. The document describes LexiGraf's features, technical specifications, implementation of the Greek science dictionary project, and its demonstration at the CRIS98 conference.
LIT (Lexicon of the Italian Television) is a project conceived by the Accademia della Crusca, the leading research institution on the Italian language, in collaboration with CLIEO (Center for theoretical and historical Linguistics: Italian, European and Oriental languages), with the aim of studying frequencies of the Italian lexicon used in television content and targets the specific sector of web applications for linguistic research. The corpus of transcriptions is constituted approximately by 170 hours of random television recordings transmitted by the national broadcaster RAI (Italian Radio Television) during the year 2006.
XMODEL: An XML-based Morphological Analyzer for Arabic LanguageWaqas Tariq
Morphological analysis is an essential stage in language engineering applications. For the Arabic language, this stage is not easy to develop because Arabic has particularities such as agglutination and considerable morphological ambiguity. These make designing a morphological analyzer for Arabic somewhat difficult and require many additional tools and treatments. The size of the lexicon is another major problem of Arabic morphological analysis, directly affecting the analysis process. In this paper we present a morphological analyzer for Modern Standard Arabic based on the Arabic Morphological Automaton technique, using a new and innovative language (XMODEL) to represent Arabic morphological knowledge in an optimal way. Both the Arabic Morphological Analyzer and the Arabic Morphological Automaton are implemented in Java and use XML technology. The Buckwalter Arabic Morphological Analyzer and Xerox Arabic Finite State Morphology are two of the best-known morphological analyzers for Modern Standard Arabic, and both are available and documented. Our morphological analyzer can be exploited by Natural Language Processing (NLP) applications such as machine translation, orthographic correction, information retrieval, and both syntactic and semantic analyzers. Finally, an evaluation of the Xerox system and ours is presented.
New text steganography method using the arabic letters dotsnooriasukmaningtyas
With increasing technological and electronic development, methods have been developed to hide important information using text steganography as a new technology, since it is unnoticeable and easy to send and receive. Using the Arabic language is one of the new approaches to hiding data. In this work, we present our method, which relies on properties of the Arabic language to embed a secret English message into a cover text. More than half of the Arabic characters contain dots: several characters have upper dots and others lower dots; some have one dot, others two, and a few even three. In this new idea, we use the dots of the characters to embed the English secret message. First, we compress the secret message using 5-Bit Encoding (T5BE), enabling the cover text to embed 37.5% more bits of the secret message. Then we use an Arabic semantic dictionary to correct the hiding path and enhance the stego-cover text, eliminating errors caused by switching words. Experimental results show that the proposed model achieves high masking accuracy in addition to good storage capacity in the cover text.
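The 37.5% capacity gain from 5-bit encoding can be sketched as follows. This is a hypothetical packing scheme for an uppercase-English secret message, not the actual T5BE table, which the abstract does not reproduce:

```python
# Hypothetical 5-bit packing: each character takes 5 bits instead of 8,
# the 37.5% saving the paper attributes to T5BE. The real T5BE table and
# the Arabic-dot embedding step are not reproduced here.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ ._-,"  # 31 symbols; code 0 is the terminator

def pack5(msg):
    # Raises ValueError for characters outside the illustrative alphabet.
    bits = "".join(format(ALPHABET.index(ch) + 1, "05b") for ch in msg)
    bits += "00000"                 # terminator code
    bits += "0" * (-len(bits) % 8)  # pad to a whole number of bytes
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

def unpack5(data):
    bits = "".join(format(b, "08b") for b in data)
    out = []
    for i in range(0, len(bits) - 4, 5):  # read full 5-bit codes
        code = int(bits[i:i + 5], 2)
        if code == 0:                     # terminator reached
            break
        out.append(ALPHABET[code - 1])
    return "".join(out)
```

An 8-character message packs into 6 bytes rather than the 8 bytes plain ASCII would need, matching the claimed saving.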
This document summarizes research on models for classifying Arabic text. It discusses how text classification can organize large amounts of Arabic text by categorizing documents. The document reviews several studies that have applied algorithms like Naive Bayes, K-Nearest Neighbor, and Support Vector Machines to classify Arabic texts with accuracies ranging from 62.7% to 91%. It also outlines some of the linguistic challenges of classifying Arabic, which has a complex orthography compared to languages like English. Finally, it provides a brief overview of common text classification techniques like preprocessing, feature extraction, evaluation, and the machine learning vs rule-based approaches.
12th International Conference of Artificial Intelligence and Fuzzy Logic (AI ...gerogepatton
12th International Conference of Artificial Intelligence and Fuzzy Logic (AI & FL 2024) provides a peer-reviewed forum for researchers in the field to present their work. Authors are solicited to contribute to the conference by submitting articles that illustrate research results, projects, surveying works, and industrial experiences describing significant advances in the following areas, though submissions are not limited to these topics.
International Journal of Artificial Intelligence & Applications (IJAIA)gerogepatton
The International Journal of Artificial Intelligence & Applications (IJAIA) is a bimonthly open-access peer-reviewed journal that publishes articles contributing new results in all areas of Artificial Intelligence and its applications. It is an international journal intended for professionals and researchers in all fields of AI, including programmers and software and hardware manufacturers. The journal also publishes special issues on emerging areas in Artificial Intelligence and its applications.
More Related Content
Similar to English to Arabic Machine Translation of Mathematical Documents
People across the globe have access to materials such as journals, articles, and adverts via the internet. However, many of these resources come in a diverse range of languages. Although the English language seems most suitable to most people, some readers believe that working on materials in one's native language is more enjoyable than in other languages. Research has shown that the Arabic language has not been prominent in terms of online materials, and the few existing resources are often ignored due to the peculiar nature of its characters and constructs. Hence, a proper study of its relationship with the English language, with a view to bringing people closer to understanding it, becomes necessary. The system scenarios were modeled and implemented using the Unified Modeling Language and Microsoft C#, respectively, such that the expected set of characters of the language of interest was automatically formed for a given input. The procedural steps were properly followed in developing and running the code using a context-free rule-based technique, with the required hardware clearly described in the design. The system's workability was tested with different source texts as inputs, and in each case the resulting outputs were very effective with respect to the translation process. The design is expected to serve as a tool for assisting beginners in these two languages, and so showcases a one-to-one form of correspondence; more rules and functions for ensuring a more robust system are expected in future works.
Seminar report on a statistical approach to machineHrishikesh Nair
This document is a seminar report on statistical machine translation presented by B Hrishikesh at Rajagiri School of Engineering and Technology. It provides an overview of machine translation techniques, focusing in detail on the basic statistical model. The report discusses the history of machine translation approaches, describes the noisy channel model for statistical machine translation, and covers key components like language modeling using n-grams, alignments, translation modeling, and parameter estimation methods. It also presents results from two pilot experiments on statistical machine translation.
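The n-gram language modelling the report covers can be illustrated with a toy bigram model. The sketch below uses add-one smoothing and a toy corpus as stand-ins for the full P(e) component of the noisy-channel model:

```python
import math
from collections import Counter

# Toy bigram language model with add-one smoothing: the P(e) term of the
# noisy-channel decomposition P(e|f) proportional to P(f|e) * P(e).
# Corpus and whitespace tokenization are illustrative stand-ins.
class BigramLM:
    def __init__(self, sentences):
        self.unigrams, self.bigrams = Counter(), Counter()
        for s in sentences:
            toks = ["<s>"] + s.split() + ["</s>"]
            self.unigrams.update(toks)
            self.bigrams.update(zip(toks, toks[1:]))
        self.vocab = len(self.unigrams)

    def logprob(self, sentence):
        toks = ["<s>"] + sentence.split() + ["</s>"]
        lp = 0.0
        for a, b in zip(toks, toks[1:]):
            # Add-one smoothed conditional probability P(b | a)
            lp += math.log((self.bigrams[(a, b)] + 1) /
                           (self.unigrams[a] + self.vocab))
        return lp
```

A model trained on a few sentences assigns a higher log-probability to a word order it has seen than to a scrambled one, which is exactly the role P(e) plays when ranking candidate translations.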
This paper discusses a new metric that has been applied to verify the quality of translation between sentence pairs in parallel Arabic-English corpora. The metric combines two techniques, one based on sentence length and the other based on compression code length. Experiments on sample parallel Arabic-English test corpora indicate that the combination of these two techniques improves the accuracy of identifying satisfactory and unsatisfactory sentence pairs compared to sentence length or compression code length alone. The new method proposed in this research is effective at filtering noise and reducing mis-translations, resulting in greatly improved quality.
The document discusses machine translation (MT) between Arabic and English. It covers several key topics:
1. It outlines the challenges of Arabic natural language processing and MT, including the differences between Modern Standard Arabic and dialects and a lack of annotated resources.
2. It describes different types of MT systems like direct translation engines and those using linguistic knowledge architectures. It also discusses the importance of dictionaries.
3. It discusses common MT problems such as ambiguity and differences between languages.
4. It proposes a small prototype Arabic to English MT model to demonstrate basic techniques like normalization, tokenization, stemming and using a parser and transformation rules.
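The normalization, tokenization, and stemming steps listed above are standard Arabic NLP preprocessing. The sketch below shows typical rules (alef-hamza and taa-marbuta normalization, diacritic stripping, light affix removal); the rule set is illustrative, not the prototype's actual one:

```python
import re

# Typical Arabic preprocessing steps: orthographic normalization,
# tokenization, and a light stemmer. Rules are a small illustrative
# subset of what real systems use.
def normalize(text):
    text = re.sub("[\u0623\u0625\u0622]", "\u0627", text)  # hamza-alef forms -> bare alef
    text = re.sub("\u0629", "\u0647", text)                # taa marbuta -> haa
    text = re.sub("[\u064B-\u0652]", "", text)             # strip short-vowel diacritics
    return text

def tokenize(text):
    return re.findall(r"\w+", text, re.UNICODE)

def light_stem(token):
    for pre in ("\u0627\u0644",):  # definite article "al-"
        if token.startswith(pre) and len(token) > len(pre) + 2:
            token = token[len(pre):]
    for suf in ("\u0627\u062a", "\u0648\u0646", "\u064a\u0646"):  # common plural suffixes
        if token.endswith(suf) and len(token) > len(suf) + 2:
            token = token[:-len(suf)]
    return token
```

After these steps a parser and transformation rules, as the prototype describes, would operate on the normalized stems rather than on raw surface forms.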
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
DICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVALcsandit
The demand for multilingual information is becoming pressing as the number of internet users throughout the world escalates, creating the problem of retrieving documents in one language by specifying a query in another. This increasing demand can be addressed by designing automatic tools that accept a query in one language and retrieve the relevant documents in other languages. We have developed a prototype Amharic-Arabic Cross-Language Information Retrieval system by applying a dictionary-based approach that enables users to retrieve relevant documents from an Amharic-Arabic corpus by entering a query in Amharic and retrieving the relevant documents in both Amharic and Arabic.
Extracting numerical data from unstructured Arabic texts(ENAT)nooriasukmaningtyas
This document discusses a system called ENAT that aims to extract numerical data from Arabic unstructured text. ENAT includes an Arabic numerical dictionary containing stems that refer to numeric values. It also includes rules based on Arabic linguistic and morphological rules related to numerical terms. ENAT receives Arabic text as input, analyzes it using the dictionary and rules to extract numerical phrases, and then converts the phrases to integer values. The system is composed of four main components - the dictionary, rules, a data extraction unit, and a calculation unit. The author evaluates ENAT and finds it can extract numerical data from Arabic unstructured text with an accuracy of 100%.
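ENAT's dictionary-plus-rules conversion of numerical phrases to integers can be illustrated with English number words standing in for the Arabic stems in its dictionary (the sketch omits teens, fractions, and other forms the real system handles):

```python
# Dictionary-plus-rules sketch: map a number-word phrase to an integer.
# English words stand in for the Arabic stems of ENAT's numerical
# dictionary; this is not the system's actual rule set.
UNITS = dict(zip("one two three four five six seven eight nine".split(), range(1, 10)))
TENS = dict(zip("twenty thirty forty fifty sixty seventy eighty ninety".split(), range(20, 100, 10)))
SCALES = {"thousand": 1_000, "million": 1_000_000}

def words_to_int(tokens):
    total, current = 0, 0
    for t in tokens:
        if t in UNITS:
            current += UNITS[t]
        elif t in TENS:
            current += TENS[t]
        elif t == "hundred":
            current = max(current, 1) * 100
        elif t in SCALES:  # thousand/million close the current group
            total += max(current, 1) * SCALES[t]
            current = 0
        # connectives like "and" match no rule and are ignored
    return total + current
```

For example, "three hundred and twenty five" yields 325: the hundred rule multiplies the accumulated units, and the remaining tens and units are added on.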
Smart grammar a dynamic spoken language understanding grammar for inflective ...ijnlc
1. The document proposes SmartGrammar, a new method for developing spoken language understanding grammars for inflectional languages like Italian.
2. SmartGrammar uses a morphological analyzer to convert user utterances into their canonical forms before parsing, allowing the grammar to contain only canonical word forms rather than all possible inflections.
3. This significantly reduces the complexity and size of grammars for inflectional languages by representing many possible inflected forms with a single canonical form entry, making grammar development and management easier.
Different valuable tools for Arabic sentiment analysis: a comparative evaluat...IJECEIAES
Arabic Natural Language Processing (ANLP) is a subfield of artificial intelligence (AI) that builds various applications for the Arabic language, such as Arabic sentiment analysis (ASA): the task of classifying expressed feelings and emotions to determine the writer's attitude (neutral, negative, or positive). Researchers working on ASA often use various tools without explaining the reason for that choice, or pick a set of libraries according to their knowledge of a specific programming language. Because of the abundance of their libraries in the ANLP field, and especially in ASA, we rely on the Java and Python programming languages in our research work. This paper makes an in-depth comparative evaluation of different valuable Python and Java libraries to deduce the most useful ones for ASA. Based on a large variety of influential works in the domain, we deduce that the NLTK, Gensim, and TextBlob libraries are the most useful for the ASA task in Python. Among Java ASA libraries, we conclude that the Weka and CoreNLP tools are the most used and achieve good results in this research domain.
MOLTO poster for ACL 2010, Uppsala SwedenOlga Caprotti
MOLTO is funded by the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement FP7-ICT-247914. MOLTO's goal is to develop a set of tools for translating texts between multiple languages in real time with high quality. Languages are separate modules in the tool and can be varied; prototypes covering a majority of the EU's 23 official languages will be built.
http://molto-project.eu
The document summarizes a project to incorporate subject areas into the Apertium machine translation system used by the Universitat Oberta de Catalunya (UOC) virtual university. The project aimed to improve translation quality by adding subject area filters to disambiguate terms. It involved analyzing UOC subject areas, creating dictionaries of terms tagged with subject areas, and implementing subject filters in the interface. The filters allow users to select a subject area to generate translations tailored to that field's terminology. The project benefits UOC administrative staff, linguists, teachers, students and other users by producing more accurate translations for different domains.
International Conference on NLP, Artificial Intelligence, Machine Learning an...gerogepatton
International Conference on NLP, Artificial Intelligence, Machine Learning and Applications (NLAIM 2024) offers a premier global platform for exchanging insights and findings in the theory, methodology, and applications of NLP, Artificial Intelligence, Machine Learning, and their applications. The conference seeks substantial contributions across all key domains of NLP, Artificial Intelligence, Machine Learning, and their practical applications, aiming to foster both theoretical advancements and real-world implementations. With a focus on facilitating collaboration between researchers and practitioners from academia and industry, the conference serves as a nexus for sharing the latest developments in the field.
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELgerogepatton
As digital technology becomes more deeply embedded in power systems, protecting the communication
networks of Smart Grids (SG) has emerged as a critical concern. Distributed Network Protocol 3 (DNP3)
represents a multi-tiered application layer protocol extensively utilized in Supervisory Control and Data
Acquisition (SCADA)-based smart grids to facilitate real-time data gathering and control functionalities.
Robust Intrusion Detection Systems (IDS) are necessary for early threat detection and mitigation because
of the interconnection of these networks, which makes them vulnerable to a variety of cyberattacks. To
solve this issue, this paper develops a hybrid Deep Learning (DL) model specifically designed for intrusion
detection in smart grids. The proposed approach is a combination of the Convolutional Neural Network
(CNN) and Long Short-Term Memory (LSTM) algorithms. We employed a recent intrusion detection
dataset (DNP3), which focuses on unauthorized commands and Denial of Service (DoS) cyberattacks, to
train and test our model. The results of our experiments show that our CNN-LSTM method is much better
at finding smart grid intrusions than other deep learning algorithms used for classification. In addition,
our proposed approach improves accuracy, precision, recall, and F1 score, achieving a high detection
accuracy rate of 99.50%.
10th International Conference on Artificial Intelligence and Applications (AI...gerogepatton
10th International Conference on Artificial Intelligence and Applications (AI 2024) will provide an excellent international forum for sharing knowledge and results in theory, methodology and applications of Artificial Intelligence and its applications. The Conference looks for significant contributions to all major fields of the Artificial Intelligence, Soft Computing in theoretical and practical aspects. The aim of the Conference is to provide a platform to the researchers and practitioners from both academia as well as industry to meet and share cutting-edge development in the field.
Immunizing Image Classifiers Against Localized Adversary Attacksgerogepatton
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks
(CNN)s, to adversarial attacks and presents a proactive training technique designed to counter them. We
introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations.
When combined with 3D convolution and deep curriculum learning optimization (CLO), it significantly improves
the immunity of models against localized universal attacks by up to 40%. We evaluate our proposed approach
using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10
and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing
accuracy improvements over previous techniques. The results indicate that the combination of the volumetric
input and curriculum learning holds significant promise for mitigating adversarial attacks without necessitating
adversary training.
May 2024 - Top 10 Read Articles in Artificial Intelligence and Applications (...gerogepatton
3rd International Conference on Artificial Intelligence Advances (AIAD 2024)gerogepatton
3rd International Conference on Artificial Intelligence Advances (AIAD 2024) will act as a major forum for the presentation of innovative ideas, approaches, developments, and research projects in the area advanced Artificial Intelligence. It will also serve to facilitate the exchange of information between researchers and industry professionals to discuss the latest issues and advancement in the research area. Core areas of AI and advanced multi-disciplinary and its applications will be covered during the conferences.
Information Extraction from Product Labels: A Machine Vision Approachgerogepatton
This research tackles the challenge of manual data extraction from product labels by employing a blend of
computer vision and Natural Language Processing (NLP). We introduce an enhanced model that combines
Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) in a Convolutional
Recurrent Neural Network (CRNN) for reliable text recognition. Our model is further refined by
incorporating the Tesseract OCR engine, enhancing its applicability in Optical Character Recognition
(OCR) tasks. The methodology is augmented by NLP techniques and extended through the Open Food
Facts API (Application Programming Interface) for database population and text-only label prediction.
The CRNN model is trained on encoded labels and evaluated for accuracy on a dedicated test set.
Importantly, our approach enables visually impaired individuals to access essential information on
product labels, such as directions and ingredients. Overall, the study highlights the efficacy of deep
learning and OCR in automating label extraction and recognition.
10th International Conference on Artificial Intelligence and Applications (AI...gerogepatton
10th International Conference on Artificial Intelligence and Applications (AI 2024) will provide an excellent international forum for sharing knowledge and results in theory, methodology and applications of Artificial Intelligence and its applications. The Conference looks for significant contributions to all major fields of the Artificial Intelligence, Soft Computing in theoretical and practical aspects. The aim of the Conference is to provide a platform to the researchers and practitioners from both academia as well as industry to meet and share cutting-edge development in the field.
International Journal of Artificial Intelligence & Applications (IJAIA)gerogepatton
The International Journal of Artificial Intelligence & Applications (IJAIA) is a bi monthly open access peer-reviewed journal that publishes articles which contribute new results in all areas of the Artificial Intelligence & Applications (IJAIA). It is an international journal intended for professionals and researchers in all fields of AI for researchers, programmers, and software and hardware manufacturers. The journal also aims to publish new attempts in the form of special issues on emerging areas in Artificial Intelligence and applications.
Research on Fuzzy C- Clustering Recursive Genetic Algorithm based on Cloud Co...gerogepatton
Aiming at the problems of poor local search ability and precocious convergence of fuzzy C-cluster
recursive genetic algorithm (FOLD++), a new fuzzy C-cluster recursive genetic algorithm based on
Bayesian function adaptation search (TS) was proposed by incorporating the idea of Bayesian function
adaptation search into fuzzy C-cluster recursive genetic algorithm. The new algorithm combines the
advantages of FOLD++ and TS. In the early stage of optimization, fuzzy C-cluster recursive genetic
algorithm is used to get a good initial value, and the individual extreme value pbest is put into Bayesian
function adaptation table. In the late stage of optimization, when the searching ability of fuzzy C-cluster
recursive genetic is weakened, the short term memory function of Bayesian function adaptation table in
Bayesian function adaptation search algorithm is utilized. Make it jump out of the local optimal solution,
and allow bad solutions to be accepted during the search. The improved algorithm is applied to function
optimization, and the simulation results show that the calculation accuracy and stability of the algorithm
are improved, and the effectiveness of the improved algorithm is verified
International Journal of Artificial Intelligence & Applications (IJAIA)gerogepatton
The International Journal of Artificial Intelligence & Applications (IJAIA) is a bi monthly open access peer-reviewed journal that publishes articles which contribute new results in all areas of the Artificial Intelligence & Applications (IJAIA). It is an international journal intended for professionals and researchers in all fields of AI for researchers, programmers, and software and hardware manufacturers. The journal also aims to publish new attempts in the form of special issues on emerging areas in Artificial Intelligence and applications.
10th International Conference on Artificial Intelligence and Soft Computing (...gerogepatton
10th International Conference on Artificial Intelligence and Soft Computing (AIS 2024) will
provide an excellent international forum for sharing knowledge and results in theory, methodology, and
applications of Artificial Intelligence, Soft Computing. The Conference looks for significant
contributions to all major fields of the Artificial Intelligence, Soft Computing in theoretical and practical
aspects. The aim of the Conference is to provide a platform to the researchers and practitioners from
both academia as well as industry to meet and share cutting-edge development in the field.
International Journal of Artificial Intelligence & Applications (IJAIA)gerogepatton
Employee attrition refers to the decrease in staff numbers within an organization due to various reasons.
As it has a negative impact on long-term growth objectives and workplace productivity, firms have
recognized it as a significant concern. To address this issue, organizations are increasingly turning to
machine-learning approaches to forecast employee attrition rates. This topic has gained significant
attention from researchers, especially in recent times. Several studies have applied various machinelearning methods to predict employee attrition, producing different resultsdepending on the employed
methods, factors, and datasets. However, there has been no comprehensive comparative review of multiple
studies applying machine-learning models to predict employee attrition to date. Therefore, this study aims
to fill this gap by providing an overview of research conducted on applying machine learning to predict
employee attrition from 2019 to February 2024. A literature review of relevant studies was conducted,
summarized, and classified. Most studies agree on conducting comparative experiments with multiple
predictive models to determine the most effective one.From this literature survey, the RF algorithm and
XGB ensemble method are repeatedly the best-performing, outperforming many other algorithms.
Additionally, the application of deep learning to employee attrition prediction issues also shows promise.
While there are discrepancies in the datasets used in previous studies, it is notable that the dataset
provided by IBM is the most widely utilized. This study serves as a concise review for new researchers,
facilitating their understanding of the primary techniques employed in predicting employee attrition and
highlighting recent research trends in this field. Furthermore, it provides organizations with insight into
the prominent factors affecting employee attrition, as identified by studies, enabling them to implement
solutions aimed at reducing attrition rates.
10th International Conference on Artificial Intelligence and Applications (AI...gerogepatton
10th International Conference on Artificial Intelligence and Applications (AIFU 2024) is a forum for presenting new advances and research results in the fields of Artificial Intelligence. The conference will bring together leading researchers, engineers and scientists in the domain of interest from around the world. The scope of the conference covers all theoretical and practical aspects of the Artificial Intelligence.
International Journal of Artificial Intelligence & Applications (IJAIA)gerogepatton
The International Journal of Artificial Intelligence & Applications (IJAIA) is a bi monthly open access peer-reviewed journal that publishes articles which contribute new results in all areas of the Artificial Intelligence & Applications (IJAIA). It is an international journal intended for professionals and researchers in all fields of AI for researchers, programmers, and software and hardware manufacturers. The journal also aims to publish new attempts in the form of special issues on emerging areas in Artificial Intelligence and applications.
THE TRANSFORMATION RISK-BENEFIT MODEL OF ARTIFICIAL INTELLIGENCE:BALANCING RI...gerogepatton
This paper summarizes the most cogent advantages and risks associated with Artificial Intelligence from an
in-depth review of the literature. Then the authors synthesize the salient risk-related models currently being
used in AI, technology and business-related scenarios. Next, in view of an updated context of AI along with
theories and models reviewed and expanded constructs, the writers propose a new framework called “The
Transformation Risk-Benefit Model of Artificial Intelligence” to address the increasing fears and levels of
AIrisk. Using the model characteristics, the article emphasizes practical and innovative solutions where
benefitsoutweigh risks and three use cases in healthcare, climate change/environment and cyber security to
illustrate unique interplay of principles, dimensions and processes of this powerful AI transformational
model.
The Building Blocks of QuestDB, a Time Series Databasejavier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
Learn SQL from basic queries to Advance queriesmanishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs.This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
Global Situational Awareness of A.I. and where its headedvikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be un-leashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfGetInData
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source ) Copilot?
How can we build one?
Architecture and evaluation
English to Arabic Machine Translation of Mathematical Documents
International Journal of Artificial Intelligence and Applications (IJAIA), Vol.14, No.6, November 2023
DOI: 10.5121/ijaia.2023.14602
ENGLISH TO ARABIC MACHINE TRANSLATION OF
MATHEMATICAL DOCUMENTS
Mustapha Eddahibi and Mohammed Mensouri
IMIS Laboratory, Ibnou Zohr University – Agadir Morocco
ABSTRACT
This paper describes the development of a machine translation system tailored specifically to LATEX
mathematical documents. The system focuses on translating English LATEX mathematical documents into
Arabic LATEX, catering to the growing demand for multilingual accessibility in scientific and
mathematical literature. With the vast proliferation of LATEX mathematical documents, the need for an
efficient and accurate translation system has become increasingly pressing. This paper addresses the
necessity for a robust translation tool that enables seamless communication and comprehension of complex
mathematical content across language barriers. The proposed system leverages a Transformer model as
the core of the translation pipeline, ensuring enhanced accuracy and fluency in the translated Arabic
LATEX documents. Furthermore, the integration of RyDArab, an Arabic mathematical TEX extension,
along with a rule-based translator for Arabic mathematical expressions, contributes to the precise
rendering of complex mathematical symbols and equations in the translated output. The paper discusses
the architecture and methodology of the developed system, highlighting its efficacy in bridging the
language gap in the domain of mathematical documentation.
KEYWORDS
Machine Translation System, LATEX Mathematical Documents, Arabic Mathematical Notation, Text-to-Text Transfer Transformer, RyDArab, Language Processing.
1. INTRODUCTION
Machine Translation (MT) is the conversion of text from one language to another using
computers. This process involves the use of programming languages and software to
accomplish translation tasks [1]. Various types of content, ranging from commercial and
business documents to scientific papers, instruction manuals, textbooks, and online materials,
require translation services. Presently, machine translation has found widespread application in
the language services industry and has become a common tool for professional translators.
However, machine translation has limitations and is not suitable for every type of textual content.
The challenges in Machine Translation stem from the disparities in grammatical structures
existing between the source and target languages. Furthermore, the complexity of translation can
be influenced by the specific category of the text. While the translation of everyday spoken
language is relatively straightforward, rendering poetry, philosophical treatises, scientific or
technical documents poses more intricate challenges [2]. The seamless translation of complex
mathematical documents from English to Arabic stands as a major challenge in the machine
translation field. Although significant progress has been achieved in this domain, the intricacies
of mathematical language, particularly in the context of LATEX [3] syntax, present a critical gap
that demands attention.
This study delves into uncharted territory, aiming to bridge the gap in the translation of
mathematical content into the Arabic language, a domain that remains relatively unexplored in the
current body of literature. Leveraging cutting-edge methodologies such as LATEX parsing,
mathematical expression tokenization, and the utilization of a Transformer for natural language
translation, alongside a rule-based translator for managing mathematical expressions, our
research aspires to contribute significantly to the burgeoning field of automated translation.
By building upon recent advancements in pre-trained language models, we seek to unravel the
complex interplay between mathematical expressions and linguistic structures, thereby unlocking
the potential for comprehensive and accurate translation.
While we acknowledge the inherent intricacies of handling symbolic mathematical
content, the absence of a suitable corpus remains a notable limitation. The intricate nature of
mathematical language, coupled with the unique challenges posed by Arabic notation,
necessitates a meticulous and nuanced approach to the translation process.
This research holds substantial promise for the educational landscape, benefitting researchers,
educators, and students alike. By facilitating a deeper comprehension of intricate mathematical
documents, our work endeavours to streamline the creation, modification, and dissemination of
these materials in both English and Arabic contexts. The digitized representation of such
documents not only fosters their creation and archiving but also enables their seamless exchange
via various communication networks. However, realizing this objective necessitates overcoming
several challenges, particularly in dealing with the distinctive nature of mathematical expressions
in Arabic notation.
2. RELATED WORK
The prominence of English as the universal language of science emerged a mere four centuries
ago, marking a significant turning point in global communication. Its unparalleled expansion
across the world post-World War II solidified its dominance, surpassing the influence of any
other language in history [4]. This trend has underscored the democratization of scientific
knowledge, granting access to a wider audience beyond traditional linguistic barriers.
Nonetheless, it emphasizes the critical importance of promoting education in one's native
language, highlighting the necessity for striking a balance between the advantages of a common
scientific language and the preservation of cultural and linguistic diversity for comprehensive
learning.
The significance of automated translation was recognized many years ago, and numerous
studies have since focused on machine translation within the scientific realm. Tehseen et al.
introduced an English-to-Urdu scientific text translator that employs term tagging and domain-
specific translation [5]. This translator was specifically designed for computer science
documents and was assessed using a self-generated corpus within the computer science
domain. The described algorithm, however, did not address the translation of mathematical
expressions or the handling of any particular file format.
Because the widespread adoption of LATEX stems from its capacity to manage complex
mathematical expressions, equations, and symbols with unparalleled precision and elegance, the
translation of documents in this format presents a great opportunity. Ohri et al. developed a machine
translation system named Polymath for converting LATEX documents with mathematical text
[6]. This system is capable of translating English LATEX content to French LATEX. It operates
by transforming the main content of an input LATEX document into English sentences with
mathematical tokens, followed by the application of a pre-trained Transformer-based translator
model.
The process of translating LATEX documents from English to French is relatively
straightforward compared to the challenges encountered when translating from English to Arabic.
This discrepancy arises from the fundamental differences between the Arabic and English
mathematical notations. The Arabic mathematical system, distinct in its structure and
representation, introduces complexities that are not encountered in the French translation process.
The nuances of Arabic mathematical symbols and expressions often require specific handling
techniques and an in-depth understanding of the language's unique grammatical rules and
conventions. In contrast, the English to French LATEX translation process proves to be more
seamless due to the greater similarity in notation and linguistic constructs between the two
languages.
3. ARABIC MATHEMATICAL NOTATION
In Arabic scientific documents there are two models of mathematical expressions: mixed and
pure Arabic notations. Mixed mathematical notation, used in some countries such as Morocco, is
the outcome of a literal word-to-word translation of French mathematical books. The syntactic
models of several formulations were transported to the new language, and the symbolic writing
was imported just as it was, without any changes.
Figure 1. Mixed Mathematical Notation
In the pure Arabic presentation, mathematical expressions spread from right to left and use
Arabic symbols from its alphabet. These symbols denote unknown variables and function names,
while common functions are replaced by their abbreviated names. Within this notation we can
distinguish two types: the first uses mirrored Latin symbols (for example, a mirrored summation
sign), while the second uses Arabic alphabet-based symbols for the same operators.
Figure 2. Pure Arabic Notation with mirrored symbols
Figure 3. Pure Arabic Notation with Arabic alphabet-based symbols
4. TRANSLATION OF THE LATEX DOCUMENT
The process of translating the LATEX document comprises two distinct phases, each contributing
to the accurate conversion of its contents. Initially, the document undergoes a comprehensive
parsing phase, allowing for the precise identification and isolation of its constituent elements.
Subsequently, the second phase entails the systematic extraction of mathematical expressions,
which are replaced by an indexed list of tokens (Exprr1, Exprr2, etc.). These identified
expressions, denoted Exprri, serve as novel entities during the natural language translation
process, functioning as placeholders for the mathematical components. Consequently, the
mathematical expressions are stored within a list, with the corresponding tokens acting as their
respective indices. The subsequent translation process relies on a rule-based translation function
tailored to the unique characteristics of mathematical notations. This function operates on the
specific mathematical expression, its associated token, and the corresponding notation type,
ensuring the seamless and accurate transformation of the mathematical content within the
document.
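The extraction and placeholder steps described above can be sketched as follows. The regular expression, helper names, and exact token spelling are illustrative assumptions, not the authors' implementation:

```python
import re

# Matches the common LaTeX math delimiters; a sketch, not a full parser.
MATH_PATTERN = re.compile(
    r"\$\$.*?\$\$"          # display math: $$ ... $$
    r"|\$.*?\$"             # inline math:  $ ... $
    r"|\\\[.*?\\\]"         # display math: \[ ... \]
    r"|\\\(.*?\\\)",        # inline math:  \( ... \)
    re.DOTALL,
)

def extract_math(text):
    """Replace each mathematical expression with an indexed token
    (Exprr1, Exprr2, ...) and store the expression under that token."""
    expressions = {}

    def repl(match):
        token = f"Exprr{len(expressions) + 1}"
        expressions[token] = match.group(0)
        return token

    return MATH_PATTERN.sub(repl, text), expressions

def restore_math(text, expressions):
    """Re-insert the (translated) expressions after language translation."""
    for token, expr in expressions.items():
        text = text.replace(token, expr)
    return text
```

The tokens act as opaque words during natural language translation, and the stored expressions are handed separately to the rule-based translator.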
4.1. English to Arabic Natural Language Translator
The natural language translation part of the system is based on the Transformer architecture [7].
Opting for Transformers over RNNs is supported by their capacity to process input data in
parallel and by the fact that they are not susceptible to the vanishing gradient issue. Transformers
proficiently grasp intricate connections among various components of the input and utilize
attention mechanisms to focus on relevant portions of the input sequence [8].
Before translation, a pre-processing phase is carried out. The content of the parsed LATEX
blocks is segmented into sentences. Certain formatting commands, such as \textbf and \textit,
which alter the style of one or more words within a sentence, are removed to streamline the text.
This simplification aims to enhance the translation, particularly considering that Arabic typographic
conventions do not employ bold, italic, or roman styles, among others.
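The removal of style commands can be sketched as a simple textual pass. The command list beyond \textbf and \textit, and the function name, are assumptions for illustration:

```python
import re

# Style commands whose wrapper is dropped while the argument is kept.
# Only \textbf and \textit are named in the text; the others are assumed.
STYLE_COMMANDS = ("textbf", "textit", "emph")

def strip_styles(sentence):
    """Unwrap \textbf{...}-style commands, keeping their argument.
    Handles non-nested arguments only, for brevity."""
    for cmd in STYLE_COMMANDS:
        sentence = re.sub(r"\\%s\{([^{}]*)\}" % cmd, r"\1", sentence)
    return sentence
```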
The transformer undergoes training and validation using datasets comprising English words,
sentences, and paragraphs along with their corresponding Arabic translations. Its architecture
consists of two primary components: an encoder and a decoder for sequence-to-sequence
translation, alongside an attention mechanism. Figure 4 provides an overview of the system.
Figure 4. Overview of the developed system
4.2. Translation of Mathematical Expression to Arabic Notation
The authors note that the existing pool of translations from English to Arabic for mathematical
expressions lacks sufficiency for the utilization of neural network-based translation methods.
Consequently, the current optimal approach is rule-based translation, despite its non-exhaustive
nature [9]. The translation process involves the following steps:
- Utilization of TEX [10] packages, including RyDArab [11] for typesetting Arabic mathematical expressions and curext [12] for handling stretched mathematical symbols in the resulting LATEX file.
- Introduction of the arabmath command to each mathematical expression to ensure proper alignment.
- Transliteration of alphabetical symbols into their Arabic equivalents.
- Implementation of the warabnum command for standard Western Arabic digits, or earabnum for Eastern Arabic digits (Hindi digits).
- Use of alpwithoutdots for alphabetic symbols without dots and alpwithdots for those with dots.
- Application of the funwithdots and funwithoutdots commands for elementary functions with and without dots, respectively.
When these options are incorporated into the preamble of the input file, the commands are
universally applied to all mathematical expressions. To address document encoding concerns, the
output file must adhere to either the ISO-8859-6 encoding system for ArabTEX [13] or UTF-8
for Omega [14].
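The encoding requirement above can be sketched as a small output-writing step. The function and variable names are illustrative, not part of the described system:

```python
# Encoding required by each target engine, as stated in the text:
# ISO-8859-6 for ArabTEX, UTF-8 for Omega.
ENGINE_ENCODINGS = {"arabtex": "iso-8859-6", "omega": "utf-8"}

def write_output(path, latex_source, engine):
    """Write the generated LATEX source with the encoding the chosen
    engine expects."""
    with open(path, "w", encoding=ENGINE_ENCODINGS[engine]) as f:
        f.write(latex_source)
```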
Furthermore, specific transformations are applied to mathematical symbols, such as replacing
sum with lsum or csum for its Arabic literal linear and curved equivalents, and with ssum for the
mirrored equivalent. Similar transformations are implemented for the product command.
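This symbol substitution can be illustrated with a small lookup-based rewrite. The variant names come from the text (lsum, csum, ssum); the dictionary shape and function name are assumptions, and the real rule set covers many more symbols:

```python
# One entry per notation variant quoted in the text.
SUM_VARIANTS = {
    "linear": r"\lsum",    # Arabic literal, linear form
    "curved": r"\csum",    # Arabic literal, curved form
    "mirrored": r"\ssum",  # mirrored summation sign
}

def translate_sum(expression, notation):
    r"""Replace the LaTeX \sum command with the variant matching the
    chosen Arabic notation."""
    return expression.replace(r"\sum", SUM_VARIANTS[notation])
```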
5. RESULTS, LIMITATIONS AND PERSPECTIVES
For system evaluation, we conducted a meticulous assessment using targeted documents that
explore mathematical concepts similar to the ones discovered in textbooks used in primary and
secondary education courses. The results for translating mathematical expressions were generally
satisfactory, showcasing the system's competency. However, some discrepancies were observed,
which can be attributed to the inherent limitations of the non-exhaustive rule-based method.
Notably, the machine translation performance, utilizing the Transformer model, exhibited
markedly low BLEU scores. The BLEU (Bilingual Evaluation Understudy) score [15] is a
frequently employed measure for assessing the quality of machine-generated translations by
comparing them to human-generated references. In this context, the low BLEU scores indicate
a misalignment between the machine-generated translations and the expected human references.
This suboptimal performance can be traced back to two key factors: the relatively modest size of
the dataset used for training and the substitution of mathematical expressions with "Exprri" tokens
in the natural language input.
The limited dataset size may have hindered the model's exposure to diverse linguistic patterns,
leading to challenges in accurately capturing the nuances of natural language. Additionally, the
abstraction introduced by the "Exprri" tokens may have contributed to a loss of contextual
information during the translation process, further impacting the system's ability to generate
linguistically fluent and contextually accurate translations. Addressing these issues by
augmenting the dataset and refining the tokenization approach could potentially enhance the
system's overall translation performance.
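For reference, the BLEU metric discussed above can be written compactly. This is a self-contained sketch (single reference, uniform weights, n = 1..4, simplified clipping and brevity penalty); real evaluations normally use a tested implementation such as sacreBLEU:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """BLEU with a single reference and uniform n-gram weights."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams, ref_ngrams = ngrams(cand, n), ngrams(ref, n)
        overlap = sum((cand_ngrams & ref_ngrams).values())  # clipped counts
        total = max(sum(cand_ngrams.values()), 1)
        if overlap == 0:          # any zero precision drives BLEU to 0
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # brevity penalty for candidates shorter than the reference
    bp = min(1.0, math.exp(1 - len(ref) / max(len(cand), 1)))
    return bp * math.exp(sum(log_precisions) / max_n)
```

A candidate identical to its reference scores 1.0, while one sharing no 4-gram with it scores 0.0, which is why short or heavily tokenized outputs, such as sentences full of placeholder tokens, tend to score poorly.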
In future work, we aim to advance towards a comprehensive machine translation system capable
of translating entire LATEX documents, including intricate LATEX commands, into Arabic. This
expansion is driven by the integration of the DadTEX [16] extension, designed for completely
Arabic TEX environments. This ambitious undertaking holds the potential to enhance the
accessibility of scientific literature by providing seamless translation of diverse document
components. The incorporation of DadTEX signifies a crucial step towards facilitating cross-
linguistic communication in scientific and mathematical domains. The future perspectives
emphasize a commitment to overcoming language barriers and promoting inclusivity in scientific
discourse.
6. CONCLUSIONS
In conclusion, this paper introduces a specialized machine translation system for LaTeX mathematical documents, concentrating on translating English LaTeX mathematical content into Arabic LaTeX. Driven by the growing demand for multilingual accessibility in scientific literature, especially for complex mathematical expressions documented in LaTeX, the proposed system employs a text-to-text transfer Transformer to enhance the accuracy and fluency of the translated Arabic LaTeX documents.
The integration of RyDArab, an Arabic mathematical TeX extension, and a rule-based translator for Arabic mathematical expressions ensures precise rendering of intricate symbols and equations in the translated output. The architecture, methodology, and performance evaluation discussed above underscore the system's effectiveness in overcoming language barriers in mathematical documentation.
International Journal of Artificial Intelligence and Applications (IJAIA), Vol.14, No.6, November 2023
Acknowledging the complexities of handling symbolic mathematical content, the study
underscores the necessity for an adequate corpus to improve translation accuracy. Despite
challenges posed by the unique nature of Arabic mathematical expressions, this research
promises significant benefits for education, aiding researchers, educators, and students in
comprehending intricate mathematical documents in both English and Arabic.
Addressing the largely unexplored task of translating mathematical content into Arabic, this paper recognizes and navigates the specific challenges posed by Arabic mathematical notations. It also surveys related work, emphasizing the importance of automated translation in scientific communication, with a particular focus on LaTeX documents. In summary, this study addresses a crucial gap in automated translation of mathematical documents, laying the groundwork for further advancements in multilingual accessibility within the scientific and mathematical community.
REFERENCES
[1] Leslie Lamport. LATEX: A Document Preparation System. Addison-Wesley, 1986.
[2] Azzeddine Lazrek, "RyDArab—Typesetting Arabic mathematical expressions," TUGboat, Volume 25, Number 2, pp. 141-149, 2004. http://www.tug.org/TUGboat/Articles/tb25-2/tb81lazrek.pdf
[3] Donald E. Knuth. The TEXbook. Volume A, Addison-Wesley, 1984.
[4] Haifeng Wang, Hua Wu, Zhongjun He, Liang Huang, Kenneth Ward Church, Progress in Machine
Translation, Engineering, Volume 18, 2022, Pages 143-153, ISSN 2095-8099,
https://doi.org/10.1016/j.eng.2021.03.023.
[5] A. A. Zhivotova, V. D. Berdonosov and E. V. Redkolis, "Improving the Quality of Scientific
Articles Machine Translation While Writing Original Text," 2020 International Multi-Conference
on Industrial Engineering and Modern Technologies (FarEastCon), Vladivostok, Russia, 2020, pp.
1-4, doi: 10.1109/FarEastCon50210.2020.9271442.
[6] Emma Steigerwald, Valeria Ramírez-Castañeda, Débora Y C Brandt, András Báldi, Julie Teresa Shapiro, Lynne Bowker, Rebecca D Tarvin, "Overcoming Language Barriers in Academia: Machine Translation Tools and a Vision for a Multilingual Future," BioScience, Volume 72, Number 10, pp. 988-998, 2022. doi: 10.1093/biosci/biac062
[7] Irsha Tehseen, Ghulam Rasool Tahir, Khadija Shakeel & Mubbashir Ali (2018). Corpus Based
Machine Translation for Scientific Text. In: Iliadis, L., Maglogiannis, I., Plagianakos, V. (eds)
Artificial Intelligence Applications and Innovations. AIAI 2018. IFIP Advances in Information and
Communication Technology, vol 519. Springer, Cham. https://doi.org/10.1007/978-3-319-92007-
8_17
[8] Aditya Ohri, Tanya Schmah, "Machine Translation of Mathematical Text," in IEEE Access, vol. 9, pp. 38078-38086, 2021, doi: 10.1109/ACCESS.2021.3063715.
[9] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez,
Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is All You Need. In Proceedings of the 31st
International Conference on Neural Information Processing Systems (Long Beach, California,
USA). 6000–6010.
[10] Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi
Zhou, Wei Li, Peter J. Liu: Exploring the Limits of Transfer Learning with a Unified Text-to-Text
Transformer. J. Mach. Learn. Res. 21: 140:1-140:67 (2020)
[11] Felix Petersen, Moritz Schubotz, Andre Greiner-Petter, and Bela Gipp. 2023. Neural Machine
Translation for Mathematical Formulae. In Proceedings of the 61st Annual Meeting of the
Association for Computational Linguistics (Volume 1: Long Papers), pages 11534–11550, Toronto,
Canada. Association for Computational Linguistics
[12] Azzeddine Lazrek, "CurExt—Typesetting variable-sized curved symbols," TUGboat, Volume 24, Number 3, pp. 323-327, 2003. EuroTeX2003: 14th European TEX Conference, Back to Typography, Brest, France, EuroTeX2003 Preprints, pp. 67-71, 2003.
[13] Klaus Lagally. ArabTEX - Typesetting Arabic with Vowels and Ligatures. Proceedings of the 7th
European TEX Conference, Prague 1992.
[14] Yannis Haralambous and John Plaice. Multilingual Typesetting with Ω a Case Study: Arabic.
Proceedings of the International Symposium on Multilingual Information Processing, pp. 137–154,
Tsukuba, 1997.
[15] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for
automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on
Association for Computational Linguistics (ACL '02). Association for Computational Linguistics,
USA, 311–318. https://doi.org/10.3115/1073083.1073135
[16] Mustapha Eddahibi, Azzeddine Lazrek and Khalid Sami, DadTeX—A full Arabic interface,
TUGboat, Volume 27, Number 2, pp. 154-158, 2006
AUTHORS
Prof. Mustapha Eddahibi received his PhD from Cadi Ayyad University, Marrakech, in 2007. He is currently a teacher-researcher in computer science at Ibn Zohr University and a former head of the decisional expert systems research team. His research interests are in intelligent computing, information engineering, and digital information encoding and processing.
Prof. Mohammed Mensouri received the M.S. degree in Networks and Telecommunication in 2008 from the Faculty of Sciences and Technology, Cadi Ayyad University, Marrakech, Morocco. In 2015, he received a Ph.D. in Computer Science from the Faculty of Sciences, Chouaib Doukkali University, El Jadida, Morocco. He is a professor at Ibn Zohr University, Agadir, Morocco. His research interests include information theory and channel coding, especially error-correcting codes.