Due to its wide applicability, natural language processing (NLP) has attracted significant research efforts to the machine learning and deep learning research community. Despite this, research works investigating NLP for the Albanian language are still limited. However, to the best of our knowledge, there is no literature review available, which presents a clear picture of what has been studied, argued, and established in the area. The main objective of this survey is to comprehensively review, analyze and discuss the state-of-the-art in NLP for the Albanian language. Here, we present an extensive study concerning the contribution of several authors that have contributed to the application of NLP to the Albanian language. Also, we present an overview of research carried out in the typical applications of NLP for the Albanian language. Finally, some future challenges and limitations of the area are discussed.
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGEkevig
Manipuri is both a minority and morphologically rich language with genetic features similar to Tibeto Burman languages. It has Subject-Object-Verb (SOV) order, agglutinative verb morphology and ismonosyllabic. Morphology and syntax are not clearly distinguished in this language. Natural Language Processing (NLP) is a useful research field of computer science that deals with processing of a large amount of natural language corpus. The NLP applications encompass E-Dictionary, Morphological
Analyzer, Reduplicated Multi-Word Expression (RMWE), Named Entity Recognition (NER), Part of Speech (POS) Tagging, Machine Translation (MT), Word Net, Word Sense Disambiguation (WSD) etc. In this paper, we present a study on the advancements in NLP applications for Manipuri language, at the same time presenting a comparison table of the approaches and techniques adopted and the results obtained of each of the applications followed by a detail discussion of each work.
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGEijnlc
Manipuri is both a minority and morphologically rich language with genetic features similar to Tibeto Burman languages. It has Subject-Object-Verb (SOV) order, agglutinative verb morphology and is monosyllabic. Morphology and syntax are not clearly distinguished in this language. Natural Language
Processing (NLP) is a useful research field of computer science that deals with processing of a large amount of natural language corpus. The NLP applications encompass E-Dictionary, Morphological Analyzer, Reduplicated Multi-Word Expression (RMWE), Named Entity Recognition (NER), Part of Speech
(POS) Tagging, Machine Translation (MT), Word Net, Word Sense Disambiguation (WSD) etc. In this paper, we present a study on the advancements in NLP applications for Manipuri language, at the same time presenting a comparison table of the approaches and techniques adopted and the results obtained of each of the applications followed by a detail discussion of each work.
This document provides an introduction to natural language processing (NLP) and discusses several key concepts:
- NLP aims to allow computers to understand human language in a way similar to humans. Examples of NLP applications discussed include spam filters, sentiment analysis tools, digital assistants, and language translators.
- The document outlines some of the core components of NLP systems, including natural language understanding to interpret text/speech meaning, and natural language generation to produce output text/speech.
- It introduces the NLTK (Natural Language Toolkit) as a popular Python package used for various NLP tasks like tokenization, tagging, parsing, and more. Basic NLTK structure and example modules are covered at a
Hidden markov model based part of speech tagger for sinhala languageijnlc
In this paper we present a fundamental lexical semantics of Sinhala language and a Hidden Markov Model (HMM) based Part of Speech (POS) Tagger for Sinhala language. In any Natural Language processing task, Part of Speech is a very vital topic, which involves analysing of the construction, behaviour and the dynamics of the language, which the knowledge could utilized in computational linguistics analysis and automation applications. Though Sinhala is a morphologically rich and agglutinative language, in which words are inflected with various grammatical features, tagging is very essential for further analysis of the language. Our research is based on statistical based approach, in which the tagging process is done by computing the tag sequence probability and the word-likelihood probability from the given corpus, where the linguistic knowledge is automatically extracted from the annotated corpus. The current tagger could reach more than 90% of accuracy for known words.
Syracuse UniversitySURFACEThe School of Information Studie.docxdeanmtaylor1545
Syracuse University
SURFACE
The School of Information Studies Faculty
Scholarship
School of Information Studies (iSchool)
2001
Natural Language Processing
Elizabeth D. Liddy
Syracuse University, [email protected]
Follow this and additional works at: http://surface.syr.edu/istpub
Part of the Library and Information Science Commons, and the Linguistics Commons
This Book Chapter is brought to you for free and open access by the School of Information Studies (iSchool) at SURFACE. It has been accepted for
inclusion in The School of Information Studies Faculty Scholarship by an authorized administrator of SURFACE. For more information, please contact
[email protected]
Recommended Citation
Liddy, E.D. 2001. Natural Language Processing. In Encyclopedia of Library and Information Science, 2nd Ed. NY. Marcel Decker, Inc.
http://surface.syr.edu?utm_source=surface.syr.edu%2Fistpub%2F63&utm_medium=PDF&utm_campaign=PDFCoverPages
http://surface.syr.edu/istpub?utm_source=surface.syr.edu%2Fistpub%2F63&utm_medium=PDF&utm_campaign=PDFCoverPages
http://surface.syr.edu/istpub?utm_source=surface.syr.edu%2Fistpub%2F63&utm_medium=PDF&utm_campaign=PDFCoverPages
http://surface.syr.edu/ischool?utm_source=surface.syr.edu%2Fistpub%2F63&utm_medium=PDF&utm_campaign=PDFCoverPages
http://surface.syr.edu/istpub?utm_source=surface.syr.edu%2Fistpub%2F63&utm_medium=PDF&utm_campaign=PDFCoverPages
http://network.bepress.com/hgg/discipline/1018?utm_source=surface.syr.edu%2Fistpub%2F63&utm_medium=PDF&utm_campaign=PDFCoverPages
http://network.bepress.com/hgg/discipline/371?utm_source=surface.syr.edu%2Fistpub%2F63&utm_medium=PDF&utm_campaign=PDFCoverPages
mailto:[email protected]
Natural Language Processing
1
INTRODUCTION
Natural Language Processing (NLP) is the computerized approach to analyzing text that
is based on both a set of theories and a set of technologies. And, being a very active area
of research and development, there is not a single agreed-upon definition that would
satisfy everyone, but there are some aspects, which would be part of any knowledgeable
person’s definition. The definition I offer is:
Definition: Natural Language Processing is a theoretically motivated range of
computational techniques for analyzing and representing naturally occurring texts
at one or more levels of linguistic analysis for the purpose of achieving human-like
language processing for a range of tasks or applications.
Several elements of this definition can be further detailed. Firstly the imprecise notion of
‘range of computational techniques’ is necessary because there are multiple methods or
techniques from which to choose to accomplish a particular type of language analysis.
‘Naturally occurring texts’ can be of any language, mode, genre, etc. The texts can be
oral or written. The only requirement is that they be in a language used by humans to
communicate to one another. Also, the text being analyzed should not be specifically
constru.
A New Approach: Automatically Identify Proper Noun from Bengali Sentence for ...Syeful Islam
More than hundreds of millions of people of almost all levels of education and attitudes from different country communicate with
each other for different using various languages. Machine translation is highly demanding due to increasing the usage of web
based Communication. One of the major problem of Bengali translation is identified a naming word from a sentence, which is
relatively simple in English language, because such entities start with a capital letter. In Bangla we do not have concept of small
or capital letters and there is huge no. of different naming entity available in Bangla. Thus we find difficulties in understanding
whether a word is a proper noun or not. Here we have introduce a new approach to identify proper noun from a Bengali sentence
for UNL without storing huge no. of naming entity in word dictionary. The goal is to make possible Bangla sentence conversion
to UNL and vice versa with minimal storing word in dictionary.
This document presents a research proposal to explore the challenges of natural language processing (NLP) for Ethiopian languages. The proposal outlines the background of the problem, which is the limited progress in NLP for under-resourced Ethiopian languages due to their morphological complexity, orthographic variations, and lack of linguistic resources. The objectives are to identify challenges, propose techniques for Ethiopian language NLP, and explore machine translation strategies. A qualitative case study approach involving literature review and interviews is proposed to analyze the NLP challenges of Amharic, Afaan Oromoo, and Tigrinya languages. The significance is to promote digital inclusion and language preservation through the development of NLP technologies for Ethiopia's diverse languages
Design Analysis Rules to Identify Proper Noun from Bengali Sentence for Univ...Syeful Islam
Abstract—Now-a-days hundreds of millions of people of
almost all levels of education and attitudes from different
country communicate with each other for different
purposes and perform their jobs on internet or other
communication medium using various languages. Not all
people know all language; therefore it is very difficult to
communicate or works on various languages. In this
situation the computer scientist introduce various inter
language translation program (Machine translation). UNL
is such kind of inter language translation program. One of
the major problem of UNL is identified a name from a
sentence, which is relatively simple in English language,
because such entities start with a capital letter. In Bangla
we do not have concept of small or capital letters. Thus
we find difficulties in understanding whether a word is a
proper noun or not. Here we have proposed analysis rules
to identify proper noun from a sentence and established
post converter which translate the name entity from
Bangla to UNL. The goal is to make possible Bangla
sentence conversion to UNL and vice versa. UNL system
prove that the theoretical analysis of our proposed system
able to identify proper noun from Bangla sentence and
produce relative Universal word for UNL.
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGEkevig
Manipuri is both a minority and morphologically rich language with genetic features similar to Tibeto Burman languages. It has Subject-Object-Verb (SOV) order, agglutinative verb morphology and ismonosyllabic. Morphology and syntax are not clearly distinguished in this language. Natural Language Processing (NLP) is a useful research field of computer science that deals with processing of a large amount of natural language corpus. The NLP applications encompass E-Dictionary, Morphological
Analyzer, Reduplicated Multi-Word Expression (RMWE), Named Entity Recognition (NER), Part of Speech (POS) Tagging, Machine Translation (MT), Word Net, Word Sense Disambiguation (WSD) etc. In this paper, we present a study on the advancements in NLP applications for Manipuri language, at the same time presenting a comparison table of the approaches and techniques adopted and the results obtained of each of the applications followed by a detail discussion of each work.
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGEijnlc
Manipuri is both a minority and morphologically rich language with genetic features similar to Tibeto Burman languages. It has Subject-Object-Verb (SOV) order, agglutinative verb morphology and is monosyllabic. Morphology and syntax are not clearly distinguished in this language. Natural Language
Processing (NLP) is a useful research field of computer science that deals with processing of a large amount of natural language corpus. The NLP applications encompass E-Dictionary, Morphological Analyzer, Reduplicated Multi-Word Expression (RMWE), Named Entity Recognition (NER), Part of Speech
(POS) Tagging, Machine Translation (MT), Word Net, Word Sense Disambiguation (WSD) etc. In this paper, we present a study on the advancements in NLP applications for Manipuri language, at the same time presenting a comparison table of the approaches and techniques adopted and the results obtained of each of the applications followed by a detail discussion of each work.
This document provides an introduction to natural language processing (NLP) and discusses several key concepts:
- NLP aims to allow computers to understand human language in a way similar to humans. Examples of NLP applications discussed include spam filters, sentiment analysis tools, digital assistants, and language translators.
- The document outlines some of the core components of NLP systems, including natural language understanding to interpret text/speech meaning, and natural language generation to produce output text/speech.
- It introduces the NLTK (Natural Language Toolkit) as a popular Python package used for various NLP tasks like tokenization, tagging, parsing, and more. Basic NLTK structure and example modules are covered at a
Hidden markov model based part of speech tagger for sinhala languageijnlc
In this paper we present a fundamental lexical semantics of Sinhala language and a Hidden Markov Model (HMM) based Part of Speech (POS) Tagger for Sinhala language. In any Natural Language processing task, Part of Speech is a very vital topic, which involves analysing of the construction, behaviour and the dynamics of the language, which the knowledge could utilized in computational linguistics analysis and automation applications. Though Sinhala is a morphologically rich and agglutinative language, in which words are inflected with various grammatical features, tagging is very essential for further analysis of the language. Our research is based on statistical based approach, in which the tagging process is done by computing the tag sequence probability and the word-likelihood probability from the given corpus, where the linguistic knowledge is automatically extracted from the annotated corpus. The current tagger could reach more than 90% of accuracy for known words.
Syracuse UniversitySURFACEThe School of Information Studie.docxdeanmtaylor1545
Syracuse University
SURFACE
The School of Information Studies Faculty
Scholarship
School of Information Studies (iSchool)
2001
Natural Language Processing
Elizabeth D. Liddy
Syracuse University, [email protected]
Follow this and additional works at: http://surface.syr.edu/istpub
Part of the Library and Information Science Commons, and the Linguistics Commons
This Book Chapter is brought to you for free and open access by the School of Information Studies (iSchool) at SURFACE. It has been accepted for
inclusion in The School of Information Studies Faculty Scholarship by an authorized administrator of SURFACE. For more information, please contact
[email protected]
Recommended Citation
Liddy, E.D. 2001. Natural Language Processing. In Encyclopedia of Library and Information Science, 2nd Ed. NY. Marcel Decker, Inc.
http://surface.syr.edu?utm_source=surface.syr.edu%2Fistpub%2F63&utm_medium=PDF&utm_campaign=PDFCoverPages
http://surface.syr.edu/istpub?utm_source=surface.syr.edu%2Fistpub%2F63&utm_medium=PDF&utm_campaign=PDFCoverPages
http://surface.syr.edu/istpub?utm_source=surface.syr.edu%2Fistpub%2F63&utm_medium=PDF&utm_campaign=PDFCoverPages
http://surface.syr.edu/ischool?utm_source=surface.syr.edu%2Fistpub%2F63&utm_medium=PDF&utm_campaign=PDFCoverPages
http://surface.syr.edu/istpub?utm_source=surface.syr.edu%2Fistpub%2F63&utm_medium=PDF&utm_campaign=PDFCoverPages
http://network.bepress.com/hgg/discipline/1018?utm_source=surface.syr.edu%2Fistpub%2F63&utm_medium=PDF&utm_campaign=PDFCoverPages
http://network.bepress.com/hgg/discipline/371?utm_source=surface.syr.edu%2Fistpub%2F63&utm_medium=PDF&utm_campaign=PDFCoverPages
mailto:[email protected]
Natural Language Processing
1
INTRODUCTION
Natural Language Processing (NLP) is the computerized approach to analyzing text that
is based on both a set of theories and a set of technologies. And, being a very active area
of research and development, there is not a single agreed-upon definition that would
satisfy everyone, but there are some aspects, which would be part of any knowledgeable
person’s definition. The definition I offer is:
Definition: Natural Language Processing is a theoretically motivated range of
computational techniques for analyzing and representing naturally occurring texts
at one or more levels of linguistic analysis for the purpose of achieving human-like
language processing for a range of tasks or applications.
Several elements of this definition can be further detailed. Firstly the imprecise notion of
‘range of computational techniques’ is necessary because there are multiple methods or
techniques from which to choose to accomplish a particular type of language analysis.
‘Naturally occurring texts’ can be of any language, mode, genre, etc. The texts can be
oral or written. The only requirement is that they be in a language used by humans to
communicate to one another. Also, the text being analyzed should not be specifically
constru.
A New Approach: Automatically Identify Proper Noun from Bengali Sentence for ...Syeful Islam
More than hundreds of millions of people of almost all levels of education and attitudes from different country communicate with
each other for different using various languages. Machine translation is highly demanding due to increasing the usage of web
based Communication. One of the major problem of Bengali translation is identified a naming word from a sentence, which is
relatively simple in English language, because such entities start with a capital letter. In Bangla we do not have concept of small
or capital letters and there is huge no. of different naming entity available in Bangla. Thus we find difficulties in understanding
whether a word is a proper noun or not. Here we have introduce a new approach to identify proper noun from a Bengali sentence
for UNL without storing huge no. of naming entity in word dictionary. The goal is to make possible Bangla sentence conversion
to UNL and vice versa with minimal storing word in dictionary.
This document presents a research proposal to explore the challenges of natural language processing (NLP) for Ethiopian languages. The proposal outlines the background of the problem, which is the limited progress in NLP for under-resourced Ethiopian languages due to their morphological complexity, orthographic variations, and lack of linguistic resources. The objectives are to identify challenges, propose techniques for Ethiopian language NLP, and explore machine translation strategies. A qualitative case study approach involving literature review and interviews is proposed to analyze the NLP challenges of Amharic, Afaan Oromoo, and Tigrinya languages. The significance is to promote digital inclusion and language preservation through the development of NLP technologies for Ethiopia's diverse languages
Design Analysis Rules to Identify Proper Noun from Bengali Sentence for Univ...Syeful Islam
Abstract—Now-a-days hundreds of millions of people of
almost all levels of education and attitudes from different
country communicate with each other for different
purposes and perform their jobs on internet or other
communication medium using various languages. Not all
people know all language; therefore it is very difficult to
communicate or works on various languages. In this
situation the computer scientist introduce various inter
language translation program (Machine translation). UNL
is such kind of inter language translation program. One of
the major problem of UNL is identified a name from a
sentence, which is relatively simple in English language,
because such entities start with a capital letter. In Bangla
we do not have concept of small or capital letters. Thus
we find difficulties in understanding whether a word is a
proper noun or not. Here we have proposed analysis rules
to identify proper noun from a sentence and established
post converter which translate the name entity from
Bangla to UNL. The goal is to make possible Bangla
sentence conversion to UNL and vice versa. UNL system
prove that the theoretical analysis of our proposed system
able to identify proper noun from Bangla sentence and
produce relative Universal word for UNL.
The document discusses the field of computational linguistics, defining it as the scientific study of language from a computational perspective. It involves providing computational models of linguistic phenomena and using computational techniques and linguistic theories to solve problems in natural language processing. Computational linguistics aims to automatically process and understand natural language by constructing computer programs. The field has its roots in the 1940s-1950s with the development of code breaking machines and computers. Major conferences and journals in the field are associated with the Association for Computational Linguistics.
CONSTRUCTION OF ENGLISH-BODO PARALLEL TEXT CORPUS FOR STATISTICAL MACHINE TRA...kevig
Corpus is a large collection of homogeneous and authentic written texts (or speech) of a particular natural language which exists in machine readable form. The scope of the corpus is endless in Computational Linguistics and Natural Language Processing (NLP). Parallel corpus is a very useful resource for most of the applications of NLP, especially for Statistical Machine Translation (SMT). The SMT is the most popular approach of Machine Translation (MT) nowadays and it can produce high quality translation result based on huge amount of aligned parallel text corpora in both the source and target languages. Although Bodo is a recognized natural language of India and co-official languages of Assam, still the machine readable information of Bodo language is very low. Therefore, to expand the computerized information of the language, English to Bodo SMT system has been developed. But this paper mainly focuses on building English-Bodo parallel text corpora to implement the English to Bodo SMT system using Phrase-Based SMT approach. We have designed an E-BPTC (English-Bodo Parallel Text Corpus) creator tool and have been constructed General and Newspaper domains English-Bodo parallel text corpora. Finally, the quality of the constructed parallel text corpora has been tested using two evaluation techniques in the SMT system.
CONSTRUCTION OF ENGLISH-BODO PARALLEL TEXT CORPUS FOR STATISTICAL MACHINE TRA...ijnlc
Corpus is a large collection of homogeneous and authentic written texts (or speech) of a particular natural language which exists in machine readable form. The scope of the corpus is endless in Computational Linguistics and Natural Language Processing (NLP). Parallel corpus is a very useful resource for most of the applications of NLP, especially for Statistical Machine Translation (SMT). The SMT is the most popular approach of Machine Translation (MT) nowadays and it can produce high quality translation
result based on huge amount of aligned parallel text corpora in both the source and target languages.
Although Bodo is a recognized natural language of India and co-official languages of Assam, still the
machine readable information of Bodo language is very low. Therefore, to expand the computerized
information of the language, English to Bodo SMT system has been developed. But this paper mainly
focuses on building English-Bodo parallel text corpora to implement the English to Bodo SMT system using
Phrase-Based SMT approach. We have designed an E-BPTC (English-Bodo Parallel Text Corpus) creator
tool and have been constructed General and Newspaper domains English-Bodo parallel text corpora.
Finally, the quality of the constructed parallel text corpora has been tested using two evaluation techniques
in the SMT system.
Development of Bi-Directional English To Yoruba Translator for Real-Time Mobi...CSCJournals
- The document describes the development of a bi-directional English to Yoruba translator for real-time mobile chatting.
- It uses a hybrid machine translation approach combining rule-based and corpus-based methods. Parallel corpora and computational rules were developed and stored in databases. Dictionaries containing words in both languages and their parts of speech were also created.
- The system was implemented using PHP for server-side scripting, JSON for data interchange, and an Android app for the user interface. Testing showed 95% accuracy compared to Google Translate.
**TOP 10 NATURAL LANGUAGE PROCESSING PAPERS: RECOMMENDED READING – LANGUAGE R...kevig
Natural Language Processing is a programmed approach to analyze text that is based on both a set of theories and a set of technologies. This forum aims to bring together researchers who have designed and build software that will analyze, understand, and generate languages that humans use naturally to address computers.
A REVIEW ON THE PROGRESS OF NATURAL LANGUAGE PROCESSING IN INDIAJoe Osborn
This document summarizes a research paper on the progress of natural language processing (NLP) in India. It discusses some of the key challenges in NLP research in India, including the complexity of languages, ambiguity, and lack of annotated corpora. It also outlines several potential benefits and applications of NLP in India, such as bridging the digital divide, enabling e-governance services in local languages, machine translation, and powering local language search engines and content. Overall, the document provides an overview of the state of NLP research in India and its prospects for the future.
This document provides a literature review and bibliometric analysis of natural language processing (NLP) applications in library and information science. It identifies 6,607 relevant publications on topics like information retrieval, machine translation, and text summarization. The document analyzes the historical trends, core journals, and prominent publications in the field. It also describes how NLP can enhance bibliometric studies by automating tasks like information extraction from texts. The bibliometric analysis reveals that library and information science is the most prominent subject category for NLP publications, followed by computer science and engineering.
Dictionary Entries for Bangla Consonant Ended Roots in Universal Networking L...Waqas Tariq
The Universal Networking Language (UNL) deals with the communication across nations of different languages and involves with many different related discipline such as linguistics, epistemology, computer science etc. It helps to overcome the language barrier among people of different nations to solve problems emerging from current globalization trends and geopolitical interdependence. We are working to include Bangla language in the UNL system so that Bangla language can be converted to UNL expressions. As a part of this process currently we are working on Bangla Consonant Ended Verb Roots and trying to develop lexical or dictionary entries for the Consonant Ended Verb Roots. In this paper, we have presented our work by describing Bangla verb, Verb root, Verbal Inflections and then finally showed the dictionary entries for the consonant ended roots.
Computational linguistics originated in the 1950s with a focus on machine translation. It aims to use computers to process and understand human language. There are two main goals - theoretical, to better understand how humans use language; and practical, to build tools like machine translation. It draws from linguistics, computer science, psychology, and logic. Some key applications include spelling/grammar checkers, style checkers, information retrieval systems, and computer-assisted language learning tools. However, natural language processing poses challenges due to issues with phonology, morphology, syntax, semantics, and pragmatics.
This document discusses the need to develop open natural language processing (NLP) resources and tools for the Serbian language. It notes that while large language models are making progress, they are currently limited for Serbian due to a lack of high-quality training data. The document proposes an initiative to create foundational NLP resources for Serbian, including labeled datasets and fine-tuned models, which would help spur the local development of NLP solutions. It provides an update on the initial work underway, and calls for additional organizations to support and contribute to the initiative.
A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...Syeful Islam
More than hundreds of millions of people of almost all levels of education and attitudes from
different country communicate with each other for different using various languages. Machine
translation is highly demanding due to increasing the usage of web based Communication. One of
the major problem of Bengali translation is identified a naming word from a sentence, which is
relatively simple in English language, because such entities start with a capital letter. In Bangla we
do not have concept of small or capital letters and there is huge no. of different naming entity
available in Bangla. Thus we find difficulties in understanding whether a word is a naming word
(proper noun) or not. Here we have introduced a new approach to identify naming word from a
Bengali sentence for UNL without storing huge no. of naming entity in word dictionary. The goal is
to make possible Bangla sentence conversion to UNL and vice versa with minimal storing word in
dictionary.
Design and Development of Morphological Analyzer for Tigrigna Verbs using Hyb...kevig
Morphological analyzer is the basic for various high level NLP applications such as information retrieval, spell checking, grammar checking, machine translation, speech recognition, POS tagging and automatic sentence construction. This paper is carefully designed for design and analysis of morphological analyzer Tigrigna verbs using hybrid of memory learning and rules based approaches. The experiment have conducted using python 3 where TiMBL algorithms IB2 and TRIBL2, and Finite State Transducer rules are used. The performance of the system has been evaluated using 10 fold cross validation technique. Testing conducted using optimized parameter settings for regular verbs and linguistic rules of the Tigrigna language allomorph and phonology for the irregular verbs. The accuracy on the memory based approach with optimized parameters of TiMBL algorithm IB2 and TRIBL2 was 93.24% and 92.31%, respectively. Finally, the hybrid approach had the actual performance of 95.6% using linguistic rules for handling irregular and copula verbs.
This document discusses computational linguistics, including its origins, main application areas, and approaches. Computational linguistics originated from efforts in the 1950s to use computers for machine translation between languages. It aims to develop natural language processing applications like machine translation, speech recognition, and grammar checking. Research employs various approaches including developmental, structural, production-focused, and comprehension-focused methods. The field involves both computer science and linguistics.
UNL-ization of Numbers and Ordinals in Punjabi with IANijnlc
In the field of Natural Language Processing, Universal Networking Language (UNL) has been an area of
immense interest among researchers during last couple of years. Universal Networking Language (UNL) is
an artificial Language used for representing information in a natural-language-independent format. This
paper presents UNL-ization of Punjabi sentences with the help of different examples, containing numbers
and ordinals written in words, using IAN (Interactive Analyzer) tool. In UNL approach, UNL-ization is a
process of converting natural language resource to UNL and NL-ization, is a process of generating a
natural language resource out of a UNL graph. IAN processes input sentences with the help of TRules and
Dictionary entries. The proposed system performs the UNL-ization of up to fourteen digit number and
ordinals, written in words in Punjabi language, with the help of 104 dictionary entries and 67 TRules. The
system is tested on a sample of 150 random Punjabi Numbers and Ordinals, written in words, and its FMeasure
comes out to be 1.000 (on a scale of 0 to 1).
A Comprehensive Study On Natural Language Processing And Natural Language Int...Scott Bou
The document provides a comprehensive overview of natural language processing (NLP) and natural language interfaces to databases (NLIDBs). It discusses the different levels of NLP including morphological, lexical, syntactic, semantic and pragmatic analysis. It also describes various approaches used to develop NLIDBs, including symbolic, empirical, connectionist and maximum entropy approaches. Additionally, it outlines the history of NLP and NLIDBs, covering early work in machine translation and historically developed systems like LUNAR.
We start with a linguistic discussion of language, its properties, and the study of language in philosophy and linguistics. We then investigate natural languages, controlled languages, and artificial languages to emphasise the human ability to control and construct languages. At the end, we arrive at the notion of software languages as means to communicate software between people.
Different valuable tools for Arabic sentiment analysis: a comparative evaluat...IJECEIAES
Arabic Natural language processing (ANLP) is a subfield of artificial intelligence (AI) that tries to build various applications in the Arabic language like Arabic sentiment analysis (ASA) that is the operation of classifying the feelings and emotions expressed for defining the attitude of the writer (neutral, negative or positive). In order to work on ASA, researchers can use various tools in their research projects without explaining the cause behind this use, or they choose a set of libraries according to their knowledge about a specific programming language. Because of their libraries' abundance in the ANLP field, especially in ASA, we are relying on JAVA and Python programming languages in our research work. This paper relies on making an in-depth comparative evaluation of different valuable Python and Java libraries to deduce the most useful ones in Arabic sentiment analysis (ASA). According to a large variety of great and influential works in the domain of ASA, we deduce that the NLTK, Gensim and TextBlob libraries are the most useful for Python ASA task. In connection with Java ASA libraries, we conclude that Weka and CoreNLP tools are the most used, and they have great results in this research domain.
The document discusses language detection and identification (LDI). It begins by noting the increasing number of languages in the world and the need to identify unknown languages. It then outlines the objectives of developing an LDI system that can identify the language of scanned text images, including printed and handwritten text, while maintaining the structure of tabular data. The document provides an extensive literature review on the history of optical character recognition and language identification techniques. It discusses early work from the 1940s through the 2000s and highlights several influential studies on automatic language identification from the 1970s and 1990s.
DESIGN AND DEVELOPMENT OF MORPHOLOGICAL ANALYZER FOR TIGRIGNA VERBS USING HYB...kevig
Morphological analyzer is the base for various high-level NLP applications such as information retrieval,
spell checking, grammar checking, machine translation, speech recognition, POS tagging and automatic
sentence construction. This paper is carefully designed for design and analysis of morphological analyzer
Tigrigna verbs using hybrid of memory learning and rules based approaches. The experiment have
conducted using Python 3 where TiMBL algorithms IB2 and TRIBL2, and Finite State Transducer rules
are used. The performance of the system has been evaluated using 10 fold cross validation technique.
Testing was conducted using optimized parameter settings for regular verbs and linguistic rules of the
Tigrigna language allomorph and phonology for the irregular verbs. The accuracy of the memory based
approach with optimized parameters of TiMBL algorithm IB2 and TRIBL2 was 93.24% and 92.31%,
respectively. Finally, the hybrid approach had an actual performance of 95.6% using linguistic rules for
handling irregular and copula verbs.
NLP stands for Natural Language Processing which is a field of artificial intelligence that helps machines understand, interpret and manipulate human language. The key developments in NLP include machine translation in the 1940s-1960s, the introduction of artificial intelligence concepts in 1960-1980s and the use of machine learning algorithms after 1980. Modern NLP involves applications like speech recognition, machine translation and text summarization. It consists of natural language understanding to analyze language and natural language generation to produce language. While NLP has advantages like providing fast answers, it also has challenges like ambiguity and limited ability to understand context.
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...IJECEIAES
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to
precisely delineate tumor boundaries from magnetic resonance imaging (MRI)
scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating
the state-of-the-art Deeplabv3+ architecture with the ResNet18 backbone. The
model is rigorously trained and evaluated, exhibiting remarkable performance
metrics, including an impressive global accuracy of 99.286%, a high-class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted
IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of
our proposed model. These findings underscore the model’s competence in precise brain tumor localization, underscoring its potential to revolutionize medical
image analysis and enhance healthcare outcomes. This research paves the way
for future exploration and optimization of advanced CNN models in medical
imaging, emphasizing addressing false positives and resource efficiency.
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
More Related Content
Similar to Natural language processing for Albanian: a state-of-the-art survey
The document discusses the field of computational linguistics, defining it as the scientific study of language from a computational perspective. It involves providing computational models of linguistic phenomena and using computational techniques and linguistic theories to solve problems in natural language processing. Computational linguistics aims to automatically process and understand natural language by constructing computer programs. The field has its roots in the 1940s-1950s with the development of code breaking machines and computers. Major conferences and journals in the field are associated with the Association for Computational Linguistics.
CONSTRUCTION OF ENGLISH-BODO PARALLEL TEXT CORPUS FOR STATISTICAL MACHINE TRA...kevig
Corpus is a large collection of homogeneous and authentic written texts (or speech) of a particular natural language which exists in machine readable form. The scope of the corpus is endless in Computational Linguistics and Natural Language Processing (NLP). Parallel corpus is a very useful resource for most of the applications of NLP, especially for Statistical Machine Translation (SMT). The SMT is the most popular approach of Machine Translation (MT) nowadays and it can produce high quality translation result based on huge amount of aligned parallel text corpora in both the source and target languages. Although Bodo is a recognized natural language of India and co-official languages of Assam, still the machine readable information of Bodo language is very low. Therefore, to expand the computerized information of the language, English to Bodo SMT system has been developed. But this paper mainly focuses on building English-Bodo parallel text corpora to implement the English to Bodo SMT system using Phrase-Based SMT approach. We have designed an E-BPTC (English-Bodo Parallel Text Corpus) creator tool and have been constructed General and Newspaper domains English-Bodo parallel text corpora. Finally, the quality of the constructed parallel text corpora has been tested using two evaluation techniques in the SMT system.
CONSTRUCTION OF ENGLISH-BODO PARALLEL TEXT CORPUS FOR STATISTICAL MACHINE TRA...ijnlc
Corpus is a large collection of homogeneous and authentic written texts (or speech) of a particular natural language which exists in machine readable form. The scope of the corpus is endless in Computational Linguistics and Natural Language Processing (NLP). Parallel corpus is a very useful resource for most of the applications of NLP, especially for Statistical Machine Translation (SMT). The SMT is the most popular approach of Machine Translation (MT) nowadays and it can produce high quality translation
result based on huge amount of aligned parallel text corpora in both the source and target languages.
Although Bodo is a recognized natural language of India and co-official languages of Assam, still the
machine readable information of Bodo language is very low. Therefore, to expand the computerized
information of the language, English to Bodo SMT system has been developed. But this paper mainly
focuses on building English-Bodo parallel text corpora to implement the English to Bodo SMT system using
Phrase-Based SMT approach. We have designed an E-BPTC (English-Bodo Parallel Text Corpus) creator
tool and have been constructed General and Newspaper domains English-Bodo parallel text corpora.
Finally, the quality of the constructed parallel text corpora has been tested using two evaluation techniques
in the SMT system.
Development of Bi-Directional English To Yoruba Translator for Real-Time Mobi...CSCJournals
- The document describes the development of a bi-directional English to Yoruba translator for real-time mobile chatting.
- It uses a hybrid machine translation approach combining rule-based and corpus-based methods. Parallel corpora and computational rules were developed and stored in databases. Dictionaries containing words in both languages and their parts of speech were also created.
- The system was implemented using PHP for server-side scripting, JSON for data interchange, and an Android app for the user interface. Testing showed 95% accuracy compared to Google Translate.
**TOP 10 NATURAL LANGUAGE PROCESSING PAPERS: RECOMMENDED READING – LANGUAGE R...kevig
Natural Language Processing is a programmed approach to analyze text that is based on both a set of theories and a set of technologies. This forum aims to bring together researchers who have designed and build software that will analyze, understand, and generate languages that humans use naturally to address computers.
A REVIEW ON THE PROGRESS OF NATURAL LANGUAGE PROCESSING IN INDIAJoe Osborn
This document summarizes a research paper on the progress of natural language processing (NLP) in India. It discusses some of the key challenges in NLP research in India, including the complexity of languages, ambiguity, and lack of annotated corpora. It also outlines several potential benefits and applications of NLP in India, such as bridging the digital divide, enabling e-governance services in local languages, machine translation, and powering local language search engines and content. Overall, the document provides an overview of the state of NLP research in India and its prospects for the future.
This document provides a literature review and bibliometric analysis of natural language processing (NLP) applications in library and information science. It identifies 6,607 relevant publications on topics like information retrieval, machine translation, and text summarization. The document analyzes the historical trends, core journals, and prominent publications in the field. It also describes how NLP can enhance bibliometric studies by automating tasks like information extraction from texts. The bibliometric analysis reveals that library and information science is the most prominent subject category for NLP publications, followed by computer science and engineering.
Dictionary Entries for Bangla Consonant Ended Roots in Universal Networking L...Waqas Tariq
The Universal Networking Language (UNL) deals with the communication across nations of different languages and involves with many different related discipline such as linguistics, epistemology, computer science etc. It helps to overcome the language barrier among people of different nations to solve problems emerging from current globalization trends and geopolitical interdependence. We are working to include Bangla language in the UNL system so that Bangla language can be converted to UNL expressions. As a part of this process currently we are working on Bangla Consonant Ended Verb Roots and trying to develop lexical or dictionary entries for the Consonant Ended Verb Roots. In this paper, we have presented our work by describing Bangla verb, Verb root, Verbal Inflections and then finally showed the dictionary entries for the consonant ended roots.
Computational linguistics originated in the 1950s with a focus on machine translation. It aims to use computers to process and understand human language. There are two main goals - theoretical, to better understand how humans use language; and practical, to build tools like machine translation. It draws from linguistics, computer science, psychology, and logic. Some key applications include spelling/grammar checkers, style checkers, information retrieval systems, and computer-assisted language learning tools. However, natural language processing poses challenges due to issues with phonology, morphology, syntax, semantics, and pragmatics.
This document discusses the need to develop open natural language processing (NLP) resources and tools for the Serbian language. It notes that while large language models are making progress, they are currently limited for Serbian due to a lack of high-quality training data. The document proposes an initiative to create foundational NLP resources for Serbian, including labeled datasets and fine-tuned models, which would help spur the local development of NLP solutions. It provides an update on the initial work underway, and calls for additional organizations to support and contribute to the initiative.
A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...Syeful Islam
More than hundreds of millions of people of almost all levels of education and attitudes from
different country communicate with each other for different using various languages. Machine
translation is highly demanding due to increasing the usage of web based Communication. One of
the major problem of Bengali translation is identified a naming word from a sentence, which is
relatively simple in English language, because such entities start with a capital letter. In Bangla we
do not have concept of small or capital letters and there is huge no. of different naming entity
available in Bangla. Thus we find difficulties in understanding whether a word is a naming word
(proper noun) or not. Here we have introduced a new approach to identify naming word from a
Bengali sentence for UNL without storing huge no. of naming entity in word dictionary. The goal is
to make possible Bangla sentence conversion to UNL and vice versa with minimal storing word in
dictionary.
Design and Development of Morphological Analyzer for Tigrigna Verbs using Hyb...kevig
Morphological analyzer is the basic for various high level NLP applications such as information retrieval, spell checking, grammar checking, machine translation, speech recognition, POS tagging and automatic sentence construction. This paper is carefully designed for design and analysis of morphological analyzer Tigrigna verbs using hybrid of memory learning and rules based approaches. The experiment have conducted using python 3 where TiMBL algorithms IB2 and TRIBL2, and Finite State Transducer rules are used. The performance of the system has been evaluated using 10 fold cross validation technique. Testing conducted using optimized parameter settings for regular verbs and linguistic rules of the Tigrigna language allomorph and phonology for the irregular verbs. The accuracy on the memory based approach with optimized parameters of TiMBL algorithm IB2 and TRIBL2 was 93.24% and 92.31%, respectively. Finally, the hybrid approach had the actual performance of 95.6% using linguistic rules for handling irregular and copula verbs.
This document discusses computational linguistics, including its origins, main application areas, and approaches. Computational linguistics originated from efforts in the 1950s to use computers for machine translation between languages. It aims to develop natural language processing applications like machine translation, speech recognition, and grammar checking. Research employs various approaches including developmental, structural, production-focused, and comprehension-focused methods. The field involves both computer science and linguistics.
UNL-ization of Numbers and Ordinals in Punjabi with IANijnlc
In the field of Natural Language Processing, Universal Networking Language (UNL) has been an area of
immense interest among researchers during last couple of years. Universal Networking Language (UNL) is
an artificial Language used for representing information in a natural-language-independent format. This
paper presents UNL-ization of Punjabi sentences with the help of different examples, containing numbers
and ordinals written in words, using IAN (Interactive Analyzer) tool. In UNL approach, UNL-ization is a
process of converting natural language resource to UNL and NL-ization, is a process of generating a
natural language resource out of a UNL graph. IAN processes input sentences with the help of TRules and
Dictionary entries. The proposed system performs the UNL-ization of up to fourteen digit number and
ordinals, written in words in Punjabi language, with the help of 104 dictionary entries and 67 TRules. The
system is tested on a sample of 150 random Punjabi Numbers and Ordinals, written in words, and its FMeasure
comes out to be 1.000 (on a scale of 0 to 1).
A Comprehensive Study On Natural Language Processing And Natural Language Int...Scott Bou
The document provides a comprehensive overview of natural language processing (NLP) and natural language interfaces to databases (NLIDBs). It discusses the different levels of NLP including morphological, lexical, syntactic, semantic and pragmatic analysis. It also describes various approaches used to develop NLIDBs, including symbolic, empirical, connectionist and maximum entropy approaches. Additionally, it outlines the history of NLP and NLIDBs, covering early work in machine translation and historically developed systems like LUNAR.
We start with a linguistic discussion of language, its properties, and the study of language in philosophy and linguistics. We then investigate natural languages, controlled languages, and artificial languages to emphasise the human ability to control and construct languages. At the end, we arrive at the notion of software languages as means to communicate software between people.
Different valuable tools for Arabic sentiment analysis: a comparative evaluat...IJECEIAES
Arabic Natural language processing (ANLP) is a subfield of artificial intelligence (AI) that tries to build various applications in the Arabic language like Arabic sentiment analysis (ASA) that is the operation of classifying the feelings and emotions expressed for defining the attitude of the writer (neutral, negative or positive). In order to work on ASA, researchers can use various tools in their research projects without explaining the cause behind this use, or they choose a set of libraries according to their knowledge about a specific programming language. Because of their libraries' abundance in the ANLP field, especially in ASA, we are relying on JAVA and Python programming languages in our research work. This paper relies on making an in-depth comparative evaluation of different valuable Python and Java libraries to deduce the most useful ones in Arabic sentiment analysis (ASA). According to a large variety of great and influential works in the domain of ASA, we deduce that the NLTK, Gensim and TextBlob libraries are the most useful for Python ASA task. In connection with Java ASA libraries, we conclude that Weka and CoreNLP tools are the most used, and they have great results in this research domain.
The document discusses language detection and identification (LDI). It begins by noting the increasing number of languages in the world and the need to identify unknown languages. It then outlines the objectives of developing an LDI system that can identify the language of scanned text images, including printed and handwritten text, while maintaining the structure of tabular data. The document provides an extensive literature review on the history of optical character recognition and language identification techniques. It discusses early work from the 1940s through the 2000s and highlights several influential studies on automatic language identification from the 1970s and 1990s.
DESIGN AND DEVELOPMENT OF MORPHOLOGICAL ANALYZER FOR TIGRIGNA VERBS USING HYB...kevig
Morphological analyzer is the base for various high-level NLP applications such as information retrieval,
spell checking, grammar checking, machine translation, speech recognition, POS tagging and automatic
sentence construction. This paper is carefully designed for design and analysis of morphological analyzer
Tigrigna verbs using hybrid of memory learning and rules based approaches. The experiment have
conducted using Python 3 where TiMBL algorithms IB2 and TRIBL2, and Finite State Transducer rules
are used. The performance of the system has been evaluated using 10 fold cross validation technique.
Testing was conducted using optimized parameter settings for regular verbs and linguistic rules of the
Tigrigna language allomorph and phonology for the irregular verbs. The accuracy of the memory based
approach with optimized parameters of TiMBL algorithm IB2 and TRIBL2 was 93.24% and 92.31%,
respectively. Finally, the hybrid approach had an actual performance of 95.6% using linguistic rules for
handling irregular and copula verbs.
NLP stands for Natural Language Processing which is a field of artificial intelligence that helps machines understand, interpret and manipulate human language. The key developments in NLP include machine translation in the 1940s-1960s, the introduction of artificial intelligence concepts in 1960-1980s and the use of machine learning algorithms after 1980. Modern NLP involves applications like speech recognition, machine translation and text summarization. It consists of natural language understanding to analyze language and natural language generation to produce language. While NLP has advantages like providing fast answers, it also has challenges like ambiguity and limited ability to understand context.
Similar to Natural language processing for Albanian: a state-of-the-art survey (20)
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...IJECEIAES
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to
precisely delineate tumor boundaries from magnetic resonance imaging (MRI)
scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating
the state-of-the-art Deeplabv3+ architecture with the ResNet18 backbone. The
model is rigorously trained and evaluated, exhibiting remarkable performance
metrics, including an impressive global accuracy of 99.286%, a high-class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted
IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of
our proposed model. These findings underscore the model’s competence in precise brain tumor localization, underscoring its potential to revolutionize medical
image analysis and enhance healthcare outcomes. This research paves the way
for future exploration and optimization of advanced CNN models in medical
imaging, emphasizing addressing false positives and resource efficiency.
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Advanced control scheme of doubly fed induction generator for wind turbine us...IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
Neural network optimizer of proportional-integral-differential controller par...IJECEIAES
Wide application of proportional-integral-differential (PID)-regulator in industry requires constant improvement of methods of its parameters adjustment. The paper deals with the issues of optimization of PID-regulator parameters with the use of neural network technology methods. A methodology for choosing the architecture (structure) of neural network optimizer is proposed, which consists in determining the number of layers, the number of neurons in each layer, as well as the form and type of activation function. Algorithms of neural network training based on the application of the method of minimizing the mismatch between the regulated value and the target value are developed. The method of back propagation of gradients is proposed to select the optimal training rate of neurons of the neural network. The neural network optimizer, which is a superstructure of the linear PID controller, allows increasing the regulation accuracy from 0.23 to 0.09, thus reducing the power consumption from 65% to 53%. The results of the conducted experiments allow us to conclude that the created neural superstructure may well become a prototype of an automatic voltage regulator (AVR)-type industrial controller for tuning the parameters of the PID controller.
An improved modulation technique suitable for a three level flying capacitor ...IJECEIAES
This research paper introduces an innovative modulation technique for controlling a 3-level flying capacitor multilevel inverter (FCMLI), aiming to streamline the modulation process in contrast to conventional methods. The proposed
simplified modulation technique paves the way for more straightforward and
efficient control of multilevel inverters, enabling their widespread adoption and
integration into modern power electronic systems. Through the amalgamation of
sinusoidal pulse width modulation (SPWM) with a high-frequency square wave
pulse, this controlling technique attains energy equilibrium across the coupling
capacitor. The modulation scheme incorporates a simplified switching pattern
and a decreased count of voltage references, thereby simplifying the control
algorithm.
A review on features and methods of potential fishing zoneIJECEIAES
This review focuses on the importance of identifying potential fishing zones in seawater for sustainable fishing practices. It explores features like sea surface temperature (SST) and sea surface height (SSH), along with classification methods such as classifiers. The features like SST, SSH, and different classifiers used to classify the data, have been figured out in this review study. This study underscores the importance of examining potential fishing zones using advanced analytical techniques. It thoroughly explores the methodologies employed by researchers, covering both past and current approaches. The examination centers on data characteristics and the application of classification algorithms for classification of potential fishing zones. Furthermore, the prediction of potential fishing zones relies significantly on the effectiveness of classification algorithms. Previous research has assessed the performance of models like support vector machines, naïve Bayes, and artificial neural networks (ANN). In the previous result, the results of support vector machine (SVM) were 97.6% more accurate than naive Bayes's 94.2% to classify test data for fisheries classification. By considering the recent works in this area, several recommendations for future works are presented to further improve the performance of the potential fishing zone models, which is important to the fisheries community.
Electrical signal interference minimization using appropriate core material f...IJECEIAES
As demand for smaller, quicker, and more powerful devices rises, Moore's law is strictly followed. The industry has worked hard to make little devices that boost productivity. The goal is to optimize device density. Scientists are reducing connection delays to improve circuit performance. This helped them understand three-dimensional integrated circuit (3D IC) concepts, which stack active devices and create vertical connections to diminish latency and lower interconnects. Electrical involvement is a big worry with 3D integrates circuits. Researchers have developed and tested through silicon via (TSV) and substrates to decrease electrical wave involvement. This study illustrates a novel noise coupling reduction method using several electrical involvement models. A 22% drop in electrical involvement from wave-carrying to victim TSVs introduces this new paradigm and improves system performance even at higher THz frequencies.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...IJECEIAES
Climate change's impact on the planet forced the United Nations and governments to promote green energies and electric transportation. The deployments of photovoltaic (PV) and electric vehicle (EV) systems gained stronger momentum due to their numerous advantages over fossil fuel types. The advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces the hybrid system between PV and EV to support industrial and commercial plants. This paper covers the theoretical framework of the proposed hybrid system including the required equation to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram which sets the priorities and requirements of the system is presented. The proposed approach allows setup to advance their power stability, especially during power outages. The presented information supports researchers and plant owners to complete the necessary analysis while promoting the deployment of clean energy. The result of a case study that represents a dairy milk farmer supports the theoretical works and highlights its advanced benefits to existing plants. The short return on investment of the proposed approach supports the paper's novelty approach for the sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line which enhances the safety of the electrical network
Bibliometric analysis highlighting the role of women in addressing climate ch...IJECEIAES
Fossil fuel consumption increased quickly, contributing to climate change
that is evident in unusual flooding and draughts, and global warming. Over
the past ten years, women's involvement in society has grown dramatically,
and they succeeded in playing a noticeable role in reducing climate change.
A bibliometric analysis of data from the last ten years has been carried out to
examine the role of women in addressing the climate change. The analysis's
findings discussed the relevant to the sustainable development goals (SDGs),
particularly SDG 7 and SDG 13. The results considered contributions made
by women in the various sectors while taking geographic dispersion into
account. The bibliometric analysis delves into topics including women's
leadership in environmental groups, their involvement in policymaking, their
contributions to sustainable development projects, and the influence of
gender diversity on attempts to mitigate climate change. This study's results
highlight how women have influenced policies and actions related to climate
change, point out areas of research deficiency and recommendations on how
to increase role of the women in addressing the climate change and
achieving sustainability. To achieve more successful results, this initiative
aims to highlight the significance of gender equality and encourage
inclusivity in climate change decision-making processes.
Voltage and frequency control of microgrid in presence of micro-turbine inter...IJECEIAES
The active and reactive load changes have a significant impact on voltage
and frequency. In this paper, in order to stabilize the microgrid (MG) against
load variations in islanding mode, the active and reactive power of all
distributed generators (DGs), including energy storage (battery), diesel
generator, and micro-turbine, are controlled. The micro-turbine generator is
connected to MG through a three-phase to three-phase matrix converter, and
the droop control method is applied for controlling the voltage and
frequency of MG. In addition, a method is introduced for voltage and
frequency control of micro-turbines in the transition state from gridconnected mode to islanding mode. A novel switching strategy of the matrix
converter is used for converting the high-frequency output voltage of the
micro-turbine to the grid-side frequency of the utility system. Moreover,
using the switching strategy, the low-order harmonics in the output current
and voltage are not produced, and consequently, the size of the output filter
would be reduced. In fact, the suggested control strategy is load-independent
and has no frequency conversion restrictions. The proposed approach for
voltage and frequency regulation demonstrates exceptional performance and
favorable response across various load alteration scenarios. The suggested
strategy is examined in several scenarios in the MG test systems, and the
simulation results are addressed.
Enhancing battery system identification: nonlinear autoregressive modeling fo...IJECEIAES
Precisely characterizing Li-ion batteries is essential for optimizing their
performance, enhancing safety, and prolonging their lifespan across various
applications, such as electric vehicles and renewable energy systems. This
article introduces an innovative nonlinear methodology for system
identification of a Li-ion battery, employing a nonlinear autoregressive with
exogenous inputs (NARX) model. The proposed approach integrates the
benefits of nonlinear modeling with the adaptability of the NARX structure,
facilitating a more comprehensive representation of the intricate
electrochemical processes within the battery. Experimental data collected
from a Li-ion battery operating under diverse scenarios are employed to
validate the effectiveness of the proposed methodology. The identified
NARX model exhibits superior accuracy in predicting the battery's behavior
compared to traditional linear models. This study underscores the
importance of accounting for nonlinearities in battery modeling, providing
insights into the intricate relationships between state-of-charge, voltage, and
current under dynamic conditions.
Smart grid deployment: from a bibliometric analysis to a surveyIJECEIAES
Smart grids are one of the last decades' innovations in electrical energy.
They bring relevant advantages compared to the traditional grid and
significant interest from the research community. Assessing the field's
evolution is essential to propose guidelines for facing new and future smart
grid challenges. In addition, knowing the main technologies involved in the
deployment of smart grids (SGs) is important to highlight possible
shortcomings that can be mitigated by developing new tools. This paper
contributes to the research trends mentioned above by focusing on two
objectives. First, a bibliometric analysis is presented to give an overview of
the current research level about smart grid deployment. Second, a survey of
the main technological approaches used for smart grid implementation and
their contributions are highlighted. To that effect, we searched the Web of
Science (WoS), and the Scopus databases. We obtained 5,663 documents
from WoS and 7,215 from Scopus on smart grid implementation or
deployment. With the extraction limitation in the Scopus database, 5,872 of
the 7,215 documents were extracted using a multi-step process. These two
datasets have been analyzed using a bibliometric tool called bibliometrix.
The main outputs are presented with some recommendations for future
research.
Use of analytical hierarchy process for selecting and prioritizing islanding ...IJECEIAES
One of the problems that are associated to power systems is islanding
condition, which must be rapidly and properly detected to prevent any
negative consequences on the system's protection, stability, and security.
This paper offers a thorough overview of several islanding detection
strategies, which are divided into two categories: classic approaches,
including local and remote approaches, and modern techniques, including
techniques based on signal processing and computational intelligence.
Additionally, each approach is compared and assessed based on several
factors, including implementation costs, non-detected zones, declining
power quality, and response times using the analytical hierarchy process
(AHP). The multi-criteria decision-making analysis shows that the overall
weight of passive methods (24.7%), active methods (7.8%), hybrid methods
(5.6%), remote methods (14.5%), signal processing-based methods (26.6%),
and computational intelligent-based methods (20.8%) based on the
comparison of all criteria together. Thus, it can be seen from the total weight
that hybrid approaches are the least suitable to be chosen, while signal
processing-based methods are the most appropriate islanding detection
method to be selected and implemented in power system with respect to the
aforementioned factors. Using Expert Choice software, the proposed
hierarchy model is studied and examined.
Enhancing of single-stage grid-connected photovoltaic system using fuzzy logi...IJECEIAES
The power generated by photovoltaic (PV) systems is influenced by
environmental factors. This variability hampers the control and utilization of
solar cells' peak output. In this study, a single-stage grid-connected PV
system is designed to enhance power quality. Our approach employs fuzzy
logic in the direct power control (DPC) of a three-phase voltage source
inverter (VSI), enabling seamless integration of the PV connected to the
grid. Additionally, a fuzzy logic-based maximum power point tracking
(MPPT) controller is adopted, which outperforms traditional methods like
incremental conductance (INC) in enhancing solar cell efficiency and
minimizing the response time. Moreover, the inverter's real-time active and
reactive power is directly managed to achieve a unity power factor (UPF).
The system's performance is assessed through MATLAB/Simulink
implementation, showing marked improvement over conventional methods,
particularly in steady-state and varying weather conditions. For solar
irradiances of 500 and 1,000 W/m2
, the results show that the proposed
method reduces the total harmonic distortion (THD) of the injected current
to the grid by approximately 46% and 38% compared to conventional
methods, respectively. Furthermore, we compare the simulation results with
IEEE standards to evaluate the system's grid compatibility.
Enhancing photovoltaic system maximum power point tracking with fuzzy logic-b...IJECEIAES
Photovoltaic systems have emerged as a promising energy resource that
caters to the future needs of society, owing to their renewable, inexhaustible,
and cost-free nature. The power output of these systems relies on solar cell
radiation and temperature. In order to mitigate the dependence on
atmospheric conditions and enhance power tracking, a conventional
approach has been improved by integrating various methods. To optimize
the generation of electricity from solar systems, the maximum power point
tracking (MPPT) technique is employed. To overcome limitations such as
steady-state voltage oscillations and improve transient response, two
traditional MPPT methods, namely fuzzy logic controller (FLC) and perturb
and observe (P&O), have been modified. This research paper aims to
simulate and validate the step size of the proposed modified P&O and FLC
techniques within the MPPT algorithm using MATLAB/Simulink for
efficient power tracking in photovoltaic systems.
Adaptive synchronous sliding control for a robot manipulator based on neural ...IJECEIAES
Robot manipulators have become important equipment in production lines, medical fields, and transportation. Improving the quality of trajectory tracking for
robot hands is always an attractive topic in the research community. This is a
challenging problem because robot manipulators are complex nonlinear systems
and are often subject to fluctuations in loads and external disturbances. This
article proposes an adaptive synchronous sliding control scheme to improve trajectory tracking performance for a robot manipulator. The proposed controller
ensures that the positions of the joints track the desired trajectory, synchronize
the errors, and significantly reduces chattering. First, the synchronous tracking
errors and synchronous sliding surfaces are presented. Second, the synchronous
tracking error dynamics are determined. Third, a robust adaptive control law is
designed,the unknown components of the model are estimated online by the neural network, and the parameters of the switching elements are selected by fuzzy
logic. The built algorithm ensures that the tracking and approximation errors
are ultimately uniformly bounded (UUB). Finally, the effectiveness of the constructed algorithm is demonstrated through simulation and experimental results.
Simulation and experimental results show that the proposed controller is effective with small synchronous tracking errors, and the chattering phenomenon is
significantly reduced.
Remote field-programmable gate array laboratory for signal acquisition and de...IJECEIAES
A remote laboratory utilizing field-programmable gate array (FPGA) technologies enhances students’ learning experience anywhere and anytime in embedded system design. Existing remote laboratories prioritize hardware access and visual feedback for observing board behavior after programming, neglecting comprehensive debugging tools to resolve errors that require internal signal acquisition. This paper proposes a novel remote embeddedsystem design approach targeting FPGA technologies that are fully interactive via a web-based platform. Our solution provides FPGA board access and debugging capabilities beyond the visual feedback provided by existing remote laboratories. We implemented a lab module that allows users to seamlessly incorporate into their FPGA design. The module minimizes hardware resource utilization while enabling the acquisition of a large number of data samples from the signal during the experiments by adaptively compressing the signal prior to data transmission. The results demonstrate an average compression ratio of 2.90 across three benchmark signals, indicating efficient signal acquisition and effective debugging and analysis. This method allows users to acquire more data samples than conventional methods. The proposed lab allows students to remotely test and debug their designs, bridging the gap between theory and practice in embedded system design.
Detecting and resolving feature envy through automated machine learning and m...IJECEIAES
Efficiently identifying and resolving code smells enhances software project quality. This paper presents a novel solution, utilizing automated machine learning (AutoML) techniques, to detect code smells and apply move method refactoring. By evaluating code metrics before and after refactoring, we assessed its impact on coupling, complexity, and cohesion. Key contributions of this research include a unique dataset for code smell classification and the development of models using AutoGluon for optimal performance. Furthermore, the study identifies the top 20 influential features in classifying feature envy, a well-known code smell, stemming from excessive reliance on external classes. We also explored how move method refactoring addresses feature envy, revealing reduced coupling and complexity, and improved cohesion, ultimately enhancing code quality. In summary, this research offers an empirical, data-driven approach, integrating AutoML and move method refactoring to optimize software project quality. Insights gained shed light on the benefits of refactoring on code quality and the significance of specific features in detecting feature envy. Future research can expand to explore additional refactoring techniques and a broader range of code metrics, advancing software engineering practices and standards.
Smart monitoring technique for solar cell systems using internet of things ba...IJECEIAES
Rapidly and remotely monitoring and receiving the solar cell systems status parameters, solar irradiance, temperature, and humidity, are critical issues in enhancement their efficiency. Hence, in the present article an improved smart prototype of internet of things (IoT) technique based on embedded system through NodeMCU ESP8266 (ESP-12E) was carried out experimentally. Three different regions at Egypt; Luxor, Cairo, and El-Beheira cities were chosen to study their solar irradiance profile, temperature, and humidity by the proposed IoT system. The monitoring data of solar irradiance, temperature, and humidity were live visualized directly by Ubidots through hypertext transfer protocol (HTTP) protocol. The measured solar power radiation in Luxor, Cairo, and El-Beheira ranged between 216-1000, 245-958, and 187-692 W/m 2 respectively during the solar day. The accuracy and rapidity of obtaining monitoring results using the proposed IoT system made it a strong candidate for application in monitoring solar cell systems. On the other hand, the obtained solar power radiation results of the three considered regions strongly candidate Luxor and Cairo as suitable places to build up a solar cells system station rather than El-Beheira.
An efficient security framework for intrusion detection and prevention in int...IJECEIAES
Over the past few years, the internet of things (IoT) has advanced to connect billions of smart devices to improve quality of life. However, anomalies or malicious intrusions pose several security loopholes, leading to performance degradation and threat to data security in IoT operations. Thereby, IoT security systems must keep an eye on and restrict unwanted events from occurring in the IoT network. Recently, various technical solutions based on machine learning (ML) models have been derived towards identifying and restricting unwanted events in IoT. However, most ML-based approaches are prone to miss-classification due to inappropriate feature selection. Additionally, most ML approaches applied to intrusion detection and prevention consider supervised learning, which requires a large amount of labeled data to be trained. Consequently, such complex datasets are impossible to source in a large network like IoT. To address this problem, this proposed study introduces an efficient learning mechanism to strengthen the IoT security aspects. The proposed algorithm incorporates supervised and unsupervised approaches to improve the learning models for intrusion detection and mitigation. Compared with the related works, the experimental outcome shows that the model performs well in a benchmark dataset. It accomplishes an improved detection accuracy of approximately 99.21%.
Rainfall intensity duration frequency curve statistical analysis and modeling...bijceesjournal
Using data from 41 years in Patna’ India’ the study’s goal is to analyze the trends of how often it rains on a weekly, seasonal, and annual basis (1981−2020). First, utilizing the intensity-duration-frequency (IDF) curve and the relationship by statistically analyzing rainfall’ the historical rainfall data set for Patna’ India’ during a 41 year period (1981−2020), was evaluated for its quality. Changes in the hydrologic cycle as a result of increased greenhouse gas emissions are expected to induce variations in the intensity, length, and frequency of precipitation events. One strategy to lessen vulnerability is to quantify probable changes and adapt to them. Techniques such as log-normal, normal, and Gumbel are used (EV-I). Distributions were created with durations of 1, 2, 3, 6, and 24 h and return times of 2, 5, 10, 25, and 100 years. There were also mathematical correlations discovered between rainfall and recurrence interval.
Findings: Based on findings, the Gumbel approach produced the highest intensity values, whereas the other approaches produced values that were close to each other. The data indicates that 461.9 mm of rain fell during the monsoon season’s 301st week. However, it was found that the 29th week had the greatest average rainfall, 92.6 mm. With 952.6 mm on average, the monsoon season saw the highest rainfall. Calculations revealed that the yearly rainfall averaged 1171.1 mm. Using Weibull’s method, the study was subsequently expanded to examine rainfall distribution at different recurrence intervals of 2, 5, 10, and 25 years. Rainfall and recurrence interval mathematical correlations were also developed. Further regression analysis revealed that short wave irrigation, wind direction, wind speed, pressure, relative humidity, and temperature all had a substantial influence on rainfall.
Originality and value: The results of the rainfall IDF curves can provide useful information to policymakers in making appropriate decisions in managing and minimizing floods in the study area.
Digital Twins Computer Networking Paper Presentation.pptxaryanpankaj78
A Digital Twin in computer networking is a virtual representation of a physical network, used to simulate, analyze, and optimize network performance and reliability. It leverages real-time data to enhance network management, predict issues, and improve decision-making processes.
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...shadow0702a
This document serves as a comprehensive step-by-step guide on how to effectively use PyCharm for remote debugging of the Windows Subsystem for Linux (WSL) on a local Windows machine. It meticulously outlines several critical steps in the process, starting with the crucial task of enabling permissions, followed by the installation and configuration of WSL.
The guide then proceeds to explain how to set up the SSH service within the WSL environment, an integral part of the process. Alongside this, it also provides detailed instructions on how to modify the inbound rules of the Windows firewall to facilitate the process, ensuring that there are no connectivity issues that could potentially hinder the debugging process.
The document further emphasizes on the importance of checking the connection between the Windows and WSL environments, providing instructions on how to ensure that the connection is optimal and ready for remote debugging.
It also offers an in-depth guide on how to configure the WSL interpreter and files within the PyCharm environment. This is essential for ensuring that the debugging process is set up correctly and that the program can be run effectively within the WSL terminal.
Additionally, the document provides guidance on how to set up breakpoints for debugging, a fundamental aspect of the debugging process which allows the developer to stop the execution of their code at certain points and inspect their program at those stages.
Finally, the document concludes by providing a link to a reference blog. This blog offers additional information and guidance on configuring the remote Python interpreter in PyCharm, providing the reader with a well-rounded understanding of the process.
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Sinan KOZAK
Sinan from the Delivery Hero mobile infrastructure engineering team shares a deep dive into performance acceleration with Gradle build cache optimizations. Sinan shares their journey into solving complex build-cache problems that affect Gradle builds. By understanding the challenges and solutions found in our journey, we aim to demonstrate the possibilities for faster builds. The case study reveals how overlapping outputs and cache misconfigurations led to significant increases in build times, especially as the project scaled up with numerous modules using Paparazzi tests. The journey from diagnosing to defeating cache issues offers invaluable lessons on maintaining cache integrity without sacrificing functionality.
Null Bangalore | Pentesters Approach to AWS IAMDivyanshu
#Abstract:
- Learn more about the real-world methods for auditing AWS IAM (Identity and Access Management) as a pentester. So let us proceed with a brief discussion of IAM as well as some typical misconfigurations and their potential exploits in order to reinforce the understanding of IAM security best practices.
- Gain actionable insights into AWS IAM policies and roles, using hands on approach.
#Prerequisites:
- Basic understanding of AWS services and architecture
- Familiarity with cloud security concepts
- Experience using the AWS Management Console or AWS CLI.
- For hands on lab create account on [killercoda.com](https://killercoda.com/cloudsecurity-scenario/)
# Scenario Covered:
- Basics of IAM in AWS
- Implementing IAM Policies with Least Privilege to Manage S3 Bucket
- Objective: Create an S3 bucket with least privilege IAM policy and validate access.
- Steps:
- Create S3 bucket.
- Attach least privilege policy to IAM user.
- Validate access.
- Exploiting IAM PassRole Misconfiguration
-Allows a user to pass a specific IAM role to an AWS service (ec2), typically used for service access delegation. Then exploit PassRole Misconfiguration granting unauthorized access to sensitive resources.
- Objective: Demonstrate how a PassRole misconfiguration can grant unauthorized access.
- Steps:
- Allow user to pass IAM role to EC2.
- Exploit misconfiguration for unauthorized access.
- Access sensitive resources.
- Exploiting IAM AssumeRole Misconfiguration with Overly Permissive Role
- An overly permissive IAM role configuration can lead to privilege escalation by creating a role with administrative privileges and allow a user to assume this role.
- Objective: Show how overly permissive IAM roles can lead to privilege escalation.
- Steps:
- Create role with administrative privileges.
- Allow user to assume the role.
- Perform administrative actions.
- Differentiation between PassRole vs AssumeRole
Try at [killercoda.com](https://killercoda.com/cloudsecurity-scenario/)
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...PriyankaKilaniya
Energy efficiency has been important since the latter part of the last century. The main object of this survey is to determine the energy efficiency knowledge among consumers. Two separate districts in Bangladesh are selected to conduct the survey on households and showrooms about the energy and seller also. The survey uses the data to find some regression equations from which it is easy to predict energy efficiency knowledge. The data is analyzed and calculated based on five important criteria. The initial target was to find some factors that help predict a person's energy efficiency knowledge. From the survey, it is found that the energy efficiency awareness among the people of our country is very low. Relationships between household energy use behaviors are estimated using a unique dataset of about 40 households and 20 showrooms in Bangladesh's Chapainawabganj and Bagerhat districts. Knowledge of energy consumption and energy efficiency technology options is found to be associated with household use of energy conservation practices. Household characteristics also influence household energy use behavior. Younger household cohorts are more likely to adopt energy-efficient technologies and energy conservation practices and place primary importance on energy saving for environmental reasons. Education also influences attitudes toward energy conservation in Bangladesh. Low-education households indicate they primarily save electricity for the environment while high-education households indicate they are motivated by environmental concerns.
Design and optimization of ion propulsion dronebjmsejournal
Electric propulsion technology is widely used in many kinds of vehicles in recent years, and aircrafts are no exception. Technically, UAVs are electrically propelled but tend to produce a significant amount of noise and vibrations. Ion propulsion technology for drones is a potential solution to this problem. Ion propulsion technology is proven to be feasible in the earth’s atmosphere. The study presented in this article shows the design of EHD thrusters and power supply for ion propulsion drones along with performance optimization of high-voltage power supply for endurance in earth’s atmosphere.
Introduction- e - waste – definition - sources of e-waste– hazardous substances in e-waste - effects of e-waste on environment and human health- need for e-waste management– e-waste handling rules - waste minimization techniques for managing e-waste – recycling of e-waste - disposal treatment methods of e- waste – mechanism of extraction of precious metal from leaching solution-global Scenario of E-waste – E-waste in India- case studies.
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELijaia
As digital technology becomes more deeply embedded in power systems, protecting the communication
networks of Smart Grids (SG) has emerged as a critical concern. Distributed Network Protocol 3 (DNP3)
represents a multi-tiered application layer protocol extensively utilized in Supervisory Control and Data
Acquisition (SCADA)-based smart grids to facilitate real-time data gathering and control functionalities.
Robust Intrusion Detection Systems (IDS) are necessary for early threat detection and mitigation because
of the interconnection of these networks, which makes them vulnerable to a variety of cyberattacks. To
solve this issue, this paper develops a hybrid Deep Learning (DL) model specifically designed for intrusion
detection in smart grids. The proposed approach is a combination of the Convolutional Neural Network
(CNN) and the Long-Short-Term Memory algorithms (LSTM). We employed a recent intrusion detection
dataset (DNP3), which focuses on unauthorized commands and Denial of Service (DoS) cyberattacks, to
train and test our model. The results of our experiments show that our CNN-LSTM method is much better
at finding smart grid intrusions than other deep learning algorithms used for classification. In addition,
our proposed approach improves accuracy, precision, recall, and F1 score, achieving a high detection
accuracy rate of 99.50%.
Software Engineering and Project Management - Introduction, Modeling Concepts...Prakhyath Rai
Introduction, Modeling Concepts and Class Modeling: What is Object orientation? What is OO development? OO Themes; Evidence for usefulness of OO development; OO modeling history. Modeling
as Design technique: Modeling, abstraction, The Three models. Class Modeling: Object and Class Concept, Link and associations concepts, Generalization and Inheritance, A sample class model, Navigation of class models, and UML diagrams
Building the Analysis Models: Requirement Analysis, Analysis Model Approaches, Data modeling Concepts, Object Oriented Analysis, Scenario-Based Modeling, Flow-Oriented Modeling, class Based Modeling, Creating a Behavioral Model.
Applications of artificial Intelligence in Mechanical Engineering.pdfAtif Razi
Historically, mechanical engineering has relied heavily on human expertise and empirical methods to solve complex problems. With the introduction of computer-aided design (CAD) and finite element analysis (FEA), the field took its first steps towards digitization. These tools allowed engineers to simulate and analyze mechanical systems with greater accuracy and efficiency. However, the sheer volume of data generated by modern engineering systems and the increasing complexity of these systems have necessitated more advanced analytical tools, paving the way for AI.
AI offers the capability to process vast amounts of data, identify patterns, and make predictions with a level of speed and accuracy unattainable by traditional methods. This has profound implications for mechanical engineering, enabling more efficient design processes, predictive maintenance strategies, and optimized manufacturing operations. AI-driven tools can learn from historical data, adapt to new information, and continuously improve their performance, making them invaluable in tackling the multifaceted challenges of modern mechanical engineering.
Applications of artificial Intelligence in Mechanical Engineering.pdf
Natural language processing for Albanian: a state-of-the-art survey
1. International Journal of Electrical and Computer Engineering (IJECE)
Vol. 12, No. 6, December 2022, pp. 6432~6439
ISSN: 2088-8708, DOI: 10.11591/ijece.v12i6.pp6432-6439 6432
Journal homepage: http://ijece.iaescore.com
Natural language processing for Albanian: a state-of-the-art
survey
Muhamet Kastrati, Marenglen Biba
Department of Computer Science, Faculty of Engineering and Architecture, University of New York Tirana, Tirana, Albania
Article Info ABSTRACT
Article history:
Received Jul 27, 2021
Revised Jun 8, 2022
Accepted Jul 2, 2022
Due to its wide applicability, natural language processing (NLP) has
attracted significant research efforts to the machine learning and deep
learning research community. Despite this, research works investigating
NLP for the Albanian language are still limited. However, to the best of our
knowledge, there is no literature review available, which presents a clear
picture of what has been studied, argued, and established in the area. The
main objective of this survey is to comprehensively review, analyze and
discuss the state-of-the-art in NLP for the Albanian language. Here, we
present an extensive study concerning the contribution of several authors
that have contributed to the application of NLP to the Albanian language.
Also, we present an overview of research carried out in the typical
applications of NLP for the Albanian language. Finally, some future
challenges and limitations of the area are discussed.
Keywords:
Albanian language
Deep learning
Emotion detection
Machine learning
Natural language processing
Sentiment analysis This is an open access article under the CC BY-SA license.
Corresponding Author:
Muhamet Kastrati
Department of Computer Science, Faculty of Engineering and Architecture, University of New York Tirana
Kodra e Diellit, Tirana, Albania
Email: muhamet.kastrati@gmail.com
1. INTRODUCTION
Natural language processing (NLP) is an emerging interdisciplinary research area at the intersection
of learning linguistics, computer science, and artificial intelligence (AI) mainly dealing with computational
techniques to learn, understand, and produce human language content [1]. Over the last decades, the NLP
community has been focused on topics such as machine translation, speech recognition, and speech synthesis.
During the last 20 years, there has been an increased interest among the NLP research community and
practitioners in a wide range of real-world applications including speech-to-speech translation engines,
mining social media for information about health and finance, emotion and sentiment analysis concerning
products, and/or services [1]. Historically, there have been known three main approaches used in NLP:
rule-based, machine learning, and deep learning approach. The rule-based approach is the earliest type of all
AI algorithms. Second, is the machine learning approach that covers several machine algorithms that have
been used to solve several NLP tasks. Third, is the deep learning approach or deep neural networks (DNN),
which have revolutionized the field of NLP [2]. Among the most successful NLP tasks tackled by the
research community and practitioners includes named entity recognition, text classification, topic modeling,
parts-of-speech tagging, question answering, chatbots, image captioning, fake news detection, text generation,
sentiment and emotion analysis, speech-to-text, text-to-speech, and topic classification. Despite the advancement of
research in NLP for other high-resource languages, such as English, French, Spanish, German, and Chinese,
much less work has been done so far on other low-resource languages where Albanian belongs to.
More recently, there has been increased research interest among researchers to explore NLP for the
Albanian language mainly dealing with tools and applications. Studies conducted concerning tools include
2. Int J Elec & Comp Eng ISSN: 2088-8708
Natural language processing for Albanian: A state-of-the-art survey (Muhamet Kastrati)
6433
part-of-speech and morphological tagging [3]–[5], stemming [6], the lexicon of Albanian for NLP [7], and
syntactic parsing [8]. Also, there are several studies related to the application of the NLP for the Albanian
language such as named-entity recognition [9]–[12], sentiment analysis [13]–[16], emotion detection [17],
[18], hate speech detection [19], [20], summarization techniques [21], [22], text classification [23]–[25] and
question answering system [26].
To the best of our knowledge, this survey is the first one on this topic (NLP for the Albanian
language). The literature used for this survey encompasses almost all research works published in conference
proceedings and journals among other sources used in NLP for the Albanian language area. All papers
reviewed in this survey are considered from two main perspectives, respectively, the technical approach used
for learning and the NLP tasks that have been tackled.
The rest of the paper is structured as the following: section 2 presents a brief theoretical background
on NLP and the Albanian language. Section 3 presents a comprehensive state-of-the-art review in NLP for
the Albanian language from its beginning to the present day. Section 4 concludes the paper.
2. BACKGROUND
In order to facilitate understanding, we introduce a theoretical background on NLP and the Albanian
language that are part of this survey. NLP is a subfield of computer science that employs computational
techniques to learn, understand, and produce human language content [1]. Albanian is an Indo-European (IE)
language, an independent branch of its own, which has distinctive features that range from morphological to
lexical viewpoints.
2.1. Natural language processing
NLP is an emerging interdisciplinary research area at the intersection of learning linguistics,
computer science, and artificial intelligence mainly dealing with the interaction between computers and
human language. From their beginnings, the main goal of the NLP systems has been aiding human-human
and human-machine communication. In the first case, human-human communication, one typical example is
the case of machine translation (MT). In the second case, human-machine communication, some common
examples include conversational agents. In general, both humans and machines benefit from human language
content data that is nowadays available online [1]. Over the past two decades, there has been an increased
interest by both the research community and practitioners as the NLP has been shown to perform quite well
in several consumer products (for example, in speech-based natural user interfaces (NUI) such as Alexa,
Cortana, Google Assistant, and Apple’s Siri). As described by Hirschberg and Manning [1], the success of
NLP is strongly related to these four key factors: i) advances in computation power, ii) a large amount of
available natural language content, iii) advancements in machine learning and deep learning respectively, and
iv) a better understanding of the human languages including grammatical structure and meaning.
2.2. Albanian language
Albanian, in its modern form, is the official language of Albania, Kosovo, and North Macedonia,
where it has co-official status. Albanian is currently spoken by more than 7.5 million Albanians living in
Albania, Kosovo, Montenegro, northwest Macedonia, some other Western European countries, and North
America where Albanian people live and work. The Albanian language is considered an Indo-European (IE)
language family, which derives from the old Illyrian language. Same as the Greek and Armenian language it
is an independent branch of IE language, which has been mainly spoken in the Western Balkans. Same as
other languages, also the Albanian language has distinctive features that range from morphological to lexical
viewpoints.
3. STATE-OF-THE-ART REVIEW
In this section, we have presented a state-of-the-art review of studies conducted concerning the NLP
tools and applications for the Albanian language. It starts with some early studies conducted about parts-of-
speech and morphological tagging, annotated corpora, and then followed by a brief overview of the
stemming, and parsing. Finally, a short overview of the studies conducted concerning sentiment analysis,
emotion detection, hate speech detection, text classification, question answering systems, text classification,
and named entity recognition for the Albanian language is given.
3.1. Part-of-speech and morphological tagging
Over the last years, there have been several attempts to build NLP tools for the Albanian language.
In the following, we will briefly overview some of the most significant research work conducted in this area.
3. ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 12, No. 6, December 2022: 6432-6439
6434
Almost all these existing tools are rule-based or dictionary-based and unfortunately are not available online
for NLP purposes.
Trommer and Kallulli [3] presented a simple morphological tagger for the standard Albanian
language. All that is intended here was just to introduce an initial component of an annotation tool in the
context of the Albanian Corpus Initiative. Their proposed morphological tagger has been evaluated in a small
corpus of 1,000 tokens (words) and the results obtained were very promising, with a Precision of 97% and
Recall in the range of 92–95%.
Several studies [27], [28] presented and developed electronic dictionaries and transducers for the
automatic processing of the Albanian language. The authors described some peculiarities of the Albanian
language and then explained how FST and generally speaking NooJ’s graphs enable to treat them. Their
focus was to analyze the words inside a linear segment of text. They also studied the relationship between
units of sense and units of form.
3.2. Annotated corpora
Arkhangelskij et al. [29] presented a large annotated corpus of Albanian created by the Saint-
Petersburg linguists’ team. The corpus consists of 16.6 M tokens collected from several sources including
textbooks, news reports, official, religious, and scientific texts. Kadriu in [30] presented an unsupervised
approach using a dictionary with around 32,000 words with the corresponding part-of-speech (POS) tags of
the words and a set of regular expression rules to assign POS tags to a new text, using the natural language
toolkit (NLTK) toolkit. The evaluation has been performed on a set of 30 random news articles, and the
accuracy reached was in the range of 88 to 93%. Some other annotated corpora have been presented by the
UniMorph project [31], which is a small annotated morphological corpus of Albanian inflected words
extracted from Wiktionary.
Kabashi and Proisl in [5] presented the gold version corpus of their previous corpus introduced in
Kabashi and Proisl [4]. Their corpus contains 2,020 sentences, with 31,584 tokens and was manually
annotated by two native Albanian speakers. The obtained results reached an accuracy ranging from 86.96%
to 95.10%.
3.3. Stemming and parsing
Karanikolas in [32] provided a naive-single-step (rudimentary) stemming algorithm for the Albanian
language. Here the author also presented a list of 470 stop words and a corpus containing 5,000 words, which
was used to generate the stems. The obtained results reached an accuracy of 80%. It is good to emphasize
that the author and the evaluators engaged in this study were not native speakers. Sadiku and Biba [6]
presented the first stemming algorithm for the Albanian language developed by a native speaker. Here
authors built upon a rule-based JStem algorithm, which is based on word formation with affixes.
Misini et al. in [8] provided an appropriate method for syntactic parsing of the Albanian language.
In this article, the authors begin by describing the prior work and pointing out several algorithms used for
parsing. They then state that parsing is the most appropriate approach to identify the syntactic structure that is
useful in determining the meaning of a sentence. Their algorithm is based on the idea of splitting the
sentences into parts of speech and analyzing these sentences using the natural language’s syntactic rules.
Collaku and Adali [33] introduced a new method of grouping verbs based on their inflection themes.
Contrary to traditional classification methods, where verbs are classified based on the inflection themes they
take, here verbs are classified into different verb groups. By doing this, the inflection process looks clearer
and more regular, as the affix remains the only changeable part of the inflected verb. The most important
outcome of this approach is that it makes the process of Albanian verbs simpler and easier.
3.4. Sentiment analysis and emotion detection
In this section, we will survey scientific literature concerning sentiment analysis and emotion
detection tasks for the Albanian language. As described by authors in [34] in general, research about
sentiment analysis has been done at three levels: document level, sentence level, and aspect level sentiment
classification. Over the last few years, sentiment analysis is one of the most active research areas in NLP
[35]. However, there are only a few researches dedicated to sentiment analysis (opinion mining) for the
Albanian language presented by authors in [13]–[16] and a few others related to emotion detection in studies
[17], [18].
Biba and Mane [13] presented the first approach for sentiment analysis in Albanian. They developed
a machine learning model to classify text documents belonging to a negative or positive opinion regarding
the given topic. To train their machine learning models they built a corpus of 400 documents containing
political news consisting of five different topics. In this case, each topic was represented by 80 documents
classified as positive or negative. Here authors made an empirical comparison between 6 different machine
4. Int J Elec & Comp Eng ISSN: 2088-8708
Natural language processing for Albanian: A state-of-the-art survey (Muhamet Kastrati)
6435
learning algorithms, respectively, Bayesian logistic regression, logistic regression, support vector machine
(SVM), voted perceptron, naive Bayes, and hyper pipes for classification of text documents, achieving
accuracy between 86% and 92% depending on the topic. The authors conclude that, in general, to achieve
higher accuracy in sentiment analysis, a larger corpus in the Albanian language is needed.
Kote et al. [14] presented a comprehensive experimental evaluation of machine learning algorithms
applied for opinion mining in the Albanian language. Among the algorithms tested that performed better
include logistic and multi-class classifier, hyper pipes, radial basis function (RBF) classifier, and RBF
network with the classification accuracy ranging from 79% to 94%. Here authors trained the classification
algorithms over a corpus of 500 news articles in Albanian consisting of 5 different topics. Here, each topic
was represented with a balanced set of articles opinionated as positive or negative. The experimental results
were interpreted concerning several evaluation criteria for each algorithm showing interesting features in the
performance of each algorithm.
Kastrati et al. [15] presented the sentiment analysis of people’s opinions expressed on Facebook
with regard to the pandemic situation in the Albanian language. Here the authors developed a deep learning-
based model to classify people’s opinions as neutral, negative, or positive. In order to train their model, they
created a new specific corpus containing 10,742 manually annotated Facebook comments in the Albanian
language. The authors conclude that combining the bidirectional long short-term memory neural network
(BiLSTM) with an attention mechanism outperformed the other approaches in their sentiment analysis task.
The best results achieved by their proposed model given in terms of Precision, Recall, and F1-score were
72.31%, 72.25%, and 72.09%, respectively.
Vasili et al. [16] presented sentiment analysis on Twitter messages for the Albanian language. The
authors compared the results among several methods and noted the challenges that arise when dealing with
sentiment analysis for Albanian language. They studied the performance of sentiment classification
techniques using three main approaches: traditional machine learning, lexicon-based and deep learning-based
approach. Their experiments revealed that long short-term memory (LSTM) based recurrent neural network
(RNN) with Glove as a feature extraction technique provides the best results with F-score=87.8%, followed
by Logistic Regression.
Skënduli and Biba [17] presented an approach for analyzing users’ emotions on microblogging texts
and postings in the Albanian language. Here authors studied users’ emotions at the sentence level by using
deep learning methods. Their approach was based on the idea of classifying a text fragment into a set of
pre-defined emotion categories (based on Ekman’s model) and therefore aims at detecting the emotional state
of the writer conveyed through the text. To perform their experiments, they built a new dataset that contained
manually annotated Facebook posts belonging to some active Albanian politicians, which were classified
using Ekman’s model and finally separated into six smaller datasets. Then, they performed a set of
experiments on these datasets to evaluate deep learning and classical machine learning models. Furthermore,
in their analysis, they also adopted a domestic stemming tool for the Albanian language to preprocess the
datasets, which showed a slight improvement in the classification accuracy. In general, the obtained results
showed that deep learning outperformed the other classical machine learning models, that is, Naive Bayes
(NB), instance-based learner (IBK), and support vector machines (SMO) with a given accuracy ranging from
70.2% to 91.2% for (correctly classified unstemmed instances), and 67.0% to 92.4% for (correctly classified
stemmed instances).
3.5. Hate speech detection
Internet in general and online social media, in particular, have greatly facilitated and changed the
way people communicate. As generally there is no censorship, online social media platforms sometimes are
used for the dissemination of aggressive and hateful content. Recently, there has been an increased research
interest among the academic community and practitioners to build effective automatic solutions for hate
speech detection. However, there is very limited work to address this problem for the Albanian language.
Ajdari et al. [19] explored the idea of building automatic hate speech detection in the public
Albanian language pages. As there was no previous hate speech corpus for the Albanian language, they
collected data from Facebook pages in the Albanian language and built their hate speech corpus. The corpus
comprised 4,886 comments, and two annotators were employed to categorize all these comments as hate or
no hate. Then, to assess the model generalization, the authors trained a SVM classifier using a training set of
4,000 instances, and the rest of 886 instances were used as a testing set. The best-obtained results by their
proposed model in terms of precision 61%, recall 57%, and F1-score 58%. Further, Raufi and Xhaferri [20]
attempted in the direction of implementation of a lightweight machine learning classification model for hate
speech detection in the Albanian language for mobile applications. Their initial testing and evaluations
provided promising results concerning classifier accuracy in mobile environments where frequent and
real-time training of the algorithm is required.
5. ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 12, No. 6, December 2022: 6432-6439
6436
3.6. Question answering system
Recently, there has been an increasing trend toward the implementation of different systems and
tools for question answering. However, to the best of our knowledge, there is only one research work that
addressed this problem for the Albanian language. Trandafili et al. [26] work is the first study published
around question answering system for the Albanian language. The system was built based on the idea of
extracting answers to factoid questions for a given text. Experimental results obtained from the proposed
approach were promising and showed that this was an effective solution for single domain documents [26].
3.7. Summarization techniques
Trandafili et al. [22] proposed a novel document summarization system designed specifically for the
Albanian language. Here authors showed experimentally that the enrichment of the summarization system
with language-dependent elements improves the systems’ performance and the compression rate. Vasili et al.
[21] presented a study of summarization techniques for the Albanian language. Here, the authors begin by
pointing out a theoretical approach where the widely used summarization techniques are described. They
then further continued by applying these techniques to the Albanian language, since the language is an
important factor that might lead to different outcomes for each algorithm, due to its structure, its form, and its
rules. They also provided a comprehensive overview of the text summarization techniques mainly used for
high-resource languages and then conclude about the most appropriate summarizing techniques for the
Albanian language. They also provided a formal way of verifying the correctness of their obtained results, by
using the best criteria defined by experts in the field. They also published their dataset and practical approach
implemented in their study.
3.8. Text classification
Trandafili et al. in [25], evaluated the performance of some of the most important text classification
algorithms over a corpus composed of Albanian texts. The authors used several NLP preprocessing steps to
their data before feeding them as input to the learning models. To assess the model generalization, the
authors trained several algorithms, namely, simple logistics (SL), naive Bayes (NB), k-nearest neighbor
(K-NN), decision trees (DT), random forest (RF), support vector machine (SVM), and neural networks (NN).
The obtained results showed that naive Bayes and SVMs achieved better results compared to the other
algorithms used to classify Albanian corpus.
Kadriu and Abazi in [23] evaluated and compared several classification algorithms used for text
classification. Their research was mainly focused on the classification of text extracted from Albanian news
articles into a set of pre-defined categories, namely, latest news, economy, sport, showbiz, technology,
culture, and world. Authors here applied several preprocessing steps to the text corpus before feeding them as
input to the machine learning models, including stop words removal, and the creation of a separate file for
each category, then these files were split into sentences, then, for each sentence, one of the predefined
categories was assigned, and finally, a list of tuples sentence/category has been created. In this study, the
authors trained several classifiers including multinomial, linear support vector classifier (SVC), Neighbour,
Bernoulli, Centroid, stochastic gradient descent (SGD), Perceptron, Ridge, passive aggressive. The average
accuracy for the above classifiers was in the range of 74% and 91% depending on the input set.
Later on, Kadriu et al. [24] presented an extended study concerning text classification for Albanian
news articles, which was based on a bag of words model and word analogies. In this study, the authors
focused on text classification using two approaches: i) the text classification treats words as independent
components and ii) the text classification treats words based on their semantic and syntactic word similarities.
The obtained results showed that the bag of words model does better than fastText for smaller datasets and
fastText showed better performance when classifying multi-label text. The best results were achieved with a
bag of words model, with an accuracy of 94%.
3.9. Named entity recognition
Skënduli and Biba [9] proposed the first named entity recognition (NER) model for the Albanian
language. The authors here employed a maximum entropy approach based on the Apache OpenNLP tool. For
evaluation, they created a manually annotated corpus containing text from historical and political domains.
Authors experimentally demonstrated that models can be further improved if larger corpora will be available.
The obtained results reached the values of precision, recall, and F-measure 85%, 70%, 76% (person corpus),
83%, 66%, 73% (location corpus) and 69%, 60%, 64% (organization corpus) respectively.
Kono and Hoxha in [10] presented their NER approach for documents written in the Albanian
language. They explored the use of conditional random fields (CRFs) for this purpose. They created a
manually annotated corpus based on the Albanian news documents published in 2015 and 2016. The obtained
results reached the values of precision, recall and F-score are 83.2%, 60.1%, and 69.7% respectively.
6. Int J Elec & Comp Eng ISSN: 2088-8708
Natural language processing for Albanian: A state-of-the-art survey (Muhamet Kastrati)
6437
Hoxha and Baxhaku [11] reported on the first automatically-generated NE annotated corpus for the
Albanian language. In this study, the authors have also used news articles from Albanian news media as a
document source, which were automatically tagged using a custom generated gazetteer from Albanian
Wikipedia.
Trandafili et al. in [12] proposed their NER approach for the Albanian language corpus based on
deep learning models. In this study, the authors built a deep neural network using long short-term memory
(LSTM) cells as hidden layers and CRF as the output, using both word and character tagging. For evaluation,
they created their own manually annotated corpus. The results showed that the NER performance can be
further improved by using a larger annotated corpus to train the model. Table 1 presents a short summary of
the studies’ characteristics on sentiment and emotion analysis, hate speech detection, question answering
system, documents summarization, text classification, and named entity recognition for the Albanian
language.
Table 1. A summary of the studies on the application of NLP for the Albanian language
Study Application Approach Performance Corpus
Biba and Mane [13] Sentiment analysis Machine learning Accuracy: 86% to
92%
depending on the topic
400 documents containing
political news covering 5
different topics
Kote et al. [14] Sentiment analysis Machine learning Accuracy:
79% to 94%
500 news articles covering 5
different topics
Kastrati et al. [15] Sentiment analysis Deep learning
(BiLSTM + attention
mechanism)
Precision: 72.31%,
Recall: 72.25%,
F1-score: 72.09%.
10,742 comments collected
from the National Institute of
Public Health of Kosovo
(NIPHK)’s Facebook page
Vasili et al. [16] Sentiment
Analysis
Deep learning
LSTM-RNN with
Glove
Best results given in
terms of F1 score
F1: 87.8%
“Low quality annotators which
should be eliminated from
further considerations”
Skënduli-Prifti and
Biba [17], [18]
Emotion analysis Deep learning (CNN) Accuracy:
70.2% to 91.2%
(unstemmed) 67.0% to
92.4% for (stemmed)
6,358 Facebook posts
belonging to Albanian
politicians
Ajdari et al. [19] Hate speech
detection
Latent semantic
analysis
Precision: 61%,
Recall: 57%, F1-score:
58%.
4,886 Facebook posts
Raufi and Xhaferri
[20]
Hate speech
detection
Machine learning Accuracy: 50% to
95%
421 Facebook posts, 648 words
Trandafili et al. [26] Question answering Machine learning - the first attempt at a Q&A
system for Albanian
Trandafili et al. [22] Document
summarization
Machine learning - -
Trandafili et al. [25] Text classification Machine learning - -
Kadriu and Abazi
[23]
Text classification Machine learning Avg accuracy:
74% and 91%
Albanian language news articles
Kadriu et al. [24] Text classification Machine learning Accuracy: 94% Albanian language news articles
Skenduli-Prifti and
Biba [9]
Named entity
recognition
Maximum entropy
Markov model
Precision: 69% to 85%
Recall: 60% to 70%
Tagged corpus of around
3,000 sentences and 87,900
words.
Kono and Hoxha [10] Named entity
recognition
CRFs Precision: 83.26%
Recall: 60.14%
F-Score: 69.66%
50,000 words, from news
articles
Hoxha and Baxhaku
[11]
Named entity
recognition
Maximum entropy
model
Precision: 79.51%
Recall: 40.55%
F-Score: 52.99%
news articles from Albanian
news media as a document
source
Trandafili et al. [12] Named entity
recognition
Deep learning (LSTM
and CRF as output)
Precision: 78.5%
Recall: 72%
-
4. CONCLUSION
A major limitation of NLP today is the fact that most of the research work and application in
industry is done only for the high-resource languages and there is less work for other low-resource languages
including the Albanian language. This paper introduces and summarizes the most recent trends, methods,
applications, and gaps, in the conducted research in the area of NLP for the Albanian language. Firstly, we
described the theoretical basis of NLP, Albanian language, application of NLP for the Albanian language,
and technical approaches used to tackle these tasks. Then we provided a comprehensive overview and
research progress reached in NLP for the Albanian language. Finally, we discuss some challenges and open
problems in NLP for the Albanian language.
7. ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 12, No. 6, December 2022: 6432-6439
6438
REFERENCES
[1] J. Hirschberg and C. D. Manning, “Advances in natural language processing,” Science, vol. 349, no. 6245, pp. 261–266, Jul.
2015, doi: 10.1126/science.aaa8685.
[2] W. Yin, K. Kann, M. Yu, and H. Schütze, “Comparative study of CNN and RNN for natural language processing,”
arXiv:1702.01923, 2017.
[3] J. Trommer and D. Kallulli, “A morphological tagger for standard Albanian,” in Proceedings of LREC, 2004, pp. 1–8.
[4] B. Kabashi and T. Proisl, “A proosal for a part-of-speech tagset for the Albanian language,” in Proceedings of the Tenth
International Conference on Language Resources and Evaluation (LREC’16), 2016, pp. 4305–4310.
[5] B. Kabashi and T. Proisl, “Albanian part-of-speech tagging: Gold standard and evaluation,” in Proceedings of the Eleventh
International Conference on Language Resources and Evaluation (LREC 2018), 2018, pp. 2593–2599.
[6] J. Sadiku and M. Biba, “Automatic stemming of Albanian through a rule-based approach,” Journal of International Research
Publications: Language, Individuals and Society, vol. 6, 2012.
[7] B. Kabashi, “A lexicon of albanian for natural language processing,” The XVIII EURALEX International Congress, pp. 855–862,
2019.
[8] A. Misini, E. Canhasi, and S. Krrabaj, “Albanian syntactic parsing,” in ICT Innovations 2020, 2020, pp. 1–16.
[9] M. P. Skenduli and M. Biba, “A named entity recognition approach for Albanian,” in 2013 International Conference on Advances
in Computing, Communications and Informatics (ICACCI), Aug. 2013, pp. 1532–1537, doi: 10.1109/ICACCI.2013.6637407.
[10] G. Kono and K. Hoxha, “Named entity recognition in albanian based on crfs approach,” in Proceedings of the 2nd International
Conference on Recent Trends and Applications in Computer Science and Information Technology, (RTA-CSIT), 2016, pp. 47–52.
[11] K. Hoxha and A. Baxhaku, “An automatically generated annotated corpus for Albanian named entity recognition,” Cybernetics
and Information Technologies, vol. 18, no. 1, pp. 95–108, Mar. 2018, doi: 10.2478/cait-2018-0009.
[12] E. Trandafili, E. K. Meçe, and E. Duka, “A named entity recognition approach for Albanian using deep learning,” in Complex
Pattern Mining, 2020, pp. 85–101.
[13] M. Biba and M. Mane, “Sentiment analysis through machine learning: An experimental evaluation for Albanian,” in Recent
Advances in Intelligent Informatics, 2014, pp. 195–203.
[14] N. Kote, M. Biba, and E. Trandafili, “A thorough experimental evaluation of algorithms for opinion mining in Albanian,” in
Advances in Internet, Data & Web Technologies, 2018, pp. 525–536.
[15] Z. Kastrati, L. Ahmedi, A. Kurti, F. Kadriu, D. Murtezaj, and F. Gashi, “A deep learning sentiment analyser for social media
comments in low-resource languages,” Electronics, vol. 10, no. 10, May 2021, doi: 10.3390/electronics10101133.
[16] R. Vasili, E. Xhina, I. Ninka, and D. Terpo, “Sentiment analysis on social media for Albanian language,” OALib, vol. 08, no. 06,
pp. 1–31, 2021, doi: 10.4236/oalib.1107514.
[17] M. P. Skenduli, M. Biba, C. Loglisci, M. Ceci, and D. Malerba, “User-emotion detection through sentence-based classification
using deep learning: A case-study with microblogs in Albanian,” in Foundations of Intelligent Systems, 2018, pp. 258–267.
[18] M. P. Skenduli and M. Biba, “Classification and clustering of emotive microblogs in Albanian: two user-oriented tasks,” in
Complex Pattern Mining, 2020, pp. 153–171.
[19] J. Ajdari, F. Ismaili, B. Raufi, and X. Zenuni, “Automatic hate speech detection in online contents using latent semantic analysis,”
Pressacademia, vol. 5, no. 1, pp. 368–371, Jun. 2017, doi: 10.17261/Pressacademia.2017.612.
[20] B. Raufi and I. Xhaferri, “Application of machine learning techniques for hate speech detection in mobile applications,” in 2018
International Conference on Information Technologies (InfoTech), Sep. 2018, pp. 1–4, doi: 10.1109/InfoTech.2018.8510738.
[21] R. Vasili, E. Xhina, I. Ninka, and T. Souliotis, “A study of summarization techniques in Albanian language,” Knowledge
International Journal, vol. 28, no. 7, pp. 2251–2257, Dec. 2018, doi: 10.35120/kij28072251R.
[22] E. Trandafili, H. Paci, and E. Karaj, “A novel document summarization system for Albanian language,” in Proceedings of the
20th International Conference on Computer Systems and Technologies, Jun. 2019, pp. 273–277, doi: 10.1145/3345252.3345275.
[23] A. Kadriu and L. Abazi, “A comparison of algorithms for text classification of Albanian news articles,” Entrenova-Enterprise
Research Innovation Conference, vol. 3, no. 1, pp. 62–68, 2017.
[24] A. Kadriu, L. Abazi, and H. Abazi, “Albanian text classification: Bag of words model and word analogies,” Business Systems
Research Journal, vol. 10, no. 1, pp. 74–87, Apr. 2019, doi: 10.2478/bsrj-2019-0006.
[25] E. Trandafili, N. Kote, and M. Biba, “Performance evaluation of text categorization algorithms using an Albanian corpus,” in
Advances in Internet, Data & Web Technologies, 2018, pp. 537–547.
[26] E. Trandafili, E. K. Meçe, K. Kica, and H. Paci, “A novel question answering system for Albanian language,” in Advances in
Internet, Data & Web Technologies, 2018, pp. 514–524.
[27] O. Piton, K. Lagji, and R. Përnaska, “Electronic dictionaries and transducers for automatic processing of the Albanian language,”
in Natural Language Processing and Information Systems, Berlin, Heidelberg: Springer Berlin Heidelberg, 2007, pp. 407–413,
doi: 10.1007/978-3-540-73351-5_38.
[28] O. Piton and K. Lagji, “Morphological study of Albanian words, and processing with NooJ,” arXiv preprint arXiv:1002.0485,
Feb. 2010.
[29] T. Arkhangelskij, M. Daniel, M. Morozova, and A. Rusakov, “Albanian and the languages of the Balkans,” (in Albanian), in
Konferencë shkencore - Scientific Conference, 2012, pp. 635–642.
[30] A. Kadriu, “NLTK tagger for albanian using iterative approach,” in Proceedings of the ITI 2013 35th International Conference on
Information Technology Interfaces, 2013, pp. 283–288, doi: 10.2498/iti.2013.0565.
[31] C. Kirov et al., “UniMorph 2.0: Universal Morphology,” in Proceedings of the Eleventh International Conference on Language
Resources and Evaluation (LREC 2018), 2018, pp. 1–6.
[32] N. N. Karanikolas, “Bootstrapping the Albanian information retrieval,” in 2009 Fourth Balkan Conference in Informatics, 2009,
pp. 231–235, doi: 10.1109/BCI.2009.16.
[33] I. Collaku and E. Adal, “Morphological parsing of albanian language: a different approach to albanian verbs,” in International
Conference on Computer Science and Communication Engineering, 2015, pp. 87–91, doi: 10.33107/ubt-ic.2015.94.
[34] L. Zhang, S. Wang, and B. Liu, “Deep learning for sentiment analysis: A survey,” WIREs Data Mining and Knowledge
Discovery, vol. 8, no. 4, Jul. 2018, doi: 10.1002/widm.1253.
[35] B. Liu, “Sentiment analysis and opinion mining,” Synthesis lectures on human language technologies, vol. 5, no. 1, pp. 1–167,
2012.
8. Int J Elec & Comp Eng ISSN: 2088-8708
Natural language processing for Albanian: A state-of-the-art survey (Muhamet Kastrati)
6439
BIOGRAPHIES OF AUTHORS
Muhamet Kastrati received the B.Eng. degree in computer engineering from the
University of Prishtina, Kosovo, in 2007 and an M.Sc. degree in computer engineering from
the University of Prishtina, Kosovo, in 2014. Currently, he is a Ph.D. candidate at the
department of computer science, University of New York Tirana, Albania. His research
interests include optimization algorithms, statistical relational learning, machine learning, deep
learning, natural language processing, and social network analysis. He can be contacted at
email: muhamet.kastrati@gmail.com.
Marenglen Biba received the Laurea Degree (5-year) Cumlaude degree in
computer science from the University of Bari, Italy, in 2004, and the Ph.D. degree in computer
sciences, in 2009. He has been a professor of computer sciences at the University of New York
Tirana, since 2009. He is currently the Dean of the Faculty of Engineering and Architecture,
University of New York Tirana in Albania. He has authored or co-authored several papers
published in international journals and conferences and has served as a reviewer for many
reputed journals. His research interests include artificial intelligence, machine learning,
pattern recognition, data mining, computational biology, document image understanding,
information extraction, and social network analysis. He can be contacted at email:
marenglenbiba@unyt.edu.al.