May 2024: Top 10 Cited Articles in Natural
Language Computing
International Journal on Natural Language
Computing (IJNLC)
https://airccse.org/journal/ijnlc/index.html
ISSN: 2278 - 1307 [Online]; 2319 - 4111 [Print]
Google Scholar
https://scholar.google.com/citations?user=A5tqIdoAAAAJ&hl=en
Rag-Fusion: A New Take on Retrieval Augmented Generation
Zackary Rackauckas, Infineon Technologies, California
Abstract
Infineon has identified a need for engineers, account managers, and customers to rapidly obtain
product information. This problem is traditionally addressed with retrieval-augmented generation
(RAG) chatbots, but in this study, I evaluated the use of the newly popularized RAG-Fusion
method. RAG-Fusion combines RAG and reciprocal rank fusion (RRF) by generating multiple
queries, reranking them with reciprocal scores and fusing the documents and scores. Through
manually evaluating answers on accuracy, relevance, and comprehensiveness, I found that RAG-
Fusion was able to provide accurate and comprehensive answers due to the generated queries
contextualizing the original query from various perspectives. However, some answers strayed off
topic when the generated queries' relevance to the original query was insufficient. This research
marks significant progress in artificial intelligence (AI) and natural language processing (NLP)
applications and demonstrates transformations in a global and multi-industry context.
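The fusion step described in this abstract can be illustrated with a minimal sketch of reciprocal rank fusion. The abstract does not give implementation details, so the function name, the sample document ids, and the smoothing constant k=60 (a commonly used default for RRF) are assumptions made for illustration only, not the author's code.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one scored ranking.

    Each document receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1. k=60 is an assumed default, not a value taken
    from the paper.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical retrieval results, one ranked list per generated query.
rankings = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a", "doc_d"],
    ["doc_b", "doc_c", "doc_a"],
]
fused = reciprocal_rank_fusion(rankings)
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

In a RAG-Fusion style pipeline, the top fused documents would then be passed to the language model together with the original query.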
Keywords
Chatbot, Retrieval-augmented Generation, Reciprocal Rank Fusion, Natural Language
Processing
Full Text: https://aircconline.com/ijnlc/V13N1/13124ijnlc03.pdf
Volume URL: http://airccse.org/journal/ijnlc/vol13.html
Performance, Energy Consumption and Costs: A Comparative Analysis of Automatic Text
Classification Approaches in the Legal Domain
Leonardo Rigutini1, Achille Globo1, Marco Stefanelli2, Andrea Zugarini1, Sinan Gultekin1,
Marco Ernandes1, 1expert.ai spa, Italy, 2University of Siena, Italy
Abstract
The common practice in Machine Learning research is to evaluate the top-performing models
based on their performance. However, this often leads to overlooking other crucial aspects that
should be given careful consideration. In some cases, the performance differences between
various approaches may be insignificant, whereas factors like production costs, energy
consumption, and carbon footprint should be taken into account. Large Language Models
(LLMs) are widely used in academia and industry to address NLP problems. In this study, we
present a comprehensive quantitative comparison between traditional approaches (SVM-based)
and more recent approaches such as LLM (BERT family models) and generative models (GPT2
and LLAMA2), using the LexGLUE benchmark. Our evaluation takes into account not only
performance parameters (standard indices), but also alternative measures such as timing, energy
consumption and costs, which collectively contribute to the carbon footprint. To ensure a
complete analysis, we separately considered the prototyping phase (which involves model
selection through training-validation-test iterations) and the in-production phases. These phases
follow distinct implementation procedures and require different resources. The results indicate
that simpler algorithms often achieve performance levels similar to those of complex models
(LLM and generative models), consuming much less energy and requiring fewer resources.
These findings suggest that companies should weigh additional factors when choosing
machine learning (ML) solutions. The analysis also shows that the scientific community
increasingly needs to account for energy consumption in model
evaluations, so that results obtained with standard
metrics (Precision, Recall, F1 and so on) can be interpreted meaningfully.
Keywords
NLP, text mining, green AI, green NLP, carbon footprint, energy consumption, evaluation.
Full Text: https://aircconline.com/ijnlc/V13N1/13124ijnlc02.pdf
Volume URL: http://airccse.org/journal/ijnlc/vol13.html
A Study on the Appropriate Size of the Mongolian General Corpus
Choi Sun Soo1 and Ganbat Tsend2, 1University of the Humanities, Mongolia, 2Otgontenger
University, Mongolia
Abstract
This study aims to determine the appropriate size of the Mongolian general corpus using
the Heaps’ function and the Type-Token Ratio (TTR). The
sample corpus of 906,064 tokens comprised texts from
10 domains: newspaper politics, economy, society, culture, sports, world articles and laws,
middle and high school literature textbooks, interview articles, and podcast transcripts. First, we
estimated the Heaps’ function with this sample corpus. Next, we observed changes in the number
of types and TTR values while increasing the number of tokens by one million at a time using the
estimated Heaps’ function. We found that the TTR value hardly
changed once the number of tokens exceeded 39-42 million. Thus, we conclude that an
appropriate size for a Mongolian general corpus is 39-42 million tokens.
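The procedure in this abstract can be sketched as follows, assuming the usual form of Heaps' function, V(N) = K * N^beta, where N is the number of tokens and V the number of types, with TTR = V / N. The observations below are synthetic values generated from K = 30 and beta = 0.6 purely for illustration. They are not the paper's data, and the helper names are hypothetical.

```python
import math


def fit_heaps(observations):
    """Least-squares fit of log V = log K + beta * log N.

    observations: list of (n_tokens, n_types) pairs from a sample corpus.
    Returns the estimated (K, beta).
    """
    xs = [math.log(n) for n, _ in observations]
    ys = [math.log(v) for _, v in observations]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    k = math.exp(mean_y - beta * mean_x)
    return k, beta


def predicted_ttr(n_tokens, k, beta):
    """Type-Token Ratio predicted by the fitted Heaps' function."""
    return (k * n_tokens ** beta) / n_tokens


# Synthetic (tokens, types) points; NOT taken from the paper.
observations = [(n, 30 * n ** 0.6) for n in (10_000, 100_000, 500_000, 900_000)]
k, beta = fit_heaps(observations)

# Extrapolate in steps of one million tokens and watch how TTR flattens.
previous = None
for millions in range(1, 6):
    ttr = predicted_ttr(millions * 1_000_000, k, beta)
    change = None if previous is None else previous - ttr
    print(millions, round(ttr, 5), change)
    previous = ttr
```

A stopping rule, such as a threshold on the change in TTR between successive steps, would be needed to turn this into a corpus-size estimate; the abstract does not state which criterion was used.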
Keywords
Mongolian general corpus, Appropriate size of corpus, Sample corpus, Heaps’ function, TTR,
Type, Token.
Full Text: https://aircconline.com/ijnlc/V12N3/12323ijnlc02.pdf
Volume URL: http://airccse.org/journal/ijnlc/vol12.html
Evaluating BERT and ParsBERT for Analyzing Persian Advertisement Data
Ali Mehrban1 and Pegah Ahadian2, 1Newcastle University, UK, 2Kent State University,
USA
Abstract
This paper discusses the impact of the Internet on modern trading and the importance of data
generated from these transactions for organizations to improve their marketing efforts. The paper
uses the example of Divar, an online marketplace for buying and selling products and services in
Iran, and presents a competition to predict the percentage of a car sales ad that would be
published on the Divar website. Since the dataset provides a rich source of Persian text data, the
authors use the Hazm library, a Python library designed for processing Persian text, and two
state-of-the-art language models, mBERT and ParsBERT, to analyze it. The paper's primary
objective is to compare the performance of mBERT and ParsBERT on the Divar dataset. The
authors provide some background on data mining, Persian language, and the two language
models, examine the dataset's composition and statistical features, and provide details on their
fine-tuning and training configurations for both approaches. They present the results of their
analysis and highlight the strengths and weaknesses of the two language models when applied to
Persian text data. The paper offers valuable insights into the challenges and opportunities of
working with low-resource languages such as Persian and the potential of advanced language
models like BERT for analyzing such data. The paper also explains the data mining process,
including steps such as data cleaning and normalization techniques. Finally, the paper discusses
the types of machine learning problems, such as supervised, unsupervised, and reinforcement
learning, and the pattern evaluation techniques, such as confusion matrix. Overall, the paper
provides an informative overview of the use of language models and data mining techniques for
analyzing text data in low-resource languages, using the example of the Divar dataset.
Keywords
Text Recognition, Persian text, NLP, mBERT, ParsBERT
Full Text: https://aircconline.com/ijnlc/V12N2/12223ijnlc02.pdf
Volume URL: http://airccse.org/journal/ijnlc/vol12.html
Understanding Chinese Moral Stories with Further Pre-Training
Jing Qian1, Yong Yue1, Katie Atkinson2 and Gangmin Li3, 1Xi’an Jiaotong Liverpool
University, China, 2University of Liverpool, UK, 3University of Bedfordshire, UK
Abstract
The goal of moral understanding is to grasp the theoretical concepts embedded in a narrative by
delving beyond the concrete occurrences and dynamic personas. Specifically, the narrative is
compacted into a single statement without involving any characters within the original text,
necessitating a more astute language model that can comprehend connotative morality and
exhibit commonsense reasoning. The “pre-training + fine-tuning” paradigm is widely embraced
in neural language models. In this paper, we propose an intermediary phase to establish an
improved paradigm of “pre-training + further pre-training + fine-tuning”. Further pre-training
generally refers to continual learning on task-specific or domain-relevant corpora before being
applied to target tasks, which aims at bridging the gap in data distribution between the phases of
pre-training and fine-tuning. Our work is based on a Chinese dataset named STORAL-ZH that
consists of 4k human-written story-moral pairs. Furthermore, we design a two-step process of
domain-adaptive pre-training in the intermediary phase. The first step relies on a newly
collected Chinese dataset of Confucian moral culture. The second step builds on the Chinese
version of a frequently used commonsense knowledge graph (i.e., ATOMIC) to enrich the
backbone model with inferential knowledge besides morality. By comparison with several
advanced models including BERT-base, RoBERTa-base and T5-base, experimental results on
two understanding tasks demonstrate the effectiveness of our proposed three-phase paradigm.
Keywords
Moral Understanding, Further Pre-training, Knowledge Graph, Pre-trained Language Model
Full Text: https://aircconline.com/ijnlc/V12N2/12223ijnlc01.pdf
Volume URL: http://airccse.org/journal/ijnlc/vol12.html
Location-Based Sentiment Analysis of 2019 Nigeria Presidential
Election Using a Voting Ensemble Approach
Ikechukwu Onyenwe1, Samuel N.C. Nwagbo2, Ebele Onyedinma1, Onyedika Ikechukwu-
Onyenwe1, Chidinma A. Nwafor3 and Obinna Agbata1
1 Computer Science Department, Nnamdi Azikiwe University, Onitsha-Enugu Expressway,
Awka, PMB 5025, Anambra, Nigeria.
2 Political Science Department, Nnamdi Azikiwe University, Onitsha-Enugu Expressway, Awka,
PMB 5025, Anambra, Nigeria.
3 Computer Science Department, Nigerian Army College of Environmental Science and
Technology, North-Bank, Makurdi, PMB 102272, Benue, Nigeria.
Abstract
Nigeria's President Buhari defeated his closest rival, Atiku Abubakar, by over 3 million votes. He
was issued a Certificate of Return and was sworn in on 29 May 2019. However, there were
claims by the opposition of a widespread hoax. Sentiment analysis captures the opinions of the
masses over social media for global events. In this paper, we use 2019 Nigeria presidential
election tweets to perform sentiment analysis through the application of a voting ensemble
approach (VEA) in which the predictions from multiple techniques are combined to find the best
polarity of a tweet (sentence). This is to determine public views on the 2019 Nigeria Presidential
elections and compare them with actual election results. Our sentiment analysis experiment is
focused on location-based viewpoints where we used Twitter location data. For this experiment,
we live-streamed Nigeria 2019 election tweets via the Twitter API to create a dataset of 583,816
tweets, pre-processed the data, and applied VEA by utilizing three different sentiment classifiers
to obtain the best polarity of a given tweet. Furthermore, we segmented our tweets dataset
into Nigerian states and geopolitical zones, then plotted state-wise and geopolitical-wise user
sentiments towards Buhari and Atiku and their political parties. The overall objective of the use
of states/geopolitical zones is to evaluate the similarity between the sentiment of location-based
tweets compared to actual election results. The results reveal that, in most cases, election
outcomes coincide with the sentiment expressed on Twitter, as
shown by the polarity scores of different locations; however, there are also some election results where our
location analysis similarity test failed.
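The voting ensemble idea in this abstract can be sketched as a simple majority vote over the labels produced by several classifiers, followed by a per-location tally. The abstract does not name the three classifiers or say how ties are resolved, so the label set, the fallback to "neutral" on a tie, and the sample data below are all assumptions made for illustration.

```python
from collections import Counter


def vote(predictions, tie_label="neutral"):
    """Return the majority label among classifier predictions.

    If the two most frequent labels are tied, fall back to tie_label
    (an assumed rule; the paper may resolve ties differently).
    """
    counts = Counter(predictions).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return tie_label
    return counts[0][0]


def tally_by_location(tweets):
    """Count voted polarities per location.

    tweets: iterable of (location, [label from each classifier]).
    """
    tallies = {}
    for location, predictions in tweets:
        tallies.setdefault(location, Counter())[vote(predictions)] += 1
    return tallies


# Hypothetical classifier outputs for four tweets in two states.
sample = [
    ("Lagos", ["positive", "positive", "negative"]),
    ("Lagos", ["negative", "negative", "negative"]),
    ("Kano", ["positive", "negative", "neutral"]),
    ("Kano", ["positive", "positive", "positive"]),
]
tallies = tally_by_location(sample)
print(tallies)
```

The per-location counts could then be compared with the actual result for each state or geopolitical zone.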
Keywords
Nigeria, Election, Sentiment Analysis, Politics, Tweets, Exploration Data Analysis, location data
Full Text: https://aircconline.com/ijnlc/V12N1/12123ijnlc01.pdf
Volume URL: https://airccse.org/journal/ijnlc/vol12.html
Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional Context
for Continuous Speech Recognition
Piyush Behre, Sharman Tan, Padma Varadharajan and Shuangyu Chang, Microsoft
Corporation
Abstract
While speech recognition Word Error Rate (WER) has reached human parity for English,
continuous speech recognition scenarios such as voice typing and meeting transcriptions still
suffer from segmentation and punctuation problems, resulting from irregular pausing patterns or
slow speakers. Transformer sequence tagging models are effective at capturing long bi-
directional context, which is crucial for automatic punctuation. Automatic Speech Recognition
(ASR) production systems, however, are constrained by real-time requirements, making it hard
to incorporate the right context when making punctuation decisions. Context within the segments
produced by ASR decoders can be helpful but limiting in overall punctuation performance for a
continuous speech session. In this paper, we propose a streaming approach for punctuation or re-
punctuation of ASR output using dynamic decoding windows and measure its impact on
punctuation and segmentation accuracy across scenarios. The new system tackles over-segmentation
issues, improving segmentation F0.5-score by 13.9%. Streaming punctuation
achieves an average BLEU score improvement of 0.66 for the downstream task of Machine
Translation (MT).
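For readers unfamiliar with the F0.5-score mentioned above, the sketch below uses the standard F-beta definition, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). With beta = 0.5, precision counts for more than recall, which plausibly suits over-segmentation, where spurious boundaries lower precision. The precision and recall values are made-up inputs, not results from the paper.

```python
def f_beta(precision, recall, beta=0.5):
    """Standard F-beta score; beta < 1 favors precision over recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Made-up values: the same pair of numbers, swapped between P and R.
high_precision = f_beta(precision=0.8, recall=0.5)
high_recall = f_beta(precision=0.5, recall=0.8)
print(round(high_precision, 4), round(high_recall, 4))
```

The system with higher precision scores better under F0.5 even though both pairs give the same F1.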
Keywords
automatic punctuation, automatic speech recognition, re-punctuation, speech segmentation.
Full Text: https://aircconline.com/ijnlc/V11N6/11622ijnlc01.pdf
Volume URL: http://airccse.org/journal/ijnlc/vol11.html
A Robust Three-Stage Hybrid Framework for English to Bangla Transliteration
Redwan Ahmed Rizvee, Asif Mahmood, Shakur Shams Mullick and Sajjadul Hakim, Tiger
IT Bangladesh Limited, Dhaka, Bangladesh
Abstract
Phonetic typing using the English alphabet has become widely popular nowadays for social
media and chat services. As a result, a text containing various English and Bangla words and
phrases has become increasingly common. Existing transliteration tools display poor
performance for such texts. This paper proposes a robust Three-stage Hybrid Transliteration
(THT) framework that can transliterate both English words and phonetic typed Bangla words
satisfactorily. This is achieved by adopting a hybrid approach of dictionary-based and rule-based
techniques. Experimental results confirm superiority of THT as it significantly outperforms the
benchmark transliteration tool.
Keywords
Transliteration framework, phonetic typing, English to Bangla, hybrid framework, THT.
Full Text: https://aircconline.com/ijnlc/V11N1/11122ijnlc04.pdf
Volume URL: http://airccse.org/journal/ijnlc/vol11.html
Analyzing Architectures for Neural Machine Translation using Low Computational
Resources
Aditya Mandke, Onkar Litake, and Dipali Kadam, SCTR’s Pune Institute of Computer
Technology, India
Abstract
With the recent developments in the field of Natural Language Processing, there has been a rise
in the use of different architectures for Neural Machine Translation. Transformer architectures
are used to achieve state-of-the-art accuracy, but they are very computationally expensive to
train. Not everyone has access to setups with high-end GPUs and other resources. We
train our models on low computational resources and investigate the results. As expected,
transformers outperformed other architectures, but there were some surprising results.
Transformers consisting of more encoders and decoders took more time to train but achieved lower
BLEU scores. LSTM performed well in the experiment and took comparatively less time to train
than transformers, making it suitable for situations with time constraints.
Keywords
Machine Translation, Indic Languages, Natural Language Processing.
Full Text: https://aircconline.com/ijnlc/V10N5/10521ijnlc02.pdf
Volume URL: http://airccse.org/journal/ijnlc/vol10.html
Developing Products Update-Alert System for E-Commerce Websites Users using Html
Data and Web Scraping Technique
Ikechukwu Onyenwe, Ebele Onyedinma, Chidinma Nwafor and Obinna Agbata, Nnamdi
Azikiwe University, Nigeria
Abstract
Websites are regarded as domains of limitless information which anyone and everyone can
access. The new trend of technology has shaped the way we do and manage our businesses.
Today, advancements in Internet technology have given rise to the proliferation of e-commerce
websites. This, in turn, has made the activities and lifestyles of marketers/vendors, retailers and
consumers (collectively regarded as users in this paper) easier, as it provides convenient
platforms to sell/order items through the internet. Unfortunately, these desirable benefits are not
without drawbacks, as these platforms require users to spend a lot of time and effort
searching for the best product deals, product updates and offers on e-commerce websites.
Furthermore, they need to filter and compare search results by themselves, which takes a lot of
time and may yield ambiguous results. In this paper, we applied web crawling and
scraping methods on an e-commerce website to obtain HTML data for identifying product
updates based on the current time. These HTML data are preprocessed to extract details of the
products such as name, price, post date and time, etc. to serve as useful information for users.
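The extraction step described above can be sketched with Python's standard-library HTML parser. The markup, the class names (product, name, price, posted), the timestamp format, and the one-day window are all hypothetical, since the abstract does not describe the target website's structure or the authors' tooling.

```python
from datetime import datetime, timedelta
from html.parser import HTMLParser

# Hypothetical listing markup; a real site will differ.
SAMPLE_HTML = """
<div class="product">
  <span class="name">Phone X</span>
  <span class="price">120000</span>
  <span class="posted">2021-10-01 09:30</span>
</div>
<div class="product">
  <span class="name">Laptop Y</span>
  <span class="price">450000</span>
  <span class="posted">2021-10-03 14:05</span>
</div>
"""


class ProductParser(HTMLParser):
    """Collect name, price and post time from each product block."""

    FIELDS = {"name", "price", "posted"}

    def __init__(self):
        super().__init__()
        self.products = []
        self._field = None  # field whose text is expected next

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "product" in classes:
            self.products.append({})
        for cls in classes:
            if cls in self.FIELDS and self.products:
                self._field = cls

    def handle_data(self, data):
        if self._field and data.strip():
            self.products[-1][self._field] = data.strip()
            self._field = None

    def handle_endtag(self, tag):
        self._field = None


def recent_products(html, now, window=timedelta(days=1)):
    """Return products posted within `window` before `now`."""
    parser = ProductParser()
    parser.feed(html)
    recent = []
    for product in parser.products:
        posted = datetime.strptime(product["posted"], "%Y-%m-%d %H:%M")
        if timedelta(0) <= now - posted <= window:
            recent.append(product)
    return recent


updates = recent_products(SAMPLE_HTML, now=datetime(2021, 10, 3, 18, 0))
print(updates)
```

In a live alert system the `now` argument would come from `datetime.now()`, and the HTML would be fetched by a crawler rather than embedded as a string.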
Keywords
NATURAL LANGUAGE PREPROCESSING (NLP), E-COMMERCE, E-RETAIL, HTML,
DATA, Web, Web Scraping
Full Text: https://aircconline.com/ijnlc/V10N5/10521ijnlc01.pdf
Volume URL: http://airccse.org/journal/ijnlc/vol10.html

More Related Content

Similar to May 2024 - Top10 Cited Articles in Natural Language Computing

taghelper-final.doc
taghelper-final.doctaghelper-final.doc
taghelper-final.doc
butest
 
An Ontology-Based Information Extraction Approach For R Sum S
An Ontology-Based Information Extraction Approach For R Sum SAn Ontology-Based Information Extraction Approach For R Sum S
An Ontology-Based Information Extraction Approach For R Sum S
Richard Hogue
 
Word Embedding In IR
Word Embedding In IRWord Embedding In IR
Word Embedding In IR
Bhaskar Chatterjee
 
Aq35241246
Aq35241246Aq35241246
Aq35241246
IJERA Editor
 
A0210110
A0210110A0210110
SWSN UNIT-3.pptx we can information about swsn professional
SWSN UNIT-3.pptx we can information about swsn professionalSWSN UNIT-3.pptx we can information about swsn professional
SWSN UNIT-3.pptx we can information about swsn professional
gowthamnaidu0986
 
Relevance feature discovery for text mining
Relevance feature discovery for text miningRelevance feature discovery for text mining
Relevance feature discovery for text mining
redpel dot com
 
Great model a model for the automatic generation of semantic relations betwee...
Great model a model for the automatic generation of semantic relations betwee...Great model a model for the automatic generation of semantic relations betwee...
Great model a model for the automatic generation of semantic relations betwee...
ijcsity
 
Classification of News and Research Articles Using Text Pattern Mining
Classification of News and Research Articles Using Text Pattern MiningClassification of News and Research Articles Using Text Pattern Mining
Classification of News and Research Articles Using Text Pattern Mining
IOSR Journals
 
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACH
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACHTEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACH
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACH
IJDKP
 
EXPERT OPINION AND COHERENCE BASED TOPIC MODELING
EXPERT OPINION AND COHERENCE BASED TOPIC MODELINGEXPERT OPINION AND COHERENCE BASED TOPIC MODELING
EXPERT OPINION AND COHERENCE BASED TOPIC MODELING
ijnlc
 
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE
ijdms
 
1808.10245v1 (1).pdf
1808.10245v1 (1).pdf1808.10245v1 (1).pdf
1808.10245v1 (1).pdf
KSHITIJCHAUDHARY20
 
Nlp research presentation
Nlp research presentationNlp research presentation
Nlp research presentation
Surya Sg
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER) International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
ijceronline
 
A Review Of Text Mining Techniques And Applications
A Review Of Text Mining Techniques And ApplicationsA Review Of Text Mining Techniques And Applications
A Review Of Text Mining Techniques And Applications
Lisa Graves
 
A SEMANTIC METADATA ENRICHMENT SOFTWARE ECOSYSTEM BASED ON TOPIC METADATA ENR...
A SEMANTIC METADATA ENRICHMENT SOFTWARE ECOSYSTEM BASED ON TOPIC METADATA ENR...A SEMANTIC METADATA ENRICHMENT SOFTWARE ECOSYSTEM BASED ON TOPIC METADATA ENR...
A SEMANTIC METADATA ENRICHMENT SOFTWARE ECOSYSTEM BASED ON TOPIC METADATA ENR...
IJDKP
 
A Comparative Study of Text Comprehension in IELTS Reading Exam using GPT-3
A Comparative Study of Text Comprehension in IELTS Reading Exam using GPT-3A Comparative Study of Text Comprehension in IELTS Reading Exam using GPT-3
A Comparative Study of Text Comprehension in IELTS Reading Exam using GPT-3
AIRCC Publishing Corporation
 
The Value and Benefits of Data-to-Text Technologies
The Value and Benefits of Data-to-Text TechnologiesThe Value and Benefits of Data-to-Text Technologies
The Value and Benefits of Data-to-Text Technologies
International Journal of Modern Research in Engineering and Technology
 
[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya
[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya
[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya
IJET - International Journal of Engineering and Techniques
 

Similar to May 2024 - Top10 Cited Articles in Natural Language Computing (20)

taghelper-final.doc
taghelper-final.doctaghelper-final.doc
taghelper-final.doc
 
An Ontology-Based Information Extraction Approach For R Sum S
An Ontology-Based Information Extraction Approach For R Sum SAn Ontology-Based Information Extraction Approach For R Sum S
An Ontology-Based Information Extraction Approach For R Sum S
 
Word Embedding In IR
Word Embedding In IRWord Embedding In IR
Word Embedding In IR
 
Aq35241246
Aq35241246Aq35241246
Aq35241246
 
A0210110
A0210110A0210110
A0210110
 
SWSN UNIT-3.pptx we can information about swsn professional
SWSN UNIT-3.pptx we can information about swsn professionalSWSN UNIT-3.pptx we can information about swsn professional
SWSN UNIT-3.pptx we can information about swsn professional
 
Relevance feature discovery for text mining
Relevance feature discovery for text miningRelevance feature discovery for text mining
Relevance feature discovery for text mining
 
Great model a model for the automatic generation of semantic relations betwee...
Great model a model for the automatic generation of semantic relations betwee...Great model a model for the automatic generation of semantic relations betwee...
Great model a model for the automatic generation of semantic relations betwee...
 
Classification of News and Research Articles Using Text Pattern Mining
Classification of News and Research Articles Using Text Pattern MiningClassification of News and Research Articles Using Text Pattern Mining
Classification of News and Research Articles Using Text Pattern Mining
 
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACH
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACHTEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACH
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACH
 
EXPERT OPINION AND COHERENCE BASED TOPIC MODELING
EXPERT OPINION AND COHERENCE BASED TOPIC MODELINGEXPERT OPINION AND COHERENCE BASED TOPIC MODELING
EXPERT OPINION AND COHERENCE BASED TOPIC MODELING
 
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE
 
1808.10245v1 (1).pdf
1808.10245v1 (1).pdf1808.10245v1 (1).pdf
1808.10245v1 (1).pdf
 
Nlp research presentation
Nlp research presentationNlp research presentation
Nlp research presentation
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER) International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
A Review Of Text Mining Techniques And Applications
A Review Of Text Mining Techniques And ApplicationsA Review Of Text Mining Techniques And Applications
A Review Of Text Mining Techniques And Applications
 
A SEMANTIC METADATA ENRICHMENT SOFTWARE ECOSYSTEM BASED ON TOPIC METADATA ENR...
A SEMANTIC METADATA ENRICHMENT SOFTWARE ECOSYSTEM BASED ON TOPIC METADATA ENR...A SEMANTIC METADATA ENRICHMENT SOFTWARE ECOSYSTEM BASED ON TOPIC METADATA ENR...
A SEMANTIC METADATA ENRICHMENT SOFTWARE ECOSYSTEM BASED ON TOPIC METADATA ENR...
 
A Comparative Study of Text Comprehension in IELTS Reading Exam using GPT-3
A Comparative Study of Text Comprehension in IELTS Reading Exam using GPT-3A Comparative Study of Text Comprehension in IELTS Reading Exam using GPT-3
A Comparative Study of Text Comprehension in IELTS Reading Exam using GPT-3
 
The Value and Benefits of Data-to-Text Technologies
The Value and Benefits of Data-to-Text TechnologiesThe Value and Benefits of Data-to-Text Technologies
The Value and Benefits of Data-to-Text Technologies
 
[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya
[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya
[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya
 

More from kevig

Identification and Classification of Named Entities in Indian Languages
Identification and Classification of Named Entities in Indian LanguagesIdentification and Classification of Named Entities in Indian Languages
Identification and Classification of Named Entities in Indian Languages
kevig
 
Effect of Query Formation on Web Search Engine Results
Effect of Query Formation on Web Search Engine ResultsEffect of Query Formation on Web Search Engine Results
Effect of Query Formation on Web Search Engine Results
kevig
 
Investigations of the Distributions of Phonemic Durations in Hindi and Dogri
Investigations of the Distributions of Phonemic Durations in Hindi and DogriInvestigations of the Distributions of Phonemic Durations in Hindi and Dogri
Investigations of the Distributions of Phonemic Durations in Hindi and Dogri
kevig
 
Effect of Singular Value Decomposition Based Processing on Speech Perception
Effect of Singular Value Decomposition Based Processing on Speech PerceptionEffect of Singular Value Decomposition Based Processing on Speech Perception
Effect of Singular Value Decomposition Based Processing on Speech Perception
kevig
 
Identifying Key Terms in Prompts for Relevance Evaluation with GPT Models
Identifying Key Terms in Prompts for Relevance Evaluation with GPT ModelsIdentifying Key Terms in Prompts for Relevance Evaluation with GPT Models
Identifying Key Terms in Prompts for Relevance Evaluation with GPT Models
kevig
 
Identifying Key Terms in Prompts for Relevance Evaluation with GPT Models
Identifying Key Terms in Prompts for Relevance Evaluation with GPT ModelsIdentifying Key Terms in Prompts for Relevance Evaluation with GPT Models
Identifying Key Terms in Prompts for Relevance Evaluation with GPT Models
kevig
 
IJNLC 2013 - Ambiguity-Aware Document Similarity
IJNLC  2013 - Ambiguity-Aware Document SimilarityIJNLC  2013 - Ambiguity-Aware Document Similarity
IJNLC 2013 - Ambiguity-Aware Document Similarity
kevig
 
Genetic Approach For Arabic Part Of Speech Tagging
Genetic Approach For Arabic Part Of Speech TaggingGenetic Approach For Arabic Part Of Speech Tagging
Genetic Approach For Arabic Part Of Speech Tagging
kevig
 
Rule Based Transliteration Scheme for English to Punjabi
Rule Based Transliteration Scheme for English to PunjabiRule Based Transliteration Scheme for English to Punjabi
Rule Based Transliteration Scheme for English to Punjabi
kevig
 
Improving Dialogue Management Through Data Optimization
Improving Dialogue Management Through Data OptimizationImproving Dialogue Management Through Data Optimization
Improving Dialogue Management Through Data Optimization
kevig
 
Document Author Classification using Parsed Language Structure
Document Author Classification using Parsed Language StructureDocument Author Classification using Parsed Language Structure
Document Author Classification using Parsed Language Structure
kevig
 
Rag-Fusion: A New Take on Retrieval Augmented Generation
Rag-Fusion: A New Take on Retrieval Augmented GenerationRag-Fusion: A New Take on Retrieval Augmented Generation
Rag-Fusion: A New Take on Retrieval Augmented Generation
kevig
 
Performance, Energy Consumption and Costs: A Comparative Analysis of Automati...
Performance, Energy Consumption and Costs: A Comparative Analysis of Automati...Performance, Energy Consumption and Costs: A Comparative Analysis of Automati...
Performance, Energy Consumption and Costs: A Comparative Analysis of Automati...
kevig
 
Evaluation of Medium-Sized Language Models in German and English Language
Evaluation of Medium-Sized Language Models in German and English LanguageEvaluation of Medium-Sized Language Models in German and English Language
Evaluation of Medium-Sized Language Models in German and English Language
kevig
 
IMPROVING DIALOGUE MANAGEMENT THROUGH DATA OPTIMIZATION
IMPROVING DIALOGUE MANAGEMENT THROUGH DATA OPTIMIZATIONIMPROVING DIALOGUE MANAGEMENT THROUGH DATA OPTIMIZATION
IMPROVING DIALOGUE MANAGEMENT THROUGH DATA OPTIMIZATION
kevig
 
Document Author Classification Using Parsed Language Structure
Document Author Classification Using Parsed Language StructureDocument Author Classification Using Parsed Language Structure
Document Author Classification Using Parsed Language Structure
kevig
 
RAG-FUSION: A NEW TAKE ON RETRIEVALAUGMENTED GENERATION
RAG-FUSION: A NEW TAKE ON RETRIEVALAUGMENTED GENERATIONRAG-FUSION: A NEW TAKE ON RETRIEVALAUGMENTED GENERATION
RAG-FUSION: A NEW TAKE ON RETRIEVALAUGMENTED GENERATION
kevig
 
Performance, energy consumption and costs: a comparative analysis of automati...
Performance, energy consumption and costs: a comparative analysis of automati...Performance, energy consumption and costs: a comparative analysis of automati...
Performance, energy consumption and costs: a comparative analysis of automati...
kevig
 
EVALUATION OF MEDIUM-SIZED LANGUAGE MODELS IN GERMAN AND ENGLISH LANGUAGE
EVALUATION OF MEDIUM-SIZED LANGUAGE MODELS IN GERMAN AND ENGLISH LANGUAGEEVALUATION OF MEDIUM-SIZED LANGUAGE MODELS IN GERMAN AND ENGLISH LANGUAGE
EVALUATION OF MEDIUM-SIZED LANGUAGE MODELS IN GERMAN AND ENGLISH LANGUAGE
kevig
 
February 2024 - Top 10 cited articles.pdf
February 2024 - Top 10 cited articles.pdfFebruary 2024 - Top 10 cited articles.pdf
February 2024 - Top 10 cited articles.pdf
kevig
 


May 2024 - Top10 Cited Articles in Natural Language Computing

  • 1. May 2024: Top10 Cited Articles in Natural Language Computing International Journal on Natural Language Computing (IJNLC) https://airccse.org/journal/ijnlc/index.html ISSN: 2278 - 1307 [Online]; 2319 - 4111 [Print] Google Scholar https://scholar.google.com/citations?user=A5tqIdoAAAAJ&hl=en
  • 2. Rag-Fusion: A New Take on Retrieval Augmented Generation Zackary Rackauckas, Infineon Technologies, California Abstract Infineon has identified a need for engineers, account managers, and customers to rapidly obtain product information. This problem is traditionally addressed with retrieval-augmented generation (RAG) chatbots, but in this study, I evaluated the use of the newly popularized RAG-Fusion method. RAG-Fusion combines RAG and reciprocal rank fusion (RRF) by generating multiple queries, reranking them with reciprocal scores and fusing the documents and scores. Through manually evaluating answers on accuracy, relevance, and comprehensiveness, I found that RAG-Fusion was able to provide accurate and comprehensive answers due to the generated queries contextualizing the original query from various perspectives. However, some answers strayed off topic when the generated queries' relevance to the original query was insufficient. This research marks significant progress in artificial intelligence (AI) and natural language processing (NLP) applications and demonstrates transformations in a global and multi-industry context. Keywords Chatbot, Retrieval-augmented Generation, Reciprocal Rank Fusion, Natural Language Processing Full Text: https://aircconline.com/ijnlc/V13N1/13124ijnlc03.pdf Volume URL: http://airccse.org/journal/ijnlc/vol13.html
  • 3. Performance, Energy Consumption and Costs: A Comparative Analysis of Automatic Text Classification Approaches in the Legal Domain Leonardo Rigutini1, Achille Globo1, Marco Stefanelli2, Andrea Zugarini1, Sinan Gultekin1, Marco Ernandes1, 1expert.ai spa, Italy, 2University of Siena, Italy Abstract The common practice in Machine Learning research is to evaluate the top-performing models based on their performance. However, this often leads to overlooking other crucial aspects that should be given careful consideration. In some cases, the performance differences between various approaches may be insignificant, whereas factors like production costs, energy consumption, and carbon footprint should be taken into account. Large Language Models (LLMs) are widely used in academia and industry to address NLP problems. In this study, we present a comprehensive quantitative comparison between traditional approaches (SVM-based) and more recent approaches such as LLM (BERT family models) and generative models (GPT2 and LLAMA2), using the LexGLUE benchmark. Our evaluation takes into account not only performance parameters (standard indices), but also alternative measures such as timing, energy consumption and costs, which collectively contribute to the carbon footprint. To ensure a complete analysis, we separately considered the prototyping phase (which involves model selection through training-validation-test iterations) and the in-production phases. These phases follow distinct implementation procedures and require different resources. The results indicate that simpler algorithms often achieve performance levels similar to those of complex models (LLM and generative models), consuming much less energy and requiring fewer resources. These findings suggest that companies should weigh additional factors when choosing machine learning (ML) solutions.
The analysis also demonstrates that it is increasingly necessary for the scientific world to also begin to consider aspects of energy consumption in model evaluations, in order to be able to give real meaning to the results obtained using standard metrics (Precision, Recall, F1 and so on). Keywords NLP, text mining, green AI, green NLP, carbon footprint, energy consumption, evaluation. Full Text: https://aircconline.com/ijnlc/V13N1/13124ijnlc02.pdf Volume URL: http://airccse.org/journal/ijnlc/vol13.html
  • 4. A Study on the Appropriate Size of the Mongolian General Corpus Choi Sun Soo1 and Ganbat Tsend2, 1University of the Humanities, Mongolia, 2Otgontenger University, Mongolia Abstract This study aims to determine the appropriate size of the Mongolian general corpus. This study used the Heaps’ function and Type-Token Ratio (TTR) to determine the appropriate size of the Mongolian general corpus. This study’s sample corpus of 906,064 tokens comprised texts from 10 domains of newspaper politics, economy, society, culture, sports, world articles and laws, middle and high school literature textbooks, interview articles, and podcast transcripts. First, we estimated the Heaps’ function with this sample corpus. Next, we observed changes in the number of types and TTR values while increasing the number of tokens by one million using the estimated Heaps’ function. As a result of observation, we found that the TTR value hardly changed when the number of tokens exceeded 39~42 million. Thus, we conclude that an appropriate size for a Mongolian general corpus is 39-42 million tokens. Keywords Mongolian general corpus, Appropriate size of corpus, Sample corpus, Heaps’ function, TTR, Type, Token. Full Text: https://aircconline.com/ijnlc/V12N3/12323ijnlc02.pdf Volume URL: http://airccse.org/journal/ijnlc/vol12.html
  • 5. Evaluating BERT and ParsBERT for Analyzing Persian Advertisement Data Ali Mehrban1 and Pegah Ahadian2, 1Newcastle University, UK, 2Kent State University, USA Abstract This paper discusses the impact of the Internet on modern trading and the importance of data generated from these transactions for organizations to improve their marketing efforts. The paper uses the example of Divar, an online marketplace for buying and selling products and services in Iran, and presents a competition to predict the percentage of a car sales ad that would be published on the Divar website. Since the dataset provides a rich source of Persian text data, the authors use the Hazm library, a Python library designed for processing Persian text, and two state-of-the-art language models, mBERT and ParsBERT, to analyze it. The paper's primary objective is to compare the performance of mBERT and ParsBERT on the Divar dataset. The authors provide some background on data mining, Persian language, and the two language models, examine the dataset's composition and statistical features, and provide details on their fine-tuning and training configurations for both approaches. They present the results of their analysis and highlight the strengths and weaknesses of the two language models when applied to Persian text data. The paper offers valuable insights into the challenges and opportunities of working with low-resource languages such as Persian and the potential of advanced language models like BERT for analyzing such data. The paper also explains the data mining process, including steps such as data cleaning and normalization techniques. Finally, the paper discusses the types of machine learning problems, such as supervised, unsupervised, and reinforcement learning, and the pattern evaluation techniques, such as confusion matrix. 
Overall, the paper provides an informative overview of the use of language models and data mining techniques for analyzing text data in low-resource languages, using the example of the Divar dataset. Keywords Text Recognition, Persian text, NLP, mBERT, ParsBERT Full Text: https://aircconline.com/ijnlc/V12N2/12223ijnlc02.pdf Volume URL: http://airccse.org/journal/ijnlc/vol12.html
  • 6. Understanding Chinese Moral Stories with Further Pre-Training Jing Qian1, Yong Yue1, Katie Atkinson2 and Gangmin Li3, 1Xi’an Jiaotong Liverpool University, China, 2University of Liverpool, UK, 3University of Bedfordshire, UK Abstract The goal of moral understanding is to grasp the theoretical concepts embedded in a narrative by delving beyond the concrete occurrences and dynamic personas. Specifically, the narrative is compacted into a single statement without involving any characters within the original text, necessitating a more astute language model that can comprehend connotative morality and exhibit commonsense reasoning. The “pre-training + fine-tuning” paradigm is widely embraced in neural language models. In this paper, we propose an intermediary phase to establish an improved paradigm of “pre-training + further pre-training + fine-tuning”. Further pre-training generally refers to continual learning on task-specific or domain-relevant corpora before being applied to target tasks, which aims at bridging the gap in data distribution between the phases of pre-training and fine-tuning. Our work is based on a Chinese dataset named STORAL-ZH that is composed of 4k human-written story-moral pairs. Furthermore, we design a two-step process of domain-adaptive pre-training in the intermediary phase. The first step depends on a newly collected Chinese dataset of Confucian moral culture. The second step is based on the Chinese version of a frequently used commonsense knowledge graph (i.e. ATOMIC) to enrich the backbone model with inferential knowledge besides morality. By comparison with several advanced models including BERT-base, RoBERTa-base and T5-base, experimental results on two understanding tasks demonstrate the effectiveness of our proposed three-phase paradigm.
Keywords Moral Understanding, Further Pre-training, Knowledge Graph, Pre-trained Language Model Full Text: https://aircconline.com/ijnlc/V12N2/12223ijnlc01.pdf Volume URL: http://airccse.org/journal/ijnlc/vol12.html
  • 7. LOCATION-BASED SENTIMENT ANALYSIS OF 2019 NIGERIA PRESIDENTIAL ELECTION USING A VOTING ENSEMBLE APPROACH Ikechukwu Onyenwe1, Samuel N.C. Nwagbo2, Ebele Onyedinma1, Onyedika Ikechukwu-Onyenwe1, Chidinma A. Nwafor3 and Obinna Agbata1 1* Computer Science Department, Nnamdi Azikiwe University, Onitsha-Enugu Expressway, Awka, PMB 5025, Anambra, Nigeria. 2* Political Science Department, Nnamdi Azikiwe University, Onitsha-Enugu Expressway, Awka, PMB 5025, Anambra, Nigeria. 3* Computer Science Department, Nigerian Army College of Environmental Science and Technology, North-Bank, Makurdi, PMB 102272, Benue, Nigeria Abstract Nigeria's President Buhari defeated his closest rival Atiku Abubakar by over 3 million votes. He was issued a Certificate of Return and was sworn in on 29 May 2019. However, there were claims of widespread hoax by the opposition. Sentiment analysis captures the opinions of the masses over social media for global events. In this paper, we use 2019 Nigeria presidential election tweets to perform sentiment analysis through the application of a voting ensemble approach (VEA) in which the predictions from multiple techniques are combined to find the best polarity of a tweet (sentence). This is to determine public views on the 2019 Nigeria Presidential elections and compare them with actual election results. Our sentiment analysis experiment is focused on location-based viewpoints where we used Twitter location data. For this experiment, we live-streamed Nigeria 2019 election tweets via the Twitter API to create a dataset of 583,816 tweets, pre-processed the data, and applied VEA by utilizing three different Sentiment Classifiers to obtain the choicest polarity of a given tweet. Furthermore, we segmented our tweets dataset into Nigerian states and geopolitical zones, then plotted state-wise and geopolitical-wise user sentiments towards Buhari and Atiku and their political parties.
The overall objective of the use of states/geopolitical zones is to evaluate the similarity between the sentiment of location-based tweets compared to actual election results. The results reveal that whereas there are election outcomes that coincide with the sentiment expressed on Twitter social media in most cases as shown by the polarity scores of different locations, there are also some election results where our location analysis similarity test failed. Keywords Nigeria, Election, Sentiment Analysis, Politics, Tweets, Exploration Data Analysis, location data Full Text: https://aircconline.com/ijnlc/V12N1/12123ijnlc01.pdf Volume URL: https://airccse.org/journal/ijnlc/vol12.html
  • 8. Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional Context for Continuous Speech Recognition Piyush Behre, Sharman Tan, Padma Varadharajan and Shuangyu Chang, Microsoft Corporation Abstract While speech recognition Word Error Rate (WER) has reached human parity for English, continuous speech recognition scenarios such as voice typing and meeting transcriptions still suffer from segmentation and punctuation problems, resulting from irregular pausing patterns or slow speakers. Transformer sequence tagging models are effective at capturing long bi-directional context, which is crucial for automatic punctuation. Automatic Speech Recognition (ASR) production systems, however, are constrained by real-time requirements, making it hard to incorporate the right context when making punctuation decisions. Context within the segments produced by ASR decoders can be helpful but limiting in overall punctuation performance for a continuous speech session. In this paper, we propose a streaming approach for punctuation or re-punctuation of ASR output using dynamic decoding windows and measure its impact on punctuation and segmentation accuracy across scenarios. The new system tackles over-segmentation issues, improving segmentation F0.5-score by 13.9%. Streaming punctuation achieves an average BLEU score improvement of 0.66 for the downstream task of Machine Translation (MT). Keywords automatic punctuation, automatic speech recognition, re-punctuation, speech segmentation. Full Text: https://aircconline.com/ijnlc/V11N6/11622ijnlc01.pdf Volume URL: http://airccse.org/journal/ijnlc/vol11.html
  • 9. A Robust Three-Stage Hybrid Framework for English to Bangla Transliteration Redwan Ahmed Rizvee, Asif Mahmood, Shakur Shams Mullick and Sajjadul Hakim, Tiger IT Bangladesh Limited, Dhaka, Bangladesh Abstract Phonetic typing using the English alphabet has become widely popular nowadays for social media and chat services. As a result, text containing a mix of English and Bangla words and phrases has become increasingly common. Existing transliteration tools display poor performance for such texts. This paper proposes a robust Three-stage Hybrid Transliteration (THT) framework that can transliterate both English words and phonetically typed Bangla words satisfactorily. This is achieved by adopting a hybrid approach of dictionary-based and rule-based techniques. Experimental results confirm the superiority of THT as it significantly outperforms the benchmark transliteration tool. Keywords Transliteration framework, phonetic typing, English to Bangla, hybrid framework, THT. Full Text: https://aircconline.com/ijnlc/V11N1/11122ijnlc04.pdf Volume URL: http://airccse.org/journal/ijnlc/vol11.html
  • 10. Analyzing Architectures for Neural Machine Translation using Low Computational Resources Aditya Mandke, Onkar Litake, and Dipali Kadam, SCTR’s Pune Institute of Computer Technology, India Abstract With the recent developments in the field of Natural Language Processing, there has been a rise in the use of different architectures for Neural Machine Translation. Transformer architectures are used to achieve state-of-the-art accuracy, but they are very computationally expensive to train. Not everyone has access to such setups consisting of high-end GPUs and other resources. We train our models on low computational resources and investigate the results. As expected, transformers outperformed other architectures, but there were some surprising results. Transformers consisting of more encoders and decoders took more time to train but had lower BLEU scores. LSTM performed well in the experiment and took comparatively less time to train than transformers, making it suitable to use in situations having time constraints. Keywords Machine Translation, Indic Languages, Natural Language Processing. Full Text: https://aircconline.com/ijnlc/V10N5/10521ijnlc02.pdf Volume URL: http://airccse.org/journal/ijnlc/vol10.html
  • 11. Developing Products Update-Alert System for E-Commerce Websites Users using Html Data and Web Scraping Technique Ikechukwu Onyenwe, Ebele Onyedinma, Chidinma Nwafor and Obinna Agbata, Nnamdi Azikiwe University, Nigeria Abstract Websites are regarded as domains of limitless information which anyone and everyone can access. The new trend of technology has shaped the way we do and manage our businesses. Today, advancements in Internet technology have given rise to the proliferation of e-commerce websites. This, in turn, has made the activities and lifestyles of marketers/vendors, retailers and consumers (collectively regarded as users in this paper) easier, as it provides convenient platforms to sell/order items through the internet. Unfortunately, these desirable benefits are not without drawbacks, as these platforms require that the users spend a lot of time and effort searching for the best product deals, product updates and offers on e-commerce websites. Furthermore, they need to filter and compare search results by themselves, which takes a lot of time, and there are chances of ambiguous results. In this paper, we applied web crawling and scraping methods on an e-commerce website to obtain HTML data for identifying product updates based on the current time. These HTML data are preprocessed to extract details of the products such as name, price, post date and time, etc. to serve as useful information for users. Keywords NATURAL LANGUAGE PREPROCESSING (NLP), E-COMMERCE, E-RETAIL, HTML, DATA, Web, Web scraping Full Text: https://aircconline.com/ijnlc/V10N5/10521ijnlc01.pdf Volume URL: http://airccse.org/journal/ijnlc/vol10.html
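
The RAG-Fusion abstract (slide 2) describes fusing the documents retrieved for several generated queries using reciprocal rank fusion. The fusion step on its own can be sketched as below. This is a minimal illustration, not the paper's implementation: the function name, the document identifiers, and the smoothing constant k = 60 (a commonly used default for RRF) are all assumptions, since the abstract gives no parameters.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document receives 1 / (k + rank) from every list it appears in,
    with rank starting at 1. k = 60 is an assumed default, not a value
    taken from the abstract.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical retrieval results for three generated queries.
rankings = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a", "doc_d"],
    ["doc_b", "doc_c", "doc_a"],
]
fused = reciprocal_rank_fusion(rankings)
fused_order = [doc_id for doc_id, _ in fused]
print(fused_order)
```

In this toy input, doc_b is ranked first by two of the three queries and therefore ends up at the top of the fused list, while doc_d, retrieved by only one query, ends up last.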
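
The Mongolian corpus abstract (slide 4) estimates a Heaps' function from a sample corpus and then tracks how the Type-Token Ratio (TTR) changes as the token count grows. A rough sketch of that procedure follows, assuming the usual power-law form V(N) = K * N^beta for Heaps' function. All numbers here are synthetic and chosen only for illustration; they are not the paper's data or its fitted parameters.

```python
import math


def fit_heaps(token_counts, type_counts):
    """Least-squares fit of log V = log K + beta * log N."""
    xs = [math.log(n) for n in token_counts]
    ys = [math.log(v) for v in type_counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    beta = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    k = math.exp(mean_y - beta * mean_x)
    return k, beta


def predicted_ttr(n_tokens, k, beta):
    """TTR = types / tokens, with types predicted by the fitted Heaps' function."""
    return k * n_tokens ** beta / n_tokens


# Synthetic (tokens, types) observations generated from K = 30, beta = 0.6.
token_counts = [100_000, 300_000, 900_000]
type_counts = [30 * n ** 0.6 for n in token_counts]

k_hat, beta_hat = fit_heaps(token_counts, type_counts)

# Change in predicted TTR for each additional million tokens.
for millions in (1, 10, 40):
    n = millions * 1_000_000
    delta = predicted_ttr(n, k_hat, beta_hat) - predicted_ttr(n + 1_000_000, k_hat, beta_hat)
    print(f"{millions:>3}M tokens: TTR drop over next 1M = {delta:.6f}")
```

The per-million drop in TTR shrinks as the corpus grows, which is the kind of flattening the abstract uses as its stopping criterion.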