ppt for IASNLP.pptx

•Download as PPTX, PDF•

0 likes•13 views

This document summarizes experiments on bilingual semantic text similarity using various word and sentence embedding approaches. It fine-tunes the COMET machine translation evaluation model on 100,000 parallel English-Hindi sentences and compares the results to other sentence encoders like LaBSE, LASER and SBERT using Pearson correlation. LaBSE and SBERT are found to give the best results. Future work could include fine-tuning models on larger data to potentially achieve better performance.

Bilingual Semantic Text Similarity
MT Evaluation
(Bilingual Evaluation)
Submitted By
Binod Kumar Mishra, Pursotam Kumar, Vaibhavi Kailash Kothadi
Mentor by
Ms. Ananya Mukherjee, Dr. Manish Shrivastava
11th Advanced Summer School on NLP
(IASNLP-2022)
IIIT Hyderabad

Contents
• Introduction
• Problem Statement
• Data
• Experiments
• Result & Analysis
• Conclusion & Future Work

1. Introduction
• Bilingual Semantic Text similarity [1]
• Measures the equivalence of meanings between two languages.
• With the bilingual distributed representation of words we try to capture semantic meaning.
• It can be divided into three phases, Preprocessing, Similarity Measure, and Score estimation
• It can be used in various NLP applications like Question-Answer, Machine Translation, and Information Extraction.
• Word level embedding and Sentence level embedding
• Word Level Embedding: Word2Vec, Glove, Fasttext, Elmo, BERT [2-8]
• Sentence Level Embedding: SBERT, LaBSE, LASER [9-11]
• Fine Tune COMET machine translation evaluation model[12-13]
• Pearson correlation using Normalized score and other sentence embedding approaches

2. Problem Statement
• Word level embedding vs Sentence level embedding
• Pre-trained COMET model vs Fine-tune COMET model
• Compare various sentence encoders using Pearson Correlation

3. Data
• We have collected dataset from [14] containing 1,00,000 parallel English and translated Hindi sentences to
fine-tune COMET machine translation model .

4. Experiment
Fig 1: Fine-tune COMET machine translation evaluation model

6. Conclusion & Future Work
In this report, we focus on three important things
• Embedding approach
• Fine tune COMET machine translation evaluation model
• Compare Correlation Pearson value between LaBSE, LASER, SBERT, and COMET. And concluded that LaBSE and SBERT give the
best result.
• The future work is that if we Fine-tune models with huge data then it may be possible that we get a better
result.

References
1. Shajalal, M. and M. Aono, Semantic textual similarity between sentences using bilingual word semantics. Progress in Artificial Intelligence, 2019. 8(2): p. 263-
272.
2. Mikolov, T., et al., Efficient estimation of word representations in vector space. 2013.
3. Mikolov, T., et al. Distributed representations of words and phrases and their compositionality. in Advances in neural information processing systems. 2013.
4. Pennington, J., R. Socher, and C.D. Manning. Glove: Global vectors for word representation. in Proceedings of the 2014 conference on empirical methods in
natural language processing (EMNLP). 2014.
5. Joulin, A., et al., Fasttext. zip: Compressing text classification models. 2016.
6. Bojanowski, P., et al., Enriching word vectors with subword information. 2017. 5: p. 135-146.
7. Ulčar, M. and M.J.a.p.a. Robnik-Šikonja, High quality ELMo embeddings for seven less-resourced languages. 2019.
8. Devlin, J., et al., Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
9. Reimers, N. and I.J.a.p.a. Gurevych, Sentence-bert: Sentence embeddings using siamese bert-networks. 2019.
10. Feng, F., et al., Language-agnostic bert sentence embedding. 2020.
11. Artetxe, M. and H.J.T.o.t.A.f.C.L. Schwenk, Massively multilingual sentence embeddings for zero-shot cross-lingual transfer and beyond. 2019. 7: p. 597-610.
12. Rei, R., et al., COMET: A neural framework for MT evaluation. 2020.
13. Agirre, E., et al. SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability. 2015. Denver, Colorado: Association for
Computational Linguistics.
14. Kunchukuttan, A., et al., Ai4bharat-indicnlp corpus: Monolingual corpora and word embeddings for indic languages. 2020.

Opinion and attempts to develop an automated system to determine people's viewpoints towards various units such as events, topics, products, services, organizations, individuals, and issues. Opinion analysis from the natural text can be regarded as a text and sequence classification problem which poses high feature space due to the involvement of dynamic information that needs to be addressed precisely. This paper introduces effective modelling of human opinion analysis from social media data subjected to complex and dynamic content. Firstly, a customized preprocessing operation based on natural language processing mechanisms as an effective data treatment process towards building quality-aware input data. On the other hand, a suitable deep learning technique, bidirectional long short term-memory (Bi-LSTM), is implemented for the opinion classification, followed by a data modelling process where truncating and padding is performed manually to achieve better data generalization in the training phase. The design and development of the model are carried on the MATLAB tool. The performance analysis has shown that the proposed system offers a significant advantage in terms of classification accuracy and less training time due to a reduction in the feature space by the data treatment operation.

2015-SemEval2015_poster

hpcosta

This document describes an SVM approach called MiniExperts for measuring semantic textual similarity that was submitted to SemEval 2015 Task 2. It extracts various features from text such as part-of-speech tags, lemmatization, named entities, and multiword expressions. It also calculates conceptual, semantic, and distributional similarity measures between sentences. When applied to the SemEval task, it achieved a Pearson correlation of 0.7216 for English and 0.5158 for Spanish, ranking 33rd and 9th respectively out of the submitted systems. Future work will focus on extracting conceptual descriptions from BabelNet to further improve matching of conceptual terms between sentences.

THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES

kevig

Distributed language representation has become the most widely used technique for language representation in various natural language processing tasks. Most of the natural language processing models that are based on deep learning techniques use already pre-trained distributed word representations, commonly called word embeddings. Determining the most qualitative word embeddings is of crucial importance for such models. However, selecting the appropriate word embeddings is a perplexing task since the projected embedding space is not intuitive to humans. In this paper, we explore different approaches for creating distributed word representations. We perform an intrinsic evaluation of several state-of-the-art word embedding methods. Their performance on capturing word similarities is analysed with existing benchmark datasets for word pairs similarities. The research in this paper conducts a correlation analysis between ground truth word similarities and similarities obtained by different word embedding methods.

THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES

kevig

Distributed language representation has become the most widely used technique for language representation in various natural language processing tasks. Most of the natural language processing models that are based on deep learning techniques use already pre-trained distributed word representations, commonly called word embeddings. Determining the most qualitative word embeddings is of crucial importance for such models. However, selecting the appropriate word embeddings is a perplexing task since the projected embedding space is not intuitive to humans.In this paper, we explore different approaches for creating distributed word representations. We perform an intrinsic evaluation of several state-of-the-art word embedding methods. Their performance on capturing word similarities is analysed with existing benchmark datasets for word pairs similarities. The research in this paper conducts a correlation analysis between ground truth word similarities and similarities obtained by different word embedding methods.

Gender Classification of Blog Authors: With Feature Engineering and Deep Lear...

Saurav Jha

In this paper, we present two approaches to automatically classify the gender of blog authors: the first is a manual feature extraction based system incorporating two novel feature classes: variable length character sequence patterns and thirteen new word classes, along with an added class of surface features while the second is a first-ever application of a memory variant of Recurrent Neural Networks, i.e. Bidirectional Long Short Term Memory Networks (BLSTMs) on this task. We use two blog data sets to report our results: the first is a well-explored one used by the previous state-of-the-art models while the other is a 20 times larger corpus. For the first system, we use a voting of machine learning classifiers to obtain an improved accuracy with respect to the previous feature mining systems on the former data set. Using our second approach, we show that the accuracy obtained using such deep LSTMs is comparable to the current state-of-the-art deep learning system for the task of gender classification. Finally, we carry out a comparative study of performance of both the systems on the two data sets.

A-STUDY-ON-SENTIMENT-POLARITY.pdf

SUDESHNASANI1

A hybrid composite features based sentence level sentiment analyzer

IAESIJAI

Current lexica and machine learning based sentiment analysis approaches still suffer from a two-fold limitation. First, manual lexicon construction and machine training is time consuming and error-prone. Second, the prediction’s accuracy entails sentences and their corresponding training text should fall under the same domain. In this article, we experimentally evaluate four sentiment classifiers, namely support vector machines (SVMs), Naive Bayes (NB), logistic regression (LR) and random forest (RF). We quantify the quality of each of these models using three real-world datasets that comprise 50,000 movie reviews, 10,662 sentences, and 300 generic movie reviews. Specifically, we study the impact of a variety of natural language processing (NLP) pipelines on the quality of the predicted sentiment orientations. Additionally, we measure the impact of incorporating lexical semantic knowledge captured by WordNet on expanding original words in sentences. Findings demonstrate that the utilizing different NLP pipelines and semantic relationships impacts the quality of the sentiment analyzers. In particular, results indicate that coupling lemmatization and knowledge-based n-gram features proved to produce higher accuracy results. With this coupling, the accuracy of the SVM classifier has improved to 90.43%, while it was 86.83%, 90.11%, 86.20%, respectively using the three other classifiers.

Sentiment Analysis of Bengali text using Gated Recurrent Neural Network

A. Hasib Uddin

Sentiment analysis is a fundamental part of Natural Language Processing. There are numerous works on this topic in English and other languages. However, it is still a comparatively new practice in Bangla. The absence of a suitable Bangla corpus is the primary obstacle for sentiment analysis tasks in Bangla. Nonetheless, Long Short-term Memory (LSTM) is a common technique for resolving sentiments from a dataset containing a large amount of text data. However, Gated Recurrent Unit (GRU) is very efficient for datasets with a low amount of text data. In this manuscript, we present a 5- layered GRU neural network model, each layer comprising of 48 neurons, applied the model on an existing Bangla corpus. We implemented the 10-folds cross-validation approach and repeated the same processes three times. Each time, we considered the averages of the ten validation accuracy and losses and compared the results with the state-of-the-art published outcome (77.85% highest accuracy) for Bi-directional LSTM (BLSTM). The highest accuracies for our model was 78.41%, while the lowest accuracy was 76.34%.

Distributed language representation has become the most widely used technique for language representation in various natural language processing tasks. Most of the natural language processing models that are based on deep learning techniques use already pre-trained distributed word representations, commonly called word embeddings. Determining the most qualitative word embeddings is of crucial importance for such models. However, selecting the appropriate word embeddings is a perplexing task since the projected embedding space is not intuitive to humans.In this paper, we explore different approaches for creating distributed word representations. We perform an intrinsic evaluation of several state-of-the-art word embedding methods. Their performance on capturing word similarities is analysed with existing benchmark datasets for word pairs similarities. The research in this paper conducts a correlation analysis between ground truth word similarities and similarities obtained by different word embedding methods.

Gender Classification of Blog Authors: With Feature Engineering and Deep Lear...

Saurav Jha

A-STUDY-ON-SENTIMENT-POLARITY.pdf

SUDESHNASANI1

A hybrid composite features based sentence level sentiment analyzer

IAESIJAI

Sentiment Analysis of Bengali text using Gated Recurrent Neural Network

A. Hasib Uddin

Classification of Code-Mixed Bilingual Phonetic Text Using Sentiment Analysis

SHAILENDRA KUMAR SINGH

The rapid growth of internet facilities has increased the comments, posts, blogs, feedback, etc., on a large scale on social networking sites. These social media data are available in an unstructured form, which includes images, text, and videos. The processing of these data is difficult, but some sentiment analysis, information retrieval, and recommender systems are used to process these unstructured data. To extract the opinion and sentiment of internet users from their written social media text, a sentiment analysis system is required to develop, which can work on both monolingual and bilingual phonetic text. Therefore, a sentiment analysis (SA) system is developed, which performs well on different domain datasets. The system performance is tested on four different datasets and achieved better accuracy of 3% on social media datasets, 1.5% on movie reviews, 1.35% on Amazon product reviews, and 4.56% on large Amazon product reviews than the state-of-art techniques. Also, the stemmer (StemVerb) for verbs of the English language is proposed, which improves the SA system’s performance. https://www.researchgate.net/publication/350987101_Classification_of_Code-Mixed_Bilingual_Phonetic_Text_Using_Sentiment_Analysis

Sentiment analysis on Bangla conversation using machine learning approach

IJECEIAES

Nowadays, online communication is more convenient and popular than faceto-face conversation. Therefore, people prefer online communication over face-to-face meetings. Enormous people use online chatting systems to speak with their loved ones at any given time throughout the world. People create massive quantities of conversation every second because of their online engagement. People's feelings during the conversation period can be gleaned as useful information from these conversations. Text analysis and conclusion of any material as summarization can be done using sentiment analysis by natural language processing. The use of communication for customer service portals in various e-commerce platforms and crime investigations based on digital evidence is increasing the need for sentiment analysis of a conversation. Other languages, such as English, have welldeveloped libraries and resources for natural language processing, yet there are few studies conducted on Bangla. It is more challenging to extract sentiments from Bangla conversational data due to the language's grammatical complexity. As a result, it opens vast study opportunities. So, support vector machine, multinomial naïve Bayes, k-nearest neighbors, logistic regression, decision tree, and random forest was used. From the dataset, extracted information was labeled as positive and negative.

NBLex: emotion prediction in Kannada-English code-switchtext using naïve baye...

IJECEIAES

This document summarizes a research paper that proposes a lexicon-based method called NBLex to predict emotions in Kannada-English code-switch text. The paper describes creating a Kannada-English code-switch corpus by collecting comments from YouTube and annotating them. A Naive Bayes classifier is used to build the NBLex lexicon and predict emotions. Evaluation shows the NBLex method achieves 87.9% accuracy for emotion prediction, outperforming Naive Bayes and BiLSTM baselines. The paper concludes that a simple additive lexicon approach can be an alternative to machine learning for emotion prediction in code-switch text.

IRJET- Short-Text Semantic Similarity using Glove Word Embedding

IRJET Journal

The document describes a study that uses GloVe word embeddings to measure semantic similarity between short texts. GloVe is an unsupervised learning algorithm for obtaining vector representations of words. The study trains GloVe word embeddings on a large corpus, then uses the embeddings to encode short texts and calculate their semantic similarity, comparing the accuracy to methods that use Word2Vec embeddings. It aims to show that GloVe embeddings may provide better performance for short text semantic similarity tasks.

Text Mining for Lexicography

Leiden University

Suzan Verberne gave a workshop on using text mining for lexicography. She discussed using word embeddings to help discover and select new lemmas for dictionaries. Word2Dict is a lexicographic tool that uses word embeddings to present words semantically related to the lemma being described. Word embeddings learn dense vector representations of words by predicting words in context using neural networks, improving on the traditional sparse vector space model. Word embeddings can be trained using the Word2Vec algorithm and analyzed using the Gensim Python package to gain linguistic insights and improve natural language processing applications.

Cross-domain sentiment analysis of the natural Romanian language

ICDEcCnferenece

This document presents research on cross-domain sentiment analysis of the Romanian language. The researcher compiled a multi-domain Romanian corpus from reviews and evaluated popular sentiment analysis models including decision trees, logistic regression, support vector machines, naive Bayes, recurrent neural networks, and BERT. BERT achieved the best performance with 93% accuracy, outperforming models that used translated text. The models demonstrated robustness when trained and tested on a Romanian BERT model. The research aims to advance sentiment analysis for the under-resourced Romanian language.

Decision-Making Model for Student Assessment by Unifying Numerical and Lingui...

IJECEIAES

Learning assessment deals with the process of making a decision on the quality or performance of student achievement in a number of competency standards. In the process, teacher’s preferences are provided through both test and non-test, generally in a numeric value, from which the final results are then converted into letters or linguistic value. In the proposed model, linguistic variables are exploited as a form of teacher’s preferences in nontest techniques. Consequently, the assessment data set will consist of numerical and linguistic information, so it requires a method to unify them to obtain the final value. A model that uses the 2-tuple linguistic approach and based on matrix operations is proposed to solve the problem. This study proposed a new procedure that consists of four stages: preprocessing, transformation, aggregation and exploitation. The final result is presented in 2-tuple linguistic representation and its equivalent number, accompanied by a description of the achievement of each competency. The α value of 2-tuple linguistic in the final result and in the description of each competency becomes meaningful information that can be interpreted as a comparative ability one student has related to other students, and shows how much potential is achieved to reach higher ranks. The proposed model contributes to enrich the learning assessment techniques, since the exploitation of linguistic variable as representation preferences provides flexible space for teachers in their assessments. Moreover, using the result with respect to students’ levels of each competency, students’ mastery of each attribute can be diagnosed and their progress of learning can be estimated.

A Review Of Text Mining Techniques And Applications

Lisa Graves

This document provides a review of various text mining techniques and applications. It discusses techniques used for text classification and summarization, including Naive Bayes classification, backpropagation neural networks, keyword matching, and information extraction. It also covers applications of text mining in areas like sentiment analysis of social media posts and hotel reviews. Finally, it discusses the need for organizational text mining to extract useful information and insights from large amounts of unstructured text data.

Using Decision Tree for Automatic Identification of Bengali Noun-Noun Compounds

idescitation

This paper presents a supervised machine learning approach that uses a decision tree learning algorithm for recognition of Bengali noun-noun compounds as multiword expression (M WE) from Bengali corpus. Our proposed approach to MWE recognition has two steps: (1) extraction of candidate multi-word expressions using chunk information and various heuristic rules and (2) training the machine learning algorithm to recognize a candidate multi-word expression as Multi-word expression or not. A variety of association measures have been used as features for identifying MWEs. The proposed system is tested on a Bengali corpus for identifying noun-noun compound MWEs from the corpus.

Continuous bag of words cbow word2vec word embedding work .pdf

devangmittal4

Continuous bag of words (cbow) word2vec word embedding work is that it tends to predict the probability of a word given a context. A context may be a single word or a group of words. But for simplicity, I will take a single context word and try to predict a single target word. The purpose of this question is to be able to create a word embedding for the given data set. data set text: In linguistics word embeddings were discussed in the research area of distributional semantics. It aims to quantify and categorize semantic similarities between linguistic items based on their distributional properties in large samples of language data. The underlying idea that "a word is characterized by the company it keeps" was popularized by Firth. The technique of representing words as vectors has roots in the 1960s with the development of the vector space model for information retrieval. Reducing the number of dimensions using singular value decomposition then led to the introduction of latent semantic analysis in the late 1980s.In 2000 Bengio et al. provided in a series of papers the "Neural probabilistic language models" to reduce the high dimensionality of words representations in contexts by "learning a distributed representation for words". (Bengio et al, 2003). Word embeddings come in two different styles, one in which words are expressed as vectors of co-occurring words, and another in which words are expressed as vectors of linguistic contexts in which the words occur; these different styles are studied in (Lavelli et al, 2004). Roweis and Saul published in Science how to use "locally linear embedding" (LLE) to discover representations of high dimensional data structures. The area developed gradually and really took off after 2010, partly because important advances had been made since then on the quality of vectors and the training speed of the model. There are many branches and many research groups working on word embeddings. In 2013, a team at Google led by Tomas Mikolov created word2vec, a word embedding toolkit which can train vector space models faster than the previous approaches. Most new word embedding techniques rely on a neural network architecture instead of more traditional n-gram models and unsupervised learning. Limitations One of the main limitations of word embeddings (word vector space models in general) is that possible meanings of a word are conflated into a single representation (a single vector in the semantic space). Sense embeddings are a solution to this problem: individual meanings of words are represented as distinct vectors in the space. For biological sequences: BioVectors Word embeddings for n-grams in biological sequences (e.g. DNA, RNA, and Proteins) for bioinformatics applications have been proposed by Asgari and Mofrad. Named bio-vectors (BioVec) to refer to biological sequences in general with protein-vectors (ProtVec) for proteins (amino-acid sequences) and gene-vectors (GeneVec) for gene sequences, this representa.

French machine reading for question answering

Ali Kabbadj

This paper proposes to unlock the main barrier to machine reading and comprehension French natural language texts. This open the way to machine to find to a question a precise answer buried in the mass of unstructured French texts. Or to create a universal French chatbot. Deep learning has produced extremely promising results for various tasks in natural language understanding particularly topic classification, sentiment analysis, question answering, and language translation. But to be effective Deep Learning methods need very large training da-tasets. Until now these technics cannot be actually used for French texts Question Answering (Q&A) applications since there was not a large Q&A training dataset. We produced a large (100 000+) French training Dataset for Q&A by translating and adapting the English SQuAD v1.1 Dataset, a GloVe French word and character embed-ding vectors from Wikipedia French Dump. We trained and evaluated of three different Q&A neural network ar-chitectures in French and carried out a French Q&A models with F1 score around 70%.

Journal of Computer Science Research | Vol.1, Iss.3

Bilingual Publishing Group

● A Foreword from the Editor-in-Chief https://doi.org/10.30564/jcsr.v1i3.1466 ● Discussion on Innovation of Seasoning under the Background of “Internet +” https://doi.org/10.30564/jcsr.v1i3.1264 ● Evaluating Word Similarity Measure of Embeddings Through Binary Classification https://doi.org/10.30564/jcsr.v1i3.1268 ● Performance Evaluation of Reactive Routing Protocols in MANETs in Association with TCP Newreno https://doi.org/10.30564/jcsr.v1i3.1441

Multi-modal NLP Systems in Healthcare

Jekaterina Novikova, PhD

- Linguistic and acoustic features are extracted from speech and language using both hand-crafted and automatic methods. Hand-crafted features include pauses and parts of speech while automatic features use models like BERT. - Models are developed for tasks like detecting cognitive impairment across languages and in low-resource settings. Semi-supervised models are created to handle limited labeled data. Models also aim to remove bias from non-clinical factors like age. - Quality assurance examines the effects of errors like those from automatic speech recognition systems. The impact of heterogeneous, multi-task datasets is also analyzed to improve model performance for detection tasks.

INFT2060 Applied Artificial Intelligence.docx

write4

This document provides an overview of natural language processing (NLP) and discusses some of its applications. It describes how NLP uses machine translation to translate between languages using techniques like bilingual and multilingual translation systems. It also discusses how question answering systems in NLP analyze natural language questions to generate answers and how text summarization automatically creates short summaries of longer documents to present the most relevant information concisely.

Machine Learning Techniques with Ontology for Subjective Answer Evaluation

ijnlc

Computerized Evaluation of English Essays is performed using Machine learning techniques like Latent Semantic Analysis (LSA), Generalized LSA, Bilingual Evaluation Understudy and Maximum Entropy. Ontology, a concept map of domain knowledge, can enhance the performance of these techniques. Use of Ontology makes the evaluation process holistic as presence of keywords, synonyms, the right word combination and coverage of concepts can be checked. In this paper, the above mentioned techniques are implemented both with and without Ontology and tested on common input data consisting of technical answers of Computer Science. Domain Ontology of Computer Graphics is designed and developed. The software used for implementation includes Java Programming Language and tools such as MATLAB, Protégé, etc. Ten questions from Computer Graphics with sixty answers for each question are used for testing. The results are analyzed and it is concluded that the results are more accurate with use of Ontology.

Eurocall 2014 titova

Moscow State University

This document summarizes a research project that studied the use of student response systems (SRS) to support mobile learning in university lectures. The study found that incorporating SRS-based formative assessments and collaborative activities into lectures improved student engagement and academic performance. Students reported that the SRS-supported approach prepared them well and helped them understand content. The researcher suggests further exploring new interactive mobile-enabled activities and assessment models to enhance learning beyond traditional boundaries.

IRJET -Survey on Named Entity Recognition using Syntactic Parsing for Hindi L...

IRJET Journal

This document summarizes research on named entity recognition for the Hindi language. It discusses various techniques that have been used for NER in Hindi, including rule-based approaches, machine learning approaches like hidden Markov models and conditional random fields, and hybrid approaches. The document also reviews several papers on Hindi NER systems that have used techniques like rule-based methods, list lookup, hidden Markov models, joint parsing with POS tagging, and distant supervision. Most studies found that machine learning approaches like hidden Markov models and conditional random fields produced relatively high accuracy, though performance could likely be improved further with larger annotated corpora.

Natural Language Processing Through Different Classes of Machine Learning

csandit

This document summarizes several papers on using different classes of machine learning for natural language processing tasks. It discusses supervised learning approaches for semantic orientation analysis and sentiment analysis. It also covers unsupervised learning approaches like Turney's work using semantic association to determine semantic orientation. Finally, it discusses semi-supervised learning and its ability to use both labeled and unlabeled data to help with NLP tasks on large, unprocessed datasets from the growing internet.

Evaluating sentiment analysis and word embedding techniques on Brexit

IAESIJAI

In this study, we investigate the effectiveness of pre-trained word embeddings for sentiment analysis on a real-world topic, namely Brexit. We compare the performance of several popular word embedding models such global vectors for word representation (GloVe), FastText, word to vec (word2vec), and embeddings from language models (ELMo) on a dataset of tweets related to Brexit and evaluate their ability to classify the sentiment of the tweets as positive, negative, or neutral. We find that pre-trained word embeddings provide useful features for sentiment analysis and can significantly improve the performance of machine learning models. We also discuss the challenges and limitations of applying these models to complex, real-world texts such as those related to Brexit.

Cross language information retrieval in indian

eSAT Publishing House

IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology

MARY JANE WILSON, A “BOA MÃE” .

Colégio Santa Teresinha

LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP

RAHUL

This Dissertation explores the particular circumstances of Mirzapur, a region located in the core of India. Mirzapur, with its varied terrains and abundant biodiversity, offers an optimal environment for investigating the changes in vegetation cover dynamics. Our study utilizes advanced technologies such as GIS (Geographic Information Systems) and Remote sensing to analyze the transformations that have taken place over the course of a decade. The complex relationship between human activities and the environment has been the focus of extensive research and worry. As the global community grapples with swift urbanization, population expansion, and economic progress, the effects on natural ecosystems are becoming more evident. A crucial element of this impact is the alteration of vegetation cover, which plays a significant role in maintaining the ecological equilibrium of our planet.Land serves as the foundation for all human activities and provides the necessary materials for these activities. As the most crucial natural resource, its utilization by humans results in different 'Land uses,' which are determined by both human activities and the physical characteristics of the land. The utilization of land is impacted by human needs and environmental factors. In countries like India, rapid population growth and the emphasis on extensive resource exploitation can lead to significant land degradation, adversely affecting the region's land cover. Therefore, human intervention has significantly influenced land use patterns over many centuries, evolving its structure over time and space. In the present era, these changes have accelerated due to factors such as agriculture and urbanization. Information regarding land use and cover is essential for various planning and management tasks related to the Earth's surface, providing crucial environmental data for scientific, resource management, policy purposes, and diverse human activities. Accurate understanding of land use and cover is imperative for the development planning of any area. Consequently, a wide range of professionals, including earth system scientists, land and water managers, and urban planners, are interested in obtaining data on land use and cover changes, conversion trends, and other related patterns. The spatial dimensions of land use and cover support policymakers and scientists in making well-informed decisions, as alterations in these patterns indicate shifts in economic and social conditions. Monitoring such changes with the help of Advanced technologies like Remote Sensing and Geographic Information Systems is crucial for coordinated efforts across different administrative levels. Advanced technologies like Remote Sensing and Geographic Information Systems 9 Changes in vegetation cover refer to variations in the distribution, composition, and overall structure of plant communities across different temporal and spatial scales. These changes can occur natural.

Similar to ppt for IASNLP.pptx

Classification of Code-Mixed Bilingual Phonetic Text Using Sentiment Analysis

SHAILENDRA KUMAR SINGH

Sentiment analysis on Bangla conversation using machine learning approach

IJECEIAES

NBLex: emotion prediction in Kannada-English code-switchtext using naïve baye...

IJECEIAES

IRJET- Short-Text Semantic Similarity using Glove Word Embedding

IRJET Journal

Text Mining for Lexicography

Leiden University

Cross-domain sentiment analysis of the natural Romanian language

ICDEcCnferenece

Decision-Making Model for Student Assessment by Unifying Numerical and Lingui...

IJECEIAES

A Review Of Text Mining Techniques And Applications

Lisa Graves

Using Decision Tree for Automatic Identification of Bengali Noun-Noun Compounds

idescitation

Continuous bag of words cbow word2vec word embedding work .pdf

devangmittal4

French machine reading for question answering

Ali Kabbadj

Journal of Computer Science Research | Vol.1, Iss.3

Bilingual Publishing Group

Multi-modal NLP Systems in Healthcare

Jekaterina Novikova, PhD

INFT2060 Applied Artificial Intelligence.docx

write4

Machine Learning Techniques with Ontology for Subjective Answer Evaluation

ijnlc

Eurocall 2014 titova

Moscow State University

IRJET -Survey on Named Entity Recognition using Syntactic Parsing for Hindi L...

IRJET Journal

Natural Language Processing Through Different Classes of Machine Learning

csandit

Evaluating sentiment analysis and word embedding techniques on Brexit

IAESIJAI

Cross language information retrieval in indian

eSAT Publishing House

Similar to ppt for IASNLP.pptx (20)

Classification of Code-Mixed Bilingual Phonetic Text Using Sentiment Analysis

Sentiment analysis on Bangla conversation using machine learning approach

NBLex: emotion prediction in Kannada-English code-switchtext using naïve baye...

IRJET- Short-Text Semantic Similarity using Glove Word Embedding

Text Mining for Lexicography

Cross-domain sentiment analysis of the natural Romanian language

Decision-Making Model for Student Assessment by Unifying Numerical and Lingui...

A Review Of Text Mining Techniques And Applications

Using Decision Tree for Automatic Identification of Bengali Noun-Noun Compounds

Continuous bag of words cbow word2vec word embedding work .pdf

French machine reading for question answering

Journal of Computer Science Research | Vol.1, Iss.3

Multi-modal NLP Systems in Healthcare

INFT2060 Applied Artificial Intelligence.docx

Machine Learning Techniques with Ontology for Subjective Answer Evaluation

Eurocall 2014 titova

IRJET -Survey on Named Entity Recognition using Syntactic Parsing for Hindi L...

Natural Language Processing Through Different Classes of Machine Learning

Evaluating sentiment analysis and word embedding techniques on Brexit

Cross language information retrieval in indian

Recently uploaded

MARY JANE WILSON, A “BOA MÃE” .

Colégio Santa Teresinha

LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP

RAHUL

Walmart Business+ and Spark Good for Nonprofits.pdf

TechSoup

"Learn about all the ways Walmart supports nonprofit organizations. You will hear from Liz Willett, the Head of Nonprofits, and hear about what Walmart is doing to help nonprofits, including Walmart Business and Spark Good. Walmart Business+ is a new offer for nonprofits that offers discounts and also streamlines nonprofits order and expense tracking, saving time and money. The webinar may also give some examples on how nonprofits can best leverage Walmart Business+. The event will cover the following:: Walmart Business + (https://business.walmart.com/plus) is a new shopping experience for nonprofits, schools, and local business customers that connects an exclusive online shopping experience to stores. Benefits include free delivery and shipping, a 'Spend Analytics” feature, special discounts, deals and tax-exempt shopping. Special TechSoup offer for a free 180 days membership, and up to $150 in discounts on eligible orders. Spark Good (walmart.com/sparkgood) is a charitable platform that enables nonprofits to receive donations directly from customers and associates. Answers about how you can do more with Walmart!"

RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3

IreneSebastianRueco1

C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx

mulvey2

বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf

eBook.com.bd (প্রয়োজনীয় বাংলা বই)

বাংলাদেশের অর্থনৈতিক সমীক্ষা ২০২৪ [Bangladesh Economic Review 2024 Bangla.pdf] কম্পিউটার , ট্যাব ও স্মার্ট ফোন ভার্সন সহ সম্পূর্ণ বাংলা ই-বুক বা pdf বই " সুচিপত্র ...বুকমার্ক মেনু 🔖 ও হাইপার লিংক মেনু 📝👆 যুক্ত .. আমাদের সবার জন্য খুব খুব গুরুত্বপূর্ণ একটি বই ..বিসিএস, ব্যাংক, ইউনিভার্সিটি ভর্তি ও যে কোন প্রতিযোগিতা মূলক পরীক্ষার জন্য এর খুব ইম্পরট্যান্ট একটি বিষয় ...তাছাড়া বাংলাদেশের সাম্প্রতিক যে কোন ডাটা বা তথ্য এই বইতে পাবেন ... তাই একজন নাগরিক হিসাবে এই তথ্য গুলো আপনার জানা প্রয়োজন ...। বিসিএস ও ব্যাংক এর লিখিত পরীক্ষা ...+এছাড়া মাধ্যমিক ও উচ্চমাধ্যমিকের স্টুডেন্টদের জন্য অনেক কাজে আসবে ...

How to Add Chatter in the odoo 17 ERP Module

Celine George

Advanced Java[Extra Concepts, Not Difficult].docx

adhitya5119

Digital Artifact 1 - 10VCD Environments Unit

chanes7

Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion

TechSoup

How to Manage Your Lost Opportunities in Odoo 17 CRM

Celine George

writing about opinions about Australia the movie

Nicholas Montgomery

Digital Artefact 1 - Tiny Home Environmental Design

amberjdewit93

PIMS Job Advertisement 2024.pdf Islamabad

AyyanKhan40

How to Setup Warehouse & Location in Odoo 17 Inventory

Celine George

ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...

PECB

Denis is a dynamic and results-driven Chief Information Officer (CIO) with a distinguished career spanning information systems analysis and technical project management. With a proven track record of spearheading the design and delivery of cutting-edge Information Management solutions, he has consistently elevated business operations, streamlined reporting functions, and maximized process efficiency. Certified as an ISO/IEC 27001: Information Security Management Systems (ISMS) Lead Implementer, Data Protection Officer, and Cyber Risks Analyst, Denis brings a heightened focus on data security, privacy, and cyber resilience to every endeavor. His expertise extends across a diverse spectrum of reporting, database, and web development applications, underpinned by an exceptional grasp of data storage and virtualization technologies. His proficiency in application testing, database administration, and data cleansing ensures seamless execution of complex projects. What sets Denis apart is his comprehensive understanding of Business and Systems Analysis technologies, honed through involvement in all phases of the Software Development Lifecycle (SDLC). From meticulous requirements gathering to precise analysis, innovative design, rigorous development, thorough testing, and successful implementation, he has consistently delivered exceptional results. Throughout his career, he has taken on multifaceted roles, from leading technical project management teams to owning solutions that drive operational excellence. His conscientious and proactive approach is unwavering, whether he is working independently or collaboratively within a team. His ability to connect with colleagues on a personal level underscores his commitment to fostering a harmonious and productive workplace environment. Date: May 29, 2024 Tags: Information Security, ISO/IEC 27001, ISO/IEC 42001, Artificial Intelligence, GDPR ------------------------------------------------------------------------------- Find out more about ISO training and certification services Training: ISO/IEC 27001 Information Security Management System - EN | PECB ISO/IEC 42001 Artificial Intelligence Management System - EN | PECB General Data Protection Regulation (GDPR) - Training Courses - EN | PECB Webinars: https://pecb.com/webinars Article: https://pecb.com/article ------------------------------------------------------------------------------- For more information about PECB: Website: https://pecb.com/ LinkedIn: https://www.linkedin.com/company/pecb/ Facebook: https://www.facebook.com/PECBInternational/ Slideshare: http://www.slideshare.net/PECBCERTIFICATION

How to Fix the Import Error in the Odoo 17

Celine George

Film vocab for eal 3 students: Australia the movie

Nicholas Montgomery

ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf

Priyankaranawat4

BBR 2024 Summer Sessions Interview Training

Katrina Pritchard

Recently uploaded (20)

MARY JANE WILSON, A “BOA MÃE” .

LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP

Walmart Business+ and Spark Good for Nonprofits.pdf

RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3

C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx

বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf

How to Add Chatter in the odoo 17 ERP Module

Advanced Java[Extra Concepts, Not Difficult].docx

Digital Artifact 1 - 10VCD Environments Unit

Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion

How to Manage Your Lost Opportunities in Odoo 17 CRM

writing about opinions about Australia the movie

Digital Artefact 1 - Tiny Home Environmental Design

PIMS Job Advertisement 2024.pdf Islamabad

How to Setup Warehouse & Location in Odoo 17 Inventory

ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...

How to Fix the Import Error in the Odoo 17

Film vocab for eal 3 students: Australia the movie

ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf

BBR 2024 Summer Sessions Interview Training

ppt for IASNLP.pptx

1. Bilingual Semantic Text Similarity MT Evaluation (Bilingual Evaluation) Submitted By Binod Kumar Mishra, Pursotam Kumar, Vaibhavi Kailash Kothadi Mentor by Ms. Ananya Mukherjee, Dr. Manish Shrivastava 11th Advanced Summer School on NLP (IASNLP-2022) IIIT Hyderabad

2. Contents • Introduction • Problem Statement • Data • Experiments • Result & Analysis • Conclusion & Future Work

3. 1. Introduction • Bilingual Semantic Text similarity [1] • Measures the equivalence of meanings between two languages. • With the bilingual distributed representation of words we try to capture semantic meaning. • It can be divided into three phases, Preprocessing, Similarity Measure, and Score estimation • It can be used in various NLP applications like Question-Answer, Machine Translation, and Information Extraction. • Word level embedding and Sentence level embedding • Word Level Embedding: Word2Vec, Glove, Fasttext, Elmo, BERT [2-8] • Sentence Level Embedding: SBERT, LaBSE, LASER [9-11] • Fine Tune COMET machine translation evaluation model[12-13] • Pearson correlation using Normalized score and other sentence embedding approaches

4. 2. Problem Statement • Word level embedding vs Sentence level embedding • Pre-trained COMET model vs Fine-tune COMET model • Compare various sentence encoders using Pearson Correlation

5. 3. Data • We have collected dataset from [14] containing 1,00,000 parallel English and translated Hindi sentences to fine-tune COMET machine translation model .

6. 4. Experiment Fig 1: Fine-tune COMET machine translation evaluation model

7. 5. Result & Analysis

8. 5. Result & Analysis (Cont.)

9. 5. Result & Analysis (Cont.)

10. 6. Conclusion & Future Work In this report, we focus on three important things • Embedding approach • Fine tune COMET machine translation evaluation model • Compare Correlation Pearson value between LaBSE, LASER, SBERT, and COMET. And concluded that LaBSE and SBERT give the best result. • The future work is that if we Fine-tune models with huge data then it may be possible that we get a better result.

11. References 1. Shajalal, M. and M. Aono, Semantic textual similarity between sentences using bilingual word semantics. Progress in Artificial Intelligence, 2019. 8(2): p. 263- 272. 2. Mikolov, T., et al., Efficient estimation of word representations in vector space. 2013. 3. Mikolov, T., et al. Distributed representations of words and phrases and their compositionality. in Advances in neural information processing systems. 2013. 4. Pennington, J., R. Socher, and C.D. Manning. Glove: Global vectors for word representation. in Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 2014. 5. Joulin, A., et al., Fasttext. zip: Compressing text classification models. 2016. 6. Bojanowski, P., et al., Enriching word vectors with subword information. 2017. 5: p. 135-146. 7. Ulčar, M. and M.J.a.p.a. Robnik-Šikonja, High quality ELMo embeddings for seven less-resourced languages. 2019. 8. Devlin, J., et al., Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018. 9. Reimers, N. and I.J.a.p.a. Gurevych, Sentence-bert: Sentence embeddings using siamese bert-networks. 2019. 10. Feng, F., et al., Language-agnostic bert sentence embedding. 2020. 11. Artetxe, M. and H.J.T.o.t.A.f.C.L. Schwenk, Massively multilingual sentence embeddings for zero-shot cross-lingual transfer and beyond. 2019. 7: p. 597-610. 12. Rei, R., et al., COMET: A neural framework for MT evaluation. 2020. 13. Agirre, E., et al. SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability. 2015. Denver, Colorado: Association for Computational Linguistics. 14. Kunchukuttan, A., et al., Ai4bharat-indicnlp corpus: Monolingual corpora and word embeddings for indic languages. 2020.

12. THANK YOU

ppt for IASNLP.pptx

Recommended

Recommended

More Related Content

Similar to ppt for IASNLP.pptx

Similar to ppt for IASNLP.pptx (20)

Recently uploaded

Recently uploaded (20)

ppt for IASNLP.pptx