In sentiment analysis, the polarities of the opinions expressed on an object/feature are determined to assess the sentiment of a sentence or document whether it is positive/negative/neutral. Naturally, the object/feature is a noun representation which refers to a product or a component of a product, let’s say, the "lens" in a camera and opinions emanating on it are captured in adjectives, verbs, adverbs and noun words themselves. Apart from such words, other meta-information and diverse effective features are also going to play an important role in influencing the sentiment polarity and contribute significantly to the performance of the system. In this paper, some of the associated information/meta-data are explored and investigated in the sentiment text. Based on the analysis results presented here, there is scope for further assessment and utilization of the meta-information as features in text categorization, ranking text document, identification of spam documents and polarity classification problems.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESkevig
Distributed language representation has become the most widely used technique for language representation in various natural language processing tasks. Most of the natural language processing models that are based on deep learning techniques use already pre-trained distributed word representations, commonly called word embeddings. Determining the most qualitative word embeddings is of crucial importance for such models. However, selecting the appropriate word embeddings is a perplexing task since the projected embedding space is not intuitive to humans.In this paper, we explore different approaches for creating distributed word representations. We perform an intrinsic evaluation of several state-of-the-art word embedding methods. Their performance on capturing word similarities is analysed with existing benchmark datasets for word pairs similarities. The research in this paper conducts a correlation analysis between ground truth word similarities and similarities obtained by different word embedding methods.
Detection and Privacy Preservation of Sensitive Attributes Using Hybrid Appro...rahulmonikasharma
Privacy Preserving Record Linkage (PPRL) is a major area of database research which entangles in colluding huge multiple heterogeneous data sets with disjunctive or additional information about an entity while veiling its private information. This paper gives an enhanced algorithm for merging two datasets using Sorted Neighborhood Deterministic approach and an improved Preservation algorithm which makes use of automatic selection of sensitive attributes and pattern mining over dynamic queries. We guarantee strong privacy, less computational complexity and scalability and address the legitimate concerns over data security and privacy with our approach.
DEEP LEARNING SENTIMENT ANALYSIS OF AMAZON.COM REVIEWS AND RATINGSijscai
Our study employs sentiment analysis to evaluate the compatibility of Amazon.com reviews with their
corresponding ratings. Sentiment analysis is the task of identifying and classifying the sentiment expressed
in a piece of text as being positive or negative. On e-commerce websites such as Amazon.com, consumers
can submit their reviews along with a specific polarity rating. In some instances, there is a mismatch between
the review and the rating. To identify the reviews with mismatched ratings we performed sentiment analysis
using deep learning on Amazon.com product review data. Product reviews were converted to vectors using
paragraph vector, which then was used to train a recurrent neural network with gated recurrent unit. Our
model incorporated both semantic relationship of review text and product information. We also developed a
web service application that predicts the rating score for a submitted review using the trained model and if
there is a mismatch between predicted rating score and submitted rating score, it provides feedback to the
reviewer.
Prediction of Answer Keywords using Char-RNNIJECEIAES
Generating sequences of characters using a Recurrent Neural Network (RNN) is a tried and tested method for creating unique and context aware words, and is fundamental in Natural Language Processing tasks. These type of Neural Networks can also be used a question-answering system. The main drawback of most of these systems is that they work from a factoid database of information, and when queried about new and current information, the responses are usually bleak. In this paper, the author proposes a novel approach to finding answer keywords from a given body of news text or headline, based on the query that was asked, where the query would be of the nature of current affairs or recent news, with the use of Gated Recurrent Unit (GRU) variant of RNNs. Thus, this ensures that the answers provided are relevant to the content of query that was put forth.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESkevig
Distributed language representation has become the most widely used technique for language representation in various natural language processing tasks. Most of the natural language processing models that are based on deep learning techniques use already pre-trained distributed word representations, commonly called word embeddings. Determining the most qualitative word embeddings is of crucial importance for such models. However, selecting the appropriate word embeddings is a perplexing task since the projected embedding space is not intuitive to humans.In this paper, we explore different approaches for creating distributed word representations. We perform an intrinsic evaluation of several state-of-the-art word embedding methods. Their performance on capturing word similarities is analysed with existing benchmark datasets for word pairs similarities. The research in this paper conducts a correlation analysis between ground truth word similarities and similarities obtained by different word embedding methods.
Detection and Privacy Preservation of Sensitive Attributes Using Hybrid Appro...rahulmonikasharma
Privacy Preserving Record Linkage (PPRL) is a major area of database research which entangles in colluding huge multiple heterogeneous data sets with disjunctive or additional information about an entity while veiling its private information. This paper gives an enhanced algorithm for merging two datasets using Sorted Neighborhood Deterministic approach and an improved Preservation algorithm which makes use of automatic selection of sensitive attributes and pattern mining over dynamic queries. We guarantee strong privacy, less computational complexity and scalability and address the legitimate concerns over data security and privacy with our approach.
DEEP LEARNING SENTIMENT ANALYSIS OF AMAZON.COM REVIEWS AND RATINGSijscai
Our study employs sentiment analysis to evaluate the compatibility of Amazon.com reviews with their
corresponding ratings. Sentiment analysis is the task of identifying and classifying the sentiment expressed
in a piece of text as being positive or negative. On e-commerce websites such as Amazon.com, consumers
can submit their reviews along with a specific polarity rating. In some instances, there is a mismatch between
the review and the rating. To identify the reviews with mismatched ratings we performed sentiment analysis
using deep learning on Amazon.com product review data. Product reviews were converted to vectors using
paragraph vector, which then was used to train a recurrent neural network with gated recurrent unit. Our
model incorporated both semantic relationship of review text and product information. We also developed a
web service application that predicts the rating score for a submitted review using the trained model and if
there is a mismatch between predicted rating score and submitted rating score, it provides feedback to the
reviewer.
Prediction of Answer Keywords using Char-RNNIJECEIAES
Generating sequences of characters using a Recurrent Neural Network (RNN) is a tried and tested method for creating unique and context aware words, and is fundamental in Natural Language Processing tasks. These type of Neural Networks can also be used a question-answering system. The main drawback of most of these systems is that they work from a factoid database of information, and when queried about new and current information, the responses are usually bleak. In this paper, the author proposes a novel approach to finding answer keywords from a given body of news text or headline, based on the query that was asked, where the query would be of the nature of current affairs or recent news, with the use of Gated Recurrent Unit (GRU) variant of RNNs. Thus, this ensures that the answers provided are relevant to the content of query that was put forth.
Semantic Based Model for Text Document Clustering with IdiomsWaqas Tariq
Text document clustering has become an increasingly important problem in recent years because of the tremendous amount of unstructured data which is available in various forms in online forums such as the web, social networks, and other information networks. Clustering is a very powerful data mining technique to organize the large amount of information on the web. Traditionally, document clustering methods do not consider the semantic structure of the document. This paper addresses the task of developing an effective and efficient method to improve the semantic structure of the text documents. A method has been developed that performs the following: tag the documents for parsing, replacement of idioms with their original meaning, semantic weights calculation for document words and apply semantic grammar. The similarity measure is obtained between the documents and then the documents are clustered using Hierarchical clustering algorithm. The method adopted in this work is evaluated on different data sets with standard performance measures and the effectiveness of the method to develop in meaningful clusters has been proved.
The peer-reviewed International Journal of Engineering Inventions (IJEI) is started with a mission to encourage contribution to research in Science and Technology. Encourage and motivate researchers in challenging areas of Sciences and Technology.
A Survey on Sentiment Categorization of Movie ReviewsEditor IJMTER
Sentiment categorization is a process of mining user generated text content and determine
the sentiment of the users towards that particular thing. It is the approach of detecting the sentiment of
the author in regard to some topics. It also known as sentiment detection, sentiment analysis and opinion
mining. It is very useful for movie production companies that interested in knowing how users feel
about their movies. For example word “excellent” indicates that the review gives positive emotion about
particular movie. The same applies to movies, songs, cars, holiday destinations, Political parties, social
network sites, web blogs, discussion forum and so on. Sentiment categorization can be carried out by
using three approaches. First, Supervised machine learning based text classifier on Naïve Bayes,
Maximum Entropy, SVM, kNN classifier, hidden marcov model. Second, Unsupervised Semantic
Orientation scheme of extracting relevant N-grams of the text and then labelling. Third, SentiWordNet
based publicly available library.
TEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTIONijistjournal
The user generated content on the web grows rapidly in this emergent information age. The evolutionary changes in technology make use of such information to capture only the user’s essence and finally the useful information are exposed to information seekers. Most of the existing research on text information processing, focuses in the factual domain rather than the opinion domain. In this paper we detect online hotspot forums by computing sentiment analysis for text data available in each forum. This approach analyses the forum text data and computes value for each word of text. The proposed approach combines K-means clustering and Support Vector Machine with PSO (SVM-PSO) classification algorithm that can be used to group the forums into two clusters forming hotspot forums and non-hotspot forums within the current time span. The proposed system accuracy is compared with the other classification algorithms such as Naïve Bayes, Decision tree and SVM. The experiment helps to identify that K-means and SVM-PSO together achieve highly consistent results.
Modeling Text Independent Speaker Identification with Vector QuantizationTELKOMNIKA JOURNAL
Speaker identification is one of the most important technologies nowadays. Many fields such as
bioinformatics and security are using speaker identification. Also, almost all electronic devices are using
this technology too. Based on number of text, speaker identification divided into text dependent and text
independent. On many fields, text independent is mostly used because number of text is unlimited. So, text
independent is generally more challenging than text dependent. In this research, speaker identification text
independent with Indonesian speaker data was modelled with Vector Quantization (VQ). In this research
VQ with K-Means initialization was used. K-Means clustering also was used to initialize mean and
Hierarchical Agglomerative Clustering was used to identify K value for VQ. The best VQ accuracy was
59.67% when k was 5. According to the result, Indonesian language could be modelled by VQ. This
research can be developed using optimization method for VQ parameters such as Genetic Algorithm or
Particle Swarm Optimization.
Tracing Requirements as a Problem of Machine Learning ijseajournal
Software requirement engineering and evolution essential to software development process, which defines and elaborates what is to be built in a project. Requirements are mostly written in text and will later evolve to fine-grained and actionable artifacts with details about system configurations, technology stacks, etc. Tracing the evolution of requirements enables stakeholders to determine the origin of each requirement and
understand how well the software’s design reflects to its requirements. Reckoning requirements traceability
is not a trivial task, a machine learning approach is used to classify traceability between various associated requirements. In particular, a 2-learner, ontology-based, pseudo-instances-enhanced approach, where two classifiers are trained to separately exploit two types of features, lexical features and features derived from a hand-built ontology, is investigated for such task. The hand-built ontology is also leveraged to generate
pseudo training instances to improve machine learning results. In comparison to a supervised baseline system that uses only lexical features, our approach yields a relative error reduction of 56.0%. Most interestingly, results do not deteriorate when the hand-built ontology is replaced with its automatically
constructed counterpart.
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUEJournal For Research
Natural Language Processing (NLP) techniques are one of the most used techniques in the field of computer applications. It has become one of the vast and advanced techniques. Language is the means of communication or interaction among humans and in present scenario when everything is dependent on machine or everything is computerized, communication between computer and human has become a necessity. To fulfill this necessity NLP has been emerged as the means of interaction which narrows the gap between machines (computers) and humans. It was evolved from the study of linguistics which was passed through the Turing test to check the similarity between data but it was limited to small set of data. Later on various algorithms were developed along with the concept of AI (Artificial Intelligence) for the successful execution of NLP. In this paper, the main emphasis is on the different techniques of NLP which have been developed till now, their applications and the comparison of all those techniques on different parameters.
Feature selection, optimization and clustering strategies of text documentsIJECEIAES
Clustering is one of the most researched areas of data mining applications in the contemporary literature. The need for efficient clustering is observed across wide sectors including consumer segmentation, categorization, shared filtering, document management, and indexing. The research of clustering task is to be performed prior to its adaptation in the text environment. Conventional approaches typically emphasized on the quantitative information where the selected features are numbers. Efforts also have been put forward for achieving efficient clustering in the context of categorical information where the selected features can assume nominal values. This manuscript presents an in-depth analysis of challenges of clustering in the text environment. Further, this paper also details prominent models proposed for clustering along with the pros and cons of each model. In addition, it also focuses on various latest developments in the clustering task in the social network and associated environments.
Sentiment analysis focuses on analyzing web documents, especially user-generated content such as product
reviews, to identify opinionated documents, sentences and opinion holders. Most of the time classifiers trained in one
domain do not perform well in another domain. The existing approaches do not detect sentiment and topics
simultaneously. Sentiments may differ with topics. Our proposed model called Joint Sentiment Topic (JST) model to
detect sentiments and topics simultaneously from text. This model is based on Gibbs sampling algorithm. Besides,
unlike supervised approaches to opinion mining which often fail to produce good performance when shifting to other
domains, the semi-supervised nature of JST makes it highly portable to other domains. JST model performs better
when compared to existing supervised approaches.
Generation of Question and Answer from Unstructured Document using Gaussian M...IJACEE IJACEE
Question Answering (QA) system is one of the ever growing applications in Natural Language Processing. The purpose of Automatic Question and Answer Generation system is to generate all possible questions and its relevant answers from a given unstructured document. Complex sentences are simplified to make question generation easier. The accuracy of the generated questions is measured by identifying the subtopics from the text using Gaussian Mixture Neural Topic Model (GMNTM).The similarity between generated questions and text are calculated using Extended String Subsequence Kernel (ESSK). The syntactic correctness of the questions is measured by Syntactic Tree Kernel which computes the similarity scores between each sentence in the given context and generated questions. Based on the similarity score, questions are ranked. The answers for the generated questions are extracted using Pattern Matching Approach. This system is expected to produce better accuracy when compared with the system using Latent Dirichlet Allocation (LDA) for subtopic identification.
An automatic text summarization using lexical cohesion and correlation of sen...eSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
1 aedl - iywinc lite block presentation march 2015 engMichael Walsh
New Patented Super LED Hi/Low-Bay LED Lights.
130 lm P/W, Life 150,000+ hr. life.
Operating temperature: -45 Deg C to +50 Deg C.
IP 68. Our 140 Watt LED will replace a 400 Watt HID.
Semantic Based Model for Text Document Clustering with IdiomsWaqas Tariq
Text document clustering has become an increasingly important problem in recent years because of the tremendous amount of unstructured data which is available in various forms in online forums such as the web, social networks, and other information networks. Clustering is a very powerful data mining technique to organize the large amount of information on the web. Traditionally, document clustering methods do not consider the semantic structure of the document. This paper addresses the task of developing an effective and efficient method to improve the semantic structure of the text documents. A method has been developed that performs the following: tag the documents for parsing, replacement of idioms with their original meaning, semantic weights calculation for document words and apply semantic grammar. The similarity measure is obtained between the documents and then the documents are clustered using Hierarchical clustering algorithm. The method adopted in this work is evaluated on different data sets with standard performance measures and the effectiveness of the method to develop in meaningful clusters has been proved.
The peer-reviewed International Journal of Engineering Inventions (IJEI) is started with a mission to encourage contribution to research in Science and Technology. Encourage and motivate researchers in challenging areas of Sciences and Technology.
A Survey on Sentiment Categorization of Movie ReviewsEditor IJMTER
Sentiment categorization is a process of mining user generated text content and determine
the sentiment of the users towards that particular thing. It is the approach of detecting the sentiment of
the author in regard to some topics. It also known as sentiment detection, sentiment analysis and opinion
mining. It is very useful for movie production companies that interested in knowing how users feel
about their movies. For example word “excellent” indicates that the review gives positive emotion about
particular movie. The same applies to movies, songs, cars, holiday destinations, Political parties, social
network sites, web blogs, discussion forum and so on. Sentiment categorization can be carried out by
using three approaches. First, Supervised machine learning based text classifier on Naïve Bayes,
Maximum Entropy, SVM, kNN classifier, hidden marcov model. Second, Unsupervised Semantic
Orientation scheme of extracting relevant N-grams of the text and then labelling. Third, SentiWordNet
based publicly available library.
TEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTIONijistjournal
The user generated content on the web grows rapidly in this emergent information age. The evolutionary changes in technology make use of such information to capture only the user’s essence and finally the useful information are exposed to information seekers. Most of the existing research on text information processing, focuses in the factual domain rather than the opinion domain. In this paper we detect online hotspot forums by computing sentiment analysis for text data available in each forum. This approach analyses the forum text data and computes value for each word of text. The proposed approach combines K-means clustering and Support Vector Machine with PSO (SVM-PSO) classification algorithm that can be used to group the forums into two clusters forming hotspot forums and non-hotspot forums within the current time span. The proposed system accuracy is compared with the other classification algorithms such as Naïve Bayes, Decision tree and SVM. The experiment helps to identify that K-means and SVM-PSO together achieve highly consistent results.
Modeling Text Independent Speaker Identification with Vector QuantizationTELKOMNIKA JOURNAL
Speaker identification is one of the most important technologies nowadays. Many fields such as
bioinformatics and security are using speaker identification. Also, almost all electronic devices are using
this technology too. Based on number of text, speaker identification divided into text dependent and text
independent. On many fields, text independent is mostly used because number of text is unlimited. So, text
independent is generally more challenging than text dependent. In this research, speaker identification text
independent with Indonesian speaker data was modelled with Vector Quantization (VQ). In this research
VQ with K-Means initialization was used. K-Means clustering also was used to initialize mean and
Hierarchical Agglomerative Clustering was used to identify K value for VQ. The best VQ accuracy was
59.67% when k was 5. According to the result, Indonesian language could be modelled by VQ. This
research can be developed using optimization method for VQ parameters such as Genetic Algorithm or
Particle Swarm Optimization.
Tracing Requirements as a Problem of Machine Learning ijseajournal
Software requirement engineering and evolution essential to software development process, which defines and elaborates what is to be built in a project. Requirements are mostly written in text and will later evolve to fine-grained and actionable artifacts with details about system configurations, technology stacks, etc. Tracing the evolution of requirements enables stakeholders to determine the origin of each requirement and
understand how well the software’s design reflects to its requirements. Reckoning requirements traceability
is not a trivial task, a machine learning approach is used to classify traceability between various associated requirements. In particular, a 2-learner, ontology-based, pseudo-instances-enhanced approach, where two classifiers are trained to separately exploit two types of features, lexical features and features derived from a hand-built ontology, is investigated for such task. The hand-built ontology is also leveraged to generate
pseudo training instances to improve machine learning results. In comparison to a supervised baseline system that uses only lexical features, our approach yields a relative error reduction of 56.0%. Most interestingly, results do not deteriorate when the hand-built ontology is replaced with its automatically
constructed counterpart.
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUEJournal For Research
Natural Language Processing (NLP) techniques are one of the most used techniques in the field of computer applications. It has become one of the vast and advanced techniques. Language is the means of communication or interaction among humans and in present scenario when everything is dependent on machine or everything is computerized, communication between computer and human has become a necessity. To fulfill this necessity NLP has been emerged as the means of interaction which narrows the gap between machines (computers) and humans. It was evolved from the study of linguistics which was passed through the Turing test to check the similarity between data but it was limited to small set of data. Later on various algorithms were developed along with the concept of AI (Artificial Intelligence) for the successful execution of NLP. In this paper, the main emphasis is on the different techniques of NLP which have been developed till now, their applications and the comparison of all those techniques on different parameters.
Feature selection, optimization and clustering strategies of text documentsIJECEIAES
Clustering is one of the most researched areas of data mining applications in the contemporary literature. The need for efficient clustering is observed across wide sectors including consumer segmentation, categorization, shared filtering, document management, and indexing. The research of clustering task is to be performed prior to its adaptation in the text environment. Conventional approaches typically emphasized on the quantitative information where the selected features are numbers. Efforts also have been put forward for achieving efficient clustering in the context of categorical information where the selected features can assume nominal values. This manuscript presents an in-depth analysis of challenges of clustering in the text environment. Further, this paper also details prominent models proposed for clustering along with the pros and cons of each model. In addition, it also focuses on various latest developments in the clustering task in the social network and associated environments.
Sentiment analysis focuses on analyzing web documents, especially user-generated content such as product
reviews, to identify opinionated documents, sentences and opinion holders. Most of the time classifiers trained in one
domain do not perform well in another domain. The existing approaches do not detect sentiment and topics
simultaneously. Sentiments may differ with topics. Our proposed model called Joint Sentiment Topic (JST) model to
detect sentiments and topics simultaneously from text. This model is based on Gibbs sampling algorithm. Besides,
unlike supervised approaches to opinion mining which often fail to produce good performance when shifting to other
domains, the semi-supervised nature of JST makes it highly portable to other domains. JST model performs better
when compared to existing supervised approaches.
Generation of Question and Answer from Unstructured Document using Gaussian M...IJACEE IJACEE
Question Answering (QA) system is one of the ever growing applications in Natural Language Processing. The purpose of Automatic Question and Answer Generation system is to generate all possible questions and its relevant answers from a given unstructured document. Complex sentences are simplified to make question generation easier. The accuracy of the generated questions is measured by identifying the subtopics from the text using Gaussian Mixture Neural Topic Model (GMNTM).The similarity between generated questions and text are calculated using Extended String Subsequence Kernel (ESSK). The syntactic correctness of the questions is measured by Syntactic Tree Kernel which computes the similarity scores between each sentence in the given context and generated questions. Based on the similarity score, questions are ranked. The answers for the generated questions are extracted using Pattern Matching Approach. This system is expected to produce better accuracy when compared with the system using Latent Dirichlet Allocation (LDA) for subtopic identification.
An automatic text summarization using lexical cohesion and correlation of sen...eSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
1 aedl - iywinc lite block presentation march 2015 engMichael Walsh
New Patented Super LED Hi/Low-Bay LED Lights.
130 lm P/W, Life 150,000+ hr. life.
Operating temperature: -45 Deg C to +50 Deg C.
IP 68. Our 140 Watt LED will replace a 400 Watt HID.
An New Attractive Mage Technique Using L-Diversity mlaij
Data that is published or shared between organizations contain private information about an individual. The concept of Privacy Preservation aims to preserve this sensitive information from various privacy threats that violate the privacy of an individual. Analysis of this private information could reveal information that can be used for malicious purposes by the attackers. Anonymization is a privacy preservation approach suitable for mixed data that contains both numerical and categorical attributes. In this paper a novel method called Micro-aggregation Generalization (MAGE) is used for anonymization of microdata that can retain more semantics of the original data. Here the Micro-aggregation is applied over the numerical data and Generalization is applied over the categorical data. Even though the MAGE approach preserves privacy it fails to address the homogeneity and background knowledge attacks. Later the l-diversity approach is applied to deal with homogeneity attack. In l-diversity, the anonymized records are reordered to satisfy a new privacy principle that removes homogeneity of sensitive information. The result shows that the MAGE approach suffers from homogeneity attack and applying l-diversity over MAGE prevents homogeneity attack and also provides better privacy and data utility.
An Ensemble of Filters and Wrappers for Microarray Data Classification mlaij
The development of microarray technology has supplied a large volume of data to many fields. The gene microarray analysis and classification have demonstrated an effective way for the effective diagnosis of diseases and cancers. In as much as the data achieving from microarray technology is very noisy and also has thousands of features, feature selection plays an important role in removing irrelevant and redundant features and also reducing computational complexity. There are two important approaches for gene selection in microarray data analysis, the filters and the wrappers. To select a concise subset of informative genes, we introduce a hybrid feature selection which combines two approaches. The fact of the matter is that candidate’s features are first selected from the original set via several effective filters. The candidate feature set is further refined by more accurate wrappers. Thus, we can take advantage of both the filters and wrappers. Experimental results based on 11 microarray datasets show that our mechanism can be effected with a smaller feature set. Moreover, these feature subsets can be obtained in a reasonable time.
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESkevig
Distributed language representation has become the most widely used technique for language representation in various natural language processing tasks. Most of the natural language processing models that are based on deep learning techniques use already pre-trained distributed word representations, commonly called word embeddings. Determining the most qualitative word embeddings is of crucial importance for such models. However, selecting the appropriate word embeddings is a perplexing task since the projected embedding space is not intuitive to humans. In this paper, we explore different approaches for creating distributed word representations. We perform an intrinsic evaluation of several state-of-the-art word embedding methods. Their performance on capturing word similarities is analysed with existing benchmark datasets for word pairs similarities. The research in this paper conducts a correlation analysis between ground truth word similarities and similarities obtained by different word embedding methods.
NBLex: emotion prediction in Kannada-English code-switchtext using naïve baye...IJECEIAES
Emotion analysis is a process of identifying the human emotions derived from the various data sources. Emotions can be expressed either in monolingual text or code-switch text. Emotion prediction can be performed through machine learning (ML), or deep learning (DL), or lexicon-based approach. ML and DL approaches are computationally expensive and require training data. Whereas, the lexicon-based approach does not require any training data and it takes very less time to predict the emotions in comparison with ML and DL. In this paper, we proposed a lexicon-based method called non-binding lower extremity exoskeleton (NBLex) to predict the emotions associated with Kannada-English code-switch text that no one has addressed till now. We applied the One-vs-Rest approach to generate the scores for lexicon and also to predict the emotions from the code-switch text. The accuracy of the proposed model NBLex (87.9%) is better than naïve bayes (NB) (85.8%) and bidirectional long short-term memory neural network (BiLSTM) (84.7%) and for true positive rate (TPR), the NBLex (50.6%) is better than NB (37.0%) and BiLSTM (42.2%). From our approach, it is observed that a simple additive model (lexicon approach) can also be an alternative model to predict the emotions in code-switch text.
Semantic similarity and semantic relatedness
measure in particular is very important in the current scenario
due to the huge demand for natural language processing based
applications such as chatbots and information retrieval systems
such as knowledge base based FAQ systems. Current approaches
generally use similarity measures which does not use the context
sensitive relationships between the words. This leads to erroneous
similarity predictions and is not of much use in real life
applications. This work proposes a novel approach that gives an
accurate relatedness measure of any two words in a sentence by
taking their context into consideration. This context correction
results in a more accurate similarity prediction which results in
higher accuracy of information retrieval systems.
A hybrid composite features based sentence level sentiment analyzerIAESIJAI
Current lexica and machine learning based sentiment analysis approaches
still suffer from a two-fold limitation. First, manual lexicon construction and
machine training is time consuming and error-prone. Second, the
prediction’s accuracy entails sentences and their corresponding training text
should fall under the same domain. In this article, we experimentally
evaluate four sentiment classifiers, namely support vector machines (SVMs),
Naive Bayes (NB), logistic regression (LR) and random forest (RF). We
quantify the quality of each of these models using three real-world datasets
that comprise 50,000 movie reviews, 10,662 sentences, and 300 generic
movie reviews. Specifically, we study the impact of a variety of natural
language processing (NLP) pipelines on the quality of the predicted
sentiment orientations. Additionally, we measure the impact of incorporating
lexical semantic knowledge captured by WordNet on expanding original
words in sentences. Findings demonstrate that the utilizing different NLP
pipelines and semantic relationships impacts the quality of the sentiment
analyzers. In particular, results indicate that coupling lemmatization and
knowledge-based n-gram features proved to produce higher accuracy results.
With this coupling, the accuracy of the SVM classifier has improved to
90.43%, while it was 86.83%, 90.11%, 86.20%, respectively using the three
other classifiers.
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING cscpconf
In the last decade, ontologies have played a key technology role for information sharing and agents interoperability in different application domains. In semantic web domain, ontologies are efficiently used toface the great challenge of representing the semantics of data, in order to bring the actual web to its full
power and hence, achieve its objective. However, using ontologies as common and shared vocabularies requires a certain degree of interoperability between them. To confront this requirement, mapping ontologies is a solution that is not to be avoided. In deed, ontology mapping build a meta layer that allows different applications and information systems to access and share their informations, of course, after resolving the different forms of syntactic, semantic and lexical mismatches. In the contribution presented in this paper, we have integrated the semantic aspect based on an external lexical resource, wordNet, to design a new algorithm for fully automatic ontology mapping. This fully automatic character features the
main difference of our contribution with regards to the most of the existing semi-automatic algorithms of ontology mapping, such as Chimaera, Prompt, Onion, Glue, etc. To better enhance the performances of our algorithm, the mapping discovery stage is based on the combination of two sub-modules. The former
analysis the concept’s names and the later analysis their properties. Each one of these two sub-modules is
it self based on the combination of lexical and semantic similarity measures.
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATIONijnlc
Word Sense Disambiguation (WSD) is an important area which has an impact on improving the performance of applications of computational linguistics such as machine translation, information
retrieval, text summarization, question answering systems, etc. We have presented a brief history of WSD,
discussed the Supervised, Unsupervised, and Knowledge-based approaches for WSD. Though many WSD
algorithms exist, we have considered optimal and portable WSD algorithms as most appropriate since they
can be embedded easily in applications of computational linguistics. This paper will also provide an idea of
some of the WSD algorithms and their performances, which compares and assess the need of the word
sense disambiguation.
Sentence level sentiment polarity calculation for customer reviews by conside...eSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
Enhanced sentiment analysis based on improved word embeddings and XGboost IJECEIAES
Sentiment analysis is a well-known and rapidly expanding study topic in natural language processing (NLP) and text classification. This approach has evolved into a critical component of many applications, including politics, business, advertising, and marketing. Most current research focuses on obtaining sentiment features through lexical and syntactic analysis. Word embeddings explicitly express these characteristics. This article proposes a novel method, improved words vector for sentiments analysis (IWVS), using XGboost to improve the F1-score of sentiment classification. The proposed method constructed sentiment vectors by averaging the word embeddings (Sentiment2Vec). We also investigated the Polarized lexicon for classifying positive and negative sentiments. The sentiment vectors formed a feature space to which the examined sentiment text was mapped to. Those features were input into the chosen classifier (XGboost). We compared the F1-score of sentiment classification using our method via different machine learning models and sentiment datasets. We compare the quality of our proposition to that of baseline models, term frequency-inverse document frequency (TF-IDF) and Doc2vec, and the results show that IWVS performs better on the F1-measure for sentiment classification. At the same time, XGBoost with IWVS features was the best model in our evaluation.
Sentimental analysis is a context based mining of text, which extracts and identify subjective information from a text or sentence provided. Here the main concept is extracting the sentiment of the text using machine learning techniques such as LSTM Long short term memory . This text classification method analyses the incoming text and determines whether the underlined emotion is positive or negative along with probability associated with that positive or negative statements. Probability depicts the strength of a positive or negative statement, if the probability is close to zero, it implies that the sentiment is strongly negative and if probability is close to1, it means that the statement is strongly positive. Here a web application is created to deploy this model using a Python based micro framework called flask. Many other methods, such as RNN and CNN, are inefficient when compared to LSTM. Dirash A R | Dr. S K Manju Bargavi "LSTM Based Sentiment Analysis" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5 | Issue-4 , June 2021, URL: https://www.ijtsrd.compapers/ijtsrd42345.pdf Paper URL: https://www.ijtsrd.comcomputer-science/data-processing/42345/lstm-based-sentiment-analysis/dirash-a-r
Evaluating sentiment analysis and word embedding techniques on BrexitIAESIJAI
In this study, we investigate the effectiveness of pre-trained word embeddings for sentiment analysis on a real-world topic, namely Brexit. We compare the performance of several popular word embedding models such global vectors for word representation (GloVe), FastText, word to vec (word2vec), and embeddings from language models (ELMo) on a dataset of tweets related to Brexit and evaluate their ability to classify the sentiment of the tweets as positive, negative, or neutral. We find that pre-trained word embeddings provide useful features for sentiment analysis and can significantly improve the performance of machine learning models. We also discuss the challenges and limitations of applying these models to complex, real-world texts such as those related to Brexit.
OPTIMIZATION OF CROSS DOMAIN SENTIMENT ANALYSIS USING SENTIWORDNETijfcstjournal
The task of sentiment analysis of reviews is carried out using manually built / automatically generated
lexicon resources of their own with which terms are matched with lexicon to compute the term count for
positive and negative polarity. On the other hand the Sentiwordnet, which is quite different from other
lexicon resources that gives scores (weights) of the positive and negative polarity for each word. The
polarity of a word namely positive, negative and neutral have the score ranging between 0 to 1 indicates
the strength/weight of the word with that sentiment orientation. In this paper, we show that using the
Sentiwordnet, how we could enhance the performance of the classification at both sentence and document
level.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
A novel meta-embedding technique for drug reviews sentiment analysisIAESIJAI
Traditional word embedding models have been used in the feature extraction process of deep learning models for sentiment analysis. However, these models ignore the sentiment properties of words while maintaining the contextual relationships and have inadequate representation for domain-specific words. This paper proposes a method to develop a meta embedding model by exploiting domain sentiment polarity and adverse drug reaction (ADR) features to render word embedding models more suitable for medical sentiment analysis. The proposed lexicon is developed from the medical blogs corpus. The polarity scores of the existing lexicons are adjusted to assign new polarity score to each word. The neural network model utilizes sentiment lexicons and ADR in learning refined word embedding. The refined embedding obtained from the proposed approach is concatenated with original word vectors, lexicon vectors, and ADR feature to form a meta-embedding model which maintains both contextual and sentimental properties. The final meta-embedding acts as a feature extractor to assess the effectiveness of the model in drug reviews sentiment analysis. The experiments are conducted on global vectors (GloVE) and skip-gram word2vector (Word2Vec) models. The empirical results demonstrate the proposed meta-embedding model outperforms traditional word embedding in different performance measures.
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSgerogepatton
Document similarity is an important part of Natural Language Processing and is most commonly used for
plagiarism-detection and text summarization. Thus, finding the overall most effective document similarity
algorithm could have a major positive impact on the field of Natural Language Processing. This report sets
out to examine the numerous document similarity algorithms, and determine which ones are the most
useful. It addresses the most effective document similarity algorithm by categorizing them into 3 types of
document similarity algorithms: statistical algorithms, neural networks, and corpus/knowledge-based
algorithms. The most effective algorithms in each category are also compared in our work using a series of
benchmark datasets and evaluations that test every possible area that each algorithm could be used in.
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSgerogepatton
Document similarity is an important part of Natural Language Processing and is most commonly used for
plagiarism-detection and text summarization. Thus, finding the overall most effective document similarity
algorithm could have a major positive impact on the field of Natural Language Processing. This report sets
out to examine the numerous document similarity algorithms, and determine which ones are the most
useful. It addresses the most effective document similarity algorithm by categorizing them into 3 types of
document similarity algorithms: statistical algorithms, neural networks, and corpus/knowledge-based
algorithms. The most effective algorithms in each category are also compared in our work using a series of
benchmark datasets and evaluations that test every possible area that each algorithm could be used in.
Vaccine management system project report documentation..pdfKamal Acharya
The Division of Vaccine and Immunization is facing increasing difficulty monitoring vaccines and other commodities distribution once they have been distributed from the national stores. With the introduction of new vaccines, more challenges have been anticipated with this additions posing serious threat to the already over strained vaccine supply chain system in Kenya.
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdffxintegritypublishin
Advancements in technology unveil a myriad of electrical and electronic breakthroughs geared towards efficiently harnessing limited resources to meet human energy demands. The optimization of hybrid solar PV panels and pumped hydro energy supply systems plays a pivotal role in utilizing natural resources effectively. This initiative not only benefits humanity but also fosters environmental sustainability. The study investigated the design optimization of these hybrid systems, focusing on understanding solar radiation patterns, identifying geographical influences on solar radiation, formulating a mathematical model for system optimization, and determining the optimal configuration of PV panels and pumped hydro storage. Through a comparative analysis approach and eight weeks of data collection, the study addressed key research questions related to solar radiation patterns and optimal system design. The findings highlighted regions with heightened solar radiation levels, showcasing substantial potential for power generation and emphasizing the system's efficiency. Optimizing system design significantly boosted power generation, promoted renewable energy utilization, and enhanced energy storage capacity. The study underscored the benefits of optimizing hybrid solar PV panels and pumped hydro energy supply systems for sustainable energy usage. Optimizing the design of solar PV panels and pumped hydro energy supply systems as examined across diverse climatic conditions in a developing country, not only enhances power generation but also improves the integration of renewable energy sources and boosts energy storage capacities, particularly beneficial for less economically prosperous regions. Additionally, the study provides valuable insights for advancing energy research in economically viable areas. Recommendations included conducting site-specific assessments, utilizing advanced modeling tools, implementing regular maintenance protocols, and enhancing communication among system components.
Student information management system project report ii.pdfKamal Acharya
Our project explains about the student management. This project mainly explains the various actions related to student details. This project shows some ease in adding, editing and deleting the student details. It also provides a less time consuming process for viewing, adding, editing and deleting the marks of the students.
Final project report on grocery store management system..pdfKamal Acharya
In today’s fast-changing business environment, it’s extremely important to be able to respond to client needs in the most effective and timely manner. If your customers wish to see your business online and have instant access to your products or services.
Online Grocery Store is an e-commerce website, which retails various grocery products. This project allows viewing various products available enables registered users to purchase desired products instantly using Paytm, UPI payment processor (Instant Pay) and also can place order by using Cash on Delivery (Pay Later) option. This project provides an easy access to Administrators and Managers to view orders placed using Pay Later and Instant Pay options.
In order to develop an e-commerce website, a number of Technologies must be studied and understood. These include multi-tiered architecture, server and client-side scripting techniques, implementation technologies, programming language (such as PHP, HTML, CSS, JavaScript) and MySQL relational databases. This is a project with the objective to develop a basic website where a consumer is provided with a shopping cart website and also to know about the technologies used to develop such a website.
This document will discuss each of the underlying technologies to create and implement an e- commerce website.
Explore the innovative world of trenchless pipe repair with our comprehensive guide, "The Benefits and Techniques of Trenchless Pipe Repair." This document delves into the modern methods of repairing underground pipes without the need for extensive excavation, highlighting the numerous advantages and the latest techniques used in the industry.
Learn about the cost savings, reduced environmental impact, and minimal disruption associated with trenchless technology. Discover detailed explanations of popular techniques such as pipe bursting, cured-in-place pipe (CIPP) lining, and directional drilling. Understand how these methods can be applied to various types of infrastructure, from residential plumbing to large-scale municipal systems.
Ideal for homeowners, contractors, engineers, and anyone interested in modern plumbing solutions, this guide provides valuable insights into why trenchless pipe repair is becoming the preferred choice for pipe rehabilitation. Stay informed about the latest advancements and best practices in the field.
Quality defects in TMT Bars, Possible causes and Potential Solutions.PrashantGoswami42
Maintaining high-quality standards in the production of TMT bars is crucial for ensuring structural integrity in construction. Addressing common defects through careful monitoring, standardized processes, and advanced technology can significantly improve the quality of TMT bars. Continuous training and adherence to quality control measures will also play a pivotal role in minimizing these defects.
Courier management system project report.pdfKamal Acharya
It is now-a-days very important for the people to send or receive articles like imported furniture, electronic items, gifts, business goods and the like. People depend vastly on different transport systems which mostly use the manual way of receiving and delivering the articles. There is no way to track the articles till they are received and there is no way to let the customer know what happened in transit, once he booked some articles. In such a situation, we need a system which completely computerizes the cargo activities including time to time tracking of the articles sent. This need is fulfilled by Courier Management System software which is online software for the cargo management people that enables them to receive the goods from a source and send them to a required destination and track their status from time to time.
Forklift Classes Overview by Intella PartsIntella Parts
Discover the different forklift classes and their specific applications. Learn how to choose the right forklift for your needs to ensure safety, efficiency, and compliance in your operations.
For more technical information, visit our website https://intellaparts.com
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Dr.Costas Sachpazis
Terzaghi's soil bearing capacity theory, developed by Karl Terzaghi, is a fundamental principle in geotechnical engineering used to determine the bearing capacity of shallow foundations. This theory provides a method to calculate the ultimate bearing capacity of soil, which is the maximum load per unit area that the soil can support without undergoing shear failure. The Calculation HTML Code included.
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdfKamal Acharya
The College Bus Management system is completely developed by Visual Basic .NET Version. The application is connect with most secured database language MS SQL Server. The application is develop by using best combination of front-end and back-end languages. The application is totally design like flat user interface. This flat user interface is more attractive user interface in 2017. The application is gives more important to the system functionality. The application is to manage the student’s details, driver’s details, bus details, bus route details, bus fees details and more. The application has only one unit for admin. The admin can manage the entire application. The admin can login into the application by using username and password of the admin. The application is develop for big and small colleges. It is more user friendly for non-computer person. Even they can easily learn how to manage the application within hours. The application is more secure by the admin. The system will give an effective output for the VB.Net and SQL Server given as input to the system. The compiled java program given as input to the system, after scanning the program will generate different reports. The application generates the report for users. The admin can view and download the report of the data. The application deliver the excel format reports. Because, excel formatted reports is very easy to understand the income and expense of the college bus. This application is mainly develop for windows operating system users. In 2017, 73% of people enterprises are using windows operating system. So the application will easily install for all the windows operating system users. The application-developed size is very low. The application consumes very low space in disk. Therefore, the user can allocate very minimum local disk space for this application.
1. Machine Learning and Applications: An International Journal (MLAIJ) Vol.3, No.2, June 2016
DOI:10.5121/mlaij.2016.3206 65
ANALYSIS OF OPINIONATED TEXT FOR OPINION
MINING
K Paramesha1
and K C Ravishankar2
1
Department of Computer Engineering, Vidyavardhaka College of Engg., Mysuru, India
2
Department of Computer Engineering, Government Engineering College, Hassan, India
ABSTRACT
In sentiment analysis, the polarities of the opinions expressed on an object/feature are determined to assess
the sentiment of a sentence or document whether it is positive/negative/neutral. Naturally, the
object/feature is a noun representation which refers to a product or a component of a product, let’s say, the
"lens" in a camera and opinions emanating on it are captured in adjectives, verbs, adverbs and noun words
themselves. Apart from such words, other meta-information and diverse effective features are also going to
play an important role in influencing the sentiment polarity and contribute significantly to the performance
of the system. In this paper, some of the associated information/meta-data are explored and investigated in
the sentiment text. Based on the analysis results presented here, there is scope for further assessment and
utilization of the meta-information as features in text categorization, ranking text document, identification
of spam documents and polarity classification problems.
KEYWORDS
Sentiment Analysis, Features, Emotions, Word Sense Disambiguation, Sentiment Lexicons, Meta-
Information.
1. INTRODUCTION
Sentiment Analysis (SA) is a multifaceted problem [1, 2], which needs thorough understanding of
the sentiment text syntactically and semantically [3] to be able to judge the sentiment polarities.
Being a classification task, the machine learning approaches map sentiment text into feature
vector which requires the extraction of several types of discriminating features. The meta-
information associated with the text are also explored and exploited as features in many machine
learning approaches [4, 5, 6, 7, 8]. In a paper presented by [9], highlights the explicit and implicit
features beyond words themselves. For example, word's part-of-speech (POS) and its position
considered to be the explicit one. The similarity score of a word with the polarity class seed word
"good" is an implicit one. In the papers presented by [10, 11, 12], different types of features such
as word features, structure features, sentence features etc. were explored. In fact there has been a
broad range of features and methods [13, 14] that many researchers have been investigating and
exploiting new ones for the sentiment classification and to enhance the performance of the
system.
2. Machine Learning and Applications: An International Journal (MLAIJ) Vol.3, No.2, June 2016
66
2. DATASET
The dataset [15] on which we analysed, consists of 294 reviews sampled across the domains
books, dvds, music, electronics and videogames and has document sentiment categories positive,
neutral and negative. The document reviews are annotated with the categories POS, NEG, NEU,
MIX, and NR. The MIX and NR are combined with NEU and reduced to three classes. The
document and sentence distribution are as shown in the Table 1.
Table 1. Review documents and sentences distribution.
Category
Documents Sentences
POS NEG NEU Total POS NEG NEU Total
Books 19 20 20 59 160 195 384 739
Dvds 19 20 20 59 164 264 371 799
Electronics 19 19 19 57 161 240 227 628
Music 20 20 19 59 183 179 276 638
Videogames 20 20 20 60 255 442 335 1,032
Total 97 99 98 294 923 1,320 1,593 3,836
3. EXPERIMENT SETUP
For computing the WordNet domains of words in the review sentences in which each word has
many senses across POS in the WordNet, it is appropriate to find the right senses of words based
on which WordNet domains are assigned rather than arbitrarily assign the WordNet domains
based on the POS. The word sense disambiguation (WSD) is performed for each sentence using
the services offered by the idilia1
(http://www.idilia.com). To get our text sense tagged, java
application is implemented to communicate through REST services. To derive the vectors
representing the six types of emotions for each document, similarity scores of the words with all
the emotions types are computed with the aid of WordNet based word-similarity algorithms.
4. FEATURES
To aid the SA using machine learning approaches, several associated linguistic and statistical
features at sentence level and document level could be integrated [16, 17]. Many researches have
been conducted to evaluate the performance of the features but a holistic feature set for efficient
and high performance SA is hard to find. The aim of the work is to explore features and
techniques in the input text data, which can contribute to the performance of the system.
4.1. Inter-Sentence Coherence
In a review document consisting of several sentences, the polarity of the document could be
determined as the polarity with maximum votes of all the sentences. Intuitively, the probability of
a sentence polarity in the review document is same as the document's polarity. But, it is also
in influenced by the previous sentence's polarity. The probability distribution given in Table 2
shows the polarity probabilities of next sentence given the polarity of the current sentence across
all the review documents and domains. It is interesting to know that the likelihood of the polarity
of a sentence is the polarity of its previous sentence (of course not applicable to the first sentence
of the review document). It is also noticeable that the probability of polarity transition to its
3. Machine Learning and Applications: An International Journal (MLAIJ) Vol.3, No.2, June 2016
67
opposite polarity (pos to neg and neg to pos) is lesser than the transition to neutral (pos to neu
and neg to neu). With this understanding of the probability distribution, it implies that the abrupt
transition in polarities of sentences is less likely to happen and while modelling the feature vector
for the current sentence, the incorporation of features of its previous sentence is indeed going to
contribute significantly to the performance of the system.
Table 2. Probability distribution of inter sentence polarities.
Current
Sentence
Next Sentence
Polarity Pos Neg Neu
Pos 0.148 0.027 0.063
Neg 0.018 0.234 0.095
Neu 0.073 0.094 0.248
4.2. Emotions
It is known that SA is hard unless the whole sense of the words and phrases are known properly.
A sentence can be positive or negative even though it doesn't contain any sentiment words and
vice versa is also true. It is observed that six types of emotions {Anger, Disgust, Fear, Joy,
Sadness and Surprise} can be mapped into polarities of the sentiments. As investigated in the
paper [18], the emotions are tagged to the words in short sentences and then valence (positive and
negative) is determined. For tagging, they have used enriched labels from the seed WordNet-
Affect labels. But, we took rather a different approach in computing the emotions of the text. We
employed the WordNet based word-similarity algorithms to compute and aggregate the similarity
score of words with all of the emotional words. We analysed emotions both at sentence and
document level. The emotion distributions over pos, neg and neu documents are plotted in the
Figure 1 which shows some variation in each of the distribution between pos and neg. The
distribution of emotion over neu documents is seen to be occupying the space between pos and
neg distribution. Further, we tested the combination of all the emotion distributions on various
binary classifiers which yielded an accuracy of 60% at the sentence level and 75% at document
level.
4.3. WordNet Syntactic Domains
Many researchers have been actively working on using WordNet domains in computational
linguistics. The WordNet domains contains around 200 domains labels such as economics,
politics, law, science etc., which are organized in a hierarchical structure (WordNet Domains
Hierarchy). Each synset of WordNet was labelled with one or more labels which could be
explored to establish semantic relations among word senses and can be effectively utilized during
disambiguation process. The domain information utility is realized in the applications such as
Domain Driven Disambiguation (DDD) [19], word sense disambiguation (WSD) and text
categorization (TC) [20]. Akin to the WordNet domains, there is another category of information
associated with the WordNet synsets, which are organized into forty five lexicographer les
based on syntactic category and logical grouping as shown in the Table 3. We have plotted the
syntactic domains distribution over POS, NEG and NEU documents across fiv domains of the
dataset after WSD process to investigate any potential use in deciding the polarity of sentences.
The distributions of syntactic domains are shown in Figure 2. We could figure out that in
electronics domain the noun.artifact is predominant compare to other domains whereas
4. Machine Learning and Applications: An International Journal (MLAIJ) Vol.3, No.2, June 2016
68
noun.act, noun.cognition, noun.communication, noun.person, verb.cognition and
verb.communication have relatively significant distribution in all domains. The main idea for
looking into the syntactic domains is that the domain information make-up a part of the
information required for the word sense computations. Since word polarity depends on the word
sense, it could be perceived that domain information have in influence on the polarity of words.
Consider Sentence S1, which is POS polarity but doesn't contain any sentiment words. In this
case, we can able to predict the polarity if we can establish the word "buy" belonging to
verb.possession domain is inclined to positive sentiment.
S1: Netgear software makes me buy them over and over...
(a) Anger (b) Joy
(a) SADNESS (b) FEAR
5. Machine Learning and Applications: An International Journal (MLAIJ) Vol.3, No.2, June 2016
69
(a) DISGUST (b) SURPRISE
Figure 1: Distributions of emotions over pos, neg and neu document sentiments.
4.4. Word Senses
As the polarity of word changes with the senses and across domains [21, 22], it is important to
know that in which sense each word is used in the sentence context [23]. For instance, consider
the sentence S2 which is negative on "blow", whereas in sentence S3 it turns out to be positive.
Having resolved all the word senses properly, the subsequent steps are going to become less error
prone in determining the polarity of the sentence and document as well. S2: It came as a big blow
to the project funding.
S3: The audio system will blow you away.
S4: A good beginner's tablet.
We did WSD using sense mapping service at idilia.com for entire document and analysed some
of the samples and we found that the results are accurate in almost all cases. For instance, in the
Sentence 4 the tablet is mapped to sense having the gloss "A graphics tablet is a computer
peripheral device that allows one to hand-draw images directly into a computer, generally through
an imaging program" which is good enough in identifying the concept related to laptops.
6. Machine Learning and Applications: An International Journal (MLAIJ) Vol.3, No.2, June 2016
70
Figure 2: Distributions of WordNet syntactic domains over pos, neg and neu document sentiments.
4.5. Lexicons
Most of the models in SA use a sentiment dictionary which is built automatically using a
set of seed words [24] or a hand-built one. The automatically built dictionary such as
SentiWordNet [25] consists of rich set of annotated sentiment words for translating words
into polarity scores. For instance, consider the negative sentences S5, S6 and S7 which
have negative phrases "disproportionately", "put_to_sleep" and "one_dimensional"
respectively. Suppose using only the SentiWordNet gives neutral polarity for these
words, which leads to misclassification into neutral class. In such cases, seeking for a
second opinion or third to resolve the polarities of the words would be a better idea and
will going to improve the accuracy. The bottom line is that seeking second opinion on the
polarity of the words will ensure that the words which cannot be captured by a single
7. Machine Learning and Applications: An International Journal (MLAIJ) Vol.3, No.2, June 2016
71
dictionary could be captured by other dictionaries. When we crosschecked the words with
other freely available dictionaries such as
a) Lexicoder2
(http://.lexicoder.com/download.html)
b) HarvardInquirer3
(http://.wjh.harvard.edu/inquirer/spreadsheet_guide.htm)
c) SenticNet4
(http://sentic.net/downloads)
d) Bing Liu Sentiment Lexicon5
(https://www.cs.uic.edu/liub/FBS/)
e) MPQA Subjectivity Lexicon6
(http://mpqa.cs.pitt.edu/#subj_lexicon)
f) SentiWords7
(https://hltnlp.fbk.eu/technologies/sentiwords)
g) WordStat Sentiment Lexicon8
(http://provalisresearch.com/products/)
We found that majority of these dictionaries are resolving the words with correct
polarities with which polarities of sentences could be determined correctly.
S5: The book is disproportionately focused on multilayer feedback networks.
S6: First of all, this book put me to sleep several times.
S7: This is a very one dimensional book.
4.6. Meta-Characters
A careful perusal of the input text data revealed that so many meta-characters such as {:,
?, !, :D, /} are intertwined with the sentences. The most frequently occurred meta-
character ":" as used in the sentences S8 and S9 could be exploited in assessing the
polarities of the sentiments expressed in the sentences. As per the syntax of the ":", the
preceding expression of ":" will provide a clue about the following expression. In the
sentences S10 and S11, the meta-character "/" is used to specify the overall rating which
can be useful in evaluating the sentiments on the product. The 1/10 is negative the 5/10 is
neutral and the 10/10 is positive ratings.
S8: The Good: it's a fun co-op game SP game (never tried PvP).
S9: Pros: Good features, stylish looks. Cons: This phone is very unreliable.
S10: Story: 1/10 here’s where things start to go bad.
S11: Sounds and Music: 7/10 Well, the sound is fantastic!
8. Machine Learning and Applications: An International Journal (MLAIJ) Vol.3, No.2, June 2016
72
Table 3: Lexicographer file numbers and file names.
FileNo. Part-of-Speech.suffix Description
00
01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
adj.all
adj.pert
adv.all
noun.Tops
noun.act
noun.animal
noun.artifact
noun.attribute
noun.body
noun.cognition
noun.communication
noun.event
noun.feeling
noun.food
noun.group
noun.location
noun.motive
noun.object
noun.person
noun.phenomenon
noun.plant
noun.possession
noun.process
noun.quantity
noun.relation
noun.shape
noun.state
noun.substance
noun.time
verb.body
verb.change
verb.cognition
verb.communication
verb.competition
verb.consumption
verb.contact
verb.creation
verb.emotion
verb.motion
verb.perception
verb.possession
verb.social
verb.stative
verb.weather
adj.ppl
all adjective cluster
relational adjectives (pertainyms)
all adverbs
unique beginner for nouns
nouns denoting acts or actions
nouns denoting animals
nouns denoting man-made objects
nouns denoting attributes of people and objects
nouns denoting body parts
nouns denoting cognitive processes and contents
nouns denoting communicative processes and contents
nouns denoting natural events
nouns denoting feelings and emotions
nouns denoting foods and drinks
nouns denoting groupings of people or objects
nouns denoting spatial position
nouns denoting goals
nouns denoting natural objects (not man-made)
nouns denoting people
nouns denoting natural phenomena
nouns denoting plants
nouns denoting possession and transfer of possession
nouns denoting natural processes
nouns denoting quantities and units of measure
nouns denoting relations between people or things or
ideas
nouns denoting two and three dimensional shapes
nouns denoting stable states of affairs
nouns denoting substances
nouns denoting time and temporal relations
verbs of grooming, dressing and bodily care
verbs of size, temperature change, intensifying, etc.
verbs of thinking, judging, analysing, doubting
verbs of telling, asking, ordering, singing
verbs of fighting, athletic activities
verbs of eating and drinking
verbs of touching, hitting, tying, digging
verbs of sewing, baking, painting, performing
verbs of feeling
verbs of walking, flying, swimming
verbs of seeing, hearing, feeling
verbs of buying, selling, owning
verbs of political and social activities and events
verbs of being, having, spatial relations
verbs of raining, snowing, thawing, thundering
participial adjectives
9. Machine Learning and Applications: An International Journal (MLAIJ) Vol.3, No.2, June 2016
73
5. CONCLUSION
SA is indeed a complex process which integrates several associated linguistic and statistical
information and beyond the word features. In this paper, we have done some statistical and
linguistic analysis on the opinionated text and based on the results and observations obtained in
the analysis, the incorporation of the relevant features in feature vector would certainly enhance
the system performance. There is a scope for further investigation into WordNet syntactic
domains so as to utilize the syntactic domain information effectively but not limited to SA.
REFERENCES
[1] B. Liu, Sentiment analysis: A multi-faceted problem, IEEE IntelligentSystems 25 (3) (2010) 76-80.
[2] K. Paramesha, K. C. Ravishankar, A perspective on sentiment analysis, in: Proceedings of ERCICA
2014 - Emerging Research in Computing, Information, Communication and Applications, Vol. 1,
Elsevier, NMIT, Bengaluru, India, 2014, pp. 412-418.
[3] R. Delmonte, V. Pallotta, Opinion mining and sentiment analysis need text understanding, in:
Advances in Distributed Agent-Based Retrieval Tools, Springer, 2011, pp. 81-95.
[4] L. Dong, F.Wei, S. Liu, M. Zhou, K. Xu, A statistical parsing framework for sentiment classification,
Computational Linguistics.
[5] F. Li, C. Han, M. Huang, X. Zhu, Y.-J. Xia, S. Zhang, H. Yu, Structure aware review mining and
summarization, in: Proceedings of the 23rd
international conference on computational linguistics,
Association for Computational Linguistics, 2010, pp. 653-661.
[6] S. Shariaty, S. Moghaddam, Fine-grained opinion mining using conditional random fields, in: Data
Mining Workshops (ICDMW), 2011 IEEE 11th International Conference on, IEEE, 2011, pp. 109-
114.
[7] R. Narayanan, B. Liu, A. Choudhary, Sentiment analysis of conditional sentences, in: Proceedings of
the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1-Volume 1,
Association for Computational Linguistics, 2009, pp. 180-189.
[8] K. Sadamitsu, S. Sekine, M. Yamamoto, Sentiment analysis based on probabilistic models using
inter-sentence information, in: LREC, 2008.
[9] B. Pang, L. Lee, Opinion mining and sentiment analysis, Foundations and trends in information
retrieval 2 (1-2) (2008) 1-135.
[10] B. Liu, Sentiment analysis and opinion mining, Synthesis lectures on human language technologies 5
(1) (2012) 1-167.
[11] K. Paramesha, K. C. Ravishankar, Optimization of cross domain sentiment analysis using
SentiWordNet, arXiv preprint arXiv: 1401.3230.
[12] K. Paramesha, K. Ravishankar, Exploiting dependency relations for sentence level sentiment
classification using SVM, in: Electrical, Computer and Communication Technologies (ICECCT),
2015 IEEE International Conference on, IEEE, 2015, pp. 1-4.
[13] R. Prabowo, M. Thelwall, Sentiment analysis: A combined approach, Journal of Informetrics 3 (2)
(2009) 143-157.
[14] J. Polpinij, A. K. Ghose, An ontology-based sentiment classification methodology for online
consumer reviews, in: Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web
Intelligence and Intelligent Agent Technology-Volume 01, IEEE Computer Society, 2008, pp. 518-
524.
[15] O. Täckström, R. McDonald, Discovering ne-grained sentiment with latent variable structured
prediction models, in: Advances in Information Retrieval, Springer, 2011, pp. 368-374.
[16] M. Ganapathibhotla, B. Liu, Mining opinions in comparative sentences, in: Proceedings of the 22nd
International Conference on Computational Linguistics-Volume 1, Association for Computational
Linguistics, 2008, pp. 241-248.
10. Machine Learning and Applications: An International Journal (MLAIJ) Vol.3, No.2, June 2016
74
[17] M. Gamon, Sentiment classification on customer feedback data: noisy data, large feature vectors, and
the role of linguistic analysis, in: Proceedings of the 20th international conference on Computational
Linguistics, Association for Computational Linguistics, 2004, p. 841.
[18] F. R. Chaumartin, Upar7: A knowledge-based system for headline sentiment tagging, in: Proceedings
of the 4th International Workshop on Semantic Evaluations, Association for Computational
Linguistics, 2007, pp. 422-425.
[19] A Gliozzo, C. Strapparava, I. Dagan, Unsupervised and supervised exploitation of semantic domains
in lexical disambiguation, Computer Speech & Language 18 (3) (2004) 275-299.
[20] B. Magnini, C. Strapparava, G. Pezzulo, A. Gliozzo, Comparing ontology-based and corpus-based
domain annotations in wordnet, in: Proceedings of the First International WordNet Conference, 2002,
pp. 21-25.
[21] T. Wilson, J. Wiebe, P. Hoffmann, Recognizing contextual polarity: An exploration of features for
phrase-level sentiment analysis, Computational linguistics 35 (3) (2009) 399-433.
[22] X. Ding, B. Liu, P. S. Yu, A holistic lexicon-based approach to opinion mining, in: Proceedings of the
2008 International Conference on Web Search and Data Mining, ACM, 2008, pp. 231-240.
[23] F. Benamara, B. Chardon, Y. Y. Mathieu, V. Popescu, et al., Towards context-based subjectivity
analysis, in: IJCNLP, 2011, pp. 1180-1188.
[24] N. Kaji, M. Kitsuregawa, Building lexicon for sentiment analysis from massive collection of html
documents, in: EMNLP-CoNLL, 2007, pp. 1075-1083.
[25] S. Baccianella, A. Esuli, F. Sebastiani, Sentiwordnet 3.0: An enhanced lexical resource for sentiment
analysis and opinion mining, in: LREC, Vol. 10, 2010, pp. 2200-2204.
Authors
K.PARAMESHA
Assoc. Professor
Dept. of CSE, VVCE,
Mysore, India
He has studied BE in Computer Science (Bangalore university) and M.Tech in Network
Engineering (VTU, Belagavi). Presently working as a faculty in the Dept. Of Computer
Science at VVCE, Mysuru and also actively involved in research on Sentiment Analysis.
His primary focus is on Feature Engineering in Text Analytics. His areas of interest are
Programming, NLP, Text Mining and Machine Learning.
Dr K.C.RAVISHANKAR
Professor & HOD
Dept. of CSE,
Govt. Engineering College
Hassan, India
He has obtained BE in Computer Science and Engineering from Mysore University,
M.Tech in Computer Science and Engineering from IIT, Delhi and doctorate from VTU,
Belagavi in Computer Science and Engineering. He has 23 years of teaching experience.
He served as Professor at Malnad College of Engineering Hassan. Presently working as
Professor & HOD at Govt. Engineering College Hassan, Karnataka. His area of interest
include Image Processing, Information security, Computer Network and Databases. He
has presented a number of research papers at National and International conferences. Delivered invited
talks and key note addresses at reputed institutions. He has served as chairperson and reviewer for
conferences and currently guiding three PhD scholars.