The document presents a rule-based approach for extracting aspects and their associated sentiments from user feedback data. It involves extracting aspects from sentences, associating sentiment terms with the aspects using SentiWordNet, and classifying sentiments according to linguistic rules. The approach uses part-of-speech tagging and WordNet to identify aspects and group related ones. Sentiment scores are normalized to account for intensifiers, negations, and ambiguity. The approach was tested on 65,000 responses from a hospital survey to extract and classify aspects and sentiments at the sentence level.
Business recommendation based on collaborative filtering and feature engineer... (IJECEIAES)
Business decisions for any service or product depend on people's sentiments. These sentiments and ratings are gathered from websites such as Twitter and Kaggle, where people express their mood towards events, services, and products. The text of a sentiment contains different linguistic features of the sentence, and a sentiment sentence also contains other features that play a vital role in deciding its polarity. With proper feature selection, one can extract better sentiments for decision making, and directed preprocessing feeds filtered input to any machine learning approach. Feature-based collaborative filtering can be used for better sentiment analysis. Better use of parts of speech (POS), followed by guided preprocessing and evaluation, minimizes the error in sentiment polarity, and hence better recommendations for business analytics can be attained.
GENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURES (ijnlc)
The document summarizes a technique for generating summaries using sentence compression and statistical measures. It first implements a graph-based technique to achieve sentence compression and information fusion. It then uses hand-crafted syntactic rules to prune compressed sentences. Finally, it uses probabilistic measures and word co-occurrence to obtain the summaries. The system can generate summaries at any user-defined compression rate.
IRJET - Public Opinion Analysis on Law Enforcement (IRJET Journal)
The document describes a sentiment analysis algorithm that classifies law enforcement tweets as positive or negative. It uses a lexicon-based approach with sentiment composition rules to determine the polarity of each tweet. The algorithm was evaluated on a dataset of manually annotated law enforcement tweets, achieving an F-score of 75.6% for positive and negative classification. Sentiment composition rules are applied to identify the polarity of noun phrases, verb phrases, and phrases combined with prepositions or the conjunction "but". The overall polarity of each tweet is determined by calculating a positivity to negativity ratio.
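The positivity-to-negativity ratio scheme described above can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the word lists and the decision rule are invented stand-ins for a real sentiment lexicon and the full set of composition rules.

```python
# Toy sentiment lexicon (illustrative only, not from the paper).
POSITIVE = {"trust", "protect", "help", "safe", "thank"}
NEGATIVE = {"abuse", "corrupt", "brutality", "unfair", "fail"}

def tweet_polarity(tweet: str) -> str:
    """Classify a tweet by its positivity-to-negativity ratio."""
    words = tweet.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if neg == 0:
        return "positive" if pos > 0 else "neutral"
    return "positive" if pos / neg > 1 else "negative"

print(tweet_polarity("thank the officers who protect and help us"))  # positive
print(tweet_polarity("another case of corrupt and unfair policing"))  # negative
```

A real lexicon-based system would additionally apply phrase-level composition rules (negation, "but"-clauses) before computing the ratio.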
IRJET - Semantic Similarity Between Sentences (IRJET Journal)
This document discusses approaches to measuring semantic similarity between sentences. It evaluates three approaches: cosine similarity, path-based measures using WordNet, and a feature-based approach. The feature-based approach generates similarity scores based on tagging parts of speech, lemmatization, and comparing nouns and verbs between sentences. It is concluded that the feature-based approach provides better semantic similarity scores compared to existing path-based and cosine similarity measures.
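The cosine-similarity baseline compared above can be sketched in a few lines over bag-of-words vectors; this is the standard formulation, shown here with a pure-Python implementation for illustration.

```python
import math
from collections import Counter

def cosine_similarity(s1: str, s2: str) -> float:
    """Cosine similarity between bag-of-words vectors of two sentences."""
    v1, v2 = Counter(s1.lower().split()), Counter(s2.lower().split())
    dot = sum(v1[w] * v2[w] for w in v1)
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

print(cosine_similarity("the cat sat on the mat", "the cat sat on the mat"))  # ~1.0
print(cosine_similarity("apple", "orange"))  # 0.0
```

Its weakness, which motivates the path-based and feature-based approaches, is that it sees no similarity at all between sentences that use different words for the same concept.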
IRJET - Sewage Treatment Potential of Coir Geotextiles in Conjunction with Act... (IRJET Journal)
The document discusses using machine learning models for extractive text summarization, which involves selecting important sentences from a document to provide an overview. Five machine learning models are explored: an artificial neural network, naive Bayes classifier, support vector machine, and two convolutional neural networks. The models are used to classify sentences as important or not important based on features like position, length, terms, and similarity to other sentences. The models' performance on this text highlighting task is compared.
A SURVEY PAPER ON EXTRACTION OF OPINION WORD AND OPINION TARGET FROM ONLINE R... (ijiert bestjournal)
Opinion mining is the mining of opinion targets and opinion words from online reviews. To find opinion relations among them, a partially supervised word alignment model is used. To find the confidence of each candidate, a graph-based co-ranking algorithm is used, and candidates whose confidence exceeds a threshold value are extracted as opinion words or opinion targets. Compared to the previous syntax-based method, this method gives correct results by eliminating parsing errors and can work on reviews written in informal language. Compared to the nearest-neighbor method, it gives more precise results and can find relations within a long span. The graph-based co-ranking algorithm also decreases error propagation by collectively extracting opinion targets and opinion words, and penalizing high-degree vertices reduces the probability of error generation and the effect of random walk.
Sentence level sentiment polarity calculation for customer reviews by conside... (eSAT Publishing House)
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
The growth of interpreting calls for a more objective and automatic measurement. We hold the basic idea that 'translating means translating meaning', so we can assess interpretation quality by comparing the meaning of the interpreting output with the source input. That is, a translation unit of a 'chunk', named a Frame (from frame semantics), and its components, named Frame Elements (FEs, from FrameNet), are proposed to explore their matching rate between target and source texts. A case study in this paper verifies the usability of semi-automatic graded semantic-scoring measurement for human simultaneous interpreting and shows how to use Frame and FE matches to score. Experimental results show that the semantic-scoring metrics have a significant correlation coefficient with human judgment.
Improvement of Text Summarization using Fuzzy Logic Based Method (IOSR Journals)
The document describes a method for improving text summarization using fuzzy logic. It proposes using fuzzy logic to determine the importance of sentences based on calculated feature scores. Eight features are used to score sentences, including title words, length, term frequency, position, and similarity. Sentences are then ranked based on their fuzzy logic-determined scores. The highest scoring sentences are extracted to create a summary. An evaluation of summaries generated using this fuzzy logic method found it performed better than other summarizers in accurately reflecting the content and order of human-generated reference summaries. The method could be expanded to multi-document summarization and automatic selection of fuzzy rules based on input type.
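A minimal fuzzy-scoring sketch of the idea above: feature scores in [0, 1] are mapped through membership functions, a small rule base fires, and the rule strengths are defuzzified into a sentence importance. The two features and the rules here are invented stand-ins; the paper uses eight features and its own rule base.

```python
def mu_high(x: float) -> float:
    """Membership of x in the fuzzy set 'high' on [0, 1] (ramp function)."""
    return max(0.0, min(1.0, 2 * x - 0.5))

def mu_low(x: float) -> float:
    """Membership in 'low', the complement of 'high'."""
    return 1.0 - mu_high(x)

def sentence_importance(title_score: float, position_score: float) -> float:
    # Rule 1: IF title-word score is high AND position score is high
    #         THEN importance is high.
    fire_high = min(mu_high(title_score), mu_high(position_score))
    # Rule 2: IF title-word score is low OR position score is low
    #         THEN importance is low.
    fire_low = max(mu_low(title_score), mu_low(position_score))
    # Defuzzify as a weighted average of the rule outputs (high=1.0, low=0.0).
    total = fire_high + fire_low
    return fire_high / total if total else 0.0

print(sentence_importance(0.9, 0.8))  # 1.0, a clearly important sentence
print(sentence_importance(0.2, 0.3))  # 0.0, a clearly unimportant one
```

Sentences would then be ranked by this score and the top ones extracted into the summary.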
This document discusses several approaches for embedding knowledge bases and relations into continuous vector spaces using neural networks. It first describes earlier models like semantic embedding which used simple scoring functions based on distance between entity embeddings. More advanced models like semantic matching energy and neural tensor networks learn separate relation embeddings and use them to calculate entity interactions. The document also discusses applications of these embeddings for tasks like link prediction, question answering and knowledge base expansion. It provides details of various models' scoring functions, training objectives and datasets used for evaluation.
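A distance-based scoring function in the spirit of the "earlier models" mentioned above can be sketched as a translation score of the form -||h + r - t|| (the TransE family). The embeddings below are toy hand-set values for illustration, not learned ones.

```python
def score(head, relation, tail):
    """Negative L2 distance ||h + r - t||: higher (closer to 0) = more plausible."""
    return -sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)) ** 0.5

# Toy 2-d embeddings, chosen so that paris + capital_of == france exactly.
paris   = [0.9, 0.1]
france  = [1.0, 1.0]
capital = [0.1, 0.9]  # relation embedding for capital_of

# Link prediction: a true triple should outscore a corrupted one.
print(score(paris, capital, france) > score(paris, capital, [0.0, 0.0]))  # True
```

The more advanced models in the document replace this simple additive interaction with learned relation matrices or tensors, but training still minimizes a margin loss between true and corrupted triples.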
This document summarizes research on sentiment analysis of English and Tamil tweets using path length similarity-based word sense disambiguation. It discusses translating Tamil tweets to English, finding semantic similarity using path length in a lexical database, and classifying sentiments using support vector machines. The paper also reviews related work on multilingual sentiment analysis and adaptation to new topics, and proposes a framework to determine sentiment polarity of bilingual tweets.
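Path-length similarity of the kind used above can be sketched as a shortest-path search over a taxonomy, with similarity 1 / (1 + path length). The tiny hypernym graph here is an invented stand-in for a real lexical database such as WordNet.

```python
from collections import deque

# Toy hypernym edges (illustrative only).
EDGES = {
    "dog": ["canine"], "canine": ["mammal"], "cat": ["feline"],
    "feline": ["mammal"], "mammal": ["animal"], "animal": [],
}

def path_length(a: str, b: str) -> int:
    """Shortest path between two terms, treating hypernym links as undirected."""
    graph = {w: set(ups) for w, ups in EDGES.items()}
    for w, ups in EDGES.items():
        for u in ups:
            graph[u].add(w)
    queue, seen = deque([(a, 0)]), {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return -1  # no path

def path_similarity(a: str, b: str) -> float:
    """1 / (1 + path length), a common path-based similarity form."""
    return 1.0 / (1.0 + path_length(a, b))

print(path_similarity("dog", "cat"))  # 0.2  (path length 4)
```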
This document discusses several approaches for embedding knowledge bases and relations into continuous vector spaces using neural networks. It first describes earlier models like semantic embedding and semantic matching energy which used single hidden layers. It then explains more complex models like neural tensor networks that use tensors to model relations. The document also discusses applications of these embeddings for tasks like link prediction, question answering, and knowledge base expansion. It provides details on model formulations, scoring functions, training objectives, and datasets used for evaluation.
IRJET - Analysis of Paraphrase Detection using NLP Techniques (IRJET Journal)
This document discusses analyzing paraphrase detection using natural language processing (NLP) techniques. It proposes applying a multi-head attention mechanism in a Siamese deep neural network to detect semantic similarity between texts and determine if they are paraphrases. The system would tokenize, stem, remove stopwords and part-of-speech tag input texts before applying the neural network. It evaluates the approach on datasets like SNLI and QQP and compares performance to existing methods.
The document proposes a method for automatically generating questions from sentences by performing sentence simplification. It involves two main steps - identifying potential answer phrases in the sentence, and generating simplified versions of the sentence focused around each answer phrase. A classifier is trained to identify answer phrases using syntactic and semantic features. Sentence simplification is done by pruning dependencies from the sentence's parse tree in a way that preserves the identified answer phrases, resulting in multiple simplified statements from the original sentence that can be transformed into questions. Evaluation shows the classifier achieves over 70% accuracy in identifying answer phrases.
CLUSTER PRIORITY BASED SENTENCE RANKING FOR EFFICIENT EXTRACTIVE TEXT SUMMARIES (ecij)
This paper presents a cluster-priority-ranking approach for extractive automatic text summarization that aggregates different cluster ranks for final sentence scoring. This approach does not require any learning, feature weighting or semantic processing. Combinations of surface-level features are used for individual cluster scoring. The proposed approach produces quality summaries without using the title feature. Experimental results on the DUC 2002 dataset prove the robustness of the proposed approach as compared to other surface-level approaches, using ROUGE evaluation metrics.
This document discusses natural language processing (NLP) from a developer's perspective. It provides an overview of common NLP tasks like spam detection, machine translation, question answering, and summarization. It then discusses some of the challenges in NLP like ambiguity and new forms of written language. The document goes on to explain probabilistic models and language models that are used to complete sentences and rearrange phrases based on probabilities. It also covers text processing techniques like tokenization, regular expressions, and more. Finally, it discusses spelling correction techniques using noisy channel models and confusion matrices.
This document presents an overview of spell checking techniques in natural language processing. It discusses how spell checkers work by scanning text, comparing words to a dictionary, and using language-dependent algorithms. Two categories of spelling errors are described: real-word errors involving correctly spelled words and non-word errors containing no dictionary words. Techniques for error detection include dictionary lookup and n-gram comparisons using the Jaccard coefficient. The Levenshtein distance and Jaccard coefficient algorithms are then explained and shown to provide suggestions by calculating the edit distance between source and target words. The presentation concludes that these algorithms filter dictionary words and provide accurate suggestions to correct spelling mistakes in text.
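The Levenshtein-distance suggestion step described above can be sketched with the standard dynamic-programming table; the small dictionary is an illustrative stand-in for a real one.

```python
def levenshtein(source: str, target: str) -> int:
    """Edit distance via the standard two-row dynamic-programming table."""
    prev = list(range(len(target) + 1))
    for i, sc in enumerate(source, 1):
        curr = [i]
        for j, tc in enumerate(target, 1):
            cost = 0 if sc == tc else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

# Suggestion = dictionary word with the smallest edit distance to the typo.
DICTIONARY = ["receive", "believe", "deceive"]
print(min(DICTIONARY, key=lambda w: levenshtein("recieve", w)))  # receive
```

The Jaccard-coefficient step mentioned above plays a complementary role: it cheaply filters the dictionary by character n-gram overlap before the more expensive edit-distance ranking.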
A framework for emotion mining from text in online social networks (final) (es712)
This document proposes a framework for characterizing emotional interactions in social networks to distinguish friends from acquaintances. It collects posts and comments from social networks, develops lexicons to analyze informal language, generates features to assess text subjectivity, trains a model to classify text subjectivity, and uses this to train an SVM model that predicts relationships with 87% accuracy.
Conceptual similarity measurement algorithm for domain specific ontology (Zac Darcy)
This paper presents a similarity measurement algorithm for domain-specific terms collected in an ontology-based data integration system. The algorithm can be used in ontology mapping and in the query service of an ontology-based data integration system; in this paper, we focus on the web query service to apply the proposed algorithm. Concept similarity is important for the web query service because the words in a user's input query do not wholly match the concepts in the ontology. So, we need to extract the possible concepts that match or are related to the input words with the help of the machine-readable dictionary WordNet. Sometimes, we use generated mapping rules in the query generation procedure for words whose similarity cannot be confirmed by WordNet. We demonstrate the effect of this algorithm with two-degree semantic results of web mining by generating the concept results obtained from the input query.
An approach to word sense disambiguation combining modified lesk and bag of w... (csandit)
In this paper, we propose a technique to find the meaning of words using Word Sense Disambiguation with supervised and unsupervised learning. Limited information in the learning set is the main flaw of the supervised approach. Our proposed approach aims to overcome this limitation using a learning set that is enriched dynamically as new data is maintained. We introduce a mixed methodology combining a "Modified Lesk" approach and "Bag-of-Words" with bags enriched through learning methods.
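The gloss-overlap core of the Lesk family can be sketched as follows: pick the sense whose dictionary gloss shares the most words with the context's bag of words. The two-sense inventory for "bank" is a toy stand-in for a real dictionary; the paper's modification additionally enriches the bags over time.

```python
# Toy sense inventory (illustrative glosses, not from a real dictionary).
SENSES = {
    "bank/finance": "an institution that accepts deposits and lends money",
    "bank/river": "the sloping land beside a body of water",
}

def lesk(context: str) -> str:
    """Return the sense whose gloss overlaps most with the context words."""
    bag = set(context.lower().split())
    def overlap(sense: str) -> int:
        return len(bag & set(SENSES[sense].split()))
    return max(SENSES, key=overlap)

print(lesk("she sat on the land beside the water"))  # bank/river
print(lesk("deposits and money"))                    # bank/finance
```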
The document discusses two neural network models for reading comprehension tasks: the Attentive Reader model proposed by Herman et al. in 2015 and the Stanford Reader model proposed by Chen et al. in 2016. The author implemented a two-layer attention model inspired by these previous models that achieves a 1.5% higher accuracy on reading comprehension tasks compared to the Stanford Reader.
White paper - Job skills extraction with LSTM and Word embeddings - Nikita Sh... (Nikita Sharma)
Applying a deep learning LSTM network and word embeddings to job-posting and resume-based text corpora for job-skills extraction.
The paper proposes the application of a Long Short-Term Memory (LSTM) deep learning network combined with word embeddings to extract relevant skills from text documents. The approach also aims at identifying new and emerging skills that have not been seen before.
Neural Network Based Context Sensitive Sentiment Analysis (Editor IJCATR)
Social media communication is evolving rapidly. The use of social networking sites has increased in recent years, providing a platform for people all over the world to connect and share their interests. The conversations and posts available on social media are unstructured in nature, so sentiment analysis on this platform is challenging. These analyses are mostly performed with machine learning techniques, which are less accurate than neural network methodologies. This paper is based on sentiment classification using competitive-layer neural networks and classifies the polarity of a given text: whether the expressed opinion is positive, negative or neutral. It determines the overall topic of the given text. Context-independent sentences and implicit meaning in the text are also considered in polarity classification.
Business intelligence analytics using sentiment analysis - a survey (IJECEIAES)
Sentiment analysis (SA) is the study and analysis of sentiments, appraisals and impressions by people about entities, persons, happenings, topics and services. SA uses text analysis techniques and natural language processing methods to locate and extract information from big data. As most people are networked through social websites, they express their sentiments through these websites. These sentiments prove fruitful to individuals, businesses and governments for making decisions. The impressions posted on different available sources are used by organizations to gauge the market mood about the services they provide. Analyzing the huge volume of moods expressed with different features and styles has raised challenges for users. This paper focuses on understanding the fundamentals of sentiment analysis and the techniques used for sentiment extraction and analysis. These techniques are then compared for accuracy, advantages and limitations. Based on the accuracy expected of an approach, a suitable technique may be chosen.
Opinion Mining and Improvised Algorithm for Feature Reduction in Sentiment An... (IJERA Editor)
Nowadays organisations use the power of the web to analyse customer reviews of their products. Organisations cannot trust star-based reviews because they can be faked by robots, which is why textual reviews are preferable. Opinion mining is used to find the approximate sentiment of a review. Sentiment analysis, a part of opinion mining, helps an organisation obtain valuable feedback on a product by extracting the polarity of reviews. The reviews of a product may be used to improve the organisation's productivity, as it can improve the product's features based on them. It provides us with ways to analyse a given review. In our review paper, we have emphasised content-based analysis of the review rather than deciding contextual polarity by its topic. We arrived at our proposed algorithm by referring to Bo Pang and Lillian Lee's [2] paper and Tirath Prasad Sahu and Sanjeev Ahuja's [3] paper.
With the rapidly increasing growth of internet and web usage, it has become essential to use a powerful tool capable of analyzing and ranking all the reviews and opinions available on the web. In this paper we propose a new and effective approach that uses a powerful sentiment analysis procedure based on ontological adjustment and arrangement. This study also aims to use POS tag order to obtain detailed observations for any review or opinion; it also helps in identifying all positive/negative sentiments present and suggests a proper sentence inclination. For this we have used reviews available on the internet regarding Nokia, and the Stanford parser for the purpose of POS tagging.
TOWARDS MAKING SENSE OF ONLINE REVIEWS BASED ON STATEMENT EXTRACTION (cscpconf)
Product reviews are a valuable resource for information seeking and decision making. Products such as smartphones are discussed based on their aspects, e.g. battery life, screen quality, etc. Knowing user statements about aspects is relevant, as it will guide other users in their buying process. In this paper, we automatically extract user statements about aspects for a given product. Our extraction method is based on dependency parse information of individual reviews. The parse information is used to learn patterns, which are then used to determine the user statements for a given aspect. Our results show that our methods are able to extract potentially useful statements for given aspects.
Implementation of Semantic Analysis Using Domain Ontology (IOSR Journals)
The document describes a semantic analysis system that analyzes feedback from an organization using domain ontology. The system first collects feedback data from students in an unstructured format. It then preprocesses the feedback using part-of-speech tagging to extract meaningful information. The system architecture includes preprocessing the feedback, matching entities in the feedback to an organization ontology using Jaccard similarity, and generating a summarized analysis of the feedback based on the ontology entities. The goal is to group related words and phrases expressed by students under the same entity to produce a meaningful summary for the organization.
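The Jaccard-similarity entity-matching step described above can be sketched as follows. The ontology entities are invented examples standing in for the organization ontology; the Jaccard formula itself is the standard |A ∩ B| / |A ∪ B| over word sets.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| over word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

# Toy ontology entities (illustrative only).
ONTOLOGY_ENTITIES = ["teaching staff", "library facilities", "canteen food"]

def match_entity(feedback_phrase: str) -> str:
    """Map a feedback phrase to its closest ontology entity."""
    return max(ONTOLOGY_ENTITIES, key=lambda e: jaccard(feedback_phrase, e))

print(match_entity("the library facilities are great"))  # library facilities
```

Grouping every matched phrase under its winning entity is what produces the summarized, per-entity analysis described above.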
One fundamental problem in sentiment analysis is categorization of sentiment polarity. Given a piece of written text, the problem is to categorize the text into one specific sentiment polarity, positive or negative (or neutral). Based on the scope of the text, there are three distinctions of sentiment polarity categorization, namely the document level, the sentence level, and the entity and aspect level. Consider a review "I like multimedia features but the battery life sucks." This sentence has a mixed emotion. The emotion regarding multimedia is positive whereas that regarding battery life is negative. Hence, it is required to extract only those opinions relevant to a particular feature (like battery life or multimedia) and classify them, instead of taking the complete sentence and the overall sentiment. In this paper, we present a novel approach to identify pattern specific expressions of opinion in text.
Improvement of Text Summarization using Fuzzy Logic Based MethodIOSR Journals
The document describes a method for improving text summarization using fuzzy logic. It proposes using fuzzy logic to determine the importance of sentences based on calculated feature scores. Eight features are used to score sentences, including title words, length, term frequency, position, and similarity. Sentences are then ranked based on their fuzzy logic-determined scores. The highest scoring sentences are extracted to create a summary. An evaluation of summaries generated using this fuzzy logic method found it performed better than other summarizers in accurately reflecting the content and order of human-generated reference summaries. The method could be expanded to multi-document summarization and automatic selection of fuzzy rules based on input type.
This document discusses several approaches for embedding knowledge bases and relations into continuous vector spaces using neural networks. It first describes earlier models like semantic embedding which used simple scoring functions based on distance between entity embeddings. More advanced models like semantic matching energy and neural tensor networks learn separate relation embeddings and use them to calculate entity interactions. The document also discusses applications of these embeddings for tasks like link prediction, question answering and knowledge base expansion. It provides details of various models' scoring functions, training objectives and datasets used for evaluation.
This document summarizes research on sentiment analysis of English and Tamil tweets using path length similarity-based word sense disambiguation. It discusses translating Tamil tweets to English, finding semantic similarity using path length in a lexical database, and classifying sentiments using support vector machines. The paper also reviews related work on multilingual sentiment analysis and adaptation to new topics, and proposes a framework to determine sentiment polarity of bilingual tweets.
This document discusses several approaches for embedding knowledge bases and relations into continuous vector spaces using neural networks. It first describes earlier models like semantic embedding and semantic matching energy which used single hidden layers. It then explains more complex models like neural tensor networks that use tensors to model relations. The document also discusses applications of these embeddings for tasks like link prediction, question answering, and knowledge base expansion. It provides details on model formulations, scoring functions, training objectives, and datasets used for evaluation.
IRJET - Analysis of Paraphrase Detection using NLP TechniquesIRJET Journal
This document discusses analyzing paraphrase detection using natural language processing (NLP) techniques. It proposes applying a multi-head attention mechanism in a Siamese deep neural network to detect semantic similarity between texts and determine if they are paraphrases. The system would tokenize, stem, remove stopwords and part-of-speech tag input texts before applying the neural network. It evaluates the approach on datasets like SNLI and QQP and compares performance to existing methods.
The document proposes a method for automatically generating questions from sentences by performing sentence simplification. It involves two main steps - identifying potential answer phrases in the sentence, and generating simplified versions of the sentence focused around each answer phrase. A classifier is trained to identify answer phrases using syntactic and semantic features. Sentence simplification is done by pruning dependencies from the sentence's parse tree in a way that preserves the identified answer phrases, resulting in multiple simplified statements from the original sentence that can be transformed into questions. Evaluation shows the classifier achieves over 70% accuracy in identifying answer phrases.
CLUSTER PRIORITY BASED SENTENCE RANKING FOR EFFICIENT EXTRACTIVE TEXT SUMMARIESecij
This paper presents a cluster-priority-ranking-based approach to extractive automatic text summarization that aggregates different cluster ranks for final sentence scoring. The approach requires no learning, feature weighting or semantic processing. Combinations of surface-level features are used for individual cluster scoring. The proposed approach produces quality summaries without using the title feature. Experimental results on the DUC 2002 dataset prove the robustness of the proposed approach compared to other surface-level approaches using ROUGE evaluation metrics.
This document discusses natural language processing (NLP) from a developer's perspective. It provides an overview of common NLP tasks like spam detection, machine translation, question answering, and summarization. It then discusses some of the challenges in NLP like ambiguity and new forms of written language. The document goes on to explain probabilistic models and language models that are used to complete sentences and rearrange phrases based on probabilities. It also covers text processing techniques like tokenization, regular expressions, and more. Finally, it discusses spelling correction techniques using noisy channel models and confusion matrices.
This document presents an overview of spell checking techniques in natural language processing. It discusses how spell checkers work by scanning text, comparing words to a dictionary, and using language-dependent algorithms. Two categories of spelling errors are described: real-word errors involving correctly spelled words and non-word errors containing no dictionary words. Techniques for error detection include dictionary lookup and n-gram comparisons using the Jaccard coefficient. The Levenshtein distance and Jaccard coefficient algorithms are then explained and shown to provide suggestions by calculating the edit distance between source and target words. The presentation concludes that these algorithms filter dictionary words and provide accurate suggestions to correct spelling mistakes in text.
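The Levenshtein-distance suggestion step described in that presentation can be sketched as follows; the `suggest` helper and its `max_distance` cutoff are illustrative assumptions rather than details from the slides:

```python
def levenshtein(source: str, target: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions
    needed to turn source into target, via dynamic programming."""
    prev = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        curr = [i]
        for j, t in enumerate(target, start=1):
            cost = 0 if s == t else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def suggest(word, dictionary, max_distance=2):
    """Filter dictionary words by edit distance and rank the closest first."""
    scored = sorted((levenshtein(word, w), w) for w in dictionary)
    return [w for d, w in scored if d <= max_distance]
```

For example, `suggest("speling", ["apple", "spoiling", "spelling"])` ranks "spelling" (distance 1) ahead of "spoiling" (distance 2) and drops "apple" entirely.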
A framework for emotion mining from text in online social networks(final)es712
This document proposes a framework for characterizing emotional interactions in social networks to distinguish friends from acquaintances. It collects posts and comments from social networks, develops lexicons to analyze informal language, generates features to assess text subjectivity, trains a model to classify text subjectivity, and uses this to train an SVM model that predicts relationships with 87% accuracy.
Conceptual similarity measurement algorithm for domain specific ontology[Zac Darcy
This paper presents a similarity measurement algorithm for domain-specific terms collected in an ontology-based data integration system. The algorithm can be used in the ontology mapping and query service of such a system; in this paper, we focus on the web query service as the application of the proposed algorithm. Concept similarity is important for the web query service because the words in a user's input query are not wholly the same as the concepts in the ontology, so we need to extract the possible concepts that match or relate to the input words with the help of the machine-readable dictionary WordNet. For some words whose similarity cannot be confirmed by WordNet, we use generated mapping rules in the query generation procedure. We prove the effect of this algorithm with the two-degree semantic result of web mining by generating the concept results obtained from the input query.
An approach to word sense disambiguation combining modified lesk and bag of w...csandit
In this paper, we propose a technique to find the meaning of words using Word Sense Disambiguation with supervised and unsupervised learning. Limited information is the main flaw of the supervised approach. Our proposed approach focuses on overcoming that limitation using a learning set that is enriched dynamically with new data. We introduce a mixed methodology combining a “Modified Lesk” approach and “Bag-of-Words” with bags enriched through learning methods.
The document discusses two neural network models for reading comprehension tasks: the Attentive Reader model proposed by Herman et al. in 2015 and the Stanford Reader model proposed by Chen et al. in 2016. The author implemented a two-layer attention model inspired by these previous models that achieves a 1.5% higher accuracy on reading comprehension tasks compared to the Stanford Reader.
White paper - Job skills extraction with LSTM and Word embeddings - Nikita Sh...Nikita Sharma
Applying a deep learning LSTM network and word embeddings to job-posting and resume text corpora for job-skills extraction.
The paper proposes the application of a Long Short-Term Memory (LSTM) deep learning network combined with word embeddings to extract relevant skills from text documents. The approach also aims at identifying new and emerging skills that haven't been seen before.
Neural Network Based Context Sensitive Sentiment AnalysisEditor IJCATR
Social media communication is evolving rapidly. Social networking sites have grown quickly in recent years, providing a platform to connect people all over the world and let them share their interests. The conversations and posts available on social media are unstructured in nature, so sentiment analysis is a challenging task on this platform. These analyses are mostly performed with machine learning techniques, which are less accurate than neural network methodologies. This paper is based on sentiment classification using competitive-layer neural networks and classifies the polarity of a given text: whether the expressed opinion is positive, negative or neutral. It determines the overall topic of the given text. Context-independent sentences and implicit meaning in the text are also considered in polarity classification.
Business intelligence analytics using sentiment analysis-a surveyIJECEIAES
Sentiment analysis (SA) is the study and analysis of sentiments, appraisals and impressions by people about entities, persons, happenings, topics and services. SA uses text analysis techniques and natural language processing methods to locate and extract information from big data. As most people are networked through social websites, they express their sentiments through these websites. These sentiments prove fruitful to individuals, businesses and governments for making decisions. The impressions posted on different available sources are used by organizations to gauge the market mood about the services they provide. Analyzing the huge volume of moods expressed with different features and styles has raised a challenge for users. This paper focuses on understanding the fundamentals of sentiment analysis and the techniques used for sentiment extraction and analysis. These techniques are then compared for accuracy, advantages and limitations. Based on the accuracy expected of an approach, a suitable technique may be chosen.
Opinion Mining and Improvised Algorithm for Feature Reduction in Sentiment An...IJERA Editor
Nowadays organisations use the power of the web to analyse customer reviews of their products. Organisations cannot fully trust star-based reviews because they can be faked by robots, which is why textual reviews are preferable. Opinion mining is used to find the approximate sentiment of a review. Sentiment analysis is a part of opinion mining which helps an organisation get valuable feedback on a product by extracting the polarity of reviews. The reviews of a product may be used to improve the productivity of the organisation, as it can improve the product's features based on them. It provides us ways to analyse a given review. In our review paper, we have emphasised content-based analysis of the review rather than deciding the contextual polarity by its topic. We arrived at our proposed algorithm by referring to Bo Pang and Lillian Lee's paper [2] and Tirath Prasad Sahu and Sanjeev Ahuja's paper [3].
With the rapidly increasing growth of internet and web usage, it has become essential to use a powerful tool capable of analyzing and ranking all the reviews/opinions available on the web. In this paper we propose a new and effective approach that uses a powerful sentiment analysis procedure based on ontological adjustment and arrangement. This study also aims to use POS tag order to obtain detailed observations for any review or opinion; it also helps in identifying all positive/negative sentiments present and suggests a proper sentence inclination. For this we have used reviews available on the internet regarding Nokia, and the Stanford parser for the purpose of POS tagging.
TOWARDS MAKING SENSE OF ONLINE REVIEWS BASED ON STATEMENT EXTRACTIONcscpconf
Product reviews are valuable resource for information seeking and decision making purposes. Products such as smart phone are discussed based on their aspects e.g. battery life, screen quality, etc. Knowing user statements about aspects is relevant as it will guide other users in their buying process. In this paper, we automatically extract user statements about aspects for a given product. Our extraction method is based on dependency parse information of individual reviews. The parse information is used to learn patterns and use them to determine the user statements for a given aspect. Our results show that our methods are able to extract potentially
useful statements for given aspects.
Implementation of Semantic Analysis Using Domain OntologyIOSR Journals
The document describes a semantic analysis system that analyzes feedback from an organization using domain ontology. The system first collects feedback data from students in an unstructured format. It then preprocesses the feedback using part-of-speech tagging to extract meaningful information. The system architecture includes preprocessing the feedback, matching entities in the feedback to an organization ontology using Jaccard similarity, and generating a summarized analysis of the feedback based on the ontology entities. The goal is to group related words and phrases expressed by students under the same entity to produce a meaningful summary for the organization.
One fundamental problem in sentiment analysis is categorization of sentiment polarity. Given a piece of written text, the problem is to categorize the text into one specific sentiment polarity, positive or negative (or neutral). Based on the scope of the text, there are three distinctions of sentiment polarity categorization, namely the document level, the sentence level, and the entity and aspect level. Consider a review "I like multimedia features but the battery life sucks." This sentence has a mixed emotion. The emotion regarding multimedia is positive whereas that regarding battery life is negative. Hence, it is required to extract only those opinions relevant to a particular feature (like battery life or multimedia) and classify them, instead of taking the complete sentence and the overall sentiment. In this paper, we present a novel approach to identify pattern specific expressions of opinion in text.
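The clause-level intuition behind the example review can be illustrated with a toy split-and-score routine; the contrast-word list and the tiny lexicon below are hypothetical stand-ins, not the paper's actual method:

```python
import re

# Hypothetical mini-lexicon; a real system would use a resource such as
# SentiWordNet rather than this hand-built map.
LEXICON = {"like": 1, "love": 1, "sucks": -1, "bad": -1}

def clause_polarities(sentence):
    """Split on contrast conjunctions so each clause carries one aspect's
    opinion, then score each clause separately."""
    clauses = re.split(r"\b(?:but|although|whereas)\b", sentence.lower())
    result = []
    for clause in clauses:
        score = sum(LEXICON.get(w, 0) for w in re.findall(r"[a-z]+", clause))
        polarity = ("positive" if score > 0
                    else "negative" if score < 0 else "neutral")
        result.append((clause.strip(), polarity))
    return result
```

On the example review this yields one positive clause (multimedia) and one negative clause (battery life) instead of a single mixed sentence-level label.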
An Approach for Big Data to Evolve the Auspicious Information from Cross-DomainsIJECEIAES
Sentiment analysis is the pre-eminent technology to extract relevant information from a data domain. In this paper a cross-domain sentiment classification approach, Cross_BOMEST, is proposed. The proposed approach extracts +ve words using the existing BOMEST technique; with the help of MS Word Interop, Cross_BOMEST determines +ve words and replaces all their synonyms to escalate the polarity, blends two different domains, and detects all the self-sufficient words. The proposed algorithm is executed on Amazon datasets where two different domains are trained to analyze sentiments of the reviews of the remaining domain. The proposed approach contributes propitious results in cross-domain analysis and an accuracy of 92% is obtained. Precision and recall of BOMEST are improved by 16% and 7% respectively by Cross_BOMEST.
IRJET - Response Analysis of Educational VideosIRJET Journal
This document summarizes a research paper that analyzes student feedback on educational videos through sentiment analysis. It proposes a system to collect student comments, preprocess the data, identify sentiment and emotions, compute student satisfaction and dissatisfaction, and visualize the results. The system uses machine learning techniques like term frequency-inverse document frequency and random forest classification. It achieved 62.5% accuracy in classifying sentiment polarity in student comments. The analysis of student responses can help teachers better understand student interest and identify areas for improvement.
IRJET- Sentimental Analysis for Students’ Feedback using Machine Learning App...IRJET Journal
This document discusses using machine learning approaches to perform sentiment analysis on students' feedback. Specifically, it proposes using a random forest classifier to analyze descriptive feedback collected through an online student portal and classify it as having positive, negative, or neutral sentiment. The proposed system would collect real-time feedback, preprocess it by removing stop words and tagging parts of speech, extract sentiment-related features, and use the trained random forest model to classify unseen feedback with 90% accuracy. The goal is to more accurately analyze both objective and descriptive feedback to evaluate teacher performance.
1) The document proposes a novel algorithm that rates sentiment in comments using an Independent Term Matching scheme while accounting for negations. It modifies the sentiment score of words affected by negations.
2) It employs association rule discovery to find common word groups and trains a Naive Bayes classifier. This allows both sentiment rating and classification of diverse comments.
3) The algorithm enhances existing techniques by incorporating negation handling in sentiment scoring and using key words from association rules to improve Naive Bayes classification.
Camera ready sentiment analysis : quantification of real time brand advocacy ...Absolutdata Analytics
Quantification of Real Time Brand Advocacy for Customer Journey using Sentiment Analysis.
This was Presented in Rapid Miner Community Meeting & Conference, Portugal held on Aug 27-30, 2013
For more details, please visit: www.absolutdata.com
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTEijnlc
We propose an automatic classification system of movie genres based on different features from their textual synopsis. Our system is first trained on thousands of movie synopsis from online open databases, by learning relationships between textual signatures and movie genres. Then it is tested on other movie synopsis, and its results are compared to the true genres obtained from the Wikipedia and the Open Movie Database
(OMDB) databases. The results show that our algorithm achieves a classification accuracy exceeding 75%.
Improving Sentiment Analysis of Short Informal Indonesian Product Reviews usi...TELKOMNIKA JOURNAL
This document summarizes a research paper that proposed improving sentiment analysis of short informal Indonesian product reviews using synonym-based feature expansion. The paper developed an automatic sentiment analysis system using Naive Bayes classification and feature expansion. It first preprocessed texts through normalization, then used an API to find synonyms and expand text features. Experiments showed the proposed method improved sentiment analysis accuracy of short reviews to 98%, and that feature expansion helped more with small training datasets. The best performance was with 400 training examples using expansion.
IRJET- Short-Text Semantic Similarity using Glove Word EmbeddingIRJET Journal
The document describes a study that uses GloVe word embeddings to measure semantic similarity between short texts. GloVe is an unsupervised learning algorithm for obtaining vector representations of words. The study trains GloVe word embeddings on a large corpus, then uses the embeddings to encode short texts and calculate their semantic similarity, comparing the accuracy to methods that use Word2Vec embeddings. It aims to show that GloVe embeddings may provide better performance for short text semantic similarity tasks.
IRJET- Text Highlighting – A Machine Learning ApproachIRJET Journal
This document discusses using machine learning models for extractive text summarization. It explores five models - artificial neural network, naive Bayes classifier, support vector machine, and two convolutional neural networks. It also proposes a new method to generate extractive summarization datasets from human summaries. The models are trained and tested on a dataset generated from CNN/Daily Mail articles, with the convolutional neural network approach achieving the highest accuracy.
The document describes a research project on sentiment analysis of tweets. It involves collecting twitter data, preprocessing the data by removing stopwords and replacing emoticons/sentiment words with tags. Features are then extracted and normalized, followed by feature reduction. The data is clustered into positive and negative classes using K-means clustering and Differential Evolution algorithm, and their accuracies are compared, with Differential Evolution found to perform better. Future work proposed includes applying additional clustering techniques and comparing with supervised learning methods.
This document summarizes a research paper on sentiment analysis of customer review datasets. It discusses how sentiment analysis uses natural language processing to identify subjective information in text sources. Different levels of sentiment analysis are described, including document, sentence, and aspect levels. Methods for sentiment classification like using subjective dictionaries and machine learning are outlined. Challenges in sentiment analysis like interpreting words that can have both positive and negative meanings are also discussed.
The document summarizes an aspect-based sentiment analysis project that aims to identify aspects of entities and the sentiment expressed for each aspect from reviews. The project involves extracting aspects, detecting the category of each aspect, analyzing the polarity of each aspect, and summarizing the overall polarity for each category based on the individual aspect polarities. Various natural language processing libraries and machine learning algorithms like conditional random fields and support vector machines were used to implement the different parts of the project.
The document summarizes an aspect-based sentiment analysis project that identifies aspects of entities and the sentiment expressed for each aspect in reviews. It discusses the main sub-problems of aspect extraction, category detection, polarity analysis, and category polarity. It then provides details on the algorithms and libraries used to implement solutions for each sub-problem, including using conditional random fields for aspect extraction, an SVM model for category detection, and dependency parsing with a graph approach for polarity analysis of multiple aspects.
Sentiment Analysis: A comparative study of Deep Learning and Machine LearningIRJET Journal
This document compares sentiment analysis techniques using deep learning and machine learning. It summarizes previous work using various machine learning algorithms and deep learning methods for sentiment analysis. The document then outlines the approach taken in this study, which is to determine the best sentiment analysis results using either machine learning or deep learning techniques. It describes preprocessing the Rotten Tomatoes movie review dataset and creating text matrices before selecting models for classification. The goal is to get a generalized understanding of how sentiment analysis can be performed and which practices yield optimal results.
This document proposes a model to estimate overall sentiment score by applying rules of inference from discrete mathematics. It discusses sentiment analysis and related work using techniques like supervised/unsupervised learning. The problem is identifying sentiment components and restricting patterns for feature identification. Most approaches focus on nouns/adjectives but not verbs/adverbs. The model preprocesses product review datasets using NLTK for stemming, parsing and tokenizing. It builds a lexicon dictionary of positive and negative words. The Lexical Pattern Sentiment Analysis algorithm uses both lexicon and pattern mining - it selects sentence patterns, checks for positive/negative words in the lexicon, and calculates an overall sentiment score.
An Approach to Extract Aspects and Sentence Level Sentiment From User Feedback
Using a Rule Based Approach in conjunction with SentiWordNet and POS-Tagging
Aaruna G {aaruna.g@imaginea.com}, Ramachandra Kousik A.S {kousik.r@imaginea.com}
Imaginea Technologies, a BU of Pramati Technologies.
Abstract
The most integral part of our work is to extract Aspects from User Feedback and associate Sentiment
and Opinion terms to them. The dataset we have at our disposal is a set of feedback
documents for various departments in a hospital, in XML format, with comments represented in
tags. It contains about 65,000 responses to a survey taken in the hospital. Every response or comment is
treated as a sentence or a set of them. We perform a sentence level aspect and sentiment extraction
and we attempt to understand and mine User Feedback data to gather aspects from it. Further to it,
we extract the sentiment mentions and evaluate them contextually for sentiment and associate those
sentiment mentions with the corresponding aspects. To start with, we perform a clean up on the User
Feedback data, followed by aspect extraction and sentiment polarity calculation, with the help of POS
tagging and SentiWordNet[1] filters respectively. The obtained sentiments are further classified
according to a set of Linguistic rules and the scores are normalized to nullify any noise that might be
present. We lay emphasis on using a rule based approach; rules being Linguistic rules that correspond
to the positioning of various parts-of-speech words in a sentence.
Keywords : Aspect Mining, Opinion Mining, Sentiment Analysis, Polarity Classification.
1. Introduction:
The primary focus area of our work is on Aspect Extraction and grouping. Aspects form an important
part of any classification and they essentially also define the context in which a certain opinion or a
response is expressed. We perform grouping on aspects in order to achieve closeness i.e. to put aspects
that are linguistically related to each other together in a common bucket. Our work also focuses on
extracting relevant sentiment for an aspect. We perform this analysis at the sentence level using a rule
based approach, where the rules are English language rules.
Recently there has been a change of attitude in the field from plainly extracting positive or negative
opinions to introducing opinion weights and classifying opinions as neutral. The field is therefore no
longer focused on the binary classification of positive versus negative, as referred to in [3] by Turney.
Corpus-based methods work by utilizing dictionary-based approaches, and these approaches depend on
existing lexicographical resources (such as WordNet) to provide semantic data regarding individual
senses and words [4]. We lay emphasis on language rules rather than a mere look-up from sources like
WordNet or SentiWordNet because we base our work on the fact that the meaning of a word is
relevant only in context, and the presence of other words alongside a particular word in an
expression changes the intensity of the whole expression, for example an adjective or an adverb
intensifying or dampening a noun or a verb. This is also the distinguishing factor between
text mining and Information Retrieval (IR), where the latter is only information access while the
former involves pruning the complete text data, as also argued by Hearst in [2]. In contrast to
other works, ours presents a sentence-level lexical/dictionary knowledge-base method to tackle the
domain adaptability problem for different types of data, as shown by Khan et al. in [5].
The dataset used for our work is a set of documents that are responses to a survey conducted in a
hospital, and these responses have been categorized by department. Overall, we have
conducted our study on about 65,000 responses. We did not choose to do any subjectivity analysis
because on a feedback form or in a 'Post your comment' section the number of objective expressions
is negligible and would not constitute any significant noise. One of the main reasons we have
limited ourselves to sentence-level classification is that an aspect appearing in a response
predominantly contributes to that response alone and not to the whole document. Hence there is no
need for document-level (paragraph-level) analysis; should it be done, a simple
aggregation of all the sentence-level results would be quite accurate. We also wanted to remain as
domain independent as possible, which could only be achieved by sentence-level classification. We
analyze the performance of our approach by comparing our results (shown in Section 4) against the
manually annotated results.
Section 2 presents the approach we followed to solve the problem of aspect extraction and sentiment
association and classification. Section 3 will detail our score calculation metrics along with the pseudo-
code followed by performance analysis in Section 4 and conclusion and future work in Section 5, with
references in Section 6.
2. Our Approach
Each response is about an aspect or a group of aspects and these aspects form the theme of a comment
or a response. As the comments or responses were obtained through a survey, it is intuitive that a
certain aspect appears in a response only when the user who wrote that response deems
that aspect relevant. And every aspect that appears in a sentence has a part to play in
the overall sentence's expression quotient. Keeping this in mind, certain metrics like TF-IDF were
ruled out because a) a response is not as elaborate as a document, and b) when two aspects appear in
a response, we treat both as equally important to that response and completely rule out the
concept of relative importance (where TF-IDF would have come in handy), as a response is typically not
more than a couple of sentences long and the chances of an aspect appearing multiple times in such a
response are negligible. We lay emphasis on sentence-level classification to be able to increase the
efficiency of the model and to keep it as generic as possible.
Aspects that are one hop neighbors from each other in a dictionary more or less mean the same thing.
We use WordNet[8] to gather one hop neighbors for an aspect. The co-occurrence of aspects in a
linguistic sense implies that two aspects could mean the same thing but in no way suggests that they
should always co-occur in that context. We only use these to group aspects together which will help us
in filtering out the redundant aspects while we are focused on calculating top aspects.
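This grouping can be sketched with a tiny hand-built synonym map standing in for the WordNet one-hop lookup; the map entries below are hypothetical examples:

```python
# Hypothetical one-hop neighbour map; WordNet [8] supplies these
# neighbours in the actual system.
ONE_HOP = {
    "doctor": {"physician"},
    "physician": {"doctor"},
    "staff": {"personnel"},
    "personnel": {"staff"},
}

def group_aspects(aspects):
    """Put aspects that are one-hop neighbours of each other into a common
    bucket, keeping the first-seen aspect as the bucket label."""
    buckets = {}    # bucket label -> set of grouped aspects
    label_of = {}   # aspect -> bucket label
    for aspect in aspects:
        neighbours = ONE_HOP.get(aspect, set())
        # reuse an existing bucket if any neighbour was already grouped
        label = next((label_of[n] for n in neighbours if n in label_of),
                     label_of.get(aspect, aspect))
        buckets.setdefault(label, set()).add(aspect)
        label_of[aspect] = label
    return buckets
```

Grouping "doctor" and "physician" into one bucket is exactly the redundancy filtering the text describes: only the bucket label survives when top aspects are computed.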
We follow a four-step process to accomplish our objective.
1) Extracting the entities from the corpus/text-base that are potential sentiment holders and
objects of potential sentiment, which we term Aspects.
2) Filtering out noise by separating stop words and other irrelevant terms in a user comment
using the part-of-speech tagger of [6].
3) Associating respective sentiment terms to the corresponding Aspect.
4) Assigning normalized sentiment scores to a feature/entity to keep the output unaffected from
changes in the algorithm or the weights we assign through our rules.
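The four steps can be sketched end-to-end as follows; the stop-word list, toy POS lookup, and sentiment scores are hypothetical stand-ins for the real resources (the POS tagger of [6] and SentiWordNet [1]):

```python
import re

STOP_WORDS = {"the", "a", "is", "was", "hmm", "yea"}   # step-2 noise terms
TOY_POS = {"nurse": "NN", "room": "NN",                # stand-in POS tagger
           "friendly": "JJ", "dirty": "JJ"}
TOY_SENTI = {"friendly": 0.8, "dirty": -0.6}           # scores in [-1, 1]

def extract_aspect_sentiments(sentence):
    """Steps 1-2: tokenize and drop noise; step 3: pair aspects (nouns)
    with the sentence's opinion words (adjectives); step 4: attach an
    averaged, range-bounded sentiment score to each aspect."""
    tokens = [t for t in re.findall(r"[a-z']+", sentence.lower())
              if t not in STOP_WORDS]
    aspects = [t for t in tokens if TOY_POS.get(t) == "NN"]
    opinions = [t for t in tokens if TOY_POS.get(t) == "JJ"]
    score = sum(TOY_SENTI.get(o, 0.0) for o in opinions) / max(len(opinions), 1)
    return {a: score for a in aspects}
```

Averaging over the opinion words in step 4 keeps the per-aspect score inside the lexicon's range, which is one simple way to realize the normalization the fourth step calls for.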
Another consideration in our work is the user profile. It is not imperative that everyone record
their response in adherence to correct grammar and language conventions. So we have chosen our
rules (explained in Section 3) such that we only consider rules that are generic rather than extremely
specific language rules. For instance, there is a rule to be applied in case of intensifiers being present
('very' good), but we ignore the usage of exclamation marks and emoticons (! or :D :P etc.)
because of the tendency to use them at will and sometimes rather arbitrarily.
Our approach essentially consists of two agents, and these agents operate serially:
1) The Aspect Extraction Agent (AEA)
2) The Sentiment Association Agent (SAA)
Before the AEA takes over, we remove noise from the user feedback: all special symbols, stop words
such as {a, hmm, is, yea}, and extra blank spaces are removed, and the filtered dataset is fed to
the AEA.
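This pre-filtering pass can be sketched as follows; the stop-word list and the token pattern here are illustrative, not the exact ones used by the system.

```python
import re

# Illustrative stop-word list; the real system's list is larger.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "hmm", "yea", "uh"}

def clean(response):
    """Lowercase, drop special symbols/emoticons, and remove stop words."""
    tokens = re.findall(r"[a-z']{2,}", response.lower())  # drops :D, !!, etc.
    return [t for t in tokens if t not in STOP_WORDS]

print(clean("Hmm, the doctor was very    friendly!! :D"))
# -> ['doctor', 'very', 'friendly']
```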
The Aspect Extraction Agent : The Aspect Extraction Agent (AEA) uses a POS tagger [6] to separate
the subjects from the opinion part of a comment. The aspects are the subjective entities about
which a sentiment is expressed. The AEA also filters out residual noise by not ranking any special
symbols left over from preprocessing, or obviously objective features; the assumption is that
comments on a feedback form are predominantly subjective. The POS tagger also helps us identify the
opinion words present in the sentence. Intuitively, the AEA constructs {key: value} pairs in which
the key is the aspect (or the context of the expression, or a set of aspects/contexts) and the
value is the list of opinion words related to that aspect, i.e. those used to express an opinion
about it. This output is supplied to the SAA. The aspects are further grouped to remove redundant
ones and to re-adjust their weights in the context of the department, which will further aid us in
prioritizing the aspects for a department.
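The AEA's {key: value} construction can be sketched as below. The toy POS lookup stands in for the tagger of [6], and the simplification that every aspect in a sentence shares the sentence's opinion words is our assumption for this illustration.

```python
# Toy POS lookup; a real run would call the part-of-speech tagger [6].
TOY_POS = {"doctor": "NN", "staff": "NN", "friendly": "JJ",
           "rude": "JJ", "very": "RB", "was": "VBD", "the": "DT"}

def extract_aspect_map(tokens):
    """Build {aspect: opinion words}: nouns become keys, and adjectives
    and adverbs in the same sentence become the associated values."""
    aspects = [t for t in tokens if TOY_POS.get(t) == "NN"]
    opinions = [t for t in tokens if TOY_POS.get(t) in ("JJ", "RB")]
    # Simplification: every aspect shares the sentence's opinion words.
    return {a: opinions for a in aspects}

print(extract_aspect_map(["the", "doctor", "was", "very", "friendly"]))
# -> {'doctor': ['very', 'friendly']}
```

The resulting map is exactly the shape of input the SAA expects.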
The Sentiment Association Agent : The Sentiment Association Agent (SAA) receives the set of
{key: value} pairs from the AEA. It then uses SentiWordNet, detailed in [1], to compute the opinion
score of each word and produces an aggregate opinion score for that particular feature. The
resulting scores are then pruned according to a set of language rules, such as the presence or
absence of intensifiers, negations, adverbs, and adjectives, to obtain the sentiment score for that
feature. The sentiment score thus obtained has three components: positive, negative, and neutral.
It is assumed, with considerable conviction, that almost every positive opinion term, phrase, or
entity will carry some negative and neutral score when used in different senses, and vice versa.
Eg: It's incredible. I absolutely love it. [incredible is positive here]
Ah, what incredibly awful stuff that is! [incredibly is negative here]
The reason for normalizing is that any rescaling of an input vector can be effectively undone by
changing the corresponding weights and biases, leaving the outputs exactly as before. There are
nevertheless practical reasons why standardizing the inputs makes training faster and reduces the
chances of getting stuck in local optima; weight decay and Bayesian estimation are also more
convenient with standardized inputs, and we can always tell the system by how much a value has
changed since the previous input. Moreover, the positive, negative, and objective scores of an
opinion word in SentiWordNet sum to 1, so it makes sense to normalize our aggregate opinion-word
scores to sum to 1 as well, ensuring consistency regardless of changes in input-vector scale. The
score calculation metrics and pseudocode are detailed further in Section 3.
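The normalization itself is a simple rescaling so that an aggregated (positive, negative, objective) triple sums to 1, mirroring SentiWordNet's per-synset convention; the fallback for an all-zero triple is our assumption.

```python
def normalize(pos, neg, obj):
    """Rescale an aggregated (positive, negative, objective) triple so
    that its components sum to 1, as individual SentiWordNet entries do."""
    total = pos + neg + obj
    if total == 0:
        return 0.0, 0.0, 1.0  # no opinion evidence: treat as fully objective
    return pos / total, neg / total, obj / total

# Aggregating two opinion words' triples and renormalizing:
p, n, o = normalize(0.5 + 0.375, 0.25 + 0.125, 0.25 + 0.5)
print(round(p + n + o, 10))  # -> 1.0
```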
However, there is a catch in the way sentiment scores are organized in SentiWordNet. Every term in
the SentiWordNet database is divided into a number of senses, each ranked by its frequency of use
in general language (via a sense number), indicating how many different contexts that particular
term can be used in. There are cases where a term carries conflicting scores across its different
senses. Table 1 illustrates this case.
Synset                                      SentiWordNet Score   Gloss
                                            (pos, neg)
huffy, mad#1, sore (roused to anger)        (0.0, 0.125)         "she gets mad when you wake her up
                                                                 so early"; "mad at his friend";
                                                                 "sore over a remark"
brainsick, crazy, demented, disturbed,      (0.0, 0.5)           "a man who had gone mad"
mad#2, sick, unbalanced, unhinged
delirious, excited, frantic, mad#3,         (0.375, 0.125)       "a crowd of delirious baseball
unrestrained                                                     fans"; "a mad whirl of pleasure"
harebrained, insane, mad#4 (very foolish)   (0.0, 0.25)          "harebrained ideas"; "took insane
                                                                 risks behind the wheel"; "a
                                                                 completely mad scheme to build a
                                                                 bridge between two mountains"
Table 1 : Example of multiple scores for the same term from SentiWordNet
In Table 1 the word mad, as an adjective, carries conflicting positive and negative scores across
its senses, and the disambiguation becomes a fundamental problem; it is related to the Word Sense
Disambiguation (WSD) problem. Given the limited time available and the complexity of introducing
WSD into this model, we propose a simple approach to the problem:
• Evaluate the scores for each term in a given sentence.
• If there are conflicting scores, i.e. different sense scores for the same word, calculate the
weighted average of all positive scores and of all negative scores. By doing so, we deprecate the
individual sense scores as the sense number increases.
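One concrete way to deprecate later senses is to weight sense k by 1/k before averaging; the 1/k weighting is our assumption, since the text only states that sense scores are deprecated as the sense number increases.

```python
def blend_sense_scores(senses):
    """Weighted average over a term's (pos, neg) sense scores, weighting
    sense k by 1/k so rarer (higher-numbered) senses count for less.
    The 1/k weighting is an assumption, not specified by the paper."""
    weights = [1.0 / k for k in range(1, len(senses) + 1)]
    total = sum(weights)
    pos = sum(w * p for w, (p, _) in zip(weights, senses)) / total
    neg = sum(w * n for w, (_, n) in zip(weights, senses)) / total
    return pos, neg

# The four senses of "mad" from Table 1, in sense-number order:
mad = [(0.0, 0.125), (0.0, 0.5), (0.375, 0.125), (0.0, 0.25)]
pos, neg = blend_sense_scores(mad)
print(round(pos, 2), round(neg, 2))  # -> 0.06 0.23
```

Under this weighting, "mad" comes out as predominantly negative, which matches the intuition that its most frequent senses are the angry/insane ones.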
3. Score Calculation and Pseudo Code
Our model is a blend of a traditional bag of words and intelligent look-up and priority evaluation
using a set of language rules. The simplest rule to start with is 'The Negation Rule'. When a
negation, i.e. a negative word such as "not" or "neither", is found in a response, the polarity of
the opinion associated with the aspect in context is reversed. If R is a response, {A} is the set
of aspects associated with that response in the context Θ_S, and {N} denotes the set of negation
words, then:

    If ∃ n ∈ Θ_S such that n ∈ N, then A ← InversePolarities(A)

Rule 1 : The Negation Rule
The second rule is 'The Modal Rule'. Modals are the trickiest to deal with. To understand why
modals are important, consider the following cases.
Case 1 : "The doctor could have been more positive"
Without modal handling, the response in Case 1 would be tagged as positive, since it contains no
explicitly negative term, even though it implies criticism.
Similarly,
Case 2 : "I would have not gotten as much attention in any other hospital"
The response in Case 2 would be tagged as negative for the analogous reason: the negation flips its
polarity even though the sentence is a compliment. Given how extensively modals are used in spoken
and written language, the results suffer noticeably if they are not handled appropriately. So the
following rule is proposed to deal with modals. If R is a response, {A} is the set of aspects
associated with that response in the context Θ_S, and {M} denotes the set of modal phrases such as
'would have' and 'could have', then:

    If ∃ m ∈ Θ_S such that m ∈ M, then A ← InversePolarities(A)

Rule 2 : The Modal Rule
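Rules R1 and R2 can be sketched together as below. The cue lists are assumptions: the text lists modal phrases like 'could have', while its Eg 1 ("Could not ask ...") suggests bare modals also trigger R2, so bare modals are used here.

```python
NEGATIONS = {"not", "neither", "never", "no"}
MODALS = {"could", "would", "should"}  # assumed bare-modal cues (see Eg 1)

def apply_r1_r2(tokens, pos, neg):
    """Each triggered rule flips the (pos, neg) polarity once, so a
    negation and a modal in the same response cancel each other out."""
    flip = any(t in NEGATIONS for t in tokens)   # Rule R1
    if any(t in MODALS for t in tokens):         # Rule R2
        flip = not flip
    return (neg, pos) if flip else (pos, neg)

# Eg 1 from the text: a negation and a modal together restore the
# original polarity.
tokens = "could not ask for any better place".split()
print(apply_r1_r2(tokens, 0.30319, 0.00736))  # -> (0.30319, 0.00736)
```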
The adjustment of polarity with respect to adjectives and adverbs is a very important aspect of
sentence-level sentiment extraction. We take the intensifier into account and re-prune our polarity
scores according to the intensifier's score. Let I = [I_p, I_n, I_o] denote the positive, negative,
and objective scores of the intensifier, and let Ψ = [Ψ_p, Ψ_n, Ψ_o] denote the corresponding
scores of the quantity being intensified or reduced by I. Note that Σ_k Ψ_k − Ψ_o = Ψ_p + Ψ_n. The
re-pruned values are given by rules R3 and R4 for intensifiers and R5 and R6 for reducers.

If I_p > I_n and Ψ_p > Ψ_n, the re-pruned score for Ψ is given by

    Ψ_newNegative = (I_p · Ψ_n) / (Σ_k Ψ_k − Ψ_o)
    Ψ_newPositive = (Σ_k Ψ_k − Ψ_o) − Ψ_newNegative

Rule R3 : Rule to intensify the positive quotient

If I_p > I_n and Ψ_n > Ψ_p, the re-pruned score for Ψ is given by

    Ψ_newPositive = (I_p · Ψ_p) / (Σ_k Ψ_k − Ψ_o)
    Ψ_newNegative = (Σ_k Ψ_k − Ψ_o) − Ψ_newPositive

Rule R4 : Rule to intensify the negative quotient

If I_n > I_p and Ψ_p > Ψ_n, the re-pruned score for Ψ is given by

    Ψ_newPositive = (I_n · Ψ_p) / (Σ_k Ψ_k − Ψ_o)
    Ψ_newNegative = (Σ_k Ψ_k − Ψ_o) − Ψ_newPositive

Rule R5 : Rule to reduce the positive quotient

If I_n > I_p and Ψ_n > Ψ_p, the re-pruned score for Ψ is given by

    Ψ_newNegative = (I_n · Ψ_n) / (Σ_k Ψ_k − Ψ_o)
    Ψ_newPositive = (Σ_k Ψ_k − Ψ_o) − Ψ_newNegative

Rule R6 : Rule to reduce the negative quotient
The above division rules are valid only when Ψ_p + Ψ_n > Ψ_o; otherwise, the denominator in the
division is set to 1. The intuition is to reduce the impact of the opinion word in the case of a
reducer (and increase it for an intensifier), while the denominator ensures that the values are not
scaled down or up by a huge margin. We set the denominator to 1 when the sum of the positive and
negative scores of an opinionated term is less than its objective score in order to tackle the
problem of polarity inversion. The rules above amplify or reduce the impact of an intensifier or a
reducer, respectively, on an opinionated word, and sufficient care is taken that the resulting
scores still sum to 1, to ensure domain and overall consistency. The following cases illustrate
Rules R1–R6.
Eg 1 : Could not ask for any better place or doctor for any cancer patient.
Aspects : doctor, place, cancer, patient
Before R1 and R2 : Positive Sentiment Value: 0.30319, Negative Sentiment Value: 0.00736
After R1         : Positive Sentiment Value: 0.00736, Negative Sentiment Value: 0.30319
After R1 and R2  : Positive Sentiment Value: 0.30319, Negative Sentiment Value: 0.00736

Eg 2 : The doctors were very approachable and easy to talk to, understood my problem, and I could
clearly understand them.
Aspects: doctor, problem
Positive Sentiment Value: 0.21824, Negative Sentiment Value: 0.09563
Eg 4 : Effect of Adverbs on Adjectives

Term        Positive Sentiment Val    Negative Sentiment Val          Neutral Sentiment Val
good        0.5                       0.25                            1-(0.5+0.25)=0.25
very        0.25                      0.17                            1-(0.25+0.17)=0.58
very good   (0.5+0.25)-0.083=0.667    (0.25*0.25)/(0.5+0.25)=0.083    0.25
(values approximated to two decimal places)

TABLE 2 : Demonstrating the Effect of Adverbs on Adjectives
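The re-pruning of Rules R3–R6, including the denominator guard, can be sketched as follows; the "very good" row of Table 2 serves as a check.

```python
def reprune(intensifier, psi):
    """Apply Rules R3-R6 to an opinion word's scores.  Both arguments are
    (positive, negative, objective) triples; returns the re-pruned
    (positive, negative) pair."""
    ip, inn, _ = intensifier
    pp, pn, po = psi
    # Division rules are valid only when pos + neg > obj; otherwise the
    # denominator becomes 1 to avoid polarity inversion.
    denom = pp + pn if pp + pn > po else 1.0
    if ip > inn:                      # an intensifier ...
        if pp > pn:                   # R3: intensify the positive quotient
            new_n = ip * pn / denom
            return (pp + pn) - new_n, new_n
        new_p = ip * pp / denom       # R4: intensify the negative quotient
        return new_p, (pp + pn) - new_p
    if pp > pn:                       # R5: reduce the positive quotient
        new_p = inn * pp / denom
        return new_p, (pp + pn) - new_p
    new_n = inn * pn / denom          # R6: reduce the negative quotient
    return (pp + pn) - new_n, new_n

# "very good" from Table 2: very = (0.25, 0.17, 0.58), good = (0.5, 0.25, 0.25)
p, n = reprune((0.25, 0.17, 0.58), (0.5, 0.25, 0.25))
print(round(p, 3), round(n, 3))  # -> 0.667 0.083
```

The output reproduces Table 2's "very good" row, confirming that R3 shifts mass from the negative to the positive component while keeping their sum fixed.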
The following algorithm, Algorithm 1, shows the pseudocode of the implementation.

    Start
    sentimentMap ← Load SentiWordNet
    For each comment c in Comments C
        hasNegation ← false
        NounBag ← {}
        SentimentValue ← {}
        For each word w in comment c
            If negation(w) then
                hasNegation ← not hasNegation
            ElseIf Pos(w) == NOUN then
                NounBag.append(w)
                SentimentValue ← getSentimentValue(sentimentMap, w)
            ElseIf Pos(w) == ADJECTIVE or Pos(w) == VERB then
                SentimentValue ← getSentimentValue(sentimentMap, w)
                Reprune()
            ElseIf Pos(w) == ADVERB then
                SentimentValue ← getSentimentValue(sentimentMap, w)
                RepruneAdverb()
            EndIf
        EndFor
        If hasNegation then InversePolarities() EndIf
        If hasModals(c) then PruneModals() EndIf
    EndFor
    End

ALGORITHM 1 : Semi-Rule Based Model for Sentence Level Sentiment Extraction.
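A runnable sketch of Algorithm 1 follows. The lexicon triples and POS lookup are illustrative stand-ins for SentiWordNet and the tagger [6], and the per-comment negation handling mirrors Rule R1.

```python
# Illustrative (pos, neg, obj) triples, not real SentiWordNet entries.
LEXICON = {"good": (0.5, 0.25, 0.25), "friendly": (0.5, 0.125, 0.375)}
POS_TAG = {"doctor": "NOUN", "good": "ADJ", "friendly": "ADJ",
           "not": "NEG", "was": "VERB"}

def score_comment(tokens):
    """One pass of Algorithm 1 over a single comment: collect nouns
    (aspects), aggregate opinion-word scores, and apply R1 at the end."""
    has_negation = False
    pos = neg = 0.0
    nouns = []
    for w in tokens:
        if POS_TAG.get(w) == "NEG":
            has_negation = not has_negation  # toggles on double negation
        elif POS_TAG.get(w) == "NOUN":
            nouns.append(w)
        elif w in LEXICON:                   # adjective / verb / adverb
            p, n, _ = LEXICON[w]
            pos, neg = pos + p, neg + n
    if has_negation:
        pos, neg = neg, pos                  # InversePolarities()
    return nouns, pos, neg

print(score_comment("the doctor was not friendly".split()))
# -> (['doctor'], 0.125, 0.5)
```

In the full system the aggregation step would also invoke the re-pruning of R3–R6 and the modal handling of R2, which this sketch omits for brevity.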
The steps follow the order of operations: extract each comment and, in turn, each opinion word in
the comment, find its part of speech, and then detect the presence of intensifiers and perform the
subsequent re-pruning. A lot of work still has to be done in evaluating modals like 'would've been'
and 'could've been' and in handling conjunctions. Currently the sentiment scores on both sides of a
conjunction are aggregated, but effort still has to go into finding a metric that evaluates the
opinion weights efficiently. Current efforts are directed at disambiguating senses and at finding
linguistic rules for pruning opinion weights in the presence of conjunctions and modals.
4. Performance Analysis
For sentiment-term association and classification we ran our algorithm in four iterations, each
achieving better results than the previous one. From our dataset, we considered four departments,
{SECTSCHE, SECTCARE, SECTFACI, SECTOVER}, as our data sample. The reason for running four
iterations was to understand how the algorithm's results improved as rules were added. The
following table details our strategy at each iteration.
Iteration 1 : Gathering adjectives and adverbs from a response using POS tagging, looking up their
scores in SentiWordNet, and associating those scores with the corresponding aspects.

Iteration 2 : 1) The impact of different senses was recognized, and nouns, adjectives, adverbs,
and verbs were separated from the rest. 2) Each of these four POS classes was looked up separately
in SentiWordNet, and the impact of one POS term on another was considered {Rules R3–R6}. 3) The
notion of positive, negative, and neutral scores for each entity was introduced, reflecting that
every positive word carries some amount of negative sense and vice versa; these scores were
normalized to adjust to changes in input scales.

Iteration 3 : Same as Iteration 2, but nouns were ignored during the sentiment association and
classification process, on the grounds that nouns usually form the aspect/context of a response
but do not greatly influence its opinion quotient.

Iteration 4 : Rule R2, to deal with modals, was introduced.

TABLE 3 : List of Iterations for Sentiment Extraction and Classification.
The tabular representations in FIG 1 and FIG 2 show the results of all four iterations. At each
iteration the false positives and false negatives were calculated: false positives are negative
opinions miscalculated as positive, and false negatives are the reverse. These are denoted FPOS and
FNEG in FIG 1. The percentage errors in positive and negative classification at each iteration are
also shown. The false positives and false negatives arise mainly because every opinion, however
negative, tends to be expressed using positive words more than negative ones. The columns
T-Positive and T-Negative give the total numbers of originally positive and negative responses as
manually annotated; we check our calculated results against these annotations to determine the
accuracy and efficiency of our model. The results are detailed in the following figures.
FIG 1 : The sentiment classification results at various iterations.
As can be seen from the tabular column, the number of false positives and false negatives decreases
with every iteration (refer to TABLE 3 for iteration details). The following figure, FIG 2, gives
the percentage errors in every iteration after the first.
FIG 2 : Percentage Errors with each iteration
The above figure shows the percentage errors at each iteration. The % error in positive is the
number of positive opinions miscalculated as negative (i.e. the false negatives), and the % error
in negative is the reverse. The percentage errors could be brought down further by improving the
rules for modals and by building or adopting a domain-specific lexicon. As FIG 2 shows, much of the
negative error stems from people expressing weak negatives with the help of positive qualifiers.
The increase in positive error from Iteration 3 to Iteration 4 could be tackled by fine-tuning
Rule R2. Part of the miscalculation can also be attributed to limitations of the POS tagger [6],
which in turn relate to our datasets' responses not having a proper grammatical structure.
5. Conclusion and Future Work
Our model is a rule-based approach for aspect extraction and for associating opinions with those
respective aspects. The contextual information and the sense of each individual sentence are
extracted according to the sentence's pattern structure using a part-of-speech tagger. A
first-stage opinion score for the extracted sense is assigned to the sentence using SentiWordNet.
The eventual opinion score is calculated by checking the linguistic orientation of each term in the
sentence with the help of Rules R1–R6 (explained in Section 3) and normalizing the results so that
the eventual score of an aspect in a response sums to 1, irrespective of the number of opinion
words associated with that aspect.
The accuracy of our model could be improved with a lexicon that is specific to the domain, or by
employing a learning mechanism with a feedback loop, which could also be manual. Natural language
processing, however, is not black and white. A lot of work still has to be done in disambiguating
word senses and the weights associated with the subjects and objects of a sentence, since the two
do not necessarily have the same impact on the sentence's sentiment. Work is also being carried out
to separate weak positives and negatives from strong positives and negatives, to provide the
customer with potential standpoints for improving product quality. An approach to dealing with
conjunctions also has to be worked out for better accuracy.
6. References:
[1] S. Baccianella, A. Esuli, and F. Sebastiani, "SentiWordNet 3.0: An Enhanced Lexical Resource
for Sentiment Analysis and Opinion Mining," Proceedings of LREC'10, 2010.
[2] M.A. Hearst, “Untangling text data mining,” Proceedings of the 37th annual meeting of the
Association for Computational Linguistics on Computational Linguistics, 1999, pp. 3-10.
[3] P. Turney, “Thumbs up or thumbs down? Semantic orientation applied to unsupervised
classification of reviews,” Proceedings of the 40th Annual Meeting of the Association for
Computational Linguistics (ACLʼ02), 2002, pp. 417-424.
[4] A. Andreevskaia and S. Bergler, “When specialists and generalists work together: Overcoming
domain dependence in sentiment tagging,” Proceedings of ACL-08: HLT, 2008, pp. 290-298.
[5] A. Khan, B. Baharudin, and K. Khan, “Sentiment Classification from Online Customer Reviews
Using Lexical Contextual Sentence Structure,” Communications in Computer and Information
Science, Software Engineering and Computer Systems, Springer Verlag, 2011, pp. 317-331.
[6] Kristina Toutanova, Dan Klein, Christopher Manning, and Yoram Singer. 2003. Feature Rich Part-
of-Speech Tagging with a Cyclic Dependency Network. In Proceedings of HLT-NAACL 2003, pp. 252-
259.
[7] Pang B., Lee L., and Vaithyanathan, S. (2002). Thumbs up? Sentiment Classification using
Machine Learning Techniques. Proceedings of EMNLP, 2002.
[8] Miller G. A., Beckwith R., Fellbaum C, Gross D, Miller K. J. (1990). Introduction to WordNet: An
On-line Lexical Database. International Journal of Lexicography. Vol. 3, No. 4 (Jan. 1990), 235-244.