Opinion mining, also known as sentiment analysis, is a technique that uses natural language processing (NLP) to classify the sentiment expressed in text. Various NLP tools are available for processing text data. Multiple studies have been done on opinion mining for online blogs, Twitter, Facebook, etc. This paper proposes a new opinion mining technique using Support Vector Machine (SVM) and NLP tools on newspaper headlines. Relative words are generated using Stanford CoreNLP and are passed to the SVM using a count vectorizer. On comparing three models using confusion matrices, the results indicate that Tf-idf with linear SVM provides better accuracy for the smaller dataset, while for the larger dataset the SGD and linear SVM model outperforms the other models.
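To make the Tf-idf weighting mentioned in the abstract concrete, here is a minimal sketch in plain Python. It uses the textbook form (term frequency times log of inverse document frequency), which is an assumption: the paper does not state its exact formula, and library implementations typically add smoothing and normalization. The function name `tf_idf` and the sample headlines are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (textbook tf-idf, no smoothing)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical headlines, not taken from the paper's dataset.
headlines = [
    "markets rally as inflation cools",
    "markets slump as inflation surges",
    "local team wins final",
]
weights = tf_idf(headlines)
```

Words that occur in many headlines ("markets") receive lower weights than words specific to one headline ("rally"), which is what lets a linear classifier focus on the discriminative terms.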
A scalable, lexicon based technique for sentiment analysis ijfcstjournal
The rapid increase in the volume of sentiment-rich social media on the web has resulted in increased
interest among researchers in sentiment analysis and opinion mining. However, with so much
social media available on the web, sentiment analysis is now considered a big data task. Hence,
conventional sentiment analysis approaches fail to efficiently handle the vast amount of sentiment data
available nowadays. The main focus of the research was to find a technique that can efficiently
perform sentiment analysis on big data sets, one that can categorize text as positive, negative,
or neutral in a fast and accurate manner. In the research, sentiment analysis was performed on a large
data set of tweets using Hadoop, and the performance of the technique was measured in terms of speed and
accuracy. The experimental results show that the technique exhibits very good efficiency in handling big
sentiment data sets.
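As a rough illustration of lexicon-based classification into positive, negative, and neutral, the sketch below sums word scores from a tiny hand-made lexicon. The lexicon, its scores, and the function name `classify` are hypothetical and are not taken from the paper; the Hadoop distribution step is not shown.

```python
# Tiny illustrative lexicon (hypothetical); real lexicons hold thousands of scored words.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "awful": -2, "hate": -2}


def classify(text):
    """Label a text by the sign of its summed lexicon scores."""
    # Lowercase, split on whitespace, and strip trailing punctuation from each word.
    words = (w.strip(".,!?") for w in text.lower().split())
    score = sum(LEXICON.get(w, 0) for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Because each text is scored independently of all others, a function of this shape can in principle be applied per record in a map-style job, which is one reason lexicon methods suit large tweet collections.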
Neural Network Based Context Sensitive Sentiment Analysis Editor IJCATR
Social media communication is evolving rapidly. The number of social networking sites has increased quickly in recent years, providing platforms that connect people all over the world and let them share their interests. The conversations and posts available on social media are unstructured in nature, so sentiment analysis is challenging on these platforms. Such analyses are mostly performed with machine learning techniques, which the authors consider less accurate than neural network methodologies. This paper presents sentiment classification using competitive layer neural networks and classifies the polarity of a given text, that is, whether the opinion expressed in the text is positive, negative, or neutral. It also determines the overall topic of the given text. Context-independent sentences and implicit meaning in the text are also considered in polarity classification.
FEATURE SELECTION AND CLASSIFICATION APPROACH FOR SENTIMENT ANALYSIS mlaij
Sentiment analysis and opinion mining have emerged as popular and efficient techniques for information retrieval and web data analysis. The exponential growth of user-generated content has opened new horizons for research in the field of sentiment analysis. This paper proposes a model for sentiment analysis of movie reviews using a combination of natural language processing and machine learning approaches. Firstly, different data pre-processing schemes are applied to the dataset. Secondly, the behaviour of two classifiers, Naive Bayes and SVM, is investigated in combination with different feature selection schemes to
obtain the results for sentiment analysis. Thirdly, the proposed model for sentiment analysis is extended to
obtain the results for higher order n-grams.
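For readers unfamiliar with higher order n-grams, the following sketch shows how unigram, bigram, and trigram features can be extracted from a tokenized review. The helper name `ngrams` and the sample sentence are illustrative assumptions, not part of the paper.

```python
def ngrams(tokens, n):
    """Return all contiguous n-token windows as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "not a good movie".split()
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
```

Higher order n-grams keep some word order, so a phrase such as "not a good" can be treated as one feature instead of being dominated by the positive unigram "good".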
A Survey on Sentiment Analysis and Opinion Mining IJSRD
In today’s world, social media has given web users a place for expressing and sharing their thoughts and opinions on different topics or events. For this reason, opinion mining has gained importance. Sentiment classification and opinion mining is the study of people’s opinions, emotions, and attitudes towards products, services, etc. Sentiment analysis and opinion mining are interchangeable terms. Various approaches and techniques exist for sentiment analysis, such as Naïve Bayes, Decision Trees, Support Vector Machines, Random Forests, Maximum Entropy, etc. Opinion mining is useful for scientific surveys, political polls, market research, business intelligence, etc. This paper presents a literature review of various techniques used for opinion mining and sentiment analysis.
A Survey on Sentiment Categorization of Movie Reviews Editor IJMTER
Sentiment categorization is the process of mining user-generated text content to determine
the sentiment of the users towards a particular thing. It is the approach of detecting the sentiment of
the author in regard to some topic. It is also known as sentiment detection, sentiment analysis, and opinion
mining. It is very useful for movie production companies that are interested in knowing how users feel
about their movies. For example, the word “excellent” indicates that the review expresses positive emotion about
a particular movie. The same applies to songs, cars, holiday destinations, political parties, social
network sites, web blogs, discussion forums, and so on. Sentiment categorization can be carried out
using three approaches. First, supervised machine learning based text classifiers such as Naïve Bayes,
Maximum Entropy, SVM, kNN, and hidden Markov models. Second, an unsupervised semantic
orientation scheme that extracts relevant n-grams of the text and then labels them. Third, the publicly
available SentiWordNet library.
Methods for Sentiment Analysis: A Literature Study vivatechijri
Sentiment analysis is a trending topic, as everyone has an opinion on everything. The systematic
study of these opinions can lead to information which can prove valuable for many companies and
industries in the future. A huge number of users are online, and they share their opinions and comments regularly;
this information can be mined and used efficiently. Companies can review their own products using
sentiment analysis and make the necessary changes in the future. The data is huge, and thus it requires efficient
processing to collect and analyze it to produce the required result.
In this paper, we discuss the various methods used for sentiment analysis. It covers techniques
such as the lexicon based approach, SVM [10], convolutional neural networks, the
morphological sentence pattern model [1], and the IML algorithm. The paper reviews studies on various data sets
such as Twitter API, Weibo, movie reviews, IMDb, the Chinese micro-blog database [9], and more. The paper reports
the accuracy results obtained by all the systems.
Semantic Based Model for Text Document Clustering with Idioms Waqas Tariq
Text document clustering has become an increasingly important problem in recent years because of the tremendous amount of unstructured data available in various forms in online forums such as the web, social networks, and other information networks. Clustering is a very powerful data mining technique for organizing the large amount of information on the web. Traditionally, document clustering methods do not consider the semantic structure of the document. This paper addresses the task of developing an effective and efficient method to improve the semantic structure of text documents. A method has been developed that performs the following: tagging the documents for parsing, replacing idioms with their original meaning, calculating semantic weights for document words, and applying semantic grammar. A similarity measure is obtained between the documents, and the documents are then clustered using a hierarchical clustering algorithm. The method is evaluated on different data sets with standard performance measures, and its effectiveness in producing meaningful clusters has been demonstrated.
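Two of the steps listed above, idiom replacement and document similarity, can be sketched as follows. The idiom table, the function names, and the choice of cosine similarity over raw term counts are assumptions made for illustration; the abstract does not specify the similarity measure or the weighting used by the authors.

```python
import math
from collections import Counter

# Hypothetical idiom table; a real system would use a much larger dictionary.
IDIOMS = {"piece of cake": "easy", "under the weather": "ill"}


def replace_idioms(text):
    """Replace each known idiom with its literal meaning."""
    text = text.lower()
    for idiom, meaning in IDIOMS.items():
        text = text.replace(idiom, meaning)
    return text


def cosine(a, b):
    """Cosine similarity between the term-count vectors of two texts."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


doc_a = "the exam was a piece of cake"
doc_b = "the exam was easy"
before = cosine(doc_a, doc_b)
after = cosine(replace_idioms(doc_a), replace_idioms(doc_b))
```

In this toy case the two documents become more similar once the idiom is replaced, which is the effect the clustering step relies on.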
Sentiment classification for product reviews (documentation) Mido Razaz
The documentation of the pre-master graduation project prepared by myself and my colleagues Mostafa Ameen, Mai M. Farag and Mohamed Abd El kader.
If you want me to conduct any similar research for you, you can use my service through this link: https://www.fiverr.com/meizzo/convert-your-textual-data-set-from-csv-file-format-to-arff-format-for-weka
Insights to Problems, Research Trend and Progress in Techniques of Sentiment ... IJECEIAES
Research-based implementations of sentiment analysis are about a decade old and have introduced many significant algorithms, techniques, and frameworks for enhancing its performance. The applicability of sentiment analysis to business and political surveys is immense. However, we strongly feel that existing research progress in sentiment analysis is not on par with the demands of massively increasing dynamic data in pervasive environments. The problems associated with opinion mining over such forms of data have been less addressed, which leaves major scope for research. This paper briefly covers existing research trends and some important recent research implementations, and explores some major open issues in sentiment analysis. We believe that this manuscript gives a progress report with a snapshot of the effectiveness of research techniques for sentiment analysis, to help upcoming researchers identify research gaps and direct their work accordingly.
Sentiment classification is an ongoing and interesting area of research because of its application in various fields that collect reviews from people about products and social and political events through the web. Currently, sentiment analysis concentrates on subjective statements or on subjectivity and overlooks objective statements which carry sentiment(s). During sentiment classification, more challenging problems are faced due to the ambiguous sense of words, negation words, and intensifiers. Due to its importance, the correct sense of the target word is extracted and determined using the similarity that arises in WordNet glosses. This paper presents a survey covering the techniques and methods in sentiment analysis and the challenges that appear in the field.
A Framework for Arabic Concept-Level Sentiment Analysis using SenticNet IJECEIAES
The Arabic sentiment analysis research field has been progressing at a slow pace compared to English and other languages. In addition, most of the contributions are based on supervised machine learning algorithms and compare the performance of different classifiers with different selected stylistic and syntactic features. In this paper, we present a novel framework for using the concept-level sentiment analysis approach, which classifies text based on its semantics rather than syntactic features. Moreover, we provide a lexicon dataset of around 69 k unique concepts that covers multi-domain reviews collected from the internet. We also tested the lexicon on a test sample from the dataset it was collected from and obtained an accuracy of 70%. The lexicon has been made publicly available for scientific purposes.
In this paper, I develop a custom binary classifier of search queries for the makeup category using different machine learning techniques and models. An extensive exploration of shallow and deep learning models was performed using a cross-validation framework to identify the top three models, optimize them by tuning their hyperparameters, and finally create an ensemble of models with a custom decision threshold that outperforms all other models. The final classifier achieves an accuracy of 98.83% on a test set, making it ready for production.
Lexicon Based Emotion Analysis on Twitter Data ijtsrd
This paper presents a system that extracts information from automatically annotated tweets using well-known existing opinion lexicons and a supervised machine learning approach. In this paper, the sentiment features are primarily extracted from novel high-coverage tweet-specific sentiment lexicons. These lexicons are automatically generated from tweets with sentiment-word hashtags and from tweets with emoticons. The sentence-level or tweet-level classification is done based on these word-level sentiment features using the Sequential Minimal Optimization (SMO) classifier. The SemEval 2013 Twitter sentiment dataset is used in this work. The ablation experiments show that this system gains up to 6.8 absolute percentage points in F-score. Nang Noon Kham "Lexicon Based Emotion Analysis on Twitter Data" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5 , August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd26566.pdf Paper URL: https://www.ijtsrd.com/computer-science/data-miining/26566/lexicon-based-emotion-analysis-on-twitter-data/nang-noon-kham
Sentiment Analysis Using Hybrid Approach: A Survey IJERA Editor
Sentiment analysis is the process of identifying people’s attitudes and emotional states from language. The main objective is realized by identifying a set of potential features in the review and extracting opinion expressions about those features by exploiting their associations. Opinion mining, also known as sentiment analysis, plays an important role in this process. It is the study of emotions, i.e. sentiments and expressions, that are stated in natural language. Natural language techniques are applied to extract emotions from unstructured data. There are several techniques which can be used to analyze this type of data. Here, we categorize these techniques broadly as “supervised learning”, “unsupervised learning” and “hybrid techniques”. The objective of this paper is to provide an overview of sentiment analysis, its challenges, and a comparative analysis of its techniques in the field of natural language processing.
One fundamental problem in sentiment analysis is categorization of sentiment polarity. Given a piece of written text, the problem is to categorize the text into one specific sentiment polarity, positive or negative (or neutral). Based on the scope of the text, there are three distinctions of sentiment polarity categorization, namely the document level, the sentence level, and the entity and aspect level. Consider a review “I like multimedia features but the battery life sucks.” This sentence has a mixed emotion. The emotion regarding multimedia is positive whereas that regarding battery life is negative. Hence, it is required to extract only those opinions relevant to a particular feature (like battery life or multimedia) and classify them, instead of taking the complete sentence and the overall sentiment. In this paper, we present a novel approach to identify pattern specific expressions of opinion in text.
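The review quoted above can be used to illustrate aspect-level classification. The sketch below is not the approach proposed in the paper, whose details are not given in the abstract; it is a simple clause-splitting heuristic, and the opinion-word table, the aspect list, and the function name `aspect_sentiment` are hypothetical.

```python
import re

# Hypothetical opinion words and aspect terms, chosen only for this example.
OPINION_WORDS = {"like": 1, "love": 1, "great": 1, "sucks": -1, "poor": -1}
ASPECTS = ["multimedia", "battery life"]


def aspect_sentiment(sentence):
    """Assign a polarity label to each aspect found in a clause of the sentence."""
    result = {}
    # Split into clauses at "but" and at commas or semicolons.
    clauses = re.split(r"\bbut\b|[,;]", sentence.lower())
    for clause in clauses:
        words = re.findall(r"[a-z']+", clause)
        score = sum(OPINION_WORDS.get(w, 0) for w in words)
        label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        for aspect in ASPECTS:
            if aspect in clause:
                result[aspect] = label
    return result


review = "I like multimedia features but the battery life sucks."
labels = aspect_sentiment(review)
```

Scoring each clause separately is what keeps the positive opinion about multimedia from cancelling the negative opinion about battery life, as a single sentence-level score would.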
Knowing about the user’s feedback can be of great aid in understanding the user as well as improving the organization. Here, an example of student data is taken for study purposes. Analyzing student feedback will help to address student-related problems and make teaching more student oriented. Prashali S. Shinde | Asmita R. Kanase | Rutuja S. Pawar | Yamini U. Waingankar "Sentiment Analysis of Feedback Data" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Special Issue | Fostering Innovation, Integration and Inclusion Through Interdisciplinary Practices in Management , March 2019, URL: https://www.ijtsrd.com/papers/ijtsrd23090.pdf
Paper URL: https://www.ijtsrd.com/other-scientific-research-area/other/23090/sentiment-analysis-of-feedback-data/prashali--s-shinde
APPROXIMATE ANALYTICAL SOLUTION OF NON-LINEAR BOUSSINESQ EQUATION FOR THE UNS... mathsjournal
For a one-dimensional homogeneous, isotropic aquifer without accretion, the governing Boussinesq
equation under the Dupuit assumptions is a nonlinear partial differential equation. In the present paper, an
approximate analytical solution of the nonlinear Boussinesq equation is obtained using the homotopy
perturbation transform method (HPTM). The solution is compared with the exact solution. The
comparison shows that the HPTM is efficient, accurate, and reliable. Two important aquifer
parameters, namely specific yield and hydraulic conductivity, are analyzed to see their effects on the height
of the water table. The results agree well with the physical phenomena.
The sarcasm detection with the method of logistic regression EditorIJAERD
Prediction analysis is an approach which may predict future possibilities. This research work is based on
sarcasm detection from text data. Previously, SVM classification was applied to sarcasm detection. The SVM
classifier separates data with a hyperplane, which in the authors' experiments gave low accuracy. To improve accuracy for sarcasm detection,
logistic regression is applied in this work. The existing and proposed techniques are implemented in Python, and the results
are analysed in terms of accuracy and execution time. The proposed approach has higher accuracy and lower execution time
compared to the SVM classifier for sarcasm detection.
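As a minimal sketch of how logistic regression can be trained for a binary task such as sarcasm detection, the code below uses batch gradient descent in plain Python. The two hand-crafted binary features (positive wording, negative situation), the toy labels, and the hyperparameters are assumptions chosen for illustration; the abstract does not describe the features or training setup actually used.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=1.0, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this sample: predicted probability minus label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return True when the predicted probability of sarcasm is at least 0.5."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5


# Hypothetical features: [has_positive_wording, describes_negative_situation].
# Toy labels: a text counts as sarcastic (1) only when both cues are present.
X = [[1, 1], [1, 0], [0, 1], [0, 0]]
y = [1, 0, 0, 0]
w, b = train_logistic(X, y)
predictions = [predict(w, b, x) for x in X]
```

On real data the feature vectors would come from a text vectorizer, and accuracy and execution time would be measured on a held-out test set, as the abstract describes.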
A Survey on Sentiment Analysis and Opinion MiningIJSRD
In Today’s world, the social media has given web users a place for expressing and sharing their thoughts and opinions on different topics or events. For this purpose, the opinion mining has gained the importance. Sentiment classification and Opinion Mining is the study of people’s opinion, emotions, attitude towards the product, services, etc. Sentiment Analysis and Opinion Mining are the two interchangeable terms. There are various approaches and techniques exist for Sentiment Analysis like Naïve Bayes, Decision Trees, Support Vector Machines, Random Forests, Maximum Entropy, etc. Opinion mining is a useful and beneficial way to scientific surveys, political polls, market research and business intelligence, etc. This paper presents a literature review of various techniques used for opinion mining and sentiment analysis.
Semantic Based Model for Text Document Clustering with IdiomsWaqas Tariq
Text document clustering has become an increasingly important problem in recent years because of the tremendous amount of unstructured data which is available in various forms in online forums such as the web, social networks, and other information networks. Clustering is a very powerful data mining technique to organize the large amount of information on the web. Traditionally, document clustering methods do not consider the semantic structure of the document. This paper addresses the task of developing an effective and efficient method to improve the semantic structure of the text documents. A method has been developed that performs the following: tag the documents for parsing, replacement of idioms with their original meaning, semantic weights calculation for document words and apply semantic grammar. The similarity measure is obtained between the documents and then the documents are clustered using Hierarchical clustering algorithm. The method adopted in this work is evaluated on different data sets with standard performance measures and the effectiveness of the method to develop in meaningful clusters has been proved.
Sentiment classification for product reviews (documentation)Mido Razaz
The documentation of the pre-master graduation project prepared by my self and my colleagues Mostafa Ameen, Mai M. Farag and Mohamed Abd El kader.
If you want me to conduct any similar research for you you can have my service through this link: https://www.fiverr.com/meizzo/convert-your-textual-data-set-from-csv-file-format-to-arff-format-for-weka
Insights to Problems, Research Trend and Progress in Techniques of Sentiment ...IJECEIAES
The research-based implementations towards Sentiment analyses are about a decade old and have introduced many significant algorithms, techniques, and framework towards enhancing its performance. The applicability of sentiment analysis towards business and the political survey is quite immense. However, we strongly feel that existing progress in research towards Sentiment Analysis is not at par with the demand of massively increasing dynamic data over the pervasive environment. The degree of problems associated with opinion mining over such forms of data has been less addressed, and still, it leaves the certain major scope of research. This paper will brief about existing research trends, some important research implementation in recent times, and exploring some major open issues about sentiment analysis. We believe that this manuscript will give a progress report with the snapshot of effectiveness borne by the research techniques towards sentiment analysis to further assist the upcoming researcher to identify and pave their research work in a perfect direction towards considering research gap.
Sentiment classification is an ongoing field and interesting area of research because of its application in various fields collecting review from people about products and social and political events through the web. Currently, Sentiment Analysis concentrates for subjective statements or on subjectivity and overlook objective statements which carry sentiment(s). During the sentiment classification more challenging problem are faced due to the ambiguous sense of words, negation words and intensifier. Due to its importance the correct sense of target word is extracted and determined for which the similarity arise in WordNet Glosses. This paper presents a survey covering the techniques and methods in sentiment analysis and challenges appear in the field.
A Framework for Arabic Concept-Level Sentiment Analysis using SenticNet IJECEIAES
Arabic Sentiment analysis research field has been progressing in a slow pace compared to English and other languages. In addition to that most of the contributions are based on using supervised machine learning algorithms while comparing the performance of different classifiers with different selected stylistic and syntactic features. In this paper, we presented a novel framework for using the Concept-level sentiment analysis approach which classifies text based on their semantics rather than syntactic features. Moreover, we provided a lexicon dataset of around 69 k unique concepts that covers multi-domain reviews collected from the internet. We also tested the lexicon on a test sample from the dataset it was collected from and obtained an accuracy of 70%. The lexicon has been made publicly available for scientific purposes.
In this paper, I develop a custom binary classifier of search queries for the makeup category using different Machine Learning techniques and models. An extensive exploration of shallow and Deep Learning models was performed using a cross-validation framework to identify the top three models, optimize them tuning their hyperparameters, and finally creating an ensemble of models with a custom decision threshold that outperforms all other models. The final classifier achieves an accuracy of 98.83% on a test set, making it ready for production.
Lexicon Based Emotion Analysis on Twitter Dataijtsrd
This paper presents a system that extracts information from automatically annotated tweets using well known existing opinion lexicons and supervised machine learning approach. In this paper, the sentiment features are primarily extracted from novel high coverage tweet specific sentiment lexicons. These lexicons are automatically generated from tweets with sentiment word hashtags and from tweets with emoticons. The sentence level or tweet level classification is done based on these word level sentiment features by using Sequential Minimal Optimization SMO classifier. SemEval 2013 Twitter sentiment dataset is applied in this work. The ablation experiments show that this system gains in F Score of up to 6.8 absolute percentage points. Nang Noon Kham "Lexicon Based Emotion Analysis on Twitter Data" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5 , August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd26566.pdfPaper URL: https://www.ijtsrd.com/computer-science/data-miining/26566/lexicon-based-emotion-analysis-on-twitter-data/nang-noon-kham
Sentiment Analysis Using Hybrid Approach: A SurveyIJERA Editor
Sentiment analysis is the process of identifying people’s attitude and emotional state’s from language. The main objective is realized by identifying a set of potential features in the review and extracting opinion expressions about those features by exploiting their associations. Opinion mining, also known as Sentiment analysis, plays an important role in this process. It is the study of emotions i.e. Sentiments, Expressions that are stated in natural language. Natural language techniques are applied to extract emotions from unstructured data. There are several techniques which can be used to analysis such type of data. Here, we are categorizing these techniques broadly as ”supervised learning”, ”unsupervised learning” and ”hybrid techniques”. The objective of this paper is to provide the overview of Sentiment Analysis, their challenges and a comparative analysis of it’s techniques in the field of Natural Language Processing.
One fundamental problem in sentiment analysis is categorization of sentiment polarity. Given a piece of written text, the problem is to categorize the text into one specific sentiment polarity, positive or negative (or neutral). Based on the scope of the text, there are three distinctions of sentiment polarity categorization, namely the document level, the sentence level, and the entity and aspect level. Consider a review “I like multimedia features but the battery life sucks.†This sentence has a mixed emotion. The emotion regarding multimedia is positive whereas that regarding battery life is negative. Hence, it is required to extract only those opinions relevant to a particular feature (like battery life or multimedia) and classify them, instead of taking the complete sentence and the overall sentiment. In this paper, we present a novel approach to identify pattern specific expressions of opinion in text.
Knowing about the user’s feedback can be of great help in understanding the user as well as improving the organization. Here, students’ data is taken as an example for study purposes. Analyzing student feedback will help to address student-related problems and to make teaching more student oriented. Prashali S. Shinde | Asmita R. Kanase | Rutuja S. Pawar | Yamini U. Waingankar "Sentiment Analysis of Feedback Data" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Special Issue | Fostering Innovation, Integration and Inclusion Through Interdisciplinary Practices in Management , March 2019, URL: https://www.ijtsrd.com/papers/ijtsrd23090.pdf
Paper URL: https://www.ijtsrd.com/other-scientific-research-area/other/23090/sentiment-analysis-of-feedback-data/prashali--s-shinde
APPROXIMATE ANALYTICAL SOLUTION OF NON-LINEAR BOUSSINESQ EQUATION FOR THE UNS...mathsjournal
For a one-dimensional, homogeneous, isotropic aquifer without accretion, the governing Boussinesq
equation under Dupuit assumptions is a nonlinear partial differential equation. In the present paper, an
approximate analytical solution of the nonlinear Boussinesq equation is obtained using the Homotopy
perturbation transform method (HPTM). The solution is compared with the exact solution. The
comparison shows that the HPTM is efficient, accurate and reliable. Two important aquifer
parameters, namely specific yield and hydraulic conductivity, are analysed to see their effects on the height
of the water table. The results agree well with the physical phenomena.
The sarcasm detection with the method of logistic regressionEditorIJAERD
Predictive analysis is an approach that can predict future possibilities. This research work is based on
sarcasm detection from text data. Previously, SVM classification was applied for sarcasm detection. The SVM
classifier classifies data based on a hyperplane, which gives low accuracy. To improve accuracy for sarcasm detection,
logistic regression is applied in this work. The existing and proposed techniques are implemented in Python and results
are analysed in terms of accuracy and execution time. The proposed approach has higher accuracy and lower execution time
compared to the SVM classifier for sarcasm detection.
A Survey on Sentiment Analysis and Opinion MiningIJSRD
In today’s world, social media has given web users a place for expressing and sharing their thoughts and opinions on different topics or events. For this reason, opinion mining has gained importance. Sentiment classification and opinion mining is the study of people’s opinions, emotions, and attitudes towards products, services, etc. Sentiment analysis and opinion mining are two interchangeable terms. Various approaches and techniques exist for sentiment analysis, such as Naïve Bayes, Decision Trees, Support Vector Machines, Random Forests, Maximum Entropy, etc. Opinion mining is useful and beneficial for scientific surveys, political polls, market research, business intelligence, etc. This paper presents a literature review of various techniques used for opinion mining and sentiment analysis.
Sentiment analysis on Bangla conversation using machine learning approachIJECEIAES
Nowadays, online communication is more convenient and popular than face-to-face conversation. Therefore, people prefer online communication over face-to-face meetings. An enormous number of people use online chatting systems to speak with their loved ones at any given time throughout the world. People create massive quantities of conversation every second because of their online engagement. People's feelings during the conversation period can be gleaned as useful information from these conversations. Text analysis and summarization of any material can be done using sentiment analysis by natural language processing. The use of communication for customer service portals in various e-commerce platforms and crime investigations based on digital evidence is increasing the need for sentiment analysis of a conversation. Other languages, such as English, have well-developed libraries and resources for natural language processing, yet there are few studies conducted on Bangla. It is more challenging to extract sentiments from Bangla conversational data due to the language's grammatical complexity. As a result, it opens vast study opportunities. So, support vector machine, multinomial naïve Bayes, k-nearest neighbors, logistic regression, decision tree, and random forest were used. From the dataset, extracted information was labeled as positive and negative.
Sentimental analysis of audio based customer reviews without textual conversionIJECEIAES
The current trends or procedures followed in customer relation management (CRM) systems are based on reviews, mails, and other textual data, gathered in the form of feedback from customers. Sentiment analysis algorithms are deployed in order to gain polarity results, which can be used to improve customer services. But with evolving technologies, reviews and feedback are lately being dominated by audio data. As per the literature, the audio contents are translated to text and sentiments are analyzed using natural language processing techniques. However, these approaches can be time consuming. The proposed work focuses on analyzing the sentiments on the audio data itself without any textual conversion. The basic sentiment analysis polarities are mostly termed positive, negative, and neutral. But the focus is to make use of basic emotions as the base of deciding the polarity. The proposed model uses a deep neural network and features such as Mel frequency cepstral coefficients (MFCC), Chroma and Mel Spectrogram on audio-based reviews.
Supervised Sentiment Classification using DTDP algorithmIJSRD
Sentiment analysis is a process widely used in all fields, and it uses the statistical machine learning approach for text modeling. The primarily used approach is Bag-of-words (BOW). However, this technique has some limitations with the polarity shift problem. Thus, here we propose a new method called Dual sentiment analysis (DSA), which resolves the polarity shift problem. The proposed method involves two approaches, namely dual training and dual prediction (DTDP). First, we propose a data expansion technique by creating a reversed review for each training review. Second, a dual training and dual prediction algorithm is developed for doing analysis on sentiment data. The dual training algorithm is used for learning a sentiment classifier, and the dual prediction algorithm is developed for classifying the review by considering two sides of one review.
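The data expansion step described above can be sketched as follows. This is a minimal illustration only: the abstract does not give the actual reversal rules, so the tiny antonym table, the negation list, and the function name `reverse_review` are all hypothetical assumptions.

```python
# Toy resources: illustrative assumptions, not the lexicon used in the paper.
ANTONYMS = {"good": "bad", "bad": "good", "like": "dislike", "dislike": "like"}
NEGATIONS = {"not", "never"}
FLIP = {"positive": "negative", "negative": "positive"}


def reverse_review(tokens, label):
    """Build a sentiment-reversed copy of a review and flip its label."""
    reversed_tokens = []
    negated = False  # True while a dropped negation still awaits its sentiment word
    for tok in tokens:
        if tok in NEGATIONS:
            negated = True  # drop the negation word itself
            continue
        if tok in ANTONYMS:
            # A negated sentiment word is kept as-is (the negation was removed);
            # an un-negated sentiment word is swapped for its antonym.
            reversed_tokens.append(tok if negated else ANTONYMS[tok])
            negated = False
        else:
            reversed_tokens.append(tok)
    return reversed_tokens, FLIP[label]


original = ("i do not like this".split(), "negative")
print(reverse_review(*original))
```

Under these assumptions, each original review and its reversed copy would both be passed to the classifier during training, which is the "dual" part of the idea.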
The peer-reviewed International Journal of Engineering Inventions (IJEI) is started with a mission to encourage contribution to research in Science and Technology. Encourage and motivate researchers in challenging areas of Sciences and Technology.
Sentiment classification aims to detect information such as opinions and explicit or implicit feelings expressed
in text. Most existing approaches are able to detect either explicit expressions or implicit expressions of
sentiments in the text separately. The proposed framework detects both implicit and explicit
expressions available in the meeting transcripts. It classifies the positive, negative, and neutral words and
also identifies the topic of the particular meeting transcripts by using fuzzy logic. This paper aims to add
some additional features for improving the classification method. The quality of the sentiment classification
is improved using the proposed fuzzy logic framework. The fuzzy logic includes features such as fuzzy
rules and the Fuzzy C-means algorithm. The quality of the output is evaluated using parameters such as
precision, recall, and F-measure. Here the Fuzzy C-means clustering technique is measured in terms of purity and
entropy. The data set was validated using the 10-fold cross-validation method, and a 95% confidence
interval was observed between the accuracy values. Finally, the proposed fuzzy logic method produced more than 85%
accurate results, and the error rate is much lower compared to existing sentiment classification techniques.
Evaluating sentiment analysis and word embedding techniques on BrexitIAESIJAI
In this study, we investigate the effectiveness of pre-trained word embeddings for sentiment analysis on a real-world topic, namely Brexit. We compare the performance of several popular word embedding models, such as global vectors for word representation (GloVe), FastText, word to vec (word2vec), and embeddings from language models (ELMo), on a dataset of tweets related to Brexit and evaluate their ability to classify the sentiment of the tweets as positive, negative, or neutral. We find that pre-trained word embeddings provide useful features for sentiment analysis and can significantly improve the performance of machine learning models. We also discuss the challenges and limitations of applying these models to complex, real-world texts such as those related to Brexit.
Enhanced sentiment analysis based on improved word embeddings and XGboost IJECEIAES
Sentiment analysis is a well-known and rapidly expanding study topic in natural language processing (NLP) and text classification. This approach has evolved into a critical component of many applications, including politics, business, advertising, and marketing. Most current research focuses on obtaining sentiment features through lexical and syntactic analysis. Word embeddings explicitly express these characteristics. This article proposes a novel method, improved words vector for sentiments analysis (IWVS), using XGboost to improve the F1-score of sentiment classification. The proposed method constructed sentiment vectors by averaging the word embeddings (Sentiment2Vec). We also investigated the Polarized lexicon for classifying positive and negative sentiments. The sentiment vectors formed a feature space to which the examined sentiment text was mapped to. Those features were input into the chosen classifier (XGboost). We compared the F1-score of sentiment classification using our method via different machine learning models and sentiment datasets. We compare the quality of our proposition to that of baseline models, term frequency-inverse document frequency (TF-IDF) and Doc2vec, and the results show that IWVS performs better on the F1-measure for sentiment classification. At the same time, XGBoost with IWVS features was the best model in our evaluation.
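Averaging word embeddings into one sentence-level vector, as the abstract above describes for its sentiment vectors, can be sketched in a few lines. The two-dimensional toy embeddings and the function name `sentence_vector` are hypothetical; the abstract does not specify the embedding model or how unknown words are handled, so skipping them is an assumption here.

```python
def sentence_vector(tokens, embeddings, dim):
    """Average the embeddings of known tokens; unknown tokens are skipped."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim  # assumption: all-zero vector when no token is known
    # zip(*vectors) groups the values of each dimension together.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Hypothetical 2-dimensional embeddings, for illustration only.
toy_embeddings = {"good": [1.0, 0.0], "film": [0.0, 1.0]}
print(sentence_vector(["good", "film", "xyz"], toy_embeddings, dim=2))
```

The resulting fixed-length vector is what a downstream classifier (XGBoost in the abstract) would receive as its feature input.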
Graph embedding approach to analyze sentiments on cryptocurrencyIJECEIAES
This paper presents a comprehensive exploration of graph embedding techniques for sentiment analysis. The objective of this study is to enhance the accuracy of sentiment analysis models by leveraging the rich contextual relationships between words in text data. We investigate the application of graph embedding in the context of sentiment analysis, focusing on its effectiveness in capturing the semantic and syntactic information of text. By representing text as a graph and employing graph embedding techniques, we aim to extract meaningful insights and improve the performance of sentiment analysis models. To achieve our goal, we conduct a thorough comparison of graph embedding with traditional word embedding and simple embedding layers. Our experiments demonstrate that the graph embedding model outperforms these conventional models in terms of accuracy, highlighting its potential for sentiment analysis tasks. Furthermore, we address two limitations of graph embedding techniques: handling out-of-vocabulary words and incorporating sentiment shift over time. The findings of this study emphasize the significance of graph embedding techniques in sentiment analysis, offering valuable insights into sentiment analysis within various domains. The results suggest that graph embedding can capture intricate relationships between words, enabling a more nuanced understanding of the sentiment expressed in text data.
Opinion mining of movie reviews at document levelijitjournal
The whole world is changing rapidly, and with current technologies the Internet has become an essential
need for everyone. The web is used in every field. Most people use the web for common purposes like
online shopping, chatting, etc. During online shopping, a large number of reviews/opinions are given by
users that reflect whether the product is good or bad. These reviews need to be explored, analysed, and
organized for better decision making. Opinion mining is a natural language processing task that deals
with finding the orientation of opinion in a piece of text with respect to a topic. In this paper a document-based
opinion mining system is proposed that classifies documents as positive, negative, or neutral.
Negation is also handled in the proposed system. Experimental results using reviews of movies show the
effectiveness of the system.
Review of Sentiment Analysis: An Hybrid Approach IIJSRJournal
Sentiment analysis is acknowledged as detecting thoughts used from field content features additionally it's recognized while one linked to the main parts of standpoint extraction. Through this type of process, we will be able to discover if a movie script is positive, negative, or natural. Using this research, a feeling examination is executed along with calvados data. The text message sensation analyzer combines organic and natural language processing (NLP) and even machine studying techniques to provide measured assessment rankings to be able to entities, subjects, themes, and groups in a term or key phrase. Inside expressing feelings, the particular polarity of calvados written content reviews can always be graded for the damaging to good range utilizing the education algorithm. The certain current decade presents seen substantial improvements in artificial brains; along with the device mastering revolution offers converted the complete AI sector. In the end, unit learning techniques include grown to always be an important aspect of any design and style in today's absorbing world. However, this ensemble of researching techniques promises for anyone who is part of motorization using the removal of common regulations for textual written content message and sentiment category activities. This kind of particular thesis has to style and carry out a good superior functionality matrix employing ensemble studying intended for sentiment category while well as software. With this paper, we possess analyzed the well-known techniques adopted within the classical Emotion Analysis problem associated with analyzing Elections evaluations like; Support Vector Machine (SVM) and Linear Regression (LR) for the effective detection of sentiments from the dataset obtained from the Kaggle machine learning repository.
Similar to Opinion mining on newspaper headlines using SVM and NLP (20)
Bibliometric analysis highlighting the role of women in addressing climate ch...IJECEIAES
Fossil fuel consumption has increased quickly, contributing to climate change
that is evident in unusual flooding and droughts, and global warming. Over
the past ten years, women's involvement in society has grown dramatically,
and they have succeeded in playing a noticeable role in reducing climate change.
A bibliometric analysis of data from the last ten years has been carried out to
examine the role of women in addressing climate change. The analysis's
findings are discussed in relation to the sustainable development goals (SDGs),
particularly SDG 7 and SDG 13. The results considered contributions made
by women in the various sectors while taking geographic dispersion into
account. The bibliometric analysis delves into topics including women's
leadership in environmental groups, their involvement in policymaking, their
contributions to sustainable development projects, and the influence of
gender diversity on attempts to mitigate climate change. This study's results
highlight how women have influenced policies and actions related to climate
change, point out areas of research deficiency, and offer recommendations on how
to increase the role of women in addressing climate change and
achieving sustainability. To achieve more successful results, this initiative
aims to highlight the significance of gender equality and encourage
inclusivity in climate change decision-making processes.
Voltage and frequency control of microgrid in presence of micro-turbine inter...IJECEIAES
The active and reactive load changes have a significant impact on voltage
and frequency. In this paper, in order to stabilize the microgrid (MG) against
load variations in islanding mode, the active and reactive power of all
distributed generators (DGs), including energy storage (battery), diesel
generator, and micro-turbine, are controlled. The micro-turbine generator is
connected to MG through a three-phase to three-phase matrix converter, and
the droop control method is applied for controlling the voltage and
frequency of MG. In addition, a method is introduced for voltage and
frequency control of micro-turbines in the transition state from grid-connected mode to islanding mode. A novel switching strategy of the matrix
converter is used for converting the high-frequency output voltage of the
micro-turbine to the grid-side frequency of the utility system. Moreover,
using the switching strategy, the low-order harmonics in the output current
and voltage are not produced, and consequently, the size of the output filter
would be reduced. In fact, the suggested control strategy is load-independent
and has no frequency conversion restrictions. The proposed approach for
voltage and frequency regulation demonstrates exceptional performance and
favorable response across various load alteration scenarios. The suggested
strategy is examined in several scenarios in the MG test systems, and the
simulation results are addressed.
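The droop control method mentioned above is commonly written as a pair of linear relations: frequency falls as active power rises, and voltage falls as reactive power rises. The sketch below shows that textbook form only. The nominal values and droop coefficients are hypothetical placeholders, not the settings used in the paper.

```python
def droop_frequency(p_out, p_ref, f_nom=50.0, m_p=0.0005):
    """P-f droop: frequency setpoint (Hz) drops linearly with extra active power (W).

    f_nom and m_p are illustrative assumptions.
    """
    return f_nom - m_p * (p_out - p_ref)


def droop_voltage(q_out, q_ref, v_nom=400.0, n_q=0.002):
    """Q-V droop: voltage setpoint (V) drops linearly with extra reactive power (var).

    v_nom and n_q are illustrative assumptions.
    """
    return v_nom - n_q * (q_out - q_ref)


# A load step of +1 kW above the reference lowers the frequency setpoint.
print(droop_frequency(p_out=2000.0, p_ref=1000.0))
print(droop_voltage(q_out=500.0, q_ref=0.0))
```

Because each generator applies the same kind of rule locally, load changes are shared among the units without a communication link, which is the usual motivation for droop control in islanded operation.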
Enhancing battery system identification: nonlinear autoregressive modeling fo...IJECEIAES
Precisely characterizing Li-ion batteries is essential for optimizing their
performance, enhancing safety, and prolonging their lifespan across various
applications, such as electric vehicles and renewable energy systems. This
article introduces an innovative nonlinear methodology for system
identification of a Li-ion battery, employing a nonlinear autoregressive with
exogenous inputs (NARX) model. The proposed approach integrates the
benefits of nonlinear modeling with the adaptability of the NARX structure,
facilitating a more comprehensive representation of the intricate
electrochemical processes within the battery. Experimental data collected
from a Li-ion battery operating under diverse scenarios are employed to
validate the effectiveness of the proposed methodology. The identified
NARX model exhibits superior accuracy in predicting the battery's behavior
compared to traditional linear models. This study underscores the
importance of accounting for nonlinearities in battery modeling, providing
insights into the intricate relationships between state-of-charge, voltage, and
current under dynamic conditions.
Smart grid deployment: from a bibliometric analysis to a surveyIJECEIAES
Smart grids are one of the last decades' innovations in electrical energy.
They bring relevant advantages compared to the traditional grid and
significant interest from the research community. Assessing the field's
evolution is essential to propose guidelines for facing new and future smart
grid challenges. In addition, knowing the main technologies involved in the
deployment of smart grids (SGs) is important to highlight possible
shortcomings that can be mitigated by developing new tools. This paper
contributes to the research trends mentioned above by focusing on two
objectives. First, a bibliometric analysis is presented to give an overview of
the current research level about smart grid deployment. Second, a survey of
the main technological approaches used for smart grid implementation and
their contributions are highlighted. To that effect, we searched the Web of
Science (WoS), and the Scopus databases. We obtained 5,663 documents
from WoS and 7,215 from Scopus on smart grid implementation or
deployment. With the extraction limitation in the Scopus database, 5,872 of
the 7,215 documents were extracted using a multi-step process. These two
datasets have been analyzed using a bibliometric tool called bibliometrix.
The main outputs are presented with some recommendations for future
research.
Use of analytical hierarchy process for selecting and prioritizing islanding ...IJECEIAES
One of the problems associated with power systems is the islanding
condition, which must be rapidly and properly detected to prevent any
negative consequences on the system's protection, stability, and security.
This paper offers a thorough overview of several islanding detection
strategies, which are divided into two categories: classic approaches,
including local and remote approaches, and modern techniques, including
techniques based on signal processing and computational intelligence.
Additionally, each approach is compared and assessed based on several
factors, including implementation costs, non-detected zones, declining
power quality, and response times using the analytical hierarchy process
(AHP). The multi-criteria decision-making analysis yields overall
weights of 24.7% for passive methods, 7.8% for active methods, 5.6% for hybrid methods,
14.5% for remote methods, 26.6% for signal processing-based methods,
and 20.8% for computational intelligence-based methods, based on the
comparison of all criteria together. Thus, it can be seen from the total weight
that hybrid approaches are the least suitable to be chosen, while signal
processing-based methods are the most appropriate islanding detection
method to be selected and implemented in power system with respect to the
aforementioned factors. Using Expert Choice software, the proposed
hierarchy model is studied and examined.
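To make the AHP step more concrete, the sketch below derives priority weights from a pairwise comparison matrix using the row geometric mean, a common approximation of the principal eigenvector. The 3x3 matrix is a hypothetical example, not the comparison data from the paper. The second part simply ranks the overall weights reported in the abstract.

```python
import math


def ahp_priorities(matrix):
    """Approximate AHP priority weights via normalized row geometric means."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical, perfectly consistent pairwise comparisons of three options.
pairwise = [
    [1.0, 2.0, 6.0],
    [1 / 2, 1.0, 3.0],
    [1 / 6, 1 / 3, 1.0],
]
print(ahp_priorities(pairwise))

# Overall weights (%) as reported in the abstract above.
reported = {
    "passive": 24.7,
    "active": 7.8,
    "hybrid": 5.6,
    "remote": 14.5,
    "signal processing-based": 26.6,
    "computational intelligence-based": 20.8,
}
ranking = sorted(reported, key=reported.get, reverse=True)
print(ranking)
```

The ranking of the reported weights places signal processing-based methods first and hybrid methods last, matching the conclusion stated in the abstract.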
Enhancing of single-stage grid-connected photovoltaic system using fuzzy logi...IJECEIAES
The power generated by photovoltaic (PV) systems is influenced by
environmental factors. This variability hampers the control and utilization of
solar cells' peak output. In this study, a single-stage grid-connected PV
system is designed to enhance power quality. Our approach employs fuzzy
logic in the direct power control (DPC) of a three-phase voltage source
inverter (VSI), enabling seamless integration of the PV connected to the
grid. Additionally, a fuzzy logic-based maximum power point tracking
(MPPT) controller is adopted, which outperforms traditional methods like
incremental conductance (INC) in enhancing solar cell efficiency and
minimizing the response time. Moreover, the inverter's real-time active and
reactive power is directly managed to achieve a unity power factor (UPF).
The system's performance is assessed through MATLAB/Simulink
implementation, showing marked improvement over conventional methods,
particularly in steady-state and varying weather conditions. For solar
irradiances of 500 and 1,000 W/m², the results show that the proposed
method reduces the total harmonic distortion (THD) of the injected current
to the grid by approximately 46% and 38% compared to conventional
methods, respectively. Furthermore, we compare the simulation results with
IEEE standards to evaluate the system's grid compatibility.
Enhancing photovoltaic system maximum power point tracking with fuzzy logic-b...IJECEIAES
Photovoltaic systems have emerged as a promising energy resource that
caters to the future needs of society, owing to their renewable, inexhaustible,
and cost-free nature. The power output of these systems relies on solar cell
radiation and temperature. In order to mitigate the dependence on
atmospheric conditions and enhance power tracking, a conventional
approach has been improved by integrating various methods. To optimize
the generation of electricity from solar systems, the maximum power point
tracking (MPPT) technique is employed. To overcome limitations such as
steady-state voltage oscillations and improve transient response, two
traditional MPPT methods, namely fuzzy logic controller (FLC) and perturb
and observe (P&O), have been modified. This research paper aims to
simulate and validate the step size of the proposed modified P&O and FLC
techniques within the MPPT algorithm using MATLAB/Simulink for
efficient power tracking in photovoltaic systems.
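For readers unfamiliar with perturb and observe, the following is a minimal sketch of the classic fixed-step version, not the modified technique proposed in the paper. The parabolic power curve `toy_power` with its peak at 17 V is a hypothetical stand-in for a real PV characteristic.

```python
def toy_power(voltage):
    """Hypothetical PV power curve (W) with a single maximum at 17 V."""
    return 100.0 - (voltage - 17.0) ** 2


def perturb_and_observe(power_fn, v_start, step, iterations):
    """Classic fixed-step P&O: keep perturbing in the direction that raised power."""
    voltage = v_start
    prev_power = power_fn(voltage)
    direction = 1.0
    for _ in range(iterations):
        voltage += direction * step       # perturb the operating voltage
        power = power_fn(voltage)         # observe the resulting power
        if power < prev_power:
            direction = -direction        # power dropped: reverse the perturbation
        prev_power = power
    return voltage


final_v = perturb_and_observe(toy_power, v_start=10.0, step=0.5, iterations=50)
print(final_v)
```

With a fixed step, the operating point never settles: it keeps stepping back and forth within one step of the maximum. That residual oscillation is the steady-state behaviour the abstract says the modified step size is meant to reduce.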
Adaptive synchronous sliding control for a robot manipulator based on neural ...IJECEIAES
Robot manipulators have become important equipment in production lines, medical fields, and transportation. Improving the quality of trajectory tracking for
robot hands is always an attractive topic in the research community. This is a
challenging problem because robot manipulators are complex nonlinear systems
and are often subject to fluctuations in loads and external disturbances. This
article proposes an adaptive synchronous sliding control scheme to improve trajectory tracking performance for a robot manipulator. The proposed controller
ensures that the positions of the joints track the desired trajectory, synchronizes
the errors, and significantly reduces chattering. First, the synchronous tracking
errors and synchronous sliding surfaces are presented. Second, the synchronous
tracking error dynamics are determined. Third, a robust adaptive control law is
designed, the unknown components of the model are estimated online by the neural network, and the parameters of the switching elements are selected by fuzzy
logic. The built algorithm ensures that the tracking and approximation errors
are ultimately uniformly bounded (UUB). Finally, the effectiveness of the constructed algorithm is demonstrated through simulation and experimental results.
Simulation and experimental results show that the proposed controller is effective with small synchronous tracking errors, and the chattering phenomenon is
significantly reduced.
Remote field-programmable gate array laboratory for signal acquisition and de...IJECEIAES
A remote laboratory utilizing field-programmable gate array (FPGA) technologies enhances students’ learning experience anywhere and anytime in embedded system design. Existing remote laboratories prioritize hardware access and visual feedback for observing board behavior after programming, neglecting comprehensive debugging tools to resolve errors that require internal signal acquisition. This paper proposes a novel remote embedded system design approach targeting FPGA technologies that is fully interactive via a web-based platform. Our solution provides FPGA board access and debugging capabilities beyond the visual feedback provided by existing remote laboratories. We implemented a lab module that users can seamlessly incorporate into their FPGA design. The module minimizes hardware resource utilization while enabling the acquisition of a large number of data samples from the signal during the experiments by adaptively compressing the signal prior to data transmission. The results demonstrate an average compression ratio of 2.90 across three benchmark signals, indicating efficient signal acquisition and effective debugging and analysis. This method allows users to acquire more data samples than conventional methods. The proposed lab allows students to remotely test and debug their designs, bridging the gap between theory and practice in embedded system design.
Detecting and resolving feature envy through automated machine learning and m...IJECEIAES
Efficiently identifying and resolving code smells enhances software project quality. This paper presents a novel solution, utilizing automated machine learning (AutoML) techniques, to detect code smells and apply move method refactoring. By evaluating code metrics before and after refactoring, we assessed its impact on coupling, complexity, and cohesion. Key contributions of this research include a unique dataset for code smell classification and the development of models using AutoGluon for optimal performance. Furthermore, the study identifies the top 20 influential features in classifying feature envy, a well-known code smell, stemming from excessive reliance on external classes. We also explored how move method refactoring addresses feature envy, revealing reduced coupling and complexity, and improved cohesion, ultimately enhancing code quality. In summary, this research offers an empirical, data-driven approach, integrating AutoML and move method refactoring to optimize software project quality. Insights gained shed light on the benefits of refactoring on code quality and the significance of specific features in detecting feature envy. Future research can expand to explore additional refactoring techniques and a broader range of code metrics, advancing software engineering practices and standards.
Smart monitoring technique for solar cell systems using internet of things ba...IJECEIAES
Rapid, remote monitoring of solar cell system status parameters (solar irradiance, temperature, and humidity) is critical to enhancing their efficiency. Hence, in the present article, an improved smart prototype of an internet of things (IoT) technique based on an embedded system using NodeMCU ESP8266 (ESP-12E) was implemented experimentally. Three different regions in Egypt (Luxor, Cairo, and El-Beheira) were chosen to study their solar irradiance profile, temperature, and humidity with the proposed IoT system. The monitoring data of solar irradiance, temperature, and humidity were visualized live by Ubidots through the hypertext transfer protocol (HTTP). The measured solar power radiation in Luxor, Cairo, and El-Beheira ranged between 216-1000, 245-958, and 187-692 W/m², respectively, during the solar day. The accuracy and rapidity of obtaining monitoring results using the proposed IoT system make it a strong candidate for application in monitoring solar cell systems. On the other hand, the obtained solar power radiation results of the three considered regions strongly favor Luxor and Cairo as suitable places to build a solar cell system station rather than El-Beheira.
An efficient security framework for intrusion detection and prevention in int...IJECEIAES
Over the past few years, the internet of things (IoT) has advanced to connect billions of smart devices to improve quality of life. However, anomalies or malicious intrusions pose several security loopholes, leading to performance degradation and threats to data security in IoT operations. Therefore, IoT security systems must keep an eye on and restrict unwanted events from occurring in the IoT network. Recently, various technical solutions based on machine learning (ML) models have been derived for identifying and restricting unwanted events in IoT. However, most ML-based approaches are prone to misclassification due to inappropriate feature selection. Additionally, most ML approaches applied to intrusion detection and prevention consider supervised learning, which requires a large amount of labeled data for training. Such complex datasets are impossible to source in a large network like IoT. To address this problem, the proposed study introduces an efficient learning mechanism to strengthen IoT security. The proposed algorithm incorporates supervised and unsupervised approaches to improve the learning models for intrusion detection and mitigation. Compared with related works, the experimental outcome shows that the model performs well on a benchmark dataset. It achieves an improved detection accuracy of approximately 99.21%.
Developing a smart system for infant incubators using the internet of things ...IJECEIAES
This research develops an incubator system that integrates the internet of things and artificial intelligence to improve care for premature babies. The system workflow starts with sensors that collect data from the incubator. Then, the data is sent in real time to the internet of things (IoT) broker Eclipse Mosquitto using the message queue telemetry transport (MQTT) protocol version 5.0. After that, the data is stored in a database for analysis using the long short-term memory (LSTM) network method and displayed in a web application using an application programming interface (API) service. The experiments produced 2,880 rows of data stored in the database. The correlation coefficient between the target attribute and other attributes ranges from 0.23 to 0.48. Next, several experiments were conducted to evaluate the model-predicted value on the test data. The best results are obtained using a two-layer LSTM configuration, each layer with 60 neurons, and a lookback setting of 6. This model produces an R² value of 0.934, with a root mean square error (RMSE) of 0.015 and a mean absolute error (MAE) of 0.008. In addition, the R² value was also evaluated for each attribute used as input, with values between 0.590 and 0.845.
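The lookback setting mentioned above controls how many past readings the LSTM sees for each prediction. The following is a minimal sketch of how a series is typically cut into lookback windows before training. The helper name `make_windows` and the dummy series are hypothetical, and the abstract does not describe the paper's actual preprocessing.

```python
def make_windows(series, lookback=6):
    """Split a series into (input window, next value) pairs for sequence models."""
    inputs, targets = [], []
    for i in range(len(series) - lookback):
        inputs.append(series[i:i + lookback])   # the previous `lookback` readings
        targets.append(series[i + lookback])    # the reading to be predicted
    return inputs, targets


# Dummy sensor readings, for illustration only.
readings = list(range(10))
windows, next_values = make_windows(readings, lookback=6)
print(len(windows), windows[0], next_values[0])
```

A series of N readings yields N minus lookback training pairs, so a larger lookback gives the model more context per sample but fewer samples overall.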
A review on internet of things-based stingless bee's honey production with im...IJECEIAES
Honey is produced exclusively by honeybees and stingless bees which both are well adapted to tropical and subtropical regions such as Malaysia. Stingless bees are known for producing small amounts of honey and are known for having a unique flavor profile. Problem identified that many stingless bees collapsed due to weather, temperature and environment. It is critical to understand the relationship between the production of stingless bee honey and environmental conditions to improve honey production. Thus, this paper presents a review on stingless bee's honey production and prediction modeling. About 54 previous research has been analyzed and compared in identifying the research gaps. A framework on modeling the prediction of stingless bee honey is derived. The result presents the comparison and analysis on the internet of things (IoT) monitoring systems, honey production estimation, convolution neural networks (CNNs), and automatic identification methods on bee species. It is identified based on image detection method the top best three efficiency presents CNN is at 98.67%, densely connected convolutional networks with YOLO v3 is 97.7%, and DenseNet201 convolutional networks 99.81%. This study is significant to assist the researcher in developing a model for predicting stingless honey produced by bee's output, which is important for a stable economy and food security.
A trust based secure access control using authentication mechanism for intero...IJECEIAES
The internet of things (IoT) is a revolutionary innovation in many aspects of our society including interactions, financial activity, and global security such as the military and battlefield internet. Due to the limited energy and processing capacity of network devices, security, energy consumption, compatibility, and device heterogeneity are the long-term IoT problems. As a result, energy and security are critical for data transmission across edge and IoT networks. Existing IoT interoperability techniques need more computation time, have unreliable authentication mechanisms that break easily, lose data easily, and have low confidentiality. In this paper, a key agreement protocol-based authentication mechanism for IoT devices is offered as a solution to this issue. This system makes use of information exchange, which must be secured to prevent access by unauthorized users. Using a compact contiki/cooja simulator, the performance and design of the suggested framework are validated. The simulation findings are evaluated based on detection of malicious nodes after 60 minutes of simulation. The suggested trust method, which is based on privacy access control, reduced packet loss ratio to 0.32%, consumed 0.39% power, and had the greatest average residual energy of 0.99 mJoules at 10 nodes.
Fuzzy linear programming with the intuitionistic polygonal fuzzy numbersIJECEIAES
In real world applications, data are subject to ambiguity due to several factors; fuzzy sets and fuzzy numbers propose a great tool to model such ambiguity. In case of hesitation, the complement of a membership value in fuzzy numbers can be different from the non-membership value, in which case we can model using intuitionistic fuzzy numbers as they provide flexibility by defining both a membership and a non-membership functions. In this article, we consider the intuitionistic fuzzy linear programming problem with intuitionistic polygonal fuzzy numbers, which is a generalization of the previous polygonal fuzzy numbers found in the literature. We present a modification of the simplex method that can be used to solve any general intuitionistic fuzzy linear programming problem after approximating the problem by an intuitionistic polygonal fuzzy number with n edges. This method is given in a simple tableau formulation, and then applied on numerical examples for clarity.
The performance of artificial intelligence in prostate magnetic resonance im...IJECEIAES
Prostate cancer is the predominant form of cancer observed in men worldwide. The application of magnetic resonance imaging (MRI) as a guidance tool for conducting biopsies has been established as a reliable and well-established approach in the diagnosis of prostate cancer. The diagnostic performance of MRI-guided prostate cancer diagnosis exhibits significant heterogeneity due to the intricate and multi-step nature of the diagnostic pathway. The development of artificial intelligence (AI) models, specifically through the utilization of machine learning techniques such as deep learning, is assuming an increasingly significant role in the field of radiology. In the realm of prostate MRI, a considerable body of literature has been dedicated to the development of various AI algorithms. These algorithms have been specifically designed for tasks such as prostate segmentation, lesion identification, and classification. The overarching objective of these endeavors is to enhance diagnostic performance and foster greater agreement among different observers within MRI scans for the prostate. This review article aims to provide a concise overview of the application of AI in the field of radiology, with a specific focus on its utilization in prostate MRI.
Seizure stage detection of epileptic seizure using convolutional neural networksIJECEIAES
According to the World Health Organization (WHO), seventy million individuals worldwide suffer from epilepsy, a neurological disorder. While electroencephalography (EEG) is crucial for diagnosing epilepsy and monitoring the brain activity of epilepsy patients, it requires a specialist to examine all EEG recordings to find epileptic behavior. This procedure needs an experienced doctor, and a precise epilepsy diagnosis is crucial for appropriate treatment. To identify epileptic seizures, this study employed a convolutional neural network (CNN) based on raw scalp EEG signals to discriminate between preictal, ictal, postictal, and interictal segments. The possibility of these characteristics is explored by examining how well timedomain signals work in the detection of epileptic signals using intracranial Freiburg Hospital (FH), scalp Children's Hospital Boston-Massachusetts Institute of Technology (CHB-MIT) databases, and Temple University Hospital (TUH) EEG. To test the viability of this approach, two types of experiments were carried out. Firstly, binary class classification (preictal, ictal, postictal each versus interictal) and four-class classification (interictal versus preictal versus ictal versus postictal). The average accuracy for stage detection using CHB-MIT database was 84.4%, while the Freiburg database's time-domain signals had an accuracy of 79.7% and the highest accuracy of 94.02% for classification in the TUH EEG database when comparing interictal stage to preictal stage.
Analysis of driving style using self-organizing maps to analyze driver behaviorIJECEIAES
Modern life is strongly associated with the use of cars, but the increase in acceleration speeds and their maneuverability leads to a dangerous driving style for some drivers. In these conditions, the development of a method that allows you to track the behavior of the driver is relevant. The article provides an overview of existing methods and models for assessing the functioning of motor vehicles and driver behavior. Based on this, a combined algorithm for recognizing driving style is proposed. To do this, a set of input data was formed, including 20 descriptive features: About the environment, the driver's behavior and the characteristics of the functioning of the car, collected using OBD II. The generated data set is sent to the Kohonen network, where clustering is performed according to driving style and degree of danger. Getting the driving characteristics into a particular cluster allows you to switch to the private indicators of an individual driver and considering individual driving characteristics. The application of the method allows you to identify potentially dangerous driving styles that can prevent accidents.
Hyperspectral object classification using hybrid spectral-spatial fusion and ...IJECEIAES
Because of its spectral-spatial and temporal resolution of greater areas, hyperspectral imaging (HSI) has found widespread application in the field of object classification. The HSI is typically used to accurately determine an object's physical characteristics as well as to locate related objects with appropriate spectral fingerprints. As a result, the HSI has been extensively applied to object identification in several fields, including surveillance, agricultural monitoring, environmental research, and precision agriculture. However, because of their enormous size, objects require a lot of time to classify; for this reason, both spectral and spatial feature fusion have been completed. The existing classification strategy leads to increased misclassification, and the feature fusion method is unable to preserve semantic object inherent features; This study addresses the research difficulties by introducing a hybrid spectral-spatial fusion (HSSF) technique to minimize feature size while maintaining object intrinsic qualities; Lastly, a soft-margins kernel is proposed for multi-layer deep support vector machine (MLDSVM) to reduce misclassification. The standard Indian pines dataset is used for the experiment, and the outcome demonstrates that the HSSF-MLDSVM model performs substantially better in terms of accuracy and Kappa coefficient.
Water scarcity is the lack of fresh water resources to meet the standard water demand. There are two types of water scarcity: physical and economic.
Cosmetic shop management system project report.pdfKamal Acharya
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's tough to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. It includes various function programs to do the above mentioned tasks.
Data file handling has been effectively used in the program.
The automated cosmetic shop management system should deal with the automation of the general workflow and administration process of the shop. The main processes of the system focus on the customer's request, where the system is able to search for the most appropriate products and deliver them to the customers. It should help the employees to quickly identify the list of cosmetic products that have reached the minimum quantity and also keep track of the expiry date of each cosmetic product. It should help the employees to find the rack number in which a product is placed. It is also a faster and more efficient way of working.
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdffxintegritypublishin
Advancements in technology unveil a myriad of electrical and electronic breakthroughs geared towards efficiently harnessing limited resources to meet human energy demands. The optimization of hybrid solar PV panels and pumped hydro energy supply systems plays a pivotal role in utilizing natural resources effectively. This initiative not only benefits humanity but also fosters environmental sustainability. The study investigated the design optimization of these hybrid systems, focusing on understanding solar radiation patterns, identifying geographical influences on solar radiation, formulating a mathematical model for system optimization, and determining the optimal configuration of PV panels and pumped hydro storage. Through a comparative analysis approach and eight weeks of data collection, the study addressed key research questions related to solar radiation patterns and optimal system design. The findings highlighted regions with heightened solar radiation levels, showcasing substantial potential for power generation and emphasizing the system's efficiency. Optimizing system design significantly boosted power generation, promoted renewable energy utilization, and enhanced energy storage capacity. The study underscored the benefits of optimizing hybrid solar PV panels and pumped hydro energy supply systems for sustainable energy usage. Optimizing the design of solar PV panels and pumped hydro energy supply systems as examined across diverse climatic conditions in a developing country, not only enhances power generation but also improves the integration of renewable energy sources and boosts energy storage capacities, particularly beneficial for less economically prosperous regions. Additionally, the study provides valuable insights for advancing energy research in economically viable areas. 
Recommendations included conducting site-specific assessments, utilizing advanced modeling tools, implementing regular maintenance protocols, and enhancing communication among system components.
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)MdTanvirMahtab2
This presentation is about the working procedure of Shahjalal Fertilizer Company Limited (SFCL). A Govt. owned Company of Bangladesh Chemical Industries Corporation under Ministry of Industries.
Student information management system project report ii.pdfKamal Acharya
Our project explains about the student management. This project mainly explains the various actions related to student details. This project shows some ease in adding, editing and deleting the student details. It also provides a less time consuming process for viewing, adding, editing and deleting the marks of the students.
2. IJECE ISSN: 2088-8708 Ì 2153
2. RELATED WORK
Agarwal et al. (4) proposed a method comprising two algorithms: the first is used for data
pre-processing and the second detects the polarity value of a word. The Natural Language Toolkit (NLTK) and
SentiWordNet are employed to build the proposed method. NLTK is a Python library for word tokenization,
part-of-speech (POS) tagging, lemmatization and stemming. SentiWordNet (5), a lexicon extended
from WordNet, is specifically designed for sentiment analysis. Each word is assigned positive or negative
numerical scores: negative words receive a negative score, while positive words receive a
positive score. The output of NLTK is fed to SentiWordNet to assign a numerical score to each word and
compute the final sentiment score, which is the sum of all the numerical scores. If the final value is greater than or equal
to 0, the headline is classified as positive; otherwise it is classified as negative.
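The scoring rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny `TOY_LEXICON` dictionary and the function name `headline_polarity` are hypothetical stand-ins for the SentiWordNet scores and the NLTK pre-processing, and the values are invented for the example.

```python
# Toy polarity lexicon (hypothetical values, not taken from SentiWordNet).
TOY_LEXICON = {"killed": -0.8, "bomb": -0.6, "wins": 0.7, "peace": 0.5}


def headline_polarity(headline, lexicon=TOY_LEXICON):
    """Sum per-word scores; a total >= 0 is labelled positive."""
    tokens = headline.lower().split()
    # Words missing from the lexicon contribute a neutral score of 0.
    total = sum(lexicon.get(token, 0.0) for token in tokens)
    label = "positive" if total >= 0 else "negative"
    return total, label


if __name__ == "__main__":
    print(headline_polarity("Two killed in car bomb"))
    print(headline_polarity("Team wins peace prize"))
```

Note that a headline with no lexicon words sums to 0 and is therefore labelled positive under this rule, which is one reason a purely lexicon-based score can be unreliable on short headlines.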
Yang et al. (6) proposed a hybrid model for analyzing the sentiment of textual data in a single domain.
It is implemented domain-wise because complexity increases upon segregation. A single classification model is
used to segregate the responses as positive, negative or neutral. The hybrid model combines multiple single
classification methods, which provides more efficient classification.
Rana and Singh (7) compared two machine learning algorithms, Linear SVM and Naive Bayes, for
sentiment analysis. The dataset consists of 1000 movie reviews and is pre-processed with the Porter
stemmer. The models are generated with the RapidMiner tool, using Linear SVM and Naive Bayes
as classifiers. Precision, recall and accuracy are calculated for both models. The results show
that Linear SVM performs better than Naive Bayes.
Bakshi et al. (8) proposed an approach to classify tweets from Twitter as positive, negative or neutral. It
focuses on a single company, Samsung Electronics Ltd. The collected data is first cleaned; the
algorithm is then applied to analyze the sentiment of the tweets and segregate them into the different categories.
Aroju et al. (9) proposed a method to perform opinion mining on three different newspapers covering
similar news, using SVM and Naive Bayes. Around 105 news headlines are collected
from three sources (35 headlines each from The Hindu, The Times of India and the Deccan Chronicle).
The data is processed using a POS tagger and stemming, and the method is implemented with the Weka tool. For
the experimental results, F-score, precision and recall are calculated. The results indicate that The Hindu
contains more positive news than The Times of India and the Deccan Chronicle.
Hasan et al. (10) proposed an algorithm that uses Naive Bayes to perform opinion mining. The data
are English-language reviews from the e-commerce website Amazon. The reviews are also translated into Bangla using
Google Translate, and opinion mining is performed on the reviews in both Bangla and English. The Bangla dataset
translated from English contains noise. The Bangla reviews are fed as training data into Naive Bayes to build
a classifier, excluding noisy words.
Akkineni et al. (11) proposed a method that classifies opinions based on the subject
of the opinion and the objective of holding it, which helps to determine whether a sentence is a fact or an
opinion. The approaches adopted for classification in that paper are: a heuristic approach, which produces results within a
realistic time-frame (heuristics can produce results on their own but are mostly used together with optimized
algorithms); discourse structure, which focuses on how the given text communicates a message and links
it to how that message constructs a social reality or view of the world; keyword analysis, which classifies text
by affect categories based on the presence of unambiguous affect words such as happy, sad, afraid and bored; and
concept analysis, which concentrates on the semantic analysis of text through the use of web ontologies or semantic
networks. The conceptual and affective information associated with natural language opinions is then aggregated.
Arora et al. (12) proposed Cross BOMEST, a cross-domain sentiment classification method. The existing
method, BOMEST, retrieves positive (+ve) words from the content and then determines the +ve words with the
assistance of MS Word Interop. To escalate the polarity, it replaces all of their synonyms. Moreover, it helps in
blending two different domains and detecting self-sufficient words. The proposed method is tested and implemented
on an Amazon dataset. A total of 1500 product reviews are randomly selected for both +ve and -ve polarity, of
which 1000 are used for training and the remainder for testing the classification model. When applied
across domains, precision and accuracy of 92% are achieved. For a single domain, the precision and recall of BOMEST are
improved by 16% and 7%, respectively. Thus, Cross BOMEST improves precision and accuracy by 5% compared
to other existing techniques.
Susanti et al. (13) employ the Multinomial Naïve Bayes Tree, which is a combination of Multinomial Naïve
Bayes and Decision Tree. The technique is used in data mining for the classification of raw data. The Multinomial
Naïve Bayes method is used specifically to address frequency calculation in the text of a sentence or document.
The documents used in that study are comments by Twitter users on GSM telecommunications providers
in Indonesia. The paper uses the method to categorize customer sentiment towards telecommunication
providers in Indonesia. Sentiment is limited to the positive, negative and neutral classes. The Decision
Tree generated with this method is rooted in the feature "aktif", where the probability of the feature "aktif" belongs
to the positive class. The results indicate that the highest classification accuracy of the Multinomial
Naïve Bayes Tree (MNBTree) method is 16.26% when using 145 features, while Multinomial
Naïve Bayes (MNB) yields its highest accuracy of 73.15% when using the full set of 1665 features.
Opinion mining on newspaper headlines using... (Chaudhary Jashubhai Rameshbhai)
Selecting appropriate features is one of the main challenges in this type of research. Many researchers
use decision trees and the n-gram approach for feature selection, and supervised machine learning techniques
for model building. The most tedious task in this type of research is data preprocessing, for which most researchers
use NLP tools (14), (15), (16), (17), (18). In this paper, n-grams and CoreNLP are used
for feature selection and Linear SVM is used for model building.
3. PROPOSED WORK
Many algorithms are available for finding sentiment in text, but they are typically applied to large text
datasets such as movie reviews and product reviews. Finding opinions in news headlines is also possible, but the
accuracy of the existing algorithm is not satisfactory. This paper tries to improve on the accuracy of the existing algorithm
(4) using a different approach. The proposed method is divided into three processes.
As shown in Figure 1, Process I covers data collection and pre-processing, Process II is the
core stage that builds the classifier, and Process III tests the classification model on test data. These processes are
discussed in detail below.
[Figure 1: Data Pre-processing → Model Building → Model Evaluating]
Figure 1. Process diagram
3.1. Data Pre-processing And Model Building
In this process, data pre-processing and model building are implemented. Figure 2 depicts the basic
flowchart of the data pre-processing and model building. Stop words are removed from the news headlines, and
uppercase text is then converted to lowercase. The semi-processed headlines are fed to CoreNLP. The output of
CoreNLP, together with the sentiment scores, is the input to Process II, where it is converted
into unigram and bi-gram representations. Model A is generated from the unigram and bi-gram representations
and employs Linear SVM. The data representation is further converted into Tf-idf, resulting in Models B and
C. Both employ Linear SVM; however, unlike Model B, Model C uses the SGD classifier to train on the
data.
[Figure 2 flowchart: News headlines dataset with sentiment score → Remove all stop words from headlines → Convert all headlines to lowercase → Provide lowercase headlines as input to coreNLP → Output: pre-processed news headlines with sentiment score → Unigram and bi-gram representation of data → SVM (Model A); Use Tf-idf for term frequency → SVM (Model B) and SVM + SGD (Model C)]
Figure 2. Flow diagram of data pre-processing and model building
IJECE, Vol. 9, No. 3, June 2019 : 2152 – 2163
Data are collected from the website http://www.indianexpress.com during August 2017. Figure 3
depicts a sample of the unprocessed data. 1472 news headlines are collected and manually classified as either +1 or
-1: the sentiment score is +1 for positive news and -1 for negative news. After the
sentiment scores are allocated, all stop words such as "is", "the" and "are" are eliminated from the headlines, using the
NLTK stop-word list. Because the algorithm is case sensitive, all the headlines
are converted to lowercase. The data is then fed to the Stanford CoreNLP dependency parser. The Stanford
CoreNLP dependency parser (19) analyzes the grammatical construction of a sentence and establishes the relations
between the "ROOT" word and the words that modify it. To understand how CoreNLP works, consider the news headline "Two
killed in car bomb in Iraq Kirkuk". Figure 4 depicts the parsing of this sample using the dependency parser, which returns the "ROOT"
word of the headline and the relations between the words of the headline. For the sample data, "killed"
is returned as the "ROOT" word. Figure 5 shows the parsing of the sample headline
using the dependency parser and the conversion of the output into a string array, which consists of the "ROOT"
word and its relative words. The process is applied to all the data to generate arrays of strings with the "ROOT" word
and relative words. Figure 6 is a statistical representation of the frequency of words in the data. Figure 7 depicts sample
data with sentiment scores. The processed data are the input for Process II.
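The first two pre-processing steps (stop-word removal and lowercasing) can be sketched as follows. This is a minimal illustration, not the authors' code: the small `STOP_WORDS` set and the function name `preprocess` are assumptions made for the example, whereas the paper uses the NLTK stop-word list. The dependency-parsing step is not shown, since it requires a running CoreNLP installation.

```python
# Small illustrative stop-word set (an assumption; the paper uses NLTK's list).
STOP_WORDS = {"is", "the", "are", "in", "a", "an", "of", "to"}


def preprocess(headline, stop_words=STOP_WORDS):
    """Drop stop words, then lowercase the remaining words."""
    # Compare case-insensitively so that "The" is removed as well as "the".
    kept = [w for w in headline.split() if w.lower() not in stop_words]
    return " ".join(kept).lower()


if __name__ == "__main__":
    print(preprocess("Two killed in car bomb in Iraq Kirkuk"))
```

For the sample headline this yields "two killed car bomb iraq kirkuk", which would then be passed to the dependency parser.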
Figure 3. Dataset screenshot
Figure 4. Dependency parser working diagram
Figure 5. Generating pandas dataframe using coreNLP
Figure 6. Most frequently occurred words in dataset
Figure 7. Sample dataset after applying coreNLP
3.2. Unigram and Bi-gram Representation of Data
Before building the model, the raw data is converted from strings to numerical values. Text analysis is a key
application area of machine learning, but most algorithms accept numerical data of fixed
size rather than text data of varying size. A common approach uses a document-term vector, in which each
document is encoded as a discrete vector that counts the occurrences of each word of the vocabulary in that document
(3). For example, consider two one-sentence documents:
D1: "I like Google Machine Learning course"
D2: "Machine Learning is awesome"
The vocabulary is V = { I, like, Google, Machine, Learning, course, is, awesome }, and the two documents
can be encoded as v1 and v2. Figures 8 and 9 show the representation of the given sentences in the unigram and bi-gram
models. The bi-gram model refines the data representation: occurrences are counted for sequences of two
words rather than for individual words.
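The encoding of D1 and D2 can be reproduced with the standard library alone. The sketch below is illustrative: the helper names `bigrams` and `count_vector` are hypothetical, the documents are split on whitespace without lowercasing, and the column order of the bi-gram vocabulary is simply alphabetical, so the layout may differ from Figures 8 and 9.

```python
from collections import Counter

D1 = "I like Google Machine Learning course"
D2 = "Machine Learning is awesome"

# Unigram vocabulary, in the order given in the text.
V = ["I", "like", "Google", "Machine", "Learning", "course", "is", "awesome"]


def bigrams(tokens):
    """Return adjacent word pairs, e.g. ['a', 'b', 'c'] -> ['a b', 'b c']."""
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def count_vector(terms, vocabulary):
    """Count how often each vocabulary entry occurs among the terms."""
    counts = Counter(terms)
    return [counts[entry] for entry in vocabulary]


# Unigram document-term vectors.
v1 = count_vector(D1.split(), V)
v2 = count_vector(D2.split(), V)

# Bi-gram vocabulary: every adjacent word pair seen in either document.
V2 = sorted(set(bigrams(D1.split()) + bigrams(D2.split())))
b1 = count_vector(bigrams(D1.split()), V2)
b2 = count_vector(bigrams(D2.split()), V2)

if __name__ == "__main__":
    print(v1)  # [1, 1, 1, 1, 1, 1, 0, 0]
    print(v2)  # [0, 0, 0, 1, 1, 0, 1, 1]
    print(V2)
    print(b1, b2)
```

The only bi-gram shared by the two documents is "Machine Learning", whereas the unigram vectors overlap on both "Machine" and "Learning" separately.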
Figure 8. Unigram representation of given vocabulary
Figure 9. Bi-gram representation of given vocabulary
Figure 10 shows a code snippet that generates the unigram and bi-gram representations of given data. Here, data is
an array of 4 news headlines, and CountVectorizer() is used to convert the string data into numeric values.
To generate the unigram model, pass the argument ngram_range = (1, 1); for the bi-gram model, pass ngram_range = (1, 2).
The figure shows two matrices, displayed using the pandas library. In each matrix, the columns represent unique terms
(31 for unigram and 61 for bi-gram) and the rows represent the 4 news headlines. If a word occurs once in a particular
news headline, the value of that feature is 1; if the word occurs twice, the value of that
feature is 2. The value thus depends on the frequency of the word in the headline. This method is used when a model is built
in Process II.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Four sample news headlines
data = [
    'Coal Burying Goa: What the toxic train leaves in its wake',
    'Coal Burying Goa: Lives touched by coal',
    'Coal burying Goa: Danger ahead, new coal corridor is coming up',
    'Goa mining: Supreme Court issues notices to Centre, state government'
]

# Unigram representation: each column counts one unique word
clf = CountVectorizer(ngram_range=(1, 1))
df = clf.fit_transform(data).toarray()
pd.DataFrame(df)

   0  1  2  3  ...
0  0  1  0  0  ...
1  0  1  1  0  ...
2  1  1  0  0  ...
3  0  0  0  1  ...
4 rows × 31 columns

# Bi-gram representation: columns count single words and two-word sequences
clf = CountVectorizer(ngram_range=(1, 2))
df = clf.fit_transform(data).toarray()
pd.DataFrame(df)

   0  1  2  3  ...
0  0  1  0  0  ...
1  0  1  1  0  ...
2  1  1  0  0  ...
3  0  0  0  1  ...
4 rows × 61 columns

Figure 10. Python code snippet to generate unigram and bi-gram representations from given string data.
3.3. Model A. Linear SVM
In this process, Linear SVM is used to build the model, and the data used to build the
model is numeric. The description of the total dataset is shown in Table 1. The matrices in Figures 11 and 12
have been generated using the unigram and bi-gram representations; each column represents a word in the headlines and each row
represents a headline. A value of 0 in the matrix means that the particular word does not occur in that
headline. In the unigram model, the number of features depends on the total number of unique words in the dataset.
Table 1. Dataset Description
            Total Sample    Total Feature
Unigram     1472            4497
Bi-gram     1472            13832
Figure 11. Dataset representation in Unigram model.
Figure 12. Dataset representation in Bi-gram model.
Table 1 shows the total number of features and samples for the unigram and bi-gram models. The total sample size is
the total number of news headlines, and the total feature size is the number of unique terms in the dataset. The
sample size is the same for both models, but the number of features is 4497 for the unigram model and 13832 for the
bi-gram model. To build this model, 80% of the data is used as the training set and 20% is used for
evaluating the model. The kernel is linear because there are two class labels, so the SVM generates a linear
hyperplane that separates the words of negative news headlines from the words of positive news
headlines.
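The 80/20 split can be illustrated with a short standard-library sketch. The function name `train_test_split_indices`, the fixed seed and the choice to round the test size up are assumptions made for the example; the text does not state how the split was drawn or rounded.

```python
import math
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into train and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Rounding the test size up is an assumption, not stated in the text.
    n_test = math.ceil(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


if __name__ == "__main__":
    train_idx, test_idx = train_test_split_indices(1472)
    print(len(train_idx), len(test_idx))
```

Under these assumptions, the 1472 headlines are divided into 1177 training samples and 295 evaluation samples.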
3.4. Model B. Tf-idf and Linear SVM
Linear SVM is also used to build this model, and the dataset is first re-weighted using
Tf-idf. Tf is term frequency, while Tf-idf (3) is term frequency times inverse document frequency. It is used
to classify the documents. The main aim of Tf-idf is to calculate the importance of a word in a given headline
relative to the overall occurrence of that word in the dataset. The importance of a word is high if it is frequent
in the headline but infrequent across the headlines overall. Tf-idf can be calculated as follows (3):
\[ \mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \times \mathrm{idf}(t) \tag{1} \]
Here tf(t, d) is the term frequency, i.e. the number of times term t occurs in a particular headline d; it
is multiplied by idf(t), defined as:
\[ \mathrm{idf}(t) = 1 + \log \frac{1 + n_d}{1 + \mathrm{df}(d, t)} \tag{2} \]
Where nd is the total number of headlines, and df(d,t) is the number of headlines that contain term t. The
resulting Tf-idf vectors are then normalized by the Euclidean norm:
\[ v_{norm} = \frac{v}{\|v\|_2} = \frac{v}{\sqrt{v_1^2 + v_2^2 + \dots + v_n^2}} \tag{3} \]
data_counts = [[3, 0, 1],
               [2, 0, 0],
               [3, 0, 0],
               [4, 0, 0],
               [3, 2, 0],
               [3, 0, 2]]
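For example, Tf-idf is computed for the terms of the first document in the data_counts array as follows,
first using the idf without smoothing, idf(t) = log(n_d / df(d, t)) + 1, where log is the natural logarithm.
Total number of documents: $n_d = 6$
Number of documents that contain term 1: $\mathrm{df}(d, t)_{term1} = 6$
idf for term 1:
\[ \mathrm{idf}(t)_{term1} = \log \frac{n_d}{\mathrm{df}(d, t)_{term1}} + 1 = \log \frac{6}{6} + 1 = 1 \]
\[ \mathrm{tfidf}_{term1} = \mathrm{tf}_{term1} \times \mathrm{idf}_{term1} = 3 \times 1 = 3 \]
Similarly for the other two terms, which occur in 1 and 2 documents respectively:
\[ \mathrm{tfidf}_{term2} = 0 \times \left( \log \frac{6}{1} + 1 \right) = 0 \]
\[ \mathrm{tfidf}_{term3} = 1 \times \left( \log \frac{6}{2} + 1 \right) = 2.0986 \]
Representing Tf-idf as a vector:
\[ \mathrm{tfidf}_{raw} = [3, 0, 2.0986] \]
After applying the Euclidean norm:
\[ \frac{[3, 0, 2.0986]}{\sqrt{3^2 + 0^2 + 2.0986^2}} = [0.819, 0, 0.573] \]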
In idf(t), to avoid zero divisions, "smooth_idf=True" adds 1 to the numerator and the denominator, as in equation (2). With
this modification the values of the first two terms stay the same, but the value of term 3 changes to 1.8473:
\[ \mathrm{tfidf}_{term3} = 1 \times \left( \log \frac{7}{3} + 1 \right) = 1.8473 \]
\[ \frac{[3, 0, 1.8473]}{\sqrt{3^2 + 0^2 + 1.8473^2}} = [0.8515, 0, 0.5243] \]
Similarly, by calculating every value in the data_counts array, the final output will be:

from sklearn.feature_extraction.text import TfidfTransformer

# smooth_idf=True is the default setting of TfidfTransformer
Tfidf = TfidfTransformer()
X = Tfidf.fit_transform(data_counts)
X.toarray()

The output is:

array([
    [0.8515, 0.0000, 0.5243],
    [1.0000, 0.0000, 0.0000],
    [1.0000, 0.0000, 0.0000],
    [1.0000, 0.0000, 0.0000],
    [0.5542, 0.8323, 0.0000],
    [0.6303, 0.0000, 0.7763]
])
The total sample size and feature size remain the same for building this model.
Here, the weight of each word is changed according to Tf-idf. Unlike in the previous model, words with lower
frequency in headlines have lower values, which means that words with lower frequency have less impact on the
model. This model uses 80% of the data for training and 20% for testing.
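The Tf-idf arithmetic of this section can be checked with a short standard-library sketch. The function name `tfidf_rows` is hypothetical; the code follows the smoothed idf of equation (2) when `smooth=True` and the unsmoothed form log(n_d / df) + 1 otherwise, and it assumes that every term occurs in at least one document and that no row is all zeros.

```python
import math

data_counts = [[3, 0, 1], [2, 0, 0], [3, 0, 0], [4, 0, 0], [3, 2, 0], [3, 0, 2]]


def tfidf_rows(counts, smooth=True):
    """Return L2-normalised tf-idf rows for a matrix of raw term counts."""
    n_docs = len(counts)
    n_terms = len(counts[0])
    # df[j] = number of documents in which term j occurs at least once.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    if smooth:
        idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    else:
        # Assumes df > 0 for every term; otherwise this would divide by zero.
        idf = [math.log(n_docs / d) + 1 for d in df]
    result = []
    for row in counts:
        raw = [tf * weight for tf, weight in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in raw))
        result.append([x / norm for x in raw])
    return result


if __name__ == "__main__":
    for row in tfidf_rows(data_counts, smooth=True):
        print([round(x, 4) for x in row])
    print([round(x, 3) for x in tfidf_rows(data_counts, smooth=False)[0]])
```

The first smoothed row rounds to [0.8515, 0.0, 0.5243] and the first unsmoothed row to [0.819, 0.0, 0.573], in agreement with the worked example.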
3.5. Model C. Stochastic Gradient Descent (SGD) Classifier
In this model, SGD is used to train the Linear SVM. SGD [3] is a simple and very efficient approach to the discriminative learning of linear classifiers such as SVM and Logistic Regression. SGD has been applied effectively to numeric machine learning problems, mainly in text categorization and NLP. Because the data provided is sparse, the classifiers in SGD scale efficiently to problems with more than $10^5$ training samples and more than $10^5$ attributes. The major advantage of SGD is that it can handle large datasets. This research uses a small dataset, but the approach can be extended to up to $10^5$ features.
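To illustrate how SGD trains a linear SVM, the sketch below minimises the L2-regularised hinge loss one sample at a time in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the function names, the learning rate, the regularisation strength, the epoch count and the toy count vectors are all hypothetical. In scikit-learn [3], the corresponding estimator is SGDClassifier with the hinge loss; the parameters used in the paper are not stated.

```python
import random


def train_linear_svm_sgd(X, y, lr=0.1, lam=0.01, epochs=100, seed=0):
    """Fit weights w and bias b by SGD on the L2-regularised hinge loss.

    Labels in y must be -1 or +1. All hyperparameters are illustrative.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new random order each epoch
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            margin = y[i] * score
            # Gradient step for the regulariser: shrink the weights slightly.
            w = [wj * (1.0 - lr * lam) for wj in w]
            # The hinge loss contributes only when the sample violates the margin.
            if margin < 1.0:
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable count vectors (3 features) with labels.
X = [[2, 0, 0], [1, 0, 1], [3, 0, 0], [0, 2, 0], [0, 1, 1], [0, 3, 0]]
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm_sgd(X, y)
predictions = [predict(w, b, x) for x in X]
print(predictions)
```

Each update touches only one sample, so the cost per step does not grow with the size of the dataset, which is why the approach suits large, sparse feature matrices.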
4. MODEL EVALUATION
The results of the three models are compared using the confusion matrix as the performance metric. Table 2 describes the structure of the confusion matrix, and the formula for the accuracy of a model is given in eq. 4. Tables 3, 4 and 5 are the confusion matrices of Models A, B and C, and Table 6 shows the accuracy scores of the three models. Table 6 implies that the bi-gram model gives more accurate results than the unigram model. However, the unigram model has fewer features than the bi-gram model, so it takes less time to build. The accuracy of Model B is higher than that of Model A because it is trained with Tf-idf. As the feature size increases (>20000), Models A and B will not provide feasible solutions. To overcome this issue, Model C is introduced in this paper. It is trained using SGD, which supports up to $10^5$ features [3] for building a model. Thus Model C can be used when the feature size is large, whereas Model B works well when the feature size is small.
IJECE, Vol. 9, No. 3, June 2019 : 2152 – 2163
[Figure 13. Models evaluation. The news headlines dataset without sentiment score is passed to Models A, B and C; a confusion matrix is generated for each model, giving the result of Model A, Model B and Model C.]
Table 2. Confusion Matrix
                 PREDICTED TRUE   PREDICTED FALSE
ACTUAL TRUE      TP               FN
ACTUAL FALSE     FP               TN
\[ \text{Accuracy Score} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\,\% \tag{4} \]
Table 3. Model A (Linear SVM) Confusion Matrix
Unigram Model
                 PREDICTED TRUE   PREDICTED FALSE
ACTUAL TRUE      223              29
ACTUAL FALSE     9                34
Bi-gram Model
                 PREDICTED TRUE   PREDICTED FALSE
ACTUAL TRUE      228              27
ACTUAL FALSE     4                36
Table 4. Model B (Linear SVM + Tf-idf) Confusion Matrix
Unigram Model
                 PREDICTED TRUE   PREDICTED FALSE
ACTUAL TRUE      227              22
ACTUAL FALSE     5                41
Bi-gram Model
                 PREDICTED TRUE   PREDICTED FALSE
ACTUAL TRUE      228              21
ACTUAL FALSE     4                42
Table 5. Model C (SGD) Confusion Matrix
Unigram Model
                 PREDICTED TRUE   PREDICTED FALSE
ACTUAL TRUE      218              34
ACTUAL FALSE     14               29
Bi-gram Model
                 PREDICTED TRUE   PREDICTED FALSE
ACTUAL TRUE      226              29
ACTUAL FALSE     6                34
Table 6. Accuracy Comparison between Different Models
           Model A (Linear SVM)   Model B (Tf-idf + Linear SVM)   Model C (SGD)
Unigram    87.11 %                90.84 %                         83.72 %
Bi-gram    89.49 %                91.52 %                         88.13 %
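The accuracy scores in Table 6 can be re-derived from the confusion matrices in Tables 3 to 5 using eq. 4. The sketch below does this in plain Python; the names CONFUSION, accuracy_percent and truncate are hypothetical. The published values match when the percentages are truncated, rather than rounded, to two decimals, so the sketch truncates as well.

```python
import math

# (TP, FN, FP, TN) copied from Tables 3 to 5, laid out as in Table 2.
CONFUSION = {
    ("Model A", "Unigram"): (223, 29, 9, 34),
    ("Model A", "Bi-gram"): (228, 27, 4, 36),
    ("Model B", "Unigram"): (227, 22, 5, 41),
    ("Model B", "Bi-gram"): (228, 21, 4, 42),
    ("Model C", "Unigram"): (218, 34, 14, 29),
    ("Model C", "Bi-gram"): (226, 29, 6, 34),
}


def accuracy_percent(tp, fn, fp, tn):
    """Accuracy score of eq. 4, expressed as a percentage."""
    return (tp + tn) / (tp + tn + fp + fn) * 100


def truncate(value, digits=2):
    """Drop everything after the given number of decimals without rounding."""
    factor = 10 ** digits
    return math.floor(value * factor) / factor


accuracies = {
    key: truncate(accuracy_percent(*cells)) for key, cells in CONFUSION.items()
}
for (model, ngram), score in accuracies.items():
    print(f"{model} {ngram}: {score:.2f} %")
```

Every matrix sums to 295 test headlines, so the six scores are directly comparable.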
5. CONCLUSION AND FUTURE SCOPE
Opinion mining is a vast research domain which can be implemented using Neural Networks, NLP, etc. The proposed method is implemented and tested on national and international news headlines from many different regions. The proposed method outperforms the existing model. However, it cannot handle sarcasm.
As future work, depending on data accessibility, the proposed method can be applied to social media data such as Twitter and Facebook. In the public sector, the proposed method can also be used by the government to analyze the impact of a policy in a particular region. The proposed method can be altered to handle sarcasm in processes I and II in order to increase the accuracy of the model.
REFERENCES
[1] D. Dor, “On newspaper headlines as relevance optimizers,” Journal of Pragmatics, vol. 35, no. 5, pp. 695–721,
2003. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0378216602001340
[2] C. D. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. J. Bethard, and D. McClosky, “The Stanford CoreNLP
natural language processing toolkit,” in Association for Computational Linguistics (ACL) System
Demonstrations, 2014, pp. 55–60. [Online]. Available: http://www.aclweb.org/anthology/P/P14/P14-5010
[3] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer,
R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay,
“Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–
2830, 2011.
[4] A. Agarwal, V. Sharma, G. Sikka, and R. Dhir, “Opinion mining of news headlines using sentiwordnet,”
in Colossal Data Analysis and Networking (CDAN), Symposium on. IEEE, 2016, pp. 1–5.
[5] S. Baccianella, A. Esuli, and F. Sebastiani, “Sentiwordnet 3.0: An enhanced lexical resource for sentiment
analysis and opinion mining.” in LREC, vol. 10, 2010, pp. 2200–2204.
[6] K. Yang, Y. Cai, D. Huang, J. Li, Z. Zhou, and X. Lei, “An effective hybrid model for opinion mining and
sentiment analysis,” in Big Data and Smart Computing (BigComp), 2017 IEEE International Conference
on. IEEE, 2017, pp. 465–466.
[7] S. Rana and A. Singh, “Comparative analysis of sentiment orientation using svm and naive bayes techniques,”
in Next Generation Computing Technologies (NGCT), 2016 2nd International Conference on.
IEEE, 2016, pp. 106–111.
[8] R. K. Bakshi, N. Kaur, R. Kaur, and G. Kaur, “Opinion mining and sentiment analysis,” in Computing for
Sustainable Global Development (INDIACom), 2016 3rd International Conference on. IEEE, 2016, pp.
452–455.
[9] S. Aroju, V. Bhargavi, S. Namburi et al., “Opinion mining on indian newspaper quotations,” in Signal
Processing and Integrated Networks (SPIN), 2016 3rd International Conference on. IEEE, 2016, pp.
749–752.
[10] K. A. Hasan, M. S. Sabuj, and Z. Afrin, “Opinion mining using naïve bayes,” in Electrical and Computer
Engineering (WIECON-ECE), 2015 IEEE International WIE Conference on. IEEE, 2015, pp. 511–514.
[11] H. Akkineni, P. Lakshmi, and B. V. Babu, “Online crowds opinion-mining it to analyze current trend:
A review,” International Journal of Electrical and Computer Engineering (IJECE), vol. 5, no. 5, pp.
1180–1187, 2015.
[12] P. Arora, D. Virmani, and P. Kulkarni, “An approach for big data to evolve the auspicious information
from cross-domains,” International Journal of Electrical and Computer Engineering (IJECE), vol. 7,
no. 2, pp. 967–974, 2017.
[13] A. R. Susanti, T. Djatna, and W. A. Kusuma, “Twitter’s sentiment analysis on gsm services using multinomial
naïve bayes.” Telkomnika, vol. 15, no. 3, 2017.
[14] J. Jotheeswaran and Y. Kumaraswamy, “Opinion mining using decision tree based feature selection
through manhattan hierarchical cluster measure.” Journal of Theoretical & Applied Information Technology,
vol. 58, no. 1, 2013.
[15] C. Bhadane, H. Dalal, and H. Doshi, “Sentiment analysis: measuring opinions,” Procedia Computer
Science, vol. 45, pp. 808–814, 2015.
[16] J. Kaur and S. Vashisht, “Analysis and indentifying variation in human emotion through data mining,”
International Journal of Computer Technology and Applications, vol. 3, no. 6, p. 1963, 2012.
[17] I. Smeureanu and C. Bucur, “Applying supervised opinion mining techniques on online user reviews,”
Informatica economica, vol. 16, no. 2, p. 81, 2012.
[18] K. Gao, H. Xu, and J. Wang, “A rule-based approach to emotion cause detection for chinese micro-blogs,”
Expert Systems with Applications, vol. 42, no. 9, pp. 4517–4528, 2015.
[19] D. Chen and C. Manning, “A fast and accurate dependency parser using neural networks,” in Proceedings
of the 2014 conference on empirical methods in natural language processing (EMNLP), 2014, pp. 740–750.