Nowadays Sentiment Analysis play an important Role in each field such as Stock market, product reviews, news article, political debates which help us to determining current trend in the market regarding specific product, event, issues. Here we are apply sentiment analysis on microblogging platforms such as twitter, Facebook which is used by different people to express their opinion with respect to different kind of foods in the field of home’schef. This paper explain different methods of text preprocessing and applies them with a naive Bayes classifier in a big data, distributed computing platform with the goal of creating a scalable sentiment analysis solution that can classify text into positive or negative categories. We apply negation handling, word n-grams, stemming, and feature selection to evaluate how different combinations of these pre-processing methods affect performance and efficiency.
The document discusses various applications of dimension reduction techniques to extract low-dimensional representations from high-dimensional data for purposes of prediction, descriptive analysis, and input into subsequent causal analysis. It provides examples of such applications using Google search data, genetic data, medical claims data, credit scores, online purchases, and congressional roll call votes. It also discusses issues around text as data, including bag-of-words representations and the use of automated and manual steps in text analysis.
The document presents an overview of probabilistic models for information retrieval. It discusses how probability theory can be applied to model the uncertain nature of retrieval, where queries only vaguely represent user needs and relevance is uncertain. The document outlines different probabilistic IR models including the classical probabilistic retrieval model, probability ranking principle, binary independence model, Bayesian networks, and language modeling approaches. It also describes datasets used to evaluate these models, including collections from TREC, Cranfield, and others. Basic probability theory concepts are reviewed, including joint probability, conditional probability, and rules relating probabilities.
Opinion mining on newspaper headlines using SVM and NLPIJECEIAES
Opinion Mining also known as Sentiment Analysis, is a technique or procedure which uses Natural Language processing (NLP) to classify the outcome from text. There are various NLP tools available which are used for processing text data. Multiple research have been done in opinion mining for online blogs, Twitter, Facebook etc. This paper proposes a new opinion mining technique using Support Vector Machine (SVM) and NLP tools on newspaper headlines. Relative words are generated using Stanford CoreNLP, which is passed to SVM using count vectorizer. On comparing three models using confusion matrix, results indicate that Tf-idf and Linear SVM provides better accuracy for smaller dataset. While for larger dataset, SGD and linear SVM model outperform other models.
Document Classification Using Expectation Maximization with Semi Supervised L...ijsc
As the amount of online document increases, the demand for document classification to aid the analysis and management of document is increasing. Text is cheap, but information, in the form of knowing what classes a document belongs to, is expensive. The main purpose of this paper is to explain the expectation maximization technique of data mining to classify the document and to learn how to improve the accuracy while using semi-supervised approach. Expectation maximization algorithm is applied with both supervised and semi-supervised approach. It is found that semi-supervised approach is more accurate and effective. The main advantage of semi supervised approach is “DYNAMICALLY GENERATION OF NEW CLASS”. The algorithm first trains a classifier using the labeled document and probabilistically classifies the
unlabeled documents. The car dataset for the evaluation purpose is collected from UCI repository dataset in which some changes have been done from our side.
This is an introduction to text analytics for advanced business users and IT professionals with limited programming expertise. The presentation will go through different areas of text analytics as well as provide some real work examples that help to make the subject matter a little more relatable. We will cover topics like search engine building, categorization (supervised and unsupervised), clustering, NLP, and social media analysis.
Sentiment analysis using naive bayes classifier Dev Sahu
This ppt contains a small description of naive bayes classifier algorithm. It is a machine learning approach for detection of sentiment and text classification.
IRJET- Sentimental Analysis of Product Reviews for E-Commerce WebsitesIRJET Journal
This document summarizes a research paper that proposes using sentiment analysis of product reviews on e-commerce websites to help consumers decide where to purchase a product. The researchers describe collecting reviews from multiple websites, preprocessing the text, using clustering and classification algorithms like mean shift and support vector machines to label reviews as positive, negative or neutral. The system would then compare the results across websites and recommend the one with the most positive reviews to reduce the time users spend researching. Future work could include detecting fake reviews and identifying reasons for negative reviews on particular sites.
The document discusses various applications of dimension reduction techniques to extract low-dimensional representations from high-dimensional data for purposes of prediction, descriptive analysis, and input into subsequent causal analysis. It provides examples of such applications using Google search data, genetic data, medical claims data, credit scores, online purchases, and congressional roll call votes. It also discusses issues around text as data, including bag-of-words representations and the use of automated and manual steps in text analysis.
The document presents an overview of probabilistic models for information retrieval. It discusses how probability theory can be applied to model the uncertain nature of retrieval, where queries only vaguely represent user needs and relevance is uncertain. The document outlines different probabilistic IR models including the classical probabilistic retrieval model, probability ranking principle, binary independence model, Bayesian networks, and language modeling approaches. It also describes datasets used to evaluate these models, including collections from TREC, Cranfield, and others. Basic probability theory concepts are reviewed, including joint probability, conditional probability, and rules relating probabilities.
Opinion mining on newspaper headlines using SVM and NLPIJECEIAES
Opinion Mining also known as Sentiment Analysis, is a technique or procedure which uses Natural Language processing (NLP) to classify the outcome from text. There are various NLP tools available which are used for processing text data. Multiple research have been done in opinion mining for online blogs, Twitter, Facebook etc. This paper proposes a new opinion mining technique using Support Vector Machine (SVM) and NLP tools on newspaper headlines. Relative words are generated using Stanford CoreNLP, which is passed to SVM using count vectorizer. On comparing three models using confusion matrix, results indicate that Tf-idf and Linear SVM provides better accuracy for smaller dataset. While for larger dataset, SGD and linear SVM model outperform other models.
Document Classification Using Expectation Maximization with Semi Supervised L...ijsc
As the amount of online document increases, the demand for document classification to aid the analysis and management of document is increasing. Text is cheap, but information, in the form of knowing what classes a document belongs to, is expensive. The main purpose of this paper is to explain the expectation maximization technique of data mining to classify the document and to learn how to improve the accuracy while using semi-supervised approach. Expectation maximization algorithm is applied with both supervised and semi-supervised approach. It is found that semi-supervised approach is more accurate and effective. The main advantage of semi supervised approach is “DYNAMICALLY GENERATION OF NEW CLASS”. The algorithm first trains a classifier using the labeled document and probabilistically classifies the
unlabeled documents. The car dataset for the evaluation purpose is collected from UCI repository dataset in which some changes have been done from our side.
This is an introduction to text analytics for advanced business users and IT professionals with limited programming expertise. The presentation will go through different areas of text analytics as well as provide some real work examples that help to make the subject matter a little more relatable. We will cover topics like search engine building, categorization (supervised and unsupervised), clustering, NLP, and social media analysis.
Sentiment analysis using naive bayes classifier Dev Sahu
This ppt contains a small description of naive bayes classifier algorithm. It is a machine learning approach for detection of sentiment and text classification.
IRJET- Sentimental Analysis of Product Reviews for E-Commerce WebsitesIRJET Journal
This document summarizes a research paper that proposes using sentiment analysis of product reviews on e-commerce websites to help consumers decide where to purchase a product. The researchers describe collecting reviews from multiple websites, preprocessing the text, using clustering and classification algorithms like mean shift and support vector machines to label reviews as positive, negative or neutral. The system would then compare the results across websites and recommend the one with the most positive reviews to reduce the time users spend researching. Future work could include detecting fake reviews and identifying reasons for negative reviews on particular sites.
In this paper, I develop a custom binary classifier of search queries for the makeup category using different Machine Learning techniques and models. An extensive exploration of shallow and Deep Learning models was performed using a cross-validation framework to identify the top three models, optimize them tuning their hyperparameters, and finally creating an ensemble of models with a custom decision threshold that outperforms all other models. The final classifier achieves an accuracy of 98.83% on a test set, making it ready for production.
Extraction of Data Using Comparable Entity Miningiosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTEijnlc
We propose an automatic classification system of movie genres based on different features from their textual synopsis. Our system is first trained on thousands of movie synopsis from online open databases, by learning relationships between textual signatures and movie genres. Then it is tested on other movie synopsis, and its results are compared to the true genres obtained from the Wikipedia and the Open Movie Database
(OMDB) databases. The results show that our algorithm achieves a classification accuracy exceeding 75%.
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTEkevig
We propose an automatic classification system of movie genres based on different features from their textual
synopsis. Our system is first trained on thousands of movie synopsis from online open databases, by learning relationships between textual signatures and movie genres. Then it is tested on other movie synopsis,
and its results are compared to the true genres obtained from the Wikipedia and the Open Movie Database
(OMDB) databases. The results show that our algorithm achieves a classification accuracy exceeding 75%.
The document discusses two neural network models for reading comprehension tasks: the Attentive Reader model proposed by Herman et al. in 2015 and the Stanford Reader model proposed by Chen et al. in 2016. The author implemented a two-layer attention model inspired by these previous models that achieves a 1.5% higher accuracy on reading comprehension tasks compared to the Stanford Reader.
This document summarizes a project on sentiment analysis of tweets using lexicon-based approaches. It discusses tokenization, stop word removal, stemming, lemmatization, and lexicon-based sentiment analysis. Naive Bayes algorithms are also covered, explaining how they work and their applications, which include real-time prediction, text classification, and recommendation systems. Tools used for the analysis include Anaconda and Spider.
The document summarizes a student project to build a question answering search engine called "Ask Yahoo!" that retrieves answers directly from the Yahoo Answers database. The project involved:
1. Cleaning and indexing over 4 million Yahoo Answers questions and answers to create searchable indexes.
2. Developing a search engine frontend where users can enter questions and receive top matching answers below the search box.
3. Implementing a ranking system using TF-IDF that scores documents based on user query terms.
4. Evaluating performance on a test set where friends rewrote questions to check if the top answer matched the original question. The system achieved a score of 0.7 for correctly answering questions after query
Meetup sthlm - introduction to Machine Learning with demo casesZenodia Charpy
This document provides an agenda and overview of topics related to data science and machine learning. It discusses data science processes including data preparation, algorithm selection, model deployment, and performance measurement. It also distinguishes machine learning from artificial intelligence and describes common machine learning algorithms like supervised and unsupervised learning. Examples of supervised and unsupervised learning applications are presented along with generic workflows. Machine learning algorithm selection and example cases are also summarized.
1. The document discusses machine learning and provides an overview of the seven steps of machine learning including gathering data, preparing data, choosing a model, training the model, evaluating the model, tuning hyperparameters, and making predictions.
2. It describes tips for data preparation such as exploring data for trends and issues, formatting data consistently, and handling missing values, outliers, and imbalanced data.
3. Techniques for outlier removal are discussed including clustering-based, nearest-neighbor based, density-based, graphical, and statistical approaches. Limitations and challenges of outlier removal are noted.
This document provides an overview of exploratory data analysis (EDA). It discusses how EDA is used to generate and refine questions from data by visualizing, transforming, and modeling the data. Questions can come from hypotheses, problems, or the data itself. EDA plays a role in developing, testing, and refining theories, solving problems, and asking interesting questions about the data. The document emphasizes being skeptical of assumptions and open to multiple interpretations during EDA to maximize learning from the data. It introduces the dplyr and ggplot2 packages for selecting, filtering, summarizing, and visualizing data during the EDA process.
Information retrieval is the process of finding documents that satisfy an information need from within large collections, such as web search, email search, searching your laptop, corporate knowledge bases, and legal information retrieval. The goal is to retrieve documents that are relevant to the user's task and information need. A search engine takes a user's query and searches a collection of documents to return relevant results. Precision refers to the fraction of retrieved documents that are relevant, while recall refers to the fraction of relevant documents that are retrieved.
The document provides an overview of machine learning activities including data exploration, preprocessing, model selection, training and evaluation. It discusses exploring different data types like numerical, categorical, time series and text data. It also covers identifying and addressing data issues, feature engineering, selecting appropriate models for supervised and unsupervised problems, training models using methods like holdout and cross-validation, and evaluating model performance using metrics like accuracy, confusion matrix, F-measure etc. The goal is to understand the data and apply necessary steps to build and evaluate effective machine learning models.
Methods for Sentiment Analysis: A Literature Studyvivatechijri
Sentiment analysis is a trending topic, as everyone has an opinion on everything. The systematic
study of these opinions can lead to information which can prove to be valuable for many companies and
industries in future. A huge number of users are online, and they share their opinions and comments regularly,
this information can be mined and used efficiently. Various companies can review their own product using
sentiment analysis and make the necessary changes in future. The data is huge and thus it requires efficient
processing to collect this data and analyze it to produce required result.
In this paper, we will discuss the various methods used for sentiment analysis. It also covers various techniques
used for sentiment analysis such as lexicon based approach, SVM [10], Convolution neural network,
morphological sentence pattern model [1] and IML algorithm. This paper shows studies on various data sets
such as Twitter API, Weibo, movie review, IMDb, Chinese micro-blog database [9] and more. The paper shows
various accuracy results obtained by all the systems.
One fundamental problem in sentiment analysis is categorization of sentiment polarity. Given a piece of written text, the problem is to categorize the text into one specific sentiment polarity, positive or negative (or neutral). Based on the scope of the text, there are three distinctions of sentiment polarity categorization, namely the document level, the sentence level, and the entity and aspect level. Consider a review “I like multimedia features but the battery life sucks.†This sentence has a mixed emotion. The emotion regarding multimedia is positive whereas that regarding battery life is negative. Hence, it is required to extract only those opinions relevant to a particular feature (like battery life or multimedia) and classify them, instead of taking the complete sentence and the overall sentiment. In this paper, we present a novel approach to identify pattern specific expressions of opinion in text.
Information retrieval systems aim to find documents relevant to a user's information need. Search engines are a common example, allowing users to enter queries and receiving a list of relevant web pages. Effective systems represent documents and queries statistically based on word frequencies and use scoring functions to rank documents by estimated relevance to the query. Evaluation involves measuring a system's precision, the proportion of returned documents that are relevant, and recall, the proportion of all relevant documents that are returned.
This document summarizes a research paper that proposes a method to semantically detect plagiarism in research papers using text mining techniques. It introduces the problem of plagiarism in research and the need for automated detection methods. The proposed method uses TF-IDF to encode documents and LSI for semantic indexing. It collects research papers, preprocesses text, encodes documents with TF-IDF, and indexes them semantically using LSI to find similar papers and detect plagiarism.
IRJET - Twitter Sentiment Analysis using Machine LearningIRJET Journal
This document summarizes a research paper on Twitter sentiment analysis using machine learning. It describes extracting tweets on a topic, cleaning the data, extracting features, building a logistic regression model to classify tweets as positive, negative or neutral sentiment, and validating the model. The goal is to analyze public sentiment from Twitter data, which has applications in marketing, product feedback, and other areas.
A Survey on Sentiment Categorization of Movie ReviewsEditor IJMTER
Sentiment categorization is a process of mining user generated text content and determine
the sentiment of the users towards that particular thing. It is the approach of detecting the sentiment of
the author in regard to some topics. It also known as sentiment detection, sentiment analysis and opinion
mining. It is very useful for movie production companies that interested in knowing how users feel
about their movies. For example word “excellent” indicates that the review gives positive emotion about
particular movie. The same applies to movies, songs, cars, holiday destinations, Political parties, social
network sites, web blogs, discussion forum and so on. Sentiment categorization can be carried out by
using three approaches. First, Supervised machine learning based text classifier on Naïve Bayes,
Maximum Entropy, SVM, kNN classifier, hidden marcov model. Second, Unsupervised Semantic
Orientation scheme of extracting relevant N-grams of the text and then labelling. Third, SentiWordNet
based publicly available library.
El documento define el grooming como el acoso a menores a través de Internet con el objetivo de obtener satisfacción sexual. El proceso implica que un adulto establece lazos de amistad con un menor simulando ser otro niño, obtiene información personal del menor, y eventualmente consigue que el menor envíe imágenes pornográficas o se desnude frente a la cámara mediante tácticas de seducción y provocación. El documento recomienda prevenir el grooming limitando el uso inadecuado de cámaras web y asegurando
In this paper, I develop a custom binary classifier of search queries for the makeup category using different Machine Learning techniques and models. An extensive exploration of shallow and Deep Learning models was performed using a cross-validation framework to identify the top three models, optimize them tuning their hyperparameters, and finally creating an ensemble of models with a custom decision threshold that outperforms all other models. The final classifier achieves an accuracy of 98.83% on a test set, making it ready for production.
Extraction of Data Using Comparable Entity Miningiosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTEijnlc
We propose an automatic classification system of movie genres based on different features from their textual synopsis. Our system is first trained on thousands of movie synopsis from online open databases, by learning relationships between textual signatures and movie genres. Then it is tested on other movie synopsis, and its results are compared to the true genres obtained from the Wikipedia and the Open Movie Database
(OMDB) databases. The results show that our algorithm achieves a classification accuracy exceeding 75%.
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTEkevig
We propose an automatic classification system of movie genres based on different features from their textual
synopsis. Our system is first trained on thousands of movie synopsis from online open databases, by learning relationships between textual signatures and movie genres. Then it is tested on other movie synopsis,
and its results are compared to the true genres obtained from the Wikipedia and the Open Movie Database
(OMDB) databases. The results show that our algorithm achieves a classification accuracy exceeding 75%.
The document discusses two neural network models for reading comprehension tasks: the Attentive Reader model proposed by Herman et al. in 2015 and the Stanford Reader model proposed by Chen et al. in 2016. The author implemented a two-layer attention model inspired by these previous models that achieves a 1.5% higher accuracy on reading comprehension tasks compared to the Stanford Reader.
This document summarizes a project on sentiment analysis of tweets using lexicon-based approaches. It discusses tokenization, stop word removal, stemming, lemmatization, and lexicon-based sentiment analysis. Naive Bayes algorithms are also covered, explaining how they work and their applications, which include real-time prediction, text classification, and recommendation systems. Tools used for the analysis include Anaconda and Spider.
The document summarizes a student project to build a question answering search engine called "Ask Yahoo!" that retrieves answers directly from the Yahoo Answers database. The project involved:
1. Cleaning and indexing over 4 million Yahoo Answers questions and answers to create searchable indexes.
2. Developing a search engine frontend where users can enter questions and receive top matching answers below the search box.
3. Implementing a ranking system using TF-IDF that scores documents based on user query terms.
4. Evaluating performance on a test set where friends rewrote questions to check if the top answer matched the original question. The system achieved a score of 0.7 for correctly answering questions after query
Meetup sthlm - introduction to Machine Learning with demo casesZenodia Charpy
This document provides an agenda and overview of topics related to data science and machine learning. It discusses data science processes including data preparation, algorithm selection, model deployment, and performance measurement. It also distinguishes machine learning from artificial intelligence and describes common machine learning algorithms like supervised and unsupervised learning. Examples of supervised and unsupervised learning applications are presented along with generic workflows. Machine learning algorithm selection and example cases are also summarized.
1. The document discusses machine learning and provides an overview of the seven steps of machine learning including gathering data, preparing data, choosing a model, training the model, evaluating the model, tuning hyperparameters, and making predictions.
2. It describes tips for data preparation such as exploring data for trends and issues, formatting data consistently, and handling missing values, outliers, and imbalanced data.
3. Techniques for outlier removal are discussed including clustering-based, nearest-neighbor based, density-based, graphical, and statistical approaches. Limitations and challenges of outlier removal are noted.
This document provides an overview of exploratory data analysis (EDA). It discusses how EDA is used to generate and refine questions from data by visualizing, transforming, and modeling the data. Questions can come from hypotheses, problems, or the data itself. EDA plays a role in developing, testing, and refining theories, solving problems, and asking interesting questions about the data. The document emphasizes being skeptical of assumptions and open to multiple interpretations during EDA to maximize learning from the data. It introduces the dplyr and ggplot2 packages for selecting, filtering, summarizing, and visualizing data during the EDA process.
Information retrieval is the process of finding documents that satisfy an information need from within large collections, such as web search, email search, searching your laptop, corporate knowledge bases, and legal information retrieval. The goal is to retrieve documents that are relevant to the user's task and information need. A search engine takes a user's query and searches a collection of documents to return relevant results. Precision refers to the fraction of retrieved documents that are relevant, while recall refers to the fraction of relevant documents that are retrieved.
The document provides an overview of machine learning activities including data exploration, preprocessing, model selection, training and evaluation. It discusses exploring different data types like numerical, categorical, time series and text data. It also covers identifying and addressing data issues, feature engineering, selecting appropriate models for supervised and unsupervised problems, training models using methods like holdout and cross-validation, and evaluating model performance using metrics like accuracy, confusion matrix, F-measure etc. The goal is to understand the data and apply necessary steps to build and evaluate effective machine learning models.
Methods for Sentiment Analysis: A Literature Studyvivatechijri
Sentiment analysis is a trending topic, as everyone has an opinion on everything. The systematic
study of these opinions can lead to information which can prove to be valuable for many companies and
industries in future. A huge number of users are online, and they share their opinions and comments regularly,
this information can be mined and used efficiently. Various companies can review their own product using
sentiment analysis and make the necessary changes in future. The data is huge and thus it requires efficient
processing to collect this data and analyze it to produce required result.
In this paper, we will discuss the various methods used for sentiment analysis. It also covers various techniques
used for sentiment analysis such as lexicon based approach, SVM [10], Convolution neural network,
morphological sentence pattern model [1] and IML algorithm. This paper shows studies on various data sets
such as Twitter API, Weibo, movie review, IMDb, Chinese micro-blog database [9] and more. The paper shows
various accuracy results obtained by all the systems.
One fundamental problem in sentiment analysis is categorization of sentiment polarity. Given a piece of written text, the problem is to categorize the text into one specific sentiment polarity, positive or negative (or neutral). Based on the scope of the text, there are three distinctions of sentiment polarity categorization, namely the document level, the sentence level, and the entity and aspect level. Consider a review “I like multimedia features but the battery life sucks.†This sentence has a mixed emotion. The emotion regarding multimedia is positive whereas that regarding battery life is negative. Hence, it is required to extract only those opinions relevant to a particular feature (like battery life or multimedia) and classify them, instead of taking the complete sentence and the overall sentiment. In this paper, we present a novel approach to identify pattern specific expressions of opinion in text.
Information retrieval systems aim to find documents relevant to a user's information need. Search engines are a common example, allowing users to enter queries and receiving a list of relevant web pages. Effective systems represent documents and queries statistically based on word frequencies and use scoring functions to rank documents by estimated relevance to the query. Evaluation involves measuring a system's precision, the proportion of returned documents that are relevant, and recall, the proportion of all relevant documents that are returned.
This document summarizes a research paper that proposes a method to semantically detect plagiarism in research papers using text mining techniques. It introduces the problem of plagiarism in research and the need for automated detection methods. The proposed method uses TF-IDF to encode documents and LSI for semantic indexing. It collects research papers, preprocesses text, encodes documents with TF-IDF, and indexes them semantically using LSI to find similar papers and detect plagiarism.
IRJET - Twitter Sentiment Analysis using Machine LearningIRJET Journal
This document summarizes a research paper on Twitter sentiment analysis using machine learning. It describes extracting tweets on a topic, cleaning the data, extracting features, building a logistic regression model to classify tweets as positive, negative or neutral sentiment, and validating the model. The goal is to analyze public sentiment from Twitter data, which has applications in marketing, product feedback, and other areas.
A Survey on Sentiment Categorization of Movie ReviewsEditor IJMTER
Sentiment categorization is a process of mining user generated text content and determine
the sentiment of the users towards that particular thing. It is the approach of detecting the sentiment of
the author in regard to some topics. It also known as sentiment detection, sentiment analysis and opinion
mining. It is very useful for movie production companies that interested in knowing how users feel
about their movies. For example word “excellent” indicates that the review gives positive emotion about
particular movie. The same applies to movies, songs, cars, holiday destinations, Political parties, social
network sites, web blogs, discussion forum and so on. Sentiment categorization can be carried out by
using three approaches. First, Supervised machine learning based text classifier on Naïve Bayes,
Maximum Entropy, SVM, kNN classifier, hidden marcov model. Second, Unsupervised Semantic
Orientation scheme of extracting relevant N-grams of the text and then labelling. Third, SentiWordNet
based publicly available library.
El documento define el grooming como el acoso a menores a través de Internet con el objetivo de obtener satisfacción sexual. El proceso implica que un adulto establece lazos de amistad con un menor simulando ser otro niño, obtiene información personal del menor, y eventualmente consigue que el menor envíe imágenes pornográficas o se desnude frente a la cámara mediante tácticas de seducción y provocación. El documento recomienda prevenir el grooming limitando el uso inadecuado de cámaras web y asegurando
Este documento describe un proyecto de un año sobre acrosport en educación física para alumnos de primaria. El proyecto tiene como objetivos desarrollar habilidades individuales y de trabajo en equipo a través de la práctica de acrobacias, y recaudar fondos para una asociación de discapacitados mediante exhibiciones. El proyecto implica varias áreas y utiliza metodologías como el aprendizaje cooperativo y las TIC.
Este documento describe un proyecto para que estudiantes de 10-12 años de diferentes escuelas colaboren para aprender sobre Europa y la Unión Europea. Los estudiantes se presentarán unos a otros y formarán equipos internacionales. Investigarán preguntas sobre Europa y completarán actividades para ganar puntos. El proyecto utilizará herramientas digitales como un blog y Twinspace para compartir información y experiencias.
Este documento presenta un poema que describe la Semana Santa en Triana a través de tres oraciones. Resalta las tradiciones y procesiones de la Semana Santa en Triana, haciendo énfasis en las cruces que se llevan en procesión y la presencia de la Virgen de la Salud. Con imágenes poéticas, transmite la alegría y devoción que caracterizan a la Semana Santa en este barrio sevillano.
Este documento contiene varios poemas cortos escritos por Santiago Martín Moreno. Los poemas tratan temas como la amistad, el amor, la naturaleza y la vida. El estilo de los poemas es sencillo y reflexivo.
Rogelio Bobis Sabas Jr. has over 15 years of experience in commissioning and operating power plants, mining facilities, and cement plants around the world. He has worked as a Commissioning and DCS Operator in Turkey, Lead Operator in Madagascar, and operated mills, crushers, and material handling equipment in Saudi Arabia and the Philippines. His experience includes starting up and shutting down equipment, monitoring and controlling processes, troubleshooting issues, and ensuring safe and efficient operations.
We are specializing in Interior Decoration and Turn Key Execution of Interior Works. Currently we have very strong presence in the interior designing space in Bangalore, Goa, Pune and Kolkata in both commercial and residential
1. The document discusses endocrine disorders and their relation to obesity. It provides causes of obesity like overeating, intake of heavy or sweet foods, lack of exercise, and lack of active sex.
2. Treatment principles discussed include regulating eating habits, increasing physical activity, stress management, and genetic factors. Flow charts show how overeating can lead to increased weight.
3. Ayurvedic concepts of srotas, ama, and medas are explained in relation to obesity. Management focuses on normalization of digestive processes and metabolism.
This document defines and describes capital markets. It discusses that capital markets are financial markets for buying and selling long-term debt or equity securities to channel wealth from savers to long-term investments. It notes that capital markets are important for mobilizing savings, capital formation, providing investment opportunities, proper regulation of funds, and speeding economic growth. The document then describes the two types of capital markets: primary markets where new securities are first issued, and secondary markets where previously issued securities are traded. It provides details on the features and purpose of primary and secondary capital markets. Finally, it lists some common capital market instruments and risks.
En una isla que albergaba sentimientos como el amor y la sabiduría, estos se enteraron de que la isla se hundiría. Todos menos el amor lograron escapar en barcos. Cuando el amor pidió ayuda a la riqueza, el orgullo y la tristeza, estos se negaron. Finalmente, el tiempo acogió al amor en su barco y lo salvó, pues solo el tiempo comprende la importancia del amor.
Aspect-Level Sentiment Analysis On Hotel ReviewsKimberly Pulley
The document discusses aspect-level sentiment analysis on hotel reviews. It describes extracting sentiments on specific aspects or entities mentioned in documents, like reviews. It uses Python tools like scrapy and NLTK to preprocess reviews, identify aspects in sentences, and determine sentiment scores for each aspect using a sentiment analysis algorithm. The goal is to analyze different aspects of reviews and summarize sentiment values to understand customer feedback.
Analyzing Sentiment Of Movie Reviews In Bangla By Applying Machine Learning T...Andrew Parish
This document summarizes a research paper that analyzed sentiment of movie reviews written in Bangla using machine learning techniques. The researchers collected a dataset of over 4,000 Bangla movie reviews labeled as positive or negative. Using this dataset, they tested support vector machine and long short-term memory models, achieving 88.9% and 82.42% accuracy respectively. The paper also reviewed other prior work on Bangla sentiment analysis and compared different machine learning methods.
Sentiment Analysis and Classification of Tweets using Data MiningIRJET Journal
This document summarizes research on using data mining techniques to perform sentiment analysis on tweets. The researchers collected tweets from Twitter and preprocessed the text to make it usable for building sentiment classifiers. They used three classifiers - K-Nearest Neighbor, Naive Bayes, and Decision Tree - and compared the results to determine which provided the best accuracy. Rapid Miner tool was used to preprocess the text, build the classifiers, and analyze the results. The goal was to determine people's sentiments expressed in their tweets and correctly classify them.
Co-Extracting Opinions from Online ReviewsEditor IJCATR
Exclusion of opinion targets and words from online reviews is an important and challenging task in opinion mining. The
opinion mining is the use of natural language processing, text analysis and computational process to identify and recover the subjective
information in source materials. This paper propose a Supervised word alignment model, which identifying the opinion relation. Rather
than this paper focused on topical relation, in which to extract the relevant information or features only from a particular online reviews.
It is based on feature extraction algorithm to identify the potential features. Finally the items are ranked based on the frequency of
positive and negative reviews. Compared to previous methods, our model captures opinion relation and feature extraction more precisely.
One of the most advantages that our model obtain better precision because of supervised alignment model. In addition, an opinion
relation graph is used to refer the relationship between opinion targets and opinion words.
The document provides an overview of sentiment analysis and summarizes the current approaches used. It discusses how machine learning classifiers like Naive Bayes can be used for sentiment classification of texts, treating it as a two-class text classification problem. It also mentions the use of natural language processing techniques. The current system discussed will use machine learning and NLP for sentiment analysis of tweets, training classifiers on labeled tweet data to classify the polarity of new tweets.
APPROXIMATE ANALYTICAL SOLUTION OF NON-LINEAR BOUSSINESQ EQUATION FOR THE UNS...mathsjournal
For one dimensional homogeneous, isotropic aquifer, without accretion the governing Boussinesq
equation under Dupuit assumptions is a nonlinear partial differential equation. In the present paper
approximate analytical solution of nonlinear Boussinesq equation is obtained using Homotopy
perturbation transform method(HPTM). The solution is compared with the exact solution. The
comparison shows that the HPTM is efficient, accurate and reliable. The analysis of two important aquifer
parameters namely viz. specific yield and hydraulic conductivity is studied to see the effects on the height
of water table. The results resemble well with the physical phenomena.
FEATURE SELECTION AND CLASSIFICATION APPROACH FOR SENTIMENT ANALYSISmlaij
The document describes a proposed model for sentiment analysis of movie reviews using natural language processing and machine learning approaches. The model first applies various data pre-processing techniques to the dataset, including tokenization, pruning, filtering tokens, and stemming. It then investigates the performance of classifiers like Naive Bayes and SVM combined with different feature selection schemes, including term occurrence, binary term occurrence, term frequency and TF-IDF. Experiments are run using n-grams up to 4-grams to determine the best approach for sentiment analysis.
The sarcasm detection with the method of logistic regressionEditorIJAERD
The document discusses sarcasm detection using logistic regression. It compares the performance of logistic regression and SVM classification for sarcasm detection. Logistic regression achieved higher accuracy of 93.5% for sarcasm detection, with lower execution time compared to SVM classification. The proposed approach uses data preprocessing, feature extraction using N-grams, and trains a logistic regression classifier on a manually labeled dataset to classify text as sarcastic or non-sarcastic. Accuracy and execution time analysis shows logistic regression performs better than SVM for this task.
Fake Product Review Monitoring & Removal and Sentiment Analysis of Genuine Re...Dr. Amarjeet Singh
Any E-Commerce website gets bad reputation if they
sell a product which has bad review, the user blames the eCommerce website rather than manufacturers most of the
times. In some review sites some great audits are included by
the item organization individuals itself so as to make so as to
deliver false positive item reviews. To eliminate these type of
fake product review, we will create a system that finds out the
fake reviews and eliminates all the fake reviews by using
machine learning. We also remove the reviews that are flood
by a marketing agency in order to boost up the ratings of a
particular product .Finally Sentiment analysis is done for the
genuine reviews to classify them into positive and negative.
We will use Bag-of-words to label individual words
according to their sentiment.
A Framework for Arabic Concept-Level Sentiment Analysis using SenticNet IJECEIAES
This document presents a framework for Arabic concept-level sentiment analysis using SenticNet. It discusses existing sentiment analysis approaches and focuses on concept-level sentiment analysis, which classifies text based on semantics rather than syntax. The authors modify SenticNet to suit Arabic and test it on a multi-domain Arabic dataset. Syntactic patterns are used to extract concepts from sentences, which are then translated to English and matched to SenticNet concepts to determine polarity. An accuracy of 70% was obtained when testing the generated lexicon on the dataset. The lexicon containing 69k unique concepts covers reviews from multiple domains and is made publicly available.
IRJET- A Review on: Sentiment Polarity Analysis on Twitter Data from Diff...IRJET Journal
This document summarizes research on sentiment polarity analysis of Twitter data from different events. It discusses how Twitter data can be used for opinion mining and sentiment analysis. Several papers that used techniques like naive Bayes classifier, support vector machines, and dual sentiment analysis on Twitter data are summarized. The document also provides an overview of the key steps involved in a Twitter sentiment analysis system, including data collection, preprocessing, feature extraction, training a classification model, and evaluating accuracy. The goal of analyzing sentiments on Twitter is to understand public opinions on different topics and events.
Sentiment Analysis Using Hybrid Approach: A SurveyIJERA Editor
Sentiment analysis is the process of identifying people’s attitude and emotional state’s from language. The main objective is realized by identifying a set of potential features in the review and extracting opinion expressions about those features by exploiting their associations. Opinion mining, also known as Sentiment analysis, plays an important role in this process. It is the study of emotions i.e. Sentiments, Expressions that are stated in natural language. Natural language techniques are applied to extract emotions from unstructured data. There are several techniques which can be used to analysis such type of data. Here, we are categorizing these techniques broadly as ”supervised learning”, ”unsupervised learning” and ”hybrid techniques”. The objective of this paper is to provide the overview of Sentiment Analysis, their challenges and a comparative analysis of it’s techniques in the field of Natural Language Processing.
This document summarizes an approach to sentiment analysis. It discusses how sentiment analysis uses natural language processing to identify subjective information and analyze affective states in text. It outlines several common methods for sentiment analysis, including Naive Bayes, Maximum Entropy, and Support Vector Machines. The document also discusses evaluating the accuracy of different sentiment analysis techniques and providing sentiment polarity classifications like positive, negative, neutral.
The document summarizes research on aspect-based sentiment analysis. It discusses four main tasks in aspect-based sentiment analysis: aspect term extraction, aspect term polarity identification, aspect category detection, and aspect category polarity identification. It then reviews several approaches researchers have used for each task, including supervised methods like conditional random fields and support vector machines, as well as unsupervised methods. The document concludes by comparing results from different studies on restaurant and laptop review datasets.
This document summarizes a research paper that proposes a method for performing sentiment analysis on product reviews to identify promising product features. It involves scraping short reviews from websites, preprocessing the text through cleaning, tokenization and part-of-speech tagging. Next, it uses pattern mining and a custom lexicon dictionary to determine the overall sentiment score and sentiment scores for specific product features. The goal is to analyze which features consumers view most positively to help businesses understand customer preferences.
Sentiment Analysis of Twitter tweets using supervised classification technique IJERA Editor
Making use of social media for analyzing the perceptions of the masses over a product, event or a person has
gained momentum in recent times. Out of a wide array of social networks, we chose Twitter for our analysis as
the opinions expressed their, are concise and bear a distinctive polarity. Here, we collect the most recent tweets
on users' area of interest and analyze them. The extracted tweets are then segregated as positive, negative and
neutral. We do the classification in following manner: collect the tweets using Twitter API; then we process the
collected tweets to convert all letters to lowercase, eliminate special characters etc. which makes the
classification more efficient; the processed tweets are classified using a supervised classification technique. We
make use of Naive Bayes classifier to segregate the tweets as positive, negative and neutral. We use a set of
sample tweets to train the classifier. The percentage of the tweets in each category is then computed and the
result is represented graphically. The result can be used further to gain an insight into the views of the people
using Twitter about a particular topic that is being searched by the user. It can help corporate houses devise
strategies on the basis of the popularity of their product among the masses. It may help the consumers to make
informed choices based on the general sentiment expressed by the Twitter users on a product.
Neural Network Based Context Sensitive Sentiment AnalysisEditor IJCATR
Social media communication is evolving more in these days. Social networking site is being rapidly increased in recent years, which provides platform to connect people all over the world and share their interests. The conversation and the posts available in social media are unstructured in nature. So sentiment analysis will be a challenging work in this platform. These analyses are mostly performed in machine learning techniques which are less accurate than neural network methodologies. This paper is based on sentiment classification using Competitive layer neural networks and classifies the polarity of a given text whether the expressed opinion in the text is positive or negative or neutral. It determines the overall topic of the given text. Context independent sentences and implicit meaning in the text are also considered in polarity classification.
This document proposes a model to estimate overall sentiment score by applying rules of inference from discrete mathematics. It discusses sentiment analysis and related work using techniques like supervised/unsupervised learning. The problem is identifying sentiment components and restricting patterns for feature identification. Most approaches focus on nouns/adjectives but not verbs/adverbs. The model preprocesses product review datasets using NLTK for stemming, parsing and tokenizing. It builds a lexicon dictionary of positive and negative words. The Lexical Pattern Sentiment Analysis algorithm uses both lexicon and pattern mining - it selects sentence patterns, checks for positive/negative words in the lexicon, and calculates an overall sentiment score.
A Survey On Sentiment Analysis And Opinion Mining TechniquesSabrina Green
This document summarizes a survey on sentiment analysis and opinion mining techniques. It discusses the key steps in the sentiment analysis process, including lexicon generation, subjectivity detection, sentiment polarity detection, sentiment structurization, and sentiment summarization/visualization/tracking. It also reviews previous work related to each step. Additionally, it discusses approaches that have been applied to sentiment analysis for Indian languages such as using bilingual dictionaries and wordnets to generate sentiment lexicons.
A Survey on Sentiment Analysis and Opinion Mining.pdfMandy Brown
This document summarizes approaches for sentiment analysis and opinion mining techniques. It discusses how sentiment analysis can be used to analyze opinions expressed in text about products, services, and other topics. Popular approaches for sentiment analysis include using subjectivity lexicons, n-gram modeling, and machine learning techniques. The document also describes the common steps involved in the sentiment analysis process, including lexicon generation, subjectivity detection, sentiment polarity detection, and sentiment summarization.
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...University of Maribor
Slides from talk presenting:
Aleš Zamuda: Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapter and Networking.
Presentation at IcETRAN 2024 session:
"Inter-Society Networking Panel GRSS/MTT-S/CIS
Panel Session: Promoting Connection and Cooperation"
IEEE Slovenia GRSS
IEEE Serbia and Montenegro MTT-S
IEEE Slovenia CIS
11TH INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONIC AND COMPUTING ENGINEERING
3-6 June 2024, Niš, Serbia
International Conference on NLP, Artificial Intelligence, Machine Learning an...gerogepatton
International Conference on NLP, Artificial Intelligence, Machine Learning and Applications (NLAIM 2024) offers a premier global platform for exchanging insights and findings in the theory, methodology, and applications of NLP, Artificial Intelligence, Machine Learning, and their applications. The conference seeks substantial contributions across all key domains of NLP, Artificial Intelligence, Machine Learning, and their practical applications, aiming to foster both theoretical advancements and real-world implementations. With a focus on facilitating collaboration between researchers and practitioners from academia and industry, the conference serves as a nexus for sharing the latest developments in the field.
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELgerogepatton
As digital technology becomes more deeply embedded in power systems, protecting the communication
networks of Smart Grids (SG) has emerged as a critical concern. Distributed Network Protocol 3 (DNP3)
represents a multi-tiered application layer protocol extensively utilized in Supervisory Control and Data
Acquisition (SCADA)-based smart grids to facilitate real-time data gathering and control functionalities.
Robust Intrusion Detection Systems (IDS) are necessary for early threat detection and mitigation because
of the interconnection of these networks, which makes them vulnerable to a variety of cyberattacks. To
solve this issue, this paper develops a hybrid Deep Learning (DL) model specifically designed for intrusion
detection in smart grids. The proposed approach is a combination of the Convolutional Neural Network
(CNN) and the Long-Short-Term Memory algorithms (LSTM). We employed a recent intrusion detection
dataset (DNP3), which focuses on unauthorized commands and Denial of Service (DoS) cyberattacks, to
train and test our model. The results of our experiments show that our CNN-LSTM method is much better
at finding smart grid intrusions than other deep learning algorithms used for classification. In addition,
our proposed approach improves accuracy, precision, recall, and F1 score, achieving a high detection
accuracy rate of 99.50%.
Understanding Inductive Bias in Machine LearningSUTEJAS
This presentation explores the concept of inductive bias in machine learning. It explains how algorithms come with built-in assumptions and preferences that guide the learning process. You'll learn about the different types of inductive bias and how they can impact the performance and generalizability of machine learning models.
The presentation also covers the positive and negative aspects of inductive bias, along with strategies for mitigating potential drawbacks. We'll explore examples of how bias manifests in algorithms like neural networks and decision trees.
By understanding inductive bias, you can gain valuable insights into how machine learning models work and make informed decisions when building and deploying them.
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize carbon footprint, conserve natural resources, reduce harmful emissions, and lower life cycle costs. Compared to natural aggregate (NA), RCA pavement has fewer comprehensive studies and sustainability assessments.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...IJECEIAES
Climate change's impact on the planet forced the United Nations and governments to promote green energies and electric transportation. The deployments of photovoltaic (PV) and electric vehicle (EV) systems gained stronger momentum due to their numerous advantages over fossil fuel types. The advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces the hybrid system between PV and EV to support industrial and commercial plants. This paper covers the theoretical framework of the proposed hybrid system including the required equation to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram which sets the priorities and requirements of the system is presented. The proposed approach allows setup to advance their power stability, especially during power outages. The presented information supports researchers and plant owners to complete the necessary analysis while promoting the deployment of clean energy. The result of a case study that represents a dairy milk farmer supports the theoretical works and highlights its advanced benefits to existing plants. The short return on investment of the proposed approach supports the paper's novelty approach for the sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line which enhances the safety of the electrical network
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesChristina Lin
Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of stateless streaming pipelines within a Kafka (Redpanda) broker. These pipelines are adept at managing low-latency, high-data-volume scenarios.
Advanced control scheme of doubly fed induction generator for wind turbine us...IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTjpsjournal1
The rivalry between prominent international actors for dominance over Central Asia's hydrocarbon
reserves and the ancient silk trade route, along with China's diplomatic endeavours in the area, has been
referred to as the "New Great Game." This research centres on the power struggle, considering
geopolitical, geostrategic, and geoeconomic variables. Topics including trade, political hegemony, oil
politics, and conventional and nontraditional security are all explored and explained by the researcher.
Using Mackinder's Heartland, Spykman Rimland, and Hegemonic Stability theories, examines China's role
in Central Asia. This study adheres to the empirical epistemological method and has taken care of
objectivity. This study analyze primary and secondary research documents critically to elaborate role of
china’s geo economic outreach in central Asian countries and its future prospect. China is thriving in trade,
pipeline politics, and winning states, according to this study, thanks to important instruments like the
Shanghai Cooperation Organisation and the Belt and Road Economic Initiative. According to this study,
China is seeing significant success in commerce, pipeline politics, and gaining influence on other
governments. This success may be attributed to the effective utilisation of key tools such as the Shanghai
Cooperation Organisation and the Belt and Road Economic Initiative.
ACEP Magazine edition 4th launched on 05.06.2024Rahul
This document provides information about the third edition of the magazine "Sthapatya" published by the Association of Civil Engineers (Practicing) Aurangabad. It includes messages from current and past presidents of ACEP, memories and photos from past ACEP events, information on life time achievement awards given by ACEP, and a technical article on concrete maintenance, repairs and strengthening. The document highlights activities of ACEP and provides a technical educational article for members.
1. Shirke et al., International Journal of Advance Research, Ideas and Innovations in Technology
ISSN: 2454-132X
(Volume2, Issue1)
Streaming Analytics
1
Aditya Shirke, 2
Ankur Singh, 3
Umesh Sapar, 4
Aditya Sengupta, 5
Prof Leena Deshpande
aditya.shirke@viit.ac.in1
, ankur.singh@viit.ac.in2
, saparumesh@gmail.com3
, adityasengupta2008@gmail.com4
,
leena.deshpande@viit.ac.in5
Department of Computer Engineering
Vishwakarma Institute of Information Technology, Pune.
Abstract: Nowadays Sentiment Analysis play an important Role in each field such as Stock market, product
reviews, news article, political debates which help us to determining current trend in the market regarding
specific product, event, issues. Here we are apply sentiment analysis on microblogging platforms such as
twitter, Facebook which is used by different people to express their opinion with respect to different kind of
foods in the field of home’schef. This paper explain different methods of text preprocessing and applies them
with a naive Bayes classifier in a big data, distributed computing platform with the goal of creating a scalable
sentiment analysis solution that can classify text into positive or negative categories. We apply negation
handling, word n-grams, stemming, and feature selection to evaluate how different combinations of these pre-
processing methods affect performance and efficiency.
Keywords: Text mining, Natural language processing, Sentiment analysis, Features Extraction
I. INTRODUCTION
Streaming analytics is related to a sentiment analysis, where we are going to extract the data, comment given
by the people about specific food, meals or restaurant, from social media sites such as Facebook, twitter. Develop the
problem on analytics which analyzes the different kind of sentiments, emotions, opinion, given by the customers from
the Social media with respect to Home’schef and predicts and Recommend the online Review and Show the Result.
Our main challenge is to develop the web site related to the Home’schef where different people can order
different types of foods from different hotels and restaurant at any time. For that, each user must register by giving
their personal information such as name, date of birth, area, etc., to create their own account to our website so that he
can order as much as food. And Restaurant also Register to our website by giving their personal information such as
Hotel address, specialty, Menu card, etc.
Finally we are apply Naive Bayes classifier algorithm for classifying and generate the result according to
area wise, different age, season of different food and gender.
2. Shirke et al., International Journal of Advance Research, Ideas and Innovations in Technology
This result will be helpful for restaurant as “What people think about your food”, and also he will get know
“latest trends in markets of different kind of foods” for increasing their business.
Fig 1.Flowchart
II. LITERATURE SURVEY
A) Some of the early and recent results on sentiment analysis of Twitter data are by A. Go et al(2009) [8],
(Bermingham and Smeaton, Classifying sentiment in microblogs, 2010, pages 1833-1836) and Pak and
Paroubek (2010) [5].
B) By Haseena Rahmath , Tanvir Ahmad [1], “Sentiment Analysis Techniques - A Comparative Study”.
In this paper they have aimed on comparative study of different Sentiment analysis techniques such as
Supervised Machine learning based techniques, Lexicon Based Method, Hybrid Techniques. In Hybrid
Techniques, many researcher have accomplish the task by combining supervised machine learning and
lexicon based approaches. They got into conclusion that more researchers are needed in this field to
achieve better performance in sentiment classification.
C) A. Agarwal et al.(2009) [6], in their paper Contextual phrase-level polarity analysis using lexical affect
scoring and syntactic ngrams; experimented with three types of models: unigram model, a feature based
model and designed the tree kernel based model on tweets for sentiment analysis.
D) A. Go et al. (2009) [8], in their paper “Twitter Sentiment Classification using Distant Supervision” use
distant learning to acquire sentiment data. They use tweets ending in positive emoticons like “:)” “:-)”
as positive and negative emoticons like “:(” “:-(” as negative.
E) A. Pak and P. Paroubek (2010) [5], in their paper “Twitter as a Corpus for Sentiment Analysis and
Opinion Mining” collect data following a similar distant learning paradigm. They perform a different
classification task though: subjective versus objective. For subjective data they collect the tweets ending
3. Shirke et al., International Journal of Advance Research, Ideas and Innovations in Technology
with emoticons in the same manner as Go et al. (2009). For objective data they crawl twitter accounts of
popular newspapers like “New York Times”, “Washington Posts” etc.
F) A. Celikyilmaz et al.(IEEE, 2010) [3] in their paper “Probabilistic model-based sentiment analysis of
twitter messages” developed a pronunciation based word clustering method for normalizing noisy
tweets.
G) Zhen Niu et al. [2] introduced a new model in which efficient approaches are used for feature selection,
weight computation and classification.
H) Wu et al. [9] proposed a influence probability model for twitter sentiment analysis.
I) Pak et al. [5] created a twitter corpus by automatically collecting tweets using Twitter API and
automatically annotating those using emoticons. Using that corpus, they built a sentiment classifier based
on the multinomial Naive Bayes classifier that uses N-gram and POS-tags as features.
J) Barbosa and Feng (2010) [10]. They use polarity predictions from three websites as noisy labels to train
a model and use 1000 manually labeled tweets for tuning and another 1000 manually labeled tweets for
testing. They however do not mention how they collect their test data. They propose the use of syntax
features of tweets like retweet, hashtags, link, punctuation and exclamation marks in conjunction with
features like prior polarity of words and POS of words.
K) Michael Gamon (2004) [11] perform sentiment analysis on feedback data from Global Support Services
survey. One aim of their paper is to analyze the role of linguistic features like POS tags. They perform
extensive feature analysis and feature selection and demonstrate that abstract linguistic analysis features
contributes to the classifier accuracy.
III. PROPOSED METHODOLOGY
In this paper we have studied different methodologies which can be useful to complete the given problem.
A. Feature Selection:
Feature selection method can be used to identify and remove unneeded and Redundant attributes such as
punctuation, infrequent word, and stop word from data that do not contribute to the accuracy of a predictive model.
We eliminate stop words like: a, about, above, across, anywhere, also, as, back, become, been, do, does, cases, certain,
end, ended, find, group, known, myself, place, such, why, yes, work, toward, this, there, etc. We also remove
punctuations like: { } , : ; ( ) . Removing unneeded and redundant attributes such as punctuation will reduce feature
set such that it will need less memory size to process and it will be easier for next steps to manipulate the feature set.
B. Preprocessing of Text:
In order to reduce the dimensionally of the document words, special method such as filtering and Stemming are
applied. Generally we use term Noise cancellation for this process. Filtering is a process in which those words gets
left out which does not help in getting sentiment precisely. Also one major advantage of using filtering is that it will
reduce the database size tremendously. Stemming is a process which reduces words to their morphological roots. For
e.g. doing, done, did may be represented as ‘Do’.
Another method is negation handling, where we negate the positive sentiment of words if they are negated in the
sentence
C. Feature Extraction:
Features extraction starts from an initial set of measured data and builds derived values. When the input data to an
algorithm is too large to be processed and it is redundant then it can be transformed into a reduced set of features.
This process is called feature extraction. The extracted features are expected to contain the relevant information from
the input data. Extracted features are also small in size comparable to the size before extraction.
4. Shirke et al., International Journal of Advance Research, Ideas and Innovations in Technology
D. Dictionary Based Algorithm:
We use dictionary based algorithm for counting purpose. Dictionary based algorithm gives number of count of specific
keywords we want to find and it also count total number of words. Suppose, there are four keyword namely : A,B,C,D
and total occurrences of these four keywords are 32 in which A occurred 5 times, B occurred 10 times, C occurred 8
times and D occurred 9 times. So, using dictionary based algorithm we found out that A occurred 5/32 times. Similarly
B occurred 10/32 times, C occurred 1/4 times, D occurred 9/32 times. We are then using this information in Naïve
Bayes classifier.
D. Naïve Bayes classifier:
Naïve Bayes classifier is a conditional probability model which is based on Bayes’ theorem which gives probability
of an event, based on conditions which might be related to that specific event. We use Naïve Bayes classifier to classify
the feature set into two set of class: positive and negative class. In a document which is training dataset for Naïve
Bayes classifier, we gave class value to sample dataset which would be given in string. The classifier then create
probability of each word given in those training sample data along with total positive and also total negative probability
of given data samples. We will use classifier to give minimum possible probability to those words which are never
been into the sample. Such that, if any new word came in test dataset, classifier will give minimum possible probability
to those words which are never been in the sampled data. Also, Naïve Bayes will give probability on the basis of
respective sentence class. The term sentence class is defined here which means that the class assigned to that overall
sentence. So, if a new word came in test dataset, then according to the overall sentence the probability of that word to
be in positive or negative class is defined. We use higher will be the label class priority which is used worldwide in
Naïve Bayes classifier. Like, if the given stream of data in test dataset is having higher negative probability than the
positive probability then that stream of data is most probably would be in negative class. So, here we used Naïve
Bayes classifier so that new sentences would get appropriate label n there would be no error.
VI. CONCLUSION
In this paper, we demonstrated a simple implementation of a naive Bayes classifier that shows accuracy comparable
with the state-of-the-art on the TWITTER dataset, yet is scalable and efficient, due to the minimal pre-processing and
the strong independence assumption of the naive Bayes classifier. We also demonstrated how different pre and post-
processing methods change accuracy, specifically how choosing the right combination of n-grams and the most
contributing features improves accuracy. We also implemented sentiment analysis on a distributed computing
platform, Apache Spark that showed high accuracy and efficiency with the Twitter dataset. We also suggest that effort
should be put into realizing more context-aware systems, as that is where the field is currently lacking.
VII. FUTURE SCOPE
In future this paper can be expanded to provide more accurate results by providing recommendations based on review
and ratings obtained from the customers. We tentatively conclude that sentiment analysis for microblogs’ data is not
that different from sentiment analysis for other genres. In future work, we will explore even richer linguistic analysis,
for example, parsing, semantic analysis and topic modeling. Also, by using Hybrid Technique, we were able to achieve
an accuracy upto 100%. So, more researchers are needed in this field.[1]. Set of current experiments indicates that
there are number of interesting directions for future work.
ACKNOWLEDGEMENT
An endeavor is successful only when it is carried out under proper guidance & blessings. We hereby are thankful to
Prof. Mr. S.R. Sakhare, Head of Department, Computer, Vishwakarma Institute of Information Technology, Pune. It
gives us great pleasure in acknowledging the support of Prof. (Mrs.) L.A. Deshpande and Mr. A. A. Bhosale and for
their productive & hopeful suggestions. We would also like to thank Mr. Sachin Deshpande CEO, InnoBytes
Technologies Pvt. Ltd for his insightful ideas and suggestions. We also thank all Teaching and Non-teaching staff of
5. Shirke et al., International Journal of Advance Research, Ideas and Innovations in Technology
VIIT, Pune for their kind of co-operation during the course. The blessings and support of family and friends has always
given me success, we are extremely thankful for their love.
REFERENCES
[1] Haseena Rahmath , Tanvir Ahmad, “Sentiment Analysis Techniques - A Comparative Study” in IJCEM
International Journal of Computational Engineering & Management, Vol. 17 Issue 4, July 2014
[2] Z. Niu, Z. Yin, and X. Kong, “Sentiment classification for microblog by machine learning,” in Computational
and Information Sciences ( ICCIS), 2012 Fourth International Conference on, pp. 286–289, IEEE, 2012.
[3] A. Celikyilmaz, D. Hakkani-Tur, and J. Feng, “Probabilistic model-based sentiment analysis of twitter
messages,” in Spoken Language Technology Workshop (SLT), 2010 IEEE, pp. 79–84, IEEE, 2010.
[4] L. Barbosa and J. Feng, “Robust sentiment detection on twitter from biased and noisy data,” in Proceedings
of the 23rd International Conference on Computational Linguistics: Posters, pp. 36–44, Association for
Computational Linguistics, 2010.
[5] Alexander Pak and Patrick Paroubek. 2010. “Twitter as a corpus for sentiment analysis and opinion mining”,
Proceedings of LREC.
[6] Apoorv Agarwal, Fadi Biadsy, and Kathleen Mckeown. 2009. “Contextual phrase-level polarity analysis using
lexical affect scoring and syntactic ngrams”. Proceedings of the 12th Conference of the European Chapter of
the ACL (EACL 2009), pages 24–32, March.
[7] Adam Bermingham and Alan Smeaton. 2010. Classifying sentiment in microblogs: is brevity an advantage is
brevity an advantage? ACM, pages 1833–1836
[8] Alec Go, Richa Bhayani, and Lei Huang. 2009. Twitter sentiment classification using distant supervision.
Technical report, Stanford.
[9] Y. Wu and F. Ren, 'Learning sentimental influence in twitter', Future Computer Sciences and Application
(ICFCSA), 2011 International Conference, IEEE, vol. 119122, 2011.
[10] L. Barbosa and J. Feng, “Robust sentiment detection on twitter from biased and noisy data,” in Proceedings
of the 23rd International Conference on Computational Linguistics: Posters, pp. 36–44, Association for
Computational Linguistics, 2010
[11] Michael Gamon. 2004. Sentiment classification on customer feedback data: noisy data, large feature vectors,
and the role of linguistic analysis. Proceedings of the 20th international conference on Computational
Linguistics.
[12] Alessandro Moschitti. 2006. Efficient convolution kernels for dependency and constituent syntactic trees. In
Proceedings of the 17th European Conference on Machine Learning.