Sentiment Analysis of Twitter tweets using supervised classification technique
Pranav Waykar et al., Int. Journal of Engineering Research and Applications, www.ijera.com
ISSN: 2248-9622, Vol. 6, Issue 5, (Part - 6) May 2016, pp.32-34
Pranav Waykar, Kailash Wadhwani, Pooja More, Archana Kollu
Department of Computer Engineering, Dr. D.Y. Patil Institute of Engineering and Technology, Pune.
ABSTRACT
The use of social media to analyse public perception of a product, event or person has gained momentum in recent times. Out of a wide array of social networks, we chose Twitter for our analysis because the opinions expressed there are concise and bear a distinctive polarity. We collect the most recent tweets on the user's area of interest and classify them as positive, negative or neutral. The classification proceeds as follows: the tweets are collected using the Twitter API; the collected tweets are pre-processed (all letters are converted to lowercase, special characters are eliminated, etc.), which makes the classification more efficient; the processed tweets are then classified using a supervised classification technique. We use a Naive Bayes classifier, trained on a set of sample tweets, to label each tweet as positive, negative or neutral. The percentage of tweets in each category is then computed and the result is represented graphically. The result gives an insight into the views of Twitter users on the topic searched for. It can help corporate houses devise strategies based on the popularity of their product among the masses, and it may help consumers make informed choices based on the general sentiment expressed by Twitter users about a product.
Keywords - Data Mining, Feature Extraction, Naïve Bayes Classifier, Natural Language Processing, Twitter, Unigram
I. INTRODUCTION
Twitter is a popular microblogging service where users create status messages (called “tweets”). These tweets sometimes express opinions about different topics. We propose a method to automatically extract the sentiment (positive, neutral or negative) of a tweet. This is very useful because it allows feedback to be aggregated without manual intervention. Consumers can use sentiment analysis to research products or services before making a purchase. Marketers can use it to research public opinion of their company and products, or to analyse customer satisfaction. Organizations can also use it to gather critical feedback about problems in newly released products. There has been a large amount of research in the area of sentiment classification. Traditionally, most of it has focused on classifying larger pieces of text, like reviews. Tweets (and microblogs in general) differ from reviews primarily because of their purpose: while reviews represent the summarized thoughts of their authors, tweets are more casual and limited to 140 characters of text. Generally, tweets are not as thoughtfully composed as reviews. Yet, they still offer companies an additional avenue to gather feedback. Previous research by Pang et al. [3] analysed the performance of different classifiers on movie reviews. Their work has served as a baseline, and many authors have used its techniques across different domains. In order to train a classifier, supervised learning usually requires hand-labelled training data.
With the large range of topics discussed on Twitter, it would be very difficult to manually collect enough data to train a sentiment classifier for tweets. Hence, we have used publicly available Twitter datasets. However, the main dataset consists only of positive and negative tweets. For neutral tweets, we have used a publicly available neutral tweet dataset. We run a Naïve Bayes classifier, trained on the positive and negative tweets dataset and the neutral tweets dataset, against a test set of tweets. This can be used by individuals and companies that may want to research sentiment on any topic.
II. BACKGROUND
Defining the sentiment
For the purpose of this work, we define sentiment as a positive or negative inclination of the expression stated by the author. If the expression doesn't bear any polarity, it is marked as a neutral sentiment.
Table 1: Example Tweets
Sentiment | Keyword | Tweet
Positive  | Weather | The weather is pretty good this morning!
Negative  | Work    | Dammnn…. I hate this clerical work
Neutral   | Bus     | The bus arrives at 8 in the evening.
Related Work
Topics related to the one discussed in this work have been researched before. Alec Go, Richa Bhayani et al. [4] classify tweets using unigram features, with classifiers trained on data obtained using distant supervision. Radha N et al. [5] show that using emoticons (distant supervision) as labels for positive and negative sentiment is effective for reducing dependencies in machine learning techniques; this idea is heavily used in [4]. Pang and Lee [3] researched the performance of various machine learning techniques in the specific domain of movie reviews.
III. METHODOLOGY
A. Pre-processing
The Twitter language model has many unique properties. These properties can be used to reduce the feature space:
1) Usernames:
In order to direct their messages, users often include Twitter usernames in their tweets. A de facto standard is to include the @ symbol before the username (e.g. @towardshumanity). A class token (AT_USER) replaces all words that begin with the @ symbol.
2) Usage of links:
Users very often include links in their tweets. To simplify our further work, we convert a URL like “http://tinyurl.com/cmn99f” to the token “URL”.
3) Stop words:
Tweets contain many stop words or filler words such as “a”, “is” and “the”, which do not indicate any sentiment; all of these are filtered out.
4) Repeated letters:
Tweets contain very casual language. For example, if you search Twitter for “hello” with an arbitrary number of 'o's at the end (e.g. helloooo), there will most likely be a nonempty result set. We use pre-processing so that any letter occurring more than two times in a row is replaced with two occurrences. In the example above, the word would be converted into the token “helloo”.
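The four pre-processing rules above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the regular expressions, the function name `preprocess` and the tiny `STOP_WORDS` set are assumptions made for the example (a real system would use a fuller stop-word list).

```python
import re

# Illustrative stop-word list only (an assumption); the paper does not give its list.
STOP_WORDS = {"a", "an", "is", "the", "of", "to", "and"}

def preprocess(tweet):
    """Return the list of tokens left after the four pre-processing rules."""
    text = tweet.lower()
    # Rule 2: replace links with the class token URL.
    text = re.sub(r"(https?://\S+|www\.\S+)", "URL", text)
    # Rule 1: replace words beginning with @ with the class token AT_USER.
    text = re.sub(r"@\w+", "AT_USER", text)
    # Rule 4: a character repeated more than twice in a row is cut to two.
    text = re.sub(r"(\w)\1{2,}", r"\1\1", text)
    # Remove special characters (keeping the underscore used in AT_USER).
    text = re.sub(r"[^A-Za-z_\s]", " ", text)
    # Rule 3: drop stop words.
    return [token for token in text.split() if token not in STOP_WORDS]

tokens = preprocess("@towardshumanity Helloooo, the weather is GOOD!!! http://tinyurl.com/cmn99f")
```

Lowercasing is done first so that the upper-case class tokens inserted afterwards remain distinguishable from ordinary words.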
B. Feature Vector
After pre-processing the tweets, we obtain features that all carry equal weight.
Unigram
A feature that is individually sufficient to indicate the sentiment of a tweet is called a unigram. For example, words like 'good' and 'happy' clearly express a positive sentiment.
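As a rough illustration of unigram features, the sketch below turns a tokenized tweet into a vector of word counts over a fixed vocabulary. The helper names `build_vocabulary` and `unigram_counts` are hypothetical, and the count-vector layout is one possible representation rather than the one the authors necessarily used.

```python
from collections import Counter

def build_vocabulary(tokenized_tweets):
    """Collect every distinct unigram seen in the training tweets, in sorted order."""
    return sorted({token for tokens in tokenized_tweets for token in tokens})

def unigram_counts(tokens, vocabulary):
    """Count how often each vocabulary word occurs in one tweet.

    Words outside the vocabulary are ignored; every feature has the same weight.
    """
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

vocabulary = build_vocabulary([["weather", "good"], ["hate", "work"]])
vector = unigram_counts(["good", "good", "bus"], vocabulary)
```

Each position of the vector corresponds to one unigram, so the count at position i plays the role of the per-feature count used by the classifier.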
C. Classification
For the classification of tweets, we make use of a Naïve Bayes classifier. Naïve Bayes is a probabilistic classifier based on Bayes' theorem. It classifies tweets based on the probability that a given tweet belongs to a particular class.
We consider three classes, namely positive, negative and neutral. We assign class c* to tweet d, where
$$c^{*} = \arg\max_{c} P_{NB}(c \mid d), \qquad P_{NB}(c \mid d) = \frac{P(c)\,\prod_{i=1}^{m} P(f_i \mid c)^{n_i(d)}}{P(d)}$$
In this formula, f_i represents a feature and n_i(d) represents the count of feature f_i found in tweet d. There are a total of m features. The parameters P(c) and P(f|c) are obtained through maximum likelihood estimates, and add-1 smoothing is utilized for unseen features. We have used the Python-based Natural Language Toolkit library to train and classify using the Naïve Bayes method.
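The decision rule can be sketched in plain Python as below. This is a from-scratch illustration of multinomial Naïve Bayes with add-1 smoothing, not the NLTK code used by the authors; the function names and the toy training tweets are assumptions. Log-probabilities are summed instead of multiplying probabilities, and P(d) is omitted because it is the same for every class.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labeled_tweets):
    """Count classes and per-class features from (tokens, label) pairs."""
    class_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in labeled_tweets:
        class_counts[label] += 1
        feature_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, feature_counts, vocabulary

def classify(model, tokens):
    """Return the class c maximising log P(c) + sum_i n_i(d) * log P(f_i | c)."""
    class_counts, feature_counts, vocabulary = model
    total_tweets = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_tweets)  # log P(c)
        class_total = sum(feature_counts[label].values())
        for feature, n in Counter(tokens).items():
            if feature not in vocabulary:
                continue  # words never seen in training are skipped (a design choice)
            # Add-1 smoothing: a feature unseen in this class still gets a non-zero probability.
            p = (feature_counts[label][feature] + 1) / (class_total + len(vocabulary))
            score += n * math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training set, invented for the example.
TRAINING = [
    (["weather", "pretty", "good", "morning"], "positive"),
    (["happy", "good", "day"], "positive"),
    (["hate", "clerical", "work"], "negative"),
    (["hate", "bad", "traffic"], "negative"),
    (["bus", "arrives", "evening"], "neutral"),
    (["meeting", "starts", "noon"], "neutral"),
]
model = train_naive_bayes(TRAINING)
```

With such a small training set the output is only indicative; the point is the shape of the computation, where each feature contributes its smoothed log-probability once per occurrence.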
IV. EVALUATION
A. Training data
There are publicly available datasets of Twitter messages with sentiment labels, such as the one provided by [4]. We have used a combination of the two datasets described in the introduction (the positive/negative tweets and the neutral tweets) to train the classifier. For the test dataset, we used 20 tweets collected at run time.
B. Experimental Setup
The Twitter API has a parameter that
specifies which language to retrieve tweets in. We
always set this parameter to English (en). Thus, our
classification will only work on tweets in English
because the training data is English-only.
We built a web interface that searches
the Twitter API for a given keyword over the past
one day or seven days and fetches the results,
which are then subjected to pre-processing. The
filtered tweets are fed into the trained classifier,
and the resulting output is shown as a graph in
the web interface.
V. RESULTS
When a keyword was entered into the
search box, the tweets about the entered topic were
collected and classified. To test the application,
we searched for tweets about “Donald Trump”.
Twenty tweets were shown to the user and
classified using the Naïve Bayes classifier. The
result of the classification was displayed in the
form of a pie chart as follows:
Figure 1: Pie-chart
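The slice sizes of such a pie chart are simply the share of tweets assigned to each class. A minimal sketch of that computation is given below; the function name `sentiment_percentages` and the class labels used as defaults are assumptions for illustration, and no figures from the paper's experiment are reproduced here.

```python
from collections import Counter


def sentiment_percentages(labels, classes=("positive", "negative", "neutral")):
    """Return the percentage of tweets assigned to each class."""
    counts = Counter(labels)
    total = len(labels)
    if total == 0:
        # No tweets were classified, so every slice is empty.
        return {c: 0.0 for c in classes}
    return {c: 100.0 * counts[c] / total for c in classes}
```

The resulting dictionary could then be handed to a plotting routine such as matplotlib's `pyplot.pie` to draw the chart.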
Once the results were displayed, the user
was asked whether they wanted to see the segregated
tweets for the sake of justification. The accuracy of
the results depends on the number of training tweets
fed to the classifier: the higher the number of
training tweets, the greater the accuracy.
This result about “Donald Trump”
can be used by voters and political analysts alike.
Voters can use the data to see the positive as well
as negative aspects of Mr. Trump, whereas
political analysts and psephologists can use it to
make their predictions.
VI. FUTURE WORK
Machine learning techniques perform well for
classifying sentiment in tweets. We believe the
accuracy of the system could still be improved.
Below is a list of ideas we think could help the
classification:
A. Semantics
The polarity of a tweet may depend on the
perspective you are interpreting the tweet from. For
example, in the tweet “Federer beats Nadal :)”, the
sentiment is positive for Federer and negative for
Nadal. In this case, semantics may help. Using a
semantic role labeler may indicate which noun is
mainly associated with the verb and the
classification would take place accordingly. This
may allow “Nadal beats Federer :)” to be classified
differently from “Federer beats Nadal :)”.
B. Internationalization
Currently, we focus only on English
tweets, but Twitter has a huge international
audience. It should be possible to use our approach
to classify sentiment in other languages with a
language-specific positive/negative keyword list.
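As a simplified illustration of what a language-specific keyword list could look like, the sketch below counts positive and negative keywords for a chosen language. It is a plain keyword count rather than the Naïve Bayes classifier, and the tiny word lists, the `KEYWORDS` table and the function name are all hypothetical examples, not data from this work.

```python
# Hypothetical keyword lists, for illustration only.
KEYWORDS = {
    "en": {"positive": {"good", "happy"}, "negative": {"bad", "sad"}},
    "es": {"positive": {"bueno", "feliz"}, "negative": {"malo", "triste"}},
}


def keyword_polarity(tokens, language):
    """Label a tokenised tweet by counting keyword hits for one language."""
    lists = KEYWORDS[language]
    positive_hits = sum(1 for t in tokens if t in lists["positive"])
    negative_hits = sum(1 for t in tokens if t in lists["negative"])
    if positive_hits > negative_hits:
        return "positive"
    if negative_hits > positive_hits:
        return "negative"
    return "neutral"
```

Keeping the lists in a table keyed by language code means a new language can be added without touching the scoring logic.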
VII. CONCLUSION
A live Twitter feed is collected under the
keywords entered by the user. The feed is stored
locally in a JSON file. The data is pre-processed to
remove unnecessary spaces, symbols and useless
features, although further work is still required to
remove as much noise as possible. Twenty tweets are
then stored as a CSV file for analysis. A number of
lexicon-based methods are applied to individual
tweets from the file to assess their usefulness. The
chosen classifier for this work is a Naïve Bayes
classifier that uses the text processing tools in
NLTK and their capacity to work with human language
data. It is trained on tagged tweets and then used to
analyse the sentiment in the tweets about the
searched topic. The result is represented as a pie
chart showing the percentage of users who have a
positive opinion on the searched topic compared with
those who have a negative opinion or are
neutral.
REFERENCES
[1]. Adam Tsakalidis, Symeon Papadopoulos,
Alexandra Cristea, Yiannis Kompatsiaris,
“Predicting Elections for Multiple Countries
Using Twitter and Polls,” IEEE, 2015.
[2]. Gayo-Avello, Daniel, A meta-analysis of
state-of-the-art electoral prediction from
Twitter data, Social Science Computer
Review, 2013.
[3]. B. Pang, L. Lee, and S. Vaithyanathan.
Thumbs up? Sentiment classification using
machine learning techniques. In Proceedings
of the Conference on Empirical Methods in
Natural Language Processing (EMNLP),
pages 79–86, 2002.
[4]. Alec Go, Richa Bhayani, Lei Huang. Twitter
Sentiment Classification using Distant
Supervision. Technical report, Stanford
Digital Library Technologies Project, 2009.
[5]. Jiawei Han, Micheline Kamber, “Data
Mining: Concepts and Techniques”, Morgan
Kaufmann Publishers, second edition, pages
310–317.
[6]. Publicly available Twitter dataset:
http://www.sananalytics.com/lab/twitter-sentiment/sanders-twitter-0.2.zip
[7]. Steven Bird, Ewan Klein, Edward Loper,
“Natural Language Processing with Python”,
O’Reilly, 2009.
[8]. Efthymios Kouloumpis, Theresa Wilson,
Johanna Moore, “Twitter Sentiment
Analysis: The Good the Bad and the
OMG!”, AAAI Conference on Weblogs and
Social Media.