SlideShare a Scribd company logo
1 of 4
Download to read offline
Sentiments Analysis Of Twitter Data Using Data
Mining
Anurag P. Jain
Dept. of Information Technology
Pimpri Chinchwad College of Engineering
Pune, India
Email:anuragpjain7@gmail.com
Mr. Vijay D. Katkar
Dept. of Information Technology
Pimpri Chinchwad College of Engineering
Pune, India
Email:katkarvijayd@gmail.com
Abstract—With rapid growth in user of Social Media in recent
years, the researcher get attracted towords the use of social media
data for sentiments analysis of people or particular product or
person or event. Twitter is one of the widely used social media
platform to express the thoughts. This Paper presents approch for
analysing the sentiments of users using data mining classifiers. It
also compares the performance of single classifiers for sentiments
analysis over ensemble of classifier. Expermental results obtained
demonstrates that k-nearest neighbour classifier gives very high
predictive accuracy. Result also demonstrate that single classifiers
outperforms ensemble of classifier approch.
keywords- Sentiments analysis, Twitter, k-nearest neigh-
bour, Random Forest, Naive Naive Baysin, Bays Net, ensem-
ble of classifiers, openion mining
I. INTRODUCTION
From the past few years, social media plays a vital role
in modern life. Numbers of users of social media goes on
incresing day by day. Users Post their view, thoughts, life
events on social media and that too without any restriction
and hesitation. Some of the social media allow users to
interact with only with their freinds and sharing their post
with very easy level of privacy. Due to simple and easy
privacy policies,and easy accessibilty of a some social media,
users are migrated from traditional means of communication
such as blogs or mailing list to microblogging site such as
Twitter, Facebook etc. Billions of text data in the form of
messages on social media make it very facinating medium
for data analysis for the researchers.
Sentiments analysis is a method of computing and
stratifying a view of a person given in a piece of a text,
especially in order to identify persons thinking towards a
specific topic product etc..is positive or negative or neutral.
Social media portals have been globally used for expressing
a sentiments publicly through text based message and images
[1] . Currently, Twitter, facebook linkedIn, flickr etc enables
users to post their view publicly. Among all social media
Twitter widely get used for
This work is partially funded by BCUD, Savitribai Phule
Pune university
978-1-4673-7758-4/15/$31.00 c 2015 IEEE
expressing view on certain topic. Twitter allow tweeple(i.e
users of twitter) to post their opinion on politics
environment,entertainment,industry,stock market etc.
Twitter is Social Networking website that allow users to
send and read 140 character messages called tweets. Users
of Twitter can read and post tweets. According to Statista
survey there are 304 million monthly active users of Twitter.
Approximately 500 million tweets get tweeted per day on
twitter. Huge amount of valuable data resides on twitter
which gives opprunity to many researcher to dig up into
tweets and make prediction about event, product, industry
stock market etc by using various classification methods. The
availability of huge amount of data has drawn attention to
many researcher on researching twitter data statistically or
more specific sense scientifically and meaningfully.
Rest of the paper organized as follow: Section 2 describe
about related work; Section 3 describes proposed method-
ology; Section 4 presents Results and Analysis and final
conlusion detailed in section 5
II. RELATED WORKS
Many researcher carried out their research work in
sentiments analysis using social media. Several reseacher
have emphsize their attention on stastical results from social
media using various sentiments analysis methods. Malhar
Anjaria et.al. [1] introduce the novel approach of exploiting
the user influence factor in order to predict the outcome of
an election result. Athours also propose a hybrid approach of
extracting opinion using direct and indirect features of Twitter
data based on Support Vector Machines (SVM), Naive Bayes,
Maximum Entropy and Artificial Neural Networks based
supervised classifiers.
Min Song et.al. [2] employ temporal Latent Dirichlet
Allocation (LDA) to analyze and validate the relationship
between topics extracted from tweets and related events. They
developed the term cooccurrence retrieval technique to trace
chronologically cooccurring terms and thereby compensate for
LDAs limitations. Finally, authors identify thematic coherence
among users identified in sending receiving mentions.
Li Bing et. al. [3] proposed a method to mine Twitter
data for prediction of the movements of the stock price of
a particular company through public sentiments. Authors
also explain how stock price of one company to be more
predictable than that of another company and they proposed
to used a data mining algorithm to determine the stock price
movements of 30 companies listed in NASDAQ and the New
York Stock Exchange can actually be predicted by the given
15 million records of tweets (i.e., Twitter messages). They
did so by extracting ambiguous textual tweet data through
NLP techniques to define public sentiment, then make use of
a data mining technique to discover patterns between public
sentiment and real stock price movements.
Khoshgoftaar et. al. [4] explored techniques to apply
data mining towards the goal of identifying those who
score in the top 1.4% of a well-known psychopathy metric
using information available from their Twitter accounts.
Authors apply a newly-proposed form of ensemble learning,
SelectRUSBoost (which adds feature selection to their
earlier imbalance-aware ensemble in order to resolve
highdimensionality), they also employed four classification
learners, and use four feature selection techniques.
Mahmood et. al. [5] Analyzed the impact of tweets in
predicting the winner of the recent 2013 election held
in Pakistan. Author used Rapid miner as data mining
tool. For perforance analysis they used CHAID decision
tree(Chisquared Automatic Interaction Detector), Nave Bayes
and Support Vector Machine (SVM) algorithms. Author
collected a tweets by using twittermachine.
Tumitan, D. et. al. [6] investigate whether it is possible
to predict variations in vote intention based on sentiment
time series extracted from news comments, using three
Brazilian elections as case study. They Mainly emphasize
on an approach to predict polls vote intention variations that
is adequate for scenarios of sparse data. Authors developed
experiments to assess the influence on the forecasting accuracy
of the proposed features, and their respective preparation.
V.S. subrahmanian et. al. [7] used Sentiment Diffusion
Forecasting by using dataset of 23 million tweets from over 16
million Twitter users. Reaserchers obtain the Sentiment Scores
of tweets by adaptation of the AVA (adjective-verbadverb)
sentiment analysis algorithm.
NaiveBays :- Naive Baysin is used by many researchers
[8,9,10] to detect the sentiments of tweet. It works using
probalistic model given below
p(Ck | z1....zn) (1)
For each of k possible outcomes or classes.But if number
feature is lagre that is value of n is large then above formula
is not work well . Because probability tables become too large
and infeasible to handle. Therefore Bays theorem is used,
which decomposed the conditional probability as
p(Ck|z) =
p(Ck) p(z|Ck)
p(z)
. (2)
Where Ck is class for each of k possible outcomes. And z
are the instances to be classified .
A Bayesian network is also widely get used classi-
fier [11,112,13] in opinion mining or sentiments analysis .
Bayesian network work on the principle of joint probability
distribution function. Let pair (G,CPD) encodes joint prob-
ability distribution p(X1,X2........Xn) where X(X1,X2,.....,Xn)
discrete random variables.A unique joint probability distribu-
tion X over G from is factorized as:
p(X1, X2........Xn) = (p(Xi | Pa(Xi))) (3)
.
Many researcher [14,15,16] also used Randome Forest for
detection of a sentiments od tweets. Random forests are an
ensemble learning method for classification. Randaom forest
are combination of trees. In more specific way, Random forest
classifier defined as learning ensemble based on bagging of
un-pruned decision tree learners wih randomised seletion of
features at each split.
knearest neighbour classifier is also a widely used clas-
sifier in tweets classification. Many researcher [17,18,19]
used knearest neighbour for classification of tweets .knear-
est neighbour is simplest algorithm.knearest neighbour(KNN)
is instance based learning algorithm where all computation
delayed till classification.
III. PROPOSED METHODOLOGY
This paper presents a mechanism to predict the overall
sentiments inclination of indian people towords political sit-
uation and issue. Figure 1 shows the proposed system flow.
Raw training tweets are collected by using Twitter API v 1.1.
After collecting a raw tweets various preprocessing methode
get applied to clean the data. Same methodes are applied
for collecting and cleaning raw tweets for preparing test-
ing dataset. After preparation of training and testing dataset
various classifiers get applied to analyze the performance of
classifiers .
Following subsection explains the each figure 1 block in detail.
A. Data Collection
Traning and testing tweets collected from twitter by using
twitter searched API v 1.1 for various political leaders and
parties in india. Harvested tweets is of only a english language
tweets.
Fig. 1. Figure 1:Methodology.
B. Preprocessing
Tweets are sometime not in the usable format. To get a
tweets in usable format various preprocessing methode for
cleaning a tweets get applied. All userID,twitterId, userinfo
from the tweets is removed. All special character and hy-
perlinks is removed from the tweets. Duplicate tweets also
get removed from traininig dataset. Traning data set does not
contain retweets. After applying all cleaning methodes text
that is only with the tweeted text is remained.
C. Training DataSet and testing dataset
For classifying a dataset, three classes, Positive, Negative,
Nuetral are used. To classify tweets in to these category
SentiWordNet 3.0.0. dictionary is used. SentiWordNet 3.0.0.
dictionary contains the 117659 word. Each words is assgin
with its positive negative polarity. If positive and negative
polarity of word is 0 then word considerd as neutral. Basically
single tweet is spilted into words, after splitting, polarity of
words is calculated from SentiWordNet 3.0.0. After deciding a
polarity of each word all positive and negative words polarity
get added separatly. And then comparision between positive
words polarity with negative words polarity get done. If
sentense having more positive polarity then classify sentense(
tweet) as positive polarity. Same for nagative and neutral
polarity. Duplicates tweets were not considerd in a training
dataset.
IV. RESULT ANALYSIS AND DISCUSSION
2,102,52 tweets were collected about various political lead-
ers and parties. Data cleaning process leaves us with 35% of
original tweets.2 lack tweets collcted for various political lead-
ers. Experiment performed with following classifier: classifier.
1)k-nearest neighbour 2)Random Forest. 3)Naive Baysin
4)Baysnet. Table 1 shows the prediction accuracy of all
classifiers when stopwords are not removed from traning and
testing tweets
TABLE I
ACCURACIES FOR MACHINE LEARNER
Algorithm Accuracy
k-nearest neighbour 99.6456%
RandomForest 99.0373%
BaysNet 75.0695%
NaivBays 60.3159
k-nearest neighbour and random forest gives much better
accuracy compred to naive baysin and bays Net. k-nearest
neighbour give a highest accuracy.Prediction accuracy of
k-nearest neighbour 99.6456%.
Table 2 shows the prediction accuracy of all classifiers
when stopwords are removed from traning and testing tweets.
k-nearest neighbour gives better accuracy compred to all
three classifier. Prediction accuracy of k-nearest neighbour
when stopwords are removed is 96.6398%
TABLE II
ACCURACY OF MACHINE LEARNER WITHOUT STOPWORDS
Algorithm Accuracy
k-nearest neighbour 96.6398%
RandomForest 65.6681%
BaysNet 48.9579%
NaivBays 60.3159
Table 3 shows prediction accuracy when ensemble of
classifier are used to classify tweets without stopwords
removel.Ensemble of random forest , Baysnet and k-nearest
neighbour gives a higher accuracy that is 99.2400%
TABLE III
ACCURACY OF ENSEMBLE OF CLASSIFIER WITH STOPWORDS
Algorithm Accuracy
Naive baysin
Baysnet
RandomForest
81.8989%
Naive Bayesian
BayesNet
KNN
82.0705%
RandomForest
BayesNet
KNN
99.2400%
Naive Bayesian
BayesNet
RandomForest
KNN
91.8448%
Table 4 shows prediction accuracy when ensemble of classi-
fier are used to classify tweets without stopwords . Ensemble
of random forest , Baysnet and k-nearest neighbour gives a
higher accuracy that is 91.0418%
TABLE IV
ACCURACY OF ENSEMBLE OF CLASSIFIER WITHOUT
STOPWORDS
Algorithm Accuracy
Naive baysin
Baysnet
RandomForest
73.6148%
Naive Bayesian
BayesNet
KNN
73.8515%
RandomForest
BayesNet
KNN
91.0418%
Naive Bayesian
BayesNet
RandomForest
KNN
87.0895%
V. CONCLUSION
It can be observed from the experimental results that
data mining classifiers is a good choice for sentiments pre-
diction using tweeter data. In a experimentation, knearest
neighbour(IBK) outperforms over all three classifier namely
RandomForest, baysNet, Naive Baysein. RandomForest also
gives good prediction accuracy. There is a no need to use
of ensemble of classifier for sentiments predictions of tweets
as single classifier ( i.e knearest neighbour) gives a better
accuracy over all combinations of ensemble of classifier .
REFERENCES
[1] Malhar Anjaria, Ram Mahana Reddy Guddeti, ”Influence Factor Based
Opinion Mining of Twitter Data Using Supervised Learning”, Sixth
International Conference on Communication Systems and Networks
(COMSNETS), 2014 IEEE
[2] Min SongMeen Chul Kim , Yoo Kyung Jeong,”Analyzing the Political
Landscape of 2012 Korean Presidential Election in Twitter ”,Intelligent
Systems, IEEE (Volume:29 , Issue: 2 ) ,2014 IEEE.
[3] Li Bing ,Chan, K.C.C. , Ou, C. ,”Public Sentiment Analysis in Twitter
Data for Prediction of A Companys Stock Price Movements”,11th Inter-
national Conference on e-Business Engineering (ICEBE), 2014 IEEE.
[4] Wald, R. , Khoshgoftaar, T.M., Napolitano, A.Sumner, C.,”Using Twitter
Content to Predict Psychopathy”,11th International Conference on Ma-
chine Learning and Applications 2014.
[5] Mahmood, T.,Iqbal, T. ; Amin, F. ; Lohanna, W.; Mustafa, A.,Mining
Twitter big data to predict 2013 Pakistan election winner,16th Interna-
tional Multi Topic Conference (INMIC),2013 IEEE.
[6] Tumitan, D. ,Becker, K.,”Sentiment-Based Features for Predicting Elec-
tion Polls: A Case Study on the Brazilian Scenario”,Web Intelligence
(WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM
International Joint Conferences on (Volume:2 ) 2015.
[7] Vadim Kagan and Andrew Stevens, V.S. Subrahmanian.”Using Twitter
Sentiment to Forecast the 2013 Pakistani Election and the 2014 Indian
Election” ,Intelligent Systems, IEEE (Volume:30 , Issue: 1, ) 2015 IEEE
[8] ,Bo Pang. Lilliam Lee, ”Seeing Stars: Exploiting class relationships fpr
sentiment categorization with respect to rating scales”, 2002.
[9] Cozma, R., and Chen, K., ”Congressional Candidates” Use of Twitter
During the 2010 Midterm Elections: A Wasted Opportunity?” 6lst Annual
Conference of the International Communication Association, 2011.
[10] Pew Research Center, ”Parsing Election Day Media: How the Midterms
Message Varied by Platform”, Pew, 2010.
[11] S. Meganck, P. Leray, B. Manderick, Learning Causal Bayesian Net-
works from Observations and Experiments: A Decision Theoretic Ap-
proach, Proceedings of Modelling Decisions in Artificial Intelligence
(MDAI 2006), LNAI 3885, 2006, pp. 58-69.
[12] G. Li, T.-Y. Leong, A framework to learn Bayesian Networks from
changing, multiple-source biomedical data, Proceedings of the 2005
AAAI Spring Symposium on Challenges to Decision Support in a
Changing World, Stanford University, CA, USA, 2005, pp. 66-72.
[13] R.E. Neapolitan, Learning Bayesian Networks, Prentice Hall, 2004.
[14] http://www.dabi.temple.edu/∼hbling/8590.002/Montillo
RandomForests 4-2-2009.pdf
[15] Gokulakrishnan, B,Priyanthan, P., Ragavan, T,”Opinion mining and
sentiment analysis on a Twitter data stream”, Advances in ICT for
Emerging Regions (ICTer), 2012 International Conference on,2012 IEEE.
[16] Ndia F.F. da Silva, Eduardo R. Hruschkaa,”Tweet sentiment analysis
with classifier ensembles”,Decision Support Systems,Volume 66, October
2014, Pages 170179.
[17] http://www.math.le.ac.uk/people/ag153/homepage/KNN/OliverKNN
Talk.pdf
[18] http://www.math.le.ac.uk/people/ag153/homepage/KNN/KNN3.html.

More Related Content

What's hot

IRJET- Twitter Sentimental Analysis for Predicting Election Result using ...
IRJET-  	  Twitter Sentimental Analysis for Predicting Election Result using ...IRJET-  	  Twitter Sentimental Analysis for Predicting Election Result using ...
IRJET- Twitter Sentimental Analysis for Predicting Election Result using ...IRJET Journal
 
Fake News Detection using Machine Learning
Fake News Detection using Machine LearningFake News Detection using Machine Learning
Fake News Detection using Machine Learningijtsrd
 
A Paper on Web Data Segmentation for Terrorism Detection using Named Entity R...
A Paper on Web Data Segmentation for Terrorism Detection using Named Entity R...A Paper on Web Data Segmentation for Terrorism Detection using Named Entity R...
A Paper on Web Data Segmentation for Terrorism Detection using Named Entity R...IRJET Journal
 
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MININGFAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MININGijnlc
 
IRJET - Sentiment Analysis for Marketing and Product Review using a Hybrid Ap...
IRJET - Sentiment Analysis for Marketing and Product Review using a Hybrid Ap...IRJET - Sentiment Analysis for Marketing and Product Review using a Hybrid Ap...
IRJET - Sentiment Analysis for Marketing and Product Review using a Hybrid Ap...IRJET Journal
 
Odsc 2018 detection_classification_of_fake_news_using_cnn_venkatraman
Odsc 2018 detection_classification_of_fake_news_using_cnn_venkatramanOdsc 2018 detection_classification_of_fake_news_using_cnn_venkatraman
Odsc 2018 detection_classification_of_fake_news_using_cnn_venkatramanvenkatramanJ4
 
IRJET - Fake News Detection using Machine Learning
IRJET -  	  Fake News Detection using Machine LearningIRJET -  	  Fake News Detection using Machine Learning
IRJET - Fake News Detection using Machine LearningIRJET Journal
 
Graph-based Analysis and Opinion Mining in Social Network
Graph-based Analysis and Opinion Mining in Social NetworkGraph-based Analysis and Opinion Mining in Social Network
Graph-based Analysis and Opinion Mining in Social NetworkKhan Mostafa
 
Sensing Trending Topics in Twitter for Greater Jakarta Area
Sensing Trending Topics in Twitter for Greater Jakarta Area Sensing Trending Topics in Twitter for Greater Jakarta Area
Sensing Trending Topics in Twitter for Greater Jakarta Area IJECEIAES
 
DYNAMIC LARGE SCALE DATA ON TWITTER USING SENTIMENT ANALYSIS AND TOPIC MODELING
DYNAMIC LARGE SCALE DATA ON TWITTER USING SENTIMENT ANALYSIS AND TOPIC MODELINGDYNAMIC LARGE SCALE DATA ON TWITTER USING SENTIMENT ANALYSIS AND TOPIC MODELING
DYNAMIC LARGE SCALE DATA ON TWITTER USING SENTIMENT ANALYSIS AND TOPIC MODELINGAndry Alamsyah
 
Neural Network Based Context Sensitive Sentiment Analysis
Neural Network Based Context Sensitive Sentiment AnalysisNeural Network Based Context Sensitive Sentiment Analysis
Neural Network Based Context Sensitive Sentiment AnalysisEditor IJCATR
 
Final Poster for Engineering Showcase
Final Poster for Engineering ShowcaseFinal Poster for Engineering Showcase
Final Poster for Engineering ShowcaseTucker Truesdale
 
Finding Pattern in Dynamic Network Analysis
Finding Pattern in Dynamic Network AnalysisFinding Pattern in Dynamic Network Analysis
Finding Pattern in Dynamic Network AnalysisAndry Alamsyah
 
IRJET- Fake News Detection and Rumour Source Identification
IRJET- Fake News Detection and Rumour Source IdentificationIRJET- Fake News Detection and Rumour Source Identification
IRJET- Fake News Detection and Rumour Source IdentificationIRJET Journal
 
A data mining tool for the detection of suicide in social networks
A data mining tool for the detection of suicide in social networksA data mining tool for the detection of suicide in social networks
A data mining tool for the detection of suicide in social networksYassine Bensaoucha
 
Cyworld Jeju 2009 Conference(10 Aug2009)No2(2)
Cyworld Jeju 2009 Conference(10 Aug2009)No2(2)Cyworld Jeju 2009 Conference(10 Aug2009)No2(2)
Cyworld Jeju 2009 Conference(10 Aug2009)No2(2)SangMe Nam
 
Temporal Exploration in 2D Visualization of Emotions on Twitter Stream
Temporal Exploration in 2D Visualization of Emotions on Twitter StreamTemporal Exploration in 2D Visualization of Emotions on Twitter Stream
Temporal Exploration in 2D Visualization of Emotions on Twitter StreamTELKOMNIKA JOURNAL
 
IRJET- Fake News Detection
IRJET- Fake News DetectionIRJET- Fake News Detection
IRJET- Fake News DetectionIRJET Journal
 
A scalable, lexicon based technique for sentiment analysis
A scalable, lexicon based technique for sentiment analysisA scalable, lexicon based technique for sentiment analysis
A scalable, lexicon based technique for sentiment analysisijfcstjournal
 
IRJET - Unauthorized Terror Attack Tracking System using Web Usage Mining
IRJET - Unauthorized Terror Attack Tracking System using Web Usage MiningIRJET - Unauthorized Terror Attack Tracking System using Web Usage Mining
IRJET - Unauthorized Terror Attack Tracking System using Web Usage MiningIRJET Journal
 

What's hot (20)

IRJET- Twitter Sentimental Analysis for Predicting Election Result using ...
IRJET-  	  Twitter Sentimental Analysis for Predicting Election Result using ...IRJET-  	  Twitter Sentimental Analysis for Predicting Election Result using ...
IRJET- Twitter Sentimental Analysis for Predicting Election Result using ...
 
Fake News Detection using Machine Learning
Fake News Detection using Machine LearningFake News Detection using Machine Learning
Fake News Detection using Machine Learning
 
A Paper on Web Data Segmentation for Terrorism Detection using Named Entity R...
A Paper on Web Data Segmentation for Terrorism Detection using Named Entity R...A Paper on Web Data Segmentation for Terrorism Detection using Named Entity R...
A Paper on Web Data Segmentation for Terrorism Detection using Named Entity R...
 
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MININGFAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING
 
IRJET - Sentiment Analysis for Marketing and Product Review using a Hybrid Ap...
IRJET - Sentiment Analysis for Marketing and Product Review using a Hybrid Ap...IRJET - Sentiment Analysis for Marketing and Product Review using a Hybrid Ap...
IRJET - Sentiment Analysis for Marketing and Product Review using a Hybrid Ap...
 
Odsc 2018 detection_classification_of_fake_news_using_cnn_venkatraman
Odsc 2018 detection_classification_of_fake_news_using_cnn_venkatramanOdsc 2018 detection_classification_of_fake_news_using_cnn_venkatraman
Odsc 2018 detection_classification_of_fake_news_using_cnn_venkatraman
 
IRJET - Fake News Detection using Machine Learning
IRJET -  	  Fake News Detection using Machine LearningIRJET -  	  Fake News Detection using Machine Learning
IRJET - Fake News Detection using Machine Learning
 
Graph-based Analysis and Opinion Mining in Social Network
Graph-based Analysis and Opinion Mining in Social NetworkGraph-based Analysis and Opinion Mining in Social Network
Graph-based Analysis and Opinion Mining in Social Network
 
Sensing Trending Topics in Twitter for Greater Jakarta Area
Sensing Trending Topics in Twitter for Greater Jakarta Area Sensing Trending Topics in Twitter for Greater Jakarta Area
Sensing Trending Topics in Twitter for Greater Jakarta Area
 
DYNAMIC LARGE SCALE DATA ON TWITTER USING SENTIMENT ANALYSIS AND TOPIC MODELING
DYNAMIC LARGE SCALE DATA ON TWITTER USING SENTIMENT ANALYSIS AND TOPIC MODELINGDYNAMIC LARGE SCALE DATA ON TWITTER USING SENTIMENT ANALYSIS AND TOPIC MODELING
DYNAMIC LARGE SCALE DATA ON TWITTER USING SENTIMENT ANALYSIS AND TOPIC MODELING
 
Neural Network Based Context Sensitive Sentiment Analysis
Neural Network Based Context Sensitive Sentiment AnalysisNeural Network Based Context Sensitive Sentiment Analysis
Neural Network Based Context Sensitive Sentiment Analysis
 
Final Poster for Engineering Showcase
Final Poster for Engineering ShowcaseFinal Poster for Engineering Showcase
Final Poster for Engineering Showcase
 
Finding Pattern in Dynamic Network Analysis
Finding Pattern in Dynamic Network AnalysisFinding Pattern in Dynamic Network Analysis
Finding Pattern in Dynamic Network Analysis
 
IRJET- Fake News Detection and Rumour Source Identification
IRJET- Fake News Detection and Rumour Source IdentificationIRJET- Fake News Detection and Rumour Source Identification
IRJET- Fake News Detection and Rumour Source Identification
 
A data mining tool for the detection of suicide in social networks
A data mining tool for the detection of suicide in social networksA data mining tool for the detection of suicide in social networks
A data mining tool for the detection of suicide in social networks
 
Cyworld Jeju 2009 Conference(10 Aug2009)No2(2)
Cyworld Jeju 2009 Conference(10 Aug2009)No2(2)Cyworld Jeju 2009 Conference(10 Aug2009)No2(2)
Cyworld Jeju 2009 Conference(10 Aug2009)No2(2)
 
Temporal Exploration in 2D Visualization of Emotions on Twitter Stream
Temporal Exploration in 2D Visualization of Emotions on Twitter StreamTemporal Exploration in 2D Visualization of Emotions on Twitter Stream
Temporal Exploration in 2D Visualization of Emotions on Twitter Stream
 
IRJET- Fake News Detection
IRJET- Fake News DetectionIRJET- Fake News Detection
IRJET- Fake News Detection
 
A scalable, lexicon based technique for sentiment analysis
A scalable, lexicon based technique for sentiment analysisA scalable, lexicon based technique for sentiment analysis
A scalable, lexicon based technique for sentiment analysis
 
IRJET - Unauthorized Terror Attack Tracking System using Web Usage Mining
IRJET - Unauthorized Terror Attack Tracking System using Web Usage MiningIRJET - Unauthorized Terror Attack Tracking System using Web Usage Mining
IRJET - Unauthorized Terror Attack Tracking System using Web Usage Mining
 

Viewers also liked

Çalışma Yaprağı
Çalışma YaprağıÇalışma Yaprağı
Çalışma YaprağıRabiayaz
 
My new Resume July 7 2016 for monster
My new Resume July 7 2016 for monsterMy new Resume July 7 2016 for monster
My new Resume July 7 2016 for monsterMichelle Alleyne
 
Tjedna lista romana (2)
Tjedna lista romana (2)Tjedna lista romana (2)
Tjedna lista romana (2)Tomislavladan
 
4Q07 Results Presentation
4Q07 Results Presentation  4Q07 Results Presentation
4Q07 Results Presentation RiRossi
 
Constancia Instructor ODM - Angello Manrique
Constancia Instructor ODM - Angello ManriqueConstancia Instructor ODM - Angello Manrique
Constancia Instructor ODM - Angello ManriqueAngello Manrique Vigil
 
Portfolio - Ray Lim
Portfolio - Ray LimPortfolio - Ray Lim
Portfolio - Ray LimRay Lim
 
Mais+Economia ed. 0002 (WEB)
Mais+Economia ed. 0002 (WEB)Mais+Economia ed. 0002 (WEB)
Mais+Economia ed. 0002 (WEB)revistamaismais
 
Reussir Sa Strategie Fichier Btob
Reussir Sa Strategie Fichier BtobReussir Sa Strategie Fichier Btob
Reussir Sa Strategie Fichier BtobiProspects
 
Trouver de nouveaux clients avec internet ? [Ecroissance-Prospection btob]
Trouver de nouveaux clients avec internet ? [Ecroissance-Prospection btob]Trouver de nouveaux clients avec internet ? [Ecroissance-Prospection btob]
Trouver de nouveaux clients avec internet ? [Ecroissance-Prospection btob]François Jourde
 
TMPA-2015: A Need To Specify and Verify Standard Functions
TMPA-2015: A Need To Specify and Verify Standard FunctionsTMPA-2015: A Need To Specify and Verify Standard Functions
TMPA-2015: A Need To Specify and Verify Standard FunctionsIosif Itkin
 
TMPA-2015: Expanding the Meta-Generation of Correctness Conditions by Means o...
TMPA-2015: Expanding the Meta-Generation of Correctness Conditions by Means o...TMPA-2015: Expanding the Meta-Generation of Correctness Conditions by Means o...
TMPA-2015: Expanding the Meta-Generation of Correctness Conditions by Means o...Iosif Itkin
 
TMPA-2015: Information Support System for Autonomous Spacecraft Control Macro...
TMPA-2015: Information Support System for Autonomous Spacecraft Control Macro...TMPA-2015: Information Support System for Autonomous Spacecraft Control Macro...
TMPA-2015: Information Support System for Autonomous Spacecraft Control Macro...Iosif Itkin
 

Viewers also liked (16)

Çalışma Yaprağı
Çalışma YaprağıÇalışma Yaprağı
Çalışma Yaprağı
 
Using the voice
Using the voiceUsing the voice
Using the voice
 
cms45020
cms45020cms45020
cms45020
 
Abbie-Letterofrec
Abbie-LetterofrecAbbie-Letterofrec
Abbie-Letterofrec
 
My new Resume July 7 2016 for monster
My new Resume July 7 2016 for monsterMy new Resume July 7 2016 for monster
My new Resume July 7 2016 for monster
 
Tjedna lista romana (2)
Tjedna lista romana (2)Tjedna lista romana (2)
Tjedna lista romana (2)
 
4Q07 Results Presentation
4Q07 Results Presentation  4Q07 Results Presentation
4Q07 Results Presentation
 
Constancia Instructor ODM - Angello Manrique
Constancia Instructor ODM - Angello ManriqueConstancia Instructor ODM - Angello Manrique
Constancia Instructor ODM - Angello Manrique
 
Portfolio - Ray Lim
Portfolio - Ray LimPortfolio - Ray Lim
Portfolio - Ray Lim
 
Mais+Economia ed. 0002 (WEB)
Mais+Economia ed. 0002 (WEB)Mais+Economia ed. 0002 (WEB)
Mais+Economia ed. 0002 (WEB)
 
Reussir Sa Strategie Fichier Btob
Reussir Sa Strategie Fichier BtobReussir Sa Strategie Fichier Btob
Reussir Sa Strategie Fichier Btob
 
2016_13_brooking_prize_skaik
2016_13_brooking_prize_skaik2016_13_brooking_prize_skaik
2016_13_brooking_prize_skaik
 
Trouver de nouveaux clients avec internet ? [Ecroissance-Prospection btob]
Trouver de nouveaux clients avec internet ? [Ecroissance-Prospection btob]Trouver de nouveaux clients avec internet ? [Ecroissance-Prospection btob]
Trouver de nouveaux clients avec internet ? [Ecroissance-Prospection btob]
 
TMPA-2015: A Need To Specify and Verify Standard Functions
TMPA-2015: A Need To Specify and Verify Standard FunctionsTMPA-2015: A Need To Specify and Verify Standard Functions
TMPA-2015: A Need To Specify and Verify Standard Functions
 
TMPA-2015: Expanding the Meta-Generation of Correctness Conditions by Means o...
TMPA-2015: Expanding the Meta-Generation of Correctness Conditions by Means o...TMPA-2015: Expanding the Meta-Generation of Correctness Conditions by Means o...
TMPA-2015: Expanding the Meta-Generation of Correctness Conditions by Means o...
 
TMPA-2015: Information Support System for Autonomous Spacecraft Control Macro...
TMPA-2015: Information Support System for Autonomous Spacecraft Control Macro...TMPA-2015: Information Support System for Autonomous Spacecraft Control Macro...
TMPA-2015: Information Support System for Autonomous Spacecraft Control Macro...
 

Similar to 757

76201960
7620196076201960
76201960IJRAT
 
IRJET - Election Result Prediction using Sentiment Analysis
IRJET - Election Result Prediction using Sentiment AnalysisIRJET - Election Result Prediction using Sentiment Analysis
IRJET - Election Result Prediction using Sentiment AnalysisIRJET Journal
 
DP1_160430723010_Divya.pptx
DP1_160430723010_Divya.pptxDP1_160430723010_Divya.pptx
DP1_160430723010_Divya.pptxDivyaPatel729457
 
Sentiment Analysis and Classification of Tweets using Data Mining
Sentiment Analysis and Classification of Tweets using Data MiningSentiment Analysis and Classification of Tweets using Data Mining
Sentiment Analysis and Classification of Tweets using Data MiningIRJET Journal
 
Automatic Movie Rating By Using Twitter Sentiment Analysis And Monitoring Tool
Automatic Movie Rating By Using Twitter Sentiment Analysis And Monitoring ToolAutomatic Movie Rating By Using Twitter Sentiment Analysis And Monitoring Tool
Automatic Movie Rating By Using Twitter Sentiment Analysis And Monitoring ToolLaurie Smith
 
The Identification of Depressive Moods from Twitter Data by Using Convolution...
The Identification of Depressive Moods from Twitter Data by Using Convolution...The Identification of Depressive Moods from Twitter Data by Using Convolution...
The Identification of Depressive Moods from Twitter Data by Using Convolution...IRJET Journal
 
FRAMEWORK FOR ANALYZING TWITTER TO DETECT COMMUNITY SUSPICIOUS CRIME ACTIVITY
FRAMEWORK FOR ANALYZING TWITTER TO DETECT COMMUNITY SUSPICIOUS CRIME ACTIVITYFRAMEWORK FOR ANALYZING TWITTER TO DETECT COMMUNITY SUSPICIOUS CRIME ACTIVITY
FRAMEWORK FOR ANALYZING TWITTER TO DETECT COMMUNITY SUSPICIOUS CRIME ACTIVITYcscpconf
 
A Review of machine learning approaches to mine Social Choice of voters.
A Review of machine learning approaches to mine Social Choice of voters.A Review of machine learning approaches to mine Social Choice of voters.
A Review of machine learning approaches to mine Social Choice of voters.IRJET Journal
 
SENTIMENT ANALYSIS OF SOCIAL MEDIA DATA USING DEEP LEARNING
SENTIMENT ANALYSIS OF SOCIAL MEDIA DATA USING DEEP LEARNINGSENTIMENT ANALYSIS OF SOCIAL MEDIA DATA USING DEEP LEARNING
SENTIMENT ANALYSIS OF SOCIAL MEDIA DATA USING DEEP LEARNINGIRJET Journal
 
A Survey on Sentiment Analysis and Opinion Mining
A Survey on Sentiment Analysis and Opinion MiningA Survey on Sentiment Analysis and Opinion Mining
A Survey on Sentiment Analysis and Opinion MiningIJSRD
 
A Survey on Sentiment Analysis and Opinion Mining
A Survey on Sentiment Analysis and Opinion MiningA Survey on Sentiment Analysis and Opinion Mining
A Survey on Sentiment Analysis and Opinion MiningIJSRD
 
A HYBRID CLASSIFICATION ALGORITHM TO CLASSIFY ENGINEERING STUDENTS’ PROBLEMS ...
A HYBRID CLASSIFICATION ALGORITHM TO CLASSIFY ENGINEERING STUDENTS’ PROBLEMS ...A HYBRID CLASSIFICATION ALGORITHM TO CLASSIFY ENGINEERING STUDENTS’ PROBLEMS ...
A HYBRID CLASSIFICATION ALGORITHM TO CLASSIFY ENGINEERING STUDENTS’ PROBLEMS ...IJDKP
 
IRJET- A Real-Time Twitter Sentiment Analysis and Visualization System: Twisent
IRJET- A Real-Time Twitter Sentiment Analysis and Visualization System: TwisentIRJET- A Real-Time Twitter Sentiment Analysis and Visualization System: Twisent
IRJET- A Real-Time Twitter Sentiment Analysis and Visualization System: TwisentIRJET Journal
 
IRJET- Effective Countering of Communal Hatred During Disaster Events in Soci...
IRJET- Effective Countering of Communal Hatred During Disaster Events in Soci...IRJET- Effective Countering of Communal Hatred During Disaster Events in Soci...
IRJET- Effective Countering of Communal Hatred During Disaster Events in Soci...IRJET Journal
 
IRJET- Improved Real-Time Twitter Sentiment Analysis using ML & Word2Vec
IRJET-  	  Improved Real-Time Twitter Sentiment Analysis using ML & Word2VecIRJET-  	  Improved Real-Time Twitter Sentiment Analysis using ML & Word2Vec
IRJET- Improved Real-Time Twitter Sentiment Analysis using ML & Word2VecIRJET Journal
 
Naïve Bayes and J48 Classification Algorithms on Swahili Tweets: Performance ...
Naïve Bayes and J48 Classification Algorithms on Swahili Tweets: Performance ...Naïve Bayes and J48 Classification Algorithms on Swahili Tweets: Performance ...
Naïve Bayes and J48 Classification Algorithms on Swahili Tweets: Performance ...IJCSIS Research Publications
 
INFORMATION RETRIEVAL TOPICS IN TWITTER USING WEIGHTED PREDICTION NETWORK
INFORMATION RETRIEVAL TOPICS IN TWITTER USING WEIGHTED PREDICTION NETWORKINFORMATION RETRIEVAL TOPICS IN TWITTER USING WEIGHTED PREDICTION NETWORK
INFORMATION RETRIEVAL TOPICS IN TWITTER USING WEIGHTED PREDICTION NETWORKIAEME Publication
 
Experimental of vectorizer and classifier for scrapped social media data
Experimental of vectorizer and classifier for scrapped social media dataExperimental of vectorizer and classifier for scrapped social media data
Experimental of vectorizer and classifier for scrapped social media dataTELKOMNIKA JOURNAL
 

Similar to 757 (20)

76201960
7620196076201960
76201960
 
IRJET - Election Result Prediction using Sentiment Analysis
IRJET - Election Result Prediction using Sentiment AnalysisIRJET - Election Result Prediction using Sentiment Analysis
IRJET - Election Result Prediction using Sentiment Analysis
 
F017433947
F017433947F017433947
F017433947
 
DP1_160430723010_Divya.pptx
DP1_160430723010_Divya.pptxDP1_160430723010_Divya.pptx
DP1_160430723010_Divya.pptx
 
[IJET V2I4P9] Authors: Praveen Jayasankar , Prashanth Jayaraman ,Rachel Hannah
[IJET V2I4P9] Authors: Praveen Jayasankar , Prashanth Jayaraman ,Rachel Hannah[IJET V2I4P9] Authors: Praveen Jayasankar , Prashanth Jayaraman ,Rachel Hannah
[IJET V2I4P9] Authors: Praveen Jayasankar , Prashanth Jayaraman ,Rachel Hannah
 
Sentiment Analysis and Classification of Tweets using Data Mining
Sentiment Analysis and Classification of Tweets using Data MiningSentiment Analysis and Classification of Tweets using Data Mining
Sentiment Analysis and Classification of Tweets using Data Mining
 
Automatic Movie Rating By Using Twitter Sentiment Analysis And Monitoring Tool
Automatic Movie Rating By Using Twitter Sentiment Analysis And Monitoring ToolAutomatic Movie Rating By Using Twitter Sentiment Analysis And Monitoring Tool
Automatic Movie Rating By Using Twitter Sentiment Analysis And Monitoring Tool
 
The Identification of Depressive Moods from Twitter Data by Using Convolution...
The Identification of Depressive Moods from Twitter Data by Using Convolution...The Identification of Depressive Moods from Twitter Data by Using Convolution...
The Identification of Depressive Moods from Twitter Data by Using Convolution...
 
FRAMEWORK FOR ANALYZING TWITTER TO DETECT COMMUNITY SUSPICIOUS CRIME ACTIVITY
FRAMEWORK FOR ANALYZING TWITTER TO DETECT COMMUNITY SUSPICIOUS CRIME ACTIVITYFRAMEWORK FOR ANALYZING TWITTER TO DETECT COMMUNITY SUSPICIOUS CRIME ACTIVITY
FRAMEWORK FOR ANALYZING TWITTER TO DETECT COMMUNITY SUSPICIOUS CRIME ACTIVITY
 
A Review of machine learning approaches to mine Social Choice of voters.
A Review of machine learning approaches to mine Social Choice of voters.A Review of machine learning approaches to mine Social Choice of voters.
A Review of machine learning approaches to mine Social Choice of voters.
 
SENTIMENT ANALYSIS OF SOCIAL MEDIA DATA USING DEEP LEARNING
SENTIMENT ANALYSIS OF SOCIAL MEDIA DATA USING DEEP LEARNINGSENTIMENT ANALYSIS OF SOCIAL MEDIA DATA USING DEEP LEARNING
SENTIMENT ANALYSIS OF SOCIAL MEDIA DATA USING DEEP LEARNING
 
A Survey on Sentiment Analysis and Opinion Mining
A Survey on Sentiment Analysis and Opinion MiningA Survey on Sentiment Analysis and Opinion Mining
A Survey on Sentiment Analysis and Opinion Mining
 
A Survey on Sentiment Analysis and Opinion Mining
A Survey on Sentiment Analysis and Opinion MiningA Survey on Sentiment Analysis and Opinion Mining
A Survey on Sentiment Analysis and Opinion Mining
 
A HYBRID CLASSIFICATION ALGORITHM TO CLASSIFY ENGINEERING STUDENTS’ PROBLEMS ...
A HYBRID CLASSIFICATION ALGORITHM TO CLASSIFY ENGINEERING STUDENTS’ PROBLEMS ...A HYBRID CLASSIFICATION ALGORITHM TO CLASSIFY ENGINEERING STUDENTS’ PROBLEMS ...
A HYBRID CLASSIFICATION ALGORITHM TO CLASSIFY ENGINEERING STUDENTS’ PROBLEMS ...
 
IRJET- A Real-Time Twitter Sentiment Analysis and Visualization System: Twisent
IRJET- A Real-Time Twitter Sentiment Analysis and Visualization System: TwisentIRJET- A Real-Time Twitter Sentiment Analysis and Visualization System: Twisent
IRJET- A Real-Time Twitter Sentiment Analysis and Visualization System: Twisent
 
IRJET- Effective Countering of Communal Hatred During Disaster Events in Soci...
IRJET- Effective Countering of Communal Hatred During Disaster Events in Soci...IRJET- Effective Countering of Communal Hatred During Disaster Events in Soci...
IRJET- Effective Countering of Communal Hatred During Disaster Events in Soci...
 
IRJET- Improved Real-Time Twitter Sentiment Analysis using ML & Word2Vec
IRJET-  	  Improved Real-Time Twitter Sentiment Analysis using ML & Word2VecIRJET-  	  Improved Real-Time Twitter Sentiment Analysis using ML & Word2Vec
IRJET- Improved Real-Time Twitter Sentiment Analysis using ML & Word2Vec
 
Naïve Bayes and J48 Classification Algorithms on Swahili Tweets: Performance ...
Naïve Bayes and J48 Classification Algorithms on Swahili Tweets: Performance ...Naïve Bayes and J48 Classification Algorithms on Swahili Tweets: Performance ...
Naïve Bayes and J48 Classification Algorithms on Swahili Tweets: Performance ...
 
INFORMATION RETRIEVAL TOPICS IN TWITTER USING WEIGHTED PREDICTION NETWORK
INFORMATION RETRIEVAL TOPICS IN TWITTER USING WEIGHTED PREDICTION NETWORKINFORMATION RETRIEVAL TOPICS IN TWITTER USING WEIGHTED PREDICTION NETWORK
INFORMATION RETRIEVAL TOPICS IN TWITTER USING WEIGHTED PREDICTION NETWORK
 
Experimental of vectorizer and classifier for scrapped social media data
Experimental of vectorizer and classifier for scrapped social media dataExperimental of vectorizer and classifier for scrapped social media data
Experimental of vectorizer and classifier for scrapped social media data
 

757

  • 1. Sentiments Analysis Of Twitter Data Using Data Mining Anurag P. Jain Dept. of Information Technology Pimpri Chinchwad College of Engineering Pune, India Email:anuragpjain7@gmail.com Mr. Vijay D. Katkar Dept. of Information Technology Pimpri Chinchwad College of Engineering Pune, India Email:katkarvijayd@gmail.com Abstract—With rapid growth in user of Social Media in recent years, the researcher get attracted towords the use of social media data for sentiments analysis of people or particular product or person or event. Twitter is one of the widely used social media platform to express the thoughts. This Paper presents approch for analysing the sentiments of users using data mining classifiers. It also compares the performance of single classifiers for sentiments analysis over ensemble of classifier. Expermental results obtained demonstrates that k-nearest neighbour classifier gives very high predictive accuracy. Result also demonstrate that single classifiers outperforms ensemble of classifier approch. keywords- Sentiments analysis, Twitter, k-nearest neigh- bour, Random Forest, Naive Naive Baysin, Bays Net, ensem- ble of classifiers, openion mining I. INTRODUCTION From the past few years, social media plays a vital role in modern life. Numbers of users of social media goes on incresing day by day. Users Post their view, thoughts, life events on social media and that too without any restriction and hesitation. Some of the social media allow users to interact with only with their freinds and sharing their post with very easy level of privacy. Due to simple and easy privacy policies,and easy accessibilty of a some social media, users are migrated from traditional means of communication such as blogs or mailing list to microblogging site such as Twitter, Facebook etc. Billions of text data in the form of messages on social media make it very facinating medium for data analysis for the researchers. Sentiments analysis is a method of computing and stratifying a view of a person given in a piece of a text, especially in order to identify persons thinking towards a specific topic product etc..is positive or negative or neutral. Social media portals have been globally used for expressing a sentiments publicly through text based message and images [1] . Currently, Twitter, facebook linkedIn, flickr etc enables users to post their view publicly. Among all social media Twitter widely get used for This work is partially funded by BCUD, Savitribai Phule Pune university 978-1-4673-7758-4/15/$31.00 c 2015 IEEE expressing view on certain topic. Twitter allow tweeple(i.e users of twitter) to post their opinion on politics environment,entertainment,industry,stock market etc. Twitter is Social Networking website that allow users to send and read 140 character messages called tweets. Users of Twitter can read and post tweets. According to Statista survey there are 304 million monthly active users of Twitter. Approximately 500 million tweets get tweeted per day on twitter. Huge amount of valuable data resides on twitter which gives opprunity to many researcher to dig up into tweets and make prediction about event, product, industry stock market etc by using various classification methods. The availability of huge amount of data has drawn attention to many researcher on researching twitter data statistically or more specific sense scientifically and meaningfully. Rest of the paper organized as follow: Section 2 describe about related work; Section 3 describes proposed method- ology; Section 4 presents Results and Analysis and final conlusion detailed in section 5 II. RELATED WORKS Many researcher carried out their research work in sentiments analysis using social media. Several reseacher have emphsize their attention on stastical results from social media using various sentiments analysis methods. Malhar Anjaria et.al. [1] introduce the novel approach of exploiting the user influence factor in order to predict the outcome of an election result. Athours also propose a hybrid approach of extracting opinion using direct and indirect features of Twitter data based on Support Vector Machines (SVM), Naive Bayes, Maximum Entropy and Artificial Neural Networks based supervised classifiers. Min Song et.al. [2] employ temporal Latent Dirichlet Allocation (LDA) to analyze and validate the relationship between topics extracted from tweets and related events. They developed the term cooccurrence retrieval technique to trace chronologically cooccurring terms and thereby compensate for LDAs limitations. Finally, authors identify thematic coherence
  • 2. among users identified in sending receiving mentions. Li Bing et. al. [3] proposed a method to mine Twitter data for prediction of the movements of the stock price of a particular company through public sentiments. Authors also explain how stock price of one company to be more predictable than that of another company and they proposed to used a data mining algorithm to determine the stock price movements of 30 companies listed in NASDAQ and the New York Stock Exchange can actually be predicted by the given 15 million records of tweets (i.e., Twitter messages). They did so by extracting ambiguous textual tweet data through NLP techniques to define public sentiment, then make use of a data mining technique to discover patterns between public sentiment and real stock price movements. Khoshgoftaar et. al. [4] explored techniques to apply data mining towards the goal of identifying those who score in the top 1.4% of a well-known psychopathy metric using information available from their Twitter accounts. Authors apply a newly-proposed form of ensemble learning, SelectRUSBoost (which adds feature selection to their earlier imbalance-aware ensemble in order to resolve highdimensionality), they also employed four classification learners, and use four feature selection techniques. Mahmood et. al. [5] Analyzed the impact of tweets in predicting the winner of the recent 2013 election held in Pakistan. Author used Rapid miner as data mining tool. For perforance analysis they used CHAID decision tree(Chisquared Automatic Interaction Detector), Nave Bayes and Support Vector Machine (SVM) algorithms. Author collected a tweets by using twittermachine. Tumitan, D. et. al. [6] investigate whether it is possible to predict variations in vote intention based on sentiment time series extracted from news comments, using three Brazilian elections as case study. They Mainly emphasize on an approach to predict polls vote intention variations that is adequate for scenarios of sparse data. Authors developed experiments to assess the influence on the forecasting accuracy of the proposed features, and their respective preparation. V.S. subrahmanian et. al. [7] used Sentiment Diffusion Forecasting by using dataset of 23 million tweets from over 16 million Twitter users. Reaserchers obtain the Sentiment Scores of tweets by adaptation of the AVA (adjective-verbadverb) sentiment analysis algorithm. NaiveBays :- Naive Baysin is used by many researchers [8,9,10] to detect the sentiments of tweet. It works using probalistic model given below p(Ck | z1....zn) (1) For each of k possible outcomes or classes.But if number feature is lagre that is value of n is large then above formula is not work well . Because probability tables become too large and infeasible to handle. Therefore Bays theorem is used, which decomposed the conditional probability as p(Ck|z) = p(Ck) p(z|Ck) p(z) . (2) Where Ck is class for each of k possible outcomes. And z are the instances to be classified . A Bayesian network is also widely get used classi- fier [11,112,13] in opinion mining or sentiments analysis . Bayesian network work on the principle of joint probability distribution function. Let pair (G,CPD) encodes joint prob- ability distribution p(X1,X2........Xn) where X(X1,X2,.....,Xn) discrete random variables.A unique joint probability distribu- tion X over G from is factorized as: p(X1, X2........Xn) = (p(Xi | Pa(Xi))) (3) . Many researcher [14,15,16] also used Randome Forest for detection of a sentiments od tweets. Random forests are an ensemble learning method for classification. Randaom forest are combination of trees. In more specific way, Random forest classifier defined as learning ensemble based on bagging of un-pruned decision tree learners wih randomised seletion of features at each split. knearest neighbour classifier is also a widely used clas- sifier in tweets classification. Many researcher [17,18,19] used knearest neighbour for classification of tweets .knear- est neighbour is simplest algorithm.knearest neighbour(KNN) is instance based learning algorithm where all computation delayed till classification. III. PROPOSED METHODOLOGY This paper presents a mechanism to predict the overall sentiments inclination of indian people towords political sit- uation and issue. Figure 1 shows the proposed system flow. Raw training tweets are collected by using Twitter API v 1.1. After collecting a raw tweets various preprocessing methode get applied to clean the data. Same methodes are applied for collecting and cleaning raw tweets for preparing test- ing dataset. After preparation of training and testing dataset various classifiers get applied to analyze the performance of classifiers . Following subsection explains the each figure 1 block in detail. A. Data Collection Traning and testing tweets collected from twitter by using twitter searched API v 1.1 for various political leaders and parties in india. Harvested tweets is of only a english language tweets.
  • 3. Fig. 1. Figure 1:Methodology. B. Preprocessing Tweets are sometime not in the usable format. To get a tweets in usable format various preprocessing methode for cleaning a tweets get applied. All userID,twitterId, userinfo from the tweets is removed. All special character and hy- perlinks is removed from the tweets. Duplicate tweets also get removed from traininig dataset. Traning data set does not contain retweets. After applying all cleaning methodes text that is only with the tweeted text is remained. C. Training DataSet and testing dataset For classifying a dataset, three classes, Positive, Negative, Nuetral are used. To classify tweets in to these category SentiWordNet 3.0.0. dictionary is used. SentiWordNet 3.0.0. dictionary contains the 117659 word. Each words is assgin with its positive negative polarity. If positive and negative polarity of word is 0 then word considerd as neutral. Basically single tweet is spilted into words, after splitting, polarity of words is calculated from SentiWordNet 3.0.0. After deciding a polarity of each word all positive and negative words polarity get added separatly. And then comparision between positive words polarity with negative words polarity get done. If sentense having more positive polarity then classify sentense( tweet) as positive polarity. Same for nagative and neutral polarity. Duplicates tweets were not considerd in a training dataset. IV. RESULT ANALYSIS AND DISCUSSION 2,102,52 tweets were collected about various political lead- ers and parties. Data cleaning process leaves us with 35% of original tweets.2 lack tweets collcted for various political lead- ers. Experiment performed with following classifier: classifier. 1)k-nearest neighbour 2)Random Forest. 3)Naive Baysin 4)Baysnet. Table 1 shows the prediction accuracy of all classifiers when stopwords are not removed from traning and testing tweets TABLE I ACCURACIES FOR MACHINE LEARNER Algorithm Accuracy k-nearest neighbour 99.6456% RandomForest 99.0373% BaysNet 75.0695% NaivBays 60.3159 k-nearest neighbour and random forest gives much better accuracy compred to naive baysin and bays Net. k-nearest neighbour give a highest accuracy.Prediction accuracy of k-nearest neighbour 99.6456%. Table 2 shows the prediction accuracy of all classifiers when stopwords are removed from traning and testing tweets. k-nearest neighbour gives better accuracy compred to all three classifier. Prediction accuracy of k-nearest neighbour when stopwords are removed is 96.6398% TABLE II ACCURACY OF MACHINE LEARNER WITHOUT STOPWORDS Algorithm Accuracy k-nearest neighbour 96.6398% RandomForest 65.6681% BaysNet 48.9579% NaivBays 60.3159 Table 3 shows prediction accuracy when ensemble of classifier are used to classify tweets without stopwords removel.Ensemble of random forest , Baysnet and k-nearest neighbour gives a higher accuracy that is 99.2400% TABLE III ACCURACY OF ENSEMBLE OF CLASSIFIER WITH STOPWORDS Algorithm Accuracy Naive baysin Baysnet RandomForest 81.8989% Naive Bayesian BayesNet KNN 82.0705% RandomForest BayesNet KNN 99.2400% Naive Bayesian BayesNet RandomForest KNN 91.8448%
  • 4. Table 4 shows prediction accuracy when ensemble of classi- fier are used to classify tweets without stopwords . Ensemble of random forest , Baysnet and k-nearest neighbour gives a higher accuracy that is 91.0418% TABLE IV ACCURACY OF ENSEMBLE OF CLASSIFIER WITHOUT STOPWORDS Algorithm Accuracy Naive baysin Baysnet RandomForest 73.6148% Naive Bayesian BayesNet KNN 73.8515% RandomForest BayesNet KNN 91.0418% Naive Bayesian BayesNet RandomForest KNN 87.0895% V. CONCLUSION It can be observed from the experimental results that data mining classifiers is a good choice for sentiments pre- diction using tweeter data. In a experimentation, knearest neighbour(IBK) outperforms over all three classifier namely RandomForest, baysNet, Naive Baysein. RandomForest also gives good prediction accuracy. There is a no need to use of ensemble of classifier for sentiments predictions of tweets as single classifier ( i.e knearest neighbour) gives a better accuracy over all combinations of ensemble of classifier . REFERENCES [1] Malhar Anjaria, Ram Mahana Reddy Guddeti, ”Influence Factor Based Opinion Mining of Twitter Data Using Supervised Learning”, Sixth International Conference on Communication Systems and Networks (COMSNETS), 2014 IEEE [2] Min SongMeen Chul Kim , Yoo Kyung Jeong,”Analyzing the Political Landscape of 2012 Korean Presidential Election in Twitter ”,Intelligent Systems, IEEE (Volume:29 , Issue: 2 ) ,2014 IEEE. [3] Li Bing ,Chan, K.C.C. , Ou, C. ,”Public Sentiment Analysis in Twitter Data for Prediction of A Companys Stock Price Movements”,11th Inter- national Conference on e-Business Engineering (ICEBE), 2014 IEEE. [4] Wald, R. , Khoshgoftaar, T.M., Napolitano, A.Sumner, C.,”Using Twitter Content to Predict Psychopathy”,11th International Conference on Ma- chine Learning and Applications 2014. [5] Mahmood, T.,Iqbal, T. ; Amin, F. ; Lohanna, W.; Mustafa, A.,Mining Twitter big data to predict 2013 Pakistan election winner,16th Interna- tional Multi Topic Conference (INMIC),2013 IEEE. [6] Tumitan, D. ,Becker, K.,”Sentiment-Based Features for Predicting Elec- tion Polls: A Case Study on the Brazilian Scenario”,Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on (Volume:2 ) 2015. [7] Vadim Kagan and Andrew Stevens, V.S. Subrahmanian.”Using Twitter Sentiment to Forecast the 2013 Pakistani Election and the 2014 Indian Election” ,Intelligent Systems, IEEE (Volume:30 , Issue: 1, ) 2015 IEEE [8] ,Bo Pang. Lilliam Lee, ”Seeing Stars: Exploiting class relationships fpr sentiment categorization with respect to rating scales”, 2002. [9] Cozma, R., and Chen, K., ”Congressional Candidates” Use of Twitter During the 2010 Midterm Elections: A Wasted Opportunity?” 6lst Annual Conference of the International Communication Association, 2011. [10] Pew Research Center, ”Parsing Election Day Media: How the Midterms Message Varied by Platform”, Pew, 2010. [11] S. Meganck, P. Leray, B. Manderick, Learning Causal Bayesian Net- works from Observations and Experiments: A Decision Theoretic Ap- proach, Proceedings of Modelling Decisions in Artificial Intelligence (MDAI 2006), LNAI 3885, 2006, pp. 58-69. [12] G. Li, T.-Y. Leong, A framework to learn Bayesian Networks from changing, multiple-source biomedical data, Proceedings of the 2005 AAAI Spring Symposium on Challenges to Decision Support in a Changing World, Stanford University, CA, USA, 2005, pp. 66-72. [13] R.E. Neapolitan, Learning Bayesian Networks, Prentice Hall, 2004. [14] http://www.dabi.temple.edu/∼hbling/8590.002/Montillo RandomForests 4-2-2009.pdf [15] Gokulakrishnan, B,Priyanthan, P., Ragavan, T,”Opinion mining and sentiment analysis on a Twitter data stream”, Advances in ICT for Emerging Regions (ICTer), 2012 International Conference on,2012 IEEE. [16] Ndia F.F. da Silva, Eduardo R. Hruschkaa,”Tweet sentiment analysis with classifier ensembles”,Decision Support Systems,Volume 66, October 2014, Pages 170179. [17] http://www.math.le.ac.uk/people/ag153/homepage/KNN/OliverKNN Talk.pdf [18] http://www.math.le.ac.uk/people/ag153/homepage/KNN/KNN3.html.