2. CONTENTS:
Abstract
Existing System
Disadvantages of Existing System
Proposed System
Advantages of Proposed System
Modules
Software & Hardware Requirements
3. ABSTRACT:
Sentiment analysis deals with identifying and classifying
opinions or sentiments expressed in source text. Social media
generates a vast amount of sentiment-rich data in the form of
tweets, status updates, blog posts, etc. Sentiment analysis of this
user-generated data is very useful for gauging the opinion of the
crowd.
Twitter sentiment analysis is more difficult than general
sentiment analysis because of the presence of slang words and
emoticons. Tweets are also short: the maximum number of characters
allowed in a tweet is 140.
4. EXISTING SYSTEM :
The existing system uses a knowledge-base approach to classify
tweets as positive, negative, or neutral. However, this method
results in lower classification accuracy.
DISADVANTAGES OF EXISTING SYSTEM:
The existing system employs a lexicon-based method to compute the
sentiment of the data coming from Twitter, which results in a lower
accuracy rate.
There is also considerable overhead in computing the sentiment of a
sentence, because for each word this method retrieves the sentiment
from a predefined word dictionary (generally SentiWord).
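The per-word dictionary lookup described above can be sketched as follows.
This is only an illustration: the tiny LEXICON table and the function name
lexicon_sentiment are hypothetical stand-ins, not the dictionary or code of
the existing system.

```python
import re

# Hypothetical miniature lexicon (an assumption for illustration only);
# a real system would load a full sentiment dictionary instead.
LEXICON = {
    "good": 1.0, "great": 1.0, "love": 1.0,
    "bad": -1.0, "poor": -1.0, "hate": -1.0,
}


def lexicon_sentiment(text, lexicon=LEXICON):
    """Classify text by summing the scores of its words in the lexicon."""
    words = re.findall(r"[a-z']+", text.lower())
    # One dictionary lookup per word; unknown words contribute nothing.
    score = sum(lexicon.get(word, 0.0) for word in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

In this sketch, a phrase such as "not bad at all" is scored as negative
because each word is looked up in isolation, which hints at why a purely
lexicon-based method can lose accuracy.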
5. PROPOSED SYSTEM :
In the proposed system, we analyze the sentiment of Twitter posts about
electronic products such as mobiles and laptops using a data mining approach.
By performing sentiment analysis in a specific domain, it is possible to
identify the effect of domain information on sentiment classification.
The proposed system also includes a comparative study of two algorithms for
finding the sentiment: the Naïve Bayes method and the Support Vector
Machine (SVM).
6. ADVANTAGES OF PROPOSED SYSTEM :
The proposed system uses data mining techniques, which increased the
accuracy rate of finding the sentiment of the data.
Because the sentiment of each word is no longer looked up in a predefined
dataset, the overhead on the algorithms is reduced drastically, which
directly increases efficiency.
We use a WordCloud and a Pie Chart to represent the final sentiment
visually, which helps the user to grasp the sentiment more easily.
7. MODULES:
1. Training and Testing Data Collection.
2. Data preprocessing and feature extraction.
3. Training and Testing of Algorithms (Compare Results).
4. Download and Preprocessing of Tweets from Twitter.
5. Discovery of Sentiment from Tweets.
8. DATA COLLECTION:
To perform sentiment analysis of tweets, we have to collect the
largest dataset possible.
We have collected data from different datasets (the SNAP platform
by Stanford University, Amazon’s user reviews).
We have to bring those datasets into the desired format and
assign a sentiment to each tuple. We have denoted the tuples as
Positive, Negative, and Neutral in the following format:
Positive Review : 4
Neutral Review : 2
Negative Review : 0
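A minimal sketch of how raw reviews could be brought into this labelled
format is shown below. The numeric labels are the ones listed above; the
star-rating thresholds and the function names rating_to_label and to_tuples
are assumptions made for illustration, since the slides do not say how
sentiment was assigned.

```python
# Numeric labels as listed in the format above.
LABELS = {"positive": 4, "neutral": 2, "negative": 0}


def rating_to_label(stars):
    """Map a 1-5 star rating to a label (the thresholds are an assumption)."""
    if stars >= 4:
        return LABELS["positive"]
    if stars <= 2:
        return LABELS["negative"]
    return LABELS["neutral"]


def to_tuples(reviews):
    """Turn (text, stars) pairs into (label, text) training tuples."""
    return [(rating_to_label(stars), text) for text, stars in reviews]


sample = [("Great laptop", 5), ("It works", 3), ("Broke in a week", 1)]
tuples = to_tuples(sample)
```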
9. DATA PREPROCESSING AND FEATURE EXTRACTION:
The first step after preparing the dataset is to preprocess it, because we
need to extract the important features and remove the unwanted
information from the dataset.
Preprocessing of the dataset involves the following steps:
1) Removal of URLs:
Twitter data consists of different types of information. A link
posted by a user is of no use for sentiment analysis.
Therefore, URLs should be removed from the tweet.
2) Removal of special symbols:
Users employ various symbols, such as the comma (,) and the
full stop (.), which do not carry sentiment. Therefore, special
symbols should be removed from the tweet.
3) Converting emoticons:
It shows the various emoticons used for conversion. Nowadays
emoticons have become a way for users to express their views,
feelings, and emotions. Emotions play a big role in sentiment analysis.
Therefore, each emoticon is converted into its equivalent word so
that the analysis can be done efficiently.
4) Removal of usernames:
Every Twitter user has a unique username, so anything directed at
a user can be indicated by writing their username preceded by @,
for example, @username. Such mentions are treated as proper nouns.
They also have to be removed for effective analysis.
5) Removal of hashtags:
A hashtag is a word or phrase prefixed with the hash symbol (#).
Hashtags are used for naming subjects or phrases that are currently
trending, for example, #google, #twitter.
6) Removal of additional white spaces:
The data may contain extra white space, which needs to be removed.
Removing it allows the analysis to be done more efficiently.
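The preprocessing steps above can be sketched with Python's standard re
module. This is a minimal illustration, not the project's actual code: the
EMOTICONS table is a small hypothetical sample, the function name preprocess
is invented here, and the sketch assumes that a hashtag is dropped as a
whole token. Emoticons are converted before special symbols are stripped,
since otherwise the symbols that form them would already be gone.

```python
import re

# Small illustrative emoticon table (an assumption; extend as needed).
EMOTICONS = {
    ":)": "happy", ":-)": "happy", ":D": "laugh",
    ":(": "sad", ":-(": "sad", ";)": "wink",
}

URL_RE = re.compile(r"(?:https?://|www\.)\S+")   # links
MENTION_RE = re.compile(r"@\w+")                 # @username
HASHTAG_RE = re.compile(r"#\w+")                 # #topic
SYMBOL_RE = re.compile(r"[^\w\s]")               # punctuation and symbols
SPACE_RE = re.compile(r"\s+")                    # runs of white space


def preprocess(tweet):
    """Apply the cleaning steps to one tweet and return the cleaned text."""
    text = URL_RE.sub(" ", tweet)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    # Replace stand-alone emoticons with their word equivalents.
    text = " ".join(EMOTICONS.get(token, token) for token in text.split())
    text = SYMBOL_RE.sub(" ", text)
    # Collapse the gaps left by the removals into single spaces.
    return SPACE_RE.sub(" ", text).strip()


cleaned = preprocess("@john I love my new phone :) #google http://t.co/abc  !!")
```

Note that this sketch only converts emoticons that stand alone as a token;
an emoticon glued to a word would be stripped as ordinary symbols.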
13. TRAINING AND TESTING ALGORITHM:
After preprocessing the training and test datasets, we provide
this data to train the algorithm.
In this step, we used two data mining algorithms, namely the
Naïve Bayesian algorithm and the Support Vector Machine (SVM).
In our experiments, SVM outperformed the Naïve Bayesian algorithm
in every test, so we implemented SVM in the further project work.
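A comparison of this kind could be set up as sketched below. The slides do
not name the library or the features used, so scikit-learn, bag-of-words
counts (CountVectorizer), MultinomialNB, and LinearSVC are all assumptions,
and the toy data is invented purely to make the sketch runnable. It says
nothing about which algorithm wins on real data.

```python
# Assumes scikit-learn is installed; the slides do not name a library.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def compare_classifiers(train_texts, train_labels, test_texts, test_labels):
    """Train both models on the same data and return test accuracy per model."""
    models = {
        "naive_bayes": make_pipeline(CountVectorizer(), MultinomialNB()),
        "svm": make_pipeline(CountVectorizer(), LinearSVC()),
    }
    scores = {}
    for name, model in models.items():
        model.fit(train_texts, train_labels)
        predictions = model.predict(test_texts)
        scores[name] = accuracy_score(test_labels, predictions)
    return scores


# Invented toy data using the 4 / 0 labels from the data collection step.
train_texts = [
    "great phone love it", "excellent laptop great screen",
    "terrible battery hate it", "awful laptop poor screen",
]
train_labels = [4, 4, 0, 0]
test_texts = ["love this great laptop", "hate this awful phone"]
test_labels = [4, 0]

scores = compare_classifiers(train_texts, train_labels, test_texts, test_labels)
print(scores)
```

Wrapping the vectorizer and the classifier in one pipeline keeps the
comparison fair: both models see exactly the same features.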
15. DOWNLOAD OF TWEETS FROM TWITTER:
Tweets for a specific keyword can be downloaded using the Tweepy
library, a Python client for the Twitter API.
Tweepy communicates directly with the data source once the required
authentication keys and tokens have been provided.
After a successful handshake between our source code and the Twitter
API, we can download the user tweets. We need to save these tweets so
that we can perform sentiment analysis on them.
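The download step might look like the sketch below. It assumes Tweepy 4.x
and its Client.search_recent_tweets call with a bearer token; the slides do
not say which Tweepy version or endpoint was used. The environment variable
name TWITTER_BEARER_TOKEN and the function names are hypothetical, and
access to the search endpoint depends on the API plan of the account.

```python
import csv
import os


def make_client():
    """Build a Tweepy client from a bearer token kept in the environment."""
    import tweepy  # imported here so the helpers below work without Tweepy

    # TWITTER_BEARER_TOKEN is a hypothetical variable name.
    return tweepy.Client(bearer_token=os.environ["TWITTER_BEARER_TOKEN"])


def fetch_tweets(client, keyword, max_results=100):
    """Return the text of recent English tweets that mention the keyword."""
    # The API accepts max_results between 10 and 100 for this endpoint.
    response = client.search_recent_tweets(
        query=f"{keyword} lang:en -is:retweet", max_results=max_results
    )
    # response.data is None when nothing matched.
    return [tweet.text for tweet in (response.data or [])]


def save_tweets(texts, path):
    """Write one tweet per row so the classifier can read them later."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["text"])
        writer.writerows([text] for text in texts)
```

A typical call would be save_tweets(fetch_tweets(make_client(), "laptop"),
"tweets.csv"), which needs valid credentials and network access.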
16. DISCOVERY OF SENTIMENT FROM TWEETS:
After downloading the tweets for a specific keyword,
the next step is to input the downloaded data to the
Support Vector Machine (SVM).
The SVM then classifies the sentiment, and the result
is presented in the form of a Pie Chart and a Word Cloud.
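The final reporting step could be sketched as follows, assuming the
matplotlib and wordcloud packages (the slides name the chart types but not
the libraries). The function names, the output file name, and the use of
the 4 / 2 / 0 labels as classifier output are assumptions for illustration.

```python
from collections import Counter

# Numeric labels from the data collection step, mapped to display names.
LABEL_NAMES = {4: "Positive", 2: "Neutral", 0: "Negative"}


def sentiment_counts(predictions):
    """Count predictions per sentiment, leaving out classes with no tweets."""
    counts = Counter(LABEL_NAMES[label] for label in predictions)
    return {name: counts[name] for name in LABEL_NAMES.values() if counts[name]}


def plot_results(predictions, tweets, path="sentiment.png"):
    """Save a pie chart of the predictions next to a word cloud of the tweets."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    counts = sentiment_counts(predictions)
    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate(" ".join(tweets))

    fig, (pie_ax, cloud_ax) = plt.subplots(1, 2, figsize=(12, 5))
    pie_ax.pie(list(counts.values()), labels=list(counts.keys()),
               autopct="%1.1f%%")
    pie_ax.set_title("Sentiment share")
    cloud_ax.imshow(cloud.to_array(), interpolation="bilinear")
    cloud_ax.axis("off")
    fig.savefig(path, bbox_inches="tight")
    plt.close(fig)
    return path
```

Empty classes are dropped from the counts so that the pie chart does not
show zero-width slices.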