2. Contents
What is sentiment analysis?
Why is sentiment analysis important?
Using Twitter for sentiment analysis.
Extraction of Tweets.
Approach.
Different ways of Classification.
Challenges.
Data collection.
Data pre-processing.
Case diagram for sentiment analysis.
Some results.
Conclusion and future scope.
References.
3. What is sentiment analysis?
It is the classification of the polarity of a given text
at the document, sentence or phrase level.
The goal is to determine whether the opinion
expressed in the text is positive, negative or
neutral. It is also known as Opinion Mining.
4. Why sentiment analysis?
Microblogging has become a popular
communication tool.
The opinion of the masses is important.
• A political party may want to know whether people support
its program or not.
• Before investing in a company, one can use public
sentiment towards the company to find out
where it stands.
• A company might want to find out what people think of its
products.
5. Using Twitter for sentiment analysis :-
Twitter is a microblogging site.
Short text messages of 140 characters.
240+ million active users.
500 million tweets are generated every day.
The Twitter audience ranges from the common man to
celebrities.
Users often discuss current affairs and share
personal views on various subjects.
Tweets are short and hence
tend to be unambiguous.
6. Extraction of Tweets :-
Twitter allows us to mine the data of any user
through the Twitter API, for example with Tweepy. The data will be
tweets extracted from the user's timeline. The first thing to
do is get the consumer key, consumer secret,
access token and access token secret from the Twitter
developer site, where they are available to each registered user. These
keys are used by the API for authentication.
Tweepy :- Tweepy is a Python library that
can be installed using pip. In order to
authorize our app to access Twitter on our behalf,
we need to use the OAuth interface. Tweepy
provides the convenient Cursor interface to
iterate through different types of objects. Twitter
allows a maximum of 3200 tweets to be extracted per user.
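The extraction step can be sketched roughly as follows. This is a minimal sketch, assuming Tweepy's v1.1-style interface (OAuthHandler, API, Cursor) and an account that still has access to the user timeline endpoint. The environment variable names, the function names and the screen name in the usage note are hypothetical and not part of the original slides.

```python
import os

# Limit quoted in the slide above; treated here as a simple cap.
MAX_TIMELINE_TWEETS = 3200

# Hypothetical environment variable names for the four keys.
CREDENTIAL_NAMES = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_SECRET",
)


def load_credentials(env=None):
    """Return the four keys as a tuple, or raise KeyError if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in CREDENTIAL_NAMES if not env.get(name)]
    if missing:
        raise KeyError("missing credentials: " + ", ".join(missing))
    return tuple(env[name] for name in CREDENTIAL_NAMES)


def fetch_user_tweets(screen_name, limit=200):
    """Download up to `limit` recent tweets of one user as plain text."""
    import tweepy  # third-party: pip install tweepy

    consumer_key, consumer_secret, access_token, access_secret = load_credentials()

    # OAuth 1.0a user authentication (newer Tweepy releases also expose
    # this handler under the name OAuth1UserHandler).
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    api = tweepy.API(auth, wait_on_rate_limit=True)

    # Cursor takes care of pagination over the user's timeline.
    limit = min(limit, MAX_TIMELINE_TWEETS)
    cursor = tweepy.Cursor(api.user_timeline, screen_name=screen_name, count=200)
    return [status.text for status in cursor.items(limit)]
```

A call such as fetch_user_tweets("some_user", limit=500) would then return a list of tweet texts, provided the four environment variables are set.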
7. Steps to obtain keys :-
Log in to the Twitter developer section.
Go to “Create an App”.
Fill in the details of the application.
Click on “Create your Twitter application”.
Details of the new app will be shown along with the
consumer key and consumer secret.
For the access token, click “Create my access token”.
The page will refresh and generate the access token.
8. You can leave the Callback URL empty. Agree to the Developer
Conditions and select Create App.
We need the secret keys and access tokens for the API to
work. Click on the “Keys and Access Tokens” tab. You will
find the Consumer Key and Consumer Secret there. Note them down.
9. Now, we need to create access tokens for our account. Click
on “Create my access token”.
Then note down the “Access Token” and “Access Token
Secret”.
Now we are ready to retrieve tweets from the Twitter stream.
10. APPROACH :-
Tweet downloader
Pre-processing
Removal of Nouns and Prepositions
Replace Negative Mentions
Feature Extractor
Prediction
11. Different ways of Classification :-
Binary Classification :- It is a two-way categorization, i.e. Positive or
Negative.
3-Tier :- Tweets are categorized as Positive, Negative or Neutral.
5-Tier :- Tweets are categorized into five classes, namely Extremely
Positive, Positive, Neutral, Negative and Extremely Negative.
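The three schemes can be illustrated with a small mapping from a numeric score to a label. This is only a sketch, assuming a sentiment score already scaled to the range -1 to 1. The function name and both thresholds are illustrative assumptions: the 0.05 neutral band mirrors a commonly used convention for lexicon-based compound scores, and the 0.5 cutoff for the "Extremely" classes is arbitrary.

```python
def classify(score, tiers=3, neutral_band=0.05, extreme_band=0.5):
    """Map a score in [-1, 1] to a label under a 2-, 3- or 5-tier scheme."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must lie in [-1, 1]")
    if tiers not in (2, 3, 5):
        raise ValueError("tiers must be 2, 3 or 5")

    # Binary: no neutral class, so zero is arbitrarily counted as positive.
    if tiers == 2:
        return "Positive" if score >= 0 else "Negative"

    # 3-tier and 5-tier: scores close to zero are neutral.
    if -neutral_band < score < neutral_band:
        return "Neutral"

    polarity = "Positive" if score > 0 else "Negative"

    # 5-tier only: strong scores get the "Extremely" prefix.
    if tiers == 5 and abs(score) >= extreme_band:
        return "Extremely " + polarity
    return polarity
```

With these thresholds, a score of 0.69 would be labelled Positive under the 3-tier scheme and Extremely Positive under the 5-tier scheme.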
We will do sentiment analysis using VADER (Valence Aware Dictionary
and sEntiment Reasoner). VADER belongs to a type of sentiment analysis
that is based on lexicons of sentiment-related words. In this approach,
each of the words in the lexicon is rated as to whether it is positive or
negative and, in many cases, how positive or negative. Below is
an excerpt from VADER’s lexicon, where more positive words have higher
positive ratings and more negative words have lower negative ratings.
WORD        SENTIMENT RATING
REJOICED     2.0
INSANE      -1.7
DISASTER    -3.1
GREAT        3.1
12. When VADER analyses a piece of text it checks to see if any of the words in
the text are present in the lexicon.
For example, the sentence “The food is good and the atmosphere is
nice” has two words in the lexicon (good and nice) with ratings of 1.9
and 1.8 respectively.
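The lookup step can be sketched with a toy lexicon. This is a simplified illustration, not VADER's actual code: the dictionary below holds only the six ratings quoted in these slides, the function name is hypothetical, and the real library also applies further rules (for example for negation and intensifiers) that are left out here.

```python
import re

# Toy lexicon built only from the ratings quoted in these slides.
TOY_LEXICON = {
    "good": 1.9,
    "nice": 1.8,
    "great": 3.1,
    "rejoiced": 2.0,
    "insane": -1.7,
    "disaster": -3.1,
}


def rated_words(text, lexicon=TOY_LEXICON):
    """Return (word, rating) pairs for every word of `text` found in the lexicon."""
    words = re.findall(r"[a-z']+", text.lower())
    return [(word, lexicon[word]) for word in words if word in lexicon]


print(rated_words("The food is good and the atmosphere is nice"))
```

For the example sentence this prints the two matches, good with 1.9 and nice with 1.8.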
VADER produces four sentiment metrics from the word ratings. The first
three, positive, neutral and negative, represent the proportion of the text
that falls into each category. Our example sentence was rated 45%
positive, 55% neutral and 0% negative. The final metric, the compound score,
is the sum of all the lexicon ratings (1.9 and 1.8), normalized
to range between -1 and 1.
Our example sentence has a compound score of 0.69, which is fairly strongly positive.
Sentiment Metric Value
Positive 0.45
Neutral 0.55
Negative 0.00
Compound 0.69
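The compound value in the table can be reproduced by hand. The sketch below assumes the normalization used in VADER's open-source Python implementation, x / sqrt(x^2 + alpha) with alpha = 15. The function names are hypothetical, and the second function assumes the third-party vaderSentiment package is installed.

```python
import math


def normalize(raw_sum, alpha=15.0):
    """Squash an unbounded sum of ratings into the open interval (-1, 1)."""
    return raw_sum / math.sqrt(raw_sum * raw_sum + alpha)


def vader_scores(text):
    """Return the neg/neu/pos/compound dictionary computed by vaderSentiment."""
    # Third-party: pip install vaderSentiment
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    return SentimentIntensityAnalyzer().polarity_scores(text)


# Sum of the two ratings from the example sentence: good (1.9) + nice (1.8).
compound = normalize(1.9 + 1.8)
print(round(compound, 2))  # 0.69
```

The hand computation gives 3.7 / sqrt(13.69 + 15), which rounds to 0.69 and agrees with the compound value shown above.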
13. CHALLENGES :-
Tweets are highly unstructured and also non-grammatical.
Out of Vocabulary words.
Lexical variation.
Extensive use of acronyms such as asap, lol, etc.
14. DATA COLLECTION :-
Data streaming :- To perform sentiment
analysis we need Twitter data consisting of
tweets about a particular keyword or query
term.
NOTE- Tweets are short messages restricted to
140 characters in length. Due to the nature of
this microblogging service (quick and short
messages), people use acronyms, make spelling
mistakes, and use emoticons and other characters
that express special meanings.
15. DATA PRE PROCESSING:-
It is the process of removing words from
tweets that do not contribute to any sentiment.
1. Emoticons- 170 emoticons were identified
and removed.
2. URLs- URLs do not signify any sentiment, so each
is replaced with the word |URL|.
3. Stop words- words such as “a”, “is”, “the” do not
indicate any sentiment.
16. 4. Usernames and hashtags- the @ symbol before a
username and the # before a topic; both are replaced
with AT_USER.
5. Repeated letters- words such as hunnngry and huuuuungry are reduced to
the token “hunngry”.
6. Slang words- non-English words.
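Steps 1 to 5 above can be sketched as a single cleaning function. This is a minimal sketch under several assumptions: the stop-word and emoticon sets are tiny illustrative subsets, the regular expressions are simplified, and repeated letters are handled by shortening any run of three or more identical characters to two. The slang step is omitted because it would need a dictionary that the slides do not describe. All names are hypothetical.

```python
import re

STOP_WORDS = {"a", "an", "is", "the"}                # tiny illustrative subset
EMOTICONS = {":)", ":-)", ":(", ":-(", ":D", ";)"}   # the slides mention 170 of these

URL_TOKEN = "|URL|"      # placeholder word named in the slides
USER_TOKEN = "AT_USER"   # used for both @usernames and #hashtags, as in the slides

URL_RE = re.compile(r"(https?://|www\.)\S+")
MENTION_OR_HASHTAG_RE = re.compile(r"[@#]\w+")
REPEAT_RE = re.compile(r"(\w)\1{2,}")                # same character 3+ times in a row
PUNCTUATION = ".,!?;:\"'()"


def preprocess(tweet):
    """Return the tweet as a cleaned, space-separated string of tokens."""
    kept = []
    for raw in tweet.split():
        if raw in EMOTICONS:                      # 1. drop emoticons
            continue
        if URL_RE.match(raw):                     # 2. URLs become a placeholder
            kept.append(URL_TOKEN)
        elif MENTION_OR_HASHTAG_RE.match(raw):    # 4. usernames and hashtags
            kept.append(USER_TOKEN)
        else:
            # 5. shorten runs of repeated letters, then trim punctuation
            word = REPEAT_RE.sub(r"\1\1", raw.lower()).strip(PUNCTUATION)
            if word and word not in STOP_WORDS:   # 3. drop stop words
                kept.append(word)
    return " ".join(kept)


print(preprocess("@bob I am sooo hunnngry :( see http://t.co/abc #lunch"))
```

Under these rules the sample tweet becomes "AT_USER i am soo hunngry see |URL| AT_USER". A common alternative, not used in the slides, is to keep the hashtag word itself and drop only the # symbol.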
17. CASE DIAGRAM FOR SENTIMENT ANALYSIS
[Figure: case diagram for sentiment analysis. Actors: User, Twitter.
Steps shown: tweet search; connect to Twitter; HTTP request for tweets;
Twitter API authorization; HTTP response from Twitter; retrieve metadata
for each set; store data in database; connect to database; extract
significant phrases for each tweet; perform sentiment analysis on each
tweet; store result in database; plot graph; display result.]
19. Result stored in Database :-
Tweets are stored as raw data in MS-Excel together with their
positive, negative, neutral and compound values.
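One simple way to produce such a sheet is to write a CSV file, which MS-Excel can open directly. This is a sketch using only the Python standard library. The column names, function names and file name are assumptions, and the single sample row reuses the example sentence and scores from the earlier slides.

```python
import csv
import io

# Assumed column layout: the tweet text followed by the four VADER metrics.
FIELDS = ["tweet", "positive", "neutral", "negative", "compound"]


def write_results(rows, handle):
    """Write one CSV line per scored tweet to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


def save_results(rows, path="tweet_sentiment.csv"):
    """Save the rows to a file; the utf-8-sig encoding helps Excel detect UTF-8."""
    with open(path, "w", newline="", encoding="utf-8-sig") as handle:
        write_results(rows, handle)


sample = [{
    "tweet": "The food is good and the atmosphere is nice",
    "positive": 0.45,
    "neutral": 0.55,
    "negative": 0.0,
    "compound": 0.69,
}]

buffer = io.StringIO()
write_results(sample, buffer)
print(buffer.getvalue())
```

Calling save_results(sample) would write the same content to tweet_sentiment.csv in the current directory.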
21. CONCLUSION :-
The field of sentiment analysis is an exciting new
research direction due to the large number of real-world
applications where discovering people’s opinions is
important for better decision-making.
Recently, people have started expressing their
opinions on the Web, which has increased the need to
analyze opinionated online content for various
real-world applications.
A lot of research on detecting sentiment from text
is present in the literature. Still, there is considerable scope
for improving the existing sentiment analysis
models. They can be
improved further with more semantic and
commonsense knowledge.
22. FUTURE SCOPE :-
Data pre-processing using more parameters to
obtain more accurate sentiments.
Updating the dictionary with new synonyms and
antonyms of already existing words.
The web application can be converted to a mobile
application.
Multilingual support: due to the lack of a multilingual
lexical dictionary, it is currently not feasible
to develop a multi-language sentiment
analyser.
Analysing sentiments of emoji/smileys.