What Is Sentiment Analysis?
Problem Statement
Why Twitter data?
The Process at a Glance
Methodology: How are we doing it?
Pre-processing of the datasets
Extract the candidate or take it as user input.
Calculate sentiment
Visualizing the candidate data
What visualization are we talking about?
2. Overview : What Is Sentiment Analysis?
Sentiment analysis is the automated process of identifying and classifying
subjective information in text data. This might be an opinion, a judgment, or a
feeling about a particular topic or product feature.
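The most common types of sentiment analysis are ‘polarity detection’ and
‘subjectivity detection’. Polarity detection classifies statements as Positive,
Negative or Neutral; subjectivity detection measures how opinionated a statement is.
● Polarity: -1 (most negative) to 1 (most positive)
● Subjectivity: 0 (objective) to 1 (subjective)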
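As a minimal sketch of polarity detection, the function below maps a polarity score to one of the three labels. The name `polarity_label` and the `neutral_band` tolerance are assumptions made for illustration; the slides do not say where the class boundaries lie.

```python
def polarity_label(polarity, neutral_band=0.0):
    """Map a polarity score in [-1, 1] to Positive, Negative or Neutral.

    neutral_band is a hypothetical tolerance: scores whose absolute value
    is at most neutral_band are treated as Neutral.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie between -1 and 1")
    if polarity > neutral_band:
        return "Positive"
    if polarity < -neutral_band:
        return "Negative"
    return "Neutral"
```

With the default band, only a score of exactly 0 counts as Neutral; a wider band such as 0.1 treats weakly scored tweets as Neutral too.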
3. Problem Statement
● In this project, we implement a Twitter sentiment analysis model that helps
to overcome the challenges of identifying the sentiment of tweets across
subject areas such as business, politics and public actions.
4. Current Scenario
Twitter is a service launched in 2006 that currently has more than 550 million
users. It carries tweets covering everything under the sun, ranging from current
political affairs to personal experiences: movie reviews, travel experiences,
current events, etc.
5. Our Objective
Social networks are a rich platform for learning about people’s opinions and
sentiments on different topics, because people actively communicate and share
their opinions there.
● Implement an algorithm that automatically classifies data as positive,
negative or neutral.
● Analyze people's sentiments, attitudes, opinions, emotions, etc. towards
elements such as products, individuals, topics, organizations and services.
● Determine whether the attitude of the public towards the subject of interest
is positive, negative or neutral.
6. Why Twitter data?
Twitter is a gold mine of data. Unlike other social platforms, almost every user’s
tweets are completely public and pullable. This is a huge plus if you’re trying to get
a large amount of data to run analytics on. Twitter data is also pretty specific.
A simple application of this is analyzing how your company is received by
the general public. You could collect the last 2,000 tweets that mention your
company (or any term you like) and run a sentiment analysis algorithm over them.
7. The Process at a Glance
1. Importing libraries and dataset
2. Exploring and preprocessing the dataset
3. Sentiment analysis
4. Visualization of analyzed data
8. Methodology: How are we doing it?
1. Authenticate on Twitter
2. Importing libraries and dataset
3. Pre-processing of the datasets
4. Extract candidate or take it as a user input
5. Calculate sentiment
6. Visualise the candidate data
9. Authenticate on Twitter
To fetch tweets through the Twitter API, one needs to register an app through
a Twitter account:
● Create a Twitter developer account
● Create an app and get the ‘Keys and Access Tokens’
● Use the OAuthHandler class from the Tweepy library to authenticate with the
consumer keys and access tokens
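The steps above can be sketched as follows. The environment variable names and the helper names `load_credentials` and `build_api` are assumptions for this example, not part of the original project. Newer Tweepy releases also expose the same handler under the name `OAuth1UserHandler`.

```python
import os

# Hypothetical environment variable names chosen for this sketch.
CREDENTIAL_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=None):
    """Read the four keys and tokens from the environment (or a dict)."""
    env = os.environ if env is None else env
    missing = [name for name in CREDENTIAL_VARS if not env.get(name)]
    if missing:
        raise KeyError("missing credentials: " + ", ".join(missing))
    return tuple(env[name] for name in CREDENTIAL_VARS)


def build_api(credentials):
    """Authenticate with Tweepy and return an API client."""
    # Imported here so load_credentials can be used without Tweepy installed.
    import tweepy

    consumer_key, consumer_secret, access_token, access_token_secret = credentials
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    return tweepy.API(auth)
```

Reading the keys from the environment keeps them out of the source code; `build_api(load_credentials())` then yields the client used in the next step.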
10. Importing libraries and dataset
● Create a Twitter client class using the Tweepy library to interact with the
Twitter API
● Take a hashtag as input from the user to pass as the query parameter
● After authenticating the client, pass the query parameter to the API search
function together with the number of items you want returned
● Parse the results into an array for further processing
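A minimal sketch of the fetch step, assuming an already authenticated Tweepy `API` object. The helper names are hypothetical, and the search method is looked up at run time because it is called `search` in Tweepy 3.x and `search_tweets` in Tweepy 4.x.

```python
def normalize_hashtag(raw):
    """Turn user input such as ' python ' into the query '#python'."""
    tag = raw.strip().lstrip("#")
    if not tag:
        raise ValueError("hashtag must not be empty")
    return "#" + tag


def parse_statuses(statuses):
    """Convert status objects into plain dicts for later processing."""
    return [{"text": s.text, "created_at": s.created_at} for s in statuses]


def fetch_tweets(api, hashtag, count=100):
    """Search for a hashtag and return up to `count` parsed tweets."""
    import tweepy  # imported lazily; only this function talks to the API

    # Pick whichever search method this Tweepy version provides.
    search = getattr(api, "search_tweets", None) or api.search
    cursor = tweepy.Cursor(search, q=normalize_hashtag(hashtag), lang="en")
    return parse_statuses(cursor.items(count))
```

Passing `lang="en"` asks the search for English tweets only, which is one way to cover the "remove non-English tweets" requirement early.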
11. Pre-processing of the datasets
● Remove all URLs (e.g. www.xyz.com) and hashtags
● Correct spelling
● Replace all emoticons with their sentiment
● Remove all punctuation, symbols and numbers
● Remove stop words
● Expand acronyms (we can use an acronym dictionary)
● Remove non-English tweets
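Most of these steps can be sketched with the standard library alone. The lookup tables below are tiny illustrative stand-ins, not the project's real dictionaries, and the sketch assumes that "removing hashtags" means dropping the whole tag. Spelling correction and language filtering are left out.

```python
import re

# Illustrative lookup tables; a real project would use much fuller lists.
EMOTICONS = {":)": "happy", ":-)": "happy", ":(": "sad", ":-(": "sad"}
ACRONYMS = {"gr8": "great", "imo": "in my opinion", "idk": "i do not know"}
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "it", "this"}

URL_RE = re.compile(r"(https?://\S+|www\.\S+)")
HASHTAG_RE = re.compile(r"#\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def preprocess(tweet):
    """Clean one tweet and return the remaining words joined by spaces."""
    text = URL_RE.sub(" ", tweet)
    text = HASHTAG_RE.sub(" ", text)
    # Replace emoticons before punctuation is stripped, or they would be lost.
    for emoticon, word in EMOTICONS.items():
        text = text.replace(emoticon, " " + word + " ")
    text = text.lower()
    # Expand acronyms before digits are removed (e.g. "gr8").
    tokens = [ACRONYMS.get(tok.strip(".,!?;:"), tok) for tok in text.split()]
    # Drop punctuation, symbols and numbers.
    text = NON_LETTER_RE.sub(" ", " ".join(tokens))
    return " ".join(tok for tok in text.split() if tok not in STOP_WORDS)
```

The order of the steps matters: emoticons and acronyms have to be handled before the character filter runs, because both rely on punctuation or digits.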
12. Extract candidate or take it as a user input
● Either we take an array of reference candidates as input from the user
OR
● We use feature extraction to obtain the keywords from the tweets
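Both options can be sketched briefly. Plain word frequency stands in for feature extraction here, which is an assumption; the slides do not name a specific technique. The function names are hypothetical.

```python
from collections import Counter


def top_keywords(cleaned_tweets, n=5):
    """Return the n most frequent words across preprocessed tweets."""
    counts = Counter(word for tweet in cleaned_tweets for word in tweet.split())
    return [word for word, _ in counts.most_common(n)]


def count_references(cleaned_tweets, candidates):
    """Count how many tweets mention each user-supplied candidate."""
    tokenized = [set(tweet.lower().split()) for tweet in cleaned_tweets]
    return {
        candidate: sum(candidate.lower() in tokens for tokens in tokenized)
        for candidate in candidates
    }
```

`count_references` matches whole words only, so a candidate such as "Al" is not counted inside "Alice".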
13. Calculate sentiment
● Use the TextBlob library to extract the sentiment (i.e. polarity and
subjectivity) of each tweet
● We can also run statistical analyses on the tweets, such as aggregating the
results by min, max, median, mean, etc. using NumPy
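A minimal sketch of both bullets, assuming TextBlob and NumPy are installed. The function names `score_tweet` and `summarize` are made up for this example.

```python
import numpy as np


def score_tweet(text):
    """Return (polarity, subjectivity) for one tweet using TextBlob."""
    from textblob import TextBlob  # lazy import: only needed for scoring

    sentiment = TextBlob(text).sentiment
    return sentiment.polarity, sentiment.subjectivity


def summarize(scores):
    """Aggregate a sequence of scores with NumPy."""
    values = np.asarray(scores, dtype=float)
    if values.size == 0:
        raise ValueError("no scores to summarize")
    return {
        "min": float(values.min()),
        "max": float(values.max()),
        "median": float(np.median(values)),
        "mean": float(values.mean()),
    }
```

For a list of cleaned tweets, `summarize([score_tweet(t)[0] for t in tweets])` gives the aggregate polarity figures.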
14. Visualising the candidate data
We can use matplotlib or client-side libraries to visualize the data on the
basis of
● Timestamp
● Polarity
● Subjectivity
● Number of references
● Aggregate columns (min, max, median, mean, etc.)
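One possible matplotlib layout, shown only as a sketch: polarity over time on the left, polarity against subjectivity on the right. The function name and the choice of these two panels are assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt


def plot_sentiment(timestamps, polarity, subjectivity):
    """Plot polarity over time and polarity against subjectivity."""
    fig, (ax_time, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

    ax_time.plot(timestamps, polarity, marker="o")
    ax_time.axhline(0, color="grey", linewidth=0.8)  # neutral line
    ax_time.set_xlabel("Timestamp")
    ax_time.set_ylabel("Polarity")
    ax_time.set_ylim(-1, 1)

    ax_scatter.scatter(subjectivity, polarity)
    ax_scatter.set_xlabel("Subjectivity")
    ax_scatter.set_ylabel("Polarity")
    ax_scatter.set_xlim(0, 1)
    ax_scatter.set_ylim(-1, 1)

    fig.tight_layout()
    return fig
```

The returned figure can be written to disk with `fig.savefig("sentiment.png")`.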