In today’s world, social networking websites like Twitter, Facebook, LinkedIn, etc. play a very significant role. Twitter is a micro-blogging platform that provides a tremendous amount of data, which can be used for various sentiment analysis applications such as prediction, reviews, elections, marketing, etc. Sentiment analysis is the process of extracting information from a large amount of data and classifying it into different classes called sentiments.
1. Fast and Accurate Sentiment Classification
Using NLTK And Naive Bayes Model
Presented By -
Abhisek Sahoo (ID - B516001)
Computer Engineering(2016-20)
Under the guidance of
Prof. Sabyasachi Patra
Department of Computer Science And Engineering
International Institute of Information Technology, Bhubaneswar
3. Positive or Negative?
This is a good book! -> Positive
This is a good book! I like it! -> Positive (more strongly)
This is a bad book! -> Negative
The first chapter is good, but the rest is terrible. -> Negative
4. Features of Sentiment Analysis
➢ Identifying trends of public opinion in social media
➢ Marketing and consumer research
➢ Customer feedback on new product launches, political
campaigns
➢ To facilitate smarter business decisions
➢ Enhanced product recommendation
7. Natural
Language
Processing
➢ Field of computer science, artificial intelligence,
and computational linguistics concerned with the
interactions between computers and human (natural)
languages.
➢ Ability to draw insights from data contained in
emails, videos, and other unstructured material.
➢ The various aspects of NLP include Parsing,
Machine Translation, Language Modelling,
Machine Learning, Semantic Analysis etc.
8. Natural
Language
Toolkit
➢ NLTK is a leading platform for building Python
programs to work with human language data.
➢ It provides easy-to-use interfaces to over 50 corpora
and lexical resources such as WordNet.
➢ It provides a suite of text processing libraries for
classification, tokenization, stemming, tagging,
parsing, and semantic reasoning.
9. Dataset :
NLTK
Corpora
➢ A publicly available dataset of tweets from the
Natural Language Toolkit Corpus Library is used.
➢ The tweets dataset comprises an ample collection of
individual emotions and captures most of the
adjectives important to sentiment classification.
➢ It consists of 30,000 extremely polar tweets for the
training set and 10,000 for the test set.
➢ Both the training and test sets comprise an
identical number of negative and positive tweets.
11. Tokenization
The process of breaking a stream of text up into words, phrases, symbols, or other
meaningful elements called tokens.
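The idea can be sketched in plain Python. This regex-based tokenizer is only a rough, illustrative stand-in for NLTK’s `word_tokenize`, which handles many more cases:

```python
import re

def tokenize(text):
    """Split text into word and punctuation tokens -- a crude
    stand-in for NLTK's word_tokenize."""
    return re.findall(r"[A-Za-z']+|[.,!?;]", text)

print(tokenize("This is a good book! I like it!"))
# -> ['This', 'is', 'a', 'good', 'book', '!', 'I', 'like', 'it', '!']
```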
12. Normalization
➢ Normalization in NLP is the process of converting a word to its canonical form.
➢ Stemming is a process of removing affixes from a word.
➢ Lemmatization is a process that normalizes a word using the context of the vocabulary and
morphological analysis of words in text.
➢ Wordnet is a lexical database available in NLTK for the English language that helps the script
determine the base word.
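A toy illustration of stemming, using a crude stdlib-only suffix stripper. This is not the Porter algorithm that NLTK’s PorterStemmer implements, just a sketch of the idea of removing affixes:

```python
def simple_stem(word):
    """Strip a few common suffixes -- a toy stand-in for NLTK's
    PorterStemmer; real stemmers handle far more cases."""
    for suffix in ("ing", "ed", "es", "s"):
        # keep at least a 3-letter stem so short words survive intact
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print([simple_stem(w) for w in ["playing", "played", "plays", "play"]])
# -> ['play', 'play', 'play', 'play']
```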
13. Removal of Noise/Stop Words
➢ Noise is any part of the text that does not add any meaning to data.
➢ It is necessary to remove all hyperlinks, the @ symbols of Twitter handles, punctuation, and
special characters.
➢ It is also required to remove stop words such as “is”, “a”, and “the” from the sentence.
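The cleanup steps above can be sketched with standard-library regular expressions. The stop-word set here is a tiny illustrative sample, not NLTK’s full `stopwords` corpus:

```python
import re

STOP_WORDS = {"is", "a", "the", "it", "this", "i"}  # illustrative subset only

def clean_tweet(text):
    """Strip hyperlinks, @handles and punctuation, then drop stop words."""
    text = re.sub(r"https?://\S+", "", text)   # remove hyperlinks
    text = re.sub(r"@\w+", "", text)           # remove Twitter handles
    text = re.sub(r"[^\w\s]", "", text)        # remove punctuation/specials
    return [w for w in text.lower().split() if w not in STOP_WORDS]

print(clean_tweet("@user This is a good book! http://t.co/xyz"))
# -> ['good', 'book']
```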
14. Determine Word Density
➢ The most basic form of analysis on textual data is to compute word frequencies.
➢ After compiling all words in the sample of tweets, the most common words can be found
using the FreqDist class of NLTK.
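As a stdlib sketch, `collections.Counter` plays the same role as NLTK’s FreqDist for counting word occurrences; the token list here is invented for illustration:

```python
from collections import Counter

tokens = ["good", "book", "good", "read", "great", "book", "good"]

freq = Counter(tokens)       # Counter mirrors FreqDist's counting behaviour
print(freq.most_common(2))   # -> [('good', 3), ('book', 2)]
```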
20. Future Work
A web-based application can be
implemented for better access
Web scraping can be implemented
for direct retrieval of tweets or
reviews from Twitter or any other
platform
More classification categories
can be added to determine the
sentiment more specifically
Support for multiple languages
can be added to make it more
localized
21. Conclusion
We conclude that using various NLTK modules for
preprocessing together with the NLTK Naive Bayes classifier
makes it easier to classify tweets and achieve better accuracy.
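The core idea behind the NLTK Naive Bayes classifier used here can be illustrated with a minimal from-scratch sketch: log class priors plus Laplace-smoothed word likelihoods. The toy training tweets and labels below are invented for illustration only:

```python
import math
from collections import Counter, defaultdict

def train(labeled_tweets):
    """Count words per class and class frequencies."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for tokens, label in labeled_tweets:
        class_counts[label] += 1
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab

def classify(tokens, word_counts, class_counts, vocab):
    """Pick the class maximizing log prior + smoothed log likelihoods."""
    total = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for label in class_counts:
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokens:
            score += math.log((word_counts[label][w] + 1) / denom)  # Laplace smoothing
        if score > best_score:
            best, best_score = label, score
    return best

data = [(["good", "book"], "pos"), (["like", "it"], "pos"),
        (["bad", "book"], "neg"), (["terrible", "read"], "neg")]
model = train(data)
print(classify(["good", "like"], *model))  # -> 'pos'
```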
22. References
➢ https://www.researchgate.net/publication/220482883_NLTK_the_Natural_Language_
Toolkit
➢ Basic Sentiment Analysis using NLTK - Towards Data Science
➢ https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/
➢ https://www.kaggle.com/lakshmi25npathi/sentiment-analysis-of-imdb-movie-reviews
➢ http://ijcsit.com/docs/Volume%206/vol6issue06/ijcsit20150606134.pdf
➢ https://pdfs.semanticscholar.org/c151/dfad8c1bf88b0afc716758c77d533ded7dd0.pdf
NLTK is intended to support research and teaching in NLP or closely related areas, including empirical linguistics, cognitive science, artificial intelligence, information retrieval, and machine learning.