In today’s world, social networking websites like Twitter, Facebook, LinkedIn, etc. play a very significant role. Twitter is a micro-blogging platform that provides a tremendous amount of data, which can be used for various sentiment analysis applications such as prediction, reviews, elections, marketing, etc. Sentiment analysis is the process of extracting information from a large amount of data and classifying it into different classes called sentiments.
1. Fast and Accurate Sentiment Classification
Using NLTK And Naive Bayes Model
Presented By -
Abhisek Sahoo (ID - B516001)
Computer Engineering(2016-20)
Under the guidance of
Prof. Sabyasachi Patra
Department of Computer Science And Engineering
International Institute of Information Technology, Bhubaneswar
3. Positive or Negative?
This is a good book! -> Positive
This is a good book! I like it! -> Positive (more strongly)
This is a bad book! -> Negative
The first chapter is good, but the rest is terrible. -> Negative
4. Features of Sentiment Analysis
➢ Identifying trends of public opinion in social media
➢ Marketing and consumer research
➢ Customer feedback on new product launches, political
campaigns
➢ To facilitate smarter business decisions
➢ Enhanced product recommendation
7. Natural
Language
Processing
➢ Field of computer science, artificial intelligence,
and computational linguistics concerned with the
interactions between computers and human (natural)
languages.
➢ Ability to draw insights from data contained in
emails, videos, and other unstructured material.
➢ The various aspects of NLP include Parsing,
Machine Translation, Language Modelling,
Machine Learning, Semantic Analysis etc.
8. Natural
Language
Toolkit
➢ NLTK is a leading platform for building Python
programs to work with human language data.
➢ It provides easy-to-use interfaces to over 50 corpora
and lexical resources such as WordNet.
➢ It provides a suite of text processing libraries for
classification, tokenization, stemming, tagging,
parsing, and semantic reasoning.
9. Dataset :
NLTK
Corpora
➢ A publicly available dataset of tweets from the
Natural Language Toolkit Corpus Library is used.
➢ The tweets dataset comprises an ample collection of
individual emotions and captures most of the
adjectives important to sentiment classification.
➢ It consists of 30,000 extremely polar tweets for the
training set and 10,000 for the test set.
➢ Both the training and test sets comprise an
identical number of negative and positive tweets.
11. Tokenization
The process of breaking a stream of text up into words, phrases, symbols, or other
meaningful elements called tokens.
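The idea can be sketched in plain Python. This regex-based tokenizer is only a rough, illustrative stand-in for NLTK’s `word_tokenize`, which handles many more cases:

```python
import re

def tokenize(text):
    """Split text into word and punctuation tokens -- a crude
    stand-in for NLTK's word_tokenize."""
    return re.findall(r"[A-Za-z']+|[.,!?;]", text)

print(tokenize("This is a good book! I like it!"))
# -> ['This', 'is', 'a', 'good', 'book', '!', 'I', 'like', 'it', '!']
```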
12. Normalization
➢ Normalization in NLP is the process of converting a word to its canonical form.
➢ Stemming is a process of removing affixes from a word.
➢ Lemmatization is a process that normalizes a word using the context of the vocabulary and
morphological analysis of words in text.
➢ Wordnet is a lexical database available in NLTK for the English language that helps the script
determine the base word.
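A toy illustration of stemming, using a crude stdlib-only suffix stripper. This is not the Porter algorithm that NLTK’s PorterStemmer implements, just a sketch of the idea of removing affixes:

```python
def simple_stem(word):
    """Strip a few common suffixes -- a toy stand-in for NLTK's
    PorterStemmer; real stemmers handle far more cases."""
    for suffix in ("ing", "ed", "es", "s"):
        # keep at least a 3-letter stem so short words survive intact
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print([simple_stem(w) for w in ["playing", "played", "plays", "play"]])
# -> ['play', 'play', 'play', 'play']
```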
13. Removal of Noise/Stop Words
➢ Noise is any part of the text that does not add any meaning to data.
➢ It is necessary to remove all hyperlinks, the @ symbols of Twitter handles, punctuation, and
special characters.
➢ It is also required to remove stop words such as “is”, “a”, and “the” from the sentence.
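The cleanup steps above can be sketched with standard-library regular expressions. The stop-word set here is a tiny illustrative sample, not NLTK’s full `stopwords` corpus:

```python
import re

STOP_WORDS = {"is", "a", "the", "it", "this", "i"}  # illustrative subset only

def clean_tweet(text):
    """Strip hyperlinks, @handles and punctuation, then drop stop words."""
    text = re.sub(r"https?://\S+", "", text)   # remove hyperlinks
    text = re.sub(r"@\w+", "", text)           # remove Twitter handles
    text = re.sub(r"[^\w\s]", "", text)        # remove punctuation/specials
    return [w for w in text.lower().split() if w not in STOP_WORDS]

print(clean_tweet("@user This is a good book! http://t.co/xyz"))
# -> ['good', 'book']
```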
14. Determine Word Density
➢ The most basic form of analysis on textual data is to compute word frequencies.
➢ After compiling all words in the sample of tweets, the most common words can be found
using the FreqDist class of NLTK.
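As a stdlib sketch, `collections.Counter` plays the same role as NLTK’s FreqDist for counting word occurrences; the token list here is invented for illustration:

```python
from collections import Counter

tokens = ["good", "book", "good", "read", "great", "book", "good"]

freq = Counter(tokens)       # Counter mirrors FreqDist's counting behaviour
print(freq.most_common(2))   # -> [('good', 3), ('book', 2)]
```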
20. Future Work
A web-based application can be
implemented for better access
Web scraping can be implemented
for direct retrieval of tweets or
reviews from Twitter or any other
platform
More classification categories
can be added to determine the
sentiment more specifically
Support for multiple languages
can be added to make it more
localized
21. Conclusion
We conclude that using various NLTK modules for
preprocessing together with the NLTK Naive Bayes classifier
makes it easier to classify tweets and achieve better accuracy.
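The core idea behind the NLTK Naive Bayes classifier used here can be illustrated with a minimal from-scratch sketch: log class priors plus Laplace-smoothed word likelihoods. The toy training tweets and labels below are invented for illustration only:

```python
import math
from collections import Counter, defaultdict

def train(labeled_tweets):
    """Count words per class and class frequencies."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for tokens, label in labeled_tweets:
        class_counts[label] += 1
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab

def classify(tokens, word_counts, class_counts, vocab):
    """Pick the class maximizing log prior + smoothed log likelihoods."""
    total = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for label in class_counts:
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokens:
            score += math.log((word_counts[label][w] + 1) / denom)  # Laplace smoothing
        if score > best_score:
            best, best_score = label, score
    return best

data = [(["good", "book"], "pos"), (["like", "it"], "pos"),
        (["bad", "book"], "neg"), (["terrible", "read"], "neg")]
model = train(data)
print(classify(["good", "like"], *model))  # -> 'pos'
```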
22. References
➢ https://www.researchgate.net/publication/220482883_NLTK_the_Natural_Language_
Toolkit
➢ Basic Sentiment Analysis using NLTK - Towards Data Science
➢ https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/
➢ https://www.kaggle.com/lakshmi25npathi/sentiment-analysis-of-imdb-movie-reviews
➢ http://ijcsit.com/docs/Volume%206/vol6issue06/ijcsit20150606134.pdf
➢ https://pdfs.semanticscholar.org/c151/dfad8c1bf88b0afc716758c77d533ded7dd0.pdf
NLTK is intended to support research and teaching in NLP or closely related areas, including empirical linguistics, cognitive science, artificial intelligence, information retrieval, and machine learning.