https://www.learntek.org/blog/nltk-sentiment-analysis/
Learntek is a global online training provider offering courses in Big Data Analytics, Hadoop, Machine Learning, Deep Learning, IOT, AI, Cloud Technology, DEVOPS, Digital Marketing and other IT and Management subjects.
NLTK Sentiment Analysis
About NLTK :
The Natural Language Toolkit, or more commonly NLTK, is a suite of libraries and
programs for symbolic and statistical natural language processing (NLP) for English
written in the Python programming language.
It was developed by Steven Bird and Edward Loper in the Department of Computer
and Information Science at the University of Pennsylvania.
Copyright @ 2019 Learntek. All Rights Reserved.
Sentiment Analysis :
Sentiment Analysis is a branch of computer science that overlaps heavily with
Machine Learning and Computational Linguistics. It is one of the most common text
classification tasks: it analyses an incoming message and tells whether the
underlying sentiment is positive, negative or neutral.
It is the process of computationally identifying and categorizing opinions
expressed in a piece of text, especially in order to determine whether the
writer's attitude towards a particular topic, product, etc. is positive, negative,
or neutral.
Sentiment Analysis is a Natural Language Processing task, sometimes
referred to as opinion mining, although the emphasis in this case is on extraction.
Examples of sentiment analysis questions:
• Is this product review positive or negative?
• Is the writer of this customer email satisfied or dissatisfied?
• Based on a sample of tweets, how are people responding to this ad
campaign/product release/news item?
• How have bloggers' attitudes about the president changed since the election?
• The goal of the sentiment analysis in this tutorial: automatically classify a tweet
as positive or negative.
• Given a movie review or a tweet, it can be automatically classified into categories.
These categories can be user-defined (positive, negative) or whichever classes you
want.
• Sentiment Analysis for Brand Monitoring
• Sentiment Analysis for Customer Service
• Sentiment Analysis for Market Research and Analysis
Sample Positive Tweets
• I love this car
• This view is amazing
• I feel great this morning
• I am so excited about the concert
• He is my best friend
Sample Negative Tweets
• I do not like this car
• This view is horrible
• I feel tired this morning
• I am not looking forward to the concert
• He is my enemy
Sentiment Analysis Process
• The list of word features needs to be extracted from the tweets.
• It is a list of every distinct word, ordered by frequency of appearance.
• A feature extractor is used to decide which features are relevant.
• The one we are going to use returns a dictionary indicating which words are
contained in the input passed.
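As a rough sketch of these two steps, the snippet below builds the word-feature list from the sample tweets and then a feature extractor over it. It is plain Python under stated assumptions: `collections.Counter` stands in for NLTK's `FreqDist`, tokens are lower-cased and split on whitespace, and the names `get_word_features`, `extract_features` and the `contains(word)` keys are hypothetical choices for illustration, not part of NLTK.

```python
from collections import Counter

# The sample tweets from the slides, labeled by hand.
tweets = [
    ("I love this car", "positive"),
    ("This view is amazing", "positive"),
    ("I feel great this morning", "positive"),
    ("I am so excited about the concert", "positive"),
    ("He is my best friend", "positive"),
    ("I do not like this car", "negative"),
    ("This view is horrible", "negative"),
    ("I feel tired this morning", "negative"),
    ("I am not looking forward to the concert", "negative"),
    ("He is my enemy", "negative"),
]

def get_word_features(labeled_tweets):
    """Return every distinct lower-cased word, most frequent first (hypothetical helper)."""
    counts = Counter(
        word for text, _ in labeled_tweets for word in text.lower().split()
    )
    return [word for word, _ in counts.most_common()]

word_features = get_word_features(tweets)

def extract_features(text):
    """Map each known word feature to True/False: does it occur in this text?"""
    words = set(text.lower().split())
    return {f"contains({w})": (w in words) for w in word_features}

print(word_features[:3])  # ['i', 'this', 'is']
print(extract_features("This view is amazing")["contains(amazing)"])  # True
```

Marking absent words as `False` (rather than leaving them out) lets a classifier learn from missing words too; that is a design choice of this sketch.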
Naive Bayes Classifier
• It uses the prior probability of each label (the frequency of each label in
the training set) and the contribution from each feature.
• In our case, the frequency of each label is the same for 'positive' and 'negative'.
• The word 'amazing' appears in 1 of the 5 positive tweets and in none of the negative
tweets.
• This means that the likelihood of the 'positive' label will be multiplied by 0.2 (1/5) when
this word is seen as part of the input.
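The arithmetic in these bullets can be checked directly. This is a minimal plain-Python sketch, assuming whitespace tokenization and no smoothing; the helper name `likelihood` is hypothetical. Without smoothing, the 0/5 estimate for 'negative' would wipe out that label entirely whenever 'amazing' appears, which is why NLTK's `NaiveBayesClassifier` smooths its estimates (its default estimator is `ELEProbDist`).

```python
# Sample tweets from the slides, split by label.
positive = ["I love this car", "This view is amazing", "I feel great this morning",
            "I am so excited about the concert", "He is my best friend"]
negative = ["I do not like this car", "This view is horrible", "I feel tired this morning",
            "I am not looking forward to the concert", "He is my enemy"]

# Prior probability of each label: its share of the training set.
total = len(positive) + len(negative)
prior = {"positive": len(positive) / total, "negative": len(negative) / total}

def likelihood(word, tweets):
    """Fraction of tweets in one class that contain `word` (no smoothing)."""
    return sum(word in t.lower().split() for t in tweets) / len(tweets)

p_pos = likelihood("amazing", positive)  # 1 of 5 positive tweets
p_neg = likelihood("amazing", negative)  # 0 of 5 negative tweets
print(prior, p_pos, p_neg)  # {'positive': 0.5, 'negative': 0.5} 0.2 0.0
```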
Sentiment Analysis Example 1 :
Training Data
1. This is a good book! Positive
2. This is a awesome book! Positive
3. This is a bad book! Negative
4. This is a terrible book! Negative
Testing Data
• This is a good article
• This is a bad article
We will train the model on the training data using a Naïve Bayes classifier,
and then test it on the testing data.
>>> import nltk
>>> # word_tokenize needs the Punkt data: nltk.download('punkt') ('punkt_tab' in newer releases)
>>> def form_sent(sent):
...     # Mark every token of the sentence as a present (True) feature
...     return {word: True for word in nltk.word_tokenize(sent)}
...
>>> form_sent("This is a good book")
{'This': True, 'is': True, 'a': True, 'good': True, 'book': True}
>>> s1 = 'This is a good book'
>>> s2 = 'This is a awesome book'
>>> s3 = 'This is a bad book'
>>> s4 = 'This is a terrible book'
>>> training_data = [[form_sent(s1), 'pos'], [form_sent(s2), 'pos'],
...                  [form_sent(s3), 'neg'], [form_sent(s4), 'neg']]
>>> for t in training_data:
...     print(t)
...
[{'This': True, 'is': True, 'a': True, 'good': True, 'book': True}, 'pos']
[{'This': True, 'is': True, 'a': True, 'awesome': True, 'book': True}, 'pos']
[{'This': True, 'is': True, 'a': True, 'bad': True, 'book': True}, 'neg']
[{'This': True, 'is': True, 'a': True, 'terrible': True, 'book': True}, 'neg']
>>> from nltk.classify import NaiveBayesClassifier
>>> model = NaiveBayesClassifier.train(training_data)
>>> model.classify(form_sent('This is a good article'))
'pos'
>>> model.classify(form_sent('This is a bad article'))
'neg'
>>>
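To make `train` and `classify` less of a black box, here is a hand-rolled sketch of a naive Bayes decision on the same four sentences. It assumes whitespace tokenization and add-one (Laplace) smoothing, and the names `log_score` and `classify` are hypothetical. NLTK's `NaiveBayesClassifier` uses a different default estimator (`ELEProbDist`), so its probabilities differ, although this sketch reaches the same labels on this data.

```python
import math

# Training sentences from Example 1 (the "!" is dropped, as in the REPL session).
training = [
    ("This is a good book", "pos"),
    ("This is a awesome book", "pos"),
    ("This is a bad book", "neg"),
    ("This is a terrible book", "neg"),
]

labels = {label for _, label in training}
vocab = {w for text, _ in training for w in text.split()}

def log_score(text, label):
    """log P(label) + sum of log P(word present | label), add-one smoothed."""
    docs = [set(t.split()) for t, l in training if l == label]
    score = math.log(len(docs) / len(training))
    for word in set(text.split()) & vocab:  # words never seen in training are ignored
        present = sum(word in d for d in docs)
        score += math.log((present + 1) / (len(docs) + 2))
    return score

def classify(text):
    """Pick the label with the highest log score."""
    return max(labels, key=lambda label: log_score(text, label))

print(classify("This is a good article"))  # pos
print(classify("This is a bad article"))   # neg
```

'This', 'is' and 'a' occur in every training sentence, so they contribute equally to both labels; the decision rests entirely on 'good' or 'bad'.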
Accuracy
NLTK has a built-in function that computes the accuracy rate of our model on labeled test data:
>>> from nltk.classify.util import accuracy
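Accuracy is simply the share of labeled test examples the classifier gets right. The plain-Python sketch below shows that calculation; `accuracy` here is our own re-implementation and `toy_classify` is a hypothetical stand-in classifier used only to exercise it.

```python
def accuracy(classify, gold):
    """Share of (text, label) pairs for which classify(text) returns label."""
    if not gold:
        raise ValueError("gold set is empty")
    correct = sum(1 for text, label in gold if classify(text) == label)
    return correct / len(gold)

# Hypothetical stand-in classifier, just to exercise accuracy().
def toy_classify(text):
    return "pos" if "good" in text.split() else "neg"

test_data = [
    ("This is a good article", "pos"),
    ("This is a bad article", "neg"),
]
print(accuracy(toy_classify, test_data))  # 1.0
```

NLTK's own `accuracy(classifier, gold)` takes the trained classifier object itself and a list of `(featureset, label)` pairs, e.g. featuresets built with `form_sent`.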
Sentiment Analysis Example 2 :
Gender Identification: We know that male and female names have some distinctive
characteristics. Generally, names ending in a, e and i are likely to be female, while
names ending in k, o, r, s and t are likely to be male.
We build a classifier to model these differences more precisely.
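For contrast, the rule of thumb above can be written as a fixed hand-coded rule. This is a hypothetical baseline (not NLTK code, and `rule_gender` is our own name); the trained classifier instead learns how strongly every final letter points to each gender.

```python
# The heuristic from the text, hard-coded (hypothetical baseline).
FEMALE_ENDINGS = set("aei")
MALE_ENDINGS = set("korst")

def rule_gender(name):
    """Guess a gender from the last letter; 'unknown' when the rule is silent."""
    last = name[-1].lower()
    if last in FEMALE_ENDINGS:
        return "female"
    if last in MALE_ENDINGS:
        return "male"
    return "unknown"

print(rule_gender("Serena"), rule_gender("Shrek"), rule_gender("Olvin"))
# female male unknown
```

The rule has nothing to say about names ending in letters such as n or y; a learned model assigns every ending a probability.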
>>> def gender_features(word):
...     return {'last_letter': word[-1]}
...
>>> gender_features('Shrek')
{'last_letter': 'k'}
Now that we've defined a feature extractor, we need to prepare a list of examples
and corresponding class labels.
>>> import random
>>> import nltk
>>> from nltk.corpus import names  # requires nltk.download('names')
>>> labeled_names = ([(name, 'male') for name in names.words('male.txt')] +
...                  [(name, 'female') for name in names.words('female.txt')])
>>> random.shuffle(labeled_names)
Next, we use the feature extractor to process the names data and divide the
resulting list of feature sets into a training set and a test set. The training set is
used to train a new "naive Bayes" classifier.
>>> featuresets = [(gender_features(n), gender) for (n, gender) in labeled_names]
>>> train_set, test_set = featuresets[500:], featuresets[:500]
>>> classifier = nltk.NaiveBayesClassifier.train(train_set)
Let's just test it out on some names that did not appear in its training data:
>>> classifier.classify(gender_features('Neo'))
'male'
>>> classifier.classify(gender_features('olvin'))
'male'
>>> classifier.classify(gender_features('ricky'))
'female'
>>> classifier.classify(gender_features('serena'))
'female'
We can systematically evaluate the classifier on a much larger quantity of unseen
data:
>>> print(nltk.classify.accuracy(classifier, test_set))
0.77
Finally, we can examine the classifier to determine which features it found most
effective for distinguishing the names' genders:
>>> classifier.show_most_informative_features(20)
Most Informative Features
last_letter = 'a' female : male = 35.5 : 1.0
last_letter = 'k' male : female = 30.7 : 1.0
last_letter = 'p' male : female = 20.8 : 1.0
last_letter = 'f' male : female = 15.9 : 1.0
last_letter = 'd' male : female = 11.5 : 1.0
last_letter = 'v' male : female = 9.8 : 1.0
last_letter = 'o' male : female = 8.7 : 1.0
last_letter = 'w' male : female = 8.4 : 1.0
last_letter = 'm' male : female = 8.2 : 1.0
last_letter = 'r' male : female = 7.0 : 1.0
last_letter = 'g' male : female = 5.1 : 1.0
last_letter = 'b' male : female = 4.4 : 1.0
last_letter = 's' male : female = 4.3 : 1.0
last_letter = 'z' male : female = 3.9 : 1.0
last_letter = 'j' male : female = 3.9 : 1.0
last_letter = 't' male : female = 3.8 : 1.0
last_letter = 'i' female : male = 3.8 : 1.0
last_letter = 'u' male : female = 3.0 : 1.0
last_letter = 'n' male : female = 2.1 : 1.0
last_letter = 'e' female : male = 1.8 : 1.0
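Each row compares how likely a feature value is under one label versus the other: a ratio of estimated probabilities P(last_letter = x | label). The sketch below reproduces that arithmetic on a tiny hypothetical list of eight names, assuming add-one smoothing over 26 letters; NLTK's classifier uses its own estimator on the full names corpus, so its numbers are computed differently. The helpers `letter_prob` and `ratio` are our own.

```python
from collections import Counter

ALPHA = 1.0   # add-one smoothing (assumption; NLTK's estimator differs)
LETTERS = 26  # number of possible last letters

# Tiny hypothetical sample, not the NLTK names corpus.
names = [("Anna", "female"), ("Maria", "female"), ("Sofia", "female"), ("Ruth", "female"),
         ("Frank", "male"), ("Mark", "male"), ("Luca", "male"), ("Tom", "male")]

def letter_prob(letter, gender):
    """Smoothed P(last_letter = letter | gender)."""
    endings = Counter(n[-1].lower() for n, g in names if g == gender)
    total = sum(endings.values())
    return (endings[letter] + ALPHA) / (total + ALPHA * LETTERS)

def ratio(letter, numerator, denominator):
    """How many times likelier the ending is under one label than the other."""
    return letter_prob(letter, numerator) / letter_prob(letter, denominator)

print(round(ratio("a", "female", "male"), 2))  # 2.0
print(round(ratio("k", "male", "female"), 2))  # 3.0
```

Here 'a' ends 3 of 4 female names but only 1 of 4 male names, giving (3+1)/30 against (1+1)/30, a 2.0 : 1.0 ratio.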
For more Training Information, Contact Us
Email : info@learntek.org
USA : +1734 418 2465
INDIA : +40 4018 1306
+7799713624