2. What is Sentiment Analysis?
• Takes a block of text as input
• Determines the sentiment expressed in it
• “Sentiment” refers to whether the author’s
opinion is positive or negative
4. What Sentiment Analysis is NOT
• Does NOT work on images (that is
“emotion detection”)
• Does NOT aim at evaluating the product itself,
just the sentiment expressed by the reviewer
5. Why Sentiment Analysis is challenging
• Sentiment is often implied (e.g., through sarcasm) rather than stated with direct keywords
“This phone is as modern as the one owned by
Alexander Graham Bell”
• Opinions expressed may belong to other
people
“Many people say iPhones are better than Androids”
• Order Effects
“This could have revolutionized phones forever,
but the bundled OS makes it an ultimate letdown”
• Colloquial and domain-specific phrases
“The phone runs a 1.2 GHz dual core processor”
6. Project Overview
• Aims to perform sentiment analysis on
cellphone reviews
• Rates the sentiment on a scale of 1 to 5 stars
7. Inner Workings
• Uses a corpus of several cellphone reviews
(currently 33)
• Trains a classifier using features, which may
be:
– Unigrams (Occurrences of single words)
– Bigrams (Occurrences of pairs of adjacent words)
– Adjectives only, etc.
• Uses the classifier to classify unknown reviews
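The unigram and bigram feature types listed above can be illustrated with a minimal sketch. The function names and the sample review are hypothetical, not taken from the project. The adjectives-only variant is left out because it would need a part-of-speech tagger.

```python
import re

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def unigram_features(text):
    """Map each distinct word to True (presence, not count)."""
    return {word: True for word in tokenize(text)}

def bigram_features(text):
    """Map each distinct pair of adjacent words to True."""
    tokens = tokenize(text)
    return {pair: True for pair in zip(tokens, tokens[1:])}

# Hypothetical sample review.
review = "Great screen, great battery"
print(unigram_features(review))
print(bigram_features(review))
```

Either dictionary can be handed to a classifier as the feature set of one review.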
9. Why Python?
• Less code, more productivity
• Flexible paradigms (functional, procedural,
object-oriented, all in one)
• Fast development cycle
• Wide range of modules
11. Diving In… The Algorithm
(Unigram Occurrences)
1. Take the entire corpus as input
2. Create a list ‘l’ of all documents, each labeled
by its category (i.e., number of stars)
3. Extract the ‘n’ most frequent words in the
entire corpus, cleaning up duplicates and
non-alphabetic words
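Steps 1 to 3 might look roughly like the sketch below. The three-review corpus, the function names and the choice of n = 3 are illustrative assumptions; the real project uses its own corpus of reviews.

```python
from collections import Counter
import re

# Hypothetical miniature corpus: (review text, star rating) pairs.
corpus = [
    ("Great phone, great battery life", 5),
    ("Battery died after 2 days, terrible phone", 1),
    ("Decent phone for the price", 3),
]

def tokenize(text):
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def most_frequent_words(documents, n):
    """Return the n most frequent alphabetic words across all documents."""
    counts = Counter(
        word
        for text, _stars in documents
        for word in tokenize(text)
        if word.isalpha()  # drop non-alphabetic tokens such as "2"
    )
    # Counter keys are unique, so duplicates are already collapsed.
    return [word for word, _count in counts.most_common(n)]

# Step 2: the list of labeled documents is the corpus itself here.
labeled_documents = corpus
# Step 3: the vocabulary of frequent words.
vocabulary = most_frequent_words(labeled_documents, n=3)
print(vocabulary)
```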
12. Diving In… The Algorithm
(Unigram Occurrences)
4. For every document doc in l:
i. Create a dictionary d[doc]
ii. For each of the n frequent words, put a value in
d[doc] indicating its presence or absence in doc
5. Divide the labeled dictionaries into a training set and a
testing set
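A small sketch of steps 4 and 5 follows. The vocabulary, the documents, the 75/25 split and the fixed random seed are all assumptions made for illustration; the slides do not state how the split is done.

```python
import random

# Hypothetical inputs: in practice these come from the earlier steps.
vocabulary = ["phone", "great", "battery", "terrible"]
labeled_documents = [
    ("great phone great battery", 5),
    ("terrible battery", 1),
    ("great screen", 4),
    ("terrible phone", 2),
]

def document_features(text, vocabulary):
    """Build one dictionary per document: word -> present (True/False)."""
    words = set(text.lower().split())
    return {word: (word in words) for word in vocabulary}

# Step 4: one (feature dictionary, label) pair per document.
feature_sets = [
    (document_features(text, vocabulary), stars)
    for text, stars in labeled_documents
]

# Step 5: shuffle, then hold out a fraction for testing.
# The 75/25 split and the seed are illustrative choices.
random.Random(0).shuffle(feature_sets)
split = int(len(feature_sets) * 0.75)
train_set, test_set = feature_sets[:split], feature_sets[split:]
print(len(train_set), len(test_set))
```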
13. Diving In… The Algorithm
(Unigram Occurrences)
6. Train a Naïve Bayes Classifier using the
training set
7. Test the classifier using the testing set and
report the accuracy
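To make steps 6 and 7 concrete, here is a minimal pure-Python Naïve Bayes over presence/absence features, with add-one smoothing. It is a sketch under stated assumptions, not the project's implementation: the slides do not name a library, and the tiny training and testing sets are invented. A library such as NLTK offers a ready-made equivalent (`nltk.NaiveBayesClassifier.train` and `nltk.classify.accuracy`).

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(train_set):
    """Count labels and (feature, value) pairs per label."""
    label_counts = Counter()
    feature_counts = defaultdict(Counter)  # label -> {(feature, value): count}
    for features, label in train_set:
        label_counts[label] += 1
        for name, value in features.items():
            feature_counts[label][(name, value)] += 1
    return label_counts, feature_counts

def classify(model, features):
    """Return the label with the highest log posterior."""
    label_counts, feature_counts = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        for name, value in features.items():
            # Add-one smoothing; features are boolean, hence the +2.
            seen = feature_counts[label][(name, value)]
            score += math.log((seen + 1) / (count + 2))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, test_set):
    """Fraction of test documents whose predicted label is correct."""
    correct = sum(classify(model, f) == label for f, label in test_set)
    return correct / len(test_set)

# Hypothetical presence/absence feature sets labeled with star ratings.
train_set = [
    ({"great": True, "terrible": False}, 5),
    ({"great": True, "terrible": False}, 5),
    ({"great": False, "terrible": True}, 1),
    ({"great": False, "terrible": True}, 1),
]
test_set = [
    ({"great": True, "terrible": False}, 5),
    ({"great": False, "terrible": True}, 1),
]

model = train_naive_bayes(train_set)
print(accuracy(model, test_set))
```

With a corpus as small as the one described in the slides, the reported accuracy will vary noticeably with the particular train/test split.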
14. Next Steps
• Investigating the Maximum Entropy Classifier
• Refining feature choice
– Negation Tagging
– Synonyms
• Investigating Regression techniques
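One common form of negation tagging is sketched below: every word between a negation and the next punctuation mark gets a `NOT_` prefix, so that “like” and “NOT_like” become different features. The slides do not specify a scheme, so the negation list, the prefix and the function name are assumptions.

```python
import re

# Illustrative negation words; contractions ending in "n't" are also caught.
NEGATIONS = {"not", "no", "never", "cannot"}
PUNCTUATION = set(".,!?;")

def tag_negation(text):
    """Prefix words that follow a negation with NOT_, up to the next punctuation."""
    tokens = re.findall(r"[a-z']+|[.,!?;]", text.lower())
    tagged, negated = [], False
    for token in tokens:
        if token in PUNCTUATION:
            negated = False  # punctuation ends the negated span
            tagged.append(token)
        elif token in NEGATIONS or token.endswith("n't"):
            negated = True
            tagged.append(token)
        else:
            tagged.append("NOT_" + token if negated else token)
    return tagged

print(tag_negation("I do not like the camera, but the screen is great"))
```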
15. Additional Applications of Sentiment
Analysis
• Filtering of spam or abusive e-mails
• Gauging the mood of people in a particular
network
• Government intelligence
• Psychological evaluation
• Recommendation Systems
• Display of ads on webpages
16. “Sentiment is the poetry of the imagination.”
- Alphonse de Lamartine