1. Introduction to Sentiment Analysis
Rajesh Piryani
Department Of Computer Science
South Asian University, New Delhi
2. What is Sentiment Analysis?
• It is a natural language processing task that uses an algorithmic formulation to
categorize an opinionated text into either "positive" or "negative" sentiment classes (or
sometimes a "neutral" class, equivalent to having no opinion polarity).
• Sentiment Analysis (SA) is defined as a quintuple
  ▪ $\langle O_i, F_{ij}, S_{kijl}, H_k, T_l \rangle$
  ▪ $O_i$ = targeted object
  ▪ $F_{ij}$ = feature of the object
  ▪ $S_{kijl}$ = sentiment polarity
  ▪ $H_k$ = opinion holder $k$
  ▪ $T_l$ = time when the opinion is expressed
Example
  ▪ $O_i$ = Samsung Mobile
  ▪ $F_{ij}$ = Battery, Camera, Memory Card, Design, etc.
  ▪ $S_{kijl}$ = positive for six months, negative after that
  ▪ $H_k$ = Myself
  ▪ $T_l$ = "When I purchased the Samsung mobile it was good, but now after 6 months it gets heated in 4 to 5 minutes."
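As a minimal sketch, the quintuple can be held in a small record type. The class name `Opinion`, its field names, and the choice of "Battery" as the affected feature are illustrative assumptions, not part of the original definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opinion:
    """One opinion quintuple (hypothetical field names)."""
    target: str    # O_i: targeted object
    feature: str   # F_ij: feature of the object
    polarity: str  # S_kijl: sentiment polarity
    holder: str    # H_k: opinion holder
    time: str      # T_l: time when the opinion is expressed


# The Samsung example, split into two records because the polarity changes over time.
# Attaching both records to "Battery" is an assumption made for illustration.
opinions = [
    Opinion("Samsung Mobile", "Battery", "positive", "Myself", "first six months"),
    Opinion("Samsung Mobile", "Battery", "negative", "Myself", "after six months"),
]

# Select only the negative opinions.
negative = [o for o in opinions if o.polarity == "negative"]
```

Keeping the time as a separate field is what lets the same holder and object carry two different polarities.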
11/10/2017 INTRODUCTIONTO SENTIMENTANALYSIS 2
3. Why Sentiment Analysis?
• Mainly because of the Web: huge volumes of opinionated text
• User-generated media: one can express opinions on anything in reviews, forums,
discussion groups, and blogs
• Opinions on a global scale, no longer limited to:
  ▪ Individuals: one's circle of friends
  ▪ Businesses: small-scale surveys, tiny focus groups, etc.
4. Example 1
• I love this movie! It's sweet, but with satirical humor. The dialogue is great and the
adventure scenes are fun... It manages to be whimsical and romantic while laughing at
the conventions of the fairy tale genre. I would recommend it to just about anyone.
I've seen it several times, and I'm always happy to see it again whenever I have a
friend who hasn't seen it yet.
5. Example 2
• My XYZ CAR was delivered yesterday. It looks fabulous. We went on a long
highway drive the very second day of getting the car. It was smooth, comfortable and
wonderful drive. Had a wonderful experience with family. Its an awesome car. I am
loving it..!
6. Classification of Sentence
• Opinion without sentiment (objectivity)
  ▪ I believe the world is flat.
  ▪ Samsung Galaxy has a resolution of 14 MP.
• Sentiment always involves the holder's emotions or desires (subjectivity)
  ▪ I think intervention in Libya will put the US in a difficult situation.
  ▪ The US attack on Afghanistan is wrong.
  ▪ Video quality of the iPhone is awesome.
  ▪ iPhone6 is the newest in the market.
Sentences: Objective, Subjective
Subjective: Positive, Negative, Neutral
Figure 1. Classification of Sentence
7. Levels of Sentiment Analysis
Levels of Sentiment Analysis: Document Level, Sentence Level, Aspect Level
Figure 2. Levels of Sentiment Analysis
8. Example 3
• iPhone user review:
I bought an iPhone a few days ago. It was such a nice phone. The touch screen was
really cool. The voice quality was clear too. Although the battery life was not long, that
is ok for me. However, my mother was mad with me as I did not tell her before I
bought the phone. She also thought the phone was too expensive, and wanted me to
return it to the shop. ...
9. Visual Comparison of Aspect-based Sentiment Analysis
Figure 3. Visual Comparison of Aspect Level based Sentiment Analysis
10. Approaches to perform Sentiment Analysis
• Machine learning classifier approach
  ▪ Naïve Bayes, Maximum Entropy, Support Vector Machine, etc.
• Unsupervised semantic orientation approach
  ▪ Semantic Orientation from Pointwise Mutual Information and Information Retrieval (SO-PMI-IR)
• Semi-supervised SentiWordNet-based approaches
  ▪ SentiWordNet, SenticNet
11. ML Supervised Algorithm Block Diagram
Figure 4. Block diagram of ML Supervised Algorithm
12. Preprocessing of data for ML Algorithm
Review/Text → Tokenization → Stop word removal → Punctuation marks removal → Stemming
Figure 5. Steps for pre-processing of data
13. Preprocessing of data for ML Algorithm
• Stop words:
  ▪ common words that have low discrimination power (e.g., the, is, and, who)
  ▪ usually filtered out before processing the text
• Stemming:
  ▪ the purpose of stemming is to map the different grammatical forms of a word
    (noun, adjective, verb, adverb, etc.) to a single form
  ▪ the goal is to reduce inflectional forms, and sometimes derivationally related
    forms, of a word to a common base form
  ▪ Example: "argue", "argued", "argues", "arguing", and "argus" reduce to the stem "argu"
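The pre-processing steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the stop-word list is a tiny hypothetical sample, and `toy_stem` is a naive suffix stripper written for this example, not the Porter algorithm or any library stemmer.

```python
import string

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "is", "and", "who", "a", "an", "it", "was", "of", "to"}

# Suffixes tried in this order by the toy stemmer below.
SUFFIXES = ("ing", "ed", "es", "s", "e")


def toy_stem(word: str) -> str:
    """Strip one common suffix, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Tokenize, strip punctuation, drop stop words, then stem."""
    tokens = text.lower().split()                               # tokenization
    # Punctuation is stripped before the stop-word check so that a token
    # such as "the," still matches the stop-word list.
    tokens = [t.strip(string.punctuation) for t in tokens]      # punctuation removal
    tokens = [t for t in tokens if t and t not in STOP_WORDS]   # stop-word removal
    return [toy_stem(t) for t in tokens]                        # stemming


stems = [toy_stem(w) for w in ["argue", "argued", "argues", "arguing", "argus"]]
cleaned = preprocess("The touch screen was really cool.")
```

In practice a tested stemmer and a curated stop-word list would replace both hypothetical pieces; the order of the steps is the part this sketch is meant to show.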
14. Supervised Machine Learning
• Input:
  ▪ a document $d$
  ▪ a fixed set of classes $C = \{c_1, c_2, \ldots, c_J\}$
  ▪ a training set of $m$ hand-labeled documents $(d_1, c_1), \ldots, (d_m, c_m)$
• Output:
  ▪ a learned classifier $\gamma: d \rightarrow c$
15. The bag of words representation
16. The bag of words representation
17. The bag of words representation: using a subset of words
18. The bag of words representation
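A bag of words keeps only how often each word occurs and discards word order. The sketch below is one possible way to build it; the function name `bag_of_words` and the optional `vocabulary` argument (for the "subset of words" variant) are assumptions made for this illustration.

```python
from collections import Counter


def bag_of_words(tokens, vocabulary=None):
    """Count token occurrences, ignoring word order.

    If `vocabulary` is given, only words in that subset are counted.
    """
    if vocabulary is not None:
        tokens = [t for t in tokens if t in vocabulary]
    return Counter(tokens)


tokens = "i love this movie it is sweet and i would recommend it".split()

# Full representation: every word is a feature.
full_bag = bag_of_words(tokens)

# Subset representation: only a hand-picked set of words is kept.
subset_bag = bag_of_words(tokens, vocabulary={"love", "sweet", "recommend"})
```

The resulting counts are what a classifier receives as the document $d$.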
19. NB Machine Learning Approach
• The probability of a document $d$ being in class $c$ is computed as
$$P(c \mid d) \propto P(c) \prod_{1 \le k \le n_d} P(t_k \mid c)$$
• where $P(t_k \mid c)$ is the conditional probability of a term $t_k$ occurring in a document of class $c$, and $n_d$ is the number of tokens in $d$.
• The goal is to find the best class, i.e., the maximum a posteriori (MAP) class:
$$c_{map} = \arg\max_{c \in C} P(c) \prod_{1 \le k \le n_d} P(t_k \mid c)$$
• which can be reframed in log space as
$$c_{map} = \arg\max_{c \in C} \left[ \log P(c) + \sum_{1 \le k \le n_d} \log P(t_k \mid c) \right]$$
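One practical reason for the log form is numerical: a product of many small probabilities underflows to zero in floating point, while a sum of logs stays finite. The sketch below illustrates this with a hypothetical two-class model; the function name `map_class` and all probability values are assumptions, and every probability is assumed to be strictly positive.

```python
import math


def map_class(priors, cond_probs, tokens):
    """Return the class maximizing log P(c) + sum_k log P(t_k | c), plus all scores."""
    scores = {}
    for c, prior in priors.items():
        scores[c] = math.log(prior) + sum(math.log(cond_probs[c][t]) for t in tokens)
    return max(scores, key=scores.get), scores


# Hypothetical two-class model over a two-word vocabulary.
priors = {"pos": 0.5, "neg": 0.5}
cond_probs = {
    "pos": {"good": 0.9, "bad": 0.1},
    "neg": {"good": 0.2, "bad": 0.8},
}

# A long document: the raw product underflows, the log score does not.
tokens = ["good"] * 400 + ["bad"] * 400
raw_product = math.prod(cond_probs["pos"][t] for t in tokens)
best, scores = map_class(priors, cond_probs, tokens)
```

Because the logarithm is monotonic, the class with the highest log score is the same class that would maximize the original product.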
20. NB Machine Learning Approach (Contd..)
• $P(c)$ and $P(t_k \mid c)$ are maximum likelihood estimates based on training data and can be computed as:
$$P(c) = \frac{N_c}{N}$$
$$P(t \mid c) = \frac{T_{ct}}{\sum_{t' \in V} T_{ct'}}$$
• Laplace (add-1) smoothing for Naïve Bayes:
$$P(t \mid c) = \frac{T_{ct} + 1}{\sum_{t' \in V} (T_{ct'} + 1)} = \frac{T_{ct} + 1}{\left(\sum_{t' \in V} T_{ct'}\right) + |V|}$$
• where $N$ is the total number of documents,
• $N_c$ is the number of documents in class $c$,
• $T_{ct}$ is the number of occurrences of term $t$ in training documents from class $c$,
• $|V|$ is the number of unique words in the vocabulary.
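The effect of add-1 smoothing is easiest to see on a term that never occurs in a class. In the sketch below, the function name `cond_prob`, the `smoothing` parameter (0 for the plain estimate, 1 for add-1), and the counts are all hypothetical choices for illustration.

```python
from collections import Counter


def cond_prob(term, class_counts, vocabulary, smoothing=1):
    """P(t | c) = (T_ct + k) / (sum over t' of T_ct' + k * |V|).

    k = 0 gives the unsmoothed maximum likelihood estimate, k = 1 gives add-1.
    """
    total = sum(class_counts[t] for t in vocabulary)
    return (class_counts[term] + smoothing) / (total + smoothing * len(vocabulary))


# Hypothetical term counts T_ct for one class; "boring" never occurs in it.
vocabulary = {"great", "fun", "boring"}
pos_counts = Counter({"great": 3, "fun": 1})

mle = cond_prob("boring", pos_counts, vocabulary, smoothing=0)  # 0 / 4
smoothed = cond_prob("boring", pos_counts, vocabulary)          # (0 + 1) / (4 + 3)

# The smoothed estimates still sum to one over the vocabulary.
total_mass = sum(cond_prob(t, pos_counts, vocabulary) for t in vocabulary)
```

Without smoothing, a single unseen term would force the whole product for that class to zero, no matter how strongly the other terms support it.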
21. Example
• S: I love this fun film.
• Steps:
  ▪ Assign a probability to each word: $P(\text{word} \mid c)$
  ▪ Assign a probability to the sentence: $P(s \mid c) = \prod P(\text{word} \mid c)$
Which class assigns the higher probability to $s$?
22. Example
• S: I love this fun film.
• Steps:
  ▪ Assign a probability to each word: $P(\text{word} \mid c)$
  ▪ Assign a probability to the sentence: $P(s \mid c) = \prod P(\text{word} \mid c)$

Word   P(word | pos)   P(word | neg)
I      0.1             0.2
love   0.1             0.001
this   0.01            0.01
fun    0.05            0.005
film   0.1             0.1

$$P(s \mid pos) > P(s \mid neg)$$
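The comparison can be checked by multiplying the per-word probabilities from the two models. This is a small sketch using the slide's numbers; the helper name `sentence_prob` is an assumption, and class priors are left out because the slide compares only the sentence likelihoods.

```python
import math

# Per-word probabilities taken from the positive and negative models above.
model_pos = {"I": 0.1, "love": 0.1, "this": 0.01, "fun": 0.05, "film": 0.1}
model_neg = {"I": 0.2, "love": 0.001, "this": 0.01, "fun": 0.005, "film": 0.1}
sentence = ["I", "love", "this", "fun", "film"]


def sentence_prob(model, words):
    """P(s | c) as the product of the per-word probabilities."""
    return math.prod(model[w] for w in words)


p_pos = sentence_prob(model_pos, sentence)  # about 5e-07
p_neg = sentence_prob(model_neg, sentence)  # about 1e-09
```

The positive model assigns the sentence a probability roughly 500 times higher, driven by the words "love" and "fun".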
25. Performance Evaluation
• Definitions of some terms
  ▪ TP: a true positive (TP) decision assigns two similar documents to the same class
  ▪ TN: a true negative (TN) decision assigns two dissimilar documents to different classes
  ▪ FP: a false positive (FP) decision assigns two dissimilar documents to the same class
  ▪ FN: a false negative (FN) decision assigns two similar documents to different classes
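These four definitions are stated over pairs of documents, so they can be counted by looping over every pair. The sketch below assumes that "similar" means the two documents share the same true label; the function name `pair_counts` and the labels are hypothetical.

```python
from itertools import combinations


def pair_counts(true_labels, predicted_labels):
    """Count TP, TN, FP, FN decisions over all pairs of documents."""
    tp = tn = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        similar = true_labels[i] == true_labels[j]              # assumed meaning of "similar"
        together = predicted_labels[i] == predicted_labels[j]   # assigned to the same class
        if similar and together:
            tp += 1
        elif not similar and not together:
            tn += 1
        elif not similar and together:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


true_labels = ["pos", "pos", "neg", "neg"]
predicted = ["pos", "neg", "neg", "neg"]
counts = pair_counts(true_labels, predicted)
```

With four documents there are six pairs, and each pair falls into exactly one of the four categories.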
27. Exercise
Set       Doc  Words                                  Class
Training  1    India Eden India Wicket                Cricket
Training  2    India India Sachin                     Cricket
Training  3    Sachin India Eden                      Cricket
Training  4    Japan Mesi India                       Football
Test      5    India Sachin India Japan Eden Wicket   ?

Compute the conditional probability of each unique word and determine
the class of Doc 5.
28. Hint
Set       Doc  Words                                  Class
Training  1    India Eden India Wicket                Cricket
Training  2    India India Sachin                     Cricket
Training  3    Sachin India Eden                      Cricket
Training  4    Japan Mesi India                       Football
Test      5    India Sachin India Japan Eden Wicket   ?
Formulas
Prior:
$$P(c) = \frac{N_c}{N}$$
Conditional probability:
$$P(w \mid c) = \frac{count(w, c) + 1}{count(c) + |V|}$$
$|V|$: size of the vocabulary (number of unique words)
$count(c)$: total words in class $c$
$count(w, c)$: frequency of $w$ in $c$
For example, with C = Cricket and F = Football, the priors are $P(C) = \frac{3}{4}$ and $P(F) = \frac{1}{4}$.
Conditional probabilities (Cricket)
$P(India \mid C) = ?$
$P(Eden \mid C) = ?$
$P(Wicket \mid C) = ?$
$P(Sachin \mid C) = ?$
$P(Japan \mid C) = ?$
$P(Mesi \mid C) = ?$
Conditional probabilities (Football)
$P(India \mid F) = ?$
$P(Eden \mid F) = ?$
$P(Wicket \mid F) = ?$
$P(Sachin \mid F) = ?$
$P(Japan \mid F) = ?$
$P(Mesi \mid F) = ?$
Choosing a class
$P(C \mid d_5) = ?$
$P(F \mid d_5) = ?$
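One way to check a hand-worked answer is the short script below, which applies the prior and add-1 conditional probability formulas to the four training documents. It is a sketch, not an official solution: the helper names `prior`, `cond`, and `score` are assumptions, and exact fractions are used so the results can be compared with pencil-and-paper values.

```python
from collections import Counter
from fractions import Fraction

training = [
    ("India Eden India Wicket", "Cricket"),
    ("India India Sachin", "Cricket"),
    ("Sachin India Eden", "Cricket"),
    ("Japan Mesi India", "Football"),
]
test_doc = "India Sachin India Japan Eden Wicket".split()

doc_counts = Counter(label for _, label in training)        # N_c
word_counts = {label: Counter() for label in doc_counts}    # count(w, c)
for text, label in training:
    word_counts[label].update(text.split())
vocabulary = {w for counts in word_counts.values() for w in counts}


def prior(c):
    """P(c) = N_c / N."""
    return Fraction(doc_counts[c], len(training))


def cond(w, c):
    """Add-1 smoothed estimate: (count(w, c) + 1) / (count(c) + |V|)."""
    return Fraction(word_counts[c][w] + 1, sum(word_counts[c].values()) + len(vocabulary))


def score(c, words):
    """Unnormalized posterior: P(c) times the product of P(w | c) over the document."""
    result = prior(c)
    for w in words:
        result *= cond(w, c)
    return result


scores = {c: score(c, test_doc) for c in doc_counts}
predicted = max(scores, key=scores.get)
```

Under these formulas the script gives $P(India \mid C) = 3/8$ and $P(India \mid F) = 2/9$, and the unnormalized scores are $243/8388608$ for Cricket against $2/531441$ for Football, so Doc 5 is assigned to Cricket.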