SENTIMENT ANALYSIS AND OPINION MINING
Avinash Kumar Singh
“WHAT DO OTHER PEOPLE THINK?”
• What others think has always been an important piece of information.
• Before making any decision, we look for suggestions and opinions from others.
• The big question: “So whom shall I ask?”
EVOLUTION
History
• Friends
• Acquaintances
• Consumer Reports
Present
• Friends + Acquaintances
• Unknowns
• No limitations!
• Across the globe
MORE PROBLEMS!
• Biased views
• Fake reviews
• Spam reviews
• Contradicting reviews
SOLUTION: SUBJECTIVITY ANALYSIS
• General text can be divided into two segments:
  • Objective: text that does not carry any opinion or sentiment, i.e. facts (news, encyclopedias, etc.)
  • Subjective
• Subjectivity analysis
  • Deals with linguistic expressions of somebody’s opinions, sentiments, and emotions, which are not open to verification.
FLAVORS OF SUBJECTIVITY ANALYSIS
• Sentiment Analysis
• Opinion Mining
• Mood Classification
• Emotion Analysis
• These terms are treated as synonyms and used interchangeably.
WHAT IS SENTIMENT?
• Subjective impressions
• Generally, sentiment covers:
  • Feelings
  • Opinions
  • Emotions
  • Attitude
• Typically expressed as like/dislike, good/bad, etc.
WHAT IS SENTIMENT ANALYSIS?
• Sentiment analysis is the study of human behavior in which we extract user opinions and emotions from plain text.
• It identifies the orientation of opinions in a piece of text.
  • “This Maggi is tasty.” [Sentiment]
  • “This movie stars Mr. X.” [Factual]
  • “This Maggi is a silent killer.” [Sentiment]
MOTIVATION
• Enormous amount of information
• Real-time updates
• Monetary benefits
DOES THE WEB REALLY CONTAIN SENTIMENTS?
• Yes. Where?
  • Blogs
  • Reviews
  • User comments
  • Discussion forums
  • Social networks (Twitter, Facebook, etc.)
CHALLENGES
• Ambiguous words
  • “This music CD is a literal waste of time.” (negative)
  • “Please throw your waste material here.” (neutral)
• Sarcasm detection and handling
  • “All the features you want - too bad they don’t work. :-P”
• (Almost) no resources or tools for low-resource languages, such as Indian languages.
BASICS
• Basic components
  • Opinion holder: who is talking?
  • Object: the item on which the opinion is expressed.
  • Opinion: the attitude or view of the opinion holder.
• Example: “This is a good book.”
  • Opinion holder: the author of the sentence
  • Object: book
  • Opinion: good
TYPES OF OPINIONS
• Direct
  • “This is a great book.”
  • “Mobile with awesome functions.”
• Comparison
  • “Samsung Galaxy S3 is better than Apple iPhone 4S.”
  • “Hyundai Eon is not as good as Maruti Alto!”
WHAT IS SENTIMENT CLASSIFICATION?
• Classify a given text by the overall sentiment expressed by the author.
• Different levels
  • Document
  • Sentence
  • Feature
• Classification types
  • Binary
  • Multi-class
DOCUMENT-LEVEL SENTIMENT CLASSIFICATION
• Documents can be reviews, blog posts, etc.
• Assumptions:
  • Each document focuses on a single object.
  • There is only a single opinion holder.
• Task: determine the overall sentiment orientation of the document.
SENTENCE-LEVEL SENTIMENT CLASSIFICATION
• Considers each sentence as a separate unit.
• Assumption: each sentence contains only one opinion.
• Task 1: identify whether the sentence is subjective or objective.
• Task 2: identify the polarity of the sentence.
FEATURE-LEVEL SENTIMENT CLASSIFICATION
• Task 1: identify and extract object features.
• Task 2: determine the polarity of opinions on those features.
• Task 3: group identical features.
• Task 4: summarize.
• Example: “This mobile has a good camera but poor battery life.”
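Tasks 1 and 2 of feature-level classification can be illustrated with a minimal sketch. The feature list, the opinion-word lexicon, and the function name `feature_polarities` below are hypothetical and exist only for illustration; the sketch assumes that an opinion word applies to the next feature that follows it, which holds for the example sentence but not in general.

```python
import re

# Hypothetical toy lexicons, for illustration only.
FEATURES = {"camera", "battery", "screen"}
OPINION_WORDS = {"good": 1, "great": 1, "poor": -1, "bad": -1}


def feature_polarities(sentence):
    """Pair each known feature with the nearest preceding opinion word."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    result = {}
    last_polarity = None
    for token in tokens:
        if token in OPINION_WORDS:
            # Remember the most recent opinion word (Task 2 input).
            last_polarity = OPINION_WORDS[token]
        elif token in FEATURES and last_polarity is not None:
            # Task 1: a feature was found; attach the pending polarity.
            result[token] = "positive" if last_polarity > 0 else "negative"
            last_polarity = None
    return result


print(feature_polarities("This mobile has good camera but poor battery life."))
```

Grouping equivalent features and summarization (Tasks 3 and 4) are left out of this sketch.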
APPROACHES
• Prior Learning
• Subjective Lexicon
• (Un)Supervised Machine Learning
1.1 KEYWORD SELECTION FROM TEXT
• Pang et al. (2002)
• Two humans were hired to pick keywords.
• Binary classification of keywords
  • Positive
  • Negative
• The unigram method reached about 80% accuracy.
1.2 N-GRAM BASED CLASSIFICATION
• Learn n-gram frequencies from pre-annotated training data.
• Use this model to classify new incoming samples.
• Classification can be done using:
  • A counting method
  • Scoring function(s)
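A minimal sketch of the counting method, assuming whitespace tokenization and a tiny made-up training set. The names `ngrams`, `train`, `classify`, and `TRAINING` are hypothetical, and a practical system would also smooth the counts and filter common words.

```python
from collections import Counter


def ngrams(text, n):
    """Split text on whitespace and return its n-grams as tuples."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def train(samples, n=1):
    """Count n-gram frequencies per label from (text, label) pairs."""
    counts = {}
    for text, label in samples:
        counts.setdefault(label, Counter()).update(ngrams(text, n))
    return counts


def classify(text, counts, n=1):
    """Counting method: pick the label whose training n-grams match most often."""
    scores = {
        label: sum(freq[gram] for gram in ngrams(text, n))
        for label, freq in counts.items()
    }
    return max(scores, key=scores.get)


# Made-up annotated data, for illustration only.
TRAINING = [
    ("great book with a great story", "positive"),
    ("awesome camera", "positive"),
    ("poor battery", "negative"),
    ("waste of time", "negative"),
]
model = train(TRAINING)
print(classify("a great camera", model))
```

Replacing the raw sum in `classify` with a weighted or normalized score would give the scoring-function variant.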
1.3 PART-OF-SPEECH BASED PATTERNS
• Extract POS patterns from training data.
• Usually used for subjective vs. objective classification.
• Adjectives and adverbs tend to carry sentiment.
• Example patterns
  • *-JJ-NN: trigram pattern
  • JJ-NNP: bigram pattern
  • *-JJ: bigram pattern
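The example patterns can be matched against tagged text with a short sketch. It assumes the input is already POS-tagged with Penn Treebank tags (here by hand; a real system would use a tagger) and that `*` stands for any tag. The function name `match_pattern` is hypothetical.

```python
def match_pattern(tagged, pattern):
    """Return word n-grams whose tags match `pattern`; "*" matches any tag."""
    n = len(pattern)
    matches = []
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        if all(p == "*" or p == tag for p, (_, tag) in zip(pattern, window)):
            matches.append(tuple(word for word, _ in window))
    return matches


# Hand-tagged example sentence, for illustration only.
TAGGED = [("This", "DT"), ("is", "VBZ"), ("a", "DT"), ("good", "JJ"), ("book", "NN")]

print(match_pattern(TAGGED, ("*", "JJ", "NN")))
print(match_pattern(TAGGED, ("*", "JJ")))
```

A sentence that yields at least one match could then be flagged as a candidate subjective sentence.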
DICTIONARY OF AFFECTIVE LANGUAGE
• 9000 words with part-of-speech information
• Each word has a valence score in the range 1 to 3.
  • 1 for negative
  • 3 for positive
• App
  • http://sail.usc.edu/~kazemzad/emotion_in_text_cgi/DAL_app/inde
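A lexicon with valence scores can be used for classification roughly as sketched below. The scores are made up on the 1 to 3 scale and are not actual dictionary entries; treating the midpoint 2.0 as neutral and averaging over known words are assumptions of this sketch, as are the names `VALENCE` and `lexicon_polarity`.

```python
import re

# Hypothetical valence scores on a 1-3 scale; not actual dictionary entries.
VALENCE = {"good": 2.8, "great": 2.9, "tasty": 2.7, "waste": 1.2, "poor": 1.3}
NEUTRAL = 2.0  # assumed neutral point: the midpoint of the 1-3 range


def lexicon_polarity(text):
    """Average the valence of known words and compare it with the midpoint."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = [VALENCE[w] for w in words if w in VALENCE]
    if not scores:
        return "neutral"  # no lexicon word found
    mean = sum(scores) / len(scores)
    if mean > NEUTRAL:
        return "positive"
    if mean < NEUTRAL:
        return "negative"
    return "neutral"


print(lexicon_polarity("This is a good book."))
print(lexicon_polarity("A literal waste of time."))
```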
ADVANTAGES AND DISADVANTAGES
• Advantages
  • Fast
  • No training data necessary
  • Good initial accuracy
• Disadvantages
  • Does not deal with multiple word senses
  • Does not work for multi-word phrases
APPROACH 3: MACHINE LEARNING
• Sensitive to sparse and insufficient data.
• Supervised methods require annotated data.
• Training data is used to learn a hyperplane that separates the two classes.
• New instances are classified by determining which side of the hyperplane they fall on.
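The hyperplane idea can be sketched with a perceptron over bag-of-words counts. The slides do not name a specific learner, so the perceptron is an assumption chosen for brevity; the training sentences and the names `featurize`, `train_perceptron`, and `predict` are hypothetical.

```python
def featurize(text, vocab):
    """Bag-of-words vector: how often each vocabulary word occurs in the text."""
    tokens = text.lower().split()
    return [tokens.count(word) for word in vocab]


def train_perceptron(samples, vocab, epochs=10):
    """Learn weights w and bias b so that the sign of w.x + b separates the classes."""
    w = [0.0] * len(vocab)
    b = 0.0
    for _ in range(epochs):
        for text, label in samples:  # label is +1 (positive) or -1 (negative)
            x = featurize(text, vocab)
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if label * activation <= 0:  # wrong side: move the hyperplane
                w = [wi + label * xi for wi, xi in zip(w, x)]
                b += label
    return w, b


def predict(text, vocab, w, b):
    """Classify by the side of the hyperplane the instance falls on."""
    x = featurize(text, vocab)
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Made-up annotated data, for illustration only.
SAMPLES = [("good book", 1), ("great camera", 1), ("poor battery", -1), ("bad book", -1)]
VOCAB = sorted({token for text, _ in SAMPLES for token in text.split()})
w, b = train_perceptron(SAMPLES, VOCAB)
print(predict("good camera", VOCAB, w, b))
print(predict("bad battery", VOCAB, w, b))
```

With so few examples the learned weights depend heavily on the training data, which reflects the sensitivity to sparse data noted above.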
SOME UNANSWERED QUESTIONS
• Sarcasm handling
• Word sense disambiguation
• Pre-processing and cleaning
• Multi-class classification
DATASETS
• MPQA Corpus
  • Multi-Perspective Question Answering
  • News articles and other text documents
  • Manually annotated
  • 692 documents
• Twitter dataset
  • http://www.sentiment140.com/
  • 1.6 million annotated tweets
  • Bipolar classification
READING
• Opinion Mining and Sentiment Analysis
  • Bo Pang and Lillian Lee (2008)
  • www.cs.cornell.edu/home/llee/omsa/omsa.pdf
• Book: Sentiment Analysis and Opinion Mining
  • Bing Liu (2012)
  • http://www.cs.uic.edu/~liub/FBS/SentimentAnalysis-and-OpinionMining.html
THANK YOU
