Opinion Mining
Outline







Definition
Applications
Challenges
Model
Conclusion
References
Definition


Opinion mining (sentiment mining, opinion/sentiment
extraction) is the area of research that attempts to
build automatic systems that determine human opinion
from text written in natural language.



It seeks to identify the viewpoint(s) underlying a
text span; an example application is classifying a
movie review as thumbs up or thumbs down.
Opinion mining is a new discipline that has
recently attracted increased attention in fields
such as marketing, recommendation systems and
financial market prediction. Although often
associated with detecting emotional states from text,
opinion mining is an independent area, related to
natural language processing and text mining, that
deals with the identification of opinions and
attitudes in natural language text.


Consider, for instance, the following scenario. A
major computer manufacturer, disappointed with
unexpectedly low sales, finds itself confronted with
this question:

Why aren't consumers buying our laptop?




What other people think has always been an
important piece of information for most of us during
the decision-making process.
Opinion mining draws on computational linguistics,
information retrieval, text mining, natural language
processing, machine learning, statistics and predictive
analysis.

Two main types of textual information:
1. Facts
2. Opinions
Most current information processing techniques (e.g.,
search engines) work with facts (and assume they are
true).
Facts can be expressed with topic keywords
In real life, facts are important, but opinions also
play a crucial role. A computer manufacturer,
disappointed with low sales, asks itself: Why aren’t
consumers buying our laptop? The Democratic
National Committee, disappointed with the last
election, wants to know on an ongoing basis: What
is the reaction in the press, newsgroups, chat rooms,
and blogs to Bush’s latest policy decision?



The main advantage of automatic analysis is speed.
On average, humans process six articles
per hour, against a machine’s throughput of 10 per
second.
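The two throughput figures can be put on the same scale with a little arithmetic. The sketch below only converts the quoted numbers to articles per hour; the variable names are illustrative.

```python
# Throughput figures quoted on the slide.
HUMAN_ARTICLES_PER_HOUR = 6
MACHINE_ARTICLES_PER_SECOND = 10

SECONDS_PER_HOUR = 60 * 60

# Convert the machine figure to the same unit as the human figure.
machine_articles_per_hour = MACHINE_ARTICLES_PER_SECOND * SECONDS_PER_HOUR
speedup = machine_articles_per_hour / HUMAN_ARTICLES_PER_HOUR

print(machine_articles_per_hour)  # 36000
print(speedup)                    # 6000.0
```

On these figures the machine is about 6,000 times faster than a human reader.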
Applications






• Recommendation systems
• Summarization

Applications in Business
• Marketing intelligence
• Product and service benchmarking and improvement
• Understanding the voice of the customer as
expressed in everyday communications
Applications


Politics
As is well known, opinions matter a
great deal in politics. Some work has focused on
understanding what voters are thinking.
Challenges
The difficulty lies in the richness of the language that
humans use.
Example:
1. This is a great camera.
2. A great amount of money was spent for
promoting this camera.
3. One might think this is a great camera. Well
think again, because.....
A single keyword ("great") can convey three
different opinions: positive, neutral and negative, respectively.
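To see why keyword spotting alone is not enough, here is a minimal sketch of a deliberately naive classifier applied to the three example sentences. The function name and the "unknown" label are illustrative assumptions, not part of any real system.

```python
import re

# The three example sentences, with the opinion each one actually expresses.
sentences = [
    "This is a great camera.",                                        # positive
    "A great amount of money was spent for promoting this camera.",   # neutral
    "One might think this is a great camera. Well think again, because...",  # negative
]

def naive_keyword_polarity(text, keyword="great"):
    """Label a text positive whenever the keyword occurs (deliberately naive)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return "positive" if keyword in tokens else "unknown"

labels = [naive_keyword_polarity(s) for s in sentences]
print(labels)  # ['positive', 'positive', 'positive']
```

All three sentences receive the same label, although only the first is actually positive.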

Challenges


In order to arrive at sensible conclusions, sentiment
analysis has to understand context. For example,
“fighting” and “disease” are negative in a war context
but positive in a medical one.



Different mining for different domains.
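One simple way to picture domain dependence is a separate polarity lexicon per domain. The sketch below is an illustration only: the lexicon entries and scores are hand-made assumptions that mirror the "fighting"/"disease" example, and the function names are hypothetical.

```python
# Hypothetical, hand-made domain lexicons (illustrative values only).
DOMAIN_LEXICONS = {
    "war": {"fighting": -1, "disease": -1},
    "medical": {"fighting": +1, "disease": +1},
}

def word_polarity(word, domain):
    """Return +1, -1, or 0 (unknown) for a word in the given domain."""
    return DOMAIN_LEXICONS.get(domain, {}).get(word.lower(), 0)

def text_polarity(text, domain):
    """Sum the word polarities of a whitespace-separated text."""
    return sum(word_polarity(w, domain) for w in text.split())

print(text_polarity("fighting disease", "war"))      # -2
print(text_polarity("fighting disease", "medical"))  # 2
```

The same two words get opposite scores depending on which domain lexicon is consulted.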
Sentiment Analysis Model
Data Preparation


The data preparation step performs the necessary data
preprocessing and cleaning on the dataset for the
subsequent analysis. Some commonly used
preprocessing steps include removing non-textual
content and markup tags (for HTML pages), and
removing information about the reviews that is not
required for sentiment analysis, such as review dates
and reviewers’ names.
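The preprocessing steps described above can be sketched with the Python standard library. The review record and its field names (`reviewer`, `date`, `body`) are assumptions made for illustration. A regular expression is a rough way to drop tags; a real pipeline would more likely use an HTML parser.

```python
import re
from html import unescape

# Hypothetical raw review record, as it might be scraped from an HTML page.
raw_review = {
    "reviewer": "J. Doe",
    "date": "2008-05-01",
    "body": "<p>This is a <b>great</b> camera.</p><br/>Battery life is poor &amp; slow.",
}

TAG_RE = re.compile(r"<[^>]+>")
SPACE_RE = re.compile(r"\s+")

def prepare(review):
    """Keep only the review text, with markup removed and whitespace normalised."""
    text = TAG_RE.sub(" ", review["body"])   # drop markup tags
    text = unescape(text)                    # decode entities such as &amp;
    return SPACE_RE.sub(" ", text).strip()   # collapse runs of whitespace

clean = prepare(raw_review)
print(clean)  # This is a great camera. Battery life is poor & slow.
```

The reviewer name and date are simply never copied into the output, which is one way of "removing" metadata that the analysis does not need.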
Review Analysis


The review analysis step analyzes the linguistic
features of reviews so that interesting information,
including opinions and/or product features, can be
identified.



This step often applies various computational
linguistics tasks to reviews first, and then extracts
opinions and product features from the processed
reviews.
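As a rough illustration of this step, the sketch below splits a review into sentences, tokenizes them, and pairs product features with opinion words found in the same sentence. The two word sets are hypothetical stand-ins for a real part-of-speech tagger and opinion lexicon, and sentence-level co-occurrence is only one simple extraction heuristic.

```python
import re

# Hypothetical word lists standing in for a real tagger and opinion lexicon.
PRODUCT_FEATURES = {"battery", "screen", "lens"}
OPINION_WORDS = {"great", "poor", "sharp", "dim"}

def analyze(review):
    """Return (product feature, opinion word) pairs that co-occur in a sentence."""
    pairs = []
    for sentence in re.split(r"[.!?]+", review):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        features = [t for t in tokens if t in PRODUCT_FEATURES]
        opinions = [t for t in tokens if t in OPINION_WORDS]
        pairs.extend((f, o) for f in features for o in opinions)
    return pairs

pairs = analyze("The lens is sharp. The battery is poor! I bought it in May.")
print(pairs)  # [('lens', 'sharp'), ('battery', 'poor')]
```

The last sentence contributes nothing because it mentions neither a feature nor an opinion word.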
Sentiment Classification


There are two main techniques for sentiment
classification:

• The symbolic technique uses manually crafted rules
and lexicons.
• The machine learning approach uses unsupervised or
supervised learning to construct a model from a large
training corpus.
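A minimal sketch of the symbolic technique, assuming a tiny hand-made lexicon and a single hand-written rule (a negator flips the polarity of the next word). The lexicon entries, the negator list, and the rule are illustrative assumptions. Real symbolic systems use much larger lexicons and rule sets.

```python
import re

# Tiny hand-made polarity lexicon (illustrative entries only).
LEXICON = {
    "honest": 1, "mature": 1, "praise": 1, "love": 1, "pleasure": 1,
    "harmful": -1, "hypocritical": -1, "inefficient": -1, "blame": -1, "pain": -1,
}
NEGATORS = {"not", "never", "no"}

def classify(text):
    """Sum lexicon scores, flipping the sign of a word that follows a negator."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        value = LEXICON.get(tok, 0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            value = -value  # hand-written negation rule
        score += value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify("He is an honest man."))                       # positive
print(classify("He is not honest."))                          # negative
print(classify("It was a macabre and hypocritical circus."))  # negative
```

A text with no lexicon words, such as "The camera arrived on Monday", comes out as neutral.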


What?


Find relevant words, phrases, patterns that can be
used to express subjectivity



Determine the polarity of subjective expressions
Words



Adjectives
positive: honest, important, mature, large, patient
• Ron Paul is the only honest man in Washington.
• Kitchell’s writing is unbelievably mature and is only likely to get
better.
• To humour me my patient father agrees yet again to my choice of
film.

negative: harmful, hypocritical, inefficient, insecure
• It was a macabre and hypocritical circus.
• Why are they being so inefficient?
Words


Verbs
positive: praise, love
negative: blame, criticize

Nouns
positive: pleasure, enjoyment
negative: pain, criticism

Phrases


Phrases containing adjectives and adverbs



positive: high intelligence, low cost
negative: little variation, many troubles
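The phrase examples can be stored in a phrase-level lexicon and matched against adjacent word pairs. This is a minimal sketch under that assumption; the function name is hypothetical and the lexicon holds only the four phrases listed above.

```python
import re

# Phrase lexicon built from the four example phrases.
PHRASE_LEXICON = {
    ("high", "intelligence"): 1,
    ("low", "cost"): 1,
    ("little", "variation"): -1,
    ("many", "troubles"): -1,
}

def phrase_polarities(text):
    """Return (phrase, polarity) for every adjacent word pair found in the lexicon."""
    tokens = re.findall(r"[a-z]+", text.lower())
    found = []
    for pair in zip(tokens, tokens[1:]):
        if pair in PHRASE_LEXICON:
            found.append((" ".join(pair), PHRASE_LEXICON[pair]))
    return found

result = phrase_polarities("A laptop with low cost but many troubles.")
print(result)  # [('low cost', 1), ('many troubles', -1)]
```

Looking up whole phrases matters because a word such as "low" has no fixed polarity on its own: "low cost" is positive here even though "low" may read as negative in other phrases.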
Machine Learning


Studies showed that standard machine learning
techniques definitively outperform human-produced baselines.

One approach is to treat sentiment classification simply as a
special case of topic-based categorization
(with the two “topics” being positive sentiment
and negative sentiment).
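Treating sentiment as a two-topic categorization problem means any standard text classifier can be applied. As one illustration, here is a small multinomial naive Bayes classifier with add-one smoothing, written in plain Python. The training sentences are made up for the example, and the function names are hypothetical.

```python
import math
import re
from collections import Counter

# Toy labelled corpus (made up for illustration).
TRAIN = [
    ("a great camera with a sharp lens", "pos"),
    ("i love this camera", "pos"),
    ("great value and a pleasure to use", "pos"),
    ("a terrible camera", "neg"),
    ("painful menus and poor battery", "neg"),
    ("i blame the poor lens", "neg"),
]

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def train(docs):
    """Count words per class and documents per class."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    doc_counts = Counter()
    for text, label in docs:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    return word_counts, doc_counts, vocab

def predict(text, model):
    """Pick the class with the highest log posterior (add-one smoothing)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in word_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in training
                score += math.log(
                    (word_counts[label][tok] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)

model = train(TRAIN)
print(predict("a great lens", model))      # pos
print(predict("terrible battery", model))  # neg
```

Nothing in the classifier is specific to sentiment: the same code would separate any two "topics" given labelled examples.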
Supervised Methods


In order to train a classifier for sentiment recognition
in text, classic supervised learning techniques (e.g.
Support Vector Machines, naive Bayes, Maximum
Entropy) can be used. A supervised approach entails
the use of a labelled training corpus to learn a
classification function. The method that most often
yields the highest accuracy in the literature is the
Support Vector Machine classifier.
Support Vector Machine
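As a rough sketch of the idea, a linear Support Vector Machine can be trained on bag-of-words features by minimising hinge loss plus an L2 penalty with subgradient descent. The training sentences, the hyperparameters (`epochs`, `lr`, `reg`), and the function names below are all assumptions chosen for illustration. In practice a library implementation would normally be used.

```python
import re

# Toy labelled corpus (made up for illustration): +1 positive, -1 negative.
TRAIN = [
    ("a great camera", +1),
    ("i love the lens", +1),
    ("a terrible camera", -1),
    ("i blame the lens", -1),
]

def features(text):
    """Bag-of-words presence features."""
    return set(re.findall(r"[a-z]+", text.lower()))

def train_svm(data, epochs=50, lr=0.1, reg=0.01):
    """Minimise hinge loss plus an L2 penalty by subgradient descent."""
    w, b = {}, 0.0
    for _ in range(epochs):
        for text, y in data:
            x = features(text)
            margin = y * (sum(w.get(t, 0.0) for t in x) + b)
            for t in w:                 # L2 shrinkage on every weight
                w[t] *= 1 - lr * reg
            if margin < 1:              # example violates the margin
                for t in x:
                    w[t] = w.get(t, 0.0) + lr * y
                b += lr * y
    return w, b

def predict(text, model):
    w, b = model
    score = sum(w.get(t, 0.0) for t in features(text)) + b
    return 1 if score >= 0 else -1

model = train_svm(TRAIN)
print(predict("love this great lens", model))
print(predict("i blame this terrible camera", model))
```

Words shared by both classes ("a", "camera", "the", "lens") are pushed up and down in roughly equal measure, so the decision ends up resting on the words that distinguish the classes.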
Unsupervised Learning
A clustering algorithm partitions the adjectives into two
subsets.
Example adjectives: slow, scenic, nice, terrible, handsome,
painful, fun, expensive, comfortable
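A minimal sketch of such a partition, assuming that pairwise orientation links between adjectives are already available. The links below are invented for illustration, and the method shown (propagating group membership across a consistent link graph) is a simplification, not necessarily the clustering algorithm the slide refers to.

```python
from collections import deque

# Hypothetical orientation links between the example adjectives.
# "same" = assumed to share an orientation, "diff" = assumed to be opposite.
LINKS = [
    ("nice", "comfortable", "same"),
    ("nice", "scenic", "same"),
    ("scenic", "slow", "diff"),
    ("fun", "nice", "same"),
    ("handsome", "fun", "same"),
    ("terrible", "painful", "same"),
    ("painful", "slow", "same"),
    ("expensive", "comfortable", "diff"),
    ("terrible", "fun", "diff"),
]

def partition(links):
    """Split adjectives into two groups so that 'same' links stay inside a
    group and 'diff' links cross groups (assumes the links are consistent)."""
    graph = {}
    for a, b, kind in links:
        graph.setdefault(a, []).append((b, kind))
        graph.setdefault(b, []).append((a, kind))
    side = {}
    for start in graph:
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            word = queue.popleft()
            for other, kind in graph[word]:
                if other not in side:
                    side[other] = side[word] if kind == "same" else 1 - side[word]
                    queue.append(other)
    group_a = {w for w, s in side.items() if s == 0}
    group_b = {w for w, s in side.items() if s == 1}
    return group_a, group_b

group_a, group_b = partition(LINKS)
print(sorted(group_a))  # ['comfortable', 'fun', 'handsome', 'nice', 'scenic']
print(sorted(group_b))  # ['expensive', 'painful', 'slow', 'terrible']
```

The partition itself does not say which subset is the positive one; that label has to be assigned in a separate step.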
Conclusion





An important field of study
New Field
Many applications
Almost no work in this area
References




Pang, B. and Lee, L. (2008). “Opinion Mining and
Sentiment Analysis”, Foundations and Trends in
Information Retrieval, Vol. 2, Nos. 1–2, pp. 1–135.
Ebook from
http://www.cs.cornell.edu/home/llee/omsa/omsa.pdf
Wiebe, J., Cardie, C. and Riloff, E. (2007).
“Manual and Automatic Subjectivity and Sentiment
Analysis”, Center for Extraction and
Summarization of Events and Opinions in Text.
University of Utah.


Editor's Notes

  • #19 What is lexicon development about? Regarding the term “polarity”: There are other terms that people in the field use to talk about polarity: semantic orientation and valence are two common ones.
  • #27 Step 4: the goal is to have mainly same-orientation links within the subsets and different-orientation links across the subsets