Opinion mining (sentiment mining, opinion/sentiment
extraction) is the area of research that attempts to
build automatic systems that determine human opinion
from text written in natural language.
It seeks to identify the viewpoint(s) underlying a
text span; an example application is classifying a
movie review as thumbs up or thumbs down.
Opinion mining is a new discipline that has
recently attracted increased attention in fields
such as marketing, recommendation systems and
financial market prediction. Although it is often
associated with detecting emotional states in text,
opinion mining is an independent area, related to
natural language processing and text mining, that
deals with the identification of opinions and
attitudes in natural language text.
Consider, for instance, the following scenario. A
major computer manufacturer, disappointed with
unexpectedly low sales, finds itself confronted with
the question: “Why aren’t consumers buying our laptop?”
What other people think has always been an
important piece of information for most of us during
the decision-making process.
Opinion mining draws on computational linguistics,
information retrieval, text mining, natural language
processing, machine learning, statistics and predictive
analytics.
There are two main types of textual information:
facts and opinions. Most current information
processing techniques (e.g., search engines) work
with facts (assuming they are true). Facts can be
expressed with topic keywords.
In real life, facts are important, but opinions also
play a crucial role. A computer manufacturer,
disappointed with low sales, asks itself: “Why aren’t
consumers buying our laptop?” The Democratic
National Committee, disappointed with the last
election, wants to know on an ongoing basis: “What
is the reaction in the press, newsgroups, chat rooms,
and blogs to Bush’s latest policy decision?”
The main advantage of automated opinion mining is
speed: on average, humans process six articles per
hour against the machine’s throughput of 10 per …
Applications in business include product and service
benchmarking and understanding the voice of the
customer as expressed in everyday communications.
As is well known, opinions matter a
great deal in politics. Some work has focused on
understanding what voters are thinking.
The difficulty lies in the richness of natural language.
Consider the following sentences:
1. This is a great camera.
2. A great amount of money was spent for
promoting this camera.
3. One might think this is a great camera. Well
think again, because.....
Here a single keyword, “great”, is used to convey
three different opinions: positive, neutral and
negative, respectively.
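To see why a single keyword is not enough, here is a minimal Python sketch of naive keyword counting. It assumes a hypothetical one-entry lexicon in which “great” scores +1; the scorer gives all three sentences the same score, even though they express different opinions.

```python
# Naive keyword-based scoring that ignores context.
# LEXICON is a hypothetical one-word lexicon, for illustration only.
LEXICON = {"great": 1}

def keyword_score(sentence: str) -> int:
    """Sum the polarity of every lexicon word found in the sentence."""
    words = sentence.lower().replace(".", " ").split()
    return sum(LEXICON.get(w, 0) for w in words)

sentences = [
    "This is a great camera.",
    "A great amount of money was spent for promoting this camera.",
    "One might think this is a great camera. Well think again, because...",
]
# Intended opinions as stated in the text: positive, neutral, negative.
intended = [1, 0, -1]
scores = [keyword_score(s) for s in sentences]
print(scores)  # [1, 1, 1]: every sentence looks positive
```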
To arrive at sensible conclusions, sentiment
analysis has to understand context. For example,
“fighting” and “disease” are negative in a war context,
yet “fighting disease” is positive in a medical one.
Different domains therefore call for different mining approaches.
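One way to make context explicit is to keep a separate polarity lexicon per domain. The Python sketch below is illustrative only: the lexicon entries and the `score` function are hypothetical, with “fighting” and “disease” negative in a war domain and the phrase “fighting disease” positive in a medical domain. Two-word phrases are matched before single words, so a phrase entry can override its parts.

```python
from typing import Dict

# Hypothetical per-domain lexicons (illustrative values, not a real resource).
LEXICONS: Dict[str, Dict[str, int]] = {
    "war": {"fighting": -1, "disease": -1},
    "medical": {"fighting disease": 1, "disease": -1},
}

def score(text: str, domain: str) -> int:
    """Greedy longest-match scoring: try two-word phrases before single words."""
    lexicon = LEXICONS[domain]
    tokens = text.lower().split()
    total, i = 0, 0
    while i < len(tokens):
        bigram = " ".join(tokens[i:i + 2])
        if i + 1 < len(tokens) and bigram in lexicon:
            total += lexicon[bigram]
            i += 2  # the phrase consumes both words
        else:
            total += lexicon.get(tokens[i], 0)
            i += 1
    return total

sentence = "new drugs are fighting disease in the region"
print(score(sentence, "war"))      # -2
print(score(sentence, "medical"))  # 1
```

The same sentence receives opposite polarities, which is exactly why a single general-purpose lexicon tends to transfer poorly across domains.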
The data preparation step performs the necessary
preprocessing and cleaning of the dataset for the
subsequent analysis. Commonly used preprocessing
steps include removing non-textual content and
markup tags (for HTML pages), and removing
information about the reviews that is not required
for sentiment analysis, such as review dates and
reviewers’ names.
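A minimal sketch of this step in Python, using only the standard library's `html.parser` module. The review record layout (fields `text`, `date`, `reviewer`) is a hypothetical example, not a format defined by the text.

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects text content, skipping markup tags and script/style bodies."""

    def __init__(self) -> None:
        super().__init__()
        self.parts = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def strip_markup(html: str) -> str:
    """Return the visible text with whitespace collapsed to single spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())

# Hypothetical metadata fields that sentiment analysis does not need.
FIELDS_TO_DROP = {"date", "reviewer"}

def prepare(review: dict) -> dict:
    """Drop unneeded metadata and strip markup from the review text."""
    cleaned = {k: v for k, v in review.items() if k not in FIELDS_TO_DROP}
    cleaned["text"] = strip_markup(cleaned["text"])
    return cleaned

raw = {
    "reviewer": "J. Smith",
    "date": "2009-03-14",
    "text": "<p>The <b>battery</b> life is great.</p><script>track()</script>",
}
print(prepare(raw))  # {'text': 'The battery life is great.'}
```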
The review analysis step analyzes the linguistic
features of reviews so that interesting information,
including opinions and/or product features, can be
extracted. This step often applies various computational
linguistics tasks to the reviews first, and then extracts
opinions and product features from the processed
reviews.
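As a rough illustration of the extraction part of this step, the Python sketch below pairs each opinion word with the nearest preceding product feature in the same clause. The feature list, the opinion lexicon and `extract_pairs` are hypothetical; a real system would work on the output of proper linguistic processing (tokenization, part-of-speech tagging, parsing) rather than on fixed word lists.

```python
import re

# Hypothetical resources, for illustration only.
FEATURES = {"battery", "screen", "price"}
OPINION_WORDS = {"great": "positive", "dim": "negative"}

def extract_pairs(review: str):
    """Return (feature, opinion word, polarity) triples found in a review."""
    pairs = []
    # Rough clause split on punctuation and on the contrastive word "but".
    for clause in re.split(r"[.,;!?]|\bbut\b", review.lower()):
        last_feature = None
        for token in re.findall(r"[a-z]+", clause):
            if token in FEATURES:
                last_feature = token
            elif token in OPINION_WORDS and last_feature is not None:
                pairs.append((last_feature, token, OPINION_WORDS[token]))
    return pairs

print(extract_pairs("The battery is great but the screen is dim."))
# [('battery', 'great', 'positive'), ('screen', 'dim', 'negative')]
```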
There are two main techniques for sentiment analysis.
The symbolic technique uses manually crafted rules.
The machine learning approach uses unsupervised or
supervised learning to construct a model from a large
corpus.
Two tasks are involved: finding the relevant words,
phrases and patterns that can be used to express
subjectivity, and determining the polarity of those
subjective expressions. Examples of subjective adjectives:
positive: honest, important, mature, large, patient
Ron Paul is the only honest man in Washington.
Kitchell’s writing is unbelievably mature and is only likely to get …
To humour me my patient father agrees yet again to my choice of …
negative: harmful, hypocritical, inefficient, insecure
It was a macabre and hypocritical circus.
Why are they being so inefficient?
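The two tasks can be sketched with a tiny lexicon built only from the adjectives listed above. This is a hypothetical illustration, not a published method: a sentence counts as subjective if it contains a lexicon word, and its polarity is the sign of the summed word scores.

```python
import re

# Lexicon built only from the example adjectives listed above.
POSITIVE = {"honest", "important", "mature", "large", "patient"}
NEGATIVE = {"harmful", "hypocritical", "inefficient", "insecure"}

def analyse(sentence: str):
    """Task 1: find subjective words. Task 2: assign an overall polarity."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    subjective = [t for t in tokens if t in POSITIVE or t in NEGATIVE]
    if not subjective:
        return subjective, "objective"
    total = sum(1 if t in POSITIVE else -1 for t in subjective)
    if total > 0:
        return subjective, "positive"
    if total < 0:
        return subjective, "negative"
    return subjective, "mixed"

print(analyse("Ron Paul is the only honest man in Washington."))
# (['honest'], 'positive')
print(analyse("It was a macabre and hypocritical circus."))
# (['hypocritical'], 'negative')
```

Note that “macabre” is missed because it is not in the lexicon, which shows how strongly rule-based systems depend on lexicon coverage.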
Polarity is also expressed by phrases containing adjectives and adverbs:
positive: high intelligence, low cost
negative: little variation, many troubles
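These phrases show that polarity is not a property of the modifier alone: “low” is positive with “cost”, while “little” is negative with “variation”. One simple way to model this, assumed here for illustration rather than taken from the text, is to multiply the direction of the modifier (more or less of something) by the desirability of the noun. All table entries below are hypothetical.

```python
# Hypothetical compositional rule (an assumption, not from the text):
# phrase polarity = direction of the modifier * desirability of the noun.
MODIFIER_DIRECTION = {"high": 1, "many": 1, "low": -1, "little": -1}
NOUN_DESIRABILITY = {"intelligence": 1, "variation": 1, "cost": -1, "troubles": -1}

def phrase_polarity(phrase: str) -> str:
    """Classify a two-word 'modifier noun' phrase as positive or negative."""
    modifier, noun = phrase.lower().split()
    value = MODIFIER_DIRECTION[modifier] * NOUN_DESIRABILITY[noun]
    return "positive" if value > 0 else "negative"

for p in ["high intelligence", "low cost", "little variation", "many troubles"]:
    print(p, "->", phrase_polarity(p))
```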
Studies have shown that standard machine learning
techniques definitively outperform human-produced baselines.
One approach is to treat sentiment classification
simply as a special case of topic-based categorization,
with the two “topics” being positive sentiment
and negative sentiment.
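Under this view, any standard text categorizer can be reused. The Python sketch below uses multinomial naive Bayes with add-one smoothing, a common topic classifier. The four training reviews are invented toy examples, not data from any study.

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""
    classes = sorted(set(labels))
    # Log prior: fraction of training documents in each class ("topic").
    log_prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    counts = {c: Counter() for c in classes}
    for doc, c in zip(docs, labels):
        counts[c].update(doc.lower().split())
    vocab = set().union(*counts.values())
    # Smoothed log likelihood of each vocabulary word given the class.
    log_lik = {}
    for c in classes:
        denom = sum(counts[c].values()) + len(vocab)
        log_lik[c] = {w: math.log((counts[c][w] + 1) / denom) for w in vocab}
    return log_prior, log_lik

def predict_nb(model, doc):
    """Return the class with the highest posterior; unseen words are ignored."""
    log_prior, log_lik = model
    scores = {
        c: log_prior[c] + sum(log_lik[c].get(w, 0.0) for w in doc.lower().split())
        for c in log_prior
    }
    return max(scores, key=scores.get)

docs = [
    "a wonderful moving film",
    "great acting and a wonderful plot",
    "a boring and predictable film",
    "terrible acting and a boring plot",
]
labels = ["positive", "positive", "negative", "negative"]
model = train_nb(docs, labels)
print(predict_nb(model, "a wonderful plot"))  # positive
print(predict_nb(model, "boring film"))       # negative
```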
In order to train a classifier for sentiment recognition
in text, classic supervised learning techniques (e.g.
Support Vector Machines, naive Bayes, Maximum
Entropy) can be used. A supervised approach entails
the use of a labelled training corpus to learn a certain
classification function. The method that most often
yields the highest accuracy in the literature is a
Support Vector Machine (SVM) classifier.
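A linear SVM can be trained without external libraries. The sketch below follows the Pegasos approach (stochastic subgradient descent on the L2-regularized hinge loss) over binary bag-of-words features. It omits the bias term and uses invented toy reviews, so treat it as an illustration of the idea rather than a production classifier; in practice an established SVM library would normally be used.

```python
import random

def bag_of_words(doc, vocab):
    """Binary presence features over a fixed vocabulary."""
    words = set(doc.lower().split())
    return [1.0 if w in words else 0.0 for w in vocab]

def train_linear_svm(X, y, lam=0.1, epochs=100, seed=0):
    """Pegasos-style SGD on the L2-regularized hinge loss.
    Labels must be +1/-1. No bias term, for brevity."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Gradient step on the regularizer: shrink w ...
            w = [(1.0 - eta * lam) * wj for wj in w]
            # ... plus a hinge-loss step when the margin is below 1.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def predict(w, x):
    return "positive" if sum(wj * xj for wj, xj in zip(w, x)) > 0 else "negative"

docs = [
    "a wonderful moving film",
    "great acting and a wonderful plot",
    "a boring and predictable film",
    "terrible acting and a boring plot",
]
y = [1, 1, -1, -1]
vocab = sorted({w for d in docs for w in d.lower().split()})
X = [bag_of_words(d, vocab) for d in docs]
w = train_linear_svm(X, y)
print(predict(w, bag_of_words("wonderful", vocab)))  # positive
print(predict(w, bag_of_words("boring", vocab)))     # negative
```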
A clustering algorithm partitions the adjectives into two groups.
Opinion mining is an important field of study;
almost no work has been done in this area.