Sentiment Analysis
michel.bruley@teradata.com

Extract from various presentations: Bing Liu, Aditya Joshi, Aster Data …

www.decideo.fr/bruley

January 2012
Introduction
Two main types of textual information: Facts and Opinions
Most current text information processing methods work
with factual information (e.g., web search, text mining)
Sentiment analysis, or opinion mining, is the computational
study of opinions (sentiments, emotions) expressed in text
Why opinion mining now? Mainly because the Web provides
huge volumes of opinionated text.

What is Sentiment Analysis?
Identify the orientation of opinion in a piece of text (blogs,
user comments, review websites, community websites, …); in
other words, determine whether a sentence or a document
expresses positive, negative, or neutral sentiment towards
some object.

The movie was fabulous! [ Sentimental ]
The movie stars Mr. X [ Factual ]
The movie was horrible! [ Sentimental ]
SA at different levels

[Figure: example movie-review sentences analyzed at three levels: word-level SA, sentence-level SA, and document-level SA]
Example words: fabulous, interesting, boring
Example parse: police (subj.) stopped (verb) corruption (obj.)
What is an Opinion?
An opinion is a quintuple:

(o_j, f_jk, so_ijkl, h_i, t_l)
where
– o_j is a target object
– f_jk is a feature of the object o_j
– so_ijkl is the sentiment value of the opinion of the opinion
holder h_i on feature f_jk of object o_j at time t_l
– h_i is an opinion holder
– t_l is the time when the opinion is expressed
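The quintuple maps naturally onto a small record type. The following is a minimal sketch, not part of the original slides: the class name `Opinion`, the field names, and the integer encoding of the sentiment value (+1, 0, -1) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Opinion:
    """One opinion quintuple (object, feature, sentiment, holder, time)."""
    target: str      # the target object the opinion is about
    feature: str     # the feature of the object being judged
    sentiment: int   # assumed encoding: +1 positive, 0 neutral, -1 negative
    holder: str      # the opinion holder
    time: date       # when the opinion was expressed


# Hypothetical example: a reviewer praising a laptop keyboard.
example = Opinion("ThinkPad T43", "keyboard", +1, "reviewer_42", date(2012, 1, 15))
```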
Objective: structure the unstructured
Objective: Given an opinionated document,
– Discover all quintuples (o_j, f_jk, so_ijkl, h_i, t_l),
• i.e., mine the five corresponding pieces of information
in each quintuple
With the quintuples,
– Unstructured Text → Structured Data
• Traditional data and visualization tools can be used to
slice, dice and visualize the results in all kinds of ways
• Enable qualitative and quantitative analysis
With all quintuples, all kinds of analyses become possible
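Once opinions are stored as quintuples, ordinary aggregation gives the "slice and dice" view. A minimal sketch under stated assumptions: the rows below are invented sample data, the tuple layout (object, feature, sentiment, holder, time) is an assumed encoding, and `sentiment_by_feature` is a hypothetical helper.

```python
from collections import defaultdict

# Invented sample quintuples: (object, feature, sentiment, holder, time).
quintuples = [
    ("ThinkPad T43", "keyboard", +1, "alice", "2012-01-03"),
    ("ThinkPad T43", "keyboard", +1, "bob", "2012-01-05"),
    ("ThinkPad T43", "battery", -1, "alice", "2012-01-03"),
    ("ThinkPad T43", "battery", -1, "carol", "2012-01-09"),
    ("ThinkPad T43", "battery", +1, "dave", "2012-01-11"),
]


def sentiment_by_feature(rows):
    """Count positive and negative opinions for each (object, feature) pair."""
    counts = defaultdict(lambda: {"positive": 0, "negative": 0})
    for obj, feature, sentiment, _holder, _time in rows:
        if sentiment > 0:
            counts[(obj, feature)]["positive"] += 1
        elif sentiment < 0:
            counts[(obj, feature)]["negative"] += 1
    return dict(counts)


summary = sentiment_by_feature(quintuples)
for (obj, feature), tally in sorted(summary.items()):
    print(obj, feature, tally)
```

The same rows could equally be grouped by holder or by time period; the point is only that the structured form makes such queries trivial.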
SA is not Just ONE Problem
Track direct opinions:
– document
– sentence
– feature level
Compare opinions: different types of comparisons
Detect opinion spam: fake reviews

Polarity Classifier
First eliminate objective sentences, then use remaining
sentences to classify document polarity (reduce noise)
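The two-stage idea can be sketched as follows. This is a toy illustration, not the classifier the slides refer to: the word lists are invented, and a simple lexicon lookup stands in for both the subjectivity detector and the polarity classifier. With lexicon scoring the filter does not change the final label; its noise-reduction benefit appears when the second stage is a trained classifier.

```python
import re

# Toy lexicons, assumed for illustration only.
POSITIVE = {"fabulous", "great", "interesting"}
NEGATIVE = {"horrible", "boring", "dud"}


def tokens(sentence):
    """Lowercase a sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def subjective_sentences(sentences):
    """Stage 1: keep only sentences containing at least one sentiment word."""
    return [s for s in sentences
            if any(t in POSITIVE or t in NEGATIVE for t in tokens(s))]


def document_polarity(sentences):
    """Stage 2: score the remaining sentences and label the document."""
    score = 0
    for sentence in subjective_sentences(sentences):
        for t in tokens(sentence):
            score += (t in POSITIVE) - (t in NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


review = ["The movie stars Mr. X", "The movie was fabulous!"]
print(subjective_sentences(review))
print(document_polarity(review))
```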

Level of Analysis
We can inquire about sentiment at various linguistic levels:
Words – objective, positive, negative, neutral
Clauses – “going out of my mind”
Sentences – possibly multiple sentiments
Documents

Words
Adjectives
– objective: red, metallic
– positive: honest, important, mature, large, patient
– negative: harmful, hypocritical, inefficient
– subjective (but not positive or negative): curious, peculiar, odd,
likely, probable
Verbs
– positive: praise, love
– negative: blame, criticize
– subjective: predict
Nouns
– positive: pleasure, enjoyment
– negative: pain, criticism
– subjective: prediction, feeling
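A word-level lexicon like the one above can be held in a simple lookup table. A minimal sketch: the entries are taken from the slide, but the `(word, part_of_speech)` key layout, the tag names, and the `"unknown"` fallback are assumptions.

```python
# Entries copied from the slide; the (word, part-of-speech) key is an assumed layout.
LEXICON = {
    ("red", "adj"): "objective",
    ("honest", "adj"): "positive",
    ("harmful", "adj"): "negative",
    ("curious", "adj"): "subjective",
    ("praise", "verb"): "positive",
    ("blame", "verb"): "negative",
    ("predict", "verb"): "subjective",
    ("pleasure", "noun"): "positive",
    ("pain", "noun"): "negative",
    ("feeling", "noun"): "subjective",
}


def word_sentiment(word, pos):
    """Return the lexicon label for a word, or "unknown" if it is not listed."""
    return LEXICON.get((word.lower(), pos), "unknown")


print(word_sentiment("Honest", "adj"))
print(word_sentiment("table", "noun"))
```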
Clauses
Might flip word sentiment
– “not good at all”
– “not all good”
Might express sentiment not in any word
– “convinced my watch had stopped”
– “got up and walked out”
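A common first approximation to clause-level flipping is to negate a sentiment word when a negator appears shortly before it. The sketch below assumes a toy polarity table, a toy negator list, and a two-word look-back window; none of these come from the slides. It scores "not good at all" and "not all good" identically, and it finds nothing in "got up and walked out", which is exactly the kind of limitation this slide points at.

```python
# Toy resources, assumed for illustration only.
POLARITY = {"good": 1, "great": 1, "bad": -1, "boring": -1}
NEGATORS = {"not", "never", "no"}


def clause_score(clause, window=2):
    """Sum word polarities, flipping any word preceded by a nearby negator."""
    words = clause.lower().split()
    score = 0
    for i, word in enumerate(words):
        if word in POLARITY:
            preceding = words[max(0, i - window):i]
            negated = any(p in NEGATORS for p in preceding)
            score += -POLARITY[word] if negated else POLARITY[word]
    return score


for clause in ["good", "not good at all", "not all good", "got up and walked out"]:
    print(clause, clause_score(clause))
```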

Some Problems
Which features to use? Words (unigrams), Phrases/n-grams,
Sentences
How to interpret features for sentiment detection? Bag of
words (IR), Annotated lexicons (WordNet, SentiWordNet),
Syntactic patterns, Paragraph structure
Must consider other features due to…
– Subtlety of sentiment expression
• irony
• expression of sentiment using neutral words
– Domain/context dependence
• words/phrases can mean different things in different
contexts and domains
– Effect of syntax on semantics
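The choice between unigram and n-gram features can be made concrete with a small extractor. This is a sketch under simple assumptions: whitespace tokenization, lowercase folding, and hypothetical helper names `ngrams` and `bag_of_ngrams`.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, max_n=2):
    """Count every n-gram of length 1..max_n in a whitespace-tokenized text."""
    tokens = text.lower().split()
    bag = Counter()
    for n in range(1, max_n + 1):
        bag.update(ngrams(tokens, n))
    return bag


features = bag_of_ngrams("not good at all")
print(sorted(features))
```

With unigrams alone, "good" is the only sentiment-bearing feature; the bigram "not good" keeps the negation attached to the word it modifies.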
Some Application Examples
Review classification: Is a review positive or negative
toward the movie?
Product review mining: What features of the ThinkPad
T43 do customers like/dislike?
Tracking sentiments toward topics over time: Is anger
ratcheting up or cooling down?
Prediction (election outcomes, market trends): Will
Obama or the Republican candidate win?
Etcetera

Aster Data position for Text Analysis
Data Acquisition
– Gather text from relevant sources (web crawling, document scanning, news feeds, Twitter feeds, …)
Pre-Processing
– Perform processing required to transform and store text data and information (stemming, parsing, indexing, entity extraction, …)
Mining
– Apply data mining techniques to derive insights about stored information (statistical analysis, classification, natural language processing, …)
Analytic Applications
– Leverage insights from text mining to provide information that improves decisions and processes (sentiment analysis, document management, fraud analysis, e-discovery, ...)
[Figure legend: Aster Data Fit, Third-Party Tools Fit]
Aster Data Value: Massive scalability of text storage and processing, Functions for text processing, Flexibility to develop diverse
custom analytics and incorporate third-party libraries
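The four stages can be read as a chain of functions, each consuming the previous stage's output. The sketch below is a generic illustration and not Aster Data code: the function names, the hard-coded sample texts, and the toy word lists are all assumptions.

```python
import re
from collections import Counter

# Toy lexicons, assumed for illustration only.
POSITIVE = {"fabulous", "great"}
NEGATIVE = {"horrible", "boring"}


def acquire():
    """Data acquisition: stand-in for crawling or reading feeds."""
    return ["The movie was fabulous!", "The movie was horrible!", "The movie stars Mr. X"]


def preprocess(texts):
    """Pre-processing: lowercase and tokenize each text."""
    return [re.findall(r"[a-z]+", text.lower()) for text in texts]


def mine(token_lists):
    """Mining: label each text by counting lexicon hits."""
    labels = []
    for tokens in token_lists:
        score = sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)
        labels.append("positive" if score > 0 else "negative" if score < 0 else "neutral")
    return labels


def report(labels):
    """Analytic application: summarize label counts for decision makers."""
    return dict(Counter(labels))


result = report(mine(preprocess(acquire())))
print(result)
```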


Big Data & Sentiment Analysis


Editor's Notes

  • #13 Lead-in: these problems are similar to other IR tasks. We have a body of text and need to know how to classify it.
    Granularity:
    – Most research has used unigrams (single words)
    – Some research shows that k-length n-grams work best
    WordNet: contains a large lexicon with relationships (synonymy, antonymy, etc.)
    Syntactic patterns: indirect negation, setup/contradiction