Extracting What We Think and How We Feel from What We Say in Social Media

Subjective Information Extraction
Lu Chen
Kno.e.sis Center
Wright State University
http://cdryan.com/blog/think-feel/

Directions
• From coarse-grained to fine-grained
– Document level -> sentence level -> expression level
– General sentiment -> domain-dependent sentiment -> target-dependent sentiment
– Sentiment -> subjective information
• Sentiment (positive/negative/neutral) -> emotion (happy, sad, angry, surprise, etc.)
• Other types of subjective information: Intent, suggestion/recommendation,
wish/expectation, outlook, viewpoint, etc.
• From static to dynamic
– Our attitude can be changed during social communication.
• Modeling, detecting, and tracking the change of attitude
• What leads to the change of attitude? E.g., a persuasion campaign
[Figure: directions for subjective information extraction on two axes, static to dynamic and coarse-grained to fine-grained]

Extracting Diverse Sentiment Expressions
With Target-dependent Polarity from Twitter
Lu Chen, Wenbo Wang, Meenakshi Nagarajan, Shaojun Wang, and Amit P. Sheth
• Extracting a richer and more diverse set of sentiment-bearing
expressions, including formal and slang words/phrases
• Assessing the target-dependent polarity of each sentiment expression
• A novel formulation of assigning polarity to a sentiment expression
as a constrained optimization problem over the tweet corpus

Approach
1. Extracting candidate expressions
2. Identifying inter-expression relations
3. Assessing target-dependent polarity

Extracting Candidate Expressions
• Root word: a word that is considered sentiment-bearing in a general
sense.
• Collecting root words from
– General-purpose sentiment lexicons: MPQA, General Inquirer, and
SentiWordNet
– Slang dictionary: Urban Dictionary
• For each tweet, selecting the “on-target” root words, and extracting
all the n-grams that contain at least one selected root word as
candidates
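The extraction step above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the tweet is already tokenized and that the "on-target" root words have already been selected (the slides do not say how), and the function name `extract_candidates` and the `max_n` cap on n-gram length are hypothetical.

```python
def extract_candidates(tokens, on_target_roots, max_n=3):
    """Return every n-gram (1 <= n <= max_n) containing at least one root word.

    tokens          -- list of lowercase tokens from one tweet
    on_target_roots -- set of root words judged to be about the target
                       (assumed to be given; selection is not shown here)
    max_n           -- hypothetical cap on n-gram length
    """
    candidates = set()
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            ngram = tokens[start:start + n]
            if any(tok in on_target_roots for tok in ngram):
                candidates.add(" ".join(ngram))
    return candidates


tweet = "the avengers was very good".split()
cands = extract_candidates(tweet, {"good"}, max_n=2)
print(sorted(cands))
```

With `max_n=2` and the single root word "good", the candidates are the unigram "good" and the bigram "very good".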

Identifying Inter-Expression Relations
• Connecting the candidate expressions via two types of
inter-expression relations: the consistency relation and the
inconsistency relation
• Basic ideas:
– A sentiment expression is inconsistent with its negation; two sentiment
expressions linked by contrasting conjunctions are likely to be
inconsistent.
– Two adjacent expressions are consistent if they do not overlap, no
extra negation is applied to them, and no contrasting conjunction
connects them.
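A simplified sketch of these rules, for illustration only. The word lists `CONTRAST` and `NEGATION`, the span representation, and the choice to leave negated pairs undecided are all assumptions of this sketch, not details given in the slides. The caller is expected to pass adjacent candidate pairs.

```python
# Hypothetical word lists; the slides do not enumerate them.
CONTRAST = {"but", "however", "although", "though", "yet"}
NEGATION = {"not", "no", "never"}


def relation(tokens, span_a, span_b):
    """Classify the relation between two candidate spans in one tweet.

    Spans are (start, end) token offsets with end exclusive, and span_a
    is expected to start before span_b. Returns 'consistent',
    'inconsistent', or None when this sketch cannot decide.
    """
    (a_start, a_end), (b_start, _) = span_a, span_b
    if a_end > b_start:  # overlapping expressions: no relation
        return None
    between = tokens[a_end:b_start]
    has_contrast = any(tok in CONTRAST for tok in between)
    neg_a = a_start > 0 and tokens[a_start - 1] in NEGATION
    neg_b = b_start > 0 and tokens[b_start - 1] in NEGATION
    if neg_a or neg_b:  # negation handling is left out of this sketch
        return None
    return "inconsistent" if has_contrast else "consistent"


tokens = ("the avengers was good but the plot was just "
          "simple minded and predictable").split()
good, simple_minded, predictable = (3, 4), (9, 11), (12, 13)
print(relation(tokens, good, simple_minded))         # linked by "but"
print(relation(tokens, simple_minded, predictable))  # linked by "and"
```

Counting how often each pair is labeled consistent or inconsistent across the corpus would give edge weights for the two relation networks.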

An Example
1. I saw The Avengers yesterday evening. It was long but it was very good!
2. I do enjoy The Avengers, but it's both overrated and problematic.
3. Saw the avengers last night. Mad overrated. Cheesy lines and horrible
writing. Very predictable.
4. The avengers was good but the plot was just simple minded and predictable.
5. The Avengers was good. I was not disappointed.

Assessing Target-dependent Polarity
• For each candidate expression $c_i$,
– P-Probability $\Pr^{P}(c_i)$ – the probability that $c_i$ indicates positive
sentiment
– N-Probability $\Pr^{N}(c_i)$ – the probability that $c_i$ indicates negative
sentiment
– $\Pr^{P}(c_i) + \Pr^{N}(c_i) = 1$
• For each pair of candidate expressions $c_i$ and $c_j$,
– Consistency probability – the probability that $c_i$ and $c_j$ have the same
polarity:
$$\Pr^{cons}(c_i, c_j) = \Pr^{P}(c_i)\Pr^{P}(c_j) + \Pr^{N}(c_i)\Pr^{N}(c_j)$$
– Inconsistency probability – the probability that $c_i$ and $c_j$ have
different polarities:
$$\Pr^{incons}(c_i, c_j) = \Pr^{P}(c_i)\Pr^{N}(c_j) + \Pr^{N}(c_i)\Pr^{P}(c_j)$$
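As a small worked example, the two pairwise probabilities can be computed directly from the P-Probabilities, using the fact that each N-Probability is one minus the corresponding P-Probability. The function names and the input values 0.9 and 0.8 are illustrative assumptions.

```python
def cons_prob(p_i, p_j):
    """Probability that two expressions share a polarity.

    p_i, p_j are P-Probabilities; the N-Probabilities are 1 - p_i, 1 - p_j.
    """
    return p_i * p_j + (1 - p_i) * (1 - p_j)


def incons_prob(p_i, p_j):
    """Probability that two expressions have different polarities."""
    return p_i * (1 - p_j) + (1 - p_i) * p_j


# Two expressions that are both probably positive.
c = cons_prob(0.9, 0.8)      # 0.72 + 0.02
ic = incons_prob(0.9, 0.8)   # 0.18 + 0.08
print(round(c, 2), round(ic, 2))
```

The two values always sum to 1, since a pair of expressions either agrees or disagrees in polarity.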

An Optimization Model
• We want the consistency and inconsistency probabilities derived
from the P-Probabilities and N-Probabilities of the candidates
to be as close as possible to the expectations suggested by the
relation networks.
• Objective function:
$$\text{minimize} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left[ w^{cons}_{ij} \left(1 - \Pr^{cons}(c_i, c_j)\right)^2 + w^{incons}_{ij} \left(1 - \Pr^{incons}(c_i, c_j)\right)^2 \right]$$
where $w^{cons}_{ij}$ and $w^{incons}_{ij}$ are the weights of the edges (the frequency of the
relations) between $c_i$ and $c_j$ in the consistency and inconsistency
relation networks, and $n$ is the total number of candidate expressions.
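The sketch below shows one way such an objective could be minimized. It is not the authors' solver: it substitutes plain projected gradient descent, treats each P-Probability as the only free variable (the N-Probability is one minus it, which keeps the two summing to 1), and clips values to [0, 1]. The edge weights, initial values, learning rate, and step count are all made-up illustrative numbers.

```python
def cons_prob(p_i, p_j):
    return p_i * p_j + (1 - p_i) * (1 - p_j)


def incons_prob(p_i, p_j):
    return p_i * (1 - p_j) + (1 - p_i) * p_j


def objective(p, cons_w, incons_w):
    """Weighted squared distance of each edge's probability from 1."""
    total = 0.0
    for (i, j), w in cons_w.items():
        total += w * (1 - cons_prob(p[i], p[j])) ** 2
    for (i, j), w in incons_w.items():
        total += w * (1 - incons_prob(p[i], p[j])) ** 2
    return total


def optimize(p_init, cons_w, incons_w, lr=0.05, steps=2000):
    """Projected gradient descent over the P-Probabilities."""
    p = list(p_init)
    for _ in range(steps):
        grad = [0.0] * len(p)
        for (i, j), w in cons_w.items():
            err = 1 - cons_prob(p[i], p[j])
            # d cons / d p_i = 2 * p_j - 1
            grad[i] += -2 * w * err * (2 * p[j] - 1)
            grad[j] += -2 * w * err * (2 * p[i] - 1)
        for (i, j), w in incons_w.items():
            err = 1 - incons_prob(p[i], p[j])
            # d incons / d p_i = 1 - 2 * p_j
            grad[i] += -2 * w * err * (1 - 2 * p[j])
            grad[j] += -2 * w * err * (1 - 2 * p[i])
        # Gradient step, then clip back into [0, 1].
        p = [min(1.0, max(0.0, pk - lr * gk)) for pk, gk in zip(p, grad)]
    return p


# Illustrative setup: 0 = "good", 1 = "predictable", 2 = "horrible".
p_init = [0.9, 0.5, 0.1]          # lexicon-style guesses; 0.5 = unknown
cons_w = {(1, 2): 2.0}            # "predictable" co-occurs with "horrible"
incons_w = {(0, 1): 1.0}          # "good but ... predictable"
p_final = optimize(p_init, cons_w, incons_w)
print([round(x, 2) for x in p_final])
```

In this toy network, "predictable" starts at 0.5 and is pulled toward the negative side by both relations. Note that if every candidate starts at exactly 0.5, the gradient of this objective is zero, so plain gradient descent needs at least some informative initial values.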

The Example

Evaluation
• Datasets:
– 168,005 tweets about movies
– 258,655 tweets about persons
• Gold standard:
– 1,500 tweets labeled with sentiment expressions and overall polarities for
the movie targets
– 1,500 tweets labeled with sentiment expressions and overall polarities for
the person targets
• Baseline methods:
– MPQA, GI, SWN: For each extracted root word regarding the target, simply
look up its polarity in MPQA, General Inquirer and SentiWordNet,
respectively.
– PROP: a propagation approach proposed by Qiu et al. (2009)
– COM-const: Assign 0.5 to all the candidates as their initial P-Probabilities.
– COM-gelex: Initialize the candidates’ polarities according to the root word
set.
Reference: Qiu, G.; Liu, B.; Bu, J.; and Chen, C. 2009. Expanding domain sentiment lexicon through double propagation. In Proc. of IJCAI.

Application

Relevance of User Groups Based on Demographics and
Participation to Social Media Based Prediction:
A Case Study of 2012 U.S. Republican Presidential Primaries
Lu Chen, Wenbo Wang, and Amit P. Sheth
• Existing studies on predicting election results assume that all
users should be treated equally.
• How do different groups of users differ in predicting
election results?
1. Providing a detailed analysis of the social media users on different
dimensions
2. Estimating the “vote” of each user by analyzing his/her tweets, and
predicting the results based on “vote-counting”
3. Examining the predictive power of different user groups in predicting
the results of Super Tuesday races in 10 states
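A minimal sketch of the vote-counting idea in step 2. The slides do not describe how a user's vote is estimated, so the rule used here (a user votes for the candidate with the highest net sentiment across that user's tweets, and ties cast no vote) is an assumption, as are the function names and the toy data.

```python
from collections import Counter


def estimate_vote(mentions):
    """Guess one user's vote from (candidate, polarity) pairs.

    polarity is +1, 0, or -1 for each tweet mentioning the candidate.
    Hypothetical rule: highest net score wins; ties return None.
    """
    scores = Counter()
    for candidate, polarity in mentions:
        scores[candidate] += polarity
    if not scores:
        return None
    best, best_score = max(scores.items(), key=lambda kv: kv[1])
    if list(scores.values()).count(best_score) > 1:
        return None
    return best


def predict_winner(users):
    """Count one estimated vote per user and return the top candidate."""
    votes = Counter()
    for mentions in users.values():
        vote = estimate_vote(mentions)
        if vote is not None:
            votes[vote] += 1
    return votes.most_common(1)[0][0] if votes else None


# Toy data with placeholder candidate names.
users = {
    "u1": [("candidate_a", 1), ("candidate_b", -1)],
    "u2": [("candidate_b", 1)],
    "u3": [("candidate_a", 1), ("candidate_a", 1)],
}
print(predict_winner(users))
```

Restricting `users` to one group (for example, only right-leaning users) before calling `predict_winner` would give a per-group prediction that can be compared against the actual result in each state.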

User Categorization
• Engagement Degree
• Tweet Mode
• Content Type
• Political Preference
• Location

Electoral Prediction with Different User Groups
• Revealing the challenge of identifying the vote intent of the
“silent majority”
• Retweets may not necessarily reflect users' attitudes.

Electoral Prediction with Different User Groups
• Predicting a user's vote from more opinion tweets is not
necessarily more accurate than predicting it from more
information tweets.
• The right-leaning user group provides the most accurate
prediction result. It correctly predicts the winners in 8 out
of 10 states.
• To some extent, this demonstrates the importance of identifying
likely voters in electoral prediction.

Emotion
• Discovering Fine-grained Sentiment in Suicide Notes: Classify each
sentence from suicide notes into 15 emotional categories, e.g., love,
pride, guilt, blame, hopelessness, etc.
• Emotion Identification from Twitter Data: 7 emotion categories,
including joy, sadness, anger, love, fear, thankfulness, and surprise
– Can we automatically create a large emotion dataset with high quality
labels from Twitter? How?
– What features can effectively improve the performance of supervised
machine learning algorithms?
– How much performance will be gained by increasing the size of the
training data?
– Can the system developed on Twitter data be directly applied to identify
emotions from other datasets?

What’s next?
[Figure: next steps on two axes, static to dynamic and coarse-grained to fine-grained]
• Detecting the change of attitude during persuasive communication
• Discriminating other types of subjective information from sentiment,
e.g., wish, intent

Thank you!
