Are Twitter Users Equal in Predicting Elections? Insights from Republican Primaries and 2012 General Election

Are Twitter Users Equal in
Predicting Elections?
A Study of User Groups in Predicting 2012 U.S. Republican Presidential Primaries
(with additional insights into the 2012 General Election)

Lu Chen (chen@knoesis.org), Wenbo Wang (wenbo@knoesis.org), Amit Sheth (amit@knoesis.org)

Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis)
Wright State University, Dayton, OH, USA
Lu Chen, Wenbo Wang, Amit Sheth. Are Twitter Users Equal in Predicting Elections? A Study of User Groups in
Predicting 2012 U.S. Republican Presidential Primaries. The 4th International Conference on Social Informatics
(SocInfo2012), December 5-8, 2012, Lausanne, Switzerland.

There is a surge of interest in building systems that harness the
power of social data to predict election results.
• # of Facebook “likes” & Twitter “followers”
• Positive/negative opinions about each candidate
• # of Facebook users and Twitter users talking about each candidate; who is talking about which candidate: age, gender, state

• Tweets from @BarackObama and @MittRomney organized by engagement on Twitter
• Real time semantic analysis of topic, opinion, emotion, and popularity about each candidate


One problem seems to be ignored:
Are social media users equal
in predicting elections?
They may be from different countries and states.
They may have different political beliefs.
They may be of different ages.
They may engage in the elections in different ways
and with different levels of involvement.
……
They may be … different in predicting elections…?
WHOSE opinion really matters?


We studied different groups of
social media users who engaged in
the discussions of the 2012 U.S.
Republican Presidential Primaries,
and compared the predictive power
of these user groups.

Data: Using the Twitter Streaming API, we collected tweets that contain the words
“gingrich”, “romney”, “ronpaul”, or “santorum” from 01/10/2012 to 03/05/2012 (Super
Tuesday was 03/06/2012). The dataset comprises 6,008,062 tweets from 933,343 users.
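
As a rough illustration of the collection filter described above, the sketch below keeps a tweet when its text contains one of the four keywords and its date falls inside the collection window. The helper name `matches_collection` and the case-insensitive substring matching are assumptions for illustration; the actual collection relied on the Twitter Streaming API, which is not shown here.

```python
from datetime import date

# Keywords and collection window as stated in the data description.
KEYWORDS = ("gingrich", "romney", "ronpaul", "santorum")
START, END = date(2012, 1, 10), date(2012, 3, 5)


def matches_collection(text: str, created: date) -> bool:
    """Return True if the tweet mentions a tracked keyword inside the window.

    Hypothetical helper: case-insensitive substring matching is an assumption,
    not necessarily how the Streaming API matched terms.
    """
    lowered = text.lower()
    in_window = START <= created <= END
    return in_window and any(keyword in lowered for keyword in KEYWORDS)
```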


User Categorization
1. Engagement Degree
2. Tweet Mode
3. Content Type
4. Political Preference


• More than half of the users posted only one tweet. Only 8% of the
users posted more than 10 tweets.
• A small group of users (0.23%) can produce a large share of the tweets
(23.73%). Is tweet volume a reliable predictor?
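
The kind of statistics quoted above can be computed from a list of tweet authors, as in the minimal sketch below. The input format (one user id per tweet) and the function name `engagement_summary` are assumptions; the default threshold of 10 tweets mirrors the "more than 10 tweets" figure and is not a definition of the engagement groups.

```python
from collections import Counter


def engagement_summary(tweet_authors, heavy_threshold=10):
    """Summarize how unevenly tweets are spread across users.

    tweet_authors: iterable with one user id per tweet (hypothetical format).
    heavy_threshold: users with more tweets than this count as heavy posters.
    """
    per_user = Counter(tweet_authors)
    n_users = len(per_user)
    if n_users == 0:
        raise ValueError("no tweets given")
    n_tweets = sum(per_user.values())
    single = sum(1 for n in per_user.values() if n == 1)
    heavy = {user: n for user, n in per_user.items() if n > heavy_threshold}
    return {
        "share_single_tweet_users": single / n_users,
        "share_heavy_users": len(heavy) / n_users,
        "share_tweets_from_heavy_users": sum(heavy.values()) / n_tweets,
    }
```

Comparing the share of heavy users with the share of tweets they produce makes the concentration of tweet volume explicit.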


The use of hashtags and URLs reflects users' intent to draw
people's attention to the topic they discuss. The more engaged users
show this intent more strongly and are more involved in the election event.


Based on how users tend to generate their tweets (their tweet mode), we
classified the users as original tweet-dominant, original tweet-prone, balanced,
retweet-prone, or retweet-dominant.
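
A minimal sketch of such a five-way classification is given below. The cut-offs `dominant` and `prone` are hypothetical: the slides name the five groups but do not state the thresholds, so the values here only illustrate the idea of splitting users by their share of original tweets versus retweets.

```python
def tweet_mode(n_original: int, n_retweets: int,
               dominant: float = 0.8, prone: float = 0.6) -> str:
    """Label a user by the share of original tweets versus retweets.

    The thresholds are assumptions for illustration, not the study's values.
    """
    total = n_original + n_retweets
    if total == 0:
        raise ValueError("user has no tweets")
    original_share = n_original / total
    retweet_share = n_retweets / total
    if original_share >= dominant:
        return "original tweet-dominant"
    if original_share >= prone:
        return "original tweet-prone"
    if retweet_share >= dominant:
        return "retweet-dominant"
    if retweet_share >= prone:
        return "retweet-prone"
    return "balanced"
```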

[Figure: tweet mode by engagement degree]

• The original tweet-dominant group accounts for the biggest
proportion of users in every user engagement group.
• A significant number of users (34.71% of all the users) belong to the
retweet-dominant group, whose voting intent might be more difficult
to detect.


We use target-specific sentiment analysis techniques to classify whether the
opinion a tweet expresses about a specific candidate is positive or negative.
The users are then categorized based on whether they post more
information or more opinion.
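
The categorization by content type can be sketched as follows, assuming each tweet has already been labeled as opinion (positive or negative) or information (no opinion detected). The group names and the `margin` threshold are hypothetical; the slides do not state how the boundary between groups was drawn.

```python
def content_type(n_opinion: int, n_information: int, margin: float = 0.2) -> str:
    """Label a user by whether opinion or information tweets prevail.

    margin: hypothetical tolerance; within it the user counts as balanced.
    """
    total = n_opinion + n_information
    if total == 0:
        raise ValueError("user has no tweets")
    difference = (n_opinion - n_information) / total
    if difference > margin:
        return "opinion-prone"
    if difference < -margin:
        return "information-prone"
    return "balanced"
```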

[Figure: content type by engagement degree]

• More engaged users tend to post a mixture of content, with similar
proportions of opinion and information, or a larger proportion of
information.


We collected a set of Twitter users with known political preference from Twellow
(http://www.twellow.com/categories/politics). Based on the assumption that a user tends
to follow others who share his or her political preference, we identified the
left-leaning and right-leaning users from their following/follower relations. We
tested this method on a dataset of 3,341 users, and it achieved an accuracy of 0.9243.
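
One simple way to act on this assumption is a majority vote over the seed accounts a user follows, sketched below. This is an illustrative stand-in, not necessarily the method used in the study; the function names and the tie handling (returning `None`) are assumptions.

```python
def infer_leaning(following, left_seeds, right_seeds):
    """Guess a user's leaning from the known-preference accounts they follow.

    following: ids the user follows; left_seeds / right_seeds: ids of users
    with known preference. Returns "left", "right", or None on a tie.
    """
    followed = set(following)
    n_left = len(followed & set(left_seeds))
    n_right = len(followed & set(right_seeds))
    if n_left > n_right:
        return "left"
    if n_right > n_left:
        return "right"
    return None


def accuracy(predicted, actual):
    """Fraction of users whose predicted leaning equals the known one."""
    if not actual:
        raise ValueError("no labeled users given")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```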


• Right-leaning users were (as expected) more involved in the Republican
primaries in several ways: more users, more tweets, more original
tweets, higher usage of hashtags and URLs.

• We utilized the background knowledge from LinkedGeoData to identify the
states from user location information.
• If the user's state could not be inferred from his/her location in the profile, we
utilized the geographic locations of his/her tweets. A user was recognized as from
a state if his/her tweets were from that state.


Pearson's r between the number of users and the population is 0.9459;
between the number of tweets and the population it is 0.9667 (both p<.0001).
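
For reference, Pearson's r can be computed from two equally long lists (for example, per-state user counts and per-state populations) as in the sketch below. The function name `pearson_r` is an assumption, and the significance test (the p-value) is not covered.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / (spread_x * spread_y)
```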


Predicting a User's Vote
• Basic idea: for which candidate the user shows the most support
  – Frequent mentions: more mentions, higher score
  – Positive sentiment: more positive/less negative opinions, higher score
• The score for a candidate c distinguishes two cases:
  – The user posted opinion about c
  – The user mentioned c but did not post opinion about c
Nm(c): the number of tweets mentioning the candidate c
Npos(c): the number of positive tweets about candidate c
Nneg(c): the number of negative tweets about candidate c
Smoothing parameter (between 0 and 1)
Discounting parameter (between 0 and 1): discounts the score when the user does not
express any opinion towards c.
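
The scoring formula itself did not survive in this transcript, so the sketch below is only one plausible way to combine the quantities listed above; it should not be read as the study's actual formula. The names `gamma` (smoothing) and `beta` (discount), their default values, and the product of mention share and smoothed positivity are all assumptions.

```python
def support_score(n_mention, n_pos, n_neg, total_mentions, gamma=0.5, beta=0.5):
    """Hypothetical support score of one user for candidate c.

    More mentions and more positive (fewer negative) opinions raise the score.
    gamma smooths the positivity ratio; beta discounts the score when the user
    mentioned c without expressing any opinion. Both are assumed parameters.
    """
    if total_mentions == 0 or n_mention == 0:
        return 0.0
    mention_share = n_mention / total_mentions
    if n_pos + n_neg == 0:
        return beta * mention_share
    positivity = (n_pos + gamma) / (n_pos + n_neg + 2 * gamma)
    return mention_share * positivity


def predict_vote(counts, gamma=0.5, beta=0.5):
    """counts: {candidate: (n_mention, n_pos, n_neg)} for a single user.

    Returns the candidate with the highest hypothetical support score.
    """
    total = sum(mentions for mentions, _, _ in counts.values())
    scores = {
        candidate: support_score(m, p, n, total, gamma, beta)
        for candidate, (m, p, n) in counts.items()
    }
    return max(scores, key=scores.get)
```

Under these assumptions, a user with mostly positive tweets about one candidate and mostly negative tweets about another is assigned to the first, even when both are mentioned equally often.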

Prediction Results
We examine the predictive power of different user groups in predicting the
results of Super Tuesday races in 10 states.

To predict the election results in a state, we used only the collection of
users who are identified from that state.

We examined four time windows -- 7 days, 14 days, 28 days and 56 days
prior to the primary day. In a specific time window, a user's vote was
assessed using only the set of tweets he/she created during this time.
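
Restricting a user's tweets to a time window can be sketched as below. The tweet representation (a `(created, payload)` tuple) and the choice of a half-open window that excludes the primary day itself are assumptions.

```python
from datetime import date, timedelta

# The four window lengths examined, in days before the primary day.
WINDOWS_DAYS = (7, 14, 28, 56)


def tweets_in_window(tweets, primary_day: date, days: int):
    """Keep tweets created within `days` days before the primary day.

    tweets: iterable of (created, payload) tuples, with `created` a date.
    The primary day itself is excluded (an assumption of this sketch).
    """
    start = primary_day - timedelta(days=days)
    return [tweet for tweet in tweets if start <= tweet[0] < primary_day]
```

With Super Tuesday on 03/06/2012, the 56-day window starts on 01/10/2012, which coincides with the start of the collection period.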

The results were evaluated in two ways: (1) the accuracy of predicting
winners, and (2) the error rate between the predicted percentage of votes
and the actual percentage of votes for each candidate.
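
The two evaluation measures can be sketched as follows. The slides do not spell out how the error rate is computed, so the mean absolute error over candidates used here is an assumption, as are the function names and the dictionary-based input format.

```python
def winner_accuracy(predicted, actual):
    """Fraction of states whose predicted winner matches the actual winner.

    predicted / actual: {state: {candidate: vote share}}.
    """
    if not actual:
        raise ValueError("no states given")
    hits = sum(
        max(predicted[state], key=predicted[state].get)
        == max(actual[state], key=actual[state].get)
        for state in actual
    )
    return hits / len(actual)


def mean_absolute_error(predicted_shares, actual_shares):
    """Average absolute gap between predicted and actual shares in one state.

    Candidates missing from the prediction are treated as a share of 0.
    """
    if not actual_shares:
        raise ValueError("no candidates given")
    gaps = [
        abs(predicted_shares.get(candidate, 0.0) - share)
        for candidate, share in actual_shares.items()
    ]
    return sum(gaps) / len(gaps)
```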



The prediction accuracy:
• Engagement Degree: High > Low or Very Low
• Tweet Mode: Original Tweet-Prone > Retweet-Prone
• Content Type: A draw
• Political Preference: Right-Leaning >> Left-Leaning


• Revealing the challenge of identifying the vote intent of the “silent
majority”.
• Retweets may not necessarily reflect users' attitude.
• Prediction of a user's vote based on more opinion tweets is not
necessarily more accurate than prediction based on more information
tweets.
• The right-leaning user group provides the most accurate prediction result. In
the best case (56-day time window), it correctly predicts the winners in 8 out
of 10 states with an average prediction error of 0.1. To some extent, this
demonstrates the importance of identifying likely voters in electoral prediction.


Our findings
Twitter users are not “equal”
in predicting elections!
The likely voters’ opinions matter more.

Some users’ opinions are more difficult to identify because
of their lower levels of engagement
or the implicit ways to express opinions.


More work needs to be
done…

• Identifying likely/actual voters

• Improving sentiment analysis
techniques

• Investigating possible data biases
(e.g., spam tweets and political
campaign tweets) and how they
might affect the results

and more …


It is actually about tracking public opinion.

Polling or Social Media Analysis?
1. Sample size
2. Representative of the target population
3. Accurate measure of opinions
4. Timeliness

1 Sample Size

Polling: Thousands of people
Social Media Analysis: Millions of people


2 Representative of the Target Population

Polling:
• About 95% of US homes can be reached by landline telephone and
cell phone.
• Sampling the target population randomly.
• Weighting the sample to census estimates for demographic
characteristics (gender, race, age, educational attainment, and
region).

Social Media Analysis:
• About 60% of American adults use social networking sites.
• Difficult to do random sampling.
• Limited demographic data (although with some work, can be
improved).

[1] Can Social Media Be Used for Political Polling? http://www.radian6.com/blog/2012/07/can-social-media-be-used-for-political-polling/


3 Accurate Measure of Opinions

Polling:
• Ask people what they think: “Who will you vote for?”

Social Media Analysis:
• Look at what people talk about and extract their opinions
• Not as accurate as polling

4 Timeliness

Polling: Not able to track people’s opinion in real time
Social Media Analysis: What is happening now


Social Media Analysis – Promising but Very
Challenging

Promising:
• Increasing number of social media users
• Convenient and comfortable way to express opinions
• The analysis can be done in real time
• Lower cost
A great complement (if not substitute) for polling

Challenging:
• Extracting demographic information
• Identifying the target population whose opinions matter, e.g. the
likely voters in electoral prediction
• Discriminating personal opinion from the voice of mainstream
media and political campaigns
• More accurate sentiment analysis/opinion mining,
especially the identification of opinions about a specific object

Our Twitris+ system tracked
people’s opinions on the 2012 U.S.
Presidential Election in real time, and this
is what we saw on Election Day …


The screenshots of Twitris+ were taken on Nov. 6th 6 PM EST


Twitris+: http://twitris.knoesis.org/
[Screenshot annotations: select event; select date; multi-faceted analysis; N-gram summaries; related tweets; reference news; Wikipedia articles]


• A key innovation in sentiment analysis, employed in Twitris+, is topic-specific sentiment
analysis -- to associate sentiment with an entity. The same sentiment phrases may be
assigned different polarities associated with different entities.
• Twitris+ tracks the sentiment trend about different entities, and identifies topics/events that
contribute to sentiment changes. The result is updated every hour.

[Screenshot annotations:]
• Sentiment change about Barack Obama
• Sentiment change about Mitt Romney
• Positive/negative topics that contribute to such change
• Individual tweets related to chosen topic
• Analysis can be performed at location (e.g., by state) or issue-based level (e.g.,
economy, tax, social issues – women, …)

Twitris+ Insights in 2012 Presidential Debates

How was Obama doing in the first debate?


How was Obama doing in the second debate?

Red Color: Negative Topics
Green Color: Positive Topics


Obama vs. Romney in the third debate

Obama

Romney

You can find a lot more,
e.g., analysis from network,
demographic, emotion, temporal, …
perspectives, at
http://twitris.knoesis.org

Thank you !
More about this study:
http://wiki.knoesis.org/index.php/ElectionPrediction
Kno.e.sis Center:
http://knoesis.wright.edu/
Twitris+:
http://twitris.knoesis.org/
Semantics driven Analysis of Social Media:
http://knoesis.org/research/semweb/projects/socialmedia

