A warning against converting Twitter into the next 'Literary Digest' or NO YOU CANNOT predict elections with Twitter


Slide deck for an invited lecture I delivered at Yahoo! Research Barcelona on July 21, 2011.

It can be a nice companion for these two papers:

Don't turn social media into another 'Literary Digest' Poll (http://bit.ly/oVJAtP)

Limits of Electoral Predictions Using Twitter (http://bit.ly/nFNzGi)


Usage Rights

© All Rights Reserved

Presentation Transcript

    • Daniel Gayo-Avello @PFCdgayo
    • it’s the year 2006…
    • Web searches in Spain
    • Web searches in Spain (election day marked)
    • Multiple issues: Searching does not imply support. Not every user is a voter. …
    • Besides… the world is a very strange place.
    • [Two query logs compared]
      30M records / 650,000 users / 3 months (March–May 2006) / Publicly available :) / Terrible “anonymization” :(
      15M records / ? users / 1 month (May 2006) / Available under agreement :( / Reasonably anonymized :(
    • bijnwwjwff  calvary baptist virginia
      bijnwwjwff  calvary baptist virginia
      bijnwwjwff  bunny mans bridge
      bijnwwjwff  calvary baptist virginia
      bijnwwjwff  calvary baptist virginia
      bijnwwjwff  sexxybekah
      bijnwwjwff  bekahs homepage
      bijnwwjwff  bekahluvsya
      bijnwwjwff  bekaluvsya
      bijnwwjwff  "bekahluvs"
    • “If you had to dream of research content, it would be sending out a diary and having people record their thoughts at the moment. That's like a social scientist's wet dream, right? And here it has kind of fallen on our lap, these ephemeral recordings that we would not have otherwise gotten.” Alex Halavais (on Harvard's Facebook dataset)
    • [Chart: Twitter/Facebook research. Number of papers mentioning twitter or facebook in title or abstract, 2000–2010. Source: search of the ACM Digital Library]
    • “[…] the mere number of tweets mentioning a political party can be considered a plausible reflection of the vote share and its predictive power even comes close to traditional election polls.”
    • “The job approval poll is the most straightforward [...] The sentiment ratio also generally declines during this period, with r = 72.5% for k = 15. [...] in 2008 the sentiment ratio does not substantially correlate to the election polls (r = -8%) [...] We might expect the sentiment for mccain to vary inversely with obama, but they in fact slightly correlate.”
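The r values in the quote above are Pearson correlations between a tweet-sentiment ratio and a poll time series. As a minimal sketch of what is being computed (the two toy series below are invented for illustration, not data from the paper):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented toy series: a daily tweet-sentiment ratio and a poll series.
sentiment = [0.55, 0.52, 0.50, 0.47, 0.45]
polls     = [51.0, 50.0, 49.5, 48.0, 47.5]
r = pearson_r(sentiment, polls)  # both decline together, so r is close to 1
```

A high r on one period is exactly the kind of signal the quote warns can evaporate on another period (r = -8% in 2008).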
    • [...] the performance of their (Tumasjan et al.) indicator varies over time as well as it critically hinges upon which subset of the German party system is covered. The number of party mentions in the Twittersphere is thus not a valid indicator of offline political sentiment or even of future election outcomes.
    • Winner predicted in only half of the races. Sentiment analysis slightly better than random classifier (36.9%). Sentiment analysis weakly correlates with users' political preference.
    • As the first reviewer says, unless a negative results paper is methodologically impeccable, it is hard for its conclusions to be believed. There is certainly a lot of misplaced optimism in how much signal is available. [...] The main difficulty with this type of counter-argument paper is that it is hard to make it immune to attacks of the form: "you would have done better if you did a different kind of analysis." (if only you had looked at other time periods, a broader set of tweets, applied this other sentiment analysis technique, etc.) It's not clear to me what can be done to relieve this, but by concentrating on the failure of a specific set of techniques it is not obvious how the reader can take this as evidence of failure of the idea in general.
    • NO COMMENT Picture by J e n s (away)
    • “Those who cannot remember the past are condemned to repeat it.” George Santayana
    • it's the year 1936...
    • [...] the 1936 Literary Digest poll failed [...] not simply because of its initial sample, but also because of a low response rate combined with a nonresponse bias. Failure to properly handle participation problems can damage the results produced by any poll.
    • No streaming API, just search API. obama OR biden / mccain OR palin. Rate-limited, “shrinking” sliding window for searches. Tweets collected for each county in each swing state plus California and Texas. 240k tweets / 21.2k users (June 1 to Nov 11, 2008)
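The setup on this slide (search API only, rate-limited, overlapping sliding windows) implies repeatedly polling the same queries and deduplicating by tweet id when windows overlap. A minimal sketch of that bookkeeping; the query strings come from the slide, while the function name and the in-memory records are illustrative assumptions, not the Twitter API:

```python
# Merge repeated, overlapping search-API polls into one deduplicated
# collection, keyed by tweet id. Records are toy stand-ins for API results.

QUERIES = ["obama OR biden", "mccain OR palin"]  # from the slide

def merge_poll(store, results):
    """Add one poll's results to the store, deduplicating by tweet id.

    Returns how many tweets were actually new."""
    added = 0
    for tweet in results:
        if tweet["id"] not in store:
            store[tweet["id"]] = tweet
            added += 1
    return added

# Two overlapping polls of the same sliding window:
store = {}
poll1 = [{"id": 1, "text": "obama rally"}, {"id": 2, "text": "go mccain"}]
poll2 = [{"id": 2, "text": "go mccain"}, {"id": 3, "text": "palin speech"}]
merge_poll(store, poll1)
merge_poll(store, poll2)  # tweet 2 is seen again but stored only once
```

The "shrinking" window on the slide means each poll may return fewer old tweets than the last; deduplication by id is what keeps the 240k-tweet collection consistent across polls.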
    • tweets for each ticket
    • very promising :)
    • Obama wins Picture by J e n s (away)
    • Obama wins everywhere Picture by J e n s (away)
    • [...] Big Data presents new opportunities for understanding social practice. Of course the next statement must begin with a “but.” And that “but” is simple: Just because you see traces of data doesn't mean you always know the intention or cultural logic behind them. And just because you have a big N doesn't mean that it's representative or generalizable.
    • Confronted with a “negative” result, therefore, a scientist might be tempted to either not spend time publishing it (what is often called the “file-drawer effect” […]) or to turn it somehow into a positive result.
    • Counties with tweets in the dataset
    • Popular vote by county (U.S. election 2008)
    • Counties with tweets in the dataset
    • 4 different methods to infer voting intention
      Mention counts (Tumasjan et al.) – Not sentiment analysis
      Count of polarized words – Based on lexicon by Wilson et al.
      Vote & Flip (Choi & Cardie) – based on # pos, neg, neutral & negations
      Semantic orientation (Turney) – PMI between bigrams and keyphrases
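A toy sketch of the first two methods listed above, raw mention counts and polarized-word counts. The three-word lexicon stands in for the Wilson et al. subjectivity lexicon and, like the sample tweets, is invented for illustration:

```python
# Toy illustration of two vote-inference methods from the slide.
# The tiny lexicon is a stand-in for Wilson et al.'s subjectivity lexicon.
LEXICON = {"great": +1, "love": +1, "terrible": -1}

def mention_counts(tweets, candidates):
    """Method 1 (Tumasjan et al. style): tweets mentioning each candidate."""
    return {c: sum(c in t.lower() for t in tweets) for c in candidates}

def polarity_score(tweets, candidate):
    """Method 2: net polarized-word score over tweets mentioning the candidate."""
    return sum(LEXICON.get(w, 0)
               for t in tweets if candidate in t.lower()
               for w in t.lower().split())

tweets = ["I love Obama", "McCain was terrible tonight", "Obama Obama"]
counts = mention_counts(tweets, ["obama", "mccain"])  # obama: 2, mccain: 1
score = polarity_score(tweets, "obama")               # net +1
```

Note how little the two methods agree on what a "vote" is: method 1 would read the sarcasm-free mention "McCain was terrible tonight" as support for McCain, which is exactly the noise the talk is about.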
    • Semantic orientation: Pointwise Mutual Information between bigrams and selected keyphrases (“I will vote for”, “I'm not voting for”, “I'd vote” ...)
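Turney-style semantic orientation scores a bigram by its pointwise mutual information with seed keyphrases, PMI(x, y) = log2(p(x, y) / (p(x) p(y))). A minimal sketch over invented co-occurrence counts (none of the numbers below come from the actual dataset):

```python
import math

def pmi(p_joint, p_x, p_y):
    """Pointwise mutual information from a joint probability and two marginals."""
    return math.log2(p_joint / (p_x * p_y))

# Invented corpus statistics: how often a bigram co-occurs (same tweet)
# with a pro-voting keyphrase such as "I will vote for".
N = 1000          # tweets in the toy corpus
n_bigram = 50     # tweets containing the bigram
n_phrase = 40     # tweets containing the keyphrase
n_both = 10       # tweets containing both

score = pmi(n_both / N, n_bigram / N, n_phrase / N)
# positive PMI: the bigram co-occurs with the keyphrase more than chance
```

The orientation of a bigram is then (PMI with positive phrases) minus (PMI with negative phrases such as "I'm not voting for"), so it inherits all the sparsity problems of estimating these co-occurrence probabilities from short tweets.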
    • evaluation?
    • evaluation
    • evaluation #fail: Random classifier (TwitVote) 86.6% 13.4% 76.8%
    • anyway... evaluation
    • any explanation for these results? Picture by J e n s (away)
    • differences between urban & rural users/voters? Pictures by Stuck in Customs and chris runoff
    • the collected sample seems to over-represent urban voters who were more prone to vote for Obama in 2008.
    • it was 13.10%
    • although Twitter data overestimate the opinion of younger users, it is possible to correct that, provided the actual age distribution was known.
    • “Shy-Republican” factor? Picture by Loren Javier
    • little can be said :( Picture by Loren Javier
    • Counties with tweets in the dataset
    • “Predicting” elections with Twitter
      You have missed: the opinions of those not using Twitter; the opinions of those not publicly tweeting; the opinions of those publicly tweeting but not discussing politics.
      You have taken into account: the opinions of those discussing politics on Twitter but who are not voting.
      Besides, you have inferred votes using noisy and not-that-accurate methods.
    • If we were able to accurately predict elections from social media, then there would be interest in tampering with the data, hence making predictions impossible.