A warning against converting Twitter into the next 'Literary Digest' or NO YOU CANNOT predict elections with Twitter

Slide deck for an invited lecture I delivered at Yahoo! Research Barcelona on July 21, 2011.

It can be a nice companion for these two papers:

Don't turn social media into another 'Literary Digest' Poll (http://bit.ly/oVJAtP)

Limits of Electoral Predictions Using Twitter (http://bit.ly/nFNzGi)

Published in: Technology, News & Politics

  1. Daniel Gayo-Avello @PFCdgayo
  2. it’s the year 2006…
  3. it’s the year 2006…
  4. it’s the year 2006…
  5. Web searches in Spain
  6. Web searches in Spain (chart label: election day)
  7. Multiple issues: Searching does not imply support. Not every user is a voter. … (chart label: election day)
  8. Besides… the world is a very strange place. (chart label: election day)
  9. Besides… the world is a very strange place. (chart label: election day)
  10. Left column: 30M records; 650,000 users; 3 months (March-May 2006); publicly available :); terrible “anonymization” :(. Right column: 15M records; ? users; 1 month (May 2006); available under agreement :(; reasonably anonymized :(
  11. bijnwwjwff calvary baptist virginia; bijnwwjwff calvary baptist virginia; bijnwwjwff bunny mans bridge; bijnwwjwff calvary baptist virginia; bijnwwjwff calvary baptist virginia; bijnwwjwff sexxybekah; bijnwwjwff bekahs homepage; bijnwwjwff bekahluvsya; bijnwwjwff bekaluvsya; bijnwwjwff "bekahluvs"
  12. “If you had to dream of research content, it would be sending out a diary and having people record their thoughts at the moment. That’s like a social scientist’s wet dream, right? And here it has kind of fallen on our lap, these ephemeral recordings that we would not have otherwise gotten.” Alex Halavais (on Harvard’s Facebook dataset)
  13. Twitter/Facebook research [chart: number of papers per year, 2000 to 2010, with the terms in the title or in the abstract]. Source: search of ACM Digital Library papers mentioning twitter or facebook
  14. […] the mere number of tweets mentioning a political party can be considered a plausible reflection of the vote share and its predictive power even comes close to traditional election polls.
  15. […] the mere number of tweets mentioning a political party can be considered a plausible reflection of the vote share and its predictive power even comes close to traditional election polls.
  16. The job approval poll is the most straightforward [...] The sentiment ratio also generally declines during this period, with r = 72.5% for k = 15. [...] in 2008 the sentiment ratio does not substantially correlate to the election polls (r = -8%) [...] We might expect the sentiment for mccain to vary inversely with obama, but they in fact slightly correlate.
  17. The job approval poll is the most straightforward [...] The sentiment ratio also generally declines during this period, with r = 72.5% for k = 15. [...] in 2008 the sentiment ratio does not substantially correlate to the election polls (r = -8%) [...] We might expect the sentiment for mccain to vary inversely with obama, but they in fact slightly correlate.
  18. [...] the performance of their (Tumasjan et al.) indicator varies over time as well as it critically hinges upon which subset of the German party system is covered. The number of party mentions in the Twittersphere is thus not a valid indicator of offline political sentiment or even of future election outcomes.
  19. Winner predicted in only half of the races. Sentiment analysis slightly better than random classifier (36.9%). Sentiment analysis weakly correlates with users’ political preference.
  20. As the first reviewer says, unless a negative results paper is methodologically impeccable, it is hard for its conclusions to be believed. There is certainly a lot of misplaced optimism in how much signal is available. [...] The main difficulty with this type of counter-argument paper is that it is hard to make it immune to attacks of the form: "you would have done better if you did a different kind of analysis." (if only you had looked at these other time periods, a broader set of tweets, applied this other sentiment analysis technique, etc.) It’s not clear to me what can be done to relieve this, but by concentrating on the failure of a specific set of techniques it is not obvious how the reader can take this as evidence of failure of the idea in general.
  21. NO COMMENT. Picture by J e n s (away)
  22. “Those who cannot remember the past are condemned to repeat it.” George Santayana
  23. it’s the year 1936…
  24. it’s the year 1936…
  25. it’s the year 1936…
  26. it’s the year 1936…
  27. [...] the 1936 Literary Digest poll failed [...] not simply because of its initial sample, but also because of a low response rate combined with a nonresponse bias. Failure to properly handle participation problems can damage the results produced by any poll.
  28. No streaming API, just search API. Queries: obama OR biden / mccain OR palin. Rate-limited, “shrinking” sliding window for searches. Tweets collected for each county in each swing state plus California and Texas. 240k tweets / 21.2k users (June 1 to Nov 11, 2008)
  29. tweets for each ticket
  30. very promising :)
  31. Obama wins. Picture by J e n s (away)
  32. Obama wins everywhere. Picture by J e n s (away)
  33. [...] Big Data presents new opportunities for understanding social practice. Of course the next statement must begin with a “but.” And that “but” is simple: Just because you see traces of data doesn’t mean you always know the intention or cultural logic behind them. And just because you have a big N doesn’t mean that it’s representative or generalizable.
  34. Confronted with a “negative” result, therefore, a scientist might be tempted to either not spend time publishing it (what is often called the “file-drawer effect” […]) or to turn it somehow into a positive result.
  35. Counties with tweets in the dataset
  36. Counties with tweets in the dataset
  37. Popular vote by county (U.S. election 2008)
  38. Counties with tweets in the dataset
  39. 4 different methods to infer voting intention
  40. 4 different methods to infer voting intention. Mention counts (Tumasjan et al.): not sentiment analysis.
  41. 4 different methods to infer voting intention. Mention counts (Tumasjan et al.): not sentiment analysis. Count of polarized words: based on lexicon by Wilson et al.
  42. 4 different methods to infer voting intention. Mention counts (Tumasjan et al.): not sentiment analysis. Count of polarized words: based on lexicon by Wilson et al. Vote & Flip (Choi & Clardie): based on # pos, neg, neutral & negations.
  43. 4 different methods to infer voting intention. Mention counts (Tumasjan et al.): not sentiment analysis. Count of polarized words: based on lexicon by Wilson et al. Vote & Flip (Choi & Clardie): based on # pos, neg, neutral & negations. Semantic orientation (Turney): PMI between bigrams and keyphrases.
  44. semantic orientation: Pointwise Mutual Information between bigrams and selected keyphrases: I will vote for; I’m not voting for; I’d vote ...
  45. semantic orientation: Pointwise Mutual Information between bigrams and selected keyphrases: I will vote for; I’m not voting for; I’d vote ...
  46. evaluation?
  47. evaluation
  48. evaluation #fail. Random classifier (TwitVote): 86.6%, 13.4%, 76.8%
  49. anyway... evaluation
  50. evaluation
  51. evaluation
  52. any explanation for these results? Picture by J e n s (away)
  53. differences between urban & rural users/voters? Pictures by Stuck in Customs and chris runoff
  54. the collected sample seems to over-represent urban voters who were more prone to vote for Obama in 2008.
  55. it was 13.10%
  56. although Twitter data overestimate the opinion of younger users, it is possible to correct that, provided the actual age distribution was known.
  57. “Shy-Republican” factor? Picture by Loren Javier
  58. little can be said :( Picture by Loren Javier
  59. Counties with tweets in the dataset
  60. Counties with tweets in the dataset
  61. “Predicting” elections with Twitter. You have missed: the opinions of those not using Twitter; the opinions of those not publicly tweeting; the opinions of those publicly tweeting but not discussing politics. You have taken into account: the opinions of those discussing politics on Twitter but who are not voting. Besides, you have inferred votes using noisy and not-that-accurate methods.
  62. If we were able to accurately predict elections from social media then there would be interest in tampering with the data, hence making predictions impossible.
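
Slides 43 to 45 name semantic orientation (Turney) as the fourth method: Pointwise Mutual Information between bigrams and selected keyphrases. The deck does not show the computation, so the following is only a minimal sketch of a Turney-style score. It assumes tweet-level co-occurrence counts, a small smoothing constant, and toy tweets and keyphrase lists, all of which are illustrative choices rather than the author's setup.

```python
import math


def bigrams(text):
    """Return the set of word bigrams in a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return {" ".join(pair) for pair in zip(tokens, tokens[1:])}


def semantic_orientation(bigram, tweets, pos_phrases, neg_phrases, smoothing=0.01):
    """Turney-style score: PMI(bigram, positive keyphrases) minus
    PMI(bigram, negative keyphrases), using tweet-level co-occurrence.

    The smoothing constant avoids division by zero and log(0); its value
    is an assumption made for this sketch.
    """
    pos = neg = both_pos = both_neg = 0
    for tweet in tweets:
        text = tweet.lower()
        is_pos = any(phrase in text for phrase in pos_phrases)
        is_neg = any(phrase in text for phrase in neg_phrases)
        has_bigram = bigram in bigrams(text)
        pos += is_pos
        neg += is_neg
        both_pos += is_pos and has_bigram
        both_neg += is_neg and has_bigram
    return math.log2(
        ((both_pos + smoothing) * (neg + smoothing))
        / ((both_neg + smoothing) * (pos + smoothing))
    )


# Toy data, invented for illustration only.
toy_tweets = [
    "i will vote for obama great speech",
    "great speech tonight i will vote for him",
    "i'm not voting for mccain bad debate",
    "bad debate again i'm not voting for him",
]
positive = ["i will vote for"]
negative = ["i'm not voting for"]

so_great = semantic_orientation("great speech", toy_tweets, positive, negative)
so_bad = semantic_orientation("bad debate", toy_tweets, positive, negative)
```

A positive score means the bigram appears more often alongside the supportive keyphrases, a negative score the opposite. Turning such scores into a vote inferred for each user still requires aggregation steps that the slides do not spell out.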

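
Slide 56 notes that the overestimated weight of younger users could be corrected if the actual age distribution were known. One common way to make such a correction is post-stratification: estimate the share within each age group, then weight each group by its share of the electorate. The sketch below assumes that approach; the group labels and every number are hypothetical and do not come from the talk.

```python
def raw_share(sample_counts):
    """Unweighted share for a candidate across all sampled users."""
    supporters = sum(n_for for n_for, _ in sample_counts.values())
    total = sum(n_total for _, n_total in sample_counts.values())
    return supporters / total


def poststratified_share(sample_counts, population_shares):
    """Weight each group's in-sample share by that group's population share.

    sample_counts: {group: (users inferred to support the candidate, users sampled)}
    population_shares: {group: fraction of the electorate}, summing to 1.
    """
    estimate = 0.0
    for group, (n_for, n_total) in sample_counts.items():
        if n_total == 0:
            raise ValueError(f"no sampled users in group {group!r}")
        estimate += population_shares[group] * (n_for / n_total)
    return estimate


# Hypothetical numbers: younger users are over-represented in the sample.
sample = {"18-29": (80, 100), "30+": (20, 50)}
electorate = {"18-29": 0.2, "30+": 0.8}

unweighted = raw_share(sample)                       # 100 of 150 users
weighted = poststratified_share(sample, electorate)  # 0.2 * 0.8 + 0.8 * 0.4
```

With these invented figures the weighted estimate falls well below the unweighted one, which is the direction of correction the slide suggests. The correction only addresses age; it does nothing for the other gaps listed on slide 61.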