The Potential and Perils of
Election Prediction Using
Social Media Sources
Federico Nanni and Josh Cowls
University of
Mannheim/Comparative Media
Studies, MIT
Reasons to be cheerful
+ Social media data is (often) cheap
+ Phone response rates are in decline
+ More granularity available?
[Figure: cost and utility of the traditional inferential model versus the social media model]
Reasons to be doubtful
- Myriad reliability issues...
– Difficult to establish the meaning
of latent messages
– Platform specific behaviours (e.g.
hashtags, likes) are not always
understood
– Political discourse often laced
with e.g. sarcasm
- The ethics of collecting and
using social media data
Results to date have been mixed...
• A meta-analysis found little evidence that
using Twitter to predict elections is better
than chance in the aggregate (Gayo-Avello,
2013)
• Nonetheless, social media can provide an
‘early warning system’ for a candidate’s
momentum (Jensen and Anstead, 2013)
• Big problem: what’s in a name?
Our approach:
intention over attention
• Most models count references to candidates’
or parties’ names – measuring attention
• Other models use sentiment analysis,
seeking to ascertain emotional responses to
candidates
• We built an intention model, collecting
instances of vote declarations for specific
candidates
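The difference between the two measures can be sketched in a few lines of Python. This is only an illustration: the labels below are invented toy data, not figures from the study, and the `shares` helper is a hypothetical name.

```python
from collections import Counter


def shares(labels):
    """Return each candidate's percentage share of the given labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}


# Invented toy data, for illustration only.
# Attention: one label per tweet that names a candidate at all.
mentions = ["farron", "lamb", "lamb", "farron", "lamb"]
# Intention: one label per tweet that explicitly declares a vote.
declarations = ["farron", "farron", "lamb"]

attention = shares(mentions)
intention = shares(declarations)

print("attention:", attention)
print("intention:", intention)
```

In this toy example the candidate with the most mentions is not the one with the most declared votes, which is the gap an intention model is meant to capture.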
Case study
• Context: Labour and the Lib Dems
required new leaders in 2015 (after a
polling fail!)
• Leadership elections conducted in summer
2015
– Lib Dems: two candidates (Tim Farron,
Norman Lamb)
– Labour: four candidates (Jeremy Corbyn,
Andy Burnham, Yvette Cooper, Liz Kendall)
Advantages of our case
• Primary candidates’ names easier to isolate
than ambiguous party names (“Labour”,
“Liberal”)
• Party elections are a minority sport – better
signal to noise ratio?
• Start and end dates clear; postal vote system
ensured a longer decision-making period
Method
• Wrote Python scripts to collect tweets which:
– Mentioned the name of a candidate
– Included a specific declaration to vote (“I’ll vote for...”,
“I’m voting for” etc.)
• Cleaned data
– Removed non-declarations (“I’m not voting for...”)
– Ascertained preferred candidate in ambiguous cases
• Final dataset: 1,361 valid declarations for Lib Dem
race and 17,617 for Labour
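The filtering and cleaning steps might look roughly like the sketch below. It is not the authors' script: the declaration and negation patterns are assumptions extrapolated from the two example phrasings on the slide, the `classify` function is a hypothetical name, and tweets naming more than one candidate are simply set aside, since the slides do not say how ambiguous cases were resolved.

```python
import re

# Surnames from the Labour case study.
CANDIDATES = ("corbyn", "burnham", "cooper", "kendall")

# Assumed declaration phrasings (the slide lists two examples and "etc").
DECLARATION = re.compile(
    r"\bi(?:['’]ll| will) vote for\b|\bi(?:['’]m| am) voting for\b",
    re.IGNORECASE,
)
# Assumed negation phrasings, e.g. "I'm not voting for...".
NEGATION = re.compile(
    r"\b(?:not|never|won['’]t)\s+(?:vote|voting) for\b",
    re.IGNORECASE,
)


def classify(tweet):
    """Return the declared candidate, or None if there is no clear declaration."""
    if NEGATION.search(tweet) or not DECLARATION.search(tweet):
        return None
    lowered = tweet.lower()
    mentioned = [c for c in CANDIDATES if re.search(rf"\b{c}\b", lowered)]
    # Tweets naming zero or several candidates are left for manual review.
    return mentioned[0] if len(mentioned) == 1 else None


tweets = [
    "I'm voting for Corbyn",
    "I'm not voting for Kendall",
    "I’ll vote for Yvette Cooper",
    "I'll vote for Cooper or Burnham",
    "Corbyn spoke in Leeds today",
]
labels = [classify(t) for t in tweets]
print(labels)
```

Keeping the negation check separate from the declaration pattern mirrors the two-stage collect-then-clean process on the slide, and makes it easy to extend either list of phrasings independently.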
Analysis (1)
Analysis (2)
Key successes
• ‘Intention’ model beat out ‘Attention’ model
in 5 out of 6 races, and in both races
overall
• Lib Dem prediction accuracy close to
traditional margin of error (MOE = 3.5)
• Caught Corbyn’s success to a high degree
of accuracy (MOE = 2)
Reflections and future work
• Tough to generalise successes – specific
cases, particular platform. (How) would this
work for:
– Multi-state process (e.g. US primaries)?
– General elections?
• Despite ongoing challenges, social media will
surely play a key role in the future of accurate
election prediction
