Predicting Swedish Elections with Twitter:
A Case for Stochastic Link Structure Analysis
Nima Dokoohaki, Filippia Zikou,
Daniel Gillblad, Mihhail Matskin
Agenda
➢Introduction
➢Swedish Elections On Twitter
➢Data Analysis Framework
○Social Network Analysis
○Popularity modeling
○Statistical correlation modeling
➢Conclusion and Future Work
Social Media
➢ Twitter within politics domain
○ Express opinion through short texts of 140 characters
○ Follow politicians through mentions and retweets - @
○ Follow discussions through hashtags - #
➢ The role of Twitter during European Election in Sweden, May 2014
➢ The role of Twitter during General Swedish Election, September
2014
Swedish Election on Twitter
Summary of election results and data gathered from
our framework
Data Aggregation
➢ In total 7,000,000 tweets
○70% Swedish
○2,000,000 political texts
○210,000 users
○130,000 political profiles
➢ Twitter Streaming API
○ From February until September 2014
○ Filtered by politicians and keywords (#val2014, #svpol) as well as tweets from Swedish locations
European and General Election
Time-Lines
Continuous power-law distribution for European election network
(log-likelihood estimator with α=14.13428, p=0.9999887)
Continuous power-law distribution for general election network
(log-likelihood estimator with α=12.74022, p=0.9995703)
Party Interaction through Tweets
(4,000 nodes, 20,000 edges)
Unlabeled visualization of political discussions
(50,000 nodes, 240,000 edges)
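The power-law fits reported on this slide can be illustrated with a minimal sketch of the standard closed-form maximum-likelihood estimate of the exponent of a continuous power law. The slides do not say which tool produced the reported α and p values, so the function names, the choice of `x_min`, and the synthetic data below are assumptions for illustration only; the sketch does not compute the goodness-of-fit p-value.

```python
import math
import random


def fit_continuous_power_law(values, x_min):
    """Estimate alpha for p(x) ~ x^(-alpha), x >= x_min, by maximum likelihood.

    Returns (alpha, standard_error). x_min is assumed to be given.
    """
    tail = [x for x in values if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("all observations equal x_min; alpha is undefined")
    alpha = 1.0 + len(tail) / log_sum
    std_err = (alpha - 1.0) / math.sqrt(len(tail))
    return alpha, std_err


def sample_power_law(alpha, x_min, size, seed=0):
    """Draw synthetic power-law samples by inverse-transform sampling."""
    rng = random.Random(seed)
    exponent = -1.0 / (alpha - 1.0)
    # 1 - u lies in (0, 1], so the power is always defined.
    return [x_min * (1.0 - rng.random()) ** exponent for _ in range(size)]


# Synthetic data only: these are not the election networks' degrees.
synthetic = sample_power_law(alpha=2.5, x_min=1.0, size=20000, seed=1)
alpha_hat, alpha_err = fit_continuous_power_law(synthetic, x_min=1.0)
print(f"estimated alpha = {alpha_hat:.3f} +/- {alpha_err:.3f}")
```

On real data the input would be the observed degree (or clustering coefficient) values of each network, and `x_min` would itself have to be chosen or estimated.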
Link Prediction with Electoral
Networks
“Who to Follow”: User Recommendation service of Twitter
Run egocentric random walk for each user.
Create Circle of Trust (CoT) with N most frequently visited vertices
Design Bipartite graph of CoT and subset of their followees
on the left
Run SALSA ranking algorithm for top K scored vertices
Stochastic Approach for Link-Structure Analysis
Collaborative Filtering Algorithm
Built on top of Cassovary, an open source
in-memory graph processing engine
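The first two steps above (egocentric random walks, then a Circle of Trust from the most frequently visited vertices) can be sketched as a small Monte Carlo routine. This is an illustrative sketch only, not the code of Twitter's service, Cassovary, or DrunkardMob: the function name, the toy follow graph, the account names, and the parameter values (`walks`, `reset_prob`, `max_steps`) are all assumptions.

```python
import random
from collections import Counter


def circle_of_trust(graph, source, n_top, walks=1000, reset_prob=0.15,
                    max_steps=50, seed=0):
    """Approximate a personalized ranking around `source` with random walks.

    graph: dict mapping a vertex to the list of vertices it points to.
    Returns (top_vertices, scores), where scores are normalized visit counts.
    """
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(walks):
        node = source  # every walk is egocentric: it starts at the source
        for _ in range(max_steps):
            neighbours = graph.get(node, [])
            if not neighbours or rng.random() < reset_prob:
                break  # walk ends; the next walk restarts from the source
            node = rng.choice(neighbours)
            visits[node] += 1
    visits.pop(source, None)  # the source itself is not a candidate
    total = sum(visits.values())
    scores = {v: c / total for v, c in visits.items()} if total else {}
    top = sorted(scores, key=lambda v: (-scores[v], str(v)))[:n_top]
    return top, scores


# Hypothetical follow/mention graph with made-up account names.
toy_graph = {
    "voter": ["party_a", "member_a1"],
    "party_a": ["member_a1", "member_a2"],
    "member_a1": ["party_a"],
    "member_a2": ["party_a"],
    "party_b": ["member_b1"],
    "member_b1": ["party_b"],
}
top, scores = circle_of_trust(toy_graph, "voter", n_top=2)
print(top)
```

The normalized visit counts play the role of personalized PageRank estimates; a SALSA-style ranking would then be run on a bipartite graph built from the resulting circle, which this sketch does not cover.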
Popularity Estimates: Parties,
Members and Coalitions
DrunkardMob component of GraphChi popularities
PARTIES MEMBERS
Popularity Estimates: Parties,
Members and Coalitions
DrunkardMob component of GraphChi popularities
COALITION
(aggregated party accounts)
COALITION
(aggregated member accounts)
Correlating Network Dynamism to
Vote Outcomes
➢ Statistical variables: popularity standings and vote outcome
➢ Pearson’s coefficient (linear dependence between values)
○ xi and yi are statistical variables
➢ Spearman’s coefficient (monotonic function)
○ n is size of population
○ dn is the difference between the ranks
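The two coefficients named above can be sketched in plain Python using their textbook definitions: Pearson's r as covariance divided by the product of standard deviations, and Spearman's ρ as 1 − 6·Σd² / (n(n² − 1)), where d is the difference between the two ranks of each item. The input numbers below are invented purely for illustration and are not results from the study; the rank formula also assumes there are no ties.

```python
import math


def pearson(xs, ys):
    """Pearson's r: linear dependence between paired values x_i and y_i."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """Rank 1 goes to the largest value; assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman's rho from rank differences d; n is the population size."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1.0 - 6.0 * d_squared / (n * (n ** 2 - 1))


# Hypothetical popularity scores and vote shares for five parties.
popularity = [0.40, 0.25, 0.20, 0.10, 0.05]
vote_share = [35.0, 20.0, 25.0, 12.0, 8.0]
print("Pearson r:", round(pearson(popularity, vote_share), 3))
print("Spearman rho:", round(spearman(popularity, vote_share), 3))
```

In this toy input only the second and third parties swap ranks, so Σd² = 2 and ρ = 1 − 12/120 = 0.9.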
Conclusion and Future Work
The importance of Twitter during elections (EU and
General).
Extract party popularity through structural link formation
and correlate it with vote outcomes
Future focus on the textual content of tweets using sentiment
analysis and topic modeling
Questions
THANK YOU!
Acknowledgement
1. European Data Science Academy project
www.edsa-project.eu
2. Linn Sandberg,
University of Oslo, Norway
3. Nina Tahmasebi,
Språkbanken, University of Göteborg
4. Johan Hammarlund,
MBrain AB, https://www.m-brain.com/
5. Staffan Truve,
RecordedFuture AB, www.recordedfuture.se

Editor's Notes

  • #2 Work in which I was involved as part of my thesis at KTH.
  • #4 Real-time information dissemination on Twitter allows the masses to express and follow opinions freely through short messages, and offers quick ways to follow interesting topics (hashtags) and politicians (mentions). Word cloud: a representation of example topics within the political conversation: poll, debate, alliance, jobs, migration, feminism, social events (almedalen), news/media channels (aftonbladet, dinröst). We study the link structures (the tweet network) surrounding political tweets to capture the popularity of parties and politicians.
  • #5 The outcome of the vote saw left-wing parties rise in popularity, especially the Social Democrats (S), which took the highest proportion of votes, whilst the right-leaning parties took a lower vote percentage, especially the Moderate Party, which was in power at the time. General elections were held in Sweden on 14 September 2014. During the elections, the center-right Alliance coalition (comprising the Moderate Party, Liberal People’s Party, Center Party and Christian Democrats) competed for a third term in office. In contrast to the 2010 election, the three dominant left-leaning parties (Social Democrats, Green Party and Left Party) took the lead and won, while the Alliance came second and the Sweden Democrats, a nationalist party, took third place. Finally, Feminist Initiative, a left-leaning party, did not reach the 4% threshold needed to enter parliament. Mention that the far-right and feminist parties (popular topics nowadays in the EU) were widely discussed on Twitter despite their outcome in the real election. With valence we try to find the important tags of each party. Since hashtags hide information, it is important to observe for which party a topic is more important. What this equation measures is the percentage of occurrences of a hashtag per party.
  • #6 Gather tweets from the Streaming API and store them in a MongoDB database. Explain how political users are identified: a manual search for politicians or people who support a specific party, plus an algorithm to identify people posting for one specific party. Timeline of tweets: two bursts on the plot, for the EU and general elections. High density of tweets during the month of the EU elections.
  • #7 We consider a combined selection of mentions and retweets as the source of information, in order to understand the dynamics of party interactions within and outside their neighborhood. 1st graph: such community analysis helps us understand how coherent the parties are in terms of their internal formation, meaning how much they interact with their own members as well as how much other parties interact with them. 2nd graph: edges. This graph contains the discussion between the public and politicians, explaining the larger amount of data (source and destination colours). Figures 1 and 2 show that the clustering coefficient for both graphs is a proper fit for a continuous power-law distribution. A scale-free network is a network whose degree distribution follows a power law. Many networks have been reported to be scale-free, although statistical analysis has refuted many of these claims and seriously questioned others (from Wikipedia). Now the interesting part: “... important characteristic of scale-free networks is the clustering coefficient distribution, which decreases as the node degree increases. This distribution also follows a power law. Consider a social network in which nodes are people and links are acquaintance relationships between people. It is easy to see that people tend to form communities, i.e., small groups in which everyone knows everyone… Some people, however, are connected to a large number of communities (e.g., celebrities, politicians). Those people may be considered the hubs responsible for the small-world phenomenon.” So we evaluated and confirmed this phenomenon, as shown in the plots: they show the distribution of total degrees for the networks, as one might expect in a social network setting. The degree distribution of the second network seems to be a better representation of a scale-free network.
  • #8 Link prediction to measure popularity: 1. the density of conversations about parties and members reflects on popularity; 2. the dynamics of interactions with politicians' accounts reflect on popularity, which a stochastic link-mining approach can capture. Twitter’s Who To Follow (WTF) service [13] implements a Monte Carlo approximation of Personalized PageRank, referred to as Circle of Trust (CoT), which in turn generates a collection of egocentric random walks. These walk models are consumed by Twitter's follower recommendation algorithm, called SALSA [13]. SALSA uses the random walks to construct a bipartite graph from a collection of interest nodes, which become classified as Hubs and Authorities, similar to the HITS algorithm [16]. Random walks are bi-directional due to the bi-directional edges of the graph and also to guarantee convergence. Given that Circle of Trust is an approximation of Personalized PageRank, we decided to use an equivalent implementation of this recommendation process using the DrunkardMob engine [18], which provides a more scalable random walk model and, most importantly, more accurate Personalized PageRank approximations. Run DrunkardMob with: -n,--nshards <arg> number of machines: 8; -s,--nsources <arg> number of sources: 4000; -w,--walkspersource <arg> number of walks to start from each source: 1000.
  • #9 From the DB, group tweets by user to get statistics. (Nima) Scores are a weighted sum: SUM(PR)/Total. We executed 50 million walks for each graph and then focused on the selection of the specific parties and some of their individual members. Figures show the estimated size of top-k (out-degree) scores for each party according to personalized PageRank values. In each experiment, we took the identifier of the source node of the egocentric walk and estimated the personalized follower score accordingly.
  • #10 Visualize the comparison of the dynamics of group rankings. We computed a median of scores for each group of individual members.
  • #11 To measure any possible dependence between the two results, we take the popularity standings as one statistical variable and the vote outcome as another. Pearson coefficients show that there is a strong linear correlation between the popularity rankings of both party and member accounts for the European election time-line. However, the linear dependence for the European election time-line is not as high as for the general election. What is interesting for us is the result of the Spearman ranks, which show quite strong dependence for both the European and general election time-lines. The most interesting result is how the ranking of member accounts shows strong dependence with the outcome of the general election, while the ranking of party accounts shows strong dependence with the outcome of the European elections. To justify the latter results, we explain possible behavioural traits that might contribute to these observations. First of all, the density of interactions with party accounts during the European parliamentary time-line, as we saw in figure 1, is rather high. At the same time, the authorities reported low voter turnout (51.07% of the eligible population), which would explain how a proportion of voters followed the election event online instead of going to the ballot box. Comparatively, the increased interaction with member accounts during the general elections could explain increased participation from voters (reported 85.81% of the eligible population). Second, the result shows how the activity of member accounts could have a significant impact on explaining the result of elections, as compared to focusing solely on the party accounts. This is also supported by the fact that the number of Swedish politicians using Twitter is increasing, thus attracting public attention and interaction.
  • #14 Comments: The link prediction approach seems very promising, as does measuring the density of talk. Were mentions counted only if the politician had a Twitter account? Yes. It is a bit problematic in general that various parties have different levels of engagement on Twitter. There is not much to do about that, but it could be nice to take a closer look at the distribution. Track from the hashtag instead of the profile itself. 2. I think we have neglected the role of journalists and activists in our picture. Perhaps including them could give interesting results. Do you have a list of these people with their Twitter profile links? I’m not sure there is a complete list, but I have contact with a researcher looking into this; I think they had a sample. Will look this up.