Language of Politics on Twitter
Summer School in AI
American University Beirut
June 16, 2015
Yelena Mejova
@yelenamm
Social Computing Group
Qatar Computing Research Institute, HBKU
political twitter analysis
Roadmap
• let's talk politics (sampling)
• political leaning
– human classification
– text-based classification
– network-based classification
• look who’s talking (users)
• predicting elections!
US politics
• Most research done so far
• Clear left/right distinction
• Popular political figures
• High(ish) Twitter engagement
REPUBLICAN (right) vs. DEMOCRAT (left)
• Sampling Twitter for political speech
– general keywords: #current
– event keywords: #debate08, #tweetdebate
– people: obama, romney, merkel
– parties: democrat, republican, pirate
– accounts: wefollow, twellow
– news stories, known URL retweets
• Caveats
– requires expert knowledge
– known best after the event
– selection bias (who do you want to ignore?)
topical sampling
bootstrapping
1. start with a few key words
2. find tweets that have these words
3. get more words out of these tweets
• seed sample with known political hashtags
– #p2 – Progressives 2.0
– #tcot – Top Conservatives on Twitter
• find hashtags which co-occurred with them,
using Jaccard similarity
bootstrapping
Jaccard similarity = |tweets mentioning both| / |tweets mentioning either|
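A minimal sketch of this bootstrapping step, assuming tweets have already been collected as (tweet id, hashtag set) pairs; the seed tags come from the slide, while the similarity threshold is an arbitrary choice:

```python
# Bootstrapping: expand a seed set of political hashtags by Jaccard similarity.
# Jaccard(S, T) = |S ∩ T| / |S ∪ T|, where S and T are the sets of tweets
# containing each hashtag (see the formula above).

def jaccard(s, t):
    """Jaccard similarity of two sets of tweet ids."""
    return len(s & t) / len(s | t) if s | t else 0.0

def expand_seeds(tweets, seeds, threshold=0.01):
    """tweets: list of (tweet_id, set_of_hashtags). Returns expanded tag set."""
    tweets_with = {}                        # hashtag -> set of tweet ids
    for tid, tags in tweets:
        for tag in tags:
            tweets_with.setdefault(tag, set()).add(tid)
    expanded = set(seeds)
    for tag, ids in tweets_with.items():
        for seed in seeds:
            if jaccard(ids, tweets_with.get(seed, set())) >= threshold:
                expanded.add(tag)
    return expanded

tags = expand_seeds(tweets=[(1, {"#p2", "#topprog"}), (2, {"#tcot", "#gop"})],
                    seeds={"#p2", "#tcot"})
```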
bootstrapping
Predicting the political alignment of twitter users
@vagabondjack Conover et al. @ SocialCom (2011)
got your #tag!
• Given a set of users with known leaning:
leaning of hashtag h in week w toward party p =
aggregated user volume for (h, w) / aggregated user volume for (*, w)
Political hashtag hijacking in the US Hadgu, Garimella, Weber @ WWW (2013)
[some figures from authors’ original slides]
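A hedged sketch of the hashtag-leaning measure above (volume for (h, w) normalized by the volume for all hashtags that week, computed over users of one known leaning); the additive smoothing is an assumption rather than the authors' exact formulation:

```python
from collections import Counter

def hashtag_leaning(records, week, tag, smoothing=1.0):
    """records: (user, week, hashtag) tuples from users of one known leaning.
    Returns volume(h, w) / volume(*, w): the share of that group's hashtag
    volume in week `week` that went to `tag`, with additive smoothing."""
    week_counts = Counter(h for (_, w, h) in records if w == week)
    total = sum(week_counts.values())
    n_tags = max(len(week_counts), 1)
    return (week_counts[tag] + smoothing) / (total + smoothing * n_tags)
```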
Crimean conflict
Крым (“Crimea” in Russian)
comparing tweets by users with
Ukrainian or Russian as profile language
most distinguishing hashtags
Language Plurality in Twitter Political Speech Mejova, Boynton @ ICCSS (2015)
1. Crowdsourcing
2. Text (text classification)
3. Network (label propagation)
political leaning classification
human classification
crowdsourcing
mechanical turk
crowdsourcing
• break the task into micro-tasks (Y/N question)
• have many people answer for a bit of money
• wisdom of crowds will give the right answer
text classification
Representing Text
• “Bag of words”, i.e. Vector Space Model: break the document into its constituent words and put them in a table
Representing Text
• Preprocessing
– Clean-up
• remove formatting, tables, HTML…
– Remove stopwords
• the, of, to, a, in, and, that, for, is
– Stem words
• get to a “stem” of a word
• cats -> cat, running -> run, uncomfortable -> uncomfort?
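A minimal preprocessing sketch along these lines; it assumes NLTK is installed and downloads its English stopword list on first run:

```python
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

nltk.download("stopwords", quiet=True)   # one-time download of the stopword list

STOPWORDS = set(stopwords.words("english"))
stemmer = PorterStemmer()

def preprocess(text):
    """Clean-up, stopword removal, and stemming for one document."""
    text = re.sub(r"<[^>]+>", " ", text.lower())   # strip HTML-like formatting
    tokens = re.findall(r"[a-z']+", text)          # crude tokenization
    return [stemmer.stem(t) for t in tokens if t not in STOPWORDS]

print(preprocess("Those lazy cats sleep and sleep everywhere"))
# roughly: ['lazi', 'cat', 'sleep', 'sleep', 'everywher']
```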
Representing Text
• Vector Space Model: D = (t1, wd1; t2, wd2; …; tv, wdv), where each weight w is binary, a count, or TFIDF
• Example: “those lazy cats sleep and sleep everywhere” →
  lazy  cat  sleep  everywhere  …
   1     1    2         1       …
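In code, that table is just a term-count vector; a pure-Python sketch using a toy stopword list and stemming table:

```python
from collections import Counter

doc = "those lazy cats sleep and sleep everywhere"
stopwords = {"those", "and"}        # toy stopword list
stems = {"cats": "cat"}             # toy stemming table

terms = [stems.get(w, w) for w in doc.split() if w not in stopwords]
vector = Counter(terms)
print(vector)   # Counter({'sleep': 2, 'lazy': 1, 'cat': 1, 'everywhere': 1})
```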
TFIDF
term frequency – inverse document frequency
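A sketch of the classic weighting, assuming raw term counts and document frequencies are given; real implementations differ in smoothing and normalization:

```python
import math

def tfidf(tf, df, n_docs):
    """tf: count of term t in document d; df: number of documents containing t;
    n_docs: total documents. Classic weight: tf * log(N / df)."""
    return tf * math.log(n_docs / df)

# 'sleep' occurs twice in our document but in many documents overall,
# so a rarer, more topical term can outweigh it:
print(tfidf(tf=2, df=900, n_docs=1000))   # common term -> low weight (~0.21)
print(tfidf(tf=1, df=10, n_docs=1000))    # rare term   -> high weight (~4.61)
```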
Problems
• Synonymy
– multiple words that have similar meanings
• Polysemy
– words that have more than one meaning
EYE DROPS OFF SHELF
PROSTITUTES APPEAL TO POPE
KIDS MAKE NUTRITIOUS SNACKS
STOLEN PAINTING FOUND BY TREE
LUNG CANCER IN WOMEN MUSHROOMS
QUEEN MARY HAVING BOTTOM SCRAPED
DEALERS WILL HEAR CAR TALK AT NOON
MINERS REFUSE TO WORK AFTER DEATH
MILK DRINKERS ARE TURNING TO POWDER
DRUNK GETS NINE MONTHS IN VIOLIN CASE
GRANDMOTHER OF EIGHT MAKES HOLE IN ONE
HOSPITALS ARE SUED BY 7 FOOT DOCTORS
LAWMEN FROM MEXICO BARBECUE GUESTS
TWO SOVIET SHIPS COLLIDE, ONE DIES
ENRAGED COW INJURES FARMER WITH AX
LACK OF BRAINS HINDERS RESEARCH
RED TAPE HOLDS UP NEW BRIDGE
SQUAD HELPS DOG BITE VICTIM
IRAQI HEAD SEEKS ARMS
HERSHEY BARS PROTEST
text classification
document → classifier → label
• is it spam?
• is it important?
• is it happy?
• is it true?
• is it a flight ticket?
document → classifier → label
• is it written well?
• is it about politics?
• is it a bully?
• is it fake?
• is it a joke?
classifiers
naïve bayes
decision trees
support vector machines
logistic regression
perceptron
neural networks
k-nearest neighbor
naïve bayes classifier
• We want the probability of a class C given an instance represented by a feature vector x = (x1, …, xn). By Bayes’ Theorem:
p(C | x) = p(C) p(x | C) / p(x)
where the denominator p(x) is constant no matter the class C, and the numerator p(x | C) p(C) is the joint probability p(C, x).
https://en.wikipedia.org/wiki/Naive_Bayes_classifier
naïve bayes classifier
• Expand the joint probability using the chain rule:
p(C, x1, …, xn) = p(x1 | x2, …, xn, C) p(x2 | x3, …, xn, C) ⋯ p(xn | C) p(C)
• But to simplify, we use a naïve assumption of conditional independence for each feature:
p(xi | xi+1, …, xn, C) = p(xi | C)
https://en.wikipedia.org/wiki/Naive_Bayes_classifier
naïve bayes classifier
• Finally, the conditional distribution over class C (the probability of class C given a document with some features) is:
p(C | x1, …, xn) = (1/Z) p(C) ∏i p(xi | C)
where Z = p(x) is a scaling factor, p(C) is the prior of the class, and each p(xi | C) is a frequency-based probability of feature xi in class C.
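A from-scratch sketch of a multinomial variant of this classifier for left/right tweets; the add-one (Laplace) smoothing and token-list input are assumptions, not part of the slide:

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    def fit(self, docs, labels):
        """docs: list of token lists; labels: e.g. 'left' / 'right'."""
        self.priors = Counter(labels)            # class counts ~ p(C) up to normalization
        self.counts = defaultdict(Counter)       # class -> term counts ~ p(x_i | C)
        for tokens, c in zip(docs, labels):
            self.counts[c].update(tokens)
        self.vocab = {t for c in self.counts for t in self.counts[c]}

    def predict(self, tokens):
        """argmax_C log p(C) + sum_i log p(x_i | C), with add-one smoothing."""
        best, best_score = None, -math.inf
        for c, prior in self.priors.items():
            total = sum(self.counts[c].values())
            score = math.log(prior)
            for t in tokens:
                score += math.log((self.counts[c][t] + 1) /
                                  (total + len(self.vocab)))
            if score > best_score:
                best, best_score = c, score
        return best

nb = NaiveBayes()
nb.fit([["#p2", "progressive"], ["#tcot", "conservative"]], ["left", "right"])
print(nb.predict(["#tcot"]))   # -> 'right'
```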
support vector machine
• Finds a hyperplane in high-dimensional space that
maximizes the distance to the nearest training point
of any class
https://en.wikipedia.org/wiki/Support_vector_machine
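For comparison, a hedged sketch of a TF-IDF + linear SVM text classifier of the kind used for political leaning; it assumes scikit-learn, and the two “labeled” tweets are made up:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in for a labeled corpus of users' concatenated tweets.
texts = ["#p2 proud progressive fighting for unions",
         "#tcot lower taxes smaller government"]
labels = ["left", "right"]

model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(texts, labels)
print(model.predict(["cut taxes now #tcot"]))   # -> ['right']
```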
political leaning classification
Predicting the political
alignment of twitter users
@vagabondjack Conover,
Gonçalves, Ratkiewicz,
Flammini, Menczer @
SocialCom (2011)
Is a user politically left or right?
Classifier: Support Vector Machine
[figure: confusion matrix, actual class (A, B) vs. predicted class (A, B)]
network-based classification
network
example graph as an edge list: (0,1), (0,4), (1,2), (1,3), (1,4), (2,3), (3,4)
• adjacency matrix: an n×n table with a 1 for each edge
• adjacency list: for each node, the list of its neighbors
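A small sketch that builds both representations from the edge list above:

```python
# Build adjacency matrix and adjacency list for the 5-node example graph.
edges = [(0, 1), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3), (3, 4)]
n = 5

matrix = [[0] * n for _ in range(n)]
adj = {v: [] for v in range(n)}
for u, v in edges:                    # undirected: store both directions
    matrix[u][v] = matrix[v][u] = 1
    adj[u].append(v)
    adj[v].append(u)

print(adj[1])   # -> [0, 2, 3, 4]: node 1's neighbors
```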
network label propagation
at each step, update each node’s label based on its neighbors
• Label propagation
– Initialize cluster membership arbitrarily
– Iteratively update each node’s label according to the majority of its neighbors (see the sketch below)
– Ties are broken randomly
• Cluster assignment by majority cluster label (using manually labeled data)
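A minimal sketch of that majority-vote update, assuming an adjacency-list graph and synchronous sweeps with a fixed iteration cap (both are assumptions; the slide does not fix these details):

```python
import random
from collections import Counter

def label_propagation(adj, labels, n_iter=100):
    """adj: node -> list of neighbors; labels: node -> initial label
    (arbitrary, or seeded from manually labeled users)."""
    labels = dict(labels)
    for _ in range(n_iter):
        changed = False
        for node in adj:
            votes = Counter(labels[nbr] for nbr in adj[node])
            if not votes:                  # isolated node: keep its label
                continue
            top = max(votes.values())
            # majority label of the neighbors; ties broken randomly
            choice = random.choice([l for l, v in votes.items() if v == top])
            if choice != labels[node]:
                labels[node], changed = choice, True
        if not changed:                    # converged
            break
    return labels
```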
political leaning classification
retweet network
Twitter polarity classification with label propagation
over lexical links and the follower graph
@speriosu Speriosu, Sudan, Upadhyay, Baldridge @ EMNLP (2011)
political leaning classification
[figure: labels propagate from the few users with known leaning to the automatically labeled rest]
bonus: news outlets are users too!
news polarization
Visualizing media bias through Twitter
@JisunAn An, Cha, Gummadi, Crowcroft, Quercia @ AAAI (2012)
distance between two media = Jaccard similarity of their audiences (co-subscribers), i.e. the overlap in common audience (followers on Twitter)
look who’s talking
Vocal Minority versus Silent Majority:
Discovering the Opinions of the Long Tail
@enimust Mustafaraj, Finn, Whitlock, Metaxas @ SocialCom (2011)
[figure: number of tweets per user]
look who’s talking
Vocal Minority versus Silent Majority:
Discovering the Opinions of the Long Tail
@enimust Mustafaraj, Finn, Whitlock, Metaxas @ SocialCom (2011)
GOP primary season on twitter: popular political sentiment in social media
@yelenamm Mejova, Srinivasan, Boynton @ WSDM (2013)
look who’s talking
• Truthiness is a quality
characterizing a "truth"
that a person making
an argument or
assertion claims to
know intuitively "from
the gut" or because it
"feels right" without
regard to evidence,
logic, intellectual
examination, or facts.
Detecting and Tracking Political Abuse in Social Media
Ratkiewicz, Conover, Meiss, Goncalves, Flammini, Menczer @ ICWSM (2011)
look who’s talking
Classifying memes (hashtags)
for astroturf
(fake grassroots movements)
Detecting and Tracking Political Abuse in Social Media
Ratkiewicz, Conover, Meiss, Goncalves, Flammini, Menczer @ ICWSM (2011)
look who’s talking
most useful:
network features
Truthy
project by Indiana University
http://truthy.indiana.edu/
look who’s talking
TRUTHY: #ampat, @PeaceKaren_25 & @HopeMarie_25, gopleader.gov, Chris Coons
LEGITIMATE: #Truthy, @senjohnmccain, on.cnn.com/aVMu5y, “Obama said…”
elections
(Science, vol. 338)
sentiment classification
tweet on a topic → classifier → positive vs. negative
Trained Classifiers: can “tune” for a specific topic and data, but expensive
Sentiment Lexicons: can be used “out of the box”, but may not work for every topic
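A toy “out of the box” lexicon scorer of the kind described above; the five-word lexicon is a stand-in for a real one:

```python
# Toy stand-in for a real sentiment lexicon (thousands of scored words).
LEXICON = {"great": 1, "win": 1, "sad": -1, "fail": -1, "disaster": -1}

def lexicon_sentiment(tweet):
    """Sum word polarities; >0 positive, <0 negative, 0 neutral/unknown."""
    return sum(LEXICON.get(w, 0) for w in tweet.lower().split())

print(lexicon_sentiment("Great debate, clear win"))         # -> 2 (positive)
print(lexicon_sentiment("What a sad disaster of a night"))  # -> -2 (negative)
```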
political discussions: debates
• Mean valence:
– Obama: -2.09
– McCain: -5.64
Characterizing Debate Performance via Aggregated Twitter Sentiment
@ndiakopoulos Diakopoulos, Shamma
@ CHI (2010)
an emotional story
[figure: total tweet volume and positive − negative tweet counts over the debate]
• 2009 German federal elections
elections
Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment
Tumasjan, Sprenger, Sandner, Welpe @ AAAI (2010)
“The mere number
of tweets reflects
voter preferences
and comes close to
traditional election
polls”
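A sketch of this style of prediction: each party's predicted vote share is its share of tweet mentions, evaluated by mean absolute error (MAE) against the official result; all numbers below are made up:

```python
def predict_shares(tweet_counts):
    """Vote-share prediction = share of tweet mentions per party."""
    total = sum(tweet_counts.values())
    return {p: 100.0 * c / total for p, c in tweet_counts.items()}

def mae(predicted, actual):
    """Mean absolute error between predicted and actual vote shares."""
    return sum(abs(predicted[p] - actual[p]) for p in actual) / len(actual)

counts = {"A": 600, "B": 300, "C": 100}      # made-up tweet mentions
result = {"A": 45.0, "B": 35.0, "C": 20.0}   # made-up election result
print(predict_shares(counts))                # {'A': 60.0, 'B': 30.0, 'C': 10.0}
print(mae(predict_shares(counts), result))   # 10.0
```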
elections
Why the Pirate Party won the German election of 2009 or the trouble with predictions: A
response to Tumasjan, Sprenger, Sander, & Welpe, "Predicting elections with twitter: What
140 characters reveal about political sentiment"
@ajungherr Jungherr, Jürgens, Schoen @ SSCR V30/N2 (2012)
“arbitrary choices”
• Choice of Parties: “If results of polls played a role in deciding upon the inclusion of particular parties, the TSSW method is dependent on public opinion surveys”
• Choice of Dates: “prediction analysis […] between [13.9] and [27.9], the day of the election, produces a MAE of 2.13, significantly higher than the MAE for TSSW”
• 2012 US Republican Primary Debates
• Predicting poll swings around televised debates:
– 104 predictions overall
elections
GOP primary season on twitter: popular political sentiment in social media
@yelenamm Mejova, Srinivasan, Boynton @ WSDM (2013)
Both volume and sentiment classification perform no better than random
elections
single-variable vs. multi-variable logistic regression models
strong baselines! having followers (in your own party?), focusing on centrist issues
graph structure and content significantly improve accuracy
The Party Is Over Here: Structure and Content in the 2010 Election
Livne, Simmons, Adar, Adamic @ ICWSM (2011)
• Non-US elections:
– Irish: On using twitter to monitor political sentiment and
predict election results, Bermingham, Smeaton (2011)
• "Our approach however has demonstrated an error which is not
competitive with the traditional polling methods.”
– Dutch: Predicting the 2011 Dutch senate election results with
twitter, Sang, Bos (2012)
• Uses polls to correct for demographic imbalances, yet performance is still below traditional polls
– Singapore: Tweets and votes: A study of the 2011 singapore
general election, Skoric, Poor, Achananuparp, Lim, Jiang (2012)
• Not as accurate as traditional polls, performance at local government
levels
– many more coming out each day!
elections
How (Not) To Predict Elections
Limits of Electoral Predictions Using Twitter
Daniel Gayo-Avello (Univ. of Oviedo, Spain)
Panagiotis T. Metaxas (@takis_metaxas), Wellesley College (USA)
Eni Mustafaraj (@enimust), Wellesley College (USA)
Metaxas et al. @ SocialCom (2011)
• Data from social media are fundamentally different from data from natural phenomena
– people change their behavior next time around
– spammers & activists will try to take advantage
• Start from a testable theory of why and when it predicts (avoid self-deception!)
• (maybe) Learn from professional pollsters
– tweet ≠ user
– user ≠ eligible voter
– eligible voter ≠ voter
How (Not) To Predict Elections @takis_metaxas Metaxas et al. @ SocialCom (2011)
elections
but what can we do?
help campaigners reach more people
predict people’s political leaning
help understand reasons for affiliation
recommend politicians, news, friends
detect sudden strong sentiment about a topic
detect polarization (users & news)
views of issues from around the world
light summer reading
• M. D. Conover, B. Gonçalves, J. Ratkiewicz, A. Flammini, and F. Menczer,
“Predicting the political alignment of twitter users,” in Privacy, security, risk and
trust (passat), 2011 IEEE Third International Conference on Social Computing
(SocialCom), 2011, pp. 192–199.
• M. D. Conover, J. Ratkiewicz, M. Francisco, B. Goncalves, F. Menczer, and A.
Flammini, “Political Polarization on Twitter,” International Conference on Weblogs
and Social Media (ICWSM), 2011.
• M. Speriosu, N. Sudan, S. Upadhyay, and J. Baldridge, “Twitter polarity
classification with label propagation over lexical links and the follower graph,” in
Proceedings of the First workshop on Unsupervised Learning in NLP, 2011, pp. 53–
63.
• I. Weber, V. R. K. Garimella, and A. Teka, “Political hashtag trends,” in Advances in Information Retrieval, Springer, 2013, pp. 857–860.
• A. T. Hadgu, K. Garimella, and I. Weber, “Political hashtag hijacking in the US,” in
Proceedings of the 22nd international conference on World Wide Web companion,
2013, pp. 55–56.
• M. Pennacchiotti and A.-M. Popescu, “Democrats, republicans and starbucks
afficionados: user classification in twitter,” in Proceedings of the 17th ACM SIGKDD
international conference on Knowledge discovery and data mining, 2011, pp. 430–
438.
• N. A. Diakopoulos and D. A. Shamma, “Characterizing Debate Performance via
Aggregated Twitter Sentiment,” Conference on Human Factors in Computing
Systems (CHI), 2010.
• L. Chen, W. Wang, and A. P. Sheth, “Are twitter users equal in predicting elections?
a study of user groups in predicting 2012 US republican presidential primaries,” in
Social Informatics, Springer, 2012, pp. 379–392.
• J. An, M. Cha, K. P. Gummadi, J. Crowcroft, and D. Quercia, “Visualizing media bias
through Twitter,” Association for the Advancement of Artificial Intelligence (AAAI),
Technical WS-12-11, 2012.
• E. Mustafaraj, S. Finn, C. Whitlock, and P. T. Metaxas, “Vocal Minority versus Silent
Majority: Discovering the Opinions of the Long Tail,” in International Conference
on Social Computing, 2011, pp. 103–110.
• J. Ratkiewicz, M. D. Conover, M. Meiss, B. Goncalves, A. Flammini, and F. Menczer, “Detecting and Tracking Political Abuse in Social Media,” International Conference on Weblogs and Social Media (ICWSM), 2011.
• A. Livne, M. Simmons, E. Adar, and L. Adamic, “The Party Is Over Here: Structure
and Content in the 2010 Election,” International Conference on Weblogs and Social
Media (ICWSM), 2011.
• A. Tumasjan, T. O. Sprenger, P. G. Sandner, and I. M. Welpe, “Predicting Elections
with Twitter: What 140 Characters Reveal about Political Sentiment,” Association
for the Advancement of Artificial Intelligence Conference (AAAI), 2010.
• P. Metaxas, E. Mustafaraj, and D. Gayo-Avello, “How (Not) To Predict Elections,”
International Conference on Social Computing, 2011.
• A. Jungherr, P. Jürgens, and H. Schoen, “Why the Pirate Party won the German election of 2009 or the trouble with predictions: A response to Tumasjan, A., Sprenger, T. O., Sander, P. G., & Welpe, I. M., ‘Predicting elections with twitter: What 140 characters reveal about political sentiment’,” Social Science Computer Review, vol. 30, no. 2, pp. 229–234, 2012.
• I. Weber, V. R. K. Garimella, and A. Batayneh, “Secular vs. Islamist polarization in
Egypt on Twitter.” ASONAM, 2013.
Surveys
• D. Gayo-Avello, “‘I Wanted to Predict Elections with Twitter and All I Got Was This Lousy Paper’ – A Balanced Survey on Election Prediction using Twitter Data,” arXiv preprint arXiv:1204.6441, 2012.
• D. Gayo-Avello, “A meta-analysis of state-of-the-art electoral prediction from
Twitter data,” Social Science Computer Review, 2013.
Editor's Notes

  • [bootstrapping] ∪ – union, ∩ – intersection. S and T are the sets of tweets containing a hashtag of interest. Impossible to know what you are excluding.
  • [got your #tag!] The leaning of a hashtag h during week w towards a party p is the normalized proportion of the aggregated user volume of h in that week to the aggregated volume of all hashtags in that week; similar to the valence measure in Conover et al., with user volume, a smoothing term, and time-dependence.
  • [Crimean conflict] In Ukraine Russian is widely spoken, but not all those living in Ukraine would consider it their first language. Thus, we examine our collection gathered using the keyword Крым (“Crimea” in Russian) and divide the users into those indicating their first language as Ukrainian or Russian. We then compute hashtag frequencies for each group of users and subtract these frequencies to get the most distinguishing hashtags, including clearly partisan ones. Each side neatly divided into viewing Crimea as a part of their country: even though the same language (Russian) was used, the first language of the users divided the stances on this issue.
  • [TFIDF] Querying: calculate the distance between vectors. Problem: synonymy and polysemy (next slide).
  • [synonymy & polysemy] Synonyms of “big”; polysemy of “crane” – a bird, a type of construction equipment, to crane one’s neck – and of “apple”.
  • [headlines] More examples: JUVENILE COURT TO TRY SHOOTING DEFENDANT; COMPLAINTS ABOUT NBA REFEREES GROWING UGLY; PANDA MATING FAILS, VETERINARIAN TAKES OVER; MAN EATING PIRANHA MISTAKENLY SOLD AS PET FISH; ASTRONAUT TAKES BLAME FOR GAS IN SPACECRAFT; QUARTER OF A MILLION CHINESE LIVE ON WATER; INCLUDE YOUR CHILDREN WHEN BAKING COOKIES; OLD SCHOOL PILLARS ARE REPLACED BY ALUMNI.
  • [network representations] http://www.geeksforgeeks.org/graph-and-its-representations/
  • [label propagation] Using a retweet network where there is an undirected link between two users if either user mentions the other during the analysis period. Clusters: accept the majority cluster label. Adjusted Rand Index: similarity of two cluster label assignments (−1 when they totally disagree, +1 when they totally agree). Clusters + Tags: topological information plus 19 hashtags selected using Hall’s feature selection algorithm.
  • [news polarization] ADA: Americans for Democratic Action score, calculated from quantities such as the number of times a media outlet cites various think tanks and other policy groups.
  • [vocal minority] 2010 US Senate special election in Massachusetts. The silent majority and the vocal minority tweet differently (different agendas?). Spamming, fake grassroots movements.
  • [Truthy] http://www.cbsnews.com/stories/2006/12/12/opinion/meyer/main2250923.shtml
  • [truthy vs. legitimate] Dashed lines: retweets; yellow: mentions. #ampat – retweeted between two accounts that seemed to be owned by the same person. @PeaceKaren_25 (and @HopeMarie_25) – two colluding accounts. gopleader.gov – promoted by the two *_25 accounts above. Chris Coons – a tweet smearing Chris Coons using bot accounts. #Truthy – injected by NPR’s Science Friday radio program. @senjohnmccain – retweets from @ladygaga (don’t ask don’t tell) and mentions.
  • [debates] Green: number of positive minus negative tweets per minute; grey: 3-minute moving average of total tweet volume. First presidential debate, 2008; #current, #debate08, #tweetdebate; labeled using AMT.
  • [sentiment] LIWC – Linguistic Inquiry and Word Count.
  • [The Party Is Over Here] Same-party: indicating whether the party of the candidate is the same as the party that last held the seat. When applied to their data, the “all” classifier achieves 77.7% accuracy.
  • [how (not) to predict] The method of prediction should be an algorithm finalized before the elections: (input) how social media data are to be collected, including the dates of data collection; (filter) how the clean-up of the data is to be performed (e.g., the selection of keywords relevant to the election); (method) the algorithms to be applied to the data, along with their input parameters; and (output) the semantics under which the results are to be interpreted. Data from social media are fundamentally different from data from natural phenomena: people will change their behavior the next time around, and spammers & activists will try to take advantage.