Language of Politics on Twitter
Summer School in AI
American University Beirut
June 16, 2015
Yelena Mejova
@yelenamm
Social Computing Group
Qatar Computing Research Institute, HBKU
political
twitter
analysis
Users
individuals
news
organizations
bots
…
#hashtags
word or phrase preceded by a hash mark (#),
used within a message to identify a keyword
or topic of interest and facilitate a search for it
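A minimal sketch of pulling hashtags out of raw tweet text. It assumes a simple regular expression (`#` followed by word characters) is good enough. Twitter's own tokenization rules are more involved, so in practice the `entities.hashtags` field of the tweet JSON is the authoritative source. `extract_hashtags` is a hypothetical helper.

```python
import re

# Simplified pattern (an assumption, not Twitter's exact rules): a '#' not preceded
# by a word character, followed by word characters (Unicode-aware in Python 3)
HASHTAG_RE = re.compile(r'(?<!\w)#(\w+)')

def extract_hashtags(text):
    """Return hashtags without '#', lowercased so #Qatar and #qatar count as one tag."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]

print(extract_hashtags("Much respect to you! #Doha #qatar http://t.co/wf8nc0C527"))
# ['doha', 'qatar']
```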
links
all links are shortened by Twitter to the form t.co/…
shorter
control for spam, malware, phishing
collect clickthrough information
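Because the tweet text only contains the shortened t.co form, the original destination has to be read from the tweet's `entities`. The `urls` and `media` arrays each carry `url`, `expanded_url` and `display_url`, as in the sample tweet JSON later in this deck. This is a sketch that assumes tweets are already parsed into Python dicts. `expand_links` is a hypothetical helper.

```python
def expand_links(tweet):
    """Map each t.co URL in a parsed tweet dict to its expanded destination."""
    mapping = {}
    entities = tweet.get('entities', {})
    # Ordinary links and attached media are listed in separate arrays
    for key in ('urls', 'media'):
        for ent in entities.get(key, []):
            mapping[ent['url']] = ent['expanded_url']
    return mapping

tweet = {
    'text': 'Much respect to you! #Doha #qatar http://t.co/wf8nc0C527',
    'entities': {
        'urls': [],
        'media': [{'url': 'http://t.co/wf8nc0C527',
                   'expanded_url': 'http://twitter.com/mohsin/status/598453736839598080/photo/1'}],
    },
}
print(expand_links(tweet))
```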
MEME
an idea, behavior, or style that spreads
from person to person within a culture
Richard Dawkins
MEME
Monthly active users
302 million (4/28/2015)
Total number of Twitter registered users
“about a billion” (9/16/13)
Unique monthly visitors to Twitter.com (desktop)
36 million (10/3/13)
Daily active Twitter users
100 million (10/3/13)
Number of Twitter accounts that have ever sent a tweet
550 million (4/14/14)
TWITTER
TWITTER RESEARCH
Google Trends
users
tweets
relationships
Twitter API
https://dev.twitter.com/overview/documentation
users
try it yourself
• go to https://apigee.com/console/twitter
• select OAuth1 from Authentication and log in using your Twitter account
• select api.twitter.com/1.1 from Service
• click on the
on the left to see a list of API methods
• select
• enter your Twitter handle into screen_name and click
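The console steps above end with a user lookup by `screen_name`. The exact method selected on the slide is not shown, so this sketch assumes the v1.1 `users/show.json` method. It only builds the request URL. A real call still needs OAuth 1.0a signing, which the console or a library such as tweepy handles for you. `user_lookup_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

API_ROOT = 'https://api.twitter.com/1.1'

def user_lookup_url(screen_name):
    """Build the GET URL for a v1.1 user lookup (assumed method: users/show.json)."""
    # Accept '@handle' or 'handle'; urlencode percent-escapes the value
    params = urlencode({'screen_name': screen_name.lstrip('@')})
    return '%s/users/show.json?%s' % (API_ROOT, params)

print(user_lookup_url('@yelenamm'))
# https://api.twitter.com/1.1/users/show.json?screen_name=yelenamm
```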
http://jsonviewer.stack.hu/
http://www.faceplusplus.com/demo-detect/
More info from picture
questions
where are you from?
are you male or female?
what job do you have?
when did you join?
how active are you?
what do you look like?
are you a bot?
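Two of these questions, "when did you join?" and "how active are you?", can be answered from the `user` object alone. This sketch assumes the v1.1 `created_at` string format shown in the sample tweet below (e.g. `Thu Feb 22 11:11:01 +0000 2007`). It uses `statuses_count` as the user's total number of tweets. `tweets_per_day` is a hypothetical helper.

```python
from datetime import datetime

# Format of Twitter v1.1 'created_at' strings; %a/%b assume an English/C locale
TWITTER_TIME = '%a %b %d %H:%M:%S %z %Y'

def tweets_per_day(user, now):
    """Return (account age in whole days, average tweets per day) as of `now`."""
    joined = datetime.strptime(user['created_at'], TWITTER_TIME)
    age_days = (now - joined).days
    # max(..., 1) avoids dividing by zero for accounts created today
    return age_days, user['statuses_count'] / max(age_days, 1)

# Values from the sample tweet: the user's join date and the tweet's own timestamp
user = {'created_at': 'Thu Feb 22 11:11:01 +0000 2007', 'statuses_count': 10756}
now = datetime.strptime('Wed May 13 11:44:24 +0000 2015', TWITTER_TIME)
age, rate = tweets_per_day(user, now)
print(age, round(rate, 2))  # 3002 3.58
```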
tweets
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Uses the tweepy 3.x streaming API (StreamListener was removed in tweepy 4).
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream

# Go to http://dev.twitter.com and create an app.
# The consumer key and secret will be generated for you after you create it.
consumer_key = 'YOUR_CONSUMER_KEY'
consumer_secret = 'YOUR_CONSUMER_SECRET'
# After the step above, you will be redirected to your app's page.
# Create an access token under the "Your access token" section.
access_token = 'YOUR_ACCESS_TOKEN'
access_token_secret = 'YOUR_ACCESS_TOKEN_SECRET'

class StdOutListener(StreamListener):
    """A listener that handles tweets received from the stream.
    This basic listener just prints each received tweet (raw JSON) to stdout.
    """
    def on_data(self, data):
        # data is one raw JSON message; strip the trailing line terminator
        print(data.rstrip('\r\n'))
        return True

    def on_error(self, status):
        # status is the HTTP error code returned by the streaming API
        print(status)
Querying the public stream
using Python (1)
https://tinyurl.com/aiss15-gettweets
def auto_restart_stream(auth, listener, l_keywords):
    # Reconnect whenever the stream ends or raises an error.
    while True:
        try:
            sapi = Stream(auth, listener)
            sapi.filter(track=l_keywords)
        except Exception:
            # KeyboardInterrupt is not an Exception subclass, so Ctrl-C still stops the loop
            # print('Restarting ;)')
            continue

if __name__ == '__main__':
    # "Qatar" written in many languages and scripts
    keywords = ['Cátar', 'Catar', 'Katar', 'Katara', 'Kataras', 'Katari', 'Kataro',
                'Qadar', 'Qatar', 'قطر', 'कतर', 'ਕਤਰ', '卡塔尔', '卡塔爾', '카타르',
                'קטאר', 'कतार', 'કતાર', 'కతర్', 'ກາຕາ', 'カタール', 'Κατάρ', 'Катар',
                'Қатар', 'ատար', 'কাতার', 'ಕತಾರ್', 'ഖത്തർ', 'කටාර්', 'กาตาร์',
                'קַאטַאר', 'கத்தார்', 'ប្រទេសកាតា', 'ကာတာနိုင်ငံ']
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    auto_restart_stream(auth, l, keywords)
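`auto_restart_stream` reconnects immediately and indefinitely. A common refinement, sketched here as an assumption rather than part of the original script, is to wait longer after each consecutive failure (exponential backoff), as Twitter's streaming documentation recommended before reconnecting. `run_with_backoff` is a hypothetical helper. `connect` stands in for any function that opens the stream and blocks, such as `lambda: Stream(auth, l).filter(track=keywords)`.

```python
import time

def run_with_backoff(connect, max_retries=None, base_delay=1.0, max_delay=320.0,
                     sleep=time.sleep):
    """Call connect() until it returns normally, doubling the wait after each failure.

    Returns the number of failures seen; re-raises the last error once
    max_retries consecutive failures have occurred.
    """
    failures = 0
    delay = base_delay
    while True:
        try:
            connect()
            return failures
        except Exception:
            failures += 1
            if max_retries is not None and failures >= max_retries:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)   # cap the wait

# Example with a fake connection that drops three times, then succeeds
calls = {'n': 0}
def flaky():
    calls['n'] += 1
    if calls['n'] < 4:
        raise ConnectionError('dropped')

waits = []
failures = run_with_backoff(flaky, sleep=waits.append)
print(failures, waits)  # 3 [1.0, 2.0, 4.0]
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.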
Querying the public stream
using Python (2)
https://tinyurl.com/aiss15-gettweets
{"created_at":"Wed May 13 11:44:24 +0000 2015","id":598453736839598080,"id_str":"598453736839598080",
"text":"Don't get star struck often but I like this guy @Mo_Farah you the man boss! Much respect to you! #Doha #qatar http://t.co/wf8nc0C527",
"source":"\u003ca href=\"http://twitter.com/download/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c/a\u003e",
"truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,
"user":{"id":788413,"id_str":"788413","name":"Mohsin Ali","screen_name":"mohsin","location":"Doha, Qatar","url":"http://mohsinali.com",
"description":"Digital story telling, infogrpahics, interactives, R&D, Emerging Technologies, Future Trends, Innovation @ajlabs, Global Nomad, Likes Maps. LBA, DHA, BHA, DOH",
"protected":false,"verified":false,"followers_count":2422,"friends_count":645,"listed_count":69,"favourites_count":889,"statuses_count":10756,
"created_at":"Thu Feb 22 11:11:01 +0000 2007","utc_offset":10800,"time_zone":"Riyadh","geo_enabled":true,"lang":"en","contributors_enabled":false,"is_translator":false,
"profile_background_color":"C0DEED","profile_background_image_url":"http://pbs.twimg.com/profile_background_images/462946198211407873/xWaKYtpF.jpeg",
"profile_background_image_url_https":"https://pbs.twimg.com/profile_background_images/462946198211407873/xWaKYtpF.jpeg",
"profile_background_tile":true,"profile_link_color":"0084B4","profile_sidebar_border_color":"FFFFFF","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,
"profile_image_url":"http://pbs.twimg.com/profile_images/1249217364/n504379828_3076_normal.jpg",
"profile_image_url_https":"https://pbs.twimg.com/profile_images/1249217364/n504379828_3076_normal.jpg",
"profile_banner_url":"https://pbs.twimg.com/profile_banners/788413/1399210132","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},
"geo":{"type":"Point","coordinates":[25.316197,51.498302]},"coordinates":{"type":"Point","coordinates":[51.498302,25.316197]},
"place":{"id":"0181f32937df0de8","url":"https://api.twitter.com/1.1/geo/id/0181f32937df0de8.json","place_type":"admin","name":"Doha","full_name":"Doha, Qatar","country_code":"QA","country":"\u062f\u0648\u0644\u0629 \u0642\u0637\u0631",
"bounding_box":{"type":"Polygon","coordinates":[[[51.4477039,25.2216],[51.4477039,25.4263938],[51.630581,25.4263938],[51.630581,25.2216]]]},"attributes":{}},
"contributors":null,"retweet_count":0,"favorite_count":0,
"entities":{"hashtags":[{"text":"Doha","indices":[97,102]},{"text":"qatar","indices":[103,109]}],"trends":[],"urls":[],
"user_mentions":[{"screen_name":"Mo_Farah","name":"Mo Farah","id":83855918,"id_str":"83855918","indices":[48,57]}],"symbols":[],
"media":[{"id":598453717596119040,"id_str":"598453717596119040","indices":[110,132],"media_url":"http://pbs.twimg.com/media/CE4ifEPUIAAhCsG.jpg","media_url_https":"https://pbs.twimg.com/media/CE4ifEPUIAAhCsG.jpg",
"url":"http://t.co/wf8nc0C527","display_url":"pic.twitter.com/wf8nc0C527","expanded_url":"http://twitter.com/mohsin/status/598453736839598080/photo/1","type":"photo",
"sizes":{"small":{"w":340,"h":453,"resize":"fit"},"medium":{"w":600,"h":800,"resize":"fit"},"thumb":{"w":150,"h":150,"resize":"crop"},"large":{"w":768,"h":1024,"resize":"fit"}}}]},
"extended_entities":{"media":[{"id":598453717596119040,"id_str":"598453717596119040","indices":[110,132],"media_url":"http://pbs.twimg.com/media/CE4ifEPUIAAhCsG.jpg","media_url_https":"https://pbs.twimg.com/media/CE4ifEPUIAAhCsG.jpg",
"url":"http://t.co/wf8nc0C527","display_url":"pic.twitter.com/wf8nc0C527","expanded_url":"http://twitter.com/mohsin/status/598453736839598080/photo/1","type":"photo",
"sizes":{"small":{"w":340,"h":453,"resize":"fit"},"medium":{"w":600,"h":800,"resize":"fit"},"thumb":{"w":150,"h":150,"resize":"crop"},"large":{"w":768,"h":1024,"resize":"fit"}}}]},
"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en","timestamp_ms":"1431517464252"}
https://tinyurl.com/aiss15-tweetjson
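Two details of this object are easy to trip over. First, entity `indices` are [start, end) character offsets into `text`, so slicing the text recovers the hashtag, mention or link. Second, the deprecated `geo` field stores [latitude, longitude], whereas `coordinates` follows GeoJSON order [longitude, latitude]. This sketch uses values copied from the sample tweet above. `entity_spans` is a hypothetical helper.

```python
# A trimmed copy of the sample tweet above
tweet = {
    'text': ("Don't get star struck often but I like this guy @Mo_Farah you the man boss! "
             "Much respect to you! #Doha #qatar http://t.co/wf8nc0C527"),
    'entities': {
        'hashtags': [{'text': 'Doha', 'indices': [97, 102]},
                     {'text': 'qatar', 'indices': [103, 109]}],
        'user_mentions': [{'screen_name': 'Mo_Farah', 'indices': [48, 57]}],
        'media': [{'url': 'http://t.co/wf8nc0C527', 'indices': [110, 132]}],
    },
    'geo': {'type': 'Point', 'coordinates': [25.316197, 51.498302]},
    'coordinates': {'type': 'Point', 'coordinates': [51.498302, 25.316197]},
}

def entity_spans(tweet):
    """Yield (entity type, substring of text) using each entity's [start, end) indices."""
    for kind, items in tweet['entities'].items():
        for ent in items:
            start, end = ent['indices']
            yield kind, tweet['text'][start:end]

for kind, span in entity_spans(tweet):
    print(kind, span)   # e.g. "hashtags #Doha"

lon, lat = tweet['coordinates']['coordinates']   # GeoJSON order: [longitude, latitude]
lat2, lon2 = tweet['geo']['coordinates']         # legacy geo order: [latitude, longitude]
print(lat == lat2 and lon == lon2)               # True
```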
http://jsonviewer.stack.hu/
JSON Tweet Object
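import json

# Keep only geotagged tweets (exact coordinates or a tagged place) and write
# one tab-separated line per tweet.
with open("rawTweets.txt", encoding="utf-8") as fin, \
        open("parsedTweets.txt", "w", encoding="utf-8") as fout:
    for line in fin:
        line = line.rstrip()
        if not line:
            # skip blank lines (the stream sends them as keep-alives)
            continue
        jdict = json.loads(line)
        # notices such as {"limit": ...} or {"delete": ...} carry no tweet fields
        if jdict.get('coordinates') is None and jdict.get('place') is None:
            continue
        fields = []
        # Coordinates (GeoJSON order: [longitude, latitude]); left empty when only
        # a place is tagged, so every row keeps the same columns
        if jdict['coordinates'] is not None:
            longitude, latitude = jdict['coordinates']['coordinates']
            fields += [str(longitude), str(latitude)]
        else:
            fields += ['', '']
        # Tweet id
        fields.append(str(jdict['id']))
        # User screen name
        fields.append(jdict['user']['screen_name'])
        # Timestamp (milliseconds since the Unix epoch)
        fields.append(str(jdict['timestamp_ms']))
        # User's language
        fields.append(jdict['user']['lang'])
        # Text, with tabs and line breaks flattened so each tweet stays on one line
        text = jdict['text']
        for ch in ('\r\n', '\n', '\r', '\t'):
            text = text.replace(ch, ' ')
        fields.append(text)
        fout.write('\t'.join(fields) + '\n')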
Extracting individual
fields from JSON
https://tinyurl.com/aiss15-cleanjson
Tab-Separated Values (TSV) format
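A TSV file like the one written by the extraction script can be read back with Python's `csv` module by setting the delimiter to a tab. This sketch assumes the column order that script produces: longitude, latitude, tweet id, screen name, timestamp in ms, user language, text. The column names and the `read_parsed_tweets` helper are hypothetical.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

# Hypothetical column names, in the order the extraction script writes them
COLUMNS = ['longitude', 'latitude', 'id', 'screen_name', 'timestamp_ms', 'lang', 'text']
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def read_parsed_tweets(f):
    """Yield one dict per TSV row, converting the numeric columns."""
    # QUOTE_NONE: fields were written unquoted, so '"' inside a tweet is literal text
    for row in csv.reader(f, delimiter='\t', quoting=csv.QUOTE_NONE):
        rec = dict(zip(COLUMNS, row))
        rec['id'] = int(rec['id'])
        # place-only tweets may have empty coordinate columns
        for key in ('longitude', 'latitude'):
            rec[key] = float(rec[key]) if rec[key] else None
        # exact conversion of epoch milliseconds to an aware UTC datetime
        rec['time'] = EPOCH + timedelta(milliseconds=int(rec['timestamp_ms']))
        yield rec

sample = ('51.498302\t25.316197\t598453736839598080\tmohsin\t1431517464252\ten\t'
          'Much respect to you! #Doha #qatar\n')
for rec in read_parsed_tweets(io.StringIO(sample)):
    print(rec['screen_name'], rec['latitude'], rec['longitude'], rec['time'].isoformat())
# mohsin 25.316197 51.498302 2015-05-13T11:44:24.252000+00:00
```

For a real file, open it with `open("parsedTweets.txt", encoding="utf-8", newline="")`, as the `csv` module recommends.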
Language Model
http://tweetcloud.icodeforlove.com/
workshop 25
twitter 20
religion 17
interaction 12
online 12
dyad 9
research 9
accepted 7
…
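A word cloud like this is essentially a unigram language model, built from word frequencies over a user's tweets. This minimal sketch uses `collections.Counter`. It assumes a simple lowercase tokenizer and a small hypothetical stopword list. Each word's relative frequency, count divided by total tokens, is its maximum-likelihood unigram probability. The example tweets are made up.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; real analyses use a fuller one
STOPWORDS = {'the', 'a', 'an', 'and', 'of', 'to', 'in', 'on', 'for', 'is', 'rt'}
TOKEN_RE = re.compile(r"[#@]?\w+")

def word_counts(tweets):
    """Count lowercase tokens across tweets, dropping URLs and stopwords."""
    counts = Counter()
    for text in tweets:
        text = re.sub(r'https?://\S+', ' ', text.lower())   # links are not words
        counts.update(t for t in TOKEN_RE.findall(text) if t not in STOPWORDS)
    return counts

def unigram_model(counts):
    """Maximum-likelihood unigram probabilities: count(w) / total tokens."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

tweets = ["Workshop on religion and Twitter accepted!",
          "Online interaction in the workshop http://t.co/abc"]
counts = word_counts(tweets)
print(counts.most_common(1))  # [('workshop', 2)]
```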
Activity
http://www.tweetails.com/
Mentions
http://www.tweetails.com/
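Activity and mention summaries like these can be computed directly from collected tweets. This sketch assumes parsed tweet dicts with the `timestamp_ms` and `entities.user_mentions` fields seen in the sample JSON. It buckets tweets by UTC hour of day and counts who is mentioned most. Local hours would additionally need the user's `utc_offset`. `activity_by_hour` and `top_mentions` are hypothetical helpers.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def activity_by_hour(tweets):
    """Count tweets per UTC hour of day (0-23) from the 'timestamp_ms' field."""
    hours = Counter()
    for t in tweets:
        posted = EPOCH + timedelta(milliseconds=int(t['timestamp_ms']))
        hours[posted.hour] += 1
    return hours

def top_mentions(tweets, n=10):
    """Most frequently mentioned screen names (lowercased, as handles are case-insensitive)."""
    mentions = Counter(m['screen_name'].lower()
                       for t in tweets
                       for m in t['entities']['user_mentions'])
    return mentions.most_common(n)

tweets = [{'timestamp_ms': '1431517464252',
           'entities': {'user_mentions': [{'screen_name': 'Mo_Farah'}]}}]
print(activity_by_hour(tweets), top_mentions(tweets))
# Counter({11: 1}) [('mo_farah', 1)]
```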
questions
what are you interested in?
how do you eat/sleep/work/hang out?
how happy are you?
what political opinions do you have?
what outside sources do you link to?
what new emerging topics are you mentioning?
how do you behave?
are you a bot?
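For "what outside sources do you link to?", one approach is to reduce each expanded link to its domain and count the domains. This sketch assumes links have already been expanded, for example from the `expanded_url` fields. `link_domains` is a hypothetical helper, and the example URLs are placeholders.

```python
from collections import Counter
from urllib.parse import urlparse

def link_domains(urls):
    """Count the host names of expanded URLs, ignoring a leading 'www.'."""
    domains = Counter()
    for url in urls:
        host = (urlparse(url).hostname or '').lower()
        if host.startswith('www.'):
            host = host[4:]
        if host:                      # skip strings that are not URLs
            domains[host] += 1
    return domains

urls = ['http://www.example.com/a', 'https://example.com/b', 'http://example.org/c']
print(link_domains(urls).most_common())
# [('example.com', 2), ('example.org', 1)]
```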
network
nodes
edges
User Network
Follower Network
Mention Network
Mention Network
for hashtags
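A mention network treats users as nodes and draws a directed edge from a tweet's author to each user they mention, weighted by how often that happens. The editor's note on the mention-network slide sizes nodes by how often they are mentioned. This stdlib-only sketch assumes parsed tweet dicts with `user.screen_name` and `entities.user_mentions`. A user-hashtag network can be built the same way by using the hashtag entities as targets. `mention_edges`, `in_degree` and the `tweet` builder are hypothetical, and the example accounts are made up.

```python
from collections import Counter

def mention_edges(tweets):
    """Directed, weighted edges: (author, mentioned user) -> number of mentions."""
    edges = Counter()
    for t in tweets:
        author = t['user']['screen_name'].lower()
        for m in t['entities']['user_mentions']:
            target = m['screen_name'].lower()
            if target != author:          # ignore self-mentions
                edges[(author, target)] += 1
    return edges

def in_degree(edges):
    """Weighted in-degree: how often each node is mentioned (node size in such plots)."""
    deg = Counter()
    for (_, target), w in edges.items():
        deg[target] += w
    return deg

def tweet(author, *mentioned):
    # hypothetical builder for minimal tweet dicts used in this example
    return {'user': {'screen_name': author},
            'entities': {'user_mentions': [{'screen_name': s} for s in mentioned]}}

tweets = [tweet('alice', 'candidate_a'), tweet('bob', 'candidate_a', 'candidate_b'),
          tweet('alice', 'candidate_a')]
edges = mention_edges(tweets)
print(edges[('alice', 'candidate_a')], in_degree(edges).most_common(1))
# 2 [('candidate_a', 3)]
```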
questions
how influential are you?
how influential are your connections?
who influences you?
what are people around you like?
do you bring together different communities?
how fast will you know about a piece of news?
are you an opinion leader?
are you a bot?
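"How influential are you?" is often approximated with a centrality score on one of these networks. This is a sketch of PageRank by power iteration on a weighted directed graph, using only the standard library. PageRank is a swapped-in, commonly used measure here, not necessarily the one used in this talk. The damping factor 0.85 is the conventional choice. Nodes with no outgoing edges (users who mention no one) spread their score evenly over all nodes.

```python
def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """PageRank over a dict {(source, target): weight} by power iteration."""
    nodes = sorted({v for edge in edges for v in edge})
    if not nodes:
        return {}
    out_weight = {v: 0.0 for v in nodes}
    for (src, _), w in edges.items():
        out_weight[src] += w
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # score held by dangling nodes is redistributed uniformly
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for (src, dst), w in edges.items():
            new[dst] += damping * rank[src] * w / out_weight[src]
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank

# e.g. weighted mention counts: alice and bob both mention carol
edges = {('alice', 'carol'): 2, ('bob', 'carol'): 1, ('carol', 'alice'): 1}
ranks = pagerank(edges)
print(max(ranks, key=ranks.get))  # carol
```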
resources
https://dev.twitter.com/overview/documentation
https://apigee.com
try it in your favorite language
https://dev.twitter.com/overview/api/twitter-libraries
next
using Twitter data for
real-world political speech mining

Language of Politics on Twitter - 02 Twitter


Editor's Notes

  • #6 https://support.twitter.com/entries/109623
  • #7 MEME an idea, behavior, or style that spreads from person to person within a culture Richard Dawkins
  • #8 http://expandedramblings.com/index.php/march-2013-by-the-numbers-a-few-amazing-twitter-stats/
  • #9 http://www.businessinsider.com/the-top-twitter-markets-in-the-world-2013-11 An amazing 41% of the online population in Saudi Arabia uses Twitter, a higher percentage than anywhere else in the world. Indonesia and the Philippines were close. Almost one-third of online audiences in India use Twitter. India, with 36.6 million people on the social network, is also Twitter's third-largest market right after the U.S. 14 countries — including Japan, the UK, South Africa, and Turkey — have heavier Twitter usage than the social network's home market, the U.S. 84.4 million Chinese Internet users report having used Twitter thanks to various hacks despite the fact that it's blocked in the country (along with just about every other Western social media service).
  • #26 Representational State Transfer (REST)
  • #46 http://mark-kay.net/2014/08/15/network-graph-of-twitter-followers/
  • #47 http://www.alex-hanna.com/tworkshops/lesson-7-mention-network-analysis/ Its complexity is quite remarkable, especially for only representing about 10 minutes of tweets. The larger nodes are those that have been mentioned more. The red edges connect people who have interacted more than three times. So you see a pretty low incidence of interaction in this short time period, but a lot of mentions of elite users. You can also see a bit of polarization developing around the two big nodes in the center, which are Obama and Romney. Once you run these analyses across time, I’m sure more patterns will emerge.
  • #48 http://twittertoolsbook.com/10-awesome-twitter-analytics-visualization-tools/