What do you do with
280 million tweets from the
2016 U.S. election?
Justin Littman
April 25, 2018
Overview
● Outline of the dataset
● Collecting the dataset
○ Social Feed Manager
● Sharing the dataset
○ TweetSets
● Uses of the dataset
● Plans for 2018 U.S. election
Outline of the dataset
Datasets
Filter stream:
● Candidates and key election
hashtags
● Democratic Convention
● GOP Convention
● First presidential debate
● Second presidential debate
● Third presidential debate
● Vice-presidential debate
● Election Day
User timelines:
● Democratic candidates
● Democratic Party
● Republican candidates
● Republican Party
Candidates and key election hashtags
● Track: election2016, election, clinton, kaine, trump, pence
● Follow: @realDonaldTrump, @HillaryClinton, @timkaine,
@mike_pence
● 251,077,140 tweets
● July 13, 2016 - November 10, 2016
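As a rough illustration of how track terms like those above select tweets, the sketch below assumes the commonly documented filter-stream semantics: commas between phrases act as OR, spaces within a phrase act as AND, and matching ignores case. The helper name `matches_track` is hypothetical, and the real stream also matches against URLs and other tweet fields, which this simplification ignores.

```python
import re

# Track terms taken from the slide above.
TRACK = "election2016,election,clinton,kaine,trump,pence"


def matches_track(text: str, track: str) -> bool:
    """Return True if the text matches any comma-separated track phrase.

    Simplified assumption: a phrase matches when every space-separated
    term in it appears as a whole token in the text, ignoring case.
    """
    tokens = set(re.findall(r"\w+", text.lower()))
    for phrase in track.split(","):
        terms = [term.lstrip("#@") for term in phrase.lower().split()]
        if terms and all(term in tokens for term in terms):
            return True
    return False
```

Note that the follow setting works on accounts rather than text, so it is not covered by this sketch.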
Democratic Convention
● Track: philly convention, philadelphia convention,
democratic convention, dnc convention, #demsinphilly,
#dnc, #philly, #demconvention
● Follow: @DemConvention, @TheDemocrats
● 8,340,668 tweets
● July 22, 2016 - July 30, 2016
Democratic Candidates
● Accounts: @BernieSanders, @HillaryClinton,
@MartinOMalley, @SenSanders, @timkaine
● 22,251 tweets
● Collected every week
Tweet types
Most retweeted
Top tweeters
561k tweets, 15 followers
suspended
deleted
deleted
tweets primarily in Greek
577k tweets, last tweeted Nov 7, 2017
126k tweets, 5 followers
deleted
still tweeting (915k) at non-human rates
Top mentions
Where is @timkaine?
Top hashtags
Republicans clearly
out-hashtagged the
Democrats.
Top URLs
spam
spam
gone
gone
gone
gone
Collecting the dataset
Social Feed Manager (SFM)
● Open source software by GW Libraries.
● User interface for collecting, managing & exporting social
media data.
● Goal: Lower the technical barriers for collecting social
media data for academic research and archiving.
● Supports Twitter, Tumblr, Flickr & Sina Weibo.
● Intended for organizations to run for their users.
go.gwu.edu/sfm
Step 1a: Create a collection
Step 1b: Describe the collection
Step 1c: Specify what is to be collected
Step 2: Turn on collecting
Step 3: Monitor collecting
Step 4: Export
Collecting got
off to a rough start ...
Dataset caveats: Holes
Candidates and key election hashtags dataset by week
Family road trip to Michigan &
Canada. We loved Toronto!
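Holes like the one above can be located by counting collected tweets per day and listing the days with none. This is a minimal sketch, assuming each tweet's creation time has already been parsed into a `datetime`; the function name `find_missing_days` is hypothetical, and the slide itself charts by week rather than by day.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def find_missing_days(timestamps, start: date, end: date):
    """Return the dates in [start, end] on which no tweet was collected."""
    per_day = Counter(ts.date() for ts in timestamps)
    total_days = (end - start).days + 1
    all_days = (start + timedelta(days=offset) for offset in range(total_days))
    return [day for day in all_days if per_day[day] == 0]
```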
Dataset caveats: Rate limits
Tweet rate (by minute) from Democratic Convention
Rate limit plateau
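A plateau of this kind can be spotted by bucketing tweets per minute and flagging the minutes that sit at, or very near, the observed maximum. The sketch below assumes parsed `datetime` timestamps; the function names and the 2% tolerance are illustrative assumptions, not part of SFM.

```python
from collections import Counter
from datetime import datetime


def tweets_per_minute(timestamps):
    """Count tweets in each one-minute bucket."""
    return Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)


def plateau_minutes(per_minute, tolerance=0.02):
    """Return the minutes whose count is within `tolerance` of the peak.

    A long run of such minutes suggests the stream was capped rather
    than reflecting the true tweet volume.
    """
    if not per_minute:
        return []
    peak = max(per_minute.values())
    threshold = peak * (1 - tolerance)
    return sorted(minute for minute, count in per_minute.items() if count >= threshold)
```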
Dataset caveats: Non-U.S. election tweets
Sharing the dataset
● Twitter’s developer policies require sharing tweet ids only.
● Complete tweets can be “hydrated” from Twitter API.
○ Hydrating complete dataset takes about a month.
● Tweets that are deleted or from accounts that are
protected, deleted, or suspended are not available.
● Provides a “right to be forgotten” but also:
○ Complicates reproducible research.
○ Makes it difficult to hold politicians accountable and to research bots.
● However, complete tweets are shared within the university.
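The “about a month” hydration estimate can be sanity-checked with back-of-envelope arithmetic. The sketch below assumes a lookup endpoint that accepts 100 tweet ids per request and allows 900 requests per 15-minute window; both figures are assumptions about the API limits at the time and are not stated in the slides. The 280 million total is the approximate figure from the talk title, and the function names are hypothetical.

```python
def chunked(ids, size=100):
    """Yield successive batches of at most `size` tweet ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def hydration_days(total_ids, ids_per_request=100,
                   requests_per_window=900, window_minutes=15):
    """Estimate the days needed to hydrate `total_ids` under the assumed limits."""
    requests = -(-total_ids // ids_per_request)  # ceiling division
    windows = requests / requests_per_window
    return windows * window_minutes / (60 * 24)


if __name__ == "__main__":
    print(f"{hydration_days(280_000_000):.1f} days")
```

Under these assumptions the estimate comes to roughly 32 days, which is consistent with the figure on the slide.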
Sharing the dataset: Harvard’s Dataverse
doi.org/10.7910/DVN/PDI7IN
● Almost 3,000 downloads (as of April 2018).
● Each collection has a README.
→ Interested in collaborating on best practices for sharing
datasets.
Sharing the dataset: TweetSets
● Open source software by GW Libraries.
● Basic idea: Reuse existing datasets, but allow users to filter /
query for only the tweets they need.
● Conforms with Twitter policies.
○ Within university: Complete tweets
○ Public: Tweet ids only
tweetsets.library.gwu.edu
TweetSets step 1: Select source datasets
TweetSets step 2a: Query the tweets in datasets
Also, query by:
● Tweet text
● Hashtags
● Mentions
● Posted by
● In reply to
● Tweet type
● Created at
● URL
● Has image
● Is geotagged
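To make the query fields concrete, here is a minimal sketch of filtering tweets on a few of them. It is not how TweetSets is implemented; it assumes tweets are dictionaries in the classic Twitter JSON layout (`entities.hashtags`, `entities.user_mentions`, `user.screen_name`, `coordinates`), and the function name `tweet_matches` is hypothetical.

```python
def tweet_matches(tweet, hashtag=None, mention=None, posted_by=None, geotagged=None):
    """Return True if the tweet satisfies every criterion that is not None."""
    entities = tweet.get("entities", {})
    if hashtag is not None:
        tags = {h["text"].lower() for h in entities.get("hashtags", [])}
        if hashtag.lower().lstrip("#") not in tags:
            return False
    if mention is not None:
        names = {m["screen_name"].lower() for m in entities.get("user_mentions", [])}
        if mention.lower().lstrip("@") not in names:
            return False
    if posted_by is not None:
        author = tweet.get("user", {}).get("screen_name", "")
        if author.lower() != posted_by.lower().lstrip("@"):
            return False
    if geotagged is not None:
        if (tweet.get("coordinates") is not None) != geotagged:
            return False
    return True
```

Criteria left as `None` are ignored, so the same function covers single-field and combined queries.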
TweetSets step 2b: View summary statistics
TweetSets step 2c: View sample tweets
TweetSets step 3: Create a dataset
TweetSets step 4: Export
Uses of the dataset
Academic research
● Clare H. Liu, “Applications of Twitter Emotion Detection for
Stock Market Prediction.” Master’s thesis at MIT.
● David Anuta, Josh Churchin & Jiebo Luo, “Election Bias:
Comparing Polls and Twitter in the 2016 U.S. Election.”
● Sicheng Zhao, Yue Gao, Guiguang Ding & Tat-Seng
Chua. “Real-Time Multimedia Social Event Detection in
Microblog.” IEEE Transactions on Cybernetics.
● Ahsen J. Uppal & H. Howie Huang, “Event Prediction from
Dynamic Communities in Social Networks.”
Journalists
● Significant interest in dataset after release of list of IRA
accounts by Senate Intelligence Committee.
● We identified 36,210 tweets from these accounts.
● Sharing these deleted tweets violates Twitter policy.
● University weighed public interest vs. risk of losing access
to Twitter API for GW researchers.
● See nbcnews.com/tech/social-media/now-available-more-200-000-deleted-russian-troll-tweets-n844731
Deleted tweets research
● With Catie Bailard (School of Media & Public Affairs,
GWU) & Andy Hoagland (data scientist)
● Possible research questions:
○ What is the substantive content of deleted vs. extant tweets about
the candidate(s)?
○ What was the relative distribution of deleted / extant tweets in
terms of the proportion that were pro- / anti- Hillary / Trump?
○ Were tweets with a certain type of content more likely to be
deleted than those with other types of content?
Deleted tweets research
● Possible research questions:
○ What portion of tweets deleted by Twitter were likely-bots vs.
likely-humans? Were there differences in the substantive content
of deleted tweets generated by likely-humans versus likely-bots?
Deleted tweets research
● 92 million tweets from between October 8th and November 8th,
2016 that contain “Clinton,” “Trump,” “Donald,” “Hillary,”
“@realDonaldTrump” or “@HillaryClinton”.
● Split deleted tweets from extant tweets.
○ 22 million tweets (24%) were deleted
● Created 10% sample of deleted tweets & 1.5% sample of
extant tweets.
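With the rounded figures above, a 10% sample of the 22 million deleted tweets is about 2.2 million tweets, and a 1.5% sample of the remaining 70 million or so extant tweets is about 1.05 million. The slides do not say how the samples were drawn; the sketch below assumes simple random sampling without replacement, with a fixed seed for reproducibility, and the function name `sample_ids` is hypothetical.

```python
import random


def sample_ids(tweet_ids, rate, seed=0):
    """Draw a reproducible simple random sample of about `rate` of the ids."""
    rng = random.Random(seed)
    sample_size = round(len(tweet_ids) * rate)
    return rng.sample(tweet_ids, sample_size)
```

Here `tweet_ids` is expected to be a list, and the same seed always yields the same sample.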
Deleted tweets research
● For each tweet in deleted tweets sample, determined
reason for deletion.
○ For example: user suspended, original user suspended, tweet
deleted
● For each user in each of the samples, ran bot detection.
○ Botometer, using API.
○ Used tweets from full dataset, rather than live Twitter.
○ Not all users had enough tweets.
Deleted tweets research
● Performing content analysis of 3,000 tweets.
○ Coding for overall “gist” (anti-Trump, anti-Hillary, pro-Trump,
and/or pro-Hillary), specific subject matter (e.g., criticizes
candidate’s personal qualities or past actions, calls-to-action),
identity (e.g., race, gender), and more.
○ Three humans code each tweet using DiscoverText.
○ Average Krippendorff’s alpha score: 0.73.
● Will use a neural network to generalize the coding to the
larger dataset.
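For readers unfamiliar with the agreement measure, the following is a minimal sketch of Krippendorff's alpha for nominal labels, computed from a coincidence matrix as alpha = 1 - D_o / D_e. It assumes each coder assigns exactly one label per tweet for a given code; the coding scheme above allows several codes per tweet, so in practice alpha would be computed per code and averaged. This is not the DiscoverText implementation, and the function name is hypothetical.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Compute nominal alpha; `units` holds one list of coder labels per item."""
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # items coded by fewer than two coders are skipped
        # Each ordered pair of labels from different coders adds 1 / (m - 1).
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is used")
    return 1 - (n - 1) * observed / expected
```

A value of 1 means perfect agreement and 0 means agreement no better than chance.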
Delete reasons
Botometer scores for deleted tweets
Plans for 2018 election
Plans for #election2018: Currently collecting
● Neutral: #Nov2018, #Election2018, #Nov18, #Election18,
#Midterms2018, #Midterms18, #Midterm2018,
#Midterm18, #midtermelection, #election, #vote, 2018
election, election 2018, midterm election
● Partisan Republican: #trump, #maga, #gop, #republican,
#trumptrain, #kag
● Partisan Democrat: #bluewave2018, #bluewave18,
#bluewave, #democrats, #resist, #resistance
Plans for #election2018: Currently collecting
● Top accounts
○ 5,000+ accounts extracted from the neutral collection because each
is a top tweeter, retweeted account, or mentioned account.
○ New accounts are added every week from a rolling 2 weeks of tweets.
○ Already seeing significant churn as accounts are suspended.
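The account-selection step can be sketched as three frequency counts over a window of tweets. This is an illustration only: it assumes tweets in the classic Twitter JSON layout (`user.screen_name`, `retweeted_status`, `entities.user_mentions`), the cutoff `n` is arbitrary, and the function name `top_accounts` is hypothetical.

```python
from collections import Counter


def top_accounts(tweets, n=10):
    """Return the union of the top-n tweeters, retweeted and mentioned accounts."""
    tweeters, retweeted, mentioned = Counter(), Counter(), Counter()
    for tweet in tweets:
        tweeters[tweet["user"]["screen_name"]] += 1
        if "retweeted_status" in tweet:
            retweeted[tweet["retweeted_status"]["user"]["screen_name"]] += 1
        for mention in tweet.get("entities", {}).get("user_mentions", []):
            mentioned[mention["screen_name"]] += 1

    selected = set()
    for counter in (tweeters, retweeted, mentioned):
        selected.update(name for name, _ in counter.most_common(n))
    return selected
```

Running this weekly over the most recent two weeks of tweets and adding any new names to the collection would approximate the rolling update described above.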
Plans for #election2018:
● Individual candidates
● Local parties
● Local hashtags
→ Currently in discussions with a news organization to
collaborate on identifying these accounts / hashtags.
→ Thinking about how to “cut through noise” to collect tweets
from citizens.
→ Working on contemporaneous web archiving of linked web
resources and media.
#election2018: Topic Tracker
bit.ly/2J0EKFj
Questions?
More info:
● go.gwu.edu/gwsfm
● @SocialFeedMgr
● sfm@gwu.edu
Or:
● @justin_littman
● justinlittman@gwu.edu

VIP 7001035870 Find & Meet Hyderabad Call Girls Dilsukhnagar high-profile Cal...
 
VIP Kolkata Call Girl Dum Dum 👉 8250192130 Available With Room
VIP Kolkata Call Girl Dum Dum 👉 8250192130  Available With RoomVIP Kolkata Call Girl Dum Dum 👉 8250192130  Available With Room
VIP Kolkata Call Girl Dum Dum 👉 8250192130 Available With Room
 
How is AI changing journalism? (v. April 2024)
How is AI changing journalism? (v. April 2024)How is AI changing journalism? (v. April 2024)
How is AI changing journalism? (v. April 2024)
 
Networking in the Penumbra presented by Geoff Huston at NZNOG
Networking in the Penumbra presented by Geoff Huston at NZNOGNetworking in the Penumbra presented by Geoff Huston at NZNOG
Networking in the Penumbra presented by Geoff Huston at NZNOG
 
Russian Call Girls in Kolkata Samaira 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Samaira 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Samaira 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Samaira 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
AlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with FlowsAlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with Flows
 

What do you do with 280 million tweets from the 2016 U.S. election?

  • 1. What do you do with 280 million tweets from the 2016 U.S. election? Justin Littman April 25, 2018
  • 2. Overview ● Outline of the dataset ● Collecting the dataset ○ Social Feed Manager ● Sharing the dataset ○ TweetSets ● Uses of the dataset ● Plans for 2018 U.S. election
  • 3. Outline of the dataset
  • 4. Datasets Filter stream: ● Candidates and key election hashtags ● Democratic Convention ● GOP Convention ● First presidential debate ● Second presidential debate ● Third presidential debate ● Vice-presidential debate ● Election Day User timelines: ● Democratic candidates ● Democratic Party ● Republican candidates ● Republican Party
  • 5. Candidates and key election hashtags ● Track: election2016, election, clinton, kaine, trump, pence ● Follow: @realDonaldTrump, @HillaryClinton, @timkaine, @mike_pence ● 251,077,140 tweets ● July 13, 2016 - November 10, 2016
  • 6. Democratic Convention ● Track: philly convention, philadelphia convention, democratic convention, dnc convention, #demsinphilly, #dnc, #philly, #demconvention ● Follow: @DemConvention, @TheDemocrats ● 8,340,668 tweets ● July 22, 2016 - July 30, 2016
  • 7. Democratic Candidates ● Accounts: @BernieSanders, @HillaryClinton, @MartinOMalley, @SenSanders, @timkaine ● 22,251 tweets ● Collected every week
  • 10. Top tweeters 561k tweets, 15 followers suspended deleted deleted tweets primarily in Greek 577k tweets, last tweeted Nov 7, 2017 126k tweets, 5 followers deleted still tweeting (915k) at non-human rates
  • 11. Top mentions Where is @timkaine?
  • 15. Social Feed Manager (SFM) ● Open source software by GW Libraries. ● User interface for collecting, managing & exporting social media data. ● Goal: Lower the technical barriers for collecting social media data for academic research and archiving. ● Supports Twitter, Tumblr, Flickr & Sina Weibo. ● Intended for organizations to run for their users. go.gwu.edu/sfm
  • 16. Step 1a: Create a collection
  • 17. Step 1b: Describe the collection
  • 18. Step 1c: Specify what is to be collected
  • 19. Step 2: Turn on collecting
  • 20. Step 3: Monitor collecting
  • 22. Collecting got off to a rough start ...
  • 23. Dataset caveats: Holes Candidates and key election hashtags dataset by week Family road trip to Michigan & Canada. We loved Toronto!
  • 24. Dataset caveats: Rate limits Tweet rate (by minute) from Democratic Convention Rate limit plateau
  • 25. Dataset caveats: Non-U.S. election tweets
  • 27. Sharing the dataset ● Twitter’s developer policies require sharing tweet ids only. ● Complete tweets can be “hydrated” from the Twitter API. ○ Hydrating the complete dataset takes about a month. ● Tweets that are deleted or from accounts that are protected, deleted, or suspended are not available. ● Provides a “right to be forgotten” but also: ○ Complicates reproducible research. ○ Makes it difficult to hold politicians accountable or to research bots. ● However, complete tweets can be shared within the university.
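The "about a month" figure for hydration can be sanity-checked with a rough calculation. The sketch below is not from the slides: it assumes a lookup endpoint that accepts 100 tweet ids per request and allows 900 requests per 15-minute window, and the helper names are hypothetical.

```python
import math
from typing import Iterator, List

# Assumptions (not stated on the slides): ids accepted per lookup
# request and requests allowed per rate-limit window.
IDS_PER_REQUEST = 100
REQUESTS_PER_WINDOW = 900
WINDOW_MINUTES = 15


def batches(tweet_ids: List[int], size: int = IDS_PER_REQUEST) -> Iterator[List[int]]:
    """Yield consecutive chunks of tweet ids, one chunk per lookup request."""
    for start in range(0, len(tweet_ids), size):
        yield tweet_ids[start:start + size]


def estimated_days(total_ids: int) -> float:
    """Estimate wall-clock days to hydrate total_ids under the assumed limits."""
    requests = math.ceil(total_ids / IDS_PER_REQUEST)
    windows = math.ceil(requests / REQUESTS_PER_WINDOW)
    return windows * WINDOW_MINUTES / (60 * 24)


if __name__ == "__main__":
    print(f"{estimated_days(280_000_000):.1f} days")
```

Under these assumptions, 280 million ids come out to a little over 32 days of continuous requests, which is in line with the estimate on the slide.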
  • 28. Sharing the dataset: Harvard’s Dataverse doi.org/10.7910/DVN/PDI7IN
  • 29. Sharing the dataset: Harvard’s Dataverse ● Almost 3,000 downloads (as of mid-2018). ● Each collection has a README. → Interested in collaborating on best practices for sharing datasets.
  • 30. Sharing the dataset: TweetSets ● Open source software by GW Libraries. ● Basic idea: Reuse existing datasets, but allow users to filter / query for only the tweets that are needed. ● Conforms with Twitter policies. ○ Within university: Complete tweets ○ Public: Tweet ids only tweetsets.library.gwu.edu
  • 31. TweetSets step 1: Select source datasets
  • 32. TweetSets step 2a: Query the tweets in datasets
  • 33. TweetSets step 2a: Query the tweets in datasets Also, query by: ● Tweet text ● Hashtags ● Mentions ● Posted by ● In reply to ● Tweet type ● Created at ● URL ● Has image ● Is geotagged
  • 34. TweetSets step 2b: View summary statistics
  • 35. TweetSets step 2c: View sample tweets
  • 36. TweetSets step 3: Create a dataset
  • 38. Uses of the dataset
  • 39. Academic research ● Clare H. Liu, “Applications of Twitter Emotion Detection for Stock Market Prediction.” Master’s thesis at MIT. ● David Anuta, Josh Churchin & Jiebo Luo, “Election Bias: Comparing Polls and Twitter in the 2016 U.S. Election.” ● Sicheng Zhao, Yue Gao, Guiguang Ding & Tat-Seng Chua. “Real-Time Multimedia Social Event Detection in Microblog.” IEEE Transactions on Cybernetics. ● Ahsen J. Uppal & H. Howie Huang, “Event Prediction from Dynamic Communities in Social Networks.”
  • 40. Journalists ● Significant interest in dataset after release of list of IRA accounts by Senate Intelligence Committee. ● We identified 36,210 tweets from these accounts. ● Sharing these deleted tweets violates Twitter policy. ● University weighed public interest vs. risk of losing access to Twitter API for GW researchers. ● See nbcnews.com/tech/social-media/now-available-more-200-000-deleted-russian-troll-tweets-n844731
  • 41. Deleted tweets research ● With Catie Bailard (School of Media & Public Affairs, GWU) & Andy Hoagland (data scientist) ● Possible research questions: ○ What is the substantive content of deleted vs. extant tweets about the candidate(s)? ○ What was the relative distribution of deleted / extant tweets in terms of the proportion that were pro- / anti- Hillary / Trump? ○ Were tweets with a certain type of content more likely to be deleted than those with other types of content?
  • 42. Deleted tweets research ● Possible research questions: ○ What portion of tweets deleted by Twitter were likely-bots vs. likely-humans? Were there differences in the substantive content of deleted tweets generated by likely-humans versus likely-bots?
  • 43. Deleted tweets research ● 92 million tweets posted between October 8th and November 8th, 2016 which contain “Clinton,” “Trump,” “Donald,” “Hillary,” “@realDonaldTrump” or “@HillaryClinton”. ● Split deleted tweets from extant tweets. ○ 22 million tweets (24%) were deleted. ● Created 10% sample of deleted tweets & 1.5% sample of extant tweets.
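A minimal sketch of the split-and-sample step described on this slide. The slides do not say how deletion was detected; this sketch assumes a tweet counts as deleted when its id is no longer returned by re-hydration. The function names and the fixed seed are illustrative, not the project's actual code.

```python
import random
from typing import Iterable, List, Set, Tuple


def split_deleted(all_ids: Iterable[int],
                  available_ids: Set[int]) -> Tuple[List[int], List[int]]:
    """Split ids into (deleted, extant).

    Assumption: an id missing from available_ids (the ids that could
    still be hydrated) is treated as deleted.
    """
    deleted: List[int] = []
    extant: List[int] = []
    for tweet_id in all_ids:
        if tweet_id in available_ids:
            extant.append(tweet_id)
        else:
            deleted.append(tweet_id)
    return deleted, extant


def sample_fraction(ids: List[int], fraction: float, seed: int = 0) -> List[int]:
    """Draw a simple random sample of roughly `fraction` of ids, without replacement."""
    k = round(len(ids) * fraction)
    # A seeded generator keeps the sample reproducible across runs.
    return random.Random(seed).sample(ids, k)


if __name__ == "__main__":
    all_ids = list(range(1000))
    available = set(range(760))  # toy data: 24% of ids are unavailable
    deleted, extant = split_deleted(all_ids, available)
    print(len(sample_fraction(deleted, 0.10)), len(sample_fraction(extant, 0.015)))
```

Sampling the two groups at different rates (10% vs. 1.5%) keeps the deleted and extant samples closer in size even though extant tweets are roughly three times as numerous.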
  • 44. Deleted tweets research ● For each tweet in deleted tweets sample, determined reason for deletion. ○ For example: user suspended, original user suspended, tweet deleted ● For each user in each of the samples, ran bot detection. ○ Botometer, using API. ○ Used tweets from full dataset, rather than live Twitter. ○ Not all users had enough tweets.
  • 45. Deleted tweets research ● Performing content analysis of 3000 tweets. ○ Coding for overall “gist” (anti-Trump, anti-Hillary, pro-Trump, and/or pro-Hillary), specific subject matter (e.g., criticizes candidate’s personal qualities or past actions, calls-to-action), identity (e.g., race, gender), more. ○ Three humans code each tweet using DiscoverText. ○ Average Krippendorff’s Alpha score 0.73. ● Will use neural network machine learning to generalize to larger dataset.
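For readers unfamiliar with the agreement score on this slide, here is a sketch of Krippendorff's alpha for nominal labels, computed from a coincidence matrix. It illustrates the statistic only: the slides do not show how the project's 0.73 was calculated, and the function name and label strings are made up for the example.

```python
from collections import Counter
from itertools import permutations
from typing import Hashable, Optional, Sequence


def krippendorff_alpha_nominal(
        units: Sequence[Sequence[Optional[Hashable]]]) -> float:
    """Krippendorff's alpha for nominal data.

    Each element of `units` holds the labels that the coders gave one
    tweet; None marks a missing rating.
    """
    coincidences: Counter = Counter()
    for ratings in units:
        values = [v for v in ratings if v is not None]
        m = len(values)
        if m < 2:
            continue  # a unit with fewer than two ratings is not pairable
        # Each ordered pair of ratings from different coders adds 1/(m-1).
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    totals: Counter = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    if n < 2:
        raise ValueError("alpha is undefined: no pairable ratings")
    expected = sum(totals[a] * totals[b]
                   for a in totals for b in totals if a != b) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined: only one category was used")
    return 1.0 - observed / expected


if __name__ == "__main__":
    coded = [
        ["anti-Trump", "anti-Trump", "anti-Trump"],
        ["pro-Trump", "pro-Trump", "anti-Trump"],
        ["pro-Hillary", "pro-Hillary", None],
    ]
    print(round(krippendorff_alpha_nominal(coded), 3))
```

An alpha of 1 means perfect agreement and 0 means agreement no better than chance.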
  • 47. Botometer scores for deleted tweets
  • 48. Plans for 2018 election
  • 49. Plans for #election2018: Currently collecting ● Neutral: #Nov2018, #Election2018, #Nov18, #Election18, #Midterms2018, #Midterms18, #Midterm2018, #Midterm18, #midtermelection, #election, #vote, 2018 election, election 2018, midterm election ● Partisan Republican: #trump, #maga, #gop, #republican, #trumptrain, #kag ● Partisan Democrat: #bluewave2018, #bluewave18, #bluewave, #democrats, #resist, #resistance
  • 50. Plans for #election2018: Currently collecting ● Top accounts ○ 5,000+ accounts extracted from the neutral collection because each was a top tweeter, retweeted account, or mentioned account. ○ Add new accounts every week from a rolling 2 weeks of tweets. ○ Already seeing significant churn as accounts are suspended.
  • 51. Plans for #election2018: ● Individual candidates ● Local parties ● Local hashtags → Currently in discussions with a news organization to collaborate on identifying these accounts / hashtags. → Thinking about how to “cut through noise” to collect tweets from citizens. → Working on contemporaneous web archiving of linked web resources and media.
  • 54. Questions? More info: ● go.gwu.edu/gwsfm ● @SocialFeedMgr ● sfm@gwu.edu Or: ● @justin_littman ● justinlittman@gwu.edu