What do you do with 280 million tweets from the 2016 U.S. election?
Justin Littman
April 25, 2018
Overview
● Outline of the dataset
● Collecting the dataset
○ Social Feed Manager
● Sharing the dataset
○ TweetSets
● Uses of the dataset
● Plans for 2018 U.S. election
Outline of the dataset
Datasets
Filter stream:
● Candidates and key election
hashtags
● Democratic Convention
● GOP Convention
● First presidential debate
● Second presidential debate
● Third presidential debate
● Vice-presidential debate
● Election Day
User timelines:
● Democratic candidates
● Democratic Party
● Republican candidates
● Republican Party
Candidates and key election hashtags
● Track: election2016, election, clinton, kaine, trump, pence
● Follow: @realDonaldTrump, @HillaryClinton, @timkaine,
@mike_pence
● 251,077,140 tweets
● July 13, 2016 - November 10, 2016
Democratic Convention
● Track: philly convention, philadelphia convention,
democratic convention, dnc convention, #demsinphilly,
#dnc, #philly, #demconvention
● Follow: @DemConvention, @TheDemocrats
● 8,340,668 tweets
● July 22, 2016 - July 30, 2016
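The filter-stream collections above are defined by `track` phrases. As we understand Twitter's v1.1 streaming documentation, commas separate alternative phrases (logical OR), and the space-separated terms inside one phrase must all appear, in any order and ignoring case (logical AND). The sketch below approximates that matching on tweet text alone. It assumes simple whitespace tokenization and that a bare term also matches its `#`/`@` form. Twitter's real matcher also checks URLs and other fields, so treat this as illustrative; all function names are hypothetical.

```python
def parse_track(track: str) -> list[list[str]]:
    """Split a track string into phrases (OR'd), each a list of terms (AND'd)."""
    return [phrase.lower().split() for phrase in track.split(",") if phrase.strip()]


def tokenize(text: str) -> set[str]:
    """Approximate tokenizer: lower-case, split on whitespace, trim punctuation.

    A leading '#' or '@' is kept so hashtag terms can be matched exactly.
    """
    tokens = set()
    for raw in text.lower().split():
        tok = raw.strip(".,!?;:\"'()[]")
        if tok:
            tokens.add(tok)
    return tokens


def term_matches(term: str, tokens: set[str]) -> bool:
    if term.startswith(("#", "@")):
        return term in tokens  # '#dnc' only matches the hashtag itself
    # Assumption: a bare term also matches its hashtag/mention form ('dnc' ~ '#dnc')
    return bool({term, "#" + term, "@" + term} & tokens)


def matches_track(text: str, track: str) -> bool:
    """True if any phrase has all of its terms present in the text."""
    tokens = tokenize(text)
    return any(all(term_matches(t, tokens) for t in phrase) for phrase in parse_track(track))


DNC_TRACK = ("philly convention, philadelphia convention, democratic convention, "
             "dnc convention, #demsinphilly, #dnc, #philly, #demconvention")
```

Under these rules a tweet like "The convention in Philly starts today" matches `philly convention`, while "Philly cheesesteaks are great" does not. It also suggests why broad single-word terms such as `election` or `trump` pull in tweets unrelated to the U.S. race.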
Democratic Candidates
● Accounts: @BernieSanders, @HillaryClinton,
@MartinOMalley, @SenSanders, @timkaine
● 22,251 tweets
● Collected every week
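The user-timeline collections were harvested weekly. One common way to do that incrementally is to remember the highest tweet id seen for each account and pass it as `since_id` to the v1.1 `statuses/user_timeline` endpoint on the next run. This is an assumption for illustration, not a description of SFM's internals. `fetch_timeline`-style callables and `fake_fetch` below are hypothetical stand-ins for the real API call.

```python
from typing import Callable, Dict, List, Optional

Tweet = dict
# Hypothetical fetch signature: (screen_name, since_id) -> tweets newer than since_id
FetchFn = Callable[[str, Optional[int]], List[Tweet]]


def harvest_week(accounts: List[str], state: Dict[str, int], fetch: FetchFn) -> List[Tweet]:
    """Collect only tweets newer than the previous run, updating each account's high-water mark."""
    collected: List[Tweet] = []
    for name in accounts:
        since_id = state.get(name)  # None on the very first harvest
        tweets = fetch(name, since_id)
        if tweets:
            state[name] = max(t["id"] for t in tweets)
        collected.extend(tweets)
    return collected


def fake_fetch(name: str, since_id: Optional[int]) -> List[Tweet]:
    """Stand-in for the API: a fixed timeline filtered by since_id."""
    timeline = [{"id": 1, "user": name}, {"id": 5, "user": name}, {"id": 9, "user": name}]
    return [t for t in timeline if since_id is None or t["id"] > since_id]
```

Persisting `state` between runs is what makes the weekly harvest cheap: the second run with the same timeline returns nothing new.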
Tweet types
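A tweet-type breakdown like this one can be computed from the v1.1 tweet JSON fields `retweeted_status`, `quoted_status` / `is_quote_status`, and `in_reply_to_status_id`. The four categories and their precedence below are our assumption about how the chart was bucketed, not the author's stated method.

```python
from collections import Counter


def tweet_type(tweet: dict) -> str:
    """Classify a Twitter API v1.1 tweet as retweet, quote, reply, or original."""
    if "retweeted_status" in tweet:
        return "retweet"  # checked first: a retweet of a quote also carries quoted_status
    if "quoted_status" in tweet or tweet.get("is_quote_status"):
        return "quote"
    if tweet.get("in_reply_to_status_id") is not None:
        return "reply"
    return "original"


sample = [{"retweeted_status": {}}, {"in_reply_to_status_id": 42}, {"text": "hi"}]
print(Counter(tweet_type(t) for t in sample))
```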
Most retweeted
Top tweeters
● 561k tweets, 15 followers
● suspended
● deleted
● deleted
● tweets primarily in Greek
● 577k tweets, last tweeted Nov 7, 2017
● 126k tweets, 5 followers
● deleted
● still tweeting (915k tweets) at non-human rates
Top mentions
Where is @timkaine?
Top hashtags
Republicans clearly out-hashtagged the Democrats.
Top URLs
● 2 spam
● 4 gone
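The "top" lists above (tweeters, mentions, hashtags, URLs, retweeted tweets) are frequency counts over the tweet JSON. A minimal sketch with `collections.Counter`, assuming v1.1 field names (`user.screen_name`, `entities.hashtags`, `entities.user_mentions`, `entities.urls[].expanded_url`). For simplicity it reads only the top-level `entities`; streamed tweets longer than 140 characters keep their full entities under `extended_tweet`, which this sketch ignores. `top_entities` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def top_entities(tweets: Iterable[dict], n: int = 10) -> Dict[str, List[Tuple[str, int]]]:
    """Count top tweeters, mentions, hashtags, URLs and retweeted tweets."""
    counters = {name: Counter() for name in ("tweeters", "mentions", "hashtags", "urls", "retweeted")}
    for t in tweets:
        counters["tweeters"][t["user"]["screen_name"].lower()] += 1
        entities = t.get("entities", {})
        # Hashtags and screen names are case-insensitive on Twitter, so normalize first
        counters["hashtags"].update(h["text"].lower() for h in entities.get("hashtags", []))
        counters["mentions"].update(m["screen_name"].lower() for m in entities.get("user_mentions", []))
        counters["urls"].update(u["expanded_url"] for u in entities.get("urls", []) if u.get("expanded_url"))
        if "retweeted_status" in t:
            counters["retweeted"][t["retweeted_status"]["id_str"]] += 1
    return {name: c.most_common(n) for name, c in counters.items()}
```

Normalizing case matters for hashtags: `#MAGA` and `#maga` are the same tag but would otherwise be counted separately.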
Collecting the dataset
Social Feed Manager (SFM)
● Open source software by GW Libraries.
● User interface for collecting, managing & exporting social
media data.
● Goal: Lower the technical barriers for collecting social
media data for academic research and archiving.
● Supports Twitter, Tumblr, Flickr & Sina Weibo.
● Intended for organizations to run for their users.
go.gwu.edu/sfm
Step 1a: Create a collection
Step 1b: Describe the collection
Step 1c: Specify what is to be collected
Step 2: Turn on collecting
Step 3: Monitor collecting
Step 4: Export
Collecting got off to a rough start ...
Dataset caveats: Holes
Candidates and key election hashtags dataset by week
Family road trip to Michigan & Canada. We loved Toronto!
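Holes like this one show up once tweets are bucketed by time. A minimal sketch that parses the v1.1 `created_at` format and lists the UTC days with no tweets between the first and last tweet. It only detects complete gaps; a partial outage would need a threshold instead of zero, and `missing_days` is a hypothetical helper.

```python
from collections import Counter
from datetime import date, datetime, timedelta
from typing import Iterable, List

# v1.1 created_at format, e.g. "Wed Jul 13 14:05:00 +0000 2016"
TWITTER_TIME = "%a %b %d %H:%M:%S %z %Y"


def missing_days(created_ats: Iterable[str]) -> List[date]:
    """Return calendar days (UTC) between the first and last tweet that have no tweets."""
    per_day = Counter(datetime.strptime(s, TWITTER_TIME).date() for s in created_ats)
    if not per_day:
        return []
    day, last = min(per_day), max(per_day)
    gaps = []
    while day <= last:
        if per_day[day] == 0:  # Counter returns 0 for absent keys without storing them
            gaps.append(day)
        day += timedelta(days=1)
    return gaps
```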
Dataset caveats: Rate limits
Tweet rate (by minute) from Democratic Convention
Rate limit plateau
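The plateau reflects the cap Twitter placed on filter-stream volume (commonly described as roughly 1% of all tweets). When that cap is hit, the v1.1 stream interleaves limit notices of the form `{"limit": {"track": N}}`, where, as we understand it, `N` is a running count of undelivered tweets for the connection. Assuming the raw stream was saved as one JSON message per line, a sketch like this recovers both the per-minute rate and the highest undelivered count; `summarize_stream` is hypothetical.

```python
import json
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple

TWITTER_TIME = "%a %b %d %H:%M:%S %z %Y"


def summarize_stream(lines: Iterable[str]) -> Tuple[Counter, int]:
    """Tweets per minute plus the highest undelivered count reported by limit notices."""
    per_minute: Counter = Counter()
    undelivered = 0
    for line in lines:
        if not line.strip():
            continue  # the stream sends blank keep-alive lines
        msg = json.loads(line)
        if "limit" in msg:
            # Limit notices carry a running total, so keep the maximum seen
            undelivered = max(undelivered, msg["limit"].get("track", 0))
        elif "created_at" in msg:
            ts = datetime.strptime(msg["created_at"], TWITTER_TIME)
            per_minute[ts.replace(second=0)] += 1
    return per_minute, undelivered
```

A flat top in `per_minute` together with a rising undelivered count is a reasonable sign that the collection hit the cap rather than that conversation actually leveled off.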
Dataset caveats: Non-U.S. election tweets
Sharing the dataset
● Twitter’s developer policies require sharing tweet ids only.
● Complete tweets can be “hydrated” from Twitter API.
○ Hydrating complete dataset takes about a month.
● Tweets that are deleted or from accounts that are
protected, deleted, or suspended are not available.
● Provides a “right to be forgotten” but also:
○ Complicates reproducible research
○ Makes it difficult to hold politicians accountable or to research bots.
● However, we share complete tweets within the university.
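The "about a month" estimate for hydration follows from rate-limit arithmetic. The defaults below are assumptions based on the v1.1 `statuses/lookup` endpoint (100 ids per request, 900 requests per 15-minute window with user authentication); check current documentation before relying on them. The estimate ignores network latency and retries, and `hydration_days` is a hypothetical helper.

```python
import math


def hydration_days(n_ids: int, ids_per_request: int = 100,
                   requests_per_window: int = 900, window_minutes: int = 15) -> float:
    """Estimate wall-clock days to hydrate n_ids tweet ids under a fixed rate limit."""
    requests = math.ceil(n_ids / ids_per_request)
    windows = math.ceil(requests / requests_per_window)
    return windows * window_minutes / (60 * 24)


print(round(hydration_days(280_000_000), 1))  # 32.4
```

For 280 million ids that is 2.8 million requests, or 3,112 fifteen-minute windows, a little over 32 days.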
Sharing the dataset: Harvard’s Dataverse
doi.org/10.7910/DVN/PDI7IN
Sharing the dataset: Harvard’s Dataverse
● Almost 3,000 downloads (as of mid-2018).
● Each collection has a README.
→ We are interested in collaborating on best practices for sharing datasets.
Sharing the dataset: TweetSets
● Open source software by GW Libraries.
● Basic idea: Reuse existing datasets, but allow users to filter / query for only the tweets they need.
● Conforms to Twitter policies:
○ Within university: Complete tweets
○ Public: Tweet ids only
tweetsets.library.gwu.edu
TweetSets step 1: Select source datasets
TweetSets step 2a: Query the tweets in datasets
Also, query by:
● Tweet text
● Hashtags
● Mentions
● Posted by
● In reply to
● Tweet type
● Created at
● URL
● Has image
● Is geotagged
TweetSets step 2b: View summary statistics
TweetSets step 2c: View sample tweets
TweetSets step 3: Create a dataset
TweetSets step 4: Export
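The TweetSets workflow amounts to filtering a source dataset and exporting only tweet ids. The sketch below shows that idea over in-memory v1.1 tweet JSON; it is not TweetSets' implementation, and its field interpretations are assumptions: "has image" as any `extended_entities.media` item of type `photo`, and "is geotagged" as a non-null `coordinates` field. `select_tweet_ids` is a hypothetical name.

```python
from typing import Iterable, Iterator, Optional


def select_tweet_ids(tweets: Iterable[dict], *, hashtag: Optional[str] = None,
                     posted_by: Optional[str] = None, has_image: Optional[bool] = None,
                     is_geotagged: Optional[bool] = None) -> Iterator[str]:
    """Yield id_str for tweets matching every criterion that is not None."""
    for t in tweets:
        tags = {h["text"].lower() for h in t.get("entities", {}).get("hashtags", [])}
        if hashtag is not None and hashtag.lower().lstrip("#") not in tags:
            continue
        if posted_by is not None and t["user"]["screen_name"].lower() != posted_by.lower().lstrip("@"):
            continue
        media = t.get("extended_entities", {}).get("media", [])
        if has_image is not None and any(m.get("type") == "photo" for m in media) != has_image:
            continue
        if is_geotagged is not None and (t.get("coordinates") is not None) != is_geotagged:
            continue
        yield t["id_str"]  # public exports share ids only, per Twitter policy
```

Writing the result one id per line produces a file that others can hydrate themselves, which keeps a public export within Twitter's terms.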
Uses of the dataset
Academic research
● Clare H. Liu, “Applications of Twitter Emotion Detection for
Stock Market Prediction.” Master's thesis, MIT.
● David Anuta, Josh Churchin & Jiebo Luo, “Election Bias:
Comparing Polls and Twitter in the 2016 U.S. Election.”
● Sicheng Zhao, Yue Gao, Guiguang Ding & Tat-Seng
Chua. “Real-Time Multimedia Social Event Detection in
Microblog.” IEEE Transactions on Cybernetics.
● Ahsen J. Uppal & H. Howie Huang, “Event Prediction from
Dynamic Communities in Social Networks.”
Journalists
● Significant interest in the dataset after the House Intelligence Committee released a list of IRA accounts.
● We identified 36,210 tweets from these accounts.
● Sharing these deleted tweets violates Twitter policy.
● University weighed public interest vs. risk of losing access
to Twitter API for GW researchers.
● See nbcnews.com/tech/social-media/now-available-more-200-000-deleted-russian-troll-tweets-n844731
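Finding the tweets posted by a published list of accounts is a set-membership filter. A minimal sketch, assuming the list supplies screen names and optionally user ids; because screen names can change while ids do not, matching on `user.id_str` is more robust when ids are available. `tweets_from_accounts` and the sample account names are hypothetical.

```python
from typing import Iterable, List


def tweets_from_accounts(tweets: Iterable[dict], screen_names: Iterable[str] = (),
                         user_ids: Iterable[str] = ()) -> List[dict]:
    """Return tweets whose author matches a listed user id or (case-insensitive) screen name."""
    names = {s.lower().lstrip("@") for s in screen_names}
    ids = set(user_ids)
    return [t for t in tweets
            if t["user"]["id_str"] in ids or t["user"]["screen_name"].lower() in names]
```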
Deleted tweets research
● With Catie Bailard (School of Media & Public Affairs,
GWU) & Andy Hoagland (data scientist)
● Possible research questions:
○ What is the substantive content of deleted vs. extant tweets about
the candidate(s)?
○ What was the relative distribution of deleted / extant tweets in
terms of the proportion that were pro- / anti- Hillary / Trump?
○ Were tweets with a certain type of content more likely to be
deleted than those with other types of content?
Deleted tweets research
● Possible research questions:
○ What portion of tweets deleted by Twitter were likely-bots vs.
likely-humans? Were there differences in the substantive content
of deleted tweets generated by likely-humans versus likely-bots?
Deleted tweets research
● 92 million tweets from between October 8 and November 8, 2016 that contain “Clinton,” “Trump,” “Donald,” “Hillary,”
“@realDonaldTrump” or “@HillaryClinton”.
● Split deleted tweets from extant tweets.
○ 22 million tweets (24%) were deleted
● Created a 10% sample of deleted tweets & a 1.5% sample of extant tweets.
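The split-and-sample step can be sketched as follows, assuming "deleted" means a tweet id that no longer comes back from hydration (passed in here as `still_available`). The sampling rates mirror the slide; the seed and the function name `split_and_sample` are hypothetical.

```python
import random
from typing import Iterable, List, Set, Tuple


def split_and_sample(tweet_ids: Iterable[str], still_available: Set[str],
                     deleted_rate: float = 0.10, extant_rate: float = 0.015,
                     seed: int = 2016) -> Tuple[List[str], List[str]]:
    """Split ids into deleted vs. extant, then draw a simple random sample of each at its own rate."""
    ids = list(tweet_ids)
    deleted = [i for i in ids if i not in still_available]
    extant = [i for i in ids if i in still_available]
    rng = random.Random(seed)  # fixed seed so the samples can be reproduced
    return (rng.sample(deleted, round(len(deleted) * deleted_rate)),
            rng.sample(extant, round(len(extant) * extant_rate)))
```

At the scale on the slide (about 22 million deleted and 70 million extant tweets), these rates would give samples of roughly 2.2 million and 1.05 million tweets. Sampling the smaller deleted group at a higher rate keeps the two samples of comparable size.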
Deleted tweets research
● For each tweet in the deleted-tweets sample, determined the reason for deletion.
○ For example: user suspended, original user suspended, tweet
deleted
● For each user in each of the samples, ran bot detection.
○ Botometer, via its API.
○ Used tweets from full dataset, rather than live Twitter.
○ Not all users had enough tweets.
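Assigning a deletion reason can be framed as a precedence check over account status. The sketch assumes a separate account lookup has already produced sets of suspended, deleted, and protected user ids; those inputs and the function name are hypothetical, not an SFM feature. A retweet also disappears when its original tweet or original author goes away, which is why the original author is checked too.

```python
from typing import Set


def deletion_reason(tweet: dict, suspended: Set[str], gone: Set[str], protected: Set[str]) -> str:
    """Guess why a collected tweet can no longer be hydrated.

    suspended / gone / protected hold user id_str values from a prior account lookup.
    """
    user_id = tweet["user"]["id_str"]
    if user_id in suspended:
        return "user suspended"
    if user_id in gone:
        return "user deleted"
    if user_id in protected:
        return "user protected"
    original = tweet.get("retweeted_status")
    if original is not None:
        original_user = original["user"]["id_str"]
        if original_user in suspended:
            return "original user suspended"
        if original_user in gone:
            return "original user deleted"
        if original_user in protected:
            return "original user protected"
    # Fallback: for a retweet this means either the retweet or the original tweet was deleted
    return "tweet deleted"
```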
Deleted tweets research
● Performing content analysis of 3,000 tweets.
○ Coding for overall “gist” (anti-Trump, anti-Hillary, pro-Trump,
and/or pro-Hillary), specific subject matter (e.g., criticizes
candidate’s personal qualities or past actions, calls-to-action),
identity (e.g., race, gender), more.
○ Three humans code each tweet using DiscoverText.
○ Average Krippendorff’s Alpha score 0.73.
● Will use neural-network machine learning to generalize the coding to the larger dataset.
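Krippendorff's alpha measures agreement among coders relative to chance. The slide does not say which variant DiscoverText reported. The sketch below computes alpha for nominal codes from the coincidence matrix, using $\alpha = 1 - (n-1)\,\frac{\sum_{c\neq k} o_{ck}}{\sum_{c\neq k} n_c n_k}$. It assumes each tweet has one label per coder; an "and/or" code such as the overall gist would be split into separate yes/no variables, and averaging alpha across those variables is our guess at what "average" means here. The function name is hypothetical.

```python
from collections import Counter
from itertools import permutations
from typing import Hashable, Sequence


def krippendorff_alpha_nominal(units: Sequence[Sequence[Hashable]]) -> float:
    """Krippendorff's alpha for nominal data.

    units: one list of codes per tweet, one code per coder (missing codes simply omitted).
    """
    coincidence: Counter = Counter()
    for values in units:
        m = len(values)
        if m < 2:
            continue  # a unit coded by fewer than two coders has no pairable values
        for a, b in permutations(values, 2):  # ordered pairs from different coders
            coincidence[(a, b)] += 1 / (m - 1)
    totals: Counter = Counter()
    for (a, _), w in coincidence.items():
        totals[a] += w
    n = sum(totals.values())
    observed = sum(w for (a, b), w in coincidence.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if expected == 0:
        raise ValueError("alpha is undefined when only one category is used")
    return 1 - (n - 1) * observed / expected
```

Perfect agreement gives 1.0, chance-level agreement gives about 0, and systematic disagreement drives alpha below 0.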
Delete reasons
Botometer scores for deleted tweets
Plans for 2018 election
Plans for #election2018: Currently collecting
● Neutral: #Nov2018, #Election2018, #Nov18, #Election18,
#Midterms2018, #Midterms18, #Midterm2018,
#Midterm18, #midtermelection, #election, #vote, 2018
election, election 2018, midterm election
● Partisan Republican: #trump, #maga, #gop, #republican,
#trumptrain, #kag
● Partisan Democrat: #bluewave2018, #bluewave18,
#bluewave, #democrats, #resist, #resistance
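If the same `track` semantics apply as in 2016, term order inside a phrase does not matter, so `2018 election` and `election 2018` select the same tweets. The sketch normalizes and de-duplicates phrases before joining them into a `track` value. It checks against limits we believe applied to v1.1 `statuses/filter` (up to 400 phrases, each at most 60 bytes); both limits are assumptions to verify against current documentation, and `build_track` is hypothetical.

```python
from typing import Iterable

MAX_PHRASES = 400      # assumed v1.1 statuses/filter limit on track phrases
MAX_PHRASE_BYTES = 60  # assumed per-phrase byte limit


def build_track(phrases: Iterable[str]) -> str:
    """Normalize, de-duplicate and validate phrases for a filter-stream track parameter."""
    seen, kept = set(), []
    for phrase in phrases:
        terms = phrase.lower().split()
        key = frozenset(terms)  # track matching ignores term order within a phrase
        if not terms or key in seen:
            continue
        normalized = " ".join(terms)
        if len(normalized.encode("utf-8")) > MAX_PHRASE_BYTES:
            raise ValueError(f"phrase too long: {phrase!r}")
        seen.add(key)
        kept.append(normalized)
    if len(kept) > MAX_PHRASES:
        raise ValueError(f"{len(kept)} phrases exceeds the limit of {MAX_PHRASES}")
    return ",".join(kept)


neutral = ["#Nov2018", "#Election2018", "2018 election", "election 2018", "midterm election"]
print(build_track(neutral))
```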
Plans for #election2018: Currently collecting
● Top accounts
○ 5,000+ accounts extracted from the neutral collection because each was a top tweeter, retweeted account, or mentioned account.
○ New accounts are added every week, based on a rolling 2-week window of tweets.
○ Already seeing significant churn as accounts are suspended.
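The weekly account extraction can be sketched as the union of the top tweeters, top retweeted accounts, and top mentioned accounts within a rolling two-week window. The cutoff `top_n` is a hypothetical parameter; the slide does not say how many accounts are taken from each list.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional, Set

TWITTER_TIME = "%a %b %d %H:%M:%S %z %Y"


def top_accounts(tweets: Iterable[dict], now: Optional[datetime] = None, top_n: int = 100,
                 window: timedelta = timedelta(weeks=2)) -> Set[str]:
    """Union of top tweeters, retweeted accounts and mentioned accounts in a rolling window."""
    now = now or datetime.now(timezone.utc)  # must be timezone-aware to compare with created_at
    tweeters, retweeted, mentioned = Counter(), Counter(), Counter()
    for t in tweets:
        if now - datetime.strptime(t["created_at"], TWITTER_TIME) > window:
            continue  # older than the rolling window
        tweeters[t["user"]["screen_name"].lower()] += 1
        if "retweeted_status" in t:
            retweeted[t["retweeted_status"]["user"]["screen_name"].lower()] += 1
        for m in t.get("entities", {}).get("user_mentions", []):
            mentioned[m["screen_name"].lower()] += 1
    return {name for c in (tweeters, retweeted, mentioned) for name, _ in c.most_common(top_n)}
```

Comparing each week's result with the accounts already being collected shows which ones are new, and a later lookup of the existing accounts would surface the churn from suspensions.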
Plans for #election2018:
● Individual candidates
● Local parties
● Local hashtags
→ Currently in discussions with a news organization to
collaborate on identifying these accounts / hashtags.
→ Thinking about how to “cut through noise” to collect tweets
from citizens.
→ Working on contemporaneous web archiving of linked web
resources and media.
#election2018: Topic Tracker
bit.ly/2J0EKFj
Questions?
More info:
● go.gwu.edu/gwsfm
● @SocialFeedMgr
● sfm@gwu.edu
Or:
● @justin_littman
● justinlittman@gwu.edu

Complet Documnetation for Smart Assistant Application for Disabled Person
 
Low Rate Call Girls Kolkata Avani 🤌 8250192130 🚀 Vip Call Girls Kolkata
Low Rate Call Girls Kolkata Avani 🤌  8250192130 🚀 Vip Call Girls KolkataLow Rate Call Girls Kolkata Avani 🤌  8250192130 🚀 Vip Call Girls Kolkata
Low Rate Call Girls Kolkata Avani 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
Magic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptxMagic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptx
 
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
 
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts serviceChennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
 
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
 

What do you do with 280 million tweets from the 2016 U.S. election?

  • 1. What do you do with 280 million tweets from the 2016 U.S. election? Justin Littman April 25, 2018
  • 2. Overview
    ● Outline of the dataset
    ● Collecting the dataset
      ○ Social Feed Manager
    ● Sharing the dataset
      ○ TweetSets
    ● Uses of the dataset
    ● Plans for 2018 U.S. election
  • 3. Outline of the dataset
  • 4. Datasets
    Filter stream:
    ● Candidates and key election hashtags
    ● Democratic Convention
    ● GOP Convention
    ● First presidential debate
    ● Second presidential debate
    ● Third presidential debate
    ● Vice-presidential debate
    ● Election Day
    User timelines:
    ● Democratic candidates
    ● Democratic Party
    ● Republican candidates
    ● Republican Party
  • 5. Candidates and key election hashtags
    ● Track: election2016, election, clinton, kaine, trump, pence
    ● Follow: @realDonaldTrump, @HillaryClinton, @timkaine, @mike_pence
    ● 251,077,140 tweets
    ● July 13, 2016 - November 10, 2016
  • 6. Democratic Convention
    ● Track: philly convention, philadelphia convention, democratic convention, dnc convention, #demsinphilly, #dnc, #philly, #demconvention
    ● Follow: @DemConvention, @TheDemocrats
    ● 8,340,668 tweets
    ● July 22, 2016 - July 30, 2016
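The Track and Follow lists are parameters to Twitter's streaming filter endpoint. Per Twitter's documentation for that endpoint, commas separate alternative phrases (OR) and every word within a phrase must appear (AND), so "philly convention" matches a tweet containing both words in any order. The following is a simplified local approximation of that rule, for example to check which track term pulled a tweet into a collection. It is a sketch under assumptions: the function names are hypothetical, it tokenizes on non-word characters, and it ignores Twitter's additional matching against expanded URLs and mention screen names. (The endpoint's follow parameter also expects numeric user IDs rather than the screen names shown on the slides.)

```python
import re


# Hypothetical helpers approximating the streaming "track" semantics.
def parse_track(track: str) -> list[list[str]]:
    """Split a track string into phrases (ORed) of lowercased words (ANDed)."""
    phrases = []
    for phrase in track.split(","):
        words = phrase.lower().split()
        if words:
            phrases.append(words)
    return phrases


def tokenize(text: str) -> set[str]:
    """Lowercased word tokens; hashtags also yield a "#"-prefixed token.

    "#DNC tonight" -> {"dnc", "#dnc", "tonight"}, so the track word "dnc"
    matches plain words and hashtags, while "#dnc" matches only the hashtag.
    """
    tokens = set()
    for hash_mark, word in re.findall(r"(#?)(\w+)", text.lower()):
        tokens.add(word)
        if hash_mark:
            tokens.add("#" + word)
    return tokens


def matches_track(text: str, phrases: list[list[str]]) -> bool:
    """True if every word of at least one phrase appears in the text."""
    tokens = tokenize(text)
    return any(all(word in tokens for word in phrase) for phrase in phrases)


DNC_TRACK = ("philly convention, philadelphia convention, democratic convention, "
             "dnc convention, #demsinphilly, #dnc, #philly, #demconvention")
phrases = parse_track(DNC_TRACK)
print(matches_track("Huge crowd at the convention in Philly tonight", phrases))  # True
print(matches_track("#DNC speeches start at 4pm", phrases))                       # True
print(matches_track("Best cheesesteak in philly", phrases))                       # False
```

The last example shows why the convention terms were written as phrases: the bare word "philly" only counts when it appears as the hashtag #philly or together with "convention".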
  • 7. Democratic Candidates
    ● Accounts: @BernieSanders, @HillaryClinton, @MartinOMalley, @SenSanders, @timkaine
    ● 22,251 tweets
    ● Collected every week
  • 10. Top tweeters (annotations on the top accounts)
    ● 561k tweets, 15 followers
    ● suspended
    ● deleted
    ● deleted
    ● tweets primarily in Greek
    ● 577k tweets, last tweeted Nov 7, 2017
    ● 126k tweets, 5 followers
    ● deleted
    ● still tweeting (915k) at non-human rates
  • 11. Top mentions: Where is @timkaine?
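The top tweeters and top mentions slides (and the top hashtags slide) are frequency counts over the collected tweets. A minimal sketch, assuming one tweet JSON object per line in Twitter's v1.1 tweet structure (a common export format); names and hashtags are case-folded so that, e.g., #Election2016 and #election2016 count together. The function names are hypothetical.

```python
import json
from collections import Counter


def entities_of(tweet: dict) -> dict:
    # Streamed tweets longer than 140 characters keep their complete entities in "extended_tweet".
    return tweet.get("extended_tweet", {}).get("entities", tweet.get("entities", {}))


def top_counts(lines, n=10):
    """Hypothetical helper: the n most common tweeters, mentioned accounts and hashtags."""
    tweeters, mentions, hashtags = Counter(), Counter(), Counter()
    for line in lines:
        if not line.strip():
            continue
        tweet = json.loads(line)
        if "user" not in tweet:  # skip non-tweet messages, such as limit notices
            continue
        tweeters[tweet["user"]["screen_name"].lower()] += 1
        entities = entities_of(tweet)
        for mention in entities.get("user_mentions", []):
            mentions[mention["screen_name"].lower()] += 1
        for hashtag in entities.get("hashtags", []):
            hashtags[hashtag["text"].lower()] += 1
    return tweeters.most_common(n), mentions.most_common(n), hashtags.most_common(n)
```

Counting is the easy part; as the annotations on the top tweeters show, each top account still has to be checked by hand (suspended, deleted, off-topic, or bot-like).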
  • 15. Social Feed Manager (SFM)
    ● Open source software by GW Libraries.
    ● User interface for collecting, managing & exporting social media data.
    ● Goal: Lower the technical barriers for collecting social media data for academic research and archiving.
    ● Supports Twitter, Tumblr, Flickr & Sina Weibo.
    ● Intended for organizations to run for their users.
    go.gwu.edu/sfm
  • 16. Step 1a: Create a collection
  • 17. Step 1b: Describe the collection
  • 18. Step 1c: Specify what is to be collected
  • 19. Step 2: Turn on collecting
  • 20. Step 3: Monitor collecting
  • 22. Collecting got off to a rough start ...
  • 23. Dataset caveats: Holes
    Chart: Candidates and key election hashtags dataset, by week. Annotation on the gap: "Family road trip to Michigan & Canada. We loved Toronto!"
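Holes are easy to miss in a collection of a quarter-billion tweets. A minimal sketch, assuming each tweet carries Twitter's standard `created_at` timestamp, that flags every stretch of at least a chosen length with no collected tweets. The helper name is hypothetical, and the one-hour default is an assumption that only makes sense for a high-volume collection like the candidates and hashtags stream; quieter collections need a larger threshold.

```python
from datetime import datetime, timedelta

# Twitter's created_at format, e.g. "Wed Jul 13 14:05:00 +0000 2016".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def find_gaps(created_ats, min_gap=timedelta(hours=1)):
    """Hypothetical helper: (last tweet before, first tweet after) for every gap >= min_gap."""
    times = sorted(datetime.strptime(s, TWITTER_TIME_FORMAT) for s in created_ats)
    return [(before, after) for before, after in zip(times, times[1:]) if after - before >= min_gap]
```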
  • 24. Dataset caveats: Rate limits
    Chart: Tweet rate (by minute) from the Democratic Convention dataset. Annotation: "Rate limit plateau"
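When a filter stream matches more tweets than Twitter will deliver, the stream interleaves limit notices such as `{"limit": {"track": 1234}}`, where the number is a running count of matching tweets not delivered since the connection opened; meanwhile the per-minute tweet rate flattens into a plateau like the one in the chart. A sketch, assuming the raw stream (tweets plus limit notices) was saved as one JSON message per line; the function name is hypothetical.

```python
import json
from collections import Counter
from datetime import datetime

TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def rate_by_minute(lines):
    """Hypothetical helper: tweets per minute, plus the largest undelivered count seen."""
    per_minute = Counter()
    max_undelivered = 0
    for line in lines:
        message = json.loads(line)
        if "limit" in message:
            # "track" is a running count of matching tweets not delivered since the connection opened.
            max_undelivered = max(max_undelivered, message["limit"].get("track", 0))
        elif "created_at" in message:
            created = datetime.strptime(message["created_at"], TWITTER_TIME_FORMAT)
            per_minute[created.replace(second=0)] += 1
    return per_minute, max_undelivered
```

A plateau shows up as a run of minutes with nearly identical counts while limit notices are arriving. Because the running count restarts with each connection, the maximum is only a lower bound on what was missed.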
  • 25. Dataset caveats: Non-U.S. election tweets
  • 27. Sharing the dataset
    ● Twitter’s developer policies require sharing tweet ids only.
    ● Complete tweets can be “hydrated” from the Twitter API.
      ○ Hydrating the complete dataset takes about a month.
    ● Tweets that have been deleted, or that come from accounts that are protected, deleted, or suspended, are not available.
    ● This provides a “right to be forgotten,” but also:
      ○ Complicates reproducible research.
      ○ Makes it difficult to hold politicians accountable and to research bots.
    ● However, we share complete tweets within the university.
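Both halves of this sharing model are simple to sketch: extracting the ids that may be shared publicly, and batching them for hydration. The estimate is a back-of-the-envelope check of the "about a month" figure under stated assumptions: the v1.1 `statuses/lookup` endpoint (up to 100 ids per request) and a user-auth limit of 900 requests per 15-minute window. Then 280,000,000 / 100 = 2.8 million requests, or about 3,112 windows, which is roughly 778 hours or 32 days before any network overhead. Function names are hypothetical.

```python
import json
import math

LOOKUP_BATCH_SIZE = 100     # statuses/lookup accepts up to 100 ids per request
REQUESTS_PER_WINDOW = 900   # assumed user-auth rate limit for statuses/lookup
WINDOW_MINUTES = 15


def tweet_ids(lines):
    """Hypothetical helper: yield the id_str of each tweet, i.e. what may be shared publicly.

    id_str is the string form, safe from the precision loss some JSON parsers have with 64-bit ids.
    """
    for line in lines:
        tweet = json.loads(line)
        if "id_str" in tweet:
            yield tweet["id_str"]


def batches(ids, size=LOOKUP_BATCH_SIZE):
    """Group ids into lists of at most `size`, one list per lookup request."""
    batch = []
    for tweet_id in ids:
        batch.append(tweet_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def hydration_days(n_tweets):
    """Lower bound on days to hydrate n_tweets at the assumed rate limit."""
    requests = math.ceil(n_tweets / LOOKUP_BATCH_SIZE)
    windows = math.ceil(requests / REQUESTS_PER_WINDOW)
    return windows * WINDOW_MINUTES / (60 * 24)


print(round(hydration_days(280_000_000), 1))  # 32.4
```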
  • 28. Sharing the dataset: Harvard’s Dataverse doi.org/10.7910/DVN/PDI7IN
  • 29. Sharing the dataset: Harvard’s Dataverse
    ● Almost 3,000 downloads (as of mid-2018).
    ● Each collection has a README.
    → Interested in collaborating on best practices for sharing datasets.
  • 30. Sharing the dataset: TweetSets
    ● Open source software by GW Libraries.
    ● Basic idea: Reuse existing datasets, but allow users to filter / query for only the tweets they need.
    ● Conforms with Twitter policies.
      ○ Within university: Complete tweets
      ○ Public: Tweet ids only
    tweetsets.library.gwu.edu
  • 31. TweetSets step 1: Select source datasets
  • 32. TweetSets step 2a: Query the tweets in datasets
  • 33. TweetSets step 2a: Query the tweets in datasets
    Also, query by:
    ● Tweet text
    ● Hashtags
    ● Mentions
    ● Posted by
    ● In reply to
    ● Tweet type
    ● Created at
    ● URL
    ● Has image
    ● Is geotagged
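TweetSets' own implementation is not shown in the slides. The following is a hypothetical, in-memory illustration of how several of the listed query fields (tweet text, hashtags, mentions, posted by, tweet type, has image, is geotagged) can be evaluated against v1.1 tweet JSON, together with the public export rule (tweet ids only). The `Query` class, the function names, and the tweet-type rules are assumptions; "in reply to", "created at", and "URL" are left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Query:
    """Hypothetical query object; None means "don't filter on this field"."""
    text: Optional[str] = None         # case-insensitive substring of the tweet text
    hashtag: Optional[str] = None      # without the leading "#"
    mention: Optional[str] = None      # screen name without "@"
    posted_by: Optional[str] = None    # author's screen name
    tweet_type: Optional[str] = None   # "original", "retweet", "quote" or "reply"
    has_image: Optional[bool] = None
    is_geotagged: Optional[bool] = None


def tweet_type(tweet: dict) -> str:
    if "retweeted_status" in tweet:
        return "retweet"
    if tweet.get("is_quote_status"):
        return "quote"
    if tweet.get("in_reply_to_status_id_str"):
        return "reply"
    return "original"


def has_image(tweet: dict) -> bool:
    media = tweet.get("extended_entities", tweet.get("entities", {})).get("media", [])
    return any(m.get("type") == "photo" for m in media)


def matches(tweet: dict, q: Query) -> bool:
    entities = tweet.get("entities", {})
    hashtags = {h["text"].lower() for h in entities.get("hashtags", [])}
    mentions = {m["screen_name"].lower() for m in entities.get("user_mentions", [])}
    checks = [
        q.text is None or q.text.lower() in tweet.get("text", "").lower(),
        q.hashtag is None or q.hashtag.lower() in hashtags,
        q.mention is None or q.mention.lower() in mentions,
        q.posted_by is None or q.posted_by.lower() == tweet["user"]["screen_name"].lower(),
        q.tweet_type is None or q.tweet_type == tweet_type(tweet),
        q.has_image is None or q.has_image == has_image(tweet),
        q.is_geotagged is None or q.is_geotagged == (tweet.get("coordinates") is not None),
    ]
    return all(checks)


def public_export(tweets, q: Query) -> list:
    """Public datasets contain tweet ids only."""
    return [t["id_str"] for t in tweets if matches(t, q)]
```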
  • 34. TweetSets step 2b: View summary statistics
  • 35. TweetSets step 2c: View sample tweets
  • 36. TweetSets step 3: Create a dataset
  • 38. Uses of the dataset
  • 39. Academic research
    ● Clare H. Liu, “Applications of Twitter Emotion Detection for Stock Market Prediction.” Master’s thesis, MIT.
    ● David Anuta, Josh Churchin & Jiebo Luo, “Election Bias: Comparing Polls and Twitter in the 2016 U.S. Election.”
    ● Sicheng Zhao, Yue Gao, Guiguang Ding & Tat-Seng Chua, “Real-Time Multimedia Social Event Detection in Microblog.” IEEE Transactions on Cybernetics.
    ● Ahsen J. Uppal & H. Howie Huang, “Event Prediction from Dynamic Communities in Social Networks.”
  • 40. Journalists
    ● Significant interest in the dataset after the House Intelligence Committee released a list of Internet Research Agency (IRA) accounts.
    ● We identified 36,210 tweets from these accounts.
    ● Sharing these deleted tweets violates Twitter policy.
    ● The university weighed the public interest against the risk of losing access to the Twitter API for GW researchers.
    ● See nbcnews.com/tech/social-media/now-available-more-200-000-deleted-russian-troll-tweets-n844731
  • 41. Deleted tweets research
    ● With Catie Bailard (School of Media & Public Affairs, GWU) & Andy Hoagland (data scientist)
    ● Possible research questions:
      ○ What is the substantive content of deleted vs. extant tweets about the candidate(s)?
      ○ What was the relative distribution of deleted / extant tweets in terms of the proportion that were pro- or anti-Hillary and pro- or anti-Trump?
      ○ Were tweets with a certain type of content more likely to be deleted than those with other types of content?
  • 42. Deleted tweets research
    ● Possible research questions:
      ○ What portion of the tweets deleted by Twitter came from likely bots vs. likely humans?
      ○ Were there differences in the substantive content of deleted tweets generated by likely humans versus likely bots?
  • 43. Deleted tweets research
    ● 92 million tweets posted between October 8 and November 8, 2016 that contain “Clinton,” “Trump,” “Donald,” “Hillary,” “@realDonaldTrump,” or “@HillaryClinton”.
    ● Split deleted tweets from extant tweets.
      ○ 22 million tweets (24%) were deleted.
    ● Created a 10% sample of deleted tweets & a 1.5% sample of extant tweets.
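The split and sampling step can be sketched as set arithmetic. The sketch assumes, hypothetically, that we have the ids collected from the stream and the ids that hydration still returned. Anything collected but not returned counts as "deleted" in the broad sense of slide 27 (deleted, or from a protected, deleted, or suspended account). The function name and the fixed seed are assumptions; the rates are the 10% and 1.5% from the slide.

```python
import random


def split_and_sample(collected_ids, hydrated_ids, deleted_rate=0.10, extant_rate=0.015, seed=2016):
    """Hypothetical helper: split collected ids into deleted / extant and sample each."""
    collected = set(collected_ids)
    extant = collected & set(hydrated_ids)   # still returned by hydration
    deleted = collected - extant             # deleted, or from protected/deleted/suspended accounts
    rng = random.Random(seed)                # fixed seed so the samples can be reproduced
    # Sort before sampling: set iteration order of strings is not stable across runs.
    deleted_sample = rng.sample(sorted(deleted), round(len(deleted) * deleted_rate))
    extant_sample = rng.sample(sorted(extant), round(len(extant) * extant_rate))
    return deleted, extant, deleted_sample, extant_sample
```

As a sanity check on the slide's figures, 22 million / 92 million ≈ 0.239, which rounds to the reported 24%.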
  • 44. Deleted tweets research
    ● For each tweet in the deleted tweets sample, determined the reason for deletion.
      ○ For example: user suspended, original user suspended, tweet deleted.
    ● For each user in each of the samples, ran bot detection.
      ○ Botometer, using its API.
      ○ Used tweets from the full dataset, rather than live Twitter.
      ○ Not all users had enough tweets.
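One way to assign the deletion reasons listed above is a small decision procedure over the stored tweet. The sketch assumes, hypothetically, that a separate account lookup has already produced a mapping from user id to "suspended", "deleted", or "protected" (active accounts are absent). Checking the author first and then, for retweets, the original author yields categories like "user suspended" and "original user suspended". The function name and the input mapping are hypothetical.

```python
def deletion_reason(tweet: dict, account_status: dict[str, str]) -> str:
    """Hypothetical helper.

    account_status maps a user id_str to "suspended", "deleted" or "protected";
    accounts that are still active are simply absent from the mapping.
    """
    author_status = account_status.get(tweet["user"]["id_str"])
    if author_status:
        return f"user {author_status}"
    original = tweet.get("retweeted_status")
    if original is not None:
        original_status = account_status.get(original["user"]["id_str"])
        if original_status:
            return f"original user {original_status}"
    # Both accounts are active, so the tweet itself (or, for a retweet, possibly the original) was deleted.
    return "tweet deleted"
```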
  • 45. Deleted tweets research
    ● Performing content analysis of 3,000 tweets.
      ○ Coding for overall “gist” (anti-Trump, anti-Hillary, pro-Trump, and/or pro-Hillary), specific subject matter (e.g., criticizes candidate’s personal qualities or past actions, calls to action), identity (e.g., race, gender), and more.
      ○ Three humans code each tweet using DiscoverText.
      ○ Average Krippendorff’s alpha score: 0.73.
    ● Will use neural network machine learning to generalize to the larger dataset.
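For reference, Krippendorff's alpha compares observed to expected disagreement, alpha = 1 - D_o / D_e. A minimal sketch for nominal codes follows. The nominal metric is an assumption, since the slide does not say which metric was used. The input is each tweet's codes from its coders, for one coding variable. The slide's 0.73 is an average, which presumably means computing this once per coding variable and taking the mean. The function name is hypothetical.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Hypothetical helper: Krippendorff's alpha for nominal codes.

    units: one list of codes per tweet (one code per coder who coded it).
    Tweets coded by fewer than two coders carry no pairable values and are skipped.
    """
    coincidences = Counter()
    for codes in units:
        m = len(codes)
        if m < 2:
            continue
        # Every ordered pair of codes from different coders, weighted 1 / (m - 1).
        for c, k in permutations(codes, 2):
            coincidences[(c, k)] += 1 / (m - 1)
    totals = Counter()
    for (c, _k), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())
    observed = sum(w for (c, k), w in coincidences.items() if c != k)
    expected = sum(totals[c] * totals[k] for c in totals for k in totals if c != k)
    if expected == 0:
        raise ValueError("alpha is undefined when every code is the same")
    # alpha = 1 - D_o / D_e, with D_o = observed / n and D_e = expected / (n * (n - 1))
    return 1 - (n - 1) * observed / expected
```

For example, two coders who agree on two of three tweets (`[["a", "a"], ["a", "b"], ["b", "b"]]`) get alpha = 4/9 ≈ 0.44. The result is well below the raw 67% agreement because alpha discounts the agreement expected by chance.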
  • 47. Botometer scores for deleted tweets
  • 48. Plans for 2018 election
  • 49. Plans for #election2018: Currently collecting
    ● Neutral: #Nov2018, #Election2018, #Nov18, #Election18, #Midterms2018, #Midterms18, #Midterm2018, #Midterm18, #midtermelection, #election, #vote, 2018 election, election 2018, midterm election
    ● Partisan Republican: #trump, #maga, #gop, #republican, #trumptrain, #kag
    ● Partisan Democrat: #bluewave2018, #bluewave18, #bluewave, #democrats, #resist, #resistance
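Because a single tweet can carry hashtags from more than one list, it can help to label each tweet with every group it matches, for example to measure how much the neutral and partisan collections overlap. A sketch covering only the hashtag terms (the neutral list's plain-phrase terms such as "2018 election" would need the phrase matching used by the streaming filter); the names are hypothetical.

```python
# Hashtag terms from the slide, lowercased and without "#". Phrase terms are not covered.
HASHTAG_GROUPS = {
    "neutral": {"nov2018", "election2018", "nov18", "election18", "midterms2018", "midterms18",
                "midterm2018", "midterm18", "midtermelection", "election", "vote"},
    "partisan republican": {"trump", "maga", "gop", "republican", "trumptrain", "kag"},
    "partisan democrat": {"bluewave2018", "bluewave18", "bluewave", "democrats", "resist", "resistance"},
}


def hashtag_groups(tweet: dict) -> set:
    """Hypothetical helper: every group whose hashtags appear in the tweet."""
    tags = {h["text"].lower() for h in tweet.get("entities", {}).get("hashtags", [])}
    return {group for group, group_tags in HASHTAG_GROUPS.items() if tags & group_tags}
```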
  • 50. Plans for #election2018: Currently collecting
    ● Top accounts
      ○ 5,000+ accounts extracted from the neutral collection because each was a top tweeter, a top retweeted account, or a top mentioned account.
      ○ Add new accounts every week from a rolling 2 weeks of tweets.
      ○ Already seeing significant churn as accounts are suspended.
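The weekly account refresh can be sketched as: take the last two weeks of the neutral collection, rank tweeters, retweeted accounts, and mentioned accounts, and add whatever is new. The per-category cutoff `n` and the function names are hypothetical; the slide does not say how "top" was defined. `now` must be timezone-aware to compare with Twitter's `created_at` timestamps.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def top_accounts(tweets, now, window=timedelta(weeks=2), n=100):
    """Hypothetical helper: union of the top-n tweeters, retweeted and mentioned accounts in the window."""
    tweeters, retweeted, mentioned = Counter(), Counter(), Counter()
    cutoff = now - window
    for tweet in tweets:
        if datetime.strptime(tweet["created_at"], TWITTER_TIME_FORMAT) < cutoff:
            continue  # outside the rolling window
        tweeters[tweet["user"]["screen_name"].lower()] += 1
        if "retweeted_status" in tweet:
            retweeted[tweet["retweeted_status"]["user"]["screen_name"].lower()] += 1
        for mention in tweet.get("entities", {}).get("user_mentions", []):
            mentioned[mention["screen_name"].lower()] += 1
    accounts = set()
    for counter in (tweeters, retweeted, mentioned):
        accounts.update(name for name, _count in counter.most_common(n))
    return accounts


def weekly_update(current_accounts, tweets, now, n=100):
    """Accounts to add to the collection this week."""
    return top_accounts(tweets, now, n=n) - current_accounts


# Usage: now = datetime(2018, 10, 15, tzinfo=timezone.utc); new = weekly_update(existing, tweets, now)
```

Keeping the result as a set difference makes the weekly step idempotent. Removing suspended accounts (the churn noted above) would be a separate check.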
  • 51. Plans for #election2018:
    ● Individual candidates
    ● Local parties
    ● Local hashtags
    → Currently in discussions with a news organization to collaborate on identifying these accounts / hashtags.
    → Thinking about how to “cut through the noise” to collect tweets from citizens.
    → Working on contemporaneous web archiving of linked web resources and media.
  • 54. Questions?
    More info:
    ● go.gwu.edu/gwsfm
    ● @SocialFeedMgr
    ● sfm@gwu.edu
    Or:
    ● @justin_littman
    ● justinlittman@gwu.edu