collecting twitter data w/ social feed manager
Daniel Chudnov - @dchud - dchud at gwu edu
ELAG 2013 - 2013-05-30 - Ghent, Belgium
tinyurl.com/dchud-elag-2013
“How Mainstream News Outlets Use Twitter” (2011)
• GWU’s Kimberly Gross (SMPA) + students
• Pew Research Center’s Project for Excellence in Journalism
• “news agenda these organizations promoted on Twitter closely matches that of their legacy platforms”
http://www.journalism.org/analysis_report/how_mainstream_media_outlets_use_twitter
what researchers ask for
• specific users, keywords
• historic time periods
• basic values: user, date, text, counts
• 10,000s, not 10,000,000s
• delimited files to import
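
A minimal sketch of turning collected tweets into the kind of delimited file described above. It assumes tweets are stored one JSON object per line, with field names as in the Twitter REST API v1.1 tweet JSON. The column list and the function names `tweet_to_row` and `tweets_to_tsv` are illustrative, not SFM's actual export.

```python
import csv
import io
import json

# Column choice mirrors the slide (user, date, text, counts); it is an
# assumption for illustration, not SFM's actual export format.
COLUMNS = ["user", "date", "text", "retweet_count", "followers_count"]


def tweet_to_row(tweet):
    """Flatten one tweet dict into the basic values researchers ask for."""
    user = tweet.get("user", {})
    return [
        user.get("screen_name", ""),
        tweet.get("created_at", ""),
        # Collapse newlines and tabs so each tweet stays on one line.
        " ".join(tweet.get("text", "").split()),
        tweet.get("retweet_count", 0),
        user.get("followers_count", 0),
    ]


def tweets_to_tsv(json_lines, out):
    """Write line-delimited tweet JSON to a tab-delimited file object.

    Returns the number of tweets written (the header row is not counted).
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    count = 0
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank keep-alive lines
        writer.writerow(tweet_to_row(json.loads(line)))
        count += 1
    return count


# Usage with an in-memory buffer; a real run would pass an open file.
buf = io.StringIO()
tweets_to_tsv(['{"text": "hi", "user": {"screen_name": "dchud"}}'], buf)
```

Tab-delimited output imports directly into spreadsheets and statistics packages, which is usually all a researcher working with tens of thousands of tweets needs.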
filter streams
• filter by users, keywords, geo
• about 3,000 tweets / min *
• 10,000,000s of tweets
• political debates, news events
* a little more complicated than that
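
A rough sketch of how the three filter types map onto request parameters. The parameter names `follow`, `track`, and `locations` are those documented for the Twitter v1.1 streaming `statuses/filter` endpoint. The helper `build_filter_params` is hypothetical, and the OAuth-signed streaming request itself is not shown.

```python
def build_filter_params(user_ids=(), keywords=(), bounding_boxes=()):
    """Build form parameters for a statuses/filter request.

    user_ids:        numeric Twitter user IDs (not screen names)
    keywords:        words or phrases to track
    bounding_boxes:  (west_lon, south_lat, east_lon, north_lat) tuples
    """
    params = {}
    if user_ids:
        params["follow"] = ",".join(str(uid) for uid in user_ids)
    if keywords:
        params["track"] = ",".join(keywords)
    if bounding_boxes:
        # Each box is flattened in order: southwest corner, then northeast.
        params["locations"] = ",".join(
            str(coord) for box in bounding_boxes for coord in box
        )
    if not params:
        raise ValueError("need at least one of follow, track, locations")
    return params


# Example: two accounts, two keywords, and an illustrative bounding box.
params = build_filter_params(
    user_ids=[12, 34],
    keywords=["debate", "election night"],
    bounding_boxes=[(-77.12, 38.79, -76.91, 38.99)],
)
```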
spritzer feed
• ~0.5% of all public tweets
• ~3,000,000 tweets / day (growing)
• a useful random sampling
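
A back-of-the-envelope check on these volumes, using only the approximate figures from the slides. The function names are illustrative, and the per-day filter figure assumes the per-minute rate held for a full day, which it may not.

```python
SPRITZER_TWEETS_PER_DAY = 3_000_000  # approximate, from the slide
SPRITZER_FRACTION = 0.005            # ~0.5% of public tweets, from the slide
FILTER_TWEETS_PER_MIN = 3_000        # rough filter-stream figure, from the slide


def implied_public_tweets_per_day(sample_per_day, fraction):
    """Total public volume implied by a sample of the given size and fraction."""
    return round(sample_per_day / fraction)


def tweets_per_day(per_minute):
    """Daily volume if a per-minute rate were sustained for 24 hours."""
    return per_minute * 60 * 24


def days_to_collect(target_tweets, per_day_rate):
    """Days needed to reach a target count at a steady daily rate."""
    return target_tweets / per_day_rate


total = implied_public_tweets_per_day(SPRITZER_TWEETS_PER_DAY, SPRITZER_FRACTION)
filter_daily = tweets_per_day(FILTER_TWEETS_PER_MIN)
print(f"implied public tweets/day: {total:,}")
print(f"filter stream at sustained rate: {filter_daily:,} tweets/day")
```

On these figures, either stream reaches the 10,000,000s of tweets within days, far beyond the 10,000s most researchers ask for.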
when Congress turned over
• 16+ accounts deleted / hidden
• combined 105,993 followers
• 14,479 tweets saved in SFM, no longer public
if a researcher needs more
• support selection, acquisition, accession, storage, transformation
• collect what’s free around it to minimize cost
• plan purchase via grant
• collect prospectively