1.
collecting twitter data w/ social feed manager
Daniel Chudnov - @dchud - dchud at gwu edu
ELAG 2013 - 2013-05-30 - Ghent, Belgium
tinyurl.com/dchud-elag-2013
2.
social-feed-manager
•python / django
•user timelines, filter, sample, search
•simple display / export for user timelines
•free software, on github
3.
social feed manager
github.com/gwu-libraries/social-feed-manager
10.
“How Mainstream News Outlets Use Twitter” (2011)
• GWU’s Kimberly Gross (SMPA) + students
• Pew Research Center’s Project for Excellence in Journalism
• “news agenda these organizations promoted on Twitter closely matches that of their legacy platforms”
http://www.journalism.org/analysis_report/how_mainstream_media_outlets_use_twitter
23.
what researchers ask for
•specific users, keywords
•historic time periods
•basic values: user, date, text, counts
•10,000s, not 10,000,000s
•delimited files to import
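A minimal sketch of the "delimited files to import" request, using only the Python standard library. The record layout and field names (`user`, `date`, `text`, `retweet_count`) are assumptions chosen to mirror the "basic values" bullet; they are not taken from social-feed-manager's actual schema or export code.

```python
import csv
import io

# Hypothetical field names mirroring "user, date, text, counts";
# not social-feed-manager's real schema.
FIELDS = ["user", "date", "text", "retweet_count"]


def tweets_to_tsv(tweets, fields=FIELDS):
    """Return tab-delimited text: one header row, then one row per tweet."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=fields,
        delimiter="\t",
        extrasaction="ignore",   # silently drop keys not listed in `fields`
        lineterminator="\n",
    )
    writer.writeheader()
    for tweet in tweets:
        row = dict(tweet)
        # Collapse tabs and newlines inside the tweet text so that each
        # tweet stays on a single line when imported elsewhere.
        row["text"] = " ".join(str(row.get("text", "")).split())
        writer.writerow(row)
    return buf.getvalue()


if __name__ == "__main__":
    sample = [
        {"user": "dchud", "date": "2013-05-30",
         "text": "hello\nELAG", "retweet_count": 2},
    ]
    print(tweets_to_tsv(sample), end="")
```

Tab-delimited output is one reasonable choice here because tweet text often contains commas; a comma-delimited variant would only need a different `delimiter`.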
28.
social feed manager
github.com/gwu-libraries/social-feed-manager
29.
what researchers ask for
•specific users, keywords
•historic time periods
•basic values: user, date, text, counts
•10,000s, not 10,000,000s
•delimited files to import
38.
millions of tweets
as they occur
around an event
39.
filter streams
•filter by users, keywords, geo
•about 3,000 tweets / min*
•10,000,000s of tweets
•political debates, news events
* a little more complicated than that
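To illustrate what "filter by users, keywords" means in practice, here is a small sketch that applies the same kind of matching locally to tweets that have already been collected. It does not call the Twitter API, and the tweet layout (a dict with `user` and `text` keys) and the function name are assumptions made for the example; geo filtering is left out.

```python
def matches_filter(tweet, users=(), keywords=()):
    """Return True if the tweet is by one of `users` or mentions a keyword.

    `tweet` is a hypothetical dict with "user" and "text" keys.
    Matching is case-insensitive; keywords match as plain substrings.
    """
    screen_name = tweet.get("user", "").lower()
    text = tweet.get("text", "").lower()
    by_user = screen_name in {u.lower() for u in users}
    has_keyword = any(k.lower() in text for k in keywords)
    return by_user or has_keyword


if __name__ == "__main__":
    tweets = [
        {"user": "dchud", "text": "talking at ELAG today"},
        {"user": "someone", "text": "watching the Debate tonight"},
        {"user": "other", "text": "lunch"},
    ]
    kept = [t for t in tweets
            if matches_filter(t, users=["dchud"], keywords=["debate"])]
    print(len(kept))  # 2
```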
40.
spritzer feed
•~0.5% of all public tweets
•~3,000,000 tweets / day (growing)
•a useful random sampling
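As a rough cross-check of the figures on these slides, the arithmetic can be sketched as below. The constants are the numbers quoted in the slides; treating the filter-stream rate as a sustained all-day ceiling is an assumption (the slide itself notes it is "a little more complicated than that"), and the function names are made up for this example.

```python
# Figures quoted on the slides; the arithmetic is only a back-of-envelope check.
SAMPLE_FRACTION = 0.005            # "~0.5% of all public tweets"
SAMPLE_TWEETS_PER_DAY = 3_000_000  # "~3,000,000 tweets / day"
FILTER_TWEETS_PER_MIN = 3_000      # "about 3,000 tweets / min"


def implied_public_tweets_per_day(sample_per_day, fraction):
    """Total public tweets per day implied by a sample of known size."""
    return round(sample_per_day / fraction)


def tweets_per_day(per_minute):
    """Daily volume if a per-minute rate were sustained for 24 hours."""
    return per_minute * 60 * 24


if __name__ == "__main__":
    print(implied_public_tweets_per_day(SAMPLE_TWEETS_PER_DAY,
                                        SAMPLE_FRACTION))  # 600000000
    print(tweets_per_day(FILTER_TWEETS_PER_MIN))           # 4320000
```

Under these assumptions a filter stream running flat out for a few days reaches the "10,000,000s of tweets" scale mentioned for events.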
41.
search
•after an event
•find users, keywords
•limited - better than nothing
42.
we can do
all this
at no marginal cost
for data*
* not really “big data” - GBs, not TBs
46.
when Congress turned over
•16+ accounts deleted / hidden
•combined 105,993 followers
•14,479 tweets saved in SFM no longer public
47.
if a researcher needs more
•support selection, acquisition, accession, storage, transformation
•collect what’s free around it to minimize cost
•plan purchase via grant
•collect prospectively