collecting twitter data w/ social feed manager
Daniel Chudnov - @dchud - dchud at gwu edu
ELAG 2013 - 2013-05-30 - Ghent, Belgium
tinyurl.com/dchud-elag-2013
social-feed-manager
•python / django
•user timelines, filter,
sample, search
•simple display / export
for user timelines
•free software, on github
social feed manager
github.com/gwu-libraries/social-feed-manager
a traditional project
1. expand scope of collection development
2. at-risk e-resource licensing story
3. save the time of the researcher
let’s start
with
the researcher
“How Mainstream News
Outlets Use Twitter” (2011)
• GWU’s Kimberly Gross (SMPA) +
students
• Pew Research Center’s Project for
Excellence in Journalism
• “news agenda these organizations
promoted on Twitter closely matches
that of their legacy platforms”
http://www.journalism.org/analysis_report/how_mainstream_media_outlets_use_twitter
how do researchers
study social media?
by hand.
•google reader
•copy and paste
•fold, spindle, mutilate
•excel
•...eventually, SPSS
and similar tools
whatever
help
they can get
it’s a lot of work
for not a lot of data
(1000s of tweets)
copy and paste
to excel
doesn’t scale
just ask any student assigned to do this!
first tweet, in native JSON
a
strategic
disadvantage
5,000+
theses/dissertations
since 2010
(not all CS grad students)
see Leetaru et al.
May 2013
First Monday
librarians can help here
what researchers ask for
•specific users, keywords
•historic time periods
•basic values: user, date, text,
counts
•10,000s, not 10,000,000s
•delimited files to import
options
for
historical data?
Twitter-licensed
data providers:
DataSift
Gnip
Topsy
data providers
•friendly
•not cheap
•more than we need
•expensive
•still need tools to
collect, process, etc.
what can we do
ourselves
?
social feed manager
github.com/gwu-libraries/social-feed-manager
what researchers ask for
•specific users, keywords
•historic time periods
•basic values: user, date, text,
counts
•10,000s, not 10,000,000s
•delimited files to import
can do this
free
w/public API
twitter api
•user timelines
•filter streams
•spritzer
•search
up to 3,200
most recent tweets
any public user
200 at a time
and go back again for more later
dev.twitter.com/docs/working-with-timelines
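The "200 at a time" walk back through a timeline can be sketched as follows. This is a minimal illustration, not SFM's actual code: `fetch_page` is a hypothetical stand-in for a call to the user timeline endpoint, and the fake fetcher exists only so the sketch runs without credentials. It assumes the API's `count` and `max_id` parameters, where `max_id` asks for tweets at or below a given id.

```python
from typing import Callable, Dict, List, Optional

Tweet = Dict[str, object]


def collect_timeline(
    fetch_page: Callable[[int, Optional[int]], List[Tweet]],
    page_size: int = 200,
    limit: int = 3200,
) -> List[Tweet]:
    """Walk backwards through a timeline, newest first, one page at a time."""
    collected: List[Tweet] = []
    max_id: Optional[int] = None
    while len(collected) < limit:
        page = fetch_page(page_size, max_id)
        if not page:
            break  # nothing older is available
        collected.extend(page)
        # Next request: only tweets strictly older than the oldest one seen.
        max_id = min(int(t["id"]) for t in page) - 1
    return collected[:limit]


def make_fake_fetcher(total: int) -> Callable[[int, Optional[int]], List[Tweet]]:
    """Hypothetical stand-in for the real API call, backed by an in-memory list."""
    tweets = [{"id": i, "text": f"tweet {i}"} for i in range(total, 0, -1)]

    def fetch_page(count: int, max_id: Optional[int]) -> List[Tweet]:
        eligible = [t for t in tweets if max_id is None or t["id"] <= max_id]
        return eligible[:count]

    return fetch_page
```

Going "back again for more later" would mean remembering the newest id already saved and requesting only newer tweets on the next run; that part is left out of this sketch.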
1,969,760 tweets
from
1,228 users
group users in sets
export by user / set
all at once
or time slices
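A rough sketch of a delimited export with an optional time slice, using the basic values researchers ask for (user, date, text, counts). The flat record layout and the function name `export_delimited` are assumptions made for illustration; they are not SFM's data model or export code.

```python
import csv
import io
from datetime import datetime
from typing import Dict, Iterable, Optional


def export_delimited(
    tweets: Iterable[Dict[str, object]],
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
    delimiter: str = "\t",
) -> str:
    """Return tweets as delimited text, keeping those with start <= date < end."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["user", "date", "text", "retweet_count"])
    for t in tweets:
        created = t["created_at"]  # assumed to be a datetime already
        if start is not None and created < start:
            continue
        if end is not None and created >= end:
            continue
        # csv.writer quotes any text that contains the delimiter or a newline.
        writer.writerow(
            [t["user"], created.isoformat(), t["text"], t["retweet_count"]]
        )
    return buf.getvalue()
```

Leaving `start` and `end` unset exports everything at once; exporting by set would mean passing in only the tweets from that set's users.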
40+ media outlets
400+ elected officials
300+ journalists
300+ GWU groups
filter streams
millions of tweets
as they occur
around an event
filter streams
•filter by users, keywords, geo
•about 3,000 tweets / min *
•10,000,000s of tweets
•political debates, news events
* a little more complicated than that
spritzer feed
•~0.5% of all public tweets
•~3,000,000 tweets / day
(growing)
•a useful random sampling
search
•after an event
•find users, keywords
•limited - better than nothing
we can do
all this
at no marginal cost
for data*
* not really “big data” - GBs, not TBs
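A back-of-envelope check on "GBs, not TBs", using the tweet counts quoted in these slides. The per-tweet size is an assumption (roughly 2,500 bytes of JSON per tweet, uncompressed), not a figure from the talk; change it to match what you actually store.

```python
# Assumed average size of one tweet's JSON, in bytes (not from the talk).
ASSUMED_BYTES_PER_TWEET = 2500


def estimate_gb(n_tweets: int, bytes_per_tweet: int = ASSUMED_BYTES_PER_TWEET) -> float:
    """Estimate uncompressed storage in gigabytes (10**9 bytes)."""
    return n_tweets * bytes_per_tweet / 1e9


# User timelines collected so far, per the slides: 1,969,760 tweets.
timelines_gb = estimate_gb(1_969_760)

# Spritzer feed at ~3,000,000 tweets / day, over a 30-day month.
spritzer_month_gb = estimate_gb(3_000_000 * 30)
```

Under that assumption the timelines come to about 5 GB and a month of spritzer to a few hundred GB, which stays in the GB range.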
this much
alone
meets several needs
this much
alone
shows at-risk nature
when the Pope resigned
when Congress turned over
•16+ accounts deleted /
hidden
•combined 105,993 followers
•14,479 tweets saved in SFM
no longer public
if a researcher needs more
•support selection,
acquisition, accession,
storage, transformation
•collect what’s free around
it to minimize cost
•plan purchase via grant
•collect prospectively
next steps
improving sfm
•support concurrent per-user
filters / streams
•add Sina Weibo, YouTube,
others as asked
drive
selective, automated
web archiving
ensure
you can use
sfm
you can have it! it’s free to use, copy, modify, redistribute
discovery
?
the
obvious solution
653 - subject added entry, uncontrolled for hashtags
700 - name added entries for mentions
856 42 - URL of related resource for included links
500 - note for retweet count
336, 337, 338 - RDA ready!
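Taking the mapping above at face value, it could be sketched like this. The tweet keys (`entities.hashtags`, `entities.user_mentions`, `entities.urls`, `retweet_count`) follow Twitter's JSON; the subfield codes, the blank indicators on 653, 700, and 500, and the plain-tuple field representation are assumptions made for illustration. The 336/337/338 fields are left out because the slide does not say what values they would carry.

```python
from typing import Dict, List, Tuple

# (tag, indicators, subfields): a simplified stand-in for a real MARC field.
Field = Tuple[str, str, Dict[str, str]]


def tweet_to_marc_fields(tweet: dict) -> List[Field]:
    """Map one tweet's hashtags, mentions, links, and retweet count to fields."""
    entities = tweet.get("entities", {})
    fields: List[Field] = []
    for hashtag in entities.get("hashtags", []):
        # 653: subject added entry, uncontrolled
        fields.append(("653", "  ", {"a": hashtag["text"]}))
    for mention in entities.get("user_mentions", []):
        # 700: name added entry (indicators left blank here as an assumption)
        fields.append(("700", "  ", {"a": mention["screen_name"]}))
    for url in entities.get("urls", []):
        # 856 42: URL of a related resource
        fields.append(("856", "42", {"u": url["expanded_url"]}))
    # 500: general note carrying the retweet count
    note = f"Retweet count: {tweet.get('retweet_count', 0)}"
    fields.append(("500", "  ", {"a": note}))
    return fields
```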
w / catmandu
slinging data around
is fun and easy!
already indexed piles of tweets in ElasticSearch*
* really!
we will add
2 - 4 million
catalog records
per month
WorldCat
can handle this
it’s web scale!
augmenting / creating
authority records
w / twitter screen names
already cleared it with a PCC / NACO rep!
Summon
can handle this
Andrew is very familiar with growing consortial catalogs!
github.com/gwu-libraries/social-feed-manager
@dchud
dchud @ gwu edu
