Social Feed Manager, WADL/JCDL 2016

Social Feed Manager
Laura Wrubel
@liblaura @SocialFeedMgr
http://go.gwu.edu/sfm
Web Archives and Digital Libraries workshop, JCDL 2016
Social Feed Manager is supported by the National Historical Publications & Records Commission

Allows users to create collections of data
from social media platforms

Open source software, not a black box

Research documentation (for researchers)
≈ provenance metadata (for archivists)
(and it’s really important for both)

Creation
Authoring of the social media content
● Creation metadata is provided by Twitter as JSON via API.
● Social media user metadata:
○ Screen name
○ Date account created
○ Location
● Tweet metadata:
○ Date
○ Tweet text
○ Mentions
○ Hashtags
○ URLs
○ Source (how posted)
● SFM records it in WARC files.
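
A minimal sketch of pulling the creation fields listed above out of one tweet, assuming the Twitter REST API v1.1 JSON layout (`user`, `entities`, `created_at`, `source`). The function name `creation_metadata` and the sample tweet are made up for illustration and are not part of SFM.

```python
from typing import Any, Dict


def creation_metadata(tweet: Dict[str, Any]) -> Dict[str, Any]:
    """Collect user and tweet creation metadata from one tweet object."""
    user = tweet.get("user", {})
    entities = tweet.get("entities", {})
    return {
        # Social media user metadata
        "screen_name": user.get("screen_name"),
        "account_created": user.get("created_at"),
        "location": user.get("location"),
        # Tweet metadata
        "date": tweet.get("created_at"),
        "text": tweet.get("text"),
        "mentions": [m["screen_name"] for m in entities.get("user_mentions", [])],
        "hashtags": [h["text"] for h in entities.get("hashtags", [])],
        "urls": [u.get("expanded_url") or u.get("url") for u in entities.get("urls", [])],
        "source": tweet.get("source"),
    }


# Made-up tweet, trimmed to the fields used here (not real data).
SAMPLE_TWEET = {
    "created_at": "Wed Jun 22 14:00:00 +0000 2016",
    "text": "Talking provenance with @example_user #webarchiving http://t.co/abc",
    "source": '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>',
    "user": {
        "screen_name": "example_archivist",
        "created_at": "Mon Mar 02 12:00:00 +0000 2009",
        "location": "Washington, DC",
    },
    "entities": {
        "user_mentions": [{"screen_name": "example_user"}],
        "hashtags": [{"text": "webarchiving"}],
        "urls": [{"url": "http://t.co/abc", "expanded_url": "http://example.org/paper"}],
    },
}

if __name__ == "__main__":
    for key, value in creation_metadata(SAMPLE_TWEET).items():
        print(f"{key}: {value}")
```

Missing fields come back as `None` or an empty list rather than raising, since not every tweet carries every entity.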

Selection
Decisions by the SFM user that lead SFM to harvest the tweet
Recorded in the SFM database
● Collection information
○ Harvest type
○ Harvest options (e.g., incremental, harvest web resources)
○ Credentials (API keys)
○ Description of collection
● Seeds for the collection (which vary by platform)
○ Screen name
○ UID
○ Keywords to filter on
● Change log
○ Change note
○ Fields changed
○ User who made change
○ Date of change
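
One way to picture the selection record is a collection object that appends a change log entry whenever a field is edited. This is a sketch only: the class and field names (`Collection`, `ChangeLogEntry`, `credential_name`) and the sample values are assumptions, not SFM's actual database schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Tuple


@dataclass
class ChangeLogEntry:
    note: str                                   # Change note
    fields_changed: Dict[str, Tuple[Any, Any]]  # field name -> (old, new)
    user: str                                   # User who made the change
    date: datetime                              # Date of change


@dataclass
class Collection:
    harvest_type: str
    harvest_options: Dict[str, bool]
    credential_name: str  # a label for the API keys, not the keys themselves
    description: str
    seeds: List[str] = field(default_factory=list)
    change_log: List[ChangeLogEntry] = field(default_factory=list)

    def update(self, user: str, note: str, **changes: Any) -> ChangeLogEntry:
        """Apply field changes and record who changed what, when, and why."""
        changed: Dict[str, Tuple[Any, Any]] = {}
        for name, new_value in changes.items():
            old_value = getattr(self, name)
            if old_value != new_value:
                changed[name] = (old_value, new_value)
                setattr(self, name, new_value)
        entry = ChangeLogEntry(note, changed, user, datetime.now(timezone.utc))
        self.change_log.append(entry)
        return entry


if __name__ == "__main__":
    coll = Collection(
        harvest_type="user timeline",
        harvest_options={"incremental": True, "harvest_web_resources": False},
        credential_name="example-credential",
        description="Example collection",
        seeds=["liblaura"],
    )
    entry = coll.update("archivist1", "Added project account",
                        seeds=["liblaura", "SocialFeedMgr"])
    print(entry.fields_changed)
```

Only fields whose values actually differ are logged, so the change log stays a record of decisions rather than of every save.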

Collection
How SFM retrieved the tweet from Twitter’s API
● Collection metadata is received by SFM’s Twitter harvester & recorded
within WARCs.
● WARCs include the exact HTTP request/response
○ URL with params such as user account id or keywords
○ HTTP headers
● WARC record headers also include:
○ Date WARC record created
○ Server information
○ Fixity digests
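
To make the WARC record headers concrete, the sketch below parses one header block and recomputes a payload digest. The named fields (`WARC-Type`, `WARC-Date`, `WARC-Target-URI`, `WARC-IP-Address`, `WARC-Payload-Digest`) come from the WARC format; the `sha1:` plus Base32 labelling is a common convention assumed here, and the sample record, IP address, and payload are invented, not captured from SFM.

```python
import base64
import hashlib
from typing import Dict


def parse_warc_headers(block: str) -> Dict[str, str]:
    """Parse the header block of a single WARC record into a dict."""
    lines = block.strip().splitlines()
    headers = {"version": lines[0].strip()}
    for line in lines[1:]:
        # Split on the first colon only, so URLs and timestamps stay intact.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return headers


def sha1_label(data: bytes) -> str:
    """Return a labelled digest such as 'sha1:<Base32 of the SHA-1>'."""
    digest = hashlib.sha1(data).digest()
    return "sha1:" + base64.b32encode(digest).decode("ascii")


# Invented payload and record headers, for illustration only.
SAMPLE_PAYLOAD = b'[{"text": "example tweet"}]'
SAMPLE_HEADERS = "\r\n".join([
    "WARC/1.0",
    "WARC-Type: response",
    "WARC-Date: 2016-06-22T14:00:05Z",
    "WARC-Target-URI: https://api.twitter.com/1.1/statuses/user_timeline.json?user_id=12345",
    "WARC-IP-Address: 192.0.2.10",
    "WARC-Payload-Digest: " + sha1_label(SAMPLE_PAYLOAD),
])

if __name__ == "__main__":
    headers = parse_warc_headers(SAMPLE_HEADERS)
    print(headers["WARC-Target-URI"])
    print("fixity ok:", headers["WARC-Payload-Digest"] == sha1_label(SAMPLE_PAYLOAD))
```

Real WARC files are better read with a dedicated WARC library; the point here is only that the request URL, capture date, server address, and fixity all travel with the record itself.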

Collection (cont.)
● WARC file metadata, recorded in the SFM database:
○ File location
○ File size
○ Fixity
○ Creation date
● Harvest metadata:
○ Date
○ Collection
○ Date harvest started
○ Date harvest ended
○ Messages (informational, warning, or error)
○ Token/seed updates
○ Basic stats on number of items collected
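
The WARC file metadata above (location, size, fixity, creation date) could be gathered and stored along these lines. This is a hedged sketch: the `warc` table, its columns, and the function names are hypothetical, SQLite stands in for whatever database SFM uses, and the file's modification time is used as an approximation of its creation date.

```python
import hashlib
import os
import sqlite3
from datetime import datetime, timezone
from typing import Any, Dict


def describe_warc(path: str) -> Dict[str, Any]:
    """Compute location, size, SHA-1 fixity, and a date for one file."""
    sha1 = hashlib.sha1()
    with open(path, "rb") as fh:
        # Read in 1 MiB chunks so large WARCs are not loaded into memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            sha1.update(chunk)
    stat = os.stat(path)
    return {
        "path": path,
        "bytes": stat.st_size,
        "sha1": sha1.hexdigest(),
        # Modification time, used here as a stand-in for creation date.
        "created": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
    }


def record_warc(db: sqlite3.Connection, info: Dict[str, Any]) -> None:
    """Store the file metadata in a (hypothetical) warc table."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS warc "
        "(path TEXT PRIMARY KEY, bytes INTEGER, sha1 TEXT, created TEXT)"
    )
    db.execute(
        "INSERT OR REPLACE INTO warc VALUES (:path, :bytes, :sha1, :created)",
        info,
    )
    db.commit()


def verify_warc(db: sqlite3.Connection, path: str) -> bool:
    """Re-hash the file and compare against the stored fixity value."""
    row = db.execute("SELECT sha1 FROM warc WHERE path = ?", (path,)).fetchone()
    return row is not None and row[0] == describe_warc(path)["sha1"]
```

Keeping the fixity value outside the WARC lets a later audit detect whether the file changed after the harvest finished.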

Working paper: http://bit.ly/tweet-prov
Comments welcome!

How is this useful? http://bit.ly/tweet-prov
● Which of this provenance metadata do you (researcher,
archivist, librarian, etc.) want access to?
● How do you want access to this metadata? In SFM’s UI? In
reports when exports are created? Exposed via SFM’s
software libraries? A REST API? Machine-readable?
Human-readable?
● What metadata have we missed?
● Do the answers to the previous questions vary by discipline
(e.g., humanities, social science, etc.)?
● Are there other relevant specifications or standards that we
should consider? Is there value in a mapping to or providing
output in accordance with metadata standards such as
PREMIS or PROV?
