COLLECTING HISTORY AS IT HAPPENS
Twitter was a primary source of information
about the protests in and in the name of
How can organizations archive
Twitter data along with other
collections to make a
comprehensive data set available
for research in ways that are
ethically sound & in compliance
with the Twitter Terms of Service?
ABOUT THE PROJECT
Collaborative project between:
• 2 year project, funded by Andrew W. Mellon Foundation
• Feb 1, 2016 – Jan 31, 2018
• Funding project coordinator, application development, subawards to collaborators, Advisory Board
BUILDING OFF EXISTING
Ferguson Collecting Initiative
Joel Levy, “Ferguson, Night 3, photo 4,” WUSTL Digital Gateway Image
Collections & Exhibitions, accessed August 12, 2015,
DOCUMENTING THE NOW
1. Create the DocNow application
• Cloud-ready, open-source application for collecting tweets, associated Web content, &
• Create data visualizations & data exports
2. Create a Ferguson social media data set for research & preservation
• Use Tweets and related data (Documenting Ferguson, oral histories) to create a data set
research questions about Ferguson protests
3. Produce a white paper
• Address ethical, copyright & access issues for collection, preservation & dissemination of
4. Convene an Advisory Board
• Scholars, activists & technologists who can help shape application
5. Build a community of users & advocates around DocNow
• Identify other scholars interested in using DocNow for their own research
ROLE OF LIBRARIES & ARCHIVES IN
COLLECTING HISTORY AS IT HAPPENS
Mark Regester, “Protester holding a sign,” WUSTL Digital Gateway Image
Collections & Exhibitions, accessed March 5, 2016.
CURATING VS COLLECTING
Curating:
• Includes research and input from different individuals
• an evaluation of the collection content within the framework of a collection development policy
• an assessment of the content for research use
• an assessment of originality and authenticity
• ownership transfers to archive
Collecting at scale:
• No selection
• No assessment of content
• No quality control
• Contributor / author retains rights
• Contextual curation via research projects
Mark Regester, “Woman with Obama sign Greater Grace
Church,” WUSTL Digital Gateway Image Collections &
Exhibitions, accessed August 12, 2015,
Courtesy Sonya Rooney, University Archivist, WUSTL Libraries
ETHICAL ISSUES WITH ARCHIVING TWITTER DATA
Twitter was heavily used during protests
Central gathering point for other social media channels
• Publish a photo on Instagram or Vine & share on Twitter
But what about:
• Identifying / Incriminating photos
• Honoring intent
• If we’ve captured a Tweet that someone has deleted, do we:
  • Do nothing, and keep it in full in the archive?
  • Mark it as deleted, but still make it available for research use?
  • Delete it from our archive?
White paper will address these
issues and more…
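One way to make the three deleted-Tweet options concrete is to treat them as a configurable archive policy. The sketch below is illustrative only and is not taken from the DocNow application: the names `DeletionPolicy`, `ArchivedTweet`, and `apply_policy` are hypothetical, and the record is reduced to the few fields needed to show the idea.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional


class DeletionPolicy(Enum):
    """The three options raised above (names are hypothetical)."""
    KEEP = "keep"      # do nothing; keep the Tweet in full
    MARK = "mark"      # flag it as deleted, still available for research
    REMOVE = "remove"  # delete it from the archive


@dataclass(frozen=True)
class ArchivedTweet:
    """Simplified stand-in for an archived Tweet record (assumed fields)."""
    tweet_id: str
    text: str
    deleted_by_author: bool = False


def apply_policy(tweet: ArchivedTweet,
                 policy: DeletionPolicy) -> Optional[ArchivedTweet]:
    """Return the record to keep after the author deletes the Tweet.

    Returns None when the policy says the Tweet should leave the archive.
    """
    if policy is DeletionPolicy.KEEP:
        return tweet
    if policy is DeletionPolicy.MARK:
        # Keep the content but record that the author withdrew it.
        return replace(tweet, deleted_by_author=True)
    if policy is DeletionPolicy.REMOVE:
        return None
    raise ValueError(f"unknown policy: {policy!r}")
```

Making the choice an explicit setting rather than a hard-coded behavior would let an archive change its answer later without altering the collected data.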
Twitter was a primary source of information about Michael Brown’s death and the resulting protests in Ferguson and other areas of St. Louis.
Twitter was used to communicate & organize. Reporting was happening in real-time on the ground.
The response was immediate, public, and international.
An issue for those of us in the research community is that Twitter only makes 7 days’ worth of data available via its API.
The research question for our project is...
My Co-PIs and I all have experience in preserving social media or working with Ferguson-related efforts.
I bring to the table our Documenting Ferguson project and our local work with and within the Ferguson community.
Community archive of digital media contributed by people at or involved in protests, currently with 1,500 first-hand images & videos
Bergis Jules is the University Archivist at UCR & the Community Outreach Lead for the project
Bergis has written extensively about Ferguson and the role of libraries & archives in preserving social media content
Ed Summers is the Lead Developer at MITH & is the Tech Lead for our project.
Ed has worked extensively with preserving social media data in his previous role working on the Twitter Archive at the Library of Congress.
He also started a process that gathered 13 million tweets in the two weeks following Michael Brown’s death. That work inspired this project.
So, why libraries?
Libraries and archives have always collected materials in order to preserve them and make them available for research.
The format shift from analog to born digital, like this digital image from Documenting Ferguson or this tweet about Ferguson, poses new challenges in preserving and maintaining persistent access to research materials.
Libraries have long collected ephemera, like the posters & signs used in protests, but the move to digital ephemera is relatively new.
In closing, I’d like to address a final question we’ve received in the early startup of the project: “Why is this project important?” It matters on multiple levels:
Builds on & extends our existing Ferguson efforts
Broadly, to the Profession
New practices for archiving social media for research use
Demonstrates the valuable role the Libraries play in stewarding new research technologies & infrastructures, and continues our work in the Ferguson community.
These short-form messages document an important & ultimately tragic moment in American history. Twitter, by its nature, is ephemeral & ever-changing. There is a very real risk that this history could vanish should Twitter change its business model or access policies. Our project is working to safeguard that record and to make it available for research & reflection.