Collecting in the Moment

In the wake of recent events at the University of Virginia surrounding the ousting, and later reinstatement, of President Teresa Sullivan, the University Library, including the University Archives, in the Albert and Shirley Small Special Collections Library, scrambled to collect picket signs, gather tweets and Facebook postings, and bring together other materials documenting the events on Grounds, even as they were unfolding. In the light of this event and others like the Occupy Movements, Arab Spring, and 9/11, this discussion will explore questions of how institutions document, save, and preserve materials pertaining to current events, especially when those events are born through social networking sites. Gretchen Gueguen, Digital Archivist at the University of Virginia, will discuss her work to capture digital material from social media and websites during the Sullivan episode. A wide-ranging discussion with all audience members will follow to uncover questions of how to approach such events and what the role of the Archives or Special Collection might be in creating and managing such records.

Speaker: Gretchen Gueguen, Albert and Shirley Small Special Collections Library, University of Virginia

Moderator: Nicole Bouché, Albert and Shirley Small Special Collections Library, University of Virginia

Published in: Technology

  1. Collecting in the Moment
     Gretchen Gueguen
     University of Virginia
     RBMS Pre-conference
     June 24, 2013
  2. June 10, 2012
     Teresa A. Sullivan, President of the University of Virginia, announces resignation…
     …Gretchen M. Gueguen, Digital Archivist at UVA, prepares to attend Rare Book School the next day
  3. June 11-16, 2012
  4. June 18, 2012
  5. June 18, 2012
     • Decision is made to form a cross-departmental group within the library to discuss saving the historic record related to these events
  6. At 9:00 a.m. on July 19th … me
  7. What’s the Big Deal?
     • Digital is THE publishing platform
     • Event was important both for the historic nature of the events (message) and for HOW it was communicated (medium)
  8. Springing into action
     • Twitter
     • Blogs and Web
     • Facebook
     • News
     • Video
  9. Twitter API
     • Allows you to download tweets as data for a given hashtag, user, or keyword search (#woo-hoo!)
     • Has many tools available for doing all kinds of neat stuff (#woo-hoo!)
     • Limits you to just the last 1500 tweets for any given search (#d’oh!)
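Because each search returns only the most recent 1,500 tweets, covering a fast-moving hashtag means harvesting repeatedly and combining the overlapping results. The sketch below shows one way that merge could work. It is not the workflow used for this collection: the function name `merge_harvests`, the `id` and `text` field names, and the sample tweets are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List


def merge_harvests(batches: Iterable[List[dict]]) -> List[dict]:
    """Combine repeated search harvests, keeping one copy of each tweet."""
    seen: Dict[str, dict] = {}
    for batch in batches:
        for tweet in batch:
            # "id" is an assumed field name; the first copy seen is kept.
            seen.setdefault(str(tweet["id"]), tweet)
    # Sort by numeric ID, on the assumption that IDs increase over time.
    return sorted(seen.values(), key=lambda t: int(t["id"]))


# Invented sample data: two harvests of the same search that overlap.
monday = [
    {"id": "101", "text": "#UVA rally at noon"},
    {"id": "102", "text": "#BOV meeting today"},
]
tuesday = [
    {"id": "102", "text": "#BOV meeting today"},
    {"id": "103", "text": "#fillthelawn"},
]

merged = merge_harvests([monday, tuesday])
print(len(merged), "unique tweets")
```

Keying on the tweet ID rather than the text keeps retweets and identical messages from different accounts as separate records.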
  10. Info at: http://mashe.hawksey.info/2011/11/twitter-how-to-archive-event-hashtags-and-visualize-conversation/
  11. Final Collection
      • 47 XML files
        – #BOV, #UVA, #rally4honor, #dragasmustgo
        – @cavalierdaily, @LarrySabato, @Rector Drago, @strategydynamo
      • 47 spreadsheets
        – Hashtags only (#UVA, #sullivan, #BOV, #fillthelawn, #strine, #united4honor)
  12. Twitter API update
      Re-harvest has returned ~53,000 tweets
        – Data issues
        – Deleted accounts
  13. Posted content
      • Links, pictures, video related to the story
      • Could not find a tool to just extract these to look through later
      • Many shortened links that had to be clicked on to find out what they held
      • Many links were retweeted
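The link problem on slide 13 has two parts: pulling the URLs out of the tweet text, and collapsing the copies that retweets create. A minimal sketch of both steps follows, assuming the tweet text is already available as plain strings. The regular expression is a rough approximation, not a full URL parser, and the sample tweets and t.co addresses are invented. Finding out what a shortened link holds would still take an HTTP request that follows its redirects, which is not shown here.

```python
import re
from collections import Counter
from typing import Iterable

# Rough pattern: "http://" or "https://" up to the next space, quote or bracket.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def extract_links(texts: Iterable[str]) -> Counter:
    """Count how often each URL appears across a set of tweet texts."""
    counts: Counter = Counter()
    for text in texts:
        for url in URL_PATTERN.findall(text):
            # Drop punctuation that belongs to the sentence, not the link.
            counts[url.rstrip(".,;:!?)")] += 1
    return counts


# Invented sample data, including a retweet that repeats a link.
sample = [
    "Rally on the Lawn today http://t.co/abc123 #UVA",
    "RT @someone: Rally on the Lawn today http://t.co/abc123 #UVA",
    "Statement from the Board: http://t.co/xyz789.",
]

links = extract_links(sample)
for url, count in links.most_common():
    print(count, url)
```

Sorting by count gives a reviewer one entry per link, with the most widely shared items first.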
  14. Blogs and other web content
      • How to capture everything else
      • Tools for web capture
        – Difficult to implement
        – Don’t do exactly what is needed
        – I’m running out of time!
      • Solution:
        – I have to look at it anyway to select, so
          • Firefox “Save As”
          • Screengrab plugin for screenshot
  15. Web sites
      • No way to create web-archive standard (WARC) files at the time
        – ~1,000 HTML +archive
        – Screenshot
      • Investigation of WAIL (Web Archiving Integration Layer) to create WARC files
        – Will require a re-harvest of URLs to ensure proper header metadata
        – But has an automated way of doing this
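To make the header-metadata point on slide 15 concrete, here is a hand-rolled sketch of one WARC "resource" record wrapped around an already-saved HTML file. It is an illustration only, not how WAIL works and not a substitute for a harvesting tool. The function name, the example.org URL and the payload are assumptions. A resource record stores the file without the HTTP response headers that a live harvest would record, which is one reason a re-harvest is the more complete route.

```python
import uuid
from datetime import datetime, timezone


def warc_resource_record(target_uri: str, payload: bytes,
                         content_type: str = "text/html") -> bytes:
    """Build one WARC/1.0 'resource' record for an already-saved file."""
    # Caution: this stamps the current time. For a file saved earlier, the
    # original capture time would be the correct WARC-Date value.
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    header_lines = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {now}",
        f"WARC-Target-URI: {target_uri}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",
    ]
    # Header lines end with CRLF; a blank line separates them from the block.
    head = "\r\n".join(header_lines).encode("utf-8") + b"\r\n\r\n"
    # Every record is followed by two CRLFs.
    return head + payload + b"\r\n\r\n"


# Hypothetical URL and content standing in for a page saved by hand.
payload = b"<html><body>Saved page</body></html>"
record = warc_resource_record("http://example.org/saved-page.html", payload)
print(record.decode("utf-8"))
```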
  16. Facebook & “Privacy”
      • Rallies on Grounds were organized through Facebook “groups.”
      • Some posts are visible only to members of the group. All others are only visible to those with a Facebook account.
  17. Facebook & Privacy
      • Facebook accounts are free
      • But this still means the content wasn’t “public” as per the TOS
  18. News
      • Relatively easy to capture
      • Overwhelming in volume
      • Why capture the online version?
        – Some things only appear online, some only in print
        – Online version, for many sources, allows commenting
      • Why capture this when it will be saved elsewhere?
        – Reference collection
        – Databases may capture content but not commentary
  19. Subscriptions
  20. Audio/Video
      • YouTube
      • News
      • WINA podcast
      • WUVA streaming
      • Streaming Board Meetings
      • Public Affairs
  21. User Contributions
      • Capture what the public thought was important
      • Possible violations of privacy or intellectual property
  22. Final Tally
      • Tweets: 80,000 ?
      • News articles: 572
      • Blog posts: 147
      • Other web content: 196
      • Twitter pictures: 243
      • Video: 69
      • Documents: 21
      • User-Contributed Items: 118
  23. What’s Been Done
      • Preliminary collection finding aid
      • Working with small group on Twitter and web data issues
      • Twitter and web re-harvest
      • Access provided in a few cases
  24. What Needs to Be Done?
      • Access
        – Searching
        – Use
      • Metadata
      • Further appraisal decisions/de-accessioning
  25. What About Next Time?
      • Need to establish a web/social media collection plan
        – If we are routinely capturing certain things we won’t have to worry about them during a crisis
        – Tools change rapidly; collecting routinely will better position us to adapt
