2. SoLoGlo
Martin Klein @mart1nkle1n
iPres 2015, Chapel Hill, NC, November 3rd 2015
Digital Ephemera Collections
• Collected by researchers
• Donated by activists
• Images, audio, video, scanned documents, web server logs, social media
Digital Ephemera Collections - Tweets
• Tahrir Square (Egypt) & Libya unrest, 2011
• Tōhoku earthquake and tsunami, Japan, 2011
• AirAsia Flight 8501 crash, December 2014
• Charlie Hebdo shooting, January 2015
Social Feed Manager: http://social-feed-manager.readthedocs.org/
NewsScape
• 264,000 hours of TV news archived digitally
• Recorded 2005-present, ca. 100 shows/day
• 13 countries, 9 languages
• 38 networks
• Searchable by captions, on-screen text, official transcripts
Tweets from @LATimes - Example
http://t.co/f5bp0E5twB
→ http://lat.ms/1EdoVAW
→ http://www.latimes.com/food/dailydish/la-dd-cinco-de-mayo-25-recipes-to-celebrate-20150504-story.html
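Each posted link in the example is a chain of redirects: Twitter's t.co URI points to the LA Times' lat.ms short URI, which in turn points to the article. A minimal Python sketch of unwinding such a chain one hop at a time follows, assuming the shorteners answer a HEAD request with a 3xx status and a Location header. `head_location` and `resolve_chain` are hypothetical helpers for illustration, not part of any tool named in these slides.

```python
import http.client
from typing import Callable, List, Optional
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}

def head_location(uri: str) -> Optional[str]:
    """Send one HEAD request without following redirects and return the
    Location header if the server answered with a redirect, else None."""
    parts = urlsplit(uri)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=10)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("HEAD", target)
        resp = conn.getresponse()
        return resp.getheader("Location") if resp.status in REDIRECT_CODES else None
    finally:
        conn.close()

def resolve_chain(uri: str,
                  next_hop: Callable[[str], Optional[str]] = head_location,
                  max_hops: int = 10) -> List[str]:
    """Follow redirects hop by hop and return every URI in the chain,
    from the posted (shortened) URI to the final target."""
    chain = [uri]
    for _ in range(max_hops):
        location = next_hop(chain[-1])
        if location is None:
            break  # no further redirect: chain[-1] is the final target
        # A Location header may be relative; resolve it against the current URI.
        nxt = urljoin(chain[-1], location)
        if nxt in chain:
            break  # redirect loop
        chain.append(nxt)
    return chain
```

For offline testing, any function that maps a URI to its next hop (or None) can be passed as `next_hop`, for example a dict's `get` method; keeping every hop, rather than only the final target, is what allows the intermediate short URIs to be counted and looked up in archives separately.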
Tweets from @LATimes - Stats
• 3,246 tweets
• Containing URIs: 3,061
    • All shortened by Twitter, e.g., http://t.co/LThs4bfDrb
    • URIs already shortened before Twitter: 3,018 (99%)
        • lat.ms (2,976), fw.to (28), ow.ly (7), wp.me (4), youtu.be, goo.gl, bit.ly (1 each)
    • Original URIs that are, in fact, longer than the posted URI: 2,991 (98%); 66 shorter, 4 same length
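Counts like these can be tallied from resolved redirect chains. The sketch below assumes each chain is a list of URIs that starts with the t.co URI and ends with the original article URI, as produced by following redirects hop by hop; `shortener_hosts`, `compare_lengths`, and `pct` are hypothetical helpers. Which URI counts as the "posted URI" in the length comparison is left to the caller.

```python
from collections import Counter
from typing import Iterable, List, Tuple
from urllib.parse import urlsplit

TWITTER_SHORTENER = "t.co"

def shortener_hosts(chains: Iterable[List[str]]) -> Counter:
    """Count the shortening service used before Twitter's t.co wrapping.
    A chain is [t.co URI, other short URI(s)..., original URI]; a chain of
    length 2 means the tweet linked the original URI directly."""
    counts = Counter()
    for chain in chains:
        if urlsplit(chain[0]).hostname != TWITTER_SHORTENER:
            continue  # not wrapped by Twitter; outside this tally
        if len(chain) > 2:
            counts[urlsplit(chain[1]).hostname] += 1
    return counts

def compare_lengths(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Classify each (posted, original) pair by whether the original URI
    is longer, shorter, or the same length as the posted one."""
    result = Counter()
    for posted, original in pairs:
        if len(original) > len(posted):
            result["longer"] += 1
        elif len(original) < len(posted):
            result["shorter"] += 1
        else:
            result["same"] += 1
    return result

def pct(part: int, whole: int) -> str:
    """Format a count with its rounded share, e.g. for slide bullets."""
    return f"{part:,} ({part / whole:.0%})"
```

Usage: `pct(3018, 3061)` returns `"3,018 (99%)"` and `pct(2991, 3061)` returns `"2,991 (98%)"`, matching the rounded shares on the slide.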
Archival Status
• URIs shortened by Twitter archived: 1
• URIs otherwise shortened archived: 35
• “Original” URIs archived: 2,345 (77%)
• Contributing web archives:
    • Internet Archive: 2,195 (93%)
    • Archive.today: 84
    • Archive-It: 62
These stats are brought to you by Memento: http://timetravel.mementoweb.org/
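Memento (RFC 7089) lists the archived copies (mementos) of a URI in a TimeMap; in the application/link-format serialization each entry is a URI in angle brackets followed by attributes such as rel="memento" and datetime. A minimal sketch of counting mementos per archive from such a TimeMap follows, assuming attribute values are always quoted. Fetching the TimeMap from the Memento aggregator is left out, and the helper names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List
from urllib.parse import urlsplit

# One link-value: <URI> followed by ;-separated attributes with quoted values.
LINK_RE = re.compile(r'<([^>]+)>((?:\s*;\s*[\w-]+\s*=\s*"[^"]*")*)')
ATTR_RE = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')

def parse_timemap(text: str) -> List[Dict[str, str]]:
    """Parse a link-format TimeMap into dicts like
    {"uri": ..., "rel": ..., "datetime": ...}."""
    links = []
    for uri, attrs in LINK_RE.findall(text):
        entry = {"uri": uri}
        entry.update(ATTR_RE.findall(attrs))
        links.append(entry)
    return links

def mementos_by_archive(text: str) -> Counter:
    """Count mementos per archive host. A rel value may combine relation
    types (e.g. "first memento"), so it is split on whitespace."""
    counts = Counter()
    for link in parse_timemap(text):
        if "memento" in link.get("rel", "").split():
            counts[urlsplit(link["uri"]).hostname] += 1
    return counts

def is_archived(text: str) -> bool:
    """Treat a URI as archived if its TimeMap lists at least one memento."""
    return sum(mementos_by_archive(text).values()) > 0
```

Grouping by archive host is what yields a per-archive breakdown like the "Contributing web archives" bullets; entries with rel="original", "self", or "timegate" are ignored because they are not mementos.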
Archival Value
[Figure: Similarity between tweet text and article’s web page title; x-axis 0.0 to 1.0, from Identical to Dissimilar; y-axis 0 to 1200]
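The slides do not say which similarity measure was used. As one possible stand-in, a Jaccard distance over lower-cased word sets gives 0.0 for identical texts and 1.0 for texts with no word in common, matching the Identical-to-Dissimilar axis. URLs are stripped from the tweet first so the t.co link is not counted as text. This is an assumption for illustration, not the measure behind the plot; `jaccard_distance` is a hypothetical helper.

```python
import re

URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"\w+")

def tokens(text: str) -> set:
    """Lower-cased word set with URLs removed."""
    return set(TOKEN_RE.findall(URL_RE.sub(" ", text).lower()))

def jaccard_distance(a: str, b: str) -> float:
    """0.0 = identical word sets, 1.0 = no words in common."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0  # two empty texts are treated as identical
    return 1.0 - len(ta & tb) / len(ta | tb)
```

A set-based measure ignores word order and repetition, which suits short texts such as tweets and page titles; an order-sensitive measure would be a reasonable alternative for longer text.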
[Figure: Archival Value plot (x-axis 0.0 to 1.0, Identical to Dissimilar; y-axis 0 to 1200), annotated “Need to Archive” and “Not archived!”]
[Figure: Archival Value plot (x-axis 0.0 to 1.0, Identical to Dissimilar; y-axis 0 to 1200), annotated “Need to Archive” and “Archived!”]
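The annotated plots split the items that need archiving by whether they are already archived. Assuming each item carries a dissimilarity score and a boolean archived flag (for example from a Memento lookup), that split can be sketched as follows. The `Item` type, `need_to_archive`, and the 0.5 cutoff are hypothetical, since the slides give no threshold.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

@dataclass
class Item:
    uri: str
    dissimilarity: float  # 0.0 = identical, 1.0 = dissimilar
    archived: bool        # e.g. at least one memento found

def need_to_archive(items: Iterable[Item],
                    threshold: float = 0.5) -> Tuple[List[Item], List[Item]]:
    """Keep items at or above a (hypothetical) dissimilarity threshold and
    split them into (already archived, not yet archived)."""
    flagged = [it for it in items if it.dissimilarity >= threshold]
    archived = [it for it in flagged if it.archived]
    missing = [it for it in flagged if not it.archived]
    return archived, missing
```

The second list is the actionable one: items with high archival value that no archive holds yet.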
Urgency to Archive
[Figure: Similarity between web page’s text in May 2015 and on 10/30/2015; x-axis 0.0 to 1.0, from Identical to Dissimilar; y-axis 0 to 1000]
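To gauge how much a page changed between two crawls, the sketch below swaps in Python's `difflib.SequenceMatcher` over word sequences and reports `1 - ratio()`, so 0.0 means unchanged and 1.0 means nothing in common. It assumes the page text has already been extracted from the HTML of both snapshots, and it is not necessarily the measure used for the plot. Ranking URIs by this score is one hypothetical way to express urgency to archive; `change_score` and `rank_by_urgency` are illustrative names.

```python
import difflib
from typing import Dict, List, Tuple

def change_score(old_text: str, new_text: str) -> float:
    """0.0 = identical, 1.0 = completely different, based on matching
    word sequences between the two snapshots."""
    old_words, new_words = old_text.split(), new_text.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)
    return 1.0 - matcher.ratio()

def rank_by_urgency(snapshots: Dict[str, Tuple[str, str]]) -> List[str]:
    """Order URIs from most to least changed between (old, new) texts."""
    scores = {uri: change_score(old, new) for uri, (old, new) in snapshots.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Unlike a set-based measure, `SequenceMatcher` is sensitive to word order, so reordered or rewritten paragraphs also register as change.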
Conclusions
• Online news changes rather rapidly, duh!
• Social media, for some news organizations, is not just a mirror of their web site, hence worth archiving
• Timing matters!
    • Traditional, incidental web archiving seems insufficient, given the dynamic character and change frequency of online news
• Tools matter!
    • Need to get closer to the publication process!