
Creating Pockets of Persistence

Extended version of slides presented at the "404/File Not Found" symposium held at Georgetown University on October 24 2014 (see http://www.law.georgetown.edu/library/404/). The presentation provides a brief overview of the link/reference rot problem and then discusses three complementary strategies to combat it: pro-actively capturing web resources that are linked from a seed collection; referencing the captures by means of annotated links; and accessing the captures using Memento infrastructure.

  1. 1. Creating Pockets of Persistence Herbert Van de Sompel @hvdsomp http://public.lanl.gov/herbertv/ Los Alamos National Laboratory Acknowledgements: Michael L. Nelson @phonedude_mln Old Dominion University Herbert Van de Sompel 404/File Not Found, Washington, DC, October 24 2014
  2. 2. Addressing the Link/Reference Rot Challenge • Pockets of Persistence • Capture – Archive Pro-Actively, Selectively • Reference – Annotate Links • Access – Travel in Time
  3. 3. Pockets of Persistence How to achieve the ability to: • Persistently • Precisely • Seamlessly revisit the Web of the Past and the Web of the Now at some point in the Future
  4. 4. Pockets of Persistence How to achieve the ability to: • Persistently • Precisely • Seamlessly revisit the Web of the Past and the Web of the Now at some point in the Future Two components to the link/reference rot challenge: • Link rot: Links stop working, aka 404 Not Found • Content drift: Referenced content changes over time
  5. 5. Illustration Current version of http://en.wikipedia.org/wiki/Coil_(band) on October 22 2014
  6. 6. Illustration – Link Rot Current version of http://en.wikipedia.org/wiki/Coil_(band) on October 22 2014
  7. 7. Illustration – Link Rot Current version of http://liarsociety.tripod.com/blog/index.blog?from=20041130 on October 22 2014
  8. 8. Illustration – Content Drift Version of http://en.wikipedia.org/wiki/Coil_(band) dated October 2 2014 http://en.wikipedia.org/w/index.php?title=Coil_(band)&oldid=388321480
  9. 9. Illustration – Content Drift Current version of http://en.wikipedia.org/wiki/Peter_Christopherson on October 22 2014
  10. 10. Illustration – Content Drift Version of http://en.wikipedia.org/wiki/Peter_Christopherson that was current on October 2 2010 http://en.wikipedia.org/w/index.php?title=Peter_Christopherson&oldid=387987414
  11. 11. Pockets of Persistence How to achieve the ability to: • Persistently • Precisely • Seamlessly revisit the Web of the Past and the Web of the Now at some point in the Future This challenge exists for the entire web, but some communities actually care about addressing it: • scholarly communication, • legal publications, • journalism, • Wikipedia, • … Mobilize the communities that care about this problem to work towards joint, interoperable solutions and approaches
  12. 12. Addressing the Link/Reference Rot Challenge • Pockets of Persistence • Capture – Archive Pro-Actively, Selectively • Reference – Annotate Links • Access – Travel in Time
  13. 13. Pro-Active Capture for a Seed Collection • Seed Collection - Starting point for capture is a seed collection of interest to communities that care, e.g. o On-Line journalism o Scholarly literature o Legal documents o Wikipedia articles • Lifecycle Events – Intervene at critical moments in the lifecycle of items in these collections to pro-actively capture o Collection items – some solutions in place o Web resources referenced in collection items
  14. 14. Pro-Active Capture for Seed Collection • What those crucial lifecycle events are may depend on the collection type (table columns: Scholarly Literature, Wikipedia) • Creation of new article • Creation of new version of article • Creation of substantially new version of article • Addition of external reference to article • References to article exceed a certain threshold
  15. 15. Authoring Legal Documents – perma.cc http://perma.cc
  16. 16. Authoring Scholarly Literature: Experimental Zotero Extension Richard Wincewicz (2014) Prototype Hiberlink plugin for Zotero for pro-active archiving and temporal references https://www.youtube.com/v/ZYmi_Ydr65M%26vq
  17. 17. Submitting Scholarly Literature: Experimental HiberActive Service Martin Klein et al. (2014) HiberActive: Pro-Active Archiving of web references from scholarly articles, Open Repositories 2014 http://www.slideshare.net/martinklein0815/hiberactive
  18. 18. Pro-Active Capture for Seed Collection • Interoperability for on-demand capture: o Need basic interoperability for machine-driven on-demand capture: - Discovery of capture interface - Interface IN - [ Original URI ] - Interface OUT - [ URI of Capture ; Capture Datetime ]
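
The IN/OUT contract on slide 18 can be sketched as a small Python interface. This is only an illustration under stated assumptions: the names `CaptureResult`, `CaptureService` and `InMemoryArchive`, and the `https://archive.example/` base URI, are hypothetical and do not come from the slides or from any real archive API. Discovery of the capture interface is not modelled.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Protocol


@dataclass(frozen=True)
class CaptureResult:
    """Interface OUT: [ URI of Capture ; Capture Datetime ]."""
    capture_uri: str
    capture_datetime: datetime


class CaptureService(Protocol):
    """Interface IN: the caller supplies only the Original URI."""

    def capture(self, original_uri: str) -> CaptureResult:
        ...


class InMemoryArchive:
    """Hypothetical stand-in archive; it performs no network access."""

    def __init__(self, base: str = "https://archive.example/") -> None:
        self.base = base
        self.captures: Dict[str, CaptureResult] = {}

    def capture(self, original_uri: str) -> CaptureResult:
        # Record the capture time in UTC, truncated to whole seconds.
        now = datetime.now(timezone.utc).replace(microsecond=0)
        stamp = now.strftime("%Y%m%d%H%M%S")
        result = CaptureResult(f"{self.base}{stamp}/{original_uri}", now)
        self.captures[original_uri] = result
        return result
```

Any archive that implemented the same two-field response could be swapped in behind `CaptureService` without changing the authoring or submission tool that calls it.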
  19. 19. Addressing the Link/Reference Rot Challenge • Pockets of Persistence • Capture – Archive Pro-Actively, Selectively • Reference – Annotate Links • Access – Travel in Time
  20. 20. Reference Captures and Annotate Links • Existing practice for linking to captures: o Link to URI of Capture o Lose Original URI o Lose Capture Datetime • Problems with existing practice: o Impossible to visit the original URI, if desired o Requires the permanent existence/uptime of the archive that holds the capture - One link rot problem replaced by another Van de Sompel, H. et al. (2013) Thoughts on referencing, linking, reference rot http://mementoweb.org/missing-link/
  21. 21. Permanent Existence/Uptime of Archives? Capture of http://webcitation.org dated July 17 2013 https://archive.today/eAETp
  22. 22. Permanent Existence/Uptime of Archives? http://webcitation.org/ on August 6 2014
  23. 23. Permanent Existence/Uptime of Archives? Remnant of discontinued web archive http://mummify.it captured on February 14 2014 https://web.archive.org/web/20140214233752/https://www.mummify.it/
  24. 24. Permanent Existence/Uptime of Archives? http://www.themoscowtimes.com/news/article/russia-bans-wayback-machine-internet-archive-over-islamic-state-video/510074.html
  25. 25. Hacking Original URI, Capture Datetime from Capture URI? (URI of Capture: can the Original URI / Datetime T be derived?) • https://web.archive.org/web/20140214233752/https://www.mummify.it – Original URI: yes, Datetime T: yes • https://archive.today/eAETp – Original URI: no, Datetime T: no • http://perma.cc/4RH7-999Q?type=source – Original URI: no, Datetime T: no • http://en.wikipedia.org/w/index.php?title=Coil_(band)&oldid=388321480 – Original URI: no, Datetime T: no
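
The "hack" on slide 25 can be illustrated for the one URI style where it works. The sketch below assumes the Wayback Machine layout visible in the slide's example, `/web/<14-digit timestamp>/<original URI>`, and treats the timestamp as UTC; the function name `parse_wayback_uri` is hypothetical. Opaque capture URIs such as the archive.today and perma.cc examples carry neither piece of information, so the function returns `None` for them.

```python
import re
from datetime import datetime, timezone
from typing import Optional, Tuple

# Assumed layout: https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URI>
WAYBACK_PATTERN = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def parse_wayback_uri(capture_uri: str) -> Optional[Tuple[str, datetime]]:
    """Return (original URI, capture datetime), or None if not derivable."""
    match = WAYBACK_PATTERN.match(capture_uri)
    if match is None:
        return None
    stamp, original_uri = match.groups()
    # Assumption: the embedded timestamp is expressed in UTC.
    captured_at = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(
        tzinfo=timezone.utc
    )
    return original_uri, captured_at


if __name__ == "__main__":
    for uri in (
        "https://web.archive.org/web/20140214233752/https://www.mummify.it/",
        "https://archive.today/eAETp",
        "http://perma.cc/4RH7-999Q?type=source",
    ):
        print(uri, "->", parse_wayback_uri(uri))
```

That this only works for one archive's URI convention is the point of the slide: the Original URI and Capture Datetime need to be conveyed explicitly rather than guessed from the capture URI.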
  26. 26. Using Capture URI to find Captures in Other Web Archives?
  27. 27. Using Capture URI to find Captures in Other Web Archives?
  28. 28. Reference Captures and Annotate Links • Desired practice for linking to captures is to annotate the link so it conveys: - URI of Capture - Original URI - Capture Datetime • Link annotation supports fallback to other archives: o Original URI allows finding captures in all web archives o Capture Datetime allows finding an appropriate capture in all web archives o Original URI and Capture Datetime together allow automatic access to an appropriate capture in all web archives (see Access) Van de Sompel, H. et al. (2013) Thoughts on referencing, linking, reference rot http://mementoweb.org/missing-link/
  29. 29. Reference Captures and Annotate Links • Desired practice for linking to captures is to annotate the link so it conveys: - URI of Capture - Original URI - Capture Datetime
  30. 30. Reference Captures and Annotate Links • Interoperability for link annotation: o Need an approach to convey, in a uniform, machine-actionable way: - URI of Capture - Original URI - Capture Datetime • Ongoing efforts: o Missing Link Proposal - http://mementoweb.org/missing-link/ o W3C Robustness and Archiving Community Group - http://www.w3.org/community/irobar/
  31. 31. Missing Link Proposal <a href="http://liarsociety.tripod.com/blog/index.blog?from=20041130" data-versionurl="https://archive.today/ElCHn" data-versiondate="2008-02-06T00:00:00Z"> where href carries the Original URI, data-versionurl the URI of Capture, and data-versiondate the Capture Datetime. Van de Sompel, H. et al. (2013) Thoughts on referencing, linking, reference rot http://mementoweb.org/missing-link/
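
To show that the annotation on slide 31 is machine-actionable, here is a minimal sketch that reads the three values back out of an annotated link using Python's standard `html.parser`. The attribute names `data-versionurl` and `data-versiondate` are taken from the slide; the class name `AnnotatedLinkParser`, the dictionary keys, and the assumption that the date always uses the `YYYY-MM-DDThh:mm:ssZ` form shown on the slide are illustrative only.

```python
from datetime import datetime, timezone
from html.parser import HTMLParser
from typing import Dict, List


class AnnotatedLinkParser(HTMLParser):
    """Collects <a> elements that carry capture annotations."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Dict[str, object]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        if "data-versionurl" not in attributes and "data-versiondate" not in attributes:
            return  # an ordinary, unannotated link
        raw_date = attributes.get("data-versiondate")
        capture_datetime = None
        if raw_date:
            # Assumption: the datetime is UTC in the form used on the slide.
            capture_datetime = datetime.strptime(
                raw_date, "%Y-%m-%dT%H:%M:%SZ"
            ).replace(tzinfo=timezone.utc)
        self.links.append(
            {
                "original_uri": attributes.get("href"),
                "capture_uri": attributes.get("data-versionurl"),
                "capture_datetime": capture_datetime,
            }
        )


EXAMPLE = (
    '<a href="http://liarsociety.tripod.com/blog/index.blog?from=20041130" '
    'data-versionurl="https://archive.today/ElCHn" '
    'data-versiondate="2008-02-06T00:00:00Z">blog</a>'
)

if __name__ == "__main__":
    parser = AnnotatedLinkParser()
    parser.feed(EXAMPLE)
    parser.close()
    print(parser.links)
```

A browser extension or link checker could use the same three fields to fall back from the capture URI to other archives when the original archive is unavailable.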
  32. 32. Addressing the Link/Reference Rot Challenge • Pockets of Persistence • Capture – Archive Pro-Actively, Selectively • Reference – Annotate Links • Access – Travel in Time
  33. 33. Memento Web Time Travel Use the Original URI Current version of http://law.georgetown.edu/library/404/ on October 22 2014
  34. 34. Memento Web Time Travel And a Datetime
  35. 35. Memento Web Time Travel To automatically retrieve the temporally nearest available capture Capture of http://law.georgetown.edu/library/404/ dated May 3 2014 http://wayback.archive-it.org/all/20140503094327/http://www.law.georgetown.edu/library/404/
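
Slides 33 to 35 describe time travel as "Original URI plus a Datetime yields the temporally nearest capture". In the Memento protocol (RFC 7089) the desired datetime is sent in an `Accept-Datetime` request header. The sketch below only builds that header and shows the nearest-capture selection on a local list of datetimes; it sends no request, and the helper names `accept_datetime_header` and `nearest_capture` are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict, Iterable


def accept_datetime_header(target: datetime) -> Dict[str, str]:
    """Build the Accept-Datetime header; `target` must be timezone-aware."""
    as_utc = target.astimezone(timezone.utc)
    # usegmt=True renders the zone as "GMT", the form used in HTTP dates.
    return {"Accept-Datetime": format_datetime(as_utc, usegmt=True)}


def nearest_capture(captures: Iterable[datetime], target: datetime) -> datetime:
    """Pick the capture datetime closest to `target`, before or after it."""
    return min(captures, key=lambda captured_at: abs(captured_at - target))


if __name__ == "__main__":
    wanted = datetime(2014, 10, 24, tzinfo=timezone.utc)
    available = [
        datetime(2012, 1, 1, tzinfo=timezone.utc),
        datetime(2014, 5, 3, 9, 43, 27, tzinfo=timezone.utc),
    ]
    print(accept_datetime_header(wanted))
    print(nearest_capture(available, wanted))
```

In a real deployment the nearest-capture choice is made by the archive or aggregator that receives the header, not by the client; the local `min` here just makes the selection rule explicit.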
  36. 36. Memento Web Time Travel http://bit.ly/memento-for-chrome http://mementoweb.org
  37. 37. Travel in Time - Persistently, Precisely, Seamlessly On-Demand Capture: Available; URI of Capture: Accessible, +; Original URI: -; Datetime T: - • Time Travel is: • Persistent – See next slide • Precise – Following link to URI of Capture retrieves exact capture • Seamless – Requires clicking a link as usual
  38. 38. Travel in Time - Persistently, Precisely, Seamlessly On-Demand Capture: Available; URI of Capture: Not Accessible, +; Original URI: -; Datetime T: - • Time Travel is: • Persistent – Following link to URI of Capture leads nowhere • Precise – Following link to URI of Capture leads nowhere • Seamless – Following link to URI of Capture leads nowhere
  39. 39. Travel in Time - Persistently, Precisely, Seamlessly On-Demand Capture: Available; URI of Capture: Not Accessible, +; Original URI: +; Datetime T: + • Time Travel is: • Persistent – Using Memento with [ Original URI ; Datetime ] works across web archives, versioning systems • Precise – Using Memento with [ Original URI ; Datetime ] retrieves nearest capture from other archive • Seamless – Requires browser plugin
  40. 40. Travel in Time - Persistently, Precisely, Seamlessly On-Demand Capture: Available; URI of Capture: Accessible, -; Original URI: +; Datetime T: + • Time Travel is: • Persistent – Using Memento with [ Original URI ; Datetime ] works across web archives, versioning systems • Precise – Using Memento with [ Original URI ; Datetime ] retrieves exact capture from other archive • Seamless – Requires browser plugin
  41. 41. Travel in Time - Persistently, Precisely, Seamlessly On-Demand Capture: Not Available; URI of Capture: -; Original URI: +; Datetime T: + • Time Travel is: • Persistent – Using Memento with [ Original URI ; Datetime ] works across web archives, versioning systems • Precise – Using Memento with [ Original URI ; Datetime ] retrieves nearest capture from other archive • Seamless – Requires browser plugin
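
The scenarios on slides 37 to 41 amount to a fallback rule: follow the URI of Capture when one exists and is reachable, otherwise fall back to Memento with [ Original URI ; Datetime ]. A minimal sketch of that decision follows. `AnnotatedLink`, `choose_access_path` and the injected `is_accessible` check are hypothetical names introduced for illustration; the reachability check is passed in as a function so the example performs no network access.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class AnnotatedLink:
    original_uri: str
    capture_uri: Optional[str]  # None when no on-demand capture was made
    capture_datetime: datetime


def choose_access_path(
    link: AnnotatedLink, is_accessible: Callable[[str], bool]
) -> Tuple[str, str]:
    """Return ("direct", capture URI) or ("memento", original URI)."""
    if link.capture_uri is not None and is_accessible(link.capture_uri):
        # Exact capture is reachable, so an ordinary click suffices.
        return ("direct", link.capture_uri)
    # Otherwise ask Memento infrastructure for the capture nearest to
    # link.capture_datetime, identified by the Original URI.
    return ("memento", link.original_uri)


if __name__ == "__main__":
    link = AnnotatedLink(
        original_uri="http://liarsociety.tripod.com/blog/index.blog?from=20041130",
        capture_uri="https://archive.today/ElCHn",
        capture_datetime=datetime(2008, 2, 6, tzinfo=timezone.utc),
    )
    print(choose_access_path(link, lambda uri: True))
    print(choose_access_path(link, lambda uri: False))
```

The second call models the case where the archive holding the capture has gone away: the link still resolves because the annotation preserved the Original URI and Datetime.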
  42. 42. Reference Captures and Annotate Links • Interoperability for time travel: o Memento protocol specifies interoperability across web archives, version management systems o Memento protocol is supported by major web archives o Need to work towards Memento support by version management systems o Need to work towards making Memento experience seamless through native browser support o Need to work towards robustness and sustainability of Memento infrastructure
  43. 43. Conclusion • Significant technical solutions, infrastructure, ideas exist to address the link rot/reference rot challenge • Mobilize the communities that care about this challenge to work towards joint, interoperable approaches
  44. 44. Creating Pockets of Persistence http://mementoweb.org http://hiberlink.org
