Memento Tracer: An Innovative Approach Towards Balancing Scale and Fidelity for Web Archiving

Presentation at RESAW "The Web That Was"
Amsterdam, NL, June 20, 2019

  1. Memento Tracer: An Innovative Approach Towards Balancing Scale and Fidelity for Web Archiving. Martin Klein, Los Alamos National Laboratory (@mart1nkle1n); Herbert Van de Sompel, DANS (@hvdsomp). The Web That Was, Amsterdam, NL, June 20, 2019
  2. Background: Scholarly Orphans Project. The Scholarly Orphans project is funded by the Andrew W. Mellon Foundation.
  3. Scholarly Orphans Team
     • Los Alamos National Laboratory: Lyudmila Balakireva, Martin Klein, James Powell, Harihar Shankar, Herbert Van de Sompel (now at DANS)
     • Old Dominion University: Sawood Alam, Grant Atkins (now at Mitre), Shawn Jones, Mat Kelly, Michael L. Nelson
  4. Research and Research Communication on the Web
     • Consideration: researchers deposit artifacts in web platforms
     • Status quo: these artifacts are not systematically archived
        • No frameworks like LOCKSS/Portico exist for these artifacts
        • Researchers deposit artifacts only selectively in portals that provide archival guarantees, in order to obtain a citable DOI
        • Researchers can't be expected to (also) upload all artifacts to institutional repositories (IRs)
        • Web archives archive these artifacts only incidentally; cf. anecdotal and Hiberlink project evidence
  5. Emma Schymanski: https://orcid.org/0000-0001-6868-8145 https://github.com/schymane https://www.slideshare.net/EmmaSchymanski https://figshare.com/authors/Emma_Schymanski/5087039 https://publons.com/author/1538491/emma-schymanski#profile https://www.eawag.ch/en/aboutus/portrait/organisation/staff/profile/emma-schymanski/
  6. Emma’s SlideShare Artifact: 0 Mementos. https://www.slideshare.net/EmmaSchymanski/dmcm2018-community-resources-connecting-chemistry-and-toxicity-knowledge (lookup: http://timetravel.mementoweb.org/)
  7. Shawn Jones: https://orcid.org/0000-0002-4372-870X http://www.shawnmjones.org/ https://github.com/shawnmjones https://www.slideshare.net/shawnmjones https://en.wikipedia.org/wiki/User:Shawnmjones https://www.blogger.com/profile/17827543974149663194
  8. Shawn’s GitHub Artifact: 1 Memento. https://github.com/shawnmjones/mediawiki (lookup: https://web.archive.org/web/*/https://github.com/shawnmjones/mediawiki)
  9. Hiberlink Evidence: web resources referenced in the Elsevier corpus (1996-2012) without a representative Memento in public web archives. Source: Martin Klein, Herbert Van de Sompel, et al. (2014). Scholarly context not found. PLOS ONE. https://doi.org/10.1371/journal.pone.0115253
  10. Scholarly Orphans Project: How can Scholarly Orphans be faithfully captured for long-term archiving?
  11. Web Archiving: Scale vs. Fidelity
  12. Web Archiving: Scale! https://twitter.com/brewster_kahle/status/1016003169589981184
  13. Web Archiving: Scale!! https://twitter.com/brewster_kahle/status/1118172506777509890
  14. Web Archiving: Scale!!! https://twitter.com/brewster_kahle/status/1139700494748663809
  15. Web Archiving: Fidelity? https://ws-dl.blogspot.com/2017/01/2017-01-20-cnncom-has-been-unarchivable.html http://web.archive.org/web/*/http://cnn.com
  16. Web Archiving: Fidelity!! https://twitter.com/ianmilligan1/status/1136703505442324481 https://twitter.com/MellonFdn/status/1138811967060267011
  17. Web Archiving: Scale? https://twitter.com/mart1nkle1n/status/1136705116738904067
  18. Resource Boundary: https://www.slideshare.net/hvdsomp/paul-evan-peters-lecture
  19. Resource Boundary: https://github.com/mementoweb/memento_extensions
  20. Memento Tracer Framework (http://tracer.mementoweb.org). Inspired by:
     • LOCKSS
        • Same automated approach for all resources of a class
     • Webrecorder
        • Manual recording of web resources
     • Various attempts aimed at automating interactions/behaviors
        • E.g., Brozzler, Browsertrix
  21. Memento Tracer Framework: http://tracer.mementoweb.org
  22. Memento Tracer DEMO: http://tracer.mementoweb.org
  23. Current Memento Tracer Capabilities
     • Single clicks/links
     • All links in an area
     • Repeated clicks on links, with a stop condition
        • Slides
        • Pagination
     • Nested traces, i.e., “trace in a trace”
        • Trace for portal A → follow link to portal B → execute trace for portal B
     • Identification, by URI (pattern), of the page/portal for which a trace exists
  24. Memento Tracer Benefits
     • Scalability
        • A trace created once is applicable to all web resources of the same class
        • Traces are shared via a repository (edits, versioning)
     • Quality
        • A trace is used as a set of instructions for a browser-based capture framework
        • The resource boundary is explicit
     • Tradeoff
        • Quality vs. performance
  25. Memento Tracer Challenges
     • Memento Tracer:
        • Language used to express traces (interoperability)
        • Organization of the shared repository for traces
        • Limitations of the browser event listener approach for recording traces
        • Selection of a trace for capturing a web publication by means other than a URI pattern
     • Legal constraints
  26-29. myresearch.institute - Pilot. For more details and statistics, see our 2019 CNI Spring meeting slides: https://www.slideshare.net/martinklein0815/an-institutional-perspective-to-rescue-scholarly-orphans
  30. Memento Tracer: An Innovative Approach Towards Balancing Scale and Fidelity for Web Archiving. Martin Klein, Los Alamos National Laboratory (@mart1nkle1n); Herbert Van de Sompel, DANS (@hvdsomp). The Scholarly Orphans project is funded by the Andrew W. Mellon Foundation.
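Slides 6 and 8 report how many Mementos exist for an artifact, looked up via TimeTravel and the Wayback Machine. Under the Memento protocol (RFC 7089), an archive or aggregator can expose a TimeMap for a URI, serialized in application/link-format, with one link per Memento. The sketch below is one way to reproduce such a count, assuming the TimeMap text has already been fetched; the sample TimeMap is hypothetical (example.org/example.net URIs), no archive is contacted, and this is not the code TimeTravel itself runs.

```python
import re
from email.utils import parsedate_to_datetime

# Hypothetical TimeMap in application/link-format (RFC 7089); all URIs are examples.
SAMPLE_TIMEMAP = """<http://a.example.org/>;rel="original",
<http://arxiv.example.net/timemap/http://a.example.org/>
  ; rel="self";type="application/link-format",
<http://arxiv.example.net/timegate/http://a.example.org/>
  ; rel="timegate",
<http://arxiv.example.net/web/20000620180259/http://a.example.org/>
  ; rel="first memento";datetime="Tue, 20 Jun 2000 18:02:59 GMT",
<http://arxiv.example.net/web/20091027204954/http://a.example.org/>
  ; rel="last memento";datetime="Tue, 27 Oct 2009 20:49:54 GMT"
"""

# Each link is "<URI>" followed by its ;-separated parameters up to the next "<".
# Simplifying assumption: no parameter value contains a "<" character.
LINK_RE = re.compile(r"<([^>]+)>([^<]*)")
REL_RE = re.compile(r'\brel="([^"]*)"')
DATETIME_RE = re.compile(r'\bdatetime="([^"]*)"')


def list_mementos(timemap):
    """Return (URI-M, datetime or None) for every link whose rel includes "memento"."""
    mementos = []
    for match in LINK_RE.finditer(timemap):
        uri, params = match.group(1), match.group(2)
        rel = REL_RE.search(params)
        # rel can hold several space-separated values, e.g. "first memento".
        if rel is None or "memento" not in rel.group(1).split():
            continue
        dt = DATETIME_RE.search(params)
        mementos.append((uri, parsedate_to_datetime(dt.group(1)) if dt else None))
    return mementos


if __name__ == "__main__":
    found = list_mementos(SAMPLE_TIMEMAP)
    print(f"{len(found)} mementos")  # 2 mementos for this sample
    for uri_m, when in found:
        print(when.isoformat() if when else "unknown", uri_m)
```

Matching on the individual rel tokens, rather than on the exact string "memento", matters because RFC 7089 combines relation types such as "first memento" and "last memento" on a single link.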

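Slide 23 lists the kinds of interaction a trace can record. To make three of them concrete (single click, all links in an area, repeated click with a stop condition), here is a minimal, hypothetical sketch of replaying a trace. It runs against an in-memory FakePage instead of a real browser; the class names, CSS-like selectors, and "#slide=" URIs are invented for illustration and do not reflect the actual Memento Tracer trace format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass
class FakePage:
    """Hypothetical stand-in for a browser tab: selector -> URIs it links to."""
    uri: str
    links: Dict[str, List[str]]
    slide: int = 1
    slide_count: int = 1


@dataclass
class Click:
    """Single click: follow the first link matching the selector."""
    selector: str


@dataclass
class ClickAll:
    """All links in an area: follow every link matching the selector."""
    selector: str


@dataclass
class RepeatClick:
    """Repeated click (here a simulated "next slide") until the stop condition holds."""
    stop: Callable[[FakePage], bool]
    max_clicks: int = 500  # safety bound in case the stop condition never fires


Action = Union[Click, ClickAll, RepeatClick]


def run_trace(page: FakePage, trace: List[Action]) -> List[str]:
    """Replay a trace against the fake page; return the (hypothetical) URIs visited."""
    captured = [page.uri]
    for action in trace:
        if isinstance(action, Click):
            captured.extend(page.links.get(action.selector, [])[:1])
        elif isinstance(action, ClickAll):
            captured.extend(page.links.get(action.selector, []))
        elif isinstance(action, RepeatClick):
            clicks = 0
            while not action.stop(page) and clicks < action.max_clicks:
                page.slide += 1  # simulated click on a "next slide" control
                clicks += 1
                captured.append(f"{page.uri}#slide={page.slide}")
    return captured


if __name__ == "__main__":
    deck = FakePage(
        uri="https://slides.example.org/deck/42",
        links={
            "a.download": ["https://slides.example.org/deck/42.pdf"],
            "div.related a": ["https://slides.example.org/deck/7",
                              "https://slides.example.org/deck/9"],
        },
        slide_count=3,
    )
    trace = [
        Click("a.download"),
        ClickAll("div.related a"),
        RepeatClick(stop=lambda p: p.slide >= p.slide_count),
    ]
    for uri in run_trace(deck, trace):
        print(uri)
```

The max_clicks bound is a design choice of this sketch: a stop condition that never fires (for example, a "next" control that is never disabled) would otherwise loop forever.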
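Slides 23 and 24 note three related points: a trace is identified by URI pattern, a trace created once applies to every web resource of the same class, and traces can be nested. A minimal sketch of these ideas, assuming a hypothetical in-memory repository of regular expressions, follows. The patterns and trace ids are illustrative, not the traces shared in the Memento Tracer repository. The outlinks mapping also stands in for the links each trace would actually follow in a browser.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical trace repository: each entry maps a URI pattern to a trace id.
# One trace covers a whole *class* of resources (e.g. every SlideShare deck).
TRACE_REPOSITORY: List[Tuple[re.Pattern, str]] = [
    (re.compile(r"^https://www\.slideshare\.net/[^/]+/[^/]+$"), "slideshare-presentation"),
    (re.compile(r"^https://github\.com/[^/]+/[^/]+/?$"), "github-repository"),
]


def select_trace(uri: str) -> Optional[str]:
    """Return the id of the first trace whose pattern matches uri, else None."""
    for pattern, trace_id in TRACE_REPOSITORY:
        if pattern.match(uri):
            return trace_id
    return None


def plan_nested(uri: str, outlinks: Dict[str, List[str]],
                seen: Optional[Set[str]] = None) -> List[Tuple[str, str]]:
    """Plan a "trace in a trace": when the trace for one portal reaches a link
    into another portal that has its own trace, that trace is executed too."""
    seen = set() if seen is None else seen
    trace_id = select_trace(uri)
    if trace_id is None or uri in seen:
        return []  # no trace for this URI, or already planned: stop here
    seen.add(uri)
    plan = [(uri, trace_id)]
    for target in outlinks.get(uri, []):
        plan.extend(plan_nested(target, outlinks, seen))
    return plan


if __name__ == "__main__":
    deck = "https://www.slideshare.net/example-user/example-deck"
    repo = "https://github.com/example-user/example-repo"
    outlinks = {deck: [repo, "https://example.org/no-trace-here"]}
    for uri, trace_id in plan_nested(deck, outlinks):
        print(trace_id, uri)
```

With these example patterns, https://github.com/shawnmjones/mediawiki selects the github-repository trace. The profile page https://github.com/schymane selects no trace at all. In other words, the trace is chosen per class of resource rather than per URI.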