Prototypes of pro-active approaches to support the archiving of web references for scholarly communications

Delivered by Richard Wincewicz at Open Repositories 2015 (OR2015), Indianapolis, IN, USA, June 2015.

An introduction to "reference rot" (link rot and content drift), the evidence for the extent of the problem, and the remedies proposed by the Hiberlink project.

  1. Prototypes of pro-active approaches to support the archiving of web references for scholarly communications. Richard Wincewicz¹, Peter Burnhill¹ & Herbert Van de Sompel². ¹EDINA, University of Edinburgh; ²Los Alamos National Laboratory
  2. The Project Team, 2013–2015, funded by the Andrew W. Mellon Foundation
      • Los Alamos National Laboratory, Research Library: Herbert Van de Sompel, Harihar Shankar, [Martin Klein, Rob Sanderson]
      • University of Edinburgh, Language Technology Group: Claire Grover, Beatrice Alex, Colin Matheson, Richard Tobin, [Ke “Adam” Zhou]
      • University of Edinburgh, EDINA (Centre for Service Delivery & Digital Expertise): Peter Burnhill, Muriel Mewissen (Project Manager), Tim Stickland, Richard Wincewicz, [Neil Mayo]
  3. Overview: 1. Introduction; 2. Evidence; 3. Remedy
  4. Part 1: Introduction
  5. Reference Rot
      Links to web-at-large resources are subject to reference rot, which combines two factors:
      • Link rot: the link stops working, e.g. HTTP 404 “Not Found”.
      • Content drift: the linked content changes over time, possibly to the extent that it no longer represents the content that was originally referenced.
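Slide 5 names two failure modes, and a script can tell them apart. The sketch below is a minimal illustration, not Hiberlink's method. It assumes three things are already available: the final HTTP status of the reference (after redirects), the text of the page as originally referenced, and the text of the page now. It uses `difflib.SequenceMatcher` with a hypothetical similarity threshold of 0.8 as a crude stand-in for a proper content-drift measure.

```python
from difflib import SequenceMatcher
from typing import Optional


def classify_reference(status: Optional[int], referenced_text: str,
                       current_text: str, threshold: float = 0.8) -> str:
    """Label a web reference as 'link rot', 'content drift' or 'intact'.

    status is the final HTTP status after following redirects, or None if
    the request failed outright (DNS failure, timeout). threshold is a
    hypothetical cut-off chosen for illustration only.
    """
    # Link rot: the URI no longer yields a usable representation.
    if status is None or status >= 400:
        return "link rot"
    # Content drift: the URI still resolves, but its text has diverged from
    # what was referenced. ratio() returns a similarity between 0 and 1.
    similarity = SequenceMatcher(None, referenced_text, current_text).ratio()
    if similarity < threshold:
        return "content drift"
    return "intact"


print(classify_reference(404, "Call for papers", ""))                 # link rot
print(classify_reference(200, "Call for papers", "Call for papers"))  # intact
```

Consider a "soft 404", where a server answers 200 with an error page. This sketch reports it only as content drift, which is one reason status codes alone understate reference rot.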
  6. Part 2: Evidence
  7. Articles that link to articles and to web-at-large resources (PMC) [figure]. Source: Martin Klein et al. (2014) Scholarly context not found. In: PLOS ONE. http://dx.doi.org/10.1371/journal.pone.0115253
  8. Articles that link to articles and to web-at-large resources (Elsevier) [figure]. Source: Martin Klein et al. (2014) Scholarly context not found. In: PLOS ONE. http://dx.doi.org/10.1371/journal.pone.0115253
  9. Articles with URI References (PMC)
      Articles                                  479,194
        with URI references                     399,005
        with URI references to articles         240,857
        with URI references to web at large     156,160
      Source: Martin Klein et al. (2014) Scholarly context not found. In: PLOS ONE. http://dx.doi.org/10.1371/journal.pone.0115253
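A little arithmetic turns the counts on slide 9 into shares. The sketch below reuses only the four numbers from the slide. The helper name `shares` is hypothetical.

```python
# Counts for PMC articles, copied from slide 9 (Klein et al., 2014).
PMC_COUNTS = {
    "articles": 479_194,
    "with URI references": 399_005,
    "with URI references to articles": 240_857,
    "with URI references to web at large": 156_160,
}


def shares(counts: dict, base: str) -> dict:
    """Express every count as a fraction of counts[base]."""
    return {label: n / counts[base] for label, n in counts.items()}


for label, share in shares(PMC_COUNTS, "articles").items():
    print(f"{label:<38}{share:7.1%}")
```

Relative to all PMC articles, roughly 83.3% contain URI references and 32.6% reference web-at-large resources. Using `base="with URI references"` instead, the web-at-large share is about 39.1% of articles that cite any URI at all.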
  10. Link rot (PMC) [figure]. Source: Martin Klein et al. (2014) Scholarly context not found. In: PLOS ONE. http://dx.doi.org/10.1371/journal.pone.0115253
  11. Link rot (Elsevier) [figure]. Source: Martin Klein et al. (2014) Scholarly context not found. In: PLOS ONE. http://dx.doi.org/10.1371/journal.pone.0115253
  12. Links from arXiv, Elsevier and PMC articles, by top-level domain (TLD) of the target [figure]. Source: Martin Klein et al. (2014) Scholarly context not found. In: PLOS ONE. http://dx.doi.org/10.1371/journal.pone.0115253
  13. Link rot: referenced content not accessible (shown in grey) [figure]. Source: Martin Klein et al. (2014) Scholarly context not found. In: PLOS ONE. http://dx.doi.org/10.1371/journal.pone.0115253
  14. Not archived: referenced content lost (shown in grey) [figure]. Source: Martin Klein et al. (2014) Scholarly context not found. In: PLOS ONE. http://dx.doi.org/10.1371/journal.pone.0115253
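Slide 14 counts references whose content is lost because no suitable snapshot exists. The sketch below shows one way to decide whether a reference counts as "archived". It rests on two assumptions that do not come from the slides:

- the snapshot datetimes for a URI have already been retrieved, for example from a Memento TimeMap;
- a snapshot counts only if it falls within a hypothetical window of 14 days either side of the article's publication date.

```python
from datetime import datetime, timedelta
from typing import Iterable


def archived_near(publication: datetime, snapshot_datetimes: Iterable[datetime],
                  window: timedelta = timedelta(days=14)) -> bool:
    """True if any snapshot lies within +/- window of the publication date.

    The default window is a hypothetical choice for illustration.
    """
    return any(abs(s - publication) <= window for s in snapshot_datetimes)


published = datetime(2014, 8, 15)
print(archived_near(published, [datetime(2014, 8, 20)]))  # True
print(archived_near(published, [datetime(2012, 1, 1)]))   # False
```

Widening the window trades representativeness for coverage: a capture made years away from publication may already show drifted content.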
  15. Content Drift: snapshots of http://dl00.org from 2000, 2004, 2005 and 2008 [figure], showing (a) dynamic content values on a web page changing over time and (b) static content replaced by very different, often unrelated, web pages
  16. Part 3: Remedy
  17. Create Snapshots of Referenced Resources
      Various web archives support on-demand creation of snapshots of URIs (manually or via an API):
      • archive.today
      • Internet Archive
      • perma.cc
      • webcitation.org
      When creating snapshots, record:
      • the original URI
      • the snapshot URI
      • the date/time of the snapshot
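The three items that slide 17 says to record fit naturally in one record. The sketch below rests on two assumptions, and no request is actually sent:

- on-demand capture uses the Internet Archive's public Save Page Now endpoint (`https://web.archive.org/save/` followed by the URI);
- snapshot URIs follow the Wayback Machine pattern, which embeds a 14-digit UTC timestamp.

`SnapshotRecord` and the helper names are hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Wayback snapshot URI: /web/<14-digit UTC timestamp>/<original URI>.
# URIs with modifiers after the timestamp (e.g. "id_") are not handled.
WAYBACK_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


@dataclass(frozen=True)
class SnapshotRecord:
    original_uri: str
    snapshot_uri: str
    snapshot_datetime: datetime  # naive, but understood as UTC


def save_page_now_url(original_uri: str) -> str:
    """URL that asks the Internet Archive to capture original_uri (assumed endpoint)."""
    return "https://web.archive.org/save/" + original_uri


def record_from_wayback(snapshot_uri: str) -> SnapshotRecord:
    """Recover original URI and snapshot date/time from a Wayback snapshot URI."""
    match = WAYBACK_RE.match(snapshot_uri)
    if match is None:
        raise ValueError(f"not a Wayback snapshot URI: {snapshot_uri}")
    timestamp, original = match.groups()
    return SnapshotRecord(
        original_uri=original,
        snapshot_uri=snapshot_uri,
        snapshot_datetime=datetime.strptime(timestamp, "%Y%m%d%H%M%S"),
    )


record = record_from_wayback(
    "https://web.archive.org/web/20140815103000/http://www.stanford.edu")
print(record.original_uri)       # http://www.stanford.edu
print(record.snapshot_datetime)  # 2014-08-15 10:30:00
```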
  18. Create Snapshots of Referenced Resources (continued)
      Snapshots can be created at various stages. The closer to the moment of referencing, the better the snapshot captured.
      Stage               Actor                             Snapshot quality
      Preparation         Author / reference tool           best
      Submission / issue  Editor / manuscript system        good
      Publication         Aggregator / publisher platform   ok
      Post-publication    Librarian / IR, journal archive   better than nothing
  19. Authoring: Zotero Plugin Demonstrator. Video: Richard Wincewicz (2014) Prototype Hiberlink plugin for Zotero for pro-active archiving and temporal references. https://www.youtube.com/v/ZYmi_Ydr65M%26vq
  20–23. Publication: OJS (Open Journal Systems)
  24. Publication: HiberActive Service Demonstrator. Martin Klein et al. (2014) HiberActive: Pro-Active Archiving of web references from scholarly articles. Open Repositories 2014. http://www.slideshare.net/martinklein0815/hiberactive
  25. Reference Resources Robustly
      When referencing resources, include:
      • Original URI: allows the user to revisit the URI as it is at the time of reading, if the URI is still operational.
      • Snapshot URI: allows the user to visit the snapshot, if one was created and if the web archive in which it was created is still operational.
      • Date/Time: together with the original URI, allows the user to visit any snapshot created around that date/time in any web archive around the world (using the Memento infrastructure).
      Source: Robust Links - Motivation (2015). http://robustlinks.mementoweb.org/about/
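The Date/Time item on slide 25 relies on Memento (RFC 7089). Memento lets a client ask a TimeGate for the snapshot of an original URI closest to a given datetime, using the `Accept-Datetime` request header. The sketch below only builds such a request with the standard library.

It assumes the Wayback Machine's TimeGate at `https://web.archive.org/web/` followed by the original URI. A Memento aggregator's TimeGate could be substituted to search across many archives.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

# Assumption: the Wayback Machine acts as a Memento TimeGate at this prefix.
TIMEGATE_PREFIX = "https://web.archive.org/web/"


def timegate_request(original_uri: str, when: datetime) -> Request:
    """Build (but do not send) a Memento datetime-negotiation request."""
    if when.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    # Accept-Datetime takes an HTTP-date, which is always expressed in GMT.
    accept_datetime = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(TIMEGATE_PREFIX + original_uri,
                   headers={"Accept-Datetime": accept_datetime})


req = timegate_request("http://www.stanford.edu",
                       datetime(2014, 8, 15, tzinfo=timezone.utc))
print(req.full_url)  # https://web.archive.org/web/http://www.stanford.edu
# urllib stores header names via str.capitalize(), hence "Accept-datetime".
print(req.get_header("Accept-datetime"))  # Fri, 15 Aug 2014 00:00:00 GMT
```

Passing `req` to `urllib.request.urlopen` would follow the TimeGate's redirect to the selected snapshot. That network step is deliberately left out.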
  26. Reference Resources Actionably
      When referencing resources, use Link Decorations to convey the original URI, the snapshot URI and the date/time:
      • href is the original URI, decorated with the snapshot URI and date:
        <a href="http://www.stanford.edu" data-versionurl="http://archive.is/FAy6o" data-versiondate="2014-08-15">
      • href is the original URI, decorated with the date only:
        <a href="http://www.stanford.edu" data-versiondate="2014-08-15">
      • href is the snapshot URI, decorated with the original URI and date:
        <a href="http://archive.is/FAy6o" data-originalurl="http://www.stanford.edu" data-versiondate="2014-08-15">
      Source: Herbert Van de Sompel et al. (2015) Robust Links - Link Decorations. http://robustlinks.mementoweb.org/spec/
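Generating Robust Links decorations is mostly string assembly plus escaping. The sketch below is a hypothetical helper, not part of robustlinks.js. When `href` is the original URI, it emits the `data-versionurl`/`data-versiondate` form. When `href` is the snapshot, it emits the `data-originalurl` form.

```python
from html import escape
from typing import Optional


def decorated_link(original_uri: str, version_date: str, text: str,
                   version_uri: Optional[str] = None,
                   href_is_original: bool = True) -> str:
    """Return an <a> element carrying Robust Links decorations."""
    if href_is_original:
        attrs = {"href": original_uri}
        if version_uri:
            attrs["data-versionurl"] = version_uri
    else:
        if not version_uri:
            raise ValueError("href_is_original=False needs a version_uri")
        attrs = {"href": version_uri, "data-originalurl": original_uri}
    attrs["data-versiondate"] = version_date
    # escape() also encodes quotes, so values are safe inside "..." attributes.
    rendered = " ".join(f'{name}="{escape(value)}"' for name, value in attrs.items())
    return f"<a {rendered}>{escape(text)}</a>"


print(decorated_link("http://www.stanford.edu", "2014-08-15", "Stanford",
                     version_uri="http://archive.is/FAy6o"))
# <a href="http://www.stanford.edu" data-versionurl="http://archive.is/FAy6o" data-versiondate="2014-08-15">Stanford</a>
```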
  27. Robust Links: Using Link Decorations, JavaScript and the Memento API
      • Demo: http://robustlinks.mementoweb.org/demo/uri_references_js.html
      • robustlinks.js: https://github.com/mementoweb/robustlinks
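robustlinks.js reads Link Decorations in the browser. The sketch below shows that reading step in Python, using the standard-library `html.parser`. It is not a port of robustlinks.js, and the `DecorationCollector` class is hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class DecorationCollector(HTMLParser):
    """Collect Robust Links decorations from <a> elements (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Dict[str, Optional[str]]] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names and unescapes values.
        if tag != "a":
            return
        a = dict(attrs)
        if "data-versiondate" not in a:
            return  # undecorated link: nothing to make robust
        if "data-originalurl" in a:
            # href points at the snapshot; the original URI is in the decoration.
            original, version = a["data-originalurl"], a.get("href")
        else:
            # href points at the original; the snapshot URI (if any) is decorated.
            original, version = a.get("href"), a.get("data-versionurl")
        self.links.append({"original": original, "version": version,
                           "date": a["data-versiondate"]})


collector = DecorationCollector()
collector.feed('<a href="http://www.stanford.edu" data-versiondate="2014-08-15">Stanford</a>')
print(collector.links)
# [{'original': 'http://www.stanford.edu', 'version': None, 'date': '2014-08-15'}]
```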
  28. Activate Robust Links
      There are currently no Link Decorations, but there is an article publication date:
      • Express the article publication date in an actionable manner (Schema.org 'datePublished' or 'dateModified' properties) in HTML pages that contain URI references
      • Tailor robustlinks.js to exclude links to articles
      • Inject robustlinks.js into HTML pages that contain URI references
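The first two steps on slide 28 can be sketched on the server side. Below, the publication date is expressed as Schema.org `datePublished` microdata. Links to articles are filtered out with a hypothetical host-based heuristic that treats only DOI resolver links as article links. A real deployment would need a fuller notion of "link to an article", and injecting robustlinks.js is not shown.

```python
from html import escape
from typing import Iterable, List
from urllib.parse import urlsplit

# Hypothetical heuristic: links that go through a DOI resolver are treated
# as links to articles and are left out of Robust Links processing.
ARTICLE_HOSTS = {"doi.org", "dx.doi.org"}


def date_published_meta(iso_date: str) -> str:
    """Schema.org microdata stating the article publication date.

    In microdata this element must sit inside an itemscope element whose
    itemtype is a CreativeWork type such as ScholarlyArticle.
    """
    return f'<meta itemprop="datePublished" content="{escape(iso_date)}">'


def links_to_activate(uris: Iterable[str]) -> List[str]:
    """Keep web-at-large URIs; drop those that point at articles."""
    return [u for u in uris if urlsplit(u).hostname not in ARTICLE_HOSTS]


print(links_to_activate(["http://dx.doi.org/10.1371/journal.pone.0115253",
                         "http://www.stanford.edu"]))
# ['http://www.stanford.edu']
```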
  29. Users Follow Robust Links into Web Archives
      The combination of the referenced URI and the article publication date:
      • leads users to a snapshot in a web archive, created as close as possible to the article publication date
      • addresses link rot
      • addresses content drift
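Slide 29 depends on choosing the snapshot closest to the publication date. A Memento TimeGate normally makes that choice server-side. The sketch below only shows the selection rule, applied to a hypothetical list of `(datetime, snapshot URI)` pairs.

```python
from datetime import datetime
from typing import List, Optional, Tuple

Memento = Tuple[datetime, str]  # (snapshot datetime, snapshot URI)


def closest_memento(mementos: List[Memento], target: datetime) -> Optional[Memento]:
    """Return the snapshot whose datetime is nearest to target (None if empty)."""
    if not mementos:
        return None
    # Captures before and after the target are treated alike; on a tie,
    # min() keeps the first candidate in list order.
    return min(mementos, key=lambda m: abs(m[0] - target))
```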
  30. Create Archive Copies
      When ingesting new content into the platform:
      • parse it for URI references
      • create snapshots of selected URIs in web archives
      • for these URIs, use Link Decorations in the HTML to convey the original URI, the snapshot URI and the snapshot date/time
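The first step on slide 30 is parsing new content for URI references. For plain text, the simple regular expression below could serve as a starting point. This is a hypothetical sketch: references in JATS XML, HTML or PDF would need format-aware extraction. Its trailing-punctuation rule will also occasionally clip a legitimate closing parenthesis. The resulting list is what the snapshot and decoration steps would consume.

```python
import re
from typing import List

# Simple pattern for http(s) URIs in running text.
URI_RE = re.compile(r"https?://[^\s<>\"]+")
# Punctuation that usually ends a sentence or bracket rather than a URI.
TRAILING = ".,;:)]'"


def extract_uris(text: str) -> List[str]:
    """Return distinct http(s) URIs in order of first appearance."""
    found: List[str] = []
    for match in URI_RE.finditer(text):
        uri = match.group(0).rstrip(TRAILING)
        if uri not in found:
            found.append(uri)
    return found


print(extract_uris("See http://www.stanford.edu. Also (http://dl00.org) "
                   "and http://www.stanford.edu again."))
# ['http://www.stanford.edu', 'http://dl00.org']
```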
  31. Users Follow Robust Links into Web Archives
      The Link Decorations:
      • lead users to the created snapshot, if the web archive is operational
      • lead users to a snapshot in any web archive, created as close as possible to the snapshot date/time
      • address link rot
      • address content drift
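The order of preference on slide 31 can be written as a small decision function. In this sketch the liveness check is passed in as a callable so that it can be stubbed. The fallback assumes the Memento Time Travel service's list URL pattern, `http://timetravel.mementoweb.org/list/<YYYYMMDD>/<original URI>`. Treat both the helper and that URL pattern as assumptions.

```python
from typing import Callable, Optional

# Assumed URL pattern of a Memento aggregator that searches many web archives.
TIMETRAVEL_LIST = "http://timetravel.mementoweb.org/list/"


def resolve_robust_link(original_uri: str, version_date: str,
                        version_uri: Optional[str],
                        archive_is_up: Callable[[str], bool]) -> str:
    """Pick where a reader following a decorated link should be sent."""
    # 1. The snapshot that was created, if its web archive still answers.
    if version_uri and archive_is_up(version_uri):
        return version_uri
    # 2. Otherwise, snapshots of the original URI in any archive around the
    #    version date (YYYY-MM-DD becomes YYYYMMDD).
    return TIMETRAVEL_LIST + version_date.replace("-", "") + "/" + original_uri
```

The original URI remains available as a separate option for readers who want the page as it is today.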
  32. Hiberlink: http://hiberlink.org #hiberlink
