
Actions to Ensure the Integrity and Continuity of the Scholarly Record


Overview of the problems of Reference Rot and what actions to take to ensure the persistence of the digital scholarly record. Presented by Peter Burnhill with Adam Rusbridge & Muriel Mewissen, EDINA, University of Edinburgh, UK; Herbert Van De Sompel, Los Alamos National Laboratory Research Library, USA; Gaelle Bequet, ISSN International Centre, France; at Towards Open Science, LIBER, London, June 2015.



  1. 1. Actions to Ensure the Integrity and Continuity of the Scholarly Record Peter Burnhill with Adam Rusbridge & Muriel Mewissen, EDINA, University of Edinburgh, UK Herbert Van De Sompel, Los Alamos National Laboratory Research Library, USA Gaelle Bequet, ISSN International Centre, France 09:40 – 10:00 Towards Open Science, LIBER, London, June 2015
  2. 2. Overview 1. The Scholarly (& Cultural) Record 2. Threat to the Continuity of the Scholarly (& Cultural) Record 3. Threat to the Integrity of our Scholarly Record 4. Ensuring the Integrity & Continuity of the Scholarly Record 5. Actions to Ensure the Integrity & Continuity of the Scholarly Record – Keywords: Stewardship, Collection, Cooperation, Advocacy, Spend
  3. 3. 1. The (digital) Scholarly Record “The Scholarly Record has a fuzzy edge” ‘e-journals’, ‘book-length work’, conference proceedings, ‘data as findings’, new ‘research objects’
  4. 4. 1. The (digital) Scholarly Record “The Scholarly Record has a fuzzy edge” (resources needed for scholarship) ‘e-journals’, ‘book-length work’, conference proceedings, ‘data as findings’, new ‘research objects’; Websites, Databases, Repositories, ‘Gov Docs’, ‘e-magazines’, ‘e-newsmedia’
  5. 5. Online Continuing Resources ISSN ‘The (published) Scholarly Record’ ‘resources needed for scholarship’ Issued in Parts (Serials) Content changes over time (Integrating) ‘e-journals’ Websites, Databases, Repositories ‘Book-length work’ ‘Gov Docs’ Focus on what is published and that content that is issued online as a ‘continuing resource’ Conference proceedings ‘e-magazines’ ‘e-newsmedia’ New ‘research objects’
  6. 6. 2. Threat to Continuity of the Scholarly (& Cultural) Record Our Shared Task is to ensure researchers, students & their teachers have ease of access and continuing access to online resources needed for open scholarship: licence to use, access to content & tools
  7. 7. what was once available in print, on-shelf locally … … is now online & accessed remotely, ‘anytime/anywhere’ We’ve seen improved Ease of Access… But what of Continuity of Access? (this is mostly due to publishers)
  8. 8. Digital back copy is not in the custody of libraries Picture credit: http://somanybooksblog.com/2009/03/27/library-tour/
  9. 9. Digital back copy is not in the custody of libraries Picture credit: http://somanybooksblog.com/2009/03/27/library-tour/ Libraries boast of ‘e-collections’, but do they only have ‘e-connections’?
  10. 10. access to content & services We need to have some digital shelving for the Record
  11. 11. Ensuring ease and continuing access to the digital back copy access to content & services We need to have some digital shelving for the Record
  12. 12. Emergence of Keepers of digital content ① Web-scale not-for-profit archiving agencies: ② National libraries … ③ Research libraries: consortia & specialist centres … National Science Library, Chinese Academy of Sciences
  13. 13. access to content & services 3. Threat to Integrity of our Scholarly Record
  14. 14. Uncovering the Threat of Reference Rot “when links to web resources no longer point to what was intended” This is a combination of two factors:
  15. 15. Link Rot: the link stops working
  16. 16. + Content Drift: What is at end of URI has changed, or gone! http://dl00.org 2000 http://dl00.org 2004 http://dl00.org 2005 http://dl00.org 2008 (a) Dynamic content as values on webpage changes over time (b) Static content but very different (often unrelated) web pages
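The two factors on slides 15 and 16 can be told apart mechanically. The sketch below is illustrative only and is not the method used in the Hiberlink study: it assumes a fingerprint of the page was stored when it was cited, and it treats any change at all as drift, which is far cruder than a real similarity measure. The names `LinkState`, `digest` and `classify` are hypothetical.

```python
import hashlib
from enum import Enum
from typing import Optional


class LinkState(Enum):
    OK = "ok"
    LINK_ROT = "link rot"            # the link stops working
    CONTENT_DRIFT = "content drift"  # the link works, but the content has changed


def digest(body: bytes) -> str:
    """Exact-match fingerprint of a page body (assumption: stored at citation time)."""
    return hashlib.sha256(body).hexdigest()


def classify(status: Optional[int], cited_digest: str,
             live_body: Optional[bytes]) -> LinkState:
    """Classify one referenced URI from a single fetch attempt.

    status is the HTTP status code, or None if the request failed outright.
    """
    if status is None or status >= 400 or live_body is None:
        return LinkState.LINK_ROT
    if digest(live_body) != cited_digest:
        return LinkState.CONTENT_DRIFT
    return LinkState.OK
```

A production check would need a tolerance for trivial changes (adverts, timestamps), since dynamic pages change on every fetch.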
  17. 17. http://hiberlink.org/ Project 2 years: March 2013 to June 2015 Funder Andrew W. Mellon Foundation Partners University of Edinburgh EDINA Peter Burnhill, Muriel Mewissen, Richard Wincewicz, Paul Walk, Tim Stickland, [Christine Rees] Language Technology Group, Informatics Claire Grover, Beatrice Alex, Richard Tobin, Colin Matheson, [Ke “Adam” Zhou] Los Alamos National Laboratory Research Library Herbert Van de Sompel, Harihar Shankar, [Martin Klein, Rob Sanderson]
  18. 18. Scholarly Articles increasingly link to Web Resources, not just back to other Articles
  19. 19. References in Web-Based Scholarly Communication To Scholarly Resources To Web at Large Resources Link Rot DOI, HTTP version of DOI ‘Web today, gone tomorrow’ Content Decay Has ‘fixity’ How to add fixity to the dynamic Archiving: CLOCKSS, Portico, LOCKSS, etc, as per Keepers Registry … Focus for Hiberlink
  20. 20. Findings: Status of Referenced URIs, PMC corpus Klein M, Van de Sompel H, Sanderson R, Shankar H, Balakireva L, et al. (2014) Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot. PLoS ONE 9(12): e115253. doi:10.1371/journal.pone.0115253 https://doi.org/10.1371/journal.pone.0115253 6 publicly accessible web archives for lookup: Internet Archive, archive.is (archive.today), Archive-It, BL Web Archive, UK National Archives Web Archive & Icelandic National Archive
  21. 21. Findings: Status of Referenced URIs, Elsevier corpus Klein M, Van de Sompel H, Sanderson R, Shankar H, Balakireva L, et al. (2014) Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot. PLoS ONE 9(12): e115253. doi:10.1371/journal.pone.0115253 https://doi.org/10.1371/journal.pone.0115253 6 publicly accessible web archives for lookup: Internet Archive, archive.is (archive.today), Archive-It, BL Web Archive, UK National Archives Web Archive & Icelandic National Archive
  22. 22. … what is then a rotten article! … sale of rotten goods undermines the integrity of the scholarly record - especially for open scholarship
  23. 23. References in Web-Based Scholarly Communication To Scholarly Resources To Web at Large Resources Link Rot DOI, HTTP version of DOI ‘Web today, gone tomorrow’ Content Decay Has ‘fixity’ How to add fixity to the dynamic Archiving: CLOCKSS, Portico, LOCKSS, etc, as per Keepers Registry … The diverse world of web-archiving: How to enable ‘pro-active’ archiving of what is regarded as important “Think Hiberlink” There are issues here too; how happy should we be?
  24. 24. 4. Ensuring the Integrity & Continuity of the Scholarly Record Good that the likes of CLOCKSS & Portico, the BL & KB (Netherlands) and some others are doing something … But to what extent is the scholarly record still at risk of loss? How can we know? NB: Check out David Rosenthal’s blog posts ….
  25. 25. All praise to those who have stepped forward to act as digital shelves! ① Web-scale not-for-profit archiving agencies: ② National libraries … ③ Research libraries: consortia & specialist centres … National Science Library, Chinese Academy of Sciences
  26. 26. Having many archiving organisations is a Good Thing: “Digital information is best preserved by replicating it at multiple archives run by autonomous organizations” B. Cooper and H. Garcia-Molina (2002) As with Magna Carta, lots of copies …
  27. 27. Piloting an E-journal Preservation Registry Service How to know who is looking after what & how? (and uncover what is still at risk) [Diagram, taken from Figure 1 in reference paper in Serials, March 2009: ISSN Register at heart of the Data Model, ISSN-L as kernel field; (a) (b); METADATA on extant e-journals; METADATA on preservation action; Digital Preservation Agencies e.g. CLOCKSS, Portico; BL, KB; UK LOCKSS Alliance etc.; E-Journal Preservation Registry Service; SERVICES: user requirements]
  28. 28. thekeepers.org as Global Monitor … to discover who is looking after what
  29. 29. Two Key Performance Indicators (KPIs) ‘Ingest Ratio’ = titles ingested by one or more Keeper / ‘online serials’ in ISSN Register = 28,103 / 165,949 [as of June 2015] => 17% ‘KeepSafe Ratio’ = titles being ingested by 3+ Keepers / ‘online serials’ in ISSN Register = 9,836 / 165,949 => 6%
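As a minimal sketch of how the two KPIs are defined, the function below computes both ratios from a mapping of ISSN to the set of Keepers ingesting that title. The mapping, the function name `keeper_ratios` and the Keeper labels are hypothetical; the Keepers Registry's own data model is not shown in the slides.

```python
from typing import Dict, Set, Tuple


def keeper_ratios(keepers_by_issn: Dict[str, Set[str]],
                  safe_threshold: int = 3) -> Tuple[float, float]:
    """Return (ingest_ratio, keepsafe_ratio) for a set of online serials.

    ingest_ratio:   share of titles ingested by one or more Keeper
    keepsafe_ratio: share of titles ingested by safe_threshold or more Keepers
    """
    total = len(keepers_by_issn)
    if total == 0:
        raise ValueError("no titles supplied")
    ingested = sum(1 for keepers in keepers_by_issn.values() if len(keepers) >= 1)
    kept_safe = sum(1 for keepers in keepers_by_issn.values()
                    if len(keepers) >= safe_threshold)
    return ingested / total, kept_safe / total


# Toy data: four titles, two ingested, one of them by three Keepers.
toy = {
    "title-1": {"Keeper A", "Keeper B", "Keeper C"},
    "title-2": {"Keeper B"},
    "title-3": set(),
    "title-4": set(),
}
ingest, keepsafe = keeper_ratios(toy)

# The slide's June 2015 figures, recomputed from the reported counts.
reported_ingest_pct = round(100 * 28_103 / 165_949)   # Ingest Ratio
reported_keepsafe_pct = round(100 * 9_836 / 165_949)  # KeepSafe Ratio
```

With the reported counts, the two percentages round to the 17% and 6% shown on the slide.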
  30. 30. with usage logs for the UK OpenURL Router* • 8.5m full text requests in UK during 2012 => 53,311 online titles requested Analysis in 2013: ‘Ingest Ratio’ = 32% (16,985/53,311) => over two thirds 68% (36,326 titles) held by none! Archival Status of e-Serials Requested * As reported in Keepers Registry Blog, OpenURL Router passes ‘discovery’ requests to commercial OpenURL resolver services; developed & delivered by EDINA as part of Jisc support for UK universities & colleges
  31. 31. with usage logs for the UK OpenURL Router* • 8.5m full text requests in UK during 2012 => 53,311 online titles requested Analysis carried out again in 2015: ‘Ingest Ratio’ = 36% (19,231/53,311) ; up by 2,246 (4%) => but still, 64% (34,080 titles) held by none! ‘KeepSafe Ratio’ = 20% (10,847/53,311) ; up by 2,985 (5%) Archival Status of Requested e-Serials: Update
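The arithmetic behind the 2013 and 2015 figures can be checked in a few lines. This is a worked check rather than part of the original analysis; it reads the bracketed increases as percentage-point changes, and the 2013 KeepSafe count is not stated on the slides but implied by "up by 2,985".

```python
REQUESTED_TITLES = 53_311  # online titles requested via the UK OpenURL Router in 2012


def percent(count: int, total: int = REQUESTED_TITLES) -> float:
    """Share of requested titles, as a percentage."""
    return 100 * count / total


ingested_2013, ingested_2015 = 16_985, 19_231
kept_safe_2015, kept_safe_gain = 10_847, 2_985
kept_safe_2013 = kept_safe_2015 - kept_safe_gain  # implied, not stated on the slide

ingest_gain = ingested_2015 - ingested_2013
held_by_none_2015 = REQUESTED_TITLES - ingested_2015

# Changes expressed in percentage points of the 53,311 requested titles.
ingest_gain_points = percent(ingested_2015) - percent(ingested_2013)
keepsafe_gain_points = percent(kept_safe_2015) - percent(kept_safe_2013)
```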
  32. 32. Known Archival Status of Online Continuing Resources assigned ISSN, by Country, June 2015
  33. 33. Known Archival Status of Online Continuing Resources assigned ISSN, by Country, June 2015 If it's being kept safe then tell the Keepers Registry
  34. 34. Known Archival Status of Online Continuing Resources assigned ISSN, by Country, June 2015 If it's being kept safe then tell the Keepers Registry Researchers (and therefore libraries) in any one country are dependent upon content written and published as serials in countries other than their own
  35. 35. BIG publishers act early but incompletely Priority: find economic way to archive content from very many ‘at risk’ e-journals from many (small & not so small) publishers
  36. 36. 5a. Actions to Ensure the Integrity & Continuity of the Scholarly Record What should be done? Accept responsibility for stewardship of collections 1. Use the Keepers Registry 2. Commit financial support for web-scale agencies, such as CLOCKSS & Portico: invest 1% 3. Contribute your collection development expertise 4. Tell publishers, archiving agencies & national library 5. Consider options for collaborative action as LIBER 6. Avoid the 2020 Vision where you get the blame!
  37. 37. 1. Use the Keepers Registry to check the archival status of the journals that are of key importance to you New Service: [just launched this week] Title List Comparison • Upload list of ISSN & titles • Receive back report on what is being archived & what is not Register now for Member Services: http://thekeepers.org
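A title list comparison of this kind can be approximated locally. The sketch below is not the Keepers Registry service or its API: it validates each ISSN with the standard ISSN check digit (weights 8 down to 2, modulo 11, with X standing for 10) and sorts the list against a holdings mapping. The holdings data and the names `is_valid_issn` and `compare_title_list` are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def is_valid_issn(issn: str) -> bool:
    """Check the form NNNN-NNNC and the ISSN check digit."""
    s = issn.replace("-", "").upper()
    if len(s) != 8 or not s.isascii() or not s[:7].isdigit():
        return False
    if not (s[7].isdigit() or s[7] == "X"):
        return False
    total = sum(int(digit) * weight for digit, weight in zip(s[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return s[7] == ("X" if check == 10 else str(check))


def compare_title_list(issns: Iterable[str],
                       holdings: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Split a title list into archived, not archived, and invalid ISSNs."""
    report: Dict[str, List[str]] = {"archived": [], "not_archived": [], "invalid": []}
    for issn in issns:
        if not is_valid_issn(issn):
            report["invalid"].append(issn)
        elif holdings.get(issn):
            report["archived"].append(issn)
        else:
            report["not_archived"].append(issn)
    return report


# Hypothetical holdings: which Keepers report ingesting which ISSN.
holdings = {"0028-0836": {"Keeper A", "Keeper B"}}
report = compare_title_list(["0028-0836", "1932-6203", "1234-5678"], holdings)
```

Rejecting malformed ISSNs before lookup keeps typing errors from being reported as "not archived".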
  38. 38. 5b. Actions to Ensure the Integrity & Continuity of the Scholarly Record • What should be done? – Accept responsibility for stewardship of collections – Think Hiberlink: what about Reference Rot? 1. Good News is that there is Remedy (coming out of R&D) • to create ‘snapshots’ of referenced content – to store in web archives • to include in the citations: Original URI Snapshot URI [obtained from a web archive] Date/Time of snapshot 2. Role for research librarians is to alert publishers, editors and authors and support new initiatives : HiberActive infrastructure
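The three elements of the remedy (Original URI, Snapshot URI, Date/Time of snapshot) fit naturally into one small record. The sketch below assumes a hypothetical class `RobustReference` and a made-up archive host, `archive.example`; the citation layout is one possible rendering, not a format prescribed by Hiberlink.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RobustReference:
    original_uri: str        # what the author cited
    snapshot_uri: str        # where the archived copy lives
    snapshot_time: datetime  # when the snapshot was taken (timezone-aware)

    def citation_suffix(self) -> str:
        """Render the three elements for inclusion in a citation."""
        if self.snapshot_time.tzinfo is None:
            raise ValueError("snapshot_time must be timezone-aware")
        stamp = self.snapshot_time.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
        return f"{self.original_uri} [snapshot: {self.snapshot_uri}, taken {stamp}]"


ref = RobustReference(
    original_uri="http://example.org/report",
    # Hypothetical snapshot location; a real one is obtained from a web archive.
    snapshot_uri="https://archive.example/20150621093000/http://example.org/report",
    snapshot_time=datetime(2015, 6, 21, 9, 30, tzinfo=timezone.utc),
)
```

Keeping the original URI alongside the snapshot means the reference still works if the chosen archive itself disappears.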
  39. 39. 1. Hiberlink Remedy To Avoid Reference Rot Help authors do the right thing via a reference manager (e.g. EndNote, Reference Manager, Zotero, Mendeley) ① archiving of referenced web content when noted ② use Datetime URI for archived content in the citation Hiberlink Plug-in developed for Zotero Help editors & publishers do the right thing, having parsed the document to extract URIs ① archiving of referenced web content [having author check] ② use Datetime URI for archived content in the citation Hiberlink Plug-in developed for OJS
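The editor and publisher workflow starts by parsing the document to extract URIs. A rough sketch of that first step follows; it is not the Hiberlink OJS plug-in. It assumes plain text input, a simple regular expression, and that DOI links are set aside because they point to scholarly resources rather than the web at large. The function name `web_at_large_uris` is hypothetical.

```python
import re
from typing import List
from urllib.parse import urlsplit

# Deliberately simple pattern: stops at whitespace, quotes and closing brackets.
URI_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")
DOI_HOSTS = ("doi.org", "dx.doi.org")


def web_at_large_uris(text: str) -> List[str]:
    """Return unique non-DOI HTTP(S) URIs in order of first appearance."""
    seen = set()
    found: List[str] = []
    for match in URI_PATTERN.finditer(text):
        uri = match.group(0).rstrip(".,;:")  # drop trailing sentence punctuation
        if urlsplit(uri).netloc.lower() in DOI_HOSTS or uri in seen:
            continue
        seen.add(uri)
        found.append(uri)
    return found


manuscript = (
    "See http://hiberlink.org/ and https://doi.org/10.1371/journal.pone.0115253, "
    "also (http://robustlinks.mementoweb.org/about/)."
)
candidates = web_at_large_uris(manuscript)
```

Each candidate would then go to the author for checking before a snapshot is requested, as the slide's "[having author check]" suggests.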
  40. 40. … for what an Author regards as significant
  41. 41. … or needs to provide as evidence
  42. 42. 2. HiberActive Service Demonstrator Martin Klein et al. (2014) HiberActive: Pro-Active Archiving of web references from scholarly articles Open Repositories 2014 http://www.slideshare.net/martinklein0815/hiberactive
  43. 43. Cite with Robust References • For Open Science, the Scholarly Record Must Include: – Original URI, as it was at the time of reading – Snapshot URI, to revisit the content that was noted – Date/Time that Snapshot was taken of what was noted – Date/Time & Original URI enables access to snapshots created near the Date/Time in any web archive around the world – using Memento infrastructure (2015) Robust Links - Motivation http://robustlinks.mementoweb.org/about/
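Date/Time plus Original URI is enough to ask a Memento TimeGate for the snapshot nearest that moment. In the Memento protocol (RFC 7089) the client sends the desired time in an `Accept-Datetime` request header. The sketch below only builds such a request and sends nothing; the TimeGate base URL is a placeholder and the name `timegate_request` is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict, Tuple


def accept_datetime(moment: datetime) -> str:
    """Format a timezone-aware datetime as an HTTP date in GMT."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


def timegate_request(timegate_base: str, original_uri: str,
                     moment: datetime) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for a datetime-negotiation request.

    timegate_base is an assumption: substitute the TimeGate of a real
    web archive or aggregator.
    """
    url = timegate_base.rstrip("/") + "/" + original_uri
    return url, {"Accept-Datetime": accept_datetime(moment)}


url, headers = timegate_request(
    "http://timegate.example/timegate/",  # placeholder host
    "http://example.org/report",
    datetime(2015, 6, 21, 9, 30, tzinfo=timezone.utc),
)
```

The TimeGate is expected to answer with a redirect to the archived copy closest to the requested time, which is why the citation must carry the date as well as the URI.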
  44. 44. Well Published References with Robust Links
  45. 45. Thanks for listening … 1. The Scholarly (& Cultural) Record 2. Threat to the Continuity of the Scholarly (& Cultural) Record 3. Threat to the Integrity of our Scholarly Record 4. Ensuring the Integrity & Continuity of the Scholarly Record 5. Actions to Ensure the Integrity & Continuity of the Scholarly Record Keywords: Stewardship, Collection, Cooperation, Advocacy, Spend hiberlink.org thekeepers.org edina.ac.uk
