
Access to Digital Back Copy


Delivered by Peter Burnhill at the CNI Fall 2014 Membership Meeting, December 8-9, 2014, Washington, DC. This talk is about ensuring that online serial content, whether issued in parts or changing over time via a website, continues to be available for scholarship. The central take-home message is that we all still have a lot to do.

Published in: Education


  1. 1. Access to Digital Back Copy http://www.flickr.com/photos/shinez/5000985919/
  2. 2. Our Shared Task is to ensure researchers, students & their teachers have ease and continuing access to online resources for scholarship [diagram labels: licence to use; “ease”; “continuing”; usability; preservation; access to content & tools]
  3. 3. what was once available in print, on-shelf locally … is now online & accessed remotely, ‘anytime/anywhere’, exploiting the telematic opportunity! [1990s Euro-speak] We’ve seen improved Ease of Access. But what of Continuity of Access?
  4. 4. Back Copy, once available in print on-shelf locally (or via that tedious ILL) … where exactly is the digital back copy? Picture credit: http://somanybooksblog.com/2009/03/27/library-tour/
  5. 5. … not in the custody of Libraries. Libraries boast of ‘e-collections’, but maybe they only have ‘e-connections’ => real & present threat to the integrity of what is published as scholarly record. Picture credit: http://somanybooksblog.com/2009/03/27/library-tour/
  6. 6. Ensuring access to digital back copy: The following questions are implicit: 1. What exactly was once on library shelves & What exactly is the scholarly record? … and where is it now?
  7. 7. Ensuring access to digital back copy: The following questions are implicit: 1. What exactly was once on library shelves & What exactly is the scholarly record? 2. What is now ‘on the Web’? … or rather, what was once ‘on the Web’?
  8. 8. Ensuring access to digital back copy: The following questions are implicit: 1. What exactly was once on library shelves & What exactly is the scholarly record? 2. What is now ‘on the Web’? 3. What of other (external) resources, now issued online & needed for scholarship? eg Gov. Docs, the cultural record?
  9. 9. Ensuring access to digital back copy: The following questions are implicit: 1. What exactly was once on library shelves & What exactly is the scholarly record? 2. What is now ‘on the Web’? 3. What of other (external) resources needed for scholarship, eg Gov. Docs, the cultural record? 4. & whose responsibility to archive content? Each research library; consortia; national/state libraries/archives? & is this a national, or a trans-national challenge?
  10. 10. What every country should know: trans-national action! %age of 132,806 ISSN issued for e-serials (December 2013): US: 20%; UK: 9%; Brazil: 6%; Ger: 6%; Canada: 5.5%; Sp: 5%; Rest of World: > 50%. Researchers (& libraries/publishers) in any one country are dependent upon content written and published as serials in countries other than their own
  11. 11. Access to Digital Back Copy: Search for digital shelving … trust & verification. Ensuring researchers, students and their teachers have ease and continuing access to online resources used for scholarship [diagram labels: licence to use; “ease”; “continuing”; usability; preservation; access to content & services; security & integrity of medium; replication; usability of format; back content; semantic drift; archiving]
  12. 12. Reflect upon a landmark, 10+ years ago. The editor, Linda Cantara [Abbott], passed away on 22 August 2013
  13. 13. Her summary of “responsibility for archiving the content of electronic journals” involved some familiar organisational names. And so began different investigations; all addressed key issues: • Identification of what should be archived • Guidelines for accessing e-journal archives • Development of sustainable economic and business models
  14. 14. The result includes some digital shelves. Different models: a. Web-scale not-for-profit archiving agencies … b. National libraries … c. Research libraries: consortia & specialist centres … alongside other Keepers with archival intent: National Science Library, Chinese Academy of Sciences. 100 +
  15. 15. Many archiving organisations a Good Thing “Digital information is best preserved by replicating it at multiple archives run by autonomous organizations” B. Cooper and H. Garcia-Molina (2002) Bad stuff will happen!
  16. 16. Moving towards some practical steps … The following themes recur: 1. Identify Threat & Seek Remedy ✔ 2. What’s the (scale of the) Present Danger? • How do we know? 3. What’s the Remedy? • How best to implement remedy? 4. Monitor progress / Reflect / Re-think 5. Repeat ↵
  17. 17. … to discover who is looking after what. *What’s New in 2014 and what’s coming*: eg Library of Congress and Scholars Portal now reporting in; New functionality; Evidence of what is archived
  18. 18. Keepers Registry: an online service that has: • free-to-web facilities: – search and browse by serial title, ISSN and by publisher – ‘Holdings statement’: issues & volumes – summary statistics; date of last update for each ‘Keeper’ • a Members Area [enabling additional functionality]: – check archival status of list of ISSN – machine (API) interfaces, eg OpenURL link [3rd party website] – statistics, beyond those provided on the simple user interface • the Keepers Area [to be ‘co-designed’]
  19. 19. Keepers Registry: an online service that is … Sustainable: • Technologically: the software/hardware/data • Organisationally: EDINA & ISSN IC, Jisc Core Service • Financially: costs understood; has recurrent revenue. Needed & wanted by one or more Use Community: 1. the means to discover who is looking after what, how & access terms 2. the lens on what is being kept safe => what is at risk of loss 3. a showcase for archival organizations of all types, worldwide. Successfully made transition to be a sustainable service!
  20. 20. Need to know who is looking after what & how? The Keepers Registry. “A Project to Pilot an E-journal Preservation Registry Service”, Serials, March 2009; "Tales from the Keepers Registry", Serials Review 39.1 (2013). [Diagram: Project Data Model. The ISSN Register supplies METADATA on extant e-serials; Digital Preservation Agencies supply METADATA on preservation action; both feed, as inputs (a) and (b), the E-Journal Preservation Registry Service, with ISSN-L as kernel field and informed by user requirements.] Pilot: CLOCKSS, Portico; BL, KB; UK LOCKSS Alliance
  21. 21. 10 Questions & Some Short Answers 1. What type of resources are recorded in the Keepers Registry? Very short answer: Serial content The streams of content (in digital form) that are: • issued online in parts (e.g. journal content) • issued through change over time (e.g. web page). The Registry follows the rules used for ISSN assignment. Such serial titles include: • digitised journal content as well as born digital • e-books that are issued as a series (having ISSN) • contents of selected websites • what may be made available via repositories.
  22. 22. 10 Questions & Some Short Answers … 2. Is the purpose of the Registry MAINLY to record 'scholarly resources’? • and does that also mean cultural heritage resources? Very short answer: That was the motivation, but …
  23. 23. The Scholarly Record & Serials … [not to scale] [diagram labels: Continuing Resources; ‘The Scholarly Record’; ‘resources needed for scholarship’; Issued in Parts (Serials): ‘e-journals’; Content changes over time (Integrating): Websites, Databases, Repositories; ‘Book-length work’; ‘Gov Docs’]
  24. 24. 10 Questions & Some Short Answers (cont) 3. Why has Keepers Registry a global remit, why not national registries? • Researchers (& libraries/publishers) in any one country are dependent upon content written & published as serials in countries other than their own 4. Does the Keepers Registry intend to carry out audit or certification? • No, but each ‘keeper’ can report such information 5. What granularity is recorded about archived content? • Issue & volume (& year if available) • Not article-level, altho’ keepers can report at that level
  25. 25. 10 Questions & Some Short Answers (cont) 6. Is theKeepers.org only intended for librarians and policy-makers or also for individual scholars? • Open for all but geared to librarians who would be stewards 7. What is meant by archived, and is this the same as preserved? • Someone is keeping with archival intent; preservation levels? 8. Can the Keepers Registry help print archiving initiatives? • It already assists UK Research Reserve 9. Can the Keepers Registry help digitisation initiatives? 10. And what about the Internet Archive? • Interesting you should ask: ability to ‘see the streams’?
  26. 26. What’s the (scale of the) Present Danger? • How do we know? In 2011, the Keepers Registry recorded 16,558 titles ‘ingested & archived’ by at least 1 ‘keeper’; 21,557 in 2013; 26,195 as at November 2014. 9,656 'ingested & archived' by 3+. More archives reporting into Registry & more archiving!
  27. 27. “Are we there yet?” … “Don’t think so” ‘Ingest Ratio’ = titles being ingested by one or more Keeper / ‘online serials’ in ISSN Register = 26,195 / 136,965 [in March 2014] => 19% (We do not know about 80% of e-serials having ISSN) ‘KeepSafe Ratio’ = titles being ingested by 3+ Keepers / ‘online serials’ in ISSN Register = 9,656 / 136,965 => 7%
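The two ratios on this slide are simple divisions, and a short sketch makes the arithmetic easy to re-run as the Registry counts change. The function name `ratio_percent` and the constant names are illustrative only (they do not come from the Keepers Registry); the three figures are the ones quoted on the slide.

```python
def ratio_percent(archived_titles: int, registered_titles: int) -> float:
    """Return archived titles as a percentage of registered online serials."""
    if registered_titles <= 0:
        raise ValueError("registered_titles must be positive")
    return 100.0 * archived_titles / registered_titles


# Figures quoted on the slide (names are hypothetical, values are from the talk)
ONLINE_SERIALS_IN_ISSN_REGISTER = 136_965  # 'online serials' in ISSN Register
INGESTED_BY_ONE_OR_MORE = 26_195           # titles ingested by 1+ Keepers
INGESTED_BY_THREE_OR_MORE = 9_656          # titles ingested by 3+ Keepers

ingest_ratio = ratio_percent(INGESTED_BY_ONE_OR_MORE, ONLINE_SERIALS_IN_ISSN_REGISTER)
keepsafe_ratio = ratio_percent(INGESTED_BY_THREE_OR_MORE, ONLINE_SERIALS_IN_ISSN_REGISTER)

print(f"Ingest Ratio:   {ingest_ratio:.0f}%")    # about 19%
print(f"KeepSafe Ratio: {keepsafe_ratio:.0f}%")  # about 7%
```

Read the other way round, roughly four in five e-serials with an ISSN have no reported Keeper at all, which is the point the slide makes.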
  28. 28. Evidence using Title List Comparison tool: In 2011/12 three major research libraries in the USA (Columbia, Cornell & Duke) checked archival status of serial titles regarded as important. ‘Ingest Ratio’ = 22% to 28%, ie about a quarter => fate of c.75% is unknown. As reported in: P. Burnhill (2013) Tales from The Keepers Registry: Serial Issues About Archiving & the Web. Serials Review 39 (1), 3–20. http://www.sciencedirect.com/science/article/pii/S0098791313000178 & https://www.era.lib.ed.ac.uk/handle/1842/6682
  29. 29. BIG publishers act early but incompletely. Priority: find economic way to archive content from … very many ‘at risk’ e-journals from many small publishers
  30. 30. User-centric Evidence … with usage logs for the UK OpenURL Router* • 8.5m full text requests in UK during 2012 => 53,311 online titles requested. Analysis in 2013: ‘Ingest Ratio’ = 32% (16,985/53,311) => over two thirds, 68% (36,326 titles), held by none! Next Step is to focus on ‘scholarly record’? * As reported in Keepers Registry Blog. OpenURL Router passes ‘discovery’ requests to commercial OpenURL resolver services; developed & delivered by EDINA as part of Jisc support for UK universities & colleges
  31. 31. Imagine CNI 2020 • Best Case scenario – Publishers (& Libraries) have acted – Together with the Keepers they have ensured that all the e-journal content used by researchers this year (in 2014) has been preserved and can be used successfully in 2020
  32. 32. Imagine CNI 2020
  33. 33. Added remarks from related projects • Keepers Extra: 2-year investment by Jisc to ensure that the Keepers Registry is all it can be • Hiberlink: Investigation into the threat of ‘reference rot’; bonus report of potential remedy – With thanks to Andrew W Mellon Foundation • SafeNet: 2-year investigation for Jisc into a PLN for the UK, with part focus on ‘post-cancelation access’
  34. 34. Keepers Extra: 2-year (Jisc) Project. Builds on the work of the eJournal Archiving Group run by Jisc in 2012/13 (we may re-name this project as JARVIG): • Assign priority of attention: collection judgement & decisions • Provide librarians with a toolkit relating to collection coverage, using the Keepers Registry • R&D on data quality and metadata challenges – Might lead to service enhancements for Keepers Registry – Improve ‘holdings display’ • Governance? • Extend Keepers Registry model – to recognise identifiers other than ISSN (URN?) – model for how other types of scholarly content are kept safe?
  35. 35. ‘Reference Rot’: when what was referenced & cited ceases to say the same thing, or ‘has ceased to be’ (http://www.snorgtees.com/this-parrot-has-ceased-to-be) … undermining the integrity of what is published. Two-year project funded by the Andrew W. Mellon Foundation. We will have something now to report & yet more to say in 2015
  36. 36. An International Team at Work, funded by the Andrew W. Mellon Foundation • Los Alamos National Laboratory: Research Library: Martin Klein, (Rob Sanderson), Harihar Shankar, Herbert Van de Sompel • University of Edinburgh: Language Technology Group: Beatrice Alex, Claire Grover, Richard Tobin, Ke “Adam” Zhou; EDINA*: Neil Mayo, Muriel Mewissen (Project Manager), Christine Rees, Tim Stickland, Richard Wincewicz, Peter Burnhill. * Centre for Service Delivery & Digital Expertise
  37. 37. Reference Rot = Link Rot + Content Drift “when links to web resources no longer point to what they once did” Investigating Reference Rot in Web-Based Scholarly Communication
  38. 38. Link Rot
  39. 39. + Content Drift: What is at end of URI has changed, or gone! [screenshots of http://dl00.org in 2000, 2004, 2005 and 2008] (a) Dynamic content, as values on webpage change over time (b) Static content but very different (often unrelated) web pages
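The definition "Reference Rot = Link Rot + Content Drift" can be expressed as a small decision rule. What follows is a minimal sketch, not the Hiberlink project's method: the `Observation` record, the `fingerprint` field and the `classify` function are hypothetical names, and comparing exact fingerprints is a deliberate simplification (a real study would need a similarity measure, because dynamic pages change a little on every visit).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ReferenceStatus(Enum):
    OK = "URI works as expected"
    LINK_ROT = "link rot"
    CONTENT_DRIFT = "content drift"


@dataclass(frozen=True)
class Observation:
    """What a URI returned at one moment (hypothetical record layout)."""
    http_status: int            # final status after following any re-directs
    fingerprint: Optional[str]  # e.g. a hash of the page text; None if nothing came back


def classify(cited: Observation, current: Observation) -> ReferenceStatus:
    """Compare the page as cited with the page as found today."""
    # Nothing usable at the end of the URI: the link itself has rotted.
    if current.http_status >= 400 or current.fingerprint is None:
        return ReferenceStatus.LINK_ROT
    # Something is returned, but it is not what the author cited.
    if current.fingerprint != cited.fingerprint:
        return ReferenceStatus.CONTENT_DRIFT
    return ReferenceStatus.OK


cited = Observation(http_status=200, fingerprint="abc")
print(classify(cited, Observation(404, None)).value)   # link rot
print(classify(cited, Observation(200, "xyz")).value)  # content drift
print(classify(cited, Observation(200, "abc")).value)  # URI works as expected
```

Under this rule a re-direct that ends in ‘content not found’ counts as link rot, while a re-direct that lands on a different page counts as content drift.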
  40. 40. What of the references to Web resources that were cited in the landmark publication?
  41. 41. 11 years later, few references work as intended
  42. 42. A re-direct [from RLG to OCLC] but ‘content drift’ Fail !!
  43. 43. Reference no longer works: ‘link rot’ Fail !!
  44. 44. Reference no longer works: ‘link rot’ Fail !!
  45. 45. A re-direct but content not found Fail !!
  46. 46. Successful link: URI works as expected
  47. 47. Successful link: URI works as expected
  48. 48. Classic link rot: ‘Page Not Found’ Fail !!
  49. 49. reference to the Web is to an e-journal that is still current
  50. 50. Classic link rot: ‘Page Not Found’ Fail !!
  51. 51. URI works but content drift: reference is not as intended Fail !!
  52. 52. This is a Threat to The Integrity of The Scholarly Record hiberlink.org
  53. 53. What we are doing in Hiberlink 1. Creating evidence on extent of ‘Reference Rot’ – Main focus has been on references (& URIs) made in Journal Articles • Inc. reference rot in Supreme Court judgments with Harvard Law Library & Perma.cc – ETD2014 was opportunity to look at Reference Rot & the e-Thesis – PRELIDA is opportunity to look at impact on Linked Data 2. Understanding the preparation/publication/ingest workflow(s) – Identifying opportunity for productive intervention 3. Prototypes for pro-active archiving to enable remedy – Embedding such ‘solutions’ in existing tools & infrastructure – Propose/test new infrastructure for temporal referencing • supporting & using the Memento protocol 4. Raising awareness & seeking collaborative actions … through events like this
  54. 54. Remedy for The Integrity of The Scholarly Record. Envisage the best opportunities for Intervention to make Remedy, to ‘flash-freeze’, either to avoid reference rot or to ‘stop the rot’. 3 basic workflows: a. Study: Preparation -> (Review) -> Submission b. Publication: Editorial -> (Revision) -> Acceptance -> Issue c. Post-Publication: Deposit/Ingest -> Provide/Access -> Use. Identify the Actors involved in: a. Composition: author/creator b. Public Release: editor/referee/copy c. Curation: librarian / repository manager / archivist
  55. 55. ‘Work in progress’ to effect Remedy (1): 1. Hiberlink Plug-in - to help authors and middle-folk (publishers/librarians) do the right thing: – Zotero - used by authors to manage references https://www.zotero.org/ – Open Journal Systems (OJS) - used by OA publishers https://pkp.sfu.ca/ojs/
  56. 56. Hiberlink Plug-in for Zotero: a. Triggers archiving of referenced web content b. Returns Datetime URI for archived content. For use during preparation of thesis & before final submission, but also before deposit with Library (& maybe for repair by Library …)
  57. 57. ‘Work in progress’ to effect Remedy (2): 1. Hiberlink Plug-in - to enable pro-active archiving 2. Missing Link - re-factor the HTML link that is returned: a) Take simple URI - to French National Library (say) b) Augment Link with a set of Datetime & location pairs
  58. 58. ‘Work in progress’ to effect Remedy (3): 1. Hiberlink Plug-in - to enable pro-active archiving 2. Missing Link - re-factoring the HTML link. First two approaches support ‘perfect scenario’: • All authors archive all their cited URIs • e.g. (but not exclusively) with Hiberlink / Zotero 3. HiberActive – Enables repositories to ‘stop the rot’ by actively archiving those references in e-theses – A notification hub, a component for the infrastructure • testing workflow with ResourceSync, CORE & external archive programme
  59. 59. Back Copy, once available in print on-shelf locally (or via that tedious ILL) … where exactly is the digital back copy? Scholarly e-journals; Alternative ‘Scholarly’ & other Web venues. That which supports scholarly statement: References / Citations: In Scholarly e-journals; On the ‘Web at Large’. Picture credit: http://somanybooksblog.com/2009/03/27/library-tour/
  60. 60. Meanwhile: Promote & engage the real heroes! a. Web-scale not-for-profit archiving agencies … b. National libraries … c. Research libraries: consortia & specialist centres … National Science Library, Chinese Academy of Sciences. 100 +
  61. 61. What you can also do today! 1. Engage now with the real heroes of this story: those that provide digital shelving 2. Go to the Keepers Registry => thekeepers.org – Search on Title/ISSN • Check key volumes & issues are being archived – Browse by publisher 3. Sign-up to test the new Member Services: – Title List Comparison tool • Are your Titles actually being archived? • & Check archival status for ISSNs listed in citations – Linking Options for ‘archival status’ on your website
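To make the idea of a title list comparison concrete, here is a rough sketch of the kind of check a library could run on its own list of ISSNs. It is not the Keepers Registry tool or its API: `compare_title_list`, the `holdings` dictionary and the keeper names are all made up for illustration. The only standard element is the ISSN check-digit rule (weights 8 down to 2, modulo 11, with X standing for 10), which is used to reject mistyped ISSNs before any lookup.

```python
from typing import Dict, Iterable, List, Set


def is_valid_issn(issn: str) -> bool:
    """Check the ISSN check digit (weights 8..2, modulo 11, 'X' means 10)."""
    chars = issn.replace("-", "").upper()
    if len(chars) != 8 or not (chars[:7].isascii() and chars[:7].isdigit()):
        return False
    total = sum(int(digit) * weight for digit, weight in zip(chars[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return chars[7] == ("X" if check == 10 else str(check))


def compare_title_list(issns: Iterable[str],
                       holdings: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Group ISSNs by how many keepers report holding them."""
    report: Dict[str, List[str]] = {
        "invalid": [], "held by none": [], "held by 1-2": [], "held by 3+": [],
    }
    for issn in issns:
        if not is_valid_issn(issn):
            report["invalid"].append(issn)
            continue
        keepers = len(holdings.get(issn, set()))
        if keepers == 0:
            report["held by none"].append(issn)
        elif keepers < 3:
            report["held by 1-2"].append(issn)
        else:
            report["held by 3+"].append(issn)
    return report


# Made-up holdings data: NOT actual Keepers Registry records.
holdings = {
    "0098-7913": {"Keeper A", "Keeper B", "Keeper C"},
    "1234-5679": {"Keeper B"},
}
my_titles = ["0098-7913", "1234-5679", "0000-0019", "1234-5678"]
for group, titles in compare_title_list(my_titles, holdings).items():
    print(f"{group}: {titles}")
```

The "held by none" group is the one that matters for collection decisions, and "held by 3+" corresponds to the threshold behind the ‘KeepSafe Ratio’ earlier in the talk.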
  62. 62. BIG publishers act early but incompletely. Priority: work with other organisations to find economic way to archive content from … very many ‘at risk’ e-journals from many small publishers, including Gov Docs!
  63. 63. Access to Digital Back Copy http://www.flickr.com/photos/shinez/5000985919/ Thank you
