Tales from the Keepers Registry: Dr Who and the Scholarly Record

“Who does forever?” : A Registry of Keepers
Who is looking after e-journals with archival intent?

2. Dr Who and the Scholarly Record
Time Travel for Scholarly Web

Evidence from the Keepers Registry
Statistics on who is looking after what, & what is at risk

Published in: Education
  • http://www.jisc-collections.ac.uk/Reports/e-journal-archiving-comparative-study/
  • E-journals, and online serials more generally, are a big part of the scholarly record. If we use the distribution of assigned ISSN as a guide, we get some measure of just how international the problem space is. Centring the map on Singapore, Asia and the Pacific for a change: yes, a lot is published in the US, UK, Netherlands and Germany, but over 60% is not. That is an underestimate, because so many online serials in countries at the centre of this map do not have an ISSN assigned; they remain hidden to our arithmetic.
  • by considering 3 answers that might be given to a user asking after a particular e-journal content – typically an article

    1. 1. Association of Subscription Agents & Intermediaries ASA ANNUAL CONFERENCE 2014 24-25 February 2014 Tales from the Keepers Registry Dr Who and the Scholarly Record Peter Burnhill EDINA, University of Edinburgh, UK http://creativecommons.org/licenses/by/3.0/
    2. 2. Overview 1. “Who does forever?” : A Registry of Keepers Who is looking after e-journals with archival intent? 2. Dr Who and the Scholarly Record Time Travel for Scholarly Web 3. Evidence from the Keepers Registry Statistics on who is looking after what, & what is at risk
    3. 3. Some Consequences of Web • Essentials of supply chain have changed • licensed to access, not sale of content • Libraries no longer take physical custody of much key content • online, remotely, not on-shelf locally • Role of libraries as trusted keepers of information and culture has been disrupted – Need assurance of continuity of access • of all content for future generations • of the back copies, post-cancellation of the licence “The Library [Committee], which is made up of librarians and academics, … wants reassurance about long-term preservation before confirming a University policy of going e-only.” from email sent by a big UK Library • Does this mean that the Scholarly Record is at risk?
    4. 4. 1. “Who does forever?” Many reports over past 10 years highlighted risks • ‘digital decay’: format obsolescence & bit rot and warned against single points of failure: • natural disasters (earthquake, fire and flood) • human folly (criminal and political action): hacking + risks with commercial events in the publisher/supply chain Some early archiving initiatives emerged … • e-Depot at Koninklijke Bibliotheek • international significance (Elsevier & Kluwer) • the LOCKSS project at Stanford University • from which came CLOCKSS [as library/publisher ‘dark archive’] • the electronic-archiving initiative at JSTOR • from which came Portico [as service provider]
    5. 5. A ‘global challenge’: trans-national action [Map: %age of the 113,000 ISSN issued for e-serials: US.LoC 20%; UK.BL 10%; Netherlands & Germany: c. 4.5% each; Brazil 4%; ‘hidden’ e-journals: low % ISSN] Researchers (and therefore libraries) in any one country are dependent upon content written and published in countries other than their own
    6. 6. A Variety of ‘Archiving Organisations’ ① web-scale not-for-profit archiving agencies e.g. CLOCKSS Archive & Portico ② national libraries (with legal deposit in mind) e.g. e-Depot (Netherlands); British Library; DnB etc ③ research libraries: consortia & specialist centres e.g. Global LOCKSS Network, HathiTrust, Scholars Portal, Archaeology Data Service Disclaimer: University of Edinburgh is a CLOCKSS Node & Board Member: Jisc supports UK LOCKSS Alliance
    7. 7. How can we know who is looking after what & how? (and uncover what is still at risk) [Diagram, taken from Figure 1 in reference paper in Serials, March 2009. Labels: SERVICES: user requirements; E-J Preservation Registry Service; Data dependency; E-Journal Preservation Registry; (a) METADATA on extant e-journals: ISSN Register; (b) METADATA on preservation action: Digital Preservation Agencies e.g. CLOCKSS, Portico; BL, KB; UK LOCKSS Alliance etc.] The Keepers Registry, product of Jisc-funded PEPRS Project (EDINA & the ISSN IC) ISSN Register at heart of the Data Model; ISSN-L as kernel field
    8. 8. How can we know who is looking after what & how? (and uncover what is still at risk) [Diagram, taken from Figure 1 in reference paper in Serials, March 2009. Labels: SERVICES: user requirements; E-J Preservation Registry Service; Data dependency; E-Journal Preservation Registry; (a) METADATA on extant e-journals: ISSN Register; (b) METADATA on preservation action: Digital Preservation Agencies e.g. CLOCKSS, Portico; BL, KB; UK LOCKSS Alliance etc.] The Keepers Registry, product of Jisc-funded PEPRS Project (EDINA & the ISSN IC) ISSN Register at heart of the Data Model; ISSN-L as kernel field Look forward to ISNI for publisher as kernel field
    9. 9. Many archiving organisations is a Good Thing “Digital information is best preserved by replicating it at multiple archives run by autonomous organizations” B. Cooper and H. Garcia-Molina (2002)
    10. 10. Now have a global Registry of e-journal archiving … to discover who is looking after what Enter title or ISSN to search across metadata reported by leading archiving organisations *news* Library of Congress has now joined the Keepers Registry [& have high hopes for some others …]
    11. 11. … and discover details of its ‘archival status’ This e-journal is being archived by 5 archiving agencies … … but coverage of volumes is partial & patchy Example search: ‘Origins of Life’
    12. 12. Overview: Time for Part 2 2. Dr Who and the Scholarly Record (Time Travel for Scholarly Web) • ‘Reference Rot’: When what was referenced and cited ceases to say the same thing, or ‘has ceased to be’ http://www.snorgtees.com/this-parrot-has-ceased-to-be
    13. 13. The ‘reference rot’ problem definition Investigating Reference Rot in Web-Based Scholarly Communication 1. http:// link to a resource no longer works • Link rot 2. The citation is inadequate • Not robust over time 3. The content referenced at the end of the link a) has evolved, b) changed dramatically, c) disappeared completely. http://hiberlink.org #hiberlink
    14. 14. Hiberlink Project: Andrew W. Mellon Foundation Investigating Reference Rot in Web-Based Scholarly Communication Partners • Los Alamos National Laboratory: Research Library • Martin Klein, Robert Sanderson, Herbert Van de Sompel • University of Edinburgh: EDINA & Language Technology Group • Peter Burnhill, Neil Mayo, Muriel Mewissen, Christine Rees, Tim Stickland, Richard Wincewicz & Beatrice Alex, Claire Grover, Richard Tobin, Ke ‘Adam’ Zhou Acknowledgments • Primary datasets: arXiv, Chesapeake Project, Elsevier, PubMed Central, PLoS, … Planning on large-scale investigation (looking for more …) • Secondary datasets: Ex Libris, MS Academic, SerialsSolutions • Technology support: CrossRef Labs, CrossRef Prospect, Elsevier • Liaisons: archive.is, CrossRef, Internet Archive, Old Dominion University Web Science & Digital Library Research Group, perma.cc http://hiberlink.org #hiberlink
    15. 15. Hiberlink Project: Four work packages Investigating Reference Rot in Web-Based Scholarly Communication 1. Problem Quantification. Text mining of vast corpus of scholarly literature to uncover references to web resources (URIs); using Memento; determine availability on live web and in archives. 2. Archival Solution Infrastructure. Prototyping proactive, web-centric archiving approaches: mechanisms for archiving cited web resources at the point of use or publication. 3. Temporal Reference Solutions. Prototyping new methods of citation to enable creation of precise & actionable time-specific references. 4. Dissemination and Outreach. Raising awareness of the challenges at the heart of digital scholarly communication. http://hiberlink.org #hiberlink
    16. 16. Investigating Reference Rot in Web-Based Scholarly Communication References in Web-Based Scholarly Communication References to other online scholarly works Link Rot DOI, HTTP version of DOI Content Decay References to online resources on the ‘wider Web’ Fixity of content Archiving: CLOCKSS, LOCKSS, Portico… (Keepers Registry) This is becoming understood but issues, see David Rosenthal blog post http://blog.dshr.org/2013/11/patio-perspectives-at-anadp-ii.html This is unexplored, so to be Hiberlink focus
    17. 17. Articles increasingly link to online resources on the ‘wider Web’ URIs extracted from PubMed papers – links to Web at Large resources
    18. 18. Quantifying the extent of ‘Reference Rot’ – Early Results Using: PubMed Central Corpus 01/1997 - 12/2012 • Articles processed: 494,785 • Articles that contain links (URIs) to ‘Web at Large’: 176,527 • Number of references to ‘Web at Large’ URIs: 557,432 • Unique referenced Web at Large URIs: 327,782 Percentage Exists & Archived Referenced URIs 31.2% Exists & Archived !Exists & Archived Exists & !Archived !Exists & !Archived 16.8% 11.3% 40.7% 31% 11% 41% 17% are available & safe can be retrieved at risk are lost
    19. 19. Thoughts on How to Address Content Decay Who is not selling defective goods? • Remedy: Pro-active approach to trigger web archiving when web-based content is referenced in scholarly work: – By authors • during note taking, authoring, when submitting – By publication platforms • During submission, editing, acceptance, issue
    20. 20. + Tool with Temporal Context for Links • Memento for Chrome is an application that uses Original URI-R and dates to access Mementos in various web archives Memento Time Travel for Chrome http://bit.ly/memento-for-chrome
    21. 21. Back To The Overview - Part 3 E-journals should be easy – right? … but is the e-journals problem being solved? 3. Evidence from the Keepers Registry Statistics on who is looking after what, & what is at risk
    22. 22. 3. Evidence from the Keepers Registry a) 21,557 e-serial titles are reported as being ingested by the 10 Keepers – organisations with archival intent – with many ‘missing volumes and issues’ b) 113,092 ISSN assigned to ‘online serials’ in the ISSN Register • Progress with a key indicator: ratio of a/b = 19% – was 17% at close of 2011 (16,558 / 97,563) Progress, but far from ‘job done’
    23. 23. Do we need to agree a ‘priority list’ of titles? 1. Should we only be interested in the c.30,000 ‘peer-reviewed’ scholarly journals? [Ulrich’s] 2. Do we look only at what individual libraries list? – In 2012 we checked ‘archival status’ for 3 large university libraries c.75% ‘at risk’ c.11% held by 3 or more • Two key indicators: %age (& number) of titles that are ‘at risk of loss’ %age (& number) of titles that are ‘preserved by 3 or more Keepers’. 3. Should we ask the audience? • The researchers and students who read online serials
    24. 24. Looking from the user’s point of view … … with usage logs for the UK OpenURL Router • 10.4m full text requests in 2012; ISSN-L to de-duplicate ISSN • 53,311 online titles requested by researchers & students from 108/160+ Analysis using the Keepers Registry: • Only 15% (7,862) are being kept by 3+ Keepers • Over two thirds (68%) held by none • 36,326 titles ‘at risk’ of loss • So ‘preservation’ (or lack of it) is still a real and present problem!
    25. 25. Good News & Main Challenge? Good news? • Most of the big publishers engage with archiving initiatives – typically CLOCKSS, e-Depot and Portico. • Are those titles, volumes & issues actually being archived? Main challenge? • Long tail of smaller publishers - regardless of business model. • Everyone in the audience should check whether they are participating in at least one preservation approach? • Role of Agents, who arrange subscriptions with those small publishers? – Or only role of national libraries & research library consortia?
    26. 26. Choice of future with 2020 Vision • Best Case scenario for ASA 2020 – Libraries, Agents & Publishers have acted to reduce that alarming 80% figure to near to zero – They have ensured that all the e-journal content used by their researchers in 2013 has been preserved and can be successfully used in 2020, and assuredly beyond. • Worst Case scenario for ASA 2020 – Libraries, Agents & Publishers have failed to act – Important literature has been lost – Citizens & scholars complain of neglect!
    27. 27. The Keepers Registry: Actionable Evidence Sidebar note on monitoring their progress … 1. To assist publishers ‘do the right thing’ – A showcase for the real heroes: the archiving organisations – Means to check what content is being reported as archived – Provide libraries, publishers & archiving organisations with lists of titles that seem to be at risk of loss 2. To keep a close focus on volumes & issues – Need to make sure all issued content is being kept safe 3. To assist collaboration between Keepers: ‘a safe places network’ 4. If it is worth preserving, it really should have an identifier Breaking News: New release (end of Q1 2014) • Members Area: for Publishers & Libraries • Upload a list of ISSNs & get back archival status of Titles • Access to API, to report archival status on 3rd Party websites
    28. 28. Gentle Wake-up Call to Ensure Continuity of Access ‘Go Smell The Coffee’ #hiberlink http://thekeepers.blogs.edina.ac.uk/ http://thekeepers.org http://hiberlink.org/
    29. 29. Ask a librarian in 2020: 3 possible answers 1. "Yes, we have it (we've checked recently, both in the catalogue and in actuality), and you can access it now" 2. "No, but we know somebody (we trust) that does, so we can point you to (or arrange access to) it now/soon-ish" 3. "Sorry, we don't know … - perhaps nobody has it - it may be lost forever, altho' perhaps somebody somewhere ..." - That was true for the print world - Unfortunately, unless we do something now, the 3rd answer could become the common one for a lot of e-journal content
    30. 30. Sidebar note on National Libraries Should we wait upon Legal Deposit? – 94% of libraries have some form of legal deposit for print. • Only 44% of national libraries had legislation in 2011 for e-books or e-journals; expected to rise to 58% by June 2012. from presentation, CENL 2011 Survey by Lynne Brindley to CDNL Annual Meeting Puerto Rico, 15/8/11 • Only 27% [expected to rise to 37% by June 2012] actually ingesting via legal deposit • Total national libraries collecting = those 14 via legal deposit + 9 by other means (Netherlands, UK/BL, Switzerland voluntary deposit) • Only KB e-Depot, BL, NSLC (+ LoC) in The Keepers Registry • Only when the other 19 join will we all know about their activity • Key point is not about call for ‘legal deposit’ but that on its own it is taking too much time
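
The indicators quoted on slides 22 to 24 are simple ratios and can be re-derived from the figures given there. The sketch below is illustrative only: the function names and the sample title labels are assumptions made for this example, not part of the Keepers Registry or its API. It treats a title as 'at risk' when no Keeper reports holding it, which is how slide 24 uses the term.

```python
from typing import Dict, Mapping


def coverage_ratio(titles_ingested: int, issn_assigned: int) -> float:
    """Ratio a/b: titles reported as ingested over ISSN assigned to online serials."""
    if issn_assigned <= 0:
        raise ValueError("issn_assigned must be positive")
    return titles_ingested / issn_assigned


def key_indicators(keepers_per_title: Mapping[str, int]) -> Dict[str, float]:
    """Compute the two key indicators from a map of title -> number of Keepers.

    'at risk' means held by no Keeper; 'three_plus' means held by 3 or more.
    """
    total = len(keepers_per_title)
    if total == 0:
        raise ValueError("no titles supplied")
    at_risk = sum(1 for n in keepers_per_title.values() if n == 0)
    three_plus = sum(1 for n in keepers_per_title.values() if n >= 3)
    return {
        "at_risk": at_risk,
        "at_risk_pct": 100 * at_risk / total,
        "three_plus": three_plus,
        "three_plus_pct": 100 * three_plus / total,
    }


# Figures quoted on slide 22.
print(f"{coverage_ratio(21_557, 113_092):.1%}")  # 19.1%
print(f"{coverage_ratio(16_558, 97_563):.1%}")   # 17.0%

# Figures quoted on slide 24 (titles requested via the UK OpenURL Router).
print(f"{7_862 / 53_311:.1%}")   # 14.7% kept by 3+ Keepers
print(f"{36_326 / 53_311:.1%}")  # 68.1% held by none

# Hypothetical sample: made-up title labels, not real registry data.
sample = {"title-A": 0, "title-B": 0, "title-C": 1, "title-D": 3, "title-E": 5}
print(key_indicators(sample))
```

The rounded results agree with the 19%, 17%, 15% and 68% shown on the slides.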

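Slide 18 sorts each referenced URI by two yes/no questions (does it still exist on the live web, and is it archived?), and slide 20 relies on Memento's datetime negotiation to reach an archived copy. The following is a minimal sketch of both ideas, with hypothetical helper names, and it makes no network requests. The `Accept-Datetime` header is the one defined by the Memento protocol (RFC 7089); which TimeGate to send it to is left open here, and nothing below is claimed to be the Hiberlink project's own code.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict


def classify_reference(exists_on_live_web: bool, archived: bool) -> str:
    """Return the slide-18 category label for one referenced URI."""
    left = "Exists" if exists_on_live_web else "!Exists"
    right = "Archived" if archived else "!Archived"
    return f"{left} & {right}"


def accept_datetime_header(when: datetime) -> Dict[str, str]:
    """Build the Accept-Datetime request header for a Memento TimeGate.

    `when` must be timezone-aware; it is converted to UTC and rendered
    in the HTTP date format ending in 'GMT'.
    """
    if when.tzinfo is None:
        raise ValueError("when must be timezone-aware")
    as_utc = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(as_utc, usegmt=True)}


# A URI that has vanished from the live web but has an archived copy.
print(classify_reference(exists_on_live_web=False, archived=True))

# Ask for the version of a resource closest to the first day of the conference.
print(accept_datetime_header(datetime(2014, 2, 24, 12, 0, tzinfo=timezone.utc)))
```

Keeping the classification separate from any fetching logic means the same function can be applied to results gathered by whatever live-web and archive checks are actually in use.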