A Perspective on Archiving the Scholarly Record

As the scholarly communication system evolves to become natively web-based and starts supporting the communication of a wide variety of objects, the manner in which its essential functions (registration, certification, awareness, archiving) are fulfilled co-evolves. This presentation focuses on the nature of the archival function, based on a perspective of the future scholarly communication infrastructure. This presentation, prepared for a meeting in June 2014, is based on and updates a previous one that was prepared for a January 2014 meeting. The latter is available at http://www.slideshare.net/atreloar/scholarly-archiveofthefuture

  1. Archiving the Evolving Scholarly Record: A Perspective
     Herbert Van de Sompel, @hvdsomp, Los Alamos National Laboratory
     OCLC ESR, Washington, DC, December 10 2014
     Acknowledgments: Andrew Treloar, @atreloar, ANDS
  2. In This Talk
     1. Functions of scholarly communication
     2. Pointers to the future
     3. Characterizing the future
     4. Archiving the future
  3. Functions of Scholarly Communication
     • Registration: Allows claims of precedence for a scholarly finding
     • Certification: Establishes validity of the claim
     • Awareness: Allows actors in the system to remain aware of new claims
     • Archiving: Preserves the scholarly record over time
     Roosendaal, H., Geurts, C. (1997) Forces and functions in scientific communication http://www.physik.uni-oldenburg.de/conferences/crisp97/roosendaal.html
  4. System of Journals, Paper Version
     • Registration: Manuscript submission
     • Certification: Peer review
     • Awareness: Alerts, library shelf surfing
     • Archiving: Journals in library stacks
  5. System of Journals, Digital Version
     • Registration: Manuscript submission
     • Certification: Peer review
     • Awareness: Various web discovery services
     • Archiving: Special purpose archives (e.g. Portico), publishers
  6. In This Talk
     1. Functions of scholarly communication
     2. Pointers to the future
     3. Characterizing the future
     4. Archiving the future
  7. Pointers to the Future
     “The future is already here – it’s just not very evenly distributed” (William Gibson)
     Gibson, W. (1999) The Science in Science Fiction, NPR Interview http://www.npr.org/templates/story/story.php?storyId=1067220
  8. Registration - BioRxiv http://biorxiv.org
  9. Registration - GitHub http://github.com
  10. Registration – slideshare http://www.slideshare.net/hvdsomp/presentations
  11. Registration - WikiPathways http://wikipathways.org/index.php/WikiPathways
  12. Registration - Neurolex http://neurolex.org/wiki/Category:Olfactory_cortex_horizontal_cell
  13. Registration – Research Objects http://researchobject.org/
  14. Registration - Observations
      • Registration of wide variety of objects
        o dynamic, compound, inter-related, distributed across the web
      • Decoupling registration from certification
      • Time stamping, versioning
  15. Certification – PubMed Commons http://www.ncbi.nlm.nih.gov/pubmedcommons/
  16. Certification – The Open Journal http://theoj.org
  17. Certification – slideshare http://www.slideshare.net/hvdsomp/presentations
  18. Certification – Project FeederWatch http://feederwatch.org
  19. Certification - Observations
      • Certification decoupled from registration
      • Certification of various types of objects
      • Social interactions validating
      • Machines validating
  20. Awareness – Twitter http://twitter.com
  21. Awareness – myexperiment http://myexperiment.org/
  22. Awareness – NARCIS http://narcis.nl/
  23. Awareness – eLabNoteBook RSS Feeds http://malaria.ourexperiment.org/feeds
  24. Awareness - Observations
      • Awareness for various types of objects
      • Real time awareness
      • Awareness through social media
  25. Archiving – CLOCKSS http://www.clockss.org/
  26. Archiving – DANS Easy http://easy.dans.knaw.nl/
  27. Archiving – Australian Antarctic Data Centre http://data.aad.gov.au/
  28. Archiving – perma.cc http://perma.cc
  29. Archiving – EU Trusted Digital Repositories http://trusteddigitalrepository.eu/Site/Welcome.html
  30. Archiving - Observations
      • Archiving/Archives for various types of objects
      • Distributed archives
      • Archival consortia
      • Audit for trustworthiness
  31. In This Talk
      1. Functions of scholarly communication
      2. Pointers to the future
      3. Characterizing the future
      4. Archiving the future
  32. The Future
      • Registration
        o Wide variety of objects
        o Versions of objects
        o Interrelated, interdependent objects
      • Certification
        o Variety of certification mechanisms
        o Decoupled from / Overlaid upon Registration
      • Awareness
        o Real-time
        o Social
        o Variety of objects
      • Archiving …
  33. Characterizing the Future – Scholarly Communication
  34. Characterizing the Future – Communicated Objects
  35. In This Talk
      1. Functions of scholarly communication
      2. Pointers to the future
      3. Characterizing the future
      4. Archiving the future
  36. The Future – Core Observations
      • The research process, not just its outcome, is becoming visible … on the web
      • Massive extension of the scholarly record with an enormous variety of novel objects
      • The objects are heterogeneous, dynamic, compound, inter-related and distributed across the web
      • The objects are often hosted on common web platforms that are not dedicated to scholarship
      The archival paradigm must take these characteristics into account
  37. Considerations about Archiving
      • On the right track?
      • Capturing paradigms
      • Pockets of persistence
      • Recording versus Archiving
      • A perspective on scholarly infrastructure
  38. Considerations about Archiving
      • On the right track?
      • Capturing paradigms
      • Pockets of persistence
      • Recording versus Archiving
      • A perspective on scholarly infrastructure
  39. Web-Based Journal System – Links to Articles
      • Special-purpose archival solutions for articles
      • Rosenthal finds that what is archived is too few, too healthy, too easy
      • Attempts with the Keepers Registry to map out what is archived
      • Based on [ISSN, volume, issue], not on DOI, HTTP URI
      David Rosenthal (2013) Patio Perspectives at ANADP II: Preserving the Other Half http://blog.dshr.org/2013/11/patio-perspectives-at-anadp-ii.html
  40. Web-Based Journal System – Links to Articles
      Peter Burnhill (2014) Ensuring access to digital back copy http://www.cni.org/topics/digital-preservation/ensuring-access-to-digital-back-copy/
  41. Web-Based Journal System – Links to Web at Large Resources
      • Web archives contain snapshots, the result of incidental archiving
      • The Hiberlink project finds that for the large majority of these “Web at Large” resources, no temporally appropriate archived versions exist
      • Memento infrastructure allows auditing what is globally archived based on HTTP URI
      http://hiberlink.org
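
As an illustration of the audit by HTTP URI mentioned on the slide above: in the Memento protocol (RFC 7089), a client asks a TimeGate for the archived version of a URI closest to a desired datetime by sending an Accept-Datetime request header. The sketch below only builds such a request and does not send it. The TimeGate base URL, the convention of appending the original URI to it, and the function name are assumptions made for illustration; they are not taken from the talk.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Placeholder TimeGate base URL (hypothetical, for illustration only).
TIMEGATE_BASE = "https://timegate.example.org/timegate/"


def build_timegate_request(original_uri, when):
    """Return (url, headers) for a Memento TimeGate lookup of original_uri.

    `when` must be a timezone-aware datetime; it is converted to GMT because
    Accept-Datetime uses the HTTP date format expressed in GMT.
    """
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return TIMEGATE_BASE + original_uri, {"Accept-Datetime": http_date}


url, headers = build_timegate_request(
    "http://hiberlink.org", datetime(2014, 12, 10, tzinfo=timezone.utc)
)
print(url)
print(headers["Accept-Datetime"])
```

A real audit would send this request to an actual TimeGate or Memento aggregator and inspect the response; that step is left out here because it depends on which service is used.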
  42. Links Abstracted to Top Level Domain Targets
      Martin Klein, Herbert Van de Sompel et al. (2014) Scholarly context not found. To appear in PLoS ONE on December 26 2014
  43. Loss of Current Context – Link Rot
      Martin Klein, Herbert Van de Sompel et al. (2014) Scholarly context not found. To appear in PLoS ONE on December 26 2014
  44. Loss of Past Context – Archival Status (14 day window)
      Martin Klein, Herbert Van de Sompel et al. (2014) Scholarly context not found. To appear in PLoS ONE on December 26 2014
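
The "14 day window" on the slide above can be read as a test of whether an archived snapshot is temporally appropriate. The following is a minimal sketch under one stated assumption: a snapshot counts as appropriate if it was captured within 14 days before or after the date on which the link was referenced. The exact criterion used in the cited study may differ, and the function name is hypothetical.

```python
from datetime import date, timedelta


def has_timely_snapshot(reference_date, snapshot_dates, window_days=14):
    """True if any snapshot lies within window_days of reference_date."""
    window = timedelta(days=window_days)
    return any(abs(snap - reference_date) <= window for snap in snapshot_dates)


referenced_on = date(2014, 6, 1)
print(has_timely_snapshot(referenced_on, [date(2013, 1, 15), date(2014, 6, 10)]))
print(has_timely_snapshot(referenced_on, [date(2012, 3, 1)]))
```

The first call finds a snapshot nine days after the reference date; the second finds only a snapshot that is more than two years old.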
  45. Considerations about Archiving
      • On the right track?
      • Capturing paradigms
      • Pockets of persistence
      • Recording versus Archiving
      • A perspective on scholarly infrastructure
  46. Perspective on “Repository” Capture Paradigm
      • Atomic object
      • Finalized object
      • Removal of context
      • Perspective on object: file in a file system
      • Capture request by owner of object
      • Capture time decided by owner of object
  47. Perspective on “Web” Capture Paradigm
      • Compound object (context essential)
      • Constituents of compound object in flux
      • Perspective on constituents: resources with URIs on the web
      • Capture request by user of the constituents, owned by self, owned by 3rd parties
      • Capture time decided by user of the constituents
  48. Considerations about Archiving
      • On the right track?
      • Capturing paradigms
      • Pockets of persistence
      • Recording versus Archiving
      • A perspective on scholarly infrastructure
  49. Creating Pockets of Persistence
      How to achieve the ability to:
      • Persistently
      • Precisely
      • Seamlessly
      revisit the Scholarly Web of the Past and of the Now at some point in the Future
  50. Creating Pockets of Persistence
      How to achieve the ability to:
      • Persistently
      • Precisely
      • Seamlessly
      revisit the Scholarly Web of the Past and of the Now at some point in the Future
      This challenge exists for the entire web, but some communities actually care about addressing it:
      • scholarly communication,
      • legal publications,
      • journalism,
      • Wikipedia,
      • …
  51. Pro-Active Capture for a Seed Collection
      • Seed Collection - Starting point for capture is a seed collection of interest to communities that care, e.g.
        o Scholarly literature
        o Legal documents
        o On-Line journalism
        o Wikipedia articles
      • Lifecycle Events – Intervene at critical moments in the lifecycle of items in these collections to pro-actively capture
        o Collection items – some solutions in place
        o Web resources referenced in collection items
  52. Pro-Active Capture for a Seed Collection
      • Request by user of A to capture A, B, C, D, E
      • Request for capture may result in
        o In-situ or remote capture
        o Creation of snapshot or creation of trace
        o Archival URI, capture datetime
      • Interoperability for on-demand capture
      • Orchestration of capture process
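
The capture request on the slide above (a user of A asks for A and the resources it references to be captured, and gets back an archival URI and capture datetime for each) can be sketched as follows. Everything named here is hypothetical: `CaptureRecord`, `capture_item_and_references`, and the `archive` callable stand in for whatever archiving service would actually be used, and the example archive URL is a placeholder.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class CaptureRecord:
    original_uri: str
    archival_uri: str      # where the snapshot can be revisited
    captured_at: datetime  # capture datetime


def capture_item_and_references(
    item_uri: str,
    referenced_uris: Iterable[str],
    archive: Callable[[str], str],
    captured_at: Optional[datetime] = None,
) -> List[CaptureRecord]:
    """Capture an item plus the web resources it references.

    `archive` is a stand-in for a real archiving call: it takes an original
    URI and returns the archival URI of the resulting snapshot.
    """
    when = captured_at or datetime.now(timezone.utc)
    # Capture each URI once, keeping the order in which it was first seen.
    uris = list(dict.fromkeys([item_uri, *referenced_uris]))
    return [CaptureRecord(uri, archive(uri), when) for uri in uris]


def fake_archive(uri: str) -> str:
    # Hypothetical archive that simply prefixes the original URI.
    return "https://archive.example.org/" + uri


records = capture_item_and_references(
    "http://example.org/A",
    ["http://example.org/B", "http://example.org/C", "http://example.org/B"],
    fake_archive,
    captured_at=datetime(2014, 12, 10, tzinfo=timezone.utc),
)
for record in records:
    print(record.original_uri, "->", record.archival_uri)
```

Passing the archiving step in as a callable keeps the orchestration separate from the choice between in-situ and remote capture, which the slide leaves open.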
  53. Pro-Active Capture for Seed Collection
      • What those crucial lifecycle events are may depend on the collection type
      Wikipedia
      • Creation of new article
      • Creation of new version of article
      • Creation of substantially new version of article
      • Addition of external reference to article
      • References to article exceed a certain threshold
      Scholarly Literature
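
One way to express "which lifecycle events trigger capture depends on the collection type" is a per-collection policy table. The sketch below uses the Wikipedia events listed on the slide above as an enumeration. Which of those events a given policy actually enables is an assumption made purely for illustration, as are all names in the code.

```python
from enum import Enum, auto


class LifecycleEvent(Enum):
    ARTICLE_CREATED = auto()
    VERSION_CREATED = auto()
    SUBSTANTIAL_VERSION_CREATED = auto()
    EXTERNAL_REFERENCE_ADDED = auto()
    INBOUND_REFERENCES_OVER_THRESHOLD = auto()


# Hypothetical policy: the subset chosen for "wikipedia" is an example only.
CAPTURE_POLICY = {
    "wikipedia": {
        LifecycleEvent.ARTICLE_CREATED,
        LifecycleEvent.SUBSTANTIAL_VERSION_CREATED,
        LifecycleEvent.EXTERNAL_REFERENCE_ADDED,
    },
}


def should_capture(collection_type, event):
    """True if this event triggers pro-active capture for the collection."""
    return event in CAPTURE_POLICY.get(collection_type, set())


print(should_capture("wikipedia", LifecycleEvent.EXTERNAL_REFERENCE_ADDED))
print(should_capture("wikipedia", LifecycleEvent.VERSION_CREATED))
```

Collection types without an entry in the table never trigger capture, so a new collection type has to be added deliberately.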
  54. Scholarly Literature: Experimental Zotero Extension
      Richard Wincewicz (2014) Prototype Hiberlink plugin for Zotero https://www.youtube.com/v/ZYmi_Ydr65M%26vq
  55. Scholarly Literature: Experimental HiberActive Service
      Martin Klein et al. (2014) HiberActive: Pro-Active Archiving of web references, Open Repositories 2014 http://www.slideshare.net/martinklein0815/hiberactive
  56. Considerations about Archiving
      • On the right track?
      • Capturing paradigms
      • Pockets of persistence
      • Recording versus Archiving
      • A perspective on scholarly infrastructure
  57. Web Platforms for Scholarship
      • Increasingly, common web platforms are used for scholarship
        o GitHub, Wikis, Wordpress, etc.
      • Many of these platforms have desirable characteristics
        o Versioning
        o Time stamping
        o Social embedding
      • But, these platforms record rather than archive
  58. Recording is not Archiving
      “GitHub reserves the right at any time and from time to time to modify or discontinue, temporarily or permanently, the Service (or any part thereof) with or without notice.”
      “GitHub does not warrant that (i) the service will meet your specific requirements, (ii) the service will be uninterrupted, timely, secure, or error-free, (iii) the results that may be obtained from the use of the service will be accurate or reliable, (iv) the quality of any products, services, information, or other material purchased or obtained by you through the service will meet your expectations, and (v) any errors in the Service will be corrected.”
      GitHub Terms of Service http://help.github.com/articles/github-terms-of-service
  59. Recording versus Archiving
      Recording                 Archiving
      Short-term                Longer-term
      No guarantees provided    Attempt to provide guarantees
      Write many/read many      Write once/read many
      Scholarly process         Scholarly record
  60. Considerations about Archiving
      • On the right track?
      • Capturing paradigms
      • Recording versus Archiving
      • A perspective on scholarly infrastructure
  61.
  62. Infrastructure Considerations
      • Various incentives to move objects from Private to Recording:
        o Share with self, team, comply with funder requirements
      • Objects in Recording are network accessible and in global (HTTP) namespace
        o Within reach of web-scale processes aimed at selectively moving them from Recording to Archiving
      • Core aspects of these processes include
        o Ability to snapshot the state of interlinked objects at specific moments in their lifecycle
        o Transfer of snapshots from Recording platforms to appropriate, distributed Archive platforms (interoperability)
        o Curatorial decisions regarding what should be captured
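
Snapshotting "the state of interlinked objects" first requires deciding which linked objects belong in the snapshot. A minimal sketch, assuming the links between objects are already known as a mapping from URI to linked URIs, and assuming a depth limit is an acceptable curatorial cut-off (both are illustrative choices, not taken from the talk):

```python
from collections import deque


def snapshot_scope(start_uri, links, max_depth=1):
    """List the URIs to snapshot together, breadth-first from start_uri.

    `links` maps each URI to the URIs it links to; objects more than
    max_depth links away from start_uri are left out.
    """
    seen = {start_uri}
    order = [start_uri]
    queue = deque([(start_uri, 0)])
    while queue:
        uri, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(uri, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order


links = {"A": ["B", "C"], "B": ["D"], "C": ["A"]}
print(snapshot_scope("A", links, max_depth=1))
print(snapshot_scope("A", links, max_depth=2))
```

Tracking visited URIs keeps the traversal from looping when objects link back to each other, as C does to A in the example.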
  63. Curatorial Considerations
      • What are the criteria involved in deciding (which states of) which objects get captured/archived?
      • What triggers transition from Recording to Archiving?
        o On-demand in lifecycle, social status of the object, reference made to object, deliberate randomness for serendipity, …
      • What to archive?
        o Snapshot of object or trace of object (metadata, provenance, …)?
  64. Final Considerations
      • Need organizational, technical, and curatorial interfaces between Recording and Archiving platforms
      • Need organizational and technical interfaces across Archiving platforms
  65. Archiving the Evolving Scholarly Record: A Perspective
      Herbert Van de Sompel, @hvdsomp, Los Alamos National Laboratory
      Acknowledgments: Andrew Treloar, @atreloar, ANDS
