
DHUG 2018: Towards Web-Centric Repository Interoperability

PowerPoint accompaniment for Martin Klein's presentation at DHUG 2018.

Published in: Data & Analytics

DHUG 2018: Towards Web-Centric Repository Interoperability

  1. 1. Towards Web-Centric Repository Interoperability @mart1nkle1n DHUG 2018, 02/07/2018, Albuquerque, NM Towards Web-Centric Repository Interoperability Martin Klein @mart1nkle1n http://orcid.org/0000-0003-0130-2097 Research Library Los Alamos National Laboratory
  2. Team Effort: Herbert Van de Sompel, Harihar Shankar, David Rosenthal, Michael L. Nelson, Geoffrey Bilder, John Kunze, Shawn Jones, Simeon Warner, Karl Ward, Joe Wass
  3. Robust Linking to Web Resources: http://robustlinks.mementoweb.org/
  4. Link Rot
  5. IIPC 2013: https://web.archive.org/web/20140101072007/http://netpreserve.org/general-assembly/2013/overview
  6. IIPC today: http://netpreserve.org/general-assembly/2013/overview
  7. Content Drift
  8. EPA 12/2016: https://web.archive.org/web/20161228184110/https://www.epa.gov/climatechange
  9. EPA today: https://www.epa.gov/sites/production/files/signpost/cc.html
  10. Problem
      • Link rot: a link stops working altogether
      • Content drift: the linked content changes over time and may eventually no longer be representative of the content that was originally linked
      • Link rot + content drift = reference rot
      • On the web, all links are subject to reference rot
      • Reference rot hinders our ability to follow links as they were intended when they were put in place
  11. Reference Rot in Scholarly Communication: http://dx.doi.org/10.1371/journal.pone.0115253 and http://dx.doi.org/10.1371/journal.pone.0167475
  12. Link Rot in Scholarly Articles
  14. Reference Rot Over Time - arXiv
  15. Robust Links
      1. Make links more robust
      2. Make them actionable for humans and machines
  16. Robust Links
      1. Create a snapshot of referenced resources in a public web archive
  17. Why multiple archives? They aren’t magic web sites! They’re just web sites. If you used Mummify, you’re now left with a bunch of defunct, shortened links like: https://mummify.it/XbmcMfE3 (Slide by Michael L. Nelson, 2016)
  18. Robust Links
      1. Create a snapshot of referenced resources in a publicly available web archive
      2. Decorate links with standard HTML attributes to convey:
         • URI of archived snapshot
         • datetime of linking
         • resource’s original URI
  19. Link Decoration with Standard HTML (href points at the original URI): http://robustlinks.mementoweb.org/spec
      <a href="http://www.cabq.gov/" data-versionurl="http://archive.is/H2Ox9" data-versiondate="2018-02-06">City of Albuquerque</a>
  20. Link Decoration with Standard HTML (href points at the archived snapshot): http://robustlinks.mementoweb.org/spec
      <a href="http://archive.is/H2Ox9" data-originalurl="http://www.cabq.gov/" data-versiondate="2018-02-06">City of Albuquerque</a>
  21. Robust Links in Action via JavaScript: http://dx.doi.org/10.1045/november2015-vandesompel
  22. Robust Links in Action - JavaScript: http://dx.doi.org/10.1045/november2015-vandesompel
  23. Take-Aways
      • Links on the web are subject to reference rot
      • “Robustifying” them can help alleviate the problem
      • Link decorations as proposed by Robust Links are
        • based on HTML standards
        • machine-actionable
      • Organizations such as publishers, libraries, archives, and Wikipedia can help with adoption
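
The link decoration shown on slides 19 and 20 can be generated mechanically. The following is a minimal sketch, not the Robust Links reference implementation: the function name `robust_link` and its `link_to_snapshot` flag are made up for illustration, and only the three attribute names (`data-originalurl`, `data-versionurl`, `data-versiondate`) come from the slides.

```python
from html import escape


def robust_link(text, original_url, version_url, version_date, link_to_snapshot=False):
    """Return an <a> element decorated in the style shown on slides 19 and 20.

    Hypothetical helper: by default href points at the original URI and the
    snapshot goes into data-versionurl; with link_to_snapshot=True the roles
    are swapped and the original URI goes into data-originalurl.
    """
    if link_to_snapshot:
        href, extra_name, extra_value = version_url, "data-originalurl", original_url
    else:
        href, extra_name, extra_value = original_url, "data-versionurl", version_url
    return (
        f'<a href="{escape(href, quote=True)}" '
        f'{extra_name}="{escape(extra_value, quote=True)}" '
        f'data-versiondate="{escape(version_date, quote=True)}">'
        f"{escape(text)}</a>"
    )


# Values taken from the slide example.
decorated = robust_link(
    "City of Albuquerque",
    "http://www.cabq.gov/",
    "http://archive.is/H2Ox9",
    "2018-02-06",
)
print(decorated)
```

Escaping the attribute values keeps the markup well-formed even when a URI contains characters such as `&` or quotes.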
  24. Signposting: http://signposting.org/ (Signposting is funded by the Andrew W. Mellon Foundation)
  25. 10.1594/PANGAEA.867908
  26. 10.1594/PANGAEA.867908 https://doi.org/10.1594/PANGAEA.867908
  27. 10.1594/PANGAEA.867908 https://doi.org/10.1594/PANGAEA.867908 https://doi.pangaea.de/10.1594/PANGAEA.867908
  28. https://doi.pangaea.de/10.1594/PANGAEA.867908
  34. Problems
      • Humans can easily navigate such links, i.e.:
        • Copy the DOI and resolve it via https://doi.org
        • Determine where the bibliographic resources are
        • Interpret the download link for the dataset ZIP
        • Search for authors’ names on the web
  35. Problems
      • Humans can easily navigate such links, i.e.:
        • Copy the DOI and resolve it via https://doi.org
        • Determine where the bibliographic resources are
        • Interpret the download link for the dataset ZIP
        • Search for authors’ names on the web
      • Machines can’t do any of this!
  36. HTTP Links: Mark Nottingham (2017) RFC 8288: Web Linking. https://tools.ietf.org/rfc/rfc8288.txt
  37. HTTP Links
  39. HTTP Links Are Used
      curl -I http://dbpedia.org/data/Albuquerque
      HTTP/1.1 200 OK
      Date: Tue, 06 Feb 2018 02:02:11 GMT
      Content-Type: application/rdf+xml; charset=UTF-8
      Content-Length: 24208
      Link: <http://creativecommons.org/licenses/by-sa/3.0> ; rel="license", <http://dbpedia.org/data/Albuquerque> ; rel="alternate"; type="application/json", <http://dbpedia.org/resource/Albuquerque>; rel="describes",
  43. HTTP Link Relation Types
      • Registered in the IANA registry
      • Strings, e.g., license, alternate, describes
      • Require a formal specification, e.g., an RFC
      • Typically used for common relationships, generically specified
      • Provide broad, coarse-grained interoperability
      https://www.iana.org/assignments/link-relations/link-relations.xml
  44. HTTP Links Are Pretty Neat
      • Can uniformly be used for all MIME types
      • Accessible via HTTP HEAD (no content transfer):
        • Works for large resources and for restricted content
      • HTTP Links can be conveyed:
        • by-value, in the HTTP Link header
        • by-reference, by using a linkset link in the HTTP header that points to a collection of links (1)
      • HTTP Links provide guidance to machine agents intent on accomplishing a specific task
      (1) Wilde, E. and Van de Sompel, H. (2017) Linkset: A Link Relation Type and Media Types for Link Sets. https://datatracker.ietf.org/doc/draft-wilde-linkset-link-rel/
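
To illustrate how a machine agent might consume the Link header from the DBpedia example, here is a rough sketch. It is a simplified parser written for this transcript, not a complete RFC 8288 implementation (for instance, it ignores escaped quotes and extended `title*` parameters). The names `parse_link_header`, `targets_for`, and `fetch_link_header` are illustrative assumptions.

```python
import re
import urllib.request

# One link: <target> followed by zero or more ';name=value' parameters.
LINK_RE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[\w*-]+\s*(?:=\s*(?:"[^"]*"|[^\s;,"]+))?)*)'
)
# One parameter: name, then an optional quoted or unquoted value.
PARAM_RE = re.compile(r';\s*([\w*-]+)\s*(?:=\s*(?:"([^"]*)"|([^\s;,"]+)))?')


def parse_link_header(value):
    """Split a Link header value into (target, params) pairs."""
    links = []
    for target, raw_params in LINK_RE.findall(value):
        params = {}
        for name, quoted, token in PARAM_RE.findall(raw_params):
            # Keep the first occurrence of a parameter name.
            params.setdefault(name.lower(), quoted or token)
        links.append((target, params))
    return links


def targets_for(links, rel):
    """Return the targets whose rel parameter contains the given relation type."""
    return [t for t, p in links if rel in p.get("rel", "").lower().split()]


def fetch_link_header(url):
    """Issue an HTTP HEAD request and return the combined Link header, if any."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        values = response.headers.get_all("Link") or []
    return ", ".join(values)


# Header value copied from the slide, including its trailing comma.
example = (
    '<http://creativecommons.org/licenses/by-sa/3.0> ; rel="license", '
    '<http://dbpedia.org/data/Albuquerque> ; rel="alternate"; type="application/json", '
    '<http://dbpedia.org/resource/Albuquerque>; rel="describes",'
)
links = parse_link_header(example)
print(targets_for(links, "license"))
```

Because the links travel in the response headers, `fetch_link_header` needs only a HEAD request, which matches the point above about large or access-restricted resources. It is defined here but not called, so the example runs without network access.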
  45. Signposting for Repositories
      Proposal: Use HTTP Links to address some long-standing problems regarding scholarly resources on the web, by interlinking them using appropriate relation types.
  46. Pattern: Identifier
      • Problem: It is not possible to determine the associated HTTP PID of a scholarly object’s constituent resources
        • Landing page URIs used for citation ****
        • Annotations do not refer to HTTP PID
      • Solution: provide cite-as link pointing at the HTTP PID
      • Applies to: landing page, all constituent resources
  47. Use HTTP Link with cite-as Relation Type: http://signposting.org/identifier/
  48. Use HTTP Link with cite-as Relation Type
      curl -I "https://doi.pangaea.de/10.1594/PANGAEA.867908"
      HTTP/1.1 200 OK
      Content-length: 8424
      Content-type: text/html;charset=UTF-8
      Link: <https://doi.org/10.1594/PANGAEA.867908> ; rel="cite-as"
  49. **** A Disconcerting Observation
      • When classifying links extracted from PMC as linking to articles, we assumed that filtering on http://dx.doi.org/* would do the trick
      • But we found a lot of http://link.springer.com/article/*
        • For example: http://link.springer.com/article/10.1007/s00799-014-0108-0
        • Instead of: http://dx.doi.org/10.1007/s00799-014-0108-0
      • We used CrossRef’s reverse domain lookup to classify these extracted links as linking to articles
  50. **** URI References – PMC Corpus
      [Figure: URI references per year, 2005 to 2012; series: DOI, shouldBeDOI, Web-at-large]
      Herbert Van de Sompel, Martin Klein, Shawn M. Jones (2016) Persistent URIs Must Be Used To Be Persistent. In WWW’16. https://arxiv.org/abs/1602.09102
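
The three categories in the figure (DOI, shouldBeDOI, Web-at-large) suggest a simple classification step. The sketch below is only an approximation of that idea and not the code used in the study: `PUBLISHER_HOSTS` is a hypothetical stand-in for CrossRef’s reverse domain lookup, and treating `doi.org` as a resolver host alongside `dx.doi.org` is an added assumption.

```python
from urllib.parse import urlsplit

# Hosts of the DOI resolver; the slide filters on dx.doi.org, doi.org is assumed.
DOI_RESOLVER_HOSTS = {"dx.doi.org", "doi.org"}
# Hypothetical stand-in for CrossRef's reverse domain lookup.
PUBLISHER_HOSTS = {"link.springer.com"}


def classify_reference(url):
    """Classify a referenced URI into one of the three categories of the figure."""
    host = (urlsplit(url).hostname or "").lower()
    if host in DOI_RESOLVER_HOSTS:
        return "DOI"
    if host in PUBLISHER_HOSTS:
        # Points at a publisher page although a DOI link could have been used.
        return "shouldBeDOI"
    return "Web-at-large"


for reference in (
    "http://dx.doi.org/10.1007/s00799-014-0108-0",
    "http://link.springer.com/article/10.1007/s00799-014-0108-0",
    "http://www.cabq.gov/",
):
    print(reference, "->", classify_reference(reference))
```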
  51. Pattern: Publication Boundary
      • Problem: It is not possible to determine what the constituent resources of a scholarly object are
        • Preservation and text mining tools require portal-specific heuristics to find those constituent resources (1)
        • No direct path from an HTTP PID to, e.g., the PDF
      • Solution: provide item/collection links to interlink entry page and constituent resources; convey MIME types on item links
      • Applies to: all constituent resources of a scholarly object
      (1) Van de Sompel, H., Rosenthal, D., and Nelson, M.L. (2016) Web Infrastructure to Support e-Journal Preservation (and More). http://arxiv.org/abs/1605.06154
  52. Use HTTP Link with item/collection Relation Type: http://signposting.org/publication_boundary/
  53. Pattern: Bibliographic Metadata
      • Problem: It is not possible to determine where the bibliographic resources that describe a scholarly object can be found
        • Preservation and reference manager tools require portal-specific heuristics to find those resources (1)
      • Solution: provide describedby/describes links to point at bibliographic metadata
      • Applies to:
        • describedby: HTTP PID, landing page
        • describes: bibliographic resources
      (1) Van de Sompel, H., Rosenthal, D., and Nelson, M.L. (2016) Web Infrastructure to Support e-Journal Preservation (and More). http://arxiv.org/abs/1605.06154
  54. Use HTTP Link with describedby/describes Relation Type: http://signposting.org/bibliographic_metadata/
  55. Pattern: Author
      • Problem: It is not possible to uniquely determine who authored the work
      • Solution: provide author link to point at author-identifying URI
      • Applies to: HTTP PID, landing page, all constituent resources
  56. Use HTTP Link with author Relation Type: http://signposting.org/author/
  57. Use HTTP Link with author Relation Type
      curl -I "https://doi.pangaea.de/10.1594/PANGAEA.867908"
      HTTP/1.1 200 OK
      Content-length: 8424
      Content-type: text/html;charset=UTF-8
      Link: <http://orcid.org/0000-0003-1291-8524> ; rel="author"
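
On the repository side, the four patterns could be combined into a single Link header for a landing page. The sketch below shows one possible way to assemble it. The helper names are invented for illustration, and the `item` and `describedby` targets under `repo.example.org` are hypothetical placeholders; only the `cite-as` and `author` targets come from the slides.

```python
def format_link(target, rel, media_type=None):
    """Serialize one link as '<target>; rel="..."' with an optional type."""
    link = f'<{target}>; rel="{rel}"'
    if media_type is not None:
        link += f'; type="{media_type}"'
    return link


def signposting_header(cite_as, authors=(), items=(), described_by=()):
    """Build a Link header value for a landing page (hypothetical helper).

    items and described_by are sequences of (uri, media_type) pairs.
    """
    links = [format_link(cite_as, "cite-as")]
    links += [format_link(uri, "author") for uri in authors]
    links += [format_link(uri, "item", media_type) for uri, media_type in items]
    links += [
        format_link(uri, "describedby", media_type)
        for uri, media_type in described_by
    ]
    return ", ".join(links)


header = signposting_header(
    cite_as="https://doi.org/10.1594/PANGAEA.867908",
    authors=["http://orcid.org/0000-0003-1291-8524"],
    # Hypothetical constituent resource and metadata URIs.
    items=[("https://repo.example.org/dataset.zip", "application/zip")],
    described_by=[("https://repo.example.org/citation.bib", "application/x-bibtex")],
)
print("Link: " + header)
```

Conveying the media type on `item` and `describedby` links follows the publication boundary pattern, which asks for MIME types on item links so that an agent can pick, for example, the ZIP or the PDF without fetching each resource.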
  59. [Diagram labels: bibliographic resources, constituent resources, HTTP PID]
  60. cite-as
  64. Other Use Cases
      • Licenses
      • Versions
      • Resource Types
  65. COAR Next Generation Repositories
      Vision:
      • Position repositories as the foundation for a distributed, globally networked infrastructure for scholarly communication
      • Layers of value-added services will be deployed on top
      Objective:
      • Achieve a level of cross-repository interoperability by exposing uniform behaviors across repositories that leverage web-friendly technologies and architectures
      • Encourage the emergence of value-added services that use these uniform behaviors to support discovery, access, annotation, real-time curation, sharing, quality assessment, content transfer, analytics, provenance tracing, etc.
      http://comment.coar-repositories.org/
  66. Take-Aways
      • HTTP Links and Relation Types help address common problems regarding scholarly resources on the web, e.g.:
        • Convey the (persistent) identifier of a resource
        • Inform about the boundaries of an object
        • Point at bibliographic metadata
        • Refer to an author-identifying resource
      • Increase interoperability of repositories, embrace principles of the web
      • Make repositories more machine-friendly to the benefit of humans
  67. Towards Web-Centric Repository Interoperability. Martin Klein, @mart1nkle1n, http://orcid.org/0000-0003-0130-2097, Research Library, Los Alamos National Laboratory
