PID Signposting Pattern

Presentation for PIDapalooza 2016. PIDs need to be used to achieve their intended persistence. Our research (reported at WWW2016, see http://arxiv.org/1602.09102) found that a disturbing percentage of references to papers that have DOIs actually use the landing page HTTP URI instead of the DOI HTTP URI. The problem is likely related to tools used for collecting references such as bookmarks and reference managers. These select the landing page URI instead of the DOI URI because the former is what's available in the address bar. It can safely be assumed that the same problem exists for other types of PIDs. The net result is that the true potential of PIDs is not realized. In order to ameliorate this problem we propose a Signposting pattern for PIDs (http://signposting.org/identifier/). It consists of adding a Link header to HTTP HEAD/GET responses for all resources identified by a DOI, including the landing page and content resources such as "the PDF" and "the dataset". The Link header contains a link, which points with the "identifier" relation type to the DOI HTTP URI. When such a link is available, tools can automatically discover and use the DOI URI instead of the other URIs (landing page, PDF, dataset) associated with the DOI-identified object.

  1. A Signposting Pattern for PIDs. Herbert Van de Sompel, LANL & DANS, @hvdsomp, http://orcid.org/0000-0002-0715-6126. PIDapalooza, Reykjavik, Iceland, 10 Nov 2016. http://signposting.org. Acknowledgments: Geoff Bilder, Shawn Jones, Martin Klein, Michael L. Nelson, David Rosenthal, Harihar Shankar, Simeon Warner, Karl Ward, Joe Wass. Signposting is funded by the Andrew W. Mellon Foundation. Cartoon by Patrick Hochstenbach.
  2. Outline • A Disconcerting Observation • A Proposed Fix Using Signposting • Signposting, The Bigger Picture • Additional Signposting Patterns
  3. Large Scale Study into Reference Rot for Links to Web-at-Large Resources Found in STM Articles. Martin Klein, Herbert Van de Sompel, et al. (2014) Scholarly context not found. In: PLOS ONE https://doi.org/10.1371/journal.pone.0115253 Shawn Jones, Herbert Van de Sompel, et al. (2016) Scholarly context adrift. Under review.
  4. STM Articles in the Study. Martin Klein, Herbert Van de Sompel, et al. (2014) Scholarly context not found. In: PLOS ONE https://doi.org/10.1371/journal.pone.0115253
     STM articles published 1997-2012                 arXiv      PMC        total
     Per corpus                                       707,667    479,194    1,186,861
     With URI references to articles                  51,574     240,857    292,431
     With URI references to web-at-large resources    142,134    156,160    298,294
  5. Articles that Link to Articles & to Web At Large Resources (PMC). Martin Klein, Herbert Van de Sompel, et al. (2014) Scholarly context not found. In: PLOS ONE https://doi.org/10.1371/journal.pone.0115253
  6. URI References in the Study. Martin Klein, Herbert Van de Sompel, et al. (2014) Scholarly context not found. In: PLOS ONE https://doi.org/10.1371/journal.pone.0115253
     URI References                 arXiv      PMC          total
     Per corpus                     781,895    1,653,567    2,435,462
     Excluded                       1,555      428,036      429,591
     To articles                    434,163    744,678      1,178,841
     To web-at-large resources      346,177    480,853      827,030
  7. URI References to Articles & to Web At Large Resources (PMC). Martin Klein, Herbert Van de Sompel, et al. (2014) Scholarly context not found. In: PLOS ONE https://doi.org/10.1371/journal.pone.0115253
  8. A Disconcerting Observation • When classifying URI references as linking to articles, we assumed that filtering on http://dx.doi.org/* would do the trick • But we found many references of the form http://link.springer.com/article/* • For example, http://link.springer.com/article/10.1007%2Fs00799-014-0108-0 instead of http://dx.doi.org/10.1007/s00799-014-0108-0 • We used CrossRef's Reverse Domain Lookup to classify these URI references as linking to articles and went on with our reference rot research
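The classification step described on this slide can be sketched roughly as follows. This is only an illustration: `PUBLISHER_HOSTS` is a hypothetical stand-in for the publisher base URLs that the study obtained from CrossRef's Reverse Domain Lookup, and the function name and labels are assumptions rather than the authors' code.

```python
from urllib.parse import urlsplit

# Hosts of the DOI resolver itself.
DOI_HOSTS = {"dx.doi.org", "doi.org"}

# Hypothetical sample of publisher hosts; the study derived the real list
# from CrossRef's Reverse Domain Lookup.
PUBLISHER_HOSTS = {"link.springer.com"}


def classify_reference(uri: str) -> str:
    """Label a URI reference as 'doi', 'shouldbeDOI', or 'web-at-large'."""
    host = (urlsplit(uri).hostname or "").lower()
    if host in DOI_HOSTS:
        return "doi"
    if host in PUBLISHER_HOSTS:
        # Points at a publisher page although a DOI URI should have been used.
        return "shouldbeDOI"
    return "web-at-large"


print(classify_reference("http://dx.doi.org/10.1007/s00799-014-0108-0"))
print(classify_reference("http://link.springer.com/article/10.1007%2Fs00799-014-0108-0"))
```

Matching on the host alone is a simplification; a fuller version would presumably also compare path prefixes.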
  9. Hiberlink Results: Link Rot - arXiv. Martin Klein, Herbert Van de Sompel, et al. (2014) Scholarly context not found. In: PLOS ONE https://doi.org/10.1371/journal.pone.0115253
  10. Hiberlink Results: Content Drift - arXiv. Shawn Jones, Herbert Van de Sompel, et al. (2016) Scholarly context adrift. Under review.
  11. Hiberlink Results: Robust Links http://robustlinks.mementoweb.org
  12. A Closer Look at the Disconcerting Observation. Herbert Van de Sompel, Martin Klein, and Shawn Jones (2016) Persistent URIs Must Be Used to Be Persistent. In: WWW2016. http://arxiv.org/1602.09102
  13. A Closer Look at the Disconcerting Observation - arXiv. Herbert Van de Sompel, Martin Klein, and Shawn Jones (2016) Persistent URIs Must Be Used to Be Persistent. In: WWW2016. http://arxiv.org/1602.09102
  14. A Closer Look at the Disconcerting Observation - PMC. Herbert Van de Sompel, Martin Klein, and Shawn Jones (2016) Persistent URIs Must Be Used to Be Persistent. In: WWW2016. http://arxiv.org/1602.09102
  15. Caveats Regarding the Disconcerting Observation • CrossRef's publisher baseURLs represent the state of the DOI resolver at the time of the research • Some shouldbeDOI references may have been classified as web-at-large because old publisher baseURLs are no longer in the resolver • At the time of the research, no public information was available about when a publisher started to assign DOIs • Some references may have wrongly been classified as shouldbeDOI because the publisher was not yet assigning DOIs in earlier years • Findings for recent years do not suffer from the above
  16. Content Types of "200 OK" shouldbeDOI Resources, Year 2012
     Content Type       arXiv     PMC
     text/html          19,649    63,769
     application/pdf    153       1,813
     text/plain         7         3,924
     image/jpeg         1         64
     other              46        74
     none provided      2,118     5,210
     total              21,974    74,854
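As a quick worked calculation on the numbers in this slide, the share of text/html responses can be computed per corpus. The counts are copied from the slide; the variable and function names are illustrative only.

```python
# Counts copied from the slide (year 2012, "200 OK" shouldbeDOI resources).
COUNTS = {
    "arXiv": {"text/html": 19_649, "total": 21_974},
    "PMC": {"text/html": 63_769, "total": 74_854},
}


def html_share(corpus: str) -> float:
    """Fraction of '200 OK' shouldbeDOI resources served as text/html."""
    row = COUNTS[corpus]
    return row["text/html"] / row["total"]


for corpus in COUNTS:
    print(f"{corpus}: {html_share(corpus):.1%} text/html")
```

This works out to roughly 89% for arXiv and 85% for PMC, so the large majority of these references resolve to HTML pages rather than to PDFs or other content.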
  17. Content Length of "200 OK" shouldbeDOI Resources, Year 2012
     Content Length     arXiv     PMC
     1-50 k             6,084     7,215
     50-100 k           772       12,804
     100-150 k          225       4,835
     150-200 k          33        7,885
     200+ k             216       9,423
     chunked            4,100     20,596
     none provided      10,544    12,096
     total              21,974    74,854
  18. Top Target baseURLs for shouldbeDOI Resources, 1997-2012
     arXiv                  PMC
     ams.org                biomedcentral.com
     adsabs.harvard.edu     scripts.iucr.org
     link.aps.org           ncbi.nlm.nih.gov
     stacks.aip.org         frontiersin.org
     link.aip.org           ccforum.com
     emis.de                nar.oxfordjournals.org
     springerlink.com       nature.com
     jstor.org              elsevier.com
     ncbi.nlm.nih.gov       jcb.org
     sciencemag.org         jmir.org
  19. Outline • A Disconcerting Observation • A Proposed Fix Using Signposting • Signposting, The Bigger Picture • Additional Signposting Patterns
  20. Status Quo • The PID URI is not in the browser's address bar when at: the landing page; the PDF; the dataset; any web resource that is part of the PID-identified object • Status quo: provide the PID URI in a copy/paste-able manner in the landing page; provide the PID URI in a downloadable citation; embed the PID URI in an XMP container • Desired: the ability for tools to uniformly discover the PID URI when at any web resource that is part of a PID-identified object
  21. HTTP Links. Mark Nottingham (2010) RFC5988: Web Linking. http://tools.ietf.org/rfc/rfc5988.txt
  22. HTTP Links
  23. HTTP Links
  24-28. HTTP Links Are Used
     curl -I http://dbpedia.org/data/Reykjavik
     HTTP/1.1 200 OK
     Date: Thu, 27 Oct 2016 04:43:28 GMT
     Content-Type: application/rdf+xml; charset=UTF-8
     Content-Length: 1210
     Link: <http://creativecommons.org/licenses/by-sa/3.0> ; rel="license",
      <http://dbpedia.org/data/Reykjavik> ; rel="alternate"; type="text/n3",
      <http://dbpedia.org/resource/Reykjavik>; rel="describes",
      <http://mementoarchive.lanl.gov/dbpedia/timegate/http://dbpedia.org/data/Reykjavik> ; rel="timegate"
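To make the structure of such a Link header concrete, here is a minimal parsing sketch. The regular expressions cover only the simple `<target>; name="value"` form seen in the DBpedia response and are an assumption of this example, not a complete implementation of the Web Linking grammar.

```python
import re

# One link-value: <target> followed by zero or more ;name=value parameters.
LINK_RE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[^;,=\s]+\s*=\s*(?:"[^"]*"|[^;,\s]*))*)'
)
PARAM_RE = re.compile(r';\s*([^;,=\s]+)\s*=\s*(?:"([^"]*)"|([^;,\s]*))')


def parse_link_header(value: str):
    """Return a list of (target URI, {parameter name: value}) pairs."""
    links = []
    for target, params in LINK_RE.findall(value):
        attrs = {}
        for name, quoted, bare in PARAM_RE.findall(params):
            # Keep the first occurrence of a parameter if it is repeated.
            attrs.setdefault(name.lower(), quoted or bare)
        links.append((target, attrs))
    return links


HEADER = (
    '<http://creativecommons.org/licenses/by-sa/3.0> ; rel="license", '
    '<http://dbpedia.org/data/Reykjavik> ; rel="alternate"; type="text/n3", '
    '<http://dbpedia.org/resource/Reykjavik>; rel="describes", '
    '<http://mementoarchive.lanl.gov/dbpedia/timegate/'
    'http://dbpedia.org/data/Reykjavik> ; rel="timegate"'
)

for target, attrs in parse_link_header(HEADER):
    print(attrs.get("rel"), "->", target)
```

Each typed link becomes a (target, relation) pair that a client can follow without parsing the response body.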
  29. HTTP Link Relation Types • Registered in IANA registry: strings, e.g. license, alternate, describes, timegate; requires a formal specification, e.g. RFC; typically used for common relationships, generically specified; provides broad, coarse grained interoperability • Minted by a community: URIs, e.g. http://xmlns.com/foaf/0.1/primaryTopic; requires community agreement; can be as specific as desired; can provide community-specific, fine grained interoperability
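The distinction between the two kinds of relation types can be illustrated with a small check. This is a sketch under assumptions: `REGISTERED_SUBSET` holds only the four registered names mentioned in this talk, whereas the authoritative list is the IANA registry, and the labels returned are made up for this example.

```python
from urllib.parse import urlsplit

# Illustrative subset only; the authoritative list is the IANA registry.
REGISTERED_SUBSET = {"license", "alternate", "describes", "timegate"}


def relation_kind(rel: str) -> str:
    """Classify a relation type as registered, community-minted, or unknown."""
    if urlsplit(rel).scheme:
        # Community relation types are expressed as URIs.
        return "community"
    if rel.lower() in REGISTERED_SUBSET:
        return "registered"
    return "unknown"


print(relation_kind("license"))
print(relation_kind("http://xmlns.com/foaf/0.1/primaryTopic"))
```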
  30. Proposal: Use HTTP Link with identifier Relation Type
     curl -I http://www.dlib.org/dlib/november15/vandesompel/11vandesompel.html
     HTTP/1.1 200 OK
     Date: Wed, 26 Oct 2016 12:36:37 GMT
     Server: Apache/2.2.15 (CentOS)
     Last-Modified: Thu, 19 Nov 2015 14:50:19 GMT
     ETag: "205a5e-f5ef-524e5e0ab80c0"
     Accept-Ranges: bytes
     Content-Length: 62959
     Content-Type: text/html; charset=UTF-8
     Link: <https://doi.org/10.1045/november2015-vandesompel> ; rel="identifier"
     Michael Nelson and Herbert Van de Sompel (2016) Linking to Persistent Identifiers with rel="identifier" http://ws-dl.blogspot.nl/2016/11/2016-11-07-linking-to-persistent.html
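A tool such as a reference manager could use this link roughly as sketched below: issue a HEAD request, read the Link header, and prefer the target of the "identifier" link over the address-bar URI. The function names are hypothetical and the header matching is deliberately simplified; the network call is shown but not exercised here.

```python
import re
import urllib.request


def identifier_links(link_header: str):
    """Return the targets of links whose rel includes 'identifier'."""
    found = []
    # Simplification: treat everything up to the next '<' as the parameters.
    for target, params in re.findall(r'<([^>]*)>([^<]*)', link_header):
        match = re.search(r'\brel\s*=\s*"?([^";,]*)"?', params, re.I)
        if match and "identifier" in match.group(1).lower().split():
            found.append(target)
    return found


def discover_pid(url: str, timeout: float = 10.0):
    """HEAD the resource and return any PID URIs it advertises."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        values = response.headers.get_all("Link") or []
    return identifier_links(", ".join(values))


# Header value taken from the response on this slide.
print(identifier_links(
    '<https://doi.org/10.1045/november2015-vandesompel> ; rel="identifier"'
))
```

Calling `discover_pid` on a landing page, PDF, or dataset URI would then return the DOI URI whenever the server provides the link, and an empty list otherwise.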
  31. Proposal: Use HTTP Link with identifier Relation Type http://signposting.org/identifier/dryad/
  32. HTTP Links Are Pretty Neat • Can uniformly be used for all MIME types • Accessible via HTTP HEAD (no content transfer): works for large resources; can work for restricted content • Unbelievable but true: many publishers don't support HEAD • In many cases, HTTP identifier links can be implemented using simple URI rewrite rules in the web server, because the URIs of web resources that are part of a PID-identified object often contain the PID • Allows addressing many other patterns using basic technology
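The rewrite-rule idea can be sketched in a few lines, here in Python rather than in web server configuration. The DOI pattern and the function name are assumptions for illustration; the sketch only applies when the DOI appears at the end of the resource's path, as in Springer-style landing page URIs.

```python
import re
from urllib.parse import unquote, urlsplit

# Loose DOI pattern (assumption): "10.", a registrant code, "/", a suffix.
DOI_RE = re.compile(r'10\.\d{4,9}/\S+')


def identifier_link_header(resource_uri: str):
    """Build a Link header value from a DOI embedded in the URI path."""
    path = unquote(urlsplit(resource_uri).path)  # turns %2F back into "/"
    match = DOI_RE.search(path)
    if match is None:
        return None  # no DOI in the path, a lookup table would be needed
    return f'<https://doi.org/{match.group(0)}>; rel="identifier"'


print(identifier_link_header(
    "http://link.springer.com/article/10.1007%2Fs00799-014-0108-0"
))
```

A URI whose path carries extra segments after the DOI would need a stricter pattern, which is one reason this stays a sketch.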
  33. Outline • A Disconcerting Observation • A Proposed Fix Using Signposting • Signposting, The Bigger Picture • Additional Signposting Patterns
  34. Signposting the Scholarly Web http://signposting.org
  35. Reminiscing About Interoperability for Scholarly Communication. Herbert Van de Sompel and Michael L. Nelson (2015) Reminiscing about 15 years of interoperability efforts. https://doi.org/10.1045/november2015-vandesompel
  36. I Have Done My Fair Share: OAI-PMH, OAI-ORE, Memento, Shared Canvas, info URI, Open Annotation, ResourceSync, Robust Links, OpenURL
  37. Research Communication on the Web • A highly distributed activity • Try turning this distributed activity from a gathering of silos into an ecology of collaborating repositories • In the web context, this seems like a rather unique challenge: most web enterprises want dominance, not collaboration • Interoperability as an enabler to connect resources from distributed repositories: repositories expose uniform behaviors; multiple parties can interact uniformly with (resources of) these repositories to create added value
  38. Tools of the Web-Centric Interoperability Trade (W3C Architecture of the World Wide Web) • Resource • URI • HTTP as the API: HEAD/GET, POST, PUT, DELETE • Representation • Media Type • Link • Content Negotiation, e.g. for preferred Media Type • Typed Link • Controlled Vocabularies for Typed Links
  39. Tools of the Web-Centric Interoperability Trade - RDF Stack (W3C Architecture of the World Wide Web) • Resource • URI • HTTP as the API: HEAD/GET, POST, PUT, DELETE • Representation • Media Type • Link • Content Negotiation, e.g. for preferred Media Type • Typed Link • Controlled Vocabularies for Typed Links. RDF, RDFS, OWL
  40. Interoperability via RDF, RDFS, OWL Stack. Used by various interoperability efforts, e.g. OAI-ORE, Open Annotation, W3C PROV, Research Objects, … • Provides extensive expressiveness for description • Typically based on publishing documents that adhere to a certain “profile” and reveal relations, properties, … • Non-trivial barrier to entry, as illustrated by slow adoption, likely related to unfamiliar technology stack
  41. Tools of the Web-Centric Interoperability Trade - HTTP Stack (W3C Architecture of the World Wide Web) • Resource • URI • HTTP as the API • Representation • Media Types • Link • Content Negotiation, e.g. for preferred Media Type • Typed Link • Controlled Vocabularies for Typed Links. HTTP Links, IANA link relation registry, community link relation types. HATEOAS - Hypermedia As The Engine Of Application State http://en.wikipedia.org/wiki/HATEOAS
  42. Interoperability via HTTP Links, IANA Link Relation Types. Used by Memento, ResourceSync, Signposting the Scholarly Web: • Provides coarse expressiveness for navigation via IANA registered relation types (expressed as reserved terms) • Finer grained expressiveness via community-defined relation types (expressed as HTTP URIs) • Typically based on publishing typed links that support a client to navigate among resources in an informed manner • Low implementation barrier because of familiar technology stack
  43. Outline • A Disconcerting Observation • A Proposed Fix Using Signposting • Signposting, The Bigger Picture • Additional Signposting Patterns
  44. Currently at signposting.org • Identifier pattern • Publication boundary pattern • Bibliographic metadata pattern
  45. Publication Boundary Pattern http://signposting.org/publication_boundary/oxford/
  46. Bibliographic Metadata Pattern http://signposting.org/bibliographic_metadata/springer/
  47. Bibliographic Metadata Pattern http://signposting.org/conventions/
  48. Use Case: Resource Capture for Digital Preservation. Herbert Van de Sompel, David Rosenthal, and Michael L. Nelson (2015) Web Infrastructure to Support e-Journal Preservation (and More). http://arxiv.org/abs/1605.06154
  49. Expected at signposting.org • Author pattern: author link from DOI URI to ORCID URI; author link from landing page to ORCID URI • License pattern: license link from web resources that are part of a scholarly object to the appropriate license URI • Resource type pattern: type relation type on the web resource itself; sem-type attribute on links to a web resource; URIs to express resource types (Which? How coarse/fine grained?)
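As a rough sketch of how the author and license patterns might look on the wire, the helper below serializes typed links into a single Link header value. The function name is hypothetical, and pairing this ORCID URI with this license URI is purely illustrative; the patterns themselves were still only expected at the time of the talk.

```python
def format_link_header(links):
    """Serialize (target URI, relation type) pairs into a Link header value."""
    return ", ".join(f'<{target}>; rel="{rel}"' for target, rel in links)


# Illustrative pairing only: both URIs appear elsewhere in this talk.
header = format_link_header([
    ("http://orcid.org/0000-0002-0715-6126", "author"),
    ("http://creativecommons.org/licenses/by-sa/3.0", "license"),
])
print("Link:", header)
```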
  50. Resource Type Pattern
