The Web as infrastructure for scholarly research and communication

Keynote presented at IDCC13, Amsterdam, The Netherlands, January 16 2013.

  1. @hvdsomp #idcc13. Image: Wanderer above the Sea of Fog, Caspar David Friedrich (1818), http://en.wikipedia.org/wiki/Wanderer_above_the_Sea_of_Fog
  2. Herbert Van de Sompel, IDCC 2013, Amsterdam, The Netherlands, January 16 2013
  3. The Scholarly Record is Changing
     • The scholarly record is expanding to include a wide range of non-traditional assets emerging from eScience and eHumanities
       o e.g. datasets, software, ontologies, workflows, online debate, slides, blogs, videos, etc.
     • Many of these non-traditional assets:
       o Have a wide range of relationships with and dependencies on other assets (grouping assets)
       o Are becoming increasingly dynamic, and do not have the sense of fixity that traditional assets such as journal articles or books have (versioning assets)
  4. grouping assets, versioning assets
  5. discovering assets
  6. Open Archives Initiative (OAI), 1999
     • OAI was a heroic effort to fundamentally transform scholarly communication
       o By promoting communication via preprints, i.e. non-peer-reviewed papers
     • The OAI took a technical approach to achieve the goal
       o Make preprints easier to discover and access: Protocol for Metadata Harvesting
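
As a minimal sketch of how a harvester addresses a repository under the Protocol for Metadata Harvesting: every request is an HTTP request against the repository's baseURL, carrying a `verb` and its arguments as query parameters. The base URL and record identifier here are hypothetical placeholders, and the helper name `oai_pmh_request` is an assumption made for illustration.

```python
from urllib.parse import urlencode


def oai_pmh_request(base_url, verb, **arguments):
    """Build an OAI-PMH request URL: baseURL plus verb and arguments as a query string."""
    params = {"verb": verb}
    params.update(arguments)
    return base_url + "?" + urlencode(params)


# Hypothetical repository base URL and identifier, for illustration only.
BASE_URL = "http://repository.example.org/oai"

# Harvest all records in unqualified Dublin Core.
list_records = oai_pmh_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc")

# Fetch a single record by its OAI identifier.
get_record = oai_pmh_request(
    BASE_URL,
    "GetRecord",
    identifier="oai:repository.example.org:1234",
    metadataPrefix="oai_dc",
)

print(list_records)
print(get_record)
```

Note that the record identifier travels as a query argument to the baseURL instead of being dereferenced directly with HTTP GET.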
  9. Diagram labels: Don’t trust HTTP; HTTP GET on record identifier; An HTTP link; Just another HTTP baseURL
  11. grouping assets, versioning assets
  12. OAI-ORE (2007)
     • Observation: Scholarly assets are rapidly becoming compound, consisting of multiple resources with various:
       o Relationships
       o Interdependencies
     • How to convey this compound-ness in an interoperable manner so that applications can access and consume such assets?
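
One way to picture a compound asset is as a set of statements: a Resource Map describes an Aggregation, and the Aggregation aggregates its constituent resources. The sketch below uses the `ore:describes` and `ore:aggregates` terms from the OAI-ORE vocabulary. All example.org URIs and the function name `resource_map_triples` are hypothetical, and this is an illustration of the idea, not a full OAI-ORE serialization.

```python
ORE = "http://www.openarchives.org/ore/terms/"


def resource_map_triples(resource_map_uri, aggregation_uri, aggregated_uris):
    """Return (subject, predicate, object) triples for a simple aggregation."""
    triples = [(resource_map_uri, ORE + "describes", aggregation_uri)]
    for uri in aggregated_uris:
        triples.append((aggregation_uri, ORE + "aggregates", uri))
    return triples


# Hypothetical compound asset: a paper with its dataset and slides.
triples = resource_map_triples(
    "http://example.org/rem/paper-1",
    "http://example.org/aggregation/paper-1",
    [
        "http://example.org/paper-1.pdf",
        "http://example.org/paper-1-dataset.csv",
        "http://example.org/paper-1-slides.ppt",
    ],
)

# Print in an N-Triples-like form, one statement per line.
for s, p, o in triples:
    print("<%s> <%s> <%s> ." % (s, p, o))
```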
  20. See e.g. http://www.ctwatch.org/quarterly/articles/2007/08/interoperability-for-the-discovery-use-and-re-use-of-units-of-scholarly-communication/8/index.html
  22. grouping assets, versioning assets
  23. Memento (2009; Digital Preservation Award 2010)
     • Memento is about the Web and time:
       o Resources evolve over time
       o Only the current representation is available from a resource’s URI
       o How to seamlessly access prior representations, if they exist?
     • Memento looks at this problem for the Web in general
  24. Memento (2009)
     • Memento has potential consequences for scholarly communication
     • Observation: Scholarly assets are becoming increasingly dynamic, and do not have the sense of fixity that traditional assets such as journal articles or books have
       o Even traditional assets are becoming increasingly dynamic and dependent on other assets, which may themselves be dynamic
  25. Scientific Workflows, Services, Data, Workflow Engines. Carole Goble, JCDL 2012 Keynote, https://dl.dropbox.com/u/617206/JCDL2012keynoteGoble.ppt
  26. From The Version of Record to A Version of the Record
     • The ever-evolving nature of some assets challenges the notion of fixity as “forever frozen” and calls for considering the notion of the “state of the scholarly record at a specific moment in time”
     • It will become essential to be able to determine what the state of related and interdependent assets was at certain moments in time
  27. Two Perspectives on Memento: Web Archive
     • URI-M: http://web.archive.org/web/20010911203610/http://www.cnn.com/
     • URI-R: http://www.cnn.com/
  28. Two Perspectives on Memento: CMS
     • URI-M: http://en.wikipedia.org/w/index.php?title=September_11_attacks&oldid=282333
     • URI-R: http://en.wikipedia.org/wiki/September_11_attacks
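
In the web archive example, the URI-M happens to embed both the capture datetime and the URI-R. The sketch below splits such a URI-M into its two parts. It assumes the 14-digit `YYYYMMDDhhmmss` timestamp is in UTC and applies only to this Wayback-style URI pattern. The CMS example carries an opaque `oldid` and cannot be decoded this way. The function name is hypothetical.

```python
import re
from datetime import datetime, timezone

# Wayback-style URI-M: .../web/<14-digit timestamp>/<original URI>
WAYBACK_PATTERN = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def split_wayback_uri_m(uri_m):
    """Return (uri_r, archived_at) for a Wayback-style URI-M."""
    match = WAYBACK_PATTERN.match(uri_m)
    if match is None:
        raise ValueError("not a Wayback-style URI-M: %r" % uri_m)
    timestamp, uri_r = match.groups()
    # Assumption: the embedded timestamp is expressed in UTC.
    archived_at = datetime.strptime(timestamp, "%Y%m%d%H%M%S").replace(
        tzinfo=timezone.utc
    )
    return uri_r, archived_at


uri_r, archived_at = split_wayback_uri_m(
    "http://web.archive.org/web/20010911203610/http://www.cnn.com/"
)
print(uri_r, archived_at.isoformat())
```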
  30. Memento (2009)
     • How to get to the time-specific resources from the generic resource?
     • Memento addresses the problem in a resource-centric way:
       o Resource, URI, state, representation, link, content negotiation
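
Content negotiation in the datetime dimension can be sketched as follows: a client expresses the desired moment in an `Accept-Datetime` request header, formatted as an HTTP date in GMT, and sends it to a server that can negotiate over time for the original resource (a TimeGate in Memento terminology). The helper name is an assumption for illustration, and no network request is made here.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_header(when):
    """Build the Accept-Datetime request header for a timezone-aware datetime."""
    if when.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # HTTP dates are always expressed in GMT.
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return {"Accept-Datetime": http_date}


headers = accept_datetime_header(datetime(2010, 9, 16, tzinfo=timezone.utc))
print(headers)
```

The same headers dictionary could be passed to any HTTP client when requesting the original URI through a TimeGate.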
  31. Access Versions via the original URI and datetime. Screenshot labels: Select Date; Today; Sep 16 2010; Sep 12 2010; From BL Archive
  32. From The Version of Record to A Version of the Record
     • The ever-evolving nature of some assets challenges the notion of fixity as “forever frozen” and calls for considering the notion of the “state of the scholarly record at a specific moment in time”
     • It will become essential to be able to determine what the state of related and interdependent assets was at certain moments in time
  33. Recreating a Version of the Record
     • Is it possible to reconstruct the Web-based scholarly record as it was at a certain point in time?
     • Consider a special case: Given a paper, can one see the referenced materials as they were at the time of publication of the paper?
       o ti: Time of publication
       o Relationship: Cited resources
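
For the special case above, one plausible selection rule is to pick, among the archived copies of a cited resource, the one whose capture datetime is closest to ti. The sketch below assumes the list of (datetime, URI-M) pairs has already been obtained. The archive URIs are hypothetical, and the dates are chosen purely for illustration.

```python
from datetime import datetime, timezone


def closest_memento(mementos, t_i):
    """Return the (datetime, uri_m) pair closest in time to t_i, or None if empty."""
    if not mementos:
        return None
    return min(mementos, key=lambda memento: abs(memento[0] - t_i))


# ti: time of publication of the citing paper (illustrative value).
t_i = datetime(2004, 9, 15, tzinfo=timezone.utc)

# Hypothetical archived copies of one cited resource.
mementos = [
    (datetime(2003, 12, 5, tzinfo=timezone.utc), "http://archive.example.org/20031205/page"),
    (datetime(2004, 12, 11, tzinfo=timezone.utc), "http://archive.example.org/20041211/page"),
]

best = closest_memento(mementos, t_i)
print(best)
```

Closeness in time says nothing about whether the content matches what the author saw at ti. It only narrows the candidates.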
  34. Published September 15 2004
  37. Domain Gone
  38. Archived copy December 5 2003
  40. Current version
  41. Archived copy December 11 2004
  43. Resource gone
  44. Archived copy December 5 2003
  46. Resource gone
  47. Archived copy unavailable
  48. Pilot Study at Scale with Memento
     • Papers from arXiv: 400,000 papers => 144,000 unique URIs
     • Papers from UNT ETD repository: 3,600 papers => 18,000 URIs
     • Referenced URIs of established scholarly repositories removed (e.g. http://dx.doi.org), i.e. focusing on the periphery of the scholarly record
     • Study looks into:
       o Does the referenced resource still exist?
       o Are there archived versions of the referenced resource?
       o From around the time of publication of the citing paper?
     • Study does not look into dynamic aspects:
       o If the referenced resource still exists, is its content the same as at ti?
       o Does an archived version have the same content as at ti?
     • Sanderson, R., Phillips, M., and Van de Sompel, H. (2011) Analyzing the Persistence of Referenced Web Resources with Memento. Open Repositories 2011; arXiv preprint arXiv:1105.3459; http://arxiv.org/abs/1105.3459
  49. UNT
  50. arXiv
  51. The Good News ™
     • Despite there not being a pro-active effort to archive those resources, a considerable number were archived
       o Because they had HTTP URIs and hence were archived as part of ongoing web archiving processes
       o In The Wild archiving comes for free with the web infrastructure
     • Resources that now return 404 exist in web archives and Memento can access them via their original HTTP URI
       o Does that make an HTTP URI a PID?
  52. The Bad News ™
     • Many resources were not archived
     • For many resources there were no archival versions around ti
  53. Automatic Creation of Archival Snapshots
     • There is a need for a more pro-active approach to archive dynamic, interdependent assets, e.g.:
       o Web Archives as infrastructure
       o Use CMS, wikis, datawikis with solid versioning mechanisms
       o Archiving linked context at the time of publication
       o Archive at the moment of use (social interaction, downloading, annotating, etc.)
       o Delineate which resources are considered in/out of a scholarly asset (OAI-ORE) to understand what needs archiving
  54. discovering assets
  55. ResourceSync (2012)
     • ResourceSync is about allowing 3rd party systems and applications to remain synchronized with a server’s evolving resources.
     • Many use cases:
       o Mirroring repository content
       o Aggregating content
       o Replicating datasets
       o Exposing content to archives
       o Keeping linked data applications that leverage remote data up-to-date
     • Differing needs regarding:
       o Coverage
       o Accuracy
       o Latency
  56. ResourceSync Approach
     • Resource centric; it’s all about the URI (again)
     • Introduces a set of modular capabilities that a server can implement to allow 3rd parties to remain in sync with its resources. Recurrently publish:
       o Resource Lists
       o Change Lists
       o Resource Dumps
       o Change Dumps
     • All capabilities based on the Sitemap document formats and extensions thereof
       o Existing Sitemaps are off-the-shelf compliant
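
Since the capabilities build on the Sitemap format, a rough sketch of the idea is possible with a plain Sitemap alone: a Resource List enumerates resource URIs with their last modification times, and a Change List can be derived by comparing two such snapshots. The code below emits only standard Sitemap elements and deliberately omits the ResourceSync-specific extension elements. The function names, the example.org URIs, and the `created`/`updated`/`deleted` grouping are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def resource_list_xml(resources):
    """Serialize {uri: lastmod} as a plain Sitemap <urlset> document."""
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element("{%s}urlset" % SITEMAP_NS)
    for uri, lastmod in sorted(resources.items()):
        url = ET.SubElement(urlset, "{%s}url" % SITEMAP_NS)
        ET.SubElement(url, "{%s}loc" % SITEMAP_NS).text = uri
        ET.SubElement(url, "{%s}lastmod" % SITEMAP_NS).text = lastmod
    return ET.tostring(urlset, encoding="unicode")


def change_list(old, new):
    """Compare two {uri: lastmod} snapshots and group the differences."""
    return {
        "created": sorted(set(new) - set(old)),
        "updated": sorted(u for u in set(old) & set(new) if old[u] != new[u]),
        "deleted": sorted(set(old) - set(new)),
    }


# Hypothetical snapshots of a server's resources at two moments.
monday = {
    "http://example.org/res1": "2013-01-01T10:00:00Z",
    "http://example.org/res2": "2013-01-01T11:00:00Z",
}
tuesday = {
    "http://example.org/res1": "2013-01-02T09:00:00Z",
    "http://example.org/res3": "2013-01-02T09:30:00Z",
}

xml_text = resource_list_xml(tuesday)
changes = change_list(monday, tuesday)
print(xml_text)
print(changes)
```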
  57. ResourceSync Capabilities
  58. ResourceSync (2012)
     • Beta spec end 01/2013
       o http://www.openarchives.org/rs/
     • Feedback
       o mailto:resourcesync@googlegroups.com
     • Papers in D-Lib Magazine
       o http://dx.doi.org/10.1045/september2012-vandesompel
       o http://dx.doi.org/10.1045/january2013-klein
     • Paper in Ariadne
       o http://www.ariadne.ac.uk/issue70/lewis-et-al
  59. 1998 - 2013
  60. 1998 - 2013: a stack of journals or a bunch of PDF files; a network of interconnected assets and actors
  61. Conclusion
     • OAI-ORE, Memento, ResourceSync illustrate the potential of leveraging the Web infrastructure for scholarly communication
     • This suggests that other special requirements of scholarly communication (certification, archiving, persistence, trust, annotation, metrics, …) may be addressable in an interoperable manner by leveraging the Web infrastructure
     • Wins:
       o Long Term Sustainability: Reuse of infrastructure (network, software, platforms, standards, etc.) that the entire world depends on
       o Integration of scholarly discourse with other Web-based discourse
  62. @hvdsomp #idcc13. Herbert Van de Sompel. Image: Wanderer above the Sea of Fog, Caspar David Friedrich (1818), http://en.wikipedia.org/wiki/Wanderer_above_the_Sea_of_Fog
