The Web as infrastructure for scholarly research and communication
1. @hvdsomp #idcc13
Wanderer above the Sea of Fog – Caspar David Friedrich (1818)
http://en.wikipedia.org/wiki/Wanderer_above_the_Sea_of_Fog
2. Herbert Van de Sompel
IDCC 2013, Amsterdam, The Netherlands, January 16 2013
3. The Scholarly Record is Changing
• The scholarly record is extending with a wide range of non-
traditional assets emerging from eScience and eHumanities
• e.g. datasets, software, ontologies, workflows, online debate,
slides, blogs, videos, etc.
• Many of these non-traditional assets:
• Have a wide range of relationships with and dependencies on
other assets – grouping assets
• Are becoming increasingly dynamic, and do not have the sense
of fixity that traditional assets such as journal articles or books
have – versioning assets
4. grouping assets
versioning assets
5. discovering assets
6. OAI, 1999
• OAI was a heroic effort to fundamentally transform scholarly communication
• By promoting communication via preprints, non-peer-reviewed papers
• The OAI took a technical approach to achieve the goal
• Make preprints easier to discover, access – Protocol for Metadata Harvesting
9. Don’t trust HTTP
HTTP GET on record identifier
An HTTP link
Just another
HTTP baseURL
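The labels on this slide appear to contrast an ordinary HTTP link with the way an OAI-PMH record is fetched: the record identifier is passed as a parameter to the repository's baseURL rather than being dereferenced itself. A minimal sketch of building such a request follows. The `verb`, `identifier` and `metadataPrefix` parameters are part of OAI-PMH; the repository address and record identifier are hypothetical.

```python
from urllib.parse import urlencode


def oai_pmh_get_record_url(base_url, identifier, metadata_prefix="oai_dc"):
    """Build an OAI-PMH GetRecord request URL for one record identifier."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Hypothetical repository baseURL and record identifier, for illustration only.
url = oai_pmh_get_record_url("http://example.org/oai", "oai:example.org:1234")
print(url)
```

The record identifier never appears as a link target of its own here; a client has to know the baseURL and the protocol to reach the record.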
11. grouping assets
versioning assets
12. OAI-ORE, 2007
• OAI-ORE observation: Scholarly assets are rapidly becoming compound, consisting of multiple resources with various:
• Relationships
• Interdependencies
• How to convey this compound-ness in an interoperable manner so that applications can access, consume such assets?
22. grouping assets
versioning assets
23. Memento, 2009
• Memento is about the Web and time:
• Resources evolve over time
• Only the current representation is available from a resource’s URI
• How to seamlessly access prior representations, if they exist?
• Memento looks at this problem for the Web, in general
Digital Preservation Award 2010
24. Memento, 2009
• Memento has potential consequences for scholarly communication
• Observation: Scholarly assets are becoming increasingly dynamic, and do not have the sense of fixity that traditional assets such as journal articles or books have
• Even traditional assets are becoming increasingly dynamic and dependent on other assets, which may themselves be dynamic
25. Scientific Workflows, Services, Data, Workflow Engines
Carole Goble, JCDL 2012 Keynote https://dl.dropbox.com/u/617206/JCDL2012keynoteGoble.ppt
26. From The Version of Record to A Version of the Record
• The ever-evolving nature of some assets challenges the notion of
fixity as “forever frozen” and begs considering the notion of the
“state of the scholarly record at a specific moment in time”
• It will become essential to be able to determine what the state of
related and interdependent assets was at certain moments in time
27. Two Perspectives on Memento
Web Archive
URI-M - http://web.archive.org/web/20010911203610/http://www.cnn.com/
URI-R - http://www.cnn.com/
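The URI-M above embeds both the capture datetime and the original URI-R. A minimal sketch of splitting the two apart follows, assuming the 14-digit `YYYYMMDDhhmmss` path segment used by web.archive.org and assuming that timestamp is in UTC. The function name `split_uri_m` is made up for this illustration.

```python
import re
from datetime import datetime, timezone

# Pattern for web.archive.org style URI-Ms: /web/<14-digit timestamp>/<URI-R>
URI_M_PATTERN = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def split_uri_m(uri_m):
    """Return (capture datetime, URI-R) for a web.archive.org style URI-M."""
    match = URI_M_PATTERN.match(uri_m)
    if match is None:
        raise ValueError(f"not a recognised URI-M: {uri_m}")
    stamp, uri_r = match.groups()
    # Assumption: the embedded timestamp is expressed in UTC.
    captured = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return captured, uri_r


captured, uri_r = split_uri_m(
    "http://web.archive.org/web/20010911203610/http://www.cnn.com/"
)
print(captured.isoformat(), uri_r)
```

This only works for archives that encode the datetime in the URI; a CMS version URI, such as a wiki revision identifier, carries no such information.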
28. Two Perspectives on Memento
CMS
URI-M - http://en.wikipedia.org/w/index.php?title=September_11_attacks&oldid=282333
URI-R - http://en.wikipedia.org/wiki/September_11_attacks
33. Memento, 2009
• How to get to the time-specific resources from the generic resource?
• Memento addresses the problem in a resource-centric way:
• Resource, URI, state, representation, link, content negotiation
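A minimal sketch of the content negotiation mentioned above, assuming the `Accept-Datetime` request header that the Memento protocol (later published as RFC 7089) defines for negotiating in the datetime dimension. The helper name `datetime_negotiation_headers` is made up, and no request is actually sent.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def datetime_negotiation_headers(when):
    """Build request headers asking for a resource's state as of `when`.

    `when` should be a timezone-aware datetime; it is converted to GMT and
    formatted as an HTTP date, e.g. "Sun, 12 Sep 2010 00:00:00 GMT".
    """
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return {"Accept-Datetime": http_date}


# Ask for the state of a resource on Sep 12 2010.
headers = datetime_negotiation_headers(datetime(2010, 9, 12, tzinfo=timezone.utc))
print(headers)
```

A client would send these headers with a GET on the original URI or its TimeGate and follow the response to the time-specific resource, so the original URI stays the entry point.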
36. Access Versions via the original URI and datetime
[Screenshot labels: Select Date; Today: Sep 16 2010; Sep 12 2010; From: BL Archive]
46. Recreating a Version of the Record
• Is it possible to reconstruct the Web-based scholarly record as it was at
a certain point in time?
• Consider a special case: Given a paper, can one see the referenced
materials as they were at the time of publication of the paper?
• ti: Time of publication
• Relationship: Cited resources
47. Published
September 15 2004
50. Domain Gone
51. Archived copy
December 5 2003
53. Current version
54. Archived copy
December 11 2004
56. Resource gone
57. Archived copy
December 5 2003
59. Resource gone
60. Archived copy
unavailable
61. Pilot Study at Scale with Memento
• Papers from arXiv: 400,000 papers => 144,000 unique URIs
• Papers from UNT ETD repository: 3,600 papers => 18,000 URIs
• Referenced URIs of established scholarly repositories removed (e.g.
http://dx.doi.org), i.e. focusing in on the periphery of the scholarly record
• Study looks into:
• Does the referenced resource still exist?
• Are there archived versions of the referenced resource?
• From around the time of publication of the citing paper?
• Study does not look into dynamic aspects:
• If the referenced resource still exists, is its content the same as at ti?
• Does an archived version have the same content as at ti?
Sanderson, R., Phillips, M., and Van de Sompel, H. (2011) Analyzing the Persistence of Referenced Web
Resources with Memento. Open Repositories 2011; arXiv preprint arXiv:1105.3459; http://arxiv.org/abs/1105.3459
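The question "is there an archived version from around the time of publication?" can be sketched as a nearest-datetime check. This is an illustration only, not the method used in the study: the 30-day window is an arbitrary assumption, the function name is made up, and the dates are borrowed from the earlier example slides.

```python
from datetime import datetime, timedelta


def closest_archived_version(ti, archived_times, window=timedelta(days=30)):
    """Return (closest archived datetime, whether it lies within `window` of ti).

    Returns (None, False) when no archived versions are known.
    """
    if not archived_times:
        return None, False
    closest = min(archived_times, key=lambda t: abs(t - ti))
    return closest, abs(closest - ti) <= window


ti = datetime(2004, 9, 15)                # time of publication of the citing paper
archived = [datetime(2003, 12, 5), datetime(2004, 12, 11)]
closest, near_ti = closest_archived_version(ti, archived)
print(closest.date(), near_ti)
```

With these dates the closest archived copy is 87 days after publication, so it would not count as "around ti" under the assumed 30-day window.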
62. UNT
63. arXiv
64. The Good News ™
• Despite there not being a pro-active effort to archive those
resources, a considerable number were archived
o Because they had HTTP URIs and hence were archived as
part of ongoing web archiving processes
o In The Wild archiving comes for free with the web
infrastructure
• 404 resources exist in web archives and Memento can access
them via their original HTTP URI
o Does that make an HTTP URI a PID?
65. The Bad News ™
• Many resources were not archived
• For many resources there were no archival versions around ti
69. Automatic Creation of Archival Snapshots
• There is a need for a more pro-active approach to archive
dynamic, interdependent assets, e.g.:
o Web Archives as infrastructure
o Use CMS, wikis, datawikis with solid versioning mechanisms
o Archiving linked context at the time of publication
o Archive at the moment of use (social interaction,
downloading, annotating, etc.)
o Delineate which resources are considered in/out of a
scholarly asset (OAI-ORE) to understand what needs
archiving
70. discovering assets
71. ResourceSync, 2012
• ResourceSync is about allowing 3rd party systems and applications to remain synchronized with a server’s evolving resources.
• Many use cases:
• Mirroring repository content
• Aggregating content
• Replicating datasets
• Exposing content to archives
• Keeping linked data applications that leverage remote data up-to-date
• Differing needs regarding:
• Coverage
• Accuracy
• Latency
72. ResourceSync Approach
• Resource centric; it’s all about the URI (again)
• Introduces a set of modular capabilities that a server can
implement to allow 3rd parties to remain in sync with its
resources. Recurrently publish:
o Resource Lists
o Change Lists
o Resource Dumps
o Change Dumps
• All capabilities based on the Sitemap document formats and
extensions thereof
o Existing Sitemaps are off-the-shelf compliant
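Because the capabilities build on the Sitemap format, a third party could in principle stay in sync by reading two successive resource lists and comparing them. A minimal sketch under that assumption follows, using plain Sitemap `<url>`, `<loc>` and `<lastmod>` elements only. The function names and example URIs are hypothetical, and the ResourceSync-specific extensions are not modelled.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def read_resource_list(xml_text):
    """Map each <loc> URI in a Sitemap document to its <lastmod> value."""
    root = ET.fromstring(xml_text)
    resources = {}
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS).strip()
        if loc:
            resources[loc] = lastmod
    return resources


def diff_resource_lists(old, new):
    """Classify URIs as created, updated or deleted between two snapshots."""
    return {
        "created": sorted(set(new) - set(old)),
        "updated": sorted(u for u in set(old) & set(new) if old[u] != new[u]),
        "deleted": sorted(set(old) - set(new)),
    }


def sitemap(entries):
    """Build a tiny Sitemap document from (uri, lastmod) pairs, for the demo."""
    urls = "".join(
        f"<url><loc>{uri}</loc><lastmod>{lastmod}</lastmod></url>"
        for uri, lastmod in entries
    )
    return f'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">{urls}</urlset>'


# Two hypothetical snapshots of a server's resource list.
earlier = read_resource_list(sitemap([
    ("http://example.org/a", "2013-01-01"),
    ("http://example.org/b", "2013-01-01"),
]))
later = read_resource_list(sitemap([
    ("http://example.org/b", "2013-01-10"),
    ("http://example.org/c", "2013-01-12"),
]))
changes = diff_resource_lists(earlier, later)
print(changes)
```

Comparing full lists is the simplest approach but scales poorly; a server-published Change List would spare the client this comparison, which is presumably why it is offered as a separate capability.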
73. ResourceSync Capabilities
74. ResourceSync, 2012
• Beta spec end 01/2013
• http://www.openarchives.org/rs/
• Feedback
• mailto:resourcesync@googlegroups.com
• Papers in D-Lib Magazine
• http://dx.doi.org/10.1045/september2012-vandesompel
• http://dx.doi.org/10.1045/january2013-klein
• Paper in Ariadne
• http://www.ariadne.ac.uk/issue70/lewis-et-al
75. 1998 - 2013
76. 1998 - 2013
a stack of journals or a bunch of PDF files
a network of interconnected assets and actors
77. Conclusion
• OAI-ORE, Memento, ResourceSync illustrate the potential of
leveraging the Web infrastructure for scholarly communication
• This suggests that other special requirements of scholarly
communication (certification, archiving, persistence, trust, annotation,
metrics, …) may be addressable in an interoperable manner by
leveraging the Web infrastructure
• Wins:
• Long Term Sustainability: Reuse of infrastructure (network,
software, platforms, standards, etc.) that the entire world depends
on
• Integration of scholarly discourse with other Web-based discourse
78. @hvdsomp #idcc13
Wanderer above the Sea of Fog – Caspar David Friedrich (1818)
http://en.wikipedia.org/wiki/Wanderer_above_the_Sea_of_Fog