ResourceSync tutorial OAI8

This ResourceSync tutorial was presented at OAI8, June 19 2013

Published in: Technology, Education
  • LANL Memento Aggregator of IIPC; Europeana does metadata via OAI-PMH but anticipates content also; arXiv – mirroring and data sharing; Linked data @ BBC; DBpedia, journal data at LANL. REST not about in 1999
  • XML <-> OAI-PMH; large data begs a different question
  • protected mostly about existing HTTP auth methods, stats -> just inventory
  • Switching to a standardized resource-centric framework could
  • Semantic web version of wikipedia; want mirror to provide reliable basis for local services
  • Rsync etc. just reference; push vs pull -> both; many other parts
  • They have in common: versions exist at different URIs. Because only the representation of a single state of a resource is available from a URI.
  • Pattern exists in e.g.: Wikipedia, W3C specs, Dryad. Not sure whether DOI in general follows this paradigm.
  • Now the question is “How do we access those versions?” - Can interlink them. There are RFCs that describe how to do that. - But that URI-R is special. It is what is typically bookmarked or put in email. Want to leverage the fact that this URI-R is always there. Use it as the entry point.
  • Memento addresses the problem in a resource-centric way: resource, URI, state, representation, link, content negotiation
  • Test site, has subsets of arXiv and even complete source plus metadata (at present not up to date with 0.9)
  • No way around the difficulty of transferring 1TB initially but then a daily or weekly sync is efficient, and it still works even after some arbitrary time.
  • Email and phone discussions over the past few months. Knock-down drag-out two day meeting after JCDL in DC in June.

    1. 1. ResourceSync Tutorial, June 19th 2013, OAI8, Geneva, Switzerland. ResourceSync: A Web-Based Resource Synchronization Framework. OAI8 version, June 19 2013. ResourceSync is funded by The Sloan Foundation & JISC. #resourcesync
    2. 2. These slides were presented at OAI8, Geneva, Switzerland, June 19 2013. The most recent version of the slides is available at http://www.slideshare.net/OpenArchivesInitiative/resourcesync-tutorial
    3. 3. ResourceSync Tutorial Presenters: Herbert Van de Sompel, Los Alamos National Laboratory <hvdsomp@gmail.com> @hvdsomp; Robert Sanderson, Los Alamos National Laboratory <azaroth42@gmail.com> @azaroth42; Richard Jones, Cottage Labs <richard@cottagelabs.com> @cottagelabs
    4. 4. ResourceSync Tutorial Contributors: Martin Klein, Los Alamos National Laboratory <martinklein0815@gmail.com> @mart1nkle1n; Simeon Warner, Cornell University <simeon.warner@cornell.edu>; Herbert Van de Sompel, Los Alamos National Laboratory <hvdsomp@gmail.com> @hvdsomp; Robert Sanderson, Los Alamos National Laboratory <azaroth42@gmail.com> @azaroth42; Richard Jones, Cottage Labs <richard@cottagelabs.com> @cottagelabs
    5. 5. ResourceSync Core Team. OAI: Herbert Van de Sompel, Martin Klein, Robert Sanderson (Los Alamos National Laboratory); Simeon Warner (Cornell University); Bernhard Haslhofer (University of Vienna); Michael L. Nelson (Old Dominion University); Carl Lagoze (University of Michigan). NISO: Todd Carpenter, Nettie Lagace. Lyrasis: Peter Murray
    6. 6. ResourceSync Technical Group. JISC: Richard Jones, Graham Klyne, Stuart Lewis. OCLC: Jeff Young. LOCKSS: David Rosenthal. RedHat: Christian Sadilek. Ex Libris Inc.: Shlomo Sanders. Library of Congress: Kevin Ford
    7. 7. ResourceSync - Agenda 1. ResourceSync: Problem Perspective & Conceptual Approach 2. Motivation & Use Cases 3. Framework Walkthrough 4. Framework (Technical) Details 5. Implementation 6. Q&A
    8. 8. ResourceSync - Agenda 1. ResourceSync: Problem Perspective & Conceptual Approach
    9. 9. Synchronize What? • Web resources o things with a URI that can be dereferenced • Focus on needs of research communication and cultural heritage organizations o but aim for generality
    10. 10. Synchronize What? • Small websites/repositories (a few resources) to large repositories/datasets/linked data collections (many millions of resources)
    11. 11. Synchronize What? • Low change frequency (weeks/months) to high change frequency (seconds)
    12. 12. Synchronize What? • Synchronization latency and accuracy needs may vary
    13. 13. Why? … because lots of projects and services are doing synchronization but have to resort to ad-hoc, case-by-case approaches! • Project team involved with projects that need this • Experience with OAI-PMH: widely used in repos but o XML metadata only o Attempts at synchronizing actual content via OAI-PMH (complex object formats, dc:identifier) not successful. o Web technology has moved on since 1999 • Devise a shared solution for data, metadata, linked data?
    14. 14. ResourceSync Problem • Consideration: • Source (server) A has resources that change over time: they get created, modified, deleted • Destination (servers) X, Y, and Z leverage (some) resources of Source A. • Problem: • Destinations want to keep in step with the resource changes at Source A: resource synchronization. • Goal: • Design an approach for resource synchronization aligned with the Web Architecture that has a fair chance of adoption by different communities. • The approach must scale better than recurrent HTTP HEAD/GET on resources.
    15. 15. Source: 4 Core Synchronization Capabilities 1. Describing content – publish a list of resources subject to synchronization to enable Destinations to perform an initial load or catch up with a Source 2. Packaging content – bundle resources to enable bulk download for Destinations 3. Describing changes – publish a list of resource changes to enable Destinations to stay synchronized and decrease latency 4. Packaging changes – bundle resource changes for bulk download for Destinations
    16. 16. Source: Synchronization Features 5. Linking to related resources – provide links from resources subject to synchronization to related resources; applicable to all core capabilities (1..4) 6. Access to historical data – provide archives of 1..4 7. Discovery of capabilities – support Destinations in discovering all offered capabilities 1..4
    17. 17. Destination: Synchronization Needs 1. Baseline synchronization – A destination must be able to perform an initial load or catch up with a source - avoid out-of-band setup 2. Incremental synchronization – A destination must have some way to keep up to date with changes at a source - subject to some latency; minimal: create/update/delete - allow catching up after the destination has been offline 3. Audit – A destination should be able to determine whether it is synchronized with a source - subject to some latency
    18. 18. ResourceSync - Agenda 2. Motivation & Use Cases
    19. 19. Use Cases – The Basics a) b)
    20. 20. Use Cases – The Basics c) d)
    21. 21. Use Cases – The not-so-Basics e) f)
    22. 22. Use Cases – The not-so-Basics g) h)
    23. 23. 1. Use Case: arXiv Mirroring and Data Sharing • Repository of scholarly articles in physics, mathematics, computer science, etc. • > 850k articles • approx. 1.5 revisions per article on average • approx. 75k new articles per year • Each article has full-text and separate metadata record • approx. 3.8M resources
    24. 24. 1. Use Case: arXiv Mirroring and Data Sharing • 2,700 updates daily o at 8pm EST o Currently using homebrew mirroring solution (running with minor modifications since 1994!) o occasional rsync (file system-specific, auth issues)
    25. 25. Mirroring arXiv: 1994 - 2013 • Operated since the very early days of the Web! 1. HTTP trigger from the main site 2. HTTP pull update specific to mirror site 3. HTTP download of the resources 4. HTTP trigger to main site when mirror process complete 5. HTTP verification (via HEAD) by the main site which updates the update list specific to mirror site 6. periodic repeat as long as there are updates in the inventory for that mirror • Requires trusted set of servers operating with the same internal organization • Does not support synchronization check (so rsync is used periodically)
    26. 26. 1. Use Case: arXiv – Mirroring • GOAL: Keep mirror sites synchronized with daily changes • WANT: o high consistency o moderate latency o robustness to global network outages (low admin effort) o ability to verify sync status in case of questions
    27. 27. 1. Use Case: arXiv – Data Sharing • GOAL: Make resources and update information publicly available so that any other service may synchronize at the frequency it needs, e.g. o Math Front at UC Davis o EprintWeb from IOP in UK o Data for bibliometric and scientometric analysis • WANT: o low admin effort (i.e. standard approach, standard tools) o reasonable consistency, latency, efficiency
    28. 28. 2. Use Case: DBpedia Live Duplication • Average of 2 updates per second • Low latency desirable => need for a push technology
    29. 29. 2. Use Case: DBpedia Live Duplication • Initial experiment with distributed infrastructure
    30. 30. 2. Use Case: DBpedia Live Duplication • Daily traffic: o 99% updates o 0.6% deletions o 0.03% creations
    31. 31. 2. Use Case: DBpedia Live Duplication • # of content transfer events in two 8 hour intervals • Max. queue size of remote duplication process
    32. 32. ResourceSync - Agenda 3. Framework Walkthrough
    33. 33. Source Capability 1: Describing Content In order to advertise the resources that a source wants destinations to know about, it may describe them: o Publish a Resource List, a list of resource URIs and possibly associated metadata - Destination GETs the Content Description - Destination GETs listed resources by their URI o Describes state of set of resources at one point in time (snapshot)
    36. 36. Source Capability 2: Packaging Content By default, content is transferred in response to a GET issued by a destination against a URI of a source’s resource. But a source may support additional mechanisms: o Publish a Resource Dump, a document that points to packages of resource representations and necessary metadata - Destination GETs the package - Destination unpacks the package - ZIP format supported o Packages set of resources at one point in time (snapshot)
    39. 39. Source: Modular Capabilities
    40. 40. Source Capability 3: Describing Changes In order to achieve lower latency, a source may communicate about changes to its resources: o Publish a Change List, a list of recent change events (created, updated, deleted resource) - Destination acts upon change events, e.g. GETs created/updated resources, removes deleted resources. o Describes changes to resources that occurred in a temporal interval with a start- and an end-date
    45. 45. Source Capability 4: Packaging Changes In order to reduce the number of requests to obtain resource changes, a source may provide packaged bitstreams for changed resources: o Publish a Change Dump, a document that points to packages of recently changed resource representations and necessary metadata - Destination GETs the package - Destination unpacks the package - ZIP format supported o Packages resources that changed in a temporal interval with a start- and an end-date
    48. 48. Source: Modular Capabilities
    49. 49. Framework Structure (light)
    50. 50. Framework Structure (complete)
    51. 51. Destination: Key Processes
    52. 52. ResourceSync - Agenda 4. Framework (Technical) Details
    53. 53. ResourceSync - Agenda 4. Framework (Technical) Details 1. Sitemaps 2. Pull method 3. Linking between resources 4. Discovery 5. Push method 6. Archives
    54. 54. ResourceSync - Agenda 4. Framework (Technical) Details 1. Sitemaps
    55. 55. So Many Choices XMPP AtomPub SDShare RSS Atom PubSubHubbub Sitemap XMPP rsync OAI-PMH WebDAV Col. Syn. OAI-ORE DSNotify RDFsync Crawl Push Pull SWORD SPARQLpush
    58. 58. A Framework Based on Sitemaps • Modular framework allowing selective deployment • Sitemap is the core format throughout the framework o Introduce extension elements and attributes: - In ResourceSync namespace (rs:) to accommodate synchronization needs o Reuse Sitemap format for all capability documents: Resource List, Resource Dump, Change List, Change Dump, as well as for manifest in Dumps o Utilize Sitemap index format where needed/allowed
    59. 59. Sitemap Format <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> <url> <loc>http://example.com/res1</loc> <lastmod>2013-01-02T13:00:00Z</lastmod> </url> <url> <loc>http://example.com/res2</loc> <lastmod>2013-01-02T14:00:00Z</lastmod> </url> … </urlset>
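A Destination has to read this format before it can do anything else. The following is a minimal Python sketch of that step, using only the standard library; the function name `read_sitemap` and the inline sample document are illustrative assumptions, not part of the slides.

```python
import xml.etree.ElementTree as ET

# Namespace of the Sitemap format, as shown on the slide.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Inline copy of the slide's example so the sketch runs without a network.
SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2013-01-02T14:00:00Z</lastmod>
  </url>
</urlset>"""


def read_sitemap(xml_text):
    """Return a list of (loc, lastmod) pairs from a Sitemap document.

    Hypothetical helper name; lastmod is None when the element is absent.
    """
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        entries.append((loc, lastmod))
    return entries


entries = read_sitemap(SITEMAP_XML)
```

Each pair gives a URI to GET and the Source's claimed modification time, which is all a plain Sitemap offers; the `rs:` extensions on the following slides add the synchronization-specific metadata.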
    60. 60. Sitemap Index Format <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> <sitemap> <loc>http://example.com/sitemap1.xml</loc> <lastmod>2013-01-02T13:00:00Z</lastmod> </sitemap> <sitemap> <loc>http://example.com/sitemap2.xml</loc> <lastmod>2013-01-02T14:00:00Z</lastmod> </sitemap> … </sitemapindex>
    61. 61. ResourceSync Sitemap Extensions <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:rs="http://www.openarchives.org/rs/terms/"> <rs:ln …/> <rs:md …/> <url> <loc>http://example.com/res1</loc> <lastmod>2013-01-02T13:00:00Z</lastmod> <rs:ln …/> <rs:md …/> </url> <url> … </url> </urlset>
    62. 62. ResourceSync Sitemap Extensions <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:rs="http://www.openarchives.org/rs/terms/"> <rs:ln …/> <rs:md …/> <sitemap> <loc>http://example.com/sitemap1.xml</loc> <lastmod>2013-01-02T13:00:00Z</lastmod> <rs:ln …/> <rs:md …/> </sitemap> … </sitemapindex>
    63. 63. ResourceSync - Agenda 4. Framework (Technical) Details 2. Pull method
    64. 64. Capability 1: Resource List
    65. 65. Capability 1: Resource List <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:rs="http://www.openarchives.org/rs/terms/"> <rs:md capability="resourcelist" from="2013-01-03T09:00:00Z"/> <url> <loc>http://example.com/res1</loc> <lastmod>2013-01-02T13:00:00Z</lastmod> <rs:md hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6" length="8876" type="text/html"/> </url> <url> … </url> </urlset>
    66. 66. Resource List • Describe Source’s resources that are subject to synchronization • At one point in time (snapshot) • Typical Destination use: Baseline Synchronization, Audit • Each URI typically listed only once • Might be expensive to generate • Destinations use @from to determine freshness • Issue GETs against URIs to obtain resources • Very similar to current Sitemaps
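The Audit use of a Resource List can be sketched as follows. This is an illustrative Python example rather than code from the ResourceSync project: the helper names `parse_resource_list` and `audit`, the in-memory `local_copies` dictionary, and the demo bodies are all assumptions. It reads `@from` and the per-resource `rs:md` attributes, then flags URIs whose local copy is missing or whose MD5 digest differs from the advertised `hash`.

```python
import hashlib
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}


def parse_resource_list(xml_text):
    """Return (snapshot_time, {uri: rs:md attributes}) for a Resource List."""
    root = ET.fromstring(xml_text)
    top_md = root.find("rs:md", NS)  # document-level rs:md carries @from
    snapshot_time = top_md.get("from") if top_md is not None else None
    resources = {}
    for url in root.findall("sm:url", NS):
        uri = url.findtext("sm:loc", namespaces=NS)
        md = url.find("rs:md", NS)
        resources[uri] = dict(md.attrib) if md is not None else {}
    return snapshot_time, resources


def audit(resources, local_copies):
    """Return URIs that are missing locally or whose md5 hash does not match."""
    out_of_sync = []
    for uri, attrs in resources.items():
        body = local_copies.get(uri)
        expected = attrs.get("hash", "")
        if body is None:
            out_of_sync.append(uri)
        elif expected.startswith("md5:") and hashlib.md5(body).hexdigest() != expected[4:]:
            out_of_sync.append(uri)
    return out_of_sync


# Demo data (assumed): hashes are computed here so the example is self-consistent.
res1_body = b"<html>res1</html>"
res2_body = b"<html>res2</html>"
resource_list = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist" from="2013-01-03T09:00:00Z"/>
  <url><loc>http://example.com/res1</loc>
       <rs:md hash="md5:{h1}" type="text/html"/></url>
  <url><loc>http://example.com/res2</loc>
       <rs:md hash="md5:{h2}" type="text/html"/></url>
</urlset>""".format(
    h1=hashlib.md5(res1_body).hexdigest(),
    h2=hashlib.md5(res2_body).hexdigest(),
)

snapshot_time, resources = parse_resource_list(resource_list)
local_copies = {
    "http://example.com/res1": res1_body,        # in sync
    "http://example.com/res2": b"stale content",  # differs from the Source
}
stale = audit(resources, local_copies)
```

A Destination would then issue GETs against the URIs in `stale` to complete a baseline synchronization or repair drift.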
    67. 67. Capability 2: Resource Dump
    68. 68. Capability 2: Resource Dump <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:rs="http://www.openarchives.org/rs/terms/"> <rs:md capability="resourcedump" from="2013-01-02T09:00:00Z"/> <url> <loc>http://example.com/resourcedump_part1.zip</loc> <lastmod>2013-01-02T13:00:00Z</lastmod> <rs:md length="97553" type="application/zip"/> </url> <url> <loc>http://example.com/resourcedump_part2.zip</loc> <lastmod>2013-01-02T13:00:00Z</lastmod> <rs:md length="21294" type="application/zip"/> </url> </urlset>
    69. 69. Resource Dump Manifest <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:rs="http://www.openarchives.org/rs/terms/"> <rs:md capability="resourcedump-manifest" from="2013-01-02T09:00:00Z"/> <url> <loc>http://example.com/res1</loc> <lastmod>2013-01-02T13:00:00Z</lastmod> <rs:md type="text/html" path="/resources/res1"/> </url> <url> <loc>http://example.com/res2</loc> <lastmod>2013-01-02T13:00:00Z</lastmod> <rs:md type="application/pdf" path="/resources/res2"/> </url> </urlset>
    70. 70. Resource Dump • Package Source’s resources that are subject to synchronization • At one point in time (snapshot) • Points to ZIP packages • Mandatory, even for only one ZIP • ZIP package contains manifest, listing contained bitstreams • Typical Destination use: Baseline Synchronization, bulk download • Each URI typically listed only once • Might be expensive to generate • Destinations use @from to determine freshness • GETs against individual URIs from Resource List achieves the same result (ignoring varying freshness)
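How a Destination might unpack one ZIP package using its manifest can be sketched in Python as below. Several details are assumptions made for illustration: the slides do not name the manifest file inside the ZIP (here `manifest.xml`), the leading slash of `@path` is stripped to obtain the ZIP member name, and the package is built in memory rather than downloaded.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

MANIFEST = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcedump-manifest" from="2013-01-02T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <rs:md type="text/html" path="/resources/res1"/>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
    <rs:md type="application/pdf" path="/resources/res2"/>
  </url>
</urlset>"""


def build_demo_package():
    """Build a small ZIP in memory that stands in for a downloaded package."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("manifest.xml", MANIFEST)  # assumed manifest file name
        zf.writestr("resources/res1", b"<html>res1</html>")
        zf.writestr("resources/res2", b"%PDF-1.4 demo")
    return buf.getvalue()


def unpack_dump_package(zip_bytes, manifest_name="manifest.xml"):
    """Map each resource URI in the manifest to its bitstream from the ZIP."""
    unpacked = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        manifest = ET.fromstring(zf.read(manifest_name))
        for url in manifest.findall("sm:url", NS):
            uri = url.findtext("sm:loc", namespaces=NS)
            md = url.find("rs:md", NS)
            member = md.get("path").lstrip("/")  # assumption: path is relative to ZIP root
            unpacked[uri] = zf.read(member)
    return unpacked


unpacked = unpack_dump_package(build_demo_package())
```

The manifest is what ties each bitstream in the package back to the URI it represents, so a Destination ends up with the same URI-to-content mapping it would get from individual GETs.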
    71. 71. Capability 3: Change List
    72. 72. Capability 3: Change List <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:rs="http://www.openarchives.org/rs/terms/"> <rs:md capability="changelist" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/> <url> <loc>http://example.com/res1</loc> <lastmod>2013-01-02T13:00:00Z</lastmod> <rs:md change="updated" hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6" length="8876" type="text/html"/> </url> <url> … </url> </urlset>
    73. 73. Change List • Describe Source’s resource changes • Occurring during temporal interval with start- and end-date • Typical Destination use: Incremental Synchronization, Audit • Changes are listed in chronological order • Multiple changes to one URI may result in multiple listing of same URI • Source determines duration of temporal interval • Destinations use @from and @until to determine freshness • Issue GETs against URIs to obtain changed resources
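Incremental synchronization from a Change List can be sketched in Python as follows. The function `apply_change_list`, the dictionary used as a stand-in mirror, and the injected `fetch` callable are hypothetical names chosen for this example; a real Destination would issue HTTP GETs where `fetch` is called. Events are processed in document order, which the slide states is chronological.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

CHANGE_LIST = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2013-01-02T09:00:00Z"
         until="2013-01-03T09:00:00Z"/>
  <url><loc>http://example.com/res1</loc><rs:md change="updated"/></url>
  <url><loc>http://example.com/res2</loc><rs:md change="created"/></url>
  <url><loc>http://example.com/res3</loc><rs:md change="deleted"/></url>
</urlset>"""


def apply_change_list(xml_text, mirror, fetch):
    """Apply created/updated/deleted events to `mirror` in document order.

    `mirror` is a dict of uri -> bytes; `fetch` is any callable uri -> bytes.
    """
    root = ET.fromstring(xml_text)
    for url in root.findall("sm:url", NS):
        uri = url.findtext("sm:loc", namespaces=NS)
        md = url.find("rs:md", NS)
        change = md.get("change") if md is not None else None
        if change in ("created", "updated"):
            mirror[uri] = fetch(uri)
        elif change == "deleted":
            mirror.pop(uri, None)
    return mirror


def fake_fetch(uri):
    """Stand-in for an HTTP GET so the sketch runs offline."""
    return ("fetched " + uri).encode("utf-8")


mirror = {
    "http://example.com/res1": b"old res1",
    "http://example.com/res3": b"old res3",
}
apply_change_list(CHANGE_LIST, mirror, fake_fetch)
```

Because the same URI may appear several times, applying events strictly in order is what leaves the mirror in the state described by the last event for each resource.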
    74. 74. Capability 4: Change Dump
    75. 75. Capability 4: Change Dump <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:rs="http://www.openarchives.org/rs/terms/"> <rs:md capability="changedump" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/> <url> <loc>http://example.com/change_dump_part1.zip</loc> <lastmod>2013-01-02T13:00:00Z</lastmod> <rs:md length="887" type="application/zip"/> </url> <url> <loc>http://example.com/change_dump_part2.zip</loc> <lastmod>2013-01-02T13:00:00Z</lastmod> <rs:md length="9767" type="application/zip"/> </url> </urlset>
    76. 76. Change Dump Manifest <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:rs="http://www.openarchives.org/rs/terms/"> <rs:md capability="changedump-manifest" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/> <url> <loc>http://example.com/res1</loc> <lastmod>2013-01-02T13:00:00Z</lastmod> <rs:md change="updated" length="2887" type="text/html" path="changes/res1"/> </url> <url> … </url> </urlset>
    77. 77. Change Dump • Package Source’s resources that have changed • during temporal interval with start- and end-date • Points to ZIP packages • Mandatory, even for only one ZIP • ZIP package contains manifest, listing contained bitstreams • Typical Destination use: Incremental Synchronization, bulk download of changes • Changes in Change Dump Manifest listed in chronological order • Same URI can be listed multiple times • Might be expensive to generate • Destinations use @from and @until to determine freshness
    78. 78. Recall… *Index <changelist_index.xml> <changelist1.xml>
    79. 79. Change List Index <changelist_index.xml> <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:rs="http://www.openarchives.org/rs/terms/"> <rs:md capability="changelist" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/> <sitemap> <loc>http://example.com/changelist1.xml</loc> <lastmod>2013-01-02T11:00:00Z</lastmod> <rs:md type="application/xml"/> </sitemap> <sitemap> <loc>http://example.com/changelist2.xml</loc> <lastmod>2013-01-02T23:00:00Z</lastmod> <rs:md type="application/xml"/> </sitemap> </sitemapindex>
    80. 80. Change List <changelist1.xml> <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:rs="http://www.openarchives.org/rs/terms/"> <rs:ln rel="up" href="http://example.com/changelist_index.xml"/> <rs:md capability="changelist" from="2013-01-02T09:00:00Z" until="2013-01-02T21:00:00Z"/> <url> <loc>http://example.com/res1</loc> <lastmod>2013-01-02T13:00:00Z</lastmod> <rs:md change="updated" hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6" length="8876" type="text/html"/> </url> </urlset>
    81. 81. ResourceSync - Agenda 4. Framework (Technical) Details 3. Linking between resources
    82. 82. Supported Linking Use Cases The web is based on links between resources, many of which are important to understand for synchronization. 1. Mirrored content with multiple download locations 2. Alternate representations of the same content 3. Patching content rather than replacing 4. Resources and their metadata 5. Prior versions of resources 6. Collection membership of resources 7. Republishing synchronized resources All cases are handled with a <rs:ln> element referring to the remote resource
    83. 83. Notes about Linked Resources Some important things to keep in mind about linked resources: • They may also be subject to synchronization • They may be updated on a very different schedule from the resource they are linked from • Therefore, it is recommended to convey metadata about the linked resource too • Links can be bi-directional – the linked resource can link back to the linking resource
    84. 84. Linking #1 - Mirror 1. Mirrored content with multiple download locations This might occur due to: • Content distribution networks • Mirror sites • Backup locations • Load balancing
    85. 85. Linking #1 - Mirror <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:rs="http://www.openarchives.org/rs/terms/"> <rs:md capability="changelist" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/> <url> <loc>http://example.com/res1</loc> <lastmod>2013-01-02T13:00:00Z</lastmod> <rs:md change="updated"/> <rs:ln rel="duplicate" pri="1" href="http://mirror1.example.com/res1"/> <rs:ln rel="duplicate" pri="2" href="http://mirror2.example.com/res1"/> </url> </urlset>
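One way a Destination could use the `duplicate` links is to build an ordered list of download locations. The Python sketch below makes two assumptions that the slide does not state: a lower `pri` value means a more preferred mirror, and the original `<loc>` is kept as a last-resort fallback. The function name `download_candidates` is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

# A single <url> entry, with the mirrors deliberately listed out of order.
URL_ENTRY = """<url xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
     xmlns:rs="http://www.openarchives.org/rs/terms/">
  <loc>http://example.com/res1</loc>
  <rs:md change="updated"/>
  <rs:ln rel="duplicate" pri="2" href="http://mirror2.example.com/res1"/>
  <rs:ln rel="duplicate" pri="1" href="http://mirror1.example.com/res1"/>
</url>"""


def download_candidates(url_elem):
    """Return download URIs: mirrors sorted by @pri, then the original URI.

    Assumption: lower pri is preferred; links without pri sort last.
    """
    mirrors = [
        ln for ln in url_elem.findall("rs:ln", NS)
        if ln.get("rel") == "duplicate"
    ]
    mirrors.sort(key=lambda ln: int(ln.get("pri", "999999")))
    candidates = [ln.get("href") for ln in mirrors]
    candidates.append(url_elem.findtext("sm:loc", namespaces=NS))
    return candidates


candidates = download_candidates(ET.fromstring(URL_ENTRY))
```

A Destination would try each candidate in turn until a GET succeeds, while still recording the content under the original URI from `<loc>`.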
    86. 86. Linking #2 – Alternate Representations 2. Alternate representations of the same content This might occur due to: • Server supports HTTP content negotiation • Multiple copies of the same resource • Format migration for preservation reasons • Different clients wanting different formats • Multiple languages of the content
    87. 87. Linking #2 – Alternate Representations <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:rs="http://www.openarchives.org/rs/terms/"> <rs:md capability="changelist" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/> <url> <loc>http://example.com/res1</loc> <lastmod>2013-01-02T13:00:00Z</lastmod> <rs:md change="updated"/> <rs:ln rel="alternate" type="text/html" href="http://example.com/res1.html"/> <rs:ln rel="alternate" type="application/pdf" href="http://example.com/res1.pdf"/> </url> </urlset>
    88. 88. Linking #2 – Alternate Representations <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:rs="http://www.openarchives.org/rs/terms/"> <rs:md capability="changelist" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/> <url> <loc>http://example.com/res1.html</loc> <lastmod>2013-01-02T13:00:00Z</lastmod> <rs:md change="updated"/> <rs:ln rel="canonical" href="http://example.com/res1"/> </url> </urlset>
    89. 89. Linking #3 – Patching Content 3. Patching content rather than replacing This might occur due to: • Resources are very large and server wishes to conserve bandwidth where possible • Changes are frequent and small • Changes are managed in a CMS that tracks differences • A machine-processable format exists, or can be described, to replicate the change
    90. 90. Linking #3 – Patching Content <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:rs="http://www.openarchives.org/rs/terms/"> <rs:md capability="changelist" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/> <url> <loc>http://example.com/res1.json</loc> <lastmod>2013-01-02T13:00:00Z</lastmod> <rs:md change="updated" length="398723"/> <rs:ln rel="http://www.openarchives.org/rs/terms/patch" type="application/json-patch" modified="2013-01-02T17:00:00Z" length="58" href="http://example.com/res1-patch.json"/> </url> </urlset>
    91. 91. Linking #4 – Metadata about Resources 4. Resources and their metadata This might occur due to: • Resources have additional metadata records, which are useful for understanding the resource • Such as cultural heritage images, audio, video • Collections with descriptive metadata • Resources with technical metadata • Administrative or Rights metadata
    92. 92. Linking #4 – Metadata about Resources <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:rs="http://www.openarchives.org/rs/terms/"> <rs:md capability="changelist" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/> <url> <loc>http://example.com/res1</loc> <lastmod>2013-01-02T13:00:00Z</lastmod> <rs:md change="updated"/> <rs:ln rel="describedby" type="application/xml" href="http://example.com/metadata/res1.xml"/> </url> </urlset>
    93. 93. Linking #4 – Metadata about Resources <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:rs="http://www.openarchives.org/rs/terms/"> <rs:md capability="changelist" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/> <url> <loc>http://example.com/metadata/res1.xml</loc> <lastmod>2013-01-02T13:00:00Z</lastmod> <rs:md change="updated"/> <rs:ln rel="describes" type="text/html" href="http://example.com/res1"/> </url> </urlset>
    94. 94. Linking #5 – Prior Versions of Resources But first…
    95. 95. Memento Intermezzo http://www.mementoweb.org/
    96. 96. URI for Original, URI for Version URI-M - http://web.archive.org/web/20010911203610/http://www.cnn.com/ Web Archive URI-R - http://www.cnn.com/
    97. 97. URI for Original, URI for Version URI-M - http://en.wikipedia.org/w/index.php?title=September_11_attacks&oldid=282333 CMS URI-R - http://en.wikipedia.org/wiki/September_11_attacks
    98. 98. Linking #5 – Prior Versions of Resources
        <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
                xmlns:rs="http://www.openarchives.org/rs/terms/">
          <rs:md capability="changelist" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/>
          <url>
            <loc>http://example.com/res1</loc>
            <lastmod>2013-01-02T13:00:00Z</lastmod>
            <rs:md change="updated"/>
            <rs:ln rel="memento" href="http://example.com/past/20130102130000/res1"/>
            <rs:ln rel="timegate" href="http://example.com/timegate/res1"/>
            <rs:ln rel="timemap" href="http://example.com/timemap/res1" type="application/link-format"/>
          </url>
        </urlset>
    99. 99. Linking #6 – Collection Membership
        6. Collection membership of resources
        This might occur due to:
        • Resources being part of OAI-ORE aggregations
        • Resources being part of OAI-PMH sets
        • Or any other type of collection of resources
    100. 100. Linking #6 – Collection Membership
        <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
                xmlns:rs="http://www.openarchives.org/rs/terms/">
          <rs:md capability="changelist" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/>
          <url>
            <loc>http://example.com/res1</loc>
            <lastmod>2013-01-02T13:00:00Z</lastmod>
            <rs:md change="updated"/>
            <rs:ln rel="collection" href="http://example.com/aggregation/allres"/>
          </url>
        </urlset>
    101. 101. Linking #7 – Republishing Resources
        7. Republishing synchronized resources
        This might occur due to:
        • Aggregator systems that harvest resources from remote sites and then republish them at new URIs
        • Examples include blog republishing, content distribution networks, mirrored or combined collections
        • Hypothetical scenario: lots of little museums with small collections, and a large European/American aggregating digital library system that wants to provide fast, combined access to the content (with permission)
    102. 102. Linking #7 – Republishing Resources
        <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
                xmlns:rs="http://www.openarchives.org/rs/terms/">
          <rs:md capability="changelist" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/>
          <url>
            <loc>http://example.com/res1</loc>
            <lastmod>2013-01-02T13:00:00Z</lastmod>
            <rs:md change="updated"/>
            <rs:ln rel="via" modified="2013-01-02T10:00:00Z" href="http://original.example.org/res1"/>
          </url>
        </urlset>
    103. 103. Linking #7 – Republishing Resources
        <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
                xmlns:rs="http://www.openarchives.org/rs/terms/">
          <rs:md capability="changelist" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/>
          <url>
            <loc>http://aggregator.example.com/res1</loc>
            <lastmod>2013-01-02T18:00:00Z</lastmod>
            <rs:md change="updated"/>
            <rs:ln rel="via" modified="2013-01-02T13:00:00Z" href="http://example.org/res1"/>
            <rs:ln rel="via" modified="2013-01-02T10:00:00Z" href="http://original.example.org/res1"/>
          </url>
        </urlset>
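The linking slides all use the same pattern: an `rs:ln` element inside a `<url>` entry, with a `rel` and an `href`. A Destination could collect those links with a few lines of standard-library Python. This is only a sketch; the function name `links_by_rel` is hypothetical, and the sample document repeats the republishing example from the slide.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as used in the slide examples.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

CHANGE_LIST = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://aggregator.example.com/res1</loc>
    <lastmod>2013-01-02T18:00:00Z</lastmod>
    <rs:md change="updated"/>
    <rs:ln rel="via" modified="2013-01-02T13:00:00Z" href="http://example.org/res1"/>
    <rs:ln rel="via" modified="2013-01-02T10:00:00Z" href="http://original.example.org/res1"/>
  </url>
</urlset>"""


def links_by_rel(xml_text):
    """Map each <loc> to the (rel, href) pairs of its rs:ln children."""
    root = ET.fromstring(xml_text)
    result = {}
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        result[loc] = [(ln.get("rel"), ln.get("href"))
                       for ln in url.findall("rs:ln", NS)]
    return result


links = links_by_rel(CHANGE_LIST)
for loc, pairs in links.items():
    print(loc, pairs)
```

The same function would pick up `describedby`, `describes`, `collection`, `memento`, `timegate` and `timemap` links, since it does not filter on the relation type.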
    104. 104. ResourceSync - Agenda
        4. Framework (Technical) Details
           4. Discovery
    105. 105. Discovery of Capabilities
    106. 106. Discovery of Capability Documents
        Requirements:
        • Need to discover capability documents, i.e. Resource List, Resource Dump, Change List, Change Dump, Archives
        • Need to know the type of capability each document represents
        Approach:
        • The Capability List provides links to these capability documents, if the Source supports them
        • These links have appropriate relation types, e.g. “resourcelist”, “changelist”, etc.
    107. 107. Capability List
        <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
                xmlns:rs="http://www.openarchives.org/rs/terms/">
          <rs:md capability="capabilitylist"/>
          <rs:ln rel="resourcesync" href="http://example.com/.well-known/resourcesync"/>
          <url>
            <loc>http://aggregator.example.com/dataset1/resourcelist.xml</loc>
            <rs:md capability="resourcelist"/>
          </url>
          <url>
            <loc>http://aggregator.example.com/dataset1/changelist.xml</loc>
            <rs:md capability="changelist"/>
          </url>
        </urlset>
    108. 108. Discovery of Capability Lists
        Requirements:
        • Need to discover a Capability List
        Approach:
        • HTTP Link header from resources subject to synchronization, relation type "resourcesync"
        • Links from HTML document <head>, relation type "resourcesync"
        • Links from Capability documents, relation type "up"
        Link header on example.com/res1.pdf:
        Link: <example.com/dataset1/capabilitylist.xml>;rel="resourcesync"
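As an illustration of the Link-header approach, the following sketch pulls the Capability List URI out of a raw `Link` header value. It is a simplified parser written for this example (the function name `find_capability_list` is hypothetical) and it does not cover every corner of the Web Linking syntax, such as `<` characters inside quoted parameter values.

```python
import re


def find_capability_list(link_header):
    """Return the first link target whose rel includes 'resourcesync', else None."""
    # Each link-value has the form <uri>; param=value; ... and values are
    # separated by commas, so match a bracketed URI plus everything up to the next '<'.
    for match in re.finditer(r'<([^>]*)>([^<]*)', link_header):
        uri, params = match.group(1), match.group(2)
        rel = re.search(r'\brel\s*=\s*(?:"([^"]*)"|([^;,\s]+))', params)
        if rel is None:
            continue
        # A rel value may hold several space-separated relation types.
        rels = (rel.group(1) or rel.group(2) or "").split()
        if "resourcesync" in rels:
            return uri
    return None


header = '<http://example.com/dataset1/capabilitylist.xml>;rel="resourcesync"'
print(find_capability_list(header))
```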
    109. 109. Discovery of Capabilities
    110. 110. Discovery: ResourceSync Description
        Requirements:
        • Support for multiple Capability Lists, one per “set of resources”
        • Need to discover these Capability Lists
        • Need descriptive information about each set of resources that a Capability List pertains to
        • Useful to have descriptive information about the Source itself
        Approach:
        • The ResourceSync Description document meets these requirements
        • It should be at a particular location to avoid having registries: http://(hostname)/.well-known/resourcesync
        • It can be linked to from the Capability Lists as well
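Because the Description lives at a fixed path, a Destination that knows any resource URI on a host can derive where to look. A minimal sketch, with a hypothetical helper name `well_known_resourcesync`:

```python
from urllib.parse import urlsplit, urlunsplit


def well_known_resourcesync(uri):
    """Build <scheme>://<host>/.well-known/resourcesync for the host serving `uri`."""
    parts = urlsplit(uri)
    # Keep scheme and host (with port, if any); drop path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/.well-known/resourcesync", "", ""))


print(well_known_resourcesync("http://example.com/dataset1/res1.pdf"))
```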
    111. 111. Discovery of Capabilities
    112. 112. ResourceSync Description
        <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
                xmlns:rs="http://www.openarchives.org/rs/terms/">
          <rs:md capability="resourcesync"/>
          <rs:ln rel="describedby" href="http://example.com/info_about_source.xml"/>
          <url>
            <loc>http://aggregator.example.com/dataset1/capabilitylist.xml</loc>
            <rs:md capability="capabilitylist"/>
            <rs:ln rel="describedby" href="http://example.com/dataset1/info_about_dataset1.xml"/>
          </url>
          <url>
            <loc>http://aggregator.example.com/dataset2/capabilitylist.xml</loc>
            <rs:md capability="capabilitylist"/>
            <rs:ln rel="describedby" href="http://example.com/dataset2/info_about_dataset2.xml"/>
          </url>
        </urlset>
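Taken together, discovery is a two-level walk: the ResourceSync Description lists Capability Lists, and each Capability List lists capability documents. The sketch below walks that hierarchy. To stay self-contained it reads from an in-memory dictionary `DOCS` instead of making HTTP requests; the dictionary, its shortened documents, the `example.com` URIs and the function names are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}
HEAD = ('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" '
        'xmlns:rs="http://www.openarchives.org/rs/terms/">')

# Hypothetical stand-in for HTTP GET: URI -> document text.
DOCS = {
    "http://example.com/.well-known/resourcesync": HEAD +
        '<rs:md capability="resourcesync"/>'
        '<url><loc>http://example.com/dataset1/capabilitylist.xml</loc>'
        '<rs:md capability="capabilitylist"/></url></urlset>',
    "http://example.com/dataset1/capabilitylist.xml": HEAD +
        '<rs:md capability="capabilitylist"/>'
        '<url><loc>http://example.com/dataset1/resourcelist.xml</loc>'
        '<rs:md capability="resourcelist"/></url>'
        '<url><loc>http://example.com/dataset1/changelist.xml</loc>'
        '<rs:md capability="changelist"/></url></urlset>',
}


def entries(uri):
    """Return (loc, capability) pairs for every <url> in the document at `uri`."""
    root = ET.fromstring(DOCS[uri])
    out = []
    for url in root.findall("sm:url", NS):
        md = url.find("rs:md", NS)
        capability = md.get("capability") if md is not None else None
        out.append((url.findtext("sm:loc", namespaces=NS), capability))
    return out


def discover(description_uri):
    """Map each Capability List URI to {capability name: document URI}."""
    found = {}
    for loc, capability in entries(description_uri):
        if capability == "capabilitylist":
            found[loc] = {cap: doc for doc, cap in entries(loc)}
    return found


print(discover("http://example.com/.well-known/resourcesync"))
```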
    113. 113. Discovery of Capabilities
    114. 114. ResourceSync - Agenda
        4. Framework (Technical) Details
           5. Push method
    115. 115. Motivation for a Push Component in ResourceSync
        • Reduce synchronization latency by having the Source push out resource change information
          • To avoid continuous pull of Change Lists by Destinations
        • Share information about changes to the Source’s ResourceSync implementation, e.g. announcement of new Resource List, new Capability List, etc.
          • To avoid continuous polling of e.g. Resource Lists, ResourceSync Description
    116. 116. Notification Types
        • Events pertaining to a resource
          • updated | created | deleted for a resource
          • 3rd party defined events
        • Events pertaining to a set of resources
          • updated | created | deleted for a Resource List, Resource Dump, Change List, Change Dump, Archives
          • 3rd party defined events
        • Events pertaining to the overall ResourceSync implementation
          • updated | created | deleted for a Capability List, ResourceSync Description
          • 3rd party defined events
    117. 117. Possible Push Technology: XMPP PubSub
        Other technologies: WebSockets, HTTP callback
    118. 118. Notification Payload
        • Payload is the same irrespective of transport protocol
        • Use <urlset> as encapsulating element
        • One <url> element per notification
    119. 119. Notification Payload – Resource Update (XMPP)
        <xmpp:iq from="sender@example.com" to="destination@example.org" type="set" id="liAJUz3S">
          <xmpp:pubsub>
            <xmpp:publish node="resource_notification_channel">
              <xmpp:item id="1234577">
                <sm:urlset xmlns:sm="http://www.sitemaps.org/schemas/sitemap/0.9"
                           xmlns:rs="http://www.openarchives.org/rs/terms/">
                  <sm:url>
                    <sm:loc>http://example.com/res1</sm:loc>
                    <sm:lastmod>2013-01-02T14:00:00Z</sm:lastmod>
                    <rs:md change="updated" hash="md5:12324324jhhjl234234" length="987665" type="application/pdf"/>
                  </sm:url>
                </sm:urlset>
              </xmpp:item>
            </xmpp:publish>
          </xmpp:pubsub>
        </xmpp:iq>
    120. 120. Notification Payload – Capability Update (XMPP)
        <xmpp:iq from="sender@example.com" to="destination@example.com" type="set" id="liAJUz3S">
          <xmpp:pubsub>
            <xmpp:publish node="changelist_notification_channel">
              <xmpp:item id="1234577">
                <sm:urlset xmlns:sm="http://www.sitemaps.org/schemas/sitemap/0.9"
                           xmlns:rs="http://www.openarchives.org/rs/terms/">
                  <sm:url>
                    <sm:loc>http://example.com/dataset1/changelist.xml</sm:loc>
                    <sm:lastmod>2013-01-02T14:00:00Z</sm:lastmod>
                    <rs:md capability="changelist" change="updated"/>
                  </sm:url>
                </sm:urlset>
              </xmpp:item>
            </xmpp:publish>
          </xmpp:pubsub>
        </xmpp:iq>
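Since the payload is the same `<urlset>` regardless of transport, a Destination can handle it separately from the XMPP envelope. The sketch below reads only the inner `sm:urlset` and classifies the event. The rule used to tell a capability event from a resource event (presence of a `capability` attribute on `rs:md`) is an assumption drawn from the two slide examples, and `read_notification` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

# The inner payload of the capability-update example, without the XMPP wrapper.
PAYLOAD = """<sm:urlset xmlns:sm="http://www.sitemaps.org/schemas/sitemap/0.9"
           xmlns:rs="http://www.openarchives.org/rs/terms/">
  <sm:url>
    <sm:loc>http://example.com/dataset1/changelist.xml</sm:loc>
    <sm:lastmod>2013-01-02T14:00:00Z</sm:lastmod>
    <rs:md capability="changelist" change="updated"/>
  </sm:url>
</sm:urlset>"""


def read_notification(payload):
    """Return a dict describing the single <url> entry of a notification."""
    url = ET.fromstring(payload).find("sm:url", NS)
    md = url.find("rs:md", NS)
    return {
        "uri": url.findtext("sm:loc", namespaces=NS),
        "lastmod": url.findtext("sm:lastmod", namespaces=NS),
        "change": md.get("change"),
        # Assumption: a capability attribute marks an event about a capability
        # document; without it, the event concerns an ordinary resource.
        "kind": "capability" if md.get("capability") else "resource",
    }


event = read_notification(PAYLOAD)
print(event)
```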
    121. 121. Considerations
        • Notification channels
          • Multiple channels per Source to divide up notifications, e.g.
            • a channel for changes pertaining to all resources that belong to a set of resources
            • a channel for changes to capabilities for a set of resources
          • Server-side filtering preferred over client-side
        • Authentication/Authorization
          • To subscribe/create channels
    122. 122. Considerations
        • Delayed notification
          • Insurance that Destination does not miss anything
        • Discovery
          • Links to channels e.g. from a Capability List
          • Links from channels to other channels
          • Provide channel metadata (transport protocol info etc.)
    123. 123. Push Channel Discovery
        <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
                xmlns:rs="http://www.openarchives.org/rs/terms/">
          …
          <url>
            <loc>xmpp:pubsub.example.com/dataset1?;node=resource_notification_channel</loc>
            <rs:md capability="resource-notification"/>
            <rs:ln rel="alternate" href="ws://example.com/dataset1/meta_notification_channel"/>
          </url>
          <url>
            <loc>xmpp:pubsub.example.com/dataset1?;node=capability_notification_channel</loc>
            <rs:md capability="capability-notification"/>
          </url>
          <url>
            <loc>xmpp:pubsub.example.com/dataset1?;node=resourcesync_notification_channel</loc>
            <rs:md capability="resourcesync-notification"/>
          </url>
        </urlset>
    124. 124. ResourceSync - Agenda
        4. Framework (Technical) Details
           6. Archives
    125. 125. ResourceSync Framework Component: Archives
        To allow a Source to hold on to historical data, and Destinations to catch up with events they have missed:
        o Publish a Resource List Archive, Resource Dump Archive, Change List Archive, and/or a Change Dump Archive
        o These are documents that list historical capability documents
    126. 126. Resource List Archive
    127. 127. Resource List Archive
        <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
                xmlns:rs="http://www.openarchives.org/rs/terms/">
          <rs:md capability="resourcelist-archive" from="2013-01-09T13:00:00Z"/>
          <url>
            <loc>http://example.com/resourcelist1.xml</loc>
            <lastmod>2013-01-02T13:00:00Z</lastmod>
          </url>
          <url>
            <loc>http://example.com/resourcelist2.xml</loc>
            <lastmod>2013-01-09T13:00:00Z</lastmod>
          </url>
          <url> … </url>
        </urlset>
    128. 128. Resource Dump Archive
    129. 129. Resource Dump Archive
        <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
                xmlns:rs="http://www.openarchives.org/rs/terms/">
          <rs:md capability="resourcedump-archive" from="2013-02-10T03:00:00Z"/>
          <url>
            <loc>http://example.com/resourcedump1.xml</loc>
            <lastmod>2013-01-10T03:00:00Z</lastmod>
          </url>
          <url>
            <loc>http://example.com/resourcedump2.xml</loc>
            <lastmod>2013-02-10T03:00:00Z</lastmod>
          </url>
          <url> … </url>
        </urlset>
    130. 130. Change List Archive
    131. 131. Change List Archive
        <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
                xmlns:rs="http://www.openarchives.org/rs/terms/">
          <rs:md capability="changelist-archive" from="2013-02-01T23:00:00Z" until="2013-02-03T23:00:00Z"/>
          <url>
            <loc>http://example.com/changelist1.xml</loc>
            <lastmod>2013-02-01T23:00:00Z</lastmod>
          </url>
          <url>
            <loc>http://example.com/changelist2.xml</loc>
            <lastmod>2013-02-02T23:00:00Z</lastmod>
          </url>
          <url>
            <loc>http://example.com/changelist3.xml</loc>
            <lastmod>2013-02-03T23:00:00Z</lastmod>
          </url>
        </urlset>
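One way a Destination might use a Change List Archive to catch up is to pick the archived Change Lists whose `<lastmod>` is later than its own last synchronization and process them oldest first. This is a sketch of that idea only. It assumes `<lastmod>` is a usable cut-off, which the slides do not state, and the names `parse_ts` and `lists_to_process` are hypothetical.

```python
from datetime import datetime, timezone


def parse_ts(text):
    """Parse a UTC timestamp of the form used in the slides, e.g. 2013-02-01T23:00:00Z."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# (loc, lastmod) pairs as listed in the Change List Archive example.
ARCHIVE = [
    ("http://example.com/changelist1.xml", "2013-02-01T23:00:00Z"),
    ("http://example.com/changelist2.xml", "2013-02-02T23:00:00Z"),
    ("http://example.com/changelist3.xml", "2013-02-03T23:00:00Z"),
]


def lists_to_process(archive, last_sync):
    """Return the Change List URIs modified after `last_sync`, oldest first."""
    cutoff = parse_ts(last_sync)
    pending = [(parse_ts(ts), loc) for loc, ts in archive if parse_ts(ts) > cutoff]
    return [loc for _, loc in sorted(pending)]


print(lists_to_process(ARCHIVE, "2013-02-02T00:00:00Z"))
```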
    132. 132. Change Dump Archive
    133. 133. Change Dump Archive
        <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
                xmlns:rs="http://www.openarchives.org/rs/terms/">
          <rs:md capability="changedump-archive" from="2013-02-10T03:00:00Z" until="2013-02-17T03:00:00Z"/>
          <url>
            <loc>http://example.com/changedump1.xml</loc>
            <lastmod>2013-02-10T03:00:00Z</lastmod>
          </url>
          <url>
            <loc>http://example.com/changedump2.xml</loc>
            <lastmod>2013-02-17T03:00:00Z</lastmod>
          </url>
          <url> … </url>
        </urlset>
    134. 134. ResourceSync - Agenda
        5. Implementation
    135. 135. Implementation #1: The Metadata Harvesting Use Case
    136. 136. The Metadata Harvesting Use Case
        1. Identification of metadata records within a service
        2. Use of standards in metadata formats
        3. Incremental updates
        4. Create, Update, Delete
        5. Sets
    137. 137. The Metadata Harvesting Use Case
        1. Identification of metadata records within a service
        2. Use of standards in metadata formats
        ResourceSync does not specifically care about metadata records, only resources. It is up to the server to identify which of those resources are metadata. We are free to annotate a resource's entry with appropriate metadata to indicate the format.
    138. 138. The Metadata Harvesting Use Case
        3. Incremental updates
        4. Create, Update, Delete
        5. Sets
        All resources that can be obtained from a change list will be annotated with the kind of change that happened to them. ResourceSync allows the server to publish lists of resources and changes, and indexes of those lists, all annotated with metadata. ResourceSync publishes changes as static documents. The client is then free to walk up and down the change lists provided by the server.
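Because every Change List entry carries the kind of change, a harvesting client can replay the entries against its local copy. The sketch below shows that replay for the three change types named on the slides. The local mirror is modelled as a plain dictionary and `fetch` is a caller-supplied download function; both, along with the name `apply_changes`, are assumptions made for illustration.

```python
def apply_changes(mirror, changes, fetch):
    """Replay (uri, change) pairs, in order, against a dict acting as the local mirror."""
    for uri, change in changes:
        if change in ("created", "updated"):
            # Both cases need the current representation from the Source.
            mirror[uri] = fetch(uri)
        elif change == "deleted":
            # Ignore deletions of resources that were never mirrored.
            mirror.pop(uri, None)
        else:
            raise ValueError(f"unknown change type: {change}")
    return mirror


def fake_fetch(uri):
    """Hypothetical stand-in for an HTTP GET."""
    return ("content of " + uri).encode()


mirror = {"http://example.com/res1": b"old", "http://example.com/res2": b"x"}
changes = [
    ("http://example.com/res1", "updated"),
    ("http://example.com/res2", "deleted"),
    ("http://example.com/res3", "created"),
]
apply_changes(mirror, changes, fake_fetch)
print(sorted(mirror))
```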
    139. 139. (Required) Documents for metadata harvesting use case
    140. 140. Describing Metadata Resources
        <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
                xmlns:rs="http://www.openarchives.org/rs/terms/">
          <rs:md capability="resourcelist" from="2013-05-05T13:00:00Z"/>
          <url>
            <loc>http://mydspace.edu/dspace-rs/resource/123456789/7/qdc</loc>
            <lastmod>2013-05-01T19:09:35Z</lastmod>
            <changefreq>never</changefreq>
            <rs:md type="application/xml"/>
            <rs:ln href="http://mydspace.edu/bitstream/123456789/7/1/bitstream.pdf" rel="describes"/>
            <rs:ln href="http://mydspace.edu/bitstream/123456789/7/2/image.jpg" rel="describes"/>
            <rs:ln href="http://mydspace.edu/123456789/3" rel="collection"/>
          </url>
        </urlset>
    141. 141. Describing Bitstream Resources
        <urlset …>
          <url>
            <loc>http://mydspace.edu/bitstream/123456789/7/1/bitstream.pdf</loc>
            <lastmod>2013-05-01T19:09:35Z</lastmod>
            <changefreq>never</changefreq>
            <rs:md hash="md5:75d0ea94097a05fce9aca5b079e2f209" length="419805" type="application/pdf"/>
            <rs:ln href="http://mydspace.edu/dspace-rs/resource/123456789/7/qdc" rel="describedby"/>
            <rs:ln href="http://mydspace.edu/dspace-rs/resource/123456789/7/mets" rel="describedby"/>
            <rs:ln href="http://mydspace.edu/dspace-rs/resource/123456789/12/qdc" rel="describedby"/>
            <rs:ln href="http://mydspace.edu/123456789/2" rel="collection"/>
          </url>
        </urlset>
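The `hash` and `length` attributes on `rs:md` let a client check a downloaded bitstream against what the Source advertised. A minimal sketch, assuming a single `md5:` value as in the slide; the function name `matches_metadata` is hypothetical, and the demo computes its own digest rather than reusing the slide's.

```python
import hashlib


def matches_metadata(content, md_hash, md_length):
    """Check downloaded bytes against rs:md hash ("md5:<hex>") and length attributes."""
    algorithm, _, expected = md_hash.partition(":")
    if algorithm != "md5":
        raise ValueError(f"unsupported hash algorithm: {algorithm}")
    # Compare the cheap length first, then the digest.
    return (len(content) == int(md_length)
            and hashlib.md5(content).hexdigest() == expected.lower())


data = b"example bitstream"
advertised_hash = "md5:" + hashlib.md5(data).hexdigest()
print(matches_metadata(data, advertised_hash, str(len(data))))
```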
    142. 142. Serving Metadata Resources
        http://mydspace.edu/dspace-rs/resource/123456789/7/qdc
        URL parts: ResourceSync webapp (dspace-rs), Item handle (123456789/7), Metadata Format (qdc)
        metadata.formats = qdc = http://purl.org/dc/terms/, mets = http://www.loc.gov/METS/
        metadata.types = qdc = application/xml, mets = application/xml
        <loc>http://mydspace.edu/dspace-rs/resource/123456789/7/qdc</loc>
        <rs:md type="application/xml"/>
        <rs:ln href="http://purl.org/dc/terms/" rel="describedby"/>
        <loc>http://mydspace.edu/dspace-rs/resource/123456789/7/mets</loc>
        <rs:md type="application/xml"/>
        <rs:ln href="http://www.loc.gov/METS/" rel="describedby"/>
    143. 143. Generating Documents
        1. Initialise: creates initial Capability List and Resource List documents
           [dspace]/bin/dspace dsrun org.dspace.resourcesync.ResourceSyncGenerator -i
        2. Update: creates a new Change List which covers the period since the last Change List was created
           [dspace]/bin/dspace dsrun org.dspace.resourcesync.ResourceSyncGenerator -u
        3. Rebase: a combination of both Initialise and Update
           [dspace]/bin/dspace dsrun org.dspace.resourcesync.ResourceSyncGenerator -r
    144. 144. Usage of Resources by clients
    145. 145. Impact on DSpace
    146. 146. URLs
        • Stable identifiers for archived items
        • Stable identifiers for unarchived items
        • Stable identifiers for metadata resources (in their various formats)
        • Stable identifiers for previous versions
        Provenance
        • History of changes to an item/bitstream
        • Item/bitstream deletions (vs withdraw)
        • Bitstream create/update dates
        • Item create/update dates
    147. 147. Versioning
        • Access to previous versions of both metadata and bitstreams
        • Stable identifiers for previous versions of both metadata and bitstreams
        Metadata Resources
        • Metadata in a variety of formats
        • Metadata as file/bitstream
    148. 148. Admin Files
        • ResourceSync documents (Resource Lists, Change Lists, etc.)
        • ResourceSync exports: Resource Dumps, Change Dumps
        • Metadata exports in a number of formats
        Scheduled Tasks
        • Regular generation of RS documents
        Complex Objects
        • Item/bitstream relationships
        • Collections of content
    149. 149. Get the software!
        DSpace module: https://github.com/CottageLabs/DSpaceResourceSync
          depends on the common Java library: https://github.com/CottageLabs/ResourceSyncJava
        PHP client: https://github.com/stuartlewis/resync-php
          depends on the SWORDv2 client library: https://github.com/swordapp/swordappv2-php-library/
    150. 150. Implementation #2: ResourceSync at arXiv.org
    151. 151. ResourceSync @ arXiv
        • Use ResourceSync for both mirroring and public data access
          o efficient updates
          o ability to do periodic audits
          o public synchronization capability
          o reduce admin burden
        • Likely start with metadata + source for mirroring use case (doing experiments now)
        • Open access use cases require processed PDF also
        • Some concerns about likely use/load…
    153. 153. Alternate download location
        • Likely want to separate machine accesses from human accesses to preserve response time on main server
          => Use Mirrored Content part of spec
          o <loc> specifies canonical URI, e.g. http://arxiv.org/pdf/1306.1073v1.pdf
          o <rs:ln rel="duplicate"> specifies preferred download location, e.g. http://export.arxiv.org/pdf/1306.1073v1.pdf
    154. 154. Alternate download location
        <url>
          <loc>http://arxiv.org/pdf/1306.1073v1.pdf</loc>
          <lastmod>2013-06-06T00:57:12Z</lastmod>
          <rs:md hash="md5:e08e0c4e4d7b0895120014f0aa09e7c4" length="287714" type="application/pdf"/>
          <rs:ln rel="duplicate" pri="1" href="http://export.arxiv.org/pdf/1306.1073v1.pdf" modified="2013-06-06T02:00:59Z"/>
        </url>
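A client honouring the `duplicate` links could choose its download URI roughly as follows. The sketch assumes that a lower `pri` value means a more preferred location, which the slide does not spell out, and it falls back to `<loc>` when no duplicate is listed. The name `download_uri` is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

ENTRY = """<url xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
     xmlns:rs="http://www.openarchives.org/rs/terms/">
  <loc>http://arxiv.org/pdf/1306.1073v1.pdf</loc>
  <rs:ln rel="duplicate" pri="1" href="http://export.arxiv.org/pdf/1306.1073v1.pdf"/>
</url>"""


def download_uri(entry_xml):
    """Return the preferred duplicate location, or the canonical <loc> if none is given."""
    url = ET.fromstring(entry_xml)
    duplicates = [ln for ln in url.findall("rs:ln", NS) if ln.get("rel") == "duplicate"]
    if duplicates:
        # Assumption: lower pri means higher preference; links without pri sort last.
        best = min(duplicates, key=lambda ln: int(ln.get("pri", "999999")))
        return best.get("href")
    return url.findtext("sm:loc", namespaces=NS)


print(download_uri(ENTRY))
```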
    155. 155. Getting a copy of arXiv
        It might be as easy as: (of course, you probably have to wait a while, but it is nice to know ResourceSync is stateless so one can efficiently restart)
    156. 156. ResourceSync - Agenda
        6. Q&A
    160. 160. Timeline
        • June 2013
          o Version 0.9 of ResourceSync framework specification released
          o Soliciting broad feedback
        • July 2013
          o Version 0.x of Push-based methods for ResourceSync
        • Fall 2013
          o Specification becomes NISO standard
    161. 161. Pointers
        • Specification
          o http://www.openarchives.org/rs/
          o http://www.openarchives.org/rs/0.9/resourcesync
          o http://www.openarchives.org/rs/0.9/archives
        • List for public comment
          o https://groups.google.com/d/forum/resourcesync
        • Simulator code
          o http://github.org/resync/simulator
    162. 162. ResourceSync: A Web-Based Resource Synchronization Framework
        ResourceSync is funded by The Sloan Foundation & JISC
        #resourcesync
