ResourceSync in 24x7

“Synchronize your resources with ResourceSync”
July 10, 2013, Open Repositories 2013, PEI, Canada
Simeon Warner
(Cornell University Library)

Team sport

And more: this list is still incomplete.
• JISC: Richard Jones, Graham Klyne, Stuart Lewis
• OCLC: Jeff Young
• LOCKSS: David Rosenthal
• RedHat: Christian Sadilek
• Ex Libris Inc.: Shlomo Sanders
• Library of Congress: Kevin Ford

Alfred P. Sloan Foundation

Synchronize
• keep “in sync” (colloq.)
• Following changes over time, and
• Keeping copies on different systems the same
• Tackle only the unidirectional problem: from a Source to a Destination

Resources
aka Web Resources: have a URI and HTTP GET representation(s)
• Many / Few
• Big / Small
• Fast / Slow

Why?

Scholarly repositories
• Replicate data/articles for mirroring, reuse, indexing, ...
• OAI-PMH for metadata
• Many custom solutions for full content

Linked data
Fundamentally distributed, but a local copy is often required. Either:
1. cache, or
2. sync a local copy...
• Many custom solutions for the local copy
Examples: Last.FM, MusicBrainz, GeoNames, DBpedia, BBC, others...

Didn’t you sell us OAI-PMH?
Or... will ResourceSync replace OAI-PMH?
✓ Proven metadata transfer protocol
✓ Widely adopted in our community
✗ Predates REST, not “of the web”
✗ Not adopted for content transfer
Can replace it, but coexistence is likely

What?

1. Baseline sync
Initial load, copy, or catch-up from a source
• need a list of all resources
• optional packaged content
Want to:
• avoid out-of-band setup & customization

2. Incremental sync
Keep up to date with changes at a source
• need information about changes
• optional packaged content
• minimal primitives: create/update/delete
Want to:
• allow catch-up after the destination has been offline
• achieve lower latency and/or greater efficiency than repeated baseline sync
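A minimal sketch of the destination side of incremental sync, assuming changes arrive as records carrying a URI, one of the three primitives (created/updated/deleted), and a timestamp. The `Change` record, `apply_changes` and the in-memory `store` are hypothetical names, not part of the ResourceSync spec; remembering the newest applied timestamp is one simple way to allow catch-up after the destination has been offline.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable, Optional


@dataclass
class Change:
    """Hypothetical change record: which resource changed, how, and when."""
    uri: str
    change: str  # "created", "updated" or "deleted"
    timestamp: datetime


def apply_changes(
    store: Dict[str, bytes],
    changes: Iterable[Change],
    fetch: Callable[[str], bytes],
    since: Optional[datetime] = None,
) -> Optional[datetime]:
    """Apply changes newer than `since`, oldest first.

    Returns the newest timestamp applied (or `since` if nothing was newer), so the
    caller can pass it back in next time and skip changes it already has.
    """
    latest = since
    for ch in sorted(changes, key=lambda c: c.timestamp):
        if since is not None and ch.timestamp <= since:
            continue  # already applied in an earlier run
        if ch.change in ("created", "updated"):
            store[ch.uri] = fetch(ch.uri)  # GET the current representation
        elif ch.change == "deleted":
            store.pop(ch.uri, None)  # tolerate deletes of resources never copied
        else:
            raise ValueError(f"unknown change type: {ch.change}")
        latest = ch.timestamp
    return latest


# Example: a destination last synced at 08:00 catches up with three later changes.
def utc(hour: int) -> datetime:
    return datetime(2013, 7, 10, hour, tzinfo=timezone.utc)


source = {"http://example.com/res1": b"v2", "http://example.com/res2": b"new"}
store = {"http://example.com/res1": b"v1", "http://example.com/res3": b"old"}
changes = [
    Change("http://example.com/res3", "deleted", utc(11)),
    Change("http://example.com/res1", "updated", utc(9)),
    Change("http://example.com/res2", "created", utc(10)),
]
last = apply_changes(store, changes, fetch=source.__getitem__, since=utc(8))
```

Sorting by timestamp matters: replaying an older update after a newer delete would bring a deleted resource back.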

3. Audit
The destination should be able to verify whether it is synchronized with a source
• need a list of all resources + fixity info
Want to:
• achieve lower latency and/or greater efficiency than baseline sync
• note: subject to some latency
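Audit can be sketched as a set comparison plus a fixity check, assuming the source's list gives each URI with an MD5 hex digest (MD5 is chosen only for illustration) and the local copy is a URI-to-bytes map. The `audit` function is hypothetical. Because the list is a snapshot, the result is only as current as that snapshot, which is the latency caveat above.

```python
import hashlib
from typing import Dict, List, Tuple


def audit(
    resource_list: Dict[str, str], local: Dict[str, bytes]
) -> Tuple[List[str], List[str], List[str]]:
    """Compare local copies with a source's list of URI -> MD5 hex digest.

    Returns sorted (missing, extra, mismatched) URI lists; all empty means in sync.
    """
    listed, held = set(resource_list), set(local)
    missing = sorted(listed - held)  # at the source, not copied yet
    extra = sorted(held - listed)  # copied, but no longer listed at the source
    mismatched = sorted(  # copied, but the content differs from the source
        uri
        for uri in listed & held
        if hashlib.md5(local[uri]).hexdigest() != resource_list[uri]
    )
    return missing, extra, mismatched


# Example: one resource matches, one was never copied, one is no longer listed.
source_list = {
    "http://example.com/res1": hashlib.md5(b"hello").hexdigest(),
    "http://example.com/res2": hashlib.md5(b"world").hexdigest(),
}
local_copy = {
    "http://example.com/res1": b"hello",
    "http://example.com/res3": b"stale",
}
missing, extra, mismatched = audit(source_list, local_copy)
```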

How?

Minor?
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln …/>
  <rs:md …/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:ln …/>
    <rs:md …/>
  </url>
  <url>
    …
  </url>
</urlset>
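To show how a destination might read this format, here is a sketch using Python's standard `xml.etree.ElementTree`. It relies only on the two namespaces declared in the `<urlset>` element; the `parse_resource_list` helper and the `length`/`type` values on `<rs:md>` are illustrative assumptions (those attribute names are among the ones the framework lists), not a normative reading of the spec.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import Dict, List, Optional

# Namespaces as declared on the <urlset> element.
SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
RS = "{http://www.openarchives.org/rs/terms/}"


def parse_resource_list(xml_text: str) -> List[Dict[str, object]]:
    """Return one dict per <url>: its loc, parsed lastmod, and any rs:md attributes."""
    root = ET.fromstring(xml_text)
    entries: List[Dict[str, object]] = []
    for url in root.findall(f"{SM}url"):
        raw = url.findtext(f"{SM}lastmod")
        # fromisoformat() before Python 3.11 rejects a trailing "Z", so spell out UTC.
        lastmod: Optional[datetime] = (
            datetime.fromisoformat(raw.replace("Z", "+00:00")) if raw else None
        )
        md = url.find(f"{RS}md")
        entries.append({
            "loc": url.findtext(f"{SM}loc"),
            "lastmod": lastmod,
            "md": dict(md.attrib) if md is not None else {},
        })
    return entries


# Illustrative document: the rs:md attribute values are made up for the example.
RESOURCE_LIST = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md length="8876" type="text/html"/>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2013-01-02T14:00:00Z</lastmod>
  </url>
</urlset>"""

entries = parse_resource_list(RESOURCE_LIST)
```

Unqualified attributes on `<rs:md>` come back under their plain names, so any extra attribute a source adds simply appears in the `md` dict.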

Baseline sync & Google
The most basic capability is the Resource List:
• a snapshot of the state of resources
• URI, datestamp, plus optional extra fixity info
• the Destination does a GET on each resource
ResourceSync baseline sync & audit ≈ how Google, Bing, Yahoo!, etc. harvest Sitemaps
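The baseline loop itself is simple: read the Resource List, then GET each URI. A sketch, assuming the URIs have already been extracted from the list; `baseline_sync` and `http_get` are hypothetical helpers, and the fetch function is passed in so the example runs without network access.

```python
from typing import Callable, Dict, Iterable, List, Tuple
from urllib.request import urlopen


def http_get(uri: str) -> bytes:
    """Fetch one representation over HTTP; network errors surface as OSError."""
    with urlopen(uri, timeout=30) as resp:
        return resp.read()


def baseline_sync(
    uris: Iterable[str], fetch: Callable[[str], bytes] = http_get
) -> Tuple[Dict[str, bytes], List[str]]:
    """GET every URI from a Resource List; return (local copy, URIs that failed)."""
    local: Dict[str, bytes] = {}
    failed: List[str] = []
    for uri in uris:
        try:
            local[uri] = fetch(uri)
        except OSError:  # urllib's URLError and HTTPError are OSError subclasses
            failed.append(uri)  # keep going; retry these on a later pass
    return local, failed


# Example with a stand-in fetcher so nothing touches the network.
pages = {"http://example.com/res1": b"one", "http://example.com/res2": b"two"}


def fake_fetch(uri: str) -> bytes:
    if uri not in pages:
        raise OSError(f"404 for {uri}")
    return pages[uri]


local_copy, failed = baseline_sync(
    ["http://example.com/res1", "http://example.com/res2", "http://example.com/gone"],
    fetch=fake_fetch,
)
```

Collecting failures instead of aborting means one unreachable resource does not block the rest of the initial load.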

Modular
Discovery, plus four core capabilities (Resource List, Resource Dump, Change List, Change Dump)

Extensible
Extensible use of Link Relations from Atom
• Spec describes use for mirrors, patches, historical, provenance, conneg...
• Use <rs:ln rel="your-relation-here" .../>
Extensible attributes for fixity etc.
• Includes lastmod, fixity, length, type...
Extensible framework → new capabilities
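Because `<rs:ln>` is generic, a destination can collect links by `rel` without knowing every relation in advance; relations it does not recognize pass through untouched. A sketch with ElementTree; the `links_by_rel` helper is hypothetical, `duplicate` is assumed here to mark a mirror copy, and the mirror URL is an illustrative value.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import DefaultDict, Dict, List

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
RS = "{http://www.openarchives.org/rs/terms/}"


def links_by_rel(xml_text: str) -> Dict[str, Dict[str, List[str]]]:
    """Map each <loc> to {rel: [href, ...]} built from its <rs:ln> children."""
    root = ET.fromstring(xml_text)
    result: Dict[str, Dict[str, List[str]]] = {}
    for url in root.findall(f"{SM}url"):
        rels: DefaultDict[str, List[str]] = defaultdict(list)
        for ln in url.findall(f"{RS}ln"):
            # Unknown relations are kept as-is, so new ones need no code changes.
            rels[ln.get("rel", "")].append(ln.get("href", ""))
        result[url.findtext(f"{SM}loc", default="")] = dict(rels)
    return result


# Illustrative document: "your-relation-here" stands in for any custom relation.
DOC = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <url>
    <loc>http://example.com/res1</loc>
    <rs:ln rel="duplicate" href="http://mirror.example.com/res1"/>
    <rs:ln rel="your-relation-here" href="http://example.com/res1/extra"/>
  </url>
</urlset>"""

links = links_by_rel(DOC)
```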

Push = Lower latency
Pull
• easy setup, no trust required
Push Changes
• lower latency, better scaling
• same descriptions as pull
• standard transports (XMPP, WebSockets...)
• can push discovery info to trigger pull
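The push bullets can be sketched independently of the transport: whatever delivers the message (XMPP, WebSockets, ...), the destination reuses the same change-handling code as pull, and a message that carries only discovery info simply triggers a pull. The `Destination` class, the message shape (`kind`, `changes`, `href`) and the `changelist.xml` URI are hypothetical, not taken from any ResourceSync push specification.

```python
from typing import Callable, Dict, List

# Hypothetical change description: {"uri": ..., "change": "created" | "updated" | "deleted"}
Change = Dict[str, str]


class Destination:
    """Hypothetical destination that accepts pushed notifications from any transport."""

    def __init__(self, fetch: Callable[[str], bytes], pull: Callable[[str], List[Change]]):
        self.store: Dict[str, bytes] = {}
        self.fetch = fetch  # GET one resource representation
        self.pull = pull  # GET and parse a change description published at a URI

    def apply(self, change: Change) -> None:
        """Apply one change; the same code runs whether it arrived by push or by pull."""
        if change["change"] == "deleted":
            self.store.pop(change["uri"], None)
        else:  # "created" or "updated": fetch the current representation
            self.store[change["uri"]] = self.fetch(change["uri"])

    def on_notification(self, message: dict) -> None:
        """Entry point a transport handler (XMPP, WebSockets, ...) would call."""
        if message["kind"] == "changes":
            # Full change descriptions were pushed: apply them directly.
            for change in message["changes"]:
                self.apply(change)
        elif message["kind"] == "discovery":
            # Only a pointer was pushed: pull the description it points to.
            for change in self.pull(message["href"]):
                self.apply(change)
        else:
            raise ValueError(f"unknown notification kind: {message['kind']}")


# Example: one pushed change, then a pushed pointer that triggers a pull.
source = {"http://example.com/res1": b"v2", "http://example.com/res2": b"new"}
published = {
    "http://example.com/changelist.xml": [
        {"uri": "http://example.com/res2", "change": "created"}
    ]
}
dest = Destination(fetch=source.__getitem__, pull=published.__getitem__)
dest.on_notification(
    {"kind": "changes", "changes": [{"uri": "http://example.com/res1", "change": "updated"}]}
)
dest.on_notification({"kind": "discovery", "href": "http://example.com/changelist.xml"})
```

Keeping `apply` shared between both paths is what lets push lower latency without introducing a second description format.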

Timeline
• January 2013: First beta
• June 2013: Version 0.9
• July 2013: Update and push spec
• Fall 2013: NISO standardization
• Tools and libraries being developed to ease implementation
• Tutorials at major conferences (OAI8, OR, JCDL, ...)

http://www.openarchives.org/rs/
• Framework
• Archives
• Push (to come)
• Links to Google group, associated articles, blogs, etc.
