ResourceSync: Web-Based Resource Synchronization

Presentation about the NISO/OAI ResourceSync effort, given at the TICER 2012 Summer School.

Usage Rights

© All Rights Reserved

Presentation Transcript

  • ResourceSync: Web-Based Resource Synchronization
    Herbert Van de Sompel, Los Alamos National Laboratory (@hvdsomp)
    ResourceSync is funded by The Sloan Foundation & JISC
    ResourceSync – Herbert Van de Sompel, TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • ResourceSync Core Team – NISO & OAI
    •  Cornell University & OAI: Bernhard Haslhofer, Carl Lagoze, Simeon Warner
    •  Old Dominion University & OAI: Michael L. Nelson
    •  Los Alamos National Laboratory & OAI: Martin Klein, Robert Sanderson, Herbert Van de Sompel
    •  NISO: Todd Carpenter, Nettie Lagace, Peter Murray
  • ResourceSync Technical Group
    •  Manuel Bernhardt, Delving B.V.
    •  Kevin Ford, Library of Congress
    •  Richard Jones, JISC
    •  Graham Klyne, JISC
    •  Stuart Lewis, JISC
    •  David Rosenthal, LOCKSS
    •  Christian Sadilek, Red Hat
    •  Shlomo Sanders, Ex Libris, Inc.
    •  Sjoerd Siebinga, Delving B.V.
    •  Ed Summers, Library of Congress
    •  Jeff Young, OCLC Online Computer Library Center
  • ResourceSync
    •  ResourceSync: What & Why?
    •  Problem Perspective & Conceptual Approach
    •  Technical Details
    •  Q&A
  • Synchronize What?
    •  Web resources: things with a URI that can be dereferenced and are cacheable (no dependency on underlying OS, technologies, etc.)
    •  From small websites/repositories (a few resources) to large repositories/datasets/linked data collections (many millions of resources)
    •  That change slowly (weeks/months) or quickly (seconds), and where latency needs may vary
    •  Focus on the needs of research communication and cultural heritage organizations, but aim for generality
  • Why?
    … because lots of projects and services are doing synchronization but have to resort to ad hoc, case-by-case approaches!
    •  The project team is involved with projects that need this
    •  Experience with OAI-PMH: widely used in repositories, but
      o  XML metadata only
      o  Attempts at synchronizing actual content via OAI-PMH (complex object formats, dc:identifier) were not successful
      o  Web technology has moved on since 1999
    •  Devise a shared solution for data, metadata, and linked data?
  • Use Cases – The Basics
  • Use Cases – More
  • Out of Scope (For Now)
    •  Bidirectional synchronization
    •  Destination-defined selective synchronization (query)
    •  Bulk URI migration
  • Use Case: arXiv Mirroring
    •  1M article versions; ~800 per day created or updated, at 8 PM US Eastern Time
    •  Metadata and full text for each article
    •  Accuracy important
    •  Want a low barrier for others to use
    •  Looking for a more general solution than the current homebrew mirroring (running with minor modifications since 1994!) and occasional rsync (filesystem-layout specific, authentication issues)
  • Use Case: DBpedia Live Duplication
    •  Average of 2 updates per second
    •  Want low latency => need a push technology
  • ResourceSync
    •  ResourceSync: What & Why?
    •  Problem Perspective & Conceptual Approach
    •  Technical Details
    •  Q&A
  • ResourceSync Problem
    •  Consideration:
      o  Source (server) A has resources that change over time: they get created, modified, deleted
      o  Destinations (servers) X, Y, and Z leverage (some) resources of Source A
    •  Problem:
      o  Destinations want to keep in step with the resource changes at Source A: resource synchronization
    •  Goal:
      o  Design an approach for resource synchronization, aligned with the Web Architecture, that has a fair chance of adoption by different communities
      o  The approach must scale better than recurrent HTTP HEAD/GET on resources
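To make the scaling goal concrete, here is back-of-the-envelope arithmetic using the figures from the arXiv use case (~1M article versions, ~800 created or updated per day). The assumptions are ours, not the slides': each article version counts as one resource, the destination polls once a day, the change list is fetched once a day, and deletions (which need no GET) are ignored.

```python
def daily_requests_polling(num_resources: int, changes_per_day: int, polls_per_day: int = 1) -> int:
    """HEAD every resource on each poll, then GET the ones that changed."""
    return num_resources * polls_per_day + changes_per_day


def daily_requests_change_set(changes_per_day: int, change_set_fetches: int = 1) -> int:
    """Fetch the list of changes, then GET only the resources it names."""
    return change_set_fetches + changes_per_day


# Figures from the arXiv use case: ~1M resources, ~800 changes per day.
print(daily_requests_polling(1_000_000, 800))  # 1000800
print(daily_requests_change_set(800))          # 801
```

Under these assumptions, polling costs three orders of magnitude more requests per day than reading a list of changes, and the gap grows with collection size rather than with change volume.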
  • Destination: 3 Basic Synchronization Needs
    1.  Baseline synchronization – A destination must be able to perform an initial load or catch up with a source
        -  avoid out-of-band setup
    2.  Incremental synchronization – A destination must have some way to keep up to date with changes at a source
        -  subject to some latency; minimal: create/update/delete
        -  allow catching up after the destination has been offline
    3.  Audit – A destination should be able to determine whether it is synchronized with a source
        -  subject to some latency
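The audit need can be illustrated as a comparison of two inventories. This sketch assumes both sides can produce a map from resource URI to some fingerprint (a lastmod value or a content digest); the `audit` function and the `AuditReport` type are hypothetical names, not part of the ResourceSync draft.

```python
from typing import Dict, NamedTuple, Set


class AuditReport(NamedTuple):
    missing: Set[str]  # at the source but not at the destination
    stale: Set[str]    # at both, but the fingerprints differ
    extra: Set[str]    # at the destination only (e.g. deleted at the source)

    def in_sync(self) -> bool:
        return not (self.missing or self.stale or self.extra)


def audit(source: Dict[str, str], destination: Dict[str, str]) -> AuditReport:
    """Compare URI -> fingerprint maps taken from the source and the destination."""
    src, dst = set(source), set(destination)
    stale = {uri for uri in src & dst if source[uri] != destination[uri]}
    return AuditReport(missing=src - dst, stale=stale, extra=dst - src)


report = audit(
    {"http://example.org/a": "2012-08-20", "http://example.org/b": "2012-08-21"},
    {"http://example.org/a": "2012-08-20", "http://example.org/c": "2012-08-01"},
)
print(report.in_sync(), report)
```

Because the source inventory is itself only a snapshot, such an audit is "subject to some latency" exactly as the slide notes: a resource changed after the inventory was published will not show up as stale.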
  • Source Capability 1: Describing Content
    In order to advertise the resources that a source wants destinations to know about, it may describe them:
      o  Publish an inventory of resource URIs and possibly associated metadata
        -  Destination GETs the Content Description
        -  Destination GETs listed resources by their URI
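On the destination side, reading a Sitemap-based inventory needs nothing beyond an XML parser. A minimal sketch with Python's standard library, assuming the inventory is a plain Sitemap `urlset` with `loc` and optional `lastmod` elements; any ResourceSync extension elements are simply ignored here, and `parse_inventory` is an illustrative name.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

SM = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_inventory(xml_text: str) -> List[Tuple[str, Optional[str]]]:
    """Return (loc, lastmod) pairs; lastmod is None where an entry omits it."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SM):
        loc = url.findtext("sm:loc", namespaces=SM)
        if loc:  # skip malformed entries without a <loc>
            entries.append((loc.strip(), url.findtext("sm:lastmod", namespaces=SM)))
    return entries


INVENTORY = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.org/res1</loc><lastmod>2012-08-20T10:00:00Z</lastmod></url>
  <url><loc>http://example.org/res2</loc></url>
</urlset>"""

for loc, lastmod in parse_inventory(INVENTORY):
    print(loc, lastmod)  # a destination would now GET each loc
```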
  • Source Capability 2: Communicating Change Events
    In order to achieve lower latency, a source may communicate about changes to its resources:
      o  2.1. Change Set: Publish a list of recent change events (create, update, delete resource)
        -  Destination acts upon change events, e.g. GETs created/updated resources, removes deleted resources.
      o  2.2. Push Change Set: Push a list of recent change events (create, update, delete resource) towards (a) destination(s)
        -  Destination acts upon change events, e.g. GETs created/updated resources, removes deleted resources.
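Acting on change events, whether pulled (2.1) or pushed (2.2), amounts to a small dispatch loop. In this sketch the event labels `created`/`updated`/`deleted`, the in-memory `mirror`, and the injected `fetch` function are illustrative assumptions; a real destination would issue HTTP GETs and write to its own store.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class ChangeEvent:
    uri: str
    change: str  # "created", "updated" or "deleted" (illustrative labels)


def apply_changes(mirror: Dict[str, bytes], events: Iterable[ChangeEvent],
                  fetch: Callable[[str], bytes]) -> None:
    """Apply change events, in order, to a destination's local copy."""
    for event in events:
        if event.change in ("created", "updated"):
            mirror[event.uri] = fetch(event.uri)  # destination GETs the resource
        elif event.change == "deleted":
            mirror.pop(event.uri, None)  # destination removes its copy
        else:
            raise ValueError(f"unknown change type: {event.change}")


mirror = {"http://example.org/a": b"old", "http://example.org/b": b"old"}
apply_changes(
    mirror,
    [ChangeEvent("http://example.org/a", "updated"),
     ChangeEvent("http://example.org/c", "created"),
     ChangeEvent("http://example.org/b", "deleted")],
    fetch=lambda uri: f"content of {uri}".encode(),  # stand-in for an HTTP GET
)
print(sorted(mirror))
```

Applying events in order matters: a resource created and then deleted within the same change set should end up absent at the destination.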
  • Source Capability 3: Providing Access to Versions
    In order to allow a destination to catch up with missed changes, a source may support:
      o  3.1. Historical Change Sets: Provide access to change events that occurred prior to the ones listed in the current Change Set
      o  3.2. Historical Content: Provide access to prior resource versions
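Catching up through historical change sets can be pictured as walking back from the current change set until one reaches back to the destination's last sync time. The `ChangeSet` structure and its `previous` link are hypothetical stand-ins for however a source links its current change set to historical ones; timestamps are assumed to be ISO 8601 UTC strings in one uniform format, so plain string comparison orders them correctly.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Event = Tuple[str, str, str]  # (timestamp, uri, change type)


@dataclass
class ChangeSet:
    events: List[Event]             # oldest first
    previous: Optional[str] = None  # hypothetical link to the preceding, historical change set


def events_since(current: str, change_sets: Dict[str, ChangeSet], last_sync: str) -> List[Event]:
    """Collect events newer than last_sync, oldest first, walking back from the current change set."""
    collected: List[Event] = []
    name: Optional[str] = current
    while name is not None:
        change_set = change_sets[name]  # stands in for a GET on the change set's URI
        newer = [e for e in change_set.events if e[0] > last_sync]
        collected = newer + collected
        # Stop once this change set reaches back to (or before) the last sync.
        if change_set.events and change_set.events[0][0] <= last_sync:
            break
        name = change_set.previous
    return collected


CHANGE_SETS = {
    "cs3": ChangeSet([("2012-08-22T10:00:00Z", "http://example.org/a", "updated")], previous="cs2"),
    "cs2": ChangeSet([("2012-08-21T09:00:00Z", "http://example.org/b", "created"),
                      ("2012-08-21T18:00:00Z", "http://example.org/c", "deleted")], previous="cs1"),
    "cs1": ChangeSet([("2012-08-20T08:00:00Z", "http://example.org/a", "created")]),
}
missed = events_since("cs3", CHANGE_SETS, last_sync="2012-08-21T12:00:00Z")
print(missed)
```

In this example the walk stops at `cs2`, so the older `cs1` is never fetched; a destination only pays for the history it actually missed.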
  • Source Capability 4: Transferring Content
    By default, content is transferred in response to a GET issued by a destination against the URI of a source’s resource. But a source may support additional mechanisms:
      o  4.1. Dump: Publish a package of resource representations and necessary metadata
        -  Destination GETs the Dump
        -  Destination unpacks the Dump
      o  4.2. Alternate Content Transfer: Support alternative mechanisms to optimize getting content, e.g. content via a mirror site, or only the changes rather than the entire changed resource.
  • Source: Advertise Capabilities
    A source needs to advertise the capabilities it supports to allow a destination to discover them
    •  Some capabilities may be provided by a third party, not the source itself
      o  e.g. Historical Change Sets, Historical Content
      o  But the source should still make those third-party capabilities discoverable (a matter of trust)
  • ResourceSync
    •  ResourceSync: What & Why?
    •  Problem Perspective & Conceptual Approach
    •  Technical Details
    •  Q&A
  • So Many Choices
    Existing push and pull approaches: DSNotify, OAI-PMH, rsync, Crawl, OAI-ORE, RDFsync, WebDAV Collection Synchronization, XMPP, Atom, SWORD, AtomPub, Sitemap, RSS, SPARQLpush, PubSubHubbub, SDShare
  • A Framework Based on Sitemaps
    •  Modular framework allowing selective deployment
    •  Sitemap is the core component throughout the framework
      o  Introduce extension elements and attributes:
        -  In the ResourceSync namespace (rs:) to accommodate synchronization needs
        -  In the XHTML namespace (xhtml:) mainly to accommodate discovery needs
      o  Reuse the Sitemap format for Change Sets (both current and historical) and for the manifest in a Dump
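A sketch of what reusing the Sitemap format for a Change Set could look like, generated with Python's `xml.etree.ElementTree`. The Sitemap namespace and the `urlset`/`url`/`loc`/`lastmod` elements are standard; the `rs` namespace URI and the `rs:type` attribute on `lastmod` are placeholders chosen for illustration, not the draft's actual vocabulary.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

SM_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS_NS = "http://example.org/hypothetical-rs-namespace"  # placeholder, not the draft's namespace URI

ET.register_namespace("", SM_NS)   # serialize Sitemap elements without a prefix
ET.register_namespace("rs", RS_NS)


def build_change_set(changes: Iterable[Tuple[str, str, str]]) -> str:
    """changes: (uri, datetime, change type) triples -> Sitemap-format XML string."""
    urlset = ET.Element(f"{{{SM_NS}}}urlset")
    for uri, when, change in changes:
        url = ET.SubElement(urlset, f"{{{SM_NS}}}url")
        ET.SubElement(url, f"{{{SM_NS}}}loc").text = uri
        lastmod = ET.SubElement(url, f"{{{SM_NS}}}lastmod")
        lastmod.text = when
        lastmod.set(f"{{{RS_NS}}}type", change)  # hypothetical extension attribute
    return ET.tostring(urlset, encoding="unicode")


change_set_xml = build_change_set([("http://example.org/res1", "2012-08-22T10:00:00Z", "updated")])
print(change_set_xml)
```

The appeal of this design is that a consumer that knows only plain Sitemaps still sees a valid `urlset` and simply ignores the extra attribute.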
  • Source Capabilities – Destination Needs
  • Sitemap with Added Datetime
  • Change Types: Extend lastmod, Use expires
  • Sitemap with lastmod and expires
  • Sitemap Discovery via robots.txt
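Sitemap discovery through robots.txt relies on the standard `Sitemap:` directive. Python's `urllib.robotparser` (3.8 and later) already collects these lines, so a discovery helper stays small; the function name `discover_sitemaps` and the example URLs are illustrative.

```python
from typing import List
from urllib.robotparser import RobotFileParser


def discover_sitemaps(robots_txt: str) -> List[str]:
    """Return the URLs announced by 'Sitemap:' lines in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() returns None when there are none


ROBOTS = """User-agent: *
Disallow: /private/
Sitemap: http://example.org/sitemap.xml
"""
print(discover_sitemaps(ROBOTS))  # ['http://example.org/sitemap.xml']
```

The `Sitemap:` directive is independent of any `User-agent` group, which is why the parser records it wherever it appears in the file.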
  • Source Capabilities – Destination Needs
  • Change Set: An rs-Typed Sitemap
  • More rs Extension Elements
  • Change Set with rs and xhtml Extensions
  • Change Set Discovery via Sitemap
  • Pushing Change Sets via XMPP PubSub
    XMPP Publish-Subscribe: Client to Subscription Service, Subscription Service to Client(s) communication
    •  One of the XMPP (Extensible Messaging and Presence Protocol) extensions: http://xmpp.org/extensions/xep-0060.html
    •  Apple Notifications based on XMPP PubSub
    •  Available tools, see http://xmpp.org/about-xmpp/technology-overview/pubsub/#impl-client
      o  XMPP servers with PubSub support:
        -  ejabberd, OpenFire, Tigase, SleekXMPP
      o  XMPP libraries with PubSub support:
        -  Strophe (C, JavaScript), XMPP4R (Ruby), SleekXMPP (Python), PubSub Client (Python)
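The publish-subscribe pattern behind XMPP PubSub can be illustrated without an XMPP server. The class below is an in-process stand-in, not an XMPP client: a named node, a list of subscriber callbacks, and a publish step that pushes the payload (here, a change set) to every subscriber. The class and node names are hypothetical; a real deployment would use one of the servers and libraries listed on the slide.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str], None]


class PubSubService:
    """In-memory stand-in for a PubSub service: nodes, subscribers, publish."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, node: str, handler: Handler) -> None:
        self._subscribers[node].append(handler)

    def publish(self, node: str, payload: str) -> int:
        """Push payload to every subscriber of node; return how many received it."""
        handlers = self._subscribers[node]
        for handler in handlers:
            handler(payload)
        return len(handlers)


service = PubSubService()
received: List[str] = []
service.subscribe("resourcesync-changes", received.append)                         # destination X
service.subscribe("resourcesync-changes", lambda p: received.append(p.upper()))    # destination Y
delivered = service.publish("resourcesync-changes", "<urlset>...</urlset>")
print(delivered, received)
```

The point of the pattern is that the source publishes once and the service fans out, so latency no longer depends on how often each destination polls.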
  • Pushing Change Sets via XMPP PubSub
  • Change Set via XMPP
  • Push Change Set Discovery via Sitemap
  • Source Capabilities – Destination Needs
  • Discovering a Historical Change Set via a Current Change Set
  • Source Capabilities – Destination Needs
  • Discovering Historical Content – Link to Version Resource
  • Memento Intermezzo: http://www.mementoweb.org/
  • Original Resources and Mementos
  • Bridge from Present to Past
  • Bridge from Past to Present
  • Memento Framework
  • Discovering Historical Content – Link to Memento TimeGate
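With Memento, a client asks a TimeGate for the version of a resource closest to a given datetime by sending an `Accept-Datetime` request header in HTTP-date (RFC 1123) format. The sketch below only builds the request with `urllib.request` and does not send it; the TimeGate URI is a placeholder and `timegate_request` is an illustrative name.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request


def timegate_request(timegate_uri: str, when: datetime) -> Request:
    """Build, but do not send, a request for the version closest to `when`.

    `when` should be timezone-aware; it is converted to UTC for the HTTP-date format.
    """
    accept_datetime = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(timegate_uri, headers={"Accept-Datetime": accept_datetime})


req = timegate_request(
    "http://example.org/timegate/http://example.org/res1",  # placeholder TimeGate URI
    datetime(2012, 8, 22, tzinfo=timezone.utc),
)
print(req.get_header("Accept-datetime"))  # Wed, 22 Aug 2012 00:00:00 GMT
```

Note that `urllib.request` normalizes stored header names with `str.capitalize()`, hence the `Accept-datetime` spelling in the lookup; HTTP header names are case-insensitive on the wire.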
  • Source Capabilities – Destination Needs
  • Dump
    •  Two formats currently under discussion:
      o  Format based on ZIP:
        -  Package content
        -  Add manifest (manifest.xml) expressed in Sitemap format
        -  ZIP it up
      o  WARC files as used by the web archiving community
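The three ZIP-based Dump steps (package content, add a Sitemap-format `manifest.xml`, zip it up) can be sketched with `zipfile` and `ElementTree`. The `rs` namespace URI, the shape of the `rs:path` element, and the host-plus-path mapping in `uri_to_path` are assumptions made for illustration; the draft's own `rs:path` semantics may differ.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Dict
from urllib.parse import urlsplit

SM_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS_NS = "http://example.org/hypothetical-rs-namespace"  # placeholder, not the draft's namespace URI
ET.register_namespace("", SM_NS)
ET.register_namespace("rs", RS_NS)


def uri_to_path(uri: str) -> str:
    """Hypothetical mapping: http://example.org/a/b -> example.org/a/b."""
    parts = urlsplit(uri)
    return f"{parts.netloc}/{parts.path.lstrip('/') or 'index'}"


def build_dump(resources: Dict[str, bytes]) -> bytes:
    """Package representations plus a Sitemap-format manifest.xml into a ZIP."""
    urlset = ET.Element(f"{{{SM_NS}}}urlset")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        for uri, content in resources.items():
            path = uri_to_path(uri)
            zf.writestr(path, content)  # 1. package content
            url = ET.SubElement(urlset, f"{{{SM_NS}}}url")
            ET.SubElement(url, f"{{{SM_NS}}}loc").text = uri
            ET.SubElement(url, f"{{{RS_NS}}}path").text = path  # hypothetical rs:path element
        zf.writestr("manifest.xml", ET.tostring(urlset, encoding="utf-8"))  # 2. add manifest
    return buffer.getvalue()  # 3. the zipped Dump


def read_manifest(dump: bytes) -> Dict[str, str]:
    """Destination side: map each resource URI to its file path inside the Dump."""
    with zipfile.ZipFile(io.BytesIO(dump)) as zf:
        root = ET.fromstring(zf.read("manifest.xml"))
    return {
        url.findtext(f"{{{SM_NS}}}loc"): url.findtext(f"{{{RS_NS}}}path")
        for url in root.findall(f"{{{SM_NS}}}url")
    }


dump = build_dump({"http://example.org/a": b"A", "http://example.org/b/c": b"BC"})
print(read_manifest(dump))
```

Recording the URI-to-path mapping in the manifest means a destination never has to guess how the source laid out files, which is the filesystem-layout problem the arXiv use case mentions for rsync.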
  • Mapping URI to File Path with rs:path
  • Manifest (manifest.xml) Expressed in Sitemap Format
  • Dump Discovery via Sitemap
  • Source Capabilities – Destination Needs
  • Alternate Location
  • Alternate Protocol, e.g. Obtain Changes Only
  • Timeline
    •  August 2012
      o  First draft spec shared for feedback with ResourceSync team
    •  September 2012
      o  In-person meeting of ResourceSync team
      o  Revise spec, conduct experiments
      o  Solicit broad feedback
      o  Paper in D-Lib Magazine
    •  December 2012 – Finalize specification (?)
  • Pointers
    •  First draft spec: http://www.openarchives.org/rs/0.1/resourcesync
    •  Simulator code on GitHub: http://github.org/resync/simulator
    •  NISO workspace: http://www.niso.org/workrooms/resourcesync/
    •  List for public comment coming soon