ResourceSync: Leveraging Sitemaps for Resource Synchronization
WWW 2013, Rio de Janeiro, May 17th
Bernhard Haslhofer | University of Vienna
Simeon Warner | Cornell University
Carl Lagoze | University of Michigan
Martin Klein, Robert Sanderson | Los Alamos National Labs
Michael L. Nelson | Old Dominion University
Herbert van de Sompel | Los Alamos National Labs
http://www.openarchives.org/rs/
ResourceSync
• What and Why?
• Synchronization Scenarios
• ResourceSync Basics
• Demos
• Status and Next Steps
What?
• A framework for synchronizing Web resources from a Source to a Destination
$ resync http://example.com
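The Source-to-Destination idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the resync tool's implementation: it assumes the Source publishes its resources as a standard Sitemap `<urlset>` with `<loc>` and `<lastmod>` entries (suggested by the talk title), and the function names `parse_resource_list` and `plan_sync`, the example URIs, and the dates are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the standard Sitemap protocol (sitemaps.org).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_resource_list(xml_text):
    """Return {uri: lastmod} for every <url> entry in a Sitemap <urlset>."""
    root = ET.fromstring(xml_text)
    resources = {}
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        if loc:
            resources[loc.strip()] = (lastmod or "").strip()
    return resources


def plan_sync(source, destination):
    """Compare Source and Destination state; both are {uri: lastmod} dicts.

    A resource counts as updated when its lastmod strings differ.
    """
    created = sorted(uri for uri in source if uri not in destination)
    deleted = sorted(uri for uri in destination if uri not in source)
    updated = sorted(
        uri for uri in source
        if uri in destination and source[uri] != destination[uri]
    )
    return {"created": created, "updated": updated, "deleted": deleted}


# Hypothetical Source listing (example.com URIs are placeholders).
SOURCE_XML = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/a</loc><lastmod>2013-05-01</lastmod></url>
  <url><loc>http://example.com/b</loc><lastmod>2013-05-10</lastmod></url>
  <url><loc>http://example.com/c</loc><lastmod>2013-05-12</lastmod></url>
</urlset>
"""

# Hypothetical state already held by the Destination.
DESTINATION_STATE = {
    "http://example.com/a": "2013-05-01",
    "http://example.com/b": "2013-05-02",
    "http://example.com/d": "2013-04-01",
}

if __name__ == "__main__":
    plan = plan_sync(parse_resource_list(SOURCE_XML), DESTINATION_STATE)
    for action, uris in plan.items():
        print(action, uris)
```

A Destination would then fetch the created and updated URIs over plain HTTP and drop the deleted ones. Listing resources in a static document keeps the Source side simple, which fits the low adoption barrier the talk asks for.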
Why?
• rsync: filesystem sync, but not Web
• OAI-PMH: metadata, but not resources
• WebDAV: extends HTTP, requires server installation at source
• ...
… because lots of projects and services are doing synchronization but rely on ad-hoc solutions!
arxiv.org mirroring
• 2.4M resources (PDF, metadata, LaTeX src)
• ~800/day created or updated
• uses homebrew mirroring since 1994 (!)
• look for more general solution to support independent destinations
Wikipedia
• 1.4 updates / sec
• many dependent services reusing Wikipedia content (e.g., DBpedia, Freebase, etc.)
• harvest articles via OAI-PMH, retrieve changes via IRC, download dumps
data.europeana.eu
• aggregates metadata from >200 data providers in Europe
• 10 largest providers contribute 80%
• >190 providers contribute 20%
Design Guidelines
• Sync small websites / repositories (few resources) but also large data collections (millions of resources)
• Support low change frequency (weeks / months) to high change frequency (seconds) sources
• Low adoption barrier!
ResourceSync
• What and Why?
• Synchronization Scenarios
• ResourceSync Basics Walkthrough
• Demos
• Status and Next Steps
Status
• Beta spec (v.0.6) for public comment: http://www.openarchives.org/rs/0.6/resourcesync
• Tool development started
• Separate documents for archiving and push deployments
Next Steps
• Continue tool development & deployment
• Collect
  • public comments email@example.com
  • implementation issues on https://github.com/resync/resync/issues
• Version 0.9 to be released in Summer 2013
• Version 1.0 in fall 2013 (NISO standard)
Thanks!
@bhaslhofer
http://slideshare.net/bhaslhofer
http://firstname.lastname@example.org