ResourceSync: Leveraging Sitemaps
for Resource Synchronization
WWW 2013, Rio de Janeiro, May 17th
Bernhard Haslhofer | University of Vienna
Simeon Warner | Cornell University
Carl Lagoze | University of Michigan
Martin Klein, Robert Sanderson | Los Alamos National Labs
Michael L. Nelson | Old Dominion University
Herbert van de Sompel | Los Alamos National Labs
http://www.openarchives.org/rs/
ResourceSync
• What and Why?
• Synchronization Scenarios
• ResourceSync Basics
• Demos
• Status and Next Steps
What?
• A framework for synchronizing Web
resources from a Source to a Destination
$ resync http://example.com
Why?
• rsync: filesystem sync, but not Web
• OAI-PMH: metadata, but not resources
• WebDAV: extends HTTP, requires server installation at source
• ...
… because lots of projects and services are doing
synchronization but rely on ad-hoc solutions!
ResourceSync
• What and Why?
• Synchronization Scenarios
• ResourceSync Basics
• Demos
• Status and Next Steps
arxiv.org mirroring
• 2.4M resources (PDF, metadata, LaTeX source)
• ~800 created or updated per day
• uses homebrew mirroring since 1994 (!)
• looking for a more general solution to support independent destinations
Wikipedia
• 1.4 updates / sec
• many dependent services reuse Wikipedia content (e.g., DBpedia, Freebase)
• they harvest articles via OAI-PMH, retrieve changes via IRC, or download dumps
data.europeana.eu
• aggregates metadata from >200 data providers in Europe
• 10 largest providers contribute 80%
• >190 providers contribute 20%
Design Guidelines
• Sync small websites / repositories (few resources) but also large data collections (millions of resources)
• Support sources with low change frequency (weeks / months) up to high change frequency (seconds)
• Low adoption barrier!
ResourceSync
• What and Why?
• Synchronization Scenarios
• ResourceSync Basics
• Demos
• Status and Next Steps
Resource List
Source (XML Sitemap):
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist"
         modified="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
  </url>
</urlset>
Destination:
$ resync -b http://example.com
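As a rough illustration of the Source side, the following sketch generates a Resource List like the one on this slide with Python's standard library. The function name `build_resource_list` is hypothetical and is not part of the resync library; only the element and attribute names are taken from the slide.

```python
import xml.etree.ElementTree as ET

SM_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS_NS = "http://www.openarchives.org/rs/terms/"

# Serialize with the same prefixes the slide uses (default namespace + rs:).
ET.register_namespace("", SM_NS)
ET.register_namespace("rs", RS_NS)


def build_resource_list(uris, modified):
    """Return a Resource List (an XML Sitemap) as a string.

    Hypothetical helper for illustration only.
    """
    urlset = ET.Element(f"{{{SM_NS}}}urlset")
    # Document-level metadata: which capability this Sitemap implements.
    ET.SubElement(
        urlset,
        f"{{{RS_NS}}}md",
        {"capability": "resourcelist", "modified": modified},
    )
    for uri in uris:
        url = ET.SubElement(urlset, f"{{{SM_NS}}}url")
        ET.SubElement(url, f"{{{SM_NS}}}loc").text = uri
    return ET.tostring(urlset, encoding="unicode")


xml_text = build_resource_list(
    ["http://example.com/res1", "http://example.com/res2"],
    "2013-01-03T09:00:00Z",
)
print(xml_text)
```

Because the result is an ordinary Sitemap, a Source that already publishes Sitemaps would only need to add the `rs:md` element.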
Resource List
Source:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist"
         modified="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6"/>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2013-01-02T14:00:00Z</lastmod>
    <rs:md hash="md5:1e0d5cb8ef6ba40c99b14c0237be735e"/>
  </url>
</urlset>
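The `lastmod` and `hash` values let a Destination check whether a local copy is current. The sketch below, assuming only Python's standard library, parses the Resource List from this slide and compares content against an `md5:` hash. The helper names `parse_resource_list` and `matches_hash` are hypothetical, not part of the resync library.

```python
import hashlib
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

RESOURCE_LIST = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist" modified="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6"/>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2013-01-02T14:00:00Z</lastmod>
    <rs:md hash="md5:1e0d5cb8ef6ba40c99b14c0237be735e"/>
  </url>
</urlset>
"""


def parse_resource_list(xml_text):
    """Map each resource URI to its (lastmod, hash) pair."""
    root = ET.fromstring(xml_text)
    entries = {}
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        md = url.find("rs:md", NS)
        entries[loc] = (lastmod, md.get("hash") if md is not None else None)
    return entries


def matches_hash(content, hash_attr):
    """Check bytes against a hash attribute of the form 'md5:<hexdigest>'."""
    algorithm, _, expected = hash_attr.partition(":")
    if algorithm != "md5":
        raise ValueError(f"unsupported hash algorithm: {algorithm}")
    return hashlib.md5(content).hexdigest() == expected


entries = parse_resource_list(RESOURCE_LIST)
for uri, (lastmod, hash_attr) in entries.items():
    print(uri, lastmod, hash_attr)
```

A Destination could skip downloading any resource whose local bytes already satisfy `matches_hash`.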
Change List
Source (XML Sitemap):
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist"
         modified="2013-01-03T11:00:00Z"/>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated"/>
  </url>
  <url>
    <loc>http://example.com/res3</loc>
    <lastmod>2013-01-02T18:00:00Z</lastmod>
    <rs:md change="deleted"/>
  </url>
</urlset>
Destination:
$ resync -b http://example.com
$ resync -i http://example.com
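One way a Destination might act on a Change List is sketched below. The local state is modeled as a plain dict from URI to `lastmod`, which is an assumption made for illustration; the function names are hypothetical and do not describe how the resync client works internally. The slide shows the change types `updated` and `deleted`; handling `created` the same way as `updated` is also an assumption here.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

CHANGE_LIST = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" modified="2013-01-03T11:00:00Z"/>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated"/>
  </url>
  <url>
    <loc>http://example.com/res3</loc>
    <lastmod>2013-01-02T18:00:00Z</lastmod>
    <rs:md change="deleted"/>
  </url>
</urlset>
"""


def parse_change_list(xml_text):
    """Return (uri, lastmod, change) tuples in document order."""
    root = ET.fromstring(xml_text)
    changes = []
    for url in root.findall("sm:url", NS):
        md = url.find("rs:md", NS)
        changes.append((
            url.findtext("sm:loc", namespaces=NS),
            url.findtext("sm:lastmod", namespaces=NS),
            md.get("change") if md is not None else None,
        ))
    return changes


def apply_changes(state, changes):
    """Update the uri -> lastmod dict; return (to_fetch, to_delete)."""
    to_fetch, to_delete = [], []
    for uri, lastmod, change in changes:
        if change == "deleted":
            if uri in state:
                del state[uri]
                to_delete.append(uri)
        elif change in ("created", "updated"):
            state[uri] = lastmod
            to_fetch.append(uri)
    return to_fetch, to_delete


# Hypothetical local state left behind by an earlier baseline sync.
state = {
    "http://example.com/res1": "2013-01-02T13:00:00Z",
    "http://example.com/res2": "2013-01-02T12:00:00Z",
    "http://example.com/res3": "2013-01-02T12:30:00Z",
}
to_fetch, to_delete = apply_changes(state, parse_change_list(CHANGE_LIST))
print("fetch:", to_fetch)
print("delete:", to_delete)
```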
Resource Dump
Source (XML Sitemap):
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcedump"
         modified="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/resourcedump.zip</loc>
    <lastmod>2013-01-03T09:00:00Z</lastmod>
  </url>
</urlset>
Resource Dump
http://example.com/resourcedump.zip
|- manifest.xml
|- resources
   |- res1
   |- res2
Resource Dump Manifest
manifest.xml (XML Sitemap):
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcedump-manifest"
         modified="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-03T03:00:00Z</lastmod>
    <rs:md hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6"
           path="/resources/res1"/>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2013-01-03T04:00:00Z</lastmod>
    <rs:md hash="md5:1e0d5cb8ef6ba40c99b14c0237be735e"
           path="/resources/res2"/>
  </url>
</urlset>
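The manifest ties each file inside the ZIP back to its original URI through the `path` attribute. A minimal sketch of how a Destination might unpack a dump follows, using only Python's standard library. The function `read_dump` and the demo content are hypothetical; the demo builds a tiny in-memory ZIP with a freshly computed hash rather than using the hash values from the slide.

```python
import hashlib
import io
import xml.etree.ElementTree as ET
import zipfile

SM_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS_NS = "http://www.openarchives.org/rs/terms/"
NS = {"sm": SM_NS, "rs": RS_NS}


def read_dump(zip_file):
    """Return {uri: content} for every resource listed in manifest.xml."""
    resources = {}
    with zipfile.ZipFile(zip_file) as zf:
        root = ET.fromstring(zf.read("manifest.xml"))
        for url in root.findall("sm:url", NS):
            uri = url.findtext("sm:loc", namespaces=NS)
            md = url.find("rs:md", NS)
            if md is None or md.get("path") is None:
                continue  # entry does not point into the ZIP
            # Manifest paths start with "/", ZIP member names do not.
            content = zf.read(md.get("path").lstrip("/"))
            algorithm, _, digest = md.get("hash", "").partition(":")
            if algorithm == "md5" and hashlib.md5(content).hexdigest() != digest:
                raise ValueError(f"hash mismatch for {uri}")
            resources[uri] = content
    return resources


def make_demo_dump():
    """Build a small in-memory dump (made-up content, for illustration)."""
    body = b"hello"
    manifest = (
        f'<urlset xmlns="{SM_NS}" xmlns:rs="{RS_NS}">'
        '<rs:md capability="resourcedump-manifest"'
        ' modified="2013-01-03T09:00:00Z"/>'
        "<url><loc>http://example.com/res1</loc>"
        f'<rs:md hash="md5:{hashlib.md5(body).hexdigest()}"'
        ' path="/resources/res1"/>'
        "</url></urlset>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("manifest.xml", manifest)
        zf.writestr("resources/res1", body)
    buffer.seek(0)
    return buffer


print(read_dump(make_demo_dump()))
```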
Capability List
Source (XML Sitemap):
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln href="http://example.com/info-about-source.xml"
         rel="describedby"
         type="application/xml"/>
  <rs:md capability="capabilitylist"
         modified="2013-01-02T14:00:00Z"/>
  <url>
    <loc>http://example.com/dataset1/resourcelist.xml</loc>
    <rs:md capability="resourcelist"/>
  </url>
  <url>
    <loc>http://example.com/dataset1/resourcedump.xml</loc>
    <rs:md capability="resourcedump"/>
  </url>
  <url>
    <loc>http://example.com/dataset1/changelist.xml</loc>
    <rs:md capability="changelist"/>
  </url>
</urlset>
Destination:
$ resync -x http://example.com
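A Destination can use the Capability List to discover which documents a Source offers. The sketch below parses the list from this slide into a dict. The selection policy in `pick_entry_point` (prefer a Change List when a local copy exists, otherwise a Resource Dump, falling back to the Resource List) is purely an illustrative assumption and is not prescribed by the slides.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

CAPABILITY_LIST = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln href="http://example.com/info-about-source.xml"
         rel="describedby" type="application/xml"/>
  <rs:md capability="capabilitylist" modified="2013-01-02T14:00:00Z"/>
  <url>
    <loc>http://example.com/dataset1/resourcelist.xml</loc>
    <rs:md capability="resourcelist"/>
  </url>
  <url>
    <loc>http://example.com/dataset1/resourcedump.xml</loc>
    <rs:md capability="resourcedump"/>
  </url>
  <url>
    <loc>http://example.com/dataset1/changelist.xml</loc>
    <rs:md capability="changelist"/>
  </url>
</urlset>
"""


def parse_capability_list(xml_text):
    """Map capability name -> URL of the document that implements it."""
    root = ET.fromstring(xml_text)
    capabilities = {}
    for url in root.findall("sm:url", NS):
        md = url.find("rs:md", NS)
        if md is not None and md.get("capability"):
            capabilities[md.get("capability")] = url.findtext(
                "sm:loc", namespaces=NS)
    return capabilities


def pick_entry_point(capabilities, has_local_copy):
    """Hypothetical policy: choose which document to fetch first."""
    if has_local_copy:
        order = ["changelist", "resourcelist"]
    else:
        order = ["resourcedump", "resourcelist"]
    for name in order:
        if name in capabilities:
            return name, capabilities[name]
    raise LookupError("Source advertises no usable capability")


capabilities = parse_capability_list(CAPABILITY_LIST)
print(pick_entry_point(capabilities, has_local_copy=False))
print(pick_entry_point(capabilities, has_local_copy=True))
```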
Large Resource Lists
Source:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
              xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist"
         modified="2013-01-03T09:00:00Z"/>
  <sitemap>
    <loc>http://example.com/resourcelist-part2.xml</loc>
    <lastmod>2013-01-03T09:00:00Z</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://example.com/resourcelist-part1.xml</loc>
    <lastmod>2013-01-03T09:00:00Z</lastmod>
  </sitemap>
</sitemapindex>
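A large Resource List is split into parts that a Sitemap index points to. The following sketch walks such an index and collects the resource URIs from every part. The `fetch` argument is a hypothetical callable (URL in, XML text out) so the example runs without network access, and the contents of the two parts are made up for the demo.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

INDEX = """\
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
              xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist" modified="2013-01-03T09:00:00Z"/>
  <sitemap>
    <loc>http://example.com/resourcelist-part2.xml</loc>
    <lastmod>2013-01-03T09:00:00Z</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://example.com/resourcelist-part1.xml</loc>
    <lastmod>2013-01-03T09:00:00Z</lastmod>
  </sitemap>
</sitemapindex>
"""

# Made-up part documents standing in for what the Source would serve.
PART_TEMPLATE = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>{uri}</loc></url></urlset>"
)
PARTS = {
    "http://example.com/resourcelist-part1.xml":
        PART_TEMPLATE.format(uri="http://example.com/res1"),
    "http://example.com/resourcelist-part2.xml":
        PART_TEMPLATE.format(uri="http://example.com/res2"),
}


def list_parts(index_xml):
    """Return the URLs of all parts named in a Sitemap index."""
    root = ET.fromstring(index_xml)
    return [s.findtext("sm:loc", namespaces=NS)
            for s in root.findall("sm:sitemap", NS)]


def collect_uris(index_xml, fetch):
    """Fetch every part via `fetch` and gather the resource URIs."""
    uris = []
    for part_url in list_parts(index_xml):
        part = ET.fromstring(fetch(part_url))
        uris.extend(u.findtext("sm:loc", namespaces=NS)
                    for u in part.findall("sm:url", NS))
    return uris


uris = collect_uris(INDEX, PARTS.__getitem__)
print(uris)
```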
Other Capabilities
ResourceSync
• What and Why?
• Synchronization Scenarios
• ResourceSync Basics Walkthrough
• Demos
• Status and Next Steps
Available code
• ResourceSync client and library (Python)
• ResourceSync source simulator
http://github.com/resync
Install resync client/library
$ git clone git://github.com/resync/resync.git
$ cd resync/
$ python setup.py build
$ sudo python setup.py install
or
$ sudo easy_install resync
or
$ sudo pip install resync
Install resync simulator
$ sudo easy_install tornado
$ git clone git://github.com/resync/simulator.git
$ cd simulator/
$ chmod u+x simulate-source
$ ./simulate-source
Run client against simulator
$ resync -b http://localhost:8888
$ resync -i http://localhost:8888
resync @ arxiv.org
$ resync -v --noauth http://resync.library.cornell.edu/arxiv-q-bio=/tmp/qbio http://resync.library.cornell.edu/arxiv=/tmp/arxiv
resync @ en.wikipedia.org
ResourceSync
• What and Why?
• Synchronization Scenarios
• ResourceSync Basics Walkthrough
• Demos
• Status and Next Steps
Status
• Beta spec (v.0.6) for public comment: http://www.openarchives.org/rs/0.6/resourcesync
• Tool development started
• Separate documents for archiving and push deployments
Next Steps
• Continue tool development & deployment
• Collect
  • public comments on resourcesync@googlegroups.com
  • implementation issues on https://github.com/resync/resync/issues
• Version 0.9 to be released in summer 2013
• Version 1.0 in fall 2013 (NISO standard)
Thanks!
@bhaslhofer
http://slideshare.net/bhaslhofer
http://openarchives.org/rs
resourcesync@googlegroups.com

More Related Content

Similar to ResourceSync: Leveraging Sitemaps for Resource Synchronization

LITA 2010: The Linked Library Data Cloud: it's time to stop think and start l...
LITA 2010: The Linked Library Data Cloud: it's time to stop think and start l...LITA 2010: The Linked Library Data Cloud: it's time to stop think and start l...
LITA 2010: The Linked Library Data Cloud: it's time to stop think and start l...
Ross Singer
 
Polyglot persistence for Java developers: time to move out of the relational ...
Polyglot persistence for Java developers: time to move out of the relational ...Polyglot persistence for Java developers: time to move out of the relational ...
Polyglot persistence for Java developers: time to move out of the relational ...
Chris Richardson
 
REST teori og praksis; REST in theory and practice
REST teori og praksis; REST in theory and practiceREST teori og praksis; REST in theory and practice
REST teori og praksis; REST in theory and practice
hamnis
 
Enterprise integration options with Smallworld
Enterprise integration options with SmallworldEnterprise integration options with Smallworld
Enterprise integration options with Smallworld
Peter Batty
 
Enterprise integration options with Smallworld
Enterprise integration options with SmallworldEnterprise integration options with Smallworld
Enterprise integration options with Smallworld
Peter Batty
 
Web 3 0
Web 3 0Web 3 0
elasticsearch
elasticsearchelasticsearch
elasticsearch
Satish Mohan
 
Wed roman tut_open_datapub
Wed roman tut_open_datapubWed roman tut_open_datapub
Wed roman tut_open_datapub
eswcsummerschool
 
RDFa: an introduction
RDFa: an introductionRDFa: an introduction
RDFa: an introduction
Kai Li
 
Intro to Semantic Web
Intro to Semantic WebIntro to Semantic Web
Intro to Semantic Web
Timea Turdean
 
Linked data: spreading data over the web
Linked data: spreading data over the webLinked data: spreading data over the web
Linked data: spreading data over the web
shellac
 
Java colombo-deep-dive-into-jax-rs
Java colombo-deep-dive-into-jax-rsJava colombo-deep-dive-into-jax-rs
Java colombo-deep-dive-into-jax-rs
Sagara Gunathunga
 
Semantic Technology In Oracle Database 12c
Semantic Technology In Oracle Database 12cSemantic Technology In Oracle Database 12c
Semantic Technology In Oracle Database 12c
Martin Toshev
 
RESTful Web Services with Spring MVC
RESTful Web Services with Spring MVCRESTful Web Services with Spring MVC
RESTful Web Services with Spring MVC
digitalsonic
 
Restful webservices
Restful webservicesRestful webservices
Restful webservices
Luqman Shareef
 
The RESTful Soa Datagrid with Oracle
The RESTful Soa Datagrid with OracleThe RESTful Soa Datagrid with Oracle
The RESTful Soa Datagrid with Oracle
Emiliano Pecis
 
Web Services
Web ServicesWeb Services
Web Services
Katrien Verbert
 
Drupal and the Semantic Web
Drupal and the Semantic WebDrupal and the Semantic Web
Drupal and the Semantic Web
Kristof Van Tomme
 
Oracle APEX URLs Untangled & SEOptimized
Oracle APEX URLs Untangled & SEOptimizedOracle APEX URLs Untangled & SEOptimized
Oracle APEX URLs Untangled & SEOptimized
Christian Rokitta
 
Unify Earth Observation products access with OpenSearch
Unify Earth Observation products access with OpenSearchUnify Earth Observation products access with OpenSearch
Unify Earth Observation products access with OpenSearch
Gasperi Jerome
 

Similar to ResourceSync: Leveraging Sitemaps for Resource Synchronization (20)

LITA 2010: The Linked Library Data Cloud: it's time to stop think and start l...
LITA 2010: The Linked Library Data Cloud: it's time to stop think and start l...LITA 2010: The Linked Library Data Cloud: it's time to stop think and start l...
LITA 2010: The Linked Library Data Cloud: it's time to stop think and start l...
 
Polyglot persistence for Java developers: time to move out of the relational ...
Polyglot persistence for Java developers: time to move out of the relational ...Polyglot persistence for Java developers: time to move out of the relational ...
Polyglot persistence for Java developers: time to move out of the relational ...
 
REST teori og praksis; REST in theory and practice
REST teori og praksis; REST in theory and practiceREST teori og praksis; REST in theory and practice
REST teori og praksis; REST in theory and practice
 
Enterprise integration options with Smallworld
Enterprise integration options with SmallworldEnterprise integration options with Smallworld
Enterprise integration options with Smallworld
 
Enterprise integration options with Smallworld
Enterprise integration options with SmallworldEnterprise integration options with Smallworld
Enterprise integration options with Smallworld
 
Web 3 0
Web 3 0Web 3 0
Web 3 0
 
elasticsearch
elasticsearchelasticsearch
elasticsearch
 
Wed roman tut_open_datapub
Wed roman tut_open_datapubWed roman tut_open_datapub
Wed roman tut_open_datapub
 
RDFa: an introduction
RDFa: an introductionRDFa: an introduction
RDFa: an introduction
 
Intro to Semantic Web
Intro to Semantic WebIntro to Semantic Web
Intro to Semantic Web
 
Linked data: spreading data over the web
Linked data: spreading data over the webLinked data: spreading data over the web
Linked data: spreading data over the web
 
Java colombo-deep-dive-into-jax-rs
Java colombo-deep-dive-into-jax-rsJava colombo-deep-dive-into-jax-rs
Java colombo-deep-dive-into-jax-rs
 
Semantic Technology In Oracle Database 12c
Semantic Technology In Oracle Database 12cSemantic Technology In Oracle Database 12c
Semantic Technology In Oracle Database 12c
 
RESTful Web Services with Spring MVC
RESTful Web Services with Spring MVCRESTful Web Services with Spring MVC
RESTful Web Services with Spring MVC
 
Restful webservices
Restful webservicesRestful webservices
Restful webservices
 
The RESTful Soa Datagrid with Oracle
The RESTful Soa Datagrid with OracleThe RESTful Soa Datagrid with Oracle
The RESTful Soa Datagrid with Oracle
 
Web Services
Web ServicesWeb Services
Web Services
 
Drupal and the Semantic Web
Drupal and the Semantic WebDrupal and the Semantic Web
Drupal and the Semantic Web
 
Oracle APEX URLs Untangled & SEOptimized
Oracle APEX URLs Untangled & SEOptimizedOracle APEX URLs Untangled & SEOptimized
Oracle APEX URLs Untangled & SEOptimized
 
Unify Earth Observation products access with OpenSearch
Unify Earth Observation products access with OpenSearchUnify Earth Observation products access with OpenSearch
Unify Earth Observation products access with OpenSearch
 

More from Bernhard Haslhofer

Decentralized Finance (DeFi) - Understanding Risks in an Emerging Financial P...
Decentralized Finance (DeFi) - Understanding Risks in an Emerging Financial P...Decentralized Finance (DeFi) - Understanding Risks in an Emerging Financial P...
Decentralized Finance (DeFi) - Understanding Risks in an Emerging Financial P...
Bernhard Haslhofer
 
Token Systems, Payment Channels, and Corporate Currencies
Token Systems, Payment Channels, and Corporate CurrenciesToken Systems, Payment Channels, and Corporate Currencies
Token Systems, Payment Channels, and Corporate Currencies
Bernhard Haslhofer
 
Can a blockchain solve the trust problem?
Can a blockchain solve the trust problem?Can a blockchain solve the trust problem?
Can a blockchain solve the trust problem?
Bernhard Haslhofer
 
Measurements in Cryptocurrency Networks
Measurements in Cryptocurrency NetworksMeasurements in Cryptocurrency Networks
Measurements in Cryptocurrency Networks
Bernhard Haslhofer
 
Post-Bitcoin Cryptocurrencies, Off-Chain Transaction Channels, and Cryptocur...
 Post-Bitcoin Cryptocurrencies, Off-Chain Transaction Channels, and Cryptocur... Post-Bitcoin Cryptocurrencies, Off-Chain Transaction Channels, and Cryptocur...
Post-Bitcoin Cryptocurrencies, Off-Chain Transaction Channels, and Cryptocur...
Bernhard Haslhofer
 
Insight Into Cryptocurrencies - Methods and Tools for Analyzing Blockchain-ba...
Insight Into Cryptocurrencies - Methods and Tools for Analyzing Blockchain-ba...Insight Into Cryptocurrencies - Methods and Tools for Analyzing Blockchain-ba...
Insight Into Cryptocurrencies - Methods and Tools for Analyzing Blockchain-ba...
Bernhard Haslhofer
 
O Bitcoin Where Art Thou? An Introduction to Cryptocurrency Analytics
O Bitcoin Where Art Thou? An Introduction to Cryptocurrency AnalyticsO Bitcoin Where Art Thou? An Introduction to Cryptocurrency Analytics
O Bitcoin Where Art Thou? An Introduction to Cryptocurrency Analytics
Bernhard Haslhofer
 
Mind the Gap - Data Science Meets Software Engineering
Mind the Gap - Data Science Meets Software EngineeringMind the Gap - Data Science Meets Software Engineering
Mind the Gap - Data Science Meets Software Engineering
Bernhard Haslhofer
 
GraphSense - Real-time Insight into Virtual Currency Ecosystems
GraphSense - Real-time Insight into Virtual Currency EcosystemsGraphSense - Real-time Insight into Virtual Currency Ecosystems
GraphSense - Real-time Insight into Virtual Currency Ecosystems
Bernhard Haslhofer
 
BITCOIN - De-anonymization and Money Laundering Detection Strategies
BITCOIN - De-anonymization and Money Laundering Detection StrategiesBITCOIN - De-anonymization and Money Laundering Detection Strategies
BITCOIN - De-anonymization and Money Laundering Detection Strategies
Bernhard Haslhofer
 
Bitcoin - Introduction, Technical Aspects and Ongoing Developments
Bitcoin - Introduction, Technical Aspects and Ongoing DevelopmentsBitcoin - Introduction, Technical Aspects and Ongoing Developments
Bitcoin - Introduction, Technical Aspects and Ongoing Developments
Bernhard Haslhofer
 
Maphub und Pelagios: Anwendung von Linked Data in den Digitalen Geisteswissen...
Maphub und Pelagios: Anwendung von Linked Data in den Digitalen Geisteswissen...Maphub und Pelagios: Anwendung von Linked Data in den Digitalen Geisteswissen...
Maphub und Pelagios: Anwendung von Linked Data in den Digitalen Geisteswissen...
Bernhard Haslhofer
 
The value of open data and the OpenGLAM network
The value of open data and the OpenGLAM networkThe value of open data and the OpenGLAM network
The value of open data and the OpenGLAM network
Bernhard Haslhofer
 
Offene Daten im Kulturbereich - Die pragmatische Perspektive
Offene Daten im Kulturbereich - Die pragmatische PerspektiveOffene Daten im Kulturbereich - Die pragmatische Perspektive
Offene Daten im Kulturbereich - Die pragmatische PerspektiveBernhard Haslhofer
 
Open Data - Principles and Techniques
Open Data - Principles and TechniquesOpen Data - Principles and Techniques
Open Data - Principles and Techniques
Bernhard Haslhofer
 
Semantic Tagging on Historical Maps
Semantic Tagging on Historical MapsSemantic Tagging on Historical Maps
Semantic Tagging on Historical Maps
Bernhard Haslhofer
 
The Story behind Maphub
The Story behind MaphubThe Story behind Maphub
The Story behind Maphub
Bernhard Haslhofer
 
OpenGLAM Intro @ OKFN.AT Meetup Graz
OpenGLAM Intro @ OKFN.AT Meetup GrazOpenGLAM Intro @ OKFN.AT Meetup Graz
OpenGLAM Intro @ OKFN.AT Meetup Graz
Bernhard Haslhofer
 
Semantic Tagging for old maps...and other things on the Web
Semantic Tagging for old maps...and other things on the WebSemantic Tagging for old maps...and other things on the Web
Semantic Tagging for old maps...and other things on the Web
Bernhard Haslhofer
 

More from Bernhard Haslhofer (20)

Decentralized Finance (DeFi) - Understanding Risks in an Emerging Financial P...
Decentralized Finance (DeFi) - Understanding Risks in an Emerging Financial P...Decentralized Finance (DeFi) - Understanding Risks in an Emerging Financial P...
Decentralized Finance (DeFi) - Understanding Risks in an Emerging Financial P...
 
Token Systems, Payment Channels, and Corporate Currencies
Token Systems, Payment Channels, and Corporate CurrenciesToken Systems, Payment Channels, and Corporate Currencies
Token Systems, Payment Channels, and Corporate Currencies
 
Can a blockchain solve the trust problem?
Can a blockchain solve the trust problem?Can a blockchain solve the trust problem?
Can a blockchain solve the trust problem?
 
Measurements in Cryptocurrency Networks
Measurements in Cryptocurrency NetworksMeasurements in Cryptocurrency Networks
Measurements in Cryptocurrency Networks
 
Post-Bitcoin Cryptocurrencies, Off-Chain Transaction Channels, and Cryptocur...
 Post-Bitcoin Cryptocurrencies, Off-Chain Transaction Channels, and Cryptocur... Post-Bitcoin Cryptocurrencies, Off-Chain Transaction Channels, and Cryptocur...
Post-Bitcoin Cryptocurrencies, Off-Chain Transaction Channels, and Cryptocur...
 
Insight Into Cryptocurrencies - Methods and Tools for Analyzing Blockchain-ba...
Insight Into Cryptocurrencies - Methods and Tools for Analyzing Blockchain-ba...Insight Into Cryptocurrencies - Methods and Tools for Analyzing Blockchain-ba...
Insight Into Cryptocurrencies - Methods and Tools for Analyzing Blockchain-ba...
 
O Bitcoin Where Art Thou? An Introduction to Cryptocurrency Analytics
O Bitcoin Where Art Thou? An Introduction to Cryptocurrency AnalyticsO Bitcoin Where Art Thou? An Introduction to Cryptocurrency Analytics
O Bitcoin Where Art Thou? An Introduction to Cryptocurrency Analytics
 
Mind the Gap - Data Science Meets Software Engineering
Mind the Gap - Data Science Meets Software EngineeringMind the Gap - Data Science Meets Software Engineering
Mind the Gap - Data Science Meets Software Engineering
 
GraphSense - Real-time Insight into Virtual Currency Ecosystems
GraphSense - Real-time Insight into Virtual Currency EcosystemsGraphSense - Real-time Insight into Virtual Currency Ecosystems
GraphSense - Real-time Insight into Virtual Currency Ecosystems
 
BITCOIN - De-anonymization and Money Laundering Detection Strategies
BITCOIN - De-anonymization and Money Laundering Detection StrategiesBITCOIN - De-anonymization and Money Laundering Detection Strategies
BITCOIN - De-anonymization and Money Laundering Detection Strategies
 
Bitcoin - Introduction, Technical Aspects and Ongoing Developments
Bitcoin - Introduction, Technical Aspects and Ongoing DevelopmentsBitcoin - Introduction, Technical Aspects and Ongoing Developments
Bitcoin - Introduction, Technical Aspects and Ongoing Developments
 
Maphub und Pelagios: Anwendung von Linked Data in den Digitalen Geisteswissen...
Maphub und Pelagios: Anwendung von Linked Data in den Digitalen Geisteswissen...Maphub und Pelagios: Anwendung von Linked Data in den Digitalen Geisteswissen...
Maphub und Pelagios: Anwendung von Linked Data in den Digitalen Geisteswissen...
 
The value of open data and the OpenGLAM network
The value of open data and the OpenGLAM networkThe value of open data and the OpenGLAM network
The value of open data and the OpenGLAM network
 
Things, not Strings
Things, not StringsThings, not Strings
Things, not Strings
 
Offene Daten im Kulturbereich - Die pragmatische Perspektive
Offene Daten im Kulturbereich - Die pragmatische PerspektiveOffene Daten im Kulturbereich - Die pragmatische Perspektive
Offene Daten im Kulturbereich - Die pragmatische Perspektive
 
Open Data - Principles and Techniques
Open Data - Principles and TechniquesOpen Data - Principles and Techniques
Open Data - Principles and Techniques
 
Semantic Tagging on Historical Maps
Semantic Tagging on Historical MapsSemantic Tagging on Historical Maps
Semantic Tagging on Historical Maps
 
The Story behind Maphub
The Story behind MaphubThe Story behind Maphub
The Story behind Maphub
 
OpenGLAM Intro @ OKFN.AT Meetup Graz
OpenGLAM Intro @ OKFN.AT Meetup GrazOpenGLAM Intro @ OKFN.AT Meetup Graz
OpenGLAM Intro @ OKFN.AT Meetup Graz
 
Semantic Tagging for old maps...and other things on the Web
Semantic Tagging for old maps...and other things on the WebSemantic Tagging for old maps...and other things on the Web
Semantic Tagging for old maps...and other things on the Web
 

Recently uploaded

“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
CAKE: Sharing Slices of Confidential Data on Blockchain
CAKE: Sharing Slices of Confidential Data on BlockchainCAKE: Sharing Slices of Confidential Data on Blockchain
CAKE: Sharing Slices of Confidential Data on Blockchain
Claudio Di Ciccio
 
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdfAI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
Techgropse Pvt.Ltd.
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
Wouter Lemaire
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
David Brossard
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 

Recently uploaded (20)

“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
CAKE: Sharing Slices of Confidential Data on Blockchain
CAKE: Sharing Slices of Confidential Data on BlockchainCAKE: Sharing Slices of Confidential Data on Blockchain
CAKE: Sharing Slices of Confidential Data on Blockchain
 
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdfAI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf

ResourceSync: Leveraging Sitemaps for Resource Synchronization

  • 1. ResourceSync: Leveraging Sitemaps for Resource Synchronization
      WWW 2013, Rio de Janeiro, May 17th
      Bernhard Haslhofer | University of Vienna
      Simeon Warner | Cornell University
      Carl Lagoze | University of Michigan
      Martin Klein, Robert Sanderson | Los Alamos National Labs
      Michael L. Nelson | Old Dominion University
      Herbert van de Sompel | Los Alamos National Labs
      http://www.openarchives.org/rs/
  • 2. ResourceSync
      • What and Why?
      • Synchronization Scenarios
      • ResourceSync Basics
      • Demos
      • Status and Next Steps
  • 3. What?
      • A framework for synchronizing Web resources from a Source to a Destination
      (Diagram labels: Web, sync)
      $ resync http://example.com
  • 4. Why?
      • rsync: filesystem sync, but not Web
      • OAI-PMH: metadata, but not resources
      • WebDAV: extends HTTP, requires server installation at source
      • ...
      … because lots of projects and services are doing synchronization but rely on ad-hoc solutions!
  • 5. ResourceSync
      • What and Why?
      • Synchronization Scenarios
      • ResourceSync Basics
      • Demos
      • Status and Next Steps
  • 6. arxiv.org mirroring
      • 2.4M resources (PDF, metadata, LaTeX source)
      • ~800/day created or updated
      • has used homebrew mirroring since 1994 (!)
      • looking for a more general solution to support independent destinations
  • 7. Wikipedia
      • 1.4 updates / sec
      • many dependent services reuse Wikipedia content (e.g., DBpedia, Freebase)
      • harvest articles via OAI-PMH, retrieve changes via IRC, download dumps
  • 8. data.europeana.eu
      • aggregates metadata from >200 data providers in Europe
      • 10 largest providers contribute 80%
      • >190 providers contribute 20%
  • 9. Design Guidelines
      • Sync small websites / repositories (few resources) but also large data collections (millions of resources)
      • Support sources with low change frequency (weeks / months) through high change frequency (seconds)
      • Low adoption barrier!
  • 10. ResourceSync
      • What and Why?
      • Synchronization Scenarios
      • ResourceSync Basics
      • Demos
      • Status and Next Steps
  • 11. Resource List
      Source (XML Sitemap):
      <?xml version="1.0" encoding="UTF-8"?>
      <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
              xmlns:rs="http://www.openarchives.org/rs/terms/">
        <rs:md capability="resourcelist" modified="2013-01-03T09:00:00Z"/>
        <url>
          <loc>http://example.com/res1</loc>
        </url>
        <url>
          <loc>http://example.com/res2</loc>
        </url>
      </urlset>
      Destination:
      $ resync -b http://example.com
  • 12. Resource List
      Source:
      <?xml version="1.0" encoding="UTF-8"?>
      <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
              xmlns:rs="http://www.openarchives.org/rs/terms/">
        <rs:md capability="resourcelist" modified="2013-01-03T09:00:00Z"/>
        <url>
          <loc>http://example.com/res1</loc>
          <lastmod>2013-01-02T13:00:00Z</lastmod>
          <rs:md hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6"/>
        </url>
        <url>
          <loc>http://example.com/res2</loc>
          <lastmod>2013-01-02T14:00:00Z</lastmod>
          <rs:md hash="md5:1e0d5cb8ef6ba40c99b14c0237be735e"/>
        </url>
      </urlset>
  • 13. Change List
      Source (XML Sitemap):
      <?xml version="1.0" encoding="UTF-8"?>
      <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
              xmlns:rs="http://www.openarchives.org/rs/terms/">
        <rs:md capability="changelist" modified="2013-01-03T11:00:00Z"/>
        <url>
          <loc>http://example.com/res2</loc>
          <lastmod>2013-01-02T13:00:00Z</lastmod>
          <rs:md change="updated"/>
        </url>
        <url>
          <loc>http://example.com/res3</loc>
          <lastmod>2013-01-02T18:00:00Z</lastmod>
          <rs:md change="deleted"/>
        </url>
      </urlset>
      Destination:
      $ resync -b http://example.com
      $ resync -i http://example.com
  • 14. Resource Dump
      Source (XML Sitemap):
      <?xml version="1.0" encoding="UTF-8"?>
      <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
              xmlns:rs="http://www.openarchives.org/rs/terms/">
        <rs:md capability="resourcedump" modified="2013-01-03T09:00:00Z"/>
        <url>
          <loc>http://example.com/resourcedump.zip</loc>
          <lastmod>2013-01-03T09:00:00Z</lastmod>
        </url>
      </urlset>
  • 15. Resource Dump
      http://example.com/resourcedump.zip
        |- manifest.xml
        |- resources
             |- res1
             |- res2
  • 16. Resource Dump Manifest
      manifest.xml (XML Sitemap):
      <?xml version="1.0" encoding="UTF-8"?>
      <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
              xmlns:rs="http://www.openarchives.org/rs/terms/">
        <rs:md capability="resourcedump-manifest" modified="2013-01-03T09:00:00Z"/>
        <url>
          <loc>http://example.com/res1</loc>
          <lastmod>2013-01-03T03:00:00Z</lastmod>
          <rs:md hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6" path="/resources/res1"/>
        </url>
        <url>
          <loc>http://example.com/res2</loc>
          <lastmod>2013-01-03T04:00:00Z</lastmod>
          <rs:md hash="md5:1e0d5cb8ef6ba40c99b14c0237be735e" path="/resources/res2"/>
        </url>
      </urlset>
  • 17. Capability List
      Source (XML Sitemap):
      <?xml version="1.0" encoding="UTF-8"?>
      <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
              xmlns:rs="http://www.openarchives.org/rs/terms/">
        <rs:ln href="http://example.com/info-about-source.xml"
               rel="describedby" type="application/xml"/>
        <rs:md capability="capabilitylist" modified="2013-01-02T14:00:00Z"/>
        <url>
          <loc>http://example.com/dataset1/resourcelist.xml</loc>
          <rs:md capability="resourcelist"/>
        </url>
        <url>
          <loc>http://example.com/dataset1/resourcedump.xml</loc>
          <rs:md capability="resourcedump"/>
        </url>
        <url>
          <loc>http://example.com/dataset1/changelist.xml</loc>
          <rs:md capability="changelist"/>
        </url>
      </urlset>
      Destination:
      $ resync -x http://example.com
  • 18. Large Resource Lists
      Source:
      <?xml version="1.0" encoding="UTF-8"?>
      <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
                    xmlns:rs="http://www.openarchives.org/rs/terms/">
        <rs:md capability="resourcelist" modified="2013-01-03T09:00:00Z"/>
        <sitemap>
          <loc>http://example.com/resourcelist-part2.xml</loc>
          <lastmod>2013-01-03T09:00:00Z</lastmod>
        </sitemap>
        <sitemap>
          <loc>http://example.com/resourcelist-part1.xml</loc>
          <lastmod>2013-01-03T09:00:00Z</lastmod>
        </sitemap>
      </sitemapindex>
  • 19. Other Capabilities
  • 20. ResourceSync
      • What and Why?
      • Synchronization Scenarios
      • ResourceSync Basics Walkthrough
      • Demos
      • Status and Next Steps
  • 21. Available code
      • ResourceSync client and library (Python)
      • ResourceSync source simulator
      http://github.com/resync
  • 22. Install resync client/library
      $ git clone git://github.com/resync/resync.git
      $ cd resync/
      $ python setup.py build
      $ sudo python setup.py install
      or
      $ sudo easy_install resync
      or
      $ sudo pip install resync
  • 23. Install resync simulator
      $ git clone git://github.com/resync/simulator.git
      $ cd simulator/
      $ chmod u+x simulate-source
      $ ./simulate-source
      $ sudo easy_install tornado
  • 24. Run client against simulator
      $ resync -b http://localhost:8888
      $ resync -i http://localhost:8888
  • 25. resync @ arxiv.org
      resync -v --noauth http://resync.library.cornell.edu/arxiv-q-bio=/tmp/qbio http://resync.library.cornell.edu/arxiv=/tmp/arxiv
  • 26. resync @ en.wikipedia.org
  • 27. ResourceSync
      • What and Why?
      • Synchronization Scenarios
      • ResourceSync Basics Walkthrough
      • Demos
      • Status and Next Steps
  • 28. Status
      • Beta spec (v.0.6) for public comment: http://www.openarchives.org/rs/0.6/resourcesync
      • Tool development started
      • Separate documents for archiving and push deployments
  • 29. Next Steps
      • Continue tool development & deployment
      • Collect
          • public comments on resourcesync@googlegroups.com
          • implementation issues on https://github.com/resync/resync/issues
      • Version 0.9 to be released in Summer 2013
      • Version 1.0 in fall 2013 (NISO standard)
  • 30. Thanks!
      @bhaslhofer
      http://slideshare.net/bhaslhofer
      http://openarchives.org/rs
      resourcesync@googlegroups.com
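
The Resource List on slide 12 pairs each resource URI with a last-modified time and an MD5 hash. A minimal sketch of how a Destination might read such a list and check downloaded bytes against the declared hash is shown here. It uses only the Python standard library, not the resync client; the function names `parse_resource_list` and `md5_matches` are hypothetical, and the sample document is a shortened copy of the slide's XML.

```python
import hashlib
import xml.etree.ElementTree as ET

# Namespaces as they appear in the slides (v0.6 beta of the spec).
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

# Shortened copy of the Resource List from slide 12.
RESOURCE_LIST = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist" modified="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6"/>
  </url>
</urlset>"""


def parse_resource_list(xml_text):
    """Return {uri: {"lastmod": ..., "hash": ...}} for every <url> entry."""
    root = ET.fromstring(xml_text)
    resources = {}
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        md = url.find("rs:md", NS)  # per-resource metadata, may be absent
        resources[loc.strip()] = {
            "lastmod": url.findtext("sm:lastmod", namespaces=NS),
            "hash": md.get("hash") if md is not None else None,
        }
    return resources


def md5_matches(content, declared):
    """Compare bytes against a declared value of the form 'md5:<hex>'."""
    algorithm, _, hexdigest = declared.partition(":")
    if algorithm != "md5":
        raise ValueError("this sketch only handles md5 hashes")
    return hashlib.md5(content).hexdigest() == hexdigest


if __name__ == "__main__":
    for uri, meta in parse_resource_list(RESOURCE_LIST).items():
        print(uri, meta["lastmod"], meta["hash"])
```

A Destination could call `md5_matches` after each download and re-fetch a resource when the check fails.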
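
The Change List on slide 13 tells a Destination which resources were updated or deleted since the baseline sync. The sketch below shows one way such a list could be applied to a local copy. It assumes the local copy is a plain dictionary from URI to content and that `fetch` is a caller-supplied download function; both are illustrative and not part of the resync client. The `created` change type does not appear on the slide and is assumed here from the arXiv scenario's "created or updated" wording.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

# Copy of the Change List from slide 13.
CHANGE_LIST = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" modified="2013-01-03T11:00:00Z"/>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated"/>
  </url>
  <url>
    <loc>http://example.com/res3</loc>
    <lastmod>2013-01-02T18:00:00Z</lastmod>
    <rs:md change="deleted"/>
  </url>
</urlset>"""


def parse_change_list(xml_text):
    """Return [(uri, change_type), ...] in document order."""
    root = ET.fromstring(xml_text)
    changes = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS).strip()
        md = url.find("rs:md", NS)
        changes.append((loc, md.get("change") if md is not None else None))
    return changes


def apply_changes(local, changes, fetch):
    """Apply changes to `local` (dict uri -> bytes) in place and return it."""
    for uri, change in changes:
        if change == "deleted":
            local.pop(uri, None)       # tolerate resources we never had
        elif change in ("created", "updated"):
            local[uri] = fetch(uri)    # re-download the current version
        else:
            raise ValueError("unknown change type: %r" % (change,))
    return local


if __name__ == "__main__":
    copy = {"http://example.com/res2": b"old", "http://example.com/res3": b"x"}
    apply_changes(copy, parse_change_list(CHANGE_LIST), lambda uri: b"new")
    print(sorted(copy))
```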
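
The Resource Dump on slides 14 to 16 packages resources in a ZIP file together with a manifest that maps each archive path back to its original URI. As a rough sketch under stated assumptions, the code below unpacks such a dump in memory and verifies each entry's MD5 hash. It assumes the manifest `path` values are archive-relative once the leading slash is removed, which matches the layout on slide 15 but is not confirmed by the slides; `unpack_dump` and `build_demo_dump` are hypothetical helpers, and the demo content is made up.

```python
import hashlib
import io
import xml.etree.ElementTree as ET
import zipfile

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}


def manifest_entries(manifest_xml):
    """Yield (uri, path, declared_hash) for each <url> in a dump manifest."""
    root = ET.fromstring(manifest_xml)
    for url in root.findall("sm:url", NS):
        md = url.find("rs:md", NS)
        if md is None:
            continue  # no path information, nothing to extract
        uri = url.findtext("sm:loc", namespaces=NS).strip()
        yield uri, md.get("path"), md.get("hash")


def unpack_dump(zip_bytes):
    """Return {uri: content} for a dump, raising on an MD5 mismatch."""
    unpacked = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        manifest = archive.read("manifest.xml")
        for uri, path, declared in manifest_entries(manifest):
            # Assumption: '/resources/res1' refers to member 'resources/res1'.
            content = archive.read(path.lstrip("/"))
            if declared and declared.startswith("md5:"):
                if hashlib.md5(content).hexdigest() != declared[len("md5:"):]:
                    raise ValueError("hash mismatch for " + uri)
            unpacked[uri] = content
    return unpacked


def build_demo_dump():
    """Build a one-resource dump in memory, laid out as on slide 15."""
    content = b"resource one"
    manifest = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" '
        'xmlns:rs="http://www.openarchives.org/rs/terms/">'
        '<rs:md capability="resourcedump-manifest"/>'
        "<url><loc>http://example.com/res1</loc>"
        '<rs:md hash="md5:%s" path="/resources/res1"/></url>'
        "</urlset>" % hashlib.md5(content).hexdigest()
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("manifest.xml", manifest)
        archive.writestr("resources/res1", content)
    return buffer.getvalue()


if __name__ == "__main__":
    print(unpack_dump(build_demo_dump()))
```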
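
The Capability List on slide 17 is the entry point a Destination can use to discover which documents a Source offers. The following sketch extracts the capability-to-URL mapping from the slide's XML and picks a document to fetch. The selection policy in `pick_document` (use the Change List once a baseline exists, otherwise the Resource List) is an illustrative assumption, not behaviour taken from the spec or from the resync client.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

# Copy of the Capability List from slide 17.
CAPABILITY_LIST = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln href="http://example.com/info-about-source.xml"
         rel="describedby" type="application/xml"/>
  <rs:md capability="capabilitylist" modified="2013-01-02T14:00:00Z"/>
  <url>
    <loc>http://example.com/dataset1/resourcelist.xml</loc>
    <rs:md capability="resourcelist"/>
  </url>
  <url>
    <loc>http://example.com/dataset1/resourcedump.xml</loc>
    <rs:md capability="resourcedump"/>
  </url>
  <url>
    <loc>http://example.com/dataset1/changelist.xml</loc>
    <rs:md capability="changelist"/>
  </url>
</urlset>"""


def parse_capability_list(xml_text):
    """Return {capability_name: document_url} for every <url> entry."""
    root = ET.fromstring(xml_text)
    capabilities = {}
    for url in root.findall("sm:url", NS):
        md = url.find("rs:md", NS)
        loc = url.findtext("sm:loc", namespaces=NS)
        if md is not None and loc and md.get("capability"):
            capabilities[md.get("capability")] = loc.strip()
    return capabilities


def pick_document(capabilities, have_baseline):
    """Hypothetical policy: incremental sync if possible, else baseline."""
    if have_baseline and "changelist" in capabilities:
        return capabilities["changelist"]
    return capabilities.get("resourcelist")


if __name__ == "__main__":
    caps = parse_capability_list(CAPABILITY_LIST)
    print(pick_document(caps, have_baseline=False))
    print(pick_document(caps, have_baseline=True))
```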