SlideShare a Scribd company logo
NISO Training
http://www.niso.org/workrooms/onixpl-encoding/

NISO Training
ResourceSync: A Web-Based
Resource Synchronization
Framework
December 3, 2013
Speakers:
Bernhard Haslhofer - Postdoc Research Associate
Department of Computer Science, University of Vienna
Simeon Warner - Information Science, Cornell University
ResourceSync:
A Web-Based
Resource Synchronization
Framework

#resourcesync

ResourceSync is funded by
The Sloan Foundation & JISC
ResourceSync Webinar
December 3 2013

2
This is a short version of the complete ResourceSync tutorial,
which is available at
http://www.slideshare.net/OpenArchivesInitiative/resourcesync-tutorial

ResourceSync Webinar
December 3 2013

3
ResourceSync Tutorial History
• OAI8, June 2013 – Open Repositories, July 2013 –
JCDL, July 2013 – TPDL 2013, September 2013 –LITA
Forum, November 2013, SWIB November 2013, …

Presenters

Simeon Warner
Cornell University

Bernhard Haslhofer
University of Vienna
ResourceSync Webinar
December 3 2013
4
ResourceSync Tutorial Contributors

Martin Klein
Herbert Van de Sompel
Robert Sanderson
Los Alamos National Laboratory Los Alamos National Laboratory Los Alamos National Laboratory
<martinklein0815@gmail.com>
<hvdsomp@gmail.com>
<azaroth24@gmail.com>
@mart1nkle1n
@hvdsomp
@azaroth24

Simeon Warner
Cornell University
<simeon.warner@cornell.edu>
@zimeon

Michael L. Nelson
Old Dominion University
<mln@cs.odu.edu>
@phonedude_mln

Richard Jones
Cottage Labs
<richard@cottagelabs.com>
@cottagelabs
ResourceSync Webinar
December 3 2013
5
OAI
Herbert Van de Sompel
Martin Klein
Robert Sanderson
(Los Alamos National Laboratory)
Simeon Warner
(Cornell University)

NISO
Todd Carpenter
Nettie Lagace
University of Oxford
Graham Klyne

Bernhard Haslhofer
(University of Vienna)
Michael L. Nelson
(Old Dominion University)

Lyrasis
Peter Murray

Carl Lagoze
(University of Michigan)

ResourceSync Webinar
December 3 2013

6
ResourceSync Technical Group
LOCKSS
Ex Libris Inc.
Shlomo Sanders

David Rosenthal

JISC
Richard Jones

Paul Walk

Stuart Lewis

RedHat
OCLC
Christian Sadilek

Library of Congress

Jeff Young

Kevin Ford

ResourceSync Webinar
December 3 2013

7
Timeline, Status of Specification(s)
• August 2013
o

o

Release of ResourceSync framework Core specification
- Version 0.9.1
Public draft of ResourceSync Archives specification released

• September 2013
o

Core specification on its way to become an ANSI standard

• November 2013
o

Internal draft of ResourceSync Notification specification

• January 2014
o

Public draft of ResourceSync Notification specification

• Mid 2014
o

Core specification becomes ANSI/NISO standard

ResourceSync Webinar
December 3 2013

8
Pointers
• Specification
http://www.openarchives.org/rs/
http://www.openarchives.org/rs/resourcesync
http://www.openarchives.org/rs/archives

• List for public comment
https://groups.google.com/d/forum/resourcesync
• Client and simulator code
http://github.org/resync/resync
http://github.org/resync/simulator

ResourceSync Webinar
December 3 2013

9
ResourceSync - Agenda
1. ResourceSync: Problem Perspective & Conceptual
Approach

2. Motivation & Use Cases
3. Framework Walkthrough

4. Framework (Technical) Details
5. Implementation
6. Q&A
ResourceSync Webinar
December 3 2013

10
ResourceSync - Agenda
1. ResourceSync: Problem Perspective & Conceptual
Approach

ResourceSync Webinar
December 3 2013

11
Synchronize What?

• Web resources
o things with a URI that can be dereferenced
• Focus on needs of research communication and cultural heritage
organizations
o but aim for generality

ResourceSync Webinar
December 3 2013

12
Synchronize What?
• Small websites/repositories (a few resources) to large
repositories/datasets/linked data collections (many millions of
resources)

sync

sync

ResourceSync Webinar
December 3 2013

13
Synchronize What?
• Low change frequency (weeks/months) to high change
frequency (seconds)
• Synchronization latency and accuracy needs may vary
sync

sync

sync

ResourceSync Webinar
December 3 2013

14
Why?
… because lots of projects and services are doing synchronization
but have to resort to ad-hoc, case by case, approaches!
• Project team involved with projects that need this

• Experience with OAI-PMH: widely used in repos but
o XML metadata only
o Attempts at synchronizing actual content via OAI-PMH
(complex object formats, dc:identifier) not successful.
o Web technology has moved on since 1999
• Devise a shared solution for data, metadata, linked data?

ResourceSync Webinar
December 3 2013

15
ResourceSync Problem
• Consideration:
• Source (server) A has resources that change over time: they
get created, modified, deleted
• Destination (servers) X, Y, and Z leverage (some)
resources of Source A.
• Problem:
• Destinations want to keep in step with the resource changes
at Source A: resource synchronization.
• Goal:
• Design an approach for resource synchronization aligned
with the Web Architecture that has a fair chance of adoption
by different communities.
• The approach must scale better than recurrent HTTP
HEAD/GET on resources.

ResourceSync Webinar
December 3 2013

16
Source: Core Synchronization Capabilities

P
U
L
L

1. Describing content – publish a list of resources available for
synchronization to enable Destinations to perform an initial load
or catch-up with a Source
2. Packaging content – bundle resources to enable bulk download
by destinations
3. Describing changes – publish a list of resource changes to
enable destinations to stay synchronized and decrease latency
4. Packaging changes – bundle resource changes for bulk
download by destinations

ResourceSync Webinar
December 3 2013

17
Source: Notifications Capabilities
To reduce synchronization latency and to optimize the synchronization
process the Source can support:

P
•
U
S
•
H

1. Change Notification
• Notifies about changes to particular resources
• e.g., resource A has been updated | created | deleted
2. Framework Notification
• Notifies about changes to capabilities i.e., their documents
• e.g., a Change List has been updated | created | deleted

ResourceSync Webinar
December 3 2013

18
Source: Synchronization Features
1. Discovery of capabilities – support Destinations in discovering
all offered capabilities
o

Applies to PULL, PUSH, capabilities

1. Linking to related resources – provide links from resources
subject to synchronization to related resources
o

Applies to PULL, PUSH capabilities

ResourceSync Webinar
December 3 2013

19
Destination: Synchronization Needs
1. Baseline synchronization – A destination must be able to
perform an initial load or catch-up with a source
- avoid out-of-band setup
2. Incremental synchronization – A destination must have some
way to keep up-to-date with changes at a source
- subject to some latency; minimal: create/update/delete
- allow to catch-up after destination has been offline
3. Audit – A destination should be able to determine whether it is
synchronized with a source
- regarding coverage and accuracy

ResourceSync Webinar
December 3 2013

20
ResourceSync - Agenda

2. Motivation & Use Cases

ResourceSync Webinar
December 3 2013

21
Use Case 1: arXiv Mirroring and Data Sharing
• Repository of scholarly articles in
physics, mathematics, computer
science, etc.
• > 850k articles
• approx. 1.5 revisions per article on
average
• approx. 75k new articles per year
• Each article has full-text and separate
metadata record
• approx. 3.8M resources

ResourceSync Webinar
December 3 2013

22
Use Case 1: arXiv Mirroring and Data Sharing
• 2,700 updates daily
o at 8pm EST
o Currently using homebrew mirroring
solution (running with minor
modifications since 1994!)
o occasional rsync (file systemspecific, auth issues)

ResourceSync Webinar
December 3 2013

23
Use Case 1: arXiv
Mirroring / Data Sharing
• GOAL: Keep mirror sites synchronized with daily
changes
• WANT:
o
o
o
o
o

o

high consistency
moderate latency
robustness to global network outages (low admin effort)
ability to verify sync status in case of questions
low admin effort (i.e. standard approach, standard tools)
reasonable consistency, latency, efficiency

ResourceSync Webinar
December 3 2013

24
Use Case 2: DBpedia Live Duplication
• Average of 2 updates per second
• Low latency desirable => need for a push technology

ResourceSync Webinar
December 3 2013

25
ResourceSync - Agenda

3. Framework Walkthrough

ResourceSync Webinar
December 3 2013

26
Source Capability 1: Describing Content
In order to advertise the resources that a source wants destinations
to know about, it may describe them:
o

o

Publish a Resource List, a list of resource URIs and possibly
associated metadata
- Destination GETs the Resource List
- Destination GETs listed resources by their URI
A Resource List describes the state of a set of resources at
one point in time (snapshot)

ResourceSync Webinar
December 3 2013

27
28
29
Source Capability 2: Packaging Content
By default, content is transferred in response to a GET issued by a
destination against a URI of a source’s resource. But a source may
support additional mechanisms:
o

o

Publish a Resource Dump, a document that points to
packages of resource representations and necessary
metadata
- Destination GETs the package
- Destination unpacks the package
- ZIP format supported
A Resource Dump and the packages it points to reflect the
state of a set of resources at one point in time (snapshot)

ResourceSync Webinar
December 3 2013

30
31
32
Source Capability 3: Describing Changes
In order to achieve lower latency and/or greater efficiency, a source
may communicate about changes to its resources:
o

o

Publish a Change List, a list of recent change events
(created, updated, deleted resource)
- Destination acts upon change events, e.g. GETs
created/updated resources, removes deleted resources.
A Change List pertains to resources that changed in a
temporal interval with a start- and an end-date
- If a resource changed more than once, it will be listed
more than once

ResourceSync Webinar
December 3 2013

33
34
35
36
Source Capability 4: Packaging Changes
In order to reduce the number of requests to obtain resource
changes, a source may provide packaged bitstreams for changed
resources:
o

o

Publish a Change Dump, a document that points to
packages containing bitstreams of recently changed
resource and necessary metadata
- Destination GETs the package
- Destination unpacks the package
- ZIP format supported
A Change Dump and its packages pertain to resources that
changed in a temporal interval with a start- and an end-date
- If a resource changed more than once, it will be included
more than once
ResourceSync Webinar
December 3 2013

37
38
Destination: Key Processes

ResourceSync Webinar
December 3 2013

39
ResourceSync - Agenda

4. Framework (Technical) Details

ResourceSync Webinar
December 3 2013

40
So Many Choices
Push

DSNotify
OAI-PMH
rsync

Crawl

Pull
OAI-ORE

RDFsync

WebDAV Col. Syn.

XMPP
Atom

SWORD
Sitemap

SPARQLpush

SDShare

AtomPub

RSS
PubSubHubbub

XMPP
ResourceSync Webinar
December 3 2013

41
So Many Choices
Push

DSNotify
OAI-PMH
rsync

Crawl

Pull
OAI-ORE

RDFsync

WebDAV Col. Syn.

XMPP
Atom

SWORD
Sitemap

SPARQLpush

SDShare

AtomPub

RSS
PubSubHubbub

XMPP
ResourceSync Webinar
December 3 2013

42
ResourceSync Webinar
December 3 2013

43
Sitemap Format

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9”>
<url>
<loc>http://example.com/res1</loc>
<lastmod>2013-01-02T13:00:00Z</lastmod>
</url>

<url>
<loc>http://example.com/res2</loc>
<lastmod>2013-01-02T14:00:00Z</lastmod>
</url>
…
</urlset>

ResourceSync Webinar
December 3 2013

44
ResourceSync Sitemap Extensions
<urlset xmlns=http://www.sitemaps.org/schemas/sitemap/0.9
xmlns:rs="http://www.openarchives.org/rs/terms/”>
<rs:ln …/>
<rs:md …/>
<url>
<loc>http://example.com/res1</loc>
<lastmod>2013-01-02T13:00:00Z</lastmod>
<rs:ln …/>
<rs:md …/>
</url>
<url>
…
</url>
</urlset>

ResourceSync Webinar
December 3 2013

45
Related Resource Metadata Summary
• Attributes of the <rs:ln> element; c.f. resource metadata + pri
Element/Attribute Description

Defined by

<rs:ln>

ResourceSync

encoding

HTTP Content-Encoding header value

RFC2616

hash

One or more content digests (md5, sha-1, sha-256)

Atom Link Ext.

href

Related resource URI (identity)

RFC4287

length

HTTP Content-Length header value

RFC4287

modified

Timestamp of last change (c.f. <lastmod>)

Atom Link Ext.

path

Path in ZIP package (Dump Manifests only)

ResourceSync

pri

Priority of link

RFC6249

rel

Relation - IANA registered or URI

RFC4287

type

HTTP Content-Type header value

RFC4287
ResourceSync Webinar
December 3 2013
Resource Metadata Summary
Element/Attribute
<loc>
<lastmod>

Description
Resource URI (identity)
Timestamp of last change

Defined by
sitemaps
sitemaps

<changefreq>

Expected update frequency

sitemaps

<rs:md>
change
encoding

hash
length
path
type

ResourceSync
Change type (Change List & Change
Dump Manifest only)

ResourceSync

HTTP Content-Encoding header value

RFC2616

One or more content digests (md5, sha-1, Atom Link Ext.
sha-256)

HTTP Content-Length header value

RFC4287

Path in ZIP package (Dump Manifests
only)
HTTP Content-Type header value

ResourceSync

RFC4287

ResourceSync Webinar
December 3 2013
Link Relation Summary
Relation

Use in ResourceSync

Defined in

rel="alternate"

Link from generic to specific URI

HTML 5

rel="canonical"

Link from specific to generic URI

RFC6596

rel="collection"

Resource is member of collection

RFC6573

rel="contents"

Link from dump to manifest

rel="describedby"

Has metadata

HTML4
Protocol for Web Description Resources
(POWDER): Description Resources

rel="describes"

Is metadata for

The 'describes' Link Relation Type

rel="duplicate"

RFC6249

rel=".../rs/terms/patch"

Mirror or alternative copy
A patch -- efficient change
information

rel="memento"

Link to time-specific URI

Memento Internet Draft

rel="timegate"

Link to timegate

Memento Internet Draft

rel="via"

Provenance chain, came from

RFC4287

This specification

ResourceSync Webinar
December 3 2013
Resource List
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:rs="http://www.openarchives.org/rs/terms/">
<rs:md capability="resourcelist"
at="2013-01-03T09:00:00Z”
completed="2013-01-03T09:01:00Z” />
<url>
<loc>http://example.com/res1</loc>
<lastmod>2013-01-02T13:00:00Z</lastmod>
<rs:md hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6"
length="8876"
type="text/html"/>
</url>
<url>
…
</url>
</urlset>

ResourceSync Webinar
December 3 2013

49
Resource List
• Describe Source’s resources that are subject to synchronization
• At one point in time (snapshot)
• Creation can take some time – duration can be conveyed
• Typical Destination use: Baseline Synchronization, Audit

• Each URI typically listed only once
• Might be expensive to generate
• Destinations use @at to determine freshness
• [@at, @completed] – interval of uncertainty
• Destination issues GETs against URIs to obtain resources
• Very similar to current Sitemaps

ResourceSync Webinar
December 3 2013

50
Resource Dump
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:rs="http://www.openarchives.org/rs/terms/">
<rs:md capability=”resourcedump"
at="2013-01-02T09:00:00Z”/>
<url>
<loc>http://example.com/resourcedump_part1.zip</loc>
<lastmod>2013-01-02T13:00:00Z</lastmod>
<rs:md length=”97553"
type=”application/zip"/>
<rs:ln rel=”contents”
href="http://example.com/resourcedump_manifest-part1.xml"
type=”application/xml"/>
</url>
<url>
<loc>http://example.com/resourcedump_part2.zip</loc>
<lastmod>2013-01-02T13:00:00Z</lastmod>
</url>
</urlset>
ResourceSync Webinar
December 3 2013

51
Resource Dump Manifest
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:rs="http://www.openarchives.org/rs/terms/">
<rs:md capability=”resourcedump-manifest"
at="2013-01-02T09:00:00Z”/>
<url>
<loc>http://example.com/res1</loc>
<lastmod>2013-01-02T13:00:00Z</lastmod>
<rs:md type="text/html"
path=”/resources/res1"/>
</url>
<url>
<loc>http://example.com/res2</loc>
<lastmod>2013-01-02T13:00:00Z</lastmod>
<rs:md type=”application/pdf”
path=”/resources/res2"/>
</url>
</urlset>

ResourceSync Webinar
December 3 2013

52
Resource Dump
• A Resource Dump points to packages (ZIP files) that contain
representations of the Source’s resources
• At one point in time (snapshot)
• Resource Dump is mandatory, even if there is only one ZIP file
• ZIP package contains manifest, listing contained bitstreams
• Typical Destination use: Baseline Synchronization, bulk
download

• Each URI typically listed only once
• Might be expensive to generate
• Destinations use @at to determine freshness
• [@at, @completed] – interval of uncertainty
• GETs against individual URIs from Resource List achieves the
same result (ignoring varying freshness)
ResourceSync Webinar
December 3 2013

53
Change List
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:rs="http://www.openarchives.org/rs/terms/">
<rs:md capability=”changelist"
from="2013-01-02T09:00:00Z”
until="2013-01-03T09:00:00Z”/>
<url>
<loc>http://example.com/res1</loc>
<lastmod>2013-01-02T13:00:00Z</lastmod>
<rs:md change=”updated"
hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6"
length="8876"
type="text/html"/>
</url>
<url>
…
</url>
</urlset>

ResourceSync Webinar
December 3 2013

54
Change List
• A Change List pertains to a Source’s resources that changed
• Changes that occurred during a temporal interval with startand end-date
• Typical Destination use: Incremental Synchronization, Audit
• Changes are listed in chronological order
• Multiple changes to one resource results in the resource being
listed multiple times, once per change
• Source determines duration of temporal interval
• Destinations use @from and @until to determine freshness
• Destinations issue GETs against URIs to obtain changed
resources

ResourceSync Webinar
December 3 2013

55
Discovery of Capabilities
Requirements:
• Need to discover capabilities, i.e. Resource List, Resource
Dump, Change List, Change Dump, Archives, Notification
channels
• Need to know the type of capability each document
represents.
Approach:
• The Source publishes a Capability List that enumerates the
capabilities it supports.
• By pointing at Resource List, Change List, Resource Dump,
etc. using appropriate relation types, e.g. “resourcelist”,
“changelist”, “resourcedump” etc.
http://www.openarchives.org/rs/resourcesync#CapabilityList
ResourceSync Webinar
December 3 2013

56
Discovery of Capabilities

ResourceSync Webinar
December 3 2013

57
Capability List
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:rs="http://www.openarchives.org/rs/terms/">
<rs:md capability=”capabilitylist”/>
<url>
<loc>http://example.com/dataset1/resourcelist.xml</loc>
<rs:md capability=”resourcelist”/>
</url>
<url>
<loc>http://example.com/dataset1/changelist.xml</loc>
<rs:md capability=”changelist”/>
</url>
<url>
<loc>http://example.com/dataset1/resourcedump.xml</loc>
<rs:md capability=”resourcedump”/>
</url>
</urlset>

ResourceSync Webinar
December 3 2013

58
Discovery of Capability Lists
Requirements:
• Need to discover a Capability List
Approaches:
• Introduce a link in the HTTP Link header of a resources that is
subject to synchronization, pointing at the Capability List with the
relation type “resourcesync”
• Introduce a link from an HTML document that is subject to
synchronization (<head> section), pointing at the Capability List
with the relation type “resourcesync”
• Link from a Resource List, etc. to the Capability List with the
relation type “up”
Link header on example.com/res1.pdf
Link: <example.com/dataset1/capabilitylist.xml>;rel=“resourcesync”
ResourceSync Webinar
December 3 2013

59
Discovery via robots.txt
• Resource Lists are (enhanced) Sitemaps
• Sitemaps can be discovered via robots.txt
• Ergo, Resource Lists should be discoverable via robots.txt
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Sitemap: http://example.com/dataset1/resourcelist.xml

ResourceSync Webinar
December 3 2013

60
Framework Structure

ResourceSync Webinar
December 3 2013

61
Motivation for Notifications
•

Reduce synchronization latency by having the Source push out
resource change information
• To avoid continuous pull of Change Lists by Destinations

•

Share information about changes to the Source’s
ResourceSync implementation, e.g. announcement of new
Resource List, new Capability List, etc.
• To avoid continuous polling of e.g. Resource Lists,
ResourceSync Description

ResourceSync Webinar
December 3 2013

62
Source: Notification Capabilities
•

P
U
•
S
H

1. Change Notification
• Notifies about changes to particular resources
• e.g., resource A has been updated | created | deleted
2. Framework Notification
• Notifies about changes to capabilities i.e., their documents
• e.g., a Change List has been updated | created | deleted
• Also for Capability Lists and Source Description

ResourceSync Webinar
December 3 2013

63
Notification Channels
•

Notification sent via channels
• Resource Notification: one channel per set of resources
• Framework Notification: one channel per set of resources
• Sent on level of capability document, not on index-level
• Notifications about changes to Source Description sent on all
Framework Notification channels

•

Payload for notifications: <urlset> documents

•

Transport protocol for notifications under discussion:
• PubSubHubbub https://pubsubhubbub.googlecode.com/git/pubsubhubbub-core0.4.html - current choice
• WebSockets -http://tools.ietf.org/html/rfc6455 – may be added
later
ResourceSync Webinar
December 3 2013

64
Change Notification Payload
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:rs="http://www.openarchives.org/rs/terms/">
<rs:ln rel="up"
href="http://example.com/dataset1/capabilitylist.xml"/>
<url>
<loc>http://example.com/res1</loc>
<lastmod>2013-01-02T09:07:00Z</lastmod>
<rs:md change=”updated"
hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6"
length="8876"
type="text/html"/>
</url>
<url>
…
</url>
</urlset>

ResourceSync Webinar
December 3 2013

65
ResourceSync - Agenda

5. Implementation

ResourceSync Webinar
December 3 2013

66
DSpace support for
metadata harvesting use case

DSpace Module:
https://github.com/CottageLabs/DSpaceResourceSync
PHP client:
https://github.com/stuartlewis/resync-php
http://mydspace.edu/dspace-rs/resource/123456789/7/qdc
ResourceSync webapp

Item handle

Metadata Format
ResourceSync Webinar
December 3 2013

67
ResourceSync @ arXiv
• Use ResourceSync for both
mirroring and public data access
o efficient updates
o ability to do periodic audits
o public synchronization capability
o reduce admin burden
• Start with metadata + source for
mirroring use case (doing
experiments now)
• Open Access use cases require
processed PDF also
ResourceSync Webinar
December 3 2013

68
Getting a copy of arXiv
It might be as easy as:

(of course, you probably have to wait a while but it is nice to know ResourceSync is
stateless so one can efficiently restart)

ResourceSync Webinar
December 3 2013

69
Python Library and Client
• Aim to provide library code implementing all ResourceSync
facilities for use in both source and destination implementations
o

Designed for python 2.6 (RHEL6) and 2.7

• Client (resync) supports many destination operations, inspired
by the common Unix rsync program
• Client also supports some operations that might be useful in a
source, such as generation of static Resource Lists, or periodic
Change Lists (used in arXiv experiments)
• Explorer (resync-explorer) intended to allow easy inspection
of a source’s resource sets and capabilities
• Developed since ResourceSync v0.5, updated for v0.9.1
http://github.org/resync/resync
On pypi: “easy_install resync”

ResourceSync Webinar
December 3 2013
ResourceSync Source Simulator
• Python code using Tornado server
• Provides random set of resources of different sizes updated at a
particular rate
• Very useful for testing Destination code

http://github.com/resync/simulator

ResourceSync Webinar
December 3 2013
ResourceSync - Agenda

6. Q&A
ResourceSync Webinar
December 3 2013

72
ResourceSync:
A Web-Based
Resource Synchronization
Framework

#resourcesync

ResourceSync is funded by
The Sloan Foundation & JISC
ResourceSync Webinar
December 3 2013

73
THANK YOU
We look forward to seeing you at a
future NISO training event.

More Related Content

What's hot

ResourceSync: Conceptual and Technical Problem Perspective
ResourceSync: Conceptual and Technical Problem PerspectiveResourceSync: Conceptual and Technical Problem Perspective
ResourceSync: Conceptual and Technical Problem Perspective
Herbert Van de Sompel
 
"Web Archive services framework for tighter integration between the past and ...
"Web Archive services framework for tighter integration between the past and ..."Web Archive services framework for tighter integration between the past and ...
"Web Archive services framework for tighter integration between the past and ...Ahmed AlSum
 
Maintaining scholarly standards in the digital age: Publishing historical gaz...
Maintaining scholarly standards in the digital age: Publishing historical gaz...Maintaining scholarly standards in the digital age: Publishing historical gaz...
Maintaining scholarly standards in the digital age: Publishing historical gaz...
Humphrey Southall
 
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014
eswcsummerschool
 
Hadoop ecosystem for health/life sciences
Hadoop ecosystem for health/life sciencesHadoop ecosystem for health/life sciences
Hadoop ecosystem for health/life sciences
Uri Laserson
 
Reminiscing about interoperability
Reminiscing about interoperabilityReminiscing about interoperability
Reminiscing about interoperability
Herbert Van de Sompel
 
Memento 101
Memento 101Memento 101
Scaling up Linked Data
Scaling up Linked DataScaling up Linked Data
Scaling up Linked Data
EUCLID project
 
Annotating Scholarly Resources
Annotating Scholarly ResourcesAnnotating Scholarly Resources
Annotating Scholarly ResourcesRobert Sanderson
 
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDTDBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
Herbert Van de Sompel
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
tcloudcomputing-tw
 
Druid Scaling Realtime Analytics
Druid Scaling Realtime AnalyticsDruid Scaling Realtime Analytics
Druid Scaling Realtime Analytics
Aaron Brooks
 
Large Scale Data With Hadoop
Large Scale Data With HadoopLarge Scale Data With Hadoop
Large Scale Data With Hadoopguest27e6764
 
Signposting for Repositories
Signposting for RepositoriesSignposting for Repositories
Signposting for Repositories
Martin Klein
 
20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction
Xuan-Chao Huang
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
tcloudcomputing-tw
 
Introduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceIntroduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduce
eakasit_dpu
 
Hadoop Family and Ecosystem
Hadoop Family and EcosystemHadoop Family and Ecosystem
Hadoop Family and Ecosystem
tcloudcomputing-tw
 
List of Engineering Colleges in Uttarakhand
List of Engineering Colleges in UttarakhandList of Engineering Colleges in Uttarakhand
List of Engineering Colleges in Uttarakhand
Roorkee College of Engineering, Roorkee
 

What's hot (20)

ResourceSync
ResourceSyncResourceSync
ResourceSync
 
ResourceSync: Conceptual and Technical Problem Perspective
ResourceSync: Conceptual and Technical Problem PerspectiveResourceSync: Conceptual and Technical Problem Perspective
ResourceSync: Conceptual and Technical Problem Perspective
 
"Web Archive services framework for tighter integration between the past and ...
"Web Archive services framework for tighter integration between the past and ..."Web Archive services framework for tighter integration between the past and ...
"Web Archive services framework for tighter integration between the past and ...
 
Maintaining scholarly standards in the digital age: Publishing historical gaz...
Maintaining scholarly standards in the digital age: Publishing historical gaz...Maintaining scholarly standards in the digital age: Publishing historical gaz...
Maintaining scholarly standards in the digital age: Publishing historical gaz...
 
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014
 
Hadoop ecosystem for health/life sciences
Hadoop ecosystem for health/life sciencesHadoop ecosystem for health/life sciences
Hadoop ecosystem for health/life sciences
 
Reminiscing about interoperability
Reminiscing about interoperabilityReminiscing about interoperability
Reminiscing about interoperability
 
Memento 101
Memento 101Memento 101
Memento 101
 
Scaling up Linked Data
Scaling up Linked DataScaling up Linked Data
Scaling up Linked Data
 
Annotating Scholarly Resources
Annotating Scholarly ResourcesAnnotating Scholarly Resources
Annotating Scholarly Resources
 
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDTDBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
 
Druid Scaling Realtime Analytics
Druid Scaling Realtime AnalyticsDruid Scaling Realtime Analytics
Druid Scaling Realtime Analytics
 
Large Scale Data With Hadoop
Large Scale Data With HadoopLarge Scale Data With Hadoop
Large Scale Data With Hadoop
 
Signposting for Repositories
Signposting for RepositoriesSignposting for Repositories
Signposting for Repositories
 
20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
 
Introduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceIntroduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduce
 
Hadoop Family and Ecosystem
Hadoop Family and EcosystemHadoop Family and Ecosystem
Hadoop Family and Ecosystem
 
List of Engineering Colleges in Uttarakhand
List of Engineering Colleges in UttarakhandList of Engineering Colleges in Uttarakhand
List of Engineering Colleges in Uttarakhand
 

Similar to NISO ResourceSync Training Session

ResourceSync Tutorial from Open Repositories 2013
ResourceSync Tutorial from Open Repositories 2013ResourceSync Tutorial from Open Repositories 2013
ResourceSync Tutorial from Open Repositories 2013
Simeon Warner
 
The Missing Link-The Evolving Current State of Linked Data for Serials-Fallgren
The Missing Link-The Evolving Current State of Linked Data for Serials-FallgrenThe Missing Link-The Evolving Current State of Linked Data for Serials-Fallgren
The Missing Link-The Evolving Current State of Linked Data for Serials-Fallgren
NASIG
 
7th Content Providers Community Call
7th Content Providers Community Call7th Content Providers Community Call
7th Content Providers Community Call
OpenAIRE
 
A Clean Slate?
A Clean Slate?A Clean Slate?
A Clean Slate?
Herbert Van de Sompel
 
ResourceSync tutorial OAI8
ResourceSync tutorial OAI8ResourceSync tutorial OAI8
ResourceSync tutorial OAI8
Herbert Van de Sompel
 
NISO access related projects (presented at the Charleston conference 2016)
NISO access related projects (presented at the Charleston conference 2016)NISO access related projects (presented at the Charleston conference 2016)
NISO access related projects (presented at the Charleston conference 2016)
Christine Stohn
 
re3data.org – Registry of Research Data Repositories
re3data.org – Registry of Research Data Repositoriesre3data.org – Registry of Research Data Repositories
re3data.org – Registry of Research Data Repositories
Heinz Pampel
 
OpenAIRE guidelines and broker service for repository managers - OpenAIRE #OA...
OpenAIRE guidelines and broker service for repository managers - OpenAIRE #OA...OpenAIRE guidelines and broker service for repository managers - OpenAIRE #OA...
OpenAIRE guidelines and broker service for repository managers - OpenAIRE #OA...
OpenAIRE
 
Fox-Keynote-Now and Now of Data Publishing-nfdp13
Fox-Keynote-Now and Now of Data Publishing-nfdp13Fox-Keynote-Now and Now of Data Publishing-nfdp13
Fox-Keynote-Now and Now of Data Publishing-nfdp13
DataDryad
 
SWIB14 Weaving repository contents into the Semantic Web
SWIB14 Weaving repository contents into the Semantic WebSWIB14 Weaving repository contents into the Semantic Web
SWIB14 Weaving repository contents into the Semantic Web
Pascal-Nicolas Becker
 
Open Science Days 2014 - Becker - Repositories and Linked Data
Open Science Days 2014 - Becker - Repositories and Linked DataOpen Science Days 2014 - Becker - Repositories and Linked Data
Open Science Days 2014 - Becker - Repositories and Linked Data
Pascal-Nicolas Becker
 
UKSG webinar: Making Connections - Creating Linked Open Library Data with Nei...
UKSG webinar: Making Connections - Creating Linked Open Library Data with Nei...UKSG webinar: Making Connections - Creating Linked Open Library Data with Nei...
UKSG webinar: Making Connections - Creating Linked Open Library Data with Nei...
UKSG: connecting the knowledge community
 
Linked Media and Data Using Apache Marmotta
Linked Media and Data Using Apache MarmottaLinked Media and Data Using Apache Marmotta
Linked Media and Data Using Apache Marmotta
Sebastian Schaffert
 
Illuminating DSpace's Linked Data Support
Illuminating DSpace's Linked Data SupportIlluminating DSpace's Linked Data Support
Illuminating DSpace's Linked Data Support
Pascal-Nicolas Becker
 
Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research
Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-researchUc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research
Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research
University of California Curation Center
 
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
Martin Klein
 
Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...
openminted_eu
 
Scholze liber 2015-06-25_final
Scholze liber 2015-06-25_finalScholze liber 2015-06-25_final
Scholze liber 2015-06-25_final
Karlsruhe Institute of Technology (KIT)
 
Commodity Semantic Search: A Case Study of DiscoverEd
Commodity Semantic Search: A Case Study of DiscoverEdCommodity Semantic Search: A Case Study of DiscoverEd
Commodity Semantic Search: A Case Study of DiscoverEdNathan Yergler
 

Similar to NISO ResourceSync Training Session (20)

ResourceSync Tutorial from Open Repositories 2013
ResourceSync Tutorial from Open Repositories 2013ResourceSync Tutorial from Open Repositories 2013
ResourceSync Tutorial from Open Repositories 2013
 
The Missing Link-The Evolving Current State of Linked Data for Serials-Fallgren
The Missing Link-The Evolving Current State of Linked Data for Serials-FallgrenThe Missing Link-The Evolving Current State of Linked Data for Serials-Fallgren
The Missing Link-The Evolving Current State of Linked Data for Serials-Fallgren
 
7th Content Providers Community Call
7th Content Providers Community Call7th Content Providers Community Call
7th Content Providers Community Call
 
A Clean Slate?
A Clean Slate?A Clean Slate?
A Clean Slate?
 
ResourceSync tutorial OAI8
ResourceSync tutorial OAI8ResourceSync tutorial OAI8
ResourceSync tutorial OAI8
 
NISO access related projects (presented at the Charleston conference 2016)
NISO access related projects (presented at the Charleston conference 2016)NISO access related projects (presented at the Charleston conference 2016)
NISO access related projects (presented at the Charleston conference 2016)
 
re3data.org – Registry of Research Data Repositories
re3data.org – Registry of Research Data Repositoriesre3data.org – Registry of Research Data Repositories
re3data.org – Registry of Research Data Repositories
 
BaurCHCArchivist
BaurCHCArchivistBaurCHCArchivist
BaurCHCArchivist
 
OpenAIRE guidelines and broker service for repository managers - OpenAIRE #OA...
OpenAIRE guidelines and broker service for repository managers - OpenAIRE #OA...OpenAIRE guidelines and broker service for repository managers - OpenAIRE #OA...
OpenAIRE guidelines and broker service for repository managers - OpenAIRE #OA...
 
Fox-Keynote-Now and Now of Data Publishing-nfdp13
Fox-Keynote-Now and Now of Data Publishing-nfdp13Fox-Keynote-Now and Now of Data Publishing-nfdp13
Fox-Keynote-Now and Now of Data Publishing-nfdp13
 
SWIB14 Weaving repository contents into the Semantic Web
SWIB14 Weaving repository contents into the Semantic WebSWIB14 Weaving repository contents into the Semantic Web
SWIB14 Weaving repository contents into the Semantic Web
 
Open Science Days 2014 - Becker - Repositories and Linked Data
Open Science Days 2014 - Becker - Repositories and Linked DataOpen Science Days 2014 - Becker - Repositories and Linked Data
Open Science Days 2014 - Becker - Repositories and Linked Data
 
UKSG webinar: Making Connections - Creating Linked Open Library Data with Nei...
UKSG webinar: Making Connections - Creating Linked Open Library Data with Nei...UKSG webinar: Making Connections - Creating Linked Open Library Data with Nei...
UKSG webinar: Making Connections - Creating Linked Open Library Data with Nei...
 
Linked Media and Data Using Apache Marmotta
Linked Media and Data Using Apache MarmottaLinked Media and Data Using Apache Marmotta
Linked Media and Data Using Apache Marmotta
 
Illuminating DSpace's Linked Data Support
Illuminating DSpace's Linked Data SupportIlluminating DSpace's Linked Data Support
Illuminating DSpace's Linked Data Support
 
Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research
Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-researchUc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research
Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research
 
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
 
Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...
 
Scholze liber 2015-06-25_final
Scholze liber 2015-06-25_finalScholze liber 2015-06-25_final
Scholze liber 2015-06-25_final
 
Commodity Semantic Search: A Case Study of DiscoverEd
Commodity Semantic Search: A Case Study of DiscoverEdCommodity Semantic Search: A Case Study of DiscoverEd
Commodity Semantic Search: A Case Study of DiscoverEd
 

More from National Information Standards Organization (NISO)

Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"
Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"
Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"
National Information Standards Organization (NISO)
 
Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"
Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"
Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"
National Information Standards Organization (NISO)
 
Mattingly "AI and Prompt Design: LLMs with NER"
Mattingly "AI and Prompt Design: LLMs with NER"Mattingly "AI and Prompt Design: LLMs with NER"
Mattingly "AI and Prompt Design: LLMs with NER"
National Information Standards Organization (NISO)
 
Mattingly "AI & Prompt Design: Named Entity Recognition"
Mattingly "AI & Prompt Design: Named Entity Recognition"Mattingly "AI & Prompt Design: Named Entity Recognition"
Mattingly "AI & Prompt Design: Named Entity Recognition"
National Information Standards Organization (NISO)
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
National Information Standards Organization (NISO)
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
National Information Standards Organization (NISO)
 
Bazargan "NISO Webinar, Sustainability in Publishing"
Bazargan "NISO Webinar, Sustainability in Publishing"Bazargan "NISO Webinar, Sustainability in Publishing"
Bazargan "NISO Webinar, Sustainability in Publishing"
National Information Standards Organization (NISO)
 
Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"
National Information Standards Organization (NISO)
 
Compton "NISO Webinar, Sustainability in Publishing"
Compton "NISO Webinar, Sustainability in Publishing"Compton "NISO Webinar, Sustainability in Publishing"
Compton "NISO Webinar, Sustainability in Publishing"
National Information Standards Organization (NISO)
 
Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"
National Information Standards Organization (NISO)
 
Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...
Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...
Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...
National Information Standards Organization (NISO)
 
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
National Information Standards Organization (NISO)
 
Mattingly "Text and Data Mining: Building Data Driven Applications"
Mattingly "Text and Data Mining: Building Data Driven Applications"Mattingly "Text and Data Mining: Building Data Driven Applications"
Mattingly "Text and Data Mining: Building Data Driven Applications"
National Information Standards Organization (NISO)
 
Mattingly "Text and Data Mining: Searching Vectors"
Mattingly "Text and Data Mining: Searching Vectors"Mattingly "Text and Data Mining: Searching Vectors"
Mattingly "Text and Data Mining: Searching Vectors"
National Information Standards Organization (NISO)
 
Mattingly "Text Mining Techniques"
Mattingly "Text Mining Techniques"Mattingly "Text Mining Techniques"
Mattingly "Text Mining Techniques"
National Information Standards Organization (NISO)
 
Mattingly "Text Processing for Library Data: Representing Text as Data"
Mattingly "Text Processing for Library Data: Representing Text as Data"Mattingly "Text Processing for Library Data: Representing Text as Data"
Mattingly "Text Processing for Library Data: Representing Text as Data"
National Information Standards Organization (NISO)
 
Carpenter "Designing NISO's New Strategic Plan: 2023-2026"
Carpenter "Designing NISO's New Strategic Plan: 2023-2026"Carpenter "Designing NISO's New Strategic Plan: 2023-2026"
Carpenter "Designing NISO's New Strategic Plan: 2023-2026"
National Information Standards Organization (NISO)
 
Ross and Clark "Strategic Planning"
Ross and Clark "Strategic Planning"Ross and Clark "Strategic Planning"
Ross and Clark "Strategic Planning"
National Information Standards Organization (NISO)
 
Mattingly "Data Mining Techniques: Classification and Clustering"
Mattingly "Data Mining Techniques: Classification and Clustering"Mattingly "Data Mining Techniques: Classification and Clustering"
Mattingly "Data Mining Techniques: Classification and Clustering"
National Information Standards Organization (NISO)
 
Straza "Global collaboration towards equitable and open science: UNESCO Recom...
Straza "Global collaboration towards equitable and open science: UNESCO Recom...Straza "Global collaboration towards equitable and open science: UNESCO Recom...
Straza "Global collaboration towards equitable and open science: UNESCO Recom...
National Information Standards Organization (NISO)
 

More from National Information Standards Organization (NISO) (20)

Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"
Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"
Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"
 
Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"
Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"
Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"
 
Mattingly "AI and Prompt Design: LLMs with NER"
Mattingly "AI and Prompt Design: LLMs with NER"Mattingly "AI and Prompt Design: LLMs with NER"
Mattingly "AI and Prompt Design: LLMs with NER"
 
Mattingly "AI & Prompt Design: Named Entity Recognition"
Mattingly "AI & Prompt Design: Named Entity Recognition"Mattingly "AI & Prompt Design: Named Entity Recognition"
Mattingly "AI & Prompt Design: Named Entity Recognition"
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Bazargan "NISO Webinar, Sustainability in Publishing"
Bazargan "NISO Webinar, Sustainability in Publishing"Bazargan "NISO Webinar, Sustainability in Publishing"
Bazargan "NISO Webinar, Sustainability in Publishing"
 
Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"
 
Compton "NISO Webinar, Sustainability in Publishing"
Compton "NISO Webinar, Sustainability in Publishing"Compton "NISO Webinar, Sustainability in Publishing"
Compton "NISO Webinar, Sustainability in Publishing"
 
Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"
 
Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...
Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...
Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...
 
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
 
Mattingly "Text and Data Mining: Building Data Driven Applications"
Mattingly "Text and Data Mining: Building Data Driven Applications"Mattingly "Text and Data Mining: Building Data Driven Applications"
Mattingly "Text and Data Mining: Building Data Driven Applications"
 
Mattingly "Text and Data Mining: Searching Vectors"
Mattingly "Text and Data Mining: Searching Vectors"Mattingly "Text and Data Mining: Searching Vectors"
Mattingly "Text and Data Mining: Searching Vectors"
 
Mattingly "Text Mining Techniques"
Mattingly "Text Mining Techniques"Mattingly "Text Mining Techniques"
Mattingly "Text Mining Techniques"
 
Mattingly "Text Processing for Library Data: Representing Text as Data"
Mattingly "Text Processing for Library Data: Representing Text as Data"Mattingly "Text Processing for Library Data: Representing Text as Data"
Mattingly "Text Processing for Library Data: Representing Text as Data"
 
Carpenter "Designing NISO's New Strategic Plan: 2023-2026"
Carpenter "Designing NISO's New Strategic Plan: 2023-2026"Carpenter "Designing NISO's New Strategic Plan: 2023-2026"
Carpenter "Designing NISO's New Strategic Plan: 2023-2026"
 
Ross and Clark "Strategic Planning"
Ross and Clark "Strategic Planning"Ross and Clark "Strategic Planning"
Ross and Clark "Strategic Planning"
 
Mattingly "Data Mining Techniques: Classification and Clustering"
Mattingly "Data Mining Techniques: Classification and Clustering"Mattingly "Data Mining Techniques: Classification and Clustering"
Mattingly "Data Mining Techniques: Classification and Clustering"
 
Straza "Global collaboration towards equitable and open science: UNESCO Recom...
Straza "Global collaboration towards equitable and open science: UNESCO Recom...Straza "Global collaboration towards equitable and open science: UNESCO Recom...
Straza "Global collaboration towards equitable and open science: UNESCO Recom...
 

Recently uploaded

Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 

Recently uploaded (20)

Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 

NISO ResourceSync Training Session

  • 1. NISO Training http://www.niso.org/workrooms/onixpl-encoding/ NISO Training ResourceSync: A Web-Based Resource Synchronization Framework December 3, 2013 Speakers: Bernhard Haslhofer - Postdoc Research Associate Department of Computer Science, University of Vienna Simeon Warner - Information Science, Cornell University
  • 2. ResourceSync: A Web-Based Resource Synchronization Framework #resourcesync ResourceSync is funded by The Sloan Foundation & JISC ResourceSync Webinar December 3 2013 2
  • 3. This is a short version of the complete ResourceSync tutorial, which is available at http://www.slideshare.net/OpenArchivesInitiative/resourcesync-tutorial ResourceSync Webinar December 3 2013 3
  • 4. ResourceSync Tutorial History • OAI8, June 2013 – Open Repositories, July 2013 – JCDL, July 2013 – TPDL 2013, September 2013 –LITA Forum, November 2013, SWIB November 2013, … Presenters Simeon Warner Cornell University Bernhard Haslhofer University of Vienna ResourceSync Webinar December 3 2013 4
  • 5. ResourceSync Tutorial Contributors Martin Klein Herbert Van de Sompel Robert Sanderson Los Alamos National Laboratory Los Alamos National Laboratory Los Alamos National Laboratory <martinklein0815@gmail.com> <hvdsomp@gmail.com> <azaroth24@gmail.com> @mart1nkle1n @hvdsomp @azaroth24 Simeon Warner Cornell University <simeon.warner@cornell.edu> @zimeon Michael L. Nelson Old Dominion University <mln@cs.odu.edu> @phonedude_mln Richard Jones Cottage Labs <richard@cottagelabs.com> @cottagelabs ResourceSync Webinar December 3 2013 5
  • 6. OAI Herbert Van de Sompel Martin Klein Robert Sanderson (Los Alamos National Laboratory) Simeon Warner (Cornell University) NISO Todd Carpenter Nettie Lagace University of Oxford Graham Klyne Bernhard Haslhofer (University of Vienna) Michael L. Nelson (Old Dominion University) Lyrasis Peter Murray Carl Lagoze (University of Michigan) ResourceSync Webinar December 3 2013 6
  • 7. ResourceSync Technical Group LOCKSS Ex Libris Inc. Shlomo Sanders David Rosenthal JISC Richard Jones Paul Walk Stuart Lewis RedHat OCLC Christian Sadilek Library of Congress Jeff Young Kevin Ford ResourceSync Webinar December 3 2013 7
  • 8. Timeline, Status of Specification(s) • August 2013 o o Release of ResourceSync framework Core specification - Version 0.9.1 Public draft of ResourceSync Archives specification released • September 2013 o Core specification on its way to become an ANSI standard • November 2013 o Internal draft of ResourceSync Notification specification • January 2014 o Public draft of ResourceSync Notification specification • Mid 2014 o Core specification becomes ANSI/NISO standard ResourceSync Webinar December 3 2013 8
  • 9. Pointers • Specification http://www.openarchives.org/rs/ http://www.openarchives.org/rs/resourcesync http://www.openarchives.org/rs/archives • List for public comment https://groups.google.com/d/forum/resourcesync • Client and simulator code http://github.org/resync/resync http://github.org/resync/simulator ResourceSync Webinar December 3 2013 9
  • 10. ResourceSync - Agenda 1. ResourceSync: Problem Perspective & Conceptual Approach 2. Motivation & Use Cases 3. Framework Walkthrough 4. Framework (Technical) Details 5. Implementation 6. Q&A ResourceSync Webinar December 3 2013 10
  • 11. ResourceSync - Agenda 1. ResourceSync: Problem Perspective & Conceptual Approach ResourceSync Webinar December 3 2013 11
  • 12. Synchronize What? • Web resources o things with a URI that can be dereferenced • Focus on needs of research communication and cultural heritage organizations o but aim for generality ResourceSync Webinar December 3 2013 12
  • 13. Synchronize What? • Small websites/repositories (a few resources) to large repositories/datasets/linked data collections (many millions of resources) sync sync ResourceSync Webinar December 3 2013 13
  • 14. Synchronize What? • Low change frequency (weeks/months) to high change frequency (seconds) • Synchronization latency and accuracy needs may vary sync sync sync ResourceSync Webinar December 3 2013 14
  • 15. Why? … because lots of projects and services are doing synchronization but have to resort to ad-hoc, case by case, approaches! • Project team involved with projects that need this • Experience with OAI-PMH: widely used in repos but o XML metadata only o Attempts at synchronizing actual content via OAI-PMH (complex object formats, dc:identifier) not successful. o Web technology has moved on since 1999 • Devise a shared solution for data, metadata, linked data? ResourceSync Webinar December 3 2013 15
  • 16. ResourceSync Problem • Consideration: • Source (server) A has resources that change over time: they get created, modified, deleted • Destination (servers) X, Y, and Z leverage (some) resources of Source A. • Problem: • Destinations want to keep in step with the resource changes at Source A: resource synchronization. • Goal: • Design an approach for resource synchronization aligned with the Web Architecture that has a fair chance of adoption by different communities. • The approach must scale better than recurrent HTTP HEAD/GET on resources. ResourceSync Webinar December 3 2013 16
  • 17. Source: Core Synchronization Capabilities P U L L 1. Describing content – publish a list of resources available for synchronization to enable Destinations to perform an initial load or catch-up with a Source 2. Packaging content – bundle resources to enable bulk download by destinations 3. Describing changes – publish a list of resource changes to enable destinations to stay synchronized and decrease latency 4. Packaging changes – bundle resource changes for bulk download by destinations ResourceSync Webinar December 3 2013 17
  • 18. Source: Notifications Capabilities To reduce synchronization latency and to optimize the synchronization process the Source can support: P • U S • H 1. Change Notification • Notifies about changes to particular resources • e.g., resource A has been updated | created | deleted 2. Framework Notification • Notifies about changes to capabilities i.e., their documents • e.g., a Change List has been updated | created | deleted ResourceSync Webinar December 3 2013 18
  • 19. Source: Synchronization Features 1. Discovery of capabilities – support Destinations in discovering all offered capabilities o Applies to PULL, PUSH, capabilities 1. Linking to related resources – provide links from resources subject to synchronization to related resources o Applies to PULL, PUSH capabilities ResourceSync Webinar December 3 2013 19
  • 20. Destination: Synchronization Needs 1. Baseline synchronization – A destination must be able to perform an initial load or catch-up with a source - avoid out-of-band setup 2. Incremental synchronization – A destination must have some way to keep up-to-date with changes at a source - subject to some latency; minimal: create/update/delete - allow to catch-up after destination has been offline 3. Audit – A destination should be able to determine whether it is synchronized with a source - regarding coverage and accuracy ResourceSync Webinar December 3 2013 20
  • 21. ResourceSync - Agenda 2. Motivation & Use Cases ResourceSync Webinar December 3 2013 21
  • 22. Use Case 1: arXiv Mirroring and Data Sharing • Repository of scholarly articles in physics, mathematics, computer science, etc. • > 850k articles • approx. 1.5 revisions per article on average • approx. 75k new articles per year • Each article has full-text and separate metadata record • approx. 3.8M resources ResourceSync Webinar December 3 2013 22
  • 23. Use Case 1: arXiv Mirroring and Data Sharing • 2,700 updates daily o at 8pm EST o Currently using homebrew mirroring solution (running with minor modifications since 1994!) o occasional rsync (file systemspecific, auth issues) ResourceSync Webinar December 3 2013 23
  • 24. Use Case 1: arXiv Mirroring / Data Sharing • GOAL: Keep mirror sites synchronized with daily changes • WANT: o o o o o o high consistency moderate latency robustness to global network outages (low admin effort) ability to verify sync status in case of questions low admin effort (i.e. standard approach, standard tools) reasonable consistency, latency, efficiency ResourceSync Webinar December 3 2013 24
  • 25. Use Case 2: DBpedia Live Duplication • Average of 2 updates per second • Low latency desirable => need for a push technology ResourceSync Webinar December 3 2013 25
  • 26. ResourceSync - Agenda 3. Framework Walkthrough ResourceSync Webinar December 3 2013 26
  • 27. Source Capability 1: Describing Content In order to advertise the resources that a source wants destinations to know about, it may describe them: o o Publish a Resource List, a list of resource URIs and possibly associated metadata - Destination GETs the Resource List - Destination GETs listed resources by their URI A Resource List describes the state of a set of resources at one point in time (snapshot) ResourceSync Webinar December 3 2013 27
  • 28. 28
  • 29. 29
  • 30. Source Capability 2: Packaging Content By default, content is transferred in response to a GET issued by a destination against a URI of a source’s resource. But a source may support additional mechanisms: o o Publish a Resource Dump, a document that points to packages of resource representations and necessary metadata - Destination GETs the package - Destination unpacks the package - ZIP format supported A Resource Dump and the packages it points to reflect the state of a set of resources at one point in time (snapshot) ResourceSync Webinar December 3 2013 30
  • 31. 31
  • 32. 32
  • 33. Source Capability 3: Describing Changes In order to achieve lower latency and/or greater efficiency, a source may communicate about changes to its resources: o o Publish a Change List, a list of recent change events (created, updated, deleted resource) - Destination acts upon change events, e.g. GETs created/updated resources, removes deleted resources. A Change List pertains to resources that changed in a temporal interval with a start- and an end-date - If a resource changed more than once, it will be listed more than once ResourceSync Webinar December 3 2013 33
  • 34. 34
  • 35. 35
  • 36. 36
  • 37. Source Capability 4: Packaging Changes In order to reduce the number of requests to obtain resource changes, a source may provide packaged bitstreams for changed resources: o o Publish a Change Dump, a document that points to packages containing bitstreams of recently changed resource and necessary metadata - Destination GETs the package - Destination unpacks the package - ZIP format supported A Change Dump and its packages pertain to resources that changed in a temporal interval with a start- and an end-date - If a resource changed more than once, it will be included more than once ResourceSync Webinar December 3 2013 37
  • 38. 38
  • 39. Destination: Key Processes ResourceSync Webinar December 3 2013 39
  • 40. ResourceSync - Agenda 4. Framework (Technical) Details ResourceSync Webinar December 3 2013 40
  • 41. So Many Choices Push DSNotify OAI-PMH rsync Crawl Pull OAI-ORE RDFsync WebDAV Col. Syn. XMPP Atom SWORD Sitemap SPARQLpush SDShare AtomPub RSS PubSubHubbub XMPP ResourceSync Webinar December 3 2013 41
  • 42. So Many Choices Push DSNotify OAI-PMH rsync Crawl Pull OAI-ORE RDFsync WebDAV Col. Syn. XMPP Atom SWORD Sitemap SPARQLpush SDShare AtomPub RSS PubSubHubbub XMPP ResourceSync Webinar December 3 2013 42
  • 45. ResourceSync Sitemap Extensions <urlset xmlns=http://www.sitemaps.org/schemas/sitemap/0.9 xmlns:rs="http://www.openarchives.org/rs/terms/”> <rs:ln …/> <rs:md …/> <url> <loc>http://example.com/res1</loc> <lastmod>2013-01-02T13:00:00Z</lastmod> <rs:ln …/> <rs:md …/> </url> <url> … </url> </urlset> ResourceSync Webinar December 3 2013 45
  • 46. Related Resource Metadata Summary • Attributes of the <rs:ln> element; c.f. resource metadata + pri Element/Attribute Description Defined by <rs:ln> ResourceSync encoding HTTP Content-Encoding header value RFC2616 hash One or more content digests (md5, sha-1, sha-256) Atom Link Ext. href Related resource URI (identity) RFC4287 length HTTP Content-Length header value RFC4287 modified Timestamp of last change (c.f. <lastmod>) Atom Link Ext. path Path in ZIP package (Dump Manifests only) ResourceSync pri Priority of link RFC6249 rel Relation - IANA registered or URI RFC4287 type HTTP Content-Type header value RFC4287 ResourceSync Webinar December 3 2013
  • 47. Resource Metadata Summary Element/Attribute <loc> <lastmod> Description Resource URI (identity) Timestamp of last change Defined by sitemaps sitemaps <changefreq> Expected update frequency sitemaps <rs:md> change encoding hash length path type ResourceSync Change type (Change List & Change Dump Manifest only) ResourceSync HTTP Content-Encoding header value RFC2616 One or more content digests (md5, sha-1, Atom Link Ext. sha-256) HTTP Content-Length header value RFC4287 Path in ZIP package (Dump Manifests only) HTTP Content-Type header value ResourceSync RFC4287 ResourceSync Webinar December 3 2013
  • 48. Link Relation Summary Relation Use in ResourceSync Defined in rel="alternate" Link from generic to specific URI HTML 5 rel="canonical" Link from specific to generic URI RFC6596 rel="collection" Resource is member of collection RFC6573 rel="contents" Link from dump to manifest rel="describedby" Has metadata HTML4 Protocol for Web Description Resources (POWDER): Description Resources rel="describes" Is metadata for The 'describes' Link Relation Type rel="duplicate" RFC6249 rel=".../rs/terms/patch" Mirror or alternative copy A patch -- efficient change information rel="memento" Link to time-specific URI Memento Internet Draft rel="timegate" Link to timegate Memento Internet Draft rel="via" Provenance chain, came from RFC4287 This specification ResourceSync Webinar December 3 2013
  • 49. Resource List <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:rs="http://www.openarchives.org/rs/terms/"> <rs:md capability="resourcelist" at="2013-01-03T09:00:00Z” completed="2013-01-03T09:01:00Z” /> <url> <loc>http://example.com/res1</loc> <lastmod>2013-01-02T13:00:00Z</lastmod> <rs:md hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6" length="8876" type="text/html"/> </url> <url> … </url> </urlset> ResourceSync Webinar December 3 2013 49
  • 50. Resource List • Describe Source’s resources that are subject to synchronization • At one point in time (snapshot) • Creation can take some time – duration can be conveyed • Typical Destination use: Baseline Synchronization, Audit • Each URI typically listed only once • Might be expensive to generate • Destinations use @at to determine freshness • [@at, @completed] – interval of uncertainty • Destination issues GETs against URIs to obtain resources • Very similar to current Sitemaps ResourceSync Webinar December 3 2013 50
  • 51. Resource Dump <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:rs="http://www.openarchives.org/rs/terms/"> <rs:md capability=”resourcedump" at="2013-01-02T09:00:00Z”/> <url> <loc>http://example.com/resourcedump_part1.zip</loc> <lastmod>2013-01-02T13:00:00Z</lastmod> <rs:md length=”97553" type=”application/zip"/> <rs:ln rel=”contents” href="http://example.com/resourcedump_manifest-part1.xml" type=”application/xml"/> </url> <url> <loc>http://example.com/resourcedump_part2.zip</loc> <lastmod>2013-01-02T13:00:00Z</lastmod> </url> </urlset> ResourceSync Webinar December 3 2013 51
  • 52. Resource Dump Manifest <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:rs="http://www.openarchives.org/rs/terms/"> <rs:md capability=”resourcedump-manifest" at="2013-01-02T09:00:00Z”/> <url> <loc>http://example.com/res1</loc> <lastmod>2013-01-02T13:00:00Z</lastmod> <rs:md type="text/html" path=”/resources/res1"/> </url> <url> <loc>http://example.com/res2</loc> <lastmod>2013-01-02T13:00:00Z</lastmod> <rs:md type=”application/pdf” path=”/resources/res2"/> </url> </urlset> ResourceSync Webinar December 3 2013 52
  • 53. Resource Dump • A Resource Dump points to packages (ZIP files) that contain representations of the Source’s resources • At one point in time (snapshot) • Resource Dump is mandatory, even if there is only one ZIP file • ZIP package contains manifest, listing contained bitstreams • Typical Destination use: Baseline Synchronization, bulk download • Each URI typically listed only once • Might be expensive to generate • Destinations use @at to determine freshness • [@at, @completed] – interval of uncertainty • GETs against individual URIs from Resource List achieves the same result (ignoring varying freshness) ResourceSync Webinar December 3 2013 53
  • 54. Change List <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:rs="http://www.openarchives.org/rs/terms/"> <rs:md capability=”changelist" from="2013-01-02T09:00:00Z” until="2013-01-03T09:00:00Z”/> <url> <loc>http://example.com/res1</loc> <lastmod>2013-01-02T13:00:00Z</lastmod> <rs:md change=”updated" hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6" length="8876" type="text/html"/> </url> <url> … </url> </urlset> ResourceSync Webinar December 3 2013 54
  • 55. Change List • A Change List pertains to a Source’s resources that changed • Changes that occurred during a temporal interval with startand end-date • Typical Destination use: Incremental Synchronization, Audit • Changes are listed in chronological order • Multiple changes to one resource results in the resource being listed multiple times, once per change • Source determines duration of temporal interval • Destinations use @from and @until to determine freshness • Destinations issue GETs against URIs to obtain changed resources ResourceSync Webinar December 3 2013 55
  • 56. Discovery of Capabilities Requirements: • Need to discover capabilities, i.e. Resource List, Resource Dump, Change List, Change Dump, Archives, Notification channels • Need to know the type of capability each document represents. Approach: • The Source publishes a Capability List that enumerates the capabilities it supports. • By pointing at Resource List, Change List, Resource Dump, etc. using appropriate relation types, e.g. “resourcelist”, “changelist”, “resourcedump” etc. http://www.openarchives.org/rs/resourcesync#CapabilityList ResourceSync Webinar December 3 2013 56
  • 57. Discovery of Capabilities ResourceSync Webinar December 3 2013 57
  • 58. Capability List <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:rs="http://www.openarchives.org/rs/terms/"> <rs:md capability=”capabilitylist”/> <url> <loc>http://example.com/dataset1/resourcelist.xml</loc> <rs:md capability=”resourcelist”/> </url> <url> <loc>http://example.com/dataset1/changelist.xml</loc> <rs:md capability=”changelist”/> </url> <url> <loc>http://example.com/dataset1/resourcedump.xml</loc> <rs:md capability=”resourcedump”/> </url> </urlset> ResourceSync Webinar December 3 2013 58
  • 59. Discovery of Capability Lists Requirements: • Need to discover a Capability List Approaches: • Introduce a link in the HTTP Link header of a resources that is subject to synchronization, pointing at the Capability List with the relation type “resourcesync” • Introduce a link from an HTML document that is subject to synchronization (<head> section), pointing at the Capability List with the relation type “resourcesync” • Link from a Resource List, etc. to the Capability List with the relation type “up” Link header on example.com/res1.pdf Link: <example.com/dataset1/capabilitylist.xml>;rel=“resourcesync” ResourceSync Webinar December 3 2013 59
  • 60. Discovery via robots.txt • Resource Lists are (enhanced) Sitemaps • Sitemaps can be discovered via robots.txt • Ergo, Resource Lists should be discoverable via robots.txt User-agent: * Disallow: /cgi-bin/ Disallow: /tmp/ Sitemap: http://example.com/dataset1/resourcelist.xml ResourceSync Webinar December 3 2013 60
  • 62. Motivation for Notifications • Reduce synchronization latency by having the Source push out resource change information • To avoid continuous pull of Change Lists by Destinations • Share information about changes to the Source’s ResourceSync implementation, e.g. announcement of new Resource List, new Capability List, etc. • To avoid continuous polling of e.g. Resource Lists, ResourceSync Description ResourceSync Webinar December 3 2013 62
  • 63. Source: Notification Capabilities • P U • S H 1. Change Notification • Notifies about changes to particular resources • e.g., resource A has been updated | created | deleted 2. Framework Notification • Notifies about changes to capabilities i.e., their documents • e.g., a Change List has been updated | created | deleted • Also for Capability Lists and Source Description ResourceSync Webinar December 3 2013 63
  • 64. Notification Channels • Notification sent via channels • Resource Notification: one channel per set of resources • Framework Notification: one channel per set of resources • Sent on level of capability document, not on index-level • Notifications about changes to Source Description sent on all Framework Notification channels • Payload for notifications: <urlset> documents • Transport protocol for notifications under discussion: • PubSubHubbub https://pubsubhubbub.googlecode.com/git/pubsubhubbub-core0.4.html - current choice • WebSockets -http://tools.ietf.org/html/rfc6455 – may be added later ResourceSync Webinar December 3 2013 64
  • 65. Change Notification Payload <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:rs="http://www.openarchives.org/rs/terms/"> <rs:ln rel="up" href="http://example.com/dataset1/capabilitylist.xml"/> <url> <loc>http://example.com/res1</loc> <lastmod>2013-01-02T09:07:00Z</lastmod> <rs:md change=”updated" hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6" length="8876" type="text/html"/> </url> <url> … </url> </urlset> ResourceSync Webinar December 3 2013 65
  • 66. ResourceSync - Agenda 5. Implementation ResourceSync Webinar December 3 2013 66
  • 67. DSpace support for metadata harvesting use case DSpace Module: https://github.com/CottageLabs/DSpaceResourceSync PHP client: https://github.com/stuartlewis/resync-php http://mydspace.edu/dspace-rs/resource/123456789/7/qdc ResourceSync webapp Item handle Metadata Format ResourceSync Webinar December 3 2013 67
  • 68. ResourceSync @ arXiv • Use ResourceSync for both mirroring and public data access o efficient updates o ability to do periodic audits o public synchronization capability o reduce admin burden • Start with metadata + source for mirroring use case (doing experiments now) • Open Access use cases require processed PDF also ResourceSync Webinar December 3 2013 68
  • 69. Getting a copy of arXiv It might be as easy as: (of course, you probably have to wait a while but it is nice to know ResourceSync is stateless so one can efficiently restart) ResourceSync Webinar December 3 2013 69
  • 70. Python Library and Client • Aim to provide library code implementing all ResourceSync facilities for use in both source and destination implementations o Designed for python 2.6 (RHEL6) and 2.7 • Client (resync) supports many destination operations, inspired by the common Unix rsync program • Client also supports some operations that might be useful in a source, such as generation of static Resource Lists, or periodic Change Lists (used in arXiv experiments) • Explorer (resync-explorer) intended to allow easy inspection of a source’s resource sets and capabilities • Developed since ResourceSync v0.5, updated for v0.9.1 http://github.org/resync/resync On pypi: “easy_install resync” ResourceSync Webinar December 3 2013
  • 71. ResourceSync Source Simulator • Python code using Tornado server • Provides random set of resources of different sizes updated at a particular rate • Very useful for testing Destination code http://github.com/resync/simulator ResourceSync Webinar December 3 2013
  • 72. ResourceSync - Agenda 6. Q&A ResourceSync Webinar December 3 2013 72
  • 73. ResourceSync: A Web-Based Resource Synchronization Framework #resourcesync ResourceSync is funded by The Sloan Foundation & JISC ResourceSync Webinar December 3 2013 73
  • 74. THANK YOU We look forward to seeing you at a future NISO training event.

Editor's Notes

  1. LANL Memento Aggregator of IIPC; Europeana does metadata via OAI-PMH but anticipate content also; arXiv – mirroring and data sharing; Linked data @ BBC; DBpedia, journal data at LANLREST not about in 1999
  2. Semantic web version of wikipedia; want mirror to provide reliable basis for local services
  3. Top line – just metadata about resources, destination uses GET to get them (duh)Bottom line – packaged content =&gt; fewer round trips
  4. Rsyncetc just reference; push vs pull -&gt; both; many other parts
  5. Rsyncetc just reference; push vs pull -&gt; both; many other parts
  6. Add: rel=“contents”rel=“archives”
  7. Test site, has subsets of arXiv and even complete source plus metadata (at present not up to date with 0.9)
  8. No way around the difficulty of transferring 1TB initially but then a daily or weekly sync is efficient, and it still works even after some arbitrary time.