This document discusses augmenting interoperability across scholarly repositories. It proposes a shared data model and services, based on core data surrogates for digital objects that can be obtained from, harvested from, and put into repositories. This would allow richer cross-repository services and enable scholarly communication as a global workflow. The Pathways core data model is presented for representing digital objects uniformly across repositories to support interoperable functions.
Keynote presentation delivered at ELAG 2013 in Gent, Belgium, on May 29 2013. Discusses Research Objects and their relationship to work my team has been involved in over the past couple of years: OAI-ORE, Open Annotation, Memento.
Linking Universities - A broader look at the application of linked data and s... (Mathieu d'Aquin)
Presentation at the VIVO - International Research Network about Linked Universities, data.open.ac.uk, linkedup, linked data for universities, education and research.
These slides accompany the LDOW2010 paper "An HTTP-Based Versioning Mechanism for Linked Data", available at http://arxiv.org/abs/1003.3661. It describes how the combination of the Memento (Time Travel for the Web) framework and a resource versioning approach that is aligned both with the Cool URI notion and with Tim Berners-Lee's concept of Time-Generic and Time-Specific resources yields the ability to collect current and prior versions of a resource merely by using "follow your nose" HTTP navigation. The proposed combination further extends the value of a URI and allows the emergence of a novel realm of temporal Web applications.
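To make the "follow your nose" pattern concrete, here is a minimal Python sketch of how a client might discover a TimeGate link and format an Accept-Datetime request header as used by Memento. The URIs are illustrative, and the Link header parsing is deliberately simplified, not a full RFC-compliant parser:

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def parse_link_header(value):
    """Parse an HTTP Link header into (target, rel) pairs (simplified)."""
    links = []
    for part in value.split(","):
        segments = part.split(";")
        target = segments[0].strip().strip("<>")
        for seg in segments[1:]:
            key, _, val = seg.strip().partition("=")
            if key == "rel":
                for rel in val.strip('"').split():
                    links.append((target, rel))
    return links

def accept_datetime(dt):
    """Format a datetime as the RFC 1123 value of an Accept-Datetime header."""
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)

# A Link header as a versioned resource might expose it (illustrative URI):
header = '<http://example.org/timegate/page>; rel="timegate"'
[(timegate, rel)] = parse_link_header(header)
# A client would now issue a GET on the TimeGate with this request header:
print("Accept-Datetime:", accept_datetime(datetime(2010, 3, 1, tzinfo=timezone.utc)))
```

A TimeGate receiving such a request responds, via datetime negotiation, with (a redirect to) the version of the resource that was active at the requested datetime.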
Slides used for a presentation at the CNI 2013 Fall meeting. Discusses the problem domain of the Hiberlink project, a collaboration between the Los Alamos National Laboratory and the University of Edinburgh, funded by the Andrew W. Mellon Foundation. Hiberlink investigates reference rot in web-based scholarly communication.
This presentation introduces the Memento solution to allow time travel on the Web. Slides used at the first presentation about Memento at the Library of Congress, November 16 2009. Please consult the February 2010 slides (http://www.slideshare.net/hvdsomp/memento-updated-technical-details-february-2010) for up-to-date technical details. More info at http://www.mementoweb.org
As the scholarly communication system evolves to become natively web-based and starts supporting the communication of a wide variety of objects, the manner in which its essential functions – registration, certification, awareness, archiving - are fulfilled co-evolves. This presentation focuses on the nature of the archival function based on a perspective of the future scholarly communication infrastructure. This presentation, prepared for a meeting in June 2014, is based on and updates a previous one that was prepared for a January 2014 meeting. The latter is available at http://www.slideshare.net/atreloar/scholarly-archiveofthefuture
Presentation for PIDapalooza 2016. PIDs need to be used to achieve their intended persistence. Our research (reported at WWW2016, see http://arxiv.org/1602.09102) found that a disturbing percentage of references to papers that have DOIs actually use the landing page HTTP URI instead of the DOI HTTP URI. The problem is likely related to tools used for collecting references such as bookmarks and reference managers. These select the landing page URI instead of the DOI URI because the former is what's available in the address bar. It can safely be assumed that the same problem exists for other types of PIDs. The net result is that the true potential of PIDs is not realized. In order to ameliorate this problem we propose a Signposting pattern for PIDs (http://signposting.org/identifier/). It consists of adding a Link header to HTTP HEAD/GET responses for all resources identified by a DOI, including the landing page and content resources such as "the PDF" and "the dataset". The Link header contains a link, which points with the "identifier" relation type to the DOI HTTP URI. When such a link is available, tools can automatically discover and use the DOI URI instead of the other URIs (landing page, PDF, dataset) associated with the DOI-identified object.
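As a sketch of how a tool might consume the proposed Signposting pattern, the following Python snippet extracts the target of a rel="identifier" link from a Link header. The DOI is hypothetical and the header parsing is simplified:

```python
def find_identifier(link_header):
    """Return the target of the link with rel="identifier", if any
    (the relation used in the Signposting pattern for PIDs)."""
    for part in link_header.split(","):
        segments = part.split(";")
        target = segments[0].strip().strip("<>")
        for seg in segments[1:]:
            key, _, val = seg.strip().partition("=")
            if key == "rel" and "identifier" in val.strip('"').split():
                return target
    return None

# Response header a landing page or PDF might send (hypothetical DOI):
header = '<https://doi.org/10.1234/example>; rel="identifier"'
print(find_identifier(header))  # the DOI HTTP URI, not the landing page URI
```

With such a link in place, a reference manager can record the DOI URI automatically instead of whatever URI happens to be in the address bar.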
Memento: Big Leaps Towards Seamless Navigation of the Web of the Past (Herbert Van de Sompel)
These slides provide an explanation of the Memento Framework (time travel for the Web) from the perspective of resource versioning. It also details progress that has been made with deploying the framework since it was first introduced in November 2009, including standardization, development of tools, and advocacy. In addition, it touches upon new challenges (discovery, branding) and announces plans to make transactional Web archiving software available in the course of 2011.
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT (Herbert Van de Sompel)
DBpedia is the Linked Data version of Wikipedia. Starting in 2007, several DBpedia dumps have been made available for download. In 2010, the Research Library at the Los Alamos National Laboratory used these dumps to deploy a Memento-compliant DBpedia Archive, in order to demonstrate the applicability and appeal of accessing temporal versions of Linked Data sets using the Memento “Time Travel for the Web” protocol. The archive supported datetime negotiation to access various temporal versions of RDF descriptions of DBpedia subject URIs.
In a recent collaboration with the iMinds Group of Ghent University, the DBpedia Archive received a major overhaul. The initial MongoDB storage approach, which was unable to handle increasingly large DBpedia dumps, was replaced by HDT, the Binary RDF Representation for Publication and Exchange. And, in addition to the existing subject URI access point, Triple Pattern Fragments access, as proposed by the Linked Data Fragments project, was added. This allows datetime negotiation for URIs that identify RDF triples that match subject/predicate/object patterns. To add this powerful capability, native Memento support was added to the Linked Data Fragments Server of Ghent University.
In this talk, we will include a brief refresher of Memento, and will cover Linked Data Fragments, Triple Pattern Fragments, and HDT in more detail. We will share lessons learned from this effort and demo the new DBpedia Archive, which, at this point, holds over 5 billion RDF triples.
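For illustration, a Triple Pattern Fragments request is simply an HTTP GET against a URL with subject/predicate/object query parameters; unset positions act as wildcards. A minimal Python sketch with a hypothetical endpoint URL:

```python
from urllib.parse import urlencode

def fragment_url(base, subject=None, predicate=None, obj=None):
    """Build a Triple Pattern Fragments request URL; omitted
    pattern positions are treated as wildcards by the server."""
    params = {}
    if subject:
        params["subject"] = subject
    if predicate:
        params["predicate"] = predicate
    if obj:
        params["object"] = obj
    return base + "?" + urlencode(params)

# All triples with a given subject (hypothetical endpoint):
url = fragment_url("http://example.org/dbpedia",
                   subject="http://dbpedia.org/resource/Ghent")
print(url)
```

A Memento-enabled client would additionally send an Accept-Datetime request header to select the temporal version of the matching fragment.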
The presentation explores the trend towards a scholarly communication system that is friendly to machines. It presents 3 exhibits illustrating the trend and 1 exhibit illustrating inertia in the system. It makes the point that machine-actionability can be achieved much more easily if content and metadata are available in Open Access and under a permissive Creative Commons license. It also observes that even with content and metadata openly available, new costs related to advanced tools to explore the scholarly record will emerge. Finally, it points at significant challenges regarding the persistence of the scholarly record in light of increasingly interconnected and actionable content and advanced tools to interact with it.
The slides were used for a plenary presentation at the LIBER 2011 Conference in Barcelona, Spain, on June 30 2011.
Presentation given at CERN Workshop on Innovations in Scholarly Communication (OAI7) on 22nd June 2011
http://indico.cern.ch/conferenceDisplay.py?confId=103325
A talk delivered by Anne Trefethen at the Anybook Oxford Libraries Conference 2015 - Adapting for the Future: Developing Our Professions and Services, 21st July 2015
2012-03-28 Wf4ever, preserving workflows as digital research objects (Stian Soiland-Reyes)
Presented on 2012-03-28 at EGI Community Forum 2012, Munich.
http://www.wf4ever-project.org/
http://purl.org/wf4ever/model
http://cf2012.egi.eu/
https://www.egi.eu/indico/sessionDisplay.py?sessionId=66&confId=679#20120328
Understanding new ways of sharing content for learning and researching (@cristobalcobo)
This lecture explores how the expansion of the Internet and a variety of digital devices has influenced the way that information and knowledge are generated, consumed and distributed, particularly in the scholarly environment.
Collection directions - towards collective collections (lisld)
How the emergence of new research and learning workflows in digital environments is affecting library collecting and collections. Several trends are reviewed. In the light of diversifying competing requirements, the need to manage down print and develop shared print responses is discussed.
Presentation to OCLC Asia Pacific Regional Council meeting. 13 Oct. 2014.
Presentation about reference rot given at the Complexity Science Hub in Vienna, November 2021.
Links to web resources frequently break (link rot), and linked content can change at unpredictable rates (content drift). These dynamics of the Web are detrimental when references to web resources provide evidence or supporting information.
This presentation will report on research that assessed the extent of these problems for links to web resources in scholarly literature, by using three vast corpora of publications and a range of public web archives. It will also describe the Robust Link approach that offers a proactive, uniform, and machine-actionable way to combat link rot and content drift. Finally, it will introduce the Robustify web service and API that was devised to generate links that remain functional over time, paying special attention to challenges related to deploying infrastructure that is required to be long lasting.
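A Robust Link decorates an ordinary HTML anchor with attributes that carry an archived snapshot URI and the linking datetime, so a reader can fall back to the snapshot when the live link rots or drifts. A minimal Python sketch with illustrative URIs, using the data-versionurl and data-versiondate attribute names of the Robust Links convention:

```python
from html import escape

def robust_link(href, version_url, version_date, text):
    """Render an HTML anchor decorated per the Robust Links convention:
    original URI in href, archived snapshot in data-versionurl,
    linking datetime in data-versiondate."""
    return ('<a href="{}" data-versionurl="{}" data-versiondate="{}">{}</a>'
            .format(escape(href, quote=True),
                    escape(version_url, quote=True),
                    version_date,
                    escape(text)))

# Illustrative URIs and snapshot:
link = robust_link(
    "http://example.org/page",
    "https://web.archive.org/web/20211101000000/http://example.org/page",
    "2021-11-01",
    "the cited page")
print(link)
```

Because the decoration is plain HTML attributes, it is machine-actionable yet degrades gracefully: browsers without Robust Links support simply follow the ordinary href.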
Researcher Pod: Scholarly Communication Using the Decentralized Web (Herbert Van de Sompel)
The presentation provides an overview of the motivation and direction of the Mellon-funded Researcher Pod project that investigates technical aspects of scholarly communication in a decentralized web setting.
Presentation for a workshop about persistent identifiers organized by the Royal Library of The Netherlands and DANS. Highlights the non-trivial commitments required of all parties involved in persistent identifier systems to actually keep links based on persistent identifiers ... err ... persistent.
Various FAIR criteria pertaining to machine interaction with scholarly artifacts can commonly be addressed by means of repository-wide affordances that are uniformly provided for all hosted artifacts rather than through artifact-specific interventions. If various repository platforms provide such affordances in an interoperable manner, devising tools - for both human and machine use - that leverage them becomes easier.
My involvement, over the years, in a range of interoperability efforts has brought the insight that two factors strongly influence adoption: addressing a burning issue and delivering a KISS solution to tackle it. Undoubtedly, FAIR and FAIR DOs are burning issues. FAIR Signposting <https://signposting.org/FAIR/> is an ad-hoc repository interoperability effort that squarely fits in this problem space and that purposely specifies a KISS solution, hoping to inspire wide adoption.
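To give a flavor of how KISS the solution is: a FAIR Signposting response boils down to typed links in an HTTP Link header on the landing page and content resources. A minimal Python sketch with illustrative URIs; the relation types (cite-as, item, describedby) follow the Signposting conventions:

```python
def link_header(links):
    """Serialize (target, rel) pairs as an HTTP Link header value."""
    return ", ".join('<{}>; rel="{}"'.format(target, rel)
                     for target, rel in links)

# Typed links a repository landing page might expose (illustrative URIs):
value = link_header([
    ("https://doi.org/10.1234/example", "cite-as"),
    ("https://repo.example.org/record/1/files/data.csv", "item"),
    ("https://repo.example.org/record/1/metadata.jsonld", "describedby"),
])
print("Link:", value)
```

Because these are repository-wide affordances, a platform can emit them uniformly for every hosted artifact without artifact-specific intervention.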
Registration / Certification Interoperability Architecture (overlay peer-review) (Herbert Van de Sompel)
Presentation for the COAR meeting on Overlay Peer-Review held at INRIA, Paris, France. It provides overall context regarding a scholarly communication system in which the core functions of scholarly communication (registration, certification, awareness, archiving) are implemented in a decoupled manner and whereby each function can simultaneously be fulfilled by different parties, potentially in different ways. It shows how notifications can be used to achieve loosely coupled, point-to-point interoperability in such an environment, zooming in on interoperability between registration and certification, i.e., between repositories and overlay peer-review services.
Slides used for a keynote presentation at the VIVO 2019 Conference in Podgorica, Montenegro.
Abstract: The invitation to present a keynote at the VIVO Conference and the goal of the VIVO platform, as stated on the DuraSpace site, to create an integrated record of the scholarly work of an organisation reminded me of various efforts that I have been involved in over the past years that had similar goals. EgoSystem (2014) attempted to gather information about postdocs that had left the organisation, leaving little or no contact details behind. Autoload (2017), an operational service, discovers papers by organisational researchers in order to upload them in the institutional repository. myresearch.institute (2018), an experiment that is still in progress, discovers artefacts that researchers deposit in web productivity portals and subsequently archives them. More recently, I have been involved in thinking about the future of NARCIS, a portal that provides an overview of research productivity in The Netherlands. The approaches taken in all these efforts share a characteristic motivated by a desire to devise scalable and sustainable solutions: let machines rather than humans do the work. In this talk, I will provide an overview of these efforts, their motivations, the challenges involved, and the nature of success (if any).
Presentation for PIDapalooza 2019, Dublin, Ireland.
The Scholarly Orphans project, funded by the Andrew W. Mellon Foundation, explores technical approaches aimed at capturing and archiving scholarly artifacts that researchers deposit in web productivity portals as a means to collaborate and communicate with their peers. These artifacts are not collected by other frameworks aimed at archiving the scholarly record (e.g., LOCKSS, Portico, Institutional Repositories) and are only incidentally captured by web archives. The project explores an institution-driven approach inspired by web archiving. To demonstrate the ongoing thinking, the project has devised an experimental automated pipeline that continuously discovers, captures, and archives artifacts. These are created by actual researchers who, for the purpose of the experiment, were virtually enlisted in a fictive research institution. A portal at myresearch.institute provides an overview of the artifacts that were discovered and provides access to archived versions stored in both an institutional and a cross-institutional archive. The set-up leverages a range of technologies that share a flavor of persistence: Memento, Memento Tracer, Robust Links, Signposting.
As a memento of my last week of working at LANL, I put together a slide deck that provides an overview of major efforts conducted during the time I was there.
Presentation given at EuropeanaTech 2018 in Rotterdam, The Netherlands. Provides a summary of insights gained from working for about a decade on challenges related to temporal aspects of the web, persistence.
"Scholarly Communication: Deconstruct and Decentralize" was presented at the Fall 2017 Meeting of the Coalition for Networked Information. It explores working towards a Scholarly Commons by applying decentralized web ideas to scholarly communication.
Looks at hyperlinks from the perspective of a managed collection of resources for which link persistence/integrity is considered a quality of service concern. Distinguishes between links into other managed collections and to the web at large. Considers link rot and content drift.
This slide deck provides an overview of proposals to use HTTP Links as a means to address some long standing problems related to scholarly resources on the web.
These slides go with the paper "Reminiscing About 15 Years of Interoperability Efforts" which is available at http://dx.doi.org/10.1045/november2015-vandesompel
Slides were used for a presentation at the Fall 2015 Membership Meeting of the Coalition for Networked Information.
This presentation looks back at several efforts, conducted in the past fifteen years, aimed at establishing interoperability for web-based scholarly communication. It tries to characterize the perspectives/approaches taken by these efforts and, based upon that, proposes an HATEOAS-based approach to interlink scholarly nodes on the web. This was first presented at the Research Data Alliance meeting in Paris, France, September 22 2015.
Extended version of slides presented at the "404/File Not Found" symposium held at Georgetown University on October 24 2014, see http://www.law.georgetown.edu/library/404/ . The presentation provides a brief overview of the link/reference rot problem and then discusses three complementary strategies to combat it: Pro-actively capturing web resources that are linked from a seed collection; Referencing the captures by means of annotated links; Accessing the captures using Memento infrastructure.
This presentation introduces ResourceSync, a specification aimed to enable web-based synchronization of resources. The specification is the result of a collaboration between NISO and the Open Archives Initiative funded by the Sloan Foundation and JISC. The proposed resource synchronization approach is based on several existing specifications (e.g. Sitemaps, PubSubHubbub, well-known URI) and is aligned with common architectural principles (e.g. REST, follow your nose).
A 15 minute video version of these slides is available at https://www.youtube.com/watch?v=ASQ4jMYytsA
This presentation provides an overview of the Memento "Time Travel for the Web" framework that is aligned with the stable version of the Memento protocol, specified in RFC 7089.
The slides were used to accompany an overview of the outcomes of the ResourceSync project at the 2014 Spring Membership Meeting of the Coalition for Networked Information (CNI).
The launch of ResourceSync, a joint project of the National Information Standards Organization (NISO) and the Open Archives Initiative (OAI) funded by the Alfred P. Sloan Foundation, was motivated by the ubiquitous need to synchronize resources for applications in the realm of cultural heritage and research communication. After an initial problem definition and scoping phase, the project has designed, specified, and tested a framework for web-based synchronization that is based on Sitemaps, a protocol widely used by web servers to advertise the resources they make available to search engines for indexing. This choice allows repositories to address both search engine optimization and resource synchronization needs using the same technology.
The ResourceSync framework specifies various modular capabilities that a repository can support in order to allow third party systems to remain synchronized with its evolving resources. For example, a Resource List provides an inventory of resources whereas a Change List details resources that were created, deleted or updated during a given temporal interval. Support for capabilities can be combined in order to meet local or community requirements. The framework specifies capabilities that require a third party to recurrently poll for up-to-date information about a repository's resources but also publish/subscribe capabilities that keep third parties informed about changes through notifications, thereby significantly reducing synchronization latency.
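As an illustration of the Sitemap-based format, the following Python sketch builds and parses a minimal Change List describing one updated resource. The URIs and datetimes are illustrative:

```python
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS = "http://www.openarchives.org/rs/terms/"

# A minimal ResourceSync Change List (illustrative URIs and datetimes):
changelist = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="{sm}" xmlns:rs="{rs}">
  <rs:md capability="changelist"
         from="2014-01-01T00:00:00Z" until="2014-01-02T00:00:00Z"/>
  <url>
    <loc>http://example.org/res1</loc>
    <rs:md change="updated" datetime="2014-01-01T12:00:00Z"/>
  </url>
</urlset>""".format(sm=SM, rs=RS)

# A synchronizing third party would parse the list like this:
root = ET.fromstring(changelist.encode("utf-8"))
capability = root.find("{%s}md" % RS).get("capability")
changes = [u.find("{%s}md" % RS).get("change")
           for u in root.findall("{%s}url" % SM)]
print(capability, changes)
```

The same urlset structure, distinguished by the capability attribute of the rs:md element, is reused across the framework's capability documents, which keeps a Sitemap-aware toolchain applicable.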
Persistent Identifiers and the Web: The Need for an Unambiguous Mapping (Herbert Van de Sompel)
Presentation given at the International Digital Curation Conference in San Francisco, February 26 2014. Highlights the lack of machine-actionability of persistent identifiers assigned to scholarly communication assets. Proposes an approach to address the issue that meets requirements that take into account the changing nature of web based research communication. A draft paper provides more details: http://public.lanl.gov/herbertv/papers/Papers/2014/IDCC2014_vandesompel.pdf
Augmenting interoperability across scholarly repositories
1. Augmenting Interoperability
across Scholarly Repositories
Harvest
Obtain
Put
Herbert Van de Sompel
Research Library
Los Alamos National Laboratory, USA
This work was supported by NSF award number IIS-0430906 (Pathways)
JISC CNI Conference, York, UK, July 6th 2006
2. Pathways Project
• NSF grant number IIS-0430906
• http://www.infosci.cornell.edu/pathways/
• PIs: Carl Lagoze, Sandy Payette, Herbert Van de Sompel, Simeon Warner
• Research Participants: Lyudmila Balakireva, Jeroen Bekaert, Xiaoming Liu, Chris Wilper, Zhiwu Xie
3. Meeting in NYC, April 20-21 2006
• Supported by Microsoft, the Mellon Foundation, the Coalition for Networked Information, the Digital Library Federation, and JISC
• Representatives from institutional repository projects, scholarly content repositories, registry projects, and various projects that touch on interoperability
• See http://msc.mellon.org/Meetings/Interop/ for agenda, participants, topics & goals, terminology, presentations, and a prototype demonstration
• Report available July 2006
4. And more discussions with the community
• Panel at JCDL 2006, Chapel Hill, NC
• IATUL 2006, Porto, Portugal
• ElPub 2006, Bansko, Bulgaria
• Meeting at the University of Southampton, UK
5. Context: the Repository model
An environment consisting of Digital Object Repositories with a long life expectancy:
o Scholarly repositories
  - Institutional repositories
  - Discipline-oriented repositories
  - Publisher's repositories
  - Dataset repositories
  - …
o Cultural heritage repositories
o Preservation archives
o Educational repositories
6. Context: compound digital objects
Objects of the scholarly communication system are increasingly compound in nature, simultaneously consisting of:
• Multiple media types
• Multiple content types
  o papers,
  o datasets,
  o simulations,
  o software,
  o dynamic knowledge representations,
  o machine-readable chemical structures
7. Context: the Repository model
• We must leverage the value of the materials that become available in those distributed Repositories.
• Think of these Repositories as active nodes in a global environment, not as passive local nodes:
  o These Repositories are about facilitating the use and re-use of materials in many contexts
  o These Repositories are the starting point of value chains
• In order to enable value chains, we need to augment interoperability across repositories
8. Motivation 1 : Richer cross-Repository services
Distributed Repositories provide source materials for cross-Repository overlay services, such as discovery services and selective collecting services.
Need: digital object representation, harvesting interface, datastream semantics
9. Motivation 2 : Scholarly communication workflow
Distributed Repositories are at the basis of a digital scholarly communication system: scholarly communication as a global workflow across those Repositories, in which objects are recombined and value is added.
Need: digital object representation, obtain interface, put interface
10. Augmenting interoperability across Repositories
Shared Data Model and Services
DSpace
Nature
ePrints
Fedora
aDORe
arXiv
Individual Data Models and Services
11. Considerations re interoperable framework
• Scholarly communication is a long-term endeavor:
  o Need abstract definitions of Repository interfaces that can be instantiated on the basis of various technologies as time goes by
  o Repository interfaces need to work with whichever type of identifier (current and future), because Repositories will use whichever type of identifier
• Value chains do not require transfer of all digital object content:
  o The content that needs to be transferred depends on the nature of the value chain
• Recording a chain of evidence of a value chain requires fine granularity of identification:
  o Not only the identifier of the digital object but also of the repository
12. Augmenting interoperability across Repositories
Obtain
Harvest
Put
DSpace
Nature
ePrints
Fedora
aDORe
arXiv
Individual Data Models and Services
13. Augmenting interoperability across Repositories
Pathways Core Data Model for Cross-Repository Services
Bekaert, Jeroen, Xiaoming Liu, Herbert Van de Sompel, Sandy Payette, Carl Lagoze, and Simeon Warner. "Pathways Core: A Data Model for Cross-Repository Services." Poster, JCDL 2006. http://public.lanl.gov/herbertv/papers/pathways_core_poster_submit.pdf
14. Augmenting interoperability across Repositories
Pathways Core Surrogates (currently XML/RDF)
• A Surrogate is available for every Digital Object
• A Surrogate is a representation of the Digital Object according to the Pathways Core data model
• The representation is uniform across repositories; not tied to identifier type, content type, or application domain
• The Surrogate is what is used in the value chains; the Surrogate is used at the Obtain, Harvest and Put interfaces
  o Expresses properties and access points for the Digital Object (see later)
  o The Surrogate for a specific Digital Object can change over time
15. Augmenting interoperability across Repositories
Pathways Core Surrogates (currently XML/RDF)
• The Surrogates provide by-reference access to constituent datastreams of Digital Objects
  o Full asset transfer is only required for certain applications
  o Static asset transfer may be undesirable for dynamic objects => live references
• Avoid IP issues at the level of the interoperability framework
  o The idea is that the Surrogate itself is not encumbered by IP issues; attach, by definition, a liberal Creative Commons license to Surrogates
  o Allow Surrogates to flow freely, independent of the business models of the underlying content
16. Augmenting interoperability across Repositories
Pathways Core Surrogates (currently XML/RDF)
• A Surrogate expresses access points and properties of a Digital Object, e.g.:
  o Location of content streams
  o providerInfo: the keys necessary to Obtain a fresh Surrogate at some later point in time: (repository identifier, preferredIdentifier, versionKey)
  o Lineage: a Surrogate expresses its predecessor(s) == providerInfo in a previous life
  o semantic: a Surrogate expresses the type of content
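The fields listed above can be sketched as a small data structure. This is an illustrative sketch only: the field names are taken from the slide, but the class layout is an assumption, not the actual Pathways Core XML/RDF schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class ProviderInfo:
    """The keys necessary to Obtain a fresh Surrogate later on."""
    repository_id: str          # identifier of the repository
    preferred_identifier: str   # identifier of the digital object
    version_key: str            # pins a specific version

@dataclass
class Surrogate:
    """Hypothetical sketch of a Pathways Core Surrogate."""
    provider_info: ProviderInfo
    semantic: str                                                  # type of content, e.g. "paper"
    datastream_locations: List[str] = field(default_factory=list)  # by-reference access points
    lineage: List[ProviderInfo] = field(default_factory=list)      # providerInfo in previous lives

# A Surrogate derived in a value chain records its predecessor in Lineage:
original = Surrogate(ProviderInfo("arXiv", "arXiv:cs/0601066", "v1"),
                     semantic="paper",
                     datastream_locations=["http://arxiv.org/pdf/cs/0601066v1"])
derived = Surrogate(ProviderInfo("Repo2", "oid:123", "v1"),
                    semantic="paper",
                    lineage=[original.provider_info])
assert derived.lineage[0].repository_id == "arXiv"
```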
17. Augmenting interoperability across Repositories
Obtain interface: a Repository interface that supports the request of services pertaining to individual Digital Objects (including their component Datastreams). The core service is the request of a Surrogate for a Digital Object.
Harvest interface: a Repository interface that exposes Surrogates for incremental collecting/harvesting.
Put interface: a Repository interface that supports submission of one or more Surrogates into the Repository, thereby facilitating the addition of Digital Objects to the collection of the Repository.
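The three interfaces were defined abstractly so they could be bound to different technologies over time. A minimal sketch of that abstraction, with method names and signatures that are assumptions rather than the project's actual API:

```python
from abc import ABC, abstractmethod
from typing import Iterable, Optional

class Repository(ABC):
    """Abstract sketch of the Obtain/Harvest/Put Repository interfaces."""

    @abstractmethod
    def obtain(self, preferred_identifier: str,
               version_key: Optional[str] = None) -> dict:
        """Return the Surrogate for one Digital Object (latest version by default)."""

    @abstractmethod
    def harvest(self, since: Optional[str] = None) -> Iterable[dict]:
        """Expose Surrogates for incremental collecting, optionally from a datestamp."""

    @abstractmethod
    def put(self, surrogates: Iterable[dict]) -> None:
        """Accept Surrogates, adding the corresponding Digital Objects to the collection."""

# A toy in-memory binding showing the three interfaces in action:
class MemoryRepository(Repository):
    def __init__(self):
        self._store = {}

    def obtain(self, preferred_identifier, version_key=None):
        return self._store[preferred_identifier]

    def harvest(self, since=None):
        return list(self._store.values())

    def put(self, surrogates):
        for s in surrogates:
            self._store[s["preferredIdentifier"]] = s

repo = MemoryRepository()
repo.put([{"preferredIdentifier": "oid:1", "semantic": "paper"}])
assert repo.obtain("oid:1")["semantic"] == "paper"
assert len(list(repo.harvest())) == 1
```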
18. Surrogate is at the core of the value chain
[Diagram: Surrogates are Obtained from source repositories, recombined with added value, and Put into another repository; each resulting Surrogate carries providerInfo and Lineage pointing back to its predecessors.]
19. Basis for a Network of Linked Digital Objects
20. [Diagram: Repo1 and Repo2 each expose their own Obtain, Harvest, and Put interfaces (Obtain1/Harvest1/Put1 and Obtain2/Harvest2/Put2), which a cross-repository service must discover and use.]
21. Registry Service
(provider, preferredIdentifier, versionKey) == providerInfo
[Diagram: a service uses a Registry Service to resolve the provider from a Surrogate's providerInfo to that repository's interfaces:]

provider | Obtain  | Harvest  | Put
Repo1    | Obtain1 | Harvest1 | Put1
Repo2    | Obtain2 | Harvest2 | Put2
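The registry lookup shown in the table can be sketched as follows. The endpoint names mirror the slide's labels and are placeholders, not real service URLs:

```python
# Registry mapping a provider identifier to that repository's
# Obtain, Harvest, and Put service endpoints (placeholder values).
REGISTRY = {
    "Repo1": {"obtain": "Obtain1", "harvest": "Harvest1", "put": "Put1"},
    "Repo2": {"obtain": "Obtain2", "harvest": "Harvest2", "put": "Put2"},
}

def resolve(provider_info: tuple) -> dict:
    """Map a providerInfo triple (provider, preferredIdentifier, versionKey)
    to the provider's interface endpoints."""
    provider, _preferred_identifier, _version_key = provider_info
    return REGISTRY[provider]

# Follow a Surrogate's providerInfo back to its repository's Obtain interface:
interfaces = resolve(("Repo1", "oid:42", "v2"))
assert interfaces["obtain"] == "Obtain1"
```

Because only the provider component of providerInfo is needed for the lookup, the same registry serves to Obtain a fresh Surrogate for any object or version held by that repository.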