Interoperability in Digital Libraries
Rupesh Kumar A
Email: a.rupeshkumar@gmail.com
Interoperability
• Interoperability is the ability of two or more systems to exchange information and to use the information that has been exchanged.
• NISO defines interoperability as “the ability of multiple systems, using different hardware and software platforms, data structures, and interfaces, to exchange and share data.”
Interoperability in Digital Libraries
• Digital libraries aim to support interoperability at three levels:
– Data Gathering: uses web crawlers to search the surface web and distributed search to reach the deep web, collecting information from organizations.
– Harvesting: allows harvesting (import/export) of data and metadata using OAI-PMH.
– Federation: provides a unified interface for searching multiple repositories/databases.
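Federation can be illustrated with a minimal sketch: one query is sent to every repository and the hits are merged into a single result list. The repositories here are hypothetical in-memory title lists, and `make_repository` and `federated_search` are illustrative names, not part of any digital library product; a real federated search would call each repository's own search interface over the network.

```python
from typing import Callable, Dict, List, Tuple

# A "searcher" stands in for one repository's own search interface:
# it takes a query string and returns matching record titles.
Searcher = Callable[[str], List[str]]


def make_repository(titles: List[str]) -> Searcher:
    """Build a toy in-memory repository with case-insensitive substring search."""
    def search(query: str) -> List[str]:
        q = query.lower()
        return [t for t in titles if q in t.lower()]
    return search


def federated_search(query: str, repositories: Dict[str, Searcher]) -> List[Tuple[str, str]]:
    """Send one query to every repository and merge the hits into a single list.

    Each hit is (repository_name, title). A title found in several repositories
    is kept once, attributed to the first repository that returned it.
    """
    seen = set()
    merged: List[Tuple[str, str]] = []
    for name, search in repositories.items():
        for title in search(query):
            key = title.lower()
            if key not in seen:
                seen.add(key)
                merged.append((name, title))
    return merged


if __name__ == "__main__":
    repos = {
        "repoA": make_repository(["Metadata Harvesting", "Digital Preservation"]),
        "repoB": make_repository(["metadata harvesting", "Dublin Core Basics"]),
    }
    print(federated_search("metadata", repos))  # prints [('repoA', 'Metadata Harvesting')]
```

Deduplicating on a lowercased title is only one possible merge policy; real systems often merge on identifiers and rank results instead.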
Compatibility
• Compatibility in digital libraries refers to the suitability of hardware, software, file formats, and protocols for implementation and maintenance.
• Compatibility can be:
– Hardware compatibility
– Software compatibility
– File Format compatibility
– Protocol compatibility
Protocols
• A protocol is defined as a set of rules or conventions formulated to control the exchange of data between two entities that want to establish a connection.
Interoperability Protocols for DL
• OAI-PMH
• DCMES
• LDAP
OAI-PMH
• Open Archives Initiative Protocol for Metadata Harvesting
• A protocol developed for harvesting (or collecting) metadata descriptions of records in an archive.
• To implement OAI-PMH, a repository must support metadata in at least unqualified Dublin Core format.
• OAI-PMH uses XML over HTTP.
• Developed by Rick Luce, Herbert Van de Sompel and Paul Ginsparg at Los Alamos National Laboratory, New Mexico, USA.
Creators/Developers of OAI-PMH
Rick Luce
Herbert Van de Sompel
Paul Ginsparg
Los Alamos National Laboratory
New Mexico, USA
History
• In July 1999, Paul Ginsparg, Rick Luce, and Herbert Van de Sompel of the Los Alamos National Laboratory invited a restricted group of technical experts to a meeting in Santa Fe, New Mexico, in October of the same year.
• They proposed the creation of a universal service for author self-archived scholarly literature. The service was named the “Universal Preprint Service” (UPS).
• The interface/protocol agreed at the meeting became known as the “Santa Fe Convention”.
• Later, the UPS effort was renamed the Open Archives Initiative, and its protocol became OAI-PMH.
OAI-PMH
• OAI-PMH is a low-barrier mechanism for repository interoperability.
• It facilitates the exchange of metadata among and between repositories.
• Data providers make the structured metadata of a repository available to other repositories.
• Service providers then issue OAI-PMH requests to harvest that metadata.
• OAI-PMH defines six verbs (request types) that are invoked over HTTP.
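Because the six verbs are carried as the `verb` argument of an HTTP request, the first thing a data provider does is read and check that argument. The sketch below assumes an HTTP GET request URL and uses the hypothetical helper name `requested_verb`; a real data provider would go on to validate the other arguments and return an XML response. The error code `badVerb` is the one the OAI-PMH specification defines for a missing, repeated, or illegal verb.

```python
from urllib.parse import parse_qs, urlparse

# The six OAI-PMH verbs a data provider must recognise.
OAI_VERBS = frozenset({
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
})


def requested_verb(request_url: str) -> str:
    """Return the OAI-PMH verb carried by an HTTP GET request URL.

    Returns the OAI-PMH error code "badVerb" if the verb argument is missing,
    repeated, or not one of the six legal verbs.
    """
    params = parse_qs(urlparse(request_url).query)
    verbs = params.get("verb", [])
    if len(verbs) != 1 or verbs[0] not in OAI_VERBS:
        return "badVerb"
    return verbs[0]


if __name__ == "__main__":
    for url in (
        "archive.org/oai-script?verb=Identify",
        "archive.org/oai-script?verb=Delete",
        "archive.org/oai-script",
    ):
        print(url, "->", requested_verb(url))
```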
Figure: Working model of OAI-PMH
Key Definitions in OAI-PMH
Element | Definition
Harvester | client application issuing OAI-PMH requests
Repository | network-accessible server able to process OAI-PMH requests correctly
Resource | object the metadata is "about"; the nature of resources is not defined in OAI-PMH, and resources may be digital or non-digital
Item | component of a repository from which metadata about a resource can be disseminated; has a unique identifier
Record | metadata in a specific metadata format
Identifier | unique key for an item in a repository
Set | optional construct for grouping items in a repository
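The relationships in this table can be sketched as a small data model: a repository holds items, each item has one unique identifier, each record is the item's metadata in one specific format, and sets optionally group items. The class and field names, the `oai_dc` prefix used as a key, and the identifier `oai:example.org:1` are illustrative assumptions, not structures defined by the protocol.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Record:
    """Metadata about one item, expressed in one specific metadata format."""
    metadata_prefix: str               # names the format, e.g. "oai_dc"
    metadata: Dict[str, List[str]]     # element name -> values


@dataclass
class Item:
    """A repository item: one unique identifier, one record per metadata format."""
    identifier: str
    records: Dict[str, Record] = field(default_factory=dict)  # keyed by metadata_prefix
    sets: List[str] = field(default_factory=list)             # optional set membership

    def add_record(self, record: Record) -> None:
        self.records[record.metadata_prefix] = record

    def get_record(self, metadata_prefix: str) -> Optional[Record]:
        return self.records.get(metadata_prefix)


@dataclass
class Repository:
    """Holds items indexed by identifier; identifiers must be unique."""
    items: Dict[str, Item] = field(default_factory=dict)

    def add_item(self, item: Item) -> None:
        if item.identifier in self.items:
            raise ValueError(f"duplicate identifier: {item.identifier}")
        self.items[item.identifier] = item

    def items_in_set(self, set_name: str) -> List[Item]:
        return [item for item in self.items.values() if set_name in item.sets]


if __name__ == "__main__":
    repo = Repository()
    item = Item("oai:example.org:1", sets=["theses"])  # hypothetical identifier
    item.add_record(Record("oai_dc", {"title": ["A Sample Thesis"]}))
    repo.add_item(item)
    print(repo.items["oai:example.org:1"].get_record("oai_dc"))
```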
Request Verbs in OAI-PMH
Verb | Function | Example
Identify | description of an archive | archive.org/oai-script?verb=Identify
ListMetadataFormats | retrieve available metadata formats from an archive | archive.org/oai-script?verb=ListMetadataFormats
ListSets | retrieve the set structure of a repository | archive.org/oai-script?verb=ListSets
ListIdentifiers | abbreviated form of ListRecords, retrieving only headers | archive.org/oai-script?verb=ListIdentifiers
ListRecords | harvest records from a repository | archive.org/oai-script?verb=ListRecords
GetRecord | retrieve an individual metadata record from a repository | archive.org/oai-script?verb=GetRecord
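The example URLs above show only the verb. In OAI-PMH 2.0, ListIdentifiers and ListRecords also require a `metadataPrefix` argument, and GetRecord requires both `identifier` and `metadataPrefix`. A harvester-side sketch that builds request URLs and rejects requests missing a required argument might look like this; `build_request`, the `http://` scheme on the table's example base URL, and the identifier `oai:example.org:1234` are assumptions. Optional arguments (such as `set`) are passed through unchecked, and flow control via `resumptionToken` is not handled.

```python
from urllib.parse import urlencode

# Arguments each verb requires in OAI-PMH 2.0 (resumptionToken handling omitted).
REQUIRED_ARGS = {
    "Identify": set(),
    "ListMetadataFormats": set(),
    "ListSets": set(),
    "ListIdentifiers": {"metadataPrefix"},
    "ListRecords": {"metadataPrefix"},
    "GetRecord": {"identifier", "metadataPrefix"},
}


def build_request(base_url: str, verb: str, **args: str) -> str:
    """Build an OAI-PMH HTTP GET URL, refusing requests that lack required arguments."""
    if verb not in REQUIRED_ARGS:
        raise ValueError(f"not an OAI-PMH verb: {verb}")
    missing = REQUIRED_ARGS[verb] - set(args)
    if missing:
        raise ValueError(f"{verb} requires: {', '.join(sorted(missing))}")
    # urlencode percent-encodes values such as the ':' in OAI identifiers.
    query = urlencode({"verb": verb, **args})
    return f"{base_url}?{query}"


if __name__ == "__main__":
    print(build_request("http://archive.org/oai-script", "GetRecord",
                        identifier="oai:example.org:1234", metadataPrefix="oai_dc"))
    # prints http://archive.org/oai-script?verb=GetRecord&identifier=oai%3Aexample.org%3A1234&metadataPrefix=oai_dc
```

Since `from` is a Python keyword, a date-range argument would have to be passed as `**{"from": "2001-01-01"}`.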
OAI-PMH request & response
• An OAI-PMH request is based on HTTP.
• The response to an OAI-PMH request is metadata provided in XML format.
Figure: OAI request using the ListSets verb, and the OAI response XML file
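On the harvester side, the XML response to a ListSets request can be read with a standard XML parser. The sketch below assumes the OAI-PMH 2.0 namespace and response layout (`ListSets/set/setSpec` and `setName`) and uses a hypothetical sample response; if the repository answers with an `<error>` element instead, for example `noSetHierarchy` when it does not support sets, the sketch raises an exception. `parse_list_sets` is an illustrative name.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# A hypothetical ListSets response (the real one would come back over HTTP).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-08-11T07:21:33Z</responseDate>
  <request verb="ListSets">http://archive.org/oai-script</request>
  <ListSets>
    <set><setSpec>theses</setSpec><setName>Theses and Dissertations</setName></set>
    <set><setSpec>articles</setSpec><setName>Journal Articles</setName></set>
  </ListSets>
</OAI-PMH>"""


def parse_list_sets(xml_text: str) -> List[Tuple[str, str]]:
    """Return (setSpec, setName) pairs from a ListSets response.

    Raises RuntimeError if the repository answered with an OAI-PMH error.
    """
    root = ET.fromstring(xml_text)
    error = root.find("oai:error", OAI_NS)
    if error is not None:
        raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
    result = []
    for s in root.findall("oai:ListSets/oai:set", OAI_NS):
        spec = s.findtext("oai:setSpec", default="", namespaces=OAI_NS)
        name = s.findtext("oai:setName", default="", namespaces=OAI_NS)
        result.append((spec, name))
    return result


if __name__ == "__main__":
    print(parse_list_sets(SAMPLE_RESPONSE))
```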
DCMES
• The Dublin Core Metadata Element Set is a vocabulary of fifteen properties for use in resource description.
• DCMES is a part of the Dublin Core Metadata Initiative (DCMI).
• “Dublin Core” consists of two words: Dublin and Core.
• The word “Dublin” comes from the 1995 workshop held in Dublin, Ohio, where the element set originated.
• The word “Core” is used because the elements are broad and generic, usable for describing a wide range of resources.
• Two variations: Qualified Dublin Core and Unqualified (Simple) Dublin Core.
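Simple (unqualified) Dublin Core is also the format OAI-PMH repositories must support, where it is exchanged as an `oai_dc` XML record. The sketch below serialises a record held as a Python dictionary (element name to list of values) into that shape. It assumes the `oai_dc` and `dc` namespace URIs used by OAI-PMH 2.0 and omits the `xsi:schemaLocation` attribute that real responses usually carry; `to_oai_dc` is an illustrative name.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Use the conventional prefixes when serialising.
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def to_oai_dc(record: Dict[str, List[str]]) -> str:
    """Serialise a Simple Dublin Core record (element -> values) as oai_dc XML.

    Every DCMES element is optional and repeatable, so each value becomes
    its own <dc:element> child.
    """
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for element, values in record.items():
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_oai_dc({
        "title": ["Interoperability in Digital Libraries"],
        "creator": ["Rupesh Kumar A"],
        "subject": ["OAI-PMH", "Dublin Core"],
    }))
```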
Fifteen Elements in DCMES
Element | Definition
Contributor (dc.contributor) | An entity responsible for making contributions to the resource. Examples of a Contributor include a person, an organization, or a service. Typically, the name of a Contributor should be used to indicate the entity.
Coverage (dc.coverage) | The spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is relevant.
Creator (dc.creator) | An entity primarily responsible for making the resource. Examples of a Creator include a person, an organization, or a service. Typically, the name of a Creator should be used to indicate the entity.
Date (dc.date) | A point or period of time associated with an event in the lifecycle of the resource.
Description (dc.description) | An account of the resource. Description may include but is not limited to: an abstract, a table of contents, a graphical representation, or a free-text account of the resource.
Format (dc.format) | The file format, physical medium, or dimensions of the resource. Examples of dimensions include size and duration. Recommended best practice is to use a controlled vocabulary such as the list of Internet Media Types [MIME].
Identifier (dc.identifier) | An unambiguous reference to the resource within a given context. Recommended best practice is to identify the resource by means of a string conforming to a formal identification system.
Language (dc.language) | A language of the resource. Recommended best practice is to use a controlled vocabulary.
Publisher (dc.publisher) | An entity primarily responsible for making the resource available. Examples of a Publisher include a person, an organization, or a service. Typically, the name of a Publisher should be used to indicate the entity.
Relation (dc.relation) | A related resource. Recommended best practice is to identify the related resource by means of a string conforming to a formal identification system.
Rights (dc.rights) | Information about rights held in and over the resource. Typically, rights information includes a statement about various property rights associated with the resource, including intellectual property rights.
Source (dc.source) | A related resource from which the described resource is derived. The described resource may be derived from the related resource in whole or in part. Recommended best practice is to identify the related resource by means of a string conforming to a formal identification system.
Subject (dc.subject) | The topic of the resource. Typically, the subject will be represented using keywords, key phrases, or classification codes. Recommended best practice is to use a controlled vocabulary.
Title (dc.title) | A name given to the resource. Typically, a Title will be a name by which the resource is formally known.
Type (dc.type) | The nature or genre of the resource. Recommended best practice is to use a controlled vocabulary such as the DCMI Type Vocabulary [DCMITYPE].
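Because Simple Dublin Core allows only these fifteen elements, a harvester or cataloguing tool can check a record's element names against the list. The sketch below does that and also flags `format` values that do not look like an Internet Media Type (`type/subtype`); since the table gives MIME as recommended best practice rather than a requirement, those are returned as warnings. The regular expression is a loose shape check, not a full media-type parser, and `check_simple_dc` is a hypothetical name.

```python
import re
from typing import Dict, List

# The fifteen DCMES elements, as used in Simple Dublin Core.
DCMES_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}

# Loose "type/subtype" shape check for Internet Media Types (e.g. application/pdf).
MIME_SHAPE = re.compile(r"^[A-Za-z0-9][\w.+-]*/[A-Za-z0-9][\w.+-]*$")


def check_simple_dc(record: Dict[str, List[str]]) -> List[str]:
    """Return warnings for a Simple Dublin Core record (element -> values)."""
    warnings = []
    for element in record:
        if element not in DCMES_ELEMENTS:
            warnings.append(f"unknown element: {element}")
    for value in record.get("format", []):
        if not MIME_SHAPE.match(value):
            warnings.append(f"format not a media type: {value}")
    return warnings


if __name__ == "__main__":
    print(check_simple_dc({"titel": ["Sample"], "format": ["PDF file"]}))
```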
LDAP
• Lightweight Directory Access Protocol
• A software protocol that enables anyone to locate organizations, individuals, and other resources such as files and devices in a network, whether on the public Internet or on a corporate intranet.
• An LDAP directory service is based on a client-server model.
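In the client-server model, an LDAP client locates entries by sending the directory server a search request containing a filter string such as `(cn=...)`. Python's standard library has no LDAP client, so this sketch only builds the filter, escaping the characters `*`, `(`, `)`, `\` and NUL as RFC 4515 requires so that user input cannot change the filter's meaning. `escape_filter_value` and `person_filter` are hypothetical helper names, and `objectClass=person` is just an illustrative condition; sending the filter would need an LDAP client library.

```python
# RFC 4515: these characters must be written as a backslash plus two hex digits.
_FILTER_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\0": r"\00",
}


def escape_filter_value(value: str) -> str:
    """Escape a value for safe use inside an LDAP search filter."""
    return "".join(_FILTER_ESCAPES.get(ch, ch) for ch in value)


def person_filter(name: str) -> str:
    """Build a filter matching person entries whose common name equals `name`."""
    return f"(&(objectClass=person)(cn={escape_filter_value(name)}))"


if __name__ == "__main__":
    print(person_filter("Rupesh"))   # prints (&(objectClass=person)(cn=Rupesh))
    print(person_filter("a*b(c)"))   # prints (&(objectClass=person)(cn=a\2ab\28c\29))
```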
Figure: Example of LDAP
